Data Science Symposium

Opportunities for Massachusetts Community Colleges

Benjamin S. Baumer and Nicholas J. Horton

Monday, June 13, 2022



The Data Science Corps WAV

  • Wrangle, Analyze, Visualize

  • Experiential Learning with Local Community Organizations

  • 4 years, 9 institutions, $1.2 million budget

  • 25-30 students per semester doing projects; 5 cohorts, beginning January 2020

  • working in parallel with three community colleges

  • Thanks to NSF #1924017 and #1923388 for funding this work

DSC-WAV goals

  1. Workforce development for undergraduates
    • focus on diversifying the workforce
    • 74% of student participants are female
    • ~46% of student participants are non-white
  2. Building the pipeline

Goal 1: Partners

Goal 1: Papers

Goal 2

  • building data science programs at community colleges


  1. Motivate the importance of data science
  2. Familiarize institutional leadership with options
  3. Prepare faculty to teach data science
  4. Identify barriers to pursuing data science programs
  5. Share resources and best practices
  6. Explore next steps

Community Colleges

“the keystone for the nation’s plan to help more people earn a postsecondary credential.”

–Blumenstyk (2021)

  • substantial fraction of undergraduate students
  • only affordable game in town
  • critical to develop an educated workforce
  • more representative of the population
  • need for flexible and innovative articulation (transfer)

NASEM (2018)

  • definition of data acumen
  • importance of community colleges
  • cooperation with four-year institutions

2YCDS Summit (2019)

  • What is different at community colleges?
  • How can we support faculty development and curricular initiatives?

Transfer pathways

Goal: A2B pathways on website

  • Mapped A2B pathways for data science

  • BHCC -> UMD?

  • HCC -> UMA?


  • Need a student-centered approach to foster success
  • Need to support associate’s to workforce
  • Need to support associate’s to transfer (today’s focus)
  • What are the points of friction we need to smooth or eliminate?

“The whitepaper”

Five points of curricular friction

  1. A first course in data science (subject of FacDev22 Workshop)
  2. A second course in data science
  3. A course in scientific computing, data science workflow, and/or reproducible computing (some courses and modules exist; can they be mapped?)
  4. Lab sciences (mapped courses exist, but are they appropriate/optimal?)
  5. Communication, ethics, and application domain (mapped gen ed courses exist, more work needed)

SWOT analysis

  • Faculty engagement
  • Rising tide of data science
  • Complexity
  • Siloing
  • Employment
  • This hasn’t been done before!
  • If we don’t act, chaos ensues and students suffer!

Let’s get started!

1: A first course in data science

In order for data science transfer pathways to work, community colleges must offer a first course in data science.

“To prepare their graduates for this new data-driven era, academic institutions should encourage the development of a basic understanding of data science in all undergraduates.”

–NASEM (2018)

2: A second course in data science

Cultivating a rich facility in data science requires repeated exposure: a single course is not sufficient for students to develop mastery.

  • bachelor’s programs in data science typically include a second course in data science, often taken during the sophomore year.

  • reinforce and extend fundamental skills in data wrangling, data visualization, statistical modeling, and predictive analytics.

3: A course in scientific computing, data science workflow, and reproducible computing

A generic bachelor’s program in data science will include explicit instruction in how to advance science by computing with data in a reproducible, collaborative workflow.

4: Lab science

Many of the existing STEM transfer options require two semesters of lab sciences as a component of their general education requirements.

At present, many of these courses may be less germane for data science students, but there is considerable potential for them to reinforce and build basic data science skills for all students while building domain knowledge.

One alternative to explore is a future data-science-infused lab course, developed to provide more exposure to key data science topics while meeting the learning outcomes for a lab course.

5: Communication, ethics, and application domains

Bachelor’s programs in data science include training in communication and ethics (what responsibilities do data scientists have to their users, customers, and society as a whole? See Baumer et al., 2022).

Our recommendation is that institutions think carefully and holistically about how requirements for communication, ethics, and domain application can be used to accrue credits at community colleges and foster successful transfers.