Project abstract

This project simultaneously addresses two problems: 1) the inability of community-based and non-profit organizations to tackle data science problems; and 2) the lack of real-world experience gained by students studying data science. The increased availability of data, combined with greater computing power at lower cost, has put tremendous analytical and problem-solving capabilities on the desktop. Yet many organizations cannot take advantage of these developments because they often lack the staff to wrestle with complex data science problems. Meanwhile, as students increasingly gravitate toward data science programs, much of their course-based problem-solving experience focuses on clean problems with simple data sets. This leaves them unprepared for the realities of the data science applications they will face in professional settings. This project addresses both issues by deploying teams of data science students to assist local organizations, thereby increasing the long-term capacity of the data science workforce.

This is a multifaceted project that will provide immediate impact to local organizations and long-term benefit for students through valuable hands-on data science experience. The proposed project has two major components. First, Data Science WAV teams of four specially trained undergraduate students will be deployed to community-based organizations to Wrangle, Analyze, and Visualize their data. Second, the project will offer summer faculty development workshops designed to help new instructors, especially those at community colleges, teach data science at their institutions. Curricular innovations that bring experiential data science learning into the classroom will lead to sustained impact at the partnering academic institutions and in the larger Pioneer Valley region. The proposal is diverse in both its institutions and its student populations. It comprises one major research university (The University of Massachusetts, Amherst), four liberal arts colleges (Amherst, Hampshire, Mount Holyoke, and Smith), and three local community colleges (Greenfield, Holyoke, and Springfield Technical). The inclusion of two women’s colleges (Smith and Mount Holyoke) and two Hispanic-serving institutions (Holyoke and Springfield Technical) will help ensure that a diverse student population is engaged in the project.

About the Data Science Corps

NSF’s Harnessing the Data Revolution Data Science Corps program focuses on building capacity for harnessing the data revolution at the local, state, national, and international levels to help unleash the power of data in the service of science and society. Projects in this program are being jointly funded by the NSF’s Harnessing the Data Revolution Big Idea; the Directorate for Computer and Information Science and Engineering, Division of Information and Intelligent Systems; the Directorate for Education and Human Resources, Division of Undergraduate Education; the Directorate for Mathematical and Physical Sciences, Division of Mathematical Sciences; and the Directorate for Social, Behavioral and Economic Sciences, Office of Multidisciplinary Activities and Division of Behavioral and Cognitive Sciences.

Community projects

Please see our GitHub Organization for more information about our projects.


Students are building visualizations to assist planners and inform decisions about how best to expand the program and serve under-resourced neighborhoods and communities.

Western Mass Health Equity Network

Students have researched and used publicly available data to build a dashboard interface that facilitates the retrieval of multi-year data on health characteristics by neighborhood in Springfield.

Girls Inc of the Pioneer Valley

Students are working on a map that the organization can share with potential donors to demonstrate the accumulation of risk factors in the physical environment (air quality, water pollution, etc.) that its target population faces on a daily basis.


Students are working to make data accessible by converting an archive of PDF grant applications to text, automating the loading of the files into a database, and making them searchable.
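The load-and-search steps of this pipeline might look roughly like the minimal Python sketch below. It assumes the PDF-to-text conversion has already been done (one plain-text document per grant application) and uses Python's built-in `sqlite3` module with a simple `LIKE` substring search. The table `documents` and the functions `load_documents`, `load_directory`, and `search` are hypothetical illustrations, not the students' actual code.

```python
import sqlite3
from pathlib import Path
from typing import Iterable, List, Tuple


def load_documents(db: sqlite3.Connection, docs: Iterable[Tuple[str, str]]) -> int:
    """Store (filename, text) pairs in a hypothetical `documents` table.

    Returns the number of documents written. INSERT OR REPLACE lets the
    loader be re-run as more grant applications are converted.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " filename TEXT PRIMARY KEY,"
        " body TEXT NOT NULL)"
    )
    count = 0
    for filename, body in docs:
        db.execute(
            "INSERT OR REPLACE INTO documents (filename, body) VALUES (?, ?)",
            (filename, body),
        )
        count += 1
    db.commit()
    return count


def load_directory(db: sqlite3.Connection, text_dir: Path) -> int:
    """Load every .txt file in text_dir (assumed: one file per converted PDF)."""
    docs = (
        (path.name, path.read_text(encoding="utf-8"))
        for path in sorted(text_dir.glob("*.txt"))
    )
    return load_documents(db, docs)


def search(db: sqlite3.Connection, term: str) -> List[str]:
    """Return filenames whose text contains `term`.

    SQLite's LIKE is case-insensitive for ASCII letters by default.
    The characters % and _ in `term` act as wildcards; they are not escaped.
    """
    rows = db.execute(
        "SELECT filename FROM documents WHERE body LIKE ? ORDER BY filename",
        (f"%{term}%",),
    )
    return [name for (name,) in rows]
```

A typical session would open a database file, call `load_directory` on the folder of converted text, and then call `search` with a keyword. A larger archive would likely benefit from SQLite's FTS5 full-text index in place of `LIKE`, which scans every row.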


The Nature Conservancy

Students are working to automate the import of images from wildlife cameras and the analysis of patterns in those images to determine whether a particular image contains an animal. The metadata for the images are read from a file and added to a growing database of available images.
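The metadata step could be sketched as follows, assuming the camera metadata arrives as CSV text. The column names `filename`, `camera_id`, and `timestamp`, the `images` table, and the `has_animal` column are hypothetical placeholders rather than the project's real schema. `INSERT OR IGNORE` on a primary key lets the database grow with each import while skipping images it has already seen.

```python
import csv
import sqlite3
from typing import Iterable


def add_image_metadata(db: sqlite3.Connection, lines: Iterable[str]) -> int:
    """Append image metadata rows to a hypothetical `images` table.

    `lines` is any iterable of CSV text lines (for example, an open file)
    whose header includes filename, camera_id, and timestamp (assumed names).
    Returns the number of new rows; already-known images are skipped,
    so the same metadata file can be re-imported safely.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        " filename TEXT PRIMARY KEY,"
        " camera_id TEXT,"
        " taken_at TEXT,"
        " has_animal INTEGER)"  # stays NULL until an image is classified
    )
    before = db.total_changes  # rows changed so far on this connection
    for row in csv.DictReader(lines):
        db.execute(
            "INSERT OR IGNORE INTO images (filename, camera_id, taken_at)"
            " VALUES (?, ?, ?)",
            (row["filename"], row["camera_id"], row["timestamp"]),
        )
    db.commit()
    # Ignored duplicates do not count as changes, so this is the new-row count.
    return db.total_changes - before
```

Keeping the animal/no-animal result in a nullable column is one way to let classification run as a separate, later step over the same growing table.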