Making Meaningful Impact: Using Data Science for Social Good

Media Inquiries
Peter Kerwin
University Communications & Marketing

Abandoned buildings in disrepair pose a safety hazard and can compromise the structural integrity of adjacent residences, especially among the row homes that make up the majority of housing units in Baltimore, Maryland. Neighbors deal with rat infestations, have difficulty getting insurance and experience damage to their own homes because they are attached to structures with severe roof damage.

These challenges are occurring at a citywide scale, where the Baltimore City Department of Housing and Community Development (DHCD) is tasked with assessing 15,000 vacant homes to identify and remediate roof damage. The problem is complex, systemic and formidable.

Enter the Data Science for Social Good (DSSG) Summer Fellowship at Carnegie Mellon University.

The Data Science for Social Good Summer Fellows take a group photo on Carnegie Mellon's Pittsburgh campus.

Baltimore's DHCD partnered with Carnegie Mellon's DSSG to improve community safety and economic well-being by remediating buildings with roof damage. Aspiring data scientists from the DSSG team identified hazardous structures with roof damage, then prioritized the most urgent needs for preventative interventions. Team member Chae Won Lee, a graduate student at the University of Washington, said one significant challenge was determining from the ground level whether a roof had damage. A second was the scope of work, with so many vacant homes in Baltimore to assess.

Lee and her project teammates, Justin Clark, of Harvard University, and Jonas Coelho de Barros, of FGV EBAPE (Escola Brasileira de Administração Pública e de Empresas), created a successful system that used machine learning (ML) to assign a roof damage score to each address. Incorporating data that included aerial images of the entire city, manual visual assessments of historical aerial inspections, housing inspection notes, details from calls to the 311 citizen hotline, and other information provided by the city, the team developed an artificial intelligence (AI) system that effectively identified and prioritized structures with the most significant roof damage.

The prioritized list allows city inspectors to be more efficient and more equitable by focusing on buildings with actual damage across the neighborhoods and communities most affected by this problem. The list can be regenerated each year with minimal manual effort. The system identifies roof damage more accurately than human observation alone. Finally, the model eliminates potential bias by identifying roof damage equitably across neighborhoods. Ultimately, the solution has the potential to improve the lives of people in 5,000 households on city blocks with damaged roofs.
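
The prioritization step described above can be pictured with a minimal sketch. It assumes each address already carries a roof damage score between 0 and 1 from some model, and that inspectors can visit only a fixed number of addresses. The addresses, scores and the names `ScoredAddress` and `prioritize` are hypothetical illustrations, not the DSSG team's data or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredAddress:
    address: str         # hypothetical identifier, not real city data
    damage_score: float  # assumed model output in [0, 1]; higher means more likely damaged


def prioritize(scored, capacity):
    """Return the `capacity` highest-scoring addresses, most urgent first."""
    if capacity < 0:
        raise ValueError("capacity must be non-negative")
    # Sort by descending score; break ties by address so the order is stable.
    ranked = sorted(scored, key=lambda s: (-s.damage_score, s.address))
    return ranked[:capacity]


# Made-up example scores, for illustration only.
example = [
    ScoredAddress("100 Example St", 0.12),
    ScoredAddress("102 Example St", 0.91),
    ScoredAddress("104 Example St", 0.57),
]

top_two = prioritize(example, capacity=2)
print([s.address for s in top_two])  # ['102 Example St', '104 Example St']
```

Because the ranking depends only on the scores, a list like this could be rebuilt whenever new scores are produced, which is consistent with the yearly regeneration mentioned above.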

The DHCD recently garnered an innovation award for the project's impact.

The Baltimore roof initiative is just one example of the impact DSSG and CMU are having on communities locally, nationally and internationally. In another project, DSSG Fellows worked to improve call routing for the 988 Suicide & Crisis Lifeline (formerly known as the National Suicide Prevention Lifeline).

An estimated 50 million people in the United States live with mental illness. The 988 Suicide & Crisis Lifeline receives more than two million calls each year, which are routed to about 200 call centers around the country.

Tejumade Afonja, of Saarland University; Charles Cui, of Northwestern University; Paula Subías-Beltrán, of the University of Barcelona; and Irene Tang, of the University of Chicago, worked with Vibrant Emotional Health to address lengthy wait times for the Lifeline. Subías-Beltrán said that ideally, the team would need to know the current capacity of each call center, the current wait time for each call center, and the length of time a caller is willing to wait, but none of that data was available to the network because of its distributed nature.

The team worked with the data available in the system to determine an alternative routing approach based on where each call came from, the call center where the call was routed, the wait times and whether the call was answered. They created a model that predicted the likelihood that a call would be picked up at a specific call center at a given time. The model has the potential to outperform the approach the organization had been using, and it allowed the team to build a new routing simulator that can increase the connection rate for callers. That improvement means thousands of additional callers seeking mental health assistance may get the support they need in time. The change will ultimately save lives.
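
The routing idea above can be sketched in a few lines. The article does not describe the team's actual model, so this stand-in simply estimates a pickup rate for each call center and hour from past call records, using smoothed frequencies, and then routes a call to the candidate center with the highest estimate. The record format, the smoothing prior, the center names and the functions `fit_answer_rates` and `route` are all assumptions made for illustration.

```python
from collections import defaultdict


def fit_answer_rates(history, prior_answered=1.0, prior_total=2.0):
    """Estimate P(answered | center, hour) from (center, hour, answered) records.

    Smoothed empirical frequencies stand in for a learned model; the prior
    values are an assumption, not something taken from the project.
    """
    counts = defaultdict(lambda: [0, 0])  # (center, hour) -> [answered, total]
    for center, hour, answered in history:
        counts[(center, hour)][0] += int(answered)
        counts[(center, hour)][1] += 1
    return {
        key: (a + prior_answered) / (n + prior_total)
        for key, (a, n) in counts.items()
    }


def route(rates, candidate_centers, hour, default=0.5):
    """Pick the candidate center with the highest estimated answer rate.

    Centers with no history at this hour fall back to `default`.
    """
    return max(candidate_centers, key=lambda c: rates.get((c, hour), default))


# Made-up call records: (center, hour of day, whether the call was answered).
history = [
    ("center_a", 14, True),
    ("center_a", 14, False),
    ("center_a", 14, False),
    ("center_b", 14, True),
    ("center_b", 14, True),
]

rates = fit_answer_rates(history)
print(route(rates, ["center_a", "center_b"], hour=14))  # center_b
```

A simulator in this spirit would replay historical calls through `route` and compare the resulting connection rate with the existing policy, though the details of the team's simulator are not given here.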

How DSSG Came To Be

Rayid Ghani, Distinguished Career Professor in the School of Computer Science's Machine Learning Department and the Heinz College of Information Systems and Public Policy at CMU, created DSSG because he was looking to bridge a gap, both for himself and for his students.

"The intersection of what I cared about and what I was good at — that's the work I really wanted to do," Ghani said. As chief data scientist for the Obama 2012 campaign, Ghani had experienced what it felt like to do work that made an impact on society.

He had an "aha moment" in 2013 during a talk to a group of CMU graduate students in ML.

"I was trying to tell them about the intersection of ML and social issues," Ghani said. "What I expected was that they knew about the social problems but didn't find them interesting. What I heard that was a little bit surprising was that they didn't realize there was this intersection, and that we could do something about those problems with these skills."

At the same time, Ghani wondered why data and evidence were not used more often in government to solve societal problems. In talking with colleagues at government agencies and nonprofits who worked on social issues, Ghani consistently heard one of three explanations. Some individuals were familiar with the concepts of ML and AI, but were not sure exactly how they could be used to address specific issues. Another group understood the capabilities of AI, but lacked staff skilled in using it. Finally, some leaders had both comprehension and staff, but were without ML and AI tools designed for their specific needs.

The opportunity was ripe for partnership, and Ghani embraced it. He launched the Data Science for Social Good Initiative in 2013, while working at the University of Chicago.

The program has been replicated at the University of Washington (2015), Stanford University (2019), Georgia Institute of Technology (2019), and Imperial College London (2019), among others.

DSSG at CMU: Multidisciplinary and Focused on Ethics

When Ghani returned to CMU — his alma mater — in 2019 to teach, he brought the DSSG initiative with him. DSSG Fellows spend 12 weeks working with nonprofits and government agencies to tackle problems affecting real communities. Their innovative solutions have real and significant impact.

Following a pause resulting from the pandemic, the first class of 24 DSSG Fellows at CMU completed six projects in 2022.

Though the projects ranged from reducing the risk of homelessness in Pittsburgh to improving patient care in Pakistani emergency rooms, the approach to each included some common elements.

Among those: The projects are problem-driven. Operational challenges are identified through collaboration with project partners and community members. Project teams work closely with those directly involved with and affected by the problem as they strategize and implement solutions.

Possibly the most important component is using the lens of ethics to approach every issue.

"It's less about ethics as a course or a lecture," Ghani said. Instead, he explained, it's about consistently considering the ethical implications of every decision. "What design choices are we making? What are the possible consequences of those choices downstream in three months or six months?"

Finally, project teams are interdisciplinary. Teams consist of individuals from different backgrounds, including computer science, ML, AI, statistics, math, economics, public policy, sociology, psychology, engineering and physical sciences.

"None of these complex problems can be solved by any discipline alone," Ghani said.
