Prediction in Social Science: A Tool to Study Inequality in Populations

Join us at 4 p.m. on Wednesday, March 3, 2021, for a virtual Info Sci Colloquium led by Ian Lundberg, a PhD candidate in Sociology and Social Policy at Princeton. Lundberg will present "Prediction in Social Science: A Tool to Study Inequality in Populations."

Ian Lundberg is a PhD candidate in sociology and social policy at Princeton University. His research explores how computational tools can change the way we study inequality. This work has overturned interpretations of social mobility over multiple generations, visualized intergenerational mobility in new ways, documented the prevalence of housing eviction among children born in large U.S. cities, and measured the predictability of life outcomes through a scientific mass collaboration. The common thread through these projects is a belief that we can produce conceptually precise and meaningful summaries of inequality by deploying technology to ask research questions in new ways. You can read more at ianlundberg.org.

Talk: "Prediction in Social Science: A Tool to Study Inequality in Populations"

Abstract: Predictive algorithms could transform methodology in social science, yet the mapping between prediction and scientific knowledge is not always clear. This talk will address three uses of prediction: (1) predicting outcomes for individual people, (2) predicting unobserved factual outcomes to describe populations, and (3) predicting counterfactual outcomes for causal claims. Even if we cannot predict well for individuals (1), predictive algorithms that are correct on average can support important aggregate claims (2 and 3). An approach to computational social science that emphasizes carefully chosen aggregate claims creates opportunities for engagement in both social science and data science.
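The contrast between uses (1) and (2) can be illustrated with a small simulation. The following is only a sketch and is not taken from the talk: the data-generating process, the group-mean "algorithm," and all names in it are assumptions chosen for illustration. It shows a predictor that explains little individual-level variation yet recovers the population mean closely.

```python
import random
from statistics import fmean

def simulate(n=20000, seed=0):
    """Hypothetical population: one binary predictor x, very noisy outcome y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random() < 0.5
        y = (1.0 if x else 0.0) + rng.gauss(0.0, 3.0)  # noise dominates the signal
        rows.append((x, y))
    return rows

rows = simulate()
train, test = rows[:10000], rows[10000:]

# "Algorithm" (an assumption for illustration): predict the training mean
# of y within each level of x.
mean_by_x = {
    level: fmean([y for x, y in train if x == level])
    for level in (True, False)
}

preds = [mean_by_x[x] for x, _ in test]
truth = [y for _, y in test]

# Use (1): individual-level accuracy, summarized by out-of-sample R^2.
grand_mean = fmean(truth)
ss_res = sum((t - p) ** 2 for t, p in zip(truth, preds))
ss_tot = sum((t - grand_mean) ** 2 for t in truth)
r_squared = 1.0 - ss_res / ss_tot

# Use (2): aggregate accuracy, the error in the estimated population mean.
aggregate_error = abs(fmean(preds) - fmean(truth))

print(f"individual-level R^2: {r_squared:.3f}")
print(f"error in population mean: {aggregate_error:.3f}")
```

In this simulated setting the R^2 is small because most variation is individual noise, while the averaged predictions land close to the true mean because the errors cancel in aggregate.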

These opportunities are especially pronounced for causal claims. Disparities across social categories (e.g., Black and white) are important aggregate quantities. Academics and policymakers would also like to know the degree to which disparities would close under interventions (e.g., expanding college, reducing incarceration, desegregating occupations). Those are causal quantities that I term gap-closing estimands. Learning a gap-closing estimand from data requires causal identification assumptions and an algorithm to predict the outcome each person would realize if exposed to the treatment. Looking forward, the gap-closing estimand is only one of many advances in knowledge that become possible by combining insights from social science and data science to produce new population-level claims about inequality.
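As a rough illustration of the recipe described in the abstract (identification assumptions plus a prediction algorithm), the sketch below estimates a gap under a hypothetical intervention on simulated data. It is not the speaker's implementation. The variable names, the simulated data, and the simple cell-mean predictor are all assumptions; the sketch further assumes that treatment is as good as random within cells defined by category and a single observed confounder.

```python
import random
from collections import defaultdict
from statistics import fmean

def simulate(n=20000, seed=1):
    """Hypothetical data: category g, confounder l, treatment t, outcome y."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        g = int(rng.random() < 0.5)                      # social category (0 or 1)
        l = int(rng.random() < 0.5)                      # observed confounder
        t = int(rng.random() < 0.2 + 0.3 * g + 0.3 * l)  # treatment is unequally distributed
        y = 2.0 * t + 1.0 * l + 0.5 * g + rng.gauss(0.0, 1.0)
        data.append((g, l, t, y))
    return data

def gap(values_by_group):
    """Difference in mean outcome between category 1 and category 0."""
    return fmean(values_by_group[1]) - fmean(values_by_group[0])

data = simulate()

# Factual disparity: compare observed outcomes across categories.
factual = defaultdict(list)
for g, l, t, y in data:
    factual[g].append(y)
factual_gap = gap(factual)

# Prediction step: learn the mean outcome among treated units in each
# (category, confounder) cell. This stands in for any prediction algorithm.
treated = defaultdict(list)
for g, l, t, y in data:
    if t == 1:
        treated[(g, l)].append(y)
predict_if_treated = {cell: fmean(ys) for cell, ys in treated.items()}

# Gap under intervention: predict every person's outcome as if treated,
# then compare category means of those predictions.
counterfactual = defaultdict(list)
for g, l, t, y in data:
    counterfactual[g].append(predict_if_treated[(g, l)])
gap_closing_estimate = gap(counterfactual)

print(f"factual gap: {factual_gap:.2f}")
print(f"gap if everyone were treated: {gap_closing_estimate:.2f}")
```

By construction of this simulation, the true factual gap is 1.1 and the gap that would remain if everyone were treated is 0.5, so the estimates should fall near those values. With real data the result is only as credible as the identification assumptions behind it.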