Text Scaling for Open-Ended Survey Responses

Will Hobbs is a new assistant professor in the Department of Human Development at Cornell. He studies politics and health, especially the social effects of government actions and how small groups of people adapt to sudden changes in their lives. His recent projects have studied the development of public attitudes toward the Affordable Care Act, how social networks heal after a death, and unintended consequences of online censorship in China. His work has been published in the American Political Science Review, the American Journal of Political Science, the Journal of Politics, and the Proceedings of the National Academy of Sciences, among other outlets. It has also been covered in popular media, including The Atlantic, Science Magazine, The New York Times, the Los Angeles Times, and The Wall Street Journal. Before coming to Cornell, he was a postdoctoral fellow at Northeastern University's Network Science Institute. He has a PhD in Political Science from the University of California, San Diego.

Talk: Text Scaling for Open-Ended Survey Responses

Abstract: Open-ended survey responses contain valuable information about public opinion, but they often consist of only a handful of words. This brevity makes them hard to summarize, especially when the texts rely on common words and offer little elaboration. Although several methods are commonly used to analyze these data, their many researcher degrees of freedom limit their use in small samples. This presentation describes a simple method, based on the singular value decomposition, for estimating compact word representations in these contexts. Intuitively, the method scores all words relative to common words, so that we can find variation in common word use even when text responses are not sophisticated. Usefully, the implementation identifies the common words on which its output is based, and we can use these as keywords to interpret the dimensions of the text summaries. In small data sets, a straightforward preprocessing step can bring in information from pretrained word embeddings for better keyword identification. I apply the method to open-ended survey responses on attitudes toward the Affordable Care Act, student loans, disability and unemployment insurance, food stamps, and positive activities in daily life to evaluate whether it produces compact, meaningful text dimensions. Unlike the unsupervised techniques used for comparison, the top dimensions produced by this method are also the best predictors of issue attitudes, vote choice, and, for positive activities in daily life, early death.
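
For readers unfamiliar with how a singular value decomposition can yield word and response scores from short texts, the following is a minimal, generic sketch. It is not the method presented in the talk: it omits the scoring of words relative to common words and the pretrained-embedding preprocessing step, and it simply applies a truncated SVD to a column-centered response-by-word count matrix. The toy responses and the function names `build_doc_term_matrix` and `svd_word_scores` are made up for illustration.

```python
import numpy as np

def build_doc_term_matrix(responses):
    """Return (X, vocab): rows are responses, columns are word counts."""
    tokenized = [r.lower().split() for r in responses]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: j for j, w in enumerate(vocab)}
    X = np.zeros((len(responses), len(vocab)))
    for i, toks in enumerate(tokenized):
        for w in toks:
            X[i, index[w]] += 1.0
    return X, vocab

def svd_word_scores(X, k=2):
    """Generic truncated SVD of the column-centered count matrix.

    Returns word scores (one row per word), response scores (one row
    per response), and the full vector of singular values.
    """
    Xc = X - X.mean(axis=0, keepdims=True)   # center each word's counts
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    word_scores = Vt[:k].T * S[:k]           # words in k dimensions
    doc_scores = U[:, :k] * S[:k]            # responses in k dimensions
    return word_scores, doc_scores, S

# Hypothetical toy responses, invented for this sketch only.
responses = [
    "too expensive",
    "costs too much",
    "helps people",
    "helps sick people",
    "expensive and unfair",
]

X, vocab = build_doc_term_matrix(responses)
word_scores, doc_scores, S = svd_word_scores(X, k=2)

# List the words with the largest absolute score on the first dimension.
order = np.argsort(-np.abs(word_scores[:, 0]))
for j in order[:5]:
    print(f"{vocab[j]:>10s}  {word_scores[j, 0]: .3f}")
```

In a sketch like this, the words with the largest absolute scores on a dimension play the role of keywords for interpreting it, which is loosely analogous to the keyword-based interpretation the abstract describes. The sign of each dimension is arbitrary in an SVD, so only relative positions are meaningful.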