Baćak, V., & Kennedy, E. H. (2019). Principled machine learning using the super learner: An application to predicting prison violence. Sociological Methods & Research, 48(3), 698-721.
A rapidly growing number of algorithms are available to researchers who apply statistical or machine learning methods to answer social science research questions. The unique advantages and limitations of each algorithm are relatively well known, but it is not possible to know in advance which algorithm is best suited for the particular research question and the data set at hand. Typically, researchers end up choosing, in a largely arbitrary fashion, one or a handful of algorithms. In this article, we present the Super Learner—a powerful new approach to statistical learning that leverages a variety of data-adaptive methods, such as random forests and spline regression, and systematically chooses the one, or a weighted combination of many, that produces the best forecasts. We illustrate the use of the Super Learner by predicting violence among inmates from the 2005 Census of State and Federal Adult Correctional Facilities. Over the past 40 years, mass incarceration has drastically weakened prisons’ capacities to ensure inmate safety, yet we know little about the characteristics of prisons related to inmate victimization. We discuss the value of the Super Learner in social science research and the implications of our findings for understanding prison violence.
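To make the selection step described in the abstract concrete, the following is a minimal sketch in Python. It is not the authors' implementation. It assumes synthetic one-dimensional data, three toy candidate learners (a mean predictor, simple linear regression, and k-nearest neighbors), squared-error loss, and a coarse grid search over convex weights as a simple stand-in for the weight-fitting step. All function and variable names are hypothetical.

```python
import math
import random

# Candidate learners: each takes training data and returns a predict(x) function.
def fit_mean(xs, ys):
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_linear(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda x: my + slope * (x - mx)

def fit_knn(xs, ys, k=5):
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

LEARNERS = {"mean": fit_mean, "linear": fit_linear, "knn": fit_knn}

def cv_predictions(xs, ys, n_folds=5):
    """Return out-of-fold predictions for every candidate learner."""
    n = len(xs)
    preds = {name: [0.0] * n for name in LEARNERS}
    for fold in range(n_folds):
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        for name, fit in LEARNERS.items():
            model = fit(train_x, train_y)
            for i in test_idx:
                preds[name][i] = model(xs[i])
    return preds

def mse(pred, ys):
    return sum((p - y) ** 2 for p, y in zip(pred, ys)) / len(ys)

def super_learner(xs, ys, n_folds=5, steps=10):
    """Pick the best single learner and the best convex blend by CV error."""
    preds = cv_predictions(xs, ys, n_folds)
    names = list(LEARNERS)
    risks = {name: mse(preds[name], ys) for name in names}
    best_single = min(risks, key=risks.get)

    # Assumption: a coarse grid over non-negative weights summing to one
    # stands in for a proper constrained optimizer.
    best_w, best_risk = None, float("inf")
    for a in range(steps + 1):
        for b in range(steps + 1 - a):
            w = (a / steps, b / steps, (steps - a - b) / steps)
            blend = [
                sum(wj * preds[name][i] for wj, name in zip(w, names))
                for i in range(len(ys))
            ]
            risk = mse(blend, ys)
            if risk < best_risk:
                best_w, best_risk = w, risk
    return risks, best_single, dict(zip(names, best_w)), best_risk

# Synthetic, nonlinear data (an assumption made purely for illustration).
random.seed(0)
xs = [random.uniform(0.0, 2.0 * math.pi) for _ in range(200)]
ys = [math.sin(x) + random.gauss(0.0, 0.1) for x in xs]

risks, best_single, weights, blend_risk = super_learner(xs, ys)
print("cross-validated MSE per learner:", risks)
print("best single learner:", best_single)
print("blend weights:", weights)
print("cross-validated MSE of blend:", blend_risk)
```

In this sketch, the learner with the lowest cross-validated error plays the role of "the one" algorithm in the abstract, and the blend weights play the role of the "weighted combination of many." Because the weight grid includes each learner on its own, the blend cannot do worse than the best single learner on the cross-validated criterion. A fuller version would also refit the chosen learners on all of the data before forecasting.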