On today’s show, Kyle speaks with Keyon Vafa, a computer science Ph.D. student at Columbia University. Keyon’s research interest lies broadly in developing machine learning models that are applicable to the social sciences. He discusses his latest research on a machine learning model for career prediction.
Keyon started by discussing the challenge of obtaining large datasets in the social sciences. He mentioned using two popular longitudinal survey datasets: the National Longitudinal Survey of Youth (NLSY) and the Panel Study of Income Dynamics (PSID). Both, however, were still too small on their own for building a predictive model.
To work around this, Keyon incorporated a dataset containing the resumes of about 25 million American workers. He also shared how he preprocessed the data into a cleaner format. He then explained how he used transfer learning: the model first learns from the large resume dataset and is then used to make predictions on the smaller survey datasets.
Keyon’s machine learning model, called CAREER, was inspired by the Transformer architecture that is popular in NLP applications. He shared the tweaks that made the transformer model suitable for job prediction. Keyon also mentioned the possibility of using Markov models and bag-of-jobs models (similar to bag-of-words). He explained that CAREER learns not only a representation of individual jobs but also a representation of entire job histories, which enables it to group workers with similar histories.
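To make the two simpler alternatives mentioned above concrete, here is a minimal Python sketch of a bag-of-jobs representation and a first-order Markov next-job model. It is an illustration only and not code from Keyon's work: the job titles, the toy `histories` list, and the function names `bag_of_jobs`, `fit_markov`, and `next_job_probs` are all hypothetical.

```python
from collections import Counter, defaultdict

# Toy, made-up career histories (one list of job titles per worker).
histories = [
    ["cashier", "sales associate", "store manager"],
    ["cashier", "sales associate", "sales associate"],
    ["intern", "software engineer", "engineering manager"],
    ["intern", "software engineer", "software engineer"],
]


def bag_of_jobs(history):
    """Order-free representation: how often each job appears in a history."""
    return Counter(history)


def fit_markov(histories):
    """Count first-order transitions (previous job -> next job)."""
    counts = defaultdict(Counter)
    for history in histories:
        for prev_job, next_job in zip(history, history[1:]):
            counts[prev_job][next_job] += 1
    return counts


def next_job_probs(counts, current_job):
    """Turn transition counts into next-job probabilities for one job."""
    transitions = counts.get(current_job, Counter())
    total = sum(transitions.values())
    if total == 0:
        return {}  # job never seen as a starting point
    return {job: n / total for job, n in transitions.items()}


markov_counts = fit_markov(histories)
print(bag_of_jobs(histories[1]))
print(next_job_probs(markov_counts, "sales associate"))
```

In this sketch, the Markov model looks only at the most recent job, and the bag-of-jobs representation ignores the order of jobs. A model that learns a representation of the whole history, as described for CAREER, can in principle use both the order and the full sequence.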
Keyon then discussed the evaluation metric for his model: how it works and why it fits his application well. He also explained the conditions under which the model makes its best possible predictions. Wrapping up, he touched on other projects he is working on. You can follow Keyon on Twitter @keyonV.
I am a computer science PhD student at Columbia University, where I am advised by David Blei. My research develops machine learning methodology to uncover insights into human behavior in labor economics and political science, among other fields in the social sciences. Methodologically, I focus on NLP, probabilistic modeling, and causal inference.