From Job Descriptions to Occupations: Using Neural Language Models to Code Job Data


See more information at https://metrics-and-models.github.io/!

Occupation is a fundamental concept in social and policy research, but classifying job descriptions into occupational categories is challenging and error-prone. Traditionally, this work relied on expert manual coding, translating detailed and often ambiguous job descriptions into standardized categories, a process that is both laborious and costly. Recent advances in computational techniques, however, offer efficient automated alternatives. Existing autocoding tools, including the O*NET-SOC AutoCoder, the NIOCCS AutoCoder, and the SOCcer AutoCoder, rely on supervised machine learning methods and string-matching algorithms; they are not designed to understand the semantic meaning of occupational write-in text. We explore the use of Large Language Models (LLMs) for classifying jobs into standard Census occupations. We evaluate and compare the prediction performance of LLMs under four approaches: zero-shot learning, few-shot learning, chain-of-thought prompting, and fine-tuning. The results show a wide range of autocoding accuracy, from 7.1% to 78%. Drawing on Census expert coding practices, we provide practical recommendations for using LLMs in occupational classification for sociological research, and we demonstrate LLM applications for coding resume data, processing survey occupational write-ins, and converting international occupational classifications to U.S. standards.
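As a concrete illustration of the first of the four approaches compared above, the following Python sketch builds a zero-shot classification prompt for a single job record. The prompt wording and the `build_zero_shot_prompt` helper are illustrative assumptions for exposition, not the authors' actual prompts; in practice the resulting string would be sent to an LLM API.

```python
def build_zero_shot_prompt(job_title: str, job_description: str) -> str:
    """Build a zero-shot prompt asking an LLM for a Census occupation code.

    Zero-shot means the prompt contains only the task instruction and the
    job record itself -- no labeled examples (few-shot) and no request for
    intermediate reasoning (chain-of-thought).
    """
    return (
        "You are an expert occupational coder. Assign the job below to a "
        "standard Census occupation code. Respond with the code only.\n\n"
        f"Job title: {job_title}\n"
        f"Job description: {job_description}\n"
        "Occupation code:"
    )

# Example job record (hypothetical data for illustration).
prompt = build_zero_shot_prompt(
    "Registered Nurse",
    "Provides patient care, administers medication, coordinates with physicians.",
)
print(prompt)
```

The few-shot variant would prepend several (description, code) pairs before the target record, and the chain-of-thought variant would instead instruct the model to explain its reasoning before emitting a code.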