Exploring Narratives of University Students of Rural Backgrounds in China via Natural Language Processing: Using User Posts on Social Media

The massification of higher education (HE) in China has widened access and brought in a greater number of underprivileged rural students (Levin & Xu, 2005; Mok & Wu, 2016). Both access to HE and students’ experiences within it are critical to examining HE equality (Marginson, 2018b). Rural students’ experiences in HE have attracted increasing attention in recent years, although this student population has often been portrayed as ‘deficient’ or ‘incompetent’ in the existing literature (Cheng, 2018). Such a deficit paradigm largely neglects student agency and the enabling role of HE (Cheng, 2018; Marginson, 2018a). The widening urban-rural divide, massive internal migration, and the massification, increasing competition and stratification of the HE system in China have created a more challenging external environment in which students from underprivileged rural backgrounds must grow and thrive. These conditions call for a better understanding of how these students describe their experiences in HE, which would help to disentangle sociological issues related to HE equality, societal change, and social stratification. This paper explores students’ narrative descriptions of their experiences using large-scale social media data, with the aim of empowering the voices of underprivileged rural students. It also aims to exemplify how Natural Language Processing (NLP) methods can be used effectively to analyse large-scale textual data, yielding a more diverse and complete picture of the student population.