Censorship’s Implications for Artificial Intelligence

In this webinar, Jeffrey and Molly will discuss a recent paper Molly co-authored with Eddie Yang.

Abstract: While artificial intelligence provides the backbone for many tools people use around the world, recent work has brought attention to the potential biases that may be baked into these algorithms. While most work in this area has focused on the ways in which these tools can exacerbate existing inequalities and discrimination, we bring to light another way in which algorithmic decision-making may be affected by institutional and societal forces. We study how censorship has affected the development of Wikipedia corpora, which are in turn regularly used as training data for NLP algorithms. We show that word embeddings trained on the regularly censored Baidu Baike have very different associations between adjectives and a range of concepts about democracy, freedom, collective action, equality, and people and historical events in China than those trained on its uncensored counterpart, Chinese-language Wikipedia. We examine the origins of these discrepancies using surveys from mainland China, and we assess their implications by studying the use of these embeddings in downstream AI applications.