Causal Representation Learning with Generative Artificial Intelligence: Application to Images and Texts as Treatments

In this talk, we demonstrate how to enhance the validity of causal inference with unstructured, high-dimensional treatments such as images and texts by leveraging generative artificial intelligence. Specifically, we propose using a deep generative model, such as a large language model (LLM), to efficiently generate treatments and using its internal representation for subsequent causal effect estimation. We show that knowledge of this true internal representation helps disentangle the treatment features of interest, such as specific sentiments and certain topics of texts, from other, possibly unknown, confounding features. Unlike existing methods, our approach eliminates the need to learn a causal representation from the data and hence produces more accurate and efficient estimates. We formally establish the conditions required for nonparametric identification of the average treatment effect, propose an estimation strategy that avoids violating the overlap assumption, and derive the asymptotic properties of the proposed estimator through the application of double machine learning. Finally, using an instrumental variables approach, we extend the methodology to settings in which the treatment feature is based on human perception rather than assumed to be fixed given the treatment object. The methodology is also applicable to text or image reuse, where an LLM is used to regenerate existing texts and images. We conduct simulation and empirical studies, using text data generated by an open-source LLM, Llama 3, to illustrate the advantages of our estimator over state-of-the-art causal representation learning algorithms. The paper is available at imai.fas.harvard.edu/research/LLM.html

Date: 7 February 2025, 14:15
Venue: Manor Road Building, Manor Road OX1 3UQ

Details: Seminar Room C
Speaker: Kosuke Imai (Harvard University)
Organising department: Department of Economics
Part of: Nuffield Econometrics Seminar
Booking required?: Not required
Audience: Members of the University only
Editor: Edward Valenzano