This paper studies the identification of the coefficients in a linear equation when data on the outcome, covariates, and an error-laden proxy for a latent variable are available. We maintain the classical errors-in-variables assumptions but relax the assumption that the proxy is excluded from the outcome equation. This allows the proxy to affect the outcome directly and permits differential measurement error. We first show that, without the exclusion restriction, the coefficients on the latent variable, the proxy, and the covariates are not identified. We then derive the sharp identification regions for these coefficients under either or both of two auxiliary restrictions. The first restriction weakens the assumption of “no measurement error” by imposing an upper bound on the “noise-to-signal” ratio net of the covariates, i.e., the ratio of the variance of the measurement error to the variance of the latent variable given the covariates. The second restriction weakens the proxy exclusion restriction by specifying whether the latent variable and its proxy affect the outcome in the same or in opposite directions, if at all. Using the College Scorecard data, we apply this framework to study the financial returns to college selectivity and college characteristics. Here, college selectivity, defined as the average SAT score of a student cohort, serves as a proxy for the latent average scholastic ability and is included in the average earnings equation. We obtain an informative upper bound on the return to college selectivity, and this bound becomes smaller upon conditioning on instructional expenditures per student and the completion rate. Further, we obtain tight bounds on the returns to the college characteristics and find that conditioning on the composition of majors reduces the magnitude of the bounds on the effects of some of these characteristics, such as the gender composition.
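As a reading aid, the setup described above can be sketched in equations. The notation here is an assumption made for illustration and is not taken from the paper: $Y$ is the outcome, $X$ the covariates, $U$ the latent variable, $W$ its proxy, $\varepsilon$ the measurement error, $\tilde{U}$ the part of $U$ left after netting out $X$, and $\kappa$ a researcher-chosen bound. The exact form of the paper's assumptions may differ.

```latex
% Illustrative notation only (assumed, not the paper's own symbols).
\begin{align*}
  % Outcome equation: the proxy W is NOT excluded, so \phi may be nonzero.
  Y &= X'\beta + \delta\, U + \phi\, W + \eta, \\
  % Classical measurement error: the proxy equals the latent variable plus noise.
  W &= U + \varepsilon, \\[4pt]
  % Proxy exclusion restriction (relaxed in this setting):
  \phi &= 0, \\[4pt]
  % Auxiliary restriction 1: bounded noise-to-signal ratio net of the covariates
  % (\kappa = 0 would correspond to "no measurement error").
  \frac{\operatorname{Var}(\varepsilon)}{\operatorname{Var}(\tilde{U})} &\le \kappa, \\[4pt]
  % Auxiliary restriction 2: latent variable and proxy act in the same direction
  % (or, alternatively, in opposite directions: \delta\phi \le 0).
  \delta\,\phi &\ge 0 .
\end{align*}
```

In this sketch, setting $\phi = 0$ recovers the usual proxy-excluded errors-in-variables model, while the two auxiliary restrictions are the weaker conditions under which the abstract reports sharp identification regions.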