Earlier blog posts discussed the endogeneity problem (see What is instrumental variable? and Endogeneity and use of instrumental variable). Those two posts simply listed the causes of endogeneity in a regression model and introduced the instrumental variable (IV) as a remedy. In this post, I will demonstrate why endogeneity is actually a problem.
Let's recall the causes of endogeneity in our model: first, simultaneous causality bias; second, omitted variable bias; and third, the existence of covariates.
Let's consider a simple regression model.
$$Y = X\beta + e \qquad \text{(i)}$$
Existence of simultaneity
If our model suffers from simultaneity, then a variable on the right-hand side (RHS) of one equation must also appear on the left-hand side (LHS) of another equation, that is,
$$Y = X\beta + e \qquad \text{(i)}$$
$$X = Y\alpha + Z\Gamma \qquad \text{(ii)}$$
Substituting equation (i) into equation (ii) yields
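$$X = (X\beta + e)\alpha + Z\Gamma$$
$$X = X\beta\alpha + Z\Gamma + e\alpha$$
$$X - X\beta\alpha = Z\Gamma + e\alpha$$
$$X(I - \beta\alpha) = Z\Gamma + e\alpha$$
$$X = (Z\Gamma + e\alpha)(I - \beta\alpha)^{-1} \qquad \text{(iii)}$$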
In equation (iii), X depends on e through the term eα, so X is correlated with e whenever α ≠ 0. This means that $E[X^T e] \neq 0$. Hence, the exogeneity assumption that OLS relies on is violated.
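The result in equation (iii) can be checked numerically. The following is a minimal sketch of the scalar case, not a definitive implementation: the parameter values (`beta`, `alpha`, `gamma`), the sample size, and the standard normal draws for `z` and `e` are all assumptions chosen for illustration. It builds x from the reduced form in equation (iii), builds y from equation (i), and compares the sample moment of x·e and the OLS slope with their theoretical counterparts.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical parameter values, chosen only for illustration.
beta, alpha, gamma = 0.5, 0.8, 1.0

z = rng.normal(size=n)  # exogenous variable in equation (ii)
e = rng.normal(size=n)  # error term in equation (i)

# Scalar version of equation (iii): x = (z*gamma + e*alpha) / (1 - beta*alpha)
x = (gamma * z + alpha * e) / (1.0 - beta * alpha)
# Equation (i): y = x*beta + e
y = beta * x + e

# Sample analogue of E[x e]; with var(e) = 1 theory gives alpha / (1 - beta*alpha).
cov_xe = np.mean(x * e)
cov_xe_theory = alpha / (1.0 - beta * alpha)

# OLS slope without intercept (all variables have mean zero by construction).
beta_ols = (x @ y) / (x @ x)
# Large-sample bias cov(x, e) / var(x), using var(z) = var(e) = 1.
plim_bias = alpha * (1.0 - beta * alpha) / (gamma**2 + alpha**2)

print("sample E[x e]:", cov_xe, "theory:", cov_xe_theory)
print("OLS slope:", beta_ols, "true beta:", beta, "beta + bias:", beta + plim_bias)
```

Under these assumptions the OLS slope settles near `beta + plim_bias` rather than near `beta`, and the gap does not shrink as `n` grows.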
Omitted variable bias
Omitted variable bias occurs when we fail to include a relevant variable that is correlated with one or more independent variables in our model. Suppose we specify the regression model as $Y = X\beta + e$ (equation (i)), while the true model is $Y = X\beta + Z\Gamma + e$ (equation (iv)), and we nevertheless estimate equation (i).
We know,
$$\hat{\beta}_{\text{OLS}} = (X^T X)^{-1}(X^T Y)$$
The expectation of the OLS estimator, treating $X$ and $Z$ as fixed (equivalently, conditioning on them), is:
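$$E[\hat{\beta}_{\text{OLS}}] = E\left[(X^T X)^{-1}(X^T Y)\right]$$
$$E[\hat{\beta}_{\text{OLS}}] = E\left[(X^T X)^{-1} X^T (X\beta + Z\Gamma + e)\right]$$
$$E[\hat{\beta}_{\text{OLS}}] = E\left[(X^T X)^{-1}(X^T X)\beta + (X^T X)^{-1} X^T (Z\Gamma + e)\right]$$
$$E[\hat{\beta}_{\text{OLS}}] = E\left[\beta + (X^T X)^{-1} X^T (Z\Gamma + e)\right]$$
$$E[\hat{\beta}_{\text{OLS}}] = E\left[\beta + (X^T X)^{-1}(X^T Z\Gamma + X^T e)\right]$$
$$E[\hat{\beta}_{\text{OLS}}] = E\left[\beta + (X^T X)^{-1}(X^T Z\Gamma) + (X^T X)^{-1}(X^T e)\right]$$
$$E[\hat{\beta}_{\text{OLS}}] = \beta + (X^T X)^{-1}(X^T Z\Gamma) + (X^T X)^{-1} E(X^T e)$$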
Since $E(X^T e) = 0$,
$$E[\hat{\beta}_{\text{OLS}}] = \beta + (X^T X)^{-1}(X^T Z\Gamma) \qquad \text{(v)}$$
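Before taking expectations, the decomposition behind this derivation holds exactly in any single sample: the short-regression estimate equals β plus the omitted-variable term plus the noise term. The sketch below checks that identity with NumPy. The dimensions, coefficient values, and the way `Z` is made to depend on the first column of `X` are assumptions made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Hypothetical design: two included regressors and one omitted regressor.
X = rng.normal(size=(n, 2))
Z = 0.7 * X[:, [0]] + rng.normal(size=(n, 1))  # correlated with X's first column
beta = np.array([1.0, -2.0])   # true coefficients on X (illustrative)
Gamma = np.array([3.0])        # true coefficient on Z (illustrative)
e = rng.normal(size=n)

# True model, equation (iv): Y = X beta + Z Gamma + e
Y = X @ beta + Z @ Gamma + e

# Short regression that omits Z. The explicit inverse mirrors the formula;
# np.linalg.solve would be the numerically preferable choice in practice.
XtX_inv = np.linalg.inv(X.T @ X)
beta_ols = XtX_inv @ X.T @ Y

omitted_term = XtX_inv @ X.T @ (Z @ Gamma)  # (X'X)^{-1} X'Z Gamma
noise_term = XtX_inv @ X.T @ e              # (X'X)^{-1} X'e

identity_holds = np.allclose(beta_ols, beta + omitted_term + noise_term)
print("decomposition holds:", identity_holds)
print("omitted-variable term:", omitted_term)
```

Because `Z` loads on the first column of `X` with weight 0.7 and `Gamma` is 3, the first entry of `omitted_term` should sit near 2.1, while the second entry should sit near zero.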
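Equation (v) provides an interesting conclusion. Here the error term e is exogenous to X, yet $\hat{\beta}_{\text{OLS}}$ is still biased whenever $E[X^T Z] \neq 0$ and $\Gamma \neq 0$. If Z is uncorrelated with X, for example because it varies randomly and independently of X, then $E[X^T Z] = 0$ and the bias term vanishes. Thus, only omitted variables that are correlated with X are a problem; we need not worry about omitted variables that are uncorrelated with X. If X and Z are correlated, then the expected value of the OLS estimator consists of two terms added together: (1) the true coefficient on X and (2) the coefficient from regressing $Z\Gamma$ on X, that is, the part of the effect of Z that X picks up.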
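A small Monte Carlo experiment can illustrate this conclusion. The sketch below is only one possible setup, with a hypothetical helper `mean_short_regression_slope` and made-up parameter values. It uses a scalar x and a scalar omitted variable z = rho·x + noise, so the bias predicted by equation (v) is approximately gamma·rho.

```python
import numpy as np

def mean_short_regression_slope(rho, n=500, reps=2000, beta=1.0, gamma=2.0, seed=0):
    """Average OLS slope of y on x alone when z = rho * x + noise is omitted.

    All names and default values here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    slopes = np.empty(reps)
    for r in range(reps):
        x = rng.normal(size=n)
        z = rho * x + rng.normal(size=n)  # rho controls the x-z correlation
        e = rng.normal(size=n)
        y = beta * x + gamma * z + e      # true model includes z
        slopes[r] = (x @ y) / (x @ x)     # short regression omits z
    return slopes.mean()

# Omitted variable unrelated to x: the average slope should stay near beta = 1.
uncorrelated = mean_short_regression_slope(rho=0.0)
# Omitted variable correlated with x: expect roughly beta + gamma * rho = 2.
correlated = mean_short_regression_slope(rho=0.5)

print("mean slope, uncorrelated z:", uncorrelated)
print("mean slope, correlated z:", correlated)
```

Setting `rho=0.0` corresponds to the harmless case of an uncorrelated omitted variable, while any nonzero `rho` shifts the average estimate away from the true coefficient.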
To be continued...