In earlier blog posts (see What is instrumental variable? and Endogeneity and use of instrumental variable), we discussed the endogeneity problem. In those two posts, I simply listed the causes of endogeneity in a regression model and introduced IV as a remedial measure. In this blog post, I will demonstrate why endogeneity is actually a problem.
Let's recall the causes of endogeneity in our model: first, simultaneous causality bias; second, omitted variable bias; and third, the existence of covariates.
Let's consider a simple regression model.
$Y = X\beta + e \qquad (i)$
Existence of simultaneity
If our model suffers from simultaneity, then a variable that appears on the RHS of one equation also shows up on the LHS of another equation, that is,
$Y = X\beta + e \qquad (i)$
$X = Y\alpha + Z\Gamma \qquad (ii)$
Substituting equation (i) into equation (ii) yields
$X=(X\beta+e)\alpha+Z\Gamma$
$X=X\beta\alpha+Z\Gamma+e\alpha$
$X-X\beta\alpha = Z\Gamma+e\alpha$
$X = [I - \beta\alpha]^{-1}(Z\Gamma + e\alpha) \qquad (iii)$
In equation (iii), $X$ is correlated with $e$; that is, $E[X^Te]\ne0$. Hence, the fundamental OLS exogeneity assumption is violated.
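To see this correlation in numbers, here is a minimal simulation sketch (the parameter values $\beta=0.5$, $\alpha=0.7$, $\Gamma=1$ and the sample size are my own illustrative choices, not part of the derivation). OLS applied to equation (i) drifts away from the true $\beta$ because $X$ and $e$ move together through the reduced form in equation (iii).

```python
# A minimal simulation sketch of simultaneity bias (illustrative parameter values).
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
beta, alpha, gamma = 0.5, 0.7, 1.0   # assumed true structural parameters

z = rng.normal(size=n)               # exogenous variable Z
e = rng.normal(size=n)               # structural error e

# Reduced form implied by equation (iii): X = (Z*Gamma + e*alpha) / (1 - beta*alpha)
x = (gamma * z + alpha * e) / (1 - beta * alpha)
y = beta * x + e                     # equation (i)

# OLS slope of Y on X (with an intercept)
X = np.column_stack([np.ones(n), x])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0][1]

print(f"true beta  = {beta}")
print(f"OLS beta   = {beta_ols:.3f}")                 # noticeably above 0.5
print(f"corr(X, e) = {np.corrcoef(x, e)[0, 1]:.3f}")  # nonzero, so E[X'e] != 0
```

In this sketch the OLS slope comes out well above the true value of 0.5, precisely because the reduced form makes $X$ and $e$ correlated.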
Omitted variable bias
Omitted variable bias occurs when we fail to include a relevant variable that is correlated with one or more of the independent variables in our model. Suppose we specified the regression model as in equation (i), $Y = X\beta + e$, but the true model is
$Y = X\beta + Z\Gamma + e \qquad (iv)$
Nevertheless, we estimate based on equation (i).
We know,
$\hat\beta_\text{OLS} = (X^TX)^{-1}X^TY$
The expectation of the OLS estimator is:
$E[\hat\beta_\text{OLS}] = E[(X^TX)^{-1}X^TY]$
$E[\hat\beta_\text{OLS}] = E[(X^TX)^{-1}X^T(X\beta + Z\Gamma + e)]$
$E[\hat\beta_\text{OLS}] = E[(X^TX)^{-1}(X^TX)\beta + (X^TX)^{-1}X^T(Z\Gamma + e)]$
$E[\hat\beta_\text{OLS}] = E[\beta + (X^TX)^{-1}X^T(Z\Gamma + e)]$
$E[\hat\beta_\text{OLS}] = E[\beta + (X^TX)^{-1}(X^TZ\Gamma + X^Te)]$
$E[\hat\beta_\text{OLS}] = E[\beta + (X^TX)^{-1}X^TZ\Gamma + (X^TX)^{-1}X^Te]$
$E[\hat\beta_\text{OLS}] = \beta + (X^TX)^{-1}X^TZ\Gamma + (X^TX)^{-1}E[X^Te]$
Since $E[X^Te] = 0$,
$E[\hat\beta_\text{OLS}] = \beta + (X^TX)^{-1}X^TZ\Gamma \qquad (v)$
Equation (v) provides an interesting conclusion. Here $e$, the error term, is exogenous to $X$, yet $\hat\beta_\text{OLS}$ is biased whenever $E[X^TZ]\ne0$. If $Z$ is purely random, i.e. uncorrelated with $X$, then $E[X^TZ]=0$. Thus, only omitted variables that are correlated with $X$ are a problem; we should not worry about omitted variables that are uncorrelated with $X$. If $X$ and $Z$ are correlated, then the OLS estimator is the sum of two terms: (1) the true coefficient on $X$, and (2) the coefficient from regressing $Z\Gamma$ on $X$, which is the bias.
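To make equation (v) concrete, here is another minimal simulation sketch (the coefficient values and the strength of the $X$-$Z$ correlation are illustrative assumptions). The short regression of $Y$ on $X$ alone recovers $\beta$ plus the $(X^TX)^{-1}X^TZ\Gamma$ term, exactly as equation (v) predicts.

```python
# A minimal sketch of omitted variable bias (illustrative values):
# Z is correlated with X, the true model is Y = X*beta + Z*Gamma + e (equation (iv)),
# but we estimate the short regression of Y on X alone (equation (i)).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta, gamma = 1.0, 2.0               # assumed true coefficients

x = rng.normal(size=n)
z = 0.6 * x + rng.normal(size=n)     # Z correlated with X => E[X'Z] != 0
e = rng.normal(size=n)
y = beta * x + gamma * z + e         # true model, equation (iv)

# Estimate the misspecified model (i): Y on X only
X = np.column_stack([np.ones(n), x])
b_short = np.linalg.lstsq(X, y, rcond=None)[0][1]

# Bias term from equation (v): the coefficient from regressing Z*Gamma on X
bias = np.linalg.lstsq(X, z * gamma, rcond=None)[0][1]

print(f"true beta        = {beta}")
print(f"OLS beta (short) = {b_short:.3f}")        # approx beta + bias
print(f"beta + bias term = {beta + bias:.3f}")    # matches equation (v)
```

The two printed estimates agree: the short-regression coefficient is the true $\beta$ plus the bias term, and the bias disappears only if $Z$ is uncorrelated with $X$.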
To be continued...