Tuesday, December 23, 2014

The limits of knowledge in economics, Part II

Everyone--and I mean everyone--who does empirical analysis should read Charles Manski's Identification Problems in the Social Sciences (I am happy to say I took advanced econometrics from Manski when I was a Ph.D. student at Wisconsin).  The book reminds us that we are constantly relying on unstated assumptions when we do statistical analysis, and we need to do a better job of stating them.

The focus of this post involves a simple issue: extrapolation.  Let me show two graphs from Manski's chapter:


Suppose one wanted to infer y based on x.  As the sample size gets larger, the confidence interval gets smaller, but only for the values of x that we are actually able to observe.  Note that in this instance, however, no x between 4 and 6 is sampled.

Most empirical analysis would simply assume that E(y|x=5) is some smooth function that gives weights to E(y|x=4) and E(y|x=6).  Put in English, one would just draw some sort of line between the estimated value of y at x=4 and the estimated value of y at x=6, and read off a value of y for x=5.
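To make the usual practice concrete, here is a minimal sketch of that interpolation in Python. The data and the function names (conditional_mean, interpolate) are made up for illustration and are not from Manski's book; the point is only that the number it produces for x=5 comes entirely from the straight-line assumption, not from any observation at x=5.

```python
from statistics import mean

def conditional_mean(xs, ys, x0):
    """Sample mean of y among observations with x == x0."""
    vals = [y for x, y in zip(xs, ys) if x == x0]
    if not vals:
        raise ValueError(f"no observations at x = {x0}")
    return mean(vals)

def interpolate(xs, ys, x_lo, x_hi, x_new):
    """Linearly weight the estimates of E(y|x_lo) and E(y|x_hi)."""
    w = (x_hi - x_new) / (x_hi - x_lo)  # weight on the lower endpoint
    return (w * conditional_mean(xs, ys, x_lo)
            + (1 - w) * conditional_mean(xs, ys, x_hi))

# Hypothetical sample: nothing is observed at x = 5.
xs = [4, 4, 4, 6, 6, 6]
ys = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]

guess_at_5 = interpolate(xs, ys, 4, 6, 5)
print(guess_at_5)  # 4.0, the midpoint of the two conditional means
```

Asking conditional_mean for x=5 directly raises an error, which is the honest answer: the data alone say nothing about that point.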

But doing this involves an important assumption: that y doesn't go flying off in one direction or another at x=5.  We actually cannot know this, because we have no observations at x=5; indeed, maybe the reason we never observe x=5 is that y is highly unstable at that point.  Just as problematic (perhaps more so) is predicting y when x > 9.
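A small sketch of why the data cannot settle the question, again with made-up functions and a made-up list of observed x values (none of this is from the book). The two candidate relations below agree at every x we observe, so no sample drawn from that support, however large, could tell them apart; they disagree wildly at x=5.

```python
def smooth(x):
    """One candidate for E(y|x): a straight line everywhere."""
    return 2.0 * x

def spiky(x):
    """Another candidate: the same line, except inside the unobserved gap."""
    return 2.0 * x + (100.0 if 4 < x < 6 else 0.0)

# Hypothetical support: no x strictly between 4 and 6, and nothing above 9.
observed_x = [0, 1, 2, 3, 4, 6, 7, 8, 9]

# Both candidates match on every observed x, so the data fit them equally well.
fits_equally = all(smooth(x) == spiky(x) for x in observed_x)

# Yet their predictions at the unobserved point differ.
gap = spiky(5) - smooth(5)

print(fits_equally, gap)
```

Choosing smooth over spiky may well be reasonable, but it is a choice made by the analyst, not something the observations deliver.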

I am pretty sure that it is hard to go a day without reading something that involves someone extrapolating outside the support of observed data.  Sometimes it is necessary to do this, but when we do, we should always say so.

