## Tuesday, December 23, 2014

### The limits of knowledge in economics, Part II

Everyone--and I mean everyone--who does empirical analysis should read Charles Manski's *Identification Problems in the Social Sciences* (I am happy to say I took advanced econometrics from Manski when I was a Ph.D. student at Wisconsin). The book reminds us that we constantly rely on unstated assumptions when we do statistical analysis, and that we need to do a better job of stating them.

The focus of this post involves a simple issue: extrapolation.  Let me show two graphs from Manski's chapter:

Suppose one wanted to infer y from x. As the sample size gets larger, the confidence interval gets smaller, but only over the set of x that we are able to observe. Note that in this instance, however, no x between 4 and 6 is sampled.
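To make the point concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration rather than taken from Manski's figures: the observed support is taken to be [0, 4] and [6, 9], the true relation is an arbitrary line plus noise, and `sample_x`, `ci_half_width`, and `simulate` are hypothetical helper names.

```python
import math
import random
import statistics

def sample_x(rng):
    """Draw x uniformly from the assumed support [0, 4] U [6, 9]; nothing lands in (4, 6)."""
    # The two pieces have lengths 4 and 3, so pick a piece in proportion to its length.
    if rng.random() < 4 / 7:
        return rng.uniform(0, 4)
    return rng.uniform(6, 9)

def ci_half_width(ys):
    """Half-width of a normal-approximation 95% confidence interval for a mean."""
    return 1.96 * statistics.stdev(ys) / math.sqrt(len(ys))

def simulate(n, seed=0):
    """Return the CI half-width for mean y near x=3 and the number of draws in the gap."""
    rng = random.Random(seed)
    xs = [sample_x(rng) for _ in range(n)]
    # Assumed data-generating process: y = 2 + 0.5x + standard normal noise.
    data = [(x, 2.0 + 0.5 * x + rng.gauss(0, 1)) for x in xs]
    near_3 = [y for x, y in data if 2.5 <= x <= 3.5]
    in_gap = [y for x, y in data if 4 < x < 6]
    return ci_half_width(near_3), len(in_gap)

for n in (100, 10_000):
    half_width, gap_count = simulate(n)
    print(n, round(half_width, 3), gap_count)
```

The interval around x = 3 tightens as n grows, while the count of observations between 4 and 6 stays at zero no matter how large the sample gets. More data buys precision only where we have data.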

Most empirical analysis would simply assume that E(y|x=5) is some smooth function that gives weights to E(y|x=4) and E(y|x=6). Put in English, one would just draw some sort of line between the relation of y to x at x=4 and the relation at x=6, and read off a value of y for x=5.

But doing this involves an important assumption: that y doesn't go flying off in one direction or another at x=5. We actually cannot know this, because we have no observations at x=5; indeed, maybe the reason we never observe x=5 is that y is highly unstable at that point. Just as problematic (perhaps more so) is predicting y when x > 9.
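A small sketch of why the data cannot settle the question. The two functions below are hypothetical, as are the observed x values: `smooth` is a straight line, and `spiky` matches it everywhere except inside the gap, where it jumps. Both fit the observed points equally well, yet they disagree badly at x = 5.

```python
def smooth(x):
    """Hypothetical regression function that stays linear everywhere."""
    return 2.0 + 0.5 * x

def spiky(x):
    """Identical to smooth() outside the gap, but spikes inside (4, 6)."""
    # Tent-shaped bump: height 10 at x=5, falling to zero at x=4 and x=6.
    bump = 10.0 * max(0.0, 1.0 - abs(x - 5.0))
    return smooth(x) + bump

# Assumed observed support: nothing strictly between 4 and 6.
observed_x = [0, 1, 2, 3, 4, 6, 7, 8, 9]
observed_y = [smooth(x) for x in observed_x]  # noiseless, to keep the comparison exact

def sum_squared_errors(f):
    """Fit of candidate function f to the observed points only."""
    return sum((f(x) - y) ** 2 for x, y in zip(observed_x, observed_y))

# Drawing a straight line between x=4 and x=6 and reading off x=5.
interpolated = (smooth(4) + smooth(6)) / 2

print(sum_squared_errors(smooth), sum_squared_errors(spiky))  # both 0.0
print(interpolated, smooth(5), spiky(5))                      # 4.5 4.5 14.5
```

The interpolated value is right only if the world looks like `smooth`. Nothing in the observed sample rules out `spiky`; choosing between them is an assumption, not a finding.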

I am pretty sure that it is hard to go a day without reading something in which someone extrapolates outside the support of the observed data. Sometimes it is necessary to do this, but when we do, we should always say so.