Repeat Sales Price Indexes
Repeat sales indexes are estimated from data in which every unit has sold at least twice. Such data allow us to estimate the percentage growth in sales prices over time. In their pure form, these are time-series indexes: they do not provide information on the value of individual house characteristics or on price levels. They have the advantage of being based on actual transaction prices, and in principle they allow us to sidestep the problem of omitted variable bias. However, units that sell are not necessarily representative of all units. It can also be difficult to tell whether a unit retains the same characteristics over time. For example, remodeling could change a house’s characteristics.
The best way to understand how repeat sales indexes work is to look at an example. Figure 2.15 shows a graph of 17 properties that sold twice in the Shorewood Hills neighborhood of Madison, Wisconsin, in the late 1980s and early 1990s. Each property is numbered from 1 to 17, and each property appears twice. The Y-axis is the logarithm of the selling price of the unit.
We can think of the repeat sales estimator as an attempt to measure the average slope of the lines in Figure 2.15, year by year. In a classic paper, Bailey, Muth, and Nourse (1963) illustrated how to compute this using regression methods and a larger sample.
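As a minimal sketch of the "slope of a line" idea, the snippet below computes the per-year change in log price for individual sale pairs and then averages those slopes. The prices and years are hypothetical values made up for illustration, not the Shorewood Hills data, and the function name `annual_log_growth` is likewise an assumption. This simple average covers each unit's whole holding period; it does not separate growth year by year, which is what the regression approach is for.

```python
import math

def annual_log_growth(price_first, year_first, price_second, year_second):
    """Slope of the line joining two sales of one unit on a log-price axis."""
    return (math.log(price_second) - math.log(price_first)) / (year_second - year_first)

# Hypothetical sale pairs: (first price, first year, second price, second year).
pairs = [
    (100_000, 1987, 121_000, 1989),
    (150_000, 1988, 165_000, 1990),
]

# One slope per property, then a simple unweighted average across properties.
slopes = [annual_log_growth(*pair) for pair in pairs]
average_slope = sum(slopes) / len(slopes)
```

A slope here is a difference in logs per year, so it approximates the annual percentage growth rate when that rate is small.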
One way to motivate the actual technique used to construct the repeat sales index is to start by reconsidering the hedonic model. Consider a simple semilog hedonic equation
ln P = Xb + b1D1 + b2D2 + b3D3 + b4D4
where P is the value or rent of the unit, the vector X includes all the relevant characteristics, including a constant term, and the time dummies Di represent the periods that follow the initial base period.
The vector X represents a list of housing and neighborhood characteristics that would enter a hedonic equation. The vector D is a series of dummy variables representing the time periods under consideration. These could be months, quarters, or years, depending upon the type of data at hand.
Consider a house, “A,” that sells in periods 2 and 4 (period 0 is the base period). In period 2, we have:
ln PA2 = Xb + b1D1 + b2D2 + b3D3 + b4D4
= Xb + b2D2
since D1 = D3 = D4 = 0 in period 2. By similar reasoning, in period 4:
ln PA4 = Xb + b4D4
Then, by subtraction, we find:
ln PA4 - ln PA2 = Xb + b4D4 - Xb - b2D2
= b4D4 - b2D2
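Because D4 = 1 at the second sale and D2 = 1 at the first, the difference in log prices is simply b4 - b2. This is for a representative housing unit that sells twice. Given a sample of such units, we want, in effect, the “average” b4 and b2. (Recall that regression is, in effect, estimating a series of conditional means.) By subtraction, the characteristics vector drops out, provided the unit's characteristics and their coefficients are unchanged between the two sales, as do the dummy variables for periods in which no transaction takes place.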
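The subtraction above suggests a regression of the difference in log prices on period dummies coded -1 for the period of the first sale, +1 for the period of the second sale, and 0 otherwise, with the base period omitted. The sketch below illustrates that idea using NumPy's least-squares solver. It is an unweighted illustration under stated assumptions, not any published index's method: the function name `repeat_sales_index`, the period coding, and the simulated prices are all hypothetical.

```python
import numpy as np

def repeat_sales_index(first_period, second_period, first_price, second_price, n_periods):
    """Estimate b_1..b_n (with b_0 = 0 for the base period) by OLS on sale pairs."""
    first_period = np.asarray(first_period)
    second_period = np.asarray(second_period)
    # Dependent variable: ln P(second sale) - ln P(first sale) for each unit.
    y = np.log(np.asarray(second_price, dtype=float)) - np.log(np.asarray(first_price, dtype=float))
    n_units = len(y)
    # One column per period 0..n_periods: -1 at the first sale, +1 at the second.
    D = np.zeros((n_units, n_periods + 1))
    rows = np.arange(n_units)
    D[rows, first_period] = -1.0
    D[rows, second_period] = 1.0
    # Drop the base-period column so the remaining coefficients are relative to period 0.
    D = D[:, 1:]
    b, *_ = np.linalg.lstsq(D, y, rcond=None)
    return b

# Hypothetical data: six units, periods 0..4, prices generated from known coefficients.
true_b = np.array([0.0, 0.05, 0.10, 0.12, 0.20])
first_period = np.array([0, 1, 2, 0, 1, 3])
second_period = np.array([2, 3, 4, 4, 2, 4])
unit_value = np.array([100_000, 150_000, 90_000, 200_000, 120_000, 175_000], dtype=float)
first_price = unit_value * np.exp(true_b[first_period])
second_price = unit_value * np.exp(true_b[second_period])

b_hat = repeat_sales_index(first_period, second_period, first_price, second_price, n_periods=4)
# Convert log coefficients to an index with the base period set to 100.
index = 100 * np.exp(np.concatenate([[0.0], b_hat]))
```

Each unit's own value level (`unit_value` here, standing in for Xb) cancels in the difference, so only the period coefficients are estimated. Every period must be linked to the base period through some chain of sale pairs, or the coefficients are not identified.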
Case-Shiller actually does a better job of keeping quality constant than OFHEO, but OFHEO has more coverage (I think). I will explain why in another post.