8.1 Identification of the Model without Heterogeneity
The McCall (1970) Search Model that we introduced as a prototype model is an excellent introduction to the identification of structural models. A classic treatment of identification of this model is provided by Flinn and Heckman (1982).
8.1.1 Data
Identification arguments always begin with an assumption on available data. Even though the model is dynamic, we will show that all we really need is a single cross-section of data:
\[ (E_{n},t_{U,n},W_{n})_{n=0}^{N} \]
where:
\(E_{n}\in\{0,1\}\) indicates the employment status of individual \(n\).
If the individual is employed (\(E_{n}=1\)) we see the wage, \(W_{n}\), which is otherwise assumed to be missing.
If the individual is unemployed (\(E_{n}=0\)), then we see the duration of unemployment \(t_{U,n}\), which is otherwise assumed to be missing.
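To make the data requirement concrete, here is a minimal Julia sketch of such a cross-section. The values are made up for illustration, and the helper `is_valid_record` is a hypothetical name, not part of the course code. Julia's `missing` plays the role of the unobserved entries.

```julia
# Toy cross-section (E_n, t_{U,n}, W_n): hypothetical values for illustration only.
E   = [1, 0, 1, 0]                     # employment status
t_U = [missing, 3, missing, 1]         # unemployment duration, seen only when E == 0
W   = [18.5, missing, 22.0, missing]   # wage, seen only when E == 1

# A record is coherent if exactly the variable relevant to its status is observed.
is_valid_record(e, t, w) =
    e == 1 ? (!ismissing(w) && ismissing(t)) : (ismissing(w) && !ismissing(t))

all_valid = all(is_valid_record.(E, t_U, W))
```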
Example: CPS Data
Example 8.1 (Reading the data) Let’s take a look at data from the CPS on wages, employment status, and labor market transitions. Here is code to read in the data:
As you can see from the preview of the data, the data is taken from January-March 2018. Here is a quick snippet of code to see how many observations we have on average per person:
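```julia
using DataFramesMeta, Statistics  # @chain, @combine, groupby; mean

@chain data begin
    groupby(:CPSIDP)
    @combine :T = length(:EMPSTAT)                              # months observed per person
    @combine :average = mean(:T) :frac_panel = mean(:T .> 1)    # summary across people
end
```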
1×2 DataFrame
 Row │ average  frac_panel
     │ Float64  Float64
─────┼─────────────────────
   1 │ 1.83191    0.582806
So we see that more than half of the individuals in this sample can be found in more than one month of the data.
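The @chain macro comes from the package Chain.jl, which DataFramesMeta re-exports, and is a convenient syntax for composing operations into one block. Each step receives the result of the previous step as its first argument. For example:
```julia
@chain x begin
    func1(y1)
    func2(y2)
    func3(y3)
end
```
is equivalent to
```julia
func3(func2(func1(x, y1), y2), y3)
```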
Example 8.2 (Calculating some moments) You may find the codebook useful for understanding particular variables. We have already limited the data to individuals who are working (EMPSTAT==10), have a job but did not work last week (EMPSTAT==12), or are unemployed (EMPSTAT==21).
Suppose we wanted to use the panel dimension to measure transition rates. A simple way to do that is to measure transitions between January and February.
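```julia
data[!, :E] .= data.EMPSTAT .< 21  # <- code the employment variable

# January observations, with employment status stored as the lagged value
data_jan = @chain data begin
    @subset :MONTH .== 1
    @select :CPSIDP :AGE :SEX :EDUC :RACE :E
    @rename :E_lag = :E
end

# February observations merged with each individual's January record
data_merged = @chain data begin
    @subset :MONTH .== 2
    @select :CPSIDP :E
    innerjoin(data_jan, on = :CPSIDP)
end
```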
41262×7 DataFrame
   Row │ CPSIDP          E      AGE    SEX    EDUC   RACE   E_lag
       │ Int64           Bool   Int64  Int64  Int64  Int64  Bool
───────┼──────────────────────────────────────────────────────────
     1 │ 20161200000201   true     72      1     81    100   true
     2 │ 20180100000301   true     66      1    111    100   true
     3 │ 20180100000302   true     61      2    111    100   true
     4 │ 20170100000901   true     23      2    124    100   true
     5 │ 20170100000902   true     24      2    124    100   true
     6 │ 20170100001001   true     59      2    111    200   true
     7 │ 20170100001002   true     53      1     81    200   true
     8 │ 20171200001201   true     24      2     73    200   true
     9 │ 20161200000801   true     60      1    124    100   true
    10 │ 20161200000802   true     57      2    123    100   true
    11 │ 20170100001401  false     50      2     73    200  false
    12 │ 20170100001403   true     18      1     81    200   true
    13 │ 20170100001405   true     29      1     50    200   true
     ⋮ │       ⋮            ⋮      ⋮      ⋮      ⋮      ⋮      ⋮
 41251 │ 20161107451001   true     59      1     92    100   true
 41252 │ 20161107451901   true     59      1     91    100   true
 41253 │ 20171107237801   true     45      1     91    100   true
 41254 │ 20171107237802   true     37      2    123    100   true
 41255 │ 20171207232201   true     41      1    111    100   true
 41256 │ 20171207232202   true     41      2     73    100   true
 41257 │ 20161107452301   true     38      1     73    100   true
 41258 │ 20161107452302   true     29      2     73    100   true
 41259 │ 20170107445501   true     41      1     73    100   true
 41260 │ 20171207232601   true     42      1    111    100   true
 41261 │ 20171207232602   true     43      2    123    100   true
 41262 │ 20171207232604   true     17      1     60    100   true
                                            41237 rows omitted
So now we can calculate the overall transition rate out of unemployment:
So here we’re estimating a very low separation rate and a pretty high hazard rate out of unemployment.
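Both rates can be computed from `E_lag` and `E` alone. A minimal sketch, using made-up Boolean vectors in place of the columns of `data_merged` (so the numbers it produces are not CPS estimates):

```julia
using Statistics

# Hypothetical stand-ins for the columns data_merged.E_lag and data_merged.E.
E_lag = [true, true, true, true, false, false, true, false]   # employed in January?
E     = [true, true, false, true, true, false, true, true]    # employed in February?

# Hazard out of unemployment: share of the January unemployed who are employed in February.
hazard_hat = mean(E[.!E_lag])

# Separation rate: share of the January employed who are unemployed in February.
separation_hat = mean(.!E[E_lag])
```

With the merged data, the same two lines would use `data_merged.E_lag` and `data_merged.E` in place of the toy vectors.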
Example 8.3 (Observable heterogeneity) Next we’ll define a very simple education classification (Bachelor’s degree or not) and race classification (white vs non-white), and use groupby to calculate rates separately by demographics:
What do these differences in transition rates tell you about how we should extend the simple model with homogeneous parameters?
8.1.2 Steady State
A key assumption for identification in this case is that the economy is in steady state. Let \(U_{t}\) be the fraction of individuals that are unemployed at time \(t\). Let \(p_{\tau,t}\) be the fraction of individuals at time \(t\) with unemployment duration \(\tau\). In general, these objects evolve according to the rules:
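\[\begin{aligned}
U_{t+1} &= (1-h)U_{t} + \delta (1-U_{t}) \\
p_{\tau+1,t+1} &= (1-h)p_{\tau,t} \\
p_{0,t+1} &= \delta (1-U_{t})
\end{aligned}\]
where \(h\) is the “hazard rate” of exiting unemployment, \(\lambda(1-F_{W}(w^*))\). Enforcing that these objects are constant between \(t\) and \(t+1\) (the steady state assumption) gives:
\[\begin{aligned}
U &= \frac{\delta}{\delta+h} \\
p_{\tau} &= (1-h)^{\tau}p_{0} = (1-h)^{\tau}hU
\end{aligned}\]
In particular, durations among the unemployed are geometric, \(\mathbb{P}(t_{U}=\tau|E=0) = p_{\tau}/U = h(1-h)^{\tau}\),
and so the hazard rate \(h\) can be inferred from the distribution of unemployment durations.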
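In steady state, the recursion for \(p_{\tau,t}\) implies a geometric distribution for elapsed durations among the unemployed, \(h(1-h)^{\tau}\), so \(h\) can be estimated by maximum likelihood as \(1/(1+\bar{t}_{U})\), where \(\bar{t}_{U}\) is the mean duration. The sketch below assumes that geometric form and uses made-up durations; `hazard_mle` is a hypothetical helper name.

```julia
using Statistics

# MLE of h when durations follow P(t_U = τ | E = 0) = h(1-h)^τ, τ = 0, 1, 2, ...
hazard_mle(durations) = 1 / (1 + mean(durations))

# Hypothetical unemployment durations (in periods) for the unemployed in the sample.
t_U = [0, 0, 1, 3, 0, 2, 1, 1]
h_hat = hazard_mle(t_U)   # mean duration is 1, so h_hat = 0.5
```

If durations are stored with `missing` for the employed, `collect(skipmissing(t_U))` can be passed instead.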
Next, notice that the probability of being unemployed is
\[ \mathbb{P}(E=0) = \frac{\delta}{\delta+h} \]
and so \(\delta\) is given by \(h\) (which we now know) and the fraction of unemployed.
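Spelling out the inversion: writing \(u = \mathbb{P}(E=0)\) for the unemployment rate (notation introduced here for brevity), the steady-state relationship can be solved for \(\delta\):

\[ u = \frac{\delta}{\delta+h} \quad\Longrightarrow\quad u(\delta+h) = \delta \quad\Longrightarrow\quad \delta = \frac{hu}{1-u} \]

For instance, with the purely illustrative values \(h = 0.5\) and \(u = 0.05\), this gives \(\delta = 0.025/0.95 \approx 0.026\).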
Finally, we see that
\[ \mathbb{P}(W|E=1) = F_{W}(W|W>w^*) \]
Observed wages are distributed according to the offer distribution conditional on the offer being acceptable. This tells us that, although this conditional distribution is identified, we do not know the distribution of wage offers that are never accepted (those below \(w^*\)). Let \(\underline{w}\) be the lower bound on the support of the sampling distribution of wages, \(\mathbb{P}(W|E=1)\). Clearly, \(w^*\) is identified since:
\[ w^* = \underline{w} \]
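In a sample, the natural analogue of \(\underline{w}\) is the smallest observed wage. A minimal sketch with made-up wages follows. In practice, measurement error in reported wages makes the raw minimum fragile, an issue this sketch ignores.

```julia
# Hypothetical accepted wages; `missing` marks individuals who are unemployed.
W = [18.5, missing, 22.0, 12.25, missing, 15.0]

# Sample analogue of the lower bound of the support of observed wages.
w_star_hat = minimum(skipmissing(W))
```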
Can we identify the deeper structural parameters, \(b\), \(\lambda\), and \(\beta\)? Not quite. Let’s rewrite the reservation wage equation as:
\[ w^* = b + \beta h\int_{w^*}\frac{1-F_{W}(w|W>w^*)}{1-\beta(1-\delta)}dw \]
and we can see that infinitely many combinations of \(b\) and \(\beta\) can rationalize the same reservation wage. Assuming a plausible value for \(\beta\) (you will learn that this is common across many structural estimation exercises), \(b\) is identified by this equation.
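Because \(\int_{w^*}[1-F_{W}(w|W>w^*)]dw = \mathbb{E}[W-w^*|W>w^*]\), the integral is just the mean accepted wage minus the reservation wage, so \(b\) can be backed out directly once \(\beta\) is fixed. The sketch below assumes a value of \(\beta\) and uses made-up inputs; `implied_b` is a hypothetical helper name.

```julia
using Statistics

# Solve the reservation wage equation for b, given β and the identified objects:
#   w* = b + β h E[W - w* | W > w*] / (1 - β(1 - δ))
function implied_b(wages, h, δ, β)
    w_star = minimum(wages)            # reservation wage: lowest accepted wage
    surplus = mean(wages) - w_star     # E[W - w* | W > w*]
    return w_star - β * h * surplus / (1 - β * (1 - δ))
end

wages = [10.0, 15.0, 20.0, 35.0]             # hypothetical accepted wages
b_low  = implied_b(wages, 0.5, 0.1, 0.90)    # assumed β = 0.90
b_high = implied_b(wages, 0.5, 0.1, 0.95)    # assumed β = 0.95
```

Each assumed \(\beta\) delivers a different \(b\) that rationalizes the same reservation wage, which is the non-identification result in numerical form.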
8.2 Credibility and a Policy Counterfactual
Suppose we used this model to forecast the effect of a tax credit \(\tau\) that is proportional to earnings. Under the counterfactual the value function \(V\) becomes:
and so we see that the effect of the subsidy in the model is equivalent to reducing the flow utility of unemployment.
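To see why, here is a sketch of the argument, assuming the linear-utility recursions of the prototype model, with \(V(w)\) the value of employment at wage \(w\) and \(V_{U}\) the value of unemployment (this notation is an assumption and may differ from the earlier chapter). With the credit, an employed worker receives \((1+\tau)w\):

\[\begin{aligned}
V(w) &= (1+\tau)w + \beta[(1-\delta)V(w) + \delta V_{U}] \\
V_{U} &= b + \beta\left[V_{U} + \lambda\int\max\{V(w)-V_{U},0\}dF_{W}(w)\right]
\end{aligned}\]

Dividing both equations by \((1+\tau)\) and writing \(\tilde{V} = V/(1+\tau)\) gives

\[\begin{aligned}
\tilde{V}(w) &= w + \beta[(1-\delta)\tilde{V}(w) + \delta \tilde{V}_{U}] \\
\tilde{V}_{U} &= \frac{b}{1+\tau} + \beta\left[\tilde{V}_{U} + \lambda\int\max\{\tilde{V}(w)-\tilde{V}_{U},0\}dF_{W}(w)\right]
\end{aligned}\]

which is the original problem with \(b\) replaced by \(b/(1+\tau)\). The accept/reject decision is unchanged by the rescaling, so the reservation wage under the credit solves the original reservation wage equation at this lower flow value.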
It is simple to forecast the effect of the policy counterfactual: we take our estimates and re-calculate the reservation wage. However, notice that since the reservation wage will decrease in response to the tax credit, our forecast of the effect depends on an arbitrary parametric assumption that we had to make in order to identify the distribution of wages below the reservation wage \(w^*\).
Things look slightly better if we were to impose a tax (\(\tau<0\)), since this would use information about the wage distribution that is directly observable, but the underlying identifying assumption is particularly stark: it says that the policy’s effect can essentially be inferred from the cross-sectional distribution of wages without any existing policy variation to speak of.
Discussion
Does this seem like a credible way to identify the causal impact of the tax subsidy?
Note
One thing to note about this discussion: while the initial identification discussion seems intuitive and reasonable, it takes on a new light once we arrive at the specific research question of interest. Conceptually it is important to view the strengths and weaknesses of identification through the lens of desired counterfactuals.
8.3 Identification of the model with an exclusion restriction
Suppose that we have access to a variable \(Z\) that enters the model only by moving the flow utility from unemployment. We can re-write the reservation wage equation as:
\[ w^*(z) = b(z) + \frac{\beta\lambda}{1-\beta(1-\delta)}\int_{w^*(z)}[1-F_{W}(w)]dw \]
We’ll refer to \(Z\) as an excluded variable because it affects selection into and out of jobs without moving the distribution of offered wages. Let \(\underline{w}(z)\) be the lower bound on the support of the conditional sampling distribution of wages \(\mathbb{P}(W|E=1,Z)\) and let \(\underline{w}_{F}\) be the lower bound on the support of the offer distribution \(F_{W}\).
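As before, we know that \(w^*(z) = \underline{w}(z)\) whenever the reservation wage lies inside the support of wage offers. Now suppose there is sufficient movement in \(Z\) such that the set
\[ \underline{\mathcal{Z}} = \{z : w^*(z) \leq \underline{w}_{F}\} \]
is non-empty. For \(z\) in this set every offer is accepted, so \(\underline{w}(z)\) attains its minimum, \(\underline{w}_{F}\).
The unconditional distribution \(F_{W}\) is then non-parametrically identified by simply looking at the distribution of wages for values of \(Z\) where the lower bound of the support achieves this minimum:
\[ F_{W}(w) = \mathbb{P}(W \leq w | E = 1, z),\ z \in \underline{\mathcal{Z}} \]
Identification of all other parameters can be pursued non-parametrically as a function of \(z\).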
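In a sample, this argument suggests a simple recipe: compute the minimum observed wage for each value of \(Z\), find the values of \(Z\) at which that minimum is lowest, and use the empirical distribution of wages there as an estimate of \(F_{W}\). A minimal sketch under the support assumption, with made-up data and hypothetical names (`wages_by_z`, `ecdf_at`):

```julia
using Statistics

# Hypothetical accepted wages grouped by the value of the excluded variable Z.
wages_by_z = Dict(
    0 => [8.0, 9.5, 12.0, 15.0, 21.0],    # low b(z): reservation wage does not bind
    1 => [8.0, 10.0, 13.0, 16.0, 20.0],   # reservation wage still does not bind
    2 => [11.0, 12.5, 14.0, 18.0, 22.0],  # high b(z): offers below 11 are rejected
)

# Lower bound of observed wages for each z, and its minimum across z.
lower_bound = Dict(z => minimum(w) for (z, w) in wages_by_z)
w_min = minimum(values(lower_bound))

# Values of z whose lower bound attains the minimum.
z_set = sort([z for (z, lb) in lower_bound if lb == w_min])

# Pool wages over those z and evaluate the empirical CDF as an estimate of F_W.
pooled = reduce(vcat, [wages_by_z[z] for z in z_set])
ecdf_at(w) = mean(pooled .<= w)
```

The estimate is only as good as the support assumption: if the reservation wage binds at every observed \(z\), the pooled sample is still truncated from below.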
Note
Forecasting the effect of the policy now makes use of articulated variation. The model casts variation that scales wages as being equivalent to variation in the value of unemployment \(b\), for which we now have a source. This means that we can recreate the experiment by interpolating existing variation in \(b\).
8.3.1 Further identification by functional form
Suppose that \(b(z)\) takes a linear form:
\[ b(z) = b_0 + b_1 z. \]
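In this case, for three values (\(z,z',z''\)) we can write:
\[\begin{aligned}
w^*(z') - w^*(z) &= b_1(z'-z) + \frac{\beta}{1-\beta(1-\delta)}\left[A(z') - A(z)\right] \\
w^*(z'') - w^*(z) &= b_1(z''-z) + \frac{\beta}{1-\beta(1-\delta)}\left[A(z'') - A(z)\right]
\end{aligned}\]
where \(A(z) = \lambda\int_{w^*(z)}[1-F_{W}(w)]dw\) is already identified. These two equations are sufficient to identify the two unknowns, \(\beta\) and \(b_1\). Thus, \(\beta\) is identified by additional functional form restrictions.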
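Differencing the reservation wage equation across values of \(z\) removes \(b_0\) and leaves a system that is linear in \(b_1\) and \(\kappa = \beta/(1-\beta(1-\delta))\); \(\beta\) is then recovered as \(\kappa/(1+\kappa(1-\delta))\). The sketch below assumes that \(w^*(z)\), \(\delta\), and \(A(z) = \lambda\int_{w^*(z)}[1-F_{W}(w)]dw\) have already been identified. The symbols \(\kappa\) and \(A\), the function name, and all numbers are introduced here for illustration; the inputs check the algebra only and are not generated by solving the model.

```julia
using LinearAlgebra

# Recover (b_1, β) from three points (z, w*(z), A(z)), given δ.
function solve_b1_beta(z, w_star, A, δ)
    X = [z[2]-z[1]  A[2]-A[1];
         z[3]-z[1]  A[3]-A[1]]
    y = [w_star[2]-w_star[1], w_star[3]-w_star[1]]
    b1, κ = X \ y                      # two linear equations, two unknowns
    β = κ / (1 + κ * (1 - δ))          # invert κ = β / (1 - β(1 - δ))
    return b1, β
end

# Hypothetical inputs built from known parameters, to check the inversion.
δ, β_true, b0, b1_true = 0.1, 0.9, 1.0, 2.0
κ_true = β_true / (1 - β_true * (1 - δ))
z = [0.0, 1.0, 2.0]
A = [3.0, 2.0, 1.5]
w_star = b0 .+ b1_true .* z .+ κ_true .* A
b1_hat, β_hat = solve_b1_beta(z, w_star, A, δ)
```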
Discussion
Notice that now the policy counterfactual is given by additional parameters that attempt to match how wages and hazard rates respond to existing policy variation.
How do you feel about this approach relative to the case without variation?
How do you feel about the model’s key implication that variation in \(b\) is equivalent to variation in \(\tau\)?
What would be an alternative and potentially preferable source of variation?
8.3.2 Identification with known scale restrictions
There are cases where it would be reasonable to assume a known relationship \(b(z)\). For example, \(Z\) might be a randomly assigned policy variable that measures the generosity of welfare payments. To be consistent with our specification of utility while employed, we should then write:
\[ b(z) = b_0 + z \]
and here \(\beta\) would be identified.
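To see why, difference the reservation wage equation across two values \(z\) and \(z'\). Using the shorthand \(A(z) = \lambda\int_{w^*(z)}[1-F_{W}(w)]dw\) (introduced here; it is identified once \(F_{W}\), \(\lambda\), and \(w^*(z)\) are), the unknown \(b_0\) drops out and the coefficient on \(z\) is known to be one:

\[ w^*(z') - w^*(z) = (z'-z) + \frac{\beta}{1-\beta(1-\delta)}\left[A(z') - A(z)\right] \]

Every term other than \(\beta\) is identified, so a single pair of values of \(z\) pins \(\beta\) down.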
Note
Not only is \(\beta\) identified in this case, but we can see that it is a key parameter for matching the effect of \(Z\), and hence will be a key parameter in determining the response of reservation wages and hazard rates to the policy.
8.3.3 The Whether vs How of Identification
Let’s now assume that we have access to two instruments, \((Z_{1},Z_{2})\). The former shifts \(b\), as in the previous example, while the latter shifts wages. \(Z_{2}\) could be a plausible instrument for labor demand, or if we wanted to think of \(W\) as net wages, \(Z_{2}\) could represent a policy variable that sets marginal tax rates. Let’s assume that:
Fixing \(Z_{2}\), let \(Z_{1}\) have the same support conditions as in the previous section, implying that \(F_{W}(W|Z_2)\) is identified for all \(Z_{2}\) in an appropriately defined subset of its support. Notice that the location of the distribution \(F_{\omega}\) and the location of \(\mu\) are not separately identified. Hence, we are free to normalize \(\mathbb{E}[\omega] = 0\) such that \(\log(\mu(Z_2)) = \mathbb{E}[\log(W)|Z_2]\), and hence \(\mu(z_2)\) is non-parametrically identified by the conditional mean of \(\log(W)\) when \(Z_{1}\) is in the region of its support where \(\omega^*(Z_1,Z_2)\leq \underline{\omega}\).
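One specification consistent with this discussion, stated here as an assumption since the text only describes it verbally, is a multiplicative wage shifter with \(b = b(Z_{1})\):

\[ \log(W) = \log(\mu(Z_{2})) + \omega,\qquad \omega \sim F_{\omega},\quad \omega \perp (Z_{1},Z_{2}) \]

Under this specification an offer is accepted when \(\omega > \omega^*(z_1,z_2) = \log(w^*(z_1,z_2)) - \log(\mu(z_2))\). Dividing the reservation wage equation by \(\mu(z_2)\) and changing variables to \(x = w/\mu(z_2)\) gives

\[ \frac{w^*}{\mu(z_2)} = \frac{b(z_1)}{\mu(z_2)} + \frac{\beta\lambda}{1-\beta(1-\delta)}\int_{w^*/\mu(z_2)}\left[1-F_{\omega}(\log x)\right]dx \]

so \(\omega^*\) depends on \((z_1,z_2)\) only through the ratio \(b(z_1)/\mu(z_2)\).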
In order to introduce the whether versus the how of identification, let us begin with the observation that the model is over-identified: we only need a subset of the available data to identify the model. For example, note that hazard rates out of unemployment are:
\[ h(z_1,z_2) = \lambda\left[1-F_{\omega}(\omega^*(z_1,z_2))\right] \]
As we now know, the hazard rate \(h(z_1,z_2)\) can be identified directly by the distribution of durations, conditional on \(z_1,z_2\). But notice that \(h(z'_1,z'_2)/h(z_1,z_2) = (1-F_\omega(\omega^*(z'_1,z'_2)))/(1-F_{\omega}(\omega^*(z_1,z_2)))\), where the right-hand side consists of objects that are directly identified from wage data.
Now you can go one of two ways: you can enrich the model based on the observation that you have many “spare” features of the data that could be used to identify this extra richness. Or you could decide that your model is sufficiently detailed to answer your question of interest (shout out Marschak (1953)), and choose the estimation approach that makes your answer most credible. Since the model is undoubtedly misspecified, you will have to make peace with the idea that your model will not match every feature of the data simultaneously. This is the how of identification: which features of the data will you use to identify and estimate your model to make the answer to your question of interest most credible?
The “how” of identification
When your model is over-identified, which features of the data will you use to identify and estimate your model?
Here are two potential tensions from our example:
As we discussed above: the shape of the hazard function is identified directly from wage data. Hazard rates are important because they determine unemployment rates, and their responsiveness to \(Z_1\) and \(Z_2\) determines how unemployment rates shift with these variables.
The reservation wage equation above shows that the model treats changes in \(Z_1\) and \(Z_2\) through the same mechanism. The model requires that variation in these two variables have an identical effect on reservation wages (and hence hazard rates). This is a strong restriction that would likely not hold in extended versions of the model. Would we want identification to rely on this property if we didn’t have to?
How should we approach these tensions? Just as Marschak (1953) did, let’s return to the question of interest. We want to evaluate a policy counterfactual that varies marginal tax rates. Wouldn’t this be most convincing if we show that the model parameters have been chosen to replicate the effects of pre-existing variation that is functionally identical? Some options along these lines:
[Minimum Distance] Choose structural parameters to match — along with more fundamental moments — moments based on the joint distribution of unemployment durations and \(z_2\).
[Indirect Inference] Compute some quasi-experimental estimates of the effect of \(Z_2\) on wages and hazard rates, and choose structural parameters so that the model reproduces these estimates.
[Maximum Likelihood] Maximize the log-likelihood, and validate ex-post that the model can fit observed changes in the hazard rate with respect to \(Z_2\), or replicate the results of a quasi-experimental study. This would show that even though the model is over-identified, it is still using the sources of variation that we view conceptually as being most appropriate.
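As a deliberately stripped-down illustration of the minimum distance idea, the sketch below recovers \((h,\delta)\) in the homogeneous model by matching two moments: the unemployment rate \(\delta/(\delta+h)\) and the mean elapsed duration \((1-h)/h\) implied by geometric durations. The moment choice, the identity weighting matrix, the grid search, and the data moments are all assumptions made for illustration; they are not the estimator proposed in the text.

```julia
# Model-implied moments: unemployment rate and mean unemployment duration.
model_moments(h, δ) = [δ / (δ + h), (1 - h) / h]

# Minimum distance criterion with an identity weighting matrix.
distance(h, δ, m_data) = sum(abs2, m_data .- model_moments(h, δ))

# Hypothetical data moments, generated here from h = 0.4 and δ = 0.02.
m_data = model_moments(0.4, 0.02)

# Crude grid search over (h, δ); a real application would use a numerical optimizer.
grid_h = 0.05:0.05:0.95
grid_δ = 0.005:0.005:0.1
candidates = [(h, δ) for h in grid_h for δ in grid_δ]
h_hat, δ_hat = argmin(p -> distance(p[1], p[2], m_data), candidates)
```

Adding moments that condition on \(z_2\) to `model_moments` is what would tie the estimates to the variation of interest.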
In the next section of this course, we will review the asymptotic properties of these estimators and discuss practical issues related to implementation.
Discussion
How do you think this approach compares to the case where we did not have plausible exogenous variation in wages?
How do you think it compares to the case where we had exogenous variation in tax rates explicitly (rather than interpolating via articulated variation, in which wage rates and marginal tax rates have identical effects)?
8.4 Identification with unobserved heterogeneity
Now suppose that we allow for \(K\) unobserved types with population proportions given by \(\pi=\{\pi_{k}\}_{k=1}^{K}\) that enter the model through the value of unemployment. Assume that we have an instrument \(Z\) that shifts the flow value of unemployment in a known way:
\[ b_{k}(Z) = b_{k} + Z \]
such that we get \(K\) latent reservation wages:
\[ w^*_{k}(z) = b_{k} + Z + \frac{\beta\lambda}{1-\beta(1-\delta)}\int_{w^*_{k}(z)}[1-F_{W}(w)]dw \]
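This gives a vector of latent hazard rates \(h_{k}(z)\) for all values of the instrument \(z\). The distribution of unemployment durations is itself a mixture:
\[ \mathbb{P}(t_{U}=t|E=0,z) = \sum_{k=1}^{K}\tilde{\pi}_{k}h_{k}(z)(1-h_{k}(z))^{t} \]
where \(\tilde{\pi}_{k}\) is the share of type \(k\) among the unemployed at the given \(z\).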
Heckman and Singer (1984) show that, for a fixed \(K\), the parameters \((\tilde{\pi}_{k},h_{k}(z))\) are identified from this distribution of durations.
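To fix ideas, the sketch below evaluates a finite-mixture duration distribution and its log-likelihood at a given \(z\), assuming each type's elapsed duration is geometric with its own hazard, \(h_{k}(1-h_{k})^{t}\), and that the weights are the type shares among the unemployed. The function names and numbers are hypothetical.

```julia
# Mixture pmf of unemployment duration t: sum over k of share_k * h_k * (1 - h_k)^t.
mixture_pmf(t, share, h) = sum(share .* h .* (1 .- h) .^ t)

# Log-likelihood of observed durations given type shares and type-specific hazards.
loglik(durations, share, h) = sum(log(mixture_pmf(t, share, h)) for t in durations)

share = [0.3, 0.7]    # hypothetical shares of each type among the unemployed
h = [0.6, 0.2]        # hypothetical type-specific hazards at a given z
p0 = mixture_pmf(0, share, h)
ll = loglik([0, 1, 4, 2], share, h)
```

Maximizing `loglik` over the shares and hazards for a fixed number of types is the sample counterpart of the identification result.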
Exercise
Exercise 8.1 Complete the identification argument for this model by following these steps:
Argue that the vector \(\pi\) and \(\delta\) can be inverted from \(\tilde{\pi}\) and the unemployment rate \(U(z)\).
State an assumption on the support of \(Z\) such that \(F_{W}\) is identified in some part of the support of \(Z\).
Argue that \(\lambda\) is identified for \(Z\) in this same region of the support.
Argue that each \(w_{k}^*(z)\) is identified from \(h_{k}(z)\) and \(F_{W}\).
Use the reservation wage equation to argue that each \(b_{k}\) and \(\beta\) is identified.
Flinn, Christopher, and James Heckman. 1982. “New Methods for Analyzing Structural Models of Labor Force Dynamics.” Journal of Econometrics 18 (1): 115–68.
Heckman, James, and Burton Singer. 1984. “A Method for Minimizing the Impact of Distributional Assumptions in Econometric Models for Duration Data.” Econometrica: Journal of the Econometric Society, 271–320.
Marschak, Jacob. 1953. “Economic Measurements for Policy and Prediction.” In Studies in Econometric Method, edited by W. Hood and C. Koopmans. John Wiley & Sons.
McCall, John Joseph. 1970. “Economics of Information and Job Search.” The Quarterly Journal of Economics 84 (1): 113–26.