6 How and Why to Use Models
Before we dive into methods, it will be helpful to review a number of important use cases for quantitative economic models. Why use a model? This might seem like a strange question to devote time to, given that there is really no such thing as economics without models. Still, when you are out in the world presenting your research, you may sometimes encounter this question. Indeed, you will read many important and useful applied papers that have no need of an economic model, so it’s not so surprising that there are research questions that do not demand a structural estimation exercise. When the question comes, you should have an answer ready: What is my model for? How is it central to answering my research question?
6.1 When you possibly don’t need a model
A lot of students, when they write their first paper, end up posing simple questions like “What is the effect of policy \(X\) on outcome \(Y\)?” Often, answering these questions requires nothing more than a simple statistical model of causality (such as the Potential Outcomes Model or the Generalized Roy Model, which formalize causality in terms of potential outcomes).
In particular, if your question concerns the causal effect of an observed historical change in some variable, you likely won’t need anything more elaborate than these simple frameworks. Consider below three examples of quasi-experimental variation from our prototype models that answer particular research questions without needing to specify the underlying models.
6.1.1 Social Security

In the life-cycle savings model, consider the inclusion of a social security system that provides income at older ages. The budget constraint becomes:

\[ c_{t} + a_{t+1}/(1 + r) \leq (1-\tau)y_{t} + a_{t} + \mathbf{1}\{t\geq 65\}b \]

so that individuals become eligible for a benefit, \(b\), after age 65, and pay into the system with proportional taxes, \(\tau\). Suppose that the age of eligibility for social security is unexpectedly reduced from 65 to 60, and that you have a repeated cross-section of data on individual consumption and age from periods before and after the unannounced reform.

Let’s consider an approach to identifying objects of interest without specifying the full underlying model. To begin, let \(t^*\) index cohorts by their age at the time of the policy announcement. Notice that \(t^*\) indicates both a cohort and a treatment. Suppose your question is: “What was the effect of this eligibility expansion on consumption at age \(t\) for a cohort aged \(t^*\) at the time of expansion?” Instead of laying out modeling assumptions, let’s see how far one can get with a difference-in-differences approach.

To specify the causal effects of interest, we’ll use the language of potential outcomes. Let \(C_{t^*,t}(1)\) be the potential outcome – a random variable – indicating consumption of an individual in cohort \(t^*\) at age \(t\) under the policy announcement. Similarly, let \(C_{t^*,t}(0)\) be the corresponding potential outcome under the counterfactual in which the policy is never announced. The target parameter identified by the research question above is \[ \alpha_{t^*,t} = \mathbb{E}[C_{t^*,t}(1) - C_{t^*,t}(0)] \] If there is sufficient variation, this parameter can be identified by assuming parallel trends with an untreated reference cohort. For the sake of this example, consider individuals who were 65 at the time of the policy announcement and hence (in principle) unaffected by the policy change. Parallel trends assumes that for some \(t^*<65\) and for a pair of ages \(t\geq t^*\), \(s<t^*\): \[ \mathbb{E}[C_{t^*,t}(0)] - \mathbb{E}[C_{65,t}(0)] = \mathbb{E}[C_{t^*,s}(0)] - \mathbb{E}[C_{65,s}(0)] \]

All we need then to identify \(\alpha_{t^*,t}\) is consumption for both cohorts at age \(s\) as well as at age \(t\), which allows us to construct the counterfactual \(\mathbb{E}[C_{t^*,t}(0)]\) in terms of observable quantities: \[ \alpha_{t^*,t} = \mathbb{E}[C_{t^*,t}(1)] - \mathbb{E}[C_{65,t}(0)] - (\mathbb{E}[C_{t^*,s}(0)] - \mathbb{E}[C_{65,s}(0)]) \] Note of course that this was for a specific cohort, using a specific reference cohort and a specific age \(s\) with which to construct the counterfactual. One could use many other cohorts and ages to do this, as is assumed by the common regression specification: \[ \mathbb{E}[C_{t^*,t}] = \gamma_{t^*} + \mu_{t} + \alpha_{t^*,t}\mathbf{1}\{t\geq t^*\} \] which implicitly layers in a stricter and interconnected set of parallel trends assumptions.

To proceed, we have only to defend the parallel trends assumption, which many view as less burdensome than defending the many layers of assumptions in a quantitative model. Of course, the farther apart two cohorts are from each other, the stronger this assumption becomes, and the regression specification does not allow us to select “more ideal” control groups for each cohort (Goodman-Bacon 2021).
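To make the construction concrete, here is a minimal sketch that builds the difference-in-differences estimate from cohort-by-age cell means in a simulated repeated cross-section. The cohorts, ages, noise level, and effect size are all illustrative assumptions, not quantities from the model above.

```python
# A minimal sketch of the cohort DiD construction above, using simulated
# repeated cross-sections; all numbers are illustrative assumptions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
frames = []
for t_star in (55, 60, 65):            # age at announcement; 65 = reference cohort
    for age in (50, 70):               # s = 50 (pre-announcement), t = 70 (post)
        n = 5_000
        c = 1.0 + 0.02 * age + 0.1 * (t_star == 55) + rng.normal(0.0, 0.2, n)
        if t_star < 65 and age >= t_star:
            c = c + 0.15               # assumed true effect of the expansion
        frames.append(pd.DataFrame({"t_star": t_star, "age": age, "C": c}))
df = pd.concat(frames)

m = df.groupby(["t_star", "age"])["C"].mean()   # cell means E[C_{t*,t}]
s, t = 50, 70
for t_star in (55, 60):
    alpha = (m.loc[(t_star, t)] - m.loc[(65, t)]) - (m.loc[(t_star, s)] - m.loc[(65, s)])
    print(t_star, round(alpha, 3))              # both close to the assumed 0.15
```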
6.1.2 Firm Entry
Consider the firm entry model. Suppose you have panel data \((A_{m,t,1},A_{m,t,2},Z_{m,t})_{m=1,t=1}^{M,T}\) on a set of markets (indexed by \(m\)) in a number of periods (indexed by \(t\)). Recall that \(A_{m,t,j}\) indicates whether firm \(j\) is active in market \(m\) at time \(t\), and firm entry occurs when \(A_{m,t+1,j}-A_{m,t,j}=1\). Recall that \(X_{m}\) is a market-level factor that is unobservable here. Let \(Z_{m,t}\in\{0,1\}\) indicate the presence of a policy that applies locally to market \(m\). For the purposes of this example, let’s say it’s a local minimum wage policy. To incorporate the policy \(Z\), suppose we write:
\[ u_{1}(x,a,d') = \phi_{0} + \phi_{1}x - \phi_{2}d' - \phi_{3}(1-a) + \phi_{4}z \]
so that \(\phi_{4}\) embodies the effect of the policy on payoffs for the firm. Finally, for simplicity, let’s assume that \(Z_{m,1}=0\) for all markets initially, and that in period \(t^*\) a subset of markets adopt the policy permanently, with this adoption unanticipated.
Suppose your question is: “What is the effect of the minimum wage on firm entry?” Let \(N_{m,t} = A_{m,t,1}+A_{m,t,2}\) be the number of participating firms in market \(m\) at time \(t\). Let \(N_{\tau,m}(z)\) be the potential outcome of \(N\) in market \(m\), \(\tau\) periods after the adoption of the minimum wage policy. Let’s define the dynamic effect of treatment on the treated (the effect of the minimum wage on markets that adopt it) as:
\[ \alpha_{\tau} = \mathbb{E}[N_{\tau}(1) - N_{\tau}(0)|Z = 1] = \mathbb{E}[\Delta_{\tau}|Z=1] \]
Our model doesn’t outline a theory of why certain markets adopt the minimum wage and why others don’t, but it does highlight that the effects of the policy will differ across markets, so it is important to account for heterogeneity: if there is any selection into policy adoption, we know that \(\mathbb{E}[\Delta_{\tau}|Z=1] \neq \mathbb{E}[\Delta_{\tau}]\).
A parallel trends assumption that justifies the event-study approach would be: \[ \mathbb{E}[N_{t}(0)|Z=1] - \mathbb{E}[N_{t}(0)|Z=0] = \text{constant}. \] Note that if we assume that each market is in the ergodic distribution implied by a Markov perfect equilibrium, then the distribution of \(N\) is stationary in each market absent the policy intervention and the parallel trends assumption is justified. The event-study specification: \[ N_{m,t} = \gamma_{m} + \mu_{t} + \mathbf{1}\{t\geq t^*\}\alpha_{t-t^*} + \epsilon_{m,t} \] would then robustly identify the average effect of the policy among the markets that adopted it. This relies in part on the fact that the timing of adoption was uniform. We should note that if adoption were staggered, then given that there may be heterogeneous treatment effects, a regression-based approach to the event study would deliver some weighted average of these impacts that is hard to interpret (Goodman-Bacon 2021).
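As a concrete illustration, here is a minimal sketch of this event-study regression on simulated panel data. The number of markets, the dynamic effect path, and the noise level are all assumptions made purely for the demonstration.

```python
# A minimal sketch of the event-study regression above, on simulated data;
# all numbers (M, T, t_star, the effect path) are illustrative assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
M, T, t_star = 200, 10, 6
treated = rng.random(M) < 0.5                      # markets that adopt at t_star
effect = lambda tau: -0.3 * (1 - 0.5 ** (tau + 1))  # assumed dynamic effect path

rows = []
for m in range(M):
    gamma = rng.normal()                            # market fixed effect
    for t in range(T):
        y = gamma + 0.1 * t + rng.normal(scale=0.5)  # common time trend + noise
        if treated[m] and t >= t_star:
            y += effect(t - t_star)
        rows.append(dict(m=m, t=t, N=y,
                         tau=t - t_star if treated[m] and t >= t_star else -1))
df = pd.DataFrame(rows)

# Two-way fixed effects with event-time dummies; tau = -1 (untreated or
# pre-adoption observations) is the omitted reference category.
fit = smf.ols("N ~ C(m) + C(t) + C(tau)", data=df).fit()
print(fit.params.filter(like="C(tau)"))             # estimates of alpha_0, alpha_1, ...
```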
6.1.3 Bundles of Tax Reforms
Finally, consider a suite of tax reforms in the dynamic labor supply model. Suppose that there are three states, \(A\), \(B\), and \(C\), and that each runs a different experiment introducing a different set of taxes and transfers. Let \(\mathcal{Y}_{j}\) denote the net income function for state \(j\). Consider the following examples:
\[\begin{align} \mathcal{Y}_{A}(W,H) &= b_{A} + (1-\tau_{A})WH \\ \mathcal{Y}_{B}(W,H) &= WH + \sum_{k=0}^{5}\tau_{k}(WH-\overline{E}_{k})\mathbf{1}\{WH>\overline{E}_{k}\} \\ \mathcal{Y}_{C}(W,H) &= WH(1-\tau_{C}) + \mathbf{1}\{H>20\}b_{C} \end{align}\]
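To fix ideas, here is a minimal sketch implementing the three schedules; the particular rates, brackets, and thresholds are placeholders rather than values from the text.

```python
# The three net-income schedules above; all parameter values are
# illustrative placeholders.
import numpy as np

b_A, tau_A = 50.0, 0.2
tau_k = np.array([-0.10, -0.12, -0.15, -0.18, -0.20, -0.25])  # bracket adjustments
E_bar = np.array([0.0, 100.0, 200.0, 400.0, 800.0, 1600.0])   # bracket thresholds
tau_C, b_C = 0.15, 80.0

def Y_A(W, H):
    return b_A + (1 - tau_A) * W * H

def Y_B(W, H):
    E = W * H   # gross earnings
    return E + np.sum(tau_k * (E - E_bar) * (E > E_bar))

def Y_C(W, H):
    return W * H * (1 - tau_C) + (H > 20) * b_C

print(Y_A(10.0, 30.0), Y_B(10.0, 30.0), Y_C(10.0, 30.0))
```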
Suppose that participants do not anticipate their assignment to treatment in these experiments, and that each experiment is expected to last for three periods. Let \(Z_{j}\in\{0,1\}\) indicate assignment to either the treatment or control group in state \(j\).
If your research question is: “What is the effect of each unannounced, temporary tax reform on labor supply?” then one could simply compare the means of treatment and control in each state:
\[ \mathbb{E}[H|Z_{j}=1, j] - \mathbb{E}[H|Z_{j} = 0, j] \]
which uncovers the causal effect of \(Z\) by virtue of random assignment.
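For completeness, here is a minimal sketch of this comparison for one state, with a standard error; the hours data are simulated stand-ins, not experimental data.

```python
# Difference in means between treatment and control in one state;
# the hours draws below are simulated placeholders.
import numpy as np

rng = np.random.default_rng(4)
H_treat = rng.normal(28.0, 8.0, 2_000)   # hours under the reform (simulated)
H_ctrl = rng.normal(30.0, 8.0, 2_000)    # hours in the control group (simulated)

effect = H_treat.mean() - H_ctrl.mean()
se = np.sqrt(H_treat.var(ddof=1) / len(H_treat) + H_ctrl.var(ddof=1) / len(H_ctrl))
print(f"effect = {effect:.2f} (se {se:.2f})")
```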
6.2 Reasons to Use a Model
Having covered these three (perfectly valid) applications of quasi-experimental methods to answer specific research questions, let us now turn to some questions that these methods don’t answer, and to the useful role that models can play in these (and related) contexts.
6.2.1 When the question can’t be articulated without one
Perhaps the most obvious reason to use a model is when your research question simply cannot be articulated without one. Some of the most useful insights from economic modeling are statements about economic efficiency, the potential for policies to resolve market inefficiencies, and the design of policies. Some examples:
- In the labor supply model, given a particular weighted welfare objective, what does the optimal system of taxes and transfers look like?
- What is the cheapest way to incentivize competition between two firms in dynamic duopoly?
- What are the welfare costs of incomplete markets in the life-cycle savings model?
6.2.2 To make welfare calculations
Revealed preference is one of the more powerful tools in an economist’s toolkit: if we treat individuals in our data as people who know what they like, we can infer their preferences and hence how they value different policy environments. For example:
- How do individuals value the social security program introduced in the savings model? How would they value a program with a different combination of taxes (\(\tau\)) and payments \(b\)?
- In the labor supply model, what is the sum of individuals’ willingness to pay for a lump-sum payment \(b\) that is financed by proportional taxes \(\tau\)? We’ll come back to this one below.
6.2.3 To make sense of otherwise puzzling data
Here it’s hard to look past the most fundamental causal question in our profession: what is the effect of price on quantities? You know, of course, that this is a silly question, but we only know it’s silly because the theory of supply and demand is so fundamental to our view of the world.
Suppose you observe prices and quantities in a market over time. Without the theory of supply and demand, all you would see is a cloud of points.
With the theory of supply and demand, we understand that each point is the simultaneous outcome of two underlying structural relationships in equilibrium. Philip and/or Sewall Wright proposed the solution, instrumental variables, which is perhaps the earliest known example of an estimated structural model.
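A minimal simulation makes the point. The linear curves, the coefficient values, and the cost shifter \(z\) below are illustrative assumptions, with \(z\) entering only the supply curve so that it can serve as an instrument for the demand slope.

```python
# A minimal simulation of the simultaneity problem, assuming linear supply
# and demand and a cost shifter z that moves only the supply curve.
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
z = rng.normal(size=n)                    # supply shifter (the instrument)
u_d = rng.normal(size=n)                  # demand shock
u_s = rng.normal(size=n)                  # supply shock

# Structural curves: demand q = 10 - 1.0p + u_d, supply q = 2 + 0.5p + z + u_s.
# Each observed (p, q) pair solves both equations simultaneously:
p = (8.0 - z + u_d - u_s) / 1.5
q = 10.0 - 1.0 * p + u_d

ols = np.polyfit(p, q, 1)[0]                    # regression of q on p: neither slope
iv = np.cov(z, q)[0, 1] / np.cov(z, p)[0, 1]    # IV with z: the demand slope (-1.0)
print(ols, iv)
```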
6.2.4 When variation does not identify the counterfactual of interest
In our examples above, you may have noted that we were careful to very specifically define the “treatment” in order to pin down the causal object of interest. Models can help us articulate exactly what kinds of causal parameters can and cannot be identified by observed variation, as well as outline a specific set of assumptions under which related counterfactuals can be forecast even though they are not exactly replicated by existing variation. Each section below discusses a number of examples in the context of each application. Researchers often frame this as a question of internal vs. external validity, but there are too many interesting examples in the “external validity” column not to discuss them in more depth. In general, a key point made by Heckman and Vytlacil (2005) is that estimands from simple statistical models designed to infer causal effects (such as those we get from difference-in-differences, IV, and regression discontinuity) are rarely parameters of exact policy interest.
We’ll use examples to explore these ideas.
6.2.4.2 Tax Reform
For concreteness, let’s consider the effect of reform \(A\) on labor supply, when \(\beta(1+r)=1\) and optimal consumption is stationary over time. It is given by: \[\Delta H_{n,t} = \psi\log(1-\tau) - \psi\sigma\Delta \log\left(C^*_{n}\right). \] The consumption response \(\Delta \log C^*_{n}\) embodies the income effect and can be solved for by plugging optimal labor supply into the intertemporal budget constraint.
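For readers who want the intermediate step, here is a sketch of where this expression comes from, assuming the per-period utility takes the isoelastic form \(C^{1-\sigma}/(1-\sigma) - H^{1+1/\psi}/(1+1/\psi)\) (an assumption for this sketch, consistent with the exercises below). The intratemporal first-order condition equates the marginal value of net earnings to the marginal disutility of hours: \[ (1-\tau)W_{n,t}\,(C^{*}_{n})^{-\sigma} = H_{n,t}^{1/\psi} \] Taking logs gives \(\log H_{n,t} = \psi\left[\log(1-\tau) + \log W_{n,t} - \sigma\log C^{*}_{n}\right]\), and differencing against the pre-reform allocation with wages held fixed delivers the expression above (reading \(\Delta H_{n,t}\) as a change in log hours).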
This concrete example helps us to understand some more general points about each reform:
- The average treatment effect depends on income effects, which in turn depend on the perceived length of time that the tax reform is enforced.
- The model exhibits lots of heterogeneity in treatment effects. As such, if populations differ in their underlying distributions of wages or work costs, we should expect different impacts.
Thus, although each experiment robustly identifies the effect of tax reform \(A\) on population \(A\), tax reform \(B\) on population \(B\), and so forth, it does not identify:
- The effect of any tax reform with different persistence (real or perceived). Moreover, one would want to be careful in interpreting the experimental findings, checking that individuals in the experiment were given adequate information about its duration.
- The effect of tax reform \(A\) on population \(B\), \(C\), etc, and likewise for tax reforms \(B\), \(C\), etc on alternative populations.
- The distribution of treatment effects at each location.
- The effect of tax reforms that are already partly anticipated by individuals.
- The effect of a scaled up tax reform, where equilibrium effects on wages might appear.
- The distribution of effects of each tax reform. Here the model can be used to interpret available panel data and invert out the distributions of labor market productivities and work costs. One can then return to the experimental data and estimate average treatment effects along these latent dimensions.
6.2.4.3 Entry-Exit Model
By now we have made the point several ways, but when it comes to the model of dynamic duopoly, we can note that the model-free approach does not identify:
- The effect of the minimum wage policy when its introduction is anticipated several periods earlier.
- The effect of the minimum wage on the markets that do not adopt it.
- The effect of repealing the minimum wage.
- The effect of nominal changes to the minimum wage.
Each of these might be considered a much more useful or compelling policy calculation.
6.2.5 To interpolate existing variation in the data
Sticking with the tax reform example, let’s consider what it would take to jointly understand the effects of all three reforms. We know that the reforms are related because each comes in the form of an infinite-dimensional object: a function \(\mathcal{T}(W,H)\) of wages and hours. A model-free approach that attempts to estimate labor supply as a non-parametric function of this function would not get very far, due to the implausibly large amount of policy variation that would be needed to estimate it.
Economic models often provide a useful way to interpolate related – but not functionally identical – variation. Notice that in the labor supply model, any function \(\mathcal{T}\) is articulated through its effect on the budget constraint, and different policy reforms can be compared in the model without the addition of any new parameters. I refer to this property as articulated variation: when the effect of a variable can be modeled without the need for additional parameters. Typically, this involves a priori known changes to prices and endowments.
In this example, with structural parameters in hand, any function \(\mathcal{T}\) of wages and hours can be modeled, and hence any observed variation in this function can be interpreted through the model without additional parameters. In Mullins (2026) I conduct a very similar exercise using data on welfare reform experiments in the United States.
Here is a useful counterexample: policy variation that is not well-articulated inside a model. Consider our choice of modeling the minimum wage in the entry and exit model: it is embodied in a single parameter, \(\phi_{4}\). Our choice not to model the within-period production decisions of the firm (and instead estimate a reduced form) means that this policy is not well-articulated. We need the additional parameter \(\phi_{4}\) to model its effect, and further changes to the level of the minimum wage cannot be studied.
6.3 Marschak’s Maxim
Our discussion so far has focused on exploring the boundaries of what simpler causal inference strategies can identify, in order to make statements about how economic modeling can add value in research. This is not very instructive for how we should actually do economic modeling. Of course, this topic is more of an art than a science, but Marschak’s Maxim (Marschak 1953; Heckman and Vytlacil 2007) is a useful principle that can guide us. Here it is:
Researchers should specify the minimal set of model ingredients in order to answer the research question of interest.
This may seem obvious: surely the hard part is figuring out what this minimal set is. It’s true that this can be hard to decide, but it is surprisingly easy to forget this simple rule once you are deep inside your model and making decisions. The question “Is this essential to my question of interest?” is not always easy to answer, but it is one you should be repeatedly asking yourself. Your research question is the mast you tie yourself to, and you should decide as early as possible what it is.
Let’s do some clean examples to make the point clear.
6.3.1 Marschak’s Original Example
Marschak (1953) considers a very simple problem: a monopolist that chooses quantities to maximize profit, and a government that taxes quantities. Suppose that demand is given by \[ p = \alpha_0 - \alpha_1 q \] and that the firm produces with constant marginal cost, \(c\). Firm profit as a function of quantity is: \[ \Pi(q) = (\alpha_0 - c - \tau)q - \alpha_1 q^2 \] yielding optimal quantity \[ q^* = \frac{\alpha_0 - c - \tau}{2\alpha_1} \] The government’s tax revenue is: \[ R = q \tau = \frac{\alpha_0 - c - \tau}{2\alpha_1} \tau \]
Supposing that, initially, quantity \(q\) varies for exogenous reasons across markets, Marschak considers three problems and asks how each actor could learn the solution to its problem from the data (a numerical check of each answer follows the list below).
- To maximize profit, the firm can observe how its profits vary with \(q\), \(\Pi = a q - b q^2\), and set \(q^* = \frac{a}{2b}\).
- To forecast how its revenue changes with firms’ quantities across markets, the government need only know its tax rate \(\tau\), since \(R = \tau q\).
- To set taxes to maximize tax revenue, and assuming that the firm also learns its optimal output, the government need only extract the linear component \(a = \alpha_0 - c - \tau\) from the observed profit relationship and add the observed tax rate \(\tau\), setting \(\tau^* = (a+\tau)/2 = (\alpha_0 - c)/2\).
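Here is a minimal numerical check of these three answers; the parameter values are illustrative assumptions.

```python
# Numerical check of Marschak's example; all parameter values are
# illustrative assumptions.
import numpy as np

alpha0, alpha1, c, tau = 10.0, 0.5, 2.0, 1.0

# The reduced-form profit relationship Pi = a*q - b*q^2 that is observable
# from exogenous variation in q:
a, b = alpha0 - c - tau, alpha1

# (1) Profit maximization requires only the reduced-form (a, b):
q_star = a / (2 * b)
assert abs(q_star - (alpha0 - c - tau) / (2 * alpha1)) < 1e-12

# (2) Forecasting revenue R = tau*q across markets requires only tau.

# (3) The revenue-maximizing tax requires only a + tau = alpha0 - c:
tau_star = (a + tau) / 2

# Verify by brute force that tau_star maximizes R(tau) = tau*(alpha0-c-tau)/(2*alpha1):
grid = np.linspace(0.0, alpha0 - c, 10_001)
R = grid * (alpha0 - c - grid) / (2 * alpha1)
print(tau_star, grid[R.argmax()])   # both should print 4.0
```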
Although the setup is a little unconventional (we are used to assuming agents already act optimally), two lessons resonate from this exercise:
1. For each question, the analyst only needs to know certain combinations of parameters – each of which can be observed in reduced-form relationships – to answer their question of interest.
2. Once the firm optimizes, the reduced-form relationship between tax revenue and quantities is not sufficient for forecasting the tax revenue from future changes to \(\tau\).
Point (2) is essentially an early version of the Lucas Jr (1976) critique, while point (1) is the one we will mostly try to take lessons from. It emphasizes that in some stylized cases, the answers to some questions are essentially invariant to particular model ingredients, and we need not consider them. Although in practice we cannot always guarantee this, the insight undergirds a philosophical approach to quantitative modeling that always bends toward finding a minimal set of ingredients.
6.3.2 Two-Stage Budgeting
Returning to the life-cycle savings model, suppose instead that utility depends on a vector of \(K\) commodities \(x\) in each period, so that the problem becomes: \[ V_{t}(p,a,y) = \max_{x,a'}\left\{v(x) + \beta \mathbb{E}_{p',y'|p,y}V_{t+1}(p',a',y')\right\}\] subject to: \[ p \cdot x + \frac{1}{1+r}a' \leq y + a,\qquad a\geq \underline{a} \] where \(p\) is the vector of prices of each commodity. Imagine, however, that our research question is concerned only with studying the accumulation of savings over the life-cycle, and with policy counterfactuals (such as the social security system above) that affect the intertemporal allocation of resources. A relatively weak assumption we could place on \(v\) is that it is homothetic, in which case there exists a price index \(P(p)\) and an indirect utility function \(u\) such that (Gorman 1959): \[ V(p,X) = \max_{x}\left\{v(x)\ \text{s.t.}\ p\cdot x \leq X\right\} = u(X/P(p)) \] This approach, known as two-stage budgeting, says that it is reasonable for consumers to first allocate an aggregate expenditure budget across periods and – with access to a price index – disregard the problem of allocating expenditures to more detailed categories. It is a classic and clean application of Marschak’s maxim: if your research question concerns intertemporal consumption allocation decisions, and your policy counterfactuals do not affect any intratemporal margins, then under regularity conditions on demand there is no need to model intratemporal consumption allocations. All you need is a reasonable price index with which to deflate nominal values.
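As a concrete instance (chosen here for illustration), take Cobb–Douglas within-period preferences, \(v(x) = \sum_{k}\theta_{k}\log x_{k}\) with \(\sum_{k}\theta_{k}=1\). Demands are \(x_{k}^* = \theta_{k}X/p_{k}\), so that \[ \max_{x}\left\{v(x)\ \text{s.t.}\ p\cdot x \leq X\right\} = \log X - \sum_{k}\theta_{k}\log p_{k} + \sum_{k}\theta_{k}\log\theta_{k} = \log\left(\frac{X}{P(p)}\right) + \text{const} \] with price index \(P(p) = \prod_{k}p_{k}^{\theta_{k}}\). Deflating expenditure by \(P(p)\) is all the intertemporal problem requires.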
In the same spirit, a main goal of Gorman (1959) and others in the two-stage budgeting literature was to find assumptions that would permit the use of broader categories of expenditure (food, utilities, clothing) when modeling and estimating systems of demand.
6.3.3 Sufficient Statistics
Consider a static labor supply model for individuals \(i\in[0,1]\) in an economy \[ H_i = \arg\max_{h} C - v(h) \] where \[ C = b + (1-\tau)hW_i \] Assume that the transfer \(b\) is financed through \(\tau\), so that in aggregate: \[ b = \int \tau H_iW_i di. \]
Consider a utilitarian planner with welfare objective: \[ V(\tau) = \int(C_i - v(H_i))di\]
What is the welfare effect of a marginal expansion in \(b\), financed through \(\tau\)? Taking \(dV/d\tau\) we can apply the envelope theorem to get:
\[ \frac{dV}{d\tau} = -\int W_iH_i\,di + \int W_iH_i\,di + \tau \int \frac{d(W_iH_i)}{d\tau}\,di \] The first term is individuals’ lost net earnings (by the envelope theorem), the second is the mechanical revenue gain that finances \(b\), and the third is the fiscal cost of behavioral responses. Assuming a constant elasticity of taxable income with respect to \(\tau\), we can rearrange this as \[ \frac{dV}{d\tau} = \varepsilon_{Z,\tau}\int Z_i\, di \] where \(Z_i = W_iH_i\) is taxable income and \(\varepsilon_{Z,\tau}\) is that elasticity.
In this setup, the willingness to pay for the marginal policy expansion exactly cancels out with the mechanical component of the cost, and we are left only with the distortionary cost of behavioral adjustments, similar to Harberger (1954).1
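This envelope logic is easy to check numerically. The sketch below assumes a particular disutility of hours, \(v(h) = h^{1+1/\psi}/(1+1/\psi)\), so that optimal hours are \(H_i = ((1-\tau)W_i)^{\psi}\) and \(\varepsilon_{Z,\tau} = -\psi\tau/(1-\tau)\); these functional forms are assumptions for the illustration, not part of the argument above.

```python
# Numerical check of dV/dtau = elasticity * mean(Z), under the assumed
# disutility v(h) = h^(1+1/psi)/(1+1/psi), so H_i = ((1-tau)*W_i)^psi.
import numpy as np

rng = np.random.default_rng(1)
psi = 0.5
W = rng.lognormal(mean=0.0, sigma=0.3, size=100_000)

def welfare(tau):
    H = ((1.0 - tau) * W) ** psi
    Z = W * H                              # taxable income
    b = tau * Z.mean()                     # transfer financed by the tax
    C = b + (1.0 - tau) * Z
    V = (C - H ** (1 + 1 / psi) / (1 + 1 / psi)).mean()
    return V, Z

tau, eps = 0.2, 1e-5
dV_numeric = (welfare(tau + eps)[0] - welfare(tau - eps)[0]) / (2 * eps)
Z = welfare(tau)[1]
dV_formula = (-psi * tau / (1 - tau)) * Z.mean()
print(dV_numeric, dV_formula)              # the two should agree closely
```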
There is a larger literature on similar so-called sufficient statistics in public finance, concerned with deriving approximations to the effects of local policy changes. Their appeal is that the resulting formulas tend to rely only on empirical quantities and elasticities, and less on specific functional-form assumptions in the model.
Although of course, as models get more elaborate, the number of elasticities you need for these sufficient statistics can become intractable (Kleven 2021). But they can often be a useful guide to understanding the fundamentals of what your policy is doing. And they are a great example of Marschak’s maxim!
6.4 Exercises
Exercise 6.1 Consider a static version of the labor supply model where earnings are the only source of income:
\[ H_{n} = \arg\max_{h} \frac{(W_nh)^{1-\sigma}}{1-\sigma} - \alpha_{n}^{-1}\frac{h^{1+1/\psi}}{1+1/\psi} \]
Assume that wages follow
\[ \log(W_{n}) = \gamma_{0} + \gamma_{1}Z_{n} + \epsilon_{n} \]
where \(Z \perp \alpha\) but \(\epsilon\) and \(\alpha\) are potentially correlated.
- Solve for \(H_{n}\) in terms of \(W_{n}\)
- Suppose you have cross-sectional data \((W_{n},Z_{n},H_{n})_{n=1}^{N}\) and that you use \(Z\) as an instrument to estimate the relationship \[ \log(H) = \alpha_0 + \alpha_1 \log(W) + \varepsilon \] via 2SLS. Express the estimand \(\alpha_1\) as a combination of structural parameters. (Hint: it should follow from your answer to part 1).
- Consider a policy reform that introduces a proportional subsidy, \(\tau\), so that net wages are \((1+\tau)W\). Is \(\alpha_{1}\) sufficient for forecasting the effects of this policy? If this is your question of interest, what does Marschak’s Maxim suggest about the need to separately identify the parameters that combine to form \(\alpha_1\)?
- Now suppose an alternative policy reform that consists of a lump sum transfer \(b\) and a proportional labor market tax, \(\tau\), so that net income is \(b + (1-\tau)WH\). Do you think \(\alpha_{1}\) is sufficient to forecast the effect of this new policy on labor supply? An intuitive explanation is sufficient, but you can try calculating the local change \(\partial H / \partial b |_{b=0}\) if you want to be more precise.
- Based on your answer to part 4, and assuming that the model we wrote down is the true data generating process, do you think it is even possible to forecast the effect of this alternative policy given these data?
Exercise 6.2 Consider the following simple model of time allocation. Individual utility is given by:
\[ U(C,L) = (\phi C^{\rho} + (1-\phi) L^{\rho})^{1/\rho} \]
where \(L\) is an aggregate leisure good composed of \(K\) different activities:
\[ L = \prod_{k=1}^{K}l_{k}^{\delta_{k}},\qquad \sum_{k}\delta_{k} = 1 \]
In addition to these leisure activities, the agent may supply labor to the market at a wage rate of \(w\). Letting \(h\) be hours of work, the time constraint is:
\[ h + \sum_{k}l_{k} = 1 \]
The model is static and the individual solves the following problem:
\[ \max_{C,\{l_{k}\}_{k=1}^{K}} U(C,L) \]
subject to the constraint:
\[ C + w \left(\sum_{k} l_{k}\right) \leq w \]
Suppose you are interested in using this model to study the effects of a wage subsidy on labor supply. Notice that the model can be written as \[ \max_{C,h} U(wh,L^*(1-h)) \] where \[ L^*(1-h) = \max_{\{l_{k}\}_{k=1}^{K}} \prod_{k=1}^{K}l_{k}^{\delta_{k}} \] subject to \(\sum_{k}l_{k} = 1-h\). Given this simplification, what does Marschak’s Maxim (and common sense) suggest about what parameters need to be estimated here?
Based on your answer to the above, you simplify the model to the following specification: \[ h^* = \arg\max (\phi (wh)^{\rho} + (1-\phi) (1-h)^{\rho})^{1/\rho} \] and you derive the following relationship: \[ \log\left(\frac{C}{L}\right) = \frac{1}{1-\rho}\log\left(\frac{\phi}{1-\phi}\right) + \frac{1}{1-\rho}\log(w) \] where \(C=wh^*\) is total labor income and \(L=1-h^*\) is non-market time. Suppose you have a cross-section of data \((C_{n},L_{n},W_{n})\) where \(C_{n}\) is labor market earnings, \(L_{n}\) is non-market time, and \(W_{n}\) is the wage-rate for person \(n\). This could be taken (for example) from the Outgoing Rotation Group of the CPS monthly survey. Does the model, as written, allow for any randomness in the relationship between \(C_{n}/L_{n}\) and \(W_{n}\)? Is this likely to be replicated in the data?
Suppose now you augment the model to accommodate some randomness in how much individuals work by allowing for heterogeneity in preferences (\(\phi\)): \[ \log\left(\frac{C}{L}\right) = \frac{1}{1-\rho}\log\left(\frac{\phi_{n}}{1-\phi_{n}}\right) + \frac{1}{1-\rho}\log(w) \] What assumption do you need for an OLS regression of \(\log(C_{n}/L_{n})\) on \(\log(W_{n})\) to consistently recover the elasticity of labor supply, \(1/(1-\rho)\)? Do you consider this credible? Why/why not?
The assumption of quasi-linear utility simplifies things but is not required. In the general case, we can express the willingness to pay for the marginal policy change as \(-\partial e(w(1-\tau),u)/\partial \tau + dy/d\tau\), where \(u\) is utility in the baseline, \(e(w(1-\tau),u) = \min\ (1-\tau)wl + c\) s.t. \(u(c)+v(1-l)\geq u\) is the Hicksian expenditure function (with \(l\) denoting leisure), and \(y=(1-\tau)w + b\) is full income. Shephard’s lemma gives that the willingness to pay for the change in the price of leisure is \(lw\), which we can compare to \(dy/d\tau\), which is \(-w + \int (1-l_i)w_i\, di + \tau\int d((1-l_i)w_i)/d\tau\, di\). Averaging over everyone gives the total willingness to pay as \(\varepsilon_{z,\tau}\int z_i\, di\), where \(z_i = (1-l_i)w_i\) is earnings and \(\varepsilon_{z,\tau}\) is the elasticity of taxable income with respect to taxes.↩︎