Assignment 2

We are building toward estimating a dynamic discrete choice model with persistent unobserved heterogeneity. As a first step, in this problem set you will estimate a static discrete choice model on the data.

Setup: Code for MLE

Reading in Data

You are going to build on the code below that estimates a very simple model of labor supply and program participation.

First, here is a chunk of code to read in the data and prepare it. Accessing variables in a Julia DataFrame is typically not type stable, which can slow down your code considerably. You will therefore notice that in the final step I pull the variables I want to use out of the DataFrame and into a named tuple of vectors with concrete element types.

using CSV, DataFrames, DataFramesMeta, Optim
include("../children-cash-transfers/src/Transfers.jl")

# for this simple model we can just drop missing data. When we estimate the model with persistent latent heterogeneity, we will need complete panels (including years where choices are missing).

data = @chain begin
    CSV.read("../children-cash-transfers/data/MainPanelFile.csv",DataFrame,missingstring = "NA")
    #@select :MID :year :wage :hrs :earn :SOI :CPIU :WelfH :FSInd :num_child :age
    @subset :year.>=1985 :year.<=2010
    @transform :AFDC = :WelfH.>0 
    @rename :FS = :FSInd
    @transform :P  = :FS + :AFDC :H = min.(2,round.(Union{Int64, Missing},:hrs / (52*20)))
    @subset .!ismissing.(:P) .&& .!ismissing.(:H)
    @transform @byrow :wage = begin
        if :hrs>0 && :earn>0
            return :earn / :hrs / :CPIU
        else
            return missing
        end
    end
end

data_mle = (;P = Int64.(data.P), H = Int64.(data.H), year = data.year, age = data.age,
            soi = data.SOI, num_kids = data.num_child, cpi = data.CPIU,
            logwage = log.(coalesce.(data.wage,1.)),wage_missing = ismissing.(data.wage))
(P = [2, 2, 2, 1, 0, 0, 0, 2, 1, 0  …  2, 2, 1, 2, 0, 0, 1, 1, 1, 1], H = [0, 0, 0, 0, 0, 0, 0, 0, 0, 2  …  0, 0, 2, 2, 2, 2, 0, 0, 0, 0], year = [1994, 1995, 1996, 1998, 2000, 2002, 2004, 2006, 2008, 1990  …  1991, 1992, 1991, 1992, 1991, 1992, 1991, 1992, 1991, 1992], age = [21, 22, 23, 25, 27, 29, 31, 33, 35, 17  …  21, 22, 23, 24, 39, 40, 30, 31, 25, 26], soi = [43, 43, 43, 43, 43, 43, 43, 43, 43, 17  …  44, 44, 7, 7, 5, 5, 39, 39, 39, 39], num_kids = [2, 2, 3, 3, 3, 3, 3, 3, 3, 1  …  2, 2, 2, 2, 3, 3, 4, 4, 2, 2], cpi = [0.860812349005761, 0.884959812302546, 0.910948243820851, 0.946664188812488, 1.0, 1.04457233785542, 1.09707768072849, 1.17054218546739, 1.25008130459022, 0.758792510685746  …  0.790785866939231, 0.8148346032336, 0.790785866939231, 0.8148346032336, 0.790785866939231, 0.8148346032336, 0.790785866939231, 0.8148346032336, 0.790785866939231, 0.8148346032336], logwage = [0.0, 0.0, 0.0, 0.0, 1.2039728043259361, 0.0, 0.0, 0.0, 0.0, 0.8230555833449215  …  0.0, 0.0, 2.6512574120409043, 2.3314412292534223, 1.265948758245773, 1.8744481199656744, 0.0, 0.0, 0.0, 0.0], wage_missing = Bool[1, 1, 1, 1, 0, 1, 1, 1, 1, 0  …  1, 1, 0, 0, 0, 0, 1, 1, 1, 1])
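
To see the type-stability point made above, compare the two containers (illustrative; the exact printed types depend on your data):

typeof(data)     # DataFrame: column types are not part of the container's type,
                 # so code that pulls columns out of it inside a function cannot be fully specialized
typeof(data_mle) # NamedTuple{(:P, :H, …), Tuple{Vector{Int64}, Vector{Int64}, …}}:
                 # each field's element type is known to the compiler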

The Model

Consider a static model with 9 discrete choices. Hours choices are given by \(H_{j}\in\{0,20,40\}\) and participation choices by \(P_{j}\in\{0,1,2\}\). Individuals may choose any combination of these. Each choice \(j\) carries a Type I extreme value distributed shock \(\epsilon_{j}\) that is iid across choices, over time, and across individuals; the scale of these shocks is \(\sigma\).

The deterministic component of utility for individual \(i\) at time \(t\) from choice \(j\) is given by:

\[ U_{itj} = \log(\max\{50,Y_{it}(W_{it}H_{j},P_{j})\}) + \alpha_{l}\log(112 - H_{j}) - \alpha_{P,1}\mathbf{1}\{P_{j}>0\} - \alpha_{P,2}\mathbf{1}\{P_{j}>1\} \]

where \(Y_{it}\) is a net income function for individual \(i\) at time \(t\) that depends on earnings (\(W_{it}H_{j}\)), the participation choice \(P_{j}\), the number of children in the household, and the policy environment in the individual’s state of residence. The total payoff from choice \(j\) is \(U_{itj} + \sigma\epsilon_{j}\).

Wages are a deterministic function of age only:

\[ \log(W_{it}) = \gamma_{0} + \gamma_{1}\text{Age}_{it} + \gamma_{2}\text{Age}_{it}^2 \]

and are measured with normally distributed iid measurement error:

\[ \log(W^{o}_{it}) = \log(W_{it}) + \zeta_{it},\qquad\zeta_{it}\sim\mathcal{N}(0,\sigma^2_{W}) \]

The code below calculates utility and indexes choices.

j_idx(p,h) = p*3 + h + 1 # maps (p,h) with p,h ∈ {0,1,2} to a linear index j ∈ 1:9
function utility(p,h,soi,cpi,year,num_kids,wage,pars)
    (;αl,Hgrid,σ,αP) = pars
    hrs = Hgrid[1+h]
    earn = wage * hrs
    net_income = max(50.,Transfers.budget(earn,0.,soi,year,num_kids,cpi,p))
    return log(net_income) + αl*log(112-hrs) - αP[1]*(p>0) - αP[2]*(p>1)
end

pars = (;αl = 1., σ = 1., σW = 1., γ = zeros(3),
    Hgrid = [0,20.,40.],αP = zeros(2))
(αl = 1.0, σ = 1.0, σW = 1.0, γ = [0.0, 0.0, 0.0], Hgrid = [0.0, 20.0, 40.0], αP = [0.0, 0.0])
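
As a quick illustrative check (not part of the estimation), you can evaluate the deterministic utility of every choice for the first observation at an assumed hourly wage of 10:

u = zeros(9)
for p in 0:2, h in 0:2
    # same call as in choice_prob! below; the wage of 10 is just an assumed value for illustration
    u[j_idx(p,h)] = utility(p,h,data_mle.soi[1],data_mle.cpi[1],data_mle.year[1],data_mle.num_kids[1],10.,pars)
end
u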

Choice probabilities are given by the logit formula.
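
Concretely, with the scale \(\sigma\) on the Type I extreme value shocks, the probability that individual \(i\) chooses option \(j\) at time \(t\) is

\[ P_{itj} = \frac{\exp(U_{itj}/\sigma)}{\sum_{k=1}^{9}\exp(U_{itk}/\sigma)} \]

The function below computes these probabilities on the log scale, subtracting the maximum scaled utility before exponentiating to avoid overflow.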

function choice_prob!(logP,it,data,pars)
    (;σ,γ) = pars
    wage = exp(γ[1] + γ[2] * data.age[it] + γ[3] * data.age[it]^2)
    denom = 0.
    umax = -Inf
    for p in 0:2
        for h in 0:2
            j = j_idx(p,h)
            u = utility(p,h,data.soi[it],data.cpi[it],data.year[it],data.num_kids[it],wage,pars)
            logP[j] = u / σ
            umax = u>umax ? u : umax
        end
    end
    logP[:] .-= umax / σ
    denom = log(sum(exp.(logP)))
    logP[:] .-= denom #<- normalize choice probabilities
end
choice_prob! (generic function with 1 method)

And so the log-likelihood takes a simple form.
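
Written out, the contribution of observation \((i,t)\) is the log probability of the observed choice plus, when a wage is observed, the log density of the measured log wage (the code below drops the additive constant \(-\tfrac{1}{2}\log 2\pi\)):

\[ \ell_{it} = \log P_{it,j(it)} + \mathbf{1}\{W^{o}_{it}\ \text{observed}\}\left[-\frac{1}{2}\left(\frac{\log W^{o}_{it} - \gamma_{0} - \gamma_{1}\text{Age}_{it} - \gamma_{2}\text{Age}_{it}^{2}}{\sigma_{W}}\right)^{2} - \log\sigma_{W}\right] \]

where \(j(it)\) indexes the observed \((P_{it},H_{it})\) combination.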

function log_likelihood(logP,it,data,pars)
    (;γ,σW) = pars
    ll = 0.
    if !data.wage_missing[it]
        ll += -0.5((data.logwage[it] - γ[1] - γ[2]*data.age[it] - γ[3]*data.age[it]^2) / σW)^2 - log(σW)
    end
    choice_prob!(logP,it,data,pars)
    j = j_idx(data.P[it],data.H[it])
    ll += logP[j]
    return ll
end

function update(x,p)
    αl = exp(x[1])
    σ = exp(x[2])
    γ = x[3:5]
    σW = exp(x[6])
    αP = x[7:8]
    return (;p...,αl,σ,σW,αP,γ)
end

function log_likelihood(x,data,pars)
    pars = update(x,pars)
    logP = zeros(eltype(x),9)
    ll = 0.
    for it in eachindex(data.P)
        ll += log_likelihood(logP,it,data,pars)
    end
    return ll
end
log_likelihood (generic function with 2 methods)

We can use Optim to maximize the log-likelihood:

x0 = [0.,0.,log(10.),0.,0.,0.,0.,0.] #<- starting values; γ[1] = log(10) so the initial predicted hourly wage is 10
res = optimize(x->-log_likelihood(x,data_mle,pars),x0,BFGS(),autodiff=:forward,Optim.Options(show_trace = true))
pars = update(res.minimizer,pars)
Iter     Function value   Gradient norm 
     0     7.234596e+04     3.085114e+06
 * time: 0.00633692741394043
     1     7.219007e+04     7.930873e+05
 * time: 0.4547231197357178
     2     7.189756e+04     8.240034e+05
 * time: 0.5587921142578125
     3     6.311750e+04     1.040038e+06
 * time: 0.803109884262085
     4     5.773072e+04     3.637938e+06
 * time: 1.2063300609588623
     5     5.738226e+04     6.079719e+06
 * time: 1.3685970306396484
     6     5.692685e+04     7.218571e+06
 * time: 1.5767319202423096
     7     5.644027e+04     8.094668e+06
 * time: 1.9543049335479736
     8     5.629959e+04     8.406176e+06
 * time: 2.011056900024414
     9     5.408423e+04     4.483122e+06
 * time: 2.0493218898773193
    10     5.315575e+04     3.603074e+06
 * time: 2.0878500938415527
    11     5.209566e+04     2.625888e+05
 * time: 2.145103931427002
    12     5.195541e+04     2.083292e+05
 * time: 2.1833839416503906
    13     5.188423e+04     6.807290e+05
 * time: 2.221687078475952
    14     5.187199e+04     2.396172e+05
 * time: 2.279866933822632
    15     5.186933e+04     5.094151e+03
 * time: 2.3182590007781982
    16     5.186887e+04     7.774766e+03
 * time: 2.3758020401000977
    17     5.186878e+04     2.823744e+04
 * time: 2.433206081390381
    18     5.186871e+04     1.085063e+04
 * time: 2.4902100563049316
    19     5.186852e+04     2.080594e+04
 * time: 2.56597900390625
    20     5.186850e+04     5.468883e+02
 * time: 2.6230380535125732
    21     5.186850e+04     4.523469e+00
 * time: 2.679996967315674
    22     5.186850e+04     8.466642e-01
 * time: 2.736649990081787
    23     5.186850e+04     3.014776e-02
 * time: 2.7939069271087646
    24     5.186850e+04     1.164486e-03
 * time: 2.8322980403900146
    25     5.186850e+04     1.957495e-05
 * time: 2.888066053390503
    26     5.186850e+04     1.250626e-07
 * time: 2.9259419441223145
    27     5.186850e+04     1.206905e-08
 * time: 2.9638988971710205
    28     5.186850e+04     3.893547e-09
 * time: 3.0017800331115723
(αl = 2.8546233783929473, σ = 0.9509286759263859, σW = 0.7224860062936598, γ = [-0.023620750582789916, 0.10877398375254817, -0.0012722347394312002], Hgrid = [0.0, 20.0, 40.0], αP = [2.035301636135555, 1.0137619542631466])

Question 1

Write code to calculate the standard errors for these parameters and present the estimates with standard errors in a table.
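
One possible approach, sketched below: treat the inverse of the Hessian of the negative log-likelihood at the optimum as the asymptotic covariance of the transformed parameter vector \(x\), then use the delta method to obtain standard errors for the structural parameters that enter through exponential transforms. An outer-product-of-gradients or sandwich estimator would be an equally reasonable choice; the names nll, xhat, and se_x below are just for this sketch.

using ForwardDiff, LinearAlgebra

nll(x) = -log_likelihood(x, data_mle, pars)   # negative log-likelihood in the x parameterization
xhat = res.minimizer
H = ForwardDiff.hessian(nll, xhat)            # observed information matrix
V = inv(H)                                    # asymptotic covariance of xhat
se_x = sqrt.(diag(V))                         # standard errors of the elements of x
# delta method for parameters that enter as exp(x[k]), e.g. αl = exp(x[1]):
se_αl = exp(xhat[1]) * se_x[1]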

Question 2

What sources of variation identify \(\sigma\), the responsiveness of choices to changes in payoffs? How is the responsiveness of program participation and labor supply to changes in incentives limited in this framework?

Question 3

Re-estimate the model with a one-layer nested logit structure, in which the individual first makes one of the three participation choices and then one of the three hours choices. That is, if the choices are ordered as: \[ \{P_{j}\}_{j=1}^9 = \{0,0,0,1,1,1,2,2,2\},\qquad \{H_{j}\}_{j=1}^9 = \{0,20,40,0,20,40,0,20,40\} \]

then the partition is:

\[ \mathcal{B} = \{\{1,2,3\},\{4,5,6\},\{7,8,9\}\} \]

Let \(\sigma_{H}\) be the dispersion of the shocks within each nest and \(\sigma_{P}\) the dispersion of the shocks across nests. Intuitively, how are these dispersion parameters identified? (Hint: think about what changes payoffs across observations, and think about the two dimensions of the discrete choice.)
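
One standard way to parameterize the resulting choice probabilities (a sketch; any equivalent formulation is fine) uses an inclusive value for each nest \(b\in\mathcal{B}\):

\[ I_{itb} = \log\sum_{k\in b}\exp(U_{itk}/\sigma_{H}),\qquad \Pr(j\mid b) = \frac{\exp(U_{itj}/\sigma_{H})}{\sum_{k\in b}\exp(U_{itk}/\sigma_{H})}, \]

\[ \Pr(b) = \frac{\exp(\sigma_{H}I_{itb}/\sigma_{P})}{\sum_{b'\in\mathcal{B}}\exp(\sigma_{H}I_{itb'}/\sigma_{P})},\qquad \Pr(j) = \Pr(j\mid b(j))\,\Pr(b(j)), \]

where \(b(j)\) is the nest containing choice \(j\). When \(\sigma_{H}=\sigma_{P}\) this collapses to the multinomial logit above.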

Question 4

What sources of variation do you think are present in the data but missing from our model? What are the likely implications for the key parameter estimates?