Some Hints and Tricks for Assignment 3

Adapting Code for Automatic Differentiation

Depending on how much you played around last week, you may have found that methods like QuadGK don’t play nicely with automatic differentiation. This is because QuadGK adjusts the number of nodes adaptively.

One solution is to use a fixed number of nodes and weights with FastGaussQuadrature. Here is a simple example.

using FastGaussQuadrature, QuadGK, ForwardDiff, Distributions, Optim, Roots

# a function to integrate using quadrature (default: 10 nodes)
function integrateGL(f,a,b ; num_nodes = 10)
    nodes, weights = gausslegendre( num_nodes )
    ∫f = 0.
    for k in eachindex(nodes)
        x = (a+b)/2 + (b - a)/2 * nodes[k]
        ∫f += weights[k] * f(x)
    end
    return (b - a)/2 * ∫f
end

dS(x ; F,β,δ) = (1-cdf(F,x)) / (1-β*(1-δ))
res_wage_1(wres , b,λ,δ,β,F) = wres - b - β * λ * integrateGL(x->dS(x;F,β,δ),wres,quantile(F,0.9999))
res_wage_2(wres , b,λ,δ,β,F) = wres - b - β * λ * quadgk(x->dS(x;F,β,δ),wres,Inf)[1]

ForwardDiff.derivative(x->res_wage_1(x,0.,0.5,0.03,0.99,LogNormal(0,1)),1.) 
#ForwardDiff.derivative(x->res_wage_2(x,0.,0.5,0.03,0.99,LogNormal(0,1)),1.) <- you can try running this and you will see an error

7.224647846541125

Re-writing the model solution

Based on this, we’re going to re-write the model solution using this new integration routine. We will also use Roots to solve for the reservation wage in a way that will also play nicely with ForwardDiff.

res_wage(wres , b,λ,δ,β,F::Distribution) = wres - b - β * λ * integrateGL(x->dS(x;F,β,δ),wres,quantile(F,0.9999))
pars = (;b = -5.,λ = 0.45,δ = 0.03,β = 0.99,F = LogNormal(1,1))


function solve_res_wage(b,λ,δ,β,F)
    return find_zero(x->res_wage(x,b,λ,δ,β,F),eltype(b)(4.)) #<- initial guess of $4/hour
end
solve_res_wage(0.,0.4,0.03,0.995,LogNormal())
# testing that we can take a derivative here:
ForwardDiff.derivative(x->solve_res_wage(x,0.4,0.04,0.995,LogNormal()),0.)

0.45978046027514413

Cleaning the Data

None of this is really different from before, but we added a function that pulls a named tuple out for a specific demographic group. You might find that useful for your assignment.

using CSV, DataFrames, DataFramesMeta, Statistics

data = CSV.read("../data/cps_00019.csv",DataFrame)
data = @chain data begin
    @transform :E = :EMPSTAT.<21
    @transform @byrow :wage = begin
        if :PAIDHOUR==0
            return missing
        elseif :PAIDHOUR==2
            if :HOURWAGE<99.99 && :HOURWAGE>0
                return :HOURWAGE
            else
                return missing
            end
        elseif :PAIDHOUR==1
            if :EARNWEEK>0 && :UHRSWORKT<997 && :UHRSWORKT>0
                return :EARNWEEK / :UHRSWORKT
            else
                return missing
            end
        end
    end
    @subset :MONTH.==1
    @select :AGE :SEX :RACE :EDUC :wage :E :DURUNEMP
    @transform begin
        :bachelors = :EDUC.>=111
        :nonwhite = :RACE.!=100 
        :female = :SEX.==2
        :DURUNEMP = round.(:DURUNEMP .* 12/52)
    end
end

# the whole dataset in a named tuple
wage_missing = ismissing.(data.wage)
wage = coalesce.(data.wage,1.)
N = length(data.AGE)
X = [ones(N) data.bachelors data.female data.nonwhite]
# create a named tuple with all variables to conveniently pass to the log-likelihood:
d = (;logwage = log.(wage),wage_missing,E = data.E,tU = data.DURUNEMP, X) #<- you will need to add your demographics as well.

function get_data(data,C,F,R)
    data = @subset data :bachelors.==C :female.==F :nonwhite.==R
    wage_missing = ismissing.(data.wage)
    wage = coalesce.(data.wage,1.)
    N = length(data.AGE)
    # create a named tuple with all variables to conveniently pass to the log-likelihood:
    return d = (;logwage = log.(wage),wage_missing,E = data.E,tU = data.DURUNEMP) 
end

dx = get_data(data,1,0,0) #<- data for white men with a college degree

(logwage = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.516124491189194, 3.872798692268385, 0.0  …  2.7762279256323206, 0.0, 0.0, 0.0, 2.976307324928243, 0.0, 0.0, 3.410759848526933, 0.0, 0.0], wage_missing = Bool[1, 1, 1, 1, 1, 1, 1, 0, 0, 1  …  0, 1, 1, 1, 0, 1, 1, 0, 1, 1], E = Bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1  …  1, 1, 1, 1, 1, 1, 1, 1, 1, 1], tU = [231.0, 231.0, 231.0, 231.0, 231.0, 231.0, 231.0, 231.0, 231.0, 231.0  …  231.0, 231.0, 231.0, 231.0, 231.0, 231.0, 231.0, 231.0, 231.0, 231.0])

Coding the log-likelihood with one set of parameters

In your assignment you will estimate the model for one set of demographic characteristics by directly estimating \((h,\delta,\mu,\sigma,w^*)\) and backing out \(b\) and \(\lambda\). Here is a log-likelihood function that you could use (feel free to write your own).

ϕ(x,μ,σ) = pdf(Normal(μ,σ),x)
Φ(x,μ,σ) = cdf(Normal(μ,σ),x)
# a function for the log-likelihood of observed wages (integrating out measurement error)
function logwage_likelihood(logwage,F,σζ,wres)
    f(x) = pdf(F,x) / (1-cdf(F,wres)) * ϕ(logwage,log(x),σζ)
    ub = quantile(F,0.9999)
    return integrateGL(f,wres,ub)
end
# a function to get the log-likelihood of a single observation
function log_likelihood(n,data,pars)
    (;h,δ,wres,F,σζ) = pars
    ll = 0.
    if data.E[n]
        ll += log(h) - log(h + δ)
        if !data.wage_missing[n]
            ll += logwage_likelihood(data.logwage[n],F,σζ,wres)
        end
    else
        ll += log(δ) - log(h + δ)
        ll += log(h) + data.tU[n] * log(1-h)
    end
    return ll
end

# a function to map the vector x into parameters
logit(x) = exp(x) / (1+exp(x))
logit_inv(x) = log(x/(1-x))
function update(pars,x)
    h = logit(x[1])
    δ = logit(x[2])
    μ = x[3]
    σ = exp(x[4])
    wres = exp(x[5])
    F = LogNormal(μ,σ)
    σζ = 0.05
    return (;pars...,h,δ,μ,σ,wres,F,σζ)
end
# a function to iterate over all observations
function log_likelihood_obj(x,pars,data)
    pars = update(pars,x)
    ll = 0.
    for n in eachindex(data.E)
        ll += log_likelihood(n,data,pars)
    end
    return ll / length(data.E)
end

log_likelihood_obj (generic function with 1 method)

With this log-likelihood we can pass straight to Optim.

x0 = [logit_inv(0.5),logit_inv(0.03),2.,log(1.),log(5.)]
pars = (;σζ = 0.05, β = 0.995)
log_likelihood_obj(x0,pars,dx) #<- test.
res = optimize(x->-log_likelihood_obj(x,pars,dx),x0,Newton(),Optim.Options(show_trace=true))

Iter     Function value   Gradient norm 
     0     1.566021e-01     9.112564e-02
 * time: 8.916854858398438e-5
     1     1.124900e-01     8.425665e-02
 * time: 0.18092608451843262
     2     9.438133e-02     7.424673e-02
 * time: 0.3399372100830078
     3     7.262438e-02     9.064264e-02
 * time: 0.49828410148620605
     4     6.527359e-02     4.043801e-01
 * time: 0.6780951023101807
     5     5.390735e-02     7.264660e-02
 * time: 0.8790051937103271
     6     4.456252e-02     5.104587e-02
 * time: 1.2180511951446533
     7     4.419638e-02     4.246399e-02
 * time: 1.492077112197876
     8     4.331776e-02     9.864366e-02
 * time: 1.6455280780792236
     9     4.279575e-02     8.358208e-02
 * time: 1.8211140632629395
    10     4.266464e-02     3.147197e-02
 * time: 1.9958012104034424
    11     4.263086e-02     1.838941e-02
 * time: 2.1707379817962646
    12     4.262640e-02     3.127715e-03
 * time: 2.3470070362091064
    13     4.262633e-02     3.959605e-07
 * time: 2.499476194381714
    14     4.262633e-02     2.184184e-08
 * time: 2.674283981323242
    15     4.262633e-02     7.462975e-10
 * time: 2.744335174560547

 * Status: success

 * Candidate solution
    Final objective value:     4.262633e-02

 * Found with
    Algorithm:     Newton's Method

 * Convergence measures
    |x - x'|               = 6.68e-08 ≰ 0.0e+00
    |x - x'|/|x'|          = 1.20e-08 ≰ 0.0e+00
    |f(x) - f(x')|         = 8.81e-15 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 2.07e-13 ≰ 0.0e+00
    |g(x)|                 = 7.46e-10 ≤ 1.0e-08

 * Work counters
    Seconds run:   3  (vs limit Inf)
    Iterations:    15
    f(x) calls:    54
    ∇f(x) calls:   54
    ∇²f(x) calls:  15

pars = update(pars,res.minimizer)

(σζ = 0.05, h = 0.17641764242681596, δ = 0.003828372201989135, μ = 2.416002041470427, σ = 1.147127370295339, wres = 18.6280911963576, F = LogNormal{Float64}(μ=2.416002041470427, σ=1.147127370295339))

We could back out estimates of \(b\) and \(\lambda\) as follows:

λ = pars.h / (1 - cdf(pars.F,pars.wres))
b = pars.wres - pars.β * λ * integrateGL(x->dS(x;pars.F,pars.β,pars.δ),pars.wres,quantile(pars.F,0.9999))

-601.9015559474371

What about standard errors? Some parameters (such as \(\delta\)) are transformations of the estimated vector. Here we need the delta method, which tells us that if \(\mathbb{V}(\hat{\theta}) = v\) and \(\beta = f(\theta)\) then \(\mathbb{V}(\hat{\beta}) = f'(\theta)^2v\):

H = ForwardDiff.hessian(x->log_likelihood_obj(x,pars,dx),res.minimizer)
N = length(dx.E)
avar = inv(-H) #<- asymptotic variance estimate
std_err_δ = pars.δ * sqrt(avar[2,2] / N)

0.00037289231946497606

One final note. You could automate this whole calculation by writing a function that returns all your transformed parameters of interest as a function of the estimated vector, then calling ForwardDiff:

using LinearAlgebra
function final_ests(x_est,pars)
    pars = update(pars,x_est)
    (;h,δ,μ,σ,wres,F,β) = pars
    λ = h / (1 - cdf(F,wres))
    b = wres - β * λ * integrateGL(x->dS(x;F,β,δ),wres,quantile(F,0.9999))
    return [h , δ, μ, σ, wres, λ, b]
end
dF = ForwardDiff.jacobian(x->final_ests(x,pars),res.minimizer)
var_est = dF * avar * dF' / N #<- this is the variance of the estimates implied by the delta method
# now a table with estimates and standard errors:
p_str = ["h","δ","μ","σ","wres","λ","b"]
DataFrame(par = p_str,est = final_ests(res.minimizer,pars),se = sqrt.(diag(var_est)))

7×3 DataFrame

Row	par	est	se
	String	Float64	Float64
1	h	0.176418	0.0114358
2	δ	0.00382837	0.000371465
3	μ	2.416	0.167726
4	σ	1.14713	0.044975
5	wres	18.6281	0.178748
6	λ	0.536668	0.0841872
7	b	-601.902	31.4198

Beautiful!