In this assignment we’ll start working with data from the Panel Study of Income Dynamics (PSID). If you would like more details on how these data are constructed, you should refer to Arellano, Blundell, and Bonhomme (2018).
To begin, let’s load the data and pull out the variables we are interested in using. These are person identifiers (person), year, total income (y), savings (tot_assets1) and age. You should bear in mind that it is by no means trivial to measure total income and total assets in these data. The variables we are looking at are the product of a lot of data cleaning and careful choices by the authors.
You are going to estimate the parameters of the income process in the simple savings model by matching the implied variances and covariances from the model to those that are calculated from the data.
Recall that the income process is:
\[ \log(y_{it}) = \mu_{t} + \varepsilon_{it} \]
where \(\varepsilon_{it}\) is an AR(1) process with autocorrelation \(\rho\) and stationary variance \(\sigma^2_{\eta} / (1-\rho^2)\), with \(\sigma^2_{\eta}\) the variance of the innovation. Thus, apart from the means \(\mu_t\), there are only two parameters dictating the income process: (\(\rho,\sigma_\eta\)).
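To make the variance formula concrete, here is a minimal simulation sketch. It assumes the recursion \(\varepsilon_{t} = \rho\varepsilon_{t-1} + \eta_{t}\) with normally distributed innovations, which the text above does not specify; the function name `simulate_ar1` and the parameter values are purely illustrative.

```julia
using Random, Statistics

# Simulate one long AR(1) path: ε_t = ρ ε_{t-1} + η_t, with η_t ~ N(0, σ_η²).
# Normal innovations are an assumption made only for this illustration.
function simulate_ar1(ρ, σ_η, T; rng = Random.default_rng())
    ε = zeros(T)
    ε[1] = σ_η / sqrt(1 - ρ^2) * randn(rng)   # initial draw from the stationary distribution
    for t in 2:T
        ε[t] = ρ * ε[t-1] + σ_η * randn(rng)
    end
    return ε
end

ρ, σ_η = 0.9, 0.3                              # illustrative values, not estimates
ε = simulate_ar1(ρ, σ_η, 200_000; rng = MersenneTwister(1))

sim_var    = var(ε)                            # sample variance of the simulated path
theory_var = σ_η^2 / (1 - ρ^2)                 # stationary variance from the formula
```

With a long enough path, `sim_var` should sit close to `theory_var`.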
Setup
To map to the model, assume that agents begin (\(t=1\)) when aged 25 and live for 40 years (so the “terminal” period is at age 64). Thus, we should filter the data to look at only these ages.
@subset!(data,:age.>=25,:age.<=64)
19139×6 DataFrame
   Row │ person  y       tot_assets1  asset    age    year
       │ Int64   Int64   Int64        Float64  Int64  Int64
───────┼───────────────────────────────────────────────────
     1 │  17118   54000        60000      0.0     49     98
     2 │  12630   61283       224000  39283.0     59     98
     3 │  12647   42300        28240      0.0     38     98
     4 │   5239   82275         7500      0.0     56     98
     5 │   2671   69501        48000   3600.0     35     98
     6 │  13027   68000       148000  20000.0     49     98
     7 │   6791   93758        80000    160.0     41     98
     8 │   6475   26581        23300      0.0     35     98
     9 │  18332   33785            0      0.0     42     98
    10 │   3856   55300       311000   5300.0     33     98
    11 │  19326   40200       105250      0.0     40     98
    12 │  21818   42500        13000      0.0     36     98
    13 │   7300  121508       178000  10008.0     59     98
     ⋮ │      ⋮       ⋮            ⋮        ⋮      ⋮      ⋮
 19128 │   6617  115887       241000  21346.0     62    108
 19129 │    626  128600        98000      0.0     46    108
 19130 │   4795  105000       -68000      0.0     34    108
 19131 │   3223  120000       132000      0.0     47    108
 19132 │   8098   26527         4700      0.0     37    108
 19133 │   8954  144026       220000     25.0     46    108
 19134 │  12990  122665       220000      0.0     53    108
 19135 │   8782   55000        69000      0.0     31    108
 19136 │  13059   42728       -10000      0.0     26    108
 19137 │  13535   57000            0      0.0     26    108
 19138 │   3806   87000        74200      0.0     26    108
 19139 │  11085   74000       -50000      0.0     31    108
                                        19114 rows omitted
Part 1
Estimate the parameters \(\mu_t\) using the sample mean of log income at each age. Create residuals \(\hat{\varepsilon}_{it}\) for each individual in each period using these estimates.
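One possible way to organize this step is sketched below on a toy data frame. The values are made up for illustration, and the column names `logy`, `mu_hat` and `eps_hat` are hypothetical; the same pattern could be applied to `data` after the age filter.

```julia
using DataFrames, Statistics

# Toy stand-in for the PSID extract (made-up values, for illustration only).
toy = DataFrame(person = [1, 2, 3, 1, 2, 3],
                age    = [30, 30, 30, 32, 32, 32],
                y      = [40_000, 50_000, 60_000, 45_000, 52_000, 70_000])

# Log income.
toy.logy = log.(toy.y)

# Mean of log income within each age cell; the scalar mean is repeated across the group.
toy = transform(groupby(toy, :age), :logy => mean => :mu_hat)

# Residual: log income minus the age-specific mean.
toy.eps_hat = toy.logy .- toy.mu_hat
```

By construction the residuals average to zero within each age, which is a quick check that the demeaning was done on the intended grouping.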
Part 2
The PSID data are taken biennially (every two years). Thus, write a function that takes a guess of \((\rho,\sigma_\eta)\) and calculates:
The unconditional variance of the residual.
The covariance of the residual with its two-year lag.
The covariance of the residual with its four-year lag.
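A minimal sketch of such a function follows, assuming that one model period is one year and that \(\varepsilon_{it}\) is stationary with \(|\rho|<1\). Under those assumptions the autocovariance at lag \(k\) is \(\rho^{k}\sigma^2_{\eta}/(1-\rho^2)\), so the three moments are the variance, \(\rho^2\) times the variance, and \(\rho^4\) times the variance. The name `model_moments` is illustrative.

```julia
# Model-implied moments of a stationary AR(1) observed every two years.
# Assumes |ρ| < 1 and annual model periods.
function model_moments(ρ, σ_η)
    v = σ_η^2 / (1 - ρ^2)            # unconditional variance of ε
    return [v, ρ^2 * v, ρ^4 * v]     # variance, two-year and four-year autocovariances
end

m = model_moments(0.5, 1.0)          # illustrative parameter values
```

Returning the moments as a vector in a fixed order makes it easy to compare them element by element with their sample counterparts.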
Part 3
Calculate the sample equivalent of these moments from the data, and write a function that calculates the sum of squared differences between the data and those predicted by a particular choice of \((\rho,\sigma_\eta)\).
If it helps, here is code to create the lags for income (you could adapt this code to create lags for the residuals you calculated in Part 1).
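# Two-year lag: shift year forward by 2 so that it lines up with the current observation
d1 = @chain data begin
    @select :year :person :y
    @transform :year = :year .+ 2
    @rename :ylag1 = :y
end

# Four-year lag: shift year forward by 4
d2 = @chain data begin
    @select :year :person :y
    @transform :year = :year .+ 4
    @rename :ylag2 = :y
end

# Keep only person-years for which both lags are observed
data = @chain data begin
    innerjoin(d1, on = [:person, :year])
    innerjoin(d2, on = [:person, :year])
end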
9785×8 DataFrame
  Row │ person  y       tot_assets1  asset    age    year   ylag1   ylag2
      │ Int64   Int64   Int64        Float64  Int64  Int64  Int64   Int64
──────┼───────────────────────────────────────────────────────────────────
    1 │  17118   43799        -2000      0.0     53    102   51700   54000
    2 │  12630   68554      1519000  29454.0     63    102  104104   61283
    3 │  12647   35000        78000      0.0     42    102   30500   42300
    4 │   5239   49948        29900      0.0     60    102   54332   82275
    5 │   2671   77000        84000      0.0     39    102   75000   69501
    6 │  13027   91000       248000  25000.0     53    102   50678   68000
    7 │   6791  122296       118650    154.0     45    102  100503   93758
    8 │  18332   54000        56000      0.0     46    102   40200   33785
    9 │   3856   95800       357000   9600.0     37    102   76400   55300
   10 │  21818   72334        25540      0.0     40    102   58700   42500
   11 │   7300   64319       710000   6130.0     63    102  111140  121508
   12 │  20796   88000        75000      0.0     48    102  112600  105000
   13 │   8455   50880       110500    130.0     53    102   46000   54000
    ⋮ │      ⋮       ⋮            ⋮        ⋮      ⋮      ⋮       ⋮       ⋮
 9774 │   3360  198300       775000   3300.0     40    108  161000  161000
 9775 │   1204   44710         5200      0.0     30    108   46360   35000
 9776 │   6483   55019        -3000      0.0     38    108   28000   22720
 9777 │   2182   53908         9000      0.0     32    108   51247   57625
 9778 │   3971   48406       108000  15905.0     43    108  109000   55400
 9779 │  12094   70500        83000      0.0     41    108   55830   10375
 9780 │  12975  104500       266800   2500.0     40    108   77240   98200
 9781 │   9940  133074       282000    274.0     45    108  102900  100300
 9782 │   8048  185500        88000  50000.0     45    108  183000   75200
 9783 │   2921   84845       390800    773.0     59    108   76507   77235
 9784 │  13562   39200        18000      0.0     43    108   35000   47500
 9785 │   3193   66000        37400      0.0     44    108   51000   61210
                                                         9760 rows omitted
An example of calculating covariances:
@chain data begin
    @combine begin
        :c1 = cov(log.(:y), log.(:ylag1))
        :c2 = cov(log.(:y), log.(:ylag2))
    end
end
1×2 DataFrame
 Row │ c1        c2
     │ Float64   Float64
─────┼────────────────────
   1 │ 0.402346  0.363079
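The objective described in Part 3 could be sketched as follows. This is only one way to set it up: the helper names `model_moments`, `sample_moments` and `ssq` are hypothetical, the stationary AR(1) formulas assume annual model periods, and the residual vectors are made-up numbers that stand in for \(\hat{\varepsilon}_{it}\) and its two lags.

```julia
using Statistics

# Model side: stationary AR(1) moments at lags of 0, 2 and 4 years.
function model_moments(ρ, σ_η)
    v = σ_η^2 / (1 - ρ^2)
    return [v, ρ^2 * v, ρ^4 * v]
end

# Data side: sample analogues from a residual and its two lags (plain vectors here).
sample_moments(e, e_lag1, e_lag2) = [var(e), cov(e, e_lag1), cov(e, e_lag2)]

# Sum of squared differences between data moments and model moments at θ = (ρ, σ_η).
ssq(θ, m_data) = sum(abs2, model_moments(θ[1], θ[2]) .- m_data)

# Made-up residuals, purely to show the call pattern.
e      = [0.10, -0.20, 0.30, 0.05, -0.15]
e_lag1 = [0.08, -0.10, 0.25, 0.00, -0.20]
e_lag2 = [0.05, -0.05, 0.20, 0.02, -0.10]

m_data = sample_moments(e, e_lag1, e_lag2)
obj    = ssq([0.9, 0.3], m_data)
```

Keeping the data moments as an argument rather than a global means they are computed once and then passed to the objective at each guess of the parameters.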
Part 4
Now estimate the income process parameters by minimizing this sum of squares (i.e. implement a minimum distance estimator with an identity weighting matrix).
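A minimal sketch of the minimization step is given below, using Optim.jl with Nelder-Mead as one possible optimizer (the assignment does not prescribe one). It assumes the stationary AR(1) moment formulas at lags 0, 2 and 4; the helper names are hypothetical, and the target moments are generated from known parameters rather than taken from the PSID. Because only \(\rho^2\) and \(\rho^4\) enter these moments, the sign of \(\rho\) is not pinned down, so the sketch restricts \(\rho\) to \((0,1)\) by assumption.

```julia
using Optim

# Model moments of a stationary AR(1) at lags 0, 2 and 4 (illustrative helper).
function model_moments(ρ, σ_η)
    v = σ_η^2 / (1 - ρ^2)
    return [v, ρ^2 * v, ρ^4 * v]
end

# Map an unconstrained vector x into ρ ∈ (0, 1) and σ_η > 0,
# so the optimizer cannot leave the admissible region.
unpack(x) = (1 / (1 + exp(-x[1])), exp(x[2]))

# Identity-weighted minimum distance objective.
function objective(x, m_data)
    ρ, σ_η = unpack(x)
    return sum(abs2, model_moments(ρ, σ_η) .- m_data)
end

# Placeholder target moments from known parameters (not PSID estimates).
m_data = model_moments(0.9, 0.3)

res = optimize(x -> objective(x, m_data), [0.0, 0.0], NelderMead())
ρ_hat, σ_hat = unpack(Optim.minimizer(res))
```

Running the routine on moments generated from known parameters, as here, is a useful check that it recovers them before it is pointed at the data moments.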
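Now suppose instead that the income process is
\[ \log(y_{it}) = \mu_{t} + \varepsilon_{it} + \zeta_{it} \]
where \(\zeta_{it}\) is an additional shock to income that is completely iid (i.e. no persistence). Suppose we estimate the persistence parameter \(\rho\) using the moments of the original model (which is now misspecified).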
Does the population limit of our estimator over- or under-estimate \(\rho\), the persistence in \(\varepsilon\)?
References
Arellano, Manuel, Richard Blundell, and Stephane Bonhomme. 2018. “Nonlinear Persistence and Partial Insurance: Income and Consumption Dynamics in the PSID.” AEA Papers and Proceedings 108 (May): 281–86. https://doi.org/10.1257/pandp.20181049.