Recall from class that, when bootstrapping, we have to be careful to resample at the right unit level. Since observations on the same person are correlated, we have to sample persons (not individual observations) with replacement, keeping each sampled person's full set of observations. Here’s a function that creates a bootstrapped panel dataset by sampling individuals with replacement.
function draw_bootstrap_sample(data)
    id = unique(data.person) #<- create a list of all person identifiers in the data
    N = length(id)
    id_boot = id[rand(1:N,N)] #<- sample individuals randomly N times with replacement
    d = DataFrame(person = id_boot, id_boot = 1:N) #<- add a unique identifier for each draw (:id_boot)
    boot_data = innerjoin(d,data,on=:person)
    @transform!(boot_data,:id_old = :person, :person = :id_boot) #<- since we are creating lags using the person identifier, we need this to be unique for each draw.
    return boot_data
end
db = draw_bootstrap_sample(data)
19078×8 DataFrame (19053 rows omitted)

| Row | person | id_boot | y | tot_assets1 | asset | age | year | id_old |
|---|---|---|---|---|---|---|---|---|
| | Int64 | Int64 | Int64 | Int64 | Float64 | Int64 | Int64 | Int64 |
| 1 | 4489 | 4489 | 42300 | 28240 | 0.0 | 38 | 98 | 12647 |
| 2 | 4158 | 4158 | 82275 | 7500 | 0.0 | 56 | 98 | 5239 |
| 3 | 3180 | 3180 | 69501 | 48000 | 3600.0 | 35 | 98 | 2671 |
| 4 | 937 | 937 | 68000 | 148000 | 20000.0 | 49 | 98 | 13027 |
| 5 | 4278 | 4278 | 93758 | 80000 | 160.0 | 41 | 98 | 6791 |
| 6 | 4855 | 4855 | 93758 | 80000 | 160.0 | 41 | 98 | 6791 |
| 7 | 590 | 590 | 26581 | 23300 | 0.0 | 35 | 98 | 6475 |
| 8 | 3136 | 3136 | 55300 | 311000 | 5300.0 | 33 | 98 | 3856 |
| 9 | 4937 | 4937 | 55300 | 311000 | 5300.0 | 33 | 98 | 3856 |
| 10 | 2966 | 2966 | 42500 | 13000 | 0.0 | 36 | 98 | 21818 |
| 11 | 1300 | 1300 | 121508 | 178000 | 10008.0 | 59 | 98 | 7300 |
| 12 | 3010 | 3010 | 121508 | 178000 | 10008.0 | 59 | 98 | 7300 |
| 13 | 3144 | 3144 | 121508 | 178000 | 10008.0 | 59 | 98 | 7300 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 19067 | 2635 | 2635 | 128600 | 98000 | 0.0 | 46 | 108 | 626 |
| 19068 | 1170 | 1170 | 105000 | -68000 | 0.0 | 34 | 108 | 4795 |
| 19069 | 3701 | 3701 | 105000 | -68000 | 0.0 | 34 | 108 | 4795 |
| 19070 | 5023 | 5023 | 105000 | -68000 | 0.0 | 34 | 108 | 4795 |
| 19071 | 1703 | 1703 | 26527 | 4700 | 0.0 | 37 | 108 | 8098 |
| 19072 | 2410 | 2410 | 26527 | 4700 | 0.0 | 37 | 108 | 8098 |
| 19073 | 4654 | 4654 | 144026 | 220000 | 25.0 | 46 | 108 | 8954 |
| 19074 | 1207 | 1207 | 55000 | 69000 | 0.0 | 31 | 108 | 8782 |
| 19075 | 4153 | 4153 | 55000 | 69000 | 0.0 | 31 | 108 | 8782 |
| 19076 | 2238 | 2238 | 42728 | -10000 | 0.0 | 26 | 108 | 13059 |
| 19077 | 3722 | 3722 | 42728 | -10000 | 0.0 | 26 | 108 | 13059 |
| 19078 | 4025 | 4025 | 42728 | -10000 | 0.0 | 26 | 108 | 13059 |
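To see why sampling at the person level preserves the within-person correlation, it can help to run the same resampling steps on a tiny panel. The sketch below is only an illustration: the `toy` data frame and its numbers are made up, and it reproduces the sampling and join steps of `draw_bootstrap_sample` without the final relabelling.

```julia
using DataFrames, Random

# Toy panel (made-up numbers): 3 people, each observed in 2 years.
toy = DataFrame(person = [1, 1, 2, 2, 3, 3],
                year   = [98, 100, 98, 100, 98, 100],
                y      = [10.0, 11.0, 20.0, 21.0, 30.0, 31.0])

Random.seed!(1)
ids  = unique(toy.person)
draw = ids[rand(1:length(ids), length(ids))]   # resample people, not rows
d    = DataFrame(person = draw, id_boot = 1:length(draw))
boot = innerjoin(d, toy, on = :person)         # each draw brings the person's full history

# Count observations per bootstrap draw: every draw should carry both years.
rows_per_draw = combine(groupby(boot, :id_boot), nrow => :n_obs)
```

Because the toy panel is balanced, every `id_boot` ends up with exactly two rows, even when the same person is drawn more than once. In an unbalanced panel the total number of rows will vary from draw to draw.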
Bootstrapping for a Weighting Matrix
Suppose we would like to weight our minimum distance criterion using the inverse of the variance of each statistic. A simple way to compute these variances is to use the bootstrap. Here is code to do that:
using Random
function boot_moment_variance(data,B ; seed = 1010)
    M = zeros(3,B)
    Random.seed!(seed)
    for b in axes(M,2)
        db = draw_bootstrap_sample(data)
        M[:,b] = get_moments(db)
    end
    V = cov(M')
end
V_mom = boot_moment_variance(data,100)
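One way to turn a bootstrapped covariance matrix into a weighting matrix is sketched below. This is a minimal illustration under stated assumptions: the matrix `V` holds made-up numbers standing in for `V_mom`, and `distance` and `g` are hypothetical names for the weighted criterion and the gap between data and model moments. Weighting by "the inverse of the variance of each statistic" corresponds to the diagonal version; the full inverse is shown only for comparison.

```julia
using LinearAlgebra

# Placeholder covariance matrix standing in for V_mom (made-up numbers).
V = [4.0 0.5 0.0;
     0.5 2.0 0.1;
     0.0 0.1 0.25]

# Diagonal weighting: each moment is weighted by 1 / its bootstrap variance.
W_diag = Diagonal(1 ./ diag(V))

# Full inverse, for comparison: this also uses covariances between moments.
W_full = inv(Symmetric(V))

# Weighted distance for a vector of moment gaps g (data moments minus model moments).
distance(g, W) = g' * W * g

g = [1.0, -1.0, 0.5]        # hypothetical moment gaps
q_diag = distance(g, W_diag)
q_full = distance(g, W_full)
```

With the diagonal weights, a moment that is noisily estimated (large variance) contributes less to the criterion than one that is precisely estimated.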
Now we can apply the exact same routine as before. In this case, let’s return the entire bootstrapped sample of parameters instead of just the variance.
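A minimal sketch of collecting every bootstrap estimate, rather than only their covariance, might look like the following. The estimation routine itself is not shown in this section, so the function takes it as an argument: `boot_parameters`, `estimate`, and `resample` are hypothetical names, and the toy data, `toy_resample`, and `toy_estimate` exist only to make the example runnable.

```julia
using Random, Statistics

# Returns a K × B matrix with one column of parameter estimates per bootstrap draw.
# `estimate` maps a dataset to a parameter vector; `resample` draws a bootstrap dataset.
function boot_parameters(data, B, estimate, resample; seed = 1010)
    Random.seed!(seed)
    draws = [estimate(resample(data)) for b in 1:B]
    return reduce(hcat, draws)
end

# Toy stand-ins (assumptions for illustration only).
toy = collect(1.0:20.0)
toy_resample(d) = d[rand(1:length(d), length(d))]   # sample observations with replacement
toy_estimate(d) = [mean(d), var(d)]                 # two "parameters"

P = boot_parameters(toy, 50, toy_estimate, toy_resample)
V_par = cov(P')    # the variance can still be recovered from the full sample
```

In the setting of these notes, `resample` would be `draw_bootstrap_sample` and `estimate` would wrap the minimum distance routine with the chosen weighting matrix. Keeping the full matrix `P` makes it possible to look at quantiles or histograms of the estimates, not only their covariance.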
Try comparing the variance of these estimators using the estimated optimal weighting matrix to the variance using the identity matrix. Does it make an appreciable difference to the estimated distribution of the parameter estimates?
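One possible way to organize this comparison is sketched here. It assumes the bootstrapped estimates under each weighting matrix have been stored as K × B matrices (one row per parameter); `compare_spread`, `P_opt`, and `P_id` are hypothetical names, and the matrices below contain placeholder numbers rather than real estimates.

```julia
using Statistics

# Compare the bootstrap standard deviation of each parameter under two weighting matrices.
function compare_spread(P_opt, P_id)
    sd_opt = vec(std(P_opt, dims = 2))   # spread across draws, optimal weights
    sd_id  = vec(std(P_id, dims = 2))    # spread across draws, identity weights
    return (sd_opt = sd_opt, sd_id = sd_id, ratio = sd_opt ./ sd_id)
end

# Placeholder inputs: the second matrix is the first scaled by 2, so every ratio is 0.5.
P_opt = [1.0 2.0 3.0; 10.0 10.0 13.0]
P_id  = 2 .* P_opt
res = compare_spread(P_opt, P_id)
```

A ratio well below one for a parameter would suggest that the estimated weighting matrix tightens that parameter's bootstrap distribution relative to identity weights.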