Assignment 1

Setup: Loading the Data

Here is code to load the dataset and do a little cleaning / filtering.

The sample is all mothers in the PSID who are unmarried at the time of their first childbirth.

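using CSV, DataFrames, DataFramesMeta

data = @chain begin
    # Read the panel file, treating "NA" as missing
    CSV.read("../children-cash-transfers/data/MainPanelFile.csv", DataFrame, missingstring = "NA")
    # Keep only the variables used in this assignment
    @select :MID :year :wage :hrs :earn :SOI :CPIU :WelfH :FSInd
    # Restrict the sample to 1985-2010
    @subset :year.>=1985 :year.<=2010
    # AFDC indicator: true when WelfH is positive
    @transform :AFDC = :WelfH.>0
    @rename :FS = :FSInd
end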
89747×10 DataFrame
89722 rows omitted
Row    MID      year   wage      hrs      earn      SOI    CPIU      WelfH     FS       AFDC
       Int64    Int64  Float64?  Int64?   Float64?  Int64  Float64   Float64?  Int64?   Bool?
1      4031     1990   missing   missing  missing   43     0.758793  0.0       0        false
2      4031     1991   missing   missing  missing   43     0.790786  0.0       0        false
3      4031     1992   missing   missing  missing   43     0.814835  0.0       1        false
4      4031     1993   missing   missing  missing   43     0.839034  0.0       0        false
5      4031     1994   missing   0        0.0       43     0.860812  1704.0    1        true
6      4031     1995   missing   0        0.0       43     0.88496   1704.0    1        true
7      4031     1996   missing   0        0.0       43     0.910948  1704.0    1        true
8      4031     1997   missing   missing  missing   43     0.932244  missing   missing  missing
9      4031     1998   missing   0        0.0       43     0.946664  0.0       1        false
10     4031     1999   missing   missing  missing   43     0.967426  missing   missing  missing
11     4031     2000   3.33333   120      400.0     43     1.0       0.0       0        false
12     4031     2001   missing   missing  missing   43     1.02817   missing   missing  missing
13     4031     2002   missing   0        0.0       43     1.04457   0.0       0        false
⋮
89736  9308002  1999   missing   missing  missing   39     0.967426  missing   missing  missing
89737  9308002  2000   missing   missing  missing   39     1.0       missing   missing  missing
89738  9308002  2001   missing   missing  missing   39     1.02817   missing   missing  missing
89739  9308002  2002   missing   missing  missing   39     1.04457   missing   missing  missing
89740  9308002  2003   missing   missing  missing   39     1.06857   missing   missing  missing
89741  9308002  2004   missing   missing  missing   39     1.09708   missing   missing  missing
89742  9308002  2005   missing   missing  missing   39     1.13401   missing   missing  missing
89743  9308002  2006   missing   missing  missing   39     1.17054   missing   missing  missing
89744  9308002  2007   missing   missing  missing   39     1.20414   missing   missing  missing
89745  9308002  2008   missing   missing  missing   39     1.25008   missing   missing  missing
89746  9308002  2009   missing   missing  missing   39     1.24608   missing   missing  missing
89747  9308002  2010   missing   missing  missing   39     1.26647   missing   missing  missing

You may be unfamiliar with some of these commands, which come from DataFrames and DataFramesMeta. In particular, think of the @chain macro as a way to compose functions: the result of each line is passed as the first argument to the function call on the next line. For example:

d1 = @chain d2 begin
    func1(x)
    func2(y)
    func3(z)
end

is equivalent to calling:

d1 = func3(func2(func1(d2,x),y),z)
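
To make the equivalence concrete, here is a minimal sketch using only Base functions on a made-up vector (the vector and variable names are illustrative and not part of the assignment data). It assumes @chain is available through DataFramesMeta, as in the setup code above.

```julia
using DataFramesMeta  # makes @chain (from Chain.jl) available, as in the setup code

# Chained form: each result becomes the first argument of the next call.
chained = @chain [3, 1, 2] begin
    sort(rev = true)   # sort([3, 1, 2], rev = true)
    first(2)           # first(<sorted vector>, 2)
    sum                # a bare function name is called on the previous result
end

# The same computation written as nested calls.
nested = sum(first(sort([3, 1, 2], rev = true), 2))

println(chained == nested)
```

Both forms compute the sum of the two largest elements; the chained form simply reads top to bottom instead of inside out.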

If you want to understand these macros better, the DataFramesMeta.jl and Chain.jl documentation (or a quick search online) is your friend!

Question 1

Calculate average welfare participation (AFDC) by year and plot it. What do you think happened with welfare participation in 1996 and after? If you don’t know the historical context, a quick search online or a read of this paper should help you out.

If you are new to Julia, here is code that calculates and plots average hours by year to get you started.

using StatsPlots, Statistics

d = @chain data begin
    groupby(:year)
    @combine :Hours = mean(skipmissing(:hrs))
    @subset .!isnan.(:Hours)
end

@df d plot(:year,:Hours, legend = :none, linewidth = 2)
xlabel!("Year")
ylabel!("Average Hours")
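
The @subset line in the code above is worth a note: when every value of hrs in a year is missing, the mean over the (empty) skipmissing iterator is NaN, and that row is dropped before plotting. Here is a minimal sketch on a hypothetical toy DataFrame (the values are made up for illustration):

```julia
using DataFrames, DataFramesMeta, Statistics

# Hypothetical toy data: 1991 has no observed hours at all.
toy = DataFrame(year = [1990, 1990, 1991, 1991],
                hrs  = [10, missing, missing, missing])

by_year = @chain toy begin
    groupby(:year)
    @combine :Hours = mean(skipmissing(:hrs))   # NaN when a year has no data
end

# Drop the years whose average is NaN, as in the code above.
clean = @subset(by_year, .!isnan.(:Hours))
```

The same pattern should carry over to other columns with missing values, although whether the NaN filter is needed depends on how the data are distributed across years.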

Question 2

Now write code to

  1. Deflate earnings by CPI (CPIU).
  2. Calculate annual average earnings for each individual (identified by MID).
  3. Drop individuals with fewer than 10 years of data.
  4. Categorize individuals by whether their average earnings are below or above the median across individuals.
  5. Plot average participation in each year for individuals in each of these two categories.

Do you think this pattern is likely to be generated by a model without persistent unobserved heterogeneity? There is no strictly correct answer here. The aim is just to read what you think.

In case it helps, here is code for the first three steps. You could edit this to add additional operations to the chain or work with d directly.

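d = @chain data begin
    # Step 1: deflate earnings by the CPI
    @transform :earn = :earn ./ :CPIU
    groupby(:MID)
    # Step 2: count non-missing years (T) and average real earnings per individual
    @combine :T = sum(.!ismissing.(:earn)) :earn = mean(skipmissing(:earn))
    # Step 3: keep individuals with at least 10 years of data
    @subset :T .>= 10
end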
1089×3 DataFrame
1064 rows omitted
Row   MID      T      earn
      Int64    Int64  Float64
1     4031     10     40.0
2     4179     16     6089.12
3     7030     11     4500.53
4     41007    12     16147.7
5     41008    11     0.0
6     45030    11     14374.3
7     45031    11     17693.5
8     47031    11     15946.7
9     84005    18     30769.6
10    105030   14     4633.61
11    106173   13     17714.4
12    122173   13     56275.2
13    126003   19     6705.08
⋮
1078  6843006  19     1805.93
1079  6843173  19     5005.44
1080  6845005  19     19795.4
1081  6849005  19     2450.55
1082  6849188  15     23259.0
1083  6853003  19     4856.42
1084  6862005  11     18899.4
1085  6862008  19     26634.8
1086  6864002  19     8695.92
1087  6864003  18     13627.5
1088  6867013  13     389.097
1089  6872171  17     3294.07
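
For the remaining steps, one possible pattern (not the only one) is to flag individuals relative to the median, join that flag back onto the panel, and average by year and group. The sketch below uses hypothetical toy data; the names avg, panel, and :above are made up for illustration and are not part of the assignment files.

```julia
using DataFrames, DataFramesMeta, Statistics

# Hypothetical toy data: one row per individual with average earnings.
avg = DataFrame(MID = [1, 2, 3, 4], earn = [100.0, 400.0, 200.0, 300.0])
# Flag individuals whose average earnings exceed the cross-individual median.
avg = @transform(avg, :above = :earn .> median(:earn))

# Hypothetical toy panel: participation by individual and year.
panel = DataFrame(MID  = [1, 2, 3, 4, 1, 2, 3, 4],
                  year = [1990, 1990, 1990, 1990, 1991, 1991, 1991, 1991],
                  AFDC = [true, false, true, false, true, false, false, false])

by_group = @chain panel begin
    innerjoin(avg, on = :MID)        # attach the flag; drops unmatched individuals
    groupby([:year, :above])
    @combine :AFDC = mean(:AFDC)     # participation rate by year and group
    sort([:year, :above])
end
```

With the real data, AFDC contains missing values, so the mean would likely need skipmissing as in the earlier examples.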

Question 3

This question familiarizes you with the module Transfers.jl, which enables you to calculate income net of taxes and transfers for individuals given their earnings, non-labor income, state, year, and family size. The function budget in this module takes the arguments:

  • E: monthly earnings (either real or nominal)
  • N: monthly non-labor income (real or nominal)
  • SOI: the SOI code for state of residence
  • year: calendar year
  • num_kids: the number of children
  • cpi: set to 1. if E and N are nominal
  • p: equal to 0 if no programs, 1 if food stamps, 2 if food stamps + welfare.

For example, the function call:

Transfers.budget(500.,0.,23,2000,2,1.,2)

calculates net income for a mother in Michigan (SOI code 23) with 2 kids, nominal labor income of $500 a month in the year 2000, and no non-labor income.

include("../children-cash-transfers/src/Transfers.jl")

Transfers.budget(500.,0.,23,2000,2,1.,2)
978.5408333333334
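
To evaluate a function like this over a range of earnings, a comprehension over an earnings grid is usually enough. The sketch below uses a hypothetical stand-in, toy_budget, with the same argument order as the call above so that it runs without the module; its benefit formula is invented and does not reflect any actual program rules. Measuring net transfers as the gap between income with programs (p = 2) and without (p = 0) is also an assumption about one reasonable definition.

```julia
# Hypothetical stand-in for Transfers.budget (same argument order, invented formula).
function toy_budget(E, N, SOI, year, num_kids, cpi, p)
    benefit = p > 0 ? max(0.0, 400.0 - 0.5 * E) : 0.0   # made-up benefit schedule
    return E + N + benefit
end

# Monthly earnings grid from $0 to $1,000 in steps of $100.
earnings = 0.0:100.0:1000.0

# Assumed definition: income with both programs minus income with none.
net_transfers = [toy_budget(E, 0.0, 23, 2000, 2, 1.0, 2) -
                 toy_budget(E, 0.0, 23, 2000, 2, 1.0, 0) for E in earnings]
```

Replacing toy_budget with Transfers.budget and repeating the comprehension for each state and year gives one series per line of the graph.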

Create a graph that represents total net transfers for a single mother with two kids in the years 1990 and 2000 and in the states of Mississippi and New York. Depict these transfers as a function of earnings between the values of 0 and $1,000 a month (nominal). You can assume that all households are receiving both food stamps and welfare.

What do you make of the differences in these transfers across states and over time?