clear

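*The two settings below only matter in older Stata: set memory is obsolete from version 12 and set matsize from version 16.
set memory 1g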

set matsize 1000

*load data

use "C:\Users\Goosephie\Desktop\GradQuant\Panel\psid.dta", clear

* This panel contains observations on tenure, race, education, union membership, age and log of hourly wages

*Pooled OLS

reg lnhourlywage tenure tenuresquared age agesquared black union educ

*Tell Stata that this is a panel and that it has id and time identifiers
xtset id year

*FE regression. Adding robust after fe is the same as adding vce(cluster id)
xtreg lnhourlywage tenure tenuresquared age agesquared black union educ, fe

*black and educ are dropped because they do not vary over time within an individual.

*The return to 5 years of tenure is 0.0791665
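
*One way to reproduce this number from the FE estimates above (a sketch that assumes tenuresquared was generated as tenure^2):
*the return to 5 years of tenure is 5*b[tenure] + 25*b[tenuresquared], and lincom also reports its standard error.
lincom 5*tenure + 25*tenuresquared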

*One of the ways to implement LSDV regression. absorb(id) already includes a dummy for every id, so i.id must not be listed again.
areg lnhourlywage tenure tenuresquared age agesquared black union educ, absorb(id)

*You can also add vce(cluster id) at the end so that the standard errors take within-individual correlation into account.
*However, with a short panel the correct clustered standard errors are those from xtreg. More on this can be found in Cameron and Trivedi.
*When reporting R-squared, use the one from areg or LSDV.
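
*A possible side-by-side check of the clustered standard errors (a sketch, not part of the original analysis; FE_cl and AREG_cl are arbitrary labels chosen here).
*The coefficients should coincide; the areg standard errors are typically larger because areg counts the absorbed dummies in its degrees-of-freedom adjustment.
quietly xtreg lnhourlywage tenure tenuresquared age agesquared union, fe vce(cluster id)
estimates store FE_cl
quietly areg lnhourlywage tenure tenuresquared age agesquared union, absorb(id) vce(cluster id)
estimates store AREG_cl
estimates table FE_cl AREG_cl, b se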

*You can get the coefficients on the dummies, but remember that in a short panel these individual-effect estimates are specific to the sample and are not consistent.
reg lnhourlywage tenure tenuresquared age agesquared black union educ i.id

*We can get exactly the same result by creating the dummy variables ourselves (the calculation is the same).
tabulate id, generate(dummyid)

*Then regress as before, adding all the dummies that were created. There are over 900 of them, so use the wildcard dummyid* instead of typing each name.
*reg lnhourlywage tenure tenuresquared age agesquared black union educ dummyid*

*Yet another way to perform the same calculation
xi: reg lnhourlywage tenure tenuresquared age agesquared black union educ i.id

*Demeaned regression = FE. First calculate the individual means, then subtract them from each variable, then run the regression on the demeaned variables. Notice that the fixed effects drop out.
*bysort sorts by id first, so these commands do not fail if the data are not already sorted.
bysort id: egen union_mean=mean(union)

bysort id: egen educ_mean=mean(educ)

bysort id: egen black_mean=mean(black)

bysort id: egen lnhourlywage_mean=mean(lnhourlywage)

bysort id: egen hourlywage_mean=mean(hourlywage)

bysort id: egen age_mean=mean(age)

bysort id: egen tenure_mean=mean(tenure)

bysort id: egen tenuresquared_mean=mean(tenuresquared)

bysort id: egen agesquared_mean=mean(agesquared)

gen dm_lnhourlywage =lnhourlywage- lnhourlywage_mean

gen dm_hourlywage = hourlywage-hourlywage_mean

gen dm_tenure = tenure-tenure_mean

gen dm_tenuresquared = tenuresquared-tenuresquared_mean

gen dm_age = age-age_mean

gen dm_agesquared = agesquared-agesquared_mean

gen dm_union = union-union_mean

gen dm_black = black-black_mean

*dm_black is zero for every observation because black is time-invariant, so Stata drops it.
reg dm_lnhourlywage dm_tenure dm_tenuresquared dm_age dm_agesquared dm_union dm_black
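
*The coefficients above match xtreg, fe, but the reported standard errors are slightly too small: plain OLS on demeaned data does not count the individual means as estimated parameters.
*A rough sketch of the correction, assuming both regressions use the same observations and that e(df_r) holds the residual degrees of freedom after each command (df_ols and df_fe are arbitrary scalar names):
quietly reg dm_lnhourlywage dm_tenure dm_tenuresquared dm_age dm_agesquared dm_union
scalar df_ols = e(df_r)
quietly xtreg lnhourlywage tenure tenuresquared age agesquared union, fe
scalar df_fe = e(df_r)
*Multiplying the OLS standard errors by this factor should give the FE ones
display "SE correction factor: " sqrt(df_ols/df_fe)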

*1st differenced regression (not a good idea if T is big)
sort id year
*union was listed twice in the original varlist; educ and black difference to zero and are dropped.
regress D.(lnhourlywage agesquared age educ union tenure tenuresquared black)


*Instead of demeaning by hand, Stata can do it for us in one command: xtdata, fe replaces each variable with its deviation from the individual mean (plus the overall mean). However, the data in memory are changed, so use preserve beforehand and restore after the regression is estimated.
preserve
xtdata, fe
reg lnhourlywage tenure tenuresquared age agesquared black union educ
restore
*The results are the same and the variables have been demeaned for us



*Figure for LSDV
quietly xi: reg lnhourlywage hours i.id
predict yhat
separate lnhourlywage, by(id)
separate yhat, by(id)
twoway connected yhat238 yhat86 yhat95 yhat2048 yhat2118 yhat9562 yhat9809 hours, msymbol(none diamond_hollow triangle_hollow square_hollow + circle_hollow x) msize(medium) mcolor(black black black black black black black) || lfit lnhourlywage hours, clwidth(thick) clcolor(black)

*Hausman test says FE is the way to go
xtreg lnhourlywage tenure tenuresquared age agesquared black union educ, fe
estimates store FE
xtreg lnhourlywage tenure tenuresquared age agesquared black union educ, re
estimates store RE
hausman FE RE, sigmamore
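
*To see which coefficients drive the test, the two stored models can be listed side by side (a small sketch using the FE and RE estimates stored above).
*Large gaps between the FE and RE columns are what the Hausman statistic picks up.
estimates table FE RE, b se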

*Want to test if your model needs time dummies?
xtreg lnhourlywage tenure tenuresquared age agesquared black union educ i.year, fe vce(cluster id)
testparm i.year
*The null that the year dummies are jointly zero is not rejected, so the time dummies are not adding anything.
