ECON 2300: INTRODUCTORY ECONOMETRICS
Coordinator: Dr. Antonio Peyrache
Research Project 1
Due: 5 pm on 20 September
You are interested in estimating the effect of education on earnings. The data file cps4_small.dta contains 1,000 observations on hourly wage rates, education, and other variables from the 2008 Current Population Survey (CPS):
• wage: earnings per hour
• educ: years of education
• exper: years of post-education experience
• hrswk: working hours per week
• married: dummy for married
• female: dummy for female
• metro, midwest, south, west: location dummies
• black: dummy for black
• asian: dummy for Asian
Submission of your report
Your report must be single-spaced and set in 12-point font. Answer each of the following questions in a format similar to the solutions to the tutorial problem sets. When you are required to use R, you must show your R commands and R output (screenshots or figures generated from R). You will lose 2 points each time you fail to provide the R commands and output. When a question asks you to discuss or interpret, keep your answer brief and compact. You will lose 2 points if your answer is needlessly wordy. In addition, you may lose marks for any of the following: failing to grasp or address core concepts and ideas; poor or ineffective structure; unclear or illogical flow of ideas; fluffy or unclear arguments; and weak or badly composed arguments. You must upload your assignment to the course webpage (Blackboard) in PDF format. (Do not submit a hard
1. (20 points) Load and explore the main variables.
(a) (7 points) You are given a dataset in the .dta format. Figure out how to load this dataset in R. Provide the R commands you use to load the data. In particular, state clearly which R package you install and use. (Hint: use the Internet.)
(b) (13 points) Obtain summary statistics and histograms for the variables wage and educ. For the histograms, give informative titles and variable names instead of just using the default titles and variable names. For example, you could use Years of Education in place of educ. Discuss the data characteristics.
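As an illustration of the labelling requirement in (b), here is a minimal base-R sketch on simulated data. The vector name years and all values are made up for this example and are not taken from the CPS file; only the main, xlab, and ylab arguments matter.

```r
# Simulated data only; nothing here comes from the CPS file.
set.seed(3)
years <- sample(8:20, size = 500, replace = TRUE)

# Summary statistics (minimum, quartiles, mean, maximum)
summary(years)

# Histogram with an informative title and axis labels instead of the defaults
h <- hist(years,
          main = "Distribution of Years of Education (simulated)",
          xlab = "Years of Education",
          ylab = "Number of Observations")
```

The object returned by hist() stores the bin counts, which can help when describing the shape of the distribution.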
2. (25 points) Estimate the linear regression
$\ln(wage_i) = \beta_1 + \beta_2\, educ_i + e_i$,
where $e_i$ is the error and $\beta_1$ and $\beta_2$ are the unknown population coefficients.
(a) (5 points) Report the estimation results in the standard form introduced in Lecture Note 3. For example, see page 9 of Lecture Note 3, where the estimates are presented in equation form, along with standard errors and some measures of model fit.
(b) (5 points) Construct a scatter diagram of educ and ln(wage) and plot the estimated regression equation from (a) on the scatter diagram. Give an informative title and axis labels, i.e., do not use the default title and labels.
(c) (4 points) Assuming that E[e|educ] = 0, interpret the estimated coefficient on educ (2 points) and test whether or not the population coefficient is zero at the 1% significance level (2 points).
(d) (6 points) You suspect that the hourly wage could depend on working hours per week. Discuss under what condition(s) the estimated coefficients in (a) would be biased due to the omission of the weekly working hours (2 points). Give a reasonable and intuitive story for why omitting the weekly working hours would cause omitted variable bias in the regression in (a) (2 points). Under your story, explain whether the estimated coefficient on educ in (a) would be overestimated or underestimated (2 points). See pages 4 and 5 of Lecture Note 4.
(e) (5 points) The variable hrswk is the average weekly working hours for each individual in the data. Regress ln(wage) on educ and hrswk. Discuss the estimation results. In particular, how would you revise your answer in (c)? Are the estimates statistically significant?
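The mechanics of fitting a log-linear regression and overlaying the fitted line on a scatter diagram can be sketched in base R as follows. This is only an illustration on simulated data: the names x and log_y and every number in it are hypothetical, and none of it comes from the assignment's data file.

```r
# Simulated data only; variable names and numbers are hypothetical.
set.seed(4)
n     <- 200
x     <- runif(n, min = 8, max = 20)
log_y <- 0.5 + 0.09 * x + rnorm(n, sd = 0.4)

# Ordinary least squares fit of log_y on x
fit <- lm(log_y ~ x)

# Coefficient table: estimates, standard errors, t statistics, p-values
coef_table <- summary(fit)$coefficients

# Scatter diagram with an informative title and labels, plus the fitted line
plot(x, log_y,
     main = "Log Outcome against Regressor (simulated)",
     xlab = "Regressor (simulated units)",
     ylab = "Log of Outcome")
abline(fit, lwd = 2)
```

The columns of coef_table, together with summary(fit)$r.squared, hold the numbers typically reported when results are written in equation form.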
3. (40 points) You are concerned about omitted variable bias in the regressions of Question 2. For that reason, you decide to regress ln(wage) on all other variables in the dataset and use this model as a benchmark.
(a) (11 points) Report a 95% confidence interval for the slope parameter on educ (3 points), explain the relationship between the confidence interval and hypothesis testing (4 points), and test the hypothesis that one year of additional education would increase hourly wage by 12% (4 points).
(b) (7 points) Assuming there is no omitted variable bias, discuss the estimated coefficient on female in the benchmark model. In particular, explain what the estimated coefficient on female implies for hourly wage (3 points), compare the effect of being female with the effect of one year of additional education (2 points), and discuss whether it is statistically significant or not (2 points).
(c) (5 points) Using the estimation results of the benchmark model, test the hypothesis that the hourly wage is not affected by the geographic location. Explain how you reach your conclusion. (Hint: use package car.)
(d) (5 points) Using the estimation results of the benchmark model, test the hypothesis that the wage differential associated with being African American is equal to the wage differential associated with being Asian American. Explain how you reach your conclusion. (Hint: use package car.)
(e) (7 points) How would you modify the benchmark model to estimate the effects on hourly wage of one additional year of education separately for each gender? (4 points) How do the effects of education differ between the genders and is the difference statistically significant? (3 points)
(f) (5 points) Keoka is an African American woman, working in a metropolitan area. After she obtained her high school diploma, she got a job and started working instead of getting a higher education. She has never been married. Now she has five years of experience in the industry and is working full time (40 hours a week). Using the benchmark model, predict her hourly wage.
Be careful: the left-hand side variable is ln(wage), but you should predict Keoka’s wage.
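The warning above concerns the scale of the prediction: a model fitted to a logged outcome returns fitted values in logs. A minimal base-R sketch on simulated data shows the back-transformation. The names x, y, and new_obs and all numbers are hypothetical, and whether the course expects the optional adjustment in the last step is not stated here.

```r
# Simulated data only; variable names and numbers are hypothetical.
set.seed(2)
n <- 300
x <- runif(n, min = 8, max = 20)
y <- exp(0.5 + 0.09 * x + rnorm(n, sd = 0.4))   # strictly positive outcome

fit <- lm(log(y) ~ x)

# predict() returns the fitted value on the log scale ...
new_obs  <- data.frame(x = 12)
log_pred <- predict(fit, newdata = new_obs)

# ... so exponentiate to return to the level of y.
natural_pred <- exp(log_pred)

# Optional adjustment sometimes applied to log-linear predictions:
# multiply by exp(sigma_hat^2 / 2), where sigma_hat is the residual standard error.
corrected_pred <- natural_pred * exp(sigma(fit)^2 / 2)
```

With several regressors, new_obs would contain one column per regressor, each set to the value that describes the individual.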
4. (15 points) It may be more useful to estimate the effect of education on earnings by using the highest diploma/degree rather than years of schooling. Define four dummy variables to indicate educational achievement:
• lt_hs = 1 if educ < 12
• hs = 1 if educ = 12
• col = 1 if educ = 16
• some_col = 1 for all other values of educ.
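The definitions above can be turned into indicator variables with logical comparisons. Here is a minimal base-R sketch on a made-up educ vector; the underscore spellings lt_hs and some_col and the "less than 12" cutoff for lt_hs are assumptions about the intended definitions.

```r
# Toy vector of years of education; the values are made up for illustration.
educ <- c(10, 12, 13, 16, 18, 12, 9, 16)

# A logical comparison converted to 0/1 gives a dummy variable
lt_hs <- as.integer(educ < 12)    # assumed cutoff: fewer than 12 years
hs    <- as.integer(educ == 12)
col   <- as.integer(educ == 16)

# "All other values": whatever is not covered by the three dummies above
some_col <- 1L - lt_hs - hs - col

# Each observation should fall into exactly one category
category_count <- lt_hs + hs + col + some_col
```

For the toy vector, some_col equals 1 only for the values 13 and 18, and category_count is 1 for every observation.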
(a) (6 points) Create the dummy variables (lt_hs, hs, col, some_col) as defined above (3 points) and compute the sample mean of hourly wage for each of the four education categories (3 points).
(b) (9 points) Regress wage on the four dummies (lt_hs, hs, col, some_col). You will face a problem. What is the problem here? Under what circumstances would you face the problem? (4 points) To avoid it, you now regress wage on three dummies (lt_hs, col, some_col), excluding hs. Interpret the estimated coefficients and compare the estimation results with the findings in (a) (5 points).