ECON 2300: INTRODUCTORY ECONOMETRICS

Coordinator: Dr. Antonio Peyrache

Research Project 1

Due: 5 pm on 20 September

Background

You are interested in estimating the effect of education on earnings. The data file cps4_small.dta contains 1,000 observations on hourly wage rates, education, and other variables from the 2008 Current Population Survey (CPS):

• wage: earnings per hour

• educ: years of education

• exper: years of post-education experience

• hrswk: working hours per week

• married: dummy for married

• female: dummy for female

• metro, midwest, south, west: location dummies

• black: dummy for black

• asian: dummy for Asian

Submission of your report

Your report must be single-spaced and set in 12-point font. Answer each of the following questions in a format similar to that of the solutions to the tutorial problem sets. When you are required to use R, you must show your R commands and R output (screenshots or figures generated from R). You will lose 2 points whenever you fail to provide R commands and output. When you are asked to discuss or interpret, your answer must be brief and compact; you will lose 2 points if your answer is needlessly wordy. In addition, you may lose marks for any of the following: failing to grasp or address core concepts and ideas; poor or ineffective structure; unclear or illogical flow of ideas; fluffy or unclear arguments; and weak or badly composed arguments. You must upload your assignment to the course webpage (Blackboard) in PDF format. (Do not submit a hard copy.)

Research tasks

1. (20 points) Load and explore the main variables.

(a) (7 points) You are given a dataset in the .dta format. Figure out how to load this dataset in R. Provide the R commands you use to load the data. In particular, be clear about which R package you install and use. (Hint: use the Internet.)

(b) (13 points) Obtain summary statistics and histograms for the variables wage and educ. For the histograms, give informative titles and variable names instead of just using the default titles and variable names. For example, you could use Years of Education in place of educ. Discuss the data characteristics.

2. (25 points) Estimate the linear regression

$\ln(wage_i) = \beta_1 + \beta_2\, educ_i + e_i$,

where $e_i$ is the error and $\beta_1$ and $\beta_2$ are the unknown population coefficients.

(a) (5 points) Report the estimation results in the common form introduced in Lecture Note 3. For example, see page 9 of Lecture Note 3, where the estimates are presented in equation form, along with standard errors and some measures of model fit.

(b) (5 points) Construct a scatter diagram of educ and ln(wage) and plot the estimated regression equation in (a) on the scatter diagram. Give an informative title and informative labels for the variables, i.e., do not use the default title and labels.

(c) (4 points) Assuming that E[e|educ] = 0, interpret the estimated coefficient on educ (2 points) and test whether or not the population coefficient is zero at the 1% significance level (2 points).
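As a reminder of the log-linear functional form used here, the following sketch shows how the slope is usually read and how the t-statistic is formed. The symbols $b_2$, $se(b_2)$ and $N$ (estimate, standard error, sample size) are notation assumed for this sketch, not taken from the lecture notes.

```latex
% Log-linear model: the slope is the approximate proportional change in wage
% for one additional year of education.
\[
\frac{\partial \ln(wage_i)}{\partial educ_i} = \beta_2
\quad\Longrightarrow\quad
\%\Delta wage \approx 100\,\beta_2\,\Delta educ ,
\qquad
\text{exact change: } 100\,\bigl(e^{\beta_2} - 1\bigr)\%.
\]
% Two-sided test of H_0: beta_2 = 0 against H_1: beta_2 != 0.
\[
t = \frac{b_2 - 0}{se(b_2)} \sim t_{(N-2)} \ \text{under } H_0,
\qquad
\text{reject at the 1\% level if } |t| > t_{(0.995,\,N-2)} .
\]
```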

(d) (6 points) You suspect that the hourly wage could depend on working hours per week. Discuss under what condition(s) the estimated coefficients in (a) would be biased due to the omission of weekly working hours (2 points). Give a reasonable and intuitive story explaining why the omission of weekly working hours would cause omitted variable bias in the regression in (a) (2 points). Under your story, explain whether the estimated coefficient on educ in (a) would be overestimated or underestimated (2 points). See pages 4 and 5 of Lecture Note 4.
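The standard omitted-variable-bias expression may help organise the discussion. It is a textbook result stated under the assumption that the longer model below is the correct one; the coefficient $\beta_3$ and error $u_i$ are notation introduced only for this sketch.

```latex
% Assumed "long" model that includes weekly hours:
\[
\ln(wage_i) = \beta_1 + \beta_2\, educ_i + \beta_3\, hrswk_i + u_i .
\]
% Probability limit of the slope b_2 from the "short" regression that omits hrswk:
\[
\operatorname{plim} b_2
  = \beta_2 + \beta_3\,
    \frac{\operatorname{cov}(educ_i,\, hrswk_i)}{\operatorname{var}(educ_i)} .
\]
% The bias is non-zero only if beta_3 != 0 AND cov(educ, hrswk) != 0;
% its sign is the sign of the product of the two.
```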

(e) (5 points) The variable hrswk is the average weekly working hours for each individual in the data. Regress ln(wage) on educ and hrswk. Discuss the estimation results. In particular, how would you revise your answer in (c)? Are the estimates statistically significant?

3. (40 points) You are concerned about omitted variable bias in the regressions of Question 2. For that reason, you decide to regress ln(wage) on all other variables in the dataset and use this model as a benchmark.

(a) (11 points) Report a 95% confidence interval for the estimated slope parameter of educ (3 points), explain the relationship between the confidence interval and hypothesis testing (4 points), and test the hypothesis that one year of additional education would increase hourly wage by 12% (4 points).
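The link between the interval and the test can be sketched as follows. Here $b$ is the estimated coefficient on educ, $K$ the number of coefficients in the benchmark model, and $t_c$ the critical value; all three are notation assumed for this sketch. Mapping "a 12% increase" to $\beta_{educ} = 0.12$ relies on the approximate log-linear interpretation and is an assumption as well.

```latex
% 95% interval estimate for the coefficient on educ:
\[
b \;\pm\; t_c \, se(b), \qquad t_c = t_{(0.975,\,N-K)} .
\]
% Two-sided test of H_0: beta_educ = 0.12 at the 5% level:
\[
t = \frac{b - 0.12}{se(b)}, \qquad \text{reject } H_0 \text{ if } |t| > t_c .
\]
% Equivalence: |t| > t_c holds exactly when 0.12 lies outside the 95% interval.
\[
|t| > t_c
\iff
0.12 \notin \bigl[\, b - t_c\, se(b),\; b + t_c\, se(b) \,\bigr] .
\]
```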

(b) (7 points) Assuming there is no omitted variable bias, discuss the estimated coefficient on female in the benchmark model. In particular, explain what the estimated coefficient on female implies for hourly wage (3 points), compare the effect of being female with the effect of one additional year of education (2 points), and discuss whether the coefficient is statistically significant (2 points).

(c) (5 points) Using the estimation results of the benchmark model, test the hypothesis that the hourly wage is not affected by the geographic location. Explain how you reach your conclusion. (Hint: use package car.)
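A joint hypothesis of this kind is usually tested with an F-statistic that compares restricted and unrestricted fits. The sketch below assumes that all four listed location dummies enter the null hypothesis; $SSE_R$, $SSE_U$, $J$ and $K$ are notation introduced here rather than taken from the lecture notes.

```latex
% Joint null: none of the location dummies affects ln(wage).
\[
H_0:\ \beta_{metro} = \beta_{midwest} = \beta_{south} = \beta_{west} = 0
\qquad \text{vs.} \qquad
H_1:\ \text{at least one is non-zero}.
\]
% F-statistic from restricted (R) and unrestricted (U) sums of squared errors,
% with J restrictions (J = 4 under the assumption above) and K coefficients
% in the unrestricted benchmark model:
\[
F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(N-K)} \sim F_{(J,\,N-K)} \ \text{under } H_0 .
\]
```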

(d) (5 points) Using the estimation results of the benchmark model, test the hypothesis that the wage differential associated with being African American is equal to the wage differential associated with being Asian American. Explain how you reach your conclusion. (Hint: use package car.)
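A single linear restriction such as this one can be tested with either a t-statistic or the equivalent F-statistic. The sketch below writes $b_{black}$ and $b_{asian}$ for the two estimated coefficients; this notation is assumed for illustration.

```latex
\[
H_0:\ \beta_{black} = \beta_{asian}
\quad\Longleftrightarrow\quad
H_0:\ \beta_{black} - \beta_{asian} = 0 .
\]
% The standard error of the difference needs the covariance of the two estimates.
\[
t = \frac{b_{black} - b_{asian}}
         {\sqrt{\widehat{\operatorname{var}}(b_{black})
              + \widehat{\operatorname{var}}(b_{asian})
              - 2\,\widehat{\operatorname{cov}}(b_{black},\, b_{asian})}}
  \sim t_{(N-K)} \ \text{under } H_0,
\qquad F = t^2 \sim F_{(1,\,N-K)} .
\]
```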

(e) (7 points) How would you modify the benchmark model to estimate the effect on hourly wage of one additional year of education separately for each gender? (4 points) How do the effects of education differ between the genders, and is the difference statistically significant? (3 points)
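One common way to let a slope differ between two groups is an interaction term. The specification below is a sketch of that idea, not necessarily the approach expected in the lecture notes; the coefficient names $\beta_{educ}$, $\beta_{female}$ and $\gamma$ are assumed for illustration.

```latex
% Benchmark model augmented with an educ x female interaction
% (the dots stand for the remaining benchmark regressors):
\[
\ln(wage_i) = \beta_1 + \beta_{educ}\, educ_i + \beta_{female}\, female_i
            + \gamma\,(educ_i \times female_i) + \dots + e_i .
\]
% Implied effect of one additional year of education:
\[
\frac{\partial \ln(wage_i)}{\partial educ_i} =
\begin{cases}
\beta_{educ}          & \text{if } female_i = 0,\\[2pt]
\beta_{educ} + \gamma & \text{if } female_i = 1 .
\end{cases}
\]
% The gender difference in the return to education is gamma; test H_0: gamma = 0.
```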

(f) (5 points) Keoka is an African American woman, working in a metropolitan area. After she obtained her high school diploma, she got a job and started working instead of pursuing higher education. She has never been married. Now she has five years of experience in the industry and is working full time (40 hours a week). Using the benchmark model, predict her hourly wage.

Be careful: the left-hand side variable is ln(wage), but you should predict Keoka’s wage.
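Converting a prediction of ln(wage) into a prediction of wage can be done in two usual ways, sketched below. The vector $x_0$ (Keoka's regressor values), the estimates $b$ and the estimated error variance $\hat\sigma^2$ are notation assumed here; the second formula additionally assumes normally distributed errors.

```latex
% Predicted log wage at regressor values x_0:
\[
\widehat{\ln(wage)}_0 = x_0' b .
\]
% "Natural" predictor: exponentiate the predicted log wage.
\[
\widehat{wage}_n = \exp\!\bigl(\widehat{\ln(wage)}_0\bigr) .
\]
% "Corrected" predictor under normal errors, since E[exp(e)] = exp(sigma^2 / 2):
\[
\widehat{wage}_c = \exp\!\bigl(\widehat{\ln(wage)}_0 + \hat\sigma^2/2\bigr)
                 = \widehat{wage}_n \, e^{\hat\sigma^2/2} .
\]
```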

4. (15 points) It may be more useful to estimate the effect of education on earnings by using the highest diploma/degree rather than years of schooling. Define four dummy variables to indicate educational achievement:

• lt_hs = 1 if educ < 12

• hs = 1 if educ = 12

• col = 1 if educ = 16

• some_col = 1 for all other values of educ.

(a) (6 points) Create the dummy variables (lt_hs, hs, col, some_col) as defined above (3 points) and compute the sample mean of hourly wage for each of the four education categories (3 points).

(b) (9 points) Regress wage on the four dummies (lt_hs, hs, col, some_col). You will face a problem. What is the problem here? Under what circumstances would you face the problem? (4 points) To avoid it, you now regress wage on three dummies (lt_hs, col, some_col), excluding hs. Interpret the estimated coefficients and compare the estimation results with the findings in (a) (5 points).
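The mechanics behind this task can be illustrated on simulated data. The sketch below uses base R only and made-up numbers (nothing here comes from the CPS file); the underscored variable names and the data frame `d` are assumptions for illustration.

```r
# Simulated data only: these values are NOT from the CPS data file.
set.seed(1)
n <- 200
d <- data.frame(educ = sample(8:20, n, replace = TRUE))
d$wage <- 5 + 1.2 * d$educ + rnorm(n, sd = 4)

# Dummies following the definitions in the task.
d$lt_hs    <- as.integer(d$educ < 12)
d$hs       <- as.integer(d$educ == 12)
d$col      <- as.integer(d$educ == 16)
d$some_col <- as.integer(!(d$educ < 12 | d$educ == 12 | d$educ == 16))

# Every observation falls in exactly one category, so the four dummies
# add up to the constant regressor.
stopifnot(all(d$lt_hs + d$hs + d$col + d$some_col == 1))

# Intercept plus all four dummies: one coefficient cannot be identified,
# and lm() reports it as NA.
fit_all <- lm(wage ~ lt_hs + hs + col + some_col, data = d)
print(coef(fit_all))

# Dropping hs makes high-school graduates the base category.
fit_base <- lm(wage ~ lt_hs + col + some_col, data = d)
print(coef(fit_base))

# Category means, for comparison with the coefficients above.
d$group <- ifelse(d$lt_hs == 1, "lt_hs",
           ifelse(d$hs == 1, "hs",
           ifelse(d$col == 1, "col", "some_col")))
print(tapply(d$wage, d$group, mean))
```

In the second regression the intercept reproduces the mean wage of the omitted hs group, and each slope is the difference between a category's mean and that base mean.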

