STATS6900
Assignment – 2015/07
Date Due: Refer to Course Description
Total Marks: 50 (45 marks for questions + up to 5 bonus marks for overall structure)
Worth: 20% of final assessment
This assignment requires a considerable amount of computer work and written comment. You may need to seek guidance from your tutor along the way. Do not leave the Assignment until too late. Each question carefully describes what you are required to do, so please follow the instructions carefully. Your answer to each question should begin with the number of the question. Refer to the Assessment Criteria and General Marking Guide below for details.
As anyone who has looked at house prices knows, house prices depend on the local market. To control for that, we will restrict our attention to a single market. We have a random sample of 211 home sales for January 2015 from a metro region around the city of Chicago; data are obtained from a real estate research site of Zillow.com.
The first thing often mentioned in describing a house for sale is the number of bedrooms. In this assignment you will examine the relationship between house prices and the number of bedrooms, and develop a potential statistical regression model to predict house prices from the variable number of bedrooms.
Data is contained in a file called ‘HouseSales.xls’ and the columns of the file contain the following information:
| Column | Name | Description |
|---|---|---|
| A | House ID Number | Number to identify each house sold |
| B | House Price | Price of the house (in thousands of \$US) |
| C | Bedrooms | Number of bedrooms in the house |
For the analysis you must use your own random sample of 180 records from the 211 provided in the file HouseSales.xls. Find this in the file
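As an illustration only, the sampling step could be sketched as follows. The sketch assumes the House ID Numbers run from 1 to 211, which the brief does not confirm, and the function name and seed are hypothetical. Fixing the seed lets the same sample be reproduced, and sorting the result gives the ascending order asked for in Question 2.

```python
import random


def draw_sample_ids(population_size=211, sample_size=180, seed=2015):
    """Return sorted house IDs drawn at random without replacement.

    Assumption: IDs are the integers 1..population_size.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    ids = range(1, population_size + 1)
    return sorted(rng.sample(ids, sample_size))


sample_ids = draw_sample_ids()
print(len(sample_ids), sample_ids[:5])
```

Choose a different seed to obtain your own sample. In Excel, a comparable approach is to add a column of random numbers, sort by it, and keep the first 180 rows.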
For each task below, you must answer all the questions in sequential order and submit all of the required printouts, graphs, tables and summaries required.
NB: Each graph and table should have a heading and each axis should have a label!!
1. Introduction and Variable List: Give a brief introduction to your report. Describe the nature of the data. Read questions 3 to 7 and briefly describe the specific data and relationships which will be examined here.
[3 marks]
2. Data: Provide a printout of the data in your sample, sorted in ascending order based on House ID Number.
[2 marks]
3. Produce a histogram showing the distribution of the price of ALL houses in your data sample. Provide your comments on the graph shape, and the most suitable measures of centre and spread for this data.
[6 marks]
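The choice between mean and standard deviation or median and interquartile range can be illustrated with a small sketch. The prices below are made up (in thousands of dollars) and the function names are hypothetical. The single very expensive house pulls the mean well above the median, which is the pattern to look for in a right-skewed histogram.

```python
import statistics


def centre_and_spread(prices):
    """Return both candidate pairs of centre/spread measures."""
    q1, median, q3 = statistics.quantiles(prices, n=4, method="inclusive")
    return {
        "mean": statistics.mean(prices),
        "sd": statistics.stdev(prices),  # sample standard deviation
        "median": median,
        "iqr": q3 - q1,
    }


def histogram_counts(prices, bin_width):
    """Count values per bin; the bin keyed b covers b <= price < b + bin_width."""
    counts = {}
    for p in prices:
        start = int(p // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


# Made-up prices, not taken from HouseSales.xls
prices = [150, 180, 200, 220, 250, 300, 950]
print(centre_and_spread(prices))
print(histogram_counts(prices, bin_width=100))
```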
4. Produce a side-by-side box plot of house prices against bedrooms. Discuss what this box plot implies in context of the problem.
[5 marks]
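A side-by-side box plot is drawn from one five-number summary per group. A minimal sketch of that calculation is given below, assuming the data are available as (bedrooms, price) pairs. The records are made up and the function name is hypothetical.

```python
import statistics
from collections import defaultdict


def five_number_summary_by_group(records):
    """Map bedrooms -> (min, Q1, median, Q3, max) of price."""
    groups = defaultdict(list)
    for bedrooms, price in records:
        groups[bedrooms].append(price)

    summary = {}
    for bedrooms in sorted(groups):
        prices = sorted(groups[bedrooms])
        if len(prices) < 2:
            # Quartiles need at least two values; repeat the single price.
            summary[bedrooms] = (prices[0],) * 5
            continue
        q1, med, q3 = statistics.quantiles(prices, n=4, method="inclusive")
        summary[bedrooms] = (prices[0], q1, med, q3, prices[-1])
    return summary


# Made-up (bedrooms, price in thousands) pairs, not from HouseSales.xls
records = [(2, 150), (2, 170), (2, 190), (3, 200), (3, 260), (3, 320), (4, 400)]
for bedrooms, stats in five_number_summary_by_group(records).items():
    print(bedrooms, stats)
```

Comparing the medians and box heights across bedroom counts is what supports the discussion in context of the problem.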

5. T-Test and CI:
a) Obtain appropriate descriptive statistics, and calculate a 95% confidence interval for the mean price of the houses in your sample.
b) Assume that the average price of all houses sold in the region in January 2014 is \$330,000. Conduct a statistical hypothesis test to determine if the average price of houses sold in January 2015 (from your sample) is significantly different from the average price of the houses sold in January 2014. Mention any assumptions, include relevant hypotheses and report the results and conclusion in the conventional manner.
[7 marks]
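The calculations behind Question 5 can be sketched as below, as one possible approach rather than a required method. The sketch assumes a two-sided one-sample t-test, uses made-up prices, and takes the critical value as an input because it must be looked up for the relevant degrees of freedom. Because House Price is recorded in thousands, the 2014 average of \$330,000 enters as 330.

```python
import math
import statistics


def one_sample_t(prices, mu0, t_crit):
    """Confidence interval and t statistic for H0: mean = mu0 vs H1: mean != mu0.

    t_crit is the two-sided critical value for n - 1 degrees of freedom,
    taken from a t-table (or Excel's T.INV.2T).
    """
    n = len(prices)
    mean = statistics.mean(prices)
    sd = statistics.stdev(prices)   # sample standard deviation
    se = sd / math.sqrt(n)          # standard error of the mean
    ci = (mean - t_crit * se, mean + t_crit * se)
    t_stat = (mean - mu0) / se
    return {"n": n, "mean": mean, "sd": sd, "se": se, "ci": ci,
            "t": t_stat, "reject_h0": abs(t_stat) > t_crit}


# Made-up prices in thousands; with n = 5 the 5% two-sided critical value
# for 4 degrees of freedom is about 2.776.
prices = [310, 325, 340, 355, 370]
result = one_sample_t(prices, mu0=330, t_crit=2.776)
print(result)
```

For a sample of 180 houses there are 179 degrees of freedom, for which the critical value is approximately 1.973. Confirm this figure against your own table or software before using it.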
6. Scatter plot with trend lines: Obtain a scatterplot comparing the relationship between house prices and number of bedrooms. Think carefully about which variable should go on the vertical axis. Remember, it is the independent variable that goes on the horizontal axis (i.e. the x-axis). Include trend lines, their equations and R-squared values on the graph. Make sure you label axes properly and that your graph has an appropriate title.
Briefly describe the nature of the relationship between these two variables.
[5 marks]
7. Use Excel to carry out a regression analysis on the two variables: house prices (in thousands of dollars) and the number of bedrooms.
a) Copy the output into your assignment and use it to determine the answers to the following questions.
b) Write down the regression equation.
c) State the R-squared value and the standard error and explain what they mean with respect to the data.
d) Write down the value of the gradient of the regression line and explain what it means for this data.
e) Are the values for the constant and the gradient (slope) significant (i.e. significantly different from zero) in this case? Justify your answer.
f) Do you think this regression model is a good model? Justify your answer using the regression output.
[14 marks]
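To help interpret the Excel regression output, the quantities asked for in parts b) to e) can be computed by hand for a simple linear regression. The following is a minimal sketch using ordinary least squares on made-up data. The function name and values are hypothetical, and it is not a substitute for the Excel output required in part a).

```python
import math


def simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x with basic diagnostics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

    std_error = math.sqrt(sse / (n - 2))  # "Standard Error" in Excel's summary
    se_slope = std_error / math.sqrt(sxx)
    se_intercept = std_error * math.sqrt(1 / n + x_bar ** 2 / sxx)
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": 1 - sse / syy,
        "std_error": std_error,
        "t_slope": slope / se_slope,          # compare with a t critical value
        "t_intercept": intercept / se_intercept,
    }


# Made-up data: bedrooms and price in thousands, not from HouseSales.xls
bedrooms = [2, 3, 3, 4, 5]
price = [150, 210, 230, 300, 340]
fit = simple_linear_regression(bedrooms, price)
print(fit)
```

For these made-up values the fitted slope is 65, which would be read as each additional bedroom being associated with an average increase of 65 thousand dollars in price. The t statistics are judged against a critical value with n - 2 degrees of freedom, which corresponds to the p-values Excel reports for the constant and the gradient.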
8. Using the information obtained from your analyses, write a short conclusion about what you found from the study above.
[3 marks]
Assessment Criteria:
First class level student work will display an ability to apply conceptual understanding of statistics and data analysis techniques to practical situations. In this case, such work will show evidence of student’s ability to:
• Identify the problem of interest and its practical importance.
• Source appropriate data for analysis.
• Use appropriate tools to extract and analyse the data.
• Determine the best method of data analysis and identify limitations of such method.
• Explain results with clarity and confidence.
Other levels of work: The further the student’s work deviates from the generalised ideal described above, the lower their resulting mark is likely to be.
General Marking Guide:
• Q1 [Total 3 marks] – Introduction and variable list (3 marks).
• Q2 [Total 2 marks] – Data sample (1 mark), arranged in ascending order (1 mark).
• Q3 [Total 6 marks] – Histogram with appropriate labels and title (3 marks), comment on the shape (1 mark), and suitable measure of centre and spread (2 marks).
• Q4 [Total 5 marks] – Side-by-side box plot with appropriate labels and title (3 marks), and discussion in context of the problem (2 marks).
• Q5 [Total 7 marks] – Q5a Descriptive statistics (1 mark), and CI with calculations (2 marks); Q5b Name of the hypothesis test (1 mark), assumptions of the test (2 marks), and suitable conclusion (1 mark).
• Q6 [Total 5 marks] – Scatter plot with appropriate labels (3 marks), and discussion on the nature of the relationship (2 marks).
• Q7 [Total 14 marks] – Q7a Output (1 mark); Q7b Regression equation (1 mark); Q7c Values of R-sq and standard error (1 mark), and their explanation in context of the problem (2 marks); Q7d Gradient of the regression line (1 mark), and its explanation in context of the problem (2 marks); Q7e Significance of the constant and the gradient (1 mark), and the justification of each (2 marks); Q7f Is the model good? (1 mark), justification (2 marks).
• Q8 [Total 3 marks] – Overall conclusions (3 marks).
• Bonus marks if overall flow and structure of discussion, technical language, grammar and spelling are sound (up to 5 marks).

THE END