STATS1900 Business Statistics
Major Assignment
Date Due: Week 10, Thursday 5pm
Worth: 20%
This assignment requires a substantial amount of computer work and written comment. You may need to seek guidance from your tutor along the way. Do not leave things until too late!!
The questions give a careful statement of what is required and information about the presentation of your answers. Please follow these carefully! Marks may be deducted for poor presentation.
Question 1 [30 marks]
You will again examine the data from the minor assignment from the health agency that is conducting an evaluation of all the hospitals in its region. The data collected is contained in a file called ‘houses.xls’ which has been placed in Moodle. Make sure you check with your lab tutor how to obtain this data. For this assignment, you need to use Excel to conduct the statistical analyses.
The columns of the file contain the following information:
Column   Name            Description
C1       Price           Selling price ($)
C2       Location        House location (0 = small residential area; 1 = large residential area)
C3       Custom          Customised house (0 = not customised; 1 = customised)
C4       Square Metres   Square metres of living space (m²)
Before you begin any analysis you must take a random sample of 50 houses from the 117 provided in the data set. Note that this will mean that you need to generate a different data set to the one used in the minor assignment. To do this, find your name in the email or use the Random Sample Generator available on Moodle (Random Sampler Generator-sum_15).
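For illustration only, drawing 50 of the 117 houses without replacement can be sketched in Python as below. This is not the Moodle Random Sample Generator; the function name `draw_sample_rows` and the seed value are assumptions made for this example, and the row numbers assume the houses are numbered 1 to 117 in file order.

```python
import random


def draw_sample_rows(population_size=117, sample_size=50, seed=None):
    """Return sorted 1-based row numbers drawn without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    rows = rng.sample(range(1, population_size + 1), sample_size)
    return sorted(rows)


# The seed is arbitrary; a different seed gives a different sample.
rows = draw_sample_rows(seed=2015)
print(rows)
```

Each selected row number would then be matched to the corresponding house in the data file.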
Each of the three tasks below will guide you through statistical analyses that investigate different aspects of the data. Your response to each task should be presented in a well-written manner, with appropriate statistical analyses and suitable conclusions in the context of the task. Appropriately annotated computer output should be incorporated within the body of each response.

Task 1:
1. Use Excel to obtain the descriptive statistics and find the mean, standard deviation and standard error of the price of houses, then calculate the 95% confidence interval for the mean price of houses based on your sample of 50. Show all working and Excel output.

2. Explain what the confidence interval represents in the context of the problem.
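As a way to cross-check the Excel working, the calculation in Task 1 can be sketched in Python. The interval is the sample mean plus or minus the t critical value times the standard error, where the standard error is s/√n. The prices below are hypothetical (they are not taken from houses.xls), and the function name is an assumption for this example. The critical value is passed in by hand: in Excel it is given by `T.INV.2T(0.05, n - 1)`, which is about 2.0096 for a sample of 50 and about 2.776 for the five toy values used here.

```python
import math
import statistics


def confidence_interval(values, t_critical):
    """Return mean, sample SD, standard error and the (lower, upper) interval."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)      # sample SD, n - 1 divisor
    se = sd / math.sqrt(n)             # standard error of the mean
    margin = t_critical * se
    return mean, sd, se, (mean - margin, mean + margin)


# Hypothetical selling prices, for illustration only.
prices = [250000, 310000, 275000, 290000, 330000]

# 2.776 is the two-sided 95% critical value for 4 degrees of freedom.
mean, sd, se, (lower, upper) = confidence_interval(prices, t_critical=2.776)
print(f"mean={mean:.0f} sd={sd:.0f} se={se:.0f} CI=({lower:.0f}, {upper:.0f})")
```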
Task 2:
The real estate agency was interested in testing the hypothesis that the price of houses in their area is significantly greater than $280,000. Test this hypothesis using your sample of 50 houses. Write a brief report that should include appropriate computer output and perform the following:
a) Check any assumptions of the hypothesis test;
b) Write down both the null and alternative hypotheses;
c) Carry out the t-test and report the test statistic and the p-value;
d) Write an appropriate conclusion in the context of the problem.
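The test statistic in step c) can be sketched as follows, assuming a one-sample, upper-tailed t-test of the mean price against $280,000. The statistic is t = (x̄ − 280000) / (s/√n) with n − 1 degrees of freedom. The prices are hypothetical and the function name is an assumption for this example. The sketch stops at the statistic; the one-tailed p-value can be read from Excel with `T.DIST.RT(t, n - 1)`.

```python
import math
import statistics


def one_sample_t(values, mu0):
    """Return the t statistic and degrees of freedom for H0: mean = mu0."""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return (mean - mu0) / se, n - 1


# Hypothetical selling prices, for illustration only.
prices = [250000, 310000, 275000, 290000, 330000]

t_stat, df = one_sample_t(prices, mu0=280000)
print(f"t = {t_stat:.4f} on {df} degrees of freedom")
```

A large positive t supports the alternative that the mean price exceeds $280,000; a t near zero or negative does not.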
Task 3:
Is there a linear relationship between the selling price and square metres of living space? Produce a scatterplot with a regression line and perform a regression analysis on these two variables that will enable you to predict the selling price given the square metres of living space of a particular house. Include this output in your assignment. Write down the regression equation, then answer the following questions:
a) In practical terms, what does R² tell us?
b) In practical terms, what do the constant and slope coefficients tell us?
c) Write down the standard error and explain what it means in the context of the problem.
d) Conduct a hypothesis test on the slope coefficient to test whether there is a linear relationship between price and square metres. Include the null and alternative hypotheses; key test results and an appropriate conclusion.
e) Does the linear regression provide a good model? Give statistical reasons based on the scatterplot, p-values, the standard error and coefficient of determination.
f) Is the regression equation useful in predicting the selling price for a house with 50 square metres of living space? Explain.
g) If you were developing a model to predict the selling price based on square metres, what other factors would you like to be able to include?
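The quantities that Task 3 asks about (slope, constant, R², the standard error of the estimate, and the t statistic for the slope) can be sketched with ordinary least squares as below. The data are hypothetical and the function name is an assumption for this example; Excel's regression output reports the same quantities, along with the p-value for the slope, which this sketch does not compute.

```python
import math


def simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x, with summary measures."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    std_error = math.sqrt(sse / (n - 2))      # standard error of the estimate
    slope_se = std_error / math.sqrt(sxx)     # standard error of the slope
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": 1 - sse / sst,
        "std_error": std_error,
        "t_slope": slope / slope_se,          # tests H0: slope = 0, df = n - 2
    }


# Hypothetical data: living space (m²) and selling price ($).
square_metres = [100, 150, 200, 250, 300]
price = [200000, 260000, 310000, 390000, 440000]

fit = simple_linear_regression(square_metres, price)
print(f"price = {fit['intercept']:.0f} + {fit['slope']:.0f} * square metres")
print(f"R^2 = {fit['r_squared']:.4f}, standard error = {fit['std_error']:.0f}")
```

For question f), note that a prediction is only as trustworthy as the range of square metres covered by the sample; predicting well outside that range is extrapolation.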

Question 2 [15 marks]
This question uses quarterly holiday bookings data for a travel firm from 2010 to 2014, contained in the file ‘holidays.xls’. You will need to examine and discuss the original time series plot, and smooth the data using a 4-period centred moving average and exponential smoothing. Follow the instructions below:
1. Original Time Series Plot:
a) Use Excel to obtain a time series plot of the bookings (quarters).
b) Write a brief summary on the characteristics of the time series.
c) Mention the absence or presence of the four time series components.
d) Provide examples from the time series to support your discussion. Make sure you include the time series plot.
2. 4-period centred moving average:
a) Use Excel to smooth the data using a 4-period centred moving average.
b) Produce another time series plot that shows both the original data and the 4-period centred moving average on the same plot.
c) Discuss what the 4-period centred moving average shows. Make sure you include the time series plot.
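A 4-period centred moving average can be sketched as below. Because four is an even number of periods, two adjacent 4-period averages are themselves averaged so that the result lines up with an actual quarter; the first two and last two quarters have no smoothed value. The bookings figures are hypothetical (they are not from holidays.xls) and the function name is an assumption for this example.

```python
def centred_moving_average_4(series):
    """Return a list the same length as series; None where the CMA is undefined."""
    n = len(series)
    cma = [None] * n
    for t in range(2, n - 2):
        first = sum(series[t - 2:t + 2]) / 4    # periods t-2 .. t+1
        second = sum(series[t - 1:t + 3]) / 4   # periods t-1 .. t+2
        cma[t] = (first + second) / 2           # centred on period t
    return cma


# Hypothetical quarterly bookings over two years.
bookings = [10, 20, 30, 40, 14, 24, 34, 44]

smoothed = centred_moving_average_4(bookings)
print(smoothed)
```

With quarterly data, each average spans exactly one year, so the seasonal pattern cancels out and the smoothed line mainly shows the trend.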
3. Exponential Smoothing:
a) Use Excel to smooth the data using exponential smoothing.
b) Initially, produce a time series plot with the original data and two exponentially smoothed data series with α = 0.5 and α = 0.2. That is, you should have 3 time series plots on the one graph.
c) Which exponentially smoothed time series do you prefer? Why? What does your preferred smoothed time series show? Again, make sure you include your time series plot.
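Simple exponential smoothing can be sketched as below, writing the smoothing constant as α and assuming the common convention that the first smoothed value equals the first observation, with each later value given by α × (current observation) + (1 − α) × (previous smoothed value). The bookings figures are hypothetical and the function name is an assumption for this example. Note that Excel's Analysis ToolPak Exponential Smoothing tool asks for a damping factor, which is 1 − α, so α = 0.2 corresponds to a damping factor of 0.8.

```python
def exponential_smoothing(series, alpha):
    """Return the exponentially smoothed series, starting from the first value."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical quarterly bookings.
bookings = [10, 20, 30, 40, 14, 24, 34, 44]

smooth_05 = exponential_smoothing(bookings, alpha=0.5)
smooth_02 = exponential_smoothing(bookings, alpha=0.2)
print(smooth_05)
print(smooth_02)
```

A smaller α gives more weight to past values, so the series is smoother but slower to react to changes; a larger α tracks the original data more closely.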
For all of the above sections, make sure you include a copy of your Excel spreadsheet as well as all time series plots.