STATS1900 Business Statistics
Date Due: Week 10, Thursday May 28, 4pm
This assignment requires a substantial amount of computer work and written commentary. You may need to seek guidance from your tutor along the way. Do not leave things until the last minute!
The questions give a careful statement of what is required and information about the presentation of your answers. Please follow these carefully! Marks may be deducted for poor presentation.
Question 1 [30 marks]
You will again examine the data, used in the minor assignment, from the health agency that is conducting an evaluation of all the hospitals in its region. The data collected are contained in a file called ‘hospitals.xls’, which has been placed on Moodle. Make sure you check with your lab tutor how to obtain these data. For this assignment, you need to use Excel to conduct the statistical analyses.
The columns of the file contain the following information:
Column   Name         Description
C1       Hospital     Hospital number
C2       Control      Type of control (1 = government, non-federal; 2 = nongovernment, not-for-profit; 3 = for-profit; 4 = federal government)
C3       Service      Type of service (1 = general medical; 2 = psychiatric)
C4       Beds         Number of beds
C5       Admissions   Number of admissions
Before you begin any analysis, you must take a random sample of 50 hospitals from the 200 provided in the data set, using the Random Sample Generator available on Moodle (Random Sample Generator-sem1_15). Note that this means you will generate a different data set from the one used in the minor assignment.
Each of the three tasks below will guide you through statistical analyses that investigate different aspects of the data. Your response to each task should be presented in a well-written manner, with appropriate statistical analyses and suitable conclusions in the context of the task. Appropriate annotated computer output should be incorporated within the body of each response.
Task 1
1. Use Excel to obtain the descriptive statistics (mean, standard deviation and standard error) of the number of admissions, and calculate the 95% confidence interval for the mean number of admissions at the hospitals based on your sample of 50. Show all working and Excel output.
2. Explain what the confidence interval represents in the context of the problem. 4 marks
Task 2
The health agency was interested in testing the hypothesis that the mean number of admissions for all US hospitals is greater than 6000 patients. Test this hypothesis using your sample of 50 hospitals. Write a brief report that includes appropriate computer output and does the following:
a) Check any assumptions of the hypothesis test; 1 mark
b) Write down both the null and alternative hypotheses; 2 marks
c) Carry out the t test and report the test statistic and the p-value; 3 marks
d) Write an appropriate conclusion in the context of the problem. 2 marks
Task 3
Is there a linear relationship between the number of admissions and the number of beds? Produce a scatterplot with a regression line, and perform a regression analysis on these two variables that will enable you to predict the number of admissions given the number of beds of a particular hospital. Include this output in your assignment. Answer the following questions:
a) Write down the regression equation.
b) In practical terms, what does R² tell us?
c) In practical terms, what do the constant and slope coefficients tell us?
d) Write down the standard error of the estimate and explain what it means in the context of the problem.
e) Conduct a hypothesis test on the slope coefficient to test whether there is a linear relationship between the number of admissions and the number of beds. Include the null and alternative hypotheses, key test results and an appropriate conclusion.
f) Does the linear regression provide a good model? Give statistical reasons based on the scatterplot, the p-values, the standard error and the coefficient of determination.
g) Is the regression equation useful in predicting the number of admissions for a hospital with 1500 beds? Explain.
h) If you were developing a model to predict admissions based on the number of beds, what other factors would you like to be able to include? 2 marks
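The confidence interval, t test and regression requested in Question 1 can be cross-checked outside Excel. Below is a minimal Python sketch (Python is used for illustration only; the assignment itself requires Excel) with hypothetical function names: `describe` returns the mean, sample standard deviation and standard error; `t_interval` builds mean ± t × SE; `one_sample_t` gives the test statistic for H1: μ > μ0; and `simple_regression` fits a least-squares line. The Python standard library has no t distribution, so the critical value is passed in (an assumption: look it up in Excel, e.g. `T.INV.2T(0.05, 49)` is about 2.0096 for a 95% interval with n = 50), and the one-tailed p-value is left to Excel's `T.DIST.RT(t, 49)`.

```python
import math
import statistics


def describe(sample):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # n - 1 divisor, matching Excel's STDEV.S
    se = sd / math.sqrt(n)
    return mean, sd, se


def t_interval(sample, t_crit):
    """Confidence interval for the population mean: mean +/- t_crit * SE.

    t_crit is supplied by the caller for df = n - 1 (assumption: looked up
    separately, e.g. in Excel with T.INV.2T).
    """
    mean, _, se = describe(sample)
    margin = t_crit * se
    return mean - margin, mean + margin


def one_sample_t(sample, mu0):
    """Test statistic t = (mean - mu0) / SE for H0: mu <= mu0 vs H1: mu > mu0."""
    mean, _, se = describe(sample)
    return (mean - mu0) / se


def simple_regression(x, y):
    """Least-squares fit y = b0 + b1 * x plus the usual summary statistics."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                   # slope
    b0 = y_bar - b1 * x_bar          # intercept (constant)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    s_e = math.sqrt(sse / (n - 2))   # standard error of the estimate
    se_b1 = s_e / math.sqrt(sxx)     # standard error of the slope
    return {
        "intercept": b0,
        "slope": b1,
        "r_squared": 1 - sse / sst,
        "std_error": s_e,
        # Undefined (division by zero) if the points lie exactly on a line.
        "t_slope": b1 / se_b1,
    }
```

With the 50 sampled values held in lists named, say, `admissions` and `beds` (hypothetical names), `t_interval(admissions, 2.0096)`, `one_sample_t(admissions, 6000)` and `simple_regression(beds, admissions)` should agree with Excel's Descriptive Statistics and Regression output (the Intercept and slope coefficients, R Square, Standard Error, and the slope's t Stat).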
Question 2 [15 marks]
This question uses data on monthly Australian beer production (megalitres) from January 1991 to August 1995, contained in the file ‘beer.xls’. You will need to examine and discuss the original time series plot, and then smooth the data using a 12-point centred moving average and exponential smoothing. Follow the instructions below:
1. Original Time Series Plot:
a) Use Excel to obtain a time series plot of the beer production versus time (months).
b) Write a brief summary of the characteristics of the time series.
c) Mention the absence or presence of the four time series components.
d) Provide examples from the time series to support your discussion. Make sure you include the time series plot.
2. 12-point centred moving average:
a) Use Excel to smooth the data using a 12-point centred moving average.
b) Produce another time series plot that shows both the original data and the 12-point centred moving average on the same plot.
c) Discuss what the 12-point centred moving average shows. Make sure you include the time series plot.
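A 12-point centred moving average is an ordinary 12-term moving average followed by a 2-term average of adjacent results, so that each smoothed value lines up with a month rather than falling between two months. The minimal Python sketch below (function name hypothetical) mirrors the spreadsheet approach of an AVERAGE() column over 12 cells followed by a column averaging each consecutive pair. The first six and last six months have no centred value and are returned as None.

```python
def centred_moving_average_12(values):
    """2x12 centred moving average; None where the window does not fit."""
    n = len(values)
    result = [None] * n
    # Step 1: ordinary 12-term averages. ma12[i] covers values[i:i + 12],
    # so it is centred halfway between positions i + 5 and i + 6.
    ma12 = [sum(values[i:i + 12]) / 12 for i in range(n - 11)]
    # Step 2: average adjacent 12-term averages so each result is centred
    # exactly on position i + 6.
    for i in range(len(ma12) - 1):
        result[i + 6] = (ma12[i] + ma12[i + 1]) / 2
    return result
```

Because every window spans a full 12 months, a pattern that repeats every 12 months averages out, so the non-None values trace the trend of the series.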
3. Exponential Smoothing:
a) Use Excel to smooth the data using exponential smoothing.
b) Initially, produce a time series plot with the original data and two exponentially smoothed data series with α = 0.4 and α = 0.1. That is, you should have three time series on the one graph.
c) Which exponentially smoothed time series do you prefer? Why? What does your preferred smoothed time series show? Again, make sure you include your time series plot.
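Exponential smoothing updates the smoothed value as S_t = αY_t + (1 - α)S_(t-1). The minimal Python sketch below (function name hypothetical) assumes the common starting convention S_1 = Y_1; other starting values are possible and change the early smoothed values. Note that Excel's Data Analysis Exponential Smoothing tool asks for a damping factor, which is 1 - α, so α = 0.4 corresponds to a damping factor of 0.6 and α = 0.1 to a damping factor of 0.9.

```python
def exponential_smoothing(values, alpha):
    """Return the exponentially smoothed series S_t = a*Y_t + (1 - a)*S_(t-1)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not values:
        return []
    # Assumption: the smoothed series starts at the first observation.
    smoothed = [values[0]]
    for y in values[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed
```

A larger α (0.4) gives more weight to the latest observation, so the smoothed series follows the data closely; a smaller α (0.1) produces a smoother series that responds more slowly to changes.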
For all of the above sections, make sure you include a copy of your Excel spreadsheet as well as all time series plots. 5 marks