
QBM117 Business Statistics
Due date: Monday 31 August 2015
Value: 15%
This assignment is designed to assess the following learning outcomes:
• be able to summarise and interpret data graphically and numerically;
• be able to explain the standard uses of Statistics in the media and in business environments, and judge whether the statistical methodology and conclusions drawn are appropriate;
• be able to use a statistical package to analyse data appropriately, and then interpret the output;
• be able to calculate and interpret probabilities, and use standard discrete and continuous probability distributions;
• be able to evaluate if the assumptions underlying statistical techniques are valid in a given scenario;
• be able to apply basic principles of survey design, such as determination of appropriate sample sizes and sampling techniques.
The assignment must be neatly handwritten with any Excel output inserted where required at the appropriate place in the assignment, not in an Appendix at the back of the assignment. Use the assignment template provided in the assignment folder in resources when preparing this assignment.
A completed SCM coversheet (available in the assignment folder in resources) must be attached to the front cover of the assignment. Read the Student Declaration carefully before signing and dating the coversheet.
If submitting the assignment through EASTS, the assignment must be uploaded as a single Word or PDF file. Assignments submitted in non-printable formats such as a ZIP file or as a collection of images will not be marked. If your scanner produces separate graphics files, please paste them into a Word document before submitting to EASTS. Pages must be numbered, and your name and student number must be included on every page.
Marks will be deducted for assignments which do not follow these guidelines.
Question 1 (62 marks)
Download the data set ‘auction data.xls’ from the Assignment folder in the Resources section of Interact. The data in the worksheet tab “Data” show the sales results (as compiled by Australian Property Monitors (APM)) for properties listed for auction in Sydney for the week ending 20 June 2015. The variables in this data set are Beds, Type, Price and Result, representing the number of bedrooms, the type of property (house, unit or townhouse), the selling price (if provided) and the result of the auction respectively. The name of the selling agent is also included. An explanation of the ‘Result’ categories is provided in the worksheet tab “Results Categories” of the auction data file.
a. For each of the variables in the list below, state whether the data are quantitative or qualitative and include the level of the data (nominal, ordinal, interval or ratio).
i. Result ii. Type iii. Price
(6 marks)
b. i. Using the coding from the worksheet “Results Categories” provided with the data set, copy and then complete the table below by allocating the codes appropriately. The first row has been done for you.
Result category            Code(s)
Sold prior                 SP, PN
Sold at auction
Sold after the auction
Withdrawn from sale
Did not sell
Are there any codes that do not fit into any of these five categories? If you think that some of the codes do not fit, justify your answer.
Most real data sets you encounter will contain problem data such as typographical errors, transcription errors, coding errors and possible outliers. This data set is no exception. In a real situation, we would make a note of these anomalies and ask for them to be investigated or checked. Since we cannot contact the owner of this data set, for the purpose of this assignment we will ignore the anomalies and work with the data as best we can.
Read the document ‘working with real data sets.pdf’ found in the Assignment folder in Resources, which explains some ways of identifying errors in a data set and how to deal with them.
ii. List any two different possible errors or problems that you have found in this data set and explain why you have decided they need investigating. Do not list two problems of the same type, such as two houses listed as withdrawn from sale but each with a selling price.
You may want to leave this question until later. As you work with the data set you will encounter some of these errors, so make a note of them as you find them.
(4 marks)
c. Shiby and Kristin are property owners in the Sydney region and are planning to sell their five-bedroom house over the next few months. They are considering putting it up for auction, so they are interested in using these data to gain insight into the current Sydney auction market.
Using the complete data set, generate a three-way pivot table report of ‘Beds’ by ‘Type’ by ‘Result’. Use ‘Type’ and ‘Beds’ as row labels. (Hint: include ‘Result’ both as the column labels and in the body of the table.)
Include the table as part of the submitted assignment.
Use the data in the pivot table to answer the following vendors’ questions about the properties listed for auction in Sydney on 20 June 2015. You may find the table easier to read if you right-justify the column labels (along the top of the table).
i. How many properties were originally listed for auction for the day in question?
How many of these were withdrawn from sale?
ii. How many of the listed properties were sold? Express this as a percentage of all the properties listed for auction that week excluding all properties that were withdrawn from sale.
iii. How many properties were definitely sold before the auction date? How many properties other than houses did not sell during the auction process (i.e. did not sell under the hammer)?
iv. How many 4-bedroom houses did not sell? Do not include any that were withdrawn from sale.
Express this number as a percentage of all the 4-bedroom houses listed for auction that week excluding all houses that were withdrawn from sale.
v. Of all the properties listed for auction, how many houses were sold that week, including those sold prior to the auction and those sold after it? Then express this as a percentage of all the houses listed for auction that week, excluding those that were withdrawn from sale.
(13 marks)
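The percentages in parts ii, iv and v all follow one pattern: count the sold listings, and leave withdrawn listings out of the denominator. A minimal sketch of that arithmetic in Python, as a cross-check on the Excel pivot table. The records and the code-to-category mapping below are hypothetical stand-ins (only SP and PN as "sold prior" come from the table above); take the real mapping from the “Results Categories” worksheet.

```python
from collections import Counter

# Hypothetical listings as (type, beds, result code); these are NOT the
# rows of 'auction data.xls', just a stand-in to show the arithmetic.
records = [
    ("house", 4, "S"), ("house", 4, "PI"), ("house", 5, "SP"),
    ("house", 3, "W"), ("unit", 2, "S"), ("unit", 1, "SA"),
    ("unit", 2, "W"), ("townhouse", 3, "PN"),
]

# Assumed code-to-category mapping (hypothetical apart from SP and PN).
CATEGORY = {
    "SP": "sold prior", "PN": "sold prior",
    "S": "sold at auction", "SA": "sold after",
    "W": "withdrawn", "PI": "did not sell",
}
SOLD = {"sold prior", "sold at auction", "sold after"}

# Three-way count of (type, beds, category): the cells of the pivot table.
counts = Counter((t, b, CATEGORY[r]) for t, b, r in records)


def percent_sold(rows):
    """Percentage of listings sold, leaving withdrawn listings out of the denominator."""
    cats = [CATEGORY[r] for _t, _b, r in rows]
    listed = [c for c in cats if c != "withdrawn"]
    sold = [c for c in listed if c in SOLD]
    return 100 * len(sold) / len(listed)


houses = [row for row in records if row[0] == "house"]
print(f"All types: {percent_sold(records):.1f}% sold")  # 5 of 6 -> 83.3%
print(f"Houses:    {percent_sold(houses):.1f}% sold")   # 2 of 3 -> 66.7%
```

With the real data, the same two counts (sold, and listed but not withdrawn) come straight from the pivot table's row and column totals.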
d. Extract the houses data from the complete data set.
i. Use Excel to generate a table of descriptive statistics for these houses data only for the variable ‘Price’ and include it in your assignment submission. Express the mean to the nearest thousand dollars and the standard deviation to the nearest hundred dollars.
ii. What was the selling price of the cheapest house sold that week? Look further afield to include information about what type of property it was, how many bedrooms it had, which real estate agent sold it and interpret its results code.
iii. The table produced in part i. contains a figure that represents the sample variance. Copy the table below into your assignment and fill in both cells with the sample variance expressed as an actual number value and in Scientific Notation. One of the answers will come straight from your table of descriptive statistics. To obtain the other answer, you will have to convert the sample variance from scientific notation to number form or from number form into scientific notation. When expressing the number using Scientific Notation, do not use Excel’s method of representing such large numbers.
Sample Variance
Actual number value
Scientific Notation
(8 marks)
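Converting between number form and scientific notation is mechanical: the exponent is the floor of log10 of the value, and the mantissa is the value divided by that power of ten. A minimal sketch in Python, assuming hypothetical house prices (not the assignment data); `statistics.variance` uses the divisor n − 1, as does the Sample Variance in Excel's Descriptive Statistics output. The helper `to_scientific` is a hypothetical name introduced here.

```python
import math
import statistics

# Hypothetical house prices in dollars (NOT the assignment data).
prices = [850_000, 1_200_000, 640_000, 2_350_000, 980_000]

variance = statistics.variance(prices)  # sample variance, divisor n - 1


def to_scientific(x, digits=4):
    """Return (mantissa, exponent) with 1 <= |mantissa| < 10 and x = mantissa * 10**exponent."""
    if x == 0:
        return 0.0, 0
    exponent = math.floor(math.log10(abs(x)))
    mantissa = round(x / 10**exponent, digits)
    if abs(mantissa) >= 10:  # rounding can push e.g. 9.99995 up to 10.0
        mantissa /= 10
        exponent += 1
    return mantissa, exponent


m, e = to_scientific(variance)
print(f"{variance:,.0f} = {m} x 10^{e}")  # 451,730,000,000 = 4.5173 x 10^11
```

The printed form "4.5173 x 10^11" is written the way it would be handwritten (4.5173 × 10¹¹), not in Excel's 4.5173E+11 style.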
e. Using the house prices only, use Excel to prepare a frequency distribution and histogram of the variable ‘Price’.
Use $500 000 as the upper limit of the first class and a class width of $500 000.
(5 marks)
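With an upper limit of $500 000 for the first class and a width of $500 000, the class upper limits (Excel's "bins") are 500 000, 1 000 000, 1 500 000, and so on, until the largest price is covered. A minimal sketch in Python using hypothetical prices (not the assignment data); like Excel's bins, each class counts values above the previous upper limit and up to and including its own.

```python
import math
from bisect import bisect_left

# Hypothetical house prices in dollars (NOT the assignment data).
prices = [420_000, 850_000, 1_000_000, 1_450_000, 2_600_000, 3_900_000, 760_000]

FIRST_UPPER = 500_000   # upper limit of the first class
WIDTH = 500_000         # class width

# Enough classes for the largest price to fall inside the last one.
n_classes = math.ceil((max(prices) - FIRST_UPPER) / WIDTH) + 1
uppers = [FIRST_UPPER + k * WIDTH for k in range(n_classes)]

# bisect_left finds the first upper limit >= price, i.e. the price's class.
freq = [0] * n_classes
for p in prices:
    freq[bisect_left(uppers, p)] += 1

for upper, f in zip(uppers, freq):
    print(f"{upper - WIDTH:>9,} < price <= {upper:>9,}: {f}")
```

Empty classes in the middle of the table and a long run of sparse classes at the top are the kind of evidence part f asks you to discuss.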
f. After preparing this histogram, discuss whether the choice of classes suggested above is appropriate. Refer to important aspects such as the number of classes, the width of the classes and whether all data are included in the classes chosen.
(3 marks)
g. Generate a boxplot for the variable ‘Price’ for the houses data only. Include the 5-number summary generated by Excel and the boxplot with your assignment submission.
(4 marks)
h. Answer the following questions regarding the data for the houses only and indicate which of the outputs in the list (pivot table, histogram, boxplot, five number summary, tables of descriptive statistics) provided the answer.
i. Exactly how many outliers are there in the distribution of the selling prices of houses?
ii. How many houses had a sale price listed? Why is this not the same number as the answer you obtained in your answer to part c (v)?
iii. 25% of houses sold for $x or more. What is x?
iv. Comment on the shape of the distribution of the ‘Price’ variable for houses (skewed, symmetric, direction of skewness if relevant, unimodal, bimodal, etc.). Provide at least three items of supporting evidence from the output generated in parts (d), (e) and (g).
(10 marks)
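The five-number summary, the 1.5 × IQR outlier rule and the mean-versus-median comparison can be sketched together. A minimal Python example on hypothetical prices (not the assignment data). `method="inclusive"` interpolates quartiles the same way as Excel's QUARTILE.INC; other tools, including some boxplot add-ins, use different quartile rules, so their values can differ slightly.

```python
import statistics

# Hypothetical house prices in dollars (NOT the assignment data).
prices = [640_000, 720_000, 850_000, 910_000, 980_000,
          1_050_000, 1_200_000, 1_380_000, 4_750_000]

q1, median, q3 = statistics.quantiles(prices, n=4, method="inclusive")
five_number = (min(prices), q1, median, q3, max(prices))

# 1.5 x IQR rule: values beyond either fence are flagged as outliers.
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [p for p in prices if p < lower_fence or p > upper_fence]

print("Five-number summary:", five_number)
print("Outliers:", outliers)
# A mean well above the median is one sign of right (positive) skew.
print("Mean:", round(statistics.mean(prices)), "vs median:", median)
```

For part h iii, the value that the top 25% of prices equal or exceed is the third quartile, q3.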
i. To report typical selling prices of houses and units sold at auction, would it be more appropriate for a media outlet to quote the mean or the median price? Why?
(2 marks)
j. Shiby and Kristin would like to compare the sales outcomes of all the properties that were listed. For all the properties that were listed for auction (excluding the studio apartments),
i. generate a two way pivot table of ‘Type’ by ‘Result’.
Hint: include ‘Result’ in both the column and in the body of the table.
ii. From this pivot table, generate a single horizontal 100% component bar chart with type along the vertical axis and the different types of ‘result’ making up the components of each of the three bars. Include both the pivot table and the component bar chart with your assignment submission.
iii. Shiby and Kristin showed the chart to a friend who concluded from the chart that more units than houses were sold prior to auction (SP). Was their friend correct? Explain.
(7 marks)
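A 100% component bar chart shows row percentages, so each bar is scaled to its own type's total. A larger segment therefore means a larger share within that type, not a larger count. A minimal sketch in Python with hypothetical counts (not taken from the auction data) showing how the two can point in opposite directions:

```python
# Hypothetical counts by type and outcome (NOT taken from the auction data).
counts = {
    "house": {"sold prior": 30, "sold at auction": 150, "did not sell": 60},
    "unit": {"sold prior": 20, "sold at auction": 50, "did not sell": 10},
}

# Row percentages: each type's bar in a 100% component chart sums to 100.
row_pct = {
    t: {outcome: 100 * n / sum(row.values()) for outcome, n in row.items()}
    for t, row in counts.items()
}

for t in counts:
    print(f"{t:6s} sold prior: {counts[t]['sold prior']:3d} properties "
          f"= {row_pct[t]['sold prior']:.1f}% of its bar")
```

Here the unit segment (25%) is twice as long as the house segment (12.5%), even though fewer units (20) than houses (30) were sold prior.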
Question 2 (20 marks)

a. At its annual office party, a company runs a lucky dip for its 30 employees. In the container of prizes there are 25 movie vouchers, 3 vouchers for dinner for two, one voucher for a night at a prestigious hotel and one voucher for a weekend at a resort. If an employee and her partner (also an employee) each chooses an envelope, what is the probability that one of them picks the envelope containing the weekend at a resort?
(2 marks)
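One way to reason about the lucky dip is through the complement: find the probability that neither of the two draws is the resort envelope, then subtract from 1. A minimal sketch with exact fractions, assuming the two envelopes are drawn without replacement from the full set of 30 and every envelope is equally likely to be chosen:

```python
from fractions import Fraction

envelopes = 30   # 25 movie + 3 dinner + 1 hotel + 1 resort
resort = 1

# Two draws without replacement: P(neither is the resort), then the complement.
p_neither = (Fraction(envelopes - resort, envelopes)
             * Fraction(envelopes - resort - 1, envelopes - 1))
p_one_of_them = 1 - p_neither
print(p_one_of_them, float(p_one_of_them))
```

The result agrees with the direct count: two of the 30 envelopes go to the couple, so the chance that the resort envelope is one of them is 2/30.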
b. P(A) = 0.4, P(B) = 0.2 and P(A | B) = 0.04. Calculate
i. P(A?B) (3 marks) ii. P(B?A) . (3 marks)
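Both parts build on the joint probability P(A ∩ B) = P(A | B) P(B). A minimal sketch with exact fractions that computes the joint probability together with the union and the reverse conditional P(B | A) given by the usual addition and conditional-probability rules:

```python
from fractions import Fraction

p_a = Fraction(4, 10)
p_b = Fraction(2, 10)
p_a_given_b = Fraction(4, 100)

p_a_and_b = p_a_given_b * p_b          # multiplication rule: P(A and B) = P(A|B) P(B)
p_a_or_b = p_a + p_b - p_a_and_b       # addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_b_given_a = p_a_and_b / p_a          # conditional probability: P(B|A) = P(A and B) / P(A)

for name, value in [("P(A and B)", p_a_and_b),
                    ("P(A or B)", p_a_or_b),
                    ("P(B|A)", p_b_given_a)]:
    print(f"{name} = {value} = {float(value)}")
```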
c. Given that Z is the standard normal random variable,
i. calculate P(Z?1.81)
(2 marks)
ii. calculate P(−1.37 ≤ Z ≤ −0.81)
(3 marks)
iii. if P(Z?a) = 0.65, find a.
(3 marks)
Note: A relevant and fully labelled diagram is required for each part of part c. Refer to those included in the lecture slides and tutorials as examples of what is required.
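Table look-ups for the standard normal distribution can be cross-checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later): `cdf(z)` gives P(Z ≤ z) and `inv_cdf(p)` gives the z with P(Z ≤ z) = p. A minimal sketch, reading part ii as P(−1.37 ≤ Z ≤ −0.81) and showing both the lower-tail and upper-tail versions of parts i and iii:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Part i, both tail directions.
p_le_181 = Z.cdf(1.81)          # P(Z <= 1.81)
p_ge_181 = 1 - Z.cdf(1.81)      # P(Z >= 1.81)

# Part ii: area between two z-values = difference of cumulative areas.
p_between = Z.cdf(-0.81) - Z.cdf(-1.37)

# Part iii, both tail directions.
a_lower = Z.inv_cdf(0.65)       # a with P(Z <= a) = 0.65
a_upper = Z.inv_cdf(1 - 0.65)   # a with P(Z >= a) = 0.65

print(round(p_le_181, 4), round(p_ge_181, 4), round(p_between, 4))
print(round(a_lower, 4), round(a_upper, 4))
```

The results should agree with the standard normal table to about four decimal places; the labelled diagrams are still required.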
d. The Australian Bureau of Statistics (ABS) uses the Monthly Population Survey to monitor changes in the population. Give two reasons for not using a census to monitor the monthly changes.
(2 marks)
e. An estimate of the Australian population obtained from the 2011 Census is subject to sampling error. Is this statement True or False? Explain.
(2 marks)
Question 3 (10 marks = 9 marks as indicated + 1 for answering the question as a sentence in each of parts b, c and d)
Information about trade union membership and gender was obtained from the results of a survey conducted in 2010 by the ABS. The data about trade union membership and gender are presented in the two-way table below.
              Member of a trade union (T)   Not a member of a trade union (T̄)   Totals
Female (F)    880                           3520
Male (M)      920                           3780
Totals
a. Copy the table into your assignment and complete all row and column totals in the table.
(2 marks)
Use the notation from the table above when you answer parts b, c and d that follow.
b. If one employee is selected at random from those surveyed, find the probability that this employee is a member of a trade union.
(2 marks)
c. If one employee is selected at random from those surveyed, find the probability that this employee is female given that the selected employee is not a trade union member.
(2 marks)
d. Are the events ‘member of a trade union’ and ‘being a female employee’ independent? Justify your answer using probabilities.
(3 marks)
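All the probabilities in this question come from the completed two-way table: marginal probabilities use row or column totals over the grand total, and a conditional probability restricts the denominator to the given row or column. A minimal sketch with exact fractions using the counts from the table; the key `"notT"` is a hypothetical label for "not a member" (T̄):

```python
from fractions import Fraction

# Counts from the two-way table (T = trade union member, notT = not a member).
counts = {
    ("F", "T"): 880, ("F", "notT"): 3520,
    ("M", "T"): 920, ("M", "notT"): 3780,
}

row_total = {g: sum(n for (r, _), n in counts.items() if r == g) for g in ("F", "M")}
col_total = {m: sum(n for (_, c), n in counts.items() if c == m) for m in ("T", "notT")}
grand = sum(counts.values())

p_t = Fraction(col_total["T"], grand)                                   # P(T)
p_f_given_not_t = Fraction(counts[("F", "notT")], col_total["notT"])    # P(F | not T)
p_t_given_f = Fraction(counts[("F", "T")], row_total["F"])              # P(T | F)

# T and F are independent exactly when P(T | F) equals P(T).
independent = p_t_given_f == p_t
print(row_total, col_total, grand)
print(float(p_t), float(p_f_given_not_t), float(p_t_given_f), independent)
```

The two probabilities compared in the independence check are close (0.2 against about 0.198) but not equal, and the test for independence is exact equality.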
Question 4 (15 marks = 14 marks as indicated + 1 for answering the question as a sentence in each of parts c, d and e)
A study into the survival rates of Australian businesses found that of the businesses operating in 2007, 95.8% were classified as small businesses (i.e. they had fewer than 20 employees). If a business is not classified as a small business, the probability that it is still operating after four years is 75.7%. If a business is classified as a small business, then the probability of its still operating after four years is 59.7%.
a. Using letters of the alphabet and appropriate probability notation, define the two simple events described in this problem and their complements (4 definitions altogether).
(2 marks)
b. Draw a probability tree to represent the information given in the question using the letters that you used to define the simple events in part a.
Note: Marks will be deducted if the probabilities do not appear along the branches of the probability tree.
(3 marks)
c. Determine the probability that a randomly chosen business has fewer than 20 employees and will still be operating after four years.
(3 marks)
d. Determine the probability that a randomly chosen business will no longer be operating after four years.
(3 marks)
e. Given that a business that had been operating more than four years ago is no longer operating, what is the probability that it had been a small business?
(3 marks)
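The tree calculations follow three rules: multiply along a branch for a joint probability, add the relevant branches for a total probability, and divide one branch by the total for Bayes' rule. A minimal sketch in Python; the letters S (small business) and O (still operating after four years) are assumed labels, not necessarily the ones you define in part a.

```python
# Assumed labels: S = small business (fewer than 20 employees),
#                 O = still operating after four years.
p_s = 0.958
p_o_given_s = 0.597
p_o_given_not_s = 0.757

p_not_s = 1 - p_s

# Multiply along one branch of the tree: P(S and O).
p_s_and_o = p_s * p_o_given_s

# Total probability of "no longer operating", summed over both first-level branches.
p_not_o = p_s * (1 - p_o_given_s) + p_not_s * (1 - p_o_given_not_s)

# Bayes' rule: P(S | not O) = P(S and not O) / P(not O).
p_s_given_not_o = p_s * (1 - p_o_given_s) / p_not_o

print(round(p_s_and_o, 6), round(p_not_o, 5), round(p_s_given_not_o, 4))
```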
Question 5 (11 marks = 10 marks as indicated + 1 for answering the question as a sentence in each part)
Use the appropriate statistical tables for part b. below and the appropriate Excel statistical function to determine the probabilities in parts c and d below. Include with your answer the Excel formula used to answer parts c. and d.
Government research has found that 78% of people aged 15 to 19 who live in a major city are attending an educational institution. The research also found that only 40% of people aged 15 to 19 who live in very remote parts of the country are attending an educational institution.
A sample of 25 people aged 15 to 19 living in very remote areas of the country is selected.
a. Identify the type of distribution being described by the random variable in the sentence above and write down the value(s) of its parameter(s).
(2 marks)
b. Calculate the probability that less than half of these 25 people are attending an educational institution.
(2 marks)
Another sample of one hundred 15- to 19-year-olds living in a major city is selected. The number attending an educational institution is counted.
c. Calculate the probability that more than three quarters of these 15- to 19-year-olds are attending an educational institution.
(3 marks)
d. Calculate the probability that more than 25 but fewer than 75 of these 15- to 19-year-olds are attending an educational institution.
(3 marks)
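Each part reduces to cumulative binomial probabilities once the inequality is translated into whole numbers: "less than half of 25" is X ≤ 12, "more than 75" is 1 − P(Y ≤ 75), and "more than 25 but fewer than 75" is P(Y ≤ 74) − P(Y ≤ 25). A minimal sketch in Python (3.8 or later, for `math.comb`) that can be used to cross-check table and Excel values; `binom_cdf` is a hypothetical helper written here, not a library function.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Remote areas: X ~ Bin(25, 0.40). "Less than half of 25" means X <= 12.
p_b = binom_cdf(12, 25, 0.40)

# Major city: Y ~ Bin(100, 0.78).
p_c = 1 - binom_cdf(75, 100, 0.78)                          # more than 75, i.e. Y >= 76
p_d = binom_cdf(74, 100, 0.78) - binom_cdf(25, 100, 0.78)   # 26 <= Y <= 74

print(round(p_b, 4), round(p_c, 4), round(p_d, 4))
```

The same quantities in Excel use BINOM.DIST with the cumulative flag set to TRUE, for example =BINOM.DIST(12,25,0.4,TRUE), =1-BINOM.DIST(75,100,0.78,TRUE) and =BINOM.DIST(74,100,0.78,TRUE)-BINOM.DIST(25,100,0.78,TRUE).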
Question 6 (10 marks = 9 marks as indicated + 1 for answering the question as a sentence in each part)
Use the appropriate statistical tables for part a. in the following question and the appropriate Excel statistical function to determine the probability in part b. Also include the Excel formula used with your answer to part b.
A manufacturer of vinyl floor coverings (lino) finds that the average number of flaws in a roll of vinyl is 0.05.
a. A business customer replaced the existing floor coverings in his shop with lino and needed four rolls to do the job. What is the probability that the completed work showed no flaws in the lino?
(3 marks)
b. Suppose the retailer sells 1000 rolls of this floor covering each year. What is the probability that more than 80 flaws exist in these rolls?
Present your answer using scientific notation with 5 decimal places. (Please note that it is not acceptable to present Excel’s code for expressing scientific notation as your answer here).
(4 marks)
c. What is the probability that no more than 80 flaws occur in these 1000 rolls of lino?
(2 marks)
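Because flaws occur at a mean rate per roll, the Poisson mean scales with the number of rolls: 4 × 0.05 = 0.2 for part a and 1000 × 0.05 = 50 for parts b and c. A minimal sketch in Python, assuming flaws follow a Poisson distribution as the question implies; `poisson_pmf` is a hypothetical helper that works on the log scale so that large powers and factorials do not overflow.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed on the log scale to avoid overflow."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


# a. Four rolls at 0.05 flaws per roll: mean 0.2 flaws; "no flaws" is X = 0.
p_a = poisson_pmf(0, 4 * 0.05)

# b. 1000 rolls: mean 50 flaws. Sum the upper tail directly (terms past 300
#    are negligible), which keeps full relative precision for a tiny probability.
lam = 1000 * 0.05
p_b = sum(poisson_pmf(k, lam) for k in range(81, 301))

# c. No more than 80 flaws: P(X <= 80).
p_c = sum(poisson_pmf(k, lam) for k in range(0, 81))

# Write p_b as "m x 10^e" with 5 decimal places instead of Excel-style "E" notation.
mantissa, exponent = f"{p_b:.5e}".split("e")
print(f"P(X > 80) = {mantissa} x 10^{int(exponent)}")
print(round(p_a, 4), p_c)
```

In Excel, the cumulative probability P(X ≤ 80) is =POISSON.DIST(80,50,TRUE), so part b is =1-POISSON.DIST(80,50,TRUE).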
Question 7 (17 marks = 16 marks as indicated + 1 for answering the question as a sentence in each part)
A relevant and fully labelled diagram is required for each of parts b., c. and d. Refer to the diagrams included in the lecture slides and tutorials as examples of what is required.
According to records kept to the end of 2014 by a newsagent who delivers newspapers to households, the mean travel time to distribute the newspapers on a Friday is 1 hour and 2 minutes, with a standard deviation of 5 minutes. Assume these travel times are normally distributed.
a. Write down the value(s) of the parameter(s) of this distribution. Express the parameter(s) in minutes.
(2 marks)
b. A random Friday is chosen from the records. What is the probability the newsagent took less than 50 minutes to make the deliveries on that chosen Friday?
(4 marks)
c. One of the newsagent’s customers is very impatient and calls the newsagency if the paper has not been delivered by 6:15 a.m. If on one particular morning in 2015 the newsagent sets off at 5:00 a.m. to make the deliveries, what is the chance that the customer will ring the agency to find out where his newspaper is? Explain the meaning of this probability in simple terms.
(5 marks)
d. There are 52 Fridays in 2015. If, at the end of the year, the newsagent calculates the average delivery time for Fridays, what is the probability that this time will be more than 1 hour and 4 minutes?
(5 marks)
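All three probabilities come from normal distributions measured in minutes: a single Friday follows N(62, 5²), and the mean of 52 Fridays follows N(62, (5/√52)²) by the sampling distribution of the mean. A minimal sketch using `statistics.NormalDist` (Python 3.8 or later). It assumes, as part c appears to intend, that the customer's paper arrives when the round finishes, so the customer rings if the round takes more than 75 minutes (5:00 a.m. to 6:15 a.m.).

```python
import math
from statistics import NormalDist

mu, sigma = 62, 5            # 1 hour 2 minutes, 5 minutes
X = NormalDist(mu, sigma)    # travel time on a single Friday

p_b = X.cdf(50)              # P(X < 50)
p_c = 1 - X.cdf(75)          # P(X > 75): not finished by 6:15 a.m. after a 5:00 a.m. start

# Sampling distribution of the mean of 52 Fridays: standard error sigma / sqrt(n).
xbar = NormalDist(mu, sigma / math.sqrt(52))
p_d = 1 - xbar.cdf(64)       # P(mean > 64 minutes)

print(round(p_b, 4), round(p_c, 4), round(p_d, 4))
```

The standard error (about 0.69 minutes) is much smaller than the standard deviation of a single Friday, which is why a 2-minute excess is far less likely for the yearly average than for one day.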