BUS708 Statistics and Data Analysis
Statistical Modelling Assignment
Trimester 3, 2018
1 OVERVIEW OF THE ASSIGNMENT
This assignment tests your skills in collecting and analysing data to answer a specific business problem. It also gives you the opportunity to apply the theories you have learned in this course, such as finding numerical summaries, displaying data with appropriate graphs and using statistical inference to solve business problems, including constructing hypotheses, testing them and interpreting the findings. You will use two datasets. One dataset will be sent to you individually via KOI student email; the other you need to find or collect.
Suppose you are working for an agency that analyses Australian airline services data in order to recommend improvements to the services airports provide to airlines. You will be given a series of research questions. Use the knowledge you gain from this course to answer these questions by displaying appropriate outputs from Excel, StatKey or Wolfram Alpha. Use these answers to write an executive summary that gives Sydney Airport a valuable recommendation for improving its services to airlines.
There are two datasets involved in this assignment: Dataset 1 and Dataset 2, detailed below.
Dataset 1: You will receive an email containing a dataset that is specifically allocated to you. This dataset is a subset of the International Airlines, Operated Flights and Seats to and from Australia sample file, provided by the Australian Government Open Data, and has been edited to include only a subset of the cases and variables. The original dataset can be obtained from https://data.gov.au/dataset/e82787e4-a480-4189-b963-1d0b6088103e/resource/c308bb0a-98a94cbb-bd0b-206d338ca06c/download/international_airline_activity_opfltsseats.csv and is licensed under Creative Commons Attribution 3.0 Australia. The data dictionary of the edited dataset is given in the following table.
Variable | Description | Values
In-Out | Whether the airline comes in or goes out | I for in and O for out
Australian City | Australian city where the airline lands or flies out | Australian city names
International City | International city where the airline lands or flies out | International city names
Airlines | Name of the airline | Name of the airline
Route | Airports via which the airline flies | Short forms of various airports
Port country | Which country the airline belongs to | Name of the country
Port Region | Which region the airline belongs to | Region name
Service country | Which country does the service | Country name
Stops | Number of stops the airline has | 0, 1, 2
All Flights | Number of flights in or out in the month | Integer
Max seat | Maximum number of seats | Integer
Year | Which year | Year as a number
Month Number | Which month | Number of the month
Dataset 2: Collect data (e.g. via a survey) that will answer the research question given in section 3. There is no requirement on the number of variables, the sampling method or the sample size, but you need to justify your approach in Section 1 (see below).
Both datasets should be saved in an Excel file (one file, separate worksheets). All data processing should be performed in Excel or StatKey (http://www.lock5stat.com/StatKey).
Prepare a report in a document file (.doc or .docx) which includes all relevant tables and figures, using the following structure:
1. Section 1: Introduction
a. Give a brief introduction to the assignment. Find a related article and write a one-paragraph summary of it that supports your assignment. You need to give the full citation of the article.
b. Dataset 1: Give a short description of this dataset. Is this primary or secondary data? What are the types of variables involved? Briefly explain what the possible cases in this study are.
c. Dataset 2: Explain how you collected the data and discuss its limitations (e.g. whether your sample is biased). Is this primary or secondary data? What is/are the type(s) of variable(s) involved? Describe the cases you consider for this dataset.
2. Section 2: Analysis of single variable in Dataset 1
a. To answer the research question “What is the shape of the distribution of the variable All Flights?”, provide a suitable numerical summary and graphical display for the variable All Flights of Dataset 1. Give detailed comments to answer the research question.
b. To answer the research question “Is the average number of flights that came in to and flew out of Australia in a month between September 2003 and September 2018 more than 30?”, set up appropriate hypotheses, perform a hypothesis test, checking all the assumptions, and answer the research question by writing the conclusion of the test.
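The two tasks in this section can be sketched in code to show the arithmetic behind the Excel or StatKey output. The sketch below is only an illustration under stated assumptions: the flight counts are made up (they are not from Dataset 1), the function names are hypothetical, and the p-value uses a large-sample normal approximation because the Python standard library has no t distribution. The hypotheses are H0: mu = 30 against Ha: mu > 30, and the usual conditions (a random sample and either a large sample or a roughly normal distribution) still need to be checked separately.

```python
import math
import statistics
from statistics import NormalDist


def summarise(values):
    """Return basic numerical summaries for one quantitative variable."""
    # Quartile conventions differ between tools, so Q1 and Q3 may not
    # match Excel or StatKey exactly.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "q1": q1,
        "q3": q3,
        "max": max(values),
    }


def one_sample_t(values, mu0):
    """Right-tailed one-sample t-test of H0: mu = mu0 against Ha: mu > mu0."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    t_stat = (statistics.mean(values) - mu0) / standard_error
    # Assumption: normal approximation to the right-tail p-value.
    # It is only reasonable for large samples; use the t distribution
    # with n - 1 degrees of freedom for the reported result.
    p_approx = 1 - NormalDist().cdf(t_stat)
    return t_stat, n - 1, p_approx


# Made-up monthly flight counts, for illustration only.
all_flights = [4, 9, 13, 13, 17, 22, 26, 30, 31, 31, 35, 44, 60, 62, 93]

summary = summarise(all_flights)
t_stat, df, p_approx = one_sample_t(all_flights, mu0=30)

print(summary)
print(f"t = {t_stat:.3f}, df = {df}, approximate p = {p_approx:.3f}")
```

In this made-up sample the mean is larger than the median, which is one numerical hint of a right-skewed distribution; a histogram or boxplot is still needed to describe the shape properly.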
3. Section 3: Analysis of two variables in Dataset 1
Sydney Airport competes with its counterparts in Melbourne and Brisbane. To identify which airport performs better, answer the following series of research questions and write sound recommendations for Sydney Airport:
a. Give a numerical summary and an appropriate graphical display comparing the variables Australian City (for those three main cities only) and Airlines (considering only the three main airlines, namely Singapore Airlines, Air New Zealand and Cathay Pacific Airways).
b. Perform a suitable hypothesis test at a 5% level of significance to test whether there is an association between Australian City and Airlines, considering only the three cities and three airlines from part (a).
c. Using the conclusion of the test in part (b) and the outputs in part (a), write an accurate statement about which airport performs best.
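One common choice for testing association between two categorical variables is the chi-square test on a two-way table; the brief does not name the test, so treat this as an assumption. The sketch below shows how the expected counts, the test statistic and the p-value are obtained for a 3 by 3 table. The counts are made up, the function names are hypothetical, and the p-value formula used here is exact only for 4 degrees of freedom, which is what a 3 by 3 table gives.

```python
import math


def chi_square_association(table):
    """Chi-square statistic, degrees of freedom and expected counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count under H0 (no association):
    # row total * column total / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


def chi2_right_tail_df4(chi2):
    """Exact right-tail probability of a chi-square statistic when df = 4."""
    half = chi2 / 2
    return math.exp(-half) * (1 + half)


# Made-up counts, for illustration only.
# Rows: Sydney, Melbourne, Brisbane.
# Columns: Singapore Airlines, Air New Zealand, Cathay Pacific Airways.
observed = [
    [40, 30, 30],
    [30, 25, 20],
    [15, 25, 10],
]

chi2, df, expected = chi_square_association(observed)
p_value = chi2_right_tail_df4(chi2)

# The chi-square approximation is usually trusted when every expected
# count is at least 5.
smallest_expected = min(min(row) for row in expected)

print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value:.4f}")
print(f"smallest expected count = {smallest_expected:.2f}")
print("reject H0 at 5%" if p_value < 0.05 else "do not reject H0 at 5%")
```

The row and column percentages from the same table are a natural numerical summary for part (a), and a clustered or stacked bar chart is a natural display.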
4. Section 4: Collect and analyse Dataset 2
You are interested in finding out through which Australian airport (specifically Sydney, Melbourne or Brisbane) KOI students have had a good experience flying in or out. Considering an appropriate number of cases and variables, give a suitable graphical display and use it to write your comments.
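For a single categorical survey variable, a frequency table with proportions is the usual starting point for a bar chart. The sketch below tallies made-up answers (not real survey data) and prints a rough text bar chart; the variable names are hypothetical.

```python
from collections import Counter

# Made-up survey answers, for illustration only: each entry is the
# airport where one student reported a good experience.
responses = [
    "Sydney", "Melbourne", "Sydney", "Brisbane", "Sydney",
    "Melbourne", "Sydney", "Brisbane", "Melbourne", "Sydney",
]

counts = Counter(responses)
n = len(responses)

# One line per airport: count, proportion and a bar of matching length.
for airport, count in counts.most_common():
    proportion = count / n
    print(f"{airport:<10} {count:>3} {proportion:6.1%} {'#' * count}")
```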
5. Section 5: Discussion & Conclusion
a. Write an executive summary that combines all your findings from the previous sections and gives a valuable recommendation to Sydney Airport.
b. Give a suggestion for further research.
A presentation/interview for the assignment is scheduled in Week 11, in your allocated tutorial.
You do NOT need to prepare presentation material (e.g. PowerPoint slides). Instead, you will be asked to demonstrate and/or explain how you summarised the data and how you performed the analysis. You may be asked to reproduce what you have done in your written report (e.g. generate a chart or numerical summary using Excel or StatKey).
4 SUBMISSION REQUIREMENT
Deadline to submit written report: Week 10 Friday (25 Jan 2019), 11.59pm
You need to submit 2 files to Turnitin:
1. Main report, in a Microsoft Word document file (this is the file that will be marked; it should contain all necessary tables and figures). Submit via the Word icon.
2. Dataset, in a Microsoft Excel file (this is just a supporting file). Submit via the Excel icon.
Main report (Word document):
1. Size: A4
3. Single space
4. Font: Calibri, 11pt
Dataset (Excel document):
1. Dataset 1 in Sheet 1
2. Dataset 2 in Sheet 2
3. Data processing for each section in other sheets (rename the sheet appropriately)
5 DEDUCTION, LATE SUBMISSION AND EXTENSION
Late submission penalty: 5% of the total available marks per calendar day, unless an extension is approved.
