COIT12209 - DATA SCIENCE
Assessment 1 Specification
Due date: Week 6 Monday (26 August 2019) 11:45 pm AEST
Length: No fixed length
Assessment 1 relates to unit learning outcomes 1 and 2 stated in the e-unit profile. This assessment contributes 40% of the total marks.
Assessment 1 is an individual assessment. You are assigned tasks that assess the knowledge of different facets of data science you gained in weeks 1 to 5. You are required to write and execute R code for the given tasks. You are also required to write a report containing the R code, screenshots of the output showing the answers to the questions, and your analysis of the generated outputs for tasks 1-10 below.
Please note that ALL submitted Assessment 1 reports are passed through a computerized copy detection system and it is extremely easy for the teaching staff to identify copied or otherwise plagiarised work.
• Copying (plagiarism) can incur penalties ranging from deduction of marks to failing the course or even exclusion from the University.
• Please ensure you are familiar with the Academic Misconduct Procedures, available from: http://policy.cqu.edu.au/Policy/policy_file.do?policyid=1244
You will use the R language for the data analysis exercises in this assessment. These tasks will help to build your knowledge of data formats, storage, retrieval, and analysis techniques.
You are required to work on the Boston housing dataset from the UCI Machine Learning Repository. Download the dataset from the repository to your working directory.
For each task, write R code, generate the output by executing the code on the given dataset, and save screenshots of the output. Save all R source code, output screenshots, and your analysis of the generated outputs in an MS Word file. This Word file must be submitted as the report for marking. Each task should be numbered correctly for marking.
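As a general illustration of reading a delimited text file from the working directory, the following is a minimal sketch on a tiny made-up file. The file contents, the name toy_path, and the assumption that the file is whitespace-delimited with no header row are all hypothetical; inspect your downloaded file and adjust the arguments to match it.

```r
# Write a tiny made-up, whitespace-delimited file with no header row.
# (Assumption: your downloaded file may be laid out differently.)
toy_path <- tempfile(fileext = ".data")
writeLines(c(
  " 0.10  300  5.0  30.0",
  " 0.50  400 10.0  22.0",
  " 9.00  666 25.0  10.0"
), toy_path)

# read.table() splits on whitespace by default. header = FALSE means the
# first line is data, so column names are supplied explicitly.
toy <- read.table(toy_path, header = FALSE,
                  col.names = c("CRIM", "TAX", "LSTAT", "MEDV"))

dim(toy)      # number of rows, then number of columns
str(toy)      # type of each column
tail(toy, 2)  # last two rows
```

If the file had a header row, header = TRUE would be used instead and col.names could be dropped.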
The data analysis tasks are given as follows.
1. Write R code to load the Boston housing dataset into the variable present_local. Include a screenshot.
2. Write R code to see the number of variables and records in the given dataset.
3. Write R code using the tail() function to view the last 10 rows of the given dataset.
4. Write R code to generate a summary of information on the given dataset that should include the minimum, maximum, mean, and standard deviation of each variable.
5. Write R code to generate a correlation matrix. (Hint: Use the corrplot() function.) Write a short description of the generated matrix and include a screenshot.
6. Write R code to see the summary of per capita crime rate by town (CRIM) and find the number of high crime rate suburbs and low crime rate suburbs in Boston. (Hint: Use the quantile(x, …) function. Treat CRIM above the 90th percentile value as a high crime rate and CRIM below the 30th percentile value as a low crime rate.) Describe the findings and include a screenshot.
7. Write R code to generate scatter plots of (i) per capita crime rate (CRIM) vs. median value of owner-occupied homes (MEDV), (ii) property-tax rate (TAX) vs. median value of owner-occupied homes (MEDV) and (iii) % lower status of the population (LSTAT) vs. median value of owner-occupied homes (MEDV). Explain the findings and place the screenshots.
8. Write R code to generate a histogram of % lower status of the population (LSTAT). Write your analysis of the distribution of LSTAT and include a screenshot.
9. Write R code to generate box plots of all variables in the given dataset. (Hint: Use the boxplot() function.) Provide a screenshot of the box plots and identify which variables have outliers.
10. Write a conclusion on your overall analysis and data quality.
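The functions named in the hints above can be tried out on a small made-up data frame before applying them to the real dataset. The sketch below is illustrative only: the data frame toy and its columns a and b are hypothetical, and corrplot() comes from the add-on corrplot package, which is assumed to be installed separately (the call is guarded in case it is not).

```r
# Made-up values, for illustration only (not the assessment data).
toy <- data.frame(
  a = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50),
  b = c(10, 9, 8, 7, 6, 5, 4, 3, 2, 1)
)

# summary() reports min, quartiles, mean and max but not the standard
# deviation, so sd() is applied to each column separately.
summary(toy)
sapply(toy, sd)

# quantile() returns the cut-off values for the requested probabilities;
# the result is named by percentage ("30%", "90%").
cuts   <- quantile(toy$a, probs = c(0.30, 0.90))
n_low  <- sum(toy$a < cuts[["30%"]])  # values below the 30th percentile
n_high <- sum(toy$a > cuts[["90%"]])  # values above the 90th percentile

# cor() computes the numeric correlation matrix; corrplot() draws it.
cm <- cor(toy)
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cm)
}

# boxplot() draws one box per column; boxplot.stats()$out lists the
# points beyond the whiskers (1.5 times the hinge spread by default).
boxplot(toy)
out_a <- boxplot.stats(toy$a)$out
```

Counting with sum() over a logical comparison works because TRUE is treated as 1 and FALSE as 0.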
Assessment 1 will be marked based on the following criteria.
Working R source code provided: 12 marks
Submitted screenshots of all tasks: 12 marks
Analysis presented on the generated outputs: 12 marks
Report nicely written: 4 marks
Total: 40 marks
You must submit the assessment to Moodle for marking by the due date.
To help you communicate, forums have been set up for you on the unit Moodle website for Assessments 1 and 2. Please use them to help you work through your report.
Alternatively, you can seek help from the Academic Learning Centres (ALC). Advisers are located in Academic Learning Centres at many CQUniversity locations and offer generic group sessions, unit-specific workshops, individual appointments, drop-in centres, and print and online resources. The ALC provides a range of services, which include:
• workshops online and on-campus (see our Moodle website for details)
• online review of assignments
• online query
• one-on-one appointments in person, over the phone or online via Zoom