Recent Question/Assignment

FACULTY OF LAW & BUSINESS
Peter Faber Business School
Semester 2, 2018
DATA201: Data Analytics and Decision Making
Case Study: Don’t Get Kicked
One of the biggest challenges of an auto dealership purchasing a used car at an auto auction is the risk that the vehicle might have serious issues that prevent it from being sold to customers. The auto community calls these unfortunate purchases -kicks-.
Kicked cars often result when there are tampered odometers, mechanical issues the dealer is not able to address, issues with getting the vehicle title from the seller, or some other unforeseen problem. Kicked cars can be very costly to dealers after transportation cost, throw-away repair work, and market losses in reselling the vehicle.
Data analysts who can figure out which cars have a higher risk of being kick can provide real value to dealerships trying to provide the best inventory selection possible to their customers.
The challenge of this case study is to predict if a car purchased at an Auction is a Kick (bad buy).
The data dictionary, Carvana_Data_Dictionary.txt, and the data files can be downloaded from LEO under Assessment tab. The data dictionary describes the 34 attributes: RefId, in the first column, contains the ID number for each record. IsBadBuy, in the second column, is the binary dependent variable, where a 1 (one) means “is Kick” and 0 (zero) “is not Kick”. The remaining columns (3 through 30) are independent variables. The dataset contains records for 72,561 vehicles, of which 12.3% are Kick.
(Adapted from Kaggle competition)
Case Assignment
The overall objective of this assignment is two-fold. First, determine if a classification tool can be constructed that can effectively assist the buyer in identification of cars likely to be Kicks. Second, if successful in creating such a tool, describe the tool’s functionality with respect to input contributions to the Kick classification.
To accomplish this, do the following:
Step 1 (for Assessment Task 1): Data pre-processing
Explore and prepare the dataset for mining as follows:
1. Evaluate and determine what to do with missing values (including blanks and dummy values such as NULL)
2. Eliminate columns, which contain highly correlated values or values that are otherwise of no value to the analysis process.
3. Search the dataset and remove obvious outliers.
Use the statistical methods you have learned in weeks 1-5 to explore and pre-process data. Describe what data pre-processing you did and explain why. Also describe how you engage with stakeholders to elicit requirements, data and relevant information.
Submit your Report (Word) and Excel file on ePortfolio. Submit your Report that contains the secret link to your ePortfolio on LEO Turnitin.
See the Unit Outline for information about what to include in your Assignment and how to submit your it. More information and help are available on LEO / Assessment.
Step 2 (for Assessment Task 2): Constructing the classification tool
1. Experiment different configurations of the decision tree tool in Knime to find the best one you can. (NB. The error rate should be less than 15%).
2. It is expected that while exploring this tool, you may need to keep coming back to explore the dataset to find the best set of inputs for your classification problem.
3. When you are happy with your classification tool, create a dashboard in Tableau or Excel to present these inputs and how they affect IsBadBuy (Kicks). Be mindful to choose appropriate visuals for your dashboard.
Use the data analytic methods you have learnt in the whole semester. Explain your analysis in the experiment. Evaluate your classification tool and explain how it may assist the buyer to reduce the Kicks rate.
Submit your Report (Word), data file (Excel), Knime file (of your decision tree), and Dashboard (Tableau or Excel) on ePortfolio. Submit your Report that contains the secret link to your ePortfolio on LEO Turnitin.
See the Unit Outline for information about what to include in your Assignment and how to submit your it. More information and help are available on LEO / Assessment.