Business Analytics and Big Data (ACC73002)

Assessment 2 – Calculations and short written responses (30%)

INSTRUCTIONS

Perform the required calculations, present your findings, and prepare short written responses to the following questions.

Question 1. Predicting Boston Housing Prices using multiple linear regression analysis (9 marks)

The file BostonHousing.xls contains information collected by the U.S. Bureau of the Census concerning housing in the area of Boston, Massachusetts. The dataset includes information on 506 census housing tracts in the Boston area. The goal is to predict the median house price in new tracts based on information such as crime rate, pollution, and number of rooms. The dataset contains 12 predictors, and the response is the median house price (MEDV).

a. Why should the data be partitioned into training and validation sets? What will the training set be used for? What will the validation set be used for? (3 marks)
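One common way to carry out the partition asked about in part (a) is a random split of the row indices. The sketch below is only an illustration: the 60/40 proportion, the seed, and the function name partition_indices are assumptions rather than requirements of the assignment, and it works on row indices instead of reading the actual file.

```python
import random


def partition_indices(n_rows, train_fraction=0.6, seed=1):
    """Randomly split row indices 0..n_rows-1 into training and validation lists."""
    indices = list(range(n_rows))
    rng = random.Random(seed)  # local generator so the split is reproducible
    rng.shuffle(indices)
    n_train = int(n_rows * train_fraction)
    return indices[:n_train], indices[n_train:]


# 506 is the number of census tracts stated in the question.
train_idx, valid_idx = partition_indices(506)
print(len(train_idx), len(valid_idx))
```

The model would be fitted on the rows in train_idx only, while the rows in valid_idx are held back to check how well the fitted model predicts data it has not seen.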

b. Run the R code and fit a multiple linear regression model to the median house price (MEDV) as a function of CRIM, CHAS, and RM. Write the equation for predicting the median house price from the predictors in the model. Interpret the estimated coefficients (CRIM, CHAS, and RM). (4 marks)
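The fitted equation requested in part (b) has the general form below. The symbols b_0 to b_3 are placeholders for the estimates reported in the R output; no numeric values are given here because they depend on the fitted model.

```latex
\widehat{\mathrm{MEDV}} = b_0 + b_1\,\mathrm{CRIM} + b_2\,\mathrm{CHAS} + b_3\,\mathrm{RM}
```

Each slope is read as the change in predicted MEDV for a one-unit increase in that predictor with the other two held fixed. Because CHAS is a 0/1 indicator, b_2 is the predicted difference between tracts that bound the Charles River and those that do not.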

c. Using the estimated regression model, what median house price is predicted for a tract in the Boston area that does not bound the Charles River, has a crime rate of 0.1, and where the average number of rooms per house is 6? (2 marks)
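Part (c) amounts to substituting CRIM = 0.1, CHAS = 0, and RM = 6 into the fitted equation. A minimal sketch, assuming a hypothetical helper predict_medv; the coefficient values in the example are made-up placeholders chosen only to show the arithmetic, not estimates from BostonHousing.xls.

```python
def predict_medv(b0, b_crim, b_chas, b_rm, crim, chas, rm):
    """Evaluate the linear prediction b0 + b_crim*CRIM + b_chas*CHAS + b_rm*RM."""
    return b0 + b_crim * crim + b_chas * chas + b_rm * rm


# Placeholder coefficients for illustration only; replace with the R estimates.
placeholder = dict(b0=0.0, b_crim=1.0, b_chas=1.0, b_rm=1.0)

# Tract from part (c): not on the Charles River (CHAS = 0), CRIM = 0.1, RM = 6.
prediction = predict_medv(**placeholder, crim=0.1, chas=0, rm=6)
print(round(prediction, 4))
```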

Question 2. Calculating Distance with Categorical Predictors (9 marks)

This exercise with a tiny dataset illustrates the calculation of Euclidean distance and the creation of binary dummies. The online education company Statistics.com segments its customers and prospects into three main categories: IT professionals (IT), statisticians (Stat), and other (Other). It also tracks, for each customer, the number of years since first contact (years). Consider the following customers; information about whether they have taken a course or not (the outcome to be predicted) is included:

Customer 1: Stat, 1 year, did not take course

Customer 2: Other, 1.1 years, took course

a. Consider now the following new prospect: Prospect 1: IT, 1 year

Using the above information on the two customers and one prospect, create one dataset for all three with the categorical predictor variable transformed into 2 binaries, and a similar dataset with the categorical predictor variable transformed into 3 binaries. (3 marks)
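The two encodings in part (a) can be sketched as follows. The choice of "Other" as the dropped (reference) category in the 2-binary version is an assumption, since the question does not say which category to omit, and the function name encode is hypothetical.

```python
CATEGORIES = ["IT", "Stat", "Other"]

# (label, segment, years since first contact) taken from the question.
records = [
    ("Customer 1", "Stat", 1.0),
    ("Customer 2", "Other", 1.1),
    ("Prospect 1", "IT", 1.0),
]


def encode(segment, years, dummy_levels):
    """Return one 0/1 indicator per level in dummy_levels, followed by years."""
    return [1 if segment == level else 0 for level in dummy_levels] + [years]


# Three binaries: one indicator per category (IT, Stat, Other).
three_binaries = {name: encode(seg, yrs, CATEGORIES) for name, seg, yrs in records}

# Two binaries: "Other" is dropped (assumed reference), so it is coded as 0, 0.
two_binaries = {name: encode(seg, yrs, CATEGORIES[:2]) for name, seg, yrs in records}

for name in three_binaries:
    print(name, two_binaries[name], three_binaries[name])
```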

b. For each derived dataset, calculate the Euclidean distance between the prospect and each of the two customers. (Note: while it is typical to normalize data for k-NN, this is not an ironclad rule and you may proceed here without normalization.) (3 marks)
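For reference, the Euclidean distance between two records x and y with p numeric predictors (here the binaries plus years) is the standard formula below. The symbols x_i, y_i, and p are notation introduced for this note.

```latex
d(x, y) = \sqrt{\sum_{i=1}^{p} (x_i - y_i)^2}
```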

c. Using k-NN with k = 1, classify the prospect as taking or not taking a course using each of the two derived datasets. Does it make a difference whether you use 2 or 3 dummies? (3 marks)
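A minimal sketch of the k = 1 classification in part (c), using the records given in the question. It assumes the 2-binary version keeps the IT and Stat indicators with "Other" as the reference category, and it leaves years unnormalized, as the note in part (b) permits. The function name classify_1nn is hypothetical.

```python
import math

# Feature vectors: segment indicator(s), then years since first contact.
# Three binaries: [IT, Stat, Other, years]; two binaries: [IT, Stat, years].
datasets = {
    "3 binaries": {
        "customers": [([0, 1, 0, 1.0], "did not take course"),
                      ([0, 0, 1, 1.1], "took course")],
        "prospect": [1, 0, 0, 1.0],
    },
    "2 binaries": {
        "customers": [([0, 1, 1.0], "did not take course"),
                      ([0, 0, 1.1], "took course")],
        "prospect": [1, 0, 1.0],
    },
}


def classify_1nn(customers, prospect):
    """Return (distances to each customer, label of the single nearest customer)."""
    distances = [math.dist(features, prospect) for features, _ in customers]
    nearest = distances.index(min(distances))
    return distances, customers[nearest][1]


results = {name: classify_1nn(d["customers"], d["prospect"])
           for name, d in datasets.items()}
for name, (distances, label) in results.items():
    print(name, [round(x, 4) for x in distances], label)
```

Under these assumptions the two encodings disagree: with 3 binaries the prospect is nearest to Customer 1, and with 2 binaries it is nearest to Customer 2. Dropping a category places the reference level closer to every other level than the remaining levels are to each other, so choosing a different reference category would change the 2-binary distances.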

Question 3. Financial Condition of Banks - Logistic regression analysis (12 marks)

The file Banks.xls includes data on a sample of 20 banks. The “Financial Condition” column records the judgment of an expert on the financial condition of each bank. This dependent variable takes one of two possible values, weak or strong, according to the financial condition of the bank. The predictors are two ratios used in the financial analysis of banks: TotLns&Lses/Assets is the ratio of total loans and leases to total assets, and TotExp/Assets is the ratio of total expenses to total assets. The goal is to use the two ratios to classify the financial condition of a new bank.

Run a logistic regression model (on the entire dataset) that models the status of a bank as a function of the two financial measures provided. Specify the success class as weak (this is similar to creating a dummy that is 1 for financially weak banks and 0 otherwise), and use the default cut-off value of 0.5.

a. Write the estimated equation that associates the financial condition of a bank with its two predictors in three formats:

i. The logit as a function of the predictors (2 marks)

ii. The odds as a function of the predictors (2 marks)

iii. The probability as a function of the predictors (2 marks)
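The three formats in part (a) are algebraically equivalent. In general form, writing x_1 for TotExp/Assets and x_2 for TotLns&Lses/Assets, with b_0, b_1, and b_2 as placeholders for the fitted coefficients (this notation is assumed here, and no numeric estimates are implied):

```latex
\begin{aligned}
\text{logit} &= b_0 + b_1 x_1 + b_2 x_2 \\
\text{odds} &= e^{\,b_0 + b_1 x_1 + b_2 x_2} \\
p &= \frac{1}{1 + e^{-(b_0 + b_1 x_1 + b_2 x_2)}}
\end{aligned}
```

Here p is the probability that the bank is financially weak, the success class specified above.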

b. Consider a new bank whose total loans and leases/assets ratio = 0.6 and total expenses/assets ratio = 0.11. From your logistic regression model, estimate the following four quantities for this bank (use Excel to do all the intermediate calculations; show your final answers to four decimal places): the logit, the odds, the probability of being financially weak, and the classification of the bank (use cut-off = 0.5). (3 marks)
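The chain of calculations in part (b), from logit to odds to probability to class, can be sketched as below. The helper name score_bank and the coefficient values are hypothetical placeholders, not estimates from Banks.xls; only the two ratios (0.11 and 0.6) come from the question. Treating a probability strictly above the cut-off as weak is also an assumption about how ties are handled.

```python
import math


def score_bank(b0, b_exp, b_lns, tot_exp_assets, tot_lns_assets, cutoff=0.5):
    """Return (logit, odds, probability of weak, predicted class)."""
    logit = b0 + b_exp * tot_exp_assets + b_lns * tot_lns_assets
    odds = math.exp(logit)
    prob_weak = odds / (1.0 + odds)
    label = "weak" if prob_weak > cutoff else "strong"
    return logit, odds, prob_weak, label


# Placeholder coefficients for illustration only; replace with fitted values.
logit, odds, prob_weak, label = score_bank(
    b0=-1.0, b_exp=10.0, b_lns=1.0,
    tot_exp_assets=0.11, tot_lns_assets=0.6,
)
print(f"{logit:.4f} {odds:.4f} {prob_weak:.4f} {label}")
```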

c. The cut-off value of 0.5 is used in conjunction with the probability of being financially weak. Compute the threshold that should be used if we want to make a classification based on the odds of being financially weak, and the threshold for the corresponding logit. (3 marks)
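Because odds = p / (1 - p) and logit = ln(odds) both increase with p, a probability cut-off maps directly onto equivalent odds and logit cut-offs. A small sketch, with the helper name thresholds_from_probability being hypothetical:

```python
import math


def thresholds_from_probability(cutoff):
    """Convert a probability cut-off (0 < cutoff < 1) to odds and logit cut-offs."""
    odds_cutoff = cutoff / (1.0 - cutoff)
    logit_cutoff = math.log(odds_cutoff)
    return odds_cutoff, logit_cutoff


odds_cutoff, logit_cutoff = thresholds_from_probability(0.5)
print(odds_cutoff, logit_cutoff)
```

For the 0.5 cut-off used here, this gives an odds threshold of 1 and a logit threshold of 0.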
