Business Analytics and Big Data (ACC73002)

Assessment 2 – Calculations and short written responses (30%)

INSTRUCTIONS

Perform the required calculations, present your findings, and prepare short written responses to the following questions.

Question 1. Predicting Boston Housing Prices using multiple linear regression analysis (9 marks)

The file BostonHousing.xls contains information collected by the U.S. Bureau of the Census concerning housing in the area of Boston, Massachusetts. The dataset includes information on 506 census housing tracts in the Boston area. The goal is to predict the median house price in new tracts based on information such as crime rate, pollution, and number of rooms. The dataset contains 12 predictors, and the response is the median house price (MEDV).

a. Why should the data be partitioned into training and validation sets? What will the training set be used for? What will the validation set be used for? (3 marks)
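As a rough illustration of the partitioning idea in part (a), the sketch below randomly splits record indices into a training set and a validation set. The 60/40 split, the seed, and the function name `partition` are assumptions made for illustration; the brief does not prescribe them.

```python
import random

def partition(records, train_fraction=0.6, seed=1):
    """Randomly split records into (training, validation) lists.

    train_fraction and seed are illustrative assumptions, not values
    taken from the assessment brief.
    """
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Row indices stand in for the 506 census tracts in the dataset.
train_rows, valid_rows = partition(range(506))
```

The model would be fitted on `train_rows` only, and its predictive accuracy would then be checked on `valid_rows`, which the model has not seen.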

b. Run the R code and fit a multiple linear regression model to the median house price (MEDV) as a function of CRIM, CHAS, and RM. Write the equation for predicting the median house price from the predictors in the model. Interpret the estimated coefficients (CRIM, CHAS, and RM). (4 marks)
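The brief refers to R code that is not reproduced here. For readers who want to see what fitting MEDV on CRIM, CHAS, and RM amounts to, the following is a minimal Python sketch of ordinary least squares via the normal equations. The function name `fit_ols` and the toy rows are invented for illustration; they are not values from BostonHousing.xls, and the sketch is not the course's R script.

```python
def fit_ols(X, y):
    """Return [intercept, b1, ..., bp] minimizing squared error.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan
    elimination with partial pivoting.
    """
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Invented toy rows (CRIM, CHAS, RM); NOT taken from the real dataset.
toy_X = [(0.1, 0, 6.0), (0.5, 1, 5.5), (1.2, 0, 7.1),
         (0.3, 1, 6.4), (2.0, 0, 5.0), (0.8, 1, 6.9)]
# Noise-free response built from known coefficients, so the fit can be checked.
toy_y = [2.0 - 0.5 * crim + 3.0 * chas + 4.0 * rm for crim, chas, rm in toy_X]

intercept, b_crim, b_chas, b_rm = fit_ols(toy_X, toy_y)
```

Each slope is read as the expected change in MEDV for a one-unit increase in that predictor with the other two held fixed; for the 0/1 variable CHAS, it is the expected difference between tracts that bound the river and those that do not.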

c. Using the estimated regression model, what median house price is predicted for a tract in the Boston area that does not bound the Charles River, has a crime rate of 0.1, and where the average number of rooms per house is 6? (2 marks)
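The prediction in part (c) is a matter of plugging CRIM = 0.1, CHAS = 0, and RM = 6 into the fitted equation. The sketch below shows the arithmetic with placeholder coefficients; `predict_medv` and `placeholder_coefs` are hypothetical names, and the numbers are not estimates from the real data.

```python
def predict_medv(coefs, crim, chas, rm):
    """Evaluate MEDV = b0 + b1*CRIM + b2*CHAS + b3*RM."""
    intercept, b_crim, b_chas, b_rm = coefs
    return intercept + b_crim * crim + b_chas * chas + b_rm * rm

# Placeholder coefficients for illustration only; replace them with the
# estimates from your own fitted model.
placeholder_coefs = (1.0, -0.2, 3.0, 5.0)

# Tract that does not bound the Charles River (CHAS = 0).
example = predict_medv(placeholder_coefs, crim=0.1, chas=0, rm=6)
```

The result is in the same units as MEDV in the dataset.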

Question 2. Calculating Distance with Categorical Predictors (9 marks)

This exercise, based on a tiny dataset, illustrates the calculation of Euclidean distance and the creation of binary dummies. The online education company Statistics.com segments its customers and prospects into three main categories: IT professionals (IT), statisticians (Stat), and other (Other). It also tracks, for each customer, the number of years since first contact (years). Consider the following customers; information about whether they have taken a course or not (the outcome to be predicted) is included:

Customer 1: Stat, 1 year, did not take course

Customer 2: Other, 1.1 years, took course

a. Consider now the following new prospect: Prospect 1: IT, 1 year

Using the above information on the two customers and one prospect, create one dataset for all three with the categorical predictor variable transformed into 2 binaries, and a similar dataset with the categorical predictor variable transformed into 3 binaries. (3 marks)
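One way to build the two datasets in part (a) is sketched below. With three binaries, every category gets its own 0/1 column; with two binaries, one category is dropped and acts as the reference level. Dropping "Other" is an assumption made here for illustration (the brief does not say which level to drop), and `encode` is a hypothetical helper name.

```python
LEVELS = ["IT", "Stat", "Other"]

# (name, category, years since first contact), as given in the question.
people = [
    ("Customer 1", "Stat", 1.0),
    ("Customer 2", "Other", 1.1),
    ("Prospect 1", "IT", 1.0),
]

def encode(category, years, levels):
    """Return one 0/1 indicator per entry in `levels`, followed by years."""
    return [1 if category == level else 0 for level in levels] + [years]

# Columns: IT, Stat, Other, years
three_dummies = {name: encode(cat, yrs, LEVELS) for name, cat, yrs in people}

# Columns: IT, Stat, years (assumption: "Other" is the dropped level)
two_dummies = {name: encode(cat, yrs, LEVELS[:2]) for name, cat, yrs in people}
```

In the two-binary version, a row with both indicators equal to 0 represents the dropped category.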

b. For each derived dataset, calculate the Euclidean distance between the prospect and each of the other two customers. (Note: while it is typical to normalize data for k-NN, this is not an ironclad rule and you may proceed here without normalization.) (3 marks)
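The Euclidean distance between two records is the square root of the sum of squared differences across all columns. The sketch below applies it, without normalization, to both encodings. The column order and the choice to drop "Other" in the two-binary version are assumptions; a different dropped level gives different distances.

```python
import math

def euclidean(a, b):
    """Square root of the sum of squared column-wise differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Columns: IT, Stat, Other, years
prospect_3 = [1, 0, 0, 1.0]
customer1_3 = [0, 1, 0, 1.0]
customer2_3 = [0, 0, 1, 1.1]

# Columns: IT, Stat, years (assumption: "Other" dropped)
prospect_2 = [1, 0, 1.0]
customer1_2 = [0, 1, 1.0]
customer2_2 = [0, 0, 1.1]

distances = {
    ("3 dummies", "Customer 1"): euclidean(prospect_3, customer1_3),
    ("3 dummies", "Customer 2"): euclidean(prospect_3, customer2_3),
    ("2 dummies", "Customer 1"): euclidean(prospect_2, customer1_2),
    ("2 dummies", "Customer 2"): euclidean(prospect_2, customer2_2),
}

for key, value in distances.items():
    print(key, round(value, 4))
```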

c. Using k-NN with k = 1, classify the prospect as taking or not taking a course using each of the two derived datasets. Does it make a difference whether you use 2 or 3 dummies? (3 marks)
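With k = 1, the prospect simply takes the outcome of the single closest customer. A minimal sketch follows; `nearest_neighbor_label` is a hypothetical helper, and the two-binary encoding again assumes "Other" is the dropped level.

```python
import math

def nearest_neighbor_label(query, labeled_points):
    """k-NN with k = 1: return the label of the closest labeled point."""
    closest = min(labeled_points, key=lambda item: math.dist(query, item[0]))
    return closest[1]

# Columns: IT, Stat, Other, years
train_3 = [
    ([0, 1, 0, 1.0], "did not take course"),  # Customer 1
    ([0, 0, 1, 1.1], "took course"),          # Customer 2
]
# Columns: IT, Stat, years (assumption: "Other" dropped)
train_2 = [
    ([0, 1, 1.0], "did not take course"),     # Customer 1
    ([0, 0, 1.1], "took course"),             # Customer 2
]

label_3 = nearest_neighbor_label([1, 0, 0, 1.0], train_3)
label_2 = nearest_neighbor_label([1, 0, 1.0], train_2)
print("3 dummies:", label_3)
print("2 dummies:", label_2)
```

Because the dropped category sits at the all-zeros point, it can end up closer to the other categories than they are to each other, so the predicted class can depend on the encoding and on which level is dropped. It is worth rerunning the sketch with a different reference level before drawing a conclusion.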

Question 3. Financial Condition of Banks - Logistic regression analysis (12 marks)

The file Banks.xls includes data on a sample of 20 banks. The "Financial Condition" column records the judgment of an expert on the financial condition of each bank. This dependent variable takes one of two possible values, weak or strong, according to the financial condition of the bank. The predictors are two ratios used in the financial analysis of banks: TotLns&Lses/Assets is the ratio of total loans and leases to total assets, and TotExp/Assets is the ratio of total expenses to total assets. The goal is to use the two ratios to classify the financial condition of a new bank.

Run a logistic regression model (on the entire dataset) that models the status of a bank as a function of the two financial measures provided. Specify the success class as weak (this is similar to creating a dummy that is 1 for financially weak banks and 0 otherwise), and use the default cut-off value of 0.5.

a. Write the estimated equation that associates the financial condition of a bank with its two predictors in three formats:

i. The logit as a function of the predictors (2 marks)

ii. The odds as a function of the predictors (2 marks)

iii. The probability as a function of the predictors (2 marks)
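For reference, the three formats in part (a) have the following general shape for a two-predictor logistic model. The symbols are assumptions introduced here: x_1 and x_2 stand for the two ratios (in whichever order the model is fitted), p is the probability of the success class (weak), and the beta values are placeholders to be replaced by the estimated coefficients.

```latex
\begin{align*}
\text{logit} &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
\text{odds}  &= e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2} \\
p            &= \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}}
\end{align*}
```

The three forms are equivalent: odds = p / (1 - p), and the logit is the natural logarithm of the odds.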

b. Consider a new bank whose total loans and leases/assets ratio = 0.6 and total expenses/assets ratio = 0.11. From your logistic regression model, estimate the following four quantities for this bank (use Excel to do all the intermediate calculations; show your final answers to four decimal places): the logit, the odds, the probability of being financially weak, and the classification of the bank (use cut-off=0.5). (3 marks)
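The brief asks for these calculations in Excel; the same chain of steps can be cross-checked with a short Python sketch. The coefficients below are placeholders, not estimates from Banks.xls, and `score_bank` is a hypothetical name. Treating a probability exactly equal to the cut-off as "strong" is also an assumption.

```python
import math

def score_bank(coefs, tot_lns_lses_assets, tot_exp_assets, cutoff=0.5):
    """Return logit, odds, P(weak), and class, rounded to four decimals."""
    b0, b_loans, b_exp = coefs
    logit = b0 + b_loans * tot_lns_lses_assets + b_exp * tot_exp_assets
    odds = math.exp(logit)
    probability = odds / (1 + odds)
    # Assumption: classify as weak only when P(weak) exceeds the cut-off.
    label = "weak" if probability > cutoff else "strong"
    return {
        "logit": round(logit, 4),
        "odds": round(odds, 4),
        "probability": round(probability, 4),
        "classification": label,
    }

# Placeholder coefficients (intercept, loans ratio, expenses ratio);
# replace them with the estimates from your own fitted model.
placeholder_coefs = (-2.0, 1.0, 10.0)

result = score_bank(placeholder_coefs, tot_lns_lses_assets=0.6, tot_exp_assets=0.11)
print(result)
```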

c. The cut-off value of 0.5 is used in conjunction with the probability of being financially weak. Compute the threshold that should be used if we want to make a classification based on the odds of being financially weak, and the threshold for the corresponding logit. (3 marks)
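Because odds = p / (1 - p) and logit = ln(odds), any probability cut-off maps directly to an equivalent cut-off on the odds and logit scales. A minimal sketch of that conversion follows; `equivalent_thresholds` is a hypothetical helper name.

```python
import math

def equivalent_thresholds(probability_cutoff):
    """Convert a probability cut-off into (odds cut-off, logit cut-off)."""
    if not 0 < probability_cutoff < 1:
        raise ValueError("cut-off must be strictly between 0 and 1")
    odds_cutoff = probability_cutoff / (1 - probability_cutoff)
    logit_cutoff = math.log(odds_cutoff)
    return odds_cutoff, logit_cutoff

odds_cutoff, logit_cutoff = equivalent_thresholds(0.5)
print(odds_cutoff, logit_cutoff)
```

Both transformations are increasing, so classifying on the odds or the logit with the converted threshold gives the same labels as classifying on the probability.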
