FACULTY OF INFORMATION AND COMMUNICATION TECHNOLOGY

SEMESTER 1 SESSION 2021/2022

BITI 2233 STATISTICS AND PROBABILITY

ASSIGNMENT 1 using RStudio (10%)

INSTRUCTIONS:

a) This assignment must be done in a group of 3 members.

b) Use R software (RStudio) to complete this assignment.

c) Your submission should include:

a. Your report, which includes:

i. Snippets of the commands, the answers, and the respective graphs.

ii. Each figure in the report must be properly titled.

iii. Attach the data and the results of your calculations in the appendix.

iv. Include the names of each member and relevant details.

b. The R code, in an .R file.

c. The data file, in .xls format.

d) Put all files into one folder. Zip the folder. Submit the zip file to ULearn/Assessment.

e) The due date is on Week 9 – 10 Dec 2021 (refer to your lecturer for the exact date). Any late submission will be penalized.

IMPORTANT:

a) You must understand your code!

b) Do not copy other groups’ work or allow your group work to be copied.

c) Any plagiarism detected will be penalized.

d) Final marks are awarded individually, based on the lecturer’s evaluation and, if required, on a presentation.

Question 1 (30 marks)

The Covid-19 data is shared openly by the Ministry of Health Malaysia at the following sites:

Open data on COVID-19 in Malaysia

https://github.com/MoH-Malaysia/covid19-public/tree/main/vaccination

Open data on Malaysia’s National Covid-19 Immunisation Programme

https://github.com/CITF-Malaysia/citf-public

You are required to conduct a statistical analysis of the Covid-19 vaccination effort in Malaysia. Plan your strategy for analysing and reporting the vaccination status effectively, and choose the related data carefully to achieve your goal. Determine the focus and storyline of your report before starting the analysis in the R scripting language. Brief preliminary research on the Covid-19 vaccination programme will help you set the tone for the analysis. Include citations and references in your report accordingly.

Once the analysis is complete, summarize your findings in a business report. The report will be read by people who may not be familiar with statistical terms. Therefore, it is important that your report is written in a manner that can be understood by people with varying statistical knowledge.

Minimum requirements for this report include the following:

i. Analysis (2-10 pages, including diagrams) 10 marks

Use the statistical analysis methods you have learned in Chapter 1 (Data Description and Numerical Measures) for the analysis. Be creative, use critical thinking and problem-solving skills to set the tone of the analysis.

Your analysis should answer at least the following questions (but need not be limited to them):

a. Is the vaccine registration process satisfactory?

b. Is the vaccination process satisfactory?

c. Is the vaccination process for the 12-18 year-old teenager group satisfactory?

ii. Summary (0.5 page) 5 marks

This page is written after the analysis is complete. Determine the goal and purpose of the analysis and state that goal/purpose in the summary.

iii. Conclusion (1 page) 5 marks

Summarize the analysis. Refer to all statistics calculated, and outline a plan for possible analysis in the next step.

Submission requirements:

i. You are required to submit

(1) the R scripting files for your analysis part;

(2) a README file to describe the data, and to guide the related libraries installation, if any; and

(3) the business report.

Non-executable R scripts will be treated as a failure to complete the assignment.

ii. Turn in your assignment on time. Late submission WILL NOT be accepted.

iii. Submission is ONLY allowed through ULearn.

iv. Plagiarism WILL NOT be tolerated.

Question 2 (20 marks)

You are running an online business that requires boxes and bubble wrap for postage. Suppose you get the boxes and bubble wrap from four different suppliers, each of which represents 25% of your supply. Some of the boxes and bubble wrap you receive are of poor quality and do not meet the specification. You would like to investigate which suppliers supply good items and which do not.

Based on the table below, randomly generate the values A1 to A4 and B1 to B4 as decimals between 0 and 1. Then, based on the values generated, determine the values of A’1 to A’4 and B’1 to B’4.

Table 1: Condition of box and bubble wrap for Company 1 to Company 4

Supplier     Percentage of     Item A: BOX          Item B: BUBBLE WRAP
             supply (%)        Good    Not Good     Good    Not Good
Company 1    25                A1      A’1          B1      B’1
Company 2    25                A2      A’2          B2      B’2
Company 3    25                A3      A’3          B3      B’3
Company 4    25                A4      A’4          B4      B’4

i. Based on the data from Table 1, suppose that a box is in poor-quality condition. Calculate the probability that it comes from each company.

ii. Based on the data from Table 1, suppose that a bubble wrap is in poor-quality condition. Calculate the probability that it comes from each company.

iii. As a business owner, what action will you take following the findings in (i) and (ii)?
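Hint: parts (i) and (ii) can be approached with Bayes’ theorem. The following is only a minimal sketch for the box column, under two assumptions that are not stated in the brief: the "Not Good" rate is taken as the complement of the "Good" rate (A’ = 1 - A), and the seed value is arbitrary. The variable names are illustrative.

```r
# Illustrative sketch only: the "good" rates are randomly generated,
# and the seed is an arbitrary choice that makes the run repeatable.
set.seed(2233)

share <- rep(0.25, 4)          # each company supplies 25% of the boxes
good  <- round(runif(4), 2)    # A1..A4: P(good | company), random in [0, 1]
poor  <- 1 - good              # A'1..A'4: assumed complement of the good rate

# Bayes' theorem: P(company | poor) = P(poor | company) * P(company) / P(poor)
p_poor    <- sum(poor * share) # total probability that a box is poor
posterior <- poor * share / p_poor

names(posterior) <- paste("Company", 1:4)
print(round(posterior, 4))
```

The four posterior probabilities should sum to 1, which is a quick check that the calculation is set up correctly. The same steps apply to the bubble wrap column.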

Question 3 (20 marks)

Question 3a (10 marks)

During the 2020 Badminton National Championships, a new national badminton player had the highest point rate in the match. The player scored with 65.5% of his shots. Suppose you choose a random sample of 80 shots made by the player during the 2020 National Championships. Let X = the number of shots that scored points.

a. What is the probability distribution for X?

b. Sketch the probability mass function (X is discrete, so it has a mass function rather than a density).

c. Sketch the cumulative distribution function.

d. Calculate the mean and standard deviation of X.

e. Find the probability that the player scored with 60 of these shots.

f. Find the probability that the player scored with more than 50 of these shots.

g. Find the probability that the player scored between 40 and 55 of these shots.
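Hint: if X is modelled as a binomial count, base R’s dbinom() and pbinom() cover every part of this question. The sketch below uses hypothetical values (10 trials, success probability 0.4), not the figures in the question, and treats "between" as inclusive of both endpoints, which is an interpretation you should state in your report.

```r
# Hypothetical parameters for illustration only (not the question's values).
n <- 10
p <- 0.4
x <- 0:n

pmf <- dbinom(x, size = n, prob = p)   # P(X = x) for every possible x
cdf <- pbinom(x, size = n, prob = p)   # P(X <= x)

mean_x <- n * p                        # binomial mean
sd_x   <- sqrt(n * p * (1 - p))        # binomial standard deviation

p_exactly_4   <- dbinom(4, n, p)                      # P(X = 4)
p_more_than_4 <- pbinom(4, n, p, lower.tail = FALSE)  # P(X > 4)
p_3_to_6      <- pbinom(6, n, p) - pbinom(2, n, p)    # P(3 <= X <= 6)

# Spike plot for the mass function, step plot for the cumulative function.
plot(x, pmf, type = "h", xlab = "x", ylab = "P(X = x)", main = "Binomial PMF")
plot(x, cdf, type = "s", xlab = "x", ylab = "P(X <= x)", main = "Binomial CDF")
```

Note that lower.tail = FALSE gives a strict "greater than" probability, so "at least k" needs pbinom(k - 1, ...) as the starting point.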

Question 3b (5 marks)

Suppose that a technology task force is being formed to study technology awareness among instructors.

Assume that ten people will be randomly chosen to be on the committee from a group of 28 volunteers, 20 who are technically proficient and eight who are not. We are interested in the number on the committee who are not technically proficient.

a. In words, define the random variable X.

b. List the values that X may take on.

c. How many instructors do you expect on the committee who are not technically proficient?

d. Find the probability that at least five on the committee are not technically proficient.

e. Find the probability that at most three on the committee are not technically proficient.
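Hint: sampling a committee without replacement from a fixed pool suggests a hypergeometric model, for which base R provides dhyper() and phyper(). The sketch below uses a hypothetical pool (5 people of the type being counted, 7 others, committee of 4) rather than the question’s figures, and the variable names are illustrative.

```r
# Hypothetical pool for illustration only (not the question's values).
counted   <- 5   # people in the pool of the type being counted
others    <- 7   # everyone else in the pool
committee <- 4   # number of people drawn without replacement

# Possible values of X: limited by both the committee size and the pool.
x <- max(0, committee - others):min(committee, counted)

expected <- committee * counted / (counted + others)  # hypergeometric mean

# phyper(q, m, n, k) returns P(X <= q); lower.tail = FALSE returns P(X > q).
p_at_most_1  <- phyper(1, counted, others, committee)
p_at_least_2 <- phyper(1, counted, others, committee, lower.tail = FALSE)

print(x)
print(c(expected = expected, at_most_1 = p_at_most_1, at_least_2 = p_at_least_2))
```

Because "at least 2" is the complement of "at most 1", the two probabilities above should add up to 1.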

Question 3c (5 marks)

Conduct a short survey among your group members to determine how many text messages a student receives or sends per day on average. Let X = the number of text messages a student receives or sends per day (based on the group members’ answers).

a. List the values that X may take on.

b. Define the probability distribution for X.

c. How many text messages does a text message user receive or send per hour?

d. What is the probability that a text message user receives or sends two messages per hour?

e. What is the probability that a text message user receives or sends more than two messages per hour?
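Hint: counts of events over a fixed period are often modelled with a Poisson distribution, although choosing and justifying the model is part of the question. The sketch below assumes a Poisson model and a hypothetical survey result of 48 messages per day; both are assumptions for illustration, not values from the brief.

```r
# Hypothetical survey result for illustration only.
messages_per_day <- 48

# Convert the daily average to an hourly rate (24 hours per day).
rate_per_hour <- messages_per_day / 24

# Assuming X per hour follows a Poisson distribution with this rate:
p_exactly_2   <- dpois(2, lambda = rate_per_hour)                      # P(X = 2)
p_more_than_2 <- ppois(2, lambda = rate_per_hour, lower.tail = FALSE)  # P(X > 2)

print(c(rate = rate_per_hour, exactly_2 = p_exactly_2, more_than_2 = p_more_than_2))
```

Replace messages_per_day with the average from your own group’s survey.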

Question 4 (20 marks) (Refer to data set: Babies_Weight.txt)

Question 4a (15 marks)

Babies’ weight analysis

You are interested in investigating the weights of babies in the provided data set, Babies_Weight.txt. The data set consists of weights (in ounces) and gender (0 – male, 1 – female). Based on this data set, answer the following questions using R:

i. Create a histogram of the weights of babies.

ii. Create a histogram of the weights of male babies and a histogram of the weights of female babies.

iii. Calculate the mean and standard deviation of the weights of male babies and of female babies.

iv. Calculate the probability that a randomly picked female baby has a weight greater than or equal to 132 ounces.

v. Calculate the probability that a randomly picked female baby has a weight between 120 and 130 ounces.

vi. Calculate the probability that a randomly picked female baby has a weight between 130 and 140 ounces.

vii. What can you conclude from the probabilities in questions (v) and (vi)?

viii. Calculate the probability that a randomly picked male baby has a weight lower than 128 ounces.

ix. Calculate the probability that a randomly picked male baby has a weight between 130 and 140 ounces.

x. What can you conclude from the probabilities in questions (vi) and (ix)?

xi. Find the weight below which 85% of female babies fall.

xii. Find the two values, symmetric about the population mean, between which 60% of the weights of female babies lie.

xiii. Find the weight above which 80% of male babies fall.

xiv. Find the two values, symmetric about the population mean, between which 60% of the weights of male babies lie.

xv. What can you conclude when comparing questions (xii) and (xiv)?
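Hint: one possible approach, assuming the weights in each group are roughly normal, is to plug the group’s sample mean and standard deviation into pnorm() and qnorm(). The sketch below uses a hypothetical mean of 120 ounces and standard deviation of 18 ounces; in your own analysis these would come from mean() and sd() on the relevant subset of the data. The normal model itself is an assumption, so check it against your histograms.

```r
# Hypothetical group mean and standard deviation (ounces), illustration only.
mu    <- 120
sigma <- 18

# Upper-tail probability: P(X >= 132)
p_at_least_132 <- pnorm(132, mean = mu, sd = sigma, lower.tail = FALSE)

# Interval probability: P(120 <= X <= 130)
p_120_to_130 <- pnorm(130, mu, sigma) - pnorm(120, mu, sigma)

# Weight below which 85% of the group falls
w_below_85 <- qnorm(0.85, mu, sigma)

# Weight above which 80% of the group falls (the 20th percentile)
w_above_80 <- qnorm(0.20, mu, sigma)

# Central 60%: cut 20% off each tail, symmetric about the mean
middle_60 <- qnorm(c(0.20, 0.80), mu, sigma)

print(c(p_at_least_132 = p_at_least_132, p_120_to_130 = p_120_to_130))
print(c(w_below_85 = w_below_85, w_above_80 = w_above_80))
print(middle_60)
```

For a continuous distribution, "greater than" and "greater than or equal to" give the same probability, so no endpoint adjustment is needed here.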

Question 4b (5 marks)

i. Generate a random sample of 15,000 numbers from a normal distribution with a mean of 70 and a standard deviation of 5. Store the data in an object called number_random.

ii. Create a histogram of that random sample.

iii. How many values in your number_random vector are below 60?

iv. For a theoretical normal distribution, how many of those 15,000 values would you expect to be below 60?

v. Is your answer in part (iii) reasonably close to your answer in part (iv)?
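Hint: the comparison between a simulated count and a theoretical count can be sketched as follows. The sample size, mean, standard deviation, cutoff, seed, and object name below are all hypothetical and deliberately different from the question’s values.

```r
# Hypothetical settings for illustration only (not the question's values).
set.seed(1)                       # arbitrary seed so the run is repeatable
size   <- 10000
mu     <- 50
sigma  <- 10
cutoff <- 40

sample_values <- rnorm(size, mean = mu, sd = sigma)

hist(sample_values, main = "Histogram of simulated values", xlab = "Value")

observed <- sum(sample_values < cutoff)    # count in the simulated sample
expected <- size * pnorm(cutoff, mu, sigma) # count implied by the normal model

print(c(observed = observed, expected = expected))
```

The two counts will rarely match exactly; what matters is whether the difference is small relative to the sample size.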

Question 5 (10 marks)

i. A survey found that the American family generates an average of 17.2 pounds of glass garbage each year. Assume the standard deviation of the distribution is 2.5 pounds. Find the probability that the mean of a sample of 55 families will be between 17 and 18 pounds.

ii. The average number of moves a person makes in his or her lifetime is 12. If the standard deviation is 3.2, find the probability that the mean of a sample of 36 people is

a. Less than 13

b. Greater than 13

c. Between 11 and 12

iii. The average number of earthquakes that occur in Los Angeles over one month is 36. (Most are undetectable.) Assume the standard deviation is 3.6. If a random sample of 35 months is selected, find the probability that the mean of the sample is between 34 and 37.5.

iv. A study of 800 homeowners in a certain area showed that the average value of the homes was $82,000, and the standard deviation was $5000. If 50 homes are for sale, find the probability that the mean of the values of these homes is greater than $83,500.

v. In a recent year, Delaware had the highest per capita annual income with $51,803. If s = $4850, what is the probability that a random sample of 34 state residents had a mean income greater than $50,000? Less than $48,000?
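Hint: each part of this question concerns the sampling distribution of the mean. By the central limit theorem, for a reasonably large sample the sample mean is approximately normal with the population mean and a standard error of sigma / sqrt(n). The sketch below uses hypothetical values (population mean 100, standard deviation 15, sample size 25), not the figures in the questions.

```r
# Hypothetical population and sample settings, illustration only.
mu    <- 100   # population mean
sigma <- 15    # population standard deviation
n     <- 25    # sample size

se <- sigma / sqrt(n)   # standard error of the sample mean

# P(97 < sample mean < 103), using the normal approximation
p_between <- pnorm(103, mean = mu, sd = se) - pnorm(97, mean = mu, sd = se)

# P(sample mean > 103)
p_above <- pnorm(103, mean = mu, sd = se, lower.tail = FALSE)

print(c(se = se, p_between = p_between, p_above = p_above))
```

The key step is passing the standard error, not the population standard deviation, as the sd argument.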

-QUESTIONS END-
