ITECH1400 Foundations of Programming

Logarithms, Benford’s Law and Fraudulent Data

Overview

In this assignment you will write an application in Python that will apply Benford’s Law to a given set of your own data. This is an individual assignment.

Timelines and Expectations

Percentage Value of Task: 20%

Due: Friday 5 June 2020 @17:00 (week 11)

Minimum time expectation: 20 hours

Learning Outcomes Assessed

The following course learning outcomes are assessed by completing this assessment:

K1. Identify and use the correct syntax of a common programming language.

K2. Recall and use typical programming constructs to design and implement simple software solutions.

K3. Reproduce and adapt commonly used basic algorithms.

K4. Explain the importance of programming style concepts (documentation, mnemonic names, indentation)

S2. Write and implement a solution algorithm using basic programming constructs.

S3. Demonstrate debugging and testing skills whilst writing code.

A1. Develop self-reliance and judgement in adapting algorithms to diverse contexts.

A2. Design and write program solutions to identified problems using accepted design constructs.

Assessment Details

Background

1 https://www.wikiwand.com/en/Common_logarithm

2 Newcomb, S. (1881). Note on the Frequency of Use of the Different Digits in Natural Numbers. American Journal of Mathematics, 4(1), 39-40.

sports statistics, molecular weights and so on – and it is base and scale invariant – the length of rivers could be in miles, kilometres, metres or even cubits.

Theory

You do not need to know the derivation or proof of Benford’s law; you only need to know how to apply it to a set of data.

Benford’s law states:

$\Pr(D_1 = d_1) = \log_{10}\left(1 + \frac{1}{d_1}\right), \quad d_1 \in \{1, 2, \ldots, 9\}$ (Equation 1)

So, for the first digit in a number, the probability that this digit is a ‘1’ is:

$\Pr(D_1 = 1) = \log_{10}\left(1 + \frac{1}{1}\right) = \log_{10} 2 \approx 0.301$

or about 30.1%.

Similarly for the remaining digits 2-9.

If we do this for all the digits and plot them as a bar graph, we get:
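As a rough illustration, the expected probabilities for all nine digits can be computed directly from Equation 1. This is a minimal sketch: the function name `benford_expected` is hypothetical, and the row of `#` characters is only a text stand-in for the bar graph.

```python
import math

def benford_expected():
    """Return {digit: probability} for leading digits 1-9 (Equation 1)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

expected = benford_expected()
for digit, p in expected.items():
    # One '#' per percentage point gives a crude text bar chart.
    print(f"Digit {digit}: {p:.3f} {'#' * round(p * 100)}")
```

The nine probabilities sum to 1, because the terms telescope to $\log_{10} 10$.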

Your Task

Develop a Python program which will load up a set of data, determine the frequencies of the leading digits and compare them with the predicted distribution of Benford’s law. Display this in a bar chart and a table of values. For example:

Digit 1: Observed = 0.321 Expected = 0.301

Digit 2: Observed = 0.153 Expected = 0.176

and so on up to digit 9.
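One possible way to produce a table in this format is sketched below. The helper names (`leading_digit`, `observed_frequencies`, `print_table`) and the sample values are assumptions made for illustration only; they are not part of the assignment specification.

```python
import math
from collections import Counter

def leading_digit(value):
    """Return the first non-zero digit of a number, or None if there is none."""
    for ch in str(abs(value)):
        if ch in "123456789":
            return int(ch)
    return None

def observed_frequencies(values):
    """Return {digit: relative frequency} of leading digits 1-9."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    if not digits:
        raise ValueError("no usable values in the data set")
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}

def print_table(values):
    """Print observed and expected frequencies side by side."""
    observed = observed_frequencies(values)
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)
        print(f"Digit {d}: Observed = {observed[d]:.3f} Expected = {expected:.3f}")

# Hypothetical sample values, used only to exercise the functions.
print_table([12, 150, 1.9, 23, 0.034, 310, 47, 1024, 8, 95])
```

The same two dictionaries could be passed to a plotting library (for example matplotlib's `bar` function) to draw the bar chart.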

We shall look at three cases.

An Excel spreadsheet has been taken from Office-Watch: Benford’s Law and Excel to let you quickly visualize the Python application that we need to make.

Case 1 - Fibonacci series

This series begins with two numbers 1,1 – these two numbers are added to continue the series giving rise to the following (only the first 8 terms of the series are shown here):

1,1,2,3,5,8,13,21,. . .

There are many examples of this pattern in Nature and the series is closely related to the Golden ratio.

Using the Excel spreadsheet, generate a Fibonacci series up to the 24th term and see whether the first digits obey Benford’s Law. Does the fit improve if you add more terms?

The Chi-test measures how close an actual value is to the expected value – the closer it is to 100% the closer the actual value is to the expected value. In our case, we are testing how close the frequency of each digit in our dataset is to Benford’s prediction for that digit.

What is the value of the ChiTest comparison for this Fibonacci series? Does it get better if we add more terms to the series?
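The comparison can also be sketched in Python. This sketch assumes the spreadsheet's ChiTest behaves like Excel's CHITEST function applied to digit counts, that is, it returns the upper-tail probability of the chi-square statistic with 8 degrees of freedom (nine digits minus one). The function names are hypothetical. The closed form used for the p-value holds only for an even number of degrees of freedom.

```python
import math

def chi_square_statistic(observed_counts, expected_counts):
    """Sum of (observed - expected)^2 / expected over all digits."""
    return sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected_counts))

def chi_square_p_value(statistic, dof=8):
    """Upper-tail chi-square probability; closed form for even dof only."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form covers even, positive dof only")
    half = statistic / 2
    terms = sum(half ** i / math.factorial(i) for i in range(dof // 2))
    return math.exp(-half) * terms

# Leading-digit counts (digits 1 to 9) of the first 24 Fibonacci terms.
observed = [7, 5, 3, 2, 2, 2, 0, 2, 1]
total = sum(observed)
expected = [total * math.log10(1 + 1 / d) for d in range(1, 10)]

statistic = chi_square_statistic(observed, expected)
print(f"chi-square = {statistic:.3f}, p-value = {chi_square_p_value(statistic):.1%}")
```

With only 24 terms several expected counts are below 5, so the chi-square approximation is rough; this is one reason to repeat the test with more terms.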

Case 2 – Fibonacci numbers & Benford’s law using Python

In this case you are to repeat the analysis in Case 1 but using your Python code.
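A starting point for this case might look like the sketch below. The function names `fibonacci` and `first_digit_counts` are hypothetical, and the output format simply mirrors the example table under "Your Task".

```python
import math

def fibonacci(n):
    """Return the first n terms of the series 1, 1, 2, 3, 5, ..."""
    terms = []
    a, b = 1, 1
    for _ in range(n):
        terms.append(a)
        a, b = b, a + b
    return terms

def first_digit_counts(numbers):
    """Count leading digits of positive integers."""
    counts = {d: 0 for d in range(1, 10)}
    for n in numbers:
        counts[int(str(n)[0])] += 1
    return counts

terms = fibonacci(24)
counts = first_digit_counts(terms)
for d in range(1, 10):
    observed = counts[d] / len(terms)
    expected = math.log10(1 + 1 / d)
    print(f"Digit {d}: Observed = {observed:.3f} Expected = {expected:.3f}")
```

Changing the argument of `fibonacci` makes it easy to check how the agreement changes as more terms are added.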

Case 3 – Length of Rivers in the World

In this case, use your Python code to see whether the lengths of rivers in the world follow Benford’s law.
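Loading the river data could be sketched as follows. The file name `rivers.txt` and its layout (one length per line) are assumptions; the actual data source and format are up to you. The sketch also shows one way to handle exceptions for missing files and malformed lines.

```python
def load_lengths(path):
    """Read one numeric length per line, skipping blank or malformed lines."""
    lengths = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            text = line.strip()
            if not text:
                continue  # ignore blank lines
            try:
                value = float(text)
            except ValueError:
                print(f"Skipping line {line_number}: {text!r} is not a number")
                continue
            if value > 0:  # a length must be positive to have a leading digit
                lengths.append(value)
    return lengths

try:
    rivers = load_lengths("rivers.txt")  # hypothetical file name
    print(f"Loaded {len(rivers)} river lengths")
except OSError as error:
    print(f"Could not read the data file: {error}")
```

Because Benford’s law is scale invariant, multiplying every length by a constant (for example, converting kilometres to miles) should give a similar leading-digit distribution, which makes a useful extra check.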

Fraud detection using Benford’s Law

One use of Benford’s Law is to detect cases of fraud. Consider the 1993 case of State of Arizona v Nelson. The accused diverted nearly $2M to fake vendors in an attempt to defraud the State. The frequency of first digits in the written cheques clearly violates Benford’s Law, leading to a conviction.

Another case is that of Enron in its posting of revenue for the year 2000. Comparison of the frequency of first digits versus the expected frequency shows large discrepancies. The company went bankrupt the following year – one of the greatest financial failures in history.

Submission

A report is to be submitted in this assignment. There is a discussion section in the report in which you can apply step 6 in the six-step problem solving process and ask the four questions often used in evaluating a solution.

More details on academic reports are available - please refer to this link:

https://federation.edu.au/current-students/learning-and-study/online-help-with/guides-to-yourassessments

There are three important parts at the link above:

1. General Guide to Writing and Study Skills

This section describes the content of a report – refer to page 34 – Abstract, Table of Contents, Introduction and Conclusion and so on.

2. General Guide to Referencing

APA referencing style is described in this section. EndNote is also available to students.

3. Assignment Layout and Appearance Guidelines

This section describes how the report should appear: margin sizes, fonts, how diagrams and tables are presented and so on.

You must supply your program source code files and your documentation, together with any files required to run your application, as a single zip file named as follows:

YOUR-NAME_YOUR-STUDENT-ID.zip

e.g. Ada_LOVELACE_30331815.zip

You may supply your word-processed documentation in either Microsoft Word or LibreOffice/OpenOffice formats only – no proprietary Mac-specific formats, please.

Assignments will be marked on the basis of fulfilment of the requirements and the quality of the work.

In addition to the marking criteria, marks may be deducted for failure to comply with the assignment requirements, including (but not limited to):

• Incomplete implementation(s), and

• Incomplete submissions (e.g. missing files), and

• Poor spelling and grammar.

You might be asked to demonstrate and explain your work.

Marking Criteria/Rubric

No. | Task | Mark
1 | Pseudo-code for all Python scripts | 10
2 | Final Python code (Exceptions 2 marks), annotated with author details and with comments throughout the code (2 marks), consistent with pseudo-code | 10
3 | Tests to check that Python code is working correctly | 10
4 | Case 1 - Fibonacci numbers using example Excel sheet | 5
5 | Case 2 - Fibonacci numbers using your Python script – bar chart (10) & table (5) | 15
6 | Case 3 - Lengths of Rivers using your Python script – bar chart (10) & table (5) | 15
7 | Discussion (including 4 Questions in Step 6) | 15
8 | Report: Abstract, Title Page, Table of Contents (including Figures & Tables), Introduction, Method, Results, Discussion (including the 4 Questions in Step 6 of problem solving), Acknowledgements & Statement of Authorship, References | 20
  | TOTAL | 100
  | Final Grade | /20

Feedback

Ongoing feedback will be given in lectures, labs/tutorials, online classes and arranged meetings. Feedback will also be given in Moodle.

Plagiarism

Plagiarism is the presentation of the expressed thought or work of another person as though it is one's own without properly acknowledging that person. You must not allow other students to copy your work and must take care to safeguard against this happening. More information about the plagiarism policy and procedure for the university can be found at http://federation.edu.au/students/learning-and-study/online-helpwith/plagiarism.

