Assessment item 3 - Practical Analysis and Report Writing
Due Date: 05-Oct-2020
Return Date: 26-Oct-2020
Submission method options: Alternative submission method
This assessment task relates to Topic 1 and Topics 5-9.
Task 1: Practical Analysis [50 marks]
There are two steps to complete in this task:
Step 1: You are required to perform a data mining task to evaluate different classification algorithms. Load the soybean.arff data set into Weka and compare the performance on this data set for the following classification algorithms:
• Naive Bayes
• SVM (or SMO)
Step 2: Using the outputs from Step 1, write a report that shows the performance of the different algorithms and comments on their accuracy, using the confusion matrix and the other performance metrics reported by Weka. In your report consider:
• Is there a difference in performance between the algorithms?
• Which algorithm performs best?
Your report should include the necessary screenshots, tables, graphs, etc. to make your report understandable to the reader.
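As a rough illustration of the metrics mentioned in Step 2, the sketch below derives accuracy, precision, recall and F-measure from a confusion matrix laid out the way Weka prints it (rows are actual classes, columns are predicted classes). The labels and counts are made up for illustration and are not results for soybean.arff; the function names are hypothetical helpers, not part of Weka.

```python
# Hypothetical 3-class confusion matrix (rows = actual, columns = predicted).
# These counts are made up for illustration; they are not Weka output.
LABELS = ["a", "b", "c"]
MATRIX = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]


def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def precision(matrix, k):
    """Of everything predicted as class k, the fraction that really is k."""
    predicted_k = sum(row[k] for row in matrix)
    return matrix[k][k] / predicted_k if predicted_k else 0.0


def recall(matrix, k):
    """Of everything that really is class k, the fraction predicted as k."""
    actual_k = sum(matrix[k])
    return matrix[k][k] / actual_k if actual_k else 0.0


def f_measure(matrix, k):
    """Harmonic mean of precision and recall for class k."""
    p, r = precision(matrix, k), recall(matrix, k)
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    print(f"accuracy = {accuracy(MATRIX):.3f}")
    for k, label in enumerate(LABELS):
        print(
            f"{label}: precision={precision(MATRIX, k):.3f} "
            f"recall={recall(MATRIX, k):.3f} F={f_measure(MATRIX, k):.3f}"
        )
```

Comparing per-class values like these, rather than overall accuracy alone, can show whether one algorithm is weaker on particular classes.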
Task 2: Data Mining Report [50 marks]
Topic: Security, Privacy and Ethics in Data Mining.
In this task, you are required to read the journal articles provided below and write a short discussion paper based on the topic of security, privacy and ethics in data mining. You must:
• identify the major security, privacy and ethical implications in data mining;
• evaluate how significant these implications are for the business sector; and
• support your response with appropriate examples and references (at least 2 additional references should be sought in addition to the ones below).
The recommended word length for this task is 700 to 1000 words.
Ryoo, J. ‘Big data security problems threaten consumers’ privacy’ (March 23, 2016) theconversation.com http://theconversation.com/big-data-security-problems-threaten-consumers-privacy-54798
Tasioulas, J. ‘Big Data, Human Rights and the Ethics of Scientific Research’ (December 1, 2016) abc.net.au http://www.abc.net.au/religion/articles/2016/11/30/4584324.htm
This assessment task will assess the following learning outcome/s:
• be able to identify and analyse business requirements for the identification of patterns and trends in data sets.
• be able to appraise the different approaches and categories of data mining problems.
• be able to compare and evaluate output patterns.
• be able to explore and critically analyse data sets and evaluate their data quality, integrity and security requirements.
• be able to compare and evaluate appropriate techniques for detecting and evaluating patterns in a given data set.
MARKING CRITERIA AND STANDARDS
The grade you receive for this assessment as a whole is determined by the cumulative marks gained for each question. The tasks in this assessment involve a sequence of several steps and therefore you will be marked on the correctness of your answer as well as clear and neat presentation of your diagrams, where required.
Practical Analysis and Report Writing
Criteria: Task 1: Practical Analysis [50 marks]
• HD: The student has thoroughly understood the classification methods, providing a detailed description of the methods and their output on the given data set. The discussion involving the validation and accuracy of the model demonstrates thorough understanding of the classification methods as applied to the given data set.
• DI: The student has understood the classification methods, providing a detailed description of the methods and their output on the given data set. The discussion involving the validation and accuracy of the model demonstrates good understanding of the classification methods as applied to the given data set.
• CR: The student has understood the classification methods, providing a description of the methods and their output on the given data set. The discussion involving the validation and accuracy of the model demonstrates understanding of the classification methods as applied to the given data set.
• PS: The student has understood the classification methods, providing a description of the methods and their output on the given data set. The discussion involving the validation and accuracy of the model shows basic understanding of the classification methods as applied to the given data set.
• FL: The student has not fully understood the classification methods, providing a description of the methods and their output on the given data set. The discussion is fully involving the validation and accuracy of the model shows basic understanding of the classification methods as applied to the given data set.
Criteria: Task 2: Report Writing [50 marks]
• HD: Demonstrates an ability to analyse, reason and discuss the concepts to draw justified conclusions that are logically supported by examples and best practice. Answers [...] integrate and [...] information into a cohesive and coherent piece of analysis and consistently use correct data mining terminologies and [...]
• DI: Demonstrates an ability to analyse, reason and discuss the concepts to draw justified conclusions that are logically supported by examples and best practice. The answers are logically structured to create a cohesive and coherent piece of analysis that consistently uses correct data mining terminologies.
• CR: Demonstrates an ability to analyse, reason and discuss the concepts to draw justified conclusions that are generally logically supported by examples and best practice. The answers are generally logically structured to create a comprehensive, mainly descriptive piece of analysis. Some use of correct data mining terminologies.
• PS: Demonstrates an ability to analyse, reason and discuss most concepts to draw justified conclusions that are generally logically supported by examples and best practice. The answers are partially structured into loosely-linked rudimentary sentences to create a comprehensive, descriptive piece of analysis. Some use of correct data mining terminologies.
• FL: Demonstrates incomplete/insufficient research in security, privacy and ethics in data mining, with incomplete responses supported by no or irrelevant examples, incorrect terminologies and poor/inadequate references.
You are advised to write your answers in a Word document and submit it via Turnitin. You may also submit your document in PDF format.
Your answers to the questions should be precise but complete and informative.
For Task 1, your report should include a background section that describes the data set, e.g. the number of attributes, the number of instances, the distribution of the class attribute and so on. It should also describe how the experiment was set up, including whether or not the default parameters were used for each of the algorithms.
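The background figures described above (number of attributes, number of instances, class distribution) can also be cross-checked outside Weka. The following is a minimal sketch under stated assumptions: the file is a dense ARFF file with unquoted, comma-separated values, and the class attribute is declared last. The sample text and the helper name `summarise_arff` are hypothetical, and the sample is not an excerpt from soybean.arff.

```python
from collections import Counter

# Tiny made-up ARFF sample; it only mimics the file layout and is not
# an excerpt from soybean.arff.
SAMPLE_ARFF = """\
% comment line
@relation toy
@attribute colour {red,green}
@attribute size {small,large}
@attribute class {yes,no}
@data
red,small,yes
green,large,no
red,large,yes
?,small,yes
"""


def summarise_arff(text):
    """Return (attribute names, instance count, class distribution).

    Assumes a dense ARFF file with unquoted, comma-separated values and
    the class attribute declared last.
    """
    attributes = []
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@attribute"):
            attributes.append(line.split()[1])
        elif lower.startswith("@data"):
            in_data = True
    class_counts = Counter(row[-1] for row in rows)
    return attributes, len(rows), class_counts


if __name__ == "__main__":
    names, n_instances, distribution = summarise_arff(SAMPLE_ARFF)
    print("attributes (including class):", len(names))
    print("instances:", n_instances)
    for label, count in distribution.items():
        print(f"  {label}: {count}")
```

To try it on a real file, the text could be read with `open(path).read()` and passed to the same function; the attribute count it reports includes the class attribute.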
A brief summary of each algorithm should be provided with screenshots of the results.
An analysis that compares the results should be provided.
Marks are distributed as follows: 10 marks for background, 7.5 marks each for an algorithm (for a total of 20 marks), 10 marks for the analysis that compares the results.
For Task 2, your report should have an introduction, a main section covering each of the three requirements, and a conclusion.
Your report should also include at least 2 additional relevant references.
Marks are distributed as follows: 7.5 marks for the introduction, 10 marks each for the 3 requirements, 7.5 marks for the conclusion, and 5 marks for logical structure.