32130 Fundamentals of Data Analytics
Assignment 1: the data analytics consultant
Due date: Thursday 13 April 2017, 11:59 PM
Marks: Out of 100, weighted to 35% of your final mark.
Submission format: Adobe PDF (preferable) or MS Word Doc.
Filename: ida_a1_xxxxxxxx.pdf or ida_a1_xxxxxxxx.doc, where xxxxxxxx is your student ID.
Report format: Ten pages maximum, with the information and sections described below. Use 11 or 12 point Times or Arial fonts.
Submit to: UTS Online assignment submission button.
Please name the file as described above, and put your name and student ID in the report.
In this assignment you need to develop a project proposal that would use data analytics methods to address a problem for a company. Choose one of the areas from the table below. Formulate a specific business problem in that area and make up the specific details of the company yourself.
You will also give a 3-minute pitch for your project proposal. Please submit this as a link to a YouTube video or similar.
This assignment is individual work.
The project proposal is addressed to a client who provides funding to support such projects in science, business and technology. The funding is issued on a competitive basis, so the aim of your proposal is to convince the client to fund your project. A good proposal communicates the importance of the problem, makes a strong case that the proposer (i.e. you) knows how to go about solving it, and leaves the impression that you would succeed if you were given the money.
The project proposal is limited to 10 pages and should include the following:
• Project title
Give a title that describes what the project seeks to do.
• Your name and student ID
Remember to provide these so that I know who to give the marks to!
• Section 1: Aims, objectives and possible outcomes.
Provide a clear statement of the aims and objectives of the data analytics study, and of the possible outcomes in terms of discovered knowledge and its potential application towards solving the problem. In this section you need to discuss the business problem.
• Section 2: Background.
In this section you should include the background information to the problem, including the approaches that have been used so far by other researchers. You will need to do some research into how other people have tried to solve the problem. This section should demonstrate to the client that you have a clear picture of what is happening in the field and how similar problems have been approached so far. It is even better if you can point out deficiencies in how others have tried to solve the problem and link that to your proposal. Do not forget to refer to the sources of the information that you have used in your References section.
• Section 3: Data analytics scenario and methodology.
This section should take into account the CRISP-DM methodology. Here you discuss the data analytics problem you have formulated from the business problem. In this section you should:
o formulate the problem as a data mining problem and identify the data analytics tasks;
o formulate the data collection and organisation strategy (what kind of data, how to record it, format(s) in which it is preserved, integration issues, and, if applicable, changes in current data collection and organisation strategies) relevant to the objectives and the possible outcomes of the project;
o briefly discuss some of the data mining method(s) that might be used;
o briefly consider how the results will be evaluated with respect to the project objectives;
o briefly consider how to deploy the results into the business.
Your proposal will benefit if you include examples of data similar to the data that you plan to collect. Also include examples of the results that the data mining methods produce from these data, illustrating their applicability to the problem. You may illustrate your proposal with examples of what you can get out of the tools for the type of data that you address. If that is done correctly, it will definitely convince the client that you know what you are talking about.
• Section 4: Plan and timetable.
In this section you should provide details of the plan for running the project, including a timeline and a budget that take the project through to completion (if necessary, consider possible contingencies).
• References
List the references for information used in the background section in Harvard format.
Problem Areas
Problem Area Description
Biomedical and DNA data analysis
Client: Cancer Research Centre, which produces microarray DNA data and SNP data, and also holds clinical records of the patients from whom the DNA samples were taken.
Current data source: Microarray data, patients' clinical data, patients' demographic database.
Problem: Identify particular gene sequence patterns that play a key role in cancer diseases.
Brief Background: An important focus in medical research, particularly for cancer, is the study of DNA sequences, since such sequences form the foundation of the genetic codes of all living organisms. A gene comprises hundreds of individual nucleotides arranged in a particular order. There is an almost unlimited number of ways in which nucleotides can be ordered and sequenced to form distinct genes. Since many interesting sequential pattern analysis and similarity search techniques have been developed in data mining, the biologists and medical researchers at the Cancer Research Centre expect data analytics to contribute to the identification of co-occurring gene sequences and to link genes to different stages of disease development.
Detecting financial fraud
Client: A large Australian bank, which collects relatively complete, reliable and high-quality data.
Current data source: Distributed databases, which have data about business and individual customer transactions (including ATM transactions), credit (such as business, mortgage, and car loans) and investment services (such as mutual funds, stock investment).
Problem: Detect fraudulent activities.
Brief Background: One of the steps in detecting money laundering and other financial crimes is to integrate information from different databases (such as bank transaction databases, federal or state police databases, and even criminal library databases). Data analytics and data mining can identify important relations and patterns of activities, helping financial investigators to focus on suspicious cases for further detailed examination. The project will most likely require a broad range of data mining tools that operate over different data.
financial products
Client: National financial institution, which has collected data over the past three decades.
Current data source: Historical data about loans, customers (including income levels, education level, residence region, credit history, etc.), loan packages and their performance.
Problem: Develop novel financial products that will be attractive to a broad range of customers.
Brief Background: Customer profiles, including customer credit analysis and loan payment prediction, are critical to the business of a financial institution. On the other hand, in a competitive market, loan packages have to offer markedly different features, perhaps targeting specific customer segments. Data mining methods may help to identify potential customer segments and the important factors that influence the selection of a loan package, and to eliminate irrelevant factors. Based on the results, the financial institution may then decide to adjust packages or change its loan-granting policy so as to grant loans to applicants who were previously denied a particular package but whose profile, derived from the patterns discovered in the data, shows relatively low risk under specific conditions.
Sales and marketing
Client: Large retail company, which collects huge amounts of data on sales and customer shopping history (through a loyalty card scheme).
Current data source: Data about transactions, demographic information for customers with a loyalty card.
Problem: Identify and support loyal customers.
Brief Background: Customer loyalty and purchase trends can be analysed in a systematic way. Goods purchased at different periods by the same customers can be grouped into sequences. Methods of sequential pattern mining can then be used to investigate changes in customer consumption or loyalty, and to suggest adjustments to the pricing and variety of goods in order to help retain customers and attract new ones.
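As a rough illustration of the idea of grouping purchases into sequences, the sketch below counts how many customers bought one item before another. It is a deliberately simplified stand-in for sequential pattern mining, not a full algorithm; the function name, the `min_support` parameter and the purchase histories are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_ordered_pairs(sequences, min_support):
    """Return ordered item pairs (a, b), with a bought before b,
    that occur in at least min_support customer sequences."""
    counts = Counter()
    for seq in sequences:
        seen = set()  # count each pair at most once per customer
        for i, j in combinations(range(len(seq)), 2):
            if seq[i] != seq[j]:
                seen.add((seq[i], seq[j]))
        counts.update(seen)
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical purchase histories: one time-ordered list per loyalty-card customer.
histories = [
    ["nappies", "baby food", "toys"],
    ["nappies", "toys"],
    ["baby food", "nappies", "toys"],
]
patterns = frequent_ordered_pairs(histories, min_support=3)
print(patterns)
```

A real project would more likely rely on an established sequential pattern mining method and would treat time gaps between purchases explicitly.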
Detecting telecommunication fraud
Client: Large telecom company
Current data source: Multidimensional data (dimensions such as calling time, duration, location of caller, location of callee, type of call, etc.).
Problem: Identify typical patterns of fraudulent activity and identify other unusual behaviour patterns.
Brief Background: Fraudulent activity costs the telecommunication industry millions of dollars a year. It is important to identify potentially fraudulent users and their usage patterns; to detect attempts to gain fraudulent entry to customer accounts; and to discover unusual patterns that may need special attention, such as busy-hour frustrated call attempts, switch and route congestion patterns, and periodic calls from automatic dial-up equipment (such as computer logins and logouts) that differ from typical patterns of such calls. Cluster analysis and outlier analysis are expected to be well suited to this task.
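To make the notion of outlier analysis concrete, here is a minimal sketch that flags unusually high or low values using a median-based (modified) z-score. The constant 0.6745 and the cut-off of 3.5 are a commonly used convention and are assumed here, not prescribed by this brief; the function name and the daily call counts are hypothetical.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score
    (based on the median and the median absolute deviation) exceeds threshold."""
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    if mad == 0:
        return []  # no spread around the median, so nothing can be ranked as unusual
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - centre) / mad) > threshold
    ]


# Hypothetical number of calls made per day from a single account.
daily_calls = [12, 15, 11, 14, 13, 240, 12, 16]
suspicious = flag_outliers(daily_calls)
print(suspicious)
```

The median-based score is used instead of the mean and standard deviation because a single extreme value in a small sample inflates the standard deviation and can hide itself.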
Sales and marketing in the telecommunication industry
Client: Large telecom company
Current data source: Multidimensional data (dimensions such as calling time, duration, location of caller, location of callee, type of call, etc.).
Problem: Identify successful aggregations (package deals) of telecommunication services.
Brief Background: The telecommunication industry has quickly evolved from offering local and long-distance telephone services to providing many other comprehensive communication services, including voice, fax, pager, mobile phone, images, e-mail, computer and Web data transmission, and other data traffic. With the deregulation of the telecommunication industry and the development of new computer and communication technologies, the telecommunication market is rapidly expanding and highly competitive. Identifying unique and competitive packages of services is one way to survive in such a market. For example, suppose that you have discovered the following: if a customer in NSW works in a city different from the one s/he lives in (e.g. works in Sydney and lives in Wollongong), s/he is likely to first use the long-distance service between the two cities around 5:30 pm and then to use a mobile phone for at least 30 minutes in the subsequent hour every weekday. Further analysis may determine whether this holds for particular groups of persons (e.g. age group or profession group) and particular pairs of cities. This can then help promote the sales of specific long-distance and cellular phone combinations (package deals) and improve the availability of particular services in the region.
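A rule such as the commuter example above is usually judged by its support (how often the rule's condition and outcome occur together) and its confidence (how often the outcome follows when the condition holds). The sketch below computes both over a handful of made-up customer records; the field names `home`, `work` and `evening_pattern`, and the function name, are hypothetical.

```python
def rule_support_confidence(records, antecedent, consequent):
    """Return (support, confidence) of the rule 'antecedent implies consequent'."""
    if not records:
        return 0.0, 0.0
    matching = [r for r in records if antecedent(r)]
    both = [r for r in matching if consequent(r)]
    support = len(both) / len(records)
    confidence = len(both) / len(matching) if matching else 0.0
    return support, confidence


# Hypothetical records: evening_pattern is True when the customer makes a
# long-distance call around 5:30 pm followed by at least 30 minutes of mobile use.
customers = [
    {"home": "Wollongong", "work": "Sydney", "evening_pattern": True},
    {"home": "Wollongong", "work": "Sydney", "evening_pattern": True},
    {"home": "Newcastle", "work": "Sydney", "evening_pattern": False},
    {"home": "Sydney", "work": "Sydney", "evening_pattern": False},
    {"home": "Sydney", "work": "Sydney", "evening_pattern": True},
]

support, confidence = rule_support_confidence(
    customers,
    antecedent=lambda r: r["home"] != r["work"],
    consequent=lambda r: r["evening_pattern"],
)
print(support, confidence)
```

Repeating the calculation within an age group, a profession group or a pair of cities is one simple way to check whether the rule holds for particular groups.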
Stock exchange
Client: Major stock exchange market
Current data source: Record of all transactions at the stock exchange, database of financial news, collection of transcripts of discussions of stocks that are traded at this stock exchange.
Problem: Identify patterns of insider trading and the influence of different events on the price of particular shares.
Brief Background: The development of software systems that collect data directly from the stock market has led to the collection of enormous amounts of historical data about the behaviour of different players at the stock exchange. Moreover, a significant amount of text data, in the form of news and transcripts from chat rooms, is available as complementary data. The question is whether data mining methods can be used to utilise these data sets, discover unusual sequential patterns of behaviour and connect them with the price variation of particular stocks. Going further, can data analytics methods for unstructured data be used to discover the influence of specific events (e.g. a visit of the new Pope, a new member on the Board of Company Directors, a change of CEO, etc.) on the price of particular stocks?
The Pitch
The aim of your 3-minute pitch is to sell the idea to an investor, i.e. the coordinator and the rest of the class. In 3 minutes you will not be able to give more than an overview of the most important aspects of the project with the aim of exciting the investors. Make your pitch as a YouTube video or similar and submit the link to the video as part of your assignment. The best pitches will be shown in class.
This assignment is assessed as individual work. The assessment criteria are:
• Formulation of the business problem in terms of the specific aims, objectives and potential project outcomes (section 1) - 20%;
• The background to the data analytics project in terms of comprehensiveness and understanding (section 2) - 20%;
• Formulation of the data analytics problem and methodology and how well they connect to the aims, objectives and possible outcomes of the project (section 3) - 20%;
• The feasibility of the planned data analytics solution and how well it ensures that the goals will be achieved (section 4) - 20%;
• Quality of the 3-minute pitch: Was it within time? Does it inspire investment? Did we understand what you were proposing to do?
Relationship to Objectives
This assignment addresses subject objectives 1 and 5.
Return of Assignments
We plan to return marked assignments within 3 weeks of submission. Emails will be sent when marking is complete.
Academic Standards
All text in your assignment should be paraphrased into your own words and referenced using the Harvard referencing style. Please refer to the Subject Outline for details about penalties for Academic Misconduct.
Late Penalties
A late penalty of up to 50% may be applied to submitted work unless prior arrangements have been made with the subject coordinator. Unless an extension has been approved, assignments submitted late will incur a penalty of 10% per calendar day or part thereof up to 5 days after which the assignment will not be accepted.
Special Consideration
You may apply for special consideration (SC) due to unforeseen circumstances as described in the subject outline. You must provide documentary evidence to support your claim, such as a doctor's certificate, a statutory declaration, or a letter from your employer.
Note
Assignments will be checked with the Turnitin® Plagiarism Prevention system to identify unoriginal material copied (without reference to the source) from electronic sources on the Internet, electronic libraries, or other assignments.