CSC8001: Data Science Project Report
For your Final Project, your task is to analyse and report on the extent and nature of the injury or violence problems for your assigned country. The country you will report on has been assigned based on the last digit of your Student ID number.
Country: 8
Country | Last digit of Student ID number
Costa Rica | 3
“As more governments around the world come to recognize that injuries and violence can and must be prevented, many are trying to get a better understanding of the problem in their countries as a basis for designing, implementing and monitoring effective prevention strategies” (World Health Organization, 2015).
World Health Organization, “Injuries and violence: the facts 2014”, p. 14
Data-Based Prevention and Control
Please read Injuries and violence: the facts 2014, which is included in the Project_Files folder. This World Health Organization publication outlines a four-step prevention and control process (Figure 1). Your Data Science Project Report will be based on the Surveillance step of this process: using data to understand the extent and nature of the injury and violence problem.
Figure 1: Prevention and control
All of your final code, analysis and discussion must be included in a single Jupyter notebook (Python 3), with an appropriate heading for each section. All plots should meet the expected standards of visualisation and be displayed inline. Tables can be created either with formatted print statements or in Markdown cells using Markdown or HTML. Your report may include supporting images if you feel they are relevant. You may also source additional datasets to support your analysis and discussion, again only if you feel they are relevant.
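For the formatted-print option, a minimal sketch is a helper that pads each column to its widest cell. The function name `format_table` is hypothetical, and the rows in the usage line are placeholders, not real data.

```python
def format_table(headers, rows):
    """Return rows as a plain-text table, each column padded to its widest cell."""
    # zip(headers, *rows) regroups the header and data cells column by column
    widths = [max(len(str(cell)) for cell in column) for column in zip(headers, *rows)]

    def fmt(cells):
        # Left-align each cell to its column width; drop trailing padding
        return "  ".join(f"{str(cell):<{w}}" for cell, w in zip(cells, widths)).rstrip()

    lines = [fmt(headers), "  ".join("-" * w for w in widths)]
    lines.extend(fmt(row) for row in rows)
    return "\n".join(lines)


# Placeholder rows for illustration only, not real data
print(format_table(["Rank", "Cause", "Deaths"], [(1, "Cause A", 120), (2, "Cause B", 95)]))
```

A Markdown or HTML table in a Markdown cell renders more neatly, but a helper like this keeps the table generated from the data, so it updates when the notebook is re-run.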
Your Jupyter notebook project report should include the following sections:
1. Introduction [10 marks]:
o Introduce and discuss relevant features and facts about your country. You may find the WHO’s country pages (http://www.who.int/countries/en/) and The World Factbook (https://www.cia.gov/library/publications/the-world-factbook/) helpful. All data sources should be appropriately referenced.
2. Datasets [10 marks]:
a. Discuss the datasets you’ve used and any relevant details which are required to understand how you’ve extracted your country’s data. Examples of relevant information for the WHO database would be:
i. the WHO Mortality Database country code for your selected country,
ii. the ICD files, Years and Lists used in your analysis,
iii. a table summarising the codes and cause-of-death descriptions for your country’s leading causes of death, as discussed in your report,
iv. a table summarising which cause-of-death codes you classified as deaths due to injury or violence.
Any long detailed tables should be included as an Appendix at the end of your report.
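A minimal, standard-library sketch of the extraction steps behind items i and iv. It assumes that country_codes.csv has the columns `country` (the code) and `name`, that the mortality files have a `Country` column, and that the cause values are detailed ICD-10 codes such as `X70`; check these against the provided documentation. The classification rule, counting any code in ICD-10 Chapter XX (V01-Y98, external causes of morbidity and mortality) as injury or violence, is one possible choice, not the required one. All function names are hypothetical.

```python
import csv


def find_country_code(code_lines, country_name):
    """Return the WHO country code whose name matches country_name (case-insensitive).

    code_lines is any iterable of CSV lines, e.g. an open country_codes.csv file.
    Assumes the header columns are `country` (the code) and `name`.
    """
    target = country_name.strip().lower()
    for row in csv.DictReader(code_lines):
        if row["name"].strip().lower() == target:
            return row["country"].strip()
    raise KeyError(f"No country code found for {country_name!r}")


def rows_for_country(mortality_lines, code):
    """Yield the rows of a mortality file that belong to one country (assumes a `Country` column)."""
    for row in csv.DictReader(mortality_lines):
        if row["Country"].strip() == code:
            yield row


def is_injury_or_violence(cause):
    """True if `cause` is an ICD-10 Chapter XX code (V01-Y98, external causes).

    Assumes `cause` is a detailed ICD-10 code such as 'X70' or 'V892';
    codes from other chapters and non-letter aggregate codes return False.
    """
    return cause.strip()[:1].upper() in {"V", "W", "X", "Y"}


# Usage (assumed folder layout; adjust to yours):
# with open("Project_Files/country_codes.csv", newline="") as f:
#     code = find_country_code(f, "Costa Rica")
```

Whatever rule you choose, the codes it selects are what the table in item iv should list.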
3. Analysis and Discussion [70 marks]:
In this section you will analyse and discuss the extent and nature of the injury or violence problem for your selected country. You are required to provide graphs and tables, as indicated below, to support and illustrate your analysis and discussion. You may also include additional graphs and tables if you feel they are relevant.
This section should provide analysis and discussion to address the following questions:
a. What are the current leading causes of injury deaths in your country?
i. Provide a pie chart displaying the top five leading causes of death due to injuries and violence, based on your country’s most recent year of data. Group the remaining injury and violence deaths into a single “Other” slice.
b. Have injury deaths risen in rank over the last thirty years?
i. Provide tables comparing the top twenty causes of all deaths (not just those due to injuries and violence) over the last thirty years, in fifteen-year increments working backwards from the most recent year of data. For example, if your country has data from 1950 to 2010, you will have a table displaying the top twenty causes of all deaths for the years 1980, 1995 and 2010.
ii. Provide a time series chart depicting how the current top ten leading causes of injury and violence deaths have changed over the last thirty years, using five-year increments. For example, if your country has data from 1950 to 2010, your chart will include the years 1980, 1985, 1990, 1995, 2000, 2005 and 2010.
c. Are some groups more vulnerable to injuries and violence than others?
i. Provide a vertical bar chart displaying death rates by cause of injury and by group for the most recent year of data. Use the current top five leading causes of injury and violence deaths (from part b), and the following groups: youths (ages 15-29, any gender), males (all ages) and females (all ages).
d. Does poverty increase the risk of injury?
i. Compare your country’s deaths due to injury and violence with those of another country in your world region that has a different WHO income level classification. For example, per the provided LMIC-HIC_country_grouping document, Australia’s WHO region code is Wpr HI, indicating that Australia is in the Western Pacific Region and is classified as high income. A good comparison country for Australia would therefore be Vanuatu, whose WHO region code is Wpr LMI.
For each of your two countries, provide an appropriate chart of the top twenty causes of all deaths (not just those due to injuries and violence), based on the most recent year of data available for both countries. For example, if the most recent year of data for one country is 2011 and for the other is 2009, you should use the 2009 data for both countries.
ii. Provide a vertical bar chart displaying the percentage of all deaths due to injury and violence for both countries.
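The year selections and chart inputs in parts a, b and d reduce to simple arithmetic. The sketch below uses hypothetical function names and only the standard library; it does not check that each selected year actually has data for your country, so confirm that before building the tables.

```python
from collections import Counter


def comparison_years(latest_year, span=30, step=15):
    """Years from latest_year back over `span` years in `step`-year increments, oldest first."""
    return sorted(range(latest_year, latest_year - span - 1, -step))


def common_latest_year(years_a, years_b):
    """Most recent year that appears in both countries' data."""
    shared = set(years_a) & set(years_b)
    if not shared:
        raise ValueError("The two countries have no year of data in common")
    return max(shared)


def top_n_with_other(deaths_by_cause, n=5, other_label="Other"):
    """Keep the n causes with most deaths and sum the remainder into one 'Other' entry."""
    ranked = Counter(deaths_by_cause).most_common()
    slices = dict(ranked[:n])
    remainder = sum(count for _, count in ranked[n:])
    if remainder:
        slices[other_label] = remainder
    return slices


def injury_share(total_deaths, injury_deaths):
    """Percentage of all deaths that are due to injury and violence."""
    return 100 * injury_deaths / total_deaths
```

With data ending in 2010, `comparison_years(2010)` gives the three table years and `comparison_years(2010, step=5)` the seven time-series years. For the pie chart, `list(slices.values())` and `list(slices)` can be passed to matplotlib's `plt.pie` as the wedge sizes and the `labels` argument.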
4. Conclusion [10 marks]:
a. Summarise the main points discussed in your report, including your findings on the leading causes of death due to injury and violence in your country.
5. References
All sources must be referenced. Because data science is a broad, multidisciplinary field, I leave the choice of referencing style up to you. The style you choose must be used consistently for all in-text references, and every in-text reference must be included in your list of references.
6. Appendices (optional)
Please review the marking criteria document provided.
Submit a single zip file containing your Jupyter project notebook and all other files necessary to reproduce it. When I test your project notebook, I will unzip your submission to a local folder on my machine and re-run all cells in your notebook. Please make sure that all links in your Jupyter notebook to data files, imported code files, images, etc. are relative to the notebook.
WHO Mortality Database
The data for this assignment is provided by the WHO Mortality Database, available at:
http://www.who.int/healthinfo/mortality_data/en/. Your analysis can be verified using the WHO online tool (http://apps.who.int/healthinfo/statistics/mortality/causeofdeath_query/). The online tool allows you to query the WHO Mortality Database to:
• Extract data for causes of death by:
o country, year, sex and age with all individual causes of death
o country, year, sex and age for a few selected causes of death by coding systems
• Extract data for population data by:
o country, year, sex and age
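The group death rates in part c need population figures as well as death counts. A sketch, assuming rates per 100,000 population, that the mortality files and pop.csv share the WHO age-group column layout (`Deaths1`/`Pop1` for all ages), and that under age format `Frmat` 0 columns 9-11 cover ages 15-19, 20-24 and 25-29. These column numbers are assumptions; verify the age format used for your country in Documentation_25nov2015.doc. Function names are hypothetical.

```python
def rate_per_100k(deaths, population):
    """Crude death rate per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths * 100_000 / population


# Assumption: age format Frmat 0, in which age-group columns 9, 10 and 11 hold
# ages 15-19, 20-24 and 25-29 (Deaths9-Deaths11 and Pop9-Pop11).
YOUTH_COLUMNS = (9, 10, 11)


def group_rate(death_rows, pop_rows, columns=(1,)):
    """Death rate per 100,000 for the summed age-group columns across the given rows.

    Pass rows for one sex to get a male or female rate, or rows for both sexes
    (e.g. Sex 1 and Sex 2, assumed to be male and female) with
    columns=YOUTH_COLUMNS for youths of any gender.
    Column 1 is assumed to be the all-ages total.
    """
    deaths = sum(float(row[f"Deaths{i}"]) for row in death_rows for i in columns)
    population = sum(float(row[f"Pop{i}"]) for row in pop_rows for i in columns)
    return rate_per_100k(deaths, population)
```

Rates rather than raw counts are needed here because the three groups differ in size; otherwise the largest group would dominate the chart.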
Review the data and documents available in the provided Project_Files folder. Table 1 below describes the contents of each file.
Table 1. Data and document files (last updated 25 November 2015)
File | Description | Last updated
Documentation_25nov2015.doc | Word file with information on the WHO Mortality Database, file specifications and list of causes of death. | 25 November 2014
list_ctry_years_25nov2015.xlsx | Excel file with the list of country-years available for the mortality and population data. | 25 November 2014
country_codes.csv | Country codes and names. | 03 November 2014
notes.csv | Notes pertaining to data for some country-years. | 25 November 2014
pop.csv | Reference populations and live births (for regular users, figures are now in units). | 25 November 2014
MortIcd7.csv | Detailed mortality data for the seventh revision of the ICD (International Classification of Diseases). | 18 February 2004
Morticd8.csv | Detailed mortality data for the eighth revision of the ICD. | 09 July 2012
Morticd9.csv | Detailed mortality data for the ninth revision of the ICD. | 25 November 2015
Morticd10_part1.csv | Detailed mortality data for the tenth revision of the ICD (part 1 of 2). | 25 November 2015
Morticd10_part2.csv | Detailed mortality data for the tenth revision of the ICD (part 2 of 2). | 25 November 2015
LMIC-HIC_country_grouping.pdf | Three-letter country codes, WHO regions and income groups. | May 2014
WHO-Violence_Injury_Prevention.pdf | WHO document highlighting that more than 5 million people die each year as a result of injuries, resulting from acts of violence against oneself or others, road traffic crashes, burns, drowning, falls and poisonings, among other causes. | -
World Health Statistics 2016-SDG.pdf | The World Health Statistics series is WHO’s annual compilation of health statistics for its 194 Member States. World Health Statistics 2016 focuses on the proposed health and health-related Sustainable Development Goals (SDGs) and associated targets. | -
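The ICD-10 data is split across two files, and the notebook must reference them with paths relative to itself. A standard-library sketch that reads both parts from a Project_Files folder next to the notebook (an assumed layout) and keeps only the rows you need; `pandas.read_csv` followed by `pandas.concat` is a common alternative. The name `read_parts` is hypothetical.

```python
import csv
from pathlib import Path

# Assumption: the WHO files sit in a Project_Files folder next to the notebook.
# A relative path keeps working after the submission is unzipped elsewhere.
DATA_DIR = Path("Project_Files")
ICD10_FILES = [DATA_DIR / "Morticd10_part1.csv", DATA_DIR / "Morticd10_part2.csv"]


def read_parts(paths, keep=lambda row: True):
    """Read CSV files that share a header and return the kept rows as one list."""
    rows = []
    for path in paths:
        with open(path, newline="") as f:
            # Filter while reading so only the rows you need are held in memory
            rows.extend(row for row in csv.DictReader(f) if keep(row))
    return rows


# Usage (costa_rica_code is a hypothetical variable holding Costa Rica's code
# from country_codes.csv):
# icd10_rows = read_parts(ICD10_FILES, keep=lambda row: row["Country"] == costa_rica_code)
```

Filtering during the read matters because the full ICD-10 files cover every reporting country and are large.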