UNIVERSITY OF SOUTHERN QUEENSLAND
CSC8003 – Machine learning, S2 2017
Epileptic EEG Data Classification
Due date: 28 August 2017.
1. Assignment outline
In this assignment, you need to classify epileptic electroencephalograph (EEG) signals using the supervised machine learning techniques you learnt in the CSC8003 course. You need to submit one report before the deadline. The report is worth 20% of the course assessment.
2. Background knowledge and Data description
What is Epileptic EEG Data Classification?
Epileptogenic localization is a critical factor for successful epilepsy surgery. It is therefore important to distinguish epileptic EEG data from nonepileptic EEG data accurately. The main process for EEG data classification is shown in Figure 1. This assignment covers only the classification stage.
Figure 1: EEG Data Classification
Your task in this assignment is to classify EEG data as epileptic or nonepileptic based on the given feature sets.
At the beginning of the assignment, data for 9000 patient cases are provided as attachments: 6000 training cases and 3000 testing cases. Each case has 8 feature values, which form 8 feature data sets (x1, x2, x3, x4, x5, x6, x7 and x8) across the cases. In addition, each case has a label showing whether it is epileptic EEG data (S) or nonepileptic EEG data (N).
The 8 feature data sets are calculated from the raw EEG data using 8 different feature extraction methods (First Quartile Q1, Third Quartile Q3, Q1-Q3, Standard Deviation, Min, Max, Mean and Median). Within a feature set, the same extraction method is used for every case: all x1 values are calculated by one method, and the same holds for x2, x3, x4, x5, x6, x7 and x8.
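Feature extraction is outside the scope of this assignment, but it may help to see how the eight statistics named above could be computed for one raw EEG segment. The following is a minimal Python sketch using only the standard library, not the procedure used to produce the supplied data. The function name `extract_features`, the toy segment, the "inclusive" quartile convention, the use of the sample standard deviation, and the literal reading of "Q1-Q3" as Q1 minus Q3 are all assumptions. The handout does not say which statistic corresponds to which of x1 to x8, so the sketch uses the statistic names as keys.

```python
import statistics


def extract_features(signal):
    """Compute the eight summary statistics named in the handout for one segment.

    Assumptions (not stated in the handout): quartiles use the 'inclusive'
    convention, the standard deviation is the sample standard deviation,
    and "Q1-Q3" is taken literally as Q1 minus Q3.
    """
    if len(signal) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=4) returns the three cut points: Q1, the median, and Q3.
    q1, _, q3 = statistics.quantiles(signal, n=4, method="inclusive")
    return {
        "Q1": q1,
        "Q3": q3,
        "Q1-Q3": q1 - q3,
        "Standard Deviation": statistics.stdev(signal),
        "Min": min(signal),
        "Max": max(signal),
        "Mean": statistics.mean(signal),
        "Median": statistics.median(signal),
    }


# Toy values standing in for one raw EEG segment (hypothetical data).
segment = [1.0, 2.0, 3.0, 4.0, 5.0]
features = extract_features(segment)

for name, value in features.items():
    print(f"{name}: {value:.4f}")
```

Different software packages use different quartile conventions, so values computed this way may not match the supplied feature sets exactly.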
In this report, the following content should be covered:
• A survey of machine learning applications in epileptic EEG data classification. The survey should focus only on the use of machine learning for feature selection and model building. Feature extraction does not need to be covered.
• Analyze the data sets given in this assignment. What are their features?
• Based on the data sets and your survey, discuss which machine learning methods you will use in this assignment and give your reasons. You need to select two methods from the following options:
1. Decision Tree
3. Neural networks
4. Support vector machines
• Use the machine learning methods you selected to process the training data sets and briefly analyze which data sets are useful for EEG data classification. Here, "data set" means a feature set, for example the x1 feature set. In the training set, the x1 feature set has 6000 values.
• Using the labels, discuss and apply your two machine learning methods to classify the EEG data based on the training data sets.
• The feature selection methods and model building results need to be presented clearly in your report. You need to add key equations, figures or tables to present your methods and results. Program code and supporting figures or Excel data should be placed in the appendix of the report.
• The performance of your classification methods should be evaluated on the testing data sets and their labels. The results should be presented in tables or figures.
• You should compare the results obtained by your two machine learning methods. Discuss the advantages and disadvantages of each of the two methods used in this assignment separately.
• The report should contain more than 1100 words but fewer than 8 pages (excluding the title page, table of contents, appendix and reference list).
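The evaluation and comparison steps in the list above can be organized around a confusion matrix. The following is a minimal Python sketch, assuming the epileptic label S is treated as the positive class. The function names `confusion_counts` and `summarize`, the choice of accuracy, sensitivity and specificity as measures, and the toy label lists are illustrative assumptions and not requirements of the marking rubric.

```python
def confusion_counts(true_labels, predicted_labels, positive="S"):
    """Return (TP, FN, FP, TN), treating `positive` as the positive class."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    tp = fn = fp = tn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == positive and guess == positive:
            tp += 1  # epileptic case correctly detected
        elif truth == positive:
            fn += 1  # epileptic case missed
        elif guess == positive:
            fp += 1  # nonepileptic case flagged as epileptic
        else:
            tn += 1  # nonepileptic case correctly rejected
    return tp, fn, fp, tn


def summarize(true_labels, predicted_labels, positive="S"):
    """Return accuracy, sensitivity and specificity as a dictionary."""
    tp, fn, fp, tn = confusion_counts(true_labels, predicted_labels, positive)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        # A ratio is undefined when its class is absent from the true labels.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical test labels and predictions from two methods (toy data only).
y_true = ["S", "S", "S", "N", "N", "N", "N", "N"]
pred_method_a = ["S", "S", "N", "N", "N", "N", "S", "N"]
pred_method_b = ["S", "N", "N", "N", "N", "N", "N", "N"]

result_a = summarize(y_true, pred_method_a)
result_b = summarize(y_true, pred_method_b)

for name, result in (("Method A", result_a), ("Method B", result_b)):
    row = ", ".join(f"{key}={value:.3f}" for key, value in result.items())
    print(f"{name}: {row}")
```

In this toy example both methods reach the same accuracy but differ in sensitivity and specificity, which is why reporting more than one measure can make the comparison between two methods more informative.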
You should save all the documents for your report in a single PDF file.
Please submit the report via “Assignment1 submission portal links” on the StudyDesk before the deadline.
You are allowed to submit only once. Make sure that the file you submit is the correct file and the correct version.
5. Marking Criteria
The marking criteria are presented in a separate file: Assignment1 marking rubric.
• The machine learning methods you use in this assignment should be among those covered in our course materials.
• You are encouraged to use Matlab for the programming in this project. You can also use other software. However, you may not be able to obtain advice or feedback on your code from teaching staff.
• The experimental results are not the most important part. The analysis of, and deep thinking about, the application of machine learning methods are more significant. To obtain full marks, you should present your analysis and your deep understanding clearly and logically.
6. Plagiarism and Academic Misconduct
USQ has zero tolerance for academic misconduct, including plagiarism and collusion. Plagiarism means presenting someone else's work as if you wrote it yourself. Collusion is a specific type of cheating that occurs when two or more students exceed a permitted level of collaboration on a piece of assessment. Identical layout, identical mistakes, identical arguments and identical presentation in students' assignments are evidence of plagiarism and collusion. Such academic misconduct may lead to serious consequences, such as:
• Required to undertake additional assessment in the course
• Failed in the piece of assessment
• Awarded a grade of Fail for the course
• Withdrawn from the course with academic penalty
• Excluded from the course or the program for a period of time
Refer to the USQ Academic Misconduct policy for further details.