The importance and difficulty of ground truth generation for EEG data analysis through machine learning

As you may know if you have been following this blog, ground truth plays a very important role in the implementation of machine learning algorithms for EEG data analysis. Machine learning comprises several adaptive procedures, and at the core of the supervised ones is the concept of learning by example. We show the algorithm examples of possible EEG streams coming from different subjects, user states, or task classes, and it finds out, by means of mathematical computation, where to place the borders between classes. The examples shown to the algorithm must therefore belong to different classes, and these classes have to be known in advance. This is what is called the ground truth (or the gold standard in other circles). The ground truth is also essential for performance evaluation through the procedures presented in past posts[1],[2].
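As an illustrative sketch only (not the actual algorithms used in the work discussed here), the learning-by-example idea can be reduced to a tiny nearest-centroid classifier. The feature values and class labels below are made up for illustration; the point is that the labels — the ground truth — must be known before any learning can happen:

```python
# Minimal sketch of supervised "learning by example": a nearest-centroid
# classifier. The feature vectors are hypothetical EEG band-power values;
# the labels (0 = control, 1 = patient) are the ground truth that has to
# be known in advance.

def train_centroids(samples, labels):
    """Compute one centroid (mean feature vector) per class."""
    centroids = {}
    for label in set(labels):
        members = [s for s, l in zip(samples, labels) if l == label]
        dim = len(members[0])
        centroids[label] = [sum(m[i] for m in members) / len(members)
                            for i in range(dim)]
    return centroids

def classify(centroids, sample):
    """Assign a new sample to the class with the nearest centroid."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist2(centroids[label], sample))

# Toy labelled examples: [alpha power, beta power] per subject.
X = [[10.0, 2.0], [11.0, 2.5], [3.0, 8.0], [2.5, 9.0]]
y = [0, 0, 1, 1]  # ground-truth class of each example

model = train_centroids(X, y)
print(classify(model, [10.5, 2.2]))  # prints 0: nearest to the class-0 examples
```

Everything the classifier "knows" about where the class borders lie is inferred from those labels, which is why errors in the ground truth propagate directly into the learned model.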

When you develop algorithms to be applied to diagnosis, a natural way to obtain the ground truth is to take the diagnostic result as such. But what happens if the patient has not been correctly diagnosed? You might think this is an unusual case, but it is not. In particular, the very fact that you are working on an EEG-based diagnostic system might mean that the current alternative diagnostic systems do not work so well (other possible reasons being, e.g., the cost, invasiveness, or non-acceptance of current diagnostic procedures).

An example of ground truth-associated problems happened to us in the MJFF (Michael J. Fox Foundation[3]) project[4]. To recap: in this project we are analyzing a set of EEG recordings, half from controls and half from patients with so-called REM Behaviour Disorder (RBD). RBD patients are at risk of developing a neurodegenerative disease such as Parkinson's (PD) after 13 years on average. We have therefore used machine learning in the project to predict which subjects with RBD will develop PD. We obtained extraordinary results with approximately 50 subjects. However, when we doubled the number of subjects, the classification performance degraded rapidly. After discussing with Jacques Montplaisir and his group at H. Sacré Coeur, Montreal[5], who are providing the data, we realized that this second group included several RBD patients with a short follow-up (1 year), who therefore had not yet had the chance to develop PD (even though their EEG suggests they will in the future). So we were probably using a group of subjects as members of one class when they should have been assigned to the other.
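The effect of such mislabelled subjects can be sketched with made-up numbers and a simple threshold classifier standing in for the actual algorithms used in the project. Here each subject is reduced to a single hypothetical scalar biomarker; the "short follow-up" subjects truly belong to the converting class but carry the non-converter label, which shifts the learned decision boundary and degrades accuracy on correctly labelled test subjects:

```python
# Sketch of how label noise corrupts training, assuming one scalar
# biomarker per subject and a midpoint-threshold classifier.

def train_threshold(values, labels):
    """Place the decision threshold halfway between the two class means."""
    mean = lambda xs: sum(xs) / len(xs)
    m0 = mean([v for v, l in zip(values, labels) if l == 0])
    m1 = mean([v for v, l in zip(values, labels) if l == 1])
    return (m0 + m1) / 2.0

def accuracy(threshold, values, true_labels):
    """Fraction of subjects classified correctly by the threshold."""
    preds = [1 if v > threshold else 0 for v in values]
    return sum(p == t for p, t in zip(preds, true_labels)) / len(values)

# Training set: class 0 (non-converters) around 1.0, class 1 (converters)
# around 3.0.  In y_noisy the last two converters are labelled 0, mimicking
# short-follow-up patients who had no chance to convert yet.
x = [0.8, 1.0, 1.2, 2.8, 3.0, 3.2, 2.6, 3.4]
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_noisy = [0, 0, 0, 1, 1, 1, 0, 0]

# Correctly labelled held-out subjects.
test_x = [0.9, 1.8, 2.2, 3.1]
test_y = [0, 0, 1, 1]

print(accuracy(train_threshold(x, y_true), test_x, test_y))   # 1.0
print(accuracy(train_threshold(x, y_noisy), test_x, test_y))  # 0.75
```

The mislabelled converters drag the class-0 mean upwards, pushing the threshold past a genuine converter in the test set — a miniature version of what we saw when the second batch of subjects was added.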

A second example came to my mind while attending a very interesting talk by Steven Laureys and Jacobo Sitt in Paris, organized within the ethics activities of the Human Brain Project. The talk dealt with so-called disorders of consciousness (DOC), a term that covers patients in a permanent vegetative state, in a minimally conscious state, and with locked-in syndrome. Interestingly enough, some of the consciousness-level measures used in the diagnosis of such disorders are based on EEG, but I will leave this topic for a future post. Coming back to the original topic, one problem with DOCs is their high misdiagnosis rate, around 40% of patients. With such numbers, applying machine learning, at least in its supervised version, is quite an adventure. Nevertheless, Sitt and his colleagues[6] have tried to classify different EEG biomarkers of consciousness obtained from DOC patients and healthy controls. The preliminary results are not promising for DOC diagnosis, but they do draw an interesting distinction within the patients in a vegetative state, making it possible to predict which subjects will recover quickly. You will find the results in their paper, to be published in Brain[7] in the short term.

At the core of the problem lies the fact that diagnosis in medicine is based on the subjective evaluation of symptoms. Take psychiatric diagnosis, for instance: the well-known DSM[8] is a collection of symptoms that a psychiatrist assesses in order to issue a diagnosis. This is why a trend in medicine, particularly in mental health, is to move from such subjective evaluation to a more objective one based on data such as EEG, MRI, and fMRI. Machine learning will definitely play a role in the implementation of this vision, but some fundamental problems, such as ground truth generation, have to be solved first.

[1] https://blog.neuroelectrics.com/blog/bid/286066/How-Good-is-my-Computational-Intelligence-Algorithm-for-EEG-Analysis

[2] https://blog.neuroelectrics.com/blog/bid/314411/Two-alternatives-for-performance-evaluation-in-EEG-analysis-through-Computational-Intelligence

[3] https://www.michaeljfox.org/

[4] https://blog.neuroelectrics.com/blog/bid/324690/EEG-analysis-through-Machine-Learning-can-help-predict-Parkinson-s

[5] http://www.ceams-carsm.ca/en/jacques.html

[6] http://www.unicog.org/pm/pmwiki.php/Main/People

[7] http://brain.oxfordjournals.org/

[8] http://en.wikipedia.org/wiki/Diagnostic_and_Statistical_Manual_of_Mental_Disorders
