Classifying ECG Data Using a Multi-layer Perceptron and m-Ways Cross Validation

by Cody Dunn

Spring 2016

CS/ECE 539 – Yu Hen Hu

Abstract:

Classifying ECG waveforms has a number of clinically significant applications if a network can be trained to distinguish between normal and abnormal profiles. This project uses deep learning techniques to train multi-layer perceptrons on a dataset of preprocessed waveform profiles in order to test their ability to classify novel profiles. It uses m-ways cross validation to produce aggregate performance measures. Experiments produced 95-99% classification accuracy across all trials, with the highest performance occurring in simple networks and in complex networks with high values of m. Further work is needed to increase the speed of training and classification, as well as to develop a method of processing incoming ECG data in a timely manner, if clinical application is desired.

Introduction:

The intent of this project is to use deep neural network techniques to classify electrocardiogram (ECG) data as either normal or abnormal. There are several reasons why such classification is useful in a clinical setting. Several studies, such as Santangeli et al., have shown that abnormal heartbeats, like premature ventricular contractions (PVCs), are strong predictors of heart failure and other serious ailments [1]. In addition, industry experience has taught me that clinicians perform a significant amount of manual labor in order to annotate such abnormal heartbeats during what is called a “code blue,” an episode of cardiac or pulmonary arrest. Physicians later review these annotations for various kinds of information, such as how patients responded to medication. Thus, the ability to automatically identify and annotate these heartbeats in real time would lead to more efficient workflows for these physicians and better detection of downward turns in patient health. Therefore, the goal is to train an artificial neural network on a set of waveform profiles and use that trained network to classify novel waveform data.

Work Performed:

The dataset I used in the project was already processed, so while no work had to be done to prepare the data, it will be useful to explain what that processing involved. I am using a dataset from the Massachusetts Institute of Technology and Beth Israel Hospital (MIT/BIH) [2]. The dataset includes over 70,000 waveform profiles, each with nine features and an associated tenth value, the label itself (1 for normal and -1 for abnormal). The first four features store temporal information, such as the distance between the peak of the current QRS complex and the next (see figure). The fifth and sixth features represent the normalized correlation between this waveform and the next, indicating their similarity. The final three features store the proportion of the waveform that lies outside predefined thresholds; normal QRS complexes are sharp and compact, so very little of a normal waveform falls outside those thresholds. The task for the neural network is to associate the first nine features with the label in such a way that it can classify a new waveform based on those nine features alone.
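As a minimal illustration of this layout, the following Matlab snippet shows how the profiles might be loaded and split into features and labels. The file name ecg_profiles.txt is a hypothetical placeholder; the actual dataset distribution may use a different format.

% Minimal sketch of the dataset layout described above. The file name
% 'ecg_profiles.txt' is a hypothetical placeholder for a numeric text file
% with one row per waveform: nine feature columns, then the 1/-1 label.
data     = load('ecg_profiles.txt');   % N-by-10 matrix
features = data(:, 1:9);               % temporal, correlation, and threshold features
labels   = data(:, 10);                % 1 = normal, -1 = abnormal

% Quick check of the class balance before training.
fprintf('Normal profiles:   %d\n', sum(labels ==  1));
fprintf('Abnormal profiles: %d\n', sum(labels == -1));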

My code performs m-ways cross validation on the dataset using a deep neural network with various settings. The Matlab code I used was “Deep Neural Network with Back Propagation” by Hesham Eraqi, an easily accessible open-source script [3]. The script trains the neural network on a given dataset for a defined number of epochs, or until the mean squared error on a testing portion reaches zero. Every two hundred epochs, the script resets the neuron weights and mean squared error in an attempt to escape local minima, but this was not found to improve results significantly (sometimes it actually made them worse), so few experiments went above 199 epochs. My portion of the algorithm divided the data into m partitions, set one aside, and trained the network on the remaining m-1. I then tested the trained network on the set-aside partition and averaged the mean squared error across all m trials to form an aggregate performance measure. This was repeated for varying numbers of partitions (m), learning rates (mu), epochs (mostly under 199), and hidden neuron configurations. The goal of the experimentation is thus to find the optimal configuration of these variables.
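The sketch below illustrates this cross validation loop; it is a minimal example rather than the exact code used. The trainMLP and classifyMLP calls are hypothetical placeholders for the training and classification routines of the back-propagation script, and the features and labels variables are assumed to hold the nine feature columns and the 1/-1 labels described above.

% Minimal sketch of the m-ways cross validation loop (not the exact code used).
% trainMLP and classifyMLP are hypothetical placeholders; features (N-by-9)
% and labels (N-by-1, values 1 or -1) are assumed to be in the workspace.
m    = 7;                                  % number of partitions
N    = size(features, 1);
fold = mod(randperm(N)', m) + 1;           % randomly assign each profile to a partition 1..m

acc = zeros(m, 1);
for k = 1:m
    testIdx  = (fold == k);                % hold out partition k
    trainIdx = ~testIdx;                   % train on the remaining m-1 partitions

    % hidden layers [10 10 10], 50 epochs, learning rate (mu) 0.5
    net  = trainMLP(features(trainIdx, :), labels(trainIdx), [10 10 10], 50, 0.5);
    pred = classifyMLP(net, features(testIdx, :));

    acc(k) = mean(sign(pred(:)) == labels(testIdx));   % fraction classified correctly
end

fprintf('Aggregate accuracy over %d partitions: %.2f%%\n', m, 100 * mean(acc));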

Any software implementation of this task requires a large amount of computation and is prone to slow performance. Many neural network solutions leverage the system’s graphics card as an extra processor in order to increase performance. However, many packages (such as Caffe) rely on NVIDIA graphics cards for this acceleration [4]. My implementation ran on a MacBook, which does not come equipped with such a card. Therefore, my project used only the laptop’s standard processor and could not leverage extra hardware. Because of this, extensive trials using many epochs ran for days at a time. Since the task itself is so computationally demanding and hardware could not be leveraged against it, the number of trials I was able to run in a reasonable amount of time was relatively low. Anyone experimenting with a similar task in the future would do well to make use of hardware acceleration or explore more efficient software implementations.

Results and Discussion:

Hidden Neuron Config / Epochs / m / Mu / Accuracy
[10] / 50 / 7 / 0.05 / 97.20%
[10] / 50 / 7 / 0.15 / 97.36%
[10] / 50 / 7 / 0.5 / 98.41%
[10, 10] / 50 / 7 / 0.05 / 97.05%
[10, 10] / 50 / 7 / 0.15 / 96.42%
[10, 10] / 50 / 7 / 0.5 / 97.53%
[10, 10, 10] / 50 / 7 / 0.05 / 96.65%
[10, 10, 10] / 50 / 7 / 0.15 / 96.84%
[10, 10, 10] / 50 / 7 / 0.5 / 95.99%
[10, 10, 10] / 500 / 7 / 0.05 / 96.35%
[10, 10, 10] / 50 / 15 / 0.5 / 99.00%
[10, 10, 10] / 50 / 15 / 0.15 / 98.88%
[10, 10, 10] / 50 / 15 / 0.01 / 98.85%

Due to time constraints, I was only able to run thirteen trials, though some interesting trends still emerged. For each trial, I would alter the number of epochs, the number of partitions, or the hidden neuron configuration and test across three recurring learning rates: 0.05, 0.15, and 0.5 (see table). In general, simpler networks performed better than more complicated ones, as did those with a higher number of partitions. The top performer overall was a network with three hidden layers of ten neurons apiece and an m value of fifteen (i.e. fifteen partitions); trained over 50 epochs with a learning rate of 0.5, it exhibited 99% classification accuracy. The only network trained above 50 epochs, which was trained for 500, exhibited some of the worst performance. In addition, networks trained with a higher mu value (0.5) generally performed better, suggesting that the low epoch counts imposed by my time limits required a high learning rate to reach good results quickly. The overall worst network was a three-layer network with ten neurons at each level, trained over fifty epochs with a learning rate of 0.5. This is notable because, with all other variables held constant, most networks with high learning rates outperformed their counterparts; the poor result here suggests that a complex network may over-fit the training data and generalize poorly.
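The trials above amount to a small grid search over these settings. A sketch of such a sweep appears below, assuming the cross validation loop shown earlier has been wrapped into a helper function, here given the hypothetical name crossValidate, that returns the aggregate accuracy for one configuration.

% Hypothetical sweep over the settings varied in the trials above.
% crossValidate is a placeholder helper wrapping the m-ways loop shown earlier;
% features and labels are assumed to be in the workspace.
configs = {[10], [10 10], [10 10 10]};   % hidden-layer layouts tried
epochs  = 50;                            % most trials stayed under 199 epochs
ms      = [7 15];                        % numbers of partitions
mus     = [0.05 0.15 0.5];               % recurring learning rates

for c = 1:numel(configs)
    for i = 1:numel(ms)
        for j = 1:numel(mus)
            a = crossValidate(features, labels, configs{c}, epochs, ms(i), mus(j));
            fprintf('%-12s epochs=%3d m=%2d mu=%.2f acc=%.2f%%\n', ...
                    mat2str(configs{c}), epochs, ms(i), mus(j), 100 * a);
        end
    end
end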

The main takeaway from the results is the strong performance of networks with higher m values and low complexity. Jadhav et al. achieved 92-100% accuracy when classifying heartbeats as normal or as specific classes of abnormal beats [5]. Since their task is more fine-grained, identifying specific kinds of abnormal beats, their problem space is more complex than mine. Therefore, the fact that my accuracy falls at the high end of the range reported for that more complicated problem is exactly what one would hope for. My results suggest that networks perform better when they have access to more data, since a network with m equal to 15 trains on 14/15 of the data, whereas a network with m equal to 7 trains on only 6/7. In addition, if the network classifies any one of the m partitions poorly, keeping that partition small and the number of other partitions large mitigates its negative effect on the average. While I never reached a setting of m in these trials above which classification performance decreased, it is possible that giving the network access to too much of the data would cause it to over-fit that subset and generalize poorly, in much the same way that overly complex networks appeared to over-fit.

Conclusion:

The classification of normal and abnormal heartbeats has a number of clinically useful and significant applications. Here, neural networks were shown to be an effective framework for classifying waveform profiles, especially simple networks and deeper networks with access to the great majority of the data. It should not, however, be underestimated how simplified this solution space is. This project classified preprocessed waveform profiles against known target labels. In a clinical situation, such processing would have to be done on the spot, since the network would only have access to raw ECG data (electrical data points). Thus, although these results are promising, translation to clinical application would entail several intermediate steps. Future research should seek to increase the speed of training and classification and to develop a way to process raw ECG data quickly into the profiles the network needs in order to classify them.

References:

[1] Santangeli, Pasquale, and Francis E. Marchlinski. "Ventricular Ectopy as a Modifiable Risk Factor for Heart Failure and Death." Journal of the American College of Cardiology 66, no. 2 (2015): 110-12.

[2] Both the MIT/BIH dataset and its supporting documentation can be accessed at the following URL: http://homepages.cae.wisc.edu/~ece539/data/ecg/index.html

[3] Eraqi’s code and supporting documentation can be accessed at the following URL: http://www.mathworks.com/matlabcentral/fileexchange/54076-mlp-neural-network-with-backpropagation

[4] Caffe’s homepage and documentation can be accessed at the following URL: http://caffe.berkeleyvision.org/

[5] Jadhav, Shivajirao M., Sanjay L. Nalbalwar, and Ashok A. Ghatol. "Arrhythmia Disease Classification Using Artificial Neural Network Model." 2010 IEEE International Conference on Computational Intelligence and Computing Research, 2010.
