In this paper, we present a novel and efficient scheme for detecting the P300 component of the event-related potential in the Brain Computer Interface (BCI) speller paradigm that needs significantly fewer EEG channels and uses a minimal subset of effective features. Removing unnecessary channels and reducing the feature dimension lower the cost and shorten the processing time, and thus improve the BCI implementation. The idea is to optimize the number of channels and the feature vectors while keeping the classification accuracy high. Optimal channel selection was based on both a discriminative criterion and forward-backward investigation. In addition, we obtained a minimal subset of effective features by choosing the discriminant coefficients of the wavelet decomposition. Our algorithm was tested on dataset II of the BCI Competition 2005. We achieved 92% accuracy using a simple LDA classifier, compared with the second-best result in BCI 2005 (90.5% accuracy using a computationally heavier SVM) and the best result (96.5% accuracy using an SVM and many more channels, requiring excessive calculation). We also applied the proposed scheme to Hoffmann's dataset to evaluate the effectiveness of channel reduction and achieved acceptable results.
The electroencephalogram (EEG) is a recording of brain activity and is widely used as a diagnostic tool for neurological disorders. Many BCIs translate EEG signals into user commands that can control external systems. Some BCI systems, such as the P300 oddball paradigm, are based on the analysis of EEG event-related potentials (ERPs) [
A BCI system is judged by several factors, such as its cost, accuracy, and training speed. The main goal of this paper is to propose an algorithm that improves these factors. Utilizing proper channels and efficient features are two keys to enhancing BCI systems. Effective features are obtained by eliminating poor features from the extracted ones. In this study, we used wavelet decomposition for feature extraction, an efficient tool for multi-resolution analysis of non-stationary signals such as the EEG. We applied Mahalanobis's criterion to choose the most discriminative wavelet coefficients. Optimal channels were selected by removing unnecessary channels based on Mahalanobis's criterion and then applying a forward-backward selection (FBS) algorithm. For classification, linear discriminant analysis (LDA) was used because of its fast training and simple implementation, while still yielding high accuracy.
In the following sections of the paper, the P300 speller data and preprocessing phases are described in Section 2. In Section 3, we present feature extraction based on wavelet transform, optimal channel selection, optimal sub-bands selection and classification algorithm. Experimental results and conclusions are given in Sections 4 and 5, respectively.
The P300 speller paradigm described by Farwell and Donchin [
row and one particular column). Thus, a P300 is elicited when the flashed row/column contains the attended symbol.
We applied the proposed method to data set II from the third edition of the BCI Competition, which was recorded from two different subjects, A and B [
The training and the testing sets were made of 85 and 100 characters, respectively. As such, the number of corresponding epochs for each subject was 85 × 12 × 15 = 15,300 and 100 × 12 × 15 = 18,000, respectively.
First, some preprocessing must be done on the signal to improve the signal-to-noise ratio and make it appropriate for use in BCI systems. To this end, all data were normalized as below:
x̃ = (x − μ) / σ (1)

where x denotes the original signal, μ its mean, σ its standard deviation, and x̃ the normalized signal.
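Assuming the normalization in Equation (1) is the standard z-score (a reconstruction, since the formula is not fully legible in the source), a minimal sketch is:

```python
import numpy as np

def normalize_epoch(x):
    """Z-score normalization of one EEG epoch: zero mean, unit variance.

    Assumption: the paper's normalization is the usual z-score form;
    this sketch is illustrative, not the authors' exact code."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

epoch = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
z = normalize_epoch(epoch)   # normalized epoch, mean 0 and std 1
```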
The Discrete Wavelet Transform (DWT) has been extensively used in ERP analysis due to its ability to effectively explore both the time-domain and the frequency-domain features of ERPs [
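To illustrate the multi-resolution idea, the sketch below implements one level of a Haar DWT. This is only a self-contained stand-in: the paper itself uses the db4 wavelet to four decomposition levels, which would normally be computed with a wavelet library.

```python
import numpy as np

def haar_dwt(x):
    """One level of a discrete wavelet transform with the Haar wavelet.

    Each level splits the signal into approximation (low-pass) and
    detail (high-pass) coefficients at half the sampling rate.  The
    paper uses db4 instead of Haar; Haar is chosen here only because
    its filter pair is short enough to write inline."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # detail coefficients
    return a, d

sig = np.array([1.0, 1.0, 2.0, 2.0, 4.0, 0.0, 3.0, 3.0])
a1, d1 = haar_dwt(sig)   # level-1 coefficients
a2, d2 = haar_dwt(a1)    # level-2, computed from the approximation
```

Repeating the split on the approximation, as in the last line, is exactly how a multi-level decomposition such as A4/D4/D3/D2/D1 is built.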
The spelling accuracy of the P300 speller depends on utilizing effective channels. Less informative channels lead to poor features, so removing ineffective channels decreases the computation time and implementation cost while improving the output performance. To this end, we used a hybrid method that extracts the optimal channels in two stages. In the first stage, we identified the channels that best discriminate target from non-target signals based on Mahalanobis's distance (MD) [
MD = sqrt( (μ1 − μ2)^T Σ^{-1} (μ1 − μ2) ) (2)

where μ1 is the mean for the target class, μ2 is the mean for the non-target class, and Σ is the pooled covariance matrix of the two classes.
The channel selection procedure starts by computing the MD of each of the 64 channels. First, we kept the 44 channels with the largest MD, about two-thirds of all channels.
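A sketch of this per-channel ranking step is shown below. The scalar separability score is a simplified stand-in for the MD of Equation (2) (it compares per-epoch mean amplitudes under a pooled variance); the synthetic data and the `channel_md` helper are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def channel_md(target, nontarget):
    """Simplified scalar Mahalanobis-style score for one channel.

    `target` and `nontarget` are (n_epochs, n_samples) arrays.
    Assumption: each epoch is summarized by its mean amplitude, and
    the squared distance between class means is scaled by the pooled
    variance, a 1-D analogue of Equation (2)."""
    t = target.mean(axis=1)      # mean amplitude per target epoch
    n = nontarget.mean(axis=1)   # mean amplitude per non-target epoch
    pooled = 0.5 * (t.var() + n.var())
    return (t.mean() - n.mean()) ** 2 / pooled

rng = np.random.default_rng(0)
# two synthetic channels: channel 0 separates the classes, channel 1 does not
tgt = [rng.normal(1.0, 1.0, (50, 10)), rng.normal(0.0, 1.0, (50, 10))]
ntg = [rng.normal(0.0, 1.0, (50, 10)), rng.normal(0.0, 1.0, (50, 10))]
scores = [channel_md(tgt[c], ntg[c]) for c in range(2)]
ranked = np.argsort(scores)[::-1]   # channel indices by decreasing MD
```

Keeping the top two-thirds of `ranked` would mirror the pre-selection of 44 out of 64 channels.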
In the next stage, we used the FBS algorithm to find the optimal channels. First, the single channel with the highest accuracy on the validation set was selected as the initial channel set. In each stage of the FBS algorithm, three channels were added and two were eliminated, so each stage yielded a net gain of one channel. Classification accuracy was assessed on the validation set described in the appendix. The steps of the FBS algorithm are described below:
1) Forward procedure:
・ Add each remaining channel separately to the current channel set (of k channels).
・ Find the channel whose addition yields the maximum validation-set accuracy and keep it, so the set grows to k + 1 channels.
・ Repeat the above two steps twice more.
・ At this point, the set contains k + 3 channels.
2) Backward procedure:
・ Compute the validation-set accuracy obtained by removing each channel of the set selected in the forward procedure (k + 3 channels).
・ Find the channel whose removal yields the maximum accuracy and eliminate it.
・ Repeat the above two steps once more, so k + 1 channels remain.
Thus, each stage of the FBS algorithm added one channel to the channel set. This process continued until the optimal channel set was obtained.
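One stage of this forward-backward procedure can be sketched as follows. The `accuracy` callable stands in for the LDA validation-set accuracy; the toy scoring function at the bottom is purely illustrative.

```python
def fbs_step(selected, pool, accuracy):
    """One stage of forward-backward selection: three additions, then
    two removals, for a net gain of one channel.

    `accuracy(channels)` scores a channel subset on the validation set
    (in the paper, LDA accuracy; here any callable)."""
    selected = list(selected)
    for _ in range(3):                      # forward: three additions
        best = max((c for c in pool if c not in selected),
                   key=lambda c: accuracy(selected + [c]))
        selected.append(best)
    for _ in range(2):                      # backward: two removals
        worst = max(selected,
                    key=lambda c: accuracy([s for s in selected if s != c]))
        selected.remove(worst)
    return selected

# toy score so the example is self-contained: low channel indices are
# "better", so a subset scores by the negated sum of its indices
acc = lambda chans: -sum(chans)
result = fbs_step([0], range(8), acc)   # one net channel gained
```

Iterating `fbs_step` until validation accuracy stops improving reproduces the stopping rule described above.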
The selected channel set for each subject is shown in
The P300 component does not appear in the same sub-bands for different channels. Moreover, not all sub-bands of the wavelet analysis discriminate well between the two classes. An algorithm for choosing the optimal sub-bands is therefore necessary: it reduces the redundancy in the feature dimension and increases the discrimination between the two classes. To meet these requirements, we used Mahalanobis's criterion as defined in Equation (2).
After computing the MD of the sub-bands, a threshold must be applied to select the optimal sub-bands. A suitable threshold is chosen in four steps:
・ Compute the MD of the sub-bands for the selected channels, and divide the range between min(MD) and max(MD) into five levels, which serve as candidate thresholds.
・ Eliminate poor sub-bands whose MD is smaller than the threshold.
・ Evaluate the output accuracy on the validation data set.
・ Choose the threshold corresponding to the best validation performance.
Only the sub-bands whose MD exceeds the chosen threshold are used to construct the effective features. The appropriate threshold levels were 78.36 and 45.9 for subjects A and B, respectively.
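The four-step threshold search can be sketched as a small grid search. Here `validate(mask)` stands in for the validation accuracy obtained with a sub-band subset, and the toy `md` values and target mask are illustrative assumptions.

```python
import numpy as np

def pick_threshold(md, validate, n_levels=5):
    """Grid-search a sub-band MD threshold over five candidate levels.

    `md` holds one Mahalanobis score per sub-band; `validate(mask)`
    returns validation accuracy using only sub-bands where `mask` is
    True.  Candidate thresholds span [min(md), max(md)]."""
    levels = np.linspace(md.min(), md.max(), n_levels)
    best = max(levels, key=lambda t: validate(md >= t))
    return best, md >= best

# toy example: sub-bands 2 and 4 carry the discriminative information;
# validate() rewards keeping exactly the high-MD sub-bands
md = np.array([0.1, 0.2, 3.0, 0.3, 2.5])
target_mask = md > 1.0
validate = lambda mask: float(np.all(mask == target_mask))
thr, mask = pick_threshold(md, validate)   # chosen threshold and kept bands
```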
We used the LDA classifier based on the linear transform y = W^T x to classify the feature vectors of the two classes. Here, W is the discriminant vector, x is the feature vector and y is the output of the LDA classifier. Fisher's LDA, defined in Equation (3), obtains the transformation W by maximizing the ratio of between-class scatter to within-class scatter [

F(W) = (W^T Sb W) / (W^T Sw W) (3)

where Sb and Sw are the between-class and within-class scatter matrices, respectively. By computing the derivative of F and setting it to zero, one can show that the optimal W is determined by [

W = Sw^{-1} (μ1 − μ2) (4)
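A minimal sketch of Fisher's LDA in its closed form, W proportional to Sw^{-1}(μ1 − μ2), follows. The synthetic two-class data and the midpoint decision threshold are illustrative assumptions, not the paper's trained classifier.

```python
import numpy as np

def fit_lda(X1, X2):
    """Fisher LDA direction via the closed form W = Sw^{-1}(mu1 - mu2).

    X1, X2 are (n_samples, n_features) arrays for the two classes.
    Returns the projection vector and a midpoint threshold (assumed
    decision rule for this sketch)."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # within-class scatter: summed centred outer products of both classes
    Sw = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)
    W = np.linalg.solve(Sw, mu1 - mu2)
    return W, 0.5 * W @ (mu1 + mu2)

rng = np.random.default_rng(1)
X1 = rng.normal(0.0, 1.0, (200, 3)) + np.array([2.0, 0.0, 0.0])  # "target"
X2 = rng.normal(0.0, 1.0, (200, 3))                              # "non-target"
W, b = fit_lda(X1, X2)
pred1 = X1 @ W > b   # True -> classified as class 1
pred2 = X2 @ W > b
```

Training amounts to one linear solve, which is why LDA is cheap compared with iterative SVM training.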
For evaluation of the proposed method in selecting optimal channel set, our channel set was compared with three other channel sets. Their list is illustrated in
To show that our proposed scheme extracts effective features, we compared the classification accuracies for all sub-bands and the optimal sub-bands as feature vectors in
of the optimal sub-bands is higher than that of using all sub-bands for every number of trials except the single trial. Moreover, the feature vector dimension was reduced by about 40%.
To investigate the robustness of the proposed method, we employed Hoffmann’s dataset [
Channel Set Number | Reference | Channels |
---|---|---
1 | M. Kaper et al. [ | {Fz, Cz, Pz, Oz, C3, C4, P3, P4, PO7, PO8} |
2 | E. W. Sellers et al. [ | {Fz, Cz, Pz, Oz, FP1, FP2, F3, F4, C3, C4, P3, P4, P7, P8, T7, T8} |
3 | H. Zhang et al. [ | {F3, FC3, C3, CP3, P3, Fz, FCz, Cz, CPz, Pz, F4, FC4, C4, CP4, P4} |
4.a | This study for Subject A | {F1, F6, FC3, FCZ, C3, CZ, CP2, CPZ, CP3, P2, PZ, PO7, POZ, PO8, O1, OZ, O2} |
4.b | This study for Subject B | {FZ, FC6, C3, CZ, CPZ, P2, PO3, PO4, POZ, PO8, O1, OZ, IZ} |
Feature Vector | Feature Reduction (%) | Accuracy (%), 1 Trial | 5 Trials | 10 Trials | 15 Trials
---|---|---|---|---|---
All sub-bands | - | 26.5 | 67 | 85 | 91.5
Optimal sub-bands | about 40 | 25 | 68 | 86 | 92
Method | Channels (Subject A) | Channels (Subject B) | Classifier | Accuracy (%), 5 Trials | 15 Trials
---|---|---|---|---|---
Our scheme | 17 | 13 | LDA | 68 | 92
First ranked [ | 64 | 64 | SVM | 73.5 | 96.5
Second ranked [ | 11 | 10 | SVM | 55 | 90.5
Our previous work [ | 26 | 19 | BLDA | 69.5 | 93
with neurological deficits. The recorded EEG data were based on visual stimuli (images of a TV, telephone, lamp, door, window, and radio) that evoked the P300 component. Each subject completed four sessions of six runs; in each run, subjects were asked to focus on a specific image while the sequence of stimuli was presented randomly. The number of blocks in each run was randomly chosen between 20 and 25, and during every block each image flashed once. The data contained 32 channels of EEG recorded at a sampling rate of 2048 Hz. For all eight subjects, we used the first three sessions as training data and the last session as test data.
First, the EEG signals were preprocessed according to Section 2. For each session, the single-trial features corresponding to the first 20 blocks of flashes were extracted via DWT decomposition. For each subject, we reduced the number of channels from 32 to 20 by keeping the 20 channels with the largest MD values. We then ran the FBS algorithm (Section 3.2) to choose the most effective channels from the pre-selected 20.
We compared the computation time in our approach with our previous works [
Subjects | S1 | S2 | S3 | S4 | S6 | S7 | S8 | S9 |
---|---|---|---|---|---|---|---|---
Number of Channels | 11 | 8 | 9 | 7 | 9 | 6 | 11 | 7 |
computational requirements and training time. It is worth mentioning that we used fewer channels than the first-ranked competitor. Additionally, we used the LDA classifier, which requires less computation than the SVM. We observed that the training phase in this paper was nearly 2 and 1.5 times faster than the previous works in [
Three main features of a suitable BCI system are low cost, real-time response and high accuracy. To achieve these objectives, we proposed a scheme for selecting minimal channels and effective features. Proper channels were obtained by utilizing Mahalanobis's criterion and the FBS algorithm. To extract effective features, we used discrete wavelet decomposition with the db4 mother wavelet and reduced the number of coefficients using Mahalanobis's criterion. The set of minimal features and effective channels resulted in less computation and reasonable accuracy. We achieved 92% accuracy using a simple classification algorithm based on LDA, as compared with our previous works [
Perseh, B., Kiamini, M. and Jabbari, S. (2017) Feature Conditioning Based on DWT Sub-Bands Selection on Proposed Channels in BCI Speller. J. Biomedical Science and Engineering, 10, 120-133. https://doi.org/10.4236/jbise.2017.103010
We applied a validation process based on five-fold cross-validation to obtain the proper channels. The procedure is as follows:
・ The training data comprised 85 × 12 × 15 × Channel Count signals (85 characters, 12 stimuli, 15 repetitions). Every three consecutive repetitions were averaged, reducing the training data to 85 × 12 × 5 × Channel Count signals.
・ We divided the 85 characters into five partitions; each partition of N = 17 characters in turn formed a validation set of N × 12 × 5 × Channel Count signals, and the remaining data formed the training set.
・ Feature vectors were created from the wavelet coefficients (the level-4 approximation coefficients and the level 1-4 detail coefficients: A4, D4, D3, D2, D1).
・ The LDA classifier was trained and its precision was evaluated on the validation set. The precision is defined in terms of TP, FP and FN, the numbers of true positives, false positives and false negatives, respectively.
・ The validation performance was the average of the five precision values.
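The data-preparation part of this procedure (averaging repetitions in groups of three, then splitting the 85 characters into five folds) can be sketched as follows; the array shapes are illustrative, not the full feature pipeline.

```python
import numpy as np

def average_repetitions(epochs, group=3):
    """Average consecutive stimulus repetitions, as in the appendix:
    15 repetitions averaged in groups of 3 leave 5 signals per stimulus.
    `epochs` has shape (n_reps, n_samples)."""
    n_reps, n_samples = epochs.shape
    return epochs.reshape(n_reps // group, group, n_samples).mean(axis=1)

def five_folds(n_chars=85, k=5):
    """Split character indices into k validation folds
    (85 characters -> five folds of 17 characters each)."""
    return np.array_split(np.arange(n_chars), k)

# toy epochs: 15 repetitions of a 4-sample signal, repetition i holds value i
reps = np.tile(np.arange(15.0)[:, None], (1, 4))
avg = average_repetitions(reps)   # shape (5, 4)
folds = five_folds()
```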