Malaria is a leading cause of death globally. Rapid and accurate diagnosis of the disease is key to its effective treatment and management. Identification of the life stages and species of Plasmodium parasites forms part of the diagnosis. In this study, a technique for identifying the parasites' life stages and species using microscopic images of Giemsa-stained thin blood smears was developed. The technique entailed designing and training Artificial Neural Network (ANN) classifiers to classify infected erythrocytes into their respective stages and species. The outputs of the system were compared to the results of expert microscopists. A total of 205 infected erythrocyte images were used to train and test the system. The system recorded an accuracy of 99.9% in recognizing stages and 96.2% in recognizing Plasmodium species.
Malaria is a global public health threat. It is estimated that about 1 million lives are lost annually to the disease, the majority of them children below five years of age [
A key element of successful malaria treatment is speedy and accurate diagnosis of the disease. Malaria diagnosis entails detection of Plasmodium parasites, determination of the parasites' life stages and species, and quantification of the degree of infection (parasitemia) [
The goal of this work is to develop a technique for automating the determination of Plasmodium parasite stages and species. These two diagnostic tasks are necessary for administering the correct treatment to a malaria patient [
To address the challenges discussed above, this paper describes a novel technique for determining the species and life stages of Plasmodium parasites in images of infected erythrocytes. A strategy which mimics a human microscope operator is developed using a back-propagation artificial neural network (ANN). ANNs have many advantages [
In situations where the statistical properties of the pattern classes are not known, a decision-theoretic classification problem is best handled by methods that yield the required decision functions directly via training. A neural network is one such approach. It comprises interconnected nonlinear computing elements organized as networks reminiscent of the way neurons are believed to be interconnected in the brain. The basic building block of a neural network is a computing element in which weighted inputs are summed, followed by a nonlinear activation element which receives the sum of the weighted inputs and gives an output value. This basic architecture is referred to as the perceptron. A perceptron network consists of a single layer of neurons. Perceptrons can learn linear decision functions that separate two linearly separable training sets.
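The behaviour of a single perceptron can be illustrated with a minimal sketch. The code below is not taken from this work; it assumes class labels coded as -1 and +1, a hard-limit activation, and the classical perceptron update rule, and it uses a toy linearly separable data set (logical AND). The function names are hypothetical.

```python
import numpy as np

def perceptron_train(X, y, lr=1.0, max_epochs=100):
    """Train a single perceptron on labels y in {-1, +1} (assumed coding)."""
    # Append a constant 1 to every pattern so the bias is learned as a weight.
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xa.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for xi, target in zip(Xa, y):
            # Weighted sum of the inputs followed by a hard-limit activation.
            predicted = 1 if xi @ w > 0 else -1
            if predicted != target:
                w += lr * target * xi  # shift the decision boundary
                errors += 1
        if errors == 0:  # all training patterns are classified correctly
            break
    return w

def perceptron_predict(X, w):
    """Apply the learned linear decision function to the rows of X."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.where(Xa @ w > 0, 1, -1)

# Toy, linearly separable training set (logical AND), for illustration only.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([-1, -1, -1, 1])
w = perceptron_train(X, y)
predictions = perceptron_predict(X, w)
```

Because the two classes are linearly separable, the update rule converges to a weight vector that classifies all four patterns correctly; for classes that are not linearly separable, a multilayer network is needed.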
The response of the device is based on a weighted sum of its inputs; that is
This is the linear decision function with respect to the components of the pattern vectors. When
A learning rule is a procedure for modifying the weights and biases of a network. Learning rules fall into three categories: supervised learning, unsupervised learning, and reinforcement learning.
In supervised learning, the network is provided with a training set, which is a set of examples of proper network behaviour:
In unsupervised learning, the weights and biases are modified in response to the network inputs only. There are no target outputs available. The algorithm performs some kind of clustering operation, categorizing the inputs into a finite number of classes.
In reinforcement learning, the learning algorithm is provided with the network inputs and a grade (also called score). The grade is a measure of the network performance over some sequence of output.
The neuron model, consisting of weighted inputs, a summer, and an activation function f
Expert microscopists use the colour, shape, relative size and texture of both infected erythrocytes and Plasmodium parasites to distinguish between the different life stages and species of Plasmodium parasites. Therefore, an automated system for performing such classification using images of thin blood smears should be able to infer information about the colour and morphology of infected erythrocytes and Plasmodium parasites from the images. It is therefore logical that the system should comprise classifiers trained with colour and morphological features of infected erythrocyte images in order to categorize the various classes of parasite life stages and species. This is the model used to develop such a system in this work. A block diagram of the model is depicted in the
Images of thin blood smears were obtained from two sources, namely the Centers for Disease Control and Prevention (CDC) [
Automating the classification of Plasmodium parasites into their respective stages and species is a difficult task. This is due to the high correlation between parasite features across different stages and species. Given this difficulty, neural network classifiers were considered the best tools for the job, because they can learn to distinguish different stages and species from examples of image features of these parasites used as a training set.
Two sets of features were used to train neural network classifiers to classify Plasmodium life stages. The first was the direct pixel values of the RGB images obtained from the two sources. The second comprised morphological, colour and texture information about the parasites and infected erythrocytes. The features were divided into four classes corresponding to the four main life stages of Plasmodium parasites, namely the early trophozoite (ring), mature trophozoite, schizont and gametocyte stages. These features were then used to train two ANN classifiers to recognize Plasmodium life stages. Algorithmic steps for the two procedures followed are given in
Block diagram showing a black box model of Plasmodium parasites stages and species classification
Algorithmic steps for training ANN to identify Plasmodium life stages using colour features
1. Load RGB images with different life stages of Plasmodium parasites.
2. Convert the image into double class.
3. Extract the red, green, and blue intensity pixel values from images of infected erythrocytes. These features should be obtained from infected regions of erythrocytes.
4. Form feature vectors comprising three elements using the three colour components extracted in step 3.
5. Categorize these feature vectors into four classes: early trophozoite, mature trophozoite, schizont, and gametocyte stages.
6. Form the corresponding target (desired output) vector for the feature classes of step 5. The four target vectors for the four classes were [1 0 0 0]T, [0 1 0 0]T, [0 0 1 0]T, [0 0 0 1]T, where T denotes matrix transpose.
7. Train a multilayer neural network with varying numbers of hidden neurons and record learning accuracies.
8. Choose the ANN with the highest degree of classification accuracy and generalization.
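Steps 2 to 6 above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the implementation used in this work: the synthetic image, the boolean mask marking the infected region, and the function names are all assumptions, and conversion to double class is assumed to mean scaling 8-bit values to the range [0, 1].

```python
import numpy as np

STAGES = ["early trophozoite", "mature trophozoite", "schizont", "gametocyte"]

def rgb_features(image_uint8, mask):
    """Return a 3 x N matrix of R, G, B values taken from the masked pixels."""
    # Step 2: convert to double precision, scaled to [0, 1] (assumed convention).
    image = image_uint8.astype(np.float64) / 255.0
    # Steps 3-4: each masked pixel yields one 3-element feature vector (a column).
    return image[mask].T

def one_hot_targets(class_index, n_samples, n_classes=4):
    """Return an n_classes x N target matrix, e.g. columns of [0 0 1 0]^T."""
    targets = np.zeros((n_classes, n_samples))
    targets[class_index, :] = 1.0
    return targets

# Synthetic 4 x 4 RGB image and a hypothetical mask of the infected region.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True

features = rgb_features(image, mask)                      # shape 3 x 4
targets = one_hot_targets(STAGES.index("schizont"), features.shape[1])
```

Storing one feature vector per column matches the "features by samples" layout of the input and target matrices described in this paper.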
Algorithmic steps for training ANN to identify Plasmodium life stages using colour, morphological and texture features
1. Load images of infected erythrocytes to a computer.
2. Extract the RGB features from infected regions of erythrocytes.
3. Obtain binary images of infected erythrocytes and Plasmodium parasites using suitable segmentation techniques.
4. Determine the following features from the segmented objects:
   i. Ratio of the parasite area to the area of the infected erythrocyte
   ii. The seven moment invariants of both the colour and binary images
5. Use the intensity and saturation components of the infected erythrocyte to determine the following features:
   i. R-measure
   ii. 3rd moment
   iii. Uniformity
   iv. Entropy
6. Form a feature vector from the features extracted above.
7. Use the feature vector obtained in step 6 above to train a multilayer neural network to categorize images of infected erythrocytes into their respective life stages.
8. Determine the classification accuracy of the multilayer artificial neural network.
9. Choose the network that gave the highest degree of classification accuracy and generalization.
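The four texture measures of step 5 are commonly computed from the normalized histogram of a single image channel. The sketch below assumes the standard histogram-based definitions (R-measure as 1 - 1/(1 + variance), with intensity levels scaled to [0, 1]) and an 8-bit quantized channel; the exact normalization used in this work is not stated, so the scaling and the function name should be read as assumptions.

```python
import numpy as np

def texture_measures(channel_uint8, levels=256):
    """Histogram-based texture descriptors of one 8-bit image channel."""
    hist = np.bincount(channel_uint8.ravel(), minlength=levels).astype(np.float64)
    p = hist / hist.sum()                      # normalized histogram p(z_i)
    z = np.arange(levels) / (levels - 1)       # levels scaled to [0, 1] (assumption)
    mean = np.sum(z * p)
    variance = np.sum((z - mean) ** 2 * p)
    nonzero = p[p > 0]                         # avoid log2(0) in the entropy
    return {
        "r_measure": 1.0 - 1.0 / (1.0 + variance),       # 0 for constant regions
        "third_moment": np.sum((z - mean) ** 3 * p),     # histogram skewness
        "uniformity": np.sum(p ** 2),                    # 1 for constant regions
        "entropy": -np.sum(nonzero * np.log2(nonzero)),  # in bits
    }

# Two synthetic channels: a constant patch and a half-black, half-white patch.
flat = np.full((8, 8), 128, dtype=np.uint8)
two_level = np.zeros((8, 8), dtype=np.uint8)
two_level[:, 4:] = 255

flat_texture = texture_measures(flat)
two_level_texture = texture_measures(two_level)
```

A perfectly smooth patch gives an R-measure and entropy of zero and a uniformity of one, while the two-level patch gives an entropy of one bit, which is a quick sanity check for any implementation of these measures.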
An artificial neural network was designed and trained using the steps described in
The next technique explored for differentiating Plasmodium parasite stages was the use of the parasites' morphological, colour and texture information. Here, another set of features comprising morphological (shape and size), colour and texture features was extracted. The features are similar to those human microscopists use to distinguish between stages and species of Plasmodium parasites. Once an erythrocyte had been identified as infected, the parasite size, shape, texture, number of nucleated objects per infected erythrocyte, and their separation distances were evaluated. In addition, colour information was represented in the form of the average hue, saturation, intensity, red, green, and blue components of the infected erythrocytes. These parameters were then used as the feature vector for training a multilayer neural network to classify Plasmodium parasites into their respective life stages. The training set was made up of a 15 by 800 input feature matrix and a 4 by 800 target matrix. This training set was also divided into four groups corresponding to the four Plasmodium life stages, with a fifth of the feature vectors used for validation. The algorithmic steps used for this classification task are given in
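The average colour components mentioned above can be sketched as follows. The RGB-to-HSI conversion shown is the common textbook formulation and is an assumption, since the conversion used in this work is not specified; the function names are hypothetical, and hue is averaged arithmetically, which ignores its circular nature.

```python
import numpy as np

def rgb_to_hsi(rgb):
    """Convert an (..., 3) float RGB array in [0, 1] to hue (degrees), saturation, intensity."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    eps = 1e-12
    intensity = (r + g + b) / 3.0
    min_rgb = np.minimum(np.minimum(r, g), b)
    # Saturation is set to 0 for black pixels, where it is otherwise undefined.
    saturation = np.where(intensity > eps,
                          1.0 - min_rgb / np.maximum(intensity, eps), 0.0)
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + eps
    theta = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
    # Hue is not meaningful for grey pixels (r == g == b).
    hue = np.where(b > g, 360.0 - theta, theta)
    return hue, saturation, intensity

def mean_colour_features(rgb_pixels):
    """Return [mean H, mean S, mean I, mean R, mean G, mean B] for N x 3 pixels."""
    h, s, i = rgb_to_hsi(rgb_pixels)
    return np.array([h.mean(), s.mean(), i.mean(),
                     rgb_pixels[:, 0].mean(),
                     rgb_pixels[:, 1].mean(),
                     rgb_pixels[:, 2].mean()])

# Pure red, green and blue pixels as a quick check of the conversion.
primaries = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
hue, saturation, intensity = rgb_to_hsi(primaries)
colour_features = mean_colour_features(primaries)
```

For the three primaries the hue comes out at approximately 0, 120 and 240 degrees, with full saturation and an intensity of one third.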
The neural network classifier described above was trained with different numbers of hidden neurons, and the performance of each configuration was recorded.
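The procedure of training with different numbers of hidden neurons and keeping the best configuration can be sketched with a small back-propagation network written in NumPy. Everything specific here is an assumption made for illustration: the tanh hidden layer, softmax output, learning rate, epoch count, candidate hidden-layer sizes, and the synthetic four-class data standing in for real image features.

```python
import numpy as np

def forward(X, params):
    """Forward pass: tanh hidden layer, softmax output (columns are samples)."""
    W1, b1, W2, b2 = params
    H = np.tanh(W1 @ X + b1)
    Z = W2 @ H + b2
    Z = Z - Z.max(axis=0, keepdims=True)          # stabilize the softmax
    Y = np.exp(Z)
    return H, Y / Y.sum(axis=0, keepdims=True)

def train_mlp(X, T, n_hidden, epochs=500, lr=0.3, seed=0):
    """Batch gradient descent with back-propagated cross-entropy gradients."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (n_hidden, X.shape[0]))
    b1 = np.zeros((n_hidden, 1))
    W2 = rng.normal(0.0, 0.5, (T.shape[0], n_hidden))
    b2 = np.zeros((T.shape[0], 1))
    n = X.shape[1]
    for _ in range(epochs):
        H, Y = forward(X, (W1, b1, W2, b2))
        dZ = (Y - T) / n                          # output-layer error
        dH = (W2.T @ dZ) * (1.0 - H ** 2)         # error propagated to hidden layer
        W2 -= lr * (dZ @ H.T)
        b2 -= lr * dZ.sum(axis=1, keepdims=True)
        W1 -= lr * (dH @ X.T)
        b1 -= lr * dH.sum(axis=1, keepdims=True)
    return W1, b1, W2, b2

def accuracy(X, T, params):
    """Fraction of samples whose predicted class matches the one-hot target."""
    _, Y = forward(X, params)
    return float(np.mean(Y.argmax(axis=0) == T.argmax(axis=0)))

# Synthetic stand-in data: four well-separated classes, 3 features, 50 samples each.
rng = np.random.default_rng(42)
centres = np.array([[0, 0, 0], [3, 3, 0], [0, 3, 3], [3, 0, 3]], dtype=float)
labels = np.repeat(np.arange(4), 50)
X = (centres[labels] + rng.normal(0.0, 0.3, (200, 3))).T        # 3 x 200
X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
T = np.eye(4)[:, labels]                                        # 4 x 200

order = rng.permutation(200)
train, val = order[:160], order[160:]            # four fifths / one fifth

results = {}
for n_hidden in (2, 5, 10):                      # hypothetical candidate sizes
    params = train_mlp(X[:, train], T[:, train], n_hidden)
    results[n_hidden] = accuracy(X[:, val], T[:, val], params)
best_hidden = max(results, key=results.get)
```

Selecting the hidden-layer size by validation accuracy rather than training accuracy favours the configuration that generalizes best.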
As was the case with stage classification, species identification was investigated using two approaches. The first was to train a multilayer neural network classifier using only colour information of the infected erythrocyte, and the second was to use a combination of colour, morphology, and texture features of the infected erythrocyte. In both cases, a total of 205 infected erythrocyte sub-images were used to form 205 feature vectors. 80 feature vectors were extracted from Plasmodium falciparum infected erythrocytes, 50 from Plasmodium ovale infected erythrocytes, 50 from Plasmodium vivax infected erythrocytes, and 35 from Plasmodium malariae infected erythrocytes. The feature vectors were used to train multilayer artificial neural network classifiers.
In the first case, RGB features were used to form the feature vector. These were the red, green, and blue components of the infected erythrocytes. The features were divided into four classes, the four species of Plasmodium parasites which infect humans.
In the second case, colour, morphological and texture information for both the detected parasites and the infected erythrocytes was used to form the feature vector. The features were divided into four classes, the four species of Plasmodium parasites which infect humans.
A feature matrix of 3 by 205 elements was formed. Four fifths of these feature vectors were used to train an ANN classifier, while the remaining fifth was used for validation of the network.
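A four-fifths to one-fifth partition of such a feature matrix can be sketched as below. The random shuffling, the seed, the placeholder class labels and the function name are assumptions; the paper does not state how the validation samples were selected.

```python
import numpy as np

def split_columns(features, targets, train_fraction=0.8, seed=0):
    """Randomly split column-wise samples into training and validation sets."""
    n = features.shape[1]
    order = np.random.default_rng(seed).permutation(n)
    n_train = int(round(train_fraction * n))
    train_idx, val_idx = order[:n_train], order[n_train:]
    return ((features[:, train_idx], targets[:, train_idx]),
            (features[:, val_idx], targets[:, val_idx]))

# Placeholder data with the dimensions described in the text: 3 features x 205 samples.
features = np.random.default_rng(1).random((3, 205))
labels = np.arange(205) % 4                  # placeholder species labels
targets = np.eye(4)[:, labels]               # 4 x 205 one-hot target matrix

(train_x, train_t), (val_x, val_t) = split_columns(features, targets)
```

With 205 feature vectors, this yields 164 training columns and 41 validation columns.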
A total of 4000 features were extracted from the infected erythrocyte images. These features were divided into 4 classes based on the species of Plasmodium parasite infecting the erythrocyte from which they were extracted.
For the second ANN classifier, morphological, colour and texture features were extracted from the same 30 images used for the first classifier. The performance of the network is recorded in
From these results it can be seen that the network attained an overall classification accuracy of 90.34% with a generalization ability of 99.91%. The regression plot for the network is shown in
The classification accuracies of neural networks trained to differentiate species of Plasmodium parasites using two sets of features were investigated. One neural network was trained with RGB features, while the other was trained with a combination of morphological, colour and texture features. The networks' performance was monitored for different numbers of hidden neurons, and training was stopped when the network reached its optimum performance, i.e. when no further improvement in the classification accuracy of the network could be made.
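The stopping criterion described above, halting when the accuracy no longer improves, is often implemented with a patience counter. The following is a minimal sketch of that idea under stated assumptions: the patience value, the function name, and the accuracy history are hypothetical and are not results from this work.

```python
def early_stopping_epoch(validation_accuracies, patience=6):
    """Return (best_epoch, best_accuracy), stopping after `patience` epochs
    without any improvement over the best accuracy seen so far."""
    best_acc = float("-inf")
    best_epoch = -1
    for epoch, acc in enumerate(validation_accuracies):
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` consecutive epochs
    return best_epoch, best_acc

# Hypothetical validation-accuracy history, for illustration only.
history = [0.50, 0.62, 0.71, 0.70, 0.71, 0.69, 0.70]
best_epoch, best_acc = early_stopping_epoch(history, patience=3)
```

In practice the weights saved at the best epoch would be restored once training stops.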
Algorithmic steps for training ANN to identify Plasmodium parasite species using colour features
1. Load RGB images with different life stages of Plasmodium parasites.
2. Convert the image into double class.
3. Extract the red, green, and blue intensity pixel values from images of infected erythrocytes. These features should be obtained from infected regions of erythrocytes.
4. Form feature vectors comprising three elements using the three colour components extracted in step 3.
5. Categorize these feature vectors into four classes based on the parasite species that infected the erythrocyte from which the features were extracted.
6. Form the corresponding target (desired output) vector for the feature classes of step 5. The four target vectors for the four classes were [1 0 0 0]T, [0 1 0 0]T, [0 0 1 0]T, [0 0 0 1]T, where T denotes matrix transpose.
7. Train a multilayer neural network with varying numbers of hidden neurons and record learning accuracies.
8. Choose the ANN with the highest degree of classification accuracy and generalization.
Algorithmic steps for training ANN to identify Plasmodium species using colour, morphological and texture features
1. For each infected erythrocyte sub-image, generate its RGB and HSI colour histograms and compute the first five statistical moments of each histogram.
2. Use the RGB and HSI colour components of the infected erythrocyte sub-images to compute four statistical texture measures, namely:
   i. R-measure
   ii. 3rd moment
   iii. Uniformity
   iv. Entropy
3. Threshold the infected erythrocyte sub-image using the first threshold value, T1, obtained from Zack's algorithm to produce a binary image of the infected erythrocyte, and use this image to compute the following features:
   i. Infected erythrocyte relative size, $S_{if} = I_{area} / nI_{area}$, where $I_{area}$ is the number of foreground pixels in an infected erythrocyte and $nI_{area}$ is the number of foreground pixels in a non-infected erythrocyte
   ii. First five statistical moments of the infected erythrocyte shape signature
   iii. Eccentricity of the erythrocyte
   iv. Compactness
   v. Roundness
   vi. Aspect ratio
   vii. Form factor
   viii. Solidity
   ix. Convexity
   x. Extent
   xi. Erythrocyte centroid
4. Threshold the infected erythrocyte sub-image using the second threshold value, T2, obtained from Zack's algorithm to produce a binary image of the potential Plasmodium parasite. Use this binary image to compute the following features:
   i. Relative size of the parasite, $A_p / A_{IE}$, where $A_p$ is the area of the parasitized region and $A_{IE}$ is the total area of the infected erythrocyte
   ii. Eccentricity of the parasite
   iii. Compactness
   iv. Solidity
   v. Convexity
   vi. Aspect ratio
   vii. Form factor
   viii. Extent
   ix. Roundness
   x. Number of nucleated objects
   xi. Separation distances of the nucleated objects
   xii. Distances of the nucleated objects from the centroid of the infected erythrocyte
5. Form a feature vector for each infected erythrocyte sub-image using the features obtained in steps 1, 2, 3, and 4 above.
6. Group these feature vectors into four categories based on the Plasmodium species infecting the erythrocyte.
7. Train a multilayer ANN using the features of step 6 above as the training set.
8. Vary the number of neurons in the hidden layer of the ANN and record the performance of the resulting classifier.
9. Determine the best performance obtained in step 8 above. This is the classification accuracy of the ANN.
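A few of the simpler shape features in steps 3 and 4 can be computed directly from binary masks. The sketch below is illustrative only: the masks are synthetic rectangles rather than thresholded images, the aspect ratio is taken from the bounding box (one of several conventions), and the function names are hypothetical.

```python
import numpy as np

def region_features(mask):
    """Area, centroid, extent and bounding-box aspect ratio of a binary region."""
    rows, cols = np.nonzero(mask)
    area = int(rows.size)
    height = int(rows.max() - rows.min() + 1)
    width = int(cols.max() - cols.min() + 1)
    return {
        "area": area,
        "centroid": (float(rows.mean()), float(cols.mean())),
        "extent": area / (height * width),        # region area / bounding-box area
        "aspect_ratio": max(height, width) / min(height, width),
    }

def relative_size(part_mask, whole_mask):
    """Ratio of foreground pixel counts, e.g. parasite area / erythrocyte area."""
    return np.count_nonzero(part_mask) / np.count_nonzero(whole_mask)

# Synthetic masks standing in for the two thresholded binary images.
erythrocyte = np.zeros((12, 12), dtype=bool)
erythrocyte[1:11, 1:11] = True                    # 100 foreground pixels
parasite = np.zeros((12, 12), dtype=bool)
parasite[4:6, 3:8] = True                         # 10 foreground pixels

parasite_features = region_features(parasite)
parasite_relative_size = relative_size(parasite, erythrocyte)
```

Features such as eccentricity, solidity and convexity require ellipse fitting or a convex hull and are usually taken from an image-processing library rather than written by hand.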
Performance of stages identification using only RGB colour features of the detected Plasmodium parasites
Feature vector used | Training accuracy (%) | Validation accuracy (%) | Test accuracy (%) | Overall accuracy (%) |
---|---|---|---|---|
RGB feature vectors only | 91.9 | 51.8 | 74.8 | 75.5 |
Performance of an ANN trained with RGB features to perform stages recognition
Performance of stages identification using morphological, colour and texture features of the detected Plasmodium parasites
Feature vector used | Training accuracy (%) | Validation accuracy (%) | Test accuracy (%) | Overall accuracy (%) |
---|---|---|---|---|
Morphological, colour and texture | 95.93 | 99.91 | 99.76 | 90.34 |
Performance of an ANN trained with colour, morphological and texture features to perform Plasmodium stages identification
The best performance recorded was produced by the network trained with a combination of morphological, colour and texture features. This network yielded an overall classification accuracy of 95.85% with a generalization ability of 93.2%, as can be seen from the regression plot of
ANN performance for species identification
Feature vector used | Training accuracy (%) | Validation accuracy (%) | Test accuracy (%) | Overall accuracy (%) |
---|---|---|---|---|
RGB feature vectors only | 60.0 | 24.0 | 16.6 | 33.5 |
Morphological, colour and texture | 100 | 93.2 | 96.2 | 95.9 |
Performance of ANN trained with colour, morphological and texture features to identify Plasmodium species
In this paper, a technique for classifying the life stages and species of Plasmodium parasites in thin blood smear images using ANNs was developed. Algorithmic steps for extracting features from images of infected erythrocytes and for training different ANN classifiers were described. The trained networks were then tested using validation samples different from those used as the training set.
The ANN classifier for stage identification attained classification accuracies of 99.91% and 99.76% in validation and testing respectively. The ANN classifier for species identification attained accuracies of 96.2% and 93.2% in testing and validation respectively. Colour, morphological and texture features of infected erythrocyte images were found to be the most suitable inputs to the above classifiers.
This technique has the potential to substitute for human microscopists in the clinical diagnosis of malaria. Implementing the technique requires instrumentation comprising an optical microscope fitted with a digital camera. The digital camera should be interfaced to a computer installed with software for acquiring microscopic images and processing them using the described technique.