Diabetes is a metabolic disorder that results in a retinal complication called diabetic retinopathy (DR), one of the four main causes of blindness worldwide. DR usually shows no clear symptoms before onset, which makes disease identification a challenging task. The healthcare industry may face unfavorable consequences if the gap in identifying DR is not filled with effective automation. Thus, our objective is to develop an automatic and cost-effective method for classifying DR samples. In this work, we present a custom Faster-RCNN technique for the recognition and classification of DR lesions from retinal images. After pre-processing, we generate the annotations of the dataset, which are required for model training. Then, we introduce DenseNet-65 at the feature extraction level of Faster-RCNN to compute a representative set of key points. Finally, the Faster-RCNN localizes and classifies the input sample into five classes. Rigorous experiments performed on a Kaggle dataset comprising 88,704 images show that the introduced methodology outperforms existing methods with an accuracy of 97.2%. We have compared our technique with state-of-the-art approaches to show its robustness in terms of DR localization and classification. Additionally, we performed cross-dataset validation on the Kaggle and APTOS datasets and achieved remarkable results in both the training and testing phases.
Diabetes, scientifically known as diabetes mellitus, is a metabolic imbalance that leads to an increase in the level of glucose in the bloodstream. According to an estimate provided in [
For treatment purposes and to avoid vision impairment, DR is classified into different levels according to the severity of the disorder. According to the research on the early treatment of DR and the international clinical DR scale, there are five levels of DR severity. At the zeroth level of DR severity, there is no abnormality. The first, second, third, and fourth levels correspond to mild non-proliferative diabetic retinopathy (NPDR, micro-aneurysms only), moderate NPDR, severe NPDR, and proliferative DR, respectively.
| DR type | Visual observations using fundoscopy | Severity level |
|---|---|---|
| Type 0 | No observable abnormalities | No DR |
| Type 1 | Observable micro-aneurysms | Mild NPDR |
| Type 2 | Observable micro-aneurysms OR retinal dots and hemorrhages OR hard exudates OR cotton wool spots | Moderate NPDR |
| Type 3 | Observable venous beading in 2 or more quadrants OR intra-retinal microvascular abnormality (IRMA) in 1 or more quadrants OR intra-retinal hemorrhages (more than 20) in each of the 4 quadrants | Severe NPDR |
| Type 4 | Observable neo-vascularization OR pre-retinal hemorrhages | Proliferative DR |
For the computerized identification of DR, hand-coded key points were initially used to detect the lesions of DR [
Object detection and classification in images using various machine learning techniques have been a focus of the research community [. The major contributions of this work are as follows:

- The development of annotations for a large dataset comprising a total of 88,704 images.
- A customized Faster-RCNN with DenseNet-65 at the feature extraction level, which improves the localization of small objects while decreasing both training and testing time complexity. By removing unnecessary layers, DenseNet-65 minimizes the loss of bottom-level high-resolution key points and preserves the information of small target regions that would otherwise be lost through repeated down-sampling.
- A technique for classifying DR images using the DenseNet-65 architecture instead of hand-engineered features, which improves cost-effectiveness and reduces the need for face-to-face consultation and diagnosis.
- A comparison of the classification accuracy of the presented framework with other algorithms such as AlexNet, VGG, GoogleNet, and ResNet-11. The results presented in this work show that the DenseNet architecture performs well in comparison to the latest approaches.
The remaining manuscript is arranged as follows: In Section 2, we present the related work. This includes the work on the classification of DR images using handcrafted features and DL approaches. In Section 3, we present the proposed methodology of DR image classification using Custom Faster-RCNN. In Section 4, we present the results and evaluations of the introduced work. Finally, in Section 5 we conclude our work.
Historically, several approaches have been introduced to correctly classify images of normal retinas and retinas with DR. In [
With the introduction of DL, the focus has shifted to methods that classify DR images by employing deep neural networks as a replacement for hand-coded key points. The related work on categorizing normal and DR retinas using DL methodologies is discussed in
| Reference | Methodology | Findings | Gaps identified |
|---|---|---|---|
| Xu et al. [ | Deep CNN-based technique to classify DR from fundus samples; the network is based on the most fundamental CNN architecture. | The proposed method accurately categorizes 94.5% of the color DR images. | The CNN architecture can be modified to achieve better results in terms of training time and accuracy. |
| Li et al. [ | Fine-tuning-based CNN classifier, applied first to all layers of a pre-trained CNN and then only to selected layers. An alternative method computes key points with a CNN and trains an SVM to classify the DR images. | Fine-tuning only specific layers performed best compared with fine-tuning all layers and the SVM methods, reaching an accuracy of 92.01%. | The classification results can be improved by carefully fine-tuning an advanced CNN architecture. |
| Zhang et al. [ | Deeply supervised ResNet for classifying DR severity levels; the architecture adds 3 sets of side-output layers to the hidden layers of an 11-layer ResNet. | The 11-layer ResNet achieves a classification accuracy of 81.0%. | The accuracy can be improved by making ResNet deeper or using different architectures. |
| Wang et al. [ | Various CNN frameworks (AlexNet, VGG16, and InceptionNet-V3) for DR severity classification. | The algorithms achieve accuracies of 37.43%, 50.03%, and 63.23% for AlexNet, VGG16, and InceptionNet-V3, respectively. | The accuracy is very low; architectural modifications could yield better accuracy. |
| Wan et al. [ | Hyper-parameter tuning of various CNN frameworks (AlexNet, VGG16, GoogleNet, and ResNet) for DR image classification. | The fine-tuned architectures achieve classification accuracies of 89.75%, 95.68%, 93.36%, and 90.40% for AlexNet, VGG, GoogleNet, and ResNet, respectively. | The accuracy of the algorithms can be further improved. |
| Zhang et al. [ | A system called DeepDR for the automatic recognition of DR images and their severity level using transfer learning and ensemble learning. | DeepDR attains a sensitivity of 97.5% and a specificity of 97.7%. | The accuracy can be improved, and the complexity of the model can be lowered. |
| Bodapati et al. [ | DL-based automated DR identification network using fundus images; deep features are computed with VGG16-fc1, VGG16-fc2, and Xception networks, and a DNN model classifies the DR severity level from the resulting hybrid feature set. | The introduced framework achieves an accuracy of 80.96%. | The prediction accuracy of the technique can be further enhanced. |
| Kathiresan et al. [ | Automated DL-based approach to categorize DR fundus samples; after preprocessing, histogram-based segmentation extracts the important information from the images, and a Synergic Deep Learning (SDL) framework performs the classification. | The presented framework exhibits an accuracy of 99.28%, with 98.54% sensitivity and 99.38% specificity. | The classification results can be further enhanced via hyperparameter tuning. |
| Torre et al. [ | DL-based solution for classifying DR images and determining disease severity; the technique estimates the predicted class and assigns each pixel a score indicating its significance, and the scores are used to produce the final classification result. | The presented DL approach attains sensitivity and specificity above 90%. | The detection accuracy of the introduced technique can be enhanced with significant measures. |
| Li et al. [ | A new fundus image database, DDR, on which five DR classification models (VGG-16, ResNet-18, GoogLeNet, DenseNet-121, and SE-BN-Inception), two segmentation models (DeepLab-v3+ and HED), and three DR localization techniques are evaluated. | For HED and DeepLab-v3+, the mAP values are 0.1587 and 0.3010, while for SSD, YOLO, and Faster-RCNN the mAP scores are 0.001515, 0.003045, and 0.000900, respectively. | The segmentation techniques perform poorly on the introduced dataset. |
The presented work comprises two main parts: the first is 'dataset preparation' and the second is the 'custom Faster-RCNN builder' for localization and classification.
The first module develops the annotations for DR lesions to locate the exact region of each lesion, while the second component of the introduced framework builds a new type of Faster-RCNN. This module comprises two sub-modules: the first is a CNN framework, and the second is the training component, which trains the Faster-RCNN using the key points computed by the CNN model. Faster-RCNN accepts two types of input: the image sample and the location of the lesion in the input image.
Like any other real-world dataset, our data contains various artifacts, such as noise, out-of-focus images, and underexposed or overexposed images, which may lead to poor classification results. Therefore, we perform data pre-processing on the samples before inputting them to the CNNs.
In particular, we remove regions that contain no information. In the original dataset, there are certain areas of the image that, if removed, do not affect the output. Therefore, we crop these regions from the input image. Cropping not only enhances classification performance but also helps reduce the computations.
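Cropping away the uninformative border of a fundus image can be sketched as follows. This is a minimal illustration using OpenCV; the intensity threshold is an assumed value, not one taken from the paper.

```python
import cv2
import numpy as np

def crop_uninformative(image, threshold=10):
    """Crop near-black border regions that carry no retinal information.

    `threshold` is an assumed intensity cutoff, not a value from the paper.
    """
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Mask of pixels bright enough to belong to the fundus disc.
    mask = gray > threshold
    if not mask.any():
        return image  # fully dark image: leave unchanged
    rows = np.any(mask, axis=1)
    cols = np.any(mask, axis=0)
    rmin, rmax = np.where(rows)[0][[0, -1]]
    cmin, cmax = np.where(cols)[0][[0, -1]]
    return image[rmin:rmax + 1, cmin:cmax + 1]

# Usage: img = cv2.imread("sample.jpeg"); img = crop_uninformative(img)
```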
The location of the DR lesions in every sample is necessary to identify the diseased area during the training procedure. In this work, we used the LabelImg tool to generate the annotations of the retinal samples and manually created a bounding box (bbox) for every sample. The dimensions of the bbox and the associated class of each object, i.e., xmin, ymin, xmax, ymax, width, and height, are stored in XML files. The XML files are used to generate a CSV file, from which the train.record file is created for use in the training procedure.
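The train.record naming suggests the standard TensorFlow Object Detection API workflow, in which the LabelImg XML files are first flattened into a CSV. A minimal sketch of that XML-to-CSV step is given below; the directory paths and column order are assumptions for illustration.

```python
import csv
import glob
import xml.etree.ElementTree as ET

def xml_to_csv(xml_dir, csv_path):
    """Flatten LabelImg XML annotations into one CSV row per bounding box."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "width", "height", "class",
                         "xmin", "ymin", "xmax", "ymax"])
        for xml_file in glob.glob(f"{xml_dir}/*.xml"):
            root = ET.parse(xml_file).getroot()
            filename = root.findtext("filename")
            size = root.find("size")
            width, height = size.findtext("width"), size.findtext("height")
            # One row per annotated lesion in the image.
            for obj in root.findall("object"):
                box = obj.find("bndbox")
                writer.writerow([filename, width, height,
                                 obj.findtext("name"),
                                 box.findtext("xmin"), box.findtext("ymin"),
                                 box.findtext("xmax"), box.findtext("ymax")])

# Usage: xml_to_csv("annotations/", "train_labels.csv")
```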
Faster-RCNN [
A CNN is a special type of neural network developed to perceive, recognize, and detect visual attributes from 1D, 2D, or N-D matrices. In the presented work, image pixels are passed as input to the CNN framework. We employ DenseNet-65 as the feature extractor in the Faster-RCNN approach. DenseNet [
| Network parameters | Value |
|---|---|
| Epochs | 20 |
| Learning rate | 0.001 |
| IOU threshold | 0.90 |
| Matched threshold | 0.5 |
| Unmatched threshold | 0.5 |
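As a hypothetical illustration of how the values in the table above would feed training, the sketch below collects them into a configuration and builds an optimizer. The choice of SGD with momentum is an assumption, since the paper does not state the optimizer.

```python
import torch
import torch.nn as nn

# Hypothetical training configuration mirroring the table above.
CONFIG = {
    "epochs": 20,
    "learning_rate": 0.001,
    "iou_threshold": 0.90,      # proposal filtering / NMS threshold
    "matched_threshold": 0.5,   # anchors above this IOU count as positives
    "unmatched_threshold": 0.5, # anchors below this IOU count as negatives
}

def make_optimizer(model: nn.Module) -> torch.optim.Optimizer:
    """Build the optimizer from the configuration table (SGD is assumed)."""
    return torch.optim.SGD(model.parameters(),
                           lr=CONFIG["learning_rate"], momentum=0.9)
```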
| Layer | Size | Stride |
|---|---|---|
| Convolutional_layer_1 | | 2 |
| Pooling_layer_1 | | 2 |
| Dense_block_1 | | 1 |
| Transition_layer (Convolutional_layer_2) | | 1 |
| Transition_layer (Pooling_layer_2) | | 2 |
| Dense_block_2 | | 1 |
| Transition_layer (Convolutional_layer_3) | | 1 |
| Transition_layer (Pooling_layer_3) | | 2 |
| Dense_block_3 | | 1 |
| Transition_layer (Convolutional_layer_4) | | 1 |
| Transition_layer (Pooling_layer_4) | | 2 |
| Dense_block_4 | | 1 |
| Classification_layer (Fully connected layer) | | |
| Classification_layer (SoftMax) | | |
The main process of lesion classification through Faster-RCNN can be divided into four steps. First, the input sample along with its annotation is given to DenseNet-65 to compute the feature map. Second, the calculated key points are passed to the RPN component to obtain the feature information of the region proposals. Third, the ROI pooling layer produces the proposal feature maps using the feature map from the convolutional layers and the proposals from the RPN unit. Finally, the classifier unit outputs the class associated with each lesion, while the bbox produced by the bbox regression gives the final location of the identified lesion.
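The paper defines its own DenseNet-65 backbone, so the sketch below is only an approximation: it assembles a Faster-RCNN around a standard DenseNet trunk (DenseNet-121 as a stand-in) using torchvision, to show how the four steps above fit together.

```python
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.anchor_utils import AnchorGenerator
from torchvision.ops import MultiScaleRoIAlign

# Convolutional trunk of a standard DenseNet as the feature extractor
# (a stand-in for the authors' custom DenseNet-65).
backbone = torchvision.models.densenet121(weights=None).features
backbone.out_channels = 1024  # channel count of the last dense block's output

# Region Proposal Network anchors over the single feature map.
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))

# ROI pooling over the backbone output ("0" is the default feature map key).
roi_pooler = MultiScaleRoIAlign(featmap_names=["0"], output_size=7,
                                sampling_ratio=2)

# In torchvision's convention num_classes includes background; 5 here
# assumes background plus the four lesion types analyzed later.
model = FasterRCNN(backbone, num_classes=5,
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)

# Training mode: model(images, targets) returns RPN and detection losses.
# Eval mode: model(images) returns predicted boxes, labels, and scores.
```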
The proposed method is assessed using the Intersection over Union (IOU) as described in
A lesion is considered detected when the IOU value exceeds 0.5 and is rejected otherwise. The Average Precision (AP) is commonly employed to evaluate the precision of object detectors, e.g., R-CNN, SSD, and YOLO. The geometrical explanation of precision is shown in
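For reference, IOU is the ratio of the overlap area to the union area of the predicted and ground-truth boxes. A minimal implementation for axis-aligned boxes follows; the coordinate convention matches the (xmin, ymin, xmax, ymax) annotations described earlier.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A predicted region counts as a detected lesion when iou(pred, truth) > 0.5.
```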
DenseNet-65 differs from the traditional DenseNet in two ways: (i) DenseNet-65 has fewer parameters than the original model, as it has 32 channels instead of 64 on the first convolutional layer, and the size of the kernel is
The dense block is the key component of DenseNet-65, as shown in
After multiple dense connections, the number of feature maps rises significantly; therefore, a transition layer (TL) is added to reduce the feature dimension coming from the preceding dense block. The structure of the TL is shown in
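A minimal PyTorch sketch of these two building blocks is shown below. It follows the standard DenseNet design (bottleneck dense layers, a transition that halves the channels); the exact channel counts and kernel sizes of DenseNet-65 are the authors' custom choices and are only assumed here.

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """One BN-ReLU-Conv(1x1) -> BN-ReLU-Conv(3x3) unit; its output is
    concatenated with its input, giving the dense connectivity described above."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_channels), nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, 4 * growth_rate, kernel_size=1, bias=False),
            nn.BatchNorm2d(4 * growth_rate), nn.ReLU(inplace=True),
            nn.Conv2d(4 * growth_rate, growth_rate, kernel_size=3,
                      padding=1, bias=False),
        )

    def forward(self, x):
        return torch.cat([x, self.body(x)], dim=1)

class TransitionLayer(nn.Module):
    """1x1 conv halving the channel count, then 2x2 average pooling,
    reducing the feature dimension between dense blocks."""
    def __init__(self, in_channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_channels), nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, in_channels // 2, kernel_size=1, bias=False),
            nn.AvgPool2d(kernel_size=2, stride=2),
        )

    def forward(self, x):
        return self.body(x)
```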
Faster-RCNN is a deep-learning-based technique that does not depend on methods like selective search for its proposal generation. The input sample with its annotation is given to the network, which directly computes the bbox showing the lesion location and its associated class.
In this method, we employ the DR image database provided by Kaggle. There are two sets of training images with a total of 88,704 images. A
The detection accuracy of the proposed DenseNet-65 based method is compared with base models, i.e., ResNet, EfficientNet-B5, AlexNet, VGG, and GoogleNet.
In this part, we show the simulation results of ResNet, DenseNet-65, and EfficientNet-B5 in terms of accuracy for DR image classification.
| Evaluation parameter | DenseNet-65 (proposed) | EfficientNet-B5 | ResNet |
|---|---|---|---|
| Total parameters | 7,042,600 | 28,178,299 | 25,691,013 |
| Trainable parameters | 6,958,900 | 28,178,299 | 25,637,893 |
| Non-trainable parameters | 83,640 | 0 | 53,120 |
| Test loss | 0.11 | 0.216 | 0.19 |
| Test accuracy | 0.972 | 0.7998 | 0.94 |
The number of trainable parameters of DenseNet-65, i.e., 6,958,900, is small compared to the trainable parameters of ResNet and EfficientNet-B5. Consequently, the training time of the former deep network, i.e., DenseNet-65, is short compared to the latter methods, i.e., ResNet and EfficientNet-B5.
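Parameter counts like those in the table can be reproduced with a short helper. Note that frameworks differ in what they report as non-trainable (Keras, for instance, counts BatchNorm running statistics, which PyTorch stores as buffers), so the split below is an approximation.

```python
import torch.nn as nn

def parameter_summary(model: nn.Module) -> dict:
    """Count total, trainable, and frozen parameters plus buffer elements."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    buffers = sum(b.numel() for b in model.buffers())  # e.g., BN statistics
    return {"total": total, "trainable": trainable,
            "frozen": total - trainable, "buffers": buffers}
```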
Our analysis reveals that the classification performance of DenseNet-65 is higher than that of the other methods, as shown in
| Network architecture | Accuracy (%) |
|---|---|
| AlexNet [ | 89.75 |
| VGG [ | 95.6 |
| GoogleNet [ | 93.36 |
| ResNet [ | 90.40 |
| DenseNet-121 [ | 92.39 |
| EfficientNet-B5 [ | 94.5 |
| DenseNet-65 (proposed) | 97.2 |
For localization of the DR signs, the diseased areas are declared positive examples, while the remaining healthy parts are treated as negative examples. A candidate region is categorized by an IOU threshold set to 0.5: below this score, the area is considered background (negative); above it, the area is classified as a lesion. The localization outcome of the custom Faster-RCNN is shown in
The results of the presented methodology are analyzed using the mean IOU and precision over all samples of the test database.
| DR lesions | Mean IOU | Precision |
|---|---|---|
| Hard exudates | 0.990 | 0.99 |
| Soft exudates | 0.970 | 0.961 |
| Micro aneurysms | 0.989 | 0.85 |
| Hemorrhages | 0.928 | 0.96 |
The stage-wise results of the introduced framework are analyzed through experiments. Faster-RCNN precisely localizes and classifies the DR lesions. The classification results of DR in terms of accuracy, precision, recall, F1-score, and error-rate (a sketch for computing these metrics follows the table) are presented in
| Stages | Accuracy | Precision | Recall | F1-score | Error-rate |
|---|---|---|---|---|---|
| No DR | 0.97 | 0.923 | 0.917 | 1 | 0 |
| Mild | 0.928 | 0.987 | 0.939 | 0.921 | 0.079 |
| Moderate | 0.99 | 0.979 | 1 | 0.993 | 0.007 |
| Severe | 0.992 | 0.989 | 0.954 | 0.954 | 0.046 |
| Proliferative | 0.981 | 0.993 | 0.99 | 0.962 | 0.038 |
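A sketch for deriving these per-stage metrics from predicted and true labels is given below. In the table the error-rate appears to equal 1 minus the F1-score, and the sketch follows that convention; the use of scikit-learn is our choice, not the paper's.

```python
from sklearn.metrics import precision_recall_fscore_support

STAGES = ["No DR", "Mild", "Moderate", "Severe", "Proliferative"]

def stage_metrics(y_true, y_pred):
    """Per-stage precision, recall, F1, and error-rate (taken as 1 - F1)."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=list(range(len(STAGES))), zero_division=0)
    for i, stage in enumerate(STAGES):
        print(f"{stage}: precision={precision[i]:.3f} "
              f"recall={recall[i]:.3f} f1={f1[i]:.3f} "
              f"error={1 - f1[i]:.3f}")
```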
In the present work, we report results from running the computer simulation 10 times. In each run, we randomly selected data with a ratio of 70% to 30% for training and testing, respectively, and then averaged the resulting performance evaluation metrics.
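This evaluation protocol can be sketched as follows; `train_and_score` is a hypothetical callable wrapping model training and testing, and the per-run seeding is an assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def repeated_evaluation(samples, labels, train_and_score, runs=10):
    """Average test performance over `runs` random 70/30 splits.

    `train_and_score` is a hypothetical callable that trains the model on
    the training split and returns a score on the test split.
    """
    scores = []
    for seed in range(runs):
        x_tr, x_te, y_tr, y_te = train_test_split(
            samples, labels, test_size=0.30, random_state=seed,
            stratify=labels)
        scores.append(train_and_score(x_tr, y_tr, x_te, y_te))
    return float(np.mean(scores))
```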
In
| Network architecture | Accuracy (%) |
|---|---|
| Xu et al. [ | 94.50 |
| Li et al. [ | 92.01 |
| Zhang et al. [ | 81.00 |
| Li et al. [ | 82.80 |
| Wu et al. [ | 83.10 |
| Pratt et al. [ | 75.00 |
| Proposed | 97.20 |
The proposed method achieved an average accuracy of 97.2%, while the comparative approaches attained an average accuracy of 84.735%; thus, our technique provides a 12.46% performance gain. Furthermore, the presented approach can easily be deployed on CPU- or GPU-based systems, and the test time per sample is 0.9 s, which is faster than the other methods. Our analysis shows that the proposed technique can correctly classify the images.
To further assess the presented approach, we perform cross-dataset validation: our method is trained on the Kaggle database, and testing is performed on the APTOS-2019 dataset [
We have plotted the box plot for the cross-dataset evaluation in
In this work, we introduced a novel approach to accurately identify the different levels of DR using a custom Faster-RCNN framework, and we presented an application for lesion classification as well. More precisely, we utilized DenseNet-65 for computing deep features from the given sample, on which Faster-RCNN is trained for DR recognition. The proposed approach can efficiently localize DR lesions and classify retinal images into five classes. Moreover, our method is robust to various artifacts, i.e., blurring, scale and rotational variations, intensity changes, and contrast variations. The reported results confirm that our technique outperforms the latest approaches. In the future, we plan to extend our technique to other eye-related diseases.
We would like to thank the Deanship of Scientific Research, Qassim University for funding the publication of this project.