Person Re-IDentification (PRe-ID) is now a vital part of real-time video surveillance, driven by the rising need for public safety. Solving the PRe-ID problem involves matching observations of persons across distinct camera views. Earlier models treat PRe-ID as a specific object retrieval problem and determine the retrieval results mainly from unidirectional matching between the probe and gallery images. However, the correct match might not be present in the top-k ranking results owing to appearance changes caused by differences in illumination, pose, viewpoint, and occlusion. To address these issues, a new Hyper-parameter Optimized Deep Learning (DL) approach with Expanded Neighborhood Distance Reranking (HPO-DLDN) model is proposed for PRe-ID. The proposed HPO-DLDN involves three processes for PRe-ID: feature extraction, similarity measurement, and feature re-ranking. The HPO-DLDN model uses an Adam optimizer with the Densely Connected Convolutional Networks (DenseNet169) model as a feature extractor. Additionally, Euclidean distance-based similarity measurement is employed to determine the resemblance between the probe and gallery images. Finally, the HPO-DLDN model incorporates the Expanded Neighborhood Distance Reranking (ENDR) model, along with the Mahalanobis distance, to re-rank the person re-identification outcome. An extensive experimental analysis is carried out on the CUHK01 benchmark dataset, and the obtained results verify the effective performance of the HPO-DLDN model in different aspects.
Person Re-Identification (PRe-ID) aims to recognize a person of interest across views captured by several non-overlapping cameras covering a wide region [
In PRe-ID, a probe image of the required person is fed to the system. Then, the pictures relevant to the probe are retrieved from various gallery image datasets. Next, the learning model is evaluated and the images matching the probe are returned. Since extracting useful and applicable features tends to enhance the PRe-ID task, feature extraction is one of the essential PRe-ID processes. Persons may be subject to sudden changes in background, and a person is observed from diverse camera viewpoints. For instance, [
In recent times, high-level semantic features like hair and dressing style have been used for an effective PRe-ID process. Compared with alternative features, the high-level attributes are discrete and suit the prediction process. Various attribute learning models were deployed for extracting high-level attributes. [
Deep Learning (DL) is a well-known and proficient method for extracting high-level semantic features, and recent advancements have brought it to the best performance in PRe-ID. The efficiency of deep models relies on two factors: (i) the capability of a model, which is maximized by a deeper network structure and copes with variations in the pedestrian image such as brightness, background, and posture; and (ii) the abstraction of the probe into a highly semantic feature representation. Developers have concentrated on the PRe-ID process in recent times as it is considered important in public safety screening as well as warning methods. Therefore, developing interpretable PRe-ID models is essential for end users. Surveillance systems basically operate independently, with no human contribution, to detect pedestrians from cameras placed in streets, and they suffer from limitations of capturing location, low resolution, and large-scale lighting changes. In contrast, it is highly applicable for pedestrian matching when compared with the wearing and carrying process. Furthermore, combining deep features with metric learning helps to learn whether the unified approach is capable in certain domains and enhances the degree of security of detections made by a machine.
The benefit of feature transformation is limited if the developed model becomes too complex for dynamic scenarios. Hence, distance metric learning is employed to resolve the issues of feature matching. The major objective of Metric Learning (ML) is to identify the best distance metric while learning the sample distribution, such that the features of the same pedestrian in various scenarios are close and the margin between different pedestrians is enlarged. Classical distance metric frameworks like Large Margin Nearest Neighbor (LMNN) and Information Theoretic Metric Learning (ITML) cannot be applied in real-time applications because they require massive data as well as prolonged optimization iterations. Currently, some productive approaches like Keep-it-simple-and-straightforward metric learning (KISSME), probabilistic relative distance comparison, relaxed pairwise learned metric, and Ranking by support vector machine (RankSVM) are employed. Though the above approaches are efficient in the identification task, they still suffer from issues: distance metric learning with small sample sizes remains a clear challenge in PRe-ID.
Since person re-identification (PRe-ID) is a tedious task due to the variation in appearance caused by differences in illumination, pose, viewpoint, and occlusion, a new hyper-parameter optimized deep learning (DL) model with expanded neighborhood distance reranking (HPO-DLDN) is presented for PRe-ID. The proposed HPO-DLDN model performs PRe-ID through three processes, namely feature extraction, similarity measurement, and feature re-ranking. The HPO-DLDN model uses the Densely Connected Convolutional Networks (DenseNet169) model to extract robust discriminant features from the probe and gallery images. Furthermore, Euclidean distance-based similarity measurement is employed to determine the resemblance between the probe and gallery images. Lastly, the HPO-DLDN model incorporates the ENDR model, along with the Mahalanobis distance, to re-rank the person re-identification outcome. An extensive set of simulation analyses was carried out to highlight the effectiveness of the proposed HPO-DLDN model.
This section reviews recent works on PRe-ID. Fan et al. employed a Convolutional Neural Network (CNN) for feature extraction in order to cluster the unlabeled features [
Specifically, supervised neural networks are significant for [
A CNN based multi scale context aware network (MSCAN) approach was proposed by [
The working process of HPO-DLDN model is demonstrated in
Once the input probe images are fed into the HPO-DLDN model, the first step, feature extraction, is executed to determine the actual feature vectors. CNN models have evolved from DL methods and are applied to the PRe-ID process. Firstly, CNNs are applied for Re-ID to rank the images collected from the gallery as either true or false matches. CNNs are effective in image classification, classifying the relevant images according to the given input image. Moreover, the image tensor is convolved using a collection of
The individual matrix
To reduce the computational complexity, CNNs apply pooling layers, which reduce the size of the output passed from one layer to the next. Diverse pooling strategies are employed to reduce the output while preserving its essential properties. The most prominently used pooling method is max-pooling, in which the largest activation within each pooling window is selected.
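As a minimal sketch of the max-pooling step described above, the following example pools a small 2-D feature map. The function name `max_pool2d`, the 2x2 window, and the stride of 2 are illustrative assumptions and are not taken from the HPO-DLDN configuration.

```python
def max_pool2d(feature_map, window=2, stride=2):
    """Max-pool a 2-D feature map given as a list of equal-length rows.

    `window` and `stride` are hypothetical defaults chosen for illustration.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    # Slide the pooling window over the map; positions that would run past
    # the border are skipped (no padding).
    for top in range(0, rows - window + 1, stride):
        pooled_row = []
        for left in range(0, cols - window + 1, stride):
            patch = [
                feature_map[r][c]
                for r in range(top, top + window)
                for c in range(left, left + window)
            ]
            # Keep only the largest activation inside the window.
            pooled_row.append(max(patch))
        pooled.append(pooled_row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 2],
    [7, 2, 9, 4],
    [0, 1, 3, 5],
]
pooled = max_pool2d(feature_map)
print(pooled)  # [[6, 2], [7, 9]]
```

The 4x4 input shrinks to 2x2 while the strongest response of each region is preserved, which is the size reduction that pooling layers provide.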
The CNN is implemented as a differentiable model trained with the backpropagation (BP) method obtained from sigmoid (
Here,
where
DenseNet is one of the familiar DL architectures commonly used for object detection as shown in
After the feature extraction process, a Euclidean distance-based similarity measurement approach is applied to determine the highly relevant images from the gallery. The Euclidean distance is one of the most intuitive metrics for expressing the distance between two feature vectors. Assume a dataset
By means of vector process,
In the PRe-ID process, the Euclidean distance model shows robust performance. However, the Euclidean distance treats every dimension uniformly, whereas the various features among the samples are distinct [
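To illustrate the Euclidean similarity step, the sketch below computes the distance between a probe feature vector and each gallery vector and sorts the gallery by increasing distance. The helper names `euclidean_distance` and `rank_gallery` and the toy 2-D vectors are assumptions made for illustration; real DenseNet169 features would be much higher-dimensional.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def rank_gallery(probe, gallery):
    """Return gallery indices sorted from most to least similar, plus distances."""
    distances = [euclidean_distance(probe, g) for g in gallery]
    order = sorted(range(len(gallery)), key=lambda i: distances[i])
    return order, distances


# Toy features (hypothetical values).
probe = [0.0, 0.0]
gallery = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]

order, distances = rank_gallery(probe, gallery)
print(distances)  # [5.0, 1.0, 2.0]
print(order)      # [1, 2, 0]
```

Every dimension contributes with equal weight here, which is the uniform treatment noted above.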
At this stage, re-ranking is carried out by the ENDR model. Usually, the actual distance between two images
In
In Expanded neighborhoods distance,
In general, once the initial ranking list
Lastly, END distance of image pair
where
Here, the final re-ranking distance is obtained. The Mahalanobis distance serves as the initial distance, which is then passed on to calculate the END distance:
where
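The re-ranking idea can be sketched as follows. Because the exact formulas are not reproduced here, this example assumes the commonly used expanded neighborhood formulation: the neighborhood of an image consists of its `t` nearest neighbors plus the `q` nearest neighbors of each of those, and the re-ranked distance of a pair averages the initial distances from each image's expanded neighbors to the other image. The function names, the values of `t` and `q`, and the choice to exclude an image from its own neighbor list are all illustrative assumptions, not the authors' implementation.

```python
def top_neighbors(dist, index, count):
    """Indices of the `count` nearest items to `index`, excluding itself."""
    candidates = [j for j in range(len(dist)) if j != index]
    candidates.sort(key=lambda j: dist[index][j])
    return candidates[:count]


def expanded_neighbors(dist, index, t, q):
    """Top-t neighbors of `index` followed by the top-q neighbors of each."""
    first_level = top_neighbors(dist, index, t)
    expanded = list(first_level)
    for neighbor in first_level:
        expanded.extend(top_neighbors(dist, neighbor, q))
    return expanded  # t + t*q entries; duplicates are kept


def expanded_neighborhood_distance(dist, p, g, t=2, q=1):
    """Average initial distance from p's neighbors to g and g's neighbors to p."""
    p_neighbors = expanded_neighbors(dist, p, t, q)
    g_neighbors = expanded_neighbors(dist, g, t, q)
    total = sum(dist[n][g] for n in p_neighbors)
    total += sum(dist[n][p] for n in g_neighbors)
    return total / (2 * len(p_neighbors))


# Toy initial distances: absolute differences of 1-D points (hypothetical).
points = [0.0, 1.0, 2.0, 10.0, 11.0]
initial = [[abs(a - b) for b in points] for a in points]

near = expanded_neighborhood_distance(initial, 0, 1)
far = expanded_neighborhood_distance(initial, 0, 3)
print(near < far)  # True
```

In this sketch the `initial` matrix could hold any pairwise distance, for example the Mahalanobis distance mentioned in the text.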
The Mahalanobis distance metric is used to identify a person from the collective gallery. Assume the pair of feature vectors as
where,
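For reference, a Mahalanobis-type distance between two feature vectors x and y is generally written as the square root of (x - y)^T M (x - y), where M is a positive semi-definite matrix such as an inverse covariance or a learned metric. The sketch below assumes that M is already available and passes it in as `metric`; how M is estimated in HPO-DLDN is not specified here, and the function name and toy values are hypothetical.

```python
import math


def mahalanobis_distance(x, y, metric):
    """sqrt((x - y)^T M (x - y)) for a given square metric matrix M."""
    diff = [a - b for a, b in zip(x, y)]
    size = len(diff)
    # Matrix-vector product M (x - y).
    projected = [
        sum(metric[i][j] * diff[j] for j in range(size)) for i in range(size)
    ]
    squared = sum(d * p for d, p in zip(diff, projected))
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(squared, 0.0))


x = [1.0, 2.0]
y = [0.0, 0.0]

identity = [[1.0, 0.0], [0.0, 1.0]]   # reduces to the Euclidean distance
weighted = [[4.0, 0.0], [0.0, 1.0]]   # first dimension counts more

d_identity = mahalanobis_distance(x, y, identity)
d_weighted = mahalanobis_distance(x, y, weighted)
print(d_identity < d_weighted)  # True
```

With the identity matrix the result equals the Euclidean distance, while a non-identity M lets some dimensions weigh more than others.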
The performance of the proposed HPO-DLDN model has been validated on the CUHK01 dataset [
Under rank-1, the MCC model showed the poorest performance, with the lowest accuracy of 12%. The ITML and Adaboost models depicted slightly better outcomes than MCC, with close accuracy values of 21.67% and 22.79% respectively. The Xing’s and LMNN models followed with accuracies of 23.18% and 23.7% respectively. Besides, the L1-Norm and PRDC models demonstrated moderate accuracies of 26.73% and 32.6%, and Ahmed et al. reached an accuracy of 65%. None of these models outperformed the proposed HPO-DLDN model, which attained the maximum accuracy of 90.23%.
Under rank-5, the MCC model again showed the poorest performance, with a minimum accuracy of 33.66%. The ITML and Adaboost methods showcased moderate outcomes over MCC, with close accuracy values of 41.8% and 44.41% respectively. The Xing’s and LMNN models then reached accuracies of 45.24% and 45.42% respectively. The L1-Norm and PRDC models showcased better accuracies of 49.04% and 54.55%, and Ahmed et al. reached an accuracy of 87.94%. None of these techniques performed better than the proposed HPO-DLDN model, which gained the maximum accuracy of 92.18%.
Methods | Rank-1 (%) | Rank-5 (%) | Rank-10 (%) | Rank-20 (%) |
---|---|---|---|---|
Proposed HPO-DLDN | 90.23 | 92.18 | 94.97 | 95.64 |
Ahmed et al. | 65.00 | 87.94 | 93.12 | 97.20 |
MCC | 12.00 | 33.66 | 47.96 | 67.00 |
ITML | 21.67 | 41.80 | 55.12 | 71.31 |
Adaboost | 22.79 | 44.41 | 57.16 | 70.55 |
LMNN | 23.70 | 45.42 | 57.32 | 70.92 |
Xing’s | 23.18 | 45.24 | 56.90 | 70.46 |
L1-Norm | 26.73 | 49.04 | 60.32 | 72.07 |
PRDC | 32.60 | 54.55 | 65.89 | 78.30 |
Similarly, under rank-10, the MCC method showed the poorest performance, with the lowest accuracy of 47.96%. Meanwhile, the ITML and Adaboost models depicted slightly better outcomes than MCC, with close accuracy values of 55.12% and 57.16% respectively. The Xing’s and LMNN models demonstrated comparable accuracy values of 56.9% and 57.32% respectively. Next, the PRDC and L1-Norm approaches illustrated acceptable accuracies of 65.89% and 60.32%. Though the Ahmed et al. model exhibited a high accuracy of 93.12%, it failed to outperform the proposed HPO-DLDN model, which achieved the maximum accuracy of 94.97%.
Likewise, under rank-20, the MCC technique showed inferior performance, with the lowest accuracy of 67%. The Adaboost and ITML approaches demonstrated somewhat higher accuracy values of 70.55% and 71.31% respectively. Meanwhile, the LMNN and Xing’s models depicted comparable outcomes, with close accuracy values of 70.92% and 70.46% respectively. Besides, the PRDC and L1-Norm models showcased reasonable accuracies of 78.30% and 72.07%. At rank-20, the Ahmed et al. model exhibited the maximum accuracy of 97.20%, exceeding the 95.64% obtained by the proposed HPO-DLDN model; however, the HPO-DLDN model already reaches an accuracy of 90.23% at rank-1, where Ahmed et al. attains only 65%.
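The rank-k accuracies discussed above can be computed from a probe-to-gallery distance matrix as sketched below: a probe counts as a hit at rank k if an image of the same identity appears among its k nearest gallery entries. The function name `rank_k_accuracy`, the identity labels, and the toy distances are hypothetical and are unrelated to the CUHK01 results.

```python
def rank_k_accuracy(dist, probe_ids, gallery_ids, k):
    """Percentage of probes whose identity appears in the top-k gallery matches."""
    hits = 0
    for row, probe_id in zip(dist, probe_ids):
        # Gallery indices sorted by increasing distance to this probe.
        order = sorted(range(len(row)), key=lambda j: row[j])
        top_k_ids = [gallery_ids[j] for j in order[:k]]
        if probe_id in top_k_ids:
            hits += 1
    return 100.0 * hits / len(probe_ids)


probe_ids = ["a", "b", "c", "d"]
gallery_ids = ["a", "b", "c", "d"]

# Row i holds the distances from probe i to every gallery image (toy values).
dist = [
    [0.1, 0.5, 0.9, 0.7],  # correct match ranked 1st
    [0.4, 0.6, 0.2, 0.8],  # correct match ranked 3rd
    [0.9, 0.3, 0.5, 0.7],  # correct match ranked 2nd
    [0.3, 0.6, 0.9, 0.2],  # correct match ranked 1st
]

rank1 = rank_k_accuracy(dist, probe_ids, gallery_ids, 1)
rank2 = rank_k_accuracy(dist, probe_ids, gallery_ids, 2)
rank3 = rank_k_accuracy(dist, probe_ids, gallery_ids, 3)
print(rank1, rank2, rank3)  # 50.0 75.0 100.0
```

Since the top-k list always contains the top-(k-1) list, rank-k accuracy can only stay equal or grow as k increases, which matches the row-wise trend in the table.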
Methods | Rank-1 (%) | mAP (%) |
---|---|---|
Proposed HPO-DLDN | 90.23 | 94.32 |
ResNet50+KISSME | 87.47 | – |
Inceptionv3+KISSME | 86.91 | – |
LOMO + XQDA | 49.70 | 56.40 |
LOMO + XQDA + k-RE | 50.00 | 56.80 |
IDE (C) | 57.00 | 63.10 |
IDE (C) + k-RE | 57.20 | 63.20 |
IDE (C) + XQDA | 61.70 | 67.60 |
IDE (C) + XQDA + k-RE | 61.60 | 67.60 |
In terms of rank-1 accuracy, the LOMO + XQDA model obtained the lowest value of 49.7%. The LOMO + XQDA + k-RE and IDE (C) models depicted slightly better outcomes, with accuracy values of 50% and 57% respectively. The IDE (C) + k-RE and IDE (C) + XQDA + k-RE models demonstrated somewhat higher accuracy values of 57.2% and 61.6% respectively. Besides, the IDE (C) + XQDA model demonstrated a moderate accuracy of 61.7%. Though the Inceptionv3+KISSME and ResNet50+KISSME models exhibited higher accuracy values of 86.91% and 87.47%, they failed to outperform the proposed HPO-DLDN model, which obtained the maximum accuracy of 90.23%.
A novel HPO-DLDN model is introduced to solve the person re-identification problem. The proposed HPO-DLDN involves different processes for PRe-ID, namely DenseNet-169 based feature extraction, Euclidean distance-based similarity measurement, and ENDR based feature re-ranking. Firstly, the DenseNet-169 based feature extraction process computes a useful set of feature vectors for the probe image and the gallery images. Similarity between the query vectors and gallery vectors is then computed using the Euclidean distance. The initial ranking and re-ranking are performed by the ENDR model with the Mahalanobis distance. Finally, the images with top