With the development of new media technology, vehicle matching plays an increasingly significant role in video surveillance systems. Recent methods have explored vehicle matching based on feature extraction, and similarity metric learning has also achieved enormous progress in vehicle matching. However, most of these methods are less effective in realistic scenarios where vehicles are usually captured at different times. To address this cross-domain problem, we propose a cross-domain similarity metric learning method that utilizes a GAN to generate vehicle images in another domain, and a two-channel Siamese network to learn a similarity metric from both domains (i.e., day pattern and night pattern) for vehicle matching. To exploit properties and relationships among vehicle datasets, we first apply the domain transformer to translate the domain of vehicle images, and then utilize the two-channel Siamese network to extract features from both domains for better feature similarity learning. Experimental results illustrate that our models achieve improvements over state-of-the-art methods.
With the popularization of vehicles and the rapid development of traffic, the demand for obtaining traffic information from camera equipment is also increasing. Vehicle matching has many practical applications, such as video surveillance systems; it aims to identify a target vehicle across different cameras under different conditions, such as multiple viewpoints and day or night patterns.
Previous works [
Inspired by research on the cross-domain problem [ A framework based on cross-domain similarity metric learning is designed for vehicle matching. In this framework, we unify the domain of input image pairs and then feed them into the Siamese network to calculate their distance metrics. The proposed framework solves the cross-domain problem of vehicle matching well. To address the network's poor ability to extract features from night-pattern images, the two-channel Siamese network is proposed, which extracts not only day-pattern image features but also night-pattern image features. The two-channel Siamese network calculates the similarity metrics from both domains. We conduct extensive experiments to show that the proposed method outperforms state-of-the-art methods on the VehicleID and VERI-Wild datasets.
With the rapid development of deep learning, researchers have used deep learning-based feature representations for vehicle matching. Li et al. [
In recent years, various methods have explored similarity metrics to handle vehicle matching tasks. The main idea of similarity metric learning is that features belonging to the same class are kept close, while features from different classes are kept distant. Some existing networks such as Siamese [
Although these works achieve remarkable success on vehicle matching tasks, their performance degrades sharply when the vehicle images belong to different domains. Thus, the difference between the day and night patterns of vehicles causes difficulties and challenges.
In this section, we illustrate the details of our proposed method. Specifically, we first introduce the pattern discrimination, and then the domain transition. Finally, the two-channel Siamese network learns the similarity metric on the basis of a unified domain. As shown in
As shown in
The pattern discrimination is applied at the front of the framework and consists of a lightweight network, ResNet10. We discriminate the day-night pattern of each image in a pair so that images from different domains receive different processing.
The domain transition uses a transformer, which is a pre-trained network called CycleGAN [
In the similarity metric learning stage, we propose the two-channel Siamese structure, which can extract features from the target domain without losing source-domain features and has better generalization ability for cross-domain similarity metric learning. The inputs of the two-channel Siamese network are day-pattern pairs and night-pattern pairs. Positive and negative samples are input with equal probability to keep the dataset balanced. The two-channel Siamese network maps the inputs to a new feature space, and the similarity of the two inputs is evaluated by calculating the loss value.
First, a pair of images is chosen as input to the gating module, which discriminates their patterns. The images are then fed into the domain transformer to translate their domain to the other, i.e., day pattern or night pattern. In this manner, we ensure that each pair of inputs is from the same domain. Feature representations from the same domain are beneficial to similarity metric learning. The day-pattern and night-pattern pairs are then fed into the two-channel Siamese network to learn the similarity metrics, respectively.
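The pipeline described above can be sketched as follows. The brightness-based gating and additive day/night translation are toy stand-ins for the ResNet10 classifier and CycleGAN transformer, and all function names are illustrative placeholders, not the authors' implementation.

```python
import numpy as np

def discriminate_pattern(img):
    # Placeholder gating: classify day vs. night by mean brightness.
    # The paper uses a lightweight ResNet10 classifier instead.
    return "day" if img.mean() > 0.5 else "night"

def translate_domain(img, target):
    # Placeholder for the CycleGAN domain transformer: here we simply
    # brighten or darken the image to mimic a day<->night translation.
    return np.clip(img + (0.4 if target == "day" else -0.4), 0.0, 1.0)

def match_pair(img_a, img_b, distance_fn):
    """Unify the domain of an input pair, then compute distances in both
    domains, following the two-channel design described in the text."""
    pat_a, pat_b = discriminate_pattern(img_a), discriminate_pattern(img_b)
    # Translate each image to the other domain so that one branch
    # receives a (day, day) pair and the other a (night, night) pair.
    day_pair = [img if p == "day" else translate_domain(img, "day")
                for img, p in ((img_a, pat_a), (img_b, pat_b))]
    night_pair = [img if p == "night" else translate_domain(img, "night")
                  for img, p in ((img_a, pat_a), (img_b, pat_b))]
    return distance_fn(*day_pair), distance_fn(*night_pair)

euclid = lambda x, y: float(np.linalg.norm(x - y))
day_img = np.full((4, 4), 0.8)    # bright image -> "day"
night_img = np.full((4, 4), 0.1)  # dark image -> "night"
d_day, d_night = match_pair(day_img, night_img, euclid)
```

In a real setting, `distance_fn` would be the learned Siamese feature distance rather than a raw pixel distance.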
We propose the gating module to discriminate the image patterns. The dataset is given two labels for pattern discrimination: day and night. The pattern discrimination is critical to the next stages: on the one hand, it tells the domain transition which domain to transform; on the other hand, it ensures that images from the two different domains are fed into the corresponding branches correctly.
The input sample is defined as
The linear functions of all the k classes are combined to form a linear transformation layer, where the
The larger value of the affinity score
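Assuming the gating is a linear classification layer over backbone features, the affinity-score selection can be sketched as follows; the feature dimension and random weights are illustrative, not the trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a 512-d feature from the lightweight backbone
# and k = 2 pattern classes (day, night); W and b are illustrative.
feat_dim, k = 512, 2
W = rng.normal(size=(k, feat_dim))
b = np.zeros(k)

def affinity_scores(x):
    # Linear transformation layer: one linear function per class.
    return W @ x + b

def predict_pattern(x):
    # The class with the larger affinity score is the predicted pattern.
    return ("day", "night")[int(np.argmax(affinity_scores(x)))]

x = rng.normal(size=feat_dim)
pattern = predict_pattern(x)
```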
We utilize a pre-trained network to translate the domain, which is based on the pix2pix framework of Isola et al. This framework uses conditional generative adversarial networks to learn the mapping from input to output images. The network learns the mapping functions between two domains A and B.
The input of GAN is defined by
The network combines two discriminators and two generators, which helps it translate different input images to different output images. Without additional constraints, the same set of input images could be mapped to any random permutation of images in the other domain, where any of the learned mappings can induce an output distribution that matches the target distribution. The full loss function is:
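Assuming the standard CycleGAN objective, with generators $G: A \to B$ and $F: B \to A$, discriminators $D_A$ and $D_B$, and a cycle-consistency term weighted by $\lambda$, the full loss takes the form:

```latex
\mathcal{L}(G, F, D_A, D_B) =
    \mathcal{L}_{GAN}(G, D_B, A, B)
  + \mathcal{L}_{GAN}(F, D_A, B, A)
  + \lambda\,\mathcal{L}_{cyc}(G, F)
```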
The pairs of day-pattern images and night-pattern images are input into the two-channel Siamese network, which then measures the distances of input pairs belonging to the same domain. To make it clear, these samples are defined by
We aim to pull features from the same class closer to each other and push features from different classes away by adopting the contrastive loss as follows:
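Assuming the standard contrastive-loss formulation, with $Y = 1$ for a same-vehicle pair, $D$ the Euclidean distance between the two output features, and $m$ a margin:

```latex
L = \frac{1}{2}\left[\, Y D^{2} + (1 - Y)\,\max(0,\; m - D)^{2} \,\right]
```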
This loss function expresses the matching degree of paired samples. If the two inputs are the same vehicle, the output features will be spatially close; if not, the output features will be spatially distant. Y indicates whether the input samples belong to the same class.
When the samples belong to different vehicles, the loss value increases as the Euclidean distance in feature space declines. Thus, we train the network to learn the similarity metric of vehicle images by reducing the contrastive loss value.
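The behaviour described above can be checked with a small sketch of the contrastive loss (standard formulation assumed; the margin value and feature vectors are illustrative):

```python
import numpy as np

def contrastive_loss(f1, f2, y, margin=1.0):
    """Contrastive loss sketch (standard formulation assumed;
    the margin value is illustrative). y = 1 for a same-vehicle pair."""
    d = np.linalg.norm(f1 - f2)                 # Euclidean feature distance
    pos = y * d ** 2                            # pull same-class features together
    neg = (1 - y) * max(0.0, margin - d) ** 2   # push different classes apart
    return 0.5 * (pos + neg)

a = np.array([1.0, 0.0])
b = np.array([0.9, 0.1])  # close pair
c = np.array([0.2, 0.8])  # distant pair

same_loss = contrastive_loss(a, b, y=1)  # small: features already close
neg_close = contrastive_loss(a, b, y=0)  # large: a negative pair sits too close
neg_far = contrastive_loss(a, c, y=0)    # zero: negatives beyond the margin
```

Note that for a negative pair (`y = 0`), shrinking the distance below the margin raises the loss, exactly as described in the text.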
We propose the two-channel Siamese network to learn a similarity metric for both the day pattern and the night pattern. The features of a night-pattern image contain some meaningful information, even if they are not sufficient on their own to support similarity learning. Moreover, the domain transition causes the loss of some feature details. Thus, we calculate the similarity metric of the images in one domain and fuse it with the similarity metric of the other domain, which benefits the generalization ability of the network.
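One simple way to realize this fusion, assumed here to be a weighted sum whose weights correspond to the parameter combinations evaluated in the experiments, is:

```python
def fused_distance(d_day, d_night, w=(0.3, 0.7)):
    # Weighted fusion of the day-branch and night-branch distances.
    # (0.3, 0.7) is the best-performing combination reported in the
    # experiments, but the linear fusion rule itself is an assumption.
    return w[0] * d_day + w[1] * d_night

fused = fused_distance(1.2, 0.8)
```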
VehicleID dataset [
VERI-Wild dataset [
The mean average precision (MAP), computed from the precision-recall (P-R) curve, is adopted in our experiments.
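Average precision per query can be computed from the ranked match list as the area under the P-R curve; a minimal sketch follows (MAP then averages this value over all query vehicles):

```python
import numpy as np

def average_precision(scores, labels):
    """AP as the area under the precision-recall curve, computed by the
    standard rank-based formula (mean precision at each true positive)."""
    order = np.argsort(-np.asarray(scores))       # rank by descending score
    labels = np.asarray(labels)[order]
    hits = np.cumsum(labels)                      # true positives up to rank k
    precision_at_k = hits / (np.arange(len(labels)) + 1)
    return float((precision_at_k * labels).sum() / labels.sum())

# Toy ranking for one query: 1 marks a correct match.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```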
Here we test the influence of different parameter combinations on the experimental results. There are six combinations: {0, 1}, {0.1, 0.9}, {0.2, 0.8}, {0.3, 0.7}, {0.4, 0.6}, and {0.5, 0.5}, and we set
| Combination | VehicleID | VERI-Wild |
|---|---|---|
| (0, 1) | 94.55 | 90.73 |
| (0.1, 0.9) | 94.21 | 90.25 |
| (0.2, 0.8) | 94.71 | 91.72 |
| (0.3, 0.7) | 95.74 | 92.55 |
| (0.4, 0.6) | 95.12 | 91.85 |
| (0.5, 0.5) | 92.68 | 89.12 |
The quantity of night-pattern images is much smaller than that of day-pattern images. Thus, we translate images from the day pattern to the night pattern to change the proportion of the images. It is clearly observed in the
We compare our method on the VehicleID and VERI-Wild datasets with several state-of-the-art methods, including LABNet [
| Methods | VERI-Wild MAP |
|---|---|
| LABNet [ | 82.61 |
| LABNet-50 [ | 81.05 |
| PVEN [ | 82.53 |
| Ours | 92.55 |
| Methods | VehicleID MAP |
|---|---|
| LABNet [ | 89.63 |
| LABNet-50 [ | 87.54 |
| DMML [ | 87.37 |
| Ours | 95.72 |
This paper proposes a cross-domain similarity metric learning method for vehicle matching through a two-channel Siamese network. In the proposed method, we first discriminate the day-night pattern of a pair of images and translate their domain to the other. The network can then learn the similarity metric between pairs of vehicle images whether or not they belong to the same domain, because we calculate the distance metrics of both domains. Experimental results confirm that the proposed method brings substantial improvements to vehicle matching accuracy. However, the proposed method relies on an extra network to distinguish the domain. In the future, we will make the domain transition generate images on demand, rather than requiring an additional discriminator.
We thank the teachers and students of our team for their guidance in the process of completing this article. We gratefully acknowledge the support of the Engineering Research Center of Digital Forensics for providing the RTX 3090 GPU used in this research.