This paper introduces the third enhanced version of a genetic algorithm-based technique for fast and accurate detection of vehicle plate numbers (VPLN) in challenging image datasets. Since binarization of the input image is the most important and difficult step in VPLN detection, a hybrid technique is introduced that fuses the outputs of three fast binarization techniques into a pool of connected component objects (CCOs), enriching the solution space with more solution candidates. Because the outputs of the three binarization techniques are combined, many CCOs are added to the output pool, from which one or more sequences are to be selected as candidate solutions. The pool is filtered and submitted to a new memetic algorithm that selects the best-fit sequence of CCOs based on an objective distance between the tested sequence and the defined geometrical relationship matrix, which represents the layout of the VPLN symbols inside the concerned plate prototype. Any of the previous versions gives moderate results but at very low speed. Hence, a new local search is added as a memetic operator to increase the fitness of the best chromosomes based on the linear arrangement of the license plate symbols. The memetic operator speeds up convergence to the best solution, compensating for the overhead of the hybrid binarization techniques and allowing real-time detection, especially after using GPUs to implement most of the employed routines. In addition, a deep convolutional network is used to detect false positives and prevent spurious detection of non-plate text or similar patterns. Image samples with a wide range of scale, orientation, and illumination conditions have been used in experiments to verify the effect of the new improvements.
Encouraging results with 97.55% detection precision have been reported using the recent challenging public Chinese City Parking Dataset (CCPD) outperforming the author of the dataset by 3.05% and the state-of-the-art technique by 1.45%.
Automatic license plate recognition (ALPR) is a basic component of Intelligent Transportation Systems (ITS) which is an essential system in smart cities. ALPR has many applications such as law enforcement, surveillance, and toll booth operations [
The proposed LP symbols detection system is composed of four phases and each phase is composed of many stages as shown in
In this phase, an input color image is subjected to four stages to extract a number N of foreground objects. These four stages (Color-to-Grey Conversion, Adaptive Binarization, Morphological Operations (optional), and Connected Component Analysis (CCA)) are different from those in the previous versions [
Unlike most of the work that deals with LP detection that uses the traditional color-to-grey formula [
In the first trial, the system performs binarization using a fixed-size window binarization method based on the Nick technique [
In the Nick method, the local threshold LT (
The second binarization technique is called Niblack [
The third binarization technique introduced by Singh et al. [
The reason for selecting the above three binarization techniques is their fast and complementary effects based on trial and error.
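Of the three, Niblack's rule is the simplest to state: each pixel is thresholded at T = m + k·s, where m and s are the mean and standard deviation of the surrounding local window and k is a small negative constant. The sketch below is a minimal NumPy/SciPy illustration of that rule, not the paper's tuned implementation; the window size and k value are typical defaults, assumed here for illustration:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def niblack_binarize(gray, window=25, k=-0.2):
    """Niblack local thresholding: T = m + k*s over a window x window patch.
    Returns a boolean mask where True marks (dark) foreground pixels."""
    gray = gray.astype(np.float64)
    mean = uniform_filter(gray, size=window)            # local mean m
    sq_mean = uniform_filter(gray * gray, size=window)  # local mean of squares
    std = np.sqrt(np.maximum(sq_mean - mean * mean, 0.0))
    return gray < mean + k * std
```

Broadly, the Nick and Singh variants form their thresholds from the same kind of local window statistics, which is what makes computing all three and fusing their outputs into one CCO pool relatively cheap.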
Since the adaptive filters may produce connected characters, it is sometimes helpful to use the opening operation to separate them. In this research, an erosion (or opening) operation is performed on the output of the Niblack filter using a vertical structuring element whose size equals one-third of the window length (WL/3). Because the Chinese character is composed of two or more CCOs, a dilation (or closing) operation is performed after the Delta technique using a horizontal structuring element whose size equals one-fifth of the window length (WL/5), both to unify the Chinese symbol and to connect broken symbols.
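Assuming binary masks as input, the two morphological steps can be sketched with SciPy as follows; the WL/3 and WL/5 element sizes follow the text, while the helper name and default window length are illustrative:

```python
import numpy as np
from scipy.ndimage import binary_erosion, binary_dilation

def separate_and_unify(binary, wl=25):
    """Sketch of the two morphological steps (WL = binarization window length)."""
    # Vertical erosion (WL/3 x 1) to split characters merged by the adaptive filter.
    v_se = np.ones((max(wl // 3, 1), 1), dtype=bool)
    separated = binary_erosion(binary, structure=v_se)
    # Horizontal dilation (1 x WL/5) to merge the multiple CCOs of the Chinese
    # symbol and to reconnect broken strokes.
    h_se = np.ones((1, max(wl // 5, 1)), dtype=bool)
    unified = binary_dilation(binary, structure=h_se)
    return separated, unified
```

In the paper the erosion is applied to the Niblack output and the dilation to the Delta output; the helper applies both to one mask only to keep the sketch compact.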
In this research, 4-connected component objects are detected, and their bounding boxes parameters are calculated relative to a rotating axis (from −85° to +85°). This allows using the exact dimensions of each object when it is included in a chromosome having a sequence of objects that have an inclined direction without resorting to rotating the prototype template or the sequence of objects themselves to obtain accurate objective distances as shown in
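A minimal illustration of this stage: label the 4-connected components, then compute each object's bounding box after rotating the pixel coordinates by a given angle. The helper below is a sketch (the paper precomputes these boxes over the full −85° to +85° range rather than per call):

```python
import numpy as np
from scipy.ndimage import label

def rotated_bboxes(binary, angle_deg):
    """Bounding boxes (x, y, w, h) of 4-connected components in a frame
    rotated by angle_deg."""
    structure = np.array([[0, 1, 0],
                          [1, 1, 1],
                          [0, 1, 0]])  # 4-connectivity
    labels, n = label(binary, structure=structure)
    theta = np.deg2rad(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    boxes = []
    for i in range(1, n + 1):
        ys, xs = np.nonzero(labels == i)
        xr = c * xs + s * ys    # rotate pixel coordinates
        yr = -s * xs + c * ys
        boxes.append((xr.min(), yr.min(), xr.max() - xr.min(), yr.max() - yr.min()))
    return boxes
```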
Object filtering is carried out through three stages. The first stage filters out objects based on their properties (width, height, aspect ratio (Asp), and extension (Ext)), where the extension is the number of foreground pixels divided by the object area. The second stage removes repeated objects. The third stage is region-based filtering, which depends on the distribution of objects around each object.
Assuming the image width is ImageWidth, then the maximum width of an object (MaxW) is equal to ImageWidth/L; where L is the number of license plate symbols. The maximum height (MaxH) of an object is (2
Any two objects (j1 and j2) having similar coordinates (X, Y) and dimensions (W, H) are unified, meaning that only one of them is retained and the other removed, where the similarity (S) is given by the following logical expression:
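As a hedged stand-in for that logical expression, the sketch below treats two objects as duplicates when all four bounding-box parameters agree within a small tolerance; the tolerance value is an assumption, not the paper's:

```python
def dedupe_objects(objs, tol=2):
    """Second filtering stage: drop near-duplicate objects.
    Each object is an (x, y, w, h) tuple; tol is an assumed pixel tolerance."""
    kept = []
    for o in objs:
        # Keep o only if no already-kept object matches it in all 4 parameters.
        if not any(all(abs(o[i] - k[i]) <= tol for i in range(4)) for k in kept):
            kept.append(o)
    return kept
```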
This stage depends on the distribution of objects over two sizes of square areas, small-size squares, and large-size squares. Small-size squares are of size 10 × 10. Large size squares have a size of V × V, where V is estimated based on the average object width (Wavg) of all the detected objects (M) and the number of symbols in an LP (L) as follows:
Then we create two matrices each having a number of cells equal to the number of squares in the image, one for the small squares and the other for the large squares. Every object is assigned to its corresponding cell in the two matrices based on its X and Y position and each cell of both matrices contains the number of belonging objects. The integral matrix is calculated for the small-size matrix.
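With the integral (summed-area) matrix, the number of objects inside any rectangular block of small squares costs only four lookups, which is what makes the neighborhood tests of the next stage cheap. A minimal sketch:

```python
import numpy as np

def object_count_integral(counts):
    """Integral matrix of the per-square object counts."""
    return counts.cumsum(axis=0).cumsum(axis=1)

def block_sum(integral, r0, c0, r1, c1):
    """Number of objects in squares [r0..r1] x [c0..c1] (inclusive)."""
    total = integral[r1, c1]
    if r0 > 0:
        total -= integral[r0 - 1, c1]
    if c0 > 0:
        total -= integral[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += integral[r0 - 1, c0 - 1]
    return total
```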
Because the symbols in an LP lie near each other with a semi-fixed spacing between every two consecutive symbols, and this spacing is related to the average symbol width, each LP symbol should have at least 3 neighboring symbols around it if we define the symbol's neighborhood as the square area that surrounds the symbol and has dimensions
Objects in very high-density regions are removed based on the large-size squares matrix by removing objects that belong to cells having more than 3 × (L + 3) objects (because we have CCOs output from three binarization techniques and the Chinese symbol may be composed of two or more CCOs).
Significant enhancements of this phase have been carried out on the updated GA in [
As in [
In [
The main relations (RX, RY, and RH) defined in [
These angle dependent relations will be used in calculating the objective distance based on the horizontal (at angle 0°) geometric relationship matrix GRM for the concerned plate that is defined as follows:
Hence to calculate the distance between any chromosome
In this research, two penalties are added to impose more accurate localization.
The first penalty is added to impose correct ordering of objects inside the LP and maintain a minimum spacing between objects in the X-direction as follows:
The second penalty is added to prevent large deviation in the height of the LP objects as follows:
Also, the weights in the final objective distance, between a prototype p and any chromosome k, are modified to give more weighting for the X and H components as follows:
Hence, the fitness of a chromosome
In [
If we consider the maximum acceptable objective distance threshold (
Due to errors caused by tilt and because a powerful fake detection phase is used, the
The mutation and crossover operators are left intact as in [
In [
In [
The objective function of a chromosome measures the distance between the current chromosome and the LP prototype and is used to assign a relative fitness within the current population. In contrast, we need to define a genetic distance that measures the fitness of the current gene within its chromosome, relative to the other genes of the same chromosome, based on the geometric properties of the LP problem. The purpose of the memetic operator is to minimize this memetic distance using a local search [
In this operator the collinearity of a chromosome’s objects is increased by replacing the most deviated objects with others to ultimately detect the sequence of LP symbols that lie on a single straight line. By considering the Hough Line transform [
Since the LP objects represented in a certain chromosome should have similarities in their widths (W) and heights (H), the MA operator should maximize these similarities by considering the deviation of each gene property from the mean value of the same property in the current chromosome. Hence, the proposed algorithm is composed of three steps as follows:
Hence the output of this step is the index of the worst gene
If one of the above conditions is true the worst fit object obworst is replaced by a best-fit object
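The replacement step can be sketched as follows. This is a simplified illustration, not the paper's exact rules: least-squares line fitting stands in for the Hough-based line representation, the acceptance condition is illustrative, and objects are assumed to be (x, y, w, h) tuples:

```python
import numpy as np

def _centers(objs):
    """Object centres; each object is an (x, y, w, h) bounding box."""
    return np.array([(x + w / 2.0, y + h / 2.0) for x, y, w, h in objs])

def _line_dist(a, b, pts):
    """Perpendicular distance of points to the line y = a*x + b."""
    return np.abs(a * pts[:, 0] - pts[:, 1] + b) / np.hypot(a, 1.0)

def improve_collinearity(chromosome, pool):
    """One local-search step: swap the most line-deviated gene for the pool
    object closest to the line through the remaining centres."""
    pts = _centers(chromosome)
    a, b = np.polyfit(pts[:, 0], pts[:, 1], 1)      # line through all centres
    worst = int(np.argmax(_line_dist(a, b, pts)))   # most deviated gene
    rest = np.delete(pts, worst, axis=0)
    a, b = np.polyfit(rest[:, 0], rest[:, 1], 1)    # refit without the worst gene
    cand = _centers(pool)
    cdist = _line_dist(a, b, cand)
    best = int(np.argmin(cdist))
    out = list(chromosome)
    if cdist[best] < _line_dist(a, b, pts[worst:worst + 1])[0]:
        out[worst] = pool[best]                     # accept only if it improves
    return out
```

Iterating this step (and the analogous width/height replacements) drives the chromosome toward a sequence of objects lying on a single straight line.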
By applying the above algorithm, the chromosome state is moved to a local minimum as in the Hill-climbing algorithm which speeds up the search process significantly. For instance, after testing the plate used in the experiment shown in
It is worth mentioning that the proposed memetic operator mostly allows a partial solution to reach the optimum one even if the partial solution has only 2 out of 6 symbols of the correct (or semi-correct) solution in a given chromosome as in
Unlike the familiar mutation operator, which is executed with a fixed probability unrelated to the fitness or objective distance of the chromosome, the introduced memetic operator is performed only on the best-fit chromosomes, whose objective distances (ODs) are less than the value (ODmin + ODavg/10), to reduce the processing time and concentrate on the most promising solutions.
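That selection rule can be stated in a few lines; the sketch below is a direct transcription of the stated threshold, assuming the population and its OD values are held in plain lists:

```python
def memetic_candidates(population, ods):
    """Select chromosomes eligible for the memetic operator:
    those with OD < ODmin + ODavg / 10."""
    od_min = min(ods)
    od_avg = sum(ods) / len(ods)
    cutoff = od_min + od_avg / 10.0
    return [chrom for chrom, od in zip(population, ods) if od < cutoff]
```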
In the previous paper [
1- Generating the training data set
In this step the MA-based algorithm is run on the Base training dataset and for each image the MA-based algorithm is run until an LP is found based on the ground truth data, where the TP regions are stored in the TP folder and the FP ones are stored in the FP folder.
2- Training a ready-made CNN model on the generated training set
In this step we tested two ready-made CNN models, mainly GoogleNet [
3- Optimization of the binary-classification accuracy of the model based on the training data
In this step, enhancement of the produced model is attempted in two dimensions. The first dimension regards data augmentation during the training process. In all the conducted experiments, random rotation in the range from −30° to +30° and random shear in both the X and Y directions in the range from −10 to +10 are performed. The second dimension regards the contents of the images prepared for the training process. Two cases are tested: in the first case, the training images contain only the regions that just include the LP, while in the second case the image area is longitudinally doubled and the LP region is kept in the center.
A third experiment is conducted based on the second case but after replacing the global average pooling layer in the GoogleNet with a 3 × 3 average pooling layer with a stride of 2 × 2, such that the output of the pooling layer is 3 × 3 × 1024 instead of 1 × 1 × 1024. This modification increased the detection accuracy on the Rotate dataset section by 1.6%, reaching 97.8%, because it allowed the LP context to be well presented to the next fully connected layer. A further experiment is conducted after increasing the height of the LP area by double the average height of its symbols (extended by Havg from each side) and removing the pooling layer, such that the input to the fully connected layer is 7 × 7 × 1024. This modification resulted in an increase of 0.90%, reaching 98.70% for the Rotate data section. When the ground truth data is used for verification instead of the CNN model, the accuracy on the Rotate section reaches 99.75%, which means that the candidates proposed by the MA-based technique cover most of the solution space; however, because the CNN model is the only data-dependent unit in the proposed technique, the accuracy is still below this upper limit. Since the technique is based on MA and CNN, it will be referred to as the MA-CNN technique.
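The 3 × 3 output follows from the standard pooling arithmetic, floor((n − k)/s) + 1, applied to GoogleNet's 7 × 7 final feature map; a quick check:

```python
def pool_output(size, kernel, stride):
    """Spatial output size of a pooling layer with no padding:
    floor((size - kernel) / stride) + 1."""
    return (size - kernel) // stride + 1

# 7x7 feature map with a 3x3 / stride-2 average pooling -> 3x3 spatial output,
# so the fully connected layer sees 3 x 3 x 1024 instead of 1 x 1 x 1024.
modified = pool_output(7, 3, 2)
# The original global pooling (kernel 7, stride 1) collapses it to 1x1.
original = pool_output(7, 7, 1)
```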
The evaluation of the MA-CNN technique will be based on the CCPD benchmark dataset [
As in [
Experiments on the AOLP dataset will also consider recall, defined as Recall = TP/(TP + FN), where FN is the number of false negatives.
All the experiments performed in this research are implemented using MATLAB 19B on a PC with 32 GB of random-access memory (RAM), an Intel Core i7-8700K CPU @ 3.70 GHz, and an NVIDIA GeForce GTX 1080 Ti graphics processing unit (GPU) with 11 GB of RAM. Since in [
| Detection technique | FPS | AP | Base | DB | Weather | FN | Rotate | Tilt | Challenge |
|---|---|---|---|---|---|---|---|---|---|
| Faster R-CNN [ | 15 | 92.9 | 98.1 | 92.1 | 81.8 | 83.7 | 91.8 | 89.4 | 83.9 |
| YOLO9000 [ | 42 | 93.1 | 98.8 | 89.6 | 84.2 | 77.3 | 93.3 | 91.8 | 88.6 |
| SSD300 [ | 40 | 94.4 | 99.1 | 89.2 | 83.4 | 84.7 | 95.6 | 94.9 | 93.1 |
| TE2E [ | 3 | 94.2 | 98.5 | 91.7 | 83.6 | 83.8 | 95.1 | 94.5 | 93.1 |
| RPnet [ | 61 | 94.5 | 99.3 | 89.5 | 84.1 | 85.3 | 94.7 | 93.2 | 92.8 |
| Ref [ | 14 | 96.1 | | | | | | | |
| MA-CNN technique | 23.2 | 90.63 | 92.65 | | | | | | |
From
To adapt the technique parameters for the AOLP dataset, the blue channel is considered with an equal weight to other channels in
| Detection technique | AC Precision | AC Recall | LE Precision | LE Recall | RP Precision | RP Recall |
|---|---|---|---|---|---|---|
| Ref. [ | 91% | 96% | 91% | 95% | 91% | 94% |
| Ref. [ | 99.71% | 99.4% | 99.47% | | | |
| Ref. [ | 99.56% | 99.87% | 99.87% | 99.02% | 99.18% | |
| MA-CNN | 99.27% | 99.02% | 99.18% | | | |
Based on the results in
Although the MA-CNN has been accelerated using C++ and GPU computing in about 50% of the implemented routines, it is still relatively slow compared to the state-of-the-art real-time techniques. This can be overcome by re-implementing all the routines in the memetic algorithm using GPUs.
A novel MA-CNN technique has been introduced for the localization of the LP symbols using a memetic algorithm that proposes a candidate solution that is verified by a CNN model which filters out FP solutions and pushes the technique to do more searches for the best solution.
A significant enhancement of the previously designed GA-based system for localizing the vehicle plate number inside input images has been introduced to support fast and accurate detection of LPs in complex contexts. Hence, a deeper solution to the binarization problems has been introduced to overcome difficult illumination and physical conditions. In the binarization stage, a hybrid local adaptive technique with different window sizes and binarization parameters has been introduced to enrich the solution space and to guarantee the detection of the TP solutions, if they exist, under different perspective and illumination contexts. A neighbourhood-based filtering stage has been introduced to minimize the number of non-LP symbols and hence shrink the MA search space, supporting fast and accurate detection of the LP symbols. The output of the connected component analysis stage is enriched by storing the positions and dimensions of every CCA object at any angle, allowing accurate calculation of the objective distance between the chromosome genes and the concerned LP GRM defined in one horizontal direction, and resulting in more accurate and faster localization of the unknown plate. The fake-detection stage of the previous versions has been replaced with a CNN model that is optimized to differentiate between LP and non-LP regions based on the surrounding context of the localized plate. A novel memetic operator has been introduced based on the Hough line representation to perform a local search for the linearly aligned LP symbols, which significantly reduces the search time and guarantees convergence to the global minimum with the least number of iterations.
Finally, the new MA-CNN technique has been tested on two challenging benchmark datasets, resulting in highly accurate detection of the LP with a low FP rate. The precision of the new system now competes with state-of-the-art techniques while maintaining a wider range of angle and scale. The wide range of complexities of the LP images in the experimented datasets proves the robustness of the new system and its high ability to overcome challenging contexts.
The author wishes to thank all the FCIT members, especially the Dean and the Head of the CIS department for the facilities provided to complete this research.