In an augmented reality system, the position and direction of the user’s point of view and line of sight in the real scene are acquired in real time. This position and direction information determines the exact position of the virtual object in the real scene, and the various coordinate systems are established according to the user’s line of sight, so tracking and registration technology is critical. This paper proposes an accurate, stable, and effective augmented reality registration algorithm. The method combines ORB (oriented FAST and rotated BRIEF) feature matching with RANSAC (random sample consensus) to obtain the homography matrix and then uses the KLT (Kanade-Lucas-Tomasi) algorithm to track the marker. This alleviates the error-accumulation defect of registration methods based on natural feature tracking and improves the stability and accuracy of registration. Experiments show that the proposed algorithm is accurate, stable, and effective, and can complete the virtual-real registration task accurately and stably even when part of the marker is not visible.
Augmented Reality (AR) can be defined as a system that fulfills three basic features: a combination of real and virtual worlds, real-time interaction, and accurate 3D registration of virtual and real objects [
The concept of AR is defined by Professor Ronald T. Azuma in [
The structure of the paper is as follows. Section 2 reviews the 3D registration methods of augmented reality; among these, markerless registration has gradually become a new hot spot in augmented reality research, so by comparing the three natural-feature-point registration methods, SIFT (scale-invariant feature transform), SURF (speeded up robust features), and ORB (oriented FAST and rotated BRIEF), the paper proposes an improvement and application of the ORB algorithm. Section 3 describes the coordinate-system transformations involved in the augmented reality implementation. Section 4 describes the problems of the traditional ORB algorithm, proposes our improvement, and uses the KLT (Kanade-Lucas-Tomasi) algorithm to track the marker. Finally, the experimental results and conclusions are discussed in Sections 5 and 6.
To achieve accurate tracking and registration in an augmented reality system, the position and orientation of the camera in the real scene, i.e., the position and direction of the user’s point of view, must be obtained in real time, so that the conversion relationships between the coordinate systems can be applied according to the camera position and the superimposed virtual objects can be projected correctly to the exact position in the real scene. The registration step is therefore a necessary prerequisite for seamless integration of the real and virtual scenes: even a slight error in the 3D registration module makes the registration result wrong and causes the fusion between the real scene and the virtual object to fail [
In the field of computer vision, the feature point refers to the point where the gray value of the image changes drastically or a point with a larger curvature on the edge of the image. The feature point is composed of two parts: a key point and a descriptor. The key point refers to the specific position of the feature point in the image [
The three registration methods based on natural feature points are SIFT (scale-invariant feature transform), SURF (speeded up robust features), and ORB (oriented FAST and rotated BRIEF). SIFT features are invariant to rotation, scale, and illumination, and each feature point is highly distinctive, but the computation is enormous. The SURF feature uses the Hessian matrix for feature extraction; compared with SIFT, SURF has a significant advantage in extraction speed. The ORB feature is built on the FAST keypoint detector and the binary descriptor BRIEF; its extraction speed is extremely fast, and it has rotation and scale invariance. Since feature extraction and matching are only one of many links in the system, ORB features are usually selected for systems with high real-time requirements. KLT is a feature-point tracking algorithm based on the principle of optical flow and rests on the three standard optical-flow assumptions. Unlike optical-flow methods that directly compare the gray values of individual pixels, the KLT algorithm compares the window of pixels around each point to find the most similar location.
This paper uses the ORB (oriented FAST and rotated BRIEF) feature matching algorithm combined with the random sample consensus (RANSAC) method to obtain the homography matrix and complete the virtual-real registration, i.e., to recover the three-dimensional information of the real scene image, and then uses the KLT tracking algorithm to track the marker.
AR needs to consider multiple transformation systems, see
The Cartesian coordinate system is used mainly for its simplicity and familiarity and most virtual spaces are defined by it. The x-y-z based coordinate system is precise for specifying the location of 3D objects in virtual space. The three coordinate planes are perpendicular to each other. Distances and locations are specified from the point of origin which is the point where the three planes intersect with each other. This system is mainly used for defining visual coordinates of 3D objects.
The image collected by the camera is converted into a digital image by a high-speed image acquisition system in the form of a standard TV signal and then input into the computer. Each image is an M×N array; the value of each element in the M rows and N columns is the brightness of that image point, and (u, v) are the coordinates of a pixel in the pixel coordinate system.
Since (u, v) only represents the number of columns and rows of the pixel in the array, the position of the pixel in the image is not expressed in physical units. Therefore, it is necessary to establish an image coordinate system expressed in physical units, that is, the XOY coordinate system shown in the
Then (u0, v0) are the coordinates of the origin of the image coordinate system in the pixel coordinate system, dx and dy are the physical dimensions of each pixel in the x and y directions of the image plane.
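As a minimal sketch of the conversion above (the function name and signature are ours, not from the paper), a pixel index (u, v) maps to physical image coordinates by subtracting the principal point (u0, v0) and scaling by the pixel sizes dx, dy:

```python
def pixel_to_image(u, v, u0, v0, dx, dy):
    """Convert pixel indices (u, v) to physical image-plane coordinates
    (x, y), where (u0, v0) is the principal point in pixels and dx, dy
    are the physical dimensions of one pixel along x and y."""
    x = (u - u0) * dx
    y = (v - v0) * dy
    return x, y
```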
As shown in
The virtual coordinate system is a geometric description of virtual objects. When using an augmented reality system, the virtual coordinate system coincides with the world coordinate system or is proportional to each other. The transformation relationship between them is as follows:
For any point P, the intersection of the line OP with the image plane is the projection position of point P on the image. This kind of projection is also called central projection or perspective projection. Assuming the coordinates of the projected point p in the imaging-plane coordinate system are (x, y) and the homogeneous coordinates of P in the camera coordinate system are (Xc, Yc, Zc, 1), the perspective projection relationship can be described in homogeneous coordinates and matrix form:
According to formulas (1) and (2), the relationship between the coordinates of point p and the corresponding projection coordinates (u, v) of point p can be obtained, as shown in formula (5):
M is the projection matrix. M1 is determined by the four parameters bx, by, u0, v0, which are related only to the camera’s internal structure, so M1 is called the internal (intrinsic) parameter matrix. M2 depends on the pose of the camera relative to the world coordinate system and is called the external (extrinsic) parameter matrix. Determining the internal and external parameters of a camera is called camera calibration.
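The intrinsic part of this projection can be sketched in a few lines (the parameter names fx, fy, meaning focal length expressed in pixels, are our notation, not the paper’s bx, by):

```python
def project_point(Xc, Yc, Zc, fx, fy, u0, v0):
    """Pinhole perspective projection: a point (Xc, Yc, Zc) in camera
    coordinates maps to pixel coordinates (u, v), with fx, fy the focal
    length in pixel units and (u0, v0) the principal point."""
    u = fx * Xc / Zc + u0
    v = fy * Yc / Zc + v0
    return u, v
```

A point on the optical axis (Xc = Yc = 0) projects to the principal point, which is a quick sanity check for calibration values.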
Registration technology enables virtual images to be superimposed accurately in the proper environment. The main flow of 3D registration technology has two steps. First, determine the relationship between the virtual image or model and the direction and position of the camera or display device. Second, project the rendered virtual image or model accurately into the real environment, so that it merges with the real environment. Three-dimensional registration based on computer vision sets reference points so that the camera or display can determine the direction and position of the real scene.
To realize the three-dimensional registration process in augmented reality, the internal and external parameters of the camera are required. Camera calibration determines the internal parameter matrix in (5), and the external parameter matrix M contains a translation component T and three rotation components. Therefore, the external parameter matrix M of each image frame obtained during three-dimensional registration can be uniquely determined, which gives the position of the point, that is, the accurate registration position of the virtual object in the real scene.
Registration based on special manual markers has some limitations. This paper therefore uses the ORB (oriented FAST and rotated BRIEF) feature matching algorithm combined with the random sample consensus (RANSAC) method to obtain the homography matrix and complete the virtual-real registration, that is, to recover the three-dimensional information of the real scene image, and then uses the KLT tracking algorithm to track the marker. Experiments show that this tracking and registration algorithm based on natural feature points can accurately identify natural objects and has a certain resistance to occlusion.
ORB (oriented FAST and rotated BRIEF) was developed at OpenCV Labs by Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary R. Bradski in 2011 as an efficient and viable alternative to SIFT and SURF. ORB was conceived mainly because SIFT and SURF are patented, whereas ORB is free to use. ORB performs as well as SIFT on the task of feature detection (and better than SURF) while being almost two orders of magnitude faster. ORB builds on the well-known FAST keypoint detector and the BRIEF descriptor, both attractive because of their good performance and low cost [
The ORB algorithm is based on the FAST algorithm for feature point detection [
Here x is a pixel on the circle around the candidate point p; I(x) and I(p) are the gray values of x and p in the target image; and εd is a given threshold. If the number of pixels on the circle satisfying the above formula exceeds the preset count N, then p is a corner point.
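The test above can be sketched as follows. This is a simplified illustration, not the full FAST algorithm: it counts qualifying pixels on the standard 16-pixel circle but ignores the contiguous-arc requirement for brevity, and the function name and parameters are ours.

```python
# Offsets of the 16-pixel Bresenham circle of radius 3 around the candidate.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def is_fast_corner(img, px, py, eps_d, n_required=12):
    """Simplified FAST-style test: p is a corner if at least n_required
    circle pixels differ from the center gray value by more than eps_d.
    img is a 2D array indexed as img[row][col]."""
    center = img[py][px]
    count = sum(1 for dx, dy in CIRCLE
                if abs(img[py + dy][px + dx] - center) > eps_d)
    return count >= n_required
```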
To obtain a rotation-invariant feature, the first moments are used to compute the local orientation through a weighted average of pixel intensities over the local region. The moments of an image block are defined in the literature as follows.
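This intensity-centroid orientation can be sketched directly from the moment definition (a minimal illustration, with the patch represented as a plain 2D list and the origin taken at the patch center):

```python
import math

def orb_orientation(patch):
    """Orientation of a square image patch by the intensity-centroid
    method used in ORB: theta = atan2(m01, m10), where
    m_pq = sum(x^p * y^q * I(x, y)) over the patch."""
    h, w = len(patch), len(patch[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    m10 = m01 = 0.0
    for y, row in enumerate(patch):
        for x, val in enumerate(row):
            m10 += (x - cx) * val
            m01 += (y - cy) * val
    return math.atan2(m01, m10)
```

A patch whose brightness increases purely along the x axis yields an orientation of zero, which matches the intuition that the centroid lies to the right of the center.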
We can describe the basic idea of the BRIEF feature descriptor as follows.
Randomly select n pixel pairs in the neighborhood of a feature point and compare the gray values of each pair according to the binary rule, generating a binary string of length n, which is the feature descriptor of that feature point. The ORB feature detection algorithm is then derived [
The binary criterion is defined by selecting nd (x, y) position pairs; the resulting nd-dimensional binary bit string is the BRIEF descriptor.
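A minimal sketch of the binary test and of descriptor matching (function names and the point-pair format are ours; real BRIEF smooths the patch first and uses a fixed sampling pattern):

```python
def brief_descriptor(patch, pairs):
    """BRIEF-style binary descriptor: one bit per sampled point pair,
    1 if the first point is darker than the second. Each pair is
    ((x1, y1), (x2, y2)) with patch indexed as patch[y][x]."""
    return [1 if patch[y1][x1] < patch[y2][x2] else 0
            for (x1, y1), (x2, y2) in pairs]

def hamming(d1, d2):
    """Binary descriptors are matched by Hamming distance."""
    return sum(a != b for a, b in zip(d1, d2))
```

The Hamming distance is why binary descriptors such as BRIEF match so quickly: it reduces to XOR and bit counting.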
Optical Flow is the projection of the motion of an object in a three-dimensional space on a two-dimensional image plane. It is generated by the relative speed of the object and the camera and reflects the direction and speed of the image pixel corresponding to the object in a tiny time. KLT (Kanade-Lucas-Tomasi) is a feature point tracking algorithm based on the principle of optical flow [
The KLT algorithm is essentially based on the three assumptions of the optical-flow principle. Unlike optical-flow methods that directly compare the gray values of individual pixels, the KLT algorithm compares the window of pixels around each point to find the most similar location. Under the optical-flow assumptions, within a short time τ the two images before and after satisfy:
The pixel displacement satisfies the affine motion model δ = Dx + d, where D is called the deformation matrix and d the displacement vector. D represents the deformation between the two pixel windows after the motion, so when the window is small it is difficult to estimate; it is usually used to measure the similarity of the two pixel windows, that is, whether a feature point has drifted. For optical-flow tracking, only the translation model is considered:
Under the pixel window, construct the error function:
where w(x) is a weighting function, which can be taken as a Gaussian. Taking derivatives of the above equation with respect to D and d respectively:
Given the translational motion model, let D = 0:
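With D = 0, the tracker reduces to solving a 2×2 linear system built from the image gradients over the window. A minimal sketch under that assumption (gradient lists and the function name are ours; a real implementation computes Ix, Iy, It from image patches and iterates):

```python
def lk_translation(Ix, Iy, It):
    """Solve the 2x2 Lucas-Kanade system for the pure-translation model:
    [sum Ix^2, sum IxIy; sum IxIy, sum Iy^2] d = -[sum IxIt; sum IyIt].
    Ix, Iy, It are the spatial and temporal gradients over the window."""
    gxx = sum(ix * ix for ix in Ix)
    gxy = sum(ix * iy for ix, iy in zip(Ix, Iy))
    gyy = sum(iy * iy for iy in Iy)
    bx = -sum(ix * it for ix, it in zip(Ix, It))
    by = -sum(iy * it for iy, it in zip(Iy, It))
    det = gxx * gyy - gxy * gxy
    if abs(det) < 1e-12:          # degenerate window (aperture problem)
        return None
    dx = (gyy * bx - gxy * by) / det
    dy = (gxx * by - gxy * bx) / det
    return dx, dy
```

A window with gradient in only one direction makes the determinant vanish, which is exactly the aperture problem: the displacement along the edge cannot be recovered.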
However, the KLT algorithm cannot maintain the number of feature points, so the number of feature points will gradually decrease, and the reduction of feature points will affect the accuracy of the homography matrix solution [
In this paper, the homography matrix is obtained by ORB feature matching combined with random sample consensus, and the KLT tracking algorithm is then used to track the markers. Through the previous equations (1), (2), and (3), the value of the homography matrix can be obtained by an iterative method; for the initial frame, it is obtained by ORB feature matching combined with RANSAC [
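In practice the matching-plus-RANSAC step is typically delegated to a library (e.g., OpenCV's cv2.ORB_create and cv2.findHomography with the RANSAC flag). As a small library-free illustration of what the estimated matrix does, applying a 3×3 homography H to a pixel means a matrix-vector product in homogeneous coordinates followed by normalization:

```python
def apply_homography(H, u, v):
    """Map pixel (u, v) through a 3x3 homography H, given as nested
    lists: homogeneous product, then division by the third component."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return x / w, y / w
```

The normalization by w is what distinguishes a homography from an affine map and lets it model perspective distortion of the planar marker.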
Therefore, the camera pose M can be calculated to complete the virtual and real registration.
The experiment runs on Windows 10; VS 2015, OpenCV 3.2.1, and OpenGL are used to implement the image registration method based on improved ORB and RANSAC. The hardware configuration includes an Intel Core i7-7700K CPU at 3.60 GHz, 8 GB of memory, and a Microsoft HD-3000 camera.
In the
Through experiments, it is found that the algorithm in this paper can accurately track and register flat natural objects. From
Aiming at the low registration rate and high error rate of the traditional ORB algorithm, this paper proposes an improved AR image registration method based on the ORB algorithm and RANSAC: the homography matrix is obtained to complete the virtual-real registration, that is, to recover the three-dimensional information of the real scene image, and the KLT tracking algorithm is then used to track the marker. Experiments show that the tracking and registration algorithm based on natural feature points can accurately identify natural objects and has a certain resistance to occlusion. In future work, we will further optimize the algorithm to satisfy the requirements of various environments, and research on camera pose calculation will be continued to further enhance the scene effect.