The safety of the elderly population is becoming a pressing concern in today's society, especially for those who live alone, who face daily risks such as accidental falls or sudden illness. To address these problems, indoor positioning can be a critical way to monitor their states. With the rapid development of imaging techniques, wearable and portable cameras have become very popular and can be mounted on an individual. In view of the advantages of visual positioning, the authors propose a binocular visual positioning algorithm to locate the elderly indoors in real time. In this paper, the imaging model is first established with corrected image data from the binocular camera; feature extraction is then completed to provide a reference for adjacent image matching based on the binary robust independent elementary feature (BRIEF) descriptor; finally, the camera movement and the state of the elderly are estimated to identify falling risk. In the experiments, RealSense D435i sensors were adopted as the binocular cameras to obtain indoor images, and three experimental scenarios were carried out to test the proposed method. The results show that the proposed algorithm can effectively locate the elderly indoors and improve real-time monitoring capability.

Due to the decline of physical function in the elderly and the influence of various chronic diseases, it is of great social significance to improve the safety monitoring of their daily behaviors, especially for those who live alone. Indoor positioning technology with high accuracy and reliability can help to determine an individual's position in real time. Various technologies for indoor positioning, such as Wi-Fi, Bluetooth, ultrasonic positioning, radio frequency identification (RFID), and ultra-wideband (UWB), have been proposed in the past decades. Wi-Fi [

Throughout the development of visual positioning, Sattler et al. [

For binocular indoor monitoring of the elderly, the cameras are mounted on the individual, and the most critical part is positioning the carrier across successive images. The popular implementations of binocular positioning mainly include the optical flow method and the feature point method. The optical flow method relies on the illumination invariance assumption [

The feature matching of binocular positioning can be divided into two aspects, namely the matching of successive frames and the matching between the left and right images. For successive-frame matching, the widely used direct matching method carries a large computational burden, so this paper adopts the approximate nearest-neighbor algorithm to simplify the process. The depth information of a feature point can be obtained from the left and right images, a step also known as stereo matching, which can be divided into the global stereo matching algorithm and the local stereo matching algorithm [

Given the need for accurate, real-time stereo matching in indoor positioning for the elderly living alone, this paper proposes a binocular positioning method based on a feature point extraction algorithm that unconventionally uses the BRIEF descriptor to construct the cost function and obtain accurate depth information. The proposed algorithm finds feature correspondences between two images by tracking ORB feature points, and a PnP 3D-2D model is built to perform real-time motion estimation of the individual. The position and movement of the elderly can thus be monitored in a timely manner, making it possible to send out warning messages for potential risks. Through a series of experiments, the feasibility and effectiveness of the proposed indoor positioning technique have been evaluated.

The organization of this paper is as follows. Section 2 presents the general principles of binocular positioning and the schematic process of the proposed design. Section 3 describes the feature matching, the stereo matching method, and the implementation details of the BRIEF descriptor. Section 4 verifies the proposed algorithm through a set of indoor tests. Section 5 concludes the paper.

The main flow diagram of the binocular positioning system is shown in

The detailed algorithm diagram is shown in

In the key point tracking process, a time threshold ΔT has been designed. It is assumed that the elderly generally move slowly; that is to say, there would be plenty of common key points in sequential images if the elderly person is in a normal situation. If the elderly person falls accidentally, there would be hardly any key points that can be tracked. Therefore, the key point tracking between two frames within the threshold is used to evaluate whether the elderly person is in danger of falling.

The binocular camera model is shown in

The camera coordinate of a certain point in space is P^c = (x_c, y_c, z_c)^T. Its projections onto the left and right image planes have the pixel coordinates (u_l, v_l) and (u_r, v_r), respectively. Under the ideal coplanar, row-aligned model, v_l = v_r, and the image coordinates (x_l, y_l) and (x_r, y_r) are related to P^c through similar triangles.

After simplification:
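As a numeric sketch of the simplified model, depth follows from disparity as z = f·b/(u_l − u_r); the focal length and baseline below are illustrative assumptions, not the paper's calibration values.

```python
# Depth from disparity under the ideal rectified stereo model:
# z = f * b / (u_l - u_r). Parameter values are illustrative only.
def stereo_depth(u_l, u_r, f_px=640.0, baseline_m=0.05):
    """Return depth in metres for a matched pixel pair (assumed f, b)."""
    d = u_l - u_r          # disparity in pixels
    if d <= 0:
        raise ValueError("disparity must be positive for a valid match")
    return f_px * baseline_m / d

# A point with 16 px disparity lies at 640 * 0.05 / 16 = 2.0 m.
print(stereo_depth(400.0, 384.0))  # -> 2.0
```

Note how depth is inversely proportional to disparity: distant points produce small disparities, which is why a short 5 cm baseline limits long-range accuracy.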

Based on this model, the stereo alignment algorithm, feature extraction algorithm, feature point matching algorithm and motion estimation algorithm are studied gradually to achieve the indoor positioning of the elderly. In addition, the feature point matching algorithm has been employed to monitor the falling of the elderly.

In actual situations, it is often difficult for two cameras to achieve the ideal coplanar and row-aligned configuration. Generally, a point P^w = (x_w, y_w, z_w)^T in the world coordinate system is related by a rotation matrix and translation vector to its coordinates P^{cl} and P^{cr} in the left and right camera frames, respectively.

The expressions of P^{cl} and P^{cr} in terms of P^{w} are

In these expressions, (R_{wcl}, t_{wcl}) and (R_{wcr}, t_{wcr}) are the external parameters of the left and right camera positions. Combining the three formulas gives:

Because of the projection errors, the lens distortion must also be modeled, where k_1, k_2, k_3 are the radial distortion parameters and p_1, p_2 are the tangential distortion parameters.
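A minimal sketch of this distortion model applied to normalized image coordinates; the coefficient names follow the text, while the values passed in below are illustrative.

```python
def distort(x, y, k1, k2, k3, p1, p2):
    """Apply the radial (k1, k2, k3) and tangential (p1, p2) distortion
    model to normalized image coordinates (x, y)."""
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return x_d, y_d

# With all coefficients zero the mapping is the identity.
print(distort(0.3, -0.2, 0, 0, 0, 0, 0))  # -> (0.3, -0.2)
```

Stereo correction inverts this mapping so that the rectified images obey the straight-line projection assumed by the model above.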

With the estimated rotation matrix R between the two cameras, it can be decomposed into two half-rotations R_l and R_r, applied to the left and right images, respectively.

Afterwards, the left and right image planes are rotated by R_l and R_r so that they become coplanar.

However, the baselines of the two image planes are still not parallel, so a transformation matrix R_rect = [e_1^T, e_2^T, e_3^T]^T is constructed, where e_1 = t/‖t‖, e_2 = [−t_y, t_x, 0]^T/√(t_x² + t_y²), and e_3 = e_1 × e_2.
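This construction can be sketched directly; the baseline vector used below is an illustrative value, loosely matching the 5 cm baseline used in the experiments.

```python
import numpy as np

def rect_rotation(t):
    """Build R_rect from the baseline translation t between the cameras:
    e1 along the baseline, e2 orthogonal to e1 and the optical axis,
    e3 = e1 x e2 (Bouguet-style rectification)."""
    t = np.asarray(t, dtype=float)
    e1 = t / np.linalg.norm(t)
    e2 = np.array([-t[1], t[0], 0.0]) / np.hypot(t[0], t[1])
    e3 = np.cross(e1, e2)
    return np.vstack([e1, e2, e3])

# After rectification the (normalized) baseline maps onto the x-axis.
t = np.array([0.05, 0.001, 0.0])   # illustrative, slightly tilted baseline
R = rect_rotation(t)
print(np.round(R @ (t / np.linalg.norm(t)), 6))  # -> [1. 0. 0.]
```

Because the rows are mutually orthonormal, R_rect is a proper rotation, and applying it to both image planes makes their baselines parallel to the image rows.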

Finally, the transformation matrix of the stereo correction can be expressed as

In this paper, the feature matching of adjacent images has been completed by extracting ORB points, which are composed of key points and descriptors. Likewise, the feature detection algorithm can be divided into two parts, namely key point extraction and descriptor construction. The key point extraction algorithm is also known as the oFAST algorithm [

The main steps of the oFAST algorithm are presented as follows:

Step 1: Take a circle of 16 pixels p_1 to p_16 around the candidate point p, and denote the brightness value of point p as I_p.

Step 2: Set the brightness threshold t. Among the circle pixels p_1 to p_16, the brightness of the 1st, 5th, 9th, and 13th pixels is detected first. Only if at least three of these four pixels are brighter than I_p + t or darker than I_p − t is p retained as a candidate; the remaining circle pixels are then examined, and p is accepted as a key point if enough contiguous pixels are brighter than I_p + t or darker than I_p − t.

Step 3: Repeat the above two steps and perform the same operation on all pixels.

Step 4: Remove locally dense candidate feature points, and calculate the FAST score of each candidate point through

where V is the score, S_b is the set of circle pixels whose brightness exceeds I_p + t, and S_d is the set whose brightness falls below I_p − t.
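The segment test and score above can be sketched in pure Python; the 12-contiguous-pixel criterion and the sample brightness values are illustrative of the common FAST formulation, not taken verbatim from the paper.

```python
def fast_score(ring, center, t):
    """FAST score: the larger of the summed brightness excesses of the
    pixels brighter than I_p + t and the summed deficits of the pixels
    darker than I_p - t (illustrative form of the score function)."""
    brighter = sum(v - center - t for v in ring if v > center + t)
    darker = sum(center - v - t for v in ring if v < center - t)
    return max(brighter, darker)

def is_corner(ring, center, t, n=12):
    """Segment test: n contiguous ring pixels all brighter or all darker."""
    hi = [v > center + t for v in ring]
    lo = [v < center - t for v in ring]
    for flags in (hi, lo):
        doubled = flags + flags            # wrap around the circle
        run = best = 0
        for f in doubled:
            run = run + 1 if f else 0
            best = max(best, run)
        if best >= n:
            return True
    return False

ring = [200] * 12 + [100] * 4   # 12 contiguous bright pixels
print(is_corner(ring, 100, 30))  # -> True
```

The doubling of the flag list handles runs that wrap around the end of the 16-pixel circle.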

Step 5: Construct a Gaussian pyramid to add scale invariance to the key points. Set the number of pyramid levels and the scale factor, and extract key points at each level of the scaled image.

Step 6: The direction vector is constructed by the gray-centroid method to strengthen the rotation invariance of the key points. In the neighborhood image block of the key point, the moment m_{pq} of the image block is defined as

The centroid of the image block can be determined by the moment

Connecting the geometric center O of the image block with the centroid C gives the direction vector of the key point
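The gray-centroid computation can be sketched as follows; for brevity the coordinates are taken relative to the block origin, whereas ORB measures them relative to the patch centre.

```python
import math

def orientation(patch):
    """Gray-centroid method: moments m_pq = sum x^p y^q I(x, y),
    centroid C = (m10/m00, m01/m00), orientation = atan2(m01, m10)."""
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(patch):
        for x, v in enumerate(row):
            m00 += v          # zeroth moment: total mass
            m10 += x * v      # first moment in x
            m01 += y * v      # first moment in y
    return (m10 / m00, m01 / m00), math.atan2(m01, m10)

# Mass concentrated in the rightmost column pulls the centroid right.
patch = [[0, 0, 9],
         [0, 0, 9],
         [0, 0, 9]]
centroid, theta = orientation(patch)
print(centroid)  # -> (2.0, 1.0)
```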

After the extraction of the oFAST key points, the descriptor of each point needs to be calculated, and the improved BRIEF is used here. BRIEF is a binary descriptor whose description vector consists of 0s and 1s, encoding the relative brightness of two pixels near the key point (denoted as
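A minimal sketch of such a binary descriptor and its Hamming-distance comparison; the pair-sampling pattern and the test patch are made-up illustrations, not the improved BRIEF of this paper.

```python
import random

def brief(patch, pairs):
    """Binary descriptor: bit i is 1 if I(p_i) < I(q_i) for the i-th
    pre-chosen pixel pair, else 0."""
    bits = 0
    for i, ((x1, y1), (x2, y2)) in enumerate(pairs):
        if patch[y1][x1] < patch[y2][x2]:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Number of differing bits between two descriptors."""
    return bin(a ^ b).count("1")

# Illustrative fixed sampling pattern and a synthetic 5x5 patch.
random.seed(0)
pairs = [((random.randrange(5), random.randrange(5)),
          (random.randrange(5), random.randrange(5))) for _ in range(32)]
patch = [[(x * 7 + y * 13) % 50 for x in range(5)] for y in range(5)]
d1 = brief(patch, pairs)
d2 = brief(patch, pairs)
print(hamming(d1, d2))  # -> 0: identical patches match exactly
```

Because the descriptor is a bit string, matching reduces to XOR plus a popcount, which is what makes Hamming-distance comparison so cheap.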

For the feature point matching between two images, directly comparing the Hamming distance of every feature point pair imposes a heavy computational load that hardly satisfies real-time requirements. The approximate nearest-neighbor algorithm integrated in the Fast Library for Approximate Nearest Neighbors (FLANN) open-source library is faster and suitable for real-time occasions, but some mismatches may occur. The RANSAC algorithm can eliminate mismatches by effectively calculating the homography matrix. The homography matrix is a conversion matrix that describes the mapping between the corresponding points of two images of a plane. For a matched pair expressed in homogeneous coordinates as p = [u, v, 1]^T and p′ = [u′, v′, 1]^T, it is defined as follows [

A pair of points determines two equations; that is to say, at least 4 pairs of matching points are needed to determine the homography matrix. The RANSAC procedure is as follows:

Step 1: Randomly select 4 pairs of matching points to fit the model (that is, estimate the homography matrix);

Step 2: Due to the matching errors, the data points have certain fluctuations. Assuming that the error envelope is

Step 3: Randomly select 4 pairs of points again, and repeat the operations of Step 1 and Step 2 until the iteration stops;

Step 4: Find the homography matrix
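The scoring part of this procedure (Step 2) can be sketched as follows, using a synthetic translation homography; the error envelope eps and the point data are illustrative values.

```python
import numpy as np

def apply_h(H, pts):
    """Map 2-D points through homography H via homogeneous coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = (H @ pts_h.T).T
    return mapped[:, :2] / mapped[:, 2:3]

def count_inliers(H, src, dst, eps=1.0):
    """RANSAC scoring step: a pair is an inlier if the reprojected source
    point lands within the error envelope eps of its match."""
    err = np.linalg.norm(apply_h(H, src) - dst, axis=1)
    return int(np.sum(err < eps))

# Synthetic data: a pure-translation homography plus one bad match.
H = np.array([[1.0, 0, 5.0], [0, 1.0, -2.0], [0, 0, 1.0]])
src = np.array([[0.0, 0], [10, 0], [0, 10], [10, 10], [3, 4]])
dst = apply_h(H, src)
dst[-1] += 50            # corrupt the last correspondence
print(count_inliers(H, src, dst))  # -> 4
```

Iterating this scoring over random 4-point samples and keeping the model with the most inliers is exactly what discards the corrupted correspondence above.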

When tracking the feature points in successive frames, a time threshold is set and the number of common feature points within the interval is counted to evaluate the falling risk of the elderly. It is considered that the elderly generally move slowly, so if there are plenty of common feature points in successive frames, the elderly person is considered safe; otherwise, if there are few common feature points in successive frames, the elderly person is at risk of falling.
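This rule can be sketched as a simple check over the tracked-point counts inside the time threshold; the min_common threshold below is an illustrative value, not one from the paper.

```python
def falling_risk(tracked_counts, min_common=30):
    """Evaluate fall risk over the frame pairs sampled within the time
    threshold dT: risk is flagged if any pair shares too few tracked
    key points (min_common is an illustrative threshold)."""
    return any(n < min_common for n in tracked_counts)

print(falling_risk([110, 95, 102]))  # steady walking -> False
print(falling_risk([110, 4]))        # sudden swing   -> True
```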

For matching the corresponding feature points of the left and right images, this paper uses the area-based algorithm based on the BRIEF descriptor. For a key point located at (u_l, v_l) in the left image, the matching point is searched along the corresponding row of the right image. The traditional block matching algorithm is shown in

Because of occlusion, key points in the left image may not be able to find matching points in the right image. In the matching process, the sum of absolute differences (SAD), C_1, is used as the similarity function to measure the matching degree of two points and their surrounding windows; it is expressed in

In which, I_l and I_r are the gray values of the left and right images, respectively. The cost function C_2 based on the BRIEF descriptor is shown in
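A minimal sketch of SAD-based block matching along an epipolar row; the window size, search range, and the synthetic images are illustrative.

```python
import numpy as np

def sad(win_l, win_r):
    """Sum of absolute differences between two equally sized windows."""
    return float(np.abs(win_l.astype(int) - win_r.astype(int)).sum())

def match_along_row(left, right, x, y, half=1, max_disp=8):
    """For a key point (x, y) in the left image, slide a window along the
    same row of the right image and return the disparity with minimal SAD."""
    win_l = left[y - half:y + half + 1, x - half:x + half + 1]
    best_d, best_cost = None, float("inf")
    for d in range(0, max_disp + 1):
        xr = x - d
        if xr - half < 0:
            break
        win_r = right[y - half:y + half + 1, xr - half:xr + half + 1]
        cost = sad(win_l, win_r)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

# Right image is the left image shifted 3 px leftward -> disparity 3.
left = np.arange(100, dtype=np.uint8).reshape(10, 10)
right = np.roll(left, -3, axis=1)
print(match_along_row(left, right, x=6, y=5))  # -> 3
```

The paper's contribution replaces the raw intensity comparison here with a BRIEF-descriptor-based cost, which is more robust to brightness differences between the two cameras.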

With the tracks of the key points, their matching relations can be obtained to further estimate the camera motion. That is, after the stereo matching, the three-dimensional camera coordinate P_c of each feature point can be recovered.

The reprojection error compares the observed pixel coordinates of a matched feature with the projection of its 3D point under the real-time pose estimate, and a nonlinear optimization is adopted to minimize it. Here, p_1 and p_2 are the projections of the same spatial point P in two frames; the projection p̂_2 predicted by the current pose estimate deviates from the observed p_2. Therefore, the pose of the camera needs to be adjusted to reduce this difference. Since many points are involved, the error of each point is hardly ever exactly zero.

In this formulation, the pixel coordinates of p_1 are p_1 = [u_1, v_1]^T, the projected coordinates of p_2 are p̂_2 = [û_2, v̂_2]^T, and the observed coordinates of p_2 are p_2 = [u_2, v_2]^T, so that e = p_2 − p̂_2 represents the reprojection error. The ideal re-projection process is expressed as follows, where s_2 represents the depth of the spatial point in the frame where p_2 is located, and T is the transformation from image I_1 to image I_2, which can also be represented by

More than one feature point is often observed at a given camera pose. Assuming that there are
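The reprojection of a single point under an assumed pose can be sketched as follows; the intrinsic matrix K and the identity pose are illustrative values, not the paper's calibration.

```python
import numpy as np

def project(K, R, t, P):
    """Pinhole projection: s * p_hat = K (R P + t); returns pixel coords."""
    Pc = R @ P + t
    s = Pc[2]                      # depth of the point in this frame
    uvw = K @ Pc
    return uvw[:2] / s

def reprojection_error(K, R, t, P, observed):
    """e = p_2 - p_hat_2: observed pixel minus predicted projection."""
    return observed - project(K, R, t, P)

# Illustrative intrinsics and an identity pose.
K = np.array([[640.0, 0, 320], [0, 640.0, 240], [0, 0, 1]])
R, t = np.eye(3), np.zeros(3)
P = np.array([0.5, -0.25, 2.0])
p_hat = project(K, R, t, P)
print(p_hat)  # -> [480. 160.]
```

Summing the squared errors of all observed points gives the nonlinear least-squares cost that the pose optimization minimizes.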

The experimental platform uses a laptop computer (Lenovo Xiaoxin Air 15), and the running environment is the Ubuntu 16.04 operating system under a VirtualBox virtual machine. The camera is an Intel RealSense D435i with a global shutter, a frame rate of 30 fps, an image resolution of 1280 × 720, and a baseline of 5 cm. The experiments were carried out indoors, with the sensor held by hand and moved through the scenario to estimate the pose. The camera installation and its connection with the computer are shown in

During the experiment, the ORB feature points were extracted and matched on two adjacent frames of images.

Then, the hand-held camera performed a slow linear motion with a sampling interval of 1 s. From the observation of the feature tracking between two frames, as shown in

Afterwards, the hand-held camera was swung quickly to simulate a fall of the elderly, and two image frames were sampled at a 1 s interval. The result in

The positioning experiment was also carried out indoors, including linear reciprocating motion and arbitrary motion, to evaluate the positioning capability of the binocular scheme designed in this paper.

In this scenario, the handheld camera was kept at a certain height while moving in a straight line between two points at three different distances: 1 m, 3 m, and 5 m. The camera coordinate of the first image after system initialization was defined as the world coordinate. The camera coordinate system was defined as follows: the facing direction of the camera lens was the positive direction of the

The trajectories of the linear motion at the three distances are shown in

It can be seen from

Movement process       End point positioning coordinates   Positioning error (m)
First linear motion    (0.03832, 0.9687)                   0.0495
Second linear motion   (0.6433, 2.9735)                    0.6438
Third linear motion    (−0.4996, −4.928)                   0.5048

After the camera was held and moved arbitrarily indoors, it was returned to the start point; the result of ORB feature point extraction from the collected images is shown in

Combining the experimental results of the carrier's linear reciprocating motion and arbitrary trajectory motion in the room, it can be seen that the binocular positioning algorithm designed in this paper has a good positioning capability for indoor environments. The fluctuation along the y-axis is mainly caused by the up-and-down motion of the walking human body.

To address the pressing concern of indoor monitoring of the elderly living alone, this paper proposed a positioning algorithm based on a binocular visual scheme involving feature extraction, feature matching, and motion estimation, to obtain a high-accuracy indoor location of the elderly. Feature matching was the focus of modification. On one hand, the RANSAC algorithm was adopted to eliminate the mismatches introduced by the approximate nearest-neighbor method; on the other hand, a cost function based on the BRIEF descriptor was proposed as the feature information to improve the stereo matching accuracy. On this basis, the comparison of feature points between two image frames within a certain time interval is used to determine whether the elderly person is in danger of falling. Three sets of experiments were carried out to verify the feasibility of the proposed method. The feature matching experiment shows intuitively that the RANSAC algorithm can effectively eliminate mismatches; the contrast between the walking and falling situations in the feature matching experiment also demonstrates the tracking efficacy between two images within the designed time threshold and verifies the feasibility of the falling-danger evaluation; furthermore, the effectiveness and accuracy of the improved method with the BRIEF descriptor are verified by the indoor positioning experiments in different situations. It is worth mentioning that position and attitude drifts occurred due to accumulated errors in the measurement system. Further study will continue to address this issue by adding auxiliary navigation, for instance an inertial measurement unit, to improve accuracy.