Jing He1,2, Haonan Chen3,*, Lingxiao Li4, Yebin Zou5
1 Institute for Advanced Studies in Humanities and Social Sciences, Beihang University, Beijing, 100083, China
2 Beijing Key Laboratory of Urban Spatial Information Engineering, Beijing, 100000, China
3 College of Geoscience and Surveying Engineering, China University of Mining and Technology-Beijing, Beijing, 100083, China
4 School of Literature, Capital Normal University, Beijing, 100089, China
5 School of Civil and Hydraulic Engineering, Ningxia University, Yinchuan, 750021, China
* Corresponding Author: Haonan Chen. Email:
(This article belongs to this Special Issue: Computational Mechanics Assisted Modern Urban Planning and Infrastructure)
Computer Modeling in Engineering & Sciences 2023, 135(1), 211-237. https://doi.org/10.32604/cmes.2022.020597
Received 02 December 2021; Accepted 13 May 2022; Issue published 29 September 2022
In the era of big data, motion data can be obtained with ever higher precision. However, as the number of dimensions grows, the amount of computation increases exponentially, and visualization becomes more difficult. Star Coordinates is a high-dimensional data visualization technique that has been studied most widely in biology and medicine. An interesting application of Star Coordinates is the VISTA system, which uses linear mapping to avoid breaking clusters when high-dimensional data is mapped to 2D space. Users can use the visual output to confirm the validity of the cluster structure. The main disadvantage of this method is that it can only be used for the visualization of dimensional data. So far, several VISTA-like systems have been introduced. For example, maps , Section  and fastmap  are constellation-based visualization technologies, which are suitable for generating static clusters for multidimensional data. Since the substantive analysis of trajectory data may involve variables beyond space and time, Gatalsky et al. extended the Star Coordinates technique and proposed stretchplot, an interactive technique similar to Star Coordinates for multidimensional spatio-temporal trajectory data, which allows users to map trajectory variables to a high-dimensional space and express them as connected linear sequences. It embeds sequential events (and the variables associated with each event) in entities and connects them in time order to form tracks. However, this track-line approach is suitable only for trajectory sets with a small amount of data.
Behavior pattern mining and visualization of high-dimensional trajectory data face a core problem: it is difficult for data projection methods to map data from a high-dimensional space to a low-dimensional space with minimum error. When the data is complex and dynamic, it is also difficult to establish a high-dimensional data mining and visualization model. Therefore, this paper establishes new interactive trajectory Star Coordinates models, i-tStar and i-tStar (3D), for trajectory data of different dimensions. The models set measurement standards, detect dimension similarity and attribute similarity, reorder attribute axes, support interactive manipulation of data sets, add labels to enhance clustering information, and include an engine that guides cluster perception. In this way, the technical defects of the original Star Coordinates are overcome, Star Coordinates are applied to dynamic spatio-temporal trajectory data, the technical reliability of Star Coordinates for high-dimensional data visualization is improved, the layout configuration is optimized, cluster discovery is enhanced, and the point cloud clustering effect is improved, so that the evolution of multiple attributes of any trajectory data set over time and space can be mined.
The contributions of this paper are as follows. Based on the designed i-tStar and i-tStar (3D) methods, the attribute patterns of mine trajectory data samples are displayed, and their internal associations and regularities are mined. The process of cluster exploration in the Star Coordinates system is realized, and a variety of supporting interactive means are demonstrated. Based on the attribute merging method, the interaction behavior of multiple attributes is analyzed, and the correlation and influence of various indexes during mine car operation are explained. The point cloud aggregation effects of the i-tStar and i-tStar (3D) methods are compared. The experimental results show that both methods can effectively realize behavior pattern mining and visual analysis of multidimensional trajectory data.
In Star Coordinate, data points are represented as points, and data dimensions are represented by axes, i.e., . All of the axes here are radial lines starting from the origin and axes are inclined at an angle of . The angles between the axes of the original Star Coordinates are equal and all axes have the same length. The user can apply a scaling transformation to change the length of the axis, thereby increasing or decreasing the weight of the dimension to achieve the goal of optimizing the separation and resolution of the point cloud (cluster). The Star Coordinates maps the data instances to the visible space through a linear combination of axes, and the position of each data instance is given by :
where n is the data dimension and is the -th attribute axis. The mapping of a point from the n-dimensional space to two-dimensional Cartesian coordinates is obtained by summing the unit vectors of all axes, each multiplied by the data element value on that axis.
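The mapping described above can be illustrated with a minimal Python sketch. It assumes the equal-angle axis layout of the original Star Coordinates and min-max normalization of each dimension to [0, 1]; the function names `star_axes` and `star_project` are illustrative and do not come from the paper.

```python
import math

def star_axes(n):
    """Unit vectors for n axes spaced at equal angles around the origin."""
    return [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
            for i in range(n)]

def star_project(rows, axes=None):
    """Map each n-dimensional row to 2D as a sum of value-weighted axis vectors."""
    n = len(rows[0])
    axes = axes or star_axes(n)
    # Per-dimension minimum and maximum, used for min-max normalization (assumption).
    lo = [min(r[i] for r in rows) for i in range(n)]
    hi = [max(r[i] for r in rows) for i in range(n)]
    points = []
    for r in rows:
        x = y = 0.0
        for i in range(n):
            span = hi[i] - lo[i]
            t = (r[i] - lo[i]) / span if span else 0.0
            x += axes[i][0] * t
            y += axes[i][1] * t
        points.append((x, y))
    return points

if __name__ == "__main__":
    for point in star_project([[0, 0, 0, 0], [1, 1, 1, 1], [1, 0, 0, 0]]):
        print(point)
```

With four equally spaced axes, the record (1, 1, 1, 1) lands on the origin, the same position as (0, 0, 0, 0), which is a small instance of distinct high-dimensional points sharing one 2D position.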
Projecting high-dimensional data into a two-dimensional space inevitably introduces overlap, ambiguity, and even bias: multiple points in the k-dimensional space can be mapped to a single point in Cartesian space. In addition, vector addition in the Star Coordinates space must be valid for all data points to be projected correctly. However, the original Star Coordinates is converted to a range of by normalizing all data elements of the vector (including negative values), and the placement of independent dimensions on the opposite axis cannot cancel each other [6–8]. These inherent design flaws reduce the technical reliability of the original Star Coordinates for data visualization. The original Star Coordinates also suffers from problems such as hierarchical mapping of dimension maps, difficulty in characterizing dynamic data, and inflexible visual adjustment mechanisms. Therefore, it is necessary to improve the original Star Coordinates so that high-dimensional trajectory data is represented in an optimal configuration while the interactions among trajectory data attributes are revealed.
Because of the above defects, the original Star Coordinates is not suitable for spatiotemporal data and semantic data. It is therefore necessary to evaluate the axis arrangement and point cloud layout quality of traditional Star Coordinates in order to establish a framework for a new interactive Star Coordinates model. We name this technique interactive trajectory Star Coordinates (i-tStar).
Initially, the i-tStar design only adjusted the arrangement of the original Star Coordinates, which left three problems: 1) identifying overlap across multiple frames (each visualization result is considered a frame) depends on adjusting the visual parameters; 2) visual distortion is inevitable, and the retained data clusters may overlap each other in the visualization; 3) the number of dimensions affects the view layout. When a small number of dimensions is involved, the layout produced by i-tStar is clear and readable (Fig. 1a). As the number of dimensions increases, the layout becomes cluttered (Fig. 1b). When even more dimensions are added, the result may become unreadable (Fig. 1c). Therefore, the scalability of i-tStar is improved by redesigning two aspects, the point layout and the axis layout. In particular, to adapt to the spatiotemporal character of trajectory data, dimensions and attributes are separated in the axis layout.
The idea behind the dimension arrangement of i-tStar is to rearrange the data dimensions according to their similarity, so that similar data dimensions are adjacent to each other. To handle large-scale dynamic trajectory data sets, i-tStar uses three measures of the similarity between two data dimensions: distance dissimilarity (DSIM), Pearson correlation coefficient similarity (PSIM), and cosine similarity (CSIM). They are calculated as follows:
The similarity matrix is defined as , where varies between 0 and 1. If is closer to 1, the -th and -th dimensions are more similar; If the value is closer to 0, they are less similar.
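Two of the three measures have standard textbook definitions, which the following Python sketch illustrates. It is not the paper's exact formulation: taking the absolute value to keep entries in [0, 1] is an assumption made here, the helper names are hypothetical, and the distance-based measure (DSIM) is omitted because its formula is not reproduced in the text.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two data dimensions (columns)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    # A constant column has no defined correlation; treat it as dissimilar.
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def cosine(a, b):
    """Cosine of the angle between two data dimensions viewed as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def similarity_matrix(columns, measure):
    """k x k matrix of pairwise similarities, mapped to [0, 1] by absolute value (assumption)."""
    k = len(columns)
    return [[abs(measure(columns[i], columns[j])) for j in range(k)]
            for i in range(k)]

if __name__ == "__main__":
    cols = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
    for row in similarity_matrix(cols, pearson):
        print(row)
```

Under the absolute-value convention, a perfectly anti-correlated pair of dimensions counts as highly similar, which suits axis placement (the two axes carry the same information) but would need revisiting if the sign of the relationship matters.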
The -th attribute in the data instance is represented as , and the variance of the attribute is given by:
where m is the number of instances and is the average of the jth attribute. If is closer to 0, attributes j and k are considered more similar. Similar attributes are then clustered based on the computed variance.
PCA is used to measure the similarity between attributes, with each attribute treated as a point in the m-dimensional space (m is the number of data instances). These points are mapped into a two-dimensional space using PCA, and two attributes are considered similar if they lie close to each other after mapping. PCA dimensionality reduction discards the components with weaker correlation, so some of the information they contain is lost, which can affect accuracy. On the other hand, it significantly reduces the scale of the computation, so efficiency improves and better results can be obtained within a given time and cost budget.
The K-Means clustering algorithm groups similar attributes, and the centroid mechanism identifies similar attributes based on the cluster information. Specifically, given a training set, the goal is to group the data into several clusters. K-Means is an iterative process that starts by guessing the initial cluster centroids and then repeatedly assigns samples to the closest centroids and recalculates the centroids based on the assignment. The inner loop of the algorithm repeats two steps: assigning each training sample to its closest centroid, and recalculating each centroid as the mean of the points assigned to it. Note that the converged solution may not always be ideal and depends on the initial centroids. Therefore, in practice, K-Means is usually run several times with different random initializations, and the solution with the lowest cost function value (distortion) is chosen.
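The two-step inner loop and the restart strategy described above can be sketched in plain Python as follows. This is a generic K-Means illustration, not the authors' implementation; the function names, the default iteration and restart counts, and the choice of initial centroids by sampling data points are assumptions.

```python
import random

def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centers):
    """Step 1: index of the closest centroid for every point."""
    return [min(range(len(centers)), key=lambda c: sqdist(p, centers[c]))
            for p in points]

def kmeans(points, k, iters=100, restarts=10, seed=0):
    """Run K-Means several times and keep the lowest-distortion solution."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        # Initial guess: k distinct data points chosen at random (assumption).
        centers = [list(p) for p in rng.sample(points, k)]
        for _ in range(iters):
            labels = assign(points, centers)
            # Step 2: move each centroid to the mean of its assigned points.
            new_centers = []
            for c in range(k):
                members = [p for p, l in zip(points, labels) if l == c]
                if members:
                    new_centers.append([sum(v) / len(members) for v in zip(*members)])
                else:
                    new_centers.append(centers[c])  # keep an empty cluster's centroid
            if new_centers == centers:
                break
            centers = new_centers
        labels = assign(points, centers)
        cost = sum(sqdist(p, centers[l]) for p, l in zip(points, labels))
        if best is None or cost < best[0]:
            best = (cost, labels, centers)
    return best

if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    cost, labels, centers = kmeans(pts, 2)
    print(cost, labels, centers)
```

Keeping the best of several restarts costs a constant factor in run time but reduces the chance of reporting a poor local minimum.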
The centroid of cluster is given by:
where is the number of instances in the categories. Since each centroid can serve as a representative of its cluster, first construct a matrix M with the centroids as column vectors. Then, K-Means is performed on the row vectors of M to group attributes with similar centroids.
Through the above method, each calculated attribute of each group is arranged on i-tStar to generate each attribute axis, and the axis length is set to . By averaging the values 1 of all the attributes j in the corresponding group, the positional effect of each attribute axis on the instance can be obtained.
Arranging the dimension axes and attribute axes correctly is critical to revealing the patterns in the i-tStar layout. i-tStar offers two mechanisms for automatically arranging axes, one based on combinatorial optimization and the other based on a forcing mechanism. According to the similarity measure described in Section 3.1.2, if the similarity matrix S is a distribution, where k is the number of axes, then:
where () is the -th (-th) axis of Example , and () and () are the minimum and maximum values of the -th (-th) axis, respectively. If the matrix M is filled with other correlation-based similarity measures, the different axes of the data can be explored from other perspectives. The similarity matrix is represented as a complete graph in which each node corresponds to one axis. Using the genetic algorithm, the best closed path connecting all nodes can be found.
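The search for a closed path over all axes can be illustrated for a small number of axes with exhaustive enumeration. The sketch below deliberately swaps the genetic algorithm for brute force, which is only feasible for a handful of axes, and it assumes that the objective is to maximize the summed similarity between neighboring axes on the circle; the function name `best_axis_order` is hypothetical.

```python
from itertools import permutations

def best_axis_order(S):
    """Circular axis order that maximizes the similarity between neighbors.

    S is a symmetric k x k similarity matrix. Axis 0 is fixed in the first
    position because rotating a closed path does not change its score.
    """
    k = len(S)
    best_order, best_score = None, float("-inf")
    for rest in permutations(range(1, k)):
        order = (0,) + rest
        score = sum(S[order[i]][order[(i + 1) % k]] for i in range(k))
        if score > best_score:
            best_order, best_score = order, score
    return list(best_order), best_score

if __name__ == "__main__":
    S = [
        [1.0, 0.1, 0.9, 0.9],
        [0.1, 1.0, 0.9, 0.9],
        [0.9, 0.9, 1.0, 0.1],
        [0.9, 0.9, 0.1, 1.0],
    ]
    print(best_axis_order(S))
```

Exhaustive search visits (k − 1)! orderings, which is why a heuristic such as a genetic algorithm becomes necessary as the number of axes grows.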
The above steps provide the order in which the axes are placed. Next, a simple scheme for setting the angle 3 between axes 1 and 2 is introduced. Let W be the sum of the weights of the best paths found by the reordering process, then the angle maps to:
The forcing mechanism distributes the axes evenly around a circle and then swaps their positions to find the optimal configuration. The layout is evaluated using the layout quality metric, with topology preservation and the Dunn index also used as quality indicators. Fig. 2 shows the axis configurations obtained with the combinatorial optimization mechanism and with the forcing mechanism, using simulation data. The combinatorial optimization method changes the initial configuration, while the forcing mechanism only swaps some axes.
In i-tStar, the purpose of interactive exploration is to distinguish between visually overlapped clusters.
The normalized range of different parameters ( and varies with parameters) has a significant impact on the resulting visualization and interaction. In Kandogan’s system, the normalized range causes visual skew, and the display area is used inefficiently. Therefore, this section draws on the setting of the VISTA model (normalization range ): assuming that the data points are samples from the joint multidimensional distribution, let x denote the random variable of the distribution. Correspondingly, the mapping result has a two-dimensional distribution, and y denotes the corresponding two-dimensional random variable. Aligning the visualization with the center is equivalent to aligning the two-dimensional distribution to 0, which means . Assuming that the parameter is independent of the data distribution, it can be expressed as:
Therefore, to make , or is required. Obviously, if the normalization range is set to , is required. And , indicating that the random variation of the visualization is evenly distributed to all directions around the center, which effectively utilizes the display space.
Adjusting in the range will also bring more dynamic information. Suppose the distribution of the target dimension i has two modes, and , . By adjusting , the movement along the axis i is and , respectively, and the distance between the two modes is . Therefore, increasing will separate them, and reducing will cause them to contract. Changing to will use to map the two modes from the mirror position to their original position. Therefore, a continuous change of in will produce a similar “rotation” effect, showing the user more information.
The range setting of the interaction parameters is an important factor affecting interactive cluster visualization. Because the purpose of exploration is to distinguish visually overlapping clusters, each interaction (such as a parameter adjustment) should contribute as much as possible towards that goal. It is well known that linear mapping does not break clusters but may cause clusters to overlap. Fig. 3 uses a simulated dataset that contains 100 data points and 4 clusters. Fig. 3a depicts the raw data distribution of the dataset. Fig. 3b shows the distribution after clustering with the K-Means algorithm, in which some clusters overlap. Fig. 3c is a -normalized setup using to represent a particular model. The results show that the cluster distribution obtained through interaction has better resolution.
Scaling allows the user to change the length of one or more axes simultaneously, thereby increasing or decreasing the impact of a particular data column (a specific dimension or feature) on the visualization result. The basic idea is to recalculate the contribution of the attribute by multiplying the scaling ratio into the mapping formula and to re-map the data according to the new scaling factor, as shown in the following equation:
By using axis scaling interactively, the user can observe the dynamic change of the data distribution, which is:
where provides visually tunable parameters. covers a fairly large range of mapping functions, and this range combined with a scaling factor of c is sufficient to find a satisfactory visualization. For example, when the scale of every attribute axis is set to 1, the data points appear coarsely scattered, as shown in Fig. 4a; when the scale of axis 1 is set to 0.2, some cluster structure appears, as shown in Fig. 4b. This indicates that when data with different factors belong to the same cluster, the visualization usually reveals their similarity.
Rotating an axis makes a particular data attribute more or less correlated with other attributes by modifying the direction of the axis unit vector, which changes the correlation between the corresponding feature axis and the other feature axes. The immediate benefit is that it helps resolve the overlap problem and lets the user distinguish clusters that were mistakenly overlapped. The Star Coordinates are modeled using the Euler formula:
where , and i is the imaginary unit. As shown in the experimental results, adjusting the scaling transformation is sufficient to find a satisfactory visualization. Therefore, can be kept as . However, rotation changes the angle of the axis and redistributes the scatter plot as follows:
The user can rotate a particular property by adjusting the angle value of the axis, recalculating and re-mapping the data as the angle changes. Fig. 4c shows the results of point clustering after the axis is rotated.
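The effect of axis scaling and axis rotation on the projected points can be sketched in a few lines of Python. The sketch assumes that the data rows are already normalized and that each axis carries one scale factor and one angle; the function name `project_2d` and the parameter names `alphas` and `thetas` are illustrative, and any additional constant factor or centering term of the paper's mapping is left out.

```python
import math

def project_2d(rows, alphas, thetas):
    """Project normalized rows to 2D with one scale and one angle per axis.

    Each axis contributes alpha * value along the direction (cos theta, sin theta).
    """
    points = []
    for r in rows:
        x = sum(a * v * math.cos(t) for a, v, t in zip(alphas, r, thetas))
        y = sum(a * v * math.sin(t) for a, v, t in zip(alphas, r, thetas))
        points.append((x, y))
    return points

if __name__ == "__main__":
    row = [[1.0, 1.0]]
    # Baseline: two orthogonal axes with unit scale.
    print(project_2d(row, [1.0, 1.0], [0.0, math.pi / 2]))
    # Scaling: shrinking axis 2 reduces its influence on the position.
    print(project_2d(row, [1.0, 0.2], [0.0, math.pi / 2]))
    # Rotation: turning axis 2 opposite to axis 1 makes the two values cancel.
    print(project_2d(row, [1.0, 1.0], [0.0, math.pi]))
```

Recomputing the projection after every change of a scale factor or an angle is what produces the interactive redistribution of the scatter plot; the last call also shows how opposite axes can make unrelated values cancel.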
Coloring is the classification of data based on similar factors, and assigns colors to each set of factors to achieve visual or clear clustering of information data distribution. It creates another dimension of data visualization, which can be classified as an interactive feature because the user is free to choose different color values in the various color representation dimensions. Based on the same data, Fig. 4d clearly indicates the two generated clusters.
As described in the literature, when the number of dimensions exceeds 50, user interaction alone cannot effectively visualize the data, and the cluster overlap problem cannot be solved. This problem can be addressed by labeling a small amount of data in i-tStar, since the tag information used for data clustering is identifiable. According to the experiments, satisfactory results can be obtained even with limited tags, i.e., semi-supervised clustering, in scenarios with two clusters as well as with more than two clusters.
There are two types of tags that can be used for the data portion of the tag. One set of k-dimensional samples is labeled and the other set of samples is labeled . Since the tag information is typically limited, (the first set of tag data points) and (the second set of tag data points) are much smaller than the total number of data points N (, ) in the dataset. Use the label to find the best -adjustment that projects the k-dimensional data into a two-dimensional space such that the mapped clusters are heterogeneous or isomorphic . To this end, the Fisher discriminant is used as a linear classification of the objective function.
In Eq. (14), is the Fisher discriminant, and and represent the between-cluster distance and the within-cluster distance, respectively, based on the axis scaling parameter , inter-cluster scattering matrix and intra-cluster scattering matrix . An increase in the distance between clusters means that the clusters are more separated, and a decrease in the distance within a cluster means that the clusters in the mapping space are denser. To find the optimal axis scaling parameter , the sum of the Euclidean distances between each point and its cluster mean is minimized while the distance between the cluster means (centroids) is maximized.
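The trade-off between cluster separation and cluster density can be made concrete with a small two-class example. The following Python sketch is an illustration under stated assumptions rather than the paper's Eq. (14): it scores a choice of axis scales by the squared distance between the projected class means divided by the summed squared distances of the projected points to their own class mean. The names `project_2d`, `centroid`, and `fisher_ratio` are hypothetical.

```python
import math

def project_2d(rows, alphas, thetas):
    """Linear Star Coordinates style projection with per-axis scale and angle."""
    return [(sum(a * v * math.cos(t) for a, v, t in zip(alphas, r, thetas)),
             sum(a * v * math.sin(t) for a, v, t in zip(alphas, r, thetas)))
            for r in rows]

def centroid(points):
    """Mean position of a list of 2D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def fisher_ratio(class_a, class_b, alphas, thetas):
    """Between-class separation divided by within-class scatter after projection."""
    pa = project_2d(class_a, alphas, thetas)
    pb = project_2d(class_b, alphas, thetas)
    ma, mb = centroid(pa), centroid(pb)
    between = (ma[0] - mb[0]) ** 2 + (ma[1] - mb[1]) ** 2
    within = (sum((x - ma[0]) ** 2 + (y - ma[1]) ** 2 for x, y in pa)
              + sum((x - mb[0]) ** 2 + (y - mb[1]) ** 2 for x, y in pb))
    return between / within if within > 0 else float("inf")

if __name__ == "__main__":
    # Dimension 1 separates the two tagged classes; dimension 2 only adds spread.
    a = [(0.1, 0.1), (0.1, 0.9)]
    b = [(0.9, 0.1), (0.9, 0.9)]
    thetas = [0.0, math.pi / 2]
    print(fisher_ratio(a, b, [1.0, 1.0], thetas))
    print(fisher_ratio(a, b, [1.0, 0.5], thetas))
```

In this toy case, halving the scale of the uninformative second axis raises the ratio, which matches the intuition that a good scaling vector shrinks the axes that only add within-cluster spread.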
If there are more than 2 clusters (), the visualization information provided by the partial data can be used to enhance the visualization results. The general form of the scatter matrix within a cluster is:
The generalized form of can be defined as the following Fisher discriminant:
where is the average of the tagged data in each cluster and can be calculated as Eq. (23). Define the total average vector , then:
Using the generalized Eq. (19), we obtain:
where is the average of the i dimension of the tagged data in the -th cluster, and is the average of the j dimension of all tagged data. Finally, the objective function can be expressed as:
By maximizing , the best vector can be found, giving dense and well-separated cluster visualization results. Using the computed vector and the Star Coordinates mapping, the optimal projection of the k-dimensional data into a two-dimensional space can be achieved.
In the configuration described above, the visual perception of the cluster is enhanced. However, when visualizing higher dimensional data, even if a possible parameter adjustment method is provided, it is difficult or even impossible for the user to achieve favorable adjustments. Therefore, this section attempts to solve this problem using cluster recognition to achieve the separation of target clusters with a minimum number of interactions.
The engine design consists of three steps, including information object transformation, dimension mapping, and interactive functional design. Step 3 has been explained in Section 3.2. Steps 1 and 2 are described below.
Suppose the target dataset is a six-dimensional dataset with six attributes . Step 1 involves converting an information object from a data file, which essentially allocates values to non-numeric objects. The data is then arranged into a matrix with columns representing the dimensions and row values for each field in the record. Fig. 5 shows the matrix model of the information object .
Step 2 involves mapping each information object onto an axis. The axis representing the dimension is derived from the common origin and can be conveniently represented as in the Cartesian coordinate system, as shown in Fig. 6. Each vector is calculated by multiplying the distance by its corresponding unit vector, which is oriented in the direction of the axis , followed by the vector of the final projected point.
The cluster detection of Star Coordinates not only improves the efficiency of axis operations with higher cluster quality, but also allows users to analyze the relationship between clusters and data attributes. To achieve this goal, the Approximated Silhouette Index (ASI) can be used to assess cluster quality based on inter-cluster distance and intra-cluster distance. This approach requires the construction of an SI view to inform the user of the quality of the real-time projection.
To obtain the best projection matrix, the maximum global silhouette index is sought through an energy function, which can be expressed as:
where n is the number of data points, is the m dimensional data point, is a linear transformation that maps the of the m dimension to the of the l dimension (lower dimension) by the matrix product.
Let be the Euclidean distance between and in the -dimensional space. The approximate silhouette index is:
The projection space makes it possible to visualize and explore the influence of different data attributes on the separation of point clouds. Therefore, the quality of the clustering structure is evaluated by calculating the silhouette index in the projected space: the point-based ASI is averaged over the points within a cluster to define a cluster-based that measures the SI value of each cluster. In addition, the global for all clusters is defined:
The constructed SI view reflects the quality of the real-time projected point cloud. The process is as follows. First, the data points in each cluster are sorted in descending order of SI value, and the clusters are stacked from top to bottom in the SI view, with data points on the vertical axis and SI values on the horizontal axis (negative on the left, positive on the right). Data points with a positive SI value are colored with the color of their own cluster, while data points with a negative SI value are colored with the color of the cluster into which they are currently misclassified, which helps the user quickly understand how clusters merge (or mix). As shown in Fig. 7, the view in Fig. 1 is supplemented by the SI view, which can effectively visualize both the overall cluster quality and the quality of individual clusters.
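The point-based, cluster-based, and global values that feed the SI view can be illustrated with the standard silhouette definition. The Python sketch below computes the exact silhouette on projected points, not the approximated index (ASI) used in the paper, and it takes the global value as the mean over all points, which is an assumption. The function names are hypothetical, and at least two clusters are required.

```python
import math

def silhouette_values(points, labels):
    """Standard silhouette value of every projected point."""
    clusters = sorted(set(labels))
    values = []
    for i, p in enumerate(points):
        def mean_dist_to(cluster):
            d = [math.dist(p, q) for j, q in enumerate(points)
                 if labels[j] == cluster and j != i]
            return sum(d) / len(d) if d else None

        a = mean_dist_to(labels[i])  # cohesion within the point's own cluster
        if a is None:                # singleton cluster: silhouette defined as 0
            values.append(0.0)
            continue
        # Separation: mean distance to the nearest other cluster.
        b = min(mean_dist_to(c) for c in clusters if c != labels[i])
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values

def si_summary(points, labels):
    """Per-point values, per-cluster means, and the global mean."""
    values = silhouette_values(points, labels)
    per_cluster = {}
    for c in sorted(set(labels)):
        vals = [v for v, l in zip(values, labels) if l == c]
        per_cluster[c] = sum(vals) / len(vals)
    return values, per_cluster, sum(values) / len(values)

if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(si_summary(pts, [0, 0, 1, 1]))  # well-separated labeling
    print(si_summary(pts, [0, 1, 0, 1]))  # mixed labeling gives negative values
```

Points with negative values are the ones an SI view would draw on the left side in the color of the cluster they are closer to.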
i-tStar is designed to display multidimensional data in a two-dimensional visualization space, and its natural extension is a three-dimensional visualization space. This approach enlarges the data exploration space and helps discover subtle patterns hidden in the 2D space, but two flaws remain: the signs of the original data cannot be preserved (Star Coordinates carries no sign information), and opposite axis configurations are problematic (two unrelated attributes may cancel each other out). This section introduces a 3D visualization algorithm for complex high-dimensional data that extends i-tStar to a 3D Star Coordinates system, called i-tStar (3D) in this paper.
The spherical coordinate visualization model is shown in the following equation :
where v is the original value and is the normalized result value. Then, the map maps the -dimensional points onto the three-dimensional space with the convenience of visual parameter adjustment. Let the three-dimensional point represent the image , of the F-dimensional normalized data points in the three-dimensional space. is determined by the average of the vector sums of the d vectors , where is the spherical coordinate representing the d dimension in the three-dimensional visual space. According to the A mapping, the three-dimensional projection point is determined by the following formula:
Here, the vector is an adjustable scaling parameter; the initial rotation parameters and are set to , which can be adjusted later. The point refers to the center of the display area. The A map is a linear map with fixed values of , , . If the center o is fixed, the mapping can be expressed as , where
is a linear transformation that will not break clusters in the visualization, but it may cause clusters to overlap. Clusters that overlap can be separated through interactive visualization.
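A three-dimensional counterpart of the mapping can be sketched in Python as follows. The sketch assumes that each axis direction is given in spherical coordinates with a polar angle measured from the z axis and an azimuth in the x-y plane, that each axis has one scale factor, and that the projected point is the average of the scaled axis vectors offset by the display center; these conventions, and the names `project_3d`, `alphas`, `thetas`, and `phis`, are assumptions rather than the paper's notation.

```python
import math

def project_3d(rows, alphas, thetas, phis, center=(0.0, 0.0, 0.0)):
    """Map normalized d-dimensional rows to 3D as an average of scaled axis vectors."""
    d = len(alphas)
    points = []
    for r in rows:
        x = sum(a * v * math.sin(t) * math.cos(p)
                for a, v, t, p in zip(alphas, r, thetas, phis)) / d
        y = sum(a * v * math.sin(t) * math.sin(p)
                for a, v, t, p in zip(alphas, r, thetas, phis)) / d
        z = sum(a * v * math.cos(t)
                for a, v, t in zip(alphas, r, thetas)) / d
        points.append((center[0] + x, center[1] + y, center[2] + z))
    return points

if __name__ == "__main__":
    # Three axes in the x-y plane, 120 degrees apart.
    thetas = [math.pi / 2] * 3
    phis = [0.0, 2 * math.pi / 3, 4 * math.pi / 3]
    rows = [[1.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
    for point in project_3d(rows, [1.0, 1.0, 1.0], thetas, phis):
        print(point)
```

Because the map is linear in the data values for fixed scales and angles, clusters are not torn apart, but, as in the 2D case, different records can still land on the same position.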
In order to distinguish the visual differences between i-tStar (3D) and i-tStar, the three-dimensional Star Coordinates are combined with the spherical coordinate system.
The process of manually determining the optimal configuration for projecting high-dimensional data into a low-dimensional space is cumbersome and may require browsing a large number of configurations. The proposed algorithm enables the user to obtain the best projection without manually browsing all possible configurations, as shown in Table 1.
If there is a large number of dimensions and records in the dataset, it is effective to combine semi-supervised clustering with three-dimensional visual clustering, that is, to find the optimal projection distance metric given by the matrix M. The following are several alternatives for modeling and evaluating the best projection distance metrics for advanced data analysis, interactive visual clustering flexibility, and manual parameter adjustment.
If using the category label for annotation, the canonical variable  can be used to get the spherical coordinates of the optimal projection distance metric M. According to Bishop , the canonical variables of the three-dimensional projection can be obtained as follows:
For each cluster, first form the Mahalanobis covariance matrix and the mean , and then define the weighted covariance matrix , where is the number of data instances in cluster k and c is the total number of clusters.
Using μ, the average of the entire dataset and , the average of each cluster k, form a matrix .
An optimal projection matrix is formed from the first three eigenvectors, so that the data can be projected into three-dimensional space.
After obtaining the projection matrix , the matrix equation is solved in the following equation by elemental decomposition.
If are not all zero, then Eq. (1) has a unique solution of , and . This method is similar to converting Cartesian coordinates to spherical coordinates: , , , . If the projection matrix is a non-singular matrix, it may correspond to a unique set of , and can be visualized with Star Coordinates.
Fisher discriminant analysis usually makes an implicit assumption that the data follow a multivariate normal distribution. When no specific assumption is made about the data distribution, the distance metric can be obtained from a set of similar and dissimilar pairs by optimizing a function that reduces the distance between similar items while increasing the distance between dissimilar items. When exploring the projection distance metric M of a dataset separated in a three-dimensional projection space (rather than the original space), it is defined as the distance between two items and in the projected three-dimensional space:
Consider a set of similar pairs S and a set of dissimilar pairs D: assuming that some items possess category labels, items with the same category label form the similarity set S, and items with different labels form the dissimilarity set D.
To illustrate the efficacy of the i-tStar (3D) algorithm, the performance of i-tStar (3D) was compared with that of the i-tStar, and simulated data sets were used in the empirical analysis. The simulated data is composed of three types of Gaussian distribution data in five dimensions, and the mean and covariance matrices used are given by the following formula:
Fig. 8 shows the results obtained using i-tStar and i-tStar (3D) projections. The i-tStar (3D) algorithm appears to render better visualizations, with clearly distinguishable images of the three classes. This may be because, on some data sets, the projection obtained by i-tStar gives a fuzzier indication of the classes than i-tStar (3D): data points are relatively sparsely distributed, with no clear boundary between two of the three classes.
In the mining field, open-pit mining often relies on large mining trucks as the primary means of transport. A GPS receiving module installed on each mining truck periodically receives GPS satellite signals to obtain the real-time three-dimensional coordinates of the truck, and a large amount of trajectory data accumulates as the truck moves. The mine car data combines general features or metadata with spatiotemporal data: the spatial dimension is contained in the recorded geolocation, and the time dimension represents the continuity of these data over time. As a result, these data are multidimensional in space and time. Moreover, the movement of the mine car is accompanied by changes in direction, speed, tire temperature and tire pressure, which constitute the variable data, that is, the attributes, of the mine car. Our dataset was collected continuously in a mining area in Inner Mongolia, China, from June 28, 2016 to August 30, 2016. It consists of four dimensions (three geospatial dimensions and time) and four trajectory attributes (direction, speed, tire temperature, tire pressure). To facilitate visualization, multidimensional and multivariate data are not distinguished; instead, they are treated as data instances of eight dimensions that describe all the relevant information about the mine car. This paper uses i-tStar and i-tStar (3D) to realize the mining and visual modeling of this high-dimensional trajectory dataset.
We use DSIM, PSIM and CSIM to measure the similarity between pairs of data dimensions, and then visualize the dataset with the proposed multi-class method to determine the number of labels that gives the best visualization. Figs. 9a to 9d show the visualization results of uniform star coordinates, DSIM-based i-tStar, PSIM-based i-tStar and CSIM-based i-tStar, respectively. Under uniform star coordinates some clusters overlap, so the clusters are not cleanly separated and some are mixed. The latter three methods of configuring the star coordinate layout separate the clusters better. All modified star coordinates outperform standard star coordinates, and the DSIM-based i-tStar gives the best visualization of the dataset.
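For reference, the uniform star coordinates baseline can be sketched as follows. This is a minimal illustration that assumes min-max normalization, unit-length axes spaced at equal angles, and a clockwise layout starting at 12 o'clock; the function names are hypothetical. It does not implement the DSIM-, PSIM- or CSIM-based axis configuration of i-tStar.

```python
import math


def uniform_axes(n_dims):
    """Unit axis vectors at equal angles, first axis at 12 o'clock, proceeding clockwise."""
    axes = []
    for k in range(n_dims):
        angle = math.pi / 2 - 2 * math.pi * k / n_dims
        axes.append((math.cos(angle), math.sin(angle)))
    return axes


def min_max_normalize(rows):
    """Scale every column to [0, 1]; constant columns map to 0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def star_project(rows, axes):
    """Map each normalized row to 2D as the sum of its values times the axis vectors."""
    points = []
    for row in rows:
        x = sum(v * ax for v, (ax, _) in zip(row, axes))
        y = sum(v * ay for v, (_, ay) in zip(row, axes))
        points.append((x, y))
    return points


# Toy 4-dimensional data, for illustration only.
data = [[0.0, 10.0, 1.0, 5.0], [5.0, 20.0, 2.0, 5.0], [10.0, 30.0, 3.0, 5.0]]
projected = star_project(min_max_normalize(data), uniform_axes(4))
```

Because the mapping is a plain linear sum, distinct high-dimensional points can land on the same 2D position, which is one way the overlap seen with uniform axes can arise.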
When visualizing multiple clusters in a multidimensional trajectory dataset, a single visual space is not enough to show the separation of the clusters. Visualizing the dataset with the proposed multi-class method effectively solves this problem. In this case, samples from multiple classes are randomly selected as labeled input data. As shown in Figs. 9e to 9h, we labeled a small number of data samples: 3 samples from one class, 4 from another and 5 from a third. Although the number of labeled samples affects the proposed method, the results are satisfactory over a wide range of values. The best data visualization is achieved when the axes are adjusted until the mapped point clouds (clusters) in the mapping plane are as dense and as well separated as possible, and i-tStar aims to achieve this optimal mapping. Even when the number of labeled samples is limited, users can easily interpret the visual results using the set of labeled samples. The method shows the clustering automatically and clearly, without any direct user participation. The minimal cluster overlap supports the effectiveness of i-tStar, and the results agree closely with our earlier reasoning. Our subsequent experimental visualizations are therefore based on the labeled DSIM-based i-tStar.
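The idea of preferring a layout in which the labeled clusters are "as dense and separated as possible" can be made concrete with a simple score. The sketch below is one possible measure, assumed here for illustration only: the mean distance between class centroids divided by the mean distance of each labeled sample to its own centroid. It is not the criterion used by i-tStar, and the function name and sample coordinates are hypothetical.

```python
import math
from collections import defaultdict


def separation_score(points, labels):
    """Mean centroid-to-centroid distance divided by mean point-to-own-centroid distance.

    Larger values mean denser, better separated labeled clusters in the 2D plane.
    """
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    if len(groups) < 2:
        raise ValueError("need labeled samples from at least two classes")

    centroids = {
        label: (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
        for label, pts in groups.items()
    }
    # Within-class spread: average distance of each sample to its class centroid.
    within = sum(
        math.dist(point, centroids[label]) for point, label in zip(points, labels)
    ) / len(points)
    # Between-class separation: average distance over all centroid pairs.
    cents = list(centroids.values())
    pairs = [(a, b) for i, a in enumerate(cents) for b in cents[i + 1:]]
    between = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    return math.inf if within == 0 else between / within


# Hypothetical projected positions of a few labeled samples under two candidate layouts.
labels = ["a", "a", "b", "b"]
layout_1 = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
layout_2 = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
best = max((layout_1, layout_2), key=lambda pts: separation_score(pts, labels))
```

Under this score the first layout is preferred, because its two labeled groups have the same internal spread but lie much farther apart.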
We then analyze the data further and merge related attributes. A PCA-based clustering algorithm is used to cluster some attributes of the dataset; this merges the time axis with the tire temperature axis, and the speed axis with the tire pressure axis. Starting at 12 o’clock and proceeding clockwise, the axes are the elevation axis, the longitude axis, the latitude axis, the time/tire temperature axis, the tire pressure/speed axis, and the direction axis. Attributes assigned to the same axis are highly correlated (tire pressure and speed; time and tire temperature). The i-tStar visualization results are shown in Fig. 10a. After cluster identification, Fig. 10b shows that the layout also reveals three clusters: stay, no-load, and full-load (stay points account for about 5%, no-load points for about 30%, and full-load points for about 65%).
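The effect of placing highly correlated attributes on a shared axis can be illustrated with a simplified stand-in for the attribute clustering step. The sketch below groups attributes whose absolute Pearson correlation exceeds a threshold; this is an assumption made for illustration and is not the PCA-based algorithm used in the paper. The function names, the threshold of 0.9 and the toy values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x > 0 and sd_y > 0 else 0.0


def group_attributes(columns, threshold=0.9):
    """Merge attributes into shared-axis groups when |correlation| >= threshold."""
    names = list(columns)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                parent[find(b)] = find(a)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())


# Toy values, for illustration only.
columns = {
    "speed": [1.0, 2.0, 3.0, 4.0],
    "tire_pressure": [8.0, 6.0, 4.0, 2.0],
    "direction": [1.0, -1.0, 1.0, -1.0],
}
axis_groups = group_attributes(columns)
```

In this toy data, speed and tire pressure are perfectly anti-correlated and end up on one axis, while direction keeps its own axis.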
Fig. 11 shows the initial state of the six-dimensional experiment and the clustering result produced by interactive manipulation. The SI view and the projected view are linked to show the importance of each cluster. The figure presents the i-tStar attribute clustering based on PCA and variance, together with 11 rearranged layouts of the resulting dataset. Across these layouts, both the distribution of the point clouds and the dispersion and aggregation of the clusters change.
Fig. 12 illustrates the interactive operations in practice. For example, scaling and rotation are first applied interactively to certain attributes to better differentiate the three clusters (fully loaded, empty, stay), and the user can move interactively from one cluster to another. In Fig. 12a, the combined attributes use time and tire temperature, speed, and tire pressure as clustering attributes. In Fig. 12b, the combined attributes use time and tire temperature, speed, and elevation as clustering attributes. The difference arises because, in Fig. 12b, the tire pressure attribute has been moved from the red axis to the green axis and the elevation attribute has been swapped in. A lens is used to display the contents of each clustered axis.
Fig. 12a shows that, in the purple lens, clusters with high pressure values and low speed values represent fully loaded trucks, and clusters with low pressure values and high speed values represent empty trucks. The stay points lie near the two axes and the origin, which indicates that tire pressure and speed are both significantly affected during a stay and that the two values cancel each other out. In the blue lens, the clusters of empty trucks appear where the time and temperature values are large, and the clusters of fully loaded trucks appear where the time and temperature values are small or where the two axes are close to the origin. The position of the stay points indicates that the stay state is not related to temperature and time, and that the correlation between the two is stronger.
Fig. 12b shows that, in the purple lens, most clusters with high pressure values and low elevation values are fully loaded trucks. Where the pressure value is low and the speed value is high, most of the clusters are empty trucks, and the stay points lie on the axis. In the blue lens, most of the empty-truck clusters appear where the time and temperature values are large, and most of the full-truck clusters appear where the time and temperature values are small or near the origin. Although the distribution of the point clouds differs from that in Fig. 12a, the overall trend is the same, and time, elevation, tire temperature, tire pressure, and speed are all highly relevant to the three clusters. These visualizations further validate the behavioral patterns of multi-attribute interactions in mine cars.
In general, i-tStar achieves better data mining and visualization of high-dimensional relationship distributions and can classify non-numeric data: clusters are visualized during data mapping, and i-tStar shows how the attribute correlations are dispersed. Although some clusters are only narrowly separated, all clusters can be seen to be separate from one another.
We now express the i-tStar visualization of Section 5.2.1 in the form of i-tStar (3D). The automatic configuration of i-tStar (3D) reveals hidden patterns in complex data sets without human intervention. Where necessary, semi-supervised clustering is applied.
The experimental results show that, in i-tStar, the basic representation of the data, the display, and the input device are all essentially two-dimensional. When there is no obvious separation between two of the three classes in the i-tStar display of the dataset, the result resembles a scatter plot. In contrast, the projections produced by the i-tStar (3D) projection algorithm show clear category separation, clear boundaries and compact clusters; that is, they convey the trends in the data better than the i-tStar projection. To some extent, this explains why, compared with i-tStar, i-tStar (3D) reveals hidden patterns in the data and helps to better visualize complex high-dimensional data.
As a valuable extension of i-tStar, i-tStar (3D) not only retains all the functions of i-tStar but also provides and exploits the new three-dimensional aspects of the system. The i-tStar (3D) projection has a higher degree of freedom, because the i-tStar (3D) visualization algorithm defines a process that selects the best configuration for the 3D projection using a clustering validity index. In general, compared with i-tStar, i-tStar (3D) has the following advantages: 1) system rotation preserves the configuration of the data while allowing different views to be considered; 2) the unbounded extent of a volume relative to a surface makes the structure of the data easier to discover; 3) the attribute references provided can be used to perform more complex multivariate analysis.
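The first advantage can be checked with a small numerical sketch: rotating a 3D point cloud changes the view but leaves all pairwise distances, and hence the configuration of the data, unchanged. The example below is illustrative only; rotation about the z axis, the function names and the sample points are assumptions and do not reproduce the i-tStar (3D) implementation.

```python
import math


def rotate_about_z(points, angle):
    """Rotate 3D points about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def pairwise_distances(points):
    """Euclidean distances between all unordered pairs of points."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]


# Hypothetical projected 3D points, for illustration only.
points_3d = [(1.0, 0.0, 0.0), (0.0, 2.0, 1.0), (-1.0, 1.0, 3.0)]
rotated = rotate_about_z(points_3d, math.radians(40))

before = pairwise_distances(points_3d)
after = pairwise_distances(rotated)
# The two distance lists agree up to floating-point rounding.
unchanged = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(before, after))
```

By contrast, rearranging or rescaling axes in a 2D star coordinate layout generally does change the relative positions of the projected points.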
Building on the original Star Coordinates technique for high-dimensional data visualization, we developed i-tStar for high-dimensional trajectory data and extended it to i-tStar (3D), which gives better visualization. This type of model is not only the most scalable technique for visualizing high-dimensional trajectory big data, but can also be used for exploratory tasks such as cluster analysis, outlier detection, trend prediction and decision making. Any projection necessarily loses information and inevitably produces some cluster overlap. We implemented i-tStar and i-tStar (3D) in several respects so as to perform a complete and complementary visual search of high-dimensional data, based on local and global patterns, in an iterative visual search process. More importantly, we point out their strengths and weaknesses, which provide guidance for future research.
Funding Statement: Beijing Key Laboratory of Urban Spatial Information Engineering, Grant No. 20220105. Ningxia Natural Science Foundation, No. 2021AAC03060.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.