Table of Content

Open Access iconOpen Access

ARTICLE

A Scalable Method of Maintaining Order Statistics for Big Data Stream

Zhaohui Zhang*,1,2,3, Jian Chen1, Ligong Chen1, Qiuwen Liu1, Lijun Yang1, Pengwei Wang1,2,3, Yongjun Zheng4

School of Computer Science and Technology, Donghua University, 201620, China.
The Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, 201804, China.
Shanghai Engineering Research Center of Network Information Services, 201804, China.
School of Electronics, Computing and Mathematics, University of Derby, Derby, United Kingdom.

* Corresponding Author: Zhaohui Zhang. Email: email.

Computers, Materials & Continua 2019, 60(1), 117-132. https://doi.org/10.32604/cmc.2019.05325

Abstract

Recently, there are some online quantile algorithms that work on how to analyze the order statistics about the high-volume and high-velocity data stream, but the drawback of these algorithms is not scalable because they take the GK algorithm as the subroutine, which is not known to be mergeable. Another drawback is that they can’t maintain the correctness, which means the error will increase during the process of the window sliding. In this paper, we use a novel data structure to store the sketch that maintains the order statistics over sliding windows. Therefore three algorithms have been proposed based on the data structure. And the fixed-size window algorithm can keep the sketch of the last W elements. It is also scalable because of the mergeable property. The time-based window algorithm can always keep the sketch of the data in the last T time units. Finally, we provide the window aggregation algorithm which can help extend our algorithm into the distributed system. This provides a speed performance boost and makes it more suitable for modern applications such as system/network monitoring and anomaly detection. The experimental results show that our algorithm can not only achieve acceptable performance but also can actually maintain the correctness and be mergeable.

Keywords


Cite This Article

Z. Zhang, J. Chen, L. Chen, Q. Liu, L. Yang et al., "A scalable method of maintaining order statistics for big data stream," Computers, Materials & Continua, vol. 60, no.1, pp. 117–132, 2019.



cc This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
  • 1894

    View

  • 1018

    Download

  • 0

    Like

Related articles

Share Link