Open Access

ARTICLE

An Improved Memory Cache Management Study Based on Spark

Suzhen Wang1, Yanpiao Zhang1, Lu Zhang1, Ning Cao2, *, Chaoyi Pang3

1 Hebei University of Economics and Business, Shijiazhuang, Hebei, 050061, China.
2 University College Dublin, Belfield, Dublin 4, Ireland.
3 The Australian e-Health Research Centre, ICT Centre, CSIRO, Australia.

* Corresponding Author: Ning Cao. Email: email.

Computers, Materials & Continua 2018, 56(3), 415-431. https://doi.org/10.3970/cmc.2018.03716

Abstract

Spark is a fast, unified analytics engine for big data and machine learning in which memory is a crucial resource. Resilient Distributed Datasets (RDDs) are parallel data structures that allow users to explicitly persist intermediate results in memory or on disk, and each RDD can be divided into several partitions. During task execution, Spark automatically monitors cache usage on each node. When an RDD needs to be cached and space is insufficient, the system evicts old data partitions in least recently used (LRU) order to free space. However, Spark has no mechanism dedicated to caching RDDs, and LRU takes into account neither the dependencies between RDDs nor whether later stages will need them. In this paper, we propose an optimization approach for RDD caching and LRU based on the features of partitions. It comprises three parts: a prediction mechanism for persistence, a weight model built with the entropy method, and an update mechanism for weights and memory based on RDD partition features. Experiments on the Spark platform show that our strategy effectively reduces execution time and improves memory utilization.
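
The baseline LRU behavior described in the abstract can be illustrated with a small sketch. This is a simplified toy model written for illustration, not Spark's implementation and not the strategy proposed in the paper; the class name `LruPartitionCache`, its methods, and the partition ids are hypothetical.

```python
from collections import OrderedDict


class LruPartitionCache:
    """Toy model (hypothetical) of a size-bounded partition cache with LRU eviction."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        # Maps partition id -> size in bytes, least recently used first.
        self._entries = OrderedDict()

    def get(self, partition_id):
        """Return True on a cache hit and mark the partition as recently used."""
        if partition_id not in self._entries:
            return False
        self._entries.move_to_end(partition_id)
        return True

    def put(self, partition_id, size_bytes):
        """Cache a partition, evicting LRU partitions until it fits.

        Returns the list of evicted partition ids.
        """
        evicted = []
        if size_bytes > self.capacity_bytes:
            return evicted  # too large to cache at all, so nothing is evicted
        if partition_id in self._entries:
            # Re-inserting an existing partition: release its old size first.
            self.used_bytes -= self._entries.pop(partition_id)
        while self.used_bytes + size_bytes > self.capacity_bytes:
            victim, victim_size = self._entries.popitem(last=False)
            self.used_bytes -= victim_size
            evicted.append(victim)
        self._entries[partition_id] = size_bytes
        self.used_bytes += size_bytes
        return evicted

    def cached_ids(self):
        """Return cached partition ids, least recently used first."""
        return list(self._entries)


cache = LruPartitionCache(capacity_bytes=100)
cache.put("rdd_1_0", 40)
cache.put("rdd_1_1", 40)
cache.get("rdd_1_0")                # rdd_1_0 becomes the most recently used
evicted = cache.put("rdd_2_0", 40)  # does not fit, so the LRU partition is dropped
print(evicted)                      # ['rdd_1_1']
```

In this toy model the eviction choice depends only on access order. A partition that a later stage will need is evicted just as readily as one that will never be read again, which is the limitation of plain LRU that the abstract points out.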

Cite This Article

APA Style
Wang, S., Zhang, Y., Zhang, L., Cao, N., & Pang, C. (2018). An improved memory cache management study based on Spark. Computers, Materials & Continua, 56(3), 415-431. https://doi.org/10.3970/cmc.2018.03716
Vancouver Style
Wang S, Zhang Y, Zhang L, Cao N, Pang C. An improved memory cache management study based on Spark. Comput Mater Contin. 2018;56(3):415-431. https://doi.org/10.3970/cmc.2018.03716
IEEE Style
S. Wang, Y. Zhang, L. Zhang, N. Cao, and C. Pang, “An Improved Memory Cache Management Study Based on Spark,” Comput. Mater. Contin., vol. 56, no. 3, pp. 415-431, 2018. https://doi.org/10.3970/cmc.2018.03716



Copyright © 2018 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.