Open Access
ARTICLE
An Improved Memory Cache Management Study Based on Spark
Hebei University of Economics and Business, Shijiazhuang, Hebei, 050061, China.
University College Dublin, Belfield, Dublin 4, Ireland.
The Australian e-Health Research Centre, ICT Centre, CSIRO, Australia.
* Corresponding Author: Ning Cao. Email: .
Computers, Materials & Continua 2018, 56(3), 415-431. https://doi.org/10.3970/cmc.2018.03716
Abstract
Spark is a fast, unified analytics engine for big data and machine learning, in which memory is a crucial resource. Resilient Distributed Datasets (RDDs) are parallel data structures that allow users to explicitly persist intermediate results in memory or on disk, and each RDD can be divided into several partitions. During task execution, Spark automatically monitors cache usage on each node; when an RDD needs to be stored in the cache but the space is insufficient, the system drops old data partitions in a least recently used (LRU) fashion to release more space. However, Spark has no mechanism designed specifically for caching RDDs, and LRU takes neither the dependencies among RDDs nor the needs of future stages into consideration. In this paper, we propose an optimization approach for RDD caching and LRU based on the features of partitions, which comprises three parts: a prediction mechanism for persistence, a weight model using the entropy method, and an update mechanism for weights and memory based on RDD partition features. Finally, through verification on the Spark platform, the experimental results show that our strategy can effectively reduce execution time and improve memory usage.

Keywords
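For readers unfamiliar with the eviction policy the paper sets out to improve, the following is a minimal, self-contained sketch of least-recently-used eviction. It is not Spark's actual implementation (Spark evicts block-by-block under its memory manager); the class and key names here are purely illustrative of the policy the abstract describes.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal sketch of LRU eviction: when the cache is full, the
    least recently accessed entry is dropped to make room."""

    def __init__(self, capacity):
        self.capacity = capacity     # max number of cached partitions
        self.store = OrderedDict()   # order of keys tracks recency

    def get(self, key):
        if key not in self.store:
            return None
        self.store.move_to_end(key)  # mark as most recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            # Evict the least recently used entry and report its key.
            evicted_key, _ = self.store.popitem(last=False)
            return evicted_key
        return None

# Illustrative usage with hypothetical partition names:
cache = LRUCache(capacity=2)
cache.put("rdd1_part0", "data-a")
cache.put("rdd2_part0", "data-b")
cache.get("rdd1_part0")                       # rdd1_part0 becomes most recent
cache.put("rdd3_part0", "data-c")             # evicts rdd2_part0, not rdd1_part0
```

Note that the policy looks only at recency of access: a partition that a future stage will soon need, or one whose loss forces an expensive recomputation through a long lineage of dependencies, is evicted just as readily as a useless one. This is exactly the blind spot the weight-based strategy in this paper targets.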
Cite This Article
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.