Improving Cache Management with Redundant RDDs Eviction in Spark

Yao Zhao; Jian Dong; Hongwei Liu; Jin Wu; Yanxin Liu

doi:10.32604/cmc.2021.016462

Open Access

ARTICLE


Yao Zhao¹, Jian Dong¹,*, Hongwei Liu¹, Jin Wu², Yanxin Liu¹

1 School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
2 School of Engineering, University of Georgia, Athens, 30602, USA

* Corresponding Author: Jian Dong.

Computers, Materials & Continua 2021, 68(1), 727-741. https://doi.org/10.32604/cmc.2021.016462

Received 02 January 2021; Accepted 03 February 2021; Issue published 22 March 2021

Abstract

Efficient cache management plays a vital role in in-memory data-parallel systems such as Spark, Tez, Storm and HANA. Recent research, notably on the Least Reference Count (LRC) and Most Reference Distance (MRD) policies, has shown that dependency-aware cache management policies that consider the application’s directed acyclic graph (DAG) perform well in Spark. However, these policies ignore further relationships between resilient distributed datasets (RDDs) and cache some redundant RDDs that share the same child RDDs, which degrades memory performance. Hence, in memory-constrained situations, systems may encounter a performance bottleneck due to frequent data block replacement. In addition, the prefetch mechanisms in some cache management policies, such as MRD, are hard to trigger. In this paper, we propose a new cache management method called RDE (Redundant Data Eviction) that fully utilizes the application’s DAG information to optimize cache management. By considering both the dependencies among RDDs and their reference sequence, we effectively evict RDDs with redundant features and free memory for incoming data blocks. Experiments show that RDE improves performance by an average of 55% compared to LRU and by up to 48% and 20% compared to LRC and MRD, respectively. RDE is also less sensitive to memory bottlenecks, which means better availability in memory-constrained environments.
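
The abstract does not spell out how RDE decides that an RDD is redundant. As a rough illustration only, and not the authors' implementation, the Python sketch below computes a static reference count (number of child RDDs, loosely in the spirit of LRC) on a toy DAG and groups RDDs that share exactly the same set of child RDDs. The function names, the dictionary encoding of the DAG and the grouping rule are all assumptions made for this example.

```python
from collections import defaultdict

def reference_counts(dag):
    """Count how many child RDDs depend on each RDD.

    `dag` is a hypothetical encoding: it maps each RDD name to the list
    of parent RDDs it is computed from.
    """
    counts = {rdd: 0 for rdd in dag}
    for parents in dag.values():
        for parent in parents:
            counts[parent] = counts.get(parent, 0) + 1
    return counts

def groups_with_same_children(dag):
    """Group RDDs whose sets of child RDDs are identical.

    This grouping rule is an assumption used to illustrate the phrase
    "redundant RDDs with the same child RDDs"; the paper's actual
    criterion may differ.
    """
    children = defaultdict(set)
    for child, parents in dag.items():
        for parent in parents:
            children[parent].add(child)
    by_children = defaultdict(list)
    for rdd, kids in children.items():
        by_children[frozenset(kids)].append(rdd)
    # Keep only groups with more than one member, in a stable order.
    return sorted(sorted(group) for group in by_children.values() if len(group) > 1)

# Toy DAG: A and B both feed only C; C feeds D and E.
dag = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"], "E": ["C"]}
counts = reference_counts(dag)
groups = groups_with_same_children(dag)
```

In this toy DAG, A and B each have a reference count of 1 and share the single child C, so they land in the same group. A policy that looks only at reference counts would treat them independently, which is the kind of relationship the abstract says LRC and MRD overlook.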

Keywords

Dependency-aware; cache management; in-memory computing; Spark

Cite This Article

APA Style

Zhao, Y., Dong, J., Liu, H., Wu, J., Liu, Y. (2021). Improving Cache Management with Redundant RDDs Eviction in Spark. Computers, Materials & Continua, 68(1), 727–741. https://doi.org/10.32604/cmc.2021.016462

Vancouver Style

Zhao Y, Dong J, Liu H, Wu J, Liu Y. Improving Cache Management with Redundant RDDs Eviction in Spark. Comput Mater Contin. 2021;68(1):727–741. https://doi.org/10.32604/cmc.2021.016462

IEEE Style

Y. Zhao, J. Dong, H. Liu, J. Wu, and Y. Liu, “Improving Cache Management with Redundant RDDs Eviction in Spark,” Comput. Mater. Contin., vol. 68, no. 1, pp. 727–741, 2021. https://doi.org/10.32604/cmc.2021.016462


Copyright © 2021 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
