Open Access
ARTICLE
SMConf: One-Size-Fit-Bunch, Automated Memory Capacity Configuration for In-Memory Data Analytic Platform
1 Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
2 Department of Informatics, University of Zurich, Zurich, CH-8050, Switzerland
* Corresponding Author: Yi Liang. Email:
Computers, Materials & Continua 2021, 66(2), 1697-1717. https://doi.org/10.32604/cmc.2020.012513
Received 02 July 2020; Accepted 31 July 2020; Issue published 26 November 2020
Abstract
Spark is the most popular in-memory processing framework for big data analytics. Memory is the crucial resource for workloads to achieve performance acceleration on Spark. The extant memory capacity configuration approach in Spark is to statically configure the memory capacity for workloads based on user’s specifications. However, without the deep knowledge of the workload’s system-level characteristics, users in practice often conservatively overestimate the memory utilizations of their workloads and require resource manager to grant more memory share than that they actually need, which leads to the severe waste of memory resources. To address the above issue, SMConf, an automated memory capacity configuration solution for in-memory computing workloads in Spark is proposed. SMConf is designed based on the observation that, though there is not one-size-fit-all proper configuration, the one-size-fit-bunch configuration can be found for in-memory computing workloads. SMConf classifies typical Spark workloads into categories based on metrics across layers of Spark system stack. For each workload category, an individual memory requirement model is learned from the workload’s input data size and the strong-correlated configuration parameters. For an ad-hoc workload, SMConf matches its memory requirement signature to one of the workload categories with small-sized input data and determines its proper memory capacity configuration with the corresponding memory requirement model. Experimental results demonstrate that, compared to the conservative default configuration, SMConf can reduce the memory resource provision to Spark workloads by up to 69% with the slight performance degradation, and reduce the average turnaround time of Spark workloads by up to 55% in the multi-tenant environments.Keywords
Cite This Article
This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.