Open Access
ARTICLE
A Spark Scheduling Strategy for Heterogeneous Cluster
School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, P.R.C.
Eberly College of Science, Pennsylvania State University, Old Main, State College, PA 16801, U.S.A.
* Corresponding Author: Gongshen Liu.
Computers, Materials & Continua 2018, 55(3), 405-417. https://doi.org/10.3970/cmc.2018.02527
Abstract
As a mainstream distributed computing system, Spark is used to solve increasingly complex tasks. However, Spark's native scheduling strategy assumes a homogeneous cluster and is less effective on a heterogeneous one. This study aims to find a more effective task scheduling strategy and integrate it into the Spark source code. After investigating Spark's scheduling principles and mechanisms, we propose a stratifying algorithm and a node scheduling algorithm to optimize the native scheduling strategy. The new strategy calculates the static level of each node and also takes into account dynamic factors such as the length of running tasks and the CPU usage of worker nodes. In a series of comparative experiments on a heterogeneous cluster, the new strategy achieved shorter running time and lower CPU usage than the original Spark strategy, which indicates that the new scheduling strategy is more effective.
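To make the idea of combining a static node level with dynamic load factors concrete, here is a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the scoring formula, so the `Node` fields, the weights `w_tasks` and `w_cpu`, and the ratio used in `score` are all illustrative assumptions. "Length of running tasks" is interpreted here as the number of tasks currently running on a node.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    static_level: float  # assumed capability score; higher means a stronger node
    running_tasks: int   # assumed reading of "length of running tasks"
    cpu_usage: float     # current CPU usage as a fraction in [0, 1]


def score(node: Node, w_tasks: float = 1.0, w_cpu: float = 1.0) -> float:
    """Hypothetical score: static capability discounted by current load."""
    load = 1.0 + w_tasks * node.running_tasks + w_cpu * node.cpu_usage
    return node.static_level / load


def pick_node(nodes: List[Node]) -> Node:
    """Choose the node with the highest load-adjusted score."""
    return max(nodes, key=score)


# A strong but busy node loses to a weaker idle one ...
busy_cluster = [Node("fast", 4.0, 6, 0.9), Node("slow", 2.0, 0, 0.1)]
# ... while the strong node wins when both are idle.
idle_cluster = [Node("fast", 4.0, 0, 0.1), Node("slow", 2.0, 0, 0.1)]

print(pick_node(busy_cluster).name)
print(pick_node(idle_cluster).name)
```

The point of the sketch is only that a scheduler which ignores either the static capability or the current load would make the wrong choice in one of the two clusters above; the actual weighting would have to come from the full paper.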
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.