Open Access

REVIEW

A Survey of Spark Scheduling Strategy Optimization Techniques and Development Trends

Chuan Li, Xuanlin Wen*
Department of Computer Science and Technology, School of Computer Science, Xi’an University of Posts and Telecommunications, Xi’an, 710100, China
* Corresponding Author: Xuanlin Wen

Computers, Materials & Continua https://doi.org/10.32604/cmc.2025.063047

Received 03 January 2025; Accepted 06 March 2025; Published online 01 April 2025

Abstract

Spark performs well in large-scale data-parallel computing and iterative processing. However, as data volumes and program complexity grow, its default scheduling strategy struggles to meet the demands of resource utilization and performance optimization. Scheduling strategy optimization is a key direction for improving Spark’s execution efficiency and has therefore attracted widespread attention. This paper first introduces the basic theory of Spark, compares several of its default scheduling strategies, and discusses common scheduling performance metrics and the factors that affect scheduling efficiency. It then summarizes existing scheduling optimization schemes under three scheduling modes, based respectively on workload characteristics, cluster characteristics, and the matching of the two. It analyzes representative algorithms in terms of performance metrics and applicable scenarios, and it compares the advantages and disadvantages of the different scheduling modes. The article also examines in detail how Spark scheduling strategies are integrated with specific application scenarios and the challenges they face in production environments. Finally, it analyzes the limitations of existing schemes and outlines directions for future research.

Keywords

Spark; scheduling optimization; load balancing; resource utilization; distributed computing