Tech Science Press - Publisher of Open Access Journals

Open Access

ARTICLE

Research on Performance Optimization of Spark Distributed Computing Platform

Qinlu He¹,*, Fan Zhang¹, Genqing Bian¹, Weiqi Zhang¹, Zhen Li²

CMC-Computers, Materials & Continua, Vol.79, No.2, pp. 2833-2850, 2024, DOI:10.32604/cmc.2024.046807

Abstract Spark, a distributed computing platform, has rapidly developed in the field of big data. Its in-memory computing feature reduces disk read overhead and shortens data processing time, giving it broad application prospects in large-scale computing applications such as machine learning and image processing. However, the performance of the Spark platform still needs to be improved. When a large number of tasks are processed simultaneously, Spark’s cache replacement mechanism cannot identify high-value data partitions, so memory resources are not fully utilized and the performance of the Spark platform suffers. To address the problem that…

Open Access

ARTICLE

Deep Learning Model for Big Data Classification in Apache Spark Environment

T. M. Nithya¹,*, R. Umanesan², T. Kalavathidevi³, C. Selvarathi⁴, A. Kavitha⁵

Intelligent Automation & Soft Computing, Vol.37, No.3, pp. 2537-2547, 2023, DOI:10.32604/iasc.2022.028804

Abstract Big data analytics is a popular research topic due to its applicability in various real-time applications. Recently developed machine learning and deep learning models can be applied to analyze big data with better performance. Since big data involves numerous features and necessitates high computational time, feature selection methodologies using metaheuristic optimization algorithms can be adopted to choose an optimum set of features and thereby improve the overall classification performance. This study proposes a new sigmoid butterfly optimization method with an optimum gated recurrent unit (SBOA-OGRU) model for big data classification in Apache Spark.

Open Access

ARTICLE

Analysis of CLARANS Algorithm for Weather Data Based on Spark

Jiahao Zhang, Honglin Wang*

CMC-Computers, Materials & Continua, Vol.76, No.2, pp. 2427-2441, 2023, DOI:10.32604/cmc.2023.038462

Abstract With the rapid development of technology, processing the explosively growing volume of meteorological data on traditional standalone computers has become increasingly time-consuming and cannot meet the demands of scientific research and business. Therefore, this paper proposes the implementation of the parallel Clustering Large Application based upon RANdomized Search (CLARANS) clustering algorithm on the Spark cloud computing platform to cluster China’s climate regions using meteorological data from 1988 to 2018. The aim is to address the challenge of applying clustering algorithms to large datasets. In this paper, the morphological similarity distance is adopted as the similarity measurement…

Open Access

ARTICLE

A Parallel Approach for Sentiment Analysis on Social Networks Using Spark

M. Mohamed Iqbal¹,*, K. Latha²

Intelligent Automation & Soft Computing, Vol.35, No.2, pp. 1831-1842, 2023, DOI:10.32604/iasc.2023.029036

Abstract The public is increasingly using social media platforms such as Twitter and Facebook to express their views on a variety of topics. As a result, social media has emerged as the most effective and largest open source for obtaining public opinion. Single-node computational methods are inefficient for sentiment analysis on such large datasets. Supercomputers and parallel or distributed processing are two options for dealing with such large amounts of data. Most parallel programming frameworks, such as MPI (Message Passing Interface), are difficult to use and scale in environments where supercomputers are expensive. Using the…

Open Access

ARTICLE

Research on Optimization of Random Forest Algorithm Based on Spark

Suzhen Wang¹, Zhanfeng Zhang¹,*, Shanshan Geng¹, Chaoyi Pang²

CMC-Computers, Materials & Continua, Vol.71, No.2, pp. 3721-3731, 2022, DOI:10.32604/cmc.2022.015378

Abstract As society has developed, increasing amounts of data have been generated by various industries. The random forest algorithm, as a classification algorithm, is widely used because of its superior performance. However, when generating feature subspaces, the random forest algorithm uses a simple random sampling feature selection method, which cannot distinguish redundant features; this affects its classification accuracy and results in low computational efficiency in stand-alone mode. In response to these problems, the present paper conducts related optimization research with Spark. This improved random forest algorithm performs feature extraction according…

Open Access

ARTICLE

Effects of Spark Energy on Spark Plug Fault Recognition in a Spark Ignition Engine

A. A. Azrin¹,*, I. M. Yusri¹,², M. H. Mat Yasin³, A. Zainal⁴

Energy Engineering, Vol.119, No.1, pp. 189-199, 2022, DOI:10.32604/EE.2022.017843

Abstract The increasing demands for fuel economy and emission reduction have led to the development of lean/diluted combustion strategies for modern Spark Ignition (SI) engines. The new generation of SI engines requires higher spark energy and a longer discharge duration to improve efficiency and reduce the backpressure. However, the increased spark energy negatively affects the ignition system, resulting in deterioration of the spark plug. Therefore, a numerical model was used to estimate the spark energy of the ignition system based on the breakdown voltage. The trend of spark energy is then recognized by…

Open Access

ARTICLE

Spark Spectrum Allocation for D2D Communication in Cellular Networks

Tanveer Ahmad¹, Imran Khan², Azeem Irshad³, Shafiq Ahmad⁴, Ahmed T. Soliman⁴, Akber Abid Gardezi⁵, Muhammad Shafiq⁶,*, Jin-Ghoo Choi⁶

CMC-Computers, Materials & Continua, Vol.70, No.3, pp. 6381-6394, 2022, DOI:10.32604/cmc.2022.019787

Abstract The device-to-device (D2D) technology performs explicit communication between the terminal and the base station (BS) terminal, so there is no need to transmit data through the BS system. The establishment of a short-distance D2D communication link can greatly reduce the burden on the BS server. At present, D2D is one of the key technologies in 5G and has been studied in depth. D2D communication reuses the resources of cellular users to improve key system parameters such as utilization and throughput. However, repeated use of the spectrum and coexistence of cellular users can cause co-channel interference.

Open Access

ARTICLE

Applying Apache Spark on Streaming Big Data for Health Status Prediction

Ahmed Ismail Ebada¹, Ibrahim Elhenawy², Chang-Won Jeong³, Yunyoung Nam⁴,*, Hazem Elbakry¹, Samir Abdelrazek¹

CMC-Computers, Materials & Continua, Vol.70, No.2, pp. 3511-3527, 2022, DOI:10.32604/cmc.2022.019458

Abstract Big data applications in healthcare have provided a variety of solutions to reduce costs, errors, and waste. This work aims to develop a real-time system based on big medical data processing in the cloud for the prediction of health issues. In the proposed scalable system, medical parameters are sent to Apache Spark to extract attributes from data and apply the proposed machine learning algorithm. In this way, healthcare risks can be predicted and sent as alerts and recommendations to users and healthcare providers. The proposed work also aims to provide an effective recommendation system by…

Open Access

ARTICLE

Improving Cache Management with Redundant RDDs Eviction in Spark

Yao Zhao¹, Jian Dong¹,*, Hongwei Liu¹, Jin Wu², Yanxin Liu¹

CMC-Computers, Materials & Continua, Vol.68, No.1, pp. 727-741, 2021, DOI:10.32604/cmc.2021.016462

Abstract Efficient cache management plays a vital role in in-memory data-parallel systems, such as Spark, Tez, Storm and HANA. Recent research, notably research on the Least Reference Count (LRC) and Most Reference Distance (MRD) policies, has shown that dependency-aware cache management practices that consider the application’s directed acyclic graph (DAG) perform well in Spark. However, these practices ignore further relationships between RDDs and cache some redundant RDDs with the same child RDDs, which degrades memory performance. Hence, in memory-constrained situations, systems may encounter a performance bottleneck due to frequent data block replacement. In addition,…

Open Access

ARTICLE

Deep Learning-Based Hybrid Intelligent Intrusion Detection System

Muhammad Ashfaq Khan¹,², Yangwoo Kim¹,*

CMC-Computers, Materials & Continua, Vol.68, No.1, pp. 671-687, 2021, DOI:10.32604/cmc.2021.015647

Abstract Machine learning (ML) algorithms are often used to design effective intrusion detection (ID) systems for appropriate mitigation and effective detection of malicious cyber threats at the host and network levels. However, cybersecurity attacks are still increasing. An ID system can play a vital role in detecting such threats. Existing ID systems are unable to detect malicious threats, primarily because they adopt approaches based on traditional ML techniques, which are less concerned with accurate classification and feature selection. Thus, developing an accurate and intelligent ID system is a priority. The main objective of…
