Open Access
ARTICLE
A Parallel Approach for Sentiment Analysis on Social Networks Using Spark
1 Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, 600062, India
2 University College of Engineering, Anna University (B. I. T Campus), Tiruchirappalli, 620024, India
* Corresponding Author: M. Mohamed Iqbal. Email:
Intelligent Automation & Soft Computing 2023, 35(2), 1831-1842. https://doi.org/10.32604/iasc.2023.029036
Received 23 February 2022; Accepted 29 March 2022; Issue published 19 July 2022
Abstract
The public increasingly uses social media platforms such as Twitter and Facebook to express views on a wide variety of topics. As a result, social media has emerged as the largest and most effective open source of public opinion. Single-node computational methods are inefficient for sentiment analysis on datasets of this size. Supercomputers and parallel or distributed processing are the two options for handling such volumes of data. Supercomputers are expensive, however, and most parallel programming frameworks, such as MPI (Message Passing Interface), are difficult to use and to scale. This work presents a scalable system for sentiment analysis on Twitter built on the Apache Spark parallel model. A Spark-based Naive Bayes training technique is proposed for this purpose; unlike prior research, this algorithm does not need any disk access. The trained model has been used to classify millions of tweets. Experiments on clusters of various sizes show that the proposed approach is highly scalable and cost-effective for larger datasets. It is nearly 12 times faster than the MapReduce-based model and nearly 21 times faster than the Naive Bayes classifier in Apache Mahout. To evaluate the framework's scalability, we gathered a large training corpus from Twitter. The accuracy of the classifier trained with this new dataset was more than 80%.
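As a rough illustration of the idea behind in-memory Naive Bayes training, the sketch below counts labels and words per data partition and then merges the partial counts, in the spirit of a map step followed by a reduce step. It is not the authors' algorithm and does not use the Spark API: it is plain Python, the function names (`count_partition`, `merge_counts`, `train`, `classify`) are hypothetical, and the whitespace tokenization and Laplace smoothing are assumptions made for the example.

```python
import math
from collections import Counter
from functools import reduce


def count_partition(rows):
    """Map step: count labels and (label, word) pairs within one partition."""
    label_counts, word_counts = Counter(), Counter()
    for label, text in rows:
        label_counts[label] += 1
        for word in text.lower().split():  # assumed tokenization: whitespace
            word_counts[(label, word)] += 1
    return label_counts, word_counts


def merge_counts(a, b):
    """Reduce step: combine two partial results entirely in memory."""
    return a[0] + b[0], a[1] + b[1]


def train(partitions):
    """Build a multinomial Naive Bayes model from a list of partitions."""
    label_counts, word_counts = reduce(merge_counts, map(count_partition, partitions))
    vocab = {word for _, word in word_counts}
    totals = Counter()  # total number of word occurrences per label
    for (label, _), n in word_counts.items():
        totals[label] += n
    return label_counts, word_counts, totals, vocab


def classify(model, text):
    """Return the label with the highest log-posterior score."""
    label_counts, word_counts, totals, vocab = model
    n_docs = sum(label_counts.values())

    def score(label):
        s = math.log(label_counts[label] / n_docs)  # log prior
        for word in text.lower().split():
            # Laplace (add-one) smoothing, an assumption of this sketch
            s += math.log((word_counts[(label, word)] + 1) / (totals[label] + len(vocab)))
        return s

    return max(label_counts, key=score)
```

In a real Spark job, each partition would be counted on a separate worker and the merge would be done by the framework's aggregation operators; the sketch only mirrors that data flow on a single machine.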
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.