Open Access
ARTICLE
Twitter Data Analysis Using Hadoop and ‘R’ and Emotional Analysis Using Optimized SVNN
Department of CA, M S Ramaiah Institute of Technology, 560054, Bangalore, India
* Corresponding Author: K. Sailaja Kumar. Email:
Computer Systems Science and Engineering 2023, 44(1), 485-499. https://doi.org/10.32604/csse.2023.025390
Received 22 November 2021; Accepted 11 January 2022; Issue published 01 June 2022
Abstract
Standalone systems cannot handle the heavy traffic generated by Twitter because of memory constraints. The parallel computing environment provided by Apache Hadoop can distribute and process the data across multiple systems. In this paper, a four-node Hadoop cluster integrated with RHadoop, Flume, and Hive is created to analyze tweets gathered from the Twitter stream. Using Flume, Twitter stream data relevant to events and topics such as IPL-2015, cricket, Royal Challengers Bangalore, Kohli, and Modi is collected from May 24 to 30, 2016. Hive is used as a data warehouse to store the streamed tweets. Twitter analytics such as the maximum number of tweets by a user, the average number of followers, and the maximum number of friends are obtained using Hive. A network graph is constructed in ‘R’ from each user’s unique screen name and mentions, and a timeline graph of individual users is also generated in ‘R’. In addition, the proposed solution analyzes the emotions of cricket fans by classifying their Twitter messages into appropriate emotional categories using an optimized support vector neural network (OSVNN) classification model. To attain better classification accuracy, the performance of the SVNN is enhanced using the chimp optimization algorithm (ChOA). Extracting users’ emotions toward an event is useful for prediction, and it becomes more powerful when coupled with visualizations. A bar chart and a word cloud are generated to visualize the results of the emotional analysis.
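As a rough illustration of the kind of analytics the abstract mentions (tweets per user, average followers, maximum friends, and a mention network), the following Python sketch computes them over a tiny in-memory sample. It is not the authors' Hive or ‘R’ implementation: the records, the field names (`screen_name`, `followers`, `friends`, `text`), and the choice to average followers over distinct users are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical sample records; the field names are assumptions for this
# sketch, not the schema used in the paper.
tweets = [
    {"screen_name": "fan_a", "followers": 120, "friends": 80,
     "text": "Great innings by @kohli_fan and @rcb_fan"},
    {"screen_name": "fan_b", "followers": 300, "friends": 45,
     "text": "What a match! @fan_a"},
    {"screen_name": "fan_a", "followers": 120, "friends": 80,
     "text": "IPL final tonight"},
]


def tweet_counts(records):
    """Count how many tweets each screen name posted."""
    return Counter(r["screen_name"] for r in records)


def user_profiles(records):
    """Keep one (followers, friends) pair per distinct user (last seen wins)."""
    return {r["screen_name"]: (r["followers"], r["friends"]) for r in records}


def mention_edges(records):
    """Return directed (author, mentioned_user) pairs for a mention network."""
    pattern = re.compile(r"@(\w+)")
    return [(r["screen_name"], mentioned)
            for r in records
            for mentioned in pattern.findall(r["text"])]


counts = tweet_counts(tweets)
top_user, top_count = counts.most_common(1)[0]

profiles = user_profiles(tweets)
# Assumption: the average is taken over distinct users, not over tweets.
avg_followers = sum(f for f, _ in profiles.values()) / len(profiles)
max_friends = max(fr for _, fr in profiles.values())

edges = mention_edges(tweets)

print("most active user:", top_user, top_count)
print("average followers:", avg_followers)
print("maximum friends:", max_friends)
print("mention edges:", edges)
```

The edge list produced by `mention_edges` is the kind of input a graph library could turn into a network plot; in a cluster setting the same aggregations would be expressed as queries over the stored tweets rather than as in-memory loops.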
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.