K. Sailaja Kumar*, H. K. Manoj, D. Evangelin Geetha
Computer Systems Science and Engineering, Vol.44, No.1, pp. 485-499, 2023, DOI:10.32604/csse.2023.025390
- 01 June 2022
Abstract: Standalone systems cannot handle the heavy traffic loads generated by Twitter because of memory constraints. The parallel computational environment provided by Apache Hadoop can distribute and process the data across different destination systems. In this paper, a four-node Hadoop cluster integrated with RHadoop, Flume, and Hive is created to analyze tweets gathered from the Twitter stream. Twitter stream data relevant to events/topics such as IPL-2015, cricket, Royal Challengers Bangalore, Kohli, and Modi was collected from May 24 to 30, 2016 using Flume. Hive is used as a data warehouse to store the…