Open Access
ARTICLE
Tracking Dengue on Twitter Using Hybrid Filtration-Polarity and Apache Flume
1 Department of Information Systems, Faculty of Computer Science & Information Technology, Universiti Malaya, 50603, Kuala Lumpur, Malaysia
2 School of Computer Science and Engineering SCE, Taylor’s University, Subang Jaya, 47500, Malaysia
3 Department of Information Technology, College of Computers and Information Technology, Taif University, Taif, 21944, Saudi Arabia
4 Department of Computer Science, College of Computers and Information Technology, Taif University, Taif, 21944, Saudi Arabia
* Corresponding Author: Norjihan Binti Abdul Ghani.
Computer Systems Science and Engineering 2022, 40(3), 913-926. https://doi.org/10.32604/csse.2022.018467
Received 09 March 2021; Accepted 07 May 2021; Issue published 24 September 2021
Abstract
The World Health Organization (WHO) describes dengue as a serious illness that affects almost half of the world’s population and has no specific treatment. Early and accurate detection of its spread in affected regions can save lives. Despite the severity of the disease, only a few notable works apply sentiment analysis to mine accurate insights from social media text streams. In addition, the massive data explosion in recent years has made it difficult to store and process large amounts of data, because reliable mechanisms to gather the data and suitable techniques to extract meaningful insights from it are required. This study proposes a sentiment analysis polarity approach for collecting data and extracting relevant information about dengue using Apache Hadoop. The method consists of two main parts: the first collects data from social media using Apache Flume, while the second queries and extracts relevant information via the hybrid filtration-polarity algorithm using Apache Hive. To overcome the noisy and unstructured nature of the data, the extraction process includes pre- and post-filtration phases. As a result, it is the integration of Flume and Hive with filtration and polarity analysis that makes it possible to offer a reliable sentiment analysis technique for collecting and processing large-scale data from social networks. We show how two components of the Apache Hadoop ecosystem, Flume and Hive, can provide sentiment analysis capability by storing and processing large amounts of data. An important finding of this paper is that sentiment analysis applications for detecting diseases can be developed more reliably with Hadoop ecosystem components than with conventional machines.
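The filtration-polarity idea summarized above can be illustrated with a minimal sketch. The abstract does not give the actual filtering rules, lexicon, or Hive queries, so everything below is an assumption made for illustration: the keyword set `DENGUE_TERMS`, the word lists `POSITIVE` and `NEGATIVE`, and the function names are hypothetical, and plain Python stands in for Flume ingestion and Hive querying.

```python
import re

# Hypothetical keyword set and sentiment lexicon (not taken from the paper).
DENGUE_TERMS = {"dengue", "denggi"}
POSITIVE = {"recovered", "better", "safe"}
NEGATIVE = {"fever", "sick", "hospital", "died", "outbreak"}


def tokenize(text):
    """Lowercase the text, strip URLs and @mentions, and return word tokens."""
    cleaned = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"[a-z]+", cleaned)


def pre_filter(tweets):
    """Pre-filtration (assumed rule): keep tweets mentioning a dengue term."""
    return [t for t in tweets if DENGUE_TERMS & set(tokenize(t))]


def polarity(text):
    """Lexicon polarity: count of positive words minus count of negative words."""
    tokens = tokenize(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def post_filter(scored):
    """Post-filtration (assumed rule): drop tweets with neutral polarity."""
    return [(text, score) for text, score in scored if score != 0]


def run_pipeline(tweets):
    """Chain pre-filtration, polarity scoring, and post-filtration."""
    kept = pre_filter(tweets)
    scored = [(t, polarity(t)) for t in kept]
    return post_filter(scored)
```

In the setup the abstract describes, the collection step would be handled by Flume and the filtering and scoring would be expressed as Hive queries over the stored tweets; the sketch only shows the order of the phases on an in-memory list.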
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.