Open Access
ARTICLE
Sentiment Analysis System in Big Data Environment
1 University of Computer Studies, Yangon
2 University of Computer Studies, Maubin
E-mail: wintnyeinchan2012@gmail.com; thandartheinn@gmail.com
Computer Systems Science and Engineering 2018, 33(3), 187-202. https://doi.org/10.32604/csse.2018.33.187
Abstract
Nowadays, Big Data, a large volume of both structured and unstructured data, is generated from Social Media. Social Media are powerful marketing tools and social big data can offer the business insights. The major challenge facing social big data is attaining efficient techniques to collect a large volume of social data and extract insights from the huge amount of collected data. Sentiment Analysis of social big data can provide business insights by extracting the public opinions. The traditional analytic platforms need to be scaled up for analyzing a large volume of social big data. Social data are by nature shorter and generally not constructed with proper grammatical rules and hence difficult to achieve high reliable result in Sentiment Analysis. Acquiring effective training data is a challenge, although learning based approaches are good for sentiment classification. Manual Labeling for training data is time and labor consuming. In this paper, Sentiment Analysis system on Big Data Analytics platform is proposed to provide valuable information by analyzing large scale social data in an efficient and timely manner since they have been implemented using a MapReduce framework and a Hadoop distributed storage (HDFS). The proposed Sentiment Analysis system consists of four modules: data collection, data cleaning and preprocessing, class labeling and sentiment classification. The system enables high-level performance of sentiment classification while taking advantage of combining lexicon-based classifier’s effortless setup process and learning based classifier. Twitter stream data is used for system evaluation as the Twitter is widespread Social Media and a good source of information in the sense of snapshots of moods and feelings as well as up-to-date events. The evaluation results show that this system achieve a promising accuracy by 84.2%. Moreover, this system is able to scale up to analyze the large scale data by decreasing the processing time when adding more nodes in the cluster.Keywords
Cite This Article
This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.