Online reviews significantly influence decision-making across many areas of society, so the integrity of internet evaluations is crucial for both consumers and vendors. This concern necessitates the development of effective fake review detection techniques. The goal of this study is to identify fraudulent text reviews, and a comparative analysis of shill reviews is carried out.
Websites have become one of the principal ways people express themselves. Through e-commerce sites, forums and blogs, people can readily exchange opinions on products and services, and most customers examine such reviews before purchasing. The value of these online evaluations, to consumers and suppliers alike, is increasingly recognized. Vendors can even build extra marketing tactics [
The problem cannot be solved by manually assessing the linguistic content of a single review. Some features of a modified review are similar to those of another [
Fake reviews elicit more positive or negative feelings than authentic reviews [
As the influence of false reviews grows, recognizing them has become a significant issue, and ongoing research is required to address it. Researchers have suggested the holistic model [ We developed a novel method for extracting and categorizing features contained in reviews, demonstrating that readability can serve as an effective shill review identification component. We conducted a careful benchmark analysis on the problem of misinformation, utilizing multiple transformer models, and compared the outcomes to state-of-the-art (SOTA) models. We also addressed weaknesses in the present literature by using the deceptive opinion dataset and outlined future directions for enhancing spam review filtering.
The rest of the paper is organized as follows: Section 2 summarizes the related works. Section 3 describes the data, methods and techniques used in this study, as well as an analysis of the recommended model's structure. Section 4 covers the experiments and their results. Finally, Section 5 concludes the study and suggests some future research directions.
This section examines previous efforts to identify fake reviews using various detection algorithms. The extant literature may be classified into three categories as given below.
When a pre-trained model is used to retain contextual information concealed in raw data, the model can better understand the meaning of a letter, word, or sentence in context. BERT, a ground-breaking language model, leverages Masked Language Modelling, enabling self-supervised training on massive text datasets [
Over the last decade, LSTM models have been acknowledged as effective models that can learn from sequence data. What makes LSTM useful is its ability to capture long-range correlations and to learn from sequences of varying lengths. Fraudulent card transactions have also been examined using LSTM models [
Traditional machine learning algorithms, for example Support Vector Machine (SVM) [
The Kaggle Deceptive Opinion dataset is used to test the proposed analysis. This dataset contains 1600 records with five attributes: a collection of truthful and deceptive reviews of 20 Chicago hotels. The five fields are described in the table below.
Fields | Description |
---|---|
Deceptive | There are two sorts of reviews: “truthful” and “deceptive.” |
Hotel | It contains the hotel’s name. |
Polarity | It expresses the review’s sentiment, positive or negative. |
Source | It identifies the source of the review, which comes from three sources: TripAdvisor, Mturk, and the web. |
Text | It includes the reviews. |
The proposed framework for analysis expands on existing research by combining deep neural network (transformer model) approaches with the distinct linguistic features of readability and sentiment mining to categorize reviews from untruthful domains, thereby increasing the credibility of user-generated content available online, as shown in the framework figure.
Few databases contain both high-quality real reviews and deceptive ones. Surveying the past efforts referenced in Section 2, we found that a single labeled dataset was generally employed. The labeled dataset is obtained from Ott et al. [
In this study, a series of preprocessing techniques was used to prepare the raw review data from the deceptive opinion dataset for computational analysis: tokenization, stop-word removal and lemmatization. Tokenization divides raw text into words and phrases known as tokens, which aids in determining the meaning of the text by evaluating the word sequence. Stop words are words that carry little meaning (e.g., “a”, “an”), and every human language has an abundance of them; deleting these terms removes low-level information from the text, allowing us to focus on the crucial content. In this study, all data is cleansed of stop words before the fake review identification technique is applied. Lemmatization is the technique of grouping the many inflected forms of a word so that they may be treated as a single item; like stemming, it adds context to words, connecting words with similar meanings to a single term. The raw data thus passes through these three preprocessing stages.
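The three preprocessing stages can be sketched as follows. This is an illustrative toy pipeline, not the study's actual implementation: the stop-word list and lemma dictionary are tiny stand-ins, where a full NLP toolkit (e.g., NLTK or spaCy) would normally be used.

```python
import re

# Tiny stand-ins for a full stop-word list and lemma lexicon (illustrative only).
STOP_WORDS = {"a", "an", "the", "is", "was", "and", "in", "of", "to"}
LEMMAS = {"rooms": "room", "stayed": "stay", "staying": "stay", "better": "good"}

def preprocess(review: str) -> list[str]:
    # 1. Tokenization: split raw text into lowercase word tokens.
    tokens = re.findall(r"[a-z']+", review.lower())
    # 2. Stop-word removal: drop low-information function words.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # 3. Lemmatization: map inflected forms to a single base form.
    return [LEMMAS.get(t, t) for t in tokens]

print(preprocess("The rooms were better and we stayed in comfort"))
# → ['room', 'were', 'good', 'we', 'stay', 'comfort']
```

In practice each stage would be backed by a real lexicon, but the flow of raw review text through the three stages is the same.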
This part plays a key role in the analysis phase, as feature selection determines the accuracy of the classifiers involved. The examination of the literature identified two key features used across distinct approaches to fake review detection: readability and sentiment. Feature extraction is based on the review dataset, and the accuracy of review spam detection depends on the feature engineering strategy used. Consequently, these components must be considered in tandem for the efficient deployment of the fake review detection model and enhanced accuracy [
In addition to the criteria listed above, we propose an additional set of features extracted using readability tests [
Score | School level | Notes |
---|---|---|
90–100 | Grade 5 | Very easy |
80–90 | Grade 6 | Easy |
70–80 | Grade 7 | Fairly easy |
60–70 | Grade 8 | Plain English |
50–60 | Grade 10–12 | Fairly difficult |
30–50 | College | Difficult |
0–30 | College graduate | Very difficult |
Applying these readability tests to the deceptive opinion dataset shows that fake reviews have greater readability complexity than truthful reviews, as depicted in the table above.
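A minimal sketch of the Flesch Reading Ease score used as the readability feature is given below. The syllable counter is a crude vowel-run heuristic of our own (libraries such as textstat implement more careful versions), so absolute scores are approximate; the relative ordering of simple versus convoluted text is what matters here.

```python
import re

def count_syllables(word: str) -> int:
    # Approximate syllable count as the number of vowel runs (rough heuristic).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    # Flesch Reading Ease = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

# A plainly worded review scores higher (easier) than a convoluted one.
easy = "The room was nice. The staff was kind."
hard = "The accommodation exemplified extraordinary hospitality characteristics."
assert flesch_reading_ease(easy) > flesch_reading_ease(hard)
```

Scores computed this way feed directly into the feature vector alongside the sentiment features described next.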
The second feature considered in the study is sentiment, which plays a major role in classification; VADER (Valence Aware Dictionary for Sentiment Reasoning) extracts the sentiment from each review. The VADER library makes use of the polarity feature, which categorizes sentiment as positive, negative, or neutral. The compound score is calculated by summing the valence ratings of each word in the lexicon and then normalizing the sum to lie between extreme negative (−1) and extreme positive (+1).
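VADER's compound-score normalization can be sketched as follows: the summed word valences are squashed into the range (−1, +1), with α = 15 being the normalization constant used in the vaderSentiment implementation.

```python
import math

def normalize(valence_sum: float, alpha: float = 15.0) -> float:
    # VADER compound score: squash the summed valences into (-1, 1).
    return valence_sum / math.sqrt(valence_sum ** 2 + alpha)

# A strongly positive review (large positive valence sum) approaches +1,
# a strongly negative one approaches -1, and a neutral one stays near 0.
assert normalize(10.0) > 0.9
assert normalize(-10.0) < -0.9
assert abs(normalize(0.0)) < 1e-9
```

The resulting compound score is the sentiment feature combined with readability in the feature set.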
Sentiment analysis examines reviews to determine whether they are positive, negative, or neutral. It entails predicting a review's polarity from the text's words, emojis and review scores, among other factors. Fake reviews, according to comparable research [
In recent decades, transformer models have shown superior classification performance. Because they start from a pre-trained model, training time is reduced, and since pre-trained models are widely available as open source, the cost of environment setup is also lowered. This section addresses the transformer models used in this investigation, listed below.
BERT is a deep learning language processing model with sophisticated features that beats all previous language models by a large margin. In all layers, it operates on the joint left and right context. BERT is a simple yet effective tool that shows promise in a variety of machine learning tasks: to execute a range of functions, a fine-tuned BERT model only needs one additional layer. BERT uses a Masked Language Model (MLM), which masks random words from the input and then predicts those words from their context. Because MLM employs both the left and right contexts, it allows bidirectional model training; in contrast to previous language models, BERT can learn the contextual representation from both ends of the sentence. For tokenization, BERT uses a 30 K WordPiece vocabulary. The input sequence is used to produce token and positional embeddings. Two special tokens, [CLS] and [SEP], are added to the beginning and end of a sequence, respectively: the [CLS] token is used by classification techniques such as Next Sentence Prediction, and the [SEP] token serves as a separator. We employed BERTBASE in our work. BERT is less suitable for tasks involving ambiguous data mining or text mining. It was employed for fake news identification in reference [
RoBERTa (a Robustly Optimized BERT Pretraining Approach) builds on BERT, the Bidirectional Encoder Representations from Transformers model [
The goal of the RoBERTa base layers is to offer a meaningful word embedding as the feature representation so that succeeding layers may readily extract useful information from it.
XLNet is a BERT-derived autoregressive language model that overcomes BERT's limitation of predicting masked tokens independently of one another [
The transformer-based multilingual masked language model XLM-RoBERTa has been pre-trained on text in 100 languages and delivers cutting-edge performance in cross-lingual classification, sequence labeling and question answering [
This section describes the experiment and the outcomes of several machine learning, deep learning, and transformer models. The tables and graphs are supplied to allow for a comparison of the models’ performance.
The experiments are written in Python 3.6.9 in Google Colab to make use of the GPU's computing capabilities. Numpy 1.18.5 and Huggingface 3.5.1 are used for data preparation and tokenization; Huggingface 3.5.1 is also used to implement the pre-trained transformers. Scikit-learn 0.23.2 is used to implement the machine learning models. PyTorch 1.7.0 or TensorFlow 2.3.0 is used to create the deep learning models. Matplotlib 3.2.2 is used to create the graphs.
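Given the versions listed above, the environment can be pinned roughly as follows. Note this is a sketch under the assumption that "Huggingface" refers to the `transformers` PyPI package; exact pins may need adjusting for the Colab image in use.

```shell
pip install numpy==1.18.5 transformers==3.5.1 scikit-learn==0.23.2 \
            torch==1.7.0 tensorflow==2.3.0 matplotlib==3.2.2
```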
The reviews are categorized as fake or non-fake using several machine learning classifiers and assessed for accuracy. Logistic regression achieves the best accuracy, while Support Vector Machine and Multinomial Naive Bayes outperform the remaining classifiers. The dataset is split into multiple train and test partitions, and the accuracy for each split is reported in the table below.
Classifiers involved | Training and testing ratio (%) for classification accuracy | |||
---|---|---|---|---|
60:40 ratio (%) | 70:30 ratio (%) | 80:20 ratio (%) | 90:10 ratio (%) | |
Support vector machine | 85.9 | 83.9 | 86.2 | 86.2 |
Random forest | 78.4 | 74.5 | 74 | 78 |
Decision tree | 65.3 | 67.5 | 64 | 68 |
Logistic regression | 86.5 | 85.6 | 87 | 87.7 |
Ada-boost | 79.2 | 80 | 78 | 80 |
Multinomial naïve bayes | 81.8 | 83.3 | 83.1 | 85 |
Classifiers | Truthful reviews | Fake reviews | ||||
---|---|---|---|---|---|---|
Precision (%) | Recall (%) | F-score (%) | Precision (%) | Recall (%) | F-score (%) | |
Support vector machine | 85 | 81 | 83 | 82 | 86 | 84 |
Random forest | 73 | 81 | 77 | 79 | 71 | 75 |
Decision tree | 67 | 67 | 67 | 67 | 67 | 67 |
Logistic regression | 86 | 84 | 85 | 84 | 86 | 85 |
Ada-boost | 71 | 85 | 81 | 83 | 75 | 79 |
Multinomial naïve bayes | 91 | 76 | 83 | 80 | 93 | 86 |
This section thus compared various machine learning classifiers for separating fake and truthful reviews. Logistic regression achieved the highest accuracy, while Multinomial Naïve Bayes (MNB) and Support Vector Machine performed better than the remaining classifiers.
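The per-class F-scores in the table above follow from the harmonic mean of precision and recall, which can be checked against the reported values, e.g. for the SVM's truthful-review column:

```python
def f_score(precision: float, recall: float) -> float:
    # F-score is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

# SVM, truthful reviews (values in %, from the table above).
print(round(f_score(85, 81)))  # → 83, matching the reported F-score
```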
The deceptive opinion dataset was used in the experiment. Compared to the baseline models LSTM, Bi-LSTM, CNN + Bi-LSTM and CNN + GRU, the proposed hybrid CNN + LSTM model with sentiment intensity values offers superior results, and adding the readability feature further surpasses the sentiment-intensity-only model. The proposed hybrid (CNN-LSTM) model outperforms the other methods in both accuracy and loss.
Deep learning model | Accuracy (%) with sentiment features only [ | Accuracy (%) with readability and sentiment features |
---|---|---|
LSTM | 80.5 | 80.5 |
Bi-LSTM | 82.5 | 82.5 |
CNN + Bi-LSTM | 49 | 59 |
CNN + GRU | 42 | 66 |
CNN + LSTM | 83.7 | 87.7 |
This section investigated the use of deep learning models, finding that the hybrid combination of CNN and LSTM with readability and sentiment features outperforms the other deep learning models (LSTM, Bi-LSTM, CNN + Bi-LSTM and CNN + GRU) in terms of accuracy.
This study exploits the transformer models over the deceptive opinion dataset. Learning rates between 1e−3 and 5e−5 were examined, along with batch sizes ranging from 16 to 32. The models were trained using Adam optimization with cross-entropy as the loss function. The following settings were fine-tuned: number of batches: [
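The search over these settings can be sketched as a simple grid. The specific value lists below are assumptions within the stated ranges, not the exact values tuned in the study:

```python
from itertools import product

# Illustrative hyperparameter grid within the ranges stated above.
learning_rates = [1e-3, 1e-4, 5e-5]
batch_sizes = [16, 32]
epochs = 5  # matches the epoch count reported in the results table

grid = list(product(learning_rates, batch_sizes))
print(f"{len(grid)} configurations to fine-tune, {epochs} epochs each")
```

Each configuration would be trained with Adam and cross-entropy loss, keeping the best-scoring model per transformer.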
Models | Accuracy | Precision | Recall | F-score | Epoch |
---|---|---|---|---|---|
BERT | 91.2 | 93 | 89 | 91 | 5 |
XLNET | 94.3 | 94.7 | 94 | 94.3 | 5 |
RoBERTa | 97.13 | 98 | 96.4 | 97 | 5 |
XLM-RoBERTa | 98.2 | 97.8 | 98.6 | 98.2 | 5 |
The study's goal was to investigate how pre-trained transformers can be employed to detect spam reviews. We first experimentally investigated the best combination of machine learning and deep learning models, then demonstrated that transformer-based classifiers improve performance in spam review filtering. The pre-trained language models BERT, RoBERTa and XLNet were used; RoBERTa and XLNet classified false reviews more effectively, and overall, RoBERTa-based combination models outperformed all others. Among the machine learning models, logistic regression performed best, and among the deep learning models, the CNN-LSTM combination outperformed the rest.
This study examined how different pre-trained transformers may be used to identify online spam reviews. It also added to current research by assembling the best models utilizing readability and sentiment features for spam review identification. Across all classification models, RoBERTa and the combination of RoBERTa with XLM outperformed BERT in detecting spam reviews; these transformers are more sophisticated and can therefore better represent a review's content. The transformer models also outperformed the machine learning and deep learning models. Transformers can thus be quite valuable for natural language processing, as pre-trained models save time while achieving excellent efficiency. Future work will address the unavailability of labeled datasets by considering behavioral features for fake review filtering, and since reviews are user-generated content that may span multiple languages, multilingual review spam detection will also be explored.