Chaotic Elephant Herd Optimization with Machine Learning for Arabic Hate Speech Detection

Badriyya Al-onazi; Jaber Alzahrani; Najm Alotaibi; Hussain Alshahrani; Mohamed Elfaki; Radwa Marzouk; Heba Mohsen; Abdelwahed Motwakel

doi:10.32604/iasc.2023.033835


Badriyya B. Al-onazi¹, Jaber S. Alzahrani², Najm Alotaibi³, Hussain Alshahrani⁴, Mohamed Ahmed Elfaki⁴, Radwa Marzouk⁵, Heba Mohsen⁶, Abdelwahed Motwakel⁷,*

1 Department of Language Preparation, Arabic Language Teaching Institute, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh, 11671, Saudi Arabia
2 Department of Industrial Engineering, College of Engineering at Alqunfudah, Umm Al-Qura University, Mecca, 24382, Saudi Arabia
3 Prince Saud AlFaisal Institute for Diplomatic Studies, Riyadh, 12735, Saudi Arabia
4 Department of Computer Science, College of Computing and Information Technology, Shaqra University, Shaqra, 11911, Saudi Arabia
5 Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh, 11671, Saudi Arabia
6 Department of Computer Science, Faculty of Computers and Information Technology, Future University in Egypt New Cairo, New Cairo, 11835, Egypt
7 Department of Computer and Self Development, Preparatory Year Deanship, Al-Kharj, 16278, Saudi Arabia

* Corresponding Author: Abdelwahed Motwakel. Email: email

Intelligent Automation & Soft Computing 2024, 39(3), 567-583. https://doi.org/10.32604/iasc.2023.033835

Received 29 June 2022; Accepted 14 October 2022; Issue published 11 July 2024

Abstract

In recent years, the use of social networking sites has increased considerably in the Arab world, empowering individuals to express their opinions, especially on politics. Various organizations operating in Arab countries have also embraced social media in their day-to-day business activities at different scales, which is attributed to business owners’ understanding of its importance for business development. However, Arabic morphology is highly complex, with nearly 10,000 roots and more than 900 patterns that form the basis of verbs and nouns. Hate speech on online social networking sites has become a worldwide issue that reduces the cohesion of civil societies. Against this background, the current study develops a Chaotic Elephant Herd Optimization with Machine Learning for Hate Speech Detection (CEHOML-HSD) model for the Arabic language. The CEHOML-HSD model focuses on identifying and categorizing Arabic text as hate speech or normal. To this end, the model comprises several sub-processes. First, the data is pre-processed with the help of the TF-IDF vectorizer. Second, the Support Vector Machine (SVM) model is used to detect and classify hate speech in Arabic text. Lastly, the CEHO approach, which combines chaotic functions with the classical EHO algorithm, is employed to fine-tune the SVM parameters. The design of the CEHO algorithm for parameter tuning constitutes the novelty of the work. An extensive experimental analysis was carried out to validate the performance of the proposed CEHOML-HSD approach, and the comparative study showed that it outperforms the other approaches considered.

Keywords

Arabic language; machine learning; elephant herd optimization; TF-IDF vectorizer; hate speech detection

1 Introduction

In recent times, the Arab world has attracted considerable attention from various multi-national predictors, as it is an important player in international politics and the global economy [1]. Opinion mining has become an important research topic, especially for problems in politics, market movements and oil and gas prices. It gained significant interest after the Arab Spring movement. Accordingly, access to social networking media has increased in Arab countries, which has led to the generation of huge volumes of Arabic text on the internet. According to ‘Internet World Stats’, Arabic is the fourth most commonly used language on the internet after English, Chinese and Spanish [2]. Nevertheless, studies on Arabic Sentiment Analysis (ASA) remain few, mainly because of the lack of sentiment resources for the Arabic language [3,4]. Bilingual methodologies are ill-suited to Arabic, whose grammar and structure differ markedly from those of English [5]. Studies on Arabic SA find it challenging to achieve good results because of the different forms of the language and its free writing style. Most Arabic-language interactions on social networking platforms are written in local dialects [6,7]. However, monolingual studies of Arabic text mostly disregard the dialects and Arabic text that contains Latin letters. The existing resources and their respective tools consider only Modern Standard Arabic (MSA). As a result, they cannot simply be reused, since doing so produces low performance in real-time applications [8]. The real-time value of the SA approach for the study corresponds to the outcome, replicated with them [9].

Hate speech is the use of insulting, offensive and abusive language towards an individual or a group of people [10]. Its aim is to disseminate hatred and discriminate against others based on their gender, race, disability or religion. The European Court of Human Rights (ECHR) defines hate speech as the use of words or expressions that incite, spread or encourage hatred against an individual or a group of people based on xenophobia, race or some other kind of intolerance against minorities or immigrants [11]. Twitter is a microblogging site and social networking platform that enables end users to express their opinions and discuss topics interactively [12]. The increased penetration of social media networks generates massive volumes of data that are analysed through smart Machine Learning (ML) algorithms. Data mining and ML techniques are used to interpret such huge volumes of data, which in turn makes it possible to uncover hidden patterns in the data [13]. Therefore, the potential for identifying patterns of hateful content is high. Natural Language Processing (NLP) techniques employ different statistical pre-processing methods and aim to transform a text dataset into a form that can be used by ML algorithms [14]. The NLP process includes various sub-processes such as stemming, data normalization, feature extraction and tokenization. However, it also faces numerous challenges while handling complex languages.

Aljarah et al. [15] proposed a method involving NLP and ML techniques for detecting cyber hate speech in Arabic on the Twitter platform. This work considered a group of tweets covering sports, orientation, Islam, racism, journalism and terrorism. Distinct emotions and features were extracted from the dataset and arranged under 15 distinct data classes. In [16], the authors developed a Neural Network (NN)-based classifier that labels tweets written in seven different languages as hate or non-hate. The study also covered texts written in more than one language simultaneously, and it utilized Convolutional Neural Networks (CNNs) and character-level representations. In [17], the authors examined the capabilities of CNN, CNN-LSTM and BiLSTM-CNN Deep Learning (DL) networks for automatically recognizing and classifying hateful content posted on social media. The deep networks were trained and tested using the ArHS dataset, which contains a total of 9,833 tweets. These tweets were recognized and categorized as hateful speech in the Arabic language.

Aldjanabi et al. [18] examined numerous instances of aggressive and hate speech on Arab social media to develop an accurate recognition method. Specifically, this study established a classification method that identifies offensive content and hate speech with the help of the Multi-Task Learning (MTL) technique, built on top of a pre-trained Arabic language model. In [19], the authors focused only on the technical aspects of designing an automated model for monitoring and detecting hate speech in the Arabic language. This study used data collected from several companies that use it to prevent hate speech and cyberbullying. The researchers also utilized deep Recurrent Neural Networks (RNNs) to detect and classify hate speech.

The current study develops a Chaotic Elephant Herd Optimization with Machine Learning for Hate Speech Detection (CEHOML-HSD) model for the Arabic language. The CEHOML-HSD model comprises several sub-processes. First, the data is pre-processed with the help of the TF-IDF vectorizer. Second, the Support Vector Machine (SVM) model is applied to detect and classify hate speech in Arabic text. Finally, the CEHO technique is employed to fine-tune the SVM parameters. This study introduces the CEHO algorithm by combining chaotic functions with the classical EHO algorithm. An extensive experimental analysis was carried out to validate the performance of the proposed CEHOML-HSD model.

2 Design of CEHOML-HSD Model

In this study, a novel CEHOML-HSD approach is proposed for the identification and categorization of the Arabic text into hate speech and normal. The proposed CEHOML-HSD model follows data pre-processing with the help of the TF-IDF vectorizer at the initial stage. Next, the SVM technique is used to detect and classify the hate speech made in the Arabic language. Finally, the CEHO technique is employed to modify the SVM parameters optimally. Fig. 1 showcases the overall processes involved in the CEHOML-HSD approach.

Figure 1: Overall processes of the CEHOML-HSD approach

2.1 Data Preprocessing

Data pre-processing commences after the tweets are collected with the help of the R language and the Twitter APIs. The data is annotated by two volunteers, after which redundant and irrelevant tweets are removed to obtain a clean dataset. The tweets are then tokenized, normalized and vectorized for feature representation. During data cleaning, all hashtags, non-Arabic characters, punctuation marks, numerals, diacritics, symbols, Arabic stop words and web addresses are filtered out. Tweet normalization is a procedure in which Arabic characters that appear in colloquial written variants are transformed into a standard written form. The numerical features then capture the statistical dimensions of the words. The feature collections are represented as vectors; hence, the process is termed text vectorization.
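The cleaning, normalization and vectorization steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline: the regular expressions, the letter-normalization rules, the three-word stop list and the smoothed IDF variant are all assumptions chosen for the example, and the function names `normalize`, `clean_and_tokenize` and `tfidf` are hypothetical.

```python
import math
import re
from collections import Counter

URLS = re.compile(r"https?://\S+|www\.\S+")
HASHTAGS = re.compile(r"#\S+")
# Tashkeel marks plus tatweel; the exact character set is an assumption.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# Anything that is neither an Arabic letter nor whitespace.
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")

def normalize(text):
    """Map common letter variants to one canonical form (assumed rules)."""
    text = re.sub(r"[\u0623\u0625\u0622]", "\u0627", text)  # alef variants -> bare alef
    text = text.replace("\u0649", "\u064A")                  # alef maqsura -> yaa
    text = text.replace("\u0629", "\u0647")                  # taa marbuta -> haa
    return text

# Tiny illustrative stop-word list (hypothetical; a real list is far longer).
STOP_WORDS = {normalize(w) for w in ("في", "من", "على")}

def clean_and_tokenize(tweet):
    """Remove URLs, hashtags, diacritics and non-Arabic characters, then split."""
    tweet = URLS.sub(" ", tweet)
    tweet = HASHTAGS.sub(" ", tweet)
    tweet = DIACRITICS.sub("", tweet)
    tweet = NON_ARABIC.sub(" ", tweet)
    return [t for t in normalize(tweet).split() if t not in STOP_WORDS]

def tfidf(docs):
    """docs: list of token lists -> (sorted vocabulary, list of TF-IDF vectors)."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    # Smoothed IDF, one common variant: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vocab, vectors

tweets = ["مرحبا بالعالم http://t.co/x 123", "مرحبا #وسم"]
vocab, vectors = tfidf([clean_and_tokenize(t) for t in tweets])
```

Each tweet becomes one row of `vectors`, with one column per vocabulary term; a term that occurs in every tweet receives the smallest weight, which is the property that makes TF-IDF useful as a feature representation for the classifier.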

2.2 Hate Speech Detection Using SVM Model

Next, the SVM model is used to detect and classify hate speech in Arabic text. The SVM is a classifier that determines the decision boundary separating data points of distinct classes [20]. It aims to maximize the width of the ‘street’ that separates the two classes. $H_{0}$ indicates the boundary (hyperplane) that splits the street into two halves; $H_{1}$ and $H_{2}$ denote the two planes (parallel to $H_{0}$) that touch the nearest point of each class, as given below:

$H_{0}: w^{T}x+b=0, \quad H_{1}: w^{T}x+b=1, \quad H_{2}: w^{T}x+b=-1$ (1)

In Eq. (1), $w$ indicates the weight vector, $x$ denotes an input vector, and $b$ signifies the bias. The distance between the $H_{0}$ and $H_{1}$ planes is $\frac{|w^{T}x+b|}{\|w\|}=\frac{1}{\|w\|}$. Therefore, the distance between $H_{1}$ and $H_{2}$ is $\frac{2}{\|w\|}$.

To increase the margin size, it is necessary to minimize $\|w\|$, subject to the condition that no data points lie between the planes $H_{1}$ and $H_{2}$. This results in a constrained optimization problem, as stated in Eq. (2).

$\min_{w} \frac{1}{2}\|w\|^{2} \quad \text{subject to: } y_{i}(w \cdot x_{i}+b)-1\geq 0$ (2)

In Eq. (2), $y_{i}$ represents the label of data point $x_{i}$ ($y_{i}$ is $+1$ for one class and $-1$ for the other) and $i$ indexes the data points.

It is to be noted that the constraints are divided as follows:

$w \cdot x_{i}+b\geq +1 \text{ when } y_{i}=+1 \ (\text{class A}), \qquad w \cdot x_{i}+b\leq -1 \text{ when } y_{i}=-1 \ (\text{class B})$ (3)

The abovementioned problem is a convex quadratic optimization problem with linear constraints that can be solved by a quadratic programming solver. The SVM is extended to a non-linear classifier by applying a kernel to the input data, which maps the inputs into a high-dimensional feature space.
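To make Eqs. (1)–(3) concrete, the following sketch checks the margin constraints and computes the margin width for a hand-picked hyperplane on toy data. The data points, the values of `w` and `b`, and the helper names are hypothetical. The Gaussian (RBF) kernel is included only as one common kernel choice; the source does not state which kernel or which SVM parameters are tuned.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def margin_width(w):
    """Distance between H1 and H2, i.e. 2 / ||w||."""
    return 2.0 / math.sqrt(dot(w, w))

def satisfies_constraints(w, b, X, y):
    """Check y_i (w . x_i + b) - 1 >= 0 for every training point (Eq. 2)."""
    return all(yi * (dot(w, xi) + b) - 1 >= 0 for xi, yi in zip(X, y))

def predict(w, b, x):
    """Classify a point by the side of H0 on which it falls."""
    return 1 if dot(w, x) + b >= 0 else -1

def rbf_kernel(u, v, gamma):
    """Gaussian (RBF) kernel, an assumed example of a non-linear kernel."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

# Toy 2-D data (hypothetical): class A is labelled +1, class B is labelled -1.
X = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
y = [1, 1, -1, -1]
w, b = (0.5, 0.5), -1.0

print(satisfies_constraints(w, b, X, y))  # all points lie on or outside H1/H2
print(margin_width(w))                    # width of the 'street'
```

Here the points (2, 2) and (0, 0) lie exactly on $H_{1}$ and $H_{2}$, so they play the role of support vectors; any point placed strictly between the two planes would violate Eq. (2).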

In comparison with $AE$, the SVM is a supervised learning-based algorithm. This implies that each training point must be given a label. Given a set of training points, each belonging to one of two classes, the SVM training procedure constructs a non-probabilistic model that assigns new data points to one of the two classes. Fig. 2 displays the SVM hyperplane. In line with the abovementioned optimization problem, the SVM method represents each data entry as a point in a high-dimensional space and identifies a hyperplane that separates the classes while maximizing its distance to the nearest data points of each class.

Figure 2: SVM hyperplane

2.3 Parameter Tuning using CEHO Algorithm

Finally, the CEHO algorithm is introduced as a combination of chaotic functions and the classical EHO algorithm. The EHO approach is modelled on the herding behaviour of elephants [21], which is discussed in detail in this section. The elephant population is divided into multiple clans. Each clan is a matriarchy, i.e., it is led by a female elephant. In every generation, a specific male elephant leaves the clan to live in isolation, far from the clan. In terms of the Swarm Intelligence (SI) approach, the clan represents the local search process, whereas the male elephant that leaves the clan represents the global search. The matriarch is the solution (elephant) with the best fitness value in the clan, whereas the departing male elephant is the solution with the worst fitness value. The EHO approach is defined as follows. The elephant population is divided into $k$ clans. At first, $D$-dimensional solutions are randomly generated in the search space, with $x_{min}$ and $x_{max}$ as the lower and upper bounds, as follows:

$x=x_{min}+(x_{max}-x_{min}+1)\,rand$ (4)

In Eq. (4), $rand$ refers to a uniformly distributed random number in the range [0, 1].
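The initialization of Eq. (4) can be sketched as follows. The function name `init_population` and its parameters are hypothetical, and the clipping to $x_{max}$ is an assumption added here because the "+1" term can otherwise push a value past the upper bound.

```python
import random

def init_population(n_clans, clan_size, dim, x_min, x_max, rng):
    """Return n_clans clans, each a list of clan_size D-dimensional solutions."""
    def new_solution():
        # Eq. (4), applied per dimension; min() clips to x_max (an assumption).
        return [min(x_max, x_min + (x_max - x_min + 1) * rng.random())
                for _ in range(dim)]
    return [[new_solution() for _ in range(clan_size)] for _ in range(n_clans)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
clans = init_population(n_clans=3, clan_size=4, dim=2,
                        x_min=-5.0, x_max=5.0, rng=rng)
```

Passing the generator in as `rng` keeps the source of randomness swappable, which is convenient when the uniform numbers are later replaced by a chaotic sequence.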

In every generation, the solutions change as follows. Member $j$ of clan $c_{i}$ moves towards the solution $x_{best,c_{i}}$ that has the best fitness value in clan $c_{i}$:

$x_{new,c_{i},j}=x_{c_{i},j}+\alpha(x_{best,c_{i}}-x_{c_{i},j})\,rand$ (5)

In Eq. (5), $x_{new,c_{i},j}$ refers to the new solution $j$ in clan $c_{i}$, whereas $x_{c_{i},j}$ indicates the solution in the preceding generation; $\alpha\in[0,1]$ is a scale factor, fixed based on the problem, that defines the influence of the best solution, and $rand\in[0,1]$ is a uniformly distributed random number. The position of the best solution in every clan is updated based on the formula given below:

$x_{new,c_{i}}=\beta x_{center,c_{i}}$ (6)

In Eq. (6), $\beta\in[0,1]$ is a parameter that controls the effect of the clan center $x_{center,c_{i}}$, which is computed as follows:

$x_{center,c_{i},d}=\frac{1}{n_{c_{i}}}\sum_{l=1}^{n_{c_{i}}}x_{c_{i},l,d}$ (7)

Here, $1\leq d\leq D$ signifies the $d^{th}$ dimension and $n_{c_{i}}$ indicates the number of solutions in clan $c_{i}$. In every generation, the exploration process is performed as follows. In every clan, the $m_{c_{i}}$ solution with the worst fitness value of clan $c_{i}$ is selected and replaced by the following solution:

$x_{worst,c_{i}}=x_{min}+(x_{max}-x_{min}+1)\,rand$ (8)

In Eq. (8), $x_{min}$ and $x_{max}$ represent the lower and upper bounds, respectively, and $rand\in[0,1]$ is a uniformly distributed random value. SI algorithms have been shown to be comparatively effective in finding the best solution to hard optimization problems. Consequently, they have been improved with additional elements and have gained much attention in recent years. A frequently used improvement in SI algorithms is to substitute random values with values from a chaotic map. This is because a chaotic map generates numbers with ergodicity and non-repetition, so a better search can be expected [22]. Two different 1D maps are considered in this regard, namely the circle map and the sinusoidal map.
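Before turning to the chaotic maps, the clan operators of Eqs. (5)–(8) can be sketched as a single generation step. This is an illustrative reading of the equations under stated assumptions: fitness is minimized, one random number is drawn per member in Eq. (5), exactly one worst member per clan is replaced, and the result of Eq. (8) is clipped to $x_{max}$. The names `eho_generation` and `sphere` are hypothetical.

```python
import random

def eho_generation(clans, fitness, alpha, beta, x_min, x_max, rng):
    """Apply Eqs. (5)-(8) once to every clan; lower fitness is treated as better."""
    new_clans = []
    for clan in clans:
        dim = len(clan[0])
        best = min(clan, key=fitness)
        # Eq. (7): clan center, the per-dimension mean of the clan members.
        center = [sum(x[d] for x in clan) / len(clan) for d in range(dim)]
        updated = []
        for x in clan:
            if x is best:
                # Eq. (6): the matriarch is repositioned relative to the center.
                updated.append([beta * c for c in center])
            else:
                # Eq. (5): every other member moves towards the matriarch.
                r = rng.random()
                updated.append([xj + alpha * (bj - xj) * r
                                for xj, bj in zip(x, best)])
        # Eq. (8): the worst member is replaced by a fresh random solution.
        worst = max(range(len(updated)), key=lambda i: fitness(updated[i]))
        updated[worst] = [min(x_max, x_min + (x_max - x_min + 1) * rng.random())
                          for _ in range(dim)]
        new_clans.append(updated)
    return new_clans

def sphere(x):
    """Toy objective used only for illustration."""
    return sum(v * v for v in x)

rng = random.Random(1)
clans = [[[rng.uniform(-5, 5) for _ in range(2)] for _ in range(4)]
         for _ in range(3)]
next_clans = eho_generation(clans, sphere, alpha=0.5, beta=0.1,
                            x_min=-5.0, x_max=5.0, rng=rng)
```

In the setting of this paper, a solution vector would hold candidate SVM parameters and the fitness would be a classification error measure; the source does not spell out either, so the sphere function stands in for the objective here.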

The circle map is defined as follows:

$x_{k+1}=\left[x_{k}+b-\frac{a}{2\pi}\sin(2\pi x_{k})\right] \bmod 1$ (9)

In Eq. (9), for $a=0.5$ and $b=0.2$, the produced chaotic sequence lies in the interval between 0 and 1.

The sinusoidal map is defined as follows:

$x_{k+1}=ax_{k}^{2}\sin(\pi x_{k})$ (10)

In Eq. (10), for $a=2.3$ and $x_{0}=0.7$, the following simplified form is employed:

$x_{k+1}=\sin(\pi x_{k})$ (11)

The presented chaotic map is applied in the CEHO algorithm for generating chaos sequences. Then, the random numbers are replaced in Eqs. (4)–(6) with the numbers attained from the chaos sequence.
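A minimal sketch of the two maps, and of how a chaos sequence can stand in for a uniform random generator, is given below. The class `ChaoticSource` is a hypothetical helper, and using $x_{0}=0.7$ as the seed for the circle map as well is an assumption (the text states this value only for the sinusoidal map).

```python
import math

def circle_map(x, a=0.5, b=0.2):
    """Eq. (9): one step of the circle map."""
    return (x + b - (a / (2 * math.pi)) * math.sin(2 * math.pi * x)) % 1.0

def sinusoidal_map(x):
    """Eq. (11): simplified sinusoidal map (a = 2.3, x0 = 0.7)."""
    return math.sin(math.pi * x)

class ChaoticSource:
    """Drop-in replacement for a uniform generator: random() returns the next map value."""
    def __init__(self, step, x0=0.7):
        self.step = step
        self.x = x0

    def random(self):
        self.x = self.step(self.x)
        return self.x

chaos = ChaoticSource(circle_map)
sequence = [chaos.random() for _ in range(5)]

# Eq. (5) with the random number replaced by a chaotic value (toy numbers).
alpha, x, best = 0.5, 4.0, 1.0
x_new = x + alpha * (best - x) * chaos.random()
```

Because `ChaoticSource` exposes the same `random()` method as Python's `random.Random`, any routine that accepts a generator object can switch between uniform and chaotic numbers without other changes.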

3 Results and Discussion

In this section, the proposed CEHOML-HSD model was experimentally validated and the results are discussed in detail. The model was validated using a dataset composed of Arabic text under two classes: hate and normal.

Table 1 and Fig. 3 present the hate speech detection results of the proposed CEHOML-HSD model at 500 epochs. The model performed well on both classes. For the hate class, it achieved an $accu_{y}$ of 92.60%, $prec_{n}$ of 96.71%, $reca_{l}$ of 88.20%, $F_{score}$ of 92.26% and a Jaccard index of 85.63%. For the normal class, it obtained an $accu_{y}$ of 92.60%, $prec_{n}$ of 89.15%, $reca_{l}$ of 97%, $F_{score}$ of 92.91% and a Jaccard index of 86.76%.

Figure 3: Results of the analysis of the CEHOML-HSD approach under 500 epochs
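The per-class measures reported above all follow from a binary confusion matrix. The sketch below shows the standard formulas; the counts are assumed (the source does not give the class sizes or the confusion matrix) and were picked as 500 tweets per class because they are consistent with the 500-epoch percentages.

```python
def class_metrics(tp, fp, fn, tn):
    """Standard per-class measures computed from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f_score": 2 * precision * recall / (precision + recall),
        "jaccard": tp / (tp + fp + fn),
    }

# Assumed counts (NOT given in the source): 500 tweets per class.
# Hate tweets:   441 labelled hate, 59 labelled normal.
# Normal tweets: 485 labelled normal, 15 labelled hate.
hate = class_metrics(tp=441, fp=15, fn=59, tn=485)
normal = class_metrics(tp=485, fp=59, fn=15, tn=441)

for name, m in (("hate", hate), ("normal", normal)):
    print(name, {k: round(100 * v, 2) for k, v in m.items()})
```

Under these assumed counts, the rounded values match the figures quoted for both classes, which also explains why the accuracy is identical for the two classes: in a two-class problem it is computed from the same four counts.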

Table 2 and Fig. 4 present the hate speech detection results of the proposed CEHOML-HSD model at 1000 epochs. For the hate class, the model achieved an $accu_{y}$ of 90.60%, $prec_{n}$ of 94.52%, $reca_{l}$ of 86.20%, $F_{score}$ of 90.17% and a Jaccard index of 82.10%. For the normal class, it achieved an $accu_{y}$ of 90.60%, $prec_{n}$ of 87.32%, $reca_{l}$ of 95%, $F_{score}$ of 91% and a Jaccard index of 83.48%.

Figure 4: Results of the analysis of the CEHOML-HSD approach under 1000 epochs

Table 3 and Fig. 5 present the hate speech detection results of the proposed CEHOML-HSD technique at 1500 epochs. For the hate class, the model obtained an $accu_{y}$ of 89.70%, $prec_{n}$ of 93.82%, $reca_{l}$ of 85%, $F_{score}$ of 89.19% and a Jaccard index of 80.49%. For the normal class, it achieved an $accu_{y}$ of 89.70%, $prec_{n}$ of 86.29%, $reca_{l}$ of 94.40%, $F_{score}$ of 90.16% and a Jaccard index of 82.09%.

Figure 5: Results of the analysis of the CEHOML-HSD approach under 1500 epochs

Table 4 and Fig. 6 present the hate speech detection results of the proposed CEHOML-HSD model at 2000 epochs. For the hate class, the model achieved an $accu_{y}$ of 91.50%, $prec_{n}$ of 95.21%, $reca_{l}$ of 87.40%, $F_{score}$ of 91.14% and a Jaccard index of 83.72%. For the normal class, it achieved an $accu_{y}$ of 91.50%, $prec_{n}$ of 88.35%, $reca_{l}$ of 95.60%, $F_{score}$ of 91.83% and a Jaccard index of 84.90%.

Figure 6: Results of the analysis of the CEHOML-HSD approach under 2000 epochs

Fig. 7 depicts the Training Accuracy (TA) and Validation Accuracy (VA) values obtained by the proposed CEHOML-HSD methodology on the test dataset. The model achieved high TA and VA values, with the VA values exceeding the TA values.

Figure 7: TA and VA analysis results of the CEHOML-HSD methodology

Fig. 8 depicts the Training Loss (TL) and Validation Loss (VL) values obtained by the proposed CEHOML-HSD approach on the test dataset. The model achieved low TL and VL values, with the VL values lower than the TL values.

Figure 8: TL and VL analysis results of the CEHOML-HSD methodology

A precision-recall analysis of the CEHOML-HSD method on the test dataset is depicted in Fig. 9. The figure indicates that the proposed CEHOML-HSD methodology produced high precision-recall values for both classes.

Figure 9: Precision-recall curve analysis results of CEHOML-HSD methodology

A brief ROC analysis of the CEHOML-HSD system on the test dataset is shown in Fig. 10. The outcomes reveal that the proposed CEHOML-HSD approach is able to categorize the test dataset into the distinct classes.

Figure 10: ROC curve analysis results of CEHOML-HSD methodology

Table 5 presents the hate speech detection results of the proposed CEHOML-HSD model and other existing models [23]. Fig. 11 compares the proposed CEHOML-HSD model with other recent models in terms of $accu_{y}$. The Bag-of-words model achieved the lowest $accu_{y}$ of 69.66%, and the TF-IDF method achieved a higher $accu_{y}$ of 72.34%. The Word2vec and Glove models produced moderate $accu_{y}$ values of 78.60% and 77.49%, respectively. Although the N-grams model achieved a considerably higher $accu_{y}$ of 83.20%, the proposed CEHOML-HSD model achieved the maximum $accu_{y}$ of 92.60%.

Figure 11: $Accu_{y}$ analysis results of the CEHOML-HSD method and other existing methodologies

Fig. 12 compares the proposed CEHOML-HSD model with other recent models in terms of $prec_{n}$. The Bag-of-words model displayed the lowest $prec_{n}$ of 69.42%. The TF-IDF algorithm achieved a higher $prec_{n}$ of 82.61%, while the Word2vec and Glove systems produced $prec_{n}$ values of 77.83% and 73.46%, respectively. The N-grams approach achieved a $prec_{n}$ of 73.50%, whereas the proposed CEHOML-HSD technique attained the maximum $prec_{n}$ of 92.93%.

Figure 12: $Prec_{n}$ analysis results of the CEHOML-HSD method and other existing methodologies

Fig. 13 compares the proposed CEHOML-HSD system with other recent models in terms of $reca_{l}$. The TF-IDF technique exhibited the lowest $reca_{l}$ of 68.18%, followed by the N-grams method with 70.62% and the Bag-of-words model with 71.49%. The Glove and Word2vec approaches achieved slightly higher $reca_{l}$ values of 71.52% and 72.82%, respectively. The proposed CEHOML-HSD algorithm attained the maximum $reca_{l}$ of 92.60%. Therefore, it can be inferred that the proposed CEHOML-HSD approach achieved the best classification performance in the hate speech detection process. The enhanced performance is attributed to the CEHO-based parameter optimization process; the corresponding $F_{score}$ comparison is shown in Fig. 14.

Figure 13: $Reca_{l}$ analysis results of the CEHOML-HSD approach and other existing methodologies

Figure 14: $F_{score}$ analysis results of the CEHOML-HSD approach and other existing methodologies

4 Conclusion

In this study, a novel CEHOML-HSD approach has been developed for identifying and categorizing Arabic text as hate speech or normal. The proposed CEHOML-HSD model uses the TF-IDF vectorizer at the initial stage to pre-process the data. Next, the SVM method is used to detect and classify hate speech in Arabic text. Lastly, the CEHO technique, which combines chaotic functions with the classical EHO algorithm, is employed to fine-tune the SVM parameters. An extensive experimental analysis was carried out to validate the performance of the proposed CEHOML-HSD model, and the comparative study showed that it outperforms the other approaches considered. In the future, the performance of the proposed CEHOML-HSD model can be improved with the help of advanced feature selection and feature reduction approaches.

Acknowledgement: None.

Funding Statement: Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2024R263), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. This study is supported via funding from Prince Sattam bin Abdulaziz University Project Number (PSAU/2024/R/1445).

Author Contributions: Conceptualization, Badriyya B. Al-onazi; Methodology, Jaber S. Alzahrani; Software, Najm Alotaibi; Validation, Hussain Alshahrani and Jaber S. Alzahrani; Investigation, Badriyya B. Al-onazi; Data curation, Jaber S. Alzahrani; Writing–original draft, Badriyya B. Al-onazi, Jaber S. Alzahrani, Najm Alotaibi, Mohamed Ahmed Elfaki, Radwa Marzouk and Heba Mohsen; Writing–review & editing, Abdelwahed Motwakel, Jaber S. Alzahrani, Radwa Marzouk and Heba Mohsen; Visualization, Abdelwahed Motwakel; Project administration, Abdelwahed Motwakel; Funding acquisition, Badriyya B. Al-onazi and Jaber S. Alzahrani. All authors reviewed the results and approved the final version of the manuscript.

Availability of Data and Materials: Data sharing not applicable to this article as no datasets were generated during the current study.

Conflicts of Interest: The authors declare they have no conflicts of interest to report regarding the present study.

References

1. R. Alshalan and H. A. Khalifa, “A deep learning approach for automatic hate speech detection in the Saudi twittersphere,” Applied Sciences, vol. 10, no. 23, pp. 8614, 2020. doi: 10.3390/app10238614.

2. N. Defersha and K. Tune, “Detection of hate speech text in Afan Oromo social media using machine learning approach,” Indian Journal of Science and Technology, vol. 14, no. 31, pp. 2567–2578, 2021. doi: 10.17485/IJST.

3. Z. Mossie and J. H. Wang, “Vulnerable community identification using hate speech detection on social media,” Information Processing & Management, vol. 57, no. 3, pp. 102087, 2020. doi: 10.1016/j.ipm.2019.102087.

4. F. N. Al-Wesabi, “Proposing high-smart approach for content authentication and tampering detection of Arabic text transmitted via internet,” IEICE Transactions on Information and Systems, vol. E103.D, no. 10, pp. 2104–2112, 2020. doi: 10.1587/transinf.2020EDP7011.

5. F. Poletto, V. Basile, M. Sanguinetti, C. Bosco and V. Patti, “Resources and benchmark corpora for hate speech detection: A systematic review,” Language Resources and Evaluation, vol. 55, no. 2, pp. 477–523, 2021. doi: 10.1007/s10579-020-09502-8.

6. F. N. Al-Wesabi, “A smart english text zero-watermarking approach based on third-level order and word mechanism of Markov model,” Computers, Materials & Continua, vol. 65, no. 2, pp. 1137–1156, 2020. doi: 10.32604/cmc.2020.011151.

7. E. Pronoza, P. Panicheva, O. Koltsova and P. Rosso, “Detecting ethnicity-targeted hate speech in Russian social media texts,” Information Processing & Management, vol. 58, no. 6, pp. 102674, 2021. doi: 10.1016/j.ipm.2021.102674.

8. F. N. Al-Wesabi, “A hybrid intelligent approach for content authentication and tampering detection of Arabic text transmitted via internet,” Computers, Materials & Continua, vol. 66, no. 1, pp. 195–211, 2021. doi: 10.32604/cmc.2020.012088.

9. O. Araque and C. A. Iglesias, “An ensemble method for radicalization and hate speech detection online empowered by sentic computing,” Cognitive Computation, vol. 14, no. 1, pp. 48–61, 2022. doi: 10.1007/s12559-021-09845-6.

10. F. N. Al-Wesabi, “Entropy-based watermarking approach for sensitive tamper detection of arabic text,” Computers, Materials & Continua, vol. 67, no. 3, pp. 3635–3648, 2021. doi: 10.32604/cmc.2021.015865.

11. A. Y. Muaad, H. J. Davanagere, M. A. Al-antari, J. V. B. Benifa and C. Chola, “AI-based misogyny detection from Arabic Levantine twitter tweets,” in Proc. of the 1st Int. Electronic Conf. on Algorithms, Computer Sciences & Mathematics Forum, Switzerland, vol. 2, pp. 15, 2021.

12. N. Albadi, M. Kurdi and S. Mishra, “Investigating the effect of combining gru neural networks with handcrafted features for religious hatred detection on Arabic twitter space,” Social Network Analysis and Mining, vol. 9, no. 1, pp. 1–19, 2019.

13. M. K. A. Aljero and N. Dimililer, “A novel stacked ensemble for hate speech recognition,” Applied Sciences, vol. 11, no. 24, pp. 11684, 2021. doi: 10.3390/app112411684.

14. A. Y. Muaad, H. Jayappa, M. A. Al-antari and S. Lee, “ArCAR: A novel deep learning computer-aided recognition for character-level Arabic text representation and recognition,” Algorithms, vol. 14, no. 7, pp. 216, 2021. doi: 10.3390/a14070216.

15. I. Aljarah, M. Habib, N. Hijazi, H. Faris, R. Qaddoura et al., “Intelligent detection of hate speech in Arabic social network: A machine learning approach,” Journal of Information Science, vol. 47, no. 4, pp. 483–501, 2021. doi: 10.1177/0165551520917651.

16. A. Elouali, Z. Elberrichi and N. Elouali, “Hate speech detection on multilingual twitter using convolutional neural networks,” Revue d’Intelligence Artificielle, vol. 34, no. 1, pp. 81–88, 2020. doi: 10.18280/ria.

17. R. Duwairi, A. Hayajneh and M. Quwaider, “A deep learning framework for automatic detection of hate speech embedded in Arabic tweets,” Arabian Journal for Science and Engineering, vol. 46, no. 4, pp. 4001–4014, 2021. doi: 10.1007/s13369-021-05383-3.

18. W. Aldjanabi, A. Dahou, M. A. A. Al-qaness, M. A. Elaziz, A. M. Helmi et al., “Arabic offensive and hate speech detection using a cross-corpora multi-task learning model,” Informatics, vol. 8, no. 4, pp. 69, 2021. doi: 10.3390/informatics8040069.

19. F. Y. A. Anezi, “Arabic hate speech detection using deep recurrent neural networks,” Applied Sciences, vol. 12, no. 12, pp. 6010, 2022. doi: 10.3390/app12126010.

20. H. Wang, B. Zheng, S. W. Yoon and H. S. Ko, “A support vector machine-based ensemble algorithm for breast cancer diagnosis,” European Journal of Operational Research, vol. 267, no. 2, pp. 687–699, 2018. doi: 10.1016/j.ejor.2017.12.001.

21. S. D. Correia, M. Beko, L. A. D. S. Cruz and S. Tomic, “Elephant herding optimization for energy-based localization,” Sensors, vol. 18, no. 9, pp. 2849, 2018. doi: 10.3390/s18092849.

22. D. Oliva, A. A. Ewees, M. A. E. Aziz, A. E. Hassanien and M. Peréz-Cisneros, “A chaotic improved artificial bee colony for parameter estimation of photovoltaic cells,” Energies, vol. 10, no. 7, pp. 865, 2017. doi: 10.3390/en10070865.

23. H. Faris, I. Aljarah, M. Habib and P. A. Castillo, “Hate speech detection using word embedding and deep learning in the Arabic language context,” in Proc. of the 9th Int. Conf. on Pattern Recognition Applications and Methods (ICPRAM 2020), Valletta, Malta, pp. 453–460, 2020.

Copyright © 2024 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
