Open Access
ARTICLE
Cyberbullying Sexism Harassment Identification by Metaheuristics-Tuned eXtreme Gradient Boosting
1 Technical Faculty, Singidunum University, Belgrade, 11000, Serbia
2 Informatics and Computing, Singidunum University, Belgrade, 11000, Serbia
3 Business Economics, Singidunum University, Belgrade, 11000, Serbia
4 Computing and Informatics, Sinergija University, Bijeljina, 76300, Bosnia and Herzegovina
5 Department for Information Systems and Technologies, University “Union Nikola Tesla”, Cara Dusana, Belgrade, 11080, Serbia
6 Department for Computer Science and Informatics, School of Electrical Engineering, University of Belgrade, Belgrade, 11000, Serbia
7 Department of Electrical and Electronics Engineering, Kongu Engineering College (Autonomous), Perundurai, Erode, 638060, India
8 Department of Mathematics, Saveetha School of Engineering (Deemed to be University), SIMATS Thandalam, Chennai, 602105, India
9 MEU Research Unit, Middle East University, Amman, 11831, Jordan
* Corresponding Author: Nebojsa Bacanin. Email:
Computers, Materials & Continua 2024, 80(3), 4997-5027. https://doi.org/10.32604/cmc.2024.054459
Received 28 May 2024; Accepted 29 August 2024; Issue published 12 September 2024
Abstract
Cyberbullying is a form of harassment or bullying that takes place online or through digital devices such as smartphones, computers, or tablets. It can occur through various channels, including social media, text messages, online forums, and gaming platforms. Cyberbullying involves using technology to intentionally harm, harass, or intimidate others, and it may take different forms, including exclusion, doxing, impersonation, harassment, and cyberstalking. Unfortunately, with the rapid growth in the number of malicious internet users, this social phenomenon is becoming more frequent, and there is a pressing need to address it. The main goal of the research presented in this manuscript is therefore to tackle this emerging challenge. For this purpose, a natural language processing (NLP) dataset of sexist harassment on Twitter is used; it contains tweets in which people are harassed on a sexual basis. Two algorithms are used to transform the text into meaningful numerical representations for machine learning (ML) input: Term frequency-inverse document frequency (TF-IDF) and Bidirectional encoder representations from transformers (BERT). The well-known eXtreme gradient boosting (XGBoost) ML model is employed to classify whether a given tweet falls into the category of sexual harassment. To further improve performance, several XGBoost models were built by tuning their hyperparameters with metaheuristics. To this end, the recently introduced Coyote optimization algorithm (COA) was modified and adapted to optimize the XGBoost model. Other state-of-the-art metaheuristic approaches were also implemented for this challenge, and a rigorous comparative analysis of the obtained classification metrics (accuracy, Cohen's kappa score, precision, recall, and F1-score) was performed. Finally, the best-performing model was interpreted using Shapley additive explanations (SHAP), yielding useful insights into the behavioral patterns of people who engage in social harassment.
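The models are compared using accuracy, Cohen's kappa score, precision, recall, and F1-score. To illustrate how these metrics are defined for a binary harassment/non-harassment task, the following minimal sketch computes them from two label lists in plain Python. The function name `binary_metrics` and the label encoding (1 = sexist harassment, 0 = not) are assumptions made for illustration; this is not the authors' code. Cohen's kappa is computed as κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement (accuracy) and p_e is the agreement expected by chance from the label marginals. In practice, library routines such as scikit-learn's `cohen_kappa_score` compute the same quantity.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Illustrative sketch: accuracy, precision, recall, F1, and Cohen's kappa.

    Assumes binary labels, with `positive` (hypothetically 1) marking
    sexist-harassment tweets and any other value marking non-harassment.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))

    # Confusion-matrix counts for the positive (harassment) class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = sum(1 for t, p in pairs if t == p) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Cohen's kappa: observed agreement p_o corrected for chance agreement p_e,
    # where p_e sums, over labels, the product of the two marginal frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    p_o = accuracy
    p_e = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    # Kappa is undefined when p_e == 1 (both label lists contain a single class).
    kappa = (p_o - p_e) / (1 - p_e) if p_e != 1 else float("nan")

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical ground-truth and predicted labels for eight tweets.
    y_true = [1, 1, 0, 0, 1, 0, 1, 0]
    y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
    print(binary_metrics(y_true, y_pred))
```

With the hypothetical labels in the usage example, accuracy, precision, recall, and F1 all equal 0.75, while kappa is 0.5. This gap arises because the two label lists are balanced, so half of the agreement would be expected by chance alone. That chance correction is why kappa is reported alongside accuracy.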
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.