Open Access

ARTICLE


Predicting Age and Gender in Author Profiling: A Multi-Feature Exploration

Aiman1, Muhammad Arshad1,*, Bilal Khan1, Sadique Ahmad2,*, Muhammad Asim2,3

1 Department of Computer Science, City University of Science and Information Technology, Peshawar, 25000, Pakistan
2 EIAS: Data Science and Blockchain Laboratory, College of Computer and Information Sciences, Prince Sultan University, Riyadh, 11586, Saudi Arabia
3 School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, 510006, China

* Corresponding Authors: Muhammad Arshad. Email: email; Sadique Ahmad. Email: email

(This article belongs to the Special Issue: Unveiling the Role of AIGC, Large Models, and Human-Centric Insights in Digital Defense)

Computers, Materials & Continua 2024, 79(2), 3333-3353. https://doi.org/10.32604/cmc.2024.049254

Abstract

Author Profiling (AP) is a subfield of digital forensics that focuses on inferring an author’s personal attributes, such as age, gender, occupation, and education, from linguistic features of their writing, e.g., stylistic, semantic, and syntactic features. AP is important in various fields, including forensics, security, medicine, and marketing. Previous studies have addressed AP in several languages, e.g., English, Arabic, and French. However, research on Roman Urdu remains limited. Hence, this study focuses on detecting the author’s age and gender from Roman Urdu text messages. The dataset used in this study is Fire’18-MaponSMS. This study proposes an ensemble model based on AdaBoostM1 and Random Forest (ABMRF) for AP using multiple linguistic feature sets: stylistic, character-based, word-based, and sentence-based. The proposed model is compared with several well-known models from the literature, including J48-Decision Tree (J48), Naïve Bayes (NB), K Nearest Neighbor (KNN), Composite Hypercube on Random Projection (CHIRP), NB-Updatable, Random Forest (RF), and AdaBoostM1. Overall, the proposed ABMRF performs best, with an accuracy of 54.2857% for age prediction and 71.1429% for gender prediction on stylistic features. With word-based features, the accuracies for age and gender prediction are 50.5714% and 60%, respectively. In contrast, KNN and CHIRP show the weakest performance across all the linguistic features for both age and gender prediction.
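
To make the notion of character-based, word-based, and sentence-based features concrete, the following is a minimal sketch of how such counts might be computed from a single text message. The function name `extract_features`, the specific features, and the sample message are illustrative assumptions; the abstract does not list the exact features used in the study, and the classifier itself is not shown here.

```python
import re


def extract_features(message: str) -> dict:
    """Compute a few illustrative character-, word-, and sentence-level counts.

    These features are assumptions for illustration only, not the feature
    set reported in the study.
    """
    # Words: runs of ASCII letters and apostrophes (Roman Urdu uses Latin script).
    words = re.findall(r"[A-Za-z']+", message)
    # Sentences: split on runs of sentence-ending punctuation, drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", message) if s.strip()]
    n_chars = len(message)
    return {
        # Character-based
        "char_count": n_chars,
        "digit_count": sum(c.isdigit() for c in message),
        "upper_ratio": sum(c.isupper() for c in message) / n_chars if n_chars else 0.0,
        "punct_count": sum(c in ".,!?;:" for c in message),
        # Word-based
        "word_count": len(words),
        "avg_word_len": sum(len(w) for w in words) / len(words) if words else 0.0,
        # Sentence-based
        "sentence_count": len(sentences),
        "avg_words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
    }


# Hypothetical Roman Urdu message used only as sample input.
features = extract_features("Salam! Aap kaise hain? Main theek hoon.")
```

A feature dictionary of this kind, computed per message or per author, could then be passed to any of the classifiers named above.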

Keywords



This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.