Predicting Age and Gender in Author Profiling: A Multi-Feature Exploration

Aiman; Muhammad Arshad; Bilal Khan; Sadique Ahmad; Muhammad Asim

doi:10.32604/cmc.2024.049254

Open Access

ARTICLE

Aiman¹, Muhammad Arshad¹,*, Bilal Khan¹, Sadique Ahmad²,*, Muhammad Asim²,³

1 Department of Computer Science, City University of Science and Information Technology, Peshawar, 25000, Pakistan
2 EIAS: Data Science and Blockchain Laboratory, College of Computer and Information Sciences, Prince Sultan University, Riyadh, 11586, Saudi Arabia
3 School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, 510006, China

* Corresponding Authors: Muhammad Arshad. Email: email; Sadique Ahmad. Email: email

(This article belongs to the Special Issue: Unveiling the Role of AIGC, Large Models, and Human-Centric Insights in Digital Defense)

Computers, Materials & Continua 2024, 79(2), 3333-3353. https://doi.org/10.32604/cmc.2024.049254

Received 01 January 2024; Accepted 09 March 2024; Issue published 15 May 2024

Abstract

Author Profiling (AP) is a subfield of digital forensics that infers an author’s personal attributes, such as age, gender, occupation, and education, from linguistic features, e.g., stylistic, semantic, and syntactic ones. AP has applications in fields including forensics, security, medicine, and marketing. Previous studies have addressed many languages, e.g., English, Arabic, and French; however, research on Roman Urdu remains limited. Hence, this study focuses on detecting an author’s age and gender from Roman Urdu text messages, using the Fire’18-MaponSMS dataset. This study proposes an ensemble model based on AdaBoostM1 and Random Forest (ABMRF) for AP, using multiple linguistic features: stylistic, character-based, word-based, and sentence-based. The proposed model is compared with several well-known models from the literature: J48 Decision Tree (J48), Naïve Bayes (NB), K-Nearest Neighbor (KNN), Composite Hypercube on Random Projection (CHIRP), NB-Updatable, RF, and AdaBoostM1. Overall, the proposed ABMRF performs best, achieving an accuracy of 54.2857% for age prediction and 71.1429% for gender prediction with stylistic features. With word-based features, the age and gender accuracies are 50.5714% and 60%, respectively. In contrast, KNN and CHIRP show the weakest performance across all linguistic features for both age and gender prediction.
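The abstract groups the model inputs into stylistic, character-based, word-based, and sentence-based features, but does not list the individual measures. As a rough illustration only, the sketch below shows how a single Roman Urdu SMS might be mapped to such numeric features before being passed to a classifier. The function `extract_features`, the regular expressions, and every feature name are hypothetical assumptions, not the paper's actual feature set.

```python
import re
import string

# Illustrative feature set (an assumption): the paper's exact stylistic,
# character-, word- and sentence-based features are not reproduced here.
EMOTICON_RE = re.compile(r"[:;=][-']?[)(DPp]")  # e.g. ":)", ";-P", "=D"
ELONGATED_RE = re.compile(r"([A-Za-z])\1{2,}")  # a letter repeated 3+ times, e.g. "pleeease"
SENTENCE_SPLIT_RE = re.compile(r"[.!?]+")       # naive sentence boundary


def ratio(num: float, den: float) -> float:
    """Safe division that returns 0.0 for empty messages."""
    return num / den if den else 0.0


def extract_features(message: str) -> dict:
    """Map one SMS to a flat dict of hypothetical numeric features."""
    n_chars = len(message)
    words = message.split()
    n_words = len(words)
    sentences = [s for s in SENTENCE_SPLIT_RE.split(message) if s.strip()]

    return {
        # Character-based features
        "char_count": float(n_chars),
        "upper_ratio": ratio(sum(c.isupper() for c in message), n_chars),
        "digit_ratio": ratio(sum(c.isdigit() for c in message), n_chars),
        "punct_count": float(sum(c in string.punctuation for c in message)),
        # Word-based features
        "word_count": float(n_words),
        "avg_word_len": ratio(sum(len(w) for w in words), n_words),
        "type_token_ratio": ratio(len({w.lower() for w in words}), n_words),
        # Sentence-based features
        "sentence_count": float(len(sentences)),
        "words_per_sentence": ratio(n_words, len(sentences)),
        # Stylistic features (emoticons, letter elongation)
        "emoticon_count": float(len(EMOTICON_RE.findall(message))),
        "elongated_words": float(len(ELONGATED_RE.findall(message))),
    }
```

For example, `extract_features("pleeeease reply!!!")` reports one elongated word and one sentence. Before training, each dict would need to be converted to a fixed-order vector (e.g. sorted by key) so that every message maps to the same feature columns; in the study, those vectors feed classifiers such as ABMRF.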

Keywords

Digital forensics; author profiling for security; AdaBoostM1; random forest; ensemble learning

Cite This Article

APA Style

Aiman, Arshad, M., Khan, B., Ahmad, S., Asim, M. (2024). Predicting Age and Gender in Author Profiling: A Multi-Feature Exploration. Computers, Materials & Continua, 79(2), 3333–3353. https://doi.org/10.32604/cmc.2024.049254

Vancouver Style

Aiman, Arshad M, Khan B, Ahmad S, Asim M. Predicting Age and Gender in Author Profiling: A Multi-Feature Exploration. Comput Mater Contin. 2024;79(2):3333–3353. https://doi.org/10.32604/cmc.2024.049254

IEEE Style

Aiman, M. Arshad, B. Khan, S. Ahmad, and M. Asim, “Predicting Age and Gender in Author Profiling: A Multi-Feature Exploration,” Comput. Mater. Contin., vol. 79, no. 2, pp. 3333–3353, 2024. https://doi.org/10.32604/cmc.2024.049254

Copyright © 2024 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.