Active Learning Strategies for Textual Dataset-Automatic Labelling

Sher Daudpota; Saif Hassan; Yazeed Alkhurayyif; Abdullah Alqahtani; Muhammad Aziz

doi:10.32604/cmc.2023.034157

Open Access

ARTICLE

Sher Muhammad Daudpota¹, Saif Hassan¹, Yazeed Alkhurayyif²,*, Abdullah Saleh Alqahtani³,⁴, Muhammad Haris Aziz⁵

1 Department of Computer Science, Sukkur IBA University, Sukkur, 65200, Pakistan
2 Al Quwayiyah College of Sciences and Humanities, Shaqra University, Shaqra, 15526, Saudi Arabia
3 Self-Development Skills Department, Common First Year Deanship, King Saud University, Riyadh, 12373, Saudi Arabia
4 STC’s Artificial Intelligence Chair, Department of Information Systems, College of Computer and Information Sciences, King Saud University, Riyadh, 11451, Saudi Arabia
5 College of Engineering & Technology, University of Sargodha, Sargodha, 40100, Pakistan

* Corresponding Author: Yazeed Alkhurayyif.

(This article belongs to the Special Issue: Emerging Techniques on Citation Analysis in Scholarly Articles)

Computers, Materials & Continua 2023, 76(2), 1409-1422. https://doi.org/10.32604/cmc.2023.034157

Received 07 July 2022; Accepted 23 September 2022; Issue published 30 August 2023

Abstract

The Internet revolution has produced abundant data from many sources, including social media and traditional media. Although the availability of data is no longer an issue, labelling that data for use in supervised machine learning remains expensive and demands tedious human effort. The overall purpose of this study is to propose a strategy for automatically labelling unlabeled textual data using active learning in combination with deep learning. More specifically, this study assesses the performance of different active learning strategies in the automatic labelling of textual datasets at the sentence and document levels. To achieve this objective, two sets of experiments were performed on a publicly available dataset. In the first set of experiments, we randomly choose a subset of instances from the training dataset and train a deep neural network, then assess its performance on the test set. In the second set of experiments, we replace the random selection with different active learning strategies to choose the subset of the training dataset, train the same model, and reassess its performance on the test set. The experimental results suggest that, compared with random selection, active learning strategies yield a performance improvement of 7% on document-level datasets and 3% on sentence-level datasets for automatic labelling.
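The contrast between the two sets of experiments can be illustrated with a minimal sketch. The abstract does not name the active learning strategies that were evaluated, so the least-confidence criterion below is an assumption chosen only because it is a common uncertainty-based strategy. The function names and the probability values are hypothetical and are not taken from the study.

```python
import numpy as np

def random_selection(n_pool, k, seed=0):
    """Baseline: pick k pool indices uniformly at random, without replacement."""
    rng = np.random.default_rng(seed)
    return rng.choice(n_pool, size=k, replace=False)

def least_confidence_selection(probs, k):
    """Assumed strategy: pick the k instances whose top class probability is lowest."""
    confidence = probs.max(axis=1)  # the model's confidence in its predicted class
    return np.argsort(confidence, kind="stable")[:k]

# Hypothetical predicted class probabilities for a pool of 5 unlabeled texts.
pool_probs = np.array([
    [0.95, 0.05],
    [0.55, 0.45],
    [0.70, 0.30],
    [0.51, 0.49],
    [0.90, 0.10],
])

# Indices the model is least sure about; these would be labelled and added
# to the training subset in place of a random draw.
print(least_confidence_selection(pool_probs, k=2))
print(random_selection(len(pool_probs), k=2))
```

In a full pipeline, the selected indices would be labelled, added to the training subset, and the same network retrained, so the only difference between the two experimental conditions is the selection function.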

Keywords

Active learning; automatic labelling; textual datasets

Cite This Article

APA Style

Daudpota, S.M., Hassan, S., Alkhurayyif, Y., Alqahtani, A.S., Aziz, M.H. (2023). Active learning strategies for textual dataset-automatic labelling. Computers, Materials & Continua, 76(2), 1409–1422. https://doi.org/10.32604/cmc.2023.034157

Vancouver Style

Daudpota SM, Hassan S, Alkhurayyif Y, Alqahtani AS, Aziz MH. Active learning strategies for textual dataset-automatic labelling. Comput Mater Contin. 2023;76(2):1409–1422. https://doi.org/10.32604/cmc.2023.034157

IEEE Style

S. M. Daudpota, S. Hassan, Y. Alkhurayyif, A. S. Alqahtani, and M. H. Aziz, “Active Learning Strategies for Textual Dataset-Automatic Labelling,” Comput. Mater. Contin., vol. 76, no. 2, pp. 1409–1422, 2023. https://doi.org/10.32604/cmc.2023.034157

Copyright © 2023 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
