ARTICLE

A Hybrid Query-Based Extractive Text Summarization Based on K-Means and Latent Dirichlet Allocation Techniques

Sohail Muhammad¹, Muzammil Khan², Sarwar Shah Khan²,³,*

¹ Department of Computer Science, City University of Science and IT, Peshawar, 24370, Pakistan
² Department of Computer and Software Technology, University of Swat, Swat, 19120, Pakistan
³ Department of Computer Science, IQRA National University Swat, Swat, 19130, Pakistan

* Corresponding Author: Sarwar Shah Khan.

Journal on Artificial Intelligence 2024, 6, 193-209. https://doi.org/10.32604/jai.2024.052099

Abstract

Retrieving information from continually growing digital collections in response to a user's query requires efficient retrieval mechanisms that reduce the time needed to search such massive collections. Scanning and analyzing every document to find the most relevant text is time-consuming, so answering a query against a document collection calls for a sophisticated technique. Achieving retrieval that is both accurate and fast over a large collection remains challenging. Text summarization is a prominent research field in information retrieval and text processing that helps locate the most appropriate content in a single document or across multiple documents in a collection. Machine learning and knowledge-based techniques are the two families of query-based extractive text summarization techniques in Natural Language Processing (NLP); they can be used for precise retrieval and are considered the best options. NLP uses both supervised and unsupervised machine learning approaches to calculate probabilistic features. This study proposes a hybrid approach for query-based extractive text summarization. The Text-Rank algorithm serves as the core algorithm in the implementation of the approach. For query-based summarization of multiple documents, the hybrid approach, which combines the K-Means clustering technique with Latent Dirichlet Allocation (LDA) as the topic modeling technique, achieves 0.288, 0.631, and 0.328 for precision, recall, and F-score, respectively. The results show that the proposed hybrid approach performs better than the independent graph-based approach and the sentence- and word-frequency-based approach.
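
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how precision, recall, and F-score can be computed for an extractive summary against a reference summary. It assumes simple whitespace tokenization and unigram overlap; the abstract does not state the exact scoring procedure used in the study, so the function name `overlap_scores` and this tokenization are illustrative assumptions, not the authors' method.

```python
from collections import Counter


def overlap_scores(candidate: str, reference: str):
    """Unigram-overlap precision, recall, and F-score (illustrative only).

    Assumption: lowercase whitespace tokenization; the paper's actual
    evaluation setup may differ.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Hypothetical example sentences, not data from the study.
p, r, f = overlap_scores("the cat sat on the mat", "the cat sat")
print(round(p, 3), round(r, 3), round(f, 3))
```

When scores are averaged over many documents, the mean of the per-document F-scores generally differs from the F-score computed from the mean precision and mean recall, so averaged values of the three metrics need not satisfy the harmonic-mean relation exactly.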

Copyright © 2024 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.