Search Results (2)
  • Open Access

    ARTICLE

    Benchmarking Performance of Document Level Classification and Topic Modeling

    Muhammad Shahid Bhatti1,*, Azmat Ullah1, Rohaya Latip2, Abid Sohail1, Anum Riaz1, Rohail Hassan3

    CMC-Computers, Materials & Continua, Vol.71, No.1, pp. 125-141, 2022, DOI:10.32604/cmc.2022.020083

    Abstract: Text classification for low-resource languages is a non-trivial and challenging problem. This paper discusses Urdu news classification and Urdu document similarity. Urdu is one of the most widely spoken languages in Asia. The use of computational methods for text classification has increased over time. However, Urdu has received little research attention and lacks readily available datasets, which is the primary reason that research on Urdu, and the application of the latest methods to it, remains limited. To overcome these obstacles, a medium-sized dataset with six categories is collected from authentic Pakistani news…

  • Open Access

    ARTICLE

    News Text Topic Clustering Optimized Method Based on TF-IDF Algorithm on Spark

    Zhuo Zhou1, Jiaohua Qin1,*, Xuyu Xiang1, Yun Tan1, Qiang Liu1, Neal N. Xiong2

    CMC-Computers, Materials & Continua, Vol.62, No.1, pp. 217-231, 2020, DOI:10.32604/cmc.2020.06431

    Abstract: Because text topic clustering is slow on a stand-alone architecture at big-data scale, this paper takes news text as its research object and proposes an LDA text topic clustering algorithm based on the Spark big data platform. Since the word mapping of the TF-IDF (term frequency-inverse document frequency) algorithm under Spark is irreversible, the mapped word indexes cannot be traced back to the original words. This paper proposes an optimized TF-IDF method under Spark that ensures the text words can be restored. First, the text features are extracted by the TF-IDF algorithm combined with CountVectorizer…
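
    The reversibility point in the abstract above can be illustrated with a minimal sketch in plain Python. It is not the paper's Spark implementation: the helper names (`build_vocabulary`, `tf_idf`), the toy documents, and the smoothed IDF formula are all assumptions chosen for illustration. The idea shown is only that a vocabulary-based (count-vectorizer-style) mapping keeps a word-to-index table, so every TF-IDF index can be traced back to its word.

```python
import math
from collections import Counter

def build_vocabulary(docs):
    """Assign each distinct word a stable index (hypothetical helper)."""
    words = sorted({w for doc in docs for w in doc})
    return {w: i for i, w in enumerate(words)}

def tf_idf(docs, vocab):
    """Return one sparse {index: weight} dict per document.

    Uses a smoothed IDF, log((n + 1) / (df + 1)); this choice is an
    assumption of the sketch, not a formula taken from the abstract.
    """
    n = len(docs)
    # Document frequency: number of documents containing each word.
    df = Counter(w for doc in docs for w in set(doc))
    idf = {w: math.log((n + 1) / (df[w] + 1)) for w in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        vectors.append({vocab[w]: tf[w] * idf[w] for w in tf})
    return vectors

# Toy tokenized documents (illustrative only).
docs = [
    ["spark", "news", "topic"],
    ["news", "clustering"],
    ["spark", "lda", "topic", "topic"],
]

vocab = build_vocabulary(docs)
# The inverse table is what makes the mapping reversible.
index_to_word = {i: w for w, i in vocab.items()}

vectors = tf_idf(docs, vocab)
top_index = max(vectors[2], key=vectors[2].get)
top_word = index_to_word[top_index]
print(top_word)
```

    A hashing-based mapping stores no such table, which is why, as the abstract notes, its indexes cannot be traced back to the original words.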
