An Accurate Persian Part-of-Speech Tagger

Morteza Okhovvat; Mohsen Sharifi; Behrouz Minaei Bidgoli

doi:10.32604/csse.2020.35.423

Open Access

ARTICLE

Morteza Okhovvat^1,∗, Mohsen Sharifi^2,†, Behrouz Minaei Bidgoli^2,‡

1 Health Management and Social Development Research Center, Golestan University of Medical Sciences, Gorgan, Iran
2 School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
† Mohsen Sharifi, msharifi@iust.ac.ir
‡ Behrouz Minaei Bidgoli, b_minaei@iust.ac.ir

* Corresponding Author: Morteza Okhovvat, email

Computer Systems Science and Engineering 2020, 35(6), 423-430. https://doi.org/10.32604/csse.2020.35.423

Abstract

Processing any natural language requires that the grammatical properties of every word be tagged by a part-of-speech (POS) tagger. To provide a more accurate POS tagger for the Persian language, we propose an improved tagger called IAoM that supports properties of text-to-speech systems, such as Lexical Stress Search, Homograph Word Disambiguation, and Break Phrase Detection, and the main aspects of Persian morphology. IAoM uses Maximum Likelihood Estimation (MLE) to determine the tags of unknown words. In addition, it applies a small set of predefined rules to achieve high accuracy. To tag the input corpus, IAoM uses a Hidden Markov Model (HMM) together with the Viterbi algorithm. For a fair evaluation, we performed various experiments on both homogeneous and heterogeneous Persian corpora and studied the effect of training set size on the accuracy of IAoM. Experimental results demonstrate the merit of the proposed tagger, which achieves an overall accuracy of 97.6%.
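
As a rough illustration of the general technique named in the abstract (HMM tagging decoded with the Viterbi algorithm, with probabilities estimated from counts), the following minimal sketch may help. It is not the authors' IAoM implementation: the function names `train` and `viterbi`, the add-one smoothing, the unknown-word fallback to relative tag frequencies, and the tiny English toy corpus are all assumptions made for readability. IAoM's rules and morphological handling are not modeled.

```python
import math
from collections import Counter, defaultdict

def train(tagged_sentences):
    """Collect the counts needed for count-based (MLE-style) HMM estimates."""
    trans = defaultdict(Counter)   # trans[prev_tag][tag] = count
    emit = defaultdict(Counter)    # emit[tag][word] = count
    tag_counts = Counter()
    for sentence in tagged_sentences:
        prev = "<s>"               # sentence-start pseudo tag
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            tag_counts[tag] += 1
            prev = tag
    return trans, emit, tag_counts

def viterbi(words, trans, emit, tag_counts):
    """Return the most probable tag sequence for `words` under the HMM."""
    if not words:
        return []
    tags = sorted(tag_counts)
    vocab = {w for t in tags for w in emit[t]}
    total = sum(tag_counts.values())

    def log_trans(prev, tag):
        # Add-one smoothed transition probability (an assumption).
        return math.log((trans[prev][tag] + 1) /
                        (sum(trans[prev].values()) + len(tags)))

    def log_emit(tag, word):
        if word not in vocab:
            # Unknown word: fall back to the relative frequency of the tag
            # (a simplification; the paper's exact scheme is not given here).
            return math.log(tag_counts[tag] / total)
        return math.log((emit[tag][word] + 1) / (tag_counts[tag] + len(vocab)))

    # best[t] = log-probability of the best path ending in tag t
    best = {t: log_trans("<s>", t) + log_emit(t, words[0]) for t in tags}
    backpointers = []
    for word in words[1:]:
        new_best, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[p] + log_trans(p, t))
            new_best[t] = best[prev] + log_trans(prev, t) + log_emit(t, word)
            pointers[t] = prev
        best = new_best
        backpointers.append(pointers)

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return list(reversed(path))

# Hypothetical toy corpus (English placeholders, not Persian data).
corpus = [
    [("the", "DET"), ("dog", "N"), ("runs", "V")],
    [("the", "DET"), ("cat", "N"), ("sleeps", "V")],
    [("a", "DET"), ("dog", "N"), ("sleeps", "V")],
]
trans, emit, tag_counts = train(corpus)
print(viterbi(["the", "cat", "runs"], trans, emit, tag_counts))
print(viterbi(["the", "zebra", "runs"], trans, emit, tag_counts))  # "zebra" is unseen
```

Working in log space avoids numerical underflow on long sentences, and the unseen word in the second call is tagged purely from its context and the tag frequencies.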

Keywords

Hidden Markov Model, Maximum Likelihood Estimation, Morphology, POS Tagger, Viterbi Algorithm

Cite This Article

APA Style

Okhovvat, M., Sharifi, M., Minaei Bidgoli, B. (2020). An Accurate Persian Part-of-Speech Tagger. Computer Systems Science and Engineering, 35(6), 423–430. https://doi.org/10.32604/csse.2020.35.423

Vancouver Style

Okhovvat M, Sharifi M, Minaei Bidgoli B. An Accurate Persian Part-of-Speech Tagger. Comput Syst Sci Eng. 2020;35(6):423–430. https://doi.org/10.32604/csse.2020.35.423

IEEE Style

M. Okhovvat, M. Sharifi, and B. Minaei Bidgoli, “An Accurate Persian Part-of-Speech Tagger,” Comput. Syst. Sci. Eng., vol. 35, no. 6, pp. 423–430, 2020. https://doi.org/10.32604/csse.2020.35.423

Copyright © 2020 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
