Xiaolei Ma1, 2, Yang Lu1, 2, Yinan Lu1, *, Zhili Pei2, Jichao Liu3
CMC-Computers, Materials & Continua, Vol.63, No.2, pp. 923-941, 2020, DOI:10.32604/cmc.2020.07711
- 01 May 2020
Abstract: Supervised machine learning approaches are effective in text mining, but their
success relies heavily on manually annotated corpora. However, few annotated biomedical
event corpora exist, and the available datasets contain too few examples for training
classifiers. A common remedy is to draw large numbers of training samples from unlabeled
data, but such datasets often contain many mislabeled samples, which degrade classifier
performance. Therefore, this study proposes a novel
error data detection approach suitable for reducing noise in unlabeled biomedical event
data. First, we construct the mislabeled dataset through error…