Open Access
ARTICLE
A Two-Phase Paradigm for Joint Entity-Relation Extraction
1 College of Computer, National University of Defense Technology, Changsha, 410073, China
2 The Affiliated Eye Hospital of Nanjing Medical University, Nanjing, 210029, China
* Corresponding Author: Yuke Ji. Email:
Computers, Materials & Continua 2023, 74(1), 1303-1318. https://doi.org/10.32604/cmc.2023.032168
Received 09 May 2022; Accepted 12 June 2022; Issue published 22 September 2022
Abstract
Span-based models have been studied extensively for the joint entity and relation extraction task. However, these models sample a large number of negative entities and negative relations during training; such negative examples are essential, but they result in grossly imbalanced data distributions and, in turn, suboptimal model performance. To address this issue, we propose a two-phase paradigm for span-based joint entity and relation extraction, which classifies the entities and relations in the first phase and predicts their types in the second phase. The two-phase paradigm enables our model to significantly reduce the data distribution gap, including the gap between negative entities and other entities, as well as the gap between negative relations and other relations. In addition, we make the first attempt at combining entity type and entity distance as global features, which proves effective, especially for relation extraction. Experimental results on several datasets demonstrate that the span-based joint extraction model augmented with the two-phase paradigm and the global features consistently outperforms previous state-of-the-art span-based models for the joint extraction task, establishing a new standard benchmark. Qualitative and quantitative analyses further validate the effectiveness of the proposed paradigm and the global features.
Span-based joint entity and relation extraction models simultaneously conduct NER (Named Entity Recognition) and RE (Relation Extraction) in text span form. Typically, these models work as follows: given an unstructured text, the model divides it into text spans; it then constructs ordered span pairs (a.k.a. relation tuples); and finally, it obtains entities and relations by performing classifications on the semantic representations of the spans and relation tuples, respectively. We present a typical case study in Fig. 1: “In”, “In 1831”, and “James Garfield” are three span examples; <“James Garfield”, “U.S.”> and <“James Garfield”, “Ohio”> are two relation tuple examples; a span-based model predicts the types of spans and relation tuples by performing classifications on the related semantic representations. For instance, “In” is classified as the Not-Entity type, and <“James Garfield”, “Ohio”> is classified as the Live type.1
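To make the span-based formulation above concrete, the following is a minimal sketch (not the authors' code; the function names and the width threshold are illustrative) of enumerating candidate spans up to a maximum width and pairing them into ordered relation tuples:

```python
from typing import List, Tuple

def enumerate_spans(tokens: List[str], max_width: int) -> List[Tuple[int, int]]:
    """Return all (start, end) token spans with width 1..max_width (end exclusive)."""
    spans = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_width, len(tokens)) + 1):
            spans.append((start, end))
    return spans

def enumerate_relation_tuples(spans):
    """Ordered pairs of candidate spans, i.e., candidate relation tuples."""
    return [(head, tail) for head in spans for tail in spans if head != tail]

tokens = "In 1831 James Garfield was born in Ohio".split()
spans = enumerate_spans(tokens, max_width=2)
print(spans[:3])  # [(0, 1), (0, 2), (1, 2)] -> "In", "In 1831", "1831"
```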
Span-based joint extraction models [2–7] sample numerous negative entities and relations (i.e., spans of the Not-Entity type and relation tuples of the Not-Relation type) during model training. These negative examples lead to grossly imbalanced data distributions, which is one of the primary reasons for suboptimal model performance. As shown in Tab. 1, the entity distribution between Other and Not-Entity is 592:101555 (approximately 1:172), and the relation distribution between Kill and Not-Relation is 229:12915 (approximately 1:56). Paradoxically, previous work [1] demonstrates that an adequate number of negative examples is required to ensure that the model performs well. Thus, resolving the issue of grossly imbalanced data distributions while maintaining an adequate number of negative examples is a feasible way to improve model performance.
Global features, such as those derived from entity information, can be critical in the joint extraction task. As illustrated in Fig. 1, if SpERT [1] knows beforehand that “James Garfield” is a person (Per) entity and “U.S.” is a location (Loc) entity, it can easily classify <“James Garfield”, “U.S.”> into the Live type. Moreover, entity distance, which counts the words between two entities, can reflect the entities' correlation. For example, in the CoNLL04 dataset, relations with an entity distance of less than 6 account for 64.5% of all relations, and the smaller the distance, the more likely the two entities are to have a relation. However, as far as we know, previous work [8–12] has used either entity type or entity distance, but not both. Combining these two types of information may play a more important role in the joint extraction task. As shown in Tab. 2, <Loc, Loc> tends to have the LocIn relation when the entity distance is small, e.g., 76.6% for [0–3], 12.8% for [4–7], and 3.5% for [8–11], whereas <Per, Per> tends to have the Kill relation at larger entity distances, e.g., 21.3% for [0–3], 33.5% for [4–7], and 26.7% for [8–11].
In this paper, we propose a two-phase span-based model for the joint extraction task, with the goal of addressing the grossly imbalanced data distributions and the lack of effective global features. Motivated by the fact that NER (RE) can be achieved in two steps, namely first detecting all entities (relations) and then predicting their types, we divide the joint extraction task into two phases: the first phase obtains entities and relations, and the second phase predicts their types. With the two-phase paradigm, our model reduces the data distribution gap by dozens of times. Take the data in Tab. 1 as an example: (1) In the first phase, the entity distribution can be reduced to 1:24 and the relation distribution to 1:8, whereas the corresponding values in SpERT are 1:172 and 1:56, respectively.2 (2) In the second phase, our model predicts the types of entities and relations, and the corresponding data distributions are roughly even.3 Moreover, we attempt for the first time to combine entity type and entity distance as global features and use them to augment our model. Furthermore, we propose a gated mechanism for fusing various semantic representations, taking the weighted importance of each representation into account. In Section 4.5, we validate the effectiveness of the above model components.
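A minimal sketch of the label mapping behind the two-phase paradigm described above (the label names are illustrative): Phase One collapses every concrete entity or relation type into a single positive class, while Phase Two keeps the original fine-grained types only for the positives.

```python
# Phase One: collapse all concrete entity types to a single "Entity" class.
entity_labels = ["Per", "Loc", "Not-Entity", "Org", "Not-Entity"]
phase_one = ["Not-Entity" if y == "Not-Entity" else "Entity" for y in entity_labels]
# -> ["Entity", "Entity", "Not-Entity", "Entity", "Not-Entity"]

# Phase Two only types the spans classified as "Entity" in Phase One, so the
# dominant Not-Entity class disappears and the remaining classes are roughly even.
phase_two = [y for y in entity_labels if y != "Not-Entity"]
# -> ["Per", "Loc", "Org"]
```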
Experimental results on the ACE05, CoNLL04 and SciERC datasets demonstrate that our model consistently outperforms the strongest span-based baselines in terms of F1-score, providing a new span-based benchmark for the joint extraction task. Extensive analyses further validate the effectiveness of our model.
In summary, our model differs from previous span-based models in three ways: (1) As far as we know, our model makes the first attempt to balance the grossly imbalanced data distributions. (2) Our model combines entity type and entity distance as global features, whereas previous span-based models use at most one of them. (3) Our model uses a gated mechanism to fuse various semantic representations, whereas previous span-based models simply concatenate them.
2.1 Span-based Joint Entity and Relation Extraction
Recently, span-based models have been extensively investigated for the joint entity and relation extraction task. Luan et al. [2] propose one of the first span-based joint models and attempt to further improve model performance by incorporating the coreference resolution task [13,14]. Luan et al. [4] also include the coreference resolution task in their span-based joint model. Moreover, some other span-based models [5] have examined how to incorporate additional natural language processing tasks, such as event detection [15,16]. More recently, Dixit and Al-Onaizan [3] introduce a pre-trained language model, i.e., ELMo (Embeddings from Language Models) [17], into a span-based joint model for the first time. Eberts and Ulges [1] propose to use BERT (Bidirectional Encoder Representations from Transformers) [18] as the backbone of their span-based joint model. Zhong and Chen [7] propose to use ALBERT (A Lite BERT) [19] in their span-based joint model. However, these models suffer from grossly imbalanced data distributions, as the span-based paradigm requires extensive negative entities and relations. Although our model also samples a large number of negative examples, we propose a two-phase paradigm that effectively reduces the data distribution gap.
Entity type and entity distance are two types of important global features that are frequently used in joint extraction models [20–27]. Miwa and Bansal [8], Sun and Grishman [28], and Bekoulis et al. [9] are among the first to use entity types as global features in their joint extraction models: they concatenate fixed-size embeddings trained for entity types to relation semantic representations. Zhao et al. [10] model strong correlations between entity labels and text tokens and concatenate entity label embeddings to relation semantic representations. For entity distance, Zeng et al. [11] and Ye et al. [12] concatenate relative entity position features to relation semantic representations. However, the above models use either entity type or entity distance but make no attempt to combine them. In comparison, our model combines entity type and entity distance as global features, which we validate to be more effective.
The neural architecture of our two-phase span-based model is illustrated in Fig. 2. For a given unstructured text
We formulate the text spans (denoted as
where
Our model uses the BERT [18] model as the word embedding generator. We denote the BERT embedding sequence for text
where
Based on
As shown in Fig. 2, Phase One consists of two modules: Entity Classification and Relation Classification, where the former obtains coarse-grained entities and the latter obtains coarse-grained relations.
3.2.1 Entity Classification
This module obtains coarse-grained entities by performing binary classification on span semantic representations. We begin by converting all entity types in the training set to the Entity type and setting the type of sampled negative entities to the Not-Entity type. Our model is then trained to classify spans that are entities as the Entity type and all other spans as the Not-Entity type.
In this paper, we obtain the span semantic representations using three different types of semantic representations: (1) span token representation, (2) contextual representation, and (3) span width embedding.
For the span
where
In this paper, we take the
Span width embedding allows the model to incorporate prior knowledge over span widths. In this paper, we train a fixed-size embedding for each span width (i.e., 1, 2, …) during model training, and we refer to the width embedding for the span
where
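A minimal sketch of fusing the three components above into a span representation, assuming max-pooling over the span's BERT token embeddings, the [CLS] embedding as the contextual part, a 25-dimensional width embedding, and a simple multiplicative gate; all of these specific choices are assumptions, not the paper's stated design.

```python
import torch
import torch.nn as nn

class SpanRepresentation(nn.Module):
    """Sketch: fuse (1) span token representation, (2) contextual representation,
    and (3) span width embedding into one span representation."""
    def __init__(self, hidden: int, max_width: int, width_dim: int = 25):
        super().__init__()
        self.width_emb = nn.Embedding(max_width + 1, width_dim)                 # one embedding per span width
        self.gate = nn.Linear(2 * hidden + width_dim, 2 * hidden + width_dim)   # assumed gate form

    def forward(self, token_embs: torch.Tensor, cls_emb: torch.Tensor, start: int, end: int):
        span_tokens = token_embs[start:end]                    # (width, hidden) BERT embeddings
        span_rep = span_tokens.max(dim=0).values               # (1) max-pooled span token representation
        width = self.width_emb(torch.tensor(end - start))      # (3) span width embedding
        parts = torch.cat([span_rep, cls_emb, width], dim=-1)  # (2) cls_emb used as the context part
        return torch.sigmoid(self.gate(parts)) * parts         # weighted (gated) fusion
```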
To obtain coarse-grained entities, we first pass the
where
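A minimal sketch of the binary entity decoder described in this module, assuming a single linear layer with a sigmoid and a 0.5 decision threshold (the layer structure and dimension are assumptions):

```python
import torch.nn as nn

span_rep_dim = 2 * 768 + 25        # example size, matching the span-representation sketch above
entity_decoder = nn.Sequential(
    nn.Linear(span_rep_dim, 1),    # binary logit for the span
    nn.Sigmoid(),
)
# A span is kept as a coarse-grained entity (Entity type) when the predicted
# probability exceeds 0.5; otherwise it is discarded as Not-Entity.
```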
3.2.2 Relation Classification
This module obtains coarse-grained relations by performing binary classification on the semantic representations of relation tuples. We begin by converting all relation types in the training set to the Relation type and assigning the Not-Relation type to sampled negative relations. Our model is then trained to classify relation tuples that hold relations as the Relation type and all other tuples as the Not-Relation type.
Let
We obtain the semantic representation of
Relation context is the text between the two entities of a relation tuple [29]. In this paper, we assume the relation context of
We obtain the contextual representation of
In this paper, we propose to combine entity type and entity distance as global features. Since all entities here are of the Entity type, only the entity distance can be used to distinguish different feature entries. As shown in Fig. 2, we refer to them as binary global features. During model training, we train a fixed-size embedding for each feature entry and denote the feature embedding for
We obtain the semantic representation of
To obtain coarse-grained relations, we first pass the
where
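A minimal sketch of the coarse-grained relation representation built in this module: the two span representations, a relation-context representation, and an entity-distance embedding (the binary global feature). Max-pooling the context tokens and bucketing the distance with `n_buckets` entries are assumptions.

```python
import torch
import torch.nn as nn

class CoarseRelationRepresentation(nn.Module):
    """Sketch: concatenate head/tail span representations, a relation-context
    representation, and an entity-distance embedding (binary global feature)."""
    def __init__(self, n_buckets: int = 10, dist_dim: int = 25):
        super().__init__()
        self.dist_emb = nn.Embedding(n_buckets, dist_dim)

    def forward(self, token_embs, head_rep, tail_rep, head_span, tail_span):
        # Relation context: the tokens strictly between the two entity spans.
        left = min(head_span[1], tail_span[1])
        right = max(head_span[0], tail_span[0])
        if right > left:
            context = token_embs[left:right].max(dim=0).values
        else:  # adjacent or overlapping spans: no context tokens
            context = torch.zeros(token_embs.size(-1))
        # Entity distance = number of tokens between the spans, clipped to the buckets.
        distance = min(max(right - left, 0), self.dist_emb.num_embeddings - 1)
        dist = self.dist_emb(torch.tensor(distance))
        return torch.cat([head_rep, tail_rep, context, dist], dim=-1)
```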
3.2.3 Training Loss of Phase One
For each of the above two binary classifications, the training objective is to minimize the following binary cross-entropy loss:
where t denotes one of the above two classifications.
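A minimal reconstruction of this standard binary cross-entropy, with the symbol names assumed: y_i is the gold binary label and ŷ_i the predicted probability of the i-th of N_t examples for classification t.

```latex
\mathcal{L}_{t} = -\frac{1}{N_{t}} \sum_{i=1}^{N_{t}}
  \left[ y_{i}\log \hat{y}_{i} + \left(1 - y_{i}\right)\log\left(1 - \hat{y}_{i}\right) \right]
```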
In Phase Two, our model predicts the types of the coarse-grained entities and relations, obtaining fine-grained entities and relations. As illustrated in Fig. 2, Phase Two is composed of two modules: Entity Type Prediction and Relation Type Prediction.
3.3.1 Entity Type Prediction
In this module, we obtain entity types by conducting multi-class classifications on the semantic representations of coarse-grained entities. Specifically, for each coarse-grained entity e in
where
3.3.2 Relation Type Prediction
We obtain relation types by performing multi-class classifications on relation semantic representations. As shown in Fig. 2, the relation semantic representation is derived from two parts: the relation representation used for the binary relation classification and multi-class global features.
For each coarse-grained relation r in
where
Then we obtain the relation semantic representation (denoted as
To obtain the type of r, we first pass
where
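A minimal sketch of the relation-type decoder described in this subsection: the Phase One relation representation is fused with a multi-class global feature embedding indexed by the pair of predicted entity types and the entity-distance bucket, then fed to a softmax classifier over the relation types. The flat indexing of feature entries and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class RelationTypeDecoder(nn.Module):
    """Sketch: fuse the Phase One relation representation with a multi-class
    global feature (entity-type pair + distance bucket) and predict the type."""
    def __init__(self, rel_dim: int, n_entity_types: int, n_buckets: int, n_rel_types: int):
        super().__init__()
        self.n_entity_types, self.n_buckets = n_entity_types, n_buckets
        self.feature_emb = nn.Embedding(n_entity_types * n_entity_types * n_buckets, 25)
        self.classifier = nn.Linear(rel_dim + 25, n_rel_types)

    def forward(self, rel_rep, head_type: int, tail_type: int, bucket: int):
        # One feature entry per (head entity type, tail entity type, distance bucket).
        index = (head_type * self.n_entity_types + tail_type) * self.n_buckets + bucket
        feature = self.feature_emb(torch.tensor(index))
        logits = self.classifier(torch.cat([rel_rep, feature], dim=-1))
        return torch.softmax(logits, dim=-1)  # distribution over relation types
```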
3.3.3 Training Loss of Phase Two
For each of the above two multi-class classification tasks, the training objective is to minimize the following cross-entropy loss:
where
During the model training, we minimize the following joint training loss:
where T denotes the two binary classifications and
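A minimal reconstruction of the joint objective consistent with the surrounding description, where T denotes the two Phase One binary classifications and T' (an assumed symbol) the two Phase Two multi-class classifications:

```latex
\mathcal{L}_{joint} = \sum_{t \in T} \mathcal{L}_{t} + \sum_{t' \in T'} \mathcal{L}_{t'}
```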
We evaluate our model on the ACE05 [30], CoNLL04 [31], and SciERC [2] datasets.
ACE05 defines seven entity types (Per, Org, Loc, Gpe, Fac, Veh, and Wea) and six relation types (Phys, Part-whole, Per-soc, Org-aff, Art, and Gen-aff) between entities. We use the same data splits, pre-processing, and task settings proposed by Li and Ji [32] and Li et al. [33]. It has 351 documents for training, 80 for development and 80 for test.
CoNLL04 defines four entity types (Loc, Org, Per, and Other) and five relation types (Kill, Live, LocIn, OrgBI, and Work). We use the splits defined by Ji et al. [6] and Wang et al. [25]. The dataset consists of 910 instances for training, 243 for development and 288 for test.
SciERC is derived from 500 abstracts of AI papers. The dataset defines six scientific entities (Task, Method, Metric, Material, Other, and Generic) and seven relation types (Compare, Conjunction, Evaluate-for, Used-for, Feature-of, Part-of, and Hyponym-of) in a total of 2,687 sentences. We use the same training (1,861 sentences), development (275 sentences), and test (551 sentences) split following the previous work [3,34].
For a fair comparison with previous work, we use the bert-base-cased model on ACE05 and CoNLL04, and use the scibert-scivocab-cased model on SciERC. We optimize our model using the BertAdam for 120 epochs with a learning rate of 5e-5 and a weight decay of 1e-2. We set the span width threshold
where
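For reference, the stated settings collected into a plain configuration dictionary (a sketch only; the span width threshold value is not given here and is therefore left unset):

```python
config = {
    "encoder": {"ACE05": "bert-base-cased",
                "CoNLL04": "bert-base-cased",
                "SciERC": "scibert-scivocab-cased"},
    "optimizer": "BertAdam",
    "epochs": 120,
    "learning_rate": 5e-5,
    "weight_decay": 1e-2,
    "span_width_threshold": None,   # value not specified in this excerpt
}
```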
For ACE05, an entity mention is considered correct if its head region and type match the ground truth, and a relation is correct if both its relation type and its two entity mentions are correct. For CoNLL04, an entity mention is considered correct if its offsets and type match the ground truth, and a relation is correct if both its relation type and its two entity mentions are correct. For SciERC, the entity type is not considered when evaluating relation extraction, in line with previous work [6,7]; the remaining settings are identical to those for CoNLL04.
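A minimal sketch of these strict matching criteria; the tuple layouts and function names are illustrative.

```python
def entity_correct(pred, gold_entities):
    """CoNLL04-style: an entity is correct iff (start, end, type) matches a gold entity."""
    return pred in gold_entities

def relation_correct(pred, gold_relations, check_entity_type=True):
    """A relation is correct iff its type and both entity mentions are correct.
    For the SciERC setting, entity types are ignored (check_entity_type=False)."""
    if check_entity_type:
        return pred in gold_relations
    strip = lambda e: (e[0], e[1])  # drop the entity type, keep the offsets
    stripped = {(strip(h), strip(t), r) for h, t, r in gold_relations}
    head, tail, rtype = pred
    return (strip(head), strip(tail), rtype) in stripped
```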
We compare our model with all the published span-based models for the joint extraction task that we are aware of. We report the comparison results in Tab. 3–Tab. 5, from which we can observe that our model consistently outperforms the strongest baselines in terms of F1-score across the three datasets.
To be more precise, on ACE05, our model achieves +0.4% and +3.2% absolute F1 gains on NER and RE, respectively, compared to Ji et al. [6], which achieves the previous best NER performance. In addition, compared to Zhong and Chen [7], which achieves the previous best RE performance, our model achieves +1.3% and +1.4% absolute F1 gains on NER and RE, respectively. On CoNLL04, our model achieves +0.3% and +1.6% absolute F1 gains on NER and RE, respectively, compared to the strongest baseline Ji et al. [6]. On SciERC, compared to Santosh et al. [35], which achieves the previous best NER performance, our model delivers +0.5% and +1.4% absolute F1 gains; compared to Zhang et al. [36], which achieves the previous best RE results, our model achieves +0.6% and +0.2% absolute F1 gains.
We attribute the above performance improvements to the fact that our model balances the grossly imbalanced data distributions and exploits effective global features.
4.4 Effectiveness Investigations
We conduct extensive effectiveness investigations across the three datasets and use SpERT [1] as the baseline. SpERT is the most similar model to ours: it uses two linear decoders for the entity and relation classifications and the BERT model as its backbone. However, SpERT ignores the global features and does not balance the imbalanced data distributions. For a fair comparison, our model employs the same negative sampling strategy as SpERT.
As illustrated in Tab. 6, we compare our model with the baseline in terms of the most imbalanced data distributions. We obtain the data distributions on NER and RE during model training by comparing the numbers of different types of entities and relations, i.e., the smallest number vs. the largest number. We have the following observations: (1) On ACE05, the most imbalanced data distributions of the baseline are 1:773.3 on NER and 1:150.0 on RE, whereas our model reduces the ratios to 1:21.3 and 1:13.8, respectively. (2) On CoNLL04, the most imbalanced data distributions of the baseline are 1:171.5 on NER and 1:56.4 on RE, whereas our model reduces the ratios to 1:23.7 and 1:9.9, respectively. (3) On SciERC, the most imbalanced data distributions of the baseline are 1:605.3 on NER and 1:913.5 on RE, whereas our model reduces the ratios to 1:25.5 and 1:35.6, respectively.
Based on the above observations, we conclude that the two-phase paradigm allows our model to avoid suffering from grossly imbalanced data distributions.
4.4.2 Effectiveness Against Entity Length
In general, as the entity lengths increase, it becomes increasingly difficult to recognize the entities. In this section, we conduct investigations on NER performance in relation to entity lengths. We divide all entity lengths, which are restricted by the span width threshold
4.4.3 Effectiveness Against Entity Distance
In general, as the distance between the two entities of a relation increases, the relation becomes more difficult to extract. In this section, we investigate RE performance in relation to entity distance. We divide all entity distances into five intervals, namely [0], [1–3], [4–6], [7–9], and [>=10]. We conduct the investigations on the dev sets of the three datasets and report the results in Fig. 4. The results demonstrate that our model beats the baseline across all distance intervals. Moreover, our model obtains greater improvements as the distance increases, demonstrating that it is more effective in the case of long entity distances.
4.5 Ablation Studies
We conduct ablation studies on the dev sets of the three datasets to analyze the effects of various model components. We report the ablation results in Tab. 7, where the “w/o Two-Phase” denotes ablating the two-phase paradigm; in this setting, our model can no longer counter the imbalanced data distributions, and it cannot make use of the binary global features, although it retains the multi-class global features. The “w/o Bi-Features” denotes ablating the binary global features, which is realized by removing
We have the following observations: (1) The two-phase paradigm consistently improves model performance across the three datasets, delivering +0.6% to +3.2% F1 gains on NER and +2.6% to +3.1% F1 gains on RE, which can be attributed to the paradigm's ability to prevent our model from being harmed by the grossly imbalanced data distributions. (2) Both the binary and the multi-class global features consistently benefit RE performance, and the multi-class features are generally more effective than the binary ones, as demonstrated on ACE05 and CoNLL04; the explanation could be that the multi-class features take fine-grained entity types into account. Additionally, both types of global features have a negligible effect on NER; a plausible explanation is that these features are derived from entity information and are only employed in relation extraction. (3) Combining the two types of global features results in further improved RE performance, suggesting that they benefit each other. (4) The proposed gated mechanism consistently improves model performance, bringing +0.2% to +1.5% F1 gains on NER and +0.5% to +0.9% on RE, suggesting that it can better fuse various semantic representations.
In this paper, we propose a two-phase span-based model for the joint entity and relation extraction task, aiming to tackle the grossly imbalanced data distributions caused by the essential negative sampling. We further augment the proposed model with global features obtained by combining entity types and entity distances, and we propose a gated mechanism for effectively fusing various semantic representations. Experimental results on several datasets demonstrate that our model consistently outperforms the strongest span-based models for the joint extraction task, establishing a new standard benchmark.
Funding Statement: This research was supported by the National Key Research and Development Program [2020YFB1006302].
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
1 Span-based models add a Not-Entity type for spans that are not entities and a Not-Relation type for relation tuples that do not hold relations.
2 1:24 ≈ (592+786+1370+1541):101555; 1:8 ≈ (229+312+325+347+421):12915.
3 The entity distribution is approximately 592:786:1370:1541, and the relation distribution is approximately 229:312:325:347:421, both of which are roughly even.
4 The [CLS] token is a special token added to the beginning of tokenized texts. The embedding of the [CLS] token is generally used for text classification.
References
1. M. Eberts and A. Ulges, “Span-based joint entity and relation extraction with transformer pre-training,” in Proc. ECAI, Santiago de Compostela, Spain, pp. 1–8, 2020. [Google Scholar]
2. Y. Luan, L. H. He, M. Ostendorf and H. Hajishirzi, “Multi-task identification of entities, relations, and coreference for scientific knowledge graph construction,” in Proc. EMNLP, Brussels, Belgium, pp. 3219–3232, 2018. [Google Scholar]
3. K. Dixit and Y. Al-Onaizan, “Span-level model for relation extraction,” in Proc. ACL, Florence, Italy, pp. 5308–5314, 2019. [Google Scholar]
4. Y. Luan, D. Wadden, L. H. He, A. Shah, M. Ostendorf et al., “A general framework for information extraction using dynamic span graphs,” in Proc. NAACL, Minneapolis, MN, USA, pp. 3036–3046, 2019. [Google Scholar]
5. D. Wadden, U. Wennberg, Y. Luan and H. Hajishirzi, “Entity, relation, and event extraction with contextualized span representations,” in Proc. EMNLP, Hong Kong, China, pp. 5783–5788, 2019. [Google Scholar]
6. B. Ji, J. Yu, S. S. Li, J. Ma, Q. B. Wu et al., “Span-based joint entity and relation extraction with attention-based span-specific and contextual semantic representations,” in Proc. COLING, Barcelona, Spain, pp. 88–99, 2020. [Google Scholar]
7. Z. X. Zhong and D. Q. Chen, “A frustratingly easy approach for entity and relation extraction,” in Proc. NAACL, Online, pp. 50–61, 2021. [Google Scholar]
8. M. Miwa and M. Bansal, “End-to-end relation extraction using LSTMs on sequences and tree structures,” in Proc. ACL, Berlin, Germany, pp. 1105–1116, 2016. [Google Scholar]
9. G. Bekoulis, J. Deleu, T. Demeester and C. Develder, “Joint entity recognition and relation extraction as a multi-head selection problem,” Expert Systems with Applications, vol. 114, pp. 34–45, 2018. [Google Scholar]
10. S. Zhao, M. H. Hu, Z. P. Cai and F. Liu, “Modeling dense cross-modal interactions for joint entity-relation extraction,” in Proc. IJCAI, Yokohama, Japan, pp. 4032–4038, 2020. [Google Scholar]
11. D. J. Zeng, K. Liu, S. W. Lai, G. Y. Zhou and J. Zhao, “Relation classification via convolutional deep neural network,” in Proc. COLING, Dublin, Ireland, pp. 2335–2344, 2014. [Google Scholar]
12. W. Ye, B. Li, R. Xie, Z. H. Sheng, L. Chen et al., “Exploiting entity bio tag embeddings and multi-task learning for relation extraction with imbalanced data,” in Proc. ACL, Florence, Italy, pp. 1351–1360, 2019. [Google Scholar]
13. K. Lee, L. H. He, M. Lewis and L. Zettlemoyer, “End-to-end neural coreference resolution,” in Proc. EMNLP, Copenhagen, Denmark, pp. 188–197, 2017. [Google Scholar]
14. L. H. He, K. Lee, O. Levy and L. Zettlemoyer, “Jointly predicting predicates and arguments in neural semantic role labeling,” in Proc. ACL, Melbourne, Australia, pp. 364–369, 2018. [Google Scholar]
15. A. P. B. Veyseh, M. V. Nguyen, N. N. Trung, B. Min and T. H. Nguyen, “Modeling document-level context for event detection via important context selection,” in Proc. EMNLP, Punta Cana, Dominican Republic, pp. 5403–5413, 2021. [Google Scholar]
16. R. D. Girolamo, C. Esposito, V. Moscato and G. Sperli, “Evolutionary game theoretical on-line event detection over tweet streams,” Knowledge-Based Systems, vol. 211, pp. 106563, 2021. [Google Scholar]
17. M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark et al., “Deep contextualized word representations,” in Proc. NAACL, New Orleans, Louisiana, pp. 2227–2237, 2018. [Google Scholar]
18. J. Devlin, M. W. Chang, K. Lee and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proc. NAACL, Minneapolis, Minnesota, pp. 4171–4186, 2019. [Google Scholar]
19. Z. Z. Lan, M. D. Chen, S. Goodman, K. Gimpel, P. Sharma et al., “ALBERT: A lite bert for self-supervised learning of language representations,” in Proc. ICLR, Addis Ababa, Ethiopia, pp. 1–8, 2019. [Google Scholar]
20. Z. Q. Geng, Y. H. Zhang and Y. M. Han, “Joint entity and relation extraction model based on rich semantics,” Neurocomputing, vol. 429, pp. 132–140, 2021. [Google Scholar]
21. Y. J. Wang, C. Z. Sun, Y. B. Wu, H. Zhou, L. Li et al., “ENPAR: Enhancing entity and entity pair representations for joint entity relation extraction,” in Proc. EACL, Online, pp. 2877–2887, 2021. [Google Scholar]
22. L. M. Hu, L. H. Zhang, C. Shi, L. Q. Nie, W. L. Guan et al., “Improving distantly-supervised relation extraction with joint label embedding,” in Proc. EMNLP, Hong Kong, China, pp. 3821–3829, 2019. [Google Scholar]
23. A. Katiyar and C. Cardie, “Going out on a limb: Joint extraction of entity mentions and relations without dependency trees,” in Proc. ACL, Vancouver, Canada, pp. 917–928, 2017. [Google Scholar]
24. C. Z. Sun, Y. B. Wu, M. Lan, S. L. Sun, W. T. Wang et al., “Extracting entities and relations with joint minimum risk training,” in Proc. EMNLP, Brussels, Belgium, pp. 2256–2265, 2018. [Google Scholar]
25. J. Wang and W. Lu, “Two are better than one: Joint entity and relation extraction with table-sequence encoders,” in Proc. EMNLP, Online, pp. 1706–1721, 2020. [Google Scholar]
26. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones et al., “Attention is all you need,” in Proc. NIPS, Long Beach, CA, USA, pp. 5998–6008, 2017. [Google Scholar]
27. K. Ding, S. S. Liu, Y. H. Zhang, H. Zhang, X. X. Zhang et al., “A knowledge-enriched and span-based network for joint entity and relation extraction,” Computers, Materials & Continua, vol. 68, no. 1, pp. 377–389, 2021. [Google Scholar]
28. H. Y. Sun and R. Grishman, “Lexicalized dependency paths based supervised learning for relation extraction,” Computer Systems Science & Engineering, vol. 43, no. 3, pp. 861–870, 2022. [Google Scholar]
29. B. Ji, S. S. Li, J. Yu, J. Ma, J. T. Tang et al., “Research on Chinese medical named entity recognition based on collaborative cooperation of multiple neural network models,” Journal of Biomedical Informatics, vol. 104, pp. 103395, 2020. [Google Scholar]
30. G. Doddington, A. Mitchell, M. Przybocki, L. Ramshaw, S. Strassel et al., “The automatic content extraction (ACE) program - Tasks, data, and evaluation,” in Proc. LREC, Lisbon, Portugal, pp. 837–840, 2004. [Google Scholar]
31. D. Roth and W. T. Yih, “A linear programming formulation for global inference in natural language tasks,” in Proc. NAACL, Boston, Massachusetts, USA, pp. 1–8, 2004. [Google Scholar]
32. Q. Li and H. Ji, “Incremental joint extraction of entity mentions and relations,” in Proc. ACL, Baltimore, MD, USA, pp. 402–412, 2014. [Google Scholar]
33. X. Y. Li, F. Yin, Z. J. Sun, X. Y. Li, A. Yuan et al., “Entity-relation extraction as multi-turn question answering,” in Proc. ACL, Florence, Italy, pp. 1340–1350, 2019. [Google Scholar]
34. Y. J. Wang, C. Z. Sun, Y. B. Wu, H. Zhou, L. Li et al., “UniRE: A unified label space for entity relation extraction,” in Proc. ACL, Online, pp. 220–231, 2021. [Google Scholar]
35. T. Y. S. S. Santosh, P. Chakraborty, S. Dutta, D. K. Sanyal and P. P. Das, “Joint entity and relation extraction from scientific documents: Role of linguistic information and entity types,” in Proc. EEKE, Online, pp. 15–19, 2021. [Google Scholar]
36. H. Y. Zhang, G. Q. Zhang and Y. Ma, “Syntax-informed self-attention network for span-based joint entity and relation extraction,” Applied Sciences, vol. 11, no. 4, Article 1480, pp. 1–16, 2021. [Google Scholar]
37. X. G. Wang, D. Wang and F. P. Ji, “A span-based model for joint entity and relation extraction with relational graphs,” in Proc. IBDCloud, Exeter, UK, pp. 513–520, 2020. [Google Scholar]
38. Y. T. Tang, J. Yu, S. S. Li, B. Ji, Y. S. Tan et al., “Span representation generation method in entity-relation joint extraction,” in Proc. ICTA, Xi’an, China, pp. 465–476, 2021. [Google Scholar]
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.