Open Access
ARTICLE
Ext-ICAS: A Novel Self-Normalized Extractive Intra Cosine Attention Similarity Summarization
1 Department of Information Technology, Thiagarajar College of Engineering, Madurai, India
2 Department of Data Science, Thiagarajar College of Engineering, Madurai, India
* Corresponding Author: P. Sharmila. Email:
Computer Systems Science and Engineering 2023, 45(1), 377-393. https://doi.org/10.32604/csse.2023.027481
Received 18 January 2022; Accepted 08 April 2022; Issue published 16 August 2022
Abstract
With the continuous growth of online news articles, an efficient abstractive summarization technique is needed to address the problem of information overload. Abstractive summarization is highly complex and requires deeper understanding and proper reasoning to produce its own summary outline. The abstractive summarization task is framed as seq2seq modeling. Existing seq2seq methods perform well on short sequences; however, for long sequences, performance degrades because of the high computational cost. Hence, a two-phase self-normalized deep neural document summarization model, consisting of an improvised extractive cosine-normalization phase and a seq2seq abstractive phase, is proposed in this paper. The novelty is to parallelize the sequence computation during training by incorporating a feed-forward, self-normalized neural network in the extractive phase using Intra Cosine Attention Similarity (Ext-ICAS) with sentence dependency position; no explicit normalization technique is required. The proposed abstractive Bidirectional Long Short Term Memory (Bi-LSTM) encoder sequence model performs better than the Bidirectional Gated Recurrent Unit (Bi-GRU) encoder, with minimum training loss and fast convergence. The proposed model was evaluated on the Cable News Network (CNN)/Daily Mail dataset; an average ROUGE-1 score of 0.435 was achieved, and the average number of similarity computations in the extractive phase was reduced by 59%.
1 Introduction
The contemporary web has seen a massive increase in text content, resulting in information overload [1]. Searching for informative data is a time-consuming process, and abstractive summarization is an effective solution for condensing these data. The primary research on text summarization began in the 1960s [2]. The summarization process can be chiefly categorized into extractive and abstractive [3] methods. Extractive summarization is a method of selecting key words, phrases, and sentences from the source articles. Abstractive summarization is the method of shortening the content into a short passage without distorting the meaning of the complete document.
Lately, deep learning has emerged as a powerful learning technique applied to a variety of Natural Language Processing tasks such as machine translation [4], speech recognition [5], question answering [6], text conversation, and summarization [7]. Subsequently, the abstractive summarization task has been framed as a sequence language modeling task [8] in a standard encoder-decoder framework. In this framework, two Recurrent Neural Networks (RNN) are trained: one at the encoder side to encode the input sequences, mapping them into fixed-size hidden-state vectors, and the second at the decoder, meant for decoding the output sequence from the hidden states.
Seq2seq with attention [9,10] was introduced to enhance the summarization task. The purpose of attention is to focus on key sentences by assigning attention weights to them. Though attention is flexible and highly parallelizable through matrix computation, it requires more memory. Hence, a novel method for computing the dependency position (DP) has been incorporated into intra-attention. In a given text, beyond a certain distance, denoted by D, there is no considerable dependency between sentences, and the value of this threshold plays a vital role in the analysis. By doing so, the number of matrix computations in self-attention is reduced, and the remaining computations can be carried out in parallel. These parallelizable intra-attentions make more efficient use of the Graphical Processing Unit (GPU).
So, a novel model has been proposed to overcome these problems by applying self-normalized Ext-ICAS in the extractive phase and then combining it with an abstractive sequence method that makes efficient use of parallel computation. The major contributions of this paper are:
• Development of a novel Ext-ICAS in extractive phase to extract relevant sentences with sentence dependency position (DP), which are then built on standard seq2seq encoder-decoder framework.
• Ext-ICAS enables parallel computation with self-normalized intra-cosine attention similarity.
• Dealing with sentence-level representation reduces the computational complexity in the extractive phase since intra-attention works with fixed-size vectors.
• Comparison of the relevant content and abstractness with the existing methods on the CNN/Daily Mail Dataset using Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores. Also, compared the training loss for the Bi-GRU encoder and Bi-LSTM encoder sequence model.
The rest of this article is organized as follows: Section 2 discusses the related works. The two-phase summarization model is briefly explained in Section 3. Dataset and experimental results are presented in Section 4. Whereas Section 5 gives a detailed description of observations and discussion, conclusions are drawn in Section 6.
2 Related Works
The traditional summarization method is an extractive graph-based ranking algorithm such as LEXRANK [11,12]. These algorithms are limited in capturing syntactic and semantic structure. Therefore, to maintain semantic relations, prediction-based neural network models such as Word2Vec [13], FastText [14], and GloVe were proposed for word embedding, and the Doc2vec [15] distributed vector representation model was proposed for sentence embedding. Bidirectional Encoder Representations from Transformers (BERT) [16–18] are pre-trained language models used in sequence modeling. However, these have limits for extended input sequences and can only train with a maximum of 512 tokens.
Extractive methods are more informative, but they lack cohesion [19]. So [20,21] proposed a Convolution Sequence to Sequence (ConvS2S) network with a Convolution Neural Network (CNN) as an encoder for large vocabularies, but it is limited in modeling sequence order and requires additional layers. Later, abstractive summarization models used an RNN encoder and decoder for sequence generation tasks. A comparative study between RNN and CNN was presented in [22]; the authors concluded that RNN is better for long-sequence prediction.
Bi-LSTM [23] was used as an encoder and showed better performance in capturing information than a unidirectional gated LSTM [24]. The seq2seq model forces compression of the input sequences, leading to a loss of information. To avoid such losses, [9] proposed the use of different attention mechanisms such as Dot-Product, Scaled Dot-Product, and Additive attention at the decoder to extract more informative sentences.
Self- or intra-attention [25] was introduced to focus within the concerned sentences for extraction rather than over the whole input, which reduces the number of computations, enables parallel computing, and results in better utilization of the GPU.
2.1 Comparison of Seq2Seq Models in Text Summarization
RNNs perform well for sequence data with less expensive and faster training, but they suffer from a short-term memory problem. Semantic representation of large documents using LSTM and GRU is relatively complex in terms of the number of parameters and computations, memory consumption, and training time. GRU requires fewer parameters than LSTM and is therefore faster in computation, but LSTM provides better input representation and more accurate results, and a bidirectional network learns features from both past and future inputs for better representation and faster convergence; an illustrative parameter count is sketched below.
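To make the parameter comparison concrete, the following sketch counts the trainable parameters of single LSTM and GRU layers using the standard gate formulations. The input and hidden dimensions (128 and 256) simply reuse the embedding and hidden sizes reported later in Section 4; the snippet is illustrative and not the authors' code.

```python
def lstm_params(input_dim, hidden_dim):
    # 4 gates (input, forget, cell, output), each with input weights W,
    # recurrent weights U and a bias vector
    return 4 * (hidden_dim * (input_dim + hidden_dim) + hidden_dim)

def gru_params(input_dim, hidden_dim):
    # 3 gates (update, reset, candidate)
    return 3 * (hidden_dim * (input_dim + hidden_dim) + hidden_dim)

print(lstm_params(128, 256))  # 394240 trainable parameters
print(gru_params(128, 256))   # 295680 -> GRU needs about 25% fewer parameters
```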
An overview of different sequence models has been presented in Tab. 1.
Even though recent advances in deep seq2seq language modeling techniques have achieved a high level of semantic representation and sequence generation, a meticulous review of the literature identified some research gaps.
A deep neural network performs better on a large dataset, but the aforementioned seq2seq model with a Bi-GRU encoder [29] requires a week to train. Hence, high computational complexity is the major challenge in training on sequential inputs. Another evident research gap is the need to preserve all the relevant information in the target summary without redundancies.
Reducing computational complexity and extracting relevant information are the two major bottlenecks that have been addressed in our proposed work. By combining the advantages of extractive and abstractive models in the form of a two-phase deep neural document summarization technique, sentence extraction has been accomplished using a self-normalized parallel process, and summary generation has been carried out through a seq2seq network.
3 Proposed Two-Phase Summarization Model
For summarizing large documents, a two-phase deep neural intra-attention abstractive summarization model is proposed. Fig. 1 shows the workflow of the proposed model. The proposed model comprises two phases, namely extractive and abstractive phases.
3.1 Extractive Phase
The objective of the proposed extractive phase is to extract more informative and relevant sentences to build an extractive summary using the self-normalized Ext-ICAS method. Self-normalized cosine attention with dependency position is the novelty proposed in the extractive phase for parallel computation. Intra-attention is a feed-forward neural network that enables parallel processing and hence makes use of the GPU efficiently. Compared with RNN and gated RNN, it converges quickly with a minimum number of parameters. Fig. 2 explains the steps and layers of the extractive phase.
3.1.1 Self-Normalized Cosine Attention
The cosine computation is itself self-normalized: its output lies in the range 0 to 1, so there is no need for an explicit normalization or activation function, as shown in Fig. 3. SiV and Si+1V are two sentence vectors from the input document X, between which the intra cosine self-normalized similarity is computed.
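As a point of reference, a minimal formulation of this similarity, assuming the standard cosine form and the window defined by the dependency position DP, is:

$$\mathrm{ICAS}\big(S_i^V, S_j^V\big) \;=\; \frac{S_i^V \cdot S_j^V}{\lVert S_i^V\rVert\,\lVert S_j^V\rVert}, \qquad 0 < |i - j| \le DP .$$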
The extractive phase proceeds through the following steps:
a) Input layer (Preprocessing): The input documents are split into sentences. The input document X = {S1, S2, S3…Sn}.
b) Embedding layer (Learning Intermediate Representation): The sentences are embedded into sentence vectors using doc2vec, which is effective in capturing semantic structure in the learning vector representation. X = {S1V, S2V, S3V …SnV}
c) Attention layer (Computing Self-Normalized Intra-Cosine Attention Similarity): The self-normalized intra-cosine attention similarity score is computed between each pair of sentence vectors. Intra-attention at the word level requires more matrix computations, so in this model sentence-level attention has been employed to distill key sentences and reduce the data size, thereby resulting in minimum computations.
where Si and Sj are fixed-size vector embeddings obtained using doc2vec, n is the number of sentences in the document, and DP is the sentence dependency position. When i = j, the two vectors correspond to the same sentence, so the similarity is at its highest level (that is, 1), leading to duplication; such cases therefore need to be eliminated.
d) Similarities between the sentences are computed using the ICAS method with sentence dependency position DP.
e) Max-Out Layer: the max-out function h(X) retains the maximum attention score for each sentence.
f) Sentence Selection or Output layer g(X): Sentences with high similarity are extracted in the extractive phase; g(X) is the concatenation of the sentences with the maximum attention scores. A code sketch of these steps follows this list.
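The following Python sketch illustrates steps c) through f) under stated assumptions: the cosine form of the similarity, a NumPy in-memory implementation rather than the authors' feed-forward attention layer, and a hypothetical top_k parameter standing in for the selection criterion (the paper states that no manual threshold is required).

```python
import numpy as np

def ext_icas(sentence_vectors, dp=2):
    """Sketch of the Ext-ICAS attention layer: for every sentence, compute
    self-normalized cosine similarity only with neighbours inside the
    dependency-position window DP, then take the max-out score per sentence."""
    n = len(sentence_vectors)
    scores = np.zeros((n, n))
    for i in range(n):
        for j in range(max(0, i - dp), min(n, i + dp + 1)):
            if i == j:                         # skip self-comparison (similarity = 1)
                continue
            a, b = sentence_vectors[i], sentence_vectors[j]
            scores[i, j] = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return scores.max(axis=1)                  # max-out h(X): one score per sentence

def select_sentences(sentences, sentence_vectors, dp=2, top_k=6):
    """g(X): keep the sentences with the highest max-out scores,
    preserving their original order (top_k is illustrative only)."""
    h = ext_icas(sentence_vectors, dp)
    keep = sorted(np.argsort(h)[::-1][:top_k])
    return [sentences[i] for i in keep]
```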
As a worked example, consider the following steps. a) First, the input news article is split into 12 sentences, S1 to S12, as shown in Fig. 4.
b) Learning Intermediate Representation: The sentences are converted into vectors using gensim doc2vec, as shown in Fig. 5 (a gensim sketch is given after this list).
c) Computing the self-normalized intra-cosine attention similarity with dependency position DP = 2, as shown in Fig. 6: self-normalized attention scores are computed for sentence S1 with its neighbouring sentences S0, S2, and S3; that is, ICAS scores for S1 are calculated with the previous and the next sentences within the DP window.
d) Self-Normalized Attention Score: Tab. 2 shows the self-normalized attention scores (ICAS) with all sentences in the document. The similarity of a sentence with itself is 1, leading to duplication; therefore, such comparisons are eliminated to minimize the computations, as highlighted in Tab. 2.
e) The max-out function h(X) is used to extract the sentences with the maximum scores, and then the function g(X) concatenates all the selected sentences into the extractive summary. No threshold value needs to be given manually to this function; also, [29] shows that the max-out function works better than ReLU. Sentences S2, S3, S4, S5, S7, and S9 are extracted from the news article as the extractive summary shown in Fig. 7, and the comparison of our proposed Ext-ICAS extractive model with sentence dependency attention is shown in Fig. 8.
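For step b) above, a minimal sentence-embedding sketch with gensim doc2vec might look as follows; the hyperparameters (vector_size, window, min_count, epochs) are illustrative assumptions and not values reported in the paper.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from gensim.utils import simple_preprocess

# sentences: the 12 sentences S1..S12 of the example article (placeholder here)
sentences = ["..."]

tagged = [TaggedDocument(words=simple_preprocess(s), tags=[str(i)])
          for i, s in enumerate(sentences)]

# vector_size, window, min_count and epochs are illustrative settings
model = Doc2Vec(vector_size=128, window=5, min_count=1, epochs=40)
model.build_vocab(tagged)
model.train(tagged, total_examples=model.corpus_count, epochs=model.epochs)

# one fixed-size vector per sentence, used later by the ICAS attention layer
sentence_vectors = [model.infer_vector(simple_preprocess(s)) for s in sentences]
```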
3.2 Abstractive Phase
Sentences from the extractive phase are fed into the seq2seq abstractive phase to generate a quality summary; Fig. 9 illustrates the abstractive-phase encoder-decoder model with attention. The extracted sentences from the Ext-ICAS model are embedded into vectors by a Bi-LSTM encoder, attention weights are computed using the alignment and context vectors, and an LSTM decoder generates the summary sequence.
Following are the steps in the abstractive phase:
1. Encoder: With Bi-LSTM as the encoder, all sentences are encoded into a hidden vector representation.
2. Attention: Focuses on all the hidden vectors to select the keywords.
3. Decoder: Using LSTM beam-search, the decoder generates a summary sequence from the extracted words.
Bi-LSTM improves on LSTM with a reduced convergence cycle, since it processes the input in both the forward and backward directions; thus, Bi-LSTM has been chosen as the encoder. Eqs. (7)–(9) provide the formulae for the forward, backward, and concatenated LSTM hidden states, respectively.
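Since the equations are not rendered here, the standard Bi-LSTM formulation they describe is sketched below, with notation assumed ($x_t$ is the input at step $t$, and $\overrightarrow{h_t}$, $\overleftarrow{h_t}$ are the forward and backward hidden states):

$$\overrightarrow{h_t} = \overrightarrow{\mathrm{LSTM}}\big(x_t,\ \overrightarrow{h_{t-1}}\big), \qquad
\overleftarrow{h_t} = \overleftarrow{\mathrm{LSTM}}\big(x_t,\ \overleftarrow{h_{t+1}}\big), \qquad
h_t = \big[\,\overrightarrow{h_t}\,;\ \overleftarrow{h_t}\,\big].$$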
The attention mechanism focuses on the relevant words of the whole input sequence by assigning each word a weighted score. These weights are then aggregated with the hidden vectors of both the encoder and the decoder, so that the key concepts are represented in the form of a feature vector; this is computed using Eq. (10), where αi is the alignment vector from the encoder. The alignment vector for each word is then obtained using Eq. (11), and from it the words that meet a certain threshold are selected in the form of a context vector using Eq. (12).
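Eqs. (10)–(12) are likewise not rendered above; a standard attention formulation consistent with this description, with assumed symbols ($h_i$ the encoder hidden state, $s_{t-1}$ the previous decoder state, $\alpha_{t,i}$ the alignment weights, and $c_t$ the context vector), is:

$$e_{t,i} = \mathrm{score}\big(s_{t-1},\, h_i\big), \qquad
\alpha_{t,i} = \frac{\exp(e_{t,i})}{\sum_{k}\exp(e_{t,k})}, \qquad
c_t = \sum_{i}\alpha_{t,i}\, h_i .$$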
In the decoder, the output summary is predicted as a sequence of words using a searching algorithm. The feature vector extracted by the attention layer is fed as input to the decoder, which generates the summary by maximizing the conditional probability of the output sequence with a beam size of 2. As the beam size increases beyond 2, the results become more optimized but consume more memory; since a beam size of 2 results in a reduced search space, it has been chosen.
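A generic beam-search sketch of this decoding step is shown below; step_fn, the start/end tokens, and the maximum length are hypothetical interfaces standing in for the trained LSTM decoder, which is not reproduced here.

```python
def beam_search(step_fn, start_token, end_token, beam_size=2, max_len=100):
    """Generic beam search: step_fn(prefix) returns a list of
    (next_token, log_probability) pairs for the given prefix."""
    beams = [([start_token], 0.0)]        # (token sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end_token:      # completed summary: set it aside
                finished.append((seq, score))
                continue
            for tok, logp in step_fn(seq):
                candidates.append((seq + [tok], score + logp))
        if not candidates:
            break
        # keep only the top `beam_size` partial summaries
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    finished.extend(beams)
    return max(finished, key=lambda c: c[1])[0]
```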
As a post-processing step, the proposed model incorporates the pointer mechanism [30] and the copy mechanism [31] to avoid the Out-of-Vocabulary (OoV) word problem.
4 Dataset and Experimental Results
To evaluate the proposed model, the benchmark CNN/Daily Mail dataset, adopted by most of the existing methods, is used. The dataset was initially framed for question answering [32] and later extended to the summarization task as well. It contains over 300 K unique news articles, each with an average of 800 words and a highlighted summary of 3 to 4 lines. The training/validation/testing split is 287,113/13,368/11,490. Link: https://www.kaggle.com/gowrishankarp/newspaper-text-summarization-cnn-dailymail.
In the extractive phase, the preprocessed sentence embeddings are implemented using gensim doc2vec, and a single-layer feed-forward neural network is built as the intra-attention layer. Then, the intra-cosine similarity score is computed with DP = 2. No explicit normalization is implemented, since the model itself is self-normalized.
The abstractive model has 256-dimensional hidden states and 128-dimensional word embeddings. A deep neural network works with a fixed-size vocabulary to generate text sequences, so the vocabulary size is fixed at 50 K (the top frequent words) in this model, which enables the framework to preserve relevant information. As can be seen from Fig. 10, for a vocabulary of up to 50 K words the training loss is at a minimal level; beyond that, there is a spurt in training loss. Hence, the vocabulary size has been fixed at 50 K words.
Seq2seq framework has been modeled as a two-layered Bi-LSTM encoder with attention [4] and LSTM decoder with beam size as 2. The learning rate has been maintained as 0.15.
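Collecting the reported settings in one place, the following illustrative configuration mirrors the values stated above; the dictionary layout and key names are assumptions, not the authors' actual code or framework.

```python
# Illustrative hyperparameter summary; values are taken from the text above.
config = {
    "encoder": "Bi-LSTM",       # two-layer bidirectional encoder with attention
    "decoder": "LSTM",          # beam-search decoding
    "enc_layers": 2,
    "hidden_dim": 256,          # hidden-state dimensionality
    "embed_dim": 128,           # word-embedding dimensionality
    "vocab_size": 50_000,       # top-50K most frequent words
    "beam_size": 2,
    "learning_rate": 0.15,
    "dependency_position": 2,   # DP used in the extractive phase
}
```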
The quality of a summary can be evaluated by human beings based on its saliency, fluency, redundancy, and grammar; however, this evaluation varies from one human annotator to another. So, it was decided to evaluate the summaries using ROUGE [33] scores.
ROUGE is a package for measuring the quality of a summary by counting overlapping words or sentence similarities. ROUGE-N for n-gram overlap and ROUGE-L for the longest common subsequence are computed using Eq. (13),
where ref is the reference summary, Count_match(n-gram) is the number of n-gram matches between the reference summary and the model-generated summary, and Count(n-gram) is the number of n-grams in the reference summary.
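Since Eq. (13) is not rendered above, the standard ROUGE-N recall formula consistent with these definitions is:

$$\mathrm{ROUGE\text{-}N} \;=\; \frac{\sum_{S \in \mathrm{ref}} \sum_{\mathrm{gram}_n \in S} \mathrm{Count_{match}}(\mathrm{gram}_n)}{\sum_{S \in \mathrm{ref}} \sum_{\mathrm{gram}_n \in S} \mathrm{Count}(\mathrm{gram}_n)} .$$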
5 Observations and Discussion
This section presents a comparative study of the proposed Ext-ICAS summarization model with the existing extractive and abstractive summarization models. At the outset, it is observed that the Ext-ICAS extractive-phase summary extracts semantically related sentences with a minimum number of similarity computations. Compared with an RNN, the feed-forward layers have fewer parameters and also support parallel computation, so the time complexity is reduced.
For example, let us consider a news article with N sentences; its similarity computation without DP is N*N, whereas, when DP is introduced, the same gets reduced to N*(DP*2), where N is the number of sentences in the document, DP is the sentence dependency position and the numeric value of 2 signifies the fact that the previous and next sentences are taken into consideration.
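The reduction can be reproduced in a few lines; the function name below is hypothetical and simply encodes the two counting formulae from the text.

```python
def similarity_count(n_sentences, dp=None):
    """Number of pairwise similarity computations for one document."""
    if dp is None:
        return n_sentences * n_sentences   # full pairwise comparison: N * N
    return n_sentences * (2 * dp)          # only DP neighbours per side: N * (DP * 2)

print(similarity_count(12))          # 144 without dependency position
print(similarity_count(12, dp=2))    # 48 with DP = 2
```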
Let a randomly-selected news article with 12 sentences from the chosen dataset be taken for visualization purposes; the similarity score between the sentence vectors is computed as shown in Fig. 11. Fig. 11 a) shows the similarity score using Ext-ICAS model without dependency position and Fig. 11 b) shows the similarity score using Ext-ICAS model with sentence dependency position (DP = 2). The diagonal elements always take a value of 1, which means that they represent either duplicates or highly related sentences and so the diagonal values are not considered for further computation. After a certain distance [25], the dependencies between the sentences are meaningless. So, the parameter DP is introduced to reduce the number of computations.
By tuning the value of DP, the ROUGE scores are calculated. From Tab. 3, it is observed that the R-1 score remains constant for DP ≥ 3, with only a small variation between DP = 2 and DP = 3; so, in the proposed model, DP is fixed as 2. With the number of sentences N = 12, the similarity computation without DP is 144 (i.e., 12 × 12), whereas with DP = 2 it is 48 (i.e., 12 × 2 × 2). Thus about two-thirds of the similarity computations in this example are avoided, and across the dataset the computations are reduced by 59% on average.
5.1 Impact on Enriched Vocabulary
The relevant, informative sentences extracted by the proposed Ext-ICAS model enrich the vocabulary with the most informative and relevant keywords within the fixed-size vocabulary (top 50 K words) without any repetition. The summary generated from the extracted sentences avoids redundancies; hence, Ext-ICAS combined with the seq2seq abstractive phase has enhanced the performance.
5.2 Comparison with Existing Model
Time consumption is the biggest challenge with continuously growing data, and the reduced input data improved training performance. From Fig. 12, it can be observed that the proposed Ext-ICAS + PGN model (Bi-GRU encoder) converged at around 140 K iterations and Ext-ICAS + ABS (Bi-LSTM encoder) converged at around 160 K iterations with minimum training loss, whereas the baseline PGN model with a Bi-GRU encoder converges only after 200 K iterations with many loss fluctuations, which slows down the training process.
The Ext-ICAS extractive-phase summary extracts semantically related sentences and achieves better ROUGE scores than the existing models, and in combination with the abstractive phase it also performs better on the ROUGE scores. In Tab. 4, the ROUGE scores R-1, R-2, and R-L are compared with the existing extractive summarization models: LEAD-N, LEXRANK, HSSAS (Hierarchical Structured Self-Attentive model for extractive document Summarization), and Pointer-Generator Networks (PGN). Tab. 5 presents the comparative study of the R-1, R-2, and R-L ROUGE scores with the existing abstractive summarization models (ABS).
The proposed model yields a high R-1 score, thereby retaining more of the relevant informative keywords than the other baseline models. R-2 also attains more or less comparable results by maintaining the co-occurrence of bigrams. The extractive model achieves a high R-L score by keeping the extracted sentences intact; in the abstractive phase, however, the R-L score is lower than in the extractive phase because of the rephrasing of sentences during abstractive summarization.
A graphical representation of the ROUGE scores for both models is shown in Fig. 13; the R-1 and R-2 scores are comparatively higher in the extractive phase than in the abstractive phase combined with the Ext-ICAS summarization model.
Sample summaries generated by the existing extractive and abstractive models are compared with those of the proposed Ext-ICAS extractive model, and of its combination with the abstractive model, in Fig. 14, alongside the reference summary.
6 Conclusion and Suggestions for Future Work
In this work, the Ext-ICAS self-normalized deep neural document summarization model for relevant, informative summarization has been proposed, and its effectiveness in both extractive and abstractive summarization has been evaluated on the CNN/Daily Mail dataset. The proposed model, incorporating the sentence dependency position, performed better in capturing semantic relationships within the documents. The experimental results have shown a significant improvement in performance with minimum training time and relevant abstractive sequence extraction; concerning readability and relevance, the generated summaries are found to convey complete meaning.
Upon evaluation, it was observed that the proposed model outperforms the baseline model in terms of ROUGE scores and also in training cost, with the average number of similarity computations in the extractive phase reduced by 59%; the model achieved ROUGE scores of 0.435, 0.176, and 0.348 for R-1, R-2, and R-L, respectively.
There is always a tradeoff between beam size and memory consumption, which can be addressed in future studies to generate abstractive summaries. Further, this work can be extended to multi-document summarization and query-based summarization, and the proposed Ext-ICAS model could later also be applied in the abstractive phase. It is also observed that, in the absence of a reference summary, the ROUGE score may not be adequate to evaluate summary quality; in such situations, further research is needed into alternative evaluation metrics.
Funding Statement: The authors received no specific funding for this study.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
References
1. M. Yousefi-Azar and L. Hamey, “Text summarization using unsupervised deep learning,” Expert Systems with Applications, vol. 68, no. 1, pp. 93–105, 2017.
2. M. F. Mridha, A. A. Lima, K. Nur, S. C. Das, M. Hasan et al., “A survey of automatic text summarization: Progress, process and challenges,” IEEE Access, vol. 9, pp. 156043–156070, 2021.
3. J. Pilault, R. Li, S. Subramanian and C. Pal, “On extractive and abstractive neural document summarization with transformer language models,” in Proc. Conf. on Empirical Methods in Natural Language Processing, pp. 9308–9319, 2020.
4. J. Su, J. Zeng, D. Xiong, Y. Liu, M. Wang et al., “A hierarchy-to-sequence attentional neural machine translation model,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 3, pp. 623–632, 2018.
5. Z. Song, “English speech recognition based on deep learning with multiple features,” Computing, vol. 102, no. 3, pp. 663–682, 2020.
6. L. Q. Cai, M. Wei, S. T. Zhou and X. Yan, “Intelligent question answering in restricted domains using deep learning and question pair matching,” IEEE Access, vol. 8, pp. 32922–32934, 2020.
7. J. Torres, C. Vaca, L. Terán and C. L. Abad, “Seq2seq models for recommending short text conversations,” Expert Systems with Applications, vol. 150, no. 8, pp. 113270, 2020.
8. Z. Liang, J. Du and C. Li, “Abstractive social media text summarization using selective reinforced Seq2Seq attention model,” Neurocomputing, vol. 410, pp. 432–440, 2020.
9. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones et al., “Attention is all you need,” in Proc. of Advances in Neural Information Processing Systems, vol. 2, pp. 5999–6009, 2017.
10. C. Chootong, T. K. Shih, A. Ochirbat, W. Sommool and Y. Y. Zhuang, “An attention enhanced sentence feature network for subtitle extraction and summarization,” Expert Systems with Applications, vol. 178, pp. 114946, 2021.
11. G. Erkan and D. R. Radev, “LexRank: Graph-based lexical centrality as salience in text summarization,” Journal of Artificial Intelligence Research, vol. 22, pp. 457–479, 2004.
12. W. S. El-Kassas, C. R. Salama, A. A. Rafea and H. K. Mohamed, “EdgeSumm: Graph-based framework for automatic text summarization,” Information Processing and Management, vol. 57, no. 6, pp. 102264, 2020.
13. T. Mikolov, K. Chen, G. Corrado and J. Dean, “Efficient estimation of word representations in vector space,” arXiv:1301.3781, 2013.
14. Z. Lin, L. Wang, X. Cui and Y. Gu, “Fast sentiment analysis algorithm based on double model fusion,” Computer Systems Science and Engineering, vol. 36, no. 1, pp. 175–188, 2021.
15. Q. Le and T. Mikolov, “Distributed representations of sentences and documents,” in Proc. of Int. Conf. on Machine Learning, pp. 1188–1196, 2014.
16. M. Moradi, G. Dorffner and M. Samwald, “Deep contextualized embeddings for quantifying the informative content in biomedical text summarization,” Computer Methods and Programs in Biomedicine, vol. 184, pp. 105117, 2020.
17. J. Devlin, M. W. Chang, K. Lee and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proc. of Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 1, pp. 4171–4186, 2019.
18. J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim et al., “BioBERT: A pre-trained biomedical language representation model for biomedical text mining,” Bioinformatics, vol. 36, no. 4, pp. 1234–1240, 2020.
19. K. Nandhini and S. R. Balasundaram, “Improving readability through extractive summarization for learners with reading difficulties,” Egyptian Informatics Journal, vol. 14, no. 3, pp. 195–204, 2013.
20. J. P. A. Vieira and R. S. Moura, “An analysis of convolutional neural networks for sentence classification,” in Proc. of XLIII Latin American Computer Conf., pp. 1–5, 2017.
21. H. Kameoka, K. Tanaka, D. Kwaśny, T. Kaneko and N. Hojo, “ConvS2S-VC: Fully convolutional sequence-to-sequence voice conversion,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 1849–1863, 2020.
22. R. Paulus, C. Xiong and R. Socher, “A deep reinforced model for abstractive summarization,” in Proc. of 6th Int. Conf. on Learning Representations, pp. 1–12, 2018.
23. H. Palangi, L. Deng, Y. Shen, J. Gao, X. He et al., “Deep sentence embedding using long short-term memory networks: Analysis and application to information retrieval,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 4, pp. 694–707, 2016.
24. X. Duan, S. Ying, W. Yuan, H. Cheng and X. Yin, “A generative adversarial networks for log anomaly detection,” Computer Systems Science and Engineering, vol. 37, no. 1, pp. 135–148, 2021.
25. P. Shaw, J. Uszkoreit and A. Vaswani, “Self-attention with relative position representations,” in Proc. of Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, Louisiana, vol. 2, pp. 464–468, 2018.
26. L. Wang, P. Yao, Y. Tao, Z. Li, W. Liu et al., “A reinforced topic-aware convolutional sequence-to-sequence model for abstractive text summarization,” in Proc. of the 27th Int. Joint Conf. on Artificial Intelligence, AAAI Press, pp. 4453–4460, 2018.
27. J. Xie, B. Chen, X. Gu, F. Liang and X. Xu, “Self-attention-based BiLSTM model for short text fine-grained sentiment classification,” IEEE Access, vol. 7, pp. 180558–180570, 2019.
28. S. Lamsiyah, A. El Mahdaouy, B. Espinasse and S. E. A. Ouatik, “An unsupervised method for extractive multi-document summarization based on centroid approach and sentence embeddings,” Expert Systems with Applications, vol. 167, no. 4, pp. 114152, 2021.
29. G. Castaneda, P. Morris and T. M. Khoshgoftaar, “Evaluation of maxout activations in deep learning across several big data domains,” Journal of Big Data, vol. 6, no. 1, pp. 72, 2019.
30. A. See, P. J. Liu and C. D. Manning, “Get to the point: Summarization with pointer-generator networks,” arXiv:1704.04368, 2017.
31. J. Gu, Z. Lu, H. Li and V. O. K. Li, “Incorporating copying mechanism in sequence-to-sequence learning,” in Proc. of the 54th Annual Meeting of the Association for Computational Linguistics (Long Papers), vol. 3, pp. 1631–1640, 2016.
32. T. Baumel, M. Eyal and M. Elhadad, “Query focused abstractive summarization: Incorporating query relevance, multi-document coverage, and summary length constraints into seq2seq models,” arXiv:1801.07704, 2018.
33. C. Y. Lin, “Rouge: A package for automatic evaluation of summaries,” in Proc. of Association for Computational Linguistics, Barcelona, Spain, vol. 8, pp. 74–81, 2004.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.