Open Access

ARTICLE

Topic Models to Analyze Disaster-Related Newspaper Articles: Focusing on COVID-19

Yun-Jung Choi1, Youn-Joo Um2,*

1 Red Cross College of Nursing, Chung-Ang University, Seoul, 06974, South Korea
2 Department of Nursing, Dong-Yang University, Yeongju, Gyeongbuk, 36040, South Korea

* Corresponding Author: Youn-Joo Um.

International Journal of Mental Health Promotion 2023, 25(3), 421-431. https://doi.org/10.32604/ijmhp.2023.023255

Abstract

Major media outlets have run many articles on the COVID-19 pandemic. Because such reports have cognitive and emotional effects on the public, we analyzed and reviewed the topics of news reports. We searched for newspaper articles containing the term ‘COVID-19’ in four Korean daily newspapers from January 20, 2020, when the first patient in Korea was found, to June 15, 2020. Topic modeling analysis was conducted through text mining using R. Five themes were found: “Changes in people’s everyday life,” “Socio-economic shock,” “Trends in infection,” “Role of the government and business,” and “Increased psychological anxiety.” All five showed a sharp increase in articles from mid-February to early March, followed by a decrease. Despite the increased psychological anxiety people suffered during the COVID-19 pandemic, this theme had the fewest articles. “Changes in people’s everyday life” had the most, indicating that attention was focused on attention-grabbing lifestyle articles of general interest. Since the COVID-19 pandemic can lead to mental health problems due to severe changes and isolation in everyday life, news coverage should respond comprehensively, focusing on the impact on the mental health of the population around the world.

1  Introduction

Since the first reported COVID-19 case on December 1, 2019 [1], numerous confirmed cases and deaths have occurred, and the disease has spread rapidly to 185 countries [2]. During this pandemic, which is unprecedented in the 21st century, major media outlets have published many newspaper articles on COVID-19, attracting great public attention. The COVID-19 pandemic has had severe psychological effects on the public [3]. Studies addressing the psychological effects of COVID-19 have shown that the initial outbreak caused severe distress associated with anxiety, depression, and post-traumatic stress symptoms [4]. In addition, “psychological infection prevention” has become no less important than “physical infection prevention,” as it has been reported that psychological well-being, anxiety, and depression scores were lower than before COVID-19 and did not improve between the early stages of COVID-19 and four weeks later [5]. Public awareness is reported to be formed through mainstream media such as daily newspapers [6], and media information that spread rapidly during the epidemic period has greatly influenced the formation of public awareness [7]. Managing epidemics such as COVID-19 requires a positive approach to mental health that can give direction to measures for overcoming disaster situations [8].

Therefore, it is necessary to determine which topics COVID-19-related articles cover in daily newspapers, the news source to which the public is most exposed and which can thus exert the strongest influence through COVID-19-related information. Information about what piques people’s curiosity is quickly disseminated through major news media, thereby enhancing awareness of dangerous situations and playing a large role in preventing the spread of infectious diseases [9]. However, even when the impact of a disaster is serious, news coverage may fail to improve public welfare and health overall if lurid and extreme content is distributed, or if readers encounter only simplistic instructional pieces or biased articles [10]. Therefore, this study aims to analyze the topics of news articles related to COVID-19 and review what related measures need to be implemented to promote mental health.

Earlier research on COVID-19 has primarily included studies of the clinical [1,11] and epidemiological characteristics of the COVID-19 virus [12,13], and of measures to control infectious diseases [14]. Regarding its treatment in the media, one study analyzed retweeted news articles related to research on COVID-19 [15], another investigated the risk of frequent exposure to COVID-19 news [16], and a third conducted a media frame analysis by region of news reports on COVID-19 [17]. However, few studies have focused on the topics of news reports related to COVID-19 and the improvement of mental health in Korea. To fill this gap, this study investigates COVID-19 topics in the Korean context using structural topic modeling, thereby exploring ways to promote mental health. Ultimately, by examining how COVID-19-related disasters are recognized through news reporting, we intend to grasp the level of psychological support the news provides and review intervention measures for mental health promotion. Through this, it is expected that global psychological disaster response capacity will be improved and disaster resilience promoted through comprehensive disaster news coverage focusing on the impact on the mental health of the population.

2  Methods

2.1 Data Collection

This study utilized text mining to analyze vast amounts of unstructured data. The Korea Press Foundation developed and released the BigKinds analysis system, which has accumulated news articles from 42 media organizations on a daily basis since 1990 and processed them with natural language processing. Using this system, researchers can individually pre-process and format unstructured newspaper article data with minimal effort. News articles in four Korean general daily newspapers were collected from January 20, 2020, when the first patient in Korea was found, to June 15, 2020, by searching for the term ‘Corona’ in BigKinds [18]. To prevent the same article from being collected and analyzed more than once, duplicate articles were removed.

Fig. 1 presents the distribution of articles related to COVID-19 in four comprehensive daily newspapers over time. At the beginning of February, the number of newspaper articles was the highest, and it tended to decrease gradually thereafter. A more detailed analysis shows that the number of newspaper articles was not high from around January 20 to January 26, when the first patient was confirmed. Then on January 27, the fourth confirmed patient appeared, and the Ministry of Health and Welfare upgraded the crisis alert level from “Caution” to “Warning.” In addition, the number of newspaper articles increased sharply when the Novel COVID-19 Virus Infection Control Headquarters under the Ministry of Health and Welfare began operations. By mid-February, the daily number of confirmed cases had slowed to between zero and two, showing stability, and the number of newspaper articles decreased again. Starting with an outbreak of infection in a specific area on February 19, the group infection caused the number of news articles to increase rapidly. Since then, the number of newspaper articles has been declining overall, with intermittent rises and falls.

Figure 1: Number of articles related to COVID-19 in four general daily newspapers

2.2 Data Analysis

This study followed the text mining workflow of text data collection, text preprocessing, keyword analysis, and topic modeling. To give structure to the unstructured text data, R version 4.0 and RmecabKo, a Korean preprocessing package, were used.

2.2.1 Text Preprocessing

Text analysis of newspaper articles requires that the words first be preprocessed. Preprocessing for word refinement was carried out as follows. First, the articles were combined into a single dataset, and duplicate articles were removed. Second, to extract nouns, the text was tokenized with the Mecab-ko morpheme analyzer. Morpheme analysis is required to divide sentences into word units. Because Korean has a wide variety of word endings, accurate morpheme analysis is particularly important, and only common nouns were extracted through it. Based on the extracted nouns, unnecessary words (postpositions, special symbols, e-mail addresses, names, etc.) were deleted. The ‘tidytext’ package was used to organize the data for word extraction with Mecab-ko. The POS (part-of-speech) tagging command of Mecab-ko, which divides sentences into morphemes, was used together with unnest_tokens of tidytext so that sentences were tokenized and each morpheme was tagged. Next, using the %>% pipe operator available in the ‘dplyr’ package, only words tagged NNG (common noun) were retained, the rest were deleted, and the result was joined back to the existing data frame. Third, words with the same meaning but different expressions (e.g., COVID-19 and Coronavirus-19) were unified into a single word. Fourth, words that frequently appeared consecutively were regarded as compound nouns. For example, when the words ‘COVID-19’ and ‘infectious person’ frequently appeared in succession, they were merged into ‘COVID-19_infectious person’. The frequency threshold was set at five or more observations.
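The noun filtering, synonym unification, and compound-noun steps described above can be sketched as follows. This is an illustrative Python sketch rather than the authors' R code: it assumes the articles have already been tagged into (token, tag) pairs, and the function names and the `SYNONYMS` map are hypothetical. Only the NNG tag, the example word pair, and the threshold of five come from the text.

```python
from collections import Counter

# Hypothetical synonym map; the article names only this one pair.
SYNONYMS = {"Coronavirus-19": "COVID-19"}


def keep_common_nouns(tagged_doc):
    """Keep tokens tagged NNG (common noun) and unify known synonyms."""
    return [SYNONYMS.get(tok, tok) for tok, tag in tagged_doc if tag == "NNG"]


def merge_compounds(docs, min_count=5):
    """Join adjacent word pairs that occur at least min_count times overall."""
    pair_counts = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    frequent = {pair for pair, n in pair_counts.items() if n >= min_count}
    merged_docs = []
    for doc in docs:
        out, i = [], 0
        while i < len(doc):
            if i + 1 < len(doc) and (doc[i], doc[i + 1]) in frequent:
                # Merge the pair into one token and skip past both words.
                out.append(doc[i] + "_" + doc[i + 1])
                i += 2
            else:
                out.append(doc[i])
                i += 1
        merged_docs.append(out)
    return merged_docs
```

Pairs are counted over the whole corpus, so a pair seen only four times is left as two separate words.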

2.2.2 Topic Modeling Analysis

In this study, LDA (Latent Dirichlet Allocation), which has been widely used in recent topic modeling research, was applied to a matrix of keywords extracted from the news texts using R. LDA is an unsupervised learning algorithm that finds the hidden topics behind each word and document and groups the topics by document and keyword. That is, topics are inferred statistically by clustering the keywords that make up the documents according to their probability of appearance and distribution. From the distribution of words in the entire corpus, LDA extracts the latent topics that determine the distribution of words in individual documents, that is, individual articles. The method can therefore be used to identify the latent topics that influence which words appear in individual news articles. After multiple latent topics are extracted from a large-scale corpus in a data-driven way, topics that run through the entire corpus can be identified by finding the patterns in which the latent topics cluster. For this, the stm package, which performs structural topic modeling analysis, was used. The data for analysis were created by converting the preprocessed tokens into a sparse matrix with the cast_sparse function of tidytext, in accordance with the stm data format.
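To make the sparse document-term representation concrete, the sketch below builds one from tokenized articles. It is a standard-library Python illustration of the idea behind a sparse matrix of word counts, not a reproduction of tidytext's cast_sparse or of the stm input format; the function name `build_sparse_dtm` is hypothetical.

```python
from collections import Counter


def build_sparse_dtm(docs):
    """Return (vocab, rows), where rows[d] maps a vocab index to its count.

    Only non-zero counts are stored, which is what makes the matrix sparse.
    """
    vocab = sorted({word for doc in docs for word in doc})
    index = {word: j for j, word in enumerate(vocab)}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append({index[word]: n for word, n in counts.items()})
    return vocab, rows
```

Each row holds entries only for the words that actually occur in that article, so memory grows with the number of distinct word-article pairs rather than with the full articles-by-vocabulary grid.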

3  Results

3.1 Determining the Number of Topics

Multiple topics can be covered in a single newspaper article, and multiple newspaper articles can cover a common topic. To determine the number of topics (K) covered in the newspaper articles, K was varied over a range of 30 to 50, and the sustainability, residuals, and semantic coherence were obtained for each value. Fig. 2 presents a diagram of this procedure.

Figure 2: Criteria for calculating the number of topics 1

A K value with high sustainability, low residuals, and high semantic coherence indicates an appropriate number of topics [19]. Fig. 3 shows the points where semantic coherence and exclusivity meet. K was ultimately set to 40: the stm package was used to check whether the topics were well separated, and the final number of topics was determined from that result.

Figure 3: Criteria for calculating the number of topics 2
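As an illustration of one of the criteria above, the sketch below computes a semantic coherence score for a topic's top words from document co-occurrence counts. It follows the widely used co-document-frequency definition, sum over word pairs of log((D(v_i, v_j) + 1) / D(v_j)); whether this matches the exact computation behind Figs. 2 and 3 is an assumption, and the function name is hypothetical.

```python
import math


def semantic_coherence(top_words, docs):
    """Co-occurrence coherence of a topic's top words (higher is better).

    top_words: the topic's most probable words, most probable first.
    docs: tokenized documents; every top word must occur in at least one.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(all(w in s for w in words) for s in doc_sets)

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            joint = doc_freq(top_words[i], top_words[j])
            score += math.log((joint + 1) / doc_freq(top_words[j]))
    return score
```

Top words that tend to appear in the same articles score close to zero, while words that rarely co-occur push the score further negative.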

3.2 STM Key Words by Topic

First, 16,521 news articles were preprocessed and nouns were extracted to construct a document-term matrix (DTM). Log-likelihood, semantic coherence, and residuals of the model, which change with the number of potential topics, were reviewed and the number of topics was set to 40 to extract potential topics. A list of the words that mainly appear in the 40 potential topics extracted is given in Table 1.

3.3 Hierarchical Cluster Analysis by Topic

The results of the hierarchical cluster analysis between topics, based on the structural topic modeling results, are presented as a dendrogram in Fig. 4. It is also possible to analyze the 40 topics as they are, but retaining such a large number of topics would hamper conceptual brevity and create problems in interpretation and analysis.

Figure 4: Cluster analysis result after COVID-19 structural topic modeling (STM) analysis

Therefore, semantic dimensionality reduction analysis of the additional data was performed by first transforming the correlation between objects into a distance measure to create a distance matrix, then putting this distance matrix back into the cluster analysis.
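The two steps just described, turning topic correlations into distances and then clustering on the distance matrix, can be sketched as below. The article does not state the distance formula or the linkage rule, so the `1 - r` conversion and average linkage used here are assumptions, and both function names are hypothetical.

```python
def correlation_to_distance(corr):
    """Convert a correlation matrix to distances (assumed form: 1 - r)."""
    n = len(corr)
    return [[1.0 - corr[i][j] for j in range(n)] for i in range(n)]


def average_linkage(dist, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(dist))]

    def gap(a, b):
        # Average pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = (
            (x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        x, y = min(pairs, key=lambda p: gap(clusters[p[0]], clusters[p[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]
```

Stopping the merges at a chosen number of clusters corresponds to cutting the dendrogram at that level, which is how 40 topics can be grouped into a handful of themes.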

Based on the results of the hierarchical cluster analysis, the potential topics belonging to each cluster were considered to have close semantic correlations that could be conceptualized as a single kind of news topic. In other words, a newspaper article theme in this study can be operationally defined as a cluster of potential topics obtained from the structural topic model analysis.

A total of five clusters were identified, and the results of the topic organization according to structural topic modeling are presented in Table 2.

In order to examine whether such manipulation is valid, as shown in Table 2, clusters of potential topics were combined and viewed as a single newspaper article topic, and the meaning of each cluster was interpreted.

Theme 1 comprises seven topics (T1, T28, T3, T9, T27, T37, T39) and is labeled “Trend of infection.” It concerns reports on specific information on the spread of infection in Korea, as well as trends in the spread of infection abroad and details on foreign corona-related diplomatic issues and vaccine development.

Theme 2 consists of 10 topics (T12, T13, T25, T8, T30, T32, T34, T40) and is termed “Socio-economic impact.” This refers to the socio-economic impact of COVID-19, which has led to a decrease in income and an increase in borrowing among small business owners, and to a shift of the market from offline to online. In addition, it covers content on suspended group activities and the overall paralysis of social activities, such as the postponement or suspension of games and competitions.

Theme 3 consists of eight topics (T12, T13, T25, T8, T30, T32, T34, T40) and is labeled “The role of the government and business.” It includes reports on the government’s provision of national disaster support funds, the public disclosure of the status of confirmed patients, the impact on parliamentary elections due to corona, the government’s active screening measures and quarantine of confirmed patients, self-isolation, disclosure of confirmed patient movement, and police investigations into groups that did not cooperate in quarantine.

In addition, this theme includes reports that companies’ growth rates have decreased, that the direction of business has changed due to the coronavirus, that charitable donations have been made to overcome corona, and that working policies have changed to allow working from home.

Theme 4 comprises 10 topics (T15, T29, T14, T23, T31, T11, T20, T22, T35, T38) and is labeled “Changes in people’s everyday life.” It refers to the changes in the daily lives of people due to COVID-19 and related problems. In spite of the quarantine guidelines that require wearing a mask, it is difficult to obtain masks, so it is necessary to line up at pharmacies, observe the current weekday system to purchase masks and seek scarce disinfectants. In addition, school openings have been delayed, online instruction has been implemented, educational activities have been greatly reduced, the release of movies has been delayed, and the scope of activities has been reduced with the increase in confirmed patients.

Theme 5 consists of five topics (T2, T24, T26, T17, T19), and is labeled “Increased psychological anxiety.” It concerns people’s psychological difficulties, emotional anxiety, and depression caused by COVID-19.

In addition, contents such as anxiety about infection of themselves and family members, anger about group infections, restrictions on visiting patients, and fears of death and suffering due to financial difficulties were reflected.

3.4 Time Series Patterns in STM Themes

To examine how the themes defined by the structural topic model changed after January 20, 2020, we plotted them as a time series, as shown in Fig. 5. The shades of gray in the background of the figure indicate the frequency of COVID-19 articles, that is, the distribution of article frequency presented in Fig. 1. The time series patterns show that all themes rose steeply in March and then decreased gradually. Around January 20, 2020, the “Socio-economic shock” theme was the most common, but it gradually fell below “Changes in people’s everyday life.” Excluding January, the themes were ordered as follows from most articles to least: “Changes in people’s everyday life,” “Socio-economic shock,” “The spread of infection,” “The role of the government and business,” and “Increased psychological anxiety.” The “Increased psychological anxiety” theme shows fewer articles than the other themes, and the “Changes in people’s everyday life” theme is consistently the most common.

Figure 5: Temporal pattern of the themes of COVID-19 news in a comprehensive daily newspaper search
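A time series like the one in Fig. 5 rests on counting articles per theme per time period. The sketch below shows one way to do that in Python, under two assumptions not stated in the article: each article is assigned a single dominant theme, and the counts are binned by ISO week. The function name is hypothetical.

```python
from collections import Counter
from datetime import date


def weekly_theme_counts(articles):
    """Count articles per (ISO year, ISO week, theme).

    articles: iterable of (publication date, theme label) pairs.
    """
    counts = Counter()
    for day, theme in articles:
        iso = day.isocalendar()  # (ISO year, ISO week number, weekday)
        counts[(iso[0], iso[1], theme)] += 1
    return counts


# Made-up example records, for illustration only.
sample = [
    (date(2020, 2, 19), "Trends in infection"),
    (date(2020, 2, 20), "Trends in infection"),
    (date(2020, 2, 24), "Increased psychological anxiety"),
]
weekly = weekly_theme_counts(sample)
```

Plotting these counts by week, one line per theme, gives the kind of temporal comparison described in the text.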

4  Discussion

COVID-19 is a global pandemic directly reflected in newspaper articles. This study examined COVID-19-related reports from four major Korean daily newspapers from January 20, 2020, to June 15, 2020.

First, “Changes in people’s everyday life” and “Socio-economic shock” were the most common themes. In particular, the number of articles for the former was lower than for the “Socio-economic shock” theme at the beginning of the outbreak, but after February, the many changes in people’s daily lives pushed “Changes in people’s everyday life” to the top. This coverage of the many changes in daily life and the economic harm people experienced can be seen as an effort to promote fellow feeling by reporting on the everyday difficulties people face. However, rather than concentrating on articles that attract public attention and stimulate emotions in order to increase the number of views of COVID-19 articles, news coverage needs to address various aspects of COVID-19 more comprehensively, such as its causes and countermeasures.

Second, the theme of “The spread of infection” was the third highest, reflecting many reports on the situation of the COVID-19 pandemic at home and abroad. This theme played an important role in informing the public of the current status of infection, raising awareness, and decreasing the risk of infection. However, it might also lead to the stigmatization of confirmed cases or criticism of mass outbreak areas. Therefore, the media should be very cautious in publishing articles that may stigmatize a specific group, strive to provide accurate information to prevent fear and anxiety about infection, and place greater emphasis on information about infection prevention.

Third, the theme of “Increased psychological anxiety” was derived. This is consistent with previous research showing that the unpredictable transmission potential of COVID-19 not only threatens people’s physical health but also increases their psychological anxiety [20]. Since a sudden outbreak of an infectious disease can cause psychological anxiety, newspaper articles during the epidemic period need to evaluate the public’s psychological state and provide information about it [21]. Newspaper articles about the recovery process from COVID-19 and accurate information on how to respond to infectious diseases can reduce people’s anxiety and give them a sense of security [22]. It is necessary to break away from the existing practice of simple incident reporting on infectious diseases and to produce newspaper articles that suggest practical ways to overcome psychological anxiety in a more in-depth and contextual way. Newspaper articles should also provide ongoing information about telemedicine systems, online mental health services, and comprehensive mental health teams that offer professional services to people with mental health problems due to COVID-19.

This study has analyzed the themes of newspaper articles on COVID-19 to explore the role of the news in mental health promotion in disaster situations. However, there are several limitations of this study.

Because it is difficult to collect every article from the newspapers’ own websites, we relied on the articles available through the news data collection site BigKinds. In future research, it will be necessary to collect and analyze more reliable data through web crawling methods. In addition, because the COVID-19 pandemic was still active, only a specific period was selected for analysis. Therefore, it will be necessary to analyze the topics of the news more comprehensively and closely after the COVID-19 pandemic ends. Finally, we analyzed the topics of COVID-19 news in Korea using only Korean newspaper data. In the future, a comparative analysis of the role of the news globally will be needed to improve newspaper reporting in ways that promote disaster mental health.

5  Conclusions

This study examined how four major daily newspapers covered COVID-19. Structural topic modeling (STM) was used to analyze the topics of COVID-19-related newspaper articles, yielding five themes, listed from most articles to least: “Changes in people’s everyday life,” “Socio-economic shock,” “Trends in infection,” “The role of the government and business,” and “Increased psychological anxiety.” All five themes showed a sharp increase in the number of articles from mid-February to early March and then gradually decreased. Since the COVID-19 pandemic is likely to weaken psychological support systems through its adverse impacts on daily life, global health measures should be taken to use news reports to protect the mental safety of people in a disaster situation. The research results suggest that news articles should also report on disaster crisis intervention services, education, and psychological support services that can aid and comfort people in infectious disease disasters such as COVID-19. Furthermore, there is a need for comprehensive news reporting focused on the impact of the pandemic on the mental health of the population around the world.

Funding Statement: The authors received no specific funding for this study.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.

References

  1. Huang, C., Wang, Y., Li, X., Ren, L., & Zhao, J. (2020). Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. The Lancet, 395, 497-506.
  2. World Health Organization (2021). Coronavirus disease 2019 (COVID-19) situation reports. https://www.who.int/publications/m/item/weekly-epidemiological-update-on-covid-19---21-september-2021.
  3. Bryant, R. A., Gallagher, H. C., Gibbs, L., Pattison, P., & MacDougall, C. (2017). Mental health and social networks after disaster. American Journal of Psychiatry, 174(3), 277-285.
  4. Talevi, D., Socci, V., Carai, M., Carnaghi, G., & Faleri, S. (2020). Mental health outcomes of the COVID-19 pandemic. Rivista di Psichiatria, 55(3), 137-144.
  5. Vindegaard, M., & Benros, M. E. (2020). COVID-19 pandemic and mental health consequences: Systematic review of the current evidence. Brain, Behavior, and Immunity, 89, 531-542.
  6. Nair, B., Janenova, S., Serikbayeva, B. (2020). Social and mainstream media relations. In: Nair, S. Janenova, B. Serikbayeva (Eds.), A primer on policy communication in Kazakhstan, pp. 35–48. Singapore: Palgrave Pivot.
  7. Bennett, K. J., Olsen, J. M., Mekaru, S., Livinski, A. A., & Brownstein, J. S. (2013). The perfect storm of information: Combining traditional and non-traditional data sources for public health situational awareness during hurricane response. PLoS Currents Disasters, 16(5).
  8. Cowper, A. (2020). COVID-19: Are we getting the communications right? British Medical Journal, 368, m919.
  9. Laupacis, A. (2020). Working together to contain and manage COVID-19. Canadian Medical Association Journal, 192(13), E340-E341.
  10. Neumann, W. R., Just, M. R., Crigler, A. N. (1992). Common knowledge: News and the construction of political meaning. Chicago: University of Chicago Press.
  11. Peto, J. (2020). COVID-19 mass testing facilities could end the epidemic rapidly. British Medical Journal, 368, m1163.
  12. Tian, S., Hu, W., Niu, L., Liu, H., & Xu, H. (2020). Pulmonary pathology of early phase 2019 novel coronavirus (COVID-19) pneumonia in two patients with lung cancer. Journal of Thoracic Oncology, 15(5), 700-704.
  13. Wu, P., Hao, X., Lau, E. H. Y., Wong, J. Y., & Leung, K. S. M. (2020). Real-time tentative assessment of the epidemiological characteristics of novel coronavirus infections in Wuhan, China, as at 22 January 2020. Eurosurveillance, 25(3), 1-6.
  14. Cowling, B. J., & Leung, G. M. (2020). Epidemiological research priorities for public health control of the ongoing global novel coronavirus (2019-nCoV) outbreak. Eurosurveillance, 25, 2000110.
  15. Park, H. W., Park, S., & Chong, M. (2020). Conversations and medical news frames on twitter: Infodemiological study on COVID-19 in South Korea. Journal of Medical Internet Research, 22(5), e18897.
  16. Olagoke, A. O., Olagoke, O. O., & Hughes, A. M. (2020). Exposure to coronavirus news on mainstream media: The role of risk perceptions and depression. British Journal of Health Psychology, 25(4), 865-874.
  17. Poirier, W., Ouellet, C., Rancourt, M. A., Béchard, J., & Dufresne, Y. (2020). (Un)covering the COVID-19 pandemic: Framing analysis of the crisis in Canada. Canadian Journal of Political Science, 53(2), 365-371.
  18. Korea Press Foundation (2020). BigKinds. https://www.bigkinds.or.kr/.
  19. Roberts, M. E., Stewart, B. M., Tingley, D., Lucas, C., & Leder Luis, J. (2014). Structural topic models for open ended survey responses. American Journal of Political Science, 58(4), 1064-1082.
  20. Li, S., Wang, Y., Xue, J., Zhao, N., & Zhu, T. (2020). The impact of COVID-19 epidemic declaration on psychological consequences: A study on active weibo users. International Journal of Environmental Research and Public Health, 17(6), 20-32.
  21. Wang, Y., Di, Y., Ye, J., & Wei, W. (2021). Study on the public psychological states and its related factors during the outbreak of coronavirus disease 2019 (COVID-19) in some regions of China. Psychology, Health & Medicine, 26(1), 13-22.
  22. Ho, C. S., Chee, C. Y., & Ho, R. C. (2020). Mental health strategies to combat the psychological impact of coronavirus disease 2019 (COVID-19) beyond paranoia and panic. Annals, Academy of Medicine, 49.

Cite This Article

Choi, Y., Um, Y. (2023). Topic Models to Analyze Disaster-Related Newspaper Articles: Focusing on COVID-19. International Journal of Mental Health Promotion, 25(3), 421–431.


This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.