The volume of published research is growing rapidly, producing massive text corpora. With corpora this large, we are drowning in data yet starving for information. Recent research has therefore employed various text mining approaches to extract information from such corpora. These approaches extract meaningful and precise phrases, commonly termed keyphrases, that effectively describe the information in a text. Keyphrases can in turn be used to determine trends in different fields of study, including the spatiotemporal trends of various research fields. The progress of a research field can be better revealed through spatiotemporal bibliographic trend analysis. An effective spatiotemporal trend extraction mechanism is therefore required to disclose the textile research trends of particular regions during a specific period. This study collected a diversified dataset of textile research published from 2011 to 2019 in different countries to determine research trends. The data was collected from various open access journals. The spatiotemporal trends were then determined using quality phrase mining. This research also examined the research collaboration of different countries on particular research subjects. The collaborations among researchers of different countries are related to the imports and exports of those countries. A visualization approach is also incorporated to make the results easier to understand.
Scientific knowledge is rapidly growing, making rich data sources available [
Previously, many text and phrase mining approaches were used for text analysis. Among them, state-of-the-art keyword and keyphrase extraction approaches have attracted the most attention. A phrase is considered more informative than a word because a single word mainly points to one specific meaning. Moreover, N-gram phrase extraction is also critical for determining frequent and interesting patterns from massive text corpora [
Many statistical text mining approaches are used for keyword and keyphrase extraction but still have limitations while working with large and complex datasets [
This research focuses on determining spatiotemporal trends. According to the literature, some methods assess spatial trends in collected data for different purposes. Similarly, some methods determine temporal trends for purposes such as weather analysis, accident causes, and medical reforms. An intelligent method is therefore needed that combines spatial and temporal techniques to determine the research trends of any field from the published literature. The key contribution of this research is that, whereas previous methods were based on data collected from other kinds of data sources, this research uses published articles from various journals.
Furthermore, this study combines spatial and temporal data, processed by a machine learning method, to determine research trends effectively from the collected data. It first collects a dataset of published research articles from multiple journals and preprocesses them to remove noisy data. Then, the machine learning method ‘Quality Phrase Mining’ is used to determine the research trends of textile literature based on spatiotemporal properties. Finally, the directions of textile research over the previous nine years are visualized in graphs and word clouds to communicate the information effectively.
The rest of this paper is organized as follows. First, Section 2 discusses the current state-of-the-art work in the domain, followed by the proposed research methodology in Section 3. Next, results have been elaborated in Section 4, along with discussions. Finally, Section 5 concludes the paper.
Data mining approaches are used for finding non-trivial and exciting patterns from a massive text corpus. Scientific literature is rapidly growing, and many stakeholders need to discover innovations and new trends to make better decisions in real-time [
The results extracted by these different approaches are stored in textual format and are difficult to understand. Researchers therefore try to visualize textual results to make them easier to understand. Several visualization approaches have been designed to show various types of text data. A visualization method can convey a tremendous amount of information with less reasoning effort. This benefit is well captured by the old saying, “A picture is worth a thousand words” [
Text mining approaches are widely used for finding non-trivial and interesting patterns in a massive text corpus. Text mining methods have used various dimensions and types of interpretative information, known as meta-knowledge, about events and related contexts. Text mining is a flexible technology that is applicable in many fields, and much work has been done on keyword and keyphrase extraction. These approaches are broadly divided into statistical, graph-based, and machine learning methods; some of the renowned approaches of each kind are discussed in detail in the following sections.
In text mining, statistical approaches to keyword extraction work on word occurrence in the text data: keywords are ranked by their occurrences in the data collection to quantify their information content. In statistical approaches, the frequency of every word is computed, and the words are then ranked according to their scores [
The study presents a keyword extraction approach by adding co-occurrence to the feature set. The main document is divided into subsets by clustering [
Therefore, another research also proposed a new feature set of synonyms named synset that enhanced the performance of TF-IDF FLC and BDS algorithms [
In graph-based approaches, a graph is built from nodes and edges: the words in the document are treated as nodes, and an edge between two words is created based on a specific constraint. The constraint may be co-occurrence or another relation, depending on the nature of the graph. One study presented a keyword extraction method based on the Patricia tree (PAT-tree). A PAT-tree is constructed from the sentence context based on keyword location, and keywords are then mined from the PAT-tree. However, PAT-tree construction is time-consuming in this method [
Another study used a graph-based approach that improved the results on complex data. A new node selectivity parameter is used for keyword extraction, defined as the average weight on the links of a single node [
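As an illustration of the graph construction described at the start of this paragraph, the following is a minimal sketch that treats words as nodes and adds a weighted edge whenever two words co-occur within a small window. The function name `cooccurrence_graph` and the `window` parameter are hypothetical, and the sketch is not taken from any of the cited studies.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Build an undirected weighted word graph.

    graph[u][v] counts how often words u and v appear within `window`
    positions of each other (window=2 means directly adjacent tokens).
    The window size is an assumed, illustrative constraint.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for i, u in enumerate(tokens):
        for v in tokens[i + 1:i + window]:
            if u != v:  # skip self-loops
                graph[u][v] += 1
                graph[v][u] += 1
    return graph


tokens = "woven fabric tensile woven fabric".split()
graph = cooccurrence_graph(tokens)
print(dict(graph["woven"]))
```

Graph-based keyword extractors then rank the nodes of such a graph, for example by degree or by a weight-based measure.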
However, another research proposed a k-core approach for document representation as a graph of words. For node representation, the central core is used. That approach gives better proximity between the variability in the number of keywords and keywords extracted from a subset of nodes. However, the performance can be compromised by choosing the wrong parameters [
The author proposed a weighted graph-based approach named Keyword from Weighted Graph [
Machine learning approaches provide keywords or keyphrases based on a training dataset containing user-annotated keywords or keyphrases. First, data from the corpus is preprocessed for keyphrase extraction. Then, the machine learning algorithm is applied to the processed data, using the training data to extract keywords or keyphrases. Some of the renowned machine learning approaches are discussed below.
Keyword extraction gives a concise representation of the document. The study [
Furthermore, model training time consumption is resolved in the machine learning approach called Automatic Term Recognition (ATR). An annotated n-gram is used for the training dataset, consisting of a positive and negative example. Furthermore, the machine learning approach is used for bilingual ATR with some modifications w.r.t monolingual machine learning methods [
Particle Swarm Optimization is used for recomputed node and the optimal corresponding node predicted. The research proposed a novel K-means Non-Negative Matrix Factorization (KNMF), implemented using Negative Matrix Factorization guidelines [
To improve the previous study, the author proposed a supervised machine learning approach for keyphrase extraction called citation-enhanced keyphrase extraction (CeKE) [
To obtain better results, a framework was proposed for spatiotemporal data mining of multiple events. Moreover, the author improved spatiotemporal pattern mining by considering both space and time factors [
The research demonstrates the implementation of Spatiotemporal pattern mining to identify property crime ratios in different geographical areas [
The proposed research methodology is divided into two phases. In the first phase, data is collected using a crawler from research articles in open access journals and then preprocessed through tokenization, stop word removal, and stemming. Next, the preprocessed data is used for research trend mining with a machine learning-based approach named quality phrase mining, which determines the spatiotemporal trends, that is, the research trends in different countries from 2011 to 2019. The trends are visualized using state-of-the-art visualization approaches to make the results easier to understand. In the second phase, the collaboration of researchers from different countries is determined using the research article metadata in the dataset. The research methodology used in this research is shown in
This study collected data from previously published literature related to textiles and their applications. Many existing datasets were examined, but none pertaining to textiles was suitable for the requirements of this research. Data collection was therefore performed with a crawler that crawls data from several scientific digital libraries.
These scientific digital libraries include Sage journals, ScienceDirect, Springer, and MDPI. Several digital libraries were selected for data collection to avoid bias toward the publication preferences of any one journal. Many available text mining approaches use only the abstracts of research articles rather than the entire paper, because not every journal's research articles are freely available. In this research, full-text research articles on textiles are used for trend mining. Research articles were collected only from open access journals and only for the years 2011 to 2019. Spatiotemporal trend mining requires data for different countries and years. More than 1500 full-text research articles were collected from various open access journals. For spatial data, this research chose research articles affiliated with nine developed countries, such as China, France, Germany, Italy, Japan, South Korea, the UK, and the USA. After data collection, all research articles were arranged by country and by year.
Each country has research articles for the years 2011 to 2019, as shown in
Property | Value |
---|---|
Number of articles | 1533 |
Number of countries for trend | 9 |
Number of years | 10 |
Venues | 8 |
Affiliation of countries | 36 |
In this step, the raw data was converted into a processable form. First, each research article, available as a PDF file, was converted into a text file. Articles written in languages other than English were then discarded, because this research processes only English-language data. Finally, preprocessing was performed in three steps: tokenization, stop word removal, and stemming.
In this step, quality phrases were mined from the dataset generated in the previous step. Mining quality phrases is a crucial step in the methodology because the phrases mined affect the results of this research. A phrase is considered a quality phrase if it has four features: popularity, concordance, informativeness, and completeness.
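The three preprocessing steps can be sketched as follows. This is a minimal, standard-library illustration: the stop word list is a tiny hypothetical sample, and the suffix-stripping function is only a crude stand-in for a real stemmer such as Porter's, since the text does not state which stop word list or stemmer was used.

```python
import re

# Hypothetical, deliberately small stop word list (assumption for illustration).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is", "are", "for", "to", "with", "on"}

# Crude suffix list standing in for a real stemming algorithm (assumption).
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, then stem each remaining token."""
    return [stem(t) for t in remove_stop_words(tokenize(text))]


print(preprocess("The woven fabrics are tested for tensile strength"))
```

In practice, a library stemmer and a fuller stop word list would replace the two hypothetical constants above.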
A phrase satisfies popularity if it occurs in the document with a frequency higher than a specified threshold. A quality phrase should be frequent in a document; if it is not, it cannot be a quality phrase of that document. Popularity also ensures the downward closure property: if a phrase is frequent, every sub-phrase of it must also be frequent.
Concordance means that the words of a phrase occur together with a frequency higher than would be expected by chance. It also involves identifying the relevance between sentences and identifying synonyms of the words. For example, strong tea is similar to powerful tea in English. However, strong tea is the more proper expression and occurs frequently. If strong tea is selected as a quality phrase, powerful tea is also a quality phrase because both phrases have the same semantic meaning.
An informative phrase indicates a specific topic. Not all frequent phrases are informative, because an informative phrase carries semantic meaning. For example, “this paper” can occur frequently in a document but is not informative in research articles, whereas the name of the approach used in a research article is informative.
Completeness is the property that a phrase has a complete, understandable meaning, so that the phrase can be treated as a complete semantic entity. A phrase does not always appear as a single word; it frequently appears as a combination of two words that together give a single understandable meaning. Completeness is therefore the quality that makes a phrase fully understandable on its own.
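The popularity criterion can be illustrated with a brute-force n-gram count. This is only a sketch: the default maximum length of five follows the setting reported later in the results, while the `min_count` threshold, the function names, and the exhaustive counting strategy are assumptions, not the actual quality phrase mining implementation.

```python
from collections import Counter


def frequent_phrases(tokens, max_len=5, min_count=2):
    """Return {phrase tuple: count} for n-grams up to max_len whose
    count reaches min_count (min_count is an assumed threshold)."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {p: c for p, c in counts.items() if c >= min_count}


def is_downward_closed(frequent):
    """Check that every contiguous sub-phrase of a frequent phrase is frequent."""
    for phrase in frequent:
        n = len(phrase)
        for size in range(1, n):
            for start in range(n - size + 1):
                if phrase[start:start + size] not in frequent:
                    return False
    return True


tokens = "woven fabric tensile strength woven fabric spacer fabric".split()
frequent = frequent_phrases(tokens)
print(frequent)
print(is_downward_closed(frequent))
```

Because a sub-phrase occurs at least as often as any phrase containing it, the closure check holds for any frequency threshold; practical miners exploit this to prune candidates instead of counting every n-gram.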
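One common way to test whether two words co-occur more often than chance is pointwise mutual information (PMI). The sketch below uses PMI purely as an illustration; the text does not state which concordance measure was used, and the function name `bigram_pmi` is hypothetical.

```python
import math
from collections import Counter


def bigram_pmi(tokens, first, second):
    """Pointwise mutual information of the adjacent pair (first, second).

    A positive value means the pair occurs more often than expected if
    the two words were independent; -inf means the pair never occurs.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    joint = bigrams[(first, second)] / (len(tokens) - 1)
    if joint == 0:
        return float("-inf")
    expected = (unigrams[first] / len(tokens)) * (unigrams[second] / len(tokens))
    return math.log2(joint / expected)


tokens = "strong tea strong tea green tea strong wind".split()
print(bigram_pmi(tokens, "strong", "tea") > 0)
```

Under this measure, a pair such as strong tea scores positively in a corpus where the two words appear together more often than their individual frequencies would predict.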
The term spatial refers to a geographical location, which may be any geographical space such as a country or a city. In contrast, the term temporal refers to any time span, such as decades, years, or months. Combined as spatiotemporal, the term refers to both space and time. In this research, spatiotemporal means both geographical location and year: the geographical location is the country where the research is conducted, and the year is the specific year in which the research was done. A trend refers to a direction in which research is being conducted, for example, bulletproof fiber production.
Finally, a spatiotemporal trend is the direction that textile research followed in a specific country in a specific year. This research arranged the dataset by country and by year. To obtain research trends, the data of one specific country and one year was analyzed through quality phrase mining, and the same process was followed for every analysis. The data covers the years from 2011 to 2019 for the specific countries on which this research method was performed. For spatiotemporal trends, the data related to each country was separated, and the quality phrase mining method was applied to it year by year to show the textile research trends of different years in that country.
This step involves extracting the authors' affiliation information from all research articles in the previously collected dataset, focusing on the affiliated countries of each article. Only articles whose authors are affiliated with at least two different countries are retained; all others are discarded. Then, all the researchers' affiliations in each retained article are extracted. This research used pdfx [
Collaboration in research means that researchers from different countries work together on a research problem. Many research articles have authors from different countries. This research finds collaborative research articles in the collected dataset by identifying articles with at least two authors affiliated with different countries (for example, one author from one country and another from a different country). The previously collected researcher affiliation data was used to find collaborations between researchers of different countries. After a collaborative document is found, the author-defined keywords are also extracted to show which countries collaborate on which keywords. This gives the trends of collaboration between countries in the textile field.
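The per-country, per-year analysis can be sketched as a grouping step followed by phrase ranking within each slice. In this illustration, plain bigram frequency stands in for quality phrase mining, and the record fields (`country`, `year`, `tokens`) and the function name are hypothetical.

```python
from collections import Counter, defaultdict


def spatiotemporal_trends(articles, top_k=3, n=2):
    """Map each (country, year) slice to its top_k most frequent n-grams.

    Frequency ranking is an assumed stand-in for quality phrase mining.
    """
    slices = defaultdict(Counter)
    for article in articles:
        tokens = article["tokens"]
        key = (article["country"], article["year"])
        for i in range(len(tokens) - n + 1):
            slices[key][" ".join(tokens[i:i + n])] += 1
    return {key: [phrase for phrase, _ in counts.most_common(top_k)]
            for key, counts in slices.items()}


# Made-up records for illustration only.
articles = [
    {"country": "China", "year": 2011,
     "tokens": "spacer fabric spacer fabric woven".split()},
    {"country": "USA", "year": 2012,
     "tokens": "tissue engineering tissue engineering".split()},
]
print(spatiotemporal_trends(articles, top_k=1))
```

Grouping by country alone or by year alone gives the purely spatial or purely temporal view, which is how the three kinds of trends differ.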
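The collaboration step can be sketched as filtering articles with at least two distinct affiliated countries and counting each country pair together with the author keywords attached to it. The record fields (`countries`, `keywords`) and the function name are hypothetical; the extraction of affiliations from the PDF files is assumed to have already happened.

```python
from collections import Counter, defaultdict
from itertools import combinations


def collaboration_stats(articles):
    """Count collaborative articles per country pair and collect their keywords."""
    pair_counts = Counter()
    pair_keywords = defaultdict(set)
    for article in articles:
        countries = sorted(set(article["countries"]))
        if len(countries) < 2:
            continue  # single-country article: not a collaboration
        for pair in combinations(countries, 2):
            pair_counts[pair] += 1
            pair_keywords[pair].update(article["keywords"])
    return pair_counts, pair_keywords


# Made-up records for illustration only.
articles = [
    {"countries": ["China", "USA", "China"], "keywords": ["spacer fabric"]},
    {"countries": ["China"], "keywords": ["woven fabric"]},
    {"countries": ["USA", "China"], "keywords": ["tensile strength"]},
]
counts, keywords = collaboration_stats(articles)
print(counts)
```

The pair counts correspond to edge weights in a collaboration network, and the keyword sets indicate the topics on which each pair of countries collaborated.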
Computer visualization has become more diverse as it has been applied in various previous studies. It supports both single-dimensional and multi-dimensional data: any type of data can first be converted into a visualization format and then visualized effectively. Beyond its main purpose of displaying data effectively, visualization can also let the viewer zoom in and out of the content to view hierarchy levels or connections. For visualization, D3.js has been used by many research articles in the literature [
Results can be filtered by clicking on the circle for a country name; clicking on a specific country's circle shows details on demand. Zoomable circle packing is also used to show research collaborations, as shown in
This research identifies the spatiotemporal scientific trends in textiles and their applications from 2011 to 2019 using quality phrase mining on a scholarly dataset. In this approach, quality phrases of variable length are extracted from literature related to textiles. The maximum phrase length is set to five, and the frequency threshold is set according to the number of publications, because the process is divided into ten spans and the number of publications differs in each span. Further, these spans are divided into nine different countries.
This section is divided into two parts. The first part discusses trends and is further divided into temporal, spatial, and spatiotemporal trends. It covers the nine years from 2011 to 2019, and each year includes the research trends of nine different countries. The second part shows the collaboration of researchers from different countries in textiles, together with the keywords on which researchers collaborate. Visualization is used throughout to make the results easier to understand, and two different visualizations are used to represent the collaboration results.
In this step, the scientific trends of textiles over the period 2011 to 2019 are described using the word cloud visualization method. The textile research trends of 2011, 2012, and 2013 are shown as word clouds in
In this step, the scientific trends of textiles in different countries over the period 2011 to 2019 are described using the word cloud visualization. The textile research trends of China, the USA, and the UK from 2011 to 2019 are shown as word clouds in
This step presents the scientific trends of textiles in different countries during the period 2011 to 2019 using the word cloud visualization. Each country's textile research trends from 2011 to 2019 are shown separately. The top temporal trends identified by this research for the year 2011 were “Tissue Engineering”, “Nano mat”, and “Composite Fiber”, as also given in the
These were the top three trending terms in textile research in 2011. In the same way, this research identified the spatial trends of a specific country, China, over the period 2011 to 2019; the top three trending terms identified by the proposed method were “woven fabric”, “spacer fabric”, and “Tensile strength”. These three terms were therefore the topmost trending research terms in that period. Comparing the two shows that the spatial and temporal trends, determined individually, gave different results. A method was therefore needed that determines both factors, spatial and temporal, in combination. In this regard, the proposed approach, spatiotemporal trend mining, worked well in determining both factors of the trends and produced more meaningful and more accurate results than the individually determined spatial and temporal results, as given in the
The top trends of the year 2012 extracted using quality phrase mining are “Spacer fabric”, “fabric”, and “woven fabric”; more trends are given in the
In this step, the collaboration of a country's researchers with researchers of other countries, determined in Section 3.6, is shown graphically using two visualization methods: zoomable circle packing and a simple network representation of researchers' collaboration. In this research, data from 37 different countries was used to determine researcher collaboration. The collaborating countries and the keywords on which researchers collaborate are shown in this section. The collaboration of countries with China is visualized: country names are shown on the vertices, and the edges connecting them show collaboration between countries.
The connections between country names represent the number of collaborative research articles; the more links there are, the more collaborative research was conducted between the countries. The visualization is dynamic: when the mouse is moved over a country's name, the links from that country to other countries are highlighted in a dark color, from which the collaborating countries can easily be seen. Further, the zoomable circle packing visualization is also used for visualizing research collaboration, as shown in the
This visualization uses hierarchy-based zooming. Clicking the circle of a specific country shows that country's collaborations with other countries, with their names on circles; the bigger the circle, the greater the number of collaborations. Clicking an inner circle containing the name of a collaborating country shows the keywords of both countries, which indicate the topics on which the two countries collaborated.
The results are also validated against the textile exports of countries. This experiment shows the textile export statistics of the countries that collaborate in textile research. According to the results of the proposed method, the most collaborative countries in the studied period were the USA and China. The textile export statistics of both countries were also observed to be higher, which suggests that countries with collaborative research also show a joint rise in exports. The
Many data mining approaches are available that extract spatial, temporal, or combined spatiotemporal data from different types of datasets. Although these techniques are used in the literature, no existing method extracts data based on spatiotemporal factors from scholarly data collections. The progress of any research domain can be better revealed through spatiotemporal scientific trend analysis. The key idea of this research is therefore to analyze the scientific spatiotemporal trends of a rapidly growing research field from published literature. To this end, more than fifteen hundred research articles were extracted from open access journals of textiles for the years 2011 to 2019. The quality phrase mining approach was applied to the collected dataset to determine the spatial (Section 4.2) and temporal (Section 4.1) results. It was observed that the spatial and temporal trends tend to be less informative separately than the combined spatiotemporal results. Moreover, the collaboration of researchers by country was also determined to show the importance of collaborative research. The research collaboration results show that countries with more research collaboration also have more import-export collaboration. Furthermore, for an effective understanding of all results, this research implemented visualization techniques. Word cloud visualization is used to compare the spatial and temporal results with the spatiotemporal results of the proposed method. Different visualization approaches, namely word cloud, zoomable circle packing, and bilevel edge bundling, are used for a better representation of the further results.
The author thanks the Natural Sciences and Engineering Research Council of Canada (NSERC) and the New Brunswick Innovation Foundation (NBIF) for the financial support of the global project. These granting agencies did not contribute to the design of the study or to the collection, analysis, and interpretation of data.