Regulatory authorities produce a large body of legislation that must be followed, creating complex compliance requirements and time-consuming processes for detecting non-compliance. While regulations establish rules in their respective areas, they generally do not include recommendations or best practices for achieving compliance. Best practices are often used to close this gap: numerous governance, management, and security frameworks in the Information Technology (IT) field guide businesses toward running their processes at a higher maturity level. One best practice can be mapped to another, and users can orient themselves with the help of such relation maps. These maps are generally created by expert judgment or top-down relationship analysis; both methods are subjective and easily introduce inconsistencies. To obtain an objective, statistically grounded relationship map, we propose a Latent Semantic Analysis (LSA) based model that generates a relatedness correlation map. We created a relatedness map between a banking regulation and a best practice, analyzing 224 statements of the regulation against the 1202 activities of Control Objectives for Information Technologies (Cobit) 2019. We further support our LSA results with multi-criteria decision-making (MCDM) methods: the Fuzzy Analytic Hierarchy Process (FAHP) to prioritize our criteria and the Weighted Aggregated Sum Product Assessment method (WASPAS) to compare the similarity results of regulation-Cobit activity pairs. Instead of subjective methods for mapping best practices and regulations, this study suggests creating relatedness maps supported by the objectivity of LSA.
In the IT world, creating standard processes and techniques is crucial. Network devices use protocols to connect, and client-server architectures communicate under largely identical rules. Rule creation is essential to sustaining IT services. On the process side, some institutes create best practices to govern IT better, and other best practices address more technical issues. Another dimension is regulations [
Our selected regulation is the basis for all IT processes in banks in Turkey; it regulates IT areas and sets rules to be obeyed. Regulators state their rules in regulations but do not explain how to become compliant. To comply with a specific article or area, banks use best practices such as ISO, NIST, or Cobit to find related activities. Finding related activities, practices, or domains is a hard process: best practices differ in their domain, subdomain, and practice names and in how they were created, and regulations are complicated. Creating a semantic relatedness map can improve this process.
To find semantic similarities in texts, there are five main families of methods: string-based, character-based, corpus-based, knowledge-based, and hybrid similarity measurement methods [
LSA is a corpus-based natural language processing technique that derives word-to-word association maps from a collection of texts (a corpus). The underlying idea is that the sum of information about all the contexts in which a given word does and does not appear largely determines the similarity of word meanings and word sets. This provides a set of mutual constraints, and with the help of this method, text similarities can be measured once an analyzable data set is created.
In this study, a one-to-many LSA approach was used to compare each regulation statement to many Cobit activities. After computing every regulation statement's relatedness percentage to every Cobit activity, we created a 224×1202 relation matrix. We then analyzed this LSA-based data set with FAHP and WASPAS to prioritize our criteria and rank comparison pairs. Each comparison pair consists of a randomly selected regulation statement and, from the LSA results, one of the most or least related Cobit activities.
Finding the relatedness of a regulation to best practices requires intense effort from governing bodies and compliance officers. Gap analyses and identifying related compliance requirements are generally hard, and few relatedness maps exist. Only maps between well-known best practices are available, and these relation matrices are generally made by subjective methods, e.g., Cobit to ISO 27001. Mapping a regulation to a best practice has not been attempted before, because creating such a relation map is difficult. NLP (Natural Language Processing) and semantic analysis techniques are being developed and used in many areas, but they have not previously been applied to relate regulations to best practices. Our work is unique in this respect. Beyond the implementation itself, we combined our LSA results with the FAHP + WASPAS method to evaluate the created LSA matrices. This strengthens the accuracy of the study and offers a new method for mapping regulations to best practices, or both to each other.
In some sectors, risks and compliance requirements change fast, and staying compliant is difficult. Businesses want high-quality processes and adapt to best practices to get them, while also having to comply with regulations issued by regulators. Many of the practices included in the provisions of the legislation also appear in international good practices, that is, in best practices. While businesses comply with legislation, they also want to comply with international practices. Therefore, the compatibility of international frameworks such as Cobit 2019 with local legislation is a recurring question for businesses. In the study, [
A mapping between Cobit and the NIST Cybersecurity Framework and other references was created manually by Nicole Keller in 2018; Glenfis likewise mapped Cobit 2019 to ITIL v4, but only at the level of master domain name similarity [
The relatedness of Cobit to a specific IT regulation has not been studied, although Cobit
Semantic relatedness has several applications in NLP such as word sense disambiguation, paraphrasing, text classification, dimension reduction, etc. [
To cluster mashup web technologies, researchers have focused on using semantic similarities to guide the clustering process and to extract structural and semantic information from mashup profiles. They integrated structural similarity and semantic similarity using fuzzy AHP and LDA (Latent Dirichlet Allocation), an NLP technique similar in spirit to LSA, within a genetic-algorithm-based clustering approach [
The literature on relatedness maps generated with LSA and fuzzy AHP was reviewed. There is a lack of such studies: no comparison of an international framework such as Cobit 2019 with a regulation using the LSA method has been found. In addition, no study in the literature was found that supports the LSA method with the fuzzy AHP method.
This study aims to produce a scientific similarity map of Cobit 2019 and the related regulation using the LSA method. The results are then expressed more strongly using the fuzzy AHP method. Accordingly, the hypothesis of our study can be stated as follows: the LSA method can be used for regulation-to-best-practice mapping; the fuzzy AHP method determines the most important similarity criteria; and the WASPAS ranking of randomly selected Cobit 2019 and regulation statement pairs is aligned with the LSA results. Finally, we verified that the created relatedness map is consistent.
In the third section we present all equations used in LSA, FAHP, and WASPAS; in the fourth section we implement all methods and report the application results; and in the last section we conclude and give recommendations.
LSA is a method that retrieves relevant data from a large document collection for a particular query, allowing us to detect similarities between data. The query process has several development routes, including keyword matching, weighted keyword matching, and vector-based relationships built from word co-occurrence [
LSA extracts information from words or phrases that frequently appear together across different sentences. If sentences in the specified database share more than one word group, the sentences are taken to have related meaning [
The database of the text to be processed is determined and separated into documents. Since each item in the database is internally consistent, each item is processed as a separate document. This stage transforms the texts in the unstructured database into structured data.
At this stage, the document-term matrix is created from the structured documents and terms. Each cell of this matrix of rows and columns records how many times the specified term occurs in the specified document. Each row represents a term (word root), while each column represents a document.
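The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual pipeline: `build_term_document_matrix` is a hypothetical helper, and the naive whitespace tokenizer stands in for the fuller preprocessing described later.

```python
from collections import Counter

def build_term_document_matrix(documents):
    """Build a term-document count matrix: rows are terms, columns are documents."""
    # Tokenize each document naively on whitespace (a real pipeline also
    # lower-cases, stems, and drops stop words before counting).
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    # matrix[i][j] = frequency of term i in document j
    matrix = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, matrix

# Three toy "documents"; in the study these would be regulation articles
# and Cobit action statements.
docs = ["the bank manages risk", "risk management activities", "the bank reports"]
vocab, m = build_term_document_matrix(docs)
```

The row for "risk" comes out as `[1, 1, 0]`: the term appears once in each of the first two documents and not in the third, which is exactly the co-occurrence information the later SVD step compresses.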
The document-term matrix determined in the previous stage is decomposed into three matrices at this stage: the left singular vector matrix (U), the singular value matrix (S), and the right singular vector matrix (V). It can be expressed by the following
As shown in
As stated in the previous equations, each letter corresponds to one matrix of the decomposition. The S matrix in
As expressed in
After that, SVD can define and reduce the dimensions that express which variations appear in the text and how often. At this stage, SVD takes the term-document matrix as input, and the resulting SVD vectors can be used to calculate similarity [
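The decomposition and truncation described above can be summarized symbolically. This is the standard rank-$k$ SVD approximation used in LSA, written with the $U$, $S$, $V$ notation of the preceding paragraphs:

```latex
A = U S V^{T}, \qquad A \approx A_k = U_k S_k V_k^{T}
```

Keeping only the $k$ largest singular values in $S_k$ (and the corresponding columns of $U$ and $V$) filters out noise and leaves the latent dimensions on which similarity is computed.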
This similarity is used to calculate the value between the document vector and the term vector in a database [
As expressed in the equation, A is the document vector and B is the term vector; vector B can be thought of as the input compared against vector A. |A| is the length of vector A and |B| is the length of vector B. The numerator is the dot product of A and B, the denominator is the product |A||B|, and α can be interpreted as the angle between vector A and vector B [
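The cosine measure described above is small enough to sketch directly. `cosine_similarity` is a hypothetical helper over plain numeric vectors; in LSA these vectors would come from the reduced SVD space.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: (a . b) / (|a| |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat as unrelated
    return dot / (norm_a * norm_b)

# Parallel vectors score close to 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1, 2, 3], [2, 4, 6]))
print(cosine_similarity([1, 0], [0, 1]))
```

This matches the 0-to-1 relatedness values reported later: 0 when no relatedness is found and values near 1 when two statements are nearly identical.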
The Analytic Hierarchy Process (AHP) is one of the most studied multi-criteria decision-making approaches. It is a general theory of measurement and the most widely applied method in multi-criteria decision making, planning and resource allocation, and conflict resolution [
The LSA method at the core of our study reveals the relationship between Cobit 2019 items, which express control objectives in information technology, and the BRSA legislation, which expresses the supervision and regulation statements of the Turkish banking sector, and creates a similarity map. We aim to express the resulting analysis in a stronger way; therefore, we used the fuzzy AHP method, which is frequently used in recent studies and allows experts to weight the criteria expressed in
Symbol | Criteria |
---|---|
C1 | Area similarity |
C2 | Main domain similarity |
C3 | Subdomain similarity |
C4 | Activity/Article subject similarity |
C5 | Objective similarity |
The reason we used the fuzzy version of the AHP method in our study is that it expresses the opinions of expert decision-makers more objectively than classical AHP. Triangular fuzzy numbers, which allow decision-makers to express their judgments more accurately, were used in this version. These numbers are indicated in
Rank | Linguistic term | Triangular fuzzy number |
---|---|---|
1 | Absolutely low importance | (1, 1, 2) |
2 | Essentially low importance | (1, 2, 3) |
3 | Weakly low importance | (2, 3, 4) |
4 | Equally low importance | (3, 4, 5) |
5 | Exactly equal | (4, 5, 6) |
6 | Equally high importance | (5, 6, 7) |
7 | Weakly high importance | (6, 7, 8) |
8 | Essentially high importance | (7, 8, 9) |
9 | Absolutely high importance | (8, 9, 9) |
To apply the FAHP method, the opinions of five decision-makers who know both Cobit 2019 and the regulation well were collected. Our sample was chosen from a Turkish bank, where these decision-makers constitute the compliance office. The application steps of the method are expressed as follows; due to space constraints, not all equations are shown.
The resulting weight vector W is no longer a fuzzy number after normalization [
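The final step mentioned above, turning fuzzy weights into a crisp normalized weight vector W, can be sketched as follows. The centre-of-area defuzzification used here is a common choice, but the paper's exact equations are not reproduced in this excerpt, so both the method and the helper name `defuzzify_and_normalize` are assumptions for illustration.

```python
def defuzzify_and_normalize(fuzzy_weights):
    """Defuzzify triangular fuzzy weights (l, m, u) by the centre-of-area
    method, then normalize so the crisp weights sum to 1."""
    crisp = [(l + m + u) / 3.0 for (l, m, u) in fuzzy_weights]
    total = sum(crisp)
    return [w / total for w in crisp]

# Hypothetical fuzzy weights for three criteria (not the study's values).
weights = defuzzify_and_normalize([(0.1, 0.2, 0.3), (0.2, 0.3, 0.4), (0.4, 0.5, 0.6)])
```

After this step the weight vector contains ordinary real numbers summing to 1, matching the statement that W is no longer fuzzy after normalization.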
Using the criterion weights obtained with the fuzzy AHP method, 25 regulation statements were randomly selected from the matrix created by the one-to-many LSA method: 15 were paired with their highest-related Cobit 2019 activities and 10 with their lowest-related Cobit 2019 activities, and bilateral evaluation documents were created. The experts who determined the criteria in FAHP also completed WASPAS (Weighted Aggregated Sum Product Assessment) surveys, scoring these 25 regulation pairs on a scale from 1 to 9. Our aim is to determine whether the highest and lowest similarity rates determined by LSA are parallel with the WASPAS results, making it possible to compare the LSA method against another method.
The WASPAS method was proposed by Zavadskas et al. [
Here, λ is the combined optimality coefficient and λ ∈ (0, 1). When the Weighted Sum Model and Weighted Product Model approaches are given equal effect on the combined optimality criterion, λ = 0.5 is taken. Each alternative is then ranked by its combined optimality value
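The combined optimality value just described can be sketched directly. `waspas_scores` is a hypothetical helper, and the matrix below is illustrative, not the study's survey data; it assumes the decision matrix is already normalized and all criteria are benefit-type.

```python
def waspas_scores(decision_matrix, weights, lam=0.5):
    """WASPAS combined optimality: Q = lam * WSM + (1 - lam) * WPM.
    Rows are alternatives; values are normalized benefit scores in (0, 1]."""
    scores = []
    for row in decision_matrix:
        wsm = sum(w * x for w, x in zip(weights, row))  # Weighted Sum Model
        wpm = 1.0
        for w, x in zip(weights, row):                  # Weighted Product Model
            wpm *= x ** w
        scores.append(lam * wsm + (1 - lam) * wpm)
    return scores

# Two hypothetical alternatives on three equally weighted criteria;
# lam = 0.5 gives WSM and WPM equal effect, as in the study.
q = waspas_scores([[0.9, 0.8, 1.0], [0.3, 0.4, 0.2]], [1/3, 1/3, 1/3])
```

Ranking the alternatives by descending Q reproduces the kind of ordering shown in the results table: strongly rated pairs score near 1, weakly rated pairs near 0.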
The starting point of LSA is a text collection; within the text material, paragraphs are usually split up to create documents. The paragraphs saved in the documents carry information about word relationships and can be represented abstractly in a frequency matrix, where the columns contain the individual documents and the rows contain the different words. In our research, every regulation action and every Cobit practice creates a document and its terms. We use the one-to-many approach of the University of Colorado LSA application, defined in Landauer and Dumais's paper [
The frequency with which a word occurs in a specific document indicates relatedness. Since LSA uses large corpora of natural language, this frequency matrix is very sparse. The frequency matrix already contains all the information about word relationships, but documents are usually too big and carry unnecessary information ("noise"). Eliminating this background noise means reducing the information in the frequency matrix to its core content. The essential steps are filtering potentially superfluous words, applying weighting functions to the cell frequencies, performing singular value decomposition, and determining the optimal number of dimensions [
In the first step, potentially unnecessary words are identified and excluded. These include high-frequency words that do not convey specific information, as well as words that appear very rarely, for example fewer than three times in the entire text corpus. This clearly reduces the number of distinct words and increases the quality of the documents.
Our documents contain many IT-related abbreviations, which we expanded with the help of a word processor; for example, we changed all occurrences of the abbreviation "IT" to "information technology". Cobit uses many shorthand expressions such as "e.g.,", "ex.", and "aka.", as well as abbreviations like "I&T", hyphenated forms like "decision-making" and "IT-enabled", and possessives like "enterprise's" and "stakeholder's"; apostrophe and hyphen forms were converted. Cobit also gives many examples in brackets, and these were deleted: an example can be off topic, and even when it is related to the document it can shift the main area, direction, and relatedness. We therefore cleared these examples from the regulation actions and Cobit practices. This data preparation increases the similarity ratio where relatedness exists. On the other side, our sample regulation contains some country-specific information and some unrelated statements such as revision purposes, the basis of the legislation, definitions, abbreviations, and final provisions.
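The clean-up described above can be sketched as a small normalization pass. Both `clean_statement` and the `EXPANSIONS` table are hypothetical, with only two entries shown; the study's actual mapping covers many more abbreviations.

```python
import re

# Hypothetical expansion table mirroring the clean-up described above;
# the real mapping used in the study is much larger.
EXPANSIONS = {
    "I&T": "information and technology",
    "IT": "information technology",
}

def clean_statement(text):
    """Drop bracketed examples, split hyphen/apostrophe forms, and expand
    abbreviations before a statement enters the LSA corpus."""
    text = re.sub(r"\([^)]*\)", " ", text)           # remove bracketed examples
    text = text.replace("-", " ").replace("'", " ")  # split hyphen/apostrophe forms
    # Expand before lower-casing so the pronoun "it" is left untouched.
    words = [EXPANSIONS.get(w, w) for w in text.split()]
    return " ".join(words).lower()

print(clean_statement("IT-enabled (e.g., DevOps) decision-making"))
# prints "information technology enabled decision making"
```

Normalizing both corpora the same way is what makes the later cosine comparisons meaningful: "IT-enabled" in Cobit and "information technology" in the regulation end up as matching terms.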
Cobit has 5 main domains and 40 processes (subdomains); these 40 processes contain 231 practices, and every practice has many actions that serve as advice for complying with Cobit. In total, Cobit has 1202 actions. We took these 1202 actions and the 224 articles of BRSA's regulation on banks' information systems and electronic banking services and performed a one-to-many latent semantic analysis to obtain a relatedness map. With this map, we intended to support evaluation in two directions, depicted in
While applying LSA, the similarity map was created by choosing Cobit 2019 as the main text and the relevant regulation articles as the comparison text. The opposite direction was then applied, and the results were merged across the database matrices.
After coding our regulation and the Cobit 2019 actions, we analyzed every article against the 1202 Cobit actions and created a 224×1202 matrix. LSA gave us each article's relatedness ratio to each action of the selected Cobit domains' practices. Values vary from 0 to 1: 0 indicates that no relatedness was found, and 1 indicates that the texts are identical. After analyzing the relatedness of all articles, we stored the data in a Power BI table and created the heat map depicted in
To present our LSA results meaningfully, we created a Power BI application with Power Query. The dataset comes from the analysis of the relation map between our regulation articles and the Cobit actions. The data model is shown in
The criterion weights obtained after the applied fuzzy AHP steps are expressed in
Symbol | Criteria | Weight | Rank |
---|---|---|---|
C1 | Area similarity | 0.0086 | 5 |
C2 | Main domain similarity | 0.0296 | 4 |
C3 | Subdomain similarity | 0.0677 | 3 |
C4 | Activity/Article subject similarity | 0.3060 | 2 |
C5 | Objective similarity | 0.5882 | 1 |
After the LSA method was applied, the fuzzy AHP method was used to strengthen our study. According to its results, Activity/Article Subject Similarity (C4) and Objective Similarity (C5) received the highest criterion weights. This means that, in the opinion of the 5 experts who know all the regulations well, the similarity between Cobit 2019 and the related regulation items is driven 58 percent by similarity of objective and 30 percent by similarity of activity and subject. These results also support our LSA method.
Ranking results are expressed in
According to the results in
Symbol | WASPAS value | Rank | LSA results |
---|---|---|---|
Q12 (+) | 0.97 | 1 | 0.92 |
Q4 (+) | 0.93 | 2 | 0.91 |
Q2 (+) | 0.93 | 3 | 0.90 |
Q10 (+) | 0.93 | 4 | 0.89 |
Q7 (+) | 0.92 | 5 | 0.86 |
Q1 (+) | 0.92 | 6 | 0.90 |
Q13 (+) | 0.91 | 7 | 0.87 |
Q8 (+) | 0.91 | 8 | 0.85 |
Q5 (+) | 0.90 | 9 | 0.86 |
Q6 (+) | 0.90 | 10 | 0.89 |
Q11 (+) | 0.90 | 11 | 0.92 |
Q3 (+) | 0.89 | 12 | 0.90 |
Q14 (+) | 0.87 | 13 | 0.91 |
Q9 (+) | 0.86 | 14 | 0.90 |
Q16 (-) | 0.28 | 15 | 0.26 |
Q18 (-) | 0.27 | 16 | 0.31 |
Q17 (-) | 0.26 | 17 | 0.32 |
Q19 (-) | 0.25 | 18 | 0.29 |
Q24 (-) | 0.24 | 19 | 0.27 |
Q21 (-) | 0.23 | 20 | 0.30 |
Q15 (-) | 0.23 | 21 | 0.25 |
Q22 (-) | 0.22 | 22 | 0.33 |
Q20 (-) | 0.19 | 23 | 0.32 |
Q25 (-) | 0.19 | 24 | 0.28 |
Q23 (-) | 0.19 | 25 | 0.29 |
Nowadays, businesses operate in highly regulated areas, and their threat landscape is generally severe. To govern information technology, they should create mechanisms for effective oversight of IT operations, and to create effective governance, businesses use best practices and regulations. Best practice compliance is not as binding as regulatory compliance, but to become more mature in both areas, businesses must comply with regulations and with selected best practices. Creating a relevance map between a regulation and a best practice therefore creates value for practitioners. Compliance officers and governing bodies are always trying to find the most relevant path between related best practices in order to easily understand, generalize, or exemplify regulation statements; without such a map, this complexity can produce wrong interpretations of specific issues in a regulation. Another shortcoming of existing best practice mappings is that best practice owners create relatedness only at the master domain level, which is not enough for governing bodies: they need to understand not only master domains but also areas, subdomains, objectives, and practices.
Our LSA-based relatedness method facilitates mapping from regulation to best practice and from best practice to regulation. Searching for similarity between all practices and every statement creates a large dataset for analysis, which can then be used to find the best practice activities related to any selected regulation statement. The most challenging step is cleaning the documents to get valid results. LSA's one-to-many approach takes every regulation statement one by one and computes its relation to every practice (in our sample, 1202 Cobit statements) with the cosine similarity method. With the help of the LSA method, we obtain every regulation statement's relatedness percentage to any Cobit practice; practitioners can also use the transpose of these matrices to obtain a Cobit practice's relatedness to any regulation statement. In our sample, we created a Power BI report to easily apply this two-way analysis.
Finally, we used the FAHP and WASPAS methods to verify our relatedness matrices. Our results show that the LSA method creates valid similarities and is usable for mapping regulations to best practices and creating relatedness matrices. Creating a relatedness map between a regulation and a best practice in an objective way is possible, and LSA provides the consistency to do so. The sample bank can easily adapt its processes to find the regulation articles related to a Cobit activity, or the Cobit activities related to a regulation article. Moreover, not only our sample bank but all sector representatives can use this model, since they are all subject to the same regulations and best practices.
After creating an LSA-based relatedness map, a business should create a project or program that addresses the need to increase the maturity level of its compliance process. Such a project may focus on techniques for assessing results against the related regulation and best practices, under the sponsorship of the governing body. In this way, regulatory findings, issues, and risks can be addressed and solved in a collaborative environment.
Our research covers the relatedness of two references: Cobit and BRSA's IT-related regulation. Businesses also want to implement other standards, such as ISO 27001 or NIST (National Institute of Standards and Technology) standards, and there are many regulations to comply with. In the future, researchers can use our LSA-based method to create a relatedness map for any standard or regulation pair, and existing ready-to-use relatedness maps can be bridged with ours.