The rapid advancement of information technology has enabled the quick discovery, sharing, and collection of quality information, but it has also been accompanied by a rapid increase in cyberattacks. No means exists to block these cyberattacks completely, so every security policy must account for the possibility of external attacks, and it is crucial to reduce them through preventative measures. Because routers located upstream of a firewall can hardly be protected by security systems, they are exposed to numerous unblocked cyberattacks. Routers block unnecessary services and accept necessary ones while taking appropriate measures to reduce vulnerability, block unauthorized access, and generate relevant logs. Most logs created through unauthorized access are caused by SSH brute-force attacks, so the IP data of the attacks can be collected through the logs. This paper proposes a model that detects SSH brute-force attacks through their logs, collects the attacking IP addresses, and controls access from those addresses. The model extracts and fragments the specific data required from the data collected on routers in order to detect indiscriminate SSH login attempts. To do so, it multiplies the access records associated with each IP by weights and adds the IP to the blacklist according to the final calculated value. In addition, the model can specify the internal IP of an attack attempt and defend against the first 29 destination IP addresses attempting the attack.
In recent years, information technology has progressed at an unprecedented pace. This has enabled us to quickly find, collect, and share quality data, but it has also brought a growing number of cyberattacks that steal and forge data during data communication.
Known as one of the most important remote access protocols, Secure Shell (SSH) is used to log into another computer over a network, execute commands on a remote system, copy files, and perform various other functions.
In general, network systems should be connected directly through a console for their initial setup, but in operation they are managed through remote access over SSH. While the user who remotely accesses a system that allows SSH service is typically a manager, it cannot be assumed that all access requests come from managers. Thus, permitting remote access requests from all origins may expose a system to threats, and it is important to allow remote SSH access only from manager IPs and to establish appropriate policies for the start and end points of the firewall.
In brute-force attacks, the attacker submits all possible values as account inputs in an attempt to obtain the system’s account information. Methods employed in brute-force attacks are divided into dictionary attack methods, which try all strings in a pre-arranged listing, and random sequence methods, which try all possible string combinations in a sequence.
Controlling accessing IPs by a blacklist prepared through security log analysis is useful to defend against random attacks. This section provides various study cases in which attackers are specified through a blacklist.
Dooyong mentioned that a blacklist of IP addresses is an important element for IT system protection. While various aspects of data and their operation records must be thoroughly reviewed before IP addresses are blacklisted, the majority of current IP blacklists rely on security monitoring by skilled experts in specific domains. To solve this problem, he designed and established a blacklist model adopting machine learning (ML), and arranged and sorted data through logistic regression analysis and a random forest; this reduced incorrectly set blacklist entries by 90%.
Meanwhile, Dmytro warned that while blacklists of IP addresses and domains exist as components of various security systems, they are difficult to apply to certain risks such as zero-day attacks. He estimated the accuracy of the model’s prediction by crossing IPs in the blacklists with a blacklist dataset containing about 270,200 unique blacklisted IPs. His method was more effective than another recent blacklist prediction method.
Rick emphasized the risk of brute-force attacks against web applications and the damage they cause, and noted that attack detection relied on server log file analysis, host-based intrusion detection systems, or firewalls, each of which has several relevant problems. Further, he investigated the feasibility of a network-based monitoring approach that detects brute-force attacks against web applications in an encrypted environment. He then analyzed brute-force attacks through histograms of data packet payload sizes based on IP Flow Information Export (IPFIX).
This research designed an access control policy to detect IPs approaching a system maliciously through the logs generated from a router, one of the devices comprising IT infrastructure, and prepared a blacklist of IPs with higher risk levels to protect the system from malicious access.
A blacklist refers to a specific problematic IP group and to the technology applied to block specific IPs.
To identify IP groups suspected of accessing the system through a brute-force attack, we needed to obtain SSH brute-force attack logs from the router, and fragment and analyze their content. The IPs that had a high frequency of attacks were listed to be used in the blocking policy of routers and other security equipment. The detection model proposed in this paper aims to collect logs, fragment the collected logs to analyze high-risk access IPs, detect SSH brute-force attacks, and defend against them by managing the detected attackers’ IPs using blacklist techniques. If IPs that attempt malicious access can be obtained continuously, they can be used to defend against future brute-force attacks, thereby enhancing the security of IT infrastructure. In addition, the proposed method can block malicious attacks while preventing channels from concealing their IP address.
To develop an access control policy based on logs from routers, this study established hypotheses based on the research model and verified the implementation of the proposed SSH brute-force attack detection model by testing each hypothesis. The research hypotheses for the proposed model are as follows:
1. Identification of continuous and malicious access to the Internet router
2. Specification and detection of unauthorized access
3. Fragmentation of unauthorized attack logs
First, a hypothesis was established to verify whether there is continuous and malicious access to the router and attacks against the target system. Then, to collect attack logs by identifying unauthorized access among the logs of attacks against the router, it was assumed that unauthorized access can be specified and detected by identifying sources with an excessive number of SSH access attempts. The last hypothesis assumes that the data to be used for a blacklist can be extracted by removing repeated messages from the logs and fragmenting only the necessary data.
The research subject is a router deployed without a separate protection system, since it is connected to the network ahead of the security equipment, which is installed in a fail-safe way. Routers are equipped with no separate network or access point to ensure connection with the external network. Thus, if there is no change in the network, few logs are generated, and those that are generated mainly record the administrator’s access.
Classification | 19/4 | 19/5 | 19/6 | 19/7 | 19/8 | 19/9 | 19/10 | 19/11 | 19/12 | 20/1 | 20/2 | 20/3 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
R1 | 6187 | 4856 | 9988 | 6864 | 3780 | 5763 | 8592 | 15464 | 16912 | 52218 | 18141 | 6776 |
R2 | 6226 | 4387 | 11160 | 6055 | 3339 | 6235 | 8731 | 15318 | 16942 | 50312 | 18454 | 6966 |
The analysis results of the Internet router logs are as follows. “SSH user failed to log in from” means that SSH access was attempted but failed, and the requester is identified by the IP address that follows. “on VTY0 due to IP restriction” means that the failure occurred because of an IP access restriction; in other words, the SSH user at the source IP was not able to log into VTY0 due to the IP restriction, where VTY refers to a virtual terminal line used to access the router interface.
Unauthorized access to the Internet router refers to access that arrives normally through SSH but is not authorized on the target system. The Internet router only allows SSH access from the registered administrator IP, blocks all other access, and generates relevant logs. These logs make up a large portion of all logs, and the unauthorized SSH access attempts result from brute-force attacks that try to identify the account information of the target system’s Internet router.
Since unauthorized access by IPs to the Internet router can be specified through the number of access attempts or access log messages and accounts for most of the logs, these logs can be used as security logs. Furthermore, these access attempts fall under SSH brute-force attacks and their IP information can be collected through logs.
Logs of SSH brute-force attacks collected in the Internet router are shown as follows:
“Apr 1 00:01:07 2019 router-1 %%10SHELL/5/SHELL_LOGINFAIL(l): SSH user failed to log in from 11*.17*.5*.88 on VTY0 due to IP restriction.”
The above log can be analyzed as follows. “Apr 1 00:01:07 2019” gives the month, day, time, and year, respectively; “router-1” is the hostname; and the remainder is the SSH access failure message analyzed previously. The log can therefore be rearranged into six parts: month, day, time, year, hostname, and message. The hostname and time information (month, day, time, and year) reveal when and on which equipment the SSH access was attempted. Because every part of the message except the IP is repeated from log to log, deleting the repeated part shortens the log and leaves it in fragmented form. The above log is thus reduced to “Apr 1 00:01:07 2019 router-1 11*.17*.5*.88”, which can be saved in various file formats.
Month | Day | Time | Year | Hostname | IP |
---|---|---|---|---|---|
Apr | 1 | 0:01:07 | 2019 | Router-1 | 11*.17*.5*.88 |
Apr | 1 | 0:01:45 | 2019 | Router-1 | 10*.20*.3*.84 |
Apr | 1 | 0:03:09 | 2019 | Router-1 | 15*.23*.24*.63 |
Apr | 1 | 0:05:25 | 2019 | Router-1 | 15*.23*.24*.63 |
Apr | 1 | 0:05:56 | 2019 | Router-1 | 11*.17*.5*.88 |
Apr | 1 | 0:10:44 | 2019 | Router-1 | 11*.17*.5*.88 |
Apr | 1 | 0:12:00 | 2019 | Router-1 | 15*.23*.24*.63 |
Apr | 1 | 0:14:11 | 2019 | Router-1 | 15*.23*.24*.63 |
Apr | 1 | 0:15:33 | 2019 | Router-1 | 11*.17*.5*.88 |
The fragmented data of the log can be exported into a CSV file, through which the number of access attempts by time, day, year, and IP can be easily identified. Moreover, by adding a country field and registering the country code associated with each IP, the country from which attacks were attempted can be specified.
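The fragmentation and CSV export described above can be sketched in Python. This is a minimal illustration, not the procedure used in the study: the regular expression, the function names `fragment` and `to_csv`, and the sample address `203.0.113.88` (a documentation-range IP standing in for the masked addresses) are all assumptions.

```python
import csv
import io
import re

# Pattern for the login-failure message quoted in the text.
# The group names are illustrative choices, not part of the original study.
LOG_PATTERN = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<year>\d{4})\s+(?P<hostname>\S+)\s+.*"
    r"SSH user failed to log in from (?P<ip>\S+) on VTY0 due to IP restriction\.?$"
)

FIELDS = ["month", "day", "time", "year", "hostname", "ip"]


def fragment(line):
    """Return the six fields of a login-failure log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def to_csv(lines):
    """Fragment every matching line and return the result as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for line in lines:
        row = fragment(line)
        if row is not None:
            writer.writerow(row)
    return buffer.getvalue()


# Hypothetical sample: 203.0.113.88 replaces the masked IP of the original log.
sample = (
    "Apr 1 00:01:07 2019 router-1 %%10SHELL/5/SHELL_LOGINFAIL(l): "
    "SSH user failed to log in from 203.0.113.88 on VTY0 due to IP restriction."
)
print(to_csv([sample]))
```

A country field could be appended to `FIELDS` in the same way once a lookup source for country codes is chosen; none is assumed here.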
We study a blacklist access control policy against SSH brute-force attacks by analyzing source logs from the Internet router, extracting logs of SSH brute-force attacks only, and fragmenting them. This requires the extraction of logs related to SSH brute-force attacks caused by unauthorized access from source logs. Source logs are arranged in a text file, and a function is applied to classify them into two types: SSH brute-force attack logs, and other logs. Then, repeated strings are eliminated from the SSH brute-force attack logs and only the necessary data are left, which are then exported to a CSV file. IPs to be included in the blacklist are identified and used for the ACL of the Internet router and the blocking policy of security equipment.
A log document composed of one column and as many rows as the number of logs generated can be utilized to detect SSH brute-force attacks. To do this, a filter is used to fragment the log content. To process logs, the initial log information is first classified by applying two functions: (1) the FIND function, which returns the starting position of a search string within the text of a cell; and (2) the ISNUMBER function, which returns TRUE when its argument is a number and FALSE otherwise. The FIND function is used to locate the starting point of a message in a log, while the ISNUMBER function converts that result into TRUE or FALSE, so that TRUE indicates a message related to SSH brute-force attacks. Together, these enable the extraction of brute-force attack logs only. The strings indicating SSH brute-force attacks are “SSH user failed to log in from” and “on VTY0 due to IP restriction.” The result is TRUE only when both strings are found in a log.
Once the raw log information is determined as either TRUE or FALSE through the aforementioned functions, the logs related to SSH brute-force attacks are extracted through filtering. The Internet router generates from about 3,000 to more than 50,000 logs per month. Extracting the system logs from such a large number of logs can be cumbersome, as only specific logs must be extracted. However, when only the logs not related to SSH brute-force attacks are extracted, the number of logs is reduced drastically, which allows users to identify system logs more easily.
Result | Raw log information |
---|---|
FALSE | Apr 23 12:13:48 2019 router-1 %%10SSH/4/TrapLogoff(t): 1.3.6.1.4.1.25506.2.22.1.3.0.4 SSH user logoff trap information |
TRUE | Apr 23 12:38:26 2019 router-1 %%10SHELL/5/SHELL_LOGINFAIL(l): SSH user failed to log in from 5*.21*.12*.66 on VTY0 due to IP restriction. |
TRUE | Apr 23 12:50:28 2019 router-1 %%10SHELL/5/SHELL_LOGINFAIL(l): SSH user failed to log in from 6*.24*.20*.20 on VTY0 due to IP restriction. |
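The TRUE/FALSE classification shown above can be reproduced outside a spreadsheet. The following Python sketch is one possible equivalent of applying ISNUMBER to FIND on both marker strings; the function names and the sample address `198.51.100.66` (a documentation-range IP) are hypothetical.

```python
# The two marker strings are quoted from the text.
MARKER_START = "SSH user failed to log in from"
MARKER_END = "on VTY0 due to IP restriction"


def is_bruteforce_log(line):
    """Return True only if both marker strings appear in the log line."""
    return MARKER_START in line and MARKER_END in line


def split_logs(lines):
    """Return (brute_force_logs, other_logs), preserving the input order."""
    attacks, others = [], []
    for line in lines:
        (attacks if is_bruteforce_log(line) else others).append(line)
    return attacks, others


# Hypothetical input: the IP is a documentation-range address, not a real attacker.
raw_logs = [
    "Apr 23 12:13:48 2019 router-1 %%10SSH/4/TrapLogoff(t): "
    "1.3.6.1.4.1.25506.2.22.1.3.0.4 SSH user logoff trap information",
    "Apr 23 12:38:26 2019 router-1 %%10SHELL/5/SHELL_LOGINFAIL(l): "
    "SSH user failed to log in from 198.51.100.66 on VTY0 due to IP restriction.",
]
attack_logs, system_logs = split_logs(raw_logs)
print(len(attack_logs), len(system_logs))
```

Keeping the non-matching lines in a second list reflects the point made in the text: once the attack logs are set aside, the remaining system logs are few enough to inspect directly.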
The fragmentation process for the extracted logs of SSH brute-force attacks is simple because the pattern of all logs is set and the row can be classified by meaning. In addition, log information can be fragmented by eliminating the two phrases referring to the SSH brute-force attacks, “SSH user failed to log in from” and “on VTY0 due to IP restriction” and extracting the IP data.
The information for analysis undergoes classification and fragmentation and is then exported to a CSV file. Log analysis is performed by dividing the log content into month, day, time, year, hostname, and IP address. The entered logs are sorted sequentially by one column, and each resulting group is then subdivided by the next column.
For a log fragmented as above, the month field selects the Apr entries and generates a log counter, and the Apr entries are then divided by day to obtain the number of logs generated per day. The count increases by 1 each time the same IP is found among the logs of a month, which gives the number of access requests made by that IP over the entire month. Among the classified logs, the IP address 11*.17*.5*.88 was found to account for 51.87% of the total logs. The collected IP addresses can thus be used to identify the sources of access requests and, most importantly, the number of access attempts per IP, which indicates how malicious the access is. When the counts are sorted by day for the IPs with a high risk of repeated intrusion, attack patterns by date can also be analyzed.
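The per-IP and per-day counters described above can be sketched with Python's `collections.Counter`. The row layout (dictionaries keyed by field name) and the sample addresses are assumptions made for illustration; the figures printed here are unrelated to the 51.87% reported in the study.

```python
from collections import Counter


def attempts_per_ip(rows):
    """Count how many fragmented log rows each source IP produced."""
    return Counter(row["ip"] for row in rows)


def attempts_per_day(rows):
    """Count fragmented log rows per (month, day) pair."""
    return Counter((row["month"], row["day"]) for row in rows)


def share_percent(counts, ip):
    """Return the percentage of all counted rows attributed to one IP."""
    total = sum(counts.values())
    return 100.0 * counts[ip] / total if total else 0.0


# Hypothetical rows using documentation-range addresses in place of masked IPs.
rows = [
    {"month": "Apr", "day": "1", "ip": "203.0.113.88"},
    {"month": "Apr", "day": "1", "ip": "198.51.100.84"},
    {"month": "Apr", "day": "1", "ip": "203.0.113.88"},
    {"month": "Apr", "day": "2", "ip": "203.0.113.88"},
]
counts = attempts_per_ip(rows)
print(counts.most_common(1))
print(share_percent(counts, "203.0.113.88"))
print(attempts_per_day(rows))
```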
While SSH brute-force attacks occurred with high frequency, the analysis of attacking patterns by date confirmed that the attacks took place over a few days rather than in a single day. These attacks are more malicious than those occurring in a single day and must be blocked since they occur continuously.
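Whether an IP attacks on a single day or over several days can be checked by counting the distinct dates on which it appears. The sketch below assumes the same hypothetical row layout as the fragmented table (year, month, day, and IP fields) and uses documentation-range addresses.

```python
from collections import defaultdict


def detection_days(rows):
    """Map each IP to the number of distinct calendar days on which it appears."""
    days = defaultdict(set)
    for row in rows:
        days[row["ip"]].add((row["year"], row["month"], row["day"]))
    return {ip: len(dates) for ip, dates in days.items()}


# Hypothetical rows: one IP returns on three days, the other appears once.
rows = [
    {"year": "2019", "month": "Apr", "day": "1", "ip": "203.0.113.88"},
    {"year": "2019", "month": "Apr", "day": "1", "ip": "203.0.113.88"},
    {"year": "2019", "month": "Apr", "day": "2", "ip": "203.0.113.88"},
    {"year": "2019", "month": "Apr", "day": "5", "ip": "203.0.113.88"},
    {"year": "2019", "month": "Apr", "day": "2", "ip": "198.51.100.84"},
]
print(detection_days(rows))
```

Using a set per IP means repeated attempts on the same date count once, so the result measures continuity rather than volume.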
Although all IPs carrying out SSH brute-force attacks can be assumed to be malicious hosts, designating and blocking every attacking IP as a blacklisted IP may be an ineffective policy. Some attacking IPs attack only once and show no intention of attacking again. Thus, the frequency of attack is an important factor to consider in establishing the intention of an attack. The number of days on which attacks occur is also a significant factor, as it indicates the continuity of attacks. A time-based classification can provide more details, such as whether the attacks occur during business hours or over holidays, and the country from which the attack originates can be determined from the IP.
Division | Characteristic |
---|---|
IP identification | SSH brute-force attack |
Attack strength | Blocking frequency by IP |
Attack timing | Number of days of detection |
Attack position | Country/International |
When selecting blacklisted IPs, those with one-time attacks should be excluded. A total of 419 IP addresses were involved in the attacks against the router. The IP with the highest frequency carried out 3,171 attacks, clearly demonstrating its intention to attack. By contrast, 405 IPs, i.e., 96% of the total, carried out fewer than 10 attacks each; because it was difficult to determine whether they intended to attack further, they were not classified as blacklisted IPs, which also made the analysis faster. Other parameters can be calculated to establish the allowable criteria to determine blacklisted IPs. Three characteristics of an attack are defined as x, and whether an IP is classified as blacklisted is y. Formula 1 is used to identify a blacklisted IP.
Formula 1: $y = x_1 \times x_2 \times x_3$
Here, $x_1$ is the frequency of attacks; for $n$ attacks, $x_1 = n \times 0.01$.
$x_2$ denotes the number of days detected; for $m$ days, $x_2 = m \times 0.1$.
$x_3$ denotes the location of the attack, which is either a specific country or international. The weighted values are 1 and 1.1 for country and international, respectively, so $x_3 = 1$ for an attack from within the country and $x_3 = 1.1$ for an international attack.
To illustrate the use of these formulas, we assumed that 100 SSH brute-force attacks were made against the Internet router in one day from within a country. The relevant formula is $(100 \times 0.01) \times (1 \times 0.1) \times 1$, and the value of 0.1 is returned. For an IP to be determined as a blacklisted IP, it must meet the requirements of at least 100 attacks, 2 or more detection days, and an international location, which corresponds to a rating of $(100 \times 0.01) \times (2 \times 0.1) \times 1.1 = 0.22$.
IP | Attack frequency ($x_1$) | Detection days ($x_2$) | Location (Country 1 / International 1.1) | Blacklist rating | Blacklist status |
---|---|---|---|---|---|
a | 1 | 0.2 | 1.1 | 0.22 | Blacklist |
b | 1.32 | 0.1 | 1 | 0.132 | Suspect |
c | 8 | 0.1 | 1 | 0.8 | Blacklist |
d | 0.9 | 0.7 | 1.1 | 0.69 | Blacklist |
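The ratings in the table can be recomputed with a short Python sketch. The weights (0.01 per attack, 0.1 per detection day, 1 or 1.1 for location) come from the text; the cutoff of 0.22 is an assumption inferred from the stated minimum requirements (100 attacks, 2 detection days, international origin), and the raw attack and day counts below are back-calculated from the weighted values in rows a to d rather than taken from the study's data.

```python
from decimal import Decimal

FREQUENCY_WEIGHT = Decimal("0.01")  # weight per attack, from the text
DAY_WEIGHT = Decimal("0.1")         # weight per detection day, from the text
LOCATION_WEIGHT = {"country": Decimal("1"), "international": Decimal("1.1")}

# Assumption: cutoff inferred from 100 attacks x 2 days x international = 0.22.
BLACKLIST_THRESHOLD = Decimal("0.22")


def blacklist_rating(attacks, days, location):
    """Return the product of the three weighted characteristics for one IP."""
    x1 = Decimal(attacks) * FREQUENCY_WEIGHT
    x2 = Decimal(days) * DAY_WEIGHT
    x3 = LOCATION_WEIGHT[location]
    return x1 * x2 * x3


def blacklist_status(attacks, days, location):
    """Label an IP 'Blacklist' when its rating reaches the assumed cutoff."""
    rating = blacklist_rating(attacks, days, location)
    return "Blacklist" if rating >= BLACKLIST_THRESHOLD else "Suspect"


# Illustrative raw counts chosen to reproduce the weighted values of rows a-d.
examples = {
    "a": (100, 2, "international"),
    "b": (132, 1, "country"),
    "c": (800, 1, "country"),
    "d": (90, 7, "international"),
}
for name, args in examples.items():
    print(name, blacklist_rating(*args), blacklist_status(*args))
```

`Decimal` is used instead of floating-point arithmetic so that a rating exactly at the cutoff, as in row a, compares reliably.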
This research aims to identify SSH brute-force attacks against an Internet router through logs, select blacklisted IPs based on the analysis of attack characteristics, and establish an access control policy using the blacklisted IPs. Three research hypotheses on continuous and malicious access to the Internet router, specification of unauthorized access, and the fragmentation of logs of unauthorized access were set up and verified to establish access control policies based on logs.
Regarding the first hypothesis that there is continuous and malicious access to the Internet router, we revealed that more than 90% of logs were of unauthorized access. Attacks by unauthorized access generated logs that had the same message pattern, including “SSH user failed to log in from [attacker’s IP address] on VTY0 due to IP restriction.” By extracting relevant logs, unauthorized access was identified (which is associated with the second hypothesis) and the SSH brute-force attacks were detected. Lastly, the repeated message content was eliminated, and logs were fragmented by month, day, time, and year, which confirmed that the third hypothesis of log fragmentation was true. We digitized the characteristics of fragmented attack logs to detect blacklisted IPs and finally proposed a blacklisted IP identification model that can determine whether an IP should be blacklisted.
Attack frequency, number of detection days, and country from which the attack originated were used to determine blacklisted IPs. The weighted values of 0.01, 0.1, and 1 (country) or 1.1 (international) were set for attack frequency, detection days, and location, respectively. IPs with 100 or more attacks and 2 or more detection days and an overseas origin were classified as blacklisted IPs.
Month | No. of policies | Month | No. of policies | Month | No. of policies |
---|---|---|---|---|---|
2019/04 | 7 | 2019/08 | 4 | 2019/12 | 22 |
2019/05 | 24 | 2019/09 | 14 | 2020/01 | 27 |
2019/06 | 18 | 2019/10 | 19 | 2020/02 | 33 |
2019/07 | 11 | 2019/11 | 18 | 2020/03 | 19 |
We applied a firewall policy to block blacklisted IPs attempting to reach internal destination addresses. During the 46 days from May, when the firewall policy was established, through June, the firewall blocked a total of 10,147 access attempts using CIDR rules that cover either a single IP or a range of IPs. We found that 29 destination IPs were Chinese IPs with access requests.
Internal IP | Number of blocks | Share of blocks (%) | Internal IP | Number of blocks | Share of blocks (%) |
---|---|---|---|---|---|
A IP | 5,466 | 53.87 | E IP | 45 | 0.44 |
B IP | 4,236 | 41.75 | F IP | 30 | 0.30 |
C IP | 285 | 2.81 | G IP | 30 | 0.30 |
D IP | 54 | 0.53 | H IP | 1 | 0.01 |
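Blocking by CIDR, where one rule covers either a single IP or a whole range, can be illustrated with Python's standard `ipaddress` module. This is a sketch of the matching logic only, not the firewall configuration used in the study; the function names and the documentation-range addresses are hypothetical.

```python
import ipaddress


def build_blocklist(entries):
    """Parse strings such as '203.0.113.7' or '198.51.100.0/24' into network objects."""
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def is_blocked(source_ip, blocklist):
    """Return True if the source IP falls inside any blacklisted network."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in blocklist)


# Hypothetical blacklist: one single address (treated as a /32) and one /24 range.
blocklist = build_blocklist(["203.0.113.7", "198.51.100.0/24"])

for candidate in ["203.0.113.7", "203.0.113.8", "198.51.100.42", "192.0.2.1"]:
    print(candidate, is_blocked(candidate, blocklist))
```

A single address without a prefix length is parsed as a /32 network, so single-IP and range entries can share one list and one lookup.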
The advancement of infrastructure has increased the need to collect a significant amount of personal data for user convenience, so IT infrastructure managers are responsible for many important information assets and systems. However, there is no standardized method for analyzing the data generated during system management, and the analysis thus tends to rely on the manager’s experience. While skilled managers can solve a problem based on their experience, most managers take a long time to determine the meaning of each log and the cause of an error from the huge number of logs generated by a system. Systems designed to manage important information assets require timely action against errors and appropriate prevention, but the general process alone is not enough to resolve an error when quick action is required.
This research analyzed logs generated in a router for one year and studied methods for protecting the system through the detection and access control of SSH brute-force attacks against the target system. We confirmed that many of the logs were generated due to SSH brute-force attacks, and then proposed a blacklisted IP determination model by fragmenting and examining the logs. This approach can prevent continuous attacks by detecting access to specific internal IPs through routers that are not normally protected by security devices. The approach can also prevent unauthorized access to internal IPs and attack site IPs by creating a blacklist based on the risk, thereby preventing infection to other systems. However, this method aims to prevent future attacks by analyzing the attacks that have already occurred, and thus it does not have the capability to respond to real-time attacks.
To respond to real-time attacks, the logs currently being generated must be evaluated faster than logs of previous attacks can be analyzed; in practice, however, it is difficult for a manager to identify and evaluate them immediately. To prevent potential attacks, future work should apply machine learning to real-time log evaluation, learning from past logs while analyzing current logs in real time.