Open Access

ARTICLE

Oversampling Methods Combined Clustering and Data Cleaning for Imbalanced Network Data

Yang Yang1,*, Qian Zhao1, Linna Ruan2, Zhipeng Gao1, Yonghua Huo3, Xuesong Qiu1

1 State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, 100000, China
2 Cloud Computing and Distributed Systems Laboratory, School of Computing and Information Systems, University of Melbourne, Melbourne, 3000, Australia
3 The 54th Research Institute of China Electronics Technology Group Corporation, Shijiazhuang, 050000, China

* Corresponding Author: Yang Yang.

Intelligent Automation & Soft Computing 2020, 26(5), 1139-1155. https://doi.org/10.32604/iasc.2020.011705

Abstract

In network anomaly detection, network traffic data are often imbalanced: some classes of traffic have many samples while others have few, which degrades anomaly detection performance on minority-class samples. To address imbalanced data, researchers have proposed oversampling techniques that balance data sets; in particular, the SMOTE oversampling method provides a simple and effective solution. However, current oversampling methods suffer from the generation of noisy samples and poor information quality. Hence, this study proposes an oversampling method for imbalanced network traffic data that combines the SMOTE algorithm with the FINCH clustering algorithm to filter out minority sample clusters, allocates the number of synthetic samples per cluster according to cluster sparsity and sample weight, and finally uses multi-layer sensors to clean noisy samples during sampling. We compare the proposed method with other oversampling methods and verify that a data set processed with this method yields better network traffic anomaly detection.
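As a rough illustration of the SMOTE building block mentioned above, the following sketch generates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. It covers only basic SMOTE interpolation and is not the authors' implementation: the FINCH clustering step, the per-cluster allocation scheme, and the noise-cleaning step are not shown, and the function name `smote_sample`, its parameters, and the toy data are hypothetical.

```python
import random


def smote_sample(minority, k=2, n_new=4, seed=0):
    """Sketch of basic SMOTE: create n_new synthetic points, each lying on the
    segment between a random minority sample and one of its k nearest
    minority neighbours. Hypothetical helper, not the paper's method."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Toy minority class (assumed data): the four corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sample(minority)
```

Because every synthetic point is a convex combination of two existing minority samples, it stays inside the region spanned by the minority class; this is also why plain SMOTE can produce noisy samples when minority and majority regions overlap.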

Cite This Article

Y. Yang, Q. Zhao, L. Ruan, Z. Gao, Y. Huo et al., "Oversampling methods combined clustering and data cleaning for imbalanced network data," Intelligent Automation & Soft Computing, vol. 26, no. 5, pp. 1139–1155, 2020.

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.