Hierarchical Clustering Based Network Traffic Data Reduction for Improving Suspicious Flow Detection

Published in TrustCom/BigDataSE 2018, 2018

Recommended citation: Liya Su, Yepeng Yao, Ning Li, Junrong Liu, Zhigang Lu, Baoxu Liu. Hierarchical Clustering Based Network Traffic Data Reduction for Improving Suspicious Flow Detection[C]//2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). IEEE, 2018: 744-753. https://ieeexplore.ieee.org/document/8455976

Abstract

Attacks such as advanced persistent threats (APTs) can last for a long time, so suspicious flow detection must operate on long-term data. However, effectively analyzing massive data sources for suspicious flow diagnosis remains an unmet challenge. Consequently, flow data reduction, which abstracts the most relevant information from a massive dataset, should be adopted. Existing approaches to sampling flow data are inherently inaccurate unless they run at a high sampling rate. In this paper, we propose HCBS (Hierarchical Clustering Based Sampling), a flow data reduction scheme, to alleviate these problems. We study the characteristics of flow data related to malicious activities and employ hierarchical clustering to sample data for further in-depth detection. Experiments on the 1999 DARPA dataset demonstrate that HCBS reduces the size of the flow data by 40% with only a small loss in accuracy and significantly outperforms the compared state-of-the-art methods.
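
To make the general idea of clustering-based sampling concrete, the following minimal Python sketch groups flow records by similarity with average-linkage agglomerative clustering and then keeps a fraction of each cluster, so that small or isolated clusters are never dropped entirely. It is an illustration only and not the HCBS algorithm from the paper: the two-dimensional feature vectors, the distance threshold of 0.2, the per-cluster rate of 0.5, and the function names `agglomerative_clusters` and `sample_per_cluster` are all assumptions chosen for the example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative_clusters(points, distance_threshold):
    """Average-linkage agglomerative clustering (hypothetical helper).

    Starts with one cluster per point and repeatedly merges the closest
    pair of clusters until the closest pair is farther apart than
    distance_threshold. Returns a list of clusters, each a list of indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > distance_threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def sample_per_cluster(clusters, rate, seed=0):
    """Keep ceil(rate * size) members of every cluster, at least one each."""
    rng = random.Random(seed)
    kept = []
    for members in clusters:
        k = max(1, math.ceil(rate * len(members)))
        kept.extend(rng.sample(members, k))
    return sorted(kept)


# Toy flow features (assumed to be normalized, e.g. duration and byte count).
flows = [
    (0.10, 0.20), (0.12, 0.22), (0.11, 0.19), (0.09, 0.21),  # dense group
    (0.80, 0.85), (0.82, 0.83),                              # second group
    (0.50, 0.05),                                            # isolated flow
]

clusters = agglomerative_clusters(flows, distance_threshold=0.2)
kept = sample_per_cluster(clusters, rate=0.5)
print(f"{len(clusters)} clusters, kept {len(kept)} of {len(flows)} flows")
```

With this toy input the dense group is thinned while the isolated flow is always retained, which is the property that distinguishes cluster-aware sampling from uniform random sampling. The pairwise search above is only practical for small inputs; a real flow dataset would need a scalable clustering implementation.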