FPGA-Based Network Traffic Classification Using Machine Learning

Bibliographic Details
Main Author: Elnawawy, Mohammed (author)
Other Authors: Sagahyroon, Assim (author), Shanableh, Tamer (author)
Format: article
Published: 2020
Online Access: http://hdl.handle.net/11073/19796
Description
Summary: Real-time classification of internet traffic is critical for efficient network management. Classification approaches based on machine learning have shown promising results with high accuracy. In this paper, the suitability of packet-level and flow-level features is validated using stepwise regression and random forest feature selection. Moreover, the optimal percentage of packets to consider within a flow when extracting flow-level features is determined. Several experiments are conducted using naïve Bayes, support vector machine, k-nearest neighbor, random forest, and artificial neural networks on the publicly available University of Brescia (UNIBS) and University of New Brunswick (UNB) datasets. The experiments show that using 60% of a flow's packets is a good compromise that ensures high performance with the least processing time. The results indicate that random forest outperforms the other algorithms, achieving a maximum accuracy of 98.5% and an F-score of 0.932. Furthermore, since software-based classifiers cannot meet the anticipated real-time requirements, we propose a Field-Programmable Gate Array (FPGA)-based random forest implementation that uses a highly pipelined architecture to accelerate this time-consuming task. The proposed design achieves an average throughput of 163.24 Gbps, exceeding the throughput of reported hardware-based classifiers that use comparable approaches, which in turn supports continuous real-time traffic classification at congested data centers.
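
The idea of extracting flow-level features from only a percentage of a flow's packets can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `flow_features`, the particular features computed, and the choice to take the leading packets of the flow are all assumptions made for illustration; the summary states only that 60% of flow packets was found to be a good compromise.

```python
from math import ceil
from statistics import mean, pstdev


def flow_features(packets, fraction=0.6):
    """Compute simple flow-level features from a fraction of a flow.

    packets:  list of (timestamp_seconds, size_bytes) tuples in arrival order.
    fraction: share of the flow's packets to use (0 < fraction <= 1).

    Assumption: the leading packets of the flow are used, and the
    feature set below is hypothetical, not the one used in the paper.
    """
    if not packets:
        raise ValueError("flow has no packets")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    # Keep at least one packet, rounding the count up.
    n = max(1, ceil(fraction * len(packets)))
    head = packets[:n]

    times = [t for t, _ in head]
    sizes = [s for _, s in head]
    # Inter-arrival times between consecutive packets.
    gaps = [b - a for a, b in zip(times, times[1:])]

    return {
        "packets_used": n,
        "mean_size": mean(sizes),
        "std_size": pstdev(sizes),
        "mean_iat": mean(gaps) if gaps else 0.0,
        "duration": times[-1] - times[0],
    }


if __name__ == "__main__":
    # Synthetic flow: 10 packets, 0.5 s apart, growing in size.
    flow = [(i * 0.5, 100 + 10 * i) for i in range(10)]
    print(flow_features(flow))
```

A feature dictionary of this kind could then be passed to any of the classifiers named in the summary. Lowering `fraction` reduces the work per flow, at a possible cost in accuracy.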