SC21 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

NetGraf: An End-to-End Learning Network Monitoring Service


Workshop: INDIS'21: 8th Workshop on Innovating the Network for Data-Intensive Science

Authors: Bashir Mohammed (Lawrence Berkeley National Laboratory (LBNL)), Mariam Kiran (Energy Sciences Network (ESnet)), and Bjoern Enders (Lawrence Berkeley National Laboratory (LBNL))


Abstract: Network monitoring services are essential for ensuring that networks deliver optimal performance and for identifying failing services. Particularly for large data transfers, checking key performance indicators such as throughput, packet loss, and latency can make or break experiment results. However, network monitoring tools vary widely in the metrics they collect and depend on the devices installed. Additionally, few tools can learn and determine the cause of degraded performance. This paper presents NetGraf, a novel end-to-end learning monitoring system that builds on existing monitoring tools, merges multiple data sources into one dashboard for easy use, and provides machine learning libraries to analyze the data and perform real-time anomaly detection. Using a database backend, NetGraf can learn performance trends and show users when network performance has degraded. We demonstrate how NetGraf can easily be deployed through automation services and linked to multiple monitoring sources to collect data. By combining machine learning with the merging of various data sources, NetGraf aims to fulfill the need for holistic, learning-based network telemetry monitoring. To the best of our knowledge, this is the first end-to-end learning monitoring service. We demonstrate its use on two network setups to showcase its impact.




