John Heidemann

Back Out: End-to-end Inference of Common Points-of-Failure in the Internet (extended)

Title: Back Out: End-to-end Inference of Common Points-of-Failure in the Internet (extended)
Publication Type: Technical Report
Year of Publication: 2018
Authors: J. Heidemann, Y. Pradkin, and A. Nisar
Date Published: feb
Institution: usc-isi
Abstract

Internet reliability has many potential weaknesses: fiber rights-of-way at the physical layer, exchange-point congestion from DDoS at the network layer, settlement disputes between organizations at the financial layer, and government intervention at the political layer. This paper shows that we can discover common points-of-failure at any of these layers by observing correlated failures. We use end-to-end observations from data-plane-level connectivity of edge hosts in the Internet. We identify correlations in connectivity: networks that usually fail and recover at the same time suggest a common point-of-failure. We define two new algorithms to meet these goals. First, we define a computationally efficient algorithm to create a linear ordering of blocks to make correlated failures apparent to a human analyst. Second, we develop an event-based clustering algorithm that directly identifies networks with correlated failures, suggesting common points-of-failure. Our algorithms scale to real-world datasets of millions of networks and observations: linear ordering is O(n log n) time and event-based clustering parallelizes with Map/Reduce. We demonstrate them on three months of outages for 4 million /24 network prefixes, showing high recall (0.83 to 0.98) and precision (0.72 to 1.0) for blocks that respond. We also show that our algorithms generalize to identify correlations in anycast catchments and routing.
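
The idea that networks failing and recovering together point to a shared cause can be illustrated with a minimal sketch. This is not the report's algorithm, whose details the abstract does not give; it is one simple reading of the idea. The function name `cluster_by_events`, the input layout, and the 600-second bin width are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def cluster_by_events(outages, bin_seconds=600):
    """Group blocks whose outage events share the same (down, up) time bins.

    outages: dict mapping a block name (e.g. a /24 prefix) to a list of
             (down_timestamp, up_timestamp) pairs, in seconds.
    bin_seconds: assumed tolerance; timestamps in the same bin count as
                 "the same time". The value 600 is an arbitrary placeholder.
    """
    clusters = defaultdict(list)
    for block, events in outages.items():
        # Signature of a block: its set of outage events, with start and
        # end times quantized into bins.
        signature = frozenset(
            (down // bin_seconds, up // bin_seconds) for down, up in events
        )
        if signature:  # skip blocks that never failed
            clusters[signature].append(block)
    # Only groups of two or more blocks hint at a common point-of-failure.
    return [sorted(blocks) for blocks in clusters.values() if len(blocks) > 1]
```

Grouping by a hashable signature is a key-to-group step, which is the shape of computation that maps naturally onto Map/Reduce. Fixed bins are a simplification: two events a few seconds apart can fall on opposite sides of a bin boundary and be treated as unrelated.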

URL: https://www.isi.edu/%7ejohnh/PAPERS/Heidemann18b.html