Time Warner Defeated by Munin

Solving internet service disruption by external Munin server tracking.

By: Chris Saylor | November 18, 2010 • 2 minute read

Having been a Time Warner (Road Runner) customer for nearly three years now, I have had my share of technical issues that required a technician to come out and rummage around in the magic cable box outside my house. Some of the worst issues to correct are the intermittent ones. I share the pain of the Time Warner techs when dealing with seemingly intangible errors, but that doesn’t mean no effort should go into diagnosing an issue that still exists and just isn’t presenting at the moment.

Diligence in identifying a problem that isn’t currently happening usually wanes fast. I just happen to run a web server in Atlanta, and that server happens to have a monitoring tool called Munin installed. Munin collects data on many aspects of a node; the Munin server polls the node at regular intervals, logs that data, and graphs it. In this case, the Munin server is at my house.

So how did Munin help me convince a technician that something systematic was happening? It turns out that intermittent issues become very obvious when they’re graphed over the time period in which they occurred. I was able to show the technician exactly when and for how long I was without internet by pointing to the interruptions in reporting from my Munin node in Atlanta.

eth0 traffic graphed by week

The gaps on the left side of the above graph make it pretty plain that something happened. One could argue that maybe there just was no traffic going to the server during those times, although that doesn’t really explain why the traffic drops suddenly instead of tapering off. Observe exhibit B:

Note: This article was restored from archive and this image was lost. It depicted MySQL activity graphed by week which showed a similar gap as other examples.

Still not convinced?

Disk Usage graphed by week

Disk utilization does not change that quickly on a web server, and it certainly does not go to zero unless something has gone horribly wrong.
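For anyone who would rather list the outages than eyeball them, here is a rough sketch of the same idea in Python. It is not part of Munin and not what I showed the technician. It assumes you already have the sample timestamps as `datetime` objects and that samples normally arrive every five minutes (Munin's usual polling interval). The function name `find_gaps`, the `tolerance` parameter, and the sample data are all made up for illustration.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected_interval=timedelta(minutes=5), tolerance=2):
    """Return (last_seen, next_seen) pairs around each reporting gap.

    A gap is any pair of consecutive samples that are further apart than
    tolerance * expected_interval. Both defaults are assumptions.
    """
    ordered = sorted(timestamps)
    threshold = expected_interval * tolerance
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > threshold:
            gaps.append((previous, current))
    return gaps


if __name__ == "__main__":
    # Synthetic data: one hour of five-minute samples with a hole
    # punched in the middle to stand in for an outage.
    start = datetime(2010, 11, 15, 0, 0)
    samples = [start + timedelta(minutes=5 * i) for i in range(12)]
    samples = [t for i, t in enumerate(samples) if not 4 <= i <= 8]

    for last_seen, next_seen in find_gaps(samples):
        print(f"No data between {last_seen} and {next_seen} "
              f"({next_seen - last_seen})")
```

Each pair it returns marks the last sample before the silence and the first sample after it, which is the same "when and for how long" the graphs show visually.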

Thanks to Munin, the tech acknowledged that there was a problem, quickly determined it was something on their end (hard to BS me), and scheduled a work order for a line technician to take care of the issue. Munin for the win.
