This Space Intentionally Left Blank

8 Brumaire CCXII (October 29, 2003)

(System Stuff) I'm Gonna Let the Bad Times Roll

There's a lesson to be learnt here, I'm sure of it. Mainly, don't upgrade important parts of your router from a remote location. What follows is the reasoning behind that, and a quick, not-horribly-technical explanation of why the site was down:

I realised that I hadn't run apt-get upgrade on my router for a while, so I did so during my lunch break. Among the packages that got upgraded was the PPP daemon. OK, not a serious problem: the IP changes, but thankfully ddclient reregisters it with DynDNS and I can reconnect.

So, where do the problems come in? Well, among the other things I noticed was that I had very little free space on my root partition. A quick search revealed that the culprit was daemon.log, which was taking up a total of 120MB between the current log and the previous one. Part of the reason? The DHCP daemon was set to a 10 minute lease, meaning that any issues that occurred got printed in there 6 times per hour. So, I edit the DHCP config file and set it for a longer lease period. Restart the server and all should be good.
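The lease arithmetic can be sketched in a few lines of Python. This is only a rough illustration, assuming (as above) that each recurring issue gets logged once per lease period; the function name and the sample lease lengths are made up for the example.

```python
def log_entries_per_hour(lease_minutes: float) -> float:
    """Log entries per hour for one recurring issue.

    Assumes the issue is logged exactly once per lease period.
    """
    if lease_minutes <= 0:
        raise ValueError("lease_minutes must be positive")
    return 60 / lease_minutes


# Hypothetical lease lengths, in minutes, for comparison.
for lease in (10, 60, 720):
    rate = log_entries_per_hour(lease)
    print(f"{lease:>4} minute lease: {rate:g} entries per hour per issue")
```

Going from a 10 minute lease to an hour-long one cuts that particular source of log noise by a factor of six.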

Errrrrr… No.

Seems I'd forgotten that I never fixed the PPP script to stop it overwriting the reference to the local DNS server, and the DHCP addresses are bound to local host names. Without the local DNS server to resolve those names, DHCP assigned random IPs to the DB and web servers, meaning you could no longer access them the way you normally would.

Now I've forgotten all about the DNS issue, so I'm trying random things to get it to work. (Reloading the iptables rules, taking down the network interface, restarting DHCP, restarting the network; in that order, which is important later.) I finally figure out what it is, connect to the machines on their new addresses, tell them to restart their network connections, and watch as they disappear off the network.

Remember how I said that the order I tried things was important? Well, it seems that dhcpd didn't like me restarting it while that interface was still down. Whoops.
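A check like the following, run before restarting the DHCP daemon, might have caught that. It's a minimal sketch rather than what the router actually runs: it assumes a Linux system that exposes interface state under /sys/class/net, and the interface name eth1 is a placeholder. Note that some interfaces report "unknown" rather than "up" even when they work, so treat a False here as "go and look", not as proof.

```python
from pathlib import Path


def interface_is_up(iface: str, sysfs_root: str = "/sys/class/net") -> bool:
    """Return True if the kernel reports the interface's operstate as "up"."""
    state_file = Path(sysfs_root) / iface / "operstate"
    try:
        return state_file.read_text().strip() == "up"
    except OSError:
        # Missing interface or unreadable file: treat it as not up.
        return False


if __name__ == "__main__":
    iface = "eth1"  # hypothetical LAN-side interface name
    if interface_is_up(iface):
        print(f"{iface} is up; restarting the DHCP daemon should be safe")
    else:
        print(f"{iface} is not up; bring it up before restarting the DHCP daemon")
```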

So by then my machines are completely unreachable remotely. Time to put up the "please stand by" page that resides on the router and wait until I get home.

Hmmmm… That was not anywhere near as non-technical as I promised, was it? Well then, one-sentence non-technical summary: Andrew fucked up.

Posted by g026r at 20:11
Comments

It seems that site downages are all the rage lately. Who will be the next to follow the trend?

Posted by peter at 9 Brumaire CCXII 01:46 (2003/10/30)

Maybe someone should talk to Brandon, that way him and Nancy won't feel left out.

Posted by g026r at 9 Brumaire CCXII 02:13 (2003/10/30)

Whoops! Just noticed I messed up their/there in that entry.

This is why I should reread these things. Especially if they contain any sentences I rewrote whilst constructing the entry.

Posted by g026r at 9 Brumaire CCXII 02:17 (2003/10/30)