Bad Start

Sunday, January 17th, 2016

This has been a pretty bad start to the year at work. It seems I've had problem after problem with things that have been running perfectly for years. It's really annoying.

First was the art file server. Students in art classes use this server to store their projects and data. The live disk started developing offline uncorrectable sectors left and right. This happened the Friday before Christmas break. Best timing ever. The server was very old and not very powerful, and with the disk problems it took days to copy the 2TB of data to a new server.

Then, the Monday after break, I get to work to find the faculty/staff file server with a dead disk. This disk was totally dead, but thanks to backups no data was actually lost. It just took most of the day to change the disk, make sure things were stable, and confirm I had multiple copies of the data before allowing people to access the files. It can be annoying when you're constantly asked by everyone for an ETA on when something will be working again. At least this forced an upgrade from a few hundred gigabytes to 6 terabytes of storage.

The next day I find out one of my public-facing servers had been compromised and was participating in a DDoS attack against some Chinese webmail provider. It was sending 20Mbps of DNS requests to the site's nameservers. I only noticed because DNS requests are very small packets, so to get to 20Mbps the number of packets per second coming out of this server was high enough to cause some serious latency with other things. The malware had replaced the binaries for most programs used to detect things like this (ps, ss, netstat, lsof) with its own versions that hide its presence. The better part of another day wasted dealing with this.
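
For a rough sense of why 20Mbps of DNS traffic hurts more than 20Mbps of ordinary traffic, here is a back-of-the-envelope sketch. The 80-byte query size and the 1500-byte full-size packet are assumptions picked for illustration, not measurements from the server.

```python
def packets_per_second(bits_per_second: float, packet_bytes: int) -> float:
    """Packets per second needed to fill a given bandwidth."""
    return bits_per_second / (packet_bytes * 8)


RATE_BPS = 20_000_000      # 20Mbps, the outbound rate mentioned above
DNS_QUERY_BYTES = 80       # ASSUMPTION: rough size of one small DNS query on the wire
FULL_PACKET_BYTES = 1500   # ASSUMPTION: a typical full-size Ethernet payload

dns_pps = packets_per_second(RATE_BPS, DNS_QUERY_BYTES)
bulk_pps = packets_per_second(RATE_BPS, FULL_PACKET_BYTES)

print(f"DNS-sized packets:  {dns_pps:,.0f} per second")
print(f"Full-size packets:  {bulk_pps:,.0f} per second")
```

Under those assumptions the same 20Mbps works out to about 31,000 packets per second of DNS queries versus under 2,000 full-size packets per second, which is the kind of gap that shows up as latency everywhere else.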

Then I find out that the data collection system for the Commons is all sorts of fucked up. Every 30 seconds or so it logs current data on electric usage, weather stats and all sorts of other things, while providing a website where students can download and analyze that data. It was collecting data (for the most part), but the website was eating up 500% of the CPU while just sitting there idle. It was taking more than 10 seconds to render the index page on the site.

This was all in the first few days of the new year. I hope the rest of the year calms down because I'm not getting anything accomplished when I have to deal with these bullshit problems.

