Yikes, time flies!
We’ve been dealing with several emergencies that sprang up in fairly short order, and we’ve been working on adding redundancy to our systems so these things don’t happen again.
It’s a long story, but it essentially started when Oracle bought Sun Microsystems, the vendor and tech support for most of our servers. Suddenly the prices for tech support contracts multiplied (yes, you heard right, more than doubled) to the point where we couldn’t afford decent support plans for several of our important systems.
Since then we’ve found out that servers are kind of like cars in that they apparently have a built-in sensor for “warranty has expired, time to break down now.”
This day and age being what it is, I suppose I should add a disclaimer in case it isn’t TOTALLY FREAKING OBVIOUS that I was joking. I’d probably lose my job if I seriously accused Oracle or anybody else of purposely making systems fail when the support contract level wasn’t high enough. Besides, they wouldn’t need to; systems fail often enough on their own.
…aaand I should probably quit writing before I say something that might really get me in trouble.
As of a bit after 5:00 PM Friday, our main webserver is running on Apache rather than Sun’s Java Enterprise System Webserver, which it has run on for years. Originally this change was meant as a test, but by late Friday night (AKA Saturday morning) things looked good enough that we decided to run with it.
Unfortunately, what looks good at 4:00 AM isn’t always so great by the light of day. Ever since then I’ve been running around putting out fires.
Here’s a brief list of the problems we’ve seen (most are already solved):
Cold Fusion (.cfm) pages don’t work on the new server. All the sites I’ve found that use them have been redirected back to the old server, where they do work.
Blog admin didn’t work. I had to point that back to the old server too, for now.
Not all of the portal single sign-on links were working. The link for blog admin is fixed. The WOUAlert link isn’t fixed yet, but at least I know what the problem is and am working on it.
PHP pages using old-style code block tags didn’t work. Some people were coding PHP using the old, deprecated short tags (<? and ?>) to delimit blocks of PHP code. The correct way to do this is <?php ... ?>. I had to tell the new webserver to allow the old-school code, but we really need to get rid of it, because it can get mixed up with other languages. BTW, this is what broke the quick links on the homepage, so I’m not totally innocent here myself.
Old-style PHP database calls didn’t work. I replaced the PEAR/DB module so some of this stuff will work again, but there might be other pages written without PEAR/DB, using obsolete database calls, that won’t work until they are rewritten.
Overly tight security settings. Some pages couldn’t load external files they needed, so they were throwing errors.
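For the short-tag problem in the list above, the long-term fix is finding every page that still uses <? instead of <?php. Here is a minimal sketch of a script that could do the hunting, assuming the pages are .php files under a single document root; the helper names and the /var/www/html default path are hypothetical, not how our server is actually laid out.

```python
"""Hypothetical helper: list PHP files that still use short open tags (<? ... ?>).

A sketch only; the document root and the .php extension are assumptions.
"""
import re
import sys
from pathlib import Path

# Matches "<?" not followed by "php" or "xml" (any case). "<?=" is flagged too,
# since before PHP 5.4 it also depended on the short_open_tag setting.
SHORT_TAG = re.compile(r"<\?(?!php|xml)", re.IGNORECASE)


def find_short_tags(text):
    """Return the 1-based line numbers in `text` that contain a short open tag."""
    return [n for n, line in enumerate(text.splitlines(), start=1)
            if SHORT_TAG.search(line)]


def scan(root):
    """Yield (path, line_numbers) for every .php file under `root` that needs fixing."""
    for path in sorted(Path(root).rglob("*.php")):
        hits = find_short_tags(path.read_text(errors="replace"))
        if hits:
            yield path, hits


if __name__ == "__main__":
    # Hypothetical default path; pass the real document root as an argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "/var/www/html"
    for path, hits in scan(root):
        print(f"{path}: lines {', '.join(map(str, hits))}")
```

The script skips <?xml on purpose: with short tags turned on, PHP tries to run an XML declaration as code, which is exactly the "mixed up with other languages" problem, and it's one more reason to switch everything over to <?php.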
There’s more, but I need to get back to that WOUAlert problem.
A few weeks ago, when Brian was on vacation, some of the rest of us had a communication breakdown about creating new user accounts, and several new employees had to wait entirely too long before they could log on. This was at least partially my fault.
Brian is going to be gone again next week, but this time we won’t have these problems because we’ve improved the process. First of all, we found out why most of the notifications were misrouted and fixed that. Also, I’ve added some more automation to the faculty/staff account creation system, so there’s less work to do. I can’t really talk about the details because that would mean giving out too many specifics about our servers, but several steps that formerly had to be done by hand now happen by themselves. The weird part was how easy it was to do once we took another look at the process. Once upon a time it had to be complicated, but thanks to various changes we’ve made in the last few years, a bunch of that stuff is no longer needed.
Anyway, it’s way the heck late at night and I need to get out of here. At least the prettymail stuff is working, um, pretty well. (Yeah, this is my 2AM sense of humor.)
On Saturday all three air conditioning units in the server room shut down, and the place rapidly turned into an oven. Our servers put out a lot of heat, and have to be kept cool to prevent Bad Things from happening… and so when the air handlers stopped, Bad Things started to happen.
Luckily, only a couple of servers had actual hardware damage, and those didn’t have anything critical on them. Several more servers shut down ungracefully or started behaving erratically. Our two biggest servers, cougar and sundown, never actually crashed, but since our main network infrastructure server did, nobody could get to cougar or sundown.
Since I live so close to campus, I got called in, but it was Paul Lambert and Dave Diemer who did most of the heavy lifting. Once the major problems were cleared away, I could do my thing. Dave was working on three servers until the next morning, and I was up until really late babysitting the webserver, which seemed to go catatonic every few minutes for no apparent reason. We’ll still be cleaning this up for a while.
OK, user registration on the wiki server now requires a valid WOU login. I got a little sick of spammers crapping on our server, so this should lock them out.
If you have any trouble registering, please let me know at email@example.com.
Oh, and I probably shouldn’t mention this until I have it working, but I think I see a fix for the long-standing problem of missing email notifications when pages change. With luck, that’ll be taken care of soon.
Hey folks, you may (or may not) have noticed that the wiki server was down for a few days. This was because we got more spam on it. It’s cleaned up again, and I’ve locked the guest account so it can no longer edit anything on the server.
In the near future I’ll also be adding account verification to the user registration process; if you’re already a user this won’t affect you, but new people who want to register will have to provide a valid WOU login. If anyone finds this inconvenient, well, blame the spammers.
It looks like some spammer, and probably more than one, has been dumping their crap on our wiki server for the last couple of weeks. I just spent about ten hours cleaning that up. I’ve disabled the guest account, which is where most of the damage came from, plus a few other spammer accounts that have crept in since Troy left.
Looks like I’m managing the wiki server now; if you see anything that looks like spam, or if you have any general wiki questions, please feel free to ask me!