Trials and Tribulations
Giving svn space to HLP projects since, well, 2009…

31Oct/15

…and we are back… mostly….

By now, those who check this probably noticed that things were non-functional for roughly two days, and might be wondering what happened.

In a nutshell, there was a mishap with a RAID virtual disk setting, which ended up nuking around a dozen machines...

and when I say nuking, I mean gone.  No trace.  Pretty much the entire infrastructure here, gone in the blink of an eye.  Fortunately I had backups of the systems, so it was more a case of taking the time to rebuild and figure out the differences between software versions than a case of being totally, well, fucked.

So, as many of you can tell, the SVN is back up and should be 100%.  I've still got some monitoring stuff to fix and a pair of Windows machines to rebuild (ugh, those fsckers are going to take a while with all the patching), but the majority of stuff is back up, along with all 75+ gig of the SVN (that took a while to restore).  Pretty sure I will still need to tweak some memory allocations and such, but that's a hell of a lot easier to deal with than arguing with a backup program that wants to restore data but insists on using DNS names when said DNS is gone.

I dunno, I think rebuilding around 9 machines and well over 100 gigs of data from basically blank drives in about two days isn't that bad.

Filed under: Software
10Jun/15

For fsck's sake, a tale of HDDs playing hide and seek from the controller…

So the other day, the host that runs the SVN (and about half of the infrastructure, including my IRC proxy) decided that it didn't want to talk to the local hard drive.  This isn't good, obviously.  A reboot of the system brought it back and I chalked it up to generic computer weirdness.

Well, imagine my reaction when I found it had done the same thing today.  A quick bit of experimenting showed that the drive wouldn't reappear to the system unless I did a full power-down, not just a reset.

This doesn't bode well for the drive and is not how I wanted to spend my morning.

So everything is back up, again, and I'm moving the machines from the local drive to the iSCSI NAS setup, which has proven to be reliable, if somewhat slower.  This should work until I can:

  • Replace the drive
  • Find some kind of inexpensive RAID 1 setup to implement on the main machines

Right now, I'm thinking a pair of used PERC 6 cards.  As much as I dislike using anything Dell produced, these are apparently known-quantity cards that are well supported.  They have their limitations, of course, but it's hard to look away from a card with battery-backed RAM that can be acquired for around 15 USD a pop; just supply breakout cables and batteries.

Of course, finding an 8484 breakout cable that does the 8484 at an angle (there isn't a lot of room between the top of a full-height card and a 4U case) and ends in SATA connectors, without costing 30-freaking-bucks-a-cable, is seemingly less than trivial, if such a thing is even available.

Anyway, that's the situation right now.  Migration to the iSCSI is still ongoing at the time of this writing, but seems to be progressing OK so far.

Knock on some wood for me, will ya?

UPDATE - 11:36PM Pacific

Obviously, the migration didn't go as smoothly as expected.  About the time I was finishing up the migration, the target machine decided to go weird with its iSCSI connection to the NAS as well.

Figures, huh?

It's been restarted, machines migrated and everything checks out OK.  Naturally, the suspect hard drive in the original machine decided to not drop out during this mess.  I still plan on adding in some RAID1 stuff on these machines to, hopefully, prevent this kind of crap from happening again.

It never ends, you know?

UPDATE - 6/24:

Found a killer deal on some PERC 6/i cards and have placed the order for the cards (with a spare or two, just in case) along with new batteries and battery cables.  Now to find the breakout cables, hot swap bays and drives.

For the drives, I'm looking at the WD Red series, as they are designed for 24/7 operation and are quite often sold at a similar price to the Blues and such for the same capacity.  They aren't the fastest drives out there, but the warranty and exchange process WD has (cross-shipping and such) makes them worthwhile IMO.

Besides, the write caching the PERCs have will help a lot in the effective performance department.

UPDATE - 10/19:

Well, finally found a deal on some drives.  WD Greens (not as 'good'), but I found some utilities that will let me turn on TLER and either turn off the head parking or increase the timeout to some high amount like 8 minutes, which will basically turn them into WD Reds.  The machine that has the problem drive isn't currently being used for anything, so there isn't any worry of downtime from this upcoming work.

Greens were not my first choice, but from what I have read on forums like the FreeNAS one, which is populated with people who are very conservative about their hardware choices and about making sure what they use works well (no half-arsed hardware suggestions in there), the Greens are decent once the adjustments are done.
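For the curious, I won't swear these are the exact utilities I'll end up using, but on a Linux box with smartmontools and idle3-tools installed, the tweaks I'm describing look roughly like this (run as root; the device name is a placeholder, and the idle3 change only takes effect after the drive is power-cycled):

    # Enable TLER by capping error recovery at 7 seconds for reads and writes
    smartctl -l scterc,70,70 /dev/sdX

    # Check the current idle3 (head parking) timer
    idle3ctl -g /dev/sdX

    # Either disable head parking entirely...
    idle3ctl -d /dev/sdX

    # ...or crank the timeout way up (going by the idle3-tools docs,
    # 144 should work out to roughly 8 minutes)
    idle3ctl -s 144 /dev/sdX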

Everything is finally coming together for this.  About bloody time IMO.

UPDATE - 10/28:

Well crap.  The TLER adjustment thing doesn't work any longer for the newer drives, but the head parking one does.  I've deployed the card and drives in one of the hosts and things seem to be working very well so far (knock on wood).  I plan on deploying the changes on the other system ASAP.  The process usually takes a day or so, time allowing, but I'll be glad and breathe easier when I can finally get rid of another single point of failure on the rack.

Filed under: Hardware, Software
22Feb/13

If you tried to work with the SVN manager UI since last night…

Yes, it was non-functional.

The machine that plays the part of a proxy between the DMZ and the SQL server (where all the candy is kept) runs as a VM, and the VM server was having... issues.

Specifically, either the CF card to SATA adapter went sideways, or the board decided that it didn't like it. Either way, with the adapter plugged in, the physical drives in the machine would randomly block the adapter from being recognized.

  • Adapter only = works, but no VM
  • Drives only = works, but no way to boot the system
  • Both = a random drive would appear in the BIOS, or all of them would, but the adapter wouldn't show up as a bootable device

Go figure.

Anyway, I think I have it all back up now (I was getting some nasty page faults and such for a while, but I was able to apply some needed patches in-between restarts). If you are reading this, it hasn't bombed out recently.

That's good, right? 😛

25Dec/11

Well, that went easy enough (knock on wood)…

Well, the work last night appeared to go off without a hitch (knocks on wood). The storage space on the SVN went from 40 gig to 100 gig, so it should be good for a while now. Well, one hopes anyway. 🙂

The firewall work didn't happen as I've run into a possible bug with the traffic shaper, and as the newer traffic shaper is one of the main reasons for the switchout... Well, you get the idea.

Anyway, I have an image of the previous storage area for the SVN in case something goes wrong, but plan on removing it in a week or so to reclaim the space. Enjoy!

Rev. P

20Dec/11

Rev and the case of the Hungry Hungry SVN….

Greetings one and all.

It's been a while since I've had to send anything out. I'm going to chalk this up as a Good Thing(tm) since the only time I have to send out any notifications is when Something Bad(tm) has happened, or is going to happen.

Sadly, this is more the latter.

The SVN, she is running out of space again (kaptin!).

The Good: I do happen to have some extra space I can add to the SVN. I was thinking I could probably allocate another 50 gig (or so) without too much issue.

The Bad, Part 1: This is going to require downtime. I'm also looking into moving the svn space over to another machine on my local network so that I can potentially adjust the storage space on-the-fly, or at least cut down on the amount of downtime needed.

The Bad, Part 2: I'm sure you all have seen the prices of drive space since the flooding in Asia? It's not pretty, and it doesn't look to be going back to pre-flood prices any time soon. So if anyone has any unused drives, spare funds, magic hard drive dust, etc, and is able to redirect it this way, it would be appreciated. I can promise and guarantee that anything sent my way would be used only for the SVN. 🙂

Anyway, it's Tuesday around 12:50 PST right now (GMT-8), and I will be looking at taking the SVN down this weekend (more than likely sometime around 22:00) to perform the allocation. I may be able to get the newer firewall build in as well, computer gods willing.

And that's about it. If you have any questions and such, IM or email me directly; the notifications from HLP, for some reason, are not making it to my inbox and I've yet to track down where the problem is.

Rev. P

21Aug/11

If you are reading this, the work is done. :-P

Well, that was a bit of an adventure. Due to the odd spacing of the holes in the rack, the mounting of the servers wasn't quite as smooth as I had hoped, and the oddish way that the rails had to be mounted didn't help either.

However, it is done, the machines are able to be pulled in and out with ease for any hardware work, and it looks a heck of a lot better.

If there are any problems still (initial tests from a few people have all been positive), let me know.

It's now 6AM, and I'm heading off to sleep. 🙂

19Aug/11

Upcoming downtime this weekend

I've got some new rails inbound, and will be taking the systems down this weekend to mount them in the rack. This shouldn't take too much time and bringing everything back up should also be as painless as when I replaced the drives.

EDIT: Well, the rails arrived and I test fit one of my empty cases. The results were semi-successful...

Due to the design of the rack, the clips it uses need to be aligned a specific way (which makes them a PITA), and the rails have to be mounted differently from how I would normally mount a case if I were attaching it directly to the rack. Even then, it's looking to be a somewhat tight fit.

Once again, it's never easy. I'm going to get as much done before I power things down to minimize the downtime, but it might take somewhat longer than expected.

*sigh*

Hopefully I don't need to run to the hardware store to get more screws. Wish me luck? 🙂

EDIT2: So you all know, it was asked how long this will take. I'm estimating around 45 minutes for the SVN itself. This shouldn't take anything near hours or all weekend. If it was looking to take that long, I'd abort and regroup before trying it again. 🙂

Filed under: Hardware
27Jul/11

Accessing the SVN via IP? Having problems?

For those of you who are using the IP to access the SVN, it's gone kinda weird on me. I suspect DNS issues somewhere, but I'm trying to figure out WTH is happening exactly.

However, I strongly recommend that you switch to using a DNS name instead of an IP. Use the SVN relocate function; it's easy, and everything should be seamless.

The reasoning behind this is that with the number of projects being hosted, I've had to perform some DNS 'sleight-of-hand' to get everything working, so what does what for how many cookies/biscuits tends to depend on using the proper name vs an address.
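If you haven't used it before, the relocate looks something like this from the command line, run from the root of your working copy; the URLs here are just placeholders for your project's old and new addresses, and TortoiseSVN users can do the same thing from its Relocate menu entry:

    # Clients before Subversion 1.7 use the switch --relocate form:
    svn switch --relocate http://203.0.113.10/svn/myproject http://svn.example.org/svn/myproject

    # Subversion 1.7 and newer have a dedicated subcommand:
    svn relocate http://203.0.113.10/svn/myproject http://svn.example.org/svn/myproject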

EDIT 8/1: I have managed to find a config change that fixes this behavior, and things should be working as they did before, but my suggestion to use machine names over IPs still holds.

Besides, if, for some reason, I have to change the IP of the server, using the name will keep things working (once the DNS changes make it to you). In the long run, it's much easier to just use a name.

As always, thanks for reading and contact me if you have any questions, comments, poison pen letters, and so on.

Rev. P

2Jul/11

I love the smell of randomly overheating kit in the morning…

*sigh*

So, out of the blue today, the firewall machine decided that it was overheating and shut itself down.

*frowny face*

Why it did this, I do not know.  Fortunately someone was around to power it back up and it's been a happy camper since.  This makes me think that it's time to seriously look at updating the hardware that makes it do what it does.

The current layout is:

  • VIA 800 MHz CPU
  • 1 gig of RAM
  • Some random IDE laptop hard drive

Doesn't seem like much, but it's been plenty for a firewall machine, low power and fairly quiet as well.  However, I'm starting to not trust the hardware.

I have a spare board with a single-core Celeron (Core 2-based, so it's not total pants), but I don't have a 2U power supply for the rack case, or a riser card for the network (the board has a single interface, and I need a minimum of three).  I might be able to kludge something together, but I'd rather actually use parts designed for the purpose, and not use the equivalent of razor wire and gaffer tape to do it, you know?

If only 2U power supplies were not so uncommon, expensive, and noisy.  Anyway, if anyone was thinking of donating funds, now is a pretty good time to do so. 😉

Sorry about the brief outage today.  As you can see above, I'm trying to figure out a way to keep it from happening again.

Filed under: Hardware
12Jun/11

Server is back up, everything looks good

Drive swap was successful, and the server is back up and running on a new drive.

Hopefully I won't have to take things down again any time soon.

Filed under: Hardware, Software