Once again, Rackspace has changed the contents of an already-published server image without any notice to its users.
22 days ago, I provisioned a staging system with Ubuntu 11.10. In upgrading from 11.04, I had the typical difficulties — e.g., removing 11.04 package workarounds and upgrading some software that we build from source. When I finished, my Fabric script provisioned our 11.10 servers cleanly, and I wouldn’t have to futz with it again until we advanced to Ubuntu 12.04.
So imagine my surprise when I tried re-provisioning our staging system yesterday: the script threw an oddball installation failure for PostgreSQL, and all the servers showed network flakiness.
What the? Our fabfile.py hadn’t changed. I was still installing on Ubuntu 11.10. In 22 days there weren’t any significant package changes. This was weird.
After a couple of hours of sleuthing, going to Rackspace customer support for help, and searching the web, I found that the Rackspace Ubuntu 11.10 images had mysteriously changed in two significant ways.
Git becomes a git
22 days ago, our script did this without error:
with cd("/tmp"):
    run("git clone git://github.com/ariya/phantomjs.git")
The “run” command now generated a name resolution failure. After some puzzlement, I used a sledgehammer:
with cd("/tmp"):
    sudo("git clone git://github.com/ariya/phantomjs.git")
…and all was well.
Network? You don’t need no steekin’ network
The Ubuntu 11.10 image now has /etc/hosts protected from world read access! It should (and 22 days ago it did) have an access mode of 644 when the server first boots; it now has a mode of 600. Since the resolver reads /etc/hosts as the calling user, non-root processes (like that git clone) can fail name resolution, which explains why sudo worked.
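The obvious fix is to restore the expected mode early in provisioning, e.g. a sudo("chmod 644 /etc/hosts") near the top of the fabfile (that exact placement is my suggestion, not gospel). The permission problem itself is easy to reproduce locally with nothing but the standard library; here’s a minimal sketch using a scratch file instead of /etc/hosts (the world_readable helper is hypothetical, not from our fabfile):

```python
import os
import stat
import tempfile

def world_readable(path):
    """True if 'other' users have read permission on path."""
    return bool(stat.S_IMODE(os.stat(path).st_mode) & stat.S_IROTH)

# Simulate the two states with a scratch file instead of /etc/hosts.
fd, scratch = tempfile.mkstemp()
os.close(fd)

os.chmod(scratch, 0o600)          # what the changed image ships
broken = world_readable(scratch)  # False: non-root processes can't read it

os.chmod(scratch, 0o644)          # what a freshly booted server should have
fixed = world_readable(scratch)   # True

os.unlink(scratch)
print(broken, fixed)
```

Running that prints False for the mode-600 state and True after the chmod back to 644 — exactly the before/after of the broken image.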
A hat tip to a Blogsplat post that described this same problem with Rackspace Ubuntu 9.10 images, and which saved me lots of time.
Rackspace’s “Live Chat” customer support was great. The rep was super helpful, and he & I debugged the problem together in real-time. I can’t get mad at him, because he’s just the poor sod who’s on the customer service front line!
But I am increasingly irked by how Rackspace changes a server image it’s already published. I don’t understand how anyone can think this is a good policy. If there’s a flaw in an image, leave it in place and put up another server image with the fix. Maybe use local version numbers on the server images.
It’s as if they don’t realize their customers use automated deployment: once we can install and configure version X of server image Y, we expect that to be repeatable.