Screwed again by Rackspace changing a published image


Once again, Rackspace has changed the contents of an already-published server image without any notice to its users.

22 days ago, I provisioned a staging system with Ubuntu 11.10. Upgrading from 11.04 brought the typical difficulties, e.g., removing 11.04 package workarounds and upgrading some software that we build from source. When I finished, my Fabric script could provision our 11.10 servers, and I figured I wouldn’t have to futz with it again until we moved to Ubuntu 12.04.

So imagine my surprise when I tried re-provisioning our staging system yesterday and the script threw an oddball installation failure for PostgreSQL, and all the servers had flaky networking.

What the? Our fabfile.py hadn’t changed. I was still installing on Ubuntu 11.10. In 22 days there weren’t any significant package changes. This was weird.

After a couple of hours of sleuthing, asking Rackspace customer support for help, and searching the web, I found that the Rackspace Ubuntu 11.10 images had mysteriously changed in two significant ways.

Git becomes a git

22 days ago, our script did this without error:

from fabric.api import cd, run

with cd("/tmp"):
    run("git clone git://github.com/ariya/phantomjs.git")

The “run” command now generated a name resolution failure. After some puzzlement, I used a sledgehammer:

from fabric.api import cd, sudo

with cd("/tmp"):
    sudo("git clone git://github.com/ariya/phantomjs.git")

…and all was well.
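A failure like this is easier to pin down if the script checks name resolution explicitly, as the deploy user, before cloning anything. Here’s a minimal sketch using only Python’s standard library, assuming you run it on the server itself; the function name `can_resolve` is hypothetical, not part of Fabric or any Rackspace tooling.

```python
import socket


def can_resolve(hostname):
    """Return True if hostname resolves for the current user, else False.

    getaddrinfo() goes through the system resolver configuration, so running
    this once as the deploy user and once as root can show whether
    resolution depends on privileges.
    """
    try:
        socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # The resolver could not map the name to an address.
        return False
    return True
```

Calling `can_resolve("github.com")` as the unprivileged user and again under sudo would have pointed at a permissions problem much sooner than a failed `git clone`.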

Network? You don’t need no steekin’ network

The Ubuntu 11.10 image now has /etc/hosts protected from world read access! It should (and 22 days ago it did) have an access mode of 644 when the server first booted. It now has an access mode of 600.

Surprise!
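If your provisioning script assumes a stock image, a cheap guard is to check the mode of /etc/hosts before doing anything that needs the network. This is a minimal sketch in plain Python, assuming it runs on the server itself; the helper names `permission_bits` and `is_world_readable` are hypothetical.

```python
import os
import stat


def permission_bits(path="/etc/hosts"):
    """Return the permission bits of path as an octal string, e.g. '644'."""
    return format(stat.S_IMODE(os.stat(path).st_mode), "o")


def is_world_readable(path="/etc/hosts"):
    """Return True if users other than the owner and group can read path."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)
```

On the image from 22 days ago `permission_bits()` would report `644`; on the changed image it would report `600` and `is_world_readable()` would return False. Restoring the expected mode is `sudo chmod 644 /etc/hosts`. From a fabfile, running `stat -c %a /etc/hosts` on the remote host gives the same octal value without needing Python there.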

A hat tip to a Blogsplat post that described the same problem with Rackspace’s Ubuntu 9.10 images and saved me lots of time.

Gah

Rackspace’s “Live Chat” customer support was great. The rep was super helpful, and he and I debugged the problem together in real time. I can’t get mad at him, because he’s just the poor sod on the customer service front line!

But I am increasingly irked by Rackspace’s habit of changing a server image after it’s been published. I don’t understand how anyone can think this is a good policy. If there’s a flaw in an image, leave it in place and publish a new server image with the fix. Maybe give the server images their own version numbers.

It’s as if they don’t realize that their customers might use automated deployment, and that once those customers can install and configure version X of server image Y, they expect repeatable results.

3 comments
  1. Erik Carlin said:

    John,

    I work on the Cloud Servers team at Rackspace. We think we know what happened. Could you please contact me at erik dot carlin at rackspace dot com. I’d love to follow up with you, confirm our hypothesis, and get some feedback from you on how you’d like to see it work.

    Thanks for calling us out. We are constantly improving cloud servers and need customers to make us aware of issues and keep us honest.

    Regards,
    Erik Carlin
    Director of Product
