An update to an earlier post…
I’m replacing pyrax with something else in our system. The authentication errors and oddball failures still occur, and I’ve lost confidence that Rackspace will fix them in any reasonable amount of time. This is extremely frustrating.
Python-cloudfiles was way more stable, even though it wasn’t and still isn’t in active development. Maybe we’ll resume using that.
Thursday, I was irked by a bug.
I had modified a background task so it could import a range of documents from another subsystem into our datastore, instead of only one. Its parameters had included one “document id”, which identified the patent document to import. Now, it could be given that, or two document ids representing a document range.
In one instance, it reported a successful completion, yet the desired patents weren’t loaded. What had gone wrong?
Multiple official and de facto formats exist for US patent application and grant document ids. To keep this simple, let’s consider US Design Patents. A Design Patent’s document id is a “D” followed by a number, such as “D4432” or “D902”.
So if you wanted to import a range of Design Patents, you might say, “Import the patents D900 through D4000, inclusive.” “D900” is the lower bound and “D4000” is the upper bound. Right?
Not so fast!
>>> "D900" < "D4000"
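Python compares strings character by character, so after the shared “D”, the “9” in “D900” is compared against the “4” in “D4000”, and the expression is False. Here is a minimal sketch of one way to compare these ids by their numeric part instead. The helper name design_patent_number is hypothetical, and it only handles the simple “D” plus digits form described above, not the other document id formats.

```python
def design_patent_number(doc_id):
    """Return the numeric part of a Design Patent id such as "D4432".

    Hypothetical helper: it only understands the simple "D<digits>" form.
    """
    if not doc_id.startswith("D") or not doc_id[1:].isdecimal():
        raise ValueError(f"not a Design Patent id: {doc_id!r}")
    return int(doc_id[1:])


lower, upper = "D900", "D4000"

# Plain string comparison is lexicographic: "9" sorts after "4".
print(lower < upper)  # False

# Comparing the numeric parts gives the ordering a person expects.
print(design_patent_number(lower) < design_patent_number(upper))  # True
```

A range check written against the raw strings would treat “D900” through “D4000” as an empty range, which is consistent with a task that reports success without loading anything.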
tl;dr: Think about exceptions when writing a context manager.
I made a huge unforced error with a context manager at work.
We use Redis distributed locks for system synchronization. I wanted a context manager that acquired n locks, executed protected code, and then released the n locks in reverse order. It would be simple to use:
from common.util import Semaphore, distlock
semaphore1 = Semaphore(OwnerDisambiguationUpdate.UPDATE_LOCK)
semaphore2 = Semaphore(USMaintenanceFeeUpdate.UPDATE_LOCK)
with distlock(semaphore1, semaphore2):
    ...  # protected code runs here
(The Semaphore class does other work with aborting Celery tasks, but that’s not germane here. It’s a Redis distributed lock with extra fanciness.)
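For illustration, here is a minimal sketch of a context manager along these lines that stays safe when the protected code raises. It is not the actual common.util implementation. It assumes each lock object exposes acquire() and release(), and it uses threading.Lock as a local stand-in for the Redis-backed Semaphore.

```python
import threading
from contextlib import contextmanager


@contextmanager
def distlock(*locks):
    """Acquire every lock in order, then release them in reverse order.

    Sketch only: assumes each lock has acquire() and release().
    """
    acquired = []
    try:
        for lock in locks:
            lock.acquire()
            acquired.append(lock)  # track only the locks we actually hold
        yield
    finally:
        # Runs on normal exit, on an exception in the protected code,
        # and on a failure partway through acquisition.
        for lock in reversed(acquired):
            lock.release()


# Stand-ins for the real Semaphore objects.
lock1 = threading.Lock()
lock2 = threading.Lock()

try:
    with distlock(lock1, lock2):
        raise RuntimeError("protected code failed")
except RuntimeError:
    pass

print(lock1.locked(), lock2.locked())  # False False
```

The try/finally around the yield is the important part. Without it, an exception in the protected code skips the release step and the locks stay held.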
An update to an earlier post…
We’ve had problems using the pyrax SDK, mostly in account authentication.
First, it wasn’t at all clear when, or under what conditions, we had to re-authenticate our pyrax token. As documented, after you initially authenticate your credentials, pyrax handles all subsequent re-authentication under the covers. I.e., it will automatically re-authenticate the token if it ever expires.
This is kind of odd. I don’t understand why a good token should need re-authentication.
We then discovered that pyrax sometimes can’t re-authenticate our token! Every 19 hours, we hit a period of about five hours when our token won’t automatically authenticate. Why? I still don’t have a clear answer. Some authentication server, somewhere, clearly gets confused. You won’t run into this bug if you don’t have long-running processes. But, we do.
We host IP Street’s SaaS product at Rackspace. We’re finally taking the plunge and upgrading from python-cloudfiles to pyrax. We didn’t have any big issues with python-cloudfiles, but I was tiring of getting the brush-off from Rackspace when we asked for help with an API failure.
The benefits of keeping a technology up-to-date far outweigh the costs, unless you’re in an extreme corner case with a very unreliable vendor. Better performance, bug fixes, better capabilities, better support… all good stuff.
I’ve found some candidates for replacing Celery in my company’s product. (My reasons for replacing it are elucidated here, here, and here.)
I got these from web trawling, blog comments, and some e-mail. At first blush, none of the candidates have any disqualifying attributes, except for lacking subtasks. Celery is the only Python-friendly asynchronous task technology with subtask support, so I’ll need to bend on that if I want any alternatives to consider. (If I’m wrong on this point, please let me know in the comments!)
I’m not saying that these candidates will definitely satisfy all (sans subtasks) of my requirements. Right now they’ve just passed my initial sniff test. The next step will be to read each candidate’s documentation in detail, assess the health and activity of its community and developers, and try some sample code.
I’m ready to start looking at candidates to replace Celery in my company’s product. (The reasons are elucidated here, here, and here.)
Our SaaS product provides data mining and visualization for intellectual property. A 10-second elevator pitch is, it’s as though we attached Microsoft Excel’s chart wizard to US and international patent offices. (“As though” = “We didn’t do that, and in fact we go way beyond that, but I’m giving you a simple description.”) Our code is 100% Django and Python.
I looked at how we use Celery in our codebase. The reality of how we use it is much simpler than our ideas when we started two years ago. Combining our existing features with our product roadmap, I know with high confidence which features we need for our asynchronous tasks, which ones are nice to have but not required, and which ones we’ll probably never need.