My requirements for replacing Celery


I’m ready to start looking at candidates to replace Celery in my company’s product. (The reasons are elucidated here, here, and here.)

Our SaaS product provides data mining and visualization for intellectual property. The 10-second elevator pitch: it’s as though we attached Microsoft Excel’s chart wizard to US and international patent offices. (“As though” = “We didn’t do that, and in fact we go way beyond that, but I’m giving you a simple description.”) Our code is 100% Django and Python.

I looked at how we use Celery in our codebase. Our actual usage is much simpler than what we envisioned when we started two years ago. Combining our existing features with our product roadmap, I know with high confidence which features we need for our asynchronous tasks, which ones are nice to have but not required, and which ones we’ll probably never need.

Here are my Celery replacement requirements. Except for the “I don’t need” or “Tasks don’t have to…” statements, they’re all about equally important.

  1. The non-deprecated section of the API does not change often.
  2. New code releases have one of two attributes: They’ll be 100% backwards compatible, or they’ll have clear instructions for migrating code.
  3. There’s active current development, an easy way to interrogate and submit tickets, and a healthy user community. The main developer(s) is(are) reasonably communicative.
  4. Our tasks are 100% Python code, and can easily access my project’s Python and Django resources. E.g., my project’s settings.py.
  5. We can pass arguments into a task, just like calling a function.
  6. Support for periodic and on-demand tasks. The periodic tasks’ frequency can be a simple cycle, like a timedelta().
  7. For task servers:
    • Support for more than one. I.e., I can define a task server farm. It’s OK if they’re statically defined and homogeneous. (We now have just one dual hex-core Celery task server… I could let the multiple server requirement slide for six months.)
    • Support for defining the maximum number of executing tasks per server. It’s OK if this is homogeneous.
    • I don’t need different queues. I don’t need task priorities. I don’t need routing keys. I don’t need autoscaling. I don’t need remote shutdown. I don’t need any signal besides “abort.” I don’t need different tasks per server. All the tasks can be in one big FIFO queue. Submit a task, it runs, it finishes. Simplicity is powerful.
    • A built-in way for all task logging to go into one file per server would be nice, but not essential.
    • Task overhead equivalent to Unix cron jobs would be fine. We aren’t hitting the task scheduler with 18 billion task submissions/second.
  8. For task management (these can be programmatic or via a web interface), a way to:
    • Know how many task servers are up.
    • Know what tasks are running, and when they were submitted. It would be nice but not essential to know on what task server they’re running.
    • Know what tasks are pending.
    • Send executing tasks an “abort” signal. It’s OK if the task code has to periodically interrogate it.
    • Cancel all pending tasks.
    • Cancel all pending tasks with a certain task name.
  9. A way for a task to run subtasks, and for the parent task to wait until they’re all done. The parent just needs to know the “>= 1 subtask remains” and “0 subtasks remain” conditions.
  10. Tasks don’t have to return values. All of a task’s effects can be side-effects.
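The list is short enough that a single-server version of requirements 5, 7, 8, and 10 fits in a page of standard-library Python. What follows is a rough sketch, not any existing library’s API: `TaskServer`, `Task`, `submit`, `cancel_pending`, `abort_running`, and `join` are hypothetical names, and it assumes one process in which threads stand in for a task server’s worker slots. It shows one FIFO queue, a cap on concurrently executing tasks, inspection of pending and running tasks, cancel-all and cancel-by-name, and an abort flag that the task itself polls.

```python
import threading
from collections import deque
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Optional


@dataclass(eq=False)  # eq=False: tasks compare by identity
class Task:
    name: str                      # used for "cancel all pending tasks named X"
    func: Callable[..., None]      # side effects only; return value is ignored
    args: tuple = ()
    kwargs: dict = field(default_factory=dict)
    submitted: datetime = field(default_factory=datetime.now)
    abort: threading.Event = field(default_factory=threading.Event)


class TaskServer:
    """Hypothetical: one FIFO queue; at most max_workers tasks execute at once."""

    def __init__(self, max_workers: int = 4):
        self._pending: deque = deque()
        self._running: list = []
        self._cond = threading.Condition()
        for _ in range(max_workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, name: str, func: Callable[..., None], *args, **kwargs) -> Task:
        # Arguments are passed through just like calling a function.
        task = Task(name, func, args, kwargs)
        with self._cond:
            self._pending.append(task)
            self._cond.notify_all()
        return task

    def pending(self):
        # (name, submitted) for each task waiting to run, oldest first.
        with self._cond:
            return [(t.name, t.submitted) for t in self._pending]

    def running(self):
        # (name, submitted) for each task currently executing.
        with self._cond:
            return [(t.name, t.submitted) for t in self._running]

    def cancel_pending(self, name: Optional[str] = None) -> int:
        # Drop every pending task, or only those called `name`; return how many.
        with self._cond:
            before = len(self._pending)
            if name is None:
                self._pending.clear()
            else:
                self._pending = deque(t for t in self._pending if t.name != name)
            self._cond.notify_all()
            return before - len(self._pending)

    def abort_running(self) -> None:
        # Only sets each running task's flag; the task must poll abort.is_set().
        with self._cond:
            for t in self._running:
                t.abort.set()

    def join(self, timeout: Optional[float] = None) -> bool:
        # Block until nothing is pending or running; False if timeout expires.
        with self._cond:
            return self._cond.wait_for(
                lambda: not self._pending and not self._running, timeout)

    def _worker(self) -> None:
        while True:
            with self._cond:
                while not self._pending:
                    self._cond.wait()
                task = self._pending.popleft()
                self._running.append(task)
            try:
                # The task gets its abort Event as an extra keyword argument.
                task.func(*task.args, abort=task.abort, **task.kwargs)
            except Exception as exc:
                print(f"task {task.name!r} failed: {exc!r}")
            finally:
                with self._cond:
                    self._running.remove(task)
                    self._cond.notify_all()
```

Because the sketch reserves the `abort` keyword, task functions shouldn’t use that name for their own parameters. A multi-server farm would need the queue moved out of process (into a database table, say), which this sketch doesn’t attempt.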
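Requirements 6 and 9 are also small enough to sketch. Again, these are hypothetical names (`PeriodicTask`, `periodic_tick`, `SubtaskCounter`, `run_subtasks`), not any library’s API, and plain threads stand in for subtasks that a real system would put on the task queue. The periodic part is just a `timedelta` cycle checked on each tick; the subtask part tracks nothing but whether at least one subtask remains.

```python
import threading
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, List, Optional


@dataclass
class PeriodicTask:
    name: str
    interval: timedelta                # a simple cycle, e.g. timedelta(hours=1)
    last_run: Optional[datetime] = None


def periodic_tick(tasks: Iterable[PeriodicTask], now: datetime,
                  submit: Callable[[PeriodicTask], None]) -> None:
    """Submit every periodic task whose interval has elapsed since its last run.

    A scheduler loop (or a cron job) would call this every few seconds.
    """
    for task in tasks:
        if task.last_run is None or now - task.last_run >= task.interval:
            task.last_run = now
            submit(task)


class SubtaskCounter:
    """Tracks only ">= 1 subtask remains" versus "0 subtasks remain"."""

    def __init__(self) -> None:
        self._remaining = 0
        self._cond = threading.Condition()

    def started(self) -> None:
        with self._cond:
            self._remaining += 1

    def finished(self) -> None:
        with self._cond:
            self._remaining -= 1
            if self._remaining == 0:
                self._cond.notify_all()

    def any_remaining(self) -> bool:
        with self._cond:
            return self._remaining > 0

    def wait_all(self, timeout: Optional[float] = None) -> bool:
        # True once 0 subtasks remain; False if the timeout expires first.
        with self._cond:
            return self._cond.wait_for(lambda: self._remaining == 0, timeout)


def run_subtasks(funcs: List[Callable[[], None]]) -> bool:
    """Parent task: start each subtask, then block until none remain."""
    counter = SubtaskCounter()
    for f in funcs:
        counter.started()  # count before starting, so wait_all can't return early

        def wrapper(f: Callable[[], None] = f) -> None:
            try:
                f()
            finally:
                counter.finished()

        threading.Thread(target=wrapper).start()
    return counter.wait_all()
```

Incrementing the counter before each subtask starts is the one subtle point: if a subtask were started first, it could finish and drive the count to zero before its siblings were registered, letting the parent stop waiting too early.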

I’m struck by how simple our requirements are, for such a complex and capable product!

My next step is to assemble a list of candidate technologies. I’ll report back what I come up with.

5 comments
  1. Looking forward to your candidates list.
    I was planning on doing a talk on django+celery, but as I dive into the topic I’m questioning whether it makes much sense.

  2. Charlie said:

    I too became frustrated with Celery… I wrote my own task queue and consumer, which supports many of the bullets you listed: https://github.com/coleifer/huey. Hope you find it useful!

  3. Diederik van der Boor said:

    Ohh, I’m so curious what solution you’ll come up with. Something really simple (especially on the server side) would be more than welcome. Currently I’d just use Celery as is, but the whole setup of multi-component layers plus different programming languages makes me hesitate about Celery. Perhaps that’s unfounded, yet I feel you can fill a niche here that doesn’t have 18 billion task submissions/second like you mentioned. 🙂

  4. Goulwen said:

    Hi,

    We’re also in the process of removing Celery/RabbitMQ: despite its name, RMQ is a big beast, not so easy to tame.

    We’re also quite disappointed by changes in Celery. Not to mention Celery 3, moving from Celery 2.4 to 2.5 required quite a few changes in our code, even though it was a minor release.

    So we have decided to use ØMQ, gevent, and Redis to build our own messaging system. It’s not ready for open-source publishing yet, but it’s interesting to see other people’s requirements.

    Did you manage to find a replacement?
