Archive
Alternatives to using Celery
I’ve found some candidates for replacing Celery in my company’s product. (My reasons for replacing it are elucidated here, here, and here.)
I got these from web trawling, blog comments, and some e-mail. At first blush, none of the candidates have any disqualifying attributes, except for lacking subtasks. Celery is the only Python-friendly asynchronous task technology with subtask support, so I’ll need to bend on that if I want any alternatives to consider. (If I’m wrong on this point, please let me know in the comments!)
I’m not saying that these candidates will definitely satisfy all (sans subtasks) of my requirements. Right now they’ve just passed my initial sniff test. The next step will be to read each candidate’s documentation in detail, assess the health and activity of its community and developers, and try some sample code.
My requirements for replacing Celery
I’m ready to start looking at candidates to replace Celery in my company’s product. (The reasons are elucidated here, here, and here.)
Our SaaS product provides data mining and visualization for intellectual property. A 10-second elevator pitch is, it’s as though we attached Microsoft Excel’s chart wizard to US and international patent offices. (“As though” = “We didn’t do that, and in fact we go way beyond that, but I’m giving you a simple description.”) Our code is 100% Django and Python.
I looked at how we use Celery in our codebase. The reality of how we use it is much simpler than our ideas when we started two years ago. Combining our existing features with our product roadmap, I know with high confidence which features we need for our asynchronous tasks, which ones are nice to have but not required, and which ones we’ll probably never need.
Breakage when upgrading from Celery 2.5.3 -> 3.0.4
Commenting on my update to my Celery rant, @asksol asked me to post the Pylint results that made me question the claim of backwards compatibility.
(“@Asksol asked” — See what I did there? That’s alliteration. It’s a sign of a quality blog post. Ask for it by name.)
Again for the record, @asksol is a smart and friendly person. I know I wouldn’t last a day supporting a project the way he has supported Celery over multiple years. I’ve calmed down since yesterday, and I hope that something good results from my rant — if not for me, then for a future Celery user needing upgrade help. In his reply to my rant, @asksol describes some history and rationale for how he manages code change, and I encourage you to read it.
Here we go:
An update to my Celery rant
An update to my rant on Celery’s frequently-changing API: I’ve decided to stay with Django-celery 2.5.5 and Celery 2.5.3.
When I tried using Celery 3.0.4 with my existing code, Pylint threw about 60 warnings, many of which look real and none of which were there when I used Celery 2.5.3.
“Backwards-compatible” my ass!
I shouldn’t have to chase my tail like this. Celery, you lost me. I’m now looking to replace you.
Celery API changes drive me nuts
This is a rant.
My company’s code base is over 65K lines of Python and JavaScript code. We use Celery, Django-Celery, and RabbitMQ for our background asynchronous tasks. Ten different tasks.py files contain 30 task classes, split roughly 50-50 between periodic and on-demand. We use subtasks.
Today, I dug into updating from Celery 2.5.3 to 3.0.4, and I popped my cork.
I am aggravated by the frequency and extent of Celery API changes. Its API has easily changed more often than those of any other five technologies in our stack combined. I’ve been upgrading Celery and Django-celery every six months or so, which corresponds to upgrading every few minor versions. And the changes are similar in scope to what I see when upgrading any other technology across one or two major versions.
Replacing Redis with a Python Mock
tl;dr
When writing tests, mock out a subsystem if and only if it’s prohibitive to test against the real thing.
!tl;dr
Our product uses Redis. It’s an awesome technology.
We’ve avoided needing Redis in our unit tests. But when I added a product feature that made deep use of Redis, I wrote its unit tests to use it, and changed our development fabfile to instantiate a test Redis server when running the unit tests locally.
(A QA purist might argue that unit tests should never touch major system components outside of the unit under test. I prefer to do as much testing as possible in unit tests, provided they don’t take too long to run, and setup and teardown aren’t too much of a PITA.)
This was a contributory reason for our builds now failing on our Hudson CI server. Redis wasn’t installed on it!
Why didn’t I immediately install Redis on our CI server?
- Our CI server had other problems
- I intended to nuke it and re-create it with the latest version of Jenkins. I just needed to first clear some things off my plate
- Our dev team had shrunk down to just two people
- We were both strict about running unit tests before checking code into the pool
- We were up to our necks in other alligators
From a test-quality perspective, if code uses X in production, it’s better for tests to run with X than with a simulation of X.
One of the many joys of working with Ryan is that he challenges my assumptions and makes me consider alternatives. Because of a perceived lack of elegance in needing Redis on our CI server, and because his work had been temporarily blocked by my code changes, he challenged me to replace my unit tests’ use of Redis with a mock.
I walked into work yesterday and it was quiet. All our critical bugs blocking Saturday’s release were closed. I thought, why not? I’ll give it a go. Today’s a good day to see what’s involved with replacing Redis with a mock!
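To make the idea concrete, here is a minimal sketch of two ways a test might stand in for Redis. It is not the code from this post: cache_patent_count, get_patent_count, and FakeRedis are hypothetical names invented for illustration, and the only assumption about the real client is that it offers get and set, with get returning bytes or None.

```python
from unittest import mock


def cache_patent_count(client, company, count):
    """Hypothetical production code: store a count under a namespaced key."""
    key = "patent_count:%s" % company
    client.set(key, count)
    return key


def get_patent_count(client, company):
    """Hypothetical production code: read the count back, defaulting to 0."""
    value = client.get("patent_count:%s" % company)
    return int(value) if value is not None else 0


class FakeRedis:
    """Dict-backed stand-in that implements only get and set (an assumption)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # Store bytes, mimicking a client that returns bytes from get.
        self._data[key] = str(value).encode("utf-8")
        return True

    def get(self, key):
        # A missing key yields None.
        return self._data.get(key)


# Option 1: a small fake that keeps state, so round trips can be tested.
fake = FakeRedis()
cache_patent_count(fake, "acme", 42)
stored = get_patent_count(fake, "acme")
missing = get_patent_count(fake, "nobody")

# Option 2: a Mock limited to get/set, which only records how it was called.
spy = mock.Mock(spec=["get", "set"])
cache_patent_count(spy, "acme", 7)
spy.set.assert_called_once_with("patent_count:acme", 7)
```

The stateful fake exercises more of the calling code, but it is only as faithful as whoever wrote it, which is the trade-off behind preferring the real thing when it is cheap to run.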
An awesome engineer makes me scratch my head
I know an awesome software engineer. He’s very smart and a joy to work with. He’s platinum-grade material, and I’d work with him again in a femtosecond.
On rare occasions, this Pythonista among Pythonistas and Djangonaut among Djangonauts writes code that makes me scratch my head.
IP Street’s Senior Developer opening now more about Search, less about Python/Django
After some job market feedback and chin-scratching, I’ve changed our Senior Developer opening’s job description. Now it’s less about Python or Django, and more about search technologies, specifically full-text and LSI search.
We hope candidates will have some experience with Python or Django, but search technology experience (e.g., tuning, tokenizers, parsers, relevancy rank tweaking, aggregates and pivots) is now more important, and more emphasized, in the job description.
Here’s the new description:
———
Founded in 2009, IP Street develops and markets software to help corporations, law firms, financial research firms, and government agencies better analyze patent information. Our goal is to make IP data easy to get, use, and understand, so everyone can have access to high quality and transparent information.
A significant facet of our application’s capabilities is derived from Solr and other search technologies. We’re seeking a great full-text search developer with experience in:
- Solr, Lucene, or other search engines
- Full-text search schemas, tokenizers, parsers, and rules for returning statistics and meaningful analytics
- Automated workflows that process millions of objects
- Data quality metrics and repairs
You’ll be joining us at a great time! Revenue is coming in, and we’ve done two Angel funding rounds at increasing valuations.
Key Responsibilities.
- Enhance our Solr engine to provide more statistics and meaningful analytics to the product
- Enhance or tune our use of other search technologies, e.g., LSI
- Enhance and extend the existing code base to add new product features. Our application is written in Django and Python, with an almost all open-source technology stack
- Occasionally wear testing or devops “hats,” as the needs arise
- Write unit tests for your code, and do performance analysis
- Demonstrate technical leadership within the team
- Communicate well with the team, in writing and orally
Qualifications.
- Significant experience using and tuning Solr, Lucene, or other search engines with similar capabilities
- 3+ years related experience in Python development
- 1+ years experience in Django development, or a strong interest in learning
- Experience using one or more of: MongoDB, CouchDB, or another NoSQL database; Celery; Redis; PostgreSQL or another SQL database
- Experience using latent semantic indexing search technologies would be a plus
- Experience integrating with open-source 3rd-party libraries
- Experience creating customer-focused software to process data and generate statistics and analytics
- Solid troubleshooting abilities, self-directed, and proactive
- Enjoy all aspects of software product creation — design, implementation, and debugging
- Familiarity with using OS X as a development environment, and Linux as a production environment
- Bachelor’s degree or equivalent in Computer Science or Software Engineering
- Excellent communication skills
Salary is DOE.
Please send resume to johnd@ipstreet.com.
IP Street will consider more than three Django developers
My Senior Developer job description had an embarrassing mistake. It asked for 7+ years experience in Python and Django, which, as a commenter noted, limited the candidate pool to about three people on the entire planet.
I’ve fixed my goof. We’re nominally looking for at least seven years of Python experience, and at least three years of Django experience, for this slot.
IP Street is looking for a Senior Developer
We’re looking to hire two lucky people who desire fame and fortune. Here’s the Senior Developer opening:
Founded in 2009, IP Street develops and markets software to help corporations, law firms, financial research firms, and government agencies better analyze patent-related information. Our goal is to make IP data easy to get, use, and understand, so everyone can have access to high quality and transparent information.
We’re seeking a great Python developer with experience in: Automated workflows that process millions of objects; data quality metrics and repairs; search, particularly with Solr or Lucene; and/or general data mining. Our stack, and development & production environments, are almost all open-source. The key technologies are Python, Django, Celery, Solr, and PostgreSQL.
Django vs. PostgreSQL IN operations
Here’s another cautionary performance tale, wherein I thought I was clever but was not.
A table (“Vital”) holds widget information. Another table (“Furball”) holds other information, with an M:M relationship to Vital.
We want to do inferential computations on filtered Furball rows. So we generate a pk list from a Vital QuerySet, and call this function:
def _get_top(vitals):
    from django.db.models import Count
    TOP_NUMBER = 5
    vitalids = [x.id for x in vitals]
    top_balls = Furball.objects.filter(vital__id__in=vitalids)\
        .annotate(count=Count('id'))\
        .order_by('-count')[:TOP_NUMBER]
    top_list = [(x.name, x.count) for x in top_balls]
    return top_list
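For readers who think in SQL, here is a rough sketch of the kind of query an ORM call like this boils down to, run against an in-memory SQLite database so it is self-contained. The table and column names (furball, furball_vitals, vital_id, and so on) are assumptions made for illustration, not the schema from the post, and the SQL Django actually emits for PostgreSQL may differ. The thing to notice is that every id in the list becomes part of the statement, so the query grows with the list.

```python
import sqlite3


def get_top(conn, vital_ids, top_number=5):
    """Return (name, count) pairs for the furballs linked to the most given vitals."""
    if not vital_ids:
        return []
    # One placeholder per id: the IN list grows with the number of vitals.
    placeholders = ", ".join("?" for _ in vital_ids)
    sql = (
        "SELECT f.name, COUNT(f.id) AS n "
        "FROM furball AS f "
        "JOIN furball_vitals AS fv ON fv.furball_id = f.id "
        "WHERE fv.vital_id IN (%s) "
        "GROUP BY f.id, f.name "
        "ORDER BY n DESC, f.name "
        "LIMIT ?" % placeholders
    )
    return conn.execute(sql, list(vital_ids) + [top_number]).fetchall()


# Tiny hypothetical data set: two furballs, three vitals, four M:M links.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vital (id INTEGER PRIMARY KEY);
    CREATE TABLE furball (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE furball_vitals (furball_id INTEGER, vital_id INTEGER);
    INSERT INTO vital (id) VALUES (1), (2), (3);
    INSERT INTO furball (id, name) VALUES (10, 'alpha'), (11, 'beta');
    INSERT INTO furball_vitals VALUES (10, 1), (10, 2), (11, 2), (11, 3);
""")

top = get_top(conn, [1, 2])
```

With the sample rows above, 'alpha' is linked to both requested vitals and 'beta' to one, so 'alpha' sorts first.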