Here’s another cautionary performance tale, wherein I thought I was clever but was not.
A table (“Vital”) holds widget information. Another table (“Furball”) holds other information, with an M:M relationship to Vital.
We want to do inferential computations on filtered Furball rows. So we generate a pk list from a Vital QuerySet, and call this function:
    def _get_top(vitals):
        from django.db.models import Count
        TOP_NUMBER = 5
        vitalids = [x.id for x in vitals]
        top_balls = Furball.objects.filter(vital__id__in=vitalids)\
            .annotate(count=Count('id'))\
            .order_by('-count')[:TOP_NUMBER]
        top_list = [(x.name, x.count) for x in top_balls]
        return top_list
Here’s another cautionary performance tale.
If you use Celery subtasks to manage parallel work, know going in that Celery uses spin-loops to monitor subtask progress. Specifically, if you get a TaskSetResult from a TaskSet and then use join(), the underlying code will eat your CPU alive. Here’s the code in
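To make the spin-loop complaint concrete, here is a rough standard-library sketch. It is not Celery's source; the names spin_join, blocking_join, and run_fake_subtasks are hypothetical, and threading.Event stands in for a subtask result. It only contrasts polling in a loop with a wait that lets the OS park the thread.

```python
import threading
import time


def spin_join(done_flags, poll_interval=0.0):
    """Poll until every flag is set; returns how many polls found work unfinished.

    With poll_interval=0 this is a pure busy-wait that burns CPU.
    """
    polls = 0
    while not all(flag.is_set() for flag in done_flags):
        polls += 1
        if poll_interval:
            time.sleep(poll_interval)  # softens the spin, but adds latency
    return polls


def blocking_join(done_flags, timeout=None):
    """Wait on each flag in turn; the thread sleeps until the flag is set."""
    return all(flag.wait(timeout) for flag in done_flags)


def run_fake_subtasks(count=3, duration=0.05):
    """Hypothetical stand-in for subtasks: each flag is set after `duration` seconds."""
    flags = [threading.Event() for _ in range(count)]
    for flag in flags:
        threading.Timer(duration, flag.set).start()
    return flags
```

Calling spin_join(run_fake_subtasks()) typically racks up a very large poll count in those 50 milliseconds, while blocking_join(run_fake_subtasks()) sits idle until the flags are set.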
At work, we’ve contracted with PostgreSQL Experts to help us improve our Postgres performance. After analyzing our system, one of their consultants, Christophe Pettus, found glaring problems in how some of my code accessed our database.
I consider myself well-informed about good database access practices in Django, and in general. I might not exactly hit the bull’s-eye, but I’m sufficiently savvy to avoid making a “WTF” mistake, right?
Postgres’ site is apparently generated at present by a mishmash of bespoke PHP scripts. Josh said that tasks like creating new forms were much harder than they ought to be. So…they’re moving it to Django.
19:00: Checking the PostgreSQL BOF session. Oh, Selena’s here, that’s a +1. News and tidbits about Postgres 9… I made a lame joke about Postgres running on Android, and the response was a serious, “I don’t think so, not yet.” (The times, they are a-changin’.) Postgres’ site will be migrated to Django. Hot-standby replication and streaming replication. Automatic join removal and optimization of ORM-generated queries. Some disparaging comments about the SQL generated by Rails.
18:41: Dinner was a quick bite at a Subway. Then, after I return to the hacker lounge, there’s a call for a group to go to a sushi place. Argh!
16:45: import rdma: Zero-copy networking with RDMA and Python. Interesting talk about kernel and user mode buffered I/O, and the consequences of buffer copies in the socket interface. Locking down memory regions used for I/O feels like going back to the future, before the time of scatter/gather. But the price/performance of InfiniBand products is impressive. I don’t expect to use any of these techniques anytime soon, but I’ll file them away for future reference.
15:45: Cassandra: Strategies for Distributed Data Storage. Overview of the CAP theorem, then a dive into using Cassandra. It went a little too deep too quickly for my interests, but I stayed with it. A good talk.