I’m in work on a Saturday, doing some database munging. We have a large update that requires a bunch of rows to be dropped, and a schema change.
I wish we had a DBA on staff for times like this. Or maybe a kick-ass local consultant whom we could bring in from time to time.
The row drops are taking forever. We don’t use triggers, but we of course have FKs and indexes. I’ll bet a savvy DBA would know some tricks to make the drops go faster. Drop indexes first? Don’t use a transaction? Inhibit table scanning? Something something something.
I know about good db behavior in our application, and measurement techniques, and know enough to know what I don’t know (that’s always most important), and a few performance tricks esp. when using Django. But table munging tricks I’m not so hot on. It’s not for lack of desire; there are only so many hours in the day.
Here’s another cautionary performance tale, wherein I thought I was clever but was not.
A table (“Vital”) holds widget information. Another table (“Furball”) holds other information, with an M:M relationship to Vital.
We want to do inferential computations on filtered Furball rows. So we generate a pk list from a Vital QuerySet, and call this function:
from django.db.models import Count
TOP_NUMBER = 5
vitalids = [x.id for x in vitals]
top_balls = Furball.objects.filter(vital__id__in=vitalids)\
top_list = [(x.name, x.count) for x in top_balls]
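The snippet above is cut off, but the `Count` import, `TOP_NUMBER`, and `x.count` suggest a "count per Furball, keep the top few" aggregate. Here is a minimal sketch of that kind of query, written against the stdlib `sqlite3` module rather than the Django ORM so it runs on its own. The table and column names (`vital`, `furball`, `furball_vitals`) and the helper names are hypothetical, and the SQL is an assumption about the intent, not the original query.

```python
import sqlite3

TOP_NUMBER = 5


def top_furballs(conn, vital_ids, limit=TOP_NUMBER):
    """Return (name, count) pairs for the Furballs linked to the most of the given Vitals."""
    if not vital_ids:
        return []
    # One bound parameter per id; the ids themselves are never spliced into the SQL.
    placeholders = ",".join("?" for _ in vital_ids)
    sql = (
        "SELECT f.name, COUNT(fv.vital_id) AS n "
        "FROM furball AS f "
        "JOIN furball_vitals AS fv ON fv.furball_id = f.id "
        f"WHERE fv.vital_id IN ({placeholders}) "
        "GROUP BY f.id, f.name "
        "ORDER BY n DESC, f.name "
        "LIMIT ?"
    )
    return conn.execute(sql, [*vital_ids, limit]).fetchall()


def build_demo_db():
    """Create a tiny in-memory schema (hypothetical) with an M:M join table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE vital (id INTEGER PRIMARY KEY);
        CREATE TABLE furball (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE furball_vitals (
            furball_id INTEGER REFERENCES furball(id),
            vital_id INTEGER REFERENCES vital(id)
        );
        """
    )
    conn.executemany("INSERT INTO vital (id) VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany(
        "INSERT INTO furball (id, name) VALUES (?, ?)",
        [(1, "alpha"), (2, "beta")],
    )
    conn.executemany(
        "INSERT INTO furball_vitals (furball_id, vital_id) VALUES (?, ?)",
        [(1, 1), (1, 2), (2, 2), (2, 3)],
    )
    return conn
```

Calling `top_furballs(build_demo_db(), [1, 2])` does the counting, ordering, and truncation inside the database, so only the few winning rows come back to Python.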
Here’s another cautionary performance tale.
If you use Celery subtasks to manage parallel work, know going in that it uses spin-loops to monitor subtask progress. Specifically, if you get a TaskSetResult from a TaskSet and then use join(), the underlying code will eat your CPU alive. Here’s the code in
At work, we’ve contracted with PostgreSQL Experts to help us improve our Postgres performance. After analyzing our system, one of their consultants, Christophe Pettus, found glaring problems in how some of my code accessed our database.
I consider myself well-informed about good database access practices in Django, and in general. I might not exactly hit the bull’s-eye, but I’m sufficiently savvy to avoid making a “WTF” mistake, right?
One interesting tidbit from last night’s PostgreSQL BOF session was the news that Postgres’ site would be migrated to Django within the next year. This came from Josh Berkus.
Postgres’ site now is apparently generated from a bespoke PHP script mishmash. Josh said that tasks like creating new forms were much harder than they ought to be. So…they’re moving it to Django.
19:00: Checking the PostgreSQL BOF session. Oh, Selena’s here, that’s a +1. News and tidbits about Postgres 9… I made a lame joke about Postgres running on Android, and the response was a serious, “I don’t think so, not yet.” (The times, they are a-changin’.) Postgres’ site will be migrated to Django. Hot-standby replication and streaming replication. Automatic join removal and optimization of ORM-generated queries. Some disparaging comments about the SQL generated by Rails.
18:41: Dinner was a quick bite at a Subway. Then after I return to the hacker lounge, there’s a call for a group to go to a sushi place. argh!
16:45: import rdma: Zero-copy networking with RDMA and Python. Interesting talk about kernel and user mode buffered-I/O, and the consequences of buffer copies in the socket interface. Locking down memory regions used for I/O feels like going back to the future, before the time of scatter/gather. But InfiniBand products’ price/performance is impressive. I don’t expect to use any of these techniques anytime soon, but I’ll file them away for future reference.
15:45: Cassandra: Strategies for Distributed Data Storage. Overview of CAP theorem, then delved into using Cassandra. A little too deeply too quickly for my interests, but I stayed with it. A good talk.