We didn’t initially build captchas into TrenchMice, because we simply didn’t think they would be necessary.
By September 2006, the site started receiving spam comments. They were the usual gibberish you see in blog spam: Lots of links, garbage words, and bogus e-mail addresses. (Whenever I see this stuff, I shake my head and wonder why script kidz waste their time generating it. Then I remember it’s because, out of the gazillions of spam messages, some recipients click on the links, making spam financially rewarding. And then I get slightly depressed about the average Internet user. <Sigh.> But, I digress…)
So we broke down and added a captcha system using PIL and our own algorithms. Our images were simple, as modern captchas go:
But they got the job done, with an acceptable load on our servers.
We decided to upgrade the captcha technology for two reasons.
- Lately, we’ve noticed more doorknob-jiggling activity. This hasn’t yet resulted in comment spam, but it indicates the site is getting more attention from spammers. I don’t want to wait until there’s a successful attack to better secure the site.
- We’re not interested in developing a core competency in captcha design. The initial captchas were easy to do, but we don’t want to invest time into learning the latest and greatest imaging techniques now that more work is required.
We went with reCAPTCHA. Their image algorithms are way more sophisticated than ours, and they believe they’re as good as any out there.
Their system does useful work by correcting OCR text from digitized books. This is rather cool.
They claim excellent system availability for their users, and expect to be in business for years. There are no indications to the contrary.
If a hacker cracks their images, they promise to respond quickly by tweaking their algorithms. So we won’t have to do much besides add our voice to the “Please fix this” thread that would presumably get created in their support newsgroup.
I replaced our template captcha code with this. It’s a straight lift from their client API instructions:
The captcha_error template variable is "&error=ERROR_CODE" if we’re re-displaying a bad form after a POST. Otherwise, it’s an empty string. I vacillated for 30 minutes over moving this into the view’s form class, but I kept it in the template because:
- TrenchMice has a mix of oldforms and newforms, because we agreed to upgrade a page to newforms only if we edit it for another reason (e.g., fixing a bug or changing the form). We haven’t done all of them yet. I didn’t want to procedurally trigger an update of the remaining oldforms-based views that use captchas; and if I chose to ignore this self-imposed rule, I didn’t want to burden those views with even more code that would eventually have to be updated.
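To make the captcha_error convention concrete, here is a minimal sketch of how such a string slots onto the end of a challenge URL. The helper names, the URL, the key, and the error code are all made up for illustration; none of them come from the reCAPTCHA instructions or from our templates. It also shows why the variable should default to an empty string rather than None.

```python
# Hypothetical placeholder URL, NOT the real reCAPTCHA endpoint.
CHALLENGE_URL = "https://captcha.example/challenge"


def captcha_error_param(error_code=None):
    """Return "&error=CODE" for a failed captcha, or "" when there was no error."""
    if not error_code:
        return ""
    return "&error=%s" % error_code


def challenge_src(public_key, captcha_error=""):
    """Build the challenge URL; captcha_error is appended verbatim."""
    return "%s?k=%s%s" % (CHALLENGE_URL, public_key, captcha_error)


# No error: the query string carries only the key.
clean_url = challenge_src("PUBKEY")

# Failed captcha: the error parameter rides along on the re-displayed form.
error_url = challenge_src("PUBKEY", captcha_error_param("example-code"))

# Passing None instead of "" leaks the literal text "None" into the URL.
broken_url = challenge_src("PUBKEY", None)
```

Because the string is appended as-is, the template needs no conditional logic: it can print the variable unconditionally.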
I replaced our view captcha code with this, which uses the Python recaptcha-client. (Warning: recaptcha-client didn’t install properly on my system until I tweaked the package files. YMMV.)
    # Initialize to an empty string, not None, so the reCAPTCHA call query string
    # will be correct if there wasn't a captcha error on POST.
    captcha_error = ""
    if request.method == 'POST':
        ##
        # Check the form captcha. If not good, pass the template an error code
        captcha_response = \
            captcha.submit(request.POST.get("recaptcha_challenge_field", None),
                           request.POST.get("recaptcha_response_field", None),
                           RECAPTCHA_PRIVATE_KEY,
                           request.META.get("REMOTE_ADDR", None))
        if not captcha_response.is_valid:
            captcha_error = "&error=%s" % captcha_response.error_code
        elif form.is_valid():
            ## ...
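The order of the checks matters: the captcha is tested first, and the form is validated only if the captcha passed. The sketch below isolates that control flow so it can be run without Django or recaptcha-client. FakeCaptchaResponse and handle_post are hypothetical stand-ins, not part of either library; the stand-in mimics only the two attributes the view reads (is_valid and error_code).

```python
from collections import namedtuple

# Hypothetical stand-in for the object returned by captcha.submit();
# it carries only the two attributes the view code reads.
FakeCaptchaResponse = namedtuple("FakeCaptchaResponse", ["is_valid", "error_code"])


def handle_post(method, captcha_response, form_is_valid):
    """Mirror the view's branching and return (captcha_error, outcome)."""
    captcha_error = ""       # empty string, not None, for the query string
    outcome = "show_form"    # default: (re)display the form
    if method == "POST":
        if not captcha_response.is_valid:
            # Bad captcha: hand the template an error code, skip form checks.
            captcha_error = "&error=%s" % captcha_response.error_code
        elif form_is_valid:
            # Good captcha and a good form: proceed with the real work.
            outcome = "save"
    return captcha_error, outcome


# A failed captcha re-displays the form even if every other field is fine.
result = handle_post("POST", FakeCaptchaResponse(False, "example-code"), True)
```

A good captcha paired with an invalid form falls through both branches, so the form is re-displayed with an empty captcha_error and a fresh challenge.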
I also swapped out our PIL-based e-mail address obfuscation for the reCAPTCHA Mailhide API. Recaptcha-client had code for this too, and it was easy to hook up. So easy that I won't bother writing about it.🙂
The end result
The reCAPTCHA captchas work great, and the total amount of view and template code decreased. Our simple captchas had small view hacks to handle the case of re-displaying a form that had a good captcha response but a problem in another field. That code, however minor, is now gone. We also had a background script to clean the captcha image file directory — gone. We also had a font directory for the images — gone.
Visually, the styling isn't completely in keeping with the rest of the page. But it's perfectly acceptable. We had been displaying a simple blue box with green text, and I can't claim that was visually wonderful.