Like many bloggers, I use Akismet to screen comments for spam. In the past month, I’ve noticed two new kinds of spam that are cleverer than the usual gibberish. Akismet didn’t classify them as spam, but it did at least classify them as questionable.
One new spam type consists of text excerpted from other sites that includes one or more capitalized words from the blog post. I guess the rationale is that capitalized words are likely to be proper nouns. (If the spammers also filter out words beginning a sentence, what's left is virtually guaranteed to be proper nouns.)
For example, say a post contains the word “Django.” The spammer extracts capitalized words from the post, selects “Django”, and does a web search for it. They then extract a couple of sentences containing that word from the search results, and use them as the spam text.
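I haven't seen the spammers' code, so this is only a guess at the word-selection step, sketched in Python. The function name `candidate_proper_nouns` and the sentence-splitting rule are my own assumptions. It keeps capitalized words and skips the first word of each sentence, which is the filter described above.

```python
import re

def candidate_proper_nouns(text):
    """Return capitalized words that do not begin a sentence (rough heuristic)."""
    candidates = []
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)
        # Skip the first word: it is capitalized whether or not it is a proper noun.
        for word in words[1:]:
            # Ignore one-letter words such as the pronoun "I".
            if len(word) > 1 and word[0].isupper() and word not in candidates:
                candidates.append(word)
    return candidates

post = "I built the site with Django. It runs on Linux and uses Postgres."
print(candidate_proper_nouns(post))
```

Knowing the trick is mostly useful for recognizing it: a comment whose only link to the post is one of these words deserves a second look.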
In my case, the questionable comments were odd because the text was disjointed. I did some searching, found the sites from which they originated, and then, Bzzzzzzt, flagged the comments as spam.
Another new type contains text lifted from existing comments on the blog post. When I saw one of these in my Akismet “pending” queue, my initial reaction was, “Those words are passably relevant to the post, but they’re…odd. And reading them gives me déjà vu.” After a few seconds of chin-scratching, I searched for a string from the comment, and lo and behold, my own blog came up in the results! Bzzzzzzzt.
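The manual search I did could be automated. Here is a minimal sketch, not something Akismet is known to do: it measures how much of a new comment's text already appears in earlier comments on the same post, by comparing overlapping five-word runs. The names `copied_fraction` and `_shingles`, and the run length of five, are my own choices for illustration.

```python
import re

def _shingles(text, size=5):
    """Return the set of overlapping word runs of length `size` in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Very short text: treat the whole thing as a single run.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def copied_fraction(new_comment, existing_comments, size=5):
    """Fraction of the new comment's word runs found in existing comments."""
    new_runs = _shingles(new_comment, size)
    if not new_runs:
        return 0.0
    seen = set()
    for comment in existing_comments:
        seen |= _shingles(comment, size)
    return len(new_runs & seen) / len(new_runs)

existing = ["I have been using Akismet for years and it catches nearly everything."]
print(copied_fraction("been using Akismet for years and it catches", existing))
print(copied_fraction("Thanks for the writeup, this matches what I see on my own blog.", existing))
```

A fraction near 1.0 means the comment is almost entirely recycled from the thread. Legitimate replies that quote an earlier comment will also score above zero, so the threshold for holding a comment would need tuning.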
These new tactics are attempts at “spam seeding,” in which a spammer tries to get his/her address recognized as legitimate, as preparation for an eventual spam campaign.
Just as there’s a Captcha arms race, there’s a comment-spam arms race. Makes you wonder how much more advanced our software products would be if all this energy were directed to more productive work.
To fight spam, I have created my own implementation of a “reverse captcha” system. It’s designed so that only dumb robots get caught in its net, and it’s effective.
I can’t post my implementation, because that would reveal the trick. But there are other examples of the approach: create a trap field called ’email’, use JavaScript to load the form, etc.
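For readers who haven't seen the trap-field idea, here is a minimal sketch of the server-side check. It is not the implementation mentioned above, just a generic example. It assumes the form includes a field named "email" that is hidden from human visitors (for instance with CSS), and that the real address is collected under a different, hypothetical name such as "contact_addr". A person never sees the trap field and leaves it empty, while a simple bot fills in every field it finds.

```python
TRAP_FIELD = "email"  # hidden from humans; the real address uses another field name

def looks_like_bot(form):
    """Return True if the hidden trap field was filled in."""
    value = form.get(TRAP_FIELD) or ""
    return bool(value.strip())

human = {"author": "Ann", "contact_addr": "ann@example.com", "email": "", "comment": "Nice post."}
robot = {"author": "x", "contact_addr": "x@example.com", "email": "x@example.com", "comment": "Buy now"}
print(looks_like_bot(human))
print(looks_like_bot(robot))
```

This only stops bots that fill in forms blindly. Anything that renders the page or is written for a specific site will get past it, which is presumably why the real trick stays secret.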