The Joy of mod_rewrite and .htaccess

I hate comment and trackback spam. I hate it with the white-hot intensity of a thousand suns. I hate to have to install a plug-in to handle it. I hate the “spam digests” I receive via email. I hate the entire, sordid business of blog spam.

One of the main ways I choose to enjoy my blog is through my site statistics. I love seeing all the happy referrals, the most popular pages and the search engine searches that all lead people to my site and ultimately represent more traffic for me. I also enjoy a good bit of “egorati”, but my main source of pride comes from these numbers. Spam makes these numbers skewed and meaningless, hence my hatred.

So yesterday, I decided to do something about the spam. I looked at my failed referrers and failure reports and found a representative set of URLs to rewrite. The tool I used, mod_rewrite, is actually used extensively by WordPress to make all my permalink, trackback and syndication URLs look less like PHP and more like a human wrote them. It works extremely well and is fairly intuitive once you get the hang of it. I used this awesome online tutorial to help write my rewrite rules. The rules go in your .htaccess file, a nifty “distributed configuration file” for redirecting, rewriting and restricting access to pages on your blog.
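A minimal sketch of what such a rule looks like, assuming mod_rewrite is enabled and the server allows .htaccess overrides. The file names old-page.html and new-page.html are hypothetical placeholders, not URLs from this site.

```apache
# Switch the rewrite engine on; RewriteRule lines are not processed without it.
RewriteEngine On

# General shape: RewriteRule <regex pattern> <substitution> [flags]
# In .htaccess the pattern is matched against the path relative to the
# directory holding the file, so it has no leading slash.
# old-page.html and new-page.html are hypothetical names.
RewriteRule ^old-page\.html$ /new-page.html [R,L]
```

The R flag sends the browser an external redirect (a 302 unless a status code is given), and L stops any later rules from being applied to that request.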

The biggest blogspam offenders were the URLs that previously led to the comment and trackback CGI scripts used by MovableType. It seems that spammers were using my old blog software to try to spam me. Now those URLs happily redirect to a “Die, Fucking Spammer! Die!” message page. Try it yourself. Pardon the profanity, but they really irk me.
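Redirecting to a message page is one choice. A possible alternative, sketched here as an assumption and not what this site does, is to refuse the request outright: the F flag answers with 403 Forbidden and the G flag answers with 410 Gone. The paths are those of the old MovableType comment and trackback scripts.

```apache
RewriteEngine On

# Answer requests for the retired comment script with 410 Gone.
# The "-" substitution means "leave the URL unchanged".
RewriteRule ^cgi-bin/mt/mt-comments\.cgi$ - [G]

# Answer requests for the retired trackback script with 403 Forbidden.
RewriteRule ^cgi-bin/mt/mt-tb\.cgi$ - [F]
```

Both flags end rule processing on their own, so no L is needed. The trade-off is that the spammer never gets to read the message page.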

After dealing with the spam, I noticed a few other URLs, former links to my RSS feeds, which didn’t work anymore and were causing 404 errors. I wrote a few rules for these as well, rewriting them to my existing feeds. I have few readers as it is – I can’t afford to lose the old ones. If you had subscribed previously and thought the site had gone dormant, hopefully now you’ll be getting updates.

Finally, I rewrote all the old MovableType archive URLs to newer WordPress archive URLs. It isn’t as slick as I’d like it to be, but it works. Now people who come from old bookmarks or bad search results won’t be left out in the cold. They’ll get to my content; they just won’t get the nice, pretty WordPress URLs.

For those that are curious, here are the new mod_rewrite rules in my .htaccess file:

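# Old feed URLs now point at the WordPress RSS 2.0 feed
RewriteRule ^index\.rdf$ /wp-rss2.php [R,L]
RewriteRule ^index\.xml$ /wp-rss2.php [R,L]
RewriteRule ^atom\.xml$ /wp-rss2.php [R,L]
# Old MovableType comment and trackback scripts go to the spam page
RewriteRule ^cgi-bin/mt/mt-comments\.cgi$ /spam.html [R,L]
RewriteRule ^cgi-bin/mt/mt-tb\.cgi$ /spam.html [R,L]
# Old archive pages pass their number to WordPress as a post ID
RewriteRule ^archives/([0-9]+)\.html$ /?p=$1 [R,L]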
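
A possible tightening of the rules above, offered as a sketch and not a tested replacement. It assumes the old feed and archive URLs have moved for good, so it uses R=301 (a bare R sends a temporary 302) to tell feed readers and search engines to update their records. It also groups the three feed names into one pattern and ends the spam target with a ?, which drops whatever query string came with the request.

```apache
RewriteEngine On

# All three old feed files, permanently moved to the WordPress RSS 2.0 feed.
RewriteRule ^(index\.rdf|index\.xml|atom\.xml)$ /wp-rss2.php [R=301,L]

# Old comment and trackback scripts; the trailing "?" discards the
# incoming query string instead of passing it along to spam.html.
RewriteRule ^cgi-bin/mt/mt-(comments|tb)\.cgi$ /spam.html? [R,L]

# Old archive pages, permanently moved; the number is passed through
# as the WordPress post ID, as in the original rule.
RewriteRule ^archives/([0-9]+)\.html$ /?p=$1 [R=301,L]
```

Browsers tend to cache 301 responses, so it may be worth testing with a plain R first and switching to R=301 once the targets are confirmed.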

I hope the blog experience here at Mostly Muppet Dot Com is now 100% better for legitimate visitors and completely dreadful for spammers.

Die, Fucking Spammer! Die!

2 thoughts on “The Joy of mod_rewrite and .htaccess”

  1. There’s a whole lot more one can do to combat spam. A lot more ways to utilize .htaccess. Like pinappleproxy block, some trackback blocks etc. Just search my site for htaccess to find them.
