Using .htaccess to minimise comment and referrer spam

I have been using my .htaccess file to stop comment and referrer spam on this site and it has been surprisingly successful (so far!). How do I create a .htaccess file capable of greatly reducing comment and referrer spam?

Firstly, I use Awstats to analyse visits to my site daily and I use Spam Karma to help control comment spam. Both applications give me information on spammers visiting my site.

Awstats gives me a list of referrer sites, which includes the sites trying to spam my referrer logs. I monitor that list and, as new ones appear, I add them to my .htaccess file in the form:
RewriteCond %{HTTP_REFERER} \.domain\.tld [NC]
where .domain is the domain trying to spam my site (psxtreme, freakycheats, terashells, and so on) and the .tld is the top level domain the site is registered to (.com, .net, .org, .info, etc.).

So, for instance, in the case of the spammer coming from the smsportali.net domain, I have added the following line to my .htaccess code:
RewriteCond %{HTTP_REFERER} \.smsportali\.net [NC]
This will block visits referred from any subdomain of smsportali.net (for example, spamterm.smsportali.net), and the NC flag makes the match case-insensitive.
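As a rough sketch of how several of these referrer conditions fit together in one block: every condition except the last needs the OR flag, because mod_rewrite otherwise requires all conditions to match at once. Only smsportali.net comes from my own list here; the other two domains are made-up placeholders.

```apache
# Sketch only: spam-example-one.com and spam-example-two.info are hypothetical.
RewriteEngine On

# OR on every condition except the last; without it the conditions are ANDed
RewriteCond %{HTTP_REFERER} \.smsportali\.net [NC,OR]
RewriteCond %{HTTP_REFERER} \.spam-example-one\.com [NC,OR]
RewriteCond %{HTTP_REFERER} \.spam-example-two\.info [NC]

# "-" means no substitution; F answers with 403 Forbidden
RewriteRule .* - [F]
```

Because each pattern starts with an escaped dot, it matches www.smsportali.net and other subdomains but not a referrer of plain http://smsportali.net/. If you want to catch that too, a pattern such as [/.]smsportali\.net should cover both cases.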

In the case of comment spam, I have configured Spam Karma to email me every time it deletes a spam comment – this is becoming rarer and rarer as the .htaccess file becomes more and more effective. I have configured Spam Karma to include the server variables and request headers of a comment that is not approved in the email – this is one of the configuration options of this plugin.

Scanning these emails, I can see the User Agents being employed by these spammers – armed with this information, I added the following lines to my .htaccess file:
RewriteCond %{HTTP_USER_AGENT} Indy.Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Crazy\ Browser [NC]
RewriteRule .* - [F]
and this has greatly reduced the amount of comment spam coming through.
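For comparison, here is a minimal sketch of the same user agent block written with SetEnvIfNoCase instead of mod_rewrite. This is an alternative technique, not what the lines above do; it assumes mod_setenvif is loaded and uses the Apache 2.2 style Order/Allow/Deny directives (on Apache 2.4 those need mod_access_compat). The variable name spammer is arbitrary.

```apache
# Sketch only: tag requests whose User-Agent matches, case-insensitively
SetEnvIfNoCase User-Agent "Indy.Library" spammer
SetEnvIfNoCase User-Agent "^Crazy Browser" spammer

# Allow everyone, then deny any request that was tagged above
Order Allow,Deny
Allow from all
Deny from env=spammer
```

In both versions the unescaped dot in Indy.Library matches any single character, so the pattern also matches a user agent containing "Indy Library" with a space.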

Also, Cindy alerted me to the fact that adding:
RewriteCond %{HTTP:VIA} ^.+pinappleproxy [NC]
RewriteRule .* - [F]
will also catch a lot of the spammers.

I have a copy of my .htaccess file available for review (it is in .txt format).

NOTE:
Each set of RewriteCond lines in your .htaccess file must finish with a RewriteRule: RewriteRule .* - [F] returns a 403 (Forbidden) to the spammers. Your last set of rules should end with RewriteRule .* - [F,L]; the L tells the rewrite engine that this is the last rule and to stop processing here.
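Putting the note into practice, the overall layout might look something like the sketch below. It reuses only the patterns already shown in this post and is not a copy of my actual file. The key point is that a RewriteCond applies only to the RewriteRule that immediately follows it, so each set needs its own rule.

```apache
# Illustrative layout only, built from the examples in this post
RewriteEngine On

# Set 1: referrer spam
RewriteCond %{HTTP_REFERER} \.smsportali\.net [NC]
RewriteRule .* - [F]

# Set 2: user agents used by comment spammers
RewriteCond %{HTTP_USER_AGENT} Indy.Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Crazy\ Browser [NC]
RewriteRule .* - [F]

# Set 3 (last): proxy named in the Via header
RewriteCond %{HTTP:VIA} ^.+pinappleproxy [NC]
RewriteRule .* - [F,L]
```

As far as I understand the mod_rewrite documentation, the F flag already implies L, so the explicit L on the final rule is harmless rather than strictly required.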

IMPORTANT WARNING:
The .htaccess file is very unforgiving. It has the power to make your entire site unavailable to everyone. I strongly advise reading up on regular expressions and mod_rewrite (the Apache module which processes these commands in a .htaccess file) before creating a .htaccess file or modifying an existing one.

37 thoughts on “Using .htaccess to minimise comment and referrer spam”

  1. It looks like you’re having problems with many of the same spammers as I am, but your htaccess file is much cleaner than the one I’ve been building. Thanks for publishing it!

  2. Because .htaccess is so powerful, I set up a special directory to test the file. The directory contains an “index.html” file and a second one. I know I can’t fully test who is blocked, but I can at least test to make sure the new .htaccess file won’t block my whole site from absolutely positively everyone including me.

    This is useful for new users like me, who have never, ever, ever fiddled with the htaccess file.

  3. Pingback: orbitalworks
  4. I think your list of “bad words” could be shortened considerably by the simple expedient of using PHP’s “explode” and “implode” on the comments. Explode it to remove hyphens, and then implode to put spaces back in. Then process. Then you’ll weed out domains like buy-this-stuff when actually what you want to block is the word ‘stuff’.

    I’ve done this on my 1.2 implementation but am still screwing up the courage to upgrade to 1.5, as I hear it breaks various plugins I rely on. Spaminator, for one.

  5. Oh, the irony of having comment spam in the comments of a post about getting rid of comment spam…

    Good tips here and I plan to make use of them. My main site doesn’t use WordPress, but a small CMS that I am proud to have modified to its current state (and am too stubborn to give up on for that reason). As such, comment moderation is a bit difficult, so blocking spammers before they even reach the comment page is good for me. Thanks.

  6. Can you explain the Files trackback part of the code? I don’t have a file called trackback. Does it partial match on the name? Should it really be
    wp-trackback.php? That doesn’t seem right either or you may as well be blocking the get access…

    Also, isn’t there harm in blocking all mozilla and opera browsers from posting track backs?

    (Hopefully this next bit will look right in the comment… sorry if it doesn’t )

    Here’s the section I am talking about:
    # From Spamhuntress – code to deny the below user agents POST access to trackback
    <Files trackback>
    <limit POST>  

    SetEnvIf User-Agent "Mozilla" trackers
    SetEnvIf User-Agent "Opera" trackers
    SetEnvIf User-Agent ^$ trackers

    Order Allow,Deny
    Allow from all
    Deny from env=trackers

    </limit>
    </Files>

  7. What my blog says in Swedish is that using your .htaccess leaves SpamKarma out of work. I have hardly had any spam (usually caught by SK) since I modified my .htaccess with your code.

    I also write about having considered to exchange your list of known spammers for the compact optimized regex version of ReferrerCops blacklist. It might give Apache and the server a harder time, with all those regular expressions, but there should be few spammers passing by that test.

    I also mention Chongqed‘s blacklist, no regex, quite long.

    You are quite right in your comment on my site, I am positive. 🙂

    Regards,
    Johan Adler
    Sweden

  8. Great Johan – glad it was of some use to you (and apologies for my lack of Swedish!).

    By the way, I have started using Akismet recently and I find it is the best anti-spam tool I have come across yet!

  9. Clarification: I have thought of using ReferrerCops regular expressions blacklist, but putting their regexes in your .htaccess.

    I have not switched to Akismet, have not bothered to get the needed API. Inspired by you, I got the key (and a wordpress.com blog that will be unused, waste of space) and activated Akismet. It might not have much work to do either. 😉

  10. Oh- I meant to tell you this a month ago but you dropped the

    RewriteEngine On

    line from your .htaccess file.

    As I understand it, that line is somewhat important for the rest of the file to work right on most apache servers…

    http://www.apacheref.com/ref/mod_rewrite/RewriteEngine.html
    “By default, rewrite configurations are not inherited. Thus you need a RewriteEngine directive to switch this configuration on for each virtual host in which you wish to use it. ”

    But then again most all I know about .htaccess files, I learned from you so why should the student question the master!

  11. D’Oh!

    Thanks for the heads up Brian – hope that hasn’t been missing too long.

    I put it back in now (and I’m hardly a master – I just read up on that stuff when I was having spam problems – I’ve forgotten a lot of it now 😦 ),

    Cheers,

    Tom.

  12. There are two BIG limitations of your method.

    The first big problem is with referrer checking in general. The problem is that many of today’s browsers and firewalls now strip the referrer information. It’s a privacy thing.

    Therefore if you implement that check, you will block a lot of legitimate traffic that comes to you with a blank referrer. Likewise, if you allow blanks, you will let in all of the spam bots that don’t send a referrer.

    Additionally, if you did something like:

    <
    order deny,allow
    deny from all
    allow from yoursite.com
    </Limit>

    How would the people get to your site initially? If they browsed straight to it, they would get permission denied error.

    This might work in some limited office environment on a subdirectory that should only be accessed from the site and using the company-approved browser. Then you could consider all traffic without a referrer illegitimate. Basically, if you have control over some of the variables in the situation, YMMV.

  13. thanks Brian for comment!

    ok. I have everyday spamming:
    – ip is different everytime
    – text the SAME, but antispam-bad-words-list doesn’t filter it.
    things like: -0-XXX-0, -0 XXX 0 -, etc.

    I can’t even imagine how to prevent spam, besides send *** kind words to spammers 🙂

    I suppose I could use a referrer check against the spam machine: if it isn’t yoursite.com, then it is badguy-goodbye.com 8)

  14. How does it work when “page.php” is in one, two, three level deep folder?
    Is it like this:
    RewriteCond %{REQUEST_URI} .folder1/folder2/folder3/page\.php*
    RewriteCond %{REQUEST_URI} .folder1/folder2/page\.php*
    RewriteCond %{REQUEST_URI} .folder1/page\.php*

    OR just like this:
    RewriteCond %{REQUEST_URI} .page\.php*

    Cheers

  15. I am new to .htaccess and have to ask…
    Q1: Can I use this for any page that is posting data?
    Q2: If Q1 is YES, my page is one folder deep, ie:comments/page.php
    Do I do this:
    RewriteCond %{REQUEST_URI} .comments/page.php\.php*
    Or this:
    RewriteCond %{REQUEST_URI} .comments\page.php\.php*
    Or this:
    RewriteCond %{REQUEST_URI} .http://www/example.com/comments/page.php\.php*
    Or this:
    RewriteCond %{REQUEST_URI} ./var/htdocs/web/comments/page.php\.php*

    Any help would be great.
    Cheers

  16. Wow… the thread from beyond the grave! 4 years later and I’m still getting comment notifications 🙂 Well, I’m just stopping by because this thread seems like an old friend. Cheers all!

  17. Might be the thread from the grave, however a great thread and interesting use of .htaccess. I did some googling and this is what I got, a great article. I especially liked the SetEnvIfNoCase method which seems very clean indeed!

  18. I’m curious if I can block using jokers (wildcards). For example, I get referrer spam from all kinds of websites with “weddingdresses” in the URL, or airmax or intl-alliance; it would be great to be able to block the whole lot in one line.
