Tag: .htaccess

How to block comment spam

Like all bloggers, I find comment spam to be a constant annoyance. There are many ways to mitigate the problems it causes, however, and using the following techniques means that this site is subject to almost no comment spam.

Use WordPress’ built-in comment spam tools –

  • In WordPress Options -> Discussion, fill in the list of common spam words – words in this list automatically cause a comment to go into the moderation queue. I use the following list.
  • Also use the Comment Blacklist field. Populate this very carefully. Any comment containing words in this list is nuked automatically. No notification. No way to get it back. Gone. This is the list of words I have in my blacklist.
  • I have checked the “Comment author must have a previously approved comment” field as well. This is a very simple but very effective tool – regular commenters are able to leave comments and see them appear instantly; new commenters’ comments are held for approval and, if they are not spam, their first comment appears in short order and subsequent comments appear immediately.
  • And I use WordPress’ built-in anti-spam plugin – Akismet.

I also have a custom .htaccess file which stops a lot of spammers cold before they reach the site at all. Exercise extreme caution with .htaccess files as they can take your entire site down. If you are not sure what you are doing, I have written a few explanatory articles on .htaccess files previously. If you are still not sure what you are doing, put the .htaccess file down and walk away very slowly!!!

Finally, I use plugins called Referrer Karma and Bad Behaviour which help significantly by stopping bots from accessing your site to leave comment spam.

Implementing these techniques keeps my site free of comment spam without having to moderate all comments and without having to implement CAPTCHAs. CAPTCHAs are those horrible, badly drawn images of combinations of letters and numbers which some people put on their sites to stop spam. CAPTCHAs are evil*. Stop using them. Now.

* The American Foundation for the Blind has written many times about how difficult CAPTCHAs make browsing for blind or partially sighted people, and the W3C, in a report on CAPTCHAs, said:

A common method of limiting access to services made available over the Web is visual verification of a bitmapped image. This presents a major problem to users who are blind, have low vision, or have a learning disability such as dyslexia.

Blocking trackback spam using .htaccess

This morning I received a trackback spam. It pointed at a rubbish domain – ohuudfghj.com, and came from IP address 172.164.210.50 using User Agent Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90).

I took a look at Spamhuntress’ site and sure enough she has a post warning that a trackback spam run is about to get underway.

Then I checked out my raw log files and found several entries from this User Agent, all from different IP addresses, so banning the IP address would be useless to block the spam.

Consequently, I added the following line:
SetEnvIfNoCase User-Agent "^Mozilla/4\.0 \(compatible; MSIE 5\.5; Windows 98; Win 9x 4\.90\)" spammer=yes
to my .htaccess file.

Although this may seem a tad drastic, I trawled through my raw log files and couldn’t find any legitimate entry for that User Agent in my logs.

Be aware that if you intend to use this code, you need to use it in the context of the surrounding code in my .htaccess file, i.e. follow the code with
deny from env=spammer
If you are uncertain, be sure to check out my .htaccess file.
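
Put together, the two pieces might look like the sketch below. It assumes Apache 2.2-style access control (Order/Allow/Deny); on Apache 2.4 these directives are only available through mod_access_compat. Treat it as an illustration rather than the exact contents of the .htaccess file mentioned above.

```apache
# Flag any request whose User-Agent starts with the spammer's string.
# The pattern is quoted because it contains spaces.
SetEnvIfNoCase User-Agent "^Mozilla/4\.0 \(compatible; MSIE 5\.5; Windows 98; Win 9x 4\.90\)" spammer=yes

# Allow everyone by default, then refuse flagged requests with a 403.
Order allow,deny
Allow from all
Deny from env=spammer
```

Because the flag is set by one directive and acted on by another, several SetEnvIfNoCase lines can share the same spammer variable and a single Deny line.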

You can test the efficacy of this code by going to the Wannabrowser site, entering the User Agent into the HTTP User Agent field, your site’s address in the Location field and clicking the Load URL button. You should get a 403 result if the code is successfully blocking this User Agent.

UPDATE: Diane let me know that this code was too strict as it was blocking her and she isn’t on a Windows 98 PC. Spamhuntress pointed to a script to block access to Trackbacks – basically you use this script. I have been using the script and haven’t received any trackback spam since I installed it.

Using WordPress as a CMS – More problems sorted

WordPress is the Content Management System (CMS) used to produce this site, but shoehorning it to produce a complete site can cause unforeseen problems. For example, my Permalink structure was denying me access to my awstats folder!

Why? – this is because I’m publishing the site using WordPress 1.5 and its mod_rewrite rules (created by the permalink structure) incorrectly assume that the awstats folder is part of the WordPress site (because the RewriteBase is set to /), and when WordPress can’t find awstats within the site, I get dropped into a 404 page.

I tried accessing the folder using the IP address instead of the domain but couldn’t get in – my ISP subsequently told me this wouldn’t work – that I need to come in under my domain for awstats to function.

To sort this out, I inserted the following code above the code WordPress inserted in my .htaccess:
RewriteCond %{REQUEST_URI} ^/awstats/
RewriteRule .* - [L]
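
To show where that exception sits relative to WordPress’ own rules, here is a minimal sketch. The WordPress block shown is the shape that later WordPress versions generate and is an assumption here; the rules written by WordPress 1.5 look different, but the ordering principle is the same: the exception must come first.

```apache
# Leave anything under /awstats/ untouched and stop processing rules for it.
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/awstats/
RewriteRule .* - [L]

# BEGIN WordPress (assumed, later-style permalink block)
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
# Only hand the request to WordPress if it is not a real file or folder.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
```

The "-" substitution means "do not change the URL", and [L] stops further rules in this pass from touching the request.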

Hat tip to Niall for helping out with this.

Comment spam plugins no longer required!

I have written many posts on my battles with WordPress comment spam, but all that appears to be coming to a very satisfactory conclusion. I am now no longer using any comment spam plugins and I have stopped moderating comments on this blog.

How did I get to this enviable position? Well, it has been a long road and I have learned loads about WordPress along the way.

I started down this road by trying various comment spam plugins with different degrees of success. However, none were really satisfactory. The best one was WP-Hashcash – best in that it was most transparent to the user – but it requires commenters to have JavaScript turned on in their browser. So I kept looking for another strategy to eradicate this scourge from my blog.

I upgraded from WordPress 1.2 to WordPress 1.5 (the current version) – WordPress 1.5 has a number of anti-spam comment features built in.

Of these, I have set the number of links allowed in comments to 3 – any more than that, and the comment is auto-moderated.

I have populated the blacklist with a short list of words (just over 40) – any comments containing these words are automatically deleted – boom! No notification to me, no notification to the commenter.

I have written a custom .htaccess file which blocks a lot of potential spam commenters at the gates. Instructions on how and why I set it up are here.

And finally, I have installed Dr. Dave’s plugin Referrer Karma. I know, I know, I said I didn’t have any comment plugins, but I don’t. Referrer Karma is a referrer spam plugin which just happens to work like my .htaccess file (but much more elegantly) to block the bad guys at the gates.

The combination of these measures has allowed me to turn off moderation on the comments on my blog – and so far (one week later) no comment spam has made it through my defences. I’m not saying the war is over but, so far, I seem to have won this round.

Block hotlinkers but allow some sites remote access to images using .htaccess

In a previous post I explained how to create a .htaccess file to stop remote image linking (hotlinking) and bandwidth theft – however, there are some situations where you might want your image files linked to from remote sites – how do you make exceptions for these sites?

The code to block all sites from hotlinking to your images is as follows (see my previous post for a detailed explanation of the code):
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?tomrafteryit\.net [NC]
RewriteRule \.(png|gif|jpe?g)$ - [NC,F]

To allow Google, AltaVista, Gigablast, Comet Systems, and SearchHippo translators and caches to be able to link to images we need to use the following code:
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?tomrafteryit\.net [NC]
RewriteCond %{HTTP_REFERER} !^http://216\.239\.(3[2-9]|[45][0-9]|6[0-3]).*(www\.)?tomrafteryit\.net [NC]
RewriteCond %{HTTP_REFERER} !^http://babel\.altavista\.com/.*(www\.)?tomrafteryit\.net [NC]
RewriteCond %{HTTP_REFERER} !^http://216\.243\.113\.1/cgi/
RewriteCond %{HTTP_REFERER} !^http://search.*\.cometsystems\.com/search.*(www\.)?tomrafteryit\.net [NC]
RewriteCond %{HTTP_REFERER} !^http://.*searchhippo\.com.*(www\.)?tomrafteryit\.net [NC]
RewriteRule \.(png|gif|jpe?g)$ - [NC,F]

And obviously, everywhere you see my domain (tomrafteryit.net) in the code, substitute in your own domain.
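
The same pattern extends to any other site you want to let through. As a minimal sketch, assuming a hypothetical friendly site at example.org (not one of the sites listed above), you add one more negated condition before the rule:

```apache
RewriteEngine on
# Allow requests that send no referrer at all.
RewriteCond %{HTTP_REFERER} !^$
# Allow your own site.
RewriteCond %{HTTP_REFERER} !^http://(www\.)?tomrafteryit\.net [NC]
# Allow a hypothetical friendly site (replace example.org with the real domain).
RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.org [NC]
# Every other referrer gets a 403 for image requests.
RewriteRule \.(png|gif|jpe?g)$ - [NC,F]
```

Conditions without an OR flag are ANDed together, so the rule only fires when the referrer matches none of the allowed patterns.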

Using .htaccess to minimise comment and referrer spam

I have been using my .htaccess file to stop comment and referrer spam on this site and it has been surprisingly successful (so far!). How do I create a .htaccess file capable of greatly reducing comment and referrer spam?

Firstly, I use Awstats to analyse visits to my site daily and I use Spam Karma to help control comment spam. Both applications give me information on spammers visiting my site.

Awstats gives me a list of the referrer sites – this list contains those sites which are trying to spam my referrer logs. I monitor those sites and as new ones appear I add them to my .htaccess list in the form:
RewriteCond %{HTTP_REFERER} \.domain\.tld [NC]
where .domain is the domain trying to spam my site (psxtreme, freakycheats, terashells, and so on) and the .tld is the top level domain the site is registered to (.com, .net, .org, .info, etc.).

So, for instance, in the case of the spammer coming from the smsportali.net domain, I have added the following line to my .htaccess code:
RewriteCond %{HTTP_REFERER} \.smsportali\.net [NC]
This will stop accesses from all subdomains of smsportali.net (e.g. spamterm.smsportali.net) to the site, and the NC ensures that this rule is case insensitive.

In the case of comment spam, I have configured Spam Karma to email me every time it deletes a spam comment – this is becoming rarer and rarer as the .htaccess file becomes more and more effective. I have configured Spam Karma to include the server variables and request headers of a comment that is not approved in the email – this is one of the configuration options of this plugin.

Scanning these emails, I can see the User Agents being employed by these spammers – armed with this information, I added the following lines to my .htaccess file:
RewriteCond %{HTTP_USER_AGENT} Indy.Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Crazy\ Browser [NC]
RewriteRule .* - [F]
and this has greatly reduced the amount of comment spam coming through.

Also, Cindy alerted me to the fact that adding:
RewriteCond %{HTTP:VIA} ^.+pinappleproxy [NC]
RewriteRule .* - [F]
will also catch a lot of the spammers.

I have a copy of my .htaccess file available for review (it is in .txt format).

NOTE:
For each set of rules in your .htaccess file, you need to finish with a RewriteRule. The line RewriteRule .* - [F] will give a 403 (page forbidden) to the spammers. Your last set of rules should end with RewriteRule .* - [F,L] – the L telling the RewriteEngine that this is the last line and to stop processing the rules here.
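
As a minimal sketch of how these sets fit together, the fragment below combines the referrer and User Agent conditions from this post. The second referrer domain (spam-example.com) is a made-up placeholder. Note the OR flag on every condition in a set except the last: without it, conditions are ANDed and a request would have to match all of them at once.

```apache
RewriteEngine on

# Set 1: referrer spammers. Any one matching referrer is enough (OR).
RewriteCond %{HTTP_REFERER} \.smsportali\.net [NC,OR]
RewriteCond %{HTTP_REFERER} \.spam-example\.com [NC]
RewriteRule .* - [F]

# Set 2: User Agents seen in spam comments.
RewriteCond %{HTTP_USER_AGENT} Indy.Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Crazy\ Browser [NC]
# Final set: F returns the 403, L marks this as the last rule.
RewriteRule .* - [F,L]
```

Each group of RewriteCond lines applies only to the RewriteRule immediately after it, which is why every set needs its own rule.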

IMPORTANT WARNING:
The .htaccess file is a very unforgiving file. It has the power to make your entire site unavailable to anyone. It is strongly advised to read up on Regular Expressions and mod_rewrite (the Apache module which processes these commands in a .htaccess file) before creating a .htaccess file or modifying an existing one.

Using .htaccess to redirect hotlinkers to another image

In my last post on using .htaccess to block direct linking of images, I advised simply using the RewriteRule to forbid display of images (i.e. RewriteRule \.(png|gif|jpe?g)$ - [NC,F]). This is a nice simple rule which works a treat to block display of your images on remote sites.

However, if you want to take this a step further, you can re-direct requests for images from remote webpages to an image of choice on your website. I have created an image, called stolenimage.jpg, which simply says “This image is stolen”. Anyone trying to link directly to images on my site is, therefore, inadvertently serving that image on their pages.

The code to put in .htaccess to achieve this is:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?tomrafteryit\.net [NC]
RewriteRule \.(png|gif|jpe?g)$ stolenimage.$1 [NC,L]

This is the same code as is in my previous post except for the RewriteRule.

It is a very good idea not to redirect a browser from one file type to another. The cleanest approach is to make a separate version of the stolenimage.jpg file in each format that you use on your site – for example I have one in gif format, one in jpg format, one in jpeg format, and one in png format. Then redirect each hot-linked image to the matching filetype.

In the RewriteRule above, the “$1” in the last line refers back to the contents of the parentheses in the same line. That is, a request for a .jpg file will be redirected to http://www.tomrafteryit.net/stolenimage.jpg, and a request for a .gif file will be redirected to http://www.tomrafteryit.net/stolenimage.gif, etc.

The L in the square brackets is the “last rule” – it stops the rewriting process here and tells the .htaccess file not to apply any more rewriting rules. See the Apache mod_rewrite URL Rewriting Engine page for more.
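
One thing to watch is that a hotlinked request for the replacement image itself also matches the rule. A more defensive variant, sketched below, excludes it explicitly and uses an absolute path. This assumes the stolenimage files sit in the site root; the extra condition is an addition for illustration, not part of the code above.

```apache
RewriteEngine on
# Allow requests that send no referrer at all.
RewriteCond %{HTTP_REFERER} !^$
# Allow your own site.
RewriteCond %{HTTP_REFERER} !^http://(www\.)?tomrafteryit\.net [NC]
# Do not rewrite requests for the replacement image itself.
RewriteCond %{REQUEST_URI} !/stolenimage\. [NC]
# Serve the replacement in the same format as the image requested.
RewriteRule \.(png|gif|jpe?g)$ /stolenimage.$1 [NC,L]
```

Because of the NC flag, $1 keeps whatever case the request used, so a request for photo.JPG would look for stolenimage.JPG; on a case-sensitive file system that file would need to exist too.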

Obviously, if you are feeling a bit mischievous, you can serve other images to people hotlinking your images – “Free shipping worldwide – we ship anywhere for free”, “Order one, get three free” or “This site supports the Taliban’s policy on Feminism” are some possibilities! You are only limited by your imagination.

Many thanks to all the contributors to the WebmasterWorld forums, from where I gleaned most of the information in these posts.

Using .htaccess to stop remote image linking (hotlinking) and bandwidth theft

Hotlinking (also called remote image linking or direct image linking) is when a remote website embeds images from your site on their webpage(s) – this causes the image to be served from your website to anyone browsing their site – thus they are robbing your bandwidth.

How can you stop this? Well, using an .htaccess file in your images folder(s), there are a number of options.

The most straightforward is to simply create an .htaccess file with the following code:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?tomrafteryit\.net [NC]
RewriteRule \.(png|gif|jpe?g)$ - [NC,F]

The first line here turns on mod_rewrite (a rule-based rewriting engine, based on a regular-expression parser, that rewrites requested URLs on the fly) and only needs to be done once per .htaccess file.
The next line is needed to allow your images to be viewed through proxy caches. If you take it out, then anyone without a referrer won’t be able to view your images. Many proxy caches, for instance, block referrers… and that looks the same as a directly-entered URL.
The third line tells the .htaccess file where to allow image files to be served from – in this case it will allow images to be served from http://tomrafteryit.net and http://www.tomrafteryit.net (remember to update this for your own domain!).
The final line is case insensitive (the NC) and tells the .htaccess file which file types to restrict the serving of. You could just as easily use this to protect .mp3s, .pdfs or any other file type by substituting the file type in this line. The F in the square brackets forces the current URL to be forbidden.
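
As a minimal sketch of that substitution, the variant below protects .mp3 and .pdf files instead of images; only the file extensions in the last line differ from the code above.

```apache
RewriteEngine on
# Allow requests that send no referrer at all.
RewriteCond %{HTTP_REFERER} !^$
# Allow your own site (update this for your own domain).
RewriteCond %{HTTP_REFERER} !^http://(www\.)?tomrafteryit\.net [NC]
# Forbid remote sites from linking directly to audio files and PDFs.
RewriteRule \.(mp3|pdf)$ - [NC,F]
```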

For more information on this see the Apache mod_rewrite URL Rewriting Engine page.

There are more things you can do via .htaccess to stop people hotlinking to your images that I’ll cover in my next post.

Warning – The .htaccess file is very powerful (it can potentially take your entire site offline) and sensitive to typos – always test your site after making changes and be sure you have a plan to revert in the event of a problem arising.

How to create an .htaccess file

The .htaccess file is a very powerful tool – amongst other things, it allows you to password protect folders, redirect users automatically, use custom error pages, change your file extensions, ban users by IP address, only allow users with certain IP addresses, stop directory listings and use an alternate index file.
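
To give a flavour of a few of those uses, here is a minimal sketch. The file names, paths, domain, and IP address are all hypothetical placeholders, the access-control lines assume Apache 2.2-style directives, and whether each directive is honoured depends on what your host permits in .htaccess files (the AllowOverride setting).

```apache
# Stop directory listings.
Options -Indexes

# Use an alternate index file (home.html is a hypothetical name).
DirectoryIndex home.html index.html

# Use a custom error page (the path is hypothetical).
ErrorDocument 404 /errors/404.html

# Redirect users automatically from an old page to a new one.
Redirect 301 /old-page.html http://www.example.com/new-page.html

# Ban a user by IP address (192.0.2.10 is a documentation-only address).
Order allow,deny
Allow from all
Deny from 192.0.2.10
```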

Creating the file is easy: you just need to enter the appropriate code into a text editor (like Notepad). You may run into problems with saving the file because .htaccess is a strange file name (the file actually has no name but an 8-letter file extension). You may need to name it something else (e.g. htaccess.txt) and then upload it to the server using an FTP client program (.htaccess files must be uploaded in ASCII mode, not BINARY). Once you have uploaded the file you can then rename it using your FTP program.

You may need to CHMOD the .htaccess file to 644 (rw-r--r--). This makes the file readable by the server while leaving it writable only by its owner. Letting a browser read the file could seriously compromise your security; that is normally prevented by Apache’s default configuration, which refuses requests for files whose names begin with .ht.

For more information on .htaccess files see the Comprehensive guide to .htaccess.

In my next post I’ll be going through some cool things you can do with the .htaccess file.