htaccess is an Apache thing, not an NT thing. There are similar capabilities for NT servers, though in my professional experience and personal opinion, Apache is best. But that’s not what we’re here for.
htaccess files must be uploaded in ASCII mode, not BINARY. You may need to CHMOD the htaccess file to 644 (RW-R--R--). This makes the file usable by the server while keeping anyone else from writing to it; note that a browser being able to read your htaccess file directly could seriously compromise your security, which is why standard Apache configurations refuse to serve files whose names begin with .ht. htaccess files affect the directory they are placed in and all sub-directories; that is, an htaccess file located in your root directory (yoursite.com) would affect yoursite.com/content, yoursite.com/content/contents, and so on.
It is important to note that this can be prevented (if, for example, you did not want certain htaccess commands to affect a specific directory) by placing a new htaccess file within the directory you don't want affected and overriding or omitting the specific command(s) there; see the sketch below. In short, the nearest htaccess file to the current directory is treated as the htaccess file. If the nearest htaccess file is your global htaccess located in your root, then it affects every single directory in your entire site.
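As a quick, purely hypothetical illustration of how an override might look (the ErrorDocument directive is covered just below), a root htaccess can set something site-wide and a second htaccess deeper down can restate it for just that branch:

# /.htaccess (site root) - applies everywhere below the root
ErrorDocument 404 /errors/notfound.html

# /content/.htaccess - overrides the inherited setting for /content and its sub-directories
# ("default" restores Apache's built-in 404 response)
ErrorDocument 404 default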
Before you go off and plant htaccess everywhere, read through this and make sure you don’t do anything redundant, since it is possible to cause an infinite loop of redirects or errors if you place something weird in the htaccess.
Error Documents

You will probably want to create an error document for codes 404 and 500; at the very least 404, since that gives you a chance to handle requests for pages not found. 500 would help you out with internal server errors in any scripts you have running. You may also want to consider ErrorDocuments for 401 – Authorization Required (as in when somebody tries to enter a protected area of your site without the proper credentials), 403 – Forbidden (as in when a file with permissions not allowing it to be accessed by the user is requested) and 400 – Bad Request, which is one of those generic kinds of errors that people get to by doing some weird stuff with your URL or scripts.
In order to specify your own customized error documents, you simply need to add the following command, on one line, within your htaccess file:
ErrorDocument code /directory/filename.ext

for example:

ErrorDocument 404 /errors/notfound.html
This would cause any request resulting in a 404 error to be forwarded to yoursite.com/errors/notfound.html
Likewise with: ErrorDocument 500 /errors/internalerror.html
You can name the pages anything you want (I’d recommend something that would prevent you from forgetting what the page is being used for), and you can place the error pages anywhere you want within your site, so long as they are web-accessible (through a URL). The initial slash in the directory location represents the root directory of your site, that being where your default page for your first-level domain is located. I typically prefer to keep them in a separate directory for maintenance purposes and in order to better control spiders indexing them through a ROBOTS.TXT file, but it is entirely up to you.
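If you do keep the error pages together in one directory as described, a robots.txt entry along these lines (assuming the hypothetical /errors/ directory used in this article) asks well-behaved spiders to stay out of it:

User-agent: *
Disallow: /errors/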
If you were to use an error document handler for each of the error codes I mentioned, the htaccess file would look like the following (note each command is on its own line):
ErrorDocument 400 /errors/badrequest.html
ErrorDocument 401 /errors/authreqd.html
ErrorDocument 403 /errors/forbid.html
ErrorDocument 404 /errors/notfound.html
ErrorDocument 500 /errors/serverr.html
You can specify a full URL rather than a virtual URL in the ErrorDocument string (http://yoursite.com/errors/notfound.html vs. /errors/notfound.html), but this is not the preferred method by the server's happiness standards: with a full URL, Apache sends the client a redirect to that page, and the original error status code gets lost along the way.
You can also specify HTML, believe it or not!
ErrorDocument 401 "<body bgcolor=#ffffff> You have to actually <b>BE</b> a member to view this page!
The only time I use that HTML option is if I am feeling particularly saucy, since you can have so much more control over the error pages when used in conjunction with xSSI or CGI or both. Also note that the ErrorDocument starts with a " just before the HTML starts, but does not end with one; it shouldn't end with one, and if you do use that option, keep it that way. And again, that should all be on one line, no naughty word wrapping!
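For anything fancier than a one-liner, point the ErrorDocument at a page and build it there. As a rough sketch of the xSSI angle (assuming server-side includes are enabled and a hypothetical /errors/notfound.shtml page), Apache hands the error page details of the failed request in REDIRECT_* variables, which an SSI page can echo back:

ErrorDocument 404 /errors/notfound.shtml

and inside notfound.shtml:

<p>Sorry, but <!--#echo var="REDIRECT_URL" --> could not be found on this server.</p>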
Blocking Users by IP
Add the following, changing the IPs to suit your needs, each command on its own line:
order allow,deny
deny from 123.45.6.7
deny from 012.34.5.
allow from all
You can deny access based upon an IP address or an IP block. The above blocks access to the site from 123.45.6.7, and from any address within the block 012.34.5. (012.34.5.1, 012.34.5.2, 012.34.5.3, etc.). I have yet to find a truly indispensable application of this; maybe if there is a site scraping your content you can block them, who knows.
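One hedge worth adding: the order/deny/allow syntax above belongs to older Apache versions (it works on 2.2, and on 2.4 only through mod_access_compat). If your host runs Apache 2.4 or later, the equivalent with the newer Require directives would look roughly like this, using the same example addresses:

<RequireAll>
Require all granted
Require not ip 123.45.6.7
Require not ip 012.34.5
</RequireAll>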
Block traffic from a single referrer
RewriteEngine on
# Options +FollowSymlinks
RewriteCond %{HTTP_REFERER} badsite\.com [NC]
RewriteRule .* - [F]
Block traffic from multiple referrers
RewriteEngine on
# Options +FollowSymlinks
RewriteCond %{HTTP_REFERER} badsite\.com [NC,OR]
RewriteCond %{HTTP_REFERER} anotherbadsite\.com [NC]
RewriteRule .* - [F]
In the “single referrer” case above, “badsite\.com” is the domain you wish to block. Note the backslash preceding the period (“.”) to denote a literal period; in Regular Expressions, an unescaped period matches any character, which is not what we want. The flag “[NC]” is added to the end of the condition to make it case insensitive, so whether the domain is “badsite.com”, “Badsite.com”, or however bad it gets, it gets blocked. Finally, the last line in the .htaccess file specifies that the action to take when a match is found is to fail the request, meaning the referrer traffic will hit a 403 Forbidden error. The only difference when blocking multiple referrers is that every RewriteCond line except the last carries the combined “[NC,OR]” flag; the OR chains the conditions together, and the last condition keeps just “[NC]”.
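If you would rather bounce blocked referrer traffic to an explanation page than fail the request outright, one possible variation looks like this (the /referrer-blocked.html page is hypothetical; the extra condition keeps the notice page itself from being rewritten and looping):

RewriteEngine on
RewriteCond %{HTTP_REFERER} badsite\.com [NC]
RewriteCond %{REQUEST_URI} !^/referrer-blocked\.html$
RewriteRule .* /referrer-blocked.html [R,L]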
Blocking Bad Bots and Site Rippers

The definition of a “bad bot” varies depending on who you ask, but most would agree they are the spiders that do a lot more harm than good on your site (e.g., an email harvester). Site rippers, on the other hand, are offline browsing programs that a surfer may unleash on your site to crawl and download every one of its pages for offline viewing. In both cases, your site's bandwidth and resource usage are jacked up as a result, sometimes to the point of crashing your server. Bad bots typically ignore the wishes of your robots.txt file, so you'll want to ban them using means such as .htaccess. The trick is to identify a bad bot.
Below is a useful code block you can insert into your .htaccess file for blocking a lot of the known bad bots and site rippers currently out there.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]
Bots that are listed above will all receive a 403 Forbidden error when trying to view your site. The amount of bandwidth savings and decrease in server resource usage as a result may be significant in many cases.
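If mod_rewrite is not available on your server, a similar (if less flexible) block can be sketched with mod_setenvif and the access directives instead; the two agents below are just examples pulled from the list above:

SetEnvIfNoCase User-Agent "HTTrack" bad_bot
SetEnvIfNoCase User-Agent "EmailSiphon" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot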
Redirects
htaccess uses the Redirect directive to look for any request for a specific page (or a non-specific location, though this can cause infinite loops) and, if it finds that request, forwards it to a new page you have specified:
Redirect /olddirectory/oldfile.html http://yoursite.com/newdirectory/newfile.html
Note that there are 3 parts to that, which should all be on one line: the Redirect command, the location of the file/directory you want redirected relative to the root of your site (/olddirectory/oldfile.html = yoursite.com/olddirectory/oldfile.html), and the full URL of the location you want that request sent to. Each of the 3 is separated by a single space, but all on one line. You can also redirect an entire directory by simply using:

Redirect /olddirectory http://yoursite.com/newdirectory/
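By default, Redirect sends a temporary (302) redirect. If the move is permanent and you want browsers and search engines to update their references, you can state the status explicitly; a small sketch using the same hypothetical paths as above:

Redirect 301 /olddirectory/oldfile.html http://yoursite.com/newdirectory/newfile.html

or, equivalently:

Redirect permanent /olddirectory/oldfile.html http://yoursite.com/newdirectory/newfile.html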
Disallowing Hotlinking
Using .htaccess, you can disallow hot linking on your server, so those attempting to link to an image or CSS file on your site, for example, are either blocked (a failed request, such as a broken image) or served different content (e.g., an image of an angry man). Note that mod_rewrite needs to be enabled on your server in order for this aspect of .htaccess to work. Ask your web host if you are not sure.
With all the pieces in place, here's how to disable hot linking of certain file types on your site (in the case below, image, JavaScript (js) and CSS (css) files). Simply add the code below to your .htaccess file, and upload the file either to your root directory or to a particular subdirectory to localize the effect to just one section of your site:
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mydomain\.com/.*$ [NC]
RewriteRule \.(gif|jpg|js|css)$ - [F]
Be sure to replace “mydomain.com” with your own. The first RewriteCond line lets requests with a blank referrer through (direct visits and some privacy-minded browsers send none), so only genuine cross-site requests are affected. The above code creates a failed request when hot linking of the specified file types occurs; in the case of images, a broken image is shown instead.
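If you are not sure whether mod_rewrite is available, a cautious approach is to wrap the rules in an IfModule container so the htaccess file will not throw a server error where the module is missing (the rules simply will not run there):

<IfModule mod_rewrite.c>
# the RewriteEngine / RewriteCond / RewriteRule lines from above go here
</IfModule>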
Serving alternate content when hot linking is detected
You can set up your .htaccess file to actually serve up different content when hot linking occurs. This is more commonly done with images, such as serving up an Angry Man image in place of the hot linked one. The code for this is:
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mydomain\.com/.*$ [NC]
RewriteRule \.(gif|jpg)$ http://www.mydomain.com/angryman.gif [R,L]
Same deal: replace mydomain.com with your own, plus angryman.gif.
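One caveat: because the replacement image is itself a .gif on your domain, the redirected request (which can still carry the hot linker's referrer) may match the rule again and loop. Adding one more condition to skip the replacement image avoids that; a sketch using the same angryman.gif name:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mydomain\.com/.*$ [NC]
# never rewrite requests for the replacement image itself
RewriteCond %{REQUEST_URI} !angryman\.gif$
RewriteRule \.(gif|jpg)$ http://www.mydomain.com/angryman.gif [R,L]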
Preventing Directory Listings
Typically a server is set up to prevent directory listing, but sometimes it is not. If not, become self-sufficient and fix it yourself:
IndexIgnore *
The * is a wildcard that matches all files, so if you stick that line into an htaccess file in your images directory, nothing in that directory will be allowed to be listed.
On the other hand, what if you did want the directory contents to be listed, but only if they were HTML pages and not images? Simple says I:
IndexIgnore *.gif *.jpg
This would return a list of all files not ending in .jpg or .gif, but would still list .txt, .html, etc.
And conversely, if your server is set up to prevent directory listing but you want to list the directories by default, you could simply throw this into an htaccess file in the directory you want displayed:
Options +Indexes
If you do use this option, be very careful that you do not put any unintentional or compromising files in this directory. And if you guessed it by the plus sign before Indexes, you can throw in a minus sign (Options -Indexes) to prevent directory listing entirely. This is typical of most server setups and is usually configured elsewhere in the Apache server, but it can be overridden through htaccess.
If you really want to be tricky, you can use the +Indexes option and include a default description for the directory listing by placing a file called HEADER in the same directory. The contents of this file will be printed out before the list of directory contents. You can also specify a footer, though it is called README, by placing it in the same directory as the HEADER. The README file is printed out after the directory listing is printed.
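If you would rather not name the files HEADER and README, mod_autoindex also lets you point at files of your own choosing; a short sketch, with hypothetical listing-header.html and listing-footer.html files sitting in the listed directory:

Options +Indexes
HeaderName listing-header.html
ReadmeName listing-footer.html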