Preventing resource issues from bots and crawlers

Bots and crawlers make up a significant portion of website traffic, driving up resource usage. While some crawlers like Google and Bing are essential for SEO, many others - particularly AI scraping bots - may be unwanted. This guide covers methods to optimize and control crawler access to your site.

If you don't already have one, we would suggest adding a robots.txt file in your website's root directory. Bots check this file before crawling your site, and it lets you control how they crawl it.

A template which blocks common bad bots: robots.txt
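If you would rather start with something minimal, a small robots.txt along the following lines blocks a couple of commonly unwanted AI/scraping crawlers while leaving everything else alone. The user agents listed (GPTBot, CCBot, Bytespider) are illustrative examples only - check your access logs for the bots actually hitting your site:

# Block selected AI/scraping crawlers
User-agent: GPTBot
User-agent: CCBot
User-agent: Bytespider
Disallow: /

# Allow all other bots
User-agent: *
Disallow: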

robots.txt - Denying bots access to the entire site

If you wish to block a bot from accessing the entire site, you can do so with a Disallow rule on /

Example blocking PetalBot and Bytespider:

User-agent: PetalBot
User-agent: Bytespider
Disallow: /

robots.txt - Crawl Delays

Crawl-delay is an unofficial directive intended to slow down crawling so that it does not overload the website. Not all search engines support it (Google, for example, ignores it). The example below means any bot which supports the directive will wait 3 seconds between each request:

User-agent: *
Crawl-delay: 3
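
If only one bot is causing the load, the delay can be scoped to that bot rather than applied to all crawlers. As a sketch (PetalBot is used purely as an example here - use the user agent you see in your own logs, and note the bot must support the directive for it to have any effect):

User-agent: PetalBot
Crawl-delay: 10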

Disallowing specific queries

If bots are crawling query strings, this can result in hundreds or even thousands of requests being made for what is essentially the same product and page.

Example 1 shows the type of query you may see against a product which comes in many different colours or sizes. A request does not need to be made for each variation, but a bot will request each one as if it were a different page:

66.249.66.166 - - [16/Mar/2025:13:31:47 +0000] "GET /products/flooringcovers?brand=34&colour=65&range=34 HTTP/1.1" 200 1027427 "https://google.com" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.6998.165 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Example 1 rule to prevent this:

User-agent: *
Disallow: /*?*colour=

Example 2 shows another type of query you may see in the logs:

5.255.231.142 - - [08/Apr/2025:10:58:27 +0100] "GET /home?q=Availability-In+stock-Not+available/Colour-Coral+Orange-Blast+Blue-Ocean+Blue-Sand+Beige-Scarlet-Olive+Green/Material-Aluminium-Graphite-GRAPHITE+HYBRID-HM+Graphite-Minolon HTTP/1.1" 403 1242 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"

Example 2 rule to block these queries:

User-agent: *
Disallow: /home?q=
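
If you want to go further, compliant bots can be stopped from crawling any URL containing a query string at all by using a wildcard rule. This is a broader sketch rather than part of the examples above - wildcards in Disallow rules are not part of the original robots.txt standard, although major crawlers such as Googlebot and Bingbot do honour them:

User-agent: *
Disallow: /*?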

.htaccess - Blocking bots

Sometimes bad or aggressive bots will ignore a robots.txt file, or you may want a more immediate way to stop a specific bot from crawling the site. In that case, you can deny access in the website's .htaccess file with a rewrite rule:

# Bot Agent Block Rule
RewriteEngine On
# Match any of the listed bot names in the User-Agent header (case-insensitive)
RewriteCond %{HTTP_USER_AGENT} (BOTNAME|BOTNAME2|BOTNAME3) [NC]
# Deny the matching request with a 403 Forbidden response
RewriteRule (.*) - [F,L]

Place the above in the website's .htaccess file (if you don't have one, you can create one), replacing or removing "BOTNAME", "BOTNAME2" and "BOTNAME3" with the bots you wish to deny, and then save it. The bots will then be denied with a 403 response and should eventually stop crawling the site altogether.
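
As an illustration using the bots from the earlier robots.txt examples (substitute whichever user agents appear in your own logs), the finished rule might look like this:

# Bot Agent Block Rule
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (PetalBot|Bytespider) [NC]
RewriteRule (.*) - [F,L]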
