Preventing resource issues from bots and crawlers
Bots and crawlers make up a significant portion of website traffic, driving up resource usage. While some crawlers like Google and Bing are essential for SEO, many others - particularly AI scraping bots - may be unwanted. This guide covers methods to optimize and control crawler access to your site.
If you don't already have one, we suggest adding a robots.txt file in your website's root directory. Well-behaved bots check this file before crawling your site, so you can use it to control how they crawl.
A template which blocks common bad bots: robots.txt
robots.txt - Denying bots access to the entire site
If you wish to block a bot from accessing the entire site, you can do so with a Disallow rule on /.
Example blocking PetalBot and Bytespider:
User-agent: PetalBot
User-agent: Bytespider
Disallow: /
robots.txt - Crawl delays
The crawl-delay directive is unofficial and is meant to slow down crawling so the website isn't overloaded. Not all search engines support it; Google, for example, will ignore it. With the example below, any bot which supports the directive will wait 3 seconds between each request:
User-agent: *
Crawl-delay: 3
Disallowing specific queries
If bots are crawling query strings, this can cause hundreds or even thousands of requests to be made for what is essentially the same product and page.
Example 1 shows the type of query you may see against a product which comes in many different colours or sizes. A request does not need to be made for each combination, but a bot will request each one as if it were a different page:
66.249.66.166 - - [16/Mar/2025:13:31:47 +0000] "GET /products/flooringcovers?brand=34&colour=65&range=34 HTTP/1.1" 200 1027427 "https://google.com" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.6998.165 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
Example 1 rule to prevent this:
User-agent: *
Disallow: /*?*colour=
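If these faceted URLs only ever appear under one path, a broader rule can block every parameterised request beneath it. A sketch, assuming /products/ is that path (adjust to your own URL structure):

User-agent: *
Disallow: /products/*?

The * wildcard is supported by the major search engines, including Google and Bing, but smaller crawlers may treat the pattern as a literal path.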
Example 2, a query seen in the logs:
5.255.231.142 - - [08/Apr/2025:10:58:27 +0100] "GET /home?q=Availability-In+stock-Not+available/Colour-Coral+Orange-Blast+Blue-Ocean+Blue-Sand+Beige-Scarlet-Olive+Green/Material-Aluminium-Graphite-GRAPHITE+HYBRID-HM+Graphite-Minolon HTTP/1.1" 403 1242 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
Example 2 rule to block these queries:
User-agent: *
Disallow: /home?q=
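If one filtered view should remain crawlable, robots.txt also supports Allow; Google and Bing apply the longest (most specific) matching rule, so a more specific Allow overrides a shorter Disallow. A sketch using a hypothetical q=featured parameter value:

User-agent: *
Disallow: /home?q=
Allow: /home?q=featured$

The $ anchors the match to the end of the URL, so only that exact query stays crawlable.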
.htaccess - Blocking bots
Sometimes bad or aggressive bots will ignore a robots.txt file, or you may want a more immediate way to stop a specific bot from crawling the site. If so, you can deny access in a .htaccess file with a rewrite rule:
# Bot Agent Block Rule
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (BOTNAME|BOTNAME2|BOTNAME3) [NC]
RewriteRule (.*) - [F,L]
Place the above in the website's .htaccess file (if you don't have one, you can create one), replacing or removing "BOTNAME", "BOTNAME2" and "BOTNAME3" with the bots you wish to deny, and then save it. The bot will receive a 403 response from then on and should eventually stop crawling the site altogether.
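As a concrete sketch of the rule filled in with the two bots from the earlier robots.txt example:

# Deny PetalBot and Bytespider by user agent
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (PetalBot|Bytespider) [NC]
RewriteRule (.*) - [F,L]

You can then confirm the block from a shell by sending a request with a matching user agent (example.com is a placeholder for your own domain):

curl -I -A "Bytespider" https://example.com/

A 403 Forbidden response confirms the rule is matching; a request with a normal browser user agent should still return 200.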