Knowledge Base

Guides and answers for your VPS, the client area, and billing

Bots and crawlers are hammering my website

What this is

Your traffic allowance is draining, or the CPU graph has climbed, and the visitors aren't human. Automated clients (search crawlers, AI training and scraping bots, SEO tools, vulnerability scanners, and plain scrapers) now make up a very large share of all web traffic, and a small site can receive more bot requests than human ones. Most are harmless at low volume; the problem is the aggressive tail. Here's how to see who's hitting you and how to push back proportionately.

Step 1: identify the culprits in the access log

The access log has everything. Two summaries answer most questions (both assume nginx's default combined log format):

# Top requesting IPs
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20

# Top user agents
awk -F'"' '{print $6}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20

(For an ongoing view, goaccess /var/log/nginx/access.log --log-format=COMBINED gives you a live terminal dashboard from the same data.) What you'll typically find: named crawlers (Googlebot, bingbot, GPTBot, ClaudeBot, Amazonbot, Bytespider, assorted *-AI agents), SEO crawlers (AhrefsBot, SemrushBot, MJ12bot), and, at the shadier end, blank or fake user agents from rotating IPs.

Decide what you actually mind: Googlebot is usually earning its keep; an AI scraper pulling your entire catalog every night on your bandwidth may not be.
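If the worry is the traffic allowance rather than request counts, it can help to rank user agents by bytes served instead. The following is a minimal sketch under the same combined-log-format assumption as the commands above; bytes_by_agent is a hypothetical helper name, not something nginx ships.

```shell
# Sum response size per user agent and print the 20 largest, in MB.
# Assumes the combined log format; bytes_by_agent is a hypothetical helper.
bytes_by_agent() {
    log="${1:-/var/log/nginx/access.log}"
    # Splitting on double quotes: $3 is " status bytes ", $6 is the user agent.
    LC_ALL=C awk -F'"' '
        { split($3, s, " "); mb[$6] += s[2] / 1048576 }
        END { for (ua in mb) printf "%.1f\t%s\n", mb[ua], ua }
    ' "$log" | LC_ALL=C sort -rn | head -n 20
}
```

Run it as bytes_by_agent, or pass a different log path as the first argument. An agent that ranks low by request count but high here is the one costing you bandwidth.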

Step 2: robots.txt, the polite request

robots.txt is a request, not a control: legitimate crawlers honor it, abusive ones ignore it, so it's the right tool exactly for the named, well-behaved bots:

User-agent: GPTBot
Disallow: /

User-agent: AhrefsBot
Disallow: /

Also set Crawl-delay for bots that honor it (Googlebot does not), and make sure you're not wasting crawl budget on infinite URL spaces such as calendars and faceted filters; those make even polite bots look like floods.
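To spot an infinite URL space in your own log, one rough heuristic is to count how many distinct query strings each path has been requested with. This is only a sketch, assuming the combined log format; query_variants_by_path is a hypothetical helper name.

```shell
# Count distinct query-string variants per path, highest first.
# Assumes the combined log format; query_variants_by_path is a hypothetical helper.
query_variants_by_path() {
    log="${1:-/var/log/nginx/access.log}"
    awk -F'"' '
        {
            split($2, req, " ")          # req[2] is the request URL
            q = index(req[2], "?")
            if (q == 0) next             # skip URLs without a query string
            path = substr(req[2], 1, q - 1)
            if (!seen[req[2]]++) variants[path]++   # count each full URL once
        }
        END { for (p in variants) printf "%d\t%s\n", variants[p], p }
    ' "$log" | sort -rn | head -n 20
}
```

A path showing thousands of variants is a likely candidate for a Disallow rule in robots.txt.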

Step 3: rate limiting, the enforced request

For clients that ignore robots.txt, make the server enforce a budget. nginx's built-in limiter, applied per IP:

# in http {}
limit_req_zone $binary_remote_addr zone=perip:10m rate=5r/s;

# in the server/location block
limit_req zone=perip burst=20 nodelay;

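Humans browse well under 5 requests/second on average, and burst=20 absorbs the short spike of asset requests that a page load produces; nodelay serves that burst immediately instead of queueing it. Scrapers slam into the limit and get 503s (nginx's default rejection status; limit_req_status 429 changes it) instead of your rendered pages. This alone converts most floods from a CPU problem into a log line. (Caddy and Apache have equivalents.)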
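Before settling on a rate, it may be worth checking what per-IP peaks your log actually shows. The sketch below reports the highest number of requests each IP made within any single second, relying on the one-second resolution of the default nginx timestamp; peak_rate_by_ip is a hypothetical helper name.

```shell
# For each client IP, print the most requests seen in any single second.
# Assumes the default nginx log layout ($1 = IP, $4 = "[timestamp");
# peak_rate_by_ip is a hypothetical helper.
peak_rate_by_ip() {
    log="${1:-/var/log/nginx/access.log}"
    awk '
        {
            key = $1 " " $4              # IP plus timestamp (1-second resolution)
            n[key]++
            if (n[key] > peak[$1]) peak[$1] = n[key]
        }
        END { for (ip in peak) printf "%d\t%s\n", peak[ip], ip }
    ' "$log" | sort -rn | head -n 20
}
```

If ordinary visitors peak around a dozen requests in one second while loading a page, a burst of 20 leaves them headroom, and the IPs far above that are the ones the limiter will catch.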

Step 4: fail2ban for the abusive tail

For IPs that keep hammering after being limited, ban them at the firewall. A fail2ban jail watching the access log for repeated 403/503s (or for requests to paths like /wp-login.php and /.env that only scanners touch) turns the pattern into a temporary IP ban automatically. This complements the rate limit rather than replacing it.
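Before writing a jail, a dry run over the existing log shows which IPs such a rule would catch. This is a rough sketch, not a fail2ban configuration: it counts matches over the whole log rather than within a time window, it assumes the combined log format, and ban_candidates and its default threshold of 10 are hypothetical.

```shell
# List IPs with at least MIN rejected (403/503) or scanner-path requests.
# Assumes the combined log format; ban_candidates and MIN=10 are hypothetical.
ban_candidates() {
    log="${1:-/var/log/nginx/access.log}"
    min="${2:-10}"
    awk -F'"' -v min="$min" '
        {
            split($1, head, " ")         # head[1] is the client IP
            split($2, req, " ")          # req[2] is the request path
            split($3, res, " ")          # res[1] is the status code
            if (res[1] == "403" || res[1] == "503" || req[2] ~ /^\/(wp-login\.php|\.env)/)
                hits[head[1]]++
        }
        END { for (ip in hits) if (hits[ip] >= min + 0) printf "%d\t%s\n", hits[ip], ip }
    ' "$log" | sort -rn
}
```

If the output includes addresses you recognize as real users or your own monitoring, raise the threshold or narrow the pattern before turning it into an automatic ban.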

Step 5: Cloudflare, the heavy artillery

If the traffic is distributed (thousands of IPs, fake agents), per-IP tools run out of road, and this is exactly what Cloudflare's free tier is built for: Bot Fight Mode, managed challenges for suspicious clients, per-path WAF rules ("challenge everything hitting /api at >n/min"), and an AI crawler blocking toggle that handles the whole scraper category in one switch. Bonus: challenged bots never reach your VPS at all, so they stop costing you traffic entirely. Cached and challenged at the edge beats rate-limited at the origin.

Keep perspective

  • Check the damage honestly on your Graphs: a bot wave that costs 2 GB and 3% CPU needs robots.txt, not an arms race.
  • Bots hunting /wp-login.php, /xmlrpc.php, and /.env aren't a traffic problem but a security probe; make sure those paths are locked or absent rather than just rate-limited.
  • If what you're seeing isn't crawling but a flood meant to take you down, that's the DDoS lane.

Still need help?

You can open a support ticket. So we can help on the first reply, it's worth mentioning:

  • the VPS and the site being hammered,
  • the top IPs or user agents from your access log,
  • what you've already put in place (robots.txt, rate limiting, Cloudflare).
Last reviewed: 2026-07-02