
Bot Traffic Now Exceeds Half the Web: What It Means for You

Serap Gündoğdu

The headline going around this month is that bots now make up more than half of all web traffic. It is true, and it sounds alarming, and most of the takes attached to it are the wrong kind of alarming. The number is real but the conclusion people draw from it usually is not.

So let me unpack what the data actually says, separate the part that should change your behaviour from the part that should not, and be honest about where this leaves a normal site owner. The short version: the share of bots crossing fifty percent is a milestone, not an emergency. What matters is which bots are hitting you and what they do to your crawl budget, and that is a question you can answer in an afternoon rather than worry about in the abstract.

What the Numbers Actually Say

Industry reports converge on the same picture. Across multiple measurements of aggregate internet traffic, automated requests have edged past requests from real people and now sit at just over half of the total. That figure has been climbing for years, so the surprise is less that it happened and more that it took this long to cross the line.

The important detail is hidden under the headline. “Bot traffic” is not one thing. It splits into roughly three buckets, and they could not be more different from each other:

  • Good bots with a job. Search engine crawlers, uptime monitors, and feed readers. These identify themselves and mostly follow the rules.
  • Bad bots. Scrapers stealing content, credential stuffers, inventory hoarders, vulnerability scanners. These lie about who they are and ignore your rules on purpose.
  • AI crawlers, the new and fast-growing slice. GPTBot, ClaudeBot, PerplexityBot and others fetch pages for training or for answering questions. They are technically good bots, but their volume has grown so quickly that they deserve their own line. Google-Extended often appears in the same lists, but it is a robots.txt control token rather than a separate crawler: the fetching itself is done by Google's existing user agents.

When someone says bots are half the web, they are adding all three together. Treating that combined number as a single threat is the first mistake, because your response to a content scraper and your response to Googlebot should be opposites.

Figure: A bar split into three coloured segments labelled good bots, bad bots, and AI crawlers, sitting next to a smaller human-traffic bar.

Why the AI Slice Is Growing So Fast

The growth is not evenly spread. Classic search crawling is relatively stable. The part adding the most new requests is AI: models being trained, and increasingly, models fetching live pages to answer a user’s question in the moment.

This is a genuine shift, not just more of the same. A traditional search crawler visits, indexes the page, and sends you visitors over the following weeks. A live AI fetch may pull your page, use it to compose an answer, and send you nothing, because the user got what they needed inside the chat. We dug into whether these engines even read the files you set up for them in our look at whether AI engines actually read llms.txt, and the answer was sobering. The protocols being built to make this exchange fairer, like NLWeb and AIPREF, are still early, which we covered in the agentic web standards explainer.

The practical upshot is that more of your server’s work is now being done for an audience that may never click through. That is uncomfortable, but it is a strategy question for another day. The immediate, concrete cost is simpler: every one of those fetches spends part of your crawl budget.

The Part That Actually Touches You: Crawl Budget

Here is where the half-the-web number stops being trivia and starts being your problem. Any crawler will spend only a finite amount of attention on your site in a given window, and that is its crawl budget. When bot volume rises, more requests compete for the same server capacity, and the URLs that win are not always the pages you care about.

If a crawler burns its visit on faceted filter URLs, expired tag pages, session-ID duplicates, and thin archive pages, it has less left for the pages that earn you traffic. We wrote about exactly this failure in why low-value pages get crawled more, and the rise in overall bot traffic makes the symptom sharper, not different. A messy site wasted crawl budget before; now it wastes more of it, faster.
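To put a number on that waste, here is a minimal Python sketch that estimates what share of crawler requests landed on low-value URLs. The parameter names in `LOW_VALUE_PARAMS` and the path prefixes in `LOW_VALUE_PREFIXES` are assumptions for illustration, not a standard list, so swap in the patterns your own site actually generates.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical parameter names that tend to create faceted or session duplicates.
LOW_VALUE_PARAMS = {"sessionid", "sid", "sort", "filter", "color", "size"}
# Hypothetical path prefixes for thin tag and archive pages.
LOW_VALUE_PREFIXES = ("/tag/", "/archive/")


def is_low_value(url: str) -> bool:
    """Return True if the URL looks like a faceted, session, tag, or archive URL."""
    parts = urlsplit(url)
    params = {name.lower() for name in parse_qs(parts.query, keep_blank_values=True)}
    if params & LOW_VALUE_PARAMS:
        return True
    return parts.path.startswith(LOW_VALUE_PREFIXES)


def wasted_share(crawled_urls: list[str]) -> float:
    """Fraction of crawler requests that landed on low-value URLs."""
    if not crawled_urls:
        return 0.0
    wasted = sum(1 for url in crawled_urls if is_low_value(url))
    return wasted / len(crawled_urls)
```

Feed `wasted_share` the list of URLs a single crawler requested over a few days. A high share is the symptom described above; the number alone does not tell you which fix to apply.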

The good news is that the levers have not changed. Rising bot traffic is a reason to finally pull them, not a reason to panic:

  • Stop feeding crawlers junk. Trim the low-value, near-duplicate, and parameter-generated URLs that exist only to be crawled. The crawl budget optimization guide walks through the full process.
  • Shape attention with structure. Internal links are the strongest signal you control for where crawlers spend their time. We cover this in internal linking as a crawl budget tool.
  • Set clear rules for AI bots. Decide deliberately which AI crawlers may access what, and write it down properly. The complete guide to robots.txt and AI bots covers the syntax and the trade-offs.
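For the last lever, it helps to check a robots.txt policy before you publish it. The sketch below uses Python's standard `urllib.robotparser` to ask which declared crawlers may fetch which paths. The policy in `ROBOTS_TXT` and the example paths are hypothetical, chosen only to show the mechanics, and they are not a recommendation about which bots to block.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block GPTBot everywhere, keep ClaudeBot out of /drafts/,
# and allow every other crawler. Adjust to your own decisions.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /drafts/

User-agent: *
Allow: /
"""


def access_matrix(robots_txt: str, agents: list[str], paths: list[str]) -> dict:
    """Map each agent to {path: allowed} according to the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        agent: {path: parser.can_fetch(agent, path) for path in paths}
        for agent in agents
    }
```

Calling `access_matrix(ROBOTS_TXT, ["GPTBot", "ClaudeBot", "Googlebot"], ["/blog/post", "/drafts/idea"])` shows how the file would be read by a compliant parser. Keep in mind that robots.txt is a request, so this only describes the bots that choose to follow it.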

How to Find Out What Is Actually Hitting Your Site

The honest answer to “are bots a problem for me” is: you do not know until you look, and almost nobody looks. The aggregate half-the-web figure tells you nothing about your specific site. Your mix could be ninety percent Googlebot doing useful work, or it could be a scraper hammering a single endpoint and dragging your response times down.

Two things tell you the truth. First, your server logs. They record every request: the user agent each client claimed, the IP it came from, the status code, and which URLs got hit hardest. A log sample covering a few days will show you quickly whether your crawl budget is being spent on money pages or on a swamp of parameter URLs. User agents can be faked, so cross-check the heavy hitters against the IP ranges that the real search and AI crawlers publish.
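As a rough illustration of that cross-check, the Python sketch below parses access-log lines, picks out requests that claim to be a known crawler, and separates the ones whose IP falls inside that crawler's range from the ones that do not. It assumes your server writes the common "combined" log format. The networks in `PUBLISHED_RANGES` are placeholders taken from reserved documentation blocks, not real crawler ranges, so you would need to load the actual lists each operator publishes.

```python
import ipaddress
import re
from collections import Counter
from typing import Optional

# Assumes the "combined" access-log format; adjust if your server logs differently.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# PLACEHOLDER ranges from reserved documentation blocks, NOT real crawler IPs.
PUBLISHED_RANGES = {
    "Googlebot": [ipaddress.ip_network("192.0.2.0/24")],
    "GPTBot": [ipaddress.ip_network("198.51.100.0/24")],
}


def claimed_bot(agent: str) -> Optional[str]:
    """Return the crawler token the user agent claims to be, if any."""
    lowered = agent.lower()
    for token in PUBLISHED_RANGES:
        if token.lower() in lowered:
            return token
    return None


def ip_matches(token: str, ip: str) -> bool:
    """True if the IP sits inside one of the ranges listed for that crawler."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in PUBLISHED_RANGES[token])


def summarise(lines: list[str]) -> tuple[Counter, Counter]:
    """Count crawler-claiming requests, split into verified and suspect."""
    verified, suspect = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        token = claimed_bot(match["agent"])
        if token is None:
            continue  # not claiming to be a crawler we track
        bucket = verified if ip_matches(token, match["ip"]) else suspect
        bucket[token] += 1
    return verified, suspect
```

A large `suspect` count for a well-known crawler name is a hint that something is borrowing its user agent. It is a starting point for a closer look, not proof of abuse.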

Second, crawl your own site the way a bot would. When you run a crawler over your site, you see the same structure a search or AI bot sees: how many URLs you actually expose, how many are thin or duplicated, where redirect chains and orphan pages quietly eat budget. This is the part our own crawler is built for, and it works the same across Windows, macOS, and Linux, so it fits whatever you already run. You are not guessing at the aggregate number any more; you are looking at your own.
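To make two of those checks concrete, here is a small Python sketch of what can be done with crawl output once you have it. It is not how our crawler works internally. It assumes a crawl has already produced three hypothetical inputs: `links` (each page mapped to the internal links found on it), `known_urls` (for example, the URLs in your sitemap), and `redirects` (each redirecting URL mapped to its target).

```python
from collections import deque


def reachable(start: str, links: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk of internal links, starting from one page."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def orphan_pages(start: str, links: dict[str, list[str]], known_urls: list[str]) -> list[str]:
    """Known URLs that no chain of internal links from `start` leads to."""
    return sorted(set(known_urls) - reachable(start, links))


def redirect_chain(url: str, redirects: dict[str, str], limit: int = 10) -> list[str]:
    """Follow redirects from `url`, stopping at a loop or after `limit` hops."""
    chain = [url]
    while chain[-1] in redirects and len(chain) <= limit:
        target = redirects[chain[-1]]
        chain.append(target)
        if target in chain[:-1]:
            break  # the chain has looped back on itself
    return chain
```

Any URL returned by `orphan_pages` exists but is invisible to a crawler that follows links, and any `redirect_chain` longer than two entries costs a crawler extra requests before it reaches content.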

Figure: A magnifying glass over a site map, with a few pages highlighted as wasted crawl paths and the main pages clear.

The Bottom Line

Bots passing half of all web traffic is a real milestone and a useful prompt, but it is not, by itself, something to fear. The number lumps together helpful crawlers, harmful scrapers, and a fast-growing AI slice that each demand a different answer. Collapsing them into one scary statistic is how you end up either ignoring a real scraper problem or blocking the crawlers that actually send you traffic.

The move that pays off is unglamorous and entirely within your control: look at your own logs, crawl your own site, and spend your crawl budget on the pages that matter. The web getting busier with bots only raises the value of a site that is clean and easy to crawl. That part has not changed, and it is still the work worth doing.