Cloudflare to block AI crawlers on ad-supported pages by default from September 15

From September 15, 2026, Cloudflare blocks AI training and agent crawlers by default on ad-supported pages, while search crawlers stay allowed. Here's what changes and who it affects.

Jul 3, 2026 · by Nadeem Siddique

Cloudflare · AI crawlers · Publishers

On September 15, 2026, Cloudflare changes its default settings so that crawlers used for AI training and for AI agents are blocked from any page that shows ads. “Mixed-use” crawlers, which combine search indexing with one of those jobs under a single bot, are blocked on those pages too. Crawlers used only for ordinary search indexing stay allowed.

It is a small settings change with a large blast radius. Cloudflare sits in front of a big share of the web, so a default it flips does not stay a default for long. It becomes the way a large chunk of the internet behaves.

How we got here

This is the next turn of a screw Cloudflare started tightening in 2025. On July 1 that year it became the first big infrastructure provider to block AI crawlers by default on new sites, and it launched a “pay per crawl” beta that let owners charge bots for access instead of only allowing or blocking them. Cloudflare branded the whole effort “Content Independence Day,” and the argument behind it has not changed: site owners should decide how AI uses their work, and they should not have to hunt down an opt-out for scraping that happens by default.

What is new in 2026 is the precision. The 2025 move treated AI crawlers as one bucket. The 2026 change breaks that bucket into pieces, because the crude version was catching traffic that publishers actually wanted, namely the search crawling that still sends them readers.

What is changing

Cloudflare is sorting AI bots into three groups. Search crawlers, the ones that index pages so people can find them, stay allowed by default. Training crawlers, which pull content to train or fine-tune a model, get blocked by default on pages that show ads. So do agent bots, meaning the ones acting in real time for a user, such as chat fetchers and browser-use tools.

The reasoning is straightforward. Content that earns ad revenue should not be crawled for AI without the owner’s say-so. Search sends readers back to the site that did the work, so it stays welcome. Training and agent traffic tends to keep users inside a chatbot instead, so it gets blocked.

The catch is that a lot of crawlers refuse to stay in one box. By Cloudflare’s own count, about 36% of crawler activity now comes from mixed-use bots that do search and training at once. That single number is the reason the whole policy exists, and it is also where the fight is.

Who it applies to

The new defaults hit new Cloudflare customers, new sites created by existing customers, and every existing free customer. Paid customers who are already set up keep their current settings unless they opt in, and anyone can re-allow crawlers on ad-supported pages by changing their site settings. Cloudflare expects about 85% of its ad-supported customers to leave the block on. If that holds, it is a big shift in who can crawl the ad-funded web, and it happens without most of those owners touching a single toggle.
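
The rollout rule above reduces to a short decision. Here is a minimal Python sketch of it. The `Site` record and its fields (`plan`, `created_after_cutover`, `opted_in`) are invented for illustration and are not Cloudflare's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical record; the field names are assumptions, not Cloudflare's."""
    plan: str                    # "free" or "paid"
    created_after_cutover: bool  # new customer, or a new site from an existing one
    opted_in: bool = False       # existing paid site that chose the new defaults


def gets_new_defaults(site: Site) -> bool:
    """Return True if the September 15 defaults apply to this site."""
    # New customers and new sites get the defaults regardless of plan.
    if site.created_after_cutover:
        return True
    # Every existing free customer is moved to the new defaults.
    if site.plan == "free":
        return True
    # Existing paid sites keep their settings unless they opt in.
    return site.opted_in
```

Under these assumptions, `gets_new_defaults(Site("paid", False))` is `False`, while the same site with `opted_in=True`, or any free site, gets `True`. An owner can still re-allow crawlers afterward, which this sketch does not model.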

The mixed-use problem

The real teeth here are in how Cloudflare treats bots that do more than one job. It applies the most restrictive rule to any multi-purpose crawler. That catches Googlebot, which indexes for Search and also feeds Gemini, AI Overviews, and AI Mode, all under one user agent. So from September 15, customers who turn on Training blocking will find Googlebot blocked on their ad pages too. Applebot and Bing’s crawler sit in the same spot, since they also fold search and AI work into a single bot, though each offers an AI opt-out.
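
The "most restrictive rule wins" logic is easy to state in code. The sketch below is an illustration of the policy as described here, not Cloudflare's implementation. The `Purpose` enum, the `is_blocked` function, and the idea of passing a set of purposes per bot are all assumptions made for the example.

```python
from enum import Enum
from typing import FrozenSet, Iterable


class Purpose(Enum):
    SEARCH = "search"
    TRAINING = "training"
    AGENT = "agent"


# Purposes blocked by default on ad-supported pages, per the article.
DEFAULT_BLOCKED_ON_AD_PAGES: FrozenSet[Purpose] = frozenset(
    {Purpose.TRAINING, Purpose.AGENT}
)


def is_blocked(
    purposes: Iterable[Purpose],
    page_has_ads: bool,
    blocked: FrozenSet[Purpose] = DEFAULT_BLOCKED_ON_AD_PAGES,
) -> bool:
    """Apply the strictest rule: one blocked purpose blocks the whole bot."""
    # This default only concerns ad-supported pages.
    if not page_has_ads:
        return False
    # A mixed-use bot is blocked if any of its purposes is blocked.
    return any(purpose in blocked for purpose in purposes)
```

A search-only bot passes (`is_blocked({Purpose.SEARCH}, True)` is `False`). A bot that declares both search and training, which is how the article describes Googlebot, does not (`is_blocked({Purpose.SEARCH, Purpose.TRAINING}, True)` is `True`). An owner who re-allows everything corresponds to passing an empty `blocked` set.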

Cloudflare has not been shy about why it is pushing here, and it has aimed most of its argument at Google. It says Google’s position gives it access to roughly twice the web content available to rivals, because staying visible in Search usually means allowing AI use as well. Google does offer Google-Extended, a robots.txt user-agent token that lets a site keep traditional search while opting out of model training. But there is a gap it does not cover: a publisher who wants to appear in AI Mode answers, yet does not want to train Google’s models, has no clean way to ask for that today. Cloudflare’s category split is a way of forcing that question into the open.

Paying publishers, not only blocking bots

The blocking comes with a second half, and it is the more interesting one. Cloudflare is turning its 2025 “pay per crawl” experiment into a wider “Pay Per Use” model.

The original design was a tollbooth. When a crawler asked for a paid URL, Cloudflare answered with an HTTP 402 Payment Required response and a crawler-price header that told the bot what access would cost. It was clever, but it paid owners for the fetch, not for the value. Cloudflare says publishers kept asking for a third path between “block everything” and “give it away free,” so the model is moving from charging per crawl to paying per use, meaning the owner earns when their content actually does something, not merely when a bot grabs it.
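
From the crawler's side, the tollbooth looks roughly like this: read the quoted price off the 402 response, compare it to a budget, then retry or walk away. The Python sketch below makes no network calls. The `crawler-price` header name comes from the description above. The value format (`USD 0.01`), the `crawler-exact-price` retry header, and the function names are assumptions for illustration, so check Cloudflare's current documentation before relying on them.

```python
from decimal import Decimal, InvalidOperation
from typing import Mapping, Optional


def parse_crawler_price(headers: Mapping[str, str]) -> Optional[Decimal]:
    """Return the quoted USD price, or None if it is absent or unparseable."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("crawler-price")
    if raw is None:
        return None
    # Assumed value format: "<currency> <amount>", for example "USD 0.01".
    parts = raw.split()
    if len(parts) != 2 or parts[0].upper() != "USD":
        return None
    try:
        amount = Decimal(parts[1])
    except InvalidOperation:
        return None
    # Reject NaN, infinities, and negative quotes.
    if not amount.is_finite() or amount < 0:
        return None
    return amount


def plan_next_request(
    status: int, headers: Mapping[str, str], budget_usd: Decimal
) -> dict:
    """Decide what a crawler does after a response from a paid URL."""
    if status != 402:
        return {"action": "use_response"}
    price = parse_crawler_price(headers)
    if price is None or price > budget_usd:
        return {"action": "skip"}
    # Retry while echoing the quoted price back (assumed header name).
    return {"action": "retry", "headers": {"crawler-exact-price": f"USD {price}"}}
```

With a budget of `Decimal("0.10")`, a 402 quoting `USD 0.01` yields a retry and one quoting `USD 0.50` yields a skip. Note that the decision rests entirely on the price of the fetch, which is the limitation that Pay Per Use is meant to address.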

Two launch partners show what that looks like. Ceramic.ai runs a pay-per-query model that returns queries, citations, and ranking data to publishers, and pays them when their content shows up in Ceramic’s AI search results. You.com is building a way for AI agents to buy access to individual pieces of premium content on demand. Publishers and platforms including beehiiv, Condé Nast, and Patreon have signed on early. It is a small roster so far, and whether it grows into a real marketplace or stays a pilot is the open question hanging over the money side of this.

How it works

The controls lean on open signals a bot can prove, not on IP blocklists that go stale. Cloudflare uses a managed robots.txt with a new Content Signals “use” directive, which lets an owner set intent at graduated levels: immediate (interact but store nothing), reference (index, excerpt, and link back, the default), and training or inference. Crawlers identify themselves through Web Bot Auth by registering a public key directory and signing each request with HTTP Message Signatures, so a bot can be recognized rather than guessed at, and shown which resources are paid.
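
To make the signing step concrete, here is a rough Python sketch of how a crawler might assemble an HTTP Message Signatures (RFC 9421) signature base and the matching headers. Several things are assumptions rather than anything stated above: the covered components (`@authority` and `signature-agent`), the `tag="web-bot-auth"` parameter, the `sig1` label, and every function name. The real scheme uses an asymmetric key published in the bot's key directory. Python's standard library has no such signer, so the sketch takes the signing function as a parameter and demonstrates it with an HMAC stand-in that is not suitable for real use.

```python
import base64
import hashlib
import hmac
from typing import Callable, Dict


def build_signature_base(authority: str, signature_agent: str, params: str) -> str:
    """Assemble an RFC 9421 signature base for two covered components."""
    lines = [
        f'"@authority": {authority}',
        # The header is a quoted string, so the quotes are part of its value.
        f'"signature-agent": "{signature_agent}"',
        f'"@signature-params": {params}',
    ]
    return "\n".join(lines)


def sign_request(
    authority: str,
    signature_agent: str,
    key_id: str,
    created: int,
    expires: int,
    sign: Callable[[bytes], bytes],
) -> Dict[str, str]:
    """Return the headers a signed crawler request would carry."""
    params = (
        '("@authority" "signature-agent")'
        f";created={created};expires={expires}"
        f';keyid="{key_id}";tag="web-bot-auth"'
    )
    base = build_signature_base(authority, signature_agent, params)
    signature = base64.b64encode(sign(base.encode("ascii"))).decode("ascii")
    return {
        "Signature-Agent": f'"{signature_agent}"',
        "Signature-Input": f"sig1={params}",
        "Signature": f"sig1=:{signature}:",
    }


def demo_hmac_signer(message: bytes) -> bytes:
    """Stand-in signer for the demo only; a real bot signs with its private key."""
    return hmac.new(b"demo-shared-secret", message, hashlib.sha256).digest()


headers = sign_request(
    authority="example.com",
    signature_agent="https://bot.example",  # hypothetical bot identity URL
    key_id="demo-key",                      # hypothetical key identifier
    created=1700000000,
    expires=1700000300,
    sign=demo_hmac_signer,
)
```

The receiving side rebuilds the same signature base from the request, looks up the key named by `keyid` in the bot's published directory, and verifies the signature. That check is what lets a bot be recognized by proof rather than by IP address or user-agent string.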

Will the AI giants play along?

This is the part nobody can answer yet. Cloudflare can block a crawler, but it cannot make Google, Apple, or Microsoft split their bots in two. The whole design is pressure: separate your search crawler from your training crawler, or lose access to ad-supported pages across a large slice of the web. Whether the big AI companies decide that access is worth re-architecting their crawlers for, or whether they wait it out and lean on their opt-out directives, is the thing to watch after September 15. The answer decides whether this policy reshapes crawling or just annoys it.

Why now

Cloudflare points to a web where non-human traffic has passed human traffic, and where AI answers eat into the clicks that fund publishers. It cites a field study finding that Google’s AI Overviews cut outbound clicks by about 40%, and estimates roughly half of all crawls are wasted. CEO Matthew Prince put it plainly: now that most Internet traffic is non-human, the industry has to move faster so a sustainable ecosystem can emerge.

What it means if you run a site on Cloudflare

Free and new sites get the block on ad pages automatically, so there is nothing to switch on to be covered. Search visibility stays intact by default, which means keeping AI training out does not cost you discoverability. You still hold the dial: re-allow the traffic if you want it, or try Pay Per Use as a middle path between blocking everything and giving it away.

The one thing worth doing before September 15 is checking your multi-purpose crawlers, Googlebot above all. Because Cloudflare applies the strictest rule to a mixed bot, turning on Training blocking can quietly pull Googlebot off your ad pages, and that is the kind of change you want to make on purpose, not discover in a traffic dip a week later.