The Agentic Web Standards: NLWeb, MCP, and AIPREF Explained

There is a new acronym almost every month now. NLWeb, MCP, A2A, AIPREF, llms.txt. Each one arrives with a blog post telling you your website is about to be obsolete unless you adopt it today. If you run a site and try to keep up, the honest feeling is somewhere between curiosity and fatigue.

So let me do the boring thing and explain what these standards actually are, what they are trying to fix, and what a site owner should genuinely do about them right now. Some of it matters. Most of it is something to understand, not something to implement this week.

For context, two earlier shifts already happened. First came the question of whether an AI engine would read your page and quote it, which our guide to Generative Engine Optimization covers. Then came agents that operate your site on a user’s behalf, which we walked through in Agentic SEO and the Universal Commerce Protocol. This article is about a third layer underneath both: the protocols being built so agents and sites can talk to each other in a standard way.

What the Agentic Web Actually Means

“Agentic web” is a loose phrase, so it helps to be precise. It does not only mean an AI clicking your buttons. That part, an agent driving a browser through your checkout, is the topic of the earlier agentic SEO guide.

The newer idea is a layer of standards that lets a site offer a clean, structured way for an agent to ask questions and take actions, without scraping the rendered page at all. Think of the difference between a person reading a restaurant menu off a chalkboard versus the restaurant handing them a structured list with prices, allergens, and availability already sorted. Same information, but one is far easier and cheaper to act on.

Three standards are doing most of the talking in this space. NLWeb, which lets a site answer natural language questions. MCP, which lets a site expose tools an agent can call. And AIPREF, which lets a site state how its content may be used by AI. They overlap, they are at very different stages of maturity, and only one of them is a finished idea you could reasonably ship.

[Figure: Three mosaic panels representing the three standards: conversation for NLWeb, tools for MCP, and permission for AIPREF]

NLWeb: Your Site as a Conversation

NLWeb is an open project announced by Microsoft at its Build 2025 conference. The interesting detail is who is behind it: R.V. Guha, the person associated with RSS, RDF, and Schema.org. That lineage matters, because NLWeb leans on formats you already know rather than inventing a new universe.

The mechanics are simple to describe. An NLWeb-enabled site exposes an /ask endpoint. An agent, or a person, sends a plain-language question to it, and the site returns a structured JSON answer grounded in that site’s own content. Under the hood it reuses Schema.org markup, RSS and other feeds, and sitemaps, then combines them with a language model and a vector index so the answers stay tied to your real data instead of being made up.
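As a rough illustration of that request and response shape, the sketch below builds an /ask URL and pulls the useful fields out of a reply. The host example.com, the query parameter name, and the fields in the sample reply (results, name, url, score) are assumptions made for this example; check the NLWeb documentation for the exact contract of the version you deploy.

```python
from urllib.parse import urlencode


def build_ask_url(site_base: str, question: str) -> str:
    """Return the URL for a plain-language question sent to a site's /ask endpoint."""
    # Assumption: the question travels in a `query` parameter.
    return f"{site_base.rstrip('/')}/ask?{urlencode({'query': question})}"


def top_answers(reply: dict, limit: int = 3) -> list[tuple[str, str]]:
    """Pick (name, url) pairs from a reply, highest score first."""
    results = sorted(reply.get("results", []), key=lambda r: r.get("score", 0), reverse=True)
    return [(r["name"], r["url"]) for r in results[:limit]]


# Hypothetical reply, hand-written for this example, not captured from a real site.
sample_reply = {
    "query_id": "demo-1",
    "results": [
        {"name": "Lentil soup", "url": "https://example.com/recipes/lentil-soup", "score": 71},
        {"name": "Vegan chili", "url": "https://example.com/recipes/vegan-chili", "score": 88},
    ],
}

url = build_ask_url("https://example.com/", "vegan dinner ideas under 30 minutes")
print(url)
print(top_answers(sample_reply))
```

The point of the sketch is the contrast with scraping: the agent asks one question and gets back named, linked items it can rank, with no HTML parsing involved.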

There is one more detail worth knowing. Every NLWeb instance is also an MCP server, which brings us to the next standard.

MCP: A Universal Port for Agents

MCP, the Model Context Protocol, was introduced by Anthropic in late 2024. The common analogy is a universal connector. Before a shared standard, every AI integration was custom wiring. MCP gives agents one standard way to discover and use the tools a service offers.

For a website, an MCP server is something the site owner sets up and hosts, usually one per site. It defines a set of tools, things like getProductDetails or createAppointment, each described with a machine-readable schema. An agent that speaks MCP can then discover those tools and call them in a predictable way, instead of guessing how your page works.
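To make "described with a schema" concrete, here is a minimal sketch using plain dictionaries rather than any MCP SDK. It shows a tool descriptor with a name, a description, and a JSON Schema for its arguments, plus the JSON-RPC 2.0 message an agent would send to call it. The getProductDetails name comes from the paragraph above; the sku argument and the helper functions are hypothetical.

```python
import json

# A tool descriptor in the shape an MCP server returns when an agent lists tools:
# a name, a human-readable description, and a JSON Schema for the arguments.
# The `sku` parameter is a hypothetical example.
GET_PRODUCT_DETAILS = {
    "name": "getProductDetails",
    "description": "Return price and availability for one product.",
    "inputSchema": {
        "type": "object",
        "properties": {"sku": {"type": "string"}},
        "required": ["sku"],
    },
}


def missing_arguments(tool: dict, arguments: dict) -> list[str]:
    """List required arguments the caller left out (a minimal check, not full JSON Schema validation)."""
    return [name for name in tool["inputSchema"].get("required", []) if name not in arguments]


def build_tool_call(request_id: int, tool: dict, arguments: dict) -> str:
    """Serialize the JSON-RPC 2.0 request an agent sends to invoke a tool."""
    missing = missing_arguments(tool, arguments)
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool["name"], "arguments": arguments},
    })


request = build_tool_call(1, GET_PRODUCT_DETAILS, {"sku": "ABC-123"})
print(request)
```

Because the schema declares which arguments are required, an agent can catch a malformed call before it ever reaches your server, which is the predictability the protocol is after.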

It is tempting to call MCP a smarter robots.txt, and that captures part of it, but the comparison undersells it. A robots.txt file tells a bot where it may not go. An MCP server tells an agent what it can actually do and hands it clean, segmented data to do it with. For some businesses that is a real opportunity, including publishers who might expose a structured dataset and charge for access to it. For most small sites, it is infrastructure they do not need yet.

AIPREF: Saying How AI May Use Your Content

The third standard is the most grounded of the three, because it solves a problem every site owner already has: a way to say “you may read this, but do not train on it,” and have machines respect the distinction.

AIPREF is a working group at the IETF, the same standards body that defines core internet protocols. It is building a small, shared vocabulary for expressing AI usage preferences. The current draft defines two usage categories. train-ai covers using your content to build or refine an AI model. search covers applications whose main purpose is to find your content and point users back to it, with attribution. Each category can be set to allow, disallow, or left unstated.

The group is also working on how you attach those preferences to content, either embedded in the content itself or through a file in the spirit of robots.txt, plus rules for reconciling conflicting signals. This connects directly to how you already manage crawler access, which we cover in the complete guide to robots.txt and AI bots.
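To make the three states concrete, here is a minimal sketch that reads a preference string such as train-ai=n, search=y. The y/n notation and the comma-separated layout mirror the style used in the working group drafts at the time of writing, but the syntax is still changing, so treat the format here as an assumption rather than the final wire format. The function names are hypothetical.

```python
def parse_usage_preferences(value: str) -> dict[str, str]:
    """Map each listed usage category to 'allow' or 'disallow'."""
    states = {"y": "allow", "n": "disallow"}
    preferences = {}
    for item in value.split(","):
        key, sep, flag = item.strip().partition("=")
        if not sep:
            continue  # skip empty or malformed items instead of guessing
        flag = flag.strip().lower()
        if flag in states:
            preferences[key.strip().lower()] = states[flag]
    return preferences


def preference_for(preferences: dict[str, str], category: str) -> str:
    """Look up one category, reporting 'unstated' when the site said nothing about it."""
    return preferences.get(category, "unstated")


prefs = parse_usage_preferences("train-ai=n, search=y")
print(preference_for(prefs, "train-ai"))
print(preference_for(prefs, "search"))
print(preference_for(parse_usage_preferences("search=y"), "train-ai"))
```

The third lookup is the case worth noticing: a category the site never mentioned is neither allowed nor disallowed, and a consumer has to decide how to treat that silence.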

Here is the honest status. AIPREF is on the standards track but it is not finished. The vocabulary draft is still being revised, and the current revision is set to expire in late 2026. That is normal for IETF work, and it is also exactly why you should not rebuild your access policy around it this quarter. Understand it, watch it, keep your existing robots and bot rules clean in the meantime.

So What Do You Actually Do Today

Here is the part most articles skip. Agentic web optimization is often drawn as a stack of layers, and the useful insight is that every layer rests on the one below it. Semrush frames it roughly like this, listed here from the bottom layer up, and the order is what matters:

[Figure: Two columns comparing solid foundations you should do now against experimental protocols to watch, drawn as stacked retro blocks]

SEO foundations. Your site has to be crawlable and free of technical faults. If a normal crawler trips over your site, an agent will too.
Agent readiness. Clear writing, semantic structure, and complete data, so a machine can parse what your brand is and what you offer without guessing.
Off-site presence. Consistent information about your brand across the web, so an agent forms an accurate picture of you from more than one source.
Action layer. The site actually being operable for an agent: the form that submits, the button that works.

Notice what is and is not on that list. NLWeb endpoints and MCP servers sit at the very top of the stack, and they only pay off once everything beneath them is solid. The vast majority of the value for a typical site lives in the bottom two layers, SEO foundations and agent readiness, and those are things you control today with no new protocol at all.

Concretely, the work that pays off regardless of which standard wins is the same boring list it has always been. Clean, crawlable HTML with no broken links or redirect chains. No accidental noindex or blocked pages. Complete Schema.org markup on your important pages rather than thin markup everywhere, because incomplete structured data signals uncertainty to an agent the same way it does to a search engine. Accurate feeds and sitemaps. Honest, current product and service data. The markup, feeds, and sitemaps on that list are also what NLWeb itself reads from, which is the point: do the basics well and you are already most of the way to being agent-ready.
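Two items on that list, a stray noindex and missing structured data, are easy to check by machine. The sketch below is one way to do it for a single page using only the Python standard library. The PageCheck class and the sample HTML are hypothetical, and a real audit would also need to look at HTTP headers and robots.txt, which this example ignores.

```python
import json
from html.parser import HTMLParser


class PageCheck(HTMLParser):
    """Collect two signals from one HTML page: a robots noindex and any JSON-LD blocks."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.json_ld = []  # parsed JSON-LD values found on the page
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = (attrs.get("content") or "").lower()
            self.noindex = self.noindex or "noindex" in content
        elif tag == "script" and (attrs.get("type") or "").lower() == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed markup is dropped here; a real audit should flag it


sample_html = """
<html><head>
<meta name="robots" content="noindex, follow">
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Product", "name": "Blue Mug"}</script>
</head><body><h1>Blue Mug</h1></body></html>
"""

check = PageCheck()
check.feed(sample_html)
types = [item.get("@type") for item in check.json_ld if isinstance(item, dict)]
print(check.noindex)
print(types)
```

The sample page is deliberately contradictory: it carries Product markup and a noindex at the same time, which is the kind of accident this check is meant to surface.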

Schema markup deserves a specific note, because it is quietly becoming part of the plumbing of the agentic web. Agents use it not just to identify what an entity is, but to judge relationships, relevance, and whether your content is trustworthy enough to act on. Fully populated markup on your key pages beats thin markup spread across every page. We go deeper in the guide to schema markup for SEO and AI.
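One way to turn "fully populated" into something you can measure is to compare each marked-up item against a list of properties you expect it to carry. The sketch below does that for a single JSON-LD object. The EXPECTED checklist is hypothetical, chosen for illustration; it is not an official requirement from Schema.org or any search engine.

```python
# Hypothetical checklist: the properties this example treats as "complete" for a Product.
EXPECTED = {"Product": ["name", "description", "image", "sku", "brand", "offers"]}


def missing_properties(item: dict) -> list[str]:
    """Return the expected properties that are absent or empty on one JSON-LD item."""
    item_type = item.get("@type")
    if not isinstance(item_type, str):
        return []  # multi-typed or untyped items are out of scope for this sketch
    expected = EXPECTED.get(item_type, [])
    return [prop for prop in expected if item.get(prop) in (None, "", [], {})]


thin = {"@type": "Product", "name": "Blue Mug"}
full = {
    "@type": "Product",
    "name": "Blue Mug",
    "description": "Stoneware mug, 350 ml.",
    "image": "https://example.com/img/blue-mug.jpg",
    "sku": "MUG-350-BLUE",
    "brand": {"@type": "Brand", "name": "Example Pottery"},
    "offers": {"@type": "Offer", "price": "14.00", "priceCurrency": "EUR"},
}

print(missing_properties(thin))
print(missing_properties(full))
```

Run against your key pages first. A short list of missing properties on ten important pages is a more useful work queue than a long one across the whole site.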

The Hype Filter

Now the contrarian part, because the agentic web has attracted some genuinely silly takes. You will read that your website is already obsolete, that human visitors are an afterthought, that you need an MCP server by next quarter or you vanish. Treat those the way you would treat any pitch that ends in “buy now or lose forever.”

A few things are true at the same time, and holding all of them is the honest position.

The standards are real and the people behind them are serious. NLWeb comes from the person who helped give us Schema.org. MCP already has broad traction as a way for agents to use tools. AIPREF is being built by the IETF. This is not vaporware.

But adoption is early and the barriers are concrete. Surveys of publishers point to technical complexity and plain organizational readiness as the top reasons they have not deployed agentic tooling, not lack of interest. Standards that are still in draft will change. And the risks scale with the capability: an agent that can read, compare, and act on a user’s behalf is also an agent with access to sensitive preferences and money, which is exactly why the consent layer that AIPREF is trying to standardize exists at all.

So the reasonable conclusion is not “ignore it” and not “panic.” It is sequence. Most small and midsize sites will get far more value, sooner, from better content structure, cleaner data, simpler conversion paths, and stronger trust signals than from racing to stand up a protocol server on top of a shaky foundation. Betting a roadmap on a draft specification is how you spend a quarter building something the spec then changes underneath you. If you want a longer version of this argument applied to AI search in general, we made it in our piece on whether SEO is dead in 2026.

None of this means the protocol layer does not matter. It means the order matters. Get the foundation right, because the foundation is what every one of these standards reads from, and it is the only part that pays off whether the agentic web arrives next year or takes five.

Where a Crawler Fits

The unglamorous truth running through all of this is that “agent ready” and “search ready” are almost the same checklist. Crawlable pages. No broken links. Correct status codes. No stray noindex. Schema present and complete. Predictable structure. An agent and a search bot both fail on the same problems.

That overlap is good news, because it means you can measure your readiness with tools you already understand. A crawl of your own site surfaces exactly the foundation issues that block both: pages returning 4xx or 5xx, links that go nowhere, missing or thin structured data, blocked or noindexed pages you did not mean to hide. This is the kind of pass Seodisias is built for. You point it at your site, with no URL limit, and it reports the broken links, the bad status codes, the missing schema, and the blocked pages in one run, the same faults that trip an AI agent and a search crawler alike. For the full method, see the technical SEO audit checklist and the complete guide to SEO crawlers.
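However the crawl is produced, the triage step afterwards is simple enough to sketch. The example below groups crawled URLs by the foundation problem they show. The record fields (url, status, noindex, has_schema) are hypothetical and are not the output format of any particular crawler.

```python
from collections import defaultdict


def triage(pages: list[dict]) -> dict[str, list[str]]:
    """Group crawled URLs by the foundation problem they show."""
    issues = defaultdict(list)
    for page in pages:
        status = page["status"]
        if 400 <= status < 500:
            issues["client_error_4xx"].append(page["url"])
        elif 500 <= status < 600:
            issues["server_error_5xx"].append(page["url"])
        elif status == 200:
            # Content checks only make sense for pages that actually loaded.
            if page.get("noindex"):
                issues["noindex"].append(page["url"])
            if not page.get("has_schema"):
                issues["missing_schema"].append(page["url"])
    return dict(issues)


# Hand-written sample records standing in for real crawl output.
crawl = [
    {"url": "/", "status": 200, "noindex": False, "has_schema": True},
    {"url": "/old-page", "status": 404},
    {"url": "/pricing", "status": 200, "noindex": True, "has_schema": True},
    {"url": "/blog/post", "status": 200, "noindex": False, "has_schema": False},
    {"url": "/api/cart", "status": 503},
]

report = triage(crawl)
for issue, urls in report.items():
    print(issue, urls)
```

Nothing in this report is specific to agents, and that is the point of the section: the same four buckets block a search crawler and an AI agent alike.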

The Bottom Line

NLWeb, MCP, and AIPREF are worth understanding, and one of them, AIPREF, is worth watching closely because it shapes how you signal what AI systems may do with your content. But none of them changes the work you should be doing this month. Make your site clean, crawlable, well structured, and honest about its data. That foundation is what the agentic web is built to read, it is what search engines already reward, and it is the only part of this story that is finished enough to act on today. For a plain reading of where official guidance lands on all of this, our explainer on Google’s own AI optimization guide is a good companion to this piece.

