SOM vs MCP: How Publishers and Agents Are Different Problems
MCP is the protocol an agent uses to talk to its tools. SOM is the format a publisher uses to describe a web page. They are routinely confused, frequently positioned as alternatives, and in fact occupy different layers of the stack — designed to compose, not compete.
In conversations with publishers, framework authors, and AI engineers throughout 2026, the most frequently confused pair of acronyms in the agent-infrastructure landscape is MCP and SOM. The confusion is not surprising. Both arrived at the same moment, both carry similar surface-level promises about making the web friendlier to AI, and both are open standards backed by credible institutional sponsors. The conflation is a category error, but it is a forgivable one. The two specifications are different in kind, sit at different layers of the agent stack, and are designed to compose rather than compete.
This piece settles the distinction. It is the companion to SOM vs llms.txt: When to Use Which, which does the same work for the publisher-side site-introduction layer. Together the two articles cover the most common axes of confusion that publishers and agent authors encounter when deciding what to ship and what to consume.
The short answer
MCP (Model Context Protocol) is a protocol that an AI client uses to talk to its tools. It defines a transport, a request and response shape, and a way for servers to expose tools, resources, and prompts to a client. The client is an LLM-powered application such as Claude Desktop, Cursor, Continue, or Zed. The server is a process that exposes capabilities to that client. MCP is, in essence, the wiring between an agent and the rest of its tool surface.
SOM (Semantic Object Model) is a content format that a publisher uses to describe a web page in agent-native form. It defines the structure of a JSON document that represents the content, regions, elements, and available actions of an addressable page. SOM is, in essence, a wire format for the second reader of the web — see The Web’s Second Reader.
These are different layers. MCP carries payloads. SOM is one of the payloads MCP can carry. A well-built agent stack uses both, and treating them as alternatives is the kind of mistake that suggests the team has not yet drawn the boundaries cleanly in their own architecture.
What MCP actually is
Anthropic introduced the Model Context Protocol on November 25, 2024, with a stated goal of making it easier to connect AI applications to data and tools. The specification has since accumulated multi-vendor adoption: every major AI-native client now speaks MCP, and the directory of open-source MCP servers numbers in the hundreds. It is the closest thing the agent ecosystem has to a settled cross-vendor standard for client-tool wiring.
Concretely, the architecture has three roles:
- A host application, which is the user-facing AI product (Claude Desktop, Cursor, Continue, the Plasmate Notebook, an internal enterprise app).
- One or more MCP clients embedded in the host, each of which holds a one-to-one connection to an MCP server.
- One or more MCP servers, which are independent processes (often local, sometimes remote) that expose tools, resources, and prompts.
Communication between client and server is JSON-RPC 2.0, transported over either standard input/output (for local servers) or HTTP with Server-Sent Events (for remote servers). The protocol exposes a small, deliberately constrained set of primitives:
- Tools — functions the agent can invoke (e.g. `fetch_url`, `send_email`, `create_issue`).
- Resources — read-only data the agent can reference (a file, a database row, a piece of context).
- Prompts — reusable templated instructions the host can offer to the user.
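To make the wire format concrete, here is a sketch of what a single tool call looks like as JSON-RPC 2.0. The `jsonrpc`, `id`, `method`, and `params` fields are standard JSON-RPC; the `tools/call` method name and the `content` result shape follow the public MCP specification, while the tool name and arguments are illustrative.

```python
import json

# A JSON-RPC 2.0 request as an MCP client might issue it for a tool call.
# "tools/call" is the MCP method for invoking a tool; the tool name and
# arguments here are illustrative, not from any particular server.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "fetch_url",
        "arguments": {"url": "https://stripe.com/pricing"},
    },
}

# Over the stdio transport, each message travels as one line of JSON.
wire = json.dumps(request)

# A matching success response carries the tool output in "result".
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": "{...tool output...}"}],
    },
}
```

The same two message shapes cover every tool invocation, which is what makes a generic client-side implementation possible.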
The win of MCP is that any client can connect to any server, and any server can be consumed by any client, without bespoke integration. The same Plasmate-MCP server that a developer connects to Claude Desktop in the morning works unchanged with Cursor in the afternoon. The cost of writing a new tool drops from one-per-client to one-per-tool. This is the same shape of win that Language Server Protocol delivered for code editors in 2016, and the analogy is not accidental — Anthropic’s original framing referenced LSP explicitly.
What SOM actually is
SOM/1.0 is a JSON document that represents an addressable web page in a form designed for AI agent consumption rather than human rendering. The full specification lives at /spec, but the shape is:
{
"som_version": "1.0",
"url": "https://example.com/page",
"title": "Page title",
"lang": "en",
"regions": [
{
"id": "r_main",
"role": "main",
"elements": [
{ "id": "e_3f8a", "role": "heading", "text": "...", "attrs": { "level": 1 } },
{ "id": "e_9d4e", "role": "paragraph", "text": "..." },
{ "id": "e_c082", "role": "link", "text": "...", "actions": ["click"], "attrs": { "href": "/next" } }
]
}
],
"meta": { "html_bytes": 28104, "som_bytes": 412, "compression_ratio": 68.2 }
}

The content is flat. Element identifiers are derived from a stable hash so that an agent can refer back to the same element across re-fetches. Roles are typed. Available actions are declared. The format compresses the typical production page by one to two orders of magnitude versus raw HTML; see the most recent public measurement at three weeks of public benchmarks.
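The spec excerpt above says element ids come from a stable hash but does not pin down the inputs, so the derivation below is a hypothetical sketch: hashing the element's region, position, role, and text, then truncating to the four-hex-character ids the example document uses.

```python
import hashlib

def element_id(role: str, text: str, region: str, index: int) -> str:
    # Hypothetical derivation: SOM/1.0 says ids come from a stable hash
    # but this excerpt does not specify the inputs. Region + position +
    # role + text is one plausible choice that survives re-fetches of
    # unchanged content.
    digest = hashlib.sha256(f"{region}|{index}|{role}|{text}".encode()).hexdigest()
    # Four hex characters match the example document ("e_3f8a"); a
    # production derivation would likely keep more bits to avoid
    # collisions on large pages.
    return f"e_{digest[:4]}"
```

The property that matters is determinism: `element_id("heading", "Pricing", "r_main", 0)` returns the same id on every fetch, so an agent can say "click `e_3f8a`" today and tomorrow and mean the same element, as long as the content has not changed.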
Crucially, SOM is a publisher-side artefact. A publisher who runs `example.com` can ship a SOM endpoint at, say, `example.com/api/v1/som` and advertise it via SOM Directives in robots.txt. Any agent with the discipline to read those directives can then fetch SOM in place of HTML and capture the order-of-magnitude token saving.
The most common conflation, and why it persists
The conflation that drives the question “should I use MCP or SOM?” usually comes from one of three reasonable misreadings.
The first misreading treats MCP as if its scope included the format of the data it transports. It does not. MCP is intentionally agnostic about payload content — it transports whatever JSON the tool returns. Whether that JSON is a SOM document, a Markdown blob, a raw HTML string, a database query result, or a custom domain-specific schema is up to the server. MCP is a wire protocol; SOM is one of the wire formats it can carry.
The second misreading treats SOM as if it were a publisher-facing version of MCP. It is not. The publisher does not care about MCP. The publisher cares about being legible to agents, and the legibility surface is HTML for the human reader and SOM for the second reader. The MCP layer is invisible to the publisher entirely. It is a concern for the agent author and the tool-builder, not for the site operator.
The third misreading treats them as competing claims for the same market position. They are not. There is no overlap in their addressable users. MCP is adopted by client teams (whoever ships Claude Desktop, Cursor, an enterprise chat-with-your-data product) and by tool-builders (whoever writes the GitHub MCP server, the Plasmate-MCP server, the Linear MCP server). SOM is adopted by content publishers (whoever runs nytimes.com, stripe.com, kubernetes.io). The two camps share agents as a downstream beneficiary but do not share a customer.
How they compose: a worked example
The cleanest way to see the relationship is to watch a single agent task all the way from user input to final result, paying attention to which layer is active at each step.
Suppose a user asks Claude Desktop, “Summarise the latest pricing changes on stripe.com.” The flow is:
- Claude Desktop (MCP host) receives the request. It has previously been configured with a Plasmate-MCP server providing a `fetch_url` tool.
- The MCP client embedded in Claude Desktop calls the Plasmate-MCP server with `fetch_url("https://stripe.com/pricing")`. This call travels over JSON-RPC 2.0 across the MCP transport.
- Plasmate-MCP (the MCP server) receives the tool call. It first checks `stripe.com/robots.txt` for SOM Directives. It finds a `SOM-Endpoint` line. It fetches `stripe.com/api/v1/som?url=...pricing`. The response is a SOM document.
- The MCP server returns the SOM document to the MCP client as the result of the tool call. The MCP transport carries the JSON unchanged.
- Claude Desktop presents the structured content to the model as tool output. The model, given a few hundred tokens of structured JSON instead of forty thousand tokens of marketing-page HTML, produces a clean summary.
Notice the layering. MCP carried the request and the response. SOM was the format of the response. Stripe (the publisher) did not know any agent visited; it served an HTTP request the same way it serves any other. Claude Desktop (the host) did not know the tool call would resolve via SOM; it received structured JSON and reasoned over it. The Plasmate-MCP server in the middle knew about both protocols and did the work of translating between them.
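The decision the middle layer makes can be isolated as a pure function: given a page URL and the site's robots.txt, decide whether to request the SOM endpoint or the raw HTML. The `?url=` endpoint shape mirrors the stripe.com example above; this is a sketch of the routing logic, not the Plasmate-MCP implementation.

```python
from urllib.parse import quote, urlsplit

def choose_fetch_url(page_url: str, robots_txt: str) -> tuple[str, str]:
    """Return ("som", endpoint_url) when the publisher advertises a
    SOM endpoint in robots.txt, else ("html", page_url). Directive
    name and endpoint shape follow this article's examples; network
    I/O and error handling are deliberately left out."""
    parts = urlsplit(page_url)
    origin = f"{parts.scheme}://{parts.netloc}"
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "som-endpoint":
            endpoint = value.strip()
            if endpoint.startswith("/"):
                endpoint = origin + endpoint  # resolve a relative path
            return "som", f"{endpoint}?url={quote(page_url, safe='')}"
    return "html", page_url

kind, target = choose_fetch_url(
    "https://stripe.com/pricing",
    "User-agent: *\nSOM-Endpoint: /api/v1/som\n",
)
```

Keeping this choice pure means the same function serves any MCP server (or any non-MCP agent framework) unchanged, which is the replaceability argument made above.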
This is what good infrastructure looks like. Every layer is replaceable. Cursor could substitute for Claude Desktop tomorrow. A different MCP server could substitute for Plasmate-MCP next quarter. Stripe could change its SOM endpoint implementation without anyone noticing. The composability of MCP and SOM is the point.
Where the responsibility for each lives
The cleanest way for a team to internalise the difference is to ask, for each of the two specifications, who on our team owns this?
| Concern | MCP | SOM |
|---|---|---|
| Layer | Transport / protocol | Content / format |
| Cardinality | One per client-server pair | One per addressable page |
| Who ships it | Client teams + tool-builders | Publishers |
| Discovery | Client configuration (server registry) | robots.txt SOM Directives |
| Wire format | JSON-RPC 2.0 over stdio or SSE | JSON document, SOM/1.0 schema |
| Reference impl | modelcontextprotocol.io | plasmate.app |
| Maintainer | Anthropic + multi-vendor community | SOMspec community + W3C CG |
| Validation | Inspector / SDK conformance | somspec.org/validate |
Inside an organisation, MCP is owned by the platform or AI engineering team that ships the agent. SOM is owned by the publishing or web team that ships the site. In a company that does both — say, a SaaS vendor with a developer-facing product and a marketing site — the two efforts run in parallel and do not block each other.
What to do as an agent author
If you are building an agent product (whether the host or a tool-server inside the host), the practical checklist is short.
- Adopt MCP for your client-tool wiring. The compatibility win across the ecosystem is large and the cost of inventing a private protocol is permanent. Any server you build today should expose tools via MCP; any host you ship should consume MCP. Reference: modelcontextprotocol.io.
- Have your fetch tool prefer SOM where it is advertised. When your tool fetches a URL, check `robots.txt` first; if a SOM endpoint is listed, fetch that instead of the HTML. The token savings are substantial and the publisher has already opted in.
- Where SOM is not advertised, derive it locally. The Plasmate engine and similar libraries can produce a SOM document from raw HTML on the fly. The publisher does not benefit, but the model in your client does.
- Return the SOM document as the tool result. Let the model reason over typed structured content rather than raw markup. The reliability and accuracy gains are immediate.
What to do as a publisher
If you are a publisher, MCP is not your concern. SOM is. The full guide lives at /directives, but the short version:
- Generate a SOM document for each page you serve. Most publishers can do this from their existing CMS or rendering pipeline with a single library call.
- Expose a SOM endpoint at a stable URL (typically `/api/v1/som?url=...`).
- Advertise the endpoint in `robots.txt` with five lines of SOM Directives.
- Verify with somready.com.
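The first step, generating the document, amounts to wrapping content the CMS already has in the SOM/1.0 envelope shown earlier. The sketch below assumes the pipeline can hand over a list of already-structured elements; id derivation, multi-region pages, and the exact `meta` semantics are elided, and the field names simply follow the example document in this article.

```python
import json

def build_som(url: str, title: str, html_bytes: int, elements: list[dict]) -> dict:
    """Wrap structured CMS content in the SOM/1.0 envelope. A sketch:
    single region, English only, ids assumed to be precomputed by the
    caller. Field names follow the example document in this article."""
    som = {
        "som_version": "1.0",
        "url": url,
        "title": title,
        "lang": "en",
        "regions": [{"id": "r_main", "role": "main", "elements": elements}],
    }
    # Size is measured before meta is attached, so the ratio is a
    # close approximation rather than an exact byte count.
    som_bytes = len(json.dumps(som).encode())
    som["meta"] = {
        "html_bytes": html_bytes,
        "som_bytes": som_bytes,
        "compression_ratio": round(html_bytes / som_bytes, 1),
    }
    return som

doc = build_som(
    "https://example.com/page",
    "Page title",
    28104,
    [{"id": "e_3f8a", "role": "heading", "text": "Page title", "attrs": {"level": 1}}],
)
```

Serving `doc` as JSON from the endpoint, behind whatever caching the site already uses, completes steps one and two of the checklist.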
That is the entire publisher-side surface. You do not need to know what MCP is. Your site has just become legible to the second reader through whatever MCP server an agent author chooses to deploy.
The deeper symmetry
It is worth ending with a structural observation. MCP and SOM are not just non-competing; they are filling complementary gaps that the previous generation of infrastructure left open.
For two decades, two questions have lacked clean cross-vendor answers in agent-shaped systems. The first is, how does an agent talk to its tools? Every framework before MCP solved this with a bespoke convention, and every change of framework forced a rewrite. MCP closed that question. The second question is, how does the web deliver content to an agent? The previous answer was “just send the HTML and let the model figure it out,” which worked for a while at considerable cost. SOM is closing that question.
The two specifications cover the two open boundaries of the agent stack: client to tool, and tool to content. A team that adopts both is, for the first time in the brief history of agent infrastructure, working with cross-vendor settled standards on both sides of its tool layer. That has not been true at any previous moment, and it is the practical reason 2026 feels different from 2024 if you are building agent-native products.
For further reading: the SOM/1.0 specification, the SOM Directives proposal, the reference implementations, and the canonical MCP specification. For the companion comparison piece, see SOM vs llms.txt: When to Use Which.