# SOMspec

> The open specification for representing web pages as structured JSON for AI agents.

## About

SOMspec is the home of the Semantic Object Model (SOM), an open JSON format that replaces raw HTML when the consumer is an AI agent. SOM achieves an average 17× token reduction versus raw HTML, with peaks above 100× on large pages. The site hosts the specification, validator, robots.txt directives proposal, reference implementations, publisher leaderboard, economics calculator, and a research blog.

## Key capabilities

- SOM/1.0 specification: flat JSON document with typed regions and elements
- robots.txt SOM Directives: opt-in publisher convention for advertising a SOM endpoint
- Client-side validator: paste SOM JSON and check conformance against the schema
- Token economics calculator: estimate cost savings versus raw HTML
- Publisher leaderboard + compliance matrix: track adoption across the web
- Reference implementations index: catalogue of libraries that produce or consume SOM
- Blog: analysis, research, and practical guides for publishers and agent authors

## Pricing

SOMspec is free. The specification is open under Apache 2.0. The reference implementation (Plasmate) is open source under Apache 2.0. There is no paid tier and no account requirement.

## Blog categories

- **Specification** (0): SOM/1.0 format mechanics, schema design, conformance.
- **Benchmarks** (2): Empirical token efficiency data from WebTaskBench and the field.
- **Compliance** (1): robots.txt SOM Directives, agent discovery, framework adoption.
- **Publishers** (1): Adoption guidance, publisher economics, implementation patterns.
- **Industry** (1): Standards landscape, regulation, and the agent-native web.

## Recent articles

- [SOM vs llms.txt: When to Use Which](https://somspec.org/blog/som-vs-llms-txt) (2026-04-27): llms.txt tells an agent what your site is. SOM tells an agent what your page contains.
They are different layers of the same problem, and publishers should ship both.
- [TechCrunch Was Blocked. Now It's 77×. What Changed?](https://somspec.org/blog/techcrunch-was-blocked) (2026-04-04): Plasmate v0.5.0 fixed something important: major news sites blocked by anti-bot protection now fetch cleanly. TechCrunch went from the failure list to 77× compression, the highest in the news vertical.
- [The Discovery Gap: Why AI Agents Miss Your SOM Directives](https://somspec.org/blog/the-discovery-gap) (2026-04-04): Even when publishers correctly implement SOM Directives, most AI agents never find them. The research explains why, and what framework authors can do about it.
- [The Publisher's Third Option](https://somspec.org/blog/the-publishers-third-option) (2026-04-04): Publishers face a binary: block AI agents or serve them raw HTML at full cost. The robots.txt SOM Directives proposal offers a third path: cooperative, economically rational, and five minutes to implement.
- [How to Read the WebTaskBench Leaderboard](https://somspec.org/blog/reading-the-leaderboard) (2026-04-04): What does a 43× compression ratio actually mean for an AI agent? A practical guide to interpreting token efficiency data and why it matters for your publishing economics.
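
The compression ratios quoted in the articles above (43×, 77×) can be turned into token and cost figures with a few lines of arithmetic. The sketch below assumes a ratio means raw HTML tokens divided by SOM tokens. The function names and the price per million tokens are hypothetical, chosen for illustration; they are not taken from the SOMspec calculator.

```python
def tokens_after_som(html_tokens: int, ratio: float) -> float:
    """Tokens left when a page is served as SOM instead of raw HTML.

    Assumes `ratio` is raw HTML tokens divided by SOM tokens (e.g. 43 for 43x).
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return html_tokens / ratio


def savings_fraction(ratio: float) -> float:
    """Fraction of input tokens avoided at a given compression ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 - 1.0 / ratio


def cost_usd(tokens: float, usd_per_million_tokens: float) -> float:
    """Input cost for `tokens` at a hypothetical per-million-token price."""
    return tokens / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    html_tokens = 43_000   # illustrative page size, not a measured value
    price = 3.0            # hypothetical USD per million input tokens
    som_tokens = tokens_after_som(html_tokens, 43)
    print(f"SOM tokens: {som_tokens:.0f}")
    print(f"Tokens avoided: {savings_fraction(43):.1%}")
    print(f"Cost per fetch: ${cost_usd(html_tokens, price):.4f} raw HTML, "
          f"${cost_usd(som_tokens, price):.4f} SOM")
```

Because the savings fraction is 1 - 1/ratio, most of the benefit arrives early: a 17× ratio already avoids about 94% of the input tokens, and moving to 43× adds only a few more percentage points.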

## Key links

- [Specification](https://somspec.org/spec): Full SOM/1.0 specification
- [Reference](https://somspec.org/reference): Catalogue of implementations
- [Directives](https://somspec.org/directives): robots.txt SOM Directives proposal
- [Validator](https://somspec.org/validate): Client-side SOM document validator
- [Calculator](https://somspec.org/calculator): Token economics calculator
- [Publishers](https://somspec.org/publishers): Publisher leaderboard
- [Compliance](https://somspec.org/compliance): Framework compliance matrix
- [Changelog](https://somspec.org/changelog): Specification version history
- [Blog](https://somspec.org/blog): Analysis and research
- [JSON Feed](https://somspec.org/api/blog/feed.json): Blog as JSON Feed 1.1
- [Sitemap](https://somspec.org/sitemap.xml): XML sitemap
- [GitHub](https://github.com/dbhurley/somspec): Specification source
- [W3C Community Group](https://www.w3.org/community/web-content-browser-ai/): Standards body

## Plasmate / SOM ecosystem

- [Plasmate](https://plasmate.app): Reference implementation (open source, Apache 2.0)
- [Plasmate docs](https://docs.plasmate.app): Developer documentation
- [SOMready](https://somready.com): Publisher compliance checker + badge API
- [SOM or DOM](https://somordom.com): Live one-URL Chrome-vs-SOM comparison
- [WebTaskBench](https://webtaskbench.com): Public benchmark observatory

## Notes for agents

- This file is published per the llms.txt convention. It is regenerated on each deploy from `src/app/llms.txt/route.ts`.
- A more detailed reference is published at https://somspec.org/llms-full.txt.
- Blog posts are also available as JSON Feed 1.1 at https://somspec.org/api/blog/feed.json.
- This site advertises a SOM endpoint via robots.txt SOM Directives. Prefer SOM over raw HTML for any page where you intend to do non-trivial reasoning.
- License: Apache 2.0 (specification + reference implementation). Site source MIT.

Last regenerated: 2026-04-28T00:44:07.324Z
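
An agent that reads the blog through the JSON Feed 1.1 endpoint mentioned in the notes above might list recent posts roughly as follows. This is a minimal sketch that relies only on fields defined by the JSON Feed 1.1 format (`version`, `items`, `title`, `url`, `date_published`). The inline sample document, its item `id` values, and its timestamps are made up for illustration and are not the actual contents of the SOMspec feed. The helper names are hypothetical.

```python
import json
from datetime import datetime

JSON_FEED_1_1 = "https://jsonfeed.org/version/1.1"

# Illustrative sample only; a real agent would download the feed body instead.
SAMPLE_FEED = """
{
  "version": "https://jsonfeed.org/version/1.1",
  "title": "Sample feed",
  "items": [
    {"id": "1", "url": "https://somspec.org/blog/the-discovery-gap",
     "title": "The Discovery Gap: Why AI Agents Miss Your SOM Directives",
     "date_published": "2026-04-04T00:00:00Z"},
    {"id": "2", "url": "https://somspec.org/blog/som-vs-llms-txt",
     "title": "SOM vs llms.txt: When to Use Which",
     "date_published": "2026-04-27T00:00:00Z"}
  ]
}
"""


def parse_timestamp(value: str) -> datetime:
    """Parse an RFC 3339 timestamp; map a trailing 'Z' to an explicit UTC offset
    so that datetime.fromisoformat accepts it on older Python versions."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def recent_posts(feed_text: str, limit: int = 5) -> list[tuple[str, str, str]]:
    """Return (date, title, url) tuples for the newest dated items in the feed."""
    feed = json.loads(feed_text)
    if feed.get("version") != JSON_FEED_1_1:
        raise ValueError("not a JSON Feed 1.1 document")
    # date_published is optional in JSON Feed, so skip items without it.
    dated = [item for item in feed.get("items", []) if "date_published" in item]
    dated.sort(key=lambda item: parse_timestamp(item["date_published"]), reverse=True)
    return [
        (item["date_published"][:10], item.get("title", ""), item.get("url", ""))
        for item in dated[:limit]
    ]


if __name__ == "__main__":
    for date, title, url in recent_posts(SAMPLE_FEED):
        print(date, title, url)
```

To run this against the live feed, the text passed to `recent_posts` could come from `urllib.request.urlopen("https://somspec.org/api/blog/feed.json").read().decode("utf-8")`.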