Related Formats
Before SOM, agent pipelines consumed web content as raw HTML, HTML stripped down to Markdown, or accessibility trees. Each of these repurposes a format designed for other consumers. The table below characterises the trade-offs.
| Property | HTML | Markdown | A11y Tree | SOM |
|---|---|---|---|---|
| Token overhead (relative to content density) | High | Moderate | Moderate | Minimal |
| Structural typing (typed element roles and semantic regions) | None | None | Partial | Complete |
| Interactivity preserved (clickable, typeable, scrollable elements) | Raw attributes | Not preserved | Present | Typed with actions |
| Stable element IDs (reproducible across independent fetches) | None | None | None | SHA-256 derived |
| Publisher-servable (cacheable as an alternate representation) | Yes | Yes | No | Yes |
| Approx. tokens per page (median across 51 representative sites) | ~80,000 | ~12,000 | ~8,000 | ~4,600 |

Token estimates derived from the Plasmate benchmark suite (51 sites, April 2026). A11y Tree figures represent Playwright accessibility snapshot output. SOM figures represent plasmate fetch output without selector filtering.
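To put the last table row in relative terms, the short sketch below divides each format's approximate median token count by the SOM figure. The numbers are copied from the table; the names `MEDIAN_TOKENS` and `ratio_to_som` are illustrative and not part of any Plasmate tooling.

```python
# Approximate median tokens per page, copied from the table above.
MEDIAN_TOKENS = {
    "HTML": 80_000,
    "Markdown": 12_000,
    "A11y Tree": 8_000,
    "SOM": 4_600,
}


def ratio_to_som(tokens):
    """Map each format to its token count divided by SOM's, rounded to 1 decimal."""
    som = tokens["SOM"]
    return {name: round(count / som, 1) for name, count in tokens.items()}


for name, ratio in ratio_to_som(MEDIAN_TOKENS).items():
    print(f"{name}: {ratio}x the SOM token count")
```

On these approximate medians, raw HTML comes out at roughly 17 times the SOM token count, Markdown at about 2.6 times, and the accessibility tree at about 1.7 times.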
The Specification
SOM v1.0 defines a compact, typed JSON representation of web pages. Explore the core concepts below.
Every SOM document is a single JSON object with the following top-level fields:
- som_version (string, required) - Specification version, currently "1.0"
- url (string, required) - The canonical URL of the source page
- title (string, required) - The document title extracted from the page
- lang (string, optional) - BCP 47 language code (e.g., "en", "fr")
- regions (array, required) - Ordered list of semantic page regions
- meta (object, required) - Compression and structure metadata
- structured_data (object, optional) - Extracted semantic data (JSON-LD, OpenGraph, etc.)
The document structure is intentionally flat. There is exactly one level of nesting: document contains regions, regions contain elements. This avoids the deeply nested trees that make HTML expensive for LLMs to process.
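As a rough illustration of the field list above, the sketch below checks a parsed SOM document for the required top-level fields and their JSON types. It is not the reference validator: `TOP_LEVEL_FIELDS`, `check_top_level`, and the example document are hypothetical, and the inner shape of `regions` and `meta` is not described in this section, so both are left empty.

```python
# Top-level fields as listed above: name -> (expected Python type, required?)
TOP_LEVEL_FIELDS = {
    "som_version": (str, True),
    "url": (str, True),
    "title": (str, True),
    "lang": (str, False),
    "regions": (list, True),
    "meta": (dict, True),
    "structured_data": (dict, False),
}


def check_top_level(doc):
    """Return a list of problems found in the top-level fields (empty if none)."""
    problems = []
    for name, (expected_type, required) in TOP_LEVEL_FIELDS.items():
        if name not in doc:
            if required:
                problems.append(f"missing required field: {name}")
            continue
        if not isinstance(doc[name], expected_type):
            problems.append(f"{name} should be {expected_type.__name__}")
    return problems


# Hypothetical minimal document; regions and meta are left empty because
# their contents are not specified in this section.
example = {
    "som_version": "1.0",
    "url": "https://example.com/",
    "title": "Example Domain",
    "lang": "en",
    "regions": [],
    "meta": {},
}

print(check_top_level(example))
print(check_top_level({"som_version": "1.0", "regions": {}}))
```

The first call reports no problems. The second reports the missing `url`, `title`, and `meta` fields and flags `regions` as the wrong type.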
Get Started
Install the reference implementation and start converting pages to SOM in seconds.
    # Install
    npm install -g plasmate
    # or: brew install plasmate-labs/tap/plasmate

    # Fetch any page as SOM
    plasmate fetch https://example.com

    # With selector to strip nav/footer
    plasmate fetch https://example.com --selector main

    # Compile existing HTML to SOM
    cat page.html | plasmate compile

Implementations
Tools and libraries that produce or consume SOM documents.
Plasmate
Reference Implementation. Open source Rust engine. CLI + MCP server + WASM.
plasmate.app

plasmate-wasm
SOM compiler as WebAssembly. Run in Node.js, Deno, edge workers.
npm: plasmate-wasm

plasmate-python
Python SDK with async support.
PyPI: plasmate

plasmate-mcp
MCP server. Works with Claude Desktop, Cursor, VS Code Copilot, Windsurf.
npm: plasmate-mcp

somordom.com
Browser-based SOM vs DOM comparison tool with badges and certifications.
somordom.com