Semantic Object Model v1.0
SOM is an open specification for representing web pages as structured JSON documents optimized for consumption by large language models and AI agents.
Introduction
The Semantic Object Model (SOM) is a JSON-based format for representing the meaningful content and interactive elements of web pages. It is designed as a replacement for raw HTML, Markdown extraction, and accessibility tree dumps when the consumer is an AI agent or large language model.
HTML was designed for browsers. It carries layout directives, styling hooks, script blocks, and deeply nested structures that are expensive to parse and wasteful to include in LLM context windows. Markdown loses interactive elements entirely. Accessibility trees vary across browsers and are not designed for serialization.
SOM addresses these limitations by providing a single, flat, typed representation that preserves both content and interactivity while minimizing token usage. On average, SOM documents use 17x fewer tokens than the equivalent HTML.
Design Goals
SOM is designed around five core principles:
- Token efficiency. Minimize the number of tokens an LLM must process to understand a web page. SOM achieves an average 17x reduction compared to raw HTML.
- Type safety. Every element has a well-defined role with role-specific attributes. Agents can reason about element types without parsing heuristics.
- Interactivity preservation. Interactive elements (links, buttons, inputs, selects) carry explicit action annotations. Agents know what they can do with each element.
- Stable references. SHA-256 derived element IDs are deterministic. The same element produces the same ID across page refreshes, enabling reliable multi-step workflows.
- Publisher compatibility. SOM documents can be served directly by publishers as an alternative representation of their pages, similar to RSS or JSON feeds.
Document Structure
A SOM document is a single JSON object. The structure is intentionally flat: documents contain regions, regions contain elements. There is no deeper nesting.
Top-level fields
| Field | Type | Required | Description |
|---|---|---|---|
| som_version | string | yes | The specification version. Currently "1.0". Implementations must reject documents with unrecognized major versions. |
| url | string | yes | The canonical URL of the source page. Its origin is part of the stable ID hash input. |
| title | string | yes | The document title, extracted from the HTML title element or first h1. |
| lang | string | no | BCP 47 language tag (e.g., "en", "ja"). Extracted from the html element lang attribute. |
| regions | array | yes | Ordered list of semantic page regions. Must contain at least one region. |
| meta | object | yes | Compression and structure metadata. See section 8 (Meta Block). |
| structured_data | object | no | Extracted semantic data from the page. See section 9 (Structured Data). |
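The version rule for som_version could be enforced along the following lines. This is a non-normative sketch in Python: the function name check_som_version and the SUPPORTED_MAJOR_VERSIONS set are hypothetical, and the parsing of the version string as "major.minor" is an assumption based on the current value "1.0".

```python
# Assumption: this consumer only understands SOM 1.x documents.
SUPPORTED_MAJOR_VERSIONS = {1}


def check_som_version(document: dict) -> int:
    """Return the major version, or raise ValueError if it is unrecognized."""
    version = document.get("som_version")
    if not isinstance(version, str):
        raise ValueError("som_version is required and must be a string")

    # Assumption: the version string has the form "major.minor".
    major_text = version.split(".", 1)[0]
    if not (major_text.isascii() and major_text.isdigit()):
        raise ValueError(f"malformed som_version: {version!r}")

    major = int(major_text)
    if major not in SUPPORTED_MAJOR_VERSIONS:
        raise ValueError(f"unrecognized SOM major version: {major}")
    return major
```

A consumer would call check_som_version before reading any other field, so that a document from a future major version is rejected early rather than misread.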
Regions
Regions represent semantic zones of a web page. They provide a lightweight grouping layer between the document and its elements. Each region has a role that describes its purpose.
Region detection precedence
Implementations must detect regions using the following precedence order:
1. ARIA roles: Elements with explicit role attributes (role="navigation", role="main")
2. HTML5 landmarks: Semantic elements (nav, main, aside, header, footer)
3. Class/ID heuristics: Common naming patterns ("sidebar", "nav", "footer")
4. Link density analysis: Areas with high link density are classified as navigation
5. Content heuristics: Text density, heading presence, content patterns
6. Fallback: Remaining content is grouped under role "generic"
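The precedence order above can be read as a chain of checks in which the first match wins. The sketch below illustrates that reading only. The node dictionary shape, the ARIA-to-SOM role mapping beyond "navigation" and "main", the 0.5 link density threshold, the 200-character content threshold, and the choice of "section" for the content heuristic are all assumptions, since the specification does not fix them.

```python
# Assumption: mapping from ARIA landmark roles to SOM region roles.
ARIA_ROLE_MAP = {
    "navigation": "navigation", "main": "main", "complementary": "aside",
    "banner": "header", "contentinfo": "footer", "search": "search",
    "form": "form", "dialog": "dialog",
}
LANDMARK_TAG_MAP = {
    "nav": "navigation", "main": "main", "aside": "aside",
    "header": "header", "footer": "footer",
}
# Checked in order; substring match against class and id values.
NAME_HINTS = [("sidebar", "aside"), ("footer", "footer"), ("nav", "navigation")]
LINK_DENSITY_THRESHOLD = 0.5  # assumption: share of text that sits inside links
MIN_CONTENT_CHARS = 200       # assumption: minimum text length for content blocks


def detect_region_role(node: dict) -> str:
    """Classify a simplified DOM node; earlier rules take precedence."""
    # 1. Explicit ARIA role
    aria_role = node.get("aria_role")
    if aria_role in ARIA_ROLE_MAP:
        return ARIA_ROLE_MAP[aria_role]
    # 2. HTML5 landmark element
    tag = node.get("tag", "")
    if tag in LANDMARK_TAG_MAP:
        return LANDMARK_TAG_MAP[tag]
    # 3. Class/ID naming patterns
    names = (node.get("class", "") + " " + node.get("id", "")).lower()
    for hint, role in NAME_HINTS:
        if hint in names:
            return role
    # 4. Link density
    if node.get("link_density", 0.0) >= LINK_DENSITY_THRESHOLD:
        return "navigation"
    # 5. Content heuristics (hypothetical rule: heading plus substantial text)
    if node.get("has_heading") and node.get("text_length", 0) >= MIN_CONTENT_CHARS:
        return "section"
    # 6. Fallback
    return "generic"
```

Because the checks run in order, a nav element carrying role="main" is classified as main: the ARIA role outranks the HTML5 landmark.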
Standard roles
The following region roles are defined: main, navigation, aside, header, footer, search, form, dialog, section, generic.
Region fields
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | yes | Region identifier, prefixed with "r_" followed by a descriptive slug. |
| role | string | yes | One of the standard region roles listed above. |
| label | string | no | Accessible name of the region if available (from aria-label or aria-labelledby). |
| elements | array | yes | Ordered list of elements within this region. |
Elements
Elements are the atomic units of a SOM document. Each element represents a single meaningful content node or interactive control on the page.
Element fields
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | yes | Stable identifier derived from a SHA-256 hash. Format: "e_" + 12 hex chars. See section 7 (Stable IDs). |
| role | string | yes | One of the 15 defined element types. See section 6 (Element Types). |
| text | string | yes | Visible text content or computed accessible name. Must not be empty. |
| attrs | object | no | Role-specific attributes. The allowed keys depend on the element role. |
| actions | array | no | Available interactions. Values: "click", "type", "select", "toggle", "clear". |
| hints | object | no | CSS-inferred semantic signals. Keys include: visually_hidden, primary, destructive, disabled_visual, truncated. |
| aria | object | no | Dynamic ARIA widget state. Keys include: expanded, checked, selected, disabled, pressed, invalid, required, readonly. |
Element ordering
Elements within a region must be ordered by their visual position on the page (top-to-bottom, left-to-right), not by DOM source order. This ensures agents process content in the order a human would read it.
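As an illustration of the ordering rule, the sketch below sorts elements by the top-left corner of their bounding boxes. It assumes that the rendering engine exposes page coordinates for each element. The PositionedElement type and its field names are hypothetical, and real layouts with slightly misaligned rows may need a row tolerance that the specification does not define.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionedElement:
    element_id: str  # SOM element ID
    top: float       # distance from the top of the page, in CSS pixels
    left: float      # distance from the left edge of the page, in CSS pixels


def order_by_visual_position(elements):
    """Sort top-to-bottom first, then left-to-right within the same row."""
    return sorted(elements, key=lambda el: (el.top, el.left))


# DOM source order: footer link, then two header links.
dom_order = [
    PositionedElement("e_footer_link", top=900.0, left=0.0),
    PositionedElement("e_header_right", top=10.0, left=300.0),
    PositionedElement("e_header_left", top=10.0, left=20.0),
]
visual_order = [el.element_id for el in order_by_visual_position(dom_order)]
```

Here visual_order places the two header links (left before right) ahead of the footer link, even though the footer link comes first in the DOM.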
Element Types
SOM defines 15 element types. Each type has a fixed set of allowed attributes. Implementations must map HTML elements to the most appropriate SOM type.
| Type | Description | Attributes | Actions |
|---|---|---|---|
| link | Hyperlinks and anchor elements | href, visited | click |
| button | Clickable controls and submit buttons | type, form_action | click |
| text_input | Single-line text entry fields | value, placeholder, input_type | type, clear |
| textarea | Multi-line text entry fields | value, placeholder, rows | type, clear |
| select | Dropdown menus and listboxes | value, options, multiple | select |
| checkbox | Toggle checkboxes | checked, value | toggle |
| radio | Radio button options | checked, value, name | click |
| heading | Section headings (h1-h6) | level | - |
| image | Visual content | src, alt, width, height | - |
| list | Ordered and unordered lists | items, ordered | - |
| table | Tabular data | headers, rows | - |
| paragraph | Block-level text content | - | - |
| section | Content grouping containers | - | - |
| separator | Visual dividers (hr elements) | - | - |
| details | Collapsible disclosure widgets | open, summary | toggle |
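The table above translates directly into a lookup that a producer or consumer could use to check elements. The following is a minimal sketch, not a normative validator. The names ELEMENT_TYPES and validate_element are hypothetical, and treating the Actions column as the complete set of permitted actions is an assumption.

```python
# role -> (allowed attribute keys, actions listed for the role), per the table.
ELEMENT_TYPES = {
    "link": ({"href", "visited"}, {"click"}),
    "button": ({"type", "form_action"}, {"click"}),
    "text_input": ({"value", "placeholder", "input_type"}, {"type", "clear"}),
    "textarea": ({"value", "placeholder", "rows"}, {"type", "clear"}),
    "select": ({"value", "options", "multiple"}, {"select"}),
    "checkbox": ({"checked", "value"}, {"toggle"}),
    "radio": ({"checked", "value", "name"}, {"click"}),
    "heading": ({"level"}, set()),
    "image": ({"src", "alt", "width", "height"}, set()),
    "list": ({"items", "ordered"}, set()),
    "table": ({"headers", "rows"}, set()),
    "paragraph": (set(), set()),
    "section": (set(), set()),
    "separator": (set(), set()),
    "details": ({"open", "summary"}, {"toggle"}),
}


def validate_element(element: dict) -> list[str]:
    """Return a list of problems; an empty list means no problem was found."""
    role = element.get("role")
    if role not in ELEMENT_TYPES:
        return [f"unknown element role: {role!r}"]
    allowed_attrs, allowed_actions = ELEMENT_TYPES[role]
    problems = []
    for key in sorted(set(element.get("attrs", {})) - allowed_attrs):
        problems.append(f"attribute {key!r} is not allowed for role {role!r}")
    for action in sorted(set(element.get("actions", [])) - allowed_actions):
        problems.append(f"action {action!r} is not listed for role {role!r}")
    return problems
```

For example, a heading that carries an href attribute would be reported, while a link with href and a click action would pass.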
Stable IDs
SOM generates deterministic element identifiers using SHA-256 hashing. This ensures the same element on a page produces the same ID across page loads, enabling agents to build reliable multi-step workflows that reference specific elements.
Hash algorithm
```
// Hash input construction
input = origin + "|" + role + "|" + accessible_name + "|" + dom_path
id = "e_" + SHA256(input).hex()[0:12]
```
Components
| Component | Type | Required | Description |
|---|---|---|---|
| origin | string | yes | The page origin (scheme + host + port). Example: "https://example.com". |
| role | string | yes | The SOM element type (e.g., "link", "button", "heading"). |
| accessible_name | string | yes | The computed accessible name of the element, following the W3C Accessible Name and Description Computation algorithm. |
| dom_path | string | yes | The simplified CSS path from the document root to the element. Example: "html>body>div>main>p>a". |
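The hash construction can be sketched in Python with the standard hashlib module. The function name stable_element_id is hypothetical, and encoding the input string as UTF-8 before hashing is an assumption, since the algorithm above does not name a byte encoding.

```python
import hashlib


def stable_element_id(origin: str, role: str, accessible_name: str, dom_path: str) -> str:
    """Build "e_" + the first 12 hex characters of SHA-256 over the joined inputs."""
    hash_input = "|".join([origin, role, accessible_name, dom_path])
    # Assumption: the input is encoded as UTF-8 before hashing.
    digest = hashlib.sha256(hash_input.encode("utf-8")).hexdigest()
    return "e_" + digest[:12]


element_id = stable_element_id(
    "https://example.com", "link", "Read more", "html>body>div>main>p>a"
)
```

Calling the function twice with the same four components yields the same 14-character ID, and changing any single component (for example, the dom_path) changes the ID.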
Guarantees
- Deterministic: Same inputs always produce the same ID.
- Stable: IDs do not change across page refreshes when content is unchanged.
- Unique: 12 hex characters carry 48 bits of the hash, so collisions among the elements of a single page are statistically negligible.
- Compact: 14 characters total (e_ prefix + 12 hex) is efficient for token usage.
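To put a number on the uniqueness guarantee, the standard birthday approximation estimates the chance that at least two of n uniformly distributed 48-bit IDs coincide. The sketch below is illustrative only: the function name is hypothetical, and it assumes that truncated SHA-256 output behaves like a uniform random value.

```python
import math


def collision_probability(element_count: int, bits: int = 48) -> float:
    """Birthday approximation: 1 - exp(-n(n-1) / (2 * 2**bits))."""
    pairs = element_count * (element_count - 1) / 2
    # expm1 keeps precision when the probability is very small.
    return -math.expm1(-pairs / 2**bits)


page_estimate = collision_probability(1_000)
```

For a page with 1,000 elements the estimate is on the order of 10^-9. The probability grows roughly with the square of the element count, so the guarantee is best understood per document rather than across a large crawl.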
Meta Block
Every SOM document includes a meta block with compression and structure statistics. This allows consumers to assess document characteristics without parsing the full content.
Fields
| Field | Type | Required | Description |
|---|---|---|---|
| html_bytes | integer | yes | Size of the original HTML document in bytes, after removing any Content-Encoding. |
| som_bytes | integer | yes | Size of the serialized SOM JSON in bytes (minified, no whitespace). |
| element_count | integer | yes | Total number of elements across all regions. |
| interactive_count | integer | yes | Number of elements that have at least one entry in their actions array. |
| compression_ratio | number | yes | The ratio of html_bytes to som_bytes, rounded to one decimal place. |
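The meta fields can be derived mechanically from the document and the size of the source HTML. The sketch below shows one way to do so. The function name compute_meta is hypothetical. Measuring som_bytes on the document without its meta block is an assumption made here to avoid a circular definition, and the specification does not say which rounding mode applies, so Python's built-in round is used.

```python
import json


def compute_meta(document: dict, html_bytes: int) -> dict:
    """Compute the meta block for a SOM document that has regions filled in."""
    elements = [el for region in document["regions"] for el in region["elements"]]

    # Assumption: som_bytes excludes the meta block itself.
    body = {key: value for key, value in document.items() if key != "meta"}
    minified = json.dumps(body, separators=(",", ":"), ensure_ascii=False)
    som_bytes = len(minified.encode("utf-8"))

    return {
        "html_bytes": html_bytes,
        "som_bytes": som_bytes,
        "element_count": len(elements),
        # An element is interactive if its actions array has at least one entry.
        "interactive_count": sum(1 for el in elements if el.get("actions")),
        "compression_ratio": round(html_bytes / som_bytes, 1),
    }
```

The html_bytes argument is expected to be the decoded size, that is, the byte length after any gzip or Brotli Content-Encoding has been removed.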
Structured Data
SOM extracts and normalizes structured data embedded in web pages. This data is included in the structured_data top-level field, making it directly accessible without HTML parsing.
Supported formats
| Field | Type | Required | Description |
|---|---|---|---|
| json_ld | array | no | All JSON-LD blocks found in the page, parsed into objects and deduplicated. |
| open_graph | object | no | Open Graph meta tags, with the "og:" prefix stripped from keys. |
| twitter_card | object | no | Twitter Card meta tags, with the "twitter:" prefix stripped from keys. |
| links | object | no | Link relations extracted from link elements: canonical, alternate, prev, next, icon, manifest. |
| meta | object | no | Other meta tags: description, robots, viewport, theme-color, author. |
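The prefix stripping and deduplication described above might look like the following. This is a sketch under stated assumptions: the input is taken to be a list of (name, content) pairs already pulled from the page's meta elements, both function names are hypothetical, and comparing JSON-LD blocks by their key-sorted serialization is one possible definition of "duplicate" that the specification does not prescribe.

```python
import json


def extract_prefixed_meta(meta_tags, prefix: str) -> dict:
    """Keep tags whose name starts with prefix and strip the prefix from keys.

    If a key occurs more than once, the last value wins (an assumption).
    """
    return {
        name[len(prefix):]: content
        for name, content in meta_tags
        if name.startswith(prefix)
    }


def dedupe_json_ld(blocks: list) -> list:
    """Drop repeated JSON-LD blocks while keeping first-seen order."""
    seen = set()
    unique = []
    for block in blocks:
        key = json.dumps(block, sort_keys=True)  # order-insensitive comparison
        if key not in seen:
            seen.add(key)
            unique.append(block)
    return unique


tags = [("og:title", "Example"), ("twitter:card", "summary"), ("description", "Demo")]
open_graph = extract_prefixed_meta(tags, "og:")
twitter_card = extract_prefixed_meta(tags, "twitter:")
```

With the sample tags, open_graph holds a single "title" key and twitter_card a single "card" key, while the unprefixed description tag is left for the meta field.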
Conformance
An implementation conforms to this specification if it produces JSON documents that satisfy all of the following:
- The document is valid JSON.
- All required top-level fields are present with correct types.
- All regions have valid roles from the standard set.
- All elements have valid roles from the 15 defined types.
- Element IDs are generated using the specified SHA-256 algorithm.
- Element attributes conform to the allowed set for their role.
- Elements are ordered by visual position, not DOM order.
- The meta block accurately reflects the document statistics.
Implementations may include additional fields not defined in this specification. Consumers must ignore unrecognized fields rather than treating them as errors.
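A consumer-side check covering a few of the criteria above could be sketched as follows. It is deliberately partial: it checks only the required top-level fields and the region roles, and it does not verify element IDs, attributes, ordering, or meta accuracy. The names REQUIRED_TOP_LEVEL, REGION_ROLES, and check_document are hypothetical.

```python
REQUIRED_TOP_LEVEL = {
    "som_version": str, "url": str, "title": str, "regions": list, "meta": dict,
}
REGION_ROLES = {
    "main", "navigation", "aside", "header", "footer",
    "search", "form", "dialog", "section", "generic",
}


def check_document(document: dict) -> list[str]:
    """Return a list of conformance problems for the checks implemented here."""
    problems = []
    for field, expected_type in REQUIRED_TOP_LEVEL.items():
        if field not in document:
            problems.append(f"missing required field: {field}")
        elif not isinstance(document[field], expected_type):
            problems.append(f"wrong type for field: {field}")

    regions = document.get("regions")
    if isinstance(regions, list):
        if not regions:
            problems.append("regions must contain at least one region")
        for index, region in enumerate(regions):
            if not isinstance(region, dict) or region.get("role") not in REGION_ROLES:
                problems.append(f"region {index} has no valid standard role")

    # Unrecognized fields are intentionally not reported: consumers must ignore them.
    return problems
```

A document that carries an extra vendor-specific field still passes these checks, which matches the rule that consumers ignore unrecognized fields.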
See also
- API Reference — Complete reference for all element types, region roles, attributes, and actions.
- Changelog — Version history of the SOM specification.