Specification

Semantic Object Model v1.0

SOM is an open specification for representing web pages as structured JSON documents optimized for consumption by large language models and AI agents.

Version 1.0 · April 2026 · Apache 2.0

01

Introduction

The Semantic Object Model (SOM) is a JSON-based format for representing the meaningful content and interactive elements of web pages. It is designed as a replacement for raw HTML, Markdown extraction, and accessibility tree dumps when the consumer is an AI agent or large language model.

HTML was designed for browsers. It carries layout directives, styling hooks, script blocks, and deeply nested structures that are expensive to parse and wasteful to include in LLM context windows. Markdown loses interactive elements entirely. Accessibility trees vary across browsers and are not designed for serialization.

SOM addresses these limitations by providing a single, flat, typed representation that preserves both content and interactivity while minimizing token usage. On average, SOM documents use 17x fewer tokens than the equivalent HTML.

02

Design Goals

SOM is designed around five core principles:

  • Token efficiency. Minimize the number of tokens an LLM must process to understand a web page. SOM achieves an average 17x reduction compared to raw HTML.
  • Type safety. Every element has a well-defined role with role-specific attributes. Agents can reason about element types without parsing heuristics.
  • Interactivity preservation. Interactive elements (links, buttons, inputs, selects) carry explicit action annotations. Agents know what they can do with each element.
  • Stable references. SHA-256 derived element IDs are deterministic. The same element produces the same ID across page refreshes, enabling reliable multi-step workflows.
  • Publisher compatibility. SOM documents can be served directly by publishers as an alternative representation of their pages, similar to RSS or JSON feeds.

03

Document Structure

A SOM document is a single JSON object. The structure is intentionally flat: documents contain regions, regions contain elements. There is no deeper nesting.

Top-level fields

som_version (string, required)

The specification version. Currently "1.0". Implementations must reject documents with unrecognized major versions.

url (string, required)

The canonical URL of the source page. Its origin is part of the stable ID hash input (see section 7).

title (string, required)

The document title, extracted from the HTML title element or first h1.

lang (string)

BCP 47 language code (e.g., "en", "ja"). Extracted from the html element lang attribute.

regions (array, required)

Ordered list of semantic page regions. Must contain at least one region.

meta (object, required)

Compression and structure metadata. See section 8.

structured_data (object)

Extracted semantic data from the page. See section 9.
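
As an illustration of the fields above, here is a minimal sketch of a SOM document and a top-level check. The helper name check_top_level, the placeholder element ID, and the numbers in the meta block are assumptions made for the example; they are not defined or measured by this specification.

```python
import json

# Required top-level fields and their JSON types, as listed above.
REQUIRED_TOP_LEVEL = {
    "som_version": str,
    "url": str,
    "title": str,
    "regions": list,
    "meta": dict,
}
SUPPORTED_MAJOR = "1"  # the major version this sketch understands


def check_top_level(doc: dict) -> list[str]:
    """Return a list of problems with the top-level fields (empty if none)."""
    problems = []
    for name, expected in REQUIRED_TOP_LEVEL.items():
        if name not in doc:
            problems.append(f"missing required field: {name}")
        elif not isinstance(doc[name], expected):
            problems.append(f"wrong type for field: {name}")
    version = doc.get("som_version")
    if isinstance(version, str) and version.split(".")[0] != SUPPORTED_MAJOR:
        problems.append(f"unrecognized major version: {version}")
    if isinstance(doc.get("regions"), list) and not doc["regions"]:
        problems.append("regions must contain at least one region")
    return problems


# Illustrative document: the element ID is a placeholder, not a real hash,
# and the meta numbers are invented for the example.
example = json.loads("""
{
  "som_version": "1.0",
  "url": "https://example.com/",
  "title": "Example Domain",
  "lang": "en",
  "regions": [
    {
      "id": "r_main",
      "role": "main",
      "elements": [
        {"id": "e_0123456789ab", "role": "heading",
         "text": "Example Domain", "attrs": {"level": 1}}
      ]
    }
  ],
  "meta": {"html_bytes": 1256, "som_bytes": 310, "element_count": 1,
           "interactive_count": 0, "compression_ratio": 4.1}
}
""")

print(check_top_level(example))
```

The check covers only the top level; regions, elements, and the meta block need their own validation.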

04

Regions

Regions represent semantic zones of a web page. They provide a lightweight grouping layer between the document and its elements. Each region has a role that describes its purpose.

Region detection precedence

Implementations must detect regions using the following precedence order:

  1. ARIA roles - Elements with explicit role attributes (role="navigation", role="main")
  2. HTML5 landmarks - Semantic elements (nav, main, aside, header, footer)
  3. Class/ID heuristics - Common naming patterns ("sidebar", "nav", "footer")
  4. Link density analysis - Areas with high link density are classified as navigation
  5. Content heuristics - Text density, heading presence, content patterns
  6. Fallback - Remaining content is grouped under role "generic"
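
The precedence order above can be sketched as a chain of checks that returns at the first match. Everything concrete in this sketch is an assumption: the node dictionary shape, the lookup tables, the 0.6 link-density threshold, and the content rule in step 5 are hypothetical, since the specification names the signals but not their exact values.

```python
import re

# Assumed lookup tables; only the signal categories come from the specification.
ARIA_TO_REGION = {
    "navigation": "navigation", "main": "main", "complementary": "aside",
    "banner": "header", "contentinfo": "footer", "search": "search",
    "form": "form", "dialog": "dialog", "region": "section",
}
TAG_TO_REGION = {
    "nav": "navigation", "main": "main", "aside": "aside",
    "header": "header", "footer": "footer",
}
NAME_HINTS = {"sidebar": "aside", "nav": "navigation", "footer": "footer"}
LINK_DENSITY_THRESHOLD = 0.6  # hypothetical cut-off


def detect_region_role(node: dict) -> str:
    """Classify a candidate container; earlier signals win over later ones."""
    # 1. Explicit ARIA role
    role = ARIA_TO_REGION.get(node.get("aria_role", ""))
    if role:
        return role
    # 2. HTML5 landmark element
    role = TAG_TO_REGION.get(node.get("tag", ""))
    if role:
        return role
    # 3. Class/ID naming patterns, matched on whole tokens
    names = f'{node.get("class", "")} {node.get("id", "")}'.lower()
    tokens = set(re.split(r"[^a-z0-9]+", names))
    for hint, hinted_role in NAME_HINTS.items():
        if hint in tokens:
            return hinted_role
    # 4. Link density: share of text characters that sit inside links
    text_chars = node.get("text_chars", 0)
    if text_chars and node.get("link_chars", 0) / text_chars >= LINK_DENSITY_THRESHOLD:
        return "navigation"
    # 5. Content heuristics (hypothetical rule: a heading plus substantial text)
    if node.get("has_heading") and text_chars >= 200:
        return "section"
    # 6. Fallback
    return "generic"
```

Returning early at each step is what encodes the precedence: a nav element that carries role="main" is classified as main, because the ARIA check runs first.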

Standard roles

The following region roles are defined: main, navigation, aside, header, footer, search, form, dialog, section, generic.

Region fields

id (string, required)

Region identifier, prefixed with "r_" followed by a descriptive slug.

role (string, required)

One of the standard region roles listed above.

label (string)

Accessible name of the region if available (from aria-label or aria-labelledby).

elements (array, required)

Ordered list of elements within this region.
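
A region check that follows the field list above might look like the sketch below. The function name check_region and the wording of the messages are assumptions, and the slug is only tested for being non-empty because the specification does not define its format.

```python
REGION_ROLES = {
    "main", "navigation", "aside", "header", "footer",
    "search", "form", "dialog", "section", "generic",
}


def check_region(region: dict) -> list[str]:
    """Return a list of problems with a single region (empty if none)."""
    problems = []
    region_id = region.get("id")
    if not (isinstance(region_id, str) and region_id.startswith("r_") and len(region_id) > 2):
        problems.append("id must be a string of the form 'r_' + slug")
    role = region.get("role")
    if not (isinstance(role, str) and role in REGION_ROLES):
        problems.append(f"role is not a standard region role: {role!r}")
    if "label" in region and not isinstance(region["label"], str):
        problems.append("label must be a string when present")
    if not isinstance(region.get("elements"), list):
        problems.append("elements must be an array")
    return problems


print(check_region({"id": "r_main", "role": "main", "elements": []}))
```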

05

Elements

Elements are the atomic units of a SOM document. Each element represents a single meaningful content node or interactive control on the page.

Element fields

id (string, required)

Stable identifier derived from a SHA-256 hash. Format: "e_" + 12 hex chars. See section 7.

role (string, required)

One of the 15 defined element types. See section 6.

text (string, required)

Visible text content or computed accessible name. Must not be empty.

attrs (object)

Role-specific attributes. The allowed keys depend on the element role.

actions (array)

Available interactions. Values: "click", "type", "select", "toggle", "clear".

hints (object)

CSS-inferred semantic signals. Keys include: visually_hidden, primary, destructive, disabled_visual, truncated.

aria (object)

Dynamic ARIA widget state. Keys include: expanded, checked, selected, disabled, pressed, invalid, required, readonly.
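
The element fields above translate into a small structural check. This is a sketch under assumptions: check_element is a hypothetical helper, and the ID pattern accepts lowercase hex only, which the specification does not state explicitly.

```python
import re

ELEMENT_ID = re.compile(r"e_[0-9a-f]{12}")  # lowercase hex is an assumption
ACTIONS = {"click", "type", "select", "toggle", "clear"}


def check_element(el: dict) -> list[str]:
    """Return a list of problems with an element's generic fields (empty if none)."""
    problems = []
    el_id = el.get("id")
    if not (isinstance(el_id, str) and ELEMENT_ID.fullmatch(el_id)):
        problems.append("id must be 'e_' followed by 12 hex characters")
    if not isinstance(el.get("role"), str):
        problems.append("role must be a string")
    text = el.get("text")
    if not (isinstance(text, str) and text):
        problems.append("text must be a non-empty string")
    actions = el.get("actions", [])
    if not (isinstance(actions, list)
            and all(isinstance(a, str) and a in ACTIONS for a in actions)):
        problems.append("actions must be an array of known action names")
    for key in ("attrs", "hints", "aria"):
        if key in el and not isinstance(el[key], dict):
            problems.append(f"{key} must be an object when present")
    return problems


button = {
    "id": "e_0123456789ab",  # placeholder ID, not a real hash
    "role": "button",
    "text": "Delete account",
    "attrs": {"type": "submit"},
    "actions": ["click"],
    "hints": {"destructive": True},
    "aria": {"disabled": False},
}
print(check_element(button))
```

Whether a role is one of the 15 element types, and which attrs it may carry, is a separate check that depends on the element type table in section 6.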

Element ordering

Elements within a region must be ordered by their visual position on the page (top-to-bottom, left-to-right), not by DOM source order. This ensures agents process content in the order a human would read it.
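
A minimal sketch of this ordering, assuming each element carries a bounding box from layout: the "box" key with "top" and "left" pixel offsets is a hypothetical intermediate value used during extraction, not a SOM field.

```python
def order_visually(elements: list[dict]) -> list[dict]:
    """Sort top-to-bottom, then left-to-right.

    sorted() is stable, so elements with identical coordinates keep
    their input (DOM) order.
    """
    return sorted(elements, key=lambda el: (el["box"]["top"], el["box"]["left"]))


# DOM order differs from reading order here: the sidebar link comes first
# in the source but is rendered to the right of the first paragraph.
dom_order = [
    {"text": "Sidebar link", "box": {"top": 120, "left": 900}},
    {"text": "Page heading", "box": {"top": 40, "left": 0}},
    {"text": "First paragraph", "box": {"top": 120, "left": 0}},
]

for el in order_visually(dom_order):
    print(el["text"])
```

A production implementation would likely group elements whose top edges differ by only a few pixels into the same row before sorting by left edge; this sketch compares raw coordinates.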

06

Element Types

SOM defines 15 element types. Each type has a fixed set of allowed attributes. Implementations must map HTML elements to the most appropriate SOM type.

Type        Description                            Attributes                      Actions
link        Hyperlinks and anchor elements         href, visited                   click
button      Clickable controls and submit buttons  type, form_action               click
text_input  Single-line text entry fields          value, placeholder, input_type  type, clear
textarea    Multi-line text entry fields           value, placeholder, rows        type, clear
select      Dropdown menus and listboxes           value, options, multiple        select
checkbox    Toggle checkboxes                      checked, value                  toggle
radio       Radio button options                   checked, value, name            click
heading     Section headings (h1-h6)               level                           -
image       Visual content                         src, alt, width, height         -
list        Ordered and unordered lists            items, ordered                  -
table       Tabular data                           headers, rows                   -
paragraph   Block-level text content               -                               -
section     Content grouping containers            -                               -
separator   Visual dividers (hr elements)          -                               -
details     Collapsible disclosure widgets         open, summary                   toggle
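
The table above can be transcribed into a lookup that drives role-specific validation. In this sketch the names ELEMENT_TYPES and check_role_constraints are hypothetical, and treating the Actions column as the full set of permitted actions for a type is an assumption; the specification states that only for attributes.

```python
# (allowed attribute keys, allowed actions) per element type.
ELEMENT_TYPES = {
    "link": ({"href", "visited"}, {"click"}),
    "button": ({"type", "form_action"}, {"click"}),
    "text_input": ({"value", "placeholder", "input_type"}, {"type", "clear"}),
    "textarea": ({"value", "placeholder", "rows"}, {"type", "clear"}),
    "select": ({"value", "options", "multiple"}, {"select"}),
    "checkbox": ({"checked", "value"}, {"toggle"}),
    "radio": ({"checked", "value", "name"}, {"click"}),
    "heading": ({"level"}, set()),
    "image": ({"src", "alt", "width", "height"}, set()),
    "list": ({"items", "ordered"}, set()),
    "table": ({"headers", "rows"}, set()),
    "paragraph": (set(), set()),
    "section": (set(), set()),
    "separator": (set(), set()),
    "details": ({"open", "summary"}, {"toggle"}),
}


def check_role_constraints(el: dict) -> list[str]:
    """Check an element's attrs and actions against its type's allowed sets."""
    role = el.get("role")
    if not isinstance(role, str) or role not in ELEMENT_TYPES:
        return [f"unknown element type: {role!r}"]
    allowed_attrs, allowed_actions = ELEMENT_TYPES[role]
    problems = []
    extra_attrs = set(el.get("attrs", {})) - allowed_attrs
    if extra_attrs:
        problems.append(f"{role}: attributes not allowed: {sorted(extra_attrs)}")
    extra_actions = set(el.get("actions", [])) - allowed_actions
    if extra_actions:
        problems.append(f"{role}: actions not allowed: {sorted(extra_actions)}")
    return problems


print(check_role_constraints(
    {"role": "link", "text": "Home", "attrs": {"href": "/"}, "actions": ["click"]}
))
```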

07

Stable IDs

SOM generates deterministic element identifiers using SHA-256 hashing. This ensures the same element on a page produces the same ID across page loads, enabling agents to build reliable multi-step workflows that reference specific elements.

Hash algorithm

// Hash input construction

input = origin + "|" + role + "|" + accessible_name + "|" + dom_path

id = "e_" + SHA256(input).hex()[0:12]

Components

origin (string, required)

The page origin (scheme + host + port). Example: "https://example.com".

role (string, required)

The SOM element type (e.g., "link", "button", "heading").

accessible_name (string, required)

The computed accessible name of the element, following the W3C Accessible Name computation algorithm.

dom_path (string, required)

The simplified CSS path from the document root to the element. Example: "html>body>div>main>p>a".
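
The hash construction above can be sketched in Python as follows. Two details are assumptions rather than part of the specification: the input string is encoded as UTF-8 before hashing, and the helper origin_of keeps whatever port appears in the URL, so an explicit default port such as ":443" is not normalized away.

```python
import hashlib
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """Derive scheme + host (+ explicit port) from a page URL."""
    parts = urlsplit(url)
    origin = f"{parts.scheme}://{parts.hostname}"
    if parts.port is not None:
        origin += f":{parts.port}"
    return origin


def stable_id(origin: str, role: str, accessible_name: str, dom_path: str) -> str:
    """Join the four components with '|', hash, and keep 12 hex characters."""
    hash_input = "|".join([origin, role, accessible_name, dom_path])
    digest = hashlib.sha256(hash_input.encode("utf-8")).hexdigest()  # UTF-8 assumed
    return "e_" + digest[:12]


element_id = stable_id(
    origin_of("https://example.com/docs/start?ref=home"),
    "link",
    "Home",
    "html>body>div>main>p>a",
)
print(element_id)
```

Because the origin rather than the full URL enters the hash, the same link at the same DOM path receives the same ID on every page of a site.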

Guarantees

  • Deterministic: Same inputs always produce the same ID.
  • Stable: IDs do not change across page refreshes as long as the element's role, accessible name, and DOM path are unchanged.
  • Unique: Hash collisions are statistically negligible at 12 hex characters (48 bits).
  • Compact: 14 characters total (e_ prefix + 12 hex) is efficient for token usage.
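
To put a number on "statistically negligible", the sketch below applies the standard birthday approximation to a 48-bit identifier space. It assumes SHA-256 output behaves like a uniform random value; the function name and the sample sizes are chosen for illustration only.

```python
import math


def collision_probability(n: int, bits: int = 48) -> float:
    """Approximate chance that at least two of n random IDs collide.

    Uses the birthday approximation 1 - exp(-n(n-1) / 2^(bits+1)).
    """
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))


for n in (1_000, 10_000, 1_000_000):
    print(f"{n:>9} elements: {collision_probability(n):.2e}")
```

For 10,000 elements sharing one origin the estimate is roughly 1.8 in 10 million. The probability grows with the square of the element count, so it is worth rechecking for very large crawls that pool IDs from one origin.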

08

Meta Block

Every SOM document includes a meta block with compression and structure statistics. This allows consumers to assess document characteristics without parsing the full content.

Fields

html_bytes (integer, required)

Size of the original HTML document in bytes, after removing any Content-Encoding.

som_bytes (integer, required)

Size of the serialized SOM JSON in bytes (minified, no whitespace).

element_count (integer, required)

Total number of elements across all regions.

interactive_count (integer, required)

Number of elements that have at least one entry in their actions array.

compression_ratio (number, required)

The ratio of html_bytes to som_bytes, rounded to one decimal place.
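
The meta fields can be computed as in the sketch below. One point is an assumption: the specification does not say whether som_bytes counts the meta block itself, so this sketch measures the document before meta is attached. The function name build_meta and the sample data are hypothetical.

```python
import json


def build_meta(html: bytes, doc_without_meta: dict) -> dict:
    """Compute the meta block from decoded HTML bytes and a SOM document."""
    elements = [
        el
        for region in doc_without_meta["regions"]
        for el in region["elements"]
    ]
    # Minified JSON: no whitespace after separators, measured as UTF-8 bytes.
    minified = json.dumps(doc_without_meta, separators=(",", ":"), ensure_ascii=False)
    som_bytes = len(minified.encode("utf-8"))
    return {
        "html_bytes": len(html),
        "som_bytes": som_bytes,
        "element_count": len(elements),
        # An element is interactive if its actions array has at least one entry.
        "interactive_count": sum(1 for el in elements if el.get("actions")),
        "compression_ratio": round(len(html) / som_bytes, 1),
    }


doc = {
    "som_version": "1.0",
    "url": "https://example.com/",
    "title": "Example",
    "regions": [{
        "id": "r_main",
        "role": "main",
        "elements": [
            {"id": "e_0123456789ab", "role": "heading", "text": "Example"},
            {"id": "e_ba9876543210", "role": "link", "text": "More",
             "attrs": {"href": "/more"}, "actions": ["click"]},
        ],
    }],
}
doc["meta"] = build_meta(b"<html>" + b"x" * 4000 + b"</html>", doc)
print(doc["meta"])
```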

09

Structured Data

SOM extracts and normalizes structured data embedded in web pages. This data is included in the structured_data top-level field, making it directly accessible without HTML parsing.

Supported formats

json_ld (array)

All JSON-LD blocks found in the page, parsed into objects and deduplicated.

open_graph (object)

OpenGraph meta tags, with the "og:" prefix stripped from keys.

twitter_card (object)

Twitter Card meta tags, with the "twitter:" prefix stripped from keys.

links (object)

Link relations extracted from link elements: canonical, alternate, prev, next, icon, manifest.

meta (object)

Other meta tags: description, robots, viewport, theme-color, author.
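
The prefix stripping for open_graph and twitter_card can be sketched with Python's standard html.parser module. The class name MetaTagCollector is hypothetical, and the sketch makes two assumptions: it accepts the key in either the property or the name attribute, and a later tag with the same key overwrites an earlier one.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect OpenGraph and Twitter Card meta tags with prefixes removed."""

    def __init__(self) -> None:
        super().__init__()
        self.open_graph: dict[str, str] = {}
        self.twitter_card: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("property") or attributes.get("name") or ""
        content = attributes.get("content")
        if content is None:
            return
        if key.startswith("og:"):
            self.open_graph[key[len("og:"):]] = content
        elif key.startswith("twitter:"):
            self.twitter_card[key[len("twitter:"):]] = content


page = """
<head>
  <meta property="og:title" content="Example Domain">
  <meta property="og:type" content="website">
  <meta name="twitter:card" content="summary" />
  <meta name="description" content="Not an OpenGraph tag">
</head>
"""
collector = MetaTagCollector()
collector.feed(page)
print({"open_graph": collector.open_graph, "twitter_card": collector.twitter_card})
```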

10

Conformance

An implementation conforms to this specification if it produces JSON documents that satisfy all of the following:

  1. The document is valid JSON.
  2. All required top-level fields are present with correct types.
  3. All regions have valid roles from the standard set.
  4. All elements have valid roles from the 15 defined types.
  5. Element IDs are generated using the specified SHA-256 algorithm.
  6. Element attributes conform to the allowed set for their role.
  7. Elements are ordered by visual position, not DOM order.
  8. The meta block accurately reflects the document statistics.

Implementations may include additional fields not defined in this specification. Consumers must ignore unrecognized fields rather than treating them as errors.
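
On the consumer side, the rule about unrecognized fields can be honored by copying only known keys. The sketch below is one possible approach, not a prescribed one; the Element dataclass and parse_element are hypothetical names, and "x_vendor_score" stands in for an arbitrary extension field.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Element:
    id: str
    role: str
    text: str
    attrs: dict = field(default_factory=dict)
    actions: list = field(default_factory=list)
    hints: dict = field(default_factory=dict)
    aria: dict = field(default_factory=dict)


def parse_element(raw: dict) -> Element:
    """Build an Element from raw JSON, silently dropping unknown keys."""
    known = {f.name for f in fields(Element)}
    return Element(**{key: value for key, value in raw.items() if key in known})


element = parse_element({
    "id": "e_0123456789ab",  # placeholder ID, not a real hash
    "role": "link",
    "text": "Home",
    "attrs": {"href": "/"},
    "actions": ["click"],
    "x_vendor_score": 0.93,  # extension field, ignored by this consumer
})
print(element)
```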


See also

  • API Reference — Complete reference for all element types, region roles, attributes, and actions.
  • Changelog — Version history of the SOM specification.