Specification

Semantic Object Model v1.0

SOM is an open specification for representing web pages as structured JSON documents optimized for consumption by large language models and AI agents.

Version 1.0 | April 2026 | Apache 2.0

Introduction

The Semantic Object Model (SOM) is a JSON-based format for representing the meaningful content and interactive elements of web pages. It is designed as a replacement for raw HTML, Markdown extraction, and accessibility tree dumps when the consumer is an AI agent or large language model.

HTML was designed for browsers. It carries layout directives, styling hooks, script blocks, and deeply nested structures that are expensive to parse and wasteful to include in LLM context windows. Markdown loses interactive elements entirely. Accessibility trees vary across browsers and are not designed for serialization.

SOM addresses these limitations by providing a single, flat, typed representation that preserves both content and interactivity while minimizing token usage. On average, SOM documents use 17x fewer tokens than the equivalent HTML.

Design Goals

SOM is designed around five core principles:

Token efficiency. Minimize the number of tokens an LLM must process to understand a web page. SOM achieves an average 17x reduction compared to raw HTML.
Type safety. Every element has a well-defined role with role-specific attributes. Agents can reason about element types without parsing heuristics.
Interactivity preservation. Interactive elements (links, buttons, inputs, selects) carry explicit action annotations. Agents know what they can do with each element.
Stable references. Element IDs are derived from a SHA-256 hash and are deterministic: the same element produces the same ID across page refreshes, enabling reliable multi-step workflows.
Publisher compatibility. SOM documents can be served directly by publishers as an alternative representation of their pages, similar to RSS or JSON feeds.

Document Structure

A SOM document is a single JSON object. The structure is intentionally flat: documents contain regions, regions contain elements. There is no deeper nesting.

Top-level fields

som_version (string, required)

The specification version. Currently "1.0". Implementations must reject documents with unrecognized major versions.

url (string, required)

The canonical URL of the source page. Its origin is used as part of the stable ID hash input.

title (string, required)

The document title, extracted from the HTML title element or first h1.

lang (string)

BCP 47 language tag (e.g., "en", "ja"). Extracted from the lang attribute of the html element.

regions (array, required)

Ordered list of semantic page regions. Must contain at least one region.

meta (object, required)

Compression and structure metadata. See the Meta Block section.

structured_data (object)

Extracted semantic data from the page. See the Structured Data section.
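As an illustration of the top-level fields, the following Python sketch builds a small SOM document as a dictionary and checks the required fields and their types. The helper name check_top_level and every value in example_doc (including the meta numbers and the element ID) are made up for the example and are not taken from the specification.

```python
# Required and optional top-level fields with the Python types they map to.
REQUIRED_TOP_LEVEL = {
    "som_version": str,
    "url": str,
    "title": str,
    "regions": list,
    "meta": dict,
}
OPTIONAL_TOP_LEVEL = {"lang": str, "structured_data": dict}


def check_top_level(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    for name, expected in REQUIRED_TOP_LEVEL.items():
        if name not in doc:
            problems.append(f"missing required field: {name}")
        elif not isinstance(doc[name], expected):
            problems.append(f"{name} should be of type {expected.__name__}")
    for name, expected in OPTIONAL_TOP_LEVEL.items():
        if name in doc and not isinstance(doc[name], expected):
            problems.append(f"{name} should be of type {expected.__name__}")
    # "regions" must contain at least one region.
    if isinstance(doc.get("regions"), list) and not doc["regions"]:
        problems.append("regions must contain at least one region")
    return problems


# Illustrative document; all values are placeholders.
example_doc = {
    "som_version": "1.0",
    "url": "https://example.com/",
    "title": "Example Domain",
    "lang": "en",
    "regions": [
        {
            "id": "r_main",
            "role": "main",
            "elements": [
                {
                    "id": "e_0123456789ab",
                    "role": "heading",
                    "text": "Example Domain",
                    "attrs": {"level": 1},
                }
            ],
        }
    ],
    "meta": {
        "html_bytes": 1200,
        "som_bytes": 300,
        "element_count": 1,
        "interactive_count": 0,
        "compression_ratio": 4.0,
    },
}

print(check_top_level(example_doc))
```

The check covers only the top level; region and element validation would be layered on top in the same style.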

Regions

Regions represent semantic zones of a web page. They provide a lightweight grouping layer between the document and its elements. Each region has a role that describes its purpose.

Region detection precedence

Implementations must detect regions using the following precedence order:

1. ARIA roles - Elements with explicit role attributes (role="navigation", role="main")
2. HTML5 landmarks - Semantic elements (nav, main, aside, header, footer)
3. Class/ID heuristics - Common naming patterns ("sidebar", "nav", "footer")
4. Link density analysis - Areas with high link density are classified as navigation
5. Content heuristics - Text density, heading presence, content patterns
6. Fallback - Remaining content is grouped under role "generic"
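The precedence order above can be sketched as a chain of checks that returns at the first match. This is a minimal illustration, not a reference implementation: the node dictionary shape (tag, aria_role, classes, id, link_density, has_heading, text_length), the ARIA-to-region mapping, the name hints, and both thresholds are assumptions chosen for the example.

```python
# Assumed mapping from ARIA landmark roles to SOM region roles.
ARIA_TO_REGION = {
    "navigation": "navigation",
    "main": "main",
    "complementary": "aside",
    "banner": "header",
    "contentinfo": "footer",
    "search": "search",
    "form": "form",
    "dialog": "dialog",
}
# HTML5 landmark elements and the region role each one maps to.
LANDMARK_TAGS = {
    "nav": "navigation",
    "main": "main",
    "aside": "aside",
    "header": "header",
    "footer": "footer",
}
# Assumed class/ID naming hints, checked in insertion order.
NAME_HINTS = {"sidebar": "aside", "nav": "navigation", "footer": "footer"}


def classify_region(node: dict, link_density_threshold: float = 0.5) -> str:
    """Pick a region role for a simplified node, honoring the precedence order."""
    # 1. Explicit ARIA role wins over everything else.
    aria_role = node.get("aria_role")
    if aria_role in ARIA_TO_REGION:
        return ARIA_TO_REGION[aria_role]
    # 2. HTML5 landmark element.
    tag = node.get("tag", "")
    if tag in LANDMARK_TAGS:
        return LANDMARK_TAGS[tag]
    # 3. Class/ID naming heuristics.
    names = " ".join([*node.get("classes", []), node.get("id") or ""]).lower()
    for hint, role in NAME_HINTS.items():
        if hint in names:
            return role
    # 4. High link density suggests navigation (threshold is an assumption).
    if node.get("link_density", 0.0) >= link_density_threshold:
        return "navigation"
    # 5. Placeholder content heuristic: a heading plus substantial text.
    if node.get("has_heading") and node.get("text_length", 0) >= 200:
        return "main"
    # 6. Fallback.
    return "generic"


print(classify_region({"tag": "footer", "aria_role": "main"}))
```

In the final call the explicit ARIA role takes precedence over the footer element, so the node is classified as main.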

Standard roles

The following region roles are defined: main, navigation, aside, header, footer, search, form, dialog, section, generic.

Region fields

id (string, required)

Region identifier, prefixed with "r_" followed by a descriptive slug.

role (string, required)

One of the standard region roles listed above.

label (string)

Accessible name of the region if available (from aria-label or aria-labelledby).

elements (array, required)

Ordered list of elements within this region.

Elements

Elements are the atomic units of a SOM document. Each element represents a single meaningful content node or interactive control on the page.

Element fields

id (string, required)

Stable identifier derived from a SHA-256 hash. Format: "e_" followed by 12 hex characters. See the Stable IDs section.

role (string, required)

One of the 15 defined element types. See the Element Types section.

text (string, required)

Visible text content or computed accessible name. Must not be empty.

attrs (object)

Role-specific attributes. The allowed keys depend on the element role.

actions (array)

Available interactions. Values: "click", "type", "select", "toggle", "clear".

hints (object)

CSS-inferred semantic signals. Keys include: visually_hidden, primary, destructive, disabled_visual, truncated.

aria (object)

Dynamic ARIA widget state. Keys include: expanded, checked, selected, disabled, pressed, invalid, required, readonly.

Element ordering

Elements within a region must be ordered by their visual position on the page (top-to-bottom, left-to-right), not by DOM source order. This ensures agents process content in the order a human would read it.
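One simple way to approximate this ordering is to sort elements by the top and then the left coordinate of their rendered bounding boxes. The sketch below assumes each element carries a hypothetical box entry with top and left pixel offsets supplied by the renderer; box is not a SOM field and would be dropped before serialization.

```python
def visual_order(elements: list[dict]) -> list[dict]:
    """Sort elements top-to-bottom, then left-to-right, by bounding box.

    Python's sort is stable, so elements with identical coordinates
    keep their original (DOM) order relative to each other.
    """
    return sorted(elements, key=lambda e: (e["box"]["top"], e["box"]["left"]))


# DOM order differs from reading order here: the aside comes first in the
# source but is rendered to the right of the headline.
dom_order = [
    {"role": "paragraph", "text": "Related links", "box": {"top": 0, "left": 600}},
    {"role": "heading", "text": "Headline", "box": {"top": 0, "left": 0}},
    {"role": "paragraph", "text": "Body text", "box": {"top": 50, "left": 0}},
]

for element in visual_order(dom_order):
    print(element["text"])
```

A production implementation would likely need a tolerance for elements on the same visual row whose top offsets differ by a few pixels; that refinement is left out to keep the sketch short.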

Element Types

SOM defines 15 element types. Each type has a fixed set of allowed attributes. Implementations must map HTML elements to the most appropriate SOM type.

Type	Description	Attributes	Actions
link	Hyperlinks and anchor elements	href, visited	click
button	Clickable controls and submit buttons	type, form_action	click
text_input	Single-line text entry fields	value, placeholder, input_type	type, clear
textarea	Multi-line text entry fields	value, placeholder, rows	type, clear
select	Dropdown menus and listboxes	value, options, multiple	select
checkbox	Toggle checkboxes	checked, value	toggle
radio	Radio button options	checked, value, name	click
heading	Section headings (h1-h6)	level	-
image	Visual content	src, alt, width, height	-
list	Ordered and unordered lists	items, ordered	-
table	Tabular data	headers, rows	-
paragraph	Block-level text content	-	-
section	Content grouping containers	-	-
separator	Visual dividers (hr elements)	-	-
details	Collapsible disclosure widgets	open, summary	toggle
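The table above translates directly into a lookup structure that a validator can use. The sketch below encodes each type's attributes and actions and reports anything outside those sets. Treating the Actions column as the complete set of permitted actions is an assumption, and check_element is a hypothetical helper name.

```python
# role -> (allowed attribute keys, actions listed for the type)
ELEMENT_TYPES = {
    "link": ({"href", "visited"}, {"click"}),
    "button": ({"type", "form_action"}, {"click"}),
    "text_input": ({"value", "placeholder", "input_type"}, {"type", "clear"}),
    "textarea": ({"value", "placeholder", "rows"}, {"type", "clear"}),
    "select": ({"value", "options", "multiple"}, {"select"}),
    "checkbox": ({"checked", "value"}, {"toggle"}),
    "radio": ({"checked", "value", "name"}, {"click"}),
    "heading": ({"level"}, set()),
    "image": ({"src", "alt", "width", "height"}, set()),
    "list": ({"items", "ordered"}, set()),
    "table": ({"headers", "rows"}, set()),
    "paragraph": (set(), set()),
    "section": (set(), set()),
    "separator": (set(), set()),
    "details": ({"open", "summary"}, {"toggle"}),
}


def check_element(element: dict) -> list[str]:
    """Return problems with an element's role, attrs, and actions."""
    role = element.get("role")
    if role not in ELEMENT_TYPES:
        return [f"unknown role: {role!r}"]
    allowed_attrs, allowed_actions = ELEMENT_TYPES[role]
    problems = []
    # Sort the leftovers so the messages come out in a predictable order.
    for key in sorted(set(element.get("attrs", {})) - allowed_attrs):
        problems.append(f"attribute {key!r} is not allowed on {role}")
    for action in sorted(set(element.get("actions", [])) - allowed_actions):
        problems.append(f"action {action!r} is not listed for {role}")
    return problems


print(check_element({"role": "link", "attrs": {"href": "/about"}, "actions": ["click"]}))
print(check_element({"role": "heading", "attrs": {"href": "/about"}}))
```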

Stable IDs

SOM generates deterministic element identifiers using SHA-256 hashing. This ensures the same element on a page produces the same ID across page loads, enabling agents to build reliable multi-step workflows that reference specific elements.

Hash algorithm

// Hash input construction

input = origin + "|" + role + "|" + accessible_name + "|" + dom_path

id = "e_" + SHA256(input).hex()[0:12]

Components

origin (string, required)

The page origin (scheme + host + port). Example: "https://example.com".

role (string, required)

The SOM element type (e.g., "link", "button", "heading").

accessible_name (string, required)

The computed accessible name of the element, following the W3C Accessible Name and Description Computation algorithm.

dom_path (string, required)

The simplified CSS path from the document root to the element. Example: "html>body>div>main>p>a".
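The hash construction can be sketched in Python with the standard hashlib module. The specification does not state how the input string is converted to bytes, so the UTF-8 encoding used here is an assumption; stable_id is an illustrative function name.

```python
import hashlib


def stable_id(origin: str, role: str, accessible_name: str, dom_path: str) -> str:
    """Build an element ID from the four hash components.

    The components are joined with "|" and hashed with SHA-256; the ID is
    "e_" plus the first 12 hex characters of the digest.
    """
    hash_input = "|".join([origin, role, accessible_name, dom_path])
    # Assumption: the input is encoded as UTF-8 before hashing.
    digest = hashlib.sha256(hash_input.encode("utf-8")).hexdigest()
    return "e_" + digest[:12]


element_id = stable_id(
    origin="https://example.com",
    role="link",
    accessible_name="More information",
    dom_path="html>body>div>main>p>a",
)
print(element_id, len(element_id))
```

Because the ID depends on dom_path and accessible_name, a structural or label change to the page yields a new ID even when the element looks the same to a user.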

Guarantees

Deterministic: Same inputs always produce the same ID.
Stable: IDs do not change across page refreshes when content is unchanged.
Unique: Hash collisions are statistically negligible at 12 hex characters (48 bits).
Compact: 14 characters total (e_ prefix + 12 hex) is efficient for token usage.
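To put a number on the uniqueness guarantee, the standard birthday-bound approximation estimates the chance that at least two of n IDs collide. The sketch below assumes the 48-bit IDs behave as uniformly random values, which is the usual working assumption for a truncated SHA-256 digest.

```python
import math

ID_BITS = 48  # 12 hex characters * 4 bits each


def collision_probability(n: int, bits: int = ID_BITS) -> float:
    """Approximate P(at least one collision) among n uniformly random IDs.

    Uses the birthday approximation 1 - exp(-n(n-1) / (2 * 2**bits)).
    math.expm1 keeps precision when the probability is very small.
    """
    return -math.expm1(-n * (n - 1) / (2 * 2**bits))


for n in (100, 10_000, 1_000_000):
    print(n, collision_probability(n))
```

Under this approximation, a page with 10,000 elements has a collision probability on the order of 10^-7, while a set of a million IDs compared against each other would be closer to 10^-3.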

Meta Block

Every SOM document includes a meta block with compression and structure statistics. This allows consumers to assess document characteristics without parsing the full content.

Fields

html_bytes (integer, required)

Size of the original HTML document in bytes, after decoding any Content-Encoding (for example, gzip).

som_bytes (integer, required)

Size of the serialized SOM JSON in bytes (minified, no whitespace).

element_count (integer, required)

Total number of elements across all regions.

interactive_count (integer, required)

Number of elements that have at least one entry in their actions array.

compression_ratio (number, required)

The ratio of html_bytes to som_bytes, rounded to one decimal place.
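The meta fields can be derived mechanically from a document and the size of its source HTML. The sketch below is one possible reading: because som_bytes is itself stored inside the document, it measures the minified JSON without the meta block to avoid a circular definition. The specification does not say which convention applies, so that choice, the UTF-8 byte count, and the name build_meta are assumptions.

```python
import json


def build_meta(html_bytes: int, doc: dict) -> dict:
    """Compute the meta block for a SOM document dictionary."""
    # Assumption: measure the document without its own meta block.
    body = {key: value for key, value in doc.items() if key != "meta"}
    minified = json.dumps(body, separators=(",", ":"), ensure_ascii=False)
    som_bytes = len(minified.encode("utf-8"))

    elements = [e for region in doc["regions"] for e in region["elements"]]
    return {
        "html_bytes": html_bytes,
        "som_bytes": som_bytes,
        "element_count": len(elements),
        # An element is interactive if its actions array is non-empty.
        "interactive_count": sum(1 for e in elements if e.get("actions")),
        "compression_ratio": round(html_bytes / som_bytes, 1),
    }


# Placeholder document; IDs and sizes are made up for the example.
sample = {
    "som_version": "1.0",
    "url": "https://example.com/",
    "title": "Example",
    "regions": [
        {
            "id": "r_main",
            "role": "main",
            "elements": [
                {"id": "e_000000000001", "role": "heading", "text": "Example"},
                {
                    "id": "e_000000000002",
                    "role": "link",
                    "text": "More",
                    "attrs": {"href": "/more"},
                    "actions": ["click"],
                },
            ],
        }
    ],
}
meta = build_meta(html_bytes=5000, doc=sample)
print(meta)
```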

Structured Data

SOM extracts and normalizes structured data embedded in web pages. This data is included in the structured_data top-level field, making it directly accessible without HTML parsing.

Supported formats

json_ld (array)

All JSON-LD blocks found in the page, parsed into objects and deduplicated.

open_graph (object)

OpenGraph meta tags, with the "og:" prefix stripped from keys.

twitter_card (object)

Twitter Card meta tags, with the "twitter:" prefix stripped from keys.

links (object)

Link relations extracted from link elements: canonical, alternate, prev, next, icon, manifest.

meta (object)

Other meta tags: description, robots, viewport, theme-color, author.
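The prefix handling for meta tags can be sketched as a single pass over (name, content) pairs that have already been pulled out of the page. The input shape, the function name group_meta_tags, and the decision to treat the five listed names as the full set for the meta group are assumptions made for the example.

```python
# Names collected under structured_data.meta (assumed to be the full set).
OTHER_META = {"description", "robots", "viewport", "theme-color", "author"}


def group_meta_tags(tags: list[tuple[str, str]]) -> dict:
    """Group meta tag pairs into open_graph, twitter_card, and meta."""
    result = {"open_graph": {}, "twitter_card": {}, "meta": {}}
    for name, content in tags:
        if name.startswith("og:"):
            # Strip the "og:" prefix from the key.
            result["open_graph"][name[len("og:"):]] = content
        elif name.startswith("twitter:"):
            # Strip the "twitter:" prefix from the key.
            result["twitter_card"][name[len("twitter:"):]] = content
        elif name in OTHER_META:
            result["meta"][name] = content
    return result


grouped = group_meta_tags(
    [
        ("og:title", "Example Domain"),
        ("og:type", "website"),
        ("twitter:card", "summary"),
        ("description", "An example page."),
        ("generator", "ignored by this sketch"),
    ]
)
print(grouped)
```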

Conformance

An implementation conforms to this specification if it produces JSON documents that satisfy all of the following:

The document is valid JSON.
All required top-level fields are present with correct types.
All regions have valid roles from the standard set.
All elements have valid roles from the 15 defined types.
Element IDs are generated using the specified SHA-256 algorithm.
Element attributes conform to the allowed set for their role.
Elements are ordered by visual position, not DOM order.
The meta block accurately reflects the document statistics.

Implementations may include additional fields not defined in this specification. Consumers must ignore unrecognized fields rather than treating them as errors.
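On the consumer side, the two rules that matter most are rejecting unrecognized major versions and ignoring unrecognized fields. The sketch below shows one way to apply both at the top level. It assumes som_version follows a "major.minor" pattern, and accept_document is a hypothetical helper name.

```python
SUPPORTED_MAJOR = "1"
KNOWN_TOP_LEVEL = {
    "som_version", "url", "title", "lang", "regions", "meta", "structured_data",
}


def accept_document(doc: dict) -> dict:
    """Reject unknown major versions and drop unrecognized top-level fields."""
    # Assumption: the version string has the form "major.minor".
    major = str(doc.get("som_version", "")).split(".")[0]
    if major != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported SOM major version: {major!r}")
    # Unknown fields are ignored rather than treated as errors.
    return {key: value for key, value in doc.items() if key in KNOWN_TOP_LEVEL}


incoming = {
    "som_version": "1.0",
    "url": "https://example.com/",
    "title": "Example",
    "regions": [],
    "meta": {},
    "x_vendor_score": 0.9,  # hypothetical extension field
}
print(sorted(accept_document(incoming)))
```

A fuller consumer would apply the same filtering inside regions and elements; the top level is enough to show the pattern.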

