
How to Read the WebTaskBench Leaderboard

webtaskbench.com publishes compression ratios comparing raw HTML token counts to SOM token counts across 44 major websites. If you have looked at the leaderboard and wondered what a 117.9× ratio actually means for your publishing operation, this is the practical guide.

What the ratio measures

The ratio is simple: html_tokens / som_tokens. A ratio of 43× for nytimes.com means the SOM representation of a New York Times page uses 43 times fewer tokens than the raw HTML. Not 43% fewer — 43 times fewer.
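The distinction between "times fewer" and "percent fewer" is easy to blur, so here is the arithmetic as a small sketch (the helper names are illustrative, not part of the leaderboard's tooling):

```python
def compression_ratio(html_tokens: int, som_tokens: int) -> float:
    # The ratio exactly as the leaderboard defines it.
    return html_tokens / som_tokens

def percent_reduction(ratio: float) -> float:
    # A ratio of R means you keep 1/R of the tokens,
    # i.e. a (1 - 1/R) reduction. 43x -> ~97.7% fewer tokens.
    return (1 - 1 / ratio) * 100
```

A 43× ratio therefore corresponds to roughly a 97.7% token reduction, which is why "43 times fewer" and "43% fewer" describe very different outcomes.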

The tokenizer is tiktoken's cl100k_base, the tokenizer used by GPT-3.5 and GPT-4. (GPT-4o uses the newer o200k_base encoding, so its counts will differ slightly, but the figures remain a close proxy.) This makes the numbers directly comparable to actual API costs: when the leaderboard says a page is 45,000 HTML tokens, that is the same 45,000 tokens a cl100k_base model's API bill reflects.

Why the range is so wide

The leaderboard shows ratios from roughly 4× to 118×. This range is not noise — it reflects genuine structural differences in how websites are built.

Cloud dashboard pages like cloud.google.com have extreme ratios (117.9×) because their raw HTML is dominated by large navigation structures, dynamically loaded JavaScript bundles, and deeply nested component trees. The actual semantic content — the text, the headings, the links, the interactive elements a user or agent cares about — is a small fraction of the total HTML payload. SOM extracts that fraction. The rest disappears.

Simple editorial pages compress less dramatically because their HTML is already closer to their content. A well-structured article page with minimal JavaScript might compress only 4–8×. That is still significant — it still means 75–87% fewer tokens — but it is not the headline-grabbing 100× number. The ratio tells you how much of your HTML is not your content. A high ratio means your pages are heavy with structure. A low ratio means your HTML is already relatively clean.

The dollar math

The practical formula is straightforward:

daily_html_cost = pages_per_day × html_tokens × token_price

daily_som_cost  = pages_per_day × som_tokens × token_price

daily_savings   = daily_html_cost − daily_som_cost

At GPT-4o pricing ($0.0000025 per token), a publisher serving 10,000 AI agent page views per day on a site averaging 45,000 HTML tokens per page spends $1,125 per day in downstream token costs. At 17× compression, that drops to $66 per day. The savings are $1,059 per day — $31,770 per month — and they scale linearly with traffic.
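The formulas above can be applied directly; this sketch reproduces the worked example (the function name is illustrative, and the price is the article's GPT-4o input figure):

```python
TOKEN_PRICE = 0.0000025  # GPT-4o input price per token, per the article

def daily_savings(pages_per_day: int, html_tokens: int, ratio: float,
                  price: float = TOKEN_PRICE) -> tuple[float, float, float]:
    # SOM token count implied by the compression ratio.
    som_tokens = html_tokens / ratio
    html_cost = pages_per_day * html_tokens * price
    som_cost = pages_per_day * som_tokens * price
    return html_cost, som_cost, html_cost - som_cost
```

Plugging in the example (10,000 pages/day, 45,000 HTML tokens, 17× compression) yields roughly $1,125, $66, and $1,059 per day respectively.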

The economics calculator on this site will compute these numbers for your specific traffic volume and compression ratio.

What the leaderboard does not measure

Compression is necessary but not sufficient. A compressed page that loses meaningful content is worse than a larger page that retains it. Token efficiency matters only if the resulting representation still contains the information the agent needs to complete its task.

The companion research — Does Format Matter? (Hurley, 2026) — addresses this directly. It measures task completion accuracy across formats: raw HTML, stripped Markdown, accessibility tree, and SOM. The finding is that SOM maintains task accuracy while substantially reducing token count. But this is worth verifying for your own content type and your own agent workflows. Compression without fidelity is just data loss.

Start with your own numbers

The leaderboard is a starting point, not a guarantee. Your actual compression ratio depends on your site’s HTML structure, your JavaScript footprint, your navigation complexity, and the density of your semantic content relative to your total markup.

The only way to know your number is to measure it. Use somready.com/check to see if your site already has SOM configured, or look up your domain on webtaskbench.com to see if it appears in the benchmark dataset. If it does not, the reference implementation at the specification page includes tooling to generate SOM for any URL and measure the resulting compression.
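Before running the full tooling, a crude first-order check is possible with nothing but the standard library: compare the raw HTML length to the length of its visible text, using the rough rule of thumb of about four characters per English token. This is a heuristic proxy only — it is not cl100k_base tokenization and not actual SOM generation — but it gives a quick sense of how markup-heavy a page is:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    # Collects visible text, skipping <script> and <style> contents.
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def rough_ratio(html: str) -> float:
    # ~4 chars per token is a crude heuristic, not a real tokenizer.
    parser = TextExtractor()
    parser.feed(html)
    text = " ".join(parser.parts)
    html_tokens = max(len(html) // 4, 1)
    text_tokens = max(len(text) // 4, 1)
    return html_tokens / text_tokens
```

A high heuristic ratio suggests your pages are markup-dominated and likely to compress well; a ratio near 1 suggests your HTML is already mostly content. Treat the result only as a hint about where your site might land, then measure properly with the reference tooling.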