
Token Optimization – Getting More from Less

13 min read

Yesterday we showed you how to build custom agents with surgical tool selection. Today, we’re diving deeper: Token Optimization.

Selecting the right tools is only half the battle. The real game-changer is optimizing what those tools return. We’ve re-architected our data pipelines to deliver maximum accuracy while using 40-80% fewer tokens on outputs.

Here’s how we did it.

The Problem: Data Bloat

When you call a tool like scrape_as_markdown or search_engine, the API returns rich data. But here’s the catch: most of that data is formatted for humans, not LLMs.

Traditional APIs include unnecessary overhead:

  • Redundant formatting (bold, italic, headings) that LLMs don’t need
  • Ads and sponsored content mixed with organic results
  • Image metadata and visual elements that waste tokens
  • Inconsistent field naming and redundant metadata

For a typical web page scrape or search query, you’re often getting 3-5x more data than the LLM actually needs for reasoning.

The Solution: Two-Layer Token Optimization

We’ve implemented a layered optimization strategy that targets different types of data:

  1. Remark + Strip-Markdown for web page content (scrape_as_markdown)
  2. Parsed Light + Payload Cleaning for search engine results (search_engine)

Let’s break down each layer.

But Wait: Why Not TOON?

You might be wondering: what about TOON (Token-Oriented Object Notation)? We initially explored it as a third optimization layer for structured datasets like LinkedIn profiles and Amazon products.

TOON is a clever format that uses indentation and tabular layouts to reduce tokens. On paper, it delivers 30-60% savings for uniform arrays of identical objects. But when we tested it on real-world API responses from Bright Data, we discovered something important:

The delimiter isn't the bottleneck; the data itself is.

The Delimiter Illusion

Looking at a typical LinkedIn profile response, most tokens come from:

  • Long text fields (about, recommendations, activity[].title)
  • Long URLs (avatar, banner_image, activity[].link, credential_url)

The delimiter itself (\n, |, or \t) is a tiny fraction of the total token count.

Newline (\n) is already:

  • A single, very common token in all major LLM tokenizers
  • Naturally aligned with how models chunk text (line-oriented)
  • Absent from URLs and most text, which avoids escaping issues

Exotic separators like |, ^, or \x1F might reduce quoting in a few spots, but they often introduce rare multi-token sequences that cancel out any gains.

Short answer: If you only tweak the delimiter, \n is already about as good as it gets for this kind of data.
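
You can check this yourself. Below is a minimal sketch, assuming the third-party js-tiktoken package (not something our server ships), that counts tokens for the same record joined with different delimiters:

import {getEncoding} from 'js-tiktoken';

const enc = getEncoding('cl100k_base');
const fields = ['Senior Data Engineer', 'https://example.com/in/jane-doe',
    'New York, NY'];

for (const sep of ['\n', '|', '\t', '\x1F']) {
    // The joined record's token count barely moves across separators;
    // the long text and URL fields dominate
    console.log(JSON.stringify(sep), enc.encode(fields.join(sep)).length);
}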

Where TOON Falls Short

TOON shines for uniform arrays of identical objects—think 1,000 employee records with the same schema. But real-world web data from tools like web_data_linkedin_person_profile or web_data_amazon_product is:

  • Heterogeneous — Nested objects with different schemas (experience, education, and activity arrays)
  • Non-uniform — Mixed array types (some entries have img, others don’t)
  • Single-object responses — Most API calls return 1 profile or 1 product, not 1,000

For deeply nested or non-uniform structures, minified JSON often uses fewer tokens than TOON. The TOON spec itself admits this—TOON can actually use more tokens than compact JSON for single objects with deep nesting.

The Real Lever: Change What You Send, Not How You Format It

Here’s the insight that matters: Any format-level optimization (JSON vs TOON vs YAML) is dwarfed by simply changing what data you send.

We don't apply aggressive filtering to everything—our tools return the full data from Bright Data's APIs. But we do strip null values, which appear frequently in web scraping responses and waste tokens without adding information.

The point is: delimiter tweaks save ~5-10% at best. Content filtering saves 20-80%. TOON optimizes the wrong variable for real-world web data.
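
To make "content filtering" concrete, here's a minimal sketch of the kind of transform that actually moves the needle. The function name and the 500-character cap are illustrative, not our exact implementation:

function filter_payload(value, max_len=500){
    // Truncate long text fields; they dominate the token count
    if (typeof value=='string')
        return value.length>max_len ? value.slice(0, max_len)+'...' : value;
    if (Array.isArray(value))
        return value.map(v=>filter_payload(v, max_len)).filter(v=>v!==null);
    if (value && typeof value=='object')
    {
        const out = {};
        for (const [key, v] of Object.entries(value))
        {
            const cleaned = filter_payload(v, max_len);
            if (cleaned!==null) // drop nulls: they cost tokens, add nothing
                out[key] = cleaned;
        }
        return out;
    }
    return value; // numbers and booleans pass through; nulls are
                  // returned here and filtered out by the callers above
}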

Tooling Immaturity

TOON is also brand new—the first commit to the spec was November 2nd, 2024. It’s literally a month old. JSON has validators, editors, and libraries in every language. TOON requires custom parsing and lacks ecosystem support.

One engineer put it well: “First time I saw TOON, it looked like someone’s half-finished scratchpad. Show it to your backend engineer, and there’s a chance they’ll frown like you brought them a new problem.”

Our Decision

After benchmarking TOON on real Bright Data payloads (LinkedIn profiles, Amazon products, Google SERPs), we concluded:

  • For search results: Bright Data’s Parsed Light format (see Layer 2 below) delivers 80% token reduction by filtering at the API level—no custom encoding needed.
  • For web scraping: Strip-markdown reduces tokens by 40% while keeping responses human-readable—no new format required.
  • For structured datasets: The real wins come from dropping fields and truncating text, not from replacing JSON with TOON.

TOON is a brilliant idea for the right use case (massive uniform datasets). But for heterogeneous web API responses, standard optimizations beat exotic formats every time.


Layer 1: Remark + Strip-Markdown for Web Scraping

The Challenge: Markdown Bloat

Our scrape_as_markdown tool converts any web page into clean, LLM-friendly markdown. But raw markdown converters often include:

  • Redundant formatting (bold, italic, headings) that LLMs don’t need for reasoning
  • Image alt-text and metadata
  • Empty lines and spacing inconsistencies

For a typical blog post or product page, raw markdown can be 3-5x longer than the core content.

The Solution: Strip-Markdown

We use remark + strip-markdown to intelligently reduce markdown to plain text while preserving structure:

We’re grateful to the remark project for their excellent markdown processing library. Consider supporting their work!

import {remark} from 'remark';
import strip from 'strip-markdown';

// Inside scrape_as_markdown tool
const minified_data = await remark()
    .use(strip) // remove formatting nodes, keep the text content
    .process(response.data);
return minified_data.value; // plain-text output string
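
If you want to try the same pipeline on its own, a self-contained version looks like this (the sample string is just for illustration):

import {remark} from 'remark';
import strip from 'strip-markdown';

const sample = '# Title\n\n**Bold** text with a [link](https://example.com).';
const file = await remark().use(strip).process(sample);
console.log(String(file));
// Title
//
// Bold text with a link.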

What Gets Stripped?

The strip-markdown plugin removes:

  • Bold/Italic — **Important** becomes Important
  • Image syntax — ![alt text](image.png) becomes alt text (or is removed entirely when no alt text exists)
  • Headings — ### Section Title becomes Section Title (preserves text, drops markup)
  • Code blocks — Reduces backticks and formatting while keeping content

The result? Plain text that retains the semantic meaning but drops all the formatting overhead.

Example: Before and After

Raw Markdown (from Web Unlocker):

# Product Reviews

## Customer Feedback

- **John D.** - ⭐⭐⭐⭐⭐
  *"Great product! Highly recommend."*
  [Read more](https://example.com/review/123)

- **Sarah M.** - ⭐⭐⭐⭐
  *"Good value for money."*
  [Read more](https://example.com/review/456)

![Product Image](https://cdn.example.com/product.jpg)

[Buy Now](https://example.com/buy)

After remark().use(strip).process():

Product Reviews

Customer Feedback

John D. - ⭐⭐⭐⭐⭐
"Great product! Highly recommend."
Read more

Sarah M. - ⭐⭐⭐⭐
"Good value for money."
Read more

Product Image

Buy Now

Token reduction: ~40% for a full page.

The LLM still gets all the review text, ratings, and call-to-action, but without the link URLs, image paths, or markdown formatting syntax.

When to Use Stripped Markdown

This optimization is perfect for:

  • Summarization tasks — “Summarize this blog post”
  • Sentiment analysis — “What do customers think about this product?”
  • Entity extraction — “Extract company names and contact info from this page”

If your agent needs to click links or navigate the page, use our Scraping Browser tools instead (scraping_browser_navigate, scraping_browser_snapshot).


Layer 2: Parsed Light – Engineered for AI Agents

The Problem: Traditional SERP APIs Weren’t Built for LLMs

Traditional search engine result page (SERP) APIs were designed for humans browsing web interfaces. They return everything:

  • Ads and sponsored content your agent doesn’t need
  • Knowledge panels and featured snippets that bloat responses
  • Redundant metadata fields across multiple naming conventions
  • Visual elements (thumbnails, favicons) that waste tokens
  • Related searches, autocomplete suggestions, and “people also ask” sections

The result? A single search for 10 results can return 2,000-3,000 tokens of JSON, when your LLM agent only needs link + title + description.

For AI agents running multi-step research workflows, this is a dealbreaker. Every extra token compounds across the context window, limiting how many queries you can run before hitting limits.

The Solution: Bright Data’s Parsed Light Format

We’ve introduced the Parsed Light API response format—purpose-built for AI agents that need speed and efficiency.

Here’s what makes it different:

  • Top 10 organic results only — No ads, no knowledge panels, no sidebar clutter
  • Consistent field structure — Every result has link, title, description, and an optional global_rank
  • Clean by design — Pre-optimized at the API level, so you don’t need complex post-processing
  • Faster response times — Smaller payloads = faster network transfer and parsing

Instead of wrestling with inconsistent field names and bloated responses, Parsed Light delivers exactly what AI agents need: actionable search results in minimal tokens.

Parsed Light in Action

When you call our search_engine tool with Google as the engine, we automatically request Bright Data’s parsed_light format:

// Inside search_engine tool (for Google)
const response = await axios({
    url: 'https://api.brightdata.com/request',
    method: 'POST',
    data: {
        url: search_url('google', query, cursor),
        zone: ctx.unlocker_zone,
        format: 'raw',
        data_format: 'parsed_light',  // ← The magic parameter
    },
    headers: api_headers(ctx.api_token, ctx.client_name, 'search_engine'),
    responseType: 'text',
});

What You Get: Clean, Predictable JSON

Here’s an actual Parsed Light response for a search query:

{
  "organic": [
    {
      "link": "https://example.com/pizza",
      "title": "Best Pizza in NYC - Joe's Pizza",
      "description": "Family-owned pizzeria serving authentic New York slices since 1975...",
      "global_rank": 1
    },
    {
      "link": "https://example.com/pizza-guide",
      "title": "Top 10 Pizza Places in NYC",
      "description": "Discover the highest-rated pizza restaurants across all five boroughs...",
      "global_rank": 2,
      "extensions": [
        {
          "type": "site_link",
          "link": "https://example.com/pizza-guide/brooklyn",
          "text": "Brooklyn"
        }
      ]
    }
    // ... 8 more results
  ]
}

Notice what’s not there:

  • No ads or sponsored listings
  • No knowledge graph panels
  • No “people also ask” sections
  • No redundant metadata fields
  • No unicode control characters or formatting noise

Just 10 clean, ranked search results ready for your LLM to process.

Additional Cleanup: The Final Polish

Even with Parsed Light doing the heavy lifting, we apply a lightweight post-processing step to ensure perfect consistency:

function clean_google_search_payload(raw_data){
    const data = raw_data && typeof raw_data=='object' ? raw_data : {};
    const organic = Array.isArray(data.organic) ? data.organic : [];
    const pagination = data.pagination && typeof data.pagination=='object'
        ? data.pagination : {};

    // Normalize to just link, title, description
    const organic_clean = organic
        .map(entry=>{
            if (!entry || typeof entry!='object')
                return null;
            const link = typeof entry.link=='string' ? entry.link.trim() : '';
            const title = typeof entry.title=='string'
                ? entry.title.trim() : '';
            const description = typeof entry.description=='string'
                ? entry.description.trim() : '';
            if (!link || !title)
                return null;  // Skip invalid entries
            return {link, title, description};
        })
        .filter(Boolean);

    const parsed_page = Number(pagination.current_page);
    const current_page = Number.isFinite(parsed_page) && parsed_page>0
        ? parsed_page : 1;

    return {organic: organic_clean, current_page};
}
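
Here's a quick usage sketch with a contrived input (note the padded link, the entry missing a link, and the string page number):

const cleaned = clean_google_search_payload({
    organic: [
        {link: ' https://example.com/pizza ', title: 'Best Pizza in NYC',
            description: 'Family-owned pizzeria...', global_rank: 1},
        {title: 'No link, so this entry gets dropped'},
    ],
    pagination: {current_page: '1'},
});
// cleaned == {organic: [{link: 'https://example.com/pizza',
//     title: 'Best Pizza in NYC', description: 'Family-owned pizzeria...'}],
//     current_page: 1}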

This final cleanup:

  1. Validates data types — Ensures link, title, and description are strings
  2. Trims whitespace — Removes any leading/trailing spaces
  3. Filters invalid entries — Skips results missing required fields
  4. Normalizes pagination — Converts current_page to a consistent number format
  5. Strips optional fields — Removes global_rank and extensions to keep responses ultra-minimal

The result is bulletproof JSON that your LLM can parse with zero edge cases.

Example: Traditional vs. Parsed Light

Traditional SERP API (before Parsed Light):

{
  "ads": [...],
  "organic": [
    {
      "link": "https://example.com/product",
      "url": "https://example.com/product",
      "cache": {"url": "https://webcache.google.com/..."},
      "title": "Amazing\u2003Product\u2003\u2003Review",
      "heading": "Amazing Product Review",
      "name": "Product Review",
      "description": "This   is   a   great   product...",
      "snippet": "This is a great product...",
      "snippet_long": "This is a great product with many features...",
      "subtitle": "Product features",
      "rating": 4.5,
      "price": "$49.99",
      "image": "https://cdn.example.com/image.jpg",
      "favicon": "https://example.com/favicon.ico"
    }
    // ... 30+ more results including ads, knowledge panels, etc.
  ],
  "knowledge_graph": {...},
  "people_also_ask": [...],
  "related_searches": [...],
  "pagination": {...}
}

~2,500 tokens for a typical response.

Parsed Light (optimized for AI agents):

{
  "organic": [
    {
      "link": "https://example.com/product",
      "title": "Amazing Product Review",
      "description": "This is a great product...",
      "global_rank": 1
    }
    // ... 9 more results (top 10 only)
  ]
}

~600 tokens for the same query.

After clean_google_search_payload():

{
  "organic": [
    {
      "link": "https://example.com/product",
      "title": "Amazing Product Review",
      "description": "This is a great product..."
    }
  ],
  "current_page": 1
}

~500 tokens — an 80% reduction from traditional SERP APIs.

Why Parsed Light Outperforms Traditional Parsers

Most SERP APIs parse the entire page and leave you to clean up the mess. Parsed Light is different:

  • Pre-filtered at the source — Only extracts organic results, no ads or sidebars
  • Standardized schema — Consistent field names across all queries (no snippet vs. description vs. snippet_long)
  • LLM-first design — Built for token efficiency from day one, not as an afterthought
  • Sub-1-second response times — Parsed Light is served via Bright Data’s premium routing infrastructure, designed specifically for mission-critical AI applications

This isn’t just about saving tokens—it’s about rethinking how SERP data should work for AI agents.

Built for Real-Time AI Agents

Bright Data’s Parsed Light isn’t just optimized—it’s engineered for speed. With sub-1-second response times, it’s ideal for:

  • Real-time data enrichment — Agents performing live lookups during user conversations
  • Multi-step research workflows — Chain multiple queries without latency bottlenecks
  • Fact verification — Instant validation of claims and statements
  • User-facing applications — Search-powered features that feel instant

Traditional SERP APIs can take 3-5 seconds per query. At scale, that latency compounds. Parsed Light delivers results in under 1 second, keeping your agents responsive and your users engaged.
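
If you want to sanity-check latency yourself, a rough timing harness looks like this. It mirrors the request shape from the snippet above; the env var names are placeholders for your own API token and unlocker zone:

import axios from 'axios';

const start = performance.now();
await axios({
    url: 'https://api.brightdata.com/request',
    method: 'POST',
    data: {
        url: 'https://www.google.com/search?q=best+pizza+nyc',
        zone: process.env.UNLOCKER_ZONE, // your zone name (placeholder)
        format: 'raw',
        data_format: 'parsed_light',
    },
    headers: {Authorization: `Bearer ${process.env.API_TOKEN}`},
    responseType: 'text',
});
console.log(`SERP round-trip: ${Math.round(performance.now()-start)} ms`);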


Combined Impact: Real-World Workflow

Let’s trace token usage through a realistic agent workflow:

Task: “Find articles about AI regulations, then summarize the key points from each source.”

Step 1: Search for Articles

Agent calls: search_engine({query: "AI regulations 2024"})

Without optimization (traditional SERP API): ~2,500 tokens (10 results + ads + knowledge panels)
With Parsed Light + cleanup: ~500 tokens
Savings: 80% (2,000 tokens saved)

Step 2: Scrape Article Pages

Agent calls: scrape_as_markdown({url: "https://example.com/article"}) × 5 articles

Without optimization: ~15,000 tokens (5 pages × 3,000 tokens/page)
With remark().use(strip): ~9,000 tokens
Savings: 40% (6,000 tokens saved)

Step 3: Additional Research

Agent calls: search_engine({query: "EU AI Act details"}) for follow-up research

Without optimization: ~2,500 tokens
With Parsed Light + cleanup: ~500 tokens
Savings: 80% (2,000 tokens saved)

Total Workflow Savings

Without optimization: 20,000 tokens
With optimization: 10,000 tokens
Overall reduction: 50% (10,000 tokens saved)

At $3 per million input tokens (Claude Sonnet pricing), that’s $0.030 saved per workflow. Run this 1,000 times a day, and you’re saving $30/day or $10,950/year.
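
The arithmetic, spelled out:

const tokens_saved = 10000; // per workflow run (from the steps above)
const usd_per_m_tokens = 3; // Claude Sonnet input pricing
const runs_per_day = 1000;

const daily = tokens_saved/1e6*usd_per_m_tokens*runs_per_day;
console.log(daily, daily*365); // 30 10950, i.e. $30/day and $10,950/year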

But the real value isn’t just cost savings—it’s throughput. With these optimizations, your agents can run more complex workflows in the same context window, completing tasks faster and handling more sophisticated queries.


Why This Matters for Agentic Workflows

Token optimization isn’t just about cost. It’s about enabling more complex workflows within context windows.

With a 200K token context window:

  • Without optimization: You can process ~10 multi-step workflows before hitting the limit
  • With optimization: You can process ~20 workflows in the same window

That’s 100% more throughput from the same infrastructure.

And when you combine this with Day 1’s Tool Groups (60-95% reduction in system prompt tokens) and Day 2’s Custom Tools (surgical tool selection), you’re looking at massive total token reduction across the entire agent lifecycle (system prompt + tool calls + tool responses).

Technical Details: Package Dependencies

Both optimization layers are implemented using battle-tested open-source libraries:

  • remark — Markdown processor (used by MDX, Gatsby, Next.js)
  • strip-markdown — Remark plugin for stripping formatting

These are the same tools used by production sites processing millions of requests per day.


See the Difference

Want to measure the impact? Compare token counts:

  1. Call a search_engine tool and count tokens in the response
  2. Compare against a traditional SERP API response for the same query
  3. Use your LLM provider’s tokenizer to count both (tiktoken covers OpenAI models; for Claude, Anthropic’s API provides a token-counting endpoint), as in the sketch below
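
For example, with the same js-tiktoken assumption as earlier, paste in the two raw responses you captured:

import {getEncoding} from 'js-tiktoken';

const traditional_response = '...'; // full SERP API payload (paste yours)
const parsed_light_response = '...'; // parsed_light payload, same query

const enc = getEncoding('cl100k_base');
const saved = 1-enc.encode(parsed_light_response).length/
    enc.encode(traditional_response).length;
console.log(`token reduction: ${Math.round(saved*100)}%`);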

You’ll see roughly 80% reduction on Google searches, 40% on scraped pages, and about 50% across combined workflows.

This isn’t just optimization—it’s a complete rethinking of how web data should be delivered to AI agents.


Performance Stats Summary

Optimization   | Tool(s) Affected            | Token Reduction | Use Case
Strip-Markdown | scrape_as_markdown          | ~40%            | Web page summaries, content extraction
Parsed Light   | search_engine (Google only) | ~80%            | Search result parsing, lead generation, research workflows

What’s Next?

Tomorrow (Day 4), we’re releasing enterprise integrations that bring our MCP server to the platforms your teams already use.

Stay tuned.

Ready to get started?
Explore The Web MCP Server and start building powerful AI agents.
Read the documentation | View the repo