
Browser MCP Server Comparison

Benchmark date: 2026-02-21 | OpenBrowser MCP v0.1.26 (CodeAgent, 1 tool) | Playwright MCP (latest) | Chrome DevTools MCP (latest)

Overview

Three approaches to browser automation via MCP, measured on identical tasks.
|  | Playwright MCP | Chrome DevTools MCP | OpenBrowser MCP |
| --- | --- | --- | --- |
| Maintainer | Microsoft | Chrome DevTools team (Google) | OpenBrowser |
| GitHub Stars | 27,400+ | 26,200+ |  |
| Engine | Playwright (Chromium/Firefox/WebKit) | Puppeteer (CDP) | Raw CDP (direct) |
| Tools | 22 core (34 total) | 26 | 1 (execute_code) |
| Approach | A11y snapshot with every action | A11y snapshot on demand | CodeAgent — Python code execution with browser namespace |
| Element IDs | ref from snapshot ([ref=e25]) | uid from snapshot (uid=1_2) | Numeric index from state |
| Transport | stdio | stdio | stdio |
| Language | TypeScript (Node.js) | TypeScript (Node.js) | Python |

Token Usage Benchmark

Methodology

All three MCP servers were started as subprocesses and tested via JSON-RPC stdio transport. Same 5-step workflow, same pages, same measurement method. All numbers are real measurements, not estimates. Workflow: Navigate to Wikipedia Python page -> get page state -> click link -> go back -> get state again.
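A minimal sketch of how such a measurement can be taken (the real harness lives in the benchmarks/ scripts listed at the end of this page; this version omits the MCP initialize handshake and error handling):
import json, subprocess

# Spawn one MCP server as a subprocess (Playwright MCP shown as an example)
proc = subprocess.Popen(
    ["npx", "@playwright/mcp@latest"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)

def call(method, params, id_=1):
    """Send one JSON-RPC request over stdio and return the raw response line."""
    proc.stdin.write(json.dumps(
        {"jsonrpc": "2.0", "id": id_, "method": method, "params": params}) + "\n")
    proc.stdin.flush()
    return proc.stdout.readline()

resp = call("tools/call", {
    "name": "browser_navigate",
    "arguments": {"url": "https://en.wikipedia.org/wiki/Python_(programming_language)"},
})
print(len(resp))  # response character count, as reported in the tables below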

Results: 5-Step Workflow on Wikipedia (Complex Page)

| Metric | Playwright MCP | Chrome DevTools MCP | OpenBrowser MCP |
| --- | --- | --- | --- |
| Tool calls | 5 | 5 | 5 |
| Total response chars | 992,065 | 539,209 | 1,131 |
| Est. response tokens | 248,016 | 134,802 | 283 |
| Token ratio vs OpenBrowser | 877x more | 476x more | 1x (baseline) |

Results: Small Page (httpbin.org/forms/post)

| Operation | Playwright MCP | Chrome DevTools MCP | OpenBrowser MCP |
| --- | --- | --- | --- |
| Navigate | 1,985 chars | 182 chars | 105 chars |
| Page state | 1,896 chars | 1,366 chars | 1,488 chars |
| Type/fill | 526 chars | 95 chars | 92 chars |
| List tabs/pages | 367 chars | 120 chars | 195 chars |

Per-Operation Token Breakdown (Wikipedia)

| Operation | Playwright MCP | Chrome DevTools MCP | OpenBrowser MCP |
| --- | --- | --- | --- |
| Navigate | 495,950 chars (~124K tokens) | 240 chars (~60 tokens) | 134 chars (~34 tokens) |
| Snapshot / Get state | 495,437 chars (~124K tokens) | 538,467 chars (~135K tokens) | 421 chars (~105 tokens) |
| Click | 341 chars (~85 tokens) | 218 chars (~55 tokens) | 80 chars (~20 tokens) |
| Go back | 198 chars (~50 tokens) | 149 chars (~37 tokens) | 75 chars (~19 tokens) |
| Get state (2nd) | 139 chars (~35 tokens) | 135 chars (~34 tokens) | 421 chars (~105 tokens) |

Why the Difference

| Design Decision | Playwright MCP | Chrome DevTools MCP | OpenBrowser MCP |
| --- | --- | --- | --- |
| Navigate response | Full a11y snapshot | URL confirmation | URL confirmation |
| State query | Always full snapshot | Full a11y snapshot (take_snapshot) | Compact (105 tokens) or full |
| Click response | Updated snapshot | Action confirmation | Action confirmation |
| Text extraction | No dedicated tool | No dedicated tool | evaluate() + Python processing |
| Content search | Dump full snapshot | evaluate_script (JS) | evaluate() + Python regex/string matching |
| Element search | Part of snapshot | uid from snapshot | browser.get_browser_state_summary() |
Playwright MCP: Every navigation returns the full page accessibility snapshot (~124K tokens for Wikipedia). Consistent, but it forces the LLM to process the entire page with every action.

Chrome DevTools MCP: Returns minimal confirmations for navigation. The agent must explicitly call take_snapshot to see the page (~135K tokens for Wikipedia). One snapshot is comparable in size to Playwright’s, but actions don’t auto-return snapshots.

OpenBrowser MCP: Actions return minimal confirmations. The agent explicitly requests the level of detail it needs — from 105 tokens (compact state) to 3,981 tokens (targeted search) to 25,164 tokens (full page text). More tool calls, dramatically fewer tokens.
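A sketch of what those three detail levels look like inside a single execute_code call (browser.get_browser_state_summary() and evaluate() are the server's documented calls; the specific queries are illustrative):
# Compact state: URL, title, interactive elements (~105 tokens on Wikipedia)
state = await browser.get_browser_state_summary()
print(state.url, state.title)

# Targeted search: ask the page one question instead of dumping it
found = await evaluate('document.body.innerText.includes("Guido van Rossum")')
print(found)  # True

# Full page text: only when the task genuinely needs it (~25K tokens)
text = await evaluate('document.body.innerText')
print(len(text))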

What Each Server Returns (Verbatim)

All responses below are real output captured from each MCP server on the httpbin.org/forms/post page.

Navigate

Playwright MCP — returns the full a11y snapshot with every navigation (~2,150 chars on httpbin, ~496K chars on Wikipedia):
### Ran Playwright code
await page.goto('https://httpbin.org/forms/post');
### Page
- Page URL: https://httpbin.org/forms/post
- Console: 1 errors, 0 warnings
### Snapshot
- generic [ref=e2]:
  - paragraph [ref=e3]:
    - generic [ref=e4]:
      - text: "Customer name:"
      - textbox "Customer name:" [ref=e5]
  - paragraph [ref=e6]:
    - generic [ref=e7]:
      - text: "Telephone:"
      - textbox "Telephone:" [ref=e8]
  ... (entire page tree continues)
Chrome DevTools MCP — returns URL confirmation only (~136 chars):
# navigate_page response
Successfully navigated to https://httpbin.org/forms/post.
## Pages
1: https://httpbin.org/forms/post [selected]
OpenBrowser MCP — returns URL confirmation only (~105 chars):
Navigated to: https://httpbin.org/forms/post

Get Page State / Snapshot

Playwright MCP browser_snapshot — full a11y tree again (~1,896 chars on httpbin, ~495K chars on Wikipedia):
- generic [ref=e2]:
  - paragraph [ref=e3]:
    - generic [ref=e4]:
      - text: "Customer name:"
      - textbox "Customer name:" [ref=e5]
  ... (entire page tree)
Chrome DevTools MCP take_snapshot — full a11y tree (~1,214 chars on httpbin, ~538K chars on Wikipedia):
uid=1_0 RootWebArea url="https://httpbin.org/forms/post"
  uid=1_1 StaticText "Customer name: "
  uid=1_2 textbox "Customer name: "
  uid=1_3 StaticText "Telephone: "
  uid=1_4 textbox "Telephone: "
  uid=1_5 StaticText "E-mail address: "
  uid=1_6 textbox "E-mail address: "
  ... (entire page tree)
OpenBrowser MCP — the LLM writes Python code to query exactly what it needs:
# Get page metadata
state = await browser.get_browser_state_summary()
print(json.dumps({"url": state.url, "title": state.title}))
{"url": "https://httpbin.org/forms/post", "title": "httpbin.org/forms/post"}
The agent requests only the data it needs via Python code — page title, specific element attributes, or targeted JS evaluation. No full-page dumps.
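For example, pulling just the form's field names returns a short Python list instead of a page tree (the query below is illustrative):
# Only the input names come back, a few dozen chars rather than a snapshot
fields = await evaluate(
    "Array.from(document.querySelectorAll('form input, form textarea'))"
    ".map(e => e.name)"
)
print(fields)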

Click / Type / Go Back

All three servers return short confirmations for actions (Playwright's type response additionally includes an updated snapshot):
| Operation | Playwright MCP | Chrome DevTools MCP | OpenBrowser MCP |
| --- | --- | --- | --- |
| Click | Clicked element | Clicked element uid=1_2 | Clicked element 5 |
| Type | Updated snapshot (~526 chars) | Filled element uid=1_2 | Typed 'John Doe' into element 4 |
| Go back | ~198 chars | ~149 chars | Navigated back |
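On the OpenBrowser side those confirmations come back from ordinary Python calls (index-based signatures as used in this page's own examples; real indices are read from the browser state first):
await input_text(4, "John Doe")   # -> "Typed 'John Doe' into element 4"
await click(5)                    # -> "Clicked element 5"
await go_back()                   # -> "Navigated back"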

Targeted Extraction (OpenBrowser only)

No equivalent exists in Playwright or Chrome DevTools MCP — both require dumping the full snapshot or running JavaScript from the client side.

Element search — the LLM writes Python to find specific elements:
state = await browser.get_browser_state_summary()
for idx, el in state.dom_state.selector_map.items():
    if 'submit' in el.get_all_children_text(max_depth=1).lower():
        print(f"Found submit button at index {idx}")
Data extraction — extract specific data via JS evaluation, not full-page dumps:
# Extract only what's needed from a 97K-char Wikipedia page
data = await evaluate('document.querySelector(".infobox")?.innerText')
print(data)  # ~900 chars instead of ~97K
Comparison for finding “Guido van Rossum” on Wikipedia:
| Server | Method | Response Size |
| --- | --- | --- |
| Playwright MCP | browser_snapshot (dump full tree, search client-side) | ~495K chars |
| Chrome DevTools MCP | take_snapshot (dump full tree, search client-side) | ~538K chars |
| OpenBrowser MCP | evaluate() + Python string matching | ~900 chars |
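A sketch of that last row in code (the infobox selector and the regex are illustrative; re is pre-imported in the execution namespace):
# Pull only the infobox text (~900 chars), then search it with Python
infobox = await evaluate('document.querySelector(".infobox")?.innerText || ""')
match = re.search(r"Designed by\s*(\S.*)", infobox)
print(match.group(1) if match else "not found")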

Cost Comparison

MCP tool response costs per 5-step workflow on a complex page (Wikipedia). These tokens are added to the LLM’s context window, charged at input token rates. All numbers based on real measurements.
| Model | Playwright MCP | Chrome DevTools MCP | OpenBrowser MCP |
| --- | --- | --- | --- |
| Claude Sonnet 4.6 ($3/M input) | $0.744 | $0.404 | $0.001 |
| Claude Opus 4.6 ($5/M input) | $1.240 | $0.674 | $0.001 |
| GPT-5.2 ($1.75/M input) | $0.434 | $0.236 | $0.001 |
Per 1,000 workflows:
| Model | Playwright MCP | Chrome DevTools MCP | OpenBrowser MCP |
| --- | --- | --- | --- |
| Claude Sonnet 4.6 | $744 | $404 | $0.85 |
| Claude Opus 4.6 | $1,240 | $674 | $1.42 |
| GPT-5.2 | $434 | $236 | $0.50 |
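The per-workflow figures reproduce directly from the token counts above (input-rate arithmetic only):
# Input-token cost = tokens / 1M * rate; Claude Sonnet 4.6 shown
for name, tokens in [("Playwright", 248_016),
                     ("Chrome DevTools", 134_802),
                     ("OpenBrowser", 283)]:
    print(f"{name}: ${tokens / 1e6 * 3.00:.3f}")
# Playwright: $0.744 / Chrome DevTools: $0.404 / OpenBrowser: $0.001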

E2E LLM Benchmark

Methodology

Six real-world browser tasks run through Claude Sonnet 4.6 on AWS Bedrock (Converse API). Each task uses a single MCP server as tool provider. The LLM decides which tools to call and when the task is complete. All tasks run against live websites. Tasks: Wikipedia fact lookup, httpbin form fill, Hacker News data extraction, Wikipedia search + navigation, GitHub release lookup, example.com content analysis.
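The agent loop has roughly the following shape (a minimal sketch, not the actual benchmarks/e2e_llm_benchmark.py; mcp_tools_as_tool_specs and call_mcp_tool are hypothetical helpers that bridge to the MCP server over stdio):
import boto3

client = boto3.client("bedrock-runtime")
messages = [{"role": "user", "content": [{"text": "Fill the httpbin form."}]}]
total_tokens = 0

while True:
    resp = client.converse(
        modelId="MODEL_ID",  # placeholder; the benchmark used Claude Sonnet 4.6
        messages=messages,
        toolConfig={"tools": mcp_tools_as_tool_specs},  # hypothetical helper
    )
    total_tokens += resp["usage"]["inputTokens"] + resp["usage"]["outputTokens"]
    messages.append(resp["output"]["message"])
    if resp["stopReason"] != "tool_use":
        break  # the LLM decided the task is complete
    results = []
    for block in resp["output"]["message"]["content"]:
        if "toolUse" in block:
            use = block["toolUse"]
            output = call_mcp_tool(use["name"], use["input"])  # hypothetical
            results.append({"toolResult": {"toolUseId": use["toolUseId"],
                                           "content": [{"text": output}]}})
    messages.append({"role": "user", "content": results})

print(total_tokens)  # the Bedrock API token figure reported below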

Results: Task Success

All three servers pass all 6 tasks.
| Task | Playwright MCP | Chrome DevTools MCP | OpenBrowser MCP |
| --- | --- | --- | --- |
| fact_lookup | PASS | PASS | PASS |
| form_fill | PASS | PASS | PASS |
| multi_page_extract | PASS | PASS | PASS |
| search_navigate | PASS | PASS | PASS |
| deep_navigation | PASS | PASS | PASS |
| content_analysis | PASS | PASS | PASS |
| Total | 6/6 | 6/6 | 6/6 |

Results: Tool Calls and Duration

| Metric | Playwright MCP | Chrome DevTools MCP | OpenBrowser MCP |
| --- | --- | --- | --- |
| Total tool calls (mean) | 9.4 | 19.4 | 13.8 |
| Avg tool calls/task | 1.6 | 3.2 | 2.3 |
| Total duration (mean +/- std) | 62.7 +/- 4.8s | 103.4 +/- 2.7s | 77.0 +/- 6.7s |
Playwright completes most tasks in 1 tool call because every navigation returns the full accessibility snapshot — the LLM sees the entire page immediately and can answer without follow-up queries. OpenBrowser takes more turns because the LLM writes code to navigate, extract, and verify data step by step.

[Chart: MCP Server Benchmark: Duration vs Token Usage]

Per-Task Duration

[Chart: Per-Task Duration by MCP Server]

Per-Task Tool Calls

[Chart: Tool Calls Per Task by MCP Server]

Results: Token Efficiency

Bedrock API token usage measured from the Converse API usage field (mean across 5 runs, 10,000-sample bootstrap CIs).
| Metric | Playwright MCP | Chrome DevTools MCP | OpenBrowser MCP |
| --- | --- | --- | --- |
| Bedrock API tokens (mean) | 158,787 | 299,486 | 50,195 |
| MCP response chars (mean) | 1,132,173 | 1,147,244 | 7,853 |
| API token ratio vs OpenBrowser | 3.2x more | 6.0x more | 1x (baseline) |
| Response char ratio vs OpenBrowser | 144x more | 146x more | 1x (baseline) |

[Chart: Total Bedrock API Token Usage (Input vs Output)]

Per-Task MCP Response Size

| Task | Playwright MCP | Chrome DevTools MCP | OpenBrowser MCP |
| --- | --- | --- | --- |
| fact_lookup | 520,742 chars | 509,058 chars | 3,144 chars |
| form_fill | 4,075 chars | 3,150 chars | 2,305 chars |
| multi_page_extract | 58,392 chars | 38,880 chars | 294 chars |
| search_navigate | 519,241 chars | 595,590 chars | 2,848 chars |
| deep_navigation | 14,875 chars | 195 chars | 113 chars |
| content_analysis | 485 chars | 501 chars | 499 chars |
The pattern: any task involving a complex page (Wikipedia, GitHub releases) produces massive response payloads for Playwright and Chrome DevTools because they dump the full accessibility snapshot. OpenBrowser returns only the data the code explicitly extracts.

[Chart: MCP Response Size: Full Page Dumps vs Server-Side Processing]

Per-Task Input Token Usage

[Chart: Per-Task Input Token Usage by MCP Server]

Results: Cost Per Benchmark Run (6 Tasks)

Cost per benchmark run based on Bedrock API token usage (input + output tokens at respective rates).
| Model | Playwright MCP | Chrome DevTools MCP | OpenBrowser MCP |
| --- | --- | --- | --- |
| Claude Sonnet 4.6 (3/3/15 per M) | $0.50 | $0.92 | $0.18 |
| Claude Opus 4.6 (5/5/25 per M) | $0.83 | $1.53 | $0.30 |
[Chart: Cost Per Benchmark Run by Model]

Why OpenBrowser Uses More Tool Calls but Fewer Tokens

Playwright sends the full page with every response, so the LLM gets the answer immediately but pays for ~120K tokens per Wikipedia page load. OpenBrowser returns compact results (~30-800 chars per call), so the LLM needs more round-trips to navigate and extract but pays far fewer tokens overall. For a single simple task, Playwright’s approach is fast. At scale (thousands of workflows, complex pages, multi-step agents), the MCP response size difference is 144x.

Tool Surface Comparison

Playwright MCP (22 core tools, 34 total)

Navigation, interaction, form filling, file upload, drag and drop, hover, key press, select option, screenshots, snapshots, console messages, dialog handling, network requests, tab management, code execution, PDF export, wait conditions, resize, vision-mode coordinate tools, test assertions.

Chrome DevTools MCP (26 tools)

Input automation (click, drag, fill, fill_form, hover, press_key, handle_dialog, upload_file), navigation (navigate_page, new_page, close_page, list_pages, select_page, wait_for), emulation (emulate, resize_page), performance tracing (start/stop/analyze), network debugging (list/get requests), JS execution, console messages, screenshots, DOM snapshots.

OpenBrowser MCP (1 tool — CodeAgent)

| Tool | Capabilities |
| --- | --- |
| execute_code | Python code execution in a persistent namespace with browser automation functions: navigate(), click(), input_text(), scroll(), go_back(), select_dropdown(), send_keys(), upload_file(), evaluate() (JS), switch()/close() (tabs), done() (task completion). Also provides file_system for local file operations and pre-imported libraries (json, pandas, re, csv, etc.). |
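A sketch of one execute_code block chaining several of these calls into a complete task (element indices are placeholders that would be read from the browser state first):
# Navigate, inspect, fill a field, submit, and signal completion in one namespace
await navigate("https://httpbin.org/forms/post")
state = await browser.get_browser_state_summary()  # find element indices
await input_text(4, "John Doe")
await click(5)
done(text="Order form submitted", success=True)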

Unique to OpenBrowser

Features no competitor offers:
  • CodeAgent architecture — single execute_code tool runs Python in a persistent namespace. The LLM writes code to navigate, extract, and process data rather than calling individual tools. Variables and state persist between calls.
  • 144x smaller MCP responses — returns only the data the code explicitly extracts, not full page dumps.
  • JS evaluation with Python processing — await evaluate("JS expression") returns Python objects directly (dicts, lists, strings), enabling pandas/regex/json processing in the same code block (see the sketch after this list).
  • Built-in libraries — json, pandas, numpy, matplotlib, csv, re, datetime, requests, BeautifulSoup available in the execution namespace.
  • File system access — file_system object for reading/writing local files from browser automation code.
  • Dropdown support — select_dropdown() and dropdown_options() for native <select> elements.
  • Task completion signal — done(text, success) to explicitly mark task completion with a result.
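A sketch of the evaluate-into-pandas flow (the table selector is illustrative; pandas is pre-imported in the namespace, but the import is shown for clarity):
import pandas as pd

# evaluate() hands back the JS array as a Python list of lists
rows = await evaluate("""
  Array.from(document.querySelectorAll("table tr")).map(tr =>
    Array.from(tr.cells).map(td => td.innerText))
""")
df = pd.DataFrame(rows[1:], columns=rows[0])  # assumes the first row is a header
print(df.head().to_string())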

Gaps vs Competitors

| Capability | Playwright MCP | Chrome DevTools MCP | OpenBrowser MCP |
| --- | --- | --- | --- |
| Cross-browser | Yes (Chromium/Firefox/WebKit) | No | No |
| Screenshots | Yes | Yes | No |
| File upload | Yes | Yes | No |
| PDF export | Yes | No | No |
| Network monitoring | Yes | Yes | No |
| Console logs | Yes | Yes | No |
| Performance tracing | No | Yes | No |
| Emulation (device/viewport) | No | Yes | No |
| Drag and drop | Yes | Yes | No |
| Dialog handling | Yes | Yes | No |
| Wait conditions | Yes | Yes | No |
| Code generation | Yes | No | No |
| Incremental snapshots | Yes | No | No |
| Accessibility tree | Via snapshot | Via snapshot | Via evaluate() JS queries |
| Content grep/search | No | No | Via evaluate() + Python |
| Progressive disclosure | No | No | Yes (code extracts only what’s needed) |
| Session reuse | No | Puppeteer-managed | Yes (existing Chrome) |
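As the accessibility-tree row notes, OpenBrowser reaches a11y information only through evaluate(). A rough sketch of a role query (selectors illustrative):
# No native a11y tree dump; query roles/labels from the DOM instead
buttons = await evaluate("""
  Array.from(document.querySelectorAll('button, [role="button"]'))
       .map(el => el.getAttribute("aria-label") || el.innerText.trim())
""")
print(buttons)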

Benchmark Methodology Notes

Token Usage Benchmark (5-step workflow)

  • All three servers benchmarked via JSON-RPC stdio subprocess — no estimates
  • Response sizes measured as total JSON-RPC response character count
  • Estimated tokens = characters / 4 (standard approximation for mixed English/JSON)
  • 5-step workflow uses matched operations: navigate, get state/snapshot, click, go back, get state/snapshot
  • Raw benchmark data: benchmarks/playwright_results.json, benchmarks/cdp_results.json, benchmarks/openbrowser_results.json

E2E LLM Benchmark (6 real-world tasks, N=5 runs)

  • Model: Claude Sonnet 4.6 on AWS Bedrock (Converse API)
  • Each task uses a single MCP server as tool provider via JSON-RPC stdio
  • The LLM autonomously decides which tools to call and when the task is complete
  • 5 runs per server, 10,000-sample bootstrap for 95% confidence intervals
  • 6 tasks: fact_lookup, form_fill, multi_page_extract, search_navigate, deep_navigation, content_analysis
  • All tasks run against live websites (Wikipedia, httpbin.org, news.ycombinator.com, github.com, example.com)
  • Bedrock API token usage measured from the Converse API usage field (actual billed tokens)
  • MCP response sizes measured from tool response character counts
  • Playwright MCP tested via npx @playwright/mcp@latest
  • Chrome DevTools MCP tested via npx -y chrome-devtools-mcp@latest
  • OpenBrowser MCP v0.1.26 tested via uvx openbrowser-ai[mcp]==0.1.26 --mcp
  • All tests run on macOS, M-series Apple Silicon
  • Benchmark scripts: benchmarks/e2e_llm_benchmark.py, benchmarks/e2e_llm_stats.py
  • Results: benchmarks/e2e_llm_stats_results.json