
We Tried Different Ways to Give Claude Code Access to a Web Browser

· 9 min read

At Ritza, we build proof-of-concept applications for clients when documenting their products. AI coding agents handle simple projects well: registration pages, login flows, and to-do apps take minutes. But our clients want full MVPs that showcase what their product can do.

An AI coding agent will tell you, "Perfect! The application is ready to test now," when the UI is broken or features don't function. We needed a way for Claude Code, our favorite coding agent, to test its own work before declaring the task done.

We tested two browser automation approaches: dev-browser (a Playwright-based skill) and Claude Code's official Chrome extension. We built the same documentation builder app twice, tracked token usage, and measured completion time for identical tasks.

Dev-browser used an average of 33% fewer tokens and finished complex workflows in half the time. Here's what we learned.

Test methodology

We tested both tools with identical tasks using the same prompts in separate Claude Code sessions to prevent context carryover. Each test started fresh with no prior context to ensure fair comparison.

We tracked token usage with ccusage and manual tracking, and measured completion time for each task. Our measurements have a margin of error of 7-10%, but the performance differences between the tools are large enough that this margin doesn't affect our conclusions.
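For reference, a per-day token and cost breakdown like the one quoted below can be pulled straight from Claude Code's local logs. A minimal sketch, assuming the ccusage CLI and its daily report:

npx ccusage daily   # per-day token counts and estimated cost (assumes the ccusage npm package)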

Note: The token counts reported in this article (13k, 8k, 22k, 45k) reflect the visible token usage shown in Claude Code's interface. The actual API costs are higher because Claude Code performs additional work behind the scenes: cache reads, cache writes, and other operations not displayed in the UI. For reference, a typical day of building this documentation app and testing with both tools consumed approximately 28 million tokens (including cache operations) at a cost of $12.58, as shown in our API usage logs.

Daily API token usage showing 28M tokens at $12.58 cost

We tested three scenarios:

  1. Simple task: Authentication flow with signup, login, and logout (5-10 actions)
  2. Medium task: Documentation creation with markdown rendering and UI fixes (15-25 actions)
  3. Complex task: End-to-end workflow from account creation through publishing (30+ actions)

All tests used Claude Sonnet 4.5 and ran on the same machine to eliminate hardware variables.

Setup

If you want to set up dev-browser, read this article. For the Chrome extension, read this article.

Testing authentication

We asked Claude Code to test the signup, login, and logout paths.

The Chrome extension took 15 minutes and consumed 13,000 tokens. Dev-browser consumed 8,000 tokens, 38% fewer, because Playwright scripts generate more compact output than the Chrome extension's tool call responses.
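For context, the whole authentication check fits in one dev-browser script. A minimal sketch following the script pattern shown later in this article; the selectors, routes, and test credentials are assumptions about our app, not part of dev-browser:

cd skills/dev-browser && npx tsx <<'EOF'
import { connect, waitForPageLoad } from "@/client.js";

const client = await connect();
const page = await client.page("auth-test");

// Signup (selectors are hypothetical)
await page.goto("http://localhost:3001/signup");
await page.fill('input[type="email"]', 'test@example.com');
await page.fill('input[type="password"]', 'password123');
await page.click('button:has-text("Sign up")');
await waitForPageLoad(page);

// Logout, assuming it returns us to the login form
await page.click('button:has-text("Logout")');
await waitForPageLoad(page);

// Log back in with the same credentials
await page.fill('input[type="email"]', 'test@example.com');
await page.fill('input[type="password"]', 'password123');
await page.click('button:has-text("Login")');
await waitForPageLoad(page);

await page.screenshot({ path: "tmp/auth-check.png" });
await client.disconnect();
EOF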

Both sessions produced ugly UIs. The documentation list was cramped and unreadable:

Ugly documentation list

The detail view was worse:

Ugly documentation detail

Testing documentation creation

We asked Claude to create a documentation project, add pages, preview the markdown rendering, and publish it. The documentation topic: how to build a documentation website using our docs-builder. We also required a high-contrast, black-and-white-only color scheme for accessibility.

Dev-browser took 12 minutes and consumed around 22,000 tokens. The Chrome extension consumed around 45,000 tokens, roughly twice as many, across multiple iterations, and took 20 minutes. Dev-browser's visual feedback was better, though we can't pinpoint why.

Both tools caught a critical bug: the publish feature wasn't implemented on the frontend. Claude only checked the API endpoint. The first version looked like this:

Missing publish UI
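A gap like this is easy to surface from the page itself rather than the API. A minimal sketch of such a check; the button text and URL are assumptions about our app:

cd skills/dev-browser && npx tsx <<'EOF'
import { connect, waitForPageLoad } from "@/client.js";

const client = await connect();
const page = await client.page("docs-builder");

await page.goto("http://localhost:3001"); // project dashboard
await waitForPageLoad(page);

// Does the frontend actually expose a Publish control?
const publishButton = await page.$('button:has-text("Publish")');
console.log(publishButton ? "Publish button present" : "Publish button missing from the UI");

await client.disconnect();
EOF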

What both tools caught

Both tools identified the same typography bug: headings rendered at 16px with no bold weight. Dev-browser caught it in a screenshot (await page.screenshot({ path: "tmp/debug.png" })), and developer tools inspection confirmed fontSize: 16px, fontWeight: 400 on H1 elements.
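The same numbers can also be read programmatically instead of through developer tools. A minimal sketch using Playwright's page.evaluate; the URL and the assumption that headings are plain h1 elements are ours:

cd skills/dev-browser && npx tsx <<'EOF'
import { connect, waitForPageLoad } from "@/client.js";

const client = await connect();
const page = await client.page("docs-builder");

await page.goto("http://localhost:3001"); // a page with rendered markdown
await waitForPageLoad(page);

// Read the computed typography of the first heading
const h1Style = await page.evaluate(() => {
  const h1 = document.querySelector("h1");
  if (!h1) return null;
  const cs = getComputedStyle(h1);
  return { fontSize: cs.fontSize, fontWeight: cs.fontWeight };
});
console.log(h1Style); // before the fix: { fontSize: '16px', fontWeight: '400' }

await client.disconnect();
EOF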

Fixing the typography

We asked Claude to visit the rendered pages and verify the UI. We ran two sessions to compare visual feedback speed between the tools.

Claude installed these packages for markdown rendering:

Package          | Purpose
react-markdown   | Core markdown-to-React rendering
remark-gfm       | GitHub Flavored Markdown support (tables, task lists, strikethrough)
rehype-highlight | Syntax highlighting for code blocks
highlight.js     | CSS themes for code syntax coloring

Integration in Next.js:

import ReactMarkdown from 'react-markdown';
import remarkGfm from 'remark-gfm';
import rehypeHighlight from 'rehype-highlight';
import 'highlight.js/styles/github-dark.css';

<ReactMarkdown
  remarkPlugins={[remarkGfm]}
  rehypePlugins={[rehypeHighlight]}
>
  {selectedPage.content}
</ReactMarkdown>

Claude also added 95 lines of custom CSS typography rules to app/globals.css after discovering Tailwind's Typography plugin was incompatible with Tailwind v4's new syntax system.

Testing the complete workflow

Testing features independently works for QA validation. Each feature gets validated like a JIRA or Trello task. But scenario testing validates the complete workflow.

Complete workflow demo showing published documentation

We ran the final test: create an account, create documentation for using the docs builder, and verify the published result. We used dev-browser and the Chrome extension in separate sessions.

The Chrome extension took 32 minutes from signup to writing, viewing, and publishing the documentation. Dev-browser took 15 minutes for the same workflow, less than half the time.

Why dev-browser is faster

Dev-browser uses Playwright with a persistent browser context. Scripts are written in TypeScript and executed via npx tsx:

cd skills/dev-browser && npx tsx <<'EOF'
import { connect, waitForPageLoad } from "@/client.js";

const client = await connect();
const page = await client.page("docs-builder");
await page.goto("http://localhost:3001");
// automation code
await client.disconnect();
EOF

The Chrome extension uses MCP (Model Context Protocol) tools. Each action requires an explicit tool call:

mcp__claude-in-chrome__navigate(tabId, url)
mcp__claude-in-chrome__computer(action: "left_click", coordinate: [x, y])
mcp__claude-in-chrome__computer(action: "screenshot")
mcp__claude-in-chrome__read_page(tabId) // accessibility tree

The architectural difference matters. Dev-browser writes one script that executes many actions, amortizing the startup cost. The Chrome extension requires Claude to interpret each tool response before deciding the next action, adding 3-20 seconds of "thinking time" between steps.

Dev-browser has two further advantages: screenshots and LLM-friendly DOM snapshots. When Claude pairs a screenshot with a DOM snapshot optimized for token usage, it pinpoints what needs modification faster.
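We don't show dev-browser's own snapshot format here, but a rough equivalent using plain Playwright calls looks like the sketch below; the accessibility snapshot is a stand-in for dev-browser's LLM-friendly DOM snapshot, not its actual API:

cd skills/dev-browser && npx tsx <<'EOF'
import { connect, waitForPageLoad } from "@/client.js";

const client = await connect();
const page = await client.page("docs-builder");

await page.goto("http://localhost:3001");
await waitForPageLoad(page);

// A visual check Claude can open as an image
await page.screenshot({ path: "tmp/dashboard.png" });

// A compact, text-only view of the page that is cheap to feed back to the model
const tree = await page.accessibility.snapshot();
console.log(JSON.stringify(tree, null, 2));

await client.disconnect();
EOF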

Testing CSS injection speed

We tested how fast each tool could inject CSS and HTML into the running app without modifying code files.

Dev-browser injecting CSS

Dev-browser took 3 minutes 30 seconds and consumed 6.3k tokens.

Chrome extension injecting CSS

The Chrome extension took 3 minutes and consumed 4.6k tokens.

For this simple task, the Chrome extension was slightly faster and more token-efficient because the startup cost matters less with fewer actions.
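Under the hood, the dev-browser version of this task is essentially one Playwright call. A minimal sketch; the black-and-white rules stand in for the styles we asked for, and the URL is our local app:

cd skills/dev-browser && npx tsx <<'EOF'
import { connect, waitForPageLoad } from "@/client.js";

const client = await connect();
const page = await client.page("docs-builder");

await page.goto("http://localhost:3001");
await waitForPageLoad(page);

// Inject CSS into the running app without touching the source files
await page.addStyleTag({
  content: `
    body { background: #ffffff; color: #000000; }
    h1, h2, h3 { color: #000000; font-weight: 700; }
  `,
});
await page.screenshot({ path: "tmp/injected-css.png" });

await client.disconnect();
EOF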

The break-even point

The Chrome extension excels at small tasks (1-15 actions) where each tool call executes in approximately 200 milliseconds with near-zero startup overhead. Dev-browser requires 2-3 seconds for process spawning, compilation, and connection establishment before any action occurs.

Dev-browser's strength emerges in complex workflows (20+ actions). The startup cost is paid once per script, then actions execute at native browser speed without thinking time between steps.
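A back-of-the-envelope model makes the break-even concrete. Every constant below is a loose assumption drawn from our observations (per-call latency, thinking time between tool calls, time for Claude to write the script), not a benchmark:

// Rough wall-clock model for a workflow of n browser actions.
// Chrome extension: ~0.3 s per tool call plus ~5 s of interpretation between
// steps (we observed 3-20 s, so 5 s is a middle-of-the-road guess).
function chromeExtensionSeconds(actions: number): number {
  return actions * (0.3 + 5);
}

// dev-browser: a one-off cost for Claude to write the script (assumed ~90 s)
// plus 2-3 s of startup, then ~0.3 s per action with no thinking time in between.
function devBrowserSeconds(actions: number): number {
  return 90 + 3 + actions * 0.3;
}

for (const n of [5, 20, 30]) {
  console.log(`${n} actions: chrome ~${chromeExtensionSeconds(n)}s, dev-browser ~${devBrowserSeconds(n)}s`);
}
// 5 actions:  the Chrome extension wins (~27 s vs ~95 s)
// 20 actions: roughly even (~106 s vs ~99 s)
// 30 actions: dev-browser wins (~159 s vs ~102 s), and the gap keeps widening

Under these assumptions the break-even lands at roughly 19 actions, consistent with the 15-20 action range shown in the graph further down.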

Here's the same workflow as a dev-browser script:

npx tsx <<'EOF'
import { connect, waitForPageLoad } from "@/client.js";

const client = await connect();
const page = await client.page("docs-workflow");

// Step 1: Navigate to dashboard
await page.goto("http://localhost:3001");
await waitForPageLoad(page);

// Step 2: Login
await page.fill('input[type="email"]', 'test@example.com');
await page.fill('input[type="password"]', 'password123');
await page.click('button:has-text("Login")');
await waitForPageLoad(page);

// Step 3: Create new project
await page.click('text=New Project');
await page.fill('input[name="name"]', 'E2E Test Documentation');
await page.fill('textarea[name="description"]', 'Created via automation');
await page.click('button:has-text("Create")');
await waitForPageLoad(page);

// Step 4: Add first page
await page.click('text=+ New Page');
await page.fill('input[placeholder="Page Title"]', 'Getting Started');
await page.fill('textarea', '# Welcome\n\nThis is automated content.');
await page.waitForTimeout(1000); // Auto-save

// Step 5: Add second page
await page.click('text=+ New Page');
await page.fill('input[placeholder="Page Title"]', 'API Reference');
await page.fill('textarea', '## Endpoints\n\n- GET /api/docs\n- POST /api/docs');
await page.waitForTimeout(1000);

// Step 6: Add third page
await page.click('text=+ New Page');
await page.fill('input[placeholder="Page Title"]', 'Examples');
await page.fill('textarea', '```js\nconst docs = new DocsBuilder();\n```');
await page.waitForTimeout(1000);

// Step 7: Publish
await page.click('button:has-text("Publish")');
await page.waitForTimeout(500);

// Step 8: Get public URL
const publicUrl = await page.evaluate(() => {
  const urlElement = document.querySelector('.public-url');
  return urlElement?.textContent;
});
console.log('Published at:', publicUrl);

// Step 9: Navigate to public URL to verify
await page.goto(publicUrl);
await waitForPageLoad(page);

// Step 10: Verify all pages are visible
const pageLinks = await page.$$('aside button');
console.log(`Found ${pageLinks.length} pages in public view`);

// Step 11: Take screenshot of public docs
await page.screenshot({ path: 'tmp/published-docs.png' });

await client.disconnect();
console.log('✅ Complete workflow finished');
EOF

The same workflow using the Chrome extension:

1. navigate → http://localhost:3000 (~300ms)
2. form_input → email field (~200ms)
3. form_input → password field (~200ms)
4. computer(click) → Login button (~300ms)
5. computer(click) → New Project (~300ms)
6. form_input → project name (~200ms)
7. form_input → description (~200ms)
8. computer(click) → Create button (~300ms)
9. computer(click) → + New Page (~300ms)
10. form_input → page title (~200ms)
11. form_input → textarea content (~200ms)
... (continue for pages 2 & 3)
12. computer(click) → Publish (~300ms)
13. javascript_tool → extract URL (~200ms)
14. navigate → public URL (~300ms)
15. javascript_tool → count pages (~200ms)
16. computer(screenshot) → capture (~200ms)

Each tool call requires Claude to interpret the response before making the next call, consuming more tokens.

The graph below visualizes this performance difference. The Chrome extension's linear execution (blue line) starts quickly but slows as tasks become more complex. Dev-browser's approach (red line) has a higher startup cost but scales better, with the break-even point occurring around 15-20 actions. Beyond this threshold, dev-browser's advantage keeps growing with every additional action.

Performance comparison

Final thoughts

At Ritza, time matters. We write technical content, and we need Claude Code to test its work quickly. We recommend dev-browser for agentic browser testing and automation.

Our projects showcase client products with more complexity than to-do apps. Dev-browser excels at complex workflows, using roughly 33% fewer tokens and finishing in about half the time compared to the Chrome extension, and the gap keeps widening as workflows grow past 20 actions.

The Chrome extension works well for quick validations under 15 actions. But for end-to-end testing and scenarios you'll run repeatedly, dev-browser's upfront script investment pays off immediately.