How to build a simple stealth browser
Talking to Chrome over a WebSocket directly as an alternative to browser automation toolkits.
Why bother?
Puppeteer, Playwright, and Selenium all (typically, in modern versions at least) talk to Chrome over the Chrome DevTools Protocol (CDP). Signals that result from using CDP do leak from time to time, but they tend to get patched quickly. Further, Chrome DevTools itself uses CDP to orchestrate Chromium’s behaviour.
There are several reasons you might get caught by a bot protection tool while using something like Puppeteer, but there are two main reasons that I want to focus on in this blog post:
- Your tool (e.g. Puppeteer) is leaking that it’s a bot, either by design or by accident
Default Puppeteer injects __puppeteer_evaluation_script__ into error stacks, and default Playwright sets __pwInitScripts on the window object. Both set navigator.webdriver = true.
You can patch these out. puppeteer-extra-plugin-stealth tries, and it works against basic checks. But the patches themselves can introduce new artifacts: overriding native functions to hide navigator.webdriver means toString() on that getter no longer returns [native code], which is its own detection vector.
Another issue arises! Now your fingerprint looks specifically like puppeteer-extra-plugin-stealth.
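To make the toString() point concrete, here is a rough sketch of the kind of check a detection script could run. looksNative and the patched object are names I made up for illustration; in a real page the check would target the webdriver getter on Navigator.prototype. Some stealth patches try to spoof toString() as well, so treat this as the naive case only.

```typescript
// Returns true when a function's source text looks like a built-in.
function looksNative(fn: Function): boolean {
  return Function.prototype.toString.call(fn).includes("[native code]");
}

// Stand-in for a getter that a naive stealth patch installed from page JS.
const patched: object = {};
Object.defineProperty(patched, "webdriver", { get: () => false });
const patchedGetter = Object.getOwnPropertyDescriptor(patched, "webdriver")?.get;

console.log(looksNative(Math.max)); // a real built-in
console.log(patchedGetter !== undefined && looksNative(patchedGetter)); // the patch
```

The built-in passes the check and the user-defined getter does not, which is the artifact the patch leaves behind.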
- Your tool might not directly look like a bot, but it does look awfully similar to other suspicious users
A bot protection vendor can probably notice that you’re generating the same or very similar fingerprints to other users who show patterns of suspicious behaviour (though whether you are standing out or blending in here is somewhat up for debate!).
What should I do instead?
A simple alternative is to just talk to Chrome yourself.
CDP is just JSON messages over a WebSocket; the framework layer on top is what leaves fingerprints. A few hundred lines of TypeScript and your browser is just Chrome. From here you can build in human-like mouse movements and typing.
Varying your fingerprint is out of scope for this blog post, but it is important for being a bot at scale. I’ll leave that one for a follow-up.
The full source is on GitHub. This post walks through how it all works.
How to launch a browser (and kill it)
Step one: find your Chrome executable and spawn it as a child process. The key flag is --remote-debugging-port, which tells Chrome to run a small HTTP server on that port. We also give it a temp directory for user data so each instance is isolated, and set a few flags to skip first-run prompts and allow WebSocket connections:
import { spawn } from "node:child_process";

const args = [
  `--remote-debugging-port=${debuggingPort}`,
  `--user-data-dir=${tempDir}`,
  "--no-first-run",
  "--no-default-browser-check",
  "--remote-allow-origins=*",
  "--window-size=1440,900",
  "--window-position=100,50",
];
// Discard Chrome's stdin/stdout/stderr; we only talk to it over CDP
const chromeProcess = spawn(chromePath, args, {
  stdio: ["ignore", "ignore", "ignore"],
});
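The snippet above assumes you already have chromePath. Here is a minimal sketch of one way to find it. The install locations are common defaults that I am assuming, not anything Chrome guarantees, and findChrome is a hypothetical helper name.

```typescript
// Common default install locations per platform (assumptions; adjust for
// your machine or read the path from an environment variable instead).
const DEFAULT_CHROME_PATHS: Record<string, string[]> = {
  darwin: ["/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"],
  linux: ["/usr/bin/google-chrome", "/usr/bin/chromium", "/usr/bin/chromium-browser"],
  win32: [
    "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
    "C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe",
  ],
};

// `exists` is injected so the lookup can be exercised without touching the
// disk; in real use you would pass fs.existsSync.
function findChrome(platform: string, exists: (path: string) => boolean): string {
  const candidates = DEFAULT_CHROME_PATHS[platform] ?? [];
  const found = candidates.find((candidate) => exists(candidate));
  if (!found) {
    throw new Error(`No Chrome executable found for platform "${platform}"`);
  }
  return found;
}
```

In a Node script this would be called as findChrome(process.platform, existsSync).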
Step two: Chrome takes a moment to boot. The HTTP server isn’t ready immediately after the process spawns, so we poll http://127.0.0.1:{port}/json until it answers. When it does, we get back a JSON array of open targets (tabs, service workers, etc.). Each target has a webSocketDebuggerUrl, which is the URL we’ll use to talk to Chrome:
for (let attempt = 0; attempt < 10; attempt++) {
  try {
    const targets = await fetchJson(`http://127.0.0.1:${port}/json`);
    const page = targets.find((t) => t.type === "page");
    if (page?.webSocketDebuggerUrl) {
      return page.webSocketDebuggerUrl; // done, connect to this
    }
  } catch {
    // not ready yet
  }
  await sleep(500);
}
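The loop above leans on fetchJson and sleep and doesn’t say what happens when Chrome never comes up. Here is a self-contained sketch of the same polling idea. CdpTarget and waitForDebuggerUrl are names I have assumed, and the fetch function is passed in so the example doesn’t depend on any particular HTTP client.

```typescript
interface CdpTarget {
  type: string;
  webSocketDebuggerUrl?: string;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function waitForDebuggerUrl(
  port: number,
  fetchJson: (url: string) => Promise<CdpTarget[]>,
  attempts = 10,
  delayMs = 500,
): Promise<string> {
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      const targets = await fetchJson(`http://127.0.0.1:${port}/json`);
      const page = targets.find((t) => t.type === "page");
      if (page?.webSocketDebuggerUrl) {
        return page.webSocketDebuggerUrl;
      }
    } catch {
      // Most likely connection refused: the HTTP server is not up yet.
    }
    await sleep(delayMs);
  }
  // Fail loudly instead of silently returning undefined.
  throw new Error(`Chrome did not expose a page target on port ${port}`);
}
```

On a recent Node version the fetchJson argument could be as small as (url) => fetch(url).then((r) => r.json()).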
Now open a WebSocket to that URL and we’ve got a channel to communicate with Chrome.
To kill it: chromeProcess.kill() and delete the temp directory. Multiple tabs on the same Chrome instance means multiple WebSocket connections (one per target). Multiple independent browsers means multiple Chrome processes on different ports.
The CDP connection
CDP uses JSON-RPC-style messages over a WebSocket. Two kinds of messages come back from Chrome:
- Responses: You sent a command with an id, and Chrome replies with the same id plus a result (or error). Like a function call that returns.
- Events: Chrome pushes these unprompted when something happens (“the page finished loading”, “a network request fired”). They have a method name and params, but no id.
To handle both, you need two data structures. A map of pending promises keyed by command id, and a list of event listeners:
let nextId = 1;
const pending = new Map<
  number,
  { resolve: (response: CDPResponse) => void; reject: (error: Error) => void }
>();
const eventListeners: Array<(event: CDPEvent) => void> = [];
When any message arrives on the WebSocket, check whether it has an id. If yes, it’s a response to something we sent - pull the matching promise out of pending and resolve it. If no, it’s an event - broadcast it to all registered listeners:
ws.on("message", (raw) => {
  const msg = JSON.parse(raw.toString());
  if ("id" in msg) {
    // Response: route back to whoever sent the command
    const handler = pending.get(msg.id);
    if (handler) {
      pending.delete(msg.id);
      handler.resolve(msg);
    }
  } else {
    // Event: broadcast to all listeners. Iterate over a copy, because a
    // listener may remove itself from the list while we are looping.
    for (const listener of [...eventListeners]) {
      listener(msg);
    }
  }
});
Sending a command means assigning it the next id, stashing a promise in pending, and writing JSON to the socket. The promise resolves when the response handler above finds the matching id:
function send(method: string, params = {}): Promise<CDPResponse> {
  const id = nextId++;
  return new Promise((resolve, reject) => {
    pending.set(id, { resolve, reject });
    ws.send(JSON.stringify({ id, method, params }));
  });
}
Multiple commands can be in-flight at once because each has a unique id. You’ll also want a waitForEvent helper to register a temporary listener that resolves a promise when a specific event method arrives:
function waitForEvent(eventName: string, timeoutMs = 30000): Promise<CDPEvent> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      cleanup();
      reject(new Error(`Timed out waiting for ${eventName}`));
    }, timeoutMs);
    const listener = (event: CDPEvent) => {
      if (event.method === eventName) {
        cleanup();
        resolve(event);
      }
    };
    // Stop the timer and unregister the listener, whichever path finished
    const cleanup = () => {
      clearTimeout(timer);
      eventListeners.splice(eventListeners.indexOf(listener), 1);
    };
    eventListeners.push(listener);
  });
}
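Here is a sketch of how these pieces could fit together in one place, written against a tiny socket interface so it can be exercised without a browser. MinimalSocket, CDPMessage and createConnection are assumed names, not part of any library, and rejecting the promise when Chrome replies with an error is a choice I have made here, not something the snippets above do.

```typescript
// Hypothetical adapter: wrap your real WebSocket client to match this shape.
interface MinimalSocket {
  send(data: string): void;
  onMessage(handler: (raw: string) => void): void;
}

interface CDPMessage {
  id?: number;
  method?: string;
  params?: unknown;
  result?: unknown;
  error?: { code: number; message: string };
}

function createConnection(socket: MinimalSocket) {
  let nextId = 1;
  const pending = new Map<
    number,
    { resolve: (msg: CDPMessage) => void; reject: (err: Error) => void }
  >();
  const listeners: Array<(event: CDPMessage) => void> = [];

  socket.onMessage((raw) => {
    const msg: CDPMessage = JSON.parse(raw);
    if (msg.id !== undefined) {
      // Response: settle the promise stored under the same id.
      const handler = pending.get(msg.id);
      if (!handler) return;
      pending.delete(msg.id);
      if (msg.error) handler.reject(new Error(msg.error.message));
      else handler.resolve(msg);
    } else {
      // Event: broadcast over a copy so listeners can unregister safely.
      for (const listener of [...listeners]) listener(msg);
    }
  });

  function send(method: string, params: object = {}): Promise<CDPMessage> {
    const id = nextId++;
    return new Promise((resolve, reject) => {
      pending.set(id, { resolve, reject });
      socket.send(JSON.stringify({ id, method, params }));
    });
  }

  return { send, listeners };
}
```

Keeping the socket behind an interface also makes it easy to feed the router canned messages in tests.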
How to navigate to URLs
CDP is organized into domains: Page, Network, Input, DOM, Runtime, etc. By default Chrome doesn’t send you events from any of them so you have to opt in:
await send("Page.enable");
Before we navigate we register a listener for the page loaded event. This is important: register before firing off the navigation, otherwise we might miss the event if the page loads fast.
const loaded = waitForEvent("Page.loadEventFired");
await send("Page.navigate", { url });
await loaded;
So we send:
{ "id": 7, "method": "Page.navigate", "params": { "url": "https://..." } }
Chrome replies:
{ "id": 7, "result": { "frameId": "...", "loaderId": "..." } }
This just means “I’ve started navigating”, not that the page is loaded.
id: 7 matches what we sent, so CDPConnection routes it back to the right pending promise.
Meanwhile, Chrome is fetching HTML, parsing CSS, loading images, running scripts, etc. Once all of that is done and the page’s native load event fires, Chrome pushes:
{ "method": "Page.loadEventFired", "params": { "timestamp": 123456.789 } }
That’s what our waitForEvent call resolves on.
There’s a crazy amount going on between those two messages, which is why “what happens when you type google.com and press enter” is still a great interview question.
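The register-then-send ordering is easy to get wrong, so it is worth wrapping in a helper. A minimal sketch follows, assuming a connection object that exposes the send and waitForEvent functions from earlier; the Connection interface and the navigate name are mine. It also checks errorText, which Page.navigate includes in its result when the navigation fails.

```typescript
interface Connection {
  send(method: string, params?: object): Promise<{ result?: any }>;
  waitForEvent(eventName: string, timeoutMs?: number): Promise<unknown>;
}

async function navigate(conn: Connection, url: string, timeoutMs = 30000): Promise<void> {
  // Register the listener first so a fast load cannot slip past us.
  const loaded = conn.waitForEvent("Page.loadEventFired", timeoutMs);
  // If we bail out early below, the abandoned wait should not surface
  // later as an unhandled rejection when its timeout fires.
  loaded.catch(() => {});

  const response = await conn.send("Page.navigate", { url });
  const errorText = response.result?.errorText;
  if (errorText) {
    throw new Error(`Navigation to ${url} failed: ${errorText}`);
  }
  await loaded;
}
```

Without the errorText check, a failed navigation (a DNS error, say) would sit there until the load timeout expired.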
How to find things on a page
Say you want to find an element by XPath. To do that, you ask Chrome to run JavaScript on your behalf.
The CDP command for this is Runtime.evaluate. Send it a JavaScript expression string, Chrome evaluates it in the page’s JS context, and sends back the result:
const js = `document.evaluate(
  "/html/body/p[96]",
  document,
  null,
  XPathResult.FIRST_ORDERED_NODE_TYPE,
  null
).singleNodeValue`;
const response = await send("Runtime.evaluate", {
  expression: js,
  returnByValue: false,
});
That’s the browser’s standard document.evaluate() API, and JS can generally be expected to run inside Chrome exactly as if you’d typed it into the DevTools console.
Notice returnByValue: false. This changes what Chrome sends back:
- returnByValue: true: Chrome serializes the result to JSON and sends the actual data. This is crap for a DOM node.
- returnByValue: false: Chrome keeps the object in its own memory and sends back a remoteObjectId: an opaque string like {"injectedScriptId":1,"id":42}. This is a pointer to a specific object in Chrome’s JS heap.
If we get back a remoteObjectId and the subtype isn’t "null" (CDP’s way of saying document.evaluate returned null), we have a valid handle. We wrap it in a WebElement.
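As a sketch, that check could look like the following. xpathExpression and extractHandle are hypothetical helper names, and the interfaces only cover the fields used here. Note that on the wire the handle arrives in the remote object’s objectId field.

```typescript
interface RemoteObject {
  type: string;
  subtype?: string;
  objectId?: string;
}

interface EvaluateResponse {
  result?: {
    result: RemoteObject;
    exceptionDetails?: { text: string };
  };
}

// JSON.stringify quotes and escapes the XPath so it is safe to embed in JS.
function xpathExpression(xpath: string): string {
  return `document.evaluate(${JSON.stringify(xpath)}, document, null,
    XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue`;
}

// Returns the remote handle, or null when the XPath matched nothing.
function extractHandle(response: EvaluateResponse): string | null {
  const payload = response.result;
  if (!payload) return null;
  if (payload.exceptionDetails) {
    throw new Error(`Evaluation threw: ${payload.exceptionDetails.text}`);
  }
  const obj = payload.result;
  if (obj.subtype === "null" || !obj.objectId) return null;
  return obj.objectId;
}
```

Building the expression with JSON.stringify avoids broken quoting when the XPath itself contains quotes, such as //a[@id="x"].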
Operating on remote DOM nodes
Once we have a remoteObjectId, we want to wrap it in some object that remembers the handle and provides methods like textContent(), getAttribute(), click(). The pattern for every operation is the same, using Runtime.callFunctionOn to execute a function with this bound to the remote object:
const response = await send("Runtime.callFunctionOn", {
  objectId: remoteObjectId,
  functionDeclaration: "function() { return this.textContent; }",
  returnByValue: true,
});
// response.result.result.value is the text content string
Runtime.callFunctionOn is the companion to Runtime.evaluate. Instead of evaluating a raw expression, it calls a function with this bound to a specific remote object. Chrome executes the function with this pointing at our DOM node and sends the return value back.
The same pattern works for innerHTML, outerHTML, getAttribute(). You’re essentially building a proxy object: every method sends a CDP command, Chrome does the work, you get the result back over the WebSocket.
| Command | What it does | When we use it |
|---|---|---|
| Runtime.evaluate | Run arbitrary JS, get back a result or a remote handle | Finding elements – we need a handle to a DOM node |
| Runtime.callFunctionOn | Call a function with this bound to a remote object | Operating on elements – we have a handle, want to read data from it |
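A minimal sketch of that proxy idea, assuming a send function shaped like the one from earlier. wrapElement is a stand-in name of mine (not the WebElement from the full source), and values are passed to the remote function through the arguments parameter of Runtime.callFunctionOn.

```typescript
type Send = (method: string, params?: object) => Promise<{ result?: any }>;

function wrapElement(send: Send, objectId: string) {
  // Every method funnels through one Runtime.callFunctionOn round trip.
  async function call(functionDeclaration: string, args: unknown[] = []): Promise<unknown> {
    const response = await send("Runtime.callFunctionOn", {
      objectId,
      functionDeclaration,
      // Each argument is wrapped as { value } so Chrome passes it by value.
      arguments: args.map((value) => ({ value })),
      returnByValue: true,
    });
    return response.result?.result?.value;
  }

  return {
    textContent: () => call("function() { return this.textContent; }"),
    outerHTML: () => call("function() { return this.outerHTML; }"),
    getAttribute: (name: string) =>
      call("function(n) { return this.getAttribute(n); }", [name]),
  };
}
```

Usage would look like const link = wrapElement(send, handle); const href = await link.getAttribute("href");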
How to take a screenshot
Simple as Page.captureScreenshot.
const response = await send("Page.captureScreenshot", { format: "png" });
const base64 = response.result.data; // base64-encoded image
const buffer = Buffer.from(base64, "base64");
Chrome takes the screenshot, encodes it in the format you supply, base64-encodes the result and sends it over the WebSocket.
With no clip parameter, it captures exactly what’s visible in the viewport.
How to click
Send three CDP commands in sequence:
await send("Input.dispatchMouseEvent", { type: "mouseMoved", x, y });
await send("Input.dispatchMouseEvent", { type: "mousePressed", x, y, button: "left", clickCount: 1 });
await send("Input.dispatchMouseEvent", { type: "mouseReleased", x, y, button: "left", clickCount: 1 });
mouseMoved comes first because some sites track mouse movement patterns and will flag a click that materializes at coordinates with no preceding movement.
Chrome processes these through the same input pipeline as real user input. It takes the coordinates, hit-tests them against the rendered layout tree, and fires the full DOM event chain (mousedown -> mouseup -> click) on whatever element is at that position.
This is fundamentally different from calling element.click() in JavaScript. A JS click() bypasses hit-testing entirely, fires a synthetic event directly on the element, doesn’t trigger hover/focus state changes, doesn’t produce mousedown/mouseup events, and produces events with isTrusted: false. Sites can check for that. Input.dispatchMouseEvent clicks have isTrusted: true, same as a physical mouse.
Clicking an element by reference
You have a remoteObjectId but Input.dispatchMouseEvent needs pixel coordinates. You need to bridge from one to the other, and the element must be visible in the viewport for the coordinates to work.
The sequence:
1. Scroll the element into view. Use Runtime.callFunctionOn to call element.scrollIntoView({ block: "center" }) on your remote handle.
2. Get the bounding box. Use Runtime.callFunctionOn to call getBoundingClientRect(), which returns { x, y, width, height } in viewport-relative coordinates.
3. Compute a point confidently within the element and click. x + width/2, y + height/2 is the exact centre, but you probably don’t want to do exactly that.
Be careful of layout shift! If the page reflows between measuring and clicking, your coordinates are stale.
Also be careful of clicking exactly the same spot in an element every time; this is something protection systems actively monitor.
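The three steps above can be sketched roughly as follows. pickClickPoint, clickElement and the 0.6 spread are illustrative choices of mine, and send is assumed to be shaped like the function from earlier. The rect fields are copied out by hand inside the page because I wouldn’t rely on a DOMRect surviving returnByValue serialization intact.

```typescript
interface Rect { x: number; y: number; width: number; height: number }
type Send = (method: string, params?: object) => Promise<{ result?: any }>;

// Pick a point in the middle portion of the element, away from the edges.
// `spread` is the fraction of each dimension we allow ourselves to land in.
function pickClickPoint(rect: Rect, rng: () => number = Math.random, spread = 0.6) {
  const margin = (1 - spread) / 2;
  return {
    x: rect.x + rect.width * (margin + rng() * spread),
    y: rect.y + rect.height * (margin + rng() * spread),
  };
}

async function clickElement(send: Send, objectId: string): Promise<void> {
  // 1. Scroll the element into view.
  await send("Runtime.callFunctionOn", {
    objectId,
    functionDeclaration: 'function() { this.scrollIntoView({ block: "center" }); }',
  });
  // 2. Measure it, returning a plain object with just the fields we need.
  const response = await send("Runtime.callFunctionOn", {
    objectId,
    functionDeclaration: `function() {
      const r = this.getBoundingClientRect();
      return { x: r.x, y: r.y, width: r.width, height: r.height };
    }`,
    returnByValue: true,
  });
  const rect: Rect | undefined = response.result?.result?.value;
  if (!rect || rect.width === 0 || rect.height === 0) {
    throw new Error("Element has no visible box to click");
  }
  // 3. Move, press, release at a point inside the box.
  const { x, y } = pickClickPoint(rect);
  await send("Input.dispatchMouseEvent", { type: "mouseMoved", x, y });
  await send("Input.dispatchMouseEvent", { type: "mousePressed", x, y, button: "left", clickCount: 1 });
  await send("Input.dispatchMouseEvent", { type: "mouseReleased", x, y, button: "left", clickCount: 1 });
}
```

Measuring immediately before clicking keeps the window for layout shift small, though it does not eliminate it.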
How to type
The quick way is Input.insertText, which pastes the entire string in one frame.
A bot detection system watching keystroke timing will see zero inter-key delay, which is a fairly firm detection signal.
Better is to dispatch individual key events:
for (const char of text) {
  // keyDown
  await send("Input.dispatchKeyEvent", {
    type: "keyDown",
    key: char,
    code: charToCode(char), // "KeyA", "Digit5", "Space", etc.
    windowsVirtualKeyCode: charToKeyCode(char),
    text: char,
    modifiers: isUpperCase(char) ? 8 : 0, // 8 = shift held
  });
  // small random delay between down and up
  await sleep(Math.max(4, gaussRandom(8, 3)));
  // keyUp
  await send("Input.dispatchKeyEvent", {
    type: "keyUp",
    key: char,
    code: charToCode(char),
    windowsVirtualKeyCode: charToKeyCode(char),
  });
  await sleep(delayForNextChar);
}
Each keypress goes through Chrome’s real input pipeline, firing keydown/keypress/input events on the focused element. Pages listening for keydown events would notice if characters just appeared without them. The code field matters too, as it identifies the physical key (“KeyA”, “Digit2”, “Space”).
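The loop above leans on a few helpers. Here is one hedged way they could look, covering only letters, digits and space. The mappings beyond that, the nextCharDelay name (standing in for delayForNextChar) and its 120 ms / 40 ms numbers are my own assumptions for illustration, not measured typing data.

```typescript
// Physical key identifier for the characters this sketch knows how to map.
function charToCode(char: string): string {
  if (/^[a-zA-Z]$/.test(char)) return `Key${char.toUpperCase()}`;
  if (/^[0-9]$/.test(char)) return `Digit${char}`;
  if (char === " ") return "Space";
  throw new Error(`No key mapping for ${JSON.stringify(char)}`);
}

// For letters, digits and space the Windows virtual key code equals the
// character code of the uppercase character.
function charToKeyCode(char: string): number {
  charToCode(char); // reuse the validation above
  return char.toUpperCase().charCodeAt(0);
}

function isUpperCase(char: string): boolean {
  return /^[A-Z]$/.test(char);
}

// Box-Muller transform: a normally distributed sample with the given mean
// and standard deviation. `rng` is injectable for deterministic tests.
function gaussRandom(mean: number, stdDev: number, rng: () => number = Math.random): number {
  const u1 = 1 - rng(); // in (0, 1], so Math.log never sees 0
  const u2 = rng();
  const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
  return mean + stdDev * z;
}

// Illustrative inter-key delay in milliseconds, clamped to a floor.
function nextCharDelay(): number {
  return Math.max(30, gaussRandom(120, 40));
}
```

Punctuation and shifted symbols like "!" need their own entries (and the shift modifier), which this sketch deliberately leaves out.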
Congrats!
You’ve got a bare-bones stealth browser that hooks into a real Chrome! The sky is the limit.
Making it look human
Stay tuned for a follow-up post on this, but it comes down to:
- Use residential IPs where possible
- Find a way to move the mouse like a human
- Find a way to click like a human
- Find a way to type like a human
- Find a way to pause like a human
- Don’t always follow the same interaction patterns
Varying your fingerprint
Also stay tuned for a follow-up post! This is simpler to summarise though:
- Vary your browser
- Try very hard to not lie about your browser (see CreepJS for reading about lie detection)