Agents can browse the web, fill forms, download files, and ask humans for help.
YokeBot agents have full browser automation capabilities powered by Playwright. Agents can navigate websites, fill forms, click buttons, download files, take screenshots, and complete multi-step online tasks — all autonomously. When they hit ambiguity, they ask the human team for guidance.
The browser is integrated directly into the Workspace as a tab alongside files, data tables, and the video editor. You can observe agent browsing in real time or take control of the browser yourself.
| Mode | Who Drives | Use Case |
|---|---|---|
| Agent Browser | Agent drives, human observes. | Watch agents complete online tasks autonomously. |
| Take Control | Human drives, agent observes. | Record a login, intervene mid-task, or browse manually. |
Agents have access to 10+ browser tools during their heartbeat cycle:
| Tool | Description |
|---|---|
| browser_navigate | Go to a URL. Includes SSRF protection against private IPs and DNS rebinding. |
| browser_snapshot | Get an accessibility snapshot of the current page for understanding page structure. |
| browser_click | Click an element by CSS selector or pixel coordinates. |
| browser_type | Type text into the currently focused input field. |
| browser_press_key | Press a keyboard key (Enter, Tab, Escape, etc.). |
| browser_select_option | Select an option from a dropdown menu. |
| browser_screenshot | Capture a screenshot of the current page state. |
| browser_fill_form | Fill multiple form fields at once from a structured list. |
| browser_download_file | Download a file and save it to the team workspace. |
| browser_ask_human | Ask the human a question with optional multiple-choice answers. |
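
The SSRF protection mentioned for browser_navigate can be pictured with a short sketch. The code below is not YokeBot's implementation; it is a minimal illustration, assuming the check works by resolving the hostname once, rejecting any non-public address, and then connecting only to the vetted IPs (which is one common way to blunt DNS rebinding). The names `is_blocked_address` and `vet_url` are hypothetical.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_blocked_address(address: str) -> bool:
    """Return True if the IP literal points at a non-public network."""
    ip = ipaddress.ip_address(address)
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


def vet_url(url: str, resolver=socket.getaddrinfo) -> list[str]:
    """Resolve the URL's host once and return the vetted IP addresses.

    Raises ValueError if the scheme is not http(s) or if any resolved
    address is non-public. Connecting to one of the returned IPs, rather
    than resolving the hostname a second time, is the part that guards
    against DNS rebinding in this sketch.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported URL: {url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    # The resolver is injectable so the check can be exercised offline.
    infos = resolver(parts.hostname, port, type=socket.SOCK_STREAM)
    addresses = sorted({info[4][0] for info in infos})
    blocked = [a for a in addresses if is_blocked_address(a)]
    if blocked:
        raise ValueError(f"blocked address(es) for {parts.hostname}: {blocked}")
    return addresses
```

A caller would run `vet_url` before navigation and treat a `ValueError` as a refused request. Rejecting the whole hostname when any one of its addresses is private is a deliberately strict choice made for this sketch.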
When an agent encounters ambiguity while browsing (a form field it cannot fill, a choice it cannot make, a CAPTCHA, or a decision that requires business context), it calls browser_ask_human. For example:
Agent: "Which shipping option should I select?"
Options: Standard ($5.99), Express ($12.99), Overnight ($24.99)
Context: "I'm on the checkout page at example.com ordering the widgets you requested."
[screenshot attached]

The browser_fill_form tool lets agents populate multiple form fields in a single action:

    {
      "fields": [
        { "selector": "#name", "value": "Jane Smith" },
        { "selector": "#email", "value": "jane@example.com" },
        { "selector": "#company", "value": "Acme Corp" }
      ],
      "submit": false
    }

Set submit to true to automatically click the submit button after filling all fields.
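
To make the payload's semantics concrete, here is a rough sketch of how a handler might apply it. This is an illustration under stated assumptions, not the actual browser_fill_form implementation: `fill_form`, `PageLike`, `RecordingPage`, and the default submit selector `button[type=submit]` are all hypothetical. The `fill` and `click` methods mirror the ones on Playwright's Python `Page` object, so a real page could be passed in place of the recording stand-in.

```python
from typing import Protocol


class PageLike(Protocol):
    """The two page methods this sketch relies on."""

    def fill(self, selector: str, value: str) -> None: ...
    def click(self, selector: str) -> None: ...


def fill_form(page: PageLike, payload: dict,
              submit_selector: str = "button[type=submit]") -> int:
    """Fill every field in the payload, then optionally click submit.

    Returns the number of fields filled. The submit selector is an
    assumption; the real tool may locate the button differently.
    """
    fields = payload.get("fields", [])
    for field in fields:
        page.fill(field["selector"], field["value"])
    if payload.get("submit", False):
        page.click(submit_selector)
    return len(fields)


class RecordingPage:
    """Stand-in page that records calls, so the sketch runs without a browser."""

    def __init__(self) -> None:
        self.calls: list[tuple] = []

    def fill(self, selector: str, value: str) -> None:
        self.calls.append(("fill", selector, value))

    def click(self, selector: str) -> None:
        self.calls.append(("click", selector))


payload = {
    "fields": [
        {"selector": "#name", "value": "Jane Smith"},
        {"selector": "#email", "value": "jane@example.com"},
    ],
    "submit": True,
}
page = RecordingPage()
filled = fill_form(page, payload)
```

With `submit` set to true, the recorded calls end with a single click after both fields are filled; with it set to false, only the two fill calls are recorded.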
When an agent is actively browsing, you can watch in real time from the Workspace browser tab. Screenshots are streamed at roughly 2 fps via SSE (Server-Sent Events). From the live view, you can:
Browser sessions are secured with multiple layers: