Built-in Skills

Reference for all built-in skills that ship with YokeBot.

Web Search

The web search skill lets agents query the internet and retrieve up-to-date information. YokeBot supports two search providers:

Provider	Env Variable	Notes
Tavily	TAVILY_API_KEY	Optimized for AI consumption. Returns structured summaries. Recommended default.
Brave Search	BRAVE_API_KEY	Privacy-focused search engine. Returns traditional web results.

Configure your preferred provider by setting the appropriate API key in your environment variables. If both are set, agents can choose between them.
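
As a minimal sketch of the configuration check described above, the following Python helper lists the providers whose API key is present, with Tavily first because it is the recommended default. The helper name and the ordering rule are assumptions for illustration; only the two environment variable names come from this page.

```python
import os
from typing import List, Mapping, Optional

# Provider name -> environment variable, in order of preference.
# Tavily comes first because this page recommends it as the default.
SEARCH_PROVIDERS = {
    "tavily": "TAVILY_API_KEY",
    "brave": "BRAVE_API_KEY",
}


def available_search_providers(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return providers whose API key is set and non-empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name, var in SEARCH_PROVIDERS.items() if env.get(var)]
```

With only BRAVE_API_KEY set, the helper returns a list containing just "brave"; with neither key set, it returns an empty list, which a caller could treat as "web search not configured".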

Image Generation

Agents can generate images with several models. The default is Nano Banana 2, which is fast, high-quality, and cost-effective. Agents can also pass up to 6 existing images as style references to guide the visual output.

Model	Strengths	Credit Cost
Nano Banana 2	Fast, versatile, supports style references via /edit endpoint.	100
Seedream 3.0	Photorealistic, high detail, great for product imagery.	100
Flux	Artistic styles, creative compositions.	100

Skill: generate_image
Required env: FAL_API_KEY
Parameters: prompt (required), aspect_ratio, num_images, image_urls (style refs, up to 6)

When image_urls are provided, the model automatically switches to its style-reference mode, using the provided images to guide the visual output while following the text prompt.
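
The parameter rules for generate_image can be sketched as a small validation helper. This is an illustration only: build_generate_image_params is a hypothetical function, not part of YokeBot, and the page does not document accepted values for aspect_ratio or num_images, so they are passed through unchecked.

```python
from typing import Any, Dict, List, Optional

MAX_STYLE_REFS = 6  # limit stated on this page


def build_generate_image_params(
    prompt: str,
    aspect_ratio: Optional[str] = None,
    num_images: Optional[int] = None,
    image_urls: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble generate_image parameters (hypothetical helper, not a YokeBot API)."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if image_urls and len(image_urls) > MAX_STYLE_REFS:
        raise ValueError(f"at most {MAX_STYLE_REFS} style reference images are allowed")

    params: Dict[str, Any] = {"prompt": prompt}
    # Optional parameters are included only when the caller sets them.
    if aspect_ratio is not None:
        params["aspect_ratio"] = aspect_ratio
    if num_images is not None:
        params["num_images"] = num_images
    if image_urls:
        # Per this page, providing image_urls triggers style-reference mode.
        params["image_urls"] = list(image_urls)
    return params
```

Rejecting a seventh style reference before the call is made avoids spending a request on input that the skill is documented not to accept.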

Image Editing

The edit_image skill uses the FireRed model to modify existing images based on text instructions. Agents can change backgrounds, swap elements, adjust styles, or composite multiple images together.

Skill: edit_image
Provider: FireRed Image Edit
Required env: FAL_API_KEY
Parameters: prompt (required), image_url (required), aspect_ratio
Credit cost: 150
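
To budget a workflow that mixes generation and editing, the credit costs listed on this page can be summed. The sketch below assumes each call is billed once at the listed rate, regardless of parameters such as num_images; this page does not confirm how billing scales, and estimate_credits is a hypothetical helper.

```python
from typing import Iterable

# Credit costs as listed on this page.
CREDIT_COSTS = {
    "generate_image": 100,  # same for Nano Banana 2, Seedream 3.0, and Flux
    "edit_image": 150,
}


def estimate_credits(calls: Iterable[str]) -> int:
    """Sum the listed credit costs for a sequence of skill calls.

    Assumes one flat charge per call (an assumption, not documented behavior).
    Raises KeyError for a skill that has no cost listed on this page.
    """
    return sum(CREDIT_COSTS[skill] for skill in calls)
```

Under that assumption, one generate_image call followed by two edit_image calls comes to 100 + 150 + 150 = 400 credits.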

Browser Automation

Agents can automate a browser through Playwright. These tools let agents complete multi-step online tasks such as filling forms, submitting orders, downloading files, and navigating dashboards.

Tool	Description
browser_navigate	Navigate to a URL with SSRF protection.
browser_click	Click an element by CSS selector or coordinates.
browser_type	Type text into a focused input field.
browser_screenshot	Capture a screenshot of the current page.
browser_snapshot	Get an accessibility snapshot of the page DOM.
browser_fill_form	Fill multiple form fields at once.
browser_download_file	Download a file and save to workspace.
browser_ask_human	Ask the human a question when the agent hits ambiguity.
browser_select_option	Select an option from a dropdown.
browser_press_key	Press a keyboard key (Enter, Tab, etc.).

Browser tools are covered in detail in the Browser Automation section.
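
A multi-step task is typically a sequence of these tool calls. The sketch below represents such a sequence as plain data and checks it against the tool names in the table above. The argument names (url, fields, key) and the unknown_tools helper are assumptions for illustration; this page does not document each tool's parameters.

```python
from typing import Any, Dict, List, Tuple

# Tool names as listed in the table above.
BROWSER_TOOLS = {
    "browser_navigate", "browser_click", "browser_type", "browser_screenshot",
    "browser_snapshot", "browser_fill_form", "browser_download_file",
    "browser_ask_human", "browser_select_option", "browser_press_key",
}

# One step is a tool name plus its arguments.
Step = Tuple[str, Dict[str, Any]]


def unknown_tools(plan: List[Step]) -> List[str]:
    """Return tool names in the plan that are not listed browser tools."""
    return [tool for tool, _args in plan if tool not in BROWSER_TOOLS]


# Example plan for a login flow; argument names are hypothetical.
login_plan: List[Step] = [
    ("browser_navigate", {"url": "https://example.com/login"}),
    ("browser_fill_form", {"fields": {"#email": "user@example.com"}}),
    ("browser_press_key", {"key": "Enter"}),
    ("browser_screenshot", {}),
]
```

Checking a plan up front catches misspelled or unsupported tool names before any browser action runs.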

Video Generation

YokeBot supports two video generation models:

Kling — high-quality video generation from text prompts.
Wan — fast video generation suitable for iterative workflows.

Set the FAL_API_KEY environment variable to enable video generation skills.

3D Model Generation

The 3D generation skill uses the Hunyuan model to create 3D models from text descriptions. Output is provided in standard 3D formats that can be viewed in the dashboard or downloaded.

Music Generation

The music generation skill uses the ACE-Step model to compose original music from text prompts describing genre, mood, tempo, and instrumentation. Generated audio files are playable directly in the dashboard.

Sound Effects

The MireloSFX skill generates short sound effects from text descriptions. Useful for game development, video production, and creative projects.

Text Embedding

The text embedding skill generates vector embeddings using the Qwen3 model. These embeddings power the Knowledge Base's semantic search. Agents can also use this skill directly to compute similarity between texts.
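
Similarity between two embeddings is commonly measured with cosine similarity. This page does not state which metric YokeBot uses, so the following is a generic sketch that takes two vectors already returned by an embedding model; the function name is illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Values near 1 indicate vectors pointing in nearly the same direction, which for text embeddings usually corresponds to similar meaning; values near 0 indicate unrelated texts.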

Text embedding is automatically used by the Knowledge Base. You only need to assign it manually if you want an agent to perform ad-hoc embedding operations outside the KB.