Built-in Skills

Reference for all built-in skills that ship with YokeBot.

Web Search

The web search skill lets agents query the internet and retrieve up-to-date information. YokeBot supports two search providers:

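| Provider     | Env Variable   | Notes                                                                              |
|--------------|----------------|------------------------------------------------------------------------------------|
| Tavily       | TAVILY_API_KEY | Optimized for AI consumption. Returns structured summaries. Recommended default.   |
| Brave Search | BRAVE_API_KEY  | Privacy-focused search engine. Returns traditional web results.                    |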

Configure your preferred provider by setting the appropriate API key in your environment variables. If both are set, agents can choose between them.
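As a rough illustration of the configuration rule above, the sketch below reports which providers are usable based on the environment. The two variable names come from the table; the helper names and the "Tavily first" ordering are assumptions for illustration only, not YokeBot's actual selection logic.

```python
import os
from typing import List, Mapping, Optional

# Env variable names are taken from the provider table. The ordering
# (Tavily first) is an assumption based on the "recommended default" note.
PROVIDER_ENV = {
    "tavily": "TAVILY_API_KEY",
    "brave": "BRAVE_API_KEY",
}


def available_providers(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the providers whose API key is set and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_ENV.items() if env.get(var, "").strip()]


def default_provider(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first available provider, or None if no key is configured."""
    providers = available_providers(env)
    return providers[0] if providers else None
```

Calling `available_providers()` with no argument reads the real process environment; passing a dictionary makes the check easy to exercise in isolation.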

Image Generation

Agents can generate images using multiple models. The default model is Nano Banana 2 — fast, high-quality, and cost-effective. Style references let agents provide up to 6 existing images to guide the visual output.

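| Model         | Strengths                                                          | Credit Cost |
|---------------|--------------------------------------------------------------------|-------------|
| Nano Banana 2 | Fast, versatile, supports style references via the /edit endpoint. | 100         |
| Seedream 3.0  | Photorealistic, high detail, great for product imagery.            | 100         |
| Flux          | Artistic styles, creative compositions.                            | 100         |

Skill: generate_image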
Required env: FAL_API_KEY
Parameters: prompt (required), aspect_ratio, num_images, image_urls (style refs, up to 6)
Tip: When image_urls are provided, the model automatically switches to its style-reference mode, using the provided images to guide the visual output while following the text prompt.
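The parameter rules for generate_image can be sketched as a small validation helper. The parameter names and the limit of 6 style references come from the skill description; the function name, the dictionary shape, and the "16:9" aspect-ratio format in the usage note are assumptions, since the exact call format is not specified here.

```python
from typing import Any, Dict, Optional, Sequence

MAX_STYLE_REFS = 6  # limit stated in the skill description


def build_generate_image_params(
    prompt: str,
    aspect_ratio: Optional[str] = None,
    num_images: Optional[int] = None,
    image_urls: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Validate inputs and return a parameter dict (hypothetical shape)."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if num_images is not None and num_images < 1:
        raise ValueError("num_images must be at least 1")
    if image_urls is not None and len(image_urls) > MAX_STYLE_REFS:
        raise ValueError(f"at most {MAX_STYLE_REFS} style reference images are allowed")

    params: Dict[str, Any] = {"prompt": prompt}
    # Optional parameters are included only when the caller supplies them.
    if aspect_ratio is not None:
        params["aspect_ratio"] = aspect_ratio
    if num_images is not None:
        params["num_images"] = num_images
    if image_urls:
        params["image_urls"] = list(image_urls)
    return params
```

For example, `build_generate_image_params("a red fox", aspect_ratio="16:9", image_urls=["https://example.com/ref.png"])` yields a dictionary with three keys, while passing seven URLs raises `ValueError` before any request would be made.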

Image Editing

The edit_image skill uses the FireRed model to modify existing images based on text instructions. Agents can change backgrounds, swap elements, adjust styles, or composite multiple images together.

Skill: edit_image
Provider: FireRed Image Edit
Required env: FAL_API_KEY
Parameters: prompt (required), image_url (required), aspect_ratio
Credit cost: 150
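To get a feel for the credit costs listed for generate_image (100) and edit_image (150), the following sketch totals the credits for a batch of calls. It assumes credits are charged once per skill call, regardless of model or num_images; that billing rule is an assumption and is not confirmed by this page.

```python
from typing import Dict

# Costs as listed on this page. Charging once per call is an assumption.
CREDIT_COST: Dict[str, int] = {
    "generate_image": 100,
    "edit_image": 150,
}


def estimate_credits(calls: Dict[str, int]) -> int:
    """Sum credits for a mapping of skill name -> number of calls."""
    total = 0
    for skill, count in calls.items():
        if skill not in CREDIT_COST:
            raise KeyError(f"no known credit cost for skill: {skill}")
        if count < 0:
            raise ValueError("call counts must be non-negative")
        total += CREDIT_COST[skill] * count
    return total


# Three generations followed by two edits: 3 * 100 + 2 * 150 = 600 credits.
print(estimate_credits({"generate_image": 3, "edit_image": 2}))
```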

Browser Automation

Agents have browser automation capabilities via Playwright. These tools let agents complete multi-step online tasks such as filling forms, submitting orders, downloading files, and navigating dashboards.

| Tool                  | Description                                             |
|-----------------------|---------------------------------------------------------|
| browser_navigate      | Navigate to a URL with SSRF protection.                 |
| browser_click         | Click an element by CSS selector or coordinates.        |
| browser_type          | Type text into a focused input field.                   |
| browser_screenshot    | Capture a screenshot of the current page.               |
| browser_snapshot      | Get an accessibility snapshot of the page DOM.          |
| browser_fill_form     | Fill multiple form fields at once.                      |
| browser_download_file | Download a file and save to workspace.                  |
| browser_ask_human     | Ask the human a question when the agent hits ambiguity. |
| browser_select_option | Select an option from a dropdown.                       |
| browser_press_key     | Press a keyboard key (Enter, Tab, etc.).                |

Browser tools are covered in detail in the Browser Automation section.
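The browser_navigate entry mentions SSRF protection. This page does not describe how YokeBot implements it, so the sketch below only illustrates the general idea: reject non-HTTP schemes and URLs that point at loopback, private, or link-local addresses. The function name and the exact rule set are assumptions. A production guard would typically also resolve hostnames and check every resolved address, which this sketch deliberately skips.

```python
import ipaddress
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}


def is_url_allowed(url: str) -> bool:
    """Return False for URLs that look like SSRF targets (illustrative rules only)."""
    try:
        parsed = urlparse(url)
        host = parsed.hostname
    except ValueError:
        return False  # malformed URL, e.g. an unterminated IPv6 bracket
    if parsed.scheme not in ALLOWED_SCHEMES or not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. A real guard would resolve the hostname and
        # apply the same address checks to each resolved IP.
        return True
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )
```

With these rules, `is_url_allowed("https://example.com/")` passes, while `http://127.0.0.1/`, `http://169.254.169.254/` (a common cloud metadata address), and `file:///etc/passwd` are all rejected.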

Video Generation

YokeBot supports two video generation models:

  • Kling — high-quality video generation from text prompts.
  • Wan — fast video generation suitable for iterative workflows.

Set the FAL_API_KEY environment variable to enable video generation skills.

3D Model Generation

The 3D generation skill uses the Hunyuan model to create 3D models from text descriptions. Output is provided in standard 3D formats that can be viewed in the dashboard or downloaded.

Music Generation

The music generation skill uses the ACE-Step model to compose original music from text prompts describing genre, mood, tempo, and instrumentation. Generated audio files are playable directly in the dashboard.

Sound Effects

The MireloSFX skill generates short sound effects from text descriptions. Useful for game development, video production, and creative projects.

Text Embedding

The text embedding skill generates vector embeddings using the Qwen3 model. These embeddings power the Knowledge Base's semantic search. Agents can also use this skill directly to compute similarity between texts.

Tip: Text embedding is automatically used by the Knowledge Base. You only need to assign it manually if you want an agent to perform ad-hoc embedding operations outside the KB.
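For ad-hoc similarity between two texts, a common approach is to compare their embedding vectors with cosine similarity. The sketch below is a minimal, dependency-free version. The vectors in the usage note are toy values rather than real Qwen3 output, and this page does not state which similarity measure the Knowledge Base itself uses.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction score close to 1 (for example `[1, 2, 3]` and `[2, 4, 6]`), and orthogonal vectors such as `[1, 0]` and `[0, 1]` score 0.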