Overview of YokeBot's media generation capabilities: images, video, 3D, music, and sound FX.
YokeBot agents can generate images, videos, 3D models, music, and sound effects. Each media type is backed by one or more dedicated AI models:
| Type | Model(s) |
|---|---|
| Image Generation | Nano Banana 2, Seedream 3.0, Flux |
| Image Editing | FireRed Image Edit |
| Video | Kling 3.0, Wan |
| 3D Model | Hunyuan |
| Music | ACE-Step |
| Sound FX | MireloSFX |
On YokeBot Cloud, media generation is included with your plan. For self-hosted instances, set the following environment variable:

    FAL_API_KEY=your_media_provider_key

Agents with media generation skills can produce content autonomously as part of their task work or in response to chat messages.
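For self-hosted setups, it can help to confirm that `FAL_API_KEY` is actually present before expecting media skills to work. The following is a minimal sketch of such a check, not part of YokeBot itself; the function name `media_generation_configured` is hypothetical, and the only detail taken from this page is the variable name.

```python
import os


def media_generation_configured(env=None):
    """Return True if FAL_API_KEY is set to a non-empty value.

    `env` defaults to the process environment; passing a plain dict
    makes the check easy to exercise without touching real settings.
    This helper is illustrative and is not a YokeBot API.
    """
    env = os.environ if env is None else env
    return bool(env.get("FAL_API_KEY", "").strip())


if __name__ == "__main__":
    if media_generation_configured():
        print("FAL_API_KEY is set; media generation can be enabled.")
    else:
        print("FAL_API_KEY is missing or empty; set it before using media skills.")
```

The check only verifies that a value exists. It does not validate the key against the media provider.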
Generated media is stored and displayed inline in chat messages or task comments. Files can be downloaded from the dashboard.
Media generation is more credit-intensive than text-only operations. Image generation typically costs 5–10x more credits than a standard text heartbeat, and video generation costs 20–50x more. Monitor your credit usage from the Billing page.
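To make the multipliers above concrete, here is a rough estimator. It is a sketch under stated assumptions: the 5–10x and 20–50x ranges come from this page, but the credit cost of one text heartbeat is not given here, so `heartbeat_credits` is a placeholder you would supply yourself, and `estimate_credits` is a hypothetical helper rather than a YokeBot function.

```python
# Multiplier ranges quoted on this page, relative to one standard text heartbeat.
COST_MULTIPLIERS = {
    "image": (5, 10),
    "video": (20, 50),
}


def estimate_credits(media_type, count, heartbeat_credits):
    """Return a (low, high) credit estimate for `count` generations.

    `heartbeat_credits` is the cost of one text heartbeat on your plan;
    this page does not state it, so the caller must provide it.
    """
    low, high = COST_MULTIPLIERS[media_type]
    return (low * count * heartbeat_credits, high * count * heartbeat_credits)


if __name__ == "__main__":
    # Assuming, purely for illustration, that one heartbeat costs 1 credit.
    print(estimate_credits("image", 4, 1))  # (20, 40)
    print(estimate_credits("video", 2, 1))  # (40, 100)
```

Under that illustrative 1-credit baseline, four images fall in the 20 to 40 credit range and two videos in the 40 to 100 credit range. Actual usage is what the Billing page reports.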