Embeddings & Search

How vector embeddings and semantic search work in the Knowledge Base.

What Are Embeddings?

An embedding is a numerical vector representation of text that captures its semantic meaning. Texts with similar meanings produce vectors that are close together in embedding space. YokeBot uses embeddings to power semantic search — agents can find relevant documents even when the exact words differ.
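The "close together in embedding space" idea can be shown with a toy example. The vectors below are made up for illustration (real Qwen3 embeddings have 1024 dimensions and come from the model, not by hand):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: ~1.0 means same direction (similar meaning), ~0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional embeddings (illustrative values only).
refund_policy = [0.9, 0.1, 0.0]
money_back    = [0.8, 0.2, 0.1]   # similar meaning, different words
gpu_setup     = [0.0, 0.1, 0.9]   # unrelated topic

print(cosine_similarity(refund_policy, money_back))  # high (~0.98)
print(cosine_similarity(refund_policy, gpu_setup))   # low  (~0.01)
```

Even though "refund policy" and "money back" share no words, their vectors point in nearly the same direction, which is exactly what semantic search exploits.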

Embedding Model

YokeBot uses the Qwen3 embedding model to generate vectors. This model produces high-quality embeddings optimized for retrieval tasks across multiple languages.

| Property         | Value           |
| ---------------- | --------------- |
| Model            | Qwen3 Embedding |
| Dimensions       | 1024            |
| Max Input Tokens | 8192            |
| Provider         | Configurable    |

How Search Works

When an agent queries the Knowledge Base, the following steps occur:

  1. The query text is converted to an embedding vector using the same Qwen3 model.
  2. The vector is compared against all stored chunk embeddings using cosine similarity.
  3. The top-K most similar chunks are returned (K is configurable, default 5).
  4. The chunks' original text is injected into the agent's LLM context.
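The steps above can be sketched as follows. This is a minimal illustration, not YokeBot's actual implementation: the toy 3-dimensional vectors stand in for real Qwen3 output, and step 1 (embedding the query) is faked with a hard-coded vector:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def search(query_vector, chunk_index, top_k=5):
    """Steps 2-3: score every stored chunk, return the top-K by similarity."""
    scored = [
        (cosine_similarity(query_vector, vec), text)
        for text, vec in chunk_index.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Toy index mapping chunk text to its embedding (illustrative values).
chunk_index = {
    "Refunds are issued within 14 days.": [0.9, 0.1, 0.0],
    "GPUs require driver version 550+.":  [0.0, 0.1, 0.9],
    "Money-back requests go to support.": [0.8, 0.2, 0.1],
}

# Step 1 would embed the query with the same Qwen3 model; we fake it here.
query_vector = [0.85, 0.15, 0.05]
for score, text in search(query_vector, chunk_index, top_k=2):
    print(f"{score:.2f}  {text}")
```

Step 4 then simply concatenates the returned chunk texts into the agent's prompt.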

Tuning Search Quality

You can adjust search behavior with these parameters:

| Parameter              | Default | Description                                                                              |
| ---------------------- | ------- | ---------------------------------------------------------------------------------------- |
| `top_k`                | 5       | Number of chunks to retrieve per query.                                                   |
| `similarity_threshold` | 0.7     | Minimum similarity score (0–1) for a chunk to be included.                                |
| `chunk_size`           | 500     | Approximate chunk size in tokens. Smaller chunks give more precise matches but less context. |
| `chunk_overlap`        | 50      | Overlap in tokens between adjacent chunks, to preserve context across chunk boundaries.   |
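To see how these parameters interact, here is a rough sketch of overlap-aware chunking and threshold filtering. It operates on a plain token list and is only illustrative; YokeBot's actual chunker (tokenizer, boundary handling) may differ:

```python
def chunk_tokens(tokens, chunk_size=500, chunk_overlap=50):
    """Split a token list into overlapping chunks; each new chunk starts
    chunk_size - chunk_overlap tokens after the previous one."""
    stride = chunk_size - chunk_overlap
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), stride)]

def apply_threshold(scored, similarity_threshold=0.7):
    """Drop (score, text) pairs below the minimum similarity score."""
    return [(s, t) for s, t in scored if s >= similarity_threshold]

# A 1200-token document with defaults yields chunks starting at 0, 450, 900.
tokens = list(range(1200))
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])  # [500, 500, 300]
```

Note that raising `chunk_overlap` increases storage and embedding cost, since more tokens are embedded twice.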

Hybrid Search

For best results, YokeBot combines vector similarity search with keyword matching. If the agent's query contains specific names, codes, or identifiers that may not be captured well by embeddings alone, keyword search ensures those documents are still surfaced.
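One simple way to combine the two signals is a weighted blend of scores. The blending formula, the `alpha` weight, and the naive keyword scorer below are illustrative assumptions, not YokeBot's actual weighting:

```python
def keyword_score(query, text):
    """Fraction of query terms that appear verbatim in the chunk."""
    terms = query.lower().split()
    return sum(term in text.lower() for term in terms) / len(terms)

def hybrid_score(vector_score, kw_score, alpha=0.5):
    """Blend vector similarity with keyword overlap; alpha weights the two."""
    return alpha * vector_score + (1 - alpha) * kw_score

# An exact identifier like "ERR-4102" can score poorly on embeddings alone,
# but the keyword signal pulls the right chunk back to the top.
chunks = {
    "Error ERR-4102 means the index is corrupt.": 0.40,  # weak vector match
    "General troubleshooting guide.":             0.75,  # strong vector match
}
query = "what is ERR-4102"
for text, vec_score in chunks.items():
    print(f"{hybrid_score(vec_score, keyword_score(query, text)):.2f}  {text}")
```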

Performance Considerations

Embedding generation happens once per document upload and is the most compute-intensive step. Queries are fast even for large knowledge bases because vector search is optimized with approximate nearest neighbor (ANN) indexing.
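To give a flavor of why ANN indexing makes queries fast, here is a minimal sketch of one ANN technique, random-hyperplane locality-sensitive hashing: vectors are bucketed by which side of a few random hyperplanes they fall on, so a query only scans its own bucket instead of the whole index. This is just one ANN approach for illustration; the source does not specify which index YokeBot uses:

```python
import math
import random

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def signature(vec, hyperplanes):
    """Hash a vector to a bit-tuple: which side of each random hyperplane it lies on."""
    return tuple(sum(v * h for v, h in zip(vec, plane)) >= 0 for plane in hyperplanes)

random.seed(0)
dim, n_planes = 8, 4  # 4 planes -> up to 2**4 = 16 buckets
hyperplanes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]

# Bucket 1000 stored vectors by signature (done once, at index-build time).
vectors = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(1000)]
buckets = {}
for vec in vectors:
    buckets.setdefault(signature(vec, hyperplanes), []).append(vec)

# At query time, only the matching bucket is scanned exhaustively.
query = vectors[42]
candidates = buckets[signature(query, hyperplanes)]
best = max(candidates, key=lambda v: cosine_similarity(query, v))
print(len(candidates), "candidates scanned instead of", len(vectors))
```

The trade-off is the usual ANN one: a vector whose true nearest neighbor landed in a different bucket can be missed, which is why the search is "approximate".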

Tip: If you notice slow embedding on self-hosted instances, consider using a GPU-enabled machine or offloading embedding to a hosted provider.