🧬 Embeddings

Generate numerical vector representations of text that capture semantic meaning. Embeddings power semantic search, recommendations, clustering, and similarity detection.

🎯 What are Embeddings?

Embeddings convert text into high-dimensional vectors (arrays of numbers) where semantically similar texts have similar vector representations.

🏗️ Embedding Architecture

Example:

"cat" → [0.2, -0.5, 0.8, 0.1, ...]      (1536 dimensions)
"kitten" → [0.3, -0.4, 0.7, 0.2, ...]   (similar vector)
"car" → [-0.6, 0.3, -0.2, 0.9, ...]     (different vector)

Key Properties:

  • Similar meanings = Close vectors

  • Different meanings = Distant vectors

  • Math operations preserve semantic relationships

  • Dimension count varies by model (typically 768-3072)
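The standard way to check "close" versus "distant" is cosine similarity. A minimal sketch using the toy 4-dimensional vectors from the example above (real embeddings have hundreds or thousands of dimensions):

```javascript
// Cosine similarity: 1 = same direction, 0 = unrelated, negative = opposing.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 4-d vectors standing in for real embeddings:
const cat = [0.2, -0.5, 0.8, 0.1];
const kitten = [0.3, -0.4, 0.7, 0.2];
const car = [-0.6, 0.3, -0.2, 0.9];

console.log(cosineSimilarity(cat, kitten).toFixed(2)); // 0.98 (close)
console.log(cosineSimilarity(cat, car).toFixed(2));    // -0.31 (distant)
```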

🔧 The aiEmbed() Function

Generate embeddings for single texts or batches.

🔄 Embedding Generation Flow

Basic Usage

Single Text

Batch Processing

Configuration Options

Provider Selection

Model Selection

Return Formats

Control what data is returned:

Raw Response (Default)

Full API response with metadata:

Embeddings Array

Get just the vectors:

First Vector

Get single vector for single input:
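For illustration, assuming the raw response follows the common OpenAI-style shape (a `data` array of `{ index, embedding }` objects plus `model` and `usage` metadata — check your provider's actual response), the three return formats relate like this:

```javascript
// A sample raw response in the OpenAI-style shape (values are illustrative):
const rawResponse = {
  model: 'text-embedding-3-small',
  data: [
    { index: 0, embedding: [0.2, -0.5, 0.8] },
    { index: 1, embedding: [0.3, -0.4, 0.7] },
  ],
  usage: { prompt_tokens: 4, total_tokens: 4 },
};

// "Embeddings array": just the vectors, in input order.
const embeddings = rawResponse.data.map((item) => item.embedding);

// "First vector": convenient when you embedded a single input.
const firstVector = embeddings[0];

console.log(embeddings.length); // 2
console.log(firstVector);       // [0.2, -0.5, 0.8]
```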

💡 Use Cases

Semantic Search

Find documents similar to a query using vector similarity:

Semantic Search Flow
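A self-contained sketch of the ranking step, using toy vectors in place of real aiEmbed() output:

```javascript
// Cosine similarity helper:
const cosine = (a, b) => {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2; }
  return dot / Math.sqrt(na * nb);
};

// Rank documents by similarity to the query vector. All vectors must
// come from the same embedding model.
function search(queryVector, docs, topK = 3) {
  return docs
    .map((doc) => ({ ...doc, score: cosine(queryVector, doc.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Toy 3-d vectors standing in for real embeddings:
const docs = [
  { text: 'How to feed a cat', vector: [0.9, 0.1, 0.0] },
  { text: 'Engine repair basics', vector: [0.0, 0.2, 0.9] },
  { text: 'Kitten care guide', vector: [0.8, 0.3, 0.1] },
];
const results = search([0.9, 0.2, 0.0], docs, 2);
console.log(results.map((r) => r.text)); // cat-related docs rank first
```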

Text Clustering

Group similar texts together:
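A minimal k-means sketch over toy 2-dimensional data; a real pipeline would cluster the vectors returned by aiEmbed() instead:

```javascript
// Minimal k-means clustering over embedding vectors.
function kMeans(vectors, k, iterations = 10) {
  // Seed centroids with the first k vectors (fine for a sketch;
  // production code would use k-means++-style seeding).
  let centroids = vectors.slice(0, k).map((v) => v.slice());
  let labels = new Array(vectors.length).fill(0);

  for (let iter = 0; iter < iterations; iter++) {
    // Assignment: each vector joins its nearest centroid.
    labels = vectors.map((v) => {
      let best = 0;
      let bestDist = Infinity;
      centroids.forEach((c, i) => {
        const d = v.reduce((sum, x, j) => sum + (x - c[j]) ** 2, 0);
        if (d < bestDist) { bestDist = d; best = i; }
      });
      return best;
    });
    // Update: move each centroid to the mean of its members.
    centroids = centroids.map((c, i) => {
      const members = vectors.filter((_, n) => labels[n] === i);
      if (members.length === 0) return c; // keep empty clusters in place
      return c.map((_, j) =>
        members.reduce((sum, m) => sum + m[j], 0) / members.length
      );
    });
  }
  return labels;
}

// Two obvious groups in 2-d:
const groups = kMeans([[0, 0.1], [5, 5.1], [0.1, 0], [5.1, 5]], 2);
console.log(groups); // items 0 and 2 share a label, as do 1 and 3
```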

Recommendations

Recommend items based on similarity:
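Item-to-item recommendation is the same ranking idea: find the nearest neighbors of an item's vector, excluding the item itself. A sketch with hypothetical catalog data:

```javascript
// Cosine similarity helper:
const cosine = (a, b) => {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2; }
  return dot / Math.sqrt(na * nb);
};

// Recommend the items most similar to a given one, excluding itself.
function recommend(itemId, items, topN = 2) {
  const target = items.find((it) => it.id === itemId);
  return items
    .filter((it) => it.id !== itemId)
    .map((it) => ({ ...it, score: cosine(target.vector, it.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN);
}

// Toy 2-d vectors standing in for real embeddings:
const items = [
  { id: 'a', title: 'Sci-fi novel', vector: [0.9, 0.1] },
  { id: 'b', title: 'Space opera', vector: [0.8, 0.2] },
  { id: 'c', title: 'Cookbook', vector: [0.1, 0.9] },
];
console.log(recommend('a', items, 1)[0].title); // 'Space opera'
```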

Duplicate Detection

Find duplicate or near-duplicate content:
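One simple approach: compare every pair and report those scoring above a similarity threshold. Near-duplicates usually score very high, but the right cutoff is model-dependent, so tune it on your own data:

```javascript
// Cosine similarity helper:
const cosine = (a, b) => {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2; }
  return dot / Math.sqrt(na * nb);
};

// Report pairs scoring above the threshold. O(n^2) — fine for small
// sets; use an approximate nearest-neighbor index for large ones.
function findDuplicatePairs(entries, threshold = 0.95) {
  const pairs = [];
  for (let i = 0; i < entries.length; i++) {
    for (let j = i + 1; j < entries.length; j++) {
      const score = cosine(entries[i].vector, entries[j].vector);
      if (score >= threshold) pairs.push({ a: entries[i].text, b: entries[j].text, score });
    }
  }
  return pairs;
}

// Toy vectors: the first two texts are paraphrases of each other.
const entries = [
  { text: 'Reset your password', vector: [0.70, 0.70, 0.10] },
  { text: 'How to reset a password', vector: [0.71, 0.69, 0.12] },
  { text: 'Billing FAQ', vector: [0.10, 0.20, 0.95] },
];
console.log(findDuplicatePairs(entries, 0.98).length); // 1
```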

RAG (Retrieval Augmented Generation)

Combine embeddings with AI chat for intelligent Q&A:
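The middle step of a RAG pipeline is assembling a grounded prompt from the retrieved chunks. The two surrounding steps — embedding the query to rank chunks, and sending the prompt to a chat model — are provider-specific and omitted from this sketch:

```javascript
// Build a prompt that grounds the model's answer in retrieved context.
function buildRagPrompt(question, retrievedChunks) {
  const context = retrievedChunks.map((c, i) => `[${i + 1}] ${c}`).join('\n');
  return (
    'Answer the question using only the context below. ' +
    'If the answer is not in the context, say so.\n\n' +
    `Context:\n${context}\n\n` +
    `Question: ${question}`
  );
}

// Hypothetical chunks, as if returned by a semantic search step:
const prompt = buildRagPrompt('What is the refund window?', [
  'Refunds are accepted within 30 days of purchase.',
  'Shipping takes 3-5 business days.',
]);
console.log(prompt);
```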

Advanced Techniques

Dimension Reduction

Some models support dimension reduction for faster processing:
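The client-side version of this is truncation plus re-normalization. Note this only preserves quality for models trained with Matryoshka-style objectives, such as OpenAI's text-embedding-3 family:

```javascript
// Keep the first n components and re-normalize to unit length.
function reduceDimensions(vector, n) {
  const truncated = vector.slice(0, n);
  const norm = Math.sqrt(truncated.reduce((sum, x) => sum + x * x, 0));
  return truncated.map((x) => x / norm);
}

const reduced = reduceDimensions([0.5, 0.5, 0.5, 0.5], 2);
console.log(reduced); // [0.7071..., 0.7071...]
```

OpenAI's text-embedding-3 models also accept a `dimensions` request parameter that performs the same reduction server-side.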

Caching Embeddings

Embedding calls cost money and add latency, so cache the results:
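A minimal in-memory cache keyed by the exact input text. Here `embedFn` stands in for whatever function actually calls the embedding API; the wrapper works for sync or async functions alike (an async `embedFn` simply gets its promise cached):

```javascript
// Wrap an embedding function with a memoizing cache.
function cachedEmbedder(embedFn) {
  const cache = new Map();
  return (text) => {
    if (!cache.has(text)) cache.set(text, embedFn(text));
    return cache.get(text);
  };
}

// Demo with a counting stub instead of a real API call:
let calls = 0;
const embed = cachedEmbedder((text) => { calls += 1; return [text.length, 0]; });

embed('hello');
embed('hello'); // served from cache, no second call
console.log(calls); // 1
```

For persistence across runs, swap the `Map` for a key-value store keyed by a hash of the normalized text.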

Batch Optimization

Process large datasets efficiently:
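Splitting the input list into fixed-size batches keeps each request within provider limits (providers cap how many inputs a single embeddings request may carry):

```javascript
// Split a large list of texts into fixed-size batches.
function toBatches(items, batchSize) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

const batches = toBatches(['a', 'b', 'c', 'd', 'e'], 2);
console.log(batches.length); // 3
```

Embed each batch in turn, optionally pausing between requests to respect rate limits.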

Chunked Document Embeddings

Embed large documents by chunks:
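A character-based chunker with overlap; the overlap keeps sentences that straddle a boundary retrievable from either side. Chunk sizes are a tuning choice, and these defaults are arbitrary:

```javascript
// Split a long document into overlapping character chunks.
function chunkText(text, chunkSize = 500, overlap = 50) {
  if (overlap >= chunkSize) throw new Error('overlap must be < chunkSize');
  const chunks = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    start += chunkSize - overlap;
  }
  return chunks;
}

const chunks = chunkText('x'.repeat(1200), 500, 50);
console.log(chunks.length); // 3
```

Embed each chunk (batched), then store (chunk, vector) pairs so search results can point back to the exact passage.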

Provider Comparison

OpenAI

Models:

  • text-embedding-3-small (1536 dimensions) - Default, balanced

  • text-embedding-3-large (3072 dimensions) - Highest quality

  • text-embedding-ada-002 (1536 dimensions) - Legacy

Pros:

  • High quality embeddings

  • Good for English text

  • Supports dimension reduction

Cons:

  • Requires API key

  • Costs money

  • Data sent to OpenAI servers

Usage:

Ollama

Models:

  • nomic-embed-text (768 dimensions) - Recommended

  • mxbai-embed-large (1024 dimensions) - High quality

  • Many others available

Pros:

  • Completely free

  • Runs locally

  • Private - data stays on your machine

  • No API key needed

Cons:

  • Requires Ollama installation

  • Slightly lower quality than OpenAI

  • Slower than API calls

Setup:
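Install Ollama, then pull an embedding model. On Linux the official install script works; macOS and Windows installers are available from ollama.com:

```shell
# Linux install (macOS/Windows: download the installer from ollama.com)
curl -fsSL https://ollama.com/install.sh | sh

# Pull the recommended embedding model
ollama pull nomic-embed-text
```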

Usage:

Gemini

Models:

  • text-embedding-004 (768 dimensions)

  • embedding-001 (768 dimensions) - Legacy

Pros:

  • Good quality

  • Google infrastructure

  • Competitive pricing

Cons:

  • Requires API key

  • Data sent to Google

Usage:

Voyage AI

Models:

  • voyage-3 (1024 dimensions) - Latest, highest quality

  • voyage-3-lite (512 dimensions) - Faster, more efficient

  • voyage-code-3 (1024 dimensions) - Optimized for code

  • voyage-finance-2 (1024 dimensions) - Financial documents

  • voyage-law-2 (1024 dimensions) - Legal documents

Pros:

  • State-of-the-art quality for RAG and semantic search

  • Specialized models for specific domains

  • input_type parameter optimizes for queries vs documents

  • Excellent performance on retrieval benchmarks

Cons:

  • Embeddings only (no chat support)

  • Requires API key

  • Free tier has 3 RPM rate limit

  • Data sent to Voyage servers

Setup:

Usage:

When to Use Voyage:

  • Building RAG (Retrieval Augmented Generation) systems

  • Semantic search requiring highest accuracy

  • Domain-specific applications (code, finance, legal)

  • When you can optimize queries vs documents separately

Note: Voyage specializes in embeddings only. For chat completions, use OpenAI, Claude, or another provider.

Cohere

Models:

  • embed-english-v3.0 (1024 dimensions) - Latest English model, best quality

  • embed-multilingual-v3.0 (1024 dimensions) - Supports 100+ languages

  • embed-english-light-v3.0 (384 dimensions) - Faster, lighter version

  • embed-english-v2.0 (4096 dimensions) - Legacy, larger model

Pros:

  • Excellent multilingual support (100+ languages)

  • input_type parameter optimizes for different use cases

  • Multiple model sizes for speed/quality tradeoffs

  • Also offers chat capabilities

  • Good documentation and examples

  • Competitive pricing

Cons:

  • Requires API key

  • Data sent to Cohere servers

  • Rate limits on free tier

Setup:

Usage:

Input Types:

  • search_query - Optimize for search queries

  • search_document - Optimize for documents being searched

  • clustering - Optimize for clustering tasks

  • classification - Optimize for classification tasks

When to Use Cohere:

  • Need multilingual embeddings (100+ languages)

  • Want to optimize separately for queries vs documents

  • Building search, clustering, or classification systems

  • Need both embeddings and chat in one provider

  • Want multiple model size options

Best Practices

1. Choose the Right Model

2. Batch When Possible

3. Cache Embeddings

4. Normalize Text

5. Handle Errors Gracefully

6. Monitor Costs
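For point 4, a simple normalizer that trims, collapses whitespace, and lowercases. Whether lowercasing helps depends on the model (embeddings are case-aware), so treat this as a starting point:

```javascript
// Normalize text before embedding so trivially different strings
// ("  Hello  World " vs "hello world") map to the same cache key.
function normalizeText(text) {
  return text
    .trim()
    .replace(/\s+/g, ' ') // collapse runs of whitespace, including newlines
    .toLowerCase();
}

console.log(normalizeText('  Hello,\n  WORLD  ')); // 'hello, world'
```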

Troubleshooting

Empty or Invalid Embeddings

Dimension Mismatches

Similarity Scores

Summary

Embeddings enable powerful semantic understanding:

  • Generate: Use aiEmbed() for single or batch processing

  • Search: Compare vectors with cosine similarity

  • Optimize: Cache embeddings, batch requests, choose right model

  • Apply: Semantic search, clustering, recommendations, RAG

Start with OpenAI for quality, or try Ollama for privacy and cost savings!
