Vector Memory Systems
Comprehensive guide to vector memory systems for semantic search and retrieval in BoxLang AI applications.
Vector memory enables semantic search and retrieval using embeddings to find contextually relevant information based on meaning rather than just recency. This guide covers all vector memory implementations and how to choose the right one for your needs.
Looking for Standard Memory? For conversation history management, see the Memory Systems Guide.
Multi-Tenant Isolation
All vector memory providers support multi-tenant isolation through userId and conversationId parameters. This enables secure, isolated vector storage for:
Per-user isolation: Separate vector collections per user
Per-conversation isolation: Multiple conversations for same user
Combined isolation: Complete conversation isolation in shared collections
How Multi-Tenant Works
Vector memories automatically filter searches and retrievals by userId/conversationId:
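A minimal sketch of that scoping in practice. The `aiVectorMemory()` factory and its property names are illustrative assumptions used in all of this guide's sketches (check the module docs for the real constructor); only `getRelevant()`, `findSimilar()`, and `getAllDocuments()` are named by this guide itself.

```
// Hypothetical factory — every add and search is scoped to this tenant
memory = aiVectorMemory(
    provider       = "boxvector",
    userId         = "user-123",
    conversationId = "conv-456"
);

// Only vectors stored for user-123 / conv-456 can ever come back
relevant = memory.getRelevant( "shipping policy", 5 );
```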
Multi-Conversation Support
Isolate multiple conversations for the same user:
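Continuing the sketch above, the same user can own several fully isolated streams:

```
// Same user, two isolated conversation streams (same hypothetical factory)
supportChat = aiVectorMemory( provider = "boxvector", userId = "user-123", conversationId = "ticket-88" );
salesChat   = aiVectorMemory( provider = "boxvector", userId = "user-123", conversationId = "quote-12" );

// Content stored in one stream never surfaces in the other
supportChat.getRelevant( "refund status", 3 );  // ignores salesChat data
```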
Storage Strategy by Provider
| Provider | Tenant Storage | Filtering Mechanism |
| --- | --- | --- |
| BoxVector | Metadata | In-memory filter |
| Chroma | Metadata | `$and` operator |
| Milvus | Metadata | Filter expressions |
| MySQL | Dedicated columns | SQL `WHERE` |
| Postgres | Dedicated columns | SQL `WHERE` |
| Pinecone | Metadata | `$eq` operators |
| Qdrant | Payload root | Match filters |
| TypeSense | Root fields | `:=` filters |
| Weaviate | Properties root | GraphQL `Equal` |
| Hybrid | Delegates to vector provider | Provider-specific |
All providers support getAllDocuments(), getRelevant(), and findSimilar() with automatic tenant filtering.
For enterprise patterns, security considerations, and advanced multi-tenancy, see the Multi-Tenant Memory Guide.
Overview
Vector memory systems store conversation messages as embeddings (numerical vector representations) and enable semantic similarity search. Unlike standard memory that retrieves messages chronologically, vector memory finds the most relevant messages based on meaning.
Vector Memory Architecture
Key Benefits
Semantic Understanding: Find relevant context based on meaning, not just keywords
Long-term Context: Search across thousands of past messages efficiently
Intelligent Retrieval: Get the most relevant history, even if discussed long ago
Scalable: Handle large conversation datasets with specialized vector databases
Flexible: Choose from local (in-memory), cloud, or self-hosted solutions
Use Cases
Customer Support: Retrieve relevant past support cases
Knowledge Bases: Find similar questions and answers
Long Conversations: Maintain context across lengthy interactions
Multi-session: Remember user preferences across sessions
RAG Applications: Combine document retrieval with AI responses
How Vector Memory Works
Vector Search Process
1. Embedding Generation
When you add a message, it's converted to a vector embedding:
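Conceptually the flow looks like this (`add()` is an illustrative method name):

```
// add() stores the message and triggers one embedding call
memory.add( "The customer prefers express shipping" );
// Behind the scenes the embedding provider returns a vector such as
// [ 0.021, -0.345, 0.112, ... ]  (1536 floats for text-embedding-3-small),
// which is stored alongside the original text and its metadata
```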
2. Semantic Search
When retrieving context, vector memory finds similar messages:
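For example:

```
// The query is embedded, then compared against stored vectors
// (cosine similarity in the case of BoxVector)
results = memory.getRelevant( "what delivery speed does this customer want?", 3 );
// Matches the express-shipping note above by meaning,
// even though the two sentences share almost no keywords
```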
3. Integration with Agents
Agents automatically use vector memory for context:
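A sketch of the wiring; `aiAgent()` is an illustrative name, not a confirmed API:

```
// Hypothetical agent factory — the point is that memory is injected once
// and consulted automatically on every turn
agent = aiAgent(
    name   = "SupportBot",
    memory = memory
);
agent.run( "Which shipping option should I book for this customer?" );
```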
Choosing a Vector Provider
Quick Decision Matrix
| Provider | Best For | Setup | Cost | Performance | Multi-Tenant |
| --- | --- | --- | --- | --- | --- |
| BoxVector | Development, testing, small datasets | Instant | Free | Good | ✅ |
| Hybrid | Balanced recent + semantic | Easy | Low | Excellent | ✅ |
| ChromaDB | Python integration, local dev | Moderate | Free | Good | ✅ |
| PostgreSQL | Existing Postgres infrastructure | Moderate | Low | Good | ✅ |
| MySQL | Existing MySQL 9+ infrastructure | Moderate | Low | Good | ✅ |
| TypeSense | Fast typo-tolerant search, autocomplete | Easy | Free/Paid | Excellent | ✅ |
| Pinecone | Production, cloud-first | Easy | Paid | Excellent | ✅ |
| Qdrant | Self-hosted, high performance | Complex | Free/Paid | Excellent | ✅ |
| Weaviate | GraphQL, knowledge graphs | Complex | Free/Paid | Excellent | ✅ |
| Milvus | Enterprise, massive scale | Complex | Free/Paid | Outstanding | ✅ |
Detailed Recommendations
Development:
Use BoxVector for immediate prototyping
Use Hybrid when you need both recent and semantic context
Production (Cloud):
Pinecone: Best for cloud-native, managed service
Qdrant Cloud: Excellent performance, generous free tier
Production (Self-Hosted):
PostgreSQL: If you already use Postgres
MySQL: If you already use MySQL 9+
TypeSense: Fast typo-tolerant search with low latency
Qdrant: Best performance for self-hosted
Milvus: Enterprise-grade, handles billions of vectors
Special Use Cases:
ChromaDB: Python ML infrastructure
Weaviate: Complex queries, GraphQL API
Hybrid: Best of both worlds (recent + semantic)
Vector Memory Types
BoxVectorMemory
In-memory vector storage, perfect for development and testing.
Features:
No external dependencies
Instant setup
Full feature support
Cosine similarity search
Configuration:
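A configuration sketch using the same hypothetical factory as earlier; BoxVector needs no external services:

```
memory = aiVectorMemory(
    provider   = "boxvector",
    dimensions = 1536,   // must match your embedding model
    embeddings = { provider: "openai", model: "text-embedding-3-small" }
);
```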
Multi-Tenant Configuration:
Best For:
Local development
Testing
Small datasets (< 10,000 messages)
Proof of concepts
Limitations:
Data lost on restart
Limited to single instance
Memory usage grows with dataset
ChromaVectorMemory
ChromaDB integration for local vector storage.
Features:
Local persistence
Python ecosystem integration
Easy Docker deployment
Metadata filtering
Setup:
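ChromaDB's official Docker image runs locally with a single command (port 8000 is Chroma's default):

```bash
docker run -d -p 8000:8000 chromadb/chroma
```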
Configuration:
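Then point the memory at it (property names are illustrative):

```
memory = aiVectorMemory(
    provider   = "chroma",
    host       = "localhost",
    port       = 8000,
    collection = "conversations",
    embeddings = { provider: "openai", model: "text-embedding-3-small" }
);
```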
Multi-Tenant Configuration:
Best For:
Python-based infrastructure
Local development with persistence
Medium datasets (< 1M vectors)
PostgresVectorMemory
PostgreSQL with pgvector extension.
Features:
Use existing Postgres infrastructure
ACID compliance
Familiar SQL queries
Mature ecosystem
Setup:
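pgvector must be installed on the server and enabled once per database:

```sql
-- Run once per database; requires the pgvector extension on the server
CREATE EXTENSION IF NOT EXISTS vector;
```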
Configuration:
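A sketch assuming a BoxLang datasource named `myPgDsn` already exists (property names are illustrative):

```
memory = aiVectorMemory(
    provider   = "postgres",
    datasource = "myPgDsn",          // a configured BoxLang datasource
    table      = "vector_memory",    // illustrative table name
    dimensions = 1536,
    embeddings = { provider: "openai", model: "text-embedding-3-small" }
);
```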
Multi-Tenant Configuration:
Best For:
Existing PostgreSQL deployments
Applications requiring SQL access
Strong consistency requirements
Medium-large datasets
MysqlVectorMemory
MySQL 9+ with native VECTOR data type support.
Features:
Native vector storage (MySQL 9+)
Use existing MySQL infrastructure
ACID compliance
Familiar SQL ecosystem
Application-layer distance calculations (MySQL Community Edition compatible)
Requirements:
MySQL 9.0 or later (Community or Enterprise Edition)
Configured BoxLang datasource
VECTOR data type support
Setup:
MySQL 9 Community Edition includes native VECTOR data type support. No extensions are needed, and tables are created automatically.
Configuration:
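A sketch with the same hypothetical factory; `datasource` refers to a configured BoxLang datasource (see below), and the distance function is one of the options listed after this:

```
memory = aiVectorMemory(
    provider   = "mysql",
    datasource = "mysqlDsn",          // a configured BoxLang datasource
    table      = "vector_memory",     // illustrative; created automatically
    dimensions = 1536,
    distance   = "COSINE",            // COSINE, L2, or DOT (see below)
    embeddings = { provider: "openai", model: "text-embedding-3-small" }
);
```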
BoxLang Datasource Setup:
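Datasources are defined in `boxlang.json`. The exact property names below are indicative only; confirm them against the BoxLang configuration docs:

```json
{
    "datasources": {
        "mysqlDsn": {
            "driver": "mysql",
            "host": "localhost",
            "port": 3306,
            "database": "ai_memory",
            "username": "boxlang",
            "password": "${env.MYSQL_PASSWORD}"
        }
    }
}
```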
Distance Functions:
COSINE: Cosine distance (1 - cosine similarity), best for semantic search
L2: Euclidean distance (L2 norm), good for spatial data
DOT: Dot product similarity, efficient for normalized vectors
Usage Example:
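A short sketch; `add()` is an illustrative method name, while `findSimilar()` is named elsewhere in this guide:

```
// Note the ## escape for a literal # inside a BoxLang string
memory.add( "Order ##1042 shipped via DHL" );
similar = memory.findSimilar( "where is my package?", 5 );
```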
Multi-Tenant Configuration:
Best For:
Existing MySQL 9+ deployments
Organizations standardized on MySQL
Applications requiring SQL access
ACID compliance requirements
Medium-large datasets (millions of vectors)
Performance Notes:
Distance calculations performed in application layer (MySQL Community Edition compatible)
MySQL HeatWave (Oracle Cloud) provides native DISTANCE() function for optimal performance
Suitable for production use with proper indexing
Table is automatically created with collection-based indexing
MySQL Community vs HeatWave:
Community Edition (Free): VECTOR data type, app-layer distance calculations
HeatWave (Oracle Cloud): Native DISTANCE() function, VECTOR INDEX, GPU acceleration
TypesenseVectorMemory
TypeSense is a fast, typo-tolerant search engine optimized for instant search experiences and vector similarity search.
Features:
Lightning-fast search with typo tolerance
Native vector search support
Easy Docker deployment
RESTful API
Built-in relevance tuning
Excellent for autocomplete and instant search
Requirements:
TypeSense Server 0.23.0+ (vector search support)
HTTP/HTTPS access to TypeSense instance
API key for authentication
Setup:
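The documented Docker invocation looks like this; pin whichever release you use (anything 0.23.0+ supports vector search, 27.1 is shown as an example):

```bash
docker run -d -p 8108:8108 -v typesense-data:/data \
  typesense/typesense:27.1 --data-dir /data --api-key=xyz
```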
Configuration:
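Then connect with the same hypothetical factory (property names are illustrative):

```
memory = aiVectorMemory(
    provider   = "typesense",
    host       = "localhost",
    port       = 8108,
    protocol   = "http",
    apiKey     = "xyz",
    collection = "conversations",
    embeddings = { provider: "openai", model: "text-embedding-3-small" }
);
```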
TypeSense Cloud Configuration:
Usage Example:
Multi-Tenant Configuration:
Best For:
Applications requiring fast, low-latency search
Autocomplete and instant search features
Typo-tolerant semantic search
E-commerce product search
Documentation search
Customer support systems
Small to medium datasets (< 10M vectors)
TypeSense Advantages:
Speed: Sub-50ms search latency
Typo Tolerance: Built-in fuzzy search
Simple Setup: Single binary, easy Docker deployment
RESTful API: Simple HTTP API, easy integration
Relevance Tuning: Fine-grained control over ranking
Pricing:
Self-Hosted: Free (open source)
TypeSense Cloud:
Free tier: Development clusters
Paid: Production clusters from $0.03/hour
When to Choose TypeSense:
Need instant search with typo tolerance
Want simple deployment and management
Require low-latency semantic search
Building search-heavy applications
Need both keyword and vector search
Performance Notes:
Optimized for low-latency queries (< 50ms)
In-memory index for fast access
Horizontal scaling support
Efficient resource usage
PineconeVectorMemory
Pinecone managed cloud vector database.
Features:
Fully managed, no ops
Excellent performance
Auto-scaling
Built-in metadata filtering
Setup:
Sign up at pinecone.io
Create an index
Get API key
Configuration:
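A sketch; property names are illustrative, and `dimensions` must match the index you created in step 2:

```
memory = aiVectorMemory(
    provider   = "pinecone",
    apiKey     = getSystemSetting( "PINECONE_API_KEY" ),  // or your secret store
    indexName  = "conversations",
    dimensions = 1536,   // must match the Pinecone index
    embeddings = { provider: "openai", model: "text-embedding-3-small" }
);
```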
Multi-Tenant Configuration:
Best For:
Production cloud deployments
Teams without ML ops expertise
Rapid scaling requirements
Global deployments
Pricing:
Free tier: 1GB storage, 100K operations/month
Paid: Scales with usage
QdrantVectorMemory
Qdrant high-performance vector search engine.
Features:
Rust-based (excellent performance)
Rich filtering capabilities
Payload support
Self-hosted or cloud
Setup:
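Qdrant's official image runs locally on its default port 6333:

```bash
docker run -d -p 6333:6333 -v qdrant-data:/qdrant/storage qdrant/qdrant
```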
Configuration:
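Then connect (same hypothetical factory, illustrative property names):

```
memory = aiVectorMemory(
    provider   = "qdrant",
    host       = "localhost",
    port       = 6333,
    collection = "conversations",
    dimensions = 1536,
    embeddings = { provider: "openai", model: "text-embedding-3-small" }
);
```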
Multi-Tenant Configuration:
Best For:
High-performance requirements
Self-hosted production
Complex filtering needs
Large datasets (millions of vectors)
Qdrant Cloud:
Free tier: 1GB cluster
Excellent developer experience
WeaviateVectorMemory
Weaviate GraphQL vector database with knowledge graph capabilities.
Features:
GraphQL API
Automatic vectorization (optional)
Knowledge graph functionality
Rich schema support
Setup:
Configuration:
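A sketch with illustrative property names:

```
memory = aiVectorMemory(
    provider   = "weaviate",
    host       = "localhost:8080",
    scheme     = "http",
    className  = "Conversation",   // Weaviate class names are capitalized
    embeddings = { provider: "openai", model: "text-embedding-3-small" }
);
```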
Multi-Tenant Configuration:
Best For:
Complex entity relationships
Knowledge graph requirements
GraphQL preferences
Multi-modal applications
MilvusVectorMemory
Milvus enterprise-grade distributed vector database.
Features:
Massive scalability (billions of vectors)
Distributed architecture
GPU acceleration support
Enterprise features
Setup:
Configuration:
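A sketch with illustrative property names (19530 is Milvus's default gRPC port):

```
memory = aiVectorMemory(
    provider   = "milvus",
    host       = "localhost",
    port       = 19530,              // Milvus gRPC default
    collection = "conversations",
    dimensions = 1536,
    embeddings = { provider: "openai", model: "text-embedding-3-small" }
);
```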
Multi-Tenant Configuration:
Best For:
Enterprise deployments
Massive datasets (> 10M vectors)
High throughput requirements
GPU-accelerated search
Hybrid Memory
HybridMemory combines the benefits of both standard memory (recency) and vector memory (relevance).
How It Works
Maintains recent messages in a window
Stores all messages in vector database
Returns combination of recent + semantically relevant messages
Automatically deduplicates
Configuration
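One plausible shape, assuming a hypothetical `aiHybridMemory()` wrapper (the module's actual class and properties may differ):

```
memory = aiHybridMemory(
    windowSize    = 10,    // recent messages always included
    relevantLimit = 5,     // semantically similar messages merged in
    vectorMemory  = aiVectorMemory( provider = "boxvector" )
);
```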
Multi-Tenant Configuration:
Benefits
Recent Context: Always includes latest messages
Semantic Relevance: Finds related past conversations
Balanced: Best of both approaches
Automatic: No manual context management
Use Cases
Configuration Examples
Development Setup
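A zero-infrastructure development loop, using the guide's hypothetical factory (`add()` is illustrative):

```
// In-memory vectors: nothing to install, data lost on restart
memory = aiVectorMemory( provider = "boxvector" );
memory.add( "User prefers dark mode" );
writeDump( memory.getRelevant( "UI theme preference", 3 ) );
```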
Production (Cloud)
Production (Self-Hosted)
Embedding Provider Options
With Caching
Best Practices
1. Choose Appropriate Embedding Models
2. Use Metadata for Filtering
3. Optimize Collection Size
4. Monitor Performance
5. Use Hybrid for User-Facing Apps
6. Dimension Matching
Ensure embedding dimensions match across your application:
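One way to keep them in sync is a single shared constant (factory and property names are illustrative):

```
// Keep one constant so the collection schema and embedding model can't drift
EMBED_DIMS = 1536;  // text-embedding-3-small produces 1536-dim vectors
memory = aiVectorMemory(
    provider   = "qdrant",
    dimensions = EMBED_DIMS,
    embeddings = { provider: "openai", model: "text-embedding-3-small" }
);
```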
7. Use Multi-Tenant Isolation
Securely isolate user and conversation data in shared collections:
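A sketch of the pattern; the `session` keys shown are illustrative placeholders for your own auth layer:

```
// Derive tenant IDs from your auth layer, never from raw user input
memory = aiVectorMemory(
    provider       = "pinecone",
    indexName      = "shared-conversations",  // one index, many tenants
    userId         = session.userId,          // illustrative session keys
    conversationId = session.conversationId
);
```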
Advanced Usage
Custom Similarity Thresholds
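One plausible shape for a relevance floor; the `threshold` parameter name is an assumption, so check your provider's actual options:

```
// Drop weak matches below a similarity floor instead of always taking top N
results = memory.getRelevant(
    query     = "billing dispute",
    limit     = 5,
    threshold = 0.75   // illustrative parameter name
);
```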
Multi-Collection Strategy
Cross-Session Continuity
Batch Operations
Troubleshooting
Common Issues
1. Dimension Mismatch
Solution: Ensure embedding model dimensions match collection configuration
2. Connection Errors
Solution: Verify host, port, and network accessibility. Check firewall rules.
3. API Key Issues
Solution: Verify API keys for both embedding provider and vector database
4. Slow Performance
Solution:
Enable caching for embeddings
Use appropriate index type (Milvus, Qdrant)
Reduce limit parameter
Consider smaller embedding model
5. Out of Memory
Solution: Switch to a persistent vector database (Chroma, Postgres, etc.)
See Also
Memory Systems Guide - Standard conversation memory
Custom Vector Memory - Build your own provider
Embeddings Guide - Understanding embeddings
Agents Documentation - Using memory in agents
Examples - Complete working examples
Next Steps: Try the Vector Memory Examples or learn about building custom vector memory providers.