Key Concepts
Essential concepts and terminology for understanding BoxLang AI - your guide to AI, embeddings, RAG, and more.
Understanding these core concepts will help you make the most of BoxLang AI. This guide explains the terminology and ideas you'll encounter throughout the documentation.
🤖 AI & Machine Learning
Artificial Intelligence (AI)
Computer systems that can perform tasks that typically require human intelligence, such as understanding language, recognizing patterns, and making decisions.
In BoxLang AI: All the AI providers (OpenAI, Grok, Claude, etc.) use AI to understand and respond to your prompts.
Large Language Model (LLM)
A type of AI trained on massive amounts of text data to understand and generate human-like text. Examples: GPT-4, Claude, Grok, Gemini.
Key characteristics:
Trained on billions of text examples
Can understand context and nuance
Generate coherent, contextual responses
Follow instructions and answer questions
Training vs Inference
Training: The process of teaching an AI model (done by provider companies, not by you)
Inference: Using a trained model to generate responses (what you do with BoxLang AI)
💬 Language Models
Temperature
Controls randomness in AI responses. Range: 0.0 to 2.0+ (lower = more deterministic, higher = more creative). Note that some providers don't offer a temperature setting or use a different range; check your provider's documentation for details.
BoxLang AI example:
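A minimal sketch, assuming `aiChat()` accepts a struct of model parameters alongside the prompt (check the module docs for the exact signature):

```java
// Low temperature: deterministic, focused answers
facts = aiChat( "Summarize the water cycle", { temperature: 0.2 } );

// High temperature: more creative, varied answers
poem = aiChat( "Write a short poem about rivers", { temperature: 1.2 } );
```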
Top P (Nucleus Sampling)
Alternative to temperature. Limits token selection to the top percentage of probability mass. Note that some providers don't offer a `topP` setting or use a different range; check your provider's documentation for details.
- `topP: 0.1` - Very focused (top 10% of likely words)
- `topP: 0.5` - Moderate variety
- `topP: 1.0` - Full vocabulary available (default)
Pro tip: Use either temperature OR topP, not both.
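For example, a sketch assuming `aiChat()` accepts a `topP` key in its options struct:

```java
// Focused sampling: only the top 10% of probability mass is considered
focused = aiChat( "List three facts about Mars", { topP: 0.1 } );
```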
Max Tokens
Maximum length of the AI's response, measured in tokens. This matters because it affects both cost and how much information the AI can return.
Note: Token limits include BOTH input (your prompt) and output (AI response).
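A sketch, assuming a `maxTokens` option (the exact parameter name may vary by provider):

```java
// Cap the response at 100 tokens (roughly 75 words)
brief = aiChat( "Explain quantum computing", { maxTokens: 100 } );
```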
Context Window
The maximum total tokens (input + output) a model can handle in one request.
| Model | Context Window |
| --- | --- |
| GPT-4 Turbo | 128,000 tokens (~96,000 words) |
| Claude 3 Opus | 200,000 tokens (~150,000 words) |
| Gemini 1.5 Pro | 2,000,000 tokens (~1.5M words) |
| Llama 3.1 8B | 128,000 tokens |
Check your provider's documentation for the exact limits of the model you are using.
Why it matters: Determines how much conversation history or document context you can include.
📨 Messages & Conversations
Message Roles
Every message in a conversation has a role:
- `system` - Instructions and constraints that shape the AI's behavior
- `user` - Messages from the human
- `assistant` - The AI's responses
- `tool` - Results returned from tool/function calls
Example conversation:
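A sketch using the raw message format (an array of role/content structs), assuming `aiChat()` accepts a messages array:

```java
response = aiChat( [
    { role: "system", content: "You are a helpful BoxLang tutor." },
    { role: "user", content: "What is an embedding?" },
    { role: "assistant", content: "An embedding is a vector of numbers that captures meaning..." },
    { role: "user", content: "What are they used for?" }
] );
```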
System Messages
Special instructions that guide AI behavior throughout the conversation. They set the tone, personality, and constraints for the AI.
Best practices:
Only ONE system message per conversation
Place at the beginning
Be specific and clear
Define personality, constraints, and output format
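A sketch of these practices in the raw message format (assuming `aiChat()` accepts a messages array):

```java
// One system message, placed first, with specific instructions
response = aiChat( [
    { role: "system", content: "You are a support agent for a hosting company. Be concise, stay on topic, and answer in markdown." },
    { role: "user", content: "How do I reset my password?" }
] );
```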
Multi-Turn Conversations
Conversations with multiple back-and-forth exchanges. The AI remembers context from previous messages.
Visual comparison:
- Without memory (stateless): each request stands alone, so the AI forgets previous turns and can't resolve references like "it" or "that".
- With memory (stateful): previous messages are sent along with each new request, so the AI keeps context across turns.
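The difference can be sketched like this (assuming `aiChat()` accepts a messages array):

```java
// Stateless: the model has no idea what "its" refers to
// aiChat( "What about its moons?" );

// Stateful: prior turns are resent, so the reference resolves
answer = aiChat( [
    { role: "user", content: "Tell me about Jupiter" },
    { role: "assistant", content: "Jupiter is the largest planet in our solar system..." },
    { role: "user", content: "What about its moons?" }
] );
```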
Multimodal Capabilities
Modern AI models can process and generate multiple types of content beyond just text, including images, audio, and video.
Supported modalities:
📝 Text - Natural language input and output (all models)
🖼️ Images - Image understanding and generation (GPT-4 Vision, Claude 3, Gemini)
🎵 Audio - Speech recognition and synthesis (Whisper, TTS models)
🎥 Video - Video analysis (some advanced models)
Vision with aiMessage() (fluent API):
Vision example (raw format for advanced use):
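A sketch of the raw format using OpenAI-style content parts, where `content` becomes an array mixing text and image entries (field names may differ per provider; the fluent `aiMessage()` form wraps the same structure):

```java
response = aiChat( [ {
    role   : "user",
    content: [
        { type: "text", text: "What is in this image?" },
        { type: "image_url", image_url: { url: "https://example.com/photo.jpg" } }
    ]
} ] );
```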
Common use cases:
📸 Image analysis - Describe photos, extract text from images (OCR)
🏷️ Content moderation - Detect inappropriate visual content
📋 Document processing - Extract data from receipts, forms, invoices
🔍 Visual search - Find similar images or products
♿ Accessibility - Generate alt text for images
🎨 Image generation - Create images from text descriptions
Model support:
| Model | Text | Images | Audio |
| --- | --- | --- | --- |
| GPT-4o | ✅ | ✅ | ✅ (via Whisper) |
| GPT-4 Turbo | ✅ | ✅ | ❌ |
| Claude 3 Opus/Sonnet | ✅ | ✅ | ❌ |
| Gemini 1.5 Pro | ✅ | ✅ | ✅ |
| Llama 3.2 Vision | ✅ | ✅ | ❌ |
Note: Check your provider's documentation for specific model capabilities and pricing for multimodal inputs.
🧬 Embeddings & Vectors
Embeddings
Numerical representations of text as vectors (arrays of numbers) that capture semantic meaning. These are used for tasks like semantic search and similarity comparisons.
Key properties:
Similar meanings = similar vectors
Mathematical operations preserve semantic relationships
Enables semantic search (find by meaning, not just keywords)
Vector Dimensions
The number of values in an embedding vector. Different models produce different dimensions:
- OpenAI `text-embedding-3-small`: 1536 dimensions
- OpenAI `text-embedding-3-large`: 3072 dimensions
- Cohere `embed-english-v3.0`: 1024 dimensions
- Voyage `voyage-2`: 1024 dimensions
Trade-off: More dimensions = better accuracy but more storage/compute.
Cosine Similarity
Measures how similar two vectors are. Mathematically the range is -1 to 1, but for text embeddings scores typically fall between 0 and 1. Cosine similarity is the most common way to compare embeddings.
- `1.0` - Identical meaning
- `0.8+` - Very similar
- `0.5` - Somewhat related
- `0.0` - Unrelated
Used for: Finding the most relevant documents in semantic search. The mathematical formula is:

similarity = (A · B) / (||A|| × ||B||)

Where A · B is the dot product of vectors A and B, and ||A|| and ||B|| are the magnitudes of vectors A and B.
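The calculation itself is simple enough to sketch in plain BoxLang (no AI calls involved):

```java
function cosineSimilarity( required array a, required array b ){
    var dot  = 0;
    var magA = 0;
    var magB = 0;
    for( var i = 1; i <= arrayLen( a ); i++ ){
        dot  += a[ i ] * b[ i ];   // dot product A · B
        magA += a[ i ] ^ 2;
        magB += b[ i ] ^ 2;
    }
    return dot / ( sqr( magA ) * sqr( magB ) ); // sqr() is square root
}

println( cosineSimilarity( [ 1, 0 ], [ 1, 0 ] ) ); // 1 (identical direction)
println( cosineSimilarity( [ 1, 0 ], [ 0, 1 ] ) ); // 0 (unrelated)
```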
Vector Database
Specialized database optimized for storing and searching vector embeddings.
Popular options in BoxLang AI:
ChromaDB - Local/cloud, easy to start
PostgreSQL (pgvector) - Enterprise-ready
Pinecone - Managed cloud service
Qdrant - High-performance
BoxVector - Built-in, simple in-memory option
Weaviate - Scalable, cloud-native
MySQL (with vector support) - Common relational DB
💭 Memory Systems
Conversation Memory
Stores chat history to maintain context across interactions.
Types:
Window: Keep last N messages (simple, memory-efficient)
Summary: Auto-summarize old messages (long conversations)
Session: Web session-based (per-user in web apps)
File: Persist to disk (survives restarts)
Cache: Distributed storage (multiple servers)
JDBC: Database-backed (enterprise apps)
Vector Memory
Stores documents as embeddings for semantic search. Enables RAG.
Types: ChromaDB, PostgreSQL, Pinecone, Qdrant, Weaviate, MySQL, TypeSense, BoxVector, Milvus
Use cases:
Knowledge bases
Document search
Question answering with context
Recommendation systems
Hybrid Memory
BoxLang AI offers hybrid memory that combines conversation and vector memory. This allows agents to maintain chat context while also retrieving relevant documents.
Best of both worlds: Recent chat history + relevant document retrieval.
Multi-Tenant Memory
BoxLang AI also supports multi-tenant memory for applications with multiple users by isolating memory per user or conversation.
Why important: Prevents users from seeing each other's data in shared applications.
🎯 RAG (Retrieval Augmented Generation)
What is RAG?
Retrieval Augmented Generation - A technique that gives AI models access to external knowledge by retrieving relevant documents and including them in the prompt.
The problem RAG solves:
AI models have a knowledge cutoff date
Can't access your private/proprietary data
May hallucinate facts
The RAG solution:
Store documents as embeddings in vector memory
When user asks a question, search for relevant docs
Include retrieved docs in the prompt as context
AI answers based on YOUR data, not just training data
RAG Workflow
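The two phases can be sketched in pseudo-BoxLang (`chunkDocument`, `embed`, and `vectorStore` are hypothetical names used only to illustrate the flow):

```java
// Phase 1 - Ingest: chunk documents and store their embeddings
chunks = chunkDocument( myDocument );
for( chunk in chunks ){
    vectorStore.add( embed( chunk ), chunk );
}

// Phase 2 - Query: retrieve relevant chunks and use them as context
relevantDocs = vectorStore.search( embed( userQuestion ), 3 );
answer = aiChat( [
    { role: "system", content: "Answer using only this context: " & jsonSerialize( relevantDocs ) },
    { role: "user", content: userQuestion }
] );
```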
Chunking
Breaking large documents into smaller segments that fit in context windows. BoxLang AI offers several chunking strategies.
Strategies:
Recursive (recommended): Split by paragraphs → sentences → words
Fixed size: Equal-sized chunks
Semantic: Split by meaning/topics
Why overlap matters: Ensures context isn't lost at chunk boundaries.
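A fixed-size chunker with overlap can be sketched in plain BoxLang:

```java
function chunkText( required string text, numeric size = 200, numeric overlap = 50 ){
    var chunks = [];
    var start  = 1;
    while( start <= len( text ) ){
        chunks.append( mid( text, start, size ) );
        start += size - overlap;
    }
    return chunks;
}

// Each chunk repeats the last 50 characters of the previous one,
// so a sentence split at a boundary still appears whole somewhere.
```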
🛠️ Tools & Function Calling
AI Tools
Functions that AI can call to access real-time data or perform actions. This is how you extend AI capabilities beyond text generation.
Example use cases:
Get current weather
Search databases
Execute calculations
Call external APIs
Retrieve user data
Function Calling
The process where AI decides to call a tool, executes it, and uses the result in its response.
Visual flow:
1. User: "What's the weather in Boston?"
2. AI thinks: "I need weather data, I'll call the get_weather tool"
3. Tool executes: `get_weather("Boston")` → `{temp: 15, condition: "cloudy"}`
4. AI responds: "The weather in Boston is 15°C and cloudy"
Key point: AI automatically decides when to use tools based on the conversation.
Tool Schemas
JSON description of tool parameters that AI uses to call functions correctly.
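An OpenAI-style schema for a hypothetical `get_weather` tool (the exact wrapper format varies by provider):

```json
{
    "name": "get_weather",
    "description": "Get the current weather for a given city",
    "parameters": {
        "type": "object",
        "properties": {
            "city": { "type": "string", "description": "The city name, e.g. Boston" }
        },
        "required": [ "city" ]
    }
}
```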
Tip: Clear descriptions help AI use tools correctly.
📡 Streaming & Async Computations
Streaming
Receiving AI responses in real-time as tokens are generated, rather than waiting for the complete response. BoxLang AI supports streaming for better user experience.
Benefits:
Better UX (immediate feedback)
Feels faster
Can display partial results
Process data as it arrives
Use when: Building chat UIs, long responses, real-time applications.
Server-Sent Events (SSE)
The underlying protocol used for streaming. Providers send data chunks over HTTP as they're generated. BoxLang offers native SSE support for compatible providers.
Async (Asynchronous)
Non-blocking operations that return immediately with a "promise" (Future) of the result.
Use when: Making multiple AI calls in parallel, background processing, non-UI operations.
Futures
A "promise" of a value that will be available later. Returned by async operations. You can read more about BoxLang Futures here: https://boxlang.ortusbooks.com/boxlang-framework/asynchronous-programming/box-futures
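A sketch, assuming an async variant such as `aiChatAsync()` that returns a BoxLang future (check the module docs for the exact function name):

```java
future = aiChatAsync( "Summarize the history of the JVM" );

// ... do other work while the request is in flight ...

result = future.get(); // blocks only when you actually need the value
```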
🔗 Pipelines & Composition
Pipelines
Composable workflows that chain AI operations together. Inspired by Unix pipes.
Benefits:
Reusable components
Testable steps
Clear data flow
Easy to modify
Runnables
Components that can be executed and chained in pipelines. They must implement `run()` and optionally `stream()` (i.e., implement the `AiRunnable` interface).
Runnable types:
- `AiModel` - AI provider integration
- `AiMessage` - Message templates
- `AiTransform` - Data transformations
- `AiAgent` - Autonomous agents
Chaining
Connecting runnables together using the `.to()` method.
Variable Binding
Using placeholders in templates that get replaced with actual values at runtime.
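Chaining and variable binding together, sketched with a hypothetical placeholder syntax (`{text}`) and a hypothetical `aiModel()` builder:

```java
// A message template piped into a model
pipeline = aiMessage( "Translate the following to French: {text}" )
    .to( aiModel( "openai" ) );

// The {text} placeholder is bound at runtime
result = pipeline.run( { text: "Good morning" } );
```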
🌐 Providers & Services
AI Provider
A company/service that offers AI models (OpenAI, Anthropic, Google, etc.).
BoxLang AI supports: OpenAI, Claude, Gemini, Groq, Grok, DeepSeek, Ollama, Perplexity, HuggingFace, Mistral, OpenRouter, Cohere, Voyage.
Service Instance
A configured connection to a specific AI provider.
When to use: Need fine-grained control, multiple configurations, or service reuse.
Model
A specific AI model within a provider (e.g., gpt-4, claude-3-opus, gemini-pro).
Model selection matters:
Speed: Smaller models are faster
Cost: Larger models cost more per token
Quality: Larger models generally perform better
Features: Some features only work with specific models
Local vs Cloud
Cloud providers (OpenAI, Claude): Hosted remotely, requires API key, charges per use
Local providers (Ollama): Runs on your machine, free, private, offline-capable
Ollama advantages:
✅ No API costs
✅ Complete privacy
✅ Works offline
✅ No rate limits
Cloud advantages:
✅ More powerful models
✅ No hardware requirements
✅ Always up-to-date
💰 Tokens & Costs
Token
The basic unit of text processing in language models. Roughly:
1 token ≈ 4 characters
1 token ≈ 0.75 words
100 tokens ≈ 75 words
Example:
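The rule of thumb can be sketched as a quick estimator in plain BoxLang:

```java
function estimateTokens( required string text ){
    return ceiling( len( text ) / 4 ); // ≈ 4 characters per token
}

println( estimateTokens( "BoxLang makes AI integration easy" ) ); // ~9 tokens
```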
Token Count
The number of tokens in a text. Important for:
Cost estimation (charged per token)
Context limits (max tokens per request)
Response sizing (limit output length)
Input vs Output Tokens
Input tokens: Your prompt + conversation history
Output tokens: AI's response
Cost difference: Output tokens often cost 2-3x more than input tokens!
Rate Limits
Maximum number of requests allowed per time period by providers.
Typical limits:
Free tier: 3-20 requests/minute
Paid tier: 60-10,000 requests/minute
Enterprise: Custom limits
Handling rate limits:
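A simple retry-with-backoff wrapper, sketched in plain BoxLang:

```java
function chatWithRetry( required string prompt, numeric maxAttempts = 3 ){
    for( var attempt = 1; attempt <= maxAttempts; attempt++ ){
        try {
            return aiChat( prompt );
        } catch( any e ){
            if( attempt == maxAttempts ){
                rethrow; // out of attempts, surface the error
            }
            sleep( ( 2 ^ attempt ) * 1000 ); // back off: 2s, 4s, ...
        }
    }
}
```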
🎯 Related Guides
📦 Installation - Get BoxLang AI set up
⚡ Quick Start - Your first AI interaction
🧩 Provider Setup - Configure AI providers
💬 Basic Chatting - Simple AI conversations
🤖 AI Agents - Autonomous AI assistants
🔮 Vector Memory - Semantic search
📄 RAG Guide - Retrieval Augmented Generation
💡 Quick Reference
Most Important Concepts:
Temperature - Controls randomness (0.0 = consistent, 1.0+ = creative)
Tokens - Basic unit of text (≈0.75 words, used for cost/limits)
Embeddings - Text as vectors for semantic search
RAG - Give AI access to your documents
Tools - Let AI call your functions
Memory - Maintain conversation context
Streaming - Real-time token-by-token responses
Pipelines - Chain AI operations together
When to use what:
- 🔥 Quick answers: `aiChat()`
- 💭 Conversations: `aiAgent()` with memory
- 📄 Your data: RAG with vector memory
🛠️ Real-time data: Tools/function calling
🎨 Consistent format: Structured output
⚡ Better UX: Streaming responses