Key Concepts

Essential concepts and terminology for understanding BoxLang AI - your guide to AI, embeddings, RAG, and more.

Understanding these core concepts will help you make the most of BoxLang AI. This guide explains the terminology and ideas you'll encounter throughout the documentation.


🤖 AI & Machine Learning

Artificial Intelligence (AI)

Computer systems that can perform tasks that typically require human intelligence, such as understanding language, recognizing patterns, and making decisions.

In BoxLang AI: every supported provider (OpenAI, Grok, Claude, etc.) exposes AI models that understand and respond to your prompts.

Large Language Model (LLM)

A type of AI trained on massive amounts of text data to understand and generate human-like text. Examples: GPT-4, Claude, Grok, Gemini.

Key characteristics:

  • Trained on billions of text examples

  • Understand context and nuance

  • Generate coherent, contextual responses

  • Follow instructions and answer questions

Training vs Inference

  • Training: The process of teaching an AI model (done by provider companies, not by you)

  • Inference: Using a trained model to generate responses (what you do with BoxLang AI)


💬 Language Models

Temperature

Controls randomness in AI responses. The typical range is 0.0 to 2.0, though some providers don't expose a temperature setting at all, and others use different ranges; check your provider's documentation for details.

BoxLang AI example:
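
A minimal sketch, assuming your provider honors the `temperature` option:

```
// Low temperature → focused, repeatable answers (extraction, facts)
precise = aiChat( "List the planets in order from the sun", { temperature: 0.2 } );

// High temperature → more varied, creative answers (brainstorming, fiction)
creative = aiChat( "Invent a name for a coffee shop", { temperature: 1.2 } );
```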

Top P (Nucleus Sampling)

Alternative to temperature. Limits token selection to the top percentage of probability mass. Some providers don't offer a topP setting or use different ranges; check your provider's documentation for details.

  • topP: 0.1 - Very focused (top 10% of likely words)

  • topP: 0.5 - Moderate variety

  • topP: 1.0 - Full vocabulary available (default)

Pro tip: Use either temperature OR topP, not both.

Max Tokens

Maximum length of the AI's response, measured in tokens. This setting matters because it affects both cost and how much information the AI can return.

Note: Token limits include BOTH input (your prompt) and output (AI response).
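
For example, capping output length (the exact option key varies by provider; OpenAI-style APIs call it `max_tokens`):

```
// Cap the response length to control cost and output size
summary = aiChat( "Summarize the history of the JVM", { max_tokens: 200 } );
```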

Context Window

The maximum total tokens (input + output) a model can handle in one request.

| Model          | Context Window                  |
| -------------- | ------------------------------- |
| GPT-4 Turbo    | 128,000 tokens (~96,000 words)  |
| Claude 3 Opus  | 200,000 tokens (~150,000 words) |
| Gemini 1.5 Pro | 2,000,000 tokens (~1.5M words)  |
| Llama 3.1 8B   | 128,000 tokens                  |

Please verify with your provider for the exact limits of the model you are using.

Why it matters: Determines how much conversation history or document context you can include.


📨 Messages & Conversations

Message Roles

Every message in a conversation has a role:

  • system - Sets behavior and instructions for the AI

  • user - The human's messages

  • assistant - The AI's responses

Example conversation:
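
Here is one sketched in the raw role/content format most providers accept:

```
messages = [
    { role: "system",    content: "You are a concise geography tutor" },
    { role: "user",      content: "What is the capital of France?" },
    { role: "assistant", content: "The capital of France is Paris." },
    { role: "user",      content: "And what is its population?" }
];
answer = aiChat( messages );
```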

System Messages

Special instructions that guide AI behavior throughout the conversation. Use them to set the right tone and constraints for the AI.

Best practices:

  • Only ONE system message per conversation

  • Place at the beginning

  • Be specific and clear

  • Define personality, constraints, and output format

Multi-Turn Conversations

Conversations with multiple back-and-forth exchanges. The AI remembers context from previous messages.

Visual comparison:

  • Without memory (stateless): each call stands alone, so the AI knows nothing about previous exchanges.

  • With memory (stateful): prior messages are sent along with each request, so the AI can reference earlier context.
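
In code, the contrast looks roughly like this (the replayed-history array is a sketch of what a memory implementation does for you automatically):

```
// Stateless: two independent calls; the AI cannot resolve "it"
aiChat( "What is the Eiffel Tower?" );
aiChat( "How tall is it?" );  // no context, likely a confused answer

// Stateful: prior turns are replayed with each request
history = [
    { role: "user",      content: "What is the Eiffel Tower?" },
    { role: "assistant", content: "A wrought-iron tower in Paris, France." },
    { role: "user",      content: "How tall is it?" }
];
aiChat( history );  // now "it" clearly means the Eiffel Tower
```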

Multimodal Capabilities

Modern AI models can process and generate multiple types of content beyond just text, including images, audio, and video.

Supported modalities:

  • 📝 Text - Natural language input and output (all models)

  • 🖼️ Images - Image understanding and generation (GPT-4 Vision, Claude 3, Gemini)

  • 🎵 Audio - Speech recognition and synthesis (Whisper, TTS models)

  • 🎥 Video - Video analysis (some advanced models)

Vision with aiMessage() (fluent API):
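
A sketch, assuming the fluent builder exposes an image helper (the `image()` method name is an assumption; check the Messages chapter for the actual builder API):

```
// Hypothetical fluent vision message: image() is an assumed method name
msg = aiMessage( "What is in this picture?" )
    .image( "https://example.com/photo.jpg" );

description = aiChat( msg );
```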

Vision example (raw format for advanced use):
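
The raw form follows the widely used OpenAI-style content array, where a single user message mixes text and image parts (assuming the module passes this structure through to the provider):

```
messages = [ {
    role   : "user",
    content: [
        { type: "text", text: "What is in this picture?" },
        { type: "image_url", image_url: { url: "https://example.com/photo.jpg" } }
    ]
} ];
description = aiChat( messages );
```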

Common use cases:

  • 📸 Image analysis - Describe photos, extract text from images (OCR)

  • 🏷️ Content moderation - Detect inappropriate visual content

  • 📋 Document processing - Extract data from receipts, forms, invoices

  • 🔍 Visual search - Find similar images or products

  • ♿ Accessibility - Generate alt text for images

  • 🎨 Image generation - Create images from text descriptions

Model support:

| Model                | Text | Vision | Audio            |
| -------------------- | ---- | ------ | ---------------- |
| GPT-4o               | ✅   | ✅     | ✅ (via Whisper) |
| GPT-4 Turbo          | ✅   | ✅     | ❌               |
| Claude 3 Opus/Sonnet | ✅   | ✅     | ❌               |
| Gemini 1.5 Pro       | ✅   | ✅     | ✅               |
| Llama 3.2 Vision     | ✅   | ✅     | ❌               |

Note: Check your provider's documentation for specific model capabilities and pricing for multimodal inputs.


🧬 Embeddings & Vectors

Embeddings

Numerical representations of text as vectors (arrays of numbers) that capture semantic meaning. These are used for tasks like semantic search and similarity comparisons.

Key properties:

  • Similar meanings = similar vectors

  • Mathematical operations preserve semantic relationships

  • Enables semantic search (find by meaning, not just keywords)
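
Generating one might look like this (the `aiEmbed()` function name is an assumption; see the Embeddings chapter for the module's actual helper):

```
// Hypothetical embedding call: the function name is an assumption
vector = aiEmbed( "The quick brown fox jumps over the lazy dog" );
println( vector.len() );  // e.g. 1536 values with OpenAI text-embedding-3-small
```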

Vector Dimensions

The number of values in an embedding vector. Different models produce different dimensions:

  • OpenAI text-embedding-3-small: 1536 dimensions

  • OpenAI text-embedding-3-large: 3072 dimensions

  • Cohere embed-english-v3.0: 1024 dimensions

  • Voyage voyage-2: 1024 dimensions

Trade-off: More dimensions = better accuracy but more storage/compute.

Cosine Similarity

Measures how similar two vectors are. Mathematically the range is -1 to 1, but for typical text embeddings scores land between 0 and 1. Cosine similarity is the standard way to compare embeddings.

  • 1.0 - Identical meaning

  • 0.8+ - Very similar

  • 0.5 - Somewhat related

  • 0.0 - Unrelated

Used for: Finding the most relevant documents in semantic search. The mathematical formula is:

similarity(A, B) = (A · B) / (||A|| × ||B||)

Where A · B is the dot product of vectors A and B, and ||A|| and ||B|| are the magnitudes of vectors A and B.
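
To make the formula concrete, here is a minimal BoxLang implementation for two equal-length vectors (illustrative; in practice your vector store computes this for you):

```
// Cosine similarity of two embedding vectors
function cosineSimilarity( required array a, required array b ){
    var dot  = 0;
    var magA = 0;
    var magB = 0;
    for( var i = 1; i <= a.len(); i++ ){
        dot  += a[ i ] * b[ i ];
        magA += a[ i ] ^ 2;
        magB += b[ i ] ^ 2;
    }
    return dot / ( sqrt( magA ) * sqrt( magB ) );
}

println( cosineSimilarity( [ 1, 2, 3 ], [ 1, 2, 3.1 ] ) );  // ≈ 0.9998, nearly identical
```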


Vector Database

Specialized database optimized for storing and searching vector embeddings.

Popular options in BoxLang AI:

  • ChromaDB - Local/cloud, easy to start

  • PostgreSQL (pgvector) - Enterprise-ready

  • Pinecone - Managed cloud service

  • Qdrant - High-performance

  • BoxVector - Built-in, simple in-memory option

  • Weaviate - Scalable, cloud-native

  • MySQL (with vector support) - Common relational DB


💭 Memory Systems

Conversation Memory

Stores chat history to maintain context across interactions.

Types:

  • Window: Keep last N messages (simple, memory-efficient)

  • Summary: Auto-summarize old messages (long conversations)

  • Session: Web session-based (per-user in web apps)

  • File: Persist to disk (survives restarts)

  • Cache: Distributed storage (multiple servers)

  • JDBC: Database-backed (enterprise apps)
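
As a sketch of wiring a memory type into an agent (the option names are assumptions; see the Memory chapter for the exact configuration keys):

```
// Hypothetical window-memory configuration; option names are assumptions
agent = aiAgent(
    name         = "support-bot",
    instructions = "You are a helpful support agent",
    memory       = { type: "window", maxMessages: 10 }  // keep the last 10 messages
);
```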

Vector Memory

Stores documents as embeddings for semantic search. Enables RAG.

Types: ChromaDB, PostgreSQL, Pinecone, Qdrant, Weaviate, MySQL, TypeSense, BoxVector, Milvus

Use cases:

  • Knowledge bases

  • Document search

  • Question answering with context

  • Recommendation systems

Hybrid Memory

BoxLang AI offers hybrid memory that combines conversation and vector memory. This allows agents to maintain chat context while also retrieving relevant documents.


Best of both worlds: Recent chat history + relevant document retrieval.

Multi-Tenant Memory

BoxLang AI also supports multi-tenant memory for applications with multiple users by isolating memory per user or conversation.

Why important: Prevents users from seeing each other's data in shared applications.


🎯 RAG (Retrieval Augmented Generation)

What is RAG?

Retrieval Augmented Generation - A technique that gives AI models access to external knowledge by retrieving relevant documents and including them in the prompt.

The problem RAG solves:

  • AI models have a knowledge cutoff date

  • Can't access your private/proprietary data

  • May hallucinate facts

The RAG solution:

  1. Store documents as embeddings in vector memory

  2. When user asks a question, search for relevant docs

  3. Include retrieved docs in the prompt as context

  4. AI answers based on YOUR data, not just training data

RAG Workflow
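
A condensed sketch of the four steps above, with hypothetical helper names (`vectorMemory()`, `addDocument()`, `search()`) standing in for whatever your vector memory implementation provides:

```
// 1. Store documents as embeddings (helper names are hypothetical)
memory = vectorMemory( "boxvector" );
memory.addDocument( "Our refund policy allows returns within 30 days of purchase." );

// 2. Retrieve the most relevant documents for the user's question
question = "Can I return a product after two weeks?";
docs     = memory.search( question, 3 );  // top 3 matches

// 3 + 4. Include them as context so the AI answers from YOUR data
prompt = "Answer using only this context:" & chr( 10 )
    & docs.toList( chr( 10 ) ) & chr( 10 )
    & "Question: " & question;
answer = aiChat( prompt );
```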

Chunking

Breaking large documents into smaller segments that fit in context windows. BoxLang AI offers several chunking strategies.

Strategies:

  • Recursive (recommended): Split by paragraphs → sentences → words

  • Fixed size: Equal-sized chunks

  • Semantic: Split by meaning/topics

Why overlap matters: Ensures context isn't lost at chunk boundaries.
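
As an illustration (the function and parameter names below are hypothetical, not the module's actual API): with 500-character chunks and a 50-character overlap, each chunk repeats the tail of its predecessor, so a sentence split at a boundary still appears whole in at least one chunk.

```
// Hypothetical chunking call; names are illustrative only
chunks = aiChunk(
    text      = fileRead( "manual.txt" ),
    strategy  = "recursive",
    chunkSize = 500,  // max characters per chunk
    overlap   = 50    // characters repeated between neighboring chunks
);
```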


🛠️ Tools & Function Calling

AI Tools

Functions that AI can call to access real-time data or perform actions. This is how you extend AI capabilities beyond text generation.

Example use cases:

  • Get current weather

  • Search databases

  • Execute calculations

  • Call external APIs

  • Retrieve user data

Function Calling

The process where AI decides to call a tool, executes it, and uses the result in its response.

Visual flow:

  1. User: "What's the weather in Boston?"

  2. AI thinks: "I need weather data, I'll call get_weather tool"

  3. Tool executes: get_weather("Boston") → {temp: 15, condition: "cloudy"}

  4. AI responds: "The weather in Boston is 15°C and cloudy"

Key point: AI automatically decides when to use tools based on the conversation.
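
For instance, the weather flow above could be wired up like this (a sketch assuming an `aiTool( name, description, closure )` helper; verify the exact signature in the Tools chapter):

```
// Sketch of a callable tool; the aiTool() signature is an assumption
weatherTool = aiTool(
    "get_weather",
    "Get the current weather for a city",
    ( city ) => {
        // Real code would call a weather API here
        return { temp: 15, condition: "cloudy" };
    }
);

answer = aiChat( "What's the weather in Boston?", { tools: [ weatherTool ] } );
```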

Tool Schemas

JSON description of tool parameters that AI uses to call functions correctly.
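
For the weather tool above, the schema sent to the provider typically looks like this (OpenAI-style function schema shown; other providers use close variants):

```
{
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {
            "city": { "type": "string", "description": "City name, e.g. Boston" }
        },
        "required": [ "city" ]
    }
}
```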

Tip: Clear descriptions help AI use tools correctly.


📡 Streaming & Async Computations

Streaming

Receiving AI responses in real-time as tokens are generated, rather than waiting for the complete response. BoxLang AI supports streaming for better user experience.

Benefits:

  • Better UX (immediate feedback)

  • Feels faster

  • Can display partial results

  • Process data as it arrives

Use when: Building chat UIs, long responses, real-time applications.
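
Consuming a stream can look like this (the callback option shown is an assumption; see the Streaming chapter for the module's actual API):

```
// Hypothetical streaming call; the onChunk callback name is illustrative
aiChat(
    "Write a long story about dragons",
    {
        stream : true,
        onChunk: ( chunk ) => print( chunk )  // render tokens as they arrive
    }
);
```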

Server-Sent Events (SSE)

The underlying protocol used for streaming. Providers send data chunks over HTTP as they're generated. BoxLang offers native SSE support for compatible providers.

Async (Asynchronous)

Non-blocking operations that return immediately with a "promise" (Future) of the result.

Use when: Making multiple AI calls in parallel, background processing, non-UI operations.

Futures

A "promise" of a value that will be available later. Returned by async operations. You can read more about BoxLang Futures here: https://boxlang.ortusbooks.com/boxlang-framework/asynchronous-programming/box-futures


🔗 Pipelines & Composition

Pipelines

Composable workflows that chain AI operations together. Inspired by Unix pipes.

Benefits:

  • Reusable components

  • Testable steps

  • Clear data flow

  • Easy to modify

Runnables

Components that can be executed and chained in pipelines. Each implements the AiRunnable interface: run() is required, stream() is optional.

Runnable types:

  • AiModel - AI provider integration

  • AiMessage - Message templates

  • AiTransform - Data transformations

  • AiAgent - Autonomous agents

Chaining

Connecting runnables using the .to() method.

Variable Binding

Using placeholders in templates that get replaced with actual values at runtime.
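
A combined sketch of chaining and binding (the constructor forms and `{topic}` placeholder syntax are assumptions; see the Pipelines chapter for the real API):

```
// Hypothetical pipeline: a message template chained into a model with .to()
pipeline = aiMessage( "Write a haiku about {topic}" ).to( aiModel( "openai" ) );

// Bind the placeholder at run time
result = pipeline.run( { topic: "autumn rain" } );
```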


🌐 Providers & Services

AI Provider

A company/service that offers AI models (OpenAI, Anthropic, Google, etc.).

BoxLang AI supports: OpenAI, Claude, Gemini, Groq, Grok, DeepSeek, Ollama, Perplexity, HuggingFace, Mistral, OpenRouter, Cohere, Voyage.

Service Instance

A configured connection to a specific AI provider.

When to use: Need fine-grained control, multiple configurations, or service reuse.
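
A sketch, assuming an `aiService()` factory (the method and option names are illustrative assumptions):

```
// Hypothetical configured service instance; names are assumptions
openai = aiService( "openai", {
    apiKey     : getSystemSetting( "OPENAI_API_KEY" ),
    model      : "gpt-4o",
    temperature: 0.3
} );
answer = openai.chat( "Hello!" );
```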

Model

A specific AI model within a provider (e.g., gpt-4, claude-3-opus, gemini-pro).

Model selection matters:

  • Speed: Smaller models are faster

  • Cost: Larger models cost more per token

  • Quality: Larger models generally perform better

  • Features: Some features only work with specific models

Local vs Cloud

  • Cloud providers (OpenAI, Claude): Hosted remotely, requires API key, charges per use

  • Local providers (Ollama): Runs on your machine, free, private, offline-capable

Ollama advantages:

  • ✅ No API costs

  • ✅ Complete privacy

  • ✅ Works offline

  • ✅ No rate limits

Cloud advantages:

  • ✅ More powerful models

  • ✅ No hardware requirements

  • ✅ Always up-to-date


💰 Tokens & Costs

Token

The basic unit of text processing in language models. Roughly:

  • 1 token ≈ 4 characters

  • 1 token ≈ 0.75 words

  • 100 tokens ≈ 75 words

Example:
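
Approximate mappings (exact counts depend on the model's tokenizer):

  • "Hello!" → ~2 tokens

  • "BoxLang is a dynamic JVM language" → ~8 tokens

  • A 1,000-word article → ~1,300 tokens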

Token Count

The number of tokens in a text. Important for:

  • Cost estimation (charged per token)

  • Context limits (max tokens per request)

  • Response sizing (limit output length)

Input vs Output Tokens

  • Input tokens: Your prompt + conversation history

  • Output tokens: AI's response

Cost difference: Output tokens often cost 2-3x more than input tokens!
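
For example, at a hypothetical $2 per million input tokens and $6 per million output tokens (a 3x ratio), a request with 1,000 input tokens and 500 output tokens costs $0.002 + $0.003 = $0.005.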

Rate Limits

Maximum number of requests allowed per time period by providers.

Typical limits:

  • Free tier: 3-20 requests/minute

  • Paid tier: 60-10,000 requests/minute

  • Enterprise: Custom limits

Handling rate limits:
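
A common pattern is retry with exponential backoff, sketched below (catching `any` for brevity; match the specific error type your provider raises):

```
// Retry with exponential backoff; illustrative, not built-in module behavior
function chatWithRetry( required string prompt, numeric maxAttempts = 3 ){
    for( var attempt = 1; attempt <= maxAttempts; attempt++ ){
        try {
            return aiChat( prompt );
        } catch( any e ){
            if( attempt == maxAttempts ) rethrow;
            sleep( 1000 * 2 ^ attempt );  // back off: 2s, 4s, 8s...
        }
    }
}
```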



💡 Quick Reference

Most Important Concepts:

  1. Temperature - Controls randomness (0.0 = consistent, 1.0+ = creative)

  2. Tokens - Basic unit of text (≈0.75 words, used for cost/limits)

  3. Embeddings - Text as vectors for semantic search

  4. RAG - Give AI access to your documents

  5. Tools - Let AI call your functions

  6. Memory - Maintain conversation context

  7. Streaming - Real-time token-by-token responses

  8. Pipelines - Chain AI operations together

When to use what:

  • 🔥 Quick answers: aiChat()

  • 💭 Conversations: aiAgent() with memory

  • 📄 Your data: RAG with vector memory

  • 🛠️ Real-time data: Tools/function calling

  • 🎨 Consistent format: Structured output

  • ⚡ Better UX: Streaming responses
