Frequently Asked Questions
Frequently asked questions about BoxLang AI - answers to common questions about costs, providers, performance, and usage.
Quick answers to the most common questions about BoxLang AI. If you don't find your answer here, check the main documentation or ask in the community forum.
📋 Table of Contents
🚀 Getting Started
Why use BoxLang AI instead of calling provider APIs directly?
Short answer: Productivity, flexibility, and consistency.
Benefits:
✅ Unified API - Same code works with any provider (OpenAI, Claude, Gemini, etc.)
✅ Switch providers - Change one config setting, no code changes
✅ Built-in features - Memory, tools, RAG, streaming, agents out-of-the-box
✅ Less boilerplate - Focus on your app, not HTTP requests
✅ Type safety - Structured output with BoxLang classes
✅ Multi-tenant ready - Built-in user/conversation isolation
✅ Production features - Events, logging, error handling, timeouts
Example: Same code, different provider:
What's the easiest way to get started?
Install the module:
Set an API key (or use free Ollama):
Make your first call:
Full guide: Quick Start Guide
What's the best free option for learning/testing?
Ollama - Completely free, runs locally on your machine.
Advantages:
✅ No API key needed
✅ No usage charges
✅ Works offline
✅ Complete privacy
✅ No rate limits
Setup:
Best models for Ollama:
qwen2.5:3b- Fast, good for testing (3GB)llama3.2:3b- Meta's model, good quality (2GB)mistral:7b- Best quality for the size (4GB)
Guide: Installation - Ollama Setup
Can I use BoxLang AI without an internet connection?
Yes! Use Ollama for completely offline AI.
Once you've pulled a model, it runs entirely on your machine:
Limitations: Local models are smaller and less capable than cloud models (GPT-4, Claude), but great for:
Privacy-sensitive applications
Offline environments
Development/testing
Cost savings
🤖 Providers & Models
Which AI provider should I use?
It depends on your needs:
Claude
Long context, analysis
Medium
Medium
Cohere
Embeddings, RAG
Low
Fast
DeepSeek
Code generation, reasoning
Low
Fast
Gemini
Google integration, multimodal
Low
Fast
Grok
Meta models, cost-effective
Low
Fast
Groq
Speed (ultra-fast inference)
Low
⚡ Fastest
Hugging Face
Custom models, flexibility
Varies
Varies
Mistral
Open models, balance
Low
Fast
Ollama
Free, private, offline
Free
Slow (local)
OpenAI
General purpose, reliability
Medium
Fast
OpenRouter
Cost-effective, multi-cloud
Low
Fast
Perplexity
Research, citations
Medium
Medium
Voyage
Enterprise, custom solutions
High
Varies
Recommendations:
🎯 General use: OpenAI (GPT-4)
💰 Budget: Gemini or Groq
🏠 Free/Private: Ollama
📝 Long documents: Claude (200K context)
⚡ Speed: Groq
💻 Code: DeepSeek or OpenAI
Full comparison: Provider Setup Guide
Can I use multiple providers in the same application?
Yes! You can mix and match providers for different tasks:
Common patterns:
Fast provider for UI responsiveness, powerful for complex tasks
Cloud for production, Ollama for development
Specialized models for specific tasks (code, analysis, chat)
💰 Costs & Pricing
How much does it cost to use BoxLang AI?
BoxLang AI module: Free (open source)
AI Provider costs: Pay-per-use (except Ollama which is free)
Typical pricing (per 1M tokens):
GPT-3.5 Turbo: $0.50 input / $1.50 output
GPT-4o: $2.50 input / $10 output
Claude 3 Haiku: $0.25 input / $1.25 output
Gemini 1.5 Flash: $0.075 input / $0.30 output
Ollama: $0 (free!)
Real-world examples:
How can I reduce AI costs?
Top strategies:
Use cheaper models for simple tasks
Limit response length
Use Ollama for development/testing
Cache responses for repeated queries
Use summarization for long conversations
Batch requests instead of one-by-one
More tips: Advanced Topics - Performance (coming soon)
How do I estimate token counts before making a request?
Use the aiTokens() function:
Rule of thumb:
1 token ≈ 0.75 words
100 tokens ≈ 75 words
1000 tokens ≈ 750 words (1 page)
Guide: Utilities - Token Counting
⚡ Performance & Reliability
Why do I get different responses each time?
This is normal AI behavior due to temperature (randomness setting).
Temperature scale:
0.0- Deterministic (same response every time)0.7- Default (balanced, some variation)1.0+- Creative (high variation)
For consistent responses:
When you want consistency:
Data extraction
Factual questions
Classification tasks
Structured output
When you want variety:
Creative writing
Brainstorming
Content generation
Multiple perspectives
What happens if an AI provider is down?
Built-in error handling:
Fallback provider pattern:
Production recommendations:
Monitor provider status pages
Implement retries with exponential backoff
Use multiple providers for critical apps
Cache responses when possible
Guide: Production Deployment (coming soon)
🎯 Features & Capabilities
Can I extract structured data from AI responses?
Yes! This is one of BoxLang AI's best features - Structured Output.
Works with:
Classes (type-safe)
Structs (flexible)
Arrays (multiple items)
JSON schemas
Full guide: Structured Output
Can AI access real-time data or call APIs?
Yes! Use Tools (function calling):
AI can call your functions for:
Database queries
API requests
File operations
Calculations
Any custom logic
Full guide: Tools & Function Calling
Can AI remember previous conversations?
Yes! Use Memory:
Memory types:
Window - Keep last N messages
Summary - Auto-summarize for long conversations
Session - Web session persistence
File - Save to disk
Cache - Distributed memory
JDBC - Database storage
Vector - Semantic search (for RAG)
Full guide: Memory Systems
Can AI answer questions about my documents?
Yes! This is called RAG (Retrieval Augmented Generation):
Works with:
PDF files
Markdown docs
Text files
Web pages
Databases
CSV data
Any text source
Full guide: RAG (Retrieval Augmented Generation)
Can I process images, audio, or video?
Yes! Many providers support multimodal content:
Provider support:
✅ OpenAI GPT-4 Vision
✅ Claude 3
✅ Gemini Pro Vision
⏳ Others adding support
Full guide: Advanced Chatting - Multimodal
💭 Memory & Context
What's the difference between conversation memory and vector memory?
Conversation Memory (stores recent chat):
Keeps message history
Simple append/retrieve
Used for: Multi-turn conversations, context retention
Types: Window, Summary, Session, File, Cache, JDBC
Vector Memory (stores documents for semantic search):
Stores documents as embeddings
Semantic search by meaning
Used for: RAG, knowledge bases, document Q&A
Types: ChromaDB, PostgreSQL, Pinecone, Qdrant, etc.
When to use each:
How do I prevent users from seeing each other's conversations?
Use multi-tenant memory with userId and conversationId:
Isolation guaranteed: Users NEVER see each other's data.
Works with ALL memory types: Window, Cache, File, JDBC, Vector, etc.
Full guide: Multi-Tenant Memory
🔐 Security & Privacy
Is my data sent to AI providers?
Yes, when using cloud providers (OpenAI, Claude, Gemini, etc.):
Your prompts and conversation history are sent to their servers
They process and return responses
Most providers don't train on your data (check their terms)
For complete privacy:
Best practices:
❌ Don't send passwords, API keys, or secrets to AI
❌ Don't send PII without user consent
✅ Use Ollama for sensitive data
✅ Review provider privacy policies
✅ Sanitize/anonymize data before sending
Full guide: Security & Best Practices (coming soon)
How do I prevent prompt injection attacks?
Prompt injection: When users trick AI by embedding instructions in their input.
Example attack:
Mitigation strategies:
Separate user input from instructions:
Validate and sanitize input:
Use structured output (harder to inject):
Monitor for suspicious patterns:
Where should I store API keys?
❌ Never hardcode:
✅ Use environment variables:
✅ Use BoxLang configuration:
✅ Use secrets management (production):
AWS Secrets Manager
Azure Key Vault
HashiCorp Vault
Environment-specific configs
Full guide: Provider Setup - API Keys
🔧 Troubleshooting
"Invalid API key" error
Check:
API key is correct (copy from provider dashboard)
Environment variable is set correctly
No extra spaces or quotes
Using the right provider name
"Rate limit exceeded" error
You're making too many requests too fast.
Solutions:
Wait and retry (most limits reset after 60 seconds)
Upgrade to paid tier
Add retry logic with backoff:
"Context length exceeded" error
Your prompt + conversation history is too long.
Solutions:
Use a model with larger context:
Truncate conversation history:
Summarize long conversations:
Chunk long documents:
Response is too slow
Try:
Switch to faster provider:
Use streaming (better perceived performance):
Use async for background tasks:
Limit response length:
💡 Best Practices
Should I use aiChat() or aiAgent()?
aiChat() or aiAgent()?Use aiChat() when:
✅ Simple one-off questions
✅ Stateless interactions
✅ Quick prototyping
✅ No conversation context needed
Use aiAgent() when:
✅ Multi-turn conversations
✅ Need memory/context
✅ Using tools/functions
✅ Complex workflows
✅ Autonomous behavior
Example comparison:
How many messages should I keep in memory?
Depends on use case:
Simple chat
10-20 messages
Customer support
20-50 messages
Long conversations
Use Summary memory
Document Q&A
Vector memory + 5-10 recent
Cost consideration:
More messages = more tokens = higher cost
Keep only what's needed for context
Should I cache AI responses?
Yes, when:
✅ Same questions asked repeatedly
✅ Static/unchanging content
✅ High traffic, low variety
✅ Cost is a concern
No, when:
❌ Responses need to be current
❌ High variety of unique questions
❌ Personalized responses
Implementation:
How do I handle errors gracefully?
Always wrap AI calls in try-catch:
User-friendly fallbacks:
🔗 More Resources
❓ Still Have Questions?
Last updated