Frequently Asked Questions

Frequently asked questions about BoxLang AI - answers to common questions about costs, providers, performance, and usage.

Quick answers to the most common questions about BoxLang AI. If you don't find your answer here, check the main documentation or ask in the community forum.

📋 Table of Contents


🚀 Getting Started

Why use BoxLang AI instead of calling provider APIs directly?

Short answer: Productivity, flexibility, and consistency.

Benefits:

  • Unified API - Same code works with any provider (OpenAI, Claude, Gemini, etc.)

  • Switch providers - Change one config setting, no code changes

  • Built-in features - Memory, tools, RAG, streaming, agents out-of-the-box

  • Less boilerplate - Focus on your app, not HTTP requests

  • Type safety - Structured output with BoxLang classes

  • Multi-tenant ready - Built-in user/conversation isolation

  • Production features - Events, logging, error handling, timeouts

Example: Same code, different provider:


What's the easiest way to get started?

  1. Install the module:

  2. Set an API key (or use free Ollama):

  3. Make your first call:

Full guide: Quick Start Guide


What's the best free option for learning/testing?

Ollama - Completely free, runs locally on your machine.

Advantages:

  • ✅ No API key needed

  • ✅ No usage charges

  • ✅ Works offline

  • ✅ Complete privacy

  • ✅ No rate limits

Setup:

Best models for Ollama:

  • qwen2.5:3b - Fast, good for testing (3GB)

  • llama3.2:3b - Meta's model, good quality (2GB)

  • mistral:7b - Best quality for the size (4GB)

Guide: Installation - Ollama Setup


Can I use BoxLang AI without an internet connection?

Yes! Use Ollama for completely offline AI.

Once you've pulled a model, it runs entirely on your machine:

Limitations: Local models are smaller and less capable than cloud models (GPT-4, Claude), but great for:

  • Privacy-sensitive applications

  • Offline environments

  • Development/testing

  • Cost savings


🤖 Providers & Models

Which AI provider should I use?

It depends on your needs:

Provider
Best For
Cost
Speed

Claude

Long context, analysis

Medium

Medium

Cohere

Embeddings, RAG

Low

Fast

DeepSeek

Code generation, reasoning

Low

Fast

Gemini

Google integration, multimodal

Low

Fast

Grok

Meta models, cost-effective

Low

Fast

Groq

Speed (ultra-fast inference)

Low

⚡ Fastest

Hugging Face

Custom models, flexibility

Varies

Varies

Mistral

Open models, balance

Low

Fast

Ollama

Free, private, offline

Free

Slow (local)

OpenAI

General purpose, reliability

Medium

Fast

OpenRouter

Cost-effective, multi-cloud

Low

Fast

Perplexity

Research, citations

Medium

Medium

Voyage

Enterprise, custom solutions

High

Varies

Recommendations:

  • 🎯 General use: OpenAI (GPT-4)

  • 💰 Budget: Gemini or Groq

  • 🏠 Free/Private: Ollama

  • 📝 Long documents: Claude (200K context)

  • Speed: Groq

  • 💻 Code: DeepSeek or OpenAI

Full comparison: Provider Setup Guide


Can I use multiple providers in the same application?

Yes! You can mix and match providers for different tasks:

Common patterns:

  • Fast provider for UI responsiveness, powerful for complex tasks

  • Cloud for production, Ollama for development

  • Specialized models for specific tasks (code, analysis, chat)


💰 Costs & Pricing

How much does it cost to use BoxLang AI?

BoxLang AI module: Free (open source)

AI Provider costs: Pay-per-use (except Ollama which is free)

Typical pricing (per 1M tokens):

  • GPT-3.5 Turbo: $0.50 input / $1.50 output

  • GPT-4o: $2.50 input / $10 output

  • Claude 3 Haiku: $0.25 input / $1.25 output

  • Gemini 1.5 Flash: $0.075 input / $0.30 output

  • Ollama: $0 (free!)

Real-world examples:


How can I reduce AI costs?

Top strategies:

  1. Use cheaper models for simple tasks

  2. Limit response length

  3. Use Ollama for development/testing

  4. Cache responses for repeated queries

  5. Use summarization for long conversations

  6. Batch requests instead of one-by-one

More tips: Advanced Topics - Performance (coming soon)


How do I estimate token counts before making a request?

Use the aiTokens() function:

Rule of thumb:

  • 1 token ≈ 0.75 words

  • 100 tokens ≈ 75 words

  • 1000 tokens ≈ 750 words (1 page)

Guide: Utilities - Token Counting


⚡ Performance & Reliability

Why do I get different responses each time?

This is normal AI behavior due to temperature (randomness setting).

Temperature scale:

  • 0.0 - Deterministic (same response every time)

  • 0.7 - Default (balanced, some variation)

  • 1.0+ - Creative (high variation)

For consistent responses:

When you want consistency:

  • Data extraction

  • Factual questions

  • Classification tasks

  • Structured output

When you want variety:

  • Creative writing

  • Brainstorming

  • Content generation

  • Multiple perspectives


What happens if an AI provider is down?

Built-in error handling:

Fallback provider pattern:

Production recommendations:

  • Monitor provider status pages

  • Implement retries with exponential backoff

  • Use multiple providers for critical apps

  • Cache responses when possible

Guide: Production Deployment (coming soon)


🎯 Features & Capabilities

Can I extract structured data from AI responses?

Yes! This is one of BoxLang AI's best features - Structured Output.

Works with:

  • Classes (type-safe)

  • Structs (flexible)

  • Arrays (multiple items)

  • JSON schemas

Full guide: Structured Output


Can AI access real-time data or call APIs?

Yes! Use Tools (function calling):

AI can call your functions for:

  • Database queries

  • API requests

  • File operations

  • Calculations

  • Any custom logic

Full guide: Tools & Function Calling


Can AI remember previous conversations?

Yes! Use Memory:

Memory types:

  • Window - Keep last N messages

  • Summary - Auto-summarize for long conversations

  • Session - Web session persistence

  • File - Save to disk

  • Cache - Distributed memory

  • JDBC - Database storage

  • Vector - Semantic search (for RAG)

Full guide: Memory Systems


Can AI answer questions about my documents?

Yes! This is called RAG (Retrieval Augmented Generation):

Works with:

  • PDF files

  • Markdown docs

  • Text files

  • Web pages

  • Databases

  • CSV data

  • Any text source

Full guide: RAG (Retrieval Augmented Generation)


Can I process images, audio, or video?

Yes! Many providers support multimodal content:

Provider support:

  • ✅ OpenAI GPT-4 Vision

  • ✅ Claude 3

  • ✅ Gemini Pro Vision

  • ⏳ Others adding support

Full guide: Advanced Chatting - Multimodal


💭 Memory & Context

What's the difference between conversation memory and vector memory?

Conversation Memory (stores recent chat):

  • Keeps message history

  • Simple append/retrieve

  • Used for: Multi-turn conversations, context retention

  • Types: Window, Summary, Session, File, Cache, JDBC

Vector Memory (stores documents for semantic search):

  • Stores documents as embeddings

  • Semantic search by meaning

  • Used for: RAG, knowledge bases, document Q&A

  • Types: ChromaDB, PostgreSQL, Pinecone, Qdrant, etc.

When to use each:


How do I prevent users from seeing each other's conversations?

Use multi-tenant memory with userId and conversationId:

Isolation guaranteed: Users NEVER see each other's data.

Works with ALL memory types: Window, Cache, File, JDBC, Vector, etc.

Full guide: Multi-Tenant Memory


🔐 Security & Privacy

Is my data sent to AI providers?

Yes, when using cloud providers (OpenAI, Claude, Gemini, etc.):

  • Your prompts and conversation history are sent to their servers

  • They process and return responses

  • Most providers don't train on your data (check their terms)

For complete privacy:

Best practices:

  • ❌ Don't send passwords, API keys, or secrets to AI

  • ❌ Don't send PII without user consent

  • ✅ Use Ollama for sensitive data

  • ✅ Review provider privacy policies

  • ✅ Sanitize/anonymize data before sending

Full guide: Security & Best Practices (coming soon)


How do I prevent prompt injection attacks?

Prompt injection: When users trick AI by embedding instructions in their input.

Example attack:

Mitigation strategies:

  1. Separate user input from instructions:

  2. Validate and sanitize input:

  3. Use structured output (harder to inject):

  4. Monitor for suspicious patterns:


Where should I store API keys?

❌ Never hardcode:

✅ Use environment variables:

✅ Use BoxLang configuration:

✅ Use secrets management (production):

  • AWS Secrets Manager

  • Azure Key Vault

  • HashiCorp Vault

  • Environment-specific configs

Full guide: Provider Setup - API Keys


🔧 Troubleshooting

"Invalid API key" error

Check:

  1. API key is correct (copy from provider dashboard)

  2. Environment variable is set correctly

  3. No extra spaces or quotes

  4. Using the right provider name


"Rate limit exceeded" error

You're making too many requests too fast.

Solutions:

  1. Wait and retry (most limits reset after 60 seconds)

  2. Upgrade to paid tier

  3. Add retry logic with backoff:


"Context length exceeded" error

Your prompt + conversation history is too long.

Solutions:

  1. Use a model with larger context:

  2. Truncate conversation history:

  3. Summarize long conversations:

  4. Chunk long documents:


Response is too slow

Try:

  1. Switch to faster provider:

  2. Use streaming (better perceived performance):

  3. Use async for background tasks:

  4. Limit response length:


💡 Best Practices

Should I use aiChat() or aiAgent()?

Use aiChat() when:

  • ✅ Simple one-off questions

  • ✅ Stateless interactions

  • ✅ Quick prototyping

  • ✅ No conversation context needed

Use aiAgent() when:

  • ✅ Multi-turn conversations

  • ✅ Need memory/context

  • ✅ Using tools/functions

  • ✅ Complex workflows

  • ✅ Autonomous behavior

Example comparison:


How many messages should I keep in memory?

Depends on use case:

Use Case
Recommended

Simple chat

10-20 messages

Customer support

20-50 messages

Long conversations

Use Summary memory

Document Q&A

Vector memory + 5-10 recent

Cost consideration:

  • More messages = more tokens = higher cost

  • Keep only what's needed for context


Should I cache AI responses?

Yes, when:

  • ✅ Same questions asked repeatedly

  • ✅ Static/unchanging content

  • ✅ High traffic, low variety

  • ✅ Cost is a concern

No, when:

  • ❌ Responses need to be current

  • ❌ High variety of unique questions

  • ❌ Personalized responses

Implementation:


How do I handle errors gracefully?

Always wrap AI calls in try-catch:

User-friendly fallbacks:


🔗 More Resources


❓ Still Have Questions?

Last updated