🛠️ Utility Functions

The bx-ai module provides powerful utility functions for text processing, token management, and working with AI models. These utilities help you prepare data, estimate costs, and optimize your AI interactions.

🎯 Utility Architecture



📄 Text Chunking

Break large texts into manageable segments that fit within AI token limits. Essential for processing long documents, articles, or books.

🔄 Chunking Flow


aiChunk() Function

Split text into smaller chunks using intelligent strategies that preserve meaning and context.

Basic Usage
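
A minimal sketch of calling aiChunk(). The option names shown (strategy, size, overlap) are assumptions for illustration; check the module reference for the exact signature.

```boxlang
// Split a long document into chunks (option names assumed for illustration)
text   = fileRead( "article.txt" );
chunks = aiChunk( text, { strategy: "recursive", size: 1000, overlap: 100 } );

println( "Produced #chunks.len()# chunks" );
chunks.each( chunk -> println( chunk.left( 60 ) & "..." ) );
```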

Configuration Options
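
The options below are a sketch of the kind of configuration struct aiChunk() accepts; the key names are assumptions, so verify them against the module reference.

```boxlang
// Assumed configuration keys -- verify names against the bx-ai reference
options = {
	strategy : "recursive", // "recursive", "characters", "words", "sentences", "paragraphs"
	size     : 1000,        // maximum characters per chunk
	overlap  : 100          // characters carried over from the previous chunk
};
chunks = aiChunk( text, options );
```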

Chunking Strategies

Recursive (Default)

Intelligently splits by trying larger units first (paragraphs → sentences → words → characters):

Best for:

  • Natural language documents

  • Articles, blog posts, documentation

  • Preserving semantic meaning

  • General-purpose text processing

How it works:

  1. Tries to split by paragraphs (double newlines)

  2. If paragraphs are still too large, splits by sentences (. ! ?)

  3. If sentences are still too large, splits by words

  4. If words are still too large, splits by characters
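
The descent above can be sketched conceptually. This is an illustration of the idea only, not the module's implementation:

```boxlang
// Conceptual sketch of recursive splitting (illustration only, not bx-ai's code)
function splitByLargestUnit( required string text, required numeric maxSize ){
	if ( text.len() <= maxSize ) {
		return [ text ];
	}
	// Try paragraph, sentence, then word separators, largest unit first
	for ( var separator in [ chr( 10 ) & chr( 10 ), ". ", " " ] ) {
		var pieces = text.listToArray( separator, false, true );
		if ( pieces.len() > 1 ) {
			var results = [];
			// Recurse into any piece that is still too large
			for ( var piece in pieces ) {
				results.append( splitByLargestUnit( piece, maxSize ), true );
			}
			return results;
		}
	}
	// Last resort: fixed-size character slices
	return text.reMatch( ".{1,#maxSize#}" );
}
```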

Characters

Simple character-based splitting:
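
For example (option names assumed for illustration):

```boxlang
// Fixed-size character slices, no boundary detection
chunks = aiChunk( text, { strategy: "characters", size: 500 } );
```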

Best for:

  • Consistent chunk sizes

  • Code or structured text

  • Maximum control over size

Words

Splits on word boundaries:
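
For example (option names assumed for illustration):

```boxlang
// Chunks end on whitespace so no word is cut in half
chunks = aiChunk( text, { strategy: "words", size: 500, overlap: 50 } );
```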

Best for:

  • Preserving complete words

  • Avoiding mid-word breaks

  • Language processing

Sentences

Splits on sentence boundaries:
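
For example (option names assumed for illustration):

```boxlang
// Chunks end on sentence terminators ( . ! ? )
chunks = aiChunk( text, { strategy: "sentences", size: 800 } );
```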

Best for:

  • Preserving complete thoughts

  • Question answering systems

  • Semantic search preparation

Paragraphs

Splits on paragraph boundaries:
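
For example (option names assumed for illustration):

```boxlang
// Chunks end on blank lines, keeping whole paragraphs together
chunks = aiChunk( text, { strategy: "paragraphs", size: 2000 } );
```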

Best for:

  • Maintaining topic coherence

  • Document summarization

  • Large context windows

Understanding Overlap

Overlap preserves context between chunks by including text from the previous chunk:
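
A sketch of what overlap does in practice (option names assumed for illustration):

```boxlang
// With a 100-character overlap, each chunk repeats the tail of the previous one
chunks = aiChunk( text, { size: 1000, overlap: 100 } );
// chunks[ 2 ] begins with roughly the last 100 characters of chunks[ 1 ],
// so a sentence straddling the boundary appears intact in at least one chunk
```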

Why use overlap?

  • Prevents losing context at chunk boundaries

  • Improves semantic search accuracy

  • Better for question answering across chunks

  • Helps AI models maintain coherence

Recommended overlap: 10-20% of chunk size

Real-World Examples

Processing Long Documents
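
A sketch of map-reduce style summarization over chunks, assuming the aiChat() function from this module and illustrative option names:

```boxlang
// Summarize a long report chunk by chunk, then combine the partial summaries
report    = fileRead( "annual-report.txt" );
chunks    = aiChunk( report, { size: 2000, overlap: 200 } );
summaries = chunks.map( chunk -> aiChat( "Summarize: " & chunk ) );

finalSummary = aiChat( "Combine these summaries into one: " & summaries.toList( chr( 10 ) ) );
```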

Semantic Search Preparation

Token-Aware Chunking
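
One possible approach, sketched with assumed option names: shrink the chunk size until every chunk fits a token budget.

```boxlang
// Re-chunk with smaller sizes until all chunks fit the model's token budget
maxTokens = 1000;
size      = 4000;
chunks    = aiChunk( text, { size: size } );
while ( chunks.some( chunk -> aiTokens( chunk ) > maxTokens ) && size > 500 ) {
	size  -= 500;
	chunks = aiChunk( text, { size: size } );
}
```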

🔢 Token Counting

Estimate token usage before making API calls. Essential for cost management and staying within model limits.

aiTokens() Function

Estimate token count for text using industry-standard heuristics.

Basic Usage
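
A minimal sketch of calling aiTokens() with the default heuristic:

```boxlang
// Estimate tokens before sending a prompt (default: characters heuristic)
prompt = "Explain quantum computing in simple terms";
tokens = aiTokens( prompt );
println( "Estimated tokens: #tokens#" );
```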

Estimation Methods

Characters Method (Default)

Uses the rule: 1 token ≈ 4 characters (OpenAI standard):
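
For example (the method argument name is an assumption; verify against the module reference):

```boxlang
// Estimate as text length / 4, per the OpenAI rule of thumb
tokens = aiTokens( text, "characters" );
```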

Best for:

  • English text

  • General-purpose estimation

  • Quick calculations

  • Conservative estimates

Words Method

Uses the multiplier: 1 token ≈ 1.3 words:
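
For example (the method argument name is an assumption; verify against the module reference):

```boxlang
// Estimate as word count x 1.3
tokens = aiTokens( text, "words" );
```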

Best for:

  • Non-English text

  • Technical content

  • More accurate for word-based languages

Detailed Statistics

Get comprehensive token analysis:
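
A hypothetical sketch of requesting detailed statistics; the flag and the keys of the returned struct are assumptions and may differ in your version:

```boxlang
// Hypothetical detailed call -- check the bx-ai reference for the real signature
stats = aiTokens( text, "characters", true );
println( stats ); // assumed shape: { tokens, characters, words, method }
```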

Batch Token Counting

Count tokens across multiple text chunks:
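
For example, mapping the estimator over the output of aiChunk() (option names assumed):

```boxlang
// Sum token estimates across all chunks of a document
chunks = aiChunk( text, { size: 1000 } );
totals = chunks.map( chunk -> aiTokens( chunk ) );
println( "Total estimated tokens: #totals.sum()#" );
```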

Real-World Examples

Cost Estimation
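
A sketch of turning a token estimate into a cost estimate. The price used is an illustrative placeholder, not a real rate:

```boxlang
// Estimate request cost from the token count (price is a placeholder)
inputTokens   = aiTokens( prompt );
pricePer1K    = 0.0005; // example rate per 1K input tokens -- use your provider's pricing
estimatedCost = ( inputTokens / 1000 ) * pricePer1K;
println( "Estimated input cost: $#numberFormat( estimatedCost, '0.0000' )#" );
```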

Model Selection

Request Validation
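
A sketch of guarding a request against a model's context window; the limit shown is an example value:

```boxlang
// Refuse to send requests that would exceed the model's context window
contextWindow     = 8192; // example model limit
maxResponseTokens = 1024; // room reserved for the reply
if ( aiTokens( prompt ) + maxResponseTokens > contextWindow ) {
	throw( type: "RequestTooLarge", message: "Prompt would exceed the model's context window" );
}
```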

Batch Processing Optimization

Dynamic Chunking

Token Counting Guidelines

Understanding Token Ratios

Different content types have different character-to-token ratios:

| Content Type | Characters per Token | Example |
| --- | --- | --- |
| English text | ~4 | "Hello world" = 3 tokens |
| Code | ~3.5 | function foo() = 4 tokens |
| JSON | ~3 | {"key":"value"} = 6 tokens |
| Technical terms | ~5 | "Parameterization" = 4 tokens |

Best Practices

  1. Always estimate before large requests

  2. Use detailed stats for optimization

  3. Add safety margins

  4. Cache token counts for repeated use

Combining Utilities

Use chunking and token counting together for optimal processing:
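
A sketch of the combined workflow, with option names assumed for illustration:

```boxlang
// Chunk first, then keep only the chunks that fit a per-request token budget
chunks  = aiChunk( document, { size: 1500, overlap: 150 } );
budget  = 500;
batches = chunks.filter( chunk -> aiTokens( chunk ) <= budget );
println( "#batches.len()# of #chunks.len()# chunks fit the #budget#-token budget" );
```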

Tips and Tricks

Optimal Chunk Sizes by Use Case

Memory-Efficient Streaming

Intelligent Overlap Strategy

Object Population

The aiPopulate() function lets you manually convert JSON data or structs into typed BoxLang objects. Perfect for testing, caching AI responses, or working with pre-existing data.

aiPopulate() Function

Populate a class instance, struct template, or array from JSON string or struct data.

Basic Usage with Classes
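
A sketch of populating a class instance; the User class and the argument order are assumptions for illustration:

```boxlang
// Populate a class instance from a struct (argument order assumed)
user = aiPopulate( new User(), { name: "Ada", email: "ada@example.com" } );

// JSON strings work as the data source too
user = aiPopulate( new User(), '{ "name": "Ada", "email": "ada@example.com" }' );
println( user.getName() );
```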

Array Population
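
A hypothetical sketch of populating an array of instances; the target notation here is a guess, so confirm it against the module reference:

```boxlang
// Populate an array of User instances from an array of structs (notation assumed)
users = aiPopulate( [ "User" ], [
	{ name: "Ada" },
	{ name: "Grace" }
] );
```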

Struct Template Population
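
A sketch of using a struct as a typed template; the template-driven behavior shown is an assumption for illustration:

```boxlang
// The template's keys and value types shape the populated result
template = { name: "", age: 0, active: false };
person   = aiPopulate( template, '{ "name": "Ada", "age": 36, "active": true }' );
```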

Use Cases

Testing with Mock Data

Caching AI Responses

Converting Existing Data

Transforming API Responses

With Nested Objects

Validation and Error Handling

Comparison: aiPopulate vs Structured Output

| Feature | aiPopulate() | .structuredOutput() |
| --- | --- | --- |
| Purpose | Manual population | AI extraction |
| Input | JSON/struct data | Natural language prompt |
| AI Call | ❌ No (instant) | ✅ Yes (costs tokens) |
| Use Case | Testing, caching, conversion | Live AI extraction |
| Type Safety | ✅ Yes | ✅ Yes |
| Validation | ✅ Yes | ✅ Yes |
| Best For | Known data, offline processing | Unknown data, AI parsing |

Use aiPopulate() when:

  • Writing tests with mock data

  • Working with cached responses

  • Converting existing JSON/structs to typed objects

  • No AI interpretation needed

Use .structuredOutput() when:

  • Extracting data from natural language

  • Need AI to understand and parse content

  • Dealing with unstructured text

  • Real-time data extraction

Learn More

For complete details on structured output and object population:
