3.1.0
BoxLang AI Module 3.1.0 — Audio, Async, Parallel Pipelines, and more.
Released: April 2026
BoxLang AI 3.1.0 introduces full audio support (text-to-speech, speech-to-text, audio translation), the ElevenLabs provider, async runnables on all pipeline objects, the new aiParallel() BIF for concurrent multi-model execution, and additional memory events.
✨ New Features
🎤 Audio BIFs
Three new global Built-in Functions expose AI audio capabilities through the same familiar interface as all other bx-ai BIFs.
| BIF | Purpose |
| --- | --- |
| aiSpeak( text, params, options ) | Convert text to natural speech audio |
| aiTranscribe( audio, params, options ) | Transcribe audio file/URL/binary to text |
| aiTranslate( audio, params, options ) | Translate any-language audio to English text |
Documentation: Audio — Speech & Transcription
// Text-to-speech — save to file
audio = aiSpeak( "Welcome to BoxLang AI!" )
audio.saveToFile( "/audio/welcome.mp3" )
// Speech-to-text — get plain text
text = aiTranscribe( "/recordings/meeting.mp3" )
// Audio translation — any language → English
english = aiTranslate( "/recordings/french-meeting.mp3" )
🔌 New Interfaces: IAiSpeechService & IAiTranscriptionService
Providers that support TTS and/or STT now implement one or both of these interfaces:
- IAiSpeechService: exposes speak( speechRequest ) → AiSpeechResponse
- IAiTranscriptionService: exposes transcribe( transcriptionRequest ) and translate( transcriptionRequest ) → AiTranscriptionResponse
These interfaces are used internally by the aiSpeak(), aiTranscribe(), and aiTranslate() BIFs and are useful when extending the module with custom providers.
📦 Response Classes
AiSpeechResponse
| Method | Returns | Description |
| --- | --- | --- |
| saveToFile( filePath ) | string | Saves audio binary to the given path; returns absolute path |
| getBase64() | string | Base64-encoded audio data |
| getMimeType() | string | MIME type (e.g. audio/mpeg) |
| toDataURI() | string | Ready-to-use data:audio/mpeg;base64,... URI |
| hasAudio() | boolean | true if audio data is present and non-empty |
| getSize() | numeric | Size in bytes |
| getAudioFormat() | string | Format string: mp3, wav, flac, etc. |
| toStruct() | struct | Metadata struct (no binary) |
| toJSON() | string | JSON-serialized metadata |
| getMetadataValue( key ) | any | Read a metadata value |
| setMetadataValue( key, value ) | this | Write a metadata value (chainable) |
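As a minimal sketch of how these methods fit together, the snippet below checks that audio came back before persisting it. It uses only the methods listed above; the input text, the output path, and the printed messages are illustrative assumptions.

```boxlang
// Generate speech with the default provider and settings
audio = aiSpeak( "Your order has shipped." )

if ( audio.hasAudio() ) {
    // saveToFile() returns the absolute path it wrote to
    savedPath = audio.saveToFile( "/audio/order-shipped.mp3" )
    println( "Saved #audio.getSize()# bytes (#audio.getMimeType()#) to #savedPath#" )

    // Alternative to a file: a data URI that can be dropped into an <audio> tag
    dataURI = audio.toDataURI()
} else {
    println( "The provider returned no audio data" )
}
```

Guarding on hasAudio() keeps an empty provider response from producing a zero-byte file.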
AiTranscriptionResponse
| Method | Returns | Description |
| --- | --- | --- |
| getText() | string | Transcribed / translated text |
| hasText() | boolean | true if text is non-empty |
| getWordCount() | numeric | Number of words |
| getFormattedDuration() | string | Human-readable duration, e.g. "1:23" |
| hasSegments() | boolean | true if segment data is present |
| hasWords() | boolean | true if word-level timestamp data is present |
| getSegments() | array | Array of segment structs with timestamps |
| getWords() | array | Array of word structs with timestamps |
| toStruct() | struct | Full struct representation |
| toJSON() | string | JSON-serialized response |
| toString() | string | Alias for getText() |
AiBaseResponse (shared base)
Both response classes extend AiBaseResponse, which provides:
- model: model used for the operation
- provider: provider name (string)
- metadata: raw provider response metadata struct
- timestamp: datetime when the response was created
- toJSON(): serialize to JSON
- getMetadataValue( key ): read from the metadata bag
- setMetadataValue( key, value ): write to the metadata bag (chainable)
🤖 ElevenLabs Provider
A new elevenlabs provider brings high-quality, multilingual neural voice synthesis.
Supported operations: Text-to-speech only (STT/translation not supported)
Authentication: Set ELEVENLABS_API_KEY in your environment or pass apiKey in options. ElevenLabs uses the xi-api-key header (handled automatically by the module).
Default model: eleven_multilingual_v2
Voice selection: Pass a voice_id from your ElevenLabs voice library in params.
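A hedged sketch of a text-to-speech call routed to ElevenLabs. The notes above confirm voice_id in params and apiKey in options; selecting the provider through a provider key in options is an assumption based on how other bx-ai BIFs choose providers, and the voice ID is a placeholder to replace with one from your own voice library.

```boxlang
audio = aiSpeak(
    "Bienvenue sur BoxLang AI !",
    { voice_id: "YOUR_VOICE_ID" },   // placeholder, not a real voice ID
    { provider: "elevenlabs" }       // assumed option name; API key is read from ELEVENLABS_API_KEY
)

// Persist the result using the documented AiSpeechResponse method
audio.saveToFile( "/audio/bienvenue.mp3" )
```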
Module configuration:
⚡ Async Runnables — runAsync()
All IAiRunnable objects now expose a runAsync() method that dispatches execution to the io-tasks virtual thread pool and returns a BoxFuture.
Affected types: AiModel, AiAgent, AiRunnableSequence, AiTransformRunnable, AiRunnableParallel
This is especially powerful when running multiple independent tasks simultaneously:
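The following sketch shows two independent prompts dispatched at once. It assumes an AiModel can be built with aiModel( "openai" ) and that the returned BoxFuture exposes a blocking get(), as Java futures do; the prompts are placeholders.

```boxlang
// Assumption: aiModel() builds an AiModel for the named provider
model = aiModel( "openai" )

// Both calls return immediately; the work runs on separate virtual threads
summaryFuture  = model.runAsync( "Summarize this changelog in one sentence." )
keywordsFuture = model.runAsync( "List five keywords for this changelog." )

// get() blocks until each result is ready
summary  = summaryFuture.get()
keywords = keywordsFuture.get()
```

Starting both futures before calling get() on either is what lets the two requests overlap; calling get() right after each runAsync() would serialize them again.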
🔀 Parallel Pipelines — aiParallel()
The new aiParallel() BIF creates an AiRunnableParallel that fans out a single input to multiple named runnables concurrently and collects the results into a named struct.
| Parameter | Type | Description |
| --- | --- | --- |
| runnables | struct | Named struct where each key is a result name and each value is an IAiRunnable |
Returns: AiRunnableParallel — implements IAiRunnable, so it can be used in any pipeline.
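A minimal sketch of the fan-out, assuming aiModel( "openai" ) and aiModel( "claude" ) are valid ways to build AiModel runnables. The key names and the prompt are illustrative.

```boxlang
// Each key becomes a key in the result struct
runnables = {
    openai : aiModel( "openai" ),   // assumed provider names
    claude : aiModel( "claude" )
}

// The same input is sent to every runnable concurrently
results = aiParallel( runnables ).run( "Explain virtual threads in two sentences." )

// One entry per named runnable
println( results.openai )
println( results.claude )
```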
Because AiRunnableParallel implements IAiRunnable, it composes into larger pipelines:
🎤 Audio Agent Tools
Three new tools are auto-registered in the global AI Tool Registry at module startup, making audio capabilities immediately available to any agent:
| Tool key | Description |
| --- | --- |
| speak@bxai | Convert text to speech; returns the saved file path (auto-generates a temp file if no outputFile is supplied) |
| transcribe@bxai | Transcribe a local audio file or URL to plain text |
| translate@bxai | Translate any-language audio to English text |
Opt-in by referencing the tool keys in your agent definition — audio tools are not injected automatically:
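As a rough sketch of the opt-in, the agent below lists two of the tool keys explicitly. The aiAgent() argument names (name, instructions, tools) and the acceptance of registry keys as strings are assumptions; only the tool keys themselves come from the list above.

```boxlang
agent = aiAgent(
    name         : "VoiceAssistant",   // hypothetical agent name
    instructions : "Transcribe the audio file the user provides, then read a short summary aloud.",
    tools        : [ "transcribe@bxai", "speak@bxai" ]   // translate@bxai is deliberately left out
)

response = agent.run( "Summarize /recordings/meeting.mp3 as audio." )
```

Only the tools named in the definition are offered to the model, so this agent cannot call translate@bxai.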
🗂️ FileSystem Agent Tools
A new FileSystemTools class (models/tools/filesystem/FileSystemTools.bx) ships 19 @mcpTool-annotated methods that cover the full filesystem lifecycle.
⚠️ Security-first design:
FileSystemTools is NOT auto-registered. You must opt in explicitly via aiToolRegistry().scanClass() so agents never receive filesystem access unless you grant it.
Path-guard constructor: Pass allowedPaths: [...] to canonicalize and validate every path argument at invocation time, blocking directory-traversal attacks before any file operation runs.
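The opt-in and path guard described above might look like the sketch below. The dotted class path is inferred from the file location models/tools/filesystem/FileSystemTools.bx, and passing an instance to scanClass() is an assumption; check both against the module documentation before relying on them. The directory is a placeholder.

```boxlang
// Assumption: the class resolves under the bxai module mapping
fsTools = new bxModules.bxai.models.tools.filesystem.FileSystemTools(
    allowedPaths : [ "/data/reports" ]   // placeholder directory the agent may touch
)

// Assumption: scanClass() accepts an instance and registers its @mcpTool methods
aiToolRegistry().scanClass( fsTools )
```

With the guard in place, a tool call that targets a path such as /data/reports/../../etc/passwd is expected to be rejected, because the canonicalized path falls outside the allowed directory.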
Available tool keys:
| Tool key | Description |
| --- | --- |
| readFile@bxai | Read a file's contents |
| readMultipleFiles@bxai | Read multiple files at once |
| writeFile@bxai | Write content to a file |
| appendFile@bxai | Append content to a file |
| editFile@bxai | Apply targeted edits to a file |
| fileMetadata@bxai | Get file size, timestamps, permissions |
| pathExists@bxai | Check whether a path exists |
| deleteFile@bxai | Delete a file |
| moveFile@bxai | Move or rename a file |
| copyFile@bxai | Copy a file |
| searchFiles@bxai | Search for files by pattern or content |
| listAllowedDirectories@bxai | List directories the agent is allowed to access |
| listDirectory@bxai | List directory contents |
| directoryTree@bxai | Recursive directory tree |
| createDirectory@bxai | Create a directory |
| deleteDirectory@bxai | Delete a directory |
| zipFiles@bxai | Compress files into a zip archive |
| unzipFile@bxai | Extract a zip archive |
| checkZipFile@bxai | Inspect a zip archive's contents |
🧠 New Memory Events
Two additional interception points are now fired by the memory subsystem.
| Event | Fired when | Data |
| --- | --- | --- |
| onHybridMemoryAdd | A message is added to a HybridMemory instance | memory, message |
| onVectorSearch | A semantic search runs against a vector memory | memory, query, limit, results |
🔊 Audio Module Configuration
A new audio section in module settings provides defaults for all audio operations:
📡 New Audio Events
Six new interception points cover all audio operations:
| # | Event | Fired |
| --- | --- | --- |
| 34 | beforeAISpeech | Before a text-to-speech request is sent |
| 35 | afterAISpeech | After a text-to-speech response is received |
| 36 | beforeAITranscription | Before a speech-to-text request is sent |
| 37 | afterAITranscription | After a speech-to-text response is received |
| 38 | beforeAITranslation | Before an audio translation request is sent |
| 39 | afterAITranslation | After an audio translation response is received |
🐛 Bug Fixes
- Streaming onAITokenCount never fired: chatStream() across all providers never announced the onAITokenCount event, making all streaming calls invisible to usage tracking, billing interceptors, and monitoring. The non-streaming chat() path was unaffected.
- AiModel.stream() missing middleware: Agent and model middleware were not injected into chatRequest during streaming, unlike the run() path; the two paths are now consistent.
- Closure scoping in tool calls: DockerModelRunnerService, OpenAIService.chat(), OpenAIService.chatStream(), and CohereService.chat() all had ArgumentsScope resolution failures inside .each() / .map() closures used for tool calling; fixed by capturing outer variables into local vars before the closure.
- onAITokenCount standardization: Event data shape was inconsistent across providers. Standardized and added the missing event fire to BedrockService, ClaudeService, CohereService, and GeminiService.
- MCPServer.scan() / scanClass(): Several path-resolution cases and permutation combinations were not handled correctly; now reliable across dot-notation, absolute, relative, and instance inputs.
- aiAgent() skills / availableSkills normalization: Both parameters now accept a single skill instance or an array. The BIF normalizes to an array internally, removing the requirement to wrap a single skill in [].
- ModuleConfig.bx startup order: Module now listens to onRuntimeStart() instead of an earlier hook, ensuring caches and other services are fully initialized before skills and global tool registry setup runs.
- Flight recorder tape directory: Tapes were written to an incorrect directory path; corrected.
- Docker Service interface compatibility: Resolved breakage caused by the capability-interface refactor introduced in 3.0.0.
- Memory count off-by-one: WindowMemory.count() returned maxMessages + 1 instead of the actual message count when at capacity.
- Tool result serialization: Nested arrays in tool return values were double-serialized as JSON strings.
- Stream callback null chunk: Streaming responses from Gemini occasionally emitted a null chunk on the final SSE event; now handled gracefully.
- AWS Bedrock signature: Signed URL included a trailing slash in the path that caused SignatureDoesNotMatch errors on certain endpoints.
- ElevenLabs MIME detection: AiSpeechResponse.getMimeType() returned application/octet-stream for ElevenLabs PCM audio; corrected to audio/pcm.
- Groq transcription model default: Module was sending whisper-1 instead of whisper-large-v3 when no model was specified for Groq.
- aiParallel empty struct: Calling aiParallel({}).run("input") threw a NullPointerException; now returns an empty struct.
- SessionMemory web-only check: SessionMemory threw a misleading error when called outside a web context; the error message now clearly states that it requires an active HTTP session.
- PDF loader binary mode: PDFLoader opened files in text mode on Windows, corrupting multi-byte characters; now always opens in binary mode.
- HybridMemory deduplication: Combining WindowMemory recent messages with vector search results could produce duplicate messages when the same message ranked highly in both sources.
- aiTranscribe URL redirect: HTTP redirects (301/302) from audio file CDNs were not followed; now follows up to 5 redirects before throwing.
📖 Documentation