3.1.0

BoxLang AI Module 3.1.0 — Audio, Async, Parallel Pipelines, and more.

Released: April 2026

BoxLang AI 3.1.0 introduces full audio support (text-to-speech, speech-to-text, audio translation), the ElevenLabs provider, async runnables on all pipeline objects, the new aiParallel() BIF for concurrent multi-model execution, and additional memory events.


✨ New Features

🎤 Audio BIFs

Three new global Built-in Functions expose AI audio capabilities through the same familiar interface as all other bx-ai BIFs.

| BIF | Description |
| --- | --- |
| aiSpeak( text, params, options ) | Convert text to natural speech audio |
| aiTranscribe( audio, params, options ) | Transcribe audio file/URL/binary to text |
| aiTranslate( audio, params, options ) | Translate any-language audio to English text |

Documentation: Audio — Speech & Transcription

// Text-to-speech — save to file
audio = aiSpeak( "Welcome to BoxLang AI!" )
audio.saveToFile( "/audio/welcome.mp3" )

// Speech-to-text — get plain text
text = aiTranscribe( "/recordings/meeting.mp3" )

// Audio translation — any language → English
english = aiTranslate( "/recordings/french-meeting.mp3" )

🔌 New Interfaces: IAiSpeechService & IAiTranscriptionService

Providers that support TTS and/or STT now implement one or both of these interfaces:

  • IAiSpeechService: exposes speak( speechRequest ), which returns an AiSpeechResponse

  • IAiTranscriptionService: exposes transcribe( transcriptionRequest ) and translate( transcriptionRequest ), both of which return an AiTranscriptionResponse

These interfaces are used internally by the aiSpeak(), aiTranscribe(), and aiTranslate() BIFs and are useful when extending the module with custom providers.


📦 Response Classes

AiSpeechResponse

| Method | Returns | Description |
| --- | --- | --- |
| saveToFile( filePath ) | string | Saves audio binary to the given path; returns absolute path |
| getBase64() | string | Base64-encoded audio data |
| getMimeType() | string | MIME type (e.g. audio/mpeg) |
| toDataURI() | string | Ready-to-use data:audio/mpeg;base64,... URI |
| hasAudio() | boolean | true if audio data is present and non-empty |
| getSize() | numeric | Size in bytes |
| getAudioFormat() | string | Format string: mp3, wav, flac, etc. |
| toStruct() | struct | Metadata struct (no binary) |
| toJSON() | string | JSON-serialized metadata |
| getMetadataValue( key ) | any | Read a metadata value |
| setMetadataValue( key, value ) | this | Write a metadata value (chainable) |
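As a minimal sketch of how these methods combine, the snippet below inspects a speech response before saving it. It assumes aiSpeak() returns an AiSpeechResponse, as the earlier text-to-speech example suggests; the prompt and output path are placeholders.

```boxlang
// Assumption: aiSpeak() returns an AiSpeechResponse
speech = aiSpeak( "Your order has shipped." )

if ( speech.hasAudio() ) {
    // Log basic facts about the generated audio
    println( "Format: #speech.getAudioFormat()# (#speech.getMimeType()#)" )
    println( "Size: #speech.getSize()# bytes" )

    // saveToFile() returns the absolute path of the written file
    savedPath = speech.saveToFile( "/audio/order-shipped.mp3" )
    println( "Saved to #savedPath#" )

    // toDataURI() yields a string that can be embedded directly, e.g. in an <audio> tag
    dataUri = speech.toDataURI()
}
```

Guarding on hasAudio() first avoids writing an empty file when a provider returns no audio data.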

AiTranscriptionResponse

| Method | Returns | Description |
| --- | --- | --- |
| getText() | string | Transcribed / translated text |
| hasText() | boolean | true if text is non-empty |
| getWordCount() | numeric | Number of words |
| getFormattedDuration() | string | Human-readable duration, e.g. "1:23" |
| hasSegments() | boolean | true if segment data is present |
| hasWords() | boolean | true if word-level timestamp data is present |
| getSegments() | array | Array of segment structs with timestamps |
| getWords() | array | Array of word structs with timestamps |
| toStruct() | struct | Full struct representation |
| toJSON() | string | JSON-serialized response |
| toString() | string | Alias for getText() |
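The methods above could be combined into a small reporting helper along these lines. describeTranscription() is a hypothetical function written for illustration, not part of the module. It takes an AiTranscriptionResponse instance as its argument; how you obtain that instance from aiTranscribe() or aiTranslate() is not covered in this section, so that step is left out.

```boxlang
// Hypothetical helper: accepts an AiTranscriptionResponse instance
function describeTranscription( required transcription ) {
    // Nothing to report when the provider returned no text
    if ( !transcription.hasText() ) {
        return "No speech detected"
    }

    var summary = "#transcription.getWordCount()# words, duration #transcription.getFormattedDuration()#"

    // Segment data is optional, so guard before reading it
    if ( transcription.hasSegments() ) {
        summary &= ", #arrayLen( transcription.getSegments() )# segments"
    }

    return summary
}
```

The helper deliberately avoids reading keys inside the segment structs, since their exact shape is not specified here.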

AiBaseResponse (shared base)

Both response classes extend AiBaseResponse, which provides:

  • model — model used for the operation

  • provider — provider name (string)

  • metadata — raw provider response metadata struct

  • timestamp — datetime when response was created

  • toJSON() — serialize to JSON

  • getMetadataValue( key ) — read from metadata bag

  • setMetadataValue( key, value ) — write to metadata bag (chainable)


🤖 ElevenLabs Provider

A new elevenlabs provider brings high-quality, multilingual neural voice synthesis.

Supported operations: Text-to-speech only (STT/translation not supported)

Authentication: Set ELEVENLABS_API_KEY in your environment or pass apiKey in options. ElevenLabs uses the xi-api-key header (handled automatically by the module).

Default model: eleven_multilingual_v2

Voice selection: Pass a voice_id from your ElevenLabs voice library in params.
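A minimal sketch of an ElevenLabs text-to-speech call, under two assumptions: that a provider key in options selects the provider (as with other bx-ai BIFs), and that ELEVENLABS_API_KEY is already set in the environment. The voice ID and output path are placeholders.

```boxlang
// Assumptions: the `provider` option selects ElevenLabs, and YOUR_VOICE_ID
// stands in for a voice_id from your own ElevenLabs voice library
speech = aiSpeak(
    "Bienvenue sur BoxLang AI !",
    { voice_id : "YOUR_VOICE_ID" },
    { provider : "elevenlabs" }
)

// No model is passed, so the provider default (eleven_multilingual_v2) applies
speech.saveToFile( "/audio/bienvenue.mp3" )
```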

Module configuration:


⚡ Async Runnables — runAsync()

All IAiRunnable objects now expose a runAsync() method that dispatches execution to the io-tasks virtual thread pool and returns a BoxFuture.

Affected types: AiModel, AiAgent, AiRunnableSequence, AiTransformRunnable, AiRunnableParallel

This is especially powerful when running multiple independent tasks simultaneously:
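A minimal sketch of that pattern follows. It assumes aiModel( provider ) builds an AiModel and that runAsync() accepts a plain string prompt in the same way run() does; the provider names and prompts are illustrative only.

```boxlang
// Assumptions: aiModel( provider ) returns an AiModel, and runAsync()
// accepts a string prompt; provider names are illustrative
summaryFuture  = aiModel( "openai" ).runAsync( "Explain virtual threads in one sentence." )
keywordsFuture = aiModel( "claude" ).runAsync( "List five keywords about virtual threads." )

// Both requests are now in flight; get() blocks until each one completes
summary  = summaryFuture.get()
keywords = keywordsFuture.get()
```

Because each call returns a BoxFuture immediately, the total wait is roughly that of the slower request rather than the sum of both.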


🔀 Parallel Pipelines — aiParallel()

The new aiParallel() BIF creates an AiRunnableParallel that fans out a single input to multiple named runnables concurrently and collects the results into a struct keyed by those names.

| Parameter | Type | Description |
| --- | --- | --- |
| runnables | struct | Named struct where each key is a result name and each value is an IAiRunnable |

Returns: AiRunnableParallel — implements IAiRunnable, so it can be used in any pipeline.

Because AiRunnableParallel implements IAiRunnable, it composes into larger pipelines:
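The sketch below first runs a parallel fan-out on its own, then chains it into a further step. It assumes aiModel( provider ) returns an IAiRunnable, that runnables chain with to(), and that aiTransform() wraps a function as a pipeline step; the provider names are illustrative.

```boxlang
// Assumptions: aiModel( provider ) returns an IAiRunnable; provider names
// are illustrative
compare = aiParallel( {
    openai : aiModel( "openai" ),
    claude : aiModel( "claude" )
} )

// The same input goes to every runnable; results come back keyed by name
results = compare.run( "Explain virtual threads in two sentences." )
println( results.openai )
println( results.claude )

// Composition sketch, assuming to() chaining and the aiTransform() BIF:
// the transform step receives the struct of named results
pipeline = compare.to( aiTransform( ( r ) => structKeyList( r ) ) )
```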


🎤 Audio Agent Tools

Three new tools are auto-registered in the global AI Tool Registry at module startup, making audio capabilities immediately available to any agent:

| Tool Key | Description |
| --- | --- |
| speak@bxai | Convert text to speech; returns the saved file path (auto-generates a temp file if no outputFile is supplied) |
| transcribe@bxai | Transcribe a local audio file or URL to plain text |
| translate@bxai | Translate any-language audio to English text |

Opt-in by referencing the tool keys in your agent definition — audio tools are not injected automatically:
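A minimal sketch of such an opt-in, assuming the tools argument of aiAgent() accepts registry keys as strings; the agent name, instructions, and file path are placeholders.

```boxlang
// Assumption: the `tools` argument accepts registry keys as strings
narrator = aiAgent(
    name         : "narrator",
    instructions : "Transcribe audio files when asked and read summaries aloud.",
    tools        : [ "transcribe@bxai", "speak@bxai" ]
)

// translate@bxai is not listed above, so this agent cannot call it
reply = narrator.run( "Transcribe /recordings/meeting.mp3 and speak a one-line summary." )
```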


🗂️ FileSystem Agent Tools

A new FileSystemTools class (models/tools/filesystem/FileSystemTools.bx) ships 19 @mcpTool-annotated methods that cover the full filesystem lifecycle.

⚠️ Security-first design: FileSystemTools is NOT auto-registered. You must opt in explicitly via aiToolRegistry().scanClass() so agents never receive filesystem access unless you grant it.

Path-guard constructor: Pass allowedPaths: [...] to canonicalize and validate every path argument at invocation time, blocking directory-traversal attacks before any file operation runs.

Available tool keys:

| Tool Key | Description |
| --- | --- |
| readFile@bxai | Read a file's contents |
| readMultipleFiles@bxai | Read multiple files at once |
| writeFile@bxai | Write content to a file |
| appendFile@bxai | Append content to a file |
| editFile@bxai | Apply targeted edits to a file |
| fileMetadata@bxai | Get file size, timestamps, permissions |
| pathExists@bxai | Check whether a path exists |
| deleteFile@bxai | Delete a file |
| moveFile@bxai | Move or rename a file |
| copyFile@bxai | Copy a file |
| searchFiles@bxai | Search for files by pattern or content |
| listAllowedDirectories@bxai | List directories the agent is allowed to access |
| listDirectory@bxai | List directory contents |
| directoryTree@bxai | Recursive directory tree |
| createDirectory@bxai | Create a directory |
| deleteDirectory@bxai | Delete a directory |
| zipFiles@bxai | Compress files into a zip archive |
| unzipFile@bxai | Extract a zip archive |
| checkZipFile@bxai | Inspect a zip archive's contents |
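Putting the opt-in, the path guard, and the tool keys together might look like the sketch below. The dotted class path is an assumption derived from the file location given above and may differ in your installation; passing an instance to scanClass() and passing string keys in tools are also assumptions. The directory and agent name are placeholders.

```boxlang
// Assumptions: the class path mirrors models/tools/filesystem/FileSystemTools.bx
// inside the module, and scanClass() accepts an instance
fsTools = new bxModules.bxai.models.tools.filesystem.FileSystemTools(
    allowedPaths : [ "/var/app/reports" ]
)
aiToolRegistry().scanClass( fsTools )

// Grant only read-oriented tools to this agent
auditor = aiAgent(
    name  : "report-auditor",
    tools : [ "readFile@bxai", "listDirectory@bxai", "searchFiles@bxai" ]
)
```

Listing only the keys an agent needs keeps write and delete operations out of its reach even after the class has been registered.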


🧠 New Memory Events

Two additional interception points are now fired by the memory subsystem.

| Event | When Fired | Available Data |
| --- | --- | --- |
| onHybridMemoryAdd | When a message is added to a HybridMemory instance | memory, message |
| onVectorSearch | When a semantic search runs against a vector memory | memory, query, limit, results |
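One way to observe these events is a small logging listener, sketched below. It assumes the BoxLang boxRegisterInterceptor() BIF accepts a closure together with a state name, that the listener receives the event data as a struct, and that results is an array; none of this is confirmed by these notes.

```boxlang
// Assumptions: boxRegisterInterceptor() accepts a closure plus a state name,
// `data` carries the keys listed in the table, and `results` is an array
boxRegisterInterceptor(
    ( data ) => {
        println( "Vector search '#data.query#' (limit #data.limit#) returned #arrayLen( data.results )# results" )
    },
    "onVectorSearch"
)
```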


🔊 Audio Module Configuration

A new audio section in module settings provides defaults for all audio operations:

📡 New Audio Events

Six new interception points cover all audio operations:

| # | Event | When Fired |
| --- | --- | --- |
| 34 | beforeAISpeech | Before a text-to-speech request is sent |
| 35 | afterAISpeech | After a text-to-speech response is received |
| 36 | beforeAITranscription | Before a speech-to-text request is sent |
| 37 | afterAITranscription | After a speech-to-text response is received |
| 38 | beforeAITranslation | Before an audio translation request is sent |
| 39 | afterAITranslation | After an audio translation response is received |


🐛 Bug Fixes

  1. Streaming onAITokenCount never fired: chatStream() across all providers never announced the onAITokenCount event, making all streaming calls invisible to usage tracking, billing interceptors, and monitoring. The non-streaming chat() path was unaffected.

  2. AiModel.stream() missing middleware: Agent and model middleware were not injected into chatRequest during streaming, unlike the run() path — now consistent.

  3. Closure scoping in tool calls: DockerModelRunnerService, OpenAIService.chat(), OpenAIService.chatStream(), and CohereService.chat() all had ArgumentsScope resolution failures inside .each() / .map() closures used for tool calling; fixed by capturing outer variables into local vars before the closure.

  4. onAITokenCount standardization: Event data shape was inconsistent across providers. Standardized and added the missing event fire to BedrockService, ClaudeService, CohereService, and GeminiService.

  5. MCPServer.scan() / scanClass(): Several path-resolution cases and permutation combinations were not handled correctly; now reliable across dot-notation, absolute, relative, and instance inputs.

  6. aiAgent() skills / availableSkills normalization: Both parameters now accept a single skill instance or an array — the BIF normalizes to an array internally, removing the requirement to wrap a single skill in [].

  7. ModuleConfig.bx startup order: Module now listens to onRuntimeStart() instead of an earlier hook, ensuring caches and other services are fully initialized before skills and global tool registry setup runs.

  8. Flight recorder tape directory: Tapes were written to an incorrect directory path; corrected.

  9. Docker Service interface compatibility: Resolved breakage caused by the capability-interface refactor introduced in 3.0.0.

  10. Memory count off-by-one: WindowMemory.count() returned maxMessages + 1 instead of the actual message count when at capacity; it now returns the actual count.

  11. Tool result serialization: Nested arrays in tool return values were double-serialized as JSON strings; fixed.

  12. Stream callback null chunk: Streaming responses from Gemini occasionally emitted a null chunk on the final SSE event; this is now handled gracefully.

  13. AWS Bedrock signature: The signed URL included a trailing slash in the path, which caused SignatureDoesNotMatch errors on certain endpoints; corrected.

  14. ElevenLabs MIME detection: AiSpeechResponse.getMimeType() returned application/octet-stream for ElevenLabs PCM audio; corrected to audio/pcm.

  15. Groq transcription model default: The module sent whisper-1 instead of whisper-large-v3 when no model was specified for Groq; corrected.

  16. aiParallel empty struct: Calling aiParallel({}).run("input") threw a NullPointerException; it now returns an empty struct.

  17. SessionMemory web-only check: SessionMemory threw a misleading error when called outside a web context; the error message now clearly states that it requires an active HTTP session.

  18. PDF loader binary mode: PDFLoader opened files in text mode on Windows, corrupting multi-byte characters; it now always opens files in binary mode.

  19. HybridMemory deduplication: Combining WindowMemory recent messages with vector search results could produce duplicate messages when the same message ranked highly in both sources; fixed.

  20. aiTranscribe URL redirect: HTTP redirects (301/302) from audio file CDNs were not followed; the BIF now follows up to 5 redirects before throwing.


📖 Documentation
