Text-to-Speech

Convert any text to natural-sounding audio using AI providers. aiSpeak() returns binary audio data wrapped in an AiSpeechResponse with helpers for saving, encoding, and streaming.

The returned AiSpeechResponse object holds the binary audio data and provides convenience methods for saving to disk, encoding as Base64, generating data URIs for HTML playback, and inspecting metadata.

🔧 The aiSpeak() Function

Syntax

aiSpeak( text, params={}, options={} )

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `text` | string | ✅ Yes | The text to synthesize into speech |
| `params` | struct | No | Provider API parameters such as model, voice, speed |
| `options` | struct | No | Module-level options such as provider, apiKey, outputFile |

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `provider` | string | (config) | AI provider: openai, mistral, gemini, grok, elevenlabs |
| `apiKey` | string | (env var) | Provider API key (falls back to the `<PROVIDER>_API_KEY` env var) |
| `voice` | string | (config) | Voice name or ID for the provider, or the gender keyword "male" / "female" (resolved per provider via audio.voiceGenderMap in your config) |
| `outputFormat` | string | mp3 | Audio output format: mp3, wav, flac, opus, pcm |
| `speed` | numeric | 1.0 | Playback speed multiplier (range: 0.25–4.0) |
| `outputFile` | string | "" | If non-empty, saves audio to this path and returns the file path string instead of AiSpeechResponse |
| `timeout` | numeric | 30 | HTTP request timeout in seconds |
| `logRequest` | boolean | false | Log requests to the module log file |
| `logRequestToConsole` | boolean | false | Print request payload to console |
| `logResponse` | boolean | false | Log responses to the module log file |
| `logResponseToConsole` | boolean | false | Print raw provider response to console |

📦 Return Value — AiSpeechResponse

When outputFile is not set, aiSpeak() returns an AiSpeechResponse object wrapping the binary audio data.

When outputFile is set, aiSpeak() saves the audio to that path and returns the absolute file path string instead of the response object.

AiSpeechResponse Methods

| Method | Returns | Description |
| --- | --- | --- |
| `saveToFile( filePath )` | string | Saves audio binary to disk; returns the absolute path |
| `getBase64()` | string | Returns the audio data as a Base64-encoded string |
| `getMimeType()` | string | Returns the MIME type (e.g. audio/mpeg for mp3) |
| `toDataURI()` | string | Returns a `data:audio/mpeg;base64,...` URI for an HTML `<audio>` src attribute |
| `hasAudio()` | boolean | Returns true if audio binary data is present and non-empty |
| `getSize()` | numeric | Returns the size of the audio data in bytes |
| `getAudioFormat()` | string | Returns the audio format string (e.g. mp3, wav) |
| `toStruct()` | struct | Returns a metadata struct (no binary data, so safe for logging) |
| `toJSON()` | string | Returns JSON-serialized metadata |
| `getMetadataValue( key )` | any | Reads a value from the response metadata bag |
| `setMetadataValue( key, value )` | this | Writes a value to the metadata bag (fluent, chainable) |

💡 Examples

Basic — synthesize and save to file
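A minimal sketch of the basic flow, using only the function and response methods documented above. The sample text and the `/tmp/welcome.mp3` path are illustrative, and the call assumes a default provider and API key are already configured.

```boxlang
// Synthesize a short sentence with the configured default provider
response = aiSpeak( "Welcome to BoxLang AI!" )

// Guard against an empty result before writing to disk
if ( response.hasAudio() ) {
    // saveToFile() returns the absolute path of the written file
    savedPath = response.saveToFile( "/tmp/welcome.mp3" )
    println( "Saved #response.getSize()# bytes (#response.getMimeType()#) to #savedPath#" )
}
```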

Shorthand with outputFile option

When you only need the file on disk, use outputFile to skip the response object entirely:
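As a sketch, the shorthand might look like this. The empty struct in the second position is the `params` argument, and the output path is a placeholder.

```boxlang
// With outputFile set, aiSpeak() returns the file path string,
// not an AiSpeechResponse object
filePath = aiSpeak(
    "Your order has shipped.",
    {},
    { outputFile: "/tmp/order-shipped.mp3" }
)

println( "Audio written to #filePath#" )
```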

Custom provider and voice
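The following sketch selects a provider, voice, speed, and format explicitly. All four keys are taken from the Options table above; the specific values are illustrative, and the API key is assumed to come from the provider's environment variable.

```boxlang
// Override the configured defaults for a single call
response = aiSpeak(
    "Thanks for calling. How can I help you today?",
    {},
    {
        provider     : "openai",
        voice        : "nova",
        speed        : 1.1,
        outputFormat : "wav"
    }
)

// getAudioFormat() reports the format of the returned audio
println( "Format: #response.getAudioFormat()#, size: #response.getSize()# bytes" )
```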

🧱 Fluent Builder API (v3.2.0+)

Calling aiSpeak() with no arguments returns an AiSpeechRequest builder object for method chaining. This provides a more readable, self-documenting way to configure speech synthesis.

Basic Builder Usage
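A minimal builder chain might look like the sketch below. It uses only methods listed in the Builder Methods table; the text, voice, and output path are illustrative.

```boxlang
// aiSpeak() with no arguments returns the AiSpeechRequest builder
response = aiSpeak()
    .of( "Hello from the fluent builder!" )
    .voice( "nova" )
    .asMP3()
    .speak()   // terminator: runs the request and returns an AiSpeechResponse

response.saveToFile( "/tmp/hello.mp3" )
```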

Builder Methods

| Method | Description |
| --- | --- |
| `of( text )` | Set the text to synthesize (static factory) |
| `.text( text )` | Alias for of() |
| `.model( name )` | Set the TTS model |
| `.provider( name )` | Set the provider |
| `.apiKey( key )` | Set the API key |
| `.voice( name )` | Set the voice name |
| `.male()` | Use male voice (resolved per provider) |
| `.female()` | Use female voice (resolved per provider) |
| `.speed( n )` | Set playback speed (0.25–4.0) |
| `.instructions( text )` | Set voice instructions |
| `.outputFile( path )` | Set output file path |
| `.asMP3()` | Set output format to MP3 |
| `.asWav()` | Set output format to WAV |
| `.asFlac()` | Set output format to FLAC |
| `.asOpus()` | Set output format to Opus |
| `.asPCM()` | Set output format to PCM |
| `.withParams( struct )` | Set provider params |
| `.withOptions( struct )` | Set module options |
| `.withLogging()` | Enable request/response logging |
| `.speak()` | Terminator: executes the request and returns an AiSpeechResponse |

Fluent Examples
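A slightly fuller chain, as a sketch, combines provider selection, a gender keyword, speed, format, and logging. Every method comes from the Builder Methods table; the values are illustrative.

```boxlang
// Configure several aspects of the request in one readable chain
response = aiSpeak()
    .text( "Your verification code is 4, 2, 7, 1." )
    .provider( "openai" )
    .female()          // resolved to a concrete voice per provider
    .speed( 0.9 )      // slightly slower than the 1.0 default
    .asWav()
    .withLogging()
    .speak()

// toJSON() serializes metadata only, so it is safe to print
println( response.toJSON() )
```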

💡 Backward Compatible: The traditional aiSpeak( text, params, options ) syntax continues to work unchanged. The fluent builder is an additional option — no migration required.

Gender keyword voices

Use "male" or "female" for provider-agnostic voice selection. The module resolves the keyword to the concrete voice configured in audio.voiceGenderMap for the active provider — no need to remember provider-specific names:
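As a sketch, the same keyword can be reused across providers without changing the calling code. The provider list here is illustrative, and each call assumes the matching API key is available.

```boxlang
// The "female" keyword is resolved per provider via audio.voiceGenderMap
for ( providerName in [ "openai", "gemini" ] ) {
    response = aiSpeak(
        "Good morning!",
        {},
        { provider: providerName, voice: "female" }
    )
    println( "#providerName#: #response.getSize()# bytes" )
}

// Builder equivalent for the male keyword
response = aiSpeak().of( "Good evening!" ).male().speak()
```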

Base64 / data URI for web responses

Embed audio directly in an HTML page or API response — no file I/O required:
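The sketch below shows both uses: an inline `<audio>` tag built from `toDataURI()`, and a JSON payload built from `getBase64()`. The payload key names are hypothetical; `jsonSerialize()` is the standard BoxLang serializer.

```boxlang
response = aiSpeak( "Click play to hear this message." )

// Inline player: the data URI goes straight into the src attribute
audioTag = '<audio controls src="#response.toDataURI()#"></audio>'

// API payload: key names here are illustrative, not part of the module
payload = {
    "mimeType" : response.getMimeType(),
    "format"   : response.getAudioFormat(),
    "audio"    : response.getBase64()
}
json = jsonSerialize( payload )
```

Base64 encoding adds roughly a third to the payload size, so this approach suits short clips better than long recordings.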

ElevenLabs — high-quality multilingual voice
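A sketch of an ElevenLabs call, assuming the `voice_id` key is passed through `params` as described in the Provider Voice Reference section of this page. The ID shown is the one that section lists for the "female" keyword; substitute an ID from your own voice library.

```boxlang
// ElevenLabs voices are addressed by ID rather than by name
response = aiSpeak(
    "Bonjour, et bienvenue !",
    { "voice_id": "21m00Tcm4TlvDq8ikWAM" },
    { provider: "elevenlabs" }
)

response.saveToFile( "/tmp/bienvenue.mp3" )
```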

Generate comparison files across all voices
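One way to do this, sketched below, is to loop over the OpenAI voice names from the Provider Voice Reference and write one file per voice using the `outputFile` option. The output directory and sample sentence are illustrative.

```boxlang
voices     = [ "alloy", "ash", "echo", "fable", "onyx", "nova", "shimmer" ]
outputDir  = "/tmp/voice-samples"
sampleText = "The quick brown fox jumps over the lazy dog."

// Make sure the target directory exists before writing files
if ( !directoryExists( outputDir ) ) {
    directoryCreate( outputDir )
}

for ( voice in voices ) {
    // outputFile makes aiSpeak() return the saved path
    savedPath = aiSpeak(
        sampleText,
        {},
        { provider: "openai", voice: voice, outputFile: "#outputDir#/#voice#.mp3" }
    )
    println( "Wrote #savedPath#" )
}
```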

🎙️ Provider Voice Reference

| Provider | Available Voices | "male" keyword | "female" keyword |
| --- | --- | --- | --- |
| OpenAI | alloy, ash, echo, fable, onyx, nova, shimmer | ash | nova |
| Mistral | Charlotte | (provider default) | Charlotte |
| Gemini | Fenrir, Aoede, Kore (and others via API) | Fenrir | Aoede |
| Grok / xAI | alloy, echo, fable, onyx, nova, shimmer, eve | onyx | nova |
| ElevenLabs | Voice IDs from your ElevenLabs voice library | (provider default) | 21m00Tcm4TlvDq8ikWAM |

For ElevenLabs, pass a voice_id in params for specific voices. The "male"/"female" keywords resolve to the IDs configured in audio.voiceGenderMap.

Customising the gender map

The defaults can be overridden per-provider in config/boxlang.json:
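The fragment below is a sketch only. The `audio.voiceGenderMap` key comes from this page, but the module name `bxai`, the `modules` / `settings` nesting, and the per-provider `male` / `female` shape are assumptions, so check them against your installed module's settings before relying on them.

```json
{
  "modules": {
    "bxai": {
      "settings": {
        "audio": {
          "voiceGenderMap": {
            "openai": { "male": "onyx", "female": "shimmer" }
          }
        }
      }
    }
  }
}
```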

📡 Events

aiSpeak() fires two interception points you can hook into for logging, auditing, or modifying behavior.

| Event | Data Available |
| --- | --- |
| `beforeAISpeech` | speechRequest, service |
| `afterAISpeech` | speechRequest, service, result |
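As a rough sketch, listeners for both interception points could be registered with BoxLang's `boxRegisterInterceptor()` BIF. Passing a closure plus a state name is assumed here; a class-based interceptor with `beforeAISpeech()` and `afterAISpeech()` methods is the other common shape. The log messages are illustrative.

```boxlang
// Runs before the provider is called; data carries speechRequest and service
boxRegisterInterceptor(
    ( data ) => {
        println( "beforeAISpeech received: #data.keyList()#" )
    },
    "beforeAISpeech"
)

// Runs after synthesis; data additionally carries result
boxRegisterInterceptor(
    ( data ) => {
        println( "afterAISpeech received: #data.keyList()#" )
    },
    "afterAISpeech"
)
```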

