Transformers & Return Formats

A guide to transforming and processing data between AI pipeline steps, using built-in transformers (return formats) and custom data transformations.


🎯 Built-In Transformers: Return Formats

The most common "transformers" in bx-ai are return formats - built-in ways to automatically transform AI responses.

🏗️ Transformation Pipeline

📊 Available Return Formats

All AI functions accept a returnFormat option that controls response transformation:

| Format | Description | Returns | Use Case |
| --- | --- | --- | --- |
| single | Extract content only | String | Simple text responses |
| all | Full messages array | Array | Conversation history |
| raw | Complete API response | Struct | Debugging, metadata |
| json | Parse JSON response | Any | Structured data |
| xml | Parse XML response | XML Object | XML documents |

Single Format (Default for Functions)

Returns just the content string - the most common use case:
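As a minimal sketch, assuming the `aiChat()` BIF takes the message first, then request parameters, then an options struct (the argument order and the prompt are assumptions, not confirmed on this page), the default and explicit forms might look like this:

```boxlang
// Default for functions: only the content string comes back
answer = aiChat( "What is the capital of France?" )

// The same request with the return format stated explicitly
answer = aiChat(
    "What is the capital of France?",
    {},                          // request parameters (none here)
    { returnFormat: "single" }   // options struct carrying the return format
)

println( answer )
```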

Perfect for:

  • Simple questions

  • Text generation

  • When you only need the answer

All Format

Returns complete messages array with roles and metadata:

Perfect for:

  • Conversation history

  • Multi-turn chats

  • Analyzing conversation flow

Raw Format (Default for Pipelines)

Returns the complete API response with all metadata:
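A hedged sketch of reading metadata from a raw response follows. It assumes the `aiChat( message, params, options )` argument order and an OpenAI-style payload; the `choices` and `usage` keys are assumptions and other providers may name them differently.

```boxlang
response = aiChat(
    "Summarize BoxLang in one sentence",
    {},
    { returnFormat: "raw" }
)

// Assumption: OpenAI-style response shape. Check your provider's payload.
content = response.choices[ 1 ].message.content   // BoxLang arrays are 1-indexed
tokens  = response.usage.total_tokens

println( "Tokens used: " & tokens )
```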

Perfect for:

  • Token usage tracking

  • Debugging

  • Custom response processing

  • Accessing metadata

JSON Format (NEW!)

Automatically parses JSON responses:
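For illustration, a minimal sketch of the `json` return format, assuming the `aiChat( message, params, options )` argument order. The prompt and the `name` and `age` keys are hypothetical, and the model has to actually return valid JSON for the parse to succeed.

```boxlang
user = aiChat(
    "Return a JSON object with the keys name and age for a fictional person. Respond with JSON only.",
    {},
    { returnFormat: "json" }
)

// The response arrives already parsed, so keys can be read directly
println( user.name )
println( user.age )
```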

Perfect for:

  • Structured data extraction

  • API-like responses

  • Data transformation

  • Form generation

Advanced JSON Usage:

XML Format (NEW!)

Automatically parses XML responses:
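A small sketch of the `xml` return format, again assuming the `aiChat( message, params, options )` argument order. The prompt and element names are hypothetical; `xmlSearch()` is the core XPath BIF, used here only to show that the result is a parsed XML object rather than a string.

```boxlang
doc = aiChat(
    "Create an XML document with a root element books containing two book elements. Respond with XML only.",
    {},
    { returnFormat: "xml" }
)

// Query the parsed document with an XPath expression
books = xmlSearch( doc, "//book" )
println( books.len() )
```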

Perfect for:

  • XML document generation

  • Legacy system integration

  • RSS/ATOM feeds

  • SOAP responses

Advanced XML Usage:

Using Return Formats in Pipelines

Pipelines use raw format by default, but you can set any format:

Helper methods for common formats:
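The helper methods named later in this guide (`.singleMessage()`, `.allMessages()`, `.rawResponse()`, `.asJson()`, `.asXml()`) can be sketched in a pipeline as follows. The `aiMessage()`, `.user()`, and `.toDefaultModel()` calls are assumptions about how the pipeline is started and are not confirmed on this page.

```boxlang
// Assumption: aiMessage() starts a pipeline and toDefaultModel() attaches the configured model
pipeline = aiMessage()
    .user( "List three primary colors as a JSON array" )
    .toDefaultModel()
    .asJson()   // parse the reply instead of returning the raw response

colors = pipeline.run()
```

Swapping `.asJson()` for `.singleMessage()` would return the plain content string instead, while omitting the helper keeps the pipeline default of `raw`.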

Comparing Formats

🧰 Core Built-In Transformers

BoxLang AI ships with several powerful built-in transformers ready to use in your pipelines:

CodeExtractorTransformer

Extracts code blocks from AI responses, particularly useful when AI returns code embedded in markdown formatting.

Features:

  • Extract code from markdown code blocks (```language ... ```)

  • Filter by programming language (or extract all)

  • Extract single or multiple code blocks

  • Include metadata (language, line numbers, etc.)

  • Strip comments and normalize formatting

  • Strict mode for error handling

Configuration Options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| language | string | "all" | Filter by language ("all", "python", "java", etc.) |
| multiple | boolean | false | Extract all blocks (true) or first only (false) |
| returnMetadata | boolean | false | Return metadata with code or just code string |
| stripComments | boolean | false | Remove comments from extracted code |
| trim | boolean | true | Trim whitespace from code blocks |
| stripMarkdown | boolean | true | Look for markdown code blocks |
| strictMode | boolean | false | Throw error if no code found |
| defaultLanguage | string | "text" | Default language when not specified |

Basic Usage:

Hope this helps! """;

// Extract just the Python code
code = extractor.transform( aiResponse );
// Returns: "def add_numbers(a, b):\n    return a + b\n\nresult = add_numbers(5, 3)\nprint(result)"

Extract Multiple Blocks:

JavaScript example:

""";

blocks = extractor.transform( multiCodeResponse );
// Returns: [
//   { language: "python", code: 'print("Hello")' },
//   { language: "javascript", code: 'console.log("Hello");' }
// ]

Does this help? """;

// Extract and parse JSON
data = extractor.transform( aiResponse );
// Returns: { name: "John Doe", age: 30, email: "[email protected]" }

Path Extraction:

""";

users = extractor.transform( response );
// Returns: [ { id: 1, name: "Alice" }, { id: 2, name: "Bob" } ]

Use Cases:

  • ✅ Extracting structured data from AI responses

  • ✅ Building form auto-population from AI

  • ✅ API response parsing

  • ✅ Configuration generation


XMLExtractorTransformer

Extracts and validates XML from AI responses, with support for XPath queries and case-sensitive parsing.

Features:

  • Extract XML from markdown code blocks

  • Find XML in mixed text (looks for <?xml or root tags)

  • Parse and validate XML structure

  • XPath queries for specific elements

  • Case-sensitive or case-insensitive parsing

  • Strict mode for error handling

Configuration Options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| stripMarkdown | boolean | true | Remove markdown code block formatting |
| strictMode | boolean | false | Throw error if XML invalid or not found |
| xPath | string | "" | XPath query to extract specific elements |
| returnRaw | boolean | false | Return raw XML string instead of parsed |
| caseSensitive | boolean | true | Case-sensitive parsing |

Basic Usage:

""";

// Extract and parse XML
config = extractor.transform( aiResponse );
// Returns parsed XML document object

XPath Queries:

""";

hosts = extractor.transform( response );
// Returns array of matching nodes: [localhost]

Configuration Options:

AiTransformRunnable

A wrapper class that converts any lambda or closure into a pipeline-compatible transformer. This is what the aiTransform() BIF creates internally.

Features:

  • Converts functions to IAiRunnable interface

  • Fluent API support

  • Pipeline integration

  • Named transformers

Usage:
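A minimal sketch of wrapping a function with `aiTransform()` is shown here. Calling `.run( input )` directly on the result is an assumption based on the IAiRunnable interface mentioned above, and the `shout` name is purely illustrative.

```boxlang
// Wrap a plain closure so it can act as a pipeline step
shout = aiTransform( ( text ) => text.ucase() )

// Assumption: every runnable exposes run( input )
result = shout.run( "hello" )

println( result )
```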

Chaining Multiple Transformers:

🔧 Custom Transformers

Transformers process data between pipeline steps. They implement the IAiRunnable interface but ignore the options parameter since they don't interact with AI providers.

🔄 Custom Transform Flow

Inline Transform
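As a hedged sketch, an inline transform can be attached directly to a pipeline. The `.transform()` method and the `aiMessage()`, `.user()`, and `.toDefaultModel()` starters are assumptions about the pipeline API; `.singleMessage()` is the helper named later in this guide.

```boxlang
pipeline = aiMessage()
    .user( "Write a one-line tagline for a programming language" )
    .toDefaultModel()
    .singleMessage()                        // hand the transform a plain string
    .transform( ( text ) => text.trim() )   // inline step: strip surrounding whitespace

tagline = pipeline.run()
```

Setting the return format before the transform keeps the closure simple, because it receives a string rather than the full raw response struct.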

Using aiTransform()

Named Transformer

Return Format Examples in Pipelines

Simple Text Extraction with .singleMessage()

JSON Data Extraction with .asJson()

XML Document Generation with .asXml()

Full Response with .rawResponse()

Conversation History with .allMessages()

Combining Return Formats with Custom Transforms

JSON Then Transform

XML Then Extract Data

Raw Response for Debugging

Common Transformations

Extract Content from Raw Response

String Manipulation

Parse JSON

Extract Code

⛓️ Chaining Transforms

🔗 Transform Chain Architecture

Sequential Processing

Data Enrichment

Options in Transformers

Transformers accept the options parameter for interface consistency but ignore it since they don't make AI requests:

Why options exist: Transformers implement the IAiRunnable interface, which requires the options parameter. This keeps the API consistent across all pipeline components, even though transformers never use the options.

Options propagation: When transformers are part of a pipeline sequence, options flow through to AI components:

Advanced Transforms

Conditional Logic

Error Handling

Data Validation

Practical Examples

Markdown to HTML

SQL Generator

Response Cache

Multi-Format Output

Transform Patterns

Filter Pattern

Map Pattern

Reduce Pattern

Aggregate Pattern

Transform Library

TransformAndRun Shortcut

Combine transform and run in one step:

Best Practices

  1. Keep Transforms Simple: One responsibility per transform

  2. Handle Errors: Use try/catch in transforms

  3. Document Logic: Comment complex transformations

  4. Test Transforms: Unit test transformation functions

  5. Chain Appropriately: Logical sequence of operations

  6. Return Consistent Types: Predictable output format

  7. Use Named Transforms: For reusability
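Several of these practices can be illustrated with plain closures, independent of any AI call. In this sketch the `trimText` and `safeParse` names and the `success` / `data` / `error` result shape are hypothetical choices, not part of the library.

```boxlang
// One responsibility per transform: this one only trims
trimText = ( text ) => text.trim()

// Handle errors and return a consistent type:
// the result is always a struct with the same three keys
safeParse = ( text ) => {
    try {
        return { success: true, data: jsonDeserialize( text ), error: "" }
    } catch ( any e ) {
        return { success: false, data: {}, error: e.message }
    }
}

// Chain in a logical order: clean first, then parse
good = safeParse( trimText( '  { "name": "Alice" }  ' ) )
bad  = safeParse( trimText( "not json" ) )

println( good.success )
println( bad.success )
```

Because both calls return the same struct shape, downstream steps can branch on `success` without type checks, and each closure can be unit tested on its own.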

Testing Transforms

🏗️ Building Your Own Transformers

Want to create custom transformers for your specific needs? BoxLang AI provides a complete framework for building reusable, pipeline-compatible transformers.

Learn More:

  • Building Custom Transformers - Complete guide with examples:

    • Implementing the ITransformer interface

    • Extending BaseTransformer

    • Real-world examples (JSONSchemaTransformer, code extractor, sentiment analyzer)

    • Pipeline integration patterns

    • Testing and best practices

Common Custom Transformer Use Cases:

  • 🔍 Data Validation - Validate and sanitize AI responses

  • 🔄 Format Conversion - Convert between JSON, XML, and custom formats

  • 📊 Content Extraction - Parse specific data from responses (code, prices, entities)

  • 🧮 Business Logic - Apply domain-specific rules and calculations

  • 📝 Logging & Monitoring - Track and audit data flow through pipelines

Next Steps
