Building Custom Document Loaders

Custom document loaders let you integrate any data source into your BoxLang AI workflows. This guide shows how to build loaders that work with memory systems, RAG pipelines, and the aiDocuments() BIF.

🎯 Why Custom Loaders?

Build custom loaders when you need to:

  • Load proprietary formats - Parse custom file formats or data structures

  • Integrate APIs - Pull data from external services or APIs

  • Transform data - Apply domain-specific logic during loading

  • Add metadata - Extract and preserve rich metadata from sources

  • Optimize performance - Implement specialized caching or batching

🏗️ Loader Architecture

📝 IDocumentLoader Interface

All document loaders must implement the IDocumentLoader interface:

interface {
    /**
     * Load documents from the source
     * @return Array of Document objects
     */
    public array function load();

    /**
     * Load documents asynchronously
     * @return BoxLang Future resolving to array of Documents
     */
    public any function loadAsync();

    /**
     * Get the source being loaded
     * @return string
     */
    public string function getSource();

    /**
     * Configure the loader
     * @param config Configuration struct
     * @return this (for fluent API)
     */
    public any function configure( required struct config );
}

🚀 Quick Start: Simple Custom Loader

Here's a minimal custom loader that loads data from an API:
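As a minimal sketch, the class below implements the four interface methods directly, so it runs without any module on the path. Everything beyond those four method signatures is an assumption made for illustration: the class name `ApiLoader`, the injected `fetcher` closure that stands in for a real HTTP call, the `contentField` option, and the use of plain structs with `content` and `metadata` keys in place of the module's Document type.

```boxlang
// ApiLoader.bx (hypothetical file and class name)
class {

    /**
     * @source  Endpoint URL or other identifier for the API
     * @fetcher Closure that takes the source and returns an array of record structs.
     *          It stands in for the real HTTP call so the sketch runs offline.
     */
    function init( required string source, required function fetcher ) {
        variables.source  = arguments.source;
        variables.fetcher = arguments.fetcher;
        // contentField names the record key that holds the document text (assumed)
        variables.config  = { contentField: "body" };
        return this;
    }

    // Fetch the records and map each one to a document-shaped struct
    public array function load() {
        var records = variables.fetcher( variables.source );
        var field   = variables.config.contentField;
        var source  = variables.source;
        return records.map( ( record ) => {
            return {
                content  : structKeyExists( record, field ) ? record[ field ] : "",
                metadata : {
                    source : source,
                    id     : structKeyExists( record, "id" ) ? record.id : ""
                }
            };
        } );
    }

    // Run load() on a background thread and hand back a future
    public any function loadAsync() {
        return runAsync( () => this.load() );
    }

    public string function getSource() {
        return variables.source;
    }

    // Merge caller options over the defaults and return this for chaining
    public any function configure( required struct config ) {
        variables.config.append( arguments.config, true );
        return this;
    }
}
```

Injecting the fetcher keeps the mapping logic testable without network access. In a real loader the closure would wrap the HTTP request and deserialize the JSON response, and the structs would be replaced by the module's Document objects.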

Usage:

🎨 Extending BaseDocumentLoader

The BaseDocumentLoader provides common functionality:

Inherited Methods

Fluent API Pattern

Implement fluent methods for your custom configuration:
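The pattern can be sketched as a set of setters that each store one option and return `this`. The class name `FluentOptionsSketch` and the option names `pageSize`, `filterBy`, and `includeDrafts` are hypothetical. In a real loader these methods would sit next to load() in the class that extends BaseDocumentLoader, whose own members are not shown here.

```boxlang
// FluentOptionsSketch.bx (hypothetical class and option names)
class {

    function init() {
        variables.config = { pageSize: 100, filters: {}, includeDrafts: false };
        return this;
    }

    // Each setter validates and stores one option, then returns this for chaining
    public any function pageSize( required numeric size ) {
        if ( arguments.size < 1 ) {
            throw( type = "InvalidArgument", message = "pageSize must be at least 1" );
        }
        variables.config.pageSize = arguments.size;
        return this;
    }

    public any function filterBy( required string field, required any value ) {
        variables.config.filters[ arguments.field ] = arguments.value;
        return this;
    }

    public any function includeDrafts( boolean flag = true ) {
        variables.config.includeDrafts = arguments.flag;
        return this;
    }

    // Bulk configuration with the same signature as the interface's configure()
    public any function configure( required struct config ) {
        variables.config.append( arguments.config, true );
        return this;
    }

    public struct function getConfig() {
        return variables.config;
    }
}
```

Validating inside the setter means a bad value fails at the call site rather than later, in the middle of a load.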

Usage:

💡 Advanced Example: Database Loader

Here's a more complex example that loads data from a database with pagination:
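The pagination loop at the heart of such a loader can be sketched on its own. This is not the module's database loader: `loadAllPages` and `fetchPage` are hypothetical names, the row fields `id`, `title`, and `body` are assumed, and the closure below reads from an in-memory array where a real loader would run a LIMIT/OFFSET query.

```boxlang
// Pull pages until a short page signals that the source is exhausted.
// fetchPage( offset, limit ) is a hypothetical closure returning an array of row structs.
function loadAllPages( required function fetchPage, numeric pageSize = 100 ) {
    var documents = [];
    var offset    = 0;

    while ( true ) {
        var page = arguments.fetchPage( offset, arguments.pageSize );

        for ( var row in page ) {
            documents.append( {
                content  : row.body,
                metadata : { id: row.id, title: row.title, offset: offset }
            } );
        }

        // A short or empty page means there is nothing left to read
        if ( page.len() < arguments.pageSize ) {
            break;
        }
        offset += arguments.pageSize;
    }

    return documents;
}

// In-memory stand-in for a table with three rows
tableRows = [
    { id: 1, title: "One",   body: "alpha" },
    { id: 2, title: "Two",   body: "beta" },
    { id: 3, title: "Three", body: "gamma" }
];

fakeFetch = ( offset, limit ) => {
    if ( offset >= tableRows.len() ) {
        return [];
    }
    // arraySlice is 1-based, so shift the zero-based offset by one
    return arraySlice( tableRows, offset + 1, min( limit, tableRows.len() - offset ) );
};

docs = loadAllPages( fakeFetch, 2 );
```

Stopping on a short page costs one extra, empty fetch when the row count is an exact multiple of the page size, but it avoids a separate COUNT query.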

Usage:

🔌 Registering Custom Loaders

Make your custom loader available via aiDocuments():

Module Registration

In your ModuleConfig.bx:

Usage After Registration

✅ Best Practices

1. Error Handling

Always wrap external calls with try/catch:
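One way to do this is to catch the low-level failure and rethrow it with the source attached, so callers see which loader input failed. The function name `loadWithContext`, the `fetcher` closure, and the exception type `DocumentLoaderError` are hypothetical and chosen for this sketch only.

```boxlang
// fetcher is a hypothetical closure wrapping the external call (HTTP, database, file read)
function loadWithContext( required function fetcher, required string source ) {
    try {
        return arguments.fetcher( arguments.source );
    } catch ( any e ) {
        // Rethrow with the source in the message and the original cause in the detail
        throw(
            type    = "DocumentLoaderError",
            message = "Failed to load documents from #arguments.source#",
            detail  = e.message
        );
    }
}

// Demo: a fetcher that always fails
failure = "";
try {
    loadWithContext( ( src ) => { throw( type = "ConnectionError", message = "timeout" ); }, "demo://down" );
} catch ( DocumentLoaderError e ) {
    failure = e.message & " (" & e.detail & ")";
}
```

Rethrowing keeps failures visible. Returning an empty array instead is an option when a partial load is acceptable, at the cost of hiding outages from the caller.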

2. Resource Cleanup

Clean up resources in finally blocks:
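A small sketch of the shape this takes, using a fake resource so it runs anywhere. The helper `withResource` and the struct with `lines` and `close` keys are hypothetical stand-ins for a real file handle, connection, or stream.

```boxlang
// Open a resource, run the work, and always close the resource afterwards
function withResource( required function open, required function work ) {
    var resource = arguments.open();
    try {
        return arguments.work( resource );
    } finally {
        // Runs on success and on failure, so the resource is never leaked
        resource.close();
    }
}

// Fake resource that counts how often it was opened and closed
state = { opened: 0, closed: 0 };
openFake = () => {
    state.opened++;
    return {
        lines : [ "first", "second" ],
        close : () => { state.closed++; }
    };
};

// Successful run: two documents come back and the resource is closed
docs = withResource( openFake, ( res ) => {
    return res.lines.map( ( line ) => {
        return { content: line, metadata: {} };
    } );
} );

// Failing run: the error still propagates, but close() has already run
try {
    withResource( openFake, ( res ) => { throw( type = "ParseError", message = "bad row" ); } );
} catch ( any e ) {
    // expected in this demo
}
```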

3. Metadata Enrichment

Add rich metadata for better retrieval:
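As an illustration, a helper can compute a few generic fields and merge them with whatever the source already provides. The function name `enrichDocument` and the metadata keys `loadedAt`, `charCount`, `wordCount`, and `contentHash` are assumptions, as is the struct shape used for a document. Which fields help retrieval depends on your data.

```boxlang
// Build a document struct whose metadata combines computed defaults,
// the document's own metadata, and caller-supplied extras (later wins)
function enrichDocument( required struct doc, required string source, struct extra = {} ) {
    var content  = structKeyExists( arguments.doc, "content" ) ? arguments.doc.content : "";
    var original = structKeyExists( arguments.doc, "metadata" ) ? arguments.doc.metadata : {};

    var metadata = {
        source      : arguments.source,
        loadedAt    : now(),
        charCount   : len( content ),
        wordCount   : listLen( content, " " ),
        contentHash : hash( content )   // fingerprint for change detection and de-duplication
    };
    metadata.append( original, true );
    metadata.append( arguments.extra, true );

    return { content: content, metadata: metadata };
}

doc = enrichDocument(
    { content: "BoxLang runs on the JVM", metadata: { title: "Intro" } },
    "https://example.com/intro",
    { category: "docs" }
);
```

Letting source metadata override the computed defaults means a loader that already knows, say, an authoritative `source` value does not lose it.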

4. Performance Optimization

Implement batching and caching:
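Both ideas can be sketched with two small helpers: one splits an array into fixed-size batches, the other memoizes results per source. The names `toBatches` and `loadCached` are hypothetical, and the script-level struct is the simplest possible cache, with no expiry or size limit.

```boxlang
// Split an array into batches of at most batchSize items
function toBatches( required array items, required numeric batchSize ) {
    var batches = [];
    var total   = arguments.items.len();
    for ( var start = 1; start <= total; start += arguments.batchSize ) {
        var count = min( arguments.batchSize, total - start + 1 );
        batches.append( arraySlice( arguments.items, start, count ) );
    }
    return batches;
}

// Return the cached result for a source, calling the fetcher only on a miss
function loadCached( required string source, required function fetcher ) {
    if ( !structKeyExists( variables.cache, arguments.source ) ) {
        variables.cache[ arguments.source ] = arguments.fetcher( arguments.source );
    }
    return variables.cache[ arguments.source ];
}

cache     = {};
callCount = 0;

// Hypothetical expensive call that records how often it runs
slowFetch = ( src ) => {
    variables.callCount++;
    return [ "a", "b", "c", "d", "e" ];
};

first   = loadCached( "demo://items", slowFetch );
second  = loadCached( "demo://items", slowFetch );   // served from the cache
batches = toBatches( first, 2 );
```

A production cache would also need invalidation, for example by comparing a content hash or a last-modified timestamp before reusing an entry.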


🎓 Summary

Custom document loaders enable you to:

  • ✅ Integrate any data source into BoxLang AI workflows

  • ✅ Preserve rich metadata for better retrieval

  • ✅ Implement domain-specific logic and transformations

  • ✅ Work seamlessly with memory systems and RAG pipelines

  • ✅ Provide fluent APIs for easy configuration

Start with BaseDocumentLoader, implement the load() method, and you're ready to integrate your custom data sources!
