aiDocuments
Main entry point for document loading in AI workflows. Returns a fluent document loader for flexible configuration and execution.
Syntax
aiDocuments(source, config)
Parameters
source (string, required): The source to load from (a file path, directory, URL, or SQL query).
config (struct, optional): Configuration options for the loader.
Config Options
type (string): Explicit loader type; auto-detected from the source if omitted.
recursive (boolean): Recurse into subdirectories (directory loader).
extensions (array): File extensions to include, e.g. ["md", "txt"].
chunkSize (numeric): Chunk size for splitting documents.
overlap (numeric): Overlap between consecutive chunks.
delimiter (string): CSV delimiter character.
encoding (string): File encoding (default: UTF-8).
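chunkSize and overlap work together as a sliding window: each chunk starts chunkSize minus overlap units after the previous one, so neighbouring chunks share some context. The following is a minimal Python model of that idea. It assumes sizes are measured in characters, although this page does not state the unit the loader actually uses. chunk_text is a hypothetical helper, not part of the module.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into windows of chunk_size characters; neighbours share overlap characters.

    Hypothetical model: assumes character-based sizes, which this page does not confirm.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")

    step = chunk_size - overlap  # distance each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

The overlap must be smaller than the chunk size. Otherwise the window would never advance.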
Returns
Returns an IDocumentLoader instance with a fluent API for chaining configuration and execution.
Supported Loader Types
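text: detected by a .txt extension; plain text files.
markdown: detected by a .md extension; Markdown documents.
csv: detected by a .csv extension; CSV data files.
json: detected by a .json extension; JSON documents.
xml: detected by a .xml extension; XML documents.
directory: detected when directoryExists() is true for the source; folder scanning.
http: detected by an http:// or https:// prefix; web page loading.
feed: detected by .rss or .atom extensions or /feed URLs; RSS/Atom feeds.
sql: detected when the source starts with SELECT or WITH; database queries.
crawler: never auto-detected, so the type must be set explicitly; website crawling.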
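The detection rules in the table above can be modeled as an ordered series of checks. The following is a hypothetical Python sketch, not the module's actual logic. It makes several assumptions:

* The check order is directory, SQL, feed, http, then file extension.
* An is_directory flag stands in for directoryExists().
* "/feed URLs" means a URL with a path segment named feed.
* Unrecognised sources raise an error rather than falling back to a default.

```python
import re
from urllib.parse import urlparse

# Extension-based rules from the loader table.
EXTENSION_TYPES = {
    ".txt": "text",
    ".md": "markdown",
    ".csv": "csv",
    ".json": "json",
    ".xml": "xml",
}


def detect_loader_type(source: str, is_directory: bool = False) -> str:
    """Guess a loader type from the source string (hypothetical model)."""
    if is_directory:  # stands in for a directoryExists(source) check
        return "directory"

    lowered = source.strip().lower()

    # SQL: the first keyword is SELECT or WITH (word boundary avoids "without.txt").
    if re.match(r"(select|with)\b", lowered):
        return "sql"

    is_url = lowered.startswith(("http://", "https://"))
    path = urlparse(lowered).path if is_url else lowered

    # Feeds: .rss/.atom paths, or URLs with a "feed" path segment (assumed reading of "/feed").
    if path.endswith((".rss", ".atom")) or (is_url and "feed" in path.split("/")):
        return "feed"
    if is_url:
        return "http"

    for ext, loader in EXTENSION_TYPES.items():
        if path.endswith(ext):
            return loader

    # "crawler" is never inferred; it, like anything unrecognised, needs an explicit type.
    raise ValueError(f"cannot detect a loader type for {source!r}; set type explicitly")
```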
Fluent API Methods
Configuration Methods
.recursive(boolean) - Enable or disable recursive directory scanning
.extensions(array) - Filter by file extensions
.chunkSize(numeric) - Set the chunk size
.overlap(numeric) - Set the chunk overlap
.filter(function) - Filter documents with a callback
.transform(function) - Transform documents with a callback
.onProgress(function) - Progress callback: (completed, total, doc) => {}
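The configuration methods follow the usual fluent-builder pattern: each call records a setting and returns the loader itself, and no work happens until an execution method runs. The following hypothetical Python class models that pattern; it is not the IDocumentLoader API. on_progress mirrors .onProgress(). It makes two assumptions:

* Documents are plain dicts with a "content" key.
* Filters run before transforms.

```python
from typing import Callable, List, Optional


class DocumentLoader:
    """Hypothetical model of a fluent, lazy document loader."""

    def __init__(self, raw_docs: List[dict]):
        self._raw = raw_docs
        self._filters: List[Callable[[dict], bool]] = []
        self._transforms: List[Callable[[dict], dict]] = []
        self._progress: Optional[Callable[[int, int, dict], None]] = None

    # Configuration methods only record settings and return self, so calls chain.
    def filter(self, fn: Callable[[dict], bool]) -> "DocumentLoader":
        self._filters.append(fn)
        return self

    def transform(self, fn: Callable[[dict], dict]) -> "DocumentLoader":
        self._transforms.append(fn)
        return self

    def on_progress(self, fn: Callable[[int, int, dict], None]) -> "DocumentLoader":
        self._progress = fn
        return self

    # Execution method: all filtering and transforming happens here, not earlier.
    def load(self) -> List[dict]:
        total = len(self._raw)
        results = []
        for completed, doc in enumerate(self._raw, start=1):
            if all(keep(doc) for keep in self._filters):
                for fn in self._transforms:
                    doc = fn(doc)
                results.append(doc)
            if self._progress is not None:
                self._progress(completed, total, doc)  # reported for every source doc
        return results
```

In this model, a chain such as DocumentLoader(docs).filter(...).transform(...).on_progress(...).load() builds the pipeline first and runs it once, at load().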
Execution Methods
.load() - Load and return an array of documents
.loadAsync() - Load asynchronously and return a Future
.toMemory(memory, options) - Load and ingest into a memory
.each(function) - Stream-process each document: (doc) => {}
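The execution methods differ mainly in when documents are materialised. The following Python sketch is a hypothetical analogy, not the module's implementation:

* A generator stands in for reading sources lazily.
* load() collects everything into a list.
* each() hands over one document at a time.
* load_async() returns a concurrent.futures Future.

The placeholder content replaces real file, URL, or SQL reads.

```python
from concurrent.futures import Executor, Future, ThreadPoolExecutor
from typing import Callable, Iterator, List


def iter_documents(sources: List[str]) -> Iterator[dict]:
    """Yield documents one at a time (placeholder content stands in for real reads)."""
    for source in sources:
        yield {"source": source, "content": f"contents of {source}"}


def load(sources: List[str]) -> List[dict]:
    # Like .load(): every document is materialised in one list.
    return list(iter_documents(sources))


def each(sources: List[str], fn: Callable[[dict], None]) -> None:
    # Like .each(): only the current document is held while fn runs.
    for doc in iter_documents(sources):
        fn(doc)


def load_async(sources: List[str], executor: Executor) -> Future:
    # Like .loadAsync(): the work runs elsewhere and a Future is returned immediately.
    return executor.submit(load, sources)


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as pool:
        print(load_async(["a.md", "b.md"], pool).result())
```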
Examples
Simple File Loading
Directory Loading
With Chunking
Filtering
Progress Tracking
Direct to Memory (RAG)
Stream Processing
Loading from Web
Website Crawling
Database Loading
CSV Loading
Transformation
Async Loading
Multiple Sources
toMemory() Report Structure
When using .toMemory(), a report struct is returned:
Document Structure
Each loaded document has:
Notes
Auto-detection: The loader type is detected automatically from the source (file extension, URL pattern, or SQL syntax).
Fluent API: Chain multiple configuration calls before execution.
Lazy loading: No processing happens until .load(), .loadAsync(), .toMemory(), or .each() is called.
Memory efficient: .each() streams documents without loading them all into memory.
Chunking: Automatic text chunking for RAG workflows.
Progress tracking: Built-in progress callbacks for long-running operations.
Error handling: Loading continues past errors by default (configurable with continueOnError).
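The default continue-on-error behaviour amounts to catching a failure per source, recording it, and moving on; turning it off means failing fast. The following minimal Python sketch models only that control flow. load_all, its read callback, and the (documents, errors) return shape are hypothetical and do not describe the module's real report struct.

```python
from typing import Callable, List, Tuple


def load_all(
    sources: List[str],
    read: Callable[[str], str],
    continue_on_error: bool = True,
) -> Tuple[List[dict], List[dict]]:
    """Load each source; hypothetical model of continueOnError semantics."""
    documents, errors = [], []
    for source in sources:
        try:
            documents.append({"source": source, "content": read(source)})
        except Exception as exc:
            if not continue_on_error:
                raise  # fail fast: stop at the first bad source
            errors.append({"source": source, "error": str(exc)})  # record and keep going
    return documents, errors
```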
Related Functions
aiChunk() - Manual text chunking
aiMemory() - Create memory instances
aiEmbed() - Generate embeddings
Best Practices