aiDocuments

Main entry point for document loading in AI workflows. Returns a fluent document loader for flexible configuration and execution.

Syntax

aiDocuments(source, config)

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| source | string | Yes | Source to load from: file path, directory, URL, or SQL query |
| config | struct | No | Configuration options for the loader |

Config Options

| Option | Type | Description |
| --- | --- | --- |
| type | string | Explicit loader type (auto-detected if omitted) |
| recursive | boolean | Recurse into subdirectories (directory loader) |
| extensions | array | File extensions to include: ["md", "txt"] |
| chunkSize | numeric | Chunk size for splitting documents |
| overlap | numeric | Overlap between chunks |
| delimiter | string | CSV delimiter character |
| encoding | string | File encoding (default: UTF-8) |
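To make the relationship between chunkSize and overlap concrete, here is a minimal Python sketch of overlapping chunking. It is an illustration only, not the loader's implementation: it assumes sizes are counted in characters, and `chunk_text` is a hypothetical helper name. The actual loader may count differently or respect sentence boundaries.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, overlap: int = 0) -> List[str]:
    """Split text into chunks of up to chunk_size characters.

    Consecutive chunks share `overlap` characters (assumption: sizes are
    measured in characters, which may not match the real loader).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


example_chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
```

With a chunk size of 4 and an overlap of 1, each chunk starts 3 characters after the previous one, so the last character of one chunk is repeated as the first character of the next.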

Returns

Returns an IDocumentLoader instance with a fluent API for chaining configuration and execution.

Supported Loader Types

| Type | Auto-Detected From | Description |
| --- | --- | --- |
| text | .txt extension | Plain text files |
| markdown | .md extension | Markdown documents |
| csv | .csv extension | CSV data files |
| json | .json extension | JSON documents |
| xml | .xml extension | XML documents |
| directory | directoryExists() | Folder scanning |
| http | http:// or https:// | Web page loading |
| feed | .rss, .atom, /feed URLs | RSS/Atom feeds |
| sql | Starts with SELECT, WITH | Database queries |
| crawler | Explicit type needed | Website crawling |
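The detection rules in the table can be sketched as a small Python function. This is a hedged illustration, not the library's code: the function name `detect_loader_type`, the order in which the rules are checked, and the `None` result for unmatched sources are all assumptions. The crawler type is left out because it always requires an explicit type.

```python
import os
from typing import Optional
from urllib.parse import urlparse

# Extension-to-loader mapping taken from the table above.
EXTENSION_LOADERS = {
    ".txt": "text",
    ".md": "markdown",
    ".csv": "csv",
    ".json": "json",
    ".xml": "xml",
}


def detect_loader_type(source: str) -> Optional[str]:
    """Guess a loader type from the source string; None means no rule matched."""
    stripped = source.strip()
    lowered = stripped.lower()

    # SQL: sources whose first word is SELECT or WITH.
    first_word = lowered.split(None, 1)[0] if lowered else ""
    if first_word in ("select", "with"):
        return "sql"

    # URLs: feeds are checked first because they are also http(s) URLs
    # (assumed precedence; the real loader may order this differently).
    if lowered.startswith(("http://", "https://")):
        path = urlparse(lowered).path
        if path.endswith((".rss", ".atom")) or "/feed" in path:
            return "feed"
        return "http"

    # Directories are detected by checking the filesystem.
    if os.path.isdir(stripped):
        return "directory"

    # Otherwise fall back to the file extension.
    extension = os.path.splitext(lowered)[1]
    return EXTENSION_LOADERS.get(extension)
```

When a source matches none of these rules, passing the type option explicitly avoids relying on detection at all.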

Fluent API Methods

Configuration Methods

  • .recursive(boolean) - Enable/disable recursive directory scanning

  • .extensions(array) - Filter by file extensions

  • .chunkSize(numeric) - Set chunk size

  • .overlap(numeric) - Set chunk overlap

  • .filter(function) - Filter documents with callback

  • .transform(function) - Transform documents with callback

  • .onProgress(function) - Progress callback: (completed, total, doc) => {}

Execution Methods

  • .load() - Load and return array of documents

  • .loadAsync() - Load asynchronously, return Future

  • .toMemory(memory, options) - Load and ingest into memory

  • .each(function) - Stream process each document: (doc) => {}
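The split between configuration methods and execution methods can be illustrated with a small Python sketch of a lazy, chainable loader. Everything here is hypothetical: `LazyLoader`, `sample_source`, and the `content` field are stand-ins chosen for the example, and applying filter and transform steps in the order they were chained is an assumption. It only demonstrates the pattern of recording steps first and doing the work when an execution method is called.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Document = Dict[str, str]  # hypothetical shape; the real document struct may differ


class LazyLoader:
    """Hypothetical stand-in for a fluent loader; not the library's class."""

    def __init__(self, read_documents: Callable[[], Iterable[Document]]):
        self._read_documents = read_documents
        self._steps: List[Tuple[str, Callable]] = []

    # Configuration methods only record a step and return self for chaining.
    def filter(self, predicate: Callable[[Document], bool]) -> "LazyLoader":
        self._steps.append(("filter", predicate))
        return self

    def transform(self, fn: Callable[[Document], Document]) -> "LazyLoader":
        self._steps.append(("transform", fn))
        return self

    def _stream(self) -> Iterator[Document]:
        # The source is first read here, when an execution method iterates.
        for doc in self._read_documents():
            keep = True
            for kind, fn in self._steps:
                if kind == "filter":
                    if not fn(doc):
                        keep = False
                        break
                else:
                    doc = fn(doc)
            if keep:
                yield doc

    # Execution methods trigger the actual work.
    def load(self) -> List[Document]:
        return list(self._stream())

    def each(self, callback: Callable[[Document], None]) -> None:
        # Processes one document at a time without building a full list.
        for doc in self._stream():
            callback(doc)


def sample_source() -> Iterator[Document]:
    yield {"content": "alpha"}
    yield {"content": ""}
    yield {"content": "beta"}


loader = (
    LazyLoader(sample_source)
    .filter(lambda doc: doc["content"] != "")
    .transform(lambda doc: {"content": doc["content"].upper()})
)
documents = loader.load()
```

Because each configuration method returns the loader itself, the calls can be chained in any order before a single execution method runs them.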

Examples

Simple File Loading

Directory Loading

With Chunking

Filtering

Progress Tracking

Direct to Memory (RAG)

Stream Processing

Loading from Web

Website Crawling

Database Loading

CSV Loading

Transformation

Async Loading

Multiple Sources

toMemory() Report Structure

When using .toMemory(), a report struct is returned:

Document Structure

Each loaded document has:

Notes

  • Auto-detection: The loader type is detected automatically from the source (file extension, URL pattern, or SQL syntax)

  • Fluent API: Chain multiple configuration calls before execution

  • Lazy loading: No processing happens until .load(), .loadAsync(), .toMemory(), or .each() is called

  • Memory efficient: .each() streams documents without loading all into memory

  • Chunking: Automatic text chunking for RAG workflows

  • Progress tracking: Built-in progress callbacks for long operations

  • Error handling: Continues on error by default (configurable with continueOnError)
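The progress-tracking and error-handling notes can be pictured with the following Python sketch. It is not the loader's implementation: `load_all` and `read_one` are hypothetical names, the callback signature mirrors the (completed, total, doc) form listed under the configuration methods, and passing `None` as the document for a failed source is an assumption made for the example.

```python
from typing import Any, Callable, List, Optional, Tuple


def load_all(
    sources: List[str],
    read_one: Callable[[str], Any],
    on_progress: Optional[Callable[[int, int, Any], None]] = None,
    continue_on_error: bool = True,
) -> Tuple[List[Any], List[Tuple[str, str]]]:
    """Read every source, reporting progress and collecting failures."""
    documents: List[Any] = []
    errors: List[Tuple[str, str]] = []
    total = len(sources)

    for completed, source in enumerate(sources, start=1):
        doc = None
        try:
            doc = read_one(source)
            documents.append(doc)
        except Exception as exc:
            if not continue_on_error:
                raise  # strict mode: the first failure aborts the whole load
            errors.append((source, str(exc)))
        if on_progress is not None:
            # Assumption: a failed source still counts toward progress,
            # with None passed in place of the document.
            on_progress(completed, total, doc)

    return documents, errors
```

Collecting failures alongside the loaded documents keeps one unreadable source from discarding the rest of a long-running load, while the strict mode surfaces the first problem immediately.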

Best Practices
