ExuluChunkers namespace

ExuluChunkers is exported as a namespace object:
import { ExuluChunkers } from "@exulu/backend";

// Access sentence chunker
const sentenceChunker = await ExuluChunkers.sentence.create({...});

// Access recursive chunker
const recursiveChunker = await ExuluChunkers.recursive.function.create({...});

// Access recursive rules
const rules = new ExuluChunkers.recursive.rules({...});

SentenceChunker

create()

Factory method to create a new SentenceChunker instance.
static async create(options: SentenceChunkerOptions): Promise<CallableSentenceChunker>
options
SentenceChunkerOptions
required
Configuration options for the chunker
options.chunkSize
number
required
Maximum number of tokens per chunk
options.chunkOverlap
number
default:0
Number of tokens to overlap between chunks (default: 0)
options.minSentencesPerChunk
number
default:1
Minimum sentences per chunk (default: 1)
options.minCharactersPerSentence
number
default:10
Minimum character length for a sentence (default: 10)
return
Promise<CallableSentenceChunker>
A callable chunker function that can be invoked with text
import { ExuluChunkers } from "@exulu/backend";

// Create chunker
const chunker = await ExuluChunkers.sentence.create({
  chunkSize: 512,
  chunkOverlap: 50,
  minSentencesPerChunk: 2,
  minCharactersPerSentence: 15
});

// Use chunker
const text = "Your document text here...";
const chunks = await chunker(text);

console.log(chunks.length);        // Number of chunks
console.log(chunks[0].text);       // First chunk text
console.log(chunks[0].tokenCount); // Token count

CallableSentenceChunker

The chunker returned by create() is a callable function:
async (text: string): Promise<Chunk[]>
text
string
required
The text to chunk
return
Promise<Chunk[]>
Array of Chunk objects
const chunks = await chunker("Long text to chunk...");

for (const chunk of chunks) {
  console.log(chunk.text);
  console.log(chunk.tokenCount);
  console.log(chunk.startIndex, chunk.endIndex);
}

Properties

The callable chunker also has properties from the SentenceChunker class:
chunkSize
number
Maximum tokens per chunk
chunkOverlap
number
Overlap in tokens
minSentencesPerChunk
number
Minimum sentences per chunk
minCharactersPerSentence
number
Minimum characters per sentence
tokenizer
ExuluTokenizer
The tokenizer instance used for counting tokens
console.log(chunker.chunkSize);         // 512
console.log(chunker.chunkOverlap);      // 50
console.log(chunker.minSentencesPerChunk); // 2

RecursiveChunker

create()

Factory method to create a new RecursiveChunker instance.
static async create(options: RecursiveChunkerOptions): Promise<CallableRecursiveChunker>
options
RecursiveChunkerOptions
required
Configuration options for the chunker
options.chunkSize
number
required
Maximum number of tokens per chunk
options.rules
RecursiveRules
default:"default rules"
Recursive splitting rules (default: paragraphs → sentences → pauses → words → tokens)
options.minCharactersPerChunk
number
default:50
Minimum character length for a chunk (default: 50)
return
Promise<CallableRecursiveChunker>
A callable chunker function that can be invoked with text
import { ExuluChunkers } from "@exulu/backend";

// Create with default rules
const chunker = await ExuluChunkers.recursive.function.create({
  chunkSize: 1024,
  minCharactersPerChunk: 75
});

// Or with custom rules
const rules = new ExuluChunkers.recursive.rules({
  levels: [
    { delimiters: ["\n\n"] },
    { delimiters: [". "] },
    { whitespace: true }
  ]
});

const customChunker = await ExuluChunkers.recursive.function.create({
  chunkSize: 1024,
  rules: rules,
  minCharactersPerChunk: 50
});

CallableRecursiveChunker

The chunker returned by create() is a callable function:
async (text: string): Promise<RecursiveChunk[]>
text
string
required
The text to chunk
return
Promise<RecursiveChunk[]>
Array of RecursiveChunk objects
const chunks = await chunker("Long text to chunk...");

for (const chunk of chunks) {
  console.log(`Level ${chunk.level}: ${chunk.text}`);
  console.log(`Tokens: ${chunk.tokenCount}`);
  console.log(`Range: ${chunk.startIndex}-${chunk.endIndex}`);
}

Properties

The callable chunker also has properties from the RecursiveChunker class:
chunkSize
number
Maximum tokens per chunk
rules
RecursiveRules
The recursive splitting rules
minCharactersPerChunk
number
Minimum characters per chunk
tokenizer
ExuluTokenizer
The tokenizer instance used for counting tokens
console.log(chunker.chunkSize);              // 1024
console.log(chunker.minCharactersPerChunk);  // 75
console.log(chunker.rules.length);           // Number of levels

RecursiveRules

Class representing recursive chunking rules.

Constructor

new RecursiveRules(data?: RecursiveRulesData)
data
RecursiveRulesData
Configuration for recursive rules
data.levels
RecursiveLevelData[]
Array of recursive levels defining the splitting hierarchy
import { ExuluChunkers } from "@exulu/backend";

// Create with default levels
const defaultRules = new ExuluChunkers.recursive.rules();

// Create with custom levels
const customRules = new ExuluChunkers.recursive.rules({
  levels: [
    { delimiters: ["\n\n", "\n"] },
    { delimiters: [". ", "! ", "? "] },
    { whitespace: true }
  ]
});
Default levels:
  1. Paragraphs: ["\n\n", "\r\n", "\n", "\r"]
  2. Sentences: [". ", "! ", "? "]
  3. Pauses: ["{", "}", '"', "[", "]", "<", ">", "(", ")", ":", ";", ",", "—", "|", "~", "-", "...", "”", "’"]
  4. Words: whitespace: true
  5. Tokens: No delimiters
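The descent through these levels can be sketched in plain TypeScript. This is a simplified illustration, not the library's implementation: it measures characters rather than tokens and discards delimiters (the real chunker counts tokens and honors `includeDelim`). Split at the current level; any piece still over the limit is re-split at the next level down.

```typescript
// Simplified sketch of recursive splitting (illustrative only).
interface Level {
  delimiters?: string[];
  whitespace?: boolean;
}

function splitOnDelimiters(text: string, delims: string[]): string[] {
  if (delims.length === 0) return [text];
  let pieces = [text];
  for (const d of delims) {
    // Split every piece on this delimiter, dropping empty fragments
    pieces = pieces.flatMap(p => p.split(d).filter(s => s.length > 0));
  }
  return pieces;
}

function recursiveSplit(
  text: string,
  levels: Level[],
  maxChars: number,
  depth = 0
): string[] {
  // Small enough, or no levels left to descend into: emit as-is
  if (text.length <= maxChars || depth >= levels.length) {
    return [text];
  }
  const level = levels[depth];
  const pieces = level.whitespace
    ? text.split(/\s+/).filter(s => s.length > 0)
    : splitOnDelimiters(text, level.delimiters ?? []);
  // Recurse into any piece that is still too large
  return pieces.flatMap(p =>
    p.length > maxChars ? recursiveSplit(p, levels, maxChars, depth + 1) : [p]
  );
}
```

A piece that already fits at the paragraph level never gets split into sentences, which is why well-structured documents tend to produce mostly level-0 chunks.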

Properties

levels
RecursiveLevel[]
Array of recursive levels
length
number
Number of levels in the rules
const rules = new ExuluChunkers.recursive.rules();

console.log(rules.length);      // 5 (default levels)
console.log(rules.levels[0]);   // First level (paragraphs)

Methods

getLevel()

Get a level by index.
getLevel(index: number): RecursiveLevel | undefined
index
number
required
The index of the level to retrieve
return
RecursiveLevel | undefined
The level at the specified index, or undefined if not found
const rules = new ExuluChunkers.recursive.rules();

const firstLevel = rules.getLevel(0);   // Paragraphs level
const secondLevel = rules.getLevel(1);  // Sentences level
const invalid = rules.getLevel(999);    // undefined

toDict()

Convert rules to a dictionary-like object.
toDict(): RecursiveRulesData
return
RecursiveRulesData
Dictionary representation of the rules
const rules = new ExuluChunkers.recursive.rules({
  levels: [
    { delimiters: ["\n\n"] },
    { whitespace: true }
  ]
});

const dict = rules.toDict();
console.log(dict);
// { levels: [{ delimiters: ["\n\n"], whitespace: false, includeDelim: "prev" }, ...] }

fromDict()

Create RecursiveRules from a dictionary.
static fromDict(data: RecursiveRulesData): RecursiveRules
data
RecursiveRulesData
required
Dictionary representation of rules
return
RecursiveRules
New RecursiveRules instance
const data = {
  levels: [
    { delimiters: ["\n\n"] },
    { whitespace: true }
  ]
};

const rules = ExuluChunkers.recursive.rules.fromDict(data);

toString()

String representation of the rules.
toString(): string
return
string
String representation
const rules = new ExuluChunkers.recursive.rules();
console.log(rules.toString());
// "RecursiveRules(levels=[...])"

Symbol.iterator

The rules object is iterable:
for (const level of rules) {
  console.log(level.delimiters);
  console.log(level.whitespace);
}

RecursiveLevel

Class representing a single level in the recursive hierarchy.

Constructor

new RecursiveLevel(data?: RecursiveLevelData)
data
RecursiveLevelData
Configuration for the level
data.delimiters
string | string[]
Delimiter(s) to use for splitting at this level
data.whitespace
boolean
default:false
Whether to split on whitespace (default: false)
data.includeDelim
'prev' | 'next'
default:"prev"
Whether to include the delimiter in the previous or next chunk (default: "prev")
// Single delimiter
const level1 = new RecursiveLevel({
  delimiters: "\n\n"
});

// Multiple delimiters
const level2 = new RecursiveLevel({
  delimiters: [". ", "! ", "? "],
  includeDelim: "prev"
});

// Whitespace splitting
const level3 = new RecursiveLevel({
  whitespace: true
});

// No delimiters (token-level fallback)
const level4 = new RecursiveLevel();
A level cannot use both delimiters and whitespace; the two options are mutually exclusive.
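The constraint can be expressed as a small guard. This is a hypothetical helper for illustration, assuming the library rejects the combination at construction time:

```typescript
// Hypothetical validation mirroring the documented constraint.
interface RecursiveLevelData {
  delimiters?: string | string[];
  whitespace?: boolean;
  includeDelim?: "prev" | "next";
}

function validateLevel(data: RecursiveLevelData): void {
  if (data.delimiters !== undefined && data.whitespace) {
    throw new Error("A level cannot set both delimiters and whitespace");
  }
}
```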

Properties

delimiters
string | string[] | undefined
Custom delimiters for chunking
whitespace
boolean
Whether to use whitespace as a delimiter
includeDelim
'prev' | 'next'
Where to include the delimiter
const level = new RecursiveLevel({
  delimiters: [". ", "! ", "? "],
  includeDelim: "prev"
});

console.log(level.delimiters);    // [". ", "! ", "? "]
console.log(level.whitespace);    // false
console.log(level.includeDelim);  // "prev"

Methods

toDict()

Convert level to dictionary.
toDict(): RecursiveLevelData
return
RecursiveLevelData
Dictionary representation
const level = new RecursiveLevel({ delimiters: [". "] });
const dict = level.toDict();
console.log(dict);
// { delimiters: [". "], whitespace: false, includeDelim: "prev" }

fromDict()

Create RecursiveLevel from dictionary.
static fromDict(data: RecursiveLevelData): RecursiveLevel
data
RecursiveLevelData
required
Dictionary representation
return
RecursiveLevel
New RecursiveLevel instance
const data = { delimiters: [". "], includeDelim: "next" };
const level = RecursiveLevel.fromDict(data);

toString()

String representation of the level.
toString(): string
return
string
String representation
const level = new RecursiveLevel({ delimiters: [". "] });
console.log(level.toString());
// "RecursiveLevel(delimiters=[". "], whitespace=false, includeDelim=prev)"

Chunk

Base class for text chunks.

Properties

text
string
The chunk text
startIndex
number
Starting index in the original text
endIndex
number
Ending index in the original text
tokenCount
number
Number of tokens in the chunk
embedding
number[] | undefined
Optional embedding vector for the chunk
const chunk = chunks[0];

console.log(chunk.text);        // "This is the first chunk..."
console.log(chunk.startIndex);  // 0
console.log(chunk.endIndex);    // 245
console.log(chunk.tokenCount);  // 48
console.log(chunk.embedding);   // undefined (or embedding array)

Methods

toString()

String representation of the chunk (returns the text).
toString(): string
return
string
The chunk text
console.log(chunk.toString()); // "This is the first chunk..."

toRepresentation()

Detailed string representation.
toRepresentation(): string
return
string
Detailed representation
console.log(chunk.toRepresentation());
// "Chunk(text='...', tokenCount=48, startIndex=0, endIndex=245)"

slice()

Get a slice of the chunk's text.
slice(start?: number, end?: number): string
start
number
Starting index for the slice
end
number
Ending index for the slice
return
string
Sliced text
const chunk = chunks[0];
console.log(chunk.slice(0, 50)); // First 50 characters

toDict()

Convert chunk to dictionary.
toDict(): ChunkData
return
ChunkData
Dictionary representation
const dict = chunk.toDict();
console.log(dict);
// { text: "...", startIndex: 0, endIndex: 245, tokenCount: 48, embedding: undefined }

fromDict()

Create Chunk from dictionary.
static fromDict(data: ChunkData): Chunk
data
ChunkData
required
Dictionary representation
return
Chunk
New Chunk instance
const data = {
  text: "Sample text",
  startIndex: 0,
  endIndex: 11,
  tokenCount: 3
};

const chunk = Chunk.fromDict(data);

copy()

Create a deep copy of the chunk.
copy(): Chunk
return
Chunk
Deep copy of the chunk
const original = chunks[0];
const copy = original.copy();

console.log(copy.text === original.text); // true
console.log(copy === original);           // false (different objects)

RecursiveChunk

Extends Chunk with recursion level tracking.

Properties

All properties from Chunk, plus:
level
number | undefined
The recursion level at which this chunk was created
const chunk = chunks[0];

console.log(chunk.text);        // "This is the first chunk..."
console.log(chunk.tokenCount);  // 48
console.log(chunk.level);       // 0 (split at top level)
Level interpretation:
  • 0: Split at first level (e.g., paragraphs)
  • 1: Split at second level (e.g., sentences)
  • 2: Split at third level (e.g., pauses)
  • etc.

Methods

All methods from Chunk, with overridden implementations that preserve the level property.
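The override pattern can be sketched as follows. This is not the library source, just an illustration of how a subclass re-implements `copy()` so the extra `level` field survives where the base-class implementation would drop it:

```typescript
// Sketch of the override pattern (illustrative, not the library source).
class Chunk {
  constructor(
    public text: string,
    public startIndex: number,
    public endIndex: number,
    public tokenCount: number
  ) {}

  copy(): Chunk {
    return new Chunk(this.text, this.startIndex, this.endIndex, this.tokenCount);
  }
}

class RecursiveChunk extends Chunk {
  constructor(
    text: string,
    startIndex: number,
    endIndex: number,
    tokenCount: number,
    public level?: number
  ) {
    super(text, startIndex, endIndex, tokenCount);
  }

  // Overridden so the copy keeps `level`, which Chunk.copy() would lose
  copy(): RecursiveChunk {
    return new RecursiveChunk(
      this.text, this.startIndex, this.endIndex, this.tokenCount, this.level
    );
  }
}
```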

Usage examples

Basic sentence chunking

import { ExuluChunkers } from "@exulu/backend";

const chunker = await ExuluChunkers.sentence.create({
  chunkSize: 512,
  chunkOverlap: 50
});

const text = `
  Artificial intelligence is transforming industries worldwide.
  Machine learning enables computers to learn from data without
  explicit programming. Deep learning uses neural networks to
  recognize complex patterns in images, text, and audio.

  The field continues to evolve rapidly. New techniques emerge
  regularly, pushing the boundaries of what's possible.
`;

const chunks = await chunker(text);

console.log(`Created ${chunks.length} chunks`);

for (const [i, chunk] of chunks.entries()) {
  console.log(`\nChunk ${i + 1}:`);
  console.log(`  Text: ${chunk.text.slice(0, 50)}...`);
  console.log(`  Tokens: ${chunk.tokenCount}`);
  console.log(`  Range: ${chunk.startIndex}-${chunk.endIndex}`);
}

Recursive chunking with custom rules

import { ExuluChunkers } from "@exulu/backend";

// Define custom rules for markdown
const rules = new ExuluChunkers.recursive.rules({
  levels: [
    // Split by headers (keep header with content)
    {
      delimiters: ["\n## ", "\n### "],
      includeDelim: "next"
    },
    // Split by paragraphs
    { delimiters: ["\n\n"] },
    // Split by sentences
    { delimiters: [". ", "! ", "? "] },
    // Split by words
    { whitespace: true }
  ]
});

const chunker = await ExuluChunkers.recursive.function.create({
  chunkSize: 1024,
  rules: rules,
  minCharactersPerChunk: 75
});

const markdown = `
## Introduction

Machine learning is a subset of artificial intelligence.
It enables systems to learn and improve from experience.

## Applications

Recommendation systems use ML to personalize content.
Fraud detection systems identify suspicious patterns.
Autonomous vehicles use ML for navigation and decision-making.

## Future Directions

The field continues to advance rapidly.
New architectures and techniques emerge regularly.
`;

const chunks = await chunker(markdown);

console.log(`Created ${chunks.length} chunks`);

for (const [i, chunk] of chunks.entries()) {
  console.log(`\nChunk ${i + 1} (level ${chunk.level}):`);
  console.log(`  Text: ${chunk.text}`);
  console.log(`  Tokens: ${chunk.tokenCount}`);
}

Analyzing chunk statistics

const chunker = await ExuluChunkers.sentence.create({
  chunkSize: 512,
  chunkOverlap: 50
});

const text = "Your long document...";
const chunks = await chunker(text);

// Calculate statistics
const tokenCounts = chunks.map(c => c.tokenCount);
const avgTokens = tokenCounts.reduce((a, b) => a + b, 0) / chunks.length;
const maxTokens = Math.max(...tokenCounts);
const minTokens = Math.min(...tokenCounts);

console.log(`Chunks: ${chunks.length}`);
console.log(`Avg tokens: ${avgTokens.toFixed(2)}`);
console.log(`Max tokens: ${maxTokens}`);
console.log(`Min tokens: ${minTokens}`);
console.log(`Total tokens: ${tokenCounts.reduce((a, b) => a + b, 0)}`);

// Distribution
const histogram = {};
for (const chunk of chunks) {
  const bucket = Math.floor(chunk.tokenCount / 100) * 100;
  histogram[bucket] = (histogram[bucket] || 0) + 1;
}

console.log("\nToken distribution:");
for (const [bucket, count] of Object.entries(histogram)) {
  console.log(`  ${bucket}-${parseInt(bucket) + 99}: ${'*'.repeat(count)}`);
}

Inspecting level distribution (recursive)

const chunker = await ExuluChunkers.recursive.function.create({
  chunkSize: 1024
});

const text = "Your document...";
const chunks = await chunker(text);

// Count chunks by level
const levelCounts = {};
for (const chunk of chunks) {
  levelCounts[chunk.level || 0] = (levelCounts[chunk.level || 0] || 0) + 1;
}

console.log("Chunk distribution by level:");
for (const [level, count] of Object.entries(levelCounts)) {
  const levelName = ["Paragraphs", "Sentences", "Pauses", "Words", "Tokens"][level];
  console.log(`  Level ${level} (${levelName}): ${count} chunks`);
}

Using with ExuluContext

import { ExuluContext, ExuluChunkers, ExuluEmbedder } from "@exulu/backend";

// Create chunker
const chunker = await ExuluChunkers.sentence.create({
  chunkSize: 512,
  chunkOverlap: 75
});

// Create embedder
const embedder = new ExuluEmbedder({
  id: "openai_embedder",
  name: "OpenAI Embeddings",
  provider: "openai",
  model: "text-embedding-3-small",
  vectorDimensions: 1536
});

// Create context with chunker
const context = new ExuluContext({
  id: "documentation",
  name: "Product Documentation",
  description: "Searchable product documentation",
  embedder: embedder,
  chunker: chunker, // Documents will be chunked automatically
  fields: [
    { name: "title", type: "text", required: true },
    { name: "content", type: "longtext", required: true },
    { name: "url", type: "text", required: false }
  ],
  sources: []
});

// Add document - it's automatically chunked and embedded
await context.createItem(
  {
    title: "Getting Started Guide",
    content: "Very long documentation content...",
    url: "https://example.com/docs/getting-started"
  },
  { generateEmbeddings: true }
);

// Search returns relevant chunks
const results = await context.search({
  query: "How do I install?",
  limit: 5
});

for (const result of results) {
  console.log(`Score: ${result.score}`);
  console.log(`Chunk: ${result.chunk.text.slice(0, 100)}...`);
}

Type definitions

// Sentence chunker options
interface SentenceChunkerOptions {
  chunkSize: number;
  chunkOverlap?: number;
  minSentencesPerChunk?: number;
  minCharactersPerSentence?: number;
}

// Recursive chunker options
interface RecursiveChunkerOptions {
  chunkSize: number;
  rules?: RecursiveRules;
  minCharactersPerChunk?: number;
}

// Recursive rules data
interface RecursiveRulesData {
  levels?: RecursiveLevelData[];
}

// Recursive level data
interface RecursiveLevelData {
  delimiters?: string | string[];
  whitespace?: boolean;
  includeDelim?: "prev" | "next";
}

// Chunk data
interface ChunkData {
  text: string;
  startIndex: number;
  endIndex: number;
  tokenCount: number;
  embedding?: number[];
}

// Recursive chunk data
interface RecursiveChunkData extends ChunkData {
  level?: number;
}

Best practices

Use appropriate chunk size: Match your embedding model's token limit. Leave 10-20% headroom for metadata.
Enable overlap for natural language: Use 10-20% overlap to preserve context at chunk boundaries.
Monitor chunk count: More chunks = higher embedding costs. Balance granularity with cost.
Choose the right chunker: SentenceChunker for most text, RecursiveChunker for structured documents.
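The sizing guidance above can be turned into a small helper. The 15% figures here are illustrative picks from the recommended 10-20% ranges, not library defaults:

```typescript
// Derive chunker options from an embedding model's context limit,
// following the headroom and overlap guidance above (illustrative numbers).
function suggestChunkerOptions(modelTokenLimit: number) {
  const chunkSize = Math.floor(modelTokenLimit * 0.85); // ~15% headroom for metadata
  const chunkOverlap = Math.floor(chunkSize * 0.15);    // ~15% overlap between chunks
  return { chunkSize, chunkOverlap };
}
```

The result can be spread straight into `ExuluChunkers.sentence.create({ ...suggestChunkerOptions(8191) })`.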
