Overview

ExuluReranker is a component that improves search result quality by reordering chunks based on relevance to the user’s query. After initial vector or hybrid search retrieves candidate results, a reranker applies more sophisticated relevance scoring to surface the most useful chunks at the top.

Key features

Post-search refinement

Reorder search results after initial retrieval for better relevance

Model-agnostic

Use any reranking API (Cohere, Voyage AI, custom models)

Simple integration

Drop-in enhancement for ExuluContext search

Flexible scoring

Custom reranking logic or third-party services

What is a reranker?

A reranker is a specialized model or algorithm that:

Receives initial search results from vector or hybrid search
Analyzes query-document pairs to compute refined relevance scores
Reorders the results to surface the most relevant chunks first
Returns the reordered chunks to the application

Rerankers typically use cross-attention mechanisms or more sophisticated language models than embedding models, providing better relevance at the cost of higher latency. This makes them ideal for post-processing search results.

Why use reranking?

Better relevance

Reranking models analyze the relationship between query and document more deeply than embedding similarity, leading to more accurate results.

Two-stage retrieval

Fast vector search retrieves candidates, then slower but more accurate reranking refines the top results. This balances speed and quality.

Domain-specific ranking

Custom rerankers can implement business logic, user preferences, or domain-specific relevance criteria.

Handle semantic nuances

Rerankers can better understand negations, comparisons, and complex queries that embedding models might miss.

Quick start

import { ExuluReranker } from "@exulu/backend";

// Create a reranker using Cohere
const cohereReranker = new ExuluReranker({
  id: "cohere_reranker",
  name: "Cohere Rerank",
  description: "Reranks search results using Cohere's rerank-english-v3.0 model",
  execute: async ({ query, chunks }) => {
    const response = await fetch("https://api.cohere.com/v1/rerank", {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${process.env.COHERE_API_KEY}`,
        "Content-Type": "application/json"
      },
      body: JSON.stringify({
        model: "rerank-english-v3.0",
        query: query,
        documents: chunks.map(chunk => chunk.chunk_content),
        top_n: 10,
        return_documents: false
      })
    });

    const data = await response.json();

    // Reorder chunks based on Cohere's ranking
    return data.results
      .sort((a, b) => b.relevance_score - a.relevance_score)
      .map(result => chunks[result.index]);
  }
});

// Use with ExuluContext
const docsContext = new ExuluContext({
  id: "documentation",
  name: "Documentation",
  description: "Product docs",
  active: true,
  fields: [/* ... */],
  sources: [],
  resultReranker: async (results) => {
    return cohereReranker.run(
      results[0]?.query || "",
      results
    );
  }
});

Architecture

Integration with ExuluContext

Rerankers integrate with ExuluContext through the resultReranker configuration:

const context = new ExuluContext({
  // ... other config
  resultReranker: async (chunks) => {
    // Access the query from the first chunk's context
    const query = chunks[0]?.context?.query || "";

    // Apply reranking
    return reranker.run(query, chunks);
  }
});

When a search is performed:

Vector/hybrid search retrieves initial candidates
resultReranker is called with the chunks
Reranker reorders the chunks
Reordered results are returned to the agent/user

Chunk structure

Rerankers work with VectorSearchChunkResult objects:

type VectorSearchChunkResult = {
  chunk_content: string;           // The text content
  chunk_index: number;             // Position in original document
  chunk_id: string;                // Unique chunk ID
  chunk_source: string;            // Source item ID
  chunk_metadata: Record<string, string>;
  chunk_created_at: string;
  chunk_updated_at: string;
  item_id: string;                 // Parent item ID
  item_external_id: string;
  item_name: string;               // Parent item name
  item_updated_at: string;
  item_created_at: string;
  chunk_cosine_distance?: number;  // Similarity score
  chunk_fts_rank?: number;         // Keyword score
  chunk_hybrid_score?: number;     // Combined score
  context?: {
    name: string;
    id: string;
  };
};

Common reranking strategies

API-based reranking
Custom scoring
LLM-based reranking
Hybrid approach

Use third-party reranking APIs like Cohere, Voyage AI, or Jina AI:

const apiReranker = new ExuluReranker({
  id: "cohere_rerank",
  name: "Cohere Reranker",
  description: "Uses Cohere rerank API",
  execute: async ({ query, chunks }) => {
    const response = await fetch("https://api.cohere.com/v1/rerank", {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${API_KEY}`,
        "Content-Type": "application/json"
      },
      body: JSON.stringify({
        model: "rerank-english-v3.0",
        query,
        documents: chunks.map(c => c.chunk_content),
        top_n: chunks.length
      })
    });

    const data = await response.json();
    return data.results.map(r => chunks[r.index]);
  }
});

Implement custom business logic:

const customReranker = new ExuluReranker({
  id: "custom_scorer",
  name: "Custom Scorer",
  description: "Custom relevance scoring with business rules",
  execute: async ({ query, chunks }) => {
    // Score each chunk
    const scored = chunks.map(chunk => ({
      chunk,
      score: computeCustomScore(query, chunk)
    }));

    // Sort by score descending
    scored.sort((a, b) => b.score - a.score);

    return scored.map(s => s.chunk);
  }
});

function computeCustomScore(query: string, chunk: VectorSearchChunkResult): number {
  let score = chunk.chunk_hybrid_score || 0;

  // Boost recent content
  const age = Date.now() - new Date(chunk.chunk_updated_at).getTime();
  const daysSinceUpdate = age / (1000 * 60 * 60 * 24);
  score += Math.max(0, 1 - daysSinceUpdate / 365);

  // Boost specific categories
  if (chunk.chunk_metadata.category === "tutorial") {
    score *= 1.2;
  }

  // Penalize very short chunks
  if (chunk.chunk_content.length < 100) {
    score *= 0.7;
  }

  return score;
}

Use an LLM to judge relevance:

const llmReranker = new ExuluReranker({
  id: "llm_rerank",
  name: "LLM Reranker",
  description: "Uses GPT-4 to score relevance",
  execute: async ({ query, chunks }) => {
    // Score chunks in parallel
    const scoredPromises = chunks.map(async (chunk) => {
      const prompt = `On a scale of 0-10, how relevant is this passage to the query?

Query: "${query}"

Passage: "${chunk.chunk_content}"

Respond with only a number.`;

      const response = await openai.chat.completions.create({
        model: "gpt-4o-mini",
        messages: [{ role: "user", content: prompt }],
        temperature: 0
      });

      const score = parseFloat(response.choices[0].message.content);

      return { chunk, score };
    });

    const scored = await Promise.all(scoredPromises);
    scored.sort((a, b) => b.score - a.score);

    return scored.map(s => s.chunk);
  }
});

LLM-based reranking can be expensive and slow. Use for small result sets or where quality is critical.

Combine multiple signals:

const hybridReranker = new ExuluReranker({
  id: "hybrid_rerank",
  name: "Hybrid Reranker",
  description: "Combines API reranking with custom scoring",
  execute: async ({ query, chunks }) => {
    // Step 1: Get API reranking scores
    const apiResponse = await fetch("https://api.cohere.com/v1/rerank", {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${API_KEY}`,
        "Content-Type": "application/json"
      },
      body: JSON.stringify({
        model: "rerank-english-v3.0",
        query,
        documents: chunks.map(c => c.chunk_content)
      })
    });

    const apiData = await apiResponse.json();

    // Step 2: Combine with custom business logic
    const scored = chunks.map((chunk, idx) => {
      const apiScore = apiData.results.find(r => r.index === idx)?.relevance_score || 0;
      const customScore = computeCustomScore(chunk);

      return {
        chunk,
        score: (apiScore * 0.7) + (customScore * 0.3) // Weighted combination
      };
    });

    scored.sort((a, b) => b.score - a.score);
    return scored.map(s => s.chunk);
  }
});

Usage patterns

With ExuluContext

The most common pattern is integrating with ExuluContext:

import { ExuluContext, ExuluReranker } from "@exulu/backend";

// Create reranker
const reranker = new ExuluReranker({
  id: "my_reranker",
  name: "My Reranker",
  description: "Custom reranking logic",
  execute: async ({ query, chunks }) => {
    // Your reranking logic
    return reorderedChunks;
  }
});

// Use with context
const context = new ExuluContext({
  id: "docs",
  name: "Documentation",
  // ... other config
  resultReranker: async (chunks) => {
    const query = chunks[0]?.context?.query || "";
    return reranker.run(query, chunks);
  }
});

Standalone usage

You can also use rerankers independently:

const chunks = await context.search({
  query: "How do I authenticate?",
  method: "hybridSearch",
  // ... other params
});

const rerankedChunks = await reranker.run(
  "How do I authenticate?",
  chunks.chunks
);

console.log(rerankedChunks[0].chunk_content); // Most relevant result

Conditional reranking

Apply reranking only when needed:

resultReranker: async (chunks) => {
  // Only rerank if we have enough results
  if (chunks.length < 3) {
    return chunks;
  }

  // Only rerank if scores are similar (ambiguous results)
  const scores = chunks
    .map(c => c.chunk_hybrid_score || 0)
    .filter(s => s > 0);

  if (scores.length === 0) return chunks;

  const avgScore = scores.reduce((a, b) => a + b) / scores.length;
  const variance = scores.reduce((sum, score) => sum + Math.pow(score - avgScore, 2), 0) / scores.length;

  // Low variance means clear ranking, skip reranking
  if (variance < 0.01) {
    return chunks;
  }

  // High variance, apply reranking
  const query = chunks[0]?.context?.query || "";
  return reranker.run(query, chunks);
}

Popular reranking services

Cohere Rerank

High-quality reranking with multilingual support

Voyage AI

Domain-specific reranking models

Jina AI

Open-source and API-based reranking

Custom models

Self-hosted models like cross-encoders or custom LLM scoring

Best practices

Rerank the right amount: Reranking is slower than initial retrieval. Retrieve more candidates (e.g., 50-100) with fast vector search, then rerank the top 10-20.

Preserve metadata: Ensure your reranker returns chunks with all their original metadata intact. Only change the order, not the content.

Watch latency: Reranking adds latency to search. Test with production query volumes and consider caching or async reranking for non-critical queries.

Measure impact: A/B test your reranker to ensure it improves relevance. Track metrics like click-through rate or user satisfaction.

Performance considerations

Batch reranking

Some APIs support batch reranking for better throughput:

execute: async ({ query, chunks }) => {
  // Send all documents at once
  const response = await cohereClient.rerank({
    model: "rerank-english-v3.0",
    query,
    documents: chunks.map(c => c.chunk_content),
    top_n: 20 // Limit results
  });

  return response.results.map(r => chunks[r.index]);
}

Caching

Cache reranking results for repeated queries:

const cache = new Map();

execute: async ({ query, chunks }) => {
  const cacheKey = `${query}:${chunks.map(c => c.chunk_id).join(",")}`;

  if (cache.has(cacheKey)) {
    return cache.get(cacheKey);
  }

  const reranked = await performReranking(query, chunks);
  cache.set(cacheKey, reranked);

  return reranked;
}

Parallel processing

For custom scoring, process chunks in parallel:

execute: async ({ query, chunks }) => {
  const scored = await Promise.all(
    chunks.map(async (chunk) => ({
      chunk,
      score: await computeScore(query, chunk)
    }))
  );

  scored.sort((a, b) => b.score - a.score);
  return scored.map(s => s.chunk);
}

Overview

Overview

Key features

Post-search refinement

Model-agnostic

Simple integration

Flexible scoring

What is a reranker?

Why use reranking?

Quick start

Architecture

Integration with ExuluContext

Chunk structure

Common reranking strategies

Usage patterns

With ExuluContext

Standalone usage

Conditional reranking

Popular reranking services

Cohere Rerank

Voyage AI

Jina AI

Custom models

Best practices

Performance considerations

Next steps

Configuration

API reference

​Overview

​Key features

Post-search refinement

Model-agnostic

Simple integration

Flexible scoring

​What is a reranker?

​Why use reranking?

​Quick start

​Architecture

​Integration with ExuluContext

​Chunk structure

​Common reranking strategies

​Usage patterns

​With ExuluContext

​Standalone usage

​Conditional reranking

​Popular reranking services

Cohere Rerank

Voyage AI

Jina AI

Custom models

​Best practices

​Performance considerations

​Next steps

Configuration

API reference

Overview

Key features

What is a reranker?

Why use reranking?

Quick start

Architecture

Integration with ExuluContext

Chunk structure

Common reranking strategies

Usage patterns

With ExuluContext

Standalone usage

Conditional reranking

Popular reranking services

Best practices

Performance considerations

Next steps