Skip to main content

How to combine results from multiple retrievers

Prerequisites

This guide assumes familiarity with the following concepts:

The EnsembleRetriever supports ensembling of results from multiple retrievers. It is initialized with a list of BaseRetriever objects. EnsembleRetrievers rerank the results of the constituent retrievers based on the Reciprocal Rank Fusion algorithm.

By leveraging the strengths of different algorithms, the EnsembleRetriever can achieve better performance than any single algorithm.

One useful pattern is to combine a keyword matching retriever with a dense retriever (like embedding similarity), because their strengths are complementary. This can be considered a form of "hybrid search". The sparse retriever is good at finding relevant documents based on keywords, while the dense retriever is good at finding relevant documents based on semantic similarity.

Below we demonstrate ensembling of a simple custom retriever that simply returns documents that directly contain the input query with a retriever derived from a demo, in-memory, vector store.

import { EnsembleRetriever } from "langchain/retrievers/ensemble";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";
import { BaseRetriever, BaseRetrieverInput } from "@langchain/core/retrievers";
import { Document } from "@langchain/core/documents";

class SimpleCustomRetriever extends BaseRetriever {
lc_namespace = [];

documents: Document[];

constructor(fields: { documents: Document[] } & BaseRetrieverInput) {
super(fields);
this.documents = fields.documents;
}

async _getRelevantDocuments(query: string): Promise<Document[]> {
return this.documents.filter((document) =>
document.pageContent.includes(query)
);
}
}

const docs1 = [
new Document({ pageContent: "I like apples", metadata: { source: 1 } }),
new Document({ pageContent: "I like oranges", metadata: { source: 1 } }),
new Document({
pageContent: "apples and oranges are fruits",
metadata: { source: 1 },
}),
];

const keywordRetriever = new SimpleCustomRetriever({ documents: docs1 });

const docs2 = [
new Document({ pageContent: "You like apples", metadata: { source: 2 } }),
new Document({ pageContent: "You like oranges", metadata: { source: 2 } }),
];

const vectorstore = await MemoryVectorStore.fromDocuments(
docs2,
new OpenAIEmbeddings()
);

const vectorstoreRetriever = vectorstore.asRetriever();

const retriever = new EnsembleRetriever({
retrievers: [vectorstoreRetriever, keywordRetriever],
weights: [0.5, 0.5],
});

const query = "apples";
const retrievedDocs = await retriever.invoke(query);

console.log(retrievedDocs);

/*
[
Document { pageContent: 'You like apples', metadata: { source: 2 } },
Document { pageContent: 'I like apples', metadata: { source: 1 } },
Document { pageContent: 'You like oranges', metadata: { source: 2 } },
Document {
pageContent: 'apples and oranges are fruits',
metadata: { source: 1 }
}
]
*/

API Reference:

Next steps​

You've now learned how to combine results from multiple retrievers. Next, check out some other retrieval how-to guides, such as how to improve results using multiple embeddings per document or how to create your own custom retriever.


Was this page helpful?


You can leave detailed feedback on GitHub.