Skip to content

Retrieval Pipelines

Algorithms that take a query and return relevant documents.

Available Pipelines

Pipeline Algorithm Modality
BM25 Sparse (term frequency) Text
Hybrid RRF / Convex Combination Text
Vector Search Dense (vector similarity) Text
HyDE Dense (hypothetical document embeddings) Text

Base Class

All retrieval pipelines extend BaseRetrievalPipeline:

from autorag_research.pipelines.retrieval import BaseRetrievalPipeline


class MyRetrievalPipeline(BaseRetrievalPipeline):
    def _get_retrieval_func(self):
        def retrieve(queries: list[str], top_k: int) -> list[list[dict]]:
            # Return list of results per query
            # Each result: {"doc_id": ..., "score": ...}
            pass

        return retrieve

    def _get_pipeline_config(self):
        return {"type": "my_pipeline"}

Methods

Method Description
retrieve(query, top_k) Single query retrieval
run(top_k, batch_size) Batch retrieval for all queries