
# Text Retrieval Benchmark

Run a text retrieval benchmark without generation (no LLM required).

## Download Dataset

```bash
autorag-research data restore beir scifact_openai-small
```

This downloads BEIR SciFact (300 queries, 5,183 documents).

## Create Experiment Config

```yaml
# configs/experiment.yaml
db_name: beir_scifact_test_openai_small

pipelines:
  retrieval:
    - bm25
  generation: []

metrics:
  retrieval:
    - recall
    - precision
    - ndcg
    - mrr
  generation: []
```
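
The `bm25` retrieval pipeline ranks documents by lexical term overlap, weighted by term rarity and document length. As a rough illustration of how BM25 scoring works, here is a minimal pure-Python sketch (not the tool's actual implementation; the corpus, query, and tokenization are made up for the example):

```python
import math
from collections import Counter

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score each document in `corpus` against `query` with BM25."""
    docs = [doc.lower().split() for doc in corpus]
    avgdl = sum(len(d) for d in docs) / len(docs)
    n = len(docs)
    # Document frequency: how many docs contain each term
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF (always non-negative)
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturation with length normalization
            denom = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

corpus = [
    "aspirin reduces the risk of heart attack",
    "the heart pumps blood through the body",
    "vitamin c does not prevent the common cold",
]
print(bm25_scores("aspirin heart attack", corpus))
```

The first document scores highest (it matches all three query terms), and the third scores zero (no overlap). This term-matching behavior is why BM25 is a strong, LLM-free baseline for datasets like SciFact, where queries share vocabulary with their evidence documents.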

## Run

```bash
autorag-research run --config-name=experiment
```

## Expected Output

```text
Pipeline: bm25
  Recall@10: 0.847
  Precision@10: 0.085
  NDCG@10: 0.712
  MRR@10: 0.634
```
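
To make the set-based metrics concrete, here is a minimal sketch of Recall@k and Precision@k on a toy query (the document IDs are made up; this is not the tool's metric code):

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of ground-truth documents found in the top k."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def precision_at_k(retrieved, relevant, k=10):
    """Fraction of the top k that is actually relevant."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / k

retrieved = ["d3", "d7", "d1", "d9", "d4", "d8", "d2", "d6", "d5", "d0"]
relevant = ["d3", "d9"]  # ground-truth docs for this query

print(recall_at_k(retrieved, relevant))     # 1.0: both relevant docs are in the top 10
print(precision_at_k(retrieved, relevant))  # 0.2: 2 of the 10 retrieved are relevant
```

Note that when most queries have only one or two relevant documents, Precision@10 is capped near 0.1 to 0.2 even for a perfect retriever, which is consistent with a low Precision@10 appearing alongside a high Recall@10 as in the output above.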

| Dataset | Queries | Documents | Best For |
| --- | --- | --- | --- |
| BEIR SciFact | 300 | 5,183 | Scientific claims |
| BEIR NFCorpus | 323 | 3,633 | Biomedical |
| MTEB | varies | varies | General text |

See Text Datasets for all options.

| Metric | Measures |
| --- | --- |
| Recall@k | Coverage of the ground truth in the top k |
| Precision@k | Fraction of the top k that is relevant |
| NDCG@k | Ranking quality, discounted by position |
| MRR | Reciprocal rank of the first relevant result |

See Retrieval Metrics for details.
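
The two rank-sensitive metrics can be sketched in a few lines of Python (an illustrative implementation with binary relevance, not the tool's metric code; the IDs are made up):

```python
import math

def ndcg_at_k(retrieved, relevant, k=10):
    """NDCG@k with binary relevance: hits earn 1/log2(rank+1) gain."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, doc in enumerate(retrieved[:k]) if doc in relevant)
    # Ideal DCG: all relevant docs packed at the top of the ranking
    ideal = sum(1.0 / math.log2(i + 2)
                for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0

def mrr(retrieved, relevant, k=10):
    """Reciprocal rank of the first relevant document in the top k."""
    for i, doc in enumerate(retrieved[:k]):
        if doc in relevant:
            return 1.0 / (i + 1)
    return 0.0

retrieved = ["d5", "d3", "d8", "d1"]
relevant = {"d3", "d1"}

print(mrr(retrieved, relevant))                   # 0.5: first hit at rank 2
print(round(ndcg_at_k(retrieved, relevant), 3))   # 0.651: hits at ranks 2 and 4
```

MRR only rewards the position of the first hit, while NDCG credits every relevant document with a gain that decays logarithmically with rank, which is why the two can diverge on the same ranking.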
