# Text Retrieval Benchmark
Run a text retrieval benchmark without generation (no LLM required).
## Download Dataset

```bash
autorag-research data restore beir scifact_openai-small
```
Downloads BEIR SciFact (300 queries, 5,183 documents).
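If you want to inspect the dataset outside of autorag-research, BEIR's own `beir` package can load the same corpus, queries, and relevance judgments (qrels). A minimal sketch using BEIR's standard loader and public download URL (this is the `beir` library's API, separate from the autorag-research CLI above):

```python
from beir import util
from beir.datasets.data_loader import GenericDataLoader

# Download and unzip SciFact from BEIR's public hosting
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
data_path = util.download_and_unzip(url, "datasets")

# corpus: doc_id -> {"title", "text"}; queries: qid -> text;
# qrels: qid -> {doc_id: relevance} (the retrieval ground truth)
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")
print(len(queries), len(corpus))  # 300 queries, 5,183 documents
```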
## Create Experiment Config

```yaml
# configs/experiment.yaml
db_name: beir_scifact_test_openai_small
pipelines:
  retrieval:
    - bm25
  generation: []
metrics:
  retrieval:
    - recall
    - precision
    - ndcg
    - mrr
  generation: []
```
## Run

```bash
autorag-research run --config-name=experiment
```
## Expected Output

```
Pipeline: bm25
Recall@10: 0.847
Precision@10: 0.085
NDCG@10: 0.712
MRR@10: 0.634
```
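BM25 is a purely lexical scorer, which is why this benchmark needs no LLM or embedding model. As an illustration of what a `bm25` pipeline computes (not autorag-research's actual implementation), here is a standalone sketch using the `rank_bm25` package:

```python
from rank_bm25 import BM25Okapi

corpus = [
    "aspirin reduces the risk of heart attack",
    "vitamin c does not prevent the common cold",
    "exercise improves cardiovascular health",
]
tokenized_corpus = [doc.split() for doc in corpus]

# Index the corpus, then score every document against the query
bm25 = BM25Okapi(tokenized_corpus)
query = "does aspirin prevent heart attacks".split()
scores = bm25.get_scores(query)

# Rank documents by descending BM25 score (top-k retrieval)
ranking = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
print([corpus[i] for i in ranking[:2]])
```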
## Recommended Datasets
| Dataset | Queries | Documents | Best For |
|---|---|---|---|
| BEIR SciFact | 300 | 5,183 | Scientific claims |
| BEIR NFCorpus | 323 | 3,633 | Biomedical |
| MTEB | varies | varies | General text |
See Text Datasets for all options.
## Recommended Metrics
| Metric | Measures |
|---|---|
| Recall@k | Coverage of ground truth |
| NDCG@k | Ranking quality |
| MRR | First relevant position |
See Retrieval Metrics for details.
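All three metrics can be computed from a ranked list of document IDs and the set of ground-truth relevant IDs. A minimal Python sketch with binary relevance (an illustration of the standard formulas, not autorag-research's internal code):

```python
import math

def recall_at_k(retrieved, relevant, k=10):
    # Fraction of ground-truth documents found in the top-k results
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant) if relevant else 0.0

def mrr_at_k(retrieved, relevant, k=10):
    # Reciprocal rank of the first relevant document within the top-k
    for rank, doc_id in enumerate(retrieved[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(retrieved, relevant, k=10):
    # Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG,
    # where the ideal puts all relevant documents at the top
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc_id in enumerate(retrieved[:k], start=1)
              if doc_id in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0
```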
## Next
- Text RAG - Add generation on top of this retrieval-only setup
- Custom Pipeline - Implement your own retrieval algorithm