# Metrics

Evaluation measures for retrieval and generation.

## Retrieval Metrics

| Metric | Measures | Range |
|---|---|---|
| Recall@k | Ground-truth coverage in the top k | [0, 1] |
| Precision@k | Relevance of the top k retrieved results | [0, 1] |
| F1@k | Harmonic mean of recall and precision | [0, 1] |
| NDCG@k | Ranking quality with graded relevance | [0, 1] |
| MRR | Rank of the first relevant result | [0, 1] |
| MAP | Mean average precision across queries | [0, 1] |
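The set-based metrics above reduce to a few lines of code. A minimal sketch of Recall@k, Precision@k, and MRR over document-ID lists (function names are illustrative, not this library's API):

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of ground-truth docs found in the top-k results."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant) if relevant else 0.0

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k results that are relevant."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / k

def mrr(retrieved, relevant):
    """Reciprocal rank of the first relevant result; 0 if none found."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(recall_at_k(retrieved, relevant, k=3))     # 0.5 (d1 in top 3, d2 not)
print(precision_at_k(retrieved, relevant, k=3))  # ≈ 0.333 (1 of 3 relevant)
print(mrr(retrieved, relevant))                  # 0.5 (first hit at rank 2)
```

Note that MRR is defined per query; in practice it is averaged over a query set, just as MAP averages per-query average precision.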

## Generation Metrics

| Metric | Measures | Range |
|---|---|---|
| BLEU | N-gram overlap | [0, 1] |
| METEOR | Unigram alignment | [0, 1] |
| ROUGE | Recall-oriented overlap | [0, 1] |
| BERTScore | Semantic similarity | [-1, 1] |
| SemScore | Embedding similarity | [-1, 1] |
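As one concrete example of recall-oriented overlap, ROUGE-1 recall counts the fraction of reference unigrams covered by the candidate. A minimal sketch assuming plain whitespace tokenization (real ROUGE implementations add stemming and more careful tokenization):

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """ROUGE-1 recall: fraction of reference unigrams covered by the candidate."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped counts: each reference occurrence can be matched at most once.
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    total = sum(ref.values())
    return overlap / total if total else 0.0

print(rouge1_recall("the cat sat", "the cat sat on the mat"))  # 0.5
```

BLEU works in the opposite direction (precision of candidate n-grams against the reference, with a brevity penalty), which is why the two are often reported together.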

## Choosing Metrics

| Goal | Metrics |
|---|---|
| Find all relevant docs | Recall |
| Rank correctly | NDCG |
| Find relevant docs quickly | MRR |
| Text similarity | ROUGE, BLEU |
| Semantic correctness | BERTScore |
| Trust-Align refusal/correctness | Trust-Align Metrics Plugin |
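When the goal is ranking quality, NDCG@k discounts each result's graded relevance by its log position and normalizes against the ideal ordering. A minimal sketch (names are illustrative, not this library's API):

```python
import math

def ndcg_at_k(relevance_scores, k):
    """NDCG@k: DCG of the given ranking divided by DCG of the ideal ranking."""
    def dcg(scores):
        # Position i (0-based) is discounted by log2(i + 2): rank 1 -> log2(2) = 1.
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(scores[:k]))

    ideal_dcg = dcg(sorted(relevance_scores, reverse=True))
    return dcg(relevance_scores) / ideal_dcg if ideal_dcg > 0 else 0.0

# Graded relevance of results in retrieved order (3 = highly relevant, 0 = irrelevant).
print(ndcg_at_k([3, 1, 2, 0], k=3))  # ≈ 0.97 (near-ideal ordering)
```

A perfectly ordered ranking scores 1.0; swapping items below their ideal positions lowers the score smoothly rather than all-or-nothing, which is what makes NDCG suitable for graded relevance.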
