# Metrics
Evaluation measures for retrieval and generation.
## Retrieval Metrics
| Metric | Measures | Range |
| --- | --- | --- |
| Recall@k | Ground truth coverage | [0, 1] |
| Precision@k | Retrieved relevance | [0, 1] |
| F1@k | Recall + Precision balance | [0, 1] |
| NDCG@k | Ranking quality | [0, 1] |
| MRR | First relevant position | [0, 1] |
| MAP | Average precision | [0, 1] |
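Each of these reduces to a few lines of arithmetic over a ranked list. The sketch below is a minimal reference implementation assuming binary relevance (a document is either relevant or not); the function names and document IDs are illustrative, not from any particular library.

```python
# Minimal retrieval-metric sketch over a ranked list, assuming binary relevance.
import math

def recall_at_k(retrieved, relevant, k):
    """Fraction of ground-truth relevant docs found in the top k."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant) if relevant else 0.0

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top k retrieved docs that are relevant."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / k

def f1_at_k(retrieved, relevant, k):
    """Harmonic mean of Precision@k and Recall@k."""
    p = precision_at_k(retrieved, relevant, k)
    r = recall_at_k(retrieved, relevant, k)
    return 2 * p * r / (p + r) if (p + r) else 0.0

def mrr(retrieved, relevant):
    """Reciprocal rank of the first relevant doc (0 if none appears)."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(retrieved, relevant, k):
    """NDCG@k with binary gains: DCG of this ranking over the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, doc in enumerate(retrieved[:k]) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0

retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(recall_at_k(retrieved, relevant, 3))  # 0.5 (d1 found in top 3)
print(mrr(retrieved, relevant))             # 0.5 (first hit at rank 2)
```

In practice these are averaged over a query set; MAP, for instance, is the mean of per-query average precision.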
## Generation Metrics
| Metric | Measures | Range |
| --- | --- | --- |
| BLEU | N-gram precision overlap | [0, 1] |
| METEOR | Unigram alignment (stemming, synonyms) | [0, 1] |
| ROUGE | Recall-oriented n-gram overlap | [0, 1] |
| BERTScore | Semantic similarity of contextual embeddings | [-1, 1] |
| SemScore | Cosine similarity of sentence embeddings | [-1, 1] |
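For generation metrics, off-the-shelf implementations are the usual route. The following is a minimal sketch, assuming the `nltk`, `rouge-score`, and `bert-score` packages are installed; the example strings are illustrative, and BERTScore downloads a model on first use.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = "The cat sat on the mat."
candidate = "A cat was sitting on the mat."

# BLEU: n-gram precision overlap; smoothing avoids zero scores on short texts.
bleu = sentence_bleu([reference.split()], candidate.split(),
                     smoothing_function=SmoothingFunction().method1)

# ROUGE-L: longest-common-subsequence precision/recall/F1.
rouge = rouge_scorer.RougeScorer(["rougeL"]).score(reference, candidate)

# BERTScore: token-level semantic similarity from contextual embeddings.
_, _, f1 = bert_score([candidate], [reference], lang="en")

print(f"BLEU:         {bleu:.3f}")
print(f"ROUGE-L F1:   {rouge['rougeL'].fmeasure:.3f}")
print(f"BERTScore F1: {f1.item():.3f}")
```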
## Choosing Metrics
| Goal | Metrics |
| --- | --- |
| Find all relevant docs | Recall |
| Rank correctly | NDCG |
| Find relevant quickly | MRR |
| Text similarity | ROUGE, BLEU |
| Semantic correctness | BERTScore |
| Trust-Align refusal/correctness | Trust-Align Metrics Plugin |