MrTyDi

Multilingual retrieval benchmark.

Overview

Field          Value
Modality       Text
Generation GT  No
HF Repository  mrtydi-dumps
Paper          Zhang et al., 2021

Description

Mr. TyDi is a multilingual retrieval benchmark covering 11 languages, with queries and relevance judgments provided by native speakers.

Languages

  • Arabic
  • Bengali
  • English
  • Finnish
  • Indonesian
  • Japanese
  • Korean
  • Russian
  • Swahili
  • Telugu
  • Thai

Download

autorag-research data restore mrtydi <language>_<embedding_model>
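
As an illustration of how the `<language>_<embedding_model>` placeholder might be filled in, here is a minimal sketch. It assumes dump names reuse the lowercase identifiers from the ingest command (`english`, `openai-small`); that naming is an assumption, so check which dumps are actually available before relying on it. The command is printed rather than executed.

```shell
# Assumed values: identifiers borrowed from the ingest example,
# not confirmed dump names.
language="english"
embedding_model="openai-small"

# Compose the dump name following the <language>_<embedding_model> pattern.
dump_name="${language}_${embedding_model}"

# Print the command instead of running it, so the name can be checked first.
restore_cmd="autorag-research data restore mrtydi ${dump_name}"
echo "${restore_cmd}"
```

Printing first makes it easy to confirm the composed name before a restore is attempted.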

Ingest from Source

autorag-research ingest --name=mrtydi --extra language=english --embedding-model=openai-small
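
To ingest more than one language, the same command can be repeated per language. The sketch below is a dry run that only prints the commands. It assumes each language is passed as its lowercase English name, the way `english` is above; the accepted values for the other languages are not confirmed here.

```shell
# Assumed: every language uses its lowercase English name, as "english" does.
languages="arabic bengali english finnish indonesian japanese korean russian swahili telugu thai"
embedding_model="openai-small"

count=0
for language in $languages; do
  # Dry run: print each ingest command rather than executing it.
  echo "autorag-research ingest --name=mrtydi --extra language=${language} --embedding-model=${embedding_model}"
  count=$((count + 1))
done
```

Once the language values are verified, replacing `echo` with direct execution would run the ingests one after another.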

Best For

  • Multilingual retrieval evaluation
  • Cross-lingual transfer
  • Non-English benchmarking