Yandex Launches Yambda Dataset for Music Recommender System Research

Edited by: Veronika Radoslavskaya

Yandex has released Yambda, a large public dataset for recommender-system research. Published in May 2025, it is intended to narrow the gap between academic research and real-world industry applications, and it contains nearly 5 billion anonymized user interaction events from Yandex Music.

In total, Yambda comprises 4.79 billion interactions collected over ten months from approximately 1 million users and covering around 9.4 million tracks. To protect privacy, all user and track identifiers are replaced with numeric IDs.

Yandex provides baseline recommender models implemented on the dataset, including Item-Based Collaborative Filtering, Matrix Factorization, and Neural Collaborative Filtering. The dataset is available in three sizes via Hugging Face, accommodating various research needs and computational capacities.
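
For readers unfamiliar with the first of these baselines, the idea behind item-based collaborative filtering can be sketched in a few lines. This is a minimal illustration on a hypothetical toy matrix, not Yandex's implementation and not Yambda data: the function names, the use of cosine similarity, and the 4x4 interaction matrix are all assumptions made for the example.

```python
import numpy as np


def item_similarity(interactions: np.ndarray) -> np.ndarray:
    """Cosine similarity between the item columns of a user x item matrix."""
    norms = np.linalg.norm(interactions, axis=0)
    norms[norms == 0] = 1.0  # items with no interactions: avoid dividing by zero
    normalized = interactions / norms
    sim = normalized.T @ normalized
    np.fill_diagonal(sim, 0.0)  # an item should not recommend itself
    return sim


def recommend(interactions: np.ndarray, sim: np.ndarray, user: int, k: int = 2) -> list:
    """Score unseen items by their summed similarity to the user's items."""
    scores = interactions[user] @ sim
    scores[interactions[user] > 0] = -np.inf  # exclude items already seen
    return np.argsort(-scores)[:k].tolist()


# Hypothetical toy data: rows are users, columns are tracks, 1 = listened.
interactions = np.array(
    [
        [1, 1, 0, 0],
        [1, 1, 1, 0],
        [0, 0, 1, 1],
        [0, 1, 1, 0],
    ],
    dtype=float,
)

sim = item_similarity(interactions)
top = recommend(interactions, sim, user=0, k=2)
print(top)
```

On real data of Yambda's scale the interaction matrix would need a sparse representation; the dense NumPy array here is only for readability.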

