Yandex Launches Yambda Dataset for Music Recommender System Research

21:53, 30 May

Edited by: Veronika Radoslavskaya

Yandex has released Yambda, a large public dataset for recommender-system research. Published in May 2025, it is intended to bridge the gap between academic research and real-world industry applications, and it contains nearly 5 billion anonymized user interaction events from the Yandex Music streaming service.

The dataset contains 4.79 billion anonymized user interactions, collected over ten months from roughly 1 million users across about 9.4 million tracks. To protect privacy, all user and track identifiers are replaced with anonymized numeric IDs.

Yandex also provides baseline recommender models evaluated on the dataset, including item-based collaborative filtering, matrix factorization, and neural collaborative filtering. The dataset is distributed through Hugging Face in three sizes to suit different research needs and computational budgets.
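Item-based collaborative filtering, one of the baselines named above, scores tracks a user has not yet heard by how similar they are to tracks the user has already interacted with. The sketch below illustrates the general technique on a tiny, made-up log of (user ID, track ID) pairs, using cosine similarity over implicit feedback. It is not Yandex's baseline code; the function names `item_similarities` and `recommend` and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def item_similarities(interactions):
    """Cosine similarity between items, computed from implicit (user, item) pairs."""
    # Map each item to the set of users who interacted with it.
    users_by_item = defaultdict(set)
    for user, item in interactions:
        users_by_item[item].add(user)

    sims = defaultdict(dict)
    items = list(users_by_item)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            # Number of users who interacted with both items.
            overlap = len(users_by_item[a] & users_by_item[b])
            if overlap:
                score = overlap / math.sqrt(len(users_by_item[a]) * len(users_by_item[b]))
                sims[a][b] = score
                sims[b][a] = score
    return sims


def recommend(user, interactions, sims, k=3):
    """Rank unseen items by summed similarity to the items the user has seen."""
    seen = {item for u, item in interactions if u == user}
    scores = defaultdict(float)
    for item in seen:
        for other, score in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += score
    # Highest score first; ties broken by item ID for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


# Hypothetical toy log: (anonymized user ID, anonymized track ID).
interactions = [
    (1, 10), (1, 11),
    (2, 10), (2, 12),
    (3, 11), (3, 12), (3, 13),
    (4, 10), (4, 11),
]
sims = item_similarities(interactions)
print(recommend(1, interactions, sims))  # → [12, 13]
```

The pairwise loop is quadratic in the number of items, which is fine for illustration but impractical at Yambda's scale of millions of tracks; a real baseline would typically rely on sparse matrix operations or approximate nearest-neighbor search instead.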

