
How Spotify Wrapped scales for 700M users: System Design case study

11 min read
Nov 28, 2024

Every year end, Spotify delivers its users a gift: a beautifully personalized summary of their listening habits. From your top songs to your most-streamed genres, Spotify Wrapped transforms your music data into an engaging, shareable story.

Spotify Wrapped is also a powerful product offering that users wait for each year. 

The numbers speak for themselves:

  • 100 million+ shares on social media in 2022 [1]

  • 20% increase in Spotify downloads in 2020 [2]

  • 602 million users engaged across 184 countries (as of 2023) [3]

But Spotify Wrapped is more than just a successful marketing campaign — it's a time capsule of your year in music, powered by data science, machine learning, and immaculate System Design.

Behind every Wrapped recap is a robust architecture that processes petabytes of data with precision and speed. Engineers work hard to keep Wrapped seamless, scalable, and always ready for millions of users worldwide.

Let’s explore the inner mechanics that make Spotify Wrapped work like clockwork. We'll cover:

  • Challenges for scaling Spotify Wrapped

  • System Design of Spotify Wrapped

  • 4 engineering lessons we can learn from Spotify

Let's dive in.

Getting to know Spotify Wrapped#

Spotify’s scalable System Design enables the Wrapped campaign to reach millions without a hitch, turning personal listening data into a global phenomenon. By seamlessly scaling to meet huge surges in demand, Spotify ensures that each Wrapped experience is fast, personalized, and ready to share.

Wrapped over the years#

Since its launch in 2016, Spotify Wrapped has added new layers of interactivity and personalization every year:

Year

Spotify Wrapped Features

2016

  • The first edition of Spotify Wrapped offered basic stats, such as the top songs, artists, and genres, based on users’ yearly listening habits

2017

  • Expanded stats with more detailed insights, including top 5 artists, songs, genres, and the ability to share these stats on social media

2018

  • Top artists, songs, genres
  • Added “Your Top Songs” playlist, allowing users to re-listen to their top songs of the year


2019

  • Top artists, songs, genres
  • Introduced “Tastebreakers” playlist, which recommended new songs outside users’ usual preferences
  • Added a slideshow for a more engaging experience
  • Introduced decade-based insights into users’ listening history

2020

  • Top artists, songs, genres
  • Focused on past listening patterns and added “Missed Hits” playlist
  • Introduced new stats like the number of new artists discovered, top podcasts, etc.


2021

  • Introduced new interactive features like “Audio Aura”
  • Added “2021: The Movie,” which matched users’ music to movie scenes
  • Introduced shareable “Wrapped Cards” for social media

2022

  • Added personalized “Listening Personality” types and improved visuals
  • Introduced “Audio Day,” offering a peek into evolving tastes based on preferences at different times of the day
  • Expanded on the slideshow, making it more dynamic and engaging



2023

  • Enhanced “Listening Personality” insights with social sharing
  • Introduced custom storylines based on listening behavior
  • Upgraded interactive slideshow
  • Added “Me in 2023,” which assigns users a unique listening character
  • Introduced “Sound Town,” matching listeners to a city that reflects their music tastes
  • An “AI DJ” that guides users through their Wrapped with commentary on top songs and artists

Estimating Wrapped users#


We estimated user counts for 2024 based on past user data from Spotify [5]:

  • 700 million total monthly active users on Spotify 

    • Based on average growth of ~23%, 2019 to 2023

  • 295 million users accessing Wrapped 

    • Based on average growth of 37.5%, 2019 to 2022 (2023 data is undisclosed)

While not every user opens and accesses their Wrapped, Spotify creates the personalized Wrapped experience for each user (provided they meet simple eligibility criteria such as minimum listening time).

Wrapped data#

Spotify likely logs user data from January 1 through November 15 or 30.

Here are some insights on the Wrapped data that's collected:

  • Top songs and artists are ranked by play count, not total listening time 

  • Songs must be played for over 30 seconds to count in rankings

  • Only the first 10 songs in the top 100 playlists are strictly sorted by play count
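The first two rules above can be sketched in a few lines of Python. This is a minimal illustration, not Spotify's implementation: the `Play` record, the `top_tracks` function, and the sample data are hypothetical, and only the 30-second threshold and the rank-by-play-count rule come from the list above.

```python
from collections import Counter
from typing import Iterable, NamedTuple

MIN_PLAY_SECONDS = 30  # plays must run for over 30 seconds to count


class Play(NamedTuple):
    """Hypothetical record of one listening event."""
    track: str
    seconds_played: float


def top_tracks(plays: Iterable[Play], n: int = 5) -> list[tuple[str, int]]:
    """Rank tracks by number of qualifying plays, not by total listening time."""
    counts = Counter(
        p.track for p in plays if p.seconds_played > MIN_PLAY_SECONDS
    )
    return counts.most_common(n)


plays = [
    Play("Song A", 200), Play("Song A", 45), Play("Song A", 10),  # 2 qualify
    Play("Song B", 600),                                          # 1 long play
    Play("Song C", 31), Play("Song C", 30),                       # 1 qualifies
]
ranking = top_tracks(plays)
print(ranking)
```

In this sample, "Song B" has the most listening time but only one qualifying play, so it ranks below "Song A", which has two.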

We all have something to hide..

Challenges of scaling Spotify Wrapped#

Processing data and creating Wrapped for 700 million users requires a scalable and robust architecture that can process a year of listening history for every user. Spotify must manage millions of simultaneous streams, store and deliver petabytes of data, and recommend personalized content, all with low latency and high performance.

The engineering team faces several challenges when ensuring a seamless Spotify Wrapped user experience:

Scalability and resource management#

Scalability is crucial during Wrapped, as the surge in user engagement and social sharing can overload the system. Serverless and auto-scaling solutions are critical, but they must be tuned to absorb the load without driving up costs.

Data volume and processing#

Spotify handles an enormous amount of data, especially historical data, across hundreds of millions of users. Processing this data in batch jobs to compile Wrapped insights while simultaneously managing real-time data flows for recommendations requires highly efficient data pipelines and storage solutions, like data lakes and distributed storage.

Personalization complexity#

Wrapped’s success hinges on hyper-personalized insights, which require complex machine learning models trained on massive data sets. Scaling these models while avoiding latency issues is challenging, although well-optimized models can help keep it in check.

Cost management#

Efficiently managing cloud resources during Wrapped’s annual spike is key to balancing performance and costs. One way to do this is to integrate Wrapped calculations into the existing data pipelines used for real-time recommendations.

Data compliance#

Handling user data requires compliance with regulations like GDPR and CCPA. Spotify can ensure data privacy while maintaining low-latency data delivery through edge computing and distributed systems.

Scalability techniques for Spotify Wrapped#

Let's see how Spotify's scalability techniques help address these challenges.

| Technique | Description | Challenges Addressed |
| --- | --- | --- |
| Multi-tiered storage | Cloud-based tiered storage efficiently stores vast amounts of historical user data and Wrapped results. | Data volume & processing, cost management |
| Horizontal scaling | Adds servers instead of upgrading existing ones, enabling Spotify to handle massive concurrent user demand. | Scalability & resource management, availability |
| Serverless and auto-scaling | Uses serverless architectures and auto-scaling (e.g., AWS Lambda, GCP) to dynamically allocate resources as demand spikes. | Scalability, cost management |
| Data processing using the data lake | Processes user history and engagement data in a data lake or warehouse to manage high-volume batch processing needed for Wrapped. | Data volume & processing, personalization complexity |
| Real-time processing | Uses tools like Kafka and Spark to continuously process user data, ensuring real-time insights are available. | Data volume & processing, personalization complexity |
| Edge computing | Caches content closer to users on edge servers, reducing latency and handling regional load effectively during Wrapped access. | Data compliance, scalability & resource management |
| Monitoring and auto-recovery | Implements real-time monitoring tools like Grafana and failover mechanisms to detect and recover from issues quickly. | Scalability & resource management, availability |
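To make the horizontal scaling and auto-scaling techniques listed above more concrete, here is a rough sketch of the capacity calculation an auto-scaler might perform. Everything in it is an assumption for illustration: the function name, the 70% target utilization, and the replica bounds are hypothetical and do not describe Spotify's actual configuration.

```python
import math


def desired_replicas(
    requests_per_sec: int,
    capacity_per_replica: int,
    target_utilization_pct: int = 70,  # assumed headroom target
    min_replicas: int = 2,             # assumed floor for availability
    max_replicas: int = 1000,          # assumed ceiling to cap cost
) -> int:
    """Return how many servers to run so each stays near the target utilization."""
    usable_capacity = capacity_per_replica * target_utilization_pct
    needed = math.ceil(requests_per_sec * 100 / usable_capacity)
    # Clamp: never below the availability floor, never above the cost ceiling.
    return max(min_replicas, min(max_replicas, needed))


print(desired_replicas(7_000, 100))       # normal day
print(desired_replicas(50, 100))          # quiet period, floor applies
print(desired_replicas(10_000_000, 100))  # launch spike, ceiling applies
```

The floor keeps the service available when traffic is low, while the ceiling is one simple way to express the cost-management constraint discussed earlier.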


Spotify System Design and workflow#

Spotify’s System Design ensures seamless streaming, user interactions, and personalized features like Wrapped. The architecture is built for scalability and high availability, handling millions of simultaneous requests efficiently.

An overview of Spotify’s scalable System Design

Key system components and services#

Here’s how Spotify processes user requests and ensures scalability:

  • API Gateway: Acts as the entry point, authenticating user requests.

  • Load Balancer: Distributes requests evenly across application servers to handle large volumes of traffic.

  • Messaging Queue: User interactions (e.g., playing songs, creating playlists) are sent to a queue (like Pub/Sub or Kafka). This queue distributes the data to various microservices for tasks like generating recommendations or creating Wrapped summaries.

Because the data is processed asynchronously, Spotify’s system gains scalability and availability and can handle traffic spikes and real-time demands.
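The asynchronous hand-off described above can be sketched with Python's standard library. Here an in-memory `queue.Queue` stands in for Pub/Sub or Kafka, and the two handler functions are hypothetical placeholders for a Wrapped aggregation service and a recommendation service; a real deployment would use separate processes and a durable broker.

```python
import queue
import threading
from collections import defaultdict

events = queue.Queue()              # in-memory stand-in for Kafka or Pub/Sub
play_counts = defaultdict(int)      # state of a hypothetical Wrapped service
recent_tracks = defaultdict(list)   # state of a hypothetical recommendation service


def update_wrapped(event: dict) -> None:
    play_counts[(event["user"], event["track"])] += 1


def update_recommendations(event: dict) -> None:
    recent_tracks[event["user"]].append(event["track"])


SUBSCRIBERS = [update_wrapped, update_recommendations]


def consumer() -> None:
    """Pull events off the queue and fan each one out to every subscriber."""
    while True:
        event = events.get()
        if event is None:  # sentinel value: no more events
            break
        for handle in SUBSCRIBERS:
            handle(event)


worker = threading.Thread(target=consumer)
worker.start()

# The producer (the request path) only enqueues and returns immediately.
for track in ["Song A", "Song B", "Song A"]:
    events.put({"user": "u1", "track": track})

events.put(None)
worker.join()
print(dict(play_counts))
```

The request path never waits for the downstream services, which is what lets the queue absorb bursts of traffic.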

Spotify’s microservices architecture supports various tasks:

  • User Service: Manages user data, including preferences and subscriptions, with connections to a payment service for subscription verification.

  • Upload Service: Ingests new content from artists. 

  • Transcoding Service: Converts uploaded files into streaming-compatible formats, storing them in cloud-based blob storage (and metadata into an SQL database).

  • Streaming Service: Delivers content to users via a content delivery network (CDN), minimizing latency.

  • Search Service: Enables fast lookups using Elasticsearch.

  • Processing Service: Powers recommendations and Wrapped summaries using advanced machine learning models.

  • Monitoring Service: Monitors the overall system’s health and alerts in case of errors, failures, etc.

Distributed databases#

Spotify employs multiple database types:

  • Blob Storage: Stores tracks, podcasts, and audiobooks.

  • SQL Databases: Store user metadata like account details.

  • NoSQL Databases: Handle activity data such as listening history, playlists, and preferences.

System Design for Spotify Wrapped#

Let’s explore how data processing services process such a massive amount of data at scale to create personalized Spotify Wrapped.

Spotify uses the ETL (Extract, Transform, Load) process: Extract defines how data is collected, Transform covers how data is processed and turned into features, and Load specifies where data is stored for efficient retrieval. They also use reverse ETL to create Wrapped from the processed data.

Data collection or ingestion (Extract)#

A data collection service collects data from data sources (databases) and passes it to tools like Kafka or Pub/Sub, which stream it and make it available for immediate processing.

Data processing (Transform)#

The data from the ingestion layer is fed to the processing layer, where the batch processor runs on massive data, aggregating users’ year-long listening activity and generating insights. Spotify uses Google Cloud Bigtable to efficiently handle its extensive time-series data and user listening history, optimizing it for fast data aggregation over specific time frames.
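One common way to make time-frame aggregation fast in a sorted key-value store is to put the user and the date into the row key, so that one user's history over a period becomes a single contiguous range scan. The sketch below illustrates that idea with a tiny in-memory table. The `user_id#date` key layout and the `SortedTable` class are assumptions for illustration, not Spotify's actual Bigtable schema or the Bigtable client API.

```python
import bisect


def row_key(user_id: str, day: str) -> str:
    """Build a key that sorts by user first, then by ISO date (YYYY-MM-DD)."""
    return f"{user_id}#{day}"


class SortedTable:
    """Tiny in-memory stand-in for a store that keeps rows sorted by key."""

    def __init__(self) -> None:
        self._keys: list[str] = []
        self._values: dict[str, int] = {}

    def put(self, key: str, value: int) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)  # keep keys in sorted order
        self._values[key] = value

    def scan(self, start: str, end: str):
        """Yield (key, value) pairs with start <= key < end."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


table = SortedTable()
table.put(row_key("u1", "2024-01-15"), 12)  # plays that day
table.put(row_key("u1", "2024-03-02"), 7)
table.put(row_key("u1", "2024-09-10"), 4)
table.put(row_key("u2", "2024-02-01"), 99)

# Total plays for user u1 in the first half of the year: one range scan.
first_half = sum(
    plays
    for _, plays in table.scan(row_key("u1", "2024-01-01"), row_key("u1", "2024-07-01"))
)
print(first_half)
```

Because the keys are sorted, the scan touches only the rows for that user and period rather than the whole table.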

In 2019, Spotify’s use of Bigtable and BigQuery for data processing let it process 5x the data while reducing overall cost by 25% [6].

Spotify can quickly compile user-level insights by structuring data storage to minimize shuffling (reducing the need to move data between nodes, which can be time-consuming and resource-intensive) during processing.
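The idea of minimizing shuffling can be illustrated with a small sketch: if events are written into partitions keyed by user ID, every user's data lives in exactly one partition, and each partition can be aggregated independently with no data exchanged between workers. The partition count, function names, and sample events are hypothetical; real frameworks handle partitioning internally.

```python
import zlib
from collections import Counter, defaultdict

NUM_PARTITIONS = 4  # assumed number of workers or storage shards


def partition_for(user_id: str) -> int:
    """Stable hash so the same user always lands in the same partition."""
    return zlib.crc32(user_id.encode("utf-8")) % NUM_PARTITIONS


def write_partitioned(events):
    """Group (user_id, track) events by the partition that owns the user."""
    partitions = defaultdict(list)
    for user_id, track in events:
        partitions[partition_for(user_id)].append((user_id, track))
    return partitions


def aggregate_partition(rows):
    """Count plays per user and track using only this partition's rows."""
    per_user = defaultdict(Counter)
    for user_id, track in rows:
        per_user[user_id][track] += 1
    return per_user


events = [
    ("alice", "Song A"), ("bob", "Song B"), ("alice", "Song A"),
    ("carol", "Song C"), ("bob", "Song A"),
]
partitions = write_partitioned(events)

results = {}
for rows in partitions.values():
    # No user spans two partitions, so results are simply unioned, not merged.
    results.update(aggregate_partition(rows))

print({user: dict(counts) for user, counts in results.items()})
```

If events were partitioned by something else, such as arrival time, each user's rows would be scattered and would have to be moved between nodes before they could be counted.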

Note: The following illustration is an in-depth exploration of how data processing services process data and transform it into a personalized Wrapped.

A detailed design of data processing for Spotify Wrapped

Data warehousing (Load)#

Apache Spark and other big data frameworks process this data at scale, and the results are stored in data warehouses like Google BigQuery.

Wrapped Creation and Personalization (Reverse ETL)#

Finally, data visualization tools and services aggregate this processed data, allowing Wrapped summaries to be sent to the user in real time through APIs. Cloud services ensure low latency, high availability, and scalability across Spotify’s global infrastructure.

The Wrapped summaries are sent to users via email or in-app notifications through a Pub/Sub service.

Note: Spotify Wrapped is all about personalization, which relies on advanced machine learning algorithms. The ML engine uses collaborative filtering (recommending content based on the behavior of similar users or items), content-based filtering (recommending content based on similarities between items, using metadata and content features), and a hybrid model that combines the strengths of both to generate a personalized Wrapped for each user.
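As a rough illustration of how a hybrid model can blend the two signals, the sketch below scores a candidate track against a seed track using cosine similarity on play counts (the collaborative signal) and on content features (the content-based signal), then mixes the two with a weight `alpha`. The data, the feature names, and the linear blend are all assumptions chosen for clarity; Spotify's actual models are not described in this article.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Collaborative signal: play counts per track, one entry per user (hypothetical).
plays_by_user = {
    "Song A": [5, 3, 0],
    "Song B": [4, 2, 0],
    "Song C": [0, 0, 6],
}

# Content signal: hypothetical [energy, acousticness] features per track.
features = {
    "Song A": [0.9, 0.1],
    "Song B": [0.2, 0.8],
    "Song C": [0.8, 0.2],
}


def hybrid_score(seed: str, candidate: str, alpha: float = 0.5) -> float:
    """Blend collaborative and content-based similarity; alpha weights the former."""
    collaborative = cosine(plays_by_user[seed], plays_by_user[candidate])
    content = cosine(features[seed], features[candidate])
    return alpha * collaborative + (1 - alpha) * content


for candidate in ("Song B", "Song C"):
    print(candidate, round(hybrid_score("Song A", candidate), 3))
```

In this toy data, "Song B" is played by the same users as "Song A" but sounds different, while "Song C" sounds similar but has no listeners in common. Changing `alpha` shifts which of the two the hybrid prefers.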

For 2019 Wrapped, Spotify used Bigtable to process a decade of user data. A similar data processing pattern for a year of data is shown below:

The architecture of data pipelines [source: Spotify]

Front-end animations in Spotify Wrapped#

The front-end experience for Spotify Wrapped plays a crucial role in driving user engagement. The design of the Wrapped interface transforms raw data into fun, shareable content.

Spotify Wrapped’s front-end elements include:

Visual and interactive features#

  • Personalized visualizations: Insights are displayed as animated reels or cards.

  • NLP-powered content: Uses natural language processing to generate captions and labels for animations.

Feature highlights#

  • Audio Aura (2021): Colors representing listening intensity for different genres.

  • Sound Town (2023): Mapped users’ tastes to fictional cities, creating playful, shareable visuals.

These interactive features enhance user engagement, turning data into delightful experiences.

2023 Sound Town [Source: Spotify]

What we can learn from Spotify#

Here are 4 key takeaways from Spotify’s approach to delivering a seamless Wrapped experience each year:

  1. A robust, scalable System Design is the backbone of Wrapped. It handles huge data volumes by separating real-time and batch-processing content to ensure fast data access and reliable yearly insights.

  2. Using solutions like Bigtable and BigQuery, Spotify minimizes data shuffling and enables efficient aggregation, providing quick user-level insights for millions.

  3. Advanced machine learning models help Spotify deliver Wrapped’s unique, personalized insights by analyzing patterns in listening data.

  4. By employing auto-scaling and load balancing, Spotify can smoothly manage the surge in Wrapped engagement.

What's next for Spotify Wrapped?#

As Spotify Wrapped continues to scale year after year, it offers a glimpse into the complex System Design that powers real-world applications at a massive scale. 

Spotify's developers add new features and insights to make the user experience better by the year — so it's hard to tell what's coming next. However, we can expect AI to level up the Wrapped experience through features like:

  • Interactive, real-time playlists that evolve based on a user’s Wrapped experience 

  • AI-driven "music DNA" visualizations breaking down listening habits into dynamic, shareable formats

  • Leveraging GenAI to create unique soundtracks for users, blending favorite genres, moods, and artists into a custom composition

Spotify Wrapped highlights how important cloud-based data pipelines, advanced machine learning models, and cost-saving auto-scaling are in System Design.

To truly understand these systems, you'd need to dive deep into the world of scalable System Design. If you haven't done so yet, I recommend starting with the following course:

Grokking the Modern System Design Interview

System Design interviews are now part of every Engineering and Product Management Interview. Interviewers want candidates to exhibit their technical knowledge of core building blocks and the rationale of their design approach. This course presents carefully selected system design problems with detailed solutions that will enable you to handle complex scalability scenarios during an interview or designing new products. You will start with learning a bottom-up approach to designing scalable systems. First, you’ll learn about the building blocks of modern systems, with each component being a completely scalable application in itself. You'll then explore the RESHADED framework for architecting web-scale applications by determining requirements, constraints, and assumptions before diving into a step-by-step design process. Finally, you'll design several popular services by using these modular building blocks in unique combinations, and learn how to evaluate your design.

26hrs
Intermediate
5 Playgrounds
18 Quizzes

Disclaimer: All technical information and design insights provided in this newsletter are curated by our System Design experts to the best of their knowledge and based on available resources, including insights from Spotify’s engineering blogs. While we strive for accuracy, some details may vary from Spotify’s actual implementations and are meant for educational interpretations.
