How Spotify Wrapped scales for 700M users: System Design case study

Nov 28, 2024

Contents

Getting to know Spotify Wrapped

Wrapped over the years

Estimating Wrapped users

Wrapped data

Challenges of scaling Spotify Wrapped

Scalability and resource management

Data volume and processing

Personalization complexity

Cost management

Data compliance

Scalability techniques for Spotify Wrapped

Spotify System Design and workflow

Key system components and services

Distributed databases

System Design for Spotify Wrapped

Data collection or ingestion (Extract)

Data processing (Transform)

Data warehousing (Load)

Wrapped Creation and Personalization (Reverse ETL)

Front-end animations in Spotify Wrapped

Visual and interactive features

Feature highlights

What we can learn from Spotify

What's next for Spotify Wrapped?

At the end of every year, Spotify gives its users a gift: a beautifully personalized summary of their listening habits. From your top songs to your most-streamed genres, Spotify Wrapped transforms your music data into an engaging, shareable story.

Spotify Wrapped is also a powerful product offering that users wait for each year.

The numbers speak for themselves:

100 million+ shares on social media in 2022¹
20% increase in Spotify downloads in 2020²
602 million users engaged across 184 countries (as of 2023)³

But Spotify Wrapped is more than just a successful marketing campaign — it's a time capsule of your year in music, powered by data science, machine learning, and immaculate System Design.

Behind every Wrapped recap is a robust architecture that processes petabytes of data with precision and speed, and engineers work hard to keep Wrapped seamless, scalable, and ready for millions of users worldwide.

Let’s explore the inner mechanics that make Spotify Wrapped run like clockwork. We'll cover:

Challenges of scaling Spotify Wrapped
System Design of Spotify Wrapped
4 engineering lessons we can learn from Spotify

Let's dive in.

Getting to know Spotify Wrapped

Spotify’s scalable System Design enables the Wrapped campaign to reach millions without a hitch, turning personal listening data into a global phenomenon. By seamlessly scaling to meet huge surges in demand, Spotify ensures that each Wrapped experience is fast, personalized, and ready to share.

Wrapped over the years

Since its launch in 2016, Spotify Wrapped has added new layers of interactivity and personalization every year:

Year	Spotify Wrapped features
2016	The first edition of Wrapped offered basic stats, such as top songs, artists, and genres, based on users’ yearly listening habits
2017	Expanded stats with more detailed insights, including top 5 artists, songs, and genres, plus the ability to share these stats on social media
2018	Top artists, songs, and genres; added the “Your Top Songs” playlist, allowing users to re-listen to their top songs of the year
2019	Top artists, songs, and genres; introduced the “Tastebreakers” playlist, which recommended new songs outside users’ usual preferences; added a slideshow for a more engaging experience; introduced decade-based insights into users’ listening history
2020	Top artists, songs, and genres; focused on past listening patterns and added the “Missed Hits” playlist; introduced new stats such as the number of new artists discovered and top podcasts
2021	Introduced new interactive features such as “Audio Aura” and “2021: The Movie,” which matched users’ music to movie scenes; introduced shareable “Wrapped Cards” for social media
2022	Added personalized “Listening Personality” types and improved visuals; introduced “Audio Day,” offering a peek into evolving tastes based on preferences at different times of the day; expanded the slideshow, making it more dynamic and engaging
2023	Enhanced “Listening Personality” insights with social sharing; introduced custom storylines based on listening behavior; upgraded the interactive slideshow; added “Me in 2023,” which assigns users a unique listening character; introduced “Sound Town,” matching listeners to a city that reflects their music tastes; added an “AI DJ” that guides users through their Wrapped with commentary on top songs and artists

Estimating Wrapped users

We estimated user counts for 2024 based on past user data from Spotify:⁵

700 million total monthly active users on Spotify
- Based on average growth of ~23%, 2019 to 2023
295 million users accessing Wrapped
- Based on average growth of 37.5%, 2019 to 2022 (2023 data is undisclosed)
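
Both projections above rest on simple compound growth: take the latest known figure and multiply it by one plus the average annual growth rate. The Python sketch below shows that arithmetic under one assumption, namely that "average growth" means the mean of year-over-year growth rates (a compound annual growth rate would also be a reasonable reading). The function names and sample figures are hypothetical, not Spotify data.

```python
def average_growth_rate(yearly_values: list[float]) -> float:
    """Mean of the year-over-year growth rates in a series of yearly figures."""
    rates = [
        (current - previous) / previous
        for previous, current in zip(yearly_values, yearly_values[1:])
    ]
    return sum(rates) / len(rates)


def project(latest: float, rate: float, years: int = 1) -> float:
    """Compound the latest figure forward by `years` at an annual `rate`."""
    return latest * (1 + rate) ** years


# Hypothetical yearly figures in millions (not real Spotify numbers).
history = [100.0, 110.0, 121.0]
rate = average_growth_rate(history)
print(round(rate, 3))                        # 0.1
print(round(project(history[-1], rate), 1))  # 133.1
```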

While not every user opens and accesses their Wrapped, Spotify creates the personalized Wrapped experience for each user (provided they meet simple eligibility criteria such as minimum listening time).

Wrapped data

Spotify likely logs user data from January 1 through November 15 or 30.

Here are some insights into the Wrapped data that's collected:

Top songs and artists are ranked by play count, not total listening time
Songs must be played for over 30 seconds to count in rankings
Only the first 10 songs in the top 100 playlists are strictly sorted by play count
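
The first two rules translate directly into code: drop plays of 30 seconds or less, then rank tracks by how many qualifying plays they have rather than by minutes listened. The sketch below assumes a hypothetical `(track_id, seconds_played)` event shape; Spotify's real event schema and tie-breaking rules are not public.

```python
from collections import Counter

MIN_SECONDS = 30  # a play must last longer than this to count


def top_tracks(plays: list[tuple[str, int]], n: int = 5) -> list[tuple[str, int]]:
    """Rank tracks by number of qualifying plays, not by total listening time."""
    counts = Counter(
        track_id for track_id, seconds in plays if seconds > MIN_SECONDS
    )
    return counts.most_common(n)


plays = [
    ("song_a", 200), ("song_a", 15), ("song_b", 45),
    ("song_b", 31), ("song_c", 600), ("song_a", 180),
]
print(top_tracks(plays, n=3))  # [('song_a', 2), ('song_b', 2), ('song_c', 1)]
```

Here `song_c` has the most listening time (600 seconds) yet ranks last, because ranking is by play count. `Counter.most_common` keeps first-seen order for ties, which is why `song_a` comes before `song_b`.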

Challenges of scaling Spotify Wrapped

Processing data and creating Wrapped for 700 million users requires a scalable, robust architecture that can handle each user's year-long music history. Spotify must manage millions of simultaneous streams, store and deliver petabytes of data, and recommend personalized content, all with low latency and high performance.

The engineering team faces several challenges when ensuring a seamless Spotify Wrapped user experience:

Scalability and resource management

Scalability is crucial during Wrapped, as the surge in user engagement and social sharing can overload the system. Serverless and auto-scaling solutions are essential, but they must be tuned to absorb the spike without over-provisioning resources and driving up costs.
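
A common way to scale out for a spike without paying for idle capacity is a proportional rule with a floor and a ceiling. The sketch below uses the core formula of the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current × observed / target); whether Spotify's Wrapped services use this exact rule is an assumption, and the replica counts and utilization targets are illustrative.

```python
import math


def desired_replicas(current: int, observed_pct: int, target_pct: int,
                     min_replicas: int = 2, max_replicas: int = 100) -> int:
    """Scale proportionally to load, clamped to [min_replicas, max_replicas]."""
    raw = math.ceil(current * observed_pct / target_pct)
    return max(min_replicas, min(max_replicas, raw))


print(desired_replicas(10, observed_pct=90, target_pct=60))  # 15: scale out for the spike
print(desired_replicas(10, observed_pct=3, target_pct=60))   # 2: scale in, but keep a floor
print(desired_replicas(80, observed_pct=95, target_pct=60))  # 100: capped to bound cost
```

The floor keeps some warm capacity for sudden traffic, while the ceiling caps spend if a metric misbehaves. The real HPA also applies a tolerance band and stabilization windows, which this sketch omits.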

Data volume and processing

Spotify handles an enormous amount of data, especially historical data, across hundreds of millions of users. Processing this data in batch jobs to compile Wrapped insights while simultaneously managing real-time data flows for recommendations requires highly efficient data pipelines and storage solutions, like data lakes and distributed storage.

Personalization complexity

Wrapped’s success hinges on hyper-personalized insights, which require complex machine learning models trained on massive data sets. Scaling these models without introducing latency is challenging, so the models themselves must be optimized for efficient inference at scale.

Cost management

Efficiently managing cloud resources during Wrapped’s annual spike is key to balancing performance and cost. One way to do this is to integrate Wrapped calculations into the existing data pipelines already used for real-time recommendations, rather than provisioning a separate system.

Data compliance

Handling user data requires compliance with regulations like GDPR and CCPA. Spotify can protect data privacy while keeping data delivery fast by using edge computing and geographically distributed systems.

Scalability techniques for Spotify Wrapped

Let's see how Spotify's scalability techniques help address these challenges.

Technique	Description	Challenges Addressed
Multi-tiered storage	Cloud-based tiered storage efficiently stores vast amounts of historical user data and Wrapped results.	Data volume & processing, cost management
Horizontal scaling	Adds servers instead of upgrading existing ones, enabling Spotify to handle massive concurrent user demand.	Scalability & resource management, availability
Serverless and auto-scaling	Uses serverless architectures and auto-scaling (e.g., AWS Lambda, Google Cloud Functions) to dynamically allocate resources as demand spikes.	Scalability, cost management
Data processing using the data lake	Processes user history and engagement data in a data lake or warehouse to manage the high-volume batch processing needed for Wrapped.	Data volume & processing, personalization complexity
Real-time processing	Uses tools like Kafka and Spark to continuously process user data, ensuring real-time insights are available.	Data volume & processing, personalization complexity
Edge computing	Caches content closer to users on edge servers, reducing latency and handling regional load effectively during Wrapped access.	Data compliance, scalability & resource management
Monitoring and auto-recovery	Implements real-time monitoring tools like Grafana and failover mechanisms to detect and recover from issues quickly.	Scalability & resource management, availability
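
The edge computing row above boils down to serving a recently fetched copy close to the user instead of calling the origin on every request. The sketch below models one edge node as a TTL cache; the class name, the 60-second TTL, and the `wrapped-card:` payload are hypothetical, and real CDNs add eviction, invalidation, and regional routing on top.

```python
import time
from typing import Callable, Optional


class EdgeCache:
    """Minimal TTL cache standing in for an edge node in front of an origin service."""

    def __init__(self, fetch_from_origin: Callable[[str], str], ttl_seconds: float) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._entries: dict[str, tuple[str, float]] = {}  # key -> (value, expires_at)
        self.origin_hits = 0

    def get(self, key: str, now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]  # fresh copy: served from the edge, origin untouched
        # Miss or expired: go back to the origin and refresh the cached copy.
        self.origin_hits += 1
        value = self._fetch(key)
        self._entries[key] = (value, now + self._ttl)
        return value


cache = EdgeCache(lambda user_id: f"wrapped-card:{user_id}", ttl_seconds=60)
cache.get("u1", now=0)    # miss -> origin
cache.get("u1", now=30)   # hit  -> edge
cache.get("u1", now=61)   # expired -> origin again
print(cache.origin_hits)  # 2
```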

Spotify System Design and workflow

Key system components and services

Here’s how Spotify processes user requests and ensures scalability:

API Gateway: Acts as the entry point, authenticating user requests.
Load Balancer: Distributes requests evenly across application servers to handle large volumes of traffic.
Messaging Queue: User interactions (e.g., playing songs, creating playlists) are sent to a queue (like Pub/Sub or Kafka), which distributes the data to various microservices for tasks like generating recommendations or creating Wrapped summaries. These services consume the data asynchronously.

This asynchronous approach enhances scalability and availability, ensuring Spotify’s system can handle traffic spikes and real-time demands.
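
The fan-out described above can be modeled with one queue per subscriber: the producer publishes each user event once, and every downstream service (recommendations, Wrapped) consumes its own copy at its own pace. This in-memory Python sketch only illustrates the pattern; it is not the Kafka or Pub/Sub client API, and the topic, subscription, and field names are hypothetical.

```python
from collections import deque
from typing import Callable

Event = dict[str, str]


class Topic:
    """In-memory stand-in for a Pub/Sub topic: every subscription gets its own queue."""

    def __init__(self) -> None:
        self._subscriptions: dict[str, deque] = {}

    def subscribe(self, name: str) -> None:
        # Create an empty queue for a new subscriber (no-op if it already exists).
        self._subscriptions.setdefault(name, deque())

    def publish(self, event: Event) -> None:
        # The producer only appends; it never waits for consumers to finish.
        for queue in self._subscriptions.values():
            queue.append(event)

    def pending(self, name: str) -> int:
        return len(self._subscriptions[name])

    def drain(self, name: str, handler: Callable[[Event], None]) -> int:
        # A consumer works through its backlog whenever it has capacity.
        queue = self._subscriptions[name]
        processed = 0
        while queue:
            handler(queue.popleft())
            processed += 1
        return processed


topic = Topic()
topic.subscribe("recommendations")
topic.subscribe("wrapped")
topic.publish({"user_id": "u1", "event": "play", "track_id": "t42"})
topic.publish({"user_id": "u2", "event": "playlist_create", "playlist_id": "p7"})

wrapped_inbox: list[Event] = []
print(topic.drain("wrapped", wrapped_inbox.append))  # 2
print(topic.pending("recommendations"))              # 2 (still queued, unaffected)
```

Draining the `wrapped` subscription leaves the `recommendations` backlog untouched, which is what lets each consumer scale independently during a traffic spike.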

Spotify’s microservices architecture supports various tasks:

User Service: Manages user data, including preferences and subscriptions, with connections to a payment service for subscription verification.
Upload Service: Ingests new content from artists.
Transcoding Service: Converts uploaded files into streaming-compatible formats, storing them in cloud-based blob storage (with metadata stored in an SQL database).
Streaming Service: Delivers content to users via a content delivery network (CDN), minimizing latency.
Search Service: Enables fast lookups using Elasticsearch.
Processing Service: Powers recommendations and Wrapped summaries using advanced machine learning models.
Monitoring Service: Monitors the overall system’s health and alerts in case of errors, failures, etc.

Distributed databases

Spotify employs multiple database types:

Blob Storage: Stores tracks, podcasts, and audiobooks.
SQL Databases: Store user metadata like account details.
NoSQL Databases: Handle activity data such as listening history, playlists, and preferences.

System Design for Spotify Wrapped

Let’s explore how Spotify's data processing services handle such massive amounts of data to create a personalized Wrapped for each user.

Spotify uses the ETL (Extract, Transform, Load) process: Extract defines how data is collected, Transform covers how data is processed and turned into features, and Load specifies where data is stored for efficient retrieval. Spotify then uses reverse ETL to create Wrapped from the processed data.
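
The three ETL stages can be sketched as plain functions chained together. In this Python sketch, a list of JSON log lines stands in for the Kafka or Pub/Sub stream and a Python list stands in for the warehouse table; the field names (`user_id`, `artist`, `ms_played`) and the JSON format are assumptions, while the 30-second threshold comes from the Wrapped rules described earlier.

```python
import json
from collections import defaultdict

MIN_MS = 30_000  # plays must last longer than 30 seconds to count


def extract(raw_lines: list[str]) -> list[dict]:
    """Extract: parse raw listening-event log lines (assumed to be JSON)."""
    return [json.loads(line) for line in raw_lines]


def transform(events: list[dict]) -> dict[str, dict[str, int]]:
    """Transform: count qualifying plays per user and artist."""
    counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for event in events:
        if event["ms_played"] > MIN_MS:
            counts[event["user_id"]][event["artist"]] += 1
    return counts


def load(aggregates: dict[str, dict[str, int]], table: list[dict]) -> None:
    """Load: append one row per (user, artist) to a warehouse-style table."""
    for user_id, per_artist in aggregates.items():
        for artist, plays in per_artist.items():
            table.append({"user_id": user_id, "artist": artist, "plays": plays})


raw = [
    '{"user_id": "u1", "artist": "Artist A", "ms_played": 185000}',
    '{"user_id": "u1", "artist": "Artist A", "ms_played": 12000}',
    '{"user_id": "u1", "artist": "Artist B", "ms_played": 240000}',
]
warehouse: list[dict] = []
load(transform(extract(raw)), warehouse)
print(warehouse)
```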

Data collection or ingestion (Extract)

A data collection service collects data from data sources (such as databases) and passes it to tools like Kafka or Pub/Sub, which stream it and make it available for immediate processing.

Data processing (Transform)

The data from the ingestion layer is fed to the processing layer, where a batch processor runs over the full data set, aggregating each user's year-long listening activity and generating insights. Spotify uses Google Cloud Bigtable to efficiently handle its extensive time-series data and user listening history, optimizing it for fast aggregation over specific time frames.

In 2019, Spotify’s use of Bigtable and BigQuery for data processing let it process 5x more data while reducing overall cost by 25%.⁶

By structuring data storage to minimize shuffling (moving data between nodes, which is time-consuming and resource-intensive) during processing, Spotify can quickly compile user-level insights.
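
One way to avoid shuffling is to route every event for a given user to the same partition up front, so each partition can finish its per-user aggregation without exchanging data with other nodes. The Python sketch below hash-partitions hypothetical `(user_id, artist)` events; in a Bigtable-style store, a row key that begins with the user ID has a similar effect, because rows are stored sorted by key. Spotify's actual partitioning and key design are assumptions here.

```python
import zlib
from collections import Counter, defaultdict


def partition_by_user(events: list[tuple[str, str]],
                      num_partitions: int) -> list[list[tuple[str, str]]]:
    """Route every (user_id, artist) event to a partition chosen by a stable hash of user_id."""
    partitions: list[list[tuple[str, str]]] = [[] for _ in range(num_partitions)]
    for user_id, artist in events:
        # crc32 is stable across processes, unlike Python's randomized hash() for str.
        index = zlib.crc32(user_id.encode("utf-8")) % num_partitions
        partitions[index].append((user_id, artist))
    return partitions


def aggregate_partition(partition: list[tuple[str, str]]) -> dict[str, Counter]:
    """Per-user artist counts computed from one partition alone (no data exchange)."""
    per_user: dict[str, Counter] = defaultdict(Counter)
    for user_id, artist in partition:
        per_user[user_id][artist] += 1
    return per_user


events = [("u1", "Artist A"), ("u2", "Artist C"), ("u1", "Artist B"), ("u1", "Artist A")]
results = [aggregate_partition(p) for p in partition_by_user(events, num_partitions=4)]

# Each user appears in exactly one partition, so merging is a plain dict union.
merged: dict[str, Counter] = {}
for partial in results:
    assert merged.keys().isdisjoint(partial.keys())
    merged.update(partial)
print(merged["u1"].most_common(1))  # [('Artist A', 2)]
```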

Note: The following illustration explores in depth how data processing services transform data into a personalized Wrapped.

Data warehousing (Load)

Apache Spark and other big data frameworks process this data at scale, and the results are stored in data warehouses like Google BigQuery.

Wrapped Creation and Personalization (Reverse ETL)

Finally, data visualization tools and services aggregate this processed data, allowing Wrapped summaries to be sent to the user in real time through APIs. Cloud services ensure low latency, high availability, and scalability across Spotify’s global infrastructure.

The Wrapped summaries are then delivered to users via email or in-app notifications, triggered through a Pub/Sub service.
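
Reverse ETL runs the pipeline in the other direction: it reads aggregated rows out of the warehouse and pushes per-user results into an operational system, here a notification outbox. The sketch below assumes hypothetical `(user_id, artist, plays)` rows and a `wrapped_ready` message type; a Python list stands in for the Pub/Sub topic.

```python
import json
from collections import defaultdict


def build_wrapped_cards(rows: list[dict], top_n: int = 3) -> dict[str, dict]:
    """Reverse ETL: turn (user_id, artist, plays) warehouse rows into one payload per user."""
    by_user: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for row in rows:
        by_user[row["user_id"]].append((row["artist"], row["plays"]))
    cards: dict[str, dict] = {}
    for user_id, artist_plays in by_user.items():
        # Most plays first; ties broken alphabetically so output is deterministic.
        ranked = sorted(artist_plays, key=lambda item: (-item[1], item[0]))
        cards[user_id] = {
            "user_id": user_id,
            "top_artists": [artist for artist, _ in ranked[:top_n]],
            "total_plays": sum(plays for _, plays in artist_plays),
        }
    return cards


def enqueue_notifications(cards: dict[str, dict], outbox: list[str]) -> None:
    """Serialize one 'Wrapped is ready' message per user; the outbox stands in for Pub/Sub."""
    for card in cards.values():
        outbox.append(json.dumps({"type": "wrapped_ready", **card}))


rows = [
    {"user_id": "u1", "artist": "Artist B", "plays": 40},
    {"user_id": "u1", "artist": "Artist A", "plays": 75},
    {"user_id": "u2", "artist": "Artist C", "plays": 12},
]
outbox: list[str] = []
enqueue_notifications(build_wrapped_cards(rows), outbox)
print(outbox[0])
```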

Note: Spotify Wrapped is all about personalization, powered by advanced machine learning algorithms. The ML engine uses collaborative filtering (recommending content based on the behavior of similar users or items), content-based filtering (recommending content based on similarities in item metadata and content features), and a hybrid model that combines the strengths of both to generate a personalized Wrapped for each user.
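
To make the collaborative filtering idea concrete, the Python sketch below implements one common variant, item-based filtering with cosine similarity: two tracks are similar when the same users play them in similar proportions. Spotify's production models are not public; the track IDs and play counts are hypothetical, and a real system works with far larger sparse matrices.

```python
import math


def cosine(a: dict[str, int], b: dict[str, int]) -> float:
    """Cosine similarity between two sparse play-count vectors keyed by user ID."""
    dot = sum(a[user] * b[user] for user in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_tracks(track: str, plays_by_track: dict[str, dict[str, int]],
                   k: int = 2) -> list[tuple[str, float]]:
    """Item-based collaborative filtering: tracks whose listeners overlap most with `track`."""
    target = plays_by_track[track]
    scores = [
        (other, cosine(target, vector))
        for other, vector in plays_by_track.items()
        if other != track
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:k]


# track -> {user_id: play count}
plays_by_track = {
    "track_a": {"u1": 5, "u2": 3},
    "track_b": {"u1": 4, "u2": 2},
    "track_c": {"u3": 7},
}
print(similar_tracks("track_a", plays_by_track))
```

`track_b` scores close to 1.0 because its listeners overlap with those of `track_a`, while `track_c` scores 0.0 because no user plays both.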

For 2019 Wrapped, Spotify used Bigtable to process a decade of user listening data. A similar data processing pattern for a year of data is shown below:

Front-end animations in Spotify Wrapped

The front-end experience for Spotify Wrapped plays a crucial role in driving user engagement. The design of the Wrapped interface transforms raw data into fun, shareable content.

Spotify Wrapped’s front-end elements include:

Visual and interactive features

Personalized visualizations: Insights are displayed as animated reels or cards.
NLP-powered content: Uses natural language processing to generate captions and labels for animations.

Feature highlights

Audio Aura (2021): Colors representing listening intensity for different genres.
Sound Town (2023): Matched users’ tastes to real-world cities, creating playful, shareable visuals.

These interactive features enhance user engagement, turning data into delightful experiences.

What we can learn from Spotify

Here are 4 key takeaways from Spotify’s approach to delivering a seamless Wrapped experience each year:

A robust, scalable System Design is the backbone of Wrapped. It handles huge data volumes by separating real-time and batch processing workloads, ensuring fast data access and reliable yearly insights.
Using solutions like Bigtable and BigQuery, Spotify minimizes data shuffling and enables efficient aggregation, providing quick user-level insights for millions.
Advanced machine learning models help Spotify deliver Wrapped’s unique, personalized insights by analyzing patterns in listening data.
By employing auto-scaling and load balancing, Spotify can smoothly manage the surge in Wrapped engagement.

What's next for Spotify Wrapped?

As Spotify Wrapped continues to scale year after year, it offers a glimpse into the complex System Design that powers real-world applications at a massive scale.

Spotify's developers add new features and insights to make the user experience better by the year — so it's hard to tell what's coming next. However, we can expect AI to level up the Wrapped experience through features like:

Interactive, real-time playlists that evolve based on a user’s Wrapped experience
AI-driven "music DNA" visualizations breaking down listening habits into dynamic, shareable formats
Leveraging GenAI to create unique soundtracks for users, blending favorite genres, moods, and artists into a custom composition

Spotify Wrapped shows how much modern System Design relies on approaches like cloud-based data pipelines, advanced machine learning models, and auto-scaling for cost savings.

To truly understand these systems, you'd need to dive deep into the world of scalable System Design. If you haven't done so yet, I recommend starting with the following course:

Grokking the Modern System Design Interview

System Design interviews now shape hiring decisions across Engineering and Product Management roles. Interviewers expect you to demonstrate technical depth, justify design choices, and build for scale. This course helps you do exactly that. Tackle carefully selected design problems, apply proven solutions, and navigate complex scalability challenges—whether in interviews or real-world product design. Start by mastering a bottom-up approach: break down modern systems, with each component modeled as a scalable service. Then, apply the RESHADED framework to define requirements, surface constraints, and drive structured design decisions. Finally, design popular architectures using modular building blocks, and critique your solutions to improve under real interview conditions.
