At the end of every year, Spotify delivers its users a gift: a beautifully personalized summary of their listening habits. From your top songs to your most-streamed genres, Spotify Wrapped transforms your music data into an engaging, shareable story.
Spotify Wrapped is also a powerful product offering that users wait for each year.
The numbers speak for themselves:
100 million+ shares on social media in 2022 [1]
20% increase in Spotify downloads in 2020 [2]
602 million users engaged across 184 countries (as of 2023) [3]
But Spotify Wrapped is more than just a successful marketing campaign — it's a time capsule of your year in music, powered by data science, machine learning, and immaculate System Design.
Behind every Wrapped recap is a robust architecture that processes petabytes of data with precision and speed. This all means that engineers are hard at work ensuring Wrapped is seamless, scalable, and always ready for millions of users worldwide.
Let’s explore the inner mechanics that make Spotify Wrapped work like clockwork. We'll cover:
Challenges for scaling Spotify Wrapped
System Design of Spotify Wrapped
4 engineering lessons we can learn from Spotify
Let's dive in.
Spotify’s scalable System Design enables the Wrapped campaign to reach millions without a hitch, turning personal listening data into a global phenomenon. By seamlessly scaling to meet huge surges in demand, Spotify ensures that each Wrapped experience is fast, personalized, and ready to share.
Since its launch in 2016, Spotify Wrapped has added new layers of interactivity and personalization every year:
Year | Spotify Wrapped Features |
2016 | |
2017 | |
2018 | |
2019 | |
2020 | |
2021 | |
2022 | |
2023 | |
We estimated user counts for 2024 based on past user data from Spotify [5]:
700 million total monthly active users on Spotify, based on average growth of ~23% from 2019 to 2023
295 million users accessing Wrapped, based on average growth of 37.5% from 2019 to 2022 (2023 data is undisclosed)
While not every user opens and accesses their Wrapped, Spotify creates the personalized Wrapped experience for each user (provided they meet simple eligibility criteria such as minimum listening time).
Spotify likely logs user data from January 1 through November 15 or 30.
Here are some insights on the Wrapped data that's collected:
Top songs and artists are ranked by play count, not total listening time
Songs must be played for over 30 seconds to count in rankings
Only the first 10 songs in the top 100 playlists are strictly sorted by play count
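The first two rules above can be sketched in a few lines of Python. This is a minimal illustration, not Spotify's implementation: the `PlayEvent` fields, the `top_tracks` helper, and the sample events are all hypothetical, and ties are broken alphabetically only to keep the output deterministic.

```python
from collections import Counter
from dataclasses import dataclass

MIN_PLAY_SECONDS = 30  # a play must run longer than this to count (rule above)


@dataclass(frozen=True)
class PlayEvent:
    # Hypothetical event shape, for illustration only.
    user_id: str
    track_id: str
    seconds_played: float


def top_tracks(events, user_id, limit=5):
    """Rank a user's tracks by number of qualifying plays, not by total time."""
    counts = Counter(
        e.track_id
        for e in events
        if e.user_id == user_id and e.seconds_played > MIN_PLAY_SECONDS
    )
    # Sort by play count (descending); break ties by track id for determinism.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


events = [
    PlayEvent("u1", "song_a", 200),
    PlayEvent("u1", "song_a", 45),
    PlayEvent("u1", "song_b", 1200),  # one long play still counts only once
    PlayEvent("u1", "song_c", 12),    # under the threshold: ignored
    PlayEvent("u2", "song_c", 90),
]
print(top_tracks(events, "u1"))  # [('song_a', 2), ('song_b', 1)]
```

Note that `song_b` has far more listening time than `song_a` but ranks lower, because only the number of qualifying plays matters.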
Creating Wrapped for 700 million users requires a scalable and robust architecture that can process each user's year-long listening history. Spotify must manage millions of simultaneous streams, store and deliver petabytes of data, and recommend personalized content, all with low latency and high performance.
The engineering team faces several challenges when ensuring a seamless Spotify Wrapped user experience:
Scalability and resource management: Scalability is crucial during Wrapped, as the surge in user engagement and social sharing can overload the system. Serverless and auto-scaling solutions are critical, but they must be tuned to absorb the spike without over-provisioning and driving up costs.
Data volume and processing: Spotify handles an enormous amount of data, especially historical data, across hundreds of millions of users. Processing this data in batch jobs to compile Wrapped insights while simultaneously managing real-time data flows for recommendations requires highly efficient data pipelines and storage solutions, like data lakes and distributed storage.
Personalization complexity: Wrapped’s success hinges on hyper-personalized insights, which require complex machine learning models trained on massive data sets. Scaling these models without introducing latency is challenging, and it depends on models optimized to run efficiently at this scale.
Cost management: Efficiently managing cloud resources during Wrapped’s annual spike is key to balancing performance and costs. One way to do this is to integrate Wrapped calculations into the existing data pipelines used for real-time recommendations.
Data compliance: Handling user data requires compliance with regulations like GDPR and CCPA. Spotify can ensure data privacy while maintaining low-latency data delivery through edge computing and distributed systems.
Let's see how Spotify's scalability techniques help address these challenges.
Technique | Description | Challenges Addressed |
Multi-tiered storage | Cloud-based tiered storage efficiently stores vast amounts of historical user data and Wrapped results. | Data volume & processing, cost management |
Horizontal scaling | Adds servers instead of upgrading existing ones, enabling Spotify to handle massive concurrent user demand. | Scalability & resource management, availability |
Serverless and auto-scaling | Uses serverless architectures and auto-scaling (e.g., AWS Lambda or equivalent services on GCP) to dynamically allocate resources as demand spikes. | Scalability, cost management |
Data processing using the data lake | Processes user history and engagement data in a data lake or warehouse to manage high-volume batch processing needed for Wrapped. | Data volume & processing, personalization complexity |
Real-time processing | Uses tools like Kafka and Spark to continuously process user data, ensuring real-time insights are available. | Data volume & processing, personalization complexity |
Edge computing | Caches content closer to users on edge servers, reducing latency and handling regional load effectively during Wrapped access. | Data compliance, scalability & resource management |
Monitoring and auto-recovery | Implements real-time monitoring tools like Grafana and failover mechanisms to detect and recover from issues quickly. | Scalability & resource management, availability |
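To make the auto-scaling row concrete, here is a small sketch of a proportional scaling rule: the replica count grows or shrinks with the ratio of observed load to target load. The Kubernetes Horizontal Pod Autoscaler uses a rule of this shape, but nothing here is confirmed as Spotify's policy; the function name, the limits, and the numbers are assumptions chosen for illustration.

```python
import math


def desired_replicas(current_replicas, current_load, target_load,
                     min_replicas=1, max_replicas=100):
    """Proportional scaling: keep per-replica load near target_load.

    current_load and target_load are per-replica averages of the same
    metric (for example, requests per second). All limits are hypothetical.
    """
    if current_replicas < 1 or target_load <= 0:
        raise ValueError("need at least one replica and a positive target")
    raw = math.ceil(current_replicas * current_load / target_load)
    # Clamp so a spike cannot scale costs without bound, and so the service
    # never scales to zero.
    return max(min_replicas, min(max_replicas, raw))


# Wrapped launch: each of 10 servers sees 900 req/s against a 300 req/s target.
print(desired_replicas(10, 900, 300))   # 30
# Quiet period: load drops to 90 req/s per server.
print(desired_replicas(10, 90, 300))    # 3
```

The `max_replicas` cap is where the cost concern shows up: it trades some headroom during extreme spikes for a predictable upper bound on spend.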
Spotify’s System Design ensures seamless streaming, user interactions, and personalized features like Wrapped. The architecture is built for scalability and high availability, handling millions of simultaneous requests efficiently.
Here’s how Spotify processes user requests and ensures scalability:
API Gateway: Acts as the entry point, authenticating user requests.
Load Balancer: Distributes requests evenly across application servers to handle large volumes of traffic.
Messaging Queue: User interactions (e.g., playing songs, creating playlists) are sent to a queue (like Pub/Sub or Kafka). This queue distributes the data to various microservices for tasks like generating recommendations or creating Wrapped summaries, so the data is processed asynchronously.
This asynchronous approach enhances scalability and availability, ensuring Spotify’s system can handle traffic spikes and real-time demands.
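The queue-based fan-out described above can be sketched with Python's `asyncio`. The `Topic` class below is an in-memory stand-in for a Pub/Sub or Kafka topic, and the two consumer names are hypothetical; a real deployment would use the broker's own client library and durable storage.

```python
import asyncio


class Topic:
    """In-memory stand-in for a Pub/Sub or Kafka topic (illustration only)."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self):
        queue = asyncio.Queue()
        self._subscribers.append(queue)
        return queue

    async def publish(self, event):
        # Each subscriber gets its own copy, so services consume independently.
        for queue in self._subscribers:
            await queue.put(event)


async def worker(name, queue, results):
    """A consumer service: drains its queue until it sees the None sentinel."""
    while True:
        event = await queue.get()
        if event is None:
            break
        results[name].append(event["track_id"])


async def main():
    topic = Topic()
    results = {"recommendations": [], "wrapped": []}
    workers = [
        asyncio.create_task(worker(name, topic.subscribe(), results))
        for name in results
    ]
    # The producer returns as soon as events are queued; it never waits for
    # the recommendation or Wrapped services to finish their work.
    for track_id in ["song_a", "song_b"]:
        await topic.publish({"user_id": "u1", "track_id": track_id})
    await topic.publish(None)  # signal end of stream
    await asyncio.gather(*workers)
    return results


results = asyncio.run(main())
print(results)
```

Because each consumer has its own queue, a slow Wrapped job does not hold up recommendations, which is the availability benefit the asynchronous design is after.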
Spotify’s microservices architecture supports various tasks:
User Service: Manages user data, including preferences and subscriptions, with connections to a payment service for subscription verification.
Upload Service: Ingests new content from artists.
Transcoding Service: Converts uploaded files into streaming-compatible formats, storing them in cloud-based blob storage (and metadata into an SQL database).
Streaming Service: Delivers content to users via a content delivery network (CDN), minimizing latency.
Search Service: Enables fast lookups using Elasticsearch.
Processing Service: Powers recommendations and Wrapped summaries using advanced machine learning models.
Monitoring Service: Monitors the overall system’s health and alerts in case of errors, failures, etc.
Spotify employs multiple database types:
Blob Storage: Stores tracks, podcasts, and audiobooks.
SQL Databases: Store user metadata like account details.
NoSQL Databases: Handle activity data such as listening history, playlists, and preferences.
Let’s explore how the data processing services handle this much data at scale to create a personalized Spotify Wrapped.
Spotify uses the ETL (Extract, Transform, Load) process: Extract defines how data is collected, Transform covers how data is processed and turned into features, and Load specifies where data is stored for efficient retrieval. They also use reverse ETL to create Wrapped from the processed data.
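As a rough illustration of the four stages, the sketch below runs extract, transform, load, and reverse ETL over a tiny in-memory log. The CSV columns, the function names, and the dictionary standing in for a warehouse are all assumptions made for this example and say nothing about Spotify's actual schemas.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw play log; real sources would be databases or event streams.
RAW_LOG = """user_id,track_id,artist,seconds_played
u1,song_a,Artist X,200
u1,song_b,Artist Y,45
u1,song_a,Artist X,180
u2,song_c,Artist X,90
"""


def extract(raw_text):
    """Extract: read raw play events from a source (a CSV string here)."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: turn raw events into per-user features."""
    features = defaultdict(
        lambda: {"minutes": 0.0, "artist_plays": defaultdict(int)}
    )
    for row in rows:
        user = features[row["user_id"]]
        user["minutes"] += int(row["seconds_played"]) / 60
        user["artist_plays"][row["artist"]] += 1
    return features


def load(features, store):
    """Load: write features where they can be read back quickly by user id."""
    for user_id, feats in features.items():
        plays = feats["artist_plays"]
        store[user_id] = {
            "minutes": round(feats["minutes"]),
            "top_artist": max(plays, key=plays.get),
        }


def reverse_etl(store, user_id):
    """Reverse ETL: push stored results back out as user-facing content."""
    row = store[user_id]
    return (f"You listened for {row['minutes']} minutes. "
            f"Top artist: {row['top_artist']}.")


warehouse = {}  # stand-in for a data warehouse table
load(transform(extract(RAW_LOG)), warehouse)
print(reverse_etl(warehouse, "u1"))
```

Keeping the stages as separate functions mirrors the pipeline boundary: each one can be scaled, retried, or replaced without touching the others.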
A data collection service collects data from data sources (databases) and passes it to tools like Kafka or Pub/Sub, which stream it and make it available for immediate processing.
The data from the ingestion layer is fed to the processing layer, where the batch processor runs over the full data set, aggregating users’ year-long listening activity and generating insights. Spotify uses Google Cloud Bigtable to efficiently handle its extensive time-series data and user listening history, optimizing it for fast data aggregation over specific time frames.
In 2019, Spotify’s use of Bigtable and BigQuery for data processing let it process five times as much data while cutting overall cost by 25%. [6]
Spotify can quickly compile user-level insights by structuring data storage to minimize shuffling (reducing the need to move data between nodes, which can be time-consuming and resource-intensive) during processing.
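One way to see why storage layout reduces shuffling: Bigtable keeps rows sorted by row key, so a key that starts with the user id places all of a user's plays next to each other, and aggregating them becomes a single contiguous range read. The sketch below mimics that with a sorted list. The `SortedRowStore` class and the `user_id#timestamp` key layout are hypothetical, chosen for illustration rather than taken from Spotify's schema.

```python
import bisect


class SortedRowStore:
    """Toy stand-in for a store that keeps rows sorted by key."""

    def __init__(self):
        self._keys = []
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_prefix(self, prefix):
        """Yield rows whose key starts with prefix, as one contiguous range."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # past the end of this user's range
            yield key, self._values[key]


def row_key(user_id, timestamp):
    # Assumed layout: user first, then time, so one user's plays sit together
    # and are already ordered chronologically.
    return f"{user_id}#{timestamp}"


store = SortedRowStore()
store.put(row_key("u2", "2024-03-01T10:00"), "song_c")
store.put(row_key("u1", "2024-06-15T08:30"), "song_b")
store.put(row_key("u1", "2024-01-02T21:05"), "song_a")

u1_plays = [track for _, track in store.scan_prefix("u1#")]
print(u1_plays)  # ['song_a', 'song_b']
```

With this layout, a per-user aggregation touches only that user's rows; no other user's data has to be read or moved between nodes to compute it.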
Note: The following illustration is an in-depth exploration of how data processing services process data and transform it into a personalized Wrapped.
Apache Spark and other big data frameworks process this data at scale, and the results are stored in data warehouses like Google BigQuery.
Finally, data visualization tools and services aggregate this processed data, allowing Wrapped summaries to be sent to the user in real time through APIs. Cloud services ensure low latency, high availability, and scalability across Spotify’s global infrastructure.
The Wrapped summaries are sent to users via email or in-app notifications through a Pub/Sub service.
Note: Spotify Wrapped is all about personalization, done by utilizing advanced machine learning algorithms. The ML engine uses content-based filtering (a technique that recommends content based on similarities between items, using metadata and content features), collaborative filtering (a technique that recommends content based on the behavior of similar users or items), and a hybrid model that combines the best of both to generate a personalized Wrapped for each user.
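The three approaches can be illustrated with a toy recommender. Everything below is a sketch under stated assumptions: the play counts, the genre features, the scoring functions, and the blending weight `alpha` are invented for this example and are not Spotify's models.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Hypothetical data: play counts per user, and genre features per track.
plays = {
    "u1": {"song_a": 5, "song_b": 3},
    "u2": {"song_a": 4, "song_b": 2, "song_c": 6},
    "u3": {"song_d": 7},
}
track_features = {
    "song_a": {"rock": 1.0},
    "song_b": {"rock": 0.8, "pop": 0.2},
    "song_c": {"rock": 0.9, "pop": 0.1},
    "song_d": {"jazz": 1.0},
}


def collaborative_score(user, track):
    """Weight other users' plays of the track by their similarity to `user`."""
    return sum(
        cosine(plays[user], counts) * counts.get(track, 0)
        for other, counts in plays.items()
        if other != user
    )


def content_score(user, track):
    """Average similarity between the track and tracks the user already plays."""
    liked = plays[user]
    return sum(
        cosine(track_features[track], track_features[t]) for t in liked
    ) / len(liked)


def hybrid_recommend(user, alpha=0.5):
    """Blend both signals; alpha is an assumed weight, not a known value."""
    candidates = [t for t in track_features if t not in plays[user]]
    collab = {t: collaborative_score(user, t) for t in candidates}
    top = max(collab.values(), default=0.0) or 1.0  # normalise to [0, 1]
    scored = {
        t: alpha * collab[t] / top + (1 - alpha) * content_score(user, t)
        for t in candidates
    }
    return sorted(scored, key=scored.get, reverse=True)


print(hybrid_recommend("u1"))  # ['song_c', 'song_d']
```

For `u1`, `song_c` wins on both signals: a similar listener (`u2`) plays it often, and its genre features are close to the rock tracks `u1` already plays.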
For 2019 Wrapped, Spotify processed a decade of user data using Bigtable. A similar data processing pattern for a year of data is shown below:
The front-end experience for Spotify Wrapped plays a crucial role in driving user engagement. The design of the Wrapped interface transforms raw data into fun, shareable content.
Spotify Wrapped’s front-end elements include:
Personalized visualizations: Insights are displayed as animated reels or cards.
NLP-powered content: Uses natural language processing to generate captions and labels for animations.
Audio Aura (2021): Visualized a user’s top listening moods as a blend of colors.
Sound Town (2023): Matched users to a city whose listeners share similar tastes, creating playful, shareable visuals.
These interactive features enhance user engagement, turning data into delightful experiences.
Here are 4 key takeaways from Spotify’s approach to delivering a seamless Wrapped experience each year:
A robust, scalable System Design is the backbone of Wrapped. It handles huge data volumes by separating real-time and batch processing, which keeps data access fast and yearly insights reliable.
Using solutions like Bigtable and BigQuery, Spotify minimizes data shuffling and enables efficient aggregation, providing quick user-level insights for millions.
Advanced machine learning models help Spotify deliver Wrapped’s unique, personalized insights by analyzing patterns in listening data.
By employing auto-scaling and load balancing, Spotify can smoothly manage the surge in Wrapped engagement.
As Spotify Wrapped continues to scale year after year, it offers a glimpse into the complex System Design that powers real-world applications at a massive scale.
Spotify's developers add new features and insights to make the user experience better by the year — so it's hard to tell what's coming next. However, we can expect AI to level up the Wrapped experience through features like:
Interactive, real-time playlists that evolve based on a user’s Wrapped experience
AI-driven "music DNA" visualizations breaking down listening habits into dynamic, shareable formats
Leveraging GenAI to create unique soundtracks for users, blending favorite genres, moods, and artists into a custom composition
Spotify Wrapped highlights how important approaches like cloud-based data pipelines, advanced machine learning models, and auto-scaling for cost savings are in System Design.
To truly understand these systems, you'd need to dive deep into the world of scalable System Design. If you haven't done so yet, I recommend starting with the following course:
System Design interviews are now part of every Engineering and Product Management Interview. Interviewers want candidates to exhibit their technical knowledge of core building blocks and the rationale of their design approach. This course presents carefully selected system design problems with detailed solutions that will enable you to handle complex scalability scenarios during an interview or designing new products. You will start with learning a bottom-up approach to designing scalable systems. First, you’ll learn about the building blocks of modern systems, with each component being a completely scalable application in itself. You'll then explore the RESHADED framework for architecting web-scale applications by determining requirements, constraints, and assumptions before diving into a step-by-step design process. Finally, you'll design several popular services by using these modular building blocks in unique combinations, and learn how to evaluate your design.
Disclaimer: All technical information and design insights provided in this newsletter are curated by our System Design experts to the best of their knowledge and based on available resources, including insights from Spotify’s engineering blogs. While we strive for accuracy, some details may vary from Spotify’s actual implementations and are meant for educational interpretations.