Imagine waking up to find your Twitter/X account with millions of followers vanished overnight—your posts, identity, and followers—all wiped out with a single ban.
In a perfect world, no single entity could take your social presence away from you. But we live in a world where certain companies and governments wield their power to delete accounts of even the most well-intentioned people.
As a solution to this, Bluesky is carving a bold new path in social networking.
While centralized platforms currently dominate social networking, Bluesky is a decentralized social networking initiative that puts control in its users' hands. In many ways, it aims to hold social networking to the same values on which the internet was founded: transparency and independence.
Bluesky's vision carries a lot of promise, and it's attracting a fast-growing user base. In November 2024, Bluesky's user count skyrocketed 165% to over 20 million. But if Bluesky is to stand a chance as a viable replacement for traditional social platforms, it needs to be ready to scale accordingly.
Today, we'll explore the System Design of Bluesky and its capacity for future-proofing, security, and resilience.
We'll cover:
What is Bluesky?
How the AT protocol contributes to Bluesky’s unique features
The System Design of Bluesky
The uniqueness of recommendation algorithms and feed generation
Challenges for Bluesky
Let’s dive in.
Bluesky is very different from traditional platforms like Twitter/X.
Unlike traditional platforms, Bluesky offers:
Portability: Users maintain ownership of their identity, posts, and followers, even when switching platforms. You’re no longer tied to a single ecosystem.
Transparent algorithms: Instead of opaque recommendation engines, Bluesky offers users open, customizable algorithms to shape their feed experience.
Data ownership: Bluesky ensures your data remains yours—accessible and portable across services, not held hostage by any one platform.
Bluesky breaks the social networking status quo by providing collaborative host services (decentralization) and allowing interoperability to share data between platforms.
So, how do we define the architecture and standards for building a decentralized social network?
Bluesky’s answer for this is the Authenticated Transfer Protocol (AT protocol).
The AT protocol refers to the underlying decentralized framework that powers Bluesky. By governing how data flows, identities are managed, and communication happens across the network, the AT protocol helps achieve an open and interoperable social network.
The AT protocol offers the following unique features:
Account portability: The AT protocol makes it easy to switch social platforms without losing your profile, followers, or content, ensuring your digital identity belongs to you—not the platform.
Transparent algorithms: The AT protocol doesn’t force you to stick with a single hidden algorithm. Instead, it offers an open marketplace where you can choose how your feed looks or even create your own. Think of it like picking a playlist, but for your social feed, in a way that is personalized and entirely in your control.
Decentralized moderation: Moderation isn’t ruled by one company; communities can set rules, creating spaces that reflect their values while maintaining a fair and decentralized structure.
Open-source: The AT protocol is open-source, meaning developers can contribute freely, driving innovation and ensuring the platform remains transparent and adaptable.
Global conversation: Bluesky allows access to a global conversation including breaking news and viral posts. Unlike Mastodon, another decentralized social network where the server or instance you join determines your experience, your Bluesky experience is based on feeds and accounts you follow. You can also use your own domain as your username, maintaining flexibility and portability wherever your account is hosted.
Performance: The AT protocol is designed with performance in mind, aiming for fast timeline generation and loading at a large scale.
How do users create their accounts, and how are they secured?
In Bluesky, identity works through a system called Decentralized Identifiers (DIDs), providing users both stability and flexibility in managing their online identities, as follows:
Each user has a universal, platform-agnostic identity, such as did:plc:123abc, which acts as a stable cryptographic identifier. This ensures that even if you switch servers hosting your data (also called personal data servers (PDS); we'll discuss them later on), your DID, aka your identity, remains unchanged.
The did here is a unique and stable cryptographic identifier representing a user or entity.
The plc names the DID method, the specific scheme Bluesky uses for generating and resolving DIDs. It originally stood for "placeholder" and now stands for "Public Ledger of Credentials."
The 123abc is a unique string that distinguishes each DID.
Each DID corresponds to a DID document containing public keys and service endpoints essential for verification and interaction.
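To make the structure concrete, here is a minimal Python sketch that splits a DID into its parts and reads service endpoints out of a DID document. The `Did` class, both helper functions, and the example document (including the `pds.example.com` endpoint) are illustrative assumptions; real DID documents carry more fields than shown here.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Did:
    method: str      # e.g. "plc"
    identifier: str  # the method-specific unique string


def parse_did(value: str) -> Did:
    """Split a DID such as 'did:plc:123abc' into method and identifier."""
    # maxsplit=2 keeps any further colons inside the identifier part.
    parts = value.split(":", 2)
    if len(parts) != 3 or parts[0] != "did" or not parts[1] or not parts[2]:
        raise ValueError(f"not a valid DID: {value!r}")
    return Did(method=parts[1], identifier=parts[2])


# Heavily simplified, hypothetical DID document for illustration only.
EXAMPLE_DID_DOCUMENT: Dict[str, Any] = {
    "id": "did:plc:123abc",
    "verificationMethod": [{"id": "did:plc:123abc#key-1"}],
    "service": [{"id": "#pds", "serviceEndpoint": "https://pds.example.com"}],
}


def service_endpoints(document: Dict[str, Any]) -> List[str]:
    """Return every service endpoint URL listed in a DID document."""
    return [s["serviceEndpoint"] for s in document.get("service", [])]


did = parse_did(EXAMPLE_DID_DOCUMENT["id"])
print(did.method, did.identifier)
print(service_endpoints(EXAMPLE_DID_DOCUMENT))
```

Because the identifier never changes, moving to a new host only means updating the service endpoint in the document.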
DID is a recent W3C standard, and it is the secret sauce behind Bluesky's interoperability.
To make DIDs more user-friendly, they are paired with a handle, much as on X (formerly Twitter), like @yourname.bsky.social. The good part is that users can also use their own domain name as their handle, e.g., @yourdomain.com, linking it to their DID through a DNS configuration.
As the handle and DID are separate from the platform, you can move your accounts, including your posts, followers, or profile, to another PDS without losing your social graph.
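The DNS link between a domain handle and a DID can be sketched as follows. In the AT protocol, the domain publishes a TXT record under the `_atproto.` prefix whose value has the form `did=<DID>`. The sketch below only parses TXT values that have already been fetched; the function names are assumptions, and no real DNS lookup is performed.

```python
from typing import Iterable, Optional


def txt_record_name(handle: str) -> str:
    """DNS name at which the TXT record for a handle is looked up."""
    return "_atproto." + handle.lstrip("@").lower()


def did_from_txt_records(records: Iterable[str]) -> Optional[str]:
    """Extract the DID from TXT values shaped like 'did=did:plc:123abc'.

    Returns None when no DID is present or when the records disagree.
    """
    found = {r[len("did="):] for r in records if r.startswith("did=did:")}
    return found.pop() if len(found) == 1 else None


print(txt_record_name("@yourdomain.com"))
print(did_from_txt_records(["v=spf1 -all", "did=did:plc:123abc"]))
```

Since the handle is only a pointer to the DID, changing the handle or the host leaves followers attached to the same identity.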
Your posts and activities are cryptographically signed with keys tied to your DID, making it difficult for any single entity, including Bluesky, to remove or alter your identity. Even if a hosting server goes offline, you can restore your account data on another server from a backup.
Let's explore Bluesky's System Design with a high-level design perspective.
Users connect to a personal data server (PDS) that hosts their account, data, and identity:
The PDS stores posts, followers, and settings, acting as a gateway to the network.
PDSs communicate using the AT Protocol, enabling interactions across the broader network.
The feed generation process is achieved through:
A relay crawls and aggregates updates from all known PDSs to produce a public stream of repository updates.
The updates are sent to feed generators using Firehose.
Firehose feeds data to labelers and feed generators, which classify content and apply user-selected algorithms to create personalized feeds.
Users can choose or create algorithms that define how posts are sorted, such as trending or custom feeds.
These generated feeds are processed through App View, which converts post IDs to full posts before displaying them to users, as illustrated below:
Let’s discuss the role of each component and service shown in the diagram above.
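Components | Roles |
App View | Processes generated feeds, converting post IDs to full posts before they are displayed to users |
Personal Data Servers | Host each user's account, data, and identity; store posts, followers, and settings; act as the user's gateway to the network |
User Data Repositories | Hold each user's records inside a PDS |
Relays | Crawl and aggregate updates from all known PDSs into a public stream of repository updates |
Firehose | Delivers the stream of updates to labelers and feed generators |
Labelers | Classify content |
Feed Generators | Apply user-selected algorithms to create personalized feeds |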
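The flow from PDS updates to a displayed feed can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Bluesky's implementation: the `Relay` class, the callback-based subscription, the short `post-N` IDs, and the keyword rules used by the labeler and feed generator are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class RepoUpdate:
    seq: int       # position in the relay's output stream
    did: str       # author's identity
    post_id: str   # hypothetical short ID standing in for a record reference
    text: str


class Relay:
    """Merges updates from many PDSs into one ordered stream (the Firehose)."""

    def __init__(self) -> None:
        self._seq = 0
        self._subscribers: List[Callable[[RepoUpdate], None]] = []

    def subscribe(self, consumer: Callable[[RepoUpdate], None]) -> None:
        self._subscribers.append(consumer)

    def ingest(self, did: str, post_id: str, text: str) -> None:
        # Assign a sequence number, then fan the update out to every consumer.
        self._seq += 1
        update = RepoUpdate(self._seq, did, post_id, text)
        for consumer in self._subscribers:
            consumer(update)


labels: Dict[str, List[str]] = {}   # labeler output: post_id -> labels
tech_feed: List[str] = []           # feed generator output: post IDs only
posts: Dict[str, RepoUpdate] = {}   # App View's store of full posts


def labeler(update: RepoUpdate) -> None:
    if "buy now" in update.text.lower():
        labels.setdefault(update.post_id, []).append("spam")


def feed_generator(update: RepoUpdate) -> None:
    if "#tech" in update.text.lower():
        tech_feed.append(update.post_id)


def app_view_index(update: RepoUpdate) -> None:
    posts[update.post_id] = update


def hydrate(post_ids: List[str]) -> List[str]:
    """App View step: turn post IDs into full posts, skipping labeled spam."""
    return [posts[p].text for p in post_ids if "spam" not in labels.get(p, [])]


relay = Relay()
for consumer in (labeler, feed_generator, app_view_index):
    relay.subscribe(consumer)

relay.ingest("did:plc:alice", "post-1", "Learning #tech today")
relay.ingest("did:plc:bob", "post-2", "Buy now! #tech deals")
relay.ingest("did:plc:carol", "post-3", "More #Tech news")
print(hydrate(tech_feed))
```

Note that the feed generator only ever emits post IDs; turning them into full posts is left to the App View step, which matches the division of labor described above.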
Other social platforms display feeds to users in reverse chronological order; this is not the case with Bluesky.
Bluesky offers an open marketplace of feed generators where users can choose or even create algorithms to curate content.
Bluesky supports thousands of custom feeds, from manually curated lists to machine-learning-powered feeds, enabling unique and dynamic ways to explore content.
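One way to picture a marketplace of feeds is as a registry of interchangeable ranking functions over the same pool of posts. The sketch below is illustrative only: the `Post` fields, the three algorithms, and the `FEEDS` registry are assumptions, not Bluesky's actual feed generator interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Post:
    post_id: str
    text: str
    likes: int
    created_at: int  # Unix timestamp in seconds


# A feed algorithm takes candidate posts and returns an ordered list of IDs.
FeedAlgorithm = Callable[[List[Post]], List[str]]


def newest_first(posts: List[Post]) -> List[str]:
    return [p.post_id for p in sorted(posts, key=lambda p: p.created_at, reverse=True)]


def most_liked(posts: List[Post]) -> List[str]:
    return [p.post_id for p in sorted(posts, key=lambda p: p.likes, reverse=True)]


def cats_only(posts: List[Post]) -> List[str]:
    return [p.post_id for p in posts if "cat" in p.text.lower()]


# Hypothetical registry: users pick a feed by name, or add their own function.
FEEDS: Dict[str, FeedAlgorithm] = {
    "newest": newest_first,
    "popular": most_liked,
    "cats": cats_only,
}


def build_feed(name: str, posts: List[Post]) -> List[str]:
    return FEEDS[name](posts)


candidates = [
    Post("a", "Cat pictures", likes=5, created_at=100),
    Post("b", "System design notes", likes=50, created_at=200),
    Post("c", "My cat learned Rust", likes=20, created_at=300),
]
for feed_name in FEEDS:
    print(feed_name, build_feed(feed_name, candidates))
```

Adding a new feed is a matter of registering one more function, which is the property that lets many custom feeds coexist.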
So, how does Bluesky show feeds, specifically when users can customize them?
The answer is indexing, similar to how web pages are indexed and displayed by search engines. Here’s how it works:
A user posts on Bluesky, and the post is stored in their PDS as a record, including metadata such as the user's handle, timestamp, and content. These records are signed cryptographically to show authorship. The example below is a schema in Lexicon, the AT protocol's schema language, here describing a getProfile query:
{
  "lexicon": 1,
  "id": "com.example.getProfile",
  "type": "query",
  "parameters": {
    "user": {"type": "string", "required": true}
  },
  "output": {
    "encoding": "application/json",
    "schema": {
      "type": "object",
      "required": ["did", "name"],
      "properties": {
        "did": {"type": "string"},
        "name": {"type": "string"},
        "displayName": {"type": "string", "maxLength": 64},
        "description": {"type": "string", "maxLength": 256}
      }
    }
  }
}
The indexing system crawls the repositories of various PDSs to create an index of all available posts. It collects key information such as the user handle (@username), post content, hashtags or keywords, and the date and time of posting.
The indexer organizes these data points for easy querying, for example, by user handle (a list of posts made by @username), by hashtag (a list of posts containing a specific hashtag, like #tech), or by date (a chronological list of posts made within a specific time frame).
When a user searches for a specific topic, the indexing system provides relevant posts from the database based on indexed data without having to query PDSs. It should be noted, however, that the actual posts are fetched from PDS but indexed for quick retrieval by the indexing system.
The indexing system is a combination of multiple components, including relays, Firehose, and the App View.
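The indexing steps above can be sketched as a small in-memory index that is filled once as records arrive and then answers queries without contacting any PDS. The `Record` fields, the `PostIndex` class, and the short `post-N` IDs are assumptions made for illustration.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List


@dataclass(frozen=True)
class Record:
    post_id: str     # hypothetical short ID for the record
    handle: str
    text: str
    created_at: str  # ISO 8601 timestamp, e.g. "2024-11-20T10:00:00Z"


class PostIndex:
    """Maps handles, hashtags, and dates to the IDs of matching posts."""

    def __init__(self) -> None:
        self.by_handle: DefaultDict[str, List[str]] = defaultdict(list)
        self.by_hashtag: DefaultDict[str, List[str]] = defaultdict(list)
        self.by_date: DefaultDict[str, List[str]] = defaultdict(list)

    def add(self, record: Record) -> None:
        self.by_handle[record.handle].append(record.post_id)
        # The first 10 characters of an ISO 8601 timestamp are the date.
        self.by_date[record.created_at[:10]].append(record.post_id)
        # A set avoids indexing the same post twice for a repeated hashtag.
        for tag in set(re.findall(r"#(\w+)", record.text.lower())):
            self.by_hashtag[tag].append(record.post_id)


index = PostIndex()
index.add(Record("post-1", "@alice", "Scaling notes #tech", "2024-11-20T10:00:00Z"))
index.add(Record("post-2", "@bob", "Weekend hike", "2024-11-20T18:30:00Z"))
index.add(Record("post-3", "@alice", "More #Tech and #design", "2024-11-21T09:15:00Z"))

print(index.by_handle["@alice"])
print(index.by_hashtag["tech"])
print(index.by_date["2024-11-20"])
```

The index stores only IDs, so a query returns references that still have to be resolved into full posts before display.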
As Bluesky scales to accommodate its growing user base, a strategic shift in database architecture has been pivotal due to the following issues with its earlier PostgreSQL database:
When multiple processes accessed the same data in the PostgreSQL database, connection pool backups and lock contention caused delays.
Connection pool backup occurs when the database connection pool gets overloaded, leading to delays or failure in serving requests.
Lock contention happens when multiple processes try to access the same resource and must wait for its lock to be released.
PostgreSQL’s query planner sometimes selected suboptimal execution plans, leading to dramatic slowdowns in query performance—up to 1,000 times slower than expected. This unpredictability caused performance instability and, in some cases, system outages.
PostgreSQL lacked support for horizontal scaling, which is essential for handling growing user bases and massive data streams.
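Connection pool backup is easy to reproduce in miniature. The sketch below models a pool as a bounded queue; the `ConnectionPool` class and the `conn-N` names are hypothetical and stand in for real database connections.

```python
import queue


class ConnectionPool:
    """A fixed-size pool: callers wait when every connection is in use."""

    def __init__(self, size: int) -> None:
        self._free: "queue.Queue[str]" = queue.Queue(maxsize=size)
        for i in range(size):
            self._free.put(f"conn-{i}")

    def acquire(self, timeout: float) -> str:
        try:
            # Blocks until a connection is released or the timeout expires.
            return self._free.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError("connection pool exhausted") from None

    def release(self, conn: str) -> None:
        self._free.put(conn)


pool = ConnectionPool(size=2)
first = pool.acquire(timeout=0.01)
second = pool.acquire(timeout=0.01)

try:
    pool.acquire(timeout=0.01)  # a third caller finds the pool empty
    third_request_failed = False
except TimeoutError:
    third_request_failed = True

pool.release(first)
reused = pool.acquire(timeout=0.01)  # succeeds once a connection is returned
print(third_request_failed, reused)
```

When slow queries hold connections for longer, more callers hit the timeout branch, which is how a slowdown in one place turns into failed requests elsewhere.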
ScyllaDB came as an alternative to PostgreSQL with its unique advantages:
It supports horizontal scaling because it is a wide-column NoSQL database that efficiently spreads data across multiple servers.
It provides fine-grained control over how data is indexed and queried, which allows for performance optimization for specific use cases.
This switch comes with a couple of tradeoffs that Bluesky engineers had to plan around to ensure scalability:
Storing data in a less compact format increases storage requirements.
Indexing on write makes data storage more resource-heavy compared to relational databases.
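Both the scaling benefit and the storage tradeoff can be illustrated with a toy model. The sketch below assigns partition keys to nodes by hashing and writes each post into one table per query pattern. It is a simplification under stated assumptions: the node names, the modulo placement, and the `DenormalizedStore` class are hypothetical and do not reproduce ScyllaDB's actual partitioning scheme.

```python
import hashlib
from collections import defaultdict
from typing import DefaultDict, List

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster


def node_for(partition_key: str) -> str:
    """Pick a node from a hash of the key, so data spreads across servers."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


class DenormalizedStore:
    """Keeps one 'table' per query pattern and fills them all at write time."""

    def __init__(self) -> None:
        self.posts_by_author: DefaultDict[str, List[str]] = defaultdict(list)
        self.posts_by_hashtag: DefaultDict[str, List[str]] = defaultdict(list)

    def write_post(self, author: str, post_id: str, hashtags: List[str]) -> int:
        """Store a post and return how many rows the single write produced."""
        self.posts_by_author[author].append(post_id)
        for tag in hashtags:
            self.posts_by_hashtag[tag].append(post_id)
        return 1 + len(hashtags)


store = DenormalizedStore()
rows_written = store.write_post("did:plc:123abc", "post-1", ["tech", "design"])
print(rows_written, node_for("did:plc:123abc"))
```

One logical post became three stored rows, which is the extra storage and write cost the tradeoffs above describe; in exchange, each read touches a single precomputed table.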
ScyllaDB is used for App View, a read-heavy service that reads data for feeds. The data repositories in PDSs are entirely different and use SQLite, a database written in C that requires zero configuration and stores data in a single file.
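The appeal of SQLite for repositories is that a whole store is one file with no server to run. Here is a minimal sketch using Python's built-in `sqlite3` module; the `records` table and its columns are an assumed schema for illustration, not the layout a real PDS uses, and an in-memory database stands in for the file.

```python
import sqlite3
from typing import List, Tuple


def open_repo(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a repository database; a file path gives one file on disk."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "rkey TEXT PRIMARY KEY, collection TEXT NOT NULL, body TEXT NOT NULL)"
    )
    return conn


def put_record(conn: sqlite3.Connection, rkey: str, collection: str, body: str) -> None:
    # 'with conn' wraps the write in a transaction and commits on success.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO records VALUES (?, ?, ?)",
            (rkey, collection, body),
        )


def list_records(conn: sqlite3.Connection, collection: str) -> List[Tuple[str, str]]:
    cursor = conn.execute(
        "SELECT rkey, body FROM records WHERE collection = ? ORDER BY rkey",
        (collection,),
    )
    return cursor.fetchall()


repo = open_repo()
put_record(repo, "2", "app.bsky.feed.post", "Second post")
put_record(repo, "1", "app.bsky.feed.post", "Hello, Bluesky")
print(list_records(repo, "app.bsky.feed.post"))
```

Backing up or moving such a repository amounts to copying one file, which fits the portability goals described earlier.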
As Bluesky scaled, it shifted from AWS to on-premises infrastructure to gain greater control over system performance, reduce operational costs, and achieve better customizability for its growing network.
Managing complexity: Users and service providers find it challenging to understand and adapt to complex decentralized technology that requires specialized knowledge and skills.
Balancing decentralization and performance: Decentralized systems are complex and often face performance trade-offs. Bluesky’s shift to ScyllaDB and SQLite is aimed at balancing user autonomy with system efficiency, but these efforts require continuous refinement.
Incentivizing users to switch: Encouraging users to move from mainstream social platforms to a decentralized alternative is challenging. Bluesky emphasizes ease of use and highlights benefits like portability and transparent recommendation algorithms to lower this barrier.
Moderating content without central authority: Content moderation in a decentralized system is tricky. There is no easy way to remove harmful content without risking censorship. Bluesky’s open indexing system enables community-driven moderation and third-party indexing, offering alternative approaches to harmful content without undermining openness.
Ensuring security and privacy: Protecting user data while maintaining accessibility for decentralized indexing and curation is complex. Bluesky uses cryptographic methods like DIDs to ensure data integrity, authentication, and secure data transfer.
Introducing competitive features: To succeed and compete with established platforms, Bluesky needs to develop advanced content moderation tools, monetization options, and integrations while adhering to its open and federated principles.
Bluesky has a clear vision: to give users more control over their online experience. The AT protocol powers this by creating a unique environment where users are confident that their identity will never be lost. This level of control and security gives Bluesky the potential to become the go-to choice for social media in the future, provided it can maintain a safe, secure, and optimized environment.
As a developer, you can get involved with the open-source Bluesky project and build something for the future of social networking, whether it’s a new feed algorithm or a privacy-enhancing tool for people around the globe.
Here's how you can get started:
Request an invite code for the "atproto" repository: You can find a link to a form to request access in Bluesky's Call for Developers.
Learn the AT Protocol: Understand the core concepts of the AT Protocol so you can build with it.
Build a project: Create custom clients, feeds, or other applications that interact with the Bluesky network.
If you're not quite ready for that, you can explore the System Design behind Twitter, Instagram, and newsfeed systems with the course: Grokking the Modern System Design Interview.
Happy learning!