
Bluesky Social Network: System Design and Key Features

12 min read
Dec 03, 2024

Imagine waking up to find your Twitter/X account with millions of followers vanished overnight—your posts, identity, and followers—all wiped out with a single ban.

In a perfect world, no single entity could take your social presence away from you. But we live in a world where certain companies and governments wield their power to delete accounts of even the most well-intentioned people.

As a solution to this, Bluesky is carving a bold new path in social networking.

While centralized platforms currently dominate social networking, Bluesky is a decentralized social networking initiative that puts control in its users' hands. In many ways, it aims to hold social networking to the values the internet was founded on: transparency and independence.

Bluesky's vision carries a lot of promise, and it's attracting a fast-growing user base. In November 2024, Bluesky's users skyrocketed 165% to over 20 million users. But if Bluesky stands a chance as a viable replacement for traditional social platforms, it needs to be ready to scale accordingly.

Today, we'll explore the System Design of Bluesky and its capacity for future-proofing, security, and resilience.

We'll cover:

  • What is Bluesky?

  • How the AT protocol contributes to Bluesky’s unique features

  • The System Design of Bluesky

  • The uniqueness of recommendation algorithms and feed generation

  • Challenges for Bluesky

Let’s dive in.

What makes Bluesky unique?#

Bluesky is very different from traditional platforms like Twitter/X.


Unlike traditional platforms, Bluesky offers:

  • Portability: Users maintain ownership of their identity, posts, and followers, even when switching platforms. You’re no longer tied to a single ecosystem.

  • Transparent algorithms: Instead of opaque recommendation engines, Bluesky offers users open, customizable algorithms to shape their feed experience.

  • Data ownership: Bluesky ensures your data remains yours—accessible and portable across services, not held hostage by any one platform.

Bluesky breaks the social networking status quo by letting independent providers host the network collaboratively (decentralization) and by allowing platforms to interoperate and share data.

So, how do we define the architecture and standards for building a decentralized social network?

Bluesky’s answer for this is the Authenticated Transfer Protocol (AT protocol).

What is the AT protocol?#

The AT protocol refers to the underlying decentralized framework that powers Bluesky. By governing how data flows, identities are managed, and communication happens across the network, the AT protocol helps achieve an open and interoperable social network.

Features offered by the AT protocol#

The AT protocol offers the following unique features:

  • Account portability: The AT protocol makes it easy to switch social platforms without losing your profile, followers, or content, ensuring your digital identity belongs to you—not the platform.

  • Transparent algorithms: The AT protocol doesn’t force you to stick with a single hidden algorithm. Instead, it offers an open marketplace where you can choose how your feed looks or even create your own. Think of it like picking a playlist, but for your social feed, in a way that is personalized and entirely in your control.

  • Decentralized moderation: Moderation isn’t ruled by one company; communities can set rules, creating spaces that reflect their values while maintaining a fair and decentralized structure. 

  • Open-source: The AT protocol is open-source, meaning developers can contribute freely, driving innovation and ensuring the platform remains transparent and adaptable.

  • Global conversation: Bluesky allows access to a global conversation including breaking news and viral posts. Unlike Mastodon, another decentralized social network where the server or instance you join determines your experience, your Bluesky experience is based on feeds and accounts you follow. You can also use your own domain as your username, maintaining flexibility and portability wherever your account is hosted.

  • Performance: The AT protocol prioritizes performance, aiming for fast timeline generation and loading at large scale.

How does identity work in the AT protocol?#

How do users create their accounts, and how are they secured?

In Bluesky, identity works through a system called Decentralized Identifiers (DIDs), providing users both stability and flexibility in managing their online identities, as follows:

  • Each user has a universal, platform-agnostic identity, such as did:plc:123abc, which acts as a stable cryptographic identifier. This ensures that even if you switch servers hosting your data (also called personal data servers (PDS)—we’ll discuss them later on), your DID, aka your identity, remains unchanged.

    • The did here is a unique and stable cryptographic identifier representing a user or entity.

    • The plc names the specific method Bluesky uses for generating and resolving DIDs. It originally stood for “placeholder” and now stands for “Public Ledger of Credentials.”

    • The 123abc is a unique string that distinguishes each DID.

Each DID corresponds to a DID document containing public keys and service endpoints essential for verification and interaction.

DIDs are a recent W3C standard and the secret sauce behind Bluesky’s interoperability.

  • To make DIDs more user-friendly, each one is paired with a handle, much like on X (formerly Twitter), such as @yourname.bsky.social. Users can also use their own domain name as their handle, e.g., @yourdomain.com, linking it to their DID through a DNS configuration.

Educative registering at Bluesky with its own handle name
  • As the handle and DID are separate from the platform, you can move your accounts, including your posts, followers, or profile, to another PDS without losing your social graph.

  • Your posts and activities are cryptographically signed with keys tied to your DID, making it difficult for any single entity, including Bluesky, to remove or alter your identity. Even if a hosting server goes offline, you can restore your account data on another server from a backup.
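
The identity pieces above can be sketched in a few lines of Python. This is a minimal illustration, not Bluesky's implementation: the function names are hypothetical, and it assumes the convention described in the AT protocol handle documentation, where a domain handle is linked to a DID through a TXT record at `_atproto.<handle>` whose value looks like `did=did:plc:...`. No network lookup is performed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Did:
    method: str       # e.g. "plc"
    identifier: str   # method-specific string, e.g. "123abc"


def parse_did(did: str) -> Did:
    """Split a DID such as 'did:plc:123abc' into its method and identifier."""
    scheme, method, identifier = did.split(":", 2)
    if scheme != "did" or not method or not identifier:
        raise ValueError(f"not a valid DID: {did!r}")
    return Did(method=method, identifier=identifier)


def dns_record_name(handle: str) -> str:
    """DNS name whose TXT record links a domain handle to a DID (assumed convention)."""
    return "_atproto." + handle.lstrip("@")


def did_from_txt(txt_value: str) -> Did:
    """Extract the DID from a TXT value of the form 'did=did:plc:...'."""
    prefix = "did="
    if not txt_value.startswith(prefix):
        raise ValueError("TXT record does not contain a DID")
    return parse_did(txt_value[len(prefix):])


print(parse_did("did:plc:123abc"))
print(dns_record_name("@yourdomain.com"))
print(did_from_txt("did=did:plc:123abc"))
```

Because the handle only points at the DID, changing the handle or moving to another PDS leaves the DID, and everything attached to it, untouched.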

System Design of Bluesky#

Let's explore Bluesky's System Design with a high-level design perspective.

Users connect to a personal data server (PDS) that hosts their account, data, and identity:

  • The PDS stores posts, followers, and settings, acting as a gateway to the network.

  • PDSs communicate using the AT Protocol, enabling interactions across the broader network.

The feed generation process is achieved through:

  • A relay crawls and aggregates updates from all known PDSs to produce a public stream of repository updates, known as the Firehose.

  • The Firehose delivers these updates to labelers and feed generators, which classify content and apply user-selected algorithms to create personalized feeds.

  • Users can choose or create algorithms that define how posts are sorted, such as trending or custom feeds.

These generated feeds are processed through App View, which converts post IDs to full posts before displaying them to users, as illustrated below:

An overview of the architecture and workflow of Bluesky Social

Let’s discuss the role of each component and service shown in the diagram above.

System components and their roles

App View

  • Handles API requests and user interactions on the frontend.
  • Processes data from the network to compile posts from generated feeds, notifications, and user profiles.
  • Acts as a bridge between users and decentralized components of the system.


Personal Data Servers

  • Hosts user profiles, posts, associated media files, and follow graphs.
  • Allows users to switch service providers without losing data or connections.
  • Handles authentication and secures user identities in the decentralized network.


User Data Repositories

  • Acts as the central source of truth for a user’s digital identity and content.
  • Stores all user data in a compact binary format.
  • Supports retrieval of data for building feeds or displaying content.
  • Enables data portability when switching PDSs.


Relays

  • Crawls and aggregates updates from all known PDSs to provide a low-latency data stream.
  • Consumes update streams from PDS via WebSocket connections and replicates user repositories.
  • Validates updates using cryptographic signatures and Merkle tree proofs.

Firehose

  • Streams bulk data for analytics and development.
  • Streams updates that notify subscribers whenever records are added or deleted in known repositories.

Labelers

  • Applies metadata to posts for content classification, e.g., flagging spam posts.
  • Supports custom filtering, moderation tools, and discovery mechanisms.

Feed Generators

  • Consumes streams from Firehose and generates personalized feeds.
  • Allows users to choose algorithms and customize their feeds.
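
To tie these components together, here is a toy, in-memory sketch of the flow from PDS to App View. Everything in it is an assumption made for illustration: the `Post` class, the simplified `at://` IDs, the spam rule, and the `tech_feed` algorithm are hypothetical and far simpler than the real services.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    uri: str     # simplified, hypothetical post ID
    author: str
    text: str
    labels: set = field(default_factory=set)


def relay(pds_list):
    """Aggregate records from every known PDS into one stream (the Firehose)."""
    for pds in pds_list:
        yield from pds


def labeler(post):
    """Toy labeler: tag posts that look like spam."""
    if "buy now" in post.text.lower():
        post.labels.add("spam")
    return post


def tech_feed(posts):
    """Toy feed generator: return only the IDs of unlabeled posts about tech."""
    return [p.uri for p in posts if "#tech" in p.text and "spam" not in p.labels]


def app_view(post_ids, store):
    """Convert post IDs into full posts before showing them to the user."""
    return [store[uri] for uri in post_ids]


pds_a = [Post("at://alice/1", "alice", "Scaling notes #tech")]
pds_b = [Post("at://bob/1", "bob", "BUY NOW cheap followers #tech"),
         Post("at://bob/2", "bob", "Lunch photo")]

firehose = [labeler(p) for p in relay([pds_a, pds_b])]
store = {p.uri: p for p in firehose}
feed = app_view(tech_feed(firehose), store)
print([p.text for p in feed])  # ['Scaling notes #tech']
```

The feed generator returns only post IDs, which mirrors the split described above: generators decide what to show, and the App View turns those IDs into full posts.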

Uniqueness in feed generation#

Many social platforms rank feeds with a single proprietary algorithm that users cannot inspect or replace; this is not the case with Bluesky.

Bluesky offers an open marketplace of feed generators where users can choose or even create algorithms to curate content.

Bluesky supports thousands of custom feeds, from manually curated lists to machine-learning-powered feeds, enabling unique and dynamic ways to explore content.

So, how does Bluesky show feeds, specifically when users can customize them?

The answer is indexing, similar to how web pages are indexed and displayed by search engines. Here’s how it works:

  • A user posts on Bluesky, and the post is stored in their PDS as a record that includes metadata such as the user’s handle, timestamp, and content. These records are cryptographically signed to prove authorship, and their structure is described by a Lexicon schema. Lexicon is a schema definition language with a JSON-based format, used to describe atproto records, HTTP endpoints, and event stream messages. An example Lexicon definition for a query endpoint:

{
  "lexicon": 1,
  "id": "com.example.getProfile",
  "type": "query",
  "parameters": {
    "user": {"type": "string", "required": true}
  },
  "output": {
    "encoding": "application/json",
    "schema": {
      "type": "object",
      "required": ["did", "name"],
      "properties": {
        "did": {"type": "string"},
        "name": {"type": "string"},
        "displayName": {"type": "string", "maxLength": 64},
        "description": {"type": "string", "maxLength": 256}
      }
    }
  }
}

  • The indexing system crawls the repositories of various PDSs to create an index of all available posts. It collects key information such as user handle (@username), post content, hashtag or keywords, date or time of posting, etc.

  • The indexer organizes these data points for easy querying, for example, by user handle (a list of posts made by @username), by hashtag (a list of posts containing a specific hashtag, like #tech), by date (a chronological list of posts made within a specific time frame), etc.

  • When a user searches for a specific topic, the indexing system returns relevant posts from its own database without having to query PDSs. The posts themselves originate in each user's PDS; the index holds copies organized for quick retrieval.

The indexing system is a combination of multiple components, including relays, Firehose, and the App View.
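
The indexing idea can be illustrated with a small in-memory index. This is a sketch under stated assumptions, not Bluesky's indexer: the `PostIndex` class, its field names, and the sample posts are hypothetical, and a production system would persist these structures in a database.

```python
from collections import defaultdict
from datetime import datetime
import re


class PostIndex:
    def __init__(self):
        self.posts = {}                      # post_id -> record
        self.by_handle = defaultdict(list)   # handle -> [post_id]
        self.by_hashtag = defaultdict(list)  # hashtag -> [post_id]

    def add(self, post_id, handle, text, created_at):
        """Index one crawled record by handle and by each hashtag it contains."""
        self.posts[post_id] = {"handle": handle, "text": text, "created_at": created_at}
        self.by_handle[handle].append(post_id)
        for tag in set(re.findall(r"#(\w+)", text.lower())):
            self.by_hashtag[tag].append(post_id)

    def between(self, start, end):
        """Post IDs created in [start, end), oldest first."""
        hits = [(r["created_at"], pid) for pid, r in self.posts.items()
                if start <= r["created_at"] < end]
        return [pid for _, pid in sorted(hits)]


index = PostIndex()
index.add("p1", "@alice", "Sharding explained #tech", datetime(2024, 11, 1))
index.add("p2", "@bob", "Morning run", datetime(2024, 11, 2))
index.add("p3", "@alice", "More #Tech notes", datetime(2024, 12, 1))

print(index.by_hashtag["tech"])    # ['p1', 'p3']
print(index.by_handle["@alice"])   # ['p1', 'p3']
print(index.between(datetime(2024, 11, 1), datetime(2024, 12, 1)))  # ['p1', 'p2']
```

Each query is answered from the index alone, which is why a search does not need to contact any PDS.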

Database scalability#

As Bluesky scales to accommodate its growing user base, a strategic shift in database architecture has been pivotal due to the following issues with its earlier PostgreSQL database:

  • When multiple processes accessed the same data in the PostgreSQL database, connection pool backups and lock contention caused delays.

    • Connection pool backup occurs when the database connection pool gets overloaded, leading to delays or failure in serving requests.

    • Lock contention happens when multiple processes try to access the same resource and must wait for the lock to be released.

  • PostgreSQL’s query planner sometimes selected suboptimal execution plans, leading to dramatic slowdowns in query performance—up to 1,000 times slower than expected. This unpredictability caused performance instability and, in some cases, system outages.

  • PostgreSQL lacked built-in support for horizontal scaling, which is essential for handling growing user bases and massive data streams.

The reasons to switch from PostgreSQL to ScyllaDB

ScyllaDB came as an alternative to PostgreSQL with its unique advantages:

  • As a wide-column NoSQL database, it supports horizontal scaling by efficiently spreading data across multiple servers.

  • It provides fine-grained control over how data is indexed and queried, which allows for performance optimization for specific use cases.
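
To make "spreading data across multiple servers" concrete, here is a simplified token-ring sketch of hash partitioning. It illustrates the general technique only: the `Ring` class, the node names, and the use of MD5 are assumptions for the example, and ScyllaDB's actual partitioner and data placement differ in detail.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a partition key to a position on the ring (MD5 is for illustration only)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes, vnodes=64):
        # Each node owns several points on the ring to even out the load.
        self._points = sorted(
            (token(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._tokens = [t for t, _ in self._points]

    def node_for(self, partition_key: str) -> str:
        """Return the node owning the first ring point at or after the key's token."""
        i = bisect.bisect_left(self._tokens, token(partition_key)) % len(self._points)
        return self._points[i][1]


ring = Ring(["node-a", "node-b", "node-c"])
keys = [f"did:plc:user{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

# Adding a server moves only the keys that the new node takes over.
bigger = Ring(["node-a", "node-b", "node-c", "node-d"])
moved = [k for k in keys if before[k] != bigger.node_for(k)]
print(f"{len(moved)} of {len(keys)} keys moved after adding a node")
```

Because a key's owner depends only on its hash, any server can route a request without a central lookup, and capacity grows by adding nodes rather than by buying a bigger machine.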

This switch comes with a couple of tradeoffs that Bluesky engineers had to plan around to ensure scalability:

  • Storing data in a less compact format increases storage requirements.

  • Indexing on write makes data storage more resource-heavy compared to relational databases.
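
Both tradeoffs come from the same habit: writing each record once per query pattern so that reads become single-key lookups. The sketch below shows the effect with plain dictionaries; the `DenormalizedStore` class and its table names are hypothetical and do not reflect Bluesky's actual schema.

```python
from collections import defaultdict


class DenormalizedStore:
    def __init__(self):
        self.by_id = {}
        self.by_author = defaultdict(list)
        self.by_hashtag = defaultdict(list)
        self.rows_written = 0

    def write_post(self, post_id, author, text, hashtags):
        """Index on write: store the post once for every way it will be read."""
        post = {"id": post_id, "author": author, "text": text}
        self.by_id[post_id] = post
        self.by_author[author].append(post)
        self.rows_written += 2
        for tag in hashtags:
            self.by_hashtag[tag].append(post)
            self.rows_written += 1


store = DenormalizedStore()
store.write_post("p1", "@alice", "Sharding explained #tech #db", ["tech", "db"])

print(store.rows_written)                      # 4 rows for one post
print([p["id"] for p in store.by_hashtag["tech"]])  # ['p1']
```

One post with two hashtags costs four writes here instead of one, which is the extra storage and write work the tradeoffs describe; in exchange, each read is a direct lookup with no joins.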

ScyllaDB is used for App View, a read-heavy service that reads data for feeds. The data repositories in PDSs are entirely different and use SQLite, a database written in C that requires zero configuration and stores data in a single file.

As Bluesky scaled, it shifted from AWS to on-premises infrastructure to gain greater control over system performance, reduce operational costs, and achieve better customizability for its growing network.

Challenges for Bluesky#

Decentralized technology is complex, and users and service providers need specialized knowledge and skills to understand and adopt it. Beyond that learning curve, Bluesky faces several challenges:

  • Balancing decentralization and performance: Decentralized systems are complex and often face performance trade-offs. Bluesky’s shift to ScyllaDB and SQLite is aimed at balancing user autonomy with system efficiency, but these efforts require continuous refinement.

  • Incentivizing users to switch: Encouraging users to move from mainstream social platforms to a decentralized alternative is challenging. Bluesky emphasizes ease of use and highlights benefits like portability and transparent recommendation algorithms to lower this barrier.

  • Moderating content without central authority: Content moderation in a decentralized system is tricky. There is no easy way to remove harmful content without risking censorship. Bluesky’s open indexing system enables community-driven moderation and third-party indexing, offering alternative approaches to harmful content without undermining openness.

  • Ensuring security and privacy: Protecting user data while maintaining accessibility for decentralized indexing and curation is complex. Bluesky uses cryptographic methods like DIDs to ensure data integrity, authentication, and secure data transfer.

  • Introducing competitive features: To succeed and compete with established platforms, Bluesky needs to develop advanced content moderation tools, monetization options, and integrations while adhering to its open and federated principles.

Building for the future#


Bluesky has a clear vision: to give users more control over their online experience. The AT protocol powers this by creating a unique environment where users are confident that their identity will never be lost. This level of control and security gives Bluesky the potential to become the go-to choice for social media in the future, provided it can maintain a safe, secure, and optimized environment. 

As a developer, you can get involved with the open-source Bluesky project and build something for the future of social networking, whether it’s a new feed algorithm or a privacy-enhancing tool for people around the globe.

Here's how you can get started:

  • Request an invite code for the "atproto" repository: You can find a link to a form to request access in Bluesky's Call for Developers.

  • Learn the AT Protocol: Understand the core concepts of the AT Protocol so you can build with it.

  • Build a project: Create custom clients, feeds, or other applications that interact with the Bluesky network.

If you're not quite ready for that, you can explore the System Design behind Twitter, Instagram, and newsfeed systems with the course: Grokking the Modern System Design Interview.

Happy learning!


Written By:
Fahim ul Haq