Basics of Schema Registry

Learn the basics of using Schema Registry with Kafka.

We used Schema Registry previously; let's now understand it in detail.

What is Schema Registry?

Schema Registry is a central repository for storing and managing schemas for data in Kafka topics. Schemas define the structure of data and help ensure consistency and compatibility as that structure evolves.

Several popular implementations of Schema Registry are commonly used with Apache Kafka. Some of them include:

  • Confluent Schema Registry: It is the official Schema Registry implementation provided by Confluent. It offers advanced features and tight integration with the Confluent Platform.

  • Apicurio Registry: It is an open-source Schema Registry that provides schema management capabilities for Kafka and other messaging and event streaming platforms.

These implementations support Avro, JSON Schema, and Protobuf schemas and offer features like schema validation, versioning, compatibility checks, and schema evolution.

Why do we need Schema Registry?

When data is serialized by a producer and later deserialized by a consumer, both sides must agree on the schema so that the data stays in a consistent format. For example, a data analytics application might need to know the schema of the data it consumes to parse it properly and extract the insights it needs. If the schema of the data changes, the application might not be able to parse the data correctly, which could lead to errors.

Kafka Schema Registry helps to avoid these errors by providing a central place to store and manage schemas. When a producer or consumer wants to send data to or receive data from a Kafka topic, it can first check Schema Registry to get the schema for the topic. This ensures that the producer and consumer use the same schema, which helps keep the data consistent.
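As a minimal sketch of this idea, the snippet below models a registry as an in-memory lookup shared by a producer and a consumer. The class name, topic name, and schema are illustrative assumptions; a real registry (e.g., Confluent Schema Registry) runs as a separate service.

```python
import json

# Illustrative, in-memory stand-in for a schema registry (not a real service).
class InMemorySchemaRegistry:
    def __init__(self):
        self._schemas = {}  # topic -> schema definition

    def register(self, topic, schema):
        self._schemas[topic] = schema

    def get(self, topic):
        return self._schemas[topic]

registry = InMemorySchemaRegistry()

# The producer registers the schema for the (hypothetical) "orders" topic once...
registry.register("orders", {"type": "record", "name": "Order",
                             "fields": [{"name": "id", "type": "int"}]})

# ...and the consumer later retrieves the same schema before deserializing,
# so both sides agree on the structure of the data.
schema = registry.get("orders")
print(json.dumps(schema))
```

Because both sides read the schema from one place, a schema change made through the registry is visible to every producer and consumer of the topic.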

Let’s expand on this and understand some of the key features of Schema Registry.

Key features

  • Centralized repository: Schema Registry is a dedicated location where producers and consumers can register and retrieve the schemas associated with their data. This centralization simplifies schema handling across multiple Kafka topics and applications.

  • Schema validation: When a new schema is registered, it undergoes validation to ensure it adheres to the specified schema format (e.g., Avro or JSON Schema).

  • Schema versioning and compatibility: Each registered schema is assigned a unique identifier, and multiple versions of the same schema can coexist in the Registry. This allows for backward (and forward) compatibility because different versions of the schema can be used by different producers and consumers. Versioning enables smooth evolution of data structures over time without causing disruptions in data processing.
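The versioning and compatibility behavior above can be sketched as follows. This is a deliberately simplified check, not the full Avro/JSON Schema/Protobuf resolution rules a real registry applies; it enforces only one common backward-compatibility rule: a new version may add fields as long as each added field has a default.

```python
def is_backward_compatible(old_schema, new_schema):
    """Can data written with old_schema be read with new_schema? (simplified)"""
    old_fields = {f["name"] for f in old_schema["fields"]}
    for field in new_schema["fields"]:
        if field["name"] not in old_fields and "default" not in field:
            return False  # new field without a default -> old data can't be read
    return True

versions = []  # each accepted schema gets the next sequential version number

def register_version(schema):
    if versions and not is_backward_compatible(versions[-1], schema):
        raise ValueError("incompatible schema change")
    versions.append(schema)
    return len(versions)

v1 = register_version({"fields": [{"name": "id", "type": "int"}]})
v2 = register_version({"fields": [{"name": "id", "type": "int"},
                                  {"name": "note", "type": "string",
                                   "default": ""}]})
print(v1, v2)  # -> 1 2
```

Registering a version that adds a required field with no default would be rejected, which is how the registry prevents a producer from breaking existing consumers.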

Since producers and consumers rely on the Registry to retrieve the schema associated with a given topic or data record, this eliminates the need for manual schema distribution and ensures consistency across different components of a Kafka-based system. By abstracting away schema details, developers can focus on the logic of their applications.

How does it work?

Schema Registry provides a RESTful API that allows users to interact with the Registry for schema validation, storage, and retrieval. This service is a central hub where Avro, JSON Schema, and Protobuf schemas can be registered, managed, and accessed. It accepts schema definitions, validates their syntax and structure, and stores them securely.
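As a sketch, here is how a registration payload for that REST API could be built. The endpoint shape follows the Confluent Schema Registry REST API (`POST /subjects/<subject>/versions`), but the subject name, host, and port below are assumptions for illustration.

```python
import json

# A hypothetical Avro schema to register.
avro_schema = {
    "type": "record",
    "name": "Order",
    "fields": [{"name": "id", "type": "int"}],
}

# The registry expects the schema itself as an escaped JSON string
# nested inside a JSON object.
payload = json.dumps({"schema": json.dumps(avro_schema)})

# Assumed local registry address and subject name ("orders-value").
url = "http://localhost:8081/subjects/orders-value/versions"
print(url)
print(payload)
```

A successful registration returns the schema's unique id, which consumers can later use to fetch the exact schema a message was written with.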

In addition to the REST service, Schema Registry provides serializers and deserializers that integrate with Apache Kafka clients. These components plug into the Kafka Producer and Consumer APIs to handle schema-related operations during message processing.
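To illustrate what these serializers do, the snippet below sketches the Confluent-style wire framing: each serialized message is prefixed with a magic byte and a 4-byte schema id, so the consumer can read the id back and ask the registry for the matching schema before deserializing. The payload bytes here are hypothetical.

```python
import struct

MAGIC_BYTE = 0  # Confluent-style framing starts every message with a 0 byte

def frame(schema_id, payload):
    # magic byte + 4-byte big-endian schema id + serialized payload
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload

def unframe(message):
    magic, schema_id = struct.unpack(">bI", message[:5])
    assert magic == MAGIC_BYTE
    return schema_id, message[5:]

msg = frame(42, b"\x02\x06foo")  # hypothetical serialized bytes
schema_id, payload = unframe(msg)
print(schema_id)  # -> 42
```

Carrying only the small schema id in each message, rather than the full schema, keeps messages compact while still letting any consumer resolve the exact schema from the registry.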
