Data Schema: Avro and Protobuf
Learn about common data schemas, including Avro and Protobuf.
One effective technique for ensuring data quality is to define a data schema. By specifying the structure of data in a fixed format, a data schema keeps data consistent and accurate as it is exchanged, stored, and used.
For example, when two applications exchange data, a schema defines the structure and constraints of the data passed between them: the data format (XML, JSON, or CSV), the field types (int, float, or string), and rules such as the allowed range of a numeric value or the expected date format. We will learn about two common schema-based formats, Avro and Protobuf, and how to incorporate them into data engineering pipelines.
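The kinds of constraints described above can be sketched in plain Python. This is a hypothetical, minimal validator: the `ORDER_SCHEMA` fields and the `validate` function are illustrative names, not part of Avro or Protobuf. It checks field types, a numeric range, and a date format, and it collects violations into a list instead of raising on the first one.

```python
from datetime import datetime
from typing import Any

# Hypothetical schema: each field maps to its expected type plus optional rules.
ORDER_SCHEMA: dict[str, dict[str, Any]] = {
    "order_id": {"type": int},
    "amount": {"type": float, "min": 0.0, "max": 10_000.0},
    "order_date": {"type": str, "date_format": "%Y-%m-%d"},
}


def validate(record: dict[str, Any], schema: dict[str, dict[str, Any]]) -> list[str]:
    """Return human-readable violations; an empty list means the record is valid."""
    errors: list[str] = []
    for field, rules in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
            continue
        value = record[field]
        expected = rules["type"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"{field}: expected {expected.__name__}, got {type(value).__name__}")
            continue
        # Range rules for numeric fields.
        if "min" in rules and value < rules["min"]:
            errors.append(f"{field}: {value} is below minimum {rules['min']}")
        if "max" in rules and value > rules["max"]:
            errors.append(f"{field}: {value} is above maximum {rules['max']}")
        # Format rule for date strings.
        if "date_format" in rules:
            try:
                datetime.strptime(value, rules["date_format"])
            except ValueError:
                errors.append(f"{field}: {value!r} does not match {rules['date_format']}")
    # Fields the schema does not declare are also reported.
    for field in sorted(record.keys() - schema.keys()):
        errors.append(f"unexpected field: {field}")
    return errors
```

For instance, `validate({"order_id": "1", "amount": -5.0, "order_date": "03/01/2024"}, ORDER_SCHEMA)` reports three violations: a wrong type, an out-of-range amount, and a badly formatted date. Formats like Avro and Protobuf move this kind of contract out of ad hoc code and into a shared schema definition that every producer and consumer can use.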
Apache Avro
Apache Avro is an open-source data serialization system that lets different applications exchange and store data efficiently, regardless of the programming languages they are written in.
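To make "efficiently" concrete, here is a minimal, hand-rolled sketch of Avro's binary encoding based on the Avro specification. Avro schemas are written in JSON; `int` and `long` values use zig-zag variable-length encoding; a string is its byte length followed by the UTF-8 bytes; and a record is just its fields encoded in schema order, with no field names. The `User` schema and the helper functions are illustrative assumptions, and the sketch covers only `int`, `long`, and `string` fields. In a real pipeline you would use an Avro library (such as the official `avro` package or `fastavro`) rather than this code.

```python
import json

# Illustrative Avro schema (Avro schemas are declared in JSON).
USER_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age",  "type": "int"}
  ]
}
""")


def encode_long(n: int) -> bytes:
    """Zig-zag encode a signed 64-bit value, then emit it as a base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits with the continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(buf: bytes, pos: int) -> tuple[int, int]:
    """Read a varint starting at pos; return (decoded value, next position)."""
    z, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            break
    return (z >> 1) ^ -(z & 1), pos  # undo the zig-zag mapping


def encode_string(s: str) -> bytes:
    data = s.encode("utf-8")
    return encode_long(len(data)) + data  # length prefix, then raw bytes


def encode_record(schema: dict, record: dict) -> bytes:
    """Concatenate field encodings in schema order; field names are not written."""
    out = bytearray()
    for field in schema["fields"]:
        value = record[field["name"]]
        if field["type"] in ("int", "long"):
            out += encode_long(value)
        elif field["type"] == "string":
            out += encode_string(value)
        else:
            raise NotImplementedError(f"type {field['type']!r} is not covered by this sketch")
    return bytes(out)


def decode_record(schema: dict, buf: bytes) -> dict:
    """Rebuild a record; this only works because the reader knows the schema."""
    record, pos = {}, 0
    for field in schema["fields"]:
        if field["type"] in ("int", "long"):
            record[field["name"]], pos = decode_long(buf, pos)
        elif field["type"] == "string":
            length, pos = decode_long(buf, pos)
            record[field["name"]] = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise NotImplementedError(f"type {field['type']!r} is not covered by this sketch")
    return record
```

With this sketch, `encode_record(USER_SCHEMA, {"name": "foo", "age": 1})` produces the 5 bytes `b"\x06foo\x02"`, versus 25 bytes for the same record as JSON from `json.dumps`. The saving comes from leaving field names and type tags out of the data, which is also why any Avro reader needs the writer's schema to decode it.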
Serialization is the process of converting an object into a format that can be easily stored, transmitted, and reconstructed later. It encodes the object's ...