SerDe

This lesson explains the importance of serialization in the Hadoop landscape.

We'll cover the following...

SerDe

In this lesson, we’ll study why serialization is important and which serialization systems are used in the Big Data ecosystem. SerDe is short for Serializer/Deserializer. Encoding an object as a byte stream is called serializing the object. Once an object is serialized, its encoding can be transmitted from one running virtual machine to another, and stored on disk or in a database for deserialization later. Serialization is often used in network programming. Objects that need to be transmitted through the network must be converted into bytes. Serialization can also store objects on disk.

Need for SerDe

Why can’t we use Java’s own serialization capability when most of Hadoop daemons are written in Java? Most languages offer libraries for serialization like pickle in Python and marshal in Ruby. ...