Parquet: Intro

This lesson introduces Parquet, a file format that stores data in a columnar layout.

We'll cover the following...

Parquet

Parquet literally means a patterned wooden floor, but in the Hadoop world the term refers to a columnar storage format. Parquet is based on Google’s Dremel paper and grew out of a collaboration between Cloudera and Twitter engineers. What sets Parquet apart from other columnar formats is its ability to store nested data efficiently: even deeply nested fields are stored in a truly columnar fashion and can be read independently of other fields. Let’s see an example. Consider the following structure for a car record, written in a JSON-like notation:

{
  make        : "",
  model       : "",
  year        : "",
  horsepower  : "",
 
...
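
Since the record above is cut off, here is a rough sketch of the idea using a small made-up record instead. It is not how Parquet is actually implemented: it only illustrates what "columnar storage of nested fields" means by splitting each record into one list of values per leaf path. The `shred` function, the `owner` and `address` fields, and the sample values are all hypothetical and do not come from the lesson's car record.

```python
from collections import defaultdict


def shred(records):
    """Split nested dict records into one list of values per leaf path.

    Simplifying assumption: every record contains every field exactly once.
    Real Parquet also handles optional and repeated fields by storing
    repetition and definition levels next to the values.
    """
    columns = defaultdict(list)

    def walk(node, prefix):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            if isinstance(value, dict):
                walk(value, path)            # descend into the nested group
            else:
                columns[path].append(value)  # leaf value goes to its own column

    for record in records:
        walk(record, "")
    return dict(columns)


# Hypothetical sample data, for illustration only.
cars = [
    {"make": "Toyota", "model": "Corolla",
     "owner": {"name": "Ana", "address": {"city": "Lisbon"}}},
    {"make": "Honda", "model": "Civic",
     "owner": {"name": "Raj", "address": {"city": "Pune"}}},
]

columns = shred(cars)

# A deeply nested field can now be read without touching any other field.
cities = columns["owner.address.city"]
print(cities)  # ['Lisbon', 'Pune']
```

Each leaf path, such as `owner.address.city`, ends up in its own column, so a query that needs only the city never has to touch `make`, `model`, or `owner.name`. That is the property the paragraph above describes.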