Hadoop Ecosystem

The Hadoop Ecosystem is a set of tools that helps us solve big data problems. This article walks through its main components.

What is Hadoop?

Hadoop is an open-source software framework for solving big data problems on large clusters of commodity hardware. It stores and processes big data efficiently by spreading both the data and the computation across the machines in a cluster. Hadoop's design was inspired by Google's MapReduce paper (and the earlier Google File System paper). Hadoop is written in Java.
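To make the MapReduce idea concrete, here is a minimal single-process sketch of the classic word-count job. It only mimics the map, shuffle, and reduce phases in plain Python; it does not use any Hadoop API, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Map: emit (word, 1) for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all emitted values by key, as the framework does
    # between the map and reduce phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all counts for one word into a single total.
    return key, sum(values)


def word_count(lines: List[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["big data big clusters", "Hadoop stores big data"]
    print(word_count(docs))
    # prints {'big': 3, 'data': 2, 'clusters': 1, 'hadoop': 1, 'stores': 1}
```

In a real Hadoop job the map tasks run in parallel on the nodes that hold the input data, and the shuffle moves intermediate pairs across the network; here everything happens in one process purely for illustration.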

Components of Hadoop

When setting up a Hadoop cluster for big data processing, two services are mandatory:

  1. HDFS (Hadoop Distributed File System) for storing data

  2. YARN (Yet Another Resource Negotiator) for managing cluster resources and scheduling the jobs that process the data stored in HDFS

Figure: Hadoop Components (HDFS and YARN)
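YARN's role is to hand out cluster resources (memory and CPU, packaged as containers) to the applications that process HDFS data. The toy allocator below illustrates that idea under heavy simplifying assumptions: only memory is tracked, placement is plain first-fit, and `Node` and `allocate` are hypothetical names. It is not YARN's real scheduler; YARN ships pluggable schedulers such as the Capacity Scheduler and the Fair Scheduler.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    # Hypothetical worker node with a fixed memory capacity in MB.
    name: str
    capacity_mb: int
    used_mb: int = 0
    containers: List[str] = field(default_factory=list)

    def free_mb(self) -> int:
        return self.capacity_mb - self.used_mb


def allocate(nodes: List[Node], requests: List[Tuple[str, int]]) -> Dict[str, Optional[str]]:
    # First-fit placement: give each container request to the first node
    # with enough free memory; None means the request has to wait.
    placement: Dict[str, Optional[str]] = {}
    for container_id, mem_mb in requests:
        target = next((n for n in nodes if n.free_mb() >= mem_mb), None)
        if target is not None:
            target.used_mb += mem_mb
            target.containers.append(container_id)
        placement[container_id] = target.name if target is not None else None
    return placement


if __name__ == "__main__":
    cluster = [Node("node-a", 4096), Node("node-b", 2048)]
    print(allocate(cluster, [("c1", 3072), ("c2", 2048), ("c3", 2048)]))
    # prints {'c1': 'node-a', 'c2': 'node-b', 'c3': None}
```

The third request gets `None` because neither node has 2048 MB left; a real resource manager would queue it until running containers finish and release their resources.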

Hadoop Distributed File System (HDFS)

The Hadoop Distributed File System consists of Name Nodes (masters) and Data Nodes (workers).
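HDFS stores a file by splitting it into fixed-size blocks and replicating each block on several Data Nodes. The sketch below models just that splitting and a placement plan. The 128 MB block size and replication factor of 3 are the HDFS defaults in Hadoop 2.x and later; the round-robin placement and the function names `split_into_blocks` and `place_replicas` are hypothetical simplifications (real HDFS uses a rack-aware placement policy).

```python
import math
from typing import Dict, List

BLOCK_SIZE_MB = 128  # HDFS default block size (Hadoop 2.x and later)
REPLICATION = 3      # HDFS default replication factor


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> List[int]:
    # Every block is full-size except possibly the last one.
    if file_size_mb <= 0:
        return []
    count = math.ceil(file_size_mb / block_size_mb)
    sizes = [block_size_mb] * count
    sizes[-1] = file_size_mb - block_size_mb * (count - 1)
    return sizes


def place_replicas(num_blocks: int, data_nodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    # Hypothetical round-robin placement: each block's replicas land on
    # `replication` distinct Data Nodes. Not the real rack-aware policy.
    if replication > len(data_nodes):
        raise ValueError("replication factor exceeds number of Data Nodes")
    return {
        block: [data_nodes[(block + i) % len(data_nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(300)
    print(blocks)  # prints [128, 128, 44]
    print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

Because every block has copies on several Data Nodes, losing one machine does not lose data, and computation can be scheduled on whichever node already holds a copy of the block.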

Name Node

It is the Master Node which keeps ...