Hadoop Ecosystem
The Hadoop ecosystem is a set of tools that help solve big data problems. This lesson introduces its core components.
What is Hadoop?
Hadoop is open-source software for solving big data problems on large clusters of commodity hardware. It stores and processes big data by spreading both the data and the computation across the machines in a cluster. Hadoop grew out of the MapReduce paper published by Google, and it is written in Java.
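The MapReduce idea behind Hadoop splits a job into a map step that emits key/value pairs, a shuffle step that groups those pairs by key, and a reduce step that combines each group. The following is a minimal single-process sketch of that flow for word counting. It does not use Hadoop's real `Mapper`/`Reducer` API; the class and method names (`WordCountSketch`, `map`, `shuffle`, `reduce`, `wordCount`) are hypothetical and only illustrate the data flow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process simulation of the MapReduce word-count flow.
// Hypothetical names: this is not Hadoop's Mapper/Reducer API.
public class WordCountSketch {

    // Map step: emit (word, 1) for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Shuffle step: group emitted values by key; TreeMap keeps keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce step: sum the grouped values for each key.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    // Run map over every line, then shuffle and reduce the intermediate pairs.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : lines) {
            intermediate.addAll(map(line));
        }
        return reduce(shuffle(intermediate));
    }

    public static void main(String[] args) {
        List<String> input = List.of("big data big clusters", "Hadoop stores big data");
        // prints {big=3, clusters=1, data=2, hadoop=1, stores=1}
        System.out.println(wordCount(input));
    }
}
```

In a real cluster the map calls run in parallel on the machines that hold the input data, and the shuffle moves intermediate pairs over the network to the reducers; the sketch keeps all three steps in one JVM so the data flow is easy to follow.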
Components of Hadoop
When setting up a Hadoop cluster for big data processing, two services are mandatory:
- HDFS (Hadoop Distributed File System) for storing data
- YARN (Yet Another Resource Negotiator) for managing cluster resources and scheduling the jobs that process data stored in HDFS
Hadoop Distributed File System (HDFS)
HDFS follows a master/worker design built from two kinds of nodes: Name Nodes and Data Nodes.
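To make the split between the two node types concrete, here is a toy model of how a file ends up in HDFS: the file is cut into fixed-size blocks, each block is copied to several Data Nodes, and a Name Node-style map records which Data Nodes hold each block. It uses the HDFS defaults of a 128 MB block size and a replication factor of 3. The round-robin placement is an assumption made for simplicity; real HDFS uses a rack-aware placement policy. The names `HdfsBlockSketch`, `blockCount`, and `placeBlocks` are hypothetical and not part of the Hadoop API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of HDFS block splitting and replica placement.
// Hypothetical names; round-robin placement stands in for HDFS's rack-aware policy.
public class HdfsBlockSketch {

    static final long BLOCK_SIZE = 128L * 1024 * 1024; // HDFS default block size (128 MB)
    static final int REPLICATION = 3;                  // HDFS default replication factor

    // Number of blocks for a file; the last block may be only partially filled.
    public static long blockCount(long fileSizeBytes) {
        return (fileSizeBytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    // Name Node-style metadata: block id -> Data Nodes holding a replica of that block.
    public static Map<Long, List<String>> placeBlocks(long fileSizeBytes, List<String> dataNodes) {
        if (dataNodes.size() < REPLICATION) {
            throw new IllegalArgumentException("need at least " + REPLICATION + " data nodes");
        }
        Map<Long, List<String>> blockMap = new LinkedHashMap<>();
        long blocks = blockCount(fileSizeBytes);
        for (long b = 0; b < blocks; b++) {
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < REPLICATION; r++) {
                // Consecutive indices modulo the node count put each replica on a different node.
                int idx = (int) ((b + r) % dataNodes.size());
                replicas.add(dataNodes.get(idx));
            }
            blockMap.put(b, replicas);
        }
        return blockMap;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("dn1", "dn2", "dn3", "dn4");
        long fileSize = 300L * 1024 * 1024; // a 300 MB file
        placeBlocks(fileSize, nodes).forEach((block, replicas) ->
                System.out.println("block " + block + " -> " + replicas));
    }
}
```

With four Data Nodes, the 300 MB file in `main` needs three blocks, and the program prints `block 0 -> [dn1, dn2, dn3]`, `block 1 -> [dn2, dn3, dn4]`, and `block 2 -> [dn3, dn4, dn1]`. Because every block has three copies on different nodes, losing any single Data Node still leaves two replicas of each block available.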
Name Node
It is the Master Node which keeps ...