Delve into Big Data essentials, explore data types, and gain insights into Hadoop components like YARN, MapReduce, HDFS, and Spark. Discover foundations to excel in the growing Big Data field.

This course offers a rich, interactive way to learn the fundamentals of Big Data. Throughout the course, you will have plenty of opportunities to get your hands dirty with functioning Hadoop clusters.

You will start off by learning about the rise of Big Data and the different types of data: structured, unstructured, and semi-structured. You will then dive into the fundamentals of Big Data such as YARN (Yet Another Resource Negotiator), MapReduce, HDFS (Hadoop Distributed File System), and Spark.

By the end of this course, you will have the foundations in place to start working with Big Data, a rapidly growing field.

Introduction to Big Data and Hadoop

# Mapper

We'll start by examining Hadoop's Java classes to see how the abstract concepts of map and reduce are translated into code. The Map phase of a MapReduce job is implemented by a Java class called `Mapper`. It maps input key/value pairs to a set of intermediate key/value pairs. Conceptually, a mapper performs parsing, projection (selecting the fields of interest from the input), and filtering (removing uninteresting or malformed records). The `Mapper` class is defined as follows in the package `org.apache.hadoop.mapreduce`:

## Mapper class

```java
public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    // ...class body
}
```

Hadoop's `Mapper` class serves as the base class for our own custom mapper. Both phases of a MapReduce job take key/value pairs as input and produce key/value pairs as output. Accordingly, the `Mapper` class has four generic type parameters: the types of the input key, the input value, the output key, and the output value. The class defines a `map(...)` method that holds the mapping logic; the user's derived mapper class overrides it with the custom logic for the map phase of the job.
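To make the roles of the four type parameters concrete, here is a minimal sketch that runs without Hadoop on the classpath. `SketchMapper` is a hypothetical stand-in that only mirrors the shape of Hadoop's `Mapper` (four type parameters, a nested `Context`, an overridable `map(...)`); the `make,model,year` record layout, `RecordMapper`, and `mapAll` are likewise assumptions made for illustration. In a real job the types would be Hadoop writables such as `LongWritable` and `Text` rather than `Long` and `String`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MapperShapeSketch {

    /** Hypothetical stand-in mirroring the shape of Hadoop's Mapper; not the real class. */
    static class SketchMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {

        /** Collects every emitted pair, rendered as "key=value". */
        class Context {
            final List<String> output = new ArrayList<>();

            void write(KEYOUT key, VALUEOUT value) {
                output.add(key + "=" + value);
            }
        }

        Context newContext() {
            return new Context();
        }

        /** Default behavior: emit the input pair unchanged (identity). */
        @SuppressWarnings("unchecked")
        protected void map(KEYIN key, VALUEIN value, Context context) {
            context.write((KEYOUT) key, (VALUEOUT) value);
        }
    }

    /** Input: (line offset, "make,model,year"). Output: (make, year). */
    static class RecordMapper extends SketchMapper<Long, String, String, Integer> {
        @Override
        protected void map(Long offset, String line, Context context) {
            String[] fields = line.split(",");              // parsing
            if (fields.length != 3) {
                return;                                     // filtering: malformed record
            }
            try {
                int year = Integer.parseInt(fields[2].trim());
                context.write(fields[0].trim(), year);      // projection: keep make and year
            } catch (NumberFormatException e) {
                // filtering: the year is not a number, so the record is dropped
            }
        }
    }

    /** Calls map(...) once per line, using the line's character offset as the key. */
    static List<String> mapAll(List<String> lines) {
        RecordMapper mapper = new RecordMapper();
        SketchMapper<Long, String, String, Integer>.Context context = mapper.newContext();
        long offset = 0;
        for (String line : lines) {
            mapper.map(offset, line, context);
            offset += line.length() + 1;                    // +1 for the newline
        }
        return context.output;
    }

    public static void main(String[] args) {
        // prints [Toyota=2015, Mercedes=2019]
        System.out.println(mapAll(Arrays.asList(
                "Toyota,Corolla,2015", "not-a-record", "Mercedes,C200,2019")));
    }
}
```

Like Hadoop's own `Mapper`, the stand-in's default `map(...)` passes each pair through unchanged, so a derived class only overrides it when it wants different behavior.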

At times, an input key for the map phase may not be meaningful. In our car-counting example, we only care about the strings in the input file that represent car makes. The key received by the `map(...)` method will be a long value giving the offset of the beginning of the line from the start of the file, and we can ignore it. The logic of our mapper function is trivial. Whenever we see a brand name like _Toyota_, we output the name as the key and a count of 1 as the value, indicating that we came across one car of that brand. Similarly, whenever we see the string _Mercedes_, we output the text Mercedes as the key and a count of 1 to record that we saw one car of the Mercedes make, and so on.
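What follows is a minimal sketch of such a car-counting mapper, not the course's own listing. It assumes each input line holds a single car make, and it uses a hypothetical `SketchMapper` stand-in so that it runs without Hadoop on the classpath; `CarMapper` and `mapAll` are illustrative names as well. In an actual Hadoop job the class would typically extend `org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, Text, IntWritable>` and emit the count as an `IntWritable`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CarMapperSketch {

    /** Hypothetical stand-in mirroring the shape of Hadoop's Mapper; not the real class. */
    static class SketchMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {

        /** Collects every emitted pair, rendered as "key=value". */
        class Context {
            final List<String> output = new ArrayList<>();

            void write(KEYOUT key, VALUEOUT value) {
                output.add(key + "=" + value);
            }
        }

        Context newContext() {
            return new Context();
        }

        /** Default behavior: emit the input pair unchanged (identity). */
        @SuppressWarnings("unchecked")
        protected void map(KEYIN key, VALUEIN value, Context context) {
            context.write((KEYOUT) key, (VALUEOUT) value);
        }
    }

    /** Emits (make, 1) for every non-blank input line. */
    static class CarMapper extends SketchMapper<Long, String, String, Integer> {
        private static final Integer ONE = 1;

        @Override
        protected void map(Long offset, String line, Context context) {
            String make = line.trim();
            if (make.isEmpty()) {
                return;                       // skip blank lines
            }
            // The offset key is ignored; only the make matters for counting.
            context.write(make, ONE);
        }
    }

    /** Calls map(...) once per line, using the line's character offset as the key. */
    static List<String> mapAll(List<String> lines) {
        CarMapper mapper = new CarMapper();
        SketchMapper<Long, String, String, Integer>.Context context = mapper.newContext();
        long offset = 0;
        for (String line : lines) {
            mapper.map(offset, line, context);
            offset += line.length() + 1;      // +1 for the newline
        }
        return context.output;
    }

    public static void main(String[] args) {
        // prints [Toyota=1, Mercedes=1, Toyota=1]
        System.out.println(mapAll(Arrays.asList("Toyota", "Mercedes", "Toyota")));
    }
}
```

The mapper emits one pair per car seen and does no counting itself; summing the 1s for each make is left to the reduce phase.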


