Getting Started with Spark

Get introduced to Apache Spark.

Apache Spark

Apache Spark is a distributed computation engine with a stack of libraries for big data. It covers stream processing (Spark Streaming), querying datasets (Spark SQL), machine learning (Spark MLlib), and graph processing (GraphX).

Spark is written in Scala and also provides APIs for Python, Java, SQL, and R.

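Spark keeps intermediate data in memory wherever it can and spills to disk only when the data does not fit. This can make it many times faster than the equivalent Hadoop MapReduce jobs, which write intermediate results to disk between stages.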
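
To make the effect of in-memory processing concrete, here is a toy sketch in plain Python. It is not Spark code, and the helper names (write_dataset, load, iterate_from_disk, iterate_in_memory) are made up for illustration. It counts how often a dataset has to be read from disk when a job makes several passes over it, first re-reading the file on every pass and then loading it once and reusing it from memory.

```python
import os
import tempfile


def write_dataset(path, n):
    """Write the integers 0..n-1 to a text file, one per line."""
    with open(path, "w") as f:
        for i in range(n):
            f.write(f"{i}\n")


def load(path):
    """Read the whole dataset from disk into a Python list."""
    with open(path) as f:
        return [int(line) for line in f]


def iterate_from_disk(path, iterations):
    """Re-read the input on every pass, like a chain of disk-based jobs."""
    total, disk_reads = 0, 0
    for _ in range(iterations):
        data = load(path)      # one full read from disk per pass
        disk_reads += 1
        total += sum(data)
    return total, disk_reads


def iterate_in_memory(path, iterations):
    """Load the input once and reuse the in-memory copy on every pass."""
    data = load(path)          # the only read from disk
    disk_reads = 1
    total = 0
    for _ in range(iterations):
        total += sum(data)
    return total, disk_reads


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "numbers.txt")
        write_dataset(path, 1000)
        print(iterate_from_disk(path, 5))
        print(iterate_in_memory(path, 5))
```

Both functions compute the same total, but the first one reads the file once per pass while the second reads it only once. In Spark, the rough counterpart of the second approach is calling cache() or persist() on an RDD or DataFrame that several computations reuse.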

MapReduce and Spark comparison

With the advent of Spark, the MapReduce framework took a back seat for several reasons:

  • Iterative jobs: Certain Machine