Spark Fundamentals
An overview of Spark as a tool for solving specific problems, and of its structure.
Why choose Spark?
As the demand to process data and generate information continues to grow, engineers and data scientists are increasingly searching for easy and flexible tools to carry out parallel data analysis. This becomes even more apparent with the rise of cloud computing, where processing power and horizontal scaling are more readily available.
Spark comes into this picture as one such tool due to the following principal reasons:
Ease of use: Spark is straightforward to use in comparison to older tools that pre-date it, such as Hadoop with its MapReduce engine. It lets developers focus on the logic of the computation while coding against high-level APIs. It can also be installed and used on a simple laptop.
Speed: Spark keeps intermediate data in memory wherever possible, which makes it significantly faster than disk-based MapReduce for many workloads, and it is frequently praised for this in the big data world.
General-purpose engine: Spark allows developers to use and combine multiple types of computation, such as SQL queries, text processing, and machine learning, within the same application (see the sketch after this list).
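To make these points concrete, here is a minimal sketch, assuming PySpark is installed locally (for example via `pip install pyspark`). The dataset, column names, and application name are invented for illustration; the point is that a few lines of high-level code run on a laptop in local mode and freely mix the DataFrame API with SQL.

```python
from pyspark.sql import SparkSession

# "local[*]" runs Spark on all cores of the local machine -- no cluster needed.
spark = (SparkSession.builder
         .master("local[*]")
         .appName("spark-fundamentals-demo")   # hypothetical app name
         .getOrCreate())

# A tiny in-memory DataFrame standing in for a much larger dataset.
events = spark.createDataFrame(
    [("alice", "click"), ("bob", "view"), ("alice", "view")],
    ["user", "action"],
)

# The DataFrame API and SQL can be combined over the same data.
events.createOrReplaceTempView("events")
counts = spark.sql("SELECT user, COUNT(*) AS n FROM events GROUP BY user")
counts.show()

spark.stop()
```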
What is Spark?
Spark is fundamentally a cluster-based computational platform designed to be fast and general-purpose. If we attempt to define a single specific purpose for Spark, we'd find ourselves constrained by the many use cases this technology offers. However, Spark is usually referred to as a unified analytics engine for large-scale data processing.
In developers’ terms, the beauty of Spark is in the ...