Spark API
This lesson introduces the data abstractions available in Spark.
Spark offers APIs and data abstractions that significantly enhance the developer experience. The original Spark paper describes a low-level abstraction called the Resilient Distributed Dataset (RDD). Higher-level abstractions, DataFrames and Datasets, were added later. Spark enables distributed data processing through functional transformations of data collections (RDDs). Programs written against the Spark API are typically much shorter than equivalent programs in frameworks such as MapReduce. The three data abstractions available in Spark are:
- Resilient Distributed Datasets (RDDs)
- DataFrames
- Datasets
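To make the idea of "functional transformations of data collections" concrete, here is a minimal sketch of a word count written as a chain of transformations. It is plain Python running on a local list, not Spark itself: the sample `lines` and the `reduce_by_key` helper are assumptions made up for illustration, and they only mimic the shape of the corresponding RDD operations.

```python
from collections import defaultdict
from functools import reduce
from itertools import chain

# Hypothetical sample input; in Spark this would be a distributed collection.
lines = ["spark makes big data simple", "big data needs simple tools"]

# flatMap analogue: split every line into words and flatten the result.
words = list(chain.from_iterable(line.split() for line in lines))

# map analogue: pair each word with an initial count of 1.
pairs = [(word, 1) for word in words]


def reduce_by_key(pairs, combine):
    """Local stand-in for a reduce-by-key step (illustrative helper, not a Spark API)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Fold each key's values into a single value with the supplied function.
    return {key: reduce(combine, values) for key, values in grouped.items()}


word_counts = reduce_by_key(pairs, lambda a, b: a + b)
print(word_counts)
```

In PySpark, the same pipeline is usually expressed by chaining `flatMap`, `map`, and `reduceByKey` on an RDD, with Spark running each step in parallel across partitions. The brevity of that chain is the kind of program-size reduction the lesson refers to.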