Spark API
Get an introduction to the three Spark data abstractions - RDDs, DataFrames, and Datasets.
APIs
Spark offers APIs and data abstractions that significantly improve the developer experience. The original Spark paper describes the low-level abstraction called the Resilient Distributed Dataset (RDD). Higher-level abstractions, DataFrames and Datasets, were added later. Spark enables distributed data processing through functional transformations of distributed collections of data (RDDs). Programs written against the Spark API are typically much shorter than equivalent programs written in other frameworks such as MapReduce. The Spark APIs are:
Resilient Distributed Datasets
DataFrames
Datasets
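To make "functional transformations of collections of data" more concrete, here is a minimal sketch of a word count written as a chain of transformations. It is a plain-Python analogy that uses only the standard library and runs on a single machine; it does not use Spark itself. The function name `word_count` and the sample input are assumptions made for illustration. In PySpark, the same pipeline would be expressed roughly as `flatMap`, `map`, and `reduceByKey` calls on an RDD.

```python
from collections import defaultdict


def word_count(lines):
    """Count words with a flatMap -> map -> reduceByKey style pipeline.

    Hypothetical helper for illustration only; this is not a Spark API.
    """
    # "flatMap" step: split every line into individual words.
    words = (word for line in lines for word in line.split())

    # "map" step: pair each lowercased word with an initial count of 1.
    pairs = ((word.lower(), 1) for word in words)

    # "reduceByKey" step: sum the counts that share the same word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


# Sample input, assumed for illustration.
lines = ["Spark makes big data simple", "big data needs Spark"]
result = word_count(lines)
print(result)
```

The whole job fits in a handful of lines because each step is a small function applied to a collection. A hand-written MapReduce job for the same task usually needs separate mapper, reducer, and driver code, which is the size difference the lesson refers to.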