Datasets
Get an introduction to the strongly typed Datasets API available in Spark.
Datasets
Below is the definition of a Dataset from the official Databricks documentation:
“A Dataset is a strongly-typed, immutable collection of objects that are mapped to a relational schema. Datasets are a type-safe structured API available in statically typed, Spark supported languages Java and Scala. Datasets are strictly a JVM language feature. Datasets aren’t supported in R and Python since these languages are dynamically typed languages”.
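To make the definition concrete, here is a minimal Scala sketch, assuming Spark 3.x with the `spark-sql` dependency on the classpath. The `Person` case class, the sample rows, and the `DatasetExample` object are hypothetical names invented for illustration. The case class supplies the relational schema through an encoder, and the `filter` and `map` lambdas receive `Person` objects, so a misspelled field is a compile-time error rather than a runtime failure.

```scala
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}

// Hypothetical record type for illustration. Defined at the top level so Spark
// can derive an Encoder (and therefore the relational schema) from its fields.
case class Person(name: String, age: Int)

object DatasetExample {
  // Typed transformation: both lambdas receive Person objects, so a typo such
  // as p.agee is rejected by the Scala compiler instead of failing at runtime.
  def adultNames(people: Dataset[Person]): Dataset[String] =
    people
      .filter(p => p.age >= 21)
      .map(p => p.name)(Encoders.STRING) // encoder for the String result

  def main(args: Array[String]): Unit = {
    // Local session for experimentation; "local[*]" uses all available cores.
    val spark = SparkSession.builder()
      .appName("DatasetExample")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // provides Encoder[Person] and Seq#toDS()

    // An immutable, strongly typed collection mapped to the schema (name, age).
    val people: Dataset[Person] = Seq(Person("Ana", 34), Person("Ben", 19)).toDS()

    people.printSchema() // schema derived from the case class fields
    adultNames(people).show()

    spark.stop()
  }
}
```

Running `main` prints the schema Spark derived from `Person` and then a one-column table of the names that passed the filter. Keeping the transformation in its own method makes it easy to test against any `Dataset[Person]`.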
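Starting with Spark 2.0, the Dataset replaced the RDD as Spark's main programming interface. A Dataset is strongly typed like an RDD, but it benefits from richer optimizations under the hood because Spark SQL plans and optimizes Dataset queries. The RDD API is still supported.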
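The difference in optimization can be seen in a small comparison. The sketch below, again assuming Spark 3.x with `spark-sql` on the classpath, computes the same total over hypothetical `Sale` records twice; `Sale`, `RddVsDataset`, and the helper method names are illustrative only. The RDD version runs its lambdas as opaque JVM functions, whereas the Dataset version uses column expressions that Spark SQL's Catalyst optimizer can analyze and rewrite, and `explain(true)` prints the resulting plans. Typed lambdas passed to a Dataset (for example `filter(s => s.item == "book")`) are also opaque to Catalyst, so the optimizer helps most when queries are written with column expressions.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, sum}

// Hypothetical record type for illustration.
case class Sale(item: String, amount: Double)

object RddVsDataset {
  // RDD version: typed, but Spark sees the lambdas as opaque JVM functions
  // and runs them as written, with no query planning.
  def rddBookTotal(sales: RDD[Sale]): Double =
    sales.filter(s => s.item == "book").map(s => s.amount).sum()

  // Dataset version: the filter and aggregation are column expressions that
  // Spark SQL's Catalyst optimizer can inspect and rewrite before execution.
  def datasetBookTotal(sales: Dataset[Sale]): Double =
    sales
      .filter(col("item") === "book")
      .agg(sum(col("amount")))
      .head()
      .getDouble(0)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RddVsDataset")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // provides Encoder[Sale] and Seq#toDS()

    val sales = Seq(Sale("book", 12.5), Sale("pen", 1.2), Sale("book", 7.5))
    val rdd: RDD[Sale] = spark.sparkContext.parallelize(sales)
    val ds: Dataset[Sale] = sales.toDS()

    println(s"RDD total:     ${rddBookTotal(rdd)}")
    println(s"Dataset total: ${datasetBookTotal(ds)}")

    // Print the logical and physical plans produced for the Dataset query.
    ds.filter(col("item") === "book").agg(sum(col("amount"))).explain(true)

    // The RDD API remains available: a Dataset can be viewed as an RDD.
    val backToRdd: RDD[Sale] = ds.rdd
    println(s"Records via RDD view: ${backToRdd.count()}")

    spark.stop()
  }
}
```

Both helpers return the same total; what differs is how much of the computation Spark can see and optimize before running it.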