Spark's Java Main Abstraction: The DataFrame
Get introduced to Spark's main abstraction in this lesson.
What is a DataFrame?
A DataFrame is both a logical container of data and an API. It was purposely built as a higher-level abstraction over RDDs, the older Spark abstraction (JavaRDDs in the case of the Java API).
In the Spark context, a “logical container” is a placeholder for data that Spark loads and distributes across an actual physical cluster, where the worker nodes do the processing.
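To make the idea of a logical container more concrete, here is a minimal toy model in plain Java. It is not Spark's API: the class `ToyFrame`, its `sum` method, and the sample numbers are all hypothetical names invented for illustration, and worker threads stand in for worker nodes. The sketch only shows the division of labor described above: the container describes the data as partitions, the workers process those partitions, and the caller combines the partial results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model only, NOT Spark's API. Every name here is hypothetical.
public class LogicalContainerSketch {

    // The "logical container": it only describes the data as a list of partitions.
    static final class ToyFrame {
        private final List<List<Integer>> partitions;

        ToyFrame(List<List<Integer>> partitions) {
            this.partitions = partitions;
        }

        // Hands each partition to a worker thread (standing in for a worker node)
        // and combines the partial sums on the calling thread.
        int sum(int workers) {
            ExecutorService pool = Executors.newFixedThreadPool(workers);
            try {
                List<Future<Integer>> partials = new ArrayList<>();
                for (List<Integer> partition : partitions) {
                    Callable<Integer> task =
                            () -> partition.stream().mapToInt(Integer::intValue).sum();
                    partials.add(pool.submit(task));
                }
                int total = 0;
                for (Future<Integer> partial : partials) {
                    total += partial.get();
                }
                return total;
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException("A worker task failed", e);
            } finally {
                pool.shutdown();
            }
        }
    }

    public static void main(String[] args) {
        // Three partitions of made-up sample data, processed by two workers.
        ToyFrame frame = new ToyFrame(
                List.of(List.of(1, 2, 3), List.of(4, 5), List.of(6)));
        System.out.println(frame.sum(2)); // prints 21
    }
}
```

The code that calls `sum` never touches an individual partition. In the same spirit, code written against a DataFrame works with the container, while Spark decides where the data lives and which worker processes it.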
The ...