DataFrames
This lesson discusses Spark DataFrames.
DataFrames
A DataFrame is the most common Structured API. It represents a table of data with rows and columns, and each column has a defined type that is recorded in a schema. You can think of a DataFrame as a spreadsheet that is too big to fit on a single machine, so its parts are spread across a cluster of machines. Even when the data could fit on one machine, the desired computations may take too long there, so the data is split into chunks that are processed on multiple machines in parallel.
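To make the schema and partitioning ideas concrete, here is a toy, single-process model in plain Python. It is not Spark and does not use Spark's API. `ToyDataFrame`, `from_rows`, and `sum_column` are hypothetical names. A dict of Python types stands in for a real schema, list slices stand in for the chunks (partitions) held by different machines, and a thread pool stands in for the executors that process those chunks in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical toy model for illustration only; this is NOT Spark's API.
# A schema maps each column name to the Python type its values must have.
Schema = dict[str, type]


@dataclass
class ToyDataFrame:
    """A schema plus rows split into partitions (one partition per 'machine')."""

    schema: Schema
    partitions: list[list[tuple]]

    @classmethod
    def from_rows(cls, schema: Schema, rows: list[tuple], num_partitions: int) -> "ToyDataFrame":
        if num_partitions < 1:
            raise ValueError("num_partitions must be at least 1")
        names = list(schema)
        for row in rows:
            # Enforce the schema: one value per column, each of the declared type.
            if len(row) != len(names):
                raise ValueError(f"expected {len(names)} values, got {len(row)}: {row!r}")
            for name, value in zip(names, row):
                if not isinstance(value, schema[name]):
                    raise TypeError(f"column {name!r} expects {schema[name].__name__}, got {value!r}")
        # Deal rows round-robin into partitions, the way chunks might be spread across a cluster.
        parts = [rows[i::num_partitions] for i in range(num_partitions)]
        return cls(schema, parts)

    def sum_column(self, name: str) -> float:
        idx = list(self.schema).index(name)

        def partial_sum(part: list[tuple]) -> float:
            # Each worker sees only its own partition and returns a partial result.
            return sum(row[idx] for row in part)

        # Threads stand in for machines processing partitions in parallel.
        with ThreadPoolExecutor(max_workers=len(self.partitions)) as pool:
            partials = list(pool.map(partial_sum, self.partitions))
        # Combine the partial results into the final answer.
        return sum(partials)


schema = {"name": str, "age": int}
rows = [("Alice", 34), ("Bob", 45), ("Cara", 29), ("Dan", 52), ("Eve", 41)]
df = ToyDataFrame.from_rows(schema, rows, num_partitions=2)
print([len(p) for p in df.partitions])  # [3, 2]
print(df.sum_column("age"))  # 201
```

The two-step aggregation (partial sums per partition, then one combine step) is what lets the work run on many machines at once. In PySpark itself, assuming an existing `SparkSession` named `spark`, the rough counterparts are `spark.createDataFrame(rows, "name STRING, age INT")` to build a DataFrame with an explicit schema, `df.rdd.getNumPartitions()` to inspect its partitioning, and `df.agg({"age": "sum"})` for the aggregation. Unlike this toy, Spark decides how partitions are placed and scheduled across the cluster.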
Another way to describe DataFrames is to think of them as distributed table-like collections with well-defined rows and columns. Each column has the same type of ...