Union, UnionByName, and DropDuplicates
Get introduced to Union, UnionByName, and DropDuplicates transformations in this lesson.
Union
The union transformation allows us to combine two DataFrames, thus producing a new one containing the rows from both.
This operation has the following characteristics:
- The schemas of both DataFrames have to be identical. This doesn’t differ much from the classic SQL UNION operation available in an RDBMS.
- Duplicate records are preserved and carried over into the final result, as with SQL UNION ALL.
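To make these two characteristics concrete, here is a minimal plain-Python sketch that models the behavior described above without requiring a Spark installation. The `Frame` type, the `union` helper, and the sample data are hypothetical stand-ins for illustration only, not Spark's API; in Spark itself the equivalent call is `df1.union(df2)`.

```python
from collections import namedtuple

# Hypothetical stand-in for a DataFrame: column names plus a list of row tuples.
Frame = namedtuple("Frame", ["columns", "rows"])


def union(left, right):
    """Model of union as described in this lesson (not Spark's implementation)."""
    # Characteristic 1: following the text, both schemas must be identical.
    if left.columns != right.columns:
        raise ValueError("union requires identical schemas")
    # Characteristic 2: rows are simply appended; duplicates are kept.
    return Frame(left.columns, left.rows + right.rows)


# Sample data (made up for this sketch); row (2, "Lima") appears in both frames.
sales_2023 = Frame(["id", "city"], [(1, "Oslo"), (2, "Lima")])
sales_2024 = Frame(["id", "city"], [(2, "Lima"), (3, "Pune")])

combined = union(sales_2023, sales_2024)
print(combined.rows)
# [(1, 'Oslo'), (2, 'Lima'), (2, 'Lima'), (3, 'Pune')]
```

Note that `(2, 'Lima')` shows up twice in the result: the model, like the transformation it imitates, makes no attempt to remove duplicates.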
We'll first look at a graphical representation of this transformation. It illustrates an interesting property that makes union an attractive transformation in specific scenarios.
The union transformation merges the two DataFrames by piling one up after the other. No exchange of ...