Exporting Information
Learn how to export DataFrames from Spark into different sources.
In the Spark world, exporting information is simply the act of writing a DataFrame (or another abstraction) into a persistent data source, such as a database, or into a persistent storage abstraction, such as a file. In this lesson, we are going to learn how to do both.
There are many similarities between ingesting and exporting information, mostly at the code level. This is no accident. The creators of the Spark API made the reading and writing interfaces mirror each other: spark.read() returns a DataFrameReader, a DataFrame's write() method returns a DataFrameWriter, and both follow the same fluent pattern of format(), option(), and a final load() or save() call. The result is an API that is easy to remember.
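To make the symmetry concrete, here is a minimal sketch that ingests a CSV file and exports the same DataFrame as Parquet. The class name, the method csvToParquet, and both paths are hypothetical illustrations, not part of the lesson's project; only the reader and writer calls are standard Spark API.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class ReadWriteSymmetry {

    // Reads a CSV file and exports it as Parquet.
    // Both paths are caller-supplied placeholders (hypothetical).
    static void csvToParquet(SparkSession spark, String csvPath, String parquetPath) {
        // Ingesting: spark.read() returns a DataFrameReader
        Dataset<Row> df = spark.read()
                .format("csv")
                .option("header", "true") // first line holds the column names
                .load(csvPath);

        // Exporting: df.write() returns a DataFrameWriter with the same fluent shape
        df.write()
                .format("parquet")
                .mode(SaveMode.Overwrite) // replace any previous output at parquetPath
                .save(parquetPath);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("read-write-symmetry")
                .master("local[*]")
                .getOrCreate();
        csvToParquet(spark, args[0], args[1]);
        spark.stop();
    }
}
```

Swapping format("csv") for format("jdbc"), or "parquet" for "json", changes the source or sink without changing the shape of the call chain.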
Exporting to a database
We can pick up where we left off in the previous project, because we already have a database, an embedded Apache Derby instance, bootstrapped into the application during its startup.
However, the project now includes some new classes. It is built and run with the following command:
mvn install exec:exec
In the previous lesson, the following records were inserted into the STUDENTS table once the application started:
+---+---+----------+---------+
| ID|AGE|FIRST_NAME|LAST_NAME|
+---+---+----------+---------+
|  1| 20|      John|    Brown|
|  2| 19|     Marie|    Curie|
|  3| 32|     Harry|   Truman|
|  4| 28|       Bob|     Ross|
|  5| 22|      Mark|  Spencer|
|  6| 24|      Adam|    Birch|
+---+---+----------+---------+
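For reference, a Students DataFrame with the columns shown above could be loaded from the STUDENTS table over JDBC. This is a sketch under stated assumptions: the JDBC URL is a placeholder (an in-memory embedded Derby database would use a URL of the form jdbc:derby:memory:<db-name>), and the class and method names are hypothetical.

```java
import java.util.Properties;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class LoadStudents {

    // Loads the STUDENTS table over JDBC.
    // jdbcUrl is an assumption, e.g. "jdbc:derby:memory:<db-name>" for in-memory Derby.
    static Dataset<Row> loadStudents(SparkSession spark, String jdbcUrl) {
        // Embedded Derby with default settings requires no user or password
        Properties props = new Properties();
        return spark.read().jdbc(jdbcUrl, "STUDENTS", props);
    }
}
```

Calling show() on the returned DataFrame prints a table in the same format as the one above.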
For this lesson’s hypothetical requirement, we need to add all the students’ nationalities.
As we know by now, DataFrames are immutable: we cannot modify the Students DataFrame in place, so we apply transformations, each of which returns a new DataFrame.
First, we apply a transformation to the Students DataFrame: we drop the ID field (which is auto-generated by the RDBMS) and the AGE field, and add an extra column named “Nationality” containing each student’s country of origin.
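The lesson does not show where the nationality values come from, so the sketch below assumes a hypothetical lookup DataFrame, nationalities, with the columns FIRST_NAME, LAST_NAME, and NATIONALITY, and attaches it with a left join on the full name. The join is a swapped-in technique for illustration, not necessarily the project's approach; the class and method names are hypothetical. The new column is written in uppercase to match the table's other columns, which Spark resolves case-insensitively by default.

```java
import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class AddNationality {

    // students:      ID, AGE, FIRST_NAME, LAST_NAME (the STUDENTS table shape)
    // nationalities: FIRST_NAME, LAST_NAME, NATIONALITY (hypothetical lookup data)
    static Dataset<Row> withNationality(Dataset<Row> students, Dataset<Row> nationalities) {
        // drop() returns a new DataFrame; 'students' itself is never modified
        Dataset<Row> names = students.drop("ID", "AGE");

        // Match on the full name of each student
        Column sameStudent = names.col("FIRST_NAME").equalTo(nationalities.col("FIRST_NAME"))
                .and(names.col("LAST_NAME").equalTo(nationalities.col("LAST_NAME")));

        // A left join keeps students without a known nationality (NATIONALITY is null)
        return names.join(nationalities, sameStudent, "left")
                .select(names.col("FIRST_NAME"), names.col("LAST_NAME"),
                        nationalities.col("NATIONALITY"));
    }
}
```

The explicit select() avoids ending up with two FIRST_NAME and two LAST_NAME columns, which a join on a column expression would otherwise produce.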
Naturally, by changing the schema of the DataFrame, we get a new DataFrame as a result of the operation, one that now diverges from the Students table structure. However, in this branch, we took the precaution of creating an extra table that acts as the recipient ...
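Exporting a DataFrame with the new schema into a recipient table goes through the DataFrameWriter. The sketch below is hedged: the recipient table's name is not given in this section, so it is passed in as the hypothetical parameter targetTable, and the JDBC URL is likewise a placeholder. SaveMode.Append inserts the rows into the existing table, and Spark creates the table if it does not exist yet.

```java
import java.util.Properties;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public class ExportToTable {

    // Appends the rows of df to targetTable over JDBC.
    // targetTable and jdbcUrl are hypothetical parameters supplied by the caller.
    static void exportToTable(Dataset<Row> df, String jdbcUrl, String targetTable) {
        Properties props = new Properties(); // no credentials needed for default embedded Derby
        df.write()
                .mode(SaveMode.Append) // insert rows; the table is created if it is missing
                .jdbc(jdbcUrl, targetTable, props);
    }
}
```

Append is chosen over Overwrite here because Overwrite, by default, drops and recreates the target table, discarding any column definitions created by hand.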