How to Ingest Files: Part II
Learn how to load different file formats with the Spark API.
Ingestion of XML files
Extensible Markup Language (XML) files are still widely used as a data format.
These files are structured, extensible, and self-describing (the tags name the data they contain, which makes simple documents easy for humans to read), and they can be validated against an XSD schema file.
Note: For more information on the XML format, please refer to: https://www.w3.org/XML/
On the downside, XML files tend to be quite verbose and, depending on the complexity of their structure, can become very hard to read. Nonetheless, the format is widely used, and Spark has no trouble parsing it for us.
The project for this lesson is quite similar to the one from the previous lesson.
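As a rough illustration of what XML ingestion can look like, here is a minimal Python sketch. It is not the lesson's own project code. The sample file, the `book` row tag, and the helper names (`write_sample`, `count_row_elements`, `read_xml_with_spark`) are all hypothetical. The Spark part assumes that an XML data source is available under the short name `xml`, which should be the case with the spark-xml package on the classpath or with a Spark version that ships XML support. The `rowTag` option tells the reader which element becomes one row.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical sample data: each <book> element should become one row.
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<books>
  <book id="1"><title>Spark Basics</title><year>2019</year></book>
  <book id="2"><title>XML in Practice</title><year>2021</year></book>
</books>
"""


def write_sample(directory):
    """Write the sample document to <directory>/books.xml and return its path."""
    path = Path(directory) / "books.xml"
    path.write_text(SAMPLE_XML, encoding="utf-8")
    return path


def count_row_elements(path, row_tag):
    """Count the direct children of the root that match row_tag.

    This is a plain-Python sanity check of how many rows to expect.
    """
    root = ET.parse(str(path)).getroot()
    return len(root.findall(row_tag))


def read_xml_with_spark(path, row_tag="book"):
    """Load the XML file into a Spark DataFrame.

    Assumption: an XML data source is registered under the name "xml".
    PySpark is imported lazily so the helpers above work without it.
    """
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("XML ingestion sketch")
        .master("local[*]")
        .getOrCreate()
    )
    return (
        spark.read.format("xml")
        .option("rowTag", row_tag)  # element that maps to one row
        .load(str(path))
    )
```

With Spark and an XML data source installed, calling `read_xml_with_spark(write_sample("/tmp"))` and then `.show()` and `.printSchema()` on the result should display the parsed rows and the inferred schema.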