Ingestion Job: Part II

In this lesson, we continue inspecting the code of the Ingestion Batch Job example.

Processing the input

The project with the codebase for this lesson is the same as in the previous lesson:

mvn install -DskipTests
java -jar /usercode/target/batch-app-0.0.1-SNAPSHOT.jar jobName=ingesterJob clientId=client1 readFormat=json fileName=sales
Project with the IngesterJob code implementation
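
For reference, the sales input is a JSON file whose shape matches the schema we inspect later in this lesson. A hypothetical record could look like the following (the exact contents of the course's data file are not shown here):

{
  "Seller_Id": "seller1",
  "Sales": [
    {
      "Date": "2023-01-15",
      "Items": [
        { "Product": "laptop", "Quantity": 2 },
        { "Product": "mouse", "Quantity": 5 }
      ]
    }
  ]
}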

Once Spark is instructed to read the input (the JSON file contents) into our favorite logical abstraction, the DataFrame, the resulting object is passed to the job’s process method in the following manner:

@Override
protected Dataset<Row> process(Dataset<Row> preProcessOutput) {
    return (Dataset<Row>) ingesterProcessor.process(preProcessOutput);
}
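
For context, the framework produces this DataFrame before process is called. A minimal sketch of what that read step could look like, assuming a multi-line JSON source file (the actual pre-processing code and file path are not shown in this excerpt):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
        .appName("ingesterJob")
        .getOrCreate();

// readFormat=json and fileName=sales come from the command-line arguments;
// multiLine is assumed because each record spans several lines in the file.
Dataset<Row> preProcessOutput = spark.read()
        .option("multiLine", true)
        .json("/usercode/data/sales.json"); // hypothetical path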

The ingesterProcessor object receives the DataFrame as the argument to its process method. What goes on inside it? Let’s inspect the following code snippet:

@Component
public class IngesterProcessor implements Processor<Dataset<Row>> {

    private static final Logger LOGGER =
            LoggerFactory.getLogger(IngesterProcessor.class);

    @Override
    public Dataset<Row> process(Dataset<Row> inputDf) {
        LOGGER.info("Flattening JSON records...");
        // Apply a Spark flatMap transformation that flattens each nested
        // JSON record into plain rows matching the sales schema
        Dataset<Row> parsedResults = inputDf.flatMap(
                new IngesterJsonFlatMapper(),
                RowEncoder.apply(SalesSchema.getSparkSchema()));
        return parsedResults;
    }
}

The line applying the flatMap transformation packs a lot of interesting things:

  • On the inputDf, a flatMap transformation is applied to parse each nested JSON record into flat rows, each represented by a Row object (a sketch of such a flattening function follows the schema below). This “translation” step is needed because when Spark reads the JSON file, the contents are internally structured in the following way:
root
 |-- Sales: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Date: string (nullable = true)
 |    |    |-- Items: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- Product: string (nullable = true)
 |    |    |    |    |-- Quantity: long (nullable = true)
 |-- Seller_Id: string (nullable = true)
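
The IngesterJsonFlatMapper implementation is not shown in this excerpt, but a minimal sketch of such a flattening FlatMapFunction could look like the one below. Field names follow the schema above; the flat output layout (Seller_Id, Date, Product, Quantity) is an assumption about what SalesSchema.getSparkSchema() returns:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

// Hypothetical sketch; the course's actual IngesterJsonFlatMapper may differ.
public class IngesterJsonFlatMapper implements FlatMapFunction<Row, Row> {

    @Override
    public Iterator<Row> call(Row input) {
        List<Row> flatRows = new ArrayList<>();
        String sellerId = input.getAs("Seller_Id");
        // Each element of the Sales array is a struct holding Date and Items.
        List<Row> sales = input.getList(input.fieldIndex("Sales"));
        for (Row sale : sales) {
            String date = sale.getAs("Date");
            List<Row> items = sale.getList(sale.fieldIndex("Items"));
            // Emit one flat row per (sale, item) combination.
            for (Row item : items) {
                flatRows.add(RowFactory.create(
                        sellerId, date,
                        item.getAs("Product"),
                        item.getAs("Quantity")));
            }
        }
        return flatRows.iterator();
    }
}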

Interestingly enough, Spark’s internal representation of a JSON record it reads (the schema shown above) closely mirrors the JSON structure.

It contains a Sales array and a Seller_Id string field as root properties. The Sales array is a collection of elements, each named element, of type struct.

Struct refers here to an object that is composed of a ...