...

/

Performance Fundamentals and Recipes

Performance Fundamentals and Recipes

Learn the fundamentals of performance for a Spark application.

Many factors and constraints affect an application’s execution performance, such as its architecture, resources available, and non-functional requirements like data encryption. No one magic recipe can account for the myriad of applications and their nature when it comes to performance considerations.

Ultimately, applying a systemic approach, one of testing, gathering metrics and results, doing changes, testing again, and repeating the process might shed some light on the bottlenecks, overheads, or just poor design of an application. On the other hand, specific third-party libraries and frameworks, like Spark, are designed in a manner that imposes constraints on the application that uses its APIs. We have seen an example of this in the case of immutability, specifically when we talked about the impossibility of changing a DataFrame when a transformation is applied to it. Instead, a new DataFrame is always returned with changes reflected.

Constraints like this don’t have to be a foe, but rather a friend, when it comes to utilizing Spark in a performant way. With this in mind, this lesson tries to provide general guidelines and explain the fundamentals for setting the foundation of a robust Spark application, as well as describing some recipes commonly used in Spark developments.

Note: Application performance optimization tends to be a very complex topic that unfortunately cannot be explained in one or even several lessons, so this lesson might pack a lot of concepts. The best recipe might simply be, at the end of the day, getting our hands dirty with the development and testing of a Spark application dealing with huge volumes of information; this ...

Access this course and 1400+ top-rated courses and projects.