Operationalizing and Visualizing Machine Learning Models

Let's look at where machine learning models fit in when operationalizing and visualizing data.

We'll cover the following

- Enhancing performance and using big data tools
- Cleaning up our code
- Apache Arrow

Machine learning and deep learning are separate topics in their own right, but the skills covered so far can take our machine learning work to a new level. In the end, we use charts to express ideas about our data, and with a good interactive data visualization vocabulary, we can give our users many options for testing different models and tuning hyperparameters.

Enhancing performance and using big data tools

This is a very important topic, and we always need to make sure that our apps perform at an acceptable level. We didn’t tackle it in the course because the focus was mainly on learning how to create a Dash app, along with all the other details that make it work. We also worked with a very small dataset of a few megabytes. Still, even with a small dataset, optimizing how the app handles it can be crucial. Big data can be about handling a massive file, or it can be about a small file that needs to be handled a massive number of times.

Several things can be done to optimize performance. Big data is a separate topic altogether, though, so what follows are only some hints and some areas to explore.

Cleaning up our code

Once we know how our app will behave and what features we’ll be using, we can clean up some unnecessary code and data that might be hindering our app’s performance. Here are some ideas that can be implemented immediately in our app:

- Load the necessary data only: We loaded the whole file, and for each callback, we queried the DataFrame separately. That can be wasteful. For example, if we have a callback for population data only, we can create a separate file, and from it a separate subset DataFrame, that contains only the relevant columns, and then query that subset instead of the whole DataFrame.
- Optimize data types: Sometimes we need to load data that contains the same values repeated many times. For example, the poverty dataset contains many repetitions of country names. We can use the pandas categorical data type to store those values more efficiently.
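The first idea in the list above can be sketched as follows. This is a minimal illustration, not the course's own code: the CSV text and the column names are hypothetical stand-ins for the real data file, and `usecols` is the standard `pandas.read_csv` parameter for restricting which columns are parsed.

```python
import io

import pandas as pd

# Hypothetical stand-in for the full data file; the column names and
# values are illustrative assumptions, not the real poverty dataset.
csv_text = """Country Name,year,Population,GINI index
Kenya,2015,47878336,40.8
Kenya,2016,49051534,
Peru,2015,30470734,43.4
Peru,2016,30926032,43.6
"""

# Loading everything also parses columns the population callback never uses.
full_df = pd.read_csv(io.StringIO(csv_text))

# usecols limits parsing to the columns this callback actually needs.
population_cols = ["Country Name", "year", "Population"]
population_df = pd.read_csv(io.StringIO(csv_text), usecols=population_cols)

print(list(full_df.columns))
print(list(population_df.columns))
```

With a real file, the same `usecols` argument would be passed alongside the file path, and the callback would query only the smaller DataFrame.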

To see why this matters, we can load the sys module and compare the size in bytes of a string (a country name) with that of an integer.
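One way to make this comparison is sketched below, under a few assumptions: the country names are made-up sample values rather than the actual dataset, and the exact byte counts depend on the Python and pandas versions, so none are quoted here. The second half applies the same reasoning to a whole column by converting repeated names to the pandas categorical type.

```python
import sys

import pandas as pd

# A single country name versus a single small integer.
country = "United Kingdom"
code = 7

str_size = sys.getsizeof(country)  # bytes used by the string object
int_size = sys.getsizeof(code)     # bytes used by the integer object
print(str_size, int_size)

# Illustrative column with heavy repetition, as with country names.
names = pd.Series(["United Kingdom", "Kenya", "Peru"] * 1000)

# The categorical type stores each distinct name once and keeps a
# small integer code per row.
as_category = names.astype("category")

# deep=True counts the memory held by the string values themselves.
string_bytes = names.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)
print(string_bytes, category_bytes)
```

The savings grow with the amount of repetition; a column whose values are mostly unique would gain little from this conversion.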
