Operationalizing and Visualizing Machine Learning Models
Let's look at how the skills we've learned can help us operationalize and visualize machine learning models.
Machine learning and deep learning are completely separate topics, of course, but with all the previously mentioned skills, we can take our machine learning to a new level. At the end of the day, we’ll use charts to express certain ideas about our data, and with a good interactive data visualization vocabulary, we can give our users many options to test different models and tune hyperparameters.
Enhancing performance and using big data tools
This is a very important topic, and we always need to make sure that our apps perform at an acceptable level. We didn’t tackle this in the course because the focus was mainly to learn how to create a Dash app with all the other details that make it work. We also worked with a very small dataset of a few megabytes. Still, even with a small dataset, it can be crucial to optimize it. Big data can be about handling a massive file, or it can be about a small file that needs to be handled a massive number of times.
Big data is a separate topic altogether, so what follows is only a set of hints and areas to explore for optimizing performance.
Cleaning up our code
Once we know how our app will behave and what features we’ll be using, we can clean up some unnecessary code and data that might be hindering our app’s performance. Here are some ideas that can be implemented immediately in our app:
- Load the necessary data only: We loaded the whole file, and for each callback, we queried the DataFrame separately. That can be wasteful. For example, if we have a callback for population data only, we can create a separate file (and then a separate subset DataFrame) that contains only the relevant columns, and query that instead of the whole DataFrame.
- Optimize data types: Sometimes we need to load data that contains the same values repeated many times. For example, the poverty dataset contains many repetitions of country names. We can use the pandas categorical data type to optimize those values.
- Load the sys module, and see the difference in size in bytes for a string (a country name) and an integer.
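The ideas in this list can be tried out in a few lines. The following is a minimal sketch, not the course's own code: it builds a tiny in-memory CSV as a stand-in for the poverty dataset, and the column names (country, year, population, gini) and values are assumptions made for illustration only. It loads just the columns a population callback would need, compares the memory footprint of the country column stored as strings and as a pandas categorical, and uses sys.getsizeof to compare a single string with a single integer.

```python
import io
import sys

import pandas as pd

# Hypothetical stand-in for the poverty dataset: a tiny in-memory CSV.
# The column names and values are illustrative, not the real file's.
countries = ["Afghanistan", "Brazil", "Canada"]
rows = ["country,year,population,gini"]
for year in range(2000, 2020):
    for i, country in enumerate(countries):
        rows.append(f"{country},{year},{1_000_000 * (i + 1) + year},{30 + i}")
csv_text = "\n".join(rows)

# 1. Load only the columns that a population-only callback would need.
population_df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["country", "year", "population"],
)

# 2. Compare the memory used by the repeated country names when stored
#    as strings and when stored as a categorical column.
as_strings = population_df["country"].memory_usage(deep=True)
as_category = population_df["country"].astype("category").memory_usage(deep=True)

# 3. Compare the size in bytes of one string and one integer.
string_size = sys.getsizeof("Afghanistan")
integer_size = sys.getsizeof(1)

print(f"country column as strings:  {as_strings} bytes")
print(f"country column as category: {as_category} bytes")
print(f"one country name: {string_size} bytes, one integer: {integer_size} bytes")
```

The exact byte counts depend on the Python and pandas versions, so they are not listed here. The pattern to look for is that the categorical column takes less memory, because each distinct country name is stored once and every row holds only a small integer code that points to it.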