Creating Great AI Products: The Rules
This lesson provides logical, easy-to-understand rules for creating great AI products in every phase of production.
Rules of machine learning: Best practices for ML engineering
Google has published a document sharing its best practices in ML engineering. Everyone working on smart products should know these practices. There is no need to reinvent the wheel when we can learn from people who have mastered their craft by being on the battlefield time and time again.
“Do machine learning like the great engineer you are, not like the great machine learning expert you aren’t.” [cit. Google]
The document is extensive and detailed, and I recommend reading it in full. I have summarized it here so that you can recall the rules easily and use them to guide your team in creating great ML-driven products.
Phase 0 – Before ML: Understand whether the time is right for building a machine learning system
Before implementing ML in your system, consider the following rules to determine whether it is appropriate to incorporate ML.
Rule #1 – Don’t be afraid to launch a product without machine learning: Machine learning requires data. If it is not absolutely required for your product, start with some simple heuristics and wait until you have enough data.
Rule #2 – Design and implement the metrics first: Add metrics, and then add some more. Before formalizing what your machine learning system will do, track as much data as possible in your current system and design your system with metric instrumentation in mind. Gathering metrics liberally gives you a broader picture of how your system behaves.
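To make Rule #2 concrete, here is a minimal sketch of metric instrumentation added to a system before any ML exists. The `MetricsRecorder` class, the event names, and the fields are all hypothetical and chosen only for illustration; a real system would send these events to whatever logging or metrics backend it already uses.

```python
import time
from collections import Counter


class MetricsRecorder:
    """Hypothetical in-memory recorder (an assumption for this sketch)."""

    def __init__(self):
        self.counts = Counter()  # how often each event happened
        self.events = []         # raw events, kept for later analysis

    def record(self, name, **fields):
        # Count the event and keep its raw details with a timestamp.
        self.counts[name] += 1
        self.events.append({"name": name, "ts": time.time(), **fields})

    def rate(self, numerator, denominator):
        # Ratio of two event counts, e.g. clicks per impression.
        denom = self.counts[denominator]
        return self.counts[numerator] / denom if denom else 0.0


metrics = MetricsRecorder()
metrics.record("item_shown", user_id=1, item_id="a")
metrics.record("item_shown", user_id=1, item_id="b")
metrics.record("item_clicked", user_id=1, item_id="b")

click_through_rate = metrics.rate("item_clicked", "item_shown")
print(f"click-through rate: {click_through_rate:.2f}")
```

Because the raw events are stored alongside the counters, the same log can later become training data or be used to compare a heuristic against a model.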
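Rule #3 – Choose machine learning over a complex heuristic: A simple heuristic can get your product out the door, but a complex heuristic is hard to maintain. Start with simple heuristics. Once you have data and a basic idea of what you are trying to accomplish, move on to machine learning.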
Phase 1 – Your first ML pipeline
Once you have determined that machine learning is appropriate, these rules walk you through creating your first ML pipeline.
Rule #4 – Keep the first model simple and get the infrastructure right: Don’t start with fancy models and features. Focus on fixing the infrastructure issues first, e.g., ensure that your (simple) features correctly reach both the learning algorithm and the model on the server. This simple model will provide you with baseline metrics and a baseline behavior that you can then use to test more complex models.
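As an illustration of what a "simple first model" could look like, the sketch below fits a one-feature threshold classifier (a decision stump) and records its training accuracy as a baseline. The data, the function names, and the choice of a stump are assumptions made for this example, not something prescribed by Google's document.

```python
def fit_stump(xs, ys):
    """Pick the threshold on one numeric feature that maximizes training
    accuracy, predicting 1 when x >= threshold."""
    best_threshold, best_accuracy = None, -1.0
    for t in sorted(set(xs)):
        correct = sum((x >= t) == bool(y) for x, y in zip(xs, ys))
        accuracy = correct / len(xs)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = t, accuracy
    return best_threshold, best_accuracy


def predict(threshold, x):
    return int(x >= threshold)


# Toy data (hypothetical): one feature value and one binary label per example.
xs = [1, 2, 3, 7, 8, 9]
ys = [0, 0, 0, 1, 1, 1]

threshold, baseline_accuracy = fit_stump(xs, ys)
print(f"threshold={threshold}, baseline accuracy={baseline_accuracy:.2f}")
```

Any more complex model you try later should beat `baseline_accuracy` on the same data. If it does not, the extra complexity is not paying for itself.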
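Rule #5 – Test the infrastructure independently from machine learning: Have a testable infrastructure. Test that data gets into the algorithm correctly. Test that models come out of the training algorithm correctly, for example by checking that a model gives the same score in the training environment as it does in serving.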
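The two kinds of tests in Rule #5 can be sketched with plain assertions. Everything here is hypothetical: `extract_features`, the JSON model format, and the linear `score` function stand in for whatever your real pipeline uses.

```python
import json


def extract_features(raw):
    """Hypothetical feature extraction shared by training and serving."""
    return [float(raw["clicks"]), float(raw["impressions"])]


def score(weights, features):
    # A linear score; stands in for any model's prediction function.
    return sum(w * f for w, f in zip(weights, features))


def export_model(weights):
    # Serialize the trained model the way the training job would.
    return json.dumps({"weights": weights})


def load_model(blob):
    # Deserialize the model the way the serving system would.
    return json.loads(blob)["weights"]


def test_features_reach_algorithm():
    # Data flow: raw records become the feature vector we expect.
    assert extract_features({"clicks": 3, "impressions": 10}) == [3.0, 10.0]


def test_model_roundtrip_scores_match():
    # Model export: the served model scores exactly like the trained one.
    trained = [0.5, -0.1]
    features = extract_features({"clicks": 3, "impressions": 10})
    served = load_model(export_model(trained))
    assert score(served, features) == score(trained, features)


test_features_reach_algorithm()
test_model_roundtrip_scores_match()
print("infrastructure checks passed")
```

Neither test depends on model quality, so both can run on every change, long before there is a model worth evaluating.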
Rule #6 – Be careful about dropped data when copying pipelines: Often we create a new pipeline by copying an existing one, and the old pipeline drops data that the new pipeline needs. When you copy a pipeline, check which data the original filters out and confirm that the new pipeline still receives everything it needs.
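One way to catch this kind of silent data loss is to audit how much each inherited filter drops. The sketch below is an assumption-laden example: `audit_filter`, the `only_recent` filter, and the sample posts are hypothetical, and the acceptable drop fraction is something you would set for your own pipeline.

```python
def audit_filter(records, keep, max_drop_fraction):
    """Apply a filter and fail loudly if it drops more than expected."""
    kept = [r for r in records if keep(r)]
    dropped = len(records) - len(kept)
    drop_fraction = dropped / len(records) if records else 0.0
    if drop_fraction > max_drop_fraction:
        raise ValueError(
            f"filter dropped {dropped} of {len(records)} records "
            f"({drop_fraction:.0%}), more than the allowed "
            f"{max_drop_fraction:.0%}"
        )
    return kept


def only_recent(post):
    # Filter inherited from the old (hypothetical) pipeline,
    # which only cared about posts from the last day.
    return post["age_days"] <= 1


posts = [
    {"id": 1, "age_days": 0},
    {"id": 2, "age_days": 5},
    {"id": 3, "age_days": 30},
]

# The new pipeline expects to keep almost everything, so the audit trips.
try:
    audit_filter(posts, only_recent, max_drop_fraction=0.1)
    drop_detected = False
except ValueError as err:
    drop_detected = True
    print(f"audit failed: {err}")
```

Running such an audit once when the pipeline is copied is usually enough to surface a filter that made sense for the old use case but not for the new one.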
Rule #7 – Turn heuristics into features, or handle them externally: Don’t just discard existing heuristics related to the machine learning problem you are trying to solve. For example, don’t try to ...