Overview

Automated feature engineering is a powerful tool for reducing the amount of manual work needed in order to build predictive models. Instead of a data scientist spending days or weeks coming up with the best features to describe a dataset, we can use tools that approximate this process. One library I’ve been working with to implement this step is FeatureTools. It takes inspiration from the automated feature engineering process in deep learning. However, it is meant for shallow learning problems where you already have structured data but need to translate multiple tables into a single record per user.

In our pre-configured execution environment, gcc and Python3.7 are already installed along with the required libraries. To skip local installation instructions and get on with the applications, click here.

Installing libraries

The library can be installed as follows:

Press + to interact

In addition to this library, I loaded the framequery library, which enables writing SQL queries against dataframes. Using SQL to work with dataframes versus specific interfaces, such as Pandas, is useful when translating between different execution environments.

Getting started

The task we’ll apply the FeatureTools library to is predicting which games in the Kaggle NHL dataset are postseason games. We’ll make this prediction based on summarizations of the play events that are recorded for each game. Since there can be hundreds of play events per game, we need a process for aggregating these into a single summary per ...

Introduction to Building Scalable Model Pipelines

Models as Web Endpoints

Models as Serverless Functions

Create an Echo Function in Lambda

Working with S3 in Lambda

Working with API in Lambda

Containers for Reproducible Models

Working with AWS Container Registry

Workflow Tools for Model Pipelines

PySpark for Batch Pipelines

Cloud Dataflow for Batch Modeling

Streaming Model Workflows

Course Conclusion

Automated Feature Engineering

Overview

Installing libraries

Getting started