Introduction to Data Bias

Learn what data bias is and where it comes from.

While many of the pre-pipeline biases are not directly observed or created by data scientists, it’s important to be conscious of where and under what conditions data is sourced. In this lesson, we focus primarily on data bias.

Defined simply, data bias is a skew or tendency in the data that leads a model to draw potentially erroneous conclusions. In other words, it’s a property of a dataset that greatly increases ML risk downstream in the pipeline. Data bias is a general phenomenon that isn’t necessarily related to discrimination, but some of the most widely publicized cases of data bias come from improperly sourced datasets that produced discriminatory models.
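To make the definition concrete, here is a minimal sketch of a skew in the data producing a wrong conclusion even though no individual row is incorrect. It assumes a hypothetical population with two groups, "A" and "B", and a collection process that under-samples group B. The group names, rates, and helper functions (`make_population`, `skewed_sample`, `positive_rate`, `group_share`) are illustrative inventions, and the "model" is deliberately reduced to a single prevalence estimate.

```python
import random


def make_population(n_per_group=5000, seed=0):
    """Build a hypothetical population of (group, label) pairs.

    Group "A" has a ~20% positive rate and group "B" a ~60% positive rate,
    so the true overall rate is about 40%. All numbers are illustrative.
    """
    rng = random.Random(seed)
    rows = [("A", rng.random() < 0.2) for _ in range(n_per_group)]
    rows += [("B", rng.random() < 0.6) for _ in range(n_per_group)]
    return rows


def skewed_sample(population, keep_b=0.1, seed=1):
    """Simulate a biased collection process (an assumption for illustration):
    every "A" row is kept, but only about `keep_b` of the "B" rows are."""
    rng = random.Random(seed)
    return [row for row in population if row[0] == "A" or rng.random() < keep_b]


def positive_rate(rows):
    """A deliberately simple 'model': estimate P(label) from the rows it sees."""
    return sum(label for _, label in rows) / len(rows)


def group_share(rows, group):
    """Fraction of rows that belong to `group`."""
    return sum(1 for g, _ in rows if g == group) / len(rows)


population = make_population()
sample = skewed_sample(population)

print(f"share of group B: population={group_share(population, 'B'):.2f} "
      f"sample={group_share(sample, 'B'):.2f}")
print(f"positive rate:    population={positive_rate(population):.2f} "
      f"sample={positive_rate(sample):.2f}")
```

Because group B makes up only about 9% of the sampled rows instead of 50%, the estimate drawn from the sample lands near 0.24 rather than the population's roughly 0.40. Any model fit to this sample inherits the same skew, which is exactly the downstream risk described above.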

Misrepresentation in data

The most common source of data bias comes from skewed ...