Training Data Generation

Let's generate training data for the semantic image segmentation model.

Generating training examples

Autonomous driving systems have to be extremely robust, with little margin for error. This requires training each component model on all the scenarios that can occur in real life. Let's see how to generate such training data for semantic segmentation.

Human-labeled data

First, you will have to gather driving images and hire annotators to label them in a pixel-wise manner. Many tools, such as Labelbox, can facilitate the pixel-wise annotation of images.

Manual labelling of driving images to generate training data for semantic segmentation

All the other parts of the image are labeled similarly.
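Pixel-wise labels are usually stored as a mask: a 2D array with the same height and width as the image, where each entry is a class ID. The class names and the tiny hand-built mask below are purely illustrative, not tied to any specific labeling tool:

```python
import numpy as np

# Illustrative class IDs; real datasets define their own label maps.
CLASSES = {0: "road", 1: "car", 2: "pedestrian", 3: "sky"}

# A tiny 4x4 "annotated" frame: sky on top, road below, one car.
mask = np.array([
    [3, 3, 3, 3],
    [3, 3, 3, 3],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
])

# Per-class pixel counts are a quick sanity check on an annotation.
counts = {CLASSES[c]: int((mask == c).sum()) for c in CLASSES}
print(counts)  # {'road': 6, 'car': 2, 'pedestrian': 0, 'sky': 8}
```

A real annotation tool exports a full-resolution mask like this for every frame, which then serves as the target output of the segmentation model.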

Open source datasets

There are quite a few open-source datasets available with segmented driving videos/images. One such example is "BDD100K: A Large-scale Diverse Driving Video Database". It contains segmented data for full frames, as shown in the example below.

BDD100K: full frame segmentation
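When working with such a dataset, a first practical step is pairing each frame with its segmentation mask. The sketch below matches files by shared name stem; the directory layout and file names are assumptions for illustration, not the actual BDD100K release structure:

```python
from pathlib import PurePosixPath

def pair_frames_with_masks(image_paths, mask_paths):
    """Pair each image with the mask that shares its file name stem."""
    masks_by_stem = {PurePosixPath(p).stem: p for p in mask_paths}
    pairs = []
    for img in image_paths:
        stem = PurePosixPath(img).stem
        if stem in masks_by_stem:
            pairs.append((img, masks_by_stem[stem]))
    return pairs

# Hypothetical file layout for illustration only.
images = ["images/0001.jpg", "images/0002.jpg"]
masks = ["labels/0001.png"]
print(pair_frames_with_masks(images, masks))
# [('images/0001.jpg', 'labels/0001.png')]
```

Frames without a matching mask are silently dropped here; in practice you would log them, since missing labels often indicate an incomplete download or export.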

These two methods are effective for generating manually labeled data. In most cases, your training data distribution will match what you observe in real-world scene images. However, you may not have enough data for all the conditions you would like your model to make predictions for, such as snow, rain, and night. For an autonomous vehicle to be reliable, your segmenter should work well in all possible weather conditions, as well as cover a variety of obstacles on the road.

A segmenter trained on a certain type of images won't be able to handle unseen conditions reliably

The sunny vs rainy condition is just one example; there can be many such situations.

An important question is how to provide your model with training data for all conditions. One option is to manually label a large number of examples for each scenario. The second option is to use powerful data augmentation techniques to generate new training examples from your human-labeled data, especially for conditions that are missing in your training set. Let's discuss the second approach, which uses generative adversarial networks (GANs).
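Even before reaching for GANs, simple photometric augmentations can approximate some missing conditions. The sketch below darkens a frame to mimic low light; note that the label mask is reused unchanged, because the scene layout does not change, only its appearance. The function name and gain value are illustrative choices:

```python
import numpy as np

def simulate_low_light(image, mask, gain=0.3):
    """Scale pixel intensities toward black to mimic a night scene.

    The pixel-wise labels are returned as-is: augmentation changes
    appearance, not which class each pixel belongs to.
    """
    dark = np.clip(image.astype(np.float32) * gain, 0, 255).astype(np.uint8)
    return dark, mask

# A uniform gray 2x2 image with a toy label mask.
img = np.full((2, 2, 3), 200, dtype=np.uint8)
lbl = np.array([[0, 1], [1, 0]])
dark, same_lbl = simulate_low_light(img, lbl)
print(dark.max())  # 60
```

Such hand-crafted transforms are cheap but limited; realistic weather changes (snow cover, wet roads) alter texture in ways a brightness scale cannot capture, which is where learned generative approaches come in.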

Training data enhancement through GANs

In the big picture, your self-driving system should match human intelligence when it comes to making decisions and planning movements. The segmenter plays its role in creating a reliable system by accurately segmenting the driving scene in any condition that the vehicle experiences.

Achieving this target requires a diverse set of training data that covers all the possible permutations of the driving scene. For instance, assume that your dataset contains only ten thousand driving images captured in snowy Montreal conditions and fifty thousand images in sunny California conditions. You need to devise a way to enhance your training data by converting the snowy conditions of the ten thousand Montreal images to sunny conditions, and vice versa.
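The key property of this GAN-based enhancement is that a domain-translation generator (e.g. snowy-to-sunny) changes only the appearance of a frame, so the original segmentation mask remains a valid label for the translated image. A minimal sketch of that enhancement loop, with a stand-in brightness shift in place of a trained generator (all names here are hypothetical):

```python
import numpy as np

def enhance_dataset(pairs, translate):
    """Augment (image, mask) pairs with domain-translated copies.

    `translate` stands in for a trained GAN generator. The mask is
    preserved: translation changes appearance, not scene layout.
    """
    augmented = list(pairs)
    for image, mask in pairs:
        augmented.append((translate(image), mask))
    return augmented

# Stand-in for a trained snowy->sunny generator: a brightness shift.
fake_generator = lambda img: np.clip(
    img.astype(np.int16) + 40, 0, 255
).astype(np.uint8)

img = np.zeros((2, 2), dtype=np.uint8)
msk = np.array([[0, 1], [1, 0]])
out = enhance_dataset([(img, msk)], fake_generator)
print(len(out))  # 2: the original pair plus its translated copy
```

In a real pipeline, `translate` would be an image-to-image translation model trained on unpaired snowy and sunny frames, applied offline to grow the labeled set without any new annotation effort.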

The target includes two parts:

  1. Generating new
...