Annotation Formats

Learn to convert two annotation formats exported by CVAT into PyTorch tensors suitable for training a semantic segmentation CNN.

We learned to save the semantic segmentation annotation data in two formats: CVAT for images 1.1 and Segmentation mask 1.1.

Training a semantic segmentation model with PyTorch requires target tensors of type torch.int64. Each element of the target tensor must hold a long integer indicating the category index of the corresponding pixel in the original image. So, we need to run some code to convert the data exported by CVAT into suitable target tensors.
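To make the target format concrete, here is a minimal sketch of a target tensor and how a loss function consumes it. The sizes (one image, three categories, 4x4 pixels) and the random model output are assumptions chosen for illustration; they are not part of the CVAT export.

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes: 1 image, 3 categories, 4x4 pixels.
num_classes = 3

# Stand-in for a model output with shape (N, C, H, W).
logits = torch.randn(1, num_classes, 4, 4)

# Target: one category index per pixel, shape (N, H, W), dtype torch.int64.
target = torch.zeros(1, 4, 4, dtype=torch.int64)
target[0, 1:3, 1:3] = 2  # mark a 2x2 patch as category 2

# cross_entropy expects class indices as a long (int64) tensor.
loss = F.cross_entropy(logits, target)
print(target.dtype, tuple(target.shape))
```

Note that the target has no channel dimension: the category is stored as the value of each pixel, not as a one-hot vector.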

CVAT for images 1.1

The CVAT for images 1.1 format exports the semantic segmentation data into a single XML file. At the highest level, there is a <version> element and a <meta> element, where data about the annotation task is stored.
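As a rough sketch, the top-level elements can be read with Python's standard xml.etree.ElementTree module. The XML fragment below is a hand-written illustration, not a real export: the <annotations> root element and the task name are assumptions, and a real file stores much more inside <meta>.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; the root tag and the task name are assumed.
sample_xml = """<annotations>
  <version>1.1</version>
  <meta>
    <task>
      <name>hypothetical-task</name>
    </task>
  </meta>
</annotations>"""

root = ET.fromstring(sample_xml)

# Read the format version and locate the metadata element.
version = root.findtext("version")
meta = root.find("meta")

# List the tags found directly under the root element.
top_level_tags = [child.tag for child in root]
print(version, top_level_tags)
```

For a file on disk, ET.parse(path).getroot() would take the place of ET.fromstring.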
