Voxels

Learn how to use voxel grids and the cubify operator for 3D deep learning.

The voxel grid

The first data representation we introduce is the most straightforward one: the voxel grid. Essentially, the voxel grid approach is very similar to the dense 2D grid structure we see in images. The only difference is the extension into a 3rd spatial dimension. They are also called volumes, and PyTorch3D has its own Volumes class.

Rather than representing visual data (e.g., light) projected onto a 2D plane, voxel grids are a discrete data structure directly representing the 3D physical space. Instead of 2D pixels, which bin light within a rectangular receptive field, we have voxels, which are the 3D analog of pixels. Images can represent more than just color; images can represent depth from time-of-flight sensors or LiDARLiDAR, or Light Detection and Ranging, is a remote sensing technology that uses laser light to measure depth or distances with high precision. , radiation measurements from PET, SPECT, and CT scanning, and more. Likewise, voxels in a 3D space can represent density, probability, and color, just to name a few.

Press + to interact
Example of a voxel grid
Example of a voxel grid

This simple grid design allows us to carry over most of the same tools we use in deep neural networks for computer vision, such as convolutions, pooling, and activation functions. In many cases, other types of 3D data such as meshes and point clouds can be easily massaged into the voxel grid format as well.

Use cases

Voxel grids can be useful in applications where dense representation of data is needed.

Modeling occupancy

In the case of 3D shape data, the simplest technique involving voxel grids is to model the binary occupancy of the 3D space. In other words, every voxel contains either a 00 if geometry does not exist or a 11 if geometry does exist at that location in 3D space. This can be extended to a Bernoulli distributionThe Bernoulli distribution models a random experiment with two possible outcomes (usually success and failure) and is characterized by a single parameter representing the probability of success. by allowing each voxel to contain a probability pp in the range 0p10 \le p \le 1. Some techniques such as 3D ShapeNets use this representation to learn to recognize object categories, recognize object parts (also known as 3D part segmentation), and infer the full 3D shape of objects.

3D object detection

Voxel grids are also a commonly used data representation for 3D object detection. RoomPlan, a 3D floor-plan estimation API developed by Apple, also makes use of voxel grids. It provides a two stage pipeline that consists of a room layout estimation followed by 3D object detection (3DOD). In the first stage, a voxel grid represents the density of points aggregated from a point cloud. This is used as a feature in the room layout estimation network. The 3DOD stage also leverages a voxel representation as input to another model that predicts properties at each voxel such as objectnessThe likelihood that a given voxel corresponds to an object., type, and size.

Creating meshes

An advantage to using the voxel grid representation is the straightforward conversion to meshes. In Mesh R-CNN, Gkioxari et al. propose a neural network that can convert 2D RGB images into 3D meshes for specific categories of objects. One component of this network is a ...