Voxels
Learn how to use voxel grids and the cubify operator for 3D deep learning.
The voxel grid
The first data representation we introduce is the most straightforward one: the voxel grid. Essentially, the voxel grid approach is very similar to the dense 2D grid structure we see in images. The only difference is the extension into a 3rd spatial dimension. They are also called volumes, and PyTorch3D has its own Volumes
class.
Rather than representing visual data (e.g., light) projected onto a 2D plane, voxel grids are a discrete data structure directly representing the 3D physical space. Instead of 2D pixels, which bin light within a rectangular receptive field, we have voxels, which are the 3D analog of pixels. Images can represent more than just color; images can represent depth from time-of-flight sensors or
This simple grid design allows us to carry over most of the same tools we use in deep neural networks for computer vision, such as convolutions, pooling, and activation functions. In many cases, other types of 3D data such as meshes and point clouds can be easily massaged into the voxel grid format as well.
Use cases
Voxel grids can be useful in applications where dense representation of data is needed.
Modeling occupancy
In the case of 3D shape data, the simplest technique involving voxel grids is to model the binary occupancy of the 3D space. In other words, every voxel contains either a
3D object detection
Voxel grids are also a commonly used data representation for 3D object detection. RoomPlan, a 3D floor-plan estimation API developed by Apple, also makes use of voxel grids. It provides a two stage pipeline that consists of a room layout estimation followed by 3D object detection (3DOD). In the first stage, a voxel grid represents the density of points aggregated from a point cloud. This is used as a feature in the room layout estimation network. The 3DOD stage also leverages a voxel representation as input to another model that predicts properties at each voxel such as
Creating meshes
An advantage to using the voxel grid representation is the straightforward conversion to meshes. In Mesh R-CNN, Gkioxari et al. propose a neural network that can convert 2D RGB images into 3D meshes for specific categories of objects. One component of this network is a