Stacking Restricted Boltzmann Machines to Generate Images

Explore the theoretical background of how a DBN is trained and how TensorFlow 2’s gradient tape functionality can be utilized.


We have seen that an RBM with a single hidden layer can be used to learn a generative model of images; in fact, theoretical work has suggested that, with a sufficiently large number of hidden units, an RBM can approximate any distribution over binary values (Pearl, J., and Russell, S. (2000). Bayesian Networks. https://ftp.cs.ucla.edu/pub/stat_ser/r277.pdf). However, in practice, for very large input data, it may be more efficient to add additional layers rather than use a single very large layer, allowing a more "compact" representation of the data.

The researchers who developed DBNs also noted that stacking additional layers can only improve (never lower) the variational lower bound on the log-likelihood of the data under the generative model (Hinton, G. E., Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527-54. https://www.cs.toronto.edu/~hinton/absps/fastnc.pdf). In this case, the hidden layer output h of the first RBM becomes the input to a second RBM; we can keep adding layers in this way to make a deeper network. Furthermore, if we wanted this network to learn not only the distribution of the image (x) but also its label y, the digit from 0 to 9 that it represents, we could add one more layer to the stack of connected RBMs. This layer would use a softmax function to represent the probability distribution over the 10 possible digit classes.
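The sketch below shows one way this greedy, layer-wise stacking could look in TensorFlow 2. It is a minimal illustration, not the lesson's exact implementation: the RBM class, its CD-1 update, the layer sizes, and the untrained softmax "label" layer are all assumptions made only to show how the hidden output of one RBM becomes the input of the next.

```python
# Minimal sketch of stacking RBMs (hypothetical helper class, not a library API).
import tensorflow as tf

class RBM(tf.Module):
    def __init__(self, n_visible, n_hidden, name=None):
        super().__init__(name=name)
        self.W = tf.Variable(tf.random.normal([n_visible, n_hidden], stddev=0.01))
        self.b_v = tf.Variable(tf.zeros([n_visible]))
        self.b_h = tf.Variable(tf.zeros([n_hidden]))

    def sample_hidden(self, v):
        # P(h=1 | v) for Bernoulli hidden units, plus a stochastic sample
        p_h = tf.sigmoid(tf.matmul(v, self.W) + self.b_h)
        return p_h, tf.cast(tf.random.uniform(tf.shape(p_h)) < p_h, tf.float32)

    def sample_visible(self, h):
        # P(v=1 | h) for Bernoulli visible units, plus a stochastic sample
        p_v = tf.sigmoid(tf.matmul(h, self.W, transpose_b=True) + self.b_v)
        return p_v, tf.cast(tf.random.uniform(tf.shape(p_v)) < p_v, tf.float32)

    def cd_step(self, v0, lr=0.01):
        # One CD-1 update: positive phase from the data, negative phase
        # from a single Gibbs step, applied directly to the parameters.
        p_h0, h0 = self.sample_hidden(v0)
        p_v1, v1 = self.sample_visible(h0)
        p_h1, _ = self.sample_hidden(v1)
        pos = tf.matmul(v0, p_h0, transpose_a=True)
        neg = tf.matmul(v1, p_h1, transpose_a=True)
        batch = tf.cast(tf.shape(v0)[0], tf.float32)
        self.W.assign_add(lr * (pos - neg) / batch)
        self.b_v.assign_add(lr * tf.reduce_mean(v0 - v1, axis=0))
        self.b_h.assign_add(lr * tf.reduce_mean(p_h0 - p_h1, axis=0))

# Binarized MNIST digits as the visible data x
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x = tf.cast(x_train[:1024].reshape(-1, 784) / 255.0 > 0.5, tf.float32)

# Greedy stacking: train the first RBM on x, then treat its hidden
# activations as the "data" for the second RBM.
rbm1 = RBM(784, 256)
rbm2 = RBM(256, 64)
for _ in range(5):                       # a few toy epochs
    rbm1.cd_step(x)
p_h1, _ = rbm1.sample_hidden(x)          # hidden output of layer 1...
for _ in range(5):
    rbm2.cd_step(p_h1)                   # ...becomes the input of layer 2

# A softmax layer on top maps the deepest features to a distribution over
# the 10 digit classes y (shown untrained here, purely for the architecture).
softmax_top = tf.keras.layers.Dense(10, activation="softmax")
p_h2, _ = rbm2.sample_hidden(p_h1)
print(softmax_top(p_h2).shape)           # (1024, 10)
```

Passing the hidden probabilities (rather than binary samples) to the next layer is a common simplification; the full DBN training procedure discussed in this lesson refines the stacked weights further rather than stopping at this greedy pre-training step.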

Stacking RBMs

A problem with training a very deep graphical model such as a stack of RBMs is the "explaining-away" effect (Hinton, G. E., Osindero, S., and Teh, Y. W. A fast learning algorithm for deep belief nets, 1527-54). Recall that dependency between variables can complicate the inference of the state of the hidden variables.
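As a quick, hand-made illustration of explaining away (the probabilities below are hypothetical, chosen only to make the effect visible): two binary causes that are independent a priori become coupled once a shared effect is observed, because one active cause "explains away" the need for the other.

```python
# Tiny numeric illustration of explaining away with made-up probabilities.
import itertools

p_h = {0: 0.9, 1: 0.1}                    # independent prior on each cause
def p_v_given(h1, h2):                    # effect v is likely if either cause is on
    return 0.05 if (h1 == 0 and h2 == 0) else 0.9

# Joint P(h1, h2, v=1) over all four cause configurations
joint = {(h1, h2): p_h[h1] * p_h[h2] * p_v_given(h1, h2)
         for h1, h2 in itertools.product([0, 1], repeat=2)}
z = sum(joint.values())

p_h1_given_v = sum(p for (h1, _), p in joint.items() if h1 == 1) / z
p_h1_given_v_h2 = joint[(1, 1)] / (joint[(0, 1)] + joint[(1, 1)])
print(f"P(h1=1 | v=1)       = {p_h1_given_v:.3f}")    # ~0.43: h1 looks likely
print(f"P(h1=1 | v=1, h2=1) = {p_h1_given_v_h2:.3f}") # 0.10: h2 explains it away
```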
