Varieties of Networks: Convolution and Recursive

Learn about the history of improvements in neural networks and alternative deep models.

Until now, we’ve primarily discussed the basics of neural networks by referencing feedforward networks, where every unit in one layer is connected to every unit in the next. While these feedforward networks are useful for illustrating how deep networks are trained, they are only one of a broader family of architectures used in modern applications, including generative models.

Networks for seeing: Convolutional architectures

One of the inspirations for deep neural network models is the biological nervous system. As researchers attempted to design computer vision systems that would mimic the functioning of the visual system, they turned to the architecture of the retina, as revealed by physiological studies by the neurobiologists David Hubel and Torsten Wiesel in the 1960s (http://charlesfrye.github.io/FoundationalNeuroscience/img/corticalLayers.gif). As previously described, the physiologist Santiago Ramón y Cajal provided visual evidence that neural structures such as the retina are arranged in vertical networks (Wolfe, Kluender, and Levy (2009). Sensation and Perception. Sunderland: Sinauer Associates Inc.).

Figure: The "deep network" of the retina

Hubel and Wiesel studied the retinal system in cats, showing how their perception of shapes is composed of the activity of individual cells arranged in columns. Each column of cells detects a specific orientation of an edge in an input image; perceptions of complex shapes are stitched together from these simpler detected edges.

Early CNNs

This idea of columns inspired early research into CNN architectures (LeCun, Yann, et al. "Backpropagation applied to handwritten zip code recognition." Neural Computation 1.4 (1989): 541-551). Instead of learning individual weights between units as in a feedforward network, this architecture (shown below) uses shared weights within a group of neurons specialized to detect a specific edge in an image. The initial layer of the network (denoted H1) consists of 12 groups of 64 neurons each. Each of these groups is derived by passing a 5 x 5 grid over the 16 x 16-pixel input image; each of the 64 5 x 5 grids in this group shares the same weights but is tied to a different spatial region of the input. You can see that there must be 64 neurons in each group to cover the input image if their receptive fields overlap by two pixels.
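To make these shapes concrete, here is a minimal sketch of the H1 layer geometry using tf.keras. The stride of 2 and "same" padding are assumptions chosen to reproduce the 16 x 16 to 8 x 8 mapping described above, not details taken from the original paper.

```python
import tensorflow as tf

# A minimal sketch of the H1 geometry described above (assumed stride
# of 2 and 'same' padding to map the 16 x 16 input onto 8 x 8 maps).
h1 = tf.keras.layers.Conv2D(
    filters=12,      # 12 groups of shared-weight neurons
    kernel_size=5,   # each neuron sees a 5 x 5 patch of the input
    strides=2,       # overlapping receptive fields, 8 x 8 = 64 positions per group
    padding="same",
)

x = tf.random.normal((1, 16, 16, 1))  # one 16 x 16 single-channel image
print(h1(x).shape)                    # (1, 8, 8, 12): 12 maps of 64 units each

# Weight sharing keeps the layer small: each group learns just one
# 5 x 5 kernel plus a bias, reused at all 64 spatial positions.
print(h1.count_params())              # 312 = 12 * (5 * 5 * 1 + 1)
```

Note how few parameters the layer has: without weight sharing, tying separate 5 x 5 weights to each of the 64 positions in all 12 groups would require 64 times as many weights.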

When combined, these 12 groups of neurons in layer H1 form 12 8 x 8 grids representing the presence or absence of a particular edge within a part of the image; the 8 x 8 grid is effectively a down-sampled version of the image. This weight sharing makes intuitive sense in that the kernel represented by the shared weights is specialized to detect a distinct color and/or shape, regardless of where it appears in the image. One effect of this down-sampling is a degree of positional invariance: we only know that the edge occurred somewhere within a region of the image, not its exact location, because of the reduced resolution (the sketch following this paragraph illustrates the effect). Because they are computed by multiplying a 5 x ...
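As an illustration of that positional invariance, here is a small NumPy sketch. It is not code from the original paper: the 3 x 3 vertical-edge kernel and the stride-2 "valid" convolution are illustrative assumptions. A one-pixel shift of an edge that stays within the same stride window leaves the peak of the coarse response map in the same cell, so only the approximate location of the edge survives the down-sampling.

```python
import numpy as np

def conv2d_stride2(image, kernel):
    """Slide one shared-weight kernel over the image with a stride of 2."""
    kh, kw = kernel.shape
    out = np.zeros(((image.shape[0] - kh) // 2 + 1,
                    (image.shape[1] - kw) // 2 + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[2 * i:2 * i + kh, 2 * j:2 * j + kw]
            out[i, j] = np.sum(patch * kernel)  # same weights at every position
    return out

edge_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)  # simple vertical-edge detector

img = np.zeros((16, 16))
img[:, 7:] = 1.0          # vertical edge at column 7

shifted = np.zeros((16, 16))
shifted[:, 8:] = 1.0      # the same edge shifted one pixel to the right

# Both edges produce their strongest response in the same cell of the
# coarse map: the one-pixel shift is invisible at the reduced resolution.
print(np.argmax(conv2d_stride2(img, edge_kernel)[0]))      # 3
print(np.argmax(conv2d_stride2(shifted, edge_kernel)[0]))  # 3
```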