Model Design Cheat Sheet

Understand the model design cheat sheet and the different design choices it covers.

Now, we will provide an overview of the choices we can make when designing the architecture of GAN models, and of deep learning models in general. While it is perfectly fine to borrow model architectures directly from papers, we should also know how to adjust a model, or build a brand-new one from scratch, according to the practical problem at hand. Other factors, such as GPU memory capacity and expected training time, should also be considered when designing our models.

We will talk about the following:

  • Overall model architecture design

  • Choosing a convolution operation method

  • Choosing a downsampling operation method

Overall model architecture design

There are two main design processes for deep learning models. They suit different scenarios, and we should be comfortable with both:

  • Design the whole network directly. This works especially well for shallow networks, where we can add or remove any layer with ease. With this approach, it is easy to notice bottlenecks in the network (for example, which layer needs more or fewer neurons), which is extremely important when designing models that will run on mobile devices.

  • Design a small block/cell (containing several layers or operations) and repeat it several times to form the whole network. This process is very popular in very deep networks, especially in neural architecture search (NAS). It is harder to spot weak points in such a model because all we can do is adjust the block, train the whole network for hours, and see whether the adjustment leads to higher performance. A minimal sketch of this block-based approach follows this list.
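
To make the second process concrete, here is a minimal PyTorch sketch of a block-based design. The framework choice, layer types, channel counts, and names are illustrative assumptions, not anything prescribed by this lesson:

```python
import torch
import torch.nn as nn

def conv_block(channels):
    # A small reusable cell: convolution -> batch norm -> ReLU.
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        nn.BatchNorm2d(channels),
        nn.ReLU(inplace=True),
    )

class BlockBasedNet(nn.Module):
    def __init__(self, channels=64, num_blocks=4):
        super().__init__()
        # Repeat the same block several times to form the whole network.
        self.blocks = nn.Sequential(
            *[conv_block(channels) for _ in range(num_blocks)]
        )

    def forward(self, x):
        return self.blocks(x)

# Quick shape check with a dummy 64-channel feature map.
net = BlockBasedNet()
out = net(torch.randn(1, 64, 32, 32))
print(out.shape)  # torch.Size([1, 64, 32, 32])
```

Adjusting the design then means editing `conv_block` (or `num_blocks`) and retraining, rather than editing individual layers one by one.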

U-Net-shaped and ResNet-shaped networks are designed via a block-based approach and use skip connections to connect non-adjacent layers. There are two different forms of data flow in neural networks:

Figure: Forms of data flow in neural networks
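
As a rough PyTorch illustration of these two forms (the layer choices here are assumptions for demonstration only), a plain block passes each tensor to exactly one following layer, while a branching block routes its input along an extra path and merges it back in:

```python
import torch
import torch.nn as nn

class PlainBlock(nn.Module):
    # Plain data flow: each layer feeds only the next one.
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return self.body(x)

class BranchingBlock(nn.Module):
    # Branching data flow: a skip connection links non-adjacent layers.
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        # The input is added back to the output (ResNet-style merge).
        return self.body(x) + x

x = torch.randn(1, 16, 32, 32)
print(PlainBlock(16)(x).shape, BranchingBlock(16)(x).shape)
```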

You may have noticed that plain networks are often used in discriminators, and branching architectures are often used in generators. This is because generator networks are generally more difficult to train than discriminators, and the branches (for example, skip connections) pass low-level details to deeper layers in the forward pass and help gradients flow better in the backward pass.
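For example, a U-Net-shaped generator concatenates an early, low-level feature map with a deeper feature map before upsampling, so fine details learned near the input are reused near the output. The sketch below is an assumed toy encoder-decoder, not an actual generator from this course:

```python
import torch
import torch.nn as nn

class TinyUNetGenerator(nn.Module):
    """Toy encoder-decoder with one long skip connection (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(3, 32, 4, stride=2, padding=1)   # downsample
        self.mid = nn.Conv2d(32, 32, 3, padding=1)
        self.dec = nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1)  # upsample

    def forward(self, x):
        low = torch.relu(self.enc(x))           # low-level features, early layer
        deep = torch.relu(self.mid(low))        # deeper features
        merged = torch.cat([deep, low], dim=1)  # skip connection: reuse low-level details
        return torch.tanh(self.dec(merged))

g = TinyUNetGenerator()
print(g(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 3, 64, 64])
```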

When we deal with branches in a network, how the branches are merged (so that the tensors can be passed to the next block/cell with a uniform size) also has a great impact on the network’s performance. Here are the recommended approaches:

  • Concatenate all the tensors along the channel dimension and use another convolution layer to map the result to a smaller tensor. This way, information from all the input branches is preserved, and the relationships between them are learned by the convolution layer. However, be careful with this approach in very deep networks, since it costs more memory, and more parameters make the model more vulnerable to overfitting.

  • Directly sum all the input tensors. This is easy to implement, but it may not perform well when there are too many input branches.

  • Assign trainable weight factors to the branches ...
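
The sketch below illustrates these merging approaches in PyTorch; the module names, channel counts, and the 1x1 convolution used for channel reduction are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ConcatMerge(nn.Module):
    # Concatenate branch outputs, then use a 1x1 convolution to map them
    # back to a smaller, uniform number of channels.
    def __init__(self, branch_channels, num_branches, out_channels):
        super().__init__()
        self.reduce = nn.Conv2d(
            branch_channels * num_branches, out_channels, kernel_size=1
        )

    def forward(self, branches):
        return self.reduce(torch.cat(branches, dim=1))

class SumMerge(nn.Module):
    # Directly sum all branch outputs; simple, but every branch must
    # already have the same shape.
    def forward(self, branches):
        return torch.stack(branches, dim=0).sum(dim=0)

class WeightedSumMerge(nn.Module):
    # Assign one trainable scalar weight per branch before summing.
    def __init__(self, num_branches):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_branches))

    def forward(self, branches):
        return sum(w * b for w, b in zip(self.weights, branches))

branches = [torch.randn(1, 16, 8, 8) for _ in range(3)]
print(ConcatMerge(16, 3, 16)(branches).shape)  # torch.Size([1, 16, 8, 8])
print(SumMerge()(branches).shape)              # torch.Size([1, 16, 8, 8])
print(WeightedSumMerge(3)(branches).shape)     # torch.Size([1, 16, 8, 8])
```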