Pix2pixHD is an upgraded version of the pix2pix model, introduced in Wang, Ting-Chun, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, and Bryan Catanzaro, "High-resolution image synthesis and semantic manipulation with conditional GANs," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8798-8807, 2018. Its biggest improvement over pix2pix is that it supports high-quality image-to-image translation at a resolution of 2048×1024.
Model architecture
To make this happen, the authors designed a coarse-to-fine, two-stage approach to gradually train and refine the networks, as shown in the following diagram. First, a lower-resolution 1024×512 image is generated by a generator network G1, called the global generator (the red box). Second, a generator network G2, called the local enhancer network (the opaque box), enlarges the result to 2048×1024. It is also possible to append another local enhancer network to generate 4096×2048 images. Note that the last feature map of G1 is also fed into G2 (before G2's residual blocks) via an element-wise sum, which introduces more global information into the higher-resolution image. During training, G1 is trained first, then G2 is trained on top of it, and finally both are fine-tuned jointly.
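The data flow between G1 and G2 can be sketched in PyTorch. This is a minimal sketch under stated assumptions, not the authors' implementation. The class names `ResBlock`, `GlobalGenerator`, and `LocalEnhancer` and the `ngf`/`n_res` width and depth parameters are hypothetical. Layer counts are heavily reduced, and zero padding stands in for the reflection padding used in the paper. What it does illustrate is the structure described above. G1 runs on a 2× downsampled copy of the input. G2's front end brings the full-resolution input down to the same size and channel count. The two feature maps are summed element-wise, and G2's residual blocks and back end then produce the full-resolution output.

```python
import torch
import torch.nn as nn


class ResBlock(nn.Module):
    """Residual block x + F(x); keeps channel count and spatial size."""

    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch),
        )

    def forward(self, x):
        return x + self.body(x)


class GlobalGenerator(nn.Module):
    """Simplified G1 (hypothetical sizes): downsample -> residual blocks -> upsample."""

    def __init__(self, in_ch: int, out_ch: int, ngf: int = 64, n_res: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, ngf, 7, padding=3), nn.InstanceNorm2d(ngf), nn.ReLU(inplace=True),
            nn.Conv2d(ngf, ngf * 2, 3, stride=2, padding=1),
            nn.InstanceNorm2d(ngf * 2), nn.ReLU(inplace=True),
            *[ResBlock(ngf * 2) for _ in range(n_res)],
            nn.ConvTranspose2d(ngf * 2, ngf, 3, stride=2, padding=1, output_padding=1),
            nn.InstanceNorm2d(ngf), nn.ReLU(inplace=True),
        )
        # Image output layer, only needed when G1 is trained on its own (first stage).
        self.to_image = nn.Sequential(nn.Conv2d(ngf, out_ch, 7, padding=3), nn.Tanh())

    def forward(self, x):
        feat = self.features(x)  # last feature map, same H x W as the input
        return self.to_image(feat), feat


class LocalEnhancer(nn.Module):
    """Simplified G2 wrapped around G1 (hypothetical sizes)."""

    def __init__(self, in_ch: int, out_ch: int, ngf: int = 32, n_res: int = 3):
        super().__init__()
        # G1 sees the half-resolution input; its last feature map has 2*ngf channels.
        self.global_gen = GlobalGenerator(in_ch, out_ch, ngf=ngf * 2, n_res=n_res)
        self.downsample = nn.AvgPool2d(3, stride=2, padding=1, count_include_pad=False)
        # G2 front end: full-resolution input -> half resolution with 2*ngf channels.
        self.front = nn.Sequential(
            nn.Conv2d(in_ch, ngf, 7, padding=3), nn.InstanceNorm2d(ngf), nn.ReLU(inplace=True),
            nn.Conv2d(ngf, ngf * 2, 3, stride=2, padding=1),
            nn.InstanceNorm2d(ngf * 2), nn.ReLU(inplace=True),
        )
        # G2 back end: residual blocks, then upsample back to the input resolution.
        self.back = nn.Sequential(
            *[ResBlock(ngf * 2) for _ in range(n_res)],
            nn.ConvTranspose2d(ngf * 2, ngf, 3, stride=2, padding=1, output_padding=1),
            nn.InstanceNorm2d(ngf), nn.ReLU(inplace=True),
            nn.Conv2d(ngf, out_ch, 7, padding=3), nn.Tanh(),
        )

    def forward(self, x):
        g1_feat = self.global_gen.features(self.downsample(x))  # e.g. 1024x512 branch
        fused = self.front(x) + g1_feat  # element-wise sum before G2's residual blocks
        return self.back(fused)  # e.g. 2048x1024 output


if __name__ == "__main__":
    g = LocalEnhancer(in_ch=3, out_ch=3, ngf=8)
    x = torch.randn(1, 3, 64, 128)  # N, C, H, W with the same 2:1 aspect ratio as 2048x1024
    with torch.no_grad():
        y = g(x)
    print(y.shape)  # torch.Size([1, 3, 64, 128])
```

Summing the two feature maps, rather than concatenating them, requires G1's last feature map and G2's front-end output to agree in both spatial size and channel count. For that reason the sketch gives G1 twice G2's base width. Stacking a further enhancer for 4096×2048 output would follow the same pattern one level up.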