The Pix2Pix Generative Adversarial Network, or GAN, is an approach to training a deep convolutional neural network for image-to-image translation tasks.
The careful configuration of the architecture as a type of image-conditional GAN allows both the generation of larger images than prior GAN models (e.g. 256×256 pixels) and strong performance across a variety of image-to-image translation tasks.

We hear a lot about language translation with deep learning, where a neural network learns a mapping from one language to another; Google Translate uses this approach to translate between more than 100 languages. But can we do a similar task with images? Certainly, yes! If it is possible to capture the intricacies of language, it is surely possible to translate one image into another. Indeed, this shows the power of deep learning.

The pix2pix GAN paper was published in 2016 by Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. Find the paper here. It was later revised in 2018. When it was published, internet users tried something creative: they applied the pix2pix system to a variety of scenarios, such as frame-by-frame translation of a video so that one person appears to mimic another person's moves. Cool, isn't it?

Using pix2pix, we can map any image to another image, for example mapping the edges of an object to a photo of that object. Further on, we'll explore its architecture and how it works in detail. Now, let's dive right in!
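Before diving into the architecture, it may help to see the shape of the objective pix2pix optimizes: an adversarial loss that pushes generated images to fool the discriminator, plus an L1 reconstruction term (weighted by lambda, 100 in the paper) that keeps the output close to the target image. The sketch below is a simplified NumPy illustration, not the paper's implementation: the function name is ours, the discriminator is reduced to scalar scores (the real model scores image patches), and we use the common non-saturating form of the adversarial term.

```python
import numpy as np

def pix2pix_generator_loss(d_fake, target, generated, lam=100.0, eps=1e-7):
    """Illustrative sketch of the pix2pix generator objective:
    adversarial term + lambda * L1 reconstruction term.

    d_fake: discriminator outputs in (0, 1) for the generated images
            (simplified to scalars here; pix2pix actually scores patches).
    target, generated: arrays of the same shape (ground-truth and output).
    """
    # Non-saturating adversarial term: low when the discriminator
    # believes the generated images are real (d_fake close to 1).
    adv = -np.mean(np.log(d_fake + eps))
    # Pixel-wise L1 distance encourages outputs near the ground truth.
    l1 = np.mean(np.abs(target - generated))
    return adv + lam * l1

# Example: a perfect generator (output matches target, discriminator
# fully fooled) drives the loss toward zero.
loss = pix2pix_generator_loss(np.array([1.0]),
                              np.ones((2, 2)),
                              np.ones((2, 2)))
```

The large lambda reflects a key finding of the paper: the adversarial term alone produces sharp but unfaithful outputs, while the L1 term alone produces blurry ones; combining them gives images that are both sharp and close to the target.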