IMAGE-TO-IMAGE TRANSLATION WITH CONDITIONAL ADVERSARIAL NETWORKS
Yuanjie Lu 9th April
What is image-to-image translation?
■ Just as a concept in language can be expressed in Chinese, French, Italian, etc.,
■ a visual scene can be rendered as RGB, gradient fields, boundary maps, semantic label maps, etc.
■ We can supply an outline as input and get a corresponding picture as output
■ Given enough training data, we can translate one representation of a scene into another
■ A generative adversarial network (GAN) consists of two important parts:
– Generator: produces data (in most cases an image); its purpose is to fool the discriminator
– Discriminator: its purpose is to detect the "fake data" made by the generator
■ Algorithms:
– … Generative Adversarial Networks)
– … Adversarial Networks)
– … Variation)
■ Conditional Generative Adversarial Network (c-GAN)
■ It learns not only the mapping from input image to output image, but also a loss function to train this mapping.
■ This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations.
– Although CNNs achieve good results in image recognition, they run into trouble in terms of […] because they only calculate the global average distance
– We need to tell the CNN what to train and what to learn.
– Although the learning process is automatic, designing the loss still takes a lot of manual effort
– Overall, CNNs learn to minimize a loss function: an objective that scores the quality of results
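The "global average distance" problem can be illustrated with a small sketch. It assumes, purely for illustration, that one pixel has several equally plausible values (the numbers are made up). The value that minimises the squared (L2) error is their average, which matches none of them, while the value that minimises the absolute (L1) error is their median.

```python
from statistics import mean, median

# Hypothetical example: three equally plausible grey values for one pixel.
plausible = [0, 0, 255]

def l2_cost(v):
    # Sum of squared distances to every plausible value.
    return sum((v - p) ** 2 for p in plausible)

def l1_cost(v):
    # Sum of absolute distances to every plausible value.
    return sum(abs(v - p) for p in plausible)

candidates = range(256)
best_l2 = min(candidates, key=l2_cost)  # coincides with the mean
best_l1 = min(candidates, key=l1_cost)  # coincides with the median

print(best_l2, mean(plausible))
print(best_l1, median(plausible))
```

Under L2 the best single guess is a washed-out grey that none of the plausible outputs contains, which is one way to picture why averaging-style losses blur.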
■ What is different:
– The adversarial loss tries to classify whether the output image is real or fake.
– During training, the generator sees not only random noise but also the corresponding input image
– Overall, a c-GAN generates output pictures that correspond to the input pictures it is given.
■ GAN: G: z → y; c-GAN: G: {x, z} → y, where x is the input image, z is the random noise vector, and y is the output image.
■ A simple GAN generates an image from noise, aiming for output that the discriminator cannot distinguish from real images.
■ The conditional GAN generates an image from given observed data, again aiming for output that the discriminator cannot distinguish from real.
■ An L1 loss is used as well, because images generated with L1 are clearer and less blurry (than with L2)
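The mappings above correspond to the following objective, written out in the standard conditional GAN form with an added L1 term. The weight λ is a hyperparameter whose value is not given in these slides.

```latex
\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\big[\log D(x, y)\big]
                         + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big]

\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\big[\lVert y - G(x, z) \rVert_1\big]

G^{*} = \arg\min_{G} \max_{D} \; \mathcal{L}_{cGAN}(G, D) + \lambda \, \mathcal{L}_{L1}(G)
```

D tries to maximise the first expression by telling real pairs (x, y) from generated pairs (x, G(x, z)), while G tries to minimise it and at the same time stay close to y in the L1 sense.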
■ Generator
– Generator with skip connections
■ Discriminator
– Markovian discriminator (PatchGAN)
■ The generated images have the same underlying structure as the real images but different textures, so their structure is roughly aligned
■ If only an encoder-decoder such as a VAE (Variational Autoencoder) is used, all information has to pass through every layer, so low-level information shared between input and output is processed unnecessarily instead of being passed across directly.
■ How to use skip connections?
– For example, connect layer 1 to layer n, layer 2 to layer n−1, and so on.
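The pairing rule above (layer 1 with layer n, layer 2 with layer n−1, and so on) can be sketched as follows. This is a toy illustration, not a real network: each "layer" only records what it received, and the names `skip_pairs` and `forward` are hypothetical.

```python
def skip_pairs(n):
    # Layer i is paired with layer n + 1 - i (1-indexed): 1<->n, 2<->n-1, ...
    return [(i, n + 1 - i) for i in range(1, n // 2 + 1)]

def forward(x, n):
    """Toy forward pass over n layers; a 'layer' just tags its inputs."""
    partner = {dec: enc for enc, dec in skip_pairs(n)}
    saved = {}
    h = [x]
    for layer in range(1, n + 1):
        if layer in partner:
            # Skip connection: concatenate the paired earlier layer's output.
            h = h + saved[partner[layer]]
        h = [f"L{layer}({','.join(h)})"]
        saved[layer] = h
    return h[0]

print(skip_pairs(5))
print(forward("x", 5))
```

With n = 5, layer 3 acts as the bottleneck, and the last layer receives the output of layer 1 directly instead of only what survived the bottleneck.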
■ Although L1 and L2 losses often produce blurry images, they capture low-frequency structure well.
■ With the L1 term handling low frequencies, we only need a suitable loss that captures high-frequency information to get the effect we want.
■ So they chose PatchGAN
■ The smaller the PatchGAN, the fewer parameters it has, so the model runs faster and can be applied to large images.
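How large a patch a PatchGAN-style discriminator "sees" per output value depends on its stack of convolutions. The sketch below computes that receptive field. The two layer stacks are assumptions chosen for illustration (4x4 kernels with the listed strides) and are not taken from these slides.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride), ordered from input to output."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the view by (k - 1) input steps
        jump *= stride             # striding spreads later layers further apart
    return rf

# Assumed deeper stack: five 4x4 convolutions with strides 2, 2, 2, 1, 1.
deep_stack = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
# Assumed shallower stack: three 4x4 convolutions with strides 2, 1, 1.
shallow_stack = [(4, 2), (4, 1), (4, 1)]

print(receptive_field(deep_stack))
print(receptive_field(shallow_stack))
```

The shallower stack judges smaller patches with fewer layers, and because the discriminator is convolutional, the same weights slide over an image of any size.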
■ To optimize the generator, maximize log D(x, G(x, z)) rather than minimizing log(1 − D(x, G(x, z)))
■ Use minibatch SGD with the Adam solver
■ Learning rate = 0.0002
■ Momentum parameters β1 = 0.5, β2 = 0.999
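Why maximizing log D(x, G(x, z)) is preferred can be seen from the gradients. The sketch below assumes D is a sigmoid applied to a raw score `a` (an assumption made for illustration) and compares the gradient of both generator objectives with respect to that score when D confidently rejects a generated image.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def grad_saturating(a):
    # d/da of log(1 - sigmoid(a)): the term the generator would minimize.
    return -sigmoid(a)

def grad_non_saturating(a):
    # d/da of -log(sigmoid(a)): equivalent to maximizing log D.
    return -(1.0 - sigmoid(a))

# Hypothetical raw score early in training: D is almost sure the image is fake.
a = -6.0
print(sigmoid(a))
print(grad_saturating(a))
print(grad_non_saturating(a))
```

When D is confident (sigmoid(a) close to 0), the first gradient almost vanishes while the second stays close to −1, so the generator keeps receiving a useful training signal.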
■ First, use Amazon Mechanical Turk (AMT) workers to judge whether images are real or fake
– The authors believe the ultimate goal of such an image generation task is to make it impossible for people to tell whether an image is fake or real, so this is a suitable test
■ Second, the FCN-score
– The generated image is treated as if it were a real image: if the generated image is realistic, a classifier trained on real images should be able to classify it correctly as well.
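The second metric can be sketched as a comparison of label maps. The sketch assumes that a classifier trained on real images has already produced a per-pixel label map `pred` for a generated image, and that `truth` is the label map the image was generated from. Both maps below are made-up 2x2 examples, and the function names are hypothetical.

```python
def pixel_accuracy(pred, truth):
    # Fraction of pixels whose predicted label equals the true label.
    pairs = [(p, t) for pr, tr in zip(pred, truth) for p, t in zip(pr, tr)]
    return sum(p == t for p, t in pairs) / len(pairs)

def mean_iou(pred, truth, num_classes):
    # Intersection-over-union per class, averaged over classes that occur.
    ious = []
    for c in range(num_classes):
        inter = union = 0
        for pr, tr in zip(pred, truth):
            for p, t in zip(pr, tr):
                inter += (p == c and t == c)
                union += (p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)

truth = [[0, 0], [1, 1]]  # hypothetical ground-truth labels
pred = [[0, 1], [1, 1]]   # hypothetical labels predicted on the generated image

print(pixel_accuracy(pred, truth))
print(mean_iou(pred, truth, num_classes=2))
```

The more realistic the generated image, the closer the labels predicted on it should be to the ground truth, and the higher both scores.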