GauGAN: From Sketch to Masterpiece
March 22, 2019

A lot of cool stuff is brewing at Nvidia Research, whose mission is to plant the seeds of future success by developing technology that will make a positive impact on the company. 

One such project that caught my attention at GTC 2019 is GauGAN. It is an image creator that leverages generative adversarial networks, or GANs, to transform simplistic shapes into photorealistic landscapes. 

The premise is simple, and so is the operation – from the user's standpoint. This is how it works: A person uses a palette at the bottom of the screen to draw a simple, coloring-book-like line drawing, which appears on the left side of the screen. On the right side, GauGAN interprets this and substitutes a rich, detailed object or objects. The image can then be further customized through the input – a circular motion produces a puffy cloud, for instance. 

The goal is to go from a segmentation map to a realistic photo in real time (or within a few seconds, depending on your hardware). The image is "painted" by a deep learning model developed by Nvidia Research. Bryan Catanzaro, vice president of applied deep learning research, likens the technology behind GauGAN to a smart paintbrush that can fill in the details inside rough segmentation maps – the outlines that show the location of objects in a scene.
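To make the segmentation-map idea concrete, here is a minimal sketch of what such a map looks like as data. The class IDs, canvas size, and one-hot encoding below are illustrative assumptions, not GauGAN's actual label set or input format:

```python
# A sketch of a segmentation map: each pixel holds an integer class ID.
# Class IDs here (0 = sky, 1 = water, 2 = tree) are hypothetical.
import numpy as np

seg_map = np.zeros((256, 256), dtype=np.int64)
seg_map[128:, :] = 1          # bottom half of the canvas painted as water
seg_map[100:128, 60:90] = 2   # a small patch of trees above the pond

num_classes = 3
# One-hot encode to shape (num_classes, H, W), a common input format
# for semantic image synthesis models.
one_hot = np.eye(num_classes, dtype=np.float32)[seg_map].transpose(2, 0, 1)
print(one_hot.shape)  # (3, 256, 256)
```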

“Nvidia has been a graphics company for 25 years. Now we have the chance to apply AI to the process of creating computer graphics. Turing with Tensor Cores lets us run neural networks like never before,” said Catanzaro.

Draw a pond, and nearby elements like trees and rocks will appear as reflections in the water. How does the neural network know to do this? It has been trained on a million Flickr images, which taught it how the elements of a landscape relate to one another. Place snow on the ground instead of grass, for example, and the model turns leafy trees barren. The more it is trained, the “smarter” it gets. You will be left asking yourself: is it a photo or not? 

GANs can produce convincing results because they pit two networks against each other: a generator and a discriminator. The generator creates images and presents them to the discriminator. Trained on real images, the discriminator coaches the generator with pixel-to-pixel feedback on how to improve the realism of its synthetic images. Having seen real ponds and lakes, the discriminator knows they contain reflections, so the generator learns to create a convincing imitation.
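For readers curious what this adversarial setup looks like in code, here is a minimal, generic GAN training loop in PyTorch. The tiny networks, image sizes, and hyperparameters are placeholders for illustration only – this is not GauGAN's actual architecture:

```python
# A minimal generator/discriminator training loop sketch in PyTorch.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real_images = torch.randn(32, 784)  # stand-in for a batch of real photos

for step in range(100):
    # Discriminator: learn to score real images high and fakes low.
    fake_images = G(torch.randn(32, 64)).detach()
    d_loss = bce(D(real_images), torch.ones(32, 1)) + \
             bce(D(fake_images), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to fool the discriminator into scoring fakes as real.
    fake_images = G(torch.randn(32, 64))
    g_loss = bce(D(fake_images), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

As the loop alternates, the discriminator's feedback pushes the generator toward images it can no longer distinguish from the real ones – the same dynamic that teaches GauGAN to put reflections in ponds.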

"This technology is not just stitching together pieces of other images, or cutting and pasting textures," says Catanzaro. "It's synthesizing new images, very similar to how an artist would draw something." 

The tool also lets users add a style filter, adapting a generated image to the style of a particular painter, or changing a daytime scene to sunset.

GauGAN could be used to create realistic virtual worlds for industries such as architecture and gaming. As Catanzaro said, it is easier to brainstorm designs with simple sketches, and this technology converts those sketches into highly realistic images.

And the best part? Soon you will likely be able to try it out for yourself. A new website lets the public explore the wonders of neural networks: www.nvidia.com/ai-playground. Some work remains before GauGAN is added to the site, but there are other apps there that you can already try on your own computer. 

Meanwhile, the research paper behind GauGAN has been accepted as an oral presentation at the CVPR conference in June.