While introducing new technology, Nvidia also introduced new vocabulary that reflects the significant power of the new GPUs.
This year, it was almost as if there were two conferences occurring at SIGGRAPH 2018: the version we know well, and a smaller, parallel event, complete with a separate keynote, sponsored by Nvidia.
For years now, Nvidia has been hosting its own successful GPU conference. This year, the company brought a mini version of that event, held this past March, to the SIGGRAPH attendees – during which Nvidia founder/CEO Jensen Huang introduced new technology, which became the belle of the SIGGRAPH 2018 ball.
Huang took the stage Monday afternoon, a day before the SIGGRAPH conference floor opened, to present the Nvidia keynote, which followed on the heels of ILM’s Rob Bredow’s SIGGRAPH keynote on pushing the bounds of creativity. Huang’s speech focused on pushing the bounds of creativity through technology, specifically new technology he was there to introduce to the crowd of content creators and industry pioneers.
Twenty-five years ago, Nvidia began its quest to generate amazing images. Then Huang proudly held up for all to see Nvidia’s latest innovation: the Quadro RTX GPU, the first based on the Turing architecture, which allows artists to render photorealistic scenes in real time and interact seamlessly with complex models.
“Today is an historic moment,” Huang said right before the product reveal. “We have something very special to share with you.”
That “something” was indeed special. The world’s first raytracing GPU.
The editor inside a replica of J. Turner Whitted's "figure 6" from his paper "An Improved Illumination Model for Shaded Display," which became a key part of the foundation for modern ray traced computer graphics. The figure illustrates the effect of refraction through a transparent object. The display was presented on the SIGGRAPH show floor by Nvidia.
That Was Then…
To really appreciate the technological leap available from the RTX, Huang took the audience on a brief trip down memory lane, offering a look at how far graphics have evolved. And what better way to illustrate this than to look at the accomplishments by Pixar, which is celebrating its 30th anniversary. “Over the course of the last 30 years, the progress they have made, their relentless pursuit for image perfection, has been nothing short of amazing,” Huang said. In 1995, Toy Story debuted, using NURBS to generate characters, motion blur, and so forth. “The imagery was just beautiful,” he said, “processed with 800,000 CPU hours on a 100 MHz, 27 Mflops CPU. So cute!”
Several years later, on Cars, Pixar introduced raytracing, “and got beautiful reflections, beautiful shadows, ambient occlusion….” In 2006, the studio used global illumination for the very first time, so they didn’t have to paint their movies with hundreds of cameras inside the shots, faking light. “With global illumination, light suddenly works just as it should, with subtleties of light bouncing all over the environment and picking up color as it goes along, and in doing so, saved so much time for their artists, since they no longer had to rig up these sets with virtual cameras.”
Pixar took things a step further with Finding Dory in 2016, with water environments and rays bouncing off the waves in the water, and reflections and refractions that exploded the computation necessary to render that scene. And they invented something else: With state-of-the-art technology and deep learning, Pixar created AI-based de-noising, filling in all the spots the rays had not yet reached, and reduced the render time tremendously.
“Even then, the amount of computation time required was amazing,” Huang said, “200 million CPU core hours, each core now 2 GHz.” Nevertheless, this project represented a 4,000 times increase in the amount of CPU core hours needed, despite the increased power.
To explain this, you need to look at it from CG pioneer Jim Blinn’s perspective as it pertains to Blinn’s Law, which asserts that rendering time tends to remain constant, even as computers get faster. Thus, artists would continue to use all the computational power provided.
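The figures Huang quoted can be set side by side with a little arithmetic. The sketch below is only a rough reading of those numbers: it assumes clock rate is a fair stand-in for per-core throughput, which ignores every architectural gain since 1995, and the constant and function names are illustrative.

```python
# Figures as quoted in the keynote. Treating clock rate as a proxy for
# per-core throughput is an assumption made purely for illustration.
TOY_STORY_CPU_HOURS = 800_000          # on a 100 MHz CPU
TOY_STORY_CLOCK_HZ = 100e6
LATER_FILM_CORE_HOURS = 200_000_000    # on 2 GHz cores
LATER_FILM_CLOCK_HZ = 2e9


def growth_factors():
    """Return (core-hour growth, clock growth, clock-normalized work growth)."""
    hours_ratio = LATER_FILM_CORE_HOURS / TOY_STORY_CPU_HOURS
    clock_ratio = LATER_FILM_CLOCK_HZ / TOY_STORY_CLOCK_HZ
    return hours_ratio, clock_ratio, hours_ratio * clock_ratio


hours_ratio, clock_ratio, work_ratio = growth_factors()
print(f"Core hours grew {hours_ratio:.0f}x")
print(f"Clock rate grew {clock_ratio:.0f}x")
print(f"Clock-normalized work grew roughly {work_ratio:.0f}x")
```

Under these assumptions the core-hour count grows 250-fold and the clock-normalized work about 5,000-fold, the same order of magnitude as the 4,000 times increase cited above. That is the pattern Blinn's Law describes: the extra compute is spent on richer images rather than shorter renders.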
However, Moore’s Law has come to an end, said Huang: We can’t get transistors to go any faster without consuming more power, we’ve lost the ability to find more instruction parallelism, and we’re running into walls left and right. “CPU performance has really come to a halt.” Yet, the number of CPU hours required to compute these movies continues to go up.
But don’t fret. There is a solution! (Isn’t there always?)
“I believe there is a new law. It’s Jim Blinn’s Law in addition to an artist’s law of some kind in pursuit of that perfect image, no matter what technology provides because that is what they simply demand,” Huang said.
So, 25 years ago, Nvidia began the pursuit of generating the most amazing imagery in a thirtieth of a second and at a price point consumers would pay for. This led to the introduction of the GeForce 256, the first processor that did all the transform and lighting in hardware. In 2001, Nvidia introduced GeForce 3, with the first programmable shader. Then in 2006 came the company’s single most important GPU ever, the first compute GPU (CUDA).
“The rate of progress has been completely astounding,” Huang said.
That’s because engineers and scientists have been able to operate and optimize performance across the entire stack, by working with APIs, engines, and software developers to remove bottlenecks and innovate, and come up with new ideas to break Moore’s Law. “It requires us to work as one team.”
“We’ve been pursuing the path to photoreal, and there has been so much progress. All our capabilities are possible with all this technology underneath. But there is one big roadblock: the simulation of light.” We’ve been able to make these beautiful images and video games through trickery and hacks, such as light probes, reflection probes, and more, Huang added. Indeed, this roadblock has been a problem since 1979, when a researcher now at Nvidia wrote a paper on it, stating that processing 60 pixels per second would require a Cray supercomputer behind every single pixel to generate real-time raytracing.
“It turns out his estimates were pretty close. It turns out that 35 years later, we are able to put a Cray supercomputer behind every single pixel—actually, effectively several Cray supercomputers behind every single pixel, and we’ve created a brand-new way of doing rendering, and introduced it this year at GDC in March for the very first time. It’s a new rendering system created at Nvidia called RTX.”
Star Wars demo
This Is Now…
For this scenario, the RTX runs on a deskside supercomputer called the DGX Station and requires four of Nvidia’s highest-performance GPUs, the same ones that currently power the world’s fastest supercomputer, all four working together to generate five rays per pixel to simulate realistic imagery in real time. The audience was then shown a Star Wars-inspired clip from ILMxLAB/Epic/Nvidia, complete with reflections, inter-reflections, area lights, dynamic area lights, and soft shadows. No, wait, it was running on a single GPU! The world’s first raytracing GPU.
Huang then ticked off the specs for this new GPU, and in doing so, introduced new words and phrases for this new era in computing. “The performance is incredible,” he said. The RTX family can handle up to 10 giga rays per second. (In comparison, the fastest CPU in the world with a lot of cores in it can do a few hundred thousand rays per second.) “10 giga rays per second, it’s even fun to say,” he added.
“When was the first time someone used the term ‘giga rays’? We use ‘teraflops,’ not ‘giga rays.’”
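To give “giga rays” some scale, the quoted rate can be turned into a per-pixel ray budget. This is a back-of-the-envelope sketch: the two resolutions and the 60 frames-per-second target are assumptions chosen for illustration, not figures from the keynote, and the peak rate is treated as if it were sustained.

```python
def rays_per_pixel_per_frame(rays_per_second, width, height, fps):
    """Average number of rays available for each pixel of each frame."""
    return rays_per_second / (width * height * fps)


PEAK_RAYS_PER_SECOND = 10e9  # "10 giga rays per second", as quoted

# Resolutions and frame rate below are illustrative assumptions.
budget_1080p = rays_per_pixel_per_frame(PEAK_RAYS_PER_SECOND, 1920, 1080, 60)
budget_4k = rays_per_pixel_per_frame(PEAK_RAYS_PER_SECOND, 3840, 2160, 60)

print(f"1080p at 60 fps: about {budget_1080p:.0f} rays per pixel per frame")
print(f"4K at 60 fps: about {budget_4k:.0f} rays per pixel per frame")
```

On these assumptions the budget works out to roughly 80 rays per pixel per frame at 1080p and about 20 at 4K, which helps explain why a denoising step matters even at this throughput.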
Not only does it have an enormous amount of floating-point performance for shading, but its shading architecture, the SM, a new compute architecture, has the first combined integer and floating-point pipeline, which does address calculations and numerical calculations at the same time.
The GPU also delivers up to 16 Tflops plus 16 TIPS, or trillion integer operations per second. “When was the last time you heard the word ‘TIPS’?” Huang asked. “This has 16 TIPS. These are brand-new things no computer scientist has ever said.”
The RTX comes with an NVLink multi-GPU connector so the frame buffer can talk to the other GPU frame buffers over the link—so the frame buffer is additive. In this regard, it does up to 100GB/sec with NVLink.
Huang then ticked off more numbers. The GPU powered by the Turing architecture does 500 trillion tensor operations per second. “No processor in the history of processors has ever commanded this much computational resource on one chip,” he said.
Turing architecture
The Processor
There are three types of processors associated with this technology: the SM for compute and shading; a brand-new processor called the RT Core with 10 giga rays per second for real-time raytracing with physically accurate shadows, reflections, refractions, and global illumination; and a new processor called the Turing Tensor Core for accelerated deep learning and AI.
“The new Turing GPU is the greatest leap since 2006 when we introduced CUDA. This fundamentally changes how computer graphics is going to be done. This is a step function in realism,” stated Huang.
Also important to note: The RTX supports a new generation of hybrid rendering, with interoperability between the rasterization, raytracing, compute, and AI.
Huang then outlined Nvidia’s goals. First, to make sure that the images generated are dramatically different from what was possible before. Second, to invent a new computing model that takes advantage of the efficiencies of rasterization and combines it with compute and artificial intelligence, making all this work together in an interoperable way. Third, it has to work for today’s applications as well as tomorrow’s.
He then took the audience inside the chip. “New algorithms are now possible so we can conserve the shading horsepower to create, dedicate, assign, and focus that resource where you need it most.”
It took 10 years of research to figure out this RT Core that accelerates the data structure, to determine which of the primitives a ray intersects and interoperate with a shader—this was a great challenge. As a result, Nvidia has been able to achieve real-time raytracing for the first time.
To put this into perspective, Huang compared it to Pascal. Pascal had 11.8 billion transistors; Turing has 18.6 billion. Whereas Pascal has a 24GB frame buffer at 10 GHz, Turing’s is 48 plus 48GB at 14GHz. This is the largest processor short of one other: the Volta V100 used to power supercomputers around the world. And now we can do integer and floating point at the same time, instead of waiting for one to finish.
The RTX family consists of the 5000 (priced at $2,300), with 6 giga rays/sec and 16GB (32GB with NVLink); the 6000 ($6,300), with 10 giga rays/sec and 24GB (48GB with NVLink); and the 8000 ($10,000), with 10 giga rays/sec and 48GB (96GB with NVLink).
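The lineup is easier to compare side by side. The sketch below uses only the figures quoted above; the class and field names are illustrative, and the price-per-giga-ray number is a derived convenience figure, not something Nvidia quoted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuadroRTX:
    model: str
    price_usd: int
    giga_rays_per_sec: int
    memory_gb: int

    @property
    def nvlink_memory_gb(self) -> int:
        # The quoted NVLink capacities are double the single-card memory.
        return 2 * self.memory_gb

    @property
    def usd_per_giga_ray(self) -> float:
        # Derived figure for comparison only (an assumption of this sketch).
        return self.price_usd / self.giga_rays_per_sec


LINEUP = [
    QuadroRTX("5000", 2_300, 6, 16),
    QuadroRTX("6000", 6_300, 10, 24),
    QuadroRTX("8000", 10_000, 10, 48),
]

for card in LINEUP:
    print(
        f"Quadro RTX {card.model}: ${card.price_usd:,}, "
        f"{card.giga_rays_per_sec} giga rays/sec, "
        f"{card.memory_gb}GB ({card.nvlink_memory_gb}GB with NVLink), "
        f"${card.usd_per_giga_ray:,.0f} per giga ray/sec"
    )
```

Laid out this way, the 6000 and 8000 differ only in memory and price, so the choice between them comes down to scene size rather than ray throughput.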
Porsche real-time lighting demo
…and More!
Huang also announced that Nvidia is going to open-source the Nvidia Material Definition Language (MDL), a high-level programming language that captures the physical properties of materials and their reflective functions using what is called a bidirectional reflective scattering function, and makes it possible to interchange materials among applications.
Also, Nvidia said it would be working with Pixar in supporting its Universal Scene Description (USD) so content can move in and out of tools.
There’s also a brand-new software stack for this reinvented computer graphics, spanning rasterization, raytracing, compute, and AI.
Huang then treated the audience to a visual exercise featuring the power of the RTX on objects within the “Cornell box,” illustrating traditional graphics, area lights, diffuse reflections, caustics, depth of field, reflections, refractions, and illumination—and showed this all in real time, as he reminded us. This was also featured on a display, playing in real time, at the Nvidia booth.
Huang began wrapping things up by reminding the audience that real-time computer graphics has been used for interactivity, design, CAD, flight simulators, and video games. But the market that relies on realistic visualization is expansive, though it has not been able to use the benefit of GPUs and acceleration. “Today we still use light baking and precomputing for realism in [things like] video games.” But, those days may be coming to an end soon.
Nvidia then gave the audience a sneak peek at this new world. For the 70th anniversary of Porsche, they created a trailer called The Speed of Light with dynamic reflections, global illumination, soft shadows, and reflection and refraction off the windshield. And the demonstration ran in real time; it was not a movie.
“Because of two fundamental new technologies, raytracing acceleration with the RT Core and deep learning with the Tensor Core, and because of what we did before with rasterization and compute, all of a sudden we gave ourselves an enormous boost and pulled in raytracing somewhere between five and 10 years,” Huang said.
Huang went on to show the power of the Turing architecture with a comparison of scenes rendered on Pascal versus Turing, with Turing delivering a 6x increase in speed. A major factor in this was the raytracing, which is “incredibly fast.” Also, with AI, it can render at a slightly lower resolution; because the model is trained off a very high ground truth, the final image can be generated at a much higher rate. “We call that DLAA. The combination of AI and raytracing has made it possible for us, and in Unreal Engine 4 with RTRT (real-time raytracing) running atop Microsoft’s DXR (DirectX raytracing) API, we can take computer graphics and improve it by a factor of six,” he said.
As far as AI is concerned, Huang noted it will be used heavily within the Nvidia graphics pipeline, whereby an auto encoder is taught from “ground truth” generated from 64 samples of a rendering, which are combined into basically one frame. Then a whole bunch of those are created, teaching the neural network to generate a final image that is much higher in quality than the original. “So we can start with a lower-resolution image and the result is a higher-quality image rather than a higher-resolution image,” he explained. “The power of deep learning for image generation. We call it the Nvidia DLAA.”
With the three processors working together, Turing was able to accelerate rendering time by a factor of six, Huang reiterated.
Huang showed raytraced final film quality imagery with motion blur, and all on the RTX. Yet, photorealism is not solely the requirement of entertainment—it is required in architecture design and other areas, too. Huang showed examples of the Rosewood hotel lobby in Bangkok re-created using RTX, SolidWorks, and Revit. The lights and time of day were easily modified to get an accurate depiction of the marble lobby, in real time.
Soon after, Huang introduced the RTX Server, which enables production rendering with global illumination. Powered by the RTX 8000, it is one server with 8 GPUs that speeds up final film rendering, with raytraced global illumination for scenes of up to 96GB. And it is designed to be remoted: it can be your renderfarm, your workstation, your RenderMan workstation, your Maya workstation. It is compatible with every single application in the DCC universe within the software layer, the new Quadro Infinity. Rendering time goes from hours to minutes. Instead of a shot taking 6 to 8 hours, it takes one hour. So the number of iterations is remarkable, and will change how people do film. And the price is significantly lower, too, versus a traditional renderfarm of similar power, not to mention the much smaller footprint, with Huang calling the solution a “render planter box” compared to a renderfarm.
“There is no doubt in my mind that the Nvidia Turing is the single greatest leap ever made in one generation, the most important GPU we created since Nvidia CUDA. We have been able to build something that the $250 billion visual effects industry, for the very first time, we can enable them to be accelerated so we can change their workflow and enable them to do more with the same budget.”
“Computer graphics will never look the same again,” Huang added.
Photoreal real-time work in architectural design