At its GTC conference, Nvidia launched Volta, which it calls its most powerful GPU computing architecture to date, designed to drive the next wave of advances in artificial intelligence and high-performance computing (HPC).
The company also announced its first Volta-based processor, the Nvidia Tesla V100 data center GPU, which it says delivers extraordinary speed and scalability for AI inference and training and also accelerates HPC and graphics workloads.
"Artificial intelligence is driving the greatest technology advances in human history," said Jensen Huang, founder and chief executive officer of Nvidia, who unveiled Volta at his GTC keynote. "It will automate intelligence and spur a wave of social progress unmatched since the industrial revolution."
Deep learning, a groundbreaking AI approach in which software learns from data, has an insatiable demand for processing power, and Volta is designed to help meet that need.
Volta, Nvidia's seventh-generation GPU architecture, is built with 21 billion transistors and delivers deep learning performance equivalent to that of 100 CPUs. In peak teraflops, it provides a 5x improvement over Pascal, the current-generation Nvidia GPU architecture, and a 15x improvement over Maxwell, launched two years ago. According to Nvidia, this gain is 4x greater than the improvement Moore's law would have predicted.
Demand for accelerating AI has never been greater. Data centers need to deliver exponentially greater processing power as neural networks become more complex, and they need to scale efficiently to support the rapid adoption of highly accurate AI-based services.
Volta gives HPC systems a platform that excels at both computational science and data science. By pairing CUDA cores and the new Volta Tensor Cores within a unified architecture, a single server with Tesla V100 GPUs can replace hundreds of commodity CPUs for traditional HPC.
Breakthrough Technologies
The Tesla V100 GPU leapfrogs previous generations of Nvidia GPUs with groundbreaking technologies that enable it to break the 100-teraflops barrier in deep learning performance. They include:
- Tensor Cores designed to speed AI workloads. Equipped with 640 Tensor Cores, V100 delivers 120 teraflops of deep learning performance, equivalent to the performance of 100 CPUs.
- New GPU architecture with over 21 billion transistors. It pairs CUDA cores and Tensor Cores within a unified architecture, providing the performance of an AI supercomputer in a single GPU.
- Next-generation NVLink, a high-speed interconnect linking GPUs to each other and to CPUs, with up to 2x the throughput of the prior-generation NVLink.
- 900 GB/sec HBM2 DRAM, developed in collaboration with Samsung, which achieves 50 percent more memory bandwidth than previous-generation GPUs, essential to support Volta's extraordinary computing throughput.
- Volta-optimized software, including CUDA, cuDNN and TensorRT, which leading frameworks and applications can easily tap into to accelerate AI and research.
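To make the Tensor Core figures above more concrete, here is a minimal Python sketch, not Nvidia code. It assumes two things the article does not state: that each Tensor Core completes one 4x4x4 matrix multiply-accumulate (64 fused multiply-adds) per clock, and that the V100 boost clock is roughly 1.455 GHz. The function names `tensor_core_mma` and `peak_tensor_tflops` are hypothetical. The first function emulates the mixed-precision operation D = A x B + C with FP16 inputs and FP32 accumulation. Its numerics are only an approximation of the hardware. The second multiplies out the article's 640 Tensor Cores under those assumptions.

```python
import struct

# ASSUMPTIONS, not stated in the article:
#   * each Tensor Core completes one 4x4x4 matrix multiply-accumulate
#     (64 fused multiply-adds) per clock;
#   * the V100 boost clock is roughly 1.455 GHz.
TENSOR_CORES = 640                   # from the article
FMAS_PER_CORE_PER_CLOCK = 4 * 4 * 4  # assumed 4x4x4 tile per clock
FLOPS_PER_FMA = 2                    # one multiply plus one add
BOOST_CLOCK_HZ = 1.455e9             # assumed clock, not from the article


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to IEEE 754 single precision and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def tensor_core_mma(a, b, c):
    """Hypothetical emulation of D = A x B + C on 4x4 matrices.

    Inputs A and B are rounded to FP16; products and the running sum are
    kept in FP32. Real hardware may order or round the additions differently.
    """
    d = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            acc = to_fp32(c[i][j])
            for k in range(4):
                # The product of two FP16 values is exact in FP32.
                acc = to_fp32(acc + to_fp16(a[i][k]) * to_fp16(b[k][j]))
            d[i][j] = acc
    return d


def peak_tensor_tflops(cores: int = TENSOR_CORES,
                       clock_hz: float = BOOST_CLOCK_HZ) -> float:
    """Peak Tensor Core throughput in teraflops under the assumptions above."""
    flops_per_clock = cores * FMAS_PER_CORE_PER_CLOCK * FLOPS_PER_FMA
    return flops_per_clock * clock_hz / 1e12


if __name__ == "__main__":
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    ones = [[1.0] * 4 for _ in range(4)]
    big = [[2048.0] * 4 for _ in range(4)]
    print(tensor_core_mma(identity, ones, big)[0][0])  # 2049.0
    print(f"{peak_tensor_tflops():.1f} TFLOPS")        # 119.2 TFLOPS
```

The first printed value shows why accumulation in FP32 matters. FP16 represents integers exactly only up to 2048, so an FP16 accumulator could not hold 2049. The second value, about 119 teraflops, is consistent with the article's 120-teraflops figure, but only under the assumed clock rate.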