IBM, Nvidia Build “World’s Fastest Supercomputer” for US Government
The DOE’s new Summit system features a unique architecture that combines HPC and AI computing capabilities.
IBM and DOE Launch World’s Fastest Supercomputer
Frederic Lardinois (@fredericl) / Jun 8, 2018
https://techcrunch.com/2018/06/08/ibms-new-summit-supercomputer-for-the-doe-delivers-200-petaflops/
IBM and the U.S. Department of Energy’s Oak Ridge National Laboratory (ORNL) today unveiled Summit, the department’s newest supercomputer. IBM claims that Summit is currently the world’s “most powerful and smartest scientific supercomputer” with a peak performance of a whopping 200,000 trillion calculations per second. That performance should put it comfortably at the top of the Top 500 supercomputer ranking when the new list is published later this month. That would also mark the first time since 2012 that a U.S.-based supercomputer holds the top spot on that list.
Summit, which has been in the works for a few years now, features 4,608 compute servers with two 22-core IBM Power9 chips and six Nvidia Tesla V100 GPUs each. In total, the system also features over 10 petabytes of memory. Given the presence of the Nvidia GPUs, it’s no surprise that the system is meant to be used for machine learning and deep learning applications, as well as the usual high performance computing workloads for research in energy and advanced materials that you would expect to happen at Oak Ridge.
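The per-node figures above imply system-wide totals. A quick arithmetic sketch, using only the numbers quoted in the article (the variable names are illustrative):

```python
# Per-node figures as quoted in the article.
NODES = 4_608
CPUS_PER_NODE = 2
CORES_PER_CPU = 22
GPUS_PER_NODE = 6

# System-wide totals implied by those figures.
total_cpus = NODES * CPUS_PER_NODE
total_cpu_cores = total_cpus * CORES_PER_CPU
total_gpus = NODES * GPUS_PER_NODE

print(f"Power9 chips:    {total_cpus:,}")       # 9,216
print(f"Power9 cores:    {total_cpu_cores:,}")  # 202,752
print(f"Tesla V100 GPUs: {total_gpus:,}")       # 27,648
```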
IBM was the general contractor for Summit, and the company collaborated with Nvidia, Red Hat and InfiniBand networking specialist Mellanox on delivering the new machine.
“Summit’s AI-optimized hardware also gives researchers an incredible platform for analyzing massive datasets and creating intelligent software to accelerate the pace of discovery,” said Jeff Nichols, ORNL associate laboratory director for computing and computational sciences, in today’s announcement.
Summit is one of two next-generation supercomputers that IBM is building for the DOE. The second one is Sierra, which will be housed at the Lawrence Livermore National Laboratory. Sierra, which is also scheduled to go online this year, is less powerful at an expected 125 petaflops, but both systems are significantly more powerful than any other machine in the DOE’s arsenal right now.
Karl Freund is a Moor Insights & Strategy Senior Analyst for deep learning & HPC
Summit is housed at the Oak Ridge National Laboratory in Oak Ridge, Tennessee. Capable of over 200 petaflops (200 quadrillion operations per second), Summit consists of about 4,600 dual-socket IBM Power9 nodes, connected by over 185 miles of fiber optic cabling. Each node is equipped with six NVIDIA Volta Tensor Core GPUs, delivering total throughput that is 8 times that of its predecessor, Titan, for double precision tasks, and 100 times for the reduced precision tasks common in deep learning and AI. China has held the top spot in the Top 500 for the last 5 years, so this brings the virtual HPC crown home to the USA.
Some of the specifications are truly amazing; the system exchanges water at the rate of 9 Olympic pools per day for cooling, and as an AI supercomputer, Summit has already achieved (limited) “exascale” status, delivering 3 exaflops of AI precision performance. What may be more important, though, is the science that this new system will enable—it is already at work on drug discovery using quantum chemistry, chronic pain analysis, and the study of mitochondrial DNA.
For those who cannot afford a full-fledged $100M supercomputer, NVIDIA also announced the new HGX-2 chassis, available from many vendors, which can be connected to a standard server for some serious AI in a box. HGX-2 supports 16 Volta GPUs, interconnected via the new NVSwitch networking to act as a single massive GPU, to deliver 2 petaflops of performance for AI and HPC. As you can see, NVIDIA is paying a lot of attention to the idea of fusing AI with HPC.
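As a rough sanity check, the quoted system-level figures can be divided down to a per-GPU number. The sketch below assumes, purely for illustration, that all of the quoted throughput is attributed to the GPUs; the article does not break the figures down this way, and the variable names are hypothetical.

```python
TERA = 10**12
PETA = 10**15
EXA = 10**18

# 16-GPU chassis: 2 petaflops, as quoted above.
chassis_flops = 2 * PETA
chassis_gpus = 16

# Summit: 3 exaflops of "AI precision" performance, 4,608 nodes x 6 GPUs.
summit_ai_flops = 3 * EXA
summit_gpus = 4_608 * 6

# Assumption: every quoted flop is credited to a GPU.
chassis_per_gpu_tflops = chassis_flops / chassis_gpus / TERA
summit_per_gpu_tflops = summit_ai_flops / summit_gpus / TERA

print(f"Chassis: {chassis_per_gpu_tflops:.1f} teraflops per GPU")  # 125.0
print(f"Summit:  {summit_per_gpu_tflops:.1f} teraflops per GPU")   # 108.5
```

Under that assumption, the two figures land in the same range, which suggests both headline numbers are quoted at reduced (AI) precision rather than double precision.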
The scientific advances in deep neural networks (DNNs) for high performance computing (HPC) took center stage in the announcement. As I have noted in previous articles, DNNs are showing tremendous promise in HPC. DNNs can be trained with massive datasets created by running traditional simulations on supercomputers. The resulting AI can then predict the outcomes of new simulations with startling accuracy, in 1/1000th the time and cost of running them. The good news for NVIDIA is that both supercomputing and AI are powered by, you guessed it, NVIDIA GPUs. Scientists have even more tools to use GPU hardware and to develop GPU software with NVIDIA’s new platforms.
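The idea of training a network on simulation output and then using it in place of the simulation can be sketched at toy scale. Everything below is an illustrative assumption rather than anything from the announcement: `simulate` is a made-up stand-in for an expensive simulation, and a tiny one-hidden-layer network written in plain Python stands in for a real DNN running on GPUs.

```python
import math
import random


def simulate(k, steps=1000):
    """Toy 'expensive' simulation: Euler-integrate dT/dt = -k*T from t=0 to t=1."""
    temp, dt = 1.0, 1.0 / steps
    for _ in range(steps):
        temp -= k * temp * dt
    return temp


def predict(params, x):
    """Surrogate forward pass: one tanh hidden layer, linear output."""
    w1, b1, w2, b2 = params
    return sum(w2[j] * math.tanh(w1[j] * x + b1[j]) for j in range(len(w1))) + b2


def mse(params, xs, ys):
    """Mean squared error of the surrogate against simulation results."""
    return sum((predict(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train_surrogate(xs, ys, hidden=8, epochs=3000, lr=0.02, seed=0):
    """Fit the surrogate with full-batch gradient descent on squared error."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    n = len(xs)
    for _ in range(epochs):
        gw1, gb1, gw2, gb2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
        for x, y in zip(xs, ys):
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
            err = sum(w2[j] * h[j] for j in range(hidden)) + b2 - y
            gb2 += err
            for j in range(hidden):
                gw2[j] += err * h[j]
                dh = err * w2[j] * (1.0 - h[j] ** 2)  # backprop through tanh
                gw1[j] += dh * x
                gb1[j] += dh
        for j in range(hidden):
            w1[j] -= lr * gw1[j] / n
            b1[j] -= lr * gb1[j] / n
            w2[j] -= lr * gw2[j] / n
        b2 -= lr * gb2 / n
    return (w1, b1, w2, b2)


# Step 1: build a training set by running the simulation.
train_k = [0.1 + 0.1 * i for i in range(20)]
train_y = [simulate(k) for k in train_k]

# Step 2: train the surrogate (epochs=0 returns the untrained starting point).
untrained = train_surrogate(train_k, train_y, epochs=0)
trained = train_surrogate(train_k, train_y)

# Step 3: predict a new case without running the simulation.
new_k = 1.234
print("simulation:", simulate(new_k))
print("surrogate: ", predict(trained, new_k))
print("training MSE before/after:", mse(untrained, train_k, train_y),
      mse(trained, train_k, train_y))
```

The speedup claimed in the article comes from step 3: once trained, the surrogate answers with a handful of multiplications instead of stepping through the full simulation. The toy problem here is far too small to show that gap; it only illustrates the workflow.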
The announcement of Summit as the world’s fastest computer was not a surprise; as a public project funded by the U.S. DOE, Summit has frequently been the subject of discussion. What is significant is that NVIDIA and the DOE believe that the future of HPC will be infused with AI, all running on the same hardware. The NVIDIA GPUs are delivering 95% of Summit’s performance, cementing the legitimacy and leadership of GPU-accelerated computing. HGX-2 makes that an affordable path for many researchers and cloud providers, while Summit demonstrates the art of the possible and provides a public platform for research. Combined, AI and HPC also pave the way for future growth for NVIDIA.
The Summit system, with 9,216 IBM processors boosted by 27,648 Nvidia graphics chips, takes as much room as two tennis courts and as much power as a small town. It’ll be used for civilian research into subjects like material science, cancer, fusion energy, astrophysics and the Earth’s changing climate.
Summit can perform 200 quadrillion (200,000 trillion) calculations per second, or 200 petaflops. Until now, the world’s fastest supercomputer has been the Sunway TaihuLight system at the National Supercomputing Center in Wuxi, China, capable of 93.01 petaflops.
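The unit conversions and the gap between the two systems can be checked with a few lines of arithmetic. This sketch compares the two figures exactly as quoted; it makes no assumption about whether each is a peak or a measured number.

```python
TRILLION = 10**12
QUADRILLION = 10**15
PETA = 10**15  # one petaflops = 10^15 floating-point operations per second

summit_flops = 200 * PETA
taihulight_flops = 93.01 * PETA

# 200 petaflops, 200 quadrillion and 200,000 trillion are the same quantity.
assert summit_flops == 200 * QUADRILLION == 200_000 * TRILLION

ratio = summit_flops / taihulight_flops
print(f"Summit vs. Sunway TaihuLight: {ratio:.2f}x")  # 2.15x
```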