On Thursday, Argonne National Laboratory and Intel announced that all 10,624 blades of the Aurora supercomputer have been installed. The system is expected to become operational later in 2023. Built by HPE, Aurora consists of 166 racks with 64 blades each, for a total of 10,624 blades. Each blade carries two Xeon Max “Sapphire Rapids” processors, each with 64 GB of on-package HBM2E memory, and six Intel Data Center Max “Ponte Vecchio” compute GPUs. The CPUs and GPUs are cooled by a dedicated liquid cooling system, and the machine is expected to deliver more than two FP64 exaFLOPS.
The scale of Aurora's resources is considerable. It has a total of 21,248 general-purpose CPUs with more than 1.1 million high-performance cores. Its memory comprises 19.9 petabytes (PB) of DDR5 and 1.36 PB of HBM2E attached directly to the CPUs. In addition, the supercomputer has 63,744 compute GPUs designed for massively parallel AI and HPC workloads, with a combined 8.16 PB of HBM2E memory. The blades are interconnected via HPE’s Slingshot fabric, an interconnect developed specifically for supercomputers to move data efficiently between the system’s components.
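The totals above follow directly from the per-rack and per-blade figures. As a rough sanity check, the short Python sketch below recomputes them. The function name `aurora_totals` and the constant names are made up for this illustration, decimal units (1 PB = 1,000,000 GB) are an assumption, and the per-GPU memory figure is only inferred from the quoted 8.16 PB total rather than stated in the article.

```python
# Figures quoted in the article.
RACKS = 166
BLADES_PER_RACK = 64
CPUS_PER_BLADE = 2
GPUS_PER_BLADE = 6
HBM_PER_CPU_GB = 64        # on-package HBM2E per Xeon Max CPU
GPU_HBM_TOTAL_PB = 8.16    # combined HBM2E of all GPUs

GB_PER_PB = 1_000_000      # assumption: decimal units


def aurora_totals():
    """Recompute system-wide totals from the per-blade figures."""
    blades = RACKS * BLADES_PER_RACK
    cpus = blades * CPUS_PER_BLADE
    gpus = blades * GPUS_PER_BLADE
    # Total CPU-attached HBM2E in PB.
    cpu_hbm_pb = cpus * HBM_PER_CPU_GB / GB_PER_PB
    # Inferred (not stated in the article): HBM2E per GPU in GB.
    hbm_per_gpu_gb = GPU_HBM_TOTAL_PB * GB_PER_PB / gpus
    return {
        "blades": blades,
        "cpus": cpus,
        "gpus": gpus,
        "cpu_hbm_pb": cpu_hbm_pb,
        "hbm_per_gpu_gb": hbm_per_gpu_gb,
    }


if __name__ == "__main__":
    for name, value in aurora_totals().items():
        print(f"{name}: {value:,.2f}")
```

The blade, CPU and GPU counts come out as the exact figures in the article, and the CPU-attached HBM2E rounds to the quoted 1.36 PB.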
Jeff McVeigh, corporate vice president and general manager of Intel’s Super Compute Group, said Aurora is the first deployment of Intel’s Max Series GPU, the largest Xeon Max-based system and the world’s largest GPU cluster. He said Intel is proud to be part of this historic system and excited about the breakthroughs in AI, science and technology that Aurora will enable. Aurora's storage is provided by an array of 1,024 storage nodes built on solid-state devices, offering 220 PB of capacity and a total bandwidth of 31 TB/s.
Although the blades have now been installed, acceptance testing of the supercomputer is still pending. If it is commissioned later this year as planned, it will reach a theoretical peak performance of more than 2 exaFLOPS, which would make it the first supercomputer to reach that level and earn it a place on the Top500 list. Rick Stevens, deputy lab director at Argonne National Laboratory, said, “As we prepare to conduct acceptance testing, we will use Aurora to train large-scale generative open-source AI models for scientific purposes. With more than 60,000 Intel Max GPUs, an extremely fast I/O system, and a solid-state mass storage system, Aurora provides the ideal environment for this training.”
While Aurora is still in the testing phase and ANL has yet to submit performance results to Top500.org, Intel took the opportunity to compare its hardware with competing solutions from AMD and Nvidia. According to Intel, preliminary tests show that the Max Series GPUs excel in “real-world scientific and technical workloads.” Intel says they deliver twice the performance of AMD’s Instinct MI250X GPUs on OpenMC and scale almost perfectly across hundreds of nodes. Intel also states that its Xeon Max Series CPUs offer a 40% performance advantage over competing processors in numerous real-world HPC applications such as HPCG, NEMO-GYRE, Anelastic Wave Propagation, BlackScholes and OpenFOAM.
Source: Tom's Hardware