Two ExaFLOPS: Aurora supercomputer with Intel Max Series CPUs and GPUs is finally finished

June 23, 2023, 05:40

On Thursday, Argonne National Laboratory and Intel announced that all 10,624 blades of the Aurora supercomputer have been installed. The system is expected to become operational later in 2023. Aurora, built by HPE, consists of 166 racks with 64 blades each. Each blade carries two Xeon Max “Sapphire Rapids” processors, each with 64 GB of on-package HBM2E memory, and six Intel Data Center GPU Max “Ponte Vecchio” compute GPUs. The CPUs and GPUs are liquid-cooled, and the system is expected to deliver a peak performance of more than two FP64 ExaFLOPS.

Aurora’s resources are extensive. It has 21,248 general-purpose CPUs with over 1.1 million cores in total, backed by 19.9 petabytes (PB) of DDR5 memory and 1.36 PB of HBM2E memory attached directly to the CPUs. In addition, it has 63,744 compute GPUs designed for massively parallel AI and HPC workloads, with a combined 8.16 PB of HBM2E memory. The blades are interconnected via HPE’s Slingshot fabric, an interconnect developed specifically for supercomputers that handles data transfer and communication between the system’s components.
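The totals quoted above follow from the per-blade configuration. The following is a minimal Python sketch of that arithmetic. It assumes 52 cores per Xeon Max CPU and 128 GB of HBM2E per GPU; neither figure is stated in the article, and both were chosen only because they are consistent with the quoted totals. Decimal units are assumed (1 PB = 1,000,000 GB).

```python
# Figures taken from the article
RACKS = 166
BLADES_PER_RACK = 64
CPUS_PER_BLADE = 2
GPUS_PER_BLADE = 6
CPU_HBM_GB = 64        # HBM2E per Xeon Max CPU

# Assumptions (not stated in the article)
CORES_PER_CPU = 52     # assumed core count per CPU
GPU_HBM_GB = 128       # assumed HBM2E per GPU

GB_PER_PB = 1_000_000  # decimal units

blades = RACKS * BLADES_PER_RACK
cpus = blades * CPUS_PER_BLADE
gpus = blades * GPUS_PER_BLADE
cores = cpus * CORES_PER_CPU
cpu_hbm_pb = cpus * CPU_HBM_GB / GB_PER_PB
gpu_hbm_pb = gpus * GPU_HBM_GB / GB_PER_PB

print(f"blades:  {blades:,}")
print(f"CPUs:    {cpus:,}")
print(f"GPUs:    {gpus:,}")
print(f"cores:   {cores:,}")
print(f"CPU HBM: {cpu_hbm_pb:.2f} PB")
print(f"GPU HBM: {gpu_hbm_pb:.2f} PB")
```

Under these assumptions the script reproduces the blade, CPU and GPU counts as well as the two HBM2E totals given in the text.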

Jeff McVeigh, corporate vice president and general manager of Intel’s Super Compute Group, said Aurora is the first deployment of Intel’s Max Series GPU, the largest Xeon Max-based system and the world’s largest GPU cluster. He said Intel is proud to be part of this historic system and excited about the groundbreaking AI, science and engineering that Aurora will enable.

Aurora’s storage subsystem consists of 1,024 storage nodes with solid-state drives, providing 220 PB of capacity and a total bandwidth of 31 TB/s.
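To put the aggregate storage bandwidth into perspective, it can be broken down per storage node. This is a rough sketch that assumes the bandwidth is spread evenly across all nodes and that decimal units apply (1 TB = 1,000 GB); the article itself gives no per-node figure.

```python
STORAGE_NODES = 1024           # from the article
TOTAL_BANDWIDTH_TB_S = 31      # from the article

# Assumption: bandwidth is distributed evenly across all storage nodes
per_node_gb_s = TOTAL_BANDWIDTH_TB_S * 1000 / STORAGE_NODES

print(f"{per_node_gb_s:.1f} GB/s per storage node")
```

That works out to roughly 30 GB/s that each node would have to sustain on average.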

The Aurora blades are now all installed, but acceptance testing of the supercomputer is still pending. If it is commissioned later this year as planned, it will have a theoretical peak performance of more than 2 exaFLOPS, which would make it the first supercomputer at that level and earn it a place on the Top500 list. Rick Stevens, deputy lab director at Argonne National Laboratory, said, “As we prepare to conduct acceptance testing, we will use Aurora to train large-scale generative open-source AI models for scientific purposes. With more than 60,000 Intel Max GPUs, an extremely fast I/O system, and a solid-state mass storage system, Aurora provides the ideal environment for this training.”
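As a plausibility check on the 2 exaFLOPS figure, one can compute how much FP64 throughput each GPU would have to contribute on average. This is only a back-of-the-envelope sketch: it assumes the GPUs supply the entire peak and ignores the CPUs, so the result is an upper bound on the required per-GPU share, not a product specification.

```python
PEAK_FLOPS = 2e18              # 2 exaFLOPS theoretical peak, from the article
BLADES = 10_624                # from the article
GPUS_PER_BLADE = 6             # from the article

gpus = BLADES * GPUS_PER_BLADE

# Assumption: GPUs deliver all of the peak (CPU contribution ignored)
per_gpu_tflops = PEAK_FLOPS / gpus / 1e12

print(f"{gpus:,} GPUs, about {per_gpu_tflops:.1f} FP64 TFLOPS each")
```

Each GPU would therefore need to deliver a little over 31 FP64 TFLOPS at peak for the system to reach 2 exaFLOPS on GPU throughput alone.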

While Aurora is still in the testing phase and ANL has yet to submit performance results to Top500.org, Intel took the opportunity to compare its hardware with competing solutions from AMD and Nvidia. According to Intel, preliminary tests show that the Max Series GPUs excel in “real-world scientific and technical workloads.” Intel says they offer twice the performance of AMD’s Instinct MI250X GPUs on OpenMC and scale almost perfectly across hundreds of nodes. Intel also states that its Xeon Max Series CPUs offer a 40% performance advantage over competing CPUs in numerous real-world HPC applications such as HPCG, NEMO-GYRE, Anelastic Wave Propagation, BlackScholes and OpenFOAM.

Source: Tom’s Hardware

9 replies

eastcoast_pete

Old-timer

1,686 comments, 1,029 likes

#1 Jun 23, 2023

Finally! This baby really was delivered with forceps! It is good to see that Intel has not forgotten how to build supercomputers, but wasn't Aurora supposed to be finished back in 2019? And that was already the postponed completion date (and yes, the enlarged capacity) of the 2015 plan. And I know that HPE assembled the thing, but it was the Intel blades, which took forever to get finished, that made it take so long. Hopefully the wait was worth it. In any case, for Intel this was what is so nicely called a "command performance" in English: it had to work!
And now I am curious whether and how the technology of the Ponte Vecchio accelerators will show up in the next generation of Intel's GPUs.

Ocastiâ

Veteran

108 comments, 50 likes

#2 Jun 23, 2023

That all sounds pretty good, but does it run Crysis?

Igor Wallossek

10,454 comments, 19,547 likes

#3 Jun 23, 2023

Only crisis :D

eastcoast_pete

Old-timer

1,686 comments, 1,029 likes

#4 Jun 23, 2023

With Aurora, Intel itself had a "crisis". At times it was pretty embarrassing. Both the CPUs (Sapphire Rapids) and the GPU-based accelerators (Ponte Vecchio) used here went into volume production years late. That is why Aurora had to work out now; there was a lot at stake for Intel.

cunhell

Old-timer

562 comments, 529 likes

#5 Jun 23, 2023

First they have to get the thing through acceptance testing and hit the promised performance figures.
Because as I understand the article, all they have done so far is install the last component.

They are still quite a way off from stable production operation.

Cunhell

8j0ern

Old-timer

2,734 comments, 852 likes

#6 Jun 23, 2023

Just wait, they will deliver!

But at what price? 🧐

cunhell

Old-timer

562 comments, 529 likes

#7 Jun 24, 2023

Setting the thing up is the easy part, provided the components are there.
Getting it running as a whole is anything but trivial.

Normally a budget is set and you pick the vendor that delivers, or promises to deliver, the highest performance for that price.
Of course, it may be different with the Americans, especially at some labs.
Once the thing is finally running, they first have to prove that they can actually deliver the promised performance.
Paper is patient, after all.
Any defective components come on top of that.

It will be a while before a machine like this runs smoothly. And even a successful Linpack run is a long way from meaning stable operation.

Cunhell

Martin Gut

Old-timer

7,939 comments, 3,698 likes

#8 Jun 24, 2023

Just write the graphics card driver and fix a few bugs, then it might run. :unsure: :p

HardwareEcke

Newcomer

4 comments, 1 like

#9 Jun 26, 2023

That sounds great! As soon as I can order one, I will. Or should I wait for BIOS updates first? I wouldn't want it to go up in smoke like the X3D CPUs :D

About the author

Igor Wallossek

Editor-in-chief and namesake of igor'sLAB, the content successor to Tom's Hardware Germany, whose license was returned in June 2019 in order to better meet the quality demands of web content and the challenges of new media such as YouTube with its own channel.

Computer nerd since 1983, audio freak since 1979 and pretty much open to anything with a plug or battery for over 50 years.
