AI has clearly become an integral part of modern technology. That raises the question of which company is taking the lead, and on the accelerator side this is particularly exciting, which is why Stability AI used Stable Diffusion to compare Intel’s Gaudi 2 against NVIDIA’s H100 and A100 GPU accelerators.
First of all, what is Stable Diffusion? It is a generative AI model that turns text prompts into realistic images, developed by Stability AI. Stable Diffusion 3 comes in sizes ranging from 800M to 8B parameters; the 2B-parameter version was used for this analysis. The training benchmark ran on 2 nodes, i.e. 16 accelerators, and shows an interesting result.
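To illustrate what such a text-to-image pipeline looks like in practice, here is a minimal sketch using Hugging Face’s diffusers library. The checkpoint ID and generation settings are illustrative assumptions, not the configuration Stability AI actually benchmarked:

```python
# Minimal text-to-image sketch with Hugging Face diffusers.
# Model ID and settings are assumptions, not the benchmarked setup.
import torch
from diffusers import StableDiffusion3Pipeline

# Load the 2B-parameter "medium" SD3 checkpoint (assumed to be
# available via the Hugging Face Hub after accepting the license).
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # or another appropriate accelerator device

# Turn a text prompt into an image.
image = pipe(
    "a photorealistic mountain lake at sunrise",
    num_inference_steps=30,
    guidance_scale=7.0,
).images[0]
image.save("output.png")
```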
The Gaudi 2 system processed 927 training images per second, roughly 1.5 times the performance of NVIDIA’s H100-80GB. In addition, the 96 GB of High Bandwidth Memory (HBM2E) on Gaudi 2 allowed a batch size of 32 per accelerator, which raised the training throughput further to 1,254 images per second.
The test then scaled up to 32 nodes, which corresponds to 256 accelerators. Here, too, Gaudi 2 delivered a clear result: it processed 12,654 training images per second, slightly more than three times the rate of the A100-80GB.
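A quick back-of-the-envelope check puts these numbers into perspective. The following sketch uses only the figures quoted above and assumes perfect linear scaling from the 16-accelerator run as the ideal case:

```python
# Back-of-the-envelope check of the quoted Gaudi 2 training throughput.
# All inputs are figures from the article; "efficiency" assumes linear
# scaling from the 16-accelerator result as the ideal case.
small_cluster = 16          # accelerators (2 nodes x 8)
large_cluster = 256         # accelerators (32 nodes x 8)

imgs_per_sec_16 = 927       # default config, 16 accelerators
imgs_per_sec_256 = 12654    # 256 accelerators

# Per-accelerator throughput
print(imgs_per_sec_16 / small_cluster)    # ~57.9 images/sec each
print(imgs_per_sec_256 / large_cluster)   # ~49.4 images/sec each

# Scaling efficiency vs. perfect linear scaling of the 16-accelerator run
ideal = imgs_per_sec_16 * (large_cluster / small_cluster)
print(imgs_per_sec_256 / ideal)           # ~0.85, i.e. ~85% efficiency
```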
A second model was also used for testing: Stable Beluga 2.5 70B, a fine-tuned version of LLaMA 2 70B that builds on the Stable Beluga 2 model. The company ran this training benchmark on 256 Gaudi 2 accelerators; running the PyTorch code without additional optimizations, the average total throughput was 116,777 tokens per second. In an inference test with the 70B language model, Gaudi 2 generated 673 tokens per second per accelerator, using an input size of 128 tokens and an output size of 2,048 tokens. That makes Gaudi 2 about 28% faster than the A100 with TensorRT-LLM, which reached 525 tokens per second.
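For readers who want to reproduce a tokens-per-second figure of this kind, the measurement itself is straightforward: time a generation call and divide the number of new tokens by the elapsed time. Below is a minimal sketch with Hugging Face transformers; the model ID, prompt, and token counts are placeholders meant to roughly match the benchmark shape, not Stability AI’s actual harness:

```python
# Minimal tokens/sec measurement sketch with Hugging Face transformers.
# Model ID, prompt, and token counts are placeholders, not the real harness.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/StableBeluga2"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"  # needs accelerate
)

# Roughly match the benchmark shape: ~128 input tokens, 2048 output tokens.
prompt = "Explain high-bandwidth memory. " * 16
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=2048, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/sec")
```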
According to Stability AI, Gaudi 2 is expected to outperform the A100 chips once further optimizations land; for now, the A100 generates images up to 40% faster, mainly thanks to TensorRT optimization. The only question is how long that lead will hold. Stability AI writes:
“On inference tests with the Stable Diffusion 3 8B parameter model the Gaudi 2 chips offer inference speed similar to Nvidia A100 chips using base PyTorch. However, with TensorRT optimization, the A100 chips produce images 40% faster than Gaudi 2. We anticipate that with further optimization, Gaudi 2 will soon outperform A100s on this model. In earlier tests on our SDXL model with base PyTorch, Gaudi 2 generates a 1024×1024 image in 30 steps in 3.2 seconds, versus 3.6 seconds for PyTorch on A100s and 2.7 seconds for a generation with TensorRT on an A100.”
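Converted into per-step latency and relative speed, those quoted SDXL times work out as follows; this is purely illustrative arithmetic on the three figures above:

```python
# Relative speed of the quoted SDXL runs (1024x1024 image, 30 steps).
# Inputs are the seconds-per-image figures from the quote above.
times = {
    "Gaudi 2 (PyTorch)": 3.2,
    "A100 (PyTorch)": 3.6,
    "A100 (TensorRT)": 2.7,
}

for name, t in times.items():
    print(f"{name}: {t / 30 * 1000:.0f} ms/step, {1 / t:.2f} images/sec")

# Ratio of Gaudi 2 to the TensorRT-optimized A100: ~1.19, i.e. the
# optimized A100 is ~19% faster on SDXL (the 40% gap applies to SD3 8B).
print(times["Gaudi 2 (PyTorch)"] / times["A100 (TensorRT)"])
```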
Source: Stability AI