
Google presents Ironwood, the seventh generation of its in-house AI accelerators – 24 times the performance of El Capitan said to be possible

At its “Google Cloud Next ’25” event, Google presented the latest generation of its specialized compute accelerators: the Ironwood AI accelerator. It is the company’s first TPU optimized from the ground up for inference tasks. While chip development to date has centered mainly on training, the focus is increasingly shifting to inference – an area that Google has identified as the next stage of development in the AI industry.

Source: YouTube

Technical foundation: From Trillium to Ironwood

Ironwood is based on a thorough overhaul of the architecture. Compared to the previous Trillium generation, the chip has undergone a number of far-reaching changes. Most noticeable is the massive expansion of the High Bandwidth Memory (HBM), which now amounts to 192 GB per chip – a six-fold increase over its predecessor. This goes hand in hand with a significant jump in memory bandwidth to 7.2 TB/s, which plays a decisive role in processing large models and data volumes: data moves between the memory banks and the compute units with considerably lower latency, which in practice translates into noticeable efficiency gains.

The Inter-Chip Interconnect (ICI) bandwidth has been doubled to 1.2 terabits per second. This interface, which Google internally calls the backbone of distributed AI processing, enables faster communication between the chips – essential for large inference clusters. According to Google, performance per watt has also been doubled compared to the previous TPU generation, a figure that, at least on paper, drastically improves the energy balance.
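To put the memory figures into perspective, consider a simple back-of-the-envelope calculation: at low batch sizes, token generation is typically bound by how fast the model weights can be streamed from HBM. The sketch below uses only the 192 GB and 7.2 TB/s figures from Google’s announcement; the 70-billion-parameter model, the 8-bit weights and the one-read-per-token assumption are purely illustrative and not Google specifications.

```python
# Back-of-the-envelope: memory-bandwidth-bound decoding on a single chip.
# HBM capacity and bandwidth are the announced Ironwood figures; the model
# size and quantization below are hypothetical examples.

HBM_CAPACITY_BYTES = 192e9   # 192 GB of HBM per chip
HBM_BANDWIDTH_BPS  = 7.2e12  # 7.2 TB/s of memory bandwidth per chip

def max_tokens_per_second(num_params: float, bytes_per_param: float) -> float:
    """Upper bound on decode throughput if every weight is streamed from
    HBM once per generated token (ignores KV cache, compute and batching)."""
    model_bytes = num_params * bytes_per_param
    assert model_bytes <= HBM_CAPACITY_BYTES, "model does not fit in one chip's HBM"
    return HBM_BANDWIDTH_BPS / model_bytes

# Hypothetical 70B-parameter model with 8-bit (1-byte) weights:
print(f"~{max_tokens_per_second(70e9, 1.0):.0f} tokens/s upper bound per chip")
```

Real-world throughput would land well below this bound, but the exercise shows why the six-fold HBM increase matters at least as much for inference as raw compute.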

Source: YouTube

Scaling and cluster architecture: 9,216 chips and 42.5 exaflops

Ironwood will be available in two configurations: one with 256 chips for small to medium-sized inference workloads, and a maximum configuration with 9,216 chips. The latter is said to achieve a computing power of 42.5 exaflops. For comparison, the currently most powerful supercomputer, El Capitan, achieves a peak performance of just under 2 exaflops according to publicly available sources. In purely arithmetical terms, an Ironwood cluster would therefore be around 24 times faster – assuming Google’s claims hold up, which has not yet been independently verified. The comparison also mixes number formats: Google’s figure refers to low-precision (FP8) inference throughput, while El Capitan’s score is measured at FP64. It should further be noted that this is not a classic supercomputer for traditional HPC applications, but an infrastructure designed strictly for AI inference. Nevertheless, the comparison is relevant because it underlines the increasing dominance of AI-specific hardware over classic general-purpose systems.
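A quick sanity check of the headline number – one that a commenter below also performs – is to divide the cluster-level peak by the chip count. The sketch below takes El Capitan’s publicly reported Linpack score of roughly 1.74 exaflops (FP64) as the reference value; both inputs are vendor or Top500 claims, not independent measurements.

```python
# Sanity check: per-chip peak derived from Google's cluster-level claim.
CLUSTER_PEAK_FLOPS = 42.5e18   # claimed FP8 peak of the full 9,216-chip pod
CHIPS_PER_POD = 9216

per_chip_flops = CLUSTER_PEAK_FLOPS / CHIPS_PER_POD
print(f"~{per_chip_flops / 1e12:,.0f} TFLOPS per chip (FP8)")   # ~4,612 TFLOPS

# El Capitan's publicly reported Linpack result is an FP64 figure, so the
# "24x" headline compares two different number formats.
EL_CAPITAN_FP64_FLOPS = 1.742e18
print(f"ratio: {CLUSTER_PEAK_FLOPS / EL_CAPITAN_FP64_FLOPS:.1f}x (FP8 vs. FP64)")
```

The figure of 4,614 TFLOPS of FP8 per chip cited in the comments below matches this division within rounding.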

Strategic importance: a challenge to NVIDIA

With Ironwood, Google is clearly positioning itself against NVIDIA’s supremacy in the AI sector. While NVIDIA continues to serve the majority of the market with its H100 and newer Blackwell GPUs, cloud providers such as Google, Amazon and Microsoft are increasingly trying to offer more cost-effective, specialized alternatives through in-house designs. Amazon is focusing on its Trainium and Inferentia series, while Microsoft recently took its first step towards in-house AI hardware with the Maia 100. Google’s move with Ironwood shows that the trend towards vertical integration – hardware and software from a single source – is gaining momentum. The days when the large cloud providers relied entirely on third-party GPUs appear to be coming to an end.

Outlook: Inference instead of training as the new paradigm?

The decision to focus Ironwood primarily on inference is no coincidence. While training large language models still requires immense computing resources, the future volume of AI applications clearly lies in real-time evaluation and response – exactly where inference chips shine. Energy efficiency, throughput and latency matter most here, all parameters that classic GPUs often cover only suboptimally. With Ironwood, Google is addressing precisely these requirements, on a scale that is also likely to be economically relevant.

However, it is not yet clear whether the chips will be rolled out across Google’s cloud offering or initially reserved for a few major customers. Reliable information on actual availability, production capacity and scalability of the platform is also still lacking.

Ironwood marks another milestone in Google’s efforts to become more technologically independent while better serving the growing demands of modern AI systems. Whether the promised performance figures can be delivered in practice remains to be seen. What is clear, however, is that the days when NVIDIA alone controlled the AI accelerator market are numbered – at least in theory. In practice, the real test is only just beginning.
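What the claimed doubling of performance per watt would mean in operation can be illustrated with a simple energy-per-token estimate. In the sketch below, the absolute power draw and throughput numbers are hypothetical placeholders; only the 2x efficiency factor comes from Google’s claim.

```python
# Illustrative only: how a 2x perf/W claim translates into serving energy.
# The 700 W and 100 tokens/s values are hypothetical placeholders, not
# published Ironwood or Trillium specifications.

def joules_per_token(chip_power_w: float, tokens_per_s: float) -> float:
    """Energy consumed per generated token on one chip."""
    return chip_power_w / tokens_per_s

baseline = joules_per_token(chip_power_w=700, tokens_per_s=100)  # previous generation
ironwood = baseline / 2                                          # claimed 2x perf/W
print(f"baseline: {baseline:.1f} J/token, Ironwood-class: {ironwood:.2f} J/token")
```

At data-center scale, halving the energy per token directly halves the power cost for the same serving volume, which is why Google leads with this metric.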

Source: YouTube

Comments

eastcoast_pete

Old hand

2,477 comments 1,622 likes

Without having watched the YT video yet, one question up front: at what sparsity were those 42.5 exaflops measured? Because for floating-point performance it makes a huge difference whether the FLOPS are quoted for FP4 or FP64, for example.

Oberst

Veteran

380 comments 168 likes

Google gets there with FP8 (4,614 TFLOPS per chip, with a maximum of 9,216 chips). The comparison is therefore simply wrong. The thing presumably can’t even do FP32, let alone FP64 (at least Google hasn’t published any FP32 figures).


8j0ern

Old hand

3,683 comments 1,185 likes

:cool:



About the author

Samir Bashir
