Blackwell and the theory
Power gating and the use of separate power rails play a central role in optimizing the power supply of modern GPUs, especially in demanding architectures such as the NVIDIA Blackwell series. Advanced power gating technology makes it possible to selectively switch off individual functional areas of the GPU when they are not required. This is done with a fine granularity so that even the smallest sections of the GPU can be dynamically switched on or off, which significantly reduces energy consumption without compromising performance.
The introduction of separate power rails for GPU cores and the memory system is another significant innovation. These separate power supplies allow the voltage to be specifically adapted to the requirements of the respective subsystem. For example, while memory areas require a constant supply, GPU cores can be switched off completely during idle times. This separation not only increases energy efficiency, but also reduces heat generation.
Another interesting feature of the Blackwell architecture is its ability to enter and leave these energy-saving modes at high speed. Power-saving states can thus be switched on and off in step with individual frames, which improves efficiency even in scenarios with highly variable load.
Accelerated Frequency Switching, introduced with the NVIDIA Blackwell architecture, also represents a significant advance in the dynamic power and performance management of modern GPUs. This feature adapts the clock frequency to the varying demands of a workload up to 1000 times faster than previous GPU generations could. The GPU no longer has to hold its frequency stable over several milliseconds, but can react to changes within microseconds.
The main advantage of Accelerated Frequency Switching is its ability to optimize energy consumption and efficiency under dynamic working conditions. Traditionally, GPU clocks were often almost constant for the duration of a frame, as frequency adaptation was comparatively sluggish. This rigidity meant that the GPU either maintained unnecessarily high clock rates even when there was no load, or could not respond quickly enough to sudden peak loads, resulting in potential performance degradation.
With the Blackwell architecture and Accelerated Frequency Switching, clocks can now be matched much more precisely to the actual workload. This happens both during the active calculation phases and in the idle times between individual workloads. For example, when less computing work is briefly required within a frame, the GPU can immediately lower its frequency to save energy and raise it again within the next few clock cycles as soon as more performance is needed. This ability to adapt quickly leads to a noticeable reduction in energy consumption, especially in scenarios with variable or short-lived workloads.
Another advantage of this technology is the ability to adjust the GPU voltage faster according to frequency. This helps to minimize power loss due to unnecessary voltage settings and further reduce heat generation. The result is an overall higher energy efficiency with a simultaneous improvement in peak performance.
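To illustrate why lowering the voltage together with the frequency matters, here is a minimal sketch based on the common textbook approximation for CMOS dynamic power, P ≈ C · V² · f. This is not NVIDIA's internal power model, and the operating points (20 percent lower clock, 10 percent lower voltage) are purely hypothetical values chosen for illustration.

```python
def relative_dynamic_power(freq_ratio: float, volt_ratio: float) -> float:
    """Dynamic power relative to a baseline, using the approximation P ~ C * V^2 * f.

    freq_ratio and volt_ratio are the new frequency and voltage divided by
    their baseline values. Switched capacitance is assumed to be unchanged.
    """
    return volt_ratio ** 2 * freq_ratio


# Hypothetical operating points, not measured values.
freq_only = relative_dynamic_power(freq_ratio=0.80, volt_ratio=1.00)
freq_and_volt = relative_dynamic_power(freq_ratio=0.80, volt_ratio=0.90)

print(f"Clock lowered, voltage unchanged: {freq_only:.3f} of baseline power")
print(f"Clock and voltage lowered:        {freq_and_volt:.3f} of baseline power")
```

In this simplified model, the voltage term enters quadratically, so a voltage that lags behind a lowered clock leaves a large part of the possible saving unused.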
Accelerated Frequency Switching not only saves energy, but also increases performance under real-world conditions. Because the GPU can mobilize free energy and performance reserves more quickly, it handles performance peaks better without negative effects on stability or latency. This is particularly beneficial for applications such as games or AI-supported rendering, where workloads can vary greatly. These innovations are key to meeting the performance requirements of modern GPUs while keeping energy consumption within manageable limits. But what does practice show?
Note: On the subject of accelerated frequency switching and gating, I also have the temperature diagrams in the next chapter, because you can even measure these changes!
Power Supply
Even though I’m not doing a teardown today, I want to briefly touch on the (reference) PCB design, especially since I also have some board partner cards. The 16 voltage regulators for NVDD, which powers the GPU core, are nothing new. However, NVIDIA has now reintroduced separate voltages for GDDR7 memory (6 voltage regulators) and the frame buffer (7 voltage regulators), similar to what Intel and AMD are doing.
The frame buffer in a graphics card is a specific section of memory dedicated to storing the pixel information of the displayed image. It contains data such as color depth, transparency, and resolution, and it is continuously updated by the GPU to provide the output displayed on the monitor. The frame buffer is directly connected to the graphics memory, which operates under the MSVDD voltage. This memory serves as the physical resource accessed by the frame buffer. However, why this part, which only requires up to around 40 watts, is powered by 7 voltage regulators is somewhat unclear to me.
FBVDD ensures the stability and accuracy of data transfers between the GPU and memory, especially at high clock speeds. In contrast, the MSVDD voltage manages the operation of the memory chips themselves. This voltage directly affects the speed and stability of the memory, as it meets the electrical requirements of the memory cells and the memory controller logic. MSVDD and FBVDD work closely together, as the memory logic and frame buffer must communicate efficiently to exchange image data between the GPU and memory. The separate regulation of MSVDD and FBVDD allows precise voltage adjustments to meet the specific demands of each component.
I plan to write a dedicated foundational article on this topic when I have more time to explore it in detail.
Problems when measuring power consumption with riser cards
In practice, nothing worked at first, because PCIe 5.0 poses considerable challenges to signal integrity due to its high data transfer rates, especially in conjunction with additional components such as riser cables or internal connections within the graphics card. While previous generations of PCIe were more tolerant of signal interference, PCIe 5.0 requires much more precise signal transmission because the data rate has doubled to 32 GT/s. Any additional connection, be it a riser cable or an internal cable between the graphics card's main PCB and its PCIe connector, can cause signal loss, reflections or distortions that affect stability.
A common problem is that such connections change the impedance of the signal path. These changes reduce signal quality, especially with longer or poorly shielded cables. Another problem is crosstalk between parallel lines that are not sufficiently shielded from each other. In practice, such interference manifests itself in instabilities such as boot problems, unexpected crashes or the inability of the system to initialize the graphics card correctly.
The discussion about riser cables and adapters shows that not only users but also development teams such as NVIDIA's are struggling with the complexity of this issue. The boot problems I experienced with the RTX 5090 in conjunction with riser cables, and even with NVIDIA's first-generation PCAT adapter (the new one has worked so far), illustrate how critical signal integrity is for this hardware to function correctly. The problem is exacerbated by the fact that the Founders Edition internally uses a cable connection between the card's main PCB and its PCIe connector, introducing additional resistance and potential signal loss. While this design decision may have been made for aesthetic reasons, it leads to increased susceptibility to interference.
This leads to a fundamental debate about prioritizing design over function. While appealing looks and innovative form factors are important, "form follows function" should be the top priority. Technically, this means that hardware must be designed to perform optimally under real-world operating conditions before aesthetic considerations are taken into account. A design that compromises signal integrity in favor of visual or mechanical stunts is not sustainable and can significantly impair the user experience. I ended up having to solder together a new solution, but what is the average user supposed to do if they want to install their card vertically in the case and use an additional riser cable?
Workaround: Set the PCIe version in the BIOS to Gen 3 or Gen 4 and use either the iGPU or an older card for the boot process. The performance losses are around 10 to 15 percent for Gen 3 (unacceptable) and between 0 and 4 percent for Gen 4.
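For context on what the fallback gives up in raw link bandwidth, the following sketch computes the theoretical one-direction throughput of an x16 link for each generation. It assumes the per-lane transfer rates of 8, 16 and 32 GT/s and the 128b/130b line encoding used by PCIe 3.0 to 5.0. The function and constant names are my own, and the figures ignore protocol overhead, so they are upper bounds rather than measured values.

```python
# Per-lane transfer rates in GT/s; Gen 3 to Gen 5 all use 128b/130b encoding.
TRANSFER_RATE_GT_S = {3: 8.0, 4: 16.0, 5: 32.0}
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gb_s(gen: int, lanes: int = 16) -> float:
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    bits_per_second_g = TRANSFER_RATE_GT_S[gen] * ENCODING_EFFICIENCY * lanes
    return bits_per_second_g / 8  # 8 bits per byte


for gen in (3, 4, 5):
    print(f"Gen {gen} x16: {link_bandwidth_gb_s(gen):.1f} GB/s")
```

This works out to roughly 15.8, 31.5 and 63.0 GB/s. Each step down halves the available link bandwidth, yet the measured loss for Gen 4 stays small, which suggests the card rarely saturates the link in these tests.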
Total power consumption and compliance with standards in practice
The increased idle power consumption of 29 to just under 34 watts points to a genuine driver problem, possibly in how the power rails are handled, and it is also directly related to the resolution and refresh rate of the monitor used. Especially at higher settings, the energy consumption seems to increase unnecessarily, which suggests room for optimization. Such a deviation is particularly unusual in idle states and can possibly be remedied by future driver updates.
There are also conspicuous values under load. In some demanding games, the maximum power consumption reaches up to 600 watts, which almost exhausts the limit of the 12V2X6 power supply design. This underlines the need to carefully consider the system configuration and cooling options to avoid stability issues. Such peak loads also suggest that power requirements are operating close to the upper limit of specifications during intensive graphics calculations.
Interestingly, lower resolutions such as Full HD and QHD show a more economical behavior. This could indicate a more efficient use of resources in these modes. In addition, the use of DLSS in combination with technologies such as MFG offers an opportunity to further reduce energy requirements by specifically lowering the performance requirements of the GPU. The significance of load peaks, some of which can exceed the values mentioned, will be analyzed in more detail later on. They could provide important indications of the energy requirements in specific application scenarios and require detailed consideration.
The mainboard slot, also known as the PCIe slot (PEG: PCI Express Graphics), is designed for peak currents of up to 5.5 amps at a voltage of 12 volts in accordance with the PCI-SIG standard. This corresponds to a maximum power consumption of 66 watts, which can be supplied via this slot. The PCI-SIG standard aims to ensure a consistent and reliable power supply via the mainboard slot without compromising the stability of the system. The specified limit value of 5.5 amps takes into account peak loads that can occur briefly, for example during a load change. Such load peaks must not overload the system or affect other components due to voltage fluctuations.
A key aspect of this standard is that it sets clear limits for motherboard manufacturers and graphics card developers, ensuring interoperability and compatibility between different systems. Excessive loads on the motherboard slot could lead to thermal problems or damage to traces and connectors. The fact that the graphics card only places a moderate load on the slot not only ensures stable operation, but also preserves the longevity of the hardware components. In addition, this requirement creates scope for external power connections that can take on higher loads. For a card that already reaches the limits of the 12V2X6 supply design, the low load on the PEG slot is a positive feature. It shows that the card does not put unnecessary load on the motherboard. With a maximum of 1.2 amps (14.4 watts), the card uses only a small part of the available slot budget. The 12V2X6 connector has to carry all the more as a result.
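The slot figures above follow directly from P = U · I. The short sketch below reproduces them from the values given in the text (12 volts, a 5.5 amp limit and a measured peak of 1.2 amps). The names are my own and only serve the illustration.

```python
SLOT_VOLTAGE_V = 12.0        # 12 V rail of the PEG slot
SLOT_CURRENT_LIMIT_A = 5.5   # PCI-SIG peak current limit cited in the text
MEASURED_PEAK_A = 1.2        # maximum slot current measured on this card


def power_w(current_a: float, voltage_v: float = SLOT_VOLTAGE_V) -> float:
    """Electrical power in watts from current and voltage (P = U * I)."""
    return current_a * voltage_v


limit_w = power_w(SLOT_CURRENT_LIMIT_A)
measured_w = power_w(MEASURED_PEAK_A)
utilization = measured_w / limit_w

print(f"Slot limit:  {limit_w:.1f} W")
print(f"Measured:    {measured_w:.1f} W")
print(f"Utilization: {utilization:.1%}")
```

The card therefore uses a little over a fifth of the slot's 12 volt budget, which matches the assessment above that the motherboard slot is barely loaded.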