My article about the capacitors on the GeForce RTX 3080 and RTX 3090 made a lot of waves, and I have to criticize myself in two respects. First, I underestimated how many readers would be interested in a somewhat longer and more technical explanation of the circumstances and background. Second, the (thoughtless) adoption of the terms commonly used by the manufacturers may have led nitpickers to question the rest of the explanations as well.
In addition, I had explained right at the start why some cards can crash despite MLCCs and why some cards don't crash despite being equipped exclusively with the supposedly worse solid caps. Unfortunately, many overlooked this. I will return to this at the very end, but if you want to know what I mean in general, you are welcome to read this again and concentrate on the first few paragraphs:
By the way, I have (again) tried to break the following down to a comprehensible level and deliberately left out many details because they are not related to the actual problem. But in order to understand what the capacitors at the end of the whole voltage supply chain do and why you need them, why it's not just NVIDIA cards that may be affected and what conclusions you should draw from this, we first need to look at the voltage regulation and telemetry of current graphics cards.
The telemetry of current graphics cards
NVIDIA's Boost (and also AMD's PowerTune) are highly complex mechanisms designed to deliver maximum graphics performance with the lowest possible power consumption and the fewest resulting side effects, such as waste heat. Even if there are sometimes considerable differences in the details and technical implementation, the two mechanisms are quite similar in their schematic structure. Unfortunately, graphics cards are no longer the patient "consumers" they were a few years ago, but fidgety little fellows.
The main concern is always to adjust the core voltage of the GPU in real time so that only as much power is supplied as the current load actually needs, while still reaching the optimum clock rate. Let's call this a strongly simplified voltage curve. In NVIDIA's Boost, the individual boost steps are stored together with their default voltages; the clock of the lowest boost step is shifted and/or fixed by a so-called offset, and the rest is then calculated by the arbitrator (manager, dispatcher).
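To make the idea of stored boost steps plus an offset a little more tangible, here is a minimal sketch in Python. The table values, the names `BoostStep`, `apply_offset` and `voltage_for_clock`, and the assumption that all steps keep their spacing to the lowest one when the offset is applied are mine and purely illustrative; they are not taken from any real BIOS or driver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoostStep:
    clock_mhz: int   # target clock of this boost step
    voltage_mv: int  # default voltage stored for this step


# Hypothetical values for illustration only, not read from a real card.
BOOST_TABLE = [
    BoostStep(1400, 750),
    BoostStep(1600, 850),
    BoostStep(1800, 950),
    BoostStep(1900, 1050),
]


def apply_offset(table, offset_mhz):
    """Shift the lowest step by the offset and let all other steps follow.

    Assumption: the spacing between steps stays the same, so every clock
    moves by the same amount while the stored voltages stay untouched.
    """
    return [BoostStep(s.clock_mhz + offset_mhz, s.voltage_mv) for s in table]


def voltage_for_clock(table, clock_mhz):
    """Return the voltage of the lowest step whose clock covers the request."""
    for step in sorted(table, key=lambda s: s.clock_mhz):
        if step.clock_mhz >= clock_mhz:
            return step.voltage_mv
    raise ValueError("requested clock is above the highest boost step")
```

With a positive offset, the same voltage is paired with a higher clock, which is exactly why an overly optimistic offset can end in instability.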
The firmware continuously determines the power consumption in very short intervals (i.e. virtually in real time), simultaneously queries all sensors and the GPU prediction and also includes the telemetry data of the voltage converters. These values are passed to the digital power management and thus to the arbitrator in question. This control complex also knows the power, thermal and current limits of the GPU (BIOS, drivers), which it can read from the respective registers. Within these limits, it controls all voltages, clock frequencies and fan speeds, always trying to get the maximum performance out of the card. If even one of the input variables rises above or falls below its limit, the arbitrator can lower or raise voltage or clock accordingly.
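A heavily simplified sketch of one such control iteration could look like the following. The real arbitrator is far more complex and not publicly documented in this detail; the names `Telemetry`, `Limits` and `arbitrate`, as well as the simple "one step down, one step up" rule, are assumptions made only to illustrate the principle.

```python
from dataclasses import dataclass


@dataclass
class Telemetry:
    power_w: float        # measured board power
    temperature_c: float  # measured GPU temperature
    current_a: float      # measured current from the voltage converters


@dataclass
class Limits:
    power_w: float
    temperature_c: float
    current_a: float


def arbitrate(step_index, telemetry, limits, max_index):
    """Return the boost step index to use in the next control iteration.

    Assumed rule: drop one step as soon as any limit is exceeded,
    otherwise climb one step until the highest step is reached.
    """
    over_limit = (
        telemetry.power_w > limits.power_w
        or telemetry.temperature_c > limits.temperature_c
        or telemetry.current_a > limits.current_a
    )
    if over_limit:
        return max(step_index - 1, 0)
    return min(step_index + 1, max_index)
```

Called many times per second with fresh sensor values, even this toy rule makes the clock hop up and down between steps, which gives a feel for how restless the real thing is.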
Please note: the values of clock rate, voltage and flowing currents can fluctuate extremely and quickly depending on the situation!
Special features of the power supply
Of course, I don't want to get bogged down in the technical details of voltage conversion and monitoring, which would probably bore most people anyway, but we will have to go a little deeper for a better understanding. Let us therefore go directly to the all-important voltage converters (in the diagram above at right). We already suspect it: the control circuits of the power supply work almost like normal switched-mode power supplies, with switching frequencies that usually lie between 300 and 500 kHz. The following diagram illustrates the process of voltage conversion in a simplified way.
Let us now look at how one of these control loops works. When it is the turn of the relevant phase, the PWM controller sends a small control signal to the gate terminal of the MOSFET. The MOSFET becomes conductive and current flows through the channel between drain and source. The coil behind the MOSFET now builds up a magnetic field and stores energy, so that it can generate a counter voltage opposing the input voltage when needed. To prevent the MOSFET from burning up, the control signal is removed from the gate again almost immediately and the MOSFET becomes non-conductive. The coil is then no longer fed from the input and releases the stored energy.
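The switching process just described corresponds to a step-down (buck) converter. For an idealized single phase in continuous operation, the textbook relations for duty cycle, on-time and current ripple in the coil can be sketched as follows. The component values used in the note below (12 V input, 1.0 V output, 150 nH, 400 kHz) are assumptions for illustration, not measurements from a specific card, and losses are ignored.

```python
def duty_cycle(v_in, v_out):
    """Ideal buck duty cycle: fraction of each period the MOSFET conducts."""
    if not 0 < v_out < v_in:
        raise ValueError("a buck converter needs 0 < v_out < v_in")
    return v_out / v_in


def on_time_s(v_in, v_out, f_sw_hz):
    """Time per switching period during which the gate signal is applied."""
    return duty_cycle(v_in, v_out) / f_sw_hz


def inductor_ripple_a(v_in, v_out, inductance_h, f_sw_hz):
    """Peak-to-peak coil current ripple of an ideal buck phase.

    While the MOSFET conducts, the coil sees (v_in - v_out) and its current
    rises linearly: delta_i = (v_in - v_out) * on_time / inductance.
    """
    return (v_in - v_out) * on_time_s(v_in, v_out, f_sw_hz) / inductance_h
```

With the assumed values, `on_time_s(12.0, 1.0, 400e3)` comes out at roughly 0.2 microseconds and `inductor_ripple_a(12.0, 1.0, 150e-9, 400e3)` at roughly 15 A per phase, which shows how short the gate pulses are and how much the coil current swings within a single period.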
At the end of each control loop there is the coil just mentioned and a larger capacitor. The coil limits the inrush current, stores the energy in its magnetic field and then induces a counter voltage. The capacitor smooths the result to provide a voltage with as little ripple as possible. Well, what does smooth mean… Almost smooth, anyway. And no matter how many phases need to be controlled and perhaps also intelligently balanced, a PWM controller needs two values as feedback from each individual control loop (each phase): the actual current flow and the temperature. Both are important for the telemetry. This is where DCR (Direct Current Resistance) comes into play.
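How "almost smooth" the voltage behind coil and capacitor is can be estimated with two common first-order approximations for a single buck phase: the ripple caused by charging and discharging an ideal capacitor, and the ripple caused by its equivalent series resistance (ESR). The function names and the example values in the note below are assumptions for illustration; real cards interleave many phases, which reduces the ripple further.

```python
def capacitive_ripple_v(delta_i_a, capacitance_f, f_sw_hz):
    """Peak-to-peak ripple of an ideal output capacitor (no ESR).

    First-order buck approximation: delta_v = delta_i / (8 * f_sw * C).
    """
    if capacitance_f <= 0 or f_sw_hz <= 0:
        raise ValueError("capacitance and frequency must be positive")
    return delta_i_a / (8.0 * f_sw_hz * capacitance_f)


def esr_ripple_v(delta_i_a, esr_ohm):
    """Peak-to-peak ripple caused by the capacitor's series resistance."""
    return delta_i_a * esr_ohm
```

For an assumed 15 A of coil ripple at 400 kHz, 2000 µF of output capacitance gives about 2.3 mV from the capacitance alone, while an assumed ESR of 2 mΩ would add up to 30 mV. The two parts are not in phase, so their sum is only an upper bound, but the comparison shows why the type of capacitor matters and not just its capacitance.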
The monitoring can be done in different ways because, no wonder, there are different methods for this. You often read about the so-called Smart Power Stages (SPS) and the so-called MOSFET DCR. The picture below shows the typical layout with the intelligent SPS, which provide the current value for each individual control loop via IMON (and the temperature via TMON). This value is urgently needed for perfect balancing, i.e. an even load across the phases. How does the SPS determine this value? The drain currents of the MOSFETs are measured in real time, and these values are also extremely accurate. I will skip the much cheaper Inductor DCR here, i.e. a current measurement via the DC resistance of the respective filter coils in the output stage, because it is not useful to explain every existing variant.
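What the controller can do with the per-phase feedback can be hinted at with a small sketch. It assumes that the IMON and TMON readings have already been converted to amperes and degrees Celsius; the function names and the idea of simply comparing each phase against the average are illustrative assumptions, not the algorithm of any specific PWM controller.

```python
def phase_imbalance(imon_a):
    """Return each phase's deviation from the mean phase current, in amperes."""
    if not imon_a:
        raise ValueError("at least one phase reading is required")
    mean = sum(imon_a) / len(imon_a)
    return [current - mean for current in imon_a]


def hottest_phase(tmon_c):
    """Return the index of the phase reporting the highest temperature."""
    if not tmon_c:
        raise ValueError("at least one phase reading is required")
    return max(range(len(tmon_c)), key=lambda k: tmon_c[k])
```

A controller that balances the phases would try to drive all deviations towards zero, for example by slightly shortening the on-time of a phase that carries more than its share.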