It is important to emphasize that this report is not intended to be sensationalist, but should be seen as a suggestion for manufacturers to pay more attention to the thermal load of circuit boards. Individual board partners are not mentioned by name, as this problem occurs in a similar form on almost all entry-level boards from all manufacturers. Although the extent of the hotspot can be mitigated by a well-tuned active cooling system, it cannot be completely eliminated. The aim is therefore to show how design adjustments can improve thermal efficiency and reduce the load on components in the long term.
This article discusses a thermal hotspot caused by the compact arrangement of the ten voltage converters for NVVDD. These voltage converters sit very close together on the board, which means that the conductor paths carrying the generated voltage have to be routed to the GPU in a very confined space. Particularly affected are boards that closely follow the reference design and do not have an eleventh phase for NVVDD. In such cases, suboptimal cooling of the voltage converters (VRM) and the lack of passive cooling on the back of the board can lead to undesirable thermal stress. The focus of this article is on explaining the technical background to this problem and highlighting possible optimization approaches, especially for the “cheaper” cards (relative to the price of an RTX 5080), up to and including the cards that NVIDIA requires the manufacturers to offer at the RRP.
Starting point of the investigation
Let’s now take a look at a board that closely follows the reference design and is equipped with a total of 17 phases: 10 phases for NVVDD, 4 phases for MSVDD and 3 phases for FBVDD. The 10 NVVDD phases account for a large part of the maximum power limit of 400 watts. To illustrate this, I show a projection of the measured hotspot including the board topology. This is later supplemented by overlay images from the thermography of the back of the board to substantiate the measured data. This introduction is important in order to better understand the thermal behavior and its underlying causes.
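To get a feel for the currents involved, the load per NVVDD phase can be estimated with simple arithmetic. The following is only a rough sketch: the 300 W rail power and the 1.0 V core voltage are hypothetical values chosen for illustration, not measurements from this board, and the function name is made up for this example.

```python
def per_phase_current(rail_power_w, rail_voltage_v, phases):
    """Average current per phase, assuming perfectly balanced phases."""
    total_current_a = rail_power_w / rail_voltage_v  # I = P / U
    return total_current_a / phases


# Hypothetical values: 300 W on NVVDD at 1.0 V (assumptions, not measured).
ten_phases = per_phase_current(300.0, 1.0, 10)
eleven_phases = per_phase_current(300.0, 1.0, 11)

print(f"10 phases: {ten_phases:.1f} A per phase")
print(f"11 phases: {eleven_phases:.1f} A per phase")
```

Under these assumptions, an eleventh phase would lower the average load per phase by roughly nine percent. Real phases are not perfectly balanced, so this is only an order-of-magnitude estimate.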
NVVDD is the power supply for the GPU cores themselves and is therefore the main power consumer. MSVDD supplies the memory subsystem inside the GPU, while FBVDD supplies the frame buffer, i.e. the video memory chips. As NVVDD has the largest share of the power consumption, this area is particularly susceptible to thermal problems. Closely spaced tracks in the board can lead to increased temperatures, especially when high currents are flowing, because the electrical resistance of the tracks generates heat as current flows through them. In compact designs with little space to distribute the currents, this heat can concentrate at certain points and form hotspots. These hotspots not only impair the efficiency of the voltage converters, but can also shorten the service life of the components in the long term. An optimized distribution of the conductor paths and improved cooling are therefore crucial to minimize such thermal problems.
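The heating of the tracks follows the familiar relation P = I² · R, with the track resistance R = ρ · l / (w · t). The sketch below illustrates this for a single copper track. All geometry and current values are hypothetical and not taken from the board discussed here. The resistivity and temperature coefficient of copper are approximate textbook values, and the function names are made up for this example.

```python
RHO_CU_20C = 1.68e-8  # resistivity of copper at 20 °C in ohm*m (approximate)
ALPHA_CU = 0.0039     # temperature coefficient of copper per kelvin (approximate)


def trace_resistance(length_m, width_m, thickness_m, temp_c=20.0):
    """Resistance of a rectangular copper track at the given temperature."""
    rho = RHO_CU_20C * (1.0 + ALPHA_CU * (temp_c - 20.0))
    return rho * length_m / (width_m * thickness_m)


def joule_loss(current_a, resistance_ohm):
    """Power dissipated as heat in the track: P = I^2 * R."""
    return current_a ** 2 * resistance_ohm


# Hypothetical track: 20 mm long, 10 mm wide, 70 µm copper, carrying 30 A.
for temp_c in (20.0, 80.0):
    r = trace_resistance(0.020, 0.010, 70e-6, temp_c)
    p = joule_loss(30.0, r)
    print(f"{temp_c:.0f} °C: R = {r * 1e3:.3f} mOhm, P = {p:.3f} W")
```

Two things are worth noting: the loss grows with the square of the current, and a warmer track has a higher resistance, so a hotspot tends to reinforce itself to some degree.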
A brief foreword on the cooling of voltage converters and the affected surfaces
Efficient cooling of voltage converters (VRM) and coils is a key aspect of modern PCB design, especially for high-performance graphics cards, and all the more so when everything is squeezed into a very small space, as is the case with NVIDIA. I don’t like this trend at all, and it’s a shame that the board partners follow it so dogmatically. VRMs are responsible for converting the voltage from the power source into the values required by the GPU or CPU. The losses of this voltage conversion naturally generate heat which, if not dissipated effectively, impairs the efficiency of the components (and thereby generates even more heat) and can shorten the service life of the components, including those in the immediate vicinity. Coils, which are part of the VRM circuit, are also affected by thermal stress, as they too generate significant amounts of heat at high currents.
A commonly used means of cooling these components are thermal pads, which transfer the heat from the VRMs and coils to the heat sinks. However, it is not only the choice of the right thermal pad that is decisive here, but first and foremost the gap dimensions. Thermal pads with a thickness of 3 mm are often used, but can be counterproductive due to their increased thermal resistance. The thermal resistance of a thermal interface material depends not only on the thermal conductivity of the material itself, but also on the thickness of the pad. A thicker pad means a longer path for the heat to travel, which significantly reduces the efficiency of the heat transfer.
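The influence of the pad thickness can be illustrated with the simple conduction formula R_th = t / (k · A). The following is a minimal sketch that only considers the bulk material and ignores the contact resistance at both surfaces. The conductivity of 6 W/(m·K) and the contact area of 10 mm × 10 mm are hypothetical example values, not data from the pads measured for this article.

```python
def pad_thermal_resistance(thickness_m, conductivity_w_mk, area_m2):
    """Bulk thermal resistance of a pad in K/W: R_th = t / (k * A)."""
    return thickness_m / (conductivity_w_mk * area_m2)


# Hypothetical pad: k = 6 W/(m*K), contact area 10 mm x 10 mm.
k = 6.0
area = 0.010 * 0.010

for thickness_mm in (1.0, 2.0, 3.0):
    r_th = pad_thermal_resistance(thickness_mm / 1000.0, k, area)
    print(f"{thickness_mm:.0f} mm pad: {r_th:.2f} K/W")
```

With the same material and area, the bulk resistance scales linearly with the thickness, so a 3 mm pad has three times the resistance of a 1 mm pad. At a few watts per component, this quickly adds up to a noticeable temperature difference.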
Another factor I just mentioned is the design of the cooling itself. Manufacturers often use large gaps between the components and the heat sinks, partly to make production more cost-effective. In addition to a generous tolerance limit, these larger gaps also allow manufacturers to flexibly switch to other models of capacitors or coils if necessary without having to fundamentally change the design. Although this can reduce manufacturing costs and increase flexibility, it leads to suboptimal thermal conditions. And then there is the choice of inappropriate pads. Let’s take a look at the exemplary measurement of a 3 mm pad:
Soft thermal pads only achieve their optimum performance when they are compressed to at least two thirds of their original thickness. In the example above, it is even less than 60 percent! The reason for this lies in the physical structure of these pads, which consist of a soft, compressible material that adapts to the surface irregularities of the components and heat sinks. Compression reduces the contact resistance between the surfaces by minimizing air pockets and irregularities. This compression promotes better heat conduction as the material becomes denser and direct contact between the surfaces is improved.
However, thermal pads are often used that are not suitable for low contact pressures. These pads do not develop their full conductivity if the pressure is insufficient, which leads to a significantly increased thermal resistance. Many measurements have shown that the thermal resistance rises sharply if the pressure is too low, which significantly impairs the cooling performance. This is particularly problematic when manufacturers use pads that have a high nominal thermal conductivity but do not suit the actual mechanical conditions. The selection of the right pad must therefore take into account not only the material properties but also the contact pressure in order to ensure effective heat transfer. This is precisely where manufacturers are called upon not to let the OEMs of such materials talk them into anything, but to select purposefully and rigorously what actually makes sense.
A hotspot that is actually superfluous
Even if a 3 mm thick thermal pad has a high nominal thermal conductivity (see measurement above), the resulting thermal resistance can significantly limit the cooling performance. In many cases, significantly thinner pads or thermal pastes, which fill a smaller gap between the component and the heat sink, would be a more effective solution. Insufficient heat dissipation can lead to overheating, reduced performance (thermal throttling) and ultimately to premature component failure. It is therefore important for manufacturers and end users alike to optimize the cooling of VRMs and coils not only in terms of material quality, but also the physical properties of the cooling solutions used. If this is not done, this is exactly what happens:
Now of course 80 °C is not the end of the world, hence my deliberate qualification in the introduction to today’s article, but: I measured this in a fully air-conditioned room (21 °C) in an open setup and NOT in a closed case. I then briefly checked exactly that scenario with a glued-on type K thermocouple in a smaller case, the Thermaltake Tower 300. Apart from the fact that vertically hanging graphics cards in such a case already have to struggle with the environment and themselves, around 100 °C is clearly too much in the long term and does nothing for the intended service life of the components. This is simply not acceptable!
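The two readings can be put into relation with a very simple first-order estimate. The sketch below assumes that the temperature rise of the hotspot above its surroundings stays the same at the same load. That is a simplification, since fan curves and airflow change inside a case, so the result is only a rough indication and not a measured value.

```python
def rise_over_ambient(hotspot_c, ambient_c):
    """Temperature rise of the hotspot above the ambient air in kelvin."""
    return hotspot_c - ambient_c


def implied_local_ambient(hotspot_c, rise_k):
    """Air temperature around the card that would explain a given hotspot,
    assuming the rise over ambient is unchanged (first-order assumption)."""
    return hotspot_c - rise_k


open_bench_rise = rise_over_ambient(80.0, 21.0)         # open setup, 21 °C room
case_ambient = implied_local_ambient(100.0, open_bench_rise)

print(f"Rise over ambient on the open bench: {open_bench_rise:.0f} K")
print(f"Implied air temperature in the case: {case_ambient:.0f} °C")
```

Under this assumption, the roughly 100 °C in the closed case would correspond to air of around 41 °C in the vicinity of the card.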
The next example is a recently tested GeForce RTX 5080 from the more expensive shelf. Its VRMs are of course perfectly cooled, but the pad has simply been placed where it has been stuck for years, without any real testing. This shows that many manufacturers don’t seem to have a proper thermal map of where the board actually gets hot. Obviously nobody questioned this, even though on the current card the pad is out of place in the truest sense of the word (it could just as well have been omitted):
Interim conclusion
In many cases, thermal hotspots can be further evened out and mitigated if the affected areas are also passively cooled via the back of the board. By attaching thermal pads to the rear, the heat can be distributed more evenly and also dissipated more efficiently via the backplate. This method uses the additional surface area on the back of the board to improve heat dissipation and reduce the temperature load on the VRMs and coils. Backside cooling can make a decisive contribution to thermal stability, particularly in compact designs where the front of the board is already heavily utilized. This is exactly what my investigations on the next page are about.