We recently reported on buildzoid and the RTX 3090 he revived after it had stopped working for one of his Twitter followers while playing the MMO New World. The YouTuber has now published a video of over an hour in which he uses measurements from this graphics card and various datasheets to speculate on possible causes for the spontaneous shutdown and death of Nvidia Ampere GPUs in Amazon’s new MMO.
First, he corrects two mistakes from older videos. The board layout of the Gigabyte RTX 3090 Eagle OC is very close to Nvidia’s reference design, which uses 9 or 10 power stages for the GPU power supply depending on the version. He further corrects that these Vcore phases are actually driven by the UP9511R PWM controller, which is analog and can only control 8 phases. As a result, some of the 10 power stages are installed in pairs that share a single phase, as will be explained in detail later.
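The stage-to-phase arithmetic can be sketched in a few lines. This is only an illustration under the assumption that every phase drives either one or two power stages; the function name phase_layout is made up for this example and does not come from the video or the datasheet.

```python
def phase_layout(power_stages, controller_phases):
    """Return (single_stage_phases, double_stage_phases).

    Assumption for this sketch: each phase drives one or two power stages.
    """
    if not controller_phases <= power_stages <= 2 * controller_phases:
        raise ValueError("stage count does not fit one or two stages per phase")
    # Every stage beyond the phase count has to share a phase with another stage.
    double_stage_phases = power_stages - controller_phases
    single_stage_phases = controller_phases - double_stage_phases
    return single_stage_phases, double_stage_phases


# 10-stage and 9-stage board variants on an 8-phase controller
layout_10 = phase_layout(10, 8)
layout_9 = phase_layout(9, 8)
print(layout_10, layout_9)
```

Under that assumption, a 10-stage board on an 8-phase controller ends up with six single-stage phases and two double-stage phases, while a 9-stage board would have seven and one.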
Using the datasheet of the PWM controller from UPI Micro, buildzoid first explains how the “Total Output Current Protection”, or OCP for short, works. Although not all variables for calculating the threshold could be determined exactly, he puts it between 642 A as a conservative estimate and 1368 A as a realistic one. This is typical for Nvidia GPU designs, which let their load peaks slip past the TDP. On Nvidia cards, power draw is measured by means of “shunt” resistors, which sit upstream of the capacitors and inductors of the input voltage and can therefore only measure a filtered average.
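Why a filtered average hides short load peaks can be shown with a toy example. The trace below is invented for illustration (a 320 W baseline with brief 600 W spikes), and a simple moving average stands in for the real input filter, whose actual behavior is not known from the video.

```python
def moving_average(samples, window):
    """Moving average over full windows only; a crude stand-in for the input filter."""
    return [
        sum(samples[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(samples))
    ]


# Hypothetical power trace in watts: every 10th sample is a short 600 W spike.
raw_watts = [600.0 if i % 10 == 5 else 320.0 for i in range(50)]
filtered_watts = moving_average(raw_watts, window=10)

peak_raw = max(raw_watts)
peak_filtered = max(filtered_watts)
print(peak_raw, peak_filtered)
```

In this made-up trace the raw signal peaks at 600 W, while the averaged value that a power limiter would see stays at 348 W, just below a 350 W limit.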
The 60 A power stages installed on the Gigabyte card are specified for short-term load peaks of 80 A, but the effect of such peaks on the service life of the components is questionable. And even at the 60 A specified for continuous operation, the waste heat, 90 W in total, would be too much for the installed cooling solution. In practice, this also means that the OCP would effectively never intervene: the voltage regulation components would probably go up in smoke before it did.
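Putting the numbers side by side makes the argument easier to follow. The sketch below assumes the 10-stage board variant and, as a simplification, that current and waste heat are shared equally across all stages; the variable names are chosen for this example only.

```python
STAGES = 10             # assumption: the 10-stage variant of the board
CONTINUOUS_A = 60       # continuous rating per power stage
PEAK_A = 80             # short-term peak rating per power stage
OCP_LOW_A = 642         # conservative OCP estimate from the article
OCP_HIGH_A = 1368       # realistic OCP estimate from the article
WASTE_HEAT_W = 90       # total waste heat at the continuous rating

total_continuous_a = STAGES * CONTINUOUS_A
total_peak_a = STAGES * PEAK_A

# Per-stage current at the moment OCP would trip, assuming equal sharing.
per_stage_at_ocp_low = OCP_LOW_A / STAGES
per_stage_at_ocp_high = OCP_HIGH_A / STAGES
heat_per_stage_w = WASTE_HEAT_W / STAGES

print(total_continuous_a, total_peak_a)
print(per_stage_at_ocp_low, per_stage_at_ocp_high, heat_per_stage_w)
```

Under these assumptions, even the conservative OCP estimate lies above the combined continuous rating of 600 A, and the realistic estimate lies far above the combined 800 A peak rating, which is consistent with the point that the stages would be overloaded long before the OCP trips.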
The fact that Nvidia Ampere cards can draw much higher currents for short periods of time and are effectively only limited in average power consumption is also consistent with Igor’s measurements from the launch review, which showed peaks of nearly 600 W at a TDP of 350 W.
Since the OCP would most likely never intervene, the YouTuber goes on to discuss the “Channel Current Limit” feature, which he says is a special feature of this VRM. Each individual phase is limited to a maximum current, above which the phase is throttled. While phases with two power stages are limited to 160 A, the single-stage phases have either an 80 A limit or a 130 A limit. buildzoid says he cannot explain why identically built phases have different limits.
Throttling the current would inevitably cause a brief drop in the voltage supplied to the GPU, which would lead to instability or even trigger the “Under Voltage Protection”. The latter is a further protective mechanism that switches off the VRM if the output voltage is too low. The GPU itself would then shut down, while the rest of the components, such as the fan controller, would go into hysteresis. The outcome would be a black screen and 100% fan speed, which matches the various reports from New World victims. Only a complete reboot of the system could reset the triggered protection function.
That Ampere GPUs could become unstable due to insufficient voltage or excessive clock speed was already suspected shortly after the launch. The associated “POSCAP drama” has been covered at length, and opinions are still divided today on what the real cause was: a faulty boost algorithm, insufficient chip quality, too sparse capacitor layouts, a borderline VRM configuration, or a combination of all of the above. It is clear, however, that Nvidia’s fix in a subsequent driver update also noticeably throttled the short-term load peaks of Ampere GPUs.
Even if the definite cause still cannot be determined, some of Nvidia’s decisions in the implementation of the GPU power supply are questionable or hard to follow. Nor could buildzoid readily explain why the spontaneous demise in New World would mainly affect models from the manufacturer Gigabyte. One possible explanation would be that the Nvidia reference design is implemented differently from manufacturer to manufacturer.
Since the RTX 3090 he repaired was only on loan and had to be returned to its owner in working order, buildzoid could not run further tests without risking renewed damage. Ideally, he would use a sacrificial graphics card and an oscilloscope to measure the current draw while New World runs on the hardware. But since this could kill the card, and these cards are anything but cheap, he currently lacks the means for further investigation.
Update 26.10.2021, 12:40:
In an additional new video, buildzoid shows the power consumption behavior in the applications Furmark and Unigine Superposition. In the former, a well-known and very power-hungry stress test, the RTX 3090 manages to reach the 350 W TDP despite automatically throttling back to about 1200 MHz and a GPU voltage of 0.72 V. In Unigine Superposition at “8K” resolution, the monitoring tool GPU-Z even shows short peaks of up to 412 W, although the stock power limit of 350 W remains unchanged.
This would point to Ampere GPUs being inherently very power hungry, and even more so when games manage to utilize a large share of the GPU’s CUDA cores. As with the Superposition benchmark at high resolutions, the behavior in New World could simply reflect very good optimization by the game developers, leading to very high utilization of the GPU. Consequently, future games with ever-increasing demands and even more efficient use of the hardware resources could have similar consequences to New World. Amazon’s new MMO would then be just a harbinger of what Nvidia Ampere GPU owners could soon face.