Everything actually looked so good when I recently started the big roundup of workstation graphics cards and published the first part with all four new Radeon Pro cards. The cards didn't perform badly, quite the contrary. But even back then, during benchmarking, I noted a few things that I definitely wanted to re-test. Especially during a longer rendering run with HIP, but also in a 3D loop for measuring power consumption, I already saw isolated black screens with this card. Initially I blamed a possibly defective DP cable (which then really did turn out to be a shaky candidate).
But something was still different. Ordinary dropouts and black screens usually come with either howling fans or a reboot, or everything freezes. Here, however, the computer kept running normally, just without any picture from the graphics card. A test with a second monitor on the iGPU showed me that the system was still running; everything could be shut down normally and then restarted. This of course delays the follow-up with the four individual cards once more, because I first had to take care of the Radeon Pro W7600. One thing is certain: reliability is the whole point of workstation graphics cards, and exactly that was missing here.
Log of the blackout
With the necessary leap of faith, I was able to reproduce the blackout with various applications. Whether LightWave, Horizon Zero Dawn or FurMark, at some point the screen went dark. The card didn't even last 6 minutes in FurMark, so for the record I opted for this very hard but shorter run. I logged the whole thing with an internal beta of HWiNFO64 (thanks to Martin Malik for helping out!) and learned, for example, that the SMU reports four different temperatures per memory module (for the 2 GB modules), although the modules officially don't have any registers for that. AMD does deliver something there, though; that just as a side note.
The average of the memory temperatures was 94 to 95 °C at the time of the crash. That is already at the upper limit of the specified temperature window, but not so critical that it should lead to a blackout. We also see the other temperatures, which are very high but not yet life-threatening. So what now? Something had to be wrong.
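For anyone who wants to check such an average in their own sensor log, here is a minimal sketch of how the last sample before a blackout could be evaluated. It assumes the log is available as CSV text; the column names (`time_s`, `mem_temp_0` and so on) and all values are hypothetical placeholders, neither real HWiNFO64 output nor my measurements.

```python
import csv
import io
from statistics import mean

# Hypothetical log excerpt: column names and values are placeholders,
# not real HWiNFO64 output and not measurements from this article.
SAMPLE_LOG = """time_s,mem_temp_0,mem_temp_1,mem_temp_2,mem_temp_3,gpu_hotspot
0,60.0,61.0,59.0,60.0,70.0
60,80.0,82.0,81.0,81.0,90.0
120,94.0,95.0,94.0,95.0,100.0
"""


def last_sample_memory_average(log_text, prefix="mem_temp_"):
    """Return (timestamp, mean of all memory sensors) for the final logged row."""
    rows = list(csv.DictReader(io.StringIO(log_text)))
    if not rows:
        raise ValueError("log contains no samples")
    last = rows[-1]  # the last row is the final sample before logging stopped
    temps = [float(value) for key, value in last.items() if key.startswith(prefix)]
    if not temps:
        raise ValueError("no memory temperature columns found")
    return float(last["time_s"]), mean(temps)


timestamp, average = last_sample_memory_average(SAMPLE_LOG)
print(f"last sample at {timestamp:.0f} s, average memory temperature {average:.1f} °C")
```

Averaging only the final row is a deliberate simplification. With a real log you would probably look at the last few samples instead, since the exact moment of the blackout rarely coincides with a logging interval.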
Afterwards I looked at the fan curve, because the fans usually follow Tjunction, i.e. the hotspot temperature of the GPU. However, the whole thing was quite strange. So, for a better understanding, I recorded radiometric videos of the back of the card during the heat-up and combined them with the charts. Also worth noting is the delta of around 5 Kelvin between the measuring area on the PCB and the GPU temperature, because that is usually 1 Kelvin at most.
First, once again, the video with the temperatures up to the blackout:
And now the whole thing again as a comparison between GPU temperature and fan speed. You don’t have to understand this curve, though:
So I have no choice but to disassemble the card. Let’s go!