I have now looked into the problem myself. Since I did not have a suitable EVGA card, I could only test with other cards, and I found no real problems apart from the extremely high FPS numbers caused by a missing frame limiter. However, I was able to ask one or two colleagues directly in the Asian R&D departments of the bigger AICs, because NVIDIA itself is now looking into the problem. Without spoiling everything in the third sentence, we can assume that this total failure is a pure design problem at EVGA and does not affect the other manufacturers in this form, as long as they do not use the same stupid design. So the all-clear can be given, and I will now explain why.
The fact that the damage reportedly occurred only on certain EVGA cards of a particular design narrows down the possible causes considerably. Remember the EVGA GTX 1080 FTW with the ACX cooler, its extreme heat problems with the memory (module M7), my rescue thermal pad mod, and the ICX design with its own temperature sensors that followed (my article on Tom's Hardware US)?
And this is probably where the design introduced back then comes into play. As I was able to find out, the failure does not affect the voltage converter area, but the “Fan Control IC”, i.e. the chip that actually controls the fans, which in the worst case is said to be completely burnt out. As my research at the time showed, EVGA also uses various additional temperature sensors on the board to fine-tune the cooling. This is consistent with reports from affected owners who described extreme fan whine.
In the days of the GeForce GTX 1080 FTW and its lack of control options, this may have made sense, but on a GeForce RTX 3090 this solution is now completely obsolete. TMON, i.e. the temperatures of the Smart Power Stages, can be read out precisely at microsecond intervals, and the GDDR6X memory can also be monitored directly. So no manufacturer has to go through such contortions to design a suitable fan control. NVIDIA has learned a lot and solves the problem almost perfectly without such additional gimmicks.
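To illustrate why no extra board sensors are needed, here is a minimal Python sketch of a fan control that works purely from the readings named above: the GPU core, the GDDR6X memory junction and the Smart Power Stage TMON. It is a hypothetical illustration, not NVIDIA's or any board partner's firmware. The sensor names, the curve points and the "most critical sensor wins" policy are all assumptions chosen for the example. Each sensor gets its own piecewise-linear curve, because memory and power stages tolerate different temperatures than the core, and the fan follows whichever sensor demands the highest duty.

```python
# Hypothetical per-sensor fan curves: (temperature in °C, fan duty in %),
# sorted by temperature. The values are illustrative, not vendor numbers.
CURVES = {
    "gpu_core": [(40.0, 30.0), (60.0, 50.0), (80.0, 80.0), (90.0, 100.0)],
    "memory_junction": [(60.0, 30.0), (80.0, 50.0), (95.0, 80.0), (105.0, 100.0)],
    "power_stage": [(50.0, 30.0), (70.0, 50.0), (90.0, 80.0), (105.0, 100.0)],
}


def interpolate_duty(curve, temp_c):
    """Piecewise-linear lookup, clamped to the first and last curve points."""
    if temp_c <= curve[0][0]:
        return curve[0][1]
    for (t0, d0), (t1, d1) in zip(curve, curve[1:]):
        if temp_c <= t1:
            # Linear interpolation between the two neighbouring points.
            return d0 + (d1 - d0) * (temp_c - t0) / (t1 - t0)
    return curve[-1][1]


def fan_duty(readings):
    """Return the fan duty demanded by the most critical sensor.

    readings maps sensor names (keys of CURVES) to temperatures in °C.
    """
    return max(interpolate_duty(CURVES[name], temp) for name, temp in readings.items())


if __name__ == "__main__":
    print(fan_duty({"gpu_core": 70.0, "memory_junction": 88.0, "power_stage": 65.0}))
```

In this example the memory junction, not the core, sets the fan speed. That is exactly the kind of case on-die sensors can handle without an additional controller reading separate board sensors.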
In addition, the Turing and Ampere cards already have countless safety mechanisms that monitor currents and temperatures. So this solution is not needed at all, because measuring inside the chip (NVIDIA) is more accurate than measuring next to it (EVGA), especially since the external sensors used respond quite sluggishly. Why the fan control chip should burn out now, however, is probably something only EVGA's board layouters know. It is interesting, and a good indicator in this context, that NVIDIA has already asked the other board partners whether they also use such designs.
Of course, marketing will be reluctant to give up such supposed unique selling points of the top models, which have been burned into the minds of potential buyers with a lot of effort over the years. So they keep using them, even if they hardly offer any real added value anymore. On the contrary: in extreme situations such as Amazon’s new game “New World”, this actually unnecessary feature ends up destroying itself when a chip runs amok. For everyone whose card does not use such a design, the all-clear can therefore probably be given at this point. Sometimes less is more.