Question RX 5700 XT Undervolting - Confusing results

zig13

Newbie
Joined
Nov 17, 2019
Posts
3
Points
1
Location
Folkestone
Got myself an RX 5700 XT and thought I'd spend the weekend tweaking it to find a nice sweet spot.
The stock setting reported by WattMan was 2054MHz@1173mV. Stock TimeSpy score was 8702.

My first mistake was using the Mankind Divided benchmark to test stability. I had the card 'stable' (3 consecutive benchmark runs) at 1990MHz@1020mV with almost 2FPS more than stock, 3°C off Max Edge Temp and 4°C off hotspot, only for it to crash 3DMark TimeSpy :/
I spent the rest of yesterday carefully lowering the speed and/or increasing the voltage and got numbers that would pass TimeSpy GPU tests but would heartbreakingly fail a TimeSpy stress test (1984MHz@1020mV scoring 8951 & 1970MHz@1023mV scoring 8885).

This morning I started fresh by reloading the 1990MHz@1020mV profile and lowering the clock to 1900MHz, letting WattMan reduce the voltage to maintain the ratio, which resulted in 986mV.
This passed TimeSpy with a score of 8745 and passed the stress test too (99.7% frame rate stability) but had worse-than-stock Mankind Divided results.
1900MHz@1V is a common underclock/undervolt so I was happy to have beaten that but not willing to rest on my laurels.
I upped the clock speed to the point that WattMan scaled the voltage up to 1V, reaching 1934MHz@1V. This passed TimeSpy GPU tests with a disappointing 8750 (I think, my notes are messy) but failed the stress test.
1930MHz@1V however passed the TimeSpy stress test with 99.9% stability.

I thought next I would target 1950MHz and this is where things got properly confusing:
1950MHz@1010mV - TimeSpy GPU: 8870, Stress Test: FAIL
1950MHz@1012mV - TimeSpy GPU: 8855, Stress Test: FAIL [Here +2mV is detrimental to score]
1950MHz@1013mV - TimeSpy GPU: 8905, Stress Test: FAIL [Here +1mV is super beneficial to score]
1950MHz@1015mV - TimeSpy GPU: 8887, Stress Test: 99.7% stability PASS, Temp: 58°C, HSpot: 75°C

Naturally at this point I was like "that'll do" and started going through the 8 built-in game benchmarks and 6 synthetics I used to test the card at stock. Results were very close to stock, being either a little worse or a little better, but what struck me was how inconsistent they were between runs and that they tended to trend downwards. For example my stock Sleeping Dogs (High preset) results were 113.60, 113.40 & 113.10 but 1950MHz@1015mV gave results of 116, 113.7, 114.7 and 115.
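To put a number on that inconsistency, here is a minimal Python sketch that summarises the Sleeping Dogs runs quoted above. Only the FPS values come from this post; the `summarize` helper and the variable names are made up for illustration, and three or four runs are too few to treat the result as more than a rough indication.

```python
from statistics import mean, stdev

# Sleeping Dogs (High preset) results in FPS, as quoted in this post.
stock = [113.60, 113.40, 113.10]
undervolt = [116.0, 113.7, 114.7, 115.0]  # 1950MHz@1015mV


def summarize(runs):
    """Mean, sample standard deviation and max-min range of a set of runs."""
    return {
        "mean": mean(runs),
        "stdev": stdev(runs),
        "range": max(runs) - min(runs),
    }


stock_stats = summarize(stock)
uv_stats = summarize(undervolt)
gain = uv_stats["mean"] - stock_stats["mean"]

for label, s in (("stock", stock_stats), ("1950MHz@1015mV", uv_stats)):
    print(f"{label:>15}: mean {s['mean']:.2f} FPS, "
          f"stdev {s['stdev']:.2f}, range {s['range']:.2f}")
print(f"mean gain over stock: {gain:.2f} FPS")
```

With these numbers the undervolted runs average roughly 1.5 FPS above stock, but they spread over 2.3 FPS where the stock runs spread over only 0.5 FPS, so the run-to-run variation is larger than the gain itself.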

I reasoned that going up one more mV might improve the stability and maybe even performance.
1950MHz@1016mV - TimeSpy GPU: 8876, Stress Test: FAIL (not just crash - no output had to restart PC)

I get that additional voltage increases temperature but how can 1mV extra (still remaining way below stock settings) completely destabilize a card?
I'm really at my wits' end. This is SOO much more complicated, stressful, difficult, and time consuming than I expected. I'm willing to persevere as it is still kinda fun and there's no risk of damaging my card but I feel I'm at the point where I need to reach out and get some advice. I have of course tried Googling but what works for someone else's silicon isn't going to work for mine and I keep getting overclocking rather than underclocking/undervolting guides/information.

So please - what am I missing and where am I going wrong?
 

ShieTar

Veteran
Joined
Jul 21, 2018
Posts
229
Points
28
Stop trying to change voltages in 1 mV steps, the controls are not that accurate. 25 mV steps are a good approach.

With your comparative tests you are detecting mostly noise, and partially thermal changes in your system, the single digit changes in the voltage are completely irrelevant compared to those.
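
As a rough illustration of both points, the sketch below takes the 1950MHz TimeSpy GPU scores from the first post and checks how large their spread is relative to the score, then lists what a coarse 25 mV sweep down from the stock 1173 mV would look like. The `coarse_steps` helper and the 1000 mV floor are assumptions chosen for the example, not recommended settings.

```python
from statistics import mean

# TimeSpy GPU scores at 1950MHz from the first post, keyed by WattMan voltage in mV.
scores_at_1950 = {1010: 8870, 1012: 8855, 1013: 8905, 1015: 8887, 1016: 8876}

values = list(scores_at_1950.values())
avg = mean(values)
spread = max(values) - min(values)
spread_pct = 100 * spread / avg


def coarse_steps(start_mv, stop_mv, step_mv=25):
    """Voltages from start_mv downwards in step_mv decrements, not going below stop_mv."""
    return list(range(start_mv, stop_mv - 1, -step_mv))


print(f"mean score {avg:.1f}, spread {spread} points ({spread_pct:.2f}% of mean)")
# Assumption: 1000 mV is only an illustrative lower bound for the sweep.
print("coarse sweep (mV):", coarse_steps(1173, 1000))
```

The five scores differ by 50 points, a bit over half a percent of the mean, and they do not rise or fall steadily with voltage, which is what you would expect from noise. The coarse sweep gives seven voltages to test instead of dozens of single-millivolt steps.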
 

zig13

I am aware the voltage control does not match up with the actual maximum voltage used but it seems proportionally linked. At 1988@1020 measured voltage hit 1025mV max and at 1950@1015 measured voltage hit a max of 1018mV while using 3 different programs.

Stock TimeSpy runs were within 3 points of each other. Does how noisy the results are now (if the small voltage changes are irrelevant) indicate instability?

If noise and thermal changes can turn a Pass into a Fail then it would seem that TimeSpy Stress Test is a poor test of stability. How would you suggest I test stability instead?
 

zig13

Yes. I used the MorePowerTool to adjust the card fans while maintaining the zero RPM mode - it's really cool for that. Adjusting target temperature and whatnot seems a better way of controlling fans than setting a fan curve override.
The two main optimizations/tweaks suggested by the article are set on my card at stock (i.e. SoC voltage at 1050 mV and TDC Limit at 174). Igor suggests lowering the TDC limit by 5 at a time in the case of temperature problems but I found this had no effect on temperatures and was detrimental to other metrics. My temperatures are pretty good anyway.
 