Question XFX 5700 XT Cards

Anthony

Preface: @Mini_Me has demonstrated profound knowledge when modding the 5700 XT series of graphics cards, and for those looking for an awesome tutorial, please reference https://www.igorslab.de/community/threads/gigabyte-5700-xt-bios-mod-fails.3304/.

Story: At the present time, I own 40 XFX 5700 XT video cards; I have 40 more on order to make a total quantity of 80 cards. With this many cards, squeezing out an additional 1 MegaHash (MH) from each card would translate into +80MH. Out of the box, the stock bios on these XFX 5700 XT cards reflects about 51MH per card. With bios modifications and tweaks, it is possible for me to obtain between 54MH-56MH per card.

Configuration: I have 5 rigs in operation and soon I will have 5 more in operation. Each rig uses 8 cards. I am using HiveOS to manage my rigs.

The importance of bios modifications: Basically, using bios modifications allows each card to provide a better hashrate -- and as hashrate increases, potential ETH mined increases. (More Hash == More $$). Bios modifications also change how each card consumes electricity -- and aside from the initial cost of computer components, electricity is the second greatest cost when it comes to mining.
  • Stock Bios: 80 Cards x 51MH from each card = 4.080GH
  • Modified Bios: 80 Cards x (54MH || 56MH) = (4.320GH || 4.480GH).
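The arithmetic above can be checked with a few lines of Python. This is only a sketch using the per-card figures quoted in this post; the function name is made up for illustration.

```python
# Fleet hashrate from the per-card figures quoted above (MH per card).
CARDS = 80
STOCK_MH = 51

def fleet_hashrate_gh(cards: int, mh_per_card: float) -> float:
    """Return the total hashrate in GH for `cards` identical cards."""
    return cards * mh_per_card / 1000.0

for label, mh in [("stock", 51), ("modified, low", 54), ("modified, high", 56)]:
    total_gh = fleet_hashrate_gh(CARDS, mh)
    gain_mh = CARDS * (mh - STOCK_MH)
    print(f"{label:15s} {total_gh:.3f} GH  (+{gain_mh} MH vs. stock)")
```

Each extra 1MH per card is worth +80MH across the fleet, which is why the small per-card gains matter.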
The delicate balance in bios modification reflects a specific relationship between: 1.) Power Consumption, 2.) Temperature, 3.) Hashrate, and 4.) Stability vs. Instability. In my opinion, the goal of proper bios modification means that: (a.) each card consumes as little power as necessary, (b.) each card remains within "safe" temperatures to prevent card damage, (c.) each card produces the maximum hashrate, (d.) each card remains stable, which is to say that the card does not produce "invalid" shares or other errors that get in the way of making money.

The above mentioned thread link reflects a dialogue between two members, @Mini_Me and @AlleyCat. The thread is several pages long because it reflects a series of tests (trials, errors, and new trials with new errors, etc.) with different bios values and with different values in HiveOS. There are multiple variables in play (both on the bios tweaking side with MPT and RBE and on the software tweaking side in HiveOS), and changing one variable influences other variables. The thread is so long because many different settings were tried out in order to achieve the best possible result for specific card manufacturers, in this case manufacturer Sapphire and manufacturer Gigabyte.

This post in particular captured my attention: https://www.igorslab.de/community/threads/gigabyte-5700-xt-bios-mod-fails.3304/post-79315.

RBE modified values in use:
Option 1 (preferable and recommended): apply the Apple Inc. VRAM timing straps linked below once for MT61K256M32 (Micron) and save the vBIOS. Then load the saved vBIOS, apply the straps again for K4Z80325BC (Samsung) if that entry exists, and save the vBIOS again:
https://www.igorslab.de/community/a...-mt61k256m32_gddr6_optimized_timings-zip.6544

MPT modified values in use:
Features Tab:
PPTable Features -> Feature Control = Nothing done. Left at default.
Overdrive Features = All Boxes Checked.

Overdrive Limits Tab:
GFX Maximum Clock = 1550
Memory Maximum Clock = 1000
Power Limit Maximum = 0
Power Limit Minimum = 0
Memory Timing Control = 1
Fan RPM Maximum = 3500
Fan RPM Minimum = 1100
Fan Acoustic Limit RPM Maximum = 3500
Fan Acoustic Limit RPM Minimum = 1100
Zero RPM Control = 0

Power and Voltage Tab:
Maximum Voltage GFX = 900
Maximum Voltage SoC = 1150
Minimum Voltage GFX = 700
Minimum Voltage SoC = 750
Power Limit GPU = 140
TDC Limit GFX (A) = 140
TDC Limit SoC (A) = 14

Frequency Tab:
GFX Maximum = 1400
GFX Minimum = 300
SoC Maximum = 1267
SoC Minimum = 507
Memory DPM 0 = 100
Memory DPM 1 = 500
Memory DPM 2 = 625
Memory DPM 3 = 960

Curve Tab:
(Left Alone, no change).
AVFS (GHz->V)
Override box is not ticked.
a = 0.017810
b = -0.047280
c = 0.054020
StaticVoltageOffset (GHz->V) = 0.000000

Fan Tab:
PWM Minimum = 15
Fan Acoustic Limit RPM = 1550
Fan Throttling RPM = 3200
Fan Maximum RPM = 3500
Fan Target Temperature = 85
Fan GFX Clock = 800
Zero RPM Enable Box is ticked.
Stop Temperature = 60
Start Temperature = 68

As you can see, in order to achieve stability (prevent invalid shares, rejected shares, and phoenixminer reboots), I have to run most cards at 1375 core. Other cards I had to reduce to 1325 core. Some cards run at memory 930, some at 905, some at 900, and some at 890.

I know my cards can give better performance. Even though I have reduced power values to reduce the number of invalid and rejected shares, PhoenixMiner still reboots from time to time, and sometimes a card will "stop" and disappear, which triggers the PhoenixMiner reboot. For these cards, I reduce the values.

Any advice from master @Mini_Me would be appreciated.


RIG #1
RIG1.png

RIG #2
RIG2.png

RIG #3
RIG3.png

RIG #4
RIG4.png

RIG #5
RIG5.png
 
Greetings,

Recently I was having problems with one of my rigs. The issue that I encountered was that:
1. The entire rig would run normally for hours, then it would just shut down - no explanation.
2. The rig would then reboot but then refuse to start mining even after power cycling the unit.

I am using the exact same Bios mod across all my cards, but there are some cards (Micron Memory) that pull more watts than others (Samsung). I noticed that Micron cards would pull between 115W-125W, whereas the Samsung cards pull ~85W of power to achieve the same hashrate (~56MH) per card.

My theory was that the Micron cards were pulling too much power and I therefore limited the amount of power per card to 95W in HiveOS. Now you see in the attached image that some Micron cards (GPU #1) are pulling the 95W and still achieving the 56MH rate and that other Micron Cards (GPU #4) are respecting the 95W limit but are generating less hash (~40MH).

Even though (a.) all the settings in HiveOS are equal and (b.) the Bios Used in all cards is the same, GPU#4 is doing something different -- therefore I suspect it is the single card crashing the entire rig.
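One way to make "GPU#4 is doing something different" concrete is to compare hashrate per watt across the rig and flag any card that falls well below the median. The sketch below is an assumption-laden illustration: only the GPU #1 and GPU #4 figures come from this post, the other entries are hypothetical fill-ins, and the 15% tolerance is an arbitrary choice.

```python
from statistics import median

def flag_outliers(cards: dict, tolerance: float = 0.15) -> list:
    """Return GPU ids whose MH-per-watt is more than `tolerance` below the rig median.

    `cards` maps GPU id -> (hashrate in MH, power draw in W).
    """
    efficiency = {gpu: mh / watts for gpu, (mh, watts) in cards.items()}
    mid = median(efficiency.values())
    return sorted(gpu for gpu, eff in efficiency.items() if eff < mid * (1 - tolerance))

# GPU 1 and GPU 4 use the figures from the post (95W cap);
# GPU 2 and GPU 3 are hypothetical placeholders for the remaining cards.
rig3 = {1: (56, 95), 2: (56, 95), 3: (56, 95), 4: (40, 95)}
print(flag_outliers(rig3))
```

A card that is flagged here under the same BIOS and the same HiveOS settings is a reasonable first suspect when the whole rig reboots.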

Even with the 95W power limitation, the rig will run for about 1 hour before rebooting, but at least the rig reboots and mining auto-starts like it is supposed to. Before the 95W limit, the rig would reboot but would not autostart.

It almost seems like I need to have one RBE-MPT configuration for Micron and a different one for Samsung.

I have 5 Rigs with 8 XFX cards in operation (40 XFX Cards). The other rigs do not give me any trouble at all and can run for days without error. And yes, I have Micron cards on other rigs using the exact same Bios and they have no issue.

I only have an issue with this one card on this one rig.

MPT Settings:
Features:
Zero RPM = disabled; All other values are enabled.
Feature Control:
DS_GFXCLK, DS_LCLK, DS_UCLK, USB_PG, APCC_PLUS, GTHR, ACDC, VR1HOT, GFX_DCS, RM, LED_DISPLAY, OUT_OF_BAND_MONITOR, TEMP_DEPENDENT_VMIN, MMHUB_PG, FEATURE_SPARE_** = disabled. All other values are enabled.
Overdrive Limits:
1440 GFX Maximum Clock
950 Memory Maximum Clock
0 Power Limit Max
0 Power Limit Min
1 Memory Timing Control
3500 Fan Max
1100 Fan Min
3500 Fan Acoustic Limit Max
1100 Fan Acoustic Limit Min
0 Zero RPM Control
Power and Voltage:
1050 Max GFX
1050 Max SoC
750 Min Voltage GFX
750 Min Voltage SoC
150 Power Limit GPU
128 TDC Limit GFX A
14 TDC Limit SoC A
Frequency:
1260 GFX Max
300 GFX Min
1267 SoC Max
507 SoC Min
100 Mem DPM 0
500 Mem DPM 1
625 Mem DPM 2
900 Mem DPM 3
Curve:
Override = Disabled.
0.017810 a
-0.047280 b
0.054020 c
StaticVoltageOffset = 0
Fan:
15 PWM Min
1550 Fan Acoustic Limit
3200 Fan Throttling
3500 Fan Max
85 Fan Target Temp
800 Fan Target GFX Clock
Zero RPM Enable = disabled
50 Stop Temp
60 Start Temp

RBE:
Apple Timings Used for both Samsung and Micron memories. Verified.

RIG3-GPU4-Strange.png
 
Greetings and welcome,

I believe ACDC is enabled by default, so please do not disable it,

And as a start increase the power limit for this card as below and test,

TDP 155 W
TDC GFX 132 A
 
Ok, I created a special Bios just for this card where:
MPT -> Power and Voltage -> Power Limit GPU = 155 and TDC Limit GFX (A) = 132.

This screenshot reflects GPU#4 with the Bios Change and with PL = 95.
R3G4.png
I then changed PL=0 and did a miner restart. The miner restarted and all cards started to mine, however, GPU#4 was still showing a hashrate reflecting the 95W limit (~40MH) ... so I did a reboot. After rebooting with PL=0, the card decided that it wanted ~120W of power to achieve 56MH -- contrasting the 56MH rate achieved with PL=95W on the other cards.

This screenshot reflects GPU#4 with the Bios change and with PL=0.
R3G4-1.png

So far, the rig has been running for 20 mins with no errors, no reboots, and no random cards disappearing. I'll keep watching it to see if something happens.

Question: If I were to set TDP=100 W and leave everything else alone, would the card pull no more than 100W of power?
 
It is highly recommended not to cap the power, as an overshoot occasionally occurs; the power limit in the guide is really a balanced one.
 
For information and research purposes
  1. I have attached the modified bios that I use across all of my cards. Yes, I use one modified bios across all 40 of my cards. I do not know if this is the best approach, but it seems to be working okay.
If you have suggestions for further improvements, let me know.
 

Attachments

  • XFX.5700XT.150W.MINING.rom
    1 MB · Views: 132
Hello guys!!! Regards.

I would appreciate it if you could help me with my rig; I am a bit new to this.

I have 4 MSI RADEON RX 5700 MECH OC cards (not XT), and I have followed the guide that @Mini_Me kindly shared with us on page 32 of this thread: https://www.igorslab.de/community/threads/gigabyte-5700-xt-bios-mod-fails.3304/page-32.

My motherboard BIOS is updated to the latest version, I have two Cooler Master 750W Gold Plus power supplies, the PCIe slots are configured as Gen3, and the rig is in a room with good air circulation and an average ambient temperature of 27 °C. My BIOS doesn't have the option to set RC6 (Render Standby).

Before the vBIOS update, I was having many reboots with dead GPU errors and very high memory temperatures. After flashing the BIOS and setting the OC values from "For lower temperature and power saving" (1370MHz, 760mV, 900MHz, 780mV, 1290mV), I got some stability, but I still get dead GPU errors and reboots from time to time (2 or 3 times a day). Sometimes I go 24 hours without dead GPU errors, and sometimes I have 2 errors (and reboots) in an hour. The memory temperatures dropped a bit, but I think they are still high.

I am looking to avoid these errors and (if possible) reduce power consumption and memory temperatures, especially from a card that is always at 86 ° C.
Captura de pantalla 2021-01-25 a las 8.49.26 a.m..png
Captura de pantalla 2021-01-25 a las 8.43.01 a.m..png

I have configured a global default overclocking template because I could not find the Core State (DPM) and Memory State (MDPM) options in the rig regular overclocking tab to set them to 1; but I don't know if they are running in this state because when I run the command "amd-info" I see "Core state: 2 Mem state: 3"

1611690637377.png

I have read all the posts of the thread "GIGABYTE 5700 XT Bios mod fails" and tried many things but with no luck.

Thanks in advance!!

(English is not my language ... sorry if I have some mistakes in my writing)
 
Sorry... I've asked twice and I'm not finding the "delete" option
 
@Anthony, do you think that your bios can help me with my problem? I'm thinking of trying it on the GPU that has the higher mem temps.
In my experience, Micron pulls more watts and therefore has higher temperatures. You could lower the watts to the card, but you'll also see a reduction in the hashrate. I don't think my bios will help in that regard; actually, I don't think any bios will help in that regard.
 
Thanks for your quick answer @Anthony... In your experience, do you think that with my mem temps I'm going to have hardware problems like degradation in the near future? I am thinking of following the recommendations of @Mini_Me and shutting down the rig for one hour every two days.
 
With my bios, I try to adjust settings so that the temperature never exceeds 74 on the mem. And for the most part, I'm successful. In my case, I just let the cards run, but I have air conditioning and floor fans helping to move the air around. I don't want to give you false hope. As with all things electronic, it can be working fine for days, weeks, or months, and then decide to die all of a sudden without warning.

I think that if you keep your temps as low as you can (within manufacturer's spec), you'll be okay. But consider that we are using graphics cards in a way that graphics cards were not designed to be used in the first place. We are all doing this for money -- balancing the risk-reward.
 
I have moved the rig to a room with air conditioning set at 24° and a floor fan, and now I have 3 cards at 78-80° and one card at 84° Celsius. I'm going to give your bios a try with the hottest one. The bios that I have in my cards is a modified one following the page 32 guide of @Mini_Me. I haven't followed the guide on page 15 (for cards with a sensitive (restricted) power curve) because it involves using Windows on the rig and I have HiveOS, but if I have to do it for better mem temps, I will. Will this guide help me with mem temps? I don't yet understand the basics of bios modding; I just follow the guides. Regarding the OC settings, I have tried lowering the memory voltage, but I get an unstable system (I'm currently at 1290mV).
At this time I have gone 48 hours without GPU dead errors with these values:

1611754629741.png

If I have to sacrifice a little hashrate to get safe mem temps, I will do it. What values do you guys suggest to achieve this? If I reduce the memory voltage, I understand that I have to reduce the mem clocks too, but in what intervals should I do it?

Thanks in advance
 
For this particular case, probably the easiest way would be to define the power limit by card. Right now, according to your image, cards are pulling between 115W-121W, resulting in temperatures from 78-84 degrees, which correspond to 55MH. I would attempt to limit power to 100W per card, for example. (You would do this in the overclocking template in HiveOS.)

Once you do this, you'll see MH drop, but you should also see the temperature drop. Since this is a software-side implementation, there's always a chance that the software will glitch and the bios values will be used. But for a quick and dirty implementation, start off with defining the power limit. Increase or decrease the power limit until you reach the desired temperature.
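The "increase or decrease until you reach the desired temperature" loop can be written down as a single decision step. This is a rough sketch, not a HiveOS feature: the function name, the 5W step, the 2-degree dead band, and the 80W floor are all assumptions chosen for illustration, and the 150W ceiling simply mirrors the Power Limit GPU value in the bios posted earlier. Apply each new value by hand in the overclocking template and let the card settle before reading the temperature again.

```python
def next_power_limit(current_w: int, temp_c: float, target_c: float,
                     step_w: int = 5, band_c: float = 2.0,
                     floor_w: int = 80, ceil_w: int = 150) -> int:
    """Suggest the next power limit (W) from the latest memory temperature reading."""
    if temp_c > target_c:
        # Too hot: lower the limit, but never below the floor.
        return max(floor_w, current_w - step_w)
    if temp_c < target_c - band_c:
        # Comfortably cool: give some power (and hashrate) back.
        return min(ceil_w, current_w + step_w)
    # Within the dead band: leave the limit alone.
    return current_w

print(next_power_limit(100, 84, 75))  # card at 84 degrees, target 75
```

The dead band keeps the limit from bouncing up and down when the temperature sits right at the target.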

From there, @Mini_Me or some other guru might recommend bios adjustments to make the power savings a little more permanent.
 
The power limit is now universal and permanent, and it is not recommended to lower it further, as each GPU has a different fan size, number of fans, and power draw.
 
@Mini_Me, please check whether my logic and procedure are sound, and perhaps make recommendations to refine them:

Tuning the Cards
  1. Goal #1: To have all cards run "stable" at MEM>=910, regardless of VDDCI value. (If Samsung Memory, MEM=910. Elseif Micron Memory, MEM=930).
  2. Goal #2: To fine tune the value of VDDCI to accomplish Goal #1.
Assumptions
  • All cards are flashed with the same Bios modifications.
  • All rigs use TeamRedMiner
Information: HiveOS name translations (Names in OC Template vs. Names on Status Screen)
  • Core Clock, Mhz = CORE
  • Core State, index = DPM
  • Core Voltage, mV = VDD
  • Memory Controller Voltage, mV = VDDCI
  • Memory Clock, Mhz = MEM
  • Memory State, index = MDPM
  • Memory Voltage, mV = MVDD
Starting values: HiveOS OC Template Constants:
  • Core=1420
  • DPM=1
  • VDD=790
  • VDDCI=850 (This is what we will try to determine for each individual card.)
  • MVDD=1350
  • IF SAMSUNG THEN MEM=910; ELSEIF MICRON THEN MEM=930.
  • MDPM=1
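For anyone scripting their notes, the starting values above can be captured as a small lookup keyed by memory vendor. This is only a sketch: the function name and the dictionary layout are made up here, and the keys follow the status-screen names from the translation list.

```python
def starting_oc(memory_vendor: str) -> dict:
    """Return the starting OC values for a card, keyed by HiveOS status-screen names."""
    mem_by_vendor = {"samsung": 910, "micron": 930}
    return {
        "CORE": 1420,
        "DPM": 1,
        "VDD": 790,
        "VDDCI": 850,   # the value to be tuned per card
        "MVDD": 1350,
        "MEM": mem_by_vendor[memory_vendor.lower()],
        "MDPM": 1,
    }

print(starting_oc("Micron"))
```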
Errors we are looking for:
  1. Invalid or rejected shares on individual cards.
  2. Individual cards that "drop out" completely after a short time in operation.
  3. Cards that are missing as an empty red box in Hive instead of a solid green box.
  4. Individual cards that go from operation @ > ~80W to dropping out of operation at < ~30W. (You will have to browse individual cards within each rig and notice the changed power values for individual cards, especially in cases of random reboots when no invalid shares show. Before the rig reboots and refreshes, quickly observe the power values to determine which card is at fault.)
  5. Cards that cause a "reboot" of the rig without producing invalid or rejected shares.
During testing time, I recommend setting automatic refresh of status page for 30 seconds. For reboot or error, you will then need to browse individual cards in the rig in question and examine the cards in the rig for error, e.g., low power consumption, invalid shares.
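Error type 4 (a card falling from above ~80W to below ~30W) is easy to spot if you write down two consecutive power readings per card. The sketch below assumes you collect those readings yourself, for example by hand from the status page; the function name and the sample numbers are hypothetical.

```python
def dropped_cards(previous_w: dict, current_w: dict,
                  running_w: float = 80, idle_w: float = 30) -> list:
    """Return GPU ids that were above `running_w` in the previous sample
    and are below `idle_w` in the current one."""
    return sorted(
        gpu for gpu, now in current_w.items()
        if previous_w.get(gpu, 0) > running_w and now < idle_w
    )

# Hypothetical readings taken 30 seconds apart (GPU id -> watts).
before = {0: 95, 1: 118, 2: 86}
after = {0: 94, 1: 22, 2: 85}
print(dropped_cards(before, after))
```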

I define stability as the individual card's performance of its hashing duties without producing errors OVER TIME. Notice that TIME is the most important variable here. Cards that are stable for 1 minute might show instability after 5 minutes of operation. Cards that are stable for 10 minutes might cause errors after 15 minutes. Cards that are good for 15 minutes might cause errors after 1 hour and so forth.

Therefore we begin with getting all cards and rigs stable for 5 minutes. If a card shows any kind of error within 5 minutes, especially after an immediate bootup and start, then we adjust the values of VDDCI. After we determine stability values at 5 minutes, we then see if the card remains stable at 10 minutes, 15 minutes, 30 minutes, 60 minutes and so forth.

Procedure:
  • We begin with (Samsung=910; Micron=930) and all VDDCI=850. If a card produces an error within 5 minutes, then we +25 to VDDCI until we reach VDDCI=900. If errors continue at VDDCI=900, then we -10 from MEM.
Expected Results:
  • What you will see is that some cards run rock stable at VDDCI=850 and MEM=(910 Samsung; 930 Micron). Other cards end up at VDDCI=900 and MEM=(900 or 890 for Samsung; 930 or 920 for Micron).
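The escalation rule in the procedure (raise VDDCI by 25 up to 900, then drop MEM by 10) can be sketched as one function that returns the next pair of values to try after an error. The function name is made up for illustration; the step sizes and the 900 ceiling are the ones stated in the procedure.

```python
def adjust_after_error(mem: int, vddci: int,
                       vddci_step: int = 25, vddci_max: int = 900,
                       mem_step: int = 10) -> tuple:
    """Return the (MEM, VDDCI) pair to try next after a card shows an error."""
    if vddci < vddci_max:
        # First lever: raise VDDCI, capped at the ceiling.
        return mem, min(vddci_max, vddci + vddci_step)
    # VDDCI is already at the ceiling: back off the memory clock instead.
    return mem - mem_step, vddci

# A Micron card that keeps failing walks through these settings:
settings = (930, 850)
for _ in range(4):
    settings = adjust_after_error(*settings)
    print(settings)
```

Starting from MEM=930 and VDDCI=850, the sequence is (930, 875), (930, 900), (920, 900), (910, 900), which matches the expected results listed above.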
Further tuning
After your rig has run without errors for a predetermined time to your comfort (1hr, 6hr, 12hr, 24hr, etc.), then for those cards where MEM had to be reduced, you can lower VDDCI by 25 at a time for just these cards until you find a point where a lower VDDCI causes new errors. If a lower VDDCI causes a new error, then increase VDDCI by 25 until there are no more errors.
Yes, I have read it and tried it, but I can't obtain better mem temps. The best numbers that I have obtained are the ones that I've shown. For example, if I go below 1290 MVDD, even while reducing the memory clock too, I get an unstable system. That's why I'm thinking about bios modifications.
I'm going to keep trying with this guide and your bios. Thanks very much @Anthony !!!
 
I'm going to try this too!!
 
Please use one of the settings below to decrease the memory temperature,

Core clock, VDD : Memory clock, VDDCI, MVDD, Hashrate in TeamRedMiner,

- 1330 MHz, 750 mV : 850 (1700/2) MHz, 800 mV, 1350 mV, ~52.10 MH/s,
- 1345 MHz, 750 or 760 mV : 865 (1730/2) MHz, 810 mV, 1350 mV, ~53.10 MH/s,
- 1360 MHz, 760 or 770 mV : 880 (1760/2) MHz, 820 mV, 1350 mV, ~54.10 MH/s.
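If you want to pick among these three settings by the hashrate you are willing to accept, a small lookup is enough. This is only a sketch: the function name is made up, the lower VDD was chosen wherever two values are listed, and it assumes (as the reply implies) that the lower-clocked settings are the cooler ones.

```python
# (core MHz, VDD mV, memory MHz, VDDCI mV, MVDD mV, approx. MH/s in TeamRedMiner)
PRESETS = [
    (1330, 750, 850, 800, 1350, 52.10),
    (1345, 750, 865, 810, 1350, 53.10),  # VDD listed as 750 or 760 mV
    (1360, 760, 880, 820, 1350, 54.10),  # VDD listed as 760 or 770 mV
]

def coolest_preset(min_mh: float):
    """Return the lowest-clocked preset whose approximate hashrate meets `min_mh`,
    or None if none of the listed presets reaches it."""
    for preset in PRESETS:  # ordered from lowest to highest clocks
        if preset[5] >= min_mh:
            return preset
    return None

print(coolest_preset(53))
```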
 