The “secret” behind NVIDIA’s sophisticated telemetry: the role of Buckets, Speedo and Continuous Virtual Binning (CVB)

The telemetry of current NVIDIA graphics cards and the Voltage Frequency Engine (VFE)

Now I want to describe NVIDIA’s Boost (and, in more general terms, AMD’s PowerTune) and put what you’ve just read into context, even if I have to repeat myself a bit (diagram below). The task of the so-called telemetry is to achieve maximum graphics performance with the lowest possible power consumption and the fewest side effects, such as waste heat, using all the available monitoring data. The main objective is to adjust the GPU’s core voltage in real time so that only as much power is supplied as the current GPU load actually requires, while still reaching the optimum clock rate.

Let’s start by simply calling it a voltage curve (I’m sure everyone has heard the term before), even if I’ll have to go into more detail later. In practical terms: the individual boost steps are stored together with their default voltages. The clock of the lowest boost step is shifted or determined by a so-called offset, and the rest then results from the calculations of the arbitrator (mediator). With AMD, the clock rates and voltages are set for a number of predefined DPM states, which is significantly less precise (coarser-grained) but ultimately works in a similar way.
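To make the idea of stored boost steps plus an offset a little more tangible, here is a minimal sketch. Everything in it is assumed for illustration: the table values are invented, the names `BASE_CURVE`, `apply_offset` and `clock_at_voltage` are hypothetical, and shifting every step by the same offset is a simplification of what the arbitrator actually calculates.

```python
# Hypothetical boost table: (clock in MHz, default voltage in mV).
# All values are invented for illustration and not taken from any real VBIOS.
BASE_CURVE = [(1500, 750), (1700, 800), (1900, 850), (2100, 900), (2300, 950)]


def apply_offset(curve, offset_mhz):
    """Shift every clock step by the same offset while keeping its voltage."""
    return [(clock + offset_mhz, mv) for clock, mv in curve]


def clock_at_voltage(curve, mv_limit):
    """Return the highest clock whose voltage does not exceed mv_limit, or None."""
    candidates = [clock for clock, mv in curve if mv <= mv_limit]
    return max(candidates) if candidates else None


shifted = apply_offset(BASE_CURVE, 100)
```

With a +100 MHz offset, the step that previously ran 1900 MHz at 850 mV now targets 2000 MHz at the same voltage, which is roughly what a curve offset in a tuning tool expresses.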

The firmware constantly estimates the energy consumption at very short intervals (virtually in real time), simultaneously queries all the sensors and the GPU prediction, and incorporates the telemetry data from the voltage regulator and the input monitoring (shunts, image below). These values are sent to the pre-programmed DPM (digital power management), i.e. the arbitrator. This control complex also knows the power, thermal and current limits of the GPU (BIOS, driver), which it can read from the respective registers. Within these limits, it controls the temperatures, all voltages, clock frequencies and fan speeds and always tries to get the maximum performance out of the card. If even one of these limits is exceeded, the arbitrator can reduce the voltage or clock rate.
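The decision described above can be sketched as a single control step. This is a deliberately simplified toy and not NVIDIA's firmware logic: the classes `Limits` and `Telemetry`, the function `arbitrate` and all numbers are hypothetical, and a real arbitrator weighs far more inputs than three.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """Hypothetical limits the control logic reads from BIOS or driver."""
    power_w: float
    temp_c: float
    current_a: float


@dataclass
class Telemetry:
    """Hypothetical snapshot of the monitored values."""
    power_w: float
    temp_c: float
    current_a: float


def arbitrate(step, telemetry, limits, max_step):
    """Return the next boost step: one down if any limit is exceeded, else one up."""
    exceeded = (
        telemetry.power_w > limits.power_w
        or telemetry.temp_c > limits.temp_c
        or telemetry.current_a > limits.current_a
    )
    if exceeded:
        return max(step - 1, 0)
    return min(step + 1, max_step)


limits = Limits(power_w=200.0, temp_c=83.0, current_a=20.0)
```

Calling `arbitrate(3, Telemetry(210.0, 70.0, 15.0), limits, 4)` returns step 2, because the power reading alone is over its limit even though temperature and current are fine.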

Monitoring the 12V rail on a GeForce RTX 4070 using shunts

The disadvantage of such a publicly visible (and, with suitable software, customizable) “frequency/voltage curve” is that it is actually not easy to define in general terms. All the end user can really modify is a partial shift, based on limit and reference values that were previously calculated individually for each chip under the current conditions! This is where the so-called VFE (Voltage Frequency Engine) comes into play. It provides a flexible framework to specify or evaluate clock frequencies, which are normally a function of voltage, speedo and temperature. Or to put it in a nutshell: the calculated voltage for each frequency point on such a curve is actually a function of the GPU’s speedo, which is determined by “continuous virtual binning”.

You guessed it, now it gets a little trickier. Remember the first paragraphs on binning and the ATE flow: Continuous Virtual Binning (CVB) uses statistical models and algorithms to analyze the performance of semiconductor components continuously and virtually instead of physically testing each one. In the case of our GPU, “Continuous Virtual Binning” means that the voltage decreases by 10 mV (the regular step size) when the speedo is increased by the same amount (based on a linear or quadratic equation). The voltage for each frequency point is also a function of the temperature of the GPU.
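A linear version of this speedo dependency can be sketched in a few lines. Please treat every number as an assumption: the reference speedo of 2000, the reference voltage of 900 mV and the reading of “the same amount” as 10 speedo units per 10 mV step are invented for illustration, and `cvb_voltage_mv` is a hypothetical name.

```python
def cvb_voltage_mv(speedo, ref_speedo=2000, ref_mv=900, mv_per_step=10, speedo_step=10):
    """Linear sketch: each speedo_step above the reference lowers the voltage by mv_per_step.

    All default values are invented; real coefficients live in the VBIOS.
    """
    steps = (speedo - ref_speedo) / speedo_step
    return ref_mv - mv_per_step * steps
```

A chip with a speedo of 2010 would need 890 mV for this frequency point in the sketch, while a slower chip at 1990 would need 910 mV. That is the whole idea of binning without fixed bins: the voltage follows the measured silicon quality continuously.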

The clock frequency and therefore the voltages of the GPU depend on the temperature. Semiconductors (p-type and n-type) can have either a positive or negative temperature coefficient, and as the temperature increases, the carrier mobility in MOS transistors decreases. The lower mobility reduces the drive current, which at typical operating voltages outweighs the accompanying change in threshold voltage (Vt) and makes the transistor slower. Therefore, an increase in temperature will decrease the achievable clock frequency and vice versa. This temperature dependency is captured in the same quadratic equation that uses the chip’s speedo. Since the frequency specified in the steps must logically remain locked, the voltage increases as the temperature rises in order to achieve the required frequency (or vice versa). This quadratic equation, which describes the relationship between the frequencies and their corresponding voltages, is held in the so-called VFE frame, which is stored as part of the configuration data in the VBIOS firmware on the chip’s EEPROM and can no longer be overwritten.
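What such an equation might look like for a single, locked frequency point is sketched below. The form (quadratic in speedo plus a linear temperature term) and every coefficient are assumptions chosen only to give plausible-looking numbers; the real VFE coefficients are not public, and `vfe_voltage_mv` is a hypothetical name.

```python
def vfe_voltage_mv(speedo, temp_c, c0=1300.0, c1=-0.25, c2=0.000025, k_t=0.5, t_ref=50.0):
    """Voltage in mV needed for one fixed frequency point.

    Quadratic in speedo plus a linear temperature term. All coefficients are
    invented for illustration only.
    """
    speedo_part = c0 + c1 * speedo + c2 * speedo ** 2
    temp_part = k_t * (temp_c - t_ref)
    return speedo_part + temp_part
```

In this sketch a chip with speedo 2000 needs about 900 mV at 50 °C and about 915 mV at 80 °C for the same frequency, while a better chip with speedo 2100 gets by with less voltage at either temperature.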

The main function of the VFE is therefore to dynamically adjust the voltage and frequency of the processors in order to optimize performance and energy efficiency. The VFE works closely with the PMU (Power Management Unit) to provide the correct voltage and frequency values for different operating states and load conditions. I will come to this in the next paragraph. In summary, the Voltage Frequency Engine and Speedo work together to optimize performance and energy efficiency. The VFE is responsible for adjusting the voltage and frequency, while Speedo monitors the PVT variations and provides the necessary information for the VFE to make the right adjustments.

Now, let’s take a breath, because it’s not as complicated as it might sound at first. To cut a long story short: you can neither trick nor override the speedo. Whatever you change manually is always based on the stored speedo and the VFE values, over which the end customer has no influence either. And now we also know that good cooling is often worth more than the most brutal OC. It’s the dreaded vicious circle with air-cooled cards: increasing the power limit for a higher clock rate also leads to higher temperatures and therefore to lower clock rates again. You can do this forever and the card won’t get any faster, just thirstier. This is exactly why the opposite approach, undervolting, is so clever: it enables higher boost steps thanks to lower temperatures. In other words, virtually lossless OC for free.
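The effect of raising the power limit on a card with a weak cooler can be illustrated with a toy model. Nothing here is measured: the thermal resistances, the 2 MHz per watt gain, the 60 °C threshold and the 10 MHz per degree penalty are invented, and `steady_temp_c` and `effective_clock_mhz` are hypothetical helpers.

```python
def steady_temp_c(power_w, r_th, ambient_c=25.0):
    """Steady-state GPU temperature for a cooler with thermal resistance r_th (degC/W)."""
    return ambient_c + r_th * power_w


def effective_clock_mhz(power_limit_w, r_th):
    """Toy model: more power budget allows more clock, but heat takes clock away.

    The 2 MHz/W gain, the 60 degC threshold and the 10 MHz/degC penalty
    are invented numbers for illustration.
    """
    unconstrained = 2000.0 + 2.0 * (power_limit_w - 200.0)
    temp = steady_temp_c(power_limit_w, r_th)
    penalty = max(0.0, temp - 60.0) * 10.0
    return unconstrained - penalty


weak_cooler = [effective_clock_mhz(p, 0.25) for p in (200.0, 250.0)]
good_cooler = [effective_clock_mhz(p, 0.15) for p in (200.0, 250.0)]
```

With the weak cooler, the extra 50 W ends in a slightly lower clock than before because the temperature penalty eats the gain, whereas the better cooler turns the same 50 W into a real increase. The model is crude, but it shows why cooling comes first.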

22 replies

Legalev

Member

47 comments, 51 likes

#1 Dec 30, 2023

Very interesting article.
It sounds like a lot of effort to test all of that.

Roughly how long does such a process take until it is decided which category a GPU falls into?

Igor Wallossek

10,265 comments, 19,006 likes

#2 Dec 30, 2023

It goes pretty quickly. :)

Martin Gut

Old-timer

7,828 comments, 3,599 likes

#3 Dec 30, 2023

Interesting. I'm a bit surprised that, with chips tested this precisely, the manufacturers still program in a reserve of 0.08 to 0.10 volts that you can then trim away by undervolting. If the voltage were set a little lower from the start, the cards would be noticeably more economical. Presumably they simply don't want to take the risk of a chip not running stably, so they'd rather supply a little more voltage and accept the higher consumption.

stch

Member

20 comments, 7 likes

#4 Dec 30, 2023

In mass production we are typically talking about cycle times in the range of a few seconds.

stch

Member

20 comments, 7 likes

#5 Dec 30, 2023

Economically understandable. Field returns are extremely expensive, whereas higher consumption at the customer's end or slightly lower performance costs the manufacturer nothing.

grimm

Old-timer

3,107 comments, 2,046 likes

#6 Dec 30, 2023

Speedo isn't really my thing. Happy New Year to you all!

Igor Wallossek

10,265 comments, 19,006 likes

#7 Dec 30, 2023

Maturing and electromigration. What still works today may already be unstable the day after tomorrow. So it's better to plan in reserves for 2 years. :D

Guest

#8 Dec 30, 2023

Great reading material!
(I've read the NVIDIA whitepaper, heh, with a transcription lexicon, TU Leipzig and old MIT material from 2019-22 alongside, and lots of coffee.) 1300 pages.
New light-duty work machine in test operation.
The new Ada A 4500 (2900 euros for a 192-bit interface is already fairly steep, but in return you get the full RAM of the 4090 at a fixed 180-200 watts).
What NVIDIA simply can do, when they want to, is tie the performance package to power consumption.
And always with a view to the expected application. They are relatively conservative there when you read source code or do AI; that builds up over whole generations. And that is the advantage they have. Plus the alchemy of the new lithography machines, which is closely guarded. Only relatively few people are supposed to know everything, and then there is NVIDIA's single-minded technical direction, which doesn't spread as wide as AMD's.
In Formula 1 terms you would say: the Vatican (Ferrari) has enough play money, Red Bull has even more, and McLaren will be world champion in 2024 because they're going all in.
AMD could do that if they knew what to leave out. On top of that comes the hype around AI, which brings us nice new weapons, bombers, digital warfare, robotics and so on, new BANK POWER; money is becoming faster and more complex than ever. Medicine (only for those who have MONEY; the ELYSIUM effect on our society is emerging, autocratic money democracies). And a bit too much SURVEILLANCE, and the new DIGITAL ESTATES-BASED STATE.**
In this segment NVIDIA currently sells 39%, and rising, of its HX-100, 200 and 300+.

That leaves only 15% for Hollywood and under 8% for GAMING. That's the look into the years after '24, around 2030.
** The social "roolo ashole 3.0" is being rolled out; there are no people any more, only digital corpses with halos.
Good.
The MSI boots (400 euros for the AM5 board, because as an ASUS person I distrust ASUS. Shall we heat things up?), the 16-core wakes up, RAM, the Pyn is there, yes (pure Linux first).
The innards are lying around, COOLED, careful, with a BIG FAN (3 Noctuas attached), everything is black, lol, and water cooling; Rome won't stand much longer (Kraken) (black) (a beQ Big Rock is lurking).

Creatively black or St. Gotthard dark white :) peace :)

ALL THE BEST FOR A HEALTHY NEW YEAR! And drink only the good stuff, in moderation. :)

Guest

#9 Dec 30, 2023

Guest

#10 Dec 30, 2023

OpenAI and Axel Springer are cooperating!

For the first time, diarrhea and toilet paper are going down the same path.
Supposedly there are 20 million subscribers.
I will never understand it. 20 million brain-dead people reading their own stool.
It can only get better in 2024 :)

Daves085

Newcomer

9 comments, 14 likes

#11 Dec 30, 2023

Is there actually a basics article on how microchips can differ in quality at all?
How should I picture that as a technical layperson? A circuit is a circuit, isn't it? I understand that chips can sometimes have defective areas that get deactivated, but why does that mean I have to run the chip at higher voltages?

Pfannenwender

Veteran

302 comments, 195 likes

#12 Dec 30, 2023

That's about as much as I understood. :unsure:

Happy New Year to you too. 👍

Igor Wallossek

10,265 comments, 19,006 likes

#13 Dec 30, 2023

Wafer quality, lithography... Even the tiniest deviations and blurring are enough. In addition, the number of good chips decreases toward the edge; the premium material almost always comes straight from the center :)

Guest

#14 Dec 30, 2023

And that's why the RTX 4090 D is called... DIESEL? Sorry.
(This diplomatic contortion has something of the Habsburgs and Wallenstein and so on about it... chschinna)

Question: does GRAVITY have an influence on the lithography machine? I would build that on the space station.
Plus an extra surcharge from NVIDIA, right?

Guest

#15 Dec 30, 2023

Off-topic info:
AM5 INFO: 16 cores + Pyn A 4500, 2 M.2, 2 SSDs

Board: MSI MPG X670E Carbon WIFI (430 euros, 26.11.23)

Good:

- No coil whine with a dedicated graphics card, and none without it either (so the audio section is quiet)
- 6 SATA ports without lane sharing with the 4 M.2 ports, though a bit obstructed
- PCI-E Gen 5 support

Bad:

- Boot behavior and restarts take forever after changing settings. Soft resets are sometimes required. Not suitable for frequent BIOS/OC changes. I don't really do that anyway.
The board runs DDR5-6000, and unfortunately there is still the typical problem with boot times as soon as you use EXPO and demand maximum speed from the RAM
- MSI Center and apps frequently hang or fail to start. (see what one doesn't ...)
- I couldn't care less about the lighting gimmicks (snipped off)

- Boot times, BIOS, restart: after the first cold setup this takes much longer.
- Can reach 5 minutes (once the OS drivers etc. are in place we get below 45 sec)
- On a soft reset, however, the board starts without problems and boots into Windows. Super!
- I don't care about the MSI Center software anyway, only what's necessary, and on we go

CPU: Ryzen 9 7950X

Not much to say about it. I find it lukewarm (if you're coming from a gen-2 Threadripper)

Cooling: for the first time the installer is doing water cooling, the KRAKEN. The innards fit in the Corsair 500... So far the temps are moderate (CPU is set to test temp, 30% below Vmax).
45 to 68..72 degrees (CB 2023 already at 30,000+ here)

RAM: the usual 2 x Corsair DDR5-6000 modules, 64 GB for now.
No conflict with the latest AGESA (I have a nice collection there)

- PCI: the first slot is honestly a bit stupid, below the CPU area and lower, because of the M.2 slots

- M.2: since I prefer things lukewarm, the Samsung 990P are the limit (interesting temps
when starting the Corona render engine)

- Nivea Pyn A 4500. Uneventfully normal. The connector has nothing to do.
- The new SEASONIC 1000 is sufficient (it has been able to air out since November)
- 8 gray Noctua 14s spin at 480 to 800. That's enough.
- After the trial period the Pyn will be sawn apart and the tiny propeller removed and burned.
Noctua, lol (I'm not going to relearn at my age)

--- All of it inside the Corsair 500. 2 more normal SSDs, 4 TB for data. There's still enough room.
------------------------------------------------

2h22 special (some migrations from W11 are implanted in the 10; this includes
surgically removing all unnecessary parts. A script, because when UNREAL-5-4-1 is installed
for the first time, Windows has 2 hours to mess everything up; for that there is a so-called
scratch Windows 10 container where it all lands = review and delete afterwards, etc.)
----------------------------------------------
Now, after 2 hours, it's done.
Boots at normal speed. No blue screens. No hiccups.
Ur5
Maya
C4D 2023
ZBrush
3dCoat (I recommend it, the bad sister of ZBrush)
Arnold
Redshift
Corona (beer)
1200 plugins
Audio stuff

10-bit monitor touch-up (no, a flat 4K LG or BenQ)

Try doing 3ds Max on a curved Samsung... enough to make you throw up ::)
----------------------------------------
1 hour pointless test, C4D R2023, Corona CPU + GPU = 2 GB file.
Load: CPU 94% +/-, GPU 86%-90% (I like a bit of headroom = we'll burn it down later)

TEMPS: PPO is still set to gentle. You can forget undervolting on the 16-core = throttling, probably.
Room: 19 degrees, normal humidity. (Gösser Bock beer in the bottle, cold)
Hammers up to 180 watts, so normal. Temps not above 78 degrees (so the water cooling is the right way round?)
The GPU goes flat out with the mini propeller and gets very hot here (Corona likes that)
The 2nd M.2 can get extremely hot until the data load finishes, then normal.
SUMMARY: stable, within the temperature window.
---------------------------------works.

And now the second one: i9-13900K, 250 watts and ..) Adobe bomber, ASUS board, bending kit, etc.

Regards, peace :) and now just lazy and relaxing

LurkingInShadows

Old-timer

1,371 comments, 571 likes

#16 Dec 30, 2023

As Igor already wrote, there are ALWAYS slight deviations. If, for example, the trace diameter isn't quite right at one point, you have to compensate for it; rework isn't possible at 5 nm.

eastcoast_pete

Old-timer

1,531 comments, 863 likes

#17 Dec 30, 2023

However, the higher consumption is also paid for by the customer, whereas a more poorly binned GPU that can no longer cope with 100 mV less would cost NVIDIA (or, with Navi, AMD) money.

eastcoast_pete

Old-timer

1,531 comments, 863 likes

#18 Dec 30, 2023

@Igor Wallossek: Thanks, very interesting!
And now something that may also reveal my ignorance: in general, I'm currently still disappointed by the "AI" capabilities, especially in power management. A controller that can learn the individual characteristics of the GPU (or APU) should allow even finer tuning with correspondingly lower consumption, or am I completely off base? In that respect, Intel's "AI-optimized Thread Director" for Meteor Lake was also rather weak; all the "AI" for it was and is done at Intel's factory and then flows into the firmware, rather than running live on the NPU in the SoC. Truly individual per-chip optimization would (IMHO) be a big step forward. Something like that could then learn which applications you use together, when and how, and thus (for CPUs) put cores into deep sleep and wake them, and for GPUs and CPUs set clock frequency and voltage even more finely and in anticipation. All for better efficiency and (!) snappiness.

Question: how does power consumption change depending on the situation in games like Cyberpunk? Are there articles to read up on that? (And I know I sometimes ask odd things 😁.) If an AI were added to the control loop here, it could also adjust and optimize consumption and performance faster and more finely.

LurkingInShadows

Old-timer

1,371 comments, 571 likes

#19 Dec 30, 2023

And which company cares about that? Feel free to name ones worldwide.....

About the author

Igor Wallossek

Editor-in-chief and name-giver of igor'sLAB as the content successor of Tom's Hardware Germany, whose license was returned in June 2019 in order to better meet the qualitative demands of web content and challenges of new media such as YouTube with its own channel.

Computer nerd since 1983, audio freak since 1979 and pretty much open to anything with a plug or battery for over 50 years.
