First of all, I can reassure everyone: compared to the current Zen 3 generation, not much will change with Zen 4. Even so, there are still plenty of fuzzy hypotheses and ambiguities about how AMD reads its monitoring temperatures and reports them to the user. With the upcoming CPUs in mind, I would like to shed some light on this today. Unfortunately, terms such as Tctl are often misinterpreted and misunderstood, and when it comes to temperature and power management, things get truly murky. But don’t worry, I’m going to break it all down in terms that are as generally understandable as possible.
Tctl and Tcase demystified
To properly understand the value of Tctl, one must take a closer look at the origin and purpose of this temperature data. On Socket AM5 processors, this so-called on-die temperature monitoring is exposed either via the sideband temperature sensor interface (SB-TSI), which follows the SMBus v2.0 specification, or by reading the THM_TCON_CUR_TMP register. Both methods return the same value, and a more detailed explanation would only confuse here. SB-TSI is largely identical to the interfaces of common thermal diode monitoring devices and, as usual, can be read out quite simply.
The reported value Tcontrol (Tctl for short) is provided to the platform to control the cooling solution, but it does not represent the actual temperature of the chip or the processor case! The maximum value of Tcontrol is fixed and normalized to 100 for all processors, independent of the processor's maximum case temperature Tcase.
Before we get to Tcase, the (TDP class-dependent) case temperature, we first need to look at Tjunction (Tj for short), the junction temperature. As with Zen 3 (and older), the customer can determine the actual junction temperature using the familiar formula Tj = Tctl + Tj,offset. Interestingly, the Zen 4 docs only list 0 as the offset for all TDP classes, so either nothing has been entered in the specs yet or no offset is used here at all. In the latter case you could even assume that Tj and Tctl are identical. Personally, I suspect the real offset is higher and is simply not disclosed so as not to scare the end user.
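The formula can be written down as a one-line helper. This is only a minimal sketch: the function name is hypothetical, and the default offset of 0 simply mirrors what the Zen 4 docs currently list, which may not be the final value.

```python
def junction_temperature(tctl: float, tj_offset: float = 0.0) -> float:
    """Return Tj = Tctl + Tj,offset.

    tj_offset defaults to 0 because that is the only value the Zen 4
    docs list so far (assumption: this may change or differ per SKU).
    """
    return tctl + tj_offset


# With an offset of 0, Tj and Tctl are identical.
print(junction_temperature(72.5))
# A purely illustrative non-zero offset, not a value from any spec.
print(junction_temperature(70.0, tj_offset=10.0))
```

With the documented offset of 0, the helper just returns Tctl unchanged; the second call only shows how a non-zero offset would shift the result.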
Tctl should always be used to control the fan speed so that the processor stays within its temperature specification, and the system can also use it to throttle the processor. The processor additionally has a fast ALERT_L pin, which allows an interrupt-driven model instead of the rather slow polling approach. The usual calculation Tctl - Tctl,max (as used in HWiNFO64, for example) ultimately only indicates by how many degrees Celsius a processor is below the maximum temperature (100).
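As a rough illustration of both uses of Tctl, the sketch below computes the margin to the normalized maximum of 100 and maps Tctl to a fan duty cycle with a simple linear curve. The curve breakpoints and duty values are assumptions chosen for the example, not values from AMD or any mainboard vendor.

```python
TCTL_MAX = 100.0  # normalized maximum of Tcontrol, as described above


def tctl_margin(tctl: float) -> float:
    """Return Tctl - Tctl,max; negative means that many degrees below the maximum."""
    return tctl - TCTL_MAX


def fan_duty(tctl: float,
             t_low: float = 40.0, t_high: float = 90.0,
             duty_min: float = 20.0, duty_max: float = 100.0) -> float:
    """Map Tctl to a fan duty cycle in percent with a linear ramp.

    All four curve parameters are hypothetical defaults for illustration.
    """
    if tctl <= t_low:
        return duty_min
    if tctl >= t_high:
        return duty_max
    fraction = (tctl - t_low) / (t_high - t_low)
    return duty_min + fraction * (duty_max - duty_min)


print(tctl_margin(85.0))  # 15 degrees below the maximum
print(fan_duty(65.0))     # halfway up the ramp
```

A real fan controller would normally add hysteresis on top of such a curve; that is left out here to keep the example short.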
What has been improved further with Raphael, however, is CUR_TEMP. This filtering feature smooths the reported temperature and helps avoid nervous fan speed changes in response to reported spikes. We still remember the seemingly arbitrary, sudden howling of CPU fans, especially from older Intel mainboards, where the annoying up and down drew harsh criticism until it was fixed. Here, the values for Tctl are effectively clipped and slowed down.
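How AMD's filter works internally is not disclosed, so the following is only a sketch of one common way to "clip and slow down" a temperature signal: a slew-rate limiter that lets the reported value move by at most a fixed step per sample. The function name and the step size are assumptions for illustration.

```python
def slew_limit(samples, max_step=1.0):
    """Limit how far the reported value may move between two samples.

    This is a generic slew-rate limiter, not AMD's actual filter.
    """
    reported = []
    previous = None
    for sample in samples:
        if previous is None:
            previous = float(sample)  # first sample is passed through
        else:
            # Clamp the change to the range [-max_step, +max_step].
            delta = max(-max_step, min(max_step, sample - previous))
            previous = previous + delta
        reported.append(previous)
    return reported


raw = [50.0, 50.0, 80.0, 50.0, 50.0]   # one short spike to 80
print(slew_limit(raw, max_step=2.0))   # the spike shows up as a 2-degree bump
```

A fan curve driven by the filtered series barely reacts to the single-sample spike, whereas the raw series would cause the fan to rev up and down.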
Using Tcase as a guideline is a tricky matter, because depending on the TDP class, Raphael is oriented toward a different value that fits the required cooling solution. This also relates to the sum of the thermal resistances between the IHS (processor lid) and the environment. What all values have in common, however, is the reference point in the middle of the IHS surface. The target values themselves are shown in the table for Tcase:
Thermal management
In parallel with these readout values for fan and temperature control, there are other thermal functions. The most important are Configurable Hardware Thermal Control (cHTC), PROCHOT and ThermTrip. These are highly interesting because they take place only inside the processor and can neither be influenced nor directly observed from the outside. cHTC provides smooth P-state transitions to maximize performance during actual operation. The default Tctl limit for cHTC is set to 95. This is a specially protected value that cannot be changed by the BIOS. The end user should therefore adjust the fan speed so that the processor operates at, or better still below, the Tctl limit under full load, because otherwise the clock rate headroom can no longer be utilized (thermal throttling).
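To make the practical consequence concrete, here is a small sketch that checks a Tctl reading against the cHTC limit of 95. The function names are hypothetical, and treating "at or above the limit" as "thermally limited" is a simplifying assumption; the real cHTC behaviour inside the processor is not visible from outside.

```python
CHTC_TCTL_LIMIT = 95.0  # default cHTC limit for Tctl, as stated above


def chtc_headroom(tctl: float) -> float:
    """Return how many degrees of Tctl remain before the cHTC limit."""
    return CHTC_TCTL_LIMIT - tctl


def is_thermally_limited(tctl: float) -> bool:
    """Assumption: clock headroom is lost once Tctl reaches the limit."""
    return tctl >= CHTC_TCTL_LIMIT


for reading in (80.0, 94.5, 95.0):
    print(reading, chtc_headroom(reading), is_thermally_limited(reading))
```

A monitoring script could log the headroom under full load to judge whether the fan curve leaves enough margin.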
PROCHOT has been around since Intel’s Pentium 4 processors. There, the digital output pin indicates that the internal thermal control circuitry has been activated, which happens when the processor reaches its maximum safe operating temperature. For the upcoming Alder Lake and Raptor Lake CPUs, the SoC frequency will continue to be determined by PL4. Not so with AMD. Unlike Intel's implementation, PROCHOT_L is a true unidirectional pin: only the system can trigger PROCHOT and put the processor into the active PROCHOT state.
In this state, the processor initiates a transition to the lowest frequency (Fmin). This value is also fixed and cannot be changed. The power reduction is achieved within 1.5 ms after PROCHOT is asserted. PROCHOT can only be triggered by an external agent every 5 ms (for an unlimited duration). Intel’s Fast PROCHOT# is significantly faster here. Vsys1 is monitored by the IMVP9.1 controller, and PROCHOT is activated within 2 μs (adjustable) after the threshold is exceeded. The CPU is then throttled just 1 μs later. Fast PROCHOT# thus allows Intel to run a higher PL4, resulting in better responsiveness down to low load states while maintaining system stability, but it also produces harsher load cycling.
And then there is the Thermtrip_L pin for final protection (shutdown). It is activated by the processor itself when the processor temperature exceeds a preset limit. The processor clocks are turned off and a low-voltage VID code is sent to the voltage regulator. In such a case, the system should enter the system shutdown state (S5) within 500 ms. The Thermtrip_L pin is bidirectional, so either the system or the processor can trigger the Thermtrip function by pulling the Thermtrip_L pin low. Thermtrip_L serves as protection against permanent hardware damage. SB-TSI, cHTC and ThermTrip all use the same on-die temperature sensing mechanism that I described above for Tctl.
There are no more secrets here. The changes compared to Zen 3 are more of a minor evolution, and the basic principle remains the same, so much of this can also be applied to Zen 3. By the way, it’s interesting that AMD now also refers to the 16-core part internally as an APU, which hides a graphics unit in the I/O die. But now we come to power management on the next page, because the waste heat and the resulting temperatures have to come from somewhere.