So Nvidia has managed to create a Turing chip that takes the conventional, well-known "RTX off" route and still brings important architectural innovations with it. We'll see how much of that arrives in the GeForce GTX 1660 Ti. For every gamer now wondering how Nvidia's Turing architecture would perform with the Tensor and RT cores taken out, today's test also provides the answer.
The GeForce GTX 1660 Ti is based on the TU116-400, an all-new graphics processor that integrates Turing's improved shaders, new cache architecture, and adaptive shading support. The GPU is also connected to faster GDDR6 memory, just like the more expensive GeForce RTX 20 series models. By omitting the RT cores for accelerated ray tracing and the tensor cores for inferencing in games, the TU116 is a leaner graphics processor that has to (and can) do without these features.
The EUR 299 MSRP, including VAT, announced by Nvidia is certainly a bit of a downer, but I will go into more detail on the price question on the last page. Improved performance per euro is, after all, not something we have seen so far from the Turing generation. But let's first look for the strengths of this Full HD card and settle the cost question at the end.
Launch without reference cards
There is no reference card, so this time the launch goes directly to the board partners. This also has its charm, because on the first day you can examine and test the manufacturers' own designs. For my test, I picked the MSI GTX 1660 Ti Gaming X as the main card, as it reaches the customer with the highest factory clock. In the benchmarks and a few other sections I have, of course, also included the results of the other two cards. But the effort for a teardown and board analysis of two or even three cards is simply too high, so I ask for your understanding.
The other two cards are the much simpler MSI GTX 1660 Ti Ventus XS (without RGB) and the EVGA GTX 1660 Ti XC Black Gaming, which my US colleague was sampled with. Unfortunately, I only have the benchmark data for the latter, as EVGA did not send us a sample. A pity, but it can't be helped. At least the MSI GTX 1660 Ti Ventus XS is the model that MSI also wants to position on the market at Nvidia's recommended retail price. The Gaming X is more of an "uber" 1660 Ti. Funnily enough, the performance isn't that different, but I don't want to spoil it.
The look and feel of the MSI GTX 1660 Ti Gaming X
The 869 gram card measures 24.8 cm in overall length from the outer edge of the slot bracket to the end of the cooler shroud. It is 4.2 cm thick (plus 5 mm for the backplate on the rear) and 12 cm high (from the top edge of the motherboard slot to the top edge of the cooler shroud). The anthracite-coloured cooler shroud is kept in the usual MSI edge style and is quite restrained. It only gets colorful when powered, because translucent plastic strips have been incorporated, with RGB LEDs behind them that can also be controlled by software.
The backplate is made of brushed aluminium and is also screwed to the cooling frame from the front. For the rest, I have the complete teardown in text, pictures and, this time, also video.
The look and feel of the MSI GTX 1660 Ti Ventus XS
The card, which weighs only 705 grams and is quite short, measures just 20.3 cm in overall length from the outer edge of the slot bracket to the end of the cooler shroud. It is 3.5 cm thick (plus 4 mm for the backplate on the rear) and 12 cm high (from the top edge of the motherboard slot to the top edge of the cooler shroud). The silver-metallic cooler shroud contrasts with the matte-black body and does without any lighting.
The backplate is made of black plastic with a brushed look and is likewise screwed to the cooling frame from the front. There is not much more to report at this point, because the high-resolution pictures in the gallery already do the job.
What makes Turing better than Pascal?
Since the autumn of last year, we have seen Nvidia launch four different GPUs, moving further and further down the Turing hierarchy. With each new chip, Nvidia has also cleverly used its resources to set new price points without cannibalizing its own products. Of course, you can view that however you like.
A GeForce RTX 2060 is equipped with 44% of the CUDA cores and texture units of an RTX 2080 Ti, 54% of its ROPs and memory bandwidth, and 50% of its L2 cache. After all the patches and updates, many of the RTX features could still be used well even on this card, as long as you stayed at Full HD resolution. But everyone will also have noticed that the pain threshold was reached right there, so Nvidia is now using a "normal" card to round off the portfolio further down.
After removing the RT and tensor cores, what remains is a 284 mm² chip with 6.6 billion transistors, manufactured on TSMC's 12 nm FinFET process. But despite the reduced transistor count, the TU116 is still 42% larger than its predecessor, the GP106! Part of the increase in size is certainly due to Turing's more sophisticated shaders. Like the high-end cards of the GeForce RTX 20 series, the GeForce GTX 1660 Ti now supports the simultaneous execution of FP32 arithmetic instructions (which make up the bulk of shader workloads) and INT32 operations (for addressing/fetching data, floating-point min/max, etc.).
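The size comparison can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the GP106 figures (200 mm², 4.4 billion transistors) from the comparison table at the end of this page; the dictionary and function names are made up for illustration.

```python
# Figures quoted in this article; the GP106 values come from the comparison table.
TU116 = {"die_mm2": 284, "transistors_bn": 6.6}
GP106 = {"die_mm2": 200, "transistors_bn": 4.4}


def percent_increase(new: float, old: float) -> float:
    """Relative growth of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


def density_mtr_per_mm2(chip: dict) -> float:
    """Average transistor density in million transistors per mm²."""
    return chip["transistors_bn"] * 1000.0 / chip["die_mm2"]


area_growth = percent_increase(TU116["die_mm2"], GP106["die_mm2"])
transistor_growth = percent_increase(TU116["transistors_bn"], GP106["transistors_bn"])

print(f"Die area:    +{area_growth:.0f}%")
print(f"Transistors: +{transistor_growth:.0f}%")
print(f"Density TU116: {density_mtr_per_mm2(TU116):.1f} MTr/mm²")
print(f"Density GP106: {density_mtr_per_mm2(GP106):.1f} MTr/mm²")
```

The density figures are plain averages over the whole die and say nothing about how the area is split between shaders, cache and I/O.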
This explains to a large extent Turing's performance gain over Pascal at the same clock. Turing's streaming multiprocessors consist of fewer CUDA cores than Pascal's, but the design compensates for this in part by spreading more SMs across each GPU. The newer architecture assigns one scheduler to each set of 16 CUDA cores (twice as many as Pascal), along with one dispatch unit per 16 CUDA cores (the same as Pascal).
Four of these 16-core groupings make up the SM, together with 96KB of cache, which can be configured as 64KB L1/32KB shared memory or vice versa, and four texture units. Because Turing has twice as many schedulers as Pascal, only one instruction needs to be issued to the CUDA cores every second clock cycle. In between, there is enough room to send a different instruction to any other unit, including the INT32 cores.
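The "every second cycle" cadence follows from simple division. The sketch below is a toy model, not a description of Nvidia's actual scheduler: it assumes a warp of 32 threads (the usual warp size on Nvidia GPUs, not stated in this article) executing on the 16 FP32 lanes per scheduler described above, and all names are hypothetical.

```python
WARP_SIZE = 32            # assumption: threads per warp, not taken from the article
LANES_PER_SCHEDULER = 16  # FP32 cores per scheduler, as described in the text


def cycles_per_warp_instruction(warp_size: int = WARP_SIZE,
                                lanes: int = LANES_PER_SCHEDULER) -> int:
    """Cycles the FP32 lanes are busy with one warp-wide instruction."""
    return -(-warp_size // lanes)  # ceiling division


def issue_timeline(n_cycles: int) -> list:
    """Toy timeline: FP32 gets a new instruction only when its lanes are free;
    the cycles in between are available for another unit such as INT32."""
    busy = cycles_per_warp_instruction()
    timeline = []
    for cycle in range(n_cycles):
        target = "FP32" if cycle % busy == 0 else "other unit (e.g. INT32)"
        timeline.append((cycle, target))
    return timeline


for cycle, target in issue_timeline(4):
    print(f"cycle {cycle}: issue to {target}")
```

With these assumptions the FP32 lanes are occupied for two cycles per instruction, which is exactly the gap in which the scheduler can feed the INT32 path.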
Nvidia packs 24 SMs into the TU116 and divides them into three graphics processing clusters. With 64 FP32 cores per SM, that's 1,536 CUDA cores and 96 texture units for the entire GPU. Six 32-bit memory controllers give the TU116 an aggregate 192-bit bus, which feeds the six 12Gb/s GDDR6 modules (Micron MT61K256M32JE-12:A) with up to 288GB/s. That's 50% more memory bandwidth than the GeForce GTX 1060, which helps the GeForce GTX 1660 Ti maintain its performance advantage at 2560×1440 with anti-aliasing enabled.
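The totals in this paragraph can be derived from the per-unit figures. This is a minimal sketch using only numbers from the text and the comparison table (192 GB/s for the GTX 1060); the constant names are chosen for illustration.

```python
# Per-unit figures from the text.
SM_COUNT = 24
FP32_CORES_PER_SM = 64
TEXTURE_UNITS_PER_SM = 4
MEMORY_CONTROLLERS = 6
CONTROLLER_WIDTH_BITS = 32
GDDR6_DATA_RATE_GBPS = 12      # per pin
GTX_1060_BANDWIDTH_GBS = 192   # from the comparison table

cuda_cores = SM_COUNT * FP32_CORES_PER_SM
texture_units = SM_COUNT * TEXTURE_UNITS_PER_SM
bus_width_bits = MEMORY_CONTROLLERS * CONTROLLER_WIDTH_BITS

# Bandwidth = bus width (bits) x data rate per pin (Gb/s) / 8 bits per byte.
bandwidth_gbs = bus_width_bits * GDDR6_DATA_RATE_GBPS / 8
uplift_percent = (bandwidth_gbs / GTX_1060_BANDWIDTH_GBS - 1) * 100

print(cuda_cores, texture_units, bus_width_bits, bandwidth_gbs, uplift_percent)
```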
Each memory controller is paired with eight ROPs and a 256KB slice of the L2 cache. In total, that makes 48 ROPs and 1.5 MB of L2 for the TU116. The ROP count of the GeForce GTX 1660 Ti looks surprisingly high next to the GeForce RTX 2060, which also has 48 ROPs. However, the L2 cache slices are only half the size. Despite the larger die, the 50% higher transistor count and the more aggressive GPU Boost clock, the GeForce GTX 1660 Ti is specified for the same 120W TDP as the GeForce GTX 1060.
Unfortunately, neither of the two graphics cards offers multi-GPU support. Nvidia confirms once again that multi-GPU setups are only intended to achieve higher absolute performance, rather than giving gamers a way to build a configuration from cheaper cards that could cannibalize a more expensive single card. However, the board partners will no doubt aim for high factory clock rates in order to at least narrow the gap between the GeForce GTX 1660 Ti and the RTX 2060. The official base clock is only 1,500 MHz, with a GPU Boost specification of 1,770 MHz.
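The back-end figures can be reproduced the same way. The sketch below uses the per-controller numbers from the text; the RTX 2060 slice size is derived under the assumption that its 192-bit bus (see the comparison table) is also built from six 32-bit controllers, which this article does not state explicitly.

```python
MEMORY_CONTROLLERS = 6
ROPS_PER_CONTROLLER = 8
L2_SLICE_KB = 256

total_rops = MEMORY_CONTROLLERS * ROPS_PER_CONTROLLER
total_l2_mb = MEMORY_CONTROLLERS * L2_SLICE_KB / 1024

# RTX 2060: 3 MB of L2 per the comparison table.
# Assumption: six 32-bit controllers, inferred from its 192-bit bus.
RTX_2060_L2_MB = 3
RTX_2060_CONTROLLERS = 192 // 32
rtx_2060_slice_kb = RTX_2060_L2_MB * 1024 / RTX_2060_CONTROLLERS

print(f"TU116: {total_rops} ROPs, {total_l2_mb} MB L2")
print(f"L2 slice ratio TU116 / RTX 2060: {L2_SLICE_KB / rtx_2060_slice_kb}")
```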
Faster arithmetic even without tensor cores
In addition to the shaders and the unified cache of the Turing architecture, TU116 also supports a pair of algorithms called Content Adaptive Shading and Motion Adaptive Shading, collectively known as variable-rate shading. I already wrote a longer introduction to this for the launch of the GeForce RTX 2080 (Ti). Nvidia has also revealed that it is replacing the tensor cores with dedicated FP16 cores, which allow the GeForce GTX 1660 Ti to process half-precision operations at twice the FP32 rate.
However, the other Turing-based GPUs also offer doubled FP16 throughput, so it's unclear how unique the GeForce GTX 1660 Ti really is within the Turing family. What the following chart does show very clearly is that the GeForce GTX 1660 Ti offers a massive improvement in half-precision throughput over the GeForce GTX 1060 and its Pascal-based GP106 chip:
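The 2:1 ratio can be put into numbers with the usual peak-throughput estimate. This is a rough sketch, assuming the common convention of counting one fused multiply-add as two floating-point operations per core and clock, and using the reference GPU Boost clock of 1,770 MHz; the function name is hypothetical.

```python
def peak_tflops(cores: int, clock_mhz: float, flops_per_core_per_clock: int = 2) -> float:
    """Theoretical peak throughput in TFLOPS.

    Assumes each core retires one fused multiply-add (2 FLOPs) per clock.
    """
    return cores * flops_per_core_per_clock * clock_mhz * 1e6 / 1e12


CUDA_CORES = 1536
BOOST_CLOCK_MHZ = 1770     # reference GPU Boost specification
FP16_RATE_VS_FP32 = 2      # dedicated FP16 cores run at twice the FP32 rate

fp32_tflops = peak_tflops(CUDA_CORES, BOOST_CLOCK_MHZ)
fp16_tflops = FP16_RATE_VS_FP32 * fp32_tflops

print(f"FP32 peak: {fp32_tflops:.2f} TFLOPS")
print(f"FP16 peak: {fp16_tflops:.2f} TFLOPS")
```

Under these assumptions the results land at roughly 5.4 and 10.9 TFLOPS. Factory-overclocked cards with higher boost clocks come out correspondingly higher, and real workloads rarely reach the theoretical peak.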
Technical data and comparison cards
At the end of this introduction, here are the cards of the new generation and those of the old generation in a direct tabular comparison:
Model | MSI GeForce GTX 1660 Ti Gaming X | EVGA GeForce GTX 1660 Ti XC Black Gaming | MSI GeForce GTX 1660 Ti Ventus XS OC | GeForce GTX 1060 FE | GeForce GTX 1070 | GeForce RTX 2060 FE |
---|---|---|---|---|---|---|
Architecture | Turing (TU116) | Turing (TU116) | Turing (TU116) | Pascal (GP106) | Pascal (GP104) | Turing (TU106) |
CUDA cores | 1536 | 1536 | 1536 | 1280 | 1920 | 1920 |
Tensor cores | N/A | N/A | N/A | N/A | N/A | 240 |
RT cores | N/A | N/A | N/A | N/A | N/A | 30 |
Texture units | 96 | 96 | 96 | 80 | 120 | 120 |
FP16 throughput (peak) | 10.9 TFLOPS | 10.9 TFLOPS | 10.9 TFLOPS | 4.4 TFLOPS | 6.5 TFLOPS | 12.4 TFLOPS (51.7 TFLOPS Tensor) |
FP32 throughput (peak) | 5.5 TFLOPS | 5.5 TFLOPS | 5.5 TFLOPS | 4.4 TFLOPS | 6.5 TFLOPS | 6.2 TFLOPS |
Base clock | 1500 MHz | 1500 MHz | 1500 MHz | 1506 MHz | 1506 MHz | 1365 MHz |
Boost clock | 1875 MHz | 1830 MHz | 1830 MHz | 1708 MHz | 1683 MHz | 1680 MHz |
Memory | 6 GB GDDR6 | 6 GB GDDR6 | 6 GB GDDR6 | 6 GB GDDR5 | 8 GB GDDR5 | 6 GB GDDR6 |
Memory bus | 192-bit | 192-bit | 192-bit | 192-bit | 256-bit | 192-bit |
Bandwidth | 288 GB/s | 288 GB/s | 288 GB/s | 192 GB/s | 256 GB/s | 336 GB/s |
ROPs | 48 | 48 | 48 | 48 | 64 | 48 |
L2 cache | 1.5 MB | 1.5 MB | 1.5 MB | 1.5 MB | 2 MB | 3 MB |
TDP | 120 W | 120 W | 120 W | 120 W | 150 W | 160 W |
Transistors (billions) | 6.6 | 6.6 | 6.6 | 4.4 | 7.2 | 10.8 |
Die size | 284 mm² | 284 mm² | 284 mm² | 200 mm² | 314 mm² | 445 mm² |
SLI | No | No | No | No | Yes | No |
Test system and measurement methods
I have already described the new test system and the methodology in great detail in the basic article "How we test graphics cards, as of February 2017" (English: "How We Test Graphics Cards"), so for the sake of simplicity I will only refer to that detailed description here. If you want to read everything again, you are welcome to do so.
If you are interested, the summary in table form provides a quick overview:
Test system and measurement setup | |
---|---|
Hardware | Intel Core i7-8700K; MSI Z370 Gaming Pro Carbon AC; 16GB KFA2 DDR4 4000 Hall of Fame @DDR4 3400; 1x 1 TByte Toshiba OCZ RD400 (M.2, system SSD); 2x 960 GByte Toshiba OCZ TR150 (storage, images); Be Quiet Dark Power Pro 11, 850-watt power supply |
Cooling | Alphacool Ice Block XPX; 5x Be Quiet! Silent Wings 3 PWM (closed case simulation); Thermal Grizzly Kryonaut (for cooler change) |
Case | Lian Li PC-T70 with expansion kit and modifications; modes: open benchtable, closed case |
Monitor | Eizo EV3237-BK |
Power consumption | Non-contact DC measurement on the PCIe slot (riser card); non-contact DC measurement on the external PCIe power supply; direct voltage measurement on the respective rails and on the power supply; 2x Rohde & Schwarz HMO 3054, 500 MHz multi-channel oscilloscope with memory function; 4x Rohde & Schwarz HZO50, current clamp adapter (1 mA to 30 A, 100 kHz, DC); 4x Rohde & Schwarz HZ355, probe (10:1, 500 MHz); 1x Rohde & Schwarz HMC 8012, digital multimeter with storage function |
Thermography | Optris PI640, infrared camera; PI Connect evaluation software with profiles |
Acoustics | NTI Audio M2211 (with calibration file); Steinberg UR12 (with phantom power for the microphones); Creative X7, Smaart v.7; own low-reflection measuring room, 3.5 x 1.8 x 2.2 m (L x D x H); axial measurements, perpendicular to the center of the sound source(s), measuring distance 50 cm; noise in dBA (Slow) as RTA measurement; frequency spectrum as a graph |
Operating system | Windows 10 Pro (1803, all updates) |