The non-cannibal with higher-octane memory: GeForce GTX 1660 Super with three cards in the test

Nvidia is becoming the new super company, and master Jensen gallantly plays Superman, because from today on things get even more super and even more in between. Like its slower sister, the GeForce GTX 1660 Super is based on the TU116, more precisely the TU116-300, an already known graphics processor that integrates Turing's improved shaders, the new cache architecture and support for adaptive shading. By omitting the RT cores for accelerated ray tracing and the tensor cores for inferencing in games, the TU116 is a slimmer graphics processor, but it also dispenses with features and recommends itself above all as an entry into the lower middle class.

The new GTX 1660 Super also has fewer shaders than the Ti version, but it is now finally paired with faster GDDR6 memory, which is even allowed to clock higher than on the Ti. This provides exactly the bandwidth Nvidia apparently considers necessary to squeeze the card somewhere into the middle between the GeForce GTX 1660 and GTX 1660 Ti that are already on the market. The RRP of 245 euros including VAT announced by Nvidia is 20 euros above what was asked for the "unleaded" version with the slower memory. Currently, the cheapest models of the GeForce GTX 1660 start just above 205 euros, and this morning the cheapest GTX 1660 Ti was Gainward's single-fan Pegasus at approx. 280 euros. So Nvidia stays neatly within the existing street-price structure, after all. But will the extra charge be worth it? That is what I want to find out.

Relaunch without real reference cards

There is no reference card, only reference clock rates as a default, so the launch once again goes directly to the board partners, who also undercut Nvidia's RRP with cheaper models on launch day. This has its charm, because you can examine and test the manufacturers' own designs on day one. For my test, Nvidia sent a Palit GTX 1660 Super GAMINGPRO with reference clock rates, so I added the GAMINGPRO OC (239.90 euros RRP) as an identical but factory-overclocked card, as well as the small Gainward GTX 1660 Super Pegasus OC (229.90 euros RRP) for the price-breaker and Mini-ITX crowd. The MSI GTX 1660 Gaming X, which is also visible in the picture, is priced slightly above the two Palit models and the Gainward and is therefore tested separately. After all, that card is more about living in style for a surcharge.

To anticipate the result (without spoiling too much): I was able to spare myself separate benchmarks of the three cards for an internal comparison. The reasons are the GPU lottery and the rather narrow headroom left for the factory OC. Out of the box, the OC version does not actually boost a single MHz higher, so any difference in games falls within measurement tolerance. The small Pegasus even clocked up to 15 MHz lower than the "reference" card, because it ultimately has the slightly weaker cooler and the non-OC card happened to have a very good chip. The power limit of all cards is absolutely identical, as is their default base clock. The differences, if they come into play at all, therefore lie in the inner qualities. Performance is largely the same; the rest is GPU lottery. Cross-flashing the BIOS between the OC and non-OC models also worked.

And since we are on the subject of benchmarking: there are not many options for the red counterpart from the AMD shelf, because both the Polaris-based Radeon RX 590 (182 to 250 euros) and the Vega 56 (217 to 300 euros) are in the midst of end of life (EOL), although still widely available, and are thus more or less a stopgap compromise for lack of alternatives. According to the board partners, the new RX 5500 series will not arrive until December, so for now Nvidia can play its gap-filling game of Halma all by itself. Any of the three GTX 1660 variants will certainly be able to top AMD's small Navi. And we know well enough by now that Jensen's scouts have always had a super nose when it comes to estimating performance.

Look and feel of the Palit GTX 1660 Super GAMING PRO (OC)

The two identical cards weigh 584 grams each and measure 23.8 cm in length from the outer edge of the slot bezel to the end of the cooler cover. They are 3.7 cm thick (plus 5 mm for the backplate on the back) and 11 cm high (from the top edge of the motherboard slot to the top edge of the cooler cover). The black ABS cooler cover is inconspicuous and only shines with a partial piano-lacquer finish, which eats fingerprints for breakfast with an air of bored nonchalance. There is no playful LED gimmickry or disco lighting, but at least it lights up a little in a single color.

The slot bezel has one HDMI, one DisplayPort and one DVI port. The backplate is also made of matte black plastic and is screwed to the cooler on the front. For everything else, I refer to the complete teardown in text and images.

Look and feel of the Gainward GTX 1660 Super Pegasus

The almost tiny card weighs only 407 grams and measures a mere 17 cm in length from the outer edge of the slot bezel to the end of the cooler cover. It is 3.5 cm thick and up to 12 cm high (from the top of the motherboard slot to the top edge of the cooler cover). The black cooler cover could have been a little smaller to fit even better into ITX systems, but the designers probably felt that bigger looks much faster. A pity, really. You won't find any LEDs, and that's a good thing.

The slot bezel has one HDMI, one DisplayPort and one DVI port. There is no backplate, which presumably no longer fit into the price.

What makes Turing better than Pascal?

Since autumn of last year, we have seen Nvidia launch four different GPUs that have moved further and further down the Turing hierarchy, and the lineup is now being split up even further. With each of the new chips, Nvidia has cleverly used the available resources to set new price points without cannibalizing its own products. Of course, you can view this however you like, but this fragmentation somehow also commands some respect for the creativity of its creators, even if one is slowly losing track.

After removing the RT and tensor cores of the larger Turing chips, what remains is a 284 mm² chip with 6.6 billion transistors, manufactured in TSMC's 12 nm FinFET process. Despite its lower transistor count compared with the larger Turing chips, the TU116 is still 42% larger than its predecessor, the GP106! Part of the added size is certainly due to Turing's more sophisticated shaders. Like the high-end GeForce RTX 20 series, the GeForce GTX 1660 Super supports simultaneous execution of FP32 arithmetic instructions (which make up most shader workloads) and INT32 operations (for addressing/fetching data, floating-point min/max, etc.).

This largely explains Turing's performance gain over Pascal at the same clock rate. Turing's streaming multiprocessors (SMs) contain fewer CUDA cores than Pascal's, but the design partly compensates for this by putting more SMs on each GPU. The newer architecture assigns one scheduler to each set of 16 CUDA cores (twice as many as Pascal), along with one dispatch unit per 16 CUDA cores (the same as Pascal).

An SM comprises four of these 16-core groups, plus 96 KB of cache that can be configured as 64 KB L1/32 KB shared memory or vice versa, and four texture units. Because Turing has twice as many schedulers as Pascal, an instruction only needs to be issued to the CUDA cores every second cycle to keep them busy. In between, there is enough room to issue a different instruction to any other unit, including the INT32 cores.
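The issue cadence described above can be illustrated with a deliberately simplified toy model. It assumes 32-thread warps, one 16-lane FP32 unit and one 16-lane INT32 unit behind each scheduler, strictly in-order issue of at most one instruction per cycle, and no latencies or dependencies. These are illustrative assumptions, not a description of how Nvidia's hardware actually schedules work.

```python
# Toy model of Turing's issue cadence (simplified assumptions, see lead-in).
WARP_SIZE = 32        # threads per warp on Nvidia GPUs
LANES_PER_UNIT = 16   # lanes per FP32 or INT32 unit behind one scheduler (assumed)


def cycles_per_warp_instruction(lanes: int = LANES_PER_UNIT) -> int:
    """A 32-thread warp instruction occupies a 16-lane unit for two cycles."""
    return WARP_SIZE // lanes


def simulate(num_cycles: int, stream: list[str]) -> dict[str, int]:
    """Issue at most one instruction per cycle, in order, to a free unit.

    `stream` is a list of unit names ("FP32" or "INT32"); returns how many
    instructions each unit received within `num_cycles`.
    """
    busy_until = {"FP32": 0, "INT32": 0}
    issued = {"FP32": 0, "INT32": 0}
    pending = list(stream)
    for cycle in range(num_cycles):
        if not pending:
            break
        unit = pending[0]
        if busy_until[unit] <= cycle:          # unit is free: issue this cycle
            busy_until[unit] = cycle + cycles_per_warp_instruction()
            issued[unit] += 1
            pending.pop(0)
        # otherwise the scheduler stalls, since issue is strictly in order
    return issued


if __name__ == "__main__":
    print(simulate(10, ["FP32"] * 20))            # pure FP32 stream
    print(simulate(10, ["FP32", "INT32"] * 10))   # interleaved FP32/INT32
```

In this model, a pure FP32 stream only gets an instruction out every other cycle (5 in 10 cycles), while interleaving INT32 work fills the free slots (10 in 10 cycles).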

Nvidia packs 22 SMs into the cut-down TU116-300 and splits them into three graphics processing clusters. With 64 FP32 cores and four texture units per SM, that makes 1,408 CUDA cores and 88 texture units for the entire GPU. Six 32-bit memory controllers give the TU116 an aggregate 192-bit bus that drives the six GDDR6 modules at up to 336 GB/s. Compared with the GTX 1660 Ti and its 288 GB/s, this is even a certain bandwidth advantage, which ties the nominally smaller chip to the larger Ti like a rubber band, so that the Ti pulls it along close behind.
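The unit counts and bandwidth figures above follow from simple multiplication. A minimal sketch, assuming the usual formula for peak bandwidth (bus width in bytes times per-pin data rate). The per-pin data rates (14 Gbps GDDR6 for the GTX 1660 Super, 12 Gbps GDDR6 for the GTX 1660 Ti, 8 Gbps GDDR5 for the GTX 1660) are not stated in the text and are taken as assumptions here.

```python
# Derive TU116-300 unit counts and peak memory bandwidth from per-SM figures.
SMS = 22                   # TU116-300, per the text
FP32_CORES_PER_SM = 64
TEXTURE_UNITS_PER_SM = 4
MEMORY_CONTROLLERS = 6
BITS_PER_CONTROLLER = 32


def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth: bus width in bytes times the per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps


cuda_cores = SMS * FP32_CORES_PER_SM                   # 1408
texture_units = SMS * TEXTURE_UNITS_PER_SM             # 88
bus_width = MEMORY_CONTROLLERS * BITS_PER_CONTROLLER   # 192 bits

# Per-pin data rates are not stated in the text (assumed vendor specs):
DATA_RATES_GBPS = {"GTX 1660 Super": 14, "GTX 1660 Ti": 12, "GTX 1660": 8}

if __name__ == "__main__":
    print(cuda_cores, texture_units, bus_width)
    for name, rate in DATA_RATES_GBPS.items():
        print(f"{name}: {bandwidth_gb_s(bus_width, rate):.0f} GB/s")
```

With these assumed data rates, the sketch reproduces the 336, 288 and 192 GB/s listed in the comparison table.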

Each memory controller is assigned eight ROPs and a 256 KB slice of the L2 cache. In total, this makes 48 ROPs and 1.5 MB of L2 for the TU116-300. The ROP count of the GeForce GTX 1660 Super is surprisingly high: it matches the GeForce RTX 2060, which also has 48 ROPs, although the L2 cache slices are only half the size. Despite the larger die, the 50% higher transistor count and the more aggressive GPU boost clock, the GeForce GTX 1660 Super is specified for a TDP of 125 W, similar to the old GeForce GTX 1060 with its 120 W.
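The totals and ratios in this paragraph can be checked quickly. The sketch below uses only the per-controller figures from the text and the die sizes and transistor counts from the comparison table (TU116: 284 mm², 6.6 billion; GP106: 200 mm², 4.4 billion).

```python
# Totals for ROPs and L2 cache, plus TU116 vs. GP106 ratios from the table.
MEMORY_CONTROLLERS = 6
ROPS_PER_CONTROLLER = 8
L2_KB_PER_CONTROLLER = 256

rops = MEMORY_CONTROLLERS * ROPS_PER_CONTROLLER             # 48
l2_mb = MEMORY_CONTROLLERS * L2_KB_PER_CONTROLLER / 1024     # 1536 KB = 1.5 MB

die_ratio = 284 / 200           # TU116 vs. GP106 die area (mm²)
transistor_ratio = 6.6 / 4.4    # TU116 vs. GP106 transistors (billions)

if __name__ == "__main__":
    print(f"{rops} ROPs, {l2_mb} MB L2")
    print(f"die +{die_ratio - 1:.0%}, transistors +{transistor_ratio - 1:.0%}")
```

This yields 48 ROPs and 1.5 MB of L2, and confirms the 42% larger die and the 50% higher transistor count relative to the GP106.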

Unfortunately, the GeForce GTX 1660 Super does not offer multi-GPU support. Nvidia confirms once again that multi-GPU configurations are only intended to achieve higher absolute computing performance, not to give gamers a way of combining cheaper cards into a setup that could cannibalize a more expensive single card.

Faster arithmetic even without tensor cores

In addition to the shaders and the unified cache of the Turing architecture, the TU116 also supports a pair of algorithms called Content Adaptive Shading and Motion Adaptive Shading, collectively referred to as variable-rate shading. I already wrote a longer introduction to these at the launch of the GeForce RTX 2080 (Ti). Nvidia has also revealed that the TU116 replaces the tensor cores with dedicated FP16 cores, enabling the GeForce GTX 1660 Super to process half-precision operations at twice the FP32 rate. However, the other Turing-based GPUs also offer twice the FP16 throughput, so it is unclear how unique the GeForce GTX 1660 family really is within the Turing family in this respect.

Technical data and comparison cards

The reference-clock variant (bottom left) has the same base clock as the OC model, which has 45 MHz more boost clock; the same result can be achieved with a clock offset. There are no other differences.

To conclude this introduction, here are the cards of the new generation and those of the old generation in a direct tabular comparison:

	MSI GeForce GTX 1660 Gaming X	GeForce GTX 1660 Super	MSI GeForce GTX 1660 Ti Gaming X	GeForce GTX 1060 FE
Architecture	Turing (TU116-300)	Turing (TU116-300)	Turing (TU116-400)	Pascal (GP106)
CUDA Cores	1408	1408	1536	1280
Tensor Cores	N/A	N/A	N/A	N/A
RT Cores	N/A	N/A	N/A	N/A
Texture Units	88	88	96	80
FP16 performance (peak)	10 TFLOPS	10 TFLOPS	10.9 TFLOPS	4.4 TFLOPS
FP32 performance (peak)	5 TFLOPS	5 TFLOPS	5.5 TFLOPS	4.4 TFLOPS
Base clock	1530 MHz	1530 MHz	1500 MHz	1506 MHz
Boost clock	1860 MHz	1785 MHz (ref.) / 1830 MHz (OC)	1875 MHz	1708 MHz
Memory	6 GB GDDR5	6 GB GDDR6	6 GB GDDR6	6 GB GDDR5
Memory bus	192-bit	192-bit	192-bit	192-bit
Bandwidth	192 GB/s	336 GB/s	288 GB/s	192 GB/s
ROPs	48	48	48	48
L2 cache	1.5 MB	1.5 MB	1.5 MB	1.5 MB
TDP	120 W	125 W	120 W	120 W
Transistors (billions)	6.6	6.6	6.6	4.4
Die size	284 mm²	284 mm²	284 mm²	200 mm²
SLI	No	No	No	No
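The peak throughput rows in the table can be approximated from CUDA cores and boost clock. A minimal sketch, assuming the common convention that each CUDA core completes one fused multiply-add (counted as two floating-point operations) per clock at the reference boost clock, and that FP16 runs at twice the FP32 rate on the TU116 as described above.

```python
# Peak throughput from CUDA cores and boost clock (2 FLOPs per FMA per core).
def peak_tflops(cuda_cores: int, boost_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak in TFLOPS; an FMA counts as two floating-point ops."""
    return cuda_cores * flops_per_clock * boost_mhz * 1e6 / 1e12


fp32_super = peak_tflops(1408, 1785)   # GTX 1660 Super, reference boost
fp16_super = 2 * fp32_super            # double-rate FP16 on TU116, per the text
fp32_1060 = peak_tflops(1280, 1708)    # GTX 1060 FE

if __name__ == "__main__":
    print(f"GTX 1660 Super: FP32 {fp32_super:.2f}, FP16 {fp16_super:.2f} TFLOPS")
    print(f"GTX 1060 FE:    FP32 {fp32_1060:.2f} TFLOPS")
```

Rounded, this gives the 5 TFLOPS (FP32) and 10 TFLOPS (FP16) listed for the GTX 1660 Super and the 4.4 TFLOPS FP32 listed for the GTX 1060 FE. These are theoretical peaks, not measured game performance.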

Test system and measurement methods

I have been describing the test system and methodology in great detail for years, so for the sake of simplicity I will now only refer to the following list. The evaluation software is written in-house.

For those interested, the following table gives a quick overview:

Hardware:	Intel Core i9-9900K, MSI MEG Z390 Ace, G.Skill TridentZ DDR4 3600, 1x 1 TB Toshiba OCZ RD400 (M.2, system SSD), 2x 960 GB Toshiba OCZ TR150 (storage, images), be quiet! Dark Power Pro 11 850-watt power supply
Cooling:	Alphacool Eisblock XPX, 5x be quiet! Silent Wings 3 PWM (closed-case simulation), Thermal Grizzly Kryonaut (for cooler change)
Case:	Lian Li PC-T70 with expansion kit and modifications
Monitor:	Eizo EV3237-BK
Power consumption:	non-contact DC measurement on the PCIe slot (riser card), non-contact DC measurement on the external PCIe power supply, direct voltage measurement on the respective supply rails and on the power supply, 2x Rohde & Schwarz HMO 3054 500 MHz multi-channel oscilloscope with memory function, 4x Rohde & Schwarz HZO50 current probe adapter (1 mA to 30 A, 100 kHz, DC), 4x Rohde & Schwarz HZ355 probe (10:1, 500 MHz), 1x Rohde & Schwarz HMC 8012 digital multimeter with storage function
Thermography:	Optris PI640 infrared camera, PI Connect evaluation software with profiles
Acoustics:	NTI Audio M2211 (with calibration file), Steinberg UR12 (with phantom power for the microphones), Creative X7, Smaart v.7, own low-reflection measuring room 3.5 x 1.8 x 2.2 m (L x D x H), axial measurements perpendicular to the center of the sound source(s), measuring distance 50 cm, noise in dBA (slow) as RTA measurement, frequency spectrum as a graph
Operating system:	Windows 10 Pro (1903, all updates), driver as of 27.10.2019
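The power measurement combines current readings (via current probes) with direct voltage measurements on each supply rail. The following is a minimal sketch of how such traces could be combined, assuming time-aligned samples, power per rail as the mean of V × I, and total board power as the sum over all rails. The rail names and sample values are hypothetical, and this is not the in-house evaluation software.

```python
# Combine per-rail voltage and current traces into average board power.
from statistics import fmean


def rail_power_w(voltages_v: list[float], currents_a: list[float]) -> float:
    """Average power of one rail from time-aligned voltage/current samples."""
    if len(voltages_v) != len(currents_a):
        raise ValueError("voltage and current traces must have equal length")
    return fmean(v * i for v, i in zip(voltages_v, currents_a))


def board_power_w(rails: dict[str, tuple[list[float], list[float]]]) -> float:
    """Total board power: sum of the average power of all measured rails."""
    return sum(rail_power_w(v, i) for v, i in rails.values())


# Hypothetical sample traces, for illustration only (not measured data).
RAILS = {
    "PCIe slot 12 V": ([12.0, 12.0], [2.0, 3.0]),
    "external PCIe 12 V": ([12.0, 12.0], [7.0, 8.0]),
}

if __name__ == "__main__":
    print(f"{board_power_w(RAILS):.1f} W")
```

Multiplying voltage and current sample by sample, before averaging, captures fluctuations that averaging each trace separately would miss.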