Tensor cores and DLSS
Although the Volta architecture was full of significant changes compared to Pascal, the addition of tensor cores was the clearest indication of the GV100's actual purpose: accelerating 4×4 matrix operations with FP16 inputs, which form the basis for training neural networks and for inference (drawing explicit conclusions from implicit assumptions). The tensor cores of the TU106, however, are slightly slower. They do support the FP32 accumulation used for deep learning training, but only at half the speed of FP16 accumulation. This makes sense, because the GV100 is designed for training neural networks, while the TU106 is a gaming chip that uses already trained networks for inference.
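To make the difference between FP16 and FP32 accumulation more tangible, here is a minimal sketch of a 4×4 multiply-accumulate of the form D = A·B + C with FP16 inputs. It is plain Python, not Nvidia's implementation: the helper names `to_fp16`, `to_fp32` and `mma_4x4` are made up for this illustration, and rounding after every single addition is a simplifying assumption, since real hardware may round at other points.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a value to the nearest IEEE 754 half-precision number."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round a value to the nearest IEEE 754 single-precision number."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def mma_4x4(a, b, c, accumulate=to_fp32):
    """Return D = A*B + C for 4x4 matrices.

    A and B are rounded to FP16 (the tensor core input format); the running
    sum is rounded with `accumulate` after every addition (assumption).
    """
    d = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            acc = accumulate(c[i][j])
            for k in range(4):
                # The product of two FP16 values is exact in a Python float.
                prod = to_fp16(a[i][k]) * to_fp16(b[k][j])
                acc = accumulate(acc + prod)
            d[i][j] = acc
    return d

# Hypothetical example data: every entry of A and B is 1, every entry of C is 4096.
A = [[1.0] * 4 for _ in range(4)]
B = [[1.0] * 4 for _ in range(4)]
C = [[4096.0] * 4 for _ in range(4)]

d_fp32 = mma_4x4(A, B, C, accumulate=to_fp32)
d_fp16 = mma_4x4(A, B, C, accumulate=to_fp16)
print(d_fp32[0][0], d_fp16[0][0])
```

Around 4096, neighbouring FP16 values are 4 apart, so each added 1 is rounded away again in the FP16 accumulator, while the FP32 accumulator keeps all four additions. This loss of small contributions is why training generally prefers the wider accumulator, whereas inference can often tolerate the faster FP16 path.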
Most of Nvidia's current projects and plans for the tensor cores concern neural graphics. But the company is also planning other deep learning applications for these desktop graphics cards. Smart enemies, for example, would completely change the way players approach final boss battles. Speech synthesis, speech recognition, material enhancement, cheat detection and character animation are all areas where AI is already in use or where Nvidia sees at least decent potential for it.
But of course Deep Learning Super Sampling (DLSS) is also a focus of GeForce RTX. Implementing DLSS requires developers to add support through Nvidia's NGX API. Nvidia promises that the integration is fairly simple; it remains to be seen how widely game developers will adopt and implement it.
The models required for DLSS are downloaded via the Nvidia driver and run on the tensor cores of each GeForce RTX graphics card. Nvidia expects each AI model to be only a few megabytes in size, which makes it relatively easy to load such content (once) when needed. We just hope that DLSS will not be explicitly tied to GeForce Experience, which would force registration and installation on users.
Raytracing and RT Cores
Arguably the most promising chapter in the entire Turing story is the RT core, which sits at the bottom of each SM in TU106. Nvidia's RT cores are essentially fixed-function ("hard-wired") accelerators for two tasks: traversing the Bounding Volume Hierarchy (BVH) and evaluating ray/triangle intersections. Both operations are essential for the ray tracing algorithm. In short, a BVH groups the geometry of a given scene into boxes.
Organized as a tree structure, these boxes help to narrow down the location of the triangles that a ray intersects. Each time a triangle is found to be in a box, that box is subdivided into further boxes until the last box can be divided into triangles. Without a BVH, an algorithm would be forced to search the entire scene, burning tons of cycles to test every triangle for a possible intersection.
Running this algorithm is already possible today with the Microsoft D3D12 Raytracing Fallback Layer APIs, which use compute shaders to emulate DirectX Raytracing (DXR) on devices without native support (and redirect to DXR when driver support is detected). On a Pascal-based GPU, for example, the BVH scan is performed on the programmable cores, which fetch each box, decode it, test it for an intersection, and determine whether there is another child box or a triangle inside.
The process repeats until triangles are found, at which point they are tested for intersection with the ray. As you can imagine, this process is very expensive when run as pure software emulation, which all but prevents smooth real-time ray tracing on today's graphics processors. With fixed-function accelerators for the box and triangle intersection steps, the SM casts a ray into the scene using a ray generation shader and hands the traversal of the acceleration structure over to the RT core. All intersection evaluations run much faster, and the other resources of the SM are freed up for shading, just as with traditional rasterization.
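The traversal loop described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not Nvidia's hardware logic or the fallback layer's code: the median split, the slab test for boxes, the Möller-Trumbore test for triangles, and all names (`Node`, `build`, `hit_box`, `hit_triangle`, `trace`) as well as the test scene are assumptions chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List

def sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

@dataclass
class Node:
    lo: tuple                                   # minimum corner of the box
    hi: tuple                                   # maximum corner of the box
    tris: list = field(default_factory=list)    # filled only in leaf nodes
    children: List["Node"] = field(default_factory=list)

def build(tris, leaf_size=2):
    """Build a BVH by splitting at the median along the longest box axis."""
    pts = [p for t in tris for p in t]
    lo = tuple(min(p[i] for p in pts) for i in range(3))
    hi = tuple(max(p[i] for p in pts) for i in range(3))
    if len(tris) <= leaf_size:
        return Node(lo, hi, tris=list(tris))
    axis = max(range(3), key=lambda i: hi[i] - lo[i])
    ordered = sorted(tris, key=lambda t: sum(p[axis] for p in t))
    mid = len(ordered) // 2
    return Node(lo, hi, children=[build(ordered[:mid], leaf_size),
                                  build(ordered[mid:], leaf_size)])

def hit_box(orig, dirn, lo, hi):
    """Slab test: does the ray (t >= 0) pass through the axis-aligned box?"""
    t_near, t_far = 0.0, float("inf")
    for i in range(3):
        if abs(dirn[i]) < 1e-12:                # ray parallel to this slab
            if orig[i] < lo[i] or orig[i] > hi[i]:
                return False
            continue
        t0, t1 = (lo[i] - orig[i]) / dirn[i], (hi[i] - orig[i]) / dirn[i]
        t_near, t_far = max(t_near, min(t0, t1)), min(t_far, max(t0, t1))
        if t_near > t_far:
            return False
    return True

def hit_triangle(orig, dirn, tri, eps=1e-9):
    """Moeller-Trumbore test; returns the hit distance t or None."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(dirn, e2)
    det = dot(e1, p)
    if abs(det) < eps:                          # ray parallel to the triangle
        return None
    s = sub(orig, v0)
    u = dot(s, p) / det
    q = cross(s, e1)
    v = dot(dirn, q) / det
    if u < 0 or v < 0 or u + v > 1:
        return None
    t = dot(e2, q) / det
    return t if t > eps else None

def trace(root, orig, dirn):
    """Walk the BVH; returns (closest t or None, box tests, triangle tests)."""
    best, box_tests, tri_tests = None, 0, 0
    stack = [root]
    while stack:
        node = stack.pop()                      # fetch the next box
        box_tests += 1
        if not hit_box(orig, dirn, node.lo, node.hi):
            continue                            # skip everything inside it
        for tri in node.tris:                   # leaf: test its triangles
            tri_tests += 1
            t = hit_triangle(orig, dirn, tri)
            if t is not None and (best is None or t < best):
                best = t
        stack.extend(node.children)             # inner node: descend
    return best, box_tests, tri_tests

# Hypothetical scene: 64 small triangles in a row at z = 5.
scene = [((i, 0.0, 5.0), (i + 0.5, 0.0, 5.0), (i, 0.5, 5.0)) for i in range(64)]
t, box_tests, tri_tests = trace(build(scene), (10.1, 0.1, 0.0), (0.0, 0.0, 1.0))
print(t, box_tests, tri_tests)
```

In this toy scene the ray reaches its triangle after only a handful of box tests and two triangle tests, where a brute-force search would have to test all 64 triangles. The `hit_box` and `hit_triangle` steps are the kind of work that the text says moves from the programmable cores to the RT core.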
But ray tracing is a very broad term in itself, because merely tracing a ray does not say much. What matters more is what can be implemented with the help of these functions, because the information obtained can be used in very versatile ways to improve effects such as ambient occlusion (AO), reflections, global illumination and much more, or to make them possible in the first place.