With the NVIDIA RTX A5000, I have a workstation graphics card in review today that should not only easily replace its predecessor, the Quadro RTX 5000, but can even beat the Quadro RTX 6000 in some areas. Plus, of course, the new features Ampere offers over Turing, as well as various tweaks that NVIDIA says make it more than just a more efficient replacement. This is exactly why I’ll be testing the RTX A5000 in a variety of workloads, not just pure raster graphics.
NVIDIA also offers NVLink on the RTX A5000 (similar to the RTX A6000), a 112 GB/s GPU interconnect that is a much faster alternative for multi-GPU systems than traditional PCI-E based solutions. By interconnecting two NVIDIA graphics cards with NVLink, memory and performance can be elegantly scaled to meet the demands of large visual computing tasks.
The RTX A5000 can also leverage NVIDIA RTX Virtual Workstation software (vWS), which provides space for visual computing to harness the power of virtual workstations from the data center or cloud on any device. This means that even the most demanding applications can be run on any device at the same level as a physical workstation, while still meeting the necessary security requirements. IT can virtualize any application from the data center, creating a work environment locally that is indistinguishable from a physical workstation. So in the end, each device offers the same performance as a workstation.
This is a feature for distributing a graphics card to several users in a virtual environment “portion by portion”. The appropriate profiles, in which the allocated resources are regulated, are distributed to the VMS. However, this product is subject to licensing and Enterprise Plus licences must be available, especially in the VMware environment. In addition, special Nvidia drivers are required, which are only available with a suitable account. The vGPU does not work without the licence server, which also increases the administrative effort.
Unboxing, look, feel and connectivity
The card weighs 1013 grams and is even almost 100 grams lighter than AMD’s Radeon Pro W6800 and thus not really a heavyweight. It is with its usual 27 cm well installable, is 10.5 cm high (installation height from PEG) and in addition 3.5 cm thick (dual slot design), whereby here no backplate is used, since also at the rear no memory was installed. The total of 12 modules GDDR6 with ECC functionality are all located on the front, which is an interesting solution in terms of design.
The card is supplied with a TBP of up to 230 watts via a standard 8-pin socket, so everything remains as known and expected. Interestingly, you can still see the free solder lugs for two 8-pin sockets at the right end of the upper edge of the board. As usual, NVIDIA also relies on the DHE principle (Direct Heat Exhaust) for cooling and a quite potent, but not too loud 6.5 cm radial fan. So the sucked air leaves the case on a direct way at the back, which is really convenient.
The slot bezel has a honeycomb grille for airflow and supports four DisplayPort (1.4) bays side by side. This creates space so that the hot exhaust air can really escape well.
The GA102-850’s total of 8,192 CUDA cores (by NVIDIA’s count) can already handle larger workflows, and the card is only slightly more trimmed than a GeForce RTX 3080 with 8704 CUDA cores. With a total of 256 Tensor Cores, 64 RT Cores, 256 TMUs and 96 ROPs, the card with the trimmed GA 102 relies on a total of 64 Streaming Multiprocessors (SM), which have been completely redesigned at Ampere. The basis may still be similar to that of Turing, but important and above all decisive things have changed.
The SM have made a really big transformation which ultimately results in the increased performance. A single SM at Turing still consisted of 64 FP32-ALUs for floating point calculations and 64 INT32-ALUs for integer calculations, divided into four blocks of 16 FP32- and 16 INT32-ALUs each. The trick here is that the FP and INT ALUs can be addressed simultaneously. And amps? The 64 pure FP32-ALUs per SM are still available, but the 64 INT32-ALUs are increased by 64 additional ones, which are still able to perform floating point and integer calculations, with one restriction: this is no longer possible in parallel. The division into 4 blocks each remains, but with a separate data path.
While the base clock is specified with 1170 MHz, the boost clock is up to 1695 MHz, which isn’t always reached under absolute full load in practice, though. The card relies on a whopping 24GB of GDDR6 at 16Gbps, which is made up of 12 2GB sized modules on the front of the board. This includes the 384-bit memory interface (768 GB/s bandwidth).nvidia-rtx-a5000-datasheet
PNY RTX A5000, 24GB GDDR6, 4x DP, Smallbox (VCNRTXA5000-SB)
|Zentrallager: verfügbar, Lieferung 3-5 WerktageFiliale Wilhelmshaven: nicht lagerndStand: 17.09.21 22:08||2484,96 €*Stand: 17.09.21 22:11|
|lagernd (13 Stück), Lieferung sofort möglich||2485,00 €*Stand: 17.09.21 22:21|
|Auf Lager||2493,99 €*Stand: 17.09.21 22:14|
PNY Quadro RTX 5000, 16GB GDDR6, 4x DP, USB-C (VCQRTX5000-PB / VCQRTX5000-BSP)
|edigitech||Sofort versandfertig, Lieferzeit ca. 1-3 Werktage||1969,00 €*Stand: 17.09.21 17:41|
|ca. 2-3 Werktage||1998,00 €*Stand: 17.09.21 22:08|
|lagernd (12 Stück), Lieferung sofort möglich||1999,00 €*Stand: 17.09.21 22:21|