In new benchmarks, NVIDIA’s GeForce RTX 40 series GPUs outperform both laptop CPUs and dedicated NPUs in Llama and Mistral AI workloads. NVIDIA’s TensorRT-LLM acceleration for Windows widens that lead further on Windows PCs.
New features have been added to NVIDIA’s RTX “AI PC” platform, which reaches new heights with the flagship GeForce RTX 4090. In a recent AI Decoded blog, NVIDIA showed how its current-generation GPUs outperform the entire NPU ecosystem, which reaches only 50 TOPS in 2024. In contrast, NVIDIA’s RTX AI GPUs offer several hundred TOPS, reaching up to 1321 TOPS with the GeForce RTX 4090. According to NVIDIA, that makes it the fastest desktop AI solution for running LLMs and other applications, as well as the world’s fastest gaming graphics card.
NVIDIA’s GeForce RTX GPUs have up to 24 GB of VRAM, while NVIDIA RTX GPUs offer up to 48 GB of VRAM. This makes them ideal solutions for working with LLMs (Large Language Models), as these workloads require large amounts of video memory. NVIDIA’s RTX hardware not only features dedicated video memory, but also AI-specific acceleration through Tensor Cores (hardware) and the aforementioned TensorRT-LLM (software).
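To illustrate why video memory matters here, the following is a rough sketch of how much memory the model weights alone occupy at different precisions. The parameter counts (7B, 13B, 70B) and bit widths are illustrative assumptions, not figures from NVIDIA’s benchmarks, and the estimate ignores the KV cache, activations and runtime overhead, so real requirements are higher.

```python
# Rough estimate of the memory needed just to hold LLM weights.
# Assumption: model sizes and precisions below are illustrative examples,
# not configurations taken from NVIDIA's benchmarks.

def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Return the weight footprint in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

def fits(num_params: float, bits_per_param: int, vram_gb: float) -> bool:
    """True if the weights alone fit into the given amount of VRAM."""
    return weight_memory_gb(num_params, bits_per_param) <= vram_gb

if __name__ == "__main__":
    models = {"7B": 7e9, "13B": 13e9, "70B": 70e9}  # hypothetical sizes
    for name, params in models.items():
        for bits in (16, 8, 4):
            gb = weight_memory_gb(params, bits)
            print(
                f"{name} @ {bits}-bit: {gb:6.1f} GB weights | "
                f"fits 24 GB: {fits(params, bits, 24)} | "
                f"fits 48 GB: {fits(params, bits, 48)}"
            )
```

Under these assumptions, a 7B model at 16-bit precision needs about 14 GB for its weights and fits on a 24 GB card, while a 70B model needs about 35 GB even at 4-bit precision and would only fit on a 48 GB card.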
NVIDIA has released new benchmarks conducted using the open-source Jan.ai platform, which recently integrated TensorRT-LLM into its local chatbot app. These benchmarks compare the performance of NVIDIA’s GeForce RTX 40 GPUs to laptop CPUs with dedicated AI NPUs. According to NVIDIA, token throughput on the GeForce RTX 4090 is high across all batch sizes and more than quadruples when TensorRT-LLM acceleration is enabled.
Without TensorRT-LLM, the NVIDIA GeForce RTX 4090 is 8.7x faster than the AMD Ryzen 9 8945HS CPU. With acceleration enabled, this lead increases to 15x (a 70% increase over the non-TensorRT-LLM configuration). In this scenario, the RTX 4090 processes up to 170.63 tokens per second, while the AMD CPU achieves only 11.57 tokens per second. Even the NVIDIA GeForce RTX 4070 laptop GPU offers a speedup of up to 4.45x.
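The arithmetic behind these multipliers can be checked with a few lines of Python. This is a minimal sketch that uses only the figures quoted above; the function and variable names are made up for illustration.

```python
# Recompute the speedup figures quoted in the article.
# All names here are illustrative; only the numbers come from the text.

def speedup(candidate_tps: float, baseline_tps: float) -> float:
    """Ratio of two throughputs measured in tokens per second."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return candidate_tps / baseline_tps

RTX_4090_TRT_TPS = 170.63    # RTX 4090 with TensorRT-LLM, tokens/s
RYZEN_8945HS_TPS = 11.57     # AMD Ryzen 9 8945HS, tokens/s
SPEEDUP_WITHOUT_TRT = 8.7    # quoted lead without TensorRT-LLM

with_trt = speedup(RTX_4090_TRT_TPS, RYZEN_8945HS_TPS)
relative_gain = with_trt / SPEEDUP_WITHOUT_TRT - 1

print(f"Speedup with TensorRT-LLM: {with_trt:.2f}x")        # 14.75x
print(f"Gain over non-TensorRT-LLM: {relative_gain:.0%}")   # 70%
```

The ratio of 170.63 to 11.57 tokens per second comes out at roughly 14.75x, which the article rounds to 15x, and 14.75 divided by 8.7 gives the quoted increase of about 70%.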
NVIDIA has also published benchmarks with an RTX 4090 in an eGPU configuration to show how laptop performance for AI workloads can be further boosted by an external GPU. This configuration offers a 9.07x performance increase over the same AMD laptop CPU. NVIDIA has once again proven its lead in the AI segment. The GeForce RTX 40 GPUs offer unrivaled performance for AI applications and are the best choice for those looking to drive the next generation of AI innovation.
Source: NVIDIA