Today we’re doing something completely different: for once it’s not about gaming, which is getting boring by now, but about the new golden calf, AI. NVIDIA’s record revenue of 26.04 billion dollars, announced yesterday, represents an increase of 262 percent, so it was simply time for a test. I am testing a total of 12 graphics cards, 6 from AMD and 6 from NVIDIA. What makes the selection special is that the three fastest workstation cards and the three fastest consumer cards from each manufacturer compete against each other and, in the case of NVIDIA, with and without the use of Tensor cores.
The UL Procyon AI Computer Vision Benchmark I am using today offers exactly the detailed insights into the performance of AI inference engines on this hardware in a Windows environment that we need. This benchmark includes multiple AI inference engines from different vendors and evaluates the performance of on-device inference operations.
AI workloads and tasks
The AI workloads include common machine vision tasks such as image classification, image segmentation, object detection and super-resolution. These tasks are performed using a set of popular state-of-the-art neural networks running on the device’s CPU, GPU or a dedicated AI accelerator to benchmark hardware performance. Various SDKs are used to measure AI inference performance, including:
- Microsoft® Windows ML
- Qualcomm® SNPE
- Intel® OpenVINO™
- NVIDIA® TensorRT™
- Apple® Core ML™
The benchmark uses various neural network models, including:
- MobileNet V3: Optimized for visual recognition on mobile devices.
- Inception V4: An accurate model for image classification tasks.
- YOLO V3: For detecting and localizing objects in images.
- DeepLab V3: For semantic image segmentation.
- Real-ESRGAN: For super-resolution to upscale images to a higher resolution.
- ResNet 50: A residual network whose skip connections allow more layers to be added while the network remains trainable.
The benchmark includes both float (FP32, FP16) and integer-optimized versions of each model, which run sequentially on all compatible hardware components of the device. I explain each of these individual benchmarks in detail on the respective page, because I can’t assume that everyone knows exactly what I’m testing. I am sure, however, that the topic is (a) interesting and (b) future-oriented, so that (c) readers will be interested in it too.
The results provide detailed insights into AI inference performance, including the comparability of float and integer-optimized models, as well as performance measurement across the GPU and specialized AI accelerators. The benchmark is designed primarily for engineering teams and professional users who need independent, standardized tools to evaluate the overall AI performance of inference engine implementations and dedicated hardware. It is meant to help hardware manufacturers, companies and the press make informed decisions and verify the quality of AI inference. And at “press”, I simply felt addressed.
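For readers who want a feeling for what "measuring inference performance" means in practice, here is a minimal Python sketch of a timing loop. It is not how Procyon works internally; `measure_inference` and `fake_model` are hypothetical names, and the warm-up and run counts are assumptions chosen only for illustration.

```python
import statistics
import time


def measure_inference(run_once, warmup=3, runs=20):
    """Time a zero-argument inference callable and summarise its latency."""
    # Warm-up runs are discarded so one-time setup costs do not skew the result.
    for _ in range(warmup):
        run_once()

    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    mean_ms = statistics.mean(latencies_ms)
    return {
        "runs": runs,
        "mean_ms": mean_ms,
        "median_ms": statistics.median(latencies_ms),
        "inferences_per_second": 1000.0 / mean_ms,
    }


def fake_model():
    # Hypothetical stand-in for a real inference call on CPU or GPU.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print(measure_inference(fake_model))
```

A real benchmark would replace `fake_model` with a call into the respective inference engine and would also have to synchronize with the GPU before stopping the clock.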
In the world of artificial intelligence and machine learning, the FP32, FP16 and integer data types play a crucial role in the performance and efficiency of computations on GPUs. Each of these data types has specific advantages and disadvantages that can vary depending on the use case and hardware architecture. This is one of the reasons why I show all results separately and have also run all the cards for each data type individually. With quite interesting results, by the way.
FP32 (32-bit floating point)
Advantages:
- Precision: FP32 offers high accuracy and is therefore ideal for applications that require high numerical precision, such as scientific calculations and complex models.
- Compatibility: Many existing neural networks and frameworks are optimized for FP32 and deliver the best results here.
Disadvantages:
- Power consumption: FP32 calculations are more computationally intensive and need more memory, which results in higher power consumption and lower efficiency.
- Speed: FP32 calculations are generally slower than FP16 and integer calculations, which reduces the processing speed.
FP16 (16-bit floating point)
Advantages:
- Performance: FP16 calculations are faster and require less energy than FP32, which increases efficiency and throughput rate.
- Memory requirement: The memory requirement is lower, which means that more data can be processed and stored simultaneously.
Disadvantages:
- Accuracy: The lower accuracy of FP16 can lead to rounding errors, which can be problematic in certain applications.
- Adaptation effort: It may require additional effort to optimize and adapt existing models and algorithms to FP16.
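The trade-off between memory requirement and accuracy described for FP32 and FP16 can be illustrated with a small Python sketch. It uses only the standard `struct` module, whose `"f"` and `"e"` format codes store IEEE 754 single and half precision values; the helper names `round_trip`, `to_fp32` and `to_fp16` are hypothetical and exist only for this example.

```python
import struct


def round_trip(value, fmt):
    """Store a Python float in the given struct format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def to_fp32(value):
    return round_trip(value, "<f")  # IEEE 754 single precision, 4 bytes


def to_fp16(value):
    return round_trip(value, "<e")  # IEEE 754 half precision, 2 bytes


if __name__ == "__main__":
    # Bytes per value: FP16 needs half the memory of FP32.
    print(struct.calcsize("<f"), struct.calcsize("<e"))  # 4 2

    # Rounding error when storing 0.1: much larger in FP16 than in FP32.
    x = 0.1
    print("FP32 error:", abs(to_fp32(x) - x))
    print("FP16 error:", abs(to_fp16(x) - x))

    # Above 2048, FP16 can no longer represent every integer.
    print(to_fp16(2049.0))  # 2048.0
```

This only shows the storage side of the story. How much faster FP16 actually runs depends on the hardware, which is exactly what the benchmarks are for.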
Integer (INT8 and INT16)
Advantages:
- Efficiency: Integer computations are extremely efficient and consume significantly less energy than FP32 and FP16, making them ideal for mobile and embedded systems.
- Speed: On hardware with suitable integer units they are typically faster than floating-point calculations, which increases inference speed and reduces latency.
Disadvantages:
- Accuracy: Integer formats offer the lowest precision, which can lead to greater errors and inaccuracies, especially with complex models.
- Complexity: Quantizing models to make them suitable for integer calculations can be complex and time-consuming.
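To show what quantizing to an integer format means, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. This is one simple scheme among several and not necessarily the one used for the benchmark's models; `quantize_int8`, `dequantize` and the sample weights are hypothetical and chosen only for illustration.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(v) for v in values)
    # The largest magnitude is mapped to 127; fall back to 1.0 for all zeros.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the INT8 representation."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.02, -0.5, 0.731, 1.27, -1.0]  # made-up example weights
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)

    print(q)  # [2, -50, 73, 127, -100]
    for original, approx in zip(weights, restored):
        # The error per value is at most half a quantization step.
        print(f"{original:+.3f} -> {approx:+.3f} (error {abs(original - approx):.4f})")
```

The value 0.731 comes back as roughly 0.73, which is the kind of small error the accuracy point above refers to. Real quantization workflows additionally calibrate the scale on representative data, which is where the complexity comes from.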
Architectures and their optimization
Different GPU architectures are optimized differently for these data types:
- NVIDIA GPUs: These offer dedicated Tensor cores that are optimized for FP16 and INT8 computations, which makes them particularly efficient for AI workloads.
- AMD GPUs: AMD is also focusing on improved support for FP16 and is working on better efficiency at lower precision.
- Intel GPUs: With the OpenVINO toolkit, Intel is optimizing for broad support of different data types, including INT8, to enable high performance with lower power consumption.
The bottom line is that the choice of data type and architecture depends on the specific requirements of the application. For high accuracy and compatibility, FP32 is suitable, while FP16 and integer are preferred for efficiency and speed in inference applications.
Test system