AMD has published performance results for its Instinct MI300X GPU in MLPerf Inference v4.1, an industry-standard benchmark for AI hardware, software, and services. The results, announced on August 28, show how the MI300X stacks up against Nvidia's H100 GPU. MLPerf, developed by MLCommons, is a benchmark suite for evaluating the training and inference performance of AI systems. It covers a wide range of AI tasks, including large language models (LLMs), natural language processing, computer vision, and medical image segmentation.
The benchmark is updated regularly to reflect the latest advancements in AI technology and spans a variety of workloads to ensure a comprehensive evaluation. In MLPerf Inference v4.1, AMD's Instinct MI300X delivered competitive results. On the Llama 2 70B model, the MI300X achieved a throughput of 21,028 tokens per second in server mode and 23,514 tokens per second in offline mode when paired with a 4th Gen EPYC "Genoa" CPU. With the 5th Gen EPYC "Turin" CPU, the MI300X improved to 22,021 tokens per second in server mode and 24,110 tokens per second in offline mode, gains of 4.7% and 2.5%, respectively, over the Genoa configuration.

Against Nvidia's H100 GPU, the results were mixed. In the Genoa configuration, the MI300X trailed the H100 slightly in server mode, and the gap widened in offline mode. With the Turin CPU, however, the MI300X pulled ahead of the H100 by about 2% in server mode while still falling short in offline scenarios.
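The quoted uplifts from the Turin configuration follow directly from the throughput figures above. A quick sanity-check sketch (the numbers are those reported in this article; the script itself is just arithmetic):

```python
# Reported Llama 2 70B throughput (tokens/s) for the MI300X,
# taken from the figures quoted above.
genoa = {"server": 21_028, "offline": 23_514}  # paired with EPYC "Genoa"
turin = {"server": 22_021, "offline": 24_110}  # paired with EPYC "Turin"

for mode in ("server", "offline"):
    # Relative uplift of the Turin configuration over Genoa, in percent.
    uplift = (turin[mode] / genoa[mode] - 1) * 100
    print(f"{mode}: +{uplift:.1f}%")

# Output:
# server: +4.7%
# offline: +2.5%
```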
A significant advantage of the AMD Instinct MI300X is its memory capacity: 192 GB of HBM3, more than double the 80 GB of Nvidia's H100. This larger capacity lets the MI300X hold larger language models and handle various data formats more effectively, a key advantage in certain AI workloads.
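A rough back-of-the-envelope estimate illustrates why this matters for a model the size of Llama 2 70B. Assuming 16-bit (FP16/BF16) weights and ignoring KV cache, activations, and framework overhead, the weights alone exceed an 80 GB H100 but fit within a single MI300X:

```python
# Back-of-the-envelope weight-memory estimate for Llama 2 70B.
# Assumes FP16/BF16 weights (2 bytes per parameter); KV cache,
# activations, and runtime overhead are deliberately ignored.
params = 70e9                # 70 billion parameters
bytes_per_param = 2          # FP16/BF16
weights_gb = params * bytes_per_param / 1e9

print(f"Weights alone: ~{weights_gb:.0f} GB")            # ~140 GB
print(f"Fits in MI300X (192 GB HBM3)? {weights_gb < 192}")  # True
print(f"Fits in H100 (80 GB HBM3)?   {weights_gb < 80}")    # False
```

Under these assumptions, a single MI300X can keep the full set of weights resident in GPU memory, whereas an 80 GB H100 must split the model across multiple GPUs.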