Computational fluid dynamics (CFD) is used in industry and academia to address a wide range of use cases, including external aerodynamics, internal flows, heat transfer, combustion, reacting flows, free-surface flows, multiphase flows, and aero-acoustics.
CFD solvers are commonly used to simulate external aerodynamics, which helps reduce drag and improve fuel efficiency, leading to lower emissions from transport vehicles. Some solvers also support combustion or reacting flows, which can further reduce waste and emissions in combustion-driven systems.
Computationally solving these fluid flow problems is challenging due to the large range of time and length scales involved. Industrial-scale problems must employ large high-performance computing (HPC) resources for timely turnaround of flow scenarios during the design and testing phases. Even with HPC resources, engineers still have to simplify models by de-featuring geometry and using flow approximations to solve them in a useful timeframe.
Accelerated computing revolution
The accelerated processing capabilities of GPUs have unlocked new horizons in quantum computing, climate science, financial engineering, and artificial intelligence. Their far-reaching applications enable businesses to achieve processing speeds many times faster than CPU-only processing.
CFD is an area that has particularly benefited from the accelerated computing capabilities of NVIDIA GPUs, with speedups of over 30x compared to traditional CPU-based computing. For more information, see Unleashing the Full Power of GPUs for Ansys Fluent.
Why the NVIDIA H200 Tensor Core GPU is a game changer for CFD
Running CFD workloads on GPUs is nothing new, so what makes the H200 a particularly good fit for running CFD?
The first consideration, which applies equally to CPUs, is available memory. The more memory, the larger the case that can be run. You must have enough GPU memory available across all the GPUs on which you plan to run.
Previous generations of GPUs (A100, H100) had 80 GB of GPU memory. The H200 GPU has 141 GB of memory, enabling much larger cases to be run. The exact amount of memory needed depends on the mesh size, solvers, and physics selected. A rule of thumb is 1.2M cells per GB of GPU memory for a single-phase, segregated scenario. Therefore, an 80-GB GPU could run a ~90M cell case whereas an H200 GPU can run a case closer to 160M cells.
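The rule of thumb above can be turned into a quick sizing helper. This is a sketch based only on the ~1.2M cells per GB figure quoted for a single-phase, segregated scenario; the function names are illustrative, and real memory requirements vary with solver settings and physics models.

```python
import math

# Rule-of-thumb cell density for a single-phase, segregated CFD case.
# Actual requirements depend on mesh, solvers, and physics selected.
CELLS_PER_GB = 1.2e6

def max_cells(total_gpu_memory_gb: float) -> float:
    """Approximate largest mesh that fits in the given total GPU memory."""
    return total_gpu_memory_gb * CELLS_PER_GB

def gpus_needed(mesh_cells: float, memory_per_gpu_gb: float) -> int:
    """Minimum number of GPUs whose combined memory holds the mesh."""
    return math.ceil(mesh_cells / (memory_per_gpu_gb * CELLS_PER_GB))

print(f"80 GB GPU:  ~{max_cells(80) / 1e6:.0f}M cells")   # ~96M
print(f"141 GB GPU: ~{max_cells(141) / 1e6:.0f}M cells")  # ~169M
print(f"250M-cell case on H200s: {gpus_needed(250e6, 141)} GPUs")
```

By this estimate, a 250M-cell case such as the DrivAer model discussed below fits on two H200 GPUs from a memory standpoint, although more GPUs are typically used to reduce solve time.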
Another key factor is what limits compute performance. Broadly, the limiting factor is either memory bandwidth or compute rate. For many finite-volume CFD codes, we found the limiting factor is memory bandwidth: increasing memory bandwidth directly translates to reduced solve times.
Table 1 compares the memory and memory bandwidth of some of the most recent data center GPUs, selected because they use the highest-performing high-bandwidth memory (HBM). The H200 GPU has over 2x the memory bandwidth of the 80-GB A100 GPU, so you can expect a ~2x performance improvement moving from A100 to H200.
| GPU (SXM) | Memory (GB) | Memory bandwidth (TB/s) |
| --- | --- | --- |
| A100 | 80 (HBM2e) | 2.0 |
| H100 | 80 (HBM3) | 3.35 |
| H200 | 141 (HBM3e) | 4.8 |
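Because these solvers are bandwidth-bound, a first-order estimate of generational speedup is simply the ratio of memory bandwidths from Table 1. The following sketch illustrates that reasoning; real speedups also depend on solver, case, and scaling effects.

```python
# Peak memory bandwidth (TB/s) from Table 1.
bandwidth_tb_s = {"A100": 2.0, "H100": 3.35, "H200": 4.8}

def expected_speedup(new_gpu: str, old_gpu: str) -> float:
    """First-order speedup estimate for a bandwidth-bound solver."""
    return bandwidth_tb_s[new_gpu] / bandwidth_tb_s[old_gpu]

print(f"H200 vs A100: ~{expected_speedup('H200', 'A100'):.1f}x")  # ~2.4x
print(f"H200 vs H100: ~{expected_speedup('H200', 'H100'):.1f}x")  # ~1.4x
```

The ~2.4x bandwidth ratio between A100 and H200 is consistent with the ~2x measured improvement reported below.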
Ansys Fluent results
To demonstrate the performance of the NVIDIA H200 Tensor Core GPU, we partnered with Ansys to benchmark their industry-standard Ansys Fluent CFD solver. Ansys has developed a fully accelerated native GPU solver, which delivers significant speedups.
GPU and CPU performance
Figure 1 shows the performance of Ansys Fluent on NVIDIA GPUs for a 250M-cell DrivAer model. The blue line shows a range of results on CPU from 4 to 96 nodes (512 to 12,288 cores).
The results show that eight H200 GPUs are 34x faster than 512 CPU cores. Even 96 CPU nodes cannot match eight H200 GPUs, which remain 1.9x faster. This performance enables transient, scale-resolved cases running on GPUs to complete in hours, whereas such cases previously could take weeks on CPUs.
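A back-of-envelope calculation makes the CPU-core equivalence of these results concrete. The figures below are taken directly from the text; the "ideal scaling" extrapolation is an illustrative assumption, not a measured result.

```python
# From the quoted Fluent results: eight H200 GPUs are 34x faster than
# 512 CPU cores (4 nodes) and 1.9x faster than 12,288 cores (96 nodes).
cores_small, speedup_small = 512, 34.0
cores_large, speedup_large = 12288, 1.9

# If CPU performance scaled perfectly linearly from 512 cores (an
# optimistic assumption), matching eight H200s would need:
ideal_equiv = cores_small * speedup_small
print(f"Ideal-scaling equivalence: ~{ideal_equiv:,.0f} cores")  # ~17,408

# The measured 96-node point implies an even larger equivalence,
# because CPU scaling efficiency drops at high core counts:
measured_equiv = cores_large * speedup_large
print(f"Measured-point equivalence: >{measured_equiv:,.0f} cores")  # ~23,347
```

In other words, the measured 96-node result shows that the effective CPU-core equivalence of eight H200 GPUs exceeds 23,000 cores once real-world CPU scaling losses are accounted for.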
Generational GPU performance improvement
Figure 2 shows the same 250M-cell DrivAer model case for a range of GPUs. Eight H200 GPUs are 1.9x faster than eight A100 GPUs, which shows the improvement in performance between the NVIDIA Ampere and NVIDIA Hopper generations of Tensor Core GPUs.
CFD with greater performance and fidelity
The NVIDIA H200 Tensor Core GPU represents a step change in CFD performance due to its 141 GB of HBM3e memory and 4.8 TB/s of memory bandwidth. This enables larger, higher-fidelity models to run faster on a smaller number of GPUs.
The results from the industry-standard code Ansys Fluent show significantly faster performance than CPUs and previous-generation GPUs. These advantages enable you to simulate more design iterations at higher fidelity, ultimately resulting in better product performance.
For more information about how Ansys and NVIDIA are accelerating a new era of digital engineering, see NVIDIA and Ansys Partnership for Industrial Solutions.