GIGABYTE and NVIDIA have long partnered to develop NVIDIA-Certified Systems for GPU computing use cases such as artificial intelligence (AI), high-performance computing (HPC), virtual desktop infrastructure (VDI), edge computing, 5G, render farms, professional graphics processing and more. To address this multitude of use cases, GIGABYTE offers the largest portfolio of GPU compute server solutions on the market, designed with modularity and configurability in mind.
The solutions feature optimized air cooling and are prepared for direct liquid cooling (DLC) and immersion cooling (in partnership with Asperitas, CoolIT, GRC, Submer and many others). The portfolio continues to grow as the latest computing technologies from leading CPU/GPU manufacturers enter the market, all aiming for the highest compute density, performance and energy efficiency.
Among the many certified systems, the following models are of particular interest for this article: G292-Z20, R282-Z96, G492-ZD2 and immersion cooling systems.
G292-Z20 – the densest GPU computing platform
Based on the latest AMD EPYC 7002 / 7003 CPU architecture, the G292-Z20 system design has a single CPU socket and relies on the large number of AMD EPYC CPU cores (up to 64 cores) to control up to 8 NVIDIA GPU cards (PCIe form factor, double-slot or single-slot sizes). A unified memory space (as in a single NUMA domain) across CPU, system memory, GPUs, and network devices provides the best computing performance with the lowest latency in data movement. Whether in a bare-metal configuration or in virtualization, the G292-Z20 can ensure the optimal distribution of computing resources.
The G292-Z20 comes with 8x PCIe Gen4 slots for NVIDIA GPUs, 1x CPU socket for AMD EPYC, 8x DDR4 3200MHz DIMM slots, 8x hot-swap drive bays (of which 2 bays support NVMe PCIe Gen3 and the other 6 support SATA), and 2x PCIe Gen4 expansion slots for additional devices such as FC HBA / storage cards and NVIDIA SmartNICs, which accelerate data transfer between nodes and clusters and enable GPUDirect/RDMA. These compact, GPU-centric computing solutions are of particular interest to HPC users working with artificial intelligence, molecular simulations, genomic sequencing, weather prediction, and other use cases.
The G292-Z20 is also prepared for immersion cooling, a topic this article addresses at the end.
R282-Z96 – a versatile, universal GPU computing platform
The R282-Z96 comes with dual CPU sockets for AMD EPYC 7002 / 7003 processors (up to 64 cores per socket), support for up to 3 NVIDIA GPU cards (PCIe form factor, dual-slot or single-slot sizes), and extended options for PCIe add-in card configuration.
The 32 built-in DIMM slots provide up to 4TB of DDR4 ECC memory (or up to 8TB using 3DS LRDIMM modules). For local storage, the R282-Z96 has one M.2 storage slot and 12 hot-swap 3.5″/2.5″ SATA/SAS HDD/SSD drive bays. There is also an optional NVMe kit for integrating U.2 NVMe PCIe Gen4 drives.
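As a quick sanity check on the capacity figures above, the following sketch multiplies the slot count by a per-module size. The 128 GB and 256 GB module sizes are assumptions, chosen only because they reproduce the stated 4TB and 8TB totals; the article does not name module sizes, and `max_memory_tb` is a hypothetical helper.

```python
DIMM_SLOTS = 32  # slot count from the R282-Z96 description


def max_memory_tb(slots: int, module_gb: int) -> float:
    """Total capacity in TB (1 TB = 1024 GB) with every slot populated."""
    return slots * module_gb / 1024


# Module sizes below are assumptions, not figures from the article.
print(max_memory_tb(DIMM_SLOTS, 128))  # assumed 128 GB modules
print(max_memory_tb(DIMM_SLOTS, 256))  # assumed 256 GB 3DS LRDIMM modules
```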
Most importantly, the R282-Z96 system design provides a balanced NUMA layout across the two CPU domains: system memory, local storage, and PCIe slots are evenly distributed, ensuring optimal performance and reducing bottlenecks in demanding workloads.
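One way to see how evenly PCIe devices are spread across NUMA domains on a Linux host is to read the `numa_node` attribute that the kernel exposes for each PCI device in sysfs. The sketch below is a minimal illustration, not a GIGABYTE tool: it assumes a Linux system, and the helper name `pci_devices_per_numa_node` is hypothetical.

```python
from collections import Counter
from pathlib import Path


def pci_devices_per_numa_node(sysfs_root: str = "/sys/bus/pci/devices") -> Counter:
    """Count PCI devices per NUMA node using each device's sysfs numa_node file."""
    counts = Counter()
    root = Path(sysfs_root)
    if not root.is_dir():  # not Linux, or sysfs is unavailable
        return counts
    for dev in root.iterdir():
        try:
            node = int((dev / "numa_node").read_text().strip())
        except (OSError, ValueError):
            continue  # skip devices without a readable numa_node attribute
        counts[node] += 1  # a value of -1 means the kernel reports no affinity
    return counts


if __name__ == "__main__":
    for node, count in sorted(pci_devices_per_numa_node().items()):
        print(f"NUMA node {node}: {count} PCI devices")
```

On a two-socket system with a balanced layout, the counts for the two nodes would be expected to be roughly similar, although on-board devices can skew the numbers.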
The R282-Z96 is therefore an ideal solution for VDI and HPC. For example, two NVIDIA GPU cards such as the A16 and A40 can be used for low/mid/high-end virtual desktops and virtual applications. The NVIDIA A30 and A100 can be used for containerization in AI development and for molecular analysis, particle simulation, genomic sequencing, weather prediction, and other HPC workloads that require balanced CPU-GPU computing resources.
G492-ZD2 – the most powerful GPU system with NVIDIA A100 SXM4 and NVLink
The G492-ZD2 is among GIGABYTE’s best-selling models: the system is based on 8x NVIDIA A100 SXM4 GPUs and 2x AMD EPYC CPU sockets and offers the option to install up to 10x NVIDIA SmartNICs to speed up data transfer between nodes and clusters and to enable GPUDirect/RDMA. Certified for RHEL and VMware, the G492-ZD2 is also suitable for providing the maximum number of Multi-Instance GPU (MIG) sessions for AI developers who run workloads in different containerized environments and require custom algorithms, libraries and datasets to run in isolated user environments.
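To put the MIG capacity in concrete terms, the sketch below estimates how many isolated GPU instances an 8-GPU system could expose. It relies on NVIDIA's documented A100 limits (up to 7 instances per GPU with the smallest profile, fewer with larger ones); the helper name `max_mig_instances` is hypothetical, and the calculation assumes every GPU uses the same profile.

```python
GPUS_PER_SYSTEM = 8  # A100 SXM4 modules in the G492-ZD2

# Maximum instances per A100 by compute slices per instance
# (1g, 2g, 3g, 4g and 7g MIG profiles).
MAX_INSTANCES_PER_A100 = {1: 7, 2: 3, 3: 2, 4: 1, 7: 1}


def max_mig_instances(gpus: int, slices_per_instance: int = 1) -> int:
    """Upper bound on MIG instances when all GPUs use one uniform profile."""
    if slices_per_instance not in MAX_INSTANCES_PER_A100:
        raise ValueError(f"unsupported slice count: {slices_per_instance}")
    return gpus * MAX_INSTANCES_PER_A100[slices_per_instance]


print(max_mig_instances(GPUS_PER_SYSTEM))     # smallest profile on every GPU
print(max_mig_instances(GPUS_PER_SYSTEM, 3))  # 3-slice profile on every GPU
```

In practice, mixed profiles on a single GPU are also possible, so this uniform-profile figure is only a planning estimate.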
The system uses a new cooling solution that dedicates a cooling chamber to the NVIDIA GPUs and to the SmartNICs installed in the PCIe expansion slots, ensuring the best possible airflow for cooling high-performance components. The system actually consists of two separate parts: a 3U GPU sled that sits on top of a 1U server housing the CPUs, system memory, storage bays, and front PCIe slots. The 3U GPU sled can be swapped out easily for system maintenance, which matters given the complex on-board interconnects that link all GPU modules and the 1U server.
The inclusion of, and options for, NVIDIA A100 SXM4 modules in the G492-ZD2 system are important because the new NVIDIA Magnum IO GPUDirect technologies favor faster throughput while offloading workloads from the CPU to achieve performance gains. The G492-ZD2 supports NVIDIA GPUDirect RDMA for direct data exchange between GPUs and third-party devices such as NICs or storage adapters. There is also support for GPUDirect Storage, which provides a direct data path from storage to GPU memory while offloading the CPU, resulting in higher bandwidth and lower latency.
State-of-the-art HPC training: liquid-cooled and immersion-cooled servers
At GIGABYTE, we’re seeing a drastic increase in demand for direct liquid cooling (DLC) and immersion cooling (mainly single-phase) compared to the pre-COVID era. The demand comes mainly from data center operators and cloud service providers (CSPs) who are concerned about the continuous increase in computing power and the resulting heat production by computing components (especially CPUs and GPUs).
We assist data centers and customers with design review, power consumption, heat dissipation, space optimization and power usage effectiveness (PUE) / water usage effectiveness (WUE), among many other technical topics, at every step of solution design.
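For readers unfamiliar with the two efficiency metrics, the sketch below computes them using their commonly used definitions: PUE is total facility energy divided by IT equipment energy, and WUE is site water usage in liters divided by IT equipment energy in kWh. The function names and the sample figures are illustrative assumptions, not measurements from any GIGABYTE deployment.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def wue(site_water_liters: float, it_equipment_kwh: float) -> float:
    """Water usage effectiveness in liters per kWh of IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return site_water_liters / it_equipment_kwh


# Hypothetical annual figures, for illustration only.
it_kwh = 1_000_000
print(f"PUE: {pue(1_200_000, it_kwh):.2f}")
print(f"WUE: {wue(1_800_000, it_kwh):.2f} L/kWh")
```

A PUE closer to 1.0 means less energy is spent on overhead such as cooling, which is the main reason liquid and immersion cooling are evaluated against this metric.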
Taking it a step further, GIGABYTE also offers installation/implementation services, working with data center infrastructure companies to ensure clients receive smooth project delivery and a fast turnaround time for operational readiness. Most importantly, GIGABYTE strongly advises its customers to make use of its Proof-of-Concept (PoC) resources to validate each design solution and its project parameters in order to make the best decision, as many environmental factors may alter expected performance and system stability. GIGABYTE has PoC units (both single-phase and two-phase immersion cooling) for testing and validating immersion-cooled servers. Server model options come in 1U/2U/4U form factors and can be modified on demand to fit different use cases and workloads. GIGABYTE works with all the major liquid cooling and immersion cooling technology partners on the market so that customers can count on the design compatibility of GIGABYTE’s complete solutions with their infrastructure.
Conclusion
Beyond today’s HPC technology and further into Q4 2022, 2023 and beyond, GIGABYTE is poised to launch next-generation GPU computing solutions in partnership with NVIDIA. GIGABYTE will continue to address various use cases by adapting its system designs to real-world workflows and data center architectures.