NEXGENCLOUD
DATA CENTER OPERATIONS
WARN

GPU Commissioning Report

EXAMPLE-GPU-001 PowerEdge R750xa 8× NVIDIA A100 80GB PCIe
Install: PASS (4m 40s)
Inventory: WARN (5s)
Stress Test: PASS (15m 0s)
Severity  Source     Issue
CRITICAL  Inventory  1 GPU(s) with ECC uncorrectable errors or double-bit retired pages
WARNING   Inventory  1 GPU(s) with UCE remapped rows (degrading, not yet at failure threshold)
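The two rules in the issue table can be sketched as a small classifier. This is only an illustration: the class and field names below are hypothetical, the report does not say how its tooling computes severities, and the "failure threshold" mentioned in the WARNING rule is not modeled because its value is not given.

```python
from dataclasses import dataclass


@dataclass
class GpuEccCounters:
    """Hypothetical per-GPU counters; the field names are illustrative only."""
    index: int
    uncorrectable_errors: int = 0      # ECC uncorrectable errors
    double_bit_retired_pages: int = 0  # pages retired after double-bit errors
    uce_remapped_rows: int = 0         # rows remapped after uncorrectable errors


def classify(gpu: GpuEccCounters) -> str:
    """Return 'CRITICAL', 'WARNING', or 'OK' following the two issue-table rules."""
    if gpu.uncorrectable_errors > 0 or gpu.double_bit_retired_pages > 0:
        return "CRITICAL"
    if gpu.uce_remapped_rows > 0:
        return "WARNING"
    return "OK"


def summarize(gpus):
    """Count GPUs per severity, mirroring the 'N GPU(s) with ...' wording."""
    counts = {"CRITICAL": 0, "WARNING": 0, "OK": 0}
    for gpu in gpus:
        counts[classify(gpu)] += 1
    return counts
```

A GPU that matches both rules is counted once, under the more severe label.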
Hostname: EXAMPLE-GPU-001
Serial: ABCD1234567
Platform: PowerEdge R750xa
Motherboard: PowerEdge R750xa
BIOS: --
CPU: Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz, 128 threads
RAM: 512 GB
NODE 0: GPU 0, GPU 1, GPU 2, GPU 3
NODE 1: GPU 4, GPU 5, GPU 6, GPU 7
Kernel: 5.15.0-151-generic
Architecture: --
NVIDIA Driver: nvidia-driver-580-server-open (580.126.09)
CUDA: cuda-toolkit-12-8 (reports 13.0)
DCGM: v4.5.2
8× NVIDIA A100 80GB PCIe — 81920 MiB HBM2e
#  Serial         PCIe      NUMA  Idle Temp  Idle Power      ECC            Remapped Rows  Banks           GFLOPS    PCIe BW       Power Stress     Stress Lvl  Mem Test
0  0000000000001  Gen4 x16  0     28°C       50.0W / 300.0W  Clean          None           Max:640         49483.35  -- (3.162us)  297.0W / 308.1W  2688        98.5%
1  0000000000002  Gen4 x16  0     29°C       52.3W / 300.0W  CA:3           None           Max:640         49483.35  -- (3.162us)  297.0W / 308.1W  2688        98.5%
2  0000000000003  Gen4 x16  0     30°C       54.6W / 300.0W  CA:1,891       None           Max:640         49483.35  -- (3.162us)  297.0W / 308.1W  2688        98.5%
3  0000000000004  Gen4 x16  0     31°C       56.9W / 300.0W  CA:15 UA:58    None           N/A             49483.35  -- (3.162us)  297.0W / 308.1W  2688        98.5%
4  0000000000005  Gen4 x16  1     32°C       59.2W / 300.0W  Clean          UCE:1          Max:639 Part:1  49483.35  -- (3.162us)  297.0W / 308.1W  2688        98.5%
5  0000000000006  Gen4 x16  1     33°C       61.5W / 300.0W  CA:26 UA:30    UCE:2          Max:639 Low:1   49483.35  -- (3.162us)  297.0W / 308.1W  2688        98.5%
6  0000000000007  Gen4 x16  1     34°C       63.8W / 300.0W  CA:248 UA:651  None           N/A             49483.35  -- (3.162us)  297.0W / 308.1W  2688        98.5%
7  0000000000008  Gen4 x16  1     35°C       66.1W / 300.0W  Clean          CE:1           Max:640         49483.35  -- (3.162us)  297.0W / 308.1W  2688        98.5%
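The ECC, Remapped Rows, and Banks cells above use a compact KEY:count notation (for example CA:1,891 or Max:639 Part:1). As a rough sketch for anyone post-processing this table, the helper below turns such a cell into a dictionary of integers. The function name is hypothetical, and the helper does not interpret the keys, since the report does not define them.

```python
import re

# One token is a run of letters, a colon, then digits with optional
# thousands separators, e.g. "CA:1,891" or "Max:639".
_TOKEN = re.compile(r"([A-Za-z]+):([\d,]+)")


def parse_counters(cell: str) -> dict:
    """Parse a table cell into {key: count}.

    Cells with no KEY:count tokens ("Clean", "None", "N/A") yield an empty dict.
    """
    return {key: int(value.replace(",", "")) for key, value in _TOKEN.findall(cell)}


print(parse_counters("CA:1,891"))        # {'CA': 1891}
print(parse_counters("Max:639 Part:1"))  # {'Max': 639, 'Part': 1}
print(parse_counters("Clean"))           # {}
```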
Level: 3   Duration: 14m 10s   Exit: 0
40 passed
Test  0  1  2  3  4  5  6  7
diagnostic
memory
pcie
targeted_power
targeted_stress
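The "40 passed" figure is consistent with the five tests listed above each producing one result per GPU (5 × 8 = 40). The report does not state that this is how the count is derived, so the sketch below treats it as an assumption; the function names are hypothetical.

```python
# Test names as listed in the report; GPU count from the hardware summary.
TESTS = ["diagnostic", "memory", "pcie", "targeted_power", "targeted_stress"]
GPU_COUNT = 8


def expected_results(test_names, gpu_count):
    """Assumed model: every test yields exactly one result per GPU."""
    return len(test_names) * gpu_count


def run_is_clean(passed, test_names, gpu_count, exit_code):
    """True when the exit code is 0 and every expected result passed."""
    return exit_code == 0 and passed == expected_results(test_names, gpu_count)


print(expected_results(TESTS, GPU_COUNT))     # 40
print(run_is_clean(40, TESTS, GPU_COUNT, 0))  # True
```

A passed count below the expected total, or a non-zero exit code, would then flag the run for a closer look at the per-GPU results.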
DIMM inventory not available
No physical NICs found
No block devices found
No commissioning data