机器简介

Currently, we are using the SuperMicro 4124GS-TNR server.

CPU: Epyc 7742

AMD claim that theoretical floating point performance can be calculated as: Double Precision theoretical Floating Point performance = #real_cores * 8 DP flop/clk * core frequency. For a 2 socket system =2 * 64 cores * 8 DP flops/ clk * 2.2 GHz=2252.8 Gflops. This includes counting FMA as two flops.

GPU

RDMA

a1:00.0 Infiniband controller [0207]: Mellanox Technologies MT27800 Family [ConnectX-5] [15b3:1017]
a1:00.1 Infiniband controller [0207]: Mellanox Technologies MT27800 Family [ConnectX-5] [15b3:1017]
  • Official Documention
  • IB 卡的通讯协议 https://www.rdmamojo.com/2013/06/01/which-queue-pair-type-to-use/
  • OpenMPI 使用 http://scc.ustc.edu.cn/zlsc/user_doc/html/mpi-application/mpi-application.html

RAID

e6:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9230 PCIe SATA 6Gb/s Controller [1b4b:9230] (rev 11) (prog-if 01 [AHCI 1.0])

Official brief: https://www.marvell.com/content/dam/marvell/en/public-collateral/storage/marvell-storage-88se92xx-product-brief-2012-04.pdf

To configure the RAID controller, the easiest way is to press Ctrl+M during booting.

If you want to boot a system on RAID, please use Legacy mode. If you switched to UEFI only, you can't find the controller even if you change it back later. To solve it, see Supermicro FAQ Entry

Firmware

It's possible to flash firmware, see Marvell 9230 Firmware Updates and such. Our current firmware is 1070 (bios oprom version). If you want to flash another firmware, you might need to make a FreeDOS bootable disk.

Note: Do backup before flashing!

Many links to firmware or utilities are broken. Station Drivers may still work. Also refer Marvell 92xx A1 Firmware Image Repository, it have a full collection of firmware images.

You can find supermicro's firmware from official site but you can't download it. Try download from http://members.iinet.net.au/~michaeldd/.

NVMe

Installed with https://www.asus.com/us/Motherboards-Components/Motherboards/Accessories/HYPER-M-2-X16-CARD-V2/.

21:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
22:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
23:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
24:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]

⚠️ The PCIE socket of the NVME card must be configured as 4x4x4x4 so as to be recognized by the system correctly.

The card may have problems. If you find it doesn't work correctly, ask in Slack.

RAID Controller

MegaRAID

LSI_SAS_EmbMRAID_SWUG.pdf 2006 LSI_SAS_EmbMRAID_SWUG.pdf

ASrock

faq1

faq2

Win-Raid

forum

Help-Problem-to-flash-the-Marvel-SE-card-resolve

Syba-SI-PEX-PCIe-Card-with-Marvell-SATA-Controller

firmware UEFI

firmware DOS

http://members.iinet.net.au/~michaeldd/CDR-A1-UP_1.01_for_Intel_A1_UP_platform.zip

Supermicro superserver bios change cause 960 nvme disappear

https://tinkertry.com/supermicro-superserver-bios-change-can-cause-960-pro-and-evo-to-hide-heres-the-fix

Background knowlodge