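1 - Mellanox CX6 OCP3 NIC

Mellanox CX6 OCP3 200G NIC

I once bought two CX6 200G NICs with an OCP 3.0 connector and installed them in a Z690 motherboard through an OCP3-to-PCIe 4.0 adapter card. The link ended up running at a reduced speed because of PCIe signal-strength problems, which was a pity.

These notes record what I saw during testing. Be careful when buying OCP NICs in the future.

lspci

In the PCIe 4.0 x4 slot: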

sudo lspci -s 01:00.0 -vvv | grep Width                     
[sudo] password for sky:          
		LnkCap:	Port #0, Speed 8GT/s, Width x4, ASPM not supported
		LnkSta:	Speed 8GT/s (ok), Width x4 (ok)

In the PCIe 4.0 x16 slot:

sudo lspci -s 01:00.0 -vvv | grep Width
		LnkCap:	Port #0, Speed 16GT/s, Width x16, ASPM not supported
		LnkSta:	Speed 8GT/s (downgraded), Width x16 (ok)
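The comparison between LnkCap (what the device supports) and LnkSta (what was actually negotiated) can be automated. The following is a minimal sketch; the function name link_check is made up for illustration, and it assumes the input contains the LnkCap/LnkSta lines of a single device, as in the output above.

```shell
# link_check: read `lspci -vvv` text on stdin and compare the negotiated
# link (LnkSta) with the device capability (LnkCap).
# Hypothetical helper, not part of lspci or any Mellanox tool.
link_check() {
    awk '
        /LnkCap:/ || /LnkSta:/ {
            # Pick the word following "Speed" and "Width" on this line.
            for (i = 1; i <= NF; i++) {
                if ($i == "Speed") s = $(i + 1)
                if ($i == "Width") w = $(i + 1)
            }
            gsub(/,/, "", s); gsub(/,/, "", w)
            if ($0 ~ /LnkCap:/) { cap_s = s; cap_w = w }
            else                { sta_s = s; sta_w = w }
        }
        END {
            if (cap_s == "" || sta_s == "") { print "no link info found"; exit 1 }
            if (cap_s == sta_s && cap_w == sta_w)
                print "ok: " sta_s " " sta_w
            else
                print "degraded: running " sta_s " " sta_w ", capable of " cap_s " " cap_w
        }'
}
```

Usage: sudo lspci -s 01:00.0 -vvv | link_check. For the x16 output above this reports a link running at 8GT/s x16 on a device capable of 16GT/s x16.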

dmesg

In the PCIe 4.0 x4 slot:

sudo dmesg | grep mlx5_core
[    1.437917] mlx5_core 0000:04:00.0: enabling device (0000 -> 0002)
[    1.438118] mlx5_core 0000:04:00.0: firmware version: 20.27.446
[    1.438142] mlx5_core 0000:04:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 0000:00:1b.4 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[    1.731931] mlx5_core 0000:04:00.0: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[    1.732211] mlx5_core 0000:04:00.0: E-Switch: Total vports 10, per vport: max uc(1024) max mc(16384)
[    1.734873] mlx5_core 0000:04:00.0: Port module event: module 0, Cable plugged
[    1.735122] mlx5_core 0000:04:00.0: mlx5_pcie_event:299:(pid 324): Detected insufficient power on the PCIe slot (25W).
[    1.741863] mlx5_core 0000:04:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(2048) RxCqeCmprss(1 basic)
[    1.989891] mlx5_core 0000:04:00.0 eth0: Disabling rxhash, not supported when CQE compress is active
[    1.990170] mlx5_core 0000:04:00.0 enp4s0np0: renamed from eth0
[    5.203057] mlx5_core 0000:04:00.0 enp4s0np0: Link down

In the PCIe 4.0 x16 slot:

sudo dmesg | grep mlx5_core
[    1.444801] mlx5_core 0000:01:00.0: enabling device (0000 -> 0002)
[    1.445005] mlx5_core 0000:01:00.0: firmware version: 20.27.446
[    1.445026] mlx5_core 0000:01:00.0: 126.016 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x16 link at 0000:00:01.0 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[    1.684398] mlx5_core 0000:01:00.0: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[    1.684600] mlx5_core 0000:01:00.0: E-Switch: Total vports 10, per vport: max uc(1024) max mc(16384)
[    1.686818] mlx5_core 0000:01:00.0: Port module event: module 0, Cable plugged
[    1.687055] mlx5_core 0000:01:00.0: mlx5_pcie_event:304:(pid 328): PCIe slot advertised sufficient power (75W).
[    1.693066] mlx5_core 0000:01:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 basic)
[    1.801740] mlx5_core 0000:01:00.0 enp1s0np0: renamed from eth0
[    5.189354] mlx5_core 0000:01:00.0 enp1s0np0: Link down
[   21.129045] mlx5_core 0000:01:00.0 enp1s0np0: Link up
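The "available PCIe bandwidth" figures in these logs are consistent with a simple calculation: per-lane transfer rate, reduced by the line-encoding overhead (8b/10b for Gen 1/2, 128b/130b for Gen 3 and later), multiplied by the lane count. The sketch below reproduces the three numbers with integer arithmetic; the function name pcie_bw_mbps is an illustrative assumption, not a kernel or driver interface.

```shell
# pcie_bw_mbps RATE_MTS LANES
# Prints the usable link bandwidth in Mb/s.
#   RATE_MTS: per-lane rate in MT/s (2500, 5000, 8000, 16000, ...)
#   LANES:    link width (1, 4, 8, 16, ...)
pcie_bw_mbps() {
    rate=$1
    lanes=$2
    if [ "$rate" -le 5000 ]; then
        # Gen 1 / Gen 2: 8b/10b encoding, 80% efficiency
        per_lane=$(( rate * 8 / 10 ))
    else
        # Gen 3 and later: 128b/130b encoding
        per_lane=$(( rate * 128 / 130 ))
    fi
    echo $(( per_lane * lanes ))
}

pcie_bw_mbps 2500 4     # Gen 1 x4  -> 8000   (8.000 Gb/s in the x4 log)
pcie_bw_mbps 8000 16    # Gen 3 x16 -> 126016 (126.016 Gb/s in the x16 log)
pcie_bw_mbps 16000 16   # Gen 4 x16 -> 252048 (the "capable of" figure)
```

Only the Gen 4 x16 figure exceeds 200 Gb/s, so a link that trains at Gen 3 x16 cannot feed a 200G port at line rate.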

mst

In the PCIe 4.0 x4 slot:

sudo mst status
MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module loaded

MST devices:
------------
/dev/mst/mt4123_pciconf0         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:04:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                   Chip revision is: 00

In the PCIe 4.0 x16 slot:

sudo mst status
MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module loaded

MST devices:
------------
/dev/mst/mt4123_pciconf0         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:01:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                   Chip revision is: 00

In the PCIe 4.0 x4 slot:

sudo mlxlink -d /dev/mst/mt4123_pciconf0 --port_type PCIE -e

PCIe Operational (Enabled) Info
-------------------------------
Depth, pcie index, node            : 0, 0, 0
Link Speed Active (Enabled)        : 2.5G-Gen 1 (16G-Gen 4)
Link Width Active (Enabled)        : 4X (16X)
 

Errors
------
Showing Eye via SLRG raised the following exception: Eye information available for Gen3 and above

The link is severely downgraded: Gen 1 x4 instead of Gen 4 x16.

In the PCIe 4.0 x16 slot:

sudo mlxlink -d /dev/mst/mt4123_pciconf0 --port_type PCIE -e

PCIe Operational (Enabled) Info
-------------------------------
Depth, pcie index, node            : 0, 0, 0
Link Speed Active (Enabled)        : 8G-Gen 3 (16G-Gen 4)
Link Width Active (Enabled)        : 16X (16X)

EYE Opening Info (PCIe)
-----------------------
Physical Grade                     :   1848,  2220,  2146,  2301,  2067,  2052,  1944,  2135,  2035,  1980,  2090,  2340,  1980,  2170,  2196,  2072
Height Eye Opening [mV]            :    147,   177,   171,   184,   165,   164,   155,   170,   162,   158,   167,   187,   158,   173,   175,   165
Phase  Eye Opening [psec]          :     16,    16,    16,    16,    16,    16,    16,    16,    16,    16,    16,    16,    16,    16,    16,    16

Setting the MTU

sudo ifconfig enp1s0np0 mtu 9000 up

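2 - Mellanox / Huawei SP350 100G NIC

Huawei SP350 / Mellanox CX5 100G single-port NIC

2.1 - Flashing firmware on the Huawei SP350

Flashing firmware on the Huawei SP350 NIC

So far the Huawei SP350 cannot be flashed with the official firmware, so I have given up for now.

Appendix

Installing MFT on Debian 12

Installing MFT on Debian 12 works the same way as described in the cx4121a section.

However, for reasons I have not figured out, mst fails to start after installation: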

$ sudo mst start                       
Starting MST (Mellanox Software Tools) driver set
Loading MST PCI modulemodprobe: ERROR: could not insert 'mst_pci': Key was rejected by service
 - Failure: 1
Loading MST PCI configuration modulemodprobe: ERROR: could not insert 'mst_pciconf': Key was rejected by service
 - Failure: 1
Create devices

mst_pci driver not found
Unloading MST PCI module (unused) - Success
Unloading MST PCI configuration module (unused) - Success

I tested this repeatedly:

  • mst fails to start both with the driver shipped in Debian 12 and with the latest official driver.

  • I tried both mft-4.26 and mft-4.27, with the same result.
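One plausible explanation, offered here as an assumption rather than a confirmed diagnosis: "Key was rejected by service" is the message modprobe typically prints when the kernel refuses a module whose signature it does not trust, which is common with UEFI Secure Boot enabled. The sketch below maps the two modprobe errors that appear in these notes to a suggested next step; the function name and the wording of the hints are illustrative.

```shell
# explain_modprobe_error "ERROR TEXT"
# Hypothetical helper: maps a modprobe error message to a likely cause.
explain_modprobe_error() {
    case "$1" in
        *"Key was rejected by service"*)
            # The module signature was not accepted by the kernel.
            echo "signature rejected: check Secure Boot with 'mokutil --sb-state'" ;;
        *"not found in directory"*)
            # No module was built for the running kernel (e.g. after a kernel upgrade).
            echo "module missing for this kernel: rebuild or reinstall the DKMS package" ;;
        *)
            echo "unrecognized error" ;;
    esac
}
```

Usage: explain_modprobe_error "modprobe: ERROR: could not insert 'mst_pci': Key was rejected by service". The mokutil command comes from the mokutil package and reports whether Secure Boot is currently enabled.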

Removing the ROM on Windows 10

To make the machine boot a little faster, remove the NIC's ROM under Windows. The method is generic:

flint -d /dev/mst/mt4117_pciconf0 --allow_rom_change drom

This takes a while:

-I- Preparing to remove ROM ...
Removing ROM image    - OK  # this step takes 1-2 minutes
Restoring signature  - OK

2.2 - SP350 driver

Installing the driver for the Huawei SP350 NIC

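2.2.1 - Installing the driver on Debian 12

Installing the Huawei SP350 driver on Debian 12

Update 2024-09-09: Debian 12 has been upgraded to 12.5, and the NIC driver is now the latest release, 24.07-0.6.1.0.

Preparation

Checking the default driver

This is the default driver that ships with Debian 12:

$ lsmod | grep mlx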

mlx5_ib               405504  0
ib_uverbs             172032  1 mlx5_ib
ib_core               438272  2 ib_uverbs,mlx5_ib
mlx5_core            1683456  1 mlx5_ib
mlxfw                  36864  1 mlx5_core
psample                20480  1 mlx5_core
pci_hyperv_intf        16384  1 mlx5_core

Details of mlx5_core:

 $ modinfo mlx5_core               
filename:       /lib/modules/6.1.0-20-amd64/kernel/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko
license:        Dual BSD/GPL
description:    Mellanox 5th generation network adapters (ConnectX series) core driver
author:         Eli Cohen <eli@mellanox.com>
alias:          auxiliary:mlx5_core.eth
alias:          pci:v000015B3d0000A2DFsv*sd*bc*sc*i*
alias:          pci:v000015B3d0000A2DCsv*sd*bc*sc*i*
alias:          pci:v000015B3d0000A2D6sv*sd*bc*sc*i*
alias:          pci:v000015B3d0000A2D3sv*sd*bc*sc*i*
alias:          pci:v000015B3d0000A2D2sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001023sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001021sv*sd*bc*sc*i*
alias:          pci:v000015B3d0000101Fsv*sd*bc*sc*i*
alias:          pci:v000015B3d0000101Esv*sd*bc*sc*i*
alias:          pci:v000015B3d0000101Dsv*sd*bc*sc*i*
alias:          pci:v000015B3d0000101Csv*sd*bc*sc*i*
alias:          pci:v000015B3d0000101Bsv*sd*bc*sc*i*
alias:          pci:v000015B3d0000101Asv*sd*bc*sc*i*
alias:          pci:v000015B3d00001019sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001018sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001017sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001016sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001015sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001014sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001013sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001012sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001011sv*sd*bc*sc*i*
alias:          auxiliary:mlx5_core.eth-rep
depends:        psample,pci-hyperv-intf,mlxfw
retpoline:      Y
intree:         Y
name:           mlx5_core
vermagic:       6.1.0-20-amd64 SMP preempt mod_unload modversions 
sig_id:         PKCS#7
signer:         Debian Secure Boot CA
sig_key:        32:A0:28:7F:84:1A:03:6F:A3:93:C1:E0:65:C4:3A:E6:B2:42:26:43
sig_hashalgo:   sha256
signature:      86:53:46:C0:77:7E:22:E0:2A:B3:23:32:E3:87:DA:7C:94:3A:B1:1B:
		5A:92:14:41:17:78:2B:25:A9:9E:B9:9E:0C:F7:1C:2E:30:F3:D3:96:
		44:27:A8:74:A3:7D:2F:83:7D:2B:F4:A7:4E:C5:00:98:0B:56:15:0C:
		DF:53:B8:01:66:B2:C0:9D:C9:DD:2C:E3:A6:BA:91:E0:B0:11:37:DF:
		D7:32:B9:DA:B4:B5:B8:FB:CA:8F:21:46:91:05:28:C1:F1:D9:1B:C5:
		C7:B4:67:58:D9:29:B2:43:84:A0:5F:AD:01:E8:41:71:18:08:18:83:
		0E:F3:E7:88:32:08:46:3B:42:AF:A9:8F:63:E4:45:5D:45:16:E8:48:
		84:67:02:C1:A1:AF:A3:71:35:4C:E5:12:83:4D:05:BD:BE:14:01:F6:
		E5:19:E2:3A:60:9D:0A:D1:C6:B7:E6:CE:FE:8C:7C:0F:B5:01:49:08:
		D9:BB:CE:16:4C:5D:18:CC:61:ED:D3:D4:CA:2E:44:A0:4A:2B:59:DC:
		2B:30:06:27:8E:25:7E:0D:4B:00:7B:4E:2A:7F:65:87:22:B0:1A:BC:
		75:C5:83:47:21:92:D9:84:F5:FC:89:5B:3F:5A:9F:6E:16:FC:38:C4:
		2F:5C:9C:BF:7A:AB:F3:91:32:C6:CA:05:50:5C:27:10
parm:           debug_mask:debug mask: 1 = dump cmd data, 2 = dump cmd exec time, 3 = both. Default=0 (uint)
parm:           prof_sel:profile selector. Valid range 0 - 2 (uint)

The in-tree driver has no version field of its own; vermagic shows that it was built for kernel 6.1.0-20:

modinfo mlx5_core |  grep version
vermagic:       6.1.0-20-amd64 SMP preempt mod_unload modversions 

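Downloading the driver

Download page:

https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/

Select the matching Debian release. The latest version, 24.07-0.6.1.0, already supports Debian 12.5:

(screenshot: mlnx_ofed_download)

The download is MLNX_OFED_LINUX-24.07-0.6.1.0-debian12.5-x86_64.tgz; copy it to the Debian 12 machine with scp.

Disabling Secure Boot

Secure Boot must be disabled in the BIOS of the physical or virtual machine, because it conflicts with the latest 24.07-0.6.1.0 mlnx_ofed driver.

In a PVE virtual machine it looks like this:

(screenshot: disable-secure-boot)

Otherwise the newly installed driver reports an error and the NIC becomes unusable.

Installing the driver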

su root
tar xvf MLNX_OFED_LINUX-24.07-0.6.1.0-debian12.5-x86_64.tgz
cd MLNX_OFED_LINUX-24.07-0.6.1.0-debian12.5-x86_64

Set PATH first; with the default PATH some essential commands are not found and the installation fails:

export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
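The export above replaces PATH entirely. A gentler variant, sketched here under the assumption that the problem is only the missing sbin directories (as happens when switching to root with plain su), appends each directory only if it is not already present. The function name ensure_in_path is made up for this example.

```shell
# ensure_in_path DIR...
# Append each DIR to PATH unless it is already listed.
ensure_in_path() {
    for dir in "$@"; do
        case ":$PATH:" in
            *":$dir:"*) ;;                 # already present, nothing to do
            *) PATH="$PATH:$dir" ;;        # append the missing directory
        esac
    done
    export PATH
}

ensure_in_path /usr/local/sbin /usr/sbin /sbin
```

This keeps any existing entries in their original order and avoids duplicates when the snippet is run more than once.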

Configure a proxy to speed up downloads:

export all_proxy=socks5://192.168.0.1:7891;export http_proxy=http://192.168.0.1:7890;export https_proxy=http://192.168.0.1:7890;export no_proxy=127.0.0.1,localhost,local,.local,.lan,192.168.0.0/16,10.0.0.0/16

Start the installation:

./mlnxofedinstall --without-fw-update --with-nvmf --with-nfsrdma --ovs-dpdk

Note that some driver versions need --distro debian12.1; otherwise the installer may fail with:

Error: The current MLNX_OFED_LINUX is intended for debian12.1

That happened because my Debian 12 was already at 12.5 when I installed it. The latest 24.07-0.6.1.0 driver is packaged for Debian 12.5:

./mlnxofedinstall --print-distro
debian12.5

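The three options --with-nvmf, --with-nfsrdma and --ovs-dpdk are optional; I added them mainly to learn and test those features.

The installation looks like this (the log is still from 24.01-0.3.3.1; 24.07-0.6.1.0 is similar):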

$./mlnxofedinstall --without-fw-update --with-nvmf --with-nfsrdma --ovs-dpdk

Logs dir: /tmp/MLNX_OFED_LINUX.1071.logs
General log file: /tmp/MLNX_OFED_LINUX.1071.logs/general.log

Below is the list of MLNX_OFED_LINUX packages that you have chosen
(some may have been added by the installer due to package dependencies):

ofed-scripts
mlnx-tools
mlnx-ofed-kernel-utils
mlnx-ofed-kernel-dkms
iser-dkms
isert-dkms
srp-dkms
mlnx-nvme-dkms
rdma-core
libibverbs1
ibverbs-utils
ibverbs-providers
libibverbs-dev
libibverbs1-dbg
libibumad3
libibumad-dev
ibacm
librdmacm1
rdmacm-utils
librdmacm-dev
mstflint
ibdump
libibmad5
libibmad-dev
libopensm
opensm
opensm-doc
libopensm-devel
libibnetdisc5
infiniband-diags
mft
kernel-mft-dkms
perftest
ibutils2
ibsim
ibsim-doc
ucx
sharp
hcoll
knem-dkms
knem
openmpi
mpitests
srptools
mlnx-ethtool
mlnx-iproute2
rshim
ibarr
libopenvswitch
openvswitch-common
openvswitch-switch

This program will install the MLNX_OFED_LINUX package on your machine.
Note that all other Mellanox, OEM, OFED, RDMA or Distribution IB packages will be removed.
Those packages are removed due to conflicts with MLNX_OFED_LINUX, do not reinstall them.

Do you want to continue?[y/N]:y

Checking SW Requirements...
One or more required packages for installing MLNX_OFED_LINUX are missing.
Attempting to install the following missing packages:
libipsec-mb1 uuid-runtime libunwind8 libunbound8 libpcap0.8
Removing old packages...
Uninstalling the previous version of MLNX_OFED_LINUX
Installing new packages
Installing ofed-scripts-24.01.OFED.24.01.0.3.3...
Installing mlnx-tools-24.01.0...
Installing mlnx-ofed-kernel-utils-24.01.OFED.24.01.0.3.3.1...
Installing mlnx-ofed-kernel-dkms-24.01.OFED.24.01.0.3.3.1...
Installing iser-dkms-24.01.OFED.24.01.0.3.3.1...
Installing isert-dkms-24.01.OFED.24.01.0.3.3.1...
Installing srp-dkms-24.01.OFED.24.01.0.3.3.1...
Installing mlnx-nvme-dkms-24.01.OFED.24.01.0.3.3.1...
Installing rdma-core-2307mlnx47...
Installing libibverbs1-2307mlnx47...
Installing ibverbs-utils-2307mlnx47...
Installing ibverbs-providers-2307mlnx47...
Installing libibverbs-dev-2307mlnx47...
Installing libibverbs1-dbg-2307mlnx47...
Installing libibumad3-2307mlnx47...
Installing libibumad-dev-2307mlnx47...
Installing ibacm-2307mlnx47...
Installing librdmacm1-2307mlnx47...
Installing rdmacm-utils-2307mlnx47...
Installing librdmacm-dev-2307mlnx47...
Installing mstflint-4.16.1...
Installing ibdump-6.0.0...
Installing libibmad5-2307mlnx47...
Installing libibmad-dev-2307mlnx47...
Installing libopensm-5.18.0.MLNX20240128.3f266a48...
Installing opensm-5.18.0.MLNX20240128.3f266a48...
Installing opensm-doc-5.18.0.MLNX20240128.3f266a48...
Installing libopensm-devel-5.18.0.MLNX20240128.3f266a48...
Installing libibnetdisc5-2307mlnx47...
Installing infiniband-diags-2307mlnx47...
Installing mft-4.27.0...
Installing kernel-mft-dkms-4.27.0.83...
Installing perftest-24.01.0...
Installing ibutils2-2.1.1...
Installing ibsim-0.12...
Installing ibsim-doc-0.12...
Installing ucx-1.16.0...
Installing sharp-3.6.0.MLNX20240128.e669b4e8...
Installing hcoll-4.8.3227...
Installing knem-dkms-1.1.4.90mlnx3...
Installing knem-1.1.4.90mlnx3...
Installing openmpi-4.1.7a1...
Installing mpitests-3.2.22...
Installing srptools-2307mlnx47...
Installing mlnx-ethtool-6.4...
Installing mlnx-iproute2-6.4.0...
Installing rshim-2.0.19...
Installing ibarr-0.1.3...
Installing libopenvswitch-2.17.8...
Installing openvswitch-common-2.17.8...
Installing openvswitch-switch-2.17.8...
Selecting previously unselected package mlnx-fw-updater.
(Reading database ... 101192 files and directories currently installed.)
Preparing to unpack .../mlnx-fw-updater_24.01-0.3.3.1_amd64.deb ...
Unpacking mlnx-fw-updater (24.01-0.3.3.1) ...
Setting up mlnx-fw-updater (24.01-0.3.3.1) ...

Added 'RUN_FW_UPDATER_ONBOOT=no to /etc/infiniband/openib.conf

Skipping FW update.
Device (01:00.0):
	01:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
	Link Width: x16
	PCI Link Speed: 8GT/s

Installation passed successfully
To load the new driver, run:
/etc/init.d/openibd restart
Note: In order to load the new nvme-rdma and nvmet-rdma modules, the nvme module must be reloaded.

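After a reboot, 24.01-0.3.3.1 and earlier versions work normally.

Error: pci_hp_register failed

The latest 24.07-0.6.1.0 version, however, fails, and ip addr no longer shows the CX5 NIC.

dmesg shows this error:

pci_hp_register failed with error -16

Upgrading the Linux kernel then prints the notice "Your system has UEFI Secure Boot enabled".

That was the clue that led me to disable Secure Boot in the virtual machine's BIOS.

After a reboot everything worked.

Post-installation

Disabling openibd autostart

After the installation, and before rebooting, disable openibd from starting at boot: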

sudo systemctl disable openibd

Output:

Synchronizing state of openibd.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install disable openibd
Removed "/etc/systemd/system/sysinit.target.wants/openibd.service".

For now I only use Ethernet mode anyway, not InfiniBand mode.

Checking the driver after installation

$ lsmod | grep mlx

mlx5_ib               479232  0
ib_uverbs             184320  2 rdma_ucm,mlx5_ib
ib_core               454656  8 rdma_cm,ib_ipoib,iw_cm,ib_umad,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm
mlx5_core            2420736  1 mlx5_ib
mlxfw                  36864  1 mlx5_core
psample                20480  1 mlx5_core
mlxdevm               180224  1 mlx5_core
mlx_compat             20480  11 rdma_cm,ib_ipoib,mlxdevm,iw_cm,ib_umad,ib_core,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm,mlx5_core
tls                   135168  1 mlx5_core
pci_hyperv_intf        16384  1 mlx5_core

Details of mlx5_core:

$ modinfo mlx5_core                
filename:       /lib/modules/6.1.0-25-amd64/updates/dkms/mlx5_core.ko
alias:          auxiliary:mlx5_core.eth-rep
alias:          auxiliary:mlx5_core.eth
basedon:        Korg 6.8-rc4
version:        24.07-0.6.1
license:        Dual BSD/GPL
description:    Mellanox 5th generation network adapters (ConnectX series) core driver
author:         Eli Cohen <eli@mellanox.com>
srcversion:     769E8732BF9FAF2E580D2BC
alias:          pci:v000015B3d0000A2DFsv*sd*bc*sc*i*
alias:          pci:v000015B3d0000A2DCsv*sd*bc*sc*i*
alias:          pci:v000015B3d0000A2D6sv*sd*bc*sc*i*
alias:          pci:v000015B3d0000A2D3sv*sd*bc*sc*i*
alias:          pci:v000015B3d0000A2D2sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001023sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001021sv*sd*bc*sc*i*
alias:          pci:v000015B3d0000101Fsv*sd*bc*sc*i*
alias:          pci:v000015B3d0000101Esv*sd*bc*sc*i*
alias:          pci:v000015B3d0000101Dsv*sd*bc*sc*i*
alias:          pci:v000015B3d0000101Csv*sd*bc*sc*i*
alias:          pci:v000015B3d0000101Bsv*sd*bc*sc*i*
alias:          pci:v000015B3d0000101Asv*sd*bc*sc*i*
alias:          pci:v000015B3d00001019sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001018sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001017sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001016sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001015sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001014sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001013sv*sd*bc*sc*i*
alias:          auxiliary:mlx5_core.sf
depends:        mlxdevm,mlx_compat,tls,pci-hyperv-intf,psample,mlxfw
retpoline:      Y
name:           mlx5_core
vermagic:       6.1.0-25-amd64 SMP preempt mod_unload modversions 
sig_id:         PKCS#7
signer:         DKMS module signing key
sig_key:        25:DA:47:F2:9F:35:E2:08:53:6F:AD:D7:4E:06:E8:E0:59:C8:1E:89
sig_hashalgo:   sha256
signature:      01:97:E6:8D:53:AD:D9:38:E0:D5:8C:00:B9:8F:EB:C6:2E:5F:DF:7F:
		C5:DB:AA:62:85:81:36:F1:8E:E3:82:2E:33:63:9B:E6:57:07:2D:DC:
		43:51:C4:04:15:AA:C9:B7:A1:02:58:1F:74:EE:2A:27:91:B4:A2:23:
		FE:25:31:06:62:1D:D0:2D:A6:55:C5:B2:CB:A4:25:0B:DA:24:18:81:
		0E:E3:7A:76:EC:5A:C3:E0:A7:E5:75:44:4C:BD:3C:E1:AD:55:EA:F1:
		6A:E7:B4:7A:03:A6:DD:32:10:5B:A4:A0:74:EC:02:E0:D1:33:65:E2:
		17:4C:16:01:54:5D:60:C5:AF:0E:4C:4A:73:4B:FB:C8:BB:0A:00:AB:
		80:05:82:E2:9A:72:58:F6:0A:18:21:E2:3E:57:91:9A:2D:31:DC:04:
		55:A0:3E:B2:62:7D:F4:F1:9A:8C:B6:9F:88:27:A3:92:07:14:57:28:
		D4:61:4C:B2:EE:70:A4:DF:90:C9:F3:0C:85:43:8F:C2:C0:C1:75:77:
		E6:76:CD:26:B6:6D:F7:13:10:B0:EC:CA:9F:B8:31:3E:C3:A3:FA:ED:
		3E:CB:55:D6:7D:0E:6A:32:66:1E:C0:95:E1:00:F3:47:DA:20:0D:1E:
		68:DF:1F:4E:4C:99:97:D6:55:48:2B:65:E6:47:1A:35
parm:           num_of_groups:Eswitch offloads number of big groups in FDB table. Valid range 1 - 1024. Default 15 (uint)
parm:           debug_mask:debug mask: 1 = dump cmd data, 2 = dump cmd exec time, 3 = both. Default=0 (uint)
parm:           prof_sel:profile selector. Valid range 0 - 4 (uint)

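After the update, the mlx5_core version changes from the in-tree default to the OFED version (this output was captured with the earlier 24.01-0.3.3.1 install):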

$ modinfo mlx5_core |  grep version

version:        24.01-0.3.3
srcversion:     59290B9C495B89FC195B001
vermagic:       6.1.0-20-amd64 SMP preempt mod_unload modversions
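A quick way to tell which of the two drivers is in use is the module's file path: in the modinfo outputs above, the stock driver lives under kernel/drivers/ and the OFED build under updates/dkms/. The sketch below classifies a path on that basis; driver_origin is a hypothetical helper, and the two path patterns are taken only from the outputs in these notes.

```shell
# driver_origin MODULE_PATH
# Classify a kernel module file by where it is installed.
driver_origin() {
    case "$1" in
        */updates/dkms/*)   echo "dkms" ;;      # built by DKMS, e.g. from MLNX_OFED
        */kernel/drivers/*) echo "in-tree" ;;   # shipped with the distribution kernel
        *)                  echo "unknown" ;;
    esac
}
```

Usage: driver_origin "$(modinfo -F filename mlx5_core)". The -F option of modinfo prints a single field, so the same approach works for version (modinfo -F version mlx5_core), which is empty for the in-tree module.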

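2.2.2 - Windows driver

Installing the Huawei SP350 driver on Windows

The procedure is the same as for the cx4121a NIC.

3 - Mellanox cx4121a 25G NIC

Mellanox MCX4121A 25G dual-port NIC

3.1 - Flashing firmware on the cx4121a

Flashing firmware on the Mellanox MCX4121A NIC

Background

The Mellanox MCX4121A family is divided into models purely through firmware and firmware configuration, and there are also OEM variants from other vendors. All of them can be changed by flashing a different firmware.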

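NICs based on the ConnectX-4 Lx core. Author:

(image: cx4121a)

The main uses are flashing a 10G card to 25G, or flashing an OEM model (Dell and others) to the stock firmware.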

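Flashing the firmware

Preparing the MFT tools

Download the MFT tools from the official site and install them:

https://network.nvidia.com/products/adapter-software/firmware-tools/

Downloading the firmware

Stock firmware download page:

https://network.nvidia.com/support/firmware/connectx4lxen/

Select model MCX4121A-ACUT. Check the model list above: the 2x25G, UEFI-enabled model is MCX4121A-ACUT.

The downloaded file is fw-ConnectX4Lx-rel-14_32_1010-MCX4121A-ACU_Ax-UEFI-14.25.17-FlexBoot-3.6.502.bin.zip

Flashing on Windows

The flint tool in MFT flashes the NIC firmware. Note that it needs elevated privileges: on Windows, cmd must be opened with "Run as administrator"; on Linux, use sudo.

Open cmd as administrator. On Windows you can place the firmware file (for example fw-ConnectX4Lx-rel-14_32_1010-MCX4121A-ACU_Ax-UEFI-14.25.17-FlexBoot-3.6.502.bin) in the MFT installation directory, for example C:\Program Files\Mellanox\WinMFT

Run the following, replacing the device name after -d with the one reported by mst status: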

cd C:\Program Files\Mellanox\WinMFT

flint -d mt4117_pciconf0 -i fw-ConnectX4Lx-rel-14_32_1010-MCX4121A-ACU_Ax-UEFI-14.25.17-FlexBoot-3.6.502.bin -allow_psid_change burn

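Test results

  • Dell cx4121c

    Flashing the Dell cx4121c with the MCX4121A-ACUT firmware succeeded.

  • MCX4121A-XCAT 10G

    Flashing it with the 25G MCX4121A-ACUT firmware succeeded.

Appendix

Installing MFT on Debian 12

Installing MFT on Debian 12 takes a little more work. After downloading it, install a few prerequisite packages first: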

su root
export all_proxy=socks5://192.168.0.1:7891
apt-get install gcc make dkms

The Linux headers are installed automatically along the way. Alternatively, follow this article to install them:

How to Install Linux Kernel Headers on Debian 12 (linuxhint.com)

Running ./install.sh afterwards still fails, and the log shows:

dpkg: warning: 'ldconfig' not found in PATH or not executable
dpkg: warning: 'start-stop-daemon' not found in PATH or not executable
dpkg: error: 2 expected programs not found in PATH or not executable
Note: root's PATH should usually contain /usr/local/sbin, /usr/sbin and /sbin

This is a common problem: PATH is incomplete. See:

ubuntu - dpkg cannot find ldconfig/start-stop-daemon in the PATH variable - Unix & Linux Stack Exchange

The fix is to complete PATH:

export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

Running ./install.sh again then completes the installation.

$ ./install.sh 

-I- Removing mft external packages installed on the machine
-I- Installing package: /home/sky/temp/mft-4.26.1-3-x86_64-deb/SDEBS/kernel-mft-dkms_4.26.1-3_all.deb
-I- Installing package: /home/sky/temp/mft-4.26.1-3-x86_64-deb/DEBS/mft_4.26.1-3_amd64.deb
-I- In order to start mst, please run "mst start".

Run mst start:

$ mst start
Starting MST (Mellanox Software Tools) driver set
Loading MST PCI module - Success
Loading MST PCI configuration module - Success
Create devices
Unloading MST PCI module (unused) - Success

Then check the status. I currently have two NICs installed:

mst status
MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module loaded

MST devices:
------------
/dev/mst/mt4117_pciconf0         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:05:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                   Chip revision is: 00
/dev/mst/mt4117_pciconf1         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:06:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                   Chip revision is: 00

I want to remove their FlexBoot ROM so that it does not slow down booting:

flint -d /dev/mst/mt4117_pciconf0 --allow_rom_change drom
flint -d /dev/mst/mt4117_pciconf1 --allow_rom_change drom

This takes a while:

-I- Preparing to remove ROM ...
Removing ROM image    - OK  # this step takes about 1 minute
Restoring signature  - OK
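With more than two cards, typing one flint command per device gets tedious. The device paths can be pulled out of the mst status output instead. This is a minimal sketch: mst_devices is a hypothetical helper that keeps only the *_pciconfN entries, so a card that is also listed as *_pci_crN is not processed twice.

```shell
# mst_devices: read `mst status` text on stdin and print one
# /dev/mst/*_pciconfN device path per line.
mst_devices() {
    awk '$1 ~ /^\/dev\/mst\/.*_pciconf[0-9]+$/ { print $1 }'
}

# Dry run: print the flint command for each device instead of executing it.
# Remove the "echo" only after checking that the list is what you expect.
#
#   sudo mst status | mst_devices | while read -r dev; do
#       echo flint -d "$dev" --allow_rom_change drom
#   done
```

The loop is left commented out and in dry-run form on purpose, since removing the ROM modifies the card's flash.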

3.2 - cx4121a driver

Installing the driver for the Mellanox MCX4121A NIC

3.2.1 - Installing the driver on Debian 12

Installing the Mellanox MCX4121A driver on Debian 12

debian12

Downloading the driver

Download page: https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/

(screenshot: debian12-download)

The download is MLNX_OFED_LINUX-23.10-1.1.9.0-debian12.1-x86_64.tgz; copy it to the Debian 12 machine with scp:

su root
tar xvf MLNX_OFED_LINUX-23.10-1.1.9.0-debian12.1-x86_64.tgz
cd MLNX_OFED_LINUX-23.10-1.1.9.0-debian12.1-x86_64

# Set PATH first; with the default PATH some essential commands are not found and the install fails
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

./mlnxofedinstall --without-fw-update --with-nvmf --with-nfsrdma --ovs-dpdk --distro debian12.1

Note the --distro debian12.1 option; without it the installer may fail with:

Error: The current MLNX_OFED_LINUX is intended for debian12.1

That is because my Debian 12 was already at 12.4 when I installed it:

./mlnxofedinstall --print-distro
debian12.4
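Whether --distro is needed comes down to comparing the distro the package was built for with the one the installer detects. The sketch below derives the target distro from the tarball name and prints the flag only on a mismatch. Both function names are made up for illustration, and the file-name pattern MLNX_OFED_LINUX-<version>-<distro>-<arch>.tgz is an assumption based on the two file names that appear in these notes.

```shell
# ofed_target_distro TARBALL
# Extract the distro part from an MLNX_OFED tarball name (assumed pattern).
ofed_target_distro() {
    basename "$1" .tgz |
        sed -E 's/^MLNX_OFED_LINUX-[0-9.]+-[0-9.]+-(.+)-[^-]+$/\1/'
}

# distro_flag TARBALL DETECTED_DISTRO
# Print "--distro <target>" when the package target differs from the
# detected distro; print nothing when they match.
distro_flag() {
    target=$(ofed_target_distro "$1")
    if [ "$target" != "$2" ]; then
        echo "--distro $target"
    fi
}
```

Usage: distro_flag MLNX_OFED_LINUX-23.10-1.1.9.0-debian12.1-x86_64.tgz "$(./mlnxofedinstall --print-distro)". With the debian12.4 system above this prints --distro debian12.1.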

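The three options --with-nvmf, --with-nfsrdma and --ovs-dpdk are optional; I added them mainly to learn and test those features.

3.2.2 - PVE 8.1 driver

Installing the Mellanox MCX4121A driver on PVE 8.1

Installing the driver

PVE 8.1 is based on Debian 12, so the driver installation is very similar. Download the driver as before, then run: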

./mlnxofedinstall --without-fw-update --with-nvmf --with-nfsrdma --ovs-dpdk --distro debian12.1

......
Do you want to continue?[y/N]:y

Checking SW Requirements...
One or more required packages for installing MLNX_OFED_LINUX are missing.
Attempting to install the following missing packages:
pkg-config libnl-3-dev libgfortran5 flex m4 graphviz tcl ifupdown libltdl-dev uuid-runtime libnl-route-3-dev swig bison autoconf quilt gfortran lsb-base autotools-dev debhelper chrpath libipsec-mb1 automake tk
Failed command: apt-get install -y -o Dpkg::Options::='--force-confdef' -o Dpkg::Options::='--force-confold' pkg-config libnl-3-dev libgfortran5 flex m4 graphviz tcl ifupdown libltdl-dev uuid-runtime libnl-route-3-dev swig bison autoconf quilt gfortran lsb-base autotools-dev debhelper chrpath libipsec-mb1 automake tk
See /tmp/MLNX_OFED_LINUX.60098.logs/general.log# 

This fails. Following the hint, open /tmp/MLNX_OFED_LINUX.60098.logs/general.log

Run the apt install command on its own to see what happens:

apt-get install -y -o Dpkg::Options::='--force-confdef' -o Dpkg::Options::='--force-confold' pkg-config libnl-3-dev libgfortran5 flex m4 graphviz tcl ifupdown libltdl-dev uuid-runtime libnl-route-3-dev swig bison autoconf quilt gfortran lsb-base autotools-dev debhelper chrpath libipsec-mb1 automake tk

......
W: (pve-apt-hook) !! WARNING !!
W: (pve-apt-hook) You are attempting to remove the meta-package 'proxmox-ve'!
W: (pve-apt-hook) 
W: (pve-apt-hook) If you really want to permanently remove 'proxmox-ve' from your system, run the following command
W: (pve-apt-hook) 	touch '/please-remove-proxmox-ve'
W: (pve-apt-hook) run apt purge proxmox-ve to remove the meta-package
W: (pve-apt-hook) and repeat your apt invocation.

Following the hint, remove proxmox-ve manually:

$ touch '/please-remove-proxmox-ve'
$ apt purge proxmox-ve

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages will be REMOVED:
  proxmox-ve*
0 upgraded, 0 newly installed, 1 to remove and 0 not upgraded.
After this operation, 27.6 kB disk space will be freed.
Do you want to continue? [Y/n] y
W: (pve-apt-hook) '/please-remove-proxmox-ve' exists, proceeding with removal of package 'proxmox-ve'
(Reading database ... 121504 files and directories currently installed.)
Removing proxmox-ve (8.1.0) ...
(Reading database ... 121498 files and directories currently installed.)
Purging configuration files for proxmox-ve (8.1.0) ...

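Running the earlier apt install command again now succeeds.

Running ./mlnxofedinstall again asks for --force, so the final command is:

./mlnxofedinstall --without-fw-update --with-nvmf --with-nfsrdma --ovs-dpdk --distro debian12.1 --force

The driver installs successfully.

Unfortunately there is also this warning: NFSoRDMA is not supported on the 6.5.11-7-pve kernel.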

WARNING: NFSoRDMA is not supported over kernel 6.5.11-7-pve, will continue installation without it.

Failure

After a reboot, however, the driver turns out to be unusable:

➜  ~ lsmod | grep mlx
mlxdevm               184320  0
mlxfw                  36864  0
mlx_compat             20480  6 ib_ipoib,mlxdevm,ib_umad,ib_core,ib_uverbs,ib_cm
➜  ~ /etc/init.d/openibd

Usage: openibd {start|force-start|stop|force-stop|restart|force-restart|status}

➜  ~ /etc/init.d/openibd restart
Unloading HCA driver:                                      [  OK  ]
Failed loading kernel module mlx5_ib:                      [FAILED]
Loading Mellanox MLX5_IB HCA driver:                       [FAILED]
Failed loading kernel module mlx5_core:                    [FAILED]
Loading Mellanox MLX5 HCA driver:                          [FAILED]
Loading HCA driver and Access Layer:                       [FAILED]

Please run /usr/sbin/sysinfo-snapshot.py to collect the debug information
and open an issue in the http://support.mellanox.com/SupportWeb/service_center/SelfService

A quick search turned up this discussion:

https://forum.proxmox.com/threads/upgrade-7-to-8-connect-4-dkms-module-installed.139297/

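It looks like no extra driver is needed; the one built into the kernel is enough.

3.2.3 - Windows driver

Installing the Mellanox MCX4121A driver on Windows

Verified on Windows 10 and Windows Server 2022.

Installing on a physical machine

Download page: https://network.nvidia.com/products/adapter-software/ethernet/windows/winof-2/

Just run the installer.

Installing in a virtual machine

Installing the driver in a virtual machine that has only this one NIC is a chicken-and-egg problem: without a NIC driver there is no network to download the driver from, so the driver cannot be installed.

The solution is to prepare an ISO file containing the driver in advance and attach it to the virtual machine as a CD.

The following shows how to create the ISO file on each operating system.

Creating the ISO on macOS

Steps:

  • Put the driver files in a folder

  • Use the built-in Disk Utility to create an image from that folder; choose "DVD/CD master" as the image format, which produces a .cdr file

  • Convert the .cdr file to an .iso file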

    hdiutil makehybrid -iso -joliet -o yourname.iso yourname.cdr
    
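  • Upload the ISO file to PVE.

5 - Realtek 8125b 2.5G NIC

The Realtek 8125b is a widely used 2.5G NIC

5.1 - Installing the driver on Ubuntu 20.04

Installing the Realtek 8125b driver on Ubuntu 20.04

Problem

On Ubuntu 20.04 the Realtek 8125b NIC is not recognized.

lspci finds the NIC, but lsmod shows only the r8169 module, which reports an error at boot. Running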

dmesg | grep r8169

shows this log message:

r8169 0000:06:00.0: unknown chip XID 641

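This error means that the r8169 module cannot drive the Realtek 8125b NIC, so the dedicated Realtek 8125b driver has to be installed.

Installing the driver

Download page for the Realtek 8125b driver: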

https://www.realtek.com/zh-tw/component/zoo/category/network-interface-controllers-10-100-1000m-gigabit-ethernet-pci-express-software

Under "Unix (Linux)", find "2.5G Ethernet LINUX driver r8125 for kernel up to 5.19", download it, then unpack it and copy it to the Ubuntu 20.04 machine.

Then run

cd r8125-9.011.01
sudo ./autorun.sh

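Reboot once the driver is installed.

The driver disappearing

Occasionally, after some reboot, the installed r8125 driver is gone and r8169 is back. Since r8169 cannot drive the Realtek 8125b, the NIC becomes unusable.

This happens roughly every month or two. The fix is simple: install the r8125 driver again. If it happens while I am away from the machine, though, there is nothing I can do remotely.

A search shows that someone here ran into a similar problem: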

https://askubuntu.com/questions/1259947/cant-get-rtl8125b-working-on-20-04

After installing r8125, from time to time my PC automatically upgrades the driver to r8169, making internet inaccessible again, do you guys know how to disable this specific upgrade?

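Upgrading to a newer kernel, for example 5.10 or later, also solves the problem. Because of the HP544+ driver, however, some of my machines have to stay on the 5.4 kernel, so it keeps happening from time to time.

A similar problem is discussed here:

https://ubuntu-mate.community/t/realtek-rtl8125-2-5gbe-ethernet-not-working-on-amd-b550-mobo/22469/4

One suggestion there is that the driver has to be reinstalled after every kernel update.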

After each kernel update , you must re install ethernet driver .

So for removing :

sudo modprobe -rfv r8125

sudo dkms remove -m r8125 -v 9.003.05 --all

sudo rm -r /usr/src/r8125-9.003.05

then reinstall with the initial procedure seen before .

ls /usr/src/
kernel-mft-dkms-4.24.0   linux-headers-5.4.0-169-generic  linux-headers-5.4.0-170-generic
linux-headers-5.4.0-169  linux-headers-5.4.0-170
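If the driver is registered with DKMS (an assumption: a plain autorun.sh install may not do this, and the /usr/src listing above shows no r8125 source directory), it is possible to check before rebooting whether the module has been built for a given kernel. The sketch below parses dkms status output for that; dkms_built_for is a hypothetical helper, written to accept both the older "name, version, kernel" and the newer "name/version, kernel" output formats.

```shell
# dkms_built_for MODULE KERNEL
# Reads `dkms status` text on stdin; succeeds (exit 0) only if MODULE is
# reported as installed for KERNEL.
dkms_built_for() {
    module=$1
    kernel=$2
    grep -E "^$module[,/]" |      # lines for this module (old or new format)
        grep -F ", $kernel," |    # ...built for exactly this kernel
        grep -q "installed"       # ...and in the "installed" state
}
```

Usage: dkms status | dkms_built_for r8125 "$(uname -r)" || echo "r8125 is not built for the running kernel". Running the same check against a newly installed kernel version before rebooting gives a chance to reinstall the driver while the network is still up.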

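6 - HP544+ 40G/56G fiber NIC

The HP544+ is a very inexpensive 40G/56G fiber NIC

6.1 - About the HP544+

About the HP544+ NIC

6.1.1 - About RDMA

About RDMA

Underlying transport modes of RDMA: https://winddoing.github.io/post/53570e5e.html

6.2 - HP544+ firmware

Downloading, modifying and flashing the HP544+ firmware

6.2.1 - Inspecting the firmware

Installing and using the MFT tools to inspect the current firmware

Introduction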

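MFT stands for NVIDIA Firmware Tools (Mellanox Firmware Tools). It contains several tools; the ones used day to day are:

  1. mst

    This tool provides the following functions:

    • Start/stop the register access driver
    • List the available mst devices
  2. flint

    This tool burns a firmware binary image or an expansion ROM image to the flash of an NVIDIA network adapter, gateway or switch device. It can also query the burnt firmware image and binary image files.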

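Download

Download page: NVIDIA Firmware Tools (MFT)

On that page, find your platform under MFT Download Center; Windows and Linux are the usual ones.

Note: the HP544+ and other CX3 Pro NICs are past their maintenance period, and newer MFT versions no longer support them. After installing such a version, mst status cannot find the device: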

mst status
MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module is not loaded

PCI Devices:
------------

	No devices were found.

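For these older NICs, therefore, install an older version such as 4.24.

For newer NICs such as the CX4 / CX5, the latest version can be installed.

Installation

Installing on Windows

Run the downloaded WinMFT_x64_4_24_0_72.exe; the default installation path is C:\Program Files\Mellanox\WinMFT

Installing on Linux

Using Ubuntu as an example: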

$ wget --no-check-certificate https://www.mellanox.com/downloads/MFT/mft-4.24.0-72-x86_64-deb.tgz
$ tar xvf mft-4.24.0-72-x86_64-deb.tgz
$ cd mft-4.24.0-72-x86_64-deb
# Missing dependencies make the install fail; install these packages with apt first
$ sudo apt-get install gcc make dkms
$ sudo ./install.sh
-I- Removing mft external packages installed on the machine
-I- Installing package: /home/sky/hp544/mft-4.24.0-72-x86_64-deb/SDEBS/kernel-mft-dkms_4.24.0-72_all.deb
-I- Installing package: /home/sky/hp544/mft-4.24.0-72-x86_64-deb/DEBS/mft_4.24.0-72_amd64.deb
-I- In order to start mst, please run "mst start".

Reinstalling

On PVE, MFT worked right after installation but later suddenly stopped working, for reasons I do not know, with this error:

mst start  
Starting MST (Mellanox Software Tools) driver set
Loading MST PCI modulemodprobe: FATAL: Module mst_pci not found in directory /lib/modules/6.5.13-1-pve
 - Failure: 1
Loading MST PCI configuration modulemodprobe: FATAL: Module mst_pciconf not found in directory /lib/modules/6.5.13-1-pve
 - Failure: 1
Create devices

mst_pci driver not found
Unloading MST PCI module (unused)modprobe: FATAL: Module mst_pci not found.
 - Failure: 1
Unloading MST PCI configuration module (unused)modprobe: FATAL: Module mst_pciconf not found.
 - Failure: 1

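Reinstalling the MFT tools restores normal operation. The error shows that the mst_pci and mst_pciconf modules are missing for the running kernel (6.5.13-1-pve), so a kernel upgrade after the original installation is a likely cause.

Starting mst

On Linux, start mst with the following command (not needed on Windows):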

sudo mst start

Starting MST (Mellanox Software Tools) driver set
Loading MST PCI module - Success
Loading MST PCI configuration module - Success
Create devices

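Usage

All of the tools need elevated privileges: on Windows, open cmd with "Run as administrator"; on Linux, use sudo.

The mst tool

mst status shows the NICs currently present. On Linux the output looks like this: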

sudo mst status
MST modules:
------------
    MST PCI module loaded
    MST PCI configuration module loaded

MST devices:
------------
/dev/mst/mt4103_pciconf0         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:04:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                   Chip revision is: 00
/dev/mst/mt4103_pciconf1         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:14:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                   Chip revision is: 00
/dev/mst/mt4103_pciconf2         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:1d:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                   Chip revision is: 00
/dev/mst/mt4103_pci_cr0          - PCI direct access.
                                   domain:bus:dev.fn=0000:04:00.0 bar=0xfe700000 size=0x100000
                                   Chip revision is: 00
/dev/mst/mt4103_pci_cr1          - PCI direct access.
                                   domain:bus:dev.fn=0000:14:00.0 bar=0xfd900000 size=0x100000
                                   Chip revision is: 00
/dev/mst/mt4103_pci_cr2          - PCI direct access.
                                   domain:bus:dev.fn=0000:1d:00.0 bar=0xfd100000 size=0x100000
                                   Chip revision is: 00

Three cards are installed in this machine, hence the long output. With a single card it should look like this:

sudo mst status
MST modules:
------------
    MST PCI module loaded
    MST PCI configuration module loaded

MST devices:
------------
/dev/mst/mt4103_pciconf0         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:04:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                   Chip revision is: 00
/dev/mst/mt4103_pci_cr0          - PCI direct access.
                                   domain:bus:dev.fn=0000:04:00.0 

The Windows output is similar, except that in the device list "/dev/mst/mt4103_pci_cr0" becomes "mt4103_pci_cr0".

mst status

MST devices:
------------
  mt4103_pci_cr0
  mt4103_pciconf0
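
When several cards are scripted at once, the device names can be pulled out of the mst status text. The following is a minimal Python sketch, not part of MFT: the helper name list_mst_devices and the regular expression are assumptions based only on the output format shown above.

```python
import re

# Matches device names as they appear in the sample outputs above:
# "/dev/mst/mt4103_pci_cr0" on Linux, "mt4103_pci_cr0" on Windows.
_DEVICE_RE = re.compile(r"^[ \t]*((?:/dev/mst/)?mt\d+_\w+)", re.MULTILINE)


def list_mst_devices(status_text):
    """Return the mst device names found in `mst status` output (hypothetical helper)."""
    return _DEVICE_RE.findall(status_text)


linux_sample = """MST devices:
------------
/dev/mst/mt4103_pciconf0         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:04:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
/dev/mst/mt4103_pci_cr0          - PCI direct access.
"""
windows_sample = """MST devices:
------------
  mt4103_pci_cr0
  mt4103_pciconf0
"""

print(list_mst_devices(linux_sample))
print(list_mst_devices(windows_sample))
```

The captured names can be passed directly as the -d argument of the tools described below.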

The mlxfwmanager tool

mlxfwmanager queries the current firmware information:

sudo mlxfwmanager
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX3Pro
  Part Number:      764285-B21_Ax
  Description:      HP InfiniBand FDR/Ethernet 10Gb/40Gb 2-port 544+FLR-QSFP Adapter
  PSID:             HP_1380110017
  PCI Device Name:  /dev/mst/mt4103_pci_cr2
  Port1 MAC:        e0071b783ea1
  Port2 MAC:        e0071b783ea2
  Versions:         Current        Available     
     FW             2.42.5700      N/A           
     CLP            8025           N/A           
     PXE            3.4.0754       N/A           
     UEFI           14.11.0049     N/A           

  Status:           No matching image found

The line FW 2.42.5700 N/A shows the firmware version currently on the card. 5700 is the latest firmware release; older firmware shows something like FW 2.42.5016 N/A
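
To check from a script whether a card still runs older firmware, the FW line can be parsed and compared as a tuple of integers. This is a rough sketch that assumes the column layout shown above; current_fw_version is a hypothetical helper, not an MFT API.

```python
import re


def current_fw_version(output):
    """Return the 'Current' FW version from mlxfwmanager output as a tuple of ints, or None."""
    match = re.search(r"^[ \t]*FW[ \t]+(\d+)\.(\d+)\.(\d+)", output, re.MULTILINE)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


sample = """  Versions:         Current        Available
     FW             2.42.5700      N/A
     CLP            8025           N/A
"""

version = current_fw_version(sample)
print(version)
# Tuples compare element by element, so 2.42.5016 sorts before 2.42.5700.
print(version >= (2, 42, 5700))
```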

Note: on Linux, run it with sudo. Without sufficient privileges it reports that no devices were found, which is misleading:

mlxfwmanager  
-E- No devices found or specified, mst might be stopped, run 'mst start' to load MST modules
➜  mft-4.24.0-72-x86_64-deb sudo mlxfwmanager
Querying Mellanox devices firmware ...

Likewise, on Windows cmd must be opened with "Run as administrator".

mlxconfig

mlxconfig changes the NIC's settings. Common commands:

Querying settings

On Windows:

mlxconfig -d mt4103_pci_cr0 query

On Linux:

sudo mlxconfig -d /dev/mst/mt4103_pci_cr0 query

sudo mlxconfig -d /dev/mst/mt4103_pci_cr0 query

Device #1:
----------

Device type:    ConnectX3Pro    
Device:         /dev/mst/mt4103_pci_cr0

Configurations:                                      Next Boot
         SRIOV_EN                                    True(1)         
         NUM_OF_VFS                                  16              
         WOL_MAGIC_EN_P2                             True(1)         
         LINK_TYPE_P1                                ETH(2)          
         PHY_TYPE_P1                                 XFI(2)          
         XFI_MODE_P1                                 _10G(0)         
         FORCE_MODE_P1                               False(0)        
         LINK_TYPE_P2                                ETH(2)          
         PHY_TYPE_P2                                 XFI(2)          
         XFI_MODE_P2                                 _10G(0)         
         FORCE_MODE_P2                               False(0)        
         LOG_BAR_SIZE                                5               
         BOOT_PKEY_P1                                0               
         BOOT_PKEY_P2                                0               
         BOOT_OPTION_ROM_EN_P1                       True(1)         
         BOOT_VLAN_EN_P1                             False(0)        
         BOOT_RETRY_CNT_P1                           0               
         LEGACY_BOOT_PROTOCOL_P1                     PXE(1)          
         BOOT_VLAN_P1                                1               
         BOOT_OPTION_ROM_EN_P2                       True(1)         
         BOOT_VLAN_EN_P2                             False(0)        
         BOOT_RETRY_CNT_P2                           0               
         LEGACY_BOOT_PROTOCOL_P2                     PXE(1)          
         BOOT_VLAN_P2                                1               
         IP_VER_P1                                   IPv4(0)         
         IP_VER_P2                                   IPv4(0)         
         CQ_TIMESTAMP                                True(1)         
         STEER_FORCE_VLAN                            False(0)

Resetting settings

On Windows:

mlxconfig -d mt4103_pci_cr0 reset

On Linux:

sudo mlxconfig -d /dev/mst/mt4103_pci_cr0 reset

Switching between IB and Ethernet mode

On Windows (LINK_TYPE values: 1 = IB mode, 2 = ETH mode, 3 = VPI mode):

mlxconfig -d mt4103_pci_cr0 set LINK_TYPE_P1=2
mlxconfig -d mt4103_pci_cr0 set LINK_TYPE_P2=2

On Linux:

sudo mlxconfig -d /dev/mst/mt4103_pciconf0 set LINK_TYPE_P1=2
sudo mlxconfig -d /dev/mst/mt4103_pciconf0 set LINK_TYPE_P2=2
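
For repeated use, the numeric LINK_TYPE values can be hidden behind names. The sketch below only builds the argument lists for the mlxconfig commands shown above; it does not run them. The helper name and the LINK_TYPES table are illustrative assumptions, with the values 1/2/3 taken from the text.

```python
# LINK_TYPE values as listed above: 1 = IB, 2 = ETH, 3 = VPI.
LINK_TYPES = {"ib": 1, "eth": 2, "vpi": 3}


def link_type_commands(device, mode, ports=(1, 2)):
    """Build one `mlxconfig ... set LINK_TYPE_Pn=v` argument list per port (hypothetical helper)."""
    value = LINK_TYPES[mode]
    return [
        ["mlxconfig", "-d", device, "set", "LINK_TYPE_P%d=%d" % (port, value)]
        for port in ports
    ]


for argv in link_type_commands("/dev/mst/mt4103_pciconf0", "eth"):
    print(" ".join(argv))
```

On Linux each argument list would still need sudo in front when executed.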

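The flint tool

flint flashes the NIC's firmware; it is covered in detail in the following sections.

6.2.2 - Downloading the Firmware

How to download firmware for the HP544+ NIC

Official HPE download

Official HPE download page:

Product Detail - HPE FDR InfiniBand Adapters | HPE Support

Under Select Model, choose "HPE InfiniBand FDR/Ethernet 10Gb/40Gb 2-port 544+FLR-QSFP Adapter", then locate the firmware entry in the list that appears.

This is the latest firmware, version 5700.

Direct download link:

After extracting the archive, the file fw-ConnectX3Pro-rel-2_42_5700-764285-B21_Ax-CLP-8025-UEFI-14.11.49-FlexBoot-3.4.754.bin is ready to be used for flashing.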

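6.2.3 - Flashing the Firmware

How to flash the firmware of the HP544+ NIC

Preparation

First install the MFT tools as described earlier, and download the firmware file to be flashed.

Flashing the firmware

The flint tool in MFT flashes the NIC's firmware.

Note: elevated privileges are required. On Windows, open cmd with "Run as administrator"; on Linux, use sudo.

Flashing on Windows

On Windows, the firmware file (for example fw-ConnectX3Pro-rel-2_42_5700-764285-B21_Ax-CLP-8025-UEFI-14.11.49-FlexBoot-3.4.754.bin) can be placed in the MFT install directory, such as C:\Program Files\Mellanox\WinMFT

Run flint to flash the firmware: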

cd C:\Program Files\Mellanox\WinMFT

flint -d mt4103_pci_cr0 -i fw-ConnectX3Pro-rel-2_42_5700-764285-B21_Ax-CLP-8025-UEFI-14.11.49-FlexBoot-3.4.754.bin b

    Current FW version on flash:  2.42.5016
    New FW version:               2.42.5700

Burning FS2 FW image without signatures - OK
Restoring signature                     - OK

Run mlxfwmanager again to check the firmware information after flashing:

mlxfwmanager
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX3Pro
  Part Number:      764285-B21_Ax
  Description:      HP InfiniBand FDR/Ethernet 10Gb/40Gb 2-port 544+FLR-QSFP Adapter
  PSID:             HP_1380110017
  PCI Device Name:  mt4103_pci_cr0
  Port1 GUID:       24be05ffffbd0801
  Port2 GUID:       24be05ffffbd0802
  Versions:         Current        Available
     FW             2.42.5700      2.42.5700
     CLP            8025           8025
     FW (Running)   2.42.5016      N/A
     PXE            3.4.0754       3.4.0754
     UEFI           14.11.0049     14.11.0049

  Status:           Up to date

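The firmware on flash has been updated to 2.42.5700. The FW (Running) line still shows 2.42.5016, so the new image presumably takes effect only after a reboot.

Flashing on Linux

Linux is similar. Note that the argument to -d must look like "/dev/mst/mt4103_pci_cr0"; see the output of mst status for the exact name: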

sudo flint -d /dev/mst/mt4103_pci_cr0 -i fw-ConnectX3Pro-rel-2_42_5700-764285-B21_Ax-CLP-8025-UEFI-14.11.49-FlexBoot-3.4.754.bin b

Resetting settings

After flashing the firmware, resetting the NIC settings is recommended. On Windows:

mlxconfig -d mt4103_pci_cr0 reset

On Linux:

sudo mlxconfig -d /dev/mst/mt4103_pci_cr0 reset

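Switching to Ethernet mode

The HP544+ supports three modes: IB, ETH, and VPI. I normally use ETH only (I have not yet worked out how to set up a software router in IB mode), so it makes sense to pin the NIC to ETH mode.

On Windows (LINK_TYPE values: 1 = IB mode, 2 = ETH mode, 3 = VPI mode):

mlxconfig -d mt4103_pci_cr0 set LINK_TYPE_P1=2
mlxconfig -d mt4103_pci_cr0 set LINK_TYPE_P2=2

On Linux: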

sudo mlxconfig -d /dev/mst/mt4103_pciconf0 set LINK_TYPE_P1=2
sudo mlxconfig -d /dev/mst/mt4103_pciconf0 set LINK_TYPE_P2=2

Removing FlexBoot

FlexBoot enables booting from the NIC. I have no need for that at present, and FlexBoot slows down startup, so when it is not needed, removing it speeds up boot.

On Windows:

flint -d mt4103_pci_cr0 --allow_rom_change drom

On Linux:

sudo flint -d /dev/mst/mt4103_pciconf0 --allow_rom_change drom

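6.2.4 - Modifying the Stock Firmware

How to modify the stock firmware to enable features such as single-mode optical modules and 56G Ethernet

Notes

Original article

For the original article, see:

544+ flr 解锁56G直连 ("Unlocking 56G direct attach on the 544+ FLR", github.com)

Downloading the modified firmware

The modified firmware images can be downloaded here:

Link: https://pan.baidu.com/s/1uiebg1P-tTL1WIuxMgLblQ?pwd=tfgi

Brief notes:

  • powerlevel: supports single-mode optical modules (verified working, though I only tested 40G modules)
  • 56kr4: supports 56G Ethernet (verified working, though I only tested the OEM DAC cable)
  • drom: removes FlexBoot, so the NIC banner no longer appears on screen during startup, which speeds up boot (it should also stop adding boot entries in the BIOS; not yet verified)

If you try any of these firmware images, please leave a comment describing your results. Thanks!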


A copy of the original article is kept here in case it becomes unavailable:

Original article

Tools required

Preparation

Download all the required tools and install both NVIDIA Firmware Tools (MFT) and NVIDIA Firmware Tools (MFT) 4.3.0.25. Copy fw-ConnectX3Pro-rel.mlx and MCX354A-FCC_Ax.ini from ConnectX3Pro-rel-2_40_5030.tgz into the NVIDIA Firmware Tools (MFT) 4.3.0.25 install directory; mine is C:\Program Files\Mellanox\WinMFT_x64_4_3_0_25. Then make a copy of MCX354A-FCC_Ax.ini named MCX354A-FCC_Ax_56G.ini.

Next, copy fw-ConnectX3Pro-rel-2_42_5700-764285-B21_Ax-CLP-8025-UEFI-14.11.49-FlexBoot-3.4.754.bin from fw-ConnectX3Pro-rel-2_42_5700-764285-B21_Ax-CLP-8025-UEFI-14.11.49-FlexBoot-3.4.754.tgz, and fs2_update_ini.py from mft-scripts, into the NVIDIA Firmware Tools (MFT) install directory; mine is C:\Program Files\Mellanox\WinMFT

Generating the firmware

Edit the [IB] section of MCX354A-FCC_Ax_56G.ini. The original entries are:

port1_802_3ap_56kr4_ability = true
port2_802_3ap_56kr4_ability = true

port1_802_3ap_cr4_enable = true
port2_802_3ap_cr4_enable = true
port1_802_3ap_cr4_ability = true
port2_802_3ap_cr4_ability = true

port1_802_3ap_kr4_enable = true
port2_802_3ap_kr4_enable = true
port1_802_3ap_kr4_ability = true
port2_802_3ap_kr4_ability = true

Change them to:

port1_802_3ap_56kr4_enable = true
port2_802_3ap_56kr4_enable = true
port1_802_3ap_56kr4_ability = true
port2_802_3ap_56kr4_ability = true

port1_802_3ap_cr4_enable = true
port2_802_3ap_cr4_enable = true
port1_802_3ap_cr4_ability = true
port2_802_3ap_cr4_ability = true

port1_802_3ap_kr4_enable = true
port2_802_3ap_kr4_enable = true
port1_802_3ap_kr4_ability = true
port2_802_3ap_kr4_ability = true

Generate two firmware images, one with 56G enabled and one without:

C:\Program Files\Mellanox\WinMFT_x64_4_3_0_25> mlxburn -fw fw-ConnectX3Pro-rel.mlx -c MCX354A-FCC_Ax_56G.ini -wrimage MCX354A-FCC_Ax_56G.bin
-W- Removing parameter defined outside a group: "prepMLX version".
-I- Generating image ...
-I- Image generation completed successfully.
C:\Program Files\Mellanox\WinMFT_x64_4_3_0_25> mlxburn -fw fw-ConnectX3Pro-rel.mlx -c MCX354A-FCC_Ax.ini -wrimage MCX354A-FCC_Ax.bin
-W- Removing parameter defined outside a group: "prepMLX version".
-I- Generating image ...
-I- Image generation completed successfully.

Analyzing the firmware

Comparing the two images with UltraCompare shows four changed regions.

The first is in the header:

MCX354A-FCC_Ax_56G.bin MCX354A-FCC_Ax.bin
00000020h: 00 00 68 E6 00 00 00 04 F5 00 00 0B FD 00 3B C8 ; 00000020h: 00 00 32 AE 00 00 00 04 F5 00 00 0B FD 00 3B C8 ;
00000030h: 00 0A 99 48 00 00 3B 84 00 10 00 40 00 00 01 ; 00000030h: 00 0A 99 44 00 00 3B 84 00 10 00 40 00 00 01 85 ;

The second is toward the end of the file:

MCX354A-FCC_Ax_56G.bin MCX354A-FCC_Ax.bin
000a7bb0h: 1F 83 F9 00 7F 8F FF 20 00 01 F9 A0 00 8F F0 02 ; 000a7bb0h: 1F 03 F9 00 7F 8F FF 20 00 01 F9 A0 00 8F F0 02 ;
000a7bc0h: 03 8F F0 17 00 01 F9 A4 00 40 00 01 00 D3 01 FF ; 000a7bc0h: 03 8F F0 17 00 01 F9 A4 00 40 00 01 00 D3 01 FF ;
000a7bd0h: 00 01 F9 AC 1F 83 F9 00 7F 8F FF 20 00 01 F9 B0 ; 000a7bd0h: 00 01 F9 AC 1F 03 F9 00 7F 8F FF 20 00 01 F9 B0 ;

The third is just before the end of the file:

MCX354A-FCC_Ax_56G.bin MCX354A-FCC_Ax.bin
000a8fe0h: 00 00 96 1F 00 00 00 03 00 00 00 18 00 00 00 00 ; 000a8fe0h: 00 00 22 2D 00 00 00 03 00 00 00 18 00 00 00 ;

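The fourth covers the entire tail of the file.

The fourth region has many changes; it is in fact the compressed ini data. The second region changes two locations. Given the two added settings

port1_802_3ap_56kr4_enable = true
port2_802_3ap_56kr4_enable = true

these are presumably the settings words for the two ports, while the first and third regions are checksums.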
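
The comparison itself does not need a GUI tool. As a rough stand-in for what UltraCompare reports, the sketch below lists the byte ranges in which two images differ; diff_ranges is a hypothetical helper, and the demo bytes are the first eight bytes of the second changed region shown above.

```python
def diff_ranges(a, b):
    """Return [(start, end), ...] half-open byte ranges where a and b differ."""
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # a new run of differing bytes begins here
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        # Any extra tail in the longer input counts as one more differing range.
        ranges.append((common, max(len(a), len(b))))
    return ranges


with_56g = bytes.fromhex("1F83F9007F8FFF20")
without_56g = bytes.fromhex("1F03F9007F8FFF20")
print(diff_ranges(with_56g, without_56g))
```

For real images, read both files in binary mode and pass the contents to diff_ranges.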

Continuing with fw-ConnectX3Pro-rel.mlx, look up the entries related to port1_802_3ap_56kr4_enable:

scratchpad.eth.port[0].mode_40g_is_50g 0x1f99c.5 1
scratchpad.eth.port[0].b0_hw_eye_opener_cfg_measure_time 0x1f99c.8 4
scratchpad.eth.port[0].eth_802_3ap_56kr4_ability 0x1f99c.12 1
scratchpad.eth.port[0].eth_802_3ap_cr4_ability 0x1f99c.13 1
scratchpad.eth.port[0].eth_802_3ap_kr4_ability 0x1f99c.14 1
scratchpad.eth.port[0].eth_802_3ap_kr_ability 0x1f99c.15 1
scratchpad.eth.port[0].eth_802_3ap_kx_ability 0x1f99c.16 1
scratchpad.eth.port[0].eth_802_3ap_kx4_ability 0x1f99c.17 1
scratchpad.eth.port[0].eth_802_3ap_kr2_ability 0x1f99c.18 1
scratchpad.eth.port[0].eth_802_3ap_100M_ability 0x1f99c.19 1
scratchpad.eth.port[0].eth_802_3ap_56kr4_enable 0x1f99c.23 1
scratchpad.eth.port[0].eth_802_3ap_cr4_enable 0x1f99c.24 1
scratchpad.eth.port[0].eth_802_3ap_kr4_enable 0x1f99c.25 1
scratchpad.eth.port[0].eth_802_3ap_kr_enable 0x1f99c.26 1
scratchpad.eth.port[0].eth_802_3ap_kx_enable 0x1f99c.27 1
scratchpad.eth.port[0].eth_802_3ap_kx4_enable 0x1f99c.28 1
scratchpad.eth.port[0].eth_802_3ap_kr2_enable 0x1f99c.29 1
scratchpad.eth.port[0].eth_802_3ap_100M_enable 0x1f99c.30 1

This puts the option at 0x1f99c.23. Converting the second changed region to binary for comparison:

1F83 F900

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1 0 0 1 0 0 0 0 0 0 0 0

1F03 F900

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 1 0 0 0 0 0 0 0 0

Clearly bit 23 is port1_802_3ap_56kr4_enable, so changing this bit alone is enough to unlock 56G without going through mlxburn.
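
The same conclusion falls out of an XOR of the two words: only the bits that differ survive. A minimal sketch, with bit 0 taken as the least significant bit to match the tables above (changed_bits is an illustrative name):

```python
def changed_bits(word_a, word_b):
    """Return the positions (0 = least significant) of the bits that differ in two 32-bit words."""
    diff = word_a ^ word_b
    return [bit for bit in range(32) if (diff >> bit) & 1]


print(changed_bits(0x1F83F900, 0x1F03F900))
```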

Building the firmware

First, in the NVIDIA Firmware Tools (MFT) directory, extract a copy of the configuration file:

C:\Program Files\Mellanox\WinMFT> flint -i fw-ConnectX3Pro-rel-2_42_5700-764285-B21_Ax-CLP-8025-UEFI-14.11.49-FlexBoot-3.4.754.bin dc HP_1380110017.ini

Go to the [IB] section:

;;speed flags for port0
cx3_spec1_3_ib_support_port0 = 1
cx3_spec1_2_ib_support_port0 = 1
spec1_3_fdr10_ib_support_port0 = 1
spec1_3_fdr14_ib_support_port0 = 1
port1_802_3ap_56kr4_ability = 1
port1_802_3ap_cr4_ability = 1
port1_802_3ap_cr4_enable  = 1

The following three entries are missing:

port1_802_3ap_56kr4_enable = true
port1_802_3ap_kr4_enable = true
port1_802_3ap_kr4_ability = true

Per the analysis above, the missing entries correspond to

scratchpad.eth.port[0].eth_802_3ap_kr4_ability 0x1f99c.14 1
scratchpad.eth.port[0].eth_802_3ap_56kr4_enable 0x1f99c.23 1
scratchpad.eth.port[0].eth_802_3ap_kr4_enable 0x1f99c.25 1

so the settings word should currently be

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 0 1 0 0 0 0 0 0 1 1 1 0 1 1 1 0 0 1 0 0 0 0 0 0 0 0
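
The bit pattern can be cross-checked in code by decoding it against the register map quoted from fw-ConnectX3Pro-rel.mlx. This sketch assumes that an entry such as 0x1f99c.8 4 means "bit offset 8, width 4" with bit 0 as the least significant bit; the single-bit flags are consistent with the tables above, but the multi-bit reading is an assumption. FIELDS, decode, and set_flags are illustrative names.

```python
# (field name, bit offset, width) for register 0x1f99c, copied from the entries quoted earlier.
FIELDS = [
    ("mode_40g_is_50g", 5, 1),
    ("b0_hw_eye_opener_cfg_measure_time", 8, 4),
    ("eth_802_3ap_56kr4_ability", 12, 1),
    ("eth_802_3ap_cr4_ability", 13, 1),
    ("eth_802_3ap_kr4_ability", 14, 1),
    ("eth_802_3ap_kr_ability", 15, 1),
    ("eth_802_3ap_kx_ability", 16, 1),
    ("eth_802_3ap_kx4_ability", 17, 1),
    ("eth_802_3ap_kr2_ability", 18, 1),
    ("eth_802_3ap_100M_ability", 19, 1),
    ("eth_802_3ap_56kr4_enable", 23, 1),
    ("eth_802_3ap_cr4_enable", 24, 1),
    ("eth_802_3ap_kr4_enable", 25, 1),
    ("eth_802_3ap_kr_enable", 26, 1),
    ("eth_802_3ap_kx_enable", 27, 1),
    ("eth_802_3ap_kx4_enable", 28, 1),
    ("eth_802_3ap_kr2_enable", 29, 1),
    ("eth_802_3ap_100M_enable", 30, 1),
]
OFFSETS = {name: offset for name, offset, _width in FIELDS}


def decode(word):
    """Split a 32-bit settings word into its named fields."""
    return {name: (word >> offset) & ((1 << width) - 1) for name, offset, width in FIELDS}


def set_flags(word, names):
    """Set the named single-bit flags in word and return the result."""
    for name in names:
        word |= 1 << OFFSETS[name]
    return word


stock = 0x1D03B900  # the bit pattern shown above
missing = ["eth_802_3ap_kr4_ability", "eth_802_3ap_56kr4_enable", "eth_802_3ap_kr4_enable"]
print({name: decode(stock)[name] for name in missing})
print(hex(set_flags(stock, missing)))
```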

In hexadecimal this is 1D03 B900. Search for 1D03 B900 with UltraEdit:

fw-ConnectX3Pro-rel-2_42_5700-764285-B21_Ax-CLP-8025-UEFI-14.11.49-FlexBoot-3.4.754.bin
000ed760h: 00 01 F9 9C 1D 03 B9 00 7F 8F FF 20 00 01 F9 A0 ;
000ed770h: 00 8F F0 02 03 8F F0 17 00 01 F9 A4 00 40 00 01 ;
000ed780h: 00 D3 01 FF 00 01 F9 AC 1D 03 B9 00 7F 8F FF 20 ;

This confirms the analysis. Change the settings word to

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1 0 0 1 0 0 0 0 0 0 0 0

In hexadecimal this is 1F83 F900, matching MCX354A-FCC_Ax_56G.bin. Use UltraEdit to make this change in fw-ConnectX3Pro-rel-2_42_5700-764285-B21_Ax-CLP-8025-UEFI-14.11.49-FlexBoot-3.4.754.bin and save the result as fw-ConnectX3Pro-rel-2_42_5700-764285-B21_Ax-CLP-8025-UEFI-14.11.49-FlexBoot-3.4.754_56G.bin

fw-ConnectX3Pro-rel-2_42_5700-764285-B21_Ax-CLP-8025-UEFI-14.11.49-FlexBoot-3.4.754_56G.bin
000ed760h: 00 01 F9 9C 1F 83 F9 00 7F 8F FF 20 00 01 F9 A0 ;
000ed770h: 00 8F F0 02 03 8F F0 17 00 01 F9 A4 00 40 00 01 ;
000ed780h: 00 D3 01 FF 00 01 F9 AC 1F 83 F9 00 7F 8F FF 20 ;
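
The hex edit can also be scripted. The sketch below is not the original author's method: it replaces the settings word only where it directly follows one of the two register addresses visible in the dump, and refuses to proceed unless each pattern occurs exactly once. Treating 0x1F9AC as the second port's register is inferred from the dump; patch_settings_words is a hypothetical helper.

```python
OLD_WORD = bytes.fromhex("1D03B900")  # stock settings word
NEW_WORD = bytes.fromhex("1F83F900")  # word with kr4_ability, 56kr4_enable and kr4_enable set
# Register addresses that precede the settings word in the hex dump above.
REGISTER_ADDRS = (bytes.fromhex("0001F99C"), bytes.fromhex("0001F9AC"))


def patch_settings_words(image):
    """Return a copy of image with both settings words replaced."""
    patched = bytearray(image)
    for addr in REGISTER_ADDRS:
        needle = addr + OLD_WORD
        pos = patched.find(needle)
        if pos < 0 or patched.find(needle, pos + 1) >= 0:
            raise ValueError("expected exactly one match for " + needle.hex())
        patched[pos + 4:pos + 8] = NEW_WORD  # overwrite only the 4-byte word
    return bytes(patched)


# The three rows at 000ed760h..000ed780h from the stock image.
before = bytes.fromhex(
    "0001F99C1D03B9007F8FFF200001F9A0"
    "008FF002038FF0170001F9A400400001"
    "00D301FF0001F9AC1D03B9007F8FFF20"
)
after = patch_settings_words(before)
print(after.hex(" ").upper())
```

Run against the whole image, the function would be applied to the bytes read from the .bin file and the result written to the _56G copy.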

The image cannot be flashed yet because, as noted above, there are also checksums.

Verifying the firmware

C:\Program Files\Mellanox\WinMFT> flint -i fw-ConnectX3Pro-rel-2_42_5700-764285-B21_Ax-CLP-8025-UEFI-14.11.49-FlexBoot-3.4.754_56G.bin v

     FS2 failsafe image. Start address: 0x0. Chunk size 0x80000:

     NOTE: The addresses below are contiguous logical addresses. Physical addresses on
           flash may be different, based on the image start address and chunk size

     /0x00000038-0x0000065b (0x000624)/ (BOOT2) - OK
     /0x0000065c-0x0000297f (0x002324)/ (BOOT2) - OK
     /0x00002980-0x00003923 (0x000fa4)/ (Configuration) - OK
     /0x00003924-0x00047f5f (0x04463c)/ (ROM) - OK
     /0x00047f60-0x00047fa3 (0x000044)/ (GUID) - OK
     /0x00047fa4-0x0004812f (0x00018c)/ (Image Info) - OK
     /0x00048130-0x00055513 (0x00d3e4)/ (DDR) - OK
     /0x00055514-0x00056577 (0x001064)/ (DDR) - OK
     /0x00056578-0x00056967 (0x0003f0)/ (DDR) - OK
     /0x00056968-0x00094fab (0x03e644)/ (DDR) - OK
     /0x00094fac-0x00099e2f (0x004e84)/ (DDR) - OK
     /0x00099e30-0x0009e423 (0x0045f4)/ (DDR) - OK
     /0x0009e424-0x0009ef1b (0x000af8)/ (DDR) - OK
     /0x0009ef1c-0x000cf0ef (0x0301d4)/ (DDR) - OK
     /0x000cf0f0-0x000d2c9b (0x003bac)/ (DDR) - OK
     /0x000d2c9c-0x000e820f (0x015574)/ (DDR) - OK
     /0x000e8210-0x000e8317 (0x000108)/ (DDR) - OK
     /0x000e8318-0x000ed39b (0x005084)/ (DDR) - OK
     /0x000ed39c-0x000eeb97 (0x0017fc)/ (Configuration) /0x000ed39c/ - wrong CRC (exp:0x8e4c, act:0x9008)
-E- FW image verification failed: Bad CRC.. AN HCA DEVICE CAN NOT BOOT FROM THIS IMAGE.

Use UltraEdit to go to 0x000eeb96:

fw-ConnectX3Pro-rel-2_42_5700-764285-B21_Ax-CLP-8025-UEFI-14.11.49-FlexBoot-3.4.754_56G.bin
000eeb90h: 00 00 00 7F 00 00 90 08 00 00 00 03 00 00 00 18 ;

Following the hint from flint, change it to

fw-ConnectX3Pro-rel-2_42_5700-764285-B21_Ax-CLP-8025-UEFI-14.11.49-FlexBoot-3.4.754_56G.bin
000eeb90h: 00 00 00 7F 00 00 8E 4C 00 00 00 03 00 00 00 18 ;
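
This step can be scripted too, by reading the expected CRC out of the flint message. The sketch assumes, as in the walkthrough above, that the 16-bit CRC is stored big-endian in the last two bytes of the reported section and that the logical addresses printed by flint coincide with file offsets for this image. apply_crc_fix is a hypothetical helper, not an MFT function.

```python
import re

_CRC_RE = re.compile(
    r"/0x([0-9a-fA-F]+)-0x([0-9a-fA-F]+) .*wrong CRC "
    r"\(exp:0x([0-9a-fA-F]+), act:0x([0-9a-fA-F]+)\)"
)


def apply_crc_fix(image, flint_line):
    """Overwrite the stored CRC of the section named in flint_line; return its offset."""
    match = _CRC_RE.search(flint_line)
    if match is None:
        raise ValueError("no CRC mismatch found in line")
    _start, end, expected, actual = (int(group, 16) for group in match.groups())
    crc_offset = end - 1  # CRC occupies the last two bytes of the section
    stored = int.from_bytes(image[crc_offset:crc_offset + 2], "big")
    if stored != actual:
        raise ValueError("stored CRC does not match the value flint reported")
    image[crc_offset:crc_offset + 2] = expected.to_bytes(2, "big")
    return crc_offset


line = ("/0x000ed39c-0x000eeb97 (0x0017fc)/ (Configuration) /0x000ed39c/ "
        "- wrong CRC (exp:0x8e4c, act:0x9008)")
# Zero-filled stand-in for the image, with only the row at 000eeb90h filled in.
image = bytearray(0xEEBA0)
image[0xEEB90:0xEEBA0] = bytes.fromhex("0000007F00009008" "0000000300000018")

offset = apply_crc_fix(image, line)
print(hex(offset), image[0xEEB90:0xEEBA0].hex(" ").upper())
```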

Now the ini file can be updated as well. Copy HP_1380110017.ini to HP_1380110017_56G.ini and edit the [IB] section. The original entries are:

;;speed flags for port0
cx3_spec1_3_ib_support_port0 = 1
cx3_spec1_2_ib_support_port0 = 1
spec1_3_fdr10_ib_support_port0 = 1
spec1_3_fdr14_ib_support_port0 = 1
port1_802_3ap_56kr4_ability = 1
port1_802_3ap_cr4_ability = 1
port1_802_3ap_cr4_enable  = 1

;;speed flags for port1
cx3_spec1_3_ib_support_port1 = 1
cx3_spec1_2_ib_support_port1 = 1
spec1_3_fdr10_ib_support_port1 = 1
spec1_3_fdr14_ib_support_port1 = 1
port2_802_3ap_56kr4_ability = 1
port2_802_3ap_cr4_ability = 1
port2_802_3ap_cr4_enable  = 1

Change them to:

;;speed flags for port0
cx3_spec1_3_ib_support_port0 = 1
cx3_spec1_2_ib_support_port0 = 1
spec1_3_fdr10_ib_support_port0 = 1
spec1_3_fdr14_ib_support_port0 = 1
port1_802_3ap_56kr4_ability = 1
port1_802_3ap_56kr4_enable = 1
port1_802_3ap_cr4_ability = 1
port1_802_3ap_cr4_enable  = 1
port1_802_3ap_kr4_ability = 1
port1_802_3ap_kr4_enable = 1

;;speed flags for port1
cx3_spec1_3_ib_support_port1 = 1
cx3_spec1_2_ib_support_port1 = 1
spec1_3_fdr10_ib_support_port1 = 1
spec1_3_fdr14_ib_support_port1 = 1
port2_802_3ap_56kr4_ability = 1
port2_802_3ap_56kr4_enable = 1
port2_802_3ap_cr4_ability = 1
port2_802_3ap_cr4_enable  = 1
port2_802_3ap_kr4_ability = 1
port2_802_3ap_kr4_enable = 1

Replace the ini file inside the firmware image:

C:\Program Files\Mellanox\WinMFT> python3 fs2_update_ini.py fw-ConnectX3Pro-rel-2_42_5700-764285-B21_Ax-CLP-8025-UEFI-14.11.49-FlexBoot-3.4.754_56G.bin HP_1380110017_56G.ini

Verify the firmware again:

C:\Program Files\Mellanox\WinMFT> flint -i fw-ConnectX3Pro-rel-2_42_5700-764285-B21_Ax-CLP-8025-UEFI-14.11.49-FlexBoot-3.4.754_56G.bin v

     FS2 failsafe image. Start address: 0x0. Chunk size 0x80000:

     NOTE: The addresses below are contiguous logical addresses. Physical addresses on
           flash may be different, based on the image start address and chunk size

     /0x00000038-0x0000065b (0x000624)/ (BOOT2) - OK
     /0x0000065c-0x0000297f (0x002324)/ (BOOT2) - OK
     /0x00002980-0x00003923 (0x000fa4)/ (Configuration) - OK
     /0x00003924-0x00047f5f (0x04463c)/ (ROM) - OK
     /0x00047f60-0x00047fa3 (0x000044)/ (GUID) - OK
     /0x00047fa4-0x0004812f (0x00018c)/ (Image Info) - OK
     /0x00048130-0x00055513 (0x00d3e4)/ (DDR) - OK
     /0x00055514-0x00056577 (0x001064)/ (DDR) - OK
     /0x00056578-0x00056967 (0x0003f0)/ (DDR) - OK
     /0x00056968-0x00094fab (0x03e644)/ (DDR) - OK
     /0x00094fac-0x00099e2f (0x004e84)/ (DDR) - OK
     /0x00099e30-0x0009e423 (0x0045f4)/ (DDR) - OK
     /0x0009e424-0x0009ef1b (0x000af8)/ (DDR) - OK
     /0x0009ef1c-0x000cf0ef (0x0301d4)/ (DDR) - OK
     /0x000cf0f0-0x000d2c9b (0x003bac)/ (DDR) - OK
     /0x000d2c9c-0x000e820f (0x015574)/ (DDR) - OK
     /0x000e8210-0x000e8317 (0x000108)/ (DDR) - OK
     /0x000e8318-0x000ed39b (0x005084)/ (DDR) - OK
     /0x000ed39c-0x000eeb97 (0x0017fc)/ (Configuration) - OK
     /0x000eeb98-0x000eec0b (0x000074)/ (Jump addresses) - OK
     /0x000eec0c-0x000ef7d7 (0x000bcc)/ (FW Configuration) - OK
     /0x00000000-0x000ef7d7 (0x0ef7d8)/ (Full Image) - OK

-I- FW image verification succeeded. Image is bootable.

All sections pass.

Flashing the firmware

Get the NIC device name:

C:\Program Files\Mellanox\WinMFT> mst status -v
MST devices:
------------
  mt4103_pci_cr0         bus:dev.fn=2d:00.0
  mt4103_pciconf0        bus:dev.fn=2d:00.0

Flash the modified firmware:

C:\Program Files\Mellanox\WinMFT> flint -d mt4103_pci_cr0 -i fw-ConnectX3Pro-rel-2_42_5700-764285-B21_Ax-CLP-8025-UEFI-14.11.49-FlexBoot-3.4.754_56G.bin b

    Current FW version on flash:  2.42.5700
    New FW version:               2.42.5700

    Note: The new FW version is the same as the current FW version on flash.

 Do you want to continue ? (y/n) [n] : y

Burning FS2 FW image without signatures - OK
Restoring signature                     - OK

(Optional) Remove FlexBoot

C:\Program Files\Mellanox\WinMFT> flint -d mt4103_pci_cr0 --allow_rom_change drom

-I- Preparing to remove ROM ...
Removing ROM image    - OK
Restoring signature  - OK

Reset the NIC

C:\Program Files\Mellanox\WinMFT> mlxconfig -d mt4103_pci_cr0 reset

 Reset configuration for device mt4103_pci_cr0? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.

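Reboot, and 56G is enabled.

References

6.3 - HP544+ NIC Drivers

How to download and install drivers for the HP544+ NIC

6.3.1 - Windows Driver

Downloading and installing the Windows driver for the HP544+ NIC

Download

Download page:

Mellanox OFED for Windows - WinOF / WinOF-2 (nvidia.com)

According to the note on that page: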

For ConnectX-3 and ConnectX-3 Pro drivers download WinOF.

For ConnectX-4 and onwards adapter cards drivers download WinOF-2.

The HP544+ is a ConnectX-3 Pro card, so WinOF is the only driver that applies.

The Windows driver currently comes in two versions, 53000 and 54000. 54000 is for Windows Server 2019 only; use 53000 on other operating systems.

WinOF v5.50.54000 includes a driver for Windows Server 2019 only. For other OSes, please see WinOF v5.50.53000.

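Download link:

Installation

Installing on Windows 10

Run the downloaded MLNX_VPI_WinOF-5_50_53000_All_Win2019_x64.exe.

6.3.2 - Linux Driver

Downloading and installing the Linux driver for the HP544+ NIC

6.3.2.1 - Downloading the Linux Driver

Downloading the Linux driver for the HP544+ NIC

Downloading the driver

Download page:

Linux InfiniBand Drivers (nvidia.com)

According to the note on that page: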

Note: MLNX_OFED 4.9-x LTS should be used by customers who would like to utilize one of the following:

  • NVIDIA ConnectX-3 Pro
  • NVIDIA ConnectX-3
  • NVIDIA Connect-IB
  • RDMA experimental verbs library (mlnx_lib)
  • OSs based on kernel version lower than 3.10

Note: All of the above are not available on MLNX_OFED 5.x branch.

The HP544+ is a ConnectX-3 Pro card, so the 4.9 driver is the only one that applies.

Download link:

6.3.2.2 - Installing the Driver on Ubuntu 20.04

Installing the HP544+ NIC driver on Ubuntu 20.04

Preparation

Check the current default driver:

$ sudo su
$ modinfo mlx4_core | grep version
version:        4.0-0
srcversion:     CD88194143D98D15E719CD7
vermagic:       5.4.0-94-generic SMP mod_unload modversions

$ modinfo mlx4_core | grep ^version:|sed 's/version: * //g'
4.0-0

Before starting, it is best to have the NIC's cable connected; otherwise ifconfig does not show the NIC unless the -a option is given:

$ ifconfig -a

ens1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.20  netmask 255.255.255.0  broadcast 10.0.0.255
        inet6 fe80::4a0f:cfff:fef7:89c1  prefixlen 64  scopeid 0x20<link>
        ether 48:0f:cf:f7:89:c1  txqueuelen 1000  (Ethernet)
        RX packets 2196  bytes 166931 (166.9 KB)
        RX errors 0  dropped 1577  overruns 0  frame 0
        TX packets 418  bytes 48817 (48.8 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

$ ethtool -i ens1
driver: mlx4_en
version: 4.0-0
firmware-version: 2.42.5700
expansion-rom-version: 
bus-info: 0000:b3:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

Check the device information of the HP544+ NIC:

$ lspci -vvv
$ lspci | grep Mellanox
b3:00.0 Network controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

$ lspci -vv -s b3:00.0 | grep "Part number" -A 3
			[PN] Part number: 764285-B21
			[EC] Engineering changes: A3
			[SN] Serial number: IL254902M0
			[V0] Vendor specific: Alom FDR x8 13W

Updating the driver

Copy the file over from another machine:

scp ./MLNX_OFED_LINUX-4.9-6.0.6.0-ubuntu20.04-x86_64.tgz sky@192.168.0.10:/home/sky

Extract it:

tar -xvf MLNX_OFED_LINUX-4.9-6.0.6.0-ubuntu20.04-x86_64.tgz
cd MLNX_OFED_LINUX-4.9-6.0.6.0-ubuntu20.04-x86_64

Run the installer:

$ sudo ./mlnxofedinstall --without-fw-update

Logs dir: /tmp/MLNX_OFED_LINUX.18504.logs
General log file: /tmp/MLNX_OFED_LINUX.18504.logs/general.log

Below is the list of MLNX_OFED_LINUX packages that you have chosen
(some may have been added by the installer due to package dependencies):

ofed-scripts
mlnx-ofed-kernel-utils
mlnx-ofed-kernel-dkms
rshim-dkms
iser-dkms
isert-dkms
srp-dkms
libibverbs1
ibverbs-utils
libibverbs-dev
libibverbs1-dbg
libmlx4-1
libmlx4-dev
libmlx4-1-dbg
libmlx5-1
libmlx5-dev
libmlx5-1-dbg
libibumad
libibumad-static
libibumad-devel
ibacm
ibacm-dev
librdmacm1
librdmacm-utils
librdmacm-dev
mstflint
ibdump
libibmad
libibmad-static
libibmad-devel
libopensm
opensm
opensm-doc
libopensm-devel
infiniband-diags
infiniband-diags-compat
mft
kernel-mft-dkms
libibcm1
libibcm-dev
perftest
ibutils2
libibdm1
ibutils
ar-mgr
dump-pr
ibsim
ibsim-doc
ucx
sharp
hcoll
knem-dkms
knem
openmpi
mpitests
libdapl2
dapl2-utils
libdapl-dev
srptools
mlnx-ethtool
mlnx-iproute2

This program will install the MLNX_OFED_LINUX package on your machine.
Note that all other Mellanox, OEM, OFED, RDMA or Distribution IB packages will be removed.
Those packages are removed due to conflicts with MLNX_OFED_LINUX, do not reinstall them.

Do you want to continue?[y/N]:y

Checking SW Requirements...
One or more required packages for installing MLNX_OFED_LINUX are missing.
Attempting to install the following missing packages:
quilt automake dpatch libgfortran4 flex dkms gcc make autoconf chrpath tcl gfortran swig autotools-dev graphviz tk m4 libnl-route-3-200 libltdl-dev debhelper bison pkg-config
Removing old packages...
Installing new packages
Installing ofed-scripts-4.9...
Installing mlnx-ofed-kernel-utils-4.9...
Installing mlnx-ofed-kernel-dkms-4.9...

Installing rshim-dkms-1.18...
Installing iser-dkms-4.9...
Installing isert-dkms-4.9...
Installing srp-dkms-4.9...
Installing libibverbs1-41mlnx1...
Installing ibverbs-utils-41mlnx1...
Installing libibverbs-dev-41mlnx1...
Installing libibverbs1-dbg-41mlnx1...
Installing libmlx4-1-41mlnx1...
Installing libmlx4-dev-41mlnx1...
Installing libmlx4-1-dbg-41mlnx1...
Installing libmlx5-1-41mlnx1...
Installing libmlx5-dev-41mlnx1...
Installing libmlx5-1-dbg-41mlnx1...
Installing libibumad-43.1.1.MLNX20200211.078947f...
Installing libibumad-static-43.1.1.MLNX20200211.078947f...
Installing libibumad-devel-43.1.1.MLNX20200211.078947f...
Installing ibacm-41mlnx1...
Installing ibacm-dev-41mlnx1...
Installing librdmacm1-41mlnx1...
Installing librdmacm-utils-41mlnx1...
Installing librdmacm-dev-41mlnx1...
Installing mstflint-4.14.0...
Installing ibdump-6.0.0...
Installing libibmad-5.4.0.MLNX20190423.1d917ae...
Installing libibmad-static-5.4.0.MLNX20190423.1d917ae...
Installing libibmad-devel-5.4.0.MLNX20190423.1d917ae...
Installing libopensm-5.7.2.MLNX20201014.9378048...
Installing opensm-5.7.2.MLNX20201014.9378048...
Installing opensm-doc-5.7.2.MLNX20201014.9378048...
Installing libopensm-devel-5.7.2.MLNX20201014.9378048...
Installing infiniband-diags-5.6.0.MLNX20200211.354e4b7...
Installing infiniband-diags-compat-5.6.0.MLNX20200211.354e4b7...
Installing mft-4.15.1...
Installing kernel-mft-dkms-4.15.1...
Installing libibcm1-41mlnx1...
Installing libibcm-dev-41mlnx1...
Installing perftest-4.5.0.mlnxlibs...
Installing ibutils2-2.1.1...
Installing libibdm1-1.5.7.1...
Installing ibutils-1.5.7.1...
Installing ar-mgr-1.0...
Installing dump-pr-1.0...
Installing ibsim-0.10...
Installing ibsim-doc-0.10...
Installing ucx-1.8.0...
Installing sharp-2.1.2.MLNX20200428.ddda184...
Installing hcoll-4.4.2968...
Installing knem-dkms-1.1.4.90mlnx1...
Installing knem-1.1.4.90mlnx1...
Installing openmpi-4.0.3rc4...
Installing mpitests-3.2.20...
Installing libdapl2-2.1.10.1.mlnx...
Installing dapl2-utils-2.1.10.1.mlnx...
Installing libdapl-dev-2.1.10.1.mlnx...
Installing srptools-41mlnx1...
Installing mlnx-ethtool-5.4...
Installing mlnx-iproute2-5.4.0...
Selecting previously unselected package mlnx-fw-updater.
(Reading database ... 87526 files and directories currently installed.)
Preparing to unpack .../mlnx-fw-updater_4.9-4.1.7.0_amd64.deb ...
Unpacking mlnx-fw-updater (4.9-4.1.7.0) ...
Setting up mlnx-fw-updater (4.9-4.1.7.0) ...

Added 'RUN_FW_UPDATER_ONBOOT=no to /etc/infiniband/openib.conf

Installation passed successfully
To load the new driver, run:
/etc/init.d/openibd restart

Restart the device driver:

/etc/init.d/openibd restart

Unloading HCA driver:                                      [  OK  ]
Loading HCA driver and Access Layer:                       [  OK  ]

Reboot the machine, then verify:

$ modinfo mlx4_core | grep version

version:        4.9-6.0.6
srcversion:     B7B1BFEEF8DC7BE5A999C14
vermagic:       5.4.0-94-generic SMP mod_unload modversions 

The driver version has changed from the default 4.0-0 to 4.9-6.0.6.

5.15 kernel

I tried upgrading the kernel on Ubuntu 20.04 to 5.15.0-58; installing the MLNX_OFED driver then failed with:

Copying build sources from '/var/lib/dkms/mlnx-ofed-kernel/4.9/build/../build' to '/usr/src/ofa_kernel/5.15.0-58-generic' ...
/bin/cp: cannot stat 'Module*.symvers': No such file or directory

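MLNX_OFED 4.9 does not currently support newer kernels; the only kernel tested and verified is 5.4 (the default kernel of Ubuntu 20.04). After a kernel upgrade the only option is the in-box 4.0-0 driver; upgrading to 4.9 is not possible. NVIDIA does not appear to plan further support.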
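
A pre-flight check can avoid discovering this halfway through a DKMS build. The sketch below only encodes the experience recorded in this document (5.4 builds, 5.15 does not); it is not an official support matrix, and both function names are illustrative.

```python
import platform


def kernel_series(release):
    """Return (major, minor) from a kernel release string such as '5.4.0-94-generic'."""
    major, minor = release.split(".")[:2]
    return int(major), int(minor)


def ofed_49_expected_to_build(release):
    """True only for the 5.4 series, the one kernel on which the install succeeded here."""
    return kernel_series(release) == (5, 4)


for release in ("5.4.0-94-generic", "5.15.0-58-generic", platform.release()):
    try:
        print(release, ofed_49_expected_to_build(release))
    except ValueError:
        # platform.release() is not in Linux kernel format on other operating systems.
        print(release, "unrecognized release format")
```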

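On the 5.4 kernel, by contrast, updating to MLNX_OFED_LINUX-4.9-6.0.6.0 went smoothly.

References:

Summary

CX3 cards like the HP544+ are in an awkward position: driver updates have all but stopped, and support for new kernels has effectively been abandoned. For now the choices are:

  • Ubuntu 20.04 + 5.4 kernel, with the driver upgraded to MLNX_OFED_LINUX-4.9-6.0.6.0
  • Ubuntu 20.04 + 5.15 kernel, staying on the in-box 4.0-0 driver.

Since I have no particular need for the newest kernel, I chose the first option and keep the system in its most stable state for now: Ubuntu 20.04 + 5.4 kernel + MLNX_OFED_LINUX-4.9-6.0.6.0, all of which are officially supported.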

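References

6.3.2.3 - Installing the Driver on Linux Mint

Installing the HP544+ NIC driver on Linux Mint

I tried to update the driver on Linux Mint 20.2, which is based on the Ubuntu 20.04 kernel. I could not get it to work; after a string of errors I gave up and stayed on the in-box driver.

The notes below are kept for the record only.


Running the installer directly fails, because it checks the Linux distribution by default: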

./mlnxofedinstall
Current operation system is not supported (linuxmint20.2)!

One workaround is to pass the distribution via the --distro command-line option:

./mlnxofedinstall --distro ubuntu20.04

Removing old packages...
Installing new packages
Installing ofed-scripts-4.9...
Installing mlnx-ofed-kernel-utils-4.9...
Installing mlnx-ofed-kernel-dkms-4.9...
Failed to install mlnx-ofed-kernel-dkms DEB
Collecting debug info...
See /tmp/MLNX_OFED_LINUX.11004.logs/mlnx-ofed-kernel-dkms.debinstall.log

This fails. /tmp/MLNX_OFED_LINUX.11004.logs/mlnx-ofed-kernel-dkms.debinstall.log contains:

/usr/bin/dpkg -i --force-confnew --force-confmiss /home/sky/hp544/MLNX_OFED_LINUX-4.9-4.1.7.0-ubuntu20.04-x86_64/DEBS/MLNX_LIBS/mlnx-ofed-kernel-dkms_4.9-OFED.4.9.4.1.7.1_all.deb
Selecting previously unselected package mlnx-ofed-kernel-dkms.
(Reading database ... 322968 files and directories currently installed.)
Preparing to unpack .../mlnx-ofed-kernel-dkms_4.9-OFED.4.9.4.1.7.1_all.deb ...
Unpacking mlnx-ofed-kernel-dkms (4.9-OFED.4.9.4.1.7.1) ...
Setting up mlnx-ofed-kernel-dkms (4.9-OFED.4.9.4.1.7.1) ...
Loading new mlnx-ofed-kernel-4.9 DKMS files...
First Installation: checking all kernels...
Building only for 5.4.0-92-generic
Building for architecture x86_64
Building initial module for 5.4.0-92-generic
Error! Bad return status for module build on kernel: 5.4.0-92-generic (x86_64)
Consult /var/lib/dkms/mlnx-ofed-kernel/4.9/build/make.log for more information.
dpkg: error processing package mlnx-ofed-kernel-dkms (--install):
 installed mlnx-ofed-kernel-dkms package post-installation script subprocess returned error exit status 10
Errors were encountered while processing:
 mlnx-ofed-kernel-dkms

/var/lib/dkms/mlnx-ofed-kernel/4.9/build/make.log contains:

Copying build sources from '/var/lib/dkms/mlnx-ofed-kernel/4.9/build/../build' to '/usr/src/ofa_kernel/5.4.0-92-generic' ...
/bin/cp: cannot stat 'Module*.symvers': No such file or directory

Relevant information from this machine:

$ uname --all
Linux skyserver3 5.4.0-94-generic #106-Ubuntu SMP Thu Jan 6 23:58:14 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ ls /usr/src/
linux-headers-5.4.0-94  linux-headers-5.4.0-94-generic  mlnx-ofed-kernel-4.9  ofa_kernel  ofa_kernel-4.9
$ ls /lib/modules/
5.4.0-94-generic
$ locate Module.symvers
# note: no output here

For comparison, the same information from an Ubuntu Server 20.04 machine where the driver installed successfully:

$ uname --all
Linux skywork2 5.4.0-94-generic #106-Ubuntu SMP Thu Jan 6 23:58:14 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ ls /usr/src/
iser-4.9                linux-headers-5.4.0-94          ofa_kernel-4.9
isert-4.9               linux-headers-5.4.0-94-generic  rshim-1.18
kernel-mft-dkms-4.15.1  mlnx-ofed-kernel-4.9            srp-4.9
knem-1.1.4.90mlnx1      ofa_kernel
$ ls /lib/modules/
5.4.0-94-generic
$ locate Module.symvers
/usr/src/linux-headers-5.4.0-94-generic/Module.symvers
/usr/src/ofa_kernel/5.4.0-94-generic/Module.symvers
/usr/src/ofa_kernel/5.4.0-94-generic/compat/build/Module.symvers
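The difference between the two machines is whether Module.symvers exists under the kernel headers directory. locate answers from a database built by updatedb, so an empty result can also mean the database is stale; checking the file directly avoids that ambiguity. A minimal sketch, where the helper names are hypothetical and the path layout is taken from the listing above:

```python
from pathlib import Path


def symvers_path(kernel_release: str, src_root: str = "/usr/src") -> Path:
    """Path where the headers package keeps Module.symvers, per the listing above."""
    return Path(src_root) / f"linux-headers-{kernel_release}" / "Module.symvers"


def headers_have_symvers(kernel_release: str, src_root: str = "/usr/src") -> bool:
    """True if Module.symvers is present for the given `uname -r` release."""
    return symvers_path(kernel_release, src_root).is_file()


if __name__ == "__main__":
    import platform

    release = platform.release()
    print(release, headers_have_symvers(release))
```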

Running locate Module.symvers on this machine lists nothing:

$ locate Module.symvers
# python3 is in fact already installed
sudo apt-get install python3
# but there is no python command; python-is-python3 makes python run python3
sudo apt-get install python-is-python3
# the installation needs distutils
sudo apt-get install python3-distutils

# start the installation
$ ./mlnxofedinstall --distro ubuntu20.04 --without-fw-update

Logs dir: /tmp/MLNX_OFED_LINUX.1976.logs
General log file: /tmp/MLNX_OFED_LINUX.1976.logs/general.log

Below is the list of MLNX_OFED_LINUX packages that you have chosen
(some may have been added by the installer due to package dependencies):

ofed-scripts
mlnx-ofed-kernel-utils
mlnx-ofed-kernel-dkms
......
This program will install the MLNX_OFED_LINUX package on your machine.
Note that all other Mellanox, OEM, OFED, RDMA or Distribution IB packages will be removed.
Those packages are removed due to conflicts with MLNX_OFED_LINUX, do not reinstall them.

Do you want to continue?[y/N]:y

Checking SW Requirements...


One or more required packages for installing MLNX_OFED_LINUX are missing.
Attempting to install the following missing packages:
gfortran graphviz tcl swig chrpath dpatch debhelper libltdl-dev libgfortran4 tk automake quilt autotools-dev autoconf


Removing old packages...
Installing new packages
Installing ofed-scripts-4.9...
Installing mlnx-ofed-kernel-utils-4.9...
Installing mlnx-ofed-kernel-dkms-4.9...


Error: mlnx-ofed-kernel-dkms installation failed!
Collecting debug info...

See:
	/tmp/MLNX_OFED_LINUX.1976.logs/mlnx-ofed-kernel-dkms.debinstall.log
Removing newly installed packages...

Oddly, /tmp/MLNX_OFED_LINUX.1976.logs/mlnx-ofed-kernel-dkms.debinstall.log contains no error at all; it reports DKMS: install completed.

./mlnxofedinstall  --distro ubuntu20.04 
./mlnxofedinstall  --distro ubuntu20.04 --without-fw-update
./mlnxofedinstall --add-kernel-support --distro ubuntu20.04 --skip-repo

Reference: https://docs.nvidia.com/networking/display/MLNXOFEDv494080/Installing+Mellanox+OFED

./mlnxofedinstall --without-dkms --add-kernel-support --kernel 3.13.0-85-generic --without-fw-update --force

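It failed again with no error message in the logs, and I did not have the energy to keep troubleshooting.

I went back to the inbox 4.0.0 driver; the upside is that a newer 5.15 kernel can be used.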

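6.3.3 - ESXi drivers

Downloading and installing the ESXi driver for the HP544+ NIC

6.3.3.1 - Updating the driver on ESXi 6.7

Downloading and updating the ESXi 6.7 driver for the HP544+ NIC

Existing driver

ESXi 6.7 ships with the mlx4 driver by default.

Searching the installed packages for mlx shows it:

The same list is available from the command line after logging in over ssh: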

$ esxcli software vib list | grep mlx
net-mlx4-core                  1.9.7.0-1vmw.670.0.0.8169922          VMW      VMwareCertified     2023-03-31
net-mlx4-en                    1.9.7.0-1vmw.670.0.0.8169922          VMW      VMwareCertified     2023-03-31
nmlx4-core                     3.17.13.1-1vmw.670.2.48.13006603      VMW      VMwareCertified     2023-03-31
nmlx4-en                       3.17.13.1-1vmw.670.2.48.13006603      VMW      VMwareCertified     2023-03-31
nmlx4-rdma                     3.17.13.1-1vmw.670.2.48.13006603      VMW      VMwareCertified     2023-03-31
nmlx5-core                     4.17.13.1-1vmw.670.3.73.14320388      VMW      VMwareCertified     2023-03-31
nmlx5-rdma                     4.17.13.1-1vmw.670.2.48.13006603      VMW      VMwareCertified     2023-03-31

The current nmlx4 driver version is 3.17.13.1-1vmw.670.2.48.13006603.

Downloading the new driver

Official download

The ESXi driver download page on the Mellanox website:

ConnectX® Ethernet Driver for VMware® ESXi Server (nvidia.com)

Find ESXi 6.7 and ConnectX-3 Pro:

The page shows that ConnectX-3 Pro is supported only up to ESXi 6.7 and that the latest nmlx4_en driver listed is 3.17.70.1. Download link:

Download VMware vCloud Suite - VMware Customer Connect

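How to find the latest driver?

The nmlx4_en driver bundled with ESXi 7.0.2, however, is version 3.19.16.8.

I happened to come across the version number 3.19.70.1 online, and only by searching Google for that number did I find the download link on the Mellanox website. Version 3.19.70.1 was released on 2020-09-08.

So the question is: is there an even newer nmlx4_en release, and how would one find it?

The search function on the Mellanox website is really poor.

Download link: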

Download VMware vSphere - VMware Customer Connect

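Unpacking

The downloaded file must be unpacked first to obtain the zip file inside it. Only that inner zip can be used for the driver update below; using the outer file causes an error.

Note that the file name contains offline_bundle: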

  • Mellanox-nmlx4_3.19.70.1-1OEM.670.0.0.8169922-offline_bundle-17262032.zip
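The unpacking step can be scripted. The sketch below assumes the download is itself a zip archive and that the inner bundle is a member whose name ends in .zip and contains offline_bundle; the notes do not give the outer archive's name or layout, so the function name and those assumptions are illustrative only.

```python
import zipfile
from pathlib import Path


def extract_offline_bundles(outer_zip, dest_dir):
    """Extract every inner *offline_bundle*.zip member and return the extracted paths."""
    extracted = []
    with zipfile.ZipFile(outer_zip) as archive:
        for name in archive.namelist():
            # Assumption: the usable bundle is identified by its file name.
            if "offline_bundle" in name and name.endswith(".zip"):
                # ZipFile.extract returns the path of the file it created.
                extracted.append(Path(archive.extract(name, dest_dir)))
    return extracted


if __name__ == "__main__":
    import sys

    for path in extract_offline_bundles(sys.argv[1], sys.argv[2]):
        print(path)
```

Each printed path is a candidate for the -d argument of esxcli software vib update.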

Updating the driver

From the ESXi console, use the datastore browser to upload the downloaded file to datastore1.

Log in over ssh and run:

$ esxcli software vib update -d /vmfs/volumes/datastore1/upload/Mellanox-nmlx4_3.19.70.1-1OEM.670.0.0.8169922-offline_bundle-17262032.zip
Installation Result
   Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
   Reboot Required: true
   VIBs Installed: MEL_bootbank_nmlx4-core_3.19.70.1-1OEM.670.0.0.8169922, MEL_bootbank_nmlx4-en_3.19.70.1-1OEM.670.0.0.8169922, MEL_bootbank_nmlx4-rdma_3.19.70.1-1OEM.670.0.0.8169922
   VIBs Removed: VMW_bootbank_nmlx4-core_3.17.13.1-1vmw.670.2.48.13006603, VMW_bootbank_nmlx4-en_3.17.13.1-1vmw.670.2.48.13006603, VMW_bootbank_nmlx4-rdma_3.17.13.1-1vmw.670.2.48.13006603
   VIBs Skipped: 

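The output says a reboot is required for the change to take effect.

Important: the file path must be an absolute path. Otherwise the command fails because the file cannot be found, with an error like this: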

$ cd /vmfs/volumes/datastore1/upload/
$ esxcli software vib update -d Mellanox-nmlx4_3.19.70.1-1OEM.670.0.0.8169922-offline_bundle-17262032.zip
 [MetadataDownloadError]
 Could not download from depot at zip:/var/log/vmware/Mellanox-nmlx4_3.19.70.1-1OEM.670.0.0.8169922-offline_bundle-17262032.zip?index.xml, skipping (('zip:/var/log/vmware/Mellanox-nmlx4_3.19.70.1-1OEM.670.0.0.8169922-offline_bundle-17262032.zip?index.xml', '', "Error extracting index.xml from /var/log/vmware/Mellanox-nmlx4_3.19.70.1-1OEM.670.0.0.8169922-offline_bundle-17262032.zip: [Errno 2] No such file or directory: '/var/log/vmware/Mellanox-nmlx4_3.19.70.1-1OEM.670.0.0.8169922-offline_bundle-17262032.zip'"))
        url = zip:/var/log/vmware/Mellanox-nmlx4_3.19.70.1-1OEM.670.0.0.8169922-offline_bundle-17262032.zip?index.xml
 Please refer to the log file for more details.
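The error message shows the relative name being looked up under /var/log/vmware instead of the shell's current directory, which is why only an absolute path works. When the update is driven from a script, the path can be made absolute before the command is built. A minimal sketch, with a hypothetical helper name; posixpath is used because ESXi paths are POSIX-style:

```python
import os
import posixpath


def vib_update_command(bundle: str, cwd: str = None) -> list:
    """Build the esxcli argument list with the bundle path made absolute."""
    base = cwd if cwd is not None else os.getcwd()
    # join() keeps `bundle` unchanged when it is already absolute.
    absolute = posixpath.normpath(posixpath.join(base, bundle))
    return ["esxcli", "software", "vib", "update", "-d", absolute]


if __name__ == "__main__":
    print(" ".join(vib_update_command("bundle.zip", "/vmfs/volumes/datastore1/upload")))
```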

After rebooting, verify:

$ esxcli software vib list | grep mlx
nmlx4-core                     3.19.70.1-1OEM.670.0.0.8169922        MEL      VMwareCertified     2023-05-25
nmlx4-en                       3.19.70.1-1OEM.670.0.0.8169922        MEL      VMwareCertified     2023-05-25
nmlx4-rdma                     3.19.70.1-1OEM.670.0.0.8169922        MEL      VMwareCertified     2023-05-25
net-mlx4-core                  1.9.7.0-1vmw.670.0.0.8169922          VMW      VMwareCertified     2023-03-31
net-mlx4-en                    1.9.7.0-1vmw.670.0.0.8169922          VMW      VMwareCertified     2023-03-31
nmlx5-core                     4.17.13.1-1vmw.670.3.73.14320388      VMW      VMwareCertified     2023-03-31
nmlx5-rdma                     4.17.13.1-1vmw.670.2.48.13006603      VMW      VMwareCertified     2023-03-31

The nmlx4 driver has been upgraded from 3.17.13.1 to 3.19.70.1.
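The before and after listings can also be compared mechanically. The sketch below parses the whitespace-separated columns of esxcli software vib list as they appear in these notes and reports which Mellanox VIBs changed version; the function names are hypothetical and the sample text is a shortened copy of the listings above.

```python
def parse_vib_list(text: str) -> dict:
    """Return {vib_name: version} for Mellanox VIBs in the listing text."""
    versions = {}
    for line in text.splitlines():
        columns = line.split()
        # Columns as shown above: name, version, vendor, acceptance level, date.
        if len(columns) >= 2 and "mlx" in columns[0]:
            versions[columns[0]] = columns[1]
    return versions


def changed_vibs(before: dict, after: dict) -> dict:
    """Return {name: (old, new)} for VIBs whose version differs."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }


BEFORE = """\
nmlx4-core  3.17.13.1-1vmw.670.2.48.13006603  VMW  VMwareCertified  2023-03-31
nmlx4-en    3.17.13.1-1vmw.670.2.48.13006603  VMW  VMwareCertified  2023-03-31
nmlx5-core  4.17.13.1-1vmw.670.3.73.14320388  VMW  VMwareCertified  2023-03-31
"""
AFTER = """\
nmlx4-core  3.19.70.1-1OEM.670.0.0.8169922    MEL  VMwareCertified  2023-05-25
nmlx4-en    3.19.70.1-1OEM.670.0.0.8169922    MEL  VMwareCertified  2023-05-25
nmlx5-core  4.17.13.1-1vmw.670.3.73.14320388  VMW  VMwareCertified  2023-03-31
"""

if __name__ == "__main__":
    changes = changed_vibs(parse_vib_list(BEFORE), parse_vib_list(AFTER))
    for name, (old, new) in sorted(changes.items()):
        print(f"{name}: {old} -> {new}")
```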

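6.4 - HP544+ NIC speed tests

Speed tests of the HP544+ NIC

The tools used are mainly iperf and iperf3. Both are needed because in some cases iperf3 cannot saturate the link, so iperf results are added as a complement.

Speed tests on the soft-switch machine

The machine is a Gigabyte X99-UD4 with four HP544+ NICs, each running at PCIe 3.0 x8. The CPU is an E5-2680 v4 (14 cores, 28 threads), and Debian 12 is installed on bare metal with the inbox driver. The machine's IP address is 192.168.0.99.

Install iperf and iperf3 and set iperf3 to start automatically.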

$ uname -a 
Linux switch99 6.1.0-18-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux

$ iperf --version                               
iperf version 2.1.8 (12 August 2022) pthreads

$ iperf3 --version                      
iperf 3.12 (cJSON 1.7.15)
Linux switch99 6.1.0-18-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64
Optional features available: CPU affinity setting, IPv6 flow label, SCTP, TCP congestion algorithm setting, sendfile / zerocopy, socket pacing, authentication, bind to device, support IPv4 don't fragment

Local test

First, measure iperf3 against the local machine, that is, with both the iperf3 server and the iperf3 client running locally:

$ iperf3 -c 192.168.0.99 -t 30 -i 1 -P 5
Connecting to host 192.168.0.99, port 5201
[  5] local 192.168.0.99 port 43692 connected to 192.168.0.99 port 5201
[  7] local 192.168.0.99 port 43702 connected to 192.168.0.99 port 5201
[  9] local 192.168.0.99 port 43710 connected to 192.168.0.99 port 5201
[ 11] local 192.168.0.99 port 43712 connected to 192.168.0.99 port 5201
[ 13] local 192.168.0.99 port 43716 connected to 192.168.0.99 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   751 MBytes  6.29 Gbits/sec    0    639 KBytes       
[  7]   0.00-1.00   sec   751 MBytes  6.29 Gbits/sec    0    639 KBytes       
[  9]   0.00-1.00   sec   751 MBytes  6.29 Gbits/sec    0    639 KBytes       
[ 11]   0.00-1.00   sec   751 MBytes  6.29 Gbits/sec    0    639 KBytes       
[ 13]   0.00-1.00   sec   751 MBytes  6.29 Gbits/sec    0    639 KBytes       
[SUM]   0.00-1.00   sec  3.67 GBytes  31.5 Gbits/sec    0 
......
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-30.00  sec  22.8 GBytes  6.54 Gbits/sec    0             sender
[  5]   0.00-30.00  sec  22.8 GBytes  6.54 Gbits/sec                  receiver
[  7]   0.00-30.00  sec  22.8 GBytes  6.54 Gbits/sec    0             sender
[  7]   0.00-30.00  sec  22.8 GBytes  6.54 Gbits/sec                  receiver
[  9]   0.00-30.00  sec  22.8 GBytes  6.54 Gbits/sec    0             sender
[  9]   0.00-30.00  sec  22.8 GBytes  6.54 Gbits/sec                  receiver
[ 11]   0.00-30.00  sec  22.8 GBytes  6.54 Gbits/sec    0             sender
[ 11]   0.00-30.00  sec  22.8 GBytes  6.54 Gbits/sec                  receiver
[ 13]   0.00-30.00  sec  22.8 GBytes  6.54 Gbits/sec    0             sender
[ 13]   0.00-30.00  sec  22.8 GBytes  6.54 Gbits/sec                  receiver
[SUM]   0.00-30.00  sec   114 GBytes  32.7 Gbits/sec    0             sender
[SUM]   0.00-30.00  sec   114 GBytes  32.7 Gbits/sec                  receiver

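The measured bandwidth is only 32.7 Gbits/sec, which is very low.

Switch to iperf and start the server:

iperf -s -p 10001 

Then run the client test:

iperf -c 192.168.0.99 -t 30 -i 1 -P 10 -p 10001

The measured bandwidth is 286 Gbits/sec, far above the 32.7 Gbits/sec from iperf3: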

iperf -c 192.168.0.99 -t 10 -i 1 -P 10 -p 10001
------------------------------------------------------------
Client connecting to 192.168.0.99, TCP port 10001
TCP window size: 2.50 MByte (default)
------------------------------------------------------------
[  2] local 192.168.0.99 port 54650 connected with 192.168.0.99 port 10001 (icwnd/mss/irtt=319/32741/20)
[  4] local 192.168.0.99 port 54656 connected with 192.168.0.99 port 10001 (icwnd/mss/irtt=319/32741/24)
[  1] local 192.168.0.99 port 54672 connected with 192.168.0.99 port 10001 (icwnd/mss/irtt=319/32741/26)
[ 10] local 192.168.0.99 port 54716 connected with 192.168.0.99 port 10001 (icwnd/mss/irtt=319/32741/17)
[  3] local 192.168.0.99 port 54666 connected with 192.168.0.99 port 10001 (icwnd/mss/irtt=319/32741/23)
[  5] local 192.168.0.99 port 54680 connected with 192.168.0.99 port 10001 (icwnd/mss/irtt=319/32741/24)
[  6] local 192.168.0.99 port 54690 connected with 192.168.0.99 port 10001 (icwnd/mss/irtt=319/32741/25)
[  8] local 192.168.0.99 port 54724 connected with 192.168.0.99 port 10001 (icwnd/mss/irtt=319/32741/24)
[  7] local 192.168.0.99 port 54700 connected with 192.168.0.99 port 10001 (icwnd/mss/irtt=319/32741/24)
[  9] local 192.168.0.99 port 54730 connected with 192.168.0.99 port 10001 (icwnd/mss/irtt=319/32741/24)
[ ID] Interval       Transfer     Bandwidth
[  5] 0.0000-1.0000 sec  3.36 GBytes  28.8 Gbits/sec
[  7] 0.0000-1.0000 sec  3.33 GBytes  28.6 Gbits/sec
[  4] 0.0000-1.0000 sec  3.26 GBytes  28.0 Gbits/sec
[  2] 0.0000-1.0000 sec  3.33 GBytes  28.6 Gbits/sec
[  3] 0.0000-1.0000 sec  3.23 GBytes  27.7 Gbits/sec
[  9] 0.0000-1.0000 sec  3.25 GBytes  27.9 Gbits/sec
[  1] 0.0000-1.0000 sec  3.24 GBytes  27.8 Gbits/sec
[  8] 0.0000-1.0000 sec  3.35 GBytes  28.8 Gbits/sec
[  6] 0.0000-1.0000 sec  3.28 GBytes  28.2 Gbits/sec
[ 10] 0.0000-1.0000 sec  3.22 GBytes  27.7 Gbits/sec
[SUM] 0.0000-1.0000 sec  32.8 GBytes   282 Gbits/sec
......
[  5] 0.0000-10.0046 sec  33.9 GBytes  29.1 Gbits/sec
[  4] 0.0000-10.0044 sec  33.0 GBytes  28.4 Gbits/sec
[  2] 0.0000-10.0045 sec  33.0 GBytes  28.3 Gbits/sec
[  9] 0.0000-10.0043 sec  33.3 GBytes  28.6 Gbits/sec
[  8] 0.0000-10.0041 sec  33.1 GBytes  28.5 Gbits/sec
[ 10] 0.0000-10.0050 sec  34.1 GBytes  29.3 Gbits/sec
[  3] 0.0000-10.0044 sec  33.0 GBytes  28.3 Gbits/sec
[  6] 0.0000-10.0041 sec  33.1 GBytes  28.4 Gbits/sec
[  1] 0.0000-10.0042 sec  33.3 GBytes  28.6 Gbits/sec
[  7] 0.0000-10.0045 sec  33.5 GBytes  28.7 Gbits/sec
[SUM] 0.0000-10.0007 sec   333 GBytes   286 Gbits/sec
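When many runs need comparing, the [SUM] line can be pulled out of the output automatically. The sketch below is an illustration with hypothetical names; the regular expression targets the [SUM] line layout seen in the outputs in these notes. The cross-check assumes GBytes means 2**30 bytes and Gbits means 10**9 bits, which is consistent with the iperf totals shown here.

```python
import re

# Matches e.g. "[SUM] 0.0000-10.0007 sec   333 GBytes   286 Gbits/sec"
SUM_RE = re.compile(
    r"^\[SUM\]\s+[\d.]+-([\d.]+)\s+sec\s+([\d.]+)\s+GBytes\s+([\d.]+)\s+Gbits/sec"
)


def parse_sum(line: str):
    """Return (seconds, gbytes, gbits_per_sec) from a [SUM] line, or None."""
    match = SUM_RE.match(line.strip())
    if match is None:
        return None
    seconds, gbytes, gbits = (float(value) for value in match.groups())
    return seconds, gbytes, gbits


def gbits_from_transfer(gbytes: float, seconds: float) -> float:
    """Recompute the rate, assuming binary GBytes and decimal Gbits."""
    return gbytes * 2**30 * 8 / seconds / 1e9


if __name__ == "__main__":
    seconds, gbytes, reported = parse_sum(
        "[SUM] 0.0000-10.0007 sec   333 GBytes   286 Gbits/sec"
    )
    print(reported, round(gbits_from_transfer(gbytes, seconds), 1))
```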

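Notes

The number of parallel streams (the -P option) has a large effect, and it takes repeated runs to find the value that gives the best result. On this same machine, for example, different -P values give: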

  • -P 1: 33.1 Gbits/sec
  • -P 2: 61.3 Gbits/sec
  • -P 2: 147 Gbits/sec
  • -P 8: 242 Gbits/sec
  • -P 10: 288 Gbits/sec
  • -P 12: 225 Gbits/sec
  • -P 20: 163 Gbits/sec

Ten parallel streams reach the highest bandwidth, 288 Gbits/sec; adding more streams beyond that actually lowers throughput.
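The sweep over -P can be summarised in a few lines. The figures in this sketch are copied from the list above, except that the repeated "-P 2" entry is left out because its stream count is unclear; the function names are hypothetical.

```python
# Gbits/sec measured on the soft-switch machine, keyed by the -P value.
RESULTS = {1: 33.1, 2: 61.3, 8: 242.0, 10: 288.0, 12: 225.0, 20: 163.0}


def best_parallelism(results: dict) -> int:
    """Return the -P value with the highest aggregate bandwidth."""
    return max(results, key=results.get)


def per_stream(results: dict) -> dict:
    """Average bandwidth per stream for each -P value, rounded to 0.1 Gbits/sec."""
    return {streams: round(total / streams, 1) for streams, total in results.items()}


if __name__ == "__main__":
    print("best -P:", best_parallelism(RESULTS))
    for streams, rate in per_stream(RESULTS).items():
        print(f"-P {streams}: {rate} Gbits/sec per stream")
```

The per-stream view makes the drop-off visible: aggregate bandwidth grows up to -P 10, while the share of each stream falls sharply at -P 12 and -P 20.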

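Access from other machines

The machines used for these tests all run PVE 8.1 with the Linux 6.5 kernel, again with the driver that ships with the operating system.

Physical NIC access

Access the soft-switch machine directly from the PVE host; this can be treated as two physical machines connected NIC to NIC.

iperf -c 192.168.0.99 -P 10 -t 10 -i 1 -p 10001

The result is 47.6 Gbits/sec; in my experience this is the most that two directly connected HP544+ cards reach on bare metal.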

iperf -c 192.168.0.99 -P 10 -t 10 -i 1 -p 10001 
------------------------------------------------------------
Client connecting to 192.168.0.99, TCP port 10001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  4] local 192.168.0.19 port 54790 connected with 192.168.0.99 port 10001 (icwnd/mss/irtt=14/1448/176)
[  2] local 192.168.0.19 port 54788 connected with 192.168.0.99 port 10001 (icwnd/mss/irtt=14/1448/149)
[  1] local 192.168.0.19 port 54798 connected with 192.168.0.99 port 10001 (icwnd/mss/irtt=14/1448/124)
[  7] local 192.168.0.19 port 54806 connected with 192.168.0.99 port 10001 (icwnd/mss/irtt=14/1448/113)
[  5] local 192.168.0.19 port 54808 connected with 192.168.0.99 port 10001 (icwnd/mss/irtt=14/1448/81)
[  3] local 192.168.0.19 port 54800 connected with 192.168.0.99 port 10001 (icwnd/mss/irtt=14/1448/106)
[ 10] local 192.168.0.19 port 54840 connected with 192.168.0.99 port 10001 (icwnd/mss/irtt=14/1448/91)
[  9] local 192.168.0.19 port 54830 connected with 192.168.0.99 port 10001 (icwnd/mss/irtt=14/1448/49)
[  6] local 192.168.0.19 port 54810 connected with 192.168.0.99 port 10001 (icwnd/mss/irtt=14/1448/86)
[  8] local 192.168.0.19 port 54824 connected with 192.168.0.99 port 10001 (icwnd/mss/irtt=14/1448/71)
[ ID] Interval       Transfer     Bandwidth
[ 10] 0.0000-1.0000 sec   805 MBytes  6.75 Gbits/sec
[  4] 0.0000-1.0000 sec   189 MBytes  1.59 Gbits/sec
[  9] 0.0000-1.0000 sec   804 MBytes  6.75 Gbits/sec
[  1] 0.0000-1.0000 sec   194 MBytes  1.63 Gbits/sec
[  8] 0.0000-1.0000 sec   798 MBytes  6.70 Gbits/sec
[  6] 0.0000-1.0000 sec   362 MBytes  3.04 Gbits/sec
[  7] 0.0000-1.0000 sec   448 MBytes  3.75 Gbits/sec
[  5] 0.0000-1.0000 sec   808 MBytes  6.78 Gbits/sec
[  3] 0.0000-1.0000 sec   445 MBytes  3.74 Gbits/sec
[  2] 0.0000-1.0000 sec   815 MBytes  6.84 Gbits/sec
[SUM] 0.0000-1.0000 sec  5.54 GBytes  47.6 Gbits/sec
......
[ 10] 0.0000-10.0117 sec  7.91 GBytes  6.79 Gbits/sec
[  9] 0.0000-10.0115 sec  7.92 GBytes  6.79 Gbits/sec
[  1] 0.0000-10.0116 sec  2.00 GBytes  1.72 Gbits/sec
[  6] 0.0000-10.0115 sec  3.84 GBytes  3.29 Gbits/sec
[  5] 0.0000-10.0115 sec  7.81 GBytes  6.70 Gbits/sec
[  2] 0.0000-10.0117 sec  7.93 GBytes  6.80 Gbits/sec
[  8] 0.0000-10.0117 sec  7.87 GBytes  6.75 Gbits/sec
[  3] 0.0000-10.0113 sec  3.97 GBytes  3.41 Gbits/sec
[  7] 0.0000-10.0116 sec  4.01 GBytes  3.44 Gbits/sec
[  4] 0.0000-10.0116 sec  2.08 GBytes  1.79 Gbits/sec
[SUM] 0.0000-10.0048 sec  55.3 GBytes  47.5 Gbits/sec

Several similarly configured machines, each tested the same way, reached these maximum speeds:

  • skyserver: 47.5 Gbits/sec
  • Skyserver2: 43.6 Gbits/sec
  • skyserver3: 47.7 Gbits/sec
  • Skyserver4: 42.2 Gbits/sec
  • Skyserver5: 42.6 Gbits/sec
  • Skyserver6: 42.4 Gbits/sec

The same test with iperf3:

iperf3 -c 192.168.0.99 -P 10 -t 10 -i 1 

The result is 39.3 Gbits/sec:

iperf3 -c 192.168.0.99 -P 10 -t 10 -i 1        
Connecting to host 192.168.0.99, port 5201
[  5] local 192.168.0.19 port 40664 connected to 192.168.0.99 port 5201
[  7] local 192.168.0.19 port 40672 connected to 192.168.0.99 port 5201
[  9] local 192.168.0.19 port 40674 connected to 192.168.0.99 port 5201
[ 11] local 192.168.0.19 port 40678 connected to 192.168.0.99 port 5201
[ 13] local 192.168.0.19 port 40688 connected to 192.168.0.99 port 5201
[ 15] local 192.168.0.19 port 40700 connected to 192.168.0.99 port 5201
[ 17] local 192.168.0.19 port 40706 connected to 192.168.0.99 port 5201
[ 19] local 192.168.0.19 port 40716 connected to 192.168.0.99 port 5201
[ 21] local 192.168.0.19 port 40730 connected to 192.168.0.99 port 5201
[ 23] local 192.168.0.19 port 40734 connected to 192.168.0.99 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   432 MBytes  3.62 Gbits/sec    0    266 KBytes       
[  7]   0.00-1.00   sec   432 MBytes  3.62 Gbits/sec    0    267 KBytes       
[  9]   0.00-1.00   sec   432 MBytes  3.62 Gbits/sec    0    205 KBytes       
[ 11]   0.00-1.00   sec   431 MBytes  3.61 Gbits/sec    0    214 KBytes       
[ 13]   0.00-1.00   sec   431 MBytes  3.61 Gbits/sec    0    314 KBytes       
[ 15]   0.00-1.00   sec   431 MBytes  3.61 Gbits/sec    0    293 KBytes       
[ 17]   0.00-1.00   sec   432 MBytes  3.62 Gbits/sec    0    233 KBytes       
[ 19]   0.00-1.00   sec   431 MBytes  3.61 Gbits/sec    0    187 KBytes       
[ 21]   0.00-1.00   sec   431 MBytes  3.61 Gbits/sec    0    310 KBytes       
[ 23]   0.00-1.00   sec   431 MBytes  3.61 Gbits/sec    0    315 KBytes       
[SUM]   0.00-1.00   sec  4.22 GBytes  36.1 Gbits/sec    0   
......
[SUM]   0.00-10.00  sec  45.8 GBytes  39.3 Gbits/sec    0             sender
[SUM]   0.00-10.00  sec  45.8 GBytes  39.3 Gbits/sec                  receiver

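That is still considerably slower than iperf.

SR-IOV virtual NIC test

vmbr NIC test

Switching speed

Speed between two other machines that reach each other through the soft switch.

  • skyserver: acts as the iperf server
  • Skyserver2: 42.4 Gbits/sec
  • skyserver3: 36.2 Gbits/sec (shares one NIC with skyserver, so it is limited by the roughly 64 Gbits/sec total bandwidth of PCIe 3.0 x8)
  • Skyserver4: 42.3 Gbits/sec
  • Skyserver5: 42.2 Gbits/sec
  • Skyserver6: 42.2 Gbits/sec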
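The 64 figure for PCIe 3.0 x8 is the raw signalling rate. A short calculation gives the rate after line encoding; PCIe 3.0 uses 128b/130b encoding, and the result is per direction and before protocol overhead, so real throughput is lower still. The function name is hypothetical.

```python
def pcie_gbps(gt_per_s: float, lanes: int, encoding=(128, 130)) -> float:
    """Link rate in Gbits/sec after line encoding, per direction."""
    payload_bits, total_bits = encoding
    return gt_per_s * lanes * payload_bits / total_bits


if __name__ == "__main__":
    # PCIe 3.0 runs at 8 GT/s per lane with 128b/130b encoding.
    print(f"PCIe 3.0 x8:  {pcie_gbps(8.0, 8):.1f} Gbits/sec")
    print(f"PCIe 3.0 x16: {pcie_gbps(8.0, 16):.1f} Gbits/sec")
```

The x16 value is close to the 126.016 Gb/s that the mlx5_core driver reports in dmesg for an 8.0 GT/s x16 link earlier in these notes, which suggests the same encoding factor is applied there.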