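Installing the driver on Debian 12

Installing the driver for the Huawei SP350 NIC on Debian 12.

Update 2025-04-23: Debian 12 has been upgraded to 12.9, and the NIC driver is the latest version, 24.07-0.6.1.0.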

Preparation

Checking the default driver

This is the default driver that ships with Debian 12:

$ lsmod | grep mlx

mlx5_ib               405504  0
ib_uverbs             172032  1 mlx5_ib
ib_core               438272  2 ib_uverbs,mlx5_ib
mlx5_core            1691648  1 mlx5_ib
mlxfw                  36864  1 mlx5_core
psample                20480  1 mlx5_core
pci_hyperv_intf        16384  1 mlx5_core

Details of mlx5_core:

$ modinfo mlx5_core
filename:       /lib/modules/6.1.0-31-amd64/kernel/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko
license:        Dual BSD/GPL
description:    Mellanox 5th generation network adapters (ConnectX series) core driver
author:         Eli Cohen <eli@mellanox.com>
alias:          auxiliary:mlx5_core.eth
alias:          pci:v000015B3d0000A2DFsv*sd*bc*sc*i*
alias:          pci:v000015B3d0000A2DCsv*sd*bc*sc*i*
alias:          pci:v000015B3d0000A2D6sv*sd*bc*sc*i*
alias:          pci:v000015B3d0000A2D3sv*sd*bc*sc*i*
alias:          pci:v000015B3d0000A2D2sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001025sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001023sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001021sv*sd*bc*sc*i*
alias:          pci:v000015B3d0000101Fsv*sd*bc*sc*i*
alias:          pci:v000015B3d0000101Esv*sd*bc*sc*i*
alias:          pci:v000015B3d0000101Dsv*sd*bc*sc*i*
alias:          pci:v000015B3d0000101Csv*sd*bc*sc*i*
alias:          pci:v000015B3d0000101Bsv*sd*bc*sc*i*
alias:          pci:v000015B3d0000101Asv*sd*bc*sc*i*
alias:          pci:v000015B3d00001019sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001018sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001017sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001016sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001015sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001014sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001013sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001012sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001011sv*sd*bc*sc*i*
alias:          auxiliary:mlx5_core.eth-rep
depends:        psample,pci-hyperv-intf,mlxfw
retpoline:      Y
intree:         Y
name:           mlx5_core
vermagic:       6.1.0-31-amd64 SMP preempt mod_unload modversions
sig_id:         PKCS#7
signer:         Debian Secure Boot CA
sig_key:        32:A0:28:7F:84:1A:03:6F:A3:93:C1:E0:65:C4:3A:E6:B2:42:26:43
sig_hashalgo:   sha256
signature:      0B:7F:71:B7:60:71:43:5A:78:9B:7A:A0:9B:80:CC:B1:22:DE:6E:01:
                97:CF:38:0D:14:6A:A7:5D:A8:E5:84:DE:89:6E:28:78:73:90:D1:CF:
                AD:87:4E:0D:92:A9:32:68:36:B8:1A:8A:AF:8E:38:73:85:B8:42:CC:
                77:63:4B:00:06:B9:07:33:5C:63:62:00:D8:4A:FD:64:71:DB:CC:CB:
                20:4B:47:12:F9:C5:38:B8:DA:02:72:8D:55:CE:BF:5D:0A:BE:73:22:
                B5:8B:C7:A6:71:49:43:23:EA:23:E2:7B:0D:F4:7D:22:FE:36:E8:00:
                64:F2:93:9D:45:54:49:09:58:0E:DE:54:A8:17:1D:66:2E:21:47:1D:
                C7:A8:2F:41:41:AC:80:0C:30:7B:21:CD:B5:05:93:69:50:1A:65:DB:
                03:D5:C8:06:5D:CE:5B:45:6C:F2:D3:6D:58:A8:56:C3:46:89:05:FA:
                5E:FD:04:EA:29:5F:1B:5A:E6:40:5D:7E:46:D5:61:30:AC:E8:83:A8:
                67:E6:05:26:91:7F:43:31:E3:70:A1:CF:E9:E5:65:5A:46:AE:86:F4:
                84:6F:A8:18:A7:C4:97:67:71:76:93:4C:7E:3F:68:65:3D:E9:A1:0C:
                FB:BB:94:58:52:AE:9F:F0:9D:74:FC:60:10:EF:4A:83
parm:           debug_mask:debug mask: 1 = dump cmd data, 2 = dump cmd exec time, 3 = both. Default=0 (uint)
parm:           prof_sel:profile selector. Valid range 0 - 2 (uint)
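
The filename field is a quick way to tell whether mlx5_core is the in-tree Debian build or an out-of-tree replacement. The following is a minimal sketch, assuming the path layout seen in this post (.../kernel/... for in-tree modules, .../updates/dkms/... for DKMS builds). The function name classify_module_path is made up for illustration.

```shell
#!/bin/sh
# Classify a kernel module by the path reported by `modinfo -F filename`.
# Assumption: in-tree modules live under .../kernel/..., DKMS builds under
# .../updates/dkms/... (the layout shown in this post).
classify_module_path() {
    case "$1" in
        */updates/dkms/*) echo "dkms" ;;
        */kernel/*)       echo "in-tree" ;;
        *)                echo "unknown" ;;
    esac
}

# Only query the live system when modinfo is available and knows the module.
if command -v modinfo >/dev/null 2>&1; then
    path=$(modinfo -F filename mlx5_core 2>/dev/null || true)
    if [ -n "$path" ]; then
        echo "mlx5_core: $(classify_module_path "$path")"
    fi
fi
```

On a fresh Debian 12 system this should report in-tree. After a DKMS-based driver installation it should report dkms.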

Downloading the driver

Download page:

https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/

Select the matching Debian release. The latest version, 24.10-2.1.8.0, already provides a build for Debian 12.5.

Download MLNX_OFED_LINUX-24.10-2.1.8.0-debian12.5-x86_64.tgz and copy it to the Debian 12 machine with scp.

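Disabling Secure Boot

Secure Boot must be disabled in the BIOS of the physical machine or the virtual machine, because it conflicts with the latest 24.10-2.1.8.0 version of the mlnx_ofed driver.

In a PVE virtual machine, the setting is at:

"Device Manager" -> "Secure Boot Configuration" -> "Attempt Secure Boot" -> uncheck.

Otherwise, installing the latest driver version results in an error and the NIC becomes unusable.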
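
Before installing, the Secure Boot state can also be confirmed from inside the running system. The following is a minimal sketch, assuming the mokutil package is installed (it is not mentioned in the original steps). The helper name sb_state_from_text is hypothetical.

```shell
#!/bin/sh
# Interpret the text printed by `mokutil --sb-state`.
sb_state_from_text() {
    case "$1" in
        *"SecureBoot enabled"*)  echo "enabled" ;;
        *"SecureBoot disabled"*) echo "disabled" ;;
        *)                       echo "unknown" ;;
    esac
}

# Assumption: mokutil is installed; on systems without EFI variables the
# command fails and the state is reported as "unknown".
if command -v mokutil >/dev/null 2>&1; then
    state=$(sb_state_from_text "$(mokutil --sb-state 2>&1 || true)")
    echo "Secure Boot: $state"
    if [ "$state" = "enabled" ]; then
        echo "Disable Secure Boot in the firmware setup before installing." >&2
    fi
else
    echo "mokutil not found; check the firmware setup screen instead."
fi
```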

Installing the driver

su root
tar xvf MLNX_OFED_LINUX-24.10-2.1.8.0-debian12.5-x86_64.tgz
cd MLNX_OFED_LINUX-24.10-2.1.8.0-debian12.5-x86_64

Set PATH explicitly. Otherwise the default PATH lacks some essential commands and the installation fails:

export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
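
A likely reason is that a plain su root keeps the calling user's PATH, which on Debian usually does not include the sbin directories. The sketch below checks for that. The function name missing_path_dirs and the choice of directories to test are illustrative assumptions, not part of the installer.

```shell
#!/bin/sh
# Report which of the given directories are missing from a PATH-style string.
# Usage: missing_path_dirs "$PATH" /usr/sbin /sbin
missing_path_dirs() {
    path_value=$1
    shift
    for dir in "$@"; do
        case ":$path_value:" in
            *":$dir:"*) ;;                 # already present
            *) printf '%s\n' "$dir" ;;     # missing
        esac
    done
}

missing=$(missing_path_dirs "$PATH" /usr/local/sbin /usr/sbin /sbin)
if [ -n "$missing" ]; then
    # Unquoted on purpose: joins the one-per-line list into a single line.
    echo "PATH is missing:" $missing
else
    echo "PATH already contains the sbin directories"
fi
```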

Configure a proxy to speed up downloads:

export all_proxy=socks5://192.168.0.1:7891;export http_proxy=http://192.168.0.1:7890;export https_proxy=http://192.168.0.1:7890;export no_proxy=127.0.0.1,localhost,local,.local,.lan,192.168.0.0/16,10.0.0.0/16

Start the installation:

./mlnxofedinstall --without-fw-update --with-nvmf --with-nfsrdma --ovs-dpdk --distro debian12.5

Note that some driver versions require --distro debian12.5. Without it, the installer may fail with:

Error: The current MLNX_OFED_LINUX is intended for debian12.1

This happens because my Debian 12 installation was already at 12.9, while the latest 24.10-2.1.8.0 driver is packaged for Debian 12.5:

./mlnxofedinstall --print-distro
debian12.9
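
The mismatch can be detected before running the installer by comparing the detected distro with the one encoded in the bundle name. The following is a sketch under the assumption that bundle names follow the MLNX_OFED_LINUX-<version>-<distro>-<arch> pattern seen above. bundle_distro and distro_flag are hypothetical helper names.

```shell
#!/bin/sh
# Extract the target distro from an MLNX_OFED_LINUX bundle name, e.g.
# MLNX_OFED_LINUX-24.10-2.1.8.0-debian12.5-x86_64 -> debian12.5
bundle_distro() {
    name=${1%.tgz}                # drop the archive suffix if present
    name=${name%-*}               # drop the trailing architecture (-x86_64)
    printf '%s\n' "${name##*-}"   # keep the last remaining dash-separated field
}

# Print the extra installer argument needed when the detected distro differs
# from the one the bundle was packaged for; print nothing when they match.
distro_flag() {
    detected=$1
    target=$2
    if [ "$detected" != "$target" ]; then
        printf '%s\n' "--distro $target"
    fi
}

# Run from inside the unpacked bundle directory; skipped elsewhere.
if [ -x ./mlnxofedinstall ]; then
    detected=$(./mlnxofedinstall --print-distro)
    target=$(bundle_distro "$(basename "$PWD")")
    echo "suggested extra argument: $(distro_flag "$detected" "$target")"
fi
```

With the values from this post (detected debian12.9, bundle built for debian12.5), distro_flag prints --distro debian12.5.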

The three options --with-nvmf --with-nfsrdma --ovs-dpdk are optional. I added them mainly to learn and test those features.

The installation output looks like this:

$ ./mlnxofedinstall --without-fw-update --with-nvmf --with-nfsrdma --ovs-dpdk

Logs dir: /tmp/MLNX_OFED_LINUX.3039.logs
General log file: /tmp/MLNX_OFED_LINUX.3039.logs/general.log

Below is the list of MLNX_OFED_LINUX packages that you have chosen
(some may have been added by the installer due to package dependencies):

ofed-scripts
mlnx-tools
mlnx-ofed-kernel-utils
mlnx-ofed-kernel-dkms
iser-dkms
isert-dkms
srp-dkms
mlnx-nvme-dkms
rdma-core
libibverbs1
ibverbs-utils
ibverbs-providers
libibverbs-dev
libibverbs1-dbg
libibumad3
libibumad-dev
ibacm
librdmacm1
rdmacm-utils
librdmacm-dev
ibdump
libibmad5
libibmad-dev
libopensm
opensm
opensm-doc
libopensm-devel
libibnetdisc5
infiniband-diags
mft
kernel-mft-dkms
perftest
ibutils2
ibsim
ibsim-doc
ucx
sharp
hcoll
knem-dkms
knem
openmpi
mpitests
libxpmem0
libxpmem-dev
srptools
mlnx-ethtool
mlnx-iproute2
rshim
ibarr
libopenvswitch
openvswitch-common
openvswitch-switch

This program will install the MLNX_OFED_LINUX package on your machine.
Note that all other Mellanox, OEM, OFED, RDMA or Distribution IB packages will be removed.
Those packages are removed due to conflicts with MLNX_OFED_LINUX, do not reinstall them.

Do you want to continue?[y/N]:y

Checking SW Requirements...
Removing old packages...
Installing new packages
Installing ofed-scripts-24.10.OFED.24.10.2.1.8...
Installing mlnx-tools-24.10...
Installing mlnx-ofed-kernel-utils-24.10.OFED.24.10.2.1.8.1...
Installing mlnx-ofed-kernel-dkms-24.10.OFED.24.10.2.1.8.1...
Installing iser-dkms-24.10.OFED.24.10.2.1.8.1...
Installing isert-dkms-24.10.OFED.24.10.2.1.8.1...
Installing srp-dkms-24.10.OFED.24.10.2.1.8.1...
Installing mlnx-nvme-dkms-24.10.OFED.24.10.2.1.8.1...
Installing rdma-core-2410mlnx54...
Installing libibverbs1-2410mlnx54...
Installing ibverbs-utils-2410mlnx54...
Installing ibverbs-providers-2410mlnx54...
Installing libibverbs-dev-2410mlnx54...
Installing libibverbs1-dbg-2410mlnx54...
Installing libibumad3-2410mlnx54...
Installing libibumad-dev-2410mlnx54...
Installing ibacm-2410mlnx54...
Installing librdmacm1-2410mlnx54...
Installing rdmacm-utils-2410mlnx54...
Installing librdmacm-dev-2410mlnx54...
Installing ibdump-6.0.0...
Installing libibmad5-2410mlnx54...
Installing libibmad-dev-2410mlnx54...
Installing libopensm-5.21.0.MLNX20241126.d9aa3dff...
Installing opensm-5.21.0.MLNX20241126.d9aa3dff...
Installing opensm-doc-5.21.0.MLNX20241126.d9aa3dff...
Installing libopensm-devel-5.21.0.MLNX20241126.d9aa3dff...
Installing libibnetdisc5-2410mlnx54...
Installing infiniband-diags-2410mlnx54...
Installing mft-4.30.1...
Installing kernel-mft-dkms-4.30.1.113...
Installing perftest-24.10.0...
Installing ibutils2-2.1.1...
Installing ibsim-0.12...
Installing ibsim-doc-0.12...
Installing ucx-1.18.0...
Installing sharp-3.9.0.MLNX20241029.7a20b607...
Installing hcoll-4.8.3230...
Installing knem-dkms-1.1.4.90mlnx3...
Installing knem-1.1.4.90mlnx3...
Installing openmpi-4.1.7rc1...
Installing mpitests-3.2.24...
Installing libxpmem0-2.7...
Installing libxpmem-dev-2.7...
Installing srptools-2410mlnx54...
Installing mlnx-ethtool-6.9...
Installing mlnx-iproute2-6.10.0...
Installing rshim-2.1.10...
Installing ibarr-0.1.3...
Installing libopenvswitch-2.17.8...
Installing openvswitch-common-2.17.8...
Installing openvswitch-switch-2.17.8...
Selecting previously unselected package mlnx-fw-updater.
(Reading database ... 74800 files and directories currently installed.)
Preparing to unpack .../mlnx-fw-updater_24.10-2.1.8.0_amd64.deb ...
Unpacking mlnx-fw-updater (24.10-2.1.8.0) ...
Setting up mlnx-fw-updater (24.10-2.1.8.0) ...

Added 'RUN_FW_UPDATER_ONBOOT=no to /etc/infiniband/openib.conf

Skipping FW update.
Device (01:00.0):
        01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
        Link Width: x16
        PCI Link Speed: 16GT/s

Device (01:00.1):
        01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
        Link Width: x16
        PCI Link Speed: 16GT/s

Installation passed successfully
To load the new driver, run:
/etc/init.d/openibd restart
Note: In order to load the new nvme-rdma and nvmet-rdma modules, the nvme module must be reloaded.

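After a reboot, versions 24.01-0.3.3.1 and earlier work normally.

Error: pci_hp_register failed

The newer 24.07-0.6.1.0 version, however, fails: ip addr shows that the CX5 NIC has disappeared.

dmesg shows the following error:

pci_hp_register failed with error -16

Upgrading the Linux kernel in this state triggers the prompt "Your system has UEFI Secure Boot enabled".

Following that clue, I disabled Secure Boot in the virtual machine's BIOS.

After a reboot, everything worked normally.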
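
The dmesg check for this failure can be scripted. The following is a minimal sketch; the function name has_hp_register_error is made up, and reading dmesg may require root on Debian.

```shell
#!/bin/sh
# Succeeds (exit 0) if the kernel log text on stdin contains the
# pci_hp_register failure described above.
has_hp_register_error() {
    grep -q 'pci_hp_register failed'
}

# dmesg errors (for example missing privileges) are suppressed, in which case
# the check simply reports that nothing was found.
if command -v dmesg >/dev/null 2>&1; then
    if dmesg 2>/dev/null | has_hp_register_error; then
        echo "pci_hp_register failure found: check whether Secure Boot is enabled"
    else
        echo "no pci_hp_register failure in the kernel log"
    fi
fi
```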

Post-installation

Checking the driver after installation

$ lsmod | grep mlx

mlx5_ib               495616  0
ib_uverbs             188416  1 mlx5_ib
ib_core               462848  2 ib_uverbs,mlx5_ib
mlx5_core            2441216  1 mlx5_ib
mlxfw                  36864  1 mlx5_core
psample                20480  1 mlx5_core
mlxdevm               188416  1 mlx5_core
mlx_compat             20480  6 mlxdevm,mlxfw,ib_core,ib_uverbs,mlx5_ib,mlx5_core
tls                   135168  1 mlx5_core
pci_hyperv_intf        16384  1 mlx5_core

Details of mlx5_core:

$ modinfo mlx5_core
filename:       /lib/modules/6.1.0-31-amd64/updates/dkms/mlx5_core.ko
alias:          auxiliary:mlx5_core.eth-rep
alias:          auxiliary:mlx5_core.eth
basedon:        Korg 6.8-rc4
version:        24.10-2.1.8
license:        Dual BSD/GPL
description:    Mellanox 5th generation network adapters (ConnectX series) core driver
author:         Eli Cohen <eli@mellanox.com>
srcversion:     78352976D87E8F24553D352
alias:          pci:v000015B3d0000A2DFsv*sd*bc*sc*i*
alias:          pci:v000015B3d0000A2DCsv*sd*bc*sc*i*
alias:          pci:v000015B3d0000A2D6sv*sd*bc*sc*i*
alias:          pci:v000015B3d0000A2D3sv*sd*bc*sc*i*
alias:          pci:v000015B3d0000A2D2sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001025sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001023sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001021sv*sd*bc*sc*i*
alias:          pci:v000015B3d0000101Fsv*sd*bc*sc*i*
alias:          pci:v000015B3d0000101Esv*sd*bc*sc*i*
alias:          pci:v000015B3d0000101Dsv*sd*bc*sc*i*
alias:          pci:v000015B3d0000101Csv*sd*bc*sc*i*
alias:          pci:v000015B3d0000101Bsv*sd*bc*sc*i*
alias:          pci:v000015B3d0000101Asv*sd*bc*sc*i*
alias:          pci:v000015B3d00001019sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001018sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001017sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001016sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001015sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001014sv*sd*bc*sc*i*
alias:          pci:v000015B3d00001013sv*sd*bc*sc*i*
alias:          auxiliary:mlx5_core.sf
depends:        mlxdevm,mlx_compat,tls,pci-hyperv-intf,psample,mlxfw
retpoline:      Y
name:           mlx5_core
vermagic:       6.1.0-31-amd64 SMP preempt mod_unload modversions
sig_id:         PKCS#7
signer:         DKMS module signing key
sig_key:        71:58:C0:4E:AD:2B:3B:86:A0:6C:5E:B4:1B:14:52:CB:B6:CA:D0:91
sig_hashalgo:   sha256
signature:      65:99:87:08:64:8C:B3:04:74:79:60:3E:83:4A:F8:48:96:C7:D4:3D:
                70:D6:32:B8:B3:6D:EE:CA:41:4D:CA:A4:8B:8F:60:79:17:81:21:40:
                41:87:E6:B9:25:C4:C8:1E:25:60:15:87:D3:F6:EA:D9:E6:CB:57:7B:
                DC:59:D1:C6:4E:6C:D3:80:BE:8E:D6:C0:1F:E3:1F:0F:07:8A:9B:E4:
                06:E7:7D:2C:27:C1:B6:47:01:5D:11:CC:DF:9A:14:63:0F:3F:84:6A:
                9F:22:AE:70:68:E8:7F:5F:AB:4D:DB:9B:C4:A1:E2:E6:9D:AD:E0:6C:
                BD:2D:50:4F:31:CA:29:C6:86:D0:D2:5E:B2:3F:44:7C:4E:23:A0:CD:
                E2:3C:2E:2A:DC:8E:7E:E1:B0:E8:7C:86:6D:78:29:E1:56:3A:91:BD:
                41:C4:40:09:0D:A1:75:70:0A:AC:44:AE:91:CB:26:64:1C:3C:F2:B5:
                E3:F6:0B:01:25:D9:D2:4E:5A:7D:EA:C4:86:4A:2F:93:0B:E5:AF:25:
                ED:D5:A3:50:99:94:A9:3B:8F:AD:B9:55:2D:A5:C3:E3:A5:A7:3B:F0:
                D0:A1:19:C9:8E:85:02:0E:A9:6A:54:94:41:C0:AE:E1:78:EB:D9:CB:
                A8:9E:9E:41:24:AF:D5:26:FF:DB:E5:08:0D:16:6B:0B
parm:           num_of_groups:Eswitch offloads number of big groups in FDB table. Valid range 1 - 1024. Default 15 (uint)
parm:           debug_mask:debug mask: 1 = dump cmd data, 2 = dump cmd exec time, 3 = both. Default=0 (uint)
parm:           prof_sel:profile selector. Valid range 0 - 4 (uint)

After the update, mlx5_core has been upgraded from the default in-tree build to 24.10-2.1.8:

$ modinfo mlx5_core |  grep version

version:        24.10-2.1.8
srcversion:     78352976D87E8F24553D352
vermagic:       6.1.0-31-amd64 SMP preempt mod_unload modversions
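
The module reports 24.10-2.1.8 while the bundle is named 24.10-2.1.8.0, so an exact string comparison would fail. The sketch below uses a prefix match instead. This is an assumption based only on the two version strings in this post, and version_matches is a hypothetical helper.

```shell
#!/bin/sh
# Succeeds if the module version is a non-empty prefix of the bundle version.
# Assumption: the module version is the bundle version minus its last field,
# as observed here (24.10-2.1.8 vs 24.10-2.1.8.0).
version_matches() {
    bundle_version=$1
    module_version=$2
    [ -n "$module_version" ] || return 1
    case "$bundle_version" in
        "$module_version"*) return 0 ;;
        *)                  return 1 ;;
    esac
}

# The in-tree module has no version field, so it is reported as "none".
if command -v modinfo >/dev/null 2>&1; then
    loaded=$(modinfo -F version mlx5_core 2>/dev/null || true)
    if version_matches "24.10-2.1.8.0" "$loaded"; then
        echo "mlx5_core $loaded matches the installed bundle"
    else
        echo "mlx5_core version is '${loaded:-none}', expected 24.10-2.1.8"
    fi
fi
```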