1 - Installing the driver on Debian 12
debian12
Download the driver
Download page: https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/
This gives you the file MLNX_OFED_LINUX-23.10-1.1.9.0-debian12.1-x86_64.tgz. Copy it to the Debian 12 machine with scp, then run:
su root
tar xvf MLNX_OFED_LINUX-23.10-1.1.9.0-debian12.1-x86_64.tgz
cd MLNX_OFED_LINUX-23.10-1.1.9.0-debian12.1-x86_64
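# Set PATH explicitly; otherwise the installer cannot find some important commands and fails (`su root` without `-` keeps the regular user's PATH, which lacks the sbin directories)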
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
./mlnxofedinstall --without-fw-update --with-nvmf --with-nfsrdma --ovs-dpdk --distro debian12.1
Note that --distro debian12.1 is required; otherwise the installer may fail with:
Error: The current MLNX_OFED_LINUX is intended for debian12.1
This happens because the Debian 12 I installed was already at 12.4, as the installer itself reports:
./mlnxofedinstall --print-distro
debian12.4
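The value for --distro can be read straight off the tarball name. The sketch below uses hypothetical helpers (not part of MLNX_OFED) and assumes the MLNX_OFED_LINUX-<version>-<build>-<distro>-<arch>.tgz naming seen above; it extracts the distro the package targets and prints the --distro flag only when that differs from what ./mlnxofedinstall --print-distro reports.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with MLNX_OFED). Assumes tarball names
# of the form MLNX_OFED_LINUX-<version>-<build>-<distro>-<arch>.tgz.

# Print the distro the package was built for, e.g. "debian12.1".
ofed_pkg_distro() {
    local name distro
    name=$(basename "$1")
    distro=$(printf '%s\n' "$name" |
        sed -E 's/^MLNX_OFED_LINUX-[^-]+-[^-]+-(.+)-[^-]+\.tgz$/\1/')
    # sed leaves the input unchanged when the pattern does not match
    [ "$distro" != "$name" ] || return 1
    printf '%s\n' "$distro"
}

# Print "--distro <pkg_distro>" only when the host distro (as reported by
# ./mlnxofedinstall --print-distro) differs from the package's target.
ofed_distro_flag() {
    local pkg_distro=$1 host_distro=$2
    if [ "$pkg_distro" != "$host_distro" ]; then
        printf -- '--distro %s\n' "$pkg_distro"
    fi
}

# Example, inside the extracted directory, as root:
#   pkg=MLNX_OFED_LINUX-23.10-1.1.9.0-debian12.1-x86_64.tgz
#   flag=$(ofed_distro_flag "$(ofed_pkg_distro "$pkg")" "$(./mlnxofedinstall --print-distro)")
#   ./mlnxofedinstall --without-fw-update $flag   # $flag unquoted on purpose: empty or two words
```

On the machine above this would yield "--distro debian12.1", since the package targets debian12.1 while the host reports debian12.4.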
The three options --with-nvmf, --with-nfsrdma and --ovs-dpdk are optional; I added them only because I wanted to learn and test these features.
2 - PVE 8.1 driver
Install the driver
PVE 8.1 is based on Debian 12, so the driver installation is very similar to Debian 12. Download the same driver package and run:
./mlnxofedinstall --without-fw-update --with-nvmf --with-nfsrdma --ovs-dpdk --distro debian12.1
......
Do you want to continue?[y/N]:y
Checking SW Requirements...
One or more required packages for installing MLNX_OFED_LINUX are missing.
Attempting to install the following missing packages:
pkg-config libnl-3-dev libgfortran5 flex m4 graphviz tcl ifupdown libltdl-dev uuid-runtime libnl-route-3-dev swig bison autoconf quilt gfortran lsb-base autotools-dev debhelper chrpath libipsec-mb1 automake tk
Failed command: apt-get install -y -o Dpkg::Options::='--force-confdef' -o Dpkg::Options::='--force-confold' pkg-config libnl-3-dev libgfortran5 flex m4 graphviz tcl ifupdown libltdl-dev uuid-runtime libnl-route-3-dev swig bison autoconf quilt gfortran lsb-base autotools-dev debhelper chrpath libipsec-mb1 automake tk
See /tmp/MLNX_OFED_LINUX.60098.logs/general.log#
The installation fails. As the message suggests, open /tmp/MLNX_OFED_LINUX.60098.logs/general.log, then run the failing apt-get install command on its own to see what goes wrong:
apt-get install -y -o Dpkg::Options::='--force-confdef' -o Dpkg::Options::='--force-confold' pkg-config libnl-3-dev libgfortran5 flex m4 graphviz tcl ifupdown libltdl-dev uuid-runtime libnl-route-3-dev swig bison autoconf quilt gfortran lsb-base autotools-dev debhelper chrpath libipsec-mb1 automake tk
......
W: (pve-apt-hook) !! WARNING !!
W: (pve-apt-hook) You are attempting to remove the meta-package 'proxmox-ve'!
W: (pve-apt-hook)
W: (pve-apt-hook) If you really want to permanently remove 'proxmox-ve' from your system, run the following command
W: (pve-apt-hook) touch '/please-remove-proxmox-ve'
W: (pve-apt-hook) run apt purge proxmox-ve to remove the meta-package
W: (pve-apt-hook) and repeat your apt invocation.
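Before creating /please-remove-proxmox-ve, it may be worth checking exactly what apt intends to remove. A minimal sketch, assuming apt-get's simulation mode (-s), which prints a "Remv <package> [<version>]" line for every package it would remove; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `apt-get -s` (simulate) output on stdin and
# print the name of every package apt would remove, one per line.
apt_sim_removals() {
    awk '$1 == "Remv" { print $2 }'
}

# Example (simulation mode does not install or remove anything):
#   apt-get install -s <package list from the failed command above> | apt_sim_removals
```

If the output lists proxmox-ve, the apt-get install above is what pulls the meta-package out.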
As the hook suggests, remove the proxmox-ve meta-package manually:
$ touch '/please-remove-proxmox-ve'
$ apt purge proxmox-ve
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages will be REMOVED:
proxmox-ve*
0 upgraded, 0 newly installed, 1 to remove and 0 not upgraded.
After this operation, 27.6 kB disk space will be freed.
Do you want to continue? [Y/n] y
W: (pve-apt-hook) '/please-remove-proxmox-ve' exists, proceeding with removal of package 'proxmox-ve'
(Reading database ... 121504 files and directories currently installed.)
Removing proxmox-ve (8.1.0) ...
(Reading database ... 121498 files and directories currently installed.)
Purging configuration files for proxmox-ve (8.1.0) ...
Then run the earlier apt-get install command again; this time it completes normally.
Running ./mlnxofedinstall again, it asks for --force, so the final command is:
./mlnxofedinstall --without-fw-update --with-nvmf --with-nfsrdma --ovs-dpdk --distro debian12.1 --force
The driver installed successfully.
Unfortunately, the installer also printed this warning: NFSoRDMA is not supported on the 6.5.11-7-pve kernel.
WARNING: NFSoRDMA is not supported over kernel 6.5.11-7-pve, will continue installation without it.
Problem
However, after a reboot the driver turned out to be unusable:
➜ ~ lsmod | grep mlx
mlxdevm 184320 0
mlxfw 36864 0
mlx_compat 20480 6 ib_ipoib,mlxdevm,ib_umad,ib_core,ib_uverbs,ib_cm
➜ ~ /etc/init.d/openibd
Usage: openibd {start|force-start|stop|force-stop|restart|force-restart|status}
➜ ~ /etc/init.d/openibd restart
Unloading HCA driver: [ OK ]
Failed loading kernel module mlx5_ib: [FAILED]
Loading Mellanox MLX5_IB HCA driver: [FAILED]
Failed loading kernel module mlx5_core: [FAILED]
Loading Mellanox MLX5 HCA driver: [FAILED]
Loading HCA driver and Access Layer: [FAILED]
Please run /usr/sbin/sysinfo-snapshot.py to collect the debug information
and open an issue in the http://support.mellanox.com/SupportWeb/service_center/SelfService
A Google search turned up this discussion:
https://forum.proxmox.com/threads/upgrade-7-to-8-connect-4-dkms-module-installed.139297/
It seems no extra driver needs to be installed; the driver that ships with the kernel is enough.
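To see whether the mlx5_core that gets loaded is the kernel's own or an out-of-tree build, the module path can be checked. A hedged sketch: it assumes Debian's default DKMS layout, where DKMS-built modules land under /lib/modules/<kernel>/updates/dkms/, while modules shipped with the kernel live under /lib/modules/<kernel>/kernel/; mod_origin is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a module path as printed by `modinfo -n`.
# Assumes Debian's default DKMS install location (.../updates/dkms/).
mod_origin() {
    case "$1" in
        */updates/*|*/extra/*) echo "out-of-tree" ;;  # DKMS or other third-party build
        */kernel/*)            echo "in-tree" ;;      # shipped with the running kernel
        *)                     echo "unknown" ;;      # e.g. "(builtin)" or module not found
    esac
}

# Example on the PVE host:
#   mod_origin "$(modinfo -n mlx5_core)"
#   dkms status      # lists DKMS-managed modules, if any
```

An "out-of-tree" result would suggest the OFED build is shadowing the in-kernel driver, which matches the situation described in the forum thread.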
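3 - Windows driver
Verified on Windows 10 and Windows Server 2022.
Installing on a physical machine
Download page: https://network.nvidia.com/products/adapter-software/ethernet/windows/winof-2/
Simply run the installer.
Installing in a virtual machine
Installing the driver inside a VM that has only this one NIC leads to a deadlock: without the NIC driver the VM cannot download anything, so it cannot download the NIC driver either.
The workaround is to prepare an ISO file containing the driver in advance and pass it into the VM by mounting the ISO as a CD.
The ISO can be created on each operating system as follows.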
Creating the ISO on macOS
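Steps:
- Put the driver files in a single folder.
- In the built-in Disk Utility, create an image from that folder, choosing "DVD/CD master" as the image format; this produces a .cdr file.
- Convert the .cdr file to an .iso file:
  hdiutil makehybrid -iso -joliet -o yourname.iso yourname.cdr
- Upload the .iso file to PVE.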