配置 CX-4 网卡以开启 SR-IOV
配置 Mellanox connextx-4 网卡配置以便在 pve 中开启网卡的 SR-IOV 功能
准备工作
更新网卡固件
对于 mcx4121 网卡,参考:
https://skyao.io/learning-computer-hardware/nic/cx4121a/firmware/
设置主板bios
需要在主板中设置开启以下功能:
- 虚拟机支持
- vt-d
- SR-IOV
安装 pve-headers
对于 PVE,还需要安装额外的 pve-headers,否则后续安装 mft 时会报错,提示要安装 linux-headers 和 linux-headers-generic:
./install.sh
-E- There are missing packages that are required for installation of MFT.
-I- You can install missing packages using: apt-get install linux-headers linux-headers-generic
安装:
apt install -y gcc make dkms
apt install -y pve-headers-$(uname -r)
apt install --fix-broken
安装完成之后,需要重启,否则直接安装 mft,依然会继续同样报错。
安装 mft
对于 cx4 / cx5 等新一点的网卡,可以安装 mft 最新版本。
mkdir -p ~/work/soft/mellanox
cd ~/work/soft/mellanox
wget --no-check-certificate https://www.mellanox.com/downloads/MFT/mft-4.27.0-83-x86_64-deb.tgz
tar xvf mft-4.27.0-83-x86_64-deb.tgz
cd mft-4.27.0-83-x86_64-deb
执行安装脚本:
./install.sh
输出为:
-I- Removing mft external packages installed on the machine
-I- Removing the packages: kernel-mft-dkms...
-I- Installing package: /root/temp/mft-4.27.0-83-x86_64-deb/SDEBS/kernel-mft-dkms_4.27.0-83_all.deb
-I- Installing package: /root/temp/mft-4.27.0-83-x86_64-deb/DEBS/mft_4.27.0-83_amd64.deb
-I- Installing package: /root/temp/mft-4.27.0-83-x86_64-deb/DEBS/mft-autocomplete_4.27.0-83_amd64.deb
下载 mlxup
cd ~/work/soft/mellanox
wget https://www.mellanox.com/downloads/firmware/mlxup/4.22.1/SFX/linux_x64/mlxup
chmod +x ./mlxup
./mlxup
能看到当前网卡的固件情况:
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: ConnectX3Pro
Part Number: 764285-B21_Ax
Description: HP InfiniBand FDR/Ethernet 10Gb/40Gb 2-port 544+FLR-QSFP Adapter
PSID: HP_1380110017
PCI Device Name: /dev/mst/mt4103_pci_cr0
Port1 GUID: c4346bffffdfe181
Port2 GUID: c4346bffffdfe182
Versions: Current Available
FW 2.42.5700 N/A
Status: No matching image found
安装 mstflint
apt install mstflint
设置网卡
启动 mft 工具
mst start
输出为:
Starting MST (Mellanox Software Tools) driver set
Loading MST PCI module - Success
Loading MST PCI configuration module - Success
Create devices
Unloading MST PCI module (unused) - Success
查看网卡配置
mst status
可以看到当前网卡信息(这是只插了一块 mcx4121a 网卡的情况):
MST modules:
------------
MST PCI module is not loaded
MST PCI configuration module loaded
MST devices:
------------
/dev/mst/mt4117_pciconf0 - PCI configuration cycles access.
domain:bus:dev.fn=0000:06:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
Chip revision is: 00
继续执行命令:
mlxconfig -d /dev/mst/mt4117_pciconf0 q
可以看到网卡的配置信息:
Device #1:
----------
Device type: ConnectX4LX
Name: MCX4121A-ACU_Ax
Description: ConnectX-4 Lx EN network interface card; 25GbE dual-port SFP28; PCIe3.0 x8; UEFI Enabled; tall bracket
Device: /dev/mst/mt4117_pciconf0
Configurations: Next Boot
MEMIC_BAR_SIZE 0
MEMIC_SIZE_LIMIT _256KB(1)
FLEX_PARSER_PROFILE_ENABLE 0
FLEX_IPV4_OVER_VXLAN_PORT 0
ROCE_NEXT_PROTOCOL 254
PF_NUM_OF_VF_VALID False(0)
NON_PREFETCHABLE_PF_BAR False(0)
VF_VPD_ENABLE False(0)
STRICT_VF_MSIX_NUM False(0)
VF_NODNIC_ENABLE False(0)
NUM_PF_MSIX_VALID True(1)
NUM_OF_VFS 8
NUM_OF_PF 2
SRIOV_EN True(1)
PF_LOG_BAR_SIZE 5
VF_LOG_BAR_SIZE 0
NUM_PF_MSIX 63
NUM_VF_MSIX 11
INT_LOG_MAX_PAYLOAD_SIZE AUTOMATIC(0)
PCIE_CREDIT_TOKEN_TIMEOUT 0
ACCURATE_TX_SCHEDULER False(0)
PARTIAL_RESET_EN False(0)
SW_RECOVERY_ON_ERRORS False(0)
RESET_WITH_HOST_ON_ERRORS False(0)
PCI_BUS0_RESTRICT_SPEED PCI_GEN_1(0)
PCI_BUS0_RESTRICT_ASPM False(0)
PCI_BUS0_RESTRICT_WIDTH PCI_X1(0)
PCI_BUS0_RESTRICT False(0)
PCI_DOWNSTREAM_PORT_OWNER Array[0..15]
CQE_COMPRESSION BALANCED(0)
IP_OVER_VXLAN_EN False(0)
MKEY_BY_NAME False(0)
UCTX_EN True(1)
PCI_ATOMIC_MODE PCI_ATOMIC_DISABLED_EXT_ATOMIC_ENABLED(0)
TUNNEL_ECN_COPY_DISABLE False(0)
LRO_LOG_TIMEOUT0 6
LRO_LOG_TIMEOUT1 7
LRO_LOG_TIMEOUT2 8
LRO_LOG_TIMEOUT3 13
ICM_CACHE_MODE DEVICE_DEFAULT(0)
TX_SCHEDULER_BURST 0
LOG_MAX_QUEUE 17
LOG_DCR_HASH_TABLE_SIZE 14
MAX_PACKET_LIFETIME 0
DCR_LIFO_SIZE 16384
ROCE_CC_PRIO_MASK_P1 255
ROCE_CC_PRIO_MASK_P2 255
CLAMP_TGT_RATE_AFTER_TIME_INC_P1 True(1)
CLAMP_TGT_RATE_P1 False(0)
RPG_TIME_RESET_P1 300
RPG_BYTE_RESET_P1 32767
RPG_THRESHOLD_P1 1
RPG_MAX_RATE_P1 0
RPG_AI_RATE_P1 5
RPG_HAI_RATE_P1 50
RPG_GD_P1 11
RPG_MIN_DEC_FAC_P1 50
RPG_MIN_RATE_P1 1
RATE_TO_SET_ON_FIRST_CNP_P1 0
DCE_TCP_G_P1 1019
DCE_TCP_RTT_P1 1
RATE_REDUCE_MONITOR_PERIOD_P1 4
INITIAL_ALPHA_VALUE_P1 1023
MIN_TIME_BETWEEN_CNPS_P1 4
CNP_802P_PRIO_P1 6
CNP_DSCP_P1 48
CLAMP_TGT_RATE_AFTER_TIME_INC_P2 True(1)
CLAMP_TGT_RATE_P2 False(0)
RPG_TIME_RESET_P2 300
RPG_BYTE_RESET_P2 32767
RPG_THRESHOLD_P2 1
RPG_MAX_RATE_P2 0
RPG_AI_RATE_P2 5
RPG_HAI_RATE_P2 50
RPG_GD_P2 11
RPG_MIN_DEC_FAC_P2 50
RPG_MIN_RATE_P2 1
RATE_TO_SET_ON_FIRST_CNP_P2 0
DCE_TCP_G_P2 1019
DCE_TCP_RTT_P2 1
RATE_REDUCE_MONITOR_PERIOD_P2 4
INITIAL_ALPHA_VALUE_P2 1023
MIN_TIME_BETWEEN_CNPS_P2 4
CNP_802P_PRIO_P2 6
CNP_DSCP_P2 48
LLDP_NB_DCBX_P1 False(0)
LLDP_NB_RX_MODE_P1 OFF(0)
LLDP_NB_TX_MODE_P1 OFF(0)
LLDP_NB_DCBX_P2 False(0)
LLDP_NB_RX_MODE_P2 OFF(0)
LLDP_NB_TX_MODE_P2 OFF(0)
ROCE_RTT_RESP_DSCP_P1 0
ROCE_RTT_RESP_DSCP_MODE_P1 DEVICE_DEFAULT(0)
ROCE_RTT_RESP_DSCP_P2 0
ROCE_RTT_RESP_DSCP_MODE_P2 DEVICE_DEFAULT(0)
DCBX_IEEE_P1 True(1)
DCBX_CEE_P1 True(1)
DCBX_WILLING_P1 True(1)
DCBX_IEEE_P2 True(1)
DCBX_CEE_P2 True(1)
DCBX_WILLING_P2 True(1)
KEEP_ETH_LINK_UP_P1 True(1)
KEEP_IB_LINK_UP_P1 False(0)
KEEP_LINK_UP_ON_BOOT_P1 False(0)
KEEP_LINK_UP_ON_STANDBY_P1 False(0)
DO_NOT_CLEAR_PORT_STATS_P1 False(0)
AUTO_POWER_SAVE_LINK_DOWN_P1 False(0)
KEEP_ETH_LINK_UP_P2 True(1)
KEEP_IB_LINK_UP_P2 False(0)
KEEP_LINK_UP_ON_BOOT_P2 False(0)
KEEP_LINK_UP_ON_STANDBY_P2 False(0)
DO_NOT_CLEAR_PORT_STATS_P2 False(0)
AUTO_POWER_SAVE_LINK_DOWN_P2 False(0)
NUM_OF_VL_P1 _4_VLs(3)
NUM_OF_TC_P1 _8_TCs(0)
NUM_OF_PFC_P1 8
VL15_BUFFER_SIZE_P1 0
NUM_OF_VL_P2 _4_VLs(3)
NUM_OF_TC_P2 _8_TCs(0)
NUM_OF_PFC_P2 8
VL15_BUFFER_SIZE_P2 0
DUP_MAC_ACTION_P1 LAST_CFG(0)
SRIOV_IB_ROUTING_MODE_P1 LID(1)
IB_ROUTING_MODE_P1 LID(1)
DUP_MAC_ACTION_P2 LAST_CFG(0)
SRIOV_IB_ROUTING_MODE_P2 LID(1)
IB_ROUTING_MODE_P2 LID(1)
PHY_FEC_OVERRIDE_P1 DEVICE_DEFAULT(0)
PHY_FEC_OVERRIDE_P2 DEVICE_DEFAULT(0)
PF_SD_GROUP 0
ROCE_CONTROL ROCE_ENABLE(2)
PCI_WR_ORDERING per_mkey(0)
MULTI_PORT_VHCA_EN False(0)
PORT_OWNER True(1)
ALLOW_RD_COUNTERS True(1)
RENEG_ON_CHANGE True(1)
TRACER_ENABLE True(1)
IP_VER IPv4(0)
BOOT_UNDI_NETWORK_WAIT 0
UEFI_HII_EN True(1)
BOOT_DBG_LOG False(0)
UEFI_LOGS DISABLED(0)
BOOT_VLAN 1
LEGACY_BOOT_PROTOCOL PXE(1)
BOOT_INTERRUPT_DIS False(0)
BOOT_LACP_DIS True(1)
BOOT_VLAN_EN False(0)
BOOT_PKEY 0
DYNAMIC_VF_MSIX_TABLE False(0)
ADVANCED_PCI_SETTINGS False(0)
SAFE_MODE_THRESHOLD 10
SAFE_MODE_ENABLE True(1)
修改网卡配置
mlxconfig -d /dev/mst/mt4117_pciconf0 set SRIOV_EN=1 NUM_OF_VFS=8
对于有多块网卡的机器,可以根据需要决定是否为其中的一块或者多块网卡开启 SR-IOV ,然后需要开启的设置 SRIOV_EN=1 ,不需要开启的设置 SRIOV_EN=0。
重启机器。
开启 SR-IOV
开启前的检查
查看网卡情况:
ip addr
enp6s0f0np0 和 enp6s0f1np1 为这块 cx4 lx 双头网卡的两个网口,这里我们使用了 enp6s0f1np1:
2: enp6s0f0np0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether b8:ce:f6:0b:d9:1c brd ff:ff:ff:ff:ff:ff
3: enp6s0f1np1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP group default qlen 1000
link/ether b8:ce:f6:0b:d9:1d brd ff:ff:ff:ff:ff:ff
查看网卡 pci 情况:
lspci -k | grep -i ethernet
可以看到 enp6s0f1np1 这个网卡对应的是 06:00.1 :
06:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
06:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
执行命令查看网卡的 vfs 设置:
cat /sys/bus/pci/devices/0000:06:00.1/sriov_totalvfs
8
开启的脚本
新建文件 cx4sriov.service:
cd /etc/systemd/system
vi cx4sriov.service
内容为(注意替换 enp6s0f1np1):
[Unit]
Description=Script to enable SR-IOV for cx4 NIC on boot
[Service]
Type=simple
start SR-IOV
ExecStartPre=/usr/bin/bash -c '/usr/bin/echo 8 > /sys/class/net/enp6s0f1np1/device/sriov_numvfs'
set VF MAC
ExecStartPre=/usr/bin/bash -c '/usr/bin/ip link set dev enp6s0f1np1 vf 0 mac 00:80:00:00:00:00'
ExecStartPre=/usr/bin/bash -c '/usr/bin/ip link set dev enp6s0f1np1 vf 1 mac 00:80:00:00:00:01'
ExecStartPre=/usr/bin/bash -c '/usr/bin/ip link set dev enp6s0f1np1 vf 2 mac 00:80:00:00:00:02'
ExecStartPre=/usr/bin/bash -c '/usr/bin/ip link set dev enp6s0f1np1 vf 3 mac 00:80:00:00:00:03'
ExecStartPre=/usr/bin/bash -c '/usr/bin/ip link set dev enp6s0f1np1 vf 4 mac 00:80:00:00:00:04'
ExecStartPre=/usr/bin/bash -c '/usr/bin/ip link set dev enp6s0f1np1 vf 5 mac 00:80:00:00:00:05'
ExecStartPre=/usr/bin/bash -c '/usr/bin/ip link set dev enp6s0f1np1 vf 6 mac 00:80:00:00:00:06'
ExecStartPre=/usr/bin/bash -c '/usr/bin/ip link set dev enp6s0f1np1 vf 7 mac 00:80:00:00:00:07'
set PF up
ExecStartPre=/usr/bin/bash -c '/usr/bin/ip link set enp6s0f1np1 up'
set VF up
ExecStartPre=/usr/bin/bash -c '/usr/bin/ip link set enp6s0f1np1v0 up'
ExecStartPre=/usr/bin/bash -c '/usr/bin/ip link set enp6s0f1np1v1 up'
ExecStartPre=/usr/bin/bash -c '/usr/bin/ip link set enp6s0f1np1v2 up'
ExecStartPre=/usr/bin/bash -c '/usr/bin/ip link set enp6s0f1np1v3 up'
ExecStartPre=/usr/bin/bash -c '/usr/bin/ip link set enp6s0f1np1v4 up'
ExecStartPre=/usr/bin/bash -c '/usr/bin/ip link set enp6s0f1np1v5 up'
ExecStartPre=/usr/bin/bash -c '/usr/bin/ip link set enp6s0f1np1v6 up'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set enp6s0f1np1v7 up'
Restart=on-failure
[Install]
WantedBy=multi-user.target
启用 cx4sriov.service:
systemctl daemon-reload
systemctl enable cx4sriov.service
reboot
重启之后查看网卡信息:
$ lspci -k | grep -i ethernet
06:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
06:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
06:01.2 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
06:01.3 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
06:01.4 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
06:01.5 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
06:01.6 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
06:01.7 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
06:02.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
06:02.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host noprefixroute
valid_lft forever preferred_lft forever
2: enp6s0f0np0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether b8:ce:f6:0b:d9:1c brd ff:ff:ff:ff:ff:ff
3: enp6s0f1np1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP group default qlen 1000
link/ether b8:ce:f6:0b:d9:1d brd ff:ff:ff:ff:ff:ff
4: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether b8:ce:f6:0b:d9:1d brd ff:ff:ff:ff:ff:ff
inet 192.168.0.109/24 scope global vmbr0
valid_lft forever preferred_lft forever
inet6 fe80::bace:f6ff:fe0b:d91d/64 scope link
valid_lft forever preferred_lft forever
5: enp6s0f1v0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether ae:0a:fe:86:e3:9a brd ff:ff:ff:ff:ff:ff
6: enp6s0f1v1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 12:e3:9e:f9:22:b6 brd ff:ff:ff:ff:ff:ff
7: enp6s0f1v2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 2a:f5:00:14:0f:cc brd ff:ff:ff:ff:ff:ff
8: enp6s0f1v3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether ba:5d:6c:3d:4a:68 brd ff:ff:ff:ff:ff:ff
9: enp6s0f1v4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 86:e2:c0:ec:a3:d8 brd ff:ff:ff:ff:ff:ff
10: enp6s0f1v5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether aa:52:93:a7:01:af brd ff:ff:ff:ff:ff:ff
11: enp6s0f1v6: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether be:b2:59:79:13:9f brd ff:ff:ff:ff:ff:ff
12: enp6s0f1v7: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 3a:07:60:b5:04:74 brd ff:ff:ff:ff:ff:ff