1 - Adjust the swap settings

Tune the swappiness parameter so the system prefers physical memory over swap

Open the file:

vi /etc/sysctl.conf

Append the following at the end of the file:

vm.swappiness = 1

This controls how readily the kernel moves memory to swap; the smaller the number, the more physical memory is used before swapping. Then run:

sysctl -p

to make the configuration take effect.
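As a quick rehearsal of the change, the sketch below applies the setting idempotently to an empty scratch file; on the real host the target would be /etc/sysctl.conf followed by sysctl -p.

```shell
# Append vm.swappiness = 1 only if it is not already set.
# CONF is a scratch file here; on the host it would be /etc/sysctl.conf.
set -eu
CONF=$(mktemp)

if ! grep -q '^vm.swappiness' "$CONF"; then
    echo 'vm.swappiness = 1' >> "$CONF"
fi

grep '^vm.swappiness' "$CONF"   # -> vm.swappiness = 1
# On the real file, follow up with: sysctl -p
```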

2 - Fix the locale settings

Fix the locale settings to avoid warnings

Modify the configuration

After a default installation, you may sometimes run into locale warnings like this:

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LC_IDENTIFICATION = "zh_CN.UTF-8",
	LC_NUMERIC = "zh_CN.UTF-8",
	LC_TIME = "zh_CN.UTF-8",
	LC_PAPER = "zh_CN.UTF-8",
	LC_MONETARY = "zh_CN.UTF-8",
	LC_TELEPHONE = "zh_CN.UTF-8",
	LC_MEASUREMENT = "zh_CN.UTF-8",
	LC_NAME = "zh_CN.UTF-8",
	LC_ADDRESS = "zh_CN.UTF-8",
	LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").

The simplest fix:

vi /etc/default/locale

Change the contents to:

LC_CTYPE="en_US.UTF-8"
LC_ALL="en_US.UTF-8"
LANG="en_US.UTF-8"
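A hedged sketch of the same fix, exercised on a scratch file; on the host you would write to /etc/default/locale, and the locale should already be generated (e.g. via locale-gen en_US.UTF-8).

```shell
# Write the three locale variables, then count them as a sanity check.
set -eu
LOCALE_FILE=$(mktemp)   # stands in for /etc/default/locale
cat > "$LOCALE_FILE" <<'EOF'
LC_CTYPE="en_US.UTF-8"
LC_ALL="en_US.UTF-8"
LANG="en_US.UTF-8"
EOF
grep -c 'en_US.UTF-8' "$LOCALE_FILE"   # -> 3
```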


3 - Merge the local-lvm storage into local

Merge the local-lvm storage into local

Remove local-lvm

lvremove pve/data

Output:

lvremove pve/data
Do you really want to remove active logical volume pve/data? [y/n]: y
  Logical volume "data" successfully removed.

Extend local

lvextend -l +100%FREE -r pve/root

The output is:

  Size of logical volume pve/root changed from 96.00 GiB (24576 extents) to 885.25 GiB (226624 extents).
  Logical volume pve/root successfully resized.
resize2fs 1.47.0 (5-Feb-2023)
Filesystem at /dev/mapper/pve-root is mounted on /; on-line resizing required
old_desc_blocks = 12, new_desc_blocks = 111
The filesystem on /dev/mapper/pve-root is now 232062976 (4k) blocks long.
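The reported numbers can be cross-checked by hand: LVM extents here are 4 MiB and ext4 blocks are 4 KiB, so both figures should agree.

```shell
# 226624 extents x 4 MiB, in GiB (integer part of 885.25)
echo $((226624 * 4 / 1024))             # -> 885
# 232062976 ext4 blocks x 4 KiB, in GiB (integer part)
echo $((232062976 * 4 / 1024 / 1024))   # -> 885
```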

When done, check the result:

df -Th  

/dev/mapper/pve-root now shows 871G total, with 831G available.

Filesystem           Type      Size  Used Avail Use% Mounted on
udev                 devtmpfs   63G     0   63G   0% /dev
tmpfs                tmpfs      13G  1.9M   13G   1% /run
/dev/mapper/pve-root ext4      871G  4.0G  831G   1% /
tmpfs                tmpfs      63G   66M   63G   1% /dev/shm
tmpfs                tmpfs     5.0M     0  5.0M   0% /run/lock
/dev/nvme0n1p2       vfat     1022M  344K 1022M   1% /boot/efi
/dev/fuse            fuse      128M   48K  128M   1% /etc/pve
tmpfs                tmpfs      13G     0   13G   0% /run/user/0

Compared with before the change:

df -Th
Filesystem           Type      Size  Used Avail Use% Mounted on
udev                 devtmpfs   63G     0   63G   0% /dev
tmpfs                tmpfs      13G  2.0M   13G   1% /run
/dev/mapper/pve-root ext4       94G  4.0G   86G   5% /
tmpfs                tmpfs      63G   66M   63G   1% /dev/shm
tmpfs                tmpfs     5.0M     0  5.0M   0% /run/lock
/dev/nvme0n1p2       vfat     1022M  344K 1022M   1% /boot/efi
/dev/fuse            fuse      128M   48K  128M   1% /etc/pve
tmpfs                tmpfs      13G     0   13G   0% /run/user/0

Delete local-lvm in the web UI

In the management UI, go to “Datacenter” -> “Storage”, select local-lvm, and click Remove.

Edit the local storage

Under “Datacenter” -> “Storage”, select local, edit Content, and enable all content types.


4 - Clean up unused kernels

Remove kernels that are no longer in use

Script-based cleanup

https://tteck.github.io/Proxmox/

Find the Proxmox VE Kernel Clean script there and run:

bash -c "$(wget -qLO - https://github.com/tteck/Proxmox/raw/main/misc/kernel-clean.sh)"

You can also download the script locally so it is easy to run again later:

mkdir -p ~/work/soft/pve
cd ~/work/soft/pve
wget https://github.com/tteck/Proxmox/raw/main/misc/kernel-clean.sh
chmod +x kernel-clean.sh

From then on, just run:

~/work/soft/pve/kernel-clean.sh

Manual cleanup

See: https://asokolsky.github.io/proxmox/kernels.html
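As a sketch of the manual approach, the selection logic is simply "every installed kernel except the one currently running". The version strings below are hard-coded stand-ins; on a real host RUNNING would come from uname -r and INSTALLED from dpkg -l, and the package name pattern varies by PVE version.

```shell
# List kernel versions that are candidates for removal,
# always keeping the running kernel.
set -eu
RUNNING="6.2.16-4-pve"                                 # stand-in for $(uname -r)
INSTALLED="6.2.16-3-pve 6.2.16-4-pve 5.15.108-1-pve"   # stand-in for dpkg -l output

for k in $INSTALLED; do
    if [ "$k" != "$RUNNING" ]; then
        echo "candidate: $k"
    fi
done
# Then purge each candidate, e.g.: apt purge pve-kernel-<version>
```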

5 - Install and configure timeshift

Install and configure timeshift to back up and restore PVE

When installing PVE earlier, I chose to leave about 50G of space unallocated for timeshift.

Prepare a partition for timeshift

Check the current disk partitions

fdisk -l

The current disk looks like this:

Disk /dev/nvme0n1: 465.76 GiB, 500107862016 bytes, 976773168 sectors
Disk model: WDC WDS500G1B0C-00S6U0                  
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 000372CE-8B4D-4B12-974B-99C231DB3D67

Device           Start       End   Sectors  Size Type
/dev/nvme0n1p1      34      2047      2014 1007K BIOS boot
/dev/nvme0n1p2    2048   2099199   2097152    1G EFI System
/dev/nvme0n1p3 2099200 838860800 836761601  399G Linux LVM


Disk /dev/mapper/pve-swap: 8 GiB, 8589934592 bytes, 16777216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/pve-root: 96 GiB, 103079215104 bytes, 201326592 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

This is a 480G NVMe SSD with 465G actually usable. When installing PVE I set hdsize to 400G, so PVE only uses 400G of it. From the output above:

/dev/nvme0n1p1      34      2047      2014 1007K BIOS boot
/dev/nvme0n1p2    2048   2099199   2097152    1G EFI System
/dev/nvme0n1p3 2099200 838860800 836761601  399G Linux LVM

nvme0n1p1 is the BIOS boot partition and nvme0n1p2 is the EFI partition; PVE uses nvme0n1p3, which is 399G.

It also shows:

Disk /dev/mapper/pve-swap: 8 GiB, 8589934592 bytes, 16777216 sectors
Disk /dev/mapper/pve-root: 96 GiB, 103079215104 bytes, 201326592 sectors

swap takes 8G and the PVE root takes 96G.

lsblk -f

shows that nvme0n1p3 is an LVM2 physical volume; its 400G is divided internally into pve-swap, pve-root, and pve-data:


NAME               FSTYPE      FSVER    LABEL UUID                                   FSAVAIL FSUSE% MOUNTPOINTS
nvme0n1                                                                                             
|-nvme0n1p1                                                                                         
|-nvme0n1p2        vfat        FAT32          26F2-091E                              1021.6M     0% /boot/efi
`-nvme0n1p3        LVM2_member LVM2 001       foXyBZ-IPQT-wOCF-DWSN-u88a-sBVD-LaZ9zn                
  |-pve-swap       swap        1              c5d97cb5-4d18-4f43-a36f-fb67eb8dcc84                  [SWAP]
  |-pve-root       ext4        1.0            a0a46add-dda5-42b4-a978-97d363eeddd0     85.2G     4% /
  |-pve-data_tmeta                                                                                  
  | `-pve-data                                                                                      
  `-pve-data_tdata                                                                                  
    `-pve-data   

The remaining roughly 65G of unpartitioned space does not show up here.

Create the timeshift partition

fdisk /dev/nvme0n1

Output:

Welcome to fdisk (util-linux 2.38.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

This disk is currently in use - repartitioning is probably a bad idea.
It's recommended to umount all file systems, and swapoff all swap
partitions on this disk.

fdisk warns that the disk is currently in use and that repartitioning is probably a bad idea. Since we are only operating on the unallocated space, it is reasonably safe.

Enter m for help:

Command (m for help): m

Help:

  GPT
   M   enter protective/hybrid MBR

  Generic
   d   delete a partition
   F   list free unpartitioned space
   l   list known partition types
   n   add a new partition
   p   print the partition table
   t   change a partition type
   v   verify the partition table
   i   print information about a partition

  Misc
   m   print this menu
   x   extra functionality (experts only)

  Script
   I   load disk layout from sfdisk script file
   O   dump disk layout to sfdisk script file

  Save & Exit
   w   write table to disk and exit
   q   quit without saving changes

  Create a new label
   g   create a new empty GPT partition table
   G   create a new empty SGI (IRIX) partition table
   o   create a new empty MBR (DOS) partition table
   s   create a new empty Sun partition table

Enter F to list the free, unpartitioned space:

Command (m for help): F

Unpartitioned space /dev/nvme0n1: 65.76 GiB, 70610066944 bytes, 137910287 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes

    Start       End   Sectors  Size
838862848 976773134 137910287 65.8G
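The free-space figure is easy to verify from the sector count (sectors are 512 bytes on this disk):

```shell
# 137910287 free sectors x 512 bytes, as reported by fdisk's F command.
SECTORS=137910287
BYTES=$((SECTORS * 512))
echo "$BYTES"                          # -> 70610066944, matching fdisk
echo $((BYTES / 1024 / 1024 / 1024))   # -> 65 (i.e. ~65.8 GiB)
```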

This 65.8G is the space we reserved earlier.

Enter n to add a new partition; the default is to use all of the remaining space, so pressing Enter three times to accept the defaults is enough.

Command (m for help): n
Partition number (4-128, default 4): 
First sector (838860801-976773134, default 838862848): 
Last sector, +/-sectors or +/-size{K,M,G,T,P} (838862848-976773134, default 976773119): 

Created a new partition 4 of type 'Linux filesystem' and of size 65.8 GiB.

To use only part of the free space instead, enter a value such as +100G at the last-sector prompt to set the size of the new partition.

Enter p to print the modified partition table:

Command (m for help): p
Disk /dev/nvme0n1: 465.76 GiB, 500107862016 bytes, 976773168 sectors
Disk model: WDC WDS500G1B0C-00S6U0                  
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 000372CE-8B4D-4B12-974B-99C231DB3D67

Device             Start       End   Sectors  Size Type
/dev/nvme0n1p1        34      2047      2014 1007K BIOS boot
/dev/nvme0n1p2      2048   2099199   2097152    1G EFI System
/dev/nvme0n1p3   2099200 838860800 836761601  399G Linux LVM
/dev/nvme0n1p4 838862848 976773119 137910272 65.8G Linux filesystem

The newly added nvme0n1p4 partition is 65.8G.

After checking that everything is correct, enter w to write the changes and exit.

Command (m for help): w
The partition table has been altered.
Syncing disks.

Back in the PVE shell, run

fdisk -l 

to check the partitions again:

......
Device             Start       End   Sectors  Size Type
/dev/nvme0n1p1        34      2047      2014 1007K BIOS boot
/dev/nvme0n1p2      2048   2099199   2097152    1G EFI System
/dev/nvme0n1p3   2099200 838860800 836761601  399G Linux LVM
/dev/nvme0n1p4 838862848 976773119 137910272 65.8G Linux filesystem

The timeshift partition is now ready.

Format the timeshift partition

mkfs.ext4 /dev/nvme0n1p4

formats the partition as ext4:


mke2fs 1.47.0 (5-Feb-2023)
Discarding device blocks: done                            
Creating filesystem with 17238784 4k blocks and 4317184 inodes
Filesystem UUID: 2411eb4e-67f1-4c7d-b633-17df1fa0c127
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
	4096000, 7962624, 11239424

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (131072 blocks): done
Writing superblocks and filesystem accounting information: done 

Install timeshift

For details see:

https://skyao.io/learning-ubuntu-server/docs/installation/timeshift/install/

After installation and configuration, run

timeshift --list

to check the timeshift state:


First run mode (config file not found)
Selected default snapshot type: RSYNC
Mounted '/dev/nvme0n1p4' at '/run/timeshift/5364/backup'
Device : /dev/nvme0n1p4
UUID   : 2411eb4e-67f1-4c7d-b633-17df1fa0c127
Path   : /run/timeshift/5364/backup
Mode   : RSYNC
Status : No snapshots on this device
First snapshot requires: 0 B

No snapshots found

Back up with timeshift

Set up scheduled backups

vi /etc/timeshift/timeshift.json

Open the timeshift config file and adjust the following entries:

{
  "schedule_monthly" : "true",
  "schedule_weekly" : "true",
  "schedule_daily" : "true",
  "schedule_hourly" : "false",
  "schedule_boot" : "false",
  "count_monthly" : "2",
  "count_weekly" : "3",
  "count_daily" : "5",
  "count_hourly" : "6",
  "count_boot" : "5"
}

For a soft-router machine that stays on all the time and changes little, I set schedule_monthly / schedule_weekly / schedule_daily to true and keep the default retention counts.

Exclude content that should not be backed up

By default, the timeshift config file already sets some excludes:

{
  "exclude" : [
    "/var/lib/ceph/**",
    "/root/**"
  ],
  "exclude-apps" : []
}
  • /var/lib/ceph/: ceph-related files

  • /root/: the root user's home directory; if other users exist, their home directories should be added as well.

These defaults are not enough; the following also need to be added:

  • /var/lib/vz/

    PVE keeps its files here: virtual machines, templates, backups, uploaded ISO files, and so on. These files are far too large to back up with timeshift, so they must be excluded.

    ls /var/lib/vz
    dump  images  private  snippets  template
    
    • dump: the VM images saved by PVE backup
    • images: the PVE virtual machine disk images
    • template: its iso subdirectory holds the iso/img files uploaded to PVE
  • /etc/pve/qemu-server/: the PVE VM config files. Do not back these up with timeshift either, to avoid a restore overwriting them; they are backed up by the automated script covered in the next section.

  • /root/data/backup/pve/pveConfBackup: the automatic backup directory for the VM config files; see the next section.

  • /mnt/pve: when PVE uses NFS storage, the remote NFS shares are mounted automatically under this directory; these files must certainly not be backed up by timeshift, so exclude them too.

The final exclude setting is:

{
  ......
  "exclude" : [
    "/var/lib/ceph/**",
    "/root/**",
    "/var/lib/vz/**",
    "/etc/pve/qemu-server/**",
    "/root/data/backup/pve/pveConfBackup/**",
    "/mnt/pve/**"
  ],
}
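A small sanity check (a sketch, run here against a scratch copy; on the host, point CONF at /etc/timeshift/timeshift.json) can confirm every required path made it into the exclude list:

```shell
# Verify that each required exclude entry is present in the config.
set -eu
CONF=$(mktemp)   # scratch copy standing in for /etc/timeshift/timeshift.json
cat > "$CONF" <<'EOF'
{
  "exclude" : [
    "/var/lib/ceph/**",
    "/root/**",
    "/var/lib/vz/**",
    "/etc/pve/qemu-server/**",
    "/root/data/backup/pve/pveConfBackup/**",
    "/mnt/pve/**"
  ]
}
EOF
for p in '/var/lib/vz/**' '/etc/pve/qemu-server/**' '/mnt/pve/**'; do
    grep -qF "\"$p\"" "$CONF" && echo "ok: $p"
done
```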

Set up automatic backup of the VM config files

See the next chapter; it is recommended to complete that setup before doing the first manual backup.

Manual backup

timeshift --create --comments "first backup after apt upgrade and basic soft install"

The output is:

Estimating system size...
Creating new snapshot...(RSYNC)
Saving to device: /dev/nvme0n1p4, mounted at path: /run/timeshift/6888/backup
Syncing files with rsync...
Created control file: /run/timeshift/6888/backup/timeshift/snapshots/2023-07-25_21-10-04/info.json
RSYNC Snapshot saved successfully (41s)
Tagged snapshot '2023-07-25_21-10-04': ondemand
------------------------------------------------------------------------------

Check the current snapshots:

timeshift --list                                                                      
Mounted '/dev/nvme0n1p4' at '/run/timeshift/7219/backup'
Device : /dev/nvme0n1p4
UUID   : 2411eb4e-67f1-4c7d-b633-17df1fa0c127
Path   : /run/timeshift/7219/backup
Mode   : RSYNC
Status : OK
1 snapshots, 64.5 GB free

Num     Name                 Tags  Description                                            
------------------------------------------------------------------------------
0    >  2023-07-25_21-10-04  O     first backup after apt upgrade and basic soft install  

The timeshift partition seen earlier is 65.8G; subtracting the 64.5 GB shown free here, this backup used about 1.3G of disk space.

6 - Enable automatic backup of the VM config files

Back up the PVE virtual machine config files so they can be restored when needed

Preparation

Create the backup directory

Create a directory to hold the backups. To keep them out of timeshift's way, put the directory under the root user's home directory (/root): since the timeshift excludes above contain /root/**, anything placed there survives a timeshift restore untouched.

Create the backup directory /root/data/backup/pve:

mkdir -p /root/data/backup/pve
cd /root/data/backup/pve

Enable VM conf file backup

Use the pvetools script to enable “automatic VM conf file backup”:

/root/work/soft/pvetools/pvetools.sh

Enter the backup path prepared above: /root/data/backup/pve

Enter 8 for the number of backups to keep. After that, the VM config files are backed up to this directory automatically.

You can compare the source data with the backed-up files:

  • The files to be backed up:

    $ ls /etc/pve/qemu-server/                       
    1000.conf  1001.conf
    
  • The resulting backup directory:

    $ ls /root/data/backup/pve/pveConfBackup 
    20240315
    $ ls /root/data/backup/pve/pveConfBackup/20240315 
    1000.conf  1001.conf
    

Restore the VM config files

If the VM config files are lost (the typical case being a restore from a timeshift snapshot), they can be recovered from these backups.

Find the backed-up files:

$ cd /root/data/backup/pve/pveConfBackup/
$ ls 
20240315
$ cd 20240315                            
$ ls                      
1000.conf  1001.conf

Copy them back into /etc/pve/qemu-server/:

rm /etc/pve/qemu-server/*
cp /root/data/backup/pve/pveConfBackup/20240315/* /etc/pve/qemu-server/
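The restore above deletes the live configs outright before copying; a slightly safer variant stages the current files aside first. A sketch of that logic, exercised on scratch directories (the real paths are the ones used in this article):

```shell
set -eu
QEMU_DIR=$(mktemp -d)    # stands in for /etc/pve/qemu-server
BACKUP_DIR=$(mktemp -d)  # stands in for .../pveConfBackup/20240315
STAGE=$(mktemp -d)       # holding area for whatever was there before

echo 'onboot: 1' > "$BACKUP_DIR/1000.conf"   # the good, backed-up config
echo 'broken'    > "$QEMU_DIR/1000.conf"     # the damaged live config

cp "$QEMU_DIR"/* "$STAGE"/      # keep a copy of the current state
cp "$BACKUP_DIR"/* "$QEMU_DIR"/ # then restore from backup

cat "$QEMU_DIR/1000.conf"       # -> onboot: 1
```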

Configure the timeshift excludes

Note: the VM config backup path must be added to the timeshift excludes; otherwise a timeshift restore could wipe out the backed-up data.

{
  ......
  "exclude" : [
    ......
    "/root/data/backup/pve/pveConfBackup/**",
    ......
  ],
}

7 - Restore system settings

Restore the PVE system settings when necessary

8 - Repair the system from rescue mode

Use PVE's rescue mode to repair the system when necessary

Background

A bad configuration can sometimes leave the PVE system unusable, to the point where neither the system nor the web console can be reached.

When that happens, another way to repair the system is needed.

Rescue mode

Enter rescue mode

At boot, choose “Advanced options for Proxmox VE GNU/Linux”, then select Rescue mode.

Log in with the root password.

Start the cluster

systemctl start pve-cluster

After that, the /etc/pve directory is available for managing the cluster and the virtual machines.

Manage virtual machines

For example, stop a misbehaving VM:

qm stop <vmid>

Edit a VM's configuration

cd /etc/pve/nodes/<node_name>/qemu-server

The config files for each VM are here; open the relevant file and edit it as needed. For example, to stop a VM from starting at boot, set onboot to 0.
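For the onboot example, the edit can also be done non-interactively with sed. The sketch below exercises it on a scratch file; the real file would be /etc/pve/nodes/<node_name>/qemu-server/<vmid>.conf.

```shell
set -eu
CONF=$(mktemp)   # stands in for a <vmid>.conf file
printf 'onboot: 1\ncores: 4\n' > "$CONF"

# Disable start-at-boot by flipping onboot to 0.
sed -i 's/^onboot: 1$/onboot: 0/' "$CONF"
grep '^onboot' "$CONF"   # -> onboot: 0
```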

9 - Pin the NIC names

Pin the NIC names so that changes in PCIe ordering cannot rename the interfaces

In Proxmox VE, network interface names may be reordered when PCIe devices change; this is a known problem. It can be avoided by creating persistent naming rules that pin the NIC names.

The problem

Run

ip addr

which shows:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UP group default qlen 1000
    link/ether 40:8d:5c:b4:9b:e2 brd ff:ff:ff:ff:ff:ff
    altname enp0s25
4: ens4f0np0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master vmbr0 state DOWN group default qlen 1000
    link/ether 0c:42:a1:62:1a:88 brd ff:ff:ff:ff:ff:ff
    altname enp5s0f0np0
5: ens4f1np1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master vmbr0 state DOWN group default qlen 1000
    link/ether 0c:42:a1:62:1a:89 brd ff:ff:ff:ff:ff:ff
    altname enp5s0f1np1
6: enp6s0f0np0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master vmbr0 state DOWN group default qlen 1000
    link/ether 04:3f:72:c5:7e:fa brd ff:ff:ff:ff:ff:ff
7: enp6s0f1np1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master vmbr0 state DOWN group default qlen 1000
    link/ether 04:3f:72:c5:7e:fb brd ff:ff:ff:ff:ff:ff
8: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 40:8d:5c:b4:9b:e2 brd ff:ff:ff:ff:ff:ff
    inet 192.168.3.98/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::428d:5cff:feb4:9be2/64 scope link
       valid_lft forever preferred_lft forever
9: tap1000i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master fwbr1000i0 state UNKNOWN group default qlen 1000
    link/ether 2a:1d:4a:70:6d:93 brd ff:ff:ff:ff:ff:ff
10: fwbr1000i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 9e:75:1d:6a:2a:bd brd ff:ff:ff:ff:ff:ff
11: fwpr1000p0@fwln1000i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether 42:c3:f0:25:de:17 brd ff:ff:ff:ff:ff:ff
12: fwln1000i0@fwpr1000p0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master fwbr1000i0 state UP group default qlen 1000
    link/ether 9e:75:1d:6a:2a:bd brd ff:ff:ff:ff:ff:ff
13: tap9002i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UNKNOWN group default qlen 1000
    link/ether 4e:da:a7:3c:4c:92 brd ff:ff:ff:ff:ff:ff

Dropping lo and the virtual interfaces leaves:

3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UP group default qlen 1000
    link/ether 40:8d:5c:b4:9b:e2 brd ff:ff:ff:ff:ff:ff
    altname enp0s25
4: ens4f0np0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master vmbr0 state DOWN group default qlen 1000
    link/ether 0c:42:a1:62:1a:88 brd ff:ff:ff:ff:ff:ff
    altname enp5s0f0np0
5: ens4f1np1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master vmbr0 state DOWN group default qlen 1000
    link/ether 0c:42:a1:62:1a:89 brd ff:ff:ff:ff:ff:ff
    altname enp5s0f1np1
6: enp6s0f0np0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master vmbr0 state DOWN group default qlen 1000
    link/ether 04:3f:72:c5:7e:fa brd ff:ff:ff:ff:ff:ff
7: enp6s0f1np1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master vmbr0 state DOWN group default qlen 1000
    link/ether 04:3f:72:c5:7e:fb brd ff:ff:ff:ff:ff:ff
8: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 40:8d:5c:b4:9b:e2 brd ff:ff:ff:ff:ff:ff
    inet 192.168.3.98/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::428d:5cff:feb4:9be2/64 scope link
       valid_lft forever preferred_lft forever

For the Linux bridge to work, vmbr0's bridge-ports must list all of the NIC names:

vi /etc/network/interfaces

shows the bridge-ports:

auto vmbr0
iface vmbr0 inet static
        address 192.168.3.98/24
        gateway 192.168.3.1
        bridge-ports eno1 ens4f0np0 ens4f1np1 enp6s0f0np0 enp6s0f1np1
        bridge-stp off
        bridge-fd 0

The digits in a NIC name encode its position, and that position depends on the ordering of all PCIe devices in the system, for example:

$ lspci

00:00.0 Host bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DMI2 (rev 01)
00:01.0 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 1 (rev 01)
00:02.0 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 2 (rev 01)
00:02.2 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 2 (rev 01)
00:02.3 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 2 (rev 01)
00:03.0 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 3 (rev 01)
00:03.2 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 3 (rev 01)
00:05.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Map/VTd_Misc/System Management (rev 01)
00:05.1 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D IIO Hot Plug (rev 01)
00:05.2 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D IIO RAS/Control Status/Global Errors (rev 01)
00:11.0 Unassigned class [ff00]: Intel Corporation C610/X99 series chipset SPSR (rev 05)
00:14.0 USB controller: Intel Corporation C610/X99 series chipset USB xHCI Host Controller (rev 05)
00:16.0 Communication controller: Intel Corporation C610/X99 series chipset MEI Controller #1 (rev 05)
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (2) I218-V (rev 05)
00:1a.0 USB controller: Intel Corporation C610/X99 series chipset USB Enhanced Host Controller #2 (rev 05)
00:1b.0 Audio device: Intel Corporation C610/X99 series chipset HD Audio Controller (rev 05)
00:1c.0 PCI bridge: Intel Corporation C610/X99 series chipset PCI Express Root Port #1 (rev d5)
00:1c.5 PCI bridge: Intel Corporation C610/X99 series chipset PCI Express Root Port #6 (rev d5)
00:1c.7 PCI bridge: Intel Corporation C610/X99 series chipset PCI Express Root Port #8 (rev d5)
00:1d.0 USB controller: Intel Corporation C610/X99 series chipset USB Enhanced Host Controller #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation C610/X99 series chipset LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation C610/X99 series chipset 6-Port SATA Controller [AHCI mode] (rev 05)
00:1f.3 SMBus: Intel Corporation C610/X99 series chipset SMBus Controller (rev 05)
03:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
05:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
05:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
06:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
06:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
07:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
08:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)
09:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Oland [Radeon HD 8570 / R5 430 OEM / R7 240/340 / Radeon 520 OEM]
09:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Oland/Hainan/Cape Verde/Pitcairn HDMI Audio [Radeon HD 7000 Series]
......

Looking at just the NICs: I have four network cards here, two of which are dual-port 25G CX4121 cards:

$ lspci | grep Ethernet

00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (2) I218-V (rev 05)
05:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
05:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
06:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
06:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
08:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)

A NIC's PCIe address can be inspected with:

$ udevadm info -a /sys/class/net/enp6s0f1np1 | grep "KERNELS==" | head -n 1
    KERNELS=="0000:06:00.1"

The KERNELS=="0000:06:00.1" in the output is this NIC's PCIe address, and the name enp6s0f1np1 is derived from that address.

Note: the NIC at 08:00.0 is passed through to an openwrt VM, so it is not listed in vmbr0's bridge-ports.

So whenever the PCIe layout changes, for example when a device is added or removed (not just NICs; enabling or disabling the onboard audio also counts), the enumeration order can shift. The NIC names then change, and every setting based on those names, such as vmbr0's bridge-ports, breaks.

Approach

The basic idea is to stop tying NIC names to the unstable PCIe ordering and bind them to some other fixed attribute instead; the most common choice is the MAC address.

In a typical PVE box, the number of NICs and their MAC addresses are fixed; they only change when a card is deliberately added, removed, or replaced.

That way, no matter how NICs and other PCIe devices are added, removed, or reordered, the NIC names stay the same, and every setting based on those names remains stable.

So in what follows, the NICs are renamed based on their MAC addresses. The naming rule, defined for now, produces names like en25g1:

  • en: prefix, for an Ethernet NIC
  • 25g: the NIC speed; depending on the hardware this may be 1g/2.5g/10g/25g/40g/100g
  • 1: the NIC's index within its speed class; for convenience numbering starts from 1 rather than 0, so 1/2/3/4 denote the first to fourth card

Under this scheme, the five NICs above get the fixed names:

  • en1g1
  • en25g1 / en25g2 / en25g3 / en25g4

vmbr0's bridge-ports can then be written as en1g1 en25g1 en25g2 en25g3 en25g4, and no later change to NIC ordering or other PCIe devices will break it.
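The mapping from MAC address to the new names can be sketched as a tiny generator that emits udev rule lines in the required form (the MACs are the ones from this machine):

```shell
# Emit udev rule lines following the en<speed><index> scheme.
set -eu
gen_rule() {
    printf 'SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="%s", NAME="%s"\n' "$1" "$2"
}
gen_rule '40:8d:5c:b4:9b:e2' 'en1g1'
gen_rule '0c:42:a1:62:1a:88' 'en25g1'
gen_rule '0c:42:a1:62:1a:89' 'en25g2'
gen_rule '04:3f:72:c5:7e:fa' 'en25g3'
gen_rule '04:3f:72:c5:7e:fb' 'en25g4'
```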

Implementation

Prepare the script

mkdir -p ~/work/soft/pve
cd ~/work/soft/pve

vi ~/work/soft/pve/fix_pve_nic_name.sh 

with the following content:

#!/bin/bash

# Check for root privileges
if [ "$(id -u)" -ne 0 ]; then
    echo "Please run this script as root!" >&2
    exit 1
fi

RULE_FILE="/etc/udev/rules.d/70-persistent-net.rules"

echo "Generating MAC-address-based NIC naming rules..."
echo "# Persistent NIC naming rules generated automatically (based on MAC address)" > "$RULE_FILE"
echo "# Generated at: $(date)" >> "$RULE_FILE"
echo "" >> "$RULE_FILE"

# Iterate over all network interfaces
for DEV in /sys/class/net/*; do
    INTERFACE=$(basename "$DEV")
    MAC_FILE="$DEV/address"

    # Skip the loopback interface and devices without a MAC address
    [[ "$INTERFACE" == "lo" ]] && continue
    [[ ! -f "$MAC_FILE" ]] && continue

    MAC=$(cat "$MAC_FILE" 2>/dev/null)
    [[ -z "$MAC" ]] && continue

    # Write the rule (matching on MAC address only)
    echo "SUBSYSTEM==\"net\", ACTION==\"add\", ATTR{address}==\"$MAC\", NAME=\"$INTERFACE\"" >> "$RULE_FILE"
done

# Reload the udev rules (left commented out; review the file first)
# udevadm control --reload-rules
# udevadm trigger

echo ""
echo "Rules written to $RULE_FILE; contents:"
echo "----------------------------------------"
cat "$RULE_FILE"
echo "----------------------------------------"
echo ""

Then make it executable:

chmod +x fix_pve_nic_name.sh

And run the script:

./fix_pve_nic_name.sh

The output is:

Generating MAC-address-based NIC naming rules...

Rules written to /etc/udev/rules.d/70-persistent-net.rules; contents:
----------------------------------------
# Persistent NIC naming rules generated automatically (based on MAC address)
# Generated at: Mon Apr 14 06:31:37 AM CST 2025

SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="40:8d:5c:b4:9b:e2", NAME="eno1"
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="04:3f:72:c5:7e:fa", NAME="enp6s0f0np0"
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="04:3f:72:c5:7e:fb", NAME="enp6s0f1np1"
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="0c:42:a1:62:1a:88", NAME="ens4f0np0"
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="0c:42:a1:62:1a:89", NAME="ens4f1np1"
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="9e:75:1d:6a:2a:bd", NAME="fwbr1000i0"
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="9e:75:1d:6a:2a:bd", NAME="fwln1000i0"
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="42:c3:f0:25:de:17", NAME="fwpr1000p0"
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="2a:1d:4a:70:6d:93", NAME="tap1000i0"
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="4e:da:a7:3c:4c:92", NAME="tap9002i0"
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="40:8d:5c:b4:9b:e2", NAME="vmbr0"

vi /etc/udev/rules.d/70-persistent-net.rules

Open /etc/udev/rules.d/70-persistent-net.rules and delete the entries that are not needed, keeping only the physical NICs:

SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="40:8d:5c:b4:9b:e2", NAME="eno1"
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="04:3f:72:c5:7e:fa", NAME="enp6s0f0np0"
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="04:3f:72:c5:7e:fb", NAME="enp6s0f1np1"
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="0c:42:a1:62:1a:88", NAME="ens4f0np0"
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="0c:42:a1:62:1a:89", NAME="ens4f1np1"

Then rename the NICs according to the scheme defined earlier:

SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="40:8d:5c:b4:9b:e2", NAME="en1g1"
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="04:3f:72:c5:7e:fa", NAME="en25g3"
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="04:3f:72:c5:7e:fb", NAME="en25g4"
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="0c:42:a1:62:1a:88", NAME="en25g1"
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="0c:42:a1:62:1a:89", NAME="en25g2"

Run the following to activate the modified rules:

udevadm control --reload-rules
udevadm trigger

Then verify the rename rules:

udevadm test /sys/class/net/eno1 2>&1 | grep -i 'renamed'
udevadm test /sys/class/net/ens4f0np0 2>&1 | grep -i 'renamed'
udevadm test /sys/class/net/ens4f1np1 2>&1 | grep -i 'renamed'
udevadm test /sys/class/net/enp6s0f0np0 2>&1 | grep -i 'renamed'
udevadm test /sys/class/net/enp6s0f1np1 2>&1 | grep -i 'renamed'

The output is:

en1g1: Network interface 3 is renamed from 'eno1' to 'en1g1'
en25g1: Network interface 4 is renamed from 'ens4f0np0' to 'en25g1'
en25g2: Network interface 5 is renamed from 'ens4f1np1' to 'en25g2'
en25g3: Network interface 6 is renamed from 'enp6s0f0np0' to 'en25g3'
en25g4: Network interface 7 is renamed from 'enp6s0f1np1' to 'en25g4'

Finally, update vmbr0's bridge-ports:

# back up before editing
cp /etc/network/interfaces /etc/network/interfaces_backup
vi /etc/network/interfaces

Set the contents to:

auto lo
iface lo inet loopback

iface en1g1 inet manual

iface en25g1 inet manual

iface en25g2 inet manual

iface en25g3 inet manual

iface en25g4 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.3.98/24
        gateway 192.168.3.1
        bridge-ports en1g1 en25g1 en25g2 en25g3 en25g4
        bridge-stp off
        bridge-fd 0

source /etc/network/interfaces.d/*

Reboot the PVE machine.

After logging back in, run

ip addr

to check the NICs:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
3: en1g1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UP group default qlen 1000
    link/ether 40:8d:5c:b4:9b:e2 brd ff:ff:ff:ff:ff:ff
    altname enp0s25
    altname eno1
4: en25g1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master vmbr0 state DOWN group default qlen 1000
    link/ether 0c:42:a1:62:1a:88 brd ff:ff:ff:ff:ff:ff
    altname enp5s0f0np0
    altname ens4f0np0
5: en25g2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master vmbr0 state DOWN group default qlen 1000
    link/ether 0c:42:a1:62:1a:89 brd ff:ff:ff:ff:ff:ff
    altname enp5s0f1np1
    altname ens4f1np1
6: en25g3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master vmbr0 state DOWN group default qlen 1000
    link/ether 04:3f:72:c5:7e:fa brd ff:ff:ff:ff:ff:ff
    altname enp6s0f0np0
7: en25g4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master vmbr0 state DOWN group default qlen 1000
    link/ether 04:3f:72:c5:7e:fb brd ff:ff:ff:ff:ff:ff
    altname enp6s0f1np1
8: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 40:8d:5c:b4:9b:e2 brd ff:ff:ff:ff:ff:ff
    inet 192.168.3.98/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::428d:5cff:feb4:9be2/64 scope link
       valid_lft forever preferred_lft forever

The NIC names are now pinned and no longer affected by PCIe device ordering.