Preamble

So, why build a firewall with Proxmox as a hypervisor instead of installing it directly on bare metal? Well, I ran OPNsense on bare metal for years on a Sophos XG105. It was stable and ran fine. The Sophos also had a USB serial port, which was very useful for console access.

Unfortunately, the system began to show its age and its performance was no longer sufficient. When there was a lot of network traffic on other VLANs, I had intermittent packet loss and lag spikes in video and voice calls. I tried to spread the VLANs across all interfaces, but that only helped marginally because the CPU was the limiting factor. Also, the ports were only 1000BASE-T, meh.

So I recently bought a very cheap, refurbished Lenovo M720q thin client. I ordered a PCI-E riser on AliExpress for an existing Mellanox ConnectX-3 card and installed OPNsense directly on bare metal.
Unfortunately, I had problems with the hardware that I couldn’t clearly identify at first (in the end it was the CRC/TSO/LRO hardware offloading), which was a pain to debug because I always needed to connect a monitor and keyboard.

So the most obvious solution was to use Proxmox as a virtualization layer. However, I didn’t want to go the classic route with a Linux bridge and virtio interfaces, but rather pass the card directly to the VM.

Pro:

  • painless snapshots before potentially dangerous actions (updates, big configuration changes etc.)
  • better utilization of resources
  • directly integrated in my backup structure (I’m using Proxmox Backup Server)

Cons:

  • very finicky to set up
  • hardware must support SR-IOV (or at least IOMMU)
  • moar things to update

Enable IOMMU

On EFI-booted systems you need to modify /etc/kernel/cmdline to include intel_iommu=on iommu=pt, or amd_iommu=on iommu=pt on AMD systems.

For the M720q we also need to enable the ACS override (otherwise the IOMMU grouping is suboptimal and unusable for SR-IOV):

root=ZFS=rpool/ROOT/pve-1 boot=zfs intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction

Activate the changes with update-initramfs -u -k all (or proxmox-boot-tool refresh) and reboot.
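
After the reboot it is worth checking that the IOMMU is really active and how the devices are grouped. A minimal sketch, assuming the usual sysfs layout under /sys/kernel/iommu_groups; the function name list_iommu_groups and its optional path argument are only there for illustration.

```shell
#!/usr/bin/env bash
# Print one line per PCI device together with its IOMMU group.
# The optional first argument overrides the sysfs root (handy for testing).
list_iommu_groups() {
  local root="${1:-/sys/kernel/iommu_groups}"
  local dev group
  for dev in "$root"/*/devices/*; do
    [ -e "$dev" ] || continue          # nothing found: IOMMU is probably off
    group="${dev#"$root"/}"            # strip the root prefix
    group="${group%%/*}"               # keep only the group number
    printf 'group %s: %s\n' "$group" "${dev##*/}"
  done
}

list_iommu_groups
```

If nothing is printed, the kernel parameters most likely did not make it into the boot entry; cat /proc/cmdline shows what the kernel was actually started with.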

Configure virtual functions

If you want to use SR-IOV, you need to configure the virtual functions (VFs) of the network card. In this case, we need the Mellanox tools for that.

Install the Mellanox Firmware Tool

NOTE: 4.25.1 is the latest version which has support for older cards such as the ConnectX-3.

wget https://www.mellanox.com/downloads/MFT/mft-4.25.1-11-x86_64-deb.tgz
tar xfv mft-4.25.1-11-x86_64-deb.tgz
cd mft-4.25.1-11-x86_64-deb
./install.sh
mst start

Get the PCI-E Path

root@m720q:/etc/systemd/system# mst status
MST modules:
------------
    MST PCI module loaded
    MST PCI configuration module loaded

MST devices:
------------
/dev/mst/mt4103_pciconf0         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:01:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                   Chip revision is: 00
/dev/mst/mt4103_pci_cr0          - PCI direct access.
                                   domain:bus:dev.fn=0000:01:00.0 bar=0xb3100000 size=0x100000
                                   Chip revision is: 00

Query the current configuration

root@m720q:~# mlxconfig -d /dev/mst/mt4103_pci_cr0 q

Device #1:
----------

Device type:    ConnectX3Pro
Device:         /dev/mst/mt4103_pci_cr0

Configurations:                                      Next Boot
        SRIOV_EN                                    True(1)
        NUM_OF_VFS                                  8
        PHY_TYPE_P1                                 XFI(2)
        XFI_MODE_P1                                 _10G(0)
        FORCE_MODE_P1                               False(0)
        PHY_TYPE_P2                                 XFI(2)
        XFI_MODE_P2                                 _10G(0)
        FORCE_MODE_P2                               False(0)
        LOG_BAR_SIZE                                5
        BOOT_OPTION_ROM_EN_P1                       True(1)
        BOOT_VLAN_EN_P1                             False(0)
        BOOT_RETRY_CNT_P1                           0
        LEGACY_BOOT_PROTOCOL_P1                     PXE(1)
        BOOT_VLAN_P1                                1
        BOOT_OPTION_ROM_EN_P2                       True(1)
        BOOT_VLAN_EN_P2                             False(0)
        BOOT_RETRY_CNT_P2                           0
        LEGACY_BOOT_PROTOCOL_P2                     PXE(1)
        BOOT_VLAN_P2                                1
        IP_VER_P1                                   IPv4(0)
        IP_VER_P2                                   IPv4(0)
        CQ_TIMESTAMP                                True(1)
        STEER_FORCE_VLAN                            False(0)

Enable SR-IOV and set the number of VFs

mlxconfig -d /dev/mst/mt4103_pci_cr0 set SRIOV_EN=1 NUM_OF_VFS=8
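
The query output is labelled "Next Boot", so the new firmware settings should only become active after a reboot. To double-check what was written, the two relevant values can be filtered out of the query output. This is just a small sketch; sriov_summary is a made-up helper name, not part of the Mellanox tools.

```shell
#!/usr/bin/env bash
# Read `mlxconfig ... q` output on stdin and print the two SR-IOV settings.
sriov_summary() {
  awk '$1 == "SRIOV_EN" || $1 == "NUM_OF_VFS" { print $1 "=" $2 }'
}

# Usage on the host (device path as reported by `mst status`):
#   mlxconfig -d /dev/mst/mt4103_pci_cr0 q | sriov_summary
```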

Set Kernel module options

NOTE: See the documentation for more information.

In my case I want four virtual functions on each of the two ports, operating in Ethernet mode.

echo "options mlx4_core num_vfs=4,4,0 port_type_array=2,2 probe_vf=4,4,0" > /etc/modprobe.d/mlx4_core.conf
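
To make the numbers a bit less magic: as far as I understand the mlx4 driver documentation, the num_vfs triplet means "VFs on port 1, VFs on port 2, dual-port VFs", and port type 2 stands for Ethernet. The following sketch builds the options line from the two per-port counts; mlx4_options is a hypothetical helper, and the fixed Ethernet mode is an assumption that matches my setup.

```shell
#!/usr/bin/env bash
# Build the mlx4_core options line for single-port VFs.
# Arguments: VFs on port 1, VFs on port 2 (dual-port VFs stay at 0).
mlx4_options() {
  local p1="$1" p2="$2"
  # port_type_array=2,2 puts both ports into Ethernet mode;
  # probe_vf mirrors num_vfs, so the host driver probes every VF as well.
  printf 'options mlx4_core num_vfs=%s,%s,0 port_type_array=2,2 probe_vf=%s,%s,0\n' \
    "$p1" "$p2" "$p1" "$p2"
}

mlx4_options 4 4
```

Redirect the output to /etc/modprobe.d/mlx4_core.conf once the line looks right.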

Reboot or reload the modules for the changes to take effect

modprobe -r mlx4_en mlx4_ib
modprobe mlx4_en

Check if the VFs have been created

root@m720q:/etc/modprobe.d# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether 98:fa:9b:78:3b:a9 brd ff:ff:ff:ff:ff:ff
    altname enp0s31f6
3: enp1s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 9c:dc:71:4d:0f:60 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 66:75:01:46:22:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 1     link/ether 66:75:01:46:22:01 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 2     link/ether 66:75:01:46:22:02 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 3     link/ether 66:75:01:46:22:03 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 4     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 5     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 6     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 7     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
4: enp1s0d1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 9c:dc:71:4d:0f:61 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 1     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 2     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 3     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 4     link/ether 66:75:01:46:22:04 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 5     link/ether 66:75:01:46:22:05 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 6     link/ether 66:75:01:46:22:06 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 7     link/ether 66:75:01:46:22:07 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
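
The listing is long, so it can help to reduce it to the VFs that actually carry a MAC address. A small sketch that filters the ip link output; assigned_vf_macs is a made-up name, and it assumes the output format shown above.

```shell
#!/usr/bin/env bash
# Read `ip link show dev <pf>` output on stdin and print "<vf> <mac>"
# for every VF whose MAC address is not all zeros.
assigned_vf_macs() {
  awk '$1 == "vf" && $4 != "00:00:00:00:00:00" { print $2, $4 }'
}

# Usage on the host:
#   ip link show dev enp1s0   | assigned_vf_macs
#   ip link show dev enp1s0d1 | assigned_vf_macs
```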

Configuring MAC addresses and VLANs

So, the VFs have been created. By default they don't get a fixed MAC address; a new one is generated at each start. This can be undesirable, but the MAC addresses can be pinned with a systemd service.
The same applies to the VLAN assignment.

NOTE: There seems to be a reporting bug: each port of the card shows 8 virtual functions, although only 4 should exist per port. However, if you want to set the MAC addresses (or change anything else on the interface), you have to use VFs 4-7 for the second port; otherwise you will change the VFs of the first port again.

/etc/systemd/system/sriov-mac.service

[Unit]
Description=Script to set SR-IOV MACs and/or VLANs on boot

[Service]
Type=oneshot
# Setting static MAC for VFs
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set enp1s0 vf 0 mac 66:75:01:46:22:00'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set enp1s0 vf 1 mac 66:75:01:46:22:01'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set enp1s0 vf 2 mac 66:75:01:46:22:02'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set enp1s0 vf 3 mac 66:75:01:46:22:03'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set enp1s0d1 vf 4 mac 66:75:01:46:22:04'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set enp1s0d1 vf 5 mac 66:75:01:46:22:05'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set enp1s0d1 vf 6 mac 66:75:01:46:22:06'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set enp1s0d1 vf 7 mac 66:75:01:46:22:07'
# Setting VLANs
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set enp1s0d1 vf 7 vlan 50'

[Install]
WantedBy=multi-user.target
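
The MAC lines in the unit follow a simple pattern: VFs 0-3 belong to enp1s0, VFs 4-7 to enp1s0d1, and the last byte of the MAC is the VF index. If you want a different prefix, a loop can print the commands for you. This is only a sketch under those assumptions; vf_mac_commands is a hypothetical helper and it prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Print the `ip link set ... vf N mac ...` commands for all eight VFs.
# Argument: the first five bytes of the MAC address, e.g. 66:75:01:46:22
vf_mac_commands() {
  local prefix="$1" vf dev
  for vf in 0 1 2 3 4 5 6 7; do
    # VFs 0-3 sit on the first port, VFs 4-7 on the second one
    if [ "$vf" -lt 4 ]; then dev=enp1s0; else dev=enp1s0d1; fi
    printf 'ip link set %s vf %d mac %s:%02x\n' "$dev" "$vf" "$prefix" "$vf"
  done
}

vf_mac_commands 66:75:01:46:22
```

Once the unit file is in place, systemctl daemon-reload followed by systemctl enable sriov-mac.service makes it run on every boot.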

Configure virtual machines

After the VFs are set up, you can pass a virtual function through to a VM just like a normal physical card.

Proxmox PCI Device Wizard
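
The same can be done from the shell instead of the wizard. A rough sketch, assuming the VFs show up in lspci with "Virtual Function" in their description; vf_addresses is a made-up helper, and the VMID 100 and the PCI address in the comment are placeholders for your own values.

```shell
#!/usr/bin/env bash
# Read `lspci -D` output on stdin and print the PCI address of every
# device whose description mentions "Virtual Function".
vf_addresses() {
  awk '/Virtual Function/ { print $1 }'
}

# Usage on the host (VMID and address are placeholders):
#   lspci -D | vf_addresses
#   qm set 100 --hostpci0 0000:01:00.1
```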