Networks configuration file¶
Description¶
/etc/pcocc/networks.yaml
is a YAML formatted file defining virtual networks available to pcocc VMs. Virtual networks are referenced through VM resource sets defined in the /etc/pcocc/resources.yaml
configuration file. For each virtual cluster, private instances of the virtual networks referenced by its VMs are created, which means each virtual network instance is only shared by VMs within a single virtual cluster.
A network is defined by its name, type and settings, which are specific to each network type. Two types of networks are supported: Ethernet and Infiniband.
Warning
Before editing this configuration file on a compute node, you should first make sure that no VMs are running on the node and execute the following command, as root:
pcocc internal setup cleanup
Syntax¶
/etc/pcocc/networks.yaml
contains a key/value mapping. Each key defines a network by its name and the associated value must contain two keys: type which defines the type of network to define, and settings which is a key/value mapping defining the parameters for this network. This is summed up in the example below:
# Define a network named 'network1'
network1:
# Select the network type
type: ethernet
# Define settings for ethernet networks
settings:
setting1: 'foo'
setting2: 'bar'
The following networks are supported:
Ethernet network¶
A virtual Ethernet network is defined by using the network type ethernet. A VM connected to a network of this type receives an Ethernet interface linked to an isolated virtual switch. All the VMs of a virtual cluster connected to a given network are linked to the same virtual switch. Connectivity is provided by encapsulating Ethernet packets from the VMs in IP tunnels between hypervisors. If the network-layer parameter is set to L2 pcocc only provides Ethernet layer 2 connectivity between the VMs. The network is entirely isolated and no services (such as DHCP) are provided, which means the user is responsible for configuring the VM interfaces as he likes. If the network-layer is set to L3 pcocc also manages IP addressing and optionally provides access to external networks through a gateway which performs NAT (Network Address Translation) using the hypervisor IP as source. Reverse NAT can also be setup to allow connecting to a VM port such as the SSH port from the outside. DHCP and DNS servers are automatically setup on the private network to provide IP addresses for the VMs. The available parameters are:
- dev-prefix
- Prefix to use when assigning names to virtual devices such as bridges and TAPs created on the host.
- network-layer
Whether pcocc should provide layer 3 services or only a layer 2 Ethernet network (see above). Can be set to:
- L3 (default): Manage IP layer and provide services such as DHCP
- L2: Only provide layer 2 connectivity
- mtu
- MTU of the Ethernet network. (defaults to 1500)
Warning
Please note that the MTU of the Ethernet interfaces in the VMs has to be set 50 bytes lower than this value to account for the encapsulation headers. The DHCP server on a L3 network automatically provides an appropriate value.
- mac-prefix
- Prefix to use when assigning MAC addresses to virtual Ethernet interfaces. MAC addresses are assigned to each VM in order starting from the MAC address constructed by appending zeros to the prefix. (defaults to 52:54:00)
- host-if-suffix
- Suffix to append to hostnames when establishing a remote tunnel if compute nodes have specific hostnames to address each network interface. For example, if a compute node known by SLURM as computeXX can reached more efficiently via IPoIB at the computeXX-ib address, the host-if-suffix parameter can be set to -ib so that the Ethernet tunnels between hypervisors transit over IPoIB.
The following parameters only apply for a L3 network:
- int-network
- IP network range in CIDR notation reserved for assigning IP addresses to VM network interfaces via DHCP. This network range should be unused on the host and not be routable. It is private to each virtual cluster and VMs get a fixed IP address depending on their rank in the virtual cluster. (defaults to 10.200.0.0/16)
- ext-network
- IP network range in CIDR notation reserved for assigning unique VM IPs on the host network stack. This network range should be unused on the host and not be routable. (defaults to 10.201.0.0/16)
- dns-server
- The IP of a domain name resolver to forward DNS requests. (defaults to reading resolv.conf on the host)
- domain-name
- The domain name to provide to VMs via DHCP. (defaults to pcocc.<host domain name>)
- dns-search:
- Comma separated DNS search list to provide to VMs via DHCP in addition to the domain name.
- ntp-server
- The IP of a NTP server to provide to VMs via DHCP.
- allow-outbound
- Set to none to prevent VMs from establishing outbound connections.
- reverse-nat
A key/value mapping which can be defined to allow inbound connections to a VM port via reverse NAT of a host port. It contains the following keys:
- vm-port
- The VM port to make accessible.
- min-host-port
- Minimum port to select on the host for reverse NATing.
- max-host-port
- Maximum port to select on the host for reverse NATing.
The example below defines a managed network with reverse NAT for SSH access:
# Define an ethernet network NAT'ed to the host network
# with a reverse NAT for the SSH port
nat-rssh:
type: ethernet
settings:
# Manage layer 3 properties such as VM IP adresses
network-layer: "L3"
# Name prefix used for devices created for this network
dev-prefix: "nat"
# MTU of the network
mtu: 1500
reverse-nat:
# VM port to expose on the host
vm-port: 22
# Range of free ports on the host to use for reverse NAT
min-host-port: 60222
max-host-port: 60322
The example below defines a private layer 2 network
# Define a private ethernet network isolated from the host
pv:
# Private ethernet network isolated from the host
type: ethernet
settings:
# Only manage Ethernet layer
network-layer: "L2"
# Name prefix used for devices created for this network
dev-prefix: "pv"
# MTU of the network
mtu: 1500
IB network¶
A virtual Infiniband network is defined by using the type infiniband. An Infiniband partition is allocated for each virtual Infiniband network instantiated by a virtual cluster. VMs connected to Infiniband networks receive direct access to an Infiniband SRIOV virtual function restricted to using the allocated partition as well as the default partition, as limited members, which is required for IPoIB.
Warning
This means that, for proper isolation of the virtual clusters, physical nodes should be set as limited members of the default partition and/or use other partitions for their communications.
pcocc makes use of a daemon on the OpenSM node which dynamically updates the partition configuration (which means pcocc has to be installed on the OpenSM node). The daemon generates the configuration from a template holding the static configuration to which it appends the dynamic configuration. Usually, you will want to copy your current configuration to the template file (/etc/opensm/partitions.conf.tpl in the example below) and have pcocc append its dynamic configuration to form the actual partition file referenced in the OpenSM configuration. The following parameters can be defined:
- host-device
- Device name of a physical function from which to map virtual functions in the VM.
- min-pkey
- Minimum pkey value to assign to virtual clusters.
- max-pkey
- Maximum pkey value to assign to virtual clusters.
- opensm-daemon
- Name of the OpenSM process (to signal from the pkeyd daemon).
- opensm-partition-cfg
- The OpenSM partition configuration file to generate dynamically.
- opensm-partition-tpl
- The file containing the static partitions to include in the generatied partition configuration file.
The example below sums up the available parameters:
ib:
# Infiniband network based on SRIOV virtual functions
type: infiniband
settings:
# Host infiniband device
host-device: "mlx5_0"
# Range of PKeys to allocate for virtual clusters
min-pkey: "0x2000"
max-pkey: "0x3000"
# Name of opensm process
opensm-daemon: "opensm"
# Configuration file for opensm partitions
opensm-partition-cfg: /etc/opensm/partitions.conf
# Template for generating the configuration file for opensm partitions
opensm-partition-tpl: /etc/opensm/partitions.conf.tpl
As explained above, pcocc must be installed on the OpenSM node(s) and the pkeyd daemon must be running to manage the partition configuration file:
systemctl enable pkeyd
systemctl start pkeyd
Sample configuration file¶
This is the default configuration file for reference:
# Define an ethernet network NAT'ed to the host network
# with a reverse NAT for the SSH port
nat-rssh:
type: ethernet
settings:
# Manage layer 3 properties such as VM IP adresses
network-layer: "L3"
# Private IP range for VM interfaces on this ethernet network.
int-network: "10.251.0.0/16"
# External IP range used to map private VM IPs to unique VM IPs on the
# host network stack for NAT.
ext-network: "10.250.0.0/16"
# Name prefix used for devices created for this network
dev-prefix: "nat"
# MTU of the network
mtu: 1500
reverse-nat:
# VM port to expose on the host
vm-port: 22
# Range of free ports on the host to use for reverse NAT
min-host-port: 60222
max-host-port: 60322
# Suffix to append to remote hostnames when tunneling
# Ethernet packets
host-if-suffix: ""
# Define a private ethernet network isolated from the host
pv:
# Private ethernet network isolated from the host
type: ethernet
settings:
# Only manage Ethernet layer
network-layer: "L2"
# Name prefix used for devices created for this network
dev-prefix: "pv"
# MTU of the network
mtu: 1500
# Suffix to append to remote hostnames when tunneling
# Ethernet packets
host-if-suffix: ""
# Define a private Infiniband network
ib:
# Infiniband network based on SRIOV virtual functions
type: infiniband
settings:
# Host infiniband device
host-device: "mlx5_0"
# Range of PKeys to allocate for virtual clusters
min-pkey: "0x2000"
max-pkey: "0x3000"
# Resource manager token to request when allocating this network
license: "pkey"
# Name of opensm process
opensm-daemon: "opensm"
# Configuration file for opensm partitions
opensm-partition-cfg: /etc/opensm/partitions.conf
# Template for generating the configuration file for opensm partitions
opensm-partition-tpl: /etc/opensm/partitions.conf.tpl