The Grumpy Troll

Ramblings of a grumpy troll.

LXC Routed on Ubuntu

Containers are a decent technology, whether they’re FreeBSD’s Jails, Solaris Zones or Linux’s version. Linux comes with the LXC tools, which can be quite useful for managing containers.

If you’re happy to use NAT in front of each container, or a proxy (such as SSH configured with ProxyCommand to hop via the containing host), or a web-proxy in front of services, the defaults are decent enough. To connect directly to a container’s services, though, you want the containers to be on a network which is reachable from outside that machine; reasons for wanting this include use of Kerberos and Kerberised NFS. There are a few ways to do this.
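
As a sketch of the proxy approach, a client-side SSH configuration might look like this (host names are illustrative, and it assumes the VM host can resolve the container’s name):

  # ~/.ssh/config on a client machine
  Host container.tld
      # hop via the VM host, which can reach the NAT'd container
      ProxyCommand ssh -W %h:%p vmhost.tld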

This post summarizes some changes made months ago and some made more recently, so that such containers autoboot cleanly, can use NFS, and can use separate routed networks where the network routes are distributed via DHCP without interfering with LXC.

Networking

One approach is to use one flat network and reserve DHCP ranges for each VM hosting box. That’s not the way this troll went.

I prefer to allocate a network to each VM host and tell the DHCP server to use the CIDR option (DHCP option 121, the RFC 3442 classless static routes option) to give clients on the network the routes they need to reach those services. As a fallback for clients which ignore this, I add static routes on my default router, so that in the worst case there’s merely some doglegging in the traffic.
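
For illustration, on an ISC dhcpd server this might be declared and populated along these lines (all addresses here are made up: 198.51.100.0/24 is the main LAN, 198.51.100.5 the VM host, 198.51.100.1 the default router and 192.0.2.0/24 the container network; clients honouring option 121 ignore option 3, so the default route is repeated inside it):

  option rfc3442-classless-static-routes code 121 = array of unsigned integer 8;

  subnet 198.51.100.0 netmask 255.255.255.0 {
    option routers 198.51.100.1;
    # 192.0.2.0/24 via the VM host, plus the default route via the router
    option rfc3442-classless-static-routes 24, 192, 0, 2, 198, 51, 100, 5, 0, 198, 51, 100, 1;
  }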

Thus each VM has an identifiable network (which I can perhaps switch around for failover). To do this, I use virbr0 on the VM hosts.

Edit /etc/lxc/default.conf to set lxc.network.link = virbr0. So far, I’ve also been editing /etc/default/lxc-net to set USE_LXC_BRIDGE="false" and LXC_BRIDGE="virbr0" – I need to dig deeper to confirm if editing these is actually needed.
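
After that edit, the network stanza of /etc/lxc/default.conf looks something like this (the type and flags lines are the stock Ubuntu defaults):

  lxc.network.type = veth
  lxc.network.link = virbr0
  lxc.network.flags = up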

With sudo virsh net-edit default (which edits /etc/libvirt/qemu/networks/default.xml) I set <forward mode='route'/>, switch the network to the netblock dedicated to this VM host and shrink the DHCP range to a small block of addresses, leaving room for static assignments. To make a ‘static’ assignment after later creating an LXC container, I grab lxc.network.hwaddr from the container’s configuration and, again with virsh net-edit, add a new rule inside network→ip→dhcp along the lines of: <host mac='00:16:3e:00:11:22' name='container.tld' ip='192.0.2.10' />.
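
Put together, the edited network definition ends up looking roughly like this (addresses and the MAC are illustrative):

  <network>
    <name>default</name>
    <forward mode='route'/>
    <bridge name='virbr0' stp='on' delay='0'/>
    <ip address='192.0.2.1' netmask='255.255.255.0'>
      <dhcp>
        <range start='192.0.2.100' end='192.0.2.119'/>
        <host mac='00:16:3e:00:11:22' name='container.tld' ip='192.0.2.10'/>
      </dhcp>
    </ip>
  </network>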

The problem with using the DHCP CIDR option (121) is that by default the dhclient hook will grab this and insert a route, then later libvirtd will see that the route already exists and refuse to assign that network to a new network interface. The fix for this is ugly script hacking.

Create a new file, /etc/troll-net-reserve, and in it put, one per line, any netblocks which should be ignored if received as CIDR options.

# don't let dhclient insert a route for our local LXC network
192.0.2.0/24

Then modify /etc/dhcp/dhclient-exit-hooks.d/rfc3442-classless-routes to add a function near the top and invoke it just before adding a route:

is_local_virtual() {
        # is this netblock already routed via a local libvirt bridge?
        if ip -4 route get "$1" 2>/dev/null | grep -q 'dev virbr'; then
                return 0
        fi
        # or is it listed verbatim in the reserve file?
        local l
        while read -r l ; do
                if [ ".$l" = ".$1" ]; then
                        return 0
                fi
        done < /etc/troll-net-reserve
        return 1
}
## .... just before adding the route
### between "take care of link-local routes"
### and "set route (ip detects host routes automatically)":
  if is_local_virtual "${net_address}/${net_length}"; then
        continue
  fi

Note that this setup only protects against exact matches between /etc/troll-net-reserve and the netblocks received via DHCP. Change the prefix length, or use a containing netblock, and you’ll need more sophisticated scripting. For the common /24 case, this works well enough.

NFS

You can mount NFS from outside the container, which is the approach I use with NAT’d containers, although then the container is unaware of the mount-point and you’re not using the same uid space.
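
A minimal sketch of that outside-the-container approach, with an illustrative server name and paths:

  # on the VM host: mount into the container's rootfs
  mount -t nfs4 fileserver.tld:/export/data /var/lib/lxc/foo/rootfs/srv/data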

To mount NFS inside the container, you need to tell AppArmor to allow this; it is not necessary to modify the LXC conf file to avoid dropping the mac_admin capability: although mounting should require mac_admin, this appears not to be the case for NFS inside containers.

# vi /etc/apparmor.d/abstractions/lxc/container-base
...
# service apparmor restart

The rules I add are:

  # allow NFS
  mount fstype=nfs,
  mount fstype=nfs4,
  mount fstype=rpc_pipefs,

You can then just add NFS mount-points to the /etc/fstab inside the container’s rootfs.
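
For example, a line such as this (server name and paths illustrative):

  fileserver.tld:/home  /home  nfs4  rw,hard  0  0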

Boot

Using lxc-ls --fancy you can see whether a given container will auto-start on boot; you enable this for a container with a symlink; for container foo:
ln -s /var/lib/lxc/foo/config /etc/lxc/auto/foo.conf

However, with the virbr0 routed networking, this won’t work by default. The interface appears to be created by the libvirtd process, slowly enough that the race against lxc is normally lost. There are two parts to fixing this.

The key is to make the lxc service depend upon the libvirt-bin service having started, and make sure that the libvirt-bin service only reports that it has started once the libvirtd process has progressed far enough to create the virbr0 interface.

  1. Edit /etc/init/lxc.conf to change the start line so that it reads:
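
    # /etc/init/lxc.conf
    start on runlevel [2345] and started libvirt-bin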

  2. Edit /etc/init/libvirt-bin.conf to add a post-start script, so that emission of the started event is delayed until the concurrently-running libvirtd has created the interface. That’s:

    post-start script
        # wait for libvirtd to create the bridge before lxc starts
        while ! /bin/ip link show virbr0 >/dev/null 2>&1 ; do sleep 1; done
    end script
    

The sleep 1 is unclean, as is the hard-coded assumption of virbr0, but this works and is reliable. The lxc.start.delay LXC configuration option is too new and not supported in Ubuntu Saucy.
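
For reference, with a newer LXC release (1.0 onwards, so not Saucy) my understanding is that the per-container autostart options would look something like this, in place of the symlink-plus-upstart dance:

  # in the container's config file, with newer LXC
  lxc.start.auto = 1
  lxc.start.delay = 5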

-The Grumpy Troll

Categories: Ubuntu LXC Containers Routed