LXC Routed on Ubuntu
Containers are a decent technology, whether they’re FreeBSD’s Jails, Solaris Zones or Linux’s version. Linux comes with the LXC tools, which can be quite useful for managing containers.
If you’re happy to use NAT in front of each container, or a proxy (such as an SSH configuration using ProxyCommand to reach the container via the containing host), or a web-proxy in front of services, then the defaults are decent enough. To connect directly to a container’s services, though, you want the containers to be on a network which is reachable from outside that machine; reasons might include use of Kerberos and Kerberised NFS. There are a few ways to do this.
This post summarizes some changes made months ago and some made more recently, so that such containers autoboot cleanly, can use NFS, and can use separate routed networks where the network routes are distributed via DHCP without interfering with LXC.
Networking
One approach is to use one flat network and reserve DHCP ranges for each VM hosting box. That’s not the way this troll went.
I prefer to allocate a network to each VM host and tell the DHCP server to use the CIDR option (DHCP option 121, RFC 3442’s classless static routes) to provide routes, so that clients on the network can reach those services directly. As a fallback for clients which ignore this option, I add static routes on my default router, so that in a worst-case scenario there’s merely some doglegging in the traffic.
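The particular DHCP server doesn’t matter here; as a sketch, assuming ISC dhcpd, the usual RFC 3442 idiom for serving option 121 looks something like this (netblock and router addresses are illustrative):

# declare the option format once, globally
option rfc3442-classless-static-routes code 121 = array of unsigned integer 8;

subnet 198.51.100.0 netmask 255.255.255.0 {
    # each route is <prefix-length>, <significant destination octets>, <router>;
    # here 192.0.2.0/24 via 198.51.100.1, plus a default route, since clients
    # honouring option 121 are required to ignore the plain routers option
    option rfc3442-classless-static-routes
        24, 192, 0, 2, 198, 51, 100, 1,
        0, 198, 51, 100, 1;
}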
Thus each VM host has an identifiable network (which I can perhaps switch around for failover). To do this, I use virbr0 on the VM hosts.
Edit /etc/lxc/default.conf to set lxc.network.link = virbr0. So far, I’ve also been editing /etc/default/lxc-net to set USE_LXC_BRIDGE="false" and LXC_BRIDGE="virbr0" – I need to dig deeper to confirm if editing these is actually needed.
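For reference, the relevant lines end up looking roughly like this (the type, flags and hwaddr lines are the stock Ubuntu template defaults, shown only for context):

# /etc/lxc/default.conf
lxc.network.type = veth
lxc.network.link = virbr0
lxc.network.flags = up
lxc.network.hwaddr = 00:16:3e:xx:xx:xx

# /etc/default/lxc-net
USE_LXC_BRIDGE="false"
LXC_BRIDGE="virbr0"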
With sudo virsh net-edit default (to edit /etc/libvirt/qemu/networks/default.xml) I set <forward mode='route'/>, switch the network to a netblock dedicated to this VM host, and change the DHCP range to a small block of addresses, leaving room for static assignments.
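The edited network definition then looks something like this (addresses illustrative, using the 192.0.2.0/24 documentation netblock as in the examples below):

<network>
  <name>default</name>
  <bridge name='virbr0'/>
  <forward mode='route'/>
  <ip address='192.0.2.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.0.2.100' end='192.0.2.150'/>
    </dhcp>
  </ip>
</network>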
To make the ‘static’ assignment, after later creating an LXC container, I can grab lxc.network.hwaddr from its configuration and, using virsh net-edit, add a new rule inside network→ip→dhcp along the lines of:
<host mac='00:16:3e:00:11:22' name='container.tld' ip='192.0.2.10' />
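For example, assuming the container behind that entry is itself named container:

$ grep lxc.network.hwaddr /var/lib/lxc/container/config
lxc.network.hwaddr = 00:16:3e:00:11:22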
The problem with using the DHCP CIDR option (121) is that by default the dhclient exit-hook will grab this and insert a route; later, libvirtd will see that the route already exists and refuse to assign that network to a new network interface. The fix for this is ugly script hacking.
Create a new file /etc/troll-net-reserve and inside it put, one per line, any netblocks which should be ignored if received as CIDR options.
# don't let dhclient insert a route for our local LXC network
192.0.2.0/24
Then modify /etc/dhcp/dhclient-exit-hooks.d/rfc3442-classless-routes to add a function near the top and invoke it just before adding a route:
is_local_virtual() {
    # routes already handled by a libvirt bridge are locally managed
    if ip -4 route get "$1" | grep -q 'dev virbr'; then
        return 0
    fi
    # otherwise, exact-match against the reserved netblocks;
    # comment lines in the file never match, so they're harmless
    local l
    while read l ; do
        if [ ".$l" = ".$1" ]; then
            return 0
        fi
    done < /etc/troll-net-reserve
    return 1
}
## .... just before adding the route
### between "take care of link-local routes"
### and "set route (ip detects host routes automatically)":
if is_local_virtual "${net_address}/${net_length}"; then
    continue
fi
Note that this setup only protects against exact matches between /etc/troll-net-reserve and the netblocks received via DHCP. Change the prefix length, or use a containing netblock, and you’ll need more sophisticated scripting, such as the sketch below; for the common /24 case, the exact match works well enough.
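If containment matching were ever needed, something along these lines could replace the exact-match loop; an untested sketch in POSIX-ish shell, not something I currently run:

# cidr_contains OUTER INNER: succeed if netblock INNER lies within OUTER;
# both arguments are dotted-quad/len
ip4_to_int() {
    # convert a dotted quad to a single integer
    local a b c d
    IFS=. read a b c d <<EOF
$1
EOF
    echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

cidr_contains() {
    local outer_len="${1#*/}" inner_len="${2#*/}"
    # the inner block cannot be wider than the outer one
    [ "$inner_len" -ge "$outer_len" ] || return 1
    # compare both network addresses under the outer block's mask
    local mask=$(( (0xffffffff << (32 - outer_len)) & 0xffffffff ))
    [ $(( $(ip4_to_int "${1%/*}") & mask )) -eq \
      $(( $(ip4_to_int "${2%/*}") & mask )) ]
}

With that, cidr_contains 192.0.2.0/24 192.0.2.128/25 succeeds, while cidr_contains 192.0.2.0/24 198.51.100.0/24 fails.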
NFS
You can mount NFS from outside the container, onto a path beneath its rootfs; this is the approach I use with NAT’d containers, although then the container is unaware of the mount-point and you’re not using the same uid space.
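On the host, that looks something like this (server and paths hypothetical):

# mount -t nfs4 fileserver.tld:/export/scratch /var/lib/lxc/foo/rootfs/mnt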
To mount NFS inside the container, you need to tell AppArmor to allow this; it is not necessary to modify the LXC conf file to avoid dropping the mac_admin capability: although mounting should require mac_admin, this appears not to be the case for NFS inside containers.
# vi /etc/apparmor.d/abstractions/lxc/container-base
...
# service apparmor restart
The rules I add are:
# allow NFS
mount fstype=nfs,
mount fstype=nfs4,
mount fstype=rpc_pipefs,
You can then just add NFS mount-points to the /etc/fstab inside the container’s rootfs.
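For example (server and export hypothetical):

# /var/lib/lxc/foo/rootfs/etc/fstab
fileserver.tld:/export/home  /home  nfs4  defaults  0  0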
Boot
Using lxc-ls --fancy you can see whether a given container will auto-start on boot; you enable this for a container with a symlink. For container foo:
ln -s /var/lib/lxc/foo/config /etc/lxc/auto/foo.conf
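With the symlink in place, the listing shows something shaped roughly like this (columns vary between LXC versions, so treat this as illustrative):

$ sudo lxc-ls --fancy
NAME  STATE    IPV4        IPV6  AUTOSTART
------------------------------------------
foo   RUNNING  192.0.2.10  -     YES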
However, with the virbr0 routed networking, this won’t work by default. The interface appears to be created by the libvirtd process, slowly enough that the race against lxc is normally lost. There are two parts to fixing this.
The key is to make the lxc service depend upon the libvirt-bin service having started, and make sure that the libvirt-bin service only reports that it has started once the libvirtd process has progressed far enough to create the virbr0 interface.
- Edit /etc/init/lxc.conf to change the start line to read:
  start on runlevel [2345] and started libvirt-bin
- Edit /etc/init/libvirt-bin.conf to add a post-start script, so that emission of the started event is delayed until the concurrently-running libvirtd has created the interface. That’s:
  post-start script
      while ! /bin/ip link show virbr0 >/dev/null ; do sleep 1; done
  end script
The sleep 1 is unclean, as is the hard-coded assumption of virbr0, but this works and is reliable. The lxc.start.delay LXC configuration option is too new and not supported in Ubuntu Saucy.
-The Grumpy Troll