The Grumpy Troll

Ramblings of a grumpy troll.

Porting Python WSGI app to nginx/uWSGI

One of the pieces of software this troll runs is the Synchronising Key-Server, SKS, which serves keys for PGP via the HKP protocol (based on HTTP). Recently, Daniel Kahn Gillmor brought to the attention of the keyserver operator community an issue relating to undesirable behaviour of the server code when faced with certain types of request; this makes it wise to run the key-server behind an HTTP proxy.

Until now, my SKS website had been running under Apache, with a simple redirect for traffic matching /pks/ to bump it over to the HKP port (11371), plus a WSGI app I wrote for spidering the key-server mesh and reporting stats. The WSGI app was handled via Graham Dumpleton's excellent mod_wsgi module, which runs the Python code in a separate process that can fork/thread independently of Apache, under administrator-controlled uids for each app.

Daniel's example configuration for proxying used nginx, a high-performance web-server well-suited to proxying configurations. I hadn't previously run it, but had been considering it for a new project, so this seemed like the ideal opportunity to try it out with something a couple of small steps above trivial. Alas, this meant leaving behind mod_wsgi. Looking around, I decided to try uWSGI, an application container geared heavily towards Python (but supporting some other languages too), using uwsgi as the protocol for talking from the web-server to the app. You might think of uWSGI as a bit like monit combined with a servlet framework, speaking an optimised protocol for handling requests.

Making this easier, I use dedicated IPv6 addresses for the various services I run, including SKS, and had a "services" IPv4 address on which only SKS was currently running. Since I had Apache listening on specific IP addresses, I "just" needed to drop those addresses and vhosts from Apache and add them to nginx, and there would be no conflict. Easier still: almost everyone using SKS reaches the servers via a round-robin DNS name, so I could take my own server down briefly for the cut-over.

The first thing making life harder: nginx (1.0.14) appears to ignore alternative config/directory directives when performing config testing, so there is no way to syntax-check the configs in my svn repo before deploying. *sigh*

I opted for the "emperor" setup of uWSGI, which let me set up one master server, with configuration files in a directory specifying which apps to run.
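Roughly, that master configuration looks like this (paths and uid here are illustrative placeholders, not my actual setup):

```ini
; uWSGI emperor master: watch a directory of per-app ("vassal")
; config files and spawn/reap a worker process for each one
[uwsgi]
emperor = /etc/uwsgi/vassals
; drop privileges for the emperor itself; vassals can set their own
uid = uwsgi
gid = uwsgi
```

Each file dropped into the vassal directory is picked up automatically; removing it stops that app.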

The first pass was to replicate the existing setup in nginx configuration, drop the configuration from Apache, switch SKS to listen on localhost with nginx proxying to it, set up uWSGI and have nginx talk to that, but not yet worry about debugging the WSGI app. Not much! In my favour: all configurations were in Subversion, so rollback was fairly easy.

The SKS change was trivial: a small change to sksconf, and it worked perfectly first time.
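The change amounts to binding SKS's HKP listener to loopback only, so that only the local proxy can reach it; something along these lines in sksconf (from memory, so treat as illustrative):

```
# sksconf: bind the HKP (web) interface to loopback only,
# leaving nginx to front the public IPs on port 11371
hkp_address: 127.0.0.1
hkp_port: 11371
```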

Apache: I missed an IP address leading to an address conflict, but otherwise dropping the configuration went smoothly enough. I was only taking away, after all.

With nginx: I made the /pks/ URI pass through as a proxy, as well as having nginx listen on the public IPs on port 11371 and pass everything through on that. It took a little puzzling, but I got the various vhosts sorted out. The biggest issue was that I listen on every known pool-name as a vhost for this site, so that port 80 traffic to a round-robin pool can serve content; I wanted /pks/ to pass through as a proxy now, while keeping the HTTP redirects which canonicalise the hostname: otherwise resource inclusion (javascript, images, style-sheets) would go to a hostname resolving to hosts which don't all have the same content.
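The shape of the pass-through, as a sketch rather than my exact config (hostname and addresses are placeholders):

```nginx
# HKP port: everything goes straight to the loopback-bound SKS
server {
    listen 11371;
    server_name keys.example.org;  # placeholder pool name

    location / {
        proxy_pass http://127.0.0.1:11371;
        proxy_set_header X-Forwarded-For $remote_addr;
    }
}

# Port 80: only /pks/ is proxied; the rest is ordinary web content
server {
    listen 80;
    server_name keys.example.org;

    location /pks/ {
        proxy_pass http://127.0.0.1:11371;
    }
}
```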

I already miss mod_macro for Apache, as it helps cut down on repetitive boiler-plate. The nginx configuration is cleaner and more concise, but repetition always leads to mistakes.

Key conceptual points in nginx: the IPs and ports to listen on are taken from the union of all listen directives in server blocks; those are repeated for each vhost, so that every vhost carries both IP/port information and server_name directives. Passing data on to another service is easy, as you define it as an upstream, but that only defines how to reach the upstream; it doesn't include common settings that need to be set for every reference to that upstream. So as the needed uwsgi_* directives piled up, they had to be repeated. They can be abstracted out one layer up, but then you can't use different directives for different services, or you have to do a lot of additional dispatch via if evaluation. The variable names and the construction of log_format directives are lovely; the TLS configuration was a breeze.
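To illustrate the repetition: the upstream block only names the endpoint, so the per-request uwsgi_* settings have to ride along with every uwsgi_pass that uses it (a sketch; names and paths are placeholders):

```nginx
upstream spider_app {
    # only the address lives here; no shared request settings
    server unix:/var/run/uwsgi/spider.sock;
}

server {
    listen 80;
    server_name sks.example.org;  # placeholder

    location /spider/ {
        uwsgi_pass spider_app;
        # these must be repeated (or inherited from one level up)
        # in every location that talks to this upstream
        include uwsgi_params;
        uwsgi_param SCRIPT_NAME /spider;
    }
}
```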

Which leads to an eye-opening point I hadn't considered until reading the nginx documentation: TLS session resumption is handled before TLS extensions are processed, so you can't have separate TLS session pools for separate vhosts, because picking the vhost requires the Server Name Indication (SNI) extension. I opted for a shared pool, but am still thinking through the implications.
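The shared pool amounts to one line in each TLS server block (the cache size here is arbitrary):

```nginx
# one session cache shared across all TLS vhosts, since resumption
# is decided before SNI has told nginx which vhost is in play
ssl_session_cache shared:SSL:10m;
```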

nginx uses its own configuration file format even for the MIME type information. Generating that file from my canonical specification, I discovered the hard way that nginx is case-insensitive in extension names and treats duplicates as a fatal error which prevents startup.
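The generated file follows nginx's own types block format; given the case-insensitivity, a generator that emits both case variants of an extension produces a fatal duplicate (illustrative):

```nginx
types {
    text/html   html;
    image/jpeg  jpeg jpg;
    # also emitting "JPG" here would count as a duplicate of "jpg",
    # and nginx refuses to start
}
```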

After fixing that and some missing semi-colons (see above about the inability to test from the svn repo dir), nginx came up. The web-site worked; port 11371 was not reachable. The fix was in /etc/pf.conf, where I had told my packet-filter that port 11371 should only be reachable if the listening socket was opened by a process which, at open time, ran as the SKS run-time user. Easy fix: allow { root, sks }. Yes, nginx drops privileges correctly.
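The pf rule in question, roughly (the interface macro is a placeholder; previously the user list was just sks):

```
# /etc/pf.conf: only allow traffic to 11371 when the listening
# socket was opened by one of these users
pass in on $ext_if proto tcp to port 11371 user { root, sks }
```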

At this point, with the SKS db process running, I brought back up SKS recon (for the mesh peering, which is still vulnerable) and had everything except my own WSGI app running. This is where the debugging started.

Problem 0: rebuilt uWSGI to use Python 2.7 instead of 2.6.

Problem 1: uWSGI (1.0.1) core-dumps; I pointed gdb at it and saw that it uses realpath(3) with NULL for the second parameter, expecting that to malloc(3) the storage. My dated OS release does not support that extension. The problem is already fixed (a) in later releases of FreeBSD and (b) in the current head of uWSGI (which has uninformative repository commit messages, but had fixed it). I added an extra patch to the Ports tree, rebuilt, and uwsgi now ran and picked up the configuration.

Problem 2: I loathe the Python logging.config framework, but had used it anyway as The Right Thing To Do. Unfortunately, debugging why it wasn't being picked up took much effort before I abandoned it and just wrote to a file.

Problem 3: by default, uWSGI does not honour the WSGI specification's requirement to strip SCRIPT_NAME off the front of PATH_INFO, which meant my selector.Selector dispatch was failing. The magic-number-laden directive uwsgi_modifier1 30; in nginx.conf, for each uwsgi_pass invocation, fixed that. This is referenced in the uWSGI RunOnNginx documentation, but since I wrote the app a few years ago, I'd forgotten that this manipulation was required by the specification, and didn't realise it isn't an esoteric corner-case but pretty much essential. A list of the available modifier values is documented on a protocol page.
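In context, that one directive sits alongside the other per-location uwsgi settings (location and upstream names are placeholders):

```nginx
location /spider/ {
    include uwsgi_params;
    uwsgi_param SCRIPT_NAME /spider;
    # modifier 30: have uWSGI strip SCRIPT_NAME off the front of
    # PATH_INFO, as the WSGI specification expects
    uwsgi_modifier1 30;
    uwsgi_pass spider_app;
}
```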

At this point, I had an app which would run and serve content, but not populate the data. I soon decided to reduce this problem to a simpler test-case; I added a new entry to DNS, patched it through as a vhost, which dispatched all content to my test-script running under uWSGI. I wrote it with a wsgiref.simple_server created when invoked standalone, much like my spider, and just reported some basic data. Then I made it use a thread, and decided the simplest thing to do with a thread was to have a couple of counters, one incremented for every request, the other incremented about once per second, and grab that. (I tried to include the code below, but Blogger won't preserve leading-indents in <code> blocks, even if expressed as &nbsp;. So instead, see this public gist, 2133355.)
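The shape of that test script was roughly this (a from-memory sketch, not the gist itself; the port and names are arbitrary):

```python
import threading
import time
from wsgiref.simple_server import make_server

# Two counters: one bumped per request, one bumped by a background thread.
request_count = 0
timer_count = 0

def _ticker():
    # Increment timer_count roughly once per second, forever.
    global timer_count
    while True:
        time.sleep(1)
        timer_count += 1

ticker = threading.Thread(target=_ticker)
ticker.daemon = True
ticker.start()

def application(environ, start_response):
    # Report both counters; under uWSGI the timer one exposes
    # whether the background thread is actually being scheduled.
    global request_count
    request_count += 1
    body = "requests=%d timer=%d\n" % (request_count, timer_count)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body.encode("utf-8")]

if __name__ == "__main__":
    # Standalone mode: serve via wsgiref, much like the spider does.
    make_server("127.0.0.1", 8000, application).serve_forever()
```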

Works great stand-alone, but when run under uWSGI, the timer-based value never increments. Add os.getpid() reporting, to make sure I'm not just getting the same memory layout on repeated forks: no, same pid; the thread exists, yet the thread value never goes up. The per-request counter only ever goes up by 2, but that's easy: favicon.ico retrieval by the browser. Go peruse the docs once more.

Edit the uWSGI emperor YAML file for this app, add “enable-threads: 1” and suddenly the timer counter goes up too. So in uWSGI, by default, threads are created and .start() returns without errors, but they never actually run. I can see the logic from the implementer's point of view (don't do unneeded extra setup work when you're trying to be efficient), but as a user/administrator, the lack of error messages, or even a warning in the log-file that the thread would never actually run, made this awkward to track down.
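The vassal file ended up along these lines (paths and user names are placeholders; the key line is enable-threads):

```yaml
# per-app vassal config, dropped into the emperor's watched directory
uwsgi:
  socket: /var/run/uwsgi/spider.sock
  uid: spider
  gid: spider
  wsgi-file: /srv/spider/app.py
  # without this, threads .start() cleanly but are never scheduled
  enable-threads: 1
```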

But after adding that enable-threads directive for the main script, things suddenly worked and I soon had data being served. I then just re-enabled the cron-job which pulls the machine-readable dump of "valid IP servers" to update my own noddy pool definition in DNS, and everything's running fine.

Currently outstanding problem: in uWSGI's emperor mode, there's a double-fork of the emperor; the parent writes its pid to the pid-file and lingers, but the child is the actual controlling process. So any time I issue a reload/stop/other-mutate command to uWSGI, the parent dies and the child continues on unchanged. If I manually plop the real pid into the pidfile (original value + 1), the init script works just fine for control thereafter.

In all? Some annoyances, but far better than it could have been. I was updating code I hadn't touched in a long time, with technologies I'd forgotten the protocol details for, and after bug-fixing and configuration kicking, it still all just worked. WSGI as a standard API works.

-The Grumpy Troll, tolerably happy

[edited 2012-10-30: uwsgi is moving, updated links; old site was which is still up for now]
Categories: WSGI PGP nginx python SKS Apache proxy