The Grumpy Troll

Ramblings of a grumpy troll.

Golang SSH Redux

I’d like to set a couple of things straight, for the record.

I’ll cover the post/blog, and then I’d like to counter some misconceptions. While part of me thinks “I must’ve been very unclear to have so many people misunderstand”, I also saw how many people commented without bothering to read, so really there’s a limit to how much self-flagellation will happen.

I am not a security researcher. I do not try to get bug bounties. I normally care about things like CVEs when I’m coordinating the Exim project’s response to some problem and talking to CERTs, packagers, etc. I’m thus either on the receiving side of the CVE report or requesting a CVE for software for which I am somehow responsible.

The Post

In late March, while working on a project for a client, I encountered something surprising in the state of the world around Golang (the programming language) and its SSH support (Secure SHell, for secure connections to remote computers by limited sets of authorized users, often with the goal of running arbitrary commands on those computers: so remote access, rather than remote retrieval of resources).

On April 2nd, I wrote about what happened. On April 3rd, I added the CVE number when it was assigned. On April 4th, I started a Twitter poll and linked to it. The poll ran for the maximum time allowed (10 days). After it was over, I updated the blog post again with the results. As with most of my blog posts, a couple of friends read it and commented, but nothing more came of the issue.

Then on April 15th, out of the blue, a former colleague tagged me in a message with “your article is tops on HN, in the unlikely event you were not already aware”.

I was not aware. I had received one message earlier that day from someone I vaguely knew, and who had read the post and had some more concerns about their own experiences with the security handling of a certain vendor. I didn’t know there was anything else happening.

In fact, the post went to the top of Hacker News, then also appeared on Lobsters, Slashdot, two different /r/netsec posts on Reddit and all sorts of other places. (Yes, this Grumpy Troll goes by syscomet in a few fora.)

This blog is hosted in an AWS S3 bucket, with AWS CloudFront in front of it. That setup handled the load without blinking; it would handle a far higher load than anything thrown at it that weekend. Total added cost to me for around 64k post views, 608k HTTP requests and 21GB of traffic has been just under $3. That I don’t need to care that this was over one weekend instead of spread over the month is an advantage of using infrastructure built to scale for giants and riding along as a minnow.

Also: I need to get around to using HTML Subresource Integrity checksums and offloading serving jQuery and some CSS stuff to common hosting. That would have saved me most of that $3, while speeding access for most people and with a “probably acceptable” privacy trade-off for resources which are “probably loaded anyway, and use a cryptographic checksum to avoid even having to leak that we might need to load it again”.
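
For the curious, a Subresource Integrity value is a hash-algorithm prefix followed by the base64-encoded digest of the resource. Here is a minimal Go sketch of computing one; the helper name sriSHA384 and the sample bytes are assumptions for illustration, not part of this blog's actual tooling.

```go
package main

import (
	"crypto/sha512"
	"encoding/base64"
	"fmt"
)

// sriSHA384 returns a Subresource Integrity value for the given bytes:
// the prefix "sha384-" followed by the padded base64 SHA-384 digest.
// (Hypothetical helper name, for illustration only.)
func sriSHA384(content []byte) string {
	sum := sha512.Sum384(content)
	return "sha384-" + base64.StdEncoding.EncodeToString(sum[:])
}

func main() {
	// Stand-in bytes; a real build step would read the actual JS or CSS file.
	script := []byte("console.log('hello');\n")
	fmt.Printf("<script src=\"...\" integrity=%q crossorigin=\"anonymous\"></script>\n",
		sriSHA384(script))
}
```

The browser recomputes the digest of whatever the shared host serves and refuses to use the resource if it does not match the integrity attribute.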

In the post, I aimed for my usual level of dry humor, for which folks should be able to see a warning in the sidebar. Expeditions are still failing to report.


The Go Programming Language never had an insecure SSH library.

What they had was a library with an API prone to misuse. There was never a documentation flaw. The requirement for secure use was always documented.

I know this, because I read the API docs and clearly understood that it was on me to implement hostkey verification. After some brief playing around to sketch out what this would look like, I decided to search to see if there were common debugged libraries which already handled the corner cases, and so looked to see what other projects were doing. Not Invented Here (NIH) is a real problem and I try to avoid it.

At this point, I discovered how few people (none that I saw) were bothering to implement the hostkey verification. They got working connections, but they never checked that things failed when they should fail.

The handling of this issue by the Go maintainers was exemplary. They had no security problem but accepted that so many people misusing the library was a security problem and that they could do something about this. They changed the library defaults so as to force programmers to make a decision about how to handle verification. Instead of naive use leading to code which doesn’t fail when it should, naive use now leads to code which doesn’t work until programmers decide how to handle ensuring their application fails when it should.
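
The shape of that change can be sketched with the standard library alone. This is a toy model, not the real package: ClientConfig, VerifyHostKey, Connect, PinnedKey and InsecureAcceptAny are hypothetical names invented here. In golang.org/x/crypto/ssh the corresponding pieces are the HostKeyCallback field of ssh.ClientConfig, ssh.FixedHostKey and ssh.InsecureIgnoreHostKey.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// HostKeyCheck is a hypothetical callback type: return nil to accept the key.
type HostKeyCheck func(host string, key []byte) error

// ClientConfig is a hypothetical stand-in for a client configuration.
type ClientConfig struct {
	User          string
	VerifyHostKey HostKeyCheck // deliberately has no usable zero value
}

// ErrNoHostKeyCheck is returned when the caller never made a decision.
var ErrNoHostKeyCheck = errors.New("no host key check configured")

// Connect simulates the point in a handshake where the server's key is seen.
// Naive use (a nil check) fails closed instead of silently trusting the key.
func Connect(cfg ClientConfig, host string, presentedKey []byte) error {
	if cfg.VerifyHostKey == nil {
		return ErrNoHostKeyCheck
	}
	if err := cfg.VerifyHostKey(host, presentedKey); err != nil {
		return fmt.Errorf("host key rejected for %s: %w", host, err)
	}
	return nil
}

// PinnedKey accepts exactly one expected key.
func PinnedKey(want []byte) HostKeyCheck {
	return func(_ string, got []byte) error {
		if !bytes.Equal(want, got) {
			return errors.New("host key mismatch")
		}
		return nil
	}
}

// InsecureAcceptAny is the explicit, greppable opt-out.
func InsecureAcceptAny() HostKeyCheck {
	return func(string, []byte) error { return nil }
}

func main() {
	key := []byte("example-key-bytes") // placeholder, not a real host key
	fmt.Println(Connect(ClientConfig{User: "ops"}, "host.example", key))
	fmt.Println(Connect(ClientConfig{User: "ops", VerifyHostKey: PinnedKey(key)}, "host.example", key))
}
```

The design point is that skipping verification now has to be written down in the calling code, where a reviewer or a search can find it.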

Further, the Go maintainers have continued to improve the available facilities in this area, with features such as a new sub-package knownhosts to make it easier to Do What OpenSSH Does.

The CVE added to the blog-post, CVE-2017-3204, is an identifier for the issue. It gives other programmers fixing their library-using code a common identifier for describing what they’re responding to when communicating with their customers and users, so that folks can clearly link things together instead of having to puzzle out which set of security issues some release is claiming to address. This is, to my understanding, part of the point of the CVE system.

There was one tiny snafu in reporting the issue, which I described without naming names, in how PGP was handled. The person who had asked me to resend my email without PGP provided the following within a comment on the HN post:

  1. 99% of the PGP-encrypted emails we get to are bogus security reports. Whereas “cleartext” security reports are only about 5-10% bogus. Getting a PGP-encrypted email to has basically become a reliable signal that the report is going to be bogus, so I stopped caring about spending the 5 minutes decrypting the damn thing (logging in to the key server to get the key, remembering how to use gpg).

That is a somewhat depressing indictment of the state of affairs with use of secure communications tools. I can’t fault him for anything he said. I did get a laugh out of:

I’d say it was worth it. This thread was more fun than using gpg, even if the time savings is a wash.

Context: I have patches in GnuPG, in SKS (the keyserver software) and am somewhat heavily involved in the PGP keyserver community (wrote the setup guide which everyone uses today, etc). I use GnuPG and it’s (1) better than the alternatives; (2) high quality cryptographic engineering; (3) not created by UX professionals.

If I’d known how bad the stats were for PGP mail to security addresses, I wouldn’t have bothered. Instead, I’ve now pinned in my mail-server as a domain which requires validated TLS when sending mail to the MX.

The Vendor

The vendor I mentioned in the original blog-post stepped into the fray and described their view. Please remember that this is not the only vendor with a security problem here. I found no examples in my searches at the time of anything doing this right. I picked one vendor of popular devops tooling, whose work I respected, and reported it to them. They’re already far better than most, or I wouldn’t have been recommending their tools to start with.

I’ll summarize the vendor’s response on HN as “we didn’t communicate clearly; we said some things but changed our minds and didn’t tell the person reporting; we have better ways to handle SSH and have always said to use them”.

I was pleased that they repeatedly noted that I was accurate in my portrayal of their response.

Their stance about the impact of the problem has improved.

About Vault, they wrote:

With the addition of the ability to generate SSH certificates (which was on our roadmap for a long time and added in 0.7, prior to both the original report and the blog post).

About Packer, they describe that the availability of SSH Certificates makes it less of an issue.

I know quite a few people using Packer. None are willing to say that they’re using SSH certificate generation in image generation. There’s a better way available but it’s not used by default. I emphasize this point because this is the same core issue, reframed, which was in the Golang package: it was possible to use it correctly, but almost nobody did so and it wasn’t the easy path.

One small aside: I was an attendee at the vendor’s first conference when the feature of Vault handling SSH for users was introduced. At the conference party, I grabbed the author of the feature and probably edged across the line into rudeness as I described why what they had was not something I’d risk deploying and pointed him at SSH Certificates as a saner path forward without the same operational failure modes and a much smaller attack surface for those who want that.

So I’m glad that the vendor put SSH certificates on their roadmap after I explained what they were and why their first pass implementation was problematic. It’s good when people listen, put aside the problems incurred by my forgetting about human social graces, and work to improve things anyway. That Vault can use SSH Certificates for host and user management is a good thing. It’s not a replacement for hostkey verification unless and until it’s almost as easy to operationally deploy as is non-CA TOFU SSH where clients ignore the “F” in “TOFU”.

Other Thoughts

The quality of the cryptography in Golang is better than in most other languages. I’ve read the source to a few TLS libraries in my time and Golang’s is by far the nicest, with good encapsulation, via crypto/subtle, of issues such as constant-time operations that protect against key-dependent timing leaks.
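
As a small illustration of what crypto/subtle offers, here is a sketch of checking a message authentication tag with a constant-time comparison. The key, message and function names are made up for the example. In practice hmac.Equal wraps the same comparison; subtle is used directly here only to show the primitive.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// sign returns the HMAC-SHA256 tag for msg under key.
func sign(key, msg []byte) []byte {
	mac := hmac.New(sha256.New, key)
	mac.Write(msg)
	return mac.Sum(nil)
}

// validTag recomputes the tag and compares it without an early exit on the
// first differing byte, so timing does not reveal how much of a guess matched.
func validTag(key, msg, tag []byte) bool {
	want := sign(key, msg)
	return subtle.ConstantTimeCompare(want, tag) == 1
}

func main() {
	key := []byte("demo key, not a real secret")
	msg := []byte("hello")
	tag := sign(key, msg)
	fmt.Println(validTag(key, msg, tag)) // a correct tag is accepted

	tag[0] ^= 0x01 // corrupt one bit
	fmt.Println(validTag(key, msg, tag)) // a corrupted tag is rejected
}
```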

Golang isn’t perfect and is sometimes frustrating, but pragmatically it’s the only language I’ll use today for a variety of systems programming and security-sensitive tasks. There are a number of other options in this space too (e.g., Rust) and they’re all gaining some traction. We finally have a realistic shot at migrating a lot of security-boundary services on Unix-heritage systems to code written in languages not particularly prone to buffer overflow attacks.

Things have reached the point where I can take a stance and decline to deploy on my own systems any new Internet-facing servers written in C. Some of those which exist now will remain for a while; servers from groups with an excellent track record (OpenBSD) might cause me to make an exception, based upon their reputation.

Sure, there are more security problems than buffer overflows. But you know what? When systemically purging that one class of flaw gets rid of at least 95% of the remote code execution flaws, I’ll take it.

I look at two approaches to programming language security. On the one hand, the Golang maintainers have a type-safe language immune from buffer overruns, with text templating systems designed to auto-escape data appropriately to avoid various web-based attacks (JS injections etc) and where an API which is being heavily misused gets changed to force folks to confront their problem space more fully. On the other hand, we have C compiler maintainers who assert that any behavior not defined in the C standard is not something where they have to pick one behavior and stick with it, but can instead declare it an impossible situation, that code which relies upon common assumptions is buggy, and proceed to optimize away the security checks.
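
The auto-escaping half of that comparison is easy to see with the standard library's html/template. A minimal sketch; the template text, the render helper and the hostile input are invented for the example:

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// greeting is parsed once; html/template escapes {{.}} for its HTML context.
var greeting = template.Must(template.New("greeting").Parse("<p>Hello, {{.}}!</p>"))

// render executes the template with an untrusted name.
func render(name string) (string, error) {
	var out strings.Builder
	if err := greeting.Execute(&out, name); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	page, err := render("<script>alert(1)</script>")
	if err != nil {
		panic(err)
	}
	fmt.Println(page)
	// prints: <p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;!</p>
}
```

The programmer did nothing special here; the safe behavior is what the obvious code produces.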

If I program in C, I need to defend against the compiler maintainers.
If I program in Go, the language maintainers defend me from my mistakes.

If you’re choosing software to deploy on your systems, which language would you rather that the vendors be writing in?
