Skip to main content

External Blogs

Concerns with Signal receipt notifications

Anarcat - Fri, 07/27/2018 - 16:18

During some experiments with a custom Signal client with a friend, let's call him Bob, he was very surprised when we had a conversation that went a little like this:

A> hey Bob! welcome home! B> what? B> wait, how did you know I got home? B> what the heck man? did you hack my machine? OMGWTFSTHUBERTBBQ?!

I'm paraphrasing as I lost copy of the original chat, but it was striking how he had absolutely no clue how I figured out he had just came home in front of his laptop. He was quite worried I hacked into his system to spy on his webcam or some other "hack". As it turns out, I just made simple assertions based on data Signal provides to other peers when you send messages. Using those messages, I could establish when my friend opened his laptop and the Signal Desktop app got back online.

How this works

This is possible because the receipt notifications in Signal are per-device. This means that the "double-checkmark" you see when a message is delivered to the device is actually only when the first device receives the message. Behind the scenes, Signal actually sends a notification for each device, with a unique, per-device identifier. Those identifiers are visible with signal-cli. For example, this is a normal notification the Signal app will send when confirming reception for a message, as seen from signal-cli:

Envelope from: “Bob” +15555555555 (device: 1) Timestamp: 1532279834422 (2018-07-22T17:17:14.422Z) Got receipt.

That's Bob's phone telling me it received the message. On my side, the Signal app shows a second checkmark to tell me the message was transmitted. (There are also "blue checkmarks" now that tell the user the other person has seen the message, but I haven't looked into those in detail.) Then another notification comes in:

Envelope from: “Bob” +15555555555 (device: 2) Timestamp: 1532279901951 (2018-07-22T17:18:21.951Z) Got receipt.

Notice the device number there? It changed from 1 to 2. This tells me this is a different device than the first one. Device 1 will most likely be the phone app and device 2 will most likely be Signal Desktop. (In my case, I tried so many different configurations thatI have device numbers up to 8, but my phone is still device 1.)

An attacker can use those notifications to tell when my phone goes online. It is also possible to make reasonable assertions about the identity of each device: any device number above one is most likely a Signal Desktop client. This can be used to assert physical presence on different machines: the desktop at home, laptop in the office, etc. It might not seem like much, but it sure felt creepy to Bob.

While writing this article, I figured I would reproduce those results, I wrote Bob again to ask for help. Here's how the (redacted and reformatted) conversation went:

A-1> hey you there? * B-1 message received A-1> i want to see if i can freak you out with signal again * B-1 message received A-1> i'm going to write about the issue, and i want to reproduce the results * B-1 message received B-1> he's driving B-1> sure, I'll be your guinea pig he says A-1> all he needs to do is open his laptop and start signal-desktop :p * B-1 message received B-1> we'll be home in 1h30 A-1> i'll know, don't worry :p * B-1 message received

After an hour or two, Bob gets home opens his laptop, and you can see the key message that reveals it:

* B-2 mesage received A-1> welcome home, sucker! ;) B-2> dang dog.

This attack can be carried out by anyone who knows Bob's phone number. Because Signal is an open network, you are free to send messages to anyone without their consent. An attacker only has to send spam messages to a victim to figure out when they're online, how many devices they own and when they are online. There's no way for Bob to protect himself from this attack, other than trying to keep his phone number private.

Why Signal works that way

When I shared an earlier draft of this article to the Signal Security team, they stated this was a necessary trade-off, as each device carries a unique cryptographic key anyways and that:

Signal encrypts messages individually to each recipient device. Thus as long as there is a "delivery receipt" feature, it will be possible to learn which recipient devices are online, for example by sending an encrypted message to a subset of the recipient devices, and seeing whether a delivery receipt is received or not.

The alternative seems to be to either disable receipt notifications or sharing the same private key among different devices, which induces other problems:

Having all recipient devices share the same encryption keys would render the Diffie-Hellman ratcheting which is part of the Signal protocol ineffective, since all devices (including offline ones) would have to use synchronized DH ratchet key pairs, preventing these values from adding fresh randomness. In addition, it would add massive protocol complexity and fragility to try to keep recipient devices synchronized, while trying to achieve the (probably-infeasible) goal of eliminating all ways to distinguish recipient devices.

I am not certain those tradeoffs are that clear-cut, however. I am not a cryptographer, and specifically not very familiar with the "ratcheting" algorithm behind the "Signal protocol" (or is it called Noise now?), but it seems to me there should be a way to provide multi-device, multi-key encryption, without revealing per-device identifiers to other clients. In particular, I do not understand what purpose those integers serve: maybe they are automatically generated by signal-cli and are just a side-effect of a fundamental property of the protocol, in which case I would understand why they would be unavoidable. To be fair, other cryptographic systems also share similar problems: an encrypted OpenPGP email usually embeds metadata about source and destination addresses, as email headers are not encrypted. Even a normal OpenPGP encrypted blob includes OpenPGP key data by default, although there are ways to turn that off and make sure an encrypted blob is just an undecipherable blob. The problem with this, of course, is that many critics of OpenPGP present it as an old technology that should be replaced by more modern alternatives like Signal, so it's a bit disappointing to see it suffers from similar metadata exposure problems as older protocols.

But apart from cryptographic properties, there are certain user expectations regarding Signal, and my experience with this specific issue is that this property certainly breaks some privacy expectations for users. I'm not sure people would choose to have delivery notifications if they were given the choice.

Other metadata issues

There are other metadata issues in Signal, of course. Like receipt notifications, they are tradeoffs between usability and privacy. The most notable one is probably how Signal shares your contact list. The user-visible effect is the "Bob is on Signal!" message that pops up when the server figures that out. The Signal people have done extensive research to make this work securely while at the same time leveraging the contacts on your phone, but it's still a surprising phenomenon to new users who don't know about the specifics of how this is implemented.

Another one is how groups are opt-out only: anyone can add you to a group without your consent, which shares your phone number to the other members of the group, a bit like how carbon-copies in emails reveals a social network.

Compared with groups and new users notifications, the receipt notification issue is a little more pernicions: the leak is not visible at all to users except if they run signal-cli... While people clearly see each other's presence in a group, they definitely will not know that those little checkmark disclose more information than they seem to other users.

The bottomline is that crypto and security are hard to implement but also hard to make visible to users. Signal does a great job at making a solid communication application that provides decent security, but it can have surprising properties even for skilled engineers who thought they knew about the security properties of the system, so I am worried about my fellow non-technical friends and their expectations of privacy...

Categories: External Blogs

My free software activities, July 2018

Anarcat - Fri, 07/27/2018 - 13:37
Debian Long Term Support (LTS)

This is my monthly Debian LTS report.

Most of my hours this month were spent updating jessie to catchup with all the work we've done in Wheezy that were never forward-ported (DLA-1414-1, fixing CVE-2017-9462, CVE-2017-17458, CVE-2018-1000132, OVE-20180430-0001, OVE-20180430-0002, and OVE-20180430-0004). Unfortunately, work was impeded by how upstream now refuses to get CVE identifiers for new issues they discover in the process, which meant that I actually missed three more patches which were required to fix the subrepo vulnerability (CVE-2017-17458). In other issues, upstream at least attempted to try identifiers through the OVE system which is not as well integrated in our security tracker but does allow some cross-distro collaboration at least. The regression advisory was published as DLA-1414-2.

Overall, the updates of the Mercurial package were quite difficult as the test suite would fail because order of one test would vary between builds (and not runs!) which was quite confusing. I originally tried fixing this by piping the output of the test suite through sort to get consistent output, but, after vetting the idea one of the upstream maintainers (durin42), I ended up sorting the dictionnary in the code directly.

I have also uploaded fixes for cups (DLA-1412-1, fixing CVE-2017-18190 and CVE-2017-18248) and dokuwiki (DLA-1413-1, fixing CVE-2017-18123).

Other activities

This month was fairly quiet otherwise, as I was on vacation.

I still managed to push a few projects forward. The pull request to add nopaste to ELPA was met with skepticism considering there is already another paste tool in ELPA called webpaste.el which takes the different (and unfortunate) approach of reimplementing all pastebins natively, instead of reusing the existing paste programs. I have, incidentally, discovered similar functionality in my terminal emulator, in the form of urxvt-selection-pastebin although I have yet to try (and probably patch) that approach.

We have also been dealing with a vast attack on IRC servers primarily aimed at hurting the reputation of Freenode operators, but that affected all IRC networks. On top of implementing custom measures to deal with the problem on our networks, I have contributed some documentation to help users and improvements to a IRC service to help with the attack.

I've also had a great conversation with the author of croc, a derivative of magic-wormhole because of flaws I felt were present in the croc implementation. It seems I was able to convince the author to do the right thing and future versions of the program might be fully compatible with wormhole, which is great news.

Categories: External Blogs

My free software activities, June 2018

Anarcat - Thu, 06/28/2018 - 12:55

It's been a while since I haven't done a report here! Since I need to do one for LTS, I figured I would also catchup you up with the work I've done in the last three months. Maybe I'll make that my new process: quarterly reports would reduce the overhead on my side with little loss on you, my precious (few? many?) readers.

Debian Long Term Support (LTS)

This is my monthly Debian LTS report.

I omitted doing a report in May because I didn't spend a significant number of hours, so this also covers a handful of hours of work in May.

May and June were strange months to work on LTS, as we made the transition between wheezy and jessie. I worked on all three LTS releases now, and I must have been absent from the last transition because I felt this one was a little confusing to go through. Maybe it's because I was on frontdesk duty during that time...

For a week or two it was unclear if we should have worked on wheezy, jessie, or both, or even how to work on either. I documented which packages needed an update from wheezy to jessie and proposed a process for the transition period. This generated a good discussion, but I am not sure we resolved the problems we had this time around in the long term. I also sent patches to the security team in the hope they would land in jessie before it turns into LTS, but most of those ended up being postponed to LTS.

Most of my work this month was spent actually working on porting the Mercurial fixes from wheezy to jessie. Technically, the patches were ported from upstream 4.3 and led to some pretty interesting results in the test suite, which fails to build from source non-reproducibly. Because I couldn't figure out how to fix this in the alloted time, I uploaded the package to my usual test location in the hope someone else picks it up. The test package fixes 6 issues (CVE-2018-1000132, CVE-2017-9462, CVE-2017-17458 and three issues without a CVE).

I also worked on cups in a similar way, sending a test package to the security team for 2 issues (CVE-2017-18190, CVE-2017-18248). Same for Dokuwiki, where I sent a patch single issue (CVE-2017-18123). Those have yet to be published, however, and I will hopefully wrap that up in July.

Because I was looking for work, I ended up doing meta-work as well. I made a prototype that would use the embedded-code-copies file to populate data/CVE/list with related packages as a way to address a problem we have in LTS triage, where package that were renamed between suites do not get correctly added to the tracker. It ended up being rejected because the changes were too invasive, but led to Brian May suggesting another approach, we'll see where that goes.

I've also looked at splitting up that dreaded data/CVE/list but my results were negative: it looks like git is very efficient at splitting things up. While a split up list might be easier on editors, it would be a massive change and was eventually refused by the security team.

Other free software work

With my last report dating back to February, this will naturally be a little imprecise, as three months have passed. But let's see...


I wrote eigth articles in the last three months, for an average of three monthly articles. I was aiming at an average of one or two a week, so I didn't get reach my goal. My last article about Kubecon generated a lot of feedback, probably the best I have ever received. It seems I struck a chord for a lot of people, so that certainly feels nice.


Usual maintenance work, but we at last finally got access to the Linkchecker organization on GitHub, which meant a bit of reorganizing. The only bit missing now it the PyPI namespace, but that should also come soon. The code of conduct and contribution guides were finally merged after we clarified project membership. This gives us issue templates which should help us deal with the constant flow of issues that come in every day.

The biggest concern I have with the project now is the C parser and the outdated Windows executable. The latter has been removed from the website so hopefully Windows users won't report old bugs (although that means we won't gain new Windows users at all) and the former might be fixed by a port to BeautifulSoup.

Email over SSH

I did a lot of work to switch away from SMTP and IMAP to synchronise my workstation and laptops with my mailserver. Having the privilege of running my own server has its perks: I have SSH access to my mail spool, which brings the opportunity for interesting optimizations.

The first I have done is called rsendmail. Inspired by work from Don Armstrong and David Bremner, rsendmail is a Python program I wrote from scratch to deliver email over a pipe, securely. I do not trust the sendmail command: its behavior can vary a lot between platforms (e.g. allow flushing the mailqueue or printing it) and I wanted to reduce the attack surface. It works with another program I wrote called sshsendmail which connects to it over a pipe. It integrates well into "dumb" MTAs like nullmailer but I also use it with the popular Postfix as well, without problems.

The second is to switch from OfflineIMAP to Syncmaildir (SMD). The latter allows synchronization over SSH only. The migration was a little difficult but I very much like the results: SMD is faster than OfflineIMAP and works transparently in the background.

I really like to use SSH for email. I used to have my email password stored all over the place: in my Postfix config, in my email clients' memory, it was a mess. With the new configuration, things just work unattended and email feels like a solved problem, at least the synchronization aspects of it.


As often happens, I've done some work on my Emacs configuration. I switched to a new Solarized theme, the bbatsov version which has support for a light and dark mode and generally better colors. I had problems with the cursor which are unfortunately unfixed.

I learned about and used the Emacs iPython Notebook project (EIN) and filed a feature request to replicate the "restart and run" behavior of the web interface. Otherwise it's real nice to have a decent editor to work on Python notebooks and I have used this to work on the terminal emulators series and the related source code

I have also tried to complete my conversion to Magit, a pretty nice wrapper around git for Emacs. Some of my usual git shortcuts have good replacements, but not all. For example, those are equivalent:

  • vc-annotate (C-x C-v g): magit-blame
  • vc-diff (C-x C-v =): magit-diff-buffer-file

Those do not have a direct equivalent:

  • vc-next-action (C-x C-q, or F6): anarcat/magic-commit-buffer, see below
  • vc-git-grep (F8): no replacement

I wrote my own replacement for "diff and commit this file" as the following function:

(defun anarcat/magit-commit-buffer () "commit the changes in the current buffer on the fly This is different than `magit-commit' because it calls `git commit' without going through the staging area AKA index first. This is a replacement for `vc-next-action'. Tip: setting the git configuration parameter `commit.verbose' to 2 will show the diff in the changelog buffer for review. See `git-config(1)' for more information. An alternative implementation was attempted with `magit-commit': (let ((magit-commit-ask-to-stage nil)) (magit-commit (list \"commit\" \"--\" (file-relative-name buffer-file-name))))) But it seems `magit-commit' asserts that we want to stage content and will fail with: `(user-error \"Nothing staged\")'. This is why this function calls `magit-run-git-with-editor' directly instead." (interactive) (magit-run-git-with-editor (list "commit" "--" (file-relative-name buffer-file-name))))

It's not very pretty, but it works... Mostly. Sometimes the magit-diff buffer becomes out of sync, but the --verbose output in the commitlog buffer still works.

I've also looked at git-annex integration. The magit-annex package did not work well for me: the file listing is really too slow. So I found the git-annex.el package, but did not try it out yet.

While working on all of this, I fell in a different rabbit hole: I found it inconvenient to "pastebin" stuff from Emacs, as it would involve selection a region, piping to pastebinit and copy-pasting the URL found in the *Messages* buffer. So I wrote this first prototype:

(defun pastebinit (begin end) "pass the region to pastebinit and add output to killring TODO: prompt for possible pastebins (pastebinit -l) with prefix arg Note that there's a `nopaste.el' project which already does this, which we should use instead. " (interactive "r") (message "use nopaste.el instead") (let ((proc (make-process :filter #'pastebinit--handle :command '("pastebinit") :connection-type 'pipe :buffer nil :name "pastebinit"))) (process-send-region proc begin end) (process-send-eof proc))) (defun pastebinit--handle (proc string) "handle output from pastebinit asynchronously" (let ((url (car (split-string string)))) (kill-new url) (message "paste uploaded and URL added to kill ring: %s" url)))

It was my first foray into aynchronous process operations in Emacs: difficult and confusing, but it mostly worked. Those who know me know what's coming next, however: I found not only one, but two libraries for pastebins in Emacs: nopaste and (after patching nopaste to add asynchronous support and customize support of course) debpaste.el. I'm not sure where that will go: there is a proposal to add nopaste in Debian that was discussed a while back and I made a detailed report there.


I made a minor release of Monkeysign to cover for CVE-2018-12020 and its GPG sigspoof vulnerability. I am not sure where to take this project anymore, and I opened a discussion to possibly retire the project completely. Feedback welcome.


I wrote a new ikiwiki plugin called bootstrap to fix table markup to match what the Bootstrap theme expects. This was particularly important for the previous blog post which uses tables a lot. This was surprisingly easy and might be useful to tweak other stuff in the theme.

Random stuff
  • I wrote up a review of security of APT packages when compared with the TUF project, in TufDerivedImprovements
  • contributed to about 20 different repositories on GitHub, too numerous to list here
Categories: External Blogs

Historical inventory of collaborative editors

Anarcat - Tue, 06/26/2018 - 13:19

A quick inventory of major collaborative editor efforts, in chronological order.

As with any such list, it must start with an honorable mention to the mother of all demos during which Doug Engelbart presented what is basically an exhaustive list of all possible software written since 1968. This includes not only a collaborative editor, but graphics, programming and math editor.

Everything else after that demo is just a slower implementation to compensate for the acceleration of hardware.

Software gets slower faster than hardware gets faster. - Wirth's law

So without further ado, here is the list of notable collaborative editors that I could find. By "notable" i mean that they introduce a notable feature or implementation detail.

Project Date Platform Notes SubEthaEdit 2003-2015? Mac-only first collaborative, real-time, multi-cursor editor I could find. An reverse-engineering attempt in Emacs failed to produce anything. DocSynch 2004-2007 ? built on top of IRC! Gobby 2005-now C, multi-platform first open, solid and reliable implementation and still around! The protocol ("libinfinoted") is notoriously hard to port to other editors (e.g. Rudel failed to implement this in Emacs). 0.7 release in jan 2017 adds possible python bindings that might improve this. Interesting plugins: autosave to disk. Ethercalc 2005-now Web, Javascript First spreadsheet, along with Google docs moonedit 2005-2008? ? Original website died. Other user's cursors visible and emulated keystrokes noises. Included a calculator and music sequencer! synchroedit 2006-2007 ? First web app. Inkscape 2007-2011 C++ First graphics editor with collaborative features backed by the "whiteboard" plugin built on top of Jabber, now defunct. Abiword 2008-now C++ First word processor Etherpad 2008-now Web First solid web app. Originally developped as a heavy Java app in 2008, acquired and opensourced by Google in 2009, then rewritten in Node.js in 2011. Widely used. Wave 2009-2010 Web, Java Failed attempt at a grand protocol unification CRDT 2011 Specification Standard for replicating a document's datastructure among different computers reliably. Operational transform 2013 Specification Similar to CRDT, yet, well, different. Floobits 2013-now ? Commercial, but opensource plugins for different editors LibreOffice Online 2015-now Web free Google docs equivalent, now integrated in Nextcloud HackMD 2015-now ? Commercial but opensource. Inspired by hackpad, which was bought up by Dropbox. Cryptpad 2016-now web? spin-off of xwiki. encrypted, "zero-knowledge" on server Prosemirror 2016-now Web, Node.JS "Tries to bridge the gap between Markdown text editing and classical WYSIWYG editors." Not really an editor, but something that can be used to build one. Qill 2013-now Web, Node.JS Rich text editor, also javascript. Not sure it is really collaborative. Teletype 2017-now WebRTC, Node.JS For the GitHub's Atom editor, introduces "portal" idea that makes guests follow what the host is doing across multiple docs. p2p with webRTC after visit to introduction server, CRDT based. Tandem 2018-now Node.JS? Plugins for atom, vim, neovim, sublime... uses a relay to setup p2p connexions CRDT based. Dubious license issues were resolved thanks to the involvement of Debian developers, which makes it a promising standard to follow in the future. Other lists
Categories: External Blogs

Montréal-Python 72 - Carroty Xenophon

Montreal Python - Sun, 06/03/2018 - 23:00

Let’s meet one last time before our Summer break! Thanks to Notman House for sponsoring this event.

Presentations Socket - Éric Lafontaine

Most of our everyday job include doing request over the internet or hosting a web solution for our company. Each connections we make utilize the socket API in some way that is not always evident. I hope to, by giving this talk, elucidate some of the magic contained in the socket API. I'm also going to give away some trick that I've been using since understanding that API.

Probabilistic Programming and Bayesian Modeling with PyMC3 - Christopher Fonnesbeck

Bayesian statistics offers powerful, flexible methods for data analysis that, because they are based on full probability models, confer several benefits to analysts including scalability, straightforward quantification of uncertainty, and improved interpretability relative to classical methods. The advent of probabilistic programming has served to abstract the complexity associated with fitting Bayesian models, making such methods more widely available. PyMC3 is software for probabilistic programming in Python that implements several modern, computationally-intensive statistical algorithms for fitting Bayesian models. PyMC3’s intuitive syntax is helpful for new users, and its reliance on the Theano library for fast computation has allowed developers to keep the code base simple, making it easy to extend and expand the software to meet analytic needs. Importantly, PyMC3 implements several next-generation Bayesian computational methods, allowing for more efficient sampling for small models and better approximations to larger models with larger associated dataset. I will demonstrate how to construct, fit and check models in PyMC, using a selection of applied problems as motivation.


Monday June 11th, 2018 at 6PM


Notman House

51 Sherbrooke West

Montréal, QC

H2X 1X2

  • 6:00PM - Doors open
  • 6:30PM - Presentations
  • 8:00PM - End of the event
  • 8:15PM - Benelux
Categories: External Blogs

Diversity, education, privilege and ethics in technology

Anarcat - Sat, 05/26/2018 - 11:48

This article is part of a series on KubeCon Europe 2018.

This is a rant I wrote while attending KubeCon Europe 2018. I do not know how else to frame this deep discomfort I have with the way one of the most cutting edge projects in my community is moving. I see it as a symptom of so many things wrong in society at large and figured it was as good a way as any to open the discussion regarding how free software communities seem to naturally evolved into corporate money-making machines with questionable ethics.

A white man groomed by a white woman

Diversity and education

There is often a great point made of diversity at KubeCon, and that is something I truly appreciate. It's one of the places where I have seen the largest efforts towards that goal; I was impressed by the efforts done in Austin, and mentioned it in my overview of that conference back then. Yet it is still one of the less diverse places I've ever participated in: in comparison, Pycon "feels" more diverse, for example. And then, of course, there's real life out there, where women constitute basically half the population, of course. This says something about the actual effectiveness diversity efforts in our communities.

4000 white men

The truth is that contrary to programmer communities, "operations" knowledge (sysadmin, SRE, DevOps, whatever it's called these days) comes not from institutional education, but from self-learning. Even though I have years of university training, the day to day knowledge I need in my work as a sysadmin comes not from the university, but from late night experiments on my personal computer network. This was first on the Macintosh, then on the FreeBSD source code of passed down as a magic word from an uncle and finally through Debian consecrated as the leftist's true computing way. Sure, my programming skills were useful there, but I acquired those before going to university: even there teachers expected students to learn programming languages (such as C!) in-between sessions.

Diversity program

The real solutions to the lack of diversity in our communities not only comes from a change in culture, but also real investments in society at large. The mega-corporations subsidizing events like KubeCon make sure they get a lot of good press from those diversity programs. However, the money they spend on those is nothing compared to tax evasion in their home states. As an example, Amazon recently put 7000 jobs on hold because of a tax the city of Seattle wanted to impose on corporations to help the homeless population. Google, Facebook, Microsoft, and Apple all evade taxes like gangsters. This is important because society changes partly through education, and that costs money. Education is how more traditional STEM sectors like engineering and medicine have changed: women, minorities, and poorer populations were finally allowed into schools after the epic social struggles of the 1970s finally yielded more accessible education. The same way that culture changes are seeing a backlash, the tide is turning there as well and the trend is reversing towards more costly, less accessible education of course. But not everywhere. The impacts of education changes are long-lasting. By evading taxes, those companies are keeping the state from revenues that could level the playing field through affordable education.

Hell, any education in the field would help. There is basically no sysadmin education curriculum right now. Sure you can follow a Cisco CCNA or MSCE private trainings. But anyone who's been seriously involved in running any computing infrastructure knows those are a scam: that will tie you down in a proprietary universe (Cisco and Microsoft, respectively) and probably just to "remote hands monkey" positions and rarely to executive positions.


Besides, providing an education curriculum would require the field to slow down so that knowledge would settle down and trickle into a curriculum. Configuration management is pretty old, but because the changes in tooling are fast, any curriculum built in the last decade (or even less) quickly becomes irrelevant. Puppet publishes a new release every 6 month, Kubernetes is barely 4 years old now, and is changing rapidly with a ~3 month release schedule.

Here at KubeCon, Mark Zuckerberg's mantra of "move fast and break things" is everywhere. We call it "velocity": where you are going does not matter as much as how fast you're going there. At one of the many keynotes, Abby Kearns from the Cloud Foundry Foundation boasted at how Home Depot, in trying to sell more hammers than Amazon, is now deploying code to production multiple times a day. I am still unclear as whether this made Home Depot actually sell more hammers, or if it's something that we should even care about in the first place. Shouldn't we converge over selling less hammers? Making them more solid, reliable, so that they are passed down from generations instead of breaking and having to be replaced all the time?

Home Depot ecstasy

We're solving a problem that wasn't there in some new absurd faith that code deployments will naturally make people happier, by making sure Home Depot sells more hammers. And that's after telling us that Cloud Foundry helped the USAF save 600M$ by moving their databases to the cloud. No one seems bothered by the idea that the most powerful military in existence would move state secrets into a private cloud, out of the control of any government. It's the name of the game, at KubeCon.

USAF saves (money)

In his keynote, Alexis Richardson, CEO of Weaveworks, presented the toaster project as an example of what not to do. "He did not use any sourced components, everything was built from scratch, by hand", obviously missing the fact that toasters are deliberately not built from reusable parts, as part of the planned obsolescence design. The goal of the toaster experiment is also to show how fragile our civilization has become precisely because we depend on layers upon layers of parts. In this totalitarian view of the world, people are also "reusable" or, in that case "disposable components". Not just the white dudes in California, but also workers outsourced out of the USA decades ago; it depends on precious metals and the miners of Africa, the specialized labour of the factories and intricate knowledge of the factory workers in Asia, and the flooded forests of the first nations powering this terrifying surveillance machine.


"Left to his own devices he couldn’t build a toaster. He could just about make a sandwich and that was it." -- Mostly Harmless, Douglas Adams, 1992

Staying in an hotel room for a week, all expenses paid, certainly puts things in perspectives. Rarely have I felt more privileged in my entire life: someone else makes my food, makes my bed, and cleans up the toilet magically when I'm gone. For me, this is extraordinary, but for many people at KubeCon, it's routine: traveling is part of the rock star agenda of this community. People get used to being served, both directly in their day to day lives, but also through the complex supply chain of the modern technology that is destroying the planet.

Nothing is like corporate nothing.

The nice little boxes and containers we call the cloud all abstract this away from us and those dependencies are actively encouraged in the community. We like containers here and their image is ubiquitous. We acknowledge that a single person cannot run a Kube shop because the knowledge is too broad to be possibly handled by a single person. While there are interesting collaborative and social ideas in that approach, I am deeply skeptical of its impact on civilization in the long run. We already created systems so complex that we don't truly know who hacked the Trump election or how. Many feel it was, but it's really just a hunch: there were bots, maybe they were Russian, or maybe from Cambridge? The DNC emails, was that really Wikileaks? Who knows! Never mind failing close or open: the system has become so complex that we don't even know how we fail when we do. Even those in the highest positions of power seem unable to protect themselves; politics seem to have become a game of Russian roulette: we cock the bot, roll the secret algorithm, and see what dictator will shoot out.


All this is to build a new Skynet; not this one or that one, those already exist. I was able to pleasantly joke about the AI takeover during breakfast with a random stranger without raising as much as an eyebrow: we know it will happen, oh well. I've skipped that track in my attendance, but multiple talks at KubeCon are about AI, TensorFlow (it's opensource!), self-driving cars, and removing humans from the equation as much as possible, as a general principle. Kubernetes is often shortened to "Kube", which I always think of as a reference to the Star Trek Borg all mighty ship, the "cube". This might actually make sense given that Kubernetes is an open source version of Google's internal software incidentally called... Borg. To make such fleeting, tongue-in-cheek references to a totalitarian civilization is not harmless: it makes more acceptable the notion that AI domination is inescapable and that resistance truly is futile, the ultimate neo-colonial scheme.

"We are the Borg. Your biological and technological distinctiveness will be added to our own. Resistance is futile."

The "hackers" of our age are building this machine with conscious knowledge of the social and ethical implications of their work. At best, people admit to not knowing what they really are. In the worse case scenario, the AI apocalypse will bring massive unemployment and a collapse of the industrial civilization, to which Silicon Valley executives are responding by buying bunkers to survive the eventual roaming gangs of revolted (and now armed) teachers and young students coming for revenge.

Only the most privileged people in society could imagine such a scenario and actually opt out of society as a whole. Even the robber barons of the 20th century knew they couldn't survive the coming revolution: Andrew Carnegie built libraries after creating the steel empire that drove much of US industrialization near the end of the century and John D. Rockefeller subsidized education, research and science. This is not because they were humanists: you do not become an oil tycoon by tending to the poor. Rockefeller said that "the growth of a large business is merely a survival of the fittest", a social darwinist approach he gladly applied to society as a whole.

But the 70's rebel beat offspring, the children of the cult of Job, do not seem to have the depth of analysis to understand what's coming for them. They want to "hack the system" not for everyone, but for themselves. Early on, we have learned to be selfish and self-driven: repressed as nerds and rejected in the schools, we swore vengeance on the bullies of the world, and boy are we getting our revenge. The bullied have become the bullies, and it's not small boys in schools we're bullying, it is entire states, with which companies are now negotiating as equals.

The fraud

...but what are you creating exactly?

And that is the ultimate fraud: to make the world believe we are harmless little boys, so repressed that we can't communicate properly. We're so sorry we're awkward, it's because we're all somewhat on the autism spectrum. Isn't that, after all, a convenient affliction for people that would not dare to confront the oppression they are creating? It's too easy to hide behind such a real and serious condition that does affect people in our community, but also truly autistic people that simply cannot make it in the fast-moving world the magical rain man is creating. But the real con is hacking power and political control away from traditional institutions, seen as too slow-moving to really accomplish the "change" that is "needed". We are creating an inextricable technocracy that no one will understand, not even us "experts". Instead of serving the people, the machine is at the mercy of markets and powerful oligarchs.

A recurring pattern at Kubernetes conferences is the KubeCon chant where Kelsey Hightower reluctantly engages the crowd in a pep chant:

When I say 'Kube!', you say 'Con!'

'Kube!' 'Con!' 'Kube!' 'Con!' 'Kube!' 'Con!'

Cube Con indeed...

I wish I had some wise parting thoughts of where to go from here or how to change this. The tide seems so strong that all I can do is observe and tell stories. My hope is that the people that need to hear this will take it the right way, but I somehow doubt it. With chance, it might just become irrelevant and everything will fix itself, but somehow I fear things will get worse before they get better.

Categories: External Blogs

Easier container security with entitlements

Anarcat - Mon, 05/21/2018 - 19:00

This article is part of a series on KubeCon Europe 2018.

During KubeCon + CloudNativeCon Europe 2018, Justin Cormack and Nassim Eddequiouaq presented a proposal to simplify the setting of security parameters for containerized applications. Containers depend on a large set of intricate security primitives that can have weird interactions. Because they are so hard to use, people often just turn the whole thing off. The goal of the proposal is to make those controls easier to understand and use; it is partly inspired by mobile apps on iOS and Android platforms, an idea that trickled back into Microsoft and Apple desktops. The time seems ripe to improve the field of container security, which is in desperate need of simpler controls.

The problem with container security

Cormack first stated that container security is too complicated. His slides stated bluntly that "unusable security is not security" and he pleaded for simpler container security mechanisms with clear guarantees for users.

"Container security" is a catchphrase that actually includes all sorts of measures, some of which we have previously covered. Cormack presented an overview of those mechanisms, including capabilities, seccomp, AppArmor, SELinux, namespaces, control groups — the list goes on. He showed how docker run --help has a "ridiculously large number of options"; there are around one hundred on my machine, with about fifteen just for security mechanisms. He said that "most developers don't know how to actually apply those mechanisms to make sure their containers are secure". In the best-case scenario, some people may know what the options are, but in most cases people don't actually understand each mechanism in detail.

He gave the example of capabilities; there are about forty possible values that can be provided for the --cap-drop option, each with its own meaning. He described some capabilities as "understandable", but said that others end up in overly broad boxes. The kernel's data structure limits the system to a maximum of 64 capabilities, so a bunch of functionality was lumped together into CAP_SYS_ADMIN, he said.

Cormack also talked about namespaces and seccomp. While there are fewer namespaces than capabilities, he said that "it's very unclear for a general user what their security properties are". For example, "some combinations of capabilities and namespaces will let you escape from a container, and other ones don't". He also described seccomp as a "long JSON file" as that's the way Kubernetes configures it. Even though he said those files could "usefully be even more complicated" and said that the files are "very difficult to write".

Cormack stopped his enumeration there, but the same applies to the other mechanisms. He said that while developers could sit down and write those policies for their application by hand, it's a real mess and makes their heads explode. So instead developers run their containers in --privileged mode. It works, but it disables all the nice security mechanisms that the container abstraction provides. This is why "containers do not contain", as Dan Walsh famously quipped.

Introducing entitlements

There must be a better way. Eddequiouaq proposed this simple idea: "provide something humans can actually understand without diving into code or possibly even without reading documentation". The solution proposed by the Docker security team is "entitlements": the ability for users to choose simple permissions on the command line. Eddequiouaq said that application users and developers alike don't need to understand the low-level security mechanisms or how they interact within the kernel; "people don't care about that, they want to make sure their app is secure."

Entitlements divide resources into meaningful domains like "network", "security", or "host resources" (like devices). Behind the scenes, Docker translates those into whatever security mechanisms are available. This implies that the actual mechanism deployed will vary between runtimes, depending on the implementation. For example, a "confined" network access might mean a seccomp filter blocking all networking-related system calls except socket(AF_UNIX|AF_LOCAL) along with dropping network-related capabilities. AppArmor will deny network on some platforms while SELinux would do similar enforcement on others.

Eddequiouaq said the complexity of implementing those mechanisms is the responsibility of platform developers. Image developers can ship entitlement lists along with container images created with a regular docker build, and sign the whole bundle with docker trust. Because entitlements do not specify explicit low-level mechanisms, the resulting image is portable to different runtimes without change. Such portability helps Kubernetes on non-Linux platforms do its job.

Entitlements shift the responsibility for configuring sandboxing environments to image developers, but also empowers them to deliver security mechanisms directly to end users. Developers are the ones with the best knowledge about what their applications should or should not be doing. Image end-users, in turn, benefit from verifiable security properties delivered by the bundles and the expertise of image developers when they docker pull and run those images.

Eddequiouaq gave a demo of the community's nemesis: Docker inside Docker (DinD). He picked that use case because it requires a lot of privileges, which usually means using the dreaded --privileged flag. With the entitlements patch, he was able to run DinD with network.admin, security.admin, and host.devices.admin, which looks like --privileged, but actually means some protections are still in place. According to Eddequiouaq, "everything works and we didn't have to disable all the seccomp and AppArmor profiles". He also gave a demo of how to build an image and demonstrated how docker inspect shows the entitlements bundled inside the image. With such an image, docker run starts a DinD image without any special flags. That requires a way to trust the content publisher because suddenly images can elevate their own privileges without the caller specifying anything on the Docker command line.

Goals and future

The specification aims to provide the best user experience possible, so that people actually start using the security mechanisms provided by the platforms instead of opting out of security configurations when they get a "permission denied" error. Eddequiouaq said that Docker eventually wants to "ditch the --privileged flag because it is really a bad habit". Instead, applications should run with the least privileges they need. He said that "this is not the case; currently, everyone works with defaults that work with 95% of the applications out there." Those Docker defaults, he said, provide a "way too big attack surface".

Eddequiouaq opened the door for developers to define custom entitlements because "it's hard to come up with a set that will cover all needs". One way the team thought of dealing with that uncertainty is to have versions of the specification but it is unclear how that would work in practice. Would the version be in the entitlement labels (e.g. network-v1.admin), or out of band?

Another feature proposed is the control of API access and service-to-service communication in the security profile. This is something that's actually available on phones, where an app can only talk with a specific set of services. But that is also relevant to containers in Kubernetes clusters as administrators often need to restrict network access with more granularity than the "open/filter/close" options. An example of such policy could allow the "web" container to talk with the "database" container, although it might be difficult to specify such high-level policies in practice.

While entitlements are now implemented in Docker as a proof of concept, Kubernetes has the same usability issues as Docker so the ultimate goal is to get entitlements working in Kubernetes runtimes directly. Indeed, its PodSecurityPolicy maps (almost) one-to-one with the Docker security flags. But as we have previously reported, another challenge in Kubernetes security is that the security models of Kubernetes and Docker are not exactly identical.

Eddequiouaq said that entitlements could help share best security policies for a pod in Kubernetes. He proposed that such configuration would happen through the SecurityContext object. Another way would be an admission controller that would avoid conflicts between the entitlements in the image and existing SecurityContext profiles already configured in the cluster. There are two possible approaches in that case: the rules from the entitlements could expand the existing configuration or restrict it where the existing configuration becomes a default. The problem here is that the pod's SecurityContext already provides a widely deployed way to configure security mechanisms, even if it's not portable or easy to share, so the proposal shouldn't break existing configurations. There is work in progress in Docker to allow inheriting entitlements within a Dockerfile. Eddequiouaq proposed that Kubernetes should implement a simple mechanism to inherit entitlements from images in the admission controller.

The Docker security team wants to create a "widely adopted standard" supported by Docker swarm, Kubernetes, or any container scheduler. But it's still unclear how deep into the Kubernetes stack entitlements belong. In the team's current implementation, Docker translates entitlements into the security mechanisms right before calling its runtime (containerd), but it might be possible to push the entitlements concept straight into the runtime itself, as it knows best how the platform operates.

Some readers might also notice fundamental similarities between this and other mechanisms such as OpenBSD's pledge(), which made me wonder if entitlements belong in user space in the first place. Cormack observed that seccomp was such a "pain to work with to do complicated policies". He said that having eBPF seccomp filters would make it easier to deal with conflicts between policies and also mentioned the work done on the Checmate and Landlock security modules as interesting avenues to explore. It seems that none of those kernel mechanisms are ready for prime time, at least not to the point that Docker can use them in production. Eddequiouaq said that the proposal was open to changes and discussion so this is all work in progress at this stage. The next steps are to make a proposal to the Kubernetes community before working on an actual implementation outside of Docker.

I have found the core idea of protecting users from all the complicated stuff in container security interesting. It is a recurring theme in container security; we've previously discussed proposals to add container identifiers in the kernel directly for example. Everyone knows security is sensitive and important in Kubernetes, yet doing it correctly is hard. This is a recipe for disaster, which has struck in high profile cases recently. Hopefully having such easier and cleaner mechanisms will help users, developers, and administrators alike.

A YouTube video and slides [PDF] of the talk are available.

This article first appeared in the Linux Weekly News.

Categories: External Blogs

Securing the container image supply chain

Anarcat - Thu, 05/17/2018 - 12:00

This article is part of a series on KubeCon Europe 2018.

KubeCon EU "Security is hard" is a tautology, especially in the fast-moving world of container orchestration. We have previously covered various aspects of Linux container security through, for example, the Clear Containers implementation or the broader question of Kubernetes and security, but those are mostly concerned with container isolation; they do not address the question of trusting a container's contents. What is a container running? Who built it and when? Even assuming we have good programmers and solid isolation layers, propagating that good code around a Kubernetes cluster and making strong assertions on the integrity of that supply chain is far from trivial. The 2018 KubeCon + CloudNativeCon Europe event featured some projects that could eventually solve that problem.

Image provenance

A first talk, by Adrian Mouat, provided a good introduction to the broader question of "establishing image provenance and security in Kubernetes" (video, slides [PDF]). Mouat compared software to food you get from the supermarket: "you can actually tell quite a lot about the product; you can tell the ingredients, where it came from, when it was packaged, how long it's good for". He explained that "all livestock in Europe have an animal passport so we can track its movement throughout Europe and beyond". That "requires a lot of work, and time, and money, but we decided that this is was worthwhile doing so that we know [our food is] safe to eat. Can we say the same thing about the software running in our data centers?" This is especially a problem in complex systems like Kubernetes; containers have inherent security and licensing concerns, as we have recently discussed.

You should be able to easily tell what is in a container: what software it runs, where it came from, how it was created, and if it has any known security issues, he said. Mouat also expects those properties to be provable and verifiable with strong cryptographic assertions. Kubernetes can make this difficult. Mouat gave a demonstration of how, by default, the orchestration framework will allow different versions of the same container to run in parallel. In his scenario, this is because the default image pull policy (ifNotPresent) might pull a new version on some nodes and not others. This problem arises because of an inconsistency between the way Docker and Kubernetes treat image tags; the former as mutable and the latter as immutable. Mouat said that "the default semantics for pulling images in Kubernetes are confusing and dangerous." The solution here is to deploy only images with tags that refer to a unique version of a container, for example by embedding a Git hash or unique version number in the image tag. Obviously, changing the policy to AlwaysPullImages will also help in solving the particular issue he demonstrated, but will create more image churn in the cluster.

But that's only a small part of the problem; even if Kubernetes actually runs the correct image, how can you tell what is actually in that image? In theory, this should be easy. Docker seems like the perfect tool to create deterministic images that consist exactly of what you asked for: a clean and controlled, isolated environment. Unfortunately, containers are far from reproducible and the problem begins on the very first line of a Dockerfile. Mouat gave the example of a FROM debian line, which can mean different things at different times. It should normally refer to Debian "stable", but that's actually a moving target; Debian makes new stable releases once in a while, and there are regular security updates. So what first looks like a static target is actually moving. Many Dockerfiles will happily fetch random source code and binaries from the network. Mouat encouraged people to at least checksum the downloaded content to prevent basic attacks and problems.

Unfortunately, all this still doesn't get us reproducible builds since container images include file timestamps, build identifiers, and image creation time that will vary between builds, making container images hard to verify through bit-wise comparison or checksums. One solution there is to use alternative build tools like Bazel that allow you to build reproducible images. Mouat also added that there is "tension between reproducibility and keeping stuff up to date" because using hashes in manifests will make updates harder to deploy. By using FROM debian, you automatically get updates when you rebuild that container. Using FROM debian:stretch-20180426 will get you a more reproducible container, but you'll need to change your manifest regularly to follow security updates. Once we know what is in our container, there is at least a standard in the form of the OCI specification that allows attaching annotations to document the contents of containers.

Another problem is making sure containers are up to date, a "weirdly hard" question to answer according to Mouat: "why can't I ask my registry [if] there is new version of [a] tag, but as far as I know, there's no way you can do that." Mouat literally hand-waved at a slide showing various projects designed to scan container images for known vulnerabilities, introducing Aqua, Clair, NeuVector, and Twistlock. Mouat said we need a more "holistic" solution than the current whack-a-mole approach. His company is working on such a product called Trow, but not much information about it was available at the time of writing.

The long tail of the supply chain

Verifying container images is exactly the kind of problem Notary is designed to solve. Notary is a server "that allows anyone to have trust over arbitrary collections of data". In practice, that can be used by the Docker daemon as an additional check before fetching images from the registry. This allows operators to approve images with cryptographic signatures before they get deployed in the cluster.

Notary implements The Update Framework (TUF), a specification covering the nitty-gritty details of signatures, key rotation, and delegation. It keeps signed hashes of container images that can be used for verification; it can be deployed by enabling Docker's "content Trust" in any Docker daemon, or by configuring a custom admission controller with a web hook in Kubernetes. In another talk (slides [PDF], video) Liam White and Michael Hough covered the basics of Notary's design and how it interacts with Docker. They also introduced Porteiris as an admission controller hook that can implement a policy like "allow any image from the LWN Docker registry as long as it's signed by your favorite editor". Policies can be scoped by namespace as well, which can be useful in multi-tenant clusters. The downside of Porteris is that it supports only IBM Cloud Notary servers because the images need to be explicitly mapped between the Notary server and the registry. The IBM team knows only about how to map its own images but the speakers said they were open to contributions there.

A limitation of Notary is that it looks only at the last step of the build chain; in itself, it provides no guarantees on where the image comes from, how the image was built, or what it's made of. In yet another talk (slides [PDF] video), Wendy Dembowski and Lukas Puehringer introduced a possible solution to that problem: two projects that work hand-in-hand to provide end-to-end verification of the complete container supply chain. Puehringer first introduced the in-toto project as a tool to authenticate the integrity of individual build steps: code signing, continuous integration (CI), and deployment. It provides a specification for "open and extensible" metadata that certifies how each step was performed and the resulting artifacts. This could be, at the source step, as simple as a Git commit hash or, at the CI step, a build log and artifact checksums. All steps are "chained" as well, so that you can track which commit triggered the deployment of a specific image. The metadata is cryptographically signed by role keys to provide strong attestations as to the provenance and integrity of each step. The in-toto project is supervised by Justin Cappos, who also works on TUF, so it shares some of its security properties and integrates well with the framework. Each step in the build chain has its own public/private key pair, with support for role delegation and rotation.

In-toto is a generic framework allowing a complete supply chain verification by providing "attestations" that a given artifact was created by the right person using the right source. But it does not necessarily provide the hooks to do those checks in Kubernetes itself. This is where Grafeas comes in, by providing a global API to read and store metadata. That can be package versions, vulnerabilities, license or vulnerability scans, builds, images, deployments, and attestations such as those provided by in-toto. All of those can then be used by the Kubernetes admission controller to establish a policy that regulates image deployments. Dembowski referred to this tutorial by Kelsey Hightower as an example configuration to integrate Grafeas in your cluster. According to Puehringer: "It seems natural to marry the two projects together because Grafeas provides a very well-defined API where you can push metadata into, or query from, and is well integrated in the cloud ecosystem, and in-toto provides all the steps in the chain."

Dembowski said that Grafeas is already in use at Google and it has been found useful to keep track of metadata about containers. Grafeas can keep track of what each container is running, who built it, when (sometimes vulnerable) code was deployed, and make sure developers do not ship containers built on untrusted development machines. This can be useful when a new vulnerability comes out and administrators scramble to figure out if or where affected code is deployed.

Puehringer explained that in-toto's reference implementation is complete and he is working with various Linux distributions to get them to use link metadata to have their package managers perform similar verification.


The question of container trust hardly seems resolved at all; the available solutions are complex and would be difficult to deploy for Kubernetes rookies like me. However, it seems that Kubernetes could make small improvements to improve security and auditability, the first of which is probably setting the image pull policy to a more reasonable default. In his talk, Mouat also said it should be easier to make Kubernetes fetch images only from a trusted registry instead of allowing any arbitrary registry by default.

Beyond that, cluster operators wishing to have better control over their deployments should start looking into setting up Notary with an admission controller, maybe Portieris if they can figure out how to make it play with their own Notary servers. Considering the apparent complexity of Grafeas and in-toto, I would assume that those would probably be reserved only to larger "enterprise" deployments but who knows; Kubernetes may be complex enough as it is that people won't mind adding a service or two in there to improve its security. Keep in mind that complexity is an enemy of security, so operators should be careful when deploying solutions unless they have a good grasp of the trade-offs involved.

This article first appeared in the Linux Weekly News.

Categories: External Blogs

Updates in container isolation

Anarcat - Wed, 05/16/2018 - 12:00

This article is part of a series on KubeCon Europe 2018.

KubeCon EU At KubeCon + CloudNativeCon Europe 2018, several talks explored the topic of container isolation and security. The last year saw the release of Kata Containers which, combined with the CRI-O project, provided strong isolation guarantees for containers using a hypervisor. During the conference, Google released its own hypervisor called gVisor, adding yet another possible solution for this problem. Those new developments prompted the community to work on integrating the concept of "secure containers" (or "sandboxed containers") deeper into Kubernetes. This work is now coming to fruition; it prompts us to look again at how Kubernetes tries to keep the bad guys from wreaking havoc once they break into a container.

Attacking and defending the container boundaries

Tim Allclair's talk (slides [PDF], video) was all about explaining the possible attacks on secure containers. To simplify, Allclair said that "secure is isolation, even if that's a little imprecise" and explained that isolation is directional across boundaries: for example, a host might be isolated from a guest container, but the container might be fully visible from the host. So there are two distinct problems here: threats from the outside (attackers trying to get into a container) and threats from the inside (attackers trying to get out of a compromised container). Allclair's talk focused on the latter. In this context, sandboxed containers are concerned with threats from the inside; once the attacker is inside the sandbox, they should not be able to compromise the system any further.

Attacks can take multiple forms: untrusted code provided by users in multi-tenant clusters, un-audited code fetched from random sites by trusted users, or trusted code compromised through an unknown vulnerability. According to Allclair, defending a system from a compromised container is harder than defending a container from external threats, because there is a larger attack surface. While outside attackers only have access to a single port, attackers on the inside often have access to the kernel's extensive system-call interface, a multitude of storage backends, the internal network, daemons providing services to the cluster, hardware interfaces, and so on.

Taking those vectors one by one, Allclair first looked at the kernel and said that there were 169 code execution vulnerabilities in the Linux kernel in 2017. He admitted this was a bit of fear mongering; it indeed was a rather unusual year and "most of those were in mobile device drivers". These vulnerabilities are not really a problem for Kubernetes unless you run it on your phone. Allclair said that at least one attendee at the conference was probably doing exactly that; as it turns out, some people have managed to run Kubernetes on a vacuum cleaner. Container runtimes implement all sorts of mechanisms to reduce the kernel's attack surface: Docker has seccomp profiles, but Kubernetes turns those off by default. Runtimes will use AppArmor or SELinux rule sets. There are also ways to run containers as non-root, which was the topic of a pun-filled separate talk as well. Unfortunately, those mechanisms do not fundamentally solve the problem of kernel vulnerabilities. Allclair cited the Dirty COW vulnerability as a classic example of a container escape through race conditions on system calls that are allowed by security profiles.

The proposed solution to this problem is to add a second security boundary. This is apparently an overarching principle at Google, according to Allclair: "At Google, we have this principle security principle that between any untrusted code and user data there have to be at least two distinct security boundaries so that means two independent security mechanisms need to fail in order to for that untrusted code to get out that user data."

Adding another boundary makes attacks harder to accomplish. One such solution is to use a hypervisor like Kata Containers or gVisor. Those new runtimes depend on a sandboxed setting that is still in the proposal stage in the Kubernetes API.

gVisor as an extra boundary

Let's look at gVisor as an example hypervisor. Google spent five years developing the project in the dark before sharing it with the world. At KubeCon, it was introduced in a keynote and a more in-depth talk (slides [PDF], video) by Dawn Chen and Zhengyu He. gVisor is a user-space kernel that implements a subset of the Linux kernel API, but which was written from scratch in Go. The idea is to have an independent kernel that reduces the attack surface; while the Linux kernel has 20 million lines of code, at the time of writing gVisor only has 185,000, which should make it easier to review and audit. It provides a cleaner and simpler interface: no hardware drivers, interrupts, or I/O port support to implement, as the host operating system takes care of all that mess.

As we can see in the diagram above (taken from the talk slides), gVisor has a component called "sentry" that implements the core of the system-call logic. It uses ptrace() out of the box for portability reasons, but can also work with KVM for better security and performance, as ptrace() is slow and racy. Sentry can use KVM to map processes to CPUs and provide lower-level support like privilege separation and memory-management. He suggested thinking of gVisor as a "layered solution" to provide isolation, as it also uses seccomp filters and namespaces. He explained how it differed from user-mode Linux (UML): while UML is a port of Linux to user space, gVisor actually reimplements the Linux system calls (211 of the 319 x86-64 system calls) using only 64 system calls in the host system. Another key difference from other systems, like unikernels or Google's Native Client (NaCL), is that it can run unmodified binaries. To fix classes of attacks relying on the open() system call, gVisor also forbids any direct filesystem access; all filesystem operations go through a second process called the gopher that enforces access permissions, in another example of a double security boundary.

According to He, gVisor has a 150ms startup time and 15MB overhead, close to Kata Containers startup times, but smaller in terms of memory. He said the approach is good for small containers in high-density workloads. It is not so useful for trusted images (because it's not required), workloads that make heavy use of system calls (because of the performance overhead), or workloads that require hardware access (because that's not available at all). Even though gVisor implements a large number of system calls, some functionality is missing. There is no System V shared memory, for example, which means PostgreSQL does not work under gVisor. A simple ping might not work either, as gVisor lacks SOCK_RAW support. Linux has been in use for decades now and is more than just a set of system calls: interfaces like /proc and sysfs also make Linux what it is. ~~gVisor implements none of those~~ Of those, gVisor only implements a subset of /proc currently, with the result that some containers will not work with gVisor without modification, for now.

As an aside, the new hypervisor does allow for experimentation and development of new system calls directly in user space. The speakers confirmed this was another motivation for the project; the hope is that having a user-space kernel will allow faster iteration than working directly in the Linux kernel.

Escape from the hypervisor

Of course, hypervisors like gVisor are only a part of the solution to pod security. In his talk, Allclair warned that even with a hypervisor, there are still ways to escape a container. He cited the CVE-2017-1002101 vulnerability, which allows hostile container images to take over a host through specially crafted symbolic links. Like native containers, hypervisors like Kata Containers also allow the guest to mount filesystems across the container boundary, so they are vulnerable to such an attack.

Kubernetes fixed that specific bug, but a general solution is still in the design phase. Allclair said that ephemeral storage should be treated as opaque to the host, making sure that the host never interacts directly with image files and just passes them down to the guest untouched. Similarly, runtimes should "mount block volumes directly into the sandbox, not onto the host". Network filesystems are trickier; while it's possible to mount (say) a Ceph filesystem in the guest, that means the access credentials now reside within the guest, which moves the security boundary into the untrusted container.

Allclair outlined networking as another attack vector: Kubernetes exposes a lot of unauthenticated services on the network by default. In particular, the API server is a gold mine of information about the cluster. Another attack vector is untrusted data flows from containers to the user. For example, container logs travel through various Kubernetes components, and some components, like Fluentd, will end up parsing those logs directly. Allclair said that many different programs are "looking at untrusted data; if there's a vulnerability there, it could lead to remote code execution". When he looked at the history of vulnerabilities in that area, he could find no direct code execution, but "one of the dependencies in Fluentd for parsing JSON has seven different bugs with segfault issues so we can see that could lead to a memory vulnerability". As a possible solution to such issues, Allclair proposed isolating components in their own (native, as opposed to sandboxed) containers, which might be sufficient because Fluentd acts as a first trusted boundary.


A lot of work is happening to improve what is widely perceived as defective container isolation in the Linux kernel. Some take the approach of trying to run containers as regular users ("root-less containers") and rely on the Linux kernel's user-isolation properties. Others found this relies too much on the security of the kernel and use separate hypervisors, like Kata Containers and gVisor. The latter seems especially interesting because it is lightweight and doesn't add much attack surface. In comparison, Kata Containers relies on a kernel running inside the container, which actually expands the attack surface instead of reducing it. The proposed API for sandboxed containers is currently experimental in the containerd and CRI-O projects; Allclair expects the API to ship in alpha as part the Kubernetes 1.12 release.

It's important to keep in mind that hypervisors are not a panacea: they do not support all workloads because of compatibility and performance issues. A hypervisor is only a partial solution; Allclair said the next step is to provide hardened interfaces for storage, logging, and networking and encouraged people to get involved in the node special interest group and the proposal [Google Docs] on the topic.

This article first appeared in the Linux Weekly News.

Categories: External Blogs

Montreal-Python 72: Call for speakers

Montreal Python - Sun, 05/13/2018 - 23:00

We are looking for lightning talks (5min) submissions for our next event. Send your proposals at


June 11th, 2018 6PM to 9PM


To be determined

Categories: External Blogs

Autoscaling for Kubernetes workloads

Anarcat - Sun, 05/13/2018 - 19:00

This article is part of a series on KubeCon Europe 2018.

Technologies like containers, clusters, and Kubernetes offer the prospect of rapidly scaling the available computing resources to match variable demands placed on the system. Actually implementing that scaling can be a challenge, though. During KubeCon + CloudNativeCon Europe 2018, Frederic Branczyk from CoreOS (now part of Red Hat) held a packed session to introduce a standard and officially recommended way to scale workloads automatically in Kubernetes clusters.

Kubernetes has had an autoscaler since the early days, but only recently did the community implement a more flexible and extensible mechanism to make decisions on when to add more resources to fulfill workload requirements. The new API integrates not only the Prometheus project, which is popular in Kubernetes deployments, but also any arbitrary monitoring system that implements the standardized APIs.

The old and new autoscalers

Branczyk first covered the history of the autoscaler architecture and how it has evolved through time. Kubernetes, since version 1.2, features a horizontal pod autoscaler (HPA), which dynamically allocates resources depending on the detected workload. When the load becomes too high, the HPA increases the number of pod replicas and, when the load goes down again, it removes superfluous copies. In the old HPA, a component called Heapster would pull usage metrics from the internal cAdvisor monitoring daemon and the HPA controller would then scale workloads up or down based on those metrics.

Unfortunately, the controller would only make decisions based on CPU utilization, even though Heapster provides other metrics like disk, memory, or network usage. According to Branczyk, while in theory any workload can be converted to a CPU-bound problem, this is an inconvenient limitation, especially when implementing higher-level service level agreements. For example, an arbitrary agreement like "process 95% of requests within 100 milliseconds" would be difficult to represent as a CPU-usage problem. Another limitation is that the Heapster API was only loosely defined and never officially adopted as part of the larger Kubernetes API. Heapster also required the help of a storage backend like InfluxDB or Google's Stackdriver to store samples, which made deploying an HPA challenging.

In late 2016, the "autoscaling special interest group" (SIG autoscaling) decided that the pipeline needed a redesign that would allow scaling based on arbitrary metrics from external monitoring systems. The result is that Kubernetes 1.6 shipped with a new API specification defining how the autoscaler integrates with those systems. Having learned from the Heapster experience, the developers specified the new API, but did not implement it for any specific system. This shifts responsibility of maintenance to the monitoring vendors: instead of "dumping" their glue code in Heapster, vendors now have to maintain their own adapter conforming to a well-defined API to get certified.

The new specification defines core metrics like CPU, memory, and disk usage. Kubernetes provides a canonical implementation of those metrics through the metrics server, a stripped down version of Heapster. The metrics server provides the core metrics required by Kubernetes so that scheduling, autoscaling, and things like kubectl top work out of the box. This means that any Kubernetes 1.8 cluster now supports autoscaling using those metrics out of the box: for example minikube or Google's Kubernetes Engine both offer a native metrics server without an external database or monitoring system.

In terms of configuration syntax, the change is minimal. Here is an example of how to configure the autoscaler in earlier Kubernetes releases, taken from the OpenShift Container Platform documentation:

apiVersion: extensions/v1beta1 kind: HorizontalPodAutoscaler metadata: name: frontend spec: scaleRef: kind: DeploymentConfig name: frontend apiVersion: v1 subresource: scale minReplicas: 1 maxReplicas: 10 cpuUtilization: targetPercentage: 80

The new API configuration is more flexible:

apiVersion: autoscaling/v2beta1 kind: HorizontalPodAutoscaler metadata: name: hpa-resource-metrics-cpu spec: scaleTargetRef: apiVersion: apps/v1beta1 kind: ReplicationController name: hello-hpa-cpu minReplicas: 1 maxReplicas: 10 metrics: - type: Resource resource: name: cpu targetAverageUtilization: 50

Notice how the cpuUtilization field is replaced by a more flexible metrics field that targets CPU utilization, but can support other core metrics like memory usage.

The ultimate goal of the new API, however, is to support arbitrary metrics, through the custom metrics API. This behaves like the core metrics, except that Kubernetes does not ship or define a set of custom metrics directly, which is where systems like Prometheus come in. Branczyk demonstrated the k8s-prometheus-adapter, which connects any Prometheus metric to the Kubernetes HPA, allowing the autoscaler to add new pods to reduce request latency, for example. Those metrics are bound to Kubernetes objects (e.g. pod, node, etc.) but an "external metrics API" was also introduced in the last two months to allow arbitrary metrics to influence autoscaling. This could allow Kubernetes to scale up a workload to deal with a larger load on an external message broker service, for example.

Here is an example of the custom metrics API pulling metrics from Prometheus to make sure that each pod handles around 200 requests per second:

metrics: - type: Pods pods: metricName: http_requests targetAverageValue: 200

Here http_requests is a metric exposed by the Prometheus server which looks at how many requests each pod is processing. To avoid putting too much load on each pod, the HPA will then ensure that this number will be around a target value by spawning or killing pods as appropriate.

Upcoming features

The SIG seem to have rounded up everything quite neatly. The next step is to deprecate Heapster: as of 1.10, all critical parts of Kubernetes use the new API so a discussion is under way in another group (SIG instrumentation) to finish moving away from the older design.

Another thing the community is looking into is vertical scaling. Horizontal scaling is fine for certain workloads, like caching servers or application frontends, but database servers, most notably, are harder to scale by just adding more replicas; in this case what an autoscaler should do is increase the size of the replicas instead of their numbers. Kubernetes supports this through the vertical pod autoscaler (VPA). It is less practical than the HPA because there is a physical limit to the size of individual servers that the autoscaler cannot exceed, while the HPA can scale up as long as you add new servers. According to Branczyk, the VPA is also more "complicated and fragile, so a lot more thought needs to go into that." As a result, the VPA is currently in alpha. It is not fully compatible with the HPA and is relevant only in cases where the HPA cannot do the job: for example, workloads where there is only a single pod or a fixed number of pods like StatefulSets.

Branczyk gave a set of predictions for other improvements that could come down the pipeline. One issue he identified is that, while the HPA and VPA can scale pods, there is a different Cluster Autoscaler (CA) that manages nodes, which are the actual machines running the pods. The CA allows a cluster to move pods between the nodes to remove underutilized nodes or create new nodes to respond to demand. It's similar to the HPA, except the HPA cannot provision new hardware resources like physical machines on its own: it only creates new pods on existing nodes. The idea here is to combine to two projects into a single one to keep a uniform interface for what is really the same functionality: scaling a workload by giving it more resources.

Another hope is that OpenMetrics will emerge as a standard for metrics across vendors. This process seems to be well under way with Kubernetes already using the Prometheus library, which serves as a basis for the standard, and with commercial vendors like Datadog supporting the Prometheus API as well. Another area of possible standardization is the gRPC protocol used in some Kubernetes clusters to communicate between microservices. Those endpoints can now expose metrics through "interceptors" that get executed before the request is passed to the application. One of those interceptors is the go-grpc-prometheus adapter, which enables Prometheus to scrape metrics from any gRPC-enabled service. The ultimate goal is to have standard metrics deployed across an entire cluster, allowing the creation of reusable dashboards, alerts, and autoscaling mechanisms in a uniform system.


This session was one of the most popular of the conference, which shows a deep interest in this key feature of Kubernetes deployments. It was great to see Branczyk, who is involved with the Prometheus project as well, work on standardization so other systems can work with Kubernetes.

The speed at which APIs change is impressive; in only a few months, the community upended a fundamental component of Kubernetes and replaced it with a new API that users will need to become familiar with. Given the flexibility and clarity of the new API, it is a small cost to pay to represent business logic inside such a complex system. Any simplification will surely be welcome in the maelstrom of APIs and subsystems that Kubernetes has become.

A video of the talk and slides [PDF] are available. SIG autoscaling members Marcin Wielgus and Solly Ross presented an introduction (video) and deep dive (video) talks that might be interesting to our readers who want all the gory details about Kubernetes autoscaling.

This article first appeared in the Linux Weekly News.

Categories: External Blogs

Montréal-Python 71 - Burning Yeti

Montreal Python - Sun, 04/29/2018 - 23:00

Enjoy our May meetup just in time before PyCon US with these amazing speakers - 2 of which will be be presenting at PyCon!

Please RSVP on Meetup

Thanks to Google Montreal for sponsoring the event!

Presentations Survival analysis for conversion rates - Tristan Boudreault

What percentage of your users will spend? Typically, analysts use the conversion rate to assess how successful a website is at converting trial users into paying ones. But is this calculation giving us results that are lower than reality? With a talk rich in examples, Tristan will show how Shopify reframes the traditional conversion questions in survival analysis terms.

Data Science at Shopify - Françoise Provencher

Françoise is a data science technical lead at Shopify, a multi-channel commerce platform that has a decade-worth of data on a very diverse set of businesses. We’ll hear about how Python is particularly useful when it comes to understanding Shopify’s user base by sifting through tons of data.

This presentation will be in English.

Integrate Geocode data with Python - Jean Luc Semedo

Les applications intégrant des modules de géolocalisation sont de plus en plus demandées. Avec Python, il existe de nombreuses librairies permettant de gérer la géolocalisation de façon native et très simplement. Nous allons durant cette présentation en survoler quelques-unes : Geopy, pyproj, Mapnik, GeoDjango...

Jean Luc SEMEDO, Back-end and mobile developper en Freelance

  • 6:00PM - Doors open
  • 6:30PM - Presentations
  • 8:30PM - End of the event
  • 9:00PM - Benelux

Monday, May 7th, 2018 at 6:00PM


Google Montréal 1253 McGill College #150 Montréal, QC

Categories: External Blogs

Epic Lameness

Eric Dorland - Mon, 09/01/2008 - 17:26 now supports OpenID. Hooray! I'd like to make a comment on a thread about the RTL8187se chip I've got in my new MSI Wind. So I go to sign in with OpenID and instead of signing me in it prompts me to create an account with a name, username and password for the account. Huh? I just want to post to their forum, I don't want to create an account (at least not explicitly, if they want to do it behind the scenes fine). Isn't the point of OpenID to not have to create accounts and particularly not have to create new usernames and passwords to access websites? I'm not impressed.
Categories: External Blogs

Sentiment Sharing

Eric Dorland - Mon, 08/11/2008 - 23:28
Biella, I am from there and I do agree. If I was still living there I would try to form a team and make a bid. Simon even made noises about organizing a bid at DebConfs past. I wish he would :)

But a DebConf in New York would be almost as good.
Categories: External Blogs
Syndicate content