
External Blogs

My free software activities, June 2017

Anarcat - Mon, 07/03/2017 - 11:37
Debian Long Term Support (LTS)

This is my monthly Debian LTS report. This time I worked on Mercurial, sudo and Puppet.

Mercurial remote code execution

I issued DLA-1005-1 to resolve problems with the hg server --stdio command that could be abused by "remote authenticated users to launch the Python debugger, and consequently execute arbitrary code, by using --debugger as a repository name" (CVE-2017-9462).

Backporting the patch was already a little tricky because, as is often the case in our line of work, the code had changed significantly in newer versions. In particular, the command-line dispatcher had been refactored, which made the patch non-trivial to port. On the other hand, Mercurial has an extensive test suite, which allowed me to backport those patches with confidence. I also backported part of the test suite to detect certain failures better and fixed the expected output so that it matches the backported code. The test suite is slow, however, which meant slow progress while working on this package.

I also noticed a strange issue with the test suite: all hardlink operations would fail. Somehow my new sbuild setup doesn't seem to support hardlinks. I ended up building a tarball schroot to build those types of packages, as the issue seems related to the use of overlayfs in sbuild. The odd part is that my own tests of overlayfs, following those instructions, show that it does support hardlinks, so there may be something fishy here that I misunderstand.

This, however, allowed me to get a little more familiar with sbuild and schroots. I also took the opportunity to speed up builds by installing an apt-cacher-ng proxy, which will also be useful for regular system updates.

Puppet remote code execution

I issued DLA-1012-1 to resolve a remote code execution vulnerability in puppetmaster servers, exploitable by authenticated clients. To quote the advisory: "Versions of Puppet prior to 4.10.1 will deserialize data off the wire (from the agent to the server, in this case) with a attacker-specified format. This could be used to force YAML deserialization in an unsafe manner, which would lead to remote code execution."

The fix was non-trivial. Normally, this would have involved fixing the YAML parsing, but that was considered problematic because the Ruby libraries themselves were vulnerable and it wasn't clear the problem could be fixed completely by fixing YAML parsing alone. The update I proposed took the bolder step of switching all clients to PSON and simply refusing YAML parsing on the server. This means all clients need to be updated before the server can be updated but, thankfully, updated clients will also run against an older server. Thanks to LeLutin at Koumbit for helping test the patches for this issue.
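Puppet itself is Ruby, but the underlying class of bug is easy to illustrate with a short Python/PyYAML sketch (purely illustrative; the payload below is a PyYAML construct, not Puppet's wire format): an unsafe YAML load will construct arbitrary objects chosen by the attacker, while a safe loader refuses the document.

    import yaml

    # An attacker-controlled document that asks the loader to build an
    # arbitrary object (here, a call to os.system). Ruby's YAML library
    # has the same failure mode, which is what the advisory is about.
    untrusted = "!!python/object/apply:os.system ['id']"

    # yaml.load(untrusted, Loader=yaml.UnsafeLoader) would execute `id`.
    # The safe loader rejects the document instead of constructing objects:
    try:
        yaml.safe_load(untrusted)
    except yaml.YAMLError as err:
        print("rejected:", err)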

Sudo privilege escalation

I have issued DLA-1011-1 to resolve an incomplete fix for a privilege escalation issue (CVE-2017-1000368, following up on CVE-2017-1000367). The backport was not quite trivial, as the code had changed quite a lot since wheezy here as well. Whereas Mercurial's code had become more complex, sudo's code was actually simpler and more straightforward in newer versions, which is reassuring. I made the packages available for testing before uploading them.

I also took extra time to share the patch in the Debian bugtracker, so that people working on the issue in stable may benefit from the backported patch, if needed. One issue that came up during that work is that sudo doesn't have a test suite at all, so it is quite difficult to test changes and make sure they do not break anything.

Should we upload on Fridays?

I brought up a discussion on the mailing list regarding uploads on Fridays. With the sudo and puppet uploads pending, it felt really... daring to upload both packages on a Friday. Years of sysadmin work have hardwired me to be careful on Fridays; as the saying goes: "don't deploy on a Friday if you don't want to work on the weekend!"

Feedback was great, but I was surprised to find that most people are not worried about those issues. I tried to counter some of the arguments that were brought up: I wonder if there is a disconnect here between the package maintainer / programmer work and the sysadmin work at the receiving end of it. Having had to deal with broken updates in the past myself, I'm surprised this hasn't come up in the discussions before, or that the response is so underwhelming.

For now, I'll try to balance the need for prompt security updates with the need for stable infrastructure. One does not, after all, go without the other...

Triage

I also did small fry triage:

Hopefully some of those will come to fruition shortly.

Other work

My other work this month was a little all over the place.

Stressant

I uploaded a new release (0.4.1) of stressant to split the documentation out of the main package, which was taking up too much space according to the grml developers.

The release also introduces a limited anonymity option that hides serial numbers in the smartctl output.
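A minimal sketch of that idea in Python (illustrative only, not stressant's actual implementation): blank out the serial number line from smartctl output before logging or sharing it.

    import re

    def anonymize_smart_output(text):
        # blank out the "Serial Number:" line from `smartctl -i` output
        return re.sub(r"(?im)^(serial number:).*$", r"\1 [hidden]", text)

    sample = "Device Model:     WDC WD10EZEX\nSerial Number:    WD-ABC123456789\n"
    print(anonymize_smart_output(sample))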

Debiman

I also did some small follow-up work on the debiman project to fix the FAQ links.

Local server maintenance

I upgraded my main server to Debian stretch. This generally went well, although the upgrade itself took way more time than I would have liked (4 hours!). This is partly because I have a lot of cruft installed on the server, but also because of what I consider to be issues in the automation of major Debian upgrades. For example, I was prompted for configuration file changes at seemingly random moments during the upgrade and got various debconf prompts to answer. These should really be batched together, and unfortunately I had forgotten to use the home-made script I wrote when I was working at Koumbit, which shortens the upgrade a bit.

I wish we would improve on our major upgrade mechanism. I documented possible solutions for this in the AutomatedUpgrade wiki page, but I'm not sure I see exactly where to go from here.

I had a few regressions after the upgrade:

  • the infrared remote control stopped working: still need to investigate
  • my home-grown full-disk encryption remote unlocking script broke, but upstream has a nice workaround, see Debian bug #866786
  • gdm3 breaks bluetooth support (Debian bug #805414 - to be fair, this is not a regression in stretch, it's just that I switched my workstation from lightdm to gdm3 after learning that the latter can do rootless X11!)
Docker and Subsonic

I did my first (and late?) foray into Docker and containers. My rationale was that I wanted to try out Subsonic, an impressive audio server which some friends have shown me. Since Subsonic is proprietary, I didn't want it to contaminate the rest of my server and it seemed like a great occasion to try out containers to keep things tidy. Containers may also allow me to transparently switch to the FLOSS fork LibreSonic once the trial period is over.

I have learned a lot and may write more about the details of that experience soon; for now, you can look at the contributions I made to the unofficial Subsonic Docker image, as well as the LibreSonic one.

Since Subsonic also promotes album covers as first-class citizens, I used beets to download a lot of album covers, which was really nice. I look forward to using beets more, but first I'll need to implement two plugins.

Wallabako

I did a small release of wallabako to fix the build with the latest changes in the underlying wallabago library, which led me to ask upstream to make versioned releases.

I also looked into creating a separate documentation site but it looks like mkdocs doesn't like me very much: the table of contents is really ugly...

Small fry

That's about it! And that was supposed to be a slow month...


Alioth moving toward pagure

Anarcat - Wed, 06/14/2017 - 12:00

Since 2003, the Debian project has been running a server called Alioth to host source code version control systems. The server will hit the end of life of the Debian LTS release (Wheezy) next year; that deadline raised some questions regarding the plans for the server over the coming years. Naturally, that led to a discussion regarding possible replacements.

In response, the current Alioth maintainer, Alexander Wirt, announced a sprint to migrate to pagure, a free-software "Git-centered forge" written in Python for the Fedora project, which LWN covered last year. Alioth currently runs FusionForge, previously known as GForge, which is the free-software fork of the SourceForge code base created when that service closed its source in 2001. Alioth hosts source code repositories, mainly Git and Subversion (SVN), and, like other "forge" sites, also offers forums, issue trackers, and mailing list services. While other alternatives are still being evaluated, a consensus has emerged on a migration plan from FusionForge to a more modern and minimal platform based on pagure.

Why not GitLab?

While this may come as a surprise to some who would expect Debian to use the more popular GitLab project, the discussion and decision actually took place a while back. During a lengthy debate last year, Debian contributors discussed the relative merits of different code-hosting platforms, following the initiative of Debian Developer "Pirate" Praveen Arimbrathodiyil to package GitLab for Debian. At that time, Praveen also got a public GitLab instance running for Debian (gitlab.debian.net), which was sponsored by GitLab B.V. — the commercial entity behind the GitLab project. The sponsorship was originally offered in 2015 by the GitLab CEO, presumably to counter a possible move to GitHub, as there was a discussion about creating a GitHub Organization for Debian at the time. The deployment of a Debian-specific GitLab instance then raised the question of the overlap with the already existing git.debian.org service, which is backed by Alioth's FusionForge deployment. It then seemed natural that the new GitLab instance would replace Alioth.

But when Praveen directly proposed to move to GitLab, Wirt stepped in and explained that a migration plan was already in progress. The plan then was to migrate to a simpler gitolite-based setup, a decision that was apparently made in corridor discussions surrounding the Alioth Git replacement BoF held during Debconf 2015. The first objection raised by Wirt against GitLab was its "huge number of dependencies". Another issue Wirt identified was the "open core / enterprise model", preferring a "real open source system", an opinion that seems shared by other participants on the mailing list. Wirt backed his concerns with a hypothetical example:

Debian needs feature X but it is already in the enterprise version. We make a patch and, for commercial reasons, it never gets merged (they already sell it in the enterprise version). Which means we will have to fork the software and keep those patches forever. Been there done that. For me, that isn't acceptable.

This concern was further deepened when GitLab's Director of Strategic Partnerships, Eliran Mesika, explained the company's stewardship policy, which describes how GitLab decides which features end up in the proprietary version. Praveen pointed out that:

[...] basically it boils down to features that they consider important for organizations with less than 100 developers may get accepted. I see that as a red flag for a big community like debian.

Since there are over 600 Debian Developers, the community seems to fall within the needs of "enterprise" users. The features the Debian community may need are, by definition, appropriate only to the "Enterprise Edition" (GitLab EE), the non-free version, and are therefore unlikely to end up in the "Community Edition" (GitLab CE), the free-software version.

Interestingly, Mesika asked for clarification on which features were missing, explaining that GitLab is actually open to adding features to GitLab CE. The response from Debian Developer Holger Levsen was categorical: "It's not about a specific patch. Free GitLab and we can talk again." But beyond the practical and ethical concerns, some specific features Debian needs are currently only in GitLab EE. For example, debian.org systems use LDAP for authentication, which would obviously be useful in a GitLab deployment; GitLab CE supports basic LDAP authentication, but advanced features, like group or SSH-key synchronization, are only available in GitLab EE.

Wirt also expressed concern about the Contributor License Agreement that GitLab B.V. requires contributors to sign when they send patches, which forces users to allow the release of their code under a non-free license.

The debate then went through an exhaustive inventory of the different free-software alternatives:

  • GitLab, a Ruby-based GitHub replacement, dual-licensed MIT/Commercial
  • Gogs, Go, MIT
  • Gitblit, Java, Apache-licensed
  • Kallithea, in Python, also supports Mercurial, GPLv3
  • and finally, pagure, also written in Python, GPLv2

A feature comparison between each project was created in the Debian wiki as well. In the end, however, Praveen gave up on replacing Alioth with GitLab because of the controversy and moved on to support the pagure migration, which resolved the discussion in July 2016.

More recently, Wirt admitted in an IRC conversation that "on the technical side I like GitLab a lot more than pagure" and that "as a user, GitLab is much nicer than pagure and it has those nice CI [continuous integration] features". However, as he explained in his blog "GitLab is Opencore, [and] that it is not entirely opensource. I don't think we should use software licensed under such a model for one of our core services" which leaves pagure as the only stable candidate. Other candidates were excluded on technical grounds, according to Wirt: Gogs "doesn't scale well" and a quick security check didn't yield satisfactory results; "Gitblit is Java" and Kallithea doesn't have support for accessing repositories over SSH (although there is a pending pull request to add the feature).

In an email interview, Sid Sijbrandij, CEO of GitLab, did say that "we want to make sure that our open source edition can be used by open source projects". He gave examples of features liberated following requests by the community, such as branded login pages for the VLC project and GitLab Pages after popular demand. He stressed that "There are no artificial limits in our open source edition and some organizations use it with more than 20.000 users." So if the concern of the Debian community is that features may be missing from GitLab CE, there is definitely an opening from GitLab to add those features. If, however, the concern is purely ethical, it's hard to see how an agreement could be reached. As Sijbrandij put it:

On the mailinglist it seemed that some Debian maintainers do not agree with our open core business model and demand that there is no proprietary version. We respect that position but we don't think we can compete with the purely proprietary software like GitHub with this model.

Working toward a pagure migration

The issue of Alioth maintenance came up again last month when Boyuan Yang asked what would happen to Alioth when support for Debian LTS (Wheezy) ends next year. Wirt brought up the pagure migration proposal and the community tried to make a plan for the migration.

One of the issues raised was the question of the non-Git repositories hosted on Alioth, as pagure, like GitLab, only supports Git. Indeed, Ben Hutchings calculated that while 90% (~19,000) of the repositories currently on Alioth are Git, there are 2,400 SVN repositories and a handful of Mercurial, Bazaar (bzr), Darcs, Arch, and even CVS repositories. As part of an informal survey, however, most packaging teams explained they either had already migrated away from SVN to Git or were in the process of doing so. The largest CVS user, the web site team, also explained it was progressively migrating to Git. Mattia Rizzolo then proposed that older repository services like SVN could continue running even if FusionForge goes down, as FusionForge is, after all, just a web interface to manage those back-end services. Repository creation would be disabled, but older repositories would stay operational until they migrate to Git. This would, effectively, mean the end of non-Git repository support for new projects in the Debian community, at least officially.

Another issue is the creation of a Debian package for pagure. Ironically, while Praveen and other Debian maintainers have been working for 5 years to package GitLab for Debian, pagure isn't packaged yet. Antonio Terceiro, another Debian Developer, explained this isn't actually a large problem for debian.org services: "note that DSA [Debian System Administrator team] does not need/want the service software itself packaged, only its dependencies". Indeed, for Debian-specific code bases like ci.debian.net or tracker.debian.org, it may not make sense to have the overhead of maintaining Debian packages since those tools have limited use outside of the Debian project directly. While Debian derivatives and other distributions could reuse them, what usually happens is that other distributions roll their own software, like Ubuntu did with the Launchpad project. Still, Paul Wise, a member of the DSA team, reasoned that it was better, in the long term, to have Debian packages for debian.org services:

Personally I'm leaning towards the feeling that all configuration, code and dependencies for Debian services should be packaged and subjected to the usual Debian QA activities but I acknowledge that the current archive setup (testing migration plus backporting etc) doesn't necessarily make this easy.

Wise did say that "DSA doesn't have any hard rules/policy written down, just evaluation on a case-by-case basis" which probably means that pagure packaging will not be a blocker for deployment.

The last pending issue is the question of the mailing lists hosted on Alioth, as pagure doesn't offer mailing list management (nor does GitLab). In fact, there are three different mailing list services for the Debian project:

Wirt, with his "list-master hat" on, explained that the main mailing list service is "not really suited as a self-service" and expressed concern at the idea of migrating the large number of mailing lists hosted on Alioth. Indeed, there are around 1,400 lists on Alioth while the main service has a set of 300 lists selected by the list masters. No solution for those mailing lists had been found at the time of this writing.

In the end, it seems like the Debian project has chosen pagure, the simpler, less featureful, but also less controversial, solution and will use the same hosting software as their fellow Linux distribution, Fedora. Wirt is also considering using FreeIPA for account management on top of pagure. The plan is to migrate away from FusionForge one bit at a time, and pagure is the solution for the first step: the Git repositories. Lists, other repositories, and additional features of FusionForge will be dealt with later on, but Wirt expects a plan to come out of the upcoming sprint.

It will also be interesting to see how the interoperability promises of pagure will play out in the Debian world. Even though the federation features of pagure are still at the early stages, one can already clone issues and pull requests as Git repositories, which allows for a crude federation mechanism.

In any case, given the long history and the wide variety of workflows in the Debian project, it is unlikely that a single tool will solve all problems. Alioth itself has significant overlap with other Debian services; not only does it handle mailing lists and forums, but it also has its own issue tracker that overlaps with the Debian bug tracking system (BTS). This is just the way things are in Debian: it is an old project with lots of moving parts. As Jonathan Dowland put it: "The nature of the project is loosely-coupled, some redundancy, lots of legacy cruft, and sadly more than one way to do it."

Hopefully, pagure will not become part of that "legacy redundant cruft". But at this point, the focus is on keeping the services running in a simpler, more maintainable way. The discussions between Debian and GitLab are still going on as we speak, but given how controversial the "open core" model used by GitLab is for the Debian community, pagure does seem like a more logical alternative.

Note: this article first appeared in the Linux Weekly News.


Montréal-Python 65: Quadratic Radicalism

Montreal Python - Sun, 06/11/2017 - 23:00

Come join us for our last Montreal meetup of the season.

After the presentations, we will head to Benelux to share a drink or two together and welcome the summer!

If you want to give a lightning talk (a 5-minute presentation), you can still submit it. Send us an email at team@mtlpy.org or ping us in the #talks channel of the Montreal-Python Slack: http://slack.mtlpy.org/.

Where

Shopify Offices, 490 de la Gauchetière Street, Montréal, Québec

When

Monday, June 19th, 2017 at 6PM

Schedule
  • 6:00PM - Doors open
  • 6:30PM - Start of presentations
  • 7:15PM - Break
  • 7:30PM - Presentations
  • 8:00PM - End of event
  • 8:30PM - Benelux
Presentations

Tuyet Minh Truong - Microcomputational Microphone

Robert Dupuis - Collaborative Versioning of Scientific Articles: IEEE Software

Alexandre Desilets-Benoit - Startup world from a cynical scientist: "A story of using scikit-learn & life lessons from building a startup"

Alberto Pepe - 21st-century research, 17th-century publishing

Researchers spend their days doing cutting-edge work. But when it comes time to writing, publishing, and disseminating their work, they’re often still using models and tools that haven’t changed much in decades, if not centuries. Research does not advance through isolated scientific results, but through the publication and dissemination thereof. Scientists get recognition through citation and reputation. Open Science is the philosophical view that sharing benefits scientific research and, hence, that barriers to sharing ideas and methods should be lowered as much as possible. The practice of Open Science addresses the question of how to lower these barriers---whether cultural, systemic, methodological or technical. Authorea is a new collaborative, web-based platform that lowers these barriers via a technological solution for writing, editing and publishing that covers the research cycle from writing a first draft, through to submission and publication. Authorea lets you integrate data, code, and all the materials needed to reproduce scientific results in your papers. This lowers the technical barrier to sharing, paving the way for transparency and reproducibility---the very foundations of the scientific method. Authorea also supports the creation of mixed-format documents including LaTeX, markdown, and rich text. Authorea can propel the scientific paper into the 21st century! It is now in use at each of the top 100 research universities worldwide, across a diverse set of fields ranging from astrophysics to zoology.

Alberto Pepe is the co-founder of Authorea. He recently finished a Postdoctorate in Astrophysics at Harvard University. http://albertopepe.com/

Roberta Voulon (Les Pitonneux) - The Power of Community

To be announced

We’d like to thank our sponsors for their continued support:
  • Shopify
  • UQÀM
  • Bénélux
  • Savoir-faire Linux

Montréal-Python 64: Oleophobic Pretzel

Montreal Python - Wed, 05/17/2017 - 23:00

Right after PyCon US 2017, we are inviting you to our May meetup. It will be held on Monday, May 29th at the Shopify offices. It is your chance to get the latest news from the Montreal Python community, so join us!

Please note that we are still looking for lightning talk presentations (5-minute talks). Come present a project you are working on, a library you like, etc.

Presentations

David Larlet - Inclusive Python

After 12 years hacking in Python, what did I learn the hard way? From biology to the web, across startups and now French government, I realized one thing: making your code resilient requires empathy.

David Larlet is seeking a more diverse and enthusiastic world. Thoughts (en/fr): https://larlet.fr/david/

Mavi Ruiz-Blondet - A dive into the mind: Python for sensory stimulation and brain signal analysis

Commercial EEG headsets are readily available nowadays. How can we find useful information in the EEG data they can easily acquire? It all starts with the sensory stimulation that, when done correctly, will elicit meaningful brain signals. PsychoPy is a library that greatly helps with this. Proper sensory stimulation is the first step; once the EEG data has been acquired, its post-processing is what allows us to predict what the user was thinking. This post-processing can also be done in Python. In this talk, we will see the entire process, from proper data acquisition to data post-processing, and how to interpret the results in order to have a look into a human's mind.

Mavi has always been interested in the electrical signals of the body. She got her bachelor's in Electronic Engineering from the Pontifical Catholic University of Peru and, after working for a year with medical equipment, went to the US to get a Masters in Biomedical Engineering at the State University of New York at Binghamton. After very fruitful research using brain signals as a biometric for authentication purposes, she continues that research as a PhD student in Cognitive and Brain Sciences.

Claude Arpin - The micro:bit and the worldwide rush to teach coding

Where

Shopify Offices, 490 de la Gauchetière Street, Montréal, Québec: https://goo.gl/maps/EXQWD6jAEAr

When

Monday, May 29th, 2017 at 6PM

Schedule
  • 6:00PM - Doors open
  • 6:30PM - Start of presentations
  • 7:15PM - Break
  • 7:30PM - Presentations
  • 8:00PM - End of event
  • 8:30PM - Benelux
We’d like to thank our sponsors for their continued support:
  • Shopify
  • UQÀM
  • Bénélux
  • Savoir-faire Linux

My free software activities, April 2017

Anarcat - Sat, 04/29/2017 - 14:23
Debian Long Term Support (LTS)

This is my monthly Debian LTS report. My time this month was spent working on various hairy security issues, most notably XBMC (now known as Kodi) and yaml-cpp.

Kodi directory traversal

I started by looking into CVE-2017-5982, a "directory traversal" vulnerability in XBMC (now known as Kodi), which is a technical term for "allows attackers to read any world-readable file on your computer from the network". It's a serious vulnerability with no known fix. When you enable the "remote control" interface in Kodi, it allows anyone with the password (which is disabled by default) to download any file Kodi has read access to on the machine it's running on. Considering Kodi is often connected to multiple services, this may lead to further compromise and more nasty stuff.

I furthered the investigation with my own analysis, which showed the problem is difficult to solve: Kodi internally uses this facility to show thumbnails and media to the user, and there is no clear way of restricting which paths Kodi should have access to. Indeed, Kodi is designed to access mounted file systems and paths in arbitrary locations. In Debian bug #855225, I further confirmed wheezy and jessie-backports as vulnerable and therefore showed with good certainty that stretch and sid are vulnerable as well. I also suggested a possible workaround but, at this point, it's in upstream's hands, as the changes will be intrusive: either the file transfer mechanism needs to be revamped all over Kodi, or authentication (with a proper password policy) needs to be enforced.
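For reference, a directory traversal check usually boils down to resolving the requested path and verifying that it stays under an allowed base directory. The Python sketch below (purely illustrative, not Kodi's actual C++ code) shows the idea, and also hints at why it is hard to apply here: Kodi has no single obvious base directory to check against.

    import os.path

    def is_within(base, requested):
        # True if `requested` resolves to a path inside `base`
        base = os.path.realpath(base)
        target = os.path.realpath(os.path.join(base, requested))
        return os.path.commonpath([base, target]) == base

    print(is_within("/srv/media", "movies/film.mkv"))    # True: allowed
    print(is_within("/srv/media", "../../etc/passwd"))   # False: traversal attempt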

Squirrelmail

Next I looked at that old webmail software, Squirrelmail, which suffers from a remote code execution vulnerability (CVE-2017-7692) when sending mail with sendmail on the command line. This is arguably an edge case but, considering the patch was simple, I figured I would provide an update to the LTS community. I tried to get a coordinated release for jessie, since the code is the same, but this wasn't completed at the time of writing. A patch is available and will hopefully be picked up by another LTS worker soon.

Fop and Batik

Those issues (CVE-2017-5661 and CVE-2017-5662) were more difficult. The patches weren't clearly documented and there were no upstream references other than security advisories for the first release in years (in the case of batik) or months (in the case of fop), which made it hard to track down the issues. Fortunately, I was able to track down the upstream issues (FOP-2668 and BATIK-1139) where I got confirmation on what the proper fixes were. I could then release DLA-927-1 and DLA-926-1 with the backported patches.

I do not use fop or batik. In fact, even after reading the homepage of both products, I couldn't quite figure out what use people could possibly have for them. Before uploading, I therefore made test packages available for fop and batik.

libsndfile

Next up was libsndfile, which had a bunch of overflows when parsing various audio files. I backported a patch for CVE-2017-7585, CVE-2017-7586 and CVE-2017-7741, which all seemed to be fixed by a single patch upstream. CVE-2017-7742 was also fixed, although with a separate patch. Of all of those, I could only test CVE-2017-7741 and CVE-2017-7742, as the others were missing test cases.

I provided a test package for a few days, then figured it would be best to also incorporate the security fixes done in stable, which brought in fixes for CVE-2015-7805, CVE-2014-9756 and CVE-2014-9496. So in the end, I ported patches from wheezy to jessie, uploaded the jessie version (reverting certain build changes) into wheezy, and published DLA-928-1 with the results.

yaml-cpp

I then turned to yaml-cpp, a C++ parser for YAML. This one didn't have a known upstream fix, but I figured I would give it a shot anyway. I ended up writing my first C++ code in years, which is still pending review and merge upstream. It's not an easy problem to fix: this is basically an excessive-recursion problem that can be used to smash the stack. I figured I could introduce a recursion limit but, as the discussion showed, this is a limited approach: stack size varies on different platforms and it's not easy to find the right limit.
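To illustrate the depth-limit idea (a Python sketch rather than yaml-cpp's actual C++, with an arbitrarily chosen limit): every nested collection bumps a counter, and parsing aborts once the counter passes the limit instead of exhausting the native stack.

    MAX_DEPTH = 512  # arbitrary; the "right" value depends on the platform's stack size

    def parse_flow(text, pos=0, depth=0):
        # parse a trivial nested-list syntax such as "[[[]]]"
        if depth > MAX_DEPTH:
            raise RecursionError("document nested too deeply")
        assert text[pos] == "["
        items, pos = [], pos + 1
        while text[pos] != "]":
            node, pos = parse_flow(text, pos, depth + 1)
            items.append(node)
        return items, pos + 1

    parse_flow("[[[]]]")        # parses fine
    # parse_flow("[" * 10**6)   # would raise instead of smashing the stack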

The real solution is to rewrite the code to avoid recursion, but that's a major refactoring that I didn't feel belonged in an LTS update. Besides, this could be better handled by upstream, so I will leave things at that for now. It does make you wonder how much code out there recurses on untrusted data structures...

kedpm

Finally, a friend over at Koumbit.org reported Debian bug #860817, an information leak in kedpm, a password manager I previously maintained. I requested and was assigned CVE-2017-8296 and provided a fix for wheezy and jessie. For unstable and the coming stretch release, I have requested that kedpm be completely removed from Debian (Debian bug #860817), which involved a release notes update (Debian bug #861277).

It's unfortunate to see software go, but kedpm wasn't maintained. I wasn't the original author: I contributed a few patches and ended up maintaining software I wasn't using. That's a bad situation to be in, as you don't really know what's working and what's not with the tools you are supposed to be responsible for. There are more modern alternatives available now and I encourage everyone to switch.

Triage

Looking for more work, I peeked a bit in the secretary tasks to triage some pending issues. I found that trafficserver could be crashed with simple requests (CVE-2017-5659) so I looked into that issue. My analysis showed that the patch is long and complex and could be difficult to backport to the old version available in wheezy. I also couldn't reproduce the issue in wheezy, so it may be a bug introduced only later, although I couldn't confirm that directly.

I also triaged wireshark, where I just noted that the maintainer expressed concern that we were taking up issues too fast and will probably take care of this one. I also postponed various issues in GnuTLS (marked "no-dsa") as they affect only an (unfortunately) rarely used part of GnuTLS that has been removed in later versions: OpenPGP support.

Other free software work

Debiman

I finally got around to contributing to the debiman project. I worked on ensuring that there is dman compatibility in debiman, by shipping dman in the debian-goodies package (Debian bug #860920). I also submitted a pull request to fix the about page title, document the custom assets repository, fix a stray bracket, and fix the link to the venerable man7.org project.

After a discussion on IRC, I also filed a few more issues:

I'm happy to be able to contribute to this important service and I hope the new FAQ I created will be online soon!

XMonad and Emacs

I started using writeroom-mode again as part of my work on LWN. As it turns out, my setup was not exactly working: I had to port my config to the new version and windows weren't "sticky" as they should be, a known issue with Xmonad. Indeed, Xmonad doesn't obey the "static" or "all desktops" standard directives.

Writeroom is a "distraction-free writing" mode for Emacs, so the irony of diving into such a deep distraction to establish a distraction-free environment is not lost on me.

Needing to scratch that particular itch, and with the help of clever people from the IRC channel, I was able to make Emacs tell Xmonad to show its window (or "frame" as Emacs likes to call it) on all desktops. This involved creating a new function which I think could be useful in the CopyWindow library:

    -- | Toggle between "copyToAll" or "killAllOtherCopies". Copies to all
    -- workspaces, or remove from all other workspaces, depending on
    -- previous state (checked with "wsContainingCopies").
    copyToAllToggle :: X ()
    copyToAllToggle = do
        -- check which workspaces have copies
        copies <- wsContainingCopies
        if null copies
            then windows copyToAll   -- no workspaces, make sticky
            else killAllOtherCopies  -- already other workspaces, unstick

There are probably better ways of implementing this directly in the CopyWindow code - wsContainingCopies, in particular, is probably overkill. But it's all I can use directly from my xmonad.hs, so that's what I did.

The other bit I needed was something to trigger that function from the outside. I rejected the ServerMode hook because it looked a bit too complicated, and there is a built-in facility within X that works without it, which, from Emacs' point of view, is the x-send-client-message function. So I made up a new message identifier and wrote an event hook handler to process it:

    -- | handle X client messages that tell Xmonad to make a window appear
    -- on all workspaces
    --
    -- this should really be using _NET_WM_STATE and
    -- _NET_WM_STATE_STICKY. but that's more complicated: then we'd need
    -- to inspect a window and figure out the current state and act
    -- accordingly. I am not good enough with Xmonad to figure out that
    -- part yet.
    --
    -- Instead, just check for the relevant message and check if the
    -- focused window is already on all workspaces and toggle based on
    -- that.
    --
    -- this is designed to interoperate with Emacs's writeroom-mode module
    -- and can be called from elisp with:
    --
    -- (x-send-client-message nil 0 nil "XMONAD_COPY_ALL_SELF" 8 '(0))
    myClientMessageEventHook :: Event -> X All
    myClientMessageEventHook (ClientMessageEvent {ev_message_type = mt, ev_data = dt}) = do
        dpy <- asks display
        -- the client message we're expecting
        copyAllMsg <- io $ internAtom dpy "XMONAD_COPY_ALL_SELF" False
        -- if the event matches the message we expect, toggle sticky state
        when (mt == copyAllMsg && dt /= []) $ do
            copyToAllToggle
        -- we processed the event completely
        return $ All True

All that was left was to hook that into Emacs, and I was done! Whoohoo! Full screen total domination, distraction free work!

I would love to hear what others think of this approach, whether they have improvements, or if the above copyToAllToggle function could be merged in. Ideally, Xmonad would just parse the STICKY client messages and do the right thing - maybe even directly in CopyWindow - but I have had enough Haskell for one day.

You can see the diff on my home directory to see exactly the changes involved to make this configuration work.

Emacs packaging

Speaking of Emacs, after complaining in the noisy #emacs IRC channel about the poor TLS configuration of marmelade.org -- and filing a bug (Debian bug #861106) regarding the use of SHA-1 in certificate pinning -- I was told we shouldn't expect trust from third-party ELPA repositories. Marmelade seems to be dead, as the maintainer is "behind the great firewall of China" and MELPA still hasn't figured out how to sign packages. In the end, it seems like there are tons of elpa packages in Debian and that if your favorite one is missing, that's a bug that can be filed and fixed.

I first discovered that 6 of the packages I used were already packaged:

And so I went ahead and filed a ton more bugs for the packages I am using but that aren't in Debian just yet:

Of those, I can't recommend multiple-cursors (MC) enough: I used it at least 4 times just writing this text. It's just awesome. The other ones are also all great in their own right of course, but I feel they are more specific to my workflow whereas MC is just amazing.

ikiwiki

I did some more work on ikiwiki, the software driving this blog. I created a new plugin to at least make the anchors in the table of contents human-readable. This is something I had done on the MoinMoin wiki almost a decade ago -- which I called NicerHeadingIds back then -- and that I have always found frustrating with Ikiwiki.

It turns out the problem was both easier and hairier than I thought. Right from the start, something weird was happening: something was already adding nice headings, but they were somewhat broken. It turns out that multimarkdown already inserts those anchors, but I wasn't satisfied with the way they were generated. Even worse, I had the headinganchors plugin enabled, but that plugin wasn't taking effect because of multimarkdown. And even if it did take effect, it doesn't behave well with non-ASCII characters, which get turned into their numeric representation.

So I also wrote the i18nheadinganchors plugin that creates better heading anchors, and patched the toc plugin so that it can reuse existing anchors if they exist, while keeping backwards compatibility. I hope this gets merged in a future ikiwiki release so I do not have to carry this patch locally for too long...
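The gist of the i18n-friendly anchors, sketched here in Python (the actual plugin is ikiwiki Perl code, so this is only an illustration of the approach): derive the anchor from the heading text while keeping non-ASCII letters readable instead of converting them to a numeric representation.

    import re
    import unicodedata

    def heading_anchor(title):
        # turn a heading into a readable anchor, keeping accented letters
        title = unicodedata.normalize("NFC", title.strip().lower())
        # collapse runs of non-word characters into single dashes
        return re.sub(r"[^\w]+", "-", title).strip("-")

    print(heading_anchor("Pourquoi pas GitLab ?"))    # pourquoi-pas-gitlab
    print(heading_anchor("Écriture et publication"))  # écriture-et-publication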

In other news, I have upgraded the ikiwiki-hosting package to the latest version and sent a patch upstream to provide HSTS support.

Other stuff

I have migrated all my public repos hosted on my home server to either GitLab.com or GitHub. I also have repositories on 0xacab, and it seemed ludicrous to have four different canonical places where my code was hosted. I now have about 40 different projects on GitLab and about 60 on GitHub, although most of the latter are forks of existing projects.

I also made a manpage for stressant and moved the documentation to RTFD which makes it neatly accessible. I also made small incremental improvements (like --directory support).

I installed Rainloop on this server to give a nice, mobile-friendly webmail. Instructions to replicate this setup are in mail.

In the constant git-annex documentation effort, I tried to draft a user guide that could be a basis for restructuring the documentation to be more easily accessible. I also helped a friend put his documentation about splitting a repository on the wiki.

Finally, I also looked into Android stuff a little more. I wrote a usability review of the F-Droid privileged extension that will, I hope, bring good changes. I also opened a discussion regarding reproducible builds to try and clarify exactly how those work, to help the Wallabag people ship consistently signed alphas. So far, it seems that it will remain standard practice on F-Droid to ship packages that are not signed by the official upstream signature, unfortunately, unless upstream provides a reproducible build that is publicly available... Switching to such a build is also a hairy issue because, obviously, the signature changes, which raises the very alarm we are trying to avoid in the first place.


The rise of Linux-based networking hardware

Anarcat - Wed, 04/19/2017 - 12:00

Linux usage in networking hardware has been on the rise for some time. During the latest Netdev conference held in Montreal this April, people talked seriously about Linux running on high end, "top of rack" (TOR) networking equipment. Those devices have long been the realm of proprietary hardware and software companies like Cisco or Juniper, but Linux seems to be making some significant headway into the domain. According to Shrijeet Mukherjee, VP of Engineering at Cumulus Networks: "we are seeing a 28% adoption rate in the Fortune 50" companies.

As someone who has worked in system administration and networking for over a decade, I was surprised by this: switches have always been black boxes of proprietary hardware that we barely get a shell into. But as more and more TOR hardware gets Linux support, some of that work trickles down outside of that niche. Are we really seeing the rise of Linux in high-end networking hardware?

Linux as the standard interface

During his keynote at Netdev, Jesse Brandeburg explained that traffic is exploding on the Internet: "From 2006 to 2016 the compound annual growth rate was 78% of network traffic. So the network traffic is growing like crazy. In 2010 to 2023, it's going to grow by a thousand times."

He also mentioned Intel was working on devices that could do up to 400 Gbps. In his talk, he argued that Linux has a critical place in the modern world by repeating the mantra that "everything is on the network, and the network runs Linux", but does it really? Through Android, Linux has become the most popular end-user device operating system and is also dominant on the server — but what about the actual fabric of the network: the routers and switches that interconnect all those machines?

Mukherjee, in his own keynote, argued that even though Linux has achieved dominance in the server and virtualization markets, there is a "discontinuity [...] at the boundary where the host and the network meet" and argued "that the host without the network will not survive". In other words, proprietary hardware and software in the network threaten free software in the server. Even though some manufacturers are already providing a "Linux interface" in their hardware, it is often only some sort of compatibility shell which might be compared with the Ubuntu compatibility layer in Windows: it's not a real Linux.

Mukherjee pushed this idea further by saying that those companies are limiting themselves by not using the full Linux stack. He presented Linux as the "top vehicle for innovation" that provides a featureful network stack, citing VXLAN, eBPF, and Quagga as prime features used on switches. Mukherjee also praised the diversity of the user-space Linux ecosystem as something that commercial alternatives can't rival; he compared the faster Linux development to the commercial sector where similar top features stay in the beta stage for up to 3 years.

Because of its dominance in the server market, consumers are expecting a Linux-like interface on their networking gear now, which means Linux could be the standard interface all providers strive toward. As a Debian developer, I can't help but smile at the thought; if there's one thing we have not been able to do among Linux distributions, it's pretty much standardize user space in a consistent interface. POSIX is old and incomplete, the FHS is showing its age, and most distributions have abandoned LSB. Yet, the idea is certainly enticing: there is a base set of tools and applications, especially for the Linux kernel, that are standardized: iproute2, ethtool, and iptables are generally consistent across distributions, even though each distribution has its own way of using them.

Yet Linux is not dominant in networking hardware. Why? Mukherjee identified the problem as "packaging issues" and listed a set of features he would like Linux to improve:

  • Standardization of the ethtool interface. The idea is to make ethtool a de-facto standard to manage switches and ports. Mukherjee gave the example that data centers spend more money on cables than any other hardware and explained that making it easier to identify cables is therefore a key functionality. Getting consistent interface naming was also a key problem identified by numerous participants at the conference. While systemd tried to fix this with the predictable network interface names specification, the interface names are not necessarily predictable across virtual machines or on special hardware; in fact, this was the topic of the first talk of the conference. ethtool also needs to support interfaces that run faster than 1 Gbps, something that still has limited support in Linux at the moment.

  • Scaling of the Linux bridge. Through the rise of "software defined networking", we are likely to see multi-switch virtual environments that need to scale to hundreds of interfaces and devices easily. This is something the Linux bridge was never designed to do and it's showing scalability issues. During the conference, there was hope that the new XDP and eBPF developments could help, but also concerns this would create yet another bridge layer inside the kernel.

Cumulus's goal seems to be to establish Linux as the industry standard for this new era of networking and it is not alone. Through its Open Compute project, Facebook is sharing open designs of data center products and, while we have yet to see commercial off-the-shelf (COTS) 24 and 48-port gigabit switches trickle down to the consumer market, the company is definitely deploying new hardware based on those specifications in its own data centers, and those devices are often based on Linux.

The Linux switch implementation

So how exactly do switches work in Linux?

The Linux kernel manipulates switches with three different operation structures: switchdev_ops, which we previously covered, ethtool_ops, and netdev_ops. Certain switches, however, also need distributed switch architecture (DSA) features to be properly handled. DSA is a more obscure part of the network stack that allows Linux to represent hardware switches or chains of switches using regular Linux tools like bridge, ifconfig, and so on. While switchdev is a new layer, DSA has been in the kernel since 2.6.28 in 2008. Originally developed to support Marvell switches, DSA is now a generic layer deployed in WiFi access points, set-top boxes, on-board flight entertainment systems, trains, and other industrial equipment. Switches that have an Ethernet controller need DSA, whereas the kernel can support switches without Ethernet controllers directly with switchdev drivers.

The first years of DSA's development consisted only of basic maintenance but, in the last three years, DSA has seen a resurgence of contributions, as part of the Linux networking push to support hardware offloading and network switches. Between 2014 and 2015, DSA added support for Broadcom hardware, wake-on-LAN, and hardware port bridging, among other features.

DSA's development was parallel to swconfig, written by the OpenWrt project to support the small office and home office (SOHO) routers that the project is targeting. The main difference between swconfig and DSA is that DSA-supported switches show one network interface per port, whereas swconfig-configured switches show up as a single port, which limits the amount of information that can be extracted from the switch. For example, you cannot have per-port traffic statistics with swconfig. That limitation is what led to the creation of the switchdev framework, when swconfig was proposed (then refused) for inclusion in mainline. Another goal of switchdev was to support bridge hardware offloading and network interface card (NIC) virtualization.
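One concrete consequence of the one-netdev-per-port model is that per-port counters are exposed as plain sysfs files and can be read like any other interface statistics; the short Python sketch below illustrates this (the lanN interface names are hypothetical and depend on the switch driver).

    import os

    def port_stats(iface):
        # read a few standard counters for a network interface from sysfs
        base = f"/sys/class/net/{iface}/statistics"
        counters = {}
        for name in ("rx_bytes", "tx_bytes", "rx_packets", "tx_packets"):
            with open(os.path.join(base, name)) as f:
                counters[name] = int(f.read())
        return counters

    for port in ("lan1", "lan2"):   # one netdev per front-panel switch port
        print(port, port_stats(port))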

Also, whereas swconfig uses virtual LAN (VLAN) tagging to address ports, DSA enables the use of device-specific tagging headers to address different ports, which enables DSA to have better control over the switches. This allows, for example, DSA to do internet group management protocol (IGMP) snooping or implement the spanning tree protocol, whereas swconfig doesn't have those features. Some switches are actually connected to the host CPU through an Ethernet interface instead of regular PCI-Express interface, and DSA supports this as well.

One advantage that remains in the swconfig approach is that it treats the internal switch as a simple external switch, and addresses ports with standard VLAN tags. This is something DSA could do, as well, but no one has bothered implementing this just yet. For now, DSA drivers use device-specific tagging mechanisms that limit the number of supported devices. Other areas of future improvement for DSA are better multi-chip support, IGMP snooping, and bonding, as well as firewall, NAT, and TC offloading.

Where is the freedom?

Given all those technical improvements, you might rightfully wonder if your own wireless router or data center switch runs Linux.

In recent years, we have seen more and more networking devices shipped with Linux and sometimes even OpenWrt (e.g. in the case of the Turris Omnia, which we previously covered), and especially on SOHO routers, but it sometimes means a crippled operating system that only offers you a proprietary web interface. But at least those efforts make it easier to deploy free operating systems on those devices.

Based on my experience running OpenWrt on wireless routers to build the Montreal mesh network, deploying Linux on routers and switches has always been a manual process. The Ubiquiti hardware being used in the mesh network comes with an OpenWrt derivative, but it includes proprietary drivers and a proprietary web interface. To use the mesh networking protocol that was chosen, it was necessary to deploy custom OpenWrt images by hand. For years, it was a continuous struggle for OpenWrt developers to liberate generation after generation of proprietary hardware with companies like Cisco locking down the venerable WRT platform in 2006 and the US Federal Communications Commission (FCC) rules that forced TP-Link to block free software on its routers, a change that was later reverted.

Most hardware providers are obviously not dedicated to software freedom: deploying Linux on their hardware is for them an economic, not political choice. As you might expect, a "Linux router" these days often means a closed device and operating system, using Linux as the only free component. I had the privilege of doing some reverse engineering on the SmartRG SR603n VDSL modem, which also doubles as a WiFi router and VoIP phone adapter. I was able to get a shell on that machine, and what I found was a custom operating system built on top of the Linux kernel. I wrote a detailed report about this two years ago and the situation then was pretty grim.

Another experience is my decade of work in data centers, which tells an even worse story: most switches and routers there are not running free software at all. We have deployed HP ProCurve switches that provide free (as in beer) software updates and have struggled for years to find free (as in speech) software alternatives for those. We built our own routers using COTS server hardware, at a significant performance cost over the dedicated application-specific integrated circuits (ASICs) built into commercial routers, because those routers do not offer the trust and flexibility we were looking for.

But Linux is definitely making some headway, and has been for a while. When we covered switchdev in February 2016, it was just getting started, but now vendors like Mellanox, Broadcom, Cumulus, and Intel are writing and shipping code using the framework. Cumulus, in particular, is developing a Debian-based distribution (Cumulus Linux) that it deploys for clients on targeted hardware. Most of the hardware in that list, however, is not actually open in the more radical sense: these are devices that can run free software but are not generally open-source hardware. There are some exceptions, but they sit at the higher end of the spectrum: most organizations probably don't need 100 Gbps ports, let alone the 128 ports in the Backpack switch that Cumulus is shipping.

How much this translates to actual freedom for the end-user is therefore questionable. While I have seen encouraging progress on the high end of the hardware spectrum at Netdev, I'm not sure this will translate into immediate wins in the data center or even for home users in the short term. In the long term, however, we will hopefully see some progress in Linux's rise in general-purpose networking hardware following its dominance in general-purpose computing.

The author would like to thank the Netdev organizers for travel assistance. Also, thanks to Andrew Lunn for a technical review of this article.

Note: this article first appeared in the Linux Weekly News.


Montreal Bug Squashing Party report

Anarcat - Sun, 04/16/2017 - 14:19

A summary of this article has also been translated into French, thank you!

Last Friday, a group of Debian users, developers and enthusiasts met at the Koumbit.org offices for a bug squashing party. We were about a dozen people of various levels: developers, hackers and users.

I gave a quick overview of Debian packaging using my quick development guide, which proved to be pretty useful. I made a deb.li link (https://deb.li/quickdev) for people to be able to easily find the guide on their computers.

Then I started going through a list of different programs used to do Debian packaging, to try and see the level of the people attending:

  • apt-get install - everyone knew about it
  • apt-get source - everyone paying attention
  • dget - only 1 knew about it
  • dch - 1
  • quilt - about 2
  • apt-get build-dep - 1
  • dpkg-buildpackage - only 3 people
  • git-buildpackage / gitpkg - 1
  • sbuild / pbuilder
  • dput - 1
  • rmadison - 0 (the other DD wasn't paying attention anymore)

So: mostly skilled Debian users (they know apt-get source) but not used to packaging (they don't know about dpkg-buildpackage). I therefore went through the list again and explained how those tools fit together and how they could be used to work on Debian packages in the context of a Debian release bug squashing party. This was the fastest crash course in Debian packaging I have ever given (and probably the first too) - going through those tools in about 30 minutes. I was happy to have the guide available for people to refer to later.

The first question after the presentation was "how do we find bugs?", which led me to add links to the UDD bugs page and the release-critical bugs page. I also explained the key links at the top of the UDD page for finding specific sets of bugs, and the useful "patch" filter that allows one to select bugs with or without a patch.

I guess that maybe half of the people were able to learn new skills, or improve existing ones, to make significant contributions or test actual patches. Others learned how to hunt and triage bugs in the BTS.

Update: sorry for the wording: all contributions were really useful, thanks and apologies to bug hunters!!

I myself learned how to use sbuild thanks to the excellent sbuild wiki page, which I improved upon. A friend was able to pick up sbuild very quickly and use it to build a package for stretch, which I find encouraging: my first experience with pbuilder was definitely not as good. I have therefore started the process of switching my build chroots to sbuild, which didn't go so well on jessie because I use a backported kernel and had to use the backported sbuild as well. That required a lot of poking around, so I ended up just using pbuilder for now, but I will definitely switch on my home machine, and I updated the sbuild wiki page to give more explanations on how to set up pbuilder.

We worked on a bunch of bugs, and learned how to tag them as part of the BSP, which was documented in the BSP wiki page. It seems we have worked on about 11 different bugs which is a better average than the last BSP that I organized, so I'm pretty happy with that.

More importantly, we got Debian people together to meet and talk, over delicious pizza, thanks to a sponsorship granted by the DPL. Some people got involved in the next DebConf which is also great.

On top of fixing bugs and getting people involved in Debian, my third goal was to have fun, and fun we certainly had. I didn't work on as many bugs as I had expected, achieving only one upload in the end, but since I was answering so many questions left and right, I felt useful, and that is certainly gratifying. Organization was simple enough: just get a place, send invites and get food; the rest is just sharing knowledge and answering questions.

Thanks everyone for coming, and let's do this again soon!

New approaches to network fast paths

Anarcat - jeu, 04/13/2017 - 12:00

With the speed of network hardware now reaching 100 Gbps and distributed denial-of-service (DDoS) attacks going in the Tbps range, Linux kernel developers are scrambling to optimize key network paths in the kernel to keep up. Many efforts are actually geared toward getting traffic out of the costly Linux TCP stack. We have already covered the XDP (eXpress Data Path) patch set, but two new ideas surfaced during the Netconf and Netdev conferences held in Toronto and Montreal in early April 2017. One is a patch set called af_packet, which aims at extracting raw packets from the kernel as fast as possible; the other is the idea of implementing in-kernel layer-7 proxying. There are also user-space network stacks like Netmap, DPDK, or Snabb (which we previously covered).

This article aims to clarify what all those components do and to provide a short status update for the tools we have already covered. We will focus on in-kernel solutions here: user-space tools have a fundamental limitation in that, if they need to re-inject packets onto the network, they must pay the expensive cost of crossing the kernel barrier again, so their performance is effectively bounded by that design. We will start from the lowest part of the stack, the af_packet patch set, and work our way up to layer-7 and in-kernel proxying.

af_packet v4

John Fastabend presented a new version of a patch set that was first published in January regarding the af_packet protocol family, which is currently used by tcpdump to extract packets from network interfaces. The goal of this change is to allow zero-copy transfers between user-space applications and the NIC (network interface card) transmit and receive ring buffers. Such optimizations are useful for telecommunications companies, which may use them for deep packet inspection or for running exotic protocols in user space. Another use case is running a high-performance intrusion detection system that needs to watch large traffic streams in real time to catch certain types of attacks.

Fastabend presented his work during the Netdev network-performance workshop, but also brought the patch set up for discussion during Netconf. There, he said he could achieve line-rate extraction (and injection) of packets, with packet rates as high as 30Mpps. This performance gain is possible because user-space pages are directly DMA-mapped to the NIC, which is also a security concern. The other downside of this approach is that a complete pair of ring buffers needs to be dedicated for this purpose; whereas before packets were copied to user space, now they are memory-mapped, so the user-space side needs to process those packets quickly otherwise they are simply dropped. Furthermore, it's an "all or nothing" approach; while NIC-level classifiers could be used to steer part of the traffic to a specific queue, once traffic hits that queue, it is only accessible through the af_packet interface and not the rest of the regular stack. If done correctly, however, this could actually improve the way user-space stacks access those packets, providing projects like DPDK a safer way to share pages with the NIC, because it is well defined and kernel-controlled. According to Jesper Dangaard Brouer (during review of this article):

This proposal will be a safer way to share raw packet data between user space and kernel space than what DPDK is doing, [by providing] a cleaner separation as we keep driver code in the kernel where it belongs.

During the Netdev network-performance workshop, Fastabend asked if there was a better data structure to use for such a purpose. The goal here is to provide a consistent interface to user space regardless of the driver or hardware used to extract packets from the wire. af_packet currently defines its own packet format that abstracts away the NIC-specific details, but there are other possible formats. For example, someone in the audience proposed the virtio packet format. Alexei Starovoitov rejected this idea because af_packet is a kernel-specific facility while virtio has its own separate specification with its own requirements.

The next step for af_packet is the posting of the new "v4" patch set, although David Miller warned that this wouldn't get merged until proper XDP support lands in the Intel drivers. The concern, of course, is that the kernel would have multiple incomplete bypass solutions available at once. Hopefully, Fastabend will present the (by then) merged patch set at the next Netdev conference in November.

XDP updates

Higher up in the networking stack sits XDP. The af_packet feature differs from XDP in that it does not perform any sort of analysis or mangling of packets; its objective is purely to get the data into and out of the kernel as fast as possible, completely bypassing the regular kernel networking stack. XDP also sits before the networking stack except that, according to Brouer, it is "focused on cooperating with the existing network stack infrastructure, and on use-cases where the packet doesn't necessarily need to leave kernel space (like routing and bridging, or skipping complex code-paths)."

XDP has evolved quite a bit since we last covered it in LWN. It seems that most of the controversy surrounding the introduction of XDP in the Linux kernel has died down in public discussions, under the leadership of David Miller, who heralded XDP as the right solution for a long-term architecture in the kernel. He presented XDP as a fast, flexible, and safe solution.

Indeed, one of the controversies surrounding XDP was the question of the inherent security challenges of introducing user-provided programs directly into the Linux kernel to mangle packets at such a low level. Miller argued that whatever protections are expected for user-space programs also apply to XDP programs, comparing the virtual memory protections to the eBPF (extended BPF) verifier applied to XDP programs. Those programs are actually eBPF programs that come with an interesting set of restrictions (a minimal sketch follows the list below):

  • they have a limited size
  • they cannot jump backward (and thus cannot loop), so they execute in predictable time
  • they do only static allocation, so they are also limited in memory
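To give a sense of how small such a program can be while staying within those restrictions, here is a minimal sketch of my own (not one of the programs discussed at the conference) that loads an eBPF program at the XDP hook to drop every packet on an interface. It assumes the BCC Python bindings, a kernel and driver with XDP support, root privileges, and a placeholder interface name:

from bcc import BPF

prog = """
#include <uapi/linux/bpf.h>

/* tiny, loop-free and without dynamic allocation: this is the kind of
 * program the verifier will accept */
int xdp_drop_all(struct xdp_md *ctx) {
    return XDP_DROP;
}
"""

b = BPF(text=prog)                         # compile and run through the verifier
fn = b.load_func("xdp_drop_all", BPF.XDP)
b.attach_xdp("eth0", fn, 0)                # "eth0" is a placeholder interface
try:
    input("dropping all packets on eth0, press Enter to detach...")
finally:
    b.remove_xdp("eth0", 0)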

XDP is not a one-size-fits-all solution: netfilter, the TC traffic shaper, and other normal Linux utilities still have their place. There is, however, a clear use case for a solution like XDP in the kernel.

For example, Facebook and Cloudflare have both started testing XDP and, in Facebook's case, deploying XDP in production. Martin Kafai Lau, from Facebook, presented the tool set the company is using to construct a DDoS-resilience solution and a level-4 load balancer (L4LB), which got a ten-times performance improvement over the previous IPVS-based solution. Facebook rolled out its own user-space solution called "Droplet" to detect hostile traffic and deploy blocking rules in the form of eBPF programs loaded in XDP. Lau demonstrated the way Facebook deploys a three-part chained eBPF program: the first part allows debugging and dumping of packets, the second is Droplet itself, which drops undesirable traffic, and the last segment is the load balancer, which mangles the packets to tweak their destination according to internal rules. Droplet can drop DDoS attacks at line rate while keeping the architecture flexible, which were two key design requirements.

Gilberto Bertin, from Cloudflare, presented a similar approach: Cloudflare has a tool that processes sFlow data generated from iptables in order to generate cBPF (classic BPF) mitigation rules that are then deployed on edge routers. Those rules are created with a tool called bpfgen, part of Cloudflare's BSD-licensed bpftools suite. For example, it could create a cBPF bytecode blob that would match DNS queries to any example.com domain with something like:

bpfgen dns *.example.com

Originally, Cloudflare would deploy those rules to plain iptables firewalls with the xt_bpf module, but this led to performance issues. It then deployed a proprietary user-space solution based on Solarflare hardware, but this has the performance limitations of user-space applications — getting packets back onto the wire involves the cost of re-injecting packets back into the kernel. This is why Cloudflare is experimenting with XDP, which was partly developed in response to the company's problems, to deploy those BPF programs.

A concern that Bertin identified was the lack of visibility into dropped packets. Cloudflare currently samples some of the dropped traffic to analyze attacks; this is not currently possible with XDP unless you pass the packets down the stack, which is expensive. Miller agreed that the lack of monitoring for XDP programs is a large issue that needs to be resolved, and suggested creating a way to mark packets for extraction to allow analysis. Cloudflare is currently in a testing phase with XDP and it is unclear if its whole XDP tool chain will be publicly available.

While those two companies are starting to use XDP as-is, there is more work needed to complete the XDP project. As mentioned above and in our previous coverage, massive statistics extraction is still limited in the Linux kernel and introspection is difficult. Furthermore, while the existing actions (XDP_DROP and XDP_TX, see the documentation for more information) are well implemented and used, another action may be introduced, called XDP_REDIRECT, which would allow redirecting packets to different network interfaces. Such an action could also be used to accelerate bridges as packets could be "switched" based on the MAC address table. XDP also requires network driver support, which is currently limited. For example, the Intel drivers still do not support XDP, although that should come pretty soon.

Miller, in his Netdev keynote, focused on XDP and presented it as the standard solution that is safe, fast, and usable. He identified the next steps of XDP development to be the addition of debugging mechanisms, better sampling tools for statistics and analysis, and user-space consistency. Miller foresees a future for XDP similar to the popularization of the Arduino chips: a simple set of tools that anyone, not just developers, can use. He gave the example of an Arduino tutorial that he followed where he could just look up a part number and get easy-to-use instructions on how to program it. Similar components should be available for XDP. For this purpose, the conference saw the creation of a new mailing list called xdp-newbies where people can learn how to create XDP build environments and how to write XDP programs.

In-kernel layer-7 proxying

The third approach that struck me as innovative is the idea of doing layer-7 (application) proxying directly in the kernel. This comes from the idea that, traditionally, we build firewalls to segregate traffic and apply controls, but as most services move to HTTP, those policies become ineffective.

Thomas Graf presented this idea during Netconf using a Star Wars allegory: what if the Death Star were a server with an API? You would have endpoints like /dock or /comms that would allow you to dock a ship or communicate with the Death Star. Those API endpoints should obviously be public, but then there is this /exhaust-port endpoint that should never be publicly available. In order for a firewall to protect such a system, it must be able to inspect traffic at a higher level than the traditional address-port pairs. Graf presented a design where the kernel would create an in-kernel socket that would negotiate TCP connections on behalf of user space and then be able to apply arbitrary eBPF rules in the kernel.

In this scenario, instead of doing the traditional transfer from Netfilter's TPROXY to user space, the kernel directly decapsulates the HTTP traffic and passes it to BPF rules that can make decisions without doing expensive context switches or memory copies in the case of simply wanting to refuse traffic (e.g. issue an HTTP 403 error). This, of course, requires the inclusion of kTLS to process HTTPS connections. HTTP2 support may also prove problematic, as it multiplexes connections and is harder to decapsulate. This design was described as a "pure pre-accept() hook". Starovoitov also compared the design to the kernel connection multiplexer (KCM). Tom Herbert, KCM's author, agreed that it could be extended to support this, but would require some extensions in user space to provide an interface between regular socket-based applications and the KCM layer.

In any case, if the application does TLS (and many do), kTLS gets tricky because it breaks the end-to-end nature of TLS, in effect becoming a man in the middle between the client and the application. Eric Dumazet argued that HAProxy already does things like this: it uses splice() to avoid copying too much data around, but it still does a context switch to hand over processing to user space, something that could be fixed in the general case.

Another similar project that was presented at Netdev is the Tempesta firewall and reverse proxy. The speaker, Alex Krizhanovsky, explained that the Tempesta developers took one person-month to port the mbed TLS stack to the Linux kernel to allow an in-kernel TLS handshake. Tempesta also implements rate limiting, cookies, and JavaScript challenges to mitigate DDoS attacks. The argument behind the project is that "it's easier to move TLS to the kernel than it is to move the TCP/IP stack to user space". Graf explained that he is familiar with Krizhanovsky's work and is hoping to collaborate. In effect, the design Graf is working on would serve as a foundation for Krizhanovsky's in-kernel HTTP server (kHTTP). In a private email, Graf explained that:

The main differences in the implementation are currently that we foresee to use BPF for protocol parsing to avoid having to implement every single application protocol natively in the kernel. Tempesta likely sees this less of an issue as they are probably only targeting HTTP/1.1 and HTTP/2 and to some [extent] JavaScript.

Neither project is really ready for production yet. There didn't seem to be any significant pushback from key network developers against the idea, which surprised some people, so it is likely we will see more and more layer-7 intelligence move into the kernel sooner rather than later.

Conclusion

All of this work aims at replacing a rag-tag bunch of proprietary solutions that recently came up to bypass the Linux kernel TCP/IP stack and improve performance for firewalls, proxies, and other key edge network elements. The idea is that, unless the kernel improves its performance, or at least provides a way to bypass its more complex code paths, people will work around it. With this set of solutions in place, engineers will now be able to use standard APIs to hook high-performance systems into the Linux kernel.

The author would like to thank the Netdev and Netconf organizers for travel assistance, Thomas Graf for a review of the in-kernel proxying section of this article, and Jesper Dangaard Brouer for review of the af_packet and XDP sections.

Note: this article first appeared in the Linux Weekly News.

A report from Netconf: Day 1

Anarcat - mar, 04/11/2017 - 12:00

As is becoming traditional, two times a year the kernel networking community meets in a two-stage conference: an invite-only, informal, two-day plenary session called Netconf, held in Toronto this year, and a more conventional one-track conference open to the public called Netdev. I was invited to cover both conferences this year, given that Netdev was in Montreal (my hometown), and was happy to meet the crew of developers that maintain the network stack of the Linux kernel.

This article covers the first day of the conference which consisted of around 25 Linux developers meeting under the direction of David Miller, the kernel's networking subsystem maintainer. Netconf has no formal sessions; although some people presented slides, interruptions are frequent (indeed, encouraged) and the focus is on hashing out issues that are blocked on the mailing list and getting suggestions, ideas, solutions, and feedback from their peers.

Removing ndo_select_queue()

One of the first discussions that elicited a significant debate was the ndo_select_queue() function, a key component of the Linux polling system that determines when and how to send packets on a network interface (see netdev_pick_tx and friends). The general question was whether the use of ndo_select_queue() in drivers is a good idea. Alexander Duyck explained that Intel people were considering using ndo_select_queue() for receive/transmit queue matching. Intel drivers do not currently use the hook provided by the Linux kernel and it turns out no one is happy with ndo_select_queue(): the heuristics it uses don't really please anyone. The consensus (including from Duyck himself) seemed to be that it should just not be used anymore, or at least not used for that specific purpose.

The discussion turned toward the wireless network stack, which uses it extensively, but for other purposes. Johannes Berg explained that the wireless stack uses ndo_select_queue() for traffic classification, for example to get voice traffic through even if the best-effort queue is backed up. The wireless stack could stop using it by doing flow control completely inside the wireless stack, which already uses the fq_codel flow-control mechanism for other purposes, so porting away from ndo_select_queue() seems possible there.

The problem then becomes how to update all the drivers to change that behavior, which would be a lot of work. Still, it seems people are moving away from a generic ndo_select_queue() interface to stack-specific or even driver-specific (in the case of Intel) queue management interfaces.

refcount_t followup

There was a followup discussion on the integration of the refcount_t type into the network stack, which we covered recently. This type is meant to be an in-kernel defense against exploits based on overflowing or underflowing an object's reference count.

The consensus seems to be that having refcount_t used for debugging is acceptable, but it cannot be enabled by default. An issue that was identified is that the networking developers are fairly sure that introducing refcount_t would have a severe impact on performance, but they do not have benchmarks to prove it, something Miller identified as a problem that needs to be worked on. Miller then expressed some openness to the idea of having it as a kernel configuration option.

A similar discussion happened, on the second day, regarding the KASan memory error detector, which was covered when it was introduced in 2014. Eric Dumazet warned that there could be a lot of issues that cannot be detected by KASan because of the way the network stack often bypasses regular memory-allocation routines for performance reasons. He also noted that this can sometimes mean the stack may go over the regular 10% memory limit (the tcp_mem parameter, described in the tcp(7) man page) for certain operations, especially when rebuilding out-of-order packets with lots of parallel TCP connections.

It was therefore proposed that these special memory-recycling tricks could be optionally disabled, at run time or compile time, to allow proper memory tracking. Dumazet argued this was a situation similar to refcount_t: we need a way to turn off the high-performance tricks to make the network stack easier to debug with KASan.

The problem with optional parameters is that they are often disabled in production or even by default, which, in turn, means that critical bugs cannot actually be found because the code paths are not tested. When I asked Dumazet about this, he explained that Google performs integration testing of new kernels before putting them in production, and those toggles could be enabled there to find and fix those bugs. But he agreed that certain code paths are then not tested until the code gets deployed in production.

So it seems the status quo remains: security folks want to improve the reliability of the kernel, but the network folks can't afford the performance cost. Yet it was clear in the discussions that the team cares about security issues and wants them fixed; the impact of some of the solutions is just too big.

Lightweight wireless management packet access

Berg explained that some users need to have high-performance access to certain management frames in the wireless stack and wondered how to best expose those to user space. The wireless stack already allows users to clone a network interface in "monitor" mode, but this has a big performance cost, as the radiotap header needs to be constructed from scratch and the packet header needs to be copied. As wireless improves and the bandwidth rises to gigabit levels, this can become a significant bottleneck for packet sniffers or reporting software that need to know precisely what's going on over the air outside of the regular access point client operation.

It seems the proper way to do this is with an eBPF program. As Miller summarized, just add another API call that allows loading a BPF program into the kernel and then those users can use a BPF filtering point to get the statistics they need. This will require an extra hook in the wireless stack, but it seems like this is the way that will be taken to implement this feature.

VLAN 0 inconsistencies

Hannes Frederic Sowa brought up the seemingly innocuous question of "how do we handle VLAN 0?" In theory, VLAN 0 means "no VLAN". But the Linux kernel currently handles this differently depending on whether the VLAN module is loaded and whether a VLAN 0 interface was created. Sometimes the VLAN tag is stripped, sometimes not.

It turns out the semantics were accidentally changed the last time this code was touched: the behavior was originally consistent but is now broken. Sowa therefore got the go-ahead to fix this and make the behavior consistent again.

Loopy fun

Then came the turn of Jamal Hadi Salim, the maintainer of the kernel's traffic-control (tc) subsystem. The first issue he brought up is a problem in the tc REDIRECT action that can create infinite loops within the kernel. The problem can be easily alleviated when loops are created on the same interface: checks can be added that just drop packets coming from the same device and rate-limit logging to avoid a denial-of-service (DoS) condition.

The more serious problem occurs when a packet is forwarded from (say) interface eth0 to eth1 which then promptly redirects it from eth1 back to eth0. Obviously, this kind of problem can only be created by a user with root access so, at first glance, those issues don't seem that serious: admins can shoot themselves in the foot, so what?

But things become a little more serious when you consider the container case, where an untrusted user has root access inside a container and should have constrained resource limitations. Such a loop could allow this user to deploy an effective DoS attack against a whole group of containers running on the same machine. Even worse, this endless loop could possibly turn into a deadlock in certain scenarios, as the kernel could try to transmit the packet on the same device it originated from and block, progressively filling the queues and eventually completely breaking network access. Florian Westphal argued that a container can already create DoS conditions, for example by doing a ping flood.

According to Salim, this whole problem was created when two bits used for tracking such packets were reclaimed from the skb structure used to represent packets in the kernel. Those bits were a simple TTL (time to live) field that was incremented on each loop and dropped after a pre-determined limit was reached, breaking infinite loops. Salim asked everyone if this should be fixed or if we should just forget about this issue and move on.

Miller proposed to keep a one-behind state for the packet, fixing the simplest case (two interfaces). The general case, however, would require a bitmap of all the interfaces to be scanned, which would impose a large overhead. Miller said an attempt to fix this should somehow be made. The root of the problem is that the network maintainers are trying to reduce the size of the skb structure, because it's used in many critical paths of the network stack. Salim's position is that, without the TTL field, there is no way to fix the general case here, and this constitutes a security issue. So either the bits need to be brought back, or we need to live with the inherent DoS threat.

Dumping large statistics sets

Another issue Salim brought up was the question of how to export large statistics sets from the kernel. It turns out that some use cases may end up dumping a lot of data. Salim mentioned a real-world tc use case that calls for reading six million entries. The current netlink-based API provides a way to get only 20 entries at a time, which means it takes forever to dump the state of all those policy actions. Salim has a patch that changes the dump size to be eight times the NLMSG_GOOD_SIZE, which already improves performance by an order of magnitude, although there are issues with checking the user-space buffer size there.

But a more complete solution is needed. What Salim proposed was a way to ask only for the states that changed since the last dump was requested. He has a patch to add a last_access field to the netlink_callback structure used by netlink_dump() to output data; that raised the question of how to actually use that field. Since Salim fetches that data every five seconds, he figured he could just tell the kernel to return all the nodes that changed in that period. But then if the dump takes more than five seconds to complete, the next dump may be missing states that changed during the extra delay. An alternative mechanism would be for the user-space utility to keep the time stamp it requested and use that as a delta for the next dump.

It turns out this is a larger problem than just tc. Dumazet mentioned this was an issue with fq_codel classes: he would even like to be able to dump those statistics faster than every five seconds. Roopa Prabhu mentioned that Cumulus also has similar problems dumping stats from bridges, so clearly a more generic solution is needed here. There is, however, a fundamental problem with dumping large statistics sets from the kernel: those statistics are constantly changing while the dump is created and unless versioning or locking mechanisms are used — which would slow things down — the data returned is bound to be only an approximation of reality. Salim promised to send a set of RFC patches to further discussions regarding this issue, but during the following Netdev conference, Berg published a patch to fix this ten-year-old issue, which brought cheers from the audience.

The author would like to thank the Netconf and Netdev organizers for travel to, and hosting assistance in, Toronto. Many thanks to Berg, Dumazet, Salim, and Sowa for their time taken for a technical review of this article.

Note: this article first appeared in the Linux Weekly News.

A report from Netconf: Day 2

Anarcat - mar, 04/11/2017 - 12:00

This article covers the second day of the informal Netconf discussions, held on April 4, 2017. Topics discussed this day included the binding of sockets in VRF, identification of eBPF programs, inconsistencies between IPv4 and IPv6, changes to data-center hardware, and more. (See this article for coverage from the first day of discussions).

How to bind to specific sockets in VRF

One of the first presentations was from David Ahern of Cumulus, who presented a few interesting questions for the audience. His first was the problem of binding sockets to a given interface. Right now, there are four different ways this can be done:

  • the old SO_BINDTODEVICE generic socket option (see socket(7))
  • the IP_PKTINFO IP-specific socket option (see ip(7)), introduced in Linux 2.2
  • the IP_UNICAST_IF flag, introduced in Linux 3.3 for WINE
  • the IPv6 scope ID suffix, part of the IPv6 addressing standard

So there's a problem of having too many ways of doing the same thing, something that cannot really be fixed without breaking ABI compatibility. Even worse, conflicts between those options are not reported by the kernel, so it's possible for a user to set up socket flags in a way where some flags silently override others, with no checks made and no errors reported. It was agreed that the user should at least get some notification of conflicting changes here.
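To make this more concrete, here is what the oldest of those four mechanisms looks like from Python; this is just a sketch of my own with a placeholder interface name, and note that SO_BINDTODEVICE traditionally requires CAP_NET_RAW, which is part of why the alternatives were added in the first place:

import socket

SO_BINDTODEVICE = getattr(socket, "SO_BINDTODEVICE", 25)   # 25 on Linux

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# bind all traffic on this socket to one device; the kernel will not
# complain if another binding option later contradicts this one
s.setsockopt(socket.SOL_SOCKET, SO_BINDTODEVICE, b"eth0")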

Furthermore, binding sockets to a specific VRF (Virtual Routing and Forwarding) device is not currently possible, so Ahern asked what the best way to do this would be, considering the many options available. A use case example is a UDP multicast socket that could be bound to a specific interface within a VRF.

This is an old problem: Tom Herbert explained that there were previous discussions about making the bind() system call more programmable so that, for example, you could bind() a UDP socket to a discrete list of IP addresses or a subnet. So he identified this issue as a broader problem that should be addressed by making the interfaces more generic.

Ahern explained that it is currently possible to bind sockets to the slave device of a VRF even though that should not be allowed. He also raised the question of how the kernel should tell which socket should be selected for incoming packets. Right now, there is a scoring mechanism for UDP sockets, but that cannot be used directly in this more general case.

David Miller said that there are already different ways of specifying scope: there is the VRF layer and the namespace ("netns") layer. A long time ago, Miller reluctantly accepted the addition of netns keys everywhere, swallowing the performance cost to gain flexibility. He argued that a new key should not be added and instead existing infrastructure should be reused. Herbert argued this was exactly the reason why this should be simplified: "if we don't answer the question, people will keep on trying this". For example, one can use a VRF to limit listening addresses, but it gets complicated if we need a device for every address. It seems the consensus evolved towards using IP_UNICAST_IF, added back in 2012, which is accessible to non-root users. It is currently limited to UDP and RAW sockets, but it could be extended for TCP.
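For comparison, binding the outgoing interface with IP_UNICAST_IF looks something like the following sketch; the constant is not necessarily exposed by the Python socket module, so its Linux value is hardcoded here, the interface name is a placeholder, and the index must be passed in network byte order, a quirk inherited from the option's WINE origins:

import socket
import struct

IP_UNICAST_IF = getattr(socket, "IP_UNICAST_IF", 50)   # 50 in linux/in.h

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
ifindex = socket.if_nametoindex("eth0")                # placeholder interface
# unlike most socket options, the index is expected in network byte order
s.setsockopt(socket.IPPROTO_IP, IP_UNICAST_IF, struct.pack("!I", ifindex))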

XDP and eBPF program identification

Ahern then turned to the problem of extracting BPF programs from the kernel. He gave the example of a simple cBPF (classic BPF) filter that checks for ARP packets. If the filter is read back from the kernel, the user gets a blob of binary data, which is hard to interpret. There is a kernel verifier that can show C-like output, but that is also difficult to interpret. Ahern then added annotations to his slide that showed what the original program actually does, which was a good demonstration of why such a feature is needed.

Ahern explained that, at least for cBPF, it should be possible to recover the original plaintext, or at least something close to the original program. A first step would be to replace known constants (like 0x806 for ARP). Even with eBPF, it should be possible to improve the output. Alexei Starovoitov, the BPF maintainer, explained that it might make sense to start by returning information about the maps used by an eBPF program. Then more complex data structures could be inspected once we know their type.
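To make the problem concrete, here is a sketch of my own (not Ahern's actual example) of what such an ARP filter looks like when it is handed to the kernel through the classic SO_ATTACH_FILTER interface: four (code, jt, jf, k) tuples, with the 0x806 EtherType buried in the second one. Reading the filter back essentially returns those same opaque numbers, which is exactly the interpretation problem being discussed:

import ctypes
import socket
import struct

SO_ATTACH_FILTER = getattr(socket, "SO_ATTACH_FILTER", 26)   # 26 on Linux
ETH_P_ALL = 0x0003

# classic BPF program matching ARP, equivalent to the output of "tcpdump -d arp":
#   ldh [12]; jeq #0x806, accept, drop
insns = [
    (0x28, 0, 0, 0x0000000c),   # ldh [12]      load the EtherType field
    (0x15, 0, 1, 0x00000806),   # jeq #0x806    ARP? fall through, else skip to drop
    (0x06, 0, 0, 0x00040000),   # ret #262144   accept the packet
    (0x06, 0, 0, 0x00000000),   # ret #0        drop the packet
]
filters = b"".join(struct.pack("HBBI", *i) for i in insns)
buf = ctypes.create_string_buffer(filters)
fprog = struct.pack("HL", len(insns), ctypes.addressof(buf))   # struct sock_fprog

# attaching a filter to a raw packet socket requires CAP_NET_RAW
s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
s.setsockopt(socket.SOL_SOCKET, SO_ATTACH_FILTER, fprog)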

The first priority is to get simple debugging tools working but, in the long term, the goal is a full decompiler that can reconstruct instructions into a human-readable program. The question that remains is how to return this data. Ahern explained that right now the bpf() system call copies the data to a different file descriptor, but it could just fill in a buffer. Starovoitov argued for a file descriptor; that would allow the kernel to stream everything through the same descriptor instead of having many attach points. Netlink cannot be used for this because of its asynchronous nature.

A similar issue regarding the way we identify express data path (XDP) programs (which are also written in BPF) was raised by Daniel Borkmann from Covalent. Miller explained that users will want ways to figure out which XDP program was installed, so XDP needs an introspection mechanism. We currently have SHA-1 identifiers that can be internally used to tell which binary is currently loaded but those are not exposed to user space. Starovoitov mentioned it is now just a boolean that shows if a program is loaded or not.

A use case for this, on top of just trying to figure out which BPF program is loaded, is to actually fetch the source code of a BPF program that was deployed in the field for which the source was lost. It is still uncertain that it will be possible to extract an exact copy that could then be recompiled into the same program. Starovoitov added that he needed this in production to do proper reporting.

IPv4/IPv6 equivalency

The last issue — or set of issues — that Ahern brought up was the question of inconsistencies between IPv4 and IPv6. It turns out that, because both protocols were (naturally) implemented separately, there are inconsistencies in how they are handled in the Linux kernel, which affect, among other things, the VRF framework. The first example he gave was the fact that IPv6 addresses added on the loopback interface generate unreachable routes in the main routing table, yet this doesn't happen with IPv4 addresses. Hannes Frederic Sowa explained this was part of the IPv6 specification: there are stronger restrictions on loopback interfaces in IPv6 than IPv4. Ahern explained that VRF loopback interfaces do not implement these restrictions and wanted to know if this was a problem.

Another issue is that anycast routes are added to the wrong interface. This is apparently not specific to VRF: this was done "just because Java", and has been there from day one. It seems that the Java Virtual Machine builds its own routing table and assumes this behavior, so changing this would break every JVM out there, which is obviously not acceptable.

Finally, Martin Kafai Lau asked if work should be done to merge the IPv4 and IPv6 FIB (forwarding information base) trees. The FIB tree is the data structure that represents routing tables in the Linux kernel. Miller explained that the two trees are not semantically equivalent: while IPv6 does source-address lookup and routing, IPv4 does not. We can't remove the source lookups from IPv6, because "people probably use that". According to Alexander Duyck, adding source tables to IPv4 would degrade performance to the level of IPv6 performance, which was jokingly referred to as an incentive to switch to IPv6.

More seriously, Sowa argued that using the same compressed tree that IPv4 uses in IPv6 could make sense. People may want to have source routing in IPv4 as well. Miller argued that the kernel is optimized for 32-bit addresses in IPv4, and conceded that it could be scaled to 64-bit subnets, but 128-bit addresses would be much harder. Sowa suggested that they could be limited to 64 bits, as global routes that are announced over BGP usually have such a limit, and more specific routes are usually at discrete prefixes like /65, /127 (for interconnect links) or /128 (for point-to-point links). He expressed concerns over the reliability of such an implementation so, at this point, it is unlikely that the data structures could be merged. What is more likely is that the code path could be merged and simplified, while keeping the data structures separate.

Module option substitutions

The next issue was raised by Jiří Pírko, who asked how to pass configuration options to a driver before the driver is initialized. Some chips require certain settings to be sent before the firmware is loaded, which leads to a weird situation where there is a need to address a device before it's actually recognized by the kernel. The question can thus be summarized as: how do you pass information to a device that doesn't exist yet?

The answer seems to be that devlink could do this, as it has access to the full device tree and, therefore, to devices that can be addressed by (say) PCI identifiers. Then a possible devlink command could look something like:

devlink dev pci/0000:03:00.0 option set foo bar

This idea raised a bunch of extra questions: some devices don't have a one-to-one mapping with the PCI bridge identifiers, for example, meaning that those identifiers cannot be used to access such devices. Another issue is that you may want to send multiple settings in a single transaction, which doesn't fit well in the devlink model. Miller then proposed to let the driver initialize itself to some state and wait for configuration to be sent when necessary. Another way would be to unregister the driver and re-register with the given configuration. Shrijeet Mukherjee explained that right now, Cumulus is doing this using horrible startup script magic by retrying and re-registering, but it would be nice to have a more standard way to do this.

Control over UAPI patches

Another issue that came up was the problem of changes in the user-space API (UAPI) which break backward compatibility. Pírko said that "we have to be more careful about those changes". The problem is that reviewers are not always available to make detailed reviews of such changes and may not notice API-breaking changes. Pírko proposed creating a bot to check if a given patch introduces UAPI changes, changes in structs, or in netlink enums. Miller said he could block merges until discussions happen and that patchwork, which Miller uses to process patches from the mailing list, does some of this. He also pointed out there aren't enough test cases in the first place.

Starovoitov argued UAPI isn't special, there are other ways of breaking backward compatibility. He expressed concerns that such a bot could create a false sense that everything is fine while a patch could break compatibility and not be detected. Miller countered that UAPI is special in that "we're stuck with it forever". He then went on to propose that, since there's a maintainer (or more) for each module, he can make sure that each maintainer explicitly approves changes to those modules.

Data-center hardware changes

Starovoitov brought up the issue of a new type of hardware that is currently being deployed in data centers called a "multi-host NIC" (network interface card). It's a single NIC that is connected to multiple servers. Facebook, for example, uses this in its Yosemite platform that shoves twelve servers into a 2U rack mount, in three modules. Each module is made of four servers connected to the traditional switch fabric with a single NIC through PCI-Express. Mellanox and Broadcom also have similar devices.

One question is how to manage those devices. Since they are connected through a PCI-Express bus, Linux will see them as a NIC, yet they are also a little like switches, in that they interconnect multiple servers. Furthermore, the kernel security model assumes that a NIC is trusted, and gladly opens its own memory to NICs through DMA; this can become a huge security issue when the NIC is under the control of another server. This can especially become problematic if we consider that there could be TLS hardware offloading in the future with the introduction of in-kernel TLS stacks.

The other problem is the question of reliability: since those devices are currently "dumb", they need to be managed just like a regular NIC. If the host managing the card crashes, it could disable a whole set of servers that rely on the same NIC. There could be an election process among the servers, but that complicates significantly what used to be a simple PCI connection.

Mukherjee pointed out that the model Cisco uses for this is that the "smart NIC" is a "slave" of the main switch fabric. It's a daughter card, which makes it easier to manage from a network perspective. It is clear that Linux will need a way to represent those devices, probably through the newly introduced switchdev or DSA (distributed switch architecture), but it will be something to keep an eye on as density increases in the data center.

There were many more discussions during Netconf, too many to cover here, but in the end, Miller thanked everyone for all the interesting topics as the participants dispersed for a day off to travel to Montreal to attend the following Netdev conference.

The author would like to thank the Netconf and Netdev organizers for travel to, and hosting assistance in, Toronto. Many thanks to Alexei Starovoitov for his time taken for a technical review of this article.

Note: this article first appeared in the Linux Weekly News.

Contribute your skills to Debian in Montreal, April 14 2017

Anarcat - dim, 04/09/2017 - 10:06

Join us in Montreal, on April 14 2017, and we will find a way in which you can help Debian with your current set of skills! You might even learn one or two things in passing (but you don't have to).

Debian is a free operating system for your computer. An operating system is the set of basic programs and utilities that make your computer run. Debian comes with tens of thousands of packages, precompiled software bundled up for easy installation on your machine. A number of other operating systems, such as Ubuntu and Tails, are based on Debian.

The upcoming version of Debian, called Stretch, will be released later this year. We need you to help us make it awesome!

Whether you're a computer user, a graphics designer, or a bug triager, there are many ways you can contribute to this effort. We also welcome experience in consensus decision-making, anti-harassment teams, and package maintenance. No effort is too small and whatever you bring to this community will be appreciated.

Here's what we will be doing:

  • We will triage bug reports that are blocking the release of the upcoming version of Debian.

  • Debian package maintainers will fix some of these bugs.

Goals and principles

This is a work in progress, and a statement of intent. Not everything is organized and confirmed yet.

We want to bring together a heterogeneous group of people. This goal will guide our handling of sponsorship requests, and will help us make decisions if more people want to attend than we can welcome properly. In other words: if you're part of a group that is currently under-represented in computer communities, we would like you to be able to attend.

We are committed to providing a friendly, safe and welcoming environment for all, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, religion, nationality, or other similar personal characteristic. Attending this event requires reading and respecting the Debian Code of Conduct, which sets the standards of behaviour for the whole event, including communication (public and private) before, during, and after.

The space where this event will take place is unfortunately not accessible to wheelchairs. Food (including vegetarian options) should be provided for lunch. If you have any specific needs regarding food, please let us know when registering, and we will do our best.

What we will be doing

This will be an informal session to confirm and fix bugs in Debian. If you have never worked with Debian packages, this is a good opportunity to learn about packaging and bugtracker usage.

Bugs flagged as Release Critical are blocking the release of the upcoming version of Debian. To fix them, it helps to make sure the bug report documents the up-to-date status of the bug, and of its resolution. One does not need to be a programmer to do this work! For example, you can try and reproduce bugs in software you use... or in software you will discover. This helps package maintainers better focus their work.

We will also try to actually fix bugs by testing patches and uploading fixes into Debian itself. Antoine Beaupré, a seasoned Debian developer, will be available to sponsor uploads and teach people about basic Debian packaging skills.

Where? When? How to register?

See https://wiki.debian.org/BSP/2017/04/ca/Montreal for the exact address and time.

My free software activities, February and March 2017

Anarcat - sam, 04/01/2017 - 17:51
Looking into self-financing

Before I begin, I should mention that I started tracking my time working on free software more systematically. I spend a lot of time on the computer, as regular readers of this blog might remember, so I wanted to know exactly how much of that time was paid work versus free work. I was already using org-mode's time clock system to keep track of my work hours, so I just extended this to my regular free software contributions, which also helps in writing those reports.

It turns out that over 60% of my computer time is spent working on free software. That's huge! I was expecting something more in the range of 20 to 40% of my time. So I started thinking about ways of financing this work. I created a Patreon page but I'm hesitant to launch such a campaign: the only thing worse than "no patreon page" is "a patreon page with failed goals and no one financing it". So before starting such an effort, I'd like to get a feeling for what other people's experiences with it have been. I know that joeyh is close to achieving his goals, but I can't compare with the guy that invented git-annex or debhelper, so I'm concerned I wouldn't be able to raise the same level of funding.

So any advice you have, feel free to contact me in private or in the comments. If you would be ready to fund my work, I'd love to know about it, obviously, but I guess I wouldn't get real numbers until I actually open up such a page...

Now, onto the regular report.

Wallabako

I spent a good chunk of time completing most of the things I had in mind for Wallabako, which I mentioned quickly in the previous report. Wallabako is now much easier to install, with clearer instructions, an easier-to-use configuration file, more reliable synchronization, and read-status propagation. As usual, the Wallabako README file has all the details.

I've also looked at better integration with Koreader, the free software e-book reader that forms the basis of the okreader free software distribution, which has been able to port Debian to the Kobo e-readers, a project I am really excited about. This project has the potential of supporting Kobo readers beyond the lifetime upstream grants them, and it removes a lot of proprietary software and spyware that ships with the Kobo readers. So I have made a few contributions to okreader and also to koreader, the ebook reader okreader is based on.

Stressant

I rewrote stressant, my simple burn-in and stress-testing tool. After struggling in turn with Debirf, live-build, vmdebootstrap and even FAI, I figured maybe it wasn't the best idea to try and reinvent that particular wheel: instead of building yet another Debian system build tool, maybe I should just reuse what's already there.

It turns out there's a well-known, successful and fairly complete recovery system called Grml. It is a Debian derivative, so all I needed to do was to stop procrastinating and actually write the stressant tool itself instead of just creating a distribution with a bunch of random tools shipped in. This allowed me to focus on which tools were best suited to stress-test different components. This selection ended up being:

fio can also be used to overwrite disk drives with the proper options (--overwrite and --size=100%), although grml also ships with nwipe for wiping old spinning disks and hdparm to do a secure erase of SSD disks (whatever that's worth).

Stressant still needs to be shipped with grml for this transition to be complete. In the meantime, I was able to configure the excellent public Gitlab CI service to provide ISO images with Stressant built in as a stopgap measure. I also need to figure out a way to automatically start stressant from a boot menu in order to automate deployments on a larger scale although, because I have little need for the feature at this moment, that will likely wait for a sponsor to show up.

Still, stressant has useful features like the capability of sending logs by email using a fresh new implementation of the Python SMTPHandler (BufferedSMTPHandler) which waits for logging to complete before sending a single email. Another interesting piece of code in there is the NegateAction argparse handler that enables the use of "toggle flags" (e.g. --flag / --no-flag). I'm so happy with the code that I figure I could just share it here directly:

class NegateAction(argparse.Action):
    '''add a toggle flag to argparse

    this is similar to 'store_true' or 'store_false', but allows
    arguments prefixed with --no to disable the default. the default
    is set depending on the first argument - if it starts with the
    negative form (define by default as '--no'), the default is False,
    otherwise True.
    '''
    negative = '--no'

    def __init__(self, option_strings, *args, **kwargs):
        '''set default depending on the first argument'''
        default = not option_strings[0].startswith(self.negative)
        super(NegateAction, self).__init__(option_strings, *args,
                                           default=default, nargs=0, **kwargs)

    def __call__(self, parser, ns, values, option):
        '''set the truth value depending on whether
        it starts with the negative form'''
        setattr(ns, self.dest, not option.startswith(self.negative))

Short and sweet. I wonder why stuff like this is not in the standard library yet - maybe just because no one bothered yet? It'd be great to get feedback from more experienced Pythonistas on this one.
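For the record, hooking it into a parser looks something like this (the --color flag is just an example, and the class above needs to be in scope):

import argparse

parser = argparse.ArgumentParser()
# register both spellings; since '--color' is listed first, the default is True
parser.add_argument("--color", "--no-color", dest="color", action=NegateAction)
print(parser.parse_args([]).color)               # True
print(parser.parse_args(["--no-color"]).color)   # False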

I hope that my work on Stressant is complete. I get zero funding for this work, and have little use for it myself: I manage only a few machines and such a tool really shines when you regularly put new hardware online, which is (fortunately?) not my case anymore. I'd be happy, of course, to accompany organisations and people that wish to further develop and use such a tool.

A short demo of stressant as well as detailed description of how it works is of course available in its README file.

Standard third party repositories

After looking at improvements for the grml repository instructions, I realized there was no real "best practices" document on how to configure an Apt repository. Sure, there are tools like reprepro and others, but those hardly qualify as policy: they are very flexible and there are lots of ways to create insecure repositories or curl | sh style instructions, which we of course generally want to avoid.

While the larger problem of untrusted Debian packages remains generally unsolved (e.g. when you install any .deb file, it can get root on your system), it seemed to me one critical part of this problem was how to add a random third-party repository to your machine while limiting, as much as possible, what possible attackers could do with such a repository. In other words, to solve the more general problem of insecure .deb files, we also need to solve the distribution problem, otherwise fixing the .deb files themselves will be useless.

This led to the creation of standardized repository instructions that define:

  1. how to distribute the repository's public signing key (i.e. over HTTPS)
  2. how to name suites and components (e.g. use stable and main unless you have a good reason, and explain yourself)
  3. recommend a healthy dose of apt preferences pinning
  4. how to distribute keys (e.g. with a deriv-archive-keyring package)

I've seen so many third party repositories get this wrong. For example, a lot of repositories recommend this type of command to initialize the OpenPGP trust path:

curl http://example.com/key.asc | apt-key add -

This has the following problems:

  • the key is transferred in plaintext and can easily be manipulated by an active attacker (e.g. a router on your path to the server or a neighbor in a Wifi cafe)
  • the key is added to the main trust root, which allows the key to authenticate as the real Debian archive, therefore giving it all rights over all packages
  • since it's part of the global archive, it's difficult for a package to remove/add the key when a key rollover is necessary (and repositories generally don't provide a deriv-archive-keyring to do that process anyway)

One example of this is the Docker install instructions, which at least manage to do this over HTTPS. Some other repositories don't even bother teaching people about the proper way of adding those keys. We settled for:

wget -O /usr/share/keyrings/deriv-archive-keyring.gpg https://deriv.example.net/debian/deriv-archive-keyring.gpg

That location was deliberately chosen to be outside the main trust directory, so that it needs to be explicitly added to the sources.list as well:

deb [signed-by=/usr/share/keyrings/deriv-archive-keyring.gpg] https://deriv.example.net/debian/ stable main

Similarly, we highly recommend users setup "apt pinning" to restrict what a given repository can do. Since pinning is so confusing, most people don't actually bother even configuring it and I have yet to see a single repo advise its users to configure those preferences, which are essential to limit what a repository can do. To keep configuration simple, we recommend this:

Package: *
Pin: origin deriv.example.net
Pin-Priority: 100

Obviously, for a single-package repository, the actual package name should be listed, e.g.:

Package: foo
Pin: origin deriv.example.net
Pin-Priority: 100

And the priority should probably be set to 1 unless you want to allow automatic upgrades.

It is my hope that this design will get more traction in the years to come and become a de-facto standard that will be a key part in safely adding third party repositories. There is obviously much more work to be done to improve security when installing untrusted .deb files, and I encourage Debian developers to consider contributing to the UntrustedDebs discussions and particularly to the Teams/Dpkg/Spec/DeclarativePackaging work.

Signal R&D

I spent a significant amount of time this month struggling with the Signal project on my phone. I'm still ambivalent on Signal: it's a centralized design, too dependent on phone numbers, but I must admit they get a lot of things right and it's the only free-software platform that allows for easy-to-use, multi-platform videoconferencing that my family can use.

I've been following Signal for a while: up until now, I had been using the LibreSignal rebuild of the official client, as it is distributed on a F-Droid repository. Because I try to avoid Google (proprietary) software on my phone, it's basically the only way I could even install Signal. Unfortunately, the repository is out of date and introduces another point of trust in the distribution model: now you not only need to trust the Signal authors to do the right thing, you also need to trust that F-Droid repo not to inject nasty code on your phone. I've therefore started a discussion about how Signal could be distributed outside of the Google Play Store. I'd like to think it's one of the things that led the Signal people to distribute an official copy of Signal outside of the playstore.

After much struggling, I was able to upgrade to this official client and will be able to upgrade easily by just downloading the APK. (Do note that I ended up reinstalling and re-registering Signal, which unfortunately changed my secret keys.) I do hope Signal enters F-Droid one day, but it could take a while because it still doesn't work without Google services and barely works with MicroG, the free software alternative to the Google services clients. Moxie also set a list of requirements like crash reporting and statistics that need to be implemented on F-Droid's side before he agrees to the deployment, so this could take a while.

I've also participated in the, ahem, discussion on the JWZ blog regarding a supposed vulnerability in Signal where it would leak previously unknown phone numbers to third parties. I reviewed the way phone numbers are uploaded and, while it's possible to create a rainbow table of phone numbers (which are hashed with a truncated SHA-1 checksum), I couldn't verify the claims of other participants in the thread. For me, Signal still does the right thing with contacts, although I do question the way "read status" notifications get transmitted, but that belongs in another bug report / blog post.
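To give a rough idea of why such a rainbow table is feasible, here is a back-of-the-envelope sketch (the phone number is made up, and the exact formatting and truncation length Signal uses are not reproduced here):

printf '%s' '+15145550123' | sha1sum | cut -c1-20

Since there are only on the order of 10^10 plausible numbers in a given country's numbering plan, enumerating and hashing all of them is well within reach of commodity hardware, truncated digest or not.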

Debian Long Term Support (LTS)

It's been more than a year now that I have been working on Debian LTS, a project started by Raphael Hertzog at Freexian. I didn't work much in February so I had a lot of hours to catch up on, and was unfortunately unable to do so, partly because I was busy with other projects, and partly because my colleagues are doing a great job at resolving the most important issues.

So one of my concerns this month was finding work. It seemed that all the hard packages were either taken (e.g. my usual favorites, tiff and imagemagick, were done by others) or just too challenging (e.g. I don't feel quite comfortable tackling the LTS branch of the Linux kernel yet).

I spent quite a bit of time trying to figure out what was wrong with pcre3, only to realise the "32" in the report was not about the architecture, but about the character width. Because of this, I marked 4 CVEs (CVE-2017-7186, CVE-2017-7244, CVE-2017-7245, CVE-2017-7246) as "not-affected", since the 32-bit character support wasn't enabled in wheezy (or jessie, for that matter). I still spent some time trying to reproduce the issues, which require a compiler with AddressSanitizer support, something that was introduced in both Clang and GCC after wheezy was released, which makes reproducing this fairly complicated...
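For reference, reproducing such reports generally means rebuilding the proof of concept with AddressSanitizer enabled, along these lines (poc.c standing in for whichever reproducer the report provides):

gcc -fsanitize=address -g -o poc poc.c

and wheezy's GCC simply predates the -fsanitize=address option, hence the difficulty.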

This allowed me to experiment more with Vagrant, however, and I have provided the Debian cloud team with a 32-bit Vagrant box that was merged in shortly after, although it doesn't show up yet in the official list of Debian images.

Then I looked at the apparmor situation (CVE-2017-6507, Debian bug #858768). That was one tricky bug as well, since it's not a security issue in apparmor per se, but more an issue with things that assume a certain behavior from apparmor. I have concluded that wheezy was not affected because there are no assumptions of proper isolation there - which are provided only starting from LXC 1.0 - and Docker is not in wheezy. I also couldn't reproduce the issue on jessie, but, as it turns out, the issue was sysvinit-specific, which is why I couldn't reproduce it under the default systemd configuration shipped with jessie.

I also looked at the various binutils security issues: as I reported on the mailing list, I didn't see anything serious enough in there to warrant a security release and followed the lead of both the stable and Red Hat security teams by marking this "no-dsa". I similarly reviewed the mp3splt security issues (specifically CVE-2017-5666) and was fairly puzzled by that one, which seems to be triggered only by the same address sanitization extensions as PCRE, although there was some pretty wild interplay with debugging flags in there. All in all, it seems we can't reproduce that issue in wheezy, but I do not feel confident enough in the results to push the issue aside for now.

I finally uploaded the pending graphicsmagick issue (DLA-547-2), a regression update to fix a crash that was introduced in the previous release (DLA-547-1, mistakenly named DLA-574-1). Hopefully that release should clear up some of the confusion and fix the regression.

I also released DLA-879-1 for CVE-2017-6369 in firebird2.5, which was an interesting experiment: I couldn't reproduce the issue in a local VM, even after following the Ubuntu setup tutorial, as I wasn't too familiar with the Firebird database until now (hint: the default username and password is sysdba/masterkey). I ended up assuming we were vulnerable and simply backported the patch after seeing the jessie folks push out a release just in case.

I also looked at updating the ca-certificates package to deal with the pending WoSign/Startcom removal: I made an explicit list of the CAs that need to be removed after reviewing the Mozilla list. I also sent a patch for an unrelated issue where ca-certificates is writing to /usr/local (!!) in Debian bug #843722.

I have also done some "meta" work by starting a discussion about fixing the missing DLA links in the tracker, as you will notice all of the above links lead to nowhere. Thanks to pabs, there are now some links, but unfortunately there are still about 500 DLAs missing from the website. We also discussed ways to automate adding those DLAs to the website (Debian bug #859123), which is currently a manual process. This is now in the hands of the excellent webmaster team.

I have also filed a few missing security bugs (Debian bug #859135, Debian bug #859136), partly because I wanted to help the security team. It turned out that the script needed some improvements, so I submitted a patch to make it easier to run.

Other projects

As usual, there's a mixed bag of chaos:

More stuff on Github...

