
Montréal-Python 68: Wysiwyg Xylophone

Montreal Python - Tue, 11/14/2017 - 00:00

Please RSVP on our meetup event

When

November 20th at 6:00PM

Where

Google Montréal
1253 McGill College #150
Montréal, QC

We thank Google Montreal for sponsoring MP68

Schedule
  • 6:00PM - Doors open
  • 6:30PM - Talks
  • 7:30PM - Break
  • 7:45PM - Talk
  • 8:30-9:00PM - End of event
Presentations

Va debugger ton Python! - Stéphane Wirtel

This presentation covers the basics of Pdb as well as GDB, so you can debug your Python scripts more easily.
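As a quick illustration of what Pdb does (a minimal sketch for this announcement, not material from the talk), you can drop into the debugger from anywhere in a script:

    import pdb

    def average(values):
        total = sum(values)
        pdb.set_trace()  # pauses here: inspect 'total' and 'values', step, continue
        return total / len(values)

    average([1, 2, 3])

You can also run an entire script under the debugger with python3 -m pdb script.py.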

Writing a Python interpreter in Python from scratch - Zhentao Li

I will show a prototype of a Python interpreter written entirely in Python itself (one that isn't PyPy).

The goal is to have simpler internals to allow experimenting with changes to the language more easily. This interpreter has a small core, with much of the library modifiable at run time for quickly testing changes. This differs from PyPy, which aims for full Python compatibility and speed (from JIT compilation). I will show some of the interesting things that you can do with this interpreter.

This interpreter has two parts: a parser that transforms source code into an Abstract Syntax Tree, and a runner that traverses this tree. I will give an overview of how both parts work and discuss some challenges encountered and their solutions.
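To give a rough idea of that two-part shape (a toy sketch built on CPython's own ast module, not Zhentao's actual code), a parser/runner pair can be remarkably small:

    import ast

    def run(node, env):
        # tiny tree-walking "runner" for a subset of Python expressions
        if isinstance(node, ast.Expression):
            return run(node.body, env)
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            return run(node.left, env) + run(node.right, env)
        if isinstance(node, ast.Constant):  # literals such as 1 or "x"
            return node.value
        if isinstance(node, ast.Name):      # variable lookup
            return env[node.id]
        raise NotImplementedError(type(node).__name__)

    tree = ast.parse("x + 1", mode="eval")  # parser: source -> AST
    print(run(tree, {"x": 41}))             # runner: walks the tree -> 42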

This interpreter makes use of very few libraries, and only those included with CPython.

This project is looking for members to discuss ways of simplifying parts of the interpreter (among other things).

Rasa, an open-source chatbot platform - Nathan Zylbersztejn

Most chatbots rely on external APIs for the cool stuff, such as natural language understanding (NLU), and they disappoint because if/else conditionals fail at delivering good representations of our non-linear, human way of conversing. Wouldn't it be great if we could 1) take control of NLU and tweak it to better fit our needs, and 2) really apply machine learning, extract patterns from real conversations, and handle dialogue in a decent manner? Well, we can, thanks to Rasa.ai. It's open-source, it's in Python, and it works.

About Nathan Zylbersztejn:

Nathan is the founder of Mr. Bot, a dialogue consulting agency in Montreal with clients in the media, banking, and aerospace industries. He holds a master's in economics, a graduate diploma in computer science, and a machine learning nanodegree.


Montréal-Python 68: Call For Speakers

Montreal Python - Wed, 11/08/2017 - 00:00

We are looking for speakers that want to give a regular presentation (20 to 25 minutes) or a lightning talk (5 minutes).

Submit your proposal at team@montrealpython.org

When

November 20th, 2017
6PM to 9PM

Where

Google Montréal
1253 McGill College Ave #150
H3B 2Y5
https://goo.gl/maps/oNruyD2oVbq


October 2017 report: LTS, feed2exec beta, pandoc filters, git mediawiki

Anarcat - Thu, 11/02/2017 - 11:12
Debian Long Term Support (LTS)

This is my monthly Debian LTS report. This time I worked on the famous KRACK attack, git-annex, golang and the continuous stream of GraphicsMagick security issues.

WPA & KRACK update

I spent most of my time this month on the Linux WPA code, to backport it to the old (~2012) wpa_supplicant release. I first published a patchset based on the patches shipped after the embargo for the oldstable/jessie release. After feedback from the list, I also built packages for i386 and ARM.

I have also reviewed the WPA protocol to make sure I understood the implications of the changes required to backport the patches. For example, I removed the patches touching the WNM sleep mode code, as that was introduced only in the 2.0 release. Chunks of code regarding state tracking were also not backported, as they are part of the state tracking code introduced later, in 3ff3323. Finally, I still have concerns about the nonce setup in patch #5. In the last chunk, you'll notice peer->tk is reset so that a new TK is negotiated. The other approach I considered was to backport 1380fcbd9f ("TDLS: Do not modify RNonce for an TPK M1 frame with same INonce") but I figured I would play it safe and not introduce further variations.

I should note that I share Matthew Green's observations regarding the opacity of the protocol. Normally, network protocols are freely available and security researchers like me can easily review them. In this case, I would have needed to read the opaque 802.11i-2004 pdf which is behind a TOS wall at the IEEE. I ended up reading up on the IEEE_802.11i-2004 Wikipedia article which gives a simpler view of the protocol. But it's a real problem to see such critical protocols developed behind closed doors like this.

At Guido's suggestion, I sent the final patch upstream explaining the concerns I had with the patch. I have not, at the time of writing, received any response from upstream about this, unfortunately. I uploaded the fixed packages as DLA 1150-1 on October 31st.

Git-annex

The next big chunk on my list was completing the work on git-annex (CVE-2017-12976) that I started in August. It turns out doing the backport was simpler than I expected, even with my rusty experience with Haskell. Type-checking really helps in doing the right thing, especially considering how Joey Hess implemented the fix: by introducing a new type.

So I backported the patch from upstream and notified the security team that the jessie and stretch updates would be similarly easy. I shipped the backport to LTS as DLA-1144-1. I also shared the updated packages for jessie (which required a similar backport) and stretch (which didn't), and Sebastien Delafond published those as DSA 4010-1.

Graphicsmagick

Up next was yet another security vulnerability in the Graphicsmagick stack. This involved the usual deep dive into intricate and sometimes just unreasonable C code to try and fit a round tree in a square sinkhole. I'm always unsure about those patches, but the test suite passes, smoke tests show the vulnerability as fixed, and that's pretty much as good as it gets.

The announcement (DLA 1154-1) turned out to be a little special because I had previously noticed that the penultimate announcement (DLA 1130-1) was never sent out. So I made a merged announcement to cover both instead of re-sending the original 3 weeks late, which may have been confusing for our users.

Triage & misc

We always do a bit of triage even when not on frontdesk duty, so I:

I also did smaller bits of work on:

The latter reminded me of the concerns I have about the long-term maintainability of the golang ecosystem: because everything is statically linked, an update to a core library (say the SMTP library, as in CVE-2017-15042, thankfully not affecting LTS) requires a full rebuild of all packages including the library, in all distributions. So what would be a simple update in a shared-library system could mean an explosion of work on statically linked infrastructures. This is a lot of work, which can definitely be error-prone: as I've seen in other updates, some packages (for example the Ruby interpreter) just bit-rot on their own and eventually fail to build from source. We would also have to investigate all packages to see which ones include the library, something we are not well equipped for at this point.

Wheezy was the first release shipping golang packages, but at least it shipped only one version... Stretch has shipped with two golang versions (1.7 and 1.8), which will make maintenance even harder in the long term.

We build our computers the way we build our cities--over time, without a plan, on top of ruins. - Ellen Ullman

Other free software work

This month again, I was busy doing some serious yak shaving operations all over the internet, on top of publishing two of my largest LWN articles to date (2017-10-16-strategies-offline-pgp-key-storage and 2017-10-26-comparison-cryptographic-keycards).

feed2exec beta

Since I announced this new project last month, I have released it as a beta and it entered Debian. I have also written useful plugins, like the wayback plugin that saves pages on the Wayback Machine for eternal archival. The archive plugin can similarly save pages to the local filesystem. I also added bash completion, expanded unit tests and documentation, fixed default file paths and a bunch of bugs, and refactored the code. Finally, I started using two external Python libraries instead of rolling my own code: pyxdg and requests-file, the latter of which I packaged in Debian (and fixed a bug in its test suite).

The program is working pretty well for me. The only thing I feel is really missing now is a retry/fail mechanism. Right now, it's a little brittle: any network hiccup will yield an error email, which is readable to me but could be confusing to a new user. Strangely enough, I am particularly having trouble with (local!) DNS resolution that I need to look into, but that is probably unrelated to the software itself. Thankfully, the user can disable those with --loglevel=ERROR to silence WARNINGs.

Furthermore, some plugins still have some rough edges. For example, the Transmission integration would probably work better as a distinct plugin instead of a simple exec call, because when it adds new torrents, the output is totally cryptic. That plugin could also leverage more feed parameters to save different files in different locations depending on the feed titles, something that would be hard to do safely with the exec plugin now.

I am keeping a steady flow of releases. I wish there was a way to see how effective I am at reaching out with this project, but unfortunately GitLab doesn't provide usage statistics... And I have received only a few comments on IRC about the project, so maybe I need to reach out more like it says in the fine manual. Always feels strange to have to promote your project like it's some new bubbly soap...

The next steps for the project are a final review of the API and a production-ready 1.0.0 release. I am also thinking of making a small screencast to show the basic capabilities of the software, maybe with asciinema's upcoming audio support?

Pandoc filters

As I mentioned earlier, I dove into Haskell programming again when working on the git-annex security update. But I also have a small Haskell program of my own - a Pandoc filter that I use to convert the HTML articles I publish on LWN.net into an Ikiwiki-compatible markdown version. It turns out the script was still missing a bunch of stuff: image sizes, proper table formatting, etc. I also worked hard on automating more bits of the publishing workflow by extracting the time from the article, which allowed me to extract a full article into an almost final copy just by specifying the article ID. The only thing left is to add tags, and the article is complete.

In the process, I learned about new weird Haskell constructs. Take this code, for example:

    -- remove needless blockquote wrapper around some tables
    --
    -- haskell newbie tips:
    --
    -- @ is the "at-pattern", allows us to define both a name for the
    -- construct and inspect the contents at once
    --
    -- {} is the "empty record pattern": it basically means "match the
    -- arguments but ignore the args"
    cleanBlock (BlockQuote t@[Table {}]) = t

Here the idea is to remove <blockquote> elements needlessly wrapping a <table>. I can't specify the Table type on its own, because then I couldn't address the table as a whole, only its parts. I could reconstruct the whole table bit by bit, but it wasn't as clean.

The other pattern was how to, at last, address multiple string elements, which was difficult because Pandoc treats spaces specially:

cleanBlock (Plain (Strong (Str "Notifications":Space:Str "for":Space:Str "all":Space:Str "responses":_):_)) = []

The last bit that drove me crazy was the date parsing:

-- the "GAByline" div has a date, use it to generate the ikiwiki dates -- -- this is distinct from cleanBlock because we do not want to have to -- deal with time there: it is only here we need it, and we need to -- pass it in here because we do not want to mess with IO (time is I/O -- in haskell) all across the function hierarchy cleanDates :: ZonedTime -> Block -> [Block] -- this mouthful is just the way the data comes in from -- LWN/Pandoc. there could be a cleaner way to represent this, -- possibly with a record, but this is complicated and obscure enough. cleanDates time (Div (_, [cls], _) [Para [Str month, Space, Str day, Space, Str year], Para _]) | cls == "GAByline" = ikiwikiRawInline (ikiwikiMetaField "date" (iso8601Format (parseTimeOrError True defaultTimeLocale "%Y-%B-%e," (year ++ "-" ++ month ++ "-" ++ day) :: ZonedTime))) ++ ikiwikiRawInline (ikiwikiMetaField "updated" (iso8601Format time)) ++ [Para []] -- other elements just pass through cleanDates time x = [x]

Now that seems just dirty, but it was even worse before. One thing I find difficult in adapting to coding in Haskell is that you need to get into the habit of writing smaller functions. The language is really not well adapted to long discourse: it's more about getting small things connected together. Other languages (e.g. Python) discourage this because there's some overhead in calling functions (10 nanoseconds in my tests, but still), whereas functions are a fundamental and important construct in Haskell and are much more heavily optimized. So I constantly need to remind myself to split things up early, otherwise I can't do anything in Haskell.

Other languages are more lenient, which does mean my code can be dirtier, but I feel I get things done faster that way. The oddity of Haskell makes it frustrating to work with. It's like doing construction work where you're not allowed to get the floor dirty. When I build stuff, I don't mind things being dirty: I can clean up afterwards. This is especially critical when you don't actually know how to make things clean in the first place, as Haskell will simply not let you do that at all.

And obviously, I fought with Monads, or, more specifically, "I/O" or IO in this case. It turns out that getting the current time is IO in Haskell: indeed, it's not a "pure" function that will always return the same thing. But this means that I would have had to change the signature of all the functions that touched time to include IO. I eventually moved the time initialization up into main so that I had only one IO function, and passed that timestamp downwards as a simple argument. That way I could keep the rest of the code clean, which seems to be an acceptable pattern.

I would of course be happy to get feedback from my Haskell readers (if any) to see how to improve that code. I am always eager to learn.

Git remote MediaWiki

Few people know that there is a MediaWiki remote for Git which allows you to mirror a MediaWiki site as a Git repository. As a disaster recovery mechanism, I have been keeping such a historical backup of the Amateur radio wiki for a while now. This originally started as a homegrown Python script that also converted the contents to Markdown. My theory then was to see if we could switch from MediaWiki to Ikiwiki, but it took so long to implement that I never completed the work.
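Trying the remote out is a one-liner (a sketch with an illustrative URL, assuming git-remote-mediawiki is installed):

    git clone mediawiki::https://wiki.example.com wiki-backup

The resulting repository can then be fetched periodically like any other Git remote, which is what makes it usable as a backup mechanism.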

When someone had the weird idea of renaming a page to some impossibly long name on the wiki, my script broke. I tried to look at fixing it and then remembered I also had a mirror running using the Git remote. It turns out it also broke on the same issue, and that got me looking into the remote again. I got lost in a zillion issues, including fixing that specific one, but I especially looked at the possibility of fetching all namespaces, because I realized that the remote fetches only a part of the wiki by default. That drove me to submit namespace support as a patch to the git mailing list. The discussion eventually came back to how to actually maintain that contrib: in git core or outside? It looks like I'll be doing some maintenance on that project outside of git, as I was granted access to the GitHub organisation...

Galore Yak Shaving

Then there's the usual hodgepodge of fixes and random things I did over the month.

There is no [web extension] only XUL! - Inside joke


Montréal-Python 67: Ultramodern Vintage - PyCon Canada Special

Montreal Python - Tue, 10/24/2017 - 23:00

Some of the members of the Montreal Python community were selected as speakers for PyCon Canada next month.

We've decided to give them an opportunity to practice their talks and try out their new material.

This is a perfect opportunity for you to give them tips and feedback, and to get an avant-première of this amazing conference coming to Montreal soon.

Also, don't forget to buy your PyCon Canada ticket, as there are only a few left! Tickets for PyCon Canada are available on Eventbrite. You can find more information about the other speakers on the official schedule.

Presentations

Nicole Parrot - Multiprocessing: how to run Blockly generated code in Python

Blockly is a front-end GUI developed by Google that allows developers to create a drag-and-drop language for their product. However, Blockly only generates a string and does not run code directly. I will take a look at the various potential approaches to executing code that is in a string:

  • exec - quick for short commands
  • subprocess - can be interrupted
  • multiprocessing - can pass data back and forth by using queues

The talk is mainly about launching processes, dealing with zombies, interrupting processes based on user input, and passing data back and forth between processes. Brief mentions of Flask are to be expected. Examples will be based on my experience developing Bloxter, a graphical drag-and-drop language for the GoPiGo3, to be used at home or in classrooms.
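As a concrete illustration of the multiprocessing approach (a minimal sketch, not Nicole's actual code), the generated string can run in a separate process that passes its result back through a queue and can be interrupted:

    import multiprocessing

    def worker(code, queue):
        env = {}
        exec(code, env)               # run the (Blockly-generated) code string
        queue.put(env.get("result"))  # pass data back to the parent

    if __name__ == "__main__":
        code = "result = sum(range(10))"  # pretend Blockly generated this
        queue = multiprocessing.Queue()
        proc = multiprocessing.Process(target=worker, args=(code, queue))
        proc.start()
        proc.join(timeout=5)    # give the program five seconds
        if proc.is_alive():
            proc.terminate()    # interrupt a runaway program
        else:
            print(queue.get())  # -> 45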

Nicholas Nadeau - Domo arigato, Mr. Roboto: calibrating robots with Python

Modern robotic applications often rely on offline programming to reduce process downtime. In a virtual environment, robot application specialists may program, visualize, and test their robotic application before uploading it to the real production environment, saving time and costs. However, to achieve a high level of fidelity between virtual and production environments, the robot system must be accurate.

Unfortunately, even though most industrial robots are inherently precise (i.e., repeatable), they are not necessarily very accurate. One cost-effective approach to obtaining a more accurate robot is calibration, where the actual kinematic and non-kinematic parameters of the robot model are identified and improved upon compared to the nominal model. This talk introduces pybotics, an open-source Python toolbox for robot kinematics and calibration.

Éric Araujo - 10 points pour comprendre les containers

What is a container? An image? Can you run several programs in a single container? Can you use Docker directly, or do you need an orchestrator such as Kubernetes? With ten questions in ten minutes, you will come away with a good understanding of what these objects are and what they do, and you will be better equipped to appreciate more advanced talks on the subject.

Julia Evans - So you want to be a wizard

I don't always feel like a wizard. I'm not the most experienced member on my team; like most people, I sometimes find my work difficult; and I still have a TON TO LEARN.

But along the way, I have learned a few ways to debug tricky problems, get the information I need from my colleagues, and get my job done.

Where

Notman House
51 Sherbrooke St W, Montreal, QC H2X 1X2
https://goo.gl/maps/K5WPD59YSWN2

When

Monday, October 30th, 2017 at 6PM

Schedule
  • 6:00PM - Doors open
  • 6:30PM - Start of presentations
  • 7:30PM - Break
  • 7:45PM - Presentations
  • 8:45PM - End of event
  • 8:50PM - Benelux

We would like to thank our sponsors for their continuous support

  • Notman House
  • Savoir-Faire Linux
  • Benelux

Want to sponsor Montréal-Python? Join the team at team@montrealpython.org


A comparison of cryptographic keycards

Anarcat - Mon, 10/16/2017 - 19:00

An earlier article showed that private key storage is an important problem to solve in any cryptographic system and established keycards as a good way to store private key material offline. But which keycard should we use? This article examines the form factor, openness, and performance of four keycards to try to help readers choose the one that will fit their needs.

I have personally been using a YubiKey NEO since a 2015 announcement on GitHub promoting two-factor authentication. I was also able to hook up my SSH authentication key into the YubiKey's 2048-bit RSA slot. It seemed natural to move the other subkeys onto the keycard, provided that performance was sufficient. The mail client that I use (Notmuch) blocks when decrypting messages, which could be a serious problem on large email threads from encrypted mailing lists.

So I built a test harness and got access to some more keycards: I bought an FST-01 from its creator, Yutaka Niibe, at the last DebConf, and Nitrokey donated a Nitrokey Pro. I also bought a YubiKey 4 when I got the NEO. There are of course other keycards out there, but those are the ones I could get my hands on. You'll notice none of those keycards has a physical keypad to enter passwords, so they are all vulnerable to keyloggers that could extract the key's PIN. Keep in mind, however, that even with the PIN, an attacker could only ask the keycard to decrypt or sign material, but not extract the key that is protected by the card's firmware.

Form factor

The four keycards have similar form factors: they all connect to a standard USB port, although both YubiKey keycards have a capacitive button by which the user triggers two-factor authentication and the YubiKey 4 can also require a button press to confirm private key use. The YubiKeys feel sturdier than the other two. The NEO has withstood two years of punishment in my pockets along with the rest of my "real" keyring and there is only minimal wear on the keycard in the picture. It's also thinner so it fits well on the keyring.

The FST-01 stands out from the others with its minimal design. Out of the box, the FST-01 comes without a case, so the circuitry is exposed. This is deliberate: one of its goals is to be as transparent as possible, both in terms of software and hardware design, and you definitely get that feeling at the physical level. Unfortunately, that does mean it feels more brittle than the other models: I wouldn't carry it in my pocket all the time. There is a case that may protect the key a little better, but it does not provide an easy way to hook it into a keyring. In the group picture above, the FST-01 is the pink plastic thing, which is a rubbery casing I received along with the device when I got it.

Notice how the USB connectors of the YubiKeys differ from the other two: while the FST-01 and the Nitrokey have standard USB connectors, the YubiKey has only a "half-connector", which is what makes it thinner than the other two. The "Nano" form factor takes this even further and almost disappears in the USB port. Unfortunately, this arrangement means the YubiKey NEO often comes loose and falls out of the USB port, especially when connected to a laptop. On my workstation, however, it usually stays put even with my whole keyring hanging off of it. I suspect this adds more strain to the host's USB port but that's a tradeoff I've lived with without any noticeable wear so far. Finally, the NEO has this peculiar feature of supporting NFC for certain operations, as LWN previously covered, but I haven't used that feature yet.

The Nitrokey Pro looks like a normal USB key, in contrast with the other two devices. It does feel a little brittle when compared with the YubiKey, although only time will tell how much of a beating it can take. It has a small ring in the case so it is possible to carry it directly on your keyring, but I would be worried the cap would come off eventually. Nitrokey devices are also twice as thick as the Yubico models, which makes them less convenient to carry around on keyrings.

Open and closed designs

The FST-01 is as open as hardware comes, down to the PCB design available as KiCad files in this Git repository. The software running on the card is the Gnuk firmware that implements the OpenPGP card protocol, but you can also get it with firmware implementing a true random number generator (TRNG) called NeuG (pronounced "noisy"); the device is programmable through a standard Serial Wire Debug (SWD) port. The Nitrokey Start model also runs the Gnuk firmware. However, the Nitrokey website announces only ECC and RSA 2048-bit support for the Start, while the FST-01 also supports RSA-4096. Nitrokey's founder Jan Suhr, in a private email, explained that this is because "Gnuk doesn't support RSA-3072 or larger at a reasonable speed". Its devices (the Pro, Start, and HSM models) use a similar chip to the FST-01: the STM32F103 microcontroller.

Nitrokey also publishes its hardware designs on GitHub, which shows the Pro is basically a fork of the FST-01, according to the ChangeLog. I opened the case to confirm it was using the STM MCU, something I should warn you against; I broke one of the pins holding it together when opening it, so now it's even more fragile. But at least I was able to confirm it was built using the STM32F103TBU6 MCU, like the FST-01.

But this is where the comparison ends: on the back side, we find a SIM card reader that holds the OpenPGP card that, in turn, holds the private key material and does the cryptographic operations. So, in effect, the Nitrokey Pro is really an evolution of the original OpenPGP card readers. Nitrokey confirmed the OpenPGP card featured in the Pro is the same as the one shipped by the Free Software Foundation Europe (FSFE): the BasicCard built by ZeitControl. Those cards, however, are covered by NDAs and the firmware is only partially open source.

This makes the Nitrokey Pro less open than the FST-01, but that's an inevitable tradeoff when choosing a design based on the OpenPGP cards, which Suhr described to me as "pretty proprietary". There are other keycards out there, however, for example the SLJ52GDL150-150k smartcard suggested by Debian developer Yves-Alexis Perez, which he prefers as it is certified by French and German authorities. In that blog post, he also said he was experimenting with the GPL-licensed OpenPGP applet implemented by the French ANSSI.

But the YubiKey devices are even further away in the closed-design direction. Both the hardware designs and firmware are proprietary. The YubiKey NEO, for example, cannot be upgraded at all, even though it is based on an open firmware. According to Yubico's FAQ, this is due to "best security practices": "There is a 'no upgrade' policy for our devices since nothing, including malware, can write to the firmware."

I find this decision questionable in a context where security updates are often more important than aiming for a bulletproof design, which may simply be impossible. And the YubiKey NEO did suffer from a critical security issue that allowed attackers to bypass the PIN protection on the card, which raises the question of the actual protection of the private key material on those cards. According to Niibe, "some OpenPGP cards store the private key unencrypted. It is a common attitude for many smartcard implementations", which was confirmed by Suhr: "the private key is protected by hardware mechanisms which prevent its extraction and misuse". He is referring to the use of tamper resistance.

After that security issue, there was no other option for YubiKey NEO users than to get a new keycard (for free, thankfully) from Yubico, which also meant discarding the private key material on the key. For OpenPGP keys, this may mean having to bootstrap the web of trust from scratch if the keycard was responsible for the main certification key.

But at least the NEO is running free software based on the OpenPGP card applet and the source is still available on GitHub. The YubiKey 4, on the other hand, is now closed source, which was controversial when the new model was announced last year. It led the main Linux Foundation system administrator, Konstantin Ryabitsev, to withdraw his endorsement of Yubico products. In response, Yubico argued that this approach was essential to the security of its devices, which are now based on "a secure chip, which has built-in countermeasures to mitigate a long list of attacks". In particular, it claims that:

A commercial-grade AVR or ARM controller is unfit to be used in a security product. In most cases, these controllers are easy to attack, from breaking in via a debug/JTAG/TAP port to probing memory contents. Various forms of fault injection and side-channel analysis are possible, sometimes allowing for a complete key recovery in a shockingly short period of time.

While I understand those concerns, they eventually come down to the trust you have in an organization. Not only do we have to trust Yubico, but also hardware manufacturers and designs they have chosen. Every step in the hidden supply chain is then trusted to make correct technical decisions and not introduce any backdoors.

History, unfortunately, is not on Yubico's side: Snowden revealed the example of RSA Security accepting what renowned cryptographer Bruce Schneier described as a "bribe" from the NSA to weaken its ECC implementation, by using the presumably backdoored Dual_EC_DRBG algorithm. What makes Yubico or its suppliers so different from RSA Security? Remember that RSA Security used to be an adamant opponent of the degradation of encryption standards, campaigning against the Clipper chip in the first crypto wars.

Even if we trust the Yubico supply chain, how can we trust a closed design using what basically amounts to security through obscurity? Publicly auditable designs are an important tradition in cryptography, and that principle shouldn't stop when software is frozen into silicon.

In fact, a critical vulnerability called ROCA, disclosed recently, affects closed "smartcards" like the YubiKey 4 and allows full private key recovery from the public key if the key was generated on a vulnerable keycard. When speaking with Ars Technica, the researchers outlined the importance of open designs and questioned the reliability of certification:

Our work highlights the dangers of keeping the design secret and the implementation closed-source, even if both are thoroughly analyzed and certified by experts. The lack of public information causes a delay in the discovery of flaws (and hinders the process of checking for them), thereby increasing the number of already deployed and affected devices at the time of detection.

This issue with open hardware designs seems to be a recurring topic of conversation on the Gnuk mailing list. For example, there was a discussion in September 2017 regarding possible hardware vulnerabilities in the STM MCU that would allow extraction of encrypted key material from the key. Niibe referred to a talk presented at the WOOT 17 workshop, where Johannes Obermaier and Stefan Tatschner, from the Fraunhofer Institute, demonstrated attacks against the STMF0 family of MCUs. It is still unclear whether those attacks also apply to the older STMF1 design used in the FST-01, however. Furthermore, extracted private key material is still protected by the user's passphrase, but Gnuk uses a weak key derivation function, so brute-forcing attacks may be possible. Fortunately, there is work in progress to make GnuPG hash the passphrase before sending it to the keycard, which should make such attacks harder if not completely pointless.

When asked about the Yubico claims in a private email, Niibe did recognize that "it is true that there are more weak points in general purpose implementations than special implementations". During the last DebConf in Montreal, Niibe explained:

If you don't trust me, you should not buy from me. Source code availability is only a single factor: someone can maliciously replace the firmware to enable advanced attacks.

Niibe recommends to "build the firmware yourself", also saying the design of the FST-01 uses normal hardware that "everyone can replicate". Those advantages are hard to deny for a cryptographic system: using more generic components makes it harder for hostile parties to mount targeted attacks.

A counter-argument here is that it can be difficult for a regular user to audit such designs, let alone physically build the device from scratch. But in a mailing list discussion, Debian developer Ian Jackson explained:

You don't need to be able to validate it personally. The thing spooks most hate is discovery. Backdooring supposedly-free hardware is harder (more costly) because it comes with greater risk of discovery.

To put it concretely: if they backdoor all of them, someone (not necessarily you) might notice. (Backdooring only yours involves messing with the shipping arrangements and so on, and supposes that you specifically are of interest.)

Since, as far as we know, the STM microcontrollers are not backdoored, I would tend to favor those devices over proprietary ones, as such a backdoor would be more easily detectable than in a closed design. Even though physical attacks may be possible against those microcontrollers, in the end, if an attacker has physical access to a keycard, I consider the key compromised, even if it has the best chip on the market. In our email exchange, Niibe argued that "when a token is lost, it is better to revoke keys, even if the token is considered secure enough". So like any other device, physical compromise of tokens may mean compromise of the key and should trigger key-revocation procedures.

Algorithms and performance

To establish reliable performance results, I wrote a benchmark program naively called crypto-bench that could produce comparable results between the different keys. The program takes each algorithm/keycard combination and runs 1000 decryptions of a 16-byte file (one AES-128 block) using GnuPG, after priming it to get the password cached. I assume the overhead of GnuPG calls to be negligible, as it should be the same across all tokens, so comparisons are possible. AES encryption is constant across all tests as it is always performed on the host and fast enough to be irrelevant in the tests.
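The harness itself can be as simple as timing repeated gpg invocations; here is a rough sketch of the idea (crypto-bench itself may differ), assuming a file test.gpg encrypted to the key under test and a passphrase already cached by the agent:

    import subprocess
    import time

    N = 1000
    start = time.monotonic()
    for _ in range(N):
        # decrypt the 16-byte sample; the plaintext is discarded
        subprocess.run(["gpg", "--quiet", "--decrypt", "test.gpg"],
                       stdout=subprocess.DEVNULL, check=True)
    print("mean decryption time: %.3fs" % ((time.monotonic() - start) / N))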

I used the following:

  • Intel(R) Core(TM) i3-6100U CPU @ 2.30GHz running Debian 9 ("stretch"/stable amd64), using GnuPG 2.1.18-6 (from the stable Debian package)
  • Nitrokey Pro 0.8 (latest firmware)
  • FST-01, running Gnuk version 1.2.5 (latest firmware)
  • YubiKey NEO OpenPGP applet 1.0.10 (not upgradable)
  • YubiKey 4 4.2.6 (not upgradable)

I ran crypto-bench for each keycard, which resulted in the following:

    Algorithm        Device        Mean time (s)
    ---------------  ------------  -------------
    ECDH-Curve25519  CPU           0.036
                     FST-01        0.135
    RSA-2048         CPU           0.016
                     YubiKey-4     0.162
                     Nitrokey-Pro  0.610
                     YubiKey-NEO   0.736
                     FST-01        1.265
    RSA-4096         CPU           0.043
                     YubiKey-4     0.875
                     Nitrokey-Pro  3.150
                     FST-01        8.218

There we see the performance of the four keycards I tested, compared with the same operations done without a keycard: the "CPU" device. That provides the baseline time of GnuPG decrypting the file. The first obvious observation is that using a keycard is slower: in the best scenario (FST-01 + ECC) we see a four-fold slowdown, but in the worst case (also FST-01, but RSA-4096), we see a catastrophic 200-fold slowdown. When I presented the results on the Gnuk mailing list, GnuPG developer Werner Koch confirmed those "numbers are as expected":

With a crypto chip RSA is much faster. By design the Gnuk can't be as fast - it is just a simple MCU. However, using Curve25519 Gnuk is really fast.

And yes, the FST-01 is really fast at doing ECC, but it's also the only keycard that handles ECC in my tests; the Nitrokey Start and Nitrokey HSM should support it as well, but I haven't been able to test those devices. Also note that the YubiKey NEO doesn't support RSA-4096 at all, so we can only compare RSA-2048 across keycards. We should note, however, that ECC is slower than RSA on the CPU, which suggests the Gnuk ECC implementation used by the FST-01 is exceptionally fast.

In discussions about improving the performance of the FST-01, Niibe estimated the user tolerance threshold to be "2 seconds decryption time". In a new design using the STM32L432 microcontroller, Aurelien Jarno was able to bring the numbers for RSA-2048 decryption from 1.27s down to 0.65s, and for RSA-4096 from 8.22s down to 3.87s. RSA-4096 is still beyond the two-second threshold, but at least it brings the FST-01 close to the YubiKey NEO and Nitrokey Pro performance levels.

We should also underline the superior performance of the YubiKey 4: whatever that thing is doing, it's doing it faster than anyone else. It does RSA-4096 faster than the FST-01 does RSA-2048, and almost as fast as the Nitrokey Pro does RSA-2048. We should also note that the Nitrokey Pro also fails to cross the two-second threshold for RSA-4096 decryption.

For me, the FST-01's stellar performance with ECC outshines the other devices. Maybe it says more about the efficiency of the algorithm than the FST-01 or Gnuk's design, but it's definitely an interesting avenue for people who want to deploy those modern algorithms. So, in terms of performance, it is clear that both the YubiKey 4 and the FST-01 take the prize in their own areas (RSA and ECC, respectively).

Conclusion

In the above presentation, I have evaluated four cryptographic keycards for use with various OpenPGP operations. What the results show is that the only efficient way of storing a 4096-bit encryption key on a keycard would be to use the YubiKey 4. Unfortunately, I do not feel we should put our trust in such closed designs so I would argue you should either stick with 2048-bit encryption subkeys or keep the keys on disk. Considering that losing such a key would be catastrophic, this might be a good approach anyway. You should also consider switching to ECC encryption: even though it may not be supported everywhere, GnuPG supports having multiple encryption subkeys on a keyring: if one algorithm is unsupported (e.g. GnuPG 1.4 doesn't support ECC), it will fall back to a supported algorithm (e.g. RSA). Do not forget your previously encrypted material doesn't magically re-encrypt itself using your new encryption subkey, however.
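For example, adding a Curve25519 encryption subkey next to an existing RSA one is a single command (a sketch, assuming GnuPG 2.2 or later; replace $FPR with your primary key's fingerprint):

    # add an ECC (Curve25519) encryption subkey alongside the RSA one
    gpg --quick-add-key $FPR cv25519 encr
    gpg --list-keys $FPR  # both encryption subkeys should now be listed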

For authentication and signing keys, speed is not such an issue, so I would warmly recommend either the Nitrokey Pro or Start, or the FST-01, depending on whether you want to start experimenting with ECC algorithms. Availability also seems to be an issue for the FST-01. While you can generally get the device when you meet Niibe in person for a few bucks (I bought mine for around $30 Canadian), the Seeed online shop says the device is out of stock at the time of this writing, even though Jonathan McDowell said that may be inaccurate in a debian-project discussion. Nevertheless, this issue may make the Nitrokey devices more attractive. When deciding on using the Pro or Start, Suhr offered the following advice:

In practice smart card security has been proven to work well (at least if you use a decent smart card). Therefore the Nitrokey Pro should be used for high security cases. If you don't trust the smart card or if Nitrokey Start is just sufficient for you, you can choose that one. This is why we offer both models.

So far, I have created a signing subkey and moved that and my authentication key to the YubiKey NEO, because it's a device I physically trust to keep itself together in my pockets and I was already using it. It has served me well so far, especially with its extra features like U2F and HOTP support, which I use frequently. Those features are also available on the Nitrokey Pro, so that may be an alternative if I lose the YubiKey. I will probably move my main certification key to the FST-01 and a LUKS-encrypted USB disk, to keep that certification key offline but backed up on two different devices. As for the encryption key, I'll wait for keycard performance to improve, or simply switch my whole keyring to ECC and use the FST-01 or Nitrokey Start for that purpose.

[The author would like to thank Nitrokey for providing hardware for testing.]

This article first appeared in the Linux Weekly News.


My free software activities, September 2017

Anarcat - Mon, 10/02/2017 - 18:55
Debian Long Term Support (LTS)

This is my monthly Debian LTS report. I mostly worked on the git, git-annex and ruby packages this month but didn't have time to completely use my allocated hours because I started too late in the month.

Ruby

I was hoping someone would pick up the Ruby work I submitted in August, but it seems no one wanted to touch that mess, understandably. Since then, new issues came up, and not only did I have to work on the rubygems and ruby1.9 packages, but now the ruby1.8 package also had to get security updates. Yes: it's bad enough that the rubygems code is duplicated in one other package, but wheezy had the misfortune of supporting two Ruby versions.

The Ruby 1.9 package also failed to build from source because of test suite issues, for which I haven't found a clean and easy fix, so I ended up making test suite failures non-fatal in 1.9, as they already were in 1.8. I did keep a close eye on changes in the test suite output to make sure tests introduced by the security fixes would pass and that I wouldn't introduce new regressions.

So I published the following advisories:

  • ruby 1.8: DLA-1113-1, fixing CVE-2017-0898 and CVE-2017-10784. 1.8 doesn't seem affected by CVE-2017-14033, as the provided test does not fail (but it does fail in 1.9.1). The test suite was, before the patch:

    2199 tests, 1672513 assertions, 18 failures, 51 errors

    and after patch:

    2200 tests, 1672514 assertions, 18 failures, 51 errors
  • rubygems: uploaded the package prepared in August as-is in DLA-1112-1, fixing CVE-2017-0899, CVE-2017-0900, and CVE-2017-0901. Here the test suite passed normally.

  • ruby 1.9: here I used the 2.2.8 release tarball to generate a patch that would cover all the issues, and published DLA-1114-1, which fixes the CVEs of the two packages above. The test suite was, before the patches:

    10179 tests, 2232711 assertions, 26 failures, 23 errors, 51 skips

    and after patches:

    10184 tests, 2232771 assertions, 26 failures, 23 errors, 53 skips
Git

I also quickly issued an advisory (DLA-1120-1) for CVE-2017-14867, an odd issue affecting git in wheezy. The backport was tricky because it wouldn't apply cleanly and the git package had a custom patching system which made it tricky to work on.

Git-annex

I did a quick stint on git-annex as well: I was able to reproduce the issue and confirm an approach to fixing the issue in wheezy, although I didn't have time to complete the work before the end of the month.

Other free software work New project: feed2exec

I should probably make a separate blog post about this, but ironically, I don't want to spend too much time writing those reports, so this will be quick.

I wrote a new program, called feed2exec. It's basically a combination of feed2imap, rss2email and feed2tweet: it allows you to fetch RSS feeds and send them in a mailbox, but what's special about it, compared to the other programs above, is that it is more generic: you can basically make it do whatever you want on new feed items. I have, for example, replaced my feed2tweet instance with it, using this simple configuration:

    [anarcat]
    url = https://anarc.at/blog/index.rss
    output = feed2exec.plugins.exec
    args = tweet "%(title)0.70s %(link)0.70s"

The sample configuration file also has examples to talk with Mastodon, Pump.io and, why not, a torrent server to download torrent files available over RSS feeds. A trivial configuration can also make it work as a crude podcast client. My main motivation to work on this was that it was difficult to extend feed2imap to do what I needed (which was to talk to transmission to download torrent files) and rss2email didn't support my workflow (which is delivering to feed-specific mail folders). Because both projects also seemed abandoned, it seemed like a good idea at the time to start a new one, although the rss2email community has now restarted the project and may produce interesting results.

As an experiment, I tracked my time working on this project. It turns out it took about 45 hours to write that software. Considering feed2exec is about 1400 SLOC, that's 30 lines of code per hour. I don't know if that's slow or fast, but it's an interesting metric for future projects. It sure seems slow to me, but we need to keep in mind those 30 lines of code don't include documentation and repeated head banging on the keyboard. For example, I found two issues with the upstream feedparser package, which I use to parse feeds and which also seems unmaintained, unfortunately.

Feed2exec is beta software at this point, but it's working well enough for me and the design is much simpler than the other programs of the kind. The main issue people can expect from it at this point is formatting issues or parse errors on exotic feeds, and noisy error messages on network errors, all of which should be fairly easy to fix in the test suite. I hope it will be useful for the community and, as usual, I welcome contributions, help and suggestions on how to improve the software.

More Python templates

As part of the work on feed2exec, I cleaned up a few things in the ecdysis project, mostly hooking tests up in the CI, improving the advancedConfig logger, and cleaning up more stuff.

While I was there, it turns out that I built a pretty decent basic CI configuration for Python on GitLab. Whereas the previous templates only had a non-working Django example, you should now be able to choose a Python template when you configure CI on GitLab 10 and above, which should hook you up with normal Python setup procedures like setup.py install and setup.py test.

Selfspy

I mentioned working on a monitoring tool in my last post, because it was a feature from Workrave missing in SafeEyes. It turns out there is already such a tool, called selfspy. I did an extensive review of the software to make sure it wouldn't leak confidential information before using it, and it looks, well... kind of okay. It crashed on me at least once so far, which is too bad, because then it loses track of the precious activity. I have used it at least once to figure out what the heck I worked on during the day, so it's pretty useful. I particularly used it to backtrack my work on feed2exec, as I didn't originally track my time on the project.

Unfortunately, selfspy seems unmaintained. I have proposed a maintenance team and hopefully the project maintainer will respond and at least share access so we don't end up in a situation like linkchecker. I also sent a bunch of pull requests to fix some issues, like being secure by default and fixing the build. Apart from the crash, the main issue I have found with the software is that it doesn't detect idle time, which means certain apps are disproportionately represented in the statistics. There are also some weaknesses in the crypto that should be addressed for people who encrypt their database.

Next step is to package selfspy in Debian which should hopefully be simple enough...

Restic documentation security

As part of a documentation patch on the Restic backup software, I improved on my previous Perl script to snoop on process command-line arguments. A common flaw in shell scripts and cron jobs is to pass secret material through the environment (usually safe) but often through command-line arguments (definitely not safe). The challenge, in this peculiar case, was the env binary, but the last time I encountered such an issue was with the Drush command-line tool, which was passing database credentials in the clear to the mysql binary. Using my Perl sniffer, I could get to 60 checks per second (or 60Hz). After reimplementing it in Python, this number went up to 160Hz, which still wasn't enough to catch the elusive env command, which is much faster at hiding arguments than MySQL, in large part because it simply does an execve() once the environment is set up.

Eventually, I just went crazy and rewrote the whole thing in C, which was able to reach 700-900Hz and did catch the env command about 10-20% of the time. I could probably have gotten better results by simply walking /proc myself (since that is what all those libraries do in the end), but by then my point was made. I was able to prove to the restic author the security issues that warranted the warning. It's too bad I need to repeat this again and again, but then my tools are getting better at proving the issue... I suspect it's not the last time I will have to deal with this, and I am happy to think that I can come up with an even more efficient proof-of-concept tool the next time around.
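All three implementations boil down to the same loop: poll the process table as fast as possible and look for the secret. A simplified Python version of that idea (the actual scripts differ) looks like this:

    import os

    SECRET = b"hunter2"  # hypothetical secret expected to leak

    while True:  # poll as fast as possible; the loop rate bounds what we catch
        for pid in filter(str.isdigit, os.listdir("/proc")):
            try:
                with open("/proc/%s/cmdline" % pid, "rb") as f:
                    argv = f.read().replace(b"\0", b" ")
            except OSError:
                continue  # process exited between listdir() and open()
            if SECRET in argv:
                print(pid, argv.decode(errors="replace"))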

Ansible 101

After working on documentation last month, I ended up writing my first Ansible playbook this month, converting my tasksel list to a working Ansible configuration. This was a useful exercise: it allowed me to find a bunch of packages which have been removed from Debian, and it provides much better usability than tasksel. For example, it provides a --diff argument that shows which packages are missing from a given setup.
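For the record, such a playbook is mostly a package list fed to the apt module; here is roughly what it looks like (a sketch with illustrative package names, not my actual configuration):

    # desktop.yml -- apply with: ansible-playbook -i localhost, -c local desktop.yml
    - hosts: localhost
      become: true
      tasks:
        - name: install the desktop package set
          apt:
            name: [git, vim, irssi]
            state: present

Running it with --check previews what would change without touching the system.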

I am still unsure about Ansible. Manifests do seem really verbose and I still can't get used to the YAML DSL. I could probably have done the same thing with Puppet and just run puppet apply on the resulting config. But I must admit my bias towards Python is showing here: I can't help but think Puppet is going to be way less accessible with its rewrite in Clojure and C (!)... But then again, I really like Puppet's approach of having generic types like package or service rather than Ansible's clunky apt/yum/dnf/package/win_package types...

Pat and Ham radio

After responding (too late) to a request for volunteers to help in Puerto Rico, I realized that my amateur radio skills were somewhat lacking in the "packet" (data transmission in ham jargon) domain, as I wasn't used to operate a Winlink node. Such a node can receive and transmit actual emails over the airwaves, for free, without direct access to the internet, which is very useful in disaster relief efforts. Through summary research, I stumbled upon the new and very promising Pat project which provides one of the first user-friendly Linux-compatible Winlink programs. I provided improvements on the documentation and some questions regarding compatibility issues which are still pending.

But my pet issue is the establishment of pat as a normal internet citizen by using standard protocols for receiving and sending email. Not sure how that can be implemented, but we'll see. I am also hoping to upload an official Debian package and hopefully write more about this soon. Stay tuned!

Random stuff

I ended up fixing my Kodi issue by starting it as a standalone systemd service instead of through gdm3, which is now completely disabled on the media box. I simply used the following /etc/systemd/system/kodi.service file:

    [Unit]
    Description=Kodi Media Center
    After=systemd-user-sessions.service network.target sound.target

    [Service]
    User=xbmc
    Group=video
    Type=simple
    TTYPath=/dev/tty7
    StandardInput=tty
    ExecStart=/usr/bin/xinit /usr/bin/dbus-launch --exit-with-session /usr/bin/kodi-standalone -- :1 -nolisten tcp vt7
    Restart=on-abort
    RestartSec=5

    [Install]
    WantedBy=multi-user.target

The downside of this is that it needs Xorg to run as root, whereas modern Xorg can now run rootless. Not sure how to fix this or where... But if I put needs_root_rights=no in Xwrapper.config, I get the following error in .local/share/xorg/Xorg.1.log:

[ 2502.533] (EE) modeset(0): drmSetMaster failed: Permission denied

After fooling around with IPython, I ended up trying the xonsh shell, which is supposed to provide a bash-compatible Python shell environment. Unfortunately, I found it pretty unusable as a shell: it works fine for doing Python stuff, but all my environment and legacy bash configuration files were basically ignored, so I couldn't get up and running quickly. This is too bad because the project looked very promising...

Finally, one of my TLS hosts using a Let's Encrypt certificate wasn't renewing properly, and I figured out why. It turns out the ProxyPass command was passing everything to the backend, including the /.well-known requests, which obviously broke ACME verification. The solution was simple enough, disable the proxy for that directory:

ProxyPass /.well-known/ !
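Note that ProxyPass rules are evaluated in configuration order and the first match wins, so the exclusion has to appear before the catch-all rule; the end result looks something like this (backend URL illustrative):

    ProxyPass /.well-known/ !
    ProxyPass / http://127.0.0.1:8080/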
Categories: External Blogs

My free software activities, September 2017

Anarcat - Mon, 10/02/2017 - 18:55
Debian Long Term Support (LTS)

This is my monthly Debian LTS report. I mostly worked on the git, git-annex and ruby packages this month but didn't have time to completely use my allocated hours because I started too late in the month.

Ruby

I was hoping someone would pick up the Ruby work I submitted in August, but it seems no one wanted to touch that mess, understandably. Since then, new issues came up, and not only did I have to work on the rubygems and ruby1.9 package, but now the ruby1.8 package also had to get security updates. Yes: it's bad enough that the rubygems code is duplicated in one other package, but wheezy had the misfortune of having two Ruby versions supported.

The Ruby 1.9 also failed to build from source because of test suite issues, which I haven't found a clean and easy fix for, so I ended up making test suite failures non-fatal in 1.9, which they were already in 1.8. I did keep a close eye on changes in the test suite output to make sure tests introduced in the security fixes would pass and that I wouldn't introduce new regressions as well.

So I published the following advisories:

  • ruby 1.8: DLA-1113-1, fixing CVE-2017-0898 and CVE-2017-10784. 1.8 doesn't seem affected by CVE-2017-14033 as the provided test does not fail (but it does fail in 1.9.1). test suite was, before patch:

    2199 tests, 1672513 assertions, 18 failures, 51 errors

    and after patch:

    2200 tests, 1672514 assertions, 18 failures, 51 errors
  • rubygems: uploaded the package prepared in August as is in DLA-1112-1, fixing CVE-2017-0899, CVE-2017-0900, CVE-2017-0901. here the test suite passed normally.

  • ruby 1.9: here I used the used 2.2.8 release tarball to generate a patch that would cover all issues and published DLA-1114-1 that fixes the CVEs of the two packages above. the test suite was, before patches:

    10179 tests, 2232711 assertions, 26 failures, 23 errors, 51 skips

    and after patches:

    1.9 after patches (B): 10184 tests, 2232771 assertions, 26 failures, 23 errors, 53 skips
Git

I also quickly issued an advisory (DLA-1120-1) for CVE-2017-14867, an odd issue affecting git in wheezy. The backport was tricky because it wouldn't apply cleanly and the git package had a custom patching system which made it tricky to work on.

Git-annex

I did a quick stint on git-annex as well: I was able to reproduce the issue and confirm an approach to fixing the issue in wheezy, although I didn't have time to complete the work before the end of the month.

Other free software work New project: feed2exec

I should probably make a separate blog post about this, but ironically, I don't want to spend too much time writing those reports, so this will be quick.

I wrote a new program, called feed2exec. It's basically a combination of feed2imap, rss2email and feed2tweet: it allows you to fetch RSS feeds and send them in a mailbox, but what's special about it, compared to the other programs above, is that it is more generic: you can basically make it do whatever you want on new feed items. I have, for example, replaced my feed2tweet instance with it, using this simple configuration:

[anarcat] url = https://anarc.at/blog/index.rss output = feed2exec.plugins.exec args = tweet "%(title)0.70s %(link)0.70s"

The sample configuration file also has examples to talk with Mastodon, Pump.io and, why not, a torrent server to download torrent files available over RSS feeds. A trivial configuration can also make it work as a crude podcast client. My main motivation to work on this was that it was difficult to extend feed2imap to do what I needed (which was to talk to transmission to download torrent files) and rss2email didn't support my workflow (which is delivering to feed-specific mail folders). Because both projects also seemed abandoned, it seemed like a good idea at the time to start a new one, although the rss2email community has now restarted the project and may produce interesting results.

As an experiment, I tracked my time working on this project. It turns out it took about 45 hours to write that software. Considering feed2exec is about 1400 SLOC, that's 30 lines of code per hour. I don't know if that's slow or fast, but it's an interesting metric for future projects. It sure seems slow to me, but we need to keep in mind that those 30 lines of code per hour don't account for documentation and repeated head-banging on the keyboard. For example, I found two issues with the upstream feedparser package, which I use to parse feeds and which also seems unmaintained, unfortunately.

Feed2exec is beta software at this point, but it's working well enough for me and the design is much simpler than that of the other programs of the kind. The main problems people can expect at this point are formatting issues or parse errors on exotic feeds, and noisy error messages on network errors, all of which should be fairly easy to fix in the test suite. I hope it will be useful for the community and, as usual, I welcome contributions, help and suggestions on how to improve the software.

More Python templates

As part of the work on feed2exec, I cleaned up a few things in the ecdysis project, mostly to hook the tests up in the CI, improve the advancedConfig logger, and clean up more stuff.

While I was there, it turns out that I built a pretty decent basic CI configuration for Python on GitLab. Whereas the previous templates only had a non-working Django example, you should now be able to choose a Python template when you configure CI on GitLab 10 and above, which should hook you up with normal Python setup procedures like setup.py install and setup.py test.
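
For illustration, a minimal configuration in that spirit could look like the following .gitlab-ci.yml; this is a sketch of the general approach, not the exact template GitLab ships:

    # minimal sketch of a Python CI configuration for GitLab
    image: python:3

    test:
      script:
        - python setup.py install
        - python setup.py test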

Selfspy

I mentioned working on a monitoring tool in my last post, because it was a feature from Workrave missing in SafeEyes. It turns out there is already such a tool, called selfspy. I did an extensive review of the software to make sure it wouldn't leak confidential information before using it, and it looks, well... kind of okay. It crashed on me at least once so far, which is too bad because then it loses track of the precious activity. I have used it at least once to figure out what the heck I worked on during the day, so it's pretty useful. I particularly used it to backtrack my work on feed2exec, as I didn't originally track my time on the project.

Unfortunately, selfspy seems unmaintained. I have proposed a maintenance team and hopefully the project maintainer will respond and at least share access so we don't end up in a situation like linkchecker. I also sent a bunch of pull requests to fix some issues, like being secure by default and fixing the build. Apart from the crash, the main issue I have found with the software is that it doesn't detect idle time, which means certain apps are disproportionately represented in the statistics. There are also some weaknesses in the crypto that should be addressed for people who encrypt their database.

Next step is to package selfspy in Debian which should hopefully be simple enough...

Restic documentation security

As part of a documentation patch on the Restic backup software, I improved on my previous Perl script to snoop on process commandline arguments. A common flaw in shell scripts and cron jobs is to pass secret material through commandline arguments (definitely not safe) rather than the environment (usually safe). The challenge, in this peculiar case, was the env binary; the last time I encountered such an issue was with the Drush commandline tool, which was passing database credentials in the clear to the mysql binary. Using my Perl sniffer, I could get to 60 checks per second (or 60Hz). After reimplementing it in Python, this number went up to 160Hz, which still wasn't enough to catch the elusive env command, which is much faster at hiding its arguments than MySQL, in large part because it simply does an execve() once the environment is set up.

Eventually, I just went crazy and rewrote the whole thing in C, which was able to reach 700-900Hz and did catch the env command about 10-20% of the time. I could probably have gotten better results by simply walking /proc myself (since this is what all those libraries do in the end), but my point was made: I was able to prove to the restic author the security issues that warranted the warning. It's too bad I need to repeat this demonstration again and again, but at least my tools are getting better at proving the issue... I suspect it's not the last time I have to deal with this and I am happy to think that I can come up with an even more efficient proof-of-concept tool the next time around.
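
Walking /proc is simple enough that the core of such a sniffer fits in a few lines of Python. Here is a sketch of the general idea, not the actual script I used:

    #!/usr/bin/python3
    # sketch of a /proc-based commandline sniffer: busy-loop over all
    # processes and print every distinct command line seen; this is an
    # illustration of the idea, not the actual script mentioned above
    import glob

    seen = set()
    while True:
        for path in glob.glob('/proc/[0-9]*/cmdline'):
            try:
                with open(path, 'rb') as f:
                    # arguments are NUL-separated in /proc/PID/cmdline
                    cmdline = f.read().replace(b'\0', b' ').decode(errors='replace')
            except OSError:
                continue  # the process may vanish between glob and open
            if cmdline and cmdline not in seen:
                seen.add(cmdline)
                print(cmdline)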

Ansible 101

After working on documentation last month, I ended up writing my first Ansible playbook this month, converting my tasksel list to a working Ansible configuration. This was a useful exercise: it allowed me to find a bunch of packages which have been removed from Debian, and it provides much better usability than tasksel. For example, it provides a --diff argument that shows which packages are missing from a given setup.
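
As an illustration, such a conversion boils down to a playbook along these lines; this is a minimal sketch and the package names are placeholders, not my actual tasksel list:

    # minimal sketch: package names below are placeholders
    - hosts: localhost
      become: true
      tasks:
        - name: install the desktop package list
          apt:
            name: "{{ item }}"
            state: present
          with_items:
            - git
            - emacs
            - irssi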

I am still unsure about Ansible. Manifests do seem really verbose and I still can't get used to the YAML DSL. I could probably have done the same thing with Puppet and just run puppet apply on the resulting config. But I must admit my bias towards Python is showing here: I can't help but think Puppet is going to be way less accessible with its rewrite in Clojure and C (!)... But then again, I really like Puppet's approach of having generic types like package or service rather than Ansible's clunky apt/yum/dnf/package/win_package types...

Pat and Ham radio

After responding (too late) to a request for volunteers to help in Puerto Rico, I realized that my amateur radio skills were somewhat lacking in the "packet" (data transmission in ham jargon) domain, as I wasn't used to operating a Winlink node. Such a node can receive and transmit actual emails over the airwaves, for free, without direct access to the internet, which is very useful in disaster relief efforts. Through some cursory research, I stumbled upon the new and very promising Pat project, which provides one of the first user-friendly Linux-compatible Winlink programs. I provided improvements on the documentation and asked some questions regarding compatibility issues which are still pending.

But my pet issue is the establishment of pat as a normal internet citizen by using standard protocols for receiving and sending email. Not sure how that can be implemented, but we'll see. I am also hoping to upload an official Debian package and hopefully write more about this soon. Stay tuned!

Random stuff

I ended up fixing my Kodi issue by starting it as a standalone systemd service instead of through gdm3, which is now completely disabled on the media box. I simply used the following /etc/systemd/system/kodi.service file:

[Unit]
Description=Kodi Media Center
After=systemd-user-sessions.service network.target sound.target

[Service]
User=xbmc
Group=video
Type=simple
TTYPath=/dev/tty7
StandardInput=tty
ExecStart=/usr/bin/xinit /usr/bin/dbus-launch --exit-with-session /usr/bin/kodi-standalone -- :1 -nolisten tcp vt7
Restart=on-abort
RestartSec=5

[Install]
WantedBy=multi-user.target

The downside of this is that it needs Xorg to run as root, whereas modern Xorg can now run rootless. Not sure how to fix this or where... But if I put needs_root_rights=no in Xwrapper.config, I get the following error in .local/share/xorg/Xorg.1.log:

[ 2502.533] (EE) modeset(0): drmSetMaster failed: Permission denied

After fooling around with iPython, I ended up trying the xonsh shell, which is supposed to provide a bash-compatible Python shell environment. Unfortunately, I found it pretty unusable as a shell: it works fine for Python stuff, but all my environment and legacy bash configuration files were basically ignored, so I couldn't get a working setup quickly. This is too bad because the project looked very promising...

Finally, one of my TLS hosts using a Let's Encrypt certificate wasn't renewing properly, and I figured out why: the ProxyPass directive was passing everything to the backend, including the /.well-known requests, which obviously broke ACME verification. The solution was simple enough, disable the proxy for that directory:

ProxyPass /.well-known/ !
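
Note that ordering matters here: ProxyPass directives are evaluated in configuration order and the first match wins, so the exclusion must appear before the general proxy rule. Something like this, where the backend URL is a placeholder:

    ProxyPass /.well-known/ !
    ProxyPass / http://localhost:8080/
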
Categories: External Blogs

Strategies for offline PGP key storage

Anarcat - Mon, 10/02/2017 - 12:00

While the adoption of OpenPGP by the general population is marginal at best, it is a critical component for the security community and particularly for Linux distributions. For example, every package uploaded into Debian is verified by the central repository using the maintainer's OpenPGP keys and the repository itself is, in turn, signed using a separate key. If upstream packages also use such signatures, this creates a complete trust path from the original upstream developer to users. Beyond that, pull requests for the Linux kernel are verified using signatures as well. Therefore, the stakes are high: a compromise of the release key, or even of a single maintainer's key, could enable devastating attacks against many machines.

That has led the Debian community to develop a good grasp of best practices for cryptographic signatures (which are typically handled using GNU Privacy Guard, also known as GnuPG or GPG). For example, weak (less than 2048 bits) and vulnerable PGPv3 keys were removed from the keyring in 2015, and there is a strong culture of cross-signing keys between Debian members at in-person meetings. Yet even Debian developers (DDs) do not seem to have established practices on how to actually store critical private key material, as we can see in this discussion on the debian-project mailing list. That email boiled down to a simple request: can I have a "key dongles for dummies" tutorial? Key dongles, or keycards as we'll call them here, are small devices that allow users to store keys on an offline device and provide one possible solution for protecting private key material. In this article, I hope to use my experience in this domain to clarify the issue of how to store those precious private keys that, if compromised, could enable arbitrary code execution on millions of machines all over the world.

Why store keys offline?

Before we go into details about storing keys offline, it may be useful to do a small reminder of how the OpenPGP standard works. OpenPGP keys are made of a main public/private key pair, the certification key, used to sign user identifiers and subkeys. My public key, shown below, has the usual main certification/signature key (marked SC) but also an encryption subkey (marked E), a separate signature key (S), and two authentication keys (marked A) which I use as RSA keys to log into servers using SSH, thanks to the Monkeysphere project.

pub   rsa4096/792152527B75921E 2009-05-29 [SC] [expires: 2018-04-19]
      8DC901CE64146C048AD50FBB792152527B75921E
uid           [ultimate] Antoine Beaupré <anarcat@anarc.at>
uid           [ultimate] Antoine Beaupré <anarcat@koumbit.org>
uid           [ultimate] Antoine Beaupré <anarcat@orangeseeds.org>
uid           [ultimate] Antoine Beaupré <anarcat@debian.org>
sub   rsa2048/B7F648FED2DF2587 2012-07-18 [A]
sub   rsa2048/604E4B3EEE02855A 2012-07-20 [A]
sub   rsa4096/A51D5B109C5A5581 2009-05-29 [E]
sub   rsa2048/3EA1DDDDB261D97B 2017-08-23 [S]

All the subkeys (sub) and identities (uid) are bound by the main certification key using cryptographic self-signatures. So while an attacker stealing a private subkey can spoof signatures in my name or authenticate to other servers, that key can always be revoked by the main certification key. But if the certification key gets stolen, all bets are off: the attacker can create or revoke identities or subkeys as they wish. In a catastrophic scenario, an attacker could even steal the key and remove your copies, taking complete control of the key, without any possibility of recovery. Incidentally, this is why it is so important to generate a revocation certificate and store it offline.

So by moving the certification key offline, we reduce the attack surface on the OpenPGP trust chain: day-to-day keys (e.g. email encryption or signature) can stay online but if they get stolen, the certification key can revoke those keys without having to revoke the main certification key as well. Note that a stolen encryption key is a different problem: even if we revoke the encryption subkey, this will only affect future encrypted messages. Previous messages will be readable by the attacker with the stolen subkey even if that subkey gets revoked, so the benefits of revoking encryption certificates are more limited.

Common strategies for offline key storage

Considering the security tradeoffs, some propose storing those critical keys offline to reduce those threats. But where exactly? In an attempt to answer that question, Jonathan McDowell, a member of the Debian keyring maintenance team, said that there are three options: use an external LUKS-encrypted volume, an air-gapped system, or a keycard.

Full-disk encryption like LUKS adds an extra layer of security by hiding the content of the key from an attacker. Even though private keyrings are usually protected by a passphrase, they are easily identifiable as a keyring. But when a volume is fully encrypted, it's not immediately obvious to an attacker there is private key material on the device. According to Sean Whitton, another advantage of LUKS over plain GnuPG keyring encryption is that you can pass the --iter-time argument when creating a LUKS partition to increase key-derivation delay, which makes brute-forcing much harder. Indeed, GnuPG 2.x doesn't have a run-time option to configure the key-derivation algorithm, although a patch was introduced recently to make the delay configurable at compile time in gpg-agent, which is now responsible for all secret key operations.
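
For example, something like the following, where the device name is a placeholder, raises the key-derivation delay to five seconds (--iter-time takes milliseconds):

    cryptsetup luksFormat --iter-time 5000 /dev/sdX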

The downside of external volumes is complexity: GnuPG makes it difficult to extract secrets out of its keyring, which makes the first setup tricky and error-prone. This is easier in the 2.x series thanks to the new storage system and the associated keygrip files, but it still requires arcane knowledge of GPG internals. It is also inconvenient to use secret keys stored outside your main keyring when you actually do need to use them, as GPG doesn't know where to find those keys anymore.

Another option is to set up a separate air-gapped system to perform certification operations. An example is the PGP clean room project, which is a live system based on Debian and designed by DD Daniel Pocock to operate an OpenPGP and X.509 certificate authority using commodity hardware. The basic principle is to store the secrets on a different machine that is never connected to the network and, therefore, not exposed to attacks, at least in theory. I have personally discarded that approach because I feel air-gapped systems provide a false sense of security: data eventually does need to come in and out of the system, somehow, even if only to propagate signatures out of the system, which exposes the system to attacks.

System updates are similarly problematic: to keep the system secure, timely security updates need to be deployed to the air-gapped system. A common use pattern is to share data through USB keys, which introduce a vulnerability where attacks like BadUSB can infect the air-gapped system. From there, there is a multitude of exotic ways of exfiltrating the data using LEDs, infrared cameras, or the good old TEMPEST attack. I therefore concluded the complexity tradeoffs of an air-gapped system are not worth it. Furthermore, the workflow for air-gapped systems is complex: even though PGP clean room went a long way, it's still lacking even simple scripts that allow signing or transferring keys, which is a problem shared by the external LUKS storage approach.

Keycard advantages

The approach I have chosen is to use a cryptographic keycard: an external device, usually connected through the USB port, that stores the private key material and performs critical cryptographic operations on behalf of the host. For example, the FST-01 keycard can perform RSA and ECC public-key decryption without ever exposing the private key material to the host. In effect, a keycard is a miniature computer that performs restricted computations for another host. Keycards usually support multiple "slots" to store subkeys. The OpenPGP standard specifies that three subkeys are available by default: for signature, authentication, and encryption. Finally, keycards can have an actual physical keypad to enter passwords so a potential keylogger cannot capture them, although the keycards I have access to do not feature such a keypad.

We could easily draw a parallel between keycards and an air-gapped system; in effect, a keycard is a miniaturized air-gapped computer and suffers from similar problems. An attacker can intercept data on the host system and attack the device in the same way, if not more easily, because a keycard is actually "online" (i.e. clearly not air-gapped) when connected. The advantage over a fully-fledged air-gapped computer, however, is that the keycard implements only a restricted set of operations. So it is easier to create an open hardware and software design that is audited and verified, which is much harder to accomplish for a general-purpose computer.

Like air-gapped systems, keycards address the scenario where an attacker wants to get the private key material. While an attacker could fool the keycard into signing or decrypting some data, this is possible only while the keycard is physically connected, and the keycard software will prompt the user for a password before doing the operation, though the keycard can cache the password for some time. In effect, it thwarts offline attacks: to brute-force the key's password, the attacker needs to be on the target system and try to guess the keycard's password, and the keycard will lock itself after a limited number of tries. It also provides a clean and standard interface to store keys offline: a single GnuPG command moves private key material to a keycard (the keytocard command in the --edit-key interface), whereas moving private key material to a LUKS-encrypted device or air-gapped computer is more complex.

Keycards are also useful if you operate on multiple computers. A common problem when using GnuPG on multiple machines is how to safely copy and synchronize private key material among different devices, which introduces new security problems. Indeed, a "good rule of thumb in a forensics lab", according to Robert J. Hansen on the GnuPG mailing list, is to "store the minimum personal data possible on your systems". Keycards provide the best of both worlds here: you can use your private key on multiple computers without actually storing it in multiple places. In fact, Mike Gerwitz went as far as saying:

For users that need their GPG key on multiple boxes, I consider a smartcard to be essential. Otherwise, the user is just furthering her risk of compromise.

Keycard tradeoffs

As Gerwitz hinted, there are multiple downsides to using a keycard, however. Another DD, Wouter Verhelst, clearly expressed the tradeoffs:

Smartcards are useful. They ensure that the private half of your key is never on any hard disk or other general storage device, and therefore that it cannot possibly be stolen (because there's only one possible copy of it).

Smartcards are a pain in the ass. They ensure that the private half of your key is never on any hard disk or other general storage device but instead sits in your wallet, so whenever you need to access it, you need to grab your wallet to be able to do so, which takes more effort than just firing up GnuPG. If your laptop doesn't have a builtin cardreader, you also need to fish the reader from your backpack or wherever, etc.

"Smartcards" here refer to older OpenPGP cards that relied on the IEC 7816 smartcard connectors and therefore needed a specially-built smartcard reader. Newer keycards simply use a standard USB connector. In any case, it's true that having an external device introduces new issues: attackers can steal your keycard, you can simply lose it, or wash it with your dirty laundry. A laptop or a computer can also be lost, of course, but it is much easier to lose a small USB keycard than a full laptop — and I have yet to hear of someone shoving a full laptop into a washing machine. When you lose your keycard, unless a separate revocation certificate is available somewhere, you lose complete control of the key, which is catastrophic. But, even if you revoke the lost key, you need to create a new one, which involves rebuilding the web of trust for the key — a rather expensive operation as it usually requires meeting other OpenPGP users in person to exchange fingerprints.

You should therefore think about how to back up the certification key, which is a problem that already exists for online keys; of course, everyone has a revocation certificate and backups of their OpenPGP keys... right? In the keycard scenario, backups may be multiple keycards distributed geographically.

Note that, contrary to an air-gapped system, a key generated on a keycard cannot be backed up, by design. For subkeys, this is not a problem as they do not need to be backed up (except encryption keys). But, for a certification key, this means users need to generate the key on the host and transfer it to the keycard, which means the host is expected to have enough entropy to generate cryptographic-strength random numbers, for example. Also consider the possibility of combining different approaches: you could, for example, use a keycard for day-to-day operation, but keep a backup of the certification key on a LUKS-encrypted offline volume.

Keycards introduce a new element into the trust chain: you need to trust the keycard manufacturer to not have any hostile code in the key's firmware or hardware. In addition, you need to trust that the implementation is correct. Keycards are harder to update: the firmware may be deliberately inaccessible to the host for security reasons or may require special software to manipulate. Keycards may be slower than the CPU in performing certain operations because they are small embedded microcontrollers with limited computing power.

Finally, keycards may encourage users to trust multiple machines with their secrets, which works against the "minimum personal data" principle. A completely different approach called the trusted physical console (TPC) does the opposite: instead of trying to get private key material onto all of those machines, just have them on a single machine that is used for everything. Unlike a keycard, the TPC is an actual computer, say a laptop, which has the advantage of needing no special procedure to manage keys. The downside is, of course, that you actually need to carry that laptop everywhere you go, which may be problematic, especially in some corporate environments that restrict bringing your own devices.

Quick keycard "howto"

Getting keys onto a keycard is easy enough:

  1. Start with a temporary key to test the procedure:

    export GNUPGHOME=$(mktemp -d)
    gpg --generate-key
  2. Edit the key using its user ID (UID):

    gpg --edit-key UID
  3. Use the key command to select the first subkey, then copy it to the keycard (you can also use the addcardkey command to just generate a new subkey directly on the keycard):

    gpg> key 1
    gpg> keytocard
  4. If you want to move the subkey, use the save command, which will remove the local copy of the private key, so the keycard will be the only copy of the secret key. Otherwise, use the quit command to save the key on the keycard but keep the secret key in your normal keyring; answer "n" to "save changes?" and "y" to "quit without saving?". This way the keycard acts as a backup of your secret key.

  5. Once you are satisfied with the results, repeat steps 1 through 4 with your normal keyring (unset GNUPGHOME first)

When a key is moved to a keycard, --list-secret-keys will show it as sec> (or ssb> for subkeys) instead of the usual sec keyword. If the key is completely missing (for example, if you moved it to a LUKS container), the # sign is used instead. If you need to use a key from a keycard backup, you simply do gpg --card-edit with the key plugged in, then type the fetch command at the prompt to fetch the public key that corresponds to the private key on the keycard (which stays on the keycard). This is the same procedure as the one to use the secret key on another computer.
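
For example, with the certification key stored on an offline LUKS volume and the subkeys moved to a keycard, a listing of the key shown earlier could look something like this (an illustration, not actual gpg output):

    sec#  rsa4096/792152527B75921E 2009-05-29 [SC] [expires: 2018-04-19]
    ssb>  rsa2048/B7F648FED2DF2587 2012-07-18 [A]
    ssb>  rsa2048/604E4B3EEE02855A 2012-07-20 [A]
    ssb>  rsa4096/A51D5B109C5A5581 2009-05-29 [E]
    ssb>  rsa2048/3EA1DDDDB261D97B 2017-08-23 [S]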

Conclusion

There are already informal OpenPGP best-practices guides out there and some recommend storing keys offline, but they rarely explain what exactly that means. Storing your primary secret key offline is important in dealing with possible compromises and we examined the main ways of doing so: an air-gapped system, a LUKS-encrypted keyring, or a keycard. Each approach has its own tradeoffs, but I recommend getting familiar with keycards if you use multiple computers and want a standardized interface with minimal configuration trouble.

And of course, those approaches can be combined. This tutorial, for example, uses a keycard on an air-gapped computer, which neatly resolves the question of how to transmit signatures between the air-gapped system and the world. It is definitely not for the faint of heart, however.

Once one has decided to use a keycard, the next order of business is to choose a specific device. That choice will be addressed in a followup article, where I will look at performance, physical design, and other considerations.

This article first appeared in the Linux Weekly News.

Categories: External Blogs

Montréal-Python 66: Saxophonic Twist

Montreal Python - Mon, 09/25/2017 - 23:00

It's back-to-everything and Montreal-Python is no exception! Join us at our first meetup of Fall and meet us at Benelux for some networking after the event!

We still have lightning talk spots (5 minutes). If you are interested, send us an email at team@montrealpython.org

Montreal-Python is open to everybody, no matter your Python level.

When

October 2nd, 2017 at 6PM

• 6:00PM - Doors will open

• 6:30PM - Start of presentations

• 7:15PM - Break

• 8:00PM - End of event

• 8:30PM - Networking at Benelux - Sherbrooke

Where

Shopify - 490 de la Gauchetière Ouest suite 300, Montréal, QC

Lightning Talks

Making graphical interfaces from geometric primitives - Zhentao Li

Presentations

Introduction to Apache Airflow - Merouane Benthameur

Data integration tools and workflow schedulers come in huge variety these days, but most of them are commercial. The Apache Incubator has recently introduced Airflow, which was originally developed by Airbnb.

Airflow is an open source platform to programmatically author, schedule and monitor data pipelines. In the talk, we will cover the principal components and functionality of Airflow, as well as integrating and managing workflows.

On distributed architectures and services patterns - Ismael Serratos

Even though the hype around microservice architectures and serverless applications is starting to die down, we all face the results of it in some way or another. This talk aims to provide a tour of how RPC can go very well or very wrong. At the same time, some comparisons will be made between the most commonly used stacks for creating distributed service architectures.

PyCon Canada Early Bird Tickets

Also, a little reminder that Early Bird tickets for PyCon Canada (which will be held in Montreal on November 18th to 21st) are now available at https://2017.pycon.ca/

The early bird rates are only for a limited quantity of tickets, so get yours soon!

PyCon Canada Sponsorship

Would you like to become a sponsor for PyCon Canada? Send an email to sponsorship@pycon.ca

We’d like to thank our sponsors for their continued support:

• Shopify

• UQÀM

• Benelux

• Savoir-faire Linux

Categories: External Blogs

Montréal-Python 66: Call For Speakers

Montreal Python - Wed, 09/20/2017 - 23:00

It's back-to-everything and Montreal-Python is no exception! We are looking for speakers for our first meetup of fall.

We are looking for speakers that want to give a regular presentation (20 to 25 minutes) or a lightning talk (5 minutes).

Submit your proposal at team@montrealpython.org

When

October 2nd, 2017 at 6PM

Where

TBD

PyCon Canada Early Bird Tickets

Also, a little reminder that Early Bird tickets for PyCon Canada (which will be held in Montreal on November 18th to 21st) are now available at https://2017.pycon.ca/.

The early bird rates are only for a limited quantity of tickets, so get yours soon!

PyCon Canada Sponsorship

Would you like to become a sponsor for PyCon Canada? Send an email to sponsorship@pycon.ca

Categories: External Blogs

My free software activities, August 2017

Anarcat - Sat, 09/02/2017 - 15:16
Debian Long Term Support (LTS)

This is my monthly Debian LTS report. This month I worked on a few major packages that took a long time instead of multiple smaller issues. Affected packages were Mercurial, libdbd-mysql-perl and Ruby.

Mercurial updates

Mercurial was vulnerable to two CVEs: CVE-2017-1000116 (command injection on clients through malicious ssh URLs) and CVE-2017-1000115 (path traversal via symlink). The former is an issue that actually affects many other similar programs, such as Git (CVE-2017-1000117), Subversion (CVE-2017-9800) and even CVS (CVE-2017-12836). The latter, the symlink issue, is a distinct problem that came up during an internal audit.

The fix, shipped as DLA-1072-1, involved a rather difficult backport, especially because the Mercurial test suite takes a long time to complete. This reminded me of the virtues of DEB_BUILD_OPTIONS=parallel=4, which sped up the builds considerably. I also discovered that the Wheezy build chain doesn't support sbuild's --source-only-changes flag which I had hardcoded in my sbuild.conf file. This seems to be simply because sbuild passes --build=source to dpkg-buildpackage, an option that is supported only in jessie or later.

libdbd-mysql-perl

I have worked on fixing two issues with the libdbd-mysql-perl package, CVE-2017-10788 and CVE-2017-10789, which resulted in the DLA-1079-1 upload. Behind this mysteriously named package sits a critical piece of infrastructure, namely the mysql commandline client, which is probably used and abused by hundreds if not thousands of home-made scripts, but also all of Perl's MySQL support, which is probably used by an even larger base of software.

Through the Debian bug reports (Debian bug #866818 and Debian bug #866821), I have learned that the patches existed in the upstream tracker but were either ignored or even reverted in the latest 4.043 upstream release. It turns out that there are talks of forking that library because of maintainership issues. It blows my mind that such an important part of MySQL is basically unmaintained.

I ended up backporting the upstream patches, which was also somewhat difficult because of the long-standing issues with SSL support in MySQL. The backport there was particularly hard to test, as you need to run the test suite by hand, twice: once with a server configured with a (valid!) SSL certificate and once without (!). I'm wondering how much time it is really worth spending on trying to fix SSL in MySQL, however. It has been badly broken forever, and while the patch is an improvement, I would still never trust SSL transports in MySQL over an untrusted network. The few people I know who use such transports wrap their connections in a simpler stunnel instead.

The other issue was easier to fix so I submitted a pull request upstream to make sure that work isn't lost, although it is not clear what the future of that patch (or project!) will be at this point.

Rubygems

I also worked on the rubygems issues, which, thanks to the "vendoring" practice of the Ruby community, also affect the ruby1.9 package. Four distinct CVEs were triaged here (CVE-2017-0899, CVE-2017-0900, CVE-2017-0901 and CVE-2017-0902) and I determined the latter issue didn't affect wheezy, as rubygems doesn't do its own DNS resolution there (later versions look up SRV records).

This is another package where the test suite takes a long time to run. Worse, the packages in Wheezy actually fail to build from source: the test suites just fail at various steps, particularly because of "DH key too small" errors for Rubygems, but also other errors for Ruby. I also had trouble backporting one test, which I simply had to skip for Rubygems. I uploaded and announced test packages and hopefully I'll be able to complete this work soon, although I would certainly appreciate any help on this...

Triage

I took a look at the sox, libvorbis and exiv2 issues. None had fixes available. The sox and exiv2 reports were basically lists of fuzzing issues, which are often minor or at least of unknown severity. Those would have required a significant amount of work and I figured I would prioritize other work first.

I also triaged CVE-2017-7506, which doesn't seem to affect the spice package in wheezy, after doing a fairly thorough audit of the code. The vulnerability is specifically bound to the reds_on_main_agent_monitors_config function, which is simply not present in our older version. A hostile message would fall through the code and not provoke memory allocation or out of bounds access, so I simply marked the wheezy version as not-affected, something which usually happens during the original triage but can also happen during the actual patching work, as in this case.

Other free software work

This describes the volunteer work I do on various free software projects. This month, again, my internal reports show that I spent about the same amount of time on volunteer and paid work, but this estimate is probably wrong because I spent a lot of time at Debconf that I didn't clock in...

Debconf

So I participated in the 17th Debian Conference in Montreal. It was great to see (and make!) so many friends from all over the world in person again, and I was happy to work on specific issues together with other Debian developers. I am especially thankful to David Bremner for fixing the syncing of the flagged tag when added to new messages (patch series). This allows me to easily sync the one tag (inbox) that is not statically assigned during notmuch new, by using flagged as a synchronization tool. In turn, this lets me use notmuch more easily across multiple machines without having to sync all tags with dump/restore or using muchsync, which wasn't working for me (although a new release came out which may fix my issues). The magic incantation looks something like this:

notmuch tag -inbox tag:inbox and not tag:flagged
notmuch tag +inbox not tag:inbox and tag:flagged

However, most of my time in the first week (Debcamp) was spent trying to complete the networking setup: configuring switches, setting up wiring and so on. I also configured an apt-cacher-ng proxy to serve packages to attendees during the conference. I set it up with Avahi to configure clients automatically, which led me to discover (and fix) Debian bug #870321, although there are more issues with the autodiscovery mechanism... I spent extra time documenting the (somewhat simple) configuration of such a server in the Debian wiki because it was not the first time I had to research that procedure...

I somehow thought this was a great time to upgrade my laptop to stretch. Normally, I keep that device running stable because I don't use it often and I don't want to have major traumatizing upgrades every time I leave with it on a trip. But this time was special: there were literally hundreds of Debian developers to help me out if there was trouble. And there was, of course, trouble as it turns out! I had problems with the fonts on my display, because, well, I had suspended (twice) my laptop during the install. The fix was simply to flush the fontconfig cache, and I tried to document this in the fonts wiki page and my upgrades page.

I also gave a short training called Debian packaging 101 which was pretty successful. Like the short presentation I made at the last Montreal BSP, the workshop was based on my quick debian development guide. I'm thinking of expanding this to a larger audience with a "102" course that would discuss more complex packaging problems. But my secret plan (well, secret until now I guess) is to make packaging procedures more uniform in Debian by training new Debian packagers using that same training for the next 2 decades. But I will probably start by just trying to do this again at the next Debconf, if I can attend.

Debian uploads

I also sponsored two packages during Debconf: one was a "scratch an itch" upload (elpa-ivy) which I requested (Debian bug #863216) as part of a larger effort to ship the Emacs elisp packages as Debian packages. The other was an upload of diceware to build the documentation in a separate package and fix other issues I have found in the package during a review.

I also uploaded a bunch of other fixes to the Debian archive:

Signing keys rotation

I also started the process of moving my main OpenPGP certification key by adding a signing subkey. The subkey is stored in a cryptographic token so I can sign things on more than one machine without storing that critical key on all those devices physically.

Unfortunately, this means that I need to do some shenanigans when I want to sign content in my Debian work, because the new subkey takes time to propagate to the Debian archive. For example, I have to specify the primary key with a "bang" when signing packages (debsign -k '792152527B75921E!' ...) or use inline signatures in emails sent for security announcements (since that trick doesn't work in Mutt or Notmuch). I tried to figure out how to better coordinate this next time by reading up on the keyring.debian.org documentation, but there is no fixed date for key changes on the rsync interface. There are "monthly changes", so one's best bet is to look for the last change in their git repository.

GitLab.com and LFS migration

I finally turned off my src.anarc.at git repository service by moving the remaining repos to GitLab. Unfortunately, GitLab removed support for git-annex recently, so I had to migrate my repositories to Git-LFS, which was an interesting experience. LFS is pretty easy to use, definitely simpler than git-annex. It also seems to be a good match for the use-case at hand, which is to store large files (videos, namely) as part of slides for presentations.

It turns out that their migration guide could have been made much simpler. I tried to submit those changes to the documentation but couldn't fork the GitLab EE project to make a patch, so I just documented the issue in the original MR for now. While I was there I filed a feature request to add a new reference shortcut (GL-NNN) after noticing a similar token used on GitHub. This would be a useful addition because I often have numbering conflicts between Debian BTS bug numbers and GitLab issues in packages I maintain there. In particular, I have problems using GitLab issue numbers in Monkeysign, because commit logs end up in Debian changelogs and will be detected by the Debian infrastructure even though those are GitLab bug numbers. Using such a shortcut would avoid detection and such a conflict.

Numpy-stats

I wrote a small tool to extract numeric statistics from a given file. I often do ad-hoc benchmarks where I store a bunch of numbers in a file and then try to make averages and so on. As an exercise in learning NumPy, I figured I would write such a simple tool, called numpy-stats, which probably sounds naive to seasoned Python scientists.

My incentive was that I was trying to figure out what was the distribution of password length in a given password generator scheme. So I wrote this simple script:

for i in `seq 10000` ; do
    # measure the length of four concatenated dictionary words
    shuf -n4 /usr/share/dict/words | tr -d '\n' | wc -c
done > lengths

And then fed that data into the tool:

$ numpy-stats lengths
{
    "max": 60,
    "mean": 33.883293722913464,
    "median": 34.0,
    "min": 14,
    "size": 143060,
    "std": 5.101490225062775
}

I am surprised that there isn't such a tool already: hopefully I am wrong and will just be pointed towards the better alternative in the comments here!
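
For the curious, the core of such a tool is tiny. Here is a minimal sketch of how it could be implemented; this is an illustration, not the actual numpy-stats source:

    #!/usr/bin/python3
    # minimal numpy-stats-like sketch: read one number per line from the
    # given file and dump basic statistics as JSON
    import json
    import sys

    import numpy as np

    data = np.loadtxt(sys.argv[1])
    print(json.dumps({
        "max": float(data.max()),
        "mean": float(data.mean()),
        "median": float(np.median(data)),
        "min": float(data.min()),
        "size": int(data.size),
        "std": float(data.std()),
    }, indent=4, sort_keys=True))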

Safe Eyes

I added screensaver support to the new SafeEyes project, which I am considering as a replacement for the workrave project I have been using for years. I really like how the interruptions basically block the whole screen: way more effective than only blocking the keyboard, because all potential distractions go away.

One feature that is missing is keystroke and mouse movement counting and, of course, an official Debian package, although the latter would be easy to fix because upstream already has an unofficial build. I am thinking of writing my own little tool to count keystrokes, since the overlap between SafeEyes and such a counter isn't absolutely necessary. This is something that workrave does, but there are "idle time" extensions in Xorg that do not need to count keystrokes. There are already certain tools to count input events, but none seem to do what I want (most of them are basically keyloggers). It would be an interesting test to see if it's possible to write something that would work both for Xorg and Wayland at the same time. Unfortunately, preliminary research shows that:

  1. in Xorg, the only way to implement this is to sniff all events, i.e. to implement a keylogger

  2. in Wayland, this is completely unsupported. It seems some compositors could implement such a counter, but that would make the tool compositor-specific or, in other words, unportable

So there is little hope here, which brings to my mind "painmeter" as an appropriate name for this future programming nightmare.

Ansible

I sent my first contribution to the ansible project with a small documentation fix. I had an eye opener recently when I discovered a GitLab ansible prototype that would manipulate GitLab settings. When I first discovered Ansible, I was frustrated by the YAML/Jinja DSL: it felt silly to write all this code in YAML when you are a Python developer. It was great to see reasonably well-written Python code that would do things and delegate the metadata storage (and only that!) to YAML, as opposed to using YAML as a DSL.

So I figured I would look at the Ansible documentation on how this works, but unfortunately, the Ansible documentation is severely lacking in this area. There are broken links (I only fixed one page) and missing pieces. For example, the developing plugins page doesn't explain how to program a plugin at all.

I was told on IRC that: "documentation around developing plugins is sparse in general. the code is the best documentation that exists (right now)". I didn't get a reply when asking which code in particular could provide good examples either. In comparison, Puppet has excellent documentation on how to create custom types, functions and facts. That is definitely a turn-off for a new contributor, but at least my pull request was merged in and I can only hope that seasoned Ansible contributors expand on this critical piece of documentation eventually.

Misc

As you can see, I'm all over the place, as usual. GitHub tells me I "Opened 13 other pull requests in 11 repositories" (emphasis mine), which I guess means on top of the "9 commits in 5 repositories" mentioned earlier. My profile probably tells a more detailed story than what would be useful to mention here. I should also mention how difficult it is to write those reports: I basically do a combination of looking into my GitHub and GitLab profiles, the last 30 days of emails and filesystem changes (!!). In no particular order, a list of changes which may be of interest:

  • font-large (and its alias, font-small): shortcut to send the right escape sequence to rxvt so it changes its font
  • fix-acer: short script to hardcode the modeline (you remember those?!) for my screen which has a broken EDID pin (so autodetection fails, yay Xorg log files...)
  • ikiwiki-pandoc-quickie: fake ikiwiki renderer that (ab)uses pandoc to generate a HTML file with the right stylesheet to preview Markdown as it may look in this blog (the basic template is missing still)
  • git-annex-transfer: a command I've often been missing in git-annex, which is a way to transfer files between remotes without having to copy them locally (upstream feature request)
  • I linked the graphics of the Debian archive software architecture in the Debian wiki in the hope more people notice it.
  • I did some tweaks on my Taffybar to introduce a battery meter and hoping to have temperature sensors, which mostly failed. there's a pending pull request that may bring some sense into this, hopefully.
  • I made two small patches in Monkeysign to fix gpg.conf handling and multiple email output, a dumb bug I cannot believe anyone noticed or reported just yet. Thanks Valerie for the bug report! The upload of this in Debian is pending a review from the release team.
Categories: External Blogs

My free software activities, August 2017

Anarcat - Sat, 09/02/2017 - 15:16
Debian Long Term Support (LTS)

This is my monthly Debian LTS report. This month I worked on a few major packages that took a long time instead of multiple smaller issues. Affected packages were Mercurial, libdbd-mysql-perl and Ruby.

Mercurial updates

Mercurial was vulnerable to two CVEs: CVE-2017-1000116 (command injection on clients through malicious ssh URLs) and CVE-2017-1000115 (path traversal via symlink). The former is an issue that actually affects many other similar software like Git (CVE-2017-1000117), Subversion (CVE-2017-9800) and even CVS (CVE-2017-12836). The latter symlink issue is a distinct issue that came up during an internal audit.

The fix, shipped as DLA-1072-1, involved a rather difficult backport, especially because the Mercurial test suite takes a long time to complete. This reminded me of the virtues of DEB_BUILD_OPTIONS=parallel=4, which sped up the builds considerably. I also discovered that the Wheezy build chain doesn't support sbuild's --source-only-changes flag which I had hardcoded in my sbuild.conf file. This seems to be simply because sbuild passes --build=source to dpkg-buildpackage, an option that is supported only in jessie or later.

libdbd-mysql-perl

I have worked on fixing two issues with the libdbd-mysql-perl package, CVE-2017-10788 and CVE-2017-10789, which resulted in the DLA-1079-1 upload. Behind this mysteriously named package sits a critical piece of infrastructure, namely the mysql commandline client which is probably used and abused by hundreds if not thousands of home-made scripts, but also all of Perl's MySQL support, which is probably used by even a larger base of software.

Through the Debian bug reports (Debian bug #866818 and Debian bug #866821), I have learned that the patches existed in the upstream tracker but were either ignored or even reverted in the latest 4.043 upstream release. It turns out that there are talks of forking that library because of maintainership issue. It blows my mind that such an important part of MySQL is basically unmaintained.

I ended up backporting the upstream patches, which was also somewhat difficult because of the long-standing issues with SSL support in MySQL. The backport there was particularly hard to test, as you need to run that test suite by hand, twice: once with a server configured with a (valid!) SSL certificate and one without (!). I'm wondering how much time it is really worth spending on trying to fix SSL in MySQL, however. It has been badly broken forever, and while the patch is an improvement, I would actually still never trust SSL transports in MySQL over an untrusted network. The few people that I know use such transports wrap their connections around a simpler stunnel instead.

The other issue was easier to fix so I submitted a pull request upstream to make sure that work isn't lost, although it is not clear what the future of that patch (or project!) will be at this point.

Rubygems

I also worked on the rubygems issues, which, thanks to the "vendoring" practice of the Ruby community, also affects the ruby1.9 package. 4 distinct CVEs were triaged here (CVE-2017-0899, CVE-2017-0900, CVE-2017-0901 and CVE-2017-0902) and I determined the latter issue didn't affect wheezy as rubygems doesn't do its own DNS resolution there (later versions lookup SRV records).

This is another package where the test suite takes a long time to run. Worse, the packages in Wheezy actually fails to build from source: the test suites just fail in various steps, particularly because of dh key too small errors for Rubygems, but also other errors for Ruby. I also had trouble backporting one test which I had to simply skip for Rubygems. I uploaded and announced test packages and hopefully I'll be able to complete this work soon, although I would certainly appreciate any help on this...

Triage

I took a look at the sox, libvorbis and exiv2 issues. None had fixes available. sox and exiv2 were basically a list of fuzzing issues, which are often minor or at least of unknown severity. Those would have required a significant amount of work and I figured I would prioritize other work first.

I also triaged CVE-2017-7506, which doesn't seem to affect the spice package in wheezy, after doing a fairly thorough audit of the code. The vulnerability is specifically bound to the reds_on_main_agent_monitors_config function, which is simply not present in our older version. A hostile message would fall through the code and not provoke memory allocation or out of bounds access, so I simply marked the wheezy version as not-affected, something which usually happens during the original triage but can also happen during the actual patching work, as in this case.

Other free software work

This describes the volunteer work I do on various free software projects. This month, again, my internal reports show that I spent about the same time on volunteer and paid time, but this is probably a wrong estimate because I spent a lot of time at Debconf which I didn't clock in...

Debconf

So I participated in the 17th Debian Conference in Montreal. It was great to see (and make!) so many friends from all over the world in person again, and I was happy to work on specific issues together with other Debian developers. I am especially thankful to David Bremner for fixing the syncing of the flagged tag when added to new messages (patch series). This allows me to easily sync the one tag (inbox) that is not statically assigned during notmuch new, by using flagged as a synchronization tool. This allows me to use notmuch more easily across multiple machines without having to sync all tags with dump/restore or using muchsync which wasn't working for me (although a new release came out which may fix my issues). The magic incantation looks something like this:

notmuch tag -inbox tag:inbox and not tag:flagged notmuch tag +inbox not tag:inbox and tag:flagged

However, most of my time in the first week (Debcamp) was spent trying to complete the networking setup: configure switches, setup wiring and so on. I also configured an apt-cacher-ng proxy to serve packages to attendees during the conference. I configured it with Avahi to configure clients automatically, which led me to discover (and fix) issue Debian bug #870321) although there are more issues with the autodiscovery mechanism... I spent extra time to document the (somewhat simple) configuration of such a server in the Debian wiki because it was not the first time I had research that procedure...

I somehow thought this was a great time to upgrade my laptop to stretch. Normally, I keep that device running stable because I don't use it often and I don't want to have major traumatizing upgrades every time I leave with it on a trip. But this time was special: there were literally hundreds of Debian developers to help me out if there was trouble. And there was, of course, trouble as it turns out! I had problems with the fonts on my display, because, well, I had suspended (twice) my laptop during the install. The fix was simply to flush the fontconfig cache, and I tried to document this in the fonts wiki page and my upgrades page.

I also gave a short training called Debian packaging 101 which was pretty successful. Like the short presentation I made at the last Montreal BSP, the workshop was based on my quick debian development guide. I'm thinking of expanding this to a larger audience with a "102" course that would discuss more complex packaging problems. But my secret plan (well, secret until now I guess) is to make packaging procedures more uniform in Debian by training new Debian packagers using that same training for the next 2 decades. But I will probably start by just trying to do this again at the next Debconf, if I can attend.

Debian uploads

I also sponsored two packages during Debconf: one was a "scratch an itch" upload (elpa-ivy) which I requested (Debian bug #863216) as part of a larger effort to ship the Emacs elisp packages as Debian packages. The other was an upload of diceware to build the documentation in a separate package and fix other issues I have found in the package during a review.

I also uploaded a bunch of other fixes to the Debian archive:

Signing keys rotation

I also started the process of moving my main OpenPGP certification key by adding a signing subkey. The subkey is stored in a cryptographic token so I can sign things on more than one machine without storing that critical key on all those devices physically.

Unfortunately, this meant that I need to do some shenanigans when I want to sign content in my Debian work, because the new subkey takes time to propagate to the Debian archive. For example, I have to specify the primary key with a "bang" when signing packages (debsign -k '792152527B75921E!' ...) or use inline signatures in email sent for security announcement (since that trick doesn't work in Mutt or Notmuch). I tried to figure out how to better coordinate this next time by reading up documentation on keyring.debian.org, but there is no fixed date for key changes on the rsync interface. There are "monthly changes" so one's best bet is to look for the last change in their git repository.

GitLab.com and LFS migration

I finally turned off my src.anarc.at git repository service by moving the remaining repos to GitLab. Unfortunately, GitLab removed support for git-annex recently, so I had to migrate my repositories to Git-LFS, which was an interesting experience. LFS is pretty easy to use, definitely simpler than git-annex. It also seems to be a good match for the use-case at hand, which is to store large files (videos, namely) as part of slides for presentations.

It turns out that their migration guide could have been made much simpler. I tried to submit those changes to the documentation but couldn't fork the GitLab EE project to make a patch, so I just documented the issue in the original MR for now. While I was there I filed a feature request to add a new reference shortcut (GL-NNN) after noticing a similar token used on GitHub. This would be a useful addition because I often have numbering conflicts between Debian BTS bug numbers and GitLab issues in packages I maintain there. In particular, I have problems using GitLab issue numbers in Monkeysign, because commit logs end up in Debian changelogs and will be detected by the Debian infrastructure even though those are GitLab bug numbers. Using such a shortcut would avoid detection and such a conflict.

Numpy-stats

I wrote a small tool to extract numeric statistics from a given file. I often do ad-hoc benchmarks where I store a bunch of numbers in a file and then try to make averages and so on. As an exercise in learning NumPy, I figured I would write such a simple tool, called numpy-stats, which probably sounds naive to seasoned Python scientists.

My incentive was that I was trying to figure out what was the distribution of password length in a given password generator scheme. So I wrote this simple script:

for i in seq 10000 ; do shuf -n4 /usr/share/dict/words | tr -d '\n' done > length

And then feed that data in the tool:

$ numpy-stats lengths { "max": 60, "mean": 33.883293722913464, "median": 34.0, "min": 14, "size": 143060, "std": 5.101490225062775 }

I am surprised that there isn't such a tool already: hopefully I am wrong and will just be pointed towards the better alternative in the comments here!

Safe Eyes

I added screensaver support to the new SafeEyes project, which I am considering as a replacement to the workrave project I have been using for years. I really like how the interruptions basically block the whole screen: way more effective than only blocking the keyboard, because all potential distractions go away.

One feature that is missing is keystroke and mouse movement counting, and of course an official Debian package, although the latter would be easy to fix because upstream already provides an unofficial build. I am thinking of writing my own little tool to count keystrokes, since such a counter doesn't strictly need to live inside SafeEyes. This is something workrave does, but there are "idle time" extensions in Xorg that do not require counting keystrokes. There are already tools to count input events, but none seem to do what I want (most of them are basically keyloggers). It would be an interesting test to see if it's possible to write something that would work for both Xorg and Wayland at the same time. Unfortunately, preliminary research shows that:

  1. in Xorg, the only way to implement this is to sniff all events, i.e. to implement a keylogger (see the sketch after this list)

  2. in Wayland, this is completely unsupported. It seems some compositors could implement such a counter, but then it would be compositor-specific or, in other words, unportable
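To give an idea of what approach 1 entails, here is a rough, untested sketch using python-xlib's RECORD extension; this is the classic event-sniffing pattern, exactly the keylogger-shaped code described above:

    # sketch of an Xorg input event counter using the RECORD extension,
    # adapted from the usual python-xlib RECORD pattern; untested here
    from Xlib import X, display
    from Xlib.ext import record
    from Xlib.protocol import rq

    counts = {"keys": 0, "motion": 0}
    record_dpy = display.Display()

    def callback(reply):
        if reply.category != record.FromServer:
            return
        data = reply.data
        while len(data):
            event, data = rq.EventField(None).parse_binary_value(
                data, record_dpy.display, None, None)
            if event.type == X.KeyPress:
                counts["keys"] += 1
            elif event.type == X.MotionNotify:
                counts["motion"] += 1

    ctx = record_dpy.record_create_context(
        0, [record.AllClients],
        [{'core_requests': (0, 0), 'core_replies': (0, 0),
          'ext_requests': (0, 0, 0, 0), 'ext_replies': (0, 0, 0, 0),
          'delivered_events': (0, 0),
          # count every device event from KeyPress up to MotionNotify
          'device_events': (X.KeyPress, X.MotionNotify),
          'errors': (0, 0), 'client_started': False, 'client_died': False}])
    record_dpy.record_enable_context(ctx, callback)  # blocks forever
    record_dpy.record_free_context(ctx)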

So there is little hope here, which brings to my mind "painmeter" as an appropriate name for this future programming nightmare.

Ansible

I sent my first contribution to the ansible project with a small documentation fix. I had an eye opener recently when I discovered a GitLab ansible prototype that would manipulate GitLab settings. When I first discovered Ansible, I was frustrated by the YAML/Jinja DSL: it felt silly to write all this code in YAML when you are a Python developer. It was great to see reasonably well-written Python code that would do things and delegate the metadata storage (and only that!) to YAML, as opposed to using YAML as a DSL.

So I figured I would look at the Ansible documentation on how this works, but unfortunately, the Ansible documentation is severely lacking in this area. There are broken links (I only fixed one page) and missing pieces. For example, the developing plugins page doesn't explain how to program a plugin at all.

I was told on IRC that: "documentation around developing plugins is sparse in general. the code is the best documentation that exists (right now)". I didn't get a reply when asking which code in particular could provide good examples either. In comparison, Puppet has excellent documentation on how to create custom types, functions and facts. That is definitely a turn-off for a new contributor, but at least my pull request was merged in and I can only hope that seasoned Ansible contributors expand on this critical piece of documentation eventually.
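To illustrate the scale of the gap: even the most basic building block, a module, boils down to a handful of lines of Python once you know the pattern, but you mostly have to learn that pattern from existing code. Here is a minimal sketch (the module and its "name" argument are invented for illustration):

    #!/usr/bin/python
    # minimal Ansible module sketch: takes a single "name" argument and
    # returns a greeting without changing anything on the target host.
    # The module itself is hypothetical, for illustration only.
    from ansible.module_utils.basic import AnsibleModule


    def main():
        module = AnsibleModule(
            argument_spec=dict(
                name=dict(type='str', required=True),
            ),
            supports_check_mode=True,
        )
        module.exit_json(changed=False,
                         msg="hello %s" % module.params['name'])


    if __name__ == '__main__':
        main()

Plugins (lookup, filter, callback and so on) follow different conventions again, which is precisely where the documentation falls short.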

Misc

As you can see, I'm all over the place, as usual. GitHub tells me I "Opened 13 other pull requests in 11 repositories" (emphasis mine), which I guess means on top of the "9 commits in 5 repositories" mentioned earlier. My profile probably tells a more detailed story than would be useful to mention here. I should also mention how difficult it is to write those reports: I basically do a combination of looking into my GitHub and GitLab profiles, the last 30 days of emails and filesystem changes (!!). In no particular order, a list of changes which may be of interest:

  • font-large (and its alias, font-small): shortcut to send the right escape sequence to rxvt so it changes its font
  • fix-acer: short script to hardcode the modeline (you remember those?!) for my screen which has a broken EDID pin (so autodetection fails, yay Xorg log files...)
  • ikiwiki-pandoc-quickie: fake ikiwiki renderer that (ab)uses pandoc to generate an HTML file with the right stylesheet to preview Markdown as it may look in this blog (the basic template is still missing)
  • git-annex-transfer: a command I've often been missing in git-annex, which is a way to transfer files between remotes without having to copy them locally (upstream feature request)
  • I linked the graphics of the Debian archive software architecture in the Debian wiki in the hope more people notice it.
  • I did some tweaks on my Taffybar to introduce a battery meter, and tried to add temperature sensors, which mostly failed. There's a pending pull request that may bring some sense into this, hopefully.
  • I made two small patches in Monkeysign to fix gpg.conf handling and multiple email output, a dumb bug I cannot believe went unnoticed and unreported until now. Thanks Valerie for the bug report! The upload of this in Debian is pending a review from the release team.
Categories: External Blogs

The supposed decline of copyleft

Anarcat - Wed, 08/23/2017 - 12:00

At DebConf17, John Sullivan, the executive director of the FSF, gave a talk on the supposed decline of the use of copyleft licenses in free-software projects. In his presentation, Sullivan questioned the notion that permissive licenses, like the BSD or MIT licenses, are gaining ground at the expense of the traditionally dominant copyleft licenses from the FSF. While there does seem to be a rise in the use of permissive licenses in general, there are several possible explanations for the phenomenon.

When the rumor mill starts

Sullivan gave a recent example of the claim of the decline of copyleft in an article on Opensource.com by Jono Bacon from February 2017 that showed a histogram of license usage between 2010 and 2017 (seen below).

From that, Bacon elaborates possible reasons for the apparent decline of the GPL. The graphic used in the article was actually generated by Stephen O'Grady in a January article, The State Of Open Source Licensing, which said:

In Black Duck's sample, the most popular variant of the GPL – version 2 – is less than half as popular as it was (46% to 19%). Over the same span, the permissive MIT has gone from 8% share to 29%, while its permissive cousin the Apache License 2.0 jumped from 5% to 15%.

Sullivan, however, argued that the methodology used to create both articles was problematic. Neither contains original research: the graphs actually come from the Black Duck Software "KnowledgeBase" data, which was partly created from the old Ohloh web site now known as Open Hub.

To show one problem with the data, Sullivan mentioned two free-software projects, GNU Bash and GNU Emacs, that had been showcased on the front page of Ohloh.net in 2012. On the site, Bash was (and still is) listed as GPLv2+, whereas it changed to GPLv3 in 2011. He also claimed that "Emacs was listed as licensed under GPLv3-only, which is a license Emacs has never had in its history", although I wasn't able to verify that information from the Internet archive. Basically, according to Sullivan, "the two projects featured on the front page of a site that was using [the Black Duck] data set were wrong". This, in turn, seriously brings into question the quality of the data:

I reported this problem and we'll continue to do that but when someone is not sharing the data set that they're using for other people to evaluate it and we see glimpses of it which are incorrect, that should give us a lot of hesitation about accepting any conclusion that comes out of it.

Reproducible observations are necessary to the establishment of solid theories in science. Sullivan didn't try to contact Black Duck to get access to the database, because he assumed (rightly, as it turned out) that he would need to "pay for the data under terms that forbid you to share that information with anybody else". So I wrote Black Duck myself to confirm this information. In an email interview, Patrick Carey from Black Duck confirmed its data set is proprietary. He believes, however, that through a "combination of human and automated techniques", Black Duck is "highly confident at the accuracy and completeness of the data in the KnowledgeBase". He did point out, however, that "the way we track the data may not necessarily be optimal for answering the question on license use trend" as "that would entail examination of new open source projects coming into existence each year and the licenses used by them".

In other words, even according to Black Duck, its database may not be useful to establish the conclusions drawn by those articles. Carey did agree with those conclusions intuitively, however, saying that "there seems to be a shift toward Apache and MIT licenses in new projects, though I don't have data to back that up". He suggested that "an effective way to answer the trend question would be to analyze the new projects on GitHub over the last 5-10 years." Carey also suggested that "GitHub has become so dominant over the recent years that just looking at projects on GitHub would give you a reasonable sampling from which to draw conclusions".

Indeed, GitHub published a report in 2015 that also seems to confirm MIT's popularity (45%), surpassing copyleft licenses (24%). The data is, however, not without its own limitations. For example, in the above graph going back to the inception of GitHub in 2008, we see a rather abnormal spike in 2013, which seems to correlate with the launch of the choosealicense.com site, described by GitHub as "our first pass at making open source licensing on GitHub easier".

In his talk, Sullivan was critical of the initial version of the site which he described as biased toward permissive licenses. Because the GitHub project creation page links to the site, Sullivan explained that the site's bias could have actually influenced GitHub users' license choices. Following a talk from Sullivan at FOSDEM 2016, GitHub addressed the problem later that year by rewording parts of the front page to be more accurate, but any change in license choice obviously doesn't show in the report produced in 2015 and won't affect choices users have already made. Therefore, there can be reasonable doubts that GitHub's subset of software projects may not actually be that representative of the larger free-software community.

In search of solid evidence

So it seems we are missing good, reproducible results to confirm or dispel these claims. Sullivan explained that it is a difficult problem, if only in the way you select which projects to analyze: the impact of a MIT-licensed personal wiki will obviously be vastly different from, say, a GPL-licensed C compiler or kernel. We may want to distinguish between active and inactive projects. Then there is the problem of code duplication, both across publication platforms (a project may be published on GitHub and SourceForge for example) but also across projects (code may be copy-pasted between projects). We should think about how to evaluate the license of a given project: different files in the same code base regularly have different licenses—often none at all. This is why having a clear, documented and publicly available data set and methodology is critical. Without this, the assumptions made are not clear and it is unreasonable to draw certain conclusions from the results.

It turns out that some researchers did that kind of open research in 2016 in a paper called "The Debsources Dataset: Two Decades of Free and Open Source Software" [PDF] by Matthieu Caneill, Daniel M. Germán, and Stefano Zacchiroli. The Debsources data set is the complete Debian source code that covers a large history of the Debian project and therefore includes thousands of free-software projects of different origins. According to the paper:

The long history of Debian creates a perfect subject to evaluate how FOSS licenses use has evolved over time, and the popularity of licenses currently in use.

Sullivan argued that the Debsources data set is interesting because of its quality: every package in Debian has been reviewed by multiple humans, including the original packager, but also by the FTP masters to ensure that the distribution can legally redistribute the software. The existence of a package in Debian provides a minimal "proof of use": unmaintained packages get removed from Debian on a regular basis and the mere fact that a piece of software gets packaged in Debian means at least some users found it important enough to work on packaging it. Debian packagers make specific efforts to avoid code duplication between packages in order to ease security maintenance. The data set covers a period longer than Black Duck's or GitHub's, as it goes all the way back to the Hamm 2.0 release in 1998. The data and how to reproduce it are freely available under a CC BY-SA 4.0 license.

Sullivan presented the above graph from the research paper that showed the evolution of software license use in the Debian archive. Whereas previous graphs showed statistics in percentages, this one showed actual absolute numbers, where we can't actually distinguish a decline in copyleft licenses. To quote the paper again:

The top license is, once again, GPL-2.0+, followed by: Artistic-1.0/GPL dual-licensing (the licensing choice of Perl and most Perl libraries), GPL-3.0+, and Apache-2.0.

Indeed, looking at the graph, at most we see a rise of the Apache and MIT licenses and no decline of the GPL per se, although its adoption does seem to slow down in recent years. We should also mention the possibility that Debian's data set has the opposite bias: toward GPL software. The Debian project is culturally quite different from the GitHub community and even the larger free-software ecosystem, naturally, which could explain the disparity in the results. We can only hope a similar analysis can be performed on the much larger Software Heritage data set eventually, which may give more representative results. The paper acknowledges this problem:

Debian is likely representative of enterprise use of FOSS as a base operating system, where stable, long-term and seldomly updated software products are desirable. Conversely Debian is unlikely representative of more dynamic FOSS environments (e.g., modern Web-development with micro libraries) where users, who are usually developers themselves, expect to receive library updates on a daily basis.

The Debsources research also shares methodology limitations with Black Duck: while Debian packages are reviewed before uploading and we can rely on the copyright information provided by Debian maintainers, the research also relies on automated tools (specifically FOSSology) to retrieve license information.

Sullivan also warned against "ascribing reason to numbers": people may have different reasons for choosing a particular license. Developers may choose the MIT license because it has fewer words, for compatibility reasons, or simply because "their lawyers told them to". It may not imply an actual deliberate philosophical or ideological choice.

Finally, he brought up the theory that the rise of non-copyleft licenses isn't necessarily to the detriment of the GPL. He explained that, even if there is an actual decline, it may not be much of a problem if there is an overall growth of free software to the detriment of proprietary software. He reminded the audience that non-copyleft licenses are still free software, according to the FSF and the Debian Free Software Guidelines, so their rise is still a positive outcome. Even if the GPL is a better tool to accomplish the goal of a free-software world, we can all acknowledge that the conversion of proprietary software to more permissive—and certainly simpler—licenses is definitely heading in the right direction.

[I would like to thank the DebConf organizers for providing meals for me during the conference.]

Note: this article first appeared in the Linux Weekly News.

Categories: External Blogs

My free software activities, July 2017

Anarcat - Sat, 07/29/2017 - 16:16
Debian Long Term Support (LTS)

This is my monthly Debian LTS report. This time I worked on various hairy issues surrounding ca-certificates, unattended-upgrades, apache2 regressions, libmtp, tcpdump and ipsec-tools.

ca-certificates updates

I've been working on the removal of the Wosign and StartCom certificates (Debian bug #858539) and, in general, the synchronisation of ca-certificates across suites (Debian bug #867461) since at least last March. I have made an attempt at summarizing the issue, which led to a productive discussion, and it seems that, in the end, the maintainer will take care of synchronizing information across suites.

Guido was right in again raising the question of synchronizing NSS across all suites (Debian bug #824872), which itself raised the other question of how to test reverse dependencies. This brings me back to Debian bug #817286, which basically proposed the idea of having "proposed updates" for security issues. The problem is that while we can upload test packages to stable proposed-updates, we can't do the same in LTS because the suite is closed and we operate only on security packages. This issue came up before in other security uploads and we need to think better about how to solve this.

unattended-upgrades

Speaking of security upgrades brings me to a bug (Debian bug #867169) that was filed against the wheezy version of unattended-upgrades, which showed that the package simply stopped working after the latest stable release, because wheezy became "oldoldstable". I first suggested using the "codename", but that appears to have been introduced only after wheezy.

In the end, I proposed a simple update that would fix the configuration files and uploaded this as DLA-1032-1. This is thankfully fixed in later releases and will not require such hackery when jessie becomes LTS as well.

libmtp

Next up is the work on the libmtp vulnerabilities (CVE-2017-9831 and CVE-2017-9832). As I described in my announcement, the work to backport the patch was huge, as upstream basically backported a whole library from the gphoto2 package to fix those issues (and probably many more). The lack of a test suite made it difficult to trust my own work, but given that I had no (negative) feedback, I figured it was okay to simply upload the result and that became DLA-1029-1.

tcpdump

I then looked at reproducing CVE-2017-11108, a heap overflow triggered when tcpdump parses crafted STP packets. In Debian bug #867718, I described how to reproduce the issue across all suites and opened an issue upstream, given that the upstream maintainers hadn't responded in weeks, according to notes in the Red Hat Bugzilla issue. I eventually worked on a patch which I shared upstream, but that was rejected as they were already working on it in their embargoed repository.

I can explain this confusion and duplication of work with:

  1. the original submitter didn't really contact security@tcpdump.org
  2. he did and they didn't reply, being just too busy
  3. they replied and he didn't relay that information back

I think #2 is most likely: the tcpdump.org folks are probably very busy with tons of reports like this. Still, I should probably have contacted security@tcpdump.org directly before starting my work, even though no harm was done because I didn't divulge anything that wasn't already public.

Since then, tcpdump has released 4.9.1 which fixes the issue, but then new CVEs came out that will require more work and probably another release. People looking into this issue must be certain to coordinate with the tcpdump security team before fixing the actual issues.

ipsec-tools

Another package that didn't quite have a working solution is the ipsec-tools suite, in which the racoon daemon was vulnerable to a remotely-triggered DoS attack (CVE-2016-10396). I reviewed and fixed the upstream patch, which had introduced a regression. Unfortunately, there is no test suite or proof of concept to validate the results.

The reality is that ipsec-tools is really old, and should maybe simply be removed from Debian, in favor of strongswan. Upstream hasn't done a release in years and various distributions have patched up forks of those to keep it alive... I was happy, however, to know that a maintainer will take care of updating the various suites, including LTS, with my improved patch. So this fixes the issue for now, but I would strongly encourage users to switch away from ipsec-tools in the future.

apache2

Finally, I was bitten by the old DLA-841-1 upload I did all the way back in February, as it introduced a regression (Debian bug #858373). It turns out it was possible to segfault Apache workers with a trivial HTTP request, in certain (rather exotic, I might add) configurations (ErrorDocument 400 directive pointing to a cgid script in worker mode).

Still, it was a serious regression and I found a part of the nasty long patch we worked on back then that was faulty, and introduced a small fix to correct that. The proposed package unfortunately didn't yield any feedback, and I can only assume it will work okay for people. The result is the DLA-841-2 upload which fixes the regression. I unfortunately didn't have time to work on the remaining CVEs affecting apache2 in LTS at the time of writing.

Triage

I also did some miscellaneous triage by filing Debian bug #867477 for poppler in an effort to document better the pending issue.

Next up was some minor work on eglibc issues. CVE-2017-8804 has a patch, but it's been disputed. Since the main victim of this and the core of the vulnerability (rpcbind) has already been fixed, I am not sure this vulnerability is still a thing in LTS at all.

I also looked at CVE-2014-9984, but the code is so different in wheezy that I wonder if LTS is affected at all. Unfortunately, the eglibc gymnastics are a little beyond me and I do not feel confident enough to just push those issues aside, so I left them open for others to look at.

Other free software work

And of course, there's my usual monthly volunteer work. My ratio is a little better this time, having reached an about even split between paid and volunteer work, whereas this was 60% volunteer work in March.

Announcing ecdysis

I recently published ecdysis, a set of templates and code samples that I frequently reuse across projects. This is probably the least pronounceable project name I have ever chosen, but this is somewhat on purpose. The goal of this project is not collaboration or to become a library: it's just a personal project which I share with the world as a curiosity.

To quote the README file:

The name comes from what snakes and other animals do to "create a new snake": they shed their skin. This is not so appropriate for snakes, as it's just a way to rejuvenate their skin, but is especially relevant for arthropods, since for them "ecdysis" may be associated with a metamorphosis:

Ecdysis is the moulting of the cuticle in many invertebrates of the clade Ecdysozoa. Since the cuticle of these animals typically forms a largely inelastic exoskeleton, it is shed during growth and a new, larger covering is formed. The remnants of the old, empty exoskeleton are called exuviae. — Wikipedia

So this project is metamorphosed into others when the documentation templates, code examples and so on are reused elsewhere. For that reason, the license is an unusually liberal (for me) MIT/Expat license.

The name also has the nice property of being absolutely unpronounceable, which makes it unlikely to be copied but easy to search online.

It was an interesting exercise to go back into older projects and factor out interesting code. The process is not complete yet, as there are older projects I'm still curious to review. A bunch of that code could also be factored into upstream projects and maybe even the Python standard library.

In short, this is stuff I keep on forgetting how to do: a proper setup.py config, some fancy argparse extensions and so on. Instead of having to remember where I had written that clever piece of code, I now shove it in the crazy chaotic project where I can find it again in the future.
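As an example of the kind of boilerplate collected there, a minimal setup.py skeleton looks something like this; the names and metadata below are placeholders for illustration, not the actual ecdysis content:

    # generic setup.py skeleton of the kind such a project collects;
    # the project name and metadata are placeholders, not real values
    from setuptools import setup, find_packages

    setup(
        name='example-project',
        version='0.1.0',
        description='an example project skeleton',
        url='https://example.com/example-project',
        packages=find_packages(),
        entry_points={
            'console_scripts': [
                'example = example.cli:main',
            ],
        },
    )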

Beets experiments

Since I started using Subsonic (or Libresonic) to manage the music on my phone, album covers are suddenly way more interesting. But my collection so far has had limited album covers: my other media player (gmpc) would download those on the fly on its own and store them in its own database - not on the filesystem. I guess this could be considered to be a limitation of Subsonic, but I actually appreciate the separation of duty here. Garbage in, garbage out: the quality of Subsonic's rendering depends largely on how well set up your library and tags are.

It turns out there is an amazing tool called beets to do exactly that kind of stuff. I originally discarded that "media library management system for obsessive-compulsive [OC] music geeks", trying to convince myself I was not an "OC music geek". Turns out I am. Oh well.

Thanks to beets, I was able to download album covers for a lot of the albums in my collection. The only covers that are missing now are albums that are not correctly tagged and that beets couldn't automatically fix up. I still need to go through those and fix all those tags, but the first run did an impressive job at getting album covers.

Then I got the next crazy idea: after a camping trip where we forgot (again) the lyrics to Georges Brassens, I figured I could start putting some lyrics on my ebook reader. "How hard can that be?" of course, being the start of another crazy project. A pull request and 3 days later, I had something that could turn a beets lyrics database into a Sphinx document which, in turn, can be turned into an ePUB. In the process, I probably got blocked from MusixMatch a hundred times, but it's done. Phew!
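Stripped of details, the beets side of the conversion is conceptually simple. A rough sketch might look like this; the database path, the query syntax and the output format here are illustrative assumptions, not the actual code from the pull request:

    # rough sketch: dump lyrics from a beets library into a single
    # reStructuredText file for Sphinx. Paths and query are assumptions
    # for illustration; the real pull request differs.
    import os
    from beets.library import Library

    lib = Library(os.path.expanduser("~/.config/beets/library.db"))
    with open("lyrics.rst", "w") as out:
        # "lyrics::." is a regex query matching items with non-empty lyrics
        for item in lib.items("lyrics::."):
            title = "%s - %s" % (item.artist, item.title)
            out.write(title + "\n" + "=" * len(title) + "\n\n")
            out.write(item.lyrics + "\n\n")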

The resulting e-book is about 8000 pages long, but is still surprisingly responsive. In the process, I also happened to do a partial benchmark of Python's bloom filter libraries. The biggest surprise there was the performance of the set builtin: for small items, it is basically as fast as a bloom filter. Of course, when the item size grows larger, its memory usage explodes, but in this case it turned out to be sufficient, making a bloom filter complete overkill and confusing.
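That comparison is easy to reproduce in spirit. Here is a toy benchmark sketch pitting the set builtin against a naive pure-Python bloom filter; the filter below is hand-rolled to keep the example self-contained and is not one of the libraries actually benchmarked:

    # toy benchmark: set builtin vs a naive pure-Python bloom filter
    import hashlib
    import timeit

    class ToyBloom:
        """Naive bloom filter, for illustration only."""
        def __init__(self, nbits=1 << 20, nhashes=4):
            self.nbits = nbits
            self.nhashes = nhashes
            self.bits = bytearray(nbits // 8)

        def _positions(self, item):
            # derive k bit positions from slices of one SHA-256 digest
            digest = hashlib.sha256(item.encode()).digest()
            for i in range(self.nhashes):
                yield int.from_bytes(digest[i*4:(i+1)*4], 'big') % self.nbits

        def add(self, item):
            for pos in self._positions(item):
                self.bits[pos // 8] |= 1 << (pos % 8)

        def __contains__(self, item):
            return all(self.bits[pos // 8] & (1 << (pos % 8))
                       for pos in self._positions(item))

    words = ["word%d" % i for i in range(10000)]
    plain = set(words)
    bloom = ToyBloom()
    for w in words:
        bloom.add(w)

    print("set:  ", timeit.timeit(lambda: "word42" in plain, number=100000))
    print("bloom:", timeit.timeit(lambda: "word42" in bloom, number=100000))

The set lookup should win handily here, consistent with the observation above, since it is a single C-level hash lookup against several Python-level hash computations.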

Oh, and thanks to those efforts, I got admitted in the beetbox organization on GitHub! I am not sure what I will do with that newfound power: I was just scratching an itch, really. But hopefully I'll be able to help here and there in the future as well.

Debian package maintenance

I did some normal upkeep on a bunch of my packages this month, that were long overdue:

  • uploaded slop 6.3.47-1: major new upstream release
  • uploaded an NMU for maim 5.4.64-1.1: maim was broken by the slop release
  • uploaded pv 1.6.6-1: new upstream release
  • uploaded kedpm 1.0+deb8u1 to jessie (oldstable): one last security fix (Debian bug #860817, CVE-2017-8296) for that derelict password manager
  • uploaded charybdis 3.5.5-1: new minor upstream release, with optional support for mbedtls
  • filed Debian bug #866786 against cryptsetup to make the remote initramfs SSH-based unlocking support multiple devices: thanks to the maintainer, this now works flawlessly in buster and may be backported to stretch
  • expanded on Debian bug #805414 against gdm3 and Debian bug #845938 against pulseaudio, because I had trouble connecting my computer to this new Bluetooth speaker. Turns out this is a known issue in Pulseaudio: whereas it releases ALSA devices, it doesn't release Bluetooth devices properly. I documented this more clearly in the wiki page
  • filed Debian bug #866790 regarding old stray Apparmor profiles that were lying around my system after an upgrade, which got me interested in Debian bug #830502 in turn
  • filed Debian bug #868728 against cups regarding a weird behavior I had interacting with a network printer. Turns out the other workstation was misconfigured... why are printers still so hard?
  • filed Debian bug #870102 to automate sbuild schroots upgrades
  • after playing around with rash, I tried to complete the packaging (Debian bug #754972) of percol with this pull request upstream. This ended up being way too much overhead and I reverted to my normal history habits.
Categories: External Blogs

Epic Lameness

Eric Dorland - Mon, 09/01/2008 - 17:26
SF.net now supports OpenID. Hooray! I'd like to make a comment on a thread about the RTL8187se chip I've got in my new MSI Wind. So I go to sign in with OpenID and instead of signing me in it prompts me to create an account with a name, username and password for the account. Huh? I just want to post to their forum, I don't want to create an account (at least not explicitly, if they want to do it behind the scenes fine). Isn't the point of OpenID to not have to create accounts and particularly not have to create new usernames and passwords to access websites? I'm not impressed.
Categories: External Blogs

Sentiment Sharing

Eric Dorland - Mon, 08/11/2008 - 23:28
Biella, I am from there and I do agree. If I was still living there I would try to form a team and make a bid. Simon even made noises about organizing a bid at DebConfs past. I wish he would :)

But a DebConf in New York would be almost as good.
Categories: External Blogs