External Blogs

January 2019 report: LTS, Mailman 3, Vero 4k, Kubernetes, Undertime, Monkeysign, oh my!

Anarcat - Wed, 02/06/2019 - 10:32

January is often a long month in our northern region. Very cold, lots of snow, which can mean a lot of fun as well. But it's also a great time to cocoon (or maybe hygge?) in front of the computer and do great things. I think the last few weeks were particularly fruitful, which led to this rather lengthy report, which I hope will nonetheless be interesting.

So grab some hot cocoa, a coffee, tea or whatever warm beverage (or cool if you're in the southern hemisphere) and hopefully you'll learn awesome things. I know I did.

Free software volunteer work

As always, the vast majority of my time was actually spent volunteering on various projects, while scrambling near the end of the month to work on paid stuff. For the first time here I mention my Kubernetes work, but I've also worked on the new Mailman 3 packages, my monkeysign and undertime packages (including a new configuration file support for argparse), random Debian work, and Golang packaging. Oh, and I bought a new toy for my home cinema, which I warmly recommend.

Kubernetes research

While I've written multiple articles on Kubernetes for LWN in the past, I am somewhat embarrassed to say that I don't have much experience running Kubernetes itself for real out there. But for a few months, with a group of fellow sysadmins, we've been exploring various container solutions and gravitated naturally towards Kubernetes. In the last month, I particularly worked on deploying a Ceph cluster with Rook, a tool to deploy storage solutions on a Kubernetes cluster (submitting a patch while I was there). Like many things in Kubernetes, Rook is shipped as a Helm chart, more specifically as an "operator", which might be described (if I understand this right) as a container that talks with Kubernetes to orchestrate other containers.

We've similarly worked on containerizing Nextcloud, which proved to be pretty shitty at behaving like a "cloud" application: secrets and dynamic data and configuration are all mixed up in the config directory, which makes it really hard to manage sanely in a container environment. The only way we found it could work was to mount configuration as a volume, which means configuration becomes data and can't be controlled through git. Which is bad. This is also how the proposed Nextcloud Helm solves this problem (on which I've provided a review), for what it's worth.

We've also worked on integrating GitLab in our workflow, so that we keep configuration as code and deploy on pushes. While GitLab talks a lot about Kubernetes integration, the actual integration features aren't that great: unless I totally misunderstood how it's supposed to work, it seems you need to provide your own container and run kubectl from it, using the tokens provided by GitLab. And if you want to do anything of significance, you will probably need to give GitLab cluster access to your Kubernetes cluster, which kind of freaks me out considering the number of security issues that keep coming out with GitLab recently.

In general, I must say I was very skeptical of Kubernetes when I first attended those conferences: too much hype, buzzwords and suits. I felt that Google just threw us a toy project to play with while they kept the real stuff to themselves. I don't think that analysis is wrong, but I do think Kubernetes has something to offer, especially for organizations still stuck in the "shared hosting" paradigm where you give users a shell account or (S?!)FTP access and run mod_php on top. Containers at least provide some level of isolation out of the box and make such multi-tenant offerings actually reasonable and much more scalable. With a little work, we've been able to set up a fully redundant and scalable storage cluster and Nextcloud service: doing this from scratch wouldn't be that hard either, but it would have been done only for Nextcloud. The trick is that the knowledge and experience we gained by doing this with Nextcloud will be useful for all the other apps we'll be hosting in the future. So I think there's definitely something there.

Debian work

I participated in the Montreal BSP, of which Louis-Philippe Véronneau made a good summary. I also sponsored a few uploads and fixed a few bugs. We didn't fix that many bugs, but I gave two workshops, including my now well-tuned packaging 101 workshop, which seems to be always quite welcome. I really wish I could make a video of that talk, because I think it's useful in going through the essentials of Debian packaging and could use a wider audience. In the meantime, my reference documentation is the best you can get.

I've decided to let bugs-everywhere die in Debian. There's a release critical bug and it seems no one is really using this anymore, at least I'm not. I would probably orphan the package once it gets removed from buster, but I'm not actually the maintainer, just an uploader... A promising alternative to BE seems to be git-bug, with support for synchronization with GitHub issues.

I've otherwise tried to get my figurative "house" of Debian packages in order for the upcoming freeze, which meant new updates for

I've also sponsored the introduction of web-mode (RFS #921130) a nice package to edit HTML in Emacs and filed the usual barrage of bug reports and patches.

Elegant argparse configfile support and new date parser for undertime

I've issued two new releases for my undertime project, which helps users coordinate meetings across timezones. I first started working on improving the date parser, which mostly involved finding a new library to handle dates. I started using dateparser, which behaves slightly better, and I ended up packaging it for Debian as well, although I still have to re-upload undertime to use the new dependency.

That was a first 1.6.0 release, but that wasn't enough - my users wanted a configuration file! I ended up designing a simple, YAML-based configuration file parser that integrates quite well with argparse, after finding too many issues with existing solutions like Configargparse. I summarized those for the certbot project which suffered from similar issues. I'm quite happy with my small, elegant solution for config file support. It is significantly better than the one I used for Monkeysign which was (ab)using the fromfile option of argparse.
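As a rough illustration of the general idea (this is not undertime's actual code), a configuration file can be layered under argparse by loading it into a dictionary and handing that dictionary to `set_defaults()`. The option names below are hypothetical, and the JSON fallback exists only so the sketch runs without PyYAML installed:

```python
import argparse
import json

try:
    import yaml  # PyYAML, a third-party dependency
except ImportError:
    yaml = None


def parse_config(text):
    """Parse configuration text into a dict of option defaults."""
    if not text.strip():
        return {}
    if yaml is not None:
        return yaml.safe_load(text) or {}
    # Fallback for this sketch only: simple JSON documents also parse as YAML.
    return json.loads(text)


def build_parser():
    # Hypothetical options, chosen only for the example.
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument("--timezones", nargs="+", default=[])
    parser.add_argument("--start", type=int, default=9)
    parser.add_argument("--end", type=int, default=17)
    return parser


def parse_args(argv, config_text=""):
    parser = build_parser()
    # Values from the file replace the built-in defaults, so anything
    # given explicitly on the command line still takes precedence.
    parser.set_defaults(**parse_config(config_text))
    return parser.parse_args(argv)
```

With a configuration containing `start: 8`, calling `parse_args([], config)` yields `start == 8`, while `parse_args(["--start", "10"], config)` yields `start == 10`: the command line overrides the file, and the file overrides the built-in defaults.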

Mailman 3

Motivated by this post extolling the virtues of good old mailing lists to resist social media hegemony, I did a lot of (too much) work on installing Mailman 3 on my own server. I have run Mailman 2 mailing lists for hundreds of clients in my previous job at Koumbit and I have so far used my access there to host a few mailing lists. This time, I wanted to try something new and figured Mailman 3 might have been ready after 4 years since the 3.0 release and almost 10 years since the project started.

How wrong I was! Many things don't work: there is no French translation at all (nor any other translation, for that matter), no invite feature, template translation is buggy, the Debian backport fails with the MySQL version in stable... it's a mess. The complete history of my failure is better documented in mail.

I worked around many of those issues. I like the fact that I was almost able to replace the missing "invite" feature through the API and there Mailman 3 is much better to look at than the older version. They did fix a lot of things and I absolutely love the web interface which allows users to interact with the mailing list as a forum. But maybe it will take a bit more time before it's ready for my use case.

Right now, I'm hesitant: either I go with a mailing list to connect with friends and family. It works with everyone because everyone uses email, if only for their password resets. The alternative is to use something like a (private?) Discourse instance, which could also double as a comments provider for my blog if I ever decide to switch away from Ikiwiki... Neither seems like a good solution, and both require extra work and maintenance, Discourse particularly so because it is very unlikely it will get shipped as a Debian package.

Vero: my new home cinema box

Speaking of Discourse, the reason I'm thinking about it is I am involved in many online forums running it. It's generally a great experience, although I wish email integration was mandatory - it's great to be able to reply through your email client, and it's not always supported. One of the forums I participate in is the forum where I posted a description of my photography kit, explained different NAS options I'm considering and explained part of my git-annex/darktable workflow.

Another forum I recently started working on is the forum. I first asked what the full specifications were for their neat little embedded set-top box, the Vero 4k+. I wasn't fully satisfied with the answers (the hardware is not fully open), but I ended up ordering the device and moving the "home cinema services" off of the venerable marcos server, which is going to turn 8 years old this year. This was an elaborate enterprise which involved wiring power outlets (because a ground was faulty), vacuuming the basement (because it was filthy), doing elaborate research on SSHFS setup and performance, dealing with systemd bugs and so on.

In the end it was worth it: my roommates enjoy the new remote control. It's much more intuitive than the previous Bluetooth keyboard, it performs well enough, and is one less thing to overload poor marcos with.

Monkeysign alternatives testing

I already mentioned I was considering Monkeysign retirement and recently a friend asked me to sign his key so I figured it was a great time to test out possible replacements for the project. Turns out things were not as rosy as I thought.

I first tested pius and it didn't behave as well as I hoped. Generally, it asks too many cryptic questions the user shouldn't have to guess the answer to. Specifically, here are the issues I found in my review:

  1. it forces you to specify your signing key, which is error-prone and needlessly difficult for the user

  2. I don't quite understand what the first question means - there's too much to unpack there: is it for inline PGP/MIME? for sending email at all? for sending individual emails? what's going on? and the second questions

  3. the second question should be optional: I already specified my key on the commandline, it should use that as a From...

  4. the signature level is useless and generally disregarded by all software, including OpenPGP. even if it were used, 0/1/2/3/s/n/h/q is a pretty horrible user interface.

And then it simply fails to send the email completely on dkg's key, but that might be because its key was so exotic...

Gnome-keysign didn't fare much better - I opened six different issues on the promising project:

  1. what does the internet button do?
  2. signing arbitrary keys in GUI
  3. error in french translation
  4. using mutt as a MUA does not work
  5. signing a key on the commandline never completes
  6. flatpak instructions failure

So, surprisingly, Monkeysign might survive a bit longer, as much as I have come to dislike the poor little thing...

Golang packaging

To help a friend get the new RiseupVPN package in Debian, I uploaded a bunch of Golang dependencies (bug #919936, bug #919938, bug #919941, bug #919944, bug #919945, bug #919946, bug #919947, bug #919948) in Debian. This involved filing many bugs upstream as many of those (often tiny) packages didn't have explicit licences, so many of those couldn't actually be uploaded, but the ITPs are there and hopefully someone will complete that thankless work.

I also tried to package two other useful Golang programs, dmarc-cat and gotop, both of which also required a significant number of dependencies to be packaged (bug #920387, bug #920388, bug #920389, bug #920390, bug #921285, bug #921286, bug #921287, bug #921288). dmarc-cat has just been accepted in Debian - it's very useful to decipher DMARC reports you get when you configure your DNS to receive such reports. This is part of a larger effort to modernize my DNS and mail configuration.

But gotop is just starting - none of the dependencies have been updated just yet, and I'm running out of steam a little, even though that looks like an awesome package.

Other work
  • I hosed my workstation / laptop backup by trying to be too clever with Borg. It bit back and left me holding the candle, the bastard.

  • Expanded on my disk testing documentation to include better examples of fio as part of my neglected stressant package

GitHub said I "opened 21 issues in 14 other repositories" which seems a tad insane. And there's of course probably more stuff I'm forgetting here.

Debian Long Term Support (LTS)

This is my monthly Debian LTS report.

sbuild regression

My first stop this month was to notice a problem with sbuild from buster running on jessie chroots (bug #920227). After discussions on IRC, where fellow Debian Developers basically fabricated me a patch on the fly, I sent merge request #5 which was promptly accepted and should be part of the next upload.


I again worked a bit on systemd. I marked CVE-2018-16866 as not affecting jessie, because the vulnerable code was introduced in later versions. I backported fixes for CVE-2018-16864 and CVE-2018-16865 and published the resulting package as DLA-1639-1, after doing some smoke-testing.

I still haven't gotten the courage to dig back in the large backport of tmpfiles.c required to fix CVE-2018-6954.

tiff review

I did a quick review of the fix for CVE-2018-19210 proposed upstream, which seems to have brought upstream's attention back to the issue and finally got the fix merged.

Enigmail EOL

After reflecting on the issue one last time, I decided to mark Enigmail as EOL in jessie, which involved an upload of debian-security-support to jessie (DLA-1657-1), unstable and a stable-pu.

DLA / website work

I worked again on fixing the LTS workflow with the DLAs on the main website. Reminder: hundreds of DLAs are missing from the website (bug #859122) and we need to figure out a way to automate the import of newer ones (bug #859123).

The details of my work are in this post but basically, I readded a bunch more DLAs to the MR and got some good feedback from the www team (in MR #47). There's still some work to be done on the DLA parser, although I have merged my own improvements (MR #46) as I felt they had been sitting for review long enough.

Next step is to deal with noise like PGP signatures correctly and thoroughly review the proposed changes.

While I was in the webmaster's backyard, I tried to help with a few things by merging a LTS errata and a paypal integration note although the latter ended up being a mistake that was reverted. I also rejected some issues (MR #13, MR #15) during a quick triage.

phpMyAdmin review

After reading this email from Lucas Kanashiro, I reviewed CVE-2018-19968 and reviewed and tested CVE-2018-19970.

Categories: External Blogs

Debian build helpers: dh dominates

Anarcat - Tue, 02/05/2019 - 19:54

It's been a while since someone did this. Back in 2009, Joey Hess made a talk at Debconf 9 about debhelper and mentioned in his slides (PDF) that it was used in most Debian packages. Here was the ratio (page 10):

  • debhelper: 54%
  • cdbs: 25%
  • dh: 9%
  • other: 3%

Then Lucas Nussbaum made graphs from that did the same, but with history. His latest post (archive link because original is missing images), from 2015 confirmed Joey's 2009 results. It also showed cdbs was slowly declining and a sharp uptake in the dh usage (over debhelper). Here were the approximate numbers:

  • debhelper: 15%
  • cdbs: 15%
  • dh: 69%
  • other: 1%

I ran the numbers again. Jakub Wilk pointed me to the output that can be used to get the current state easily:

$ curl -so lintian.log.gz
$ zgrep debian-build-system lintian.log.gz | awk '{print $NF}' | sort | uniq -c | sort -nr
  25772 dh
   2268 debhelper
   2124
    257 dhmk
    123 other
      8

Shoving this in a LibreOffice spreadsheet (sorry, my R/Python brain is slow today) gave me this nice little graph:

As of today, the numbers are now:

  • debhelper: 7%
  • cdbs: 7%
  • dh: 84%
  • other: 1%

(No the numbers don't add up. Yes it's a rounding error. Blame LibreOffice.)
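For readers who would rather skip the spreadsheet, the percentages can be recomputed with a few lines of Python. This is only a sketch under two assumptions: the unlabeled count of 2124 in the output above is taken to be cdbs (which matches the 7% figure), and everything other than dh, debhelper and cdbs is lumped into "other".

```python
# Counts copied from the lintian output above.
counts = {
    "dh": 25772,
    "debhelper": 2268,
    "cdbs": 2124,      # assumption: the label is missing in the output
    "dhmk": 257,
    "other": 123,
    "unlabeled": 8,    # label missing in the output
}


def shares(counts, major=("dh", "debhelper", "cdbs")):
    """Return rounded percentage shares, lumping minor helpers into 'other'."""
    total = sum(counts.values())
    grouped = {name: counts[name] for name in major}
    grouped["other"] = total - sum(grouped.values())
    return {name: round(100 * n / total) for name, n in grouped.items()}


result = shares(counts)
print(result)
print(sum(result.values()))
```

The rounded shares come out as 84, 7, 7 and 1, which sum to 99: the missing percent is an ordinary rounding effect.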

So while cdbs lost 10% of the packages in 6 years, it lost another half of its share in the last 4. It's also interesting to note that debhelper and cdbs are both shrinking at a similar rate.

This confirms that debhelper development is where everything is happening right now. The new dh(1) sequencer is also a huge improvement that almost everyone has adopted wholeheartedly.

Now of course, that remaining 15% of debhelper/cdbs (or just 7% of cdbs, depending on how pedantic you are) will be the hard part to transition. Notice how the 1% of "other" packages hasn't really moved in the last four years: that's because some packages in Debian are old, abandoned, ignored, complicated, or all of the above. So it will be difficult to convert the remaining packages and finalize this great unification Joey (unknowingly) started ten years ago, as the remaining packages are probably the hard, messy, old ones no one wants to fix because, well, "they're not broken so don't fix it".

Still, it's nice to see us agree on something for a change. I'd be quite curious to see an update of Lucas' historical graphs. It would be particularly useful to see the impact of the old Alioth server replacement with, because it runs GitLab and only supports Git. Without an easy-to-use internal hosting service, I doubt SVN, Darcs, Bzr and whatever is left in "other" there will survive very long.

Categories: External Blogs

December 2018 report: archiving Brazil, calendar and LTS

Anarcat - Fri, 12/21/2018 - 14:40
Last two months free software work

Keen readers probably noticed that I didn't produce a report in November. I am not sure why, but I couldn't find the time to do so. When looking back at those past two months, I didn't find that many individual projects I worked on, but there were massive ones, of the scale of archiving the entire government of Brazil or learning the intricacies of print media, both of which were slightly or largely beyond my existing skill set.

Calendar project

I've been meaning to write about this project more publicly for a while, but never found the way to do so productively. But now that the project is almost over -- I'm getting the final prints today and mailing others hopefully soon -- I think this deserves at least a few words.

As some of you might know, I bought a new camera last January. Wanting to get familiar with how it works and refresh my photography skills, I decided to embark on the project of creating a photo calendar for 2019. The basic idea was simple: take pictures regularly, then each month pick the best picture of that month, collect all those twelve pictures and send that to the corner store to print a few calendars.

Simple, right?

Well, twelve pictures turned into a whopping 8000 pictures since January, not all of which were that good of course. And of course, a calendar has twelve months -- so twelve pictures -- but also a cover and a back, which means thirteen pictures and some explaining. Being critical of my own work, it turned out that finding those pictures was sometimes difficult, especially considering the medium imposed some rules I didn't think about.

For example, the US Letter paper size imposes a different ratio (1.29) than the photographic ratio (~1.5), which means I had to reframe every photograph. Sometimes this meant discarding entire ideas. Other photos were discarded because they were too depressing, even if I found them artistically or journalistically important: you don't want to be staring at a poor kid distressed at going into school every morning for an entire month. Another piece of advice I got was to forget about sunsets and dark pictures, as they are difficult to render correctly in print. We're used to bright screens displaying those pictures; paper is a completely different feeling. Having a good vibe for night and star photography, this was a fairly dramatic setback, even though I still did feature two excellent pictures.
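To put numbers on the reframing, here is a small sketch of the arithmetic. The 6000x4000 frame is a hypothetical 3:2 image size picked for the example, not the resolution of any particular camera:

```python
def crop_to_ratio(width, height, target_ratio):
    """Return the largest (width, height) crop with the given width/height ratio."""
    if width / height > target_ratio:
        # Image is too wide: keep the full height and trim the sides.
        return round(height * target_ratio), height
    # Image is too tall (or already matches): keep the full width.
    return width, round(width / target_ratio)


LETTER_LANDSCAPE = 11 / 8.5  # US Letter in landscape, about 1.29

cropped_width, cropped_height = crop_to_ratio(6000, 4000, LETTER_LANDSCAPE)
lost = 1 - cropped_width / 6000
print(cropped_width, cropped_height, f"{lost:.0%} of the width lost")
```

For that hypothetical frame the crop is 5176x4000, so roughly 14% of the width has to go, which is enough to ruin a composition that relies on its edges.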

Then I got a little carried away. At the suggestion of a friend, I figured I could get rid of the traditional holiday dates and replace them with truly secular holidays, which got me involved in a deep search for layout tools, which in turn naturally brought me to this LaTeX template. Those who have worked with LaTeX (or probably any professional layout tool) know what's next: I spent a significant amount of time perfecting the rendering and crafting the final document.

Slightly upset by the prices proposed by the corner store (15$CAD/calendar!), I figured I could do better by printing on my own, especially encouraged by a friend who had access to a good color laser printer. I then spent multiple days (if not weeks) looking for the right paper, which got me in the rabbit hole of paper weights, brightness, texture, and more. I'll just say this: if you ever thought lengths were ridiculous in the imperial system, wait until you find out how paper weights work. I finally managed to find some 270gsm gloss paper at the corner store -- after looking all over town, it was right there -- and did a first print of 15 calendars, which turned into 14 because of trouble with jammed paper. Because the printer couldn't do recto-verso copies, I had to spend basically 4 hours tending to that stupid device, bringing my loathing of printers (the machines) and my respect for printers (the people) to an entirely new level.

The time spent on the print was clearly not worth it in the end, and I ended up scheduling another print with a professional printer. The first proofs are clearly superior to the ones I have done myself and, in retrospect, completely worth the 15$ per copy.

I still haven't paid for my time in any significant way on that project, something I seem to excel at doing consistently. The prints themselves are not paid for, but my time in producing those photographs is not paid either, which clearly shows that my future as a professional photographer, if any, lies far away from producing those silly calendars, at least for now.

More documentation on the project is available, in French, in calendrier-2019. I am also hoping to eventually publish a graphical review of the calendar, but for now I'll leave that for the friends and family who will receive the calendar as a gift...

Archival of Brasil

Another modest project I embarked on was a mission to archive the government of Brazil following the election of the infamous Jair Bolsonaro, dictatorship supporter, homophobe, racist, nationalist and christian freak that somehow managed to get elected president of Brazil. Since he threatened to rip apart basically the entire fabric of Brazilian society, comrades were worried that he might attack and destroy precious archives and data from government archives when he comes into power, in January 2019. Like many countries in Latin America that lived under dictatorships in the 20th century, Brazil made an effort to investigate and keep memory of the atrocities that were committed during those troubled times.

Since I had written about archiving websites, those comrades naturally thought I could be of use, so we embarked on a crazy quest to archive Brazil, basically. We tried to create a movement similar to the Internet Archive (IA) response to the 2016 Trump election but were not really successful at getting IA involved. I was, fortunately, able to get the good folks at Archive Team (AT) involved and we have successfully archived a significant number of websites, adding terabytes of data to the IA through the backdoor that is AT. We also ran a bunch of archival on a special server, leveraging tools like youtube-dl, git-annex, wpull and, eventually, grab-site to archive websites, social network sites and video feeds.

I kind of burned out on the job. Following Brazilian politics was scary and traumatizing - I have been very close to Brazilian folks and they are colorful, friendly people. The idea that such a horrible person could come into power there is absolutely terrifying and I kept on thinking how disgusted I would be if I had to archive stuff from the government of Canada, which I do not particularly like either... This goes against a lot of my personal ethics, but then it beats the obscurity of pure destruction of important scientific, cultural and historical data.


Considering the workload involved in the above craziness, the fact that I worked on fewer projects than my usual madness shouldn't come as a surprise.

  • As part of the calendar work, I wrote a new tool called moonphases which shows a list of moon phase events in the given time period, and shipped that as part of undertime 1.5 for lack of a better place.

  • AlternC revival: friends at Koumbit asked me for source code of AlternC projects I was working on. I was disappointed (but not surprised) that upstream simply took those repositories down without publishing an archive. Thankfully, I still had SVN checkouts but unfortunately, those do not have the full history, so I reconstructed repositories based on the last checkout that I had for alternc-mergelog, alternc-stats, and alternc-slavedns.

  • I packaged two new projects into Debian, bitlbee-mastodon (to connect to the new Mastodon network over IRC) and python-internetarchive (a command line interface to the IA upload forms)

  • my work on archival tools led to a moderately important patch in pywb: allow symlinking and hardlinking files instead of just copying was important to manage multiple large WARC files along with git-annex.

  • I also noticed the IA people were using a tool called slurm to diagnose bandwidth problems on their networks and implemented iface speed detection on Linux while I was there. slurm is interesting, but I also found out about bmon through the hilarious hollywood project. Each has their advantages: bmon has packets per second graphs, while slurm only has bandwidth graphs, but also notices maximal burst speeds which is very useful.
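The idea behind the pywb patch mentioned above (linking large files into place instead of duplicating them) can be sketched in a few lines. This is not pywb's code: the function name and the `mode` values are hypothetical, chosen only to illustrate the three strategies.

```python
import os
import shutil


def place_file(src, dst, mode="copy"):
    """Make src available at dst by copying, hardlinking or symlinking it."""
    if mode == "hardlink":
        # Same inode, no extra disk space; both paths must be on one filesystem.
        os.link(src, dst)
    elif mode == "symlink":
        # A pointer to the original path; works across filesystems.
        os.symlink(os.path.abspath(src), dst)
    elif mode == "copy":
        # Full duplicate of the data, which is costly for multi-gigabyte WARCs.
        shutil.copy2(src, dst)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return dst
```

Linking matters with git-annex because annexed files are typically already symlinks into the annex, so copying them would double the disk usage of every large WARC file.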

Debian Long Term Support (LTS)

This is my monthly Debian LTS report. Note that my previous report wasn't published on this blog but on the mailing list.

Enigmail / GnuPG 2.1 backport

I've spent a significant amount of time working on the Enigmail backport for a third consecutive month. I first published a straightforward backport of GnuPG 2.1 depending on the libraries available in jessie-backports last month, but then I actually rebuilt the dependencies as well and sent a "HEADS UP" to the mailing list, which finally got peoples' attention.

There are many changes bundled in that possible update: GnuPG actually depends on about half a dozen other libraries, mostly specific to GnuPG, but in some cases used by third party software as well. The most problematic one is libgcrypt20 which Emilio Pozuelo Monfort said included tens of thousands of lines of change. So even though I tested the change against cryptsetup, gpgme, libotr, mutt and Enigmail itself, there are concerns that other dependencies merit more testing as well.

This caused many to raise the idea of aborting the work and simply marking Enigmail as unsupported in jessie. But Daniel Kahn Gillmor suggested this should also imply removing Thunderbird itself from jessie, as simply removing Enigmail will force people to use the binaries from Mozilla's add-ons service. Gillmor explained those builds include an OpenPGP.js implementation of dubious origin, which is especially problematic considering it deals with sensitive private key material.

It's unclear which way this will go next. I'm taking a break from this issue and hope others will be able to test the packages. If we keep on working on Enigmail, the next step will be to re-enable the dbg packages that were removed in the stretch updates, use dh-autoreconf correctly, remove some mingw packages I forgot and test gcrypt like crazy (especially the 1.7 update). We'd also update to the latest Enigmail, as it fixes issues that forced the Tails project to disable autocrypt because of weird interactions that make it send cleartext (instead of encrypted) mail in some cases.

Automatic unclaimer

My previous report yielded an interesting discussion around my work on the security tracker, specifically the "automatic unclaimer" designed to unassign issues that are idle for too long. Holger Levsen, with his new coordinator hat, tested the program and found many bugs and missing features, which I was happy to implement. After many patches and back and forth, it seems the program is working well, although it's run by hand by the coordinator.
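The core rule of such an unclaimer is simple enough to sketch. The following is not the security tracker's code: the data structure, the names and the 14-day threshold are all assumptions made up for illustration.

```python
from datetime import date, timedelta


def stale_claims(claims, today, max_idle=timedelta(days=14)):
    """Return the packages whose claim has been idle for longer than max_idle.

    `claims` maps a package name to a (claimant, date of last activity) pair.
    """
    return sorted(
        package
        for package, (_claimant, last_activity) in claims.items()
        if today - last_activity > max_idle
    )


# Hypothetical claims, for illustration only.
claims = {
    "systemd": ("alice", date(2018, 11, 1)),
    "tiff": ("bob", date(2018, 12, 10)),
}
print(stale_claims(claims, today=date(2018, 12, 15)))
```

Returning a list rather than unassigning anything directly fits the way the tool is described as being used: a coordinator runs it by hand and decides what to do with the result.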

DLA website publication

I took a look at various issues surrounding the publication of LTS advisories on the main website. While normal security advisories are regularly published on about 500+ DLAs are missing from the website, mainly because DLAs are not automatically imported.

As it turns out, there is a script called that is designed to handle those entries but for some reason, they are not imported anymore. So I got to work to import the backlog and make sure new entries are properly imported.

Various fixes for were necessary to properly parse messages both from the templates generated by gen-DLA and from the existing archives. Then I tested the result with two existing advisories, which resulted in two MRs on the webml repo: add data for DLA-1561 and add dla-1580 advisory. I requested and was granted access to the repo, and eventually merged my own MRs after a review from Levsen.

I eventually used the following procedure to test importing the entire archive:

rsync -vPa .
cd debian-lts-announce
xz -d *.xz
cat * > ../giant.mbox
mbox2maildir ../giant.mbox debian-lts-announce.d
for mail in debian-lts-announce.d/cur/*; do ~/src/security-tracker/./ $mail; done

This led to 82 errors on an empty directory, which is not bad at all considering the amount of data processed. Of course, there were many more errors in the live directory as many advisories were already present. In the live directory, this resulted in 2431 new advisories added to the website.
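The mbox-to-Maildir step of that procedure can also be done with Python's standard `mailbox` module, which may be handy where `mbox2maildir` is not installed. This is a minimal sketch rather than a drop-in replacement: the function name is made up, and messages added this way land in the Maildir's `new/` subdirectory instead of `cur/`.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count."""
    source = mailbox.mbox(mbox_path, create=False)
    target = mailbox.Maildir(maildir_path, create=True)
    count = 0
    try:
        for message in source:
            # Plain mbox messages are stored under new/ by Maildir.add().
            target.add(message)
            count += 1
    finally:
        source.close()
        target.close()
    return count
```

A call like `mbox_to_maildir("giant.mbox", "debian-lts-announce.d")` would then leave one file per advisory, ready to be fed to a parser in a loop.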

There were a few corner cases:

  • The first month or so didn't use DLA identifiers and many of those were not correctly imported even back then.

  • DLA-574-1 was a duplicate, covered by the DLA-574-2 regression update. But I only found the Imagemagick advisory - it looks like the qemu one was never published.

  • Similarly, the graphite2 regression was never assigned a real identifier.

  • Other cases include for example DLA-787-1 which was sent twice and the DLA-1263-1 duplicate, which was unrecoverable as it was never added to data/DLA/list

Those special cases will all need to be handled by an eventual automation of this process, which I still haven't quite figured out. Maybe a process similar to the unclaimer will be followed: the coordinator or me could add missing DLAs until we streamline the process, as it seems unlikely we will want to add more friction to the DLA release by forcing workers to send merge requests to the web team, as that will only put more pressure on the web team...

There are also nine advisories missing from the mailing list archive because of a problem with the mailing list server at that time. We'll need to extract those from people's email archives, which I am not sure how to coordinate at this point.

PHP CVE identifier confusion

I have investigated CVE-2018-19518, mistakenly identified as CVE-2018-19158 in various places, including upstream's bugtracker. I requested the latter erroneous CVE-2018-19158 to be retired to avoid any future confusion. Unfortunately, Mitre indicated the CVE was already in "active use for pre-disclosure vulnerability coordination", which made it impossible to correct the error at that level.

I've instead asked upstream to correct the metadata in their tracker but it seems nothing has changed there yet.

Categories: External Blogs

Large files with Git: LFS and git-annex

Anarcat - Mon, 12/10/2018 - 19:00

Git does not handle large files very well. While there is work underway to handle large repositories through the commit graph work, Git's internal design has remained surprisingly constant throughout its history, which means that storing large files into Git comes with a significant and, ultimately, prohibitive performance cost. Thankfully, other projects are helping Git address this challenge. This article compares how Git LFS and git-annex address this problem and should help readers pick the right solution for their needs.

The problem with large files

As readers probably know, Linus Torvalds wrote Git to manage the history of the kernel source code, which is a large collection of small files. Every file is a "blob" in Git's object store, addressed by its cryptographic hash. A new version of that file will store a new blob in Git's history, with no deduplication between the two versions. The pack file format can store binary deltas between similar objects, but if many objects of similar size change in a repository, that algorithm might fail to properly deduplicate. In practice, large binary files (say JPEG images) have an irritating tendency of changing completely when even the smallest change is made, which makes delta compression useless.
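As a small illustration of content addressing, the following Python sketch computes the identifier Git gives to a blob, assuming the classic SHA-1 object format (a "blob <size>" header, a NUL byte, then the content). The function name is mine, not Git's.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object id Git assigns to a blob with this content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


original = b"x" * 1_000_000
modified = original[:-1] + b"y"  # change a single byte out of a million

# Two unrelated identifiers: each version is stored as its own blob.
print(git_blob_id(original))
print(git_blob_id(modified))
```

Changing a single byte yields a completely different identifier, so each version is a separate object; any space saving has to come later, from delta compression in pack files.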

There have been different attempts at fixing this in the past. In 2006, Torvalds worked on improving the pack-file format to reduce object duplication between the index and the pack files. Those changes were eventually reverted because, as Nicolas Pitre put it: "that extra loose object format doesn't appear to be worth it anymore".

Then in 2009, Caca Labs worked on improving the fast-import and pack-objects Git commands to do special handling for big files, in an effort called git-bigfiles. Some of those changes eventually made it into Git: for example, since 1.7.6, Git will stream large files directly to a pack file instead of holding them all in memory. But files are still kept forever in the history.

An example of trouble I had to deal with is for the Debian security tracker, which follows all security issues in the entire Debian history in a single file. That file is around 360,000 lines for a whopping 18MB. The resulting repository takes 1.6GB of disk space and a local clone takes 21 minutes to perform, mostly taken up by Git resolving deltas. Commit, push, and pull are noticeably slower than a regular repository, taking anywhere from a few seconds to a minute depending on how old the local copy is. And running annotate on that large file can take up to ten minutes. So even though that is a simple text file, it's grown large enough to cause significant problems for Git, which is otherwise known for stellar performance.

Intuitively, the problem is that Git needs to copy files into its object store to track them. Third-party projects therefore typically solve the large-files problem by taking files out of Git. In 2009, Git evangelist Scott Chacon released GitMedia, which is a Git filter that simply takes large files out of Git. Unfortunately, there hasn't been an official release since then and it's unclear if the project is still maintained. The next effort to come up was git-fat, first released in 2012 and still maintained. But neither tool has seen massive adoption yet. If I had to venture a guess, it might be because both require manual configuration. Both also require a custom server (rsync for git-fat; S3, SCP, Atmos, or WebDAV for GitMedia), which limits collaboration since users need access to another service.

Git LFS

That was before GitHub released Git Large File Storage (LFS) in August 2015. Like all software taking files out of Git, LFS tracks file hashes instead of file contents. So instead of adding large files into Git directly, LFS adds a pointer file to the Git repository, which looks like this:

    version
    oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
    size 12345

LFS then uses Git's smudge and clean filters to show the real file on checkout. Git only stores that small text file and does so efficiently. The downside, of course, is that large files are not version controlled: only the latest version of a file is kept in the repository.
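Since a pointer file is just a few "key value" lines, reading one takes very little code. The following Python sketch is a hypothetical helper, not part of Git LFS; it only assumes the three keys shown above (version, oid, size) and an oid of the form algorithm:hash.

```python
from typing import NamedTuple


class Pointer(NamedTuple):
    version: str
    hash_algo: str
    oid: str
    size: int


def parse_pointer(text: str) -> Pointer:
    """Parse the "key value" lines of a pointer file (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first space only: the value itself may contain spaces.
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, oid = fields["oid"].partition(":")
    return Pointer(fields.get("version", ""), hash_algo, oid, int(fields["size"]))
```

The size field is what lets a tool show the real file size, or decide whether to download the content at all, without fetching anything.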

Git LFS can be used in any repository by installing the right hooks with git lfs install then asking LFS to track any given file with git lfs track. This will add the file to the .gitattributes file which will make Git run the proper LFS filters. It's also possible to add patterns to the .gitattributes file, of course. For example, this will make sure Git LFS will track MP3 and ZIP files:

    $ cat .gitattributes
    *.mp3 filter=lfs -text
    *.zip filter=lfs -text

After this configuration, we use Git normally: git add, git commit, and so on will talk to Git LFS transparently.

The actual files tracked by LFS are copied to a path like .git/lfs/objects/{OID-PATH}, where {OID-PATH} is a sharded file path of the form OID[0:2]/OID[2:4]/OID and where OID is the content's hash (currently SHA-256) of the file. This brings the extra feature that multiple copies of the same file in the same repository are automatically deduplicated, although in practice this rarely occurs.
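The sharded layout described above is easy to reproduce. This Python sketch derives the storage path from a file's content, assuming SHA-256 as the hash; the function name is made up for the example.

```python
import hashlib
from pathlib import PurePosixPath


def lfs_object_path(content: bytes) -> PurePosixPath:
    """Return the sharded path OID[0:2]/OID[2:4]/OID under .git/lfs/objects."""
    oid = hashlib.sha256(content).hexdigest()
    return PurePosixPath(".git/lfs/objects") / oid[0:2] / oid[2:4] / oid
```

Because the path depends only on the content, two identical files map to the same path, which is where the automatic deduplication comes from.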

Git LFS will copy large files to that internal storage on git add. When a file is modified in the repository, Git notices, the new version is copied to the internal storage, and the pointer file is updated. The old version is left dangling until the repository is pruned.

This process only works for new files you are importing into Git, however. If a Git repository already has large files in its history, LFS can fortunately "fix" repositories by retroactively rewriting history with git lfs migrate. This has all the normal downsides of rewriting history, however --- existing clones will have to be reset to benefit from the cleanup.

LFS also supports file locking, which allows users to claim a lock on a file, making it read-only everywhere except in the locking repository. This allows users to signal others that they are working on an LFS file. Those locks are purely advisory, however, as users can remove other users' locks by using the --force flag. LFS can also prune old or unreferenced files.

The main limitation of LFS is that it's bound to a single upstream: large files are usually stored in the same location as the central Git repository. If it is hosted on GitHub, this means a default quota of 1GB storage and bandwidth, but you can purchase additional "packs" to expand both of those quotas. GitHub also limits the size of individual files to 2GB. This upset some users surprised by the bandwidth fees, which were previously hidden in GitHub's cost structure.

While the actual server-side implementation used by GitHub is closed source, there is a test server provided as an example implementation. Other Git hosting platforms have also implemented support for the LFS API, including GitLab, Gitea, and BitBucket; that level of adoption is something that git-fat and GitMedia never achieved. LFS does support hosting large files on a server other than the central one --- a project could run its own LFS server, for example --- but this will involve a different set of credentials, bringing back the difficult user onboarding that affected git-fat and GitMedia.

Another limitation is that LFS only supports pushing and pulling files over HTTP(S) --- no SSH transfers. LFS uses some tricks to bypass HTTP basic authentication, fortunately. This also might change in the future as there are proposals to add SSH support, resumable uploads through the protocol, and other custom transfer protocols.

Finally, LFS can be slow. Every file added to LFS takes up double the space on the local filesystem as it is copied to the .git/lfs/objects storage. The smudge/clean interface is also slow: it works as a pipe, but buffers the file contents in memory each time, which can be prohibitive with files larger than available memory.
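To show why buffering matters, here is a Python sketch of the alternative: hashing a stream in fixed-size chunks, so memory use stays constant whatever the file size. This is only an illustration of the streaming idea, not how Git or Git LFS is implemented; the function name and chunk size are arbitrary.

```python
import hashlib
import io


def hash_stream(stream, chunk_size=1024 * 1024):
    """Return (SHA-256 hex digest, size), reading chunk_size bytes at a time."""
    digest = hashlib.sha256()
    size = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read means end of stream
            break
        digest.update(chunk)
        size += len(chunk)
    return digest.hexdigest(), size


# Usage: any binary file-like object works, including a pipe such as sys.stdin.buffer.
oid, size = hash_stream(io.BytesIO(b"some large content"))
```

A filter written this way never holds more than one chunk in memory, whereas an interface that hands over the whole file at once cannot avoid the cost.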

git-annex

The other main player in large file support for Git is git-annex. We covered the project back in 2010, shortly after its first release, but it's certainly worth discussing what has changed in the eight years since Joey Hess launched the project.

Like Git LFS, git-annex takes large files out of Git's history. The way it handles this is by storing a symbolic link to the file in .git/annex. We should probably credit Hess for this innovation, since the Git LFS storage layout is obviously inspired by git-annex. The original design of git-annex introduced all sorts of problems however, especially on filesystems lacking symbolic-link support. So Hess has implemented different solutions to this problem. Originally, when git-annex detected such a "crippled" filesystem, it switched to direct mode, which kept files directly in the work tree, while internally committing the symbolic links into the Git repository. This design turned out to be a little confusing to users, including myself; I have managed to shoot myself in the foot more than once using this system.

Since then, git-annex has adopted a different v7 mode that is also based on smudge/clean filters, which it called "unlocked files". Like Git LFS, unlocked files will double disk space usage by default. However it is possible to reduce disk space usage by using "thin mode" which uses hard links between the internal git-annex disk storage and the work tree. The downside is, of course, that changes are immediately performed on files, which means previous file versions are automatically discarded. This can lead to data loss if users are not careful.
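The trade-off of hard links can be demonstrated without git-annex at all. In this Python sketch, the two file names are stand-ins for the internal storage and the work tree copy; it assumes a filesystem that supports hard links.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    stored = os.path.join(tmp, "annex-object")  # stands in for the internal copy
    worktree = os.path.join(tmp, "video.mp4")   # stands in for the work tree file

    with open(stored, "wb") as f:
        f.write(b"version 1")

    # Hard link: two names for one inode, so no extra disk space is used.
    os.link(stored, worktree)
    same_inode = os.stat(stored).st_ino == os.stat(worktree).st_ino

    # Editing in place through the work tree name also changes the "stored" copy.
    with open(worktree, "r+b") as f:
        f.write(b"VERSION 2")

    with open(stored, "rb") as f:
        stored_after_edit = f.read()

print(same_inode, stored_after_edit)
```

After the in-place edit, the old content is gone from both names, which is exactly the data-loss risk mentioned above.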

Furthermore, git-annex in v7 mode suffers from some of the performance problems affecting Git LFS, because both use the smudge/clean filters. Hess actually has ideas on how the smudge/clean interface could be improved. He proposes changing Git so that it stops buffering entire files into memory, allows filters to access the work tree directly, and adds the hooks he found missing (for stash, reset, and cherry-pick). Git-annex already implements some tricks to work around those problems itself but it would be better for those to be implemented in Git natively.

Being more distributed by design, git-annex does not have the same "locking" semantics as LFS. Locking a file in git-annex means protecting it from changes, so files need to actually be in the "unlocked" state to be editable, which might be counter-intuitive to new users. In general, git-annex has some of those unusual quirks and interfaces that often come with more powerful software.

And git-annex is much more powerful: it not only addresses the "large-files problem" but goes much further. For example, it supports "partial checkouts" --- downloading only some of the large files. I find that especially useful to manage my video, music, and photo collections, as those are too large to fit on my mobile devices. Git-annex also has support for location tracking, where it knows how many copies of a file exist and where, which is useful for archival purposes. And while Git LFS is only starting to look at transfer protocols other than HTTP, git-annex already supports a large number through a special remote protocol that is fairly easy to implement.

"Large files" is therefore only scratching the surface of what git-annex can do: I have used it to build an archival system for remote native communities in northern Québec, while others have built a similar system in Brazil. It's also used by the scientific community in projects like GIN and DataLad, which manage terabytes of data. Another example is the Japanese American Legacy Project which manages "upwards of 100 terabytes of collections, transporting them from small cultural heritage sites on USB drives".

Unfortunately, git-annex is not well supported by hosting providers. GitLab used to support it, but since it implemented Git LFS, it dropped support for git-annex, saying it was a "burden to support". Fortunately, thanks to git-annex's flexibility, it may eventually be possible to treat LFS servers as just another remote which would make git-annex capable of storing files on those servers again.

Conclusion

Git LFS and git-annex are both mature and well-maintained programs that deal efficiently with large files in Git. LFS is easier to use and is well supported by major Git hosting providers, but it's less flexible than git-annex.

Git-annex, in comparison, allows you to store your content anywhere and espouses Git's distributed nature more faithfully. It also uses all sorts of tricks to save disk space and improve performance, so it should generally be faster than Git LFS. Learning git-annex, however, feels like learning Git: you always feel you are not quite there and you can always learn more. It's a double-edged sword and can feel empowering for some users and terrifyingly hard for others. Where you stand on the "power-user" scale, along with project-specific requirements will ultimately determine which solution is the right one for you.

Ironically, after thorough evaluation of large-file solutions for the Debian security tracker, I ended up proposing to rewrite history and split the file by year which improved all performance markers by at least an order of magnitude. As it turns out, keeping history is critical for the security team so any solution that moves large files outside of the Git repository is not acceptable to them. Therefore, before adding large files into Git, you might want to think about organizing your content correctly first. But if large files are unavoidable, the Git LFS and git-annex projects allow users to keep using most of their current workflow.
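For the curious, splitting a record-oriented file by year can be sketched in a few lines of Python. This is not the script used for the security tracker: it assumes that each record starts with a line beginning with "CVE-YYYY-" and that the following lines belong to that record, and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Assumed record header: a line starting with "CVE-<4-digit year>-".
HEADER_RE = re.compile(r"^CVE-(\d{4})-")


def split_by_year(lines):
    """Group lines into per-year buckets, keeping records in their original order."""
    buckets = defaultdict(list)
    current_year = None
    for line in lines:
        match = HEADER_RE.match(line)
        if match:
            current_year = match.group(1)
        if current_year is None:
            raise ValueError("continuation line before any record header")
        # Continuation lines follow the most recent header into its bucket.
        buckets[current_year].append(line)
    return dict(buckets)
```

Each bucket can then be written to its own file, so a typical change only touches the current year's much smaller file. Rewriting the existing history so that old commits follow the same split is a separate and more invasive step.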

This article first appeared in the Linux Weekly News.


October 2018 report: LTS, Monkeysphere, Flatpak, Kubernetes, CD archival and calendar project

Anarcat - Thu, 11/01/2018 - 15:12
Debian Long Term Support (LTS)

This is my monthly Debian LTS report.

GnuTLS

As discussed last month, one of the options to resolve the pending GnuTLS security issues was to backport the latest 3.3.x series (3.3.30), an update proposed then uploaded as DLA-1560-1. After a suggestion, I've included an explicit NEWS.Debian item warning people about the upgrade, a warning also included in the advisory itself.

The most important change is probably dropping SSLv3, RC4, HMAC-SHA384 and HMAC-SHA256 from the list of algorithms, which could impact interoperability. Considering how old RC4 and SSLv3 are, however, this should be a welcome change. As for the HMAC changes, those are mandatory to fix the targeted vulnerabilities (CVE-2018-10844, CVE-2018-10845, CVE-2018-10846).

Xen

Xen updates had been idle for a while in LTS, so I bit the bullet and made a first discovery of the pending vulnerabilities. I sent the result to the folks over at Credativ who maintain the 4.4 branch and they came back with a set of proposed updates which I briefly reviewed. Unfortunately, the patches were too deep for me: all I was able to do was to confirm consistency with upstream patches.

I also brought up a discussion regarding the viability of Xen in LTS, especially regarding the "speculative execution" vulnerabilities (XSA-254 and related). My understanding is upstream Xen fixes are not (yet?) complete, but apparently that is incorrect as Peter Dreuw is "confident in the Xen project to provide a solution for these issues". I nevertheless consider, like Red Hat, that the simpler KVM implementation might provide more adequate protection against those kinds of attacks and LTS users should seriously consider switching to KVM for hosting untrusted virtual machines, even if only because that code is actually mainline in the kernel while Xen is unlikely to ever be. It might be, as Dreuw said, simpler to upgrade to stretch than switch virtualization systems...

When all is said and done, however, Linux and KVM are patched in Jessie at the time of writing, while Xen is not (yet).

Enigmail

I spent a significant amount of time working on Enigmail this month again, this time specifically working on reviewing the stretch proposed update to gnupg from Daniel Kahn Gillmor (dkg). I did not publicly share the code review as we were concerned it would block the stable update, which seemed to be in jeopardy when I started working on the issue. Thankfully, the update went through but it means it might impose extra work on leaf packages. Monkeysphere, in particular, might fail to build from source (FTBFS) after the gnupg update lands.

In my tests, however, it seems that packages using GPG can deal with the update correctly. I tested Monkeysphere, Password Store, git-remote-gcrypt and Enigmail, all of which passed a summary smoke test. I have tried to summarize my findings on the mailing list. Basically our options for the LTS update are:

  1. pretend Enigmail works without changing GnuPG, possibly introducing security issues

  2. ship a backport of GnuPG and Enigmail through jessie-sloppy-backports

  3. package OpenPGP.js and backport all the way down to jessie

  4. remove Enigmail from jessie

  5. backport the required GnuPG patchset from stretch to jessie

So far I've taken that last step as my favorite approach...

Firefox / Thunderbird and finding work

... which brings us to the Firefox and Thunderbird updates. I was assuming those were going ahead, but the status of those updates currently seems unclear. This is a symptom of a larger problem in the LTS work organization: some packages can stay "claimed" for a long time without an obvious status update.

We discussed ways of improving on this process and, basically, I will try to be more proactive in taking over packages from others and reaching out to others to see if they need help.

A note on GnuPG

As an aside to the Enigmail / GnuPG review, I was struck by the ... peculiarities in the GnuPG code during my review. I discovered that GnuPG, instead of using the standard resolver, implements its own internal full-stack DNS server, complete with UDP packet parsing. That's 12 000 lines of code right there. There are also abstraction leaks like using "1" and "0" as boolean values inside functions (as opposed to passing an integer and converting it to a string on output).

A major change in the proposed patchset is the set of changes to the --with-colons batch output, which GnuPG consumers (like GPGME) are supposed to use to interoperate with GnuPG. Having written such a parser myself, I can attest to how difficult parsing those data structures is. Normally, you should always be using GPGME instead of parsing those directly, but unfortunately GPGME does not do everything GPG does: signing operations and keyring management, for example, have long been considered out of scope, so users are forced to parse that output.

Long story short, GPG consumers still use --with-colons directly (and that includes Enigmail) because they have to. In this case, critical components were missing from that output (e.g. knowing which key signed which UID) so they were added in the patch. That's what breaks the Monkeysphere test suite, which doesn't expect a specific field to be present. Later versions of the protocol specification have been updated (by dkg) to clarify that might happen, but obviously some have missed the notice, as it came a bit late.
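The breakage described here is typical of parsers that assume a fixed number of fields. A more tolerant approach can be sketched as follows in Python. The field positions (record type first, key ID fifth, user ID tenth) are my reading of GnuPG's documentation of the colon format, the sample line in the usage is made up, and none of this is code from Monkeysphere or Enigmail.

```python
# Zero-based positions of a few fields in a colon-delimited record.
# Assumption: type, validity, key ID and user ID are fields 1, 2, 5 and 10.
FIELD_NAMES = {
    0: "type",
    1: "validity",
    4: "keyid",
    9: "user_id",
}


def parse_colon_record(line, min_fields=12):
    """Parse one record, tolerating both missing and additional fields."""
    fields = line.rstrip("\n").split(":")
    # Pad short records so missing trailing fields read as empty strings
    # (multiplying a list by a negative number gives an empty list).
    fields += [""] * (min_fields - len(fields))
    record = {name: fields[index] for index, name in FIELD_NAMES.items()}
    # Keep fields we do not know about instead of rejecting the record.
    record["extra"] = fields[min_fields:]
    return record


# Usage, with a made-up record:
record = parse_colon_record("uid:u::::1540000000::HASH::Alice <alice@example.org>:")
```

A parser written this way keeps working when a later version of the output gains fields, which is the situation the specification was updated to warn about.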

In any case, the review did not make me confident in the software architecture or implementation of the GnuPG program.

autopkgtest testing

As part of our LTS work, we often run tests to make sure everything is in order. Starting with Jessie, we are now seeing packages with autopkgtest enabled, so I started meddling with that program. One of the ideas I was hoping to implement was to unify my virtualization systems. Right now I'm using:

Because sbuild can talk with autopkgtest, and autopkgtest can talk with qemu (which can use KVM images), I figured I could get rid of schroot. Unfortunately, I met a few snags:

  • #911977: how do we correctly guess the VM name in autopkgtest?
  • #911963: qemu build fails with proxy_cmd: parameter not set (fixed and provided a patch)
  • #911979: fails on chown in autopkgtest-qemu backend
  • #911981: qemu server warns about missing CPU features

So I gave up on that approach. But I did get autopkgtest working and documented the process in my quick Debian development guide.

Oh, and I also got sucked down into wiki stylesheet (#864925) after battling with the SystemBuildTools page.

Spamassassin followup

Last month I agreed we could backport the latest upstream version of SpamAssassin (a recurring pattern). After getting the go-ahead from the maintainer, I got a test package uploaded but the actual upload will need to wait for the stretch update (#912198) to land to avoid a versioning conflict.

Salt Stack

My first impression of Salt was not exactly impressive. The CVE-2017-7893 issue was rather unclear: first upstream fixed the issue, but reverted the default flag which would enable signature forging after it was discovered this would break compatibility with older clients.

But even worse, the 2014 version of Salt shipped in Jessie did not have master signing in the first place, which means there was simply no way to protect from master impersonation, a worrisome concept. But I assumed this was expected behavior and triaged this away from jessie, and tried to forget about the horrors I had seen.

phpLDAPadmin with sunweaver

I looked next at the phpLDAPadmin (or PHPLDAPadmin?) vulnerabilities, but could not reproduce the issue using the provided proof of concept. I have also audited the code and it seems pretty clear the code is protected against such an attack, as was explained by another DD in #902186. So I asked Mitre for rejection, and uploaded DLA-1561-1 to fix the other issue (CVE-2017-11107). Meanwhile the original security researcher acknowledged the security issue was a "false positive", although only in a private email.

I almost did an NMU for the package but the security team requested to wait, and marked the package as grave so it gets kicked out of buster instead. I at least submitted the patch, originally provided by Ubuntu folks, upstream.

Smarty

Finally, I worked on the smarty3 package. I confirmed the package in jessie is not vulnerable, because Smarty hadn't yet had the brilliant idea of "optimizing" realpath by rewriting it with new security vulnerabilities. Indeed, the CVE-2018-13982 proof of concept and CVE-2018-16831 proof of concept both fail in jessie.

I have tried to audit the patch shipped with stretch to make sure it fixed the security issue in question (without introducing new ones, of course) but abandoned parsing the stretch patch because this regex gave me a headache:

    '%^(?<root>(?:[[:alpha:]]:[\\\\]|/|[\\\\]{2}[[:alpha:]]+|[[:print:]]{2,}:[/]{2}|[\\\\])?)(?<path>(?:[[:print:]]*))$%u'

"who is supporting our users?"

I finally participated in a discussion regarding concerns about support of cloud images for LTS releases. I proposed that, like other parts of Debian, responsibility of those images would shift to the LTS team when official support is complete. Cloud images fall in that weird space (ie. "Installing Debian") which is not traditionally covered by the LTS team.

Hopefully that will become the policy, but only time will tell how this will play out.

Other free software work

irssi sandbox

I had been uncomfortable running irssi as my main user on my server for a while. It's a constantly running network server, sometimes connecting to shady servers too. So it made sense to run this as a separate user and, while I'm there, start it automatically on boot.

I created the following file in /etc/systemd/system/irssi@.service, based on this gist:

    [Unit]
    Description=IRC screen session

    [Service]
    Type=forking
    User=%i
    ExecStart=/usr/bin/screen -dmS irssi irssi
    ExecStop=/usr/bin/screen -S irssi -X stuff '/quit\n'
    NoNewPrivileges=true

    [Install]

A whole apparmor/selinux/systemd profile could be written for irssi of course, but I figured I would start with NoNewPrivileges. Unfortunately, that line breaks screen, which is sgid utmp which is some sort of "new privilege". So I'm running this as a vanilla service. To enable, simply enable the service with the right username, previously created with adduser:

    systemctl enable irssi@foo.service
    systemctl start irssi@foo.service

Then I join the session by logging in as the foo user, which can be configured in .ssh/config as a convenience host:

    Host
        Hostname
        User foo
        IdentityFile ~/.ssh/id_ed25519_irc
        # using command= in authorized_keys until we're all on buster
        #RemoteCommand screen -x
        RequestTTY force

Then the ssh command rejoins the screen session.

Monkeysphere revival

Monkeysphere was in bad shape in Debian buster. The bit-rotten test suite was failing and the package was about to be removed from the next Debian release. I filed and worked on many critical bugs (Debian bug #909700, Debian bug #908228, Debian bug #902367, Debian bug #902320, Debian bug #902318, Debian bug #899060, Debian bug #883015) but the final fix came from another user. I was also welcomed onto the Debian packaging team, which should allow me to make a new release next time we have similar issues, which was a blocker this time round.

Unfortunately, I had to abandon the Monkeysphere FreeBSD port. I had simply forgotten about that commitment and, since I do not run FreeBSD anywhere anymore, it made little sense to keep on doing so, especially since most of the recent updates were done by others anyways.

Calendar project

I've been working on a photography project since the beginning of the year. Each month, I pick the best picture out of my various shoots and will collect those in a 2019 calendar. I documented my work in the photo page, but most of my work in October was around finding a proper tool to lay out the calendar itself. I settled on wallcalendar, a beautiful LaTeX template, because the author was very responsive to my feature request.

I also figured out which events to include in the calendar and a way to generate moon phases (now part of the undertime package) for the local timezone. I still have to figure out which other astronomical events to include. I had no response from the local Planetarium but (as always) good feedback from NASA folks who pointed me at useful resources to top up the calendar.

Kubernetes

I got deeper into Kubernetes work by helping friends set up a cluster and share knowledge on how to set up and manage the platforms. This led me to fix a bug in Kubespray, the install / upgrade tool we're using to manage Kubernetes. To get the pull request accepted, I had to go through the insanely byzantine CLA process of the CNCF, which was incredibly frustrating, especially since it was basically a one-line change. I also provided a code review of the Nextcloud helm chart and reviewed the python-hvac ITP, one of the dependencies of Kubespray.

As I get more familiar with Kubernetes, it does seem like it can solve real problems especially for shared hosting providers. I do still feel it's overly complex and over-engineered. It's difficult to learn and moving too fast, but Docker and containers are such a convenient way to standardize shipping applications that it's hard to deny this new trend does solve a problem that we have to fix right now.

CD archival

As part of my work on archiving my CD collection, I contributed three pull requests to fix issues I was having with the project, mostly regarding corner cases but also improvements on the Dockerfile. At my suggestion, upstream also enabled automatic builds for the Docker image which should make it easier to install and deploy.

I still wish to write an article on this, to continue my series on archives, which could happen in November if I can find the time...

Flatpak conversion

After reading a convincing benchmark I decided to give Flatpak another try and ended up converting all my Snap packages to Flatpak.

Flatpak has many advantages:

  • it's decentralized: like APT or F-Droid repositories, anyone can host their own (there is only one Snap repository, managed by Canonical)

  • it's faster: the above benchmarks hinted at this, but I could also confirm Signal starts and runs faster under Flatpak than Snap

  • it's standardizing: much of the work Flatpak is doing to make sense of how to containerize desktop applications is being standardized (and even adopted by Snap)

Much of this was spurred by the breakage of Zotero in Debian (Debian bug #864827) due to the Firefox upgrade. I made a wiki page to tell our users how to install Zotero in Debian considering Zotero might take a while to be packaged back in Debian (Debian bug #871502).

Debian work

Without my LTS hat, I worked on the following packages:

Other work

Usual miscellaneous:


Epic Lameness

Eric Dorland - Mon, 09/01/2008 - 17:26
now supports OpenID. Hooray! I'd like to make a comment on a thread about the RTL8187se chip I've got in my new MSI Wind. So I go to sign in with OpenID and instead of signing me in it prompts me to create an account with a name, username and password for the account. Huh? I just want to post to their forum, I don't want to create an account (at least not explicitly, if they want to do it behind the scenes fine). Isn't the point of OpenID to not have to create accounts and particularly not have to create new usernames and passwords to access websites? I'm not impressed.

Sentiment Sharing

Eric Dorland - Mon, 08/11/2008 - 23:28
Biella, I am from there and I do agree. If I was still living there I would try to form a team and make a bid. Simon even made noises about organizing a bid at DebConfs past. I wish he would :)

But a DebConf in New York would be almost as good.