
External Blogs

October 2018 report: LTS, Monkeysphere, Flatpak, Kubernetes, CD archival and calendar project

Anarcat - Thu, 11/01/2018 - 15:12
Debian Long Term Support (LTS)

This is my monthly Debian LTS report.

GnuTLS

As discussed last month, one of the options to resolve the pending GnuTLS security issues was to backport the latest 3.3.x series (3.3.30), an update proposed and then uploaded as DLA-1560-1. After a suggestion, I included an explicit NEWS.Debian item warning people about the upgrade, a warning also included in the advisory itself.

The most important change is probably dropping SSLv3, RC4, HMAC-SHA384 and HMAC-SHA256 from the list of algorithms, which could impact interoperability. Considering how old RC4 and SSLv3 are, however, this should be a welcome change. As for the HMAC changes, those are mandatory to fix the targeted vulnerabilities (CVE-2018-10844, CVE-2018-10845, CVE-2018-10846).
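
For those worried about the interoperability impact, a quick and admittedly crude smoke test is simply to attempt a handshake with gnutls-cli against the servers you care about after the upgrade (the hostname below is just a placeholder):

# attempt a TLS handshake and show the negotiated parameters
$ gnutls-cli -p 443 www.example.com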

Xen

Xen updates had been idle for a while in LTS, so I bit the bullet and did a first triage of the pending vulnerabilities. I sent the result to the folks over at Credativ who maintain the 4.4 branch and they came back with a set of proposed updates, which I briefly reviewed. Unfortunately, the patches were too deep for me: all I was able to do was to confirm their consistency with the upstream patches.

I also brought up a discussion regarding the viability of Xen in LTS, especially regarding the "speculative execution" vulnerabilities (XSA-254 and related). My understanding was that upstream Xen fixes are not (yet?) complete, but apparently that is incorrect, as Peter Dreuw is "confident in the Xen project to provide a solution for these issues". I nevertheless consider, like Red Hat, that the simpler KVM implementation might provide more adequate protection against those kinds of attacks, and LTS users should seriously consider switching to KVM for hosting untrusted virtual machines, even if only because that code is actually mainline in the kernel while Xen is unlikely to ever be. It might be, as Dreuw said, simpler to upgrade to stretch than to switch virtualization systems...

When all is said and done, however, Linux and KVM are patched in jessie at the time of writing, while Xen is not (yet).

Enigmail

I spent a significant amount of time working on Enigmail this month again, this time specifically working on reviewing the stretch proposed update to gnupg from Daniel Kahn Gillmor (dkg). I did not publicly share the code review as we were concerned it would block the stable update, which seemed to be in jeopardy when I started working on the issue. Thankfully, the update went through but it means it might impose extra work on leaf packages. Monkeysphere, in particular, might fail to build from source (FTBFS) after the gnupg update lands.

In my tests, however, it seems that packages using GPG can deal with the update correctly. I tested Monkeysphere, Password Store, git-remote-gcrypt and Enigmail, all of which passed a summary smoke test. I have tried to summarize my findings on the mailing list. Basically our options for the LTS update are:

  1. pretend Enigmail works without changing GnuPG, possibly introducing security issues

  2. ship a backport of GnuPG and Enigmail through jessie-sloppy-backports

  3. package OpenPGP.js and backport all the way down to jessie

  4. remove Enigmail from jessie

  5. backport the required GnuPG patchset from stretch to jessie

So far I've settled on that last option as my favorite approach...

Firefox / Thunderbird and finding work

... which brings us to the Firefox and Thunderbird updates. I was assuming those were going ahead, but the status of those updates currently seems unclear. This is a symptom of a larger problem in the LTS work organization: some packages can stay "claimed" for a long time without an obvious status update.

We discussed ways of improving on this process and, basically, I will try to be more proactive in taking over packages from others and reaching out to others to see if they need help.

A note on GnuPG

As an aside to the Enigmail / GnuPG review, I was struck by the ... peculiarities of the GnuPG code. I discovered that GnuPG, instead of using the standard resolver, implements its own internal full-stack DNS server, complete with UDP packet parsing. That's 12,000 lines of code right there. There are also abstraction leaks, like using "1" and "0" as boolean values inside functions (as opposed to passing an integer and converting it to a string on output).

A major change in the proposed patchset involves the --with-colons batch output, which GnuPG consumers (like GPGME) are supposed to use to interoperate with GnuPG. Having written such a parser myself, I can attest to how difficult parsing those data structures is. Normally, you should always be using GPGME instead of parsing that output directly, but unfortunately GPGME does not do everything GPG does: signing operations and keyring management, for example, have long been considered out of scope, so users are forced to parse that output.
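
To give an idea of what consumers deal with, here is a minimal sketch (not taken from the patchset) that pulls user IDs out of the colon-delimited listing; the field positions are documented in the doc/DETAILS file shipped with GnuPG:

# print field 10 (the User-ID) of every "uid" record in the listing
$ gpg --with-colons --list-keys | awk -F: '$1 == "uid" { print $10 }'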

Long story short, GPG consumers still use --with-colons directly (and that includes Enigmail) because they have to. In this case, critical components were missing from that output (e.g. knowing which key signed which UID) so they were added in the patch. That's what breaks the Monkeysphere test suite, which doesn't expect a specific field to be present. Later versions of the protocol specification have been updated (by dkg) to clarify that might happen, but obviously some have missed the notice, as it came a bit late.

In any case, the review did not make me confident in the software architecture or implementation of the GnuPG program.

autopkgtest testing

As part of our LTS work, we often run tests to make sure everything is in order. Starting with jessie, we are now seeing packages with autopkgtest enabled, so I started meddling with that program. One of the ideas I was hoping to implement was to unify my virtualization systems: right now I'm using schroot for sbuild and QEMU/KVM images for everything else.

Because sbuild can talk with autopkgtest, and autopkgtest can talk with qemu (which can use KVM images), I figured I could get rid of schroot. Unfortunately, I met a few snags:

  • #911977: how do we correctly guess the VM name in autopkgtest?
  • #911963: qemu build fails with proxy_cmd: parameter not set (fixed and provided a patch)
  • #911979: fails on chown in autopkgtest-qemu backend
  • #911981: qemu server warns about missing CPU features

So I gave up on that approach. But I did get autopkgtest working and documented the process in my quick Debian development guide.
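
For the record, the kind of invocation I was aiming for looks roughly like this (package and image paths are placeholders, and the QEMU image is assumed to have been built beforehand):

# run a package's test suite in an existing QEMU image
$ autopkgtest mypackage_1.0-1.dsc -- qemu ~/vm/autopkgtest-sid.img
# or against an sbuild-style schroot, the setup I was hoping to retire
$ autopkgtest mypackage_1.0-1.dsc -- schroot sid-amd64-sbuild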

Oh, and I also got sucked into a wiki stylesheet issue (#864925) after battling with the SystemBuildTools page.

Spamassassin followup

Last month I agreed we could backport the latest upstream version of SpamAssassin (a recurring pattern). After getting the go-ahead from the maintainer, I got a test package uploaded, but the actual upload will need to wait for the stretch update (#912198) to land to avoid a versioning conflict.

Salt Stack

My first impression of Salt was not exactly impressive. The CVE-2017-7893 issue was rather unclear: first upstream fixed the issue, but reverted the default flag which would enable signature forging after it was discovered this would break compatibility with older clients.

But even worse, the 2014 version of Salt shipped in jessie did not have master signing in the first place, which means there was simply no way to protect against master impersonation, a worrisome concept. But I assumed this was expected behavior, triaged this away from jessie, and tried to forget about the horrors I had seen.

phpLDAPadmin with sunweaver

I looked next at the phpLDAPadmin (or PHPLDAPadmin?) vulnerabilities, but could not reproduce the issue using the provided proof of concept. I also audited the code and it seems pretty clear the code is protected against such an attack, as was explained by another DD in #902186. So I asked MITRE for a rejection, and uploaded DLA-1561-1 to fix the other issue (CVE-2017-11107). Meanwhile the original security researcher acknowledged the security issue was a "false positive", although only in a private email.

I almost did an NMU for the package, but the security team requested that I wait, and marked the bug as grave so the package gets kicked out of buster instead. I did at least submit the patch, originally provided by Ubuntu folks, upstream.

Smarty3

Finally, I worked on the smarty3 package. I confirmed the package in jessie is not vulnerable, because Smarty hadn't yet had the brilliant idea of "optimizing" realpath by rewriting it with new security vulnerabilities. Indeed, the CVE-2018-13982 proof of concept and the CVE-2018-16831 proof of concept both fail in jessie.

I tried to audit the patch shipped with stretch to make sure it fixed the security issue in question (without introducing new ones, of course), but abandoned parsing the stretch patch because this regex gave me a headache:

'%^(?<root>(?:[[:alpha:]]:[\\\\]|/|[\\\\]{2}[[:alpha:]]+|[[:print:]]{2,}:[/]{2}|[\\\\])?)(?<path>(?:[[:print:]]*))$%u'

Who is supporting our users?

I finally participated in a discussion regarding concerns about support of cloud images for LTS releases. I proposed that, like other parts of Debian, responsibility for those images would shift to the LTS team when official support ends. Cloud images fall into that weird space (i.e. "Installing Debian") which is not traditionally covered by the LTS team.

Hopefully that will become the policy, but only time will tell how this will play out.

Other free software work

irssi sandbox

I had been uncomfortable running irssi as my main user on my server for a while. It's a constantly running network server, sometimes connecting to shady servers too. So it made sense to run this as a separate user and, while I'm there, start it automatically on boot.

I created the following file in /etc/systemd/system/irssi@.service, based on this gist:

[Unit]
Description=IRC screen session
After=network.target

[Service]
Type=forking
User=%i
ExecStart=/usr/bin/screen -dmS irssi irssi
ExecStop=/usr/bin/screen -S irssi -X stuff '/quit\n'
NoNewPrivileges=true

[Install]
WantedBy=multi-user.target

A whole AppArmor/SELinux/systemd profile could be written for irssi, of course, but I figured I would start with NoNewPrivileges. Unfortunately, that line breaks screen, which is sgid utmp, apparently some sort of "new privilege". So I'm running this as a vanilla service. To set it up, simply enable and start the service with the right username, previously created with adduser:

systemctl enable irssi@foo.service
systemctl start irssi@foo.service

Then I join the session by logging in as the foo user, which can be configured in .ssh/config as a convenience host:

Host irc.anarc.at
    Hostname shell.anarc.at
    User foo
    IdentityFile ~/.ssh/id_ed25519_irc
    # using command= in authorized_keys until we're all on buster
    #RemoteCommand screen -x
    RequestTTY force

Then the ssh irc.anarc.at command rejoins the screen session.
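
Until buster's RemoteCommand is available everywhere, the same effect can be had with a forced command on the server side, along these lines (a hypothetical ~foo/.ssh/authorized_keys entry, with placeholder key material):

# force every login with this key to reattach to the screen session
command="/usr/bin/screen -x" ssh-ed25519 AAAA...placeholder... irc-key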

Monkeysphere revival

Monkeysphere was in bad shape in Debian buster. The bit-rotten test suite was failing and the package was about to be removed from the next Debian release. I filed and worked on many critical bugs (Debian bug #909700, Debian bug #908228, Debian bug #902367, Debian bug #902320, Debian bug #902318, Debian bug #899060, Debian bug #883015), but the final fix came from another user. I was also welcomed onto the Debian packaging team, which should allow me to make a new release next time we have similar issues; not being able to do so was a blocker this time around.

Unfortunately, I had to abandon the Monkeysphere FreeBSD port. I had simply forgotten about that commitment and, since I do not run FreeBSD anywhere anymore, it made little sense to keep on doing so, especially since most of the recent updates were done by others anyways.

Calendar project

I've been working on a photography project since the beginning of the year. Each month, I pick the best picture out of my various shoots and will collect those in a 2019 calendar. I documented my work in the photo page, but most of my work in October was around finding a proper tool to lay out the calendar itself. I settled on wallcalendar, a beautiful LaTeX template, because the author was very responsive to my feature request.

I also figured out which events to include in the calendar and a way to generate moon phases (now part of the undertime package) for the local timezone. I still have to figure out which other astronomical events to include. I had no response from the local planetarium but (as always) good feedback from NASA folks, who pointed me at useful resources to fill out the calendar.

Kubernetes

I got deeper into Kubernetes work by helping friends set up a cluster and sharing knowledge on how to set up and manage the platform. This led me to fix a bug in Kubespray, the install/upgrade tool we're using to manage Kubernetes. To get the pull request accepted, I had to go through the insanely byzantine CLA process of the CNCF, which was incredibly frustrating, especially since it was basically a one-line change. I also provided a code review of the Nextcloud Helm chart and reviewed the python-hvac ITP, one of the dependencies of Kubespray.

As I get more familiar with Kubernetes, it does seem like it can solve real problems, especially for shared hosting providers. I do still feel it's overly complex and over-engineered: it's difficult to learn and moving too fast. But Docker and containers are such a convenient way to standardize shipping applications that it's hard to deny this new trend does solve a problem that we have to fix right now.

CD archival

As part of my work on archiving my CD collection, I contributed three pull requests to fix issues I was having with the project, mostly regarding corner cases but also improvements on the Dockerfile. At my suggestion, upstream also enabled automatic builds for the Docker image which should make it easier to install and deploy.

I still wish to write an article on this, to continue my series on archives, which could happen in November if I can find the time...

Flatpak conversion

After reading a convincing benchmark I decided to give Flatpak another try and ended up converting all my Snap packages to Flatpak.

Flatpak has many advantages:

  • it's decentralized: like APT or F-Droid repositories, anyone can host their own (there is only one Snap repository, managed by Canonical)

  • it's faster: the above benchmark hinted at this, but I could also confirm Signal starts and runs faster under Flatpak than Snap

  • it's standardizing: much of the work Flatpak is doing to figure out how to containerize desktop applications is being standardized (and even adopted by Snap)

Much of this was spurred by the breakage of Zotero in Debian (Debian bug #864827) due to the Firefox upgrade. I made a wiki page to tell our users how to install Zotero in Debian considering Zotero might take a while to be packaged back in Debian (Debian bug #871502).
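
For reference, installing such an application from Flathub boils down to two commands; this is a sketch assuming the application identifier is org.zotero.Zotero:

# add the Flathub repository if it is not already configured
$ flatpak remote-add --if-not-exists flathub https://flathub.org/repo/flathub.flatpakrepo
$ flatpak install flathub org.zotero.Zotero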

Debian work

Without my LTS hat, I worked on the following packages:

Other work

Usual miscellaneous:


Montréal-Python 73: Despotic Wagon

Montreal Python - Mon, 10/29/2018 - 23:00

Just in time for PyCon Canada, we are organizing an amazing evening with great local Pythonistas. It is your chance to come support them, get a preview of their talks and, who knows, maybe give them some feedback.

As for PyCon Canada: don't forget it's next month, on November 10-11, in Toronto, and there are still some tickets available. You should pick yours up at https://2018.pycon.ca/registration.

Presentations

Andrew Francis

Physical libraries are great! Managing library material via web interfaces leaves much to be desired. In the age of Siri and Alexa, why can't one manage one's library loans with text messaging or voice? This talk discusses those questions and answers them by prototyping a Python-based conversational agent.

Python packaging for everyone - Eric Araujo

Packaging in Python used to be a complicated affair, for technical and human reasons. Thankfully, in recent years the Python community has developed robust tools and practices. If you are wondering how to develop and distribute your project, this talk will show you the best of 2018!

Numpy to PyTorch - Shagun Sodhani

Numpy is the de-facto choice for array-based operations, while PyTorch is largely used as a deep learning framework. At the core, both provide a powerful N-dimensional tensor. This talk will focus on the similarities and differences between the two and how we can use PyTorch to augment Numpy.

Why are robots becoming Pythonistas? - Maxime St-Pierre

In the fast-paced and intense world of robotics, many praise a particular language, and this godsend is Python. In this talk, we will look at some robotic frameworks and try to understand why Python is a popular alternative to C++ and Java.

Keep It Simply Annotated, Stupid - Sébastien Portebois

Type declarations in Python. Heresy? Since when? Let's go over the support from Python 2.7 to 3.7, the constraints for developers and at runtime, and above all: why would we, or should we, even do this!

When

Monday November 5th, 2018 at 6PM

Where

Shopify Montreal Office, 490 rue de la Gauchetière, Montréal, Québec

Schedule
  • 6:00PM - Doors open
  • 6:30PM - Presentations
  • 8:00PM - End of the event
  • 8:15PM - Benelux

Montréal-Python 73: Call for Speakers - Despotic Wagon

Montreal Python - Sun, 10/14/2018 - 23:00

Just in time for the end of fall, we are officially calling for speakers for our next pythonesque rendez-vous.

Whether you want to practice your talk for PyCon Canada or present your latest discovery, send us your proposal by email at mtlpyteam@googlegroups.com or join us on Slack at http://montrealpython.org/en/slackin.

By the way, it's time to buy your ticket for PyCon Canada 2018 at the following link: https://2018.pycon.ca/.

When

Monday November 5th, 2018 at 6PM

Where

Shopify, 490 rue de la Gauchetière, Montréal, Québec (map)

Schedule

6:00PM - Doors open

6:30PM - Presentations

8:00PM - End of the event

8:15PM - Benelux


Archived a part of my CD collection

Anarcat - Thu, 10/11/2018 - 10:52

After about three days of work, I've finished archiving a part of my old CD collection. There were about 200 CDs in a cardboard box that were gathering dust. After reading Jonathan Dowland's post about CD archival, I got (rightly) worried they would be damaged beyond rescue, so I sat down and did some research on the rescue mechanisms. My notes are in rescue and I hope to turn this into a more agreeable LWN article eventually.

I post this here so I can put a note in the box with a permanent URL for future reference as well.

Remaining work

All the archives created were dumped in the ~/archive or ~/mp3 directories on curie. Data needs to be deduplicated, replicated, and archived somewhere more logical.

Inventory

I have a bunch of piles:

  • a spindle of disks that consists mostly of TV episodes, movies, distro and Windows images/ghosts. not imported.
  • a pile of tapes and Zip drives. not imported.
  • about forty backup disks. not imported.
  • about five "books" disks of various sorts. ISOs generated. partly integrated in my collection, others failed to import or were in formats that were considered non-recoverable
  • a bunch of orange seeds piles
    • Burn Your TV masters and copies
    • apparently live and unique samples - mostly imported in mp3
    • really old stuff with tons of dupes - partly sorted through, in jams4, the rest still in the pile
  • a pile of unidentified disks

All disks were eventually identified as trash, blanks, perfect, finished, defective, or not processed. A special "needs attention" stack served as the "to do" pile and would get sorted into the other piles. Each pile was labeled with a sticky note and taped together summarily.

A post-it pointing to the blog post was included in the box, along with a printed version of the blog post summarizing a snapshot of this inventory.

Here is a summary of what's in the box.

Type           Count  Note
trash          13     non-recoverable: not detected by the Linux kernel at all; no further attempt was made to recover them
blanks         3      never written to, still usable
perfect        28     successfully archived, without errors
finished       4      almost perfect, but mixed-mode or multi-session
defective      21     found to have errors but not considered important enough to re-process
total          69
not processed  ~100   visual estimate

September 2018 report: LTS, Mastodon, Firefox privacy, etc

Anarcat - Mon, 10/01/2018 - 15:28
Debian Long Term Support (LTS)

This is my monthly Debian LTS report.

Python updates

Uploaded DLA-1519-1 and DLA-1520-1 to fix CVE-2018-1000802, CVE-2017-1000158, CVE-2018-1061 and CVE-2018-1060 in Python 2.7 and 3.4. The latter three were originally marked as no-dsa but the fix was trivial to backport. I also found that CVE-2017-1000158 was actually relevant for 3.4 even though it was not marked as such in the tracker.

CVE-2018-1000030 was skipped because the fix was too intrusive and unclear.

Enigmail investigations

Security support for Thunderbird and Firefox versions from jessie has stopped upstream. Considering that the Debian security team bit the bullet and updated those in stretch, the consensus seems to be that the versions in jessie will also be updated, which will break third-party extensions in jessie.

One of the main victims of the XULocalypse is Enigmail, which completely stopped working after the stretch update. I looked at how we could handle this. I first proposed to wait before trying to patch the Enigmail version in jessie, since it would break when the Thunderbird updates land. I then detailed five options for the Enigmail security update:

  1. update GnuPG 2 in jessie-security to work with Enigmail, which could break unrelated things

  2. same as 1, but in jessie-backports-sloppy

  3. package the JavaScript dependencies to ship Enigmail with OpenPGP.js correctly.

  4. remove Enigmail from jessie

  5. backport only some patches to GPG 2 in jessie

I then looked at helping the Enigmail maintainers by reviewing the OpenPGP.js packaging, through which I found a bug in the JavaScript packaging toolchain, which diverged into a patch in npm2deb to fix source package detection and an Emacs function to write to multiple files. (!!) That work was not directly useful to jessie, I must admit, but it did end up clarifying which dependencies were missing for OpenPGP.js to land, and those were clearly out of reach of an LTS update.

Switching gears, I tried to help the maintainer untangle the JavaScript mess between multiple copies of code in TB, FF (with itself), and Enigmail's process handling routines; to call GPG properly with multiple file descriptors for password, clear-text, statusfd, and output; to have Autocrypt be able to handle "Autocrypt Setup Messages" (ASM) properly (bug #908510); to finally make the test suite pass. The alternative here would be to simply rip Autocrypt out of Enigmail for the jessie update, but this would mean diverging significantly from the upstream version.
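
To illustrate what calling GPG "properly" means here, a decryption call ends up looking something like the following sketch (not Enigmail's actual invocation, just the general shape of it):

# pass the passphrase on fd 4, collect machine-readable status on fd 3,
# and keep the cleartext out of the terminal
$ gpg --batch --pinentry-mode loopback \
      --passphrase-fd 4 --status-fd 3 \
      --output cleartext.txt --decrypt message.gpg \
      3>status.log 4<passphrase.txt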

Reports of Enigmail working with older versions of GPG are deceiving, as that configuration introduces unrelated security issues (T4017 and T4018 in upstream's bugtracker).

So much more work remains on backporting Enigmail, but I might wait for the stable/unstable updates to complete before pushing that work further. Instead, I might focus on the Thunderbird and Firefox updates next.

GnuTLS

I worked more on the GnuTLS research as a short followup to our previous discussion.

I wrote to the researchers, who "still stand behind what is written in the paper" and believe the current fix in GnuTLS is incomplete. GnuTLS upstream seems to agree, more or less, but points out that the fix, even if incomplete, greatly reduces the scope of those vulnerabilities and that a long-term fix is underway.

Next step, therefore, is deciding if we backport the patches or just upgrade to the latest 3.3.x series, as the ABI/API changes are minor (only additions).

Other work
  • completed the work on gdm3 and git-annex by uploading DLA-1494-1 and DLA-1495-1

  • fixed Debian bug #908062 in devscripts to make dch generate proper version numbers since jessie was released

  • checked with the SpamAssassin maintainer regarding the LTS update and whether we should just use 3.4.2 across all suites

  • reviewed and tested Hugo's work on 389-ds. That involved getting familiar with that "other" slapd server (apart from OpenLDAP) which I did not know about.

  • checked that kdepim doesn't load external content so it is not vulnerable to EFAIL by default. The proposed upstream patch changes the API so that work is postponed.

  • triaged the Xen security issues by severity

  • filed bugs about Docker security issues (CVE-2017-14992 and CVE-2018-10892)

Other free software work

I have, this month again, been quite spread out on many unrelated projects unfortunately.

Mastodon

I've played around with the latest attempt from the free software community to come up with a "federation" model to replace Twitter and other social networks, Mastodon. I've had an account for a while but I haven't talked about it much here yet.

My Mastodon account is linked with my Twitter account through some unofficial Twitter cross-posting app which more or less works. Another "app" I use is the toot client to connect my website with Mastodon through feed2exec.
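
The toot side of that bridge is about as simple as command-line clients get; a trivial sketch:

# authenticate once against the instance, then post
$ toot login
$ toot post "hello from the command line"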

And because all of this social networking stuff is just IRC 2.0, I read it all through my IRC client, thanks to Bitlbee and Mastodon is (thankfully) no exception. Unfortunately, there's a problem in my hosting provider's configuration which has made it impossible to read Mastodon status from Bitlbee for a while. I've created a test profile on the main Mastodon instance to double-check, and indeed, Bitlbee works fine there.

Before I figured that out, I tried upgrading the Bitlbee Mastodon bridge (for which I also filed an RFP) and found that a regression had been introduced somewhere after 1.3.1. On the plus side, the feature request I filed to allow for custom visibility statuses from Bitlbee has been accepted, which means it's now possible to send "private" messages from Bitlbee.

Those messages, unfortunately, are not really private: they are visible to all followers, which, in the social networking world, means a lot of people. In my case, I have already accepted over a dozen followers before realizing how that worked, and I do not really know or trust most of those people. I have still 15 pending follow requests which I don't want to approve until there's a better solution, which would probably involve two levels of followship. There's at least one proposal to fix this already.

Another thing I'm concerned about with Mastodon is account migration: what happens if I'm unhappy with my current host? Or if I prefer to host it myself? My online identity is strongly tied with that hostname and there doesn't seem to be good mechanisms to support moving around Mastodon instances. OpenID had this concept of delegation where the real OpenID provider could be discovered and redirected, keeping a consistent identity. Mastodon's proposed solutions seem to aim at using redirections or at least informing users your account has moved which isn't as nice, but might be an acceptable long-term compromise.

Finally, it seems that Mastodon will likely end up in the same space as email with regards to abuse: we are already seeing block lists show up to deal with abusive servers, which is horribly reminiscent of the early days of spam fighting, when you could still keep such lists by hand (as opposed to using Bayesian filters or machine learning). Fundamentally, I'm worried about the viability of this ecosystem, just like I'm concerned about the amount of fake news, spam, and harassment that takes place on commercial platforms. One theory is that the only way to fix this is to enforce two-way sharing between followers, the approach taken by Manyverse and Scuttlebutt.

Only time will tell, I guess, but Mastodon does look like a promising platform, at least in terms of raw numbers of users...

The ultimate paste bin?

I've started switching towards ptpb.pw as a pastebin. Besides the unfortunate cryptic name, it's a great tool: multiple pastes are deduplicated, large pastes are allowed, there is a (limited) server-side viewing mechanism (allowing for some multimedia), etc. The only things missing are "burn after reading" (one-shot links) and client-side encryption, although the latter is planned.

I like the simplistic approach to the API that makes it easy to use from any client. I've submitted the above feature request and a trivial patch so far.
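
If I recall the API correctly, creating a paste is a single curl call with the content in the c form field (hedged, from memory):

# upload a file and get back the paste URL
$ curl -F c=@somefile.txt https://ptpb.pw/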

ELPA packaging work

I've done a few reviews and sponsored some Emacs Lisp packages ("ELPA") for Debian, mostly packages I requested myself but which were so nicely made by Nicolas (elpa-markdown-toc, elpa-auto-dictionary). To better figure out which packages are missing, I wrote this script to parse the output from an ELPA archive and compare it with what is in Debian. This involved digging deep into the API of the Debian archive, which in turn was useful for the JavaScript work previously mentioned. The result is in the firefox page, which lists all the extensions I use and their equivalent in Debian.
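
For a single package, a much cruder manual check is to just ask the archive with rmadison, from devscripts:

# show which suites already carry the package, if any
$ rmadison elpa-markdown-toc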

I'm not very happy with the script: it's dirty, and I feel dirty. It seems to me this should be done on the fly, through some web service, and should support multiple languages. It seems we are constantly solving this problem for each ecosystem while the issues are similar...

Firefox privacy issues

I went down another rabbit hole after learning about Mozilla's plan to force more or less mandatory telemetry in future versions of Firefox. That got me thinking of how many such sniffers were in Firefox and I was in for a bad surprise. It took about a day to establish a (probably incomplete) list of settings necessary to disable all those trackers in a temporary profile starter, originally designed as a replacement for chromium --temp-profile but which turned out to be a study of Firefox's sins.

There are over a hundred about:config settings that need to be tweaked if someone wants to keep their privacy intact in Firefox. This is especially distressing because Mozilla prides itself on its privacy politics. I've documented this in the Debian wiki as well.
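
To give a taste, here are a few of the preferences involved, in user.js form (a small, non-exhaustive sample; the complete list lives in the profile starter and the wiki page):

// telemetry and data reporting
user_pref("toolkit.telemetry.enabled", false);
user_pref("toolkit.telemetry.unified", false);
user_pref("datareporting.healthreport.uploadEnabled", false);
// "shield" studies
user_pref("app.shield.optoutstudies.enabled", false);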

Ideally, there would be a one-shot toggle to disable all those things. Instead, Mozilla is forcing us to play "whack-a-mole" as they pop out another undocumented configuration item with every other release.

Other work

Archiving web sites

Anarcat - Mon, 09/24/2018 - 19:00

I recently took a deep dive into web site archival for friends who were worried about losing control over the hosting of their work online in the face of poor system administration or hostile removal. This makes web site archival an essential instrument in the toolbox of any system administrator. As it turns out, some sites are much harder to archive than others. This article goes through the process of archiving traditional web sites and shows how it falls short when confronted with the latest fashions in the single-page applications that are bloating the modern web.

Converting simple sites

The days of handcrafted HTML web sites are long gone. Now web sites are dynamic and built on the fly using the latest JavaScript, PHP, or Python framework. As a result, the sites are more fragile: a database crash, spurious upgrade, or unpatched vulnerability might lose data. In my previous life as a web developer, I had to come to terms with the idea that customers expect web sites to basically work forever. This expectation matches poorly with the "move fast and break things" attitude of web development. Working with the Drupal content-management system (CMS) was particularly challenging in that regard as major upgrades deliberately break compatibility with third-party modules, which implies a costly upgrade process that clients could seldom afford. The solution was to archive those sites: take a living, dynamic web site and turn it into plain HTML files that any web server can serve forever. This process is useful for your own dynamic sites but also for third-party sites that are outside of your control and you might want to safeguard.

For simple or static sites, the venerable Wget program works well. The incantation to mirror a full web site, however, is byzantine:

$ nice wget --mirror --execute robots=off --no-verbose --convert-links \
    --backup-converted --page-requisites --adjust-extension \
    --base=./ --directory-prefix=./ --span-hosts \
    --domains=www.example.com,example.com http://www.example.com/

The above downloads the content of the web page, but also crawls everything within the specified domains. Before you run this against your favorite site, consider the impact such a crawl might have on the site. The above command line deliberately ignores robots.txt rules, as is now common practice for archivists, and hammers the website as fast as it can. Most crawlers have options to pause between hits and limit bandwidth usage to avoid overwhelming the target site.

The above command will also fetch "page requisites" like style sheets (CSS), images, and scripts. The downloaded page contents are modified so that links point to the local copy as well. Any web server can host the resulting file set, which results in a static copy of the original web site.

That is, when things go well. Anyone who has ever worked with a computer knows that things seldom go according to plan; all sorts of things can make the procedure derail in interesting ways. For example, it was trendy for a while to have calendar blocks in web sites. A CMS would generate those on the fly and make crawlers go into an infinite loop trying to retrieve all of the pages. Crafty archivers can resort to regular expressions (e.g. Wget has a --reject-regex option) to ignore problematic resources. Another option, if the administration interface for the web site is accessible, is to disable calendars, login forms, comment forms, and other dynamic areas. Once the site becomes static, those will stop working anyway, so it makes sense to remove such clutter from the original site as well.
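
For example, a crawl that keeps falling into a calendar can be told to skip it with something like this (the pattern is purely illustrative):

# ignore any URL matching the "calendar" pattern during the mirror
$ wget --mirror --reject-regex 'calendar' http://www.example.com/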

JavaScript doom

Unfortunately, some web sites are built with much more than pure HTML. In single-page sites, for example, the web browser builds the content itself by executing a small JavaScript program. A simple user agent like Wget will struggle to reconstruct a meaningful static copy of those sites as it does not support JavaScript at all. In theory, web sites should be using progressive enhancement to have content and functionality available without JavaScript but those directives are rarely followed, as anyone using plugins like NoScript or uMatrix will confirm.

Traditional archival methods sometimes fail in the dumbest way. When trying to build an offsite backup of a local newspaper (pamplemousse.ca), I found that WordPress adds query strings (e.g. ?ver=1.12.4) at the end of JavaScript includes. This confuses content-type detection in the web servers that serve the archive, which rely on the file extension to send the right Content-Type header. When such an archive is loaded in a web browser, it fails to load scripts, which breaks dynamic websites.

As the web moves toward using the browser as a virtual machine to run arbitrary code, archival methods relying on pure HTML parsing need to adapt. The solution for such problems is to record (and replay) the HTTP headers delivered by the server during the crawl and indeed professional archivists use just such an approach.

Creating and displaying WARC files

At the Internet Archive, Brewster Kahle and Mike Burner designed the ARC (for "ARChive") file format in 1996 to provide a way to aggregate the millions of small files produced by their archival efforts. The format was eventually standardized as the WARC ("Web ARChive") specification that was released as an ISO standard in 2009 and revised in 2017. The standardization effort was led by the International Internet Preservation Consortium (IIPC), which is an "international organization of libraries and other organizations established to coordinate efforts to preserve internet content for the future", according to Wikipedia; it includes members such as the US Library of Congress and the Internet Archive. The latter uses the WARC format internally in its Java-based Heritrix crawler.

A WARC file aggregates multiple resources like HTTP headers, file contents, and other metadata in a single compressed archive. Conveniently, Wget actually supports the file format with the --warc parameter. Unfortunately, web browsers cannot render WARC files directly, so a viewer or some conversion is necessary to access the archive. The simplest such viewer I have found is pywb, a Python package that runs a simple webserver to offer a Wayback-Machine-like interface to browse the contents of WARC files. The following set of commands will render a WARC file on http://localhost:8080/:

$ pip install pywb
$ wb-manager init example
$ wb-manager add example crawl.warc.gz
$ wayback

This tool was, incidentally, built by the folks behind the Webrecorder service, which can use a web browser to save dynamic page contents.

Unfortunately, pywb has trouble loading WARC files generated by Wget because it followed an inconsistency in the 1.0 specification, which was fixed in the 1.1 specification. Until Wget or pywb fix those problems, WARC files produced by Wget are not reliable enough for my uses, so I have looked at other alternatives. A crawler that got my attention is simply called crawl. Here is how it is invoked:

$ crawl https://example.com/

(It does say "very simple" in the README.) The program does support some command-line options, but most of its defaults are sane: it will fetch page requirements from other domains (unless the -exclude-related flag is used), but does not recurse out of the domain. By default, it fires up ten parallel connections to the remote site, a setting that can be changed with the -c flag. But, best of all, the resulting WARC files load perfectly in pywb.

Future work and alternatives

There are plenty more resources for using WARC files. In particular, there's a Wget drop-in replacement called Wpull that is specifically designed for archiving web sites. It has experimental support for PhantomJS and youtube-dl integration that should allow downloading more complex JavaScript sites and streaming multimedia, respectively. The software is the basis for an elaborate archival tool called ArchiveBot, which is used by the "loose collective of rogue archivists, programmers, writers and loudmouths" at ArchiveTeam in its struggle to "save the history before it's lost forever". It seems that PhantomJS integration does not work as well as the team wants, so ArchiveTeam also uses a rag-tag bunch of other tools to mirror more complex sites. For example, snscrape will crawl a social media profile to generate a list of pages to send into ArchiveBot. Another tool the team employs is crocoite, which uses the Chrome browser in headless mode to archive JavaScript-heavy sites.

This article would also not be complete without a nod to the HTTrack project, the "website copier". Working similarly to Wget, HTTrack creates local copies of remote web sites but unfortunately does not support WARC output. Its interactive aspects might be of more interest to novice users unfamiliar with the command line.

In the same vein, during my research I found a full rewrite of Wget called Wget2 that has support for multi-threaded operation, which might make it faster than its predecessor. It is missing some features from Wget, however, most notably reject patterns, WARC output, and FTP support but adds RSS, DNS caching, and improved TLS support.

Finally, my personal dream for these kinds of tools would be to have them integrated with my existing bookmark system. I currently keep interesting links in Wallabag, a self-hosted "read it later" service designed as a free-software alternative to Pocket (now owned by Mozilla). But Wallabag, by design, creates only a "readable" version of the article instead of a full copy. In some cases, the "readable version" is actually unreadable and Wallabag sometimes fails to parse the article. Instead, other tools like bookmark-archiver or reminiscence save a screenshot of the page along with full HTML but, unfortunately, no WARC file that would allow an even more faithful replay.

The sad truth of my experiences with mirrors and archival is that data dies. Fortunately, amateur archivists have tools at their disposal to keep interesting content alive online. For those who do not want to go through that trouble, the Internet Archive seems to be here to stay and Archive Team is obviously working on a backup of the Internet Archive itself.

This article first appeared in the Linux Weekly News.

As usual, here's the list of issues and patches generated while researching this article:

I also want to personally thank the folks in the #archivebot channel for their assistance and letting me play with their toys.

The Pamplemousse crawl is now available on the Internet Archive; it might end up in the Wayback Machine at some point if the Archive curators think it is worth it.

Another example of a crawl is this archive of two Bloomberg articles which the "save page now" feature of the Internet Archive wasn't able to save correctly. But webrecorder.io could! Those pages can be seen in the Webrecorder player to get a better feel of how faithful a WARC file really is.

Finally, this article was originally written as a set of notes and documentation in the archive page which may also be of interest to my readers.


August 2018 report: LTS, Debian, Upgrades

Anarcat - Fri, 08/31/2018 - 19:19
Debian Long Term Support (LTS)

This is my monthly Debian LTS report.

twitter-bootstrap

I researched some of the security issues in the Twitter Bootstrap framework, which is clearly showing its age in Debian. Of the three vulnerabilities, I couldn't reproduce two (CVE-2018-14041 and CVE-2018-14042), so I marked them as "not affecting" jessie. I also found that CVE-2018-14040 was relevant only for Bootstrap 3 (because yes, we still have Bootstrap 2, in all suites, which will hopefully be fixed in buster).

The patch for the latter was a little tricky to figure out, but ended up being simple. I tested the patch with a private copy of the code which works here and published the result as DLA-1479-1.

What's concerning with this set of vulnerabilities is that they show a broader problem than the one identified in those specific instances. Brian May found at least one other similar issue, although I wasn't able to exploit it in a quick attempt. Besides, I'm not sure we want to audit the entire Bootstrap codebase: upstream fixed this issue more widely in the v4 series, and Debian should follow suit, at least in future releases, and remove older releases from the archive.

tiff

A classic. I tried and failed to reproduce CVE-2018-15209 in the tiff package. I'm a bit worried by Brian May's results that the proof of concept eats up all memory in his tests. Since I could not reproduce, I marked the package as N/A in jessie and moved on.

Ruby 2.1

Another classic source of vulnerabilities... The patches were easy to backport, tests passed, so I just uploaded and published DLA-1480-1.

GDM 3

I reviewed Markus Koschany's work on CVE-2018-14424. His patches seemed to work in my tests as I couldn't see any segfault in jessie, either in the kernel messages or through a debugger.

True, the screen still "flashes" so one might think there is still a crash, but this is actually expected behavior. Indeed, this is the first D-Bus command being run:

dbus-send --system --dest=org.gnome.DisplayManager --type=method_call \
    --print-reply=literal /org/gnome/DisplayManager/LocalDisplayFactory \
    org.gnome.DisplayManager.LocalDisplayFactory.CreateTransientDisplay

Or, in short, CreateTransientDisplay, which is also known as fast user switching, brings you back to the login screen. If you enter the same username and password, you get your session back. So no crash. After talking with Koschany, we'll wait a little longer for feedback from the reporter but otherwise I expect to publish the fixed package shortly.

git-annex

This is a bigger one I took from Koschany. The patch was large, and in a rather uncommon language (Haskell).

The first patch was tricky as function names had changed and some functionality (the P2P layer, the setkey command and content verification) were completely missing. On advice from upstream, the content verification functionality was backported as it was critical for the second tricky patch which required more Haskell gymnastics.

This time again, Haskell was nice to work with: by changing type configurations and APIs, the compiler makes sure that everything works out and there are no inconsistencies. This logic is somewhat backwards to what we are used to: normally, in security updates, we avoid breaking APIs at all costs. But in Haskell, it's a fundamental way to make sure the system is still coherent.

More details, including embarrassing fixes to the version numbering scheme, are best explained in the email thread. An update for this will come out shortly, after giving more time for upstream to review the final patchset.

Fighting phishing

After mistyping the address of the security tracker, I ended up on this weird page:

Some phishing site masquerading as a Teksavvy customer survey.

Confused and alarmed, I thought I was being intercepted by my ISP, but after looking on their forums, I found out they actually get phished like this all the time. As it turns out, the domain name debain.org (notice the typo) is actually registered to some scammers. So I implemented a series of browser quick searches as a security measure and shared those with the community. Only after feedback from a friend did I realize that surfraw (SR) has been doing this all along. The problem with SR is that it's mostly implemented with messy shell scripts and those cannot easily be translated back into browser shortcuts, which are still useful on their own. That and the SR plugins (called "elvi" or "elvis" in plural) are horribly outdated.

Ideally, trivial "elvis" would simply be "bookmarks" (which are really just one link per line) that can then easily be translated back into browser bookmarks. But that would require converting a bunch of those plugins, something I don't currently have the energy (or time) for. All this reminds me a lot of the interwiki links from the wiki world and looks like an awful duplication of information. Even in this wiki I have similar shortcuts, which are yet another database of such redirections. Surely there is a better way than maintaining all of this separately?

Who claimed all the packages?

After struggling again to find some (easy, I admit) work, I worked on a patch to show per-user package claims. Now, if --verbose is specified, the review-update-needed script will also show a list of users who claimed packages and how many are claimed. This can help us figure out who's overloaded and might need some help.

Post-build notifications in sbuild

I sent a patch to sbuild to make sure we can hook into failed builds on completion as well as successful builds. Upstream argued this is best accomplished with a wrapper, but I believe that is insufficient, as a wrapper will not have knowledge of the sbuild internals and won't be able to send notifications effectively. There is, after all, already a post-build hook, but it runs only on successful builds.

GnuTLS and other reviews

I reviewed questions from Ola Lundqvist regarding the pending GnuTLS security vulnerabilities designated CVE-2018-10844, CVE-2018-10845 and CVE-2018-10846. Those came from a paper called Pseudo Constant Time Implementations of TLS Are Only Pseudo Secure. I am still unsure of the results: after reviewing the paper in detail, I am worried the upstream fixes are incomplete. Hopefully Lundqvist will figure it out, but in any case I am available to review this work again next week.

I also provided advice on a squirrelmail bugfix backport suggestion.

Other free software work

I was actually on vacation this month so this is a surprising amount of activity for what was basically a week of work.

Buster upgrade

I upgraded my main workstation to buster, in order to install various Node.JS programs through npm for that Dat article (which will be public here shortly). It's part of my routine: when enough backports pile up or I need too much stuff from unstable, it's time to make the switch. This makes development on Debian easier and helps test the next version of stable before it is released. I do this only on my busiest machine, where I can fix things quickly when they break: my laptop and server remain on stable so I don't have to worry about them too much.

It was a bumpy ride: font rendering changed because of the new rendering engine in FreeType. Someone ended up finding a workaround in Debian bug #866685 which allowed me to keep the older rendering engine but I am worried it might be removed in the future. Hopefully that bug will trickle upstream and Debian users won't see a regression when they upgrade to buster.

A major issue was a tiny bug in the python-sh library which caused my entire LWN workflow to collapse. Thankfully, it turned out upstream had already released a fix and all I had to do was to update the package and NMU the result. As it turns out, I was already part of the Python team, and that should have been marked as a team upload, but I didn't know. Strange how memory works sometimes.

Other problems were similar: dictd, for example, failed to upgrade (Debian bug #906420, fixed). There are about 15 different packages that are missing from stretch: many FTBFS problems, others with real critical bugs. Others are just merely missing from the archive: I particularly pushed on wireguard (Debian bug #849308), taffybar (Debian bug #895264), and hub (Debian bug #807866).

I won't duplicate the whole upgrade documentation here, the details are in buster.

Debian 25th anniversary updates

The Debian project turned 25 this year so it was a good occasion to look back at history and present. I became a Debian Developer in 2010, a Debian maintainer in 2009, and my first contributions to the project go all the way back to 2003, when I started filing bugs. So this is anywhere between my 8th and 15th birthday in the project.

I didn't celebrate this in any special way, although I did make sure to keep my packages up to date when I returned from vacation. That meant a few uploads:

Work on smokeping and charybdis happened as part of our now regular Debian & Stuff, along with LeLutin, who is helping out and learning a few packaging tricks along the way.

Other software upgrades

During the above buster upgrade, Prometheus broke because the node exporter metrics labels changed. More accurately, what happened is that Grafana would fail to display some of the nodes. As it turns out, all that was needed was to update a few Grafana dashboards (as those don't update automatically, of course). But it brought to my attention that a bunch of packages I had installed were not being upgraded as automatically as the rest of my Debian infrastructure. There were a few reasons for that:

  1. packages were installed from a third-party repository

  2. packages were installed from unstable

  3. there were no packages: software was installed in a container

  4. there were no packages: software was installed by hand

I'm not sure which is worse between 3 and 4. As it turns out, containers were harder to deal with because they also involved upgrading docker.io, which was more difficult.

For each forgotten program, I tried to make sure it wouldn't stay stale any longer: in the case of 1 or 2, a proper apt preference (or "pin") was added to automate upgrades; for 3 and 4, I added the release feeds of the program to feed2exec so I get an email when upstream makes a new release.

Those are the programs I had to deal with:

  • rainloop: the upgrade guide is trivial. make a backup:

    (umask 0077; tar cfz rainloop.tgz /var/lib/rainloop)

    Then decompress the new archive on top. It keeps old data in rainloop/v/1.11.0.203/, which should probably be removed on the next upgrade. The upgrade presumably runs when visiting the site; it worked flawlessly here.

  • grafana: the upgrade guide says to back up the database in /var/lib/grafana/grafana.db, but I backed up the whole thing:

    (umask 0077; tar cfz grafana.tgz /var/lib/grafana)

    The upgrade from 4.x to 5.2.x was trivial and automated. There is, unfortunately, still no official package. A visit to the Grafana instance shows some style changes and improvements and that things generally just work.

  • the toot Mastodon client has entered Debian so I was able to remove yet another third party repository. this involved adding a pin to follow the buster sources for this package:

    Package: toot
    Pin: release n=buster
    Pin-Priority: 500
  • Docker is in this miserable state in stretch. There is a really old binary in jessie-backports (1.6) and for some reason I had a random version from unstable running, 1.13.1~ds1-2. I upgraded to the sid version, which installs fine in stretch because golang is statically compiled. But the containers did not restart automatically. Starting them by hand gave this error:

    root@marcos:/etc/apt/sources.list.d# ~anarcat/bin/subsonic-start
    e4216f435be477dacd129ed8c2b23b2b317e9ef9a61906f3ba0e33265c97608e
    docker: Error response from daemon: OCI runtime create failed: json: cannot unmarshal object into Go value of type []string: unknown.

    Strangely, the container was started, but is not reachable over the network. The problem is that runc needs to be upgraded as well, so that was promptly fixed.

    The magic pin to follow buster is like this:

    Package: docker.io runc
    Pin: release n=buster
    Pin-Priority: 500
  • airsonic upgrades are a little trickier because I run this inside a docker container. First step is to fix the Dockerfile and rebuild the container image:

    sed -i s/10.1.1/10.1.2/ Dockerfile
    sudo docker build -t anarcat/airsonic .

    Then the image is ready to go. The previous container needs to be stopped and the new one started:

    docker ps
    docker stop 78385cb29cd5
    ~anarcat/bin/subsonic-start

    The latter is a script I wrote because I couldn't remember the magic startup sequence, which is silly: you'd think the Dockerfile would know stuff like that. A visit to the radio site showed that everything seemed to be in order but no deeper test was performed.

All of this assumes that updates to unstable will not be too disruptive or that, if they do, the NEWS.Debian file will warn me so I can take action. That is probably a little naive of me, but it beats having outdated infrastructure running exposed on the network.

Other work

Then there's the usual:


Sharing and archiving data sets with Dat

Anarcat - Sun, 08/26/2018 - 19:00

Dat is a new peer-to-peer protocol that uses some of the concepts of BitTorrent and Git. Dat primarily targets researchers and open-data activists as it is a great tool for sharing, archiving, and cataloging large data sets. But it can also be used to implement decentralized web applications in a novel way.

Dat quick primer

Dat is written in JavaScript, so it can be installed with npm, but there are standalone binary builds and a desktop application (as an AppImage). An online viewer can be used to inspect data for those who do not want to install arbitrary binaries on their computers.

The command-line application allows basic operations like downloading existing data sets and sharing your own. Dat uses a 32-byte hex string that is an ed25519 public key, which is used to discover and find content on the net. For example, this will download some sample data:

$ dat clone \
    dat://778f8d955175c92e4ced5e4f5563f69bfec0c86cc6f670352c457943666fe639 \
    ~/Downloads/dat-demo

Similarly, the share command is used to share content. It indexes the files in a given directory and creates a new unique address like the one above. The share command starts a server that uses multiple discovery mechanisms (currently, the Mainline Distributed Hash Table (DHT), a custom DNS server, and multicast DNS) to announce the content to its peers. This is how another user, armed with that public key, can download that content with dat clone or mirror the files continuously with dat sync.
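
A minimal sketch of that round trip, with a placeholder directory and address:

$ cd ~/datasets/example   # placeholder directory
$ dat share               # indexes the files and prints a new dat:// address

# then, on another machine, with that address:
$ dat clone dat://<address> ~/mirror
$ cd ~/mirror && dat sync # keep mirroring updates as they land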

So far, this looks a lot like BitTorrent magnet links updated with 21st century cryptography. But Dat adds revisions on top of that, so modifications are automatically shared through the swarm. That is important for public data sets as those are often dynamic in nature. Revisions also make it possible to use Dat as a backup system by saving the data incrementally using an archiver.

While Dat is designed to work on larger data sets, processing them for sharing may take a while. For example, sharing the Linux kernel source code required about five minutes as Dat worked on indexing all of the files. This is comparable to the performance offered by IPFS and BitTorrent. Data sets with more or larger files may take quite a bit more time.

One advantage that Dat has over IPFS is that it doesn't duplicate the data. When IPFS imports new data, it duplicates the files into ~/.ipfs. For collections of small files like the kernel, this is not a huge problem, but for larger files like videos or music, it's a significant limitation. IPFS eventually implemented a solution to this problem in the form of the experimental filestore feature, but it's not enabled by default. Even with that feature enabled, though, changes to data sets are not automatically tracked. In comparison, Dat operation on dynamic data feels much lighter. The downside is that each set needs its own dat share process.
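
As an aside, enabling that experimental filestore looks roughly like this, with the flags as they existed in go-ipfs at the time:

$ ipfs config --json Experimental.FilestoreEnabled true
$ ipfs add --nocopy large-dataset.tar   # references the file in place instead of copying it into ~/.ipfs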

Like any peer-to-peer system, Dat needs at least one peer to stay online to offer the content, which is impractical for mobile devices. Hosting providers like Hashbase (which is a pinning service in Dat jargon) can help users keep content online without running their own server. The closest parallel in the traditional web ecosystem would probably be content distribution networks (CDN) although pinning services are not necessarily geographically distributed and a CDN does not necessarily retain a complete copy of a website.

[Screenshot: The Photos application loading a test gallery in the Beaker browser]

A web browser called Beaker, based on the Electron framework, can access Dat content natively without going through a pinning service. Furthermore, Beaker is essential to get any of the Dat applications working, as they fundamentally rely on dat:// URLs to do their magic. This means that Dat applications won't work for most users unless they install that special web browser. There is a Firefox extension called "dat-fox" for people who don't want to install yet another browser, but it requires installing a helper program. The extension will be able to load dat:// URLs but many applications will still not work. For example, the photo gallery application completely fails with dat-fox.

Dat-based applications look promising from a privacy point of view. Because of its peer-to-peer nature, users regain control over where their data is stored: either on their own computer, an online server, or by a trusted third party. But considering the protocol is not well established in current web browsers, I foresee difficulties in adoption of that aspect of the Dat ecosystem. Beyond that, it is rather disappointing that Dat applications cannot run natively in a web browser given that JavaScript is designed exactly for that.

Dat privacy

An advantage Dat has over other peer-to-peer protocols like BitTorrent is end-to-end encryption. I was originally concerned by the encryption design when reading the academic paper [PDF]:

It is up to client programs to make design decisions around which discovery networks they trust. For example if a Dat client decides to use the BitTorrent DHT to discover peers, and they are searching for a publicly shared Dat key (e.g. a key cited publicly in a published scientific paper) with known contents, then because of the privacy design of the BitTorrent DHT it becomes public knowledge what key that client is searching for.

So in other words, to share a secret file with another user, the public key is transmitted over a secure side-channel, only to then leak during the discovery process. Fortunately, the public Dat key is not directly used during discovery as it is hashed with BLAKE2B. Still, the security model of Dat assumes the public key is private, which is a rather counterintuitive concept that might upset cryptographers and confuse users who are frequently encouraged to type such strings in address bars and search engines as part of the Dat experience. There is a security & privacy FAQ in the Dat documentation warning about this problem:

One of the key elements of Dat privacy is that the public key is never used in any discovery network. The public key is hashed, creating the discovery key. Whenever peers attempt to connect to each other, they use the discovery key.

Data is encrypted using the public key, so it is important that this key stays secure.

There are other privacy issues outlined in the document; it states that "Dat faces similar privacy risks as BitTorrent":

When you download a dataset, your IP address is exposed to the users sharing that dataset. This may lead to honeypot servers collecting IP addresses, as we've seen in Bittorrent. However, with dataset sharing we can create a web of trust model where specific institutions are trusted as primary sources for datasets, diminishing the sharing of IP addresses.

A Dat blog post refers to this issue as reader privacy and it is, indeed, a sensitive issue in peer-to-peer networks. It is how BitTorrent users are discovered and served scary verbiage from lawyers, after all. But Dat makes this a little better because, to join a swarm, you must know what you are looking for already, which means peers who can look at swarm activity only include users who know the secret public key. This works well for secret content, but for larger, public data sets, it is a real problem; it is why the Dat project has avoided creating a Wikipedia mirror so far.

I found another privacy issue that is not documented in the security FAQ during my review of the protocol. As mentioned earlier, the Dat discovery protocol routinely phones home to DNS servers operated by the Dat project. This implies that the default discovery servers (and an attacker watching over their traffic) know who is publishing or seeking content, in essence discovering the "social network" behind Dat. This discovery mechanism can be disabled in clients, but a similar privacy issue applies to the DHT as well, although that is distributed so it doesn't require trust of the Dat project itself.

Considering those aspects of the protocol, privacy-conscious users will probably want to use Tor or other anonymization techniques to work around those concerns.

The future of Dat

Dat 2.0 was released in June 2017 with performance improvements and protocol changes. Dat Enhancement Proposals (DEPs) guide the project's future development; most work is currently geared toward implementing the draft "multi-writer proposal" in HyperDB. Without multi-writer support, only the original publisher of a Dat can modify it. According to Joe Hand, co-executive-director of Code for Science & Society (CSS) and Dat core developer, in an IRC chat, "supporting multiwriter is a big requirement for lots of folks". For example, while Dat might allow Alice to share her research results with Bob, he cannot modify or contribute back to those results. The multi-writer extension allows for Alice to assign trust to Bob so he can have write access to the data.

Unfortunately, the current proposal doesn't solve the "hard problems" of "conflict merges and secure key distribution". The former will be worked out through user-interface tweaks, but the latter is a classic problem that security projects typically have trouble solving; Dat is no exception. How will Alice securely trust Bob? The OpenPGP web of trust? Hexadecimal fingerprints read over the phone? Dat doesn't provide a magic solution to this problem.

Another thing limiting adoption is that Dat is not packaged in any distribution that I could find (although I requested it in Debian) and, considering the speed of change of the JavaScript ecosystem, this is unlikely to change any time soon. A Rust implementation of the Dat protocol has started, however, which might be easier to package than the multitude of Node.js modules. In terms of mobile device support, there is an experimental Android web browser with Dat support called Bunsen, which somehow doesn't run on my phone. Some adventurous users have successfully run Dat in Termux. I haven't found an app running on iOS at this point.

Even beyond platform support, distributed protocols like Dat have a tough slope to climb against the virtual monopoly of more centralized protocols, so it remains to be seen how popular those tools will be. Hand says Dat is supported by multiple non-profit organizations. Beyond CSS, Blue Link Labs is working on the Beaker Browser as a self-funded startup and a grass-roots organization, Digital Democracy, has contributed to the project. The Internet Archive has announced a collaboration between itself, CSS, and the California Digital Library to launch a pilot project to see "how members of a cooperative, decentralized network can leverage shared services to ensure data preservation while reducing storage costs and increasing replication counts".

Hand said adoption in academia has been "slow but steady" and that the Dat in the Lab project has helped identify areas that could help researchers adopt the project. Unfortunately, as is the case with many free-software projects, he said that "our team is definitely a bit limited on bandwidth to push for bigger adoption". Hand said that the project received a grant from Mozilla Open Source Support to improve its documentation, which will be a big help.

Ultimately, Dat suffers from a problem common to all peer-to-peer applications, which is naming. Dat addresses are not exactly intuitive: humans do not remember strings of 64 hexadecimal characters well. For this, Dat took a similar approach to IPFS by using DNS TXT records and /.well-known URL paths to bridge existing, human-readable names with Dat hashes. This sacrifices part of the decentralized nature of the project in favor of usability.
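
From memory, and as an assumption rather than an authoritative reference, the bridge looks something like a TXT record or a well-known file that maps a hostname to the hash (the domain and record contents below are illustrative):

$ dig +short TXT example.com
"datkey=778f8d955175c92e4ced5e4f5563f69bfec0c86cc6f670352c457943666fe639"
$ curl -s https://example.com/.well-known/dat
dat://778f8d955175c92e4ced5e4f5563f69bfec0c86cc6f670352c457943666fe639
TTL=3600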

I have tested a lot of distributed protocols like Dat in the past and I am not sure Dat is a clear winner. It certainly has advantages over IPFS in terms of usability and resource usage, but the lack of packages on most platforms is a big limit to adoption for most people. This means it will be difficult to share content with my friends and family with Dat anytime soon, which would probably be my primary use case for the project. Until the protocol reaches the wider adoption that BitTorrent has seen in terms of platform support, I will probably wait before switching everything over to this promising project.

This article first appeared in the Linux Weekly News.


Concerns with Signal receipt notifications

Anarcat - ven, 07/27/2018 - 16:18

During some experiments with a custom Signal client with a friend, let's call him Bob, he was very surprised when we had a conversation that went a little like this:

A> hey Bob! welcome home!
B> what?
B> wait, how did you know I got home?
B> what the heck man? did you hack my machine? OMGWTFSTHUBERTBBQ?!

I'm paraphrasing as I lost my copy of the original chat, but it was striking how he had absolutely no clue how I figured out he had just come home and sat down in front of his laptop. He was quite worried I had hacked into his system to spy on his webcam or some other "hack". As it turns out, I just made simple assertions based on data Signal provides to other peers when you send messages. Using those messages, I could establish when my friend opened his laptop and the Signal Desktop app got back online.

How this works

This is possible because the receipt notifications in Signal are per-device. This means that the "double-checkmark" you see when a message is delivered actually appears as soon as the first device receives the message. Behind the scenes, Signal sends a notification for each device, with a unique, per-device identifier. Those identifiers are visible with signal-cli. For example, this is a normal notification the Signal app will send when confirming reception of a message, as seen from signal-cli:

Envelope from: “Bob” +15555555555 (device: 1)
Timestamp: 1532279834422 (2018-07-22T17:17:14.422Z)
Got receipt.

That's Bob's phone telling me it received the message. On my side, the Signal app shows a second checkmark to tell me the message was transmitted. (There are also "blue checkmarks" now that tell the user the other person has seen the message, but I haven't looked into those in detail.) Then another notification comes in:

Envelope from: “Bob” +15555555555 (device: 2)
Timestamp: 1532279901951 (2018-07-22T17:18:21.951Z)
Got receipt.

Notice the device number there? It changed from 1 to 2. This tells me this is a different device than the first one. Device 1 will most likely be the phone app and device 2 will most likely be Signal Desktop. (In my case, I tried so many different configurations that I have device numbers up to 8, but my phone is still device 1.)

An attacker can use those notifications to tell when my phone goes online. It is also possible to make reasonable assertions about the identity of each device: any device number above one is most likely a Signal Desktop client. This can be used to assert physical presence on different machines: the desktop at home, laptop in the office, etc. It might not seem like much, but it sure felt creepy to Bob.
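
Anyone can observe this with signal-cli; a rough sketch, assuming a registered signal-cli account and using placeholder phone numbers:

# send any message to the target; even unsolicited messages trigger receipts
signal-cli -u +15551230000 send -m "ping" +15555555555
# then watch the per-device receipt envelopes come back
signal-cli -u +15551230000 receive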

While writing this article, I figured I would reproduce those results, so I wrote Bob again to ask for help. Here's how the (redacted and reformatted) conversation went:

A-1> hey you there?
* B-1 message received
A-1> i want to see if i can freak you out with signal again
* B-1 message received
A-1> i'm going to write about the issue, and i want to reproduce the results
* B-1 message received
B-1> he's driving
B-1> sure, I'll be your guinea pig he says
A-1> all he needs to do is open his laptop and start signal-desktop :p
* B-1 message received
B-1> we'll be home in 1h30
A-1> i'll know, don't worry :p
* B-1 message received

After an hour or two, Bob gets home, opens his laptop, and you can see the key message that reveals it:

* B-2 message received
A-1> welcome home, sucker! ;)
B-2> dang dog.

This attack can be carried out by anyone who knows Bob's phone number. Because Signal is an open network, you are free to send messages to anyone without their consent. An attacker only has to send spam messages to a victim to figure out how many devices they own and when each of them is online. There's no way for Bob to protect himself from this attack, other than trying to keep his phone number private.

Why Signal works that way

When I shared an earlier draft of this article with the Signal security team, they stated this was a necessary trade-off, as each device carries a unique cryptographic key anyway, and that:

Signal encrypts messages individually to each recipient device. Thus as long as there is a "delivery receipt" feature, it will be possible to learn which recipient devices are online, for example by sending an encrypted message to a subset of the recipient devices, and seeing whether a delivery receipt is received or not.

The alternative seems to be either to disable receipt notifications or to share the same private key among the different devices, which induces other problems:

Having all recipient devices share the same encryption keys would render the Diffie-Hellman ratcheting which is part of the Signal protocol ineffective, since all devices (including offline ones) would have to use synchronized DH ratchet key pairs, preventing these values from adding fresh randomness. In addition, it would add massive protocol complexity and fragility to try to keep recipient devices synchronized, while trying to achieve the (probably-infeasible) goal of eliminating all ways to distinguish recipient devices.

I am not certain those tradeoffs are that clear-cut, however. I am not a cryptographer, and specifically not very familiar with the "ratcheting" algorithm behind the "Signal protocol" (or is it called Noise now?), but it seems to me there should be a way to provide multi-device, multi-key encryption, without revealing per-device identifiers to other clients. In particular, I do not understand what purpose those integers serve: maybe they are automatically generated by signal-cli and are just a side-effect of a fundamental property of the protocol, in which case I would understand why they would be unavoidable.

To be fair, other cryptographic systems also share similar problems: an encrypted OpenPGP email usually embeds metadata about source and destination addresses, as email headers are not encrypted. Even a normal OpenPGP encrypted blob includes OpenPGP key data by default, although there are ways to turn that off and make sure an encrypted blob is just an undecipherable blob. The problem with this, of course, is that many critics of OpenPGP present it as an old technology that should be replaced by more modern alternatives like Signal, so it's a bit disappointing to see that it suffers from similar metadata exposure problems as older protocols.
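
As a side note on that OpenPGP point, the "ways to turn that off" are standard gpg options; a quick sketch with a placeholder recipient:

# encrypt without embedding the recipient key ID in the output
gpg --encrypt --throw-keyids --recipient bob@example.com message.txt
# or mark the recipient as hidden explicitly
gpg --encrypt --hidden-recipient bob@example.com message.txt

The cost is that, on decryption, GnuPG has to try every available secret key, since it no longer knows which one the message was encrypted to.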

But apart from cryptographic properties, there are certain user expectations regarding Signal, and my experience with this specific issue is that this property certainly breaks some privacy expectations for users. I'm not sure people would choose to have delivery notifications if they were given the choice.

Other metadata issues

There are other metadata issues in Signal, of course. Like receipt notifications, they are tradeoffs between usability and privacy. The most notable one is probably how Signal shares your contact list. The user-visible effect is the "Bob is on Signal!" message that pops up when the server figures that out. The Signal people have done extensive research to make this work securely while at the same time leveraging the contacts on your phone, but it's still a surprising phenomenon to new users who don't know about the specifics of how this is implemented.

Another one is how groups are opt-out only: anyone can add you to a group without your consent, which shares your phone number with the other members of the group, a bit like how carbon copies in email reveal a social network.

Compared with groups and new-user notifications, the receipt notification issue is a little more pernicious: the leak is not visible at all to users unless they run signal-cli. While people clearly see each other's presence in a group, they definitely will not know that those little checkmarks disclose more information to other users than they seem to.

The bottom line is that crypto and security are hard to implement but also hard to make visible to users. Signal does a great job at making a solid communication application that provides decent security, but it can have surprising properties even for skilled engineers who thought they knew about the security properties of the system, so I am worried about my fellow non-technical friends and their expectations of privacy...


My free software activities, July 2018

Anarcat - ven, 07/27/2018 - 13:37
Debian Long Term Support (LTS)

This is my monthly Debian LTS report.

Most of my hours this month were spent updating jessie to catch up with all the work we had done in wheezy that was never forward-ported (DLA-1414-1, fixing CVE-2017-9462, CVE-2017-17458, CVE-2018-1000132, OVE-20180430-0001, OVE-20180430-0002, and OVE-20180430-0004). Unfortunately, the work was impeded by how upstream now refuses to get CVE identifiers for new issues they discover in the process, which meant that I actually missed three more patches which were required to fix the subrepo vulnerability (CVE-2017-17458). For other issues, upstream at least attempted to get identifiers through the OVE system, which is not as well integrated in our security tracker but does allow some cross-distro collaboration. The regression advisory was published as DLA-1414-2.

Overall, the updates of the Mercurial package were quite difficult as the test suite would fail because the order of one test would vary between builds (and not runs!), which was quite confusing. I originally tried fixing this by piping the output of the test suite through sort to get consistent output but, after vetting the idea with one of the upstream maintainers (durin42), I ended up sorting the dictionary in the code directly.

I have also uploaded fixes for cups (DLA-1412-1, fixing CVE-2017-18190 and CVE-2017-18248) and dokuwiki (DLA-1413-1, fixing CVE-2017-18123).

Other activities

This month was fairly quiet otherwise, as I was on vacation.

I still managed to push a few projects forward. The pull request to add nopaste to ELPA was met with skepticism considering there is already another paste tool in ELPA called webpaste.el which takes the different (and unfortunate) approach of reimplementing all pastebins natively, instead of reusing the existing paste programs. I have, incidentally, discovered similar functionality in my terminal emulator, in the form of urxvt-selection-pastebin although I have yet to try (and probably patch) that approach.

We have also been dealing with a vast attack on IRC servers, primarily aimed at hurting the reputation of Freenode operators but affecting all IRC networks. On top of implementing custom measures to deal with the problem on our networks, I have contributed some documentation to help users and improvements to an IRC service to help with the attack.

I've also had a great conversation with the author of croc, a derivative of magic-wormhole because of flaws I felt were present in the croc implementation. It seems I was able to convince the author to do the right thing and future versions of the program might be fully compatible with wormhole, which is great news.


Epic Lameness

Eric Dorland - lun, 09/01/2008 - 17:26
SF.net now supports OpenID. Hooray! I'd like to make a comment on a thread about the RTL8187se chip I've got in my new MSI Wind. So I go to sign in with OpenID and, instead of signing me in, it prompts me to create an account with a name, username, and password. Huh? I just want to post to their forum; I don't want to create an account (at least not explicitly; if they want to do it behind the scenes, fine). Isn't the point of OpenID to not have to create accounts, and particularly not have to create new usernames and passwords to access websites? I'm not impressed.

Sentiment Sharing

Eric Dorland - lun, 08/11/2008 - 23:28
Biella, I am from there and I do agree. If I was still living there I would try to form a team and make a bid. Simon even made noises about organizing a bid at DebConfs past. I wish he would :)

But a DebConf in New York would be almost as good.