
External Blogs

My free software activities, November and December 2016

Anarcat - Fri, 12/30/2016 - 17:00
Debian Long Term Support (LTS)

Those were my 8th and 9th months working on Debian LTS, the project started by Raphael Hertzog at Freexian. I had trouble resuming work in November, as I had taken a long break and only started looking at issues during the last week of the month.

Imagemagick, again

I have, again, spent a significant amount of time fighting the ImageMagick (IM) codebase. About 15 more vulnerabilities were found since the last upload, which resulted in DLA-756-1. In the advisory, I unfortunately forgot to mention CVE-2016-8677 and CVE-2016-9559, something that was noticed by my colleague Roberto after the upload... More details about the upload are available in the announcement.

When you consider that I worked on IM back in October, which led to an upload near the end of November covering around 80 more vulnerabilities, it doesn't look good for the project at all. Of the 15 vulnerabilities I worked on, only 6 had CVEs assigned and I had to request CVEs for the other 9 vulnerabilities, plus 11 more that were still unassigned. This led to the assignment of 25 distinct CVE identifiers, as a lot of issues were found to be distinct enough to warrant their own CVEs.

One could also question how many of those issues affect the fork, GraphicsMagick. A lot of the vulnerabilities were found through fuzzing and may not have been tested against GraphicsMagick. It seems clear to me that a public corpus of test data should be available to test for regressions and cross-project vulnerabilities. It's already hard enough to track issues within IM itself; I can't imagine how hard it must be for the fork to keep track of them, especially since upstream doesn't systematically request CVEs for the issues it finds, a questionable practice considering the number of issues we all need to keep track of.
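
To illustrate the idea, here is a minimal sketch of what such cross-project regression testing could look like: it runs a hypothetical corpus of test images through both ImageMagick's convert and GraphicsMagick's gm convert, and flags files that crash either tool. The corpus directory and the crash heuristic are my own assumptions, not an existing Debian or upstream tool.

    # rough sketch: run a shared corpus through both "Magick" tools and
    # report files that make either one crash or hang
    import pathlib
    import subprocess

    CORPUS = pathlib.Path("corpus")   # hypothetical directory of test images

    def crashes(cmd):
        try:
            result = subprocess.run(cmd, timeout=10,
                                    stdout=subprocess.DEVNULL,
                                    stderr=subprocess.DEVNULL)
        except subprocess.TimeoutExpired:
            return True               # hangs are interesting too
        # a negative return code means the process died from a signal (e.g. SIGSEGV)
        return result.returncode < 0

    for image in sorted(CORPUS.iterdir()):
        for tool in (["convert", str(image), "/dev/null"],
                     ["gm", "convert", str(image), "/dev/null"]):
            if crashes(tool):
                print("%s crashed on %s" % (tool[0], image))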

Nagios

I have also worked on the Nagios package and produced DLA 751-1, which fixed two fairly major issues (CVE-2016-9565 and CVE-2016-9566) that could allow remote root access under certain conditions. Fortunately, the restricted permissions set up by default in the Debian package limited both exploits to information disclosure, and to privilege escalation if the debug log is enabled.

This says a lot about how proper Debian packaging can help in limiting the attack surface of certain vulnerabilities. It was also "interesting" to have to re-learn dpatch to add patches to the package: I regret not converting it to quilt, as the operation is simple and quilt is so much easier to use.

People new to Debian packaging may be curious to learn about the staggering number of patching systems historically used in Debian. On that topic, I started a conversation about how much we want to reuse existing frameworks when we work on those odd packages, and the feedback was interesting. Basically, the answer is "it depends"...

NSS

I had already worked on the package in November and continued the work in December. Most of the work was done by Raphael, who fixed a lot of issues with the test suite. I tried to wrap this up by fixing CVE-2016-9074, the build on armel, and the test suite. Unfortunately, I had to stop again because I ran out of hours and the FIPS test suite was still failing, but fortunately Raphael was able to complete the work with DLA-759-1.

As things stand now, the package is in better shape than in other suites as tests (Debian bug #806639) and autopkgtest (Debian bug #806207) are still not shipped in the sid or stable releases.

Other work

For the second time, I forgot to formally assign myself a package before working on it, which meant that I wasted part of my hours working on the monit package. Those hours, of course, were not counted in my regular hours. I still spent some time reviewing mejo's patch to ensure it was done properly and it turned out we both made similar patches working independently, always a good sign.

As I reported in my preliminary November report, I have also triaged issues in libxml2, ntp, openssl and tiff.

Finally, I should mention my short review of the phpMyAdmin upload, among the many posts I sent to the LTS mailing list.

Other free software work

One reason why I had so much trouble getting paid work done in November is that I was busy with unpaid work...

manpages.debian.org

A major time hole for me was trying to tackle the manpages.debian.org service, which had been offline since August. After a thorough evaluation of the available codebases, I figured the problem space wasn't so hard and it was worth trying to do an implementation from scratch. The result is a tool called debmans.

It took, obviously, way longer than I expected, as I experimented with Python libraries I had been keeping an eye on for a while. For the command-line interface, I used the click library, which is really a breeze to use, but a bit heavy for smaller scripts. For a web search service prototype, I looked at flask, which was also very interesting, as it is light and simple enough that I could get started quickly. It also, surprisingly, fares pretty well in the global TechEmpower benchmarking tests. Those interested in those tools may want to look at the source code, in particular the main command (which itself uses an interesting pattern, __main__.py) and the search prototype.
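
For readers unfamiliar with those two libraries, here is a minimal, illustrative sketch of the kind of code they enable; it is not the actual debmans code, and the command, option and route names are made up for the example.

    import click
    from flask import Flask, request

    # click: build a command-line interface from decorated functions
    @click.command()
    @click.argument('suite')
    @click.option('--output', default='/srv/manpages', show_default=True,
                  help='directory where rendered manpages are written')
    def render(suite, output):
        """Render manpages for SUITE into OUTPUT."""
        click.echo('rendering %s into %s' % (suite, output))

    # flask: a small web application, e.g. for a search prototype
    app = Flask(__name__)

    @app.route('/search')
    def search():
        query = request.args.get('q', '')
        return 'no results for %r (prototype)' % query

    if __name__ == '__main__':
        render()    # the flask app would be run separately, e.g. with app.run()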

Debmans is the first project for which I have tried the CII Best Practices Badge program, an interesting questionnaire to review best practices in software engineering. It is an excellent checklist I recommend every project manager and programmer to get familiar with.

I still need to complete my work on Debmans: as I write this, I couldn't get access to the new server the DSA team set up for this purpose. It was a bit of a frustrating experience to wait for all the bits to get into place while I had a product ready to test. In the end, the existing manpages.d.o maintainer decided to deploy the existing codebase on the new server while the necessary dependencies are installed and accesses are granted. There's obviously still a bunch of work to be done for this to be running in production, so I have postponed all this work to January.

My hope is that this tool can be reused by other distributions, but after talking with Ubuntu folks, I am not holding my breath: it seems everyone has something that is "good enough" and that they don't want to break it...

Monkeysign

I spent a good chunk of time giving the Monkeysign project a kick, with the 2.2.2 release, which features contributions from two other developers, possibly a record for a single release.

I am especially happy to have adopted a new code of conduct - it has been an interesting process to adapt the code of conduct for such a relatively small project. Monkeysign is becoming a bit of a template on how to do things properly for my Python projects: documentation on readthedocs.org including a code of conduct, support and contribution information, and so on. Even though the code now looks a bit old to me and I am embarrassed to read certain parts, I still think it is a solid project that is useful for a lot of people. I would love to have more time to spend on it.

LWN publishing

As you may have noticed if you follow this blog, I have started publishing articles for the LWN magazine, filed here under the lwn tag. It is a way for me to actually get paid for some of my blogging work that used to be done for free. Reports like this one, for example, take up a significant amount of my time and are done without being paid. Converting parts of this work into paid work is part of my recent effort to reduce the amount of time I spend on the computer.

A funny note: I always found the layout of the site to be a bit odd, until I looked at my articles posted there in a different web browser, which didn't have my normal ad blocker configuration. It turns out LWN uses ads, including Google ones, which surprised me. I definitely didn't want to publish my work under banner ads, and will never do so on this blog. But it seems fair that, since I get paid for this work, there is some sort of revenue stream associated with it. If you prefer to see my work without ads, you can wait for it to be published here, or become a subscriber, which allows you to get rid of the ads on the site.

My experience with LWN has been great: they're great folks, and very supportive. It's my first experience with a real editor, and it really pushed me to improve my writing and produce better articles than I normally would here. Thanks to the LWN folks for their support! Expect more of those quality articles in 2017.

Debian packaging

I have added a few packages to the Debian archive:

  • magic-wormhole: easy file-transfer tool, co-maintained with Jamie Rollins
  • slop: screenshot tool
  • xininfo: utility used by teiler
  • teiler (currently in NEW): GUI for screenshot and screencast tools

I have also updated sopel and atheme-services.

Other work

Against my better judgment, I worked again on the borg project. This time I tried to improve the documentation, after a friend asked me for help on "how to make a quick backup". I realized I didn't have any good primer to send regular, non-sysadmin users to and figured that, instead of writing a new one, I could improve the upstream documentation.

I generated a surprising 18 commits of documentation during that time, mainly to fix display issues and streamline the documentation. My final attempt at refactoring the docs eventually failed, unfortunately, again reminding me of the difficulty I have in collaborating on that project. I am not sure I succeeded in making the project more attractive to non-technical users, but maybe that's okay too: borg is a fairly advanced project and not currently aimed at such an audience. This is yet another project I am thinking of creating: a meta-backup program like backupninja that would implement the vision outlined by liw in his A vision for backups in Debian post, which was discarded by the Borg project.

GitHub also tells me that I have opened 19 issues in 14 different repositories in November. I would like to particularly bring your attention to the linkchecker project, which seems to be dead upstream and for which I am looking for collaborators in order to create a healthy fork.

Finally, I started reviving the stressant project and changing all my passwords; stay tuned for more!

Categories: External Blogs

Debian considering automated upgrades

Anarcat - Wed, 12/14/2016 - 12:00

The Debian project is looking at possibly making automatic minor upgrades to installed packages the default for newly installed systems. While Debian has a reliable and stable package update system that has been an inspiration for multiple operating systems (the venerable APT), upgrades are, usually, a manual process on Debian for most users.

The proposal was brought up during the Debian Cloud sprint in November by longtime Debian Developer Steve McIntyre. The rationale was to make sure that users installing Debian in the cloud have a "secure" experience by default, by installing and configuring the unattended-upgrades package within the images. The unattended-upgrades package contains a Python program that automatically performs any pending upgrade and is designed to run unattended. It is roughly the equivalent of doing apt-get update; apt-get upgrade in a cron job, but has special code to handle error conditions, warn about reboots, and selectively upgrade packages. The package was originally written for Ubuntu by Michael Vogt, a longtime Debian developer and Canonical employee.
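
As a rough approximation, the core of what unattended-upgrades does could be expressed with the python-apt bindings as follows; this is only a sketch, and it leaves out the origin filtering, blacklists, error handling, and notification logic that make up most of the real program.

    # rough sketch of an unattended upgrade using python-apt (run as root);
    # the real unattended-upgrades adds origin filtering, blacklists and
    # careful error handling on top of this
    import apt

    cache = apt.Cache()
    cache.update()                      # equivalent of "apt-get update"
    cache.open()                        # re-read the package lists
    cache.upgrade(dist_upgrade=False)   # mark upgrades, like "apt-get upgrade"
    if cache.get_changes():
        cache.commit()                  # download and install the upgrades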

Since there was a concern that Debian cloud images would be different from normal Debian installs, McIntyre suggested installing unattended-upgrades by default on all Debian installs, so that people have a consistent experience inside and outside of the cloud. The discussion that followed was interesting as it brought up key issues one would have when deploying automated upgrade tools, outlining both the benefits and downsides to such systems.

Problems with automated upgrades

An issue raised in the following discussion is that automated upgrades may create unscheduled downtime for critical services. For example, certain sites may not be willing to tolerate a master MySQL server rebooting in conditions not controlled by the administrators. The consensus seems to be that experienced administrators will be able to solve this issue on their own, or are already doing so.

For example, Noah Meyerhans, a Debian developer, argued that "any reasonably well managed production host is going to be driven by some kind of configuration management system" where competent administrators can override the defaults. Debian, for example, provides the policy-rc.d mechanism to disable service restarts on certain packages out of the box. unattended-upgrades also features a way to disable upgrades on specific packages that administrators would consider too sensitive to restart automatically and will want to schedule during maintenance windows.

Reboots were another issue discussed: how and when should kernel upgrades be deployed? Automating kernel upgrades may mean data loss if the reboot happens during a critical operation. On Debian systems, the kernel upgrade mechanisms already provide a /var/run/reboot-required flag file that tools can monitor to notify users of the required reboot. For example, some desktop environments will pop up a warning prompting users to reboot when the file exists. Debian doesn't currently feature an equivalent warning for command-line operation; Vogt suggested that the warning could be shown along with the usual /etc/motd announcement.
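
A monitoring tool or shell profile could check for that flag with something as simple as the following sketch; the companion reboot-required.pkgs file listing the packages that requested the reboot is an assumption based on how the flag is commonly used, and may not exist on every system.

    import os

    FLAG = "/var/run/reboot-required"
    PKGS = "/var/run/reboot-required.pkgs"   # may list the packages asking for a reboot

    if os.path.exists(FLAG):
        packages = []
        if os.path.exists(PKGS):
            with open(PKGS) as f:
                packages = sorted(set(line.strip() for line in f if line.strip()))
        message = "*** System restart required ***"
        if packages:
            message += " (requested by: %s)" % ", ".join(packages)
        print(message)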

The ideal solution here, of course, is reboot-less kernel upgrades, which is also known as "live patching" the kernel. Unfortunately, this area is still in development in the kernel (as was previously discussed here). Canonical deployed the feature for the Ubuntu 16.04 LTS release, but Debian doesn't yet have such capability, since it requires extra infrastructure among other issues.

Furthermore, system reboots are only one part of the problem. Currently, upgrading packages only replaces the code and restarts the primary service shipped with a given package. On library upgrades, however, dependent services may not necessarily notice and will keep running with older, possibly vulnerable, libraries. While libc6, in Debian, has special code to restart dependent services, other libraries like libssl do not notify dependent services that they need to restart to benefit from potentially critical security fixes.

One solution to this is the needrestart package which inspects all running processes and restarts services as necessary. It also covers interpreted code, specifically Ruby, Python, and Perl. In my experience, however, it can take up to a minute to inspect all processes, which degrades the interactivity of the usually satisfying apt-get install process. Nevertheless, it seems like needrestart is a key component of a properly deployed automated upgrade system.
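
The underlying idea is simple enough to sketch: a process that still maps a library file that was deleted on disk (that is, replaced by an upgrade) probably needs a restart. The following rough approximation of that check is mine, not needrestart's actual implementation, which is far more thorough.

    # naive approximation of the needrestart/checkrestart heuristic:
    # list processes still mapping deleted library files
    import glob

    def stale_processes():
        stale = {}
        for maps in glob.glob("/proc/[0-9]*/maps"):
            pid = maps.split("/")[2]
            try:
                with open(maps) as f:
                    for line in f:
                        # upgraded libraries show up as
                        # "/usr/lib/.../libssl.so.1.0.0 (deleted)"
                        if line.rstrip().endswith("(deleted)") and "/lib" in line:
                            stale.setdefault(pid, set()).add(line.split()[-2])
            except (FileNotFoundError, PermissionError):
                continue  # process exited, or we lack permission to inspect it
        return stale

    if __name__ == "__main__":
        for pid, libraries in sorted(stale_processes().items()):
            print(pid, ", ".join(sorted(libraries)))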

Benefits of automated upgrades

One thing that was less discussed is the actual benefit of automating upgrades. It is merely described as "secure by default" by McIntyre in the proposal, but no one actually expanded on this much. For me, however, it is now obvious that any out-of-date system will be systematically attacked by automated probes and may be taken over to the detriment of the whole internet community, as we are seeing with Internet of Things devices. As Debian Developer Lars Wirzenius said:

The ecosystem-wide security benefits of having Debian systems keep up to date with security updates by default overweigh any inconvenience of having to tweak system configuration on hosts where the automatic updates are problematic.

One could compare automated upgrades with backups: if they are not automated, they do not exist and you will run into trouble without them. (Wirzenius, coincidentally, also works on the Obnam backup software.)

Another benefit that may be less obvious is the acceleration of the feedback loop between developers and users: developers like to know quickly when an update creates a regression. Automation does create the risk of a bad update affecting more users, but this issue is already present, to a lesser extent, with manual updates. And the same solution applies: have a staging area for security upgrades, the same way updates to Debian stable are first proposed before shipping a point release. This doesn't have to be limited to stable security updates either: more adventurous users could follow rolling distributions like Debian testing or unstable with unattended upgrades as well, with all the risks and benefits that implies.

Possible non-issues

That there was no backlash against the proposal surprised me: I expected the privacy-sensitive Debian community to react negatively to another "phone home" system, as it did with the Django proposal. This, however, is different from a phone-home system: it merely leaks package lists, and one has to leak that information to get the updated packages. Furthermore, privacy-sensitive administrators can use APT over Tor to fetch packages. In addition, the diversity of the mirror infrastructure makes it difficult for a single entity to profile users.

Automated upgrades do imply a culture change, however: administrators approve changes only a posteriori as opposed to deliberately deciding to upgrade parts they chose. I remember a time when I had to maintain proprietary operating systems and was reluctant to enable automated upgrades: such changes could mean degraded functionality or additional spyware. However, this is the free-software world and upgrades generally come with bug fixes and new features, not additional restrictions.

Automating major upgrades?

While automating minor upgrades is one part of the solution to the problem of security maintenance, the other is how to deal with major upgrades. Once a release becomes unsupported, security issues may come up and affect older software. While Debian LTS extends release lifetimes significantly, it merely delays the inevitable major upgrades. In the grand scheme of things, the lifetimes of Linux systems (Debian: 3-5 years, Ubuntu: 1-5 years) are fairly short compared to other operating systems (Solaris: 10-15 years, Windows: 10+ years), which makes major upgrades especially critical.

While major upgrades are not currently automated in Debian, they are usually pretty simple: edit sources.list then:

# apt-get update && apt-get dist-upgrade

But the actual upgrade process is really much more complex. If you run into problems with the above commands, you will quickly learn that you should have followed the release notes, a whopping 20,000-word, ten-section document that outlines all the gory details of the release. This is a real issue for large deployments and for users unfamiliar with the command line.
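
For the simple cases, the two steps above are easy enough to script; the following sketch (with hypothetical release codenames) shows the naive approach, and also why it falls short: it ignores sources.list.d/, third-party repositories, pre-upgrade checks, and all the manual steps from the release notes.

    # naive sketch of "edit sources.list then dist-upgrade" (run as root);
    # the codenames are placeholders and a real tool needs to do much more
    import re
    import subprocess

    OLD, NEW = "jessie", "stretch"

    with open("/etc/apt/sources.list") as f:
        sources = f.read()
    with open("/etc/apt/sources.list", "w") as f:
        f.write(re.sub(r"\b%s\b" % OLD, NEW, sources))

    subprocess.check_call(["apt-get", "update"])
    subprocess.check_call(["apt-get", "dist-upgrade", "-y"])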

The solution most administrators seem to use right now is to roll their own automated upgrade process. For example, the Debian.org system administrators have their own process for the "jessie" (8.0) upgrade. I have also written a specification of how major upgrades could be automated that attempts to take into account the wide variety of corner cases that occur during major upgrades, but it is currently at the design stage. Therefore, this problem space is generally unaddressed in Debian: Ubuntu does have a do-release-upgrade command, but it is Ubuntu-specific and would need significant changes in order to work in Debian.

Future work

Ubuntu currently defaults to "no automation" but, on install, invites users to enable unattended-upgrades or Landscape, a proprietary system-management service from Canonical. According to Vogt, the company supports both projects equally as they differ in scope: unattended-upgrades just upgrades packages while Landscape aims at maintaining thousands of machines and handles user management, release upgrades, statistics, and aggregation.

It appears that Debian will enable unattended-upgrades by default on the images built for the cloud. For regular installs, the consensus that has emerged points at having the Debian installer ask users whether they want to disable the feature as well. One reason why this was not enabled before is that unattended-upgrades had serious bugs in the past that made it less attractive. For example, it would simply fail to follow security updates, a major bug that was fortunately promptly fixed by the maintainer.

In any case, it is important to distribute security and major upgrades on Debian machines in a timely manner. In my long experience professionally administering Unix server farms, I have found the upgrade work to be a critical but time-consuming part of my job. During that time, I successfully deployed an automated upgrade system all the way back to Debian woody, using the simpler cron-apt. This approach is, unfortunately, a little brittle and non-standard; it doesn't address the need for automating major upgrades, for which I had to revert to tools like cluster-ssh or more specialized configuration management tools like Puppet. I therefore encourage any effort towards improving that process for the whole community.

More information about the configuration of unattended-upgrades can be found in the Ubuntu documentation or the Debian wiki.

Note: this article first appeared in the Linux Weekly News.

Categories: External Blogs

Django debates privacy concern

Anarcat - Wed, 11/30/2016 - 12:00

In recent years, privacy issues have become a growing concern among free-software projects and users. As more and more software tasks become web-based, surveillance and tracking of users is also on the rise. While some software may use advertising as a source of revenue, which has the side effect of monitoring users, the Django community recently got into an interesting debate surrounding a proposal to add user tracking—actually developer tracking—to the popular Python web framework.

Tracking for funding

A novel aspect of this debate is that the initiative comes from concerns of the Django Software Foundation (DSF) about funding. The proposal suggests that "relying on the free labor of volunteers is ineffective, unfair, and risky" and states that "the future of Django depends on our ability to fund its development". In fact, the DSF recently hired an engineer to help oversee Django's development, which has been quite successful in helping the project make timely releases with fewer bugs. Various fundraising efforts have resulted in major new Django features, but it is difficult to attract sponsors without some hard data on the usage of Django.

The proposed feature tries to count the number of "unique developers" and gather some metrics of their environments by using Google Analytics (GA) in Django. The actual proposal (DEP 8) is done as a pull request, which is part of the Django Enhancement Proposal (DEP) process that is similar in spirit to the Python Enhancement Proposal (PEP) process. DEP 8 was brought forward by a longtime Django developer, Jacob Kaplan-Moss.

The rationale is that "if we had clear data on the extent of Django's usage, it would be much easier to approach organizations for funding". The proposal is essentially about adding code in Django to send a certain set of metrics when "developer" commands are run. The system would be "opt-out", enabled by default unless turned off, although the developer would be warned the first time the phone-home system is used. The proposal notes that an opt-in system "severely undercounts" and is therefore not considered "substantially better than a community survey" that the DSF is already doing.

Information gathered

The reporting is specifically designed to run only in a developer's environment, not in production. The metrics identified are, at the time of writing:

  • an event category (the developer commands: startproject, startapp, runserver)
  • the HTTP User-Agent string identifying the Django, Python, and OS versions
  • a user-specific unique identifier (a UUID generated on first run)

The proposal mentions the use of the GA aip flag which, according to GA documentation, makes "the IP address of the sender 'anonymized'". It is not quite clear how that is done at Google and, given that it is a proprietary platform, there is no way to verify that claim. The proposal says it means that "we can't see, and Google Analytics doesn't store, your actual IP". But that is not actually what Google does: GA stores IP addresses, the documentation just says they are anonymized, without explaining how.
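
For illustration, a report like the one described above could be sent over GA's Measurement Protocol roughly as follows; the tracking ID, the exact field mapping, and the use of the requests library are my assumptions, not details taken from DEP 8.

    # hypothetical sketch of the phone-home report; not DEP 8's actual code
    import platform
    import uuid

    import django
    import requests

    payload = {
        "v": "1",                 # Measurement Protocol version
        "tid": "UA-XXXXXXX-1",    # placeholder Google Analytics tracking ID
        "cid": str(uuid.uuid4()), # per-user identifier, generated once and cached
        "aip": "1",               # ask GA to "anonymize" the sender's IP address
        "t": "event",
        "ec": "django-admin",     # event category (illustrative)
        "ea": "startproject",     # the developer command being run
    }
    headers = {
        "User-Agent": "Django/%s Python/%s %s" % (
            django.get_version(), platform.python_version(), platform.system()),
    }
    requests.post("https://www.google-analytics.com/collect",
                  data=payload, headers=headers, timeout=2)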

GA is presented as a trade-off, since "Google's track record indicates that they don't value privacy nearly as high" as the DSF does. The alternative, deploying its own analytics software, was presented as making sustainability problems worse. According to the proposal, Google "can't track Django users. [...] The only thing Google could do would be to lie about anonymizing IP addresses, and attempt to match users based on their IPs".

The truth is that we don't actually know what Google means when it "anonymizes" data: Jannis Leidel, a Django team member, commented that "Google has previously been subjected to secret US court orders and was required to collaborate in mass surveillance conducted by US intelligence services" that limit even Google's capacity of ensuring its users' anonymity. Leidel also argued that the legal framework of the US may not apply elsewhere in the world: "for example the strict German (and by extension EU) privacy laws would exclude the automatic opt-in as a lawful option".

Furthermore, the proposal claims that "if we discovered Google was lying about this, we'd obviously stop using them immediately", but it is unclear exactly how this could be implemented if the software was already deployed. There are also concerns that an implementation could block normal operation, especially in countries (like China) where Google itself may be blocked. Finally, some expressed concerns that the information could constitute a security problem, since it would unduly expose the version number of Django that is running.

In other projects

Django is certainly not the first project to consider implementing analytics to get more information about its users. The proposal is largely inspired by a similar system implemented by the OS X Homebrew package manager, which has its own opt-out analytics.

Other projects embed GA code directly in their web pages. This is apparently the option chosen by Oscar, a Django-based e-commerce solution, but that was seen by the DSF as less useful, since it would count site administrators rather than developers. Wagtail, a Django-based content-management system, was also incorrectly identified as using GA directly. It actually uses referrer information to identify installed domains through its version update checks, with an opt-out. Wagtail didn't use GA because the project wanted only minimal data and was worried about users' reactions.

NPM, the JavaScript package manager, also considered similar tracking extensions. Laurie Voss, the co-founder of NPM, said it decided to completely avoid phoning home, because "users would absolutely hate it". But NPM users are constantly downloading packages to rebuild applications from scratch, so it has more complete usage metrics, which are aggregated and available via a public API. NPM users seem to find this is a "reasonable utility/privacy trade". Some NPM packages do phone home and have seen "very mixed" feedback from users, Voss said.

Eric Holscher, co-founder of Read the Docs, said the project is considering using Sentry for centralized reporting, which is a different idea, but interesting considering Sentry is fully open source. So even though it is a commercial service (as opposed to the closed-source Google Analytics), it may be possible to verify any anonymity claims.

Debian's response

Since Django is shipped with Debian, one concern was the reaction of the distribution to the change. Indeed, "major distros' positions would be very important for public reception" to the feature, another developer stated.

One of the current maintainers of Django in Debian, Raphaël Hertzog, explicitly stated from the start that such a system would "likely be disabled by default in Debian". There were two short discussions on Debian mailing lists where the overall consensus seemed to be that any opt-out tracking code was undesirable in Debian, especially if it was aimed at Google servers.

I have done some research to see what, exactly, was acceptable as a phone-home system in the Debian community. My research has revealed ten distinct bug reports against packages that would unexpectedly connect to the network, most of which were not directly about collecting statistics but more often about checking for new versions. In most cases I found, the feature was disabled. In the case of version checks, it seems right for Debian to disable the feature, because the package cannot upgrade itself: that task is delegated to the package manager. One of those issues was the infamous "OK Google" voice-activation binary blob controversy that was previously reported here and has since been fixed (although other issues remain in Chromium).

I have also found out that there is no clearly defined policy in Debian regarding tracking software. What I have found, however, is that there seems to be a strong consensus in Debian that any tracking is unacceptable. This is, for example, an extract of a policy that was drafted (but never formally adopted) by Ian Jackson, a longtime Debian developer:

Software in Debian should not communicate over the network except: in order to, and as necessary to, perform their function[...]; or for other purposes with explicit permission from the user.

In other words, opt-in only, period. Jackson explained that "when we originally wrote the core of the policy documents, the DFSG [Debian Free Software Guidelines], the SC [Social Contract], and so on, no-one would have considered this behaviour acceptable", which explains why no explicit formal policy has been adopted yet in the Debian project.

One of the concerns with opt-out systems (or even prompts that default to opt-in) was well explained back then by Debian developer Bas Wijnen:

It very much resembles having to click through a license for every package you install. One of the nice things about Debian is that the user doesn't need to worry about such things: Debian makes sure things are fine.

One could argue that Debian has its own tracking systems. For example, by default, Debian will "phone home" through the APT update system (though it only reports the packages requested). However, this is currently not automated by default, although there are plans to do so soon. Furthermore, Debian members do not consider APT as tracking, because it needs to connect to the network to accomplish its primary function. Since there are multiple distributed mirrors (which the user gets to choose when installing), the risk of surveillance and tracking is also greatly reduced.

A better parallel could be drawn with Debian's popcon system, which actually tracks Debian installations, including package lists. But as Barry Warsaw pointed out in that discussion, "popcon is 'opt-in' and [...] the overwhelming majority in Debian is in favour of it in contrast to 'opt-out'". It should be noted that popcon, while opt-in, defaults to "yes" if users click through the install process. [Update: As pointed out in the comments, popcon actually defaults to "no" in Debian.] There are around 200,000 submissions at this time, which are tracked with machine-specific unique identifiers that are submitted daily. Ubuntu, which also uses the popcon software, gets around 2.8 million daily submissions, while Canonical estimates there are 40 million desktop users of Ubuntu. This would mean there is about an order of magnitude more installations than what is reported by popcon.

Policy aside, Warsaw explained that "Debian has a reputation for taking privacy issues very serious and likes to keep it".

Next steps

There are obviously disagreements within the Django project about how to handle this problem. It looks like the phone-home system may end up being implemented as a proxy system "which would allow us to strip IP addresses instead of relying on Google to anonymize them, or to anonymize them ourselves", another Django developer, Aymeric Augustin, said. Augustin also stated that the feature wouldn't "land before Django drops support for Python 2", which is currently estimated to be around 2020. It is unclear, then, how the proposal would resolve the funding issues, considering how long it would take to deploy the change and then collect the information so that it can be used to spur the funding efforts.
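
To make the proxy idea concrete, here is a hypothetical sketch of such a relay: developer machines would post their events to a Django-operated endpoint, which forwards them to Google Analytics, so that Google only ever sees the proxy's IP address. The endpoint, the framework choice (flask), and the fields are all illustrative, not part of the actual plan.

    # hypothetical analytics relay that strips the client's IP address
    from flask import Flask, request
    import requests

    app = Flask(__name__)

    @app.route("/analytics", methods=["POST"])
    def relay():
        payload = request.form.to_dict()
        payload["aip"] = "1"   # still ask GA to anonymize, for good measure
        # the client's address is never forwarded: GA records the proxy's IP
        requests.post("https://www.google-analytics.com/collect",
                      data=payload, timeout=2)
        return "", 204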

It also seems the system may explicitly prompt the user, with an opt-out default, instead of just splashing a warning or privacy agreement without a prompt. As Shai Berger, another Django contributor, stated, "you do not get [those] kind of numbers in community surveys". Berger also made the argument that "we trust the community to give back without being forced to do so"; furthermore:

I don't believe the increase we might get in the number of reports by making it harder to opt-out, can be worth the ill-will generated for people who might feel the reporting was "sneaked" upon them, or even those who feel they were nagged into participation rather than choosing to participate.

Other options may also include gathering metrics in pip or PyPI, which was proposed by Donald Stufft. Leidel also proposed that the system could ask to opt-in only after a few times the commands are called.

It is encouraging to see that a community can discuss such issues without heating up too much; it shows great maturity on the part of the Django project. Every free-software project may be confronted with funding and sustainability issues. Django seems to be trying to address this in a transparent way. The project is willing to engage with the whole spectrum of the community, from the top leaders to downstream distributors, including individual developers. This practice should serve as a model, if not of how to do funding or tracking, at least of how to discuss those issues productively.

Everyone seems to agree the point is not to surveil users, but improve the software. As Lars Wirzenius, a Debian developer, commented: "it's a very sad situation if free software projects have to compromise on privacy to get funded". Hopefully, Django will be able to improve its funding without compromising its principles.

Note: this article first appeared in the Linux Weekly News.

Categories: External Blogs

Montréal-Python 61: Isometrical Jello

Montreal Python - Tue, 11/29/2016 - 00:00

After a great time at PyCon Canada, and with the holiday season only a few weeks away, we see this as a great time to get together and talk about Python. We are lucky to welcome both new and returning speakers from the Montreal community. So come and join us:

Where

UQÀM, Pavillon PK, 201 Président-Kennedy Avenue, Room PK-1140

When

December 5th, 2016

Schedule
  • 6:00 - Doors Open
  • 6:30 - Start of the presentations
  • 7:30 - Break
  • 7:45 - Part 2 of the presentations
  • 9:00 - End of presentations. Drinks afterward at Benelux!
Presentations

Patrick Poirier, Jean-Philippe Beaudet and Shawn Lauzon: "Implementing Matrix"

Matrix defines a set of open APIs for decentralised communication, suitable for securely publishing, persisting and subscribing to data over a global open federation of servers with no single point of control. Uses include Instant Messaging (IM), Voice over IP (VoIP) signalling, Internet of Things (IoT) communication, and bridging together existing communication silos - providing the basis of a new open real-time communication ecosystem.

More information

Federico Ariza: "My Restful Lab"

Using a REST interface to control an optics laboratory helps to decouple the physical layer (real lab hardware) from the spirit of the tests. The result is a uniform API with greater simplicity for test creation and free portability.

More information

Alexandre Desilets-Benoit: "PPP: Python, Pi(e) and Particle accelerators"

A short overview of how I saved $10k of research funds by interfacing my beam line at the UdeM linear accelerator with a Raspberry Pi 3, Python (2!), an Arduino, and my salary as a postdoc. A proof that bigger is not always better, and that small things can land you on the front page of the university's student journal if you're not careful enough...

More information

Rory Geoghegan: "Module of the month: fileinput"

We’d like to thank our sponsors for their continued support:

  • UQÀM
  • Bénélux
  • w.illi.am/
  • Outbox
  • Savoir-faire Linux
Categories: External Blogs

The Turris Omnia router: help for the IoT mess?

Anarcat - Tue, 11/15/2016 - 10:28

The Turris Omnia router is not the first FLOSS router out there, but it could well be one of the first open hardware routers to be available. As the crowdfunding campaign is coming to a close, it is worth reflecting on the place of the project in the ecosystem. Beyond that, I got my hardware recently, so I was able to give it a try.

A short introduction to the Omnia project

The Omnia router is a follow-up to CZ.NIC's original research project, the Turris. The goal of that project was to identify hostile traffic on end-user networks and develop global responses to those attacks across every monitored device. The Omnia is an extension of the original project: more features were added and data collection is now opt-in. Whereas the original Turris was simply a home router, the new Omnia router includes:

  • 1.6GHz ARM CPU
  • 1-2GB RAM
  • 8GB flash storage
  • 6 Gbit Ethernet ports
  • SFP fiber port
  • 2 Mini-PCI express ports
  • mSATA port
  • 3 MIMO 802.11ac and 2 MIMO 802.11bgn radios and antennas
  • SIM card support for backup connectivity

Some models sold had a larger case to accommodate extra hard drives, turning the Omnia router into a NAS device that could actually serve as a multi-purpose home server. Indeed, it is one of the objectives of the project to make "more than just a router". The NAS model is not currently on sale anymore, but there are plans to bring it back along with LTE modem options and new accessories "to expand Omnia towards home automation".

Omnia runs a fork of the OpenWRT distribution called TurrisOS that has been customized to support automated live updates, a simpler web interface, and other extra features. The fork also has patches to the Linux kernel, which is based on Linux 4.4.13 (according to uname -a). It is unclear why those patches are necessary since the ARMv7 Armada 385 CPU has been supported in Linux since at least 4.2-rc1, but it is common for OpenWRT ports to ship patches to the kernel, either to backport missing functionality or perform some optimization.

There has been some pressure from backers to petition Turris to "speedup the process of upstreaming Omnia support to OpenWrt". It could be that the team is too busy with delivering the devices already ordered to complete that process at this point. The software is available on the CZ-NIC GitHub repository and the actual Linux patches can be found here and here. CZ.NIC also operates a private GitLab instance where more software is available. There is technically no reason why you wouldn't be able to run your own distribution on the Omnia router: OpenWRT development snapshots should be able to run on the Omnia hardware and some people have installed Debian on Omnia. It may require some customization (e.g. the kernel) to make sure the Omnia hardware is correctly supported. Most people seem to prefer to run TurrisOS because of the extra features.

The hardware itself is also free and open for the most part. There is a binary blob needed for the 5GHz wireless card, which seems to be the only proprietary component on the board. The schematics of the device are available through the Omnia wiki, but oddly not in the GitHub repository like the rest of the software.

Hands on

I received my own router last week, about six months past the original April 2016 delivery date; this allowed me to do some hands-on testing of the device. The first thing I noticed was a known problem with the antenna connectors: I had to open up the case to screw the fittings tight, otherwise the antennas wouldn't screw in correctly.

Once that was done, I simply had to go through the usual process of setting up the router, which consisted of connecting the Omnia to my laptop with an Ethernet cable, connecting the Omnia to an uplink (I hooked it into my existing network), and going through a web wizard. I was pleasantly surprised with the interface: it was smooth and easy to use, but at the same time imposed good security practices on the user.

For example, the wizard, once connected to the network, goes through a full system upgrade and will, by default, automatically upgrade itself (including reboots) when new updates become available. Users have to opt in to the automatic updates, and can choose to automate only the downloading and installation of the updates without having the device reboot on its own. Reboots are also performed during user-specified time frames (by default, Omnia applies kernel updates during the night). I also liked the "skip" button that allowed me to completely bypass the wizard and configure the device myself, through the regular OpenWRT systems (like LuCI or SSH), if I needed to.

Notwithstanding the antenna connectors themselves, the hardware is nice. I ordered the black metal case, and I must admit I love the many LED lights in the front. It is especially useful to have color changes in the reset procedure: no more guessing what state the device is in or if I pressed the reset button long enough. The LEDs can also be dimmed to reduce the glare that our electronic devices produce.

All this comes at a price, however: at $250 USD, it carries a much higher price tag than common home routers, which typically go for around $50. Furthermore, it may be difficult to actually get the device, because no orders are being accepted on the Indiegogo site after October 31. The Turris team doesn't actually want to deal with retail sales and has now delegated retail sales to other stores, which are currently limited to European deliveries.

A nice device to help fight off the IoT apocalypse

It seems there isn't a week that goes by these days without a record-breaking distributed denial-of-service (DDoS) attack. Those attacks are more and more caused by home routers, webcams, and "Internet of Things" (IoT) devices. In that context, the Omnia sets a high bar for how devices should be built, but also how they should be operated. Omnia routers are automatically upgraded on a nightly basis and, by default, do not provide telnet or SSH ports to run arbitrary code. There is the password-less wizard that starts up on install, but it forces the user to choose a password in order to complete the configuration.

Both the hardware and software of the Omnia are free and open. The automatic update's EULA explicitly states that the software provided by CZ.NIC "will be released under a free software licence" (and it has been, as mentioned earlier). This makes the machine much easier to audit by someone looking for possible flaws, say for example a customs official looking to approve the import in the eventual case where IoT devices end up being regulated. But it also makes the device itself more secure. One of the problems with these kinds of devices is "bit rot": they have known vulnerabilities that are not fixed in a timely manner, if at all. While it would be trivial for an attacker to disable the Omnia's auto-update mechanisms, the point is not to counterattack, but to prevent attacks on known vulnerabilities.

The CZ.NIC folks take it a step further and encourage users to actively participate in a monitoring effort to document such attacks. For example, the Omnia can run a honeypot to lure attackers into divulging their presence. The Omnia also runs an elaborate data collection program, where routers report malicious activity to a central server that collects information about traffic flows, blocked packets, bandwidth usage, and activity from a predefined list of malicious addresses. The exact data collected is specified in another EULA that is currently only available to users logged in at the Turris web site. That data can then be turned into tweaked firewall rules to protect the overall network, which the Turris project calls a distributed adaptive firewall. Users need to explicitly opt-in to the monitoring system by registering on a portal using their email address.

Turris devices also feature the Majordomo software (not to be confused with the venerable mailing list software) that can also monitor devices in your home and identify hostile traffic, potentially leading users to take responsibility over the actions of their own devices. This, in turn, could lead users to trickle complaints back up to the manufacturers that could change their behavior. It turns out that some companies do care about their reputations and will issue recalls if their devices have significant enough issues.

It remains to be seen how effective the latter approach will be, however. In the meantime, the Omnia seems to be an excellent all-around server and router for even the most demanding home or small-office environments, and a great example for future competitors to follow.

Note: this article first appeared in the Linux Weekly News.

Categories: External Blogs

Call for Speaker - Montréal-Python 61: Isometrical Jello

Montreal Python - Tue, 11/15/2016 - 00:00

Just in time for the Holiday season, we are gathering the Python community together. This time we've decided to return to our roots at UQAM. For this opportunity, we are looking for speakers. It's your chance to submit a talk. Just write us at mtlpyteam@googlegroups.com.

We are particularly looking for people willing to present lightning talks. Don't hesitate to send us your proposal, or join us on Slack by subscribing at http://slack.mtlpy.org/ to ask us any questions.

Where

UQAM, more details to come

When

Monday, December 5th, 2016 at 6pm

We’d like to thank our sponsors for their continued support:

  • UQÀM
  • Bénélux
  • w.illi.am/
  • Outbox
  • Savoir-faire Linux
Categories: External Blogs

Montreal-Python new slack Channel

Montreal Python - Wed, 11/02/2016 - 23:00

Good news everyone,

The Montreal-Python community now has its own Slack channel. Join us to discuss Python, NumPy, Pyramid, Django, and everyone's favorite usage of our preferred language.

To join, point your browser to the following URL: http://slack.mtlpy.org/ and create your account.

As usual, we are also available on the following platforms:

  • Facebook: https://www.facebook.com/montrealpython
  • Freenode (IRC): #montrealpython
  • Twitter: https://twitter.com/mtlpy
  • YouTube: https://www.youtube.com/user/MontrealPython

And on our website at http://montrealpython.org/

See you soon

Categories: External Blogs

My free software activities, October 2016

Anarcat - Mon, 10/31/2016 - 15:15
Debian Long Term Support (LTS)

This is my 7th month working on Debian LTS, started by Raphael Hertzog at Freexian, after a long pause during the summer.

I have worked on the following packages and CVEs:

I have also helped review work on the following packages:

  • imagemagick: reviewed BenH's work to figure out what was done. Unfortunately, I forgot to officially take on the package and Roberto started working on it in the meantime. I nevertheless took time to review Roberto's work and outline possible issues with the originally suggested patchset
  • tiff: reviewed Raphael's work on the hairy TIFFTAG_* issues, all the gory details in this email

The work on ImageMagick and GraphicsMagick was particularly intriguing. Looking at the source of those programs makes me wonder why we are still using them at all: it's a tangled mess of C code that is bound to bring up more and more vulnerabilities, time after time. It seems there's always a "Magick" vulnerability waiting to be fixed out there... I somehow hoped that the fork would bring more stability and reliability, but it seems they are suffering from similar issues because, fundamentally, they haven't rewritten ImageMagick...

It looks like this is something that affects all image programs. The review I have done on the tiff suite gives me the same shivering sensation as reviewing the "Magick" code. It feels like all image libraries are poorly implemented and bound to be exploited somehow... Nevertheless, if I had to use a library of the sort in my software, I would stay away from the "Magick" forks and try something like imlib2 first...

Finally, I also did some minor work on the user and developer LTS documentation and some triage work on samba, xen and libass. I also looked at the dreaded CVE-2016-7117 vulnerability in the Linux kernel to verify its impact on wheezy users. I also looked at implementing a --lts flag for dch (see bug #762715).

It was difficult to get back to work after such a long pause, but I am happy I was able to contribute a significant number of hours. It's a bit difficult to find work sometimes in LTS-land, even if there's actually always a lot of work to be done. For example, I used to be one of the people doing frontdesk work, but those duties are now assigned until the end of the year, so it's unlikely I will be doing any of that for the foreseeable future. Similarly, a lot of packages were already assigned when I started looking at the available packages. There was an interesting discussion on the internal mailing list regarding unlocking package ownership, because some people had packages locked for weeks, sometimes months, without significant activity. Hopefully that situation will improve after that discussion.

Another interesting discussion I participated in is the question of whether the LTS team should be waiting for unstable to be fixed before publishing fixes in oldstable. It seems the consensus right now is that it shouldn't be mandatory to fix issues in unstable before we fix security issues in oldstable and stable. After all, security support for testing and unstable is limited. But I was happy to learn that working on brand new patches is part of our mandate as part of the LTS work. I did work on such a patch for tar, which ended up being adopted by the original reporter, although upstream ended up implementing our recommendation in a better way.

It's coincidentally the first time since I started working on LTS that I didn't get the number of hours I requested, which means that there are more people working on LTS. That is a good thing, but I am worried it may also mean people are more spread out and less capable of focusing for longer periods of time on more difficult problems. It also means that the team is growing faster than the funding, which is unfortunate: now is as good a time as any to remind you to see if you can make your company fund the LTS project if you are still running Debian wheezy.

Other free software work

It seems like forever since I last did such a report, and while I was on vacation, a lot has happened since the last one.

Monkeysign

I have done extensive work on Monkeysign, trying to bring it kicking and screaming into the new world of GnuPG 2.1. This was the objective of the 2.1 release, which collected about two years of work and patches, including arbitrary MUA support (e.g. Thunderbird), config file support, and a release on PyPI. I have had to make about 4 more releases to try to fix the build chain, ship the test suite with the program, and add a primitive preferences panel. The 2.2 release also finally features Tor support!

I am also happy to have moved more documentation to Read the Docs, part of which I mentioned in a previous article. The git repositories and issues were also moved to a GitLab instance, which will hopefully improve the collaboration workflow, although we still have issues in streamlining the merge request workflow.

All in all, I am happy to be working on Monkeysign, but it has been a frustrating experience. In the last years, I have been maintaining the project largely on my own: although there are about 20 contributors in Monkeysign, I have committed over 90% of the commits in the code. New contributors recently showed up, and I hope this will release some pressure on me being the sole maintainer, but I am not sure how viable the project is.

Funding free software work

More and more, I wonder how to sustain my contributions to free software. As a previous article has shown, I work a lot on the computer, even when I am not on a full-time job. Monkeysign has been a significant time drain in the last months, and I have done this work on a completely volunteer basis. I wouldn't mind so much, except that there is a lot of other work I do on a volunteer basis. This means that I sometimes must prioritize paid consulting work at the expense of those volunteer projects. While most of my paid work usually revolves around free software, the benefits of paid work are not always immediately obvious, as the primary objective is to deliver to the customer, and the benefit to the community as a whole is somewhat of a side effect.

I have watched with interest joeyh's adventures in crowdfunding, which seem to be working pretty well for him. Unfortunately, I cannot claim the incredible (and well-deserved) reputation Joey has, and even if I could, I couldn't live on $500 a month.

I would love to hear if people would be interested in funding my work in such a way. I am hesitant to launch a crowdfunding campaign because it is difficult to identify what exactly I am working on from one month to the next. Looking back at earlier reports shows that I am all over the place: one month I'll work on a Perl wiki (Ikiwiki), the next one I'll be hacking at a multimedia home cinema (Kodi). I can hardly think of how to fund those things short of "just give me money to work on anything I feel like", which I can hardly ask of anyone. Even worse, it feels like the audience here is either friends or colleagues. It would make little sense for me to seek funding from those people: colleagues have the same funding problems I do, and I don't want to impoverish my friends...

So far I have taken the approach of trying to get funding, bit by bit, for work I am already doing. For example, I have recently been told that LWN actually pays for contributed articles, and I have started running articles by them before publishing them here. This is looking good: they will publish an article I wrote about the Omnia router I recently received. I give them exclusive rights on the article for two weeks, but I otherwise retain full ownership and will publish it here after the exclusive period.

Hopefully, I will be able to find more such projects that pay for the work I do on a day-to-day basis.

Open Street Map editing

I have ramped up my OpenStreetMap contributions, having (temporarily) moved to a different location. There are lots of things to map here: trails, gas stations, and lots of other things are missing from the map. Sometimes the effort looks a bit ridiculous, reminding me of my early days of editing OSM. I have registered with OSM Live, a project to fund OSM editors that, I must admit, doesn't help much with funding my work: for the hundreds of edits I did in October, I received the equivalent of $1.80 CAD in bitcoins. This may be the lowest hourly salary I have ever received, probably working out to a rate of 10¢ per hour!

Still, it's interesting to be able to point people to the project if someone wants to contribute to OSM mappers. But mappers should have no illusions about getting a decent salary from this effort, I am sorry to say.

Bounties

I feel this is similar to the "bounty" model used by the Borg project: I claimed around $80USD in that project for what probably amounts to tens of hours of work, yet another salary that would qualify as "poor".

Another example is a feature I would like to implement in Borg: support for protocols other than SSH. There is currently no bounty on this, but a similar feature, S3 support, has one of the largest bounties Borg has ever seen: $225 USD. And the claimant for the bounty hasn't actually implemented the feature: instead of backing up to S3, the patch (to a third-party tool) actually enables support for Amazon Cloud Drive, a completely different API.

Even at $225, I wouldn't be able to complete any of those features and earn a decent wage. As the Snowdrift reviews explain well, bounties just don't work at all... The ludicrous 10% fee charged by Bountysource made sure I would never do business with them again anyway.

Other work

There are probably more things I did recently, but I am having difficulty keeping track of the last 5 months of on-and-off work, so you will have to forgive me for not being as exhaustive as I usually am.

Categories: External Blogs

Montreal Python 60 - Gravitational Hipster

Montreal Python - Mon, 10/24/2016 - 23:00

After an amazing edition of our presentation nights at Ubisoft last month, we follow up with Montréal-Python 60: Gravitational Hipster on Tuesday, November 1st at WeWork.

It is a great opportunity to get a preview of what to expect at PyCon Canada, which is taking place in Toronto from the 12th through to the 15th of November.

For the occasion, we've invited speakers from Montreal who will be presenting in Toronto to come test-drive their talks. This edition of Montréal-Python is presented by Immunio.

Where

WeWork - 3 Place Ville-Marie, 4th floor

When

November 1st, 2016

Schedule
  • 6:00 - Doors Open
  • 6:30 - Start of the presentations
  • 7:30 - Break
  • 7:45 - Part 2 of the presentations
  • 9:00 - End of presentations. Drinks afterward.
Presentations

Roberto Rocha: Beautiful Pandas

Pandas is a powerful data analysis library that rivals R among data scientists. It's also gaining popularity in journalism. Pandas can:
  • Read multiple data formats, including CSV, Excel, SQL, JSON, HTML, Stata, HDF and others
  • Reshape and pivot tables in flexible ways
  • Filter data
  • Perform quick aggregations on categorical data
  • Visualize data so you can quickly see what it looks like

In this talk, Roberto will show how easy it is to get started and find interesting stories in big, complex datasets.

https://2016.pycon.ca/en/schedule/019-roberto-rocha/
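
To give a flavour of the workflow Roberto describes, here is a minimal, hypothetical Pandas session; the file name and column names below are invented for illustration and do not come from the talk:

    import pandas as pd

    # Load a (hypothetical) CSV of city spending records
    df = pd.read_csv("spending.csv", parse_dates=["date"])

    # Filter: keep only contracts above $100,000
    big = df[df["amount"] > 100000]

    # Aggregate: total amount per department, largest first
    totals = big.groupby("department")["amount"].sum().sort_values(ascending=False)

    # Quick plot of the top ten departments (requires matplotlib)
    totals.head(10).plot(kind="barh")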

Hadrien David: Painful Serverless

Serverless architecture is a hot topic: it promises low operational costs, infinite scalability and less management. In reality, it strongly challenges the way you design, implement and deploy your application. Moreover, you have to choose among a variety of vendors providing function-as-a-service (FaaS) platforms with highly opinionated and divergent APIs. This talk aims to relieve the pain by presenting a summary of nice idioms and practices, with Python examples based on AWS Lambda.

https://2016.pycon.ca/en/schedule/079-hadrien-david/
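
For readers unfamiliar with the FaaS model the talk refers to, an AWS Lambda function in Python boils down to a single handler; the sketch below is a generic example for an API Gateway proxy integration, not code from the talk:

    import json

    def handler(event, context):
        """Minimal AWS Lambda handler behind an API Gateway proxy endpoint.

        `event` carries the request data, `context` carries runtime metadata
        such as the time remaining before the function is killed.
        """
        name = (event.get("queryStringParameters") or {}).get("name", "world")
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"message": "hello %s" % name}),
        }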

Marc-André Giroux: Becoming a REST elitist in 10 minutes

You finish what you think is the best REST API anyone has ever seen. Proud of your work, you decide to post it on the internet and collect the praise of your peers. To your surprise, your API receives comments like "This is not REST! This is simply JSON over HTTP!" or "Oh my god, this is only a level 2 REST API!". It's time to learn how to become a REST elitist and fight back.

https://2016.pycon.ca/en/schedule/087-marc-andre-giroux/

Jean-Philippe Caissy: From idea to production in 20 minutes: engineering at scale

Shopify strives to be an ever-moving company. With a strong R&D force numbering in the hundreds, the challenges of continuing to deliver and ship have scaled accordingly.

From a manual CLI that only ops could execute just a few years back to today's 180+ automatic deploys a day, this talk will highlight the different tools, processes and integrations used throughout the team that fuel today's scale.

https://2016.pycon.ca/en/schedule/089-jean-philippe-caissy/

Greg Ward: Version control worst practices with

Nowadays, everybody uses version control. But until you learn to misuse your version control system, you're missing out on ways to minimize developer productivity and pessimize your workflow. I'll show you 12 time-tested worst practices that will set you down the wrong path from Day One.

https://2016.pycon.ca/en/schedule/063-greg-ward/

We would like to thank our sponsors for their constant support

  • Immunio
  • UQÀM
  • Bénélux
  • w.illi.am/
  • Outbox
  • Savoir-faire Linux
Categories: External Blogs

Managing good bug reports

Anarcat - Fri, 10/14/2016 - 10:11

Bug reporting is an art form that is too often neglected in software projects. Bug reports allow contributors to participate without deep technical knowledge and at the same time provide a crucial space for developers to be made aware of issues with their software that they could not have foreseen or found themselves, for lack of resources, variety or imagination.

Prior art

Unfortunately, there are rarely good guidelines for submitting bug reports. Historically, people have pointed towards How to report bugs effectively or How to ask questions the smart way. While those guides can be useful for motivated people and may seem attractive references for project managers, they suffer from serious issues:

  • they are written by technical people, for non-technical people
  • as a result, they have a deeply condescending attitude such as calling people "stupid" or various animal names like "mongoose"
  • they are also very technical themselves: one starts with a copyright notice and a changelog, the other uses magic words like "Core dumps" and $Id$
  • they are too long: sgtatham's is about 3600 words long, esr's is even longer at about 11800 words. Those texts will take about 20 to 60 minutes for an average reader to get through, according to research

Individual projects have their own guides as well. Linux has the REPORTING_BUGS file, which is a much shorter 1200 words and can be read in under 5 minutes, provided that you can understand the topic at hand. Interestingly, that guide refers to both esr's and sgtatham's guidelines, which means that, in the degenerate case where the user hasn't had the "privilege" of reading esr's prose already, they have an extra hour and a half of reading to do to have honestly followed the guidelines before reporting the bug.

I often find good documentation in the Tails project. Their bug reporting guidelines are easily accessible and quick to read, although they still might be too technical. It could be argued that you need to get technical at some point to get that information out, of course.

In the Monkeysign project, I have started a bug reporting guide that doesn't yet address all those issues. I am considering writing a new guide, but I figured I would look at other people's work and get feedback before writing my own standard.

What's the point?

Why have those documents been written? Are people really expected to read them before seeking help? It seems to me unlikely that someone would:

  1. be motivated enough to do something about a broken part of their computer
  2. figure out they can do something about it
  3. read a fifteen-thousand-word novel about how to report a bug...
  4. just to finally write a 20-line bug report that has no warranty of support attached to it

And if I were a paying customer, I wouldn't want to be forced to waste my time reading that prose either: it's your job to help me fix your broken things, not the reverse. As someone doing consulting these days, I totally understand: it's not you, the user, it's us, the developers, who have a problem. We have been socialized through computers, and it makes us weird and obtuse, but that's no excuse, and we need to clean up our act.

Furthermore, it's surprising how often we get (and make!) bug reports that are difficult to use. The Monkeysign project is very "technical" and I expected that the bug reports I would get would be well written, with ways to reproduce and so on, but it happened that I received bug reports that were all over the place, had no way to reproduce the issue, or were simply incomplete. Those three bug reports were filed by people I know to be very technically capable: one is a fellow Debian developer, the second had filed a good bug report 5 days earlier, and the third is a contributor who had sent good patches before.

In all three cases, they knew what they were doing. Those three people have probably read the guidelines mentioned above. They may even have read the Monkeysign bug reporting guidelines as well. I can only explain those bug reports by a lack of time: people thought the issue was obvious and that it would get fixed rapidly because, obviously, something is broken.

We need a better way.

The takeaway

What are those guides trying to tell us?

  1. ask questions in the right place
  2. search for similar questions and issues before reporting the bug
  3. try to make the developers reproduce the issues
  4. failing that, try to describe the issue as well as you can
  5. write clearly, be specific and verbose yet concise

There are obviously contradictions in there, like sgtatham telling us to be verbose while esr tells us, basically, not to be. There is definitely a tension there, and there are many, many more details about how great bug reports can be if done properly.

I tend towards the side of terseness in our descriptions: people who know how to be concise will be, and people who don't will most likely not learn by reading a 12,000-word novel that, in itself, didn't manage to be parsimonious.

But I am willing to allow for verbosity in bug reports: I prefer too many details instead of missing a key bit of information.

Issue trackers

Step 1 is our job: we should send people in the right place, and give them the right tools. Monkeysign used to manage bugs with bugs-everywhere and this turned out to be a terrible idea: you had to understand git and bugs-everywhere to file any bug reports. As a result, there were exactly zero bug reports filed by non-developers during the whole time BE was used, although some bugs were filed in the Debian Bugtracker.

So have a good bug tracker. A mailing list or email address is not a good bug tracker: you lose track of old issues, and it's hard for newcomers to search the archives. It does have the advantage of having a unified interface for the support forum and bug tracking, however.

Redmine, Gitlab, Github and others are all decent-enough bug trackers. The key point is that the issue tracker should be publicly available, and users should be able to register easily to file new issues. You should also be able to mass-edit tickets and users should be able to discover the tracker's features easily. I am sorry to say that the Debian BTS somewhat falls short on those two features.

Step 2 is a shared responsibility: there should be an easy way to search for issues, and we should help the user look for similar issues. Stack Exchange sites do an excellent job at this by automatically searching for similar questions while you write your own, suggesting existing ones in an attempt to weed out duplicates. Duplicates still happen, but they can then be clearly marked and linked with a distinct mechanism. Most bug trackers do not offer such high-level functionality, but should, so I feel the fault lies more on "our" end than on the user's end.
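
As an illustration of what that could look like, here is a hypothetical helper that queries a GitLab instance's issue search API before a new report is filed; the instance URL, project path and search string are placeholders, not Monkeysign's actual tracker:

    import requests

    GITLAB = "https://gitlab.example.org/api/v4"   # placeholder instance
    PROJECT = "example%2Fmonkeysign"               # URL-encoded project path (placeholder)

    def similar_issues(summary):
        """Return (id, title, URL) for existing issues matching the summary."""
        resp = requests.get("%s/projects/%s/issues" % (GITLAB, PROJECT),
                            params={"search": summary, "state": "all"})
        resp.raise_for_status()
        return [(i["iid"], i["title"], i["web_url"]) for i in resp.json()]

    for iid, title, url in similar_issues("gpg key import fails"):
        print("#%d %s <%s>" % (iid, title, url))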

Reproducing the environment

Steps 3 and 4 are more or less the user's responsibility. We can detail in our documentation how to clearly share the environment where we reproduced the bug, for example, but in the end, the user decides if they want to share that information or not.

In Monkeysign, I have finally implemented joeyh's suggestion of shipping the test suite with the program. I can now tell people to run the test suite in their environment to see if a problem is a regression specific to their environment - a known bug, in a way - or a novel bug for which I can look at writing a new unit test. I also include much more information about the environment in the --version output, an idea I brought forward in the Borg project to ease debugging. That way, people can just send the output of monkeysign --test and monkeysign --version, and I get a very good overview of what is happening on their end. Of course, Monkeysign also supports the usual --verbose and --debug flags that users should enable when reproducing issues.
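
As a rough idea of what such a --version output can look like, here is a minimal sketch using argparse; the version string and the exact fields are assumptions for illustration, not Monkeysign's actual code:

    import argparse
    import locale
    import platform
    import sys

    def version_string():
        """Build a --version string that also describes the environment.

        A real implementation would also report the versions of key
        dependencies (GnuPG, GTK, etc.).
        """
        return "\n".join([
            "monkeysign 2.x (placeholder version)",
            "python: %s" % sys.version.split()[0],
            "platform: %s" % platform.platform(),
            "locale: %s" % (locale.getdefaultlocale(),),
        ])

    parser = argparse.ArgumentParser(prog="monkeysign")
    parser.add_argument("--version", action="version", version=version_string())
    parser.parse_args()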

Another idea is to report bugs directly from the application. We have all seen Firefox and other software offer automatic bug reporting tools, but somehow those seem unsatisfactory for users: we get no feedback about where the report goes or whether it is followed up on. It is useful for larger projects to gather statistical data, but not so useful for users in the short term.

Monkeysign tries to handle exceptions in the code in a graceful way, but could do better. We use a small library to handle exceptions, but that library has since been improved to file bugs directly against the Github project. This assumes the user is logged into Github, but it is nice to pre-populate bug reports with the relevant information up front.
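
The library in question is not reproduced here, but the general idea can be sketched with a custom sys.excepthook that opens a pre-filled issue form in the browser; the tracker URL below is a placeholder, and the query parameters follow Github's "new issue" form:

    import sys
    import traceback
    import webbrowser
    from urllib.parse import urlencode

    # Placeholder: a real program would point at its own issue tracker
    TRACKER_NEW_ISSUE = "https://github.com/example/monkeysign/issues/new"

    def report_crash(exc_type, exc_value, exc_tb):
        """Exception hook that opens a pre-filled bug report in a browser."""
        tb = "".join(traceback.format_exception(exc_type, exc_value, exc_tb))
        params = urlencode({"title": "crash: %s" % exc_value,
                            "body": "Traceback:\n\n%s" % tb})
        webbrowser.open("%s?%s" % (TRACKER_NEW_ISSUE, params))
        # still print the traceback for users who prefer the terminal
        traceback.print_exception(exc_type, exc_value, exc_tb)

    sys.excepthook = report_crash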

Issue templates

In the meantime, to make sure people provide enough information, I have now moved a lot of the bug reporting guidelines to a separate issue template. That issue template is available through the issue creation form now, although it is not enabled by default, a weird limitation of Gitlab. Issue templates are available in Gitlab and Github.

Issue templates somewhat force users into a straitjacket: there is already something there to structure their bug report. Those could be distinct form elements that have to be filled in, but I like the flexibility of the template, and the possibility for users to just escape the formalism and plead for help in their own way.

Issue guidelines

In the end, I opted for a short few paragraphs in the style of the Tails documentation, including a reference to sgtatham, as an optional future reference:

  • Before you report a new bug, review the existing issues in the online issue tracker and the Debian BTS for Monkeysign to make sure the bug has not already been reported elsewhere.

  • The first aim of a bug report is to tell the developers exactly how to reproduce the failure, so try to reproduce the issue yourself and describe how you did that.

  • If that is not possible, try to describe what went wrong in detail. Write down the error messages, especially if they have numbers.

  • Take the necessary time to write clearly and precisely. Say what you mean, and make sure it cannot be misinterpreted.

  • Include the output of monkeysign --test, monkeysign --version and monkeysign --debug in your bug reports. See the issue template for more details about what to include in bug reports.

If you wish to read more about issues regarding communication in bug reports, you can read How to Report Bugs Effectively, which takes around 20 to 30 minutes.

Unfortunately, short of rewriting sgtatham's guide, I do not feel there is much more we can do as a general guide. I find esr's guide to be too verbose and commanding, so sgtatham it will be for now.

The prose and literacy

In the end, there is a fundamental issue with reporting bugs: it assumes our users are literate and capable of writing amazing prose that we will enjoy reading as much as the latest J.K. Rowling novel (if you're into that kind of thing). It's just an unreasonable expectation: some of your users don't even speak the same language as you, let alone read or write it. This makes for challenging collaboration, to say the least. This is where automated reporting makes sense: it doesn't require the user's intervention, and the communication is mediated by machines, without humans and their pesky culture getting in the way.

But we should, as maintainers, "be liberal in what we accept and conservative in what we send". Be tolerant, and help your users in fixing their issues. It's what you are there for, after all.

And in the end, we all fail the same way. In an attempt to improve the situation on bug reporting guides, I seem to have written a 2000-word short story myself, which will have taken up a hopefully pleasant 10 minutes of your time, at minimum. Hopefully I have succeeded at being clear, specific, verbose and concise all at once, and I look forward to your feedback on how to improve our bug reporting culture.

Categories: External Blogs

Epic Lameness

Eric Dorland - Mon, 09/01/2008 - 17:26
SF.net now supports OpenID. Hooray! I'd like to make a comment on a thread about the RTL8187se chip I've got in my new MSI Wind. So I go to sign in with OpenID and instead of signing me in it prompts me to create an account with a name, username and password for the account. Huh? I just want to post to their forum, I don't want to create an account (at least not explicitly, if they want to do it behind the scenes fine). Isn't the point of OpenID to not have to create accounts and particularly not have to create new usernames and passwords to access websites? I'm not impressed.
Categories: External Blogs

Sentiment Sharing

Eric Dorland - Mon, 08/11/2008 - 23:28
Biella, I am from there and I do agree. If I was still living there I would try to form a team and make a bid. Simon even made noises about organizing a bid at DebConfs past. I wish he would :)

But a DebConf in New York would be almost as good.
Categories: External Blogs