In recent years, privacy issues have become a growing concern among free-software projects and users. As more and more software tasks become web-based, surveillance and tracking of users is also on the rise. While some software may use advertising as a source of revenue, which has the side effect of monitoring users, the Django community recently got into an interesting debate surrounding a proposal to add user tracking—actually developer tracking—to the popular Python web framework.Tracking for funding
A novel aspect of this debate is that the initiative comes from concerns of the Django Software Foundation (DSF) about funding. The proposal suggests that "relying on the free labor of volunteers is ineffective, unfair, and risky" and states that "the future of Django depends on our ability to fund its development". In fact, the DSF recently hired an engineer to help oversee Django's development, which has been quite successful in helping the project make timely releases with fewer bugs. Various fundraising efforts have resulted in major new Django features, but it is difficult to attract sponsors without some hard data on the usage of Django.
The proposed feature tries to count the number of "unique developers" and gather some metrics of their environments by using Google Analytics (GA) in Django. The actual proposal (DEP 8) is done as a pull request, which is part of Django Enhancement Proposal (DEP) process that is similar in spirit to the Python Enhancement Proposal (PEP) process. DEP 8 was brought forward by a longtime Django developer, Jacob Kaplan-Moss.
The rationale is that "if we had clear data on the extent of Django's usage, it would be much easier to approach organizations for funding". The proposal is essentially about adding code in Django to send a certain set of metrics when "developer" commands are run. The system would be "opt-out", enabled by default unless turned off, although the developer would be warned the first time the phone-home system is used. The proposal notes that an opt-in system "severely undercounts" and is therefore not considered "substantially better than a community survey" that the DSF is already doing.Information gathered
The pieces of information reported are specifically designed to run only in a developer's environment and not in production. The metrics identified are, at the time of writing:
- an event category (the developer commands: startproject, startapp, runserver)
- the HTTP User-Agent string identifying the Django, Python, and OS versions
- a user-specific unique identifier (a UUID generated on first run)
The proposal mentions the use of the GA aip flag which, according to GA documentation, makes "the IP address of the sender 'anonymized'". It is not quite clear how that is done at Google and, given that it is a proprietary platform, there is no way to verify that claim. The proposal says it means that "we can't see, and Google Analytics doesn't store, your actual IP". But that is not actually what Google does: GA stores IP addresses, the documentation just says they are anonymized, without explaining how.
GA is presented as a trade-off, since "Google's track record indicates that they don't value privacy nearly as high" as the DSF does. The alternative, deploying its own analytics software, was presented as making sustainability problems worse. According to the proposal, Google "can't track Django users. [...] The only thing Google could do would be to lie about anonymizing IP addresses, and attempt to match users based on their IPs".
The truth is that we don't actually know what Google means when it "anonymizes" data: Jannis Leidel, a Django team member, commented that "Google has previously been subjected to secret US court orders and was required to collaborate in mass surveillance conducted by US intelligence services" that limit even Google's capacity of ensuring its users' anonymity. Leidel also argued that the legal framework of the US may not apply elsewhere in the world: "for example the strict German (and by extension EU) privacy laws would exclude the automatic opt-in as a lawful option".
Furthermore, the proposal claims that "if we discovered Google was lying about this, we'd obviously stop using them immediately", but it is unclear exactly how this could be implemented if the software was already deployed. There are also concerns that an implementation could block normal operation, especially in countries (like China) where Google itself may be blocked. Finally, some expressed concerns that the information could constitute a security problem, since it would unduly expose the version number of Django that is running.In other projects
Django is certainly not the first project to consider implementing analytics to get more information about its users. The proposal is largely inspired by a similar system implemented by the OS X Homebrew package manager, which has its own opt-out analytics.
Other projects embed GA code directly in their web pages. This is apparently the option chosen by the Oscar Django-based ecommerce solution, but that was seen by the DSF as less useful since it would count Django administrators and wasn't seen as useful as counting developers. Wagtail, a Django-based content-management system, was incorrectly identified as using GA directly, as well. It actually uses referrer information to identify installed domains through the version updates checks, with opt-out. Wagtail didn't use GA because the project wanted only minimal data and it was worried about users' reactions.
Eric Holscher, co-founder of Read the Docs, said the project is considering using Sentry for centralized reporting, which is a different idea, but interesting considering Sentry is fully open source. So even though it is a commercial service (as opposed to the closed-source Google Analytics), it may be possible to verify any anonymity claims.Debian's response
Since Django is shipped with Debian, one concern was the reaction of the distribution to the change. Indeed, "major distros' positions would be very important for public reception" to the feature, another developer stated.
One of the current maintainers of Django in Debian, Raphaël Hertzog, explicitly stated from the start that such a system would "likely be disabled by default in Debian". There were two short discussions on Debian mailing lists where the overall consensus seemed to be that any opt-out tracking code was undesirable in Debian, especially if it was aimed at Google servers.
I have done some research to see what, exactly, was acceptable as a phone-home system in the Debian community. My research has revealed ten distinct bug reports against packages that would unexpectedly connect to the network, most of which were not directly about collecting statistics but more often about checking for new versions. In most cases I found, the feature was disabled. In the case of version checks, it seems right for Debian to disable the feature, because the package cannot upgrade itself: that task is delegated to the package manager. One of those issues was the infamous "OK Google" voice activation binary blog controversy that was previously reported here and has since then been fixed (although other issues remain in Chromium).
I have also found out that there is no clearly defined policy in Debian regarding tracking software. What I have found, however, is that there seems to be a strong consensus in Debian that any tracking is unacceptable. This is, for example, an extract of a policy that was drafted (but never formally adopted) by Ian Jackson, a longtime Debian developer:
Software in Debian should not communicate over the network except: in order to, and as necessary to, perform their function[...]; or for other purposes with explicit permission from the user.
In other words, opt-in only, period. Jackson explained that "when we originally wrote the core of the policy documents, the DFSG [Debian Free Software Guidelines], the SC [Social Contract], and so on, no-one would have considered this behaviour acceptable", which explains why no explicit formal policy has been adopted yet in the Debian project.
One of the concerns with opt-out systems (or even prompts that default to opt-in) was well explained back then by Debian developer Bas Wijnen:
It very much resembles having to click through a license for every package you install. One of the nice things about Debian is that the user doesn't need to worry about such things: Debian makes sure things are fine.
One could argue that Debian has its own tracking systems. For example, by default, Debian will "phone home" through the APT update system (though it only reports the packages requested). However, this is currently not automated by default, although there are plans to do so soon. Furthermore, Debian members do not consider APT as tracking, because it needs to connect to the network to accomplish its primary function. Since there are multiple distributed mirrors (which the user gets to choose when installing), the risk of surveillance and tracking is also greatly reduced.
A better parallel could be drawn with Debian's popcon system, which actually tracks Debian installations, including package lists. But as Barry Warsaw pointed out in that discussion, "popcon is 'opt-in' and [...] the overwhelming majority in Debian is in favour of it in contrast to 'opt-out'". It should be noted that popcon, while opt-in, defaults to "yes" if users click through the install process. [Update: As pointed out in the comments, popcon actually defaults to "no" in Debian.] There are around 200,000 submissions at this time, which are tracked with machine-specific unique identifiers that are submitted daily. Ubuntu, which also uses the popcon software, gets around 2.8 million daily submissions, while Canonical estimates there are 40 million desktop users of Ubuntu. This would mean there is about an order of magnitude more installations than what is reported by popcon.
Policy aside, Warsaw explained that "Debian has a reputation for taking privacy issues very serious and likes to keep it".Next steps
There are obviously disagreements within the Django project about how to handle this problem. It looks like the phone-home system may end up being implemented as a proxy system "which would allow us to strip IP addresses instead of relying on Google to anonymize them, or to anonymize them ourselves", another Django developer, Aymeric Augustin, said. Augustin also stated that the feature wouldn't "land before Django drops support for Python 2", which is currently estimated to be around 2020. It is unclear, then, how the proposal would resolve the funding issues, considering how long it would take to deploy the change and then collect the information so that it can be used to spur the funding efforts.
It also seems the system may explicitly prompt the user, with an opt-out default, instead of just splashing a warning or privacy agreement without a prompt. As Shai Berger, another Django contributor, stated, "you do not get [those] kind of numbers in community surveys". Berger also made the argument that "we trust the community to give back without being forced to do so"; furthermore:
I don't believe the increase we might get in the number of reports by making it harder to opt-out, can be worth the ill-will generated for people who might feel the reporting was "sneaked" upon them, or even those who feel they were nagged into participation rather than choosing to participate.
Other options may also include gathering metrics in pip or PyPI, which was proposed by Donald Stufft. Leidel also proposed that the system could ask to opt-in only after a few times the commands are called.
It is encouraging to see that a community can discuss such issues without heating up too much and shows great maturity for the Django project. Every free-software project may be confronted with funding and sustainability issues. Django seems to be trying to address this in a transparent way. The project is willing to engage with the whole spectrum of the community, from the top leaders to downstream distributors, including individual developers. This practice should serve as a model, if not of how to do funding or tracking, at least of how to discuss those issues productively.
Everyone seems to agree the point is not to surveil users, but improve the software. As Lars Wirzenius, a Debian developer, commented: "it's a very sad situation if free software projects have to compromise on privacy to get funded". Hopefully, Django will be able to improve its funding without compromising its principles.
In past articles, I've looked at several libraries or specialist applications that can be used to model some physical process or another. Sometimes though you want to be able to model several different processes at the same time and in an interactive mode. more>>
After a great time at PyCon Canada and the holidays season only a few weeks away, we see this as a great time to get together and talk about Python. We are lucky to welcome both new and returning speakers from the Montreal community. So come and join us:Where
UQÀM, Pavillion PK 201, Président-Kennedy avenue Room PK-1140When
December 5th, 2016Schedule
- 6:00 - Doors Open
- 6:30 - Start of the presentations
- 7:30 - Break
- 7:45 - Part 2 of the presentations
- 9:00 - End of presentations. Drinks afterward at Benelux!
Matrix defines a set of open APIs for decentralised communication, suitable for securely publishing, persisting and subscribing to data over a global open federation of servers with no single point of control. Uses include Instant Messaging (IM), Voice over IP (VoIP) signalling, Internet of Things (IoT) communication, and bridging together existing communication silos - providing the basis of a new open real-time communication ecosystem.
Using a Rest interface to control an optics laboratory helps to decouple the physical layer (real lab hardware) from the spirit of the tests. The result is an uniform API with greater simplicity for test creation and free portability.
A short overview of how saved 10k$ of research funds by interfacing my beam line at the UdeM linear accelerator with a raspberry pi 3, python (2!), arduino and my salary as a postdoc. A proof that bigger is not always better and that small things can land you on the front page of the university student's journal if you're not careful enough...fileinput”
We’d like to thank our sponsors for their continued support:
- Savoir-faire Linux
Though short of Mr Torvalds' aim of world domination, FutureVault, Inc., has set the ambitious goal to "change the way business is done" with its FutureVault digital collaborative vault application. more>>
In my Open-Source Classroom column in the May 2016 issue, I discussed how to set up Gmail as your SMTP provider for outgoing email. The problem with email is that sometimes the sheer quantity of it makes important messages slip past my radar. So for really important error messages, I like to get SMS messages. more>>
There are hundreds of applications for OS X that place information in the menu bar. Usually, I can find one that almost does what I want, but not quite. Thankfully I found BitBar, which is an open-source project that allows you to write scripts and have their output refreshed and put on the menu bar. more>>
You would have a difficult time today finding a radio station that was all-live and did not have some kind of computerized, automated means of storing and playing audio. more>>
Telco TV/OTT and IPTV operators must deal with the fact that many IP transport streams are asynchronous. This makes the streams prone to poor video quality due to jitter if they are sent to Program Clock Reference (PCR)-compliant devices. more>>
In the May 2016 issue (also available here), I introduced the idea of the Tiny Internet Project, a self-contained Linux project that shows how to build the key pieces of the public internet on a single computer using one or two old computers, a router and a bunch of Linux software. more>>
IT operations seeking to optimize valuable data-center rack space while improving efficiency are the target customers sought by Equus Computer Systems, Inc., for its "unique" new 1.5U server with data transfer rates of 12GiB/s. more>>
The Turris Omnia router is not the first FLOSS router out there, but it could well be one of the first open hardware routers to be available. As the crowdfunding campaign is coming to a close, it is worth reflecting on the place of the project in the ecosystem. Beyond that, I got my hardware recently, so I was able to give it a try.A short introduction to the Omnia project
The Omnia router is a followup project on CZ.NIC's original research project, the Turris. The goal of the project was to identify hostile traffic on end-user networks and develop global responses to those attacks across every monitored device. The Omnia is an extension of the original project: more features were added and data collection is now opt-in. Whereas the original Turris was simply a home router, the new Omnia router includes:
- 1.6GHz ARM CPU
- 1-2GB RAM
- 8GB flash storage
- 6 Gbit Ethernet ports
- SFP fiber port
- 2 Mini-PCI express ports
- mSATA port
- 3 MIMO 802.11ac and 2 MIMO 802.11bgn radios and antennas
- SIM card support for backup connectivity
Some models sold had a larger case to accommodate extra hard drives, turning the Omnia router into a NAS device that could actually serve as a multi-purpose home server. Indeed, it is one of the objectives of the project to make "more than just a router". The NAS model is not currently on sale anymore, but there are plans to bring it back along with LTE modem options and new accessories "to expand Omnia towards home automation".
Omnia runs a fork of the OpenWRT distribution called TurrisOS that has been customized to support automated live updates, a simpler web interface, and other extra features. The fork also has patches to the Linux kernel, which is based on Linux 4.4.13 (according to uname -a). It is unclear why those patches are necessary since the ARMv7 Armada 385 CPU has been supported in Linux since at least 4.2-rc1, but it is common for OpenWRT ports to ship patches to the kernel, either to backport missing functionality or perform some optimization.
There has been some pressure from backers to petition Turris to "speedup the process of upstreaming Omnia support to OpenWrt". It could be that the team is too busy with delivering the devices already ordered to complete that process at this point. The software is available on the CZ-NIC GitHub repository and the actual Linux patches can be found here and here. CZ.NIC also operates a private GitLab instance where more software is available. There is technically no reason why you wouldn't be able to run your own distribution on the Omnia router: OpenWRT development snapshots should be able to run on the Omnia hardware and some people have installed Debian on Omnia. It may require some customization (e.g. the kernel) to make sure the Omnia hardware is correctly supported. Most people seem to prefer to run TurrisOS because of the extra features.
The hardware itself is also free and open for the most part. There is a binary blob needed for the 5GHz wireless card, which seems to be the only proprietary component on the board. The schematics of the device are available through the Omnia wiki, but oddly not in the GitHub repository like the rest of the software.Hands on
I received my own router last week, which is about six months late from the original April 2016 delivery date; it allowed me to do some hands-on testing of the device. The first thing I noticed was a known problem with the antenna connectors: I had to open up the case to screw the fittings tight, otherwise the antennas wouldn't screw in correctly.
Once that was done, I simply had to go through the usual process of setting up the router, which consisted of connecting the Omnia to my laptop with an Ethernet cable, connecting the Omnia to an uplink (I hooked it into my existing network), and go through a web wizard. I was pleasantly surprised with the interface: it was smooth and easy to use, but at the same time imposed good security practices on the user.
For example, the wizard, once connected to the network, goes through a full system upgrade and will, by default, automatically upgrade itself (including reboots) when new updates become available. Users have to opt-in to the automatic updates, and can chose to automate only the downloading and installation of the updates without having the device reboot on its own. Reboots are also performed during user-specified time frames (by default, Omnia applies kernel updates during the night). I also liked the "skip" button that allowed me to completely bypass the wizard and configure the device myself, through the regular OpenWRT systems (like LuCI or SSH) if I needed to.
Notwithstanding the antenna connectors themselves, the hardware is nice. I ordered the black metal case, and I must admit I love the many LED lights in the front. It is especially useful to have color changes in the reset procedure: no more guessing what state the device is in or if I pressed the reset button long enough. The LEDs can also be dimmed to reduce the glare that our electronic devices produce.
All this comes at a price, however: at \$250 USD, it is a much higher price tag than common home routers, which typically go for around \$50. Furthermore, it may be difficult to actually get the device, because no orders are being accepted on the Indiegogo site after October 31. The Turris team doesn't actually want to deal with retail sales and has now delegated retail sales to other stores, which are currently limited to European deliveries.A nice device to help fight off the IoT apocalypse
It seems there isn't a week that goes by these days without a record-breaking distributed denial-of-service (DDoS) attack. Those attacks are more and more caused by home routers, webcams, and "Internet of Things" (IoT) devices. In that context, the Omnia sets a high bar for how devices should be built but also how they should be operated. Omnia routers are automatically upgraded on a nightly basis and, by default, do not provide telnet or SSH ports to run arbitrary code. There is the password-less wizard that starts up on install, but it forces the user to chose a password in order to complete the configuration.
Both the hardware and software of the Omnia are free and open. The automatic update's EULA explicitly states that the software provided by CZ.NIC "will be released under a free software licence" (and it has been, as mentioned earlier). This makes the machine much easier to audit by someone looking for possible flaws, say for example a customs official looking to approve the import in the eventual case where IoT devices end up being regulated. But it also makes the device itself more secure. One of the problems with these kinds of devices is "bit rot": they have known vulnerabilities that are not fixed in a timely manner, if at all. While it would be trivial for an attacker to disable the Omnia's auto-update mechanisms, the point is not to counterattack, but to prevent attacks on known vulnerabilities.
The CZ.NIC folks take it a step further and encourage users to actively participate in a monitoring effort to document such attacks. For example, the Omnia can run a honeypot to lure attackers into divulging their presence. The Omnia also runs an elaborate data collection program, where routers report malicious activity to a central server that collects information about traffic flows, blocked packets, bandwidth usage, and activity from a predefined list of malicious addresses. The exact data collected is specified in another EULA that is currently only available to users logged in at the Turris web site. That data can then be turned into tweaked firewall rules to protect the overall network, which the Turris project calls a distributed adaptive firewall. Users need to explicitly opt-in to the monitoring system by registering on a portal using their email address.
Turris devices also feature the Majordomo software (not to be confused with the venerable mailing list software) that can also monitor devices in your home and identify hostile traffic, potentially leading users to take responsibility over the actions of their own devices. This, in turn, could lead users to trickle complaints back up to the manufacturers that could change their behavior. It turns out that some companies do care about their reputations and will issue recalls if their devices have significant enough issues.
It remains to be seen how effective the latter approach will be, however. In the meantime, the Omnia seems to be an excellent all-around server and router for even the most demanding home or small-office environments that is a great example for future competitors.
Back in 2014, I highlighted Waze, which is a turn-by-turn GPS navigation program created by a startup in Israel. That company was bought by Google, but it still remains independent, at least for now. (It does share some data behind the scenes, but it functions differently when it comes to routing.) more>>
Just in time for the Holiday season, we are gathering the Python community together. This time we've decided to return to our roots at UQAM. For this opportunity, we are looking for speakers. It's your chance to submit a talk. Just write us at firstname.lastname@example.org.
We are particularly looking for people willing to present lighting talks. Don't hesitate and send us your proposition or join us on slack by subscribing at http://slack.mtlpy.org/ to ask us any question.Where
UQAM, more details to comeWhen
Monday, December 5th, 2016 at 6pm
We’d like to thank our sponsors for their continued support:
- Savoir-faire Linux
Zimbra Collaboration Suite is a successful open-source collaboration application that includes email, calendaring, file sharing, chat and video chat. Zimbra's developer, Synacor, Inc., recently released two new Zimbra-related offerings, namely Zimbra Open Source Support (ZOSS) and Zimbra Suite Plus. more>>
Every year, SUSE honors 4 companies worldwide, one in each region of the globe: Latin America, APAC, EMEA and North America. Recipient companies are recognized for “defining the future:” using SUSE open source solutions for IT transformation, increased business agility and continuity. 2016 award recipients are :
- Ach á Laboratorios Farmaceuticos S.A.
The following is a list of security exercises you can try after reading Susan Sons' article "Security Exercises".
1) It's Gone more>>
Regular security exercises are, bar none, the most powerful, cost-effective tool for maturing a project's information security operations—when done well. more>>
While out in the streets of DC there was alternately depression and elation, gnashing of teeth and celebration, at SUSECon yesterday, SUSE announced SUSE Linux Enterprise (SLE) 12 Service Pack 2 designed to power physical, virtual and cloud-based mission-critical environments. more>>