On February 28th, GitHub published a brand new version of its Terms of Service (ToS). While the first draft announced earlier in February didn't generate much reaction, the new ToS raised concerns that they may break at least the spirit, if not the letter, of certain free-software licenses. Digging in further reveals that the situation is probably not as dire as some had feared.
The first person to raise the alarm was probably Thorsten Glaser, a Debian developer, who stated that the "new GitHub Terms of Service require removing many Open Source works from it". His concerns are mainly about section D of the document, in particular section D.4 which states:
You grant us and our legal successors the right to store and display your Content and make incidental copies as necessary to render the Website and provide the Service.
Section D.5 then goes on to say:
[...] You grant each User of GitHub a nonexclusive, worldwide license to access your Content through the GitHub Service, and to use, display and perform your Content, and to reproduce your Content solely on GitHub as permitted through GitHub's functionalityToS versus GPL
The concern here is that the ToS bypass the normal provisions of licenses like the GPL. Indeed, copyleft licenses are based on copyright law which forbid users from doing anything with the content unless they comply with the license, which forces, among other things, "share alike" properties. By granting GitHub and its users rights to reproduce content without explicitly respecting the original license, the ToS may allow users to bypass the copyleft nature of the license. Indeed, as Joey Hess, author of git-annex, explained :
The new TOS is potentially very bad for copylefted Free Software. It potentially neuters it entirely, so GPL licensed software hosted on Github has an implicit BSD-like license
Hess has since removed all his content (mostly mirrors) from GitHub.
Others disagree. In a well-reasoned blog post, Debian developer Jonathan McDowell explained the rationale behind the changes:
My reading of the GitHub changes is that they are driven by a desire to ensure that GitHub are legally covered for the things they need to do with your code in order to run their service.
This seems like a fair point to make: GitHub needs to protect its own rights to operate the service. McDowell then goes on to do a detailed rebuttal of the arguments made by Glaser, arguing specifically that section D.5 "does not grant [...] additional rights to reproduce outside of GitHub".
However, specific problems arise when we consider that GitHub is a private corporation that users have no control over. The "Services" defined in the ToS explicitly "refers to the applications, software, products, and services provided by GitHub". The term "Services" is therefore not limited to the current set of services. This loophole may actually give GitHub the right to bypass certain provisions of licenses used on GitHub. As Hess detailed in a later blog post:
If Github tomorrow starts providing say, an App Store service, that necessarily involves distribution of software to others, and they put my software in it, would that be allowed by this or not?
If that hypothetical Github App Store doesn't sell apps, but licenses access to them for money, would that be allowed under this license that they want to my software?
However, when asked on IRC, Bradley M. Kuhn of the Software Freedom Conservancy explained that "ultimately, failure to comply with a copyleft license is a copyright infringement" and that the ToS do outline a process to deal with such infringement. Some lawyers have also publicly expressed their disagreement with Glaser's assessment, with Richard Fontana from Red Hat saying that the analysis is "basically wrong". It all comes down to the intent of the ToS, as Kuhn (who is not a lawyer) explained:
any license can be abused or misused for an intent other than its original intent. It's why it matters to get every little detail right, and I hope Github will do that.
He went even further and said that "we should assume the ambiguity in their ToS as it stands is favorable to Free Software".
The ToS are in effect since February 28th; users "can accept them by clicking the broadcast announcement on your dashboard or by continuing to use GitHub". The immediacy of the change is one of the reasons why certain people are rushing to remove content from GitHub: there are concerns that continuing to use the service may be interpreted as consent to bypass those licenses. Hess even hosted a separate copy of the ToS [PDF] for people to be able to read the document without implicitly consenting. It is, however, unclear how a user should remove their content from the GitHub servers without actually agreeing to the new ToS.CLAs
When I read the first draft, I initially thought there would be concerns about the mandatory Contributor License Agreement (CLA) in section D.5 of the draft:
[...] unless there is a Contributor License Agreement to the contrary, whenever you make a contribution to a repository containing notice of a license, you license your contribution under the same terms, and agree that you have the right to license your contribution under those terms.
I was concerned this would establish the controversial practice of forcing CLAs on every GitHub user. I managed to find a post from a lawyer, Kyle E. Mitchell, who commented on the draft and, specifically, on the CLA. He outlined issues with wording and definition problems in that section of the draft. In particular, he noted that "contributor license agreement is not a legal term of art, but an industry term" and "is a bit fuzzy". This was clarified in the final draft, in section D.6, by removing the use of the CLA term and by explicitly mentioning the widely accepted norm for licenses: "inbound=outbound". So it seems that section D.6 is not really a problem: contributors do not need to necessarily delegate copyright ownership (as some CLAs require) when they make a contribution, unless otherwise noted by a repository-specific CLA.
An interesting concern he raised, however, was with how GitHub conducted the drafting process. A blog post announced the change on February 7th with a link to a form to provide feedback until the 21st, with a publishing deadline of February 28th. This gave little time for lawyers and developers to review the document and comment on it. Users then had to basically accept whatever came out of the process as-is.
Unlike every software project hosted on GitHub, the ToS document is not part of a Git repository people can propose changes to or even collaboratively discuss. While Mitchell acknowledges that "GitHub are within their rights to update their terms, within very broad limits, more or less however they like, whenever they like", he sets higher standards for GitHub than for other corporations, considering the community it serves and the spirit it represents. He described the process as:
[...] consistent with the value of CYA, which is real, but not with the output-improving virtues of open process, which is also real, and a great deal more pleasant.
Mitchell also explained that, because of its position, GitHub can have a major impact on the free-software world.
And as the current forum of preference for a great many developers, the knock-on effects of their decisions throw big weight. While GitHub have the wheel—and they’ve certainly earned it for now—they can do real damage.
In particular, there have been some concerns that the ToS change may be an attempt to further the already diminishing adoption of the GPL for free-software projects; on GitHub, the GPL has been surpassed by the MIT license. But Kuhn believes that attitudes at GitHub have begun changing:
GitHub historically had an anti-copyleft culture, which was created in large part by their former and now ousted CEO, Preston-Warner. However, recently, I've seen people at GitHub truly reach out to me and others in the copyleft community to learn more and open their minds. I thus have a hard time believing that there was some anti-copyleft conspiracy in this ToS change.GitHub response
However, it seems that GitHub has actually been proactive in reaching out to the free software community. Kuhn noted that GitHub contacted the Conservancy to get its advice on the ToS changes. While he still thinks GitHub should fix the ambiguities quickly, he also noted that those issues "impact pretty much any non-trivial Open Source and Free Software license", not just copylefted material. When reached for comments, a GitHub spokesperson said:
While we are confident that these Terms serve the best needs of the community, we take our users' feedback very seriously and we are looking closely at ways to address their concerns.
There are now free-software alternatives to GitHub. GitLab.com, for example, does not seem to have similar licensing issues in its ToS and GitLab itself is free software, although based on the controversial open core business model. The GitLab hosting service still needs to get better than its grade of "C" in the GNU Ethical Repository Criteria Evaluations (and it is being worked on); other services like GitHub and SourceForge score an "F".
In the end, all this controversy might have been avoided if GitHub was generally more open about the ToS development process and gave more time for feedback and reviews by the community. Terms of service are notorious for being confusing and something of a legal gray area, especially for end users who generally click through without reading them. We should probably applaud the efforts made by GitHub to make its own ToS document more readable and hope that, with time, it will address the community's concerns.
We are launching our next mtlpy meetup for the beginning of April, just in time to welcome in the spring. We want to invite the adventurous on stage to present a geat python discovery. So if you are interested to do a lighting talk, or even a 30 minute talk. There's so much stuff happening these days in the Python community and now is the best time to come to present!
If you are interested to present just send us an email at email@example.com or fill our form at https://goo.gl/forms/3NYgD4kEiU2JTLWK2. You can also join us on slack at http://slack.mtlpy.org/.Where
To be announcedWhen
Monday April 10th 2017 at 6pm
These are notes from my research that led to the publication of the password hashers article. This article is more technical than the previous ones and compares the various cryptographic primitives and algorithms used in the various software I have reviewed. The criteria for inclusion on this list is fairly vague: I mostly included a password hasher if it was significantly different from the previous implementations in some way, and I have included all the major ones I could find as well.The first password hashers
Nic Wolff claims to be the first to have written such a program, all the way back in 2003. Back then the hashing algorithm was MD5, although Wolff has now updated the algorithm to use SHA-1 and still maintains his webpage for public use. Another ancient but unrelated implementation, is the Standford University Applied Cryptography's pwdhash software. That implementation was published in 2004 and unfortunately, that implementation was not updated and still uses MD5 as an hashing algorithm, but at least it uses HMAC to generate tokens, which makes the use of rainbow tables impractical. Those implementations are the simplest password hashers: the inputs are simply the site URL and a password. So the algorithms are, basically, for Wolff's:token = base64(SHA1(password + domain))
And for Standford's PwdHash:token = base64(HMAC(MD5, password, domain))) SuperGenPass
Another unrelated implementation that is still around is supergenpass is a bookmarklet that was created around 2007, originally using MD5 as well but now supports SHA512 now although still limited to 24 characters like MD5 (which needlessly limits the entropy of the resulting password) and still defaults MD5 with not enough rounds (10, when key derivation recommendations are more generally around 10 000, so that it's slower to bruteforce).
Note that Chris Zarate, the supergenpass author, actually credits Nic Wolff as the inspiration for his implementation. Supergenpass is still in active development and is available for the browser (as a bookmarklet) or mobile (as an webpage). Supergenpass allows you to modify the password length, but also add an extra profile secret which adds to the password and generates a personalized identicon presumably to prevent phishing but it also introduces the interesting protection, the profile-specific secret only found later in Password Hasher Plus. So the Supergenpass algorithm looks something like this:token = base64(SHA512(password + profileSecret + ":" + domain, rounds)) The Wijjo Password Hasher
Another popular implementation is the Wijjo Password Hasher, created around 2006. It was probably the first shipped as a browser extension which greatly improved the security of the product as users didn't have to continually download the software on the fly. Wijjo's algorithm also improved on the above algorithms, as it uses HMAC-SHA1 instead of plain SHA-1 or HMAC-MD5, which makes it harder to recover the plaintext. Password Hasher allows you to set different password policies (use digits, punctuation, mixed case, special characters and password length) and saves the site names it uses for future reference. It also happens that the Wijjo Password Hasher, in turn, took its inspiration on different project, hashapass.com, created in 2006 and also based on HMAC-SHA-1. Indeed, hashapass "can easily be generated on almost any modern Unix-like system using the following command line pattern":echo -n parameter \ | openssl dgst -sha1 -binary -hmac password \ | openssl enc -base64 \ | cut -c 1-8
So the algorithm here is obviously:token = base64(HMAC(SHA1, password, domain + ":" + counter)))[:8]
... although in the case of Password Hasher, there is a special routine that takes the token and inserts random characters in locations determined by the sum of the values of the characters in the token.Password Hasher Plus
Honestly, that seems rather strange, but it's what I read from the source code, which is available only after decompressing the extension nowadays. I would have expected the simplest version:token = base64(HMAC(SHA1, HMAC(SHA1, profileSecret, password), domain + ":" + counter))
The idea here would be "hide" the master password from bruteforce attacks as soon as possible... But maybe this is all equivalent.
Regardless, Password Hasher Plus then takes the token and applies the same special character insertion routine as the Password Hasher.LessPass
Last year, Guillaume Vincent a french self-described "humanist and scuba diving fan" released the lesspass extension for Chrome, Firefox and Android. Lesspass introduces several interesting features. It is probably the first to include a commandline version. It also uses a more robust key derivation algorithm (PBKDF2) and takes into account the username on the site, allowing multi account support. The original release (version 1) used only 8192 rounds which is now considered too low. In the bug report it was interesting to note that LessPass couldn't do the usual practice of running the key derivation for 1 second to determine the number of rounds needed as the results need to be deterministic.
At first glance, the LessPass source code seems clear and easy to read which is always a good sign, but of course, the devil is in the details. One key feature that is missing from Password Hasher Plus is the profile-specific seed, although it should be impossible, for a hostile web page to steal keystrokes from a browser extension, as far as I know.
The algorithm then gets a little more interesting:entropy = PBKDF2(SHA256, masterPassword, domain + username + counter, rounds, length) where rounds=10000 length=32
entropy is then used to pick characters to match the chosen profile.
There is also a lesspass-specific character picking routing that is also not base64, and different from the original Password Hasher algorithm.Master Password
A review of password hashers would hardly be complete without mentioning the Master Password and its elaborate algorithm. While the applications surrounding the project are not as refined (there is no web browser plugin and the web interface can't be easily turned into a bookmarklet), the algorithm has been well developed. Of all the password managers reviewed here, Master Password uses one of the strongest key derivation algorithms out there, scrypt:key = scrypt( password, salt, cost, size, parallelization, length ) where salt = "com.lyndir.masterpassword" + len(username) + name cost = 32768 size = 8 parallelization = 2 length = 64 entropy = hmac-sha256(key, "com.lyndir.masterpassword" + len(domain) + domain + counter )
Master Password the uses one of 6 sets of "templates" specially crafted to be "easy for a user to read from a screen and type using a keyboard or smartphone" and "compatible with most site's password policies", our "transferable" criteria defined in the first passwords article. For example, the default template mixes vowels, consonants, numbers and symbols, but carefully avoiding possibly visibly similar characters like O and 0 or i and 1 (although it does mix 1 and l, oddly enough).
The main strength of Master Password seems to be the clear definition of its algorithm (although Hashpass.com does give out OpenSSL commandline examples...), which led to its reuse in another application called freepass. The Master Password app also doubles as a stateful password manager...Other implementations
I have also considered including easypasswords, which uses PBKDF2-HMAC-SHA1, in my list of recommendations. I discovered only recently that the author wrote a detailed review of many more password hashers and scores them according to their relative strength. In the end, I ended up covering more LessPass since the design is very similar and LessPass does seem a bit more usable. Covering LessPass also allowed me to show the contrast and issues regarding the algorithm changes, for example.
It is also interesting to note that the EasyPasswords author has criticized the Master Password algorithm quite severely:
[...] scrypt isn’t being applied correctly. The initial scrypt hash calculation only depends on the username and master password. The resulting key is combined with the site name via SHA-256 hashing then. This means that a website only needs to break the SHA-256 hashing and deduce the intermediate key — as long as the username doesn’t change this key can be used to generate passwords for other websites. This makes breaking scrypt unnecessary[...]
During a discussion with the Master Password author, he outlined that "there is nothing "easy" about brute-force deriving a 64-byte key through a SHA-256 algorithm." SHA-256 is used in the last stage because it is "extremely fast". scrypt is used as a key derivation algorithm to generate a large secret and is "intentionnally slow": "we don't want it to be easy to reverse the master password from a site password". "But it' unnecessary for the second phase because the input to the second phase is so large. A master password is tiny, there are only a few thousand or million possibilities to try. A master key is 8^64, the search space is huge. Reversing that doesn't need to be made slower. And it's nice for the password generation to be fast after the key has been prepared in-memory so we can display site passwords easily on a mobile app instead of having to lock the UI a few seconds for every password."
Finally, I considered covering Blum's Mental Hash (also covered here and elsewhere). This consists of an algorithm that can basically be ran by the human brain directly. It's not for the faint of heart, however: if I understand it correctly, it will require remembering a password that is basically a string of 26 digits, plus compute modulo arithmetics on the outputs. Needless to say, most people don't do modulo arithmetics every day...
In previous articles, we have looked at how to generate passwords and did a review of various password managers. There is, however, a third way of managing passwords other than remembering them or encrypting them in a "vault", which is what I call "password hashing".
A password hasher generates site-specific passwords from a single master password using a cryptographic hash function. It thus allows a user to have a unique and secure password for every site they use while requiring no storage; they need only to remember a single password. You may know these as "deterministic or stateless password managers" but I find the "password manager" phrase to be confusing because a hasher doesn't actually store any passwords. I do not think password hashers represent a good security tradeoff so I generally do not recommend their use, unless you really do not have access to reliable storage that you can access readily.
In this article, I use the word "password" for a random string used to unlock things, but "token" to represent a generated random string that the user doesn't need to remember. The input to a password hasher is a password with some site-specific context and the output from a password hasher is a token.What is a password hasher?
A password hasher uses the master password and a label (generally the host name) to generate the site-specific password. To change the generated password, the user can modify the label, for example by appending a number. Some password hashers also have different settings to generate tokens of different lengths or compositions (symbols or not, etc.) to accommodate different site-specific password policies.
The biggest advantage of password hashers is that you only need to remember a single password. You do not need to carry around a password manager vault: there's no "state" (other than site-specific settings, which can be easily guessed). A password hasher named Master Password makes a compelling case against traditional password managers in its documentation:
It's as though the implicit assumptions are that everybody backs all of their stuff up to at least two different devices and backups in the cloud in at least two separate countries. Well, people don't always have perfect backups. In fact, they usually don't have any.
It goes on to argue that, when you lose your password: "You lose everything. You lose your own identity."
The stateless nature of password hashers also means you do not need to use cloud services to synchronize your passwords, as there is (generally, more on that later) no state to carry around. This means, for example, that the list of accounts that you have access to is only stored in your head, and not in some online database that could be hacked without your knowledge. The downside of this is, of course, that attackers do not actually need to have access to your password hasher to start cracking it: they can try to guess your master key without ever stealing anything from you other than a single token you used to log into some random web site.
Password hashers also necessarily generate unique passwords for every site you use them on. While you can also do this with password managers, it is not an enforced decision. With hashers, you get distinct and strong passwords for every site with no effort.The problem with password hashers
If hashers are so great, why would you use a password manager? Programs like LessPass and Master Password seem to have strong crypto that is well implemented, so why isn't everyone using those tools?
Password hashing, as a general concept, actually has serious problems: since the hashing outputs are constantly compromised (they are sent in password forms to various possibly hostile sites), it's theoretically possible to derive the master password and then break all the generated tokens in one shot. The use of stronger key derivation functions (like PBKDF2, scrypt, or HMAC) or seeds (like a profile-specific secret) makes those attacks much harder, especially if the seed is long enough to make brute-force attacks infeasible. (Unfortunately, in the case of Password Hasher Plus, the seed is derived from Math.random() calls, which are not considered cryptographically secure.)
Basically, as stated by Julian Morrison in this discussion:
A password is now ciphertext, not a block of line noise. Every time you transmit it, you are giving away potential clues of use to an attacker. [...] You only have one password for all the sites, really, underneath, and it's your secret key. If it's broken, it's now a skeleton-key [...]
Newer implementations like LessPass and Master Password fix this by using reasonable key derivation algorithms (PBKDF2 and scrypt, respectively) that are more resistant to offline cracking attacks, but who knows how long those will hold? To give a concrete example, if you would like to use the new winner of the password hashing competition (Argon2) in your password manager, you can patch the program (or wait for an update) and re-encrypt your database. With a password hasher, it's not so easy: changing the algorithm means logging in to every site you visited and changing the password. As someone who used a password hasher for a few years, I can tell you this is really impractical: you quickly end up with hundreds of passwords. The LessPass developers tried to facilitate this, but they ended up mostly giving up.
Which brings us to the question of state. A lot of those tools claim to work "without a server" or as being "stateless" and while those claims are partly true, hashers are way more usable (and more secure, with profile secrets) when they do keep some sort of state. For example, Password Hasher Plus records, in your browser profile, which site you visited and which settings were used on each site, which makes it easier to comply with weird password policies. But then that state needs to be backed up and synchronized across multiple devices, which led LessPass to offer a service (which you can also self-host) to keep those settings online. At this point, a key benefit of the password hasher approach (not keeping state) just disappears and you might as well use a password manager.
Another issue with password hashers is choosing the right one from the start, because changing software generally means changing the algorithm, and therefore changing passwords everywhere. If there was a well-established program that was be recognized as a solid cryptographic solution by the community, I would feel more confident. But what I have seen is that there are a lot of different implementations each with its own warts and flaws; because changing is so painful, I can't actually use any of those alternatives.
Furthermore, I have had difficulty using password hashers in unified login environments like Wikipedia's or StackExchange's single-sign-on systems. Because they allow you to log in with the same password on multiple sites, you need to choose (and remember) what label you used when signing in. Did I sign in on stackoverflow.com? Or was it stackexchange.com?
Also, as mentioned in the previous article about password managers, web-based password managers have serious security flaws. Since more than a few password hashers are implemented using bookmarklets, they bring all of those serious vulnerabilities with them, which can range from account name to master password disclosures.
Finally, some of the password hashers use dubious crypto primitives that were valid and interesting a decade ago, but are really showing their age now. Stanford's pwdhash uses MD5, which is considered "cryptographically broken and unsuitable for further use". We have seen partial key recovery attacks against MD5 already and while those do not allow an attacker to recover the full master password yet (especially not with HMAC-MD5), I would not recommend anyone use MD5 in anything at this point, especially if changing that algorithm later is hard. Some hashers (like Password Hasher and Password Plus) use a a single round of SHA-1 to derive a token from a password; WPA2 (standardized in 2004) uses 4096 iterations of HMAC-SHA1. A recent US National Institute of Standards and Technology (NIST) report also recommends "at least 10,000 iterations of the hash function".Conclusion
Forced to suggest a password hasher, I would probably point to LessPass or Master Password, depending on the platform of the person asking. But, for now, I have determined that the security drawbacks of password hashers are not acceptable and I do not recommend them. It makes my password management recommendation shorter anyway: "remember a few carefully generated passwords and shove everything else in a password manager".
[Many thanks to Daniel Kahn Gillmor for the thorough reviews provided for the password articles.]
As we noted in an earlier article, passwords are a liability and we'd prefer to get rid of them, but the current reality is that we do use a plethora of passwords in our daily lives. This problem is especially acute for technology professionals, particularly system administrators, who have to manage a lot of different machines. But it also affects regular users who still use a large number of passwords, from their online bank to their favorite social-networking site. Despite the remarkable memory capacity of the human brain, humans are actually terrible at recalling even short sets of arbitrary characters with the precision needed for passwords.
Therefore humans reuse passwords, make them trivial or guessable, write them down on little paper notes and stick them on their screens, or just reset them by email every time. Our memory is undeniably failing us and we need help, which is where password managers come in. Password managers allow users to store an arbitrary number of passwords and just remember a single password to unlock them all.
But there is a large variety of password managers out there, so which one should we be using? At my previous job, an inventory was done of about 40 different free-software password managers in different stages of development and of varying quality. So, obviously, this article will not be exhaustive, but instead focus on a smaller set of some well-known options that may be interesting to readers.KeePass: the popular alternative
The most commonly used password-manager design pattern is to store passwords in a file that is encrypted and password-protected. The most popular free-software password manager of this kind is probably KeePass.
An important feature of KeePass is the ability to auto-type passwords in forms, most notably in web browsers. This feature makes KeePass really easy to use, especially considering it also supports global key bindings to access passwords. KeePass databases are designed for simultaneous access by multiple users, for example, using a shared network drive.
KeePass has a graphical interface written in C#, so it uses the Mono framework on Linux. A separate project, called KeePassX is a clean-room implementation written in C++ using the Qt framework. Both support the AES and Twofish encryption algorithms, although KeePass recently added support for the ChaCha20 cipher. AES key derivation is used to generate the actual encryption key for the database, but the latest release of KeePass also added using Argon2, which was the winner of the July 2015 password-hashing competition. Both programs are more or less equivalent, although the original KeePass seem to have more features in general.
The KeePassX project has recently been forked into another project now called KeePassXC that implements a set of new features that are present in KeePass but missing from KeePassX like:
- auto-type on Linux, Mac OS, and Windows
- database merging — which allows multi-user support
- using the web site's favicon in the interface
So far, the maintainers of KeePassXC seem to be open to re-merging the project "if the original maintainer of KeePassX in the future will be more active and will accept our merge and changes". I can confirm that, at the time of writing, the original KeePassX project now has 79 pending pull requests and only one pull request was merged since the last release, which was 2.0.3 in September 2016.
While KeePass and derivatives allow multiple users to access the same database through the merging process, they do not support multi-party access to a single database. This may be a limiting factor for larger organizations, where you may need, for example, a different password set for different technical support team levels. The solution in this case is to use separate databases for each team, with each team using a different shared secret.Pass: the standard password manager?
I am currently using password-store, or pass, as a password manager. It aims to be "the standard Unix password manager". Pass is a GnuPG-based password manager that features a surprising number of features given its small size:
- copy-paste support
- Git integration
- multi-user/group support
- pluggable extensions (in the upcoming 1.7 release)
The command-line interface is simple to use and intuitive. The following, will, for example, create a pass repository, a 20 character password for your LWN account and copy it to the clipboard:$ pass init $ pass generate -c lwn 20
The main issue with pass is that it doesn't encrypt the name of those entries: if someone were to compromise my machine, they could easily see which sites I have access to simply by listing the passwords stored in ~/.password-store. This is a deliberate design decision by the upstream project, as stated by a mailing list participant, Allan Odgaard:
Using a single file per item has the advantage of shell completion, using version control, browse, move and rename the items in a file browser, edit them in a regular editor (that does GPG, or manually run GPG first), etc.
Odgaard goes on to point out that there are alternatives that do encrypt the entire database (including the site names) if users really need that feature.
Furthermore, there is a tomb plugin for pass that encrypts the password store in a LUKS container (called a "tomb"), although it requires explicitly opening and closing the container, which makes it only marginally better than using full disk encryption system-wide. One could also argue that password file names do not hold secret information, only the site name and username, perhaps, and that doesn't require secrecy. I do believe those should be kept secret, however, as they could be used to discover (or prove) which sites you have access to and then used to perform other attacks. One could draw a parallel with the SSH known_hosts file, which used to be plain text but is now hashed so that hosts are more difficult to discover.
Also, sharing a database for multi-user support will require some sort of file-sharing mechanism. Given the integrated Git support, this will likely involve setting up a private Git repository for your team, something which may not be accessible to the average Linux user. Nothing keeps you, however, from sharing the ~/.password-store directory through another file sharing mechanism like (say) Syncthing or Dropbox.
You can use multiple distinct databases easily using the PASSWORD_STORE_DIR environment variable. For example, you could have a shell alias to use a different repository for your work passwords with:alias work-pass="PASSWORD_STORE_DIR=~/work-passwords pass"
Group support comes from a clever use of the GnuPG multiple-recipient encryption support. You simply have to specify multiple OpenPGP identities when initializing the repository, which also works in subdirectories:$ pass init -p Ateam firstname.lastname@example.org email@example.com mkdir: created directory '/home/me/.password-store/Ateam' Password store initialized for firstname.lastname@example.org, email@example.com [master 0e3dbe7] Set GPG id to firstname.lastname@example.org, email@example.com. 1 file changed, 2 insertions(+) create mode 100644 Ateam/.gpg-id
The above will configure pass to encrypt the passwords in the Ateam directory for firstname.lastname@example.org and email@example.com. Pass depends on GnuPG to do the right thing when encrypting files and how those identities are treated is entirely delegated to GnuPG's default configuration. This could lead to problems if arbitrary keys can be injected into your key ring, which could confuse GnuPG. I would therefore recommend using full key fingerprints instead of user identifiers.
Regarding the actual encryption algorithms used, in my tests, GnuPG 1.4.18 and 2.1.18 seemed to default to 256-bit AES for encryption, but that has not always been the case. The chosen encryption algorithm actually depends on the recipient's key preferences, which may vary wildly: older keys and versions may use anything from 128-bit AES to CAST5 or Triple DES. To figure out which algorithm GnuPG chose, you may want to try this pipeline:$ echo test | gpg -e -r firstname.lastname@example.org | gpg -d -v [...] gpg: encrypted with 2048-bit RSA key, ID XXXXXXX, created XXXXX "You Person You <email@example.com>" gpg: AES256 encrypted data gpg: original file name='' test
As you can see, pass is primarily a command-line application, which may make it less accessible to regular users. The community has produced different graphical interfaces that are either using pass directly or operate on the storage with their own GnuPG integration. I personally use pass in combination with Rofi to get quick access to my passwords, but less savvy users may want to try the QtPass interface, which should be more user-friendly. QtPass doesn't actually depend on pass and can use GnuPG directly to interact with the pass database; it is available for Linux, BSD, OS X, and Windows.Browser password managers
Most users are probably already using a password manager through their web browser's "remember password" functionality. For example, Chromium will ask if you want it to remember passwords and encrypt them with your operating system's facilities. For Windows, this encrypts the passwords with your login password and, for GNOME, it will store the passwords in the gnome-keyring storage. If you synchronize your Chromium settings with your Google account, Chromium will store those passwords on Google's servers, encrypted with a key that is stored in the Google Account itself. So your passwords are then only as safe as your Google account. Note that this was covered here in 2010, although back then Chromium didn't synchronize with the Google cloud or encrypt with the system-level key rings. That facility was only added in 2013.
In Firefox, there's an optional, profile-specific master password that unlocks all passwords. In this case, the issue is that browsers are generally always open, so the vault is always unlocked. And this is for users that actually do pick a master password; users are often completely unaware that they should set one.
The unlocking mechanism is a typical convenience-security trade-off: either users need to constantly input their master passwords to login or they don't, and the passwords are available in the clear. In this case, Chromium's approach of actually asking users to unlock their vault seems preferable, even though the developers actually refused to implement the feature for years.
Overall, I would recommend against using a browser-based password manager. Even if it is not used for critical sites, you will end up with hundreds of such passwords that are vulnerable while the browser is running (in the case of Firefox) or at the whim of Google (in the case of Chromium). Furthermore, the "auto-fill" feature that is often coupled with browser-based password managers is often vulnerable to serious attacks, which is mentioned below.
Finally, because browser-based managers generally lack a proper password generator, users may fail to use properly generated passwords, so they can then be easily broken. A password generator has been requested for Firefox, according to this feature request opened in 2007, and there is a password generator in Chrome, but it is disabled by default and hidden in the mysterious chrome://flags URL.Other notable password managers
Another alternative password manager, briefly mentioned in the previous article, is the minimalistic Assword password manager that, despite its questionable name, is also interesting. Its main advantage over pass is that it uses a single encrypted JSON file for storage, and therefore doesn't leak the name of the entries by default. In addition to copy/paste, Assword also supports automatically entering passphrases in fields using the xdo library. Like pass, it uses GnuPG to encrypt passphrases. According to Assword maintainer Daniel Kahn Gillmor in email, the main issue with Assword is "interaction between generated passwords and insane password policies". He gave the example of the Time-Warner Cable registration form that requires, among other things, "letters and numbers, between 8 and 16 characters and not repeat the same characters 3 times in a row".
Another well-known password manager is the commercial LastPass service which released a free-software command-line client called lastpass-cli about three years ago. Unfortunately, the server software of the lastpass.com service is still proprietary. And given that LastPass has had at least two serious security breaches since that release, one could legitimately question whether this is a viable solution for storing important secrets.
In general, web-based password managers expose a whole new attack surface that is not present in regular password managers. A 2014 study by University of California researchers showed that, out of five password managers studied, every one of them was vulnerable to at least one of the vulnerabilities studied. LastPass was, in particular, vulnerable to a cross-site request forgery (CSRF) attack that allowed an attacker to bypass account authentication and access the encrypted database.Problems with password managers
When you share a password database within a team, how do you remove access to a member of the team? While you can, for example, re-encrypt a pass database with new keys (thereby removing or adding certain accesses) or change the password on a KeePass database, a hostile party could have made a backup of the database before the revocation. Indeed, in the case of pass, older entries are still in the Git history. So access revocation is a problematic issue found with all shared password managers, as it may actually mean going through every password and changing them online.
This fundamental problem with shared secrets can be better addressed with a tool like Vault or SFLvault. Those tools aim to provide teams with easy ways to store dynamic tokens like API keys or service passwords and share them not only with other humans, but also make them accessible to machines. The general idea of those projects is to store secrets in a central server and send them directly to relevant services without human intervention. This way, passwords are not actually shared anymore, which is similar in spirit to the approach taken by centralized authentication systems like Kerberos. If you are looking at password management for teams, those projects may be worth a look.
Furthermore, some password managers that support auto-typing were found to be vulnerable to HTML injection attacks: if some third-party ad or content is able to successfully hijack the parent DOM content, it masquerades as a form that could fool auto-typing software as demonstrated by this paper that was submitted at USENIX 2014. Fortunately, KeePass was not vulnerable according to the security researchers, but LastPass was, again, vulnerable.Future of password managers?
All of the solutions discussed here assume you have a trusted computer you regularly have access to, which is a usage pattern that seems to be disappearing with a majority of the population. You could consider your phone to be that trusted device, yet a phone can be lost or stolen more easily than a traditional workstation or even a laptop. And while KeePass has Android and iOS ports, those do not resolve the question of how to share the password storage among those devices or how to back them up.
Password managers are fundamentally file-based, and the "file" concept seems to be quickly disappearing, faster than we technologists sometimes like to admit. Looking at some relatives' use of computers, I notice it is less about "files" than images, videos, recipes, and various abstract objects that are stored in the "cloud". They do not use local storage so much anymore. In that environment, password managers lose their primary advantage, which is a local, somewhat offline file storage that is not directly accessible to attackers. Therefore certain password managers are specifically designed for the cloud, like LastPass or web browser profile synchronization features, without necessarily addressing the inherent issues with cloud storage and opening up huge privacy and security issues that we absolutely need to address.
This is where the "password hasher" design comes in. Also known as "stateless" or "deterministic" password managers, password hashers are emerging as a convenient solution that could possibly replace traditional password managers as users switch from generic computing platforms to cloud-based infrastructure. We will cover password hashers and the major security challenges they pose in a future article.
Passwords are used everywhere in our modern life. Between your email account and your bank card, a lot of critical security infrastructure relies on "something you know", a password. Yet there is little standard documentation on how to generate good passwords. There are some interesting possibilities for doing so; this article will look at what makes a good password and some tools that can be used to generate them.
There is growing concern that our dependence on passwords poses a fundamental security flaw. For example, passwords rely on humans, who can be coerced to reveal secret information. Furthermore, passwords are "replayable": if your password is revealed or stolen, anyone can impersonate you to get access to your most critical assets. Therefore, major organizations are trying to move away from single password authentication. Google, for example, is enforcing two factor authentication for its employees and is considering abandoning passwords on phones as well, although we have yet to see that controversial change implemented.
Yet passwords are still here and are likely to stick around for a long time until we figure out a better alternative. Note that in this article I use the word "password" instead of "PIN" or "passphrase", which all roughly mean the same thing: a small piece of text that users provide to prove their identity.What makes a good password?
A "good password" may mean different things to different people. I will assert that a good password has the following properties:
- high entropy: hard to guess for machines
- transferable: easy to communicate for humans or transfer across various protocols for computers
- memorable: easy to remember for humans
High entropy means that the password should be unpredictable to an attacker, for all practical purposes. It is tempting (and not uncommon) to choose a password based on something else that you know, but unfortunately those choices are likely to be guessable, no matter how "secret" you believe it is. Yes, with enough effort, an attacker can figure out your birthday, the name of your first lover, your mother's maiden name, where you were last summer, or other secrets people think they have.
The only solution here is to use a password randomly generated with enough randomness or "entropy" that brute-forcing the password will be practically infeasible. Considering that a modern off-the-shelf graphics card can guess millions of passwords per second using freely available software like hashcat, the typical requirement of "8 characters" is not considered enough anymore. With proper hardware, a powerful rig can crack such passwords offline within about a day. Even though a recent US National Institute of Standards and Technology (NIST) draft still recommends a minimum of eight characters, we now more often hear recommendations of twelve characters or fourteen characters.
A password should also be easily "transferable". Some characters, like & or !, have special meaning on the web or the shell and can wreak havoc when transferred. Certain software also has policies of refusing (or requiring!) some special characters exactly for that reason. Weird characters also make it harder for humans to communicate passwords across voice channels or different cultural backgrounds. In a more extreme example, the popular Signal software even resorted to using only digits to transfer key fingerprints. They outlined that numbers are "easy to localize" (as opposed to words, which are language-specific) and "visually distinct".
But the critical piece is the "memorable" part: it is trivial to generate a random string of characters, but those passwords are hard for humans to remember. As xkcd noted, "through 20 years of effort, we've successfully trained everyone to use passwords that are hard for human to remember but easy for computers to guess". It explains how a series of words is a better password than a single word with some characters replaced.
Obviously, you should not need to remember all passwords. Indeed, you may store some in password managers (which we'll look at in another article) or write them down in your wallet. In those cases, what you need is not a password, but something I would rather call a "token", or, as Debian Developer Daniel Kahn Gillmor (dkg) said in a private email, a "high entropy, compact, and transferable string". Certain APIs are specifically crafted to use tokens. OAuth, for example, generates "access tokens" that are random strings that give access to services. But in our discussion, we'll use the term "token" in a broader sense.
Notice how we removed the "memorable" property and added the "compact" one: we want to efficiently convert the most entropy into the shortest password possible, to work around possibly limiting password policies. For example, some bank cards only allow 5-digit security PINs and most web sites have an upper limit in the password length. The "compact" property applies less to "passwords" than tokens, because I assume that you will only use a password in select places: your password manager, SSH and OpenPGP keys, your computer login, and encryption keys. Everything else should be in a password manager. Those tools are generally under your control and should allow large enough passwords that the compact property is not particularly important.Generating secure passwords
We'll look now at how to generate a strong, transferable, and memorable password. These are most likely the passwords you will deal with most of the time, as security tokens used in other settings should actually never show up on screen: they should be copy-pasted or automatically typed in forms. The password generators described here are all operated from the command line. Password managers often have embedded password generators, but usually don't provide an easy way to generate a password for the vault itself.
The previously mentioned xkcd cartoon is probably a common cultural reference in the security crowd and I often use it to explain how to choose a good passphrase. It turns out that someone actually implemented xkcd author Randall Munroe's suggestion into a program called xkcdpass:$ xkcdpass estop mixing edelweiss conduct rejoin flexitime
In verbose mode, it will show the actual entropy of the generated passphrase:$ xkcdpass -V The supplied word list is located at /usr/lib/python3/dist-packages/xkcdpass/static/default.txt. Your word list contains 38271 words, or 2^15.22 words. A 6 word password from this list will have roughly 91 (15.22 * 6) bits of entropy, assuming truly random word selection. estop mixing edelweiss conduct rejoin flexitime
Note that the above password has 91 bits of entropy, which is about what a fifteen-character password would have, if chosen at random from uppercase, lowercase, digits, and ten symbols:log2((26 + 26 + 10 + 10)^15) = approx. 92.548875
It's also interesting to note that this is closer to the entropy of a fifteen-letter base64 encoded password: since each character is six bits, you end up with 90 bits of entropy. xkcdpass is scriptable and easy to use. You can also customize the word list, separators, and so on with different command-line options. By default, xkcdpass uses the 2 of 12 word list from 12 dicts, which is not specifically geared toward password generation but has been curated for "common words" and words of different sizes.
Another option is the diceware system. Diceware works by having a word list in which you look up words based on dice rolls. For example, rolling the five dice "1 4 2 1 4" would give the word "bilge". By rolling those dice five times, you generate a five word password that is both memorable and random. Since paper and dice do not seem to be popular anymore, someone wrote that as an actual program, aptly called diceware. It works in a similar fashion, except that passwords are not space separated by default:$ diceware AbateStripDummy16thThanBrock
Diceware can obviously change the output to look similar to xkcdpass, but can also accept actual dice rolls for those who do not trust their computer's entropy source:$ diceware -d ' ' -r realdice -w en_orig Please roll 5 dice (or a single dice 5 times). What number shows dice number 1? 4 What number shows dice number 2? 2 What number shows dice number 3? 6 [...] Aspire O's Ester Court Born Pk
The diceware software ships with a few word lists, and the default list has been deliberately created for generating passwords. It is derived from the standard diceware list with additions from the SecureDrop project. Diceware ships with the EFF word list that has words chosen for better recognition, but it is not enabled by default, even though diceware recommends using it when generating passwords with dice. That is because the EFF list was added later on. The project is currently considering making the EFF list be the default.
One disadvantage of diceware is that it doesn't actually show how much entropy the generated password has — those interested need to compute it for themselves. The actual number depends on the word list: the default word list has 13 bits of entropy per word (since it is exactly 8192 words long), which means the default 6 word passwords have 78 bits of entropy:log2(8192) * 6 = 78
Both of these programs are rather new, having, for example, entered Debian only after the last stable release, so they may not be directly available for your distribution. The manual diceware method, of course, only needs a set of dice and a word list, so that is much more portable, and both the diceware and xkcdpass programs can be installed through pip. However, if this is all too complicated, you can take a look at Openwall's passwdqc, which is older and more widely available. It generates more memorable passphrases while at the same time allowing for better control over the level of entropy:$ pwqgen vest5Lyric8wake $ pwqgen random=78 Theme9accord=milan8ninety9few
For some reason, passwdqc restricts the entropy of passwords between the bounds of 24 and 85 bits. That tool is also much less customizable than the other two: what you see here is pretty much what you get. The 4096-word list is also hardcoded in the C source code; it comes from a Usenet sci.crypt posting from 1997.
A key feature of xkcdpass and diceware is that you can craft your own word list, which can make dictionary-based attacks harder. Indeed, with such word-based password generators, the only viable way to crack those passwords is to use dictionary attacks, because the password is so long that character-based exhaustive searches are not workable, since they would take centuries to complete. Changing from the default dictionary therefore brings some advantage against attackers. This may be yet another "security through obscurity" procedure, however: a naive approach may be to use a dictionary localized to your native language (for example, in my case, French), but that would deter only an attacker that doesn't do basic research about you, so that advantage is quickly lost to determined attackers.
One should also note that the entropy of the password doesn't depend on which word list is chosen, only its length. Furthermore, a larger dictionary only expands the search space logarithmically; in other words, doubling the word-list length only adds a single bit of entropy. It is actually much better to add a word to your password than words to the word list that generates it.Generating security tokens
As mentioned before, most password managers feature a way to generate strong security tokens, with different policies (symbols or not, length, etc). In general, you should use your password manager's password-generation functionality to generate tokens for sites you visit. But how are those functionalities implemented and what can you do if your password manager (for example, Firefox's master password feature) does not actually generate passwords for you?
pass, the standard UNIX password manager, delegates this task to the widely known pwgen program. It turns out that pwgen has a pretty bad track record for security issues, especially in the default "phoneme" mode, which generates non-uniformly distributed passwords. While pass uses the more "secure" -s mode, I figured it was worth removing that option to discourage the use of pwgen in the default mode. I made a trivial patch to pass so that it generates passwords correctly on its own. The gory details are in this email. It turns out that there are lots of ways to skin this particular cat. I was suggesting the following pipeline to generate the password:head -c $entropy /dev/random | base64 | tr -d '\n='
The above command reads a certain number of bytes from the kernel (head -c $entropy /dev/random) encodes that using the base64 algorithm and strips out the trailing equal sign and newlines (for large passwords). This is what Gillmor described as a "high-entropy compact printable/transferable string". The priority, in this case, is to have a token that is as compact as possible with the given entropy, while at the same time using a character set that should cause as little trouble as possible on sites that restrict the characters you can use. Gillmor is a co-maintainer of the Assword password manager, which chose base64 because it is widely available and understood and only takes up 33% more space than the original 8-bit binary encoding. After a lengthy discussion, the pass maintainer, Jason A. Donenfeld, chose the following pipeline:read -r -n $length pass < <(LC_ALL=C tr -dc "$characters" < /dev/urandom)
The above is similar, except it uses tr to directly to read characters from the kernel, and selects a certain set of characters ($characters) that is defined earlier as consisting of [:alnum:] for letters and digits and [:graph:] for symbols, depending on the user's configuration. Then the read command extracts the chosen number of characters from the output and stores the result in the pass variable. A participant on the mailing list, Brian Candler, has argued that this wastes entropy as the use of tr discards bits from /dev/urandom with little gain in entropy when compared to base64. But in the end, the maintainer argued that reading "reading from /dev/urandom has no [effect] on /proc/sys/kernel/random/entropy_avail on Linux" and dismissed the objection.
Another password manager, KeePass uses its own routines to generate tokens, but the procedure is the same: read from the kernel's entropy source (and user-generated sources in case of KeePass) and transform that data into a transferable string.Conclusion
While there are many aspects to password management, we have focused on different techniques for users and developers to generate secure but also usable passwords. Generating a strong yet memorable password is not a trivial problem as the security vulnerabilities of the pwgen software showed. Furthermore, left to their own devices, users will generate passwords that can be easily guessed by a skilled attacker, especially if they can profile the user. It is therefore essential we provide easy tools for users to generate strong passwords and encourage them to store secure tokens in password managers.
It's a new Pythonic year and what could be better than starting it with a Montreal-Python meetup. Come discover different ways of using our favorite programming language.
As usual, snacks will be provided, but eat before, or even better, after by joining us at Benelux so you can network with the speakers and other attendees.
It's also with great joy that we announce that PyCon Canada 2017 will be held at home, here in Montreal. Add it to your calendars! For more information and to stay informed about the upcoming news, visit the PyCon Canada website at https://2017.pycon.ca/.Presentations Roberto Rocha: GIS with Python: 6 libraries you should know
A quick demo of libraries for doing geospatial work and mapping. The libraries are: geopandas, shapely, fiona, Basemap, folium and pysal.Jordi Riera: How to train your Python
Let's go back to basics. We will review how to make a python code more pythonic, easier to maintain and to read. A set of little tricks here and there can change your code for the best. We will talk about topics as Immutable vs mutable variables but also about core tools like dict, defaultdict, named tuple and generator.Rami Sayar: Building Python Microservices with Docker and Kubernetes
Python is powering your production apps and you are struggling with the complexity, bugs and feature requests you need. You just don't know how to maintain your app anymore. You're scared you have created the kraken that will engulf your entire development team!
Microservices architecture has existed for as long as monolithic applications became a common problem. With the DevOps revolution, it is the time to seriously consider building microservice architectures with Python.
This talk will share strategies on how to split up your monolithic apps and show you how to deploy Python microservices using Docker. We will get hands-on with a sample app, walk step-by-step on how to change the app's architecture and deploy it to the cloud.
No longer shall you deal with the endless complexities of monolithic Python apps. Fear the kraken no more!Where
Shopify Offices 490 de la Gauchetière street Montréal, QuébecWhen
Monday, February 13th, 2016 at 6pm
We’d like to thank our sponsors for their continued support:
- Savoir-faire Linux
I got a new computer and wondered... How can I test it? One of those innocent questions that brings hours and hours of work and questionning...A new desktop: Intel NUC devices
After reading up on Jeff Atwood's blog and especially his article on the scooter computer, I have discovered a whole range of small computers that could answer my need for a faster machine in my office at a low price tag and without taking up too much of my precious desk space. After what now seems like a too short review I ended up buying a new Intel NUC device from NCIX.com, along with 16GB of RAM and an amazing 500GB M.2 hard drive for around 750$. I am very happy with the machine. It's very quiet and takes up zero space on my desk as I was able to screw it to the back of my screen. You can see my review of the hardware compatibility and installation report in the Debian wiki.
I wish I had taken more time to review the possible alternatives - for example I found out about the amazing Airtop PC recently and, although that specific brand is a bit too expensive, the space of small computers is far and wide and deserves a more thorough review than just finding the NUC by accident while shopping for laptops on System76.com...Reviving the Stressant project
But this, and Atwood's Is Your Computer Stable? article, got me thinking about how to test new computers. It's one thing to build a machine and fire it up, but how do you know everything is actually really working? It is common practice to do a basic stress test or burn-in when you get a new machine in the industry - how do you proceed with such tests?
Back in the days when I was working at Koumbit, I wrote a tool exactly for that purpose called Stressant. Since I am the main author of the project and I didn't see much activity on it since I left, I felt it would be a good idea to bring it under my personal wing again, and I have therefore moved it to my Gitlab where I hope to bring it back to life. Parts of the project's rationale are explained in an "Intent To Package" the "breakin" tool (Debian bug #707178), which, after closer examination, ended up turning into a complete rewrite.
The homepage has a bit more information about how the tool works and its objectives, but generally, the idea is to have a live CD or USB stick that you can just plugin into a machine to run a battery of automated tests (memtest86, bonnie++, stress-ng and disk wiping, for example) or allow for interactive rescue missions on broken machines. At Koumbit, we had Debirf-based live images that we could boot off the network fairly easily that we would use for various purposes, although nothing was automated yet. The tool is based on Debian, but since it starts from boot, it should be runnable on any computer.
I was able to bring the project back to life, to a certain extent, by switching to vmdebootstrap instead of debirf for builds, but that removed netboot support. Also, I hope that Gitlab could provide with an autobuilder for the images, but unfortunately there's a bug in Docker that makes it impossible to mount loop images in Docker images (which makes it impossible to build Docker in Docker, apparently).Should I start yet another project?
So there's still a lot of work to do in this project to get it off the ground. I am still a bit hesitant in getting into this, however, for a few reasons:
It's yet another volunteer job - which I am trying to reduce for health and obvious economic reasons. That's a purely personal reason and there isn't much you can do about it.
I am not sure the project is useful. It's one thing to build a tool that can do basic tests on a machine - I can probably just build an live image for myself that will do everything I need - it's another completely different thing to build something that will scale to multiple machines and be useful for more various use cases and users.
(A variation of #1 is how everything and everyone is moving to the cloud. It's become a common argument that you shouldn't run your own metal these days, and we seem to be fighting an uphill economic battle when we run our own datacenters, rack or even physical servers these days. I still think it's essential to have some connexion to metal to be autonomous in our communications, but I'm worried that focusing on such a project is another of my precious dead entreprises... )
Part #2 is obviously where you people come in. Here's a few questions I'd like to have feedback on:
(How) do you perform stress-testing of your machines before putting them in production (or when you find issues you suspect to be hardware-related)?
Would a tool like breakin or stressant be useful in your environment?
Which tools do you use now for such purposes?
Would you contribute to such a project? How?
Do you think there is room for such a project in the existing ecology of projects) or should I contribute to an existing project?
Any feedback here would be, of course, greatly appreciated.
The debmans package I had so lovingly worked on last month is now officially abandoned. It turns out that another developer, Michael Stapelberg wrote his own implementation from scratch, called debiman.
Both software share a similar design: they are both static site generators that parse an existing archive and call another tool to convert manpages into HTML. We even both settled on the same converter (mdoc). But while I wrote debmans in Python, debiman is written in Go. debiman also seems much faster, being written with concurrency in mind from the start. Finally, debiman is more feature complete: it properly deals with conflicting packages, localization and all sorts redirections. Heck, it even has a pretty logo, how can I compete?
While debmans was written first and was in the process of being deployed, I had to give it up. It was a frustrating experience because I felt I wasted a lot of time working on software that ended up being discarded, especially because I put so much work on it, creating extensive documentation, an almost complete test suite and even filing a detailed core infrastructure best practices report In the end, I think that was the right choice: debiman seemed clearly superior and the best tool should win. Plus, it meant less work for me: Michael and Javier (the previous manpages.debian.org maintainer) did all the work of putting the site online. I also learned a lot about the CII best practices program, flask, click and, ultimately, the Go programming language itself, which I'll refer to as Golang for brievity. debiman definitely brought Golang into the spotlight for me. I had looked at Go before, but it seemed to be yet another language. But seeing Michael beat me to rebuilding the service really made me look at it again more seriously. While I really appreciate Python and I will probably still use it as my language of choice for GUI work and smaller scripts, but for daemons, network programs and servers, I will seriously consider Golang in the future.
This obviously brings me to the latest project I worked on, Wallabako, my first Golang program ever. Wallabako is basically a client for the Wallabag application, which is a free software "read it later" service, an alternative to the likes of Pocket, Pinboard or Evernote. Back in April, I had looked downloading my "unread articles" into my new ebook reader, going through convoluted ways like implementing OPDS support into Wallabag, which turned out to be too difficult.
Instead, I used this as an opportunity to learn Golang. After reading the quite readable golang specification over the weekend, I found the language to be quite elegant and simple, yet very powerful. Golang feels like C, but built with concurrency and memory (and to a certain extent, type) safety in mind, along with a novel approach to OO programming.
The fact that everything can be compiled in one neat little static binary was also a key feature in selecting golang for this project, as I do not have much control over the platform my E-Reader is running: it is a Linux machine running under the ARM architecture, but beyond that, there isn't much available. I couldn't afford to ship a Python interpreter in there and while there are solutions there like pyinstaller, I felt that it may be so easy to deploy on ARM. The borg team had trouble building a ARM binary, restoring to tricks like building on a Raspberry PI or inside an emulator. In comparison, the native go compiler supports cross-compilation out of the box through a simple environment variable.
So far Wallabako works amazingly well: when I "bag" a new article in Wallabag, either from my phone or my web browser, it will show up on my ebook reader then next time I open the wifi. I still need to "tap" the screen to fake the insertion of the USB cable, but we're working on automating that. I also need to make the installation of the software much easier and improve the documentation, because so far it's unlikely that someone unfamiliar with Kobo hardware hacking will be able to install it.Other work
According to Github, I filed a bunch of bugs all over the place (25 issues in 16 repositories), sent patches everywhere (13 pull requests in 6 repositories), and tried to fix everythin (created 38 commits in 7 repositories). Note that excludes most of my work, which happens on Gitlab. January was still a very busy month, especially considering I had an accident which kept me mostly offline for about a week.
Here are some details on specific projects.Stressant and a new computer
After much discussions, it was decided to fork the linkchecker project, which now lives in its own organization. I still have to write community guidelines and figure out the best way to maintain a stable branch, but I am hopeful that the community will pick up the project as multiple people volunteer to co-maintain the project. There has already been pull requests and issues reported, so that's a good sign.Feed2tweet refresh
I re-rolled my pull requests to the feed2tweet project: last time they were closed before I had time to rebase them. The author was okay with me re-submitting them, but he hasn't commented, reviewed or merged the patches yet so I am worried they will be dropped again.
At that point, I would more likely rewrite this from scratch than try to collaborate with someone that is clearly not interested in doing so...Debian uploads
- calibre 2.75.1 backport - thanks to Nicholas Steeves for finding out about the security issues as well!
- dnsdiag 1.4.0 - sponsored, in NEW, now in unstable!
- monkeysign 2.2.3 - a quick bugfix release to ship a good version in stretch, now finally with GnuPG 2.x support!
- pepper 0.3.3-3 - just regular package maintenance
- tty-clock 2.3 - new upstream release, also got access to the upstream project
- tuptime 3.3.1 - sponsored
This month I worked on a few issues, but they were big issues, so they took a lot of time.
I have done a lot of work trying to backport the heading sanitization patches for CVE-2016-8743. The full report explain all the gritty details, but I ran out of time and couldn't upload the final version either. The issue mostly affects Apache servers in proxy configurations so it's not so severe as to warrant an immediate upload anyways.
Finally, there was a small discussion surrounding tools to use when building and testing update to LTS packages. The resulting conversation was interesting, but it showed that we have a big documentation problem in the Debian project. There are a lot of tools, and the documentation is old and distributed everywhere. Every time I want to contribute something to the documentation, I never know where to start or go. This is why I wrote a separate debian development guide instead of contributing to existing documentation...
It is 2017, and we are getting ready for a great year of Python in Montreal. To start the year on a good note, we are launching our first request for presenters. This is an opportunity for all, we are looking for speakers. It's your chance to submit a talk. Just write us at firstname.lastname@example.org.
We are particularly looking for people willing to present lightning talks of 5minutes. Don't hesitate and send us your proposition or join us on slack by subscribing at http://slack.mtlpy.org/ to ask us any question.Where
Shopify Offices 490 de la Gauchetière street west Montréal, QuébecWhen
Monday, Feburary 13th, 2016 at 6pm
We’d like to thank our sponsors for their continued support:
- Savoir-faire Linux
Those were 8th and 9th months working on Debian LTS started by Raphael Hertzog at Freexian. I had trouble resuming work in November as I had taken a long break during the month and started looking at issues only during the last week of November.Imagemagick, again
I have, again, spent a significant amount of time fighting the ImageMagick (IM) codebase. About 15 more vulnerabilities were found since the last upload, which resulted in DLA-756-1. In the advisory, I unfortunately forgot to mention CVE-2016-8677 and CVE-2016-9559, something that was noticed by my colleague Roberto after the upload... More details about the upload are available in the announcement.
When you consider that I worked on IM back in october, which lead to an upload near the end of November covering around 80 more vulnerabilities, it doesn't look good for the project at all. Of the 15 vulnerabilities I worked on, only 6 had CVEs assigned and I had to request CVEs for the other 9 vulnerabilities plus 11 more that were still unassigned. This lead to the assignment of 25 distinct CVE identifiers as a lot of issues were found to be distinct enough to warrant their own CVEs.
One could also question how many of those issues affect the fork, Graphicsmagick. A lot of the vulnerabilities were found through fuzzing searches that may not have been tested on Graphicsmagick. It seems clear to me that a public corpus of test data should be available to test regressions and cross-project vulnerabilities. It's already hard enough to track issues withing IM itself, I can't imagine what it would be for the fork to keep track of those issues, especially since upstream doesn't systematically request CVEs for issues that they find, a questionable practice considering the number of issues we all need to keep track of.Nagios
I have also worked on the Nagios package and produced DLA 751-1 which fixed two fairly major issues (CVE-2016-9565 and CVE-2016-9566) that could allow remote root access under certain conditions. Fortunately, the restricted permissions setup by default in the Debian package made both exploits limited to information disclosure and privilege escalation if the debug log is enabled.
This says a lot about how proper Debian packaging can help in limiting the attack surface of certain vulnerabilities. It was also "interesting" to have to re-learn dpatch to add patches to the package: I regret not converting it to quilt, as the operation is simple and quilt is so much easier to use.
People new to Debian packaging may be curious to learn about the staggering number of patching systems historically used in Debian. On that topic, I started a conversation about how much we want to reuse existing frameworks when we work on those odd packages, and the feedback was interesting. Basically, the answer is "it depends"...NSS
I had already worked on the package in November and continued the work in December. Most of the work was done by Raphael, which fixed a lot of issues with the test suite. I tried to wrap this up by fixing CVE-2016-9074, the build on armel and the test suite. Unfortunately, I had to stop again because I ran out of hours and the fips test suite was still failing, but fortunately Raphael was able to complete the work with DLA-759-1.
For the second time, I forgot to formally assign myself a package before working on it, which meant that I wasted part of my hours working on the monit package. Those hours, of course, were not counted in my regular hours. I still spent some time reviewing mejo's patch to ensure it was done properly and it turned out we both made similar patches working independently, always a good sign.
As I reported in my preliminary November report, I have also triaged issues in libxml2, ntp, openssl and tiff.
Finally, I should mention my short review of the phpMyAdmin upload, among the many posts i sent to the LTS mailing list.Other free software work
One reason why I had so much trouble getting paid work done in November is that I was busy with unpaid work...manpages.debian.org
A major time hole for me was trying to tackle the manpages.debian.org service, which had been offline since August. After a thorough evaluation of the available codebases, I figured the problem space wasn't so hard and it was worth trying to do an implementation from scratch. The result is a tool called debmans.
It took, obviously, way longer than I expected, as I experimented with Python libraries I had been keeping an eye on for a while. For the commanline interface, I used the click library, which is really a breeze to use, but a bit heavy for smaller scripts. For a web search service prototype, I looked at flask, which was also very interesting, as it is light and simple enough to use that I could get started quickly. It also, surprisingly, fares pretty well in the global TechEmpower benchmarking tests. Those interested in those tools may want to look at the source code, in particular the main command (using an interesting pattern itself, __main__.py) and the search prototype.
Debmans is the first project for which I have tried the CII Best Practices Badge program, an interesting questionnaire to review best practices in software engineering. It is an excellent checklist I recommend every project manager and programmer to get familiar with.
I still need to complete my work on Debmans: as I write this, I couldn't get access to the new server the DSA team setup for this purpose. It was a bit of a frustrating experience to wait for all the bits to get into place while I had a product ready to test. In the end, the existing manpages.d.o maintainer decided to deploy the existing codebase on the new server while the necessary dependencies are installed and accesses are granted. There's obviously still a bunch of work to be done for this to be running in production so I have postponed all this work to January.
My hope is that this tool can be reused by other distributions, but after talking with Ubuntu folks, I am not holding my breath: it seems everyone has something that is "good enough" and that they don't want to break it...Monkeysign
I spent a good chunk of time giving a kick in the Monkeysign project, with the 2.2.2 release, which features contributions from two other developers, which may be a record for a single release.
I am especially happy to have adopted a new code of conduct - it has been an interesting process to adapt the code of conduct for such a relatively small project. Monkeysign is becoming a bit of a template on how to do things properly for my Python projects: documentation on readthedocs.org including a code of conduct, support and contribution information, and so on. Even though the code now looks a bit old to me and I am embarrassed to read certain parts, I still think it is a solid project that is useful for a lot of people. I would love to have more time to spend on it.LWN publishing
As you may have noticed if you follow this blog, I have started publishing articles for the LWN magazine, filed here under the lwn tag. It is a way for me to actually get paid for some of my blogging work that used to be done for free. Reports like this one, for example, take up a significant amount of my time and are done without being paid. Converting parts of this work into paid work is part of my recent effort to reduce the amount of time I spend on the computer.
An funny note: I always found the layout of the site to be a bit odd, until I looked at my articles posted there in a different web browser, which didn't have my normal ad blocker configuration. It turns out LWN uses ads, and Google ones too, which surprised me. I definitely didn't want to publish my work under banner ads, and will never do so on this blog. But it seems fair that, since I get paid for this work, there is some sort of revenue stream associated with it. If you prefer to see my work without ads, you can wait for it to be published here or become a subscriber which allows you to get rid of the ads on the site.
My experience with LWN is great: they're great folks, and very supportive. It's my first experience with a real editor and it really pushed me in improving my writing to make better articles that I normally would here. Thanks to the LWN folks for their support! Expect more of those quality articles in 2017.Debian packaging
I have added a few packages to the Debian archive:
- magic-wormhole: easy file-transfer tool, co-maintained with Jamie Rollins
- slop: screenshot tool
- xininfo: utility used by teiler
- teiler (currently in NEW): GUI for screenshot and screencast tools
Against my better judgment, I worked again on the borg project. This time I tried to improve the documentation, after a friend asked me for help on "how to make a quick backup". I realized I didn't have any good primer to send regular, non-sysadmin users to and figured that, instead of writing a new one, I could improve the upstream documentation instead.
I generated a surprising 18 commits of documentation during that time, mainly to fix display issues and streamline the documentation. My final attempt at refactoring the docs eventually failed, unfortunately, again reminding me of the difficulty I have in collaborating on that project. I am not sure I succeeded in making the project more attractive to non-technical users, but maybe that's okay too: borg is a fairly advanced project and not currently aimed at such a public. This is yet another project I am thinking of creating: a metabackup program like backupninja that would implement the vision created by liw in his A vision for backups in Debian post, which was discarded by the Borg project.
Github also tells me that I have opened 19 issues in 14 different repositories in November. I would like to particularly bring your attention to the linkchecker project which seems to be dead upstream and for which I am looking for collaborators in order to create a healthy fork.
Finally, I started on reviving the stressant project and changing all my passwords, stay tuned for more!
The Debian project is looking at possibly making automatic minor upgrades to installed packages the default for newly installed systems. While Debian has a reliable and stable package update system that has been an inspiration for multiple operating systems (the venerable APT), upgrades are, usually, a manual process on Debian for most users.
The proposal was brought up during the Debian Cloud sprint in November by longtime Debian Developer Steve McIntyre. The rationale was to make sure that users installing Debian in the cloud have a "secure" experience by default, by installing and configuring the unattended-upgrades package within the images. The unattended-upgrades package contains a Python program that automatically performs any pending upgrade and is designed to run unattended. It is roughly the equivalent of doing apt-get update; apt-get upgrade in a cron job, but has special code to handle error conditions, warn about reboots, and selectively upgrade packages. The package was originally written for Ubuntu by Michael Vogt, a longtime Debian developer and Canonical employee.
Since there was a concern that Debian cloud images would be different from normal Debian installs, McIntyre suggested installing unattended-upgrades by default on all Debian installs, so that people have a consistent experience inside and outside of the cloud. The discussion that followed was interesting as it brought up key issues one would have when deploying automated upgrade tools, outlining both the benefits and downsides to such systems.Problems with automated upgrades
An issue raised in the following discussion is that automated upgrades may create unscheduled downtime for critical services. For example, certain sites may not be willing to tolerate a master MySQL server rebooting in conditions not controlled by the administrators. The consensus seems to be that experienced administrators will be able to solve this issue on their own, or are already doing so.
For example, Noah Meyerhans, a Debian developer, argued that "any reasonably well managed production host is going to be driven by some kind of configuration management system" where competent administrators can override the defaults. Debian, for example, provides the policy-rc.d mechanism to disable service restarts on certain packages out of the box. unattended-upgrades also features a way to disable upgrades on specific packages that administrators would consider too sensitive to restart automatically and will want to schedule during maintenance windows.
Reboots were another issue discussed: how and when to deploy kernel upgrades? Automating kernel upgrades may mean data loss if the reboot happens during a critical operation. On Debian systems, the kernel upgrade mechanisms already provide a /var/run/reboot-required flag file that tools can monitor to notify users of the required reboot. For example, some desktop environments will popup a warning prompting users to reboot when the file exists. Debian doesn't currently feature an equivalent warning for command-line operation: Vogt suggested that the warning could be shown along with the usual /etc/motd announcement.
The ideal solution here, of course, is reboot-less kernel upgrades, which is also known as "live patching" the kernel. Unfortunately, this area is still in development in the kernel (as was previously discussed here). Canonical deployed the feature for the Ubuntu 16.04 LTS release, but Debian doesn't yet have such capability, since it requires extra infrastructure among other issues.
Furthermore, system reboots are only one part of the problem. Currently, upgrading packages only replaces the code and restarts the primary service shipped with a given package. On library upgrades, however, dependent services may not necessarily notice and will keep running with older, possibly vulnerable, libraries. While libc6, in Debian, has special code to restart dependent services, other libraries like libssl do not notify dependent services that they need to restart to benefit from potentially critical security fixes.
One solution to this is the needrestart package which inspects all running processes and restarts services as necessary. It also covers interpreted code, specifically Ruby, Python, and Perl. In my experience, however, it can take up to a minute to inspect all processes, which degrades the interactivity of the usually satisfying apt-get install process. Nevertheless, it seems like needrestart is a key component of a properly deployed automated upgrade system.Benefits of automated upgrades
One thing that was less discussed is the actual benefit of automating upgrades. It is merely described as "secure by default" by McIntyre in the proposal, but no one actually expanded on this much. For me, however, it is now obvious that any out-of-date system will be systematically attacked by automated probes and may be taken over to the detriment of the whole internet community, as we are seeing with Internet of Things devices. As Debian Developer Lars Wirzenius said:
The ecosystem-wide security benefits of having Debian systems keep up to date with security updates by default overweigh any inconvenience of having to tweak system configuration on hosts where the automatic updates are problematic.
One could compare automated upgrades with backups: if they are not automated, they do not exist and you will run into trouble without them. (Wirzenius, coincidentally, also works on the Obnam backup software.)
Another benefit that may be less obvious is the acceleration of the feedback loop between developers and users: developers like to know quickly when an update creates a regression. Automation does create the risk of a bad update affecting more users, but this issue is already present, to a lesser extent, with manual updates. And the same solution applies: have a staging area for security upgrades, the same way updates to Debian stable are first proposed before shipping a point release. This doesn't have to be limited to stable security updates either: more adventurous users could follow rolling distributions like Debian testing or unstable with unattended upgrades as well, with all the risks and benefits that implies.Possible non-issues
That there was not a backlash against the proposal surprised me: I expected the privacy-sensitive Debian community to react negatively to another "phone home" system as it did with the Django proposal. This, however, is different than a phone home system: it merely leaks package lists and one has to leak that information to get the updated packages. Furthermore, privacy-sensitive administrators can use APT over Tor to fetch packages. In addition, the diversity of the mirror infrastructure makes it difficult for a single entity to profile users.
Automated upgrades do imply a culture change, however: administrators approve changes only a posteriori as opposed to deliberately deciding to upgrade parts they chose. I remember a time when I had to maintain proprietary operating systems and was reluctant to enable automated upgrades: such changes could mean degraded functionality or additional spyware. However, this is the free-software world and upgrades generally come with bug fixes and new features, not additional restrictions.Automating major upgrades?
While automating minor upgrades is one part of the solution to the problem of security maintenance, the other is how to deal with major upgrades. Once a release becomes unsupported, security issues may come up and affect older software. While Debian LTS extends releases lifetimes significantly, it merely delays the inevitable major upgrades. In the grand scheme of things, the lifetimes of Linux systems (Debian: 3-5 years, Ubuntu: 1-5 years) versus other operating systems (Solaris: 10-15 years, Windows: 10+ years) is fairly short, which makes major upgrades especially critical.
While major upgrades are not currently automated in Debian, they are usually pretty simple: edit sources.list then:# apt-get update && apt-get dist-upgrade
But the actual upgrade process is really much more complex. If you run into problems with the above commands, you will quickly learn that you should have followed the release notes, a whopping 20,000-word, ten-section document that outlines all the gory details of the release. This is a real issue for large deployments and for users unfamiliar with the command line.
The solutions most administrators seem to use right now is to roll their own automated upgrade process. For example, the Debian.org system administrators have their own process for the "jessie" (8.0) upgrade. I have also written a specification of how major upgrades could be automated that attempts to take into account the wide variety of corner cases that occur during major upgrades, but it is currently at the design stage. Therefore, this problem space is generally unaddressed in Debian: Ubuntu does have a do-release-upgrade command but it is Ubuntu-specific and would need significant changes in order to work in Debian.Future work
Ubuntu currently defaults to "no automation" but, on install, invites users to enable unattended-upgrades or Landscape, a proprietary system-management service from Canonical. According to Vogt, the company supports both projects equally as they differ in scope: unattended-upgrades just upgrades packages while Landscape aims at maintaining thousands of machines and handles user management, release upgrades, statistics, and aggregation.
It appears that Debian will enable unattended-upgrades on the images built for the cloud by default. For regular installs, the consensus that has emerged points at the Debian installer prompting users to ask if they want to disable the feature as well. One reason why this was not enabled before is that unattended-upgrades had serious bugs in the past that made it less attractive. For example, it would simply fail to follow security updates, a major bug that was fortunately promptly fixed by the maintainer.
In any case, it is important to distribute security and major upgrades on Debian machines in a timely manner. In my long experience in professionally administering Unix server farms, I have found the upgrade work to be a critical but time-consuming part of my work. During that time, I successfully deployed an automated upgrade system all the way back to Debian woody, using the simpler cron-apt. This approach is, unfortunately, a little brittle and non-standard; it doesn't address the need of automating major upgrades, for which I had to revert to tools like cluster-ssh or more specialized configuration management tools like Puppet. I therefore encourage any effort towards improving that process for the whole community.