Armin Ronacher

Werkzeug 0.6 – Hammer released

written by Armin Ronacher, on Thursday, February 18, 2010 23:29.

I am incredibly happy to announce the latest release of Werkzeug 0.6, the WSGI toolkit for Python. There is so much new in this version, I am not sure where to start :)

  • pending deprecations were removed and tons of bugs were fixed. Among those we now have better RFC 2068 compatibility regarding cookie quoting.
  • the FileStorage object now gives you access to the multipart headers. Also, the problem where Google Chrome plaintext file uploads were handled as form data is gone now.
  • The URL routing system is now able to give you back the URL rule object instead of the endpoint. This makes it possible to put more information onto a rule. This can be used to bind a rule to a function and interpolate the rule endpoint string from the function name. When matching, the rule object is returned and with that you have access to both the function and the endpoint name.
  • The builtin development server now has SSL support. You can finally test your application via HTTPS without the need of an external server.
  • Finally! Response objects are no longer modified in place when evaluated as WSGI application. This is a backwards incompatible change but Werkzeug will warn you if you rely on that behaviour. What does this mean to you? In a nutshell, it is now possible to share response objects between requests. Also, if you freeze a response now, the content length is set automatically.
  • If you are working in multi-charset environments, you can configure the Werkzeug wrappers to support dynamic charset setting. This makes it possible to change the content type or charset and the system reacts dynamically when encoding / decoding.
  • The wrappers also have a default __repr__ now which spits out useful information.
  • All the builtin data structures can be pickled now.
  • Huge performance improvements for the new multipart parser now. If uploads were slow for you, update right now :)
  • If you are working with custom HTTP methods (such as DAV) you can now rely on the stream attribute of the wrappers being usable. Previously you had to fall back to the raw WSGI input stream instead.
  • The accept objects returned for accept headers now have a best_match method that helps you guess the best type.
  • Werkzeug is finally able to work with IRIs. You can safely use unicode URLs internally now.
  • Also, Werkzeug now provides an OrderedMultiDict that preserves the order of form data. Use this if you need to ensure that you can process form data without changing the order. From what I've heard, that is a requirement to work with the PayPal API.
  • The MoinMoin wiki people contributed many patches to improve the filesystem session support from the session contrib module. If you use filesystem sessions, you want to upgrade ASAP :)
  • Werkzeug no longer utilizes the time module for header parsing. This means that finally the time parsing behaviour is consistent regarding overflows and works on a wider range of dates. Also it will no longer have undefined behaviour for invalid dates which do occur in the real world :(
  • The wrappers allow you to change the collection classes used internally now.
  • The Werkzeug debugger should work on the Google Appengine SDK now.

Get it while it's hot from the Python Package Index (aka cheeseshop).

SSH Proxying

written by Armin Ronacher, on Sunday, February 14, 2010 16:57.

For about a year now I have been running a little Atom box with three hard drives hooked up in my flat in Graz that serves as a central media server. It runs Samba, mt-daapd (for iTunes music sharing) and an HTTP server. The only thing on this server that is visible to the outside is port 22, which is obviously an SSH server.

It is however still possible to get access to the services it provides by tunnelling. I don't care too much about samba here because I have access to the file system via SSH as well (OS X is able to mount SSH resources) but I do care about DAAP and the HTTP server. Because I constantly have to remember how to do that, I thought others might be interested in that as well, so here is how:

Simple Port Forwarding

Task is: forward a port so that you can access something remotely. For example the HTTP server:

ssh username@remote-server.example.com -N -L local-port:server-name:remote-port

So say my server that gives me access to my network via SSH is listening at servername.example.com, I can forward port 80 to a local port 8080 like this:

ssh username@servername.example.com -N -L 8080:localhost:80

For example, if I know my router responds to the name router.local, I can forward access to that as well:

ssh username@servername.example.com -N -L 8080:router.local:80

If you are interested in what the flags do: -N ensures no command is executed on the remote side, -L does the actual port forwarding. The format for the port forwarding is always [bind_address:]port:host:hostport where bind_address can be omitted, in which case ssh follows its GatewayPorts setting and, by default, binds to the loopback address only.
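If you keep forgetting the order of the pieces, a tiny helper can assemble the command for you. This is only a sketch: the function name build_forward_command and its parameters are made up for illustration, and it does nothing more than build the argument list shown above.

```python
def build_forward_command(login, local_port, host, host_port,
                          bind_address=None):
    """Return the argv list for an ``ssh -N -L`` port forward."""
    spec = '%d:%s:%d' % (local_port, host, host_port)
    if bind_address is not None:
        # the optional bind address goes in front of the local port
        spec = '%s:%s' % (bind_address, spec)
    return ['ssh', login, '-N', '-L', spec]


# forward local port 8080 to port 80 of router.local (as seen by the server)
print(' '.join(build_forward_command(
    'username@servername.example.com', 8080, 'router.local', 80)))
# prints: ssh username@servername.example.com -N -L 8080:router.local:80
```

The returned list can be handed to subprocess.Popen directly, which avoids any shell quoting issues.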

DAAP Forwarding

Because I have a DAAP server at home, I can forward the music I want to listen to as well. This is a little bit trickier because the service has to be registered locally for iTunes or whatever you want to use, to pick up. On OS X this script does the trick:

#!/usr/bin/env python
import os
import signal
import time
from subprocess import Popen
from threading import Thread

SSH_SERVER_NAME = 'servername.example.com'
DAAP_SERVER_NAME = 'localhost'
REMOTE_DAAP_PORT = 3689
LOCAL_DAAP_PORT = 3689
SHARE_NAME = 'iTunes Share Name'

pids = []

def start_ssh():
    c = Popen(['ssh', SSH_SERVER_NAME, '-g', '-N', '-L', '%d:%s:%d' % (
        LOCAL_DAAP_PORT, DAAP_SERVER_NAME, REMOTE_DAAP_PORT
    )])
    pids.append(c.pid)
    c.wait()

def start_proxy():
    c = Popen(['dns-sd', '-R', SHARE_NAME, '_daap._tcp.', 'local',
               str(LOCAL_DAAP_PORT)])
    pids.append(c.pid)
    c.wait()


def main():
    print 'Starting up...'
    try:
        try:
            for task in start_ssh, start_proxy:
                t = Thread(target=task)
                t.setDaemon(True)
                t.start()
            while 1:
                time.sleep(30)
        except KeyboardInterrupt:
            pass
    finally:
        for pid in pids:
            try:
                os.kill(pid, signal.SIGTERM)
            except OSError:
                pass
        print 'Shut down'

if __name__ == '__main__':
    main()

Just start it and you will have a secure proxy to your remote music collection until you hit ^C. The share name can be anything, this is what shows up in the iTunes sidebar.

Porting to Python 3 — A Guide

written by Armin Ronacher, on Thursday, February 11, 2010 17:58.

The latest Jinja 2 release came with basic support for Python 3. It was surprisingly painless to port the application over but it did require a substantial amount of tweaks and code changes in order to get it running. For everyone else out there who is interested in getting started, I decided to share my experiences:

Changing APIs

Before you start porting the library you have to decide how interfaces will behave in Python 3. The biggest issue here is obviously unicode, but there are others as well. I would say there are four kinds of libraries you might encounter regarding string behavior in Python 2: libraries that only accept unicode and only output unicode; libraries that only accept byte-strings and output byte-strings but operate on textual data; libraries that operate on either, where whatever type was fed in comes back out; and libraries that operate on either unicode or byte-strings and also accept the other type as long as it's a subset of the default encoding (ASCII).

First you have to find out what your library does, what it is supposed to do, and how you want to deal with that in Python 3. Because byte-strings no longer exist in Python 3 and were replaced by a bytes object that works similarly but has an incompatible API, it is very unlikely that your code will be able to support both in the future (or that it is something you would desire).

Byte-Based Libraries

This might be the trickiest one if you are aiming for Python 2.5 support or lower and you are operating on bytes directly. The issue is that the way you operate on bytes changed fundamentally from Python 2.x to 3.x and 2to3 is not really able to pick it up. Worse, it will try to convert all your bytestring literals to unicode! The official recommendation is, as far as I know, to explicitly prefix the byte strings in the 2.x code with a leading b to indicate bytes. Unfortunately that means no support for 2.5 or lower, because the b prefix only exists since 2.6. I am not completely sure what to do in that situation, but at least I found a way to trick Python to operate on bytes. If you have code like this:

magic = 'M23\x01'

And you want to ensure it does not end up being a str in 3.x, add a dummy encode:

magic = u'M23\x01'.encode('iso-8859-1')

The only downside is that the encode happens at runtime, so it will slow down execution a bit.
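If there are many such literals, the trick can be kept in one place. The following is a minimal sketch, assuming a helper called b (a name chosen here for illustration, not something 2to3 provides) that receives a plain, unprefixed string literal:

```python
import sys

if sys.version_info[0] >= 3:
    def b(s):
        # 3.x: the literal is text, so map each character 1:1 to a byte
        return s.encode('iso-8859-1')
else:
    def b(s):
        # 2.x: a plain string literal already is a byte-string
        return s

# the same source line yields bytes on both major versions
magic = b('M23\x01')
```

This only works for literals whose characters are all below \x100, which is exactly the range iso-8859-1 covers. The runtime cost mentioned above still applies, but for module-level constants it is paid only once at import time.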

Text Based Libraries

The second kind of library is a library that operates on text. In 2.x there were multiple ways to implement such libraries and it basically came down to what data type was used internally and what was accepted for input and output. There are the libraries that operate exclusively either on bytestrings or unicode. These are the ones that are the easiest to port, because 2to3 was written with nearly that in mind. If your library was only accepting bytestrings in 2.x it will (after a 2to3 run) only be accepting a Python 3 str type which is unicode based. This works well as long as you do not intend to use some kind of IO in your library. Once you start doing that, you will need to make sure you can somehow specify the encoding to be used when opening files. In that case, make sure you open the file in byte mode (not in text mode!) and do the decoding/encoding yourself. This is the only way your IO code will work the same in both 2.x and 3.x. But more on IO later.

What 2to3 does out of the box is convert calls from unicode to str automatically. Unfortunately it does not change the special __unicode__ method to __str__. You can easily do that in a custom fixer though, so it should be easy to accomplish. If your library however supports both __str__ and __unicode__ you are in a trickier situation. Let me show you an example of the kind of classes I deal with in Jinja 2:

class MyObject(object):

    def __init__(self):
        self.value = u'some value'

    def __str__(self):
        return unicode(self).encode('utf-8')

    def __unicode__(self):
        return self.value

The big problem here is that 2to3 will convert it to this:

class MyObject(object):

    def __init__(self):
        self.value = 'some value'

    def __str__(self):
        return str(self).encode('utf-8')

    def __unicode__(self):
        return self.value

If you call str() on your instance now, it will die with a runtime error because it recurses infinitely. Even if it did not recurse, it would try to return a bytes object from the __str__ method because of the encode call. My plan was to write a custom fixer that, if it detects a __str__ that just calls into __unicode__ and encodes, will drop the __str__ method and rename __unicode__ to __str__. Unfortunately the tree you are dealing with in 2to3 does not appear to be designed for removing code, so instead of removing the __str__ I just rename the __unicode__ to __str__ and let Python override the dummy __str__ with the correct one. The fixer I use for that looks like this:

from lib2to3 import fixer_base
from lib2to3.fixer_util import Name

class FixRenameUnicode(fixer_base.BaseFix):
    PATTERN = r"funcdef< 'def' name='__unicode__' parameters< '(' NAME ')' > any+ >"

    def transform(self, node, results):
        name = results['name']
        name.replace(Name('__str__', prefix=name.prefix))

After conversion with this fixer in place, the class from above will then look like this:

class MyObject(object):

    def __init__(self):
        self.value = 'some value'

    def __str__(self):
        return str(self).encode('utf-8')

    def __str__(self):
        return self.value
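
That the duplicate method is harmless can be checked quickly. In a class body a later definition simply rebinds the name, so the broken __str__ is never reachable. A small demonstration of that behaviour, using the converted class from above:

```python
class MyObject(object):

    def __init__(self):
        self.value = 'some value'

    def __str__(self):
        # leftover dummy: replaced by the definition below before
        # the class is even created, so this never runs
        return str(self).encode('utf-8')

    def __str__(self):
        return self.value


print(str(MyObject()))
# prints: some value
```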

But where to put those fixers? Edit 2to3 directly? And do I have to provide two source packages for 2.x and 3.x? This is where distribute comes in.

2to3 through distribute

Distutils itself already has the possibility to run 2to3 for you, but what it cannot do is add custom fixers without a lot of custom code. distribute on the other hand not only gives you built-in 2to3 support as a single keyword argument to setup() but can also pass custom fixers to 2to3, which is very helpful. However, because these new keyword arguments would cause a warning if the setup script was executed with setuptools instead of distribute, you should only pass them to the setup function if invoked from Python 3. The setup script then looks like this:

import sys

from setuptools import setup

# if we are running on python 3, enable 2to3 and
# let it use the custom fixers from the custom_fixers
# package.
extra = {}
if sys.version_info >= (3, 0):
    extra.update(
        use_2to3=True,
        use_2to3_fixers=['custom_fixers']
    )


setup(
    name='Your Library',
    version='1.0',
    classifiers=[
        # make sure to use :: Python *and* :: Python :: 3 so
        # that pypi can list the package on the python 3 page
        'Programming Language :: Python',
        'Programming Language :: Python :: 3'
    ],
    packages=['yourlibrary'],
    # make sure to add custom_fixers to the MANIFEST.in
    include_package_data=True,
    **extra
)

Now all you have to do is to put the custom 2to3 fixers (written in Python 3!) into the custom_fixers package next to your real library and they will be added automatically. For examples of fixers, look into the lib2to3/fixes package of your Python 3 installation. If you run python3 setup.py build it will run 2to3 on your files and put the output into the build folder for you to test.

Input/Output

So in Python 3 there is a completely new input/output system. It is very Java-ish and is able to deal with unicode. The downside is that you either don't have it in 2.x or the implementation is too slow, so what you want to do is to create yourself an abstraction layer.

If your library was unicode based in older Python versions you probably just did file.read().decode(encoding) or something similar. This still works on 3.x and I strongly recommend doing that, but be sure to open the file in binary mode, otherwise on Python 3 the decode will attempt to decode an already decoded unicode string, which does not make any sense. If you need normalized newlines (Windows newlines converted to '\n') you would have to post-process the string by hand, but most applications and libraries are able to deal with any kind of newline anyways.

You could also just create an IO helper module that calls the builtin open on 3.x and codecs.open on 2.x. Unfortunately codecs.open performs worse than the builtin open on 2.x, so you might want to check how you are dealing with files, whether high performance is necessary and so forth. Most of the time, opening the file in binary mode is what you want to do.
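Such a helper module could look roughly like this. It is a sketch under assumptions: the function name open_text and its parameter names are invented here, and only the two underlying calls (the builtin open with an encoding on 3.x, codecs.open on 2.x) come from the paragraph above.

```python
import codecs
import sys

if sys.version_info[0] >= 3:
    def open_text(filename, mode='r', charset='utf-8', errors='strict'):
        # 3.x: the builtin open decodes and encodes for us
        return open(filename, mode, encoding=charset, errors=errors)
else:
    def open_text(filename, mode='r', charset='utf-8', errors='strict'):
        # 2.x: codecs.open returns a file object that reads and
        # writes unicode strings using the given charset
        return codecs.open(filename, mode, charset, errors)

# usage (the file name is a placeholder):
#     f = open_text('README.txt')
#     try:
#         text = f.read()
#     finally:
#         f.close()
```

One difference to keep in mind is newline handling: the builtin open on 3.x translates newlines in text mode, while codecs.open works on the file in binary mode and leaves them alone.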

If your library was byte based in 2.x and you opened files in the library, instead of just working on open file objects, you will have to change your API slightly in order to take the charset and error mode into account. If you previously had a function like this:

def read_file_contents(filename):
    with open(filename) as f:
        return f.read()

You will have to change it to something like this now:

def read_file_contents(filename, charset='utf-8', errors='strict'):
    with open(filename, 'rb') as f:
        return f.read().decode(charset, errors)

And then ensure that you give the user a way to provide these arguments to the function. This means that whatever calls this would also have to accept these arguments and so forth. Not everyone is using utf-8; there might be legacy files in iso-8859-1 a user might still want to be able to open. With a proper error handling system, it might even be possible to fall back to another encoding if it does not decode as utf-8 properly.
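A fallback of that kind could be sketched as follows. The function name decode_with_fallback and the default charset order are assumptions made for this example; it operates on the bytes returned by a file opened in binary mode.

```python
def decode_with_fallback(data, charsets=('utf-8', 'iso-8859-1')):
    """Decode ``data`` with the first charset that accepts it."""
    last_error = None
    for charset in charsets:
        try:
            return data.decode(charset)
        except UnicodeDecodeError as exc:
            # remember the failure and try the next candidate
            last_error = exc
    if last_error is None:
        raise ValueError('no charsets given')
    raise last_error
```

Because iso-8859-1 assigns a character to every possible byte, it never fails to decode. That makes it a reasonable last entry in the list, but it also means any charset listed after it will never be tried.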

Last but not least, 3.x StringIO is a "string IO", not something that accepts binary data. If you have a lot of unittests that are dealing with binary data in such objects, you will have to use the io.BytesIO instead. If it does not exist, you are running 2.x, and you can safely fall back to cStringIO.StringIO.
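In code, that fallback is a guarded import. The snippet below is a small sketch of it; the name BytesIO is bound either way so the tests themselves do not have to care which version they run on.

```python
try:
    # available on 3.x (and on 2.6 and later)
    from io import BytesIO
except ImportError:
    # older 2.x: cStringIO works on byte-strings
    from cStringIO import StringIO as BytesIO

magic = u'M23\x01'.encode('iso-8859-1')
stream = BytesIO(magic)
header = stream.read(3)
```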

Unit-Testing

Now the biggest problem I had with switching to 3.x: the unittests. First of all: do not use doctest. There is a doctest converter in 2to3, but it does not give you much. Error messages changed, reprs changed in ways it cannot properly pick up, nested tracebacks cause a lot of grief and they are hard to debug. I was playing with the idea of writing a tool that automatically converts doctests to unittests, but I was too lazy and converted the few I had in my code to unittests by hand. Furthermore, the few doctests left (used as code examples in the documentation) are only tested if the testsuite is invoked from Python 2.x.

Nosetest has 3.x support in a separate branch, py.test has supported 3.x for a while now and the builtin unittest does the trick as well. I personally converted all my Jinja 2 tests to unittest lately. If you are using unittest you can point distribute to your test suite function and it will run the tests for you if you write python setup.py test. This even runs 2to3 for you if you execute it with Python 3. So very helpful.
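A test suite function of that kind is just a callable without arguments that returns a unittest.TestSuite. The following is a minimal sketch; the test case and the module path mentioned in the comment (yourlibrary.tests) are placeholders, not taken from Jinja 2.

```python
import unittest


class ExampleTestCase(unittest.TestCase):

    def test_upper(self):
        self.assertEqual('jinja'.upper(), 'JINJA')


def suite():
    """Collect all test cases into one suite."""
    loader = unittest.TestLoader()
    result = unittest.TestSuite()
    result.addTest(loader.loadTestsFromTestCase(ExampleTestCase))
    return result

# in the setup script the function is referenced by its dotted name:
#     setup(..., test_suite='yourlibrary.tests.suite')

if __name__ == '__main__':
    unittest.main(defaultTest='suite')
```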

Hope that helps you porting your libraries to Python 3. Would love to hear about your experiences, because even if Python 3 did not work out as some of us hoped, it is very important that we continue to port libraries over to 3.x.

Jinja 2.3 Released

written by Armin Ronacher, on Wednesday, February 10, 2010 1:21.

As promised, the libraries I am maintaining get new releases around now :) The first one in a series is Jinja 2, the sandboxed template engine for Python. First it was planned for this to be a bugfix release for 2.2 which came with two scoping bugs, but along the way I was able to fix more problems and integrate cool new features as well. So what is new in Jinja 2.3? First of all, it's 100% backwards compatible so don't hesitate to upgrade.

Changelog

  • Fixed a couple of bugs with the code generator. Added many new tests for edge cases where Jinja's and Python's scoping rules fight each other. Hopefully no more UnboundLocalError: local variable 'l_foo' referenced before assignment.
  • Include tags can now select between multiple templates, and so can you when loading templates.
  • Greatly improved error reporting for syntax errors
  • The i18n extension can now extract translation comments preceding translatable strings.
  • Added support for a with block that works similar to Django's with block, just with support for multiple variables.
  • Experimental Python 3 support

New Include Tags and Template Selection

As proposed by Justin Lilly, an include tag can now accept multiple template names and will use the first that succeeds. So that means you can do stuff like this:

{% include ["custom/article_%s.html" % article.id, "article.html", "page.html"] %}

Of course, that list can be a list variable as well. This also led to two new methods on the environment, one of which attempts to select from a list of templates and uses the first that succeeds. There are in total three methods now to load templates:

environment.get_template('template_name.html')
environment.select_template(['template1.html', 'template2.html'])
environment.get_or_select_template(...)

get_or_select_template will check the type of the argument and either call get_template or select_template. This is mainly used internally by the include tag if it has to figure out at runtime what to do. However you can of course use that in your code as well which could be handy. Just replace get_template in your general-purpose render function with get_or_select_template to support the alternative list-based lookup as well.
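The relationship between the three methods can be illustrated with a toy stand-in. This is not Jinja's code: the ToyEnvironment class, its dict-based storage and the exception are invented for this sketch, and only the three method names and the "first that succeeds" behaviour come from the description above.

```python
class TemplateNotFound(LookupError):
    """Raised by the toy environment when no template matches."""


class ToyEnvironment(object):

    def __init__(self, templates):
        # maps template names to their source
        self.templates = templates

    def get_template(self, name):
        try:
            return self.templates[name]
        except KeyError:
            raise TemplateNotFound(name)

    def select_template(self, names):
        # try the names in order and use the first that succeeds
        for name in names:
            try:
                return self.get_template(name)
            except TemplateNotFound:
                continue
        raise TemplateNotFound(', '.join(names))

    def get_or_select_template(self, name_or_list):
        # a single name goes to get_template, anything else is
        # treated as a list of candidates
        if isinstance(name_or_list, str):
            return self.get_template(name_or_list)
        return self.select_template(name_or_list)


env = ToyEnvironment({'article.html': '<article>', 'page.html': '<page>'})
chosen = env.get_or_select_template(
    ['custom/article_42.html', 'article.html', 'page.html'])
```

Here chosen ends up as the source of article.html, because the custom template does not exist and article.html is the first candidate that does.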

Improved Error Messages

For a long time Jinja 2 did not really give good hints if you wrote end-tags wrong. Now it will give you something more friendly than just unknown tag endif if you nest the tags wrong:

{% for item in seq %}
  {{ item }}

jinja2.exceptions.TemplateSyntaxError: Unexpected end of template. Jinja was looking for the following tags: 'endfor' or 'else'. The innermost block that needs to be closed is 'for'.

{% for item in seq %}
  {% if item.something %}...{% endi %}
{% endfor %}

jinja2.exceptions.TemplateSyntaxError: Encountered unknown tag 'endi'. Jinja was looking for the following tags: 'elif' or 'else' or 'endif'. The innermost block that needs to be closed is 'if'.

{% block invalid-name %}
  ...
{% endblock %}

jinja2.exceptions.TemplateSyntaxError: Block names in Jinja have to be valid Python identifiers and may not contain hypens, use an underscore instead.

Especially the latter will come in handy for people switching over from Django who are used to hyphens in identifiers. Before that change it would have printed "expected block_end" instead, which is not very friendly or helpful.

Translator Comments

If you are using the i18n extension with Babel you can now provide comments for the translators. Just put the comment in the line before the gettext call / trans block or in the same line (before or after the call) and prefix it with one of the comment tags you have Babel configured to use. For example you could use "trans", "l" or "_" for the comment tags. Here are some examples of how you could use it:

{#_ this is not shown to the user under regular circumstances #}
<p>{{ _('Tempered with form data') }}

<dl>
  <dd>{{ _('Username') }}:  {# trans the actual login name of the user #}
</dl>

{# trans count is the number of logged in users. Message shown in a small dialog #}
<p>{% trans count=users|length %}
  {{ count }} user is online
{% pluralize %}
  {{ count }} users are online
{% endtrans %}

Python 3 Support

This is the first Jinja 2 release that comes with experimental support for Python 3. All regular unittests pass but there are certainly some parts of Jinja 2 which have to be rewritten or re-specified for better Python 3 support. If you are using Jinja 2 with Python 3, please give feedback. Also the error reporting on Python 3 does not work yet like it should. In order to use it with Python 3 you must have the excellent distribute library installed. Also for Python 2.x I strongly recommend using distribute instead of setuptools, although both are supported.

Grab it while it's hot from the cheeseshop.

Expect a blog post about unittesting and 2to3 soon. Got some experiences to share :)

DVCS Ponies

written by Armin Ronacher, on Sunday, January 10, 2010 0:21.

So, I have a pony to share. Just maybe someone likes the idea and has thought of something similar :) So, one of the problems the Python guys currently have with mercurial is, that they need newline normalization/conversion on checkout. The problem with decentralized systems is that the client is a server and runs a stock DVCS client with his own set of hooks. In subversion you have a centralized hook that will check stuff and reject commits or rewrite them if they do not work out as expected.

Now I am very happy that hg is decentralized, but at the end of the day everything ends up in a centralized repository and there certain rules apply. So what I do is review all patches by hand in detail and rewrite parts if they do not match the code style, line endings etc. So what I thought about is doing the check/conversion in a local commit hook and in a pre-changeset hook on the server as well, so that I for myself get stopped early checking stupid stuff in and the server later rejects changesets for sure that are improper.

So I have no problem doing that check twice, but other users that are not so fortunate to have my extension might still check improper stuff in and will not even know it, because nothing stops them from doing so. But I will still have to deal with that later and there will be rejections from the server.

So what about an optional extension distribution system? Say you clone the zine repository and the hg client after clone proposes to download and activate some extensions for you? You can still veto them of course, but you could also accept and it helps to improve your code quality.

There are a couple of such extensions I would love to see that are distributed that way:

  • codestyle (PEP 8) conformance checker that warns if invalid stuff is checked in
  • newline conversions or warnings
  • ensures that files end with a trailing newline
  • warnings if certain files are checked in (object files, debugger information, stack dumps etc.)
  • automatic copyright header updating tools (got a change in 2010? Copyright message is updated)
  • check line lengths
  • run test suites

All these extensions are project specific so someone has to adapt them for each project and provide them somewhere to download. There could be a .hgextensions file that contains download links for each of those extensions (including mirrors) or even relative links to extension scripts in the repository.

I think that would be easy to implement and if the user has the chance to veto the installation/activation it could be quite secure as well (one could even show the user the code of the extension he is about to install for that repository). Unfortunately it would require a change in the hg wire protocol so I guess it's unlikely to be implemented, but on the other hand if there is a wire protocol change planned, why not think about that?

Interested to hear your input :)