Armin Ronacher

Werkzeug 0.6.1 and Jinja 2.4 Released

written by Armin Ronacher, on Monday, April 12, 2010 23:07.

I am thrilled to announce two new versions of Pocoo libraries today: Werkzeug 0.6.1 and Jinja 2.4. Both will be the required minimum versions for the Flask microframework, which will have its first public preview release at the end of this week.

What's new in Werkzeug 0.6.1

This release is mainly a bugfix release but also brings some small new features and refactorings that make it easier to implement reusable frameworks on top of it:

  • Context local objects were heavily improved. They now support stacked operations, which makes it easier to implement applications that are able to call themselves.
  • The routing system now works better with different HTTP methods. HEAD is now implicitly added if GET is present, and URL building for non-GET rules no longer requires the method if the result is unambiguous.
  • The builtin development server now supports IPv6 and should work better on Windows systems.
  • The session system should now work more reliably on Windows as well as under heavy load.
  • Added secure password hashing functions (hmac based salting + different hash methods)

What's new in Jinja 2.4?

The new Jinja 2.4 release (codename "Correlation") brings a new concept called evaluation contexts. They make it possible to alter the behavior of the templating system at runtime. The best example of this is the new autoescape extension that can enable and disable the autoescaping for blocks. It's also possible to preselect the autoescape setting based on the filename now. For instance you can configure Jinja to enable escaping for .html and .xml files and disable it for the rest.
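A minimal sketch of such a filename-based selection, assuming that the autoescape argument of the Jinja Environment accepts a callable that receives the template name. The function name guess_autoescape is only illustrative:

```python
def guess_autoescape(template_name):
    """Return True if templates with this name should be autoescaped."""
    # Templates created from strings have no name; leave escaping off.
    if template_name is None or '.' not in template_name:
        return False
    extension = template_name.rsplit('.', 1)[1]
    return extension in ('html', 'xml')

# Assumed usage (requires Jinja 2.4 or later to be installed):
#   from jinja2 import Environment
#   env = Environment(autoescape=guess_autoescape)
```

With this in place, a template named index.html would be escaped while mail.txt would not.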

Another huge change is the new support for module loaders. This makes it possible to precompile templates into standard Python modules and put them into a .zip file or a folder on the filesystem. This allows you to compile templates in advance and upload them to Google's AppEngine or a production server to reduce load.

Also the C extension for Jinja2 now supports Python 3.

Grab the libraries while they are hot from the Cheeseshop (aka Python Package Index):

April 1st Post Mortem

written by Armin Ronacher, on Saturday, April 3, 2010 22:36.

This year I decided to finally do what I had planned for quite some time: an April Fools' joke. (I did contribute a bit to PEP 3117, but that does not count.) I made a little joke about Python microframeworks (micro-web-frameworks?), wrote a little thing, and created a website and screencast for it: denied.immersedcode.org.

I did expect some responses to that, but I was a little bit surprised by some of them. So here is my full disclosure of the April Fools' prank, what people thought of it and what my conclusion is.

The Motivation

It seems like everybody likes microframeworks. Not sure what caused that, but there are plenty of them. web.py (Python) and camping (Ruby) were the first of their kind, I think. Later others followed and it seemed that people love the idea of software that does not have dependencies and comes in a single file. So I thought, I can do the same and make fun of it, so let's just create a framework based on existing technology and throw everything together in a large single file: denied was born. I just bundled Werkzeug, simplejson and Jinja2 into a single file and added a bit of code that glues them together.

The Implementation

Denied consists of 160 lines of code that implement a very basic WSGI application based on Werkzeug and Jinja2 and that incorporate really stupid ideas:

  • it stores state in the module and uses implicitly defined data structures
  • there is a function that accepts either a template filename or a template source string as the same parameter and guesses which one it got based on the contents of the string.
  • it introspects the interpreter frame to figure out the name of the function that called a template render function to automagically guess the name of the template.
  • it uses automatic function registration and decorators to register URL rules.
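To make the frame introspection idea from the list concrete, here is a hypothetical sketch of how a render helper could guess a template name from its caller. This is not the actual denied code; the names guess_template_name and index are made up, and the technique is shown as an example of what to avoid:

```python
import sys

def guess_template_name():
    """Guess a template filename from the name of the calling function."""
    # Frame 0 is this function, frame 1 is whoever called it.
    caller = sys._getframe(1)
    return caller.f_code.co_name + '.html'

def index():
    # A view function that never says which template it wants.
    return guess_template_name()

print(index())
```

The result silently changes as soon as the view function is renamed or the helper is called through another layer, which is part of why this kind of magic is fragile.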

I don't want to go into details why I hate everything there, that would be a blog post of its own, but I want to point out that nearly all of these "features" were inspired by existing microframeworks.

I did not expect anyone to detect from these things that the framework was an April Fools' joke, but I thought that the obfuscated source code and the fact that it was basically just a zipfile would make it obvious. However, I got more than one mail asking me to release the source code because people wanted to hack on it. Right now it has more than 50 followers and 6 forks on github, which is insane if you keep in mind that Jinja2 and Werkzeug have fewer than 30 on bitbucket.

Thinking about it a bit more made me realize that camping back in the day was in fact delivered as an obfuscated 2K file of Ruby code. Not sure why _why did that, but he was a man of mysteries, so probably just because he thought it was fun.

The Screencast

To make the joke more obvious I created a screencast that would showcase the framework and do pretty much everything wrong. For that I created a persona called "Eirik Lahavre" who implemented the framework and did the screencast. Originally I wanted that person to be a Norwegian web developer, but unfortunately the designated speaker disappeared, so I had to ask a friend of mine (Jeroen Ruigrok van der Werven) to record it for me. He told me he can't do a Norwegian accent, so he went with French, and Eirik Lundbergh became Eirik Lahavre. I lay flat on the floor when I listened to the recording for the first time, because he's actually Dutch :)

The Website

For the website I collected tongue-in-cheek fake endorsements from popular Python programmers and added one from myself that was just bashing the quality of the code. I'm afraid I sort of made myself popular by bashing other people's web frameworks; at least reading reddit, hacker news and various mailing lists leaves that impression, so I thought it would be fun to emphasize that a bit more on that website. This also comes very close to the website of web.py, which shows a few obviously bad comments from popular Python hackers.

Furthermore the website shows a useless and short hello world example which shows nothing about how the framework works. This was inspired by every other microframework website out there. It claims RESTfulness and super scaling capabilities, kick-ass performance and describes the developer of the project (the fictional Eirik Lahavre) as a god of Python code coming from a professional company.

The Details

For everything in the joke I did what I would never do. I even went so far as to create the HTML of the website against my own code style, to use deprecated HTML tags in the presentation, and to claim to use XHTML even though the doctype and mimetype were wrong. The screencast also claims that flat files are a scalable NoSQL database and that missing form helpers are something positive because it means full flexibility.

The Impact

The screencast was downloaded over 10,000 times and the website got more than 50,000 hits. The link is still tweeted and I never got that many retweets for anything related to my projects so far. The fake project on github has more than 50 followers and 6 forks. Judging from the few comments on reddit and the emails I got, quite a few people took the project seriously.

What I learned

  • It does not matter how well intended or well written a project is: bold marketing is king. Being present on github is huge. As much as I love bitbucket and mercurial, there is an immense difference between having your project on github or on bitbucket, and I'm afraid that no matter what bitbucket does or what the mercurial people do, they will never even come close to github in terms of user base, people following your code and contributing.
  • Small snippets of code on the website are killer. Werkzeug tries to be honest by not showcasing a small "Hello World" application but something more complex to show the API, but that does not attract users. Jinja2 does not even try to show anything at all; you have to look at the documentation to see what it looks like. That drives potential users away.
  • Don't be honest: be bold. Nobody will check your claims anyway and if they don't live up to the promise, you can still say that your test setup was different or that your understanding of the problem is different.
  • There is no such thing as a "bad endorsement". People took it as a good sign that I did not give the project my blessing.

The Small Library

I'm currently trying to learn everything about game development and 3D graphics I possibly can. I found out that the best way to learn that is to write a minimal engine from scratch. Right now I'm doing that by looking at other source code, reading books and writing the most minimal code I can. I always try to prove to myself: existing code is way too complex, that has to be easier. After the third round of refactoring and improvements I usually end up with something as complex as the original code or the explanation from the book.

There is a reason why things are as complex as they are and not easier. I think the same is true for microframeworks. The reason why everybody is that crazy about having a single file implementing whatever is necessary to implement a web application is because you can claim it's easy and you can understand it. However things are not that easy in reality. I am pretty sure that other framework developers will agree.

web.py is the perfect example of that. It started as a library in 1000 lines of code in a single file, and look at what it became. It's not that simple any more. Many of the initial design decisions that were plain wrong were reverted, such as abusing the print statement for outputting values to the browser. There were good reasons why nobody before web.py used print to output strings, yet web.py did it that way. And a few versions later it disappeared again for good.

What will Change?

For one I will put small example snippets on the Werkzeug and Jinja2 website. Also for the fun of it I will publish one of the projects on github just to see how that works out. In general though, I will try to keep things low profile because I just feel more comfortable with that.

Obviously, denied will stay the April Fools' joke it was and not get further attention. The "promised" documentation will not come :) However I will probably blog about "how to create your own microframework based on Werkzeug" because right now people base their microframeworks on the standard library, which I think is a terrible idea. One dependency might not be as good as no dependency, but with Tarek Ziade's tremendous work on packaging with Python that should not be a problem in the near future.

Werkzeug 0.6 – Hammer released

written by Armin Ronacher, on Thursday, February 18, 2010 23:29.

I am incredibly happy to announce the latest release of Werkzeug 0.6, the WSGI toolkit for Python. There is so much new in this version, I am not sure where to start :)

  • pending deprecations were removed and tons of bugs were fixed. Among those we now have better RFC 2068 compatibility regarding cookie quoting.
  • the FileStorage object now gives you access to the multipart headers. Also, the problem of Google Chrome plaintext file uploads being handled as form data is gone now.
  • The URL routing system is now able to give you back the URL rule object instead of the endpoint. This makes it possible to put more information onto a rule. This can be used to bind a rule to a function and interpolate the rule endpoint string from the function name. When matching, the rule object is returned and with that you have access to both the function and the endpoint name.
  • The builtin development server now has SSL support. You can finally test your application via HTTPS without the need of an external server.
  • Finally! Response objects are no longer modified in place when evaluated as WSGI application. This is a backwards incompatible change but Werkzeug will warn you if you rely on that behaviour. What does this mean to you? In a nutshell, it is now possible to share response objects between requests. Also, if you freeze a response now, the content length is set automatically.
  • If you are working in multi-charset environments, you can configure the Werkzeug wrappers to support dynamic charset setting. This makes it possible to change the content type or charset and the system reacts dynamically when encoding / decoding.
  • The wrappers also have a default __repr__ now which spits out useful information.
  • All the builtin data structures can be pickled now.
  • Huge performance improvements for the new multipart parser now. If uploads were slow for you, update right now :)
  • If you are working with custom HTTP methods (such as DAV) you can now rely on the stream attribute of the wrappers being usable. Previously you had to fall back to the raw WSGI input stream instead.
  • The accept objects returned for accept headers now have a best_match method that helps you guess the best type.
  • Werkzeug is finally able to work with IRIs. You can safely use unicode URLs internally now.
  • Also, Werkzeug now provides an OrderedMultiDict that preserves the order of form data. Use this if you need to ensure that you can process form data without changing the order. From what I've heard, that is a requirement to work with the PayPal API.
  • The MoinMoin wiki people contributed many patches to improve the filesystem session support from the session contrib module. If you use filesystem sessions, you want to upgrade ASAP :)
  • Werkzeug no longer utilizes the time module for header parsing. This means that finally the time parsing behaviour is consistent regarding overflows and works on a wider range of dates. Also it will no longer have undefined behaviour for invalid dates which do occur in the real world :(
  • The wrappers allow you to change the collection classes used internally now.
  • The Werkzeug debugger should work on the Google Appengine SDK now.

Get it while it's hot from the Python Package Index (aka cheeseshop).

SSH Proxying

written by Armin Ronacher, on Sunday, February 14, 2010 16:57.

For about a year now I have been running a little Atom box with three hard drives in my flat in Graz that serves as a central media server. It runs samba, mt-daapd (for iTunes music sharing) and an HTTP server. The only thing on this server that is visible to the outside is port 22, which is obviously an SSH server.

It is however still possible to get access to the services it provides by tunnelling. I don't care too much about samba here because I have access to the file system via SSH as well (OS X is able to mount SSH resources), but I do care about DAAP and the HTTP server. Because I keep having to remind myself how to do that, I thought others might be interested in it as well, so here is how:

Simple Port Forwarding

Task is: forward a port so that you can access something remotely. For example the HTTP server:

ssh username@remote-server.example.com -N -L local-port:server-name:remote-port

So say the server that gives me access to my network via SSH is listening at servername.example.com. Then I can forward its port 80 to local port 8080 like this:

ssh username@servername.example.com -N -L 8080:localhost:80

Since I know my router responds to the name router.local, I can forward access to that as well:

ssh username@servername.example.com -N -L 8080:router.local:80

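If you are interested in what the flags do: -N ensures no command is executed on the remote side, -L does the actual port forwarding. The format for the port forwarding is always [bind_address:]port:host:hostport. If bind_address is omitted, ssh binds the local port according to its GatewayPorts setting, which by default means the loopback interface only; the -g flag (or an empty bind_address or *) makes the forwarded port reachable from other hosts.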
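If you start such tunnels from a script, the forwarding spec can be assembled programmatically. The following is a small sketch; the helper name build_forward_command and its parameters are made up for illustration, and only the -N and -L flags described above are used:

```python
def build_forward_command(ssh_host, local_port, target_host, target_port,
                          bind_address=None):
    """Build the argument list for an ssh local port forward."""
    # port:host:hostport, as seen from the SSH server
    spec = '%d:%s:%d' % (local_port, target_host, target_port)
    if bind_address is not None:
        # optional leading bind_address, e.g. 127.0.0.1
        spec = '%s:%s' % (bind_address, spec)
    return ['ssh', ssh_host, '-N', '-L', spec]

print(build_forward_command('username@servername.example.com',
                            8080, 'localhost', 80))
```

The resulting list can be handed to subprocess.Popen as is. Passing an explicit bind_address such as 127.0.0.1 keeps the forwarded port reachable from the local machine only.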

DAAP Forwarding

Because I have a DAAP server at home, I can forward the music I want to listen to as well. This is a little bit trickier because the service has to be registered locally for iTunes or whatever you want to use, to pick up. On OS X this script does the trick:

#!/usr/bin/env python
import os
import signal
import time
from subprocess import Popen
from threading import Thread

SSH_SERVER_NAME = 'servername.example.com'
DAAP_SERVER_NAME = 'localhost'
REMOTE_DAAP_PORT = 3689
LOCAL_DAAP_PORT = 3689
SHARE_NAME = 'iTunes Share Name'

pids = []

def start_ssh():
    c = Popen(['ssh', SSH_SERVER_NAME, '-g', '-N', '-L', '%d:%s:%d' % (
        LOCAL_DAAP_PORT, DAAP_SERVER_NAME, REMOTE_DAAP_PORT
    )])
    pids.append(c.pid)
    c.wait()

def start_proxy():
    c = Popen(['dns-sd', '-R', SHARE_NAME, '_daap._tcp.', 'local',
               str(LOCAL_DAAP_PORT)])
    pids.append(c.pid)
    c.wait()


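def main():
    print('Starting up...')
    try:
        try:
            # run ssh and dns-sd in daemon threads so that they do not
            # keep the interpreter alive on shutdown
            for task in start_ssh, start_proxy:
                t = Thread(target=task)
                t.daemon = True
                t.start()
            # idle until the user hits ^C
            while 1:
                time.sleep(30)
        except KeyboardInterrupt:
            pass
    finally:
        # terminate whatever child processes were started
        for pid in pids:
            try:
                os.kill(pid, signal.SIGTERM)
            except OSError:
                pass
        print('Shut down')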

if __name__ == '__main__':
    main()

Just start it and you will have a secure proxy to your remote music collection until you hit ^C. The share name can be anything; it is what shows up in the iTunes sidebar.

Porting to Python 3 — A Guide

written by Armin Ronacher, on Thursday, February 11, 2010 17:58.

The latest Jinja 2 release came with basic support for Python 3. It was surprisingly painless to port the application over but it did require a substantial amount of tweaks and code changes in order to get it running. For everyone else out there who is interested in getting started, I decided to share my experiences:

Changing APIs

Before you start porting the library you have to decide how interfaces will behave in Python 3. The biggest issue here is obviously unicode, but there are others as well. I would say there are four kinds of libraries you might encounter regarding string behavior in Python 2:

  • libraries that only accept unicode and only output unicode
  • libraries that only accept byte-strings and output byte-strings, but operate on textual data
  • libraries that operate on either type, where whatever has been fed in comes out again
  • libraries that operate on either unicode or byte-strings and also accept the other type as long as it fits into the default encoding (ASCII)

First you have to find out what your library does, what it is supposed to do, and how you want to deal with that in Python 3. Because byte-strings no longer exist in Python 3 and were replaced by a bytes object that works similarly but has an incompatible API, it is very unlikely that your code will be able to support both in the future (or that this is something you would desire).

Byte-Based Libraries

This might be the trickiest one if you are aiming for Python 2.5 support or lower and you are operating on bytes directly. The issue is that the way you operate on bytes changed fundamentally from Python 2.x to 3.x and 2to3 is not really able to pick it up. Worse, it will try to convert all your bytestring literals to unicode! The official recommendation is, as far as I know, to explicitly prefix the byte strings in the 2.x code with a leading b to indicate bytes. Unfortunately that syntax only exists in Python 2.6 and later, which means no support for 2.5 or lower. I am not completely sure what to do in that situation, but at least I found a way to trick Python into operating on bytes. If you have code like this:

magic = 'M23\x01'

And you want to ensure it does not end up being a str in 3.x, add a dummy encode:

magic = u'M23\x01'.encode('iso-8859-1')

The only downside is that the encode happens at runtime, so it will slow down execution a bit.
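If there are many such literals, the dummy encode can be hidden behind a small helper and applied once at import time, so the cost is not paid on every use. This is only a sketch under the assumption that all literals contain nothing but code points below 256; the name to_bytes is made up:

```python
import sys

def to_bytes(literal):
    """Turn a native string literal into a byte string on 2.x and 3.x."""
    if sys.version_info[0] >= 3:
        # every code point below 256 maps to the byte of the same value
        return literal.encode('iso-8859-1')
    # on 2.x a plain string literal already is a byte string
    return literal

# evaluated once when the module is imported
MAGIC = to_bytes('M23\x01')
```

Because 2to3 leaves unprefixed string literals alone, the same source works with or without a 2to3 run.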

Text Based Libraries

The second kind of library is a library that operates on text. In 2.x there were multiple ways to implement such libraries and it basically came down to what data type was used internally and what was accepted for input and output. There are the libraries that operate exclusively either on bytestrings or unicode. These are the ones that are the easiest to port, because 2to3 was written with nearly that in mind. If your library was only accepting bytestrings in 2.x it will (after a 2to3 run) only be accepting a Python 3 str type which is unicode based. This works well as long as you do not intend to use some kind of IO in your library. Once you start doing that, you will need to make sure you can somehow specify the encoding to be used when opening files. In that case, make sure you open the file in byte mode (not in text mode!) and do the decoding/encoding yourself. This is the only way your IO code will work the same in both 2.x and 3.x. But more on IO later.

What 2to3 does out of the box is convert calls from unicode to str automatically. Unfortunately it does not change the special __unicode__ method to __str__. You can easily do that in a custom fixer though. If your library however supports both __str__ and __unicode__, you are in a trickier situation. Let me show you an example of the kind of classes I deal with in Jinja 2:

class MyObject(object):

    def __init__(self):
        self.value = u'some value'

    def __str__(self):
        return unicode(self).encode('utf-8')

    def __unicode__(self):
        return self.value

The big problem here is that 2to3 will convert it to this:

class MyObject(object):

    def __init__(self):
        self.value = 'some value'

    def __str__(self):
        return str(self).encode('utf-8')

    def __unicode__(self):
        return self.value

If you call str() on your instance now, it will die with a runtime error because it recurses infinitely. Even if it did not recurse, it would try to return a bytes object from the __str__ method because of the encode call. My plan was to write a custom fixer that, if it detects a __str__ that just calls into __unicode__ and encodes, drops the __str__ method and renames __unicode__ to __str__. Unfortunately the tree you are dealing with in 2to3 does not appear to be designed for removing code, so instead of removing the __str__ I just rename __unicode__ to __str__ and let Python override the dummy __str__ with the correct one, since the later definition in a class body wins. The fixer I use for that looks like this:

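from lib2to3 import fixer_base
from lib2to3.fixer_util import Name

# lib2to3 derives the class name from the module name, so this class
# has to live in a module called fix_rename_unicode.py.
class FixRenameUnicode(fixer_base.BaseFix):
    # match any single-argument function definition named __unicode__
    PATTERN = r"funcdef< 'def' name='__unicode__' parameters< '(' NAME ')' > any+ >"

    def transform(self, node, results):
        # swap the name leaf for __str__, keeping the original whitespace
        name = results['name']
        name.replace(Name('__str__', prefix=name.prefix))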

After conversion with this fixer in place, the class from above will then look like this:

class MyObject(object):

    def __init__(self):
        self.value = 'some value'

    def __str__(self):
        return str(self).encode('utf-8')

    def __str__(self):
        return self.value
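The duplicate definition is harmless because a class body is executed top to bottom and a later def simply rebinds the name. A reduced illustration of that behavior (the class Example is made up and unrelated to Jinja 2):

```python
class Example(object):

    def __str__(self):
        # never called: the definition below replaces this one
        return 'first definition'

    def __str__(self):
        return 'second definition'

print(str(Example()))
```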

But where to put those fixers? Edit 2to3 directly? And do I have to provide two source packages for 2.x and 3.x? This is where distribute comes in.

2to3 through distribute

Distutils itself already has the possibility to run 2to3 for you, but what it cannot do is add custom fixers without a lot of custom code. distribute on the other hand not only gives you built-in 2to3 support through a single keyword argument to setup(), it can also pass custom fixers to 2to3, which is very helpful. However, because these new keyword arguments would trigger warnings if the setup script was executed with setuptools instead of distribute, you should only pass them to the setup function when it is invoked from Python 3. The setup script then looks like this:

import sys

from setuptools import setup

# if we are running on python 3, enable 2to3 and
# let it use the custom fixers from the custom_fixers
# package.
extra = {}
if sys.version_info >= (3, 0):
    extra.update(
        use_2to3=True,
        use_2to3_fixers=['custom_fixers']
    )


setup(
    name='Your Library',
    version='1.0',
    classifiers=[
        # make sure to use :: Python *and* :: Python :: 3 so
        # that pypi can list the package on the python 3 page
        'Programming Language :: Python',
        'Programming Language :: Python :: 3'
    ],
    packages=['yourlibrary'],
    # make sure to add custom_fixers to the MANIFEST.in
    include_package_data=True,
    **extra
)

Now all you have to do is to put the custom 2to3 fixers (written in Python 3!) into the custom_fixers package next to your real library and they will be added automatically. For examples of fixers, look into the lib2to3/fixes package of your Python 3 installation. If you run python3 setup.py build it will run 2to3 on your files and put the output into the build folder for you to test.

Input/Output

So in Python 3 there is a completely new input/output system. It is very Java-ish and is able to deal with unicode. The downside is that you either don't have it in 2.x or the implementation is too slow, so what you want to do is to create yourself an abstraction layer.

If your library was unicode based in older Python versions you probably just did file.read().decode(encoding) or something similar. This still works on 3.x and I strongly recommend doing that, but be sure to open the file in binary mode, otherwise on Python 3 the decode will attempt to decode an already decoded unicode string, which does not make any sense. If you need normalized newlines (Windows newlines converted to '\n') you would have to post-process the string by hand, but most applications and libraries are able to deal with any kind of newline anyways.
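Post-processing the newlines by hand can be as simple as the following sketch. The function name normalize_newlines is made up, and old Mac-style '\r' line endings are converted as well:

```python
def normalize_newlines(text):
    """Convert Windows and old Mac line endings to '\\n'."""
    # replace '\r\n' first so that it does not turn into two newlines
    return text.replace('\r\n', '\n').replace('\r', '\n')
```

It is meant to be applied to the decoded string, for example normalize_newlines(f.read().decode(encoding)).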

You could also just create an IO helper module that calls the builtin open on 3.x and codecs.open on 2.x. Unfortunately codecs.open performs worse than the builtin open on 2.x, so you might want to check how you are dealing with files, whether high performance is necessary and so forth. Most of the time, opening the file in binary mode is what you want to do.
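Such a helper module could look roughly like this. It is a sketch, not what Jinja 2 ships; the function name open_text and its defaults are assumptions:

```python
import sys

if sys.version_info[0] >= 3:
    def open_text(filename, mode='r', encoding='utf-8', errors='strict'):
        # the builtin open decodes for us on 3.x
        return open(filename, mode, encoding=encoding, errors=errors)
else:
    import codecs

    def open_text(filename, mode='r', encoding='utf-8', errors='strict'):
        # codecs.open wraps the file in a decoding reader on 2.x
        return codecs.open(filename, mode, encoding, errors)
```

One difference to keep in mind: the builtin open translates newlines in text mode on 3.x, while codecs.open always opens the underlying file in binary mode and leaves them untouched.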

If your library was byte based in 2.x and you opened files in the library, instead of just working on open file objects, you will have to change your API slightly in order to take the charset and error mode into account. If you previously had a function like this:

def read_file_contents(filename):
    with open(filename) as f:
        return f.read()

You will have to change it to something like this now:

def read_file_contents(filename, charset='utf-8', errors='strict'):
    with open(filename, 'rb') as f:
        return f.read().decode(charset, errors)

And then ensure that you give the user a way to provide these arguments to the function. This means that whatever calls this would also have to accept these arguments and so forth. Not everyone is using utf-8; there might be legacy files in iso-8859-1 a user might still want to be able to open. With a proper error handling system, it might even be possible to fall back to another encoding if the data does not decode as utf-8 properly.
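A fallback like that could be sketched as follows. The helper decode_with_fallback and the charsets parameter are assumptions for illustration, not part of any existing API:

```python
def decode_with_fallback(data, charsets=('utf-8', 'iso-8859-1')):
    """Decode bytes with the first charset that accepts them."""
    for charset in charsets[:-1]:
        try:
            return data.decode(charset)
        except UnicodeDecodeError:
            # not valid in this charset, try the next candidate
            continue
    # the last candidate is decoded strictly so real errors still surface
    return data.decode(charsets[-1])


def read_file_contents(filename, charsets=('utf-8', 'iso-8859-1')):
    with open(filename, 'rb') as f:
        return decode_with_fallback(f.read(), charsets)
```

Note that iso-8859-1 accepts every possible byte sequence, so it only makes sense as the last entry in the list.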

Last but not least, 3.x StringIO is a "string IO", not something that accepts binary data. If you have a lot of unittests that are dealing with binary data in such objects, you will have to use the io.BytesIO instead. If it does not exist, you are running 2.x, and you can safely fall back to cStringIO.StringIO.
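The fallback described above fits into a few lines at the top of a test module:

```python
try:
    # available on 3.x (and on 2.6 or later)
    from io import BytesIO
except ImportError:
    # older 2.x: cStringIO works on byte strings
    from cStringIO import StringIO as BytesIO

stream = BytesIO(b'\x00\x01binary data')
header = stream.read(2)
```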

Unit-Testing

Now the biggest problem I had with switching to 3.x: the unittests. First of all: do not use doctest. There is a doctest converter in 2to3, but it does not give you much. Error messages changed, reprs changed in ways it cannot properly pick up, and nested tracebacks cause a lot of grief and are hard to debug. I was playing with the idea of writing a tool that automatically converts doctests to unittests, but I was too lazy and converted the few I had in my code to unittests by hand. Furthermore, the few doctests left (used as code examples in the documentation) are only tested if the testsuite is invoked from Python 2.x.

Nosetest has 3.x support in a separate branch, py.test has supported 3.x for a while now and the builtin unittest does the trick as well. I personally converted all my Jinja 2 tests to unittest lately. If you are using unittest you can point distribute to your test suite function and it will run the tests for you if you run python setup.py test. This even runs 2to3 for you if you execute it with Python 3, which is very helpful.
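A test suite function for that purpose can be as small as the sketch below. The module path yourlibrary.tests and the test case are made up; the assumption is that the setup script names the function via the test_suite keyword of setup():

```python
import unittest


class ExampleTestCase(unittest.TestCase):

    def test_upper(self):
        self.assertEqual('jinja'.upper(), 'JINJA')


def suite():
    """Collect all test cases of the package into one suite."""
    tests = unittest.TestSuite()
    tests.addTest(
        unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTestCase))
    return tests

# Assumed entry in setup.py:
#   setup(..., test_suite='yourlibrary.tests.suite')
```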

Hope that helps you port your libraries to Python 3. I would love to hear about your experiences, because even if Python 3 did not work out as some of us hoped, it is very important that we continue to port libraries over to 3.x.