Archive for March, 2008

High Level AST Module for Python

March 30th, 2008

Georg already blogged about the new AST-to-code compilation feature of Python, so I won’t cover that any more. Basically, the old compiler package is now superseded by Python’s real internal AST, which is freaking awesome :)

The only thing missing is some sort of high-level interface to it that makes manipulating and debugging it easier. The motivation is that template engines such as Genshi, Jinja or Mako, which all operate on the AST, shouldn’t have to write so much boilerplate code just to modify small pieces of it.

I don’t know if this module will make it into the standard library, but even if not, it will surely be available via the Cheeseshop. So what does it do? It provides classes and utility functions to manipulate or traverse the AST in a way that is actually useful for real-world applications. For example, with the transformer it’s now possible to replace and remove nodes. Additionally, it can generate Python source code from an AST to simplify debugging.
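A minimal sketch of replacing and removing nodes, written against the `ast` module in today’s standard library (Python 3.9 or later, for `ast.unparse`) rather than the sandbox version described here. `StripAsserts` and `DoubleIntegers` are hypothetical transformers made up for illustration: returning `None` from a `visit_*` method removes the node, returning another node replaces it.

```python
import ast


class StripAsserts(ast.NodeTransformer):
    """Hypothetical transformer: remove every ``assert`` statement."""

    def visit_Assert(self, node):
        # Returning None deletes the node from the statement list it lives in.
        return None


class DoubleIntegers(ast.NodeTransformer):
    """Hypothetical transformer: replace every integer literal n with 2 * n."""

    def visit_Constant(self, node):
        if type(node.value) is int:
            # Returning a different node replaces the original one.
            return ast.copy_location(ast.Constant(node.value * 2), node)
        return node


def rewrite(source):
    tree = ast.parse(source)
    tree = DoubleIntegers().visit(StripAsserts().visit(tree))
    ast.fix_missing_locations(tree)
    # Turn the modified tree back into source code, which helps with debugging.
    return ast.unparse(tree)


print(rewrite("x = 1\nassert x > 0\ny = x + 20\n"))
```

For that input it prints `x = 2` followed by `y = x + 40`: the assert is gone and both literals were replaced.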

Here is a small example of a transformer that brings Genshi-like loop behavior to a piece of Python source code:

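import ast


class GenshiSemantics(ast.NodeTransformer):

    def context(self, method):
        # Build the statement data.<method>(). Python 2.6 signature:
        # Call(func, args, keywords, starargs, kwargs)
        return ast.Expr(ast.Call(
            ast.Attribute(ast.Name('data', ast.Load()), method, ast.Load()),
            [], [], None, None
        ))

    def visit_For(self, node):
        self.generic_visit(node)
        # Returning a list replaces the loop with push(), the loop, pop().
        return [self.context('push'), node, self.context('pop')]

    def visit_Name(self, node):
        self.generic_visit(node)
        # Rewrite every name to data['name'], keeping its Load/Store context.
        return ast.Subscript(
            ast.Name('data', ast.Load()),
            ast.Index(ast.Str(node.id)),
            node.ctx
        )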

When the transformer is applied, this piece of code:

seq = [1, 2, 3, 4, 5]
item = 42
for item in seq:
    print 'inside', item
print 'outside', item

becomes

data['seq'] = [1, 2, 3, 4, 5]
data['item'] = 42
data.push()
for data['item'] in data['seq']:
    print 'inside', data['item']
data.pop()
print 'outside', data['item']

The source code of the module is in the Pocoo sandbox, as usual. Unfortunately, there is currently a bug in the initialization function of the AST node classes, so you will need a patch for it. And obviously this requires Python 2.6.

The full source code for the example above can be found here.
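For comparison, a sketch of the same transformer ported to the `ast` module of current Python 3 (3.9 or later, for `ast.unparse`), so it can be tried without the 2.6 patch. This is not the sandbox code, and two things differ by assumption: `ast.Call` no longer has the starargs/kwargs slots, and because `print` is an ordinary name in Python 3, the sketch leaves builtin names alone instead of turning `print(...)` into `data['print'](...)`.

```python
import ast
import builtins


class GenshiSemantics(ast.NodeTransformer):

    def context(self, method):
        # Build the statement ``data.<method>()``.
        return ast.Expr(ast.Call(
            func=ast.Attribute(ast.Name('data', ast.Load()), method, ast.Load()),
            args=[],
            keywords=[],
        ))

    def visit_For(self, node):
        self.generic_visit(node)
        # Returning a list splices several statements in place of the loop.
        return [self.context('push'), node, self.context('pop')]

    def visit_Name(self, node):
        # Assumption for this port: leave builtins such as print untouched.
        if hasattr(builtins, node.id):
            return node
        # Rewrite ``name`` to ``data['name']``, keeping the Load/Store context.
        return ast.copy_location(ast.Subscript(
            ast.Name('data', ast.Load()),
            ast.Constant(node.id),
            node.ctx,
        ), node)


def transform(source):
    tree = GenshiSemantics().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


print(transform(
    "seq = [1, 2, 3, 4, 5]\n"
    "item = 42\n"
    "for item in seq:\n"
    "    print('inside', item)\n"
    "print('outside', item)\n"
))
```

The printed result matches the rewritten code above, except that the print statements are now print() calls.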

Lame Pun

Schindlers Lifte (Schindler’s lifts), you simply have to blog that one ;-)

Jinja 3K

March 27th, 2008

So Python 3 is getting ready, and looking at all the libraries I maintain, I was really afraid of the final release. With all those language changes, porting sounded like a horrible job. But there is that nice 2to3 script, which is supposed to transform source code from Python 2 to Python 3 in a semi-automatic manner. And what can I say? It kicks ass.

Getting the current hg head of Jinja running on top of Python 3 was a matter of roughly 15 minutes. The only things I had to change were adding a missing sys import that came from the intern -> sys.intern translation and fixing a couple of sort() calls: sort() no longer accepts a cmp function, so you have to use a key function instead. That was easy to solve, and the cmp usage was only there for Python 2.3 backwards compatibility anyway.

The other minor thing I had to change was unicode behavior. Previously Jinja had an env.to_unicode function that coerced bytestrings and unicode strings into unicode strings depending on the encoding setting of the environment. With Python 3 everything uses unicode internally, so there is no need for an environment charset anyway, and I replaced env.to_unicode with str everywhere. The other unicode change was replacing file(foo, 'r').read().decode(charset) with open(foo, encoding=charset).read() (pseudocode).
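A small sketch of both changes in Python 3 terms. The word list and `compare_by_length` are made up for illustration, and the file is a throwaway temporary file:

```python
import os
import tempfile
from functools import cmp_to_key


def compare_by_length(a, b):
    """An old-style cmp function: negative, zero or positive."""
    return len(a) - len(b)


words = ['jinja', 'mako', 'genshi', 'py']

# Python 3: sort() no longer accepts cmp=, so either wrap the old function ...
by_cmp = sorted(words, key=cmp_to_key(compare_by_length))
# ... or, better, express the same ordering as a key function.
by_key = sorted(words, key=len)
print(by_cmp)  # ['py', 'mako', 'jinja', 'genshi']

# Reading text: decode bytes by hand (Python 2 style) vs. letting open() decode.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write('Grüße'.encode('utf-8'))

with open(path, 'rb') as f:
    old_style = f.read().decode('utf-8')
with open(path, encoding='utf-8') as f:
    new_style = f.read()
os.remove(path)

print(old_style == new_style)  # True
```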

That’s it! Well, I must admit that Jinja is by far the easiest to port of the three templating languages I use (Jinja, Mako, Genshi), as it has its own parser and does not rely on the now-removed compiler module, which was superseded by the new _ast with a different structure. So how well the conversion goes clearly depends on what your library uses. For most libraries 2to3 will do a very good job.

Oh, what it also doesn’t convert are C extensions :) I don’t know whether the Jinja traceback tools or the speedups module work on Python 3 too, as they are optional and I was too lazy to compile them, but that’s another thing to keep in mind. Also, I was unable to run the Jinja test suite, as py.test doesn’t work on Python 3 so far.

But I’m very happy. If the process stays that simple, my forecast of not using Python 3 before 2012 may be wrong. So go on, port your libraries now, at least for testing purposes, and let’s make some of the best ones available right at the Python 3.0 release, which is scheduled for September. If enough libraries work from the start for some kinds of applications, that improves Python’s chances of quick adoption a lot. I know it’s very unlikely for big projects to switch in a short timeframe, but at least smaller projects will be able to benefit from Python 3 earlier.

And even if Guido disagrees: Python 3 is *the* chance to break *small pieces* of the API. Don’t break them in a way that leaves everybody confused and not knowing how things work any more. But if there are design flaws in your library that you want to fix, fix them together with the move to Python 3 rather than between two Python 2 versions. (And of course document them!)

Genshi Slot @ GSoC 2008

March 26th, 2008

The TurboGears project has been accepted as a mentoring organization for the 2008 Google Summer of Code. Even if you’re not interested in TurboGears because your framework of choice is something else, you might still be interested in this one, as it includes two Genshi project ideas: performance and Jython compatibility.

If you have solid knowledge of XML/HTML and Python and you’re looking for an interesting GSoC project, read Christopher’s blog post about it and go for it :-)

Sphinx Python Documentation Tool Released

March 21st, 2008

As mentioned in an earlier blog post, I’m not a fan of fully automatically generated documentation, as I think it’s clearly the wrong way to solve documentation problems. Whenever I encounter epydoc-generated documentation and find out that it’s the only kind of documentation there is, I run away :)

In the past there have been two kinds of documentation in the Python world: handwritten documentation (Django, for example) and fully automated API documentation (Paste’s). Both have advantages and disadvantages for users and developers, but neither was perfect, at least for me. Django’s documentation is one of the best I have ever encountered, but writing such documentation yourself is painful. I tried to do something similar with Werkzeug and Jinja and it sucks. One reason is that you usually improve the code first and add the documentation later on, often forgetting about it. So it’s not unlikely that the documentation differs slightly from the actual implementation because someone (usually me) forgot to “sync” them.

With Werkzeug 0.2 I experimented with combining those two approaches. I added some directives to docutils that pulled docstrings automatically from the objects and inserted them into the rst file. That way the documentation stays perfectly in sync with the source code, and because the members are listed explicitly I can hide implementation details and private functions. However, the Werkzeug documentation builder was and is a hack and was never meant to be used by anyone except the Werkzeug project, and even there only for one release. The reason is that Georg Brandl spent the last couple of weeks rewriting and improving parts of the Python documentation tool, which powers the new Python 2.6 and 3.0 docs, so that it supports non-CPython projects too.
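The idea of pulling docstrings only for explicitly listed members can be sketched with the standard `inspect` module. This is not the Werkzeug builder or a docutils directive, just a hypothetical `autodoc()` helper that produces reStructuredText-like text for a made-up `Request` class:

```python
import inspect


def autodoc(cls, members):
    """Render a reST-style section from the docstrings of the listed members.

    Only the names in ``members`` are documented, so private helpers and
    implementation details stay out of the generated text.
    """
    lines = [cls.__name__, '=' * len(cls.__name__), '', inspect.getdoc(cls) or '', '']
    for name in members:
        member = getattr(cls, name)
        lines.append('.. method:: %s%s' % (name, inspect.signature(member)))
        lines.append('')
        # Indent the docstring so it becomes the body of the directive.
        for doc_line in (inspect.getdoc(member) or '').splitlines():
            lines.append('   ' + doc_line)
        lines.append('')
    return '\n'.join(lines)


class Request:
    """Hypothetical request object used only for this example."""

    def get_header(self, name, default=None):
        """Return the header ``name``, or ``default`` if it is missing."""

    def _parse_cookies(self):
        """Internal helper that should not show up in the documentation."""


print(autodoc(Request, ['get_header']))
```

Because the docstrings are read from the live objects, editing a docstring updates the generated text on the next run, while `_parse_cookies` stays hidden simply by not being listed.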

The resulting library, Sphinx, is in my opinion the best general-purpose documentation tool since sliced bread! It’s intended for handwritten documentation and builds it into standalone HTML files, HTML Help (CHM) files, LaTeX, or pickles that can be used to display the documentation in a WSGI application.

It uses reStructuredText as the markup language for all documentation and supports syntax highlighting of code blocks via Pygments, embedded doctests (so you can extend your test suite with the doctests from your documentation), automatic object documentation by including the docstrings of listed objects (semi-automated documentation, as I did with Werkzeug), automatic cross-linking, index generation, changelog generators, many custom roles and directives for rst, and much more.
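The idea behind doctests in documentation can be tried with the standard library’s `doctest` module, which finds `>>>` examples in any text, including a reST snippet, and checks their output. This is a hypothetical sketch that drives `doctest.DocTestParser` directly, not Sphinx’s own doctest support, and `escape()` is a made-up function being documented:

```python
import doctest

# A fragment of reStructuredText documentation with an embedded example.
DOC = """
Escaping
--------

``escape`` replaces the HTML special characters ``&``, ``<`` and ``>``::

    >>> escape('<b>Jinja & Werkzeug</b>')
    '&lt;b&gt;Jinja &amp; Werkzeug&lt;/b&gt;'
"""


def escape(text):
    """Hypothetical function being documented: a minimal HTML escaper."""
    return text.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')


def run_doc_examples(text, globs):
    """Run every >>> example found in ``text``; return doctest's TestResults."""
    parser = doctest.DocTestParser()
    # Copy globs: the runner clears the namespace it was given afterwards.
    test = parser.get_doctest(text, dict(globs), 'documentation', None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)


print(run_doc_examples(DOC, {'escape': escape}))
```

If the documented behavior and the code drift apart, the example fails and the runner prints a report, which is exactly the “sync” problem described above.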

Although this is the first release, it’s already very well documented and well tested, and it’s used for the Python documentation itself. While I hate the word “framework”, I think Sphinx could be just that for documentation tools. The extension API can be used to add missing features and may also be used for more automated documentation generation in the future.

Unfortunately I haven’t been very active in the implementation of Sphinx so far, so I’m clearly the wrong person to ask about its development direction, but I’m sure Georg will answer your questions. You can contact him in #pocoo on irc.freenode.net or via e-mail at georg guesswhichcharcomeshere python dot org.


There’s Music, And then there is Progressive

March 17th, 2008

Many of you probably already know that I’m one of those metalheads, mainly because the number of concept albums is a lot higher than in other genres and most of the songs are very technical and tell a story. There is one metal subgenre I like most: progressive metal. There are tons of bands in that genre, and some of them lean toward progressive rock (or the other way round, of course). I would compare progressive metal more with jazz and classical music than with thrash metal or death metal (both subgenres I like!), which are what people think of when they hear about metal.

But I think progressive metal is one of the less extreme genres that you still usually don’t hear on the radio. The reason is probably that the average song is far longer than the usual four minutes and that the songs don’t make much sense unless you listen to them in the context of the rest of the album. Whenever I play selected songs to friends who don’t normally listen to progressive, the reaction is: “hmm. sounds not that bad, but I wouldn’t listen to that myself.” And yes, progressive metal is not the kind of music you put on in the background. It’s the kind of music you listen to the way you read a book.

If you are a metal fan and don’t know one of the albums below: visit the pirate bay, download it and listen to it (and if you like it, buy the CD, send the musicians money, whatever). The following list is my all-time favorite collection of progressive metal albums.

Pain of Salvation’s One Hour by the Concrete Lake tells the story of a man who works in the weapons industry and begins to have doubts about the morality of his profession. He realizes that he’s just a part of a big “machine”. The album then follows him on his voyage around the world, which finally ends at Lake Karachay, the concrete lake, a lake in the former USSR where so much nuclear waste was dumped over the past fifty years that standing on its shore for one hour would be lethal.

It’s hard for me to say which of the POS albums is the best, but “Concrete Lake” is one I can listen to over and over again, from both a philosophical and a musical point of view.

Dream Theater’s Octavarium is a tricky beast. While it’s not a concept album that tells a story, it certainly has a concept (a very complex one) behind it. It starts with the fact that countless things in the artwork and the songs somehow have to do with the number eight, and continues with ascending keys across the eight songs: the first song is in F, the negative time of the second one in F#, the second song in G, the negative time of the third in G#, and so on, all the way up to the next F, which completes the octave and the album. You can read a detailed analysis of the album at spatang.com.

It’s hard for me to pick a single Dream Theater album, as all of them are incredible pieces of music, but Octavarium is one of my personal favorites and every song on it is totally different.

Opeth’s Still Life is a great death metal/progressive metal crossover concept album telling the story of a young man who discovers that Christianity isn’t exactly what he thought it would be and gets expelled from his home town. Fifteen years later he returns, looking for his former love “Melinda”, only to find out that she has become a nun in the meantime. He asks her to come with him, and she tells him that she still loves him but will not break her promise to the church. Later she’s killed by a soldier, and the protagonist, in a rage, kills the soldier and everyone he meets until he breaks down in total exhaustion. It ends with him being executed, and right before his death he thinks he sees Melinda once more, waving and watching him.

I think that album was Opeth’s first real success, and if you like death metal elements it’s an incredibly good one. Whether you like the story or not is a completely different matter, but that shouldn’t stop you from enjoying the music.

Ayreon’s 01011001 is my personal favorite among Ayreon’s albums, but probably not the best place to start listening to Ayreon, as the album partly continues the story of earlier albums. The title actually reads Y (binary 01011001 is 89, the ASCII code for Y), and the album tells the story of the planet Y and the seafaring ‘forever’, who lost their emotions and sent their DNA to Earth in order to rediscover them. Their comet wipes out the dinosaurs and seeds humanity. The experiment fails, however, as humanity becomes more and more dependent on technology over time, and in the end all attempts to save them fail. The ‘forever’ leave the planet together with the migrator from an earlier Ayreon album.
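The arithmetic behind the title, checked in Python:

```python
title = '01011001'
code_point = int(title, 2)          # read the title as a base-2 number
print(code_point, chr(code_point))  # 89 Y
```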

It’s my favorite Ayreon album mainly because the synthesizers convey a very “spacey” sound and you can feel the story. Additionally, the singers on the album (among others Hansi Kürsch from Blind Guardian, Daniel Gildenlöw from Pain of Salvation, Floor Jansen from After Forever and many more) are split into humans and ‘forever’, and their duets and roles match their voices and the characters in the story perfectly.

Blind Guardian’s Nightfall in Middle-Earth is, strictly speaking, more power metal than progressive metal, but it is an awesome concept album. It is based on Tolkien’s “The Silmarillion”, a book of tales from the First Age of Middle-earth. The album contains not only songs but also spoken parts narrating parts of the story. A true masterpiece and a great album to listen to while playing Warhammer.

There are tons of good concept albums and progressive metal bands, but time is short and I could only pick a few of them. As I said earlier: if you like metal but have never “tasted” progressive metal, give it a try :-)

Draconian Error Handling in XML

Mark found a broken blog; I have a nicely broken XHTML page straight from the W3C. Screenshot after the jump.
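“Draconian” refers to the rule in the XML 1.0 specification that a well-formedness error is a fatal error: the parser has to report it and must not continue normal processing, so a single unclosed tag makes the whole document unusable. A small sketch with the standard library’s `xml.etree.ElementTree`; the two XHTML snippets and `is_well_formed()` are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Well-formed XHTML fragment: the <br/> element is properly closed.
good = '<p xmlns="http://www.w3.org/1999/xhtml">Hello <br/> world</p>'
# Typical tag-soup mistake: <br> is never closed, so this is not well-formed XML.
bad = '<p xmlns="http://www.w3.org/1999/xhtml">Hello <br> world</p>'


def is_well_formed(markup):
    """Return True if the XML parser accepts ``markup``, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as exc:
        print('rejected:', exc)
        return False
    return True


print(is_well_formed(good))
print(is_well_formed(bad))
```

An HTML browser would happily render the second snippet; an XML parser refuses it outright.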

We all love XHTML, don’t we? And yes, I know that this blog’s XML is equally broken, but seriously, blame WordPress, not me.

Multiple Inheritance Considered Awesome

March 3rd, 2008

I must admit that I was not very interested in Python 3, mostly because I maintain a couple of libraries and all that backwards-incompatible stuff looks like a pain in the ass. Additionally, a lot of the stuff in the PEPs looks like it completely changes the way the language behaves. But I was wrong.

One of the things I always liked about Python was the ability to use multiple inheritance. Back in the Pocoo days we started using interfaces in a Zope/Trac-like manner and then noticed that we could do the very same with simple base classes and isinstance calls. Yes, of course you can abuse that and create inheritance trees nobody wants to look at, but seriously, that’s your fault then.

But what always confused me was that nobody used Ruby-like mixins. Multiple inheritance is predestined for mixing in functionality, but the only class in the standard library that did something like that was DictMixin. And that mixin had another problem: it was not a subclass of dict (obviously). Now, many libraries do something like isinstance(foo, dict) to switch between modes. A very common situation where this is necessary is when you want to accept either dicts or iterables of tuples. This is a situation where duck typing doesn’t really work out. Of course you can do hasattr(x, 'items') to check whether an object implements the dict interface, but that is ugly and can lead to unexpected behavior. For example: what counts as a dict? There are ancient dict-like implementations that lack the __contains__ method and only have has_key.
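A sketch of the problem, using a hypothetical dict-like `HeaderDict` that does not inherit from `dict` and a hypothetical `to_pairs()` helper that accepts either a dict or an iterable of `(key, value)` tuples:

```python
class HeaderDict:
    """Hypothetical dict-like object that does not subclass dict."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        # Like a real dict, iterating yields the keys only.
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def items(self):
        return self._data.items()


def to_pairs(value):
    """Accept a dict or an iterable of (key, value) tuples."""
    if isinstance(value, dict):
        return sorted(value.items())
    return sorted(value)


print(to_pairs({'a': 1}))              # [('a', 1)]
print(to_pairs([('a', 1)]))            # [('a', 1)]
print(to_pairs(HeaderDict({'a': 1})))  # ['a'], keys only: silently wrong
```

The dict-like object fails the `isinstance(value, dict)` test, falls into the tuple branch and produces a wrong result without any error.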

In many languages without multiple inheritance this is solved by specifying an IMapping interface and implementing it in custom classes. But of course Python can do better, and with Python 3 it finally does. And it does so in an incredibly cool way that integrates nicely into the language and doesn’t break the Zen of Python, which is freaking awesome.

So what’s the solution in Python 3? Abstract base classes! So how do they work? As I already said above, Python developers used to rely on two things: duck typing (testing whether an object implements a specific method) or instance checks against specific types. Abstract base classes do both in a clever way. The behavior of the builtin isinstance and issubclass functions can now be overridden via __instancecheck__ and __subclasscheck__ methods on the metaclass. While I doubt that anyone will override those by hand, there is some cool metaclass magic going on in the abstract base classes that does it for you.
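To show what the hook itself does, here is a deliberately silly sketch of overriding it by hand: a hypothetical `Even` class whose metaclass makes `isinstance()` accept any even integer, even though `int` has nothing to do with `Even`:

```python
class EvenMeta(type):
    def __instancecheck__(cls, instance):
        # isinstance(x, Even) ends up here instead of walking x's class hierarchy.
        return isinstance(instance, int) and instance % 2 == 0


class Even(metaclass=EvenMeta):
    """Hypothetical 'type' of all even integers."""


print(isinstance(4, Even), isinstance(7, Even), isinstance('4', Even))  # True False False
```

`issubclass()` works the same way through `__subclasscheck__`; the abstract base classes implement both in their metaclass so that you normally never have to write this yourself.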

So an abstract base class isn’t necessarily a base class of the object you are testing against, but it can be. Let me give you a small example. In Python 2.4, testing whether an object is iterable worked like this:

try:
    iter(obj)
except TypeError:
    do_something_with_not_iterable_object(obj)
else:
    do_something_with_iterable_object(obj)

That wasn’t that bad, and it has the advantage that we don’t have to implement some IIterable interface in all the iterable things: to check whether an object is iterable, a call to iter() is enough. But with Python 3 we also have an abstract base class called Iterable, which we can now use for testing:

from collections import Iterable
if isinstance(obj, Iterable):
    the_object_is_iterable()
else:
    the_object_is_not_iterable()

But how does that work? The obj does not necessarily inherit from that class. As said above, the metaclass of the abstract base class Iterable (ABCMeta) customizes the check and does the work for us: it looks for an __iter__ method on the object’s class and returns True if it finds one, so that we can react to it. (Objects that only support the older __getitem__-based sequence iteration protocol are not detected this way; for those, calling iter() remains the reliable test.)
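A sketch of that check in current Python, where the ABCs live in `collections.abc` (the aliases in `collections` were removed in Python 3.10). `Countdown` and `OldStyleSequence` are hypothetical classes, the second one showing the limit of the check; the hypothetical `Closable` ABC uses the same `__subclasshook__` mechanism to recognize any class that defines `close()`:

```python
import io
from abc import ABC, abstractmethod
from collections.abc import Iterable


class Countdown:
    """Defines __iter__ but never inherits from Iterable."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return iter(range(self.start, 0, -1))


class OldStyleSequence:
    """Only supports the old __getitem__ iteration protocol."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return index


class Closable(ABC):
    """Hypothetical ABC: anything with a close() method counts."""

    @abstractmethod
    def close(self):
        ...

    @classmethod
    def __subclasshook__(cls, C):
        # Structural check, consulted by ABCMeta's isinstance/issubclass logic.
        if cls is Closable and any('close' in B.__dict__ for B in C.__mro__):
            return True
        return NotImplemented


print(isinstance(Countdown(3), Iterable))        # True, found via __iter__
print(Iterable in Countdown.__mro__)             # False, no real inheritance
print(isinstance(OldStyleSequence(), Iterable))  # False, but ...
print(list(OldStyleSequence()))                  # ... iteration still works: [0, 1, 2]
print(isinstance(io.StringIO(), Closable))       # True
```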

Additionally, the metaclass of the abstract base classes keeps a registry of classes that provide a compatible interface. This makes it possible for isinstance(some_dict, Mapping) to return True quickly, just by checking the object’s type against the classes that were registered for the abstract base class.

This is done, for example, for the builtin types. In the Python module that defines the ABCs you can find this line:

MutableMapping.register(dict)

Imagine you wrote your own C extension that implements a cool linked list implementation. All you have to do to register your list as Sequence is this piece of code:

from _yourlinkedlist import YourLinkedList
from collections import Sequence
Sequence.register(YourLinkedList)
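Since `_yourlinkedlist` is hypothetical, here is the same registration with a small pure-Python stand-in (also hypothetical), using the current `collections.abc` location. One detail worth knowing: `register()` only makes `isinstance()` and `issubclass()` succeed; unlike subclassing, it does not add mixin methods such as `index()` or `count()`:

```python
from collections.abc import Sequence


class LinkedList:
    """Hypothetical stand-in for a C-implemented linked list."""

    def __init__(self, items=()):
        self._head = None
        for item in reversed(list(items)):
            self._head = (item, self._head)  # cons cell: (value, next)

    def __iter__(self):
        node = self._head
        while node is not None:
            yield node[0]
            node = node[1]

    def __len__(self):
        return sum(1 for _ in self)

    def __getitem__(self, index):
        # Linear lookup; non-negative indexes only, to keep the sketch short.
        for position, value in enumerate(self):
            if position == index:
                return value
        raise IndexError(index)


Sequence.register(LinkedList)

numbers = LinkedList([1, 2, 3])
print(isinstance(numbers, Sequence))  # True, via the registry
print(hasattr(numbers, 'index'))      # False, register() adds no methods
```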

But that’s just one way to use abstract base classes. Their nicest feature is that they ship a lot of annoying, repetitive boilerplate code. For example, in Werkzeug I wrote that nice HeaderSet, which is basically an ordered, case-insensitive set. But I didn’t implement __and__ and all the other set operations, because I’m a) lazy and b) doubt that anyone will seriously miss them for that object’s use case. But what if the set behavior came for free just by subclassing Set? That’s what’s now possible in Python 3*:

from collections import Set

class HeaderSet(Set):

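    def __init__(self, initial=()):
        # Consume the iterable only once and skip case-insensitive duplicates,
        # keeping the first spelling for iteration order.
        self._ordering = []
        self._storage = set()
        for item in initial:
            if item.lower() not in self._storage:
                self._storage.add(item.lower())
                self._ordering.append(item)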

    def __contains__(self, x):
        return x.lower() in self._storage

    def __iter__(self):
        return iter(self._ordering)

    def __len__(self):
        return len(self._ordering)

    def __le__(self, other):
        return self._storage.__le__(other)

* Actually, I lied here: right now it’s not possible. The basics work, but as soon as I do foo & bar I get either a NameError (itertools not defined) or a TypeError because __rle__ is not defined. But that’s why it’s an alpha ;-)

And what does this have to do with multiple inheritance? In that example it might not be obvious, but have a look at the class graph for that HeaderSet:

Set(Sized, Iterable, Container)
    HeaderSet
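In current Python the ABCs live in `collections.abc`, and there the set operators from the footnote do work. A sketch of the same idea, assuming the class only needs the three abstract methods of `Set`; everything else, including `<=`, `==`, `&`, `|` and `-`, comes from the mixin, and the inherited bases from the graph above can be checked directly:

```python
from collections.abc import Container, Iterable, Set, Sized


class HeaderSet(Set):
    """Ordered, case-insensitive set of header values (a sketch)."""

    def __init__(self, initial=()):
        self._ordering = []
        self._storage = set()
        for value in initial:
            # Keep the first spelling seen; compare case-insensitively.
            if value.lower() not in self._storage:
                self._storage.add(value.lower())
                self._ordering.append(value)

    def __contains__(self, value):
        return value.lower() in self._storage

    def __iter__(self):
        return iter(self._ordering)

    def __len__(self):
        return len(self._ordering)


vary = HeaderSet(['Accept', 'Cookie', 'accept'])
print(list(vary))                              # ['Accept', 'Cookie']
print(list(vary | HeaderSet(['User-Agent'])))  # ['Accept', 'Cookie', 'User-Agent']
print(list(vary - HeaderSet(['COOKIE'])))      # ['Accept']
print(all(issubclass(HeaderSet, abc) for abc in (Sized, Iterable, Container)))  # True
```

The mixin builds results through the class constructor, so `|` and `-` return new `HeaderSet` instances with the same case-insensitive behavior.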

Very cool stuff. So what can this be used for? Specifying otherwise unspecified protocols. For example, it was a common idiom in Python 2.x to accept file-like objects by duck typing. In Django, for example, it’s pretty common to do something like this:

response = HttpResponse(mimetype='image/png')
my_pil_image.save(response)
return response

But it was pretty much unspecified what PIL calls on that response object. Just write()? write() and writelines()? Does it seek()? Now we have io.IOBase, io.BufferedIOBase and a lot more. This is great stuff and, seriously, I can’t wait to port my stuff to Python 3.
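A sketch of what the io ABCs buy you, assuming a hypothetical `ResponseBody` class standing in for a response object: by subclassing `io.RawIOBase` and implementing only `writable()` and `write()`, it gets an interface that `isinstance()` can check, plus inherited extras such as `writelines()`, `closed` and context-manager support. It does not show what PIL itself calls.

```python
import io


class ResponseBody(io.RawIOBase):
    """Hypothetical write-only response body that collects bytes in memory."""

    def __init__(self):
        super().__init__()
        self._chunks = []

    def writable(self):
        return True

    def write(self, data):
        self._chunks.append(bytes(data))
        return len(data)

    def getvalue(self):
        return b''.join(self._chunks)


body = ResponseBody()
body.write(b'\x89PNG')
body.writelines([b'...', b'image data'])  # inherited from io.IOBase
print(isinstance(body, io.IOBase), body.readable(), body.seekable())  # True False False
print(body.getvalue())
```

Because `readable()` and `seekable()` report False, a consumer can ask the object what it supports instead of guessing from which attributes happen to exist.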

NIN GHOSTS OMG WTF

March 3rd, 2008

Normally I try to avoid posting what is on digg, because nobody is interested in it any more once it’s up there, but I could not resist this time. NIN’s Ghosts is available as a download. What’s so awesome about it? It’s CC-NC licensed, Trent Reznor uploaded the first CD himself to the pirate bay, and it’s $5 for the whole freaking download in FLAC, MP3 or Apple Lossless, or $10 for the non-download version. If you have more money you can buy the $75 CD bundle that comes with the artwork, a DVD with the multitracks and a Blu-ray disc in high-def stereo. And then there is the obligatory special edition for $300 that includes four vinyl records.

To quote the README in the torrent:

Nine Inch Nails: Ghosts I (2008)

This torrent is an official upload from Nine Inch Nails.

We’re very proud to present a new collection of instrumental music, Ghosts I-IV.
Almost two hours of music recorded over an intense ten week period last fall, Ghosts I-IV sprawls Nine Inch Nails across a variety of new terrain.

Now that we’re no longer constrained by a record label, we’ve decided to personally upload Ghosts I, the first of the four volumes, to various torrent sites, because we believe BitTorrent is a revolutionary digital distribution method, and we believe in finding ways to utilize new technologies instead of fighting them.

We encourage you to share the music of Ghosts I with your friends, post it on your website, play it on your podcast, use it for video projects, etc. It’s licensed for all non-commercial use under Creative Commons.

We’ve also made a 40 page PDF book to accompany the album. If you’d like to download it for free, visit http://ghosts.nin.com/main/pdf

Ghosts I is the first part of the 36 track collection Ghosts I-IV. Undoubtedly you’ll be able to find the complete collection on the same torrent network you found this file, but if you’re interested in the release, we encourage you to check it out at ghosts.nin.com, where the complete Ghosts I-IV is available directly from us in a variety of DRM-free digital formats, including FLAC lossless, for only $5. You can also order it on CD, or as a deluxe package with multitrack audio files, high definition audio on Blu-ray disc, and a large hard-bound book.

We genuinely appreciate your support, and hope you enjoy the new music. Thanks for listening.

http://ghosts.nin.com

cogitations driven by wordpress