Pages tagged as ‘python’

Simple batch function for Python

Often I have an iterable I want to group, for example a list of integers that I want to process two at a time. Here is a pretty nice idiom I found in the documentation, translated to itertools:

from itertools import izip, repeat

def batch(iterable, n):
    return izip(*repeat(iter(iterable), n))

Use it like this:

>>> for key, value in batch([1, 2, 3, 4], 2):
...  print key, value
... 
1 2
3 4
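On Python 3, izip is gone and the builtin zip is already lazy, so the same idiom becomes the sketch below. The trick is that repeat() hands zip() the same iterator n times, so every tuple zip() builds consumes n consecutive items. A trailing group with fewer than n items is silently dropped; batch_padded is a hypothetical helper name for the variant that keeps it.

```python
from itertools import repeat, zip_longest


def batch(iterable, n):
    # repeat() hands zip() the *same* iterator n times, so each tuple
    # zip() builds pulls n consecutive items from it.
    return zip(*repeat(iter(iterable), n))


def batch_padded(iterable, n, fillvalue=None):
    # Hypothetical variant: keep the incomplete last group, padded.
    return zip_longest(*repeat(iter(iterable), n), fillvalue=fillvalue)


for key, value in batch([1, 2, 3, 4], 2):
    print(key, value)  # prints "1 2", then "3 4"

print(list(batch([1, 2, 3, 4, 5], 2)))         # [(1, 2), (3, 4)]
print(list(batch_padded([1, 2, 3, 4, 5], 2)))  # [(1, 2), (3, 4), (5, None)]
```

Python 3.12 also added itertools.batched(iterable, n), which keeps the short final group as a shorter tuple instead of padding it.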

rst2html + git == personal wiki

This Makefile:

RSTOPTS=--time --link-stylesheet --stylesheet=style.css

SOURCES=$(wildcard *.rst)
HTML=$(foreach file,$(SOURCES),_build/$(basename $(file)).html)

all: html

_build/%.html: %.rst
	rst2html.py $(RSTOPTS) $^ > $@

html: $(HTML)

clean:
	rm -f $(HTML)

plus make html in .git/hooks/post-{commit,update}, plus Python and docutils, plus a stylesheet in _build (all paths relative to your repository), is the perfect cross-platform wiki :-)
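What the pattern rule buys you is that make only reruns rst2html.py for pages whose .rst file is newer than the generated HTML. A rough Python sketch of that out-of-date check, assuming the same layout as the Makefile (sources next to it, output in _build); stale_pages is a hypothetical helper, not part of docutils or make:

```python
import os


def stale_pages(src_dir=".", build_dir="_build"):
    """Return (source, target) pairs that make would rebuild."""
    stale = []
    for name in sorted(os.listdir(src_dir)):
        if not name.endswith(".rst"):
            continue
        source = os.path.join(src_dir, name)
        # same mapping as the Makefile: foo.rst -> _build/foo.html
        target = os.path.join(build_dir, name[:-len(".rst")] + ".html")
        # rebuild if the target is missing or older than its prerequisite
        if (not os.path.exists(target)
                or os.path.getmtime(target) < os.path.getmtime(source)):
            stale.append((source, target))
    return stale
```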

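Notice: my blog kills the tabs, so after copy/pasting make sure the two recipe lines (rst2html.py and rm -f) are indented with a real tab character; make rejects spaces there.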

How super() in Python3 works and why it’s retarded

I’m deeply sorry for the title of that post, but I hope that gives the topic the awareness I think it should get. In the last few weeks something remarkable happened in the Python3 sources: self kinda became implicit. Not in function definitions, but in super calls. And not only self: also the class passed to super. That’s remarkable because it means that the language is shifting in a completely different direction.

super was rarely used in the past, mainly because it was awkward to use. In the most common use case the current class and the current instance were passed to it, and the super object it returned looked up the parent methods along the MRO for you. It was useful for multiple inheritance and for mixin classes that don’t know their parent, but it confused many people.
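To make the mixin case concrete, a small sketch using the explicit two-argument form (Python 3 syntax; the class names are made up for illustration). LoggingMixin never names Base, yet super(LoggingMixin, self) reaches it, because the lookup follows the MRO of the instance’s class:

```python
class Base:
    def greet(self):
        return ["Base"]


class LoggingMixin:
    # Never names Base: super(LoggingMixin, self) looks up the next class
    # in the MRO of type(self), whatever that turns out to be.
    def greet(self):
        return ["LoggingMixin"] + super(LoggingMixin, self).greet()


class Child(LoggingMixin, Base):
    def greet(self):
        return ["Child"] + super(Child, self).greet()


print([cls.__name__ for cls in Child.__mro__])  # ['Child', 'LoggingMixin', 'Base', 'object']
print(Child().greet())                           # ['Child', 'LoggingMixin', 'Base']
```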

The main problem with replacing super(Foo, self).bar() with something like super.bar() is that self is explicit and the class (Foo in that case) can’t be determined from the call itself. Furthermore, Python has always been against functions that do stack introspection to find their caller. There are few examples in the stdlib or builtins that do some sort of caller introspection: the special functions vars(), locals(), globals() and __import__, plus some functions in the inspect module. Four functions, and all of them do nothing more than get the current frame and access its dict of locals or globals. What super does in current Python 3 builds goes way beyond that.

Currently if super is called without arguments Python performs these steps:

  • getting the current frame of the caller as well as the code object.
  • looking at “co_argcount” to make sure there is a first argument. If there is one, it gets the object from the “f_localsplus” array on the frame object (btw, an attribute that is not accessible from Python code).
  • then it checks the “co_freevars” of the code object and iterates over all of them to check if one of them is “__class__” (because accessing __class__ in Python 3 creates a special bytecode that returns the class the function was defined in).
  • If it can’t find __class__ in there, it dies. How does __class__ end up there? Apparently the compiler checks whether “super” or “__class__” is accessed. That’s right: it breaks if you alias super to another name and call that name.
  • Once it has that information, it uses the class and the reference to self as the first two arguments.
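The steps above can be approximated in pure Python, which also shows how much of the work depends on the compiler. A sketch, assuming CPython (sys._getframe is an implementation detail): explicit_super is a hypothetical name, it reads the first argument through f_locals because f_localsplus is not exposed, and it still needs the __class__ cell that only the compiler can create. That is why Child.greet has to mention __class__ itself, and why Broken fails just like an aliased super would.

```python
import sys


def explicit_super():
    """Hypothetical pure-Python imitation of argument-less super()."""
    frame = sys._getframe(1)              # step 1: the caller's frame ...
    code = frame.f_code                   # ... and its code object
    if code.co_argcount == 0:             # step 2: is there a first argument?
        raise RuntimeError("explicit_super(): no arguments")
    # f_localsplus is not exposed, so read the value through f_locals
    first_arg = frame.f_locals[code.co_varnames[0]]
    if "__class__" not in code.co_freevars:   # steps 3 and 4
        raise RuntimeError("explicit_super(): __class__ cell not found")
    cls = frame.f_locals["__class__"]     # free variables show up in f_locals
    return super(cls, first_arg)          # step 5: class and self


class Base:
    def greet(self):
        return "Base"


class Child(Base):
    def greet(self):
        __class__  # only mentioned so the compiler creates the cell
        return "Child -> " + explicit_super().greet()


class Broken(Base):
    def greet(self):
        # no mention of __class__ or super, so there is no cell to find
        return explicit_super().greet()


print(Child().greet())  # Child -> Base
```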

I’m sorry, but that’s a very, very bad idea. It’s way more magical than anything we’ve had in Python before and just doesn’t fit into the language. We do have an explicit self in methods, and we don’t really have methods: our methods are functions, and a descriptor just wraps a method object around them to pass self as the first argument. That’s an incredibly cool thing and makes things very simple and non-magical. Breaking that principle with an automatic super harms the whole thing a lot. Defs in classes are not fundamentally different from defs in the global scope or within another def.
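The “functions plus a descriptor” model can be checked directly in Python 3. A minimal sketch (class C is illustrative only):

```python
class C:
    def method(self):
        return self


obj = C()
func = C.__dict__["method"]  # the plain function stored in the class body

# Accessing it through the class gives back the very same function
# (Python 2 would have wrapped it in an unbound method here).
assert C.method is func

# Accessing it through an instance calls the function's descriptor hook,
# which wraps it in a bound method that remembers obj as self.
bound = func.__get__(obj, C)
print(type(func).__name__, type(bound).__name__)  # function method
assert bound.__func__ is func and bound.__self__ is obj
assert bound() is obj and obj.method() is obj
```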

Another odd thing is that Python 3 starts keeping information on the C layer that we can’t access from within Python, which is a shame. Super is one example: it’s currently impossible to implement it from within Python. The other good example in Python 3 is methods: functions accessed via their class no longer get wrapped in an unbound method. As such this is not a problem, as you can call them the same way (except that you can now call them with completely different receivers), but it becomes a problem if some of the functions are marked as staticmethods. Then they look exactly the same when viewed from the class’s perspective:

>>> class C:
...  normal = lambda x: None
...  static = staticmethod(lambda x: None)
... 
>>> type(C.normal) is type(C.static)
True
>>> C.normal
<function <lambda> at 0x4da150>

As far as I can see a documentation tool has no chance to keep them apart even though they are completely different on an instance:

>>> type(C().normal) is type(C().static)
False
>>> C().normal
<bound method C.<lambda> of <__main__.C object at 0x4dbcf0>>
>>> C().static
<function <lambda> at 0x4da198>

While I was quite happy with the Python 3 progress so far, these two things are a major, major step in the wrong direction. I really hope they will be rolled back. If there is a need for an automatic super, either self has to go away and __class__ has to become a free variable all the time, or super has to become a keyword. Anything else is just too magical.

Update: I posted the subject on the python-dev mailing list.

The Pythonistas are Wrong

There’s something that’s been bugging me for a long time that I need to get off my chest. Some of you may hate me for it, but perhaps there are others out there with the same complaint, silently in agony, wishing for death to take the pain away. It’s time to set the record straight, and prove once and for all that the Pythonistas are wrong.

Pythons almost NEVER look like this:
python logo

The frog shown here is what the Python Foundation refers to as a “snake” (though it looks more like a frog), more specifically a blue/yellow one. The name “Python”, however, refers to a group of six British Gentlemen* and something like 86.43% of people know that. The name was chosen because snakes just suck. Get it? It’s not a snake, they are British.

Pythons however are better represented by a 16-ton weight or a dead parrot. But they are NOT represented by snakes.

scipy logo
See that one in the scipy logo? That’s a public domain circle someone added a white snake to. A SNAKE. Look at the Wikipedia article and search for “snake”. Yeah, no match.

pycon08 logo
Even PyCon (where Guido van Rossum himself spoke) made the mistake of choosing this stupid snake.

xml tag python
lxml is doing it wrong too.

And probably your favourite Python module too. So keep in mind: Pythons are not Snakes! And I think that proves once and for all that there are tons of projects with the wrong logo out there.

Sorry headius for taking advantage of your blog post but I wanted to blog about that for quite some time anyways ;-)

Update: fixed my mistake about all Pythons being British. Thanks Joe Pantuso.
Update 2: apparently they are all British now. *Terry Gilliam renounced his American citizenship. Thanks meow

Werkzeug Talk at GLT 08

April 20th, 2008

Yesterday I gave my talk about Werkzeug at the Grazer Linuxtage. Unfortunately it didn’t go as well as I had hoped, so I’m not unhappy that there is no audio/video recording of it ;-)

I think the slides are useful, though, so I uploaded them in German and English:

Hope someone finds them useful.

High Level AST Module for Python

March 30th, 2008

Georg already blogged about the new AST-to-code compilation feature of Python, so I won’t cover that any more. Basically the old compiler package is now superseded by Python’s real internal AST, which is freaking awesome :)

The only thing that is missing is some sort of high-level interface to it that makes manipulating and debugging it easier. The motivation behind it is that template engines such as Genshi, Jinja or Mako, which all operate on the AST, shouldn’t have to write that much boilerplate code just to modify small pieces of the AST.

I don’t know if this module will make it into the standard library, but even if not it will surely be available via the cheeseshop. So what does it do? It provides classes and utility functions to manipulate or traverse the AST in a way that is actually useful for real world applications. For example, with the transformer it’s now possible to replace and remove nodes. Additionally it’s possible to generate Python sourcecode from an AST to simplify debugging.

Here is a small example of a transformer that brings Genshi-like loop behavior to a piece of Python sourcecode:

class GenshiSemantics(ast.NodeTransformer):

    def context(self, method):
        return ast.Expr(ast.Call(
            ast.Attribute(ast.Name('data', ast.Load()), method, ast.Load()),
            [], [], None, None
        ))

    def visit_For(self, node):
        self.generic_visit(node)
        return [self.context('push'), node, self.context('pop')]

    def visit_Name(self, node):
        self.generic_visit(node)
        return ast.Subscript(
            ast.Name('data', ast.Load()),
            ast.Index(ast.Str(node.id)),
            node.ctx
        )

When the transformer is applied, this piece of code:

seq = [1, 2, 3, 4, 5]
item = 42
for item in seq:
    print 'inside', item
print 'outside', item

becomes

data['seq'] = [1, 2, 3, 4, 5]
data['item'] = 42
data.push()
for data['item'] in data['seq']:
    print 'inside', data['item']
data.pop()
print 'outside', data['item']

The sourcecode of the module is in the Pocoo sandbox as usual. Unfortunately there is a bug in the initialization function of the ASTs right now so you will need a patch for it. And obviously this requires Python 2.6.

The full sourcecode for the example above can be found here.
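The node constructors in the transformer above follow the Python 2.6 ast signatures. In current Python 3, Call no longer takes the starargs/kwargs fields, ast.Str and ast.Index are deprecated in favor of ast.Constant and a plain slice expression, print is an ordinary function (and therefore a Name), and ast.unparse (3.9+) can print the result. A sketch of the same transformer under those assumptions; skipping builtins in visit_Name and the Context class with push()/pop() are additions for this sketch, not part of Genshi:

```python
import ast
import builtins


class GenshiSemantics(ast.NodeTransformer):

    def context(self, method):
        # Build the statement  data.<method>()
        return ast.Expr(ast.Call(
            func=ast.Attribute(ast.Name('data', ast.Load()), method, ast.Load()),
            args=[], keywords=[]))

    def visit_For(self, node):
        self.generic_visit(node)
        # data.push(); for ...; data.pop(): the loop gets its own scope
        return [self.context('push'), node, self.context('pop')]

    def visit_Name(self, node):
        if hasattr(builtins, node.id):  # leave print, len, ... untouched
            return node
        # item  ->  data['item'], keeping the Load/Store context
        return ast.copy_location(ast.Subscript(
            value=ast.Name('data', ast.Load()),
            slice=ast.Constant(node.id),
            ctx=node.ctx), node)


class Context:
    """Hypothetical scoped namespace offering the push()/pop() used above."""

    def __init__(self):
        self.scopes = [{}]

    def push(self):
        self.scopes.append({})

    def pop(self):
        self.scopes.pop()

    def __getitem__(self, key):
        for scope in reversed(self.scopes):  # innermost scope first
            if key in scope:
                return scope[key]
        raise KeyError(key)

    def __setitem__(self, key, value):
        self.scopes[-1][key] = value  # assignments go to the innermost scope


SOURCE = """\
seq = [1, 2, 3, 4, 5]
item = 42
for item in seq:
    print('inside', item)
print('outside', item)
"""

tree = ast.fix_missing_locations(GenshiSemantics().visit(ast.parse(SOURCE)))
print(ast.unparse(tree))
exec(compile(tree, '<genshi>', 'exec'), {'data': Context()})
# the last line printed is "outside 42": the loop variable stayed in its scope
```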

Jinja 3K

March 27th, 2008

So Python 3 is getting ready, and looking at all the libraries I maintain I was really afraid of the final release. Looking at all those language changes it sounded like a horrible job. But there is that nice 2to3 script, which transforms sourcecode from Python 2 to Python 3 in a semiautomatic manner. And what can I say? It kicks ass.

Getting the current hg head of Jinja running on top of Python 3 took roughly 15 minutes. The only things I had to change were adding a missing sys import that came from the intern -> sys.intern translation, and fixing a couple of sort() calls: sort no longer accepts a cmp function, so you have to use a key function instead. That was easy to solve, and the cmp usage was only there for Python 2.3 backwards compatibility anyway. The other minor thing I had to change was unicode behavior. Previously Jinja had an env.to_unicode function that coerced bytestrings and unicode strings into unicode strings depending on the encoding setting of the environment. With Python 3 everything uses unicode internally, so there is no need for an environment charset any more, and I replaced env.to_unicode with str everywhere. The other unicode change was replacing file(foo, 'r').read().decode(charset) (pseudocode) with open(foo, encoding=charset).read().
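The sort() change is easy to miss: Python 3’s list.sort() and sorted() no longer take a cmp argument (and the cmp() builtin is gone), so a key function is the way to go. If an old comparison function has to stay, functools.cmp_to_key (added later, in Python 2.7 and 3.2) wraps it. A sketch with made-up data:

```python
from functools import cmp_to_key

names = ["banana", "Apple", "cherry"]

# Python 2: names.sort(lambda a, b: cmp(a.lower(), b.lower()))
# Python 3: a key function gives the same case-insensitive order
names.sort(key=str.lower)
print(names)  # ['Apple', 'banana', 'cherry']


def legacy_cmp(a, b):
    """An old-style comparison: negative, zero or positive."""
    a, b = a.lower(), b.lower()
    return (a > b) - (a < b)  # replacement for the removed cmp() builtin


print(sorted(names, key=cmp_to_key(legacy_cmp), reverse=True))
# ['cherry', 'banana', 'Apple']
```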

That’s it! Well, I must admit that Jinja is by far the easiest to port of the trio of templating languages I use (Jinja, Mako, Genshi), as it has its own parser and does not rely on the now removed compiler module, which was superseded by the new _ast with a different structure. So how good the conversion will be clearly depends on what your library uses. For most libraries 2to3 will do a very good job.

Oh, what it also doesn’t convert are C extensions :) I don’t know if the Jinja traceback tools or the speedups module work on Python 3 too, as they are optional and I was too lazy to compile them, but that’s another thing you have to keep in mind. Also, I was unable to run the Jinja testsuite because py.test doesn’t work on Python 3 so far.

But I’m very happy. If the process stays that simple, my forecast of not using Python 3 before 2012 may be wrong. So go on, port your libraries now, at least for testing purposes, and let’s make some of the best ones available right at the Python 3.0 release, which is scheduled for September. If the initial number of working libraries is high enough for some applications, it improves Python’s chances of quicker adoption a lot. I know it’s very unlikely for big projects to switch in a short timeframe, but at least smaller projects will be able to benefit from Python 3 earlier.

And even if Guido disagrees: Python 3 is *the* chance to break *small pieces* of the API. Don’t break them in a way that leaves everybody confused about how things work. But if there are design flaws in your library that you want to fix, fix them with the move to Python 3 rather than between two Python 2 versions. (And of course document them!)

Sphinx Python Documentation Tool Released

March 21st, 2008

As mentioned in an earlier blog post, I’m not a fan of fully automatically generated documentation, as I think it’s clearly the wrong way to solve documentation problems. Whenever I encounter epydoc-generated documentation and find out that it’s the only kind of documentation, I run away :)

In the past there have been two kinds of documentation in the Python world: handwritten documentation (Django, for example) and fully automated API documentation (Paste’s). Both have advantages and disadvantages for users and developers, but neither was perfect, at least for me. Django’s documentation is one of the best I have ever encountered, but writing such documentation yourself is painful. I tried to do something similar with Werkzeug and Jinja and it sucks, mostly because you usually improve stuff first and add the documentation later on, often forgetting about it. So it’s not unlikely that the documentation differs slightly from the actual implementation because someone (usually me) forgot to “sync” them.

With Werkzeug 0.2 I experimented with combining those two things. I added some directives to docutils that pulled docstrings automatically from the objects and added them to the rst file. That way the documentation is perfectly in sync with the sourcecode, and because the members are specified explicitly I can hide implementation details and private functions. However, the Werkzeug documentation builder was and is a hack and was never meant to be used by anyone except the Werkzeug project, and even then just for one release. The reason is that Georg Brandl spent the last couple of weeks rewriting and improving parts of the Python documentation tool that powers the new Python 2.6 and 3.0 docs so that it supports non-CPython projects too.

The resulting library Sphinx is in my opinion the best general purpose documentation tool since sliced bread! It’s intended to be a tool for handwritten documentation that builds documentation as standalone HTML files, HTML Help (CHM) files, LaTeX, or pickles that can be used to display the documentation in a WSGI application.

It uses reStructuredText as markup language for all the documentation, supports syntax highlighting of code blocks via pygments, embedded doctests so that you can extend your testsuite with doctests from your documentation, has support for automatic object documentation by including the docstrings from objects listed (semiautomated documentation as I did with Werkzeug), automatic cross linking, index generation, changelog generators, many custom roles and directives for rst and much more.
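The “semiautomated” approach (explicitly listed members, docstrings pulled from the objects) can be sketched with the standard inspect module. This is not Sphinx’s autodoc code or the Werkzeug builder; autodoc_members is a hypothetical helper that just emits reST using Sphinx’s function directive:

```python
import inspect
import types


def autodoc_members(obj, names):
    """Render the explicitly listed members of obj as a reST fragment."""
    lines = []
    for name in names:  # only listed members; private helpers stay hidden
        member = getattr(obj, name)
        try:
            signature = str(inspect.signature(member))
        except (TypeError, ValueError):  # some builtins have no signature
            signature = "()"
        lines.append(".. function:: %s%s" % (name, signature))
        lines.append("")
        # getdoc() strips the common indentation of the docstring
        for line in (inspect.getdoc(member) or "").splitlines():
            lines.append("   " + line if line else "")
        lines.append("")
    return "\n".join(lines)


def add(a, b):
    """Return the sum of *a* and *b*."""
    return a + b


def _private_helper():
    """Not listed, so never documented."""


demo = types.SimpleNamespace(add=add, _private_helper=_private_helper)
print(autodoc_members(demo, ["add"]))
```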

Although it’s the first release, it’s already a very well documented and well tested library, and it’s used for the Python documentation. While I hate the word “framework”, I think Sphinx could be that for documentation tools. The extension API can be used to add missing features and may also be used for more automated documentation generation in the future.

Unfortunately I haven’t been that active in the implementation of Sphinx so far, so I’m clearly the wrong person for further questions about its development direction, but I’m sure Georg will answer your questions. You can contact him in #pocoo on irc.freenode.net or via E-Mail at georg guesswhichcharcomeshere python dot org.


Multiple Inheritance Considered Awesome

March 3rd, 2008

I must admit that I was not that much interested in Python 3, mostly because I maintain a couple of libraries and all that backwards incompatible stuff looks like a pain in the ass. Additionally, lots of the stuff in the PEPs looks like it completely changes the way the language behaves. But I was wrong.

One of the things I always liked about Python is the ability to use multiple inheritance. Back in the Pocoo days we started using interfaces in a Zope/Trac-like manner and then noticed that we could do the very same with simple base classes and isinstance calls. Yes, of course you can abuse that and create inheritance trees nobody wants to look at, but seriously, that’s your fault then.

But what always confused me was that nobody used Ruby-like mixins. Multiple inheritance is predestined for mixing in functionality, but the only class in the standard library that did something like that was DictMixin. And that mixin had another problem: it was not a subclass of dict (obviously). Many libraries do something like isinstance(foo, dict) to switch between modes. A very common situation where this is necessary is if you want to accept either an iterable of tuples or a dict. This is a situation where duck typing doesn’t really work out. Of course you can do hasattr(x, 'items') to check if that object implements the dict interface, but that is ugly and can lead to unexpected behavior. For example: what’s a dict? There are ancient dict-like implementations that lack the __contains__ method and only have has_key.

In many languages without multiple inheritance this is solved by specifying an IMapping interface and implementing it in custom classes. But of course Python can do better, and with Python 3 it finally did. And it did so in an incredibly cool way that integrates nicely into the language and doesn’t break the Zen of Python, which is freaking awesome.

So what’s the solution Python 3 has? Abstract base classes! So how do they work? As I said above, Python developers usually relied on two things: duck typing (testing if an object implements a specific method) or instance checks against specific types. Abstract base classes now do both in a clever way. The behavior of the builtins isinstance and issubclass can now be overridden via __instancecheck__ and __subclasscheck__ on the metaclass. While I doubt that anyone will override those by hand, there is some cool metaclass magic going on in the abstract base classes that does that for you.
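To make the hook concrete: it lives on the metaclass, and isinstance() consults it instead of walking the MRO. A minimal sketch; DuckDictMeta and DuckDict are hypothetical names, and the keys plus __getitem__ rule is just an example rule, not what collections.abc.Mapping does (Mapping relies on inheritance or register()):

```python
class DuckDictMeta(type):
    def __instancecheck__(cls, instance):
        # isinstance(x, DuckDict) ends up here instead of walking the MRO
        return hasattr(instance, "keys") and hasattr(instance, "__getitem__")


class DuckDict(metaclass=DuckDictMeta):
    """Hypothetical marker class; nothing needs to inherit from it."""


class AncientMapping:
    # dict-like, but neither a dict subclass nor registered anywhere
    def keys(self):
        return ["a"]

    def __getitem__(self, key):
        return 1


print(isinstance({}, DuckDict))                # True
print(isinstance(AncientMapping(), DuckDict))  # True
print(isinstance([1, 2], DuckDict))            # False: no keys()
```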

So an abstract base class isn’t necessarily a base class of the object you are testing against, but it can be. Let me give you a small example. In Python 2.4 testing if an object is iterable worked like this:

try:
    iter(obj)
except TypeError:
    do_something_with_not_iterable_object(obj)
else:
    do_something_with_iterable_object(obj)

That wasn’t that bad, and it has the advantage that we don’t have to implement some IIterable interface in all the iterable things: to check if an object is iterable, a call to iter() is enough. But with Python 3 we also have an abstract base class called Iterable which we can use for testing now:

from collections.abc import Iterable
if isinstance(obj, Iterable):
    the_object_is_iterable()
else:
    the_object_is_not_iterable()

But how does that work? The obj does not necessarily inherit from that class. As said above, the metaclass of the abstract base class Iterable overrides the test functions and performs the checking for us. It sees that the object’s type provides __iter__ and returns True so that we can react to it. (Objects that are only iterable through the older __getitem__ sequence protocol are not detected that way; for them, calling iter() is still the only reliable test.)
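A small sketch of that structural check; Countdown and OldSequence are made-up classes. The first is recognized through its __iter__ method alone, while the second, which only supports the old __getitem__ protocol, is iterable but not detected by the ABC:

```python
from collections.abc import Iterable


class Countdown:
    """Iterable via __iter__, without inheriting from anything."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return iter(range(self.start, 0, -1))


class OldSequence:
    """Iterable only through the old __getitem__ protocol."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return index


print(isinstance(Countdown(3), Iterable))   # True, found via __iter__
print(Iterable in Countdown.__mro__)        # False, no inheritance involved
print(isinstance(OldSequence(), Iterable))  # False, the ABC misses it
print(list(OldSequence()))                  # [0, 1, 2], but iteration works
```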

Additionally the metaclass of the abstract base classes keeps a registry of classes that provide a compatible interface. This makes it possible to let isinstance(some_dict, Mapping) return True in no time by just comparing the object type against the list of known classes that registered themselves for the abstract base class.

This happens for example for all the builtin classes. Inside the python module that specifies the ABCs this piece of code can be found:

MutableMapping.register(dict)

Imagine you wrote your own C extension that implements a cool linked list. All you have to do to register your list as a Sequence is this piece of code:

from _yourlinkedlist import YourLinkedList
from collections.abc import Sequence
Sequence.register(YourLinkedList)
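A pure-Python stand-in shows what register() does and does not do. LinkedList is a hypothetical class standing in for the C extension type. Registration makes isinstance() and issubclass() succeed, but it does not check that any methods exist, and the class inherits nothing: Sequence never appears in its MRO, so mixin methods such as index() are not added.

```python
from collections.abc import Sequence


class LinkedList:
    """Hypothetical stand-in for the C extension type."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]


Sequence.register(LinkedList)  # no methods are checked at this point

lst = LinkedList([1, 2, 3])
print(isinstance(lst, Sequence), issubclass(LinkedList, Sequence))  # True True
print(Sequence in LinkedList.__mro__)  # False: a "virtual" subclass
print(hasattr(lst, "index"))           # False: no mixin methods inherited
```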

But that’s just one way to use abstract base classes. Their nicest feature is that they ship a lot of annoying, repetitive bootstrapping code for you. For example, in Werkzeug I wrote that nice HeaderSet, which is basically an ordered case-insensitive set. But what I did not implement is __and__ and all the other set stuff, because I’m a) lazy and b) doubt that anyone will seriously miss it for the use case of that object. But what if the set behavior came for free just by subclassing Set? That’s what’s now possible in Python 3*:

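from collections.abc import Set

class HeaderSet(Set):

    def __init__(self, initial=()):
        # keep the original spelling and order, compare case-insensitively
        self._ordering = []
        self._storage = set()
        for header in initial:
            key = header.lower()
            if key not in self._storage:
                self._storage.add(key)
                self._ordering.append(header)

    def __contains__(self, x):
        return x.lower() in self._storage

    def __iter__(self):
        return iter(self._ordering)

    def __len__(self):
        return len(self._ordering)

    # No __le__ needed: Set derives <=, ==, &, |, - and ^ from the
    # three methods above.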

* Actually I lied here: right now it’s not possible. The basics work, but as soon as I do foo & bar I get either a NameError (itertools not defined) or a TypeError because __rle__ is not defined. But that’s why it’s an alpha ;-)

And what does this have to do with multiple inheritance? In that example it might not be obvious, but have a look at the class graph for HeaderSet:

Set(Sized, Iterable, Container)
    HeaderSet
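On a current Python 3 the graph has one more level: collections.abc.Collection (added in 3.6) combines Sized, Iterable and Container, and Set inherits from it. The sketch below uses a hypothetical minimal WordSet rather than Werkzeug’s HeaderSet; it writes only the three abstract methods and gets the comparison and set operators from Set:

```python
from collections.abc import Set


class WordSet(Set):
    """Hypothetical minimal Set: only the three abstract methods are written."""

    def __init__(self, words=()):
        self._words = list(dict.fromkeys(words))  # drop duplicates, keep order

    def __contains__(self, word):
        return word in self._words

    def __iter__(self):
        return iter(self._words)

    def __len__(self):
        return len(self._words)


print([cls.__name__ for cls in WordSet.__mro__])
# ['WordSet', 'Set', 'Collection', 'Sized', 'Iterable', 'Container', 'object']

a, b = WordSet(["x", "y"]), WordSet(["y", "z"])
print(list(a & b), list(a | b), WordSet(["y"]) <= a)  # ['y'] ['x', 'y', 'z'] True
```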

Very cool stuff. So what can this be used for? Specifying otherwise unspecified protocols. For example, it was a common idiom in Python 2.x to accept file-like objects through duck typing. In Django it’s pretty common to do something like this:

response = HttpResponse(content_type='image/png')
my_pil_image.save(response, 'PNG')  # a file object has no extension to guess from
return response

But it was pretty much unspecified what PIL calls on that response object. Just write()? write() and writelines()? Does it seek()? Now we have io.IOBase, io.BufferedIOBase and a lot more. This is great stuff and seriously, I can’t wait to port my stuff to Python 3.
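The io ABCs also ship mixin behavior. In this sketch, Collector is a hypothetical write-only sink that defines only writable() and write(); writelines() comes from io.IOBase, and methods like seekable() have defined defaults that a caller can query instead of guessing. This is not what PIL or Django do internally, just an illustration of the protocol:

```python
import io


class Collector(io.RawIOBase):
    """Hypothetical write-only sink: only writable() and write() are defined."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def writable(self):
        return True

    def write(self, b):
        self.chunks.append(bytes(b))
        return len(b)  # RawIOBase.write reports how many bytes were taken


sink = Collector()
sink.writelines([b"header", b"body"])  # writelines() is inherited from io.IOBase
print(isinstance(sink, io.IOBase), sink.chunks)  # True [b'header', b'body']
print(sink.seekable())  # False: the IOBase default, so callers can ask first
```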

Sick of pkg_resources Warnings?

Create a sitecustomize.py in your site-packages with the following code:

import warnings
warnings.filterwarnings('ignore',
  message=r'Module .*? is being added to sys.path', append=True)
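To check that the pattern does what you want before dropping it into sitecustomize.py, here is a small sketch using catch_warnings. The warning texts are made up; the first one merely has the same shape as the pkg_resources message. Keep in mind that message is a regular expression matched case-insensitively against the start of the warning text, and that append=True puts the filter at the end of the list, so an earlier matching filter still wins:

```python
import warnings

PATTERN = r'Module .*? is being added to sys.path'

with warnings.catch_warnings(record=True) as caught:
    warnings.resetwarnings()  # start from an empty filter list
    warnings.filterwarnings('ignore', message=PATTERN, append=True)
    warnings.simplefilter('always', append=True)  # record everything else
    # made-up text with the same shape as the pkg_resources message
    warnings.warn("Module foo was already imported from /tmp/foo.py, "
                  "but /opt/lib is being added to sys.path")
    warnings.warn("something unrelated")

print([str(w.message) for w in caught])  # ['something unrelated']
```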