Pages tagged as ‘python’

High Level AST Module for Python

March 30th, 2008

Georg already blogged about the new ast to code compilation feature of Python so I won’t cover that any more. Basically the old compiler package is now surpassed by Python’s real internal AST which is freaking awesome :)

The only thing that is missing is some sort of high level interface to it which makes manipulating and debugging it easier. The motivation behind it is that template engines such as Genshi, Jinja or Mako which all operate on the AST don’t have to write that much boilerplate code just to modify small pieces of the AST.

I don’t know if this module will make it into the standard library but even if not it will surely be available via the cheeseshop. So what does it do? It provides classes and utility functions to manipulate or traverse the AST in a way that is actually useful for real world applications. For example with the transformer it’s now possible to replace and remove nodes. Additionally it’s possible to generate Python sourcecode from an AST to simplify debugging.

Here is a small example of a transformer that brings Genshi-like loop behavior to a piece of Python sourcecode:

import ast

class GenshiSemantics(ast.NodeTransformer):

    def context(self, method):
        # build the statement ``data.<method>()``
        return ast.Expr(ast.Call(
            ast.Attribute(ast.Name('data', ast.Load()), method, ast.Load()),
            [], [], None, None
        ))

    def visit_For(self, node):
        # wrap every loop in data.push() / data.pop()
        self.generic_visit(node)
        return [self.context('push'), node, self.context('pop')]

    def visit_Name(self, node):
        # turn every name lookup or assignment into data['name']
        self.generic_visit(node)
        return ast.Subscript(
            ast.Name('data', ast.Load()),
            ast.Index(ast.Str(node.id)),
            node.ctx
        )

When applied, this piece of code:

seq = [1, 2, 3, 4, 5]
item = 42
for item in seq:
    print 'inside', item
print 'outside', item

becomes

data['seq'] = [1, 2, 3, 4, 5]
data['item'] = 42
data.push()
for data['item'] in data['seq']:
    print 'inside', data['item']
data.pop()
print 'outside', data['item']

The sourcecode of the module is in the Pocoo sandbox as usual. Unfortunately there is a bug in the initialization function of the ASTs right now so you will need a patch for it. And obviously this requires Python 2.6.

The full sourcecode for the example above can be found here.
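
For readers on a current Python 3, the name rewriting part of that transformer can be sketched with the standard library ast module alone. This is only an illustration under assumptions, not the module from the sandbox: it needs Python 3.9 or later (for ast.unparse and for plain expressions as subscript slices), it leaves out the push/pop handling of loops, and the class name DataLookup is made up for this example.

```python
import ast

class DataLookup(ast.NodeTransformer):
    """Rewrite every bare name ``x`` into ``data['x']`` (hypothetical name)."""

    def visit_Name(self, node):
        replacement = ast.Subscript(
            value=ast.Name(id='data', ctx=ast.Load()),
            slice=ast.Constant(value=node.id),
            ctx=node.ctx,  # keep Load/Store so assignments still work
        )
        return ast.copy_location(replacement, node)

source = "total = price * count"
tree = DataLookup().visit(ast.parse(source))
ast.fix_missing_locations(tree)  # fill in line numbers for the new nodes

# Turn the modified tree back into source code for debugging.
rewritten = ast.unparse(tree)
print(rewritten)

# The modified tree can also be compiled and executed directly.
data = {'price': 3, 'count': 4}
exec(compile(tree, '<rewritten>', 'exec'), {'data': data})
print(data['total'])
```

Returning a new node from a visit method replaces the original one, which is the replace behavior the post describes; returning None from a visit method would remove the node instead.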

Jinja 3K

March 27th, 2008

So Python 3 is getting ready and looking at all the libraries I maintain I was really afraid of the final release. Looking at all those language changes it sounded like a horrible job to do. But there is that nice 2to3 script which should transform sourcecode from Python 2 to Python 3 in a semiautomatic manner. And what should I say? It kicks ass.

Getting the current hg head of Jinja running on top of Python 3 was a matter of roughly 15 minutes. The only things I had to change were adding a missing sys import that came from the intern -> sys.intern translation and fixing a couple of sort() calls. sort() no longer accepts a comparison function, so you have to use a key function instead. But that was easy to solve and it was only there for Python 2.3 backwards compatibility anyway. The other minor thing I had to change was unicode behavior. Previously Jinja had an env.to_unicode function that coerced bytestrings and unicode strings into unicode strings depending on the encoding setting on the environment. With Python 3 everything will use unicode internally so there is no need to have an environment charset anyway, so I replaced env.to_unicode with str everywhere. The other unicode change was replacing file(foo, 'r').read().decode(charset) (pseudocode) with open(foo, encoding=charset).read().
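
A minimal sketch of the two kinds of changes described above, written for a current Python 3 rather than the alpha used at the time. The names by_length and read_template are made up for illustration and are not Jinja functions.

```python
import functools
import os
import tempfile

def by_length(a, b):
    """Old-style comparison function: negative, zero or positive."""
    return (len(a) > len(b)) - (len(a) < len(b))

names = ['layout', 'a', 'index']

# Python 2 only: names.sort(by_length)
# Quick port: wrap the old comparison function.
ported = sorted(names, key=functools.cmp_to_key(by_length))
# Cleaner port: express the ordering as a key function.
rewritten = sorted(names, key=len)

def read_template(path, charset='utf-8'):
    """Decode while reading instead of calling .decode() afterwards."""
    with open(path, encoding=charset) as f:
        return f.read()

# Round trip through a temporary file to show the decoding step.
fd, path = tempfile.mkstemp(suffix='.html')
with os.fdopen(fd, 'w', encoding='utf-8') as f:
    f.write('Hällo {{ name }}')
text = read_template(path)
os.remove(path)
```

functools.cmp_to_key is handy when the comparison logic is too tangled to rewrite right away; a plain key function is usually shorter and faster.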

That’s it! Well. I must admit Jinja is by far the easiest templating language to port from the trio of templating languages I use (Jinja, Mako, Genshi) as it has its own parser and does not rely on the now removed compiler module which was superseded by the new _ast with a different structure. So it clearly depends on what you’re using in your library how good the quality of the conversion will be. For most libraries 2to3 will do a very good job.

Oh. What it also doesn’t convert are C extensions :) I don’t know if the Jinja traceback tools or speedups module work in Python 3 too as they are optional and I was too lazy to compile them, but that’s another thing you have to keep in mind. Also I was unable to run the Jinja testsuite as py.test doesn’t work on Python 3 so far.

But I’m very happy. If the process stays that simple my forecast of not using Python 3 before 2012 may be wrong. So, go on, port your libraries now, at least for testing purposes and let’s make some of the best available right with the Python 3.0 release which is scheduled for September. If the initial number of working libraries is high enough for some applications it improves Python’s chances for a quicker adoption a lot. I know it’s very unlikely for big projects to switch in a short timeframe but at least smaller projects will be able to benefit from Python 3 earlier.

And even if Guido disagrees: Python 3 is *the* chance to break *small pieces* of the API. Don’t break them in a way that everybody is confused and doesn’t know how things work any more. But if there are some design flaws in the library you want to change, adapt them with the change to Python 3 rather than between two Python 2 versions. (And of course document them!)

Sphinx Python Documentation Tool Released

March 21st, 2008

As mentioned in an earlier blog post I’m not a fan of fully automatically generated documentation as I think it’s clearly the wrong way to solve documentation problems. Whenever I encounter epydoc generated documentation and I find out that that’s the only kind of documentation I run away :)

In the past there have been two kinds of documentation in the Python world: handwritten documentation (Django for example) and fully automated API documentation (Paste’s). Both of them have advantages and disadvantages for user and developer but neither of them was perfect. At least for me. Django’s documentation is one of the best I ever encountered but writing such documentation yourself is painful. I tried to do something similar with Werkzeug and Jinja and it sucks. Mostly because you usually start improving stuff and add the documentation later on, often forgetting about it. It’s not unlikely that the documentation is slightly different from the actual implementation because someone (usually me) forgot to “sync” them.

With Werkzeug 0.2 I was experimenting with combining those two things. I added some directives to docutils that pulled docstrings automatically from the objects and added them to the rst file. That way the documentation is perfectly in sync with the sourcecode and because the members are specified explicitly I can hide implementation details and private functions. However the Werkzeug documentation builder was and is a hack and was never meant to be used by anyone except the Werkzeug project, and even that one just for one release. The reason for that is that Georg Brandl spent the last couple of weeks rewriting and improving parts of the python documentation tool which powers the new Python 2.6 and 3.0 docs to support non cpython projects too.

The resulting library Sphinx is in my opinion the best general purpose documentation tool since sliced bread! It’s intended to be a tool for handwritten documentation that builds the documentation into standalone HTML files, CHM HTML files, LaTeX or pickles which can be used to display the documentation in a WSGI application.

It uses reStructuredText as markup language for all the documentation, supports syntax highlighting of code blocks via pygments, embedded doctests so that you can extend your testsuite with doctests from your documentation, has support for automatic object documentation by including the docstrings from objects listed (semiautomated documentation as I did with Werkzeug), automatic cross linking, index generation, changelog generators, many custom roles and directives for rst and much more.

While it’s the first release it’s already a very well documented and well tested library and used for the Python documentation. While I hate the word “framework” I think Sphinx could be that for documentation tools. The extension API can be used to add missing features and may also be used for more automated documentation generation in the future.

Unfortunately I wasn’t that active in the implementation of Sphinx so far, so I’m clearly the wrong person for further questions about the development direction of Sphinx but I’m sure Georg will answer your questions. You can contact him in #pocoo on irc.freenode.net or via E-Mail at georg guesswhichcharcomeshere python dot org.

Multiple Inheritance Considered Awesome

March 3rd, 2008

I must admit that I was not that much interested in Python 3, pretty much because I maintain a couple of libraries and all that backwards incompatible stuff looks like a pain in the ass. Additionally lots of the stuff from the PEPs looks like it completely changes the way the language behaves. But I was wrong.

One of the things I always liked about Python was the ability to use multiple inheritance. Back in the pocoo days we started using interfaces in a Zope/trac like manner and then noticed that we can do the very same with simple base classes and isinstance calls too. Yes of course you can abuse that and create inheritance trees nobody wants to look at, but seriously, your fault then.

But what always confused me was that nobody used Ruby-like mixins. Multiple inheritance is predestined for mixing in functionality but the only class in the standard library that did something like that was the DictMixin. And that mixin had another problem: it was not a subclass of dict (obviously). Now many libraries do something like isinstance(foo, dict) to switch between modes. A very common situation where this is necessary is if you want to accept iterables of tuples or dicts. This is a situation where duck-typing doesn’t really work out. Of course you can do hasattr(x, 'items') to check if that object implements the dict interface, but that is ugly and can lead to unexpected behavior. For example: what’s a dict? There are ancient dict-like implementations that lack the __contains__ method and just have has_key, for example.

In many languages missing multiple inheritance this is solved by specifying an IMapping interface and implementing that in custom classes. But of course Python can do better and with Python 3 it finally did. And it did that in an incredibly cool way that integrates nicely into the language and doesn’t break the zen of Python which is freaking awesome.

So what’s the solution Python 3 has? Abstract base classes! So how do they work? As I already said above, in the past Python developers usually relied on two things: duck typing (testing if an object implements a specific method) or instance checks against specific types. Now abstract base classes do both in a clever way. The builtin isinstance and issubclass functions are now overridable via __instancecheck__ and __subclasscheck__. While I doubt that anyone will override those by hand there is some cool metaclass magic going on in the abstract base classes that does that for you.

So an abstract base class isn’t necessarily a base class of the object you are testing against, but it could be. Let me give you a small example. In Python 2.4 testing if an object is iterable worked like this:

try:
    iter(obj)
except TypeError:
    do_something_with_not_iterable_object(obj)
else:
    do_something_with_iterable_object(obj)

That wasn’t that bad and it has the advantage that we don’t have to implement some IIterable interface in all the iterable things. To check if an object is iterable a call to iter() is enough. But with Python 3 we also have an abstract base class called Iterable which we can use for testing now:

from collections.abc import Iterable  # plain "collections" before Python 3.3
if isinstance(obj, Iterable):
    the_object_is_iterable()
else:
    the_object_is_not_iterable()

But how does that work? The obj does not necessarily inherit from that class. As said above the metaclass of that abstract base class Iterable overrides the test functions and performs the checking for us. It sees that the type of the object defines __iter__ and returns True so that we can react to it. (Objects that only support the old sequence iteration protocol via __getitem__ are not detected this way; for those the iter() call is still the reliable test.)
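
How such a check can work without inheritance is easiest to see in a small hand-written abstract base class. The following is a sketch, assuming a current Python 3; Closeable, Connection and Plain are made-up names. The standard library's Iterable uses a similar hook that looks for __iter__.

```python
from abc import ABCMeta, abstractmethod

class Closeable(metaclass=ABCMeta):
    """Hypothetical ABC: anything whose class defines close()."""

    @abstractmethod
    def close(self):
        """Release whatever the object holds."""

    @classmethod
    def __subclasshook__(cls, C):
        # Called by ABCMeta when isinstance()/issubclass() is used.
        if cls is Closeable:
            if any('close' in B.__dict__ for B in C.__mro__):
                return True
        return NotImplemented  # fall back to the normal checks

class Connection:
    # Note: does not inherit from Closeable.
    def close(self):
        pass

class Plain:
    pass

print(isinstance(Connection(), Closeable))  # True
print(isinstance(Plain(), Closeable))       # False
```

Returning NotImplemented rather than False matters: it lets the registry and real subclassing still have their say.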

Additionally the metaclass of the abstract base classes keeps a registry of classes that provide a compatible interface. This makes it possible to let isinstance(some_dict, Mapping) return True in no time by just comparing the object type against the list of known classes that registered themselves for the abstract base class.

This happens for example for all the builtin classes. Inside the python module that specifies the ABCs this piece of code can be found:

MutableMapping.register(dict)

Imagine you wrote your own C extension that implements a cool linked list implementation. All you have to do to register your list as Sequence is this piece of code:

from _yourlinkedlist import YourLinkedList
from collections.abc import Sequence  # plain "collections" before Python 3.3
Sequence.register(YourLinkedList)
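
The same registration can be tried without a C extension. Below is a sketch with a made-up pure Python class (LinkedList is hypothetical and backed by a plain list to keep it short), assuming Python 3.3 or later.

```python
from collections.abc import Sequence

class LinkedList:
    """Stand-in for the C implemented list; not a real linked list."""

    def __init__(self, items=()):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]

# Register the class as a "virtual subclass" of Sequence.
Sequence.register(LinkedList)

numbers = LinkedList([1, 2, 3])
print(isinstance(numbers, Sequence))     # True
print(issubclass(LinkedList, Sequence))  # True

# Registration only affects the checks. Mixin methods such as index()
# or count() are not added; for those you have to really subclass.
print(hasattr(numbers, 'index'))         # False
```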

But that’s just one way to use abstract base classes. The nicest feature of them is that they ship a lot of annoying repetitive bootstrapping code. For example in Werkzeug I wrote that nice HeaderSet which is basically a sorted case-insensitive set. But what I do not implement is __and__ and all the other set stuff because I’m a) lazy and b) doubt that someone will seriously miss it for the use case of that object. But what if the set behavior came for free by just subclassing from Set? That’s what’s now possible in Python 3*:

from collections.abc import Set  # plain "collections" before Python 3.3

class HeaderSet(Set):

    def __init__(self, initial=()):
        self._ordering = list(initial)
        self._storage = set(map(str.lower, initial))

    def __contains__(self, x):
        return x.lower() in self._storage

    def __iter__(self):
        return iter(self._ordering)

    def __len__(self):
        return len(self._ordering)

    def __le__(self, other):
        return self._storage.__le__(other)

* Actually I lied here. Right now it’s not possible. The basics work but as soon as I do foo & bar I either get a NameError because itertools is not defined or a TypeError because __rle__ is not defined. But that’s why it’s an alpha ;-)
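
On released Python 3 versions the mixin methods do work. Here is a sketch of the same idea, assuming Python 3.3 or later; CaselessSet is a made-up name and a simplification, not Werkzeug's HeaderSet.

```python
from collections.abc import Set

class CaselessSet(Set):
    """Ordered, case-insensitive set; only three methods are written by hand."""

    def __init__(self, initial=()):
        self._ordering = []
        self._storage = set()
        for item in initial:
            key = item.lower()
            if key not in self._storage:
                self._storage.add(key)
                self._ordering.append(item)

    def __contains__(self, x):
        return x.lower() in self._storage

    def __iter__(self):
        return iter(self._ordering)

    def __len__(self):
        return len(self._ordering)

a = CaselessSet(['Content-Type', 'ETag'])
b = CaselessSet(['etag', 'Vary'])

# __and__, __or__ and the comparisons all come from the Set mixin.
common = a & b
union = a | b
print(list(common))
print(list(union))
print(common <= a)
```

The mixin builds results by calling the class with an iterable, which is why __init__ accepts one as its only argument.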

And what has this to do with multiple inheritance? In that example it might not be obvious but have a look at the class graph for that HeaderSet:

Set(Sized, Iterable, Container)
    HeaderSet

Very cool stuff. So what can this be used for? Specifying otherwise unspecified protocols. For example it was a common idiom in Python 2.x to duck-type IO-like objects, that is, to accept anything that looks enough like a file. For example in Django it’s pretty common to do something like this:

response = HttpResponse(mimetype='image/png')
my_pil_image.save(response)
return response

But it was pretty much unspecified what PIL calls on that response object. Just write()? write() and writeline()? Does it seek()? Now we have io.IOBase, io.BufferedIOBase and a lot more. This is great stuff and seriously, I can’t wait to port my stuff to Python 3.
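
With the io base classes a function can state what it expects instead of hoping the right methods exist. A small sketch, assuming a current Python 3; save_report is a made-up function name.

```python
import io

def save_report(stream, lines):
    """Write text lines to any writable text stream."""
    if not isinstance(stream, io.TextIOBase) or not stream.writable():
        raise TypeError('expected a writable text stream')
    for line in lines:
        stream.write(line + '\n')

# io.StringIO is a text stream, so it passes the check.
buffer = io.StringIO()
save_report(buffer, ['inside 1', 'outside 42'])

# io.BytesIO is a binary stream and is rejected up front.
try:
    save_report(io.BytesIO(), ['nope'])
    rejected = False
except TypeError:
    rejected = True
```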

Sick of pkg_resources Warnings?

Create a sitecustomize.py in your site-packages with the following code:

import warnings
warnings.filterwarnings('ignore',
  message=r'Module .*? is being added to sys.path', append=True)

TextPress Development Blog Online

February 16th, 2008

TextPress now has a dedicated development blog. That means two things: development on TextPress is active again, and we are looking for help on converters, missing extensions (especially Atom publishing support) and more.

That development blog is of course running on TextPress ;-)

GHRML — Haml for Genshi

February 15th, 2008

May I introduce: GHRML, the Genshi Human readable markup language. name not final of course, mika would kill me ^^.

What’s GHRML? First of all it’s work in progress. But beside that it’s a pretty cool clone of Haml for Python. But no, it’s not another templating language, just a different representation for genshi markup templates. This way we can reuse all the genshi features like the directives or serializers. This gives you the full capabilities of the genshi templating language in a nice alternative syntax.

So what does it look like? Here is a small example:

%html
  %head
    %title Hello World
    %style{'type': 'text/css'}
      body { font-family: sans-serif; }
    %script{'type': 'text/javascript', 'src': 'foo.js'}

  %body
    #header
      %h1 Hello World
    %ul.navigation
      %li[for item in navigation]
        %a{'href': item.href} $item.caption

    #contents
      Hello World!

Most of the syntax is directly stolen from Haml. The only real change is that we don’t need explicitly marked self closing elements because genshi knows that already in the serializer. Additionally you can use brackets for genshi directives like the for-loop above. And the parser works with variable indentation: if you want an indentation level of four, just do so, the parser will recognize that.

Sourcecode in the sandbox for those of you who want to try it, some updates later ;-)

Update: Please read the new blogpost which reveals the new location and maintenance of the code.

Python Code Introspection

January 21st, 2008

Georg blogged about sphinx recently and now we can all rejoice because sphinx can now be used with custom documentation (non c-python documentation) too. Unfortunately hand written documentation alone is hard to maintain.

For Werkzeug 0.2 I want to switch to a combination of hand written documentation and source documentation generated from Python sourcecode, powered by sphinx of course. The main problem is getting the information out of the sourcecode. Unlike in compiled languages, a lot in Python is dynamic and much of the functionality is only available at runtime. Unlike in PHP, objects can be modified later on, and unlike in Ruby there are attributes and methods on classes, so it’s hard to hide private/protected stuff on a class. epydoc came up with some conventions (especially for rst users) but introspection is still too hard in Python and will lead to unexpected results.

A pretty common idiom in Python is to implement a class in one file and import it somewhere else, where it will become importable from. One of the best examples was Python’s “re” module. Until 2.5 that module was a stub and the actual functions were imported from the “sre” module. This is difficult if not impossible to get right for automatically generated documentation. Pydoc for example requires __all__ to be defined in that module, otherwise it will hide those objects like it hides every other import.

Another example which is a pain in the ass to get right by automatic introspection is function decorators and descriptors. Descriptors also break in pydoc half of the time and nobody seems to know when exactly it breaks. Function decorators are impossible to get right because even if you try to traverse the closure variables of all the wrapped functions up to the original function the result could be terribly wrong because one function in the middle is indeed variadic.

Because it’s that hard to get right I don’t want to lose time on solving something that’s unsolvable anyway. My current idea to make the documentation process a little less annoying is to add directives to sphinx that pull information from Python objects into rst files in a semiautomatic way.

Say you have a pretty complex module that implements a couple of functions and classes and who knows what. Additionally to the documentation from the docstrings you also want to group the objects by topics and add hand written documentation for every group.

In the python code you document all your functions with nice rst docstrings with the epydoc conventions. Additionally you add group markers:

def escape(s, quote=False):
    """This function escapes strings for HTML documents.

    :param s: the string to escape
    :param quote: set to `True` to escape the quote character too.
    :return: escaped string
    :group: html-helpers
    """

You can do that with every function and object. To dump the documentation for a group to an rst document you would then use a little directive:

HTML Helpers
============

These functions and classes help you process HTML.

.. autodoc:: group html-helpers [werkzeug.utils]

This will dump all the objects flagged with that group in alphabetical order to the rst document. But of course this breaks for complex use cases, so you should be able to step back. Imagine you have a pretty complex object and you want to hide some of the methods / attributes, mark missing variables or make a member a descriptor rather than a method if the automatic discovery failed:

Request
=======

summary for the request object goes here.

.. autodoc:: object werkzeug.wrappers.Request

    no_docstring: yes
    extra_members:
        - `_get_file_stream`

But how exactly that should work, I don’t have a good idea yet. But I want to find it quickly because right now I’m pretty unhappy because writing docs both in the sourcecode and in hand written documentation sucks ass.
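
One possible shape for the group lookup, as a rough sketch using only the standard library and assuming a current Python 3. The names collect_group and render_rst, the demo module and its unescape and url_quote functions are all made up; this is not how sphinx implements anything.

```python
import inspect
import re
import types

# Matches a ":group: name" line inside a docstring.
GROUP_RE = re.compile(r'^[ \t]*:group:[ \t]*(\S+)[ \t]*$', re.MULTILINE)

def collect_group(module, group):
    """Return (name, object) pairs flagged with the group, sorted by name."""
    found = []
    for name, obj in inspect.getmembers(module):
        if name.startswith('_'):
            continue  # hide private members and module internals
        doc = inspect.getdoc(obj) or ''
        if group in GROUP_RE.findall(doc):
            found.append((name, obj))
    return sorted(found, key=lambda pair: pair[0])

def render_rst(members):
    """Dump the members as rst, dropping the group marker itself."""
    lines = []
    for name, obj in members:
        doc = GROUP_RE.sub('', inspect.getdoc(obj) or '').strip()
        lines.append('.. function:: %s%s' % (name, inspect.signature(obj)))
        lines.append('')
        lines.extend('   ' + line if line else '' for line in doc.splitlines())
        lines.append('')
    return '\n'.join(lines)

# A throwaway module standing in for werkzeug.utils.
demo = types.ModuleType('demo')

def escape(s, quote=False):
    """Escape a string for HTML documents.

    :group: html-helpers
    """

def unescape(s):
    """Reverse of escape.

    :group: html-helpers
    """

def url_quote(s):
    """Quote a string for URLs.

    :group: url-helpers
    """

demo.escape, demo.unescape, demo.url_quote = escape, unescape, url_quote
rst = render_rst(collect_group(demo, 'html-helpers'))
print(rst)
```

Because the members are selected explicitly by group, anything without a marker stays out of the docs, which is the point of the semiautomatic approach.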

Update: Fixed Georg’s name :-)

Error Reporting in WSGI Applications

January 18th, 2008

Thanks to a Google alert on “pygments” I stumbled across gluon/web2py, some sort of “enterprise ready framework”. It’s definitely not a framework I would use myself for countless reasons but there is one thing on the feature list which I found interesting. Apparently gluon files tickets for tracebacks in the database. While it’s a terrible idea to put that data into the very same database all your application data goes into (what happens if the DB is down?) it’s a different approach to Django which sends mails on errors.

Two days ago someone mentioned the paste WSGI middleware which sends emails on tracebacks, similar to the way Django does that. I’m not a fan of error reporting middlewares because I think that should go into the application. On a server error (caused by lost database connections, programmer errors etc.) you should not present the user a black-and-white error page or display an inline traceback like most PHP applications do. We can do better!

First of all there is a module in the standard library everybody seems to forget about. It’s called “logging” and does exactly that — it logs errors. I don’t know why so many people miss it or just don’t use it, but it’s really one of the good things in the python standard library.

Why is it so good? It’s extensible and configurable. The idea is that the application gets itself a logger and logs into that logger (debug messages, information messages, warnings, errors or tracebacks). Independently from the logging there are logger handlers which do something with the logged messages. For example you can tell the logger to log everything except debug output into a rotated file and mail all errors to some mail addresses.

In the Werkzeug wiki there is a wiki page about error handling in WSGI applications using the logging module. Total amount of code needed for a simple WSGI application with logging is about ten lines or so. And it’s flexible enough to integrate in every WSGI application setup, no need to solve that in a middleware where you don’t have access to your application’s config or whatever.
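
Roughly, such a setup could look like the following. This is a sketch and not the code from the wiki page: the logger name 'myapp', the dispatch function and the configure_logging parameters are assumptions made for the example.

```python
import logging
import logging.handlers

logger = logging.getLogger('myapp')

def configure_logging(logfile, mailhost, fromaddr, admins):
    """Everything except debug output goes to a rotated file, errors get mailed."""
    logger.setLevel(logging.INFO)
    file_handler = logging.handlers.RotatingFileHandler(
        logfile, maxBytes=1000000, backupCount=5)
    mail_handler = logging.handlers.SMTPHandler(
        mailhost, fromaddr, admins, 'Application error')
    mail_handler.setLevel(logging.ERROR)
    logger.addHandler(file_handler)
    logger.addHandler(mail_handler)

def dispatch(environ):
    # Placeholder for the real application logic.
    raise RuntimeError('database went away')

def application(environ, start_response):
    try:
        body = dispatch(environ)
        status = '200 OK'
    except Exception:
        # Logs the message together with the current traceback.
        logger.exception('unhandled error for %s',
                         environ.get('PATH_INFO', '?'))
        body = b'Sorry, something went wrong.'
        status = '500 Internal Server Error'
    start_response(status, [('Content-Type', 'text/plain'),
                            ('Content-Length', str(len(body)))])
    return [body]
```

Since the application owns the logger, configure_logging can be called with values from the application's own config, which a generic middleware would not have access to.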

And the best thing about it is that you can configure the way errors are handled. Like I mentioned before gluon creates entries in the database for logged errors. I wrote a small logger handler that does the same but for an external trac. Whenever an application error occurs the logger checks if there is already a ticket for that error, if not it creates one. For every new occurrence of that bug it will create a comment in that ticket.
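
The ticket logic itself fits into a small custom handler. In this sketch an in-memory dict stands in for trac, and the way errors are told apart (exception type plus innermost file and line) is an assumption, not necessarily what the handler in the sandbox does; TicketHandler and flaky are made-up names.

```python
import logging
import traceback

class TicketHandler(logging.Handler):
    """Collect errors as tickets; a dict stands in for the trac instance."""

    def __init__(self):
        super().__init__(level=logging.ERROR)
        self.tickets = {}

    def fingerprint(self, record):
        # Without a traceback fall back to the message text.
        if not record.exc_info or record.exc_info[2] is None:
            return ('message', record.getMessage())
        exc_type, _, tb = record.exc_info
        frame = traceback.extract_tb(tb)[-1]  # innermost frame
        return (exc_type.__name__, frame.filename, frame.lineno)

    def emit(self, record):
        key = self.fingerprint(record)
        text = self.format(record)
        if key not in self.tickets:
            # First occurrence: open a new ticket.
            self.tickets[key] = {'description': text, 'comments': []}
        else:
            # Known error: add a comment to the existing ticket.
            self.tickets[key]['comments'].append(text)

log = logging.getLogger('ticket-demo')
handler = TicketHandler()
log.addHandler(handler)

def flaky():
    return 1 / 0

for attempt in range(3):
    try:
        flaky()
    except ZeroDivisionError:
        log.exception('request failed (attempt %d)', attempt)

print(len(handler.tickets))
```

Three failures at the same spot end up as one ticket with two comments.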

If you want to try it out yourself, I added the code to the sandbox repository.

Update: I also created a trac hack for it.

New Stuff in Werkzeug (and the WSGI World)

January 16th, 2008

Unfortunately I’m very busy lately so there are few updates on Werkzeug and all the other libraries I personally contribute to (and there are even some patches in my mail queue I have to apply after reviewing) but that doesn’t mean that there is no progress :-)

There is actually quite a lot of new stuff between the 0.1 release and now. Werkzeug tries to implement stupid stuff you reimplement in every second application in a way that you can use it with minor modifications in your application. Because many of those features only come up if you have implemented them often in your own applications they didn’t make it into 0.1. Thanks to all the early adopters we now have cool new stuff that was implemented because there was need for it.

For example Werkzeug now has RFC-compliant Etag and Cache-Control parsing. You can also generate etags automatically for responses and make them conditional for some requests. The utility module was extended with a lot of new stuff that fixes limitations in the standard library or implements long missed functions like finding modules in a package (very useful if you want to automatically register controllers), importing modules by a string (useful if your URL map endpoints are (partial) import paths), generating URLs like trac does by calling an Href object, functions to fix URLs similar to the way Firefox fixes them, dumping and parsing HTTP dates, loading and dumping cookies with a simple function call (and support for http only) and probably a lot more before the actual 0.2 release.
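
To give an idea of what two of those helpers do, here is a standard library only sketch for a current Python 3. It is not Werkzeug's code, and the behavior of the real functions may differ in details.

```python
import importlib
import pkgutil

def import_string(import_name):
    """Import a module, or an object inside a module, from a dotted path."""
    try:
        return importlib.import_module(import_name)
    except ImportError:
        module_name, _, attribute = import_name.rpartition('.')
        if not module_name:
            raise
        # Not a module: treat the last part as an attribute of its parent.
        module = importlib.import_module(module_name)
        try:
            return getattr(module, attribute)
        except AttributeError:
            raise ImportError(import_name)

def find_modules(package_name):
    """Yield the dotted names of all direct submodules of a package."""
    package = importlib.import_module(package_name)
    prefix = package.__name__ + '.'
    for info in pkgutil.iter_modules(package.__path__, prefix):
        yield info.name

# A (partial) import path from a URL map could be resolved like this:
join = import_string('os.path.join')

# And controllers could be registered by walking a package:
json_modules = sorted(find_modules('json'))
print(json_modules)
```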

Another cool thing is (unrelated to Werkzeug) that more and more cool modules come up that make web development a charm. While babel has been available for quite a while I hadn’t really used it until last week and it’s really great. Together with Werkzeug’s routing system and the ability to do multiple inheritance in SQLAlchemy there are some cool ways to do web development of internationalized applications. I know the last sentence doesn’t make sense without the context, so I guess I have to blog about that. sooo freaking cool.

Form handling in Python is still a bit strange if you are not using django’s newforms (I know that there is formencode but somehow it isn’t what I’m looking for) but there is now WTForms which looks promising. With some more small changes it could become a very cool form handling system (for example I’m missing some default validators at the moment and I’m not completely sure how to pass choices to a select box on a per form instance basis). WTForms was derived from an application that is already in production so it already solves many problems nicely. It’s a library I want to watch closely over the next weeks.

And I think that approach should become the way Werkzeug is developed in the future. Implement features a release earlier and mark them as “under consideration” if they are not yet used in production applications. If you adopt them early you can give feedback and we can improve it to the next Werkzeug release and streamline the API.

What’s to do until the next Werkzeug release? Georg is currently working on making the sphinx documentation builder independent from the CPython documentation so that other projects can use it too. I then want to semi-automatically build an API documentation for Werkzeug and combine it with hand written rst pages for the Werkzeug 0.2 documentation. I got some feedback for the Werkzeug docs and it looks like they are a bit too chaotic and misleading. Especially getting started with Werkzeug is still too complicated so I hope we can address this with a new documentation that combines automatically generated documentation with tutorials.

The documentation tool could probably be useful for other projects too, I guess Georg will drop some lines in his blog once it’s ready.

Updates regarding Jinja will be up shortly, there is currently a branch developed by Lakin Wecker to speed up Jinja template evaluation. And if you already know what GHRML/XAML is going to be: I will try to get that running this weekend.

That’s it for the moment ;-)
