Archive for November, 2007

Listen to more…

Inina Gap. Which you probably haven’t heard of yet. It’s not rock, metal or any of the other stuff I normally listen to; in fact it’s completely different, more like electronic fusion. And a disclaimer: a) I saw them live, and b) a friend of mine plays bass in that band :-)

Nonetheless: Check it out.

220…

…dead. And that’s a non-lethal weapon?

Update: Georg points out that it’s called a “less lethal” weapon.

Wiki Models with SQLAlchemy

November 22nd, 2007

I was working on an example application for the upcoming Werkzeug release and decided to make a wiki for that purpose. (The main reason is that I found Creoleparser on PyPI and thought it would integrate well into a Genshi-powered wiki.)

Now wikis have an interesting data structure. Basically you have pages and revisions, where each revision belongs to exactly one page. That is simple to model, and my database definition looks like this:

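# Imports this snippet relies on; metadata is assumed to be a plain MetaData().
from sqlalchemy import (Table, Column, Integer, String, DateTime,
                        ForeignKey, MetaData)

metadata = MetaData()

page_table = Table('pages', metadata,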
    Column('page_id', Integer, primary_key=True),
    Column('name', String(60), unique=True)
)

revision_table = Table('revisions', metadata,
    Column('revision_id', Integer, primary_key=True),
    Column('page_id', Integer, ForeignKey('pages.page_id')),
    Column('timestamp', DateTime),
    Column('text', String),
    Column('change_note', String(200))
)

It’s a very simple schema, and you can easily bind it to two classes:

class Revision(object):
    def __init__(self, page, text, change_note='', timestamp=None):
        if isinstance(page, (int, long)):
            self.page_id = page
        else:
            self.page = page
        self.text = text
        self.change_note = change_note
        self.timestamp = timestamp or datetime.utcnow()

    def render(self, request=None):
        """Render the page text into a genshi stream."""
        if request is None:
            request = get_request()
            if request is None:
                raise RuntimeError('rendering requires request context')
        return parse_creole(request, self.text)

class Page(object):
    def __init__(self, name):
        self.name = name

    @property
    def title(self):
        return self.name.replace('_', ' ')

Session.mapper(Revision, revision_table)
Session.mapper(Page, page_table, properties=dict(
    revisions=relation(Revision, backref='page', order_by=[desc(Revision.revision_id)])
))

This works pretty well, but as soon as you start putting your stuff into the template it feels unpythonic :-) You always have to provide two objects, the revision and the page. Why not combine them into one model? At first I dropped my database tables and merged them into a single table, but then I found out that you can map joins to classes in SQLAlchemy. That, together with Python’s multiple inheritance, lets me combine both tables into one class:

class RevisionedPage(Page, Revision):
    pass

Session.mapper(RevisionedPage, join(page_table, revision_table), properties=dict(
    page_id=[page_table.c.page_id, revision_table.c.page_id],
))
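To get a feeling for the rows such a joined mapping reads, here is a minimal sketch that uses the standard library’s sqlite3 module instead of SQLAlchemy. The table and column names mirror the schema above; the sample data and the latest_revision helper are made up for illustration and are not part of simplewiki.

```python
import sqlite3

# In-memory database with the same two tables as the schema above.
conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row  # rows can be accessed by column name
conn.executescript("""
    CREATE TABLE pages (
        page_id INTEGER PRIMARY KEY,
        name VARCHAR(60) UNIQUE
    );
    CREATE TABLE revisions (
        revision_id INTEGER PRIMARY KEY,
        page_id INTEGER REFERENCES pages(page_id),
        timestamp TIMESTAMP,
        text TEXT,
        change_note VARCHAR(200)
    );
""")

# Sample data, invented for this sketch.
conn.execute("INSERT INTO pages (page_id, name) VALUES (1, 'Main_Page')")
conn.executemany(
    "INSERT INTO revisions (page_id, timestamp, text, change_note) "
    "VALUES (?, ?, ?, ?)",
    [
        (1, '2007-11-22 10:00:00', 'first draft', 'created'),
        (1, '2007-11-22 11:00:00', 'second draft', 'typo fixes'),
    ],
)


def latest_revision(name):
    """Return the newest joined page/revision row for a page name, or None."""
    return conn.execute(
        """
        SELECT p.page_id, p.name, r.revision_id, r.timestamp,
               r.text, r.change_note
        FROM pages AS p
        JOIN revisions AS r ON r.page_id = p.page_id
        WHERE p.name = ?
        ORDER BY r.revision_id DESC
        LIMIT 1
        """,
        (name,),
    ).fetchone()


row = latest_revision('Main_Page')
print(row['name'], row['revision_id'], row['text'])
```

Every row of the join carries both the page columns and the revision columns, which is why a single class can expose name and text side by side.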

The joined class is very nice for the templates and also easy to query; it looks like a normal Python class. I haven’t tried whether it also works for write access, but I doubt it. Because of that I added an exception to the __init__ method to prevent creating pages and revisions via the RevisionedPage object.
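A guard like that could look roughly like the following sketch. It is plain Python 3 without SQLAlchemy, the classes are trimmed down, and load_revisioned_page is a hypothetical stand-in for the ORM. It assumes that the mapper builds loaded instances without calling __init__, which is what allows such a guard to coexist with querying.

```python
class Page:
    def __init__(self, name):
        self.name = name

    @property
    def title(self):
        return self.name.replace('_', ' ')


class Revision:
    def __init__(self, page, text, change_note=''):
        self.page = page
        self.text = text
        self.change_note = change_note


class RevisionedPage(Page, Revision):
    """Read-only combination of a page and one of its revisions."""

    def __init__(self):
        # Guard: new data has to be created through Page and Revision.
        raise TypeError('RevisionedPage is read-only; create Page and '
                        'Revision objects instead')


def load_revisioned_page(row):
    """Hypothetical loader standing in for the ORM: it creates the
    instance with __new__, so the guarding __init__ is never called."""
    obj = RevisionedPage.__new__(RevisionedPage)
    obj.__dict__.update(row)
    return obj


page = load_revisioned_page({'name': 'Main_Page', 'text': 'hello'})
print(page.title)
```

Calling RevisionedPage() directly raises the TypeError, while loaded objects still get everything defined on Page and Revision, such as the title property.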

The full example can be found in the simplewiki sources.

Convert an internal iterator into an external in Python

from py.magic import greenlet

def make_iterator(func):
    g1 = greenlet.getcurrent()
    g2 = greenlet(lambda: func(lambda item: g1.switch((item,))), g1)
    while 1:
        rv = g2.switch()
        if not rv:
            return
        yield rv[0]

Example usage:

def my_internal_iterator(f):
    for item in xrange(10):
        f(item)

iterator = make_iterator(my_internal_iterator)
iterator.next() # yields 0
iterator.next() # yields 1
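py.magic.greenlet is not installed everywhere. As a rough sketch of the same conversion with only the standard library, a worker thread and a queue can stand in for the pair of greenlets. This is a different technique from the snippet above, written for Python 3 (range, next()), and it is not strictly lock-step: the worker may run one item ahead of the consumer.

```python
import queue
import threading


def make_iterator(func):
    """Turn an internal iterator (a function that calls a callback once per
    item) into a generator by running it in a worker thread."""
    # At most one pending message, so the worker cannot run far ahead.
    items = queue.Queue(maxsize=1)

    def worker():
        try:
            func(lambda item: items.put(('item', item)))
        except Exception as exc:
            # Hand the error over so it is raised in the consumer.
            items.put(('error', exc))
        else:
            items.put(('done', None))

    # Daemon thread: if the consumer abandons the generator early, the
    # blocked worker does not keep the interpreter alive.
    threading.Thread(target=worker, daemon=True).start()

    while True:
        kind, value = items.get()
        if kind == 'done':
            return
        if kind == 'error':
            raise value
        yield value


def my_internal_iterator(f):
    for item in range(10):
        f(item)


iterator = make_iterator(my_internal_iterator)
print(next(iterator))
print(next(iterator))
```

The trade-off compared to greenlets is a real operating system thread per iterator and a worker that stays blocked if the generator is never exhausted.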

Convert a Request.write() into a WSGI yield

I have tried that for a long time now, using Python generators with yield and send, threads and much more, but I never got anything that was easy to understand and worked at the same time. The problem occurs if you have an old Python web application with some sort of request object whose write method writes directly to the output stream of the server interface (sys.stdout or some sort of FastCGI/mod_python output stream object) and you want to convert the application to WSGI. Take the following piece of code from an imaginary legacy application:

def old_application(request):
    from time import sleep
    request.header('Content-Type: text/html')
    for x in xrange(10):
        request.write(str(x) + ' ')
    request.flush()
    request.write('<br>And the next flush takes another second<br>')
    request.flush()
    sleep(1)
    request.write('And done!')

Say we want to convert that into a WSGI application with the same semantics:

def wsgi_application(environ, start_response):
    from time import sleep
    start_response('200 OK', [('Content-Type', 'text/html')])
    yield ' '.join(str(x) for x in xrange(10)) + ' '
    yield '<br>And the next flush takes another second<br>'
    sleep(1)
    yield 'And done!'

The main problem is the sleep call and all those flush calls. They mean we cannot just buffer the contents; we have to convert them into a generator on the fly. To get that done we need either coroutines, greenlets or two threads that communicate with each other. The easiest and, I guess, also the fastest approach is greenlets.

Here is a function that converts an old legacy application like the one above into a WSGI application with the same semantics:

from py.magic import greenlet

class Request(object):

    def __init__(self, environ):
        self._parent = greenlet.getcurrent()
        self.environ = environ
        self.status = '200 OK'
        self.headers = []

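    def header(self, item):
        # 'Name: value' becomes ('Name', 'value') without stray whitespace
        name, value = item.split(':', 1)
        self.headers.append((name.strip(), value.strip()))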

    def write(self, text):
        self._parent.switch(('write', text))

    def flush(self):
        self._parent.switch(('flush', None))

def convert_app(application):
    def wsgi_app(environ, start_response):
        request = Request(environ)
        buffer = []
        headers_sent = []

        def flush():
            if not headers_sent:
                start_response(request.status, request.headers)
                headers_sent.append(True)
            data = ''.join(buffer)
            if data:
                yield data
            del buffer[:]

        def run():
            application(request)
            request.flush()

        g = greenlet(run, request._parent)
        while 1:
            rv = g.switch()
            if not rv:
                break
            signal, value = rv
            if signal == 'flush':
                for item in flush():
                    yield item
            elif signal == 'write':
                buffer.append(value)
    return wsgi_app

Now that’s a bunch of code. Let’s go through it step by step. The first thing we do is create a request class. This class should resemble the old request object as much as possible. All methods can work like they did before; the only differences are the write and flush methods. Those switch back to the parent greenlet (the greenlet that created the request object, usually the main greenlet) and send some data to it (namely the name of the method and its argument). Whenever Python encounters such a switch it suspends execution there and goes back to the point that switched into this greenlet. In our example that point is inside the loop of the generator that makes up our WSGI application.

That leads us to the convert_app function, which is passed an old legacy application and returns a new WSGI application. Inside this new WSGI application we create a new request object, pass it the WSGI environment and create the objects and functions we need to process the data from the greenlets and convert it into a valid WSGI response: a buffer for unsent data; a list we use as a sentinel for sent headers; a flush function that starts the response, returns a generator with the data from the buffer and clears the buffer; and a run function that invokes the old application and calls request.flush() after the application has finished, so that we don’t have to do that in the application itself.

The main loop after that switches between the application greenlet and the main greenlet until the return value of switch is None (which is the case if the application has finished or someone switched into the main greenlet without arguments, which we don’t do). In that case we stop; otherwise we check whether it’s a flush or a write call and handle it.

To launch the converted application with wsgiref, all we need are these three lines of code:

from wsgiref.simple_server import make_server
srv = make_server('localhost', 5000, convert_app(old_application))
srv.serve_forever()

Basically, greenlets would make it possible to host mod_python applications inside arbitrary WSGI servers. Maybe in the future someone will write a module that lets us convert some mod_python applications into WSGI applications without touching the existing application code.
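For comparison, the "two threads that communicate with each other" option can be sketched with the standard library alone. This is not the greenlet converter from this post but a swapped-in variant for Python 3: the legacy application runs in a worker thread, and queue.Queue.join() / task_done() make write and flush block until the WSGI side has handled them. The UTF-8 encoding of the output and the shortened old_application are assumptions of this sketch.

```python
import queue
import threading


class Request:
    """Stand-in for the legacy request object. write() and flush() only
    post a message and wait until the WSGI side has processed it."""

    def __init__(self, environ, messages):
        self.environ = environ
        self.status = '200 OK'
        self.headers = []
        self._messages = messages

    def header(self, item):
        name, value = item.split(':', 1)
        self.headers.append((name.strip(), value.strip()))

    def _send(self, signal, value):
        self._messages.put((signal, value))
        self._messages.join()  # block until task_done() on the WSGI side

    def write(self, text):
        self._send('write', text)

    def flush(self):
        self._send('flush', None)


def convert_app(application):
    def wsgi_app(environ, start_response):
        messages = queue.Queue()
        request = Request(environ, messages)

        def run():
            try:
                application(request)
            finally:
                request.flush()
                messages.put(('done', None))

        threading.Thread(target=run, daemon=True).start()
        buffer = []
        headers_sent = False
        while True:
            signal, value = messages.get()
            if signal == 'done':
                break
            if signal == 'write':
                buffer.append(value)
            elif signal == 'flush':
                if not headers_sent:
                    start_response(request.status, request.headers)
                    headers_sent = True
                data = ''.join(buffer).encode('utf-8')  # WSGI wants bytes
                del buffer[:]
                if data:
                    yield data
            messages.task_done()  # let the application thread continue
    return wsgi_app


def old_application(request):
    request.header('Content-Type: text/html')
    for x in range(3):
        request.write(str(x) + ' ')
    request.flush()
    request.write('And done!')


captured = {}


def fake_start_response(status, headers):
    captured['status'] = status
    captured['headers'] = headers


chunks = list(convert_app(old_application)({}, fake_start_response))
print(chunks)
```

Because task_done() is only called after the generator has been resumed, the application thread stays parked at its flush until the server asks for the next chunk, which is close to what the greenlet switch does. An exception inside the application is not forwarded to the WSGI side in this sketch.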

Broken iPod Harddisk is not necessarily Broken

November 18th, 2007

My brother owns a 5th generation video iPod, and for some time it had been freezing on some songs and I was unable to sync it properly. After about 2 GB of transferred data OS X ejected the iPod, and sometimes iTunes locked up so badly that the only way to kill iTunes and get working USB ports again was to restart the notebook.

However, Graham sent me a link yesterday to some articles by people who had encountered the same problems. Among some solutions I haven’t tried (like opening the iPod and putting a small pile of paper behind the hard disk) there was one that proposed zeroing out the hard disk.

I did that and it didn’t help. However, I noticed that the disk ran a lot more quietly than before. So I repeated the process seven times overnight, and this morning I was able to reload the iPod without a problem. After a complete resync I tried all the songs on the device that had caused problems before, and they all play. Of course other songs could be broken now, but besides that it runs a lot faster and I haven’t encountered any “funny harddisk sounds” during testing.

Definitely worth a try. Don’t give up on iPods that fast :-)

Jinja 1.2 Released

November 17th, 2007

Jinja 1.2 is finally released. It took a long time because this release is basically a complete rewrite of the parser and parts of the lexer. From Jinja 1.2 onwards the template engine has its own parser, which makes it possible to introduce new syntactic elements and change operator precedence (especially for the filter operator, which finally binds more tightly than plus). We added more unit tests to check that Jinja stays compatible with the Python semantics we had previously, and all 153 tests pass :-)

With the new parser we also simplified the syntax a bit, made “call” a keyword and added a couple of new features. For example, the debugger (with the optional C extension compiled) is now able to rewrite all Jinja frames in a traceback, so you will never find yourself in a situation where the line numbers don’t match the template line numbers.

Another new syntax feature is conditional expressions derived from Python 2.5 ({{ foo if expr else bar }}). foo.0 is now the same as foo[0] for compatibility with Django, and tuples are finally tuples and not lists, which makes it possible to use string formatting with more than one value. Test functions with one parameter don’t need parentheses, so {{ foo is feeling(well) }} can now be written as {{ foo is feeling well }}.

We also added a concatenation operator that automatically converts all operands to strings before concatenating them. This is very useful if you have some integers and other values you want to join: {{ foo ~ bar ~ baz }} is equivalent to {{ foo|string + bar|string + baz|string }}.
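The points about tuples and the concatenation operator are easiest to see in plain Python. The sketch below is an analogy in Python 3, not Jinja’s implementation, and concat is a hypothetical helper name: a tuple supplies several values to one format string while a list counts as a single value, and ~ behaves like converting every operand to a string before joining.

```python
# A tuple supplies several values to one format string ...
print('%s has %d items' % ('cart', 3))

# ... while a list counts as a single value, so the same format fails.
try:
    '%s has %d items' % ['cart', 3]
except TypeError as exc:
    print('TypeError:', exc)


def concat(*values):
    """Plain-Python analogue of {{ foo ~ bar ~ baz }}: convert every
    operand to a string, then join the results."""
    return ''.join(str(value) for value in values)


print(concat('page ', 3, ' of ', 10))
```

With a plain + the last line would need explicit str() calls around the integers, which is exactly the boilerplate the ~ operator removes in templates.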

Another great new feature (if you are a Django user) is the new Django support module, which makes it a lot easier to integrate Jinja into Django. It makes it possible to use Django filters in Jinja and to load templates from the same folders.

Aside from that there are a couple of improvements and bugfixes. As always you can easily install Jinja directly from PyPI or download it.

What about… Pocoo?

November 16th, 2007

If you have a look at the current pocoo homepage or the Trac you will find that the project is more or less unmaintained right now. There are many reasons for that, but the most important one is that we will probably rework some parts of the website in the near future. Why is that? About two years ago I was working with some other people (some of them are now part of the pocoo team) on the German Ubuntu portal ubuntuusers.de. We used and still use a heavily modified phpBB, the MoinMoin wiki engine and some other components implemented on a pre-magic-removal Django. Back then WSGI was just a PEP without an implementation.

My big plan was to create a replacement for phpBB in Python and integrate it with Moin. However, two years later it looks like this is not the best solution either. The main problem we have on ubuntuusers is not that we have different programming languages; it is that we have different auth systems, templates, databases, storage systems etc. One minor change in a template requires modifications in three different applications.

While we don’t have a solution for ubuntuusers *yet*, it turns out that pocoo won’t be one either. However, pocoo was and is no failure. On the long path to pocoo we created some good and some not so good libraries. There was Colubrid, which was one of the first WSGI dispatchers out there, and its successor Werkzeug, which will have its first release in two weeks. There is Jinja, which is now used in a couple of big Django projects; there is Pygments, which soon became the favorite source code highlighter of many developers; and there will be TextPress around this Christmas, which will be the first big Python blog engine as far as I know. Georg Brandl also started the new Python documentation project in the pocoo repositories, to which some of us contributed code or knowledge.

But what will happen to pocoo? Right now nothing, except that we will probably replace the pocoo.org index page with something that points directly to the projects. After Christmas (in fact after the TextPress release) I want to have another look at the pocoo source code and turn parts of it into standalone packages, for example the config file parser. And after all that, half of the team will have finished a different project which currently takes up some of our time, and will continue working on the pocoo forum, probably under a different name and finally based on all the libraries that evolved around it.

Why am I writing this today? Around this date two years ago I was setting up the pocoo.org server and reading the WSGI spec. What followed were two years of hacking on multiple pieces of software, some of which became popular enough that people are actually using them :-)

So in that sense: Thanks Georg, Alexander, Benjamin, Christopher, Lukas, Ferdinand, Tassilo and all the others who worked on the pocoo projects :-)

Python Wishlist

  • new, non-flat stdlib that follows PEP 8
  • absolute imports only
  • coroutines
  • assignments as expressions
  • better scoping
  • a with-statement that executes blocks instead of wrapping code

The Latest Downtime

November 15th, 2007

From yesterday 18:00 until now pocoo.org and all the related domains (including wiki.python.de and pygments.org) were down because we moved all the Xen instances to a new Hetzner server. However, the RAM usage of the instances is still unchanged, so there will be another small downtime in the next few days. And small means ~20 minutes. Because I expected the migration to take less time, I have to postpone the upcoming Jinja release until Saturday.

Additionally we’re struggling with a mod_wsgi bug that is probably not mod_wsgi’s fault. Under certain conditions a C extension seems not to release the GIL, with the result that one Apache process consumes all the available processor power.

Sorry for the inconvenience caused.

Update: resource relocation done (for the moment at least)
