Archive for January, 2008

Mercurial for Subversion Users

January 28th, 2008

More and more projects are switching over to mercurial or a similar DVCS. Great as mercurial is, it’s hard to get started if you are used to Subversion because the concepts behind Subversion (svn) and mercurial (hg) are fundamentally different. This article should help you understand how mercurial and similar systems work and how you can use them to contribute patches to the pocoo projects.

If you compare Subversion to mercurial you won’t find that many similarities besides the command arguments. Subversion works like FTP whereas mercurial is like BitTorrent. In Subversion the server is special: it keeps the whole revision log and most operations require a connection to this server. In mercurial I can take down the central repository, if there is one, and all developers will still be able to exchange changes. All the revision information is available to everyone and there is absolutely no difference between server and clients.

This fundamental design decision means that there are dozens of separate branches of the code. hg makes it easy to branch and merge because it was developed exactly for that. In Subversion branching and merging is painful, and often people just don’t branch and don’t commit their changes until the testsuite etc. passes again, which of course results in huge changesets. But let’s step right into it!

The first thing you do in Subversion is either create a repository on the server or check it out on the client. In hg there is no difference between server and client so the process of creating a repository is available to everybody. Creating a repository is as simple as typing “hg init name_of_the_repository”. If that folder does not exist yet, hg will create an empty folder and initialize it as the root of the repository; otherwise it will create the repository inside the existing folder of that name.

The process of checking out is a bit different from Subversion because it’s effectively the same as creating a branch. Say you want to check out the current Pygments version to make some changes. The first thing you will do is look for a way to access this repository. There are three very common ways to access it: filesystem, HTTP or SSH. Pygments is available via SSH and HTTP, but for non-core developers only HTTP is available. Interestingly quite a few people have problems locating the checkout URL, which is not very surprising because hgweb handles that under the same URL as the web interface. hgweb is the standard mercurial web interface which doesn’t only provide a way to look at the changesets and tree but also handles patch exchange. In the case of Pygments this command should give you a fresh checkout in a few seconds in the new folder “pygments”:

hg clone http://dev.pocoo.org/hg/pygments-main pygments

One thing you will notice is that it’s incredibly fast and, even though the repository contains the whole history, the checkout is pretty small. At the time of writing this blog post the pygments sourcecode including the unittests and example sourcecode, without the revision history, is 2.5MB. A complete mercurial checkout is only 5MB even though it includes 486 changesets.

After you got your very own repository by cloning the pygments one you will notice that all the subversion-like commands (“hg ci”, “hg add”, “hg up”, …) work locally only. You check into your local version of the repository and “hg up” won’t incorporate remote changes. One of the things that happen on “hg clone” is that mercurial stores the path of the repository you cloned from in the hgrc of the newly created repository. This file (“.hg/hgrc”) is used to store per-repository configuration like the paths of remote repositories, the name used for checkins, plugins that are only enabled for this repository and more. Executing “hg pull” will automatically pull changes from this remote repository and put them into the current repository as a second branch. To see what “hg pull” would pull from that remote repository you can execute “hg incoming” and it will print a list of changesets that are in the remote repository but not yet in the local one. After you have pulled you have to update the working copy with “hg up” so that you can actually see the changes. If there were remote changes that require merging you have to “hg merge” them and “hg ci” the merge.

Because this process is very common there are ways to simplify it. “hg pull && hg update” can be written as “hg pull -u”. All the commands (pull, update, merge and checkin if required) can be handled in one go using “hg fe” (short for “hg fetch”). This command however is part of a plugin which is disabled by default. If you want to use it you have to add the following lines to the repository hgrc or your personal one:

[extensions]
hgext.fetch=
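
For simple files like this, the hgrc can also be inspected from Python. The following is only a sketch: it assumes the file stays within plain INI syntax that the standard library's configparser understands (mercurial's own parser supports more), and the file contents and helper names are made up for illustration.

```python
import configparser

# Hypothetical contents of a repository's .hg/hgrc after "hg clone",
# with the fetch extension enabled as described above.
HGRC_TEXT = """\
[paths]
default = http://dev.pocoo.org/hg/pygments-main

[extensions]
hgext.fetch=
"""


def read_hgrc(text):
    """Parse hgrc-style text into a ConfigParser (simple cases only)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


def extension_enabled(parser, name):
    """Return True if *name* is listed in the [extensions] section."""
    return parser.has_option("extensions", name)


config = read_hgrc(HGRC_TEXT)
default_path = config.get("paths", "default")
fetch_enabled = extension_enabled(config, "hgext.fetch")
```

Here default_path is the repository that a plain “hg pull” talks to, and fetch_enabled tells you whether the extension line is present.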

The other important difference to Subversion is how you get your changes back to the server. In open source projects usually only a small number of developers have access to the main repository, and contributors create patches using “diff” or “svn diff” and mail them to one of the people with commit rights or attach them to a ticket in the project’s tracker. With mercurial, if you have push privileges you can do “hg push” and it will push the changesets which are not yet on the server (you can look at them first using “hg outgoing”). If you don’t have push access you can create a bundle of changes and attach that to a ticket rather than a patch. A bundle stores multiple changesets in one file and it also preserves the correct author information and timestamps. Another way is mailing the changes to another developer using the patchbomb extension (I won’t cover that here, just google it). Or you can let other people pull from your repository. For that you either have to configure your Apache to serve an hgweb instance or you just call “hg serve”, which will spawn a server on localhost:8000 that everybody can pull from.

Once the developer has decided to put your changes into the central repository and has pushed them, your changes will appear there unaltered and with the same revision hashes. What will be different is the local number the changeset is given. If the revision was called 42:deadbeef locally it could be called 52:deadbeef on the server because different changesets were applied first.
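
Why the hash stays the same while the number changes can be shown with a toy model. This is a sketch and not how mercurial actually computes its hashes (the real ones cover more data); the ToyRepo class, the changeset_id function and the authors are invented for illustration. The only point is that the ID depends on content and parent, while the number is just the position in one particular repository.

```python
import hashlib


def changeset_id(parent_id, author, message):
    """Derive a short ID from the changeset's content and its parent."""
    payload = "\n".join([parent_id, author, message]).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()[:12]


class ToyRepo:
    """A list of changeset IDs; the list index is the local revision number."""

    def __init__(self):
        self.changesets = []

    def add(self, cset_id):
        """Store a changeset (once) and return its local revision number."""
        if cset_id not in self.changesets:
            self.changesets.append(cset_id)
        return self.changesets.index(cset_id)


root = changeset_id("", "alice", "initial import")
fix = changeset_id(root, "bob", "fix lexer bug")
other = changeset_id(root, "carol", "add new style")

local = ToyRepo()
local.add(root)
local_number = local.add(fix)      # position in bob's clone

server = ToyRepo()
server.add(root)
server.add(other)                  # carol's change arrived first
server_number = server.add(fix)    # same ID, later position
```

The same changeset ends up with a different number in each repository, but anyone can still refer to it unambiguously by its ID.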

All the commands that interact with remote repositories (“hg pull”, “hg push”, “hg fe”, …) also accept a path other than the default path from the hgrc as an argument. This allows you to pull changes from any repository shared over the web.

A cool example of what mercurial allows you to do is our last ubuntuusers webteam meetup. There we used my notebook to store the central repository and everybody pushed their changes to it every once in a while. Additionally some people exchanged patches for not yet working features among each other so that the code in the central repo was seldom broken. When I left, everybody had all the changes locally because they had pulled, so I could remove my notebook and everybody continued working on their way home. When we met again on IRC I copied my repo to the server and everybody pushed their local changes to it.

ubuntuusers Webteam Meeting

January 28th, 2008

The last four days I attended our first offline webteam meeting and it was fun ;-) We spent a few days hacking on various parts of the ubuntuusers software and had a great time. While it’s pointless to talk about what we did because hacking on broken code is not that much fun as such (unless you can laugh about PHP bugs and stuff like that) I think we can present some of the changes with the upcoming hardy release.

Don’t expect too much but that shows that we are doing something even if you think development on the German portal software is stalled. That’s certainly not the case.

But we are once again looking for web designers. The sourcecode is multilingual and the development takes place in both German and English for potential future translations to languages other than German, so you can participate even if you don’t speak German :-) If you are interested in participating and if you have advanced XHTML1/HTML4/CSS2 skills, we want you for the ubuntuusers webteam!

If you are interested mail me at mitsuhiko you know what comes next ubuntu dot com.

No…

I will not blog about it, because it just sucks.

Python Code Introspection

January 21st, 2008

Georg blogged about sphinx recently and now we can all rejoice because sphinx can now be used for custom documentation (non-CPython documentation) too. Unfortunately hand written documentation alone is hard to maintain.

For Werkzeug 0.2 I want to switch to a combination of hand written documentation and source documentation generated from python sourcecode, powered by sphinx of course. The main problem is getting the information out of the sourcecode. Unlike in compiled languages, a lot in Python is dynamic and much of the functionality is only available at runtime. Unlike in PHP, objects can be modified later on, and unlike in Ruby there are attributes and methods on classes so it’s hard to hide private/protected stuff on a class. epydoc came up with some conventions (especially for rst users) but introspection is still too hard in python and will lead to unexpected results.

A pretty common idiom in python is to implement a class in one file and import it somewhere else, from where it then becomes importable. One of the best examples was Python’s “re” module. Until 2.5 that module was a stub and the actual functions were imported from the “sre” module. This is difficult if not impossible to get right in automatically generated documentation. Pydoc for example requires __all__ to be defined in that module, otherwise it will hide those objects like it hides every other import.
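
The re-export problem is easy to reproduce with introspection. The sketch below uses the standard library's json package as a present-day stand-in for the re/sre example (JSONDecoder is implemented in json.decoder and re-exported from json); the helper names are made up for illustration.

```python
import json


def defining_module(obj):
    """Name of the module where a class or function was actually defined."""
    return obj.__module__


def is_reexported(module, name):
    """True if *name* is reachable via *module* but defined elsewhere."""
    obj = getattr(module, name)
    return defining_module(obj) != module.__name__


def is_public(module, name):
    """True if the module lists *name* in __all__ (no __all__ counts as no)."""
    return name in getattr(module, "__all__", ())


reexported = is_reexported(json, "JSONDecoder")   # defined in json.decoder
public = is_public(json, "JSONDecoder")           # but listed in json.__all__
```

A documentation tool that only trusts the defining module would file the class under json.decoder, which is not where users import it from. Checking __all__ is one convention a tool can fall back on, but it only helps when the module author maintains it.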

Another example which is a pain in the ass to get right by automatic introspection is function decorators and descriptors. Descriptors also break in pydoc half of the time and nobody seems to know when exactly they break. Function decorators are impossible to get right because even if you try to traverse the closure variables of all the wrapped functions up to the original function, the result could be terribly wrong because one function in the middle is indeed variadic.
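
The decorator problem can be demonstrated in a few lines. This sketch assumes a current Python 3, where inspect.signature exists and follows the __wrapped__ attribute set by functools.wraps; the decorator and function names are invented. It only shows that a cooperating decorator helps, not that the general problem is solved.

```python
import functools
import inspect


def naive_logged(func):
    """Decorator that hides the wrapped function's metadata."""
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def careful_logged(func):
    """Same decorator, but copying metadata with functools.wraps."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@naive_logged
def escape_naive(s, quote=False):
    """Escape a string for HTML."""
    return s


@careful_logged
def escape_careful(s, quote=False):
    """Escape a string for HTML."""
    return s


naive_signature = str(inspect.signature(escape_naive))
careful_signature = str(inspect.signature(escape_careful))
```

With the naive decorator a documentation tool sees a function called “wrapper” without a docstring that takes arbitrary arguments. With functools.wraps the name, docstring and signature of the original survive, but that only works if every decorator in the chain plays along.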

Because it’s that hard to get right I don’t want to lose time on solving something that’s unsolvable anyways. My current idea for making the documentation process a little less annoying is to add directives to sphinx that pull information from python objects into rst files in a semiautomatic way.

Say you have a pretty complex module that implements a couple of functions and classes and who knows what. In addition to the documentation from the docstrings you also want to group the objects by topic and add hand written documentation for every group.

In the python code you document all your functions with nice rst docstrings with the epydoc conventions. Additionally you add group markers:

def escape(s, quote=False):
    """This function escapes strings for HTML documents.

    :param s: the string to escape
    :param quote: set to `True` to escape the quote character too.
    :return: escaped string
    :group: html-helpers
    """

You can do that with every function and object. To dump the documentation for a group to an rst document you would then use a little directive:

HTML Helpers
============

These functions and classes help you process HTML.

.. autodoc:: group html-helpers [werkzeug.utils]

This will dump all the objects flagged with that group in alphabetical order to the rst document. Of course this breaks for complex use cases, so you should be able to step back and take manual control. Imagine you have a pretty complex object and you want to hide some of the methods / attributes, mark missing variables or make a member a descriptor rather than a method if the automatic discovery failed:

Request
=======

summary for the request object goes here.

.. autodoc:: object werkzeug.wrappers.Request

    no_docstring: yes
    extra_members:
        - `_get_file_stream`

But how exactly that should work, I don’t have a good idea yet. I want to find one quickly though, because right now I’m pretty unhappy: writing docs both in the sourcecode and in hand written documentation sucks ass.
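
Just to make the group idea from above concrete, here is one way the collecting step could look. This is a sketch under assumptions, not the planned sphinx directive: the “:group:” field is read with a simple regular expression, only functions and classes are considered, and the helper names and the demo module are invented.

```python
import inspect
import re
import types

GROUP_FIELD = re.compile(r"^\s*:group:\s*(\S+)\s*$", re.MULTILINE)


def get_group(obj):
    """Return the value of the ':group:' docstring field, or None."""
    doc = inspect.getdoc(obj) or ""
    match = GROUP_FIELD.search(doc)
    return match.group(1) if match else None


def collect_group(module, group):
    """Names of module members flagged with *group*, alphabetically sorted."""
    names = []
    for name, obj in vars(module).items():
        if name.startswith("_"):
            continue
        if inspect.isfunction(obj) or inspect.isclass(obj):
            if get_group(obj) == group:
                names.append(name)
    return sorted(names)


def escape(s, quote=False):
    """This function escapes strings for HTML documents.

    :param s: the string to escape
    :group: html-helpers
    """


def unescape(s):
    """Reverse of escape.

    :group: html-helpers
    """


def redirect(location):
    """Return a redirect response.

    :group: http-helpers
    """


# Stand-in for a real module such as werkzeug.utils.
demo_module = types.ModuleType("demo_utils")
for func in (unescape, redirect, escape):
    setattr(demo_module, func.__name__, func)

html_helpers = collect_group(demo_module, "html-helpers")
```

A directive could then render the docstring of every name in html_helpers into the rst document. The hard parts mentioned above (descriptors, decorators, re-exported objects) are deliberately left out.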

Update: Fixed Georg’s name :-)

Error Reporting in WSGI Applications

January 18th, 2008

Thanks to a Google alert on “pygments” I stumbled across gluon/web2py, some sort of “enterprise ready framework”. It’s definitely not a framework I would use myself for countless reasons but there is one thing on the feature list which I found interesting. Apparently gluon files tickets for tracebacks in the database. While it’s a terrible idea to put that data into the very same database all your application data goes into (what happens if the DB is down?) it’s a different approach from Django, which sends mails on errors.

Two days ago someone mentioned the paste WSGI middleware which sends emails on tracebacks, similar to the way Django does that. I’m not a fan of error reporting middlewares because I think that should go into the application. On a server error (caused by lost database connections, programmer errors etc.) you should not present the user a black-and-white error page or display an inline traceback like most PHP applications do. We can do better!

First of all there is a module in the standard library everybody seems to forget about. It’s called “logging” and does exactly that — it logs errors. I don’t know why so many people miss it or just don’t use it, but it’s really one of the good things in the python standard library.

Why is it so good? It’s extensible and configurable. The idea is that the application gets itself a logger and logs into that logger (debug messages, information messages, warnings, errors or tracebacks). Independently of the logging there are log handlers which do something with the logged messages. For example you can tell the logger to log everything except debug output into a rotated file and mail all errors to some mail addresses.
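
That last setup could look roughly like this. It is a minimal sketch using the standard library handlers RotatingFileHandler and SMTPHandler; the function name, the file size limit, the mail subject and all addresses are assumptions picked for illustration.

```python
import logging
import logging.handlers


def build_logger(name, log_path, mailhost, from_addr, admin_addrs):
    """Create a logger with a rotating file handler and a mail handler."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # let the handlers decide what to keep

    # Everything except debug output goes into a rotated file.
    # delay=True postpones opening the file until the first message.
    file_handler = logging.handlers.RotatingFileHandler(
        log_path, maxBytes=1_000_000, backupCount=5, delay=True
    )
    file_handler.setLevel(logging.INFO)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )

    # Errors (including logged tracebacks) are mailed to the admins.
    mail_handler = logging.handlers.SMTPHandler(
        mailhost, from_addr, admin_addrs, subject="Application error"
    )
    mail_handler.setLevel(logging.ERROR)

    logger.addHandler(file_handler)
    logger.addHandler(mail_handler)
    return logger
```

The application would call something like build_logger("myapp", "myapp.log", "localhost", "app@example.com", ["admin@example.com"]) once at startup and then only use logger.info(), logger.warning() or logger.exception(). Where the messages end up is decided by the handlers, not by the code that logs.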

In the Werkzeug wiki there is a page about error handling in WSGI applications using the logging module. The total amount of code needed for a simple WSGI application with logging is about ten lines or so. And it’s flexible enough to integrate into every WSGI application setup, so there is no need to solve that in a middleware where you don’t have access to your application’s config or whatever.
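
To give an idea of the shape, here is a bare WSGI application that logs its own errors. This is a sketch and not the code from the wiki page: the logger name, the handle_request function and the /broken path are made up, and a real application would attach handlers to the logger first.

```python
import logging

logger = logging.getLogger("myapp")  # hypothetical application logger name


def handle_request(environ):
    """Stand-in for the real application logic; fails on /broken."""
    if environ.get("PATH_INFO") == "/broken":
        raise RuntimeError("lost database connection")
    return "Hello World!"


def application(environ, start_response):
    """WSGI entry point that logs unexpected errors itself."""
    try:
        body = handle_request(environ)
        status = "200 OK"
    except Exception:
        # logger.exception records the message plus the current traceback.
        logger.exception("Unhandled error for %s", environ.get("PATH_INFO"))
        body = "Sorry, something went wrong. The error has been reported."
        status = "500 Internal Server Error"
    payload = body.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]
```

Because the try/except lives inside the application, the error page can use the application’s own templates and configuration, which a generic middleware cannot know about.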

And the best thing about it is that you can configure the way errors are handled. Like I mentioned before gluon creates entries in the database for logged errors. I wrote a small log handler that does the same but for an external trac. Whenever an application error occurs the handler checks if there is already a ticket for that error; if not it creates one. For every new occurrence of that bug it will create a comment in that ticket.
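
The idea behind such a handler fits into a small logging.Handler subclass. The sketch below is not the code from the sandbox repository: the InMemoryTracker class is a hypothetical stand-in for the trac connection, and using a hash of the formatted traceback to recognize the same error again is an assumption.

```python
import hashlib
import logging
import traceback


class InMemoryTracker:
    """Hypothetical stand-in for an external ticket system such as trac."""

    def __init__(self):
        self.tickets = {}  # error signature -> ticket data

    def find(self, signature):
        return self.tickets.get(signature)

    def create(self, signature, summary, description):
        self.tickets[signature] = {
            "summary": summary,
            "description": description,
            "comments": [],
        }

    def comment(self, signature, text):
        self.tickets[signature]["comments"].append(text)


class TicketHandler(logging.Handler):
    """Files one ticket per distinct error and comments on repeats."""

    def __init__(self, tracker, level=logging.ERROR):
        super().__init__(level)
        self.tracker = tracker

    def emit(self, record):
        try:
            if record.exc_info:
                details = "".join(traceback.format_exception(*record.exc_info))
            else:
                details = record.getMessage()
            # Identical tracebacks map to the same ticket.
            signature = hashlib.sha1(details.encode("utf-8")).hexdigest()
            if self.tracker.find(signature) is None:
                self.tracker.create(signature, record.getMessage(), details)
            else:
                self.tracker.comment(signature, "Error occurred again.")
        except Exception:
            self.handleError(record)
```

Attaching it works like with any other handler: logger.addHandler(TicketHandler(tracker)). The application code keeps calling logger.exception() and does not need to know that tickets exist.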

If you want to try it out yourself, I added the code to the sandbox repository.

Update: I also created a trac hack for it.

New Stuff in Werkzeug (and the WSGI World)

January 16th, 2008

Unfortunately I’m very busy lately so there are few updates on Werkzeug and all the other libraries I personally contribute to (and there are even some patches in my mail queue I have to apply after reviewing) but that doesn’t mean that there is no progress :-)

There is actually quite a lot of new stuff between the 0.1 release and now. Werkzeug tries to implement stupid stuff you reimplement in every second application in a way that you can use it with minor modifications in your application. Because many of those features only come up if you have implemented them often in your own applications they didn’t make it into 0.1. Thanks to all the early adopters we now have cool new stuff that was implemented because there was need for it.

For example Werkzeug now has RFC-compliant Etag and Cache-Control parsing. You can also generate etags automatically for responses and make them conditional for some requests. The utility module was extended with a lot of new stuff that fixes limitations in the standard library or implements long missed functions like finding modules in a package (very useful if you want to automatically register controllers), importing modules by a string (useful if your URL map endpoints are (partial) import paths), generating URLs like trac does by calling an Href object, functions to fix URLs similar to the way Firefox fixes them, dumping and parsing HTTP dates, loading and dumping cookies with a simple function call (and support for http only) and probably a lot more before the actual 0.2 release.
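
As an illustration of one of these helpers, importing by a string can be approximated with the standard library alone. This is a sketch of the idea and not Werkzeug’s implementation; the function name import_by_string is made up.

```python
import importlib


def import_by_string(import_name):
    """Import an object from a dotted path such as 'package.module.attr'.

    A path that names a module returns the module itself.
    """
    try:
        return importlib.import_module(import_name)
    except ImportError:
        if "." not in import_name:
            raise
    # Not a module: treat the last part as an attribute of its parent module.
    module_name, attr_name = import_name.rsplit(".", 1)
    module = importlib.import_module(module_name)
    try:
        return getattr(module, attr_name)
    except AttributeError:
        raise ImportError(f"cannot import {attr_name!r} from {module_name!r}")
```

With a helper like this a URL map can store endpoints such as "myapp.views.index" (a hypothetical path) as plain strings and only import the view when a request for it arrives.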

Another cool thing (unrelated to Werkzeug) is that more and more cool modules come up that make web development a charm. While babel has been available for quite a while I hadn’t really used it until last week and it’s really great. Together with Werkzeug’s routing system and the ability to do multiple inheritance in SQLAlchemy there are some cool ways to develop internationalized web applications. I know the last sentence doesn’t make sense without the context, so I guess I have to blog about that. sooo freaking cool.

Form handling in Python is still a bit strange if you are not using django’s newforms (I know that there is formencode but somehow it isn’t what I’m looking for) but there is now WTForms which looks promising. With some more small changes it could become a very cool form handling system (for example I’m missing some default validators at the moment and I’m not completely sure how to pass choices to a select box on a per form instance basis). WTForms was derived from an application that is already in production so it already solves many problems nicely. It’s a library I want to watch closely over the next weeks.

And I think that approach should become the way Werkzeug is developed in the future. Implement features a release earlier and mark them as “under consideration” if they are not yet used in production applications. If you adopt them early you can give feedback and we can improve them for the next Werkzeug release and streamline the API.

What’s to do until the next Werkzeug release? Georg is currently working on making the sphinx documentation builder independent from the CPython documentation so that other projects can use it too. I then want to semi-automatically build an API documentation for Werkzeug and combine it with hand written rst pages for the Werkzeug 0.2 documentation. I got some feedback on the Werkzeug docs and it looks like they are a bit too chaotic and misleading. Especially getting started with Werkzeug is still too complicated so I hope we can address this with a new documentation that combines automatically generated documentation with tutorials.

The documentation tool could probably be useful for other projects too, I guess Georg will drop some lines in his blog once it’s ready.

Updates regarding Jinja will be up shortly; there is currently a branch developed by Lakin Wecker to speed up Jinja template evaluation. And if you already know what GHRML/XAML is going to be: I will try to get that running this weekend.

That’s it for the moment ;-)

How to fix Python

For a long time I thought I was the only person disliking the stdlib, but apparently there are more :-) Not only does every library have its own code style, the naming differs too, and some are implemented insanely badly (Cookie.py anyone?) or Javaish (threading, unittest) or are just plain stupid (codeop). I know we can’t just get rid of that stdlib. Wait a moment, why can’t we? Python 3 is upcoming and we all have to adapt our libraries and applications anyways.

It wouldn’t be too hard actually. Python loads modules from zip files anyways. Just move all the stuff into a zip file, move it into “old” or something, and old applications just have to alter their imports from “from cgi import escape” to “from old.cgi import escape”. Or the other way round: the old libraries stay where they are and the new stdlib goes into “py.*” or “std.*”.
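
The zip file part is easy to try out. The sketch below builds a tiny archive with an “old” package and imports from it; the archive name and the simplified escape function are invented for the demo and are not the real stdlib code.

```python
import os
import sys
import tempfile
import zipfile

# Hypothetical legacy module that would move into the "old" package.
LEGACY_CGI_SOURCE = '''
def escape(s, quote=False):
    """Replace &, < and > (and optionally quotes) with HTML entities."""
    s = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s
'''


def build_legacy_zip(path):
    """Write a zip archive containing the package old/ with one module."""
    with zipfile.ZipFile(path, "w") as archive:
        archive.writestr("old/__init__.py", "")
        archive.writestr("old/cgi.py", LEGACY_CGI_SOURCE)


zip_path = os.path.join(tempfile.mkdtemp(), "old_stdlib.zip")
build_legacy_zip(zip_path)
sys.path.insert(0, zip_path)  # zip files on sys.path are importable

from old.cgi import escape

escaped = escape("<b>Tom & Jerry</b>")
```

The import line is exactly the one from the proposal. Everything else is only there to create the archive on the fly.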

Insane or a good idea? You be the judge.

Python Template Engine Comparison

January 1st, 2008

I was small-talking with zzzeek about some things when I told him that I’m using Jinja, Genshi and Mako depending on what I’m doing. He told me that it’s unusual to switch tools like that but I don’t think it’s that unusual.

All three template engines are totally different but have one thing in common: All three are the “second generation” of template engines. Genshi is the formal successor of kid, Mako somewhat replaced Myghty and Jinja was inspired by the django templates. All three of them are framework agnostic, use unicode internally and have a cool API you can use in WSGI applications without scratching your head. But what inspired those template engines and which template engine should you choose for which situation?

I often used PHP in the past to do simple header/footer inclusion. But what always drove me nuts was that I had to use mod_rewrite to get nice URLs or use a bunch of folders with index.php files or use files and folders and drop the extension in the apache config. While this works, it is not that portable, you can’t have dynamic parts in the URL, and once you want some more dynamic stuff such as RSS feeds etc. you notice that you made a mistake by choosing PHP. Some days ago I then started working on the website for TextPress (not yet online) and wanted to try something new: I wrote a tiny WSGI application (about 50 lines of code) that just uses werkzeug’s routing system and uses template names as endpoints. These templates are then loaded with Mako, rendered and returned as responses. This is not possible in the same way with Jinja because you don’t have python blocks, and it is not as simple and straightforward with Genshi because you have to think about XML or use a rather limited text based template engine. Another very cool feature of Mako is that you can do dynamic inheritance which is not possible in Jinja.
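
The structure of such an application is roughly the following. To keep the sketch runnable with the standard library only, a plain dict stands in for werkzeug’s routing system and string.Template stands in for Mako, so this is explicitly not the TextPress site code; the URLs and template contents are made up.

```python
from string import Template

# URL path -> template name (the "endpoint"). Hypothetical site layout.
URL_MAP = {
    "/": "index.html",
    "/about": "about.html",
}

# In-memory stand-in for a template folder.
TEMPLATES = {
    "index.html": Template("<h1>$title</h1><p>Welcome!</p>"),
    "about.html": Template("<h1>$title</h1><p>About this site.</p>"),
}


def application(environ, start_response):
    """Look up the template for the requested path and render it."""
    path = environ.get("PATH_INFO", "/").rstrip("/") or "/"
    template_name = URL_MAP.get(path)
    if template_name is None:
        status, body = "404 Not Found", "<h1>Not Found</h1>"
    else:
        status = "200 OK"
        body = TEMPLATES[template_name].substitute(title="TextPress")
    payload = body.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]
```

Adding a page means adding one URL rule and one template. With a real routing system the rules could also contain dynamic parts, which is what the PHP setup could not do.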

Mako is a great template engine if you know Python, if you need some logic in templates (and you know: logic in templates is not bad in every situation) and if you need the best performance. Without a doubt Mako is one of the fastest template engines for Python and the fastest template engine that does not use a C extension.

Then there is Jinja which is also a text based template engine like Mako. However the focus is on a completely different level. Where Mako is like PHP, Jinja is like Smarty (even though Mako is a million times better than PHP as a template engine). When I started working with Python as a programming language for web applications I stumbled upon django. I looked at the template engine and thought: WTF is that? The syntax seemed odd and the restrictions ridiculous. Later on I loved the syntax (and apparently others do too: the mini template engine by Ian Bicking (tempita if I recall correctly) and the Genshi text templates are using that syntax or a similar one too) but some of the restrictions still seem ridiculous. When I looked at all those Django templates I created over time I noticed that I often moved calculations into template tags that could be function calls, that I did other calculations in the view functions that did not belong there and even more important: that you could replace 95% of the custom template tags with function calls or function calls with an enclosed template block if the template engine had proper expressions. This led to the development of what is now known as Jinja. The syntax, the fact that it’s sandboxed and the designer friendliness are still very similar to Django, but unlike in Django python-like expressions are possible in Jinja.

I’m using Jinja wherever I think web designers will want to work on the templates later on. For example as template engine for TextPress or other applications that should be styled by third party web designers.

Genshi on the other hand is an XML template engine. As a result of that it’s slower but also “context aware”. It knows when it’s processing a CDATA section, it knows when it’s inside a tag or an attribute etc. This makes it possible to defend against XSS in an automatic way. By default Genshi inserts the text into the output stream as text and not as markup. That means all the HTML entities are automatically escaped for you. And because it’s stream based you can rewrite streams during the rendering process. This makes it possible to fill form fields automatically, use XInclude for simple layout templates and a lot more. You can even translate your XML based templates into HTML4 on the fly. So you can use your XML tool chain internally and output HTML4 and get the best of both worlds. But because of this high flexibility Genshi also has some problems to fight: You need to have XML knowledge to use it. No problem if you are a programmer, but not that good if you are a web designer doing fancy layouts. You are also forced to use XML templates everywhere. It’s true that Genshi has text templates too to fill the gaps, but they are not comparable with real text template engines and you are still operating on an XML stream, just that you don’t see it. And lastly: this whole stream processing makes Genshi slow. Not so slow that you can’t use it for big applications, but noticeably slower than Mako or Jinja.
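
The difference between inserting text and inserting markup can be illustrated without Genshi. The following sketch uses html.escape from the standard library and a tiny hypothetical Markup class; it is not Genshi’s implementation and it is not context aware, it only shows the “escape unless explicitly marked as markup” default.

```python
from html import escape


class Markup(str):
    """Marks a string as trusted markup that must not be escaped again."""


def render_text(value):
    """Escape plain text for output; pass trusted Markup through unchanged."""
    if isinstance(value, Markup):
        return str(value)
    return escape(str(value), quote=True)


def render_paragraph(value):
    """Insert *value* into a <p> element, escaping it unless it is Markup."""
    return Markup("<p>" + render_text(value) + "</p>")


user_input = "<script>alert('xss')</script>"
safe_output = render_paragraph(user_input)
trusted_output = render_paragraph(Markup("<em>hello</em>"))
```

User input ends up as harmless text in safe_output, while the explicitly trusted snippet is inserted as is. The template author has to opt in to raw markup instead of remembering to escape everywhere.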

If you are using XML anyways in your application, Genshi is a very good idea. The same goes if all your template designers know XML or if performance is not that much of a problem. Most of the time the bottleneck is the database anyways. I never had real problems with Genshi performance so far.

I hope this post sums up why I’m using all three template engines and why I think we should be happy that we can choose between a couple of template engines :-) Why am I not covering other template engines like Cheetah or SimpleTAL? Mostly because I looked at them, tried them out and never used them for something big. Mako looks a lot nicer than Cheetah to me and SimpleTAL is far too far away from Python for me.
