Cogitations

Apologies to GitHub

A few days ago I wrote a review of bitbucket, and one part of the blog post was about how annoyed I was by the GitHub admins forcing people into using GitHub for hosting. Here is the section in all its shaming glory:

When GitHub appeared on the internets for the first time, there was a short period of time when I saw the admins of that site jump into many IRC channels of projects that were using git already to switch to GitHub for hosting.

Turns out that either my mind played a trick on me, or I screwed up earlier and mixed up people. Fortunately IRC logs exist to disprove what I wrote earlier.

In every discussion I could grep from my IRC logs, it was always people other than GitHub administrators who tried to convince the project owners to switch to GitHub, and in one case a GitHub author briefly joined the discussion to explain the benefits of GitHub.

So I screwed up, and my negative bias towards GitHub was mostly based on something my mind made up. Because of that I want to apologise to the GitHub team for the false claims I made earlier. I hope I can make up for that somehow.

Apparently it’s not just the lead candidates I think are crap…

September 20th, 2008

…but the party platforms too:
a chart from Wahlkabine in which every party ends up in the negative
Great :-/

Why Jinja is not Django and why Django should have a look at it

September 16th, 2008

Today I wrote a little Django-to-Jinja2 template converter. While it can translate most of the builtin template tags into Jinja constructs, it doesn’t fully automate the process: you have to extend it for your own custom tags, and it doesn’t adapt your templates to the changed semantics.

These differences in semantics (and in the underlying architecture) are something I want to discuss a bit here. Whenever someone mentions Jinja in the Django IRC channel, you can be pretty sure that someone else will write something like “… if you don’t have your logic under control” into the channel and put Jinja in the corner where failed concepts lurk. Of course Jinja leaves more room for abuse than Django does… but that isn’t what I want to talk about this time :)

First of all a small disclaimer: This article covers Jinja 2.0 and Django 1.0.

Lexing

If you compare the internals of the Jinja and Django template systems, you’ll find a lexer in both of them. The lexer basically breaks the template into small pieces for easier processing. But that’s where the similarities end, because the two lexers operate on very different levels. Take the following template as a simple example:

Hello {{ name|upper }}!

This is one of those templates that look and work exactly the same in Jinja and Django. First, have a look at the tokens the Jinja2 tokenizer yields:

>>> from jinja2 import Environment
>>> for token in Environment().lex("Hello {{ name|upper }}!"):
...  print token
... 
(1, 'data', u'Hello ')
(1, 'variable_begin', u'{{')
(1, 'whitespace', u' ')
(1, 'name', u'name')
(1, 'operator', u'|')
(1, 'name', u'upper')
(1, 'whitespace', u' ')
(1, 'variable_end', u'}}')
(1, 'data', u'!')

And here is what Django outputs:

>>> from django.template import Lexer, StringOrigin
>>> origin = StringOrigin("Hello {{ name|upper }}!")
>>> for token in Lexer(origin.source, origin).tokenize():
...  print token
... 
<Text token: "Hello ...">
<Var token: "name|upper...">
<Text token: "!...">

So as you can see, whereas Jinja splits the input string into very small pieces, Django only distinguishes between four different kinds of tokens: text, variables, blocks and line comments. While this is a lot easier to implement for the developer of the template engine, it doesn’t have any advantages over the approach Jinja has chosen. It actually has a lot of negative side effects. For example, it’s impossible to write {{ '{% a block in a variable %}' }} in Django. (I know you can use templatetag openblock and templatetag closeblock, but beautiful is something else.) It also has the huge disadvantage that every tag has to split up its own contents, which often leads to different semantics and syntactic quirks from tag to tag, and which makes writing such a tag a lot more work for its developer. The former is probably the worse part. For example, the url tag in Django takes arguments separated by commas (which are not even allowed to be followed by whitespace), but cycle expects arguments to be separated by whitespace.
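To make the difference concrete, here is a minimal sketch of a delimiter-only lexer. It is a simplified, hypothetical model written for this post, not Django’s actual code: it finds {% %}, {{ }} and {# #} pairs with a lazy regular expression and never looks inside them. The failure is easiest to see when a string literal contains a closing delimiter.

```python
import re

# Hypothetical delimiter-only lexer: it recognises whole {% %}, {{ }} and
# {# #} sections but knows nothing about what is inside them.
COARSE_RE = re.compile(r"(\{%.*?%\}|\{\{.*?\}\}|\{#.*?#\})")
KINDS = {"{{": "var", "{%": "block", "{#": "comment"}


def coarse_tokenize(source):
    tokens = []
    # With one capturing group, re.split alternates text, match, text, ...
    for index, bit in enumerate(COARSE_RE.split(source)):
        if index % 2 == 0:
            if bit:
                tokens.append(("text", bit))
            continue
        tokens.append((KINDS[bit[:2]], bit[2:-2].strip()))
    return tokens


print(coarse_tokenize("Hello {{ name|upper }}!"))
# → [('text', 'Hello '), ('var', 'name|upper'), ('text', '!')]
print(coarse_tokenize('{{ "a }} b" }}'))
# → [('var', '"a'), ('text', ' b" }}')]
```

The second call shows the problem: the variable section is cut off at the first }} and the rest of the string literal leaks out as plain text.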

The root of the problem is definitely the weak lexer of the Django template engine, and I really think it should be replaced by something that yields proper tokens. That would simplify things a lot for tag developers and also lead to a more intuitive experience for template designers, who could expect the same basic syntax rules everywhere.
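A lexer that yields proper tokens avoids that, because it matches a string literal as one unit before it ever looks for the closing delimiter. The sketch below is hypothetical and heavily reduced (only {{ ... }} sections, a handful of token types, and whitespace dropped instead of being emitted as a token the way Jinja2 does it); it is not Jinja2’s lexer, just an illustration of the approach.

```python
import re

# Token rules tried in order inside a {{ ... }} section (hypothetical subset).
TOKEN_RULES = [
    ("whitespace", re.compile(r"\s+")),
    ("string", re.compile(r"'[^']*'" + r'|"[^"]*"')),
    ("integer", re.compile(r"\d+")),
    ("name", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("operator", re.compile(r"==|!=|>=|<=|[|.,<>+\-*/()]")),
]


def lex(source):
    tokens = []
    pos = 0
    while pos < len(source):
        start = source.find("{{", pos)
        if start == -1:
            tokens.append(("data", source[pos:]))
            break
        if start > pos:
            tokens.append(("data", source[pos:start]))
        tokens.append(("variable_begin", "{{"))
        pos = start + 2
        # Inside the tag: match one token at a time until "}}" is reached.
        while not source.startswith("}}", pos):
            if pos >= len(source):
                raise SyntaxError("unclosed '{{'")
            for kind, regex in TOKEN_RULES:
                match = regex.match(source, pos)
                if match:
                    if kind != "whitespace":
                        tokens.append((kind, match.group()))
                    pos = match.end()
                    break
            else:
                raise SyntaxError("unexpected character %r" % source[pos])
        tokens.append(("variable_end", "}}"))
        pos += 2
    return tokens


print(lex('{{ "a }} b" }}'))
# → [('variable_begin', '{{'), ('string', '"a }} b"'), ('variable_end', '}}')]
```

Every tag built on top of such a token stream would get the same rules for strings, names and operators for free.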

Parsing

The next step is converting those tokens into meaningful elements. That’s what people usually refer to as “parsing”. Jinja2 has very basic grammatical rules that can be parsed with a simple LL(1) parser (I think it’s LL(1), but don’t ask me, I’m not a compiler guy). The parser goes through the stream of incoming tokens from the lexer and converts them into logical nodes that belong together. For example, if you have the template {{ 1 + 2 + 3 }} and the “cursor” of the parser is right before the first digit of the calculation, the parser parses it into Add(Add(Const(1), Const(2)), Const(3)). This is useful because the developer of a custom tag doesn’t have to deal with that; the parser already knows what an expression looks like. Now you could argue that calculations don’t belong in templates and that my point is therefore moot, but even the Django template language has expressions.
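The left-nested tree for {{ 1 + 2 + 3 }} falls out naturally from a small recursive-descent loop. This is a hypothetical sketch limited to integers and +, with Add and Const named after the nodes mentioned above; it is not Jinja2’s parser.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass
class Const:
    value: int


@dataclass
class Add:
    left: "Node"
    right: "Node"


Node = Union[Const, Add]


class Parser:
    def __init__(self, source):
        # Only integers and "+" are meaningful in this sketch.
        self.tokens = re.findall(r"\d+|\+|\S", source)
        self.pos = 0

    def parse(self):
        node = self.parse_add()
        if self.pos != len(self.tokens):
            raise SyntaxError("unexpected %r" % self.tokens[self.pos])
        return node

    def parse_primary(self):
        if self.pos >= len(self.tokens):
            raise SyntaxError("unexpected end of expression")
        token = self.tokens[self.pos]
        if not token.isdigit():
            raise SyntaxError("expected a number, got %r" % token)
        self.pos += 1
        return Const(int(token))

    def parse_add(self):
        # Left-associative: each "+ operand" wraps the tree built so far.
        node = self.parse_primary()
        while self.pos < len(self.tokens) and self.tokens[self.pos] == "+":
            self.pos += 1
            node = Add(node, self.parse_primary())
        return node


print(Parser("1 + 2 + 3").parse())
# → Add(left=Add(left=Const(value=1), right=Const(value=2)), right=Const(value=3))
```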

The only expressions Django knows about are filter expressions. In Jinja2 the parser converts {{ var|escape|upper }} into a proper filter node for you. Django provides a TokenParser that can do something very similar. However, that parser is not used in every tag and has its limitations too. Furthermore, it was introduced long after the initial implementation of the template language, which means that many core tags don’t use it. Whereas in Jinja it’s a matter of calling parser.parse_expression() to get an expression parsed, the same requires a lot more typing and checking in Django. A lot of the tags that lurk around in various pastebins or websites don’t even support filters, only plain variables, in some places. Even worse, some people evaluate the part between the block braces using eval() against the context object.
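Filter chains work the same way one level down: the parser, not the tag, turns var|escape|upper into nested filter nodes, and any tag that asks the parser for an expression gets filters for free. Another hypothetical sketch: Name, Filter, parse_filter_expression and the two-entry FILTERS table are made up for illustration and only cover bare names with argument-less filters.

```python
import html
import re
from dataclasses import dataclass
from typing import Union


@dataclass
class Name:
    name: str


@dataclass
class Filter:
    node: "Expr"
    filter_name: str


Expr = Union[Name, Filter]

# Hypothetical filter table; real engines ship many more filters.
FILTERS = {"upper": str.upper, "escape": html.escape}


def parse_filter_expression(source):
    tokens = re.findall(r"[A-Za-z_]\w*|\||\S", source)
    pos = 0

    def expect_name():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isidentifier():
            raise SyntaxError("expected a name in %r" % source)
        pos += 1
        return tokens[pos - 1]

    node = Name(expect_name())
    while pos < len(tokens):
        if tokens[pos] != "|":
            raise SyntaxError("expected '|' in %r" % source)
        pos += 1
        node = Filter(node, expect_name())  # wrap the chain built so far
    return node


def evaluate(node, context):
    if isinstance(node, Name):
        return context[node.name]
    return FILTERS[node.filter_name](evaluate(node.node, context))


tree = parse_filter_expression("var|escape|upper")
print(tree)
# → Filter(node=Filter(node=Name(name='var'), filter_name='escape'), filter_name='upper')
print(evaluate(tree, {"var": "hello"}))
```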

Again, this simple parser design helps nobody but the developers of the template engine. By now I’ve seen enough Django projects that had to write their own template tags because the core tags just don’t do what they need, and in every case developing those tags was more painful than it had to be.

With a newly implemented lexer that yields all tokens of a block or variable one after another, a new parser could be implemented based on the design of the Jinja one. And doing that would also be a chance to specify some operators. Nobody is harmed if the template language supports {% if user.karma >= 20 and user.karma < 40 %}, and that hardly counts as logic in templates.
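To show that such operators don’t need much machinery once proper tokens exist, here is a hypothetical sketch that parses and evaluates the condition user.karma >= 20 and user.karma < 40. The tuple-based tree, ConditionParser and evaluate are made up for this example; attribute access uses getattr only, and only "and" plus the six comparison operators are supported.

```python
import operator
import re
from types import SimpleNamespace

COMPARE_OPS = {
    ">=": operator.ge, "<=": operator.le, "==": operator.eq,
    "!=": operator.ne, "<": operator.lt, ">": operator.gt,
}
NAME = r"[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*"  # dotted names like user.karma
TOKEN_RE = re.compile(r"\d+|" + NAME + r"|>=|<=|==|!=|<|>|\S")


class ConditionParser:
    def __init__(self, source):
        self.tokens = TOKEN_RE.findall(source)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of expression")
        self.pos += 1
        return token

    def parse(self):
        node = self.parse_and()
        if self.peek() is not None:
            raise SyntaxError("unexpected token %r" % self.peek())
        return node

    def parse_and(self):
        node = self.parse_compare()
        while self.peek() == "and":
            self.advance()
            node = ("and", node, self.parse_compare())
        return node

    def parse_compare(self):
        left = self.parse_operand()
        if self.peek() in COMPARE_OPS:
            op = self.advance()
            return ("compare", op, left, self.parse_operand())
        return left

    def parse_operand(self):
        token = self.advance()
        if token.isdigit():
            return ("const", int(token))
        if token != "and" and re.fullmatch(NAME, token):
            return ("name", token.split("."))
        raise SyntaxError("unexpected token %r" % token)


def evaluate(node, context):
    kind = node[0]
    if kind == "const":
        return node[1]
    if kind == "name":
        value = context[node[1][0]]
        for attribute in node[1][1:]:
            value = getattr(value, attribute)
        return value
    if kind == "compare":
        _, op, left, right = node
        return COMPARE_OPS[op](evaluate(left, context), evaluate(right, context))
    # "and" short-circuits just like Python's own operator.
    return evaluate(node[1], context) and evaluate(node[2], context)


condition = ConditionParser("user.karma >= 20 and user.karma < 40").parse()
for karma in (10, 25, 40):
    print(karma, evaluate(condition, {"user": SimpleNamespace(karma=karma)}))
```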

Compilation

This is the step Django is missing. After the parser has assembled the tree of blocks, variables, text and everything else (called an abstract syntax tree), Jinja compiles that tree down into Python bytecode. It does that by first generating Python source code and passing it to the builtin “compile()” function, which turns it into bytecode. It does not generate bytecode directly, even though that would be significantly easier, in order to better support more exotic platforms like App Engine and Jython.
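The generate-source-then-compile() pipeline can be sketched in a few lines. The node format (a flat list of ("data", text) and ("var", name) pairs), generate_source and compile_template are hypothetical; real Jinja output is far more involved, but the path is the same: produce a generator function as Python source, hand it to compile(), execute the resulting code object and call the function it defines.

```python
def generate_source(nodes):
    """Turn a flat, hypothetical node list into Python source for a generator."""
    lines = ["def root(context):"]
    for kind, value in nodes:
        if kind == "data":
            lines.append("    yield %r" % value)
        elif kind == "var":
            lines.append("    yield str(context[%r])" % value)
        else:
            raise ValueError("unknown node kind %r" % kind)
    # Keeps root() a generator even when the template is empty.
    lines.append("    yield from ()")
    return "\n".join(lines) + "\n"


def compile_template(nodes):
    source = generate_source(nodes)
    code = compile(source, "<template>", "exec")  # source -> bytecode
    namespace = {}
    exec(code, namespace)  # defines root() in namespace
    return namespace["root"]


render = compile_template([("data", "Hello "), ("var", "name"), ("data", "!")])
print("".join(render({"name": "World"})))  # → Hello World!
```

Once compiled, rendering is just calling a Python function; no tree is walked at render time.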

Compiling the syntax tree into bytecode is not that interesting in itself. Jinja does it because it’s possible and because it enables optimizations that would otherwise be impossible. More about that a bit later.

Evaluation

What’s more important is what Django does during template evaluation: it basically renders the syntax tree directly. That’s pretty nice and, from what I’ve seen so far, an often-used pattern for simple languages. The problem with Django, however, is that it’s incredibly slow and currently anything but thread safe. Many tags in the core system modify state variables on the (shared) nodes during rendering. You can easily see that for yourself by using {% cycle "odd" "even" %} inside a loop that iterates over 5 items. Start up your Django server, go to that page and hit refresh over and over again. You will notice that sometimes the output starts with “even” and sometimes with “odd”. The reason for that is that the node tree is shared. If you start up the application on a multithreaded server and hit it with tons of ab/siege requests, you will even notice that you often get lists that look like “even even even odd even odd” or something similar. And that’s not only the case for cycle; it also affects block tags. If you extend from a variable template, block.super will probably point to a totally different template when the server is under high load.
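The cycle problem is easy to reproduce outside Django with a simplified model. The classes below are hypothetical and only assume what the text describes: the node keeps its position in the cycle on itself, and the same node object is reused for every request. Storing the position in per-render state instead makes every render start from the beginning.

```python
import itertools


class SharedCycleNode:
    """Keeps its iterator on the node, which is shared between renders."""

    def __init__(self, values):
        self.cycle_iter = itertools.cycle(values)

    def render(self, render_state):
        return next(self.cycle_iter)


class PerRenderCycleNode:
    """Keeps its position in a dict that belongs to a single render call."""

    def __init__(self, values):
        self.values = values

    def render(self, render_state):
        index = render_state.get(id(self), 0)
        render_state[id(self)] = index + 1
        return self.values[index % len(self.values)]


def render_loop(node, items):
    render_state = {}  # fresh for every "request"
    return [node.render(render_state) for _ in items]


shared = SharedCycleNode(["odd", "even"])
print(render_loop(shared, range(5)))  # first request starts with "odd"
print(render_loop(shared, range(5)))  # second request starts with "even"

safe = PerRenderCycleNode(["odd", "even"])
print(render_loop(safe, range(5)))
print(render_loop(safe, range(5)))  # starts with "odd" again
```

With several threads rendering at once, the shared iterator is additionally advanced by different requests in interleaved order.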

This is unacceptable behaviour and should be fixed. I’m currently writing up a patch for it, because the ticket has been lurking around in the Django trac for too long and was renamed from “thread in-safety” to “reset cycle tag after iteration”, which shows that at least whoever edited it doesn’t get the problem.

Jinja doesn’t evaluate templates by walking the AST but by executing the previously generated bytecode. And yes, that’s thread safe, but that’s not the point.

About Performance

The Django template engine has multiple problems, as said above, and one of them is certainly performance. Many people argue that the Django template engine is fast enough. And it might be. But think about this for a moment: in many CRUD applications you pull stuff from the database without any joins and iterate over the result set. Now guess where (at least in Django) most of the action takes place: in the template. Even the database queries are often sent by the template engine, because querysets are lazy and the initial query is only sent from within the template. What makes this problematic is that Django’s template engine is an AST evaluator. For every node in the template (and there are a lot of them!) there is a render method that gets called. Now imagine your template extends two others and you are four blocks and two ifs deep. That’s already about ten calls deep. Now try to read a profiler output.
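A toy AST evaluator makes that call depth visible. Node and Text are hypothetical stand-ins for template nodes; each level of nesting costs one more render() call, and the sketch just records the deepest level it reached. (In CPython the generator expression inside join adds another frame per level, so the real stack is even deeper.)

```python
class Node:
    def __init__(self, *children):
        self.children = children

    def render(self, stats, depth=1):
        stats["max_depth"] = max(stats.get("max_depth", 0), depth)
        return "".join(child.render(stats, depth + 1) for child in self.children)


class Text(Node):
    def __init__(self, text):
        super().__init__()
        self.text = text

    def render(self, stats, depth=1):
        stats["max_depth"] = max(stats.get("max_depth", 0), depth)
        return self.text


# Two parent templates, four blocks and two ifs wrapped around one piece of text.
tree = Text("Hello")
for _ in range(2 + 4 + 2):
    tree = Node(tree)

stats = {}
print(tree.render(stats), stats["max_depth"])  # → Hello 9
```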

To show you what I mean, I’ve uploaded two profiler outputs (one for Jinja and one for Django) rendering the very same template, with the difference that the Jinja version of the template uses a macro and the Django version uses custom template tags:

Before you try to understand them, a few notes: test_jinja / test_django are the functions that invoke the test rendering process. The reason the Jinja graph is not connected is that invoking the bytecode Jinja generates doesn’t count as a regular call, so the profiler is unable to connect the two. You have to draw the edge from render to root yourself. In both cases the template engine had already rendered the templates a few hundred times before the profiler profiled a single call, so the templates were already parsed (and, in Jinja’s case, compiled). If you are wondering why the template parser seems to be active in the Django graph, I’m wondering that too. You can have a look at the benchmark to see how it works. If you think the template parser invocation in that profiler output comes from djangoext.py, you are wrong; that’s what I suspected too. It turns out that even if I don’t use the loader there but preload the template, it still happens. So I take that as normal behaviour caused by template inheritance or something like that.

That profiler output only shows the rendering of a pretty normal template. Now imagine there’s a query somewhere in there because of Django’s lazy querysets, and try to figure out what the heck is going on. I ran the profiler against the changeset page on bitbucket and got a call tree so complex that I couldn’t figure out what was going on: of the 400ms for that page, 300ms were spent in the template. Except that, really, it was the template invoking Mercurial’s diffing system. That’s insane. The AST evaluator seriously kills any chance of getting useful profiler information out of the system.

Generating Python-Code Doesn’t Make it Faster

Someone on #django asked why I don’t contribute “the thing that makes Jinja fast” to Django. That’s quite easy to answer: because it’s not that simple. Jinja sets some limitations on the engine to achieve a high performance. For example in Jinja the template context (the data structure you pass to the template) is a data source, not a data container. In Django if you have a custom template tag it is passed a context object you can modify and it will hold the variables of the template. In Jinja the template context object exists, but after the initial creation it is not modified by the engine any more. It’s only used to load yet unknown variables into the namespace Jinja is actually using for template evaluation. What this means is that it’s impossible for a tag to modify the context unless the custom tag knows at compile time the name of the variable it wants to assign to.

This knowledge gives Jinja a huge advantage over Django. Take this little template code:

<ul class="users">
{% for user in users %}
  <li>{{ user.username }}</li>
{% endfor %}
</ul>
<div class="notification">Hello {{ user.username }}</div>

This template code executes in both Jinja2 and Django. However, the assumptions the two template engines make are vastly different. Jinja2 is able to translate the template into this Python code internally (without the comments, obviously):

# these two variables (users and user) are used in the template
# without being initialized in the template.
l_users = context.resolve('users')
l_user = context.resolve('user')
yield u'<ul class="users">\n'
# because the loop overrides user we assign it to a temporary variable
t_1 = l_user
for l_user in l_users:
    yield u'\n  <li>%s</li>\n' % (
        environment.getattr(l_user, 'username'),
    )
# after the loop we restore the variable
l_user = t_1
yield u'\n</ul>\n<div class="notification">Hello %s</div>' % (
    environment.getattr(l_user, 'username'),
)

If we wanted to transform the Django AST into Python code without changing its behavior, we would have to do something like this:

buffer.append(u'<ul class="users">\n')
context.push()
context['forloop'] = t1 = {'parentloop': context.resolve('forloop')}
t2 = context.resolve('users')
if not hasattr(t2, '__len__'):
    t2 = list(t2)
t3 = len(t2)
for t4, item in enumerate(t2):
    # Shortcuts for current loop iteration number.
    t1['counter0'] = t4
    t1['counter'] = t4+1
    # Reverse counter iteration numbers.
    t1['revcounter'] = t3 - t4
    t1['revcounter0'] = t3 - t4 - 1
    # Boolean values designating first and last times through loop.
    t1['first'] = (t4 == 0)
    t1['last'] = (t4 == t3 - 1)
    # Bind the loop variable in the current context scope.
    context['user'] = item
    buffer.append(u'\n  <li>%s</li>\n' % (
        environment.getattr(context.resolve('user'), 'username'),
    ))
context.pop()
buffer.append(u'\n</ul>\n<div class="notification">Hello %s</div>' % (
    environment.getattr(context.resolve('user'), 'username'),
))

As you can see, a 1:1 conversion of what Django templates currently do into Python code produces a lot more code. Now I can hear you arguing that the Django example does more because it puts a forloop object into the context. However, it has to do that, because it can’t know whether something inside the loop will access forloop. Because variables in Jinja are not guaranteed to show up anywhere, we have a lot of room for optimizations. If a loop doesn’t use the special loop variable, Jinja won’t create one. It’s that simple. If you don’t access loop variables that require knowledge about the length, Jinja won’t convert the object into a list. What’s a bit unfair is that the Django example has to use buffering. But because tags must be able to render the nodes stored inside them, buffering is necessary unless the custom tag system is changed too.
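The “only build a list when the length is needed” part can be sketched like this. LazyLoop is a hypothetical helper, not Jinja2’s actual loop object: it counts iterations as it goes and only materialises the remaining items the first time something asks for length (or for last, which depends on it).

```python
class LazyLoop:
    """Hypothetical loop helper that only builds a list when the length is needed."""

    def __init__(self, iterable):
        self._iterator = iter(iterable)
        self._length = None
        self.index0 = -1  # becomes 0 on the first item

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self._iterator)
        self.index0 += 1
        return item

    @property
    def length(self):
        if self._length is None:
            # Materialise only the items that have not been consumed yet.
            remaining = list(self._iterator)
            self._iterator = iter(remaining)
            self._length = self.index0 + 1 + len(remaining)
        return self._length

    @property
    def last(self):
        return self.index0 == self.length - 1


def users():
    yield "alice"
    yield "bob"
    yield "carol"


loop = LazyLoop(users())
for user in loop:
    print(user, loop.last)
```

A loop that never touches length or last simply streams through the generator without ever building a list.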

What’s even worse than the list object inside this loop is context.resolve, which Django calls for every variable access. Imagine you are three levels deep inside your template (a with, a loop and another loop) and, inside the inner loop, try to access a variable that was passed to the template. Django has to traverse the context four levels up to get to that data. That’s very expensive, especially compared to what Jinja does. A local variable in Python, as used by Jinja, does not end up in a dictionary unless locals() is called or frame.f_locals is accessed. As long as it’s not in a dictionary, no hash code is calculated and no dict resizing takes place. Instead the name gets a number and a slot: when the function is called, Python has already reserved space for that variable. These fast locals (the internal name for them) are blazingly fast compared to normal dict lookups, and even faster compared to what Django does to resolve variables. And you can’t get that without generating bytecode, or generating Python code and compiling it.
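Both halves of that comparison can be poked at from Python. StackContext is a hypothetical, simplified stack of dicts (not Django’s Context class) that counts how many dicts a lookup has to probe; the second half uses the standard dis module to show that a plain local variable is read with a LOAD_FAST-family instruction instead of a dict lookup. The exact instruction names differ between Python versions, so no output is listed for that part.

```python
import dis


class StackContext:
    """Simplified stack of scopes; resolve() searches from innermost to outermost."""

    def __init__(self, root):
        self.dicts = [root]
        self.probes = 0  # number of dicts inspected, for illustration only

    def push(self, scope):
        self.dicts.append(scope)

    def resolve(self, name):
        for scope in reversed(self.dicts):
            self.probes += 1
            if name in scope:
                return scope[name]
        raise KeyError(name)


context = StackContext({"site_name": "pocoo"})
context.push({"total": 3})    # a {% with %}
context.push({"entry": "a"})  # outer loop
context.push({"tag": "b"})    # inner loop
print(context.resolve("site_name"), context.probes)  # → pocoo 4


def compiled_style(l_site_name):
    # A local variable lives in a fixed slot of the frame, no dict involved.
    return l_site_name


print([ins.opname for ins in dis.get_instructions(compiled_style)])
```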

Synopsis

Django templates are currently…

  • …a lot slower than they have to be
  • …built on a very weak design that doesn’t really help anyone
  • …also not threadsafe due to some bugs
  • …impossible to optimize further, especially not by “just compiling it to Python”
  • …Django’s weakest component
  • …a pain in the ass if you want to profile Django

My Pony Request

Django 1.0 is out, but that doesn’t mean it’s a good time to stop working on making Django better. Justifying the template language’s implementation by saying it’s fast to parse doesn’t help either: all the sub-parsers involved make it rather slow, and once you have the threading problems under control, parsed templates stay in memory until the process shuts down anyway.

Improvement of the template engine is possible, not that hard and will make everybody happier and you don’t have to sacrifice your logic-less templates for that.

And if that’s too radical, getting the threading problems solved would be a step in the right direction.

Bitbucket is no Bit bucket

September 14th, 2008

When github appeared on the internets for the first time, there was a short period of time when I saw the admins of that site (correction: happy github users; see the follow-up to this post for an explanation) jump into many IRC channels of projects that were already using git, trying to get them to switch to github for hosting. Personally, I found that alarming, because after a while it appeared that git without the hub was no longer an accepted option for open source projects.

Ever since we started running our own server, I don’t think any of us has ever regretted hosting our own development tools such as subversion (or now mercurial), trac, IRC bots and a lot more. Root servers have become pretty cheap (especially in Germany) and debian/ubuntu make server administration a breeze.

It really hit me hard when I noticed that people are sacrificing the distributed nature of git and switching to a central hosting location, even though they have their own server infrastructure (I’m looking at you, rails team). Git was designed not to depend on a central server, and it just doesn’t feel right to me to push people onto a central hosting platform because it provides things they would otherwise be missing (branch viewers, fork overviews or something).

This was pretty much the reason why I wasn’t that interested in bitbucket either. I found it a well-done version of hgweb with a builtin wiki and bug tracker and very fair hosting plans. Basically the thing I would prefer over Google’s code hosting (I’m not a big fan of subversion any more, you know).

A few days ago, however, I signed up on bitbucket so that I could push to the dozer repository. A few hours later jespern queried me on freenode and welcomed me to bitbucket. At first I was afraid the conversation that started would evolve into a github-like “switch to our service, you won’t regret it” conversation, but it was actually nothing of that sort. With that fear out of the way, I decided to try mirroring some of the pocoo repositories to bitbucket (because, well, you know: we do have some server outages from time to time and a mirror is never a bad idea). And what can I say? It’s a really, really nice way to host open source projects. I won’t compare it to github here, which is very similar except that it uses git rather than mercurial, doesn’t offer a bug tracker and has different plans, but rather to Google code, which is a very popular project hosting platform these days.

The big advantage over Google code is obviously that you are using mercurial rather than Subversion. While a closed source project usually has a known number of developers who all have commit access, an open source project usually deals with code from contributors who don’t have commit access. So I personally think that this alone makes Google code look bad for open source projects.

Both Google code and bitbucket are providing a wiki and a bug tracker. The wiki in Google code is implemented on top of Subversion which is not really that interesting to know but it of course gives you the possibility to access the data locally too. Bitbucket does something very similar and stores the wiki pages in a separate mercurial repository. Especially nice is that the wiki syntax is the well established creole syntax which makes processing of the wiki pages locally very easy. You can pull your complete wiki history as mercurial repository and do whatever you want with it.

The bug tracker in bitbucket is probably the weakest part of the system. While it provides a simple ticket system, it is missing good mercurial integration (e.g. listening for “this commit fixes #42” and automatically closing or reopening tickets based on that information). There is also no way to import or export tickets as far as I can see, but I guess that’s a feature that could come in the future.
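Most of that integration would be message parsing on the tracker side. Here is a hypothetical sketch of just that part; the keywords (fixes, closes, reopens) and the action names are assumptions for illustration, not anything bitbucket offers.

```python
import re

# Hypothetical keywords; a real integration would make these configurable.
TICKET_RE = re.compile(
    r"\b(fix(?:es|ed)?|close[sd]?|reopen(?:s|ed)?)\s+#(\d+)", re.IGNORECASE
)


def ticket_actions(commit_message):
    """Return (action, ticket number) pairs mentioned in a commit message."""
    actions = []
    for verb, number in TICKET_RE.findall(commit_message):
        action = "reopen" if verb.lower().startswith("reopen") else "close"
        actions.append((action, int(number)))
    return actions


print(ticket_actions("Cleanup; this commit fixes #42 and reopens #7"))
# → [('close', 42), ('reopen', 7)]
```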

What’s missing compared to Google code is a file hosting facility. I don’t think that this is something bitbucket should provide in the future, but it’s definitely something a separate project should try to fix. Now that sourceforge is starting to look more and more like a domain parking site and is less useful than ever, it’s time for someone to stand up and provide file and website serving :)

There are some things I’m currently missing on bitbucket that would be nice to have. For example, it would help mirroring a lot if it were possible to add ssh keys directly to a project instead of to a user, so that I could grant write access to an ssh key that is only used for mirroring. Integration with CIA (the announcer bot) would be kick-ass too. Maybe it would even be possible to specify incoming hooks in the form of URLs: every time a changegroup comes in, bitbucket would traverse the list of URLs and send each of them a summary of the changesets as a JSON dump. As mentioned above, tracker <-> mercurial integration would be nice to have, as would export / import support for the tracker.
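The URL hook idea is simple enough to sketch. Everything here is hypothetical: the changeset dict fields, the payload layout and both function names are made up, and notify_hooks() only shows how such a JSON summary could be POSTed with the standard library.

```python
import json
import urllib.request


def changegroup_payload(repository, changesets):
    """Build a JSON summary of incoming changesets (hypothetical format)."""
    return json.dumps({
        "repository": repository,
        "changesets": [
            {"node": cs["node"][:12], "author": cs["author"], "message": cs["message"]}
            for cs in changesets
        ],
    })


def notify_hooks(urls, payload, timeout=10):
    """POST the payload to every registered URL."""
    for url in urls:
        request = urllib.request.Request(
            url,
            data=payload.encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            response.read()


payload = changegroup_payload("USERNAME/REPOSITORY", [
    {"node": "0123456789abcdef0123", "author": "someone", "message": "fix typo"},
])
print(payload)
```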

I’m really impressed by bitbucket by now and can only recommend it for project hosting. Most important: you are not selling your soul to anyone. If you are unhappy with what you get, you can grab *all your data* and go somewhere else, thanks to mercurial. And the fact that this is possible should also give you a good feeling, because it means that the people behind bitbucket have to care about you so as not to lose you. After all, the success of a website always depends on the number of happy users :-)

The (for now unofficial) pocoo mirrors are listed on my bitbucket account. Don’t get the headline? Read up on bit bucket.

Simple HG One-Way Mirroring

I recently contributed some code to a bitbucket hosted repository and registered on that platform because of that. From what I’ve seen so far bitbucket looks like a really nice place to host mercurial repositories and I’ve decided to mirror the important pocoo repositories there.

Turns out that this is really easy :) You need to generate an ssh key for the hg user on the server and add it to your ssh keys on bitbucket. Then create the repository there and add the following lines to the .hg/hgrc of the repository on the main server:

[paths]
bitbucket = ssh://hg@bitbucket.org/USERNAME/REPOSITORY

[hooks]
changegroup.bitbucket = hg push bitbucket

That’s all. Now everything you push to the repository on the server is forwarded to bitbucket too. As long as you don’t push to bitbucket directly, the two repositories should stay in sync.

More about bitbucket and why I like it a lot, in a separate blog post ;-)

Update: as suggested by crast it’s a way better idea to use changegroup rather than incoming. I edited the post accordingly. incoming would execute the hook for each changeset pushed.

Mono isn’t just for Apes

August 14th, 2008

On my trip through Scotland I bought an issue of “Linux Format” to have something to read while waiting for the train. I was reading the letters to the editor when I came across one that was basically an insult to the magazine, complaining about their Mono support. Apparently they had reviewed Tomboy in an earlier issue and hadn’t criticised it for using Microsoft technology.

I came in contact with C# and .NET about two years ago, when it was .NET 2.0, and instantly fell in love with it. It was Java inspired but implemented tons of really cool features like properties, delegates and much more. However, at the time Mono hadn’t yet implemented generics, which were a 2.0 feature. Or maybe it had, but at least my ubuntu installation wasn’t shipping gmcs yet. Because of that I pretty much forgot about Mono and only played with it once in a while.

A few weeks ago I noticed that gmcs has a command line switch called -langversion:linq, which enables all 2.0 features plus a few from 3.0 and 3.5, and that made me play with it a lot. My main development environments are an OS X notebook and an Ubuntu Linux box. I still have a Windows box, but I just use it for playing. As a matter of fact, I’m only interested in Mono and not in the Microsoft versions of the .NET technologies. What I’ve seen so far is that Mono implements all the cool features I actually want to use: C# with generics, “yield return”, lambdas, extension methods and LINQ (generics and “yield return” are C# 2.0; lambdas, extension methods and LINQ are C# 3.0, which shipped with .NET 3.5). They also have XSP2 and System.Web, which make it possible to use it for web applications. What I’ve seen so far qualifies as “awesome”, so I want to share my feelings about Mono and why I think it’s the best thing since the advent of ubuntu. My real experience with Mono is only a few weeks old, so I may be missing quite a few things; please take this “review” with a grain of salt.

I think there are three major types of critics of Mono in the Open Source community: those who think Microsoft pays Novell to develop Mono in order to take over the world, those who think Mono will face patent problems in the future because Microsoft will claim patent infringement, and those who have a problem with Novell because of its patent agreement with Microsoft. Of course there are more. There are people who actually had a look at it, compared it to established platforms like the JVM and came to the conclusion that the Sun compiler is better than Mono or something similar. I agree that it’s perfectly okay not to like Mono because one is happy with the JVM, or to hate Novell for signing that patent agreement with Microsoft. However, none of that makes Mono as such a bad thing to have.

Mono is largely based on two ECMA standards: ECMA-334 (C#) and ECMA-335 (the CLI, or Common Language Infrastructure). Other than that, Mono also implements parts of ASP.NET and ADO.NET as well as Windows.Forms, which are the subject of a lot of concern in the Open Source community. I don’t know if parts of those are already patented or if that’s just a hypothetical problem, but it seems to be one of the main reasons why people hate Mono. Fortunately you don’t have to use any of them to do proper development with Mono. Windows.Forms in particular is something you don’t even want to use ;-)

Mono itself is a fantastic piece of software and one of the best programming environments Linux users have had in ages. Its big advantage is that the Mono environment can host multiple programming languages, not just C#. Even though multiple languages run on top of it, they can exchange code, which is somewhat hard to achieve with the more traditional approaches. For example, it’s nearly impossible to use a Python library from Ruby or the other way round. With Mono, IronRuby and IronPython this becomes somewhat possible. But I must admit that I haven’t played with that in detail so far. C# alone was convincing enough.

There are just two languages I dare to compare with C#. One is obviously Java, where C# got most of its inspiration. The other one is D, a rather new language (Edit: apparently D is a lot older than I remembered; turns out it has been around since 1999) that, unlike C# or Java, does not run on a runtime environment but only produces native code. There is also Vala, which popped up a while ago and basically tries to be a C# that compiles down to GLib-based C. D is a rather new language and there are not many bindings to popular libraries available so far. Vala is even newer. Until Mono popped up, there was no statically compiled language that had bindings to major open source libraries such as GTK or DBus and was beginner friendly. At least for GNOME applications, most people used C/C++ or Python, and on my rather new ubuntu box most GUI applications are still written in those languages.

The language most similar to C# is Java. Having learned a lot from Java, Microsoft improved C# over it considerably, and C# feels a lot more like Python than Java does. My favourite feature of C# is without a doubt properties. Of all the languages with comparable features I know (Python, Ruby, D and C#), C# has one of the best solutions to the problem. Unlike in Java, there is no need to write getters and setters up front for fear of breaking code later, once code has to run when values are set or read, and unlike in Python you don’t have two variables (foo and _foo) lurking around on the object. A non-property approach to representing a user could look like this:

class User {
  public string Username;
  public User(string username) {
    Username = username;
  }
}

If you later decide to execute code on setting the Username attribute you can easily do so by making the attribute private and adding a property:

class User {
  private string username;
  public User(string username) {
    // Assign the field directly so construction doesn't trigger the setter.
    this.username = username;
  }
  public string Username {
    get { return username; }
    set {
      Utils.MoveHomeFolder(username, value);
      username = value;
    }
  }
}

My favorite feature after properties is definitely that you have to make methods virtual explicitly. This enables faster code and avoids a lot of errors. In general the compiler can save you from quite a lot of problems you would only spot with extensive unit-testing in Python. For me that is a huge advantage, because I’m a) quite lazy and b) bad at typing. I get typos in the easiest words, and thanks to ^P in Vim those appear multiple times before I notice :)

One of the things I love about Python is the possibility to subclass builtin types such as dicts, lists and more to give them behavior that suits the kind of data I store in them better than that of the normal versions. For example, Werkzeug comes with tons of custom dicts, lists and sets for case-insensitive data, multiple values per key in a dict and similar stuff. C# makes it ridiculously easy to do that thanks to generics and the classes from System.Collections.Generic. First of all, they check the types of the objects you put into them, and furthermore you don’t even have to subclass them to get collections the standard library accepts as containers. In Python you pretty much have to subclass the builtins, because many Python libraries perform instance checks against list, dict etc. In C# there are interfaces for that, and they are used all over the place, which is clever.
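For comparison, this is roughly what the Python side looks like. CaseInsensitiveDict is a hypothetical minimal version, not Werkzeug’s implementation: it lower-cases string keys, subclasses dict so that isinstance(d, dict) checks pass, and has to override each method separately because dict’s own methods don’t route through __getitem__ and __setitem__ (setdefault, pop, copy and friends are left out here).

```python
class CaseInsensitiveDict(dict):
    """Minimal sketch: string keys are lower-cased on the way in and out."""

    def __init__(self, *args, **kwargs):
        super().__init__()
        self.update(*args, **kwargs)

    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)

    def __getitem__(self, key):
        return super().__getitem__(key.lower())

    def __delitem__(self, key):
        super().__delitem__(key.lower())

    def __contains__(self, key):
        return super().__contains__(key.lower())

    def get(self, key, default=None):
        return super().get(key.lower(), default)

    def update(self, *args, **kwargs):
        # dict.update would bypass __setitem__, so route every key through it.
        for key, value in dict(*args, **kwargs).items():
            self[key] = value


headers = CaseInsensitiveDict({"Content-Type": "text/html"}, Server="test")
print(headers["CONTENT-TYPE"], isinstance(headers, dict))  # → text/html True
```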

Another huge advantage over Java is that C# has delegates and lambdas, which enable a lot of cool stuff not possible in Java. C# also knows “yield return”, which essentially generates enumerator objects (iterators in Python terms) automatically and saves you tons of boilerplate code. Another neat thing about C# is its preprocessor directives, which enable conditional compilation and let you influence error reporting by providing different line numbers or filenames with “#line” directives. I have often wished for something like that in Python, for example when writing Jinja, which has to do an ugly hack to rewrite tracebacks on the fly to get proper debug output.
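To illustrate the boilerplate that “yield return” saves, here is the same idea in Python: a hand-written iterator class next to an equivalent generator function (both names are made up for the example). The generator version is roughly what C#’s “yield return” gives you for free.

```python
class Countdown(object):
    """Hand-written iterator: all the state handling is done manually."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def countdown(start):
    """The same sequence as a generator; the state is kept for us."""
    while start > 0:
        yield start
        start -= 1


print(list(Countdown(3)), list(countdown(3)))
```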

But C# goes far beyond that. Thread safety in C# is apparently mostly achieved through per-object locking, which you control with lock(obj) { ... } and which makes it a lot easier to write thread-safe classes. The equivalent of Python’s “with” statement is using (expr) { ... }, which leads to much shorter code than in Java. Take this Java example:

import java.io.*;

public class FileExample {
  public static void main (String[] args) {
    StringBuilder out = new StringBuilder();
    try {
      BufferedReader in = new BufferedReader(new FileReader("filename.txt"));
      try {
        String line, separator = System.getProperty("line.separator");
        while ((line = in.readLine()) != null) {
          out.append(line);
          out.append(separator);
        }
      }
      finally {
        in.close();
      }
    }
    catch (IOException ex) {
      ex.printStackTrace();
    }
    doSomethingWith(out.toString());
  }
  public static void doSomethingWith(String s) {}
}

This is just ugly, and I don’t even know if it works because I hacked it up from memory without actually testing it. Now compare that with the following C# version of the same code:

using System.IO;

class FileExample {
  public static void Main(string[] args) {
    string result;
    using (StreamReader r = new StreamReader("filename.txt"))
      result = r.ReadToEnd();
    DoSomethingWith(result);
  }
  public static void DoSomethingWith(string s) {}
}

Not a single try/finally. using automatically does the right thing because StreamReader implements IDisposable, which means that after the using block C# automatically calls r.Dispose().
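In Python the with statement covers both C# constructs: any object with `__enter__` and `__exit__` can play the role of an IDisposable, and `threading.Lock` objects are context managers that give a rough counterpart of lock(obj) { ... }. The sketch below uses a made-up `Resource` class and counter; for real files, `with open("filename.txt") as f: result = f.read()` already does the right thing. One difference: in C# any reference type instance can be locked, whereas Python needs an explicit Lock object.

```python
import threading


class Resource(object):
    """Toy resource whose __exit__ plays the role of IDisposable.Dispose()."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the with block is left, normally or through an exception.
        self.closed = True
        return False  # don't swallow exceptions


with Resource("filename.txt") as r:
    was_open = not r.closed
print(was_open, r.closed)


# threading.Lock is a context manager too: acquired on entry, released
# on exit, much like the block form of lock(obj) { ... } in C#.
class Counter(object):
    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        with self._lock:
            self.value += 1


def work(counter, times):
    for _ in range(times):
        counter.increment()


counter = Counter()
threads = [threading.Thread(target=work, args=(counter, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)
```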

All the libraries I have played with so far (the System.* ones, Gtk#, D-Bus and many others) use the language features the way they should. Properties are used where appropriate, all public namespaces, classes, methods and properties are named consistently, and attribute classes (a special feature somewhat comparable to decorators in Python) are used where useful (for example in the D-Bus bindings). That’s a consistency in the core libraries you won’t find in Python! It’s a real pleasure to work with because the code looks nice and there are few surprises when looking for names.

Of course there are problems too. The documentation for Mono is still lacking, but you can help yourself by using the MSDN one. In general, though, the Mono documentation is a lot better than that of some other open source projects.

The progress Mono makes is astonishing. They may not be as fast as Microsoft, but the majority of the features work, and even if no new features were added it would still be a great development platform. It really doesn’t matter if Mono can’t keep up with Microsoft’s .NET. The Linux community isn’t very keen on Silverlight anyway, and besides Silverlight not many non-Mono .NET applications will hit the average Linux PC. The goal of the Mono project is not to run arbitrary Windows .NET applications on Linux but to provide a free implementation of the .NET framework. It’s saddening that so many people torpedo its development because of FUD or just because Microsoft came up with the idea.

So if you haven’t had a look at Mono yet because you’ve heard so many negative things about it: Forget about that and give it a try yourself. You can’t lose :)

Jinja2 Final aka Jinjavitus Released

July 17th, 2008

The final version of the Django-inspired Jinja2 template engine was just released. It comes with tons of improvements over the older Jinja1 engine and breaks backwards compatibility with it to some degree. It’s packaged as a separate package so that you can use both side by side for an easier transition.

Compared to Jinja1 it provides tons of new features:

  • Dynamic inheritance. It’s now possible to use dynamic inheritance, which means that the name of the master template is resolved at render time. This makes it easy to switch between different designs.
  • Improved macro and import system. Macros can be called with keyword arguments now, are much more lightweight and got their own import system which makes templates easier to understand. The import syntax follows the Python one with small adjustments.
  • Heavily improved for-loops. Loops can be filtered now and the length is calculated lazily. This makes it possible to iterate over generators with an unknown length in a much more efficient way.
  • Improved behavior of undefined values. Jinja1 had a very silent undefined behavior: if a variable was undefined, you could call it or access any attribute on it without getting errors. Jinja2 ships three undefined types that make it easier to debug templates. The default undefined type allows you to print undefined variables (which outputs nothing) and to loop over them (which works like iterating over an empty list); every other operation raises an UndefinedError. Additionally there is an undefined type with the same behavior that prints the name of the missing variable or attribute instead of nothing. The third builtin undefined type is the strict type, which doesn’t allow any operation except testing whether it’s undefined; that is the closest you can get to the default Python behavior.
  • Improved sandbox. The sandboxed environment is now easier to use and more secure. In the default configuration everything starting with an underscore is considered insecure, so it’s no longer necessary to mark those attributes explicitly. It’s also a lot faster now and easily modifiable by subclassing.
  • Line statements. Jinja2 allows you to specify a line statement prefix that marks a whole line as a statement. This concept is inspired by Mako and Cheetah and allows very clean templates in many situations. If a line starts with the prefix character (for example “#”), everything up to the end of the line is treated as a statement, just like the contents of a {% ... %} tag.
  • Easier API. The API is much easier to use now on nearly every level. The loaders got refactored so that you only have to provide a single method, and filters and tests are now normal Python functions.
  • Automatic escaping. Jinja2 comes with optional automatic escaping compatible with Pylons and Genshi.
  • Improved i18n support. Jinja2 integrates into Babel now which makes internationalizing web applications a charm.
  • Extension interface. Jinja2 provides a documented interface that can be used to extend the template engine. The interface could even be used to create Jinja inspired template engines on top of the existing compiler interface with a completely different syntax.
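The default undefined behavior described in the list can be modelled in a few lines of plain Python. This is only a rough sketch, not Jinja2’s implementation; `SketchUndefined` and the local `UndefinedError` are made-up stand-ins, and only a few representative operations are made to fail. In Jinja2 itself the three types are `Undefined`, `DebugUndefined` and `StrictUndefined`, selected via the `undefined` argument of `Environment`.

```python
class UndefinedError(Exception):
    """Local stand-in for jinja2.exceptions.UndefinedError."""


class SketchUndefined(object):
    """Prints as an empty string, iterates like an empty list and raises
    UndefinedError for (a few representative) other operations."""

    def __init__(self, name):
        self._name = name

    def __str__(self):
        return ""

    def __iter__(self):
        return iter(())

    def _fail(self, *args, **kwargs):
        raise UndefinedError("%r is undefined" % self._name)

    def __getattr__(self, attr):
        # Leave special-method probing alone so Python internals behave.
        if attr.startswith("__"):
            raise AttributeError(attr)
        self._fail()

    __call__ = __add__ = __getitem__ = _fail


user = SketchUndefined("user")
print(repr(str(user)), list(user))
try:
    user.name
except UndefinedError as exc:
    print(exc)
```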

Get it while it’s hot from the Cheeseshop (or PyPI for those of you who prefer the new name).

Deploying Python Web Applications

July 17th, 2008

Every once in a while I’m really impressed by a library I stumble upon. A while back that was virtualenv, now it’s fabric. I was using capistrano for a project I was working on, which was kinda okay, but somehow I wasn’t sold on it.

Yesterday, however, apollo13 stumbled upon fabric, which is capistrano in Python, with a working put command and less annoying in general.

In combination with a custom virtualenv bootstrapping script, Python web application deployment is a charm. One “fab bootstrap” later the servers have created a virtual Python environment, compiled all dependencies, checked out all eggs and initialized the application environment. Updates are just one “fab production deploy” away.

And the best part is that fabric is not limited to Python. You can use it to deploy anything you can control over ssh.

Here is an example fabfile (the file that controls the deployment):

set(
    fab_hosts = ['srv1.example.com', 'srv2.example.com']
)

def deploy():
    """Deploy the latest version."""
    # pull all changes from mercurial inside the checkout and touch the
    # wsgi file to tell Apache to reload the application.
    run("cd application && hg pull -u && cd .. && touch application.wsgi")

def bootstrap():
    """Asks for a list of servers and bootstraps the application there."""
    set(fab_hosts=[x.strip() for x in raw_input('Servers: ').split()])
    run("hg clone http://repository.example.com/application")
    local("./generate-wsgi-file.py > /tmp/application.wsgi")
    put("/tmp/application.wsgi", "application.wsgi")

Saved as fabfile.py, “fab bootstrap” asks for a list of servers and bootstraps the application there; after changes in the repository, “fab deploy” deploys the latest version. Of course that’s just a very basic made-up example, but it shows how you can use fabric.
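The “fab production deploy” invocation mentioned earlier presumably runs several tasks in a row. The following is not fabric’s implementation, just a minimal sketch of that idea under the assumption that tasks are plain functions looked up by name and run in order; `production`, `config`, `log` and `TASKS` are all hypothetical.

```python
import sys


def production():
    """Hypothetical task that selects the production hosts."""
    config["hosts"] = ["srv1.example.com", "srv2.example.com"]


def deploy():
    """Hypothetical task; a real one would run commands on each host."""
    log.append("deploy to %s" % ", ".join(config["hosts"]))


config = {"hosts": []}
log = []
TASKS = {"production": production, "deploy": deploy}


def main(argv):
    # Run each named task in order, so state set by an earlier task
    # (like the host list) is visible to later ones.
    for name in argv:
        try:
            task = TASKS[name]
        except KeyError:
            raise SystemExit("unknown task: %s" % name)
        task()


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run as `python tasks.py production deploy`, the sketch first fills in the host list and then uses it in the deploy step.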

I currently use makefiles to execute common tasks for various Python projects (like releasing code, running unittests and much more); I suppose fabric could do that for me too. That would have the advantage that it works for Windows users as well.

virtualenv to the Rescue

July 5th, 2008

Deploying Python applications is not that hard, but there are some pitfalls. The main problem everyone stumbles upon sooner or later is that sys.modules is a singleton. Modules are cached there, with the effect that exactly one version of a library can be loaded at a time per interpreter. There is no way around that, and it’s fine. However, there is another problem: everyone installs everything into a system-wide singleton called site-packages. Sooner or later everything ends up there, and while pkg_resources supports switching between versions, this is very uncomfortable and requires extra hackery.
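A quick way to see the sys.modules cache at work, using only the standard library: a second import of the same name hands back the very same module object instead of loading the module again.

```python
import importlib
import sys

import json

# After the first import, the module object is cached in sys.modules...
print(sys.modules["json"] is json)

# ...and later imports of the same name return that cached object rather
# than loading the module again, so one name maps to one loaded version.
again = importlib.import_module("json")
print(again is json)
```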

My plan to solve these problems was always to create a folder, put all libraries into it and site.addsitedir() that folder in the file that starts up the application. But especially for web applications that’s a dull, repetitive task, and easy_install and other tools don’t play well with this approach.

Fortunately Ian Bicking wrote a really cool tool called “virtualenv” that creates virtual Python environments, and it does so in a very elegant way. The following example shows how we run LodgeIt on our server:

First you need to install virtualenv. This assumes you already have easy_install:

mitsuhiko@nausicaa:~$ sudo easy_install virtualenv
Searching for virtualenv
Reading http://pypi.python.org/simple/virtualenv/
Best match: virtualenv 1.1
Downloading virtualenv-1.1-py2.5.egg
Processing virtualenv-1.1-py2.5.egg
creating /opt/local/lib/python2.5/site-packages/virtualenv-1.1-py2.5.egg
Extracting virtualenv-1.1-py2.5.egg to /opt/local/lib/python2.5/site-packages
Adding virtualenv 1.1 to easy-install.pth file
Installing virtualenv script to /opt/local/bin

Installed /opt/local/lib/python2.5/site-packages/virtualenv-1.1-py2.5.egg
Processing dependencies for virtualenv
Finished processing dependencies for virtualenv

Now we can create a folder for the application. In our case the folder is named after the domain where the web application runs:

mitsuhiko@nausicaa:~$ virtualenv paste.pocoo.org
New python executable in paste.pocoo.org/bin/python
Installing setuptools.......................done.

The virtual Python is now available in that folder, and we can switch into that environment by sourcing the activate file:

mitsuhiko@nausicaa:~$ cd paste.pocoo.org/
mitsuhiko@nausicaa:~/paste.pocoo.org$ . bin/activate
(paste.pocoo.org)mitsuhiko@nausicaa:~/paste.pocoo.org$

From that point onwards “python” and “easy_install” point to the executables in the ~/paste.pocoo.org/bin folder. The prompt is prefixed with the name of the virtual environment so that we are reminded that “python” is our virtual Python. We can now install all the dependencies:

$ easy_install SQLAlchemy==0.4.4 Jinja2 Werkzeug Pygments

Half a minute later the virtual Python interpreter has the libraries available, and we’re ready to add our application:

$ hg clone http://dev.pocoo.org/hg/lodgeit-main

LodgeIt itself doesn’t have a setup.py file, but we can just cd into the folder and run the development server to test whether it works:

$ cd lodgeit-main/
$ python runlodgeit.py runserver
 * Running on http://localhost:5000/
 * Restarting with reloader...

That works perfectly for scripts that invoke “python” directly, such as standalone servers like CherryPy, but it won’t work for mod_wsgi, where the interpreter is created by mod_wsgi and doesn’t point to the virtual environment.

There are two ways to solve the problem. With mod_wsgi 2.0 you can run the web application in daemon mode and pass it the path to the site-packages in the Apache config:

WSGIDaemonProcess lodgeit python-path=/path/to/site-packages

Alternatively you can go into the .wsgi file and add the site-packages by hand:

import site
site.addsitedir("/path/to/site-packages")

The site-packages of the virtual environment are located at /path/to/virtual/env/lib/python2.X/site-packages.
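Rather than hard-coding the python2.X part, a .wsgi file could compute the path from the interpreter’s own version. This is a sketch that assumes the POSIX virtualenv layout shown above and a hypothetical environment root at /path/to/virtual/env; `virtualenv_site_packages` is a made-up helper. It relies on the embedded interpreter having the same X.Y version as the virtualenv, which it needs to anyway.

```python
import os
import site
import sys


def virtualenv_site_packages(venv_root):
    """Return lib/pythonX.Y/site-packages below venv_root for the running
    interpreter's version (POSIX virtualenv layout assumed)."""
    version = "python%d.%d" % sys.version_info[:2]
    return os.path.join(venv_root, "lib", version, "site-packages")


# Hypothetical location of the virtual environment; adjust to your setup.
VENV_ROOT = "/path/to/virtual/env"

site_packages = virtualenv_site_packages(VENV_ROOT)
if os.path.isdir(site_packages):
    # Adds the folder to sys.path and processes any .pth files in it.
    site.addsitedir(site_packages)
```

Keep in mind that site.addsitedir() appends the folder to sys.path, so a package that is also installed system-wide is still found in the system site-packages first.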

Even though I was not interested in virtualenv in the beginning, I changed my mind because it’s straightforward to use and easy to maintain. More information about virtualenv is available in the Cheeseshop and in the announcement post on Ian’s blog.

Whitespace Sensitivity

July 1st, 2008

I was reading a thread on ruby-forum.com about Python that said Python’s whitespace sensitivity is from hell or something. Programmers of just about any language may rant about whitespace sensitivity in Python, but clearly not Ruby programmers. Why? Because Python doesn’t care about whitespace at all. The only thing that has anything to do with whitespace is indentation, which the lexer converts into INDENT and DEDENT tokens. After that there is no whitespace left; the parser doesn’t know anything about it.

That, however, is not true for Ruby! foo[42] does something completely different from foo [42]. The first calls foo without arguments and calls the [] method of the return value with 42 as argument; the second calls foo with [42] as argument, which happens to be an Array with one element. But there are more examples.

Take this example:

foo = 23
def bar
  42
end

puts bar/foo

That prints “1” (42 divided by 23 with integer division).

However take this minor modification:

foo = 23
def bar
  42
end

puts bar /foo

Now this gives you an error that the regular expression literal is unterminated. That’s what I call whitespace sensitivity :)

You’re wondering why I’m using a method for “bar” and not a local variable? Because the parser keeps track of all assigned local variables or methods (not sure what exactly it does) and resolves the syntax ambiguities that way.
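For contrast, Python’s standard tokenize module makes the point from the start of this post visible: indentation turns into explicit INDENT and DEDENT tokens, while spaces inside a line leave no trace, so bar/foo and bar /foo tokenize identically. `token_stream` is just a small helper written for this example.

```python
import io
import tokenize


def token_stream(source):
    """Return (token name, token string) pairs for a piece of Python source."""
    readline = io.StringIO(source).readline
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(readline)]


# Indentation becomes explicit INDENT/DEDENT tokens...
block = "if x:\n    y = 1\nz = 2\n"
names = [name for name, _ in token_stream(block)]
print("INDENT" in names, "DEDENT" in names)

# ...while spaces inside a line leave no trace in the token stream.
print(token_stream("bar/foo\n") == token_stream("bar /foo\n"))
```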
