Armin Ronacher

Pros and Cons about Python 3

written by Armin Ronacher, on Thursday, January 7, 2010 14:40.

I was briefly expressing my disagreement with the Python 3 development decisions, so I want to elaborate on that a bit. While I was previously addressing some of the problems I have with Python 3, I took the time to create a list of things that were solved in a way other than I expected or hoped.

Let's start with the biggest grief of mine...

Unicode Support

If you look at any of the pocoo libraries, they all use unicode. In fact, Jinja2 and Werkzeug even enforce unicode, so you can't use them with byte strings internally unless you do the encode dance. Why? Because I believe in unicode, which is not too surprising: German, the official language of Austria, uses some non-ASCII letters, and yet tons of deployed systems force me to substitute umlauts with Latin letters just because someone had a limited horizon.

But the unicode world is complex and Python does not care about unicode too much, neither in Python 2 nor in Python 3. So what does not work about unicode in Python? In German, for example, there are words like "Fuß" (which means foot). The last letter there is a so-called "Scharfes Es" or "Eszett". The former name means "sharp s" and the letter is usually represented as an "ſs" ligature; the latter name refers to the "ſz" ligature used in blackletter fonts (the letter "ſ" is no longer in use but worked similarly to a Latin "s"). Because this letter never occurs at the beginning of a word, there was never an uppercase character for it (one was introduced lately, but nobody uses it). However, it is pretty common to use title case or uppercase for emphasis, so there is a need for that letter to exist in uppercase. The common replacement you see is a doubled "s" or "sz". So "Fuß" becomes "FUSS", "Maße" becomes "MASZE" etc. There are some variations, but basically it means that one letter becomes two.

However, that does not work in Python. The Python unicode implementation cannot do two things: it can neither replace one letter with two when changing the case, nor does it accept locale information for character mappings. The latter is necessary for languages like Turkish, where an uppercase "I" is lowercased to "ı" and not "i". (And I will not complain about the shared state of the locale library, which of course stayed in Python 3.)
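To make the locale point concrete, here is a minimal sketch of what a locale-aware lowercasing helper could look like. The name `turkish_lower` and its mapping table are hypothetical and only for illustration; they are not part of Python, whose built-in `str.lower()` takes no locale argument.

```python
# Hypothetical helper: Turkish-aware lowercasing layered on top of str.lower().
# Turkish distinguishes a dotted and a dotless "i"; str.lower() has no way
# to be told which rules apply.
_TURKISH_LOWER = str.maketrans({"I": "\u0131", "\u0130": "i"})  # I -> ı, İ -> i

def turkish_lower(text):
    """Lowercase text using Turkish rules for the letter "I"."""
    return text.translate(_TURKISH_LOWER).lower()

print("ISIK".lower())         # locale-unaware result
print(turkish_lower("ISIK"))  # Turkish result with the dotless ı
```

The sketch handles the special cases before delegating to `str.lower()`, which is roughly what a library has to do when the language itself offers no locale hook.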

Another problem is that Python uses UCS2 or UCS4 internally and that shines through. So if you have a UCS2 build, len() called on a string does not give you the number of characters in the string, but the number of UCS2 code units, which might not be the same. In fact, every character outside the basic plane will be wrongly counted. Because UTF-16 (UCS2 with support for surrogate pairs that allow you to use characters outside the basic plane) is a variable-length encoding, it has the same problem as UTF-8 as an internal encoding: namely, making slicing a non-trivial operation. Another problem arises: binary extensions have to be compiled for UCS2 and UCS4 Pythons separately. And last time I checked, setuptools did not allow you to publish both builds on PyPI and pull the correct one. In fact, by default there is no such information in the filename which would make it possible to provide both extensions.
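The difference between characters and 16-bit code units can be sketched as follows. The helper `utf16_code_units` is a hypothetical name; it computes the count by encoding to UTF-16, which roughly corresponds to what a 16-bit internal representation has to store, and does not depend on how the running interpreter was built.

```python
def utf16_code_units(text):
    """Number of 16-bit units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2

clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the basic plane

print(utf16_code_units("Fu\u00df"))  # three units for three characters
print(utf16_code_units(clef))        # two units (a surrogate pair) for one character
```

Any slicing that works on code units can cut such a surrogate pair in half, which is the slicing problem mentioned above.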

So what they did "improve" with Python 3 was making unicode strings the default. And they did that in a very backwards incompatible and IMO problematic way: they degraded bytestrings from strings to glorified integer arrays and enforced unicode on non-unicode protocols.

Just to give you an example: when you iterate over a bytestring in Python 2, the iterator yields a bytestring of length 1 for each character. While I was always a harsh opponent of strings being iterable, it was something everybody relied on. In Python 3, bytestrings are bytes objects, which are basically arrays of integers that look like strings in the repr but yield the bytes as integers. So if you had code that relied on the iteration returning chars, your code will break. And yes, Python 3 breaks backwards compatibility, but this is something that 2to3 does not pick up and most likely you will not either. At least in my situation it took me a long time to track down the problem because some other implicit conversion was happening at another place.
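A short Python 3 sketch of the behavior described above. The generator `iter_byte_chars` is a hypothetical workaround name, not something from the standard library; it uses slicing, which still returns bytes objects.

```python
data = b"abc"

print(list(data))   # [97, 98, 99]: iteration yields integers
print(data[0])      # 97: indexing yields an integer as well
print(data[0:1])    # b'a': slicing still yields bytes

def iter_byte_chars(data):
    """Yield length-1 bytes objects, mimicking Python 2 str iteration."""
    for index in range(len(data)):
        yield data[index:index + 1]

print(list(iter_byte_chars(data)))  # [b'a', b'b', b'c']
```

Code that has to behave the same for text and for bytes needs a helper like this, or has to avoid iteration entirely.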

Now at the same time unicode strings continue to yield unicode strings of length 1 on iteration. That means suddenly bytes and unicode objects have different semantics, making it impossible to provide one interface for both bytestrings and unicode. There were tons of places where libraries accepted both unicode and bytes in Python 2 because it "made sense". A good example is URL handling. URLs are encodingless. Some schemes hint at a default charset, but in reality no such thing exists. However, applications themselves knew what encoding the URL would use, so they happily passed unicode strings to the URL encoder, which would use the application's URL encoding to ensure the URL is properly quoted. In Django, Werkzeug and probably many more libraries, if you passed unicode to the URL encoder, it would by default encode to UTF-8. However, if the URL came from another source with an unknown encoding, it was possible to transparently pass the URL on. Also, the decoding of URLs from somewhere else usually happened encoding-less. Many applications, for example, check the referrer of the page to see if the user came from a search engine and, if so, grab the search keywords from the referrer and highlight them on the current page. In that situation you can keep a list of known encodings of referrers and decode the referrer URL accordingly. In Python 3 the URL module in many situations uses a UTF-8 default encoding, or requires the URL to be UTF-8 encoded, or provides a completely different and limited interface for byte URLs.
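The referrer case could be sketched like this in Python 3: first percent-decode to raw bytes, then try a list of candidate encodings. The function `decode_query_value` and its default list of encodings are assumptions made for this example; only `urllib.parse.unquote_to_bytes` comes from the standard library.

```python
from urllib.parse import unquote_to_bytes

def decode_query_value(quoted, encodings=("utf-8", "latin-1")):
    """Percent-decode to bytes, then try each candidate encoding in order."""
    raw = unquote_to_bytes(quoted)
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Nothing fit: keep what is readable and mark the undecodable bytes.
    return raw.decode("ascii", "replace")

print(decode_query_value("M%C3%BCnchen"))  # UTF-8 encoded keyword
print(decode_query_value("M%FCnchen"))     # Latin-1 encoded keyword
```

Keeping the value as bytes until an encoding is chosen is the point here; a real application would pick the candidate list per referring site instead of using a fixed default.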

Sure, it might be sufficient for 98% of all users, but there are non obvious implications: a library that wraps urllib/urlparse and whatnot cannot reuse the same code for Python 2 and 3. When I started supporting IRIs in Werkzeug (basically the URL successors with proper encoding, already somewhat used by browsers) I chose to abandon the urllib module altogether and write my own simple decoder to make it easier to later port that thing over to Python 3 without changed semantics.

There are other examples as well: filesystem access. Python 3 assumes your filesystem has an encoding, but many Linux systems do not. In fact, not even OS X enforces an encoding. You can happily use fopen to create a file whose name does not look like UTF-8 at all. And even there, the situation is a lot more complex because on OS X, different unicode normalization rules apply for the filesystem than for the applications themselves. So even if you are using Python 3, you will still have to manually normalize the filename to a different normalization form when you want to compare filenames on the filesystem.
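A minimal sketch of such a manual comparison, using `unicodedata.normalize` from the standard library. The helper name `same_filename` and the choice of NFC as the common form are assumptions for this example; which form a particular filesystem hands back is not something the sketch relies on.

```python
import unicodedata

def same_filename(a, b):
    """Compare two filenames after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

composed = "F\u00fc\u00dfe.txt"                       # "ü" as a single code point
decomposed = unicodedata.normalize("NFD", composed)  # "u" plus a combining diaeresis

print(composed == decomposed)               # False: different code point sequences
print(same_filename(composed, decomposed))  # True: equal after normalization
```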

When I looked at the unicode stuff in Python 3 I did not see much value over nicely written libraries in Python 2 that enforced unicode usage. In fact, the update makes it especially hard to convert such libraries (that required unicode) to Python 3, because 2to3 assumes you are using byte strings and not unicode.

The case of "super" and other Quirks

I wonder how this slipped past everybody and why Guido is okay with that, but the new super non-keyword-keyword in Python 3 is just wrong. The fact that this code works is alarming already:

class Foo(Bar):
  def foo(self):
    super().foo()

Assuming you know how Python used to work function-invocation-wise, the fact that this works smells. But it gets a lot worse because this code does not work:

_super = super
class Foo(Bar):
  def foo(self):
    _super().foo()

That's just wrong. The use of the name of a global function (which btw I can reassign!) should never affect the bytecode generated, that's what keywords are for! Python also did not optimize while True loops because someone could reassign True, but suddenly it's sort of okay to do that. Also, why have self explicit when some magic in the compiler is now suddenly able to inject new symbols into the code? From that point onwards it is a one-liner to make the self implicit and suddenly there is no reason for that self being the explicit first parameter any more.
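The two cases can be put side by side in one runnable sketch. The base class `Bar` and the class names `Works` and `Breaks` are made up for this example, and the exact exception type and the `co_freevars` detail are what CPython does, so treat them as implementation behavior rather than a language guarantee.

```python
class Bar:
    def foo(self):
        return "Bar.foo"

class Works(Bar):
    def foo(self):
        return super().foo()   # the compiler sees the name "super" here

_super = super

class Breaks(Bar):
    def foo(self):
        return _super().foo()  # same object, different name

print(Works().foo())
try:
    Breaks().foo()
except RuntimeError as exc:
    print("RuntimeError:", exc)

# The hidden __class__ cell only exists where the name "super" was used.
print("__class__" in Works.foo.__code__.co_freevars)
print("__class__" in Breaks.foo.__code__.co_freevars)
```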

From what I remember, this was done to optimize the code. That's true, they do optimize something, but at the same time a call to a global function from a method in a loop will do a dict lookup every time the thing is invoked. If that lookup were optimized away, another thread could reassign the global function and suddenly the code would no longer call the new function because the old one was pulled into a local "register" (fastlocal or similar). And if you think "that's undefined behavior", I beg you to look into the mimetypes library. That will explain that nowhere in the world could a Python implementation be conforming if it avoids global lookups by optimizing them.
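The lookup-on-every-call behavior is easy to observe. In this sketch the names `helper`, `replacement` and `call_twice` are hypothetical, and the rebinding happens in the same thread for simplicity; it assumes the code runs at module level so that `helper` really is a global.

```python
def helper():
    return "old"

def replacement():
    return "new"

def call_twice():
    global helper
    first = helper()      # looked up in the module globals
    helper = replacement  # rebinding, as another thread could do
    second = helper()     # looked up again, finds the new function
    return first, second

result = call_twice()
print(result)
```

An implementation that cached `helper` in a local slot after the first lookup would return the old function twice here, which is the kind of optimization the paragraph above says cannot be done while staying compatible.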

What I wished for Python 3 was to remove really useless dynamic features like pulling in functions on every call to allow more compiler/interpreter optimizations, easier multithreading support and everything.

What I was also wishing for in Python 3 was a better interpreter interface, and a revoked GIL or no GIL at all. I would love to be able to use multiple Python interpreters per application. Some sort of reentrant interpreter. That would simplify embedding Python into other applications and expand the possibilities. Just look at how V8 works internally to get an idea of what I was hoping for. I also wished there was builtin support for freezing objects (no longer a frozenset, just freeze the set, and then finally be able to do the same for lists etc.). Builtin support for proxying would be nice as well. The hack that thread-local libraries and the weakref module use to proxy objects is just wrong, wrong, wrong (and unreliable as well). Imports are still horribly implemented, the standard library is still inconsistent or limited (and now even broken, cgi.FieldStorage in Python 3 anyone?).

What's cool about Python 3?

What I really like is the new nonlocal stuff. I was longing for that for a long, long time. Booleans being keywords is something that should have been in there for longer. Then there are finally easier division semantics, improved metaclasses, class decorators, no more classic classes, dict views, the builtins returning iterables instead of lists etc. (though they should have added improved repr support that would allow me to introspect those iterators and freeze them at the same time [which I guess would once again require a cleaner and improved interpreter design to get right]).
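For readers who have not used it yet, this is roughly what the nonlocal statement enables. The names `make_counter` and `increment` are made up for this example.

```python
def make_counter():
    count = 0

    def increment():
        nonlocal count  # rebind the variable of the enclosing function
        count += 1
        return count

    return increment

counter = make_counter()
print(counter(), counter())  # 1 2
```

In Python 2 the usual workaround was to store the counter in a mutable container such as a one-element list, because an inner function could read but not rebind the outer name.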

Conclusion

But that does not justify a new version of Python. Instead they could have added a strict mode and let the old code run emulated. They could have expanded that strict mode to allow access to new features of the language, add support for compiler optimizations and so much more. (JavaScript is currently getting such a strict mode).

So yes, I am disappointed with how Python 3 worked out. They could have done so much more, or skipped Python 3 altogether and gotten the cool stuff into an optional strict mode in Python 2.

Status Update 2010

written by Armin Ronacher, on Wednesday, January 6, 2010 23:19.

As you can see from the archives, the activity in this blog was rather low the last couple of weeks. But not only the blog, also my visible programming activity and everything related. There are a couple of reasons for that, but unfortunately I'm not entirely sure how this will change in the next few weeks.

University turned out to be boring and stressful at the same time and is the main limiting factor of my productivity. While many courses bore or frustrate me, there is still a lot of stuff that has to be delivered or studied for.

While this is the primary reason for not being active at all, there are a couple of other reasons as well. What I spent most of my time with the last couple of years was certainly Open Source (I guess even free software), Python and web applications. For some time I have been interested in other topics as well, I just never had the impression that I would be able to learn what has to be learned to succeed at them.

I still don't know, but without trying I will never know. So my Python / Webdev activity will most likely stay as low as it currently is for a while. That however does not mean that my projects will be unmaintained.

Project Maintenance

So what's happening to the projects in the near future?

Werkzeug's Future

So this little bugger will certainly get a new release soon. I was actually planning a release before the beginning of the new year but there are a couple of things that I want to have sorted out first. I was planning to move the debugger together with a couple of things into a new library called Flickzeug, but because I don't think I will personally need debugging tools any time soon, I will put that plan on hold.

If anyone is interested in helping out there, just drop me a line and I will help you as well as I can. The design of the library is pretty much finished, just the implementation is unfinished.

Until something for Flickzeug is planned, the debugging component will stay in Werkzeug and not be deprecated. The same cannot be said about the Werkzeug templates however. Because it depends on a library that goes away in Python 3 I will only deprecate it for future Werkzeug Python 3 versions.

This is also my biggest grief at the moment. I am totally unhappy with how Python is developing currently, and especially with the implementation of Python 3 and the plans for it. Graham Dumpleton is doing an insanely great job maintaining mod_wsgi and I am sure whatever he decides on for Python 3's WSGI will become the standard everybody is using. And I trust him enough to know that whatever is chosen will be an acceptable solution everybody can be happy with. Until then however, I do not plan to port Werkzeug to Python 3.

Jinja

Jinja is a little bit more complicated. The hg tip should work on Python 3 already if you run it through 2to3, so with the next release we should have acceptable builtin Python 3 support. What keeps me from releasing a new version is that the compiler has a couple of scoping issues that arise from edge cases where Jinja scoping does not properly translate to Python scoping.

All that would not be an issue if I could generate bytecode directly, but that unfortunately is not portable and does not work on the appengine at all, so I can only add further workarounds into the code generator.

Right now I am looking for code that breaks on Jinja2 tip and coming up with solutions for these edge cases. If you have templates that break on tip, let me know and file tickets for those.

Zine

Now this is the most tricky project. When I started working on Zine I was extremely motivated, mostly because at that time WordPress was really weak and easy to catch up to. However, Zine is still a one-man project, and because it is built in Python it is mostly interesting for other Python developers and as such is missing its market.

The hg version is a lot better than the latest release, so what I will do is integrate the missing changes into it and release a new version at the beginning of February. From that point onwards however I have no plans.

Why all that

So why the sudden switch away from what I did the last years? It's not that I lost motivation for web development in general, but I don't want to limit myself to that area. There is more I am interested in, and because I picked the wrong studies all over again and won't learn enough there, I will have to start learning that next to my regular studies.

Alex Gaynor wrote an essay about education a while ago to which I wanted to respond, but I never had the time to. Lately however that comes back more and more to me because the situation in Austria regarding education and university is currently just weird for a couple of reasons.

Basically it boils down to me being incredibly unhappy with the way university works in Austria, but at the same time finding myself over and over again in the situation where I have to defend the system against other stupid students who think it's a good idea to occupy a large auditorium to get more money from the state (and by doing so disrupt other students whose exams get relocated, who are denied access to the library and so forth).

Call for Volunteers

So a while ago there were a couple of people contributing code to various Pocoo projects. This is still true, but even though the number of people using Werkzeug, Jinja2 and Zine increased (not mentioning Pygments here, which Georg maintains), the number of patches and active developers decreased. This is unfortunate because right now I'm pretty much alone on these, and working alone means that in times where I have little time to spare, the amount of new features, improvements and everything declines.

Two projects of mine (CleverCSS and GHRML) either found new owners who sent me a mail or got some better maintained branches on github, and I would love to either hand over the project completely (including the pypi page) or get you on the pocoo team and give you commit access here.

For Werkzeug, Jinja2 and Zine my plan is to clean up the trackers a lot so that it's easier to work on small tasks which hopefully will both enable external contributions and make it possible for me to work on these next to what I will do otherwise :)

Happy new 2010

So for all of you, a Happy new 2010 and may it have more, and more stable releases, than 2009!

jQuery in Middlewares

written by Armin Ronacher, on Saturday, September 26, 2009 18:33.

Recently projects started using jQuery for middlewares that rewrite the request (like debug toolbars and similar stuff). This however breaks if another jQuery library is used on the page or if another library is in use because suddenly the $ or jQuery objects work differently.

However, because jQuery is awesome someone thought about that and added the deep-no-conflict mode. Let me show you a small example of how to use no-conflict mode:

<script type="text/javascript" src="jquery-1.3.2.min.js"></script>
<script type="text/javascript">
  (function($) {
    $(function() {
      $('body').html('jQuery is awesome');
    });
  })(jQuery.noConflict(true));
</script>

If you do this in a middleware, then inside the anonymous function you have access to the jQuery version you just included as $, but outside of it no trace of jQuery is left. Websites will continue to work in exactly the same way as before. Another word of warning: make sure to give your DOM elements a really random class prefix in order to avoid clashes on the DOM level and in the CSS definitions. Also: append to things, do not prepend. The latter could break first-child selectors people might use.

This was DjangoCon 2009

written by Armin Ronacher, on Monday, September 14, 2009 5:50.

Wow. DjangoCon was a blast. Not only because of the topics there, but also because of Portland itself and the people I met there and at the conference. I learned tons of new stuff and finally got the chance to meet some of the guys I previously only knew from IRC. Django really has one of the coolest communities I've seen.

The Highlights

The best parts of the conference were not even the talks, but, as always, the stuff that happens after the conference. The coolest talk in my opinion was Simon talking about “Cowboy Programming”. I loved it because the idea of hacking on stuff in a fort for a couple of days, disconnected from the internet, is a honking great idea. I will totally do that in Austria next year with a small group of people who are interested :)

The other kind of highlight was the Django and HTML5 talk, which turned out to have nothing to do with either Django or HTML5, but instead with Sproutcore. Because it did not really explain what Sproutcore was, except that it makes 5req/sec, a clarificatory website appeared right after the talk.

Ian Bicking's talk turned out to be something completely different than what everybody was expecting. Ted Leung described it as “a free software programmer’s midlife crisis”.

Unfortunately neither Jacob nor Adrian were at the conference, but that did not harm the conference at all. It might have helped the sprints though to see Jacob there, as the legal part was not really covered at all at these sprints.

What did we do?

Whenever Python programmers meet somewhere they hack on stuff. DjangoCon was no exception.

Ian and I started working on an updated version of PEP 333 but we immediately got some negative feedback from Graham (who unfortunately wasn't there) on some of the changes. I will update the PEP next week to clarify the points he brought up and read up on the web-sig to not miss anything.

Alex Gaynor and Eric Florenzano spent a whole night starting a comet framework on top of tornado called Hurricane. They actually started with a chat application first as far as I remember. When Mike Malone joined it was promptly rewritten into something more abstract that works on different queue backends. I had fun helping out a little bit, but so far my contributions to that project were not that interesting. Let's see what comes up next.

Amazingly I also found some time releasing a new version of Jinja 2 and fixing some bugs in Werkzeug. Also very cool: there are Werkzeug users in Portland and they do some amazing stuff.

Embarrassingly I did not work on any of the things I wanted to work on for DjangoCon. Neither my WSGI changes for Django nor my idea of a stacked settings module, but I will probably try to hack on that after the GSOC branches are merged in which my changes would depend on. On the bright side: I think I did something useful for Django by explaining packaging to Simon.

I also did a brief talk about Solace at the PDX-Python usergroup meeting and at DjangoCon as a lightning talk. Unfortunately I am not yet confident talking in a non-native language, I hope I will do better next time.

And of course I did not drink a lot of alcohol with Mike Malone and Michael Richardson ;)

What's up next?

Meh. Not much I'm afraid. The only bad thing about DjangoCon is that it makes me sad going back to where I was before. Missing all the people sharing the same interest, missing the startup-esque flair of Portland, missing Michael Richardson, Adam Lowry, Michael Pelletier, Jason Kirkland, Brett Carter, Mike Malone and all the others I met there and had a very good time with. Hope we will see each other again at PyCon or DjangoCon somewhere.

Jinja 2.2 Released

written by Armin Ronacher, on Sunday, September 13, 2009 7:46.

I'm happy to announce the 2.2 Release (Codename Kong) of Jinja 2, the high performance, sandboxed template engine for Python. What's new in this release?

  • {% include %} tags now have an “ignore missing” marker that tells Jinja to skip missing files silently.
  • Priority of not raised. You can now write foo not in bar and it does what it does in Python.
  • Fixed many problems with {% call %} and {% macro %} tags in loops.
  • Included templates can now access all variables from outer scopes properly.
  • Added “scoped” modifier for blocks that causes them to be affected by scoping rules.
  • Added support for line-based comments.
  • Added the meta module that gives access to some Jinja internals in a supported way.

Grab it while it's hot from the Python Package Index