Armin Ronacher

return_value_used for Python

written by Armin Ronacher, on Saturday, May 30, 2009 12:35.

Georg found a way today to get the names for a variable in Python. Motivated by this incredible hack, I gave porting runkit_return_value_used from PHP to Python a try. In PHP, this function finds out whether the return value of the current function is going to be used or not.

So if you're evil you could write a function like this in PHP:

<?php

function the_today() {
  $today = date("l jS \\of F Y h:i:s A");
  if (runkit_return_value_used())
    return $today;
  echo $today;
}

?>
<p>Today is: <?php the_today(); ?></p>

This prints the current date if the function is invoked without assigning the return value to something, and returns the current date as a string otherwise. I have no idea how this works in PHP, but you can have that functionality in Python as well.

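import sys, dis

def rvused():
    # frame 0 is rvused, frame 1 is the function asking,
    # frame 2 is the code that called that function
    frame = sys._getframe(2)
    # bytecode starting at the call instruction that is executing right now
    remaining = frame.f_code.co_code[frame.f_lasti:]
    # opcodes with an argument are 3 bytes long, all others 1 byte
    size = 3 if ord(remaining[0]) >= dis.HAVE_ARGUMENT else 1
    try:
        next_code = remaining[size]
    except IndexError:
        return True
    # POP_TOP right after the call means the result is thrown away
    return ord(next_code) != dis.opmap['POP_TOP']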

You can then use it like this:

>>> def foo():
...  if rvused():
...   print "my return value is used"
...   return 42
...  print "my return value is not used"
... 
>>> def test():
...  print foo()
...  foo()
... 
>>> test()
my return value is used
42
my return value is not used

The implementation of the function is actually pretty simple. The rvused function goes two stack frames back, which is the stack frame of the code that called the function calling rvused. Then we look at the bytecode that follows the last executed instruction (f_lasti), which should be the one that handles the return value of our function. If that bytecode is POP_TOP, it means we have an unused value on the stack that the virtual machine has to throw away. In that case rvused returns False to signal an unused return value.

How this works becomes obvious if we look at the bytecode for test:

>>> dis.dis(test)
  2           0 LOAD_GLOBAL              0 (foo)
              3 CALL_FUNCTION            0
              6 PRINT_ITEM          
              7 PRINT_NEWLINE       

  3           8 LOAD_GLOBAL              0 (foo)
             11 CALL_FUNCTION            0
             14 POP_TOP             
             15 LOAD_CONST               0 (None)
             18 RETURN_VALUE        

In both cases the last executed instruction is CALL_FUNCTION. In the first case the next one is PRINT_ITEM, which means that Python will print the topmost value on the stack, which is the return value of the function. In the second case the opcode after the function call is POP_TOP, which tells the VM to throw away the result.
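The bytecode shown above is what Python 2 produced at the time. As a rough sketch of the same trick for a newer CPython 3, the version below walks the caller's instructions with dis.get_instructions instead of indexing co_code by hand. It assumes that the interpreter still emits POP_TOP directly after a call whose result is discarded; the names seen, foo and test are only there for illustration.

```python
import dis
import sys

seen = []


def rvused():
    """Guess whether the return value of the calling function is used."""
    # Frame 0 is rvused, frame 1 the function asking, frame 2 its caller.
    frame = sys._getframe(2)
    for instruction in dis.get_instructions(frame.f_code):
        # The first instruction after the call that is currently executing
        # is the one that consumes (or throws away) the return value.
        if instruction.offset > frame.f_lasti:
            return instruction.opname != "POP_TOP"
    return True


def foo():
    # Record what rvused reports so the demo can be inspected afterwards.
    seen.append(rvused())
    return 42


def test():
    value = foo()  # the result is stored in a local variable
    foo()          # the result is thrown away
    return value
```

After calling test(), the list seen holds one entry per call to foo. Like the original, this depends on CPython implementation details and is no more suitable for real code.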

Oh. And please don't use that in your code. It was just a stupid experiment to find out if it was possible to port that abomination of a function from PHP to Python. Maybe it helps for debugging though.

Holy Crap, Wave is Awesome

written by Armin Ronacher, on Friday, May 29, 2009 17:30.

I'm probably not telling you anything new when I say that Wave is awesome, and for so many reasons. Quite frankly, I was blown away by the demonstration, and not only because the application felt like a desktop thingy. What really convinced me that this is something new was the fact that Wave is not only an application on Google servers but also an open protocol. It's nice that the application is open source as well, but dammit: It's an open protocol. Imagine the possibilities.

Concept-wise the idea is nothing new. We've had that kind of real-time text-based communication with very old ICQ clients as well (at least reddit says so). However, people change, and after a decade things look different again. Wave would integrate into my personal workflow as if it had been developed exactly for what I do. I love e-mail, but when it comes to figuring out problems with others nothing beats IRC for me. Just for comparison: the #pocoo IRC channel has ~100 people online at the same time, but the activity on the mailing list approaches zero. The main problem I have with IRC, however, is that it's hard for other people to benefit from earlier discussions. We do have a logging bot in the channel and the logs are publicly available, but they are painful to use as a reference for your own problems. So IRC has problems, but the nearly real-time behaviour makes it easy for me to help fix other people's code, discuss implementations etc.

Wave extends both mail and instant messaging. A wave can be composed like a mail and people can comment on it like they would on IRC. Imagine how effectively mailing lists would work that way. You have a problem and describe it in a new wave. Then someone who has some ideas can add a new comment to it. If you happen to be online at the same time you can write to it as if it were an IRC session. And that would even scale to a large number of concurrent users because the users are not writing to a central log like they do on IRC but to a topic, in this case the problem you just outlined in that new wave.

So far the only Google products I use and like are the search, Maps and YouTube. I don't care about Gmail because I like to have things in my own hands. I hate centralized systems as much as the next guy. With Android things at Google somehow changed. They started their first large-scale open source project. I always thought it would stop there, but they did it again with Chrome. And with Wave the whole concept of open source at Google went in a new direction. Wave is not only a product but also a protocol you can implement yourself. And if people accept it there should be some alternative implementations available soon. And not only the specification is open, but also the implementation Google uses. That makes it incredibly easy to get started and set up your own Waves.

I have no idea why Google is doing that, but I think it's great. They are not forcing the users to be present on their servers. They even said in the introduction video explicitly that if you're sending a Wave from one user to another user on the same Wave instance, the messages would never leave the original server. There are very few companies that would dare to give users that much freedom.

You probably have to be a big company to be able to give away your product like this, but I seriously hope other companies will follow Google in that regard. I know Microsoft is slowly getting the concept of open standards and I can only hope they will try to become a bit like Google here. (even though I'm pretty sure this won't happen)

EuroDjangoCon Recap

written by Armin Ronacher, on Sunday, May 10, 2009 6:45.

EuroDjangoCon was awesome! I came back yesterday evening after 5 days of great talks and sprints from Prague. For the week there I had to postpone some exams for university, but it was totally worth it.

If you haven't been there, some of the best moments of the week in recap:

Talks

Zed Shaw, known for his extravagant presentations, decided to use a terminal as his presentation tool: the “slides” were written to stdout and then projected onto the wall. At the same time he was broadcasting the talk to Twitter, sentence by sentence. However, it seems he hit the Twitter API limits halfway through, so the talk is now available in full length on his blog: EuroDjangoConf2009 Keynote All Over Your Twitters. The main point he made during that talk was that it's important for programmers to have interests besides programming that emphasize creativity and practice.

The talk by Simon Willison was probably the one that motivated me the most. I have to admit that I often went the easy way and replaced a whole part of the Django infrastructure with external libraries for my projects instead of trying to find a solution for the problem and improving Django. Simon's talk addressed one of the problems I had with Django, if not the problem I had with Django: the infamous DJANGO_SETTINGS_MODULE. And I'm not talking about it being an environment variable, but about that magic singleton named django.conf.settings that is loaded once and nearly impossible to change at runtime.

His talk was mainly about replacing middlewares and the URLconf with view-like callables that take the request and return a response, but we were brainstorming with Adrian in an open space later, and soon we had a healthy discussion going on about breaking Django internally into multiple systems that can optionally be configured separately and changed at runtime in a thread-safe manner. The slides from his talk are online on slideshare: Django Heresies.

Joe Stump from digg was talking about large data sets and rethinking the stack. For me this was very interesting because the only time I was personally working with largish data was when I was working for Plurk and it wasn't that big back then :) I personally still have the feeling that the larger the data the weirder the code, but maybe that's how things are. I suppose as a result of his and Michael Malone's talk people will start working on message queues for Django. I was unable to find the slides for the data set talk online, but the one about Rethinking the Stack is online as keynote source: Rethinking_the_Stack.key.zip. Michael's talk was about scaling Django at Pownce and was interesting as well. You can find the slides online on his website: Scaling Django.

Adrian briefly showed some of the history of Django before it was open sourced in a lightning talk. Even though I have been using Django since the first open source release, it was interesting as hell to see what happened before that. Did you know there were like 8000 changesets before the first release? That means that until recently there was more stuff happening before the open source release than afterwards :-) Also I didn't know that Ian Bicking was responsible for Django no longer using code generation (yay!). It really seems like he has his hands in every Python project out there.

There was a lot more, but I think these were my favorites.

Sprints

Sprinting on Django was great fun. The first day we were 50 people in the offices of centrum holdings, the company that provided the room for the sprints. Honza ordered 30 pizzas and Jacob came with two or more bags of chips and sweets. I got 9 tickets closed so far, with 6 still in the pipeline. Alex and I also refactored the file system in Django slightly so that it doesn't require monkey patching of __class__ and replacing all methods in subclasses, as was the case before.

There was also a dedicated pinax sprint, but I didn't take part in that.

Prague

Prague rocks. What more do I have to say? When I stepped off the train I withdrew 2000 crowns from my account (~75€) and I was unable to spend all of it during the week, even though we went drinking and eating every single day. The party was in a place where they had beer taps in the middle of the table and the number of liters each table drew was shown on a scoreboard on the wall. Unfortunately we often went in really large groups, which made it hard to get into the really interesting places (like a Jazz bar), but now we know better.

What else?

I didn't take any pictures myself (and I have the ability to avoid photographers) but there are some nice galleries on flickr. Next time they should decide on a better tag than edc, which is hard to search for on flickr.

Can't really wait for djangocon in Portland.

First Werkzeug Screencast

written by Armin Ronacher, on Monday, April 27, 2009 21:34.

I've played with the idea of recording a screencast for Werkzeug for a long time. Unfortunately recording a screencast is a lot more work than you would think, so it took me a long time to create one that I was at least partially happy with. I know the one I've uploaded is far from perfect: the sound is not so good, the narration is so-so and there is even a small error in the recording that I had to fix using an overlay.

But I have spent too much work on it already to just throw it away. I promise I will upload a finer version with the next release of Werkzeug, but until then I hope it still helps a bit.

I tried to stick to the very basic things you would want to do with Werkzeug. No fancy JavaScript or flickr API tricks. The screencast gives you a quick introduction into Werkzeug with Jinja2 as template engine and SQLAlchemy as database abstraction layer to create a simple wiki application.

The screencast is available with sources on werkzeug.pocoo.org/wiki30. It should work fine on VLC and QuickTime.

Werkzeug 0.5 Released

written by Armin Ronacher, on Friday, April 24, 2009 21:04.

I'm very happy and proud to release the latest version of Werkzeug, the WSGI utility library for Python. This release is probably the most interesting one so far. We refactored a lot of internals, overhauled the documentation and pushed out some exciting new features. Unfortunately this release also adds some deprecations and drops Python 2.4 support, but you will notice that it's worth it :)

Improved Test Client

Werkzeug has come with a minimalist test client since one of the first releases. However, it was never really fun to use, and some of its functionality was also available through other (and independent) functions such as create_environ. With 0.5 we changed that, improved the test client's interface and rewrote it to use the other functions as well. This has the big advantage that whatever you use now, the parameters are the same.

Because of the unified interface you can now automatically create arbitrary WSGI environments for internal requests or whatever you want. Let me give you a small example that shows how you can create new WSGI environments:

>>> from werkzeug import EnvironBuilder
>>> b = EnvironBuilder(path='/foo', base_url='http://example.com/bar')
>>> b.base_url
'http://example.com/bar/'
>>> b.path
'/foo'
>>> b.script_root
'/bar'
>>> b.host
'example.com'

You can easily use this to create a WSGI environment with file uploads as payload:

>>> b.method = 'POST'
>>> b.add_file('file', '/path/to/the/file.txt')

And then create the WSGI environment:

>>> env = b.get_environ()

Not to forget: the test client supports cookies and internal redirects now.
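To give an idea of what the environment builder takes care of, here is a rough standard-library sketch that derives the same pieces (script root, path, host) from a path and a base URL. It is not how Werkzeug builds its environments; build_environ is a hypothetical helper, and wsgiref only fills in the remaining default keys.

```python
from urllib.parse import urlsplit
from wsgiref.util import request_uri, setup_testing_defaults


def build_environ(path, base_url, method="GET"):
    """Create a minimal WSGI environ for `path` below `base_url`."""
    parts = urlsplit(base_url)
    environ = {
        "REQUEST_METHOD": method,
        # The path of the base URL is where the application is mounted.
        "SCRIPT_NAME": parts.path.rstrip("/"),
        # The part of the URL the application itself has to dispatch on.
        "PATH_INFO": path,
        "SERVER_NAME": parts.hostname,
        "HTTP_HOST": parts.netloc,
        "wsgi.url_scheme": parts.scheme,
    }
    # Fill in wsgi.input, wsgi.errors, SERVER_PORT and friends.
    setup_testing_defaults(environ)
    return environ


env = build_environ("/foo", "http://example.com/bar")
print(env["SCRIPT_NAME"], env["PATH_INFO"], request_uri(env))
```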

Stream Limiting and Form Data Parsing

WSGI for the longest time had the problem that the specification says the input stream has to provide a readline method but need not support the size hint parameter. Fortunately every implementation we came across ignores that and provides the size hint anyway. With 0.5 Werkzeug no longer requires readline to support the size argument for form data parsing, which makes it fully WSGI compliant for the first time. From 0.5 onwards the input streams you care about are automatically limited to the content length. This means you can safely call read() on the input stream (the one on the request object!) without causing a lockup. If you don't want to use the request objects you can still use the new LimitedStream class that implements the stream limiting. If you want to safely iterate line by line over the input stream and not break the WSGI specification by supplying a size for readline(), you can use the make_line_iter helper function, which returns an iterator over the lines of a stream.
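The idea behind stream limiting and line iteration can be sketched with nothing but the standard library. The LimitedReader class and the iter_lines function below are hypothetical stand-ins and not the Werkzeug implementations: the reader never hands out more than the announced content length, and the line iterator only ever calls read(), so readline() with a size is not needed.

```python
import io


class LimitedReader:
    """Wrap a stream so that at most `limit` bytes can be read from it."""

    def __init__(self, stream, limit):
        self._stream = stream
        self._remaining = limit

    def read(self, size=-1):
        # Never ask the underlying stream for more than what is left.
        if size is None or size < 0 or size > self._remaining:
            size = self._remaining
        data = self._stream.read(size)
        self._remaining -= len(data)
        return data


def iter_lines(stream, limit, chunk_size=4096):
    """Yield lines from the first `limit` bytes of `stream` using read()."""
    reader = LimitedReader(stream, limit)
    pending = b""
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything but the last piece is a complete line.
        *complete, pending = pending.split(b"\n")
        for line in complete:
            yield line + b"\n"
    if pending:
        yield pending


payload = b"first\nsecond\n"
# Bytes after the payload stand in for whatever follows on the socket.
stream = io.BytesIO(payload + b"next request")
print(list(iter_lines(stream, len(payload))))
```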

But there is more. The parsing system was rewritten for Werkzeug 0.5, which makes it possible to decide where to store the file before the upload starts. In the past Werkzeug always created a temporary file no matter what was uploaded; now you can react to the size of the upload and provide a different stream instead. For example you can decide to keep uploaded files in memory if they are smaller than one megabyte and stream them directly to disk otherwise.

You can also limit the upload size so that Werkzeug exhausts the input stream and throws away the data if the user uploaded a file over the threshold:
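The decision itself can be as small as the sketch below. The function open_upload_stream is a hypothetical helper, not Werkzeug's API; how such a callable gets hooked into the form parser is left out here because it depends on the Werkzeug version.

```python
import io
import tempfile

MEMORY_LIMIT = 1024 * 1024  # one megabyte, the threshold from the text


def open_upload_stream(content_length=None):
    """Return a writable binary stream for an upload of the given size."""
    # Small uploads with a known size are kept in memory.
    if content_length is not None and content_length < MEMORY_LIMIT:
        return io.BytesIO()
    # Large uploads, or uploads of unknown size, go straight to disk.
    return tempfile.TemporaryFile("w+b")


small = open_upload_stream(512)
small.write(b"x" * 512)
```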

from werkzeug import Request

class LimitedRequest(Request):
    #: not more than 8 MB are accepted
    max_content_length = 1024 * 1024 * 8

    #: the maximum size for regular form data (not files) is 1MB
    max_form_memory_size = 1024 * 1024

If the user tries to upload more than that and Werkzeug tries to parse the uploaded data it will raise a RequestEntityTooLarge exception. You can either return that as generic error or catch it and display something nicer instead or ignore it.
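For comparison, the same kind of limit can be enforced without request objects at all. The sketch below is a plain WSGI middleware, not Werkzeug code, and limit_uploads is a hypothetical name. It only looks at the declared CONTENT_LENGTH and answers with a 413 response itself instead of raising an exception; unlike Werkzeug, it does not exhaust the input stream.

```python
MAX_CONTENT_LENGTH = 1024 * 1024 * 8  # 8 MB, as in the class above


def limit_uploads(app, max_content_length=MAX_CONTENT_LENGTH):
    """Wrap a WSGI application and reject requests that announce too much data."""

    def wrapper(environ, start_response):
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            # A malformed header is treated like a missing one in this sketch.
            length = 0
        if length > max_content_length:
            body = b"Upload too large"
            start_response("413 Request Entity Too Large", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)

    return wrapper
```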

Another thing we noticed when working on file uploads is that many users try to upload and store files on the file system with (nearly) the same name. Because you can easily cause trouble that way, we added a function to secure a filename. This makes sure that a user can't upload a file with a spoofed filename to leave the upload target folder. It also removes non-ASCII characters and whitespace so that the code works the same on unicode and non-unicode file systems:

>>> from werkzeug import secure_filename
>>> secure_filename("My cool movie.mov")
'My_cool_movie.mov'
>>> secure_filename("../../../etc/passwd")
'etc_passwd'
>>> secure_filename(u'i contain cool ümlauts.txt')
'i_contain_cool_umlauts.txt'
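How such a function can arrive at these results is sketched below in Python 3 with the standard library only. The sanitize_filename helper is hypothetical and merely mirrors the behaviour shown in the session above; it should not be read as the Werkzeug implementation.

```python
import os
import re
import unicodedata

# Anything that is not a letter, digit, underscore, dot or dash is dropped.
_unsafe_chars = re.compile(r"[^A-Za-z0-9_.-]")


def sanitize_filename(filename):
    """Reduce a user-supplied filename to a flat, ASCII-only name."""
    # Split accented characters into base letter plus accent, then drop
    # everything that cannot be represented in ASCII.
    name = unicodedata.normalize("NFKD", filename)
    name = name.encode("ascii", "ignore").decode("ascii")
    # Path separators become spaces so that no directory part survives.
    for separator in {"/", "\\", os.sep, os.altsep}:
        if separator:
            name = name.replace(separator, " ")
    # Whitespace runs turn into single underscores.
    name = "_".join(name.split())
    # Strip leading and trailing dots so that "..", hidden files and the
    # like cannot be produced.
    return _unsafe_chars.sub("", name).strip("._")


print(sanitize_filename("../../../etc/passwd"))
```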

Other Major Changes

The common request headers such as content type, length, referrer or date are now exposed on the full request objects. The handling of content types was simplified and you can now directly access mimetype parameters. The user agent matcher was improved as well and knows about Google Chrome and some other modern browsers.

Again we beefed up our HTTP support and added more parsing and dumping functions for all kinds of headers. Besides content ranges we should have everything covered now. For better URL compliance the magic “&” / “;” switch on the URL decoder is gone for good.

All attributes on the request object are read only now which also includes collections. This gives us the opportunity to fine tune the objects in the future without breaking code and causing confusing behavior. By doing this we also implemented read only collection classes of all builtin containers:

>>> from werkzeug import ImmutableDict
>>> d = ImmutableDict({"foo": "bar"})
>>> d["foo"]
'bar'
>>> isinstance(d, dict)
True
>>> d["bar"] = "new value"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'ImmutableDict' objects are immutable
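A read-only container of this kind can be approximated in a few lines. FrozenDict below is a hypothetical illustration of the approach (a dict subclass whose mutating methods all raise) and not the class that ships with Werkzeug.

```python
class FrozenDict(dict):
    """A dict subclass that refuses every kind of modification."""

    def _immutable(self, *args, **kwargs):
        raise TypeError("%r objects are immutable" % type(self).__name__)

    # Route all mutating operations to the same error.
    __setitem__ = __delitem__ = __ior__ = _immutable
    clear = pop = popitem = setdefault = update = _immutable


d = FrozenDict({"foo": "bar"})
print(d["foo"], isinstance(d, dict))
```

Because it still subclasses dict, code that only reads from it keeps working unchanged.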

The builtin webserver no longer requires wsgiref. It should now work a lot better on Windows systems where we had some annoying problems with DNS lookups in the past. Also it supports any HTTP method now and no longer causes trouble if your code emits a Date header.

Because web servers (and browsers) are often pretty buggy, we added some fixes for servers and browsers to the contrib module. These work around bugs with lighttpd and IIS servers and the infamous Internet Explorer.

New Documentation

Last but not least the best feature of this release: The new documentation. We worked hard documenting every single interface Werkzeug provides, improving the tutorial and finally documenting the contributed modules. The new documentation also has a section on how to configure web servers, how to test applications and some useful notes on request data handling.

Get It

You can get the latest release version directly from the Python Package Index and the Werkzeug website.