Pro/Cons about Werkzeug, WebOb and Django
Yesterday I had a discussion with Ben Bangert from the Pylons team, Philip Jenvey and zepolen from the Pylons IRC channel. The topic of the discussion was why we have Request and Response objects in Werkzeug, WebOb and Django and what we could do to improve the situation a bit.
We decided on writing down what we like or dislike about these three systems in order to find out in which direction to go, so this is my attempt. Please keep in mind that these are my opinions only!
WebOb
Let's start with WebOb which is the smallest of the three libraries in question. WebOb really just sticks to the basics and provides request and response objects and some data structures required.
The philosophy of WebOb is to stay as compatible with Paste as possible and to have modifications on the request object appear in the WSGI environment. That basically means that when you do anything on the request object and you create another one later from the same environment you will see your modifications again.
This is without doubt something that neither Werkzeug nor Django does. Both Werkzeug and Django consider the incoming request something you should not modify, after all it came from the client. If you need to create a request or WSGI environment in Werkzeug you get a separate utility that is designed for exactly that purpose.
While I have to admit that the idea of a reflecting request object is tempting, I don't think it's a good idea. Using the WSGI environment as a communication channel seems wrong to me. The main problem with it is that WebOb cannot achieve what it's doing with standard environment keys. There are currently five WebOb keys in the environment for “caching” purposes and for compatibility with paste it also understands a couple of paste environment keys.
The idea is that other applications can get a request again at a completely different point, but I'm not sure if WSGI is the correct solution for that particular problem. Building reusable applications on the complex WSGI middleware system seems to be the wrong layer to me.
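To make the "reflecting" behaviour concrete, here is a minimal sketch using only the standard library. It is not WebOb's implementation: the class ReflectingRequest and the environment key "example.parsed_query" are made up for illustration. It only shows the general idea of writing request state back into the WSGI environment, including the need for a non-standard key to cache parsed data.

```python
from urllib.parse import parse_qsl


class ReflectingRequest:
    """Hypothetical request wrapper that keeps all state in the environ."""

    # Made-up key for illustration; not one of WebOb's real keys.
    CACHE_KEY = "example.parsed_query"

    def __init__(self, environ):
        self.environ = environ

    @property
    def method(self):
        return self.environ["REQUEST_METHOD"]

    @method.setter
    def method(self, value):
        # The write goes straight into the shared WSGI environment.
        self.environ["REQUEST_METHOD"] = value

    @property
    def query(self):
        # Parsed data has no standard environ key, so it is cached
        # under a custom one.
        if self.CACHE_KEY not in self.environ:
            pairs = parse_qsl(self.environ.get("QUERY_STRING", ""))
            self.environ[self.CACHE_KEY] = pairs
        return self.environ[self.CACHE_KEY]


environ = {"REQUEST_METHOD": "GET", "QUERY_STRING": "a=1&b=2"}
first = ReflectingRequest(environ)
first.method = "POST"
first.query.append(("c", "3"))

# A second object built later from the same environ sees both changes.
second = ReflectingRequest(environ)
print(second.method, second.query)
```

Anything that touches the environment between the two objects, middleware included, shares this state, which is the communication channel the paragraphs above are sceptical about.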
Some other parts where I don't agree with the WebOb concepts:
- The parsing of the data is implemented either in private functions or directly in the request object. I strongly prefer giving the user the choice to access the parser separately. Sometimes you really just need a cookie parsed, why create a full request object then?
- WebOb uses request.GET and request.POST for URL parameters and form data. Because you can have URL parameters in non-GET requests as well this is misleading, and for POST data it's wrong as well because form data is available in more than just POST requests. Accessing request.POST to get form data in a PUT request seems wrong.
- WebOb still uses cgi.FieldStorage, and not only internally: it also puts those objects into the POST dict. This is not the best idea for multiple reasons. First of all users are encouraged to trust their submitted data and blindly expect a field storage object if they have an upload field in their form. One could easily cause trouble by sending forged requests to the application. If logging is set up the administrator is sent tons of error mails instantly. I strongly prefer storing uploaded files in a separate dictionary like Django and Werkzeug do. The other problem with using FieldStorage as parser is that it's not WSGI compliant because it requires a size argument on the readline function, and that it has a weird API. You can't easily tell it to not accept more than n bytes in memory and to switch between in-memory uploading and a temporary file based on the length of the transmitted data. Also cgi.FieldStorage supports nested files which no browser supports and which could cause internal server errors as well because very few developers know that a) nested uploads exist and b) the field storage object behaves differently if a nested uploaded file is transmitted.
- Also WebOb barks on invalid cookies and throws away all of them if one is broken. This is especially annoying if you're dealing with cookies outside of your control that use invalid characters (stuff such as advertisement cookies).
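Two of the points above, a parser that can be used without a request object and tolerance for broken cookies, can be sketched together. The function parse_cookie_header below is hypothetical and deliberately simplistic. It is not the parser of any of the three libraries, and its validity check is an assumption rather than the full cookie grammar.

```python
def parse_cookie_header(header):
    """Parse a Cookie header leniently, skipping pairs that look broken."""
    cookies = {}
    for chunk in header.split(";"):
        name, sep, value = chunk.strip().partition("=")
        # Skip malformed pairs instead of discarding the whole header.
        if not sep or not name or any(c in name for c in ' \t",'):
            continue
        cookies[name] = value.strip('"')
    return cookies


header = 'session=abc123; bad cookie name=1; theme="dark"; novalue'
cookies = parse_cookie_header(header)
print(cookies)
```

The broken entries are dropped on their own, so the session cookie survives even though a third-party cookie in the same header is invalid.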
Now to the parts where WebOb wins over Django and Werkzeug:
- Unlike Django and Werkzeug WebOb provides not only a unicode API but also a bytestring based API. This could help existing applications that are not unicode ready yet. Downside is that with the current plans of Graham for WSGI on Python 3 there do not seem to be ways to support it on Python 3.
- WebOb supports the HTTP range feature.
- The charset can be switched on the fly in WebOb. In Werkzeug you set the charset for your request/response object and from that point onwards it's used no matter what. In Django the charset is application wide.
An interesting thing is that WebOb uses datetime objects with timezone information. The tzinfo attribute is set to a tzinfo object with a UTC offset of zero. That's different from Werkzeug and Django, which use offset-naive datetime objects. The difference matters because Python treats the two kinds differently and does not support operations that mix them. Unfortunately the datetime module makes it hard to decide what to do. Personally I decided to use datetime objects that have no tzinfo set and only dates in UTC.
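The incompatibility between the two kinds of datetime objects is easy to reproduce with the standard library alone. The sketch below uses timezone.utc from present-day Python as a stand-in for a zero-offset tzinfo object, and the helper try_subtract exists only for this illustration.

```python
from datetime import datetime, timezone

# Offset-naive: no tzinfo, UTC only by convention.
naive = datetime(2009, 8, 5, 14, 58)
# Offset-aware: tzinfo with a UTC offset of zero.
aware = datetime(2009, 8, 5, 14, 58, tzinfo=timezone.utc)


def try_subtract(a, b):
    """Return a - b, or the TypeError raised when the kinds are mixed."""
    try:
        return a - b
    except TypeError as exc:
        return exc


mixed = try_subtract(aware, naive)                        # a TypeError
same_kind = try_subtract(aware, aware.replace(hour=15))   # a timedelta

# Dropping the tzinfo gives a naive value that works with other naive ones.
normalized = aware.replace(tzinfo=None)
print(type(mixed).__name__, same_kind, normalized == naive)
```

Whichever convention a library picks, values crossing the boundary to a library with the other convention have to be converted explicitly, as in the last step.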
Werkzeug
In terms of code base size Werkzeug is next. The problem with Werkzeug certainly is that it does not really know what belongs in it and what does not. That situation will slightly improve with the next version of it when some deprecated interfaces go away and when the debugger is moved into a new library together with all sorts of debugging tools such as profilers, leak finders and more (enter flickzeug).
Werkzeug is based on the principle that things should have a nice API but at the same time allow you to use the underlying functions. For example you can easily access request.form to get a dict of uploaded form data, but at the same time you can call werkzeug.parse_form_data to parse the stuff into a multidict. You can even go a layer down and tell Werkzeug to not use the multidict and provide a custom container or a standard dict, list, whatever.
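The layering described here can be sketched without Werkzeug itself. The names parse_form and SimpleRequest below are hypothetical and are not Werkzeug's API. They only illustrate a convenient attribute on top of a parsing function that can also be called directly with a container of your choice.

```python
from urllib.parse import parse_qsl


def parse_form(body, container=dict):
    """Hypothetical low-level parser: urlencoded body -> chosen container."""
    return container(parse_qsl(body, keep_blank_values=True))


class SimpleRequest:
    """Hypothetical request object that only delegates to the parser."""

    def __init__(self, body):
        self.body = body

    @property
    def form(self):
        return parse_form(self.body)


body = "name=armin&lang=python&lang=c"
as_dict = SimpleRequest(body).form          # convenient, last value wins
as_list = parse_form(body, container=list)  # lower level, keeps every pair
print(as_dict, as_list)
```

A plain dict silently drops repeated keys, which is why a multidict is the usual default and why being able to swap the container is useful.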
Also Werkzeug has a slightly different goal than WebOb. WebOb focuses on the request and response object only, while Werkzeug provides all kinds of useful helpers for web applications. The idea is that if there is a function you can use, you are more likely to use it than to reimplement it. For example many applications take the uploaded file name and just create a file with the same name. This however turns out to be a security problem so Werkzeug gives you a function (werkzeug.secure_filename) you can use to get a secure version of the filename that also is limited to ASCII characters.
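To show what such a helper has to guard against, here is a simplified sketch. The function sanitize_filename is hypothetical and is not Werkzeug's implementation. Its rules (fold to ASCII, remove path separators, keep a conservative character set) are assumptions chosen for illustration, and the real function handles more cases.

```python
import re
import unicodedata


def sanitize_filename(filename):
    """Hypothetical, simplified take on a secure-filename helper."""
    # Fold to ASCII, dropping characters that have no ASCII equivalent.
    ascii_name = (
        unicodedata.normalize("NFKD", filename)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Path separators become spaces so no directory component survives.
    for sep in ("/", "\\"):
        ascii_name = ascii_name.replace(sep, " ")
    # Keep a conservative character set and join the pieces with "_".
    cleaned = re.sub(r"[^A-Za-z0-9_.-]", "", "_".join(ascii_name.split()))
    # Strip leading and trailing dots/underscores ("..", dotfiles).
    return cleaned.strip("._")


for name in ("../../etc/passwd", "My résumé.pdf", "..\\..\\boot.ini"):
    print(repr(name), "->", repr(sanitize_filename(name)))
```

The result can be an empty string (for example for a name consisting only of dots), so calling code still needs a fallback name in that case.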
So obviously there is a lot of stuff in Werkzeug you probably would not expect there.
So here are some of the things I like especially about Werkzeug:
- The request/response objects. They are designed to be lightweight and can be extended using mixins. Werkzeug also provides full-featured request objects that implement all shipped mixins. Also the request/response objects are not doing any parsing or dumping, that is all available through separate functions as well which makes the code readable and easy to extend.
- It fixes many problems with the standard library or reimplements broken features. It does not depend on cgi.FieldStorage since 0.5 and allows you to limit the uploaded data before it's consumed. That way an attacker cannot exhaust server resources.
- The data structures provide handy helpers such as raising key errors that are also bad request exceptions so that if you're not catching them, you are at least not generating internal server errors as long as the base HTTPException is caught.
- Werkzeug uses a non-data descriptor for the properties on the request and response objects. The first time you access the property, code is executed and the result is stuffed into the instance dict. After that there is no runtime penalty when accessing the attributes.
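The non-data descriptor trick from the last point can be shown in a few lines. This is a sketch of the general technique, not Werkzeug's code, and the names cached_attribute and Request are made up for the example. The standard library's functools.cached_property works along the same lines.

```python
class cached_attribute:
    """Non-data descriptor: defines __get__ only, no __set__."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = self.func(obj)
        # Instance __dict__ entries win over non-data descriptors, so
        # later lookups never reach __get__ again.
        obj.__dict__[self.name] = value
        return value


class Request:
    def __init__(self, query_string):
        self.query_string = query_string
        self.parse_count = 0

    @cached_attribute
    def args(self):
        self.parse_count += 1
        pairs = self.query_string.split("&")
        return dict(pair.split("=", 1) for pair in pairs)


request = Request("a=1&b=2")
first = request.args    # runs the function and stores the result
second = request.args   # plain attribute lookup from request.__dict__
print(first, request.parse_count)
```

Because the class defines no __set__, Python's attribute lookup prefers the value stored on the instance, which is what removes the per-access cost after the first call.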
And of course here is the list of things that are not that nice:
- It's too large for a library that only wants to implement request and response objects.
- There is no support for if-range and friends.
- The response stream is useless because each write() ends up as a separate “item” in the application iterator. Because each item is followed by a flush it makes the response stream essentially useless.
- The MultiDict is unordered which means that some information is lost.
- The response object modifies itself on __call__. This allows some neat things like automatically fixing the location header, but in general that should happen temporarily when called as WSGI application instead of modifying the object.
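The complaint about the response stream can be illustrated with a small model. Both classes below are hypothetical and are not Werkzeug's implementation. The sketch assumes a server that sends every item of the application iterator to the client as soon as it receives it, and counts how many such items each approach produces.

```python
class ListStream:
    """Hypothetical stream: every write() becomes one iterator item."""

    def __init__(self):
        self.items = []

    def write(self, data):
        self.items.append(data)


class BufferedStream(ListStream):
    """Variant that joins the pieces into a single item on close()."""

    def close(self):
        self.items = [b"".join(self.items)]


def count_flushes(app_iter):
    # Assumption: the server transmits each yielded item separately.
    return sum(1 for _ in app_iter)


chunks = (b"<p>", b"hello", b"</p>")

unbuffered = ListStream()
for chunk in chunks:
    unbuffered.write(chunk)

buffered = BufferedStream()
for chunk in chunks:
    buffered.write(chunk)
buffered.close()

print(count_flushes(unbuffered.items), count_flushes(buffered.items))
```

Many small writes therefore translate into many small transmissions unless something buffers them first, which is the behaviour criticised above.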
Django
Now Django isn't exactly a reusable library for WSGI applications but it does have a request and response object with an API, so here are my thoughts on it:
- URL arguments are called request.GET like in WebOb, but files and form data were split up into request.POST and request.FILES.
- The request object is unicode only and the encoding can be set dynamically.
- Problem is, they don't work with non-Django WSGI applications.
Chances on a common Request Object?
WebOb and Werkzeug will stick around, and the chances that Django starts depending on external libraries for the Request object are very, very low. However it could be possible to share the implementation of the HTTP parsers etc.
To be honest, I would not want to break Werkzeug into two libraries for utilities and request/response objects and parsers because of the current packaging situation. A lot of small stuff I work on works perfectly fine with nothing but what Werkzeug provides which is pretty handy. So yes, it's selfish to not break it up, but that's how I feel about the situation currently.
One thing I'd be interested in seeing is an overview of what the API for doing various things is in each of the libs.
— Alex Gaynor on Wednesday, August 5, 2009 14:58 #
@1: provide an example in Django, I'll port it to Werkzeug and WebOb :)
— Armin Ronacher on Wednesday, August 5, 2009 15:31 #
Hi Armin,
What are your reasons for comparing Django to Werkzeug and WebOb? I think CherryPy would be a better comparison because it is also used as a "base" framework to create other frameworks like TurboGears.
Best
— Akira on Wednesday, August 5, 2009 15:34 #
@3: Maybe. But CherryPy's request object is as far as I can see even harder to use separately than the Django one.
The reason why I've chosen the Django one for comparison is that it's one that many people (are forced to) use.
— Armin Ronacher on Wednesday, August 5, 2009 15:41 #
I have compiled a list of differences between WebOb and different libraries here: pythonpaste.org/webob/differences.html -- but it doesn't include much information about features WebOb has that the others don't, only where they overlap in an incompatible way. Also the Werkzeug comparison in particular is out of date.
I also played around with what it would take to make WebOb compatible with Django requests; it's not tested, but it's an example: svn.pythonpaste.org/Paste/WebOb/branches/ianb-decorator-experiment/webob/django.py
— Ian Bicking on Wednesday, August 5, 2009 16:05 #
Now, as far as the post itself:
— Ian Bicking on Wednesday, August 5, 2009 16:29 #
This reminds me that I need to take a look at one of our SoC projects that's working on more interesting interoperability with vanilla WSGI applications; in theory that'll help the compatibility situation from Django somewhat.
— James Bennett on Wednesday, August 5, 2009 17:08 #
@Ian: I admit that WebOb does look awesome. Especially the fake CGI body and everything. But the overhead there is also so incredibly high that I'm not sure if it's really worth the hassle.
Please don't get me wrong, I like descriptor magic and everything, but for me it sounds like trying to get too much into WSGI.
I think these differences in the way things are approached are a reason multiple request/response objects will stick around, even if Werkzeug would suddenly go away.
The MultiDicts in WebOb are indeed great, I just wish they were dict subclasses or at least newstyle classes. It itches me that I see <type 'instance'> every time I work with WebOb.
What I would like to do is to refactor the lower-level parsing tools in Werkzeug and make them independent of Werkzeug itself so that it would be possible to use them from both WebOb, Werkzeug and maybe Django as well.
— Armin Ronacher on Wednesday, August 5, 2009 20:05 #
I think the root of the problem is the WSGI specification: it's too low level. It would make sense to extend the specification and set a standard on how arguments (both POST and GET) are exposed in the environ dictionary (the same can be said about uploaded files). Currently both of these things are exposed, but their form is very low level and it would make sense to make it more high level. Solving this would solve a lot of the need for having a "request" object (and if one insisted, the request object would generally be a high-level API to the environ dictionary).
Generally though, I think the WSGI specification is great, but again too low level at some points. It would have been great if the WSGI specification also specified a more high level handling of response (so it isn't just a function call like it is now, but a dictionary) - i.e. takes a dictionary in and returns a dictionary (and where the different fields of the in and out dictionary are specified).
— amix on Wednesday, August 5, 2009 21:56 #
I wouldn't pay too much attention to any proposals I have around WSGI. I don't write web applications so have no authority in the area. The reason I propose anything is to at least get people back on to looking at WSGI for Python 3.0 which just never seems to resolve itself. You think you are on the right track and you might get a resolution, but then you get side comments from people saying they aren't happy with it, but they leave it at that and don't elaborate on what they have a problem with. For example, the grumbles from some that they would prefer to see bytes used for WSGI environment values instead of proposed latin-1 but then don't go on to help explore the option properly to see whether it is actually practical. Similarly, the contention between whether response header values should be bytes or latin-1 or allow both with commensurate complexity in WSGI middleware to deal with case if both can be returned. Thus, everything is left hanging. It is quite frustrating and I have held up the release of mod_wsgi 3.0 for a number of months now hoping that the WSGI and Python 3.0 issue would be finalised. I am not going to pursue this for much longer. I will try and summarise the issues around use of latin-1 and bytes on Python WEB-SIG one more time and if that dies yet again I will officially state that I give up and leave it up to others. I will at the same time just disable all the Python 3.0 code in mod_wsgi as there seems little point in allowing people to use it if it is all going to have to be changed when any final decision is made.
As for extending WSGI as suggested by one. Please don't do that. As I understand it, WSGI was originally intended purely to be used at the interface between web application and underlying web server. It wasn't really intended to be used at higher levels, but once people saw what might be done with WSGI middleware that happened and now we are stuck with it at higher levels even though a separate higher level object interface may be more appropriate. If anything, I would like to see WSGI disappear from within the stack and a separate more appropriate interface adopted for that use.
— Graham Dumpleton on Wednesday, August 5, 2009 23:56 #
Python datetimes don't seem to handle timezones well and have a number of issues. This article
www.enricozini.org/2009/debian/using-python-datetime/
suggests using datetime in UTC only. Using UTC datetimes is definitely a win for Werkzeug.
— Ian Lewis on Thursday, August 6, 2009 9:26 #
yes, use utc only.
— brisbane on Saturday, August 8, 2009 21:41 #
Yes, the root of the problem is the WSGI specification. It was purposely made spartan due to the situation in 2005: interoperability between frameworks was as common as purple cheese, and people feared frameworks would refuse to adopt WSGI if it made too many high-level assumptions.
Now the situation is different. WSGI succeeded in its original goal of plugging any webapp into any webserver. Along the way it was discovered that an interaction protocol helps in a lot of other ways too, and makes generic middleware and application components possible.
So, WSGI purists don't like every application component under the sun being called "middleware". And middleware/application writers find the raw WSGI interface extremely cumbersome. That's why WebOb is so popular.
The vision of WebOb is important: an independent Request/Response pair that frameworks can build upon and middleware can use. I'd like to see a webob-based stack replace the WSGI stack for most cases. The Pypes project is working on something along those lines.
WebOb's advantage over the others is it was designed to be independent; it's not tied to any framework baggage. If some of WebOb's features are undesirable, perhaps the uncontroversial parts can be extracted to a base-webob package. And the only-for-Paste parts could be evaluated for deprecation.
We could also move to a new Requests/Response package inspired by webob. But that would raise havoc in the Pylons world, which has already changed Request/Response objects recently and is trying to settle down for a 1.0 release.
The idea that the request/environ should never be modified is bogus. The purpose of the environ is to pass information downstream, and that could be information originating in middleware. And why shouldn't a higher-level component be allowed to modify the request; e.g., to put it into a more standard format. Apache does that, and that's one reason why many people have an Apache front-end. It's not like these components are nefarious men-in-the-middle. They are components that the application has consented to use. (Insert Mark Ramm joke about consensual liaisons here.)
— Mike Orr on Friday, August 21, 2009 18:00 #
Orr: "We could also move to a new Requests/Response package inspired by webob. But that would raise havoc in the Pylons world, which has already changed Request/Response objects recently and is trying to settle down for a 1.0 release."
This seems to be a very desirable goal, perhaps even the most desirable among those available. Can someone get Ben Bangert to weigh in on behalf of Pylons? Perhaps "havoc" is too strong a word, if the desirability of this outcome can be communicated to the right folks and consensus reached quickly.
— Patrick on Monday, August 31, 2009 20:11 #