The 1000% Speedup, or, the stdlib sucks
I hate the stdlib. There! I said it. Why do I hate it? Cookie, cgi, urllib, n'uff said. Today I can add another module to the “I hate you” list.
I noticed that Werkzeug's SharedDataMiddleware was delivering just 50 requests a second. I tried a few things but nothing changed, until I removed the call to mimetypes.guess_type. Suddenly I got 500 requests a second instead of the previous 50.
How badly implemented can that function be, I was asking myself. So I had a look:
def guess_type(url, strict=True):
    init()
    return guess_type(url, strict)
WTF? And then it hit me. init() was loading the mimetype database and monkey patching the module so that guess_type() was replaced by a new function. Actually a method of the mimetype database.
And I was importing guess_type as follows:
from mimetypes import guess_type
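My module still held a reference to the original function, not the replacement, so on every call the database was reloaded. GNAAAaa. So check whether you're making the same mistake. If so, change it to the following import and call mimetypes.guess_type() through the module: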
import mimetypes
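To make the difference concrete, here is a minimal sketch that rebuilds the self-replacing pattern on a throwaway module. Everything in it (fake_mimetypes, load_count, the two-entry database) is made up for illustration and is not the real mimetypes code; it only assumes the behaviour described above.

```python
import types


def make_lazy_module():
    """Build a throwaway module that mimics the self-replacing pattern."""
    mod = types.ModuleType("fake_mimetypes")  # hypothetical stand-in module
    mod.load_count = 0  # counts how often the "database" gets loaded

    def init():
        # Stand-in for the expensive database load.
        mod.load_count += 1
        db = {".txt": "text/plain", ".html": "text/html"}

        def guess_type(url, strict=True):
            dot = url.rfind(".")
            ext = url[dot:] if dot != -1 else ""
            return db.get(ext), None

        # Monkey patch: rebind the module attribute to the fast function.
        mod.guess_type = guess_type

    def guess_type(url, strict=True):
        init()
        return mod.guess_type(url, strict)

    mod.guess_type = guess_type
    return mod


def loads_with_from_import(calls):
    mod = make_lazy_module()
    # This is what "from mod import guess_type" captures: the slow original.
    guess_type = mod.guess_type
    for _ in range(calls):
        guess_type("index.html")
    return mod.load_count


def loads_with_module_import(calls):
    mod = make_lazy_module()
    for _ in range(calls):
        # The attribute is looked up on every call, so the patch is seen.
        mod.guess_type("index.html")
    return mod.load_count
```

With five calls each, the first helper reports five loads and the second reports one, which is the kind of gap the benchmark numbers above hint at.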
Ticket filed in the Python bug tracker. -.-
Wow, for someone who hates the stdlib so much, I only see 3 bugs filed by you, and 10 you're +noisy on. I know you found a bug (and a bad one at that). But coming out and declaring the stdlib crap because of issues you have with "Cookie, cgi, and urllib" is annoying, especially as I don't see patches from you.
Maybe I missed them in the tracker.
— Jesse on Monday, March 2, 2009 0:42 #
Holy f'in lameness, Batman!
— Philip Jenvey on Monday, March 2, 2009 1:00 #
Next step: write your own request header parser. The stdlib one does all sorts of email- and other general-MIME-parsing you don't need for HTTP; steal CherryPy's if you want. You'll see a big speedup.
— Robert Brewer on Monday, March 2, 2009 1:25 #
Why do you hate urllib?
— Victor Noagbodji on Monday, March 2, 2009 2:12 #
It's interesting that the module actually started out like the way your patch proposes to change it. It didn't become stupid until r22194.
— Benjamin Peterson on Monday, March 2, 2009 3:26 #
@3: I just recently kicked out the standard library's multipart parser (aka cgi.FieldStorage). The total gain was a much nicer interface (no more magic), being able to decide to stream to the memory or filesystem based on the size of the upload, WSGI compliance (does not require a sized readline etc.)
So far I haven't regretted anything by ditching more and more of the standard library.
@4: up to 2.6 there was no way to specify a timeout for requests. There just was the defaulttimeout on the sockets. Of course modifying that is ugly, not thread safe and deeply dangerous if there is concurrency involved. And the implementation of that module is so inflexible that you couldn't add timeout support by subclassing.
I ended up rewriting the urllib module for the latest version of Zine with a much smaller interface.
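A rough sketch of why the global default is the problem: socket.setdefaulttimeout() changes the starting timeout of every socket created afterwards, anywhere in the process, while settimeout() only touches one socket. The helper names are made up for this example, and no network connection is opened.

```python
import socket


def timeout_of_new_socket():
    """Return the timeout a freshly created socket starts with."""
    sock = socket.socket()
    try:
        return sock.gettimeout()
    finally:
        sock.close()


def with_global_default(seconds):
    """Temporarily change the process-wide default; every new socket sees it."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        return timeout_of_new_socket()
    finally:
        # Restore the old value; other threads may have created sockets meanwhile.
        socket.setdefaulttimeout(previous)


def with_local_timeout(seconds):
    """Set a timeout on one socket only and leave the global default alone."""
    sock = socket.socket()
    try:
        sock.settimeout(seconds)
        return sock.gettimeout(), socket.getdefaulttimeout()
    finally:
        sock.close()
```

From 2.6 on, urlopen accepts a timeout argument, which gives the per-request behaviour without touching the global default.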
@1: The problem is that many modules are in an incredibly bad state where the only possible solution would be rewriting them. And a rewrite breaks existing code that works around bugs in them.
One piece of code in Werkzeug that begs to be rewritten without relying on standard library bugs is the cookie support: dev.pocoo.org/projects/werkzeug/browser/werkzeug/_internal.py?rev=f1d3a357cb38#L240
That one even goes as far as overriding a mangled method.
So why don't I file bugs? Here you have your reason. Besides, these bugs are discovered in real-world applications, which need the fix much faster than Python is released.
Which is why I'm a strong supporter of the idea that Python should come with just one battery, consisting of collections, abstract base classes etc., with everything else downloadable from PyPI.
@5: Wow. That's unexpected.
— Armin Ronacher on Monday, March 2, 2009 7:25 #
hmm.. i didn't even know there is such a difference between "from x import y" and "import x".
— dafire on Monday, March 2, 2009 10:18 #
@7: There shouldn't be. Normally there is no difference. However, like everything in Python, you can change that of course.
— Armin Ronacher on Monday, March 2, 2009 11:48 #
Sorry Armin, I think that:
1> You're smart enough to know the stdlib isn't going to get thrown out because you don't like the code, given that it helps plenty of people get things done.
2> As for the fix time, again - while you need a fix now, why not push that fix into python-core rather than not? Python is your language as much as it is mine, or any other person's - help fix what's broken, or work on enhancements.
3> You're smart, we all know this, and so asking for you to contribute to the language and improvement of the standard lib isn't silly. Instead, you're sitting there calling it stupid, which isn't very becoming.
So, given the bug you opened was already fixed, why don't you file some more since you know that the stdlib is totally broken?
— Jesse on Monday, March 2, 2009 12:55 #
@9: I know. It could have been changed from 2.x to 3.x but that didn't happen.
Also about a fix: The fix from the blog post in question can't be applied "to python". It has to be applied to the code using mimetypes. I did however provide a patch on the bug tracker and it was quickly committed.
I do contribute to the language as best as I can. But the problem with fixing those bugs was outlined in an earlier reply: Fixing the bugs would often break existing code working around those problems.
— Armin Ronacher on Monday, March 2, 2009 13:21 #
dev.pocoo.org/projects/zine/browser/zine/utils/net.py for the urllib replacement, perhaps you can do a quick post about the why, and what it fixes?
— Ali on Monday, March 2, 2009 13:57 #
@6 cgi.fieldstorage is evil!
— Jason Baker on Monday, March 2, 2009 14:23 #
@Armin I realize fixing some of these issues may break existing apps, but that's not an impossible slope. Deprecate in one rev, fix in another. I personally feel if it's broken, or grossly incorrect, current apps be damned, we need to fix it.
All I'm asking for is patches and fixes for the stdlib, which isn't going away. Your code is pretty darned good, and I think what you've made is great - I just want to see work towards fixing things, and less sensationalist posts that can help all of us realize what's broken, why, and how to fix it.
It's better to be constructive and do what we can than throwing our hands up and saying "this sucks let's go shopping".
— Jesse Noller on Monday, March 2, 2009 15:50 #
@13: If something is deeply flawed / problematic I'll file a bug. Generally speaking I prefer to not use the standard library and build on third party libraries which adapt a lot quicker.
I'll blog about that issue so that I can explain myself a bit better.
— Armin Ronacher on Monday, March 2, 2009 17:53 #
Thanks for finding this, Armin. I checked the TG2 source code for this, but it appears someone else had already fixed it. This mimetype behavior is totally lame.
— Chris Perkins on Friday, March 6, 2009 19:37 #
@15, hey Chris, I guess I'm the third guy to check the TG2 code after reading this :)
Thanks Armin for the fix.
Another REALLY annoying thing about the stdlib's mimetypes module is that it isn't platform independent. I don't have the line at hand, but its db is built from some "known" files, which may yield different results even on different versions of *NIX (we had a lovely time getting TG2 stuff to work on Linux because it was coded on Mac OS, and then had a hard time getting it fixed for Arch Linux without killing the first two).
— Jorge Vargas on Wednesday, March 11, 2009 7:16 #
We've found a few bad places in the stdlib too.
For example, BaseHTTPServer opens a socket in __init__(), so it's nontrivial to create unit tests that avoid the network.
— Sergey Shepelev on Monday, March 16, 2009 14:00 #
Checked the latest version, 2.7, and found it quite improved.
— Wajahat on Sunday, March 7, 2010 3:12 #