Archive for September, 2007

Pushing Apache Performance

September 30th, 2007

You might know that pocoo.org has had an interesting availability, far below 99%. The problem is that all pocoo services (and there are quite a lot of them) are hosted on a XEN instance with less than 400MB RAM. With two different database servers (MySQL and postgres), hgweb, 7 trac instances, one pastebin, one dynamic project webpage, a blog and a bunch of static webpages, apache loved to eat more memory than was available, and so the machine was swapping.

Although it would certainly be possible to get a better server or allocate more memory to the XEN instance, it’s also possible to reconfigure things so that you can keep using what you have. Here is what we did one week ago. Since then we have had a load of around 0.1 - 1.5 and a memory usage of around 200MB. Before that our load was between 2 and 50, depending on the time of the last apache restart ;-)

Step one: switch to worker. Apache prefork is certainly the worst thing you can do if you have little memory in your server. The reason for this is that prefork forks: every connection is handled by its own process. While I agree that threads are problematic in many ways (especially with PHP) they are the best way to keep your memory usage low. But worker means that you cannot use mod_php4/5. Because TextPress is still not ready I needed PHP support, so I’ve chosen to host it as CGI, more on that later.

The worker MPM has quite a lot of configuration parameters, and to be fair, I have no idea how they work in detail. The basics are covered in the apache documentation. The configuration values that worked best for us were these:


ServerLimit            5
StartServers           1
MaxClients            30
MinSpareThreads        5
MaxSpareThreads       30
ThreadsPerChild       15
MaxRequestsPerChild  500

With these settings apache starts one server process and, although ServerLimit would allow up to five, never runs more than two, because MaxClients (30) divided by ThreadsPerChild (15) is two. Each process runs 15 threads. After 500 requests a child process is restarted. The latter is a very good idea because nearly every application leaks memory.
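The process count follows from simple arithmetic, which can be sketched in a few lines of Python. This is only an illustration of the calculation, not anything apache ships; the function name is made up, and it assumes MaxClients is a multiple of ThreadsPerChild, as it is in the config above.

```python
def max_worker_processes(server_limit, max_clients, threads_per_child):
    """Upper bound on worker MPM child processes (hypothetical helper).

    Assumes max_clients is a multiple of threads_per_child.
    """
    if max_clients % threads_per_child:
        raise ValueError('max_clients must be a multiple of threads_per_child')
    # Each child provides threads_per_child threads; ServerLimit caps the result.
    return min(server_limit, max_clients // threads_per_child)


processes = max_worker_processes(server_limit=5, max_clients=30, threads_per_child=15)
print(processes)       # 2
print(processes * 15)  # 30 threads in total
```

So with the values above, raising ServerLimit alone changes nothing; MaxClients is what limits the number of processes.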

Now because I lost my blog a second after installing the worker module, that was the next thing I changed. From version 5 onwards PHP is usually compiled as a cgi/fastcgi binary, so it’s ready to rumble if you have mod_fastcgi. We do, but I have disabled it together with all unused modules. Although FastCGI is a lot faster than CGI and a really good idea if you expect many hits, I’ve chosen CGI for the blog so as not to keep a persistent PHP interpreter in memory. Sadly this blog has the fewest hits of all our sites, so until more visitors come here CGI is the way to go. If you are too lazy to look up the configuration, here is ours:

AddType application/x-httpd-php .php
Action application/x-httpd-php /php5-cgi

It’s that simple. If you want to switch to fastcgi, all you have to do is to enable the FastCGI module and enable it for /php5-cgi.

But PHP will disappear from that server sooner or later anyway, so let’s continue with speeding up the snakes. There the answer is simple: if you have an apache, use mod_wsgi. The rest depends on your applications. If you have different applications that do not interfere with each other and are thread-safe, put them into the same process group and enable threading. Otherwise put them into different groups and use threading only if they are thread-safe. And let mod_wsgi kill processes after a thousand requests. The mod_wsgi wiki is full of examples, you will find everything there.

The last step is making apache as such faster. Remove all .htaccess files and put their contents into the apache configs. Then set AllowOverride to None so that apache doesn’t look for them any more. Lower your connection Timeout to say 150 seconds, MaxKeepAliveRequests to something like 100 and KeepAliveTimeout to 3. That might be annoying for people with slow network connections but it gives apache the opportunity to mark worker threads as available sooner. Enable log rotation, log less and do not log hostnames (check that HostnameLookups is set to Off).

That being said the only thing that is left is playing around with some of the configuration values of your database servers and other running processes that might consume memory.

Here is the result as a fancy graph (image: swap status).

On Sandboxing Genshi

September 26th, 2007

One of the big advantages of django templates is that they are sandboxed. And the django sandbox is pretty secure because templates provide nearly no possibility to screw things up, especially because there is no way to put logic into django templates. The only way to add a security problem to django templates is writing broken template tags or stuff like that. Now Jinja has real expressions and with them the possibility to screw things up. But the Jinja core itself restricts access to python internals, and as long as you keep your objects safe nobody will be able to execute arbitrary python code in a template or access the filesystem.

Some time ago Christopher Lenz blogged about logic in templates, and one of the things he stated there was the following sentence about Genshi:

But even though I personally prefer working with a template language that allows me to use a real programming language (Python, Ruby, etc), there is definitely room for template languages that put severe restrictions on what you can do with them. An obvious example is that you’re running a site such as Typepad, and want to allow users to manage their own custom templates. As things currently stand, you wouldn’t be able to do that using Genshi.

Just because Genshi doesn’t support it yet doesn’t mean it will stay like that. In fact the great architecture of Genshi makes sandboxing Genshi quite simple. To see how secure we can get Genshi I started a Genshi branch today. Maybe you can use Genshi soon for user provided templates :-)

On another note, I got quite a few mails that my blog is/was broken. As you might know we have had some problems with the load of our server the last months. To resolve that problem I tweaked the apache the last two days and while doing that various services behaved strangely or caused problems. By now everything should work again.

Where is Dawn of Ubuntu?

September 21st, 2007

And one more time I have to find my wallpaper on an ubuntu screenshot with a message like the one below:

I’m running Ubuntu 7.04 (Feisty Fawn) with Compiz Fusion version 0.5.5 at the time of this screenshot. Wallpaper is the Dawn of Ubuntu wallpaper that for some reason wasn’t included in Feisty, but if you can find it on Flickr or Google Images if you like it.

So why is that wallpaper not in ubuntu any more? The reason is a misunderstanding and my own personal lack of license knowledge. I released it with a limited CC license which was not compatible with the ubuntu license, it’s just that nobody noticed until (edgy? feisty?) no idea. Anyhow. If someone is missing that wallpaper, you can find it in the art section. And if someone wants to have it in ubuntu again, file a bug in launchpad. I will then do my best to resolve that licensing issue.

Great Cover Versions

AK blogged about it some time ago, now I have to do it too. I bought the new Apocalyptica album Worlds Collide and one of the tracks is called “Helden”, the singer for that particular track is Till Lindemann of Rammstein. Believe it or not, that’s a German cover of David Bowie’s “Heroes”. And darn, I hate Bowie, I hate “Heroes”, but the Apocalyptica cover is awesome. And one of the best covers I’ve ever heard.

Waiting in Line for…

…for what? Being the first one to discover the hidden bugs and limitations before anyone else does? There was only one product in my life I bought on launch day. And that was a Nintendo DS which I bought on the same day as two of my best friends. And now I think even that was a bad decision because until the DS Lite came out I only owned two games. If I had waited until then I would have had the new revision and saved some money too.

Why do I say that? Because Bill Maher got it exactly right some days ago, even if people disagree. There is absolutely no need to buy things on the first day. Wait for the other fools that do so. Let them find out if said product really does what the company says in the advertisements. Or try it a week after launch in a store nearby, which is the even better idea. And if you wait for longer than half a year you can then find out if you really, really wanted it. If you don’t even think about it any more there was no need to buy it. That saves money and spares nerves. And in the long run you will be happier too, I promise.

Using CleverCSS in Django

September 17th, 2007

A few minutes ago I pushed CleverCSS to the cheeseshop. So far there is no framework support, once again you have to hack up the bridge yourself, but in this post I’ll show a simple way to get CleverCSS running in django and simplify your CSS files.

The preferred way to use CleverCSS is compiling the clevercss files into css ahead of time, not doing that dynamically. But because recompiling your files by hand all the time is annoying during development, I suggest using a view that serves those files dynamically while you develop. In production you just overlay the view URL with an apache directive etc. that points to the folder containing the static css files.

Here is the view function, you can store it directly in your urls.py because it’s that small. Otherwise move it into an application responsible for static stuff:

import clevercss
from os import path
from django.http import HttpResponse, Http404

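def serve_ccss_file(request, filename):
    # The .ccss sources live in the "styles" folder one level above this file's directory.
    fn = path.join(path.dirname(__file__), path.pardir, 'styles', filename + '.ccss')
    if not path.exists(fn):
        raise Http404()
    f = open(fn)
    try:
        # Convert the CleverCSS source to plain CSS on every request.
        css = clevercss.convert(f.read().decode('utf-8'))
        return HttpResponse(css, mimetype='text/css')
    finally:
        f.close()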

Now you only have to create a URL rule for that:

from django.conf.urls.defaults import *

urlpatterns = patterns('',
    (r'^styles/([a-zA-Z_]+)\.css$', serve_ccss_file)
)

All your clevercss files should now go to yourproject/styles, as filename.ccss. If you enter production mode you just open your shell, go to that directory and execute clevercss.py *. (That requires that you have symlinked the clevercss.py file into your PATH.)

To test that, just drop a test.ccss file with the following contents in your styles folder and open http://localhost:8080/styles/test.css:

link_color = red

body:
  background-color: white.darken(10)
  font-family: 'Lucida Sans', Verdana, sans-serif

a:
  color: $link_color
  &:hover:
    color: $link_color.brighten(10)
  &:visited:
    color: $link_color.darken(20)
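
If you would rather do the production compile step from Python than from the shell, a small script along these lines could work. It is only a sketch: the function name compile_styles is made up, it is written for a current Python 3 rather than the Python of the view above, and the converter is passed in as an argument (in practice that would be clevercss.convert) so the sketch itself does not depend on the module being installed.

```python
from pathlib import Path


def compile_styles(styles_dir, convert):
    """Write a .css file next to every .ccss file in styles_dir.

    `convert` is any callable that turns CleverCSS source text into CSS
    text, for example clevercss.convert (an assumption of this sketch).
    Returns the list of files written.
    """
    written = []
    for source in sorted(Path(styles_dir).glob('*.ccss')):
        target = source.with_suffix('.css')
        css = convert(source.read_text(encoding='utf-8'))
        target.write_text(css, encoding='utf-8')
        written.append(target)
    return written
```

Calling compile_styles('styles', clevercss.convert) from a deploy script would then leave the static css files in the folder your apache directive points to.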

Steal, and steal and keep on stealing

I must admit that I’m not the biggest Nine Inch Nails fan, but some of their songs are really good. A friend of mine told me that Nine Inch Nails are now very angry about us Austrians because of this year’s Frequency. I guess the crowd was not necessarily to blame (well, it was somehow), but who in god’s name puts NIN right between Beatsteaks and Die Ärzte? Well, we probably will never find out, but that was a very bad idea. Totally different types of music.

But the main reason why I write this post, and why the headline looks that flashy, is this video: NIN live in Sydney. And the words at the beginning are (no joke):

I called them [my record label] out for being greedy fucking assholes. I didn’t get a chance to check, has the price come down at all? I see a no, a no… Has anyone seen the price come down? Okay, well, you know what that means - Steal it!. Steal away. Steal and steal and steal some more and give it to all your friends and keep on stealing. Because one way or another these motherfuckers will get it through their head that they’re ripping people off and that that’s not right.

Trent Reznor said something similar a year ago, afair in a magazine, about the ridiculous prices down in Australia, and according to him that’s because his record label thinks: “that there are great fans there, which are willed to pay more money”.

Now his message doesn’t mean that you should start stealing all kinds of music, all around the world. As far as I understand it, Trent is very, very angry about his recording company and their insane policies. The problem is that nowadays the labels make the money, not so much the artists. And then there are some creepy organizations that do stuff like suing moms for listening to illegally downloaded “Gangsta Rap” and want to drag their underage daughters into court to testify against them.

The music business is, without a doubt, completely fucked up. And I don’t think that will get any better, rather worse. Now that we all start buying compressed music online they don’t have to spend any money on printing or booklets. And the bad sound quality (yes, I *can* hear the difference) and the missing booklets are the reason why I still go to my local music store and buy CDs.

Maybe some more artists, especially independent ones, should stand up and start publishing their music on their own. Hopefully something in that broken system improves in the next few years… I really hope so.

Speaking of music, listen to some more Pure Reason Revolution.

Working hard on a bad Image huh?

Oh apple, what are you doing. Yes, I’m quite mad about you recently. In addition to the funny problems I had last week I now have two more. The scroll “wheel” of my Mighty Mouse broke and my TFT thinks contrast is for sissies. The contrast problem is btw fixable by plugging in an external monitor and rotating it clockwise and counter clockwise from the display control panel.

But what really, really makes me mad is how closed their stack is. Although their operating system is based on an open source kernel, nearly all of the other stuff is closed like nothing else. I was a happy iPod user the last years but this one will be the last one I buy. Now that they have a checksum in that stupid binary crap library they call iTunes DB, which makes it impossible to access it with other tools, I’m just not interested in it any more…

CleverCSS

I was working on TextPress the last days (I got motivated by the fact that WordPress has got security problems once more) and had to notice that CSS could use some variables. I wanted to write a simple preprocessor for css that inserts variables but ended up writing a small parser that accepts indented CSS code with inline expressions.

Basically what it does is convert this:

// Single Line Comment
foo = 4px
font_family = 'Verdana', sans-serif
base_size = 0.9em

/*
   this is a multiline comment
 */
body:
    padding: $foo * 4
    font ->
        family: $font_family
        size: $base_size + 0.2em

a.foo:
    background-image: url(foo.png)
    span:
        display: none

Into this:

body {
  padding: 16px;
  font-family: Verdana, sans-serif;
  font-size: 1.1em;
}

a.foo {
  background-image: url(foo.png);
}

a.foo span {
  display: none;
}

The advantage might not be visible in that small example but consider complex layouts etc. One thing I would love to add is some support for layout extending. Say you have a base layout and you can say @extends('layout.ccss') and get all the layout information from the layout file. But I don’t know how useful this is.

Right now the conversion process is quite slow, but I wouldn’t generate such stylesheets on request anyway. It’s a much better idea to generate them from a script. During development one could still generate them automatically.

The module is available in the sandbox hg repo: clevercss.py. Just call clevercss.convert and pass it the CleverCSS markup.

Things that don’t work yet are unit conversions. For example you cannot do “1cm + 11mm”.
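
To make clear what such a unit conversion would involve, here is a minimal Python sketch of adding two lengths with different units. It is not how CleverCSS works or will work; the names TO_MM, parse_length and add_lengths are made up for this illustration, and only mm, cm and in are handled.

```python
import re
from fractions import Fraction

# Size of each supported unit in millimetres (exact values).
TO_MM = {'mm': Fraction(1), 'cm': Fraction(10), 'in': Fraction(254, 10)}


def parse_length(text):
    """Split a string like '11mm' into an exact number and a unit."""
    match = re.fullmatch(r'\s*(\d+(?:\.\d+)?)\s*(mm|cm|in)\s*', text)
    if match is None:
        raise ValueError('not a supported length: %r' % text)
    return Fraction(match.group(1)), match.group(2)


def add_lengths(left, right):
    """Add two lengths; the result uses the unit of the left operand."""
    left_value, left_unit = parse_length(left)
    right_value, right_unit = parse_length(right)
    # Convert both operands to millimetres, add, then convert back.
    total_mm = left_value * TO_MM[left_unit] + right_value * TO_MM[right_unit]
    result = total_mm / TO_MM[left_unit]
    return '%g%s' % (float(result), left_unit)


print(add_lengths('1cm', '11mm'))  # 2.1cm
```

Which unit the result should use is a design choice; taking the unit of the left operand is just the simplest rule for a sketch.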

Multi Trac / Django Hosting with mod_wsgi

September 12th, 2007

As you might know we switched the pocoo trac to mercurial and mod_wsgi and split it up in the same go. The new structure can be found at dev.pocoo.org. What you cannot see there is how all that is implemented. And I tell you: it’s dead simple.

Basically we use mod_wsgi for hosting the tracs. There are many reasons for that but the most important one is that you can host multiple trac instances without much configuration: the configuration binds a wsgi application to a URL match rule. One important thing is the maximum-requests setting. To understand this parameter you have to know that mod_wsgi does not only keep a pool of running python interpreters, but also keeps your application with all its data in memory. That’s a big difference to mod_php, where your application is loaded on request and removed from memory after the request. So with mod_php you basically cannot create memory leaks that outlive a request, which you can do in python. If your application leaks memory (and trac tends to do so) you can tell mod_wsgi to restart a daemon process, and with it the python interpreter, after 500 requests for example. The right value of course depends on your trac version, the number of plugins etc., and especially on how much memory your application needs. Here is the Apache config:

<VirtualHost *:80>
    ServerName dev.example.org
    RewriteEngine On

    WSGIDaemonProcess tracs threads=10 maximum-requests=500

    RewriteCond %{REQUEST_URI} ^/([a-z_]+)
    RewriteRule . - [E=TRAC_ID:%1]
    WSGIScriptAliasMatch ^/([a-z_]+) /var/trac/trac.wsgi

    <Directory /var/trac>
        WSGIProcessGroup tracs
        WSGIApplicationGroup %{GLOBAL}
        Order deny,allow
        Allow from all
    </Directory>

    <LocationMatch /([a-z_]+)/login>
        AuthType "Basic"
        AuthName "Trac Instances Login"
        AuthUserFile /var/trac/users
        Require valid-user
    </LocationMatch>
</VirtualHost>

Then we need a trac.wsgi file which is basically just a minimal WSGI application that dispatches on our key. Say we have our trac instances in /var/trac/instances, every trac in its own folder. Then we can use this code:

#!/usr/bin/python
from os import path
from trac.web.main import dispatch_request

def application(environ, start_response):
    # TRAC_ID is set by the RewriteRule in the apache config above.
    trac_path = path.join('/var/trac/instances', environ['TRAC_ID'])
    if path.exists(trac_path):
        # Tell trac which environment to serve, then hand over the request.
        environ['trac.env_path'] = trac_path
        return dispatch_request(environ, start_response)
    start_response('404 NOT FOUND', [('Content-Type', 'text/plain')])
    return ['Not Found']

You can of course modify that not found message, maybe render a fancy HTML page or just redirect to the index of that domain or whatever. The important thing is that you set the path to the trac instance before calling the dispatch_request function. In theory you can do that from the apache config too, but in that situation you cannot check if the trac really exists.
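
If you want to try the dispatch logic without apache or trac installed, something like the following sketch could help. It follows the same pattern as the code above but takes the instances folder and the dispatch function as arguments; make_dispatcher and fake_dispatch are made-up names, and the sketch is written for a current Python 3 (hence the bytes in the response body).

```python
import os


def make_dispatcher(instances_dir, dispatch_request):
    """Build a WSGI app that picks an instance folder based on TRAC_ID."""
    def application(environ, start_response):
        trac_id = environ.get('TRAC_ID', '')
        env_path = os.path.join(instances_dir, trac_id)
        if trac_id and os.path.isdir(env_path):
            # Set the instance path before handing the request over.
            environ['trac.env_path'] = env_path
            return dispatch_request(environ, start_response)
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return [b'Not Found']
    return application


def fake_dispatch(environ, start_response):
    """Stand-in for trac's dispatch_request: echoes the chosen path."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [environ['trac.env_path'].encode('utf-8')]
```

Calling the returned application with a plain dict such as {'TRAC_ID': 'foo'} and a dummy start_response is enough to see which branch is taken.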

And now about the django hosting part. Basically you can do the same for django. Django has an environment key called DJANGO_SETTINGS_MODULE which controls what settings module django will import. Unfortunately the django core keeps these settings globally per interpreter, so you cannot run two different django powered applications in the same python interpreter. This however is not that much of an issue with mod_wsgi, because you can tell mod_wsgi to not share the interpreter. (In the trac config above we shared the interpreter to save some memory.)

Your config could look like this:

<VirtualHost *:80>
    ServerName www.example.org
    WSGIDaemonProcess django_app1 threads=10 maximum-requests=5000
    WSGIScriptAlias /app1 /var/www/django_app1.wsgi
    WSGIDaemonProcess django_app2 threads=10 maximum-requests=5000
    WSGIScriptAlias /app2 /var/www/django_app2.wsgi

    <Location /app1>
        WSGIProcessGroup django_app1
        WSGIApplicationGroup %{GLOBAL}
    </Location>
    <Location /app2>
        WSGIProcessGroup django_app2
        WSGIApplicationGroup %{GLOBAL}
    </Location>
</VirtualHost>

The actual “django_appX.wsgi” file is very, very simple. It just adds the folder to the python path and instantiates the django wsgi app:

#!/usr/bin/python
import sys, os
sys.path.insert(0, '/path/to/django_appX')
os.environ['DJANGO_SETTINGS_MODULE'] = 'django_appX.settings'

from django.core.handlers.wsgi import WSGIHandler
application = WSGIHandler()

Hope I could help a little bit; if you have questions about our server setup just send me a mail or write a comment. One last note: we did encounter some problems with mod_wsgi two months ago, but at the moment everything is working well, a lot better than any other server setup we used. You can even run python applications as another user, which basically replaces fastcgi + suexec.
And btw, the support we got from Graham is really, really good :)

Update: removed WSGIPassAuthorization like Graham suggested in the comments below.
