Convert a Request.write() into a WSGI yield
I have been trying to do this for a long time now, using Python generators with yield and send, threads, and much more, but I never got anything that was both easy to understand and actually worked. The problem occurs if you have an old Python web application with some sort of request object whose write method writes directly to the output stream of the server interface (sys.stdout or some sort of FastCGI/mod_python output stream object), and you want to convert the application to WSGI. Take the following piece of code from an imaginary legacy application:
def old_application(request):
    from time import sleep
    request.header('Content-Type: text/html')
    for x in xrange(10):
        request.write(str(x) + ' ')
    request.flush()
    request.write('<br>And the next flush takes another second<br>')
    request.flush()
    sleep(1)
    request.write('And done!')
Say we want to convert that into a WSGI application with these semantics:
def wsgi_application(environ, start_response):
    from time import sleep
    start_response('200 OK', [('Content-Type', 'text/html')])
    yield ' '.join(str(x) for x in xrange(10)) + ' '
    yield '<br>And the next flush takes another second<br>'
    sleep(1)
    yield 'And done!'
The main problem is the sleep call and all those flush calls. They mean we cannot just buffer the output; we have to convert the write calls into a generator on the fly. To do that we need either coroutines, greenlets, or two threads that communicate with each other. The easiest and, I guess, also the fastest approach is greenlets.
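To see why plain buffering is not enough, here is a minimal sketch of the naive approach, assuming Python 3 (so the response body is bytes). BufferedRequest, buffer_app and legacy_app are hypothetical names made up for this illustration:

```python
import time


class BufferedRequest(object):
    """Hypothetical stand-in for the legacy request object."""

    def __init__(self, environ):
        self.environ = environ
        self.status = '200 OK'
        self.headers = []
        self.chunks = []

    def header(self, item):
        name, value = item.split(':', 1)
        self.headers.append((name.strip(), value.strip()))

    def write(self, text):
        self.chunks.append(text.encode('utf-8'))

    def flush(self):
        # Nothing can be sent from here: the server only sees output
        # once wsgi_app has returned, so flush() is a no-op.
        pass


def buffer_app(application):
    def wsgi_app(environ, start_response):
        request = BufferedRequest(environ)
        application(request)  # runs to completion, sleeps included
        start_response(request.status, request.headers)
        return [b''.join(request.chunks)]
    return wsgi_app


def legacy_app(request):
    request.header('Content-Type: text/html')
    request.write('first ')
    request.flush()
    time.sleep(0.1)
    request.write('second')
```

The body comes out right, but as one single chunk that is only available after the application has finished, so the client sees nothing while the application sleeps.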
Here is a function that converts a legacy application like the one above into a WSGI application with the same semantics:
# py.magic.greenlet was later split out into the standalone greenlet package
from py.magic import greenlet


class Request(object):

    def __init__(self, environ):
        # the greenlet that created the request; write/flush switch back to it
        self._parent = greenlet.getcurrent()
        self.environ = environ
        self.status = '200 OK'
        self.headers = []

    def header(self, item):
        # 'Name: value' -> ('Name', 'value'), without the space after the colon
        name, value = item.split(':', 1)
        self.headers.append((name.strip(), value.strip()))

    def write(self, text):
        self._parent.switch(('write', text))

    def flush(self):
        self._parent.switch(('flush', None))


def convert_app(application):
    def wsgi_app(environ, start_response):
        request = Request(environ)
        buffer = []
        headers_sent = []  # empty until start_response has been called

        def flush():
            if not headers_sent:
                start_response(request.status, request.headers)
                headers_sent.append(True)
            data = ''.join(buffer)
            if data:
                yield data
            del buffer[:]

        def run():
            application(request)
            request.flush()  # send whatever is left in the buffer

        g = greenlet(run, request._parent)
        while 1:
            rv = g.switch()
            if not rv:
                # run() returned, so the application is done
                break
            signal, value = rv
            if signal == 'flush':
                for item in flush():
                    yield item
            elif signal == 'write':
                buffer.append(value)
    return wsgi_app
Now that’s a bunch of code. Let’s go through it step by step. The first thing we do is create a request class. This class should resemble the old request object as much as possible. All methods can work like they did before; the only differences are the write and flush methods. Those switch back to the parent greenlet (the greenlet that created the request object, usually the main greenlet) and send some data to it, namely the name of the method and its argument. Whenever Python encounters such a switch it suspends the current greenlet and goes back to the point that switched into it. In our example that point is inside the loop that drives the generator of our WSGI application.
That leads us to the convert_app function, which is passed an old legacy application and returns a new WSGI application. Inside this new WSGI application we create a new request object, pass it the WSGI environment, and create the objects and functions we need to process the data from the greenlet and convert it into a valid WSGI response: a buffer for unsent data; a list we use as a flag for whether the headers have been sent; a flush function that starts the response if necessary, returns a generator with the data from the buffer, and clears the buffer; and a run function that invokes the old application and calls request.flush() once the application has finished, so that we don’t have to do that in the application itself.
The main loop after that switches between the application greenlet and the main greenlet until the return value of switch is None (which happens when the application has finished, or when someone switches into the main greenlet without arguments, which we don’t do). In that case we stop; otherwise we check whether it was a flush or a write call and handle it.
To launch the converted application with wsgiref, all we need are these three lines of code:
from wsgiref.simple_server import make_server
srv = make_server('localhost', 5000, convert_app(old_application))
srv.serve_forever()
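For a quick check without starting a server, a WSGI application can also be called directly and its chunks collected. A minimal sketch, assuming Python 3 (where WSGI bodies are bytes); collect_chunks and demo_app are hypothetical names used only for this illustration:

```python
from wsgiref.util import setup_testing_defaults


def collect_chunks(app):
    """Call a WSGI app in-process and return (status, headers, chunks)."""
    environ = {}
    setup_testing_defaults(environ)  # fill in a minimal valid environ
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = headers
        return lambda data: None  # the WSGI write callable, unused here

    result = app(environ, start_response)
    try:
        # a generator app only calls start_response once iteration starts
        chunks = list(result)
    finally:
        close = getattr(result, 'close', None)
        if close is not None:
            close()
    return captured['status'], captured['headers'], chunks


def demo_app(environ, start_response):
    # stand-in generator application; each yield is one chunk
    start_response('200 OK', [('Content-Type', 'text/html')])
    yield b'0 1 2 '
    yield b'And done!'
```

Calling collect_chunks(demo_app) returns the status, the headers and one list entry per yielded chunk. Passing a converted legacy application instead should show one chunk per non-empty flush.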
Basically, greenlets would make it possible to host mod_python applications inside arbitrary WSGI servers. Maybe in the future someone will write a module that allows us to convert some mod_python applications into WSGI applications without touching existing application code.
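For comparison, the two-thread alternative mentioned earlier can be sketched with the standard library alone. This is only an illustration under stated assumptions, not a drop-in replacement: it assumes Python 3 (bytes output), and ThreadedRequest, convert_app_threaded and _DONE are hypothetical names. Here the legacy application runs in a worker thread and hands each flushed chunk to the WSGI generator through a queue:

```python
import queue
import threading

_DONE = object()  # sentinel: the legacy application has finished


class ThreadedRequest(object):

    def __init__(self, environ, outbox):
        self.environ = environ
        self.status = '200 OK'
        self.headers = []
        self._buffer = []
        self._outbox = outbox

    def header(self, item):
        name, value = item.split(':', 1)
        self.headers.append((name.strip(), value.strip()))

    def write(self, text):
        self._buffer.append(text.encode('utf-8'))

    def flush(self):
        data = b''.join(self._buffer)
        del self._buffer[:]
        if data:
            # blocks while the previous chunk is still waiting in the queue
            self._outbox.put(data)


def convert_app_threaded(application):
    def wsgi_app(environ, start_response):
        outbox = queue.Queue(maxsize=1)
        request = ThreadedRequest(environ, outbox)

        def run():
            try:
                application(request)
                request.flush()  # send whatever is left in the buffer
            finally:
                outbox.put(_DONE)

        worker = threading.Thread(target=run)
        worker.daemon = True
        worker.start()
        headers_sent = False
        while True:
            item = outbox.get()
            if not headers_sent:
                start_response(request.status, request.headers)
                headers_sent = True
            if item is _DONE:
                break
            yield item
        worker.join()
    return wsgi_app
```

Unlike the greenlet version, the worker can run up to one chunk ahead of the server, an exception raised by the application is not passed on to the server, and a worker whose response is abandoned midway stays blocked on the queue. A real implementation would have to deal with all three.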
I’d thought about writing a compatibility layer that would allow mod_python content handlers to run on top of mod_wsgi. It probably wouldn’t be a true WSGI application, and thus not portable to other WSGI hosting solutions, because to properly emulate mod_python you need to delve into some of the Apache internals. The idea was to use the SWIG bindings for the Apache APIs I had been working on. The more I thought about it, the more I disliked the idea. The main problem is that stuff like mod_python.publisher is broken in various ways, and I didn’t like the idea of propagating those mistakes or design problems. The request object interface has also grown organically over time and isn’t necessarily consistent. Being the sort of person I am, I would want to fix such problems, but then various people’s code probably wouldn’t work anymore, as it is likely to depend on the suspect behaviour.
Anyway, I’ve got enough on my plate for a while before I could even think about new projects. When I get past those, I might fish around for ideas of what people would like to see me work on next. This greenlet stuff sounds interesting though. One of the issues with a WSGI 2.0 was that it would be hard to emulate the WSGI 1.0 write() interface on top of it. Using greenlets may make that easier, as the concept of what is required isn’t much different from what you have demonstrated above.
Comment by Graham Dumpleton — Monday, November 19th, 2007 @ 4:36 am