Python Code Introspection

Georg recently blogged about Sphinx, and now we can all rejoice, because Sphinx can be used for custom documentation (anything other than the CPython documentation) too. Unfortunately, hand-written documentation alone is hard to maintain.

For Werkzeug 0.2 I want to switch to a combination of hand-written documentation and documentation generated from the Python source code, powered by Sphinx of course. The main problem is getting the information out of the source code. Unlike in compiled languages, a lot in Python is dynamic, and much of the functionality only exists at runtime. Unlike in PHP, objects can be modified later on, and unlike in Ruby, both attributes and methods are exposed on classes, so it’s hard to hide private or protected members of a class. Epydoc came up with some conventions (especially for rst users), but introspection in Python is still too hard and often leads to unexpected results.
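A small sketch of why runtime introspection is slippery, using only the standard `inspect` module (the `Request` class and the `public_methods` helper are made up for illustration): a method attached after the class body has run looks exactly like one defined inside it, and “private” members can only be recognized by the leading-underscore naming convention.

```python
import inspect


class Request(object):
    """Hypothetical example class."""

    def get_data(self):
        return b""

    def _parse(self):
        return None


def late_method(self):
    """Attached after the class was created."""
    return "late"


# Classes can be modified at runtime; introspection cannot tell the difference.
Request.late_method = late_method


def public_methods(cls):
    # The leading underscore is only a naming convention, so filtering on it
    # is the best an introspection tool can do.
    return [name for name, _ in inspect.getmembers(cls, inspect.isfunction)
            if not name.startswith("_")]


print(public_methods(Request))  # ['get_data', 'late_method']
```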

A common idiom in Python is to implement a class in one module and import it into another one, which then becomes the place users import it from. One of the best examples was Python’s “re” module: until 2.5 that module was a stub, and the actual functions were imported from the “sre” module. This is difficult, if not impossible, to get right for automatically generated documentation. Pydoc, for example, requires __all__ to be defined in the module; otherwise it hides those objects just like every other import.
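A documentation tool can at least tell where an object was defined by comparing its `__module__` with the module being documented. The sketch below (the module names and the `documented_names` helper are hypothetical; the modules are built in memory only to keep it self-contained) shows why that heuristic alone hides re-exports, and why an explicit `__all__` is the only reliable signal:

```python
import types

# Two throwaway modules: "_impl" holds the implementation, "public" re-exports it.
impl = types.ModuleType("_impl")
exec("def compile(pattern):\n    return pattern\n", impl.__dict__)

public = types.ModuleType("public")
public.compile = impl.compile  # the re-export, like "from _impl import compile"
public.helper = len            # an ordinary import the docs should skip


def documented_names(module):
    # An explicit __all__ is the only unambiguous signal.
    if hasattr(module, "__all__"):
        return sorted(module.__all__)
    # Otherwise fall back to "defined here", which hides every re-export.
    return sorted(name for name, obj in vars(module).items()
                  if getattr(obj, "__module__", None) == module.__name__)


print(documented_names(public))  # []: compile.__module__ is "_impl"
public.__all__ = ["compile"]
print(documented_names(public))  # ['compile']
```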

Function decorators and descriptors are another pain in the ass to get right with automatic introspection. Descriptors break in pydoc half of the time, and nobody seems to know exactly when. Decorated functions are impossible to get right: even if you traverse the closure variables of every wrapper down to the original function, the result can be terribly wrong, because one of the functions in the middle may actually be variadic.
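Both problems are easy to reproduce with the standard `inspect` module. The sketch below uses made-up decorators, a hypothetical `unwrap_via_closure` helper and a hypothetical `Constant` descriptor: the naive decorator hides the real signature, walking the closures recovers it in one case but gives a misleading answer in the other, and a descriptor that also answers class-level access is invisible to `getattr()`:

```python
import inspect


def log_calls(func):
    # A plain decorator without functools.wraps: the wrapper is variadic.
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def with_request(func):
    # This wrapper supplies the first argument itself, so callers pass one fewer.
    def wrapper(*args, **kwargs):
        return func("request", *args, **kwargs)
    return wrapper


@log_calls
def escape(s, quote=False):
    return s


@with_request
def handler(request, name):
    return name


def unwrap_via_closure(func):
    # Follow function objects stored in closure cells until a function
    # without a closure is reached.
    while func.__closure__:
        inner = [c.cell_contents for c in func.__closure__
                 if inspect.isfunction(c.cell_contents)]
        if not inner:
            break
        func = inner[0]
    return func


print(inspect.signature(escape))                       # (*args, **kwargs)
print(inspect.signature(unwrap_via_closure(escape)))   # (s, quote=False)
# Closure walking "works" here, yet the answer is wrong: callers use handler("x").
print(inspect.signature(unwrap_via_closure(handler)))  # (request, name)


class Constant(object):
    """Hypothetical descriptor that returns a fixed value."""

    def __init__(self, value):
        self.value = value

    def __get__(self, obj, owner=None):
        # Returns the value even on class access, so getattr() never sees self.
        return self.value


class Config(object):
    debug = Constant(False)


print(getattr(Config, "debug"))                                # False
print(type(inspect.getattr_static(Config, "debug")).__name__)  # Constant
```

Current Python versions have `functools.wraps`, which stores the original function as `__wrapped__`, and `inspect.signature` follows that attribute; for `handler` it would still report `(request, name)`.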

Because it’s that hard to get right, I don’t want to waste time solving something that’s unsolvable anyway. My current idea for making the documentation process a little less annoying is to add directives to Sphinx that pull information from Python objects into rst files in a semi-automatic way.
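The core of such a directive is turning a live object into rst text. Below is a sketch that does only that part with the standard `inspect` module; `render_function` is a hypothetical name, and registering it as an actual Sphinx directive is left out:

```python
import inspect


def render_function(func, module_name):
    """Hypothetical helper: emit the rst a directive could insert for func."""
    lines = [".. function:: %s.%s%s" % (module_name, func.__name__,
                                         inspect.signature(func)), ""]
    doc = inspect.getdoc(func) or ""
    # Indent the cleaned-up docstring so it becomes the directive's content.
    lines.extend(("   " + line) if line else "" for line in doc.splitlines())
    return "\n".join(lines)


def escape(s, quote=False):
    """Escape a string for HTML.

    :param s: the string to escape
    """
    return s


print(render_function(escape, "werkzeug.utils"))
```

Because the text is generated from the imported object, the signature comes from the code itself, while everything around the directive stays hand-written.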

Say you have a pretty complex module that implements a couple of functions, classes and who knows what else. In addition to the documentation from the docstrings, you also want to group the objects by topic and add hand-written documentation for every group.

In the Python code you document all your functions with nice rst docstrings following the epydoc conventions. Additionally, you add group markers:

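def escape(s, quote=False):
    """This function escapes strings for HTML documents.

    :param s: the string to escape
    :param quote: set to `True` to escape the quote character too.
    :return: escaped string
    :group: html-helpers
    """
    # Minimal body so the example runs; the real implementation may differ.
    s = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s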

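Reading such a marker back out is straightforward. As a sketch (the `parse_fields` helper and the regular expression are assumptions, not part of Sphinx or epydoc), the field lines can be pulled from the cleaned-up docstring like this:

```python
import inspect
import re

# Matches field lines such as ":param s: ..." or ":group: ...".
FIELD_RE = re.compile(r"^:(\w+)(?:\s+(\w+))?:\s*(.*)$")


def parse_fields(obj):
    """Hypothetical helper: split a docstring into its fields."""
    fields = {"param": {}}
    for line in (inspect.getdoc(obj) or "").splitlines():
        match = FIELD_RE.match(line)
        if match is None:
            continue  # ordinary prose line
        kind, arg, text = match.groups()
        if kind == "param" and arg:
            fields["param"][arg] = text
        else:
            fields[kind] = text
    return fields


def escape(s, quote=False):
    """This function escapes strings for HTML documents.

    :param s: the string to escape
    :param quote: set to `True` to escape the quote character too.
    :return: escaped string
    :group: html-helpers
    """


print(parse_fields(escape)["group"])  # html-helpers
```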
You can do that for every function and object. To dump the documentation for a group into an rst document, you would then use a small directive:

HTML Helpers
============

These functions and classes help you process HTML.

.. autodoc:: group html-helpers [werkzeug.utils]
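Before rendering anything, a `group` directive like this has to collect the flagged objects. A sketch of that step, assuming the `:group:` field convention from the docstring above and using a hypothetical `objects_in_group` helper with a small in-memory module standing in for `werkzeug.utils`; the result is sorted alphabetically by name:

```python
import inspect
import re
import types

# A ":group: name" line in a cleaned-up docstring.
GROUP_RE = re.compile(r"^:group:[ \t]+(\S+)[ \t]*$", re.MULTILINE)


def objects_in_group(module, group):
    """Hypothetical helper: (name, object) pairs tagged with group, sorted by name."""
    found = []
    for name, obj in vars(module).items():
        # Only look at public functions and classes.
        if name.startswith("_") or not (inspect.isfunction(obj) or inspect.isclass(obj)):
            continue
        match = GROUP_RE.search(inspect.getdoc(obj) or "")
        if match and match.group(1) == group:
            found.append((name, obj))
    return sorted(found, key=lambda pair: pair[0])


def escape(s, quote=False):
    """Escape a string for HTML.

    :group: html-helpers
    """


def unescape(s):
    """Reverse escape().

    :group: html-helpers
    """


def url_quote(s):
    """Quote a string for use in URLs.

    :group: url-helpers
    """


# An in-memory module standing in for werkzeug.utils.
utils = types.ModuleType("utils")
utils.url_quote, utils.unescape, utils.escape = url_quote, unescape, escape

print([name for name, _ in objects_in_group(utils, "html-helpers")])  # ['escape', 'unescape']
```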

This dumps all the objects flagged with that group into the rst document, in alphabetical order. But of course this breaks for complex use cases, so you should be able to step back to manual control. Imagine you have a pretty complex object and want to hide some of its methods or attributes, add members that discovery missed, or document a member as a descriptor rather than a method where automatic discovery got it wrong:

Request
=======

summary for the request object goes here.

.. autodoc:: object werkzeug.wrappers.Request

    no_docstring: yes
    extra_members:
        - `_get_file_stream`

I don’t have a good idea yet of how exactly that should work. But I want to find one quickly, because right now I’m pretty unhappy: writing docs both in the source code and in hand-written documentation sucks ass.
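One possible shape for the option handling, as a sketch only: `extra_members` comes from the example above, while the `hidden_members` and `descriptors` options and the `select_members` helper are hypothetical. It discovers public members automatically and then lets the options override the result:

```python
import inspect


def select_members(cls, options):
    """Hypothetical helper: choose and classify the members to document.

    Public names are discovered automatically; "extra_members" adds names the
    discovery skipped, "hidden_members" drops names, and "descriptors" forces
    a member to be documented as a descriptor.
    """
    names = {name for name in vars(cls) if not name.startswith("_")}
    names |= set(options.get("extra_members", ()))
    names -= set(options.get("hidden_members", ()))
    forced = set(options.get("descriptors", ()))
    members = []
    for name in sorted(names):
        # getattr_static returns the stored object without triggering __get__.
        obj = inspect.getattr_static(cls, name)
        if name in forced:
            kind = "descriptor"
        elif inspect.isfunction(obj):
            kind = "method"
        elif hasattr(type(obj), "__get__"):
            kind = "descriptor"
        else:
            kind = "attribute"
        members.append((name, kind))
    return members


class Request(object):
    charset = "utf-8"

    def _get_file_stream(self):
        return None

    def close(self):
        pass

    @property
    def stream(self):
        return None


options = {"extra_members": ["_get_file_stream"], "hidden_members": ["close"]}
print(select_members(Request, options))
# [('_get_file_stream', 'method'), ('charset', 'attribute'), ('stream', 'descriptor')]
```

Keeping discovery and overrides separate means the automatic part can stay dumb: whenever it guesses wrong, a single option in the rst file fixes that one member.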

Update: Fixed Georg’s name :-)

One Response to “Python Code Introspection”

  1. This proposal is certainly a good one. The exact syntax might be different (e.g. using directive options for “no_docstring” etc.). I hope it is easily implementable as a Sphinx extension.

    The combination of hand-written docs and docstrings where they make sense is something that ensures up-to-date documentation that is nice to read, not fragmented like many API docs.

    Comment by Georg — Monday, January 21st, 2008 @ 10:06 pm
