Armin Ronacher

SQLAlchemy “Patterns” in Zine

written by Armin Ronacher, on Friday, January 2, 2009 22:32.

Now that the biggest issues with Zine are resolved, the 0.1.2 code is nearly finished, some copyright annotation problems are about to be fixed as well I finally have some time to blog about some cool internal parts of Zine. I want to start with some neat things you can do with SQLAlchemy you might miss if you are not looking closely.

A lot of the things we are doing in Zine are only possible thanks to some changes done in SQLAlchemy 0.5, so you don't want to miss them.

Custom Query Classes

Ever used Django? If yes, you might remember the database manager objects (commonly called objects) on models:

>>> from yourproject.app.models import Entry
>>> Entry.objects.filter(pub_date__gt=yesterday).count()
3

Database managers work a bit like query sets which allow you to refine queries before they are send to the database. Unlike query sets you can subclass them and add methods to send common queries. In SQLAlchemy we were unable to do that until 0.5, but now we got it even better than the Django guys :) Because SQLAlchemy's query objects return itself and even more important: they are available on relations as well and you can explicitly fire the query.

So how do we use custom query objects in Zine? Have a look at the following query we use on the blog archive page:

data = category.posts.theme_lightweight('category') \
               .published().get_list(page=page, per_page=per_page,
                                     endpoint='blog/show_category',
                                     url_args=dict(slug=slug))

category obviously is an instance of a class mapped to a database table by using SQLAlchemy. The posts attribute on it is a dynamic_loader with a custom query class. A dynamic_loader works like a regular relation just that the relation is not loaded automatically into a list but that it's a query object you can use to send a query. Now all the methods on this query object we are using in the snippet above are extensions to regular SQLAlchemy queries:

theme_lightweight()
This method returns a query with different options. It looks into the theme settings and disables the automatic loading of columns and relations. For example a theme might not need the comment count or the author in the category overview page, so there is no need to load it from the database. I'll show you how this is implemented a little bit later.
published()
This method filters out all unpublished posts and all posts that are published but still scheduled. In Zine if you set the publication date into the future, the post won't appear until the publication date passed.
get_list()
This method is a convenience function that does a lot of things. It returns a dict with pagination and a slice of the result set for the given page. endpoint and url_args are helpers used by the Werkzeug URL generation system and used to build the links for the pagination.

So how is this implemented? The key to custom query objects is subclassing and the query_property method on a scoped query object. This is how it looks like in Zine, without the actual implementation:

from datetime import datetime
from sqlalchemy import Table
from sqlalchemy.orm import Query, mapper, dynamic_loader, relation, \
     scoped_session, sessionmaker

Session = scoped_session(sessionmaker())

categories = Table('categories', ...)
posts = Table('posts', ...)
post_categories = Table('post_categories', ...)

class PostQuery(Query):
    """A custom subclass of queries."""
    def published(self):
        return self.filter(
            (Post.published == True) &
            (Post.pub_date >= datetime.utcnow())
        )

class Category(object):
    # add the standard query object to categories
    query = Session.query_property(Query)

class Post(object):
    # but use our PostQuery for posts
    query = Session.query_property(PostQuery)


# map the tables to the classes, and make sure that the
# category/post relation uses a dynamic_loader with the
# correct query class (PostQuery).
mapper(Category, categories, properties=dict(
    posts=dynamic_loader(Post, secondary=post_categories,
                         query_class=PostQuery)
), order_by=categories.c.name)
mapper(Post, posts, properties=dict(
    categories=relation(Category, secondary=post_categories, lazy=False)
))

Query Options

Another very cool feature in SQLAlchemy are query options. Per default SQLAlchemy tries to load a lot of stuff so that it doesn't have to send subqueries. You can control that behaviour when mapping a relation as documented. However you can also control this on a per-query basis. And this is pretty cool because quite often your models are used in different situations differently but they should still behave the same. Let me give you an example: in Zine a post object is both used to display the actual post with comments and all that belongs to it, but also on an overview page where 50 or more post headlines should be displayed. In Django I would have solved that probably by using raw SQL or by using .values(). In SQLAlchemy we can do something better :) Watch that:

from sqlalchemy.orm import defer, lazyload
posts = Post.query.options(defer('text'), lazyload('author')).limit(50).all()

That loads up to 50 posts from the database, but the text column is not loaded and the author relation is not loaded either. Now you may wonder why it's called defer and lazyload and I talked about not loading at all. SQLAlchemy won't load it until you access the attribute. So if you are not accessing the attribute no loading takes place. So just don't access it and you're fine. Which is a pretty cool decision how to handle that, because the object really still feels the same. Just that it's optimized differently.

Now if combined with custom query objects you could create a method that optimizes queries for overview pages:

Post.query.optimize_overview().limit(50)

All the details about lazyload, defer and friends is explained in the Loading Strategies and Deferred Column Loading sections of the SQLAlchemy documentation.

Disable Autoflushing Temporarily

I configure my mappers using the scoped session and to automatically save an object when it's initialized (__init__). Now this is neat, but this sometimes forces you to temporarily disable flushing if you need a database query to fill in a column in the constructor. In Zine this for example happens when we generate a slug. If the post is not passed a slug we generate one based on the title. And if that slug is in use we are incrementing the string (by adding numbers) until a free one is found. Now of course we must not commit our newly created Post that does not yet have a slug. So we temporarily disable flushing to the database:

class Post(object):
    def __init__(self, title, ...):
        self.title = title
        if slug is None:
            slug = generate_slug()
            while Post.query.autoflush(False).filter_by(slug=slug) \
                      .limit(1).count():
                slug = increment_string(slug)
        self.slug = slug

query.autoflush(False) disables autoflushing just for this single query.

Association Proxies

The last thing I want to share are association proxies. For that let me show you a interactive Python session* first:

>>> me = User.query.first()
>>> me.privileges
set([<Privilege 'BLOG_ADMIN'>])
>>> me.privileges.add(MODERATE_COMMENTS)
>>> me.privileges
set([<Privilege 'BLOG_ADMIN'>, <Privilege 'MODERATE_COMMENTS'>])

* this is actually a mockup. The actual Zine implementation is a bit more complex and supports privilege inheritance as well, but that is something SQLAlchemy unrelated.

What's so awesome about that? These privilege objects are singletons from the Zine privilege system. They are used to perform privilege checks (if the current user is allowed to moderate comments etc.). They are implemented in a part of the system and totally separated from the database. But the privileges are of course somehow stored in the database. So how are these two systems linked together? The answer is a hidden gem in SQLAlchemy called the “Association Proxy”.

Let's have a look at the User model first:

users = Table('users')
privileges = Table('privileges')
user_privileges = Table('user_privileges')

class User(object):
    privileges = association_proxy('_privileges', 'privilege_object',
                                   create_privilege)

db.mapper(User, users, properties=dict(
     _privileges=relation(_Privilege, secondary=user_privileges,
                          collection_class=set, cascade='all, delete')
))

Looks simple. So what does it do? It says privileges is an association proxy to the low-level _privileges attribute of the class and we are actually dispatching to the privilege_object in our set. Because _privileges is actually a set of _Privilege objects, each of which has an privilege_object attribute. If a privilege was not yet in the database it's created using create_privilege.

Confused? Here's the rest of the code, then we'll clear things up:

from sqlalchemy.ext.associationproxy import association_proxy

class Privilege(object):
    """Class for the privilege singletons."""
    def __init__(self, name, description):
        self.name = name
        self.description = description
    def __repr__(self):
        return '<%s %r>' % (type(self).__name__, self.name)

BLOG_ADMIN = Privilege('BLOG_ADMIN', 'can administer the blog')
MODERATE_COMMENTS = Privilege('MODERATE_COMMENTS', 'can moderate the comments')

PRIVILEGE_SINGLETONS = {
    'BLOG_ADMIN':         BLOG_ADMIN,
    'MODERATE_COMMENTS':  MODERATE_COMMENTS
}

class _Privilege(object):
    """Internal throw-away class used for the association proxy."""
    def __init__(self, name):
        self.name = name
    @property
    def privilege_object(self):
        return PRIVILEGE_SINGLETONS.get(self.name)

def create_privilege(privilege):
    if not isinstance(privilege, Privilege):
        raise TypeError('%r is not a privilege object' %
                        type(privilege).__name__)
    priv = _Privilege.query.filter_by(name=privilege.name).first()
    if priv is None:
        priv = _Privilege(privilege.name)
    return priv

Back to our interactive shell from above. If we look at the internal _privileges attribute we can see an instrumented set of internal _Privilege objects:

>>> me._privileges
InstrumentedSet([<app.models._Privilege object at 0x16bc830>,
                 <app.models._Privilege object at 0x16bc710>])

Each of which has a privilege_object attribute:

>>> [x.privilege_object for x in me._privileges]
[<Privilege 'BLOG_ADMIN'>, <Privilege 'MODERATE_COMMENTS'>]

And this is pretty much what privileges is now. Just that privileges supports modifications and handles all the database magic in the background. (Like inserting new privilege rows into the privileges table or inserting/deleting from the user_privileges table.)

More Gems

There are tons of more useful gems in the SQLAlchemy system like custom collection classes (Ever wanted to have a collection that actually was a dict?) and much more. It's really worth reading the documentation and not just looking up stuff there when you encounter problems. You could miss tons of cool stuff.

Also, zzzeek just said that we can expect 0.5 today or tomorrow. So you don't have to have a bad feeling for toying with 0.5 even though it's not yet released :)

Comments

  1. Excellent post. This sort of "real world" implementation is incredibly cool and very useful. Keep up the great work.

    —  Empty on Saturday, January 3, 2009 1:36 #

  2. Very, very useful post. Never heard about the Query subclass trick which is incredibly cool. Thanks for sharing it.

    —  Alberto on Saturday, January 3, 2009 9:31 #

  3. Took me a little bit to decipher this, but interesting stuff nonetheless.

    query_property obviously isn't a built-in. The snippet above is missing the Session = scoped_session(sessionmaker()), and query_property should actually be Session.query_property.

    —  Bob Ippolito on Tuesday, January 13, 2009 9:59 #

  4. @3: Yes indeed, fixed the post accordingly.

    —  Armin Ronacher on Tuesday, January 13, 2009 10:41 #

  5. Armin,

    I agree, this is a great post. I did a Google search for "singleton sqlalchemy" for something I was working on, and came across this.

    I didn't know you could lazily load attributes until they were looked up, this is quite nice. Keep up the good work.

    —  Noah Gift on Tuesday, March 3, 2009 7:24 #

Leave a Reply