SQLAlchemy and You
written on Tuesday, July 19, 2011
Without doubt are most new Python web programmers these days chosing the
Django framework as their gateway drug to Python web development. As such
many people's first experience with a Python ORM (or maybe an ORM
altogether) is the Django one. When they are later switching to something
else they often find SQLAlchemy unnecessarily complex and hard to use.
Why is that the case?
I made a quick poll on Twitter about why people prefer the Django ORM over
SQLAlchemy and I got back a few interesting results. First of all that
question was obviously asked with the intent to attract answers from
people that do prefer Django over SQLAlchemy or at least have some issues
with SQLAlchemy that they don't seem to have with Django. Without a doubt
there is a large fanbase behind SQLAlchemy, myself included.
SQLAlchemy in general just has a much larger featureset and it's the only
ORM for Python which allows you to take full advantage of your database
and does not stand in your way. It exposes all features of your
underlying database if you want and can be heavily fine tuned.
This article assumes that you have some basic Django knowledge and want to
give SQLAlchemy a try. Step by step it walks through the differences and
common idioms.
Design Differences
There are two very important differences between SQLAlchemy and Django.
The first one is the less obvious one: SQLAlchemy is a deeply layered
system, whereas Django's ORM is basically just one layer which is the ORM
you see. In SQLAlchemy you have at the very bottom the engine which with
abstracts away connection pools and basic API differences between
different databases, on top of that you have the SQL abstraction language,
sitting on top of that and the table definitions you have the basic ORM
and on top of that you have the declarative ORM which looks very close
to the Django ORM. The other more striking difference however is that
SQLAlchemy follows the “Unit of Work” pattern whereas Django's ORM follows
something that is very close to the “Active Record” pattern.
What's the difference? Django's ORM is basically quite simple. Each time
you do any query it generates a SQL expression for you and sends a query
to the database. Then it constructs and object for you. That object can
be modified and if you call save() on it, it will update the record in
the database with the new values of the attributes. This is not at all
how SQLAlchemy's ORM component works. In SQLAlchemy you have an object
called the “session”. It basically encapsulates a transaction. However
it does more. Each object is tracked by primary key in this session. As
such each object only exists once by primary key. As such you can safely make a
lot of queries and you never have things out of sync. When you commit the
session it will send all changes at once to the database in correct order,
if you rollback the session nothing happens instead.
SQLAlchemy's Complexity
SQLAlchemy has to fight with some basic acceptance problems which are
caused by the fact that it's framework independent and is not even
something you would only use in web applications. This is why projects
like Flask-SQLAlchemy
exist to make the integration for you. Many frameworks either provide
something that preconfigures SQLAlchemy for you with some sane defaults or
have a section in the cookbook to copy/paste code from.
This also is the reason why many people find SQLAlchemy's documentation
overwhelming. Not only does the documentation guide you through the
different levels of SQLAlchemy but also shows all the different ways you
can configure SQLAlchemy and the ORM.
Django on the other hand does not have many different ways the configure
the ORM. It comes preconfigured with the exception of some database
related setting such as server name, port and a few other things.
The Session — The Heart of the ORM
If you chose to use SQLAlchemy's ORM component and not just the engine or
SQL abstraction layer you will sooner or later be confronted with the
session object. What is the session object? It is the one object that
records all the changes on models you do. How does a model know about the
session? Let's compare this to Django for a moment.
In Django if you have a model it also has an objects attribute attached.
That attribute points to a manager object which in turn can generate
queryset objects and the queryset objects then fire the query and hold the
results. How does this queryset object find the current transaction? In
Django the answer is that transactions are bound to a thread always. Each
thread can have one transaction and the queryset uses. So Django needs to
find the transaction in the queryset when the actual query fires and it
does that by finding the current thread which owns a transaction.
So how does that work in SQLAlchemy? As we have established, objects are
always “owned” by a session and keyed by primary key. Each primary key
can only exist once. Because that session is quite fundamental and needs
to work in many setups this is configurable. But first we need to figure
out what the difference between a Django queryset and a SQLAlchemy query
object is. They look similar on the surface, but are very different in
practice. A Django queryset is created by the manager of a model and also
holds the results once the query is fired. In SQLAlchemy the query object
can be created in many different ways and unifies the idea of the manager
and Django's querysets in itself. When the query object is evaluated (the
query is sent to the database) it instead returns a list or the only
result object, depending on what methods are used. This means the manager
object is entirely unnecessary in SQLAlchemy, if you need custom manager
methods you would just subclass the query and attach new methods on there.
Since the query object can be joined it makes for quite a nice API.
But back to sessions. In Django default of having one transaction per
thread makes a lot of sense, but limits the usefulness when you have other
means of concurrency somewhat. Also it makes it hard if you want to use
the same model against two different databases. This is where the
explicit API comes into play which is the one that the documentation uses.
So instead of this Django idiom:
In default SQLAlchemy you would do this instead:
session.query(MyModel).all()
Here it's explicit what session object should be used and you can have
multiple of those side by side obviously. Since many people do not need
this and are fine with having one session per thread you can take
advantage of the scoped session support in SQLAlchemy. For instance the
Flask-SQLAlchemy extension will by default attach a query class level
attribute to your models which looks at the current thread and it's
session object. So each thread will only have one session. Furthermore
at the end of an HTTP request in Flask the extension will automatically
destroy the session and discarding uncommitted changes.
With that, it looks a lot closer to Django:
You however will still need the session to commit and insert and delete
objects from the database. The scoped session automatically provides a
proxy that always point to the current active session.
The Declarative Extension
For a long time SQLAlchemy made you declare table objects first and then
separately create the classes and map those together. This has the
advantage over just subclassing some magical baseclass that you can map
already existing classes to things in the database. The downside always
was however that you had to declare multiple objects and the common case
was unnecessarily complex.
SQLAlchemy since introduced the declarative base. It's a extension module
shipped with SQLAlchemy that provides a function which creates a brand new
baseclass (which you can also customize) which does metaclass magic very
similar to Django. As such you can directly declare relationships and
attributes in the class itself.
There are still some differences though:
- Relationships are not magically created for you, you have to be
expicit. The same is true with foreign keys.
- Primary keys are not automatically generated for you for the simple
reason that SQLAlchemy supports more than one primary key type. If
you want one chosen by default, you can provide a baseclass that
implements that.
- The table name has to be set explicitly. Again you can customize the
baseclass to derive the table name from the class name if you like.
To get this baseclass you basically just need this:
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
Basic Models
A basic Django model looks something like this:
class Person(models.Model):
first_name = models.CharField(max_length=30)
last_name = models.CharField(max_length=30)
The equivalent SQLAlchemy model with declarative base looks like this:
from sqlalchemy import Column, Integer, String
class Person(Base):
__tablename__ = 'persons'
id = Column(Integer, primary_key=True)
first_name = Column(String(30))
last_name = Column(String(30))
It's a little more to type, but if you want to make this implicit you just
need a proper baseclass. Flask-SQLAlchemy for instance sets the lowercase
version of the class as default tablename unless overridden.
Many-to-One Relationships
In Django this is straightforward:
class Manufacturer(models.Model):
name = models.CharField(max_length=30)
class Car(models.Model):
manufacturer = models.ForeignKey(Manufacturer,
related_name='cars')
name = models.CharField(max_length=30)
In SQLAlchemy we have to be a little bit more expressive:
from sqlalchemy import Column, Integer, String, ForeignKey
from sqlalchemy.orm import relationship, backref
class Manufacturer(Base):
__tablename__ = 'manufacturers'
id = Column(Integer, primary_key=True)
name = Column(String(30))
class Car(models.Model):
__tablename__ = 'cars'
id = Column(Integer, primary_key=True)
manufacturer_id = Column(Integer, ForeignKey('manufacturers.id'))
name = Column(String(30))
manufacturer = relationship('Manufacturer', backref=
backref('cars', lazy='dynamic'))
Here we have to model the relationship ourself. First we need to declare
the foreign key. It has to have the same type as the primary key of the
table we want to point to and additionally the column needs to be given a
ForeignKey instance with the first argument being the dotted name to the
column referenced. Note that this is the table name, not the class name.
The relationship is then declared on Car with relationship. The first
argument is a class or the name of a class we want to have the
relationship with. By default it will try to find a valid join condition
automatically. If it does not, you can explicitly provide one as a string
or real expression:
manufacturer = relationship('Manufacturer',
primaryjoin='Car.manufacturer_id == Manufacturer.id',
backref=backref('cars', lazy='dynamic'))
The backref argument automatically declares the reverse. It will attach
a cars property on the manufacturer. The lazy='dynamic' tells
SQLAlchemy to make the backref lazy and a dynamic loading one. In that
case accessing manufacturer.cars will be a query object you can further
refine instead of directly firing the query and returning a list.
Other lazy settings:
- 'select': if accessed load everything as list with another select
statement. This is the default.
- 'joined': uses a join to automatically load that backref with the
query of the parent itself.
- 'dynamic': returns a query object instead of firing the query. This
can be sliced and further extended.
The lazy settings can also be set on relationship and not just backref.
Backref in a nutshell:
'lazy' and 'select'. The first one fires a query when
honda.cars is accessed, the other one will fetch it when honda is
queried:
>>> honda.cars
[<Car 1>, <Car 2>]
And here with 'dynamic':
>>> honda.cars
<AppenderQuery ...>
>>> honda.cars.all()
[<Car 1>, <Car 2>]
Many-To-Many
Many to many relationships in Django are easy cake because everything is
done for you:
class Topping(models.Model):
name = models.CharField(max_length=30)
class Pizza(models.Model):
toppings = models.ManyToManyField(Topping)
name = models.CharField(max_length=30)
In SQLAlchemy we have to construct a helper table to join over:
from sqlalchemy import Column, Integer, String, ForeignKey, Table
from sqlalchemy.orm import relationship, backref
pizza_toppings = Table('pizza_toppings', Base.metadata,
Column('topping_id', Integer, ForeignKey('toppings.id')),
Column('pizza_id', Integer, ForeignKey('pizzas.id'))
)
class Topping(Base):
__tablename__ = 'toppings'
id = Column(Integer, primary_key=True)
name = Column(String(30))
class Pizza(models.Model):
__tablename__ = 'pizzas'
id = Column(Integer, primary_key=True)
name = Column(String(30))
toppings = relationship('Topping', secondary=pizza_toppings,
backref=backref('pizzas', lazy='dynamic'))
Translating Queries From Django To SQLAlchemy
So this here assumes that you are using scoped sessions like
Flask-SQLAlchemy does and unmodified Django. The first example is always
how the equivalent Django code looks like and how you would do that with
SQLAlchemy:
Inserting Entries
Inserting entries in Django can be done with either creating an instance
of a model or by using the create() method of the object manager:
foo = MyModel(field1='value', field2='value')
foo.save()
# or alternatively
foo = MyModel.objects.create(field1='value', field2='value')
In SQLAlchemy you need to do this instead:
foo = MyModel(field1='value', field2='value')
session.add(foo)
But with that you have only added the object to the session, at that point
it has not yet committed the transaction. This has to be done explicitly
by yourself when you are happy with all the changes:
Deleting Entries
Deleting works very much like saving in Django. You get your object and
then call the delete() method on it:
obj = MyModel.objects.filter(pk=the_id).get()
obj.delete()
In SQLAlchemy that operation is performed via the session:
obj = MyModel.query.get(the_id)
session.delete(obj)
Again, remember to commit your session.
Updating Entries
How do you update an entry? Just get the object, modify it and commit the
session:
obj = MyModel.query.get(the_id)
obj.name = 'New Value'
session.commit()
Primary Key Queries
Queries is where Django and SQLAlchemy are the most different. Django
uses keyword arguments to the query functions to filter the query,
SQLAlchemy generally uses expressions composed out operator objects.
Query by primary key in Django:
obj = MyModel.objects.get(pk=the_id)
And in SQLAlchemy:
obj = MyModel.query.get(the_id)
Note that get() returns None if the primary key does not exist in
SQLAlchemy and will raise a DoesNotExist exception in Django.
Generally the get() method is a shortcut in SQLAlchemy that will also
not issue a query for that object if it was already queried for that
session before. Also unlike Django your primary key can be of any type or
be a compound of more than one column.
General Query Syntax
If you want to filter a query in Django you generally use keyword
arguments in the format column__operation=value. For instance
column__contains='e' to check if a string column named column
contains the letter “e”. In SQLAlchemy instead you are using expressions.
These expressions can be printed to see what query they would generate.
Here some examples:
>>> print MyModel.id == 23
model.model_id = :model_id_1
>>> print MyModel.id.in_([1, 2, 3])
model.model_id IN (:model_id_1, :model_id_2, :model_id_3)
>>> print MyModel.name.contains('e')
model.name LIKE '%%' || :name_1 || '%%'
Note that SQLAlchemy shows you the placeholders there because it will
let the database insert those values later.
The whole expression language expresses pretty much everything that SQL
has to offer:
>>> print MyModel.thread_count + MyModel.post_count + 1
(model.thread_count + model.post_count) + :param_1
>>> print MyModel.id.between(1, 10) & MyModel.name.startswith('a')
model.model_id BETWEEN :model_id_1 AND :model_id_2 AND
model.name LIKE :name_1 || '%%'
Now this is a biggie, because this is how you can filter for anything if
you pass such an expression to filter():
active_users_with_a_or_b = User.query.filter(
(User.name.startswith('a') | User.name.startswith('b')) &
(User.is_active == True)
).all()
To evaluate a query you have a few choices:
- first() returns the first result from the query and will also
tell the database to perform an implicit LIMIT 1. If more than
one result is found you won't know and if none is found you get
None back.
- one() is similar to first() but it will not limit the
result in any way but perform a sanity check on getting the
results. It will raise an NoResultFound exception back if it did
not found a single row or a MultipleResultsFound exception if it got
more than one result which indicates a bug on your part.
- all() just evaluates the whole query and returns each row as a
list. Why as a list and not as an iterator? First of all because
each object returned is also immediately registered on the session.
There are of course ways to bypass that, but unless you have an
enormous result count you won't notice, secondly because most Python
database adapters don't support streaming results anyways.
Now this is nice and everything, but all that model repetition can be
annoying. For as long as you are just comparing a column to a given value
you can use the filter_by() function and pass keyword arguments:
user = User.query.filter_by(username=username).first()
Multiple arguments are automatically joined with AND.
Date Based Queries
In Django you can use field__year=2011 to select all entries where the
year of a field has a specific value. Underneath what usually happens is
that an EXTRACT expression is issued. Unfortunately that's hugely
database dependent and does not map nicely to a function. Thankfully
SQLAlchemy provides a helper for that which automatically does the right
thing for each database:
from sqlalchemy.sql import extract
entries_a_month = Entry.query.filter(
(extract(Entry.pub_date, 'year') == 2011) &
(extract(Entry.pub_date, 'month') == 1)
).all()
Quite a few extractions are possible. The most common ones are month,
day, year, hour, minute, second, doy (day of
year) and dow (day of week).
Sorting
In Django if you sort something you do that by calling order_by() and
passing it some strings with the columns to order by:
forwards = MyModel.objects.order_by('pub_date')
backwards = MyModel.objects.order_by('-pub_date')
While it appears that the same is possible in SQLAlchemy you have to be
careful because it only works as SQLAlchemy inserts that text directly
into the query. What instead you want to be doing is using the
expressions again:
forwards = MyModel.query.order_by(MyModel.pub_date)
backwards = MyModel.query.order_by(MyModel.pub_date.desc())
And again, any expression works in that situation, so you can just easily
order by ridiculous expressions if you want.
Aggregates
Aggregates in Django are a quite new feature and generally not all that
awesome, so we're skipping the Django part here. Thankfully they are much
better supported in SQLAlchemy as SQLAlchemy just handles them by querying
over arbitrary expressions. Functions on the database can be expressed by
sqlalchemy.func.functionname in SQLAlchemy. This in combination with
arbitrary expressions makes it quite potent. But first the simple case:
from sqlalchemy.sql import func
q = session.query(func.count(User.id))
Now that query obviously does not resolve to a model but a scalar value.
In this case if we would call q.first() we would get a single tuple
back with a single item: the count. For this case SQLAlchemy provides a
nice shortcut: scalar():
>>> session.query(func.count(User.id)).scalar()
1337
What if we want to group by something? Use group_by() and just
iterate over it:
for age, count in session.query(User.age,
func.count(User.id)).group_by(User.age).all():
print 'Users aged %d: %d' % (age, count)
Distinct counts are simple as well, just call .distinct() on the
query. In fact: if you have a rough idea of what the SQL would look like
you can get to the expected result with pure guesswork and SQLAlchemy will
most likely “just work” ™.
Joins
Now this is the part where people get constantly confused with SQLAlchemy
but fear not, I have you covered. Django hides the business of joins from
you. For instance if you want to get all posts written by a specific
author that is known by name you would do something like this:
posts = Post.objects.filter(author__name__exact=the_author_name)
So how do you do that in SQLAlchemy? The answer is that this means a join
is taking place. There are two ways to model that select. First the
simple one:
posts = Post.query.join(Author).filter(Author.name == the_author_name)
That wasn't too tricky. How does SQLAlchemy know how to do the right
thing? It looks at what joins are possible and if only one is, it selects
the right one. Alternatively you can explicitly provide what to join on
as an expression as second argument to join(). Again, you can get
arbitrarily complex there. Everything after the join automatically
operates on the last .join()-ed model. If you want to further filter
the former model (here Post) you can either move them before the
.join() call or use .reset_joinpoint().
Alternatively you could also express this as a subselect:
author_query = Author.query.filter(Author.name == the_author_name)
posts = Post.query.filter(Post.author_id.in_(author_query))
Why does SQLAlchemy not do what Django does? Well, first of all explicit
is better than implicit: you know exactly what happens. A regular join is
not always what you want or SQL would not provide an outerjoin which of
course you can use with SQLAlchemy as well. Secondly, it's really easy to
replicated. If you are curious of how that works you can have a look at
this subclass of the builtin query that implements Django's filtering with
keyword arguments: sqlalchemy-django-query.
Why Consider SQLAlchemy?
This article did not really give you any reasons to use SQLAlchemy, did
it? But the simple cases is not where SQLAlchemy shines. It's the more
complex situations which you can't do at all in Django that work nicely in
SQLAlchemy. Oh, and SQLAlchemy does not override all your columns when
you just changed one on update ;-)
This entry was tagged
python and
sqlalchemy