Posted byon under
You may have heard that a major version of SQLAlchemy, version 2.0, has been released in January 2023. Or maybe you missed the announcement and this is news to you. Either way, I thought you'd be curious to know what's new in it, if it is worth upgrading and how difficult the upgrade is.
As with previous software reviews, this is going to be an opinionated overview. I have been using the SQLAlchemy ORM in web projects for a long time, so in this article I will discuss the features that affect my own work, both in positive or negative ways. If instead you are interested to see a list of every change that went into this new release, then the official change log is the place to go.
For The Impatient
If you just want to see how a real-world project written for SQLAlchemy 2.0 looks like and don't want to read through the analysis that follows, you can head over to my microblog-api project, which I just upgraded to use the latest SQLAlchemy features.
No More Query Object
The first change I'm going to discuss is the new query interface. To be exact, this feature has been introduced in SQLAlchemy 1.4 releases as a way to help developers transition to 2.0, so you may have already seen it.
The now "legacy" way to issue queries in the SQLAlchemy ORM consisted in using a
Query object, available from the
Session.query() method, or for those using the Flask-SQLAlchemy extension for Flask, also as
Model.query. Here are some examples:
# using native SQLAlchemy user = session.query(User).filter(User.username == 'susan').first() # using Flask-SQLAlchemy user = User.query.filter(User.username == 'susan').first()
In release 2.0 this is considered the "old" way of executing queries. You can still run queries in this way, but the documentation refers to this interface as the "1.x Query API" or the "legacy Query API".
The new Query API has a very clear separation between the queries themselves and the execution environment in which they run. The above query to look for a user by the
username attribute can now be written as follows:
query = select(User).where(User.username == 'susan')
This example stores the query in the
query variable. At this point the query has not executed yet, and is not even associated with a session.
To execute the query, it can be passed to the
execute() method of a session object:
results = session.execute(query)
The returned value from the
execute() call is a
Result object, which functions as an iterable that returns
Row objects with a similar interface to a named tuple. If you prefer to get the results without iterating over them, there are a few methods that can be called on this object, some of which are:
all()to return a list with a row object for each result row.
first()to return the first result row.
one()to return the first result row, and raise an exception if there are no results or more than one result.
one_or_none()to return the first result row,
Noneif there are no results, or raise an exception if there are more than one result.
Working with results as tuples makes sense when each result row can contain multiple values, but when there is a single value per row it can be tedious to extract the values out of single-element tuples. The session has two additional execution methods that make working with single-value rows more convenient to use:
ScalarResultobject with the first value of each result row. The methods of
Resultlisted above are also available on this new result object.
scalar()return the first value of the first result row.
The example legacy query shown above can be executed as follows using the new interface:
user = session.scalar(query)
If you are used to the old query object it can take some time to become familiar with the new way to issue queries, but I personally find the separation between creating and executing queries a great improvement. Once again keep in mind that the legacy queries can still be used, so you can transition gradually to the new style.
Session Context Managers
Something I'm very excited to see is the introduction of context managers for sessions, which were also included in release 1.4 to help developers with a migration towards 2.0. I have always implemented my own session context managers and even explained how to do it in a video from a few years ago.
In 1.3 and older releases, the scoped session was the main pattern for working with sessions. The Flask-SQLAlchemy extension, for example, embraced it with the
db.session variable, which is its signature. To be honest, I have never liked scoped sessions because they tie a session to a thread, which is completely arbitrary. In many cases the life of a session is much shorter than that of a thread, so you have to resort to manual management to get things to work in the intended way.
Now a session can be instantiated with a context manager, so there is a clear start and end. Here is an example:
with Session() as session: session.add(user) session.commit()
Here the session is closed when the context manager block ends. And if an error occurs inside it, the session is rolled back.
A variant of this pattern can be used to have a session that automatically commits at the end, while still rolling back on errors:
with Session() as session: with session.begin(): session.add(user)
Another interesting change introduced in release 2.0 is the option to use typing hints to declare columns and relationships in models. Consider the following definition for a
class User(Model): __tablename__ = 'users' id = Column(Integer, primary_key=True) username = Column(String(64), index=True, unique=True, nullable=False) email = Column(String(120), index=True, unique=True, nullable=False) password_hash = Column(String(128)) about_me = Column(String(140))
In 2.0, the column type can be defined with the
Mapped typing hint. If there are any additional options, they can be given in a
import sqlalchemy as sa import sqlalchemy.orm as so class User(Model): __tablename__ = 'users' id: so.Mapped[int] = so.mapped_column(primary_key=True) username: so.Mapped[str] = so.mapped_column(sa.String(64), index=True, unique=True) email: so.Mapped[str] = so.mapped_column(sa.String(120), index=True, unique=True) password_hash: so.Mapped[Optional[str]] = so.mapped_column(sa.String(128)) about_me: so.Mapped[Optional[str]] = so.mapped_column(sa.String(140))
Relationships are typed in the same way. Here is an example:
class Post(Model): # ... author: so.Mapped['User'] = so.relationship(back_populates='posts')
Note how the type can be given as a string, which is sometimes necessary to avoid forward references.
Overall this isn't very different, but using typing hints can provide some benefits:
- If you use an IDE that runs static analysis on your code and provides suggestions as you type, having a strongly typed model will help your IDE understand your code better.
- A related feature introduced in 2.0 is an integration between models and dataclasses, which also relies on the typing hints.
- One little benefit that I noticed is that there are less symbols that need to be imported from SQLAlchemy. For columns that are numbers, dates, times, or even UUIDs, you can now use the typing hint to define them, without having to import the corresponding type classes from SQLAlchemy (sadly the
String()class is still needed because for many databases a maximum length must be given).
As with the queries, the old way of defining columns and relationships continues to be supported.
If you followed my Flask tutorials, you know that I have always recommended the
lazy='dynamic' option for relationships that can be large, as this makes it possible to add pagination, sorting and filtering before SQLAlchemy goes and retrieves the related objects.
Unfortunately dynamic relationships are also considered legacy in SQLAlchemy 2.0, as they are incompatible with the new query interface. Instead, the recommended solution is a new relationship type called "Write-Only". Here is how to define a write-only relationship:
class User(Model): # ... tokens: so.WriteOnlyMapped['Token'] = so.relationship(back_populates='user')
Why the name "write-only"? The difference with the old dynamic relationship is that the write-only relationship does not load (or read) the related objects, but provides
remove() methods to make changes (or write) to it.
How do you get the related objects then? The relationship exposes a
select() method that returns a query that you can execute on the session, possibly after adding filters, sorting or pagination. This is less automatic and definitely less magical than the dynamic relationship, but it supports the same use cases.
Here is an example of how to get the
tokens objects associated with a given user, sorted by their expiration date:
tokens = session.scalars(user.tokens.select().order_by(Token.expiration)).all()
If I have to be honest, it took me a while to accept that the dynamic relationships are a thing of the past, because I really really like them. But I do understand that their design is incompatible with the new query interface. I struggled to build an application with SQLAlchemy 1.4 that did not use dynamic relationships, and complains from myself and others on the SQLAlchemy discussions board is what resulted in the addition of the write-only relationship in 2.0.
Version 1.4 introduced a beta version of the asyncio extension, which provides async versions of the
Session objects with awaitable methods. In release 2.0 this extension is not considered beta anymore.
The most important thing to keep in mind with regards to the asyncio support in SQLAlchemy 2.0 is that many of the best features of SQLAlchemy are possible because often database instructions are issued implicitly, for example as a result of the application accessing an attribute of a model instance. When using the asynchronous paradigm, implicit I/O as a result of an attribute being accessed is not possible because all database activity must happen while the application issues an
await function call, as this is what makes concurrency possible.
A big part of setting up an asynchronous solution with SQLAlchemy involves preventing all the ways in which normally there would be implicit database actions, so you will need to have at least a basic understanding of how SQLAlchemy works under the hood, and be prepared to receive obscure error messages if you missed anything. The asyncio extension documentation explains the issues and provides guidance on what needs to be done, so getting a working solution is definitely possible, but it will not be as flexible as what you get under regular Python.
I hope this was a useful review of what I consider the most important changes in SQLAlchemy 2.0. The microblog-api repository contains a complete, non-trivial API project based on Flask, with database support provided by SQLAlchemy 2.0 and my Alchemical extension. Feel free to try it out if you'd like to see the new SQLAlchemy features in action!
Learn SQLAlchemy 2 with my New Book!
That's right, I am working on a new book that will teach you SQLAlchemy 2 by building an actual database project from the very beginning and in small incremental steps. The expected release date of the book is April 30th, 2023. If you would like to secure your copy, I have made it available to pre-order from the Kindle store with a 30% introductory discount.
Become a Patron!
Hello, and thank you for visiting my blog! If you enjoyed this article, please consider supporting my work on this blog on Patreon!