What's New in SQLAlchemy 2.0?

Posted by
on under

You may have heard that a major version of SQLAlchemy, version 2.0, has been released in January 2023. Or maybe you missed the announcement and this is news to you. Either way, I thought you'd be curious to know what's new in it, if it is worth upgrading and how difficult the upgrade is.

As with previous software reviews, this is going to be an opinionated overview. I have been using the SQLAlchemy ORM in web projects for a long time, so in this article I will discuss the features that affect my own work, both in positive or negative ways. If instead you are interested to see a list of every change that went into this new release, then the official change log is the place to go.

For The Impatient

If you just want to see how a real-world project written for SQLAlchemy 2.0 looks like and don't want to read through the analysis that follows, you can head over to my microblog-api project, which I just upgraded to use the latest SQLAlchemy features.

No More Query Object

The first change I'm going to discuss is the new query interface. To be exact, this feature has been introduced in SQLAlchemy 1.4 releases as a way to help developers transition to 2.0, so you may have already seen it.

The now "legacy" way to issue queries in the SQLAlchemy ORM consisted in using a Query object, available from the Session.query() method, or for those using the Flask-SQLAlchemy extension for Flask, also as Model.query. Here are some examples:

# using native SQLAlchemy
user = session.query(User).filter(User.username == 'susan').first()

# using Flask-SQLAlchemy
user = User.query.filter(User.username == 'susan').first()

In release 2.0 this is considered the "old" way of executing queries. You can still run queries in this way, but the documentation refers to this interface as the "1.x Query API" or the "legacy Query API".

The new Query API has a very clear separation between the queries themselves and the execution environment in which they run. The above query to look for a user by the username attribute can now be written as follows:

query = select(User).where(User.username == 'susan')

This example stores the query in the query variable. At this point the query has not executed yet, and is not even associated with a session.

To execute the query, it can be passed to the execute() method of a session object:

results = session.execute(query)

The returned value from the execute() call is a Result object, which functions as an iterable that returns Row objects with a similar interface to a named tuple. If you prefer to get the results without iterating over them, there are a few methods that can be called on this object, some of which are:

  • all() to return a list with a row object for each result row.
  • first() to return the first result row.
  • one() to return the first result row, and raise an exception if there are no results or more than one result.
  • one_or_none() to return the first result row, None if there are no results, or raise an exception if there are more than one result.

Working with results as tuples makes sense when each result row can contain multiple values, but when there is a single value per row it can be tedious to extract the values out of single-element tuples. The session has two additional execution methods that make working with single-value rows more convenient to use:

  • scalars() returns a ScalarResult object with the first value of each result row. The methods of Result listed above are also available on this new result object.
  • scalar() return the first value of the first result row.

The example legacy query shown above can be executed as follows using the new interface:

user = session.scalar(query)

If you are used to the old query object it can take some time to become familiar with the new way to issue queries, but I personally find the separation between creating and executing queries a great improvement. Once again keep in mind that the legacy queries can still be used, so you can transition gradually to the new style.

Session Context Managers

Something I'm very excited to see is the introduction of context managers for sessions, which were also included in release 1.4 to help developers with a migration towards 2.0. I have always implemented my own session context managers and even explained how to do it in a video from a few years ago.

In 1.3 and older releases, the scoped session was the main pattern for working with sessions. The Flask-SQLAlchemy extension, for example, embraced it with the db.session variable, which is its signature. To be honest, I have never liked scoped sessions because they tie a session to a thread, which is completely arbitrary. In many cases the life of a session is much shorter than that of a thread, so you have to resort to manual management to get things to work in the intended way.

Now a session can be instantiated with a context manager, so there is a clear start and end. Here is an example:

with Session() as session:
    session.add(user)
    session.commit()

Here the session is closed when the context manager block ends. And if an error occurs inside it, the session is rolled back.

A variant of this pattern can be used to have a session that automatically commits at the end, while still rolling back on errors:

with Session() as session:
    with session.begin():
        session.add(user)

Typing Improvements

Another interesting change introduced in release 2.0 is the option to use typing hints to declare columns and relationships in models. Consider the following definition for a User model:

class User(Model):
    __tablename__ = 'users'

    id = Column(Integer, primary_key=True)
    username = Column(String(64), index=True, unique=True, nullable=False)
    email = Column(String(120), index=True, unique=True, nullable=False)
    password_hash = Column(String(128))
    about_me = Column(String(140))

In 2.0, the column type can be defined with the Mapped typing hint. If there are any additional options, they can be given in a mapped_column() call.

import sqlalchemy as sa
import sqlalchemy.orm as so

class User(Model):
    __tablename__ = 'users'

    id: so.Mapped[int] = so.mapped_column(primary_key=True)
    username: so.Mapped[str] = so.mapped_column(sa.String(64), index=True, unique=True)
    email: so.Mapped[str] = so.mapped_column(sa.String(120), index=True, unique=True)
    password_hash: so.Mapped[Optional[str]] = so.mapped_column(sa.String(128))
    about_me: so.Mapped[Optional[str]] = so.mapped_column(sa.String(140))

Relationships are typed in the same way. Here is an example:

class Post(Model):
    # ...
    author: so.Mapped['User'] = so.relationship(back_populates='posts')

Note how the type can be given as a string, which is sometimes necessary to avoid forward references.

Overall this isn't very different, but using typing hints can provide some benefits:

  • If you use an IDE that runs static analysis on your code and provides suggestions as you type, having a strongly typed model will help your IDE understand your code better.
  • A related feature introduced in 2.0 is an integration between models and dataclasses, which also relies on the typing hints.
  • One little benefit that I noticed is that there are less symbols that need to be imported from SQLAlchemy. For columns that are numbers, dates, times, or even UUIDs, you can now use the typing hint to define them, without having to import the corresponding type classes from SQLAlchemy (sadly the String() class is still needed because for many databases a maximum length must be given).

As with the queries, the old way of defining columns and relationships continues to be supported.

Write-Only Relationships

If you followed my Flask tutorials, you know that I have always recommended the lazy='dynamic' option for relationships that can be large, as this makes it possible to add pagination, sorting and filtering before SQLAlchemy goes and retrieves the related objects.

Unfortunately dynamic relationships are also considered legacy in SQLAlchemy 2.0, as they are incompatible with the new query interface. Instead, the recommended solution is a new relationship type called "Write-Only". Here is how to define a write-only relationship:

class User(Model):
    # ...
    tokens: so.WriteOnlyMapped['Token'] = so.relationship(back_populates='user')

Why the name "write-only"? The difference with the old dynamic relationship is that the write-only relationship does not load (or read) the related objects, but provides add() and remove() methods to make changes (or write) to it.

How do you get the related objects then? The relationship exposes a select() method that returns a query that you can execute on the session, possibly after adding filters, sorting or pagination. This is less automatic and definitely less magical than the dynamic relationship, but it supports the same use cases.

Here is an example of how to get the tokens objects associated with a given user, sorted by their expiration date:

tokens = session.scalars(user.tokens.select().order_by(Token.expiration)).all()

If I have to be honest, it took me a while to accept that the dynamic relationships are a thing of the past, because I really really like them. But I do understand that their design is incompatible with the new query interface. I struggled to build an application with SQLAlchemy 1.4 that did not use dynamic relationships, and complains from myself and others on the SQLAlchemy discussions board is what resulted in the addition of the write-only relationship in 2.0.

Async Support

Version 1.4 introduced a beta version of the asyncio extension, which provides async versions of the Engine and Session objects with awaitable methods. In release 2.0 this extension is not considered beta anymore.

The most important thing to keep in mind with regards to the asyncio support in SQLAlchemy 2.0 is that many of the best features of SQLAlchemy are possible because often database instructions are issued implicitly, for example as a result of the application accessing an attribute of a model instance. When using the asynchronous paradigm, implicit I/O as a result of an attribute being accessed is not possible because all database activity must happen while the application issues an await function call, as this is what makes concurrency possible.

A big part of setting up an asynchronous solution with SQLAlchemy involves preventing all the ways in which normally there would be implicit database actions, so you will need to have at least a basic understanding of how SQLAlchemy works under the hood, and be prepared to receive obscure error messages if you missed anything. The asyncio extension documentation explains the issues and provides guidance on what needs to be done, so getting a working solution is definitely possible, but it will not be as flexible as what you get under regular Python.

Conclusion

I hope this was a useful review of what I consider the most important changes in SQLAlchemy 2.0. The microblog-api repository contains a complete, non-trivial API project based on Flask, with database support provided by SQLAlchemy 2.0 and my Alchemical extension. Feel free to try it out if you'd like to see the new SQLAlchemy features in action!

Learn SQLAlchemy 2 with my New Book!

That's right, I am working on a new book that will teach you SQLAlchemy 2 by building an actual database project from the very beginning and in small incremental steps. The expected release date of the book is April 30th, 2023. If you would like to secure your copy, I have made it available to pre-order from the Kindle store with a 30% introductory discount.

Become a Patron!

Hello, and thank you for visiting my blog! If you enjoyed this article, please consider supporting my work on this blog on Patreon!

34 comments
  • #26 Miguel Grinberg said

    @Kescopee: Did you import the products first? You have some products in the orders file that are not in your database. They should be in it if you had imported the products data file first.

  • #27 Kescopee said

    Hi Miguel,

    Thank you for your help. That fixed the problem.

  • #28 Kescopee said

    I completed your book and I found it to be very engaging, informative and well-written, with key technical details about SQLAlchemy 2. Overall, an awesome book and I highly recommend it to anyone who wants to learn more about SQLAlchemy. Thank you for sharing your knowledge with the world!

    I have a question about writing tests (e.g. using pytest). How can I create a separate alchemical database instance and still use the existing models in the models.py file? Correct me if I'm wrong, but can models be shared between different database instances each with a dedicated database e.g. test_db and production_db?

  • #29 Miguel Grinberg said

    @Kescopee: You will have a single database instance, not multiple ones. What changes is what database URL you configured on the one instance. The models are all associated with this instance, so they'll work just fine regardless of which database URL (production, testing, development, etc.) you use.

  • #30 Chris Tucker said

    Hello Miguel,

    Apologies if this comment is inappropriate here but I can find no other way of reporting errata in your book SQLAlchemy2 In Practice.
    I'm working through your chapter 2, using MariaDB. Running import_products.py against the products.csv from github gives this error: "sqlalchemy.exc.OperationalError: (mariadb.OperationalError) Incorrect string value: '\xC5\xA4O' for column retrofun.products.name at row 135". I'm an SQL novice and don't know how to work around this problem, other than by deleting the offending line in the csv file, which resolves the problem.

  • #31 Miguel Grinberg said

    @Chris: I think the default collation in MariaDB is latin1, unlike MySQL which uses UTF-8. The dataset that I'm using is based on a UTF-8 character encoding. Your solution is fine, if you remove all non-standard characters then latin1 and utf-8 are equal.

  • #32 Chris Tucker said

    Hello Miguel,
    Thank you. Encoding and collation will be lessons for another day!

  • #33 waqar ahmad said

    for flask-sqlalchemy sessions can i use
    with db.session as session:
    session.add(user)
    session.commit()
    also, in this case the flask-sqlalchemy will create a scooped session or not? if yes, then the above context manager session will be under a scooped session?

  • #34 Miguel Grinberg said

    @waqar: I haven't looked at the Flask-SQAlchemy implementation changes when they added SQLAlchemy 2.0 compatibility, but I believe they continue to use scoped sessions, like they did in previous versions.

Leave a Comment