Implementing the "Soft Delete" Pattern with Flask and SQLAlchemy

Posted by
on under

Soft Deletes

Every time I find myself writing code to delete data from a database I get nervous. What if I later determine that I needed this piece of information, after all? For example, what if having access to this data that was deleted would have helped me reproduce or debug an issue? Or what if the data can be useful for audit purposes in a future version of the application?

You can find lots of reasons to never delete records from your database. But obviously these records that you saved from permanent deletion need to be marked as being "less interesting" than the rest, so that you have something you can use to filter them out in queries. The Soft Delete pattern is one of the available options to implement deletions without actually deleting the data. It does it by adding an extra column to your database table(s) that keeps track of the deleted state of each of its rows. This sounds straightforward to implement, and strictly speaking it is, but the complications that derive from the use of soft deletes are far from trivial. In this article I will discuss some of these issues and how I avoid them in Flask and SQLAlchemy based applications.

A Concrete Example

As way of an example, I'm going to use a Flask application that implements a simple chat service. Below you can see the two models used by this service, before there is support for soft deletes:

class User(db.Model):
    __tablename__ = 'users'
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(128))

class Message(db.Model):
    __tablename__ = 'messages'
    id = db.Column(db.Integer, primary_key=True)
    message = db.Column(db.String(256))
    user_id = db.Column(db.Integer, db.ForeignKey('users.id'))
    user = db.relationship('User')

As you can see, there is nothing strange or complicated, just a database of users and messages written by them. The complete code for the application is available on GitHub: https://github.com/miguelgrinberg/sqlalchemy-soft-delete/compare/v0.0...v0.1. Note that for this example I have not implemented authentication, since that is not the scope of this article.

What I want to do now, is allow users to delete their accounts. Physically deleting users from the database is not a great idea, as there might be messages with their user_id foreign keys pointing at deleted users, which would leave orphaned messages at best, or trigger database consistency errors at worst. The correct thing to do is to "soft delete" users who wish to delete their accounts, and that way they can stay in the database and keep any messages written by the user consistent.

Implementing Soft Deletes

The actual implementation of the soft delete feature is fairly simple. The core change is the addition of a column to the model or models that need to support the feature. In this case, I'm going to add soft delete support for users:

class User(db.Model):
    # ...
    deleted = db.Column(db.Boolean(), default=False)

For this example I have made the deleted column a boolean for simplicity, but in many cases it is convenient to make it a timestamp, so that you not only know what records are deleted, but also when each deletion took place. The implementation with a timestamp is largely identical.

Now that the application can record deleted users in the database, I can add a new endpoint that performs the soft delete:

@app.route('/users/<id>', methods=['DELETE'])
def delete_user(id):
    user = User.query.get_or_404(id)
    user.deleted = True
    db.session.commit()
    return '', 204

This looks pretty straightforward, right? Unfortunately, the above function has a small problem. If you delete a user, and then try to delete it again, the function will still succeed. This is a difference in behavior compared to a "hard delete" implementation. where a user can only be deleted once, and any attempts to repeat the call will end with a code 404 response, indicating that the user was not found. This behavior can easily be emulated for the soft deletes solution:

@app.route('/users/<id>', methods=['DELETE'])
def delete_user(id):
    user = User.query.get_or_404(id)
    if user.deleted:
        abort(404)
    user.deleted = True
    db.session.commit()
    return '', 204

Now the delete user endpoint works in the expected way, but the rest of the application knows nothing about deleted users. For example, the endpoint the returns a user should also send a code 404 response for a deleted user:

@app.route('/users/<id>', methods=['GET'])
 def get_user(id):
    user = User.query.get_or_404(id)
    if user.deleted:
        abort(404)
    return jsonify(user.to_dict())

And the endpoint that returns all users needs to filter out the deleted ones:

@app.route('/users', methods=['GET'])
def get_users():
    users = User.query.filter_by(deleted=False)
    return jsonify({'users': [u.to_dict() for u in users]})

Is that it? Unfortunately there is still more. The POST request that adds a new message should not work if the user that's posting has been deleted:

@app.route('/users/<id>/messages', methods=['POST'])
def new_message(id):
    user = User.query.get_or_404(id)
    if user.deleted:
        abort(404)
    message = Message(user_id=user.id, **request.get_json())
    db.session.add(message)
    db.session.commit()

And the JSON representation for a message should omit the URL for the user if the user is deleted:

class Message(db.Model):
    # ...
    def to_dict(self):
        return {'id': self.id, 'message': self.message,
                'url': url_for('get_message', id=self.id),
                'user_url': url_for('get_user', id=self.user_id)
                if not self.user.deleted else None}

Now we can say that the service supports user deletions in the expected way. If you want to review the complete set of changes, you can find them in the GitHub repository: https://github.com/miguelgrinberg/sqlalchemy-soft-delete/compare/v0.1...v0.2.

The good news is that from the outside, with this version of the application you cannot really tell that nothing is getting deleted, the behavior is identical to a hard delete implementation. But the cost to achieve that illusion is a bit high, don't you think? In this simple example, I had to make small adjustments to account for soft deletes in almost all API endpoints.

And that's not the worst of it, the really bad news is that I will have to remember to account for soft deleted users in any code I write in the future. In a soft delete implementation done in this style, you have to explicitly filter out the deleted records every single time. Can you imagine how difficult this can be for a large application worked on by a team of developers?

Let's find a better way.

Making Soft Deletes More Convenient

As you've seen in the previous section, the core issue that complicates the application after soft deleted users have been added is that the SQLAlchemy queries need to be modified to work correctly. Below you can see how SQLAlchemy queries change when adopting soft deletes:

Hard or no deletesSoft deletes
Active usersUser.queryUser.query.filter_by(deleted=False)
Deleted usersN/AUser.query.filter_by(deleted=True)
All usersN/AUser.query

As you can see in the table, soft deletes provide us with more flexibility in issuing queries for users, but the key problem is that the most important query of all, which is the one that returns active users, is not a trivial and expected User.query, but a more complex query. If you forget to update a query for active users to use the correct form, nothing will apparently break, which is the scary part. You just have silently added a bug to your application. And this isn't even painting the whole picture, because applications will also have queries on other entities that need to do joins against users, and those, again, need to be fixed to work correctly with the soft delete implementation.

Assuming the most common user related queries are those for active users, wouldn't it be good to find a way to alter the behavior of User.query so that, by default, only active users are returned? That would mean that in large part, existing code will continue to work in the same way as before, and only the code that explicitly needs to deal with deleted entities needs to be modified.

I started with the idea of "monkey patching" the query property of a model so that it represents a different query, one that has the condition to filter out the soft deleted entries. In the end, I decided to take advantage of the ability in Flask-SQLAlchemy to use a custom query class:

class User(db.Model):
    # ... 
    query_class = QueryWithSoftDelete

But I have to be honest here. The implementation of the QueryWithSoftDelete class is a bit tricky, it uses some tricks to achieve the goal of adding filter_by(deleted=False) to the initial query exposed by User.query. Here is my first implementation:

class QueryWithSoftDelete(BaseQuery):
    def __new__(cls, *args, **kwargs):
        obj = super(QueryWithSoftDelete, cls).__new__(cls)
        if len(args) > 0:
            super(QueryWithSoftDelete, obj).__init__(*args, **kwargs)
            return obj.filter_by(deleted=False)
        return obj

    def __init__(self, *args, **kwargs):
        pass

The new query class inherits from Flask-SQLAlchemy's BaseQuery. To obtain the modified query that I needed I had to alter the way the object is created. I could not add the filter statement in the __init__() function because in SQLAlchemy query objects are immutable, so it is not possible to update one in place. The only viable option is to do it in __new__(), where the query object is created and can be swapped out for a replacement. But to add a filter to the query I needed the query object to be initialized first, so I had to invoke the parent class __init__() function explicitly in __new__(), then add the extra filter and return that. And then I had to prevent __init__() to be called a second time by Python by adding an empty overloaded __init__() method. SQLAlchemy also has some tricks of its own when it creates queries, so I also had to account for a situation in which a query object is created without passing any arguments to the class, in which case I just use the default object creation.

With this custom query class, User.query is the user query expanded with the filter that leaves deleted objects out, and all the endpoints in the example application can be reverted to the way they were before soft deletes where added. Any new code that is written which references User.query will get only the active users.

So this is looking great, but what happens if you ever want to run a query that includes the deleted users? It would be nice if there was a way to optionally "reset" the query back to include all objects. To add that functionality I added a new method in the custom query class called with_deleted(). If you want to run a query that uses all users instead of just the active ones, then you can do it this way:

User.query.with_deleted().filter(...)

And this, I think looks pretty good. Below you can see the final version of the soft delete query class, including the with_deleted() method:

class QueryWithSoftDelete(BaseQuery):
    def __new__(cls, *args, **kwargs):
        obj = super(QueryWithSoftDelete, cls).__new__(cls)
        with_deleted = kwargs.pop('_with_deleted', False)
        if len(args) > 0:
            super(QueryWithSoftDelete, obj).__init__(*args, **kwargs)
            return obj.filter_by(deleted=False) if not with_deleted else obj
        return obj

    def __init__(self, *args, **kwargs):
        pass

    def with_deleted(self):
        return self.__class__(self._only_full_mapper_zero('get'),
                              session=db.session(), _with_deleted=True)

To get the clean query that includes all the entries I had to resort to another little trick. I added a _with_deleted keyword argument to the class, and with this argument I can instruct __new__() to include or not include the soft delete filter criteria.

The complete set of changes is available in the GitHub repository: https://github.com/miguelgrinberg/sqlalchemy-soft-delete/compare/v0.2...v0.3. And if you feel like trying out the code, you can clone the repository at https://github.com/miguelgrinberg/sqlalchemy-soft-delete and follow the instructions in the readme file.

UPDATE (12/24/2016): I've been alerted of a problem with the query.get() method, which fails to work when invoked on the new query class. The repository now includes a fix, which requires get() to be implemented in a slightly different way.

Conclusion

Before you go on to add soft deletes to all your projects I think it would be fair for me to mention that there are as many critics of the soft delete pattern as there are supporters, if not more. Some of the limitations of the pattern, as seen by its detractors are:

  • Any defined cascade settings will not trigger if soft deletes are used, since nothing gets deleted for real. This means that if you need some cascading effect in related entities of a soft deleted object, that will have to be implemented manually, or else you may end up getting deleted data incorrectly referenced by non-deleted entities, or deleted data that inadvertently appears as results from joins or other database operations that work across multiple tables.
  • A soft deleted entry in a table does not free up uniquely indexed columns. For example, a unique nickname for a deleted user will not become available for another user to take, since the actual row remains physically stored in the table.
  • Over time, a lot of the data that you have in your database can be deleted data. All this extra, mostly unneeded data could affect the performance of the database

The soft delete pattern is not the only pattern that addresses the problem of dealing with deleted data without doing physical deletions. Two other solutions that you may consider are:

  • Archive deleted to a separate table, maybe even on a separate database. This will keep the database with the active data at a reasonable size.
  • Serialize data to a log or backup file before physically deleting it.

I hope you find this technique useful. I would be very interested in hearing your thoughts about my soft delete implementation, in particular if you find situations in which my custom query object is insufficient. Let me know below in the comments!

Become a Patron!

Hello, and thank you for visiting my blog! If you enjoyed this article, please consider supporting my work on this blog on Patreon!

30 comments
  • #26 Miguel Grinberg said

    @AK: I have updated the repository and the article with code that is also compatible with SQLAlchemy 1.4.

  • #27 Jan said

    Hi, I'm implementing this feature now in my project, so I looked for inspiration here. Your trick with Query is really nice, is it possible to similarily patch select in new SQLAlchemy 2 convention?
    I'm thinking something like:

    @classmethod
    def select(cls):
      return db.session.select(cls).where(deleted==False)
    

    would this work? and might there be some problem with this? (inspiration for cls.select is taken from your Alchemical.

    Thanks for your thoughts

  • #28 Miguel Grinberg said

    @Jan: Sure, it is actually less magical when you do it with the new query style. You can add a with_deleted=False argument to the overriden select() method if you want to give the option to include the soft-deleted entities as well.

  • #29 zz42 said

    Hi , it works when i use flask-sqlalchemy style 'data = User.query.all()' but not working when using native sqlalchemy style 'db.session.query(User).all()', is there a way to do this with native sqlalchemy?

  • #30 Miguel Grinberg said

    @zz42: did you notice the date in this blog post? This was written in 2016. What you call "native" SQLAlchemy did not exist back then. The same ideas can be applied to the new query syntax in SQLAlchemy 2, but it needs to be adapted, the code in this article is not designed for that.

Leave a Comment