2016-09-29T14:54:27Z

Implementing the "Soft Delete" Pattern with Flask and SQLAlchemy

Soft Deletes

Every time I find myself writing code to delete data from a database I get nervous. What if I later determine that I needed this piece of information, after all? For example, what if having access to this data that was deleted would have helped me reproduce or debug an issue? Or what if the data can be useful for audit purposes in a future version of the application?

You can find lots of reasons to never delete records from your database. But obviously these records that you saved from permanent deletion need to be marked as being "less interesting" than the rest, so that you have something you can use to filter them out in queries. The Soft Delete pattern is one of the available options to implement deletions without actually deleting the data. It does it by adding an extra column to your database table(s) that keeps track of the deleted state of each of its rows. This sounds straightforward to implement, and strictly speaking it is, but the complications that derive from the use of soft deletes are far from trivial. In this article I will discuss some of these issues and how I avoid them in Flask and SQLAlchemy based applications.

A Concrete Example

As way of an example, I'm going to use a Flask application that implements a simple chat service. Below you can see the two models used by this service, before there is support for soft deletes:

class User(db.Model):
    __tablename__ = 'users'
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(128))

class Message(db.Model):
    __tablename__ = 'messages'
    id = db.Column(db.Integer, primary_key=True)
    message = db.Column(db.String(256))
    user_id = db.Column(db.Integer, db.ForeignKey('users.id'))
    user = db.relationship('User')

As you can see, there is nothing strange or complicated, just a database of users and messages written by them. The complete code for the application is available on GitHub: https://github.com/miguelgrinberg/sqlalchemy-soft-delete/compare/v0.0...v0.1. Note that for this example I have not implemented authentication, since that is not the scope of this article.

What I want to do now, is allow users to delete their accounts. Physically deleting users from the database is not a great idea, as there might be messages with their user_id foreign keys pointing at deleted users, which would leave orphaned messages at best, or trigger database consistency errors at worst. The correct thing to do is to "soft delete" users who wish to delete their accounts, and that way they can stay in the database and keep any messages written by the user consistent.

Implementing Soft Deletes

The actual implementation of the soft delete feature is fairly simple. The core change is the addition of a column to the model or models that need to support the feature. In this case, I'm going to add soft delete support for users:

class User(db.Model):
    # ...
    deleted = db.Column(db.Boolean(), default=False)

For this example I have made the deleted column a boolean for simplicity, but in many cases it is convenient to make it a timestamp, so that you not only know what records are deleted, but also when each deletion took place. The implementation with a timestamp is largely identical.

Now that the application can record deleted users in the database, I can add a new endpoint that performs the soft delete:

@app.route('/users/<id>', methods=['DELETE'])
def delete_user(id):
    user = User.query.get_or_404(id)
    user.deleted = True
    db.session.commit()
    return '', 204

This looks pretty straightforward, right? Unfortunately, the above function has a small problem. If you delete a user, and then try to delete it again, the function will still succeed. This is a difference in behavior compared to a "hard delete" implementation. where a user can only be deleted once, and any attempts to repeat the call will end with a code 404 response, indicating that the user was not found. This behavior can easily be emulated for the soft deletes solution:

@app.route('/users/<id>', methods=['DELETE'])
def delete_user(id):
    user = User.query.get_or_404(id)
    if user.deleted:
        abort(404)
    user.deleted = True
    db.session.commit()
    return '', 204

Now the delete user endpoint works in the expected way, but the rest of the application knows nothing about deleted users. For example, the endpoint the returns a user should also send a code 404 response for a deleted user:

@app.route('/users/<id>', methods=['GET'])
 def get_user(id):
    user = User.query.get_or_404(id)
    if user.deleted:
        abort(404)
    return jsonify(user.to_dict())

And the endpoint that returns all users needs to filter out the deleted ones:

@app.route('/users', methods=['GET'])
def get_users():
    users = User.query.filter_by(deleted=False)
    return jsonify({'users': [u.to_dict() for u in users]})

Is that it? Unfortunately there is still more. The POST request that adds a new message should not work if the user that's posting has been deleted:

@app.route('/users/<id>/messages', methods=['POST'])
def new_message(id):
    user = User.query.get_or_404(id)
    if user.deleted:
        abort(404)
    message = Message(user_id=user.id, **request.get_json())
    db.session.add(message)
    db.session.commit()

And the JSON representation for a message should omit the URL for the user if the user is deleted:

class Message(db.Model):
    # ...
    def to_dict(self):
        return {'id': self.id, 'message': self.message,
                'url': url_for('get_message', id=self.id),
                'user_url': url_for('get_user', id=self.user_id)
                if not self.user.deleted else None}

Now we can say that the service supports user deletions in the expected way. If you want to review the complete set of changes, you can find them in the GitHub repository: https://github.com/miguelgrinberg/sqlalchemy-soft-delete/compare/v0.1...v0.2.

The good news is that from the outside, with this version of the application you cannot really tell that nothing is getting deleted, the behavior is identical to a hard delete implementation. But the cost to achieve that illusion is a bit high, don't you think? In this simple example, I had to make small adjustments to account for soft deletes in almost all API endpoints.

And that's not the worst of it, the really bad news is that I will have to remember to account for soft deleted users in any code I write in the future. In a soft delete implementation done in this style, you have to explicitly filter out the deleted records every single time. Can you imagine how difficult this can be for a large application worked on by a team of developers?

Let's find a better way.

Making Soft Deletes More Convenient

As you've seen in the previous section, the core issue that complicates the application after soft deleted users have been added is that the SQLAlchemy queries need to be modified to work correctly. Below you can see how SQLAlchemy queries change when adopting soft deletes:

Hard or no deletesSoft deletes
Active usersUser.queryUser.query.filter_by(deleted=False)
Deleted usersN/AUser.query.filter_by(deleted=True)
All usersN/AUser.query

As you can see in the table, soft deletes provide us with more flexibility in issuing queries for users, but the key problem is that the most important query of all, which is the one that returns active users, is not a trivial and expected User.query, but a more complex query. If you forget to update a query for active users to use the correct form, nothing will apparently break, which is the scary part. You just have silently added a bug to your application. And this isn't even painting the whole picture, because applications will also have queries on other entities that need to do joins against users, and those, again, need to be fixed to work correctly with the soft delete implementation.

Assuming the most common user related queries are those for active users, wouldn't it be good to find a way to alter the behavior of User.query so that, by default, only active users are returned? That would mean that in large part, existing code will continue to work in the same way as before, and only the code that explicitly needs to deal with deleted entities needs to be modified.

I started with the idea of "monkey patching" the query property of a model so that it represents a different query, one that has the condition to filter out the soft deleted entries. In the end, I decided to take advantage of the ability in Flask-SQLAlchemy to use a custom query class:

class User(db.Model):
    # ... 
    query_class = QueryWithSoftDelete

But I have to be honest here. The implementation of the QueryWithSoftDelete class is a bit tricky, it uses some tricks to achieve the goal of adding filter_by(deleted=False) to the initial query exposed by User.query. Here is my first implementation:

class QueryWithSoftDelete(BaseQuery):
    def __new__(cls, *args, **kwargs):
        obj = super(QueryWithSoftDelete, cls).__new__(cls)
        if len(args) > 0:
            super(QueryWithSoftDelete, obj).__init__(*args, **kwargs)
            return obj.filter_by(deleted=False)
        return obj

    def __init__(self, *args, **kwargs):
        pass

The new query class inherits from Flask-SQLAlchemy's BaseQuery. To obtain the modified query that I needed I had to alter the way the object is created. I could not add the filter statement in the __init__() function because in SQLAlchemy query objects are immutable, so it is not possible to update one in place. The only viable option is to do it in __new__(), where the query object is created and can be swapped out for a replacement. But to add a filter to the query I needed the query object to be initialized first, so I had to invoke the parent class __init__() function explicitly in __new__(), then add the extra filter and return that. And then I had to prevent __init__() to be called a second time by Python by adding an empty overloaded __init__() method. SQLAlchemy also has some tricks of its own when it creates queries, so I also had to account for a situation in which a query object is created without passing any arguments to the class, in which case I just use the default object creation.

With this custom query class, User.query is the user query expanded with the filter that leaves deleted objects out, and all the endpoints in the example application can be reverted to the way they were before soft deletes where added. Any new code that is written which references User.query will get only the active users.

So this is looking great, but what happens if you ever want to run a query that includes the deleted users? It would be nice if there was a way to optionally "reset" the query back to include all objects. To add that functionality I added a new method in the custom query class called with_deleted(). If you want to run a query that uses all users instead of just the active ones, then you can do it this way:

User.query.with_deleted().filter(...)

And this, I think looks pretty good. Below you can see the final version of the soft delete query class, including the with_deleted() method:

class QueryWithSoftDelete(BaseQuery):
    def __new__(cls, *args, **kwargs):
        obj = super(QueryWithSoftDelete, cls).__new__(cls)
        with_deleted = kwargs.pop('_with_deleted', False)
        if len(args) > 0:
            super(QueryWithSoftDelete, obj).__init__(*args, **kwargs)
            return obj.filter_by(deleted=False) if not with_deleted else obj
        return obj

    def __init__(self, *args, **kwargs):
        pass

    def with_deleted(self):
        return self.__class__(db.class_mapper(self._mapper_zero().class_),
                              session=db.session(), _with_deleted=True)

To get the clean query that includes all the entries I had to resort to another little trick. I added a _with_deleted keyword argument to the class, and with this argument I can instruct __new__() to include or not include the soft delete filter criteria.

The complete set of changes is available in the GitHub repository: https://github.com/miguelgrinberg/sqlalchemy-soft-delete/compare/v0.2...v0.3. And if you feel like trying out the code, you can clone the repository at https://github.com/miguelgrinberg/sqlalchemy-soft-delete and follow the instructions in the readme file.

UPDATE (12/24/2016): I've been alerted of a problem with the query.get() method, which fails to work when invoked on the new query class. The repository now includes a fix, which requires get() to be implemented in a slightly different way.

Conclusion

Before you go on to add soft deletes to all your projects I think it would be fair for me to mention that there are as many critics of the soft delete pattern as there are supporters, if not more. Some of the limitations of the pattern, as seen by its detractors are:

  • Any defined cascade settings will not trigger if soft deletes are used, since nothing gets deleted for real. This means that if you need some cascading effect in related entities of a soft deleted object, that will have to be implemented manually, or else you may end up getting deleted data incorrectly referenced by non-deleted entities, or deleted data that inadvertently appears as results from joins or other database operations that work across multiple tables.
  • A soft deleted entry in a table does not free up uniquely indexed columns. For example, a unique nickname for a deleted user will not become available for another user to take, since the actual row remains physically stored in the table.
  • Over time, a lot of the data that you have in your database can be deleted data. All this extra, mostly unneeded data could affect the performance of the database

The soft delete pattern is not the only pattern that addresses the problem of dealing with deleted data without doing physical deletions. Two other solutions that you may consider are:

  • Archive deleted to a separate table, maybe even on a separate database. This will keep the database with the active data at a reasonable size.
  • Serialize data to a log or backup file before physically deleting it.

I hope you find this technique useful. I would be very interested in hearing your thoughts about my soft delete implementation, in particular if you find situations in which my custom query object is insufficient. Let me know below in the comments!

17 comments

  • #1 Mohamed Ali said 2016-09-30T07:03:01Z

    thank you Mr. Miguel . i want ask about document database like mongodb . what's is solution

  • #2 dk said 2016-09-30T17:48:27Z

    One of my programming mentors loved soft deletes and we used them in a couple significant projects. At the end of the day I think that the additional complexity and boilerplate-type code necessary for soft-deletes isn't worth it, especially given a reasonable backup scheme.

  • #3 Miguel Grinberg said 2016-09-30T18:48:01Z

    @dk: I would not say that I love soft-deletes myself, but I recognize that sometimes deleting stuff for good is not an option, so you have to find an alternative. Making soft-deletes a bit more decent than having to implement everything by hand is a valid goal, in my opinion.

  • #4 esthom said 2016-09-30T19:42:42Z

    I do not consider `soft delete` a good pattern. I believe the way to go is Event Sourcing, where data is actually never lost, the state is maintained by a chain of events that facilitate data provenance.

  • #5 Pavel said 2016-09-30T19:45:26Z

    Hello, Miguel! Thank you for the article, that's quite a great solution to get away from filtering every time when you use soft delete.

  • #6 Pete Forman said 2016-10-02T12:21:29Z

    Another variation is that instead of a boolean delete column a pair of datetime columns can represent start and stop of validity. (In some cases stop may suffice.) I have used this for a system messages table, it is handy to be able to set messages about maintenance, etc. that will be displayed in the future. Of course that has the same issues about modifying existing queries.

  • #7 Charliie said 2016-10-18T13:24:15Z

    Is that "db.relationship" in the models correct? shouldn't db.relationship line be a User's field instead of Message's as it's a one to many relationship?

  • #8 Miguel Grinberg said 2016-10-18T18:56:27Z

    @Charlie: it works both ways. What I did here is consider the relationship from the other end, so it is a many-to-one. See http://docs.sqlalchemy.org/en/latest/orm/basic_relationships.html#many-to-one.

  • #9 Jeremy Epstein said 2016-10-28T00:11:11Z

    Thanks for the clear example of how to write a custom query class in Flask-SQLAlchemy. I used to achieve the same back in the day, with Django, using custom Managers, but haven't looked into it since I started with Flask (and I should have, it's been a while now). Personally, I usually call boolean fields like this "active" or "published" (or, if clients request it, a second boolean field called "archived"). I think these field names make more sense than "deleted", because hey, a soft delete isn't a real delete, it's just making some piece of content inactive / unpublished / archived, but it still exists in the DB. But that's just a personal preference of semantics, it's a soft delete regardless.

  • #10 Paul @ PLD said 2016-11-16T01:04:18Z

    Hello, Miguel! Thank you for the article, that's quite a great solution to get away from filtering every time when you use soft delete.

  • #11 Josh said 2016-12-24T16:33:04Z

    Thank you Miguel for the great tutorials, and I have just tried implementing the softdelete in my application as well testing with your git hub example. In both applications I am getting an error when performing the query.get_or_404(id) commands. Below is my stack trace from when I ran the API to get a specific user from you app (I also get a similar exception "Query.get() being called on a Query with existing criterion" in mine as well). Any thoughts on what is going on here? Thank you, Josh [2016-12-24 11:22:22,987] ERROR in app: Exception on /users/1 [GET] Traceback (most recent call last): File "/Users/Josh/Envs/soft-delete/lib/python3.5/site-packages/flask/app.py", line 1988, in wsgi_app response = self.full_dispatch_request() File "/Users/Josh/Envs/soft-delete/lib/python3.5/site-packages/flask/app.py", line 1641, in full_dispatch_request rv = self.handle_user_exception(e) File "/Users/Josh/Envs/soft-delete/lib/python3.5/site-packages/flask/app.py", line 1544, in handle_user_exception reraise(exc_type, exc_value, tb) File "/Users/Josh/Envs/soft-delete/lib/python3.5/site-packages/flask/_compat.py", line 33, in reraise raise value File "/Users/Josh/Envs/soft-delete/lib/python3.5/site-packages/flask/app.py", line 1639, in full_dispatch_request rv = self.dispatch_request() File "/Users/Josh/Envs/soft-delete/lib/python3.5/site-packages/flask/app.py", line 1625, in dispatch_request return self.view_functions[rule.endpoint](**req.view_args) File "/Users/Josh/tmp2/sqlalchemy-soft-delete/app.py", line 74, in get_user user = User.query.get_or_404(id) File "/Users/Josh/Envs/soft-delete/lib/python3.5/site-packages/flask_sqlalchemy/__init__.py", line 420, in get_or_404 rv = self.get(ident) File "/Users/Josh/Envs/soft-delete/lib/python3.5/site-packages/sqlalchemy/orm/query.py", line 831, in get return self._get_impl(ident, loading.load_on_ident) File "/Users/Josh/Envs/soft-delete/lib/python3.5/site-packages/sqlalchemy/orm/query.py", line 864, in _get_impl return fallback_fn(self, key) File "/Users/Josh/Envs/soft-delete/lib/python3.5/site-packages/sqlalchemy/orm/loading.py", line 173, in load_on_ident q._get_condition() File "/Users/Josh/Envs/soft-delete/lib/python3.5/site-packages/sqlalchemy/orm/query.py", line 350, in _get_condition "get", order_by=False, distinct=False) File "/Users/Josh/Envs/soft-delete/lib/python3.5/site-packages/sqlalchemy/orm/query.py", line 368, in _no_criterion_condition self._no_criterion_assertion(meth, order_by, distinct) File "/Users/Josh/Envs/soft-delete/lib/python3.5/site-packages/sqlalchemy/orm/query.py", line 365, in _no_criterion_assertion "Query with existing criterion. " % meth) sqlalchemy.exc.InvalidRequestError: Query.get() being called on a Query with existing criterion. 127.0.0.1 - - [24/Dec/2016 11:22:22] "GET /users/1 HTTP/1.1" 500 -

  • #12 Miguel Grinberg said 2016-12-24T23:33:55Z

    @Josh: odd that I did not see this back when I published the article. In any case, fixing this requires a small addition to the new query class. You can find the fix in the last commit in the git repository. Thanks for raising the issue.

  • #13 Josh said 2016-12-25T00:32:45Z

    Thank you Miguel for the fix and that did the trick. Overriding the default query.get behavior makes sense. I tried the first_or_404() with a filter by the ident and that was another option, but this is much cleaner. Thanks again!

  • #14 Josh said 2016-12-25T02:59:12Z

    Hi @Miguel one follow on question. Do you know how to enforce the deleted=False filtering on model relationships? For instance I have three models (Device, Service, Char) linked by relationships and a query on a Device returns an object like Device.services.chars. What I would like is to not have the services and chars that are deleted in that object. I tried adding query_class=QueryWithSoftDelete <e.g services = db.relationship("Service", backref="device", lazy="immediate", query_class=QueryWithSoftDelete)> property in the the db.relationship, but no luck. I also see that you handle this in the to_dict() methods by filtering out the deleted objects, but is there a better way that filters those out at the DB level? Thank you again for the help.

  • #15 Miguel Grinberg said 2016-12-25T07:15:11Z

    @Josh: it really depends on how you want to design your data models, but if you don't want certain objects to appear in relationships, then the proper thing to do would be to remove them from those relationships. That's what happens when you do actual deletes via a cascade clause. If you have a good reason to leave those linked, then writing a method or property that returns the query with the deleted entities filtered is a good option, in my opinion.

  • #16 Piet hendriks said 2017-08-09T07:39:58Z

    The filter does also not work when I add labels to a query, e.g.: MyObject.name.label('Name'). Because the label object does not have the attribute "deleted"

  • #17 Alec Nikolas Reiter said 2017-09-04T01:01:20Z

    Ended up using this in a project, however there is one issue. If soft deletes are applied after the fact you need to either backfill the data (not a terrible idea) or make a modification in `__new__` to look for either deleted=False OR deleted=None. The following fixed my issue: entity = inst._mapper_zero().class_ return inst.filter(db.or_(entity.deleted == False, entity.deleted == None)) if not with_deleted else inst

Leave a Comment