2012-11-04T08:00:41Z

The Flask Mega-Tutorial, Part X: Full Text Search

(Great news! There is a new version of this tutorial!)

This is the tenth article in the series in which I document my experience writing web applications in Python using the Flask microframework.

The goal of the tutorial series is to develop a decently featured microblogging application that, demonstrating my total lack of originality, I have decided to call microblog.

NOTE: This article was revised in September 2014 to be in sync with current versions of Python and Flask.


Recap

In the previous article in the series we've enhanced our database queries so that we can get results on pages.

Today, we are going to continue working on our database, but in a different area. All applications that store content must provide a search capability.

For many types of web sites it is possible to just let Google, Bing, etc. index all the content and provide the search results. This works well for sites that have mostly static pages, like a forum. In our little microblog application the basic unit of content is just a short user post, not a whole page, and the search results that we want are dynamic. For example, if we search for the word "dog" we want to see the blog posts from any user that include that word. Until someone searches for that word there is no page with these results that the big search engines could have indexed, so we have no choice other than rolling our own search.

Introduction to full text search engines

Unfortunately, support for full text search in relational databases is not well standardized. Each database implements full text search in its own way, and SQLAlchemy at this time does not have a full text search abstraction.

We are currently using SQLite for our database, so we could just create a full text index using the facilities provided by SQLite, bypassing SQLAlchemy. But that isn't a good idea, because if one day we decide to switch to another database we would need to rewrite our full text search capability for the new one.

So instead, we are going to let our database deal with the regular data, and we are going to create a specialized database that will be dedicated to text searches.
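To make the idea of a separate text index more concrete, here is a minimal sketch of an inverted index, the kind of data structure that full text engines are built around. This is only a toy written with the standard library, not how Whoosh is implemented, and the names ToyIndex and tokenize are made up for the illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyIndex:
    """A tiny in-memory inverted index: word -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        """Record every word of the text as pointing at doc_id."""
        for word in tokenize(text):
            self.postings[word].add(doc_id)

    def search(self, word):
        """Return the sorted ids of the documents that contain the word."""
        return sorted(self.postings.get(word.lower(), set()))


index = ToyIndex()
index.add(1, "my first post")
index.add(2, "my second post")
index.add(3, "I walked my dog today")

print(index.search("post"))  # ids of the documents that contain "post"
print(index.search("dog"))   # ids of the documents that contain "dog"
```

The relational database keeps the posts themselves, while a structure like this only maps words back to post ids. A real engine adds ranking, stemming and persistence on top of the same basic idea.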

There are a few open source full text search engines. The only one that to my knowledge has a Flask extension is Whoosh, an engine also written in Python. The advantage of using a pure Python engine is that it will install and run anywhere a Python interpreter is available. The disadvantage is that search performance will not be up to par with other engines that are written in C or C++. In my opinion the ideal solution would be to have a Flask extension that can connect to several engines and abstract us from dealing with a particular one in the same way Flask-SQLAlchemy gives us the freedom to use several database engines, but nothing of that kind seems to be available for full text searching at this time. Django developers do have a very nice extension that supports several full text search engines called django-haystack. Maybe one day someone will create a similar extension for Flask.

But for now, we'll implement our text searching with Whoosh. The extension that we are going to use is Flask-WhooshAlchemy, which integrates a Whoosh database with Flask-SQLAlchemy models.

Python 3 Compatibility

Unfortunately, we have a problem with Python 3 and these packages. The Flask-WhooshAlchemy extension was never made compatible with Python 3. I have forked this extension and made a few changes to make it work, so if you are on Python 3 you will need to uninstall the official version and install my fork:

$ flask/bin/pip uninstall flask-whooshalchemy
$ flask/bin/pip install git+https://github.com/miguelgrinberg/flask-whooshalchemy.git

Sadly, this isn't the only problem. Whoosh also seems to have issues with Python 3. In my testing I have encountered this bug, and to my knowledge there isn't a solution available, which means that at this time the full text search capability does not work well on Python 3. I will update this section once the issues are resolved.

Configuration

Configuration for Flask-WhooshAlchemy is pretty simple. We just need to tell the extension the name of the full text search database (file config.py):

WHOOSH_BASE = os.path.join(basedir, 'search.db')

Model changes

Since Flask-WhooshAlchemy integrates with Flask-SQLAlchemy, we indicate what data is to be indexed for searching in the proper model class (file app/models.py):

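from app import app, db

# Full text search does not work on Python 3 yet (see above),
# so the Whoosh extension is only imported on Python 2.
import sys
if sys.version_info >= (3, 0):
    enable_search = False
else:
    enable_search = True
    import flask_whooshalchemy as whooshalchemy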

class Post(db.Model):
    __searchable__ = ['body']

    id = db.Column(db.Integer, primary_key=True)
    body = db.Column(db.String(140))
    timestamp = db.Column(db.DateTime)
    user_id = db.Column(db.Integer, db.ForeignKey('user.id'))

    def __repr__(self):
        return '<Post %r>' % (self.body)

if enable_search:
    whooshalchemy.whoosh_index(app, Post)

The model has a new __searchable__ attribute, which is a list with all the database fields that will be in the searchable index. In our case we only want to index the body field of our posts.

We also have to initialize the full text index for this model by calling the whoosh_index function. Note that since we know that the search capability currently does not work on Python 3 we have to skip its initialization. Once the problems in Whoosh are fixed the logic around enable_search can be removed.

Since this isn't a change that affects the format of our relational database we do not need to record a new migration.

Unfortunately any posts that were in the database before the full text engine was added will not be indexed. To make sure the database and the full text engine are synchronized we are going to delete all posts from the database and start over. First we start the Python interpreter. For Windows users:

flask\Scripts\python

And for everyone else:

flask/bin/python

Then in the Python prompt we delete all the posts:

>>> from app.models import Post
>>> from app import db
>>> for post in Post.query.all():
...     db.session.delete(post)
...
>>> db.session.commit()

Searching

And now we are ready to start searching. First let's add a few new posts to the database. We have two options to do this. We can just start the application and enter posts via the web browser, as regular users would do, or we can also do it in the Python prompt.

From the Python prompt we can do it as follows:

>>> from app.models import User, Post
>>> from app import db
>>> import datetime
>>> u = User.query.get(1)
>>> p = Post(body='my first post', timestamp=datetime.datetime.utcnow(), author=u)
>>> db.session.add(p)
>>> p = Post(body='my second post', timestamp=datetime.datetime.utcnow(), author=u)
>>> db.session.add(p)
>>> p = Post(body='my third and last post', timestamp=datetime.datetime.utcnow(), author=u)
>>> db.session.add(p)
>>> db.session.commit()

The Flask-WhooshAlchemy extension is nice, because it hooks into Flask-SQLAlchemy commits automatically. We do not need to maintain the full text index ourselves; it is all done for us transparently.
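The following toy sketch illustrates the general "index on commit" pattern, and also why posts that were committed before the index existed never make it into it. It does not use Flask-SQLAlchemy or Whoosh at all. ToySession, index_rows and search_index are hypothetical names invented for this example, and the real extension's internals may well differ.

```python
class ToySession:
    """A stand-in for a database session that notifies listeners on commit."""

    def __init__(self):
        self.stored = []     # rows that have been committed
        self.pending = []    # rows added but not yet committed
        self.listeners = []  # callbacks invoked with the newly committed rows

    def add(self, row):
        self.pending.append(row)

    def commit(self):
        committed, self.pending = self.pending, []
        self.stored.extend(committed)
        for listener in self.listeners:
            listener(committed)


search_index = set()  # a trivial "index": the set of all words seen so far


def index_rows(rows):
    """Listener that indexes every word of every newly committed row."""
    for body in rows:
        search_index.update(body.lower().split())


session = ToySession()

# This row is committed before the indexing listener is connected.
session.add("an old post")
session.commit()

# From here on, every commit also updates the index.
session.listeners.append(index_rows)
session.add("my first post")
session.commit()

print("first" in search_index)  # the new post was indexed
print("old" in search_index)    # the earlier post never reached the index
```

Because the index is only fed by commits that happen while the listener is connected, older rows have to be committed again to show up in searches, which is why we emptied the posts table earlier.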

Now that we have a few posts in our full text index we can issue searches:

>>> Post.query.whoosh_search('post').all()
[<Post u'my second post'>, <Post u'my first post'>, <Post u'my third and last post'>]
>>> Post.query.whoosh_search('second').all()
[<Post u'my second post'>]
>>> Post.query.whoosh_search('second OR last').all()
[<Post u'my second post'>, <Post u'my third and last post'>]

As you can see in the examples above, the queries do not need to be limited to single words. In fact, Whoosh supports a pretty powerful search query language.
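To show what the OR in the last query is doing, here is a toy matcher that mimics just that one feature. It is not Whoosh's query parser, which is far richer and also ranks results. The sketch assumes that terms without an operator between them must all be present, which as far as I know is the default behavior, and the function names are made up for the example.

```python
def matches(body, query):
    """Return True if body satisfies a query such as 'a b OR c'.

    Clauses are separated by the literal word OR. Within a clause,
    every term must appear in the body (an implicit AND).
    """
    words = set(body.lower().split())
    clauses = [clause.split() for clause in query.split(" OR ")]
    return any(
        bool(clause) and all(term.lower() in words for term in clause)
        for clause in clauses
    )


posts = ["my first post", "my second post", "my third and last post"]


def toy_search(query):
    """Return the posts that match the query, in their original order."""
    return [body for body in posts if matches(body, query)]


print(toy_search("second OR last"))  # posts with either word
print(toy_search("second last"))     # posts with both words
```

Unlike the real engine, this toy returns matches in insertion order instead of by relevance, so the order of its results is not comparable to the order in the session above.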

Integrating full text searches into the application

To make the searching capability available to our application's users we have to add just a few small changes.

Configuration

As far as configuration goes, we'll just indicate the maximum number of search results that should be returned (file config.py):

MAX_SEARCH_RESULTS = 50

Search form

We are going to add a search form to the navigation bar at the top of the page. Putting the search box at the top is nice, because then the search will be accessible from all pages.

First we add a search form class (file app/forms.py):

class SearchForm(Form):
    search = StringField('search', validators=[DataRequired()])

Then we need to create a search form object and make it available to all templates, since we will be putting the search form in the navigation bar that is common to all pages. The easiest way to achieve this is to create the form in the before_request handler, and then stick it in Flask's global g (file app/views.py):

from .forms import SearchForm

@app.before_request
def before_request():
    g.user = current_user
    if g.user.is_authenticated:
        g.user.last_seen = datetime.utcnow()
        db.session.add(g.user)
        db.session.commit()
        g.search_form = SearchForm()

Then we add the form to our template (file app/templates/base.html):

<div>Microblog:
    <a href="{{ url_for('index') }}">Home</a>
    {% if g.user.is_authenticated %}
    | <a href="{{ url_for('user', nickname=g.user.nickname) }}">Your Profile</a>
    | <form style="display: inline;" action="{{ url_for('search') }}" method="post" name="search">{{ g.search_form.hidden_tag() }}{{ g.search_form.search(size=20) }}<input type="submit" value="Search"></form>
    | <a href="{{ url_for('logout') }}">Logout</a>
    {% endif %}
</div>

Note that we only display the form when we have a logged in user. Likewise, the before_request handler will only create a form when a user is logged in, since our application does not show any content to guests that are not authenticated.

Search view function

The action attribute of our form was set above to send all search requests to the search view function. This is where we will be issuing our full text queries (file app/views.py):

@app.route('/search', methods=['POST'])
@login_required
def search():
    if not g.search_form.validate_on_submit():
        return redirect(url_for('index'))
    return redirect(url_for('search_results', query=g.search_form.search.data))

This function doesn't really do much. It just collects the search query from the form and then redirects to another page, passing this query as an argument. The reason the search work isn't done directly here is that if a user then hits the refresh button the browser will put up a warning indicating that form data will be resubmitted. This is avoided when the response to a POST request is a redirect, because after the redirect the browser's refresh button will reload the redirected page.
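The redirect trick described above can be sketched without any framework. The handle function below is a hypothetical stand-in for our two routes, written only with the standard library on Python 3. It is not Flask code, and the /index target is just an assumption for the illustration.

```python
from urllib.parse import quote, unquote


def handle(method, path, form=None):
    """Return (status, headers, body) for a toy two-route application."""
    if method == "POST" and path == "/search":
        query = (form or {}).get("search", "").strip()
        if not query:
            # Invalid form: go back to the index page.
            return 302, {"Location": "/index"}, ""
        # Valid form: redirect to a URL that carries the query in its path.
        location = "/search_results/" + quote(query, safe="")
        return 302, {"Location": location}, ""
    if method == "GET" and path.startswith("/search_results/"):
        query = unquote(path[len("/search_results/"):])
        return 200, {}, "Search results for %r" % query
    return 404, {}, "Not found"


# The browser submits the form with a POST...
status, headers, _ = handle("POST", "/search", {"search": "second OR last"})

# ...and then follows the redirect with a plain GET.
status2, _, body = handle("GET", headers["Location"])
print(status, headers["Location"])
print(status2, body)
```

The page the user ends up on was fetched with GET, so refreshing it repeats only the GET and the browser has no form data to warn about.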

Search results page

Once a query string has been received the form POST handler sends it via page redirection to the search_results handler (file app/views.py):

from config import MAX_SEARCH_RESULTS

@app.route('/search_results/<query>')
@login_required
def search_results(query):
    results = Post.query.whoosh_search(query, MAX_SEARCH_RESULTS).all()
    return render_template('search_results.html',
                           query=query,
                           results=results)

The search results view function sends the query into Whoosh, passing a maximum number of search results. We don't want to be presenting a potentially large number of hits, so we are happy showing just the first fifty.

The final piece is the search results template (file app/templates/search_results.html):

<!-- extend base layout -->
{% extends "base.html" %}

{% block content %}
  <h1>Search results for "{{ query }}":</h1>
  {% for post in results %}
      {% include 'post.html' %}
  {% endfor %}
{% endblock %}

And here, once again, we can reuse our post.html sub-template, so we don't need to worry about rendering avatars or other formatting elements, since all of that is done in a generic way in the sub-template.

Final words

We have now completed yet another important, though often overlooked, piece that any decent web application must have.

The source code for the updated microblog application is available below:

Download microblog-0.10.zip.

As always, the above download does not include a database or a flask virtual environment. See previous articles in the series to learn how to create these.

I hope you enjoyed this tutorial. If you have any questions feel free to write in the comments below. Thank you for reading, and I will be seeing you again in the next installment!

Miguel

149 comments

  • #126 Miguel Grinberg said 2016-12-04T23:40:00Z

    @Xavi: Not sure what are you asking. If you need to search the same string on three tables, why don't you run the query on each of the tables, then combine the three lists into a single one to return to the client?

  • #127 Manuel Ramos Calderón said 2016-12-26T17:12:10Z

    @Timothy Cleveland : were you able to solve the problem?

    I am refering to the following: AttributeError: 'BaseQuery' object has no attribute 'whoosh_search'

    Thanks :D

  • #128 Humoyun Ahmad said 2017-01-15T02:47:00Z

    Dear Miguel Grinberg!

    I am very pleased to say that you are done awesome job, but I am curious why you did not include full text search part in you book, now I am following your book and it is just great. It would be very nice to see the search part in your book also!

  • #129 Miguel Grinberg said 2017-01-15T16:44:45Z

    @Humoyun: the book had space constraints set by the publisher, I had to pick topics so that it did not become too large. Besides, I don't think I can do much more on this topic than what I did here in this article, so it would not add anything new.

  • #130 Michael Griesinger said 2017-01-20T01:38:36Z

    "In my testing I have encountered this bug, and to my knowledge there isn't a solution available, which means that at this time the full text search capability does not work well on Python 3. I will update this section once the issues are resolved."

    Looks like whoosh 2.x supports python 3 http://whoosh.readthedocs.io/en/latest/releases/2_0.html

  • #131 Vladislav Veselov said 2017-03-06T10:14:21Z

    Thank you Miguel for this excellent tutorial!

    I've coped with search engine through WhooshAlchemy. Starting from version 0.3.0 you can use it with Python 3. There is an URL, if someone interested: https://pypi.python.org/pypi/WhooshAlchemy

  • #132 Vladimir Velkov said 2017-05-31T03:36:31Z

    The following is to whom might be concerned :)

    There is an issue where the whoosh_search() method is returning an empty result.

    If you experience this problem, it might be because WhooshAlchemy is expecting SQLAlchemy to track modifications; the relevant SQLAlchemy setting is SQLALCHEMY_TRACK_MODIFICATIONS – you might want to check if this setting is disabled in your config.

    AFAIK, the above setting is not enabled by default.

    BR, Vlad

  • #133 Miguel Grinberg said 2017-05-31T05:13:36Z

    @Vladimir: I'm not sure what problem you refer to, but I don't see how the track modifications setting can affect how whoosh queries work, since they are completely unrelated.

  • #134 perry said 2017-07-26T13:00:19Z

    Hello, thank you for the tutorial. I receive this error when performing a search. I also get the error "AttributeError: 'BaseQuery' object has no attribute 'whoosh_search'" when testing on the command line. I am using python 3.4.2 on a raspberry pi zero.

    File "/home/pi/microblog/app/views.py", line 184, in search_results

    @app.route('/search_results/<query>')
    @login_required
    def search_results(query):
        results = Post.query.whoosh_search(query, MAX_SEARCH_RESULTS).all()
        return render_template('search_results.html', query=query, results=results)

    AttributeError: 'BaseQuery' object has no attribute 'whoosh_search'

  • #135 Miguel Grinberg said 2017-07-26T14:48:14Z

    @perry: Have you installed the Flask-WhooshAlchemy extension? Look at the code on GitHub and compare it against yours to see what's missing.

  • #136 Cornelia Xaos said 2017-08-17T20:01:50Z

    Probably super late, but.. Flask_WhooshAlchemy version 0.8 works in Python 3 with no needed changes to Miguel's code. I haven't run into any bugs (nor the bug he mentioned above.. but I haven't exactly been trying to cause them). Still, I'm able to search just fine with that version of WhooshAlchemy.. was tempted to try and get Whoosh_Paginate working. :P

    ===

    Be sure to rebuild your indices if coming from an old version of WhooshAlchemy. which you could do with something like the following:

    posts = Post.query.all()
    for post in posts:
        db.session.delete(post)

    db.session.commit()
    db.session.add_all(posts)
    db.session.commit()

    That should cause indices to be rebuilt. Make sure you delete the 'search.db' first!

  • #137 Yury said 2017-10-07T10:59:29Z

    Hi, Miguel! Thanks you for so exciting tutorial. Now i want to share my experience with subs. Maybe it is not actual now, but i haven`t seen that problem with empty list has been solved. So, i have tryed the method also known as "Trial and error". I was changing many things but always got empty list - [ ] when i used woosh_search. Once i tryed to search user nicknames in User table. I have added new user with name ''dub2" and run script "User.query.whoosh_search('dub2').all()". And it has returned the user with nickname "dub2". I was dissapointed in a moment. But then I payed attention to the difference between query for the posts and query for the users. Here"Post.query.whoosh_search('post').all()" we use keyword 'post' that is contained in every of 3 posts (its all about examples for python console). But in this query "User.query.whoosh_search('dub2').all()" keyword is similar to the real nickname. So i have supposed that keyword is not a constant value and can take some formats. I've tryed to make by analogy with windows search by adding "" on both sides of the keyword. After adding 2 more users, I've ran next script "User.query.whoosh_search('dub').all()" and I've got result [, , ] Finally I've rewritten posts query with adding "" to the keyword "post". And look for this:

    Post.query.whoosh_search('post').all() [, , ]

    I hope that it will be usefull for someone. Sorry for bad english, its not my native language. Best Regards!

  • #138 Andrew said 2017-10-26T03:56:24Z

    Hey Miguel,

    I've followed your instructions and even added the custom whoosh-alchemy, but still get the following error, any solutions?

    db.session.commit()
    Post.query.whoosh_search('post').all()
    Traceback (most recent call last):
      File "", line 1, in <module>
    AttributeError: 'BaseQuery' object has no attribute 'whoosh_search'

  • #139 Miguel Grinberg said 2017-10-27T17:44:53Z

    @Andrew: My guess is that for some reason the flask-whooshalchemy extension isn't initializing. Not sure if this is your fault or not. This article was written long ago, and I have actually moved away from this type of solution. The updated version of this article that is coming up soon will feature a completely different implementation based on Elasticsearch.

  • #140 Lubomir said 2017-11-04T17:53:43Z

    Hello, thank you for the tutorial, it has been a great help. Just another heads-up: if Post.query.whoosh_search returns empty check your config file for SQLALCHEMY_TRACK_MODIFICATIONS. It has to be set to True. https://github.com/gyllstromk/Flask-WhooshAlchemy/issues/21

  • #141 tom said 2017-11-14T16:30:50Z

    Hi Miguel Grinberg, I am following your tutorials and book on flask web development for a long time. It's been very helpful. Really appreciate all your knowledge sharing. And now I am trying to use python3 to build full-text-search function for my website, do you have any suggestions on how to implement since this post is already 4 years old?

    Hope you are always well and thanks again!

  • #142 Miguel Grinberg said 2017-11-14T17:08:20Z

    @tom: for the updated version of this tutorial that I will be releasing next month I have switched to Elasticsearch.

  • #143 tom said 2017-12-20T16:55:13Z

    Hello Miguel, could you give me some tips about this problem on full-text-search please?

    I have two classes(tables), Post and Tag, and their relationship is many to many. Post has searchable fields of title and p_body and Tag has searchable field of t_body.

    My silly question is how to get the posts whose title or p_body or tags' body ? I would like to get the posts that meet either condition and then paginate them to the frontend.

    My english is not very good, I hope you can understand, really appreciate your time and patience!

  • #144 Miguel Grinberg said 2017-12-21T04:37:00Z

    @tom: have you considered building a single index for all your fields? This may require a custom solution, and that seems to be the easiest.

  • #145 tom said 2017-12-21T05:40:55Z

    I am using flask-sqlalchemy as in your tutorials. Could you give me a short example on how to create a single index for all the fields in different Classes/Tables so that I can use flask-whooshalchemy to do the full text search? many many thanks.

  • #146 Miguel Grinberg said 2017-12-21T05:59:33Z

    @tom: I haven't used Whoosh in many years, not sure if you noticed that this article was written in 2012. As I said, you may need to build a custom solution. If you use Whoosh directly, or any other full-text search engine, you can add words from different sources to a single index. That is what I suggested as a possible solution.

  • #147 Ilya said 2017-12-25T22:33:37Z

    Hi Miguel,

    I'm seeing the following error ever since implementing full text search when I try to run anything in the app:

    Traceback (most recent call last):
      File "./run.py", line 2, in <module>
        from app import app
      File "/Users/ilyanatarius/Envs/flask/app/__init__.py", line 19, in <module>
        from app import views, models
      File "/Users/ilyanatarius/Envs/flask/app/views.py", line 4, in <module>
        from .forms import LoginForm, EditForm, PostForm
      File "/Users/ilyanatarius/Envs/flask/app/forms.py", line 4, in <module>
        from app.models import User
      File "/Users/ilyanatarius/Envs/flask/app/models.py", line 98, in <module>
        class Post(db.Model):
      File "/Users/ilyanatarius/Envs/flask/lib/python2.7/site-packages/flask_sqlalchemy/model.py", line 67, in __init__
        super(NameMetaMixin, cls).__init__(name, bases, d)
      File "/Users/ilyanatarius/Envs/flask/lib/python2.7/site-packages/flask_sqlalchemy/model.py", line 121, in __init__
        super(BindMetaMixin, cls).__init__(name, bases, d)
      File "/Users/ilyanatarius/Envs/flask/lib/python2.7/site-packages/sqlalchemy/ext/declarative/api.py", line 64, in __init__
        _as_declarative(cls, classname, cls.__dict__)
      File "/Users/ilyanatarius/Envs/flask/lib/python2.7/site-packages/sqlalchemy/ext/declarative/base.py", line 88, in _as_declarative
        _MapperConfig.setup_mapping(cls, classname, dict)
      File "/Users/ilyanatarius/Envs/flask/lib/python2.7/site-packages/sqlalchemy/ext/declarative/base.py", line 103, in setup_mapping
        cfg_cls(cls_, classname, dict_)
      File "/Users/ilyanatarius/Envs/flask/lib/python2.7/site-packages/sqlalchemy/ext/declarative/base.py", line 131, in __init__
        self._setup_table()
      File "/Users/ilyanatarius/Envs/flask/lib/python2.7/site-packages/sqlalchemy/ext/declarative/base.py", line 395, in _setup_table
        table_kw)
      File "/Users/ilyanatarius/Envs/flask/lib/python2.7/site-packages/flask_sqlalchemy/model.py", line 90, in __table_cls__
        return sa.Table(*args, **kwargs)
      File "/Users/ilyanatarius/Envs/flask/lib/python2.7/site-packages/sqlalchemy/sql/schema.py", line 421, in __new__
        "existing Table object." % key)
    sqlalchemy.exc.InvalidRequestError: Table 'post' is already defined for this MetaData instance. Specify 'extend_existing=True' to redefine options and columns on an existing Table object.

    Any ideas on what might be wrong here? I've compared my code and setup to yours, which I have been following verbatim, and I can't track down the issue.

  • #148 Miguel Grinberg said 2017-12-28T07:37:28Z

    @Ilya: The error seems to suggest you have two Post models in your application.

  • #149 Francesco said 2018-03-11T20:54:05Z

    Thank you! This bit "if a user then hits the refresh button the browser will put up a warning indicating that form data will be resubmitted. This is avoided when the response to a POST request is a redirect" has been very helpful, I couldn't find this info anywhere looking for "get or post request for searchbar flask" or similia
