The Flask Mega-Tutorial, Part XIV: I18n and L10n (2012)

Posted by
on under

(Great news! There is a new version of this tutorial!)

This is the fourteenth article in the series in which I document my experience writing web applications in Python using the Flask microframework.

The goal of the tutorial series is to develop a decently featured microblogging application that demonstrating total lack of originality I have decided to call microblog.

NOTE: This article was revised in September 2014 to be in sync with current versions of Python and Flask.

Here is an index of all the articles in the series that have been published to date:

The topics of today's article are Internationalization and Localization, commonly abbreviated I18n and L10n. We want our microblog application to be used by as many people as possible, so we can't forget that there is a lot of people out there that can't speak English, or that can, but prefer to speak their native language.

To make our application accessible to foreign visitors we are going to use the Flask-Babel extension, which provides an extremely easy to use framework to translate the application into different languages.

Configuration

To work with translations we are going to use a package called Babel, along with its Flask extension Flask-Babel. Flask-Babel is initialized simply by creating an instance of class Babel and passing the Flask application to it (file app/__init__.py):

from flask_babel import Babel
babel = Babel(app)

We also need to decide which languages we will offer translations for. For now we'll start with a Spanish version, since we have a translator at hand (yours truly), but it'll be easy to add more languages later. The list of supported languages will go to our configuration (file config.py):

# -*- coding: utf-8 -*-
# ...
# available languages
LANGUAGES = {
    'en': 'English',
    'es': 'Español'
}

The LANGUAGES dictionary has keys that are the language codes for the available languages, and values that are the printable name of the language. We are using the short language codes here, but when necessary the long codes that specify language and region can be used as well. For example if we wanted to support American and British English as separate languages we would have 'en-US' and 'en-GB' in our dictionary.

Note that because the word Español has a foreign character in it we have to add the coding comment at the top of the Python source file, to tell the Python interpreter that we are using the UTF-8 encoding and not ASCII, which lacks the ñ character.

The next piece of configuration that we need is a function that Babel will use to decide which language to use (file app/views.py):

from app import babel
from config import LANGUAGES

@babel.localeselector
def get_locale():
    return request.accept_languages.best_match(LANGUAGES.keys())

The function that is marked with the localeselector decorator will be called before each request to give us a chance to choose the language to use when producing its response. For now we will do something very simple, we'll just read the Accept-Languages header sent by the browser in the HTTP request and find the best matching language from the list that we support. This is actually pretty simple, the best_match method does all the work for us.

The Accept-Languages header in most browsers is configured by default with the language selected at the operating system level, but all browsers give users the chance to select other languages. Users can provide a list of languages, each with a weight. As an example, here is a complex Accept-Languages header:

Accept-Language: da, en-gb;q=0.8, en;q=0.7

This says that Danish is the preferred language (with default weight = 1.0), followed by British English (weight = 0.8) and as a last option generic English (weight = 0.7).

The final piece of configuration that we need is a Babel configuration file that tells Babel where to look for texts to translate in our code and templates (file babel.cfg):

[python: **.py]
[jinja2: **/templates/**.html]
extensions=jinja2.ext.autoescape,jinja2.ext.with_

The first two lines give Babel the filename patterns for our Python and template files respectively. The third line tells Babel to enable a couple of extensions that make it possible to find text to translate in Jinja2 templates.

Marking texts to translate

Now comes the most tedious aspect of this task. We need to review all our code and templates and mark all English texts that need translating so that Babel can find them. For example, take a look at this snippet from function after_login:

if resp.email is None or resp.email == "":
    flash('Invalid login. Please try again.')
    redirect(url_for('login'))

Here we have a flash message that we want to translate. To expose this text to Babel we just wrap the string in Babel's gettext function:

from flask_babel import gettext
# ...
if resp.email is None or resp.email == "":
    flash(gettext('Invalid login. Please try again.'))
    redirect(url_for('login'))

In a template we have to do something similar, but we have the option to use _() as a shorter alias to gettext(). For example, the word Home in this link from our base template:

  <li><a href="{{ url_for('index') }}">Home</a></li>

can be marked for translation as follows:

  <li><a href="{{ url_for('index') }}">{{ _('Home') }}</a></li>

Unfortunately not all texts that we want to translate are as simple as the above. As an example of a tricky one, consider the following snippet from our post.html subtemplate:

<p><a href="{{ url_for('user', nickname=post.author.nickname) }}">{{ post.author.nickname }}</a> said {{ momentjs(post.timestamp).fromNow() }}:</p>

Here the sentence that we want to translate has this structure: "<nickname> said <when>:". One would be tempted to just mark the word "said" for translation, but we can't really be sure that the order of the name and the time components in this sentence will be the same in all languages. The correct thing to do here is to mark the entire sentence for translation using placeholders for the name and the time, so that a translator can change the order if necessary. To complicate matters more, the name component has a hyperlink embedded in it!

There isn't really a nice and easy way to handle cases like this. The gettext function supports placeholders using the syntax %(name)s and that's the best we can do. Here is a simple example of a placeholder in a much simpler situation:

gettext('Hello, %(name)s', name=user.nickname)

The translator will need to be aware that there are placeholders and that they should not be touched. Clearly the name of a placeholder (what's between the %( and )s) must not be translated or else the connection to the actual value would be lost.

But back to our post template example, here is how it is marked for translation:

{% autoescape false %}
<p>{{ _('%(nickname)s said %(when)s:', nickname = '<a href="%s">%s</a>' % (url_for('user', nickname=post.author.nickname), post.author.nickname), when=momentjs(post.timestamp).fromNow()) }}</p>
{% endautoescape %}

The text that the translator will see for the above example is:

%(nickname)s said %(when)s:

Which is pretty decent. The values for nickname and when are what gives this translatable sentence its complexity, but these are given as additional arguments to the _() wrapper function and are not seen by the translator.

The nickname and when placeholders contain a lot of stuff in them. In particular, for the nickname we had to build an entire HTML link because we want this nickname to be clickable.

Because we are putting HTML in the nickname placeholder we need to turn off autoescaping to render this portion of the template, if not Jinja2 would render our HTML elements as escaped text. But requesting to render a string without escaping is considered a security risk, it is unsafe to render texts entered by users without escaping.

The text assigned to the when placeholder is safe because it is text that is entirely generated by our momentjs() wrapper function. What goes in the nickname argument, however, is coming from the nickname field of our User model, which in turn comes from our database, which can be entered by the user in a web form. If someone registers into our application with a nickname that contains embedded HTML or Javascript and then we render that malicious nickname unescaped, then we are effectively opening the door to an attacker. We certainly do not want that, so we are going to take a quick detour and remove any risks.

The solution that makes most sense is to prevent any attacks by restricting the characters that can be used in a nickname. We'll start by creating a function that converts an invalid nickname into a valid one (file app/models.py):

import re

class User(db.Model):
    #...
    @staticmethod
    def make_valid_nickname(nickname):
        return re.sub('[^a-zA-Z0-9_\.]', '', nickname)

Here we just take the nickname and remove any characters that are not letters, numbers, the dot or the underscore.

When a user registers with the site we receive his or her nickname from the OpenID provider, so we make sure we convert this nickname to something valid (file app/views.py):

@oid.after_login
def after_login(resp):
    #...
    nickname = User.make_valid_nickname(nickname)
    nickname = User.make_unique_nickname(nickname)
    user = User(nickname=nickname, email=resp.email)
    #...

And then in the Edit Profile form, where the user can change the nickname, we can enhance our validation to not allow invalid characters (file app/forms.py):

class EditForm(Form):
    #...
    def validate(self):
        if not Form.validate(self):
            return False
        if self.nickname.data == self.original_nickname:
            return True
        if self.nickname.data != User.make_valid_nickname(self.nickname.data):
            self.nickname.errors.append(gettext('This nickname has invalid characters. Please use letters, numbers, dots and underscores only.'))
            return False
        user = User.query.filter_by(nickname=self.nickname.data).first()
        if user is not None:
            self.nickname.errors.append(gettext('This nickname is already in use. Please choose another one.'))
            return False
        return True

With these simple measures we eliminate any possible attacks resulting from rendering the nickname to a page without escaping.

Extracting texts for translation

I'm not going to enumerate here all the changes required to mark all texts in the code and the templates. Interested readers can check the github diffs for this.

So let's assume we've found all the texts and wrapped them in gettext() or _() calls. What now?

Now we run pybabel to extract the texts into a separate file:

flask/bin/pybabel extract -F babel.cfg -o messages.pot app

Windows users should use this command instead:

flask\Scripts\pybabel extract -F babel.cfg -o messages.pot app

The pybabel extract command reads the given configuration file, then scans all the code and template files in the directories given as arguments (just app in our case) and when it finds a text marked for translation it copies it to the messages.pot file.

The messages.pot file is a template file that contains all the texts that need to be translated. This file is used as a model to generate language files.

Generating a language catalog

The next step in the process is to create a translation for a new language. We said we were going to do Spanish (language code es), so this is the command that adds Spanish to our application:

flask/bin/pybabel init -i messages.pot -d app/translations -l es

The pybabel init command takes the .pot file as input and writes a new language catalog to the directory given in the -d command line option for the language specified in the -l option. By default, Babel expects the translations to be in a translations folder at the same level as the templates, so that's where we'll put them.

After running the above comment a directory app/translations/es is created. Inside it there is yet another directory called LC_MESSAGES and inside it there is a file called messages.po. The command can be executed multiple times with different language codes to add support for other languages.

The messages.po file that is created in each language folder uses a format that is the de facto standard for language translations, the format used by the venerable gettext utility. There are many translation applications that work with .po files. For our translation needs we will use poedit, because it is one of the most popular and because it runs on all the major operating systems.

If you want to put your translator hat and give this task a try go ahead and install poedit from this link. The usage of this application is straightforward. Below is a screenshot after all the texts have been translated to Spanish:

poedit

The top section shows the texts in their original and translated languages. The bottom left has a box where the translator writes the text.

Once the texts have been translated and saved back to the messages.po file there is yet another step to publish these texts:

flask/bin/pybabel compile -d app/translations

The pybabel compile step just reads the contents of the .po file and writes a compiled version as a .mo file in the same directory. This file contains the translated texts in an optimized format that can be efficiently used by our application.

The translations are now ready to be used. To check them you can modify the language settings in your browser so that Spanish is the preferred language, or if you don't feel like messing with your browser configuration you can also fake it by temporarily changing the localeselector function to always request Spanish (file app/views.py):

@babel.localeselector
def get_locale():
    return 'es'  # request.accept_languages.best_match(LANGUAGES.keys())

Now when we run the server each time the gettext() or _() functions are called instead of getting the English text we will get the translation defined for the language returned by the localeselector function.

Updating the translations

What happens if we leave the messages.po file incomplete, with some of the texts missing a translation? Nothing happens, the application will run just fine regardless, and those texts that don't have a translation will continue to appear in English.

What happens if we missed some of the English texts in our code or templates? Any strings that were not wrapped with gettext() or _() will not be in the translation files, so they'll be unknown to Babel and remain in English. Once we spot a missing text we can add the gettext() wrapper to it and then run the following pybabel commands to update the translation files:

flask/bin/pybabel extract -F babel.cfg -o messages.pot app
flask/bin/pybabel update -i messages.pot -d app/translations

The extract command is identical to the one we issued earlier, it just generates an updated messages.pot file that adds the new texts. The update call takes the new messages.pot file and merges the new texts into all the translations found in the folder given by the -d argument.

Once the messages.po files in each language folder have been updated we can run poedit again to enter translations for the new texts, and then repeat the pybabel compile command to make those new texts available to our application.

Translating moment.js

So now that we have entered a Spanish translation for all the texts in code and templates we can run the application to see how it looks.

And right there we'll notice that all the timestamps are still in English. The moment.js library that we are using to render our dates and times hasn't been informed that we need a different language.

Reading the moment.js documentation we find that there is a large list of translations available, and that we simply need to load a second javascript with the selected language to switch to that language. So we just download the Spanish translation from the moment.js website and put it in our static/js folder as moment-es.min.js. We will follow the convention that any language file for moment.js will be added with the format moment-<language>.min.js, so that we can then select the correct one dynamically.

To be able to load a javascript that has the language code in its name we need to expose this code to the templates. The simplest way to do that is to add the language code to Flask's g global, in a similar way to how we expose the logged in user (file app/views.py):

@app.before_request
def before_request():
    g.user = current_user
    if g.user.is_authenticated:
        g.user.last_seen = datetime.utcnow()
        db.session.add(g.user)
        db.session.commit()
        g.search_form = SearchForm()
    g.locale = get_locale()

And now that we can see the language code in the templates we can load the moment.js language script in our base template (file app/templates/base.html):

{% if g.locale != 'en' %}
<script src="/static/js/moment-{{ g.locale }}.min.js"></script>
{% endif %}

Note that we make it conditional, because if we are showing the English version of the site we already have the correct texts from the first moment.js javascript file.

Lazy evaluation

While we continue playing with the Spanish version of our site we notice another problem. When we log out and try to log back in there is a flash message that reads "Please log in to access this page." in English. But where is this message? Unfortunately we aren't putting out this message, it is the Flask-Login extension that does it on its own.

Flask-Login allows this message to be configured by the user, so we are going to take advantage of that not to change the message but to make it translatable. So in our first try we do this (file app/__init__.py):

from flask_babel import gettext
lm.login_message = gettext('Please log in to access this page.')

But this really does not work. The gettext function needs to be used in the context of a request to be able to produce translated messages. If we call it outside of a request it will just give us the default text, which will be the English version.

For cases like this Flask-Babel provides another function called lazy_gettext, which doesn't look for a translation immediately like gettext() and _() do but instead delay the search for a translation until the string is actually used. So here is how to properly set up the login message (file app/__init__.py):

from flask_babel import lazy_gettext
lm.login_message = lazy_gettext('Please log in to access this page.')

When using lazy_gettext the pybabel extract command needs to be informed that the lazy_gettext function also wraps translatable texts with the -k option:

flask/bin/pybabel extract -F babel.cfg -k lazy_gettext -o messages.pot app

So after extracting yet another messages.pot template we update the language catalogs (pybabel update), translate the added text (poedit) and finally compile the translations one more time (pybabel compile).

With the advent of Flask 0.10 user sessions are serialized to JSON. This introduces a problem with lazy evaluated texts that are given as argument to the flash() function. Flashed messages are written to the user session, but the object used to wrap the lazily evaluated texts is a complex object that does not have a direct conversion to a JSON type. Flask 0.9 did not serialize sessions to JSON so this was not a problem, but until Flask-Babel addresses this we have to provide a solution from our side, and this solution comes in the form of a custom JSON encoder (file app/__init__.py):

from flask.json import JSONEncoder

class CustomJSONEncoder(JSONEncoder):
    """This class adds support for lazy translation texts to Flask's
    JSON encoder. This is necessary when flashing translated texts."""
    def default(self, obj):
        from speaklater import is_lazy_string
        if is_lazy_string(obj):
            try:
                return unicode(obj)  # python 2
            except NameError:
                return str(obj)  # python 3
        return super(CustomJSONEncoder, self).default(obj)

app.json_encoder = CustomJSONEncoder

This installs a custom JSON encoder that forces the lazy texts to be evaluated into strings prior to being converted to JSON. Note the complexity in getting this done differently for Python 2 vs. Python 3.

And now we can say that we have a fully internationalized application!

Shortcuts

Since the pybabel commands are long and hard to remember we are going to end this article with a few quick and dirty little scripts that simplify the most common tasks we've seen above.

First a script to add a language to the translation catalog (file tr_init.py):

#!flask/bin/python
import os
import sys
if sys.platform == 'win32':
    pybabel = 'flask\\Scripts\\pybabel'
else:
    pybabel = 'flask/bin/pybabel'
if len(sys.argv) != 2:
    print "usage: tr_init <language-code>"
    sys.exit(1)
os.system(pybabel + ' extract -F babel.cfg -k lazy_gettext -o messages.pot app')
os.system(pybabel + ' init -i messages.pot -d app/translations -l ' + sys.argv[1])
os.unlink('messages.pot')

Then a script to update the catalog with new texts from source and templates (file tr_update.py):

#!flask/bin/python
import os
import sys
if sys.platform == 'win32':
    pybabel = 'flask\\Scripts\\pybabel'
else:
    pybabel = 'flask/bin/pybabel'
os.system(pybabel + ' extract -F babel.cfg -k lazy_gettext -o messages.pot app')
os.system(pybabel + ' update -i messages.pot -d app/translations')
os.unlink('messages.pot')

And finally, a script to compile the catalog (file tr_compile.py):

#!flask/bin/python
import os
import sys
if sys.platform == 'win32':
    pybabel = 'flask\\Scripts\\pybabel'
else:
    pybabel = 'flask/bin/pybabel'
os.system(pybabel + ' compile -d app/translations')

These scripts should make working with translation files an easy task.

Final words

Today we have implemented an often overlooked aspect of web applications. Users want to work in their native language, so being able to publish our application in as many languages as we can find translators for is a huge accomplishment.

In the next article we will look at what is probably the most complex aspect in the area of I18n and L10n, which is the real time automated translation of user generated content. And we will use this as an excuse to add some Ajax magic to our application.

Here is the download link for the latest version of microblog, including the complete Spanish translation:

Download microblog-0.14.zip.

If you prefer, you can also find the code on github here.

Thank you for being a loyal reader. See you next time!

Miguel

Become a Patron!

Hello, and thank you for visiting my blog! If you enjoyed this article, please consider supporting my work on this blog on Patreon!

88 comments
  • #51 John said

    Hello Miguel :)
    i18n Translation doesn't work with Safari (on GoogleChrome or Firefox it's ok).
    Is this normal ?

  • #52 John said

    Safari seems to send fr-fr unlike Google Chrome or Firefox and Babel return None on accept_languages :(

  • #53 John said

    Ok, i found, you need to declare fr_FR/en_US not en-US/fr-FR like in tutorial

  • #54 Miguel Grinberg said

    @John: Correct, the language to request is configured in the browser as an ordered list of choices. The server the picks the language closest to the top of the list that it supports.

  • #55 Martin said

    Hi Miguel,

    congratulations for the great tutorial. It is the best I have encountered in years. However, I have a problem with the translations. Once logged in they work perfectly fine, but the login form itself does not get translated while the menu from base.html does. Any idea what the reason might be?

  • #56 Martin said

    Yesterday I asked why the sign in page might not get translated while once logged in the translation works fine. I found it. Somehow the translated texts did not make it back into to the application. I forgot to update 404 and 500.html. Since that is done it seems to work.

    Martin

  • #57 Miguel Grinberg said

    @Martin: did you check if the HTML for the form has the _( ) wrapper that triggers the translations? Compare with this: https://github.com/miguelgrinberg/microblog/blob/master/app/templates/login.html

  • #58 Moacyr said

    when I try the command "flask \ Scripts \ pybabel extract -F -o babel.cfg messages.pot app" of the error:

    "failed to create process."

  • #59 Miguel Grinberg said

    @Moacyr: do you have the pybabel executable in the flask\Scripts folder?

  • #60 Norbert Stüken said

    Thanks for this great tutorial. I know, its a bit older now, but to keep it up to date, two annotations:
    - When using Python 3, there is an issue when updating translations ("TypeError: must be str, not bytes"). This can easily be solved by changing a line in the babel package (see https://github.com/mitsuhiko/flask-babel/issues/43)
    - moment.js seemed to have changed the handling with different languages. Now, only one file including all local translations is downloaded (moment-with-locales.min.js) and then insert the following in base.html:
    <script src="/static/js/moment-with-locales.min.js"></script>
    {% if g.locale == 'es' %}
    <script>moment.locale('es');</script>
    {% endif %}

  • #61 Amin said

    Thank you for your great tutorials.
    But how do you localize WTF form fields? For example the placeholders. Flask Babel extract seems to ignore those.

  • #62 Miguel Grinberg said

    @Amin: have you tried adding the gettext function around a placeholder value given in a template? I think that works. Show me an actual example if it doesn't.

  • #63 Justin Goh said

    Hi, I'm not sure if this tutorial is out of date (or that Moment.js was updated recently), but I spent really long (<3 weeks of coding experience) trying to figure out how to get Moment.js to work (i18n), and it turns out the fix for me was adding:

    <script>moment.locale('{{ g.locale }}')</script>

    into my base.html.

    Hope this will help others stuck with the same problem.

  • #64 Samarth Shah said

    Do we have to extract, init, translate (using GUI) and compile everytime for each language?

    If so, it is easier when I am programming in MacOS. But what happens when I ship code to a Linux server where I can't have GUI programs like poedit?
    Is there a command line way to translate everything for you into mo files? So that all I have to do is run that file (python or sh)?

    It just feels cumbersome to manually translate .po files to .mo using GUI.

  • #65 Miguel Grinberg said

    @Samarth: first of all, poedit is available on all platforms, even Linux. If you have a headless Linux server, all you need to do is host an X server on your local machine and configure it as the display for your Linux server. But if that sounds too complicated, all the translation files are text files, you can edit them in a text editor.

  • #66 Yogi said

    Hi Miguel,

    if a user want to set the language manually, what is the best way to implement it?
    My issue is, if a user select a flag country icon, how shall this language information be passed to the init.py without refreshing the actual page?

    Thanks

  • #67 Miguel Grinberg said

    @Yogi: there isn't a single way to do this. You can have the flag icon send a request to the server that includes the language code. Once the server has it, it can save it on the DB for the user, or write it in a cookie so that it gets it back with every future request.

  • #68 Darek said

    Thanks for tutorial, I know question was made 3 years ago but...
    @marco and whoever gets "No such file or directory: 'babel.cfg' "

    solution which worked for me is rename babel.cfg to simply babel and run
    flask/bin/pybabel extract -F babel -o messages.pot ap

    Cheers.

  • #69 Norbert Stüken said

    Just tried out my microblog project from last year and found that the line: print "usage: tr_init <language-code> in" in 'tr_init.py' must be changed to: print("usage: tr_init <language-code>") to be Python3 compatible. I found the solution here: http://stackoverflow.com/questions/25445439/what-does-syntaxerror-missing-parentheses-in-call-to-print-mean-in-python

  • #70 Alan Porter said

    Three years later, and this series is still teaching new folks about Flask. Thanks for sharing it with us. I am impressed with how many practical topics are addressed along the way, things that aren't really part of the lesson title, but that end up being important parts of developing a web site.

    I'm having trouble running the "pybabel extract" step. I get a strange error with a long stack trace ending with this:

    UndefinedEnvironmentName: 'extra' does not exist in evaluation environment.

    I think at this point, I'll have to punt on the translations and see if I can complete the rest of the tutorials.

    Alan

  • #71 Miguel Grinberg said

    @Alan: I believe the error that you have was a bug in pip. You may want to try upgrading the pip that you have in your virtual environment.

  • #72 DonExo said

    Hey Miguel, I must say that even after 4-5 years after releasing this amazing tut - it's still awesome.
    One day, if I have to teach someone else a Python+Flask - I'll definitely point them right here to this amazing mega-tutorial!

    Kudos for all the effort.

    Now I have a strange problem... dunno how long it's there, I guess I always skipped it...

    Whenever I try to exec some command on the CMD - i get a error:
    "Fatal error in launcher: Unable to create process using '"'

    Tried googling and solving it by my self - but without luck.
    Whatever I try to execute trough the CMD to python related stuff - it gives me this error.
    I think it's something in exact this folder/virtual enviroment/project... because when I created a new project and started installing all the modules with it - it worked without any problems.

    A photo may bring my problems to you much closer than the words...
    http://i.imgur.com/7fWc1JW.png

    Thanks in advance.

  • #73 Miguel Grinberg said

    @DonExo: not sure what the error is, but you should try recreating the virtual environment from scratch. I'm sure that'll help.

  • #74 Kirill said

    @Alan
    I had the same problem and upgrading setuptools from 24.0.3 to 25.1.6 solved the problem.

    Kirill

  • #75 Robert Gottesman said

    Hi Miguel, I've been following your tutorial on Python 3.4 .. and seems to be working well. I ran into a glitch which should be simple to solve .. but maybe I'm foggy eyed.

    I have correctly generated a messages.pot from my windows 10 PC using the command line
    ... Scripts\pybabel extract -F babel.cfg -o messages.pot app
    Example entry follows:
    #: app/forms.py:36
    msgid "Your changes have been saved."
    msgstr ""

    However, when I execute my 'init command I get no translations for Spanish into the messages.po file. My init command is: Scripts\pybabel init -i messages.pot -d app/translations -l es

    The messages.po file returns entries like the following ..
    #: app/views.py:123
    msgid "Your changes have been saved."
    msgstr ""

    whereas the msgstr val should be: "Tus cambios han sido guardados."

    Why does is msgstr value empty? (bang head on wall)

Leave a Comment