A common need of web applications is to have a periodically running task in the background. This could be a task that imports new data from third party sources, or maybe one that removes revoked tokens from your database once they have expired. In this and many other situations you are faced with the challenge of implementing a task that runs in the background at regular intervals.
This is a pattern that many people ask me about. I've seen implementations that are based on the APScheduler package, on Celery, and even homegrown solutions built inside a background thread. Sadly none of these options are very good. In this article I'm going to show you what I believe is a very robust implementation that is based on the Flask CLI and the cron service.
Implementing the Job Logic
I adhere to the "divide and conquer" principle, so when I'm implementing a scheduled job I prefer to separate the job itself from the scheduling and also from the web application. So I really view a job that runs at regular intervals as a standalone short-lived job that runs once, configured to run over and over again at the desired frequency.
When working with a Flask application, I find that the best option to implement a short-lived job is to do it as a command attached to the flask command, not only because I can consolidate all my jobs under a single command, but also because a Flask command runs inside an application context, so I can use many of the same facilities I have access to in the Flask routes, the most important of all being the database.
Below you can see an example of how I would implement a job. In this case I'm using the flasky application featured in my Flask Web Development book. This application already has a few custom commands, so I added one more at the end of the flasky.py module:
```python
import time

@app.cli.command()
def scheduled():
    """Run scheduled job."""
    print('Importing feeds...')
    time.sleep(5)
    print('Users:', str(User.query.all()))
    print('Done!')
```
Because this is just a demonstration, I'm not doing anything specific in this job, just a five second sleep to simulate some work being done. I have added a few print statements which would be used for logging, and I have also included a simple database query, to confirm that the Flask configured database works great inside the custom command.
Now I can see my custom command listed when I run flask --help:
```
(venv) $ flask --help
Usage: flask [OPTIONS] COMMAND [ARGS]...

  This shell command acts as general utility script for Flask applications.

  It loads the application configured (through the FLASK_APP environment
  variable) and then provides commands either provided by the application or
  Flask itself.

  The most useful commands are the "run" and "shell" command.

  Example usage:

    $ export FLASK_APP=hello.py
    $ export FLASK_DEBUG=1
    $ flask run

Options:
  --version  Show the flask version
  --help     Show this message and exit.

Commands:
  db         Perform database migrations.
  deploy     Run deployment tasks.
  profile    Start the application under the code...
  run        Runs a development server.
  scheduled  Run scheduled job.
  shell      Runs a shell in the app context.
  test       Run the unit tests.
```
This way of writing the job makes it easy to do testing, since I can simply run this job from the command-line as many times as I need to get it right:
```
(venv) $ flask scheduled
Importing feeds...
Users: [<User 'miguel'>]
Done!
```
Defining a Cron Job
Once the job is written and tested, it is time to implement the scheduling part. For this I find the cron service available in all Unix-based distributions more than adequate.
Each user in a Unix system can set up scheduled commands to be executed by the system by listing them in a "crontab" (cron table) file. The crontab command is used to open a text editor on the user's crontab file:
$ crontab -e
It is important to run the crontab command as the user that is intended to run the scheduled job, which is typically the same user that runs the web application. This ensures the job will run with the correct permissions. I recommend that you do not put your scheduled jobs on the root user, in the same way you shouldn't be running your web application as root.
The crontab -e command will start a text editor on the user's crontab file, which will initially be empty, aside from some explanatory comments.
A scheduled job is given in the crontab file as a line with six fields. The first five fields set up the run schedule for the job. The sixth and last field is the command to run. You can configure multiple jobs, each with its own schedule, by writing multiple lines in the crontab file.
I find the easiest way to set up my scheduled job is to start with a default configuration that runs the command once per minute, as this allows me to test that the command runs correctly without having to wait a lot of time between runs.
To run a job once a minute put five stars separated by spaces, followed by the command to run:
* * * * * command
In my example the command I want to run is flask scheduled, but in general, when you write a command in a crontab file you have to adapt it to compensate for the differences between running the command from the terminal and having the cron service run it. There are four aspects that need to be considered:
- Current directory: If the command needs to run from a specific directory, you have to add a cd in the cron job. It may also be necessary to specify an absolute path to the command.
- Environment variables: If the command needs environment variables set, they need to be set as part of the command. My recommendation is that you use .env and/or .flaskenv files to store your variables, so that they are automatically imported by Flask when the command starts.
- Virtual environment: This one is specific to Python applications. You have to either activate the virtual environment as part of the cron command, or else execute the Python executable located inside the virtualenv directory.
- Logging: The cron service collects the output of the command and sends it to the Unix user as an email. This is almost always inconvenient, so it is best to ensure that the command generates no output, by redirecting stdout and stderr to a logfile.
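As an illustration of the environment variable option, a minimal .env file might look like the following (the variable names and values here are illustrative examples, not taken from the flasky application):

```shell
# .env — automatically imported by Flask when python-dotenv is installed.
# The values below are illustrative examples.
FLASK_APP=flasky.py
SECRET_KEY=change-this-value
DATABASE_URL=sqlite:////home/ubuntu/flasky/data.sqlite
```

Because Flask loads this file itself at startup, the cron command does not need to export any variables explicitly.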
Here is how my flask scheduled command can be configured to run once a minute as a cron job:
* * * * * cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1
The && operator is used to include multiple commands in a single line. With it I can cd to the directory of my project and then execute the command. To make sure the virtual environment is used, I invoke the flask command directly from the virtualenv's bin directory, which achieves the same effect as activating the environment. For environment variables, this application uses a .env file, which works the same under cron. In terms of logging, I first redirect stdout to a file with >>scheduled.log, which causes new runs of the job to append to the end of the file. For stderr I use 2>&1, which means that I want to apply the same redirection for stderr that I configured for stdout (the "2" and the "1" reference the file handle numbers for stderr and stdout respectively).
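If you want to convince yourself of how these redirections behave, you can try them directly in a shell, independently of cron (demo.log here is just a scratch filename for this sketch):

```shell
# Run a command group that writes to both stdout and stderr, appending
# both streams to demo.log with the same redirections used in the cron line.
( echo "to stdout"; echo "to stderr" >&2 ) >>demo.log 2>&1

# Both lines end up in the file, and running the command again
# appends instead of overwriting.
cat demo.log
```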
As soon as you save and exit the text editor, the scheduled job will start to run at the top of every minute, and you should see the output of each run appended to the end of the scheduled.log file. If the command ends with a crash, the stack trace will be written to stderr, which is also redirected to the logfile, so you'll see the error in the log.
Once you have the command running successfully once a minute, you can start thinking about creating a final schedule for it. The five stars represent the following time specifications in order:
- The minute, from 0 to 59 or * for every minute
- The hour, from 0 to 23 or * for every hour
- The day of the month, from 1 to 31 or * for every day
- The month, from 1 to 12 or * for every month
- The day of the week, from 0 (Sunday) to 6 (Saturday) or * for every day of the week
Using stars for all fields means that we want to run the job on every minute of every hour of every day of every month, and on every day of the week. If I wanted to run the job once per hour instead of once per minute, all I need to do is set a specific minute. For example, to run at the 0th minute of every hour:
0 * * * * cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1
If instead I wanted to run once per hour, but at the 5th minute (i.e. at 0:05, 1:05, 2:05 and so on):
5 * * * * cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1
To run the job daily at 4:05am:
5 4 * * * cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1
To run the job daily at 4:05pm:

5 16 * * * cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1
To run the job at 4:05am, but only on Tuesdays:
5 4 * * 2 cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1
Instead of using a single number for each field, you can specify multiple ones separated by commas. To run the job at 4:05am on Tuesdays and Fridays:

5 4 * * 2,5 cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1
Ranges of consecutive numbers can be given with a dash. To run the job at 4:05am only on weekdays:
5 4 * * 1-5 cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1
When you specify a range of numbers, you can also include a step argument. The following example runs the job every 2 minutes, on the even minutes:
0-59/2 * * * * cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1
And if I wanted to run every two minutes on the odd ones:
1-59/2 * * * * cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1
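To make the field syntax concrete, here is a small Python sketch (my own illustration, not part of cron itself) that expands a single crontab field into the list of values it matches:

```python
def expand_cron_field(field, lo, hi):
    """Expand one crontab field (e.g. '5', '2,5', '1-5', '0-59/2', '*')
    into the sorted list of matching integer values in [lo, hi]."""
    values = set()
    for part in field.split(','):
        part, _, step = part.partition('/')
        step = int(step) if step else 1
        if part == '*':
            start, end = lo, hi
        elif '-' in part:
            start, end = (int(x) for x in part.split('-'))
        else:
            start = end = int(part)
        values.update(range(start, end + 1, step))
    return sorted(values)

# The minute field '1-59/2' matches the odd minutes:
print(expand_cron_field('1-59/2', 0, 59)[:5])  # → [1, 3, 5, 7, 9]
```

Running this against the examples above is a quick way to sanity-check a schedule before putting it in the crontab.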
I hope by now you get how this works. If you want to practice different cron schedules, the crontab.guru site is great, as it translates a given specification into words to make it more clear.
Once you configure your desired interval, your cron job will run at the scheduled times. To review how it is working, you can check the logfile, which for the example command above will look something like this:
```
Importing feeds...
Users: [<User 'miguel'>]
Done!
Importing feeds...
Users: [<User 'miguel'>]
Done!
Importing feeds...
Users: [<User 'miguel'>]
Done!
```
The problem is that the output of the command is always the same, so you get this repetitive stream. If at any point there was an error, or unexpected output, you will not know when it happened. And if you wanted to know how long your runs take, there is no way to tell that either.
To add a little more context to this logfile, timestamps can be added to each line:
```python
from datetime import datetime
import time

@app.cli.command()
def scheduled():
    """Run scheduled job."""
    print(str(datetime.utcnow()), 'Importing feeds...')
    time.sleep(5)
    print(str(datetime.utcnow()), 'Users:', str(User.query.all()))
    print(str(datetime.utcnow()), 'Done!')
```
With this change, now your logfile will show the time each line was printed:
```
2020-06-28 23:03:25.597371 Importing feeds...
2020-06-28 23:03:30.599382 Users: [<User 'miguel'>]
2020-06-28 23:03:30.621601 Done!
```
If your job outputs more than a handful of lines in each run, you should use Python's logging module to create a more robust logfile.
I hope this tutorial gave you a clear idea of how to implement regularly scheduled background jobs in your Flask application. In the introduction I mentioned that using Python-based solutions is a bad idea. In case you want to know why, here are some problems:
- If your background job runs in the context of your Python process with APScheduler or a similar package, when you scale your Flask application to more than one worker you'll have multiple background jobs as well.
- If you run your background job in a homegrown thread-based solution, you'll need very robust error handling in place; otherwise, whenever the background thread crashes your jobs will stop running.

Unlike most of these Python implementations, using cron requires no additional dependencies. If you deploy on a Linux machine, cron is always available to you.
I hope I convinced you, but if you have a method of running background jobs that you like better than cron and would like to tell me about it let me know below in the comments!