Run Your Flask Regularly Scheduled Jobs with Cron

A common need of web applications is to have a periodically running task in the background. This could be a task that imports new data from third party sources, or maybe one that removes revoked tokens from your database once they have expired. In this and many other situations you are faced with the challenge of implementing a task that runs in the background at regular intervals.

This is a pattern that many people ask me about. I've seen implementations that are based on the APScheduler package, on Celery, and even homegrown solutions built inside a background thread. Sadly none of these options are very good. In this article I'm going to show you what I believe is a very robust implementation that is based on the Flask CLI and the cron service.

Implementing the Job Logic

I adhere to the "divide and conquer" principle, so when I'm implementing a scheduled job I prefer to separate the job itself from the scheduling and also from the web application. So I really view a job that runs at regular intervals as a standalone short-lived job that runs once, configured to run over and over again at the desired frequency.

When working with a Flask application, I find that the best option to implement a short-lived job is to do it as a command attached to the flask command. This lets me consolidate all my jobs under a single command, and because a Flask command runs inside an application context, I can use many of the same facilities I have access to in the Flask routes, the most important of all being the database.

Below you can see an example of how I would implement a job. In this case I'm using the flasky application featured in my Flask Web Development book. This application already has a few custom commands, so I added one more at the end of the flasky.py module:

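import time

# app and User are already defined earlier in flasky.py
@app.cli.command()
def scheduled():
    """Run scheduled job."""
    print('Importing feeds...')
    time.sleep(5)  # simulate some work being done
    print('Users:', str(User.query.all()))  # confirm database access works
    print('Done!')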

Because this is just a demonstration, I'm not doing anything specific in this job, just a five second sleep to simulate some work being done. I have added a few print statements which would be used for logging, and I have also included a simple database query, to confirm that the Flask configured database works great inside the custom command.
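If you do not have the flasky application at hand, the same idea can be tried in a single file. The sketch below is a stand-in built on assumptions: the `app` instance, the `import_feeds()` helper and its return value are hypothetical and take the place of the database query used above. The `run_scheduled_once()` helper is also made up for this example. It invokes the command in-process through Flask's test CLI runner, which can be handy for automated tests.

```python
from flask import Flask

app = Flask(__name__)


def import_feeds():
    """Hypothetical stand-in for the real work done by the job."""
    return ['feed-1', 'feed-2']


@app.cli.command()
def scheduled():
    """Run scheduled job."""
    print('Importing feeds...')
    feeds = import_feeds()
    print('Imported:', len(feeds))
    print('Done!')


def run_scheduled_once():
    """Invoke the command in-process and return (exit_code, output)."""
    runner = app.test_cli_runner()
    result = runner.invoke(args=['scheduled'])
    return result.exit_code, result.output
```

Keeping the actual work in a plain function such as `import_feeds()` means it can be tested on its own, while the command stays a thin wrapper around it.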

Now I can see my custom command when I run flask --help:

(venv) $ flask --help
Usage: flask [OPTIONS] COMMAND [ARGS]...

  This shell command acts as general utility script for Flask applications.

  It loads the application configured (through the FLASK_APP environment
  variable) and then provides commands either provided by the application or
  Flask itself.

  The most useful commands are the "run" and "shell" command.

  Example usage:

    $ export FLASK_APP=hello.py
    $ export FLASK_DEBUG=1
    $ flask run

Options:
  --version  Show the flask version
  --help     Show this message and exit.

Commands:
  db         Perform database migrations.
  deploy     Run deployment tasks.
  profile    Start the application under the code...
  run        Runs a development server.
  scheduled  Run scheduled job.
  shell      Runs a shell in the app context.
  test       Run the unit tests.

This way of writing the job makes it easy to do testing, since I can simply run this job from the command-line as many times as I need to get it right:

(venv) $ flask scheduled
Importing feeds...
Users: [<User 'miguel'>]
Done!

Defining a Cron Job

Once the job is written and tested, it is time to implement the scheduling part. For this I find the cron service available in all Unix-based distributions more than adequate.

Each user in a Unix system has the option to set up scheduled commands that are executed by the system in a "crontab" (cron table) file. The crontab command is used to open a text editor on the user's crontab file:

$ crontab -e

It is important to run the crontab command under the user that is intended to run the scheduled job, which typically is the same user that runs the web application. This ensures the job will run with the correct permissions. I recommend you do not put your scheduled jobs on the root user, in the same way you shouldn't be running your web application as root.

The crontab -e command will start a text editor on the user's crontab file, which will initially be empty, aside from some explanatory comments.

A scheduled job is given in the crontab file as a line with six fields. The first five fields are used to set up the run schedule for the job. The sixth and last field is the command to run. You can configure multiple jobs, each with its own schedule, by writing multiple lines in the crontab file.

I find the easiest way to set up my scheduled job is to start with a default configuration that runs the command once per minute, as this allows me to test that the command runs correctly without having to wait a lot of time between runs.

To run a job once a minute put five stars separated by spaces, followed by the command to run:

* * * * * command

In my example the command I want to run is flask scheduled, but in general when you write a command in a crontab file you have to adapt the command to compensate for the differences between running the command from the terminal versus having cron run it as a service. I can think of four aspects that need to be considered:

  • Current directory: If the command needs to run from a specific directory, you have to add a cd in the cron job. It may also be necessary to specify an absolute path to the command.
  • Environment variables: If the command needs environment variables set, they need to be set as part of the command. My recommendation is that you use .env and/or .flaskenv files to store your variables, so that they are automatically imported by Flask when the command starts (Flask does this only when the python-dotenv package is installed).
  • Virtual environment: This one is specific to Python applications. You have to either activate the virtual environment as part of the cron command, or else execute the Python executable located inside the virtualenv directory.
  • Logging: The cron service collects the output of the command and sends it to the Unix user as an email. This is almost always inconvenient, so it is best to ensure that the command generates no output by redirecting stdout and stderr to a logfile.

Here is how my flask scheduled command can be configured to run once a minute as a cron job:

* * * * * cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1

The && is used to include multiple commands in a single line. With it I can cd to the directory of my project and then execute the command. To make sure the virtual environment is activated I fish the flask command directly out of the virtualenv's bin directory. This achieves the same effect as activating the environment. For environment variables this application uses a .env file, so that works the same under cron. In terms of logging I first redirect stdout to a file with >>scheduled.log, which will cause new runs of the job to append at the end of the file. For stderr I use 2>&1, which applies to stderr the same redirection that I configured for stdout (the "2" and the "1" are the file descriptor numbers for stderr and stdout respectively).
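To make the two redirections more concrete, here is a rough Python equivalent of `>>scheduled.log 2>&1`. It is a sketch for illustration, not something cron itself does. The `run_and_log()` helper and the `CHILD` command are hypothetical. The log is opened in append mode to mirror `>>`, and `stderr=subprocess.STDOUT` plays the role of `2>&1`.

```python
import subprocess
import sys


def run_and_log(args, logfile):
    """Run args, appending its stdout and stderr to logfile."""
    # Append mode mirrors the shell's >> redirection.
    with open(logfile, 'a') as log:
        # stderr=subprocess.STDOUT mirrors 2>&1: stderr goes where stdout goes.
        completed = subprocess.run(args, stdout=log, stderr=subprocess.STDOUT)
    return completed.returncode


# Hypothetical child command that writes one line to each stream.
CHILD = [sys.executable, '-c',
         "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"]
```

Calling `run_and_log(CHILD, 'scheduled.log')` several times leaves the output of every run in the file, which is the behavior you want for a job that cron starts over and over.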

As soon as you save and exit the text editor the scheduled job will start to run at the top of every minute, and you should see the output of each run added to the end of the scheduled.log file. If the command ends with a crash, the stack trace will be written to stderr, which we are also writing to the logfile, so you'll see the error in the log.

Once you have the command running successfully once a minute, you can start thinking about creating a final schedule for it. The five stars represent the following time specifications in order:

  • The minute, from 0 to 59 or * for every minute
  • The hour, from 0 to 23 or * for every hour
  • The day of the month, from 1 to 31 or * for every day
  • The month, from 1 to 12 or * for every month
  • The day of the week, from 0 (Sunday) to 6 (Saturday) or * for every day of the week

Using stars for all fields means that we want to run the job on every minute of every hour of every day of every month, and on every day of the week. If I wanted to run the job once per hour instead of once per minute, all I need to do is set a specific minute. For example, to run at the 0th minute of every hour:

0 * * * * cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1

If instead I wanted to run once per hour, but at the 5th minute (i.e. at 0:05, 1:05, 2:05 and so on):

5 * * * * cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1

To run the job daily at 4:05am:

5 4 * * * cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1

If I want to run at 4:05pm:

5 16 * * * cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1

To run the job at 4:05am, but only on Tuesdays:

5 4 * * 2 cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1

Instead of using a single number for each field, you can specify multiple ones separated by commas. To run the job at 4:05am on Tuesdays and Fridays:

5 4 * * 2,5 cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1

Ranges of consecutive numbers can be given with a dash. To run the job at 4:05am only on weekdays:

5 4 * * 1-5 cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1

When you specify a range of numbers, you can also include a step argument. The following example runs the job every 2 minutes, on the even minutes:

0-59/2 * * * * cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1

And if I wanted to run every two minutes on the odd ones:

1-59/2 * * * * cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1
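The field syntax described in this section (stars, lists, ranges and steps) can be checked in Python with a small matcher. This is only a sketch for experimenting, not how cron is implemented. The `expand_field()` and `matches()` names are made up for this example, and it ignores extras such as month or weekday names and `7` as an alias for Sunday, which many cron versions also accept.

```python
from datetime import datetime


def expand_field(field, low, high):
    """Expand one cron field such as '*', '5', '2,5', '1-5' or '0-59/2'."""
    values = set()
    for part in field.split(','):
        spec, _, step = part.partition('/')
        step = int(step) if step else 1
        if spec == '*':
            start, end = low, high
        elif '-' in spec:
            start, end = (int(n) for n in spec.split('-'))
        else:
            start = end = int(spec)
        values.update(range(start, end + 1, step))
    return values


def matches(schedule, when):
    """Return True if the five-field schedule selects the given datetime."""
    minute, hour, dom, month, dow = schedule.split()
    # cron counts Sunday as 0, while datetime.weekday() counts Monday as 0.
    cron_dow = (when.weekday() + 1) % 7
    if (when.minute not in expand_field(minute, 0, 59)
            or when.hour not in expand_field(hour, 0, 23)
            or when.month not in expand_field(month, 1, 12)):
        return False
    dom_ok = when.day in expand_field(dom, 1, 31)
    dow_ok = cron_dow in expand_field(dow, 0, 6)
    # When both day fields are restricted, cron runs the job if either matches.
    if dom != '*' and dow != '*':
        return dom_ok or dow_ok
    return dom_ok and dow_ok
```

For instance, `matches('5 4 * * 1-5', datetime(2020, 6, 30, 4, 5))` is true because that date was a Tuesday, while the same call with `datetime(2020, 6, 28, 4, 5)`, a Sunday, is false.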

I hope by now you get how this works. If you want to practice different cron schedules, the crontab.guru site is great, as it translates a given specification into words to make it more clear.

Logging Improvements

Once you configure your desired interval your cron job will run at the scheduled time. To review how it is working, you may want to check the logfile. If you use the print technique I used above for logging, you will end up with a confusing logfile that looks somewhat like this:

Importing feeds...
Users: [<User 'miguel'>]
Done!
Importing feeds...
Users: [<User 'miguel'>]
Done!
Importing feeds...
Users: [<User 'miguel'>]
Done!

The problem is that the output of the command is always the same, so you get this repetitive stream. If there is an error or unexpected output at some point, you will not know when it happened, and you cannot tell how long each run takes either.

To add a little bit more context into this logfile, timestamps can be added to each line:

from datetime import datetime
import time

@app.cli.command()
def scheduled():
    """Run scheduled job."""
    print(str(datetime.utcnow()), 'Importing feeds...')
    time.sleep(5)
    print(str(datetime.utcnow()), 'Users:', str(User.query.all()))
    print(str(datetime.utcnow()), 'Done!')

With this change, now your logfile will show the time each line was printed:

2020-06-28 23:03:25.597371 Importing feeds...
2020-06-28 23:03:30.599382 Users: []
2020-06-28 23:03:30.621601 Done!

If your job outputs more than a handful of lines in each run, you should use the logging module from Python to create a more robust logfile.
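As a starting point for that, here is a minimal sketch based on the standard logging module. The `make_logger()` and `run_job()` helpers and the `work` callable are hypothetical names chosen for this example. The logger writes to stdout by default, so the redirection in the crontab line still decides where the output ends up. Note that the default timestamp of the logging module uses local time, unlike the utcnow() calls above.

```python
import logging
import sys
import time


def make_logger(stream=None):
    """Return a logger that prefixes every line with a timestamp and level."""
    logger = logging.getLogger('scheduled')
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate lines if called more than once
    handler = logging.StreamHandler(sys.stdout if stream is None else stream)
    handler.setFormatter(
        logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def run_job(logger, work=lambda: time.sleep(0.1)):
    """Run the (hypothetical) work callable, logging start, end and duration."""
    started = time.monotonic()
    logger.info('Importing feeds...')
    try:
        work()
    except Exception:
        # logger.exception() records the stack trace along with the message.
        logger.exception('Job failed')
        raise
    logger.info('Done in %.2f seconds', time.monotonic() - started)
```

Inside the scheduled command you would call `run_job(make_logger())`, passing the real work as the `work` argument. Each run then records when it started, how long it took, and any stack trace if it failed.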

Conclusion

I hope this tutorial gave you a clear idea of how to implement regularly scheduled background jobs in your Flask application. In the introduction I mentioned that using Python-based solutions is a bad idea. In case you want to know why, here are some problems:

  • If your background job runs in the context of your Python process with APScheduler or a similar package, when you scale your Flask application to more than one worker you'll have multiple background jobs as well.
  • If you run your background job in a homegrown thread-based solution, you'll have to have very robust error handling in place. If not, whenever the background thread crashes your jobs will stop running.

Unlike most of these Python implementations, using cron requires no additional dependencies. If you deploy on a Linux machine, you always have cron available to you.

I hope I convinced you, but if you have a method of running background jobs that you like better than cron and would like to tell me about it let me know below in the comments!

50 comments
  • #26 Miguel Grinberg said

    @Roark: No. This is not a topic covered in any of my books.

  • #27 drdd said

    Dude, you're amazing. That's so much easier than setting up python advanced scheduler. Thank you!

  • #28 Mike said

    @Miguel: Thank you for the tutorial, it's so informative. Could you please comment on any options/workarounds for a Cloud Foundry container? Thus, I could manually run my task/job in SSH by "tmp/lifecycle/shell", then "export FLASK_APP=my_app", finally "flask my-task". But crontab file wasn't run as far as there is no cron service in CF. Thank you!

  • #29 Miguel Grinberg said

    @Mike: I'm not familiar enough with CF to comment on this. I suppose you can use a base image that has cron, I know that works for Docker. Or else use any scheduling options provided to you by the CF platform.

  • #30 Kolade said

    Hi Miguel, Thank you so much for your Mega Tutorial, it's been so helpful although I am wondering why I can't use Celery's crontab option instead of a linux cronjob. What are the complications as my current Flask API is using celery. Thank you

  • #31 Miguel Grinberg said

    @Kolade: You can use any method that you like. Because I show how to use A you shouldn't assume that I'm telling you not to use B. I prefer cron jobs. If you already have Celery running and you are happy with it I don't see why you shouldn't use it for your scheduled jobs.

  • #32 Abhishek Kumar said

    I am not using virtual environment how will I execute cli command using crontab flask

  • #33 Miguel Grinberg said

    @Abhishek: if you prefer not to use a virtualenv, then you need to figure out where your flask command is installed, and use the full path in the crontab command.

  • #34 Dannel said

    @Miguel I am using Windows for development and Heroku for deployment. What do you recommend in my case?

  • #35 Miguel Grinberg said

    @Dannel: Heroku has a scheduler extension. On Windows I would use cron under the WSL.

  • #36 Adel said

    @Miguel, thank you very much for such a great tutorial! Did I understand right that every invocation of a flask CLI command (in this case, our custom command) will create the app context and then destroy it after the command finishes? So, if we invoke a background task every minute, and we have a connection to a database in that command (or in the app context), we will be establishing and destroying connections to the database every minute. Is not it more efficient to simply have a daemon running in the background (say, with a simple infinite loop) that will keep 1 open connection to the database and just fire commands when needed?

  • #37 Miguel Grinberg said

    @Adel: It really depends. A connection per minute is really small, so I don't see it as something that needs to be optimized. But if you feel the need to do it, then sure, make it a daemon, and let SQLAlchemy pool the connection. You won't be able to use cron though, you'll have to use your own scheduler or use a third party one from the Python ecosystem.

  • #38 Adel said

    @Miguel, I got you, thank you very much once again!

  • #39 Adel said

    Dear @Miguel, I have another question to you - is it possible to access an active flask socket.io connection inside the croned background task? So that, the updates can be pushed immediately to the ones who are connected at the moment.

  • #40 Miguel Grinberg said

    @Adel: Yes. See the "emitting from an external process" section of the documentation.

  • #41 Jonas Hansen said

    If you do a custom command that updates data for your webapp how do you then tell the flask app to refresh the website to include the new data?

  • #42 Miguel Grinberg said

    @Jonas: Flask does not have the ability to update the page, unless you implement something like WebSocket. In general it is the client code running in the browser that requests periodic updates by refreshing the page.

  • #43 Filipe Galo said

    Hey Miguel,

    I've been reading your blog since the start of this crazy pandemic and 100% you taught me a lot. Every time I want to do something with Python or Flask I came here.

    This post about how to run scheduled jobs is so clean and simple. Instead of installing APScheduler and worrying about the application context.

    Thanks a lot for everything.

  • #44 Osama Abbas said

    First, thanks a lot for the tutorial.

    I was wondering if I am able to use your method in updating a token. Let me briefly explain what I mean.

    I have a Flask Application that uses a token from a provider with client_id and client_secret (obtained from the provider) to be able to use some API endpoints. I am using Request-OAuth for this. The bearer access token I receive is then used with subsequent requests. But this token expires in 1 hour (3600 seconds).

    My Flask application doesn't have any login system to log the user out when token is expired. Thus, my workaround is that when the webserver is started with home page visited a token is obtained. A get_token function runs, and the token is then saved in Flask's session

    session["access_token"] = access_token
    session.modified = True
    

    However, after the first hour, the application needs to obtain a new access token. Currently, I redirect the user to the home page to get the access token renewed where the get_token function runs. But I feel like it is silly that the user has to be redirected home to get the token renewed.

    Does your method fit my scenario?

  • #45 Miguel Grinberg said

    @Osama: what do you mean here by "my method"? Nothing I show in this article is really my invention, I'm showing how to configure cron jobs, which are a feature of UNIX operating systems. I don't understand the relationship you made between cron jobs, which run at given times, and renewing tokens when they expire. Wouldn't it make more sense to renew the token directly in the app when it expires?

  • #46 Anthony Udeagbala said

    This works perfectly.

    I have been trying to automate the process using Docker but I can't seem to do this. I don't want to manually edit the cron file because I would deploy the application to Digital Ocean.

    How can Docker handle this?

  • #47 Miguel Grinberg said

    @Anthony: I would imagine you need to install the cron daemon in your image if you want to run cron jobs inside your container.

  • #48 José María Sánchez said

    Thank you very much Miguel, I love your articles and your videos. I'm programming with Flask thanks to you. Muchas gracias amigo, por favor, sigue ilustrándonos con tus conocimientos. Saludos desde la Isla de Ibiza, España.

  • #49 Naved said

    I have to run a cron job every month, and also, when a user does an action, I have to remove that action 30 minutes later. How can I do both (cron job and scheduler) in the Flask app?

  • #50 Miguel Grinberg said

    @Naved: this article is for scheduled jobs, I would not use cron for background jobs that start as a result of user actions. Maybe look into Celery or RQ for that.
