2020-07-20T10:29:00Z

Run Your Flask Regularly Scheduled Jobs with Cron

A common need of web applications is to have a task that runs periodically in the background. This could be a task that imports new data from third-party sources, or one that removes revoked tokens from your database once they have expired. In these and many other situations you are faced with the challenge of implementing a task that runs in the background at regular intervals.

This is a pattern that many people ask me about. I've seen implementations based on the APScheduler package, on Celery, and even homegrown solutions running inside a background thread. Sadly, none of these options are very good. In this article I'm going to show you what I believe is a very robust implementation based on the Flask CLI and the cron service.

Implementing the Job Logic

I adhere to the "divide and conquer" principle, so when I'm implementing a scheduled job I prefer to separate the job itself from the scheduling and also from the web application. So I really view a job that runs at regular intervals as a standalone short-lived job that runs once, configured to run over and over again at the desired frequency.

When working with a Flask application, I find that the best option for implementing a short-lived job is to write it as a command attached to the flask command, not only because I can consolidate all my jobs under a single command, but also because a Flask command runs inside an application context, so I can use many of the same facilities I have access to in my Flask routes, the most important of all being the database.

Below you can see an example of how I would implement a job. In this case I'm using the flasky application featured in my Flask Web Development book. This application already has a few custom commands, so I added one more at the end of the flasky.py module:

import time

@app.cli.command()
def scheduled():
    """Run scheduled job."""
    print('Importing feeds...')
    time.sleep(5)
    print('Users:', str(User.query.all()))
    print('Done!')

Because this is just a demonstration, I'm not doing anything specific in this job, just a five second sleep to simulate some work being done. I added a few print statements that stand in for logging, and also included a simple database query, to confirm that the Flask-configured database works as expected inside the custom command.

Now I can see my custom command when I run flask --help:

(venv) $ flask --help
Usage: flask [OPTIONS] COMMAND [ARGS]...

  This shell command acts as general utility script for Flask applications.

  It loads the application configured (through the FLASK_APP environment
  variable) and then provides commands either provided by the application or
  Flask itself.

  The most useful commands are the "run" and "shell" command.

  Example usage:

    $ export FLASK_APP=hello.py
    $ export FLASK_DEBUG=1
    $ flask run

Options:
  --version  Show the flask version
  --help     Show this message and exit.

Commands:
  db         Perform database migrations.
  deploy     Run deployment tasks.
  profile    Start the application under the code...
  run        Runs a development server.
  scheduled  Run scheduled job.
  shell      Runs a shell in the app context.
  test       Run the unit tests.

This way of writing the job makes it easy to test, since I can simply run it from the command line as many times as I need until I get it right:

(venv) $ flask scheduled
Importing feeds...
Users: [<User 'miguel'>]
Done!

Defining a Cron Job

Once the job is written and tested, it is time to implement the scheduling part. For this I find the cron service, available in all Unix-based systems, more than adequate.

Each user in a Unix system can set up scheduled commands in a "crontab" (cron table) file, which the system then executes on the requested schedule. The crontab command opens a text editor on the user's crontab file:

$ crontab -e

It is important to run the crontab command as the user that is intended to run the scheduled job, which typically is the same user that runs the web application. This ensures the job will run with the correct permissions. I recommend that you do not set up your scheduled jobs under the root user, in the same way you shouldn't be running your web application as root.
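As a side note, if you administer the server from a privileged account, the crontab command also accepts a -u option (root privileges required) to edit another user's crontab. Here I'm assuming a hypothetical ubuntu user:

```shell
$ sudo crontab -u ubuntu -e
```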

The crontab -e command will start a text editor on the user's crontab file, which will initially be empty, aside from some explanatory comments.

A scheduled job is given in the crontab file as a line with six fields. The first five fields define the run schedule for the job, and the sixth and last field is the command to run. You can configure multiple jobs, each with its own schedule, by writing multiple lines in the crontab file.

I find the easiest way to set up a scheduled job is to start with a default configuration that runs the command once per minute, as this allows me to verify that the command runs correctly without having to wait long between runs.

To run a job once a minute put five stars separated by spaces, followed by the command to run:

* * * * * command

In my example the command I want to run is flask scheduled, but in general, when you write a command in a crontab file, you have to adapt it to compensate for the differences between running the command from a terminal and having the cron service run it. There are four aspects that need to be considered:

  • Current directory: If the command needs to run from a specific directory, you have to add a cd command to the cron job. It may also be necessary to give an absolute path to the command itself.
  • Environment variables: If the command needs environment variables, they need to be set as part of the command. My recommendation is to store your variables in a .env and/or a .flaskenv file, so that they are automatically imported by Flask when the command starts.
  • Virtual environment: This one is specific to Python applications. You have to either activate the virtual environment as part of the cron command, or else invoke the executable located inside the virtualenv's bin directory.
  • Logging: The cron service collects the output of the command and emails it to the Unix user. This is almost always inconvenient, so it is best to make sure the command writes nothing to the console, by redirecting stdout and stderr to a logfile.
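As an example of the environment variable item above, here is what a minimal .flaskenv file could look like for the flasky application. The flask command loads this file automatically, as long as the python-dotenv package is installed:

```shell
# .flaskenv: imported into the environment when the flask command starts
FLASK_APP=flasky.py
```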

Here is how my flask scheduled command can be configured to run once a minute as a cron job:

* * * * * cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1

The && is used to include multiple commands in a single line. With it I can cd to the directory of my project and then execute the command. To make sure the virtual environment is activated I fish the flask command directly out of the virtualenv's bin directory. This achieves the same effect as activating the environment. For environment variables this application uses a .env file, so that works the same under cron. In terms of logging I first redirect stdout to a file with >>scheduled.log, which will cause new runs of the job to append at the end of the file. For stderr I used 2>&1, which means that I want to apply the same redirection for stderr that I configured for stdout (the "2" and the "1" reference the file handle numbers for stderr and stdout respectively).
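Just as an illustration of the other option mentioned above, the virtual environment could instead be activated as part of the cron command, with the same end result:

```shell
* * * * * cd /home/ubuntu/flasky && . venv/bin/activate && flask scheduled >>scheduled.log 2>&1
```

Note that cron runs its commands with /bin/sh, so the activation script is sourced with the portable "." command instead of bash's source.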

As soon as you save the file and exit the text editor, the scheduled job will start to run at the top of every minute, and you should see the output of each run added to the end of the scheduled.log file. If the command ends with a crash, the stack trace will be written to stderr, which we are also redirecting to the logfile, so you'll see the error in the log.

Once you have the command running successfully once a minute, you can start thinking about creating a final schedule for it. The five stars represent the following time specifications in order:

  • The minute, from 0 to 59 or * for every minute
  • The hour, from 0 to 23 or * for every hour
  • The day of the month, from 1 to 31 or * for every day
  • The month, from 1 to 12 or * for every month
  • The day of the week from 0 (Sunday) to 6 (Saturday) or * for every day of the week

Using stars for all fields means that we want to run the job on every minute of every hour of every day of every month, and on every day of the week. If I wanted to run the job once per hour instead of once per minute, all I need to do is set a specific minute. For example, to run at the 0th minute of every hour:

0 * * * * cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1

If instead I wanted to run once per hour, but at the 5th minute (i.e. at 0:05, 1:05, 2:05 and so on):

5 * * * * cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1

To run the job daily at 4:05am:

5 4 * * * cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1

If instead I wanted to run daily at 4:05pm:

5 16 * * * cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1

To run the job at 4:05am, but only on Tuesdays:

5 4 * * 2 cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1

Instead of using a single number for each field, you can specify multiple ones separated by commas. To run the job at 4:05am on Tuesdays and Fridays:

5 4 * * 2,5 cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1

Ranges of consecutive numbers can be given with a dash. To run the job at 4:05am only on weekdays:

5 4 * * 1-5 cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1

When you specify a range of numbers, you can also include a step argument. The following example runs the job every 2 minutes, on the even minutes:

0-59/2 * * * * cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1

And if I wanted to run every two minutes on the odd ones:

1-59/2 * * * * cd /home/ubuntu/flasky && venv/bin/flask scheduled >>scheduled.log 2>&1

I hope by now you get how this works. If you want to practice different cron schedules, the crontab.guru site is great, as it translates a given specification into words to make it clearer.
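If you prefer to experiment in code, the list, range and step rules described above can be captured in a few lines of Python. This is just a sketch for exploration, not part of the application, and the expand_field function is my own invention:

```python
def expand_field(spec, lo, hi):
    """Expand a single crontab field (such as "*", "5", "2,5" or
    "1-59/2") into the sorted list of values it matches, given the
    valid range lo..hi for that field."""
    values = set()
    for part in spec.split(','):
        step = 1
        if '/' in part:          # optional step suffix
            part, step = part.split('/')
            step = int(step)
        if part == '*':          # wildcard covers the whole range
            start, end = lo, hi
        elif '-' in part:        # range of consecutive values
            start, end = (int(n) for n in part.split('-'))
        else:                    # single value
            start = end = int(part)
        values.update(range(start, end + 1, step))
    return sorted(values)


print(expand_field('2,5', 0, 6))      # [2, 5] -> Tuesday and Friday
print(expand_field('1-59/2', 0, 59))  # the odd minutes
```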

Logging Improvements

Once you configure your desired interval, your cron job will run at the scheduled times. To review how it is doing, you may want to check the logfile. If you use the print based technique I showed above for logging, you will end up with a confusing logfile that looks somewhat like this:

Importing feeds...
Users: [<User 'miguel'>]
Done!
Importing feeds...
Users: [<User 'miguel'>]
Done!
Importing feeds...
Users: [<User 'miguel'>]
Done!

The problem is that the output of the command is always the same, so you get this repetitive stream. If at any point there was an error, or unexpected output, you will not know when it happened. And if you wanted to know how long your runs take, there is no way to tell either.

To add a bit more context to this logfile, a timestamp can be added to each line:

from datetime import datetime
import time

@app.cli.command()
def scheduled():
    """Run scheduled job."""
    print(str(datetime.utcnow()), 'Importing feeds...')
    time.sleep(5)
    print(str(datetime.utcnow()), 'Users:', str(User.query.all()))
    print(str(datetime.utcnow()), 'Done!')

With this change, now your logfile will show the time each line was printed:

2020-06-28 23:03:25.597371 Importing feeds...
2020-06-28 23:03:30.599382 Users: []
2020-06-28 23:03:30.621601 Done!

If your job outputs more than a handful of lines in each run, you should consider using Python's logging package to create a more robust logfile.

Conclusion

I hope this tutorial gave you a clear idea of how to implement regularly scheduled background jobs for your Flask application. In the introduction I mentioned that using Python-based solutions is a bad idea. In case you want to know why, here are some of the problems:

  • If your background job runs inside your Python process with APScheduler or a similar package, then when you scale your Flask application to more than one worker you'll end up with multiple copies of the background job running as well.
  • If you run your background job in a homegrown thread-based solution, you'll need very robust error handling in place; if not, whenever the background thread crashes your jobs will stop running.

Unlike these Python implementations, using cron requires no additional dependencies. If you deploy on a Linux machine, you always have cron available to you.

I hope I convinced you, but if you have a method of running background jobs that you like better than cron and would like to tell me about it let me know below in the comments!

24 comments

  • #1 Kris said 2020-07-20T21:12:41Z

    Great article (as always). I'm curious if your approach would differ if you wanted to manage the schedule within Flask. For instance, if you wanted to provide a view to modify the schedule, run the tasks immediately and see the current status. Would you still use cron as the scheduler and write (or look for) a library to manage the cron schedule or would you look to something else?

    I find myself building apps like this quite often and have used APScheduler. It works well but, I agree that it would be better to separate that functionality. Celery seems like overkill for my use cases.

  • #2 Miguel Grinberg said 2020-07-20T22:39:01Z

    @Kris: APscheduler is a bad choice for a scheduled job within a web application, in my opinion. Once you scale to more than one server process you have to do crazy things to prevent duplicate schedules running in each process. So yes, I would still use cron.

  • #3 Dan said 2020-07-22T17:47:39Z

    Miguel - thank you for this and everything else you've written about Flask.

    This post is timely as I've been thinking about adding a timed job to my app.

    I was looking at APScheduler but you have pointed out the problems with this approach.

    However, I would ideally like to define my timer in Python code so that it sits alongside the rest of my app and gets checked into git and copied into my Docker image.

    Have you got any thoughts on if this is a good idea / how it could be done?

    Maybe I could set up the cron job with python-crontab or similar?

    Thanks again!

  • #4 Miguel Grinberg said 2020-07-22T22:11:52Z

    @Dan: how about committing a copy of your crontab file as a source file?

  • #5 Dan said 2020-07-22T22:46:34Z

    Good idea, thank you.

  • #6 Aryal said 2020-08-24T08:30:26Z

    Hey Miguel, when I run the Cron job, I get the same error as I get when I try to run flask cli commands without settings 'FLASK_APP=flasky.py'. How would one go about setting the FLASK_APP variable in this case?

  • #7 Miguel Grinberg said 2020-08-24T13:35:22Z

    @Aryal: the easiest way is to add a .flaskenv file to your project. See Chapter 1 of the Mega-Tutorial.

  • #8 Tom said 2020-09-07T16:22:56Z

    Miguel, quick question - what is the difference between ".flaskenv" (see question #7 above) and ".env" you used in the mega-tutorial, chapter 17 (https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-xvii-deployment-on-linux). Thank you for all your work, Tom

  • #9 Miguel Grinberg said 2020-09-07T21:51:43Z

    @Tom: technically there is no difference, both files are imported into the environment. Normally the .env file has application specific variables (including secrets) while .flaskenv has Flask variables such as FLASK_APP.

  • #10 Edmund said 2020-09-13T09:22:25Z

    Hi Miguel, I like the sound of this approach, running inside the application context, so I can access the database etc. I have db access working in my cli command now, great. However, I am trying to send a periodic reminder email which I have setup as another method in the email.py file that I set up from the mega-tutorial. It uses url_for() to generate the links in the email. I hit errors saying:

    File "c:\users\xxxxx\pycharmprojects\my-app\venv\lib\site-packages\flask\helpers.py", line 333, in url_for RuntimeError: Application was not able to create a URL adapter for request independent URL generation. You might be able to fix this by setting the SERVER_NAME config variable.

    Looking in helpers.py it looks like "url_adapter" must be null.

    url_adapter = appctx.url_adapter

    if url_adapter is None:

    Note: the same email method generates the email/links fine when run from within the web application (i.e from routes.py)

    Any thoughts on solutions? I have played around with setting the SERVER_NAME config var without much success once deployed in heroku, but it also feels quite hacky and must be a better way to set up the SERVER_NAME properly.

    Huge thanks as always.

  • #11 Miguel Grinberg said 2020-09-13T17:07:44Z

    @Edmund: Right, so the problem is that Flask cannot generate URLs because it does not know the domain. During a request it gets it from the request context, so here you have to provide it in the SERVER_NAME configuration option, and then your URLs will use that. I agree that it is a bit hacky, that's the only way Flask can get this information when there is no client providing it.

  • #12 Austin Bravo said 2020-09-15T16:52:58Z

    This was exactly what I needed for a client project - thanks for making it.

  • #13 Theo said 2020-09-17T21:24:38Z

    Hey Miguel, that's really helpful thanks. How would you deal with automatic horizontal scaling so for example you have many instances of your app running on different servers? Could you also think of way of using a scheduler? For instance could you do sth like checking for a unique key on a shared redis server which is created when the task runs for the first time?

  • #14 Miguel Grinberg said 2020-09-18T09:55:21Z

    @Theo: I'm confused by your question. The problem you mention is typical of using Python schedulers. I'm against that, this article shows an alternative approach to scheduling that is done outside of the Python process. With this solution you can scale your application freely and the background jobs are unaffected.

  • #15 Noah said 2020-10-07T22:13:33Z

    Is this something that can be done with a Heroku deployment? Or would you need to do this with one of Heroku’s add-ons? Thank you Miguel!

  • #16 Miguel Grinberg said 2020-10-08T08:29:12Z

    @Noah: cron is not accessible in the Heroku environment, you have to use their scheduling add-on.

  • #17 Andy said 2020-10-09T11:16:18Z

    Hey Miguel this is so helpful! I can't believe how simple it is. I was having so much trouble using apscheduler for a simple scheduled task in my app so i have very quickly switched to this method and this will be extremely useful for other things. Thanks!

  • #18 Usman Kamal said 2020-10-09T23:48:17Z

    Thanks Miguel for the guide. I've followed above guidelines but unfortunately cron is unable to find my Flask's custom command. If I run the exact crontab entry from CLI it triggers the command correctly. Below are the details, any hints would be helpful.

    Crontab file entry: * * * cd /home/usmankamal/cronapp && r-s-py3.6-cron/bin/flask xyz >>xyz.log 2>&1

    xyz.log: " Usage: flask [OPTIONS] COMMAND [ARGS]... Try 'flask --help' for help.

    Error: No such command 'xyz'. "

  • #19 Miguel Grinberg said 2020-10-10T09:22:10Z

    @Usman: it's five stars before the command, not three.

  • #20 Usman Kamal said 2020-10-10T12:25:40Z

    @Miguel: My bad, I didn't copy paste the entry correctly in above message. But the crontab file has the correct number of stars.

    These are the contents of my crontab file: https://justpaste.it/7ehq9

    As you can see I've tried a couple of other tweaks in commented entries but they didn't work either. The problem seems to be that when running via cron it doesn't identify the custom command, but if I run the exact contents of the crontab entry from CLI it works as mentioned in my earlier message.

  • #21 Miguel Grinberg said 2020-10-10T13:39:50Z

    @Usman: is "xyz" a custom CLI command? Have you set the FLASK_APP environment variable in .flaskenv so that the "flask" command knows where to find your application instance?

  • #22 Usman Kamal said 2020-10-11T13:24:00Z

    @Miguel: You were right, I was missing .flaskenv. I was of the view that cron will pick the FLASK_APP from system environment variables just like my terminal does. Thanks a bunch, much appreciated.

  • #23 Weston Eric Jones said 2020-11-18T12:54:47Z

    Could this be done for an application deployed to AWS?

  • #24 Miguel Grinberg said 2020-11-18T15:44:07Z

    @Weston: Yes. This would work exactly as I show it here on an EC2 server.
