2021-12-03T11:03:40Z

API Authentication with Tokens

In this article I'm going to show you a few common patterns for client authentication based on tokens, and how can they be implemented in a Python API back end. This method of authentication works well for rich clients, like JavaScript-based front end applications running in the browser, or perhaps a command-line (CLI) application.

I have written about Authentication several times on this blog, so this article is a bit different. Since I have already provided a few authentication projects in previous articles and in my open source projects, in this article I'm going to go over all the considerations you have to take into account when deciding how to best implement authentication for your own API project.

This article was voted by my supporters on Patreon. Would you like to support my work, and as a thank you be able to vote on my future articles and also have access to a chat room where I hang out? Become a Patron!

Types of Tokens

In terms of their composition, there are two large groups or categories of tokens that I'm going to discuss in this article. Depending on the needs of your application you will have to choose which type of token works best. To be honest, I do not know if there are formal names for these, so I'm going to name them myself. The two groups are random tokens and signed tokens.

Random Tokens

Random tokens are, as the name I've given them implies, composed of a sequence of random characters. You may have also heard the term "API Key" for this type of token. There are lots of well-known APIs that use random tokens. Examples that come to mind are AWS, Azure, Twilio, Slack, and many many more.

In Python, you can generate random tokens using the secrets module of the Python standard library:

>>> import secrets

>>> secrets.token_bytes()
b'|yg\xa8\xc0\x07\xd5z\x9d#\xfe\x94\x17\xecw`s\x96g\xef\xea\xe4\x1d\x80\x11\xfd\xa4y\xfce\xf4\x80'

>>> secrets.token_hex()
'73e59171f050865e733cb1a3e16413e07983a3169601211610cdce00ed185e3d'

>>> secrets.token_urlsafe()
'oWjvY0hX05Py-1N1UU9aRrFe0n-82iQgoqKnW03CdRA'

The token_bytes(), token_hex() and token_urlsafe() functions from the secrets module generate tokens in three different formats, so you have a few options depending on what's most convenient for your application. These functions take an optional argument that specifies the token length, but they all have a reasonable default.

Which of the three options is the best? The token generation engine is the same in all of them, what changes is how the token is rendered before it is returned to you. My personal favorite is the token_urlsafe() function, because it generates the shortest strings for a given token length.

When you generate a random token for a client, you have to preserve this token in your user database under the user represented by the client. This is necessary because later when the client presents the token for authentication you will have to ensure it matches the original.

If you want to have the utmost security, you can encrypt the tokens before you store them in your database. For this you can use your favorite encryption algorithm, or your database's encryption facilities if available. If you decide to implement encryption for your tokens, you should make sure you don't store your encryption key(s) in the same database as your encrypted tokens. If you feel confident that you have good security practices on the server that hosts your database, then encrypting the tokens might be an unnecessary complication.

You will want to have an index built on the database column that holds your tokens (encryption will definitely complicate this!), so that when a client authenticates with a token you can search this index and quickly determine which user owns the token.

Signed Tokens

Signed tokens are much more complex than random tokens and are supported by cryptographic functions. In general a signed token has three components: a header, one or more claims, and the signature.

The header stores metadata about the token that is useful in decoding and verifying the token. This can include cryptographic signing algorithms used, version numbers and similar information that is not sensitive.

The server will add one or more claims to the token. These describe the client requesting the token. The most common claims for an authentication token are the user ID and the user role, but other claims that make sense for the application can be added. In general claims stored in tokens are not encrypted, so you should avoid storing sensitive information such as passwords or API keys.

The signature part of the token is what provides legitimacy to the claims. This signature is generated from the information contained in the claims using cryptographic functions and a secret key that is only known to the server that generates the tokens. Once a signature is attached to the token, it is not possible to alter the claims, because changes made to the claims would render the signature invalid.

To authenticate a client with a signed token it is necessary to first validate the token signature. If the signature does not validate, then access is denied. If the signature passes validation, then access is granted, and the claims included in the token can be assumed to be legitimate, so they are used to determine the identity of the client.

The algorithms used in generating and verifying the signatures can be symmetric or asymmetric. When using symmetric signing algorithms, the same secret key used to sign the token needs to be used when the token signature is verified. Asymmetric signing algorithms use two keys, a secret key to sign the token and a public key to verify it. The benefit of asymmetric signing is that it allows anybody to verify signed tokens, without compromising their security.

By far the most popular signed token format is JSON Web Token (JWT, sometimes pronounced "jot"). In Python, the pyjwt package provides easy functions to create and verify your tokens. Below you can see how to generate a JWT token using this library:

>>> import jwt
>>> encoded_jwt = jwt.encode({"some": "payload"}, "secret", algorithm="HS256")
>>> print(encoded_jwt)
eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJzb21lIjoicGF5bG9hZCJ9.Joh1R2dYzkRvDkqv3sygm5YyK8Gi4ShZqbhK2gxcs2U

In this example (which I took from pyjwt's documentation) the claim is {"some": "payload"}, the symmetric secret key is "secret", and the signing algorithm, which is stored in the token's header is "HS256".

Decoding the token, which also verifies the signature, is done as follows:

>>> jwt.decode(encoded_jwt, "secret", algorithms=["HS256"])
{'some': 'payload'}

If the signature does not validate, or if the signature is valid, but the token has an expiration claim that is in the past, then pyjwt will raise an exception instead of returning the claims.

If you are interested in the asymmetric key usage, I have written a dedicated article on how to implement asymmetric signing with JWT.

The most important difference signed tokens have is that they do not need to be stored in your user database, because the information about the user is stored directly inside the token. So for a signed token, you just have to generate the token and return it to your client. When the token is presented back for authentication, the server just needs to verify the signature and use the claims to identify the user. While not needing to store tokens in a database seems like a great advantage, this has a problem, which I'm going to discuss in the Token Revocation section below.

How Does the Client Get the Token?

We know that the client is going to send a token when it makes an API call, but to be able to do that it first needs to obtain the token. For this task there are also two possible methods.

Copy/Paste Method

An approach that is widely used by API services that are typically called from application servers (as opposite to a browser) is to show the token (sometimes called API key) in the website of the service provider. The developer of the consumer application must copy the token from the provider's website and then add it to their application configuration. This is an approach that is used by many cloud services, including Slack and Twilio, to name just a couple. If your API is designed to be called by other servers, then this is a good model to use.

Auth Endpoint Method

When the consumer application runs in the browser the solution described above does not work, first because the browser is an insecure platform where it is not possible to safely store a token or API key, but also because you'll very likely want each user who logs in to the application to use their own individual token.

For this type of application the process is more involved. When the user logs in, for example by providing an email address and a password, the browser sends a request to a token endpoint, authenticating on behalf of the user with the credentials provided. If the endpoint verifies the credentials, then it generates a token for the client and returns it in the response. From then on, the client authenticates their requests with this token. The two-legged OAuth protocol is a well known example of this process.

Authentication for this method is often implemented using the HTTP Basic Authentication standard, in which the client passes the user provided credentials in the Authorization header of the request, with the following structure:

Authorization: Basic base64("<username>:<password>")

Implementing token generation with this method in a Flask application is relatively simple. The snippet of code below comes from my Flask Mega-Tutorial. This code uses Flask-HTTPAuth's support for HTTP Basic Authentication to implement the token generation endpoint.

from flask_httpauth import HTTPBasicAuth
from app.models import User
from app.api import bp

basic_auth = HTTPBasicAuth()


@basic_auth.verify_password
def verify_password(username, password):
    user = User.query.filter_by(username=username).first()
    if user and user.check_password(password):
        return user


@bp.route('/tokens', methods=['POST'])
@basic_auth.login_required
def get_token():
    token = basic_auth.current_user().get_token()
    db.session.commit()
    return jsonify({'token': token})

See the Flask Mega-Tutorial API chapter if you want to learn this method in detail. At a high level, the verify_password() function is in charge of authenticating endpoints with username and password, relying on methods of the User model for the actual password verification.

The get_token() function is the actual authentication endpoint, decorated with the basic_auth.login_required, so that all requests to this endpoint are verified by calling the verify_password function. The endpoint relies, once again, on supporting functions provided in the User model to generate a token that is then returned to the client in the response.

This solution can be made more elaborate by the use of two tokens, an access token and a refresh token. An access token is used for authenticating against API endpoints, but this token is provided with a relatively short expiration time. When the token expires, it cannot be used anymore, and the client must request a new access token by invoking another endpoint that is authenticated with the refresh token. If you are interested in learning more about refresh tokens, there is no better source than the OAuth 2 specification.

Authenticating API Endpoints

Once the client is in possession of a token, it can send authenticated requests. The actual authentication mechanism that is often used is Bearer Authentication, which also uses the Authorization header:

Authorization: Bearer <token>

For Flask applications, the HTTPTokenAuth class from the Flask-HTTPAuth extension simplifies the server-side implementation. Once again, here is a short excerpt from the Flask Mega-Tutorial code:

from flask_httpauth import HTTPTokenAuth
from app.models import User

token_auth = HTTPTokenAuth()


@token_auth.verify_token
def verify_token(token):
    return User.check_token(token) if token else None

The verify_token() function is registered as a verification function for the token_auth object. Any endpoints decorated with token_auth.login_required will invoke this function to authorize the request.

After looking at the Basic and Bearer authentication implementations, you may be wondering if these two authentication methods can coexist in the same application. The answer is yes, there is absolutely no issue with having two (or more) Flask-HTTPAuth objects in the same application. Each of these objects provides its own login_required decorator, so for each endpoint you have to choose the appropriate decorator for the type of authentication you want that endpoint to use.

Token Revocation

An important security consideration when working with token authentication is making it easy to revoke tokens. This is not only important to control a leak, but also as a "logout" mechanism that clients can use to disable a token once they don't need it anymore, ensuring that even if this discarded token is leaked it won't be of use.

If you are using random tokens, revoking a token requires removing the token from the database. Once the token is not in the database, it won't work as authentication. I typically implement an endpoint such as /tokens with the DELETE method to do the revocation. Of course this endpoint must also be authenticated, possibly with the token being revoked. Here is a snippet from the Flask Mega-Tutorial:

@bp.route('/tokens', methods=['DELETE'])
@token_auth.login_required
def revoke_token():
    token_auth.current_user().revoke_token()
    db.session.commit()
    return '', 204

If you use signed tokens, then the revocation process is much more difficult, because these tokens are not stored in a central database from where they can be deleted. The solution for this type of token is to build a revoked tokens table in your database, where tokens that are revoked are stored. When a client presents a token for authentication, the revoked token table must be searched and if the token appears in this table, then the request must be denied, even if the token is otherwise valid. If your tokens have an expiration, then they only need to be in the revoked tokens table during the validate period. A background task (maybe a cron job) can run at regular intervals and purge any expired revoked tokens, since once they are expired they do not present a risk anymore.

If you thought signed tokens such as JWT are great because they do not require database storage, now you can see why in most cases that is actually not as great as it seems, since you will still need database storage for the revocations.

A Complete Example

In this article I have decided to not include a complete example, because it would largely be duplicating the API chapter of my Flask Mega-Tutorial. If you are interested in studying a complete implementation, then that chapter is the most up to date resource I can offer you.

Leave a Comment