Flask Video Streaming Revisited


Flask Video Streaming Server

Almost three years ago I wrote an article on this blog titled Video Streaming with Flask, in which I presented a very modest streaming server that used a Flask generator view function to stream a Motion-JPEG stream to web browsers. My intention with that article was to show a simple yet practical use of streaming responses, a little-known feature of Flask.

That article became extremely popular, though not because it teaches how to implement streaming responses, but because a lot of people want to implement streaming video servers. Unfortunately, my focus when I wrote the article was not on creating a robust video server, so I frequently get questions and requests for advice from people who want to use the video server in a real application and quickly run into its limitations. So today I'm going to revisit my streaming video server and describe a few improvements I've made to it.

Recap: Using Flask's Streaming for Video

I recommend you read the original article to familiarize yourself with my project. In short, this is a Flask server that uses a streaming response to provide a stream of video frames captured from a camera in Motion JPEG format. This format is very simple and not the most efficient, but it has the advantage that all browsers support it natively, without any client-side scripting. For that reason it is a fairly common format for security cameras. To demonstrate the server, I implemented a camera driver for a Raspberry Pi with its camera module. For those who didn't have a Pi with a camera at hand, I also wrote an emulated camera driver that streams a sequence of JPEG images stored on disk.
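On the wire, a Motion JPEG stream is just a multipart/x-mixed-replace response in which every part is one complete JPEG image, and the browser replaces the displayed image each time a new part arrives. The minimal sketch below shows that framing. The helper name mjpeg_parts is hypothetical, and the boundary string frame is assumed to match the one used by the video_feed route quoted in the comments further down; the frames are whatever JPEG bytes a camera driver produces.

```python
from typing import Iterable, Iterator

BOUNDARY = b"frame"  # must match the boundary named in the response mimetype


def mjpeg_parts(frames: Iterable[bytes], boundary: bytes = BOUNDARY) -> Iterator[bytes]:
    """Wrap each JPEG frame as one part of a multipart/x-mixed-replace body."""
    for frame in frames:
        # each part: boundary line, part headers, blank line, image bytes
        yield (b"--" + boundary + b"\r\n"
               b"Content-Type: image/jpeg\r\n\r\n" + frame + b"\r\n")
```

In a Flask view, such a generator could be returned as Response(mjpeg_parts(frames), mimetype='multipart/x-mixed-replace; boundary=frame'), where frames is any iterable of JPEG bytes.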

Running the Camera Only When There Are Viewers

One aspect of the original streaming server that people did not like is that the background thread that captures video frames from the Raspberry Pi camera starts when the first client connects to the stream, but then it never stops. A more efficient way to handle this background thread is to only have it running while there are viewers, so that the camera can be turned off when nobody is connected.

I implemented this improvement a while ago. The idea is that every time a frame is accessed by a client the current time of that access is recorded. The camera thread checks this timestamp and if it finds it is more than ten seconds old it exits. With this change, when the server runs for ten seconds without any clients it will shut its camera off and stop all background activity. As soon as a client connects again the thread is restarted.

Here is a brief description of the changes:

class Camera(object):
    # ...
    last_access = 0  # time of last client access to the camera

    # ...

    def get_frame(self):
        Camera.last_access = time.time()
        # ...

    @classmethod
    def _thread(cls):
        with picamera.PiCamera() as camera:
            # ...
            for foo in camera.capture_continuous(stream, 'jpeg', use_video_port=True):
                # ...
                # if there hasn't been any clients asking for frames in
                # the last 10 seconds stop the thread
                if time.time() - cls.last_access > 10:
                    break
        cls.thread = None

Simplifying the Camera Class

A common problem that a lot of people mentioned to me is that it is hard to add support for other cameras. The Camera class that I implemented for the Raspberry Pi is fairly complex because it uses a background capture thread to talk to the camera hardware.

To make this easier, I decided to move the generic functionality that does all the background processing of frames to a base class, leaving only the task of getting the frames from the camera to implement in subclasses. The new BaseCamera class in module base_camera.py implements this base class. Here is what this generic thread looks like:

class BaseCamera(object):
    thread = None  # background thread that reads frames from camera
    frame = None  # current frame is stored here by background thread
    last_access = 0  # time of last client access to the camera
    # ...

    @staticmethod
    def frames():
        """Generator that returns frames from the camera."""
        raise RuntimeError('Must be implemented by subclasses.')

    @classmethod
    def _thread(cls):
        """Camera background thread."""
        print('Starting camera thread.')
        frames_iterator = cls.frames()
        for frame in frames_iterator:
            BaseCamera.frame = frame

            # if there hasn't been any clients asking for frames in
            # the last 10 seconds then stop the thread
            if time.time() - BaseCamera.last_access > 10:
                frames_iterator.close()
                print('Stopping camera thread due to inactivity.')
                break
        BaseCamera.thread = None

This new version of the Raspberry Pi's camera thread has been made generic with the use of yet another generator. The thread expects the frames() method (which is a static method) to be a generator implemented in subclasses that are specific to different cameras. Each item returned by the iterator must be a video frame in JPEG format.

Here is how the emulated camera that returns static images can be adapted to work with this base class:

import time
from base_camera import BaseCamera

class Camera(BaseCamera):
    """An emulated camera implementation that streams a repeated sequence of
    files 1.jpg, 2.jpg and 3.jpg at a rate of one frame per second."""
    imgs = [open(f + '.jpg', 'rb').read() for f in ['1', '2', '3']]

    @staticmethod
    def frames():
        while True:
            time.sleep(1)
            yield Camera.imgs[int(time.time()) % 3]

Note how in this version the frames() generator forces a frame rate of one frame per second by simply sleeping that amount between frames.

The camera subclass for the Raspberry Pi camera also becomes much simpler with this redesign:

import io
import time
import picamera
from base_camera import BaseCamera

class Camera(BaseCamera):
    @staticmethod
    def frames():
        with picamera.PiCamera() as camera:
            # let camera warm up
            time.sleep(2)

            stream = io.BytesIO()
            for foo in camera.capture_continuous(stream, 'jpeg', use_video_port=True):
                # return current frame
                stream.seek(0)
                yield stream.read()

                # reset stream for next frame
                stream.seek(0)
                stream.truncate()
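The excerpts above elide the code that starts the background thread on demand. The sketch below puts the pieces of this section together into something self-contained, under stated assumptions: the thread is started by the first Camera instance, the constructor waits up to an assumed 5 seconds for the first frame, and IDLE_TIMEOUT is a hypothetical class attribute so the inactivity limit can be shortened for testing. It is not the code in the repository, and it leaves out the rate control described under Performance Improvements.

```python
import threading
import time


class BaseCamera:
    """Sketch: run a capture thread only while clients keep asking for frames."""
    thread = None        # background thread that reads frames from the camera
    frame = None         # latest frame, written by the background thread
    last_access = 0.0    # time of the last client request
    IDLE_TIMEOUT = 10.0  # seconds without clients before the thread exits (assumed knob)
    _lock = threading.Lock()

    def __init__(self):
        BaseCamera.last_access = time.time()
        with BaseCamera._lock:
            if BaseCamera.thread is None:
                # no capture thread running: start one for this subclass
                BaseCamera.frame = None
                BaseCamera.thread = threading.Thread(target=type(self)._thread, daemon=True)
                BaseCamera.thread.start()
        # block until the first frame is available (5 s limit is an assumption)
        deadline = time.time() + 5
        while BaseCamera.frame is None:
            if time.time() > deadline:
                raise RuntimeError("Camera did not produce a frame in time.")
            time.sleep(0.01)

    def get_frame(self):
        """Record the access time and return the latest frame."""
        BaseCamera.last_access = time.time()
        return BaseCamera.frame

    @staticmethod
    def frames():
        """Generator that returns frames from the camera."""
        raise RuntimeError("Must be implemented by subclasses.")

    @classmethod
    def _thread(cls):
        """Camera background thread: publish frames until clients go idle."""
        frames_iterator = cls.frames()
        for frame in frames_iterator:
            BaseCamera.frame = frame
            if time.time() - BaseCamera.last_access > cls.IDLE_TIMEOUT:
                frames_iterator.close()  # lets the driver release the camera
                break
        BaseCamera.thread = None
```

A driver then only needs to subclass BaseCamera and implement frames(), exactly as the emulated and Raspberry Pi cameras above do.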

OpenCV Camera Driver

A fair number of users complained that they did not have access to a Raspberry Pi equipped with a camera module, so they could not try this server with anything other than the emulated camera. Now that adding camera drivers is much easier, I wanted to also have a camera based on OpenCV, which supports most USB webcams and laptop cameras. Here is a simple camera driver for it:

import cv2
from base_camera import BaseCamera

class Camera(BaseCamera):
    @staticmethod
    def frames():
        camera = cv2.VideoCapture(0)
        if not camera.isOpened():
            raise RuntimeError('Could not start camera.')

        while True:
            # read current frame
            _, img = camera.read()

            # encode as a jpeg image and return it
            yield cv2.imencode('.jpg', img)[1].tobytes()

With this class, the first video camera reported by your system will be used. If you are using a laptop, this is likely your internal camera. If you are going to use this driver, you need to install the OpenCV bindings for Python:

$ pip install opencv-python

Camera Selection

The project now supports three different camera drivers: emulated, Raspberry Pi and OpenCV. To make it easier to select which driver to use without having to edit the code, the Flask server looks for a CAMERA environment variable to know which class to import. This variable can be set to pi or opencv, and if it isn't set, then the emulated camera is used by default.

The way this is implemented is fairly generic. Whatever the value of the CAMERA environment variable is, the server will expect the driver to be in a module named camera_$CAMERA.py. The server will import this module and then look for a Camera class in it. The logic is actually quite simple:

from importlib import import_module
import os

# import camera driver
if os.environ.get('CAMERA'):
    Camera = import_module('camera_' + os.environ['CAMERA']).Camera
else:
    from camera import Camera

For example, to start an OpenCV session from bash, you can do this:

$ CAMERA=opencv python app.py

From a Windows command prompt you can do the same as follows:

> set CAMERA=opencv
> python app.py
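The selection logic shown above can also be wrapped in a function that reports a missing driver more clearly. This is a hedged sketch: the helper names camera_module_name and load_camera_class are hypothetical, and the module naming rule (camera_$CAMERA, falling back to camera) is the one described in this section. As in the original, an empty CAMERA value behaves like an unset one.

```python
import os
from importlib import import_module


def camera_module_name(camera=None):
    """Map a CAMERA value such as 'opencv' to a module name such as 'camera_opencv'."""
    return "camera_" + camera if camera else "camera"


def load_camera_class(camera=None):
    """Import the selected driver module and return its Camera class."""
    if camera is None:
        camera = os.environ.get("CAMERA")
    module_name = camera_module_name(camera)
    try:
        module = import_module(module_name)
    except ImportError as exc:
        # covers both a missing driver module and a missing dependency such as cv2
        raise RuntimeError(f"Camera driver module {module_name!r} could not be imported") from exc
    return module.Camera
```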

Performance Improvements

Another observation that was made a few times is that the server consumes a lot of CPU. The reason for this is that there is no synchronization between the background thread capturing frames and the generator feeding those frames to the client. Both run as fast as they can, without regard for the speed of the other.

In general it makes sense for the background thread to run as fast as possible, because you want the frame rate to be as high as possible for each client. But you definitely do not want the generator that delivers frames to a client to ever run at a faster rate than the camera is producing frames, because that would mean duplicate frames will be sent to the client. While these duplicates do not cause any problems, they increase CPU and network usage without any benefit.

So there needs to be a mechanism by which the generator only delivers original frames to the client, and if the delivery loop inside the generator is faster than the frame rate of the camera thread, then the generator should wait until a new frame is available, so that it paces itself to match the camera rate. On the other side, if the delivery loop runs at a slower rate than the camera thread, then it should never get behind when processing frames, and instead it should skip frames to always deliver the most current frame. Sounds complicated, right?

What I wanted as a solution here is to have the camera thread signal the generators that are running when a new frame is available. The generators can then block while they wait for the signal before they deliver the next frame. In looking through synchronization primitives, I've found that threading.Event is the one that matches this behavior. So basically, each generator should have an event object, and then the camera thread should signal all the active event objects to inform all the running generators when a new frame is available. The generators deliver the frame and reset their event objects, and then go back to wait on them again for the next frame.

To avoid having to add event handling logic in the generator, I decided to implement a customized event class that uses the thread id of the caller to automatically create and manage a separate event for each client thread. This is somewhat complex, to be honest, but the idea came from how Flask's context local variables are implemented. The new event class is called CameraEvent, and has wait(), set(), and clear() methods. With the support of this class, the rate control mechanism can be added to the BaseCamera class:

class CameraEvent(object):
    # ...

class BaseCamera(object):
    # ...
    event = CameraEvent()

    # ...

    def get_frame(self):
        """Return the current camera frame."""
        BaseCamera.last_access = time.time()

        # wait for a signal from the camera thread
        BaseCamera.event.wait()
        BaseCamera.event.clear()

        return BaseCamera.frame

    @classmethod
    def _thread(cls):
        # ...
        for frame in frames_iterator:
            BaseCamera.frame = frame
            BaseCamera.event.set()  # send signal to clients

            # ...

The magic that is done in the CameraEvent class enables multiple clients to be able to wait individually for a new frame. The wait() method uses the current thread id to allocate an individual event object for each client and wait on it. The clear() method will reset the event associated with the caller's thread id, so that each generator thread can run at its own speed. The set() method called by the camera thread sends a signal to the event objects allocated for all clients, and will also remove any events that aren't being serviced by their owners, because that means that the clients associated with those events have closed the connection and are gone. You can see the implementation of the CameraEvent class in the GitHub repository.
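To make the description above concrete, here is a minimal sketch of a per-client event class built from the behavior described: one threading.Event per waiting thread keyed by threading.get_ident(), a set() that wakes every client and drops events that nobody has cleared for a while, and a clear() that only touches the caller's own event. The lock and the 5-second STALE_AFTER threshold are assumptions of this sketch; the actual implementation is the one in the GitHub repository and may differ.

```python
import threading
import time


class CameraEvent:
    """Sketch of a per-client event: one threading.Event per waiting thread."""

    STALE_AFTER = 5.0  # seconds an event may stay set before its client is presumed gone (assumed)

    def __init__(self):
        self._events = {}  # thread id -> [threading.Event, time it was last set]
        self._lock = threading.Lock()

    def wait(self, timeout=None):
        """Called by client threads: block until the camera publishes a new frame."""
        ident = threading.get_ident()
        with self._lock:
            if ident not in self._events:
                self._events[ident] = [threading.Event(), time.time()]
            event = self._events[ident][0]
        return event.wait(timeout)

    def set(self):
        """Called by the camera thread: wake every client and prune abandoned ones."""
        now = time.time()
        with self._lock:
            for ident, entry in list(self._events.items()):
                event, last_set = entry
                if not event.is_set():
                    event.set()
                    entry[1] = now
                elif now - last_set > self.STALE_AFTER:
                    # still set from an earlier frame: the client stopped clearing it
                    del self._events[ident]

    def clear(self):
        """Called by a client thread after it has consumed the current frame."""
        with self._lock:
            entry = self._events.get(threading.get_ident())
        if entry is not None:
            entry[0].clear()
```

Because each client clears only its own event, a slow client simply finds its event already set on the next wait() and picks up whatever frame is current, which is the frame-skipping behavior described earlier.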

To give you an idea of the magnitude of the performance improvement, consider that the emulated camera driver consumed about 96% CPU before this change because it was constantly sending duplicate frames at a rate much higher than the one frame per second being produced. After these changes, the same stream consumes about 3% CPU. In both cases there was a single client viewing the stream. The OpenCV driver went from about 45% CPU down to 12% for a single client, with each new client adding about 3%.

Production Web Server

Lastly, I think if you plan to use this server for real, you should use a more robust web server than the one that comes with Flask. A very good choice is to use Gunicorn:

$ pip install gunicorn

With Gunicorn, you can run the server as follows (remember to set the CAMERA environment variable to the selected camera driver first):

$ gunicorn --threads 5 --workers 1 --bind 0.0.0.0:5000 app:app

The --threads 5 option tells Gunicorn to handle at most five concurrent requests. That means that with this number you can get up to five clients to watch the stream simultaneously. The --workers 1 option limits the server to a single process. This is required because only one process can connect to a camera to capture frames.
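If you prefer a configuration file over command-line options, Gunicorn can read the same settings from a Python file. A sketch, assuming the file is named gunicorn.conf.py and passed with gunicorn -c gunicorn.conf.py app:app:

```python
# gunicorn.conf.py: same settings as
#   gunicorn --threads 5 --workers 1 --bind 0.0.0.0:5000 app:app
bind = "0.0.0.0:5000"  # listen on all interfaces, port 5000
workers = 1            # one process, since only one process can own the camera
threads = 5            # up to five simultaneous viewers
```

Note that when threads is greater than 1 and no worker class is given, Gunicorn uses its threaded gthread worker instead of the default sync worker.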

You can increase the number of threads somewhat, but if you find that you need a large number, it will probably be more efficient to use an asynchronous framework instead of threads. Gunicorn can be configured to work with the two frameworks that are compatible with Flask: gevent and eventlet. To make the video streaming server work with these frameworks, there is one small addition to the camera background thread:

class BaseCamera(object):
    # ...
    @classmethod
    def _thread(cls):
        # ...
        for frame in frames_iterator:
            BaseCamera.frame = frame
            BaseCamera.event.set()  # send signal to clients
            time.sleep(0)
            # ...

The only change here is the addition of a sleep(0) in the camera capture loop. This is required for both eventlet and gevent, because they use cooperative multitasking. The way these frameworks achieve concurrency is by having each task release the CPU either by calling a function that does network I/O or explicitly. Since there is no I/O here, the sleep call is what achieves the CPU release.
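To see why a loop that never gives up the CPU is a problem under cooperative multitasking, here is a toy round-robin scheduler built from plain Python generators. It is an analogy only, not how gevent or eventlet work internally: each yield plays the role that time.sleep(0) plays in the camera loop, handing control back to the scheduler so other tasks can run. The camera and client tasks are hypothetical stand-ins.

```python
from collections import deque


def run_round_robin(tasks):
    """Run generator-based tasks cooperatively; each yield gives up the CPU."""
    ready = deque(tasks)
    trace = []
    while ready:
        task = ready.popleft()
        try:
            trace.append(next(task))  # run the task until its next yield
        except StopIteration:
            continue                  # task finished, drop it
        ready.append(task)            # reschedule it behind the others
    return trace


def camera(n):
    for i in range(n):
        # publish a frame, then yield: the analogue of time.sleep(0)
        yield f"camera frame {i}"


def client(name, n):
    for i in range(n):
        yield f"{name} sent frame {i}"
```

Running run_round_robin([camera(2), client("c1", 2)]) interleaves the two tasks: camera frame 0, c1 sent frame 0, camera frame 1, c1 sent frame 1. If the camera loop had no yield inside it, it would run all of its iterations before the client got a single turn.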

Now you can run Gunicorn with the gevent or eventlet workers as follows:

$ CAMERA=opencv gunicorn --worker-class gevent --workers 1 --bind 0.0.0.0:5000 app:app

Here the --worker-class gevent option configures Gunicorn to use the gevent framework (you must install it with pip install gevent). If you prefer, --worker-class eventlet is also available. The --workers 1 option limits the server to a single process, as above. The eventlet and gevent workers in Gunicorn accept up to a thousand concurrent clients by default, which should be much more than a server of this kind is able to support anyway.
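The same gevent setup can be expressed as a Gunicorn configuration file. A sketch, assuming a file named gunicorn.conf.py loaded with gunicorn -c gunicorn.conf.py app:app; worker_connections is the Gunicorn setting behind the thousand-client default mentioned above, written out explicitly here only for visibility:

```python
# gunicorn.conf.py for the gevent variant: same settings as
#   gunicorn --worker-class gevent --workers 1 --bind 0.0.0.0:5000 app:app
bind = "0.0.0.0:5000"
workers = 1                # still a single process: only one may own the camera
worker_class = "gevent"    # or "eventlet"; requires pip install gevent (or eventlet)
worker_connections = 1000  # Gunicorn's default cap on simultaneous clients per worker
```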

Conclusion

All the changes described above are incorporated in the GitHub repository. I hope you get a better experience with these improvements.

Before concluding, I want to provide quick answers to other questions I have received about this server:

  • How to force the server to run at a fixed frame rate? Configure your camera to deliver frames at that rate, then sleep enough time during each iteration of the camera capture loop to also run at that rate.
  • How to increase the frame rate? The server as described here delivers frames as fast as possible. If you need better frame rates, you can try configuring your camera for a smaller frame size.
  • How to add sound? That's really difficult. The Motion JPEG format does not support audio. You are going to need to stream the audio separately, and then add an audio player to the HTML page. Even if you manage to do all this, synchronization between audio and video is not going to be very accurate.
  • How to save the stream to disk on the server? Just save the sequence of JPEG files in the camera thread. For this you may want to remove the automatic mechanism that ends the background thread when there are no viewers.
  • How to add playback controls to the video player? Motion JPEG was not made for interactive operation by the user, but if you are set on doing this, with a little bit of trickery it may be possible to implement playback controls. If the server saves all jpeg images, then a pause can be implemented by having the server deliver the same frame over and over. When the user resumes playback, the server will have to deliver "old" images that are loaded from disk, since now the user would be in DVR mode instead of watching the stream live. This could be a very interesting project!
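The first and fourth answers above can be sketched in a few lines. Both helpers are hypothetical: throttle wraps any frames generator, such as a camera's frames(), and sleeps just long enough between frames to stay at or below the requested rate, while save_frame writes one JPEG frame to a numbered file and could be called from inside the camera thread's loop. The file naming scheme is an assumption.

```python
import os
import time
from typing import Iterable, Iterator


def throttle(frames: Iterable[bytes], fps: float) -> Iterator[bytes]:
    """Yield frames no faster than `fps` frames per second by sleeping between them."""
    interval = 1.0 / fps
    next_time = time.monotonic()
    for frame in frames:
        delay = next_time - time.monotonic()
        if delay > 0:
            time.sleep(delay)  # wait until this frame's scheduled time
        yield frame
        # schedule the next frame; never schedule in the past, to avoid bursts
        next_time = max(next_time + interval, time.monotonic())


def save_frame(frame: bytes, directory: str, index: int) -> str:
    """Write one JPEG frame to a numbered file and return its path."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, f"frame_{index:06d}.jpg")
    with open(path, "wb") as f:
        f.write(frame)
    return path
```

As the answer notes, continuous recording also means removing the inactivity check that stops the background thread when there are no viewers.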

That is all for now. If you have other questions please let me know!


225 comments
  • #176 Helene said

    Hello Miguel,
    First of all, thanks a lot for this tutorial !
    It works fine for my application as long as I run it locally. As it is a project for school, I wanted to host it on "pythonanywhere" so I could share it easily. Everything seems to work (my background etc. is displayed) except that the webcam image is not displayed. It seems that the access to the webcam is never asked (and never granted). I don't get any error, only the page with nothing (or the logo of "broken file", depending on the browser I use) where the webcam view should be.
    Am I missing anything ?

    Thanks in advance,
    Helene

  • #177 Miguel Grinberg said

    @Helene: this application requires that the server has access to the camera hardware. When you run under PythonAnywhere there is no camera. You may be confused because you are running the server and the client on the same machine, but the camera is part of the server, not the client.

  • #178 Bence said

    Hi Miguel!

    Thank you for this understandable article, it was very helpful to implement an app for my IoT project with a Jetson Nano!

    Best regards

  • #179 Simon said

    Hi Miguel,
    Thanks a lot for this tutorial.
    I was searching for something like this for my robot project for quite some time.
    I added OpenCV to it, so I can make it more autonomous in the future.
    Here is the whole project:
    https://hackaday.io/project/175039-big-and-small-tank
    Best Regards

  • #180 Andres Sommerhoff said

    Great Blog Miguel!! Thank you for sharing it!

    I'm wondering how to replace Motion JPEG with a more sophisticated stream like H.264, VP8, etc., using the same OpenCV and Python. As you said, Motion JPEG compression is not very efficient for motion video. Any recommendations on how to start? Should Flask be replaced, or is it capable of the task? Is there any other streaming format you can recommend that is more convenient than H.264 and VP8 (maybe not as good in quality or compression, but easier to implement, sufficiently compatible, etc.) yet better in compression and quality than Motion JPEG?

  • #181 Miguel Grinberg said

    @Andres: you can't replace Motion-JPEG with something else in this project. Streaming with other methods requires a completely different implementation, and you will also need a dedicated video encoder for the format you decide to use. Flask can serve the stream, but generating the stream is the part that is much more difficult.

  • #182 Peter Makrels said

    Hi Miguel!

    I was playing with your code and tried to add stop and start buttons using SocketIO. I noticed that just adding the following

    app.py

    from flask_socketio import SocketIO
    import eventlet
    # ...
    socketio = SocketIO(app)
    # ...
    #app.run(host='0.0.0.0', threaded=True)
    socketio.run(app, host='0.0.0.0', debug=True)
    

    results in some very strange behavior. I ran with the dummy camera, and the frames were played backwards (2, 1, 3, 2, 1, 3 ... ).

    What could this be due to? Thanks a bunch!

  • #183 Miguel Grinberg said

    @Peter: yes, the problem is that this application uses threads, and threads are blocking when used alongside an async framework, unless you are very careful in how you design the application. You may want to look into monkey patching the standard library so that threads and other async-incompatible things are patched to work well with eventlet.

  • #184 Peter Makrels said

    Dear Miguel! Thank you for the answer!

    1) I looked into it and changed the imports for time and threading to be from eventlet.green but the problem is persistent. I tried with eventlet.monkey_patch() too with the same results. I also added a couple more dummy frames and now I see that what looked like numbers playing backwards is actually forwards just with frames dropped.

    I modified the dummy camera to yield as follows

    # ...
                i = int(time.time()) % 7
                yield (Camera.imgs[i], i)
    

    and the gen() function in app.py to print every time it yields something

    def gen(camera):
        """Video streaming generator function."""
        while True:
            frame, i = camera.get_frame()
            print("yielding from gen!!!", i)
            yield (b'--frame\r\n'
                   b'Content-Type: image/jpeg\r\n\r\n' + frame + b'\r\n')
    

    and it seems like it provides consecutive images at a regular ~1s interval. I am a bit lost as of what could be happening.

    2) Another thing I tried was setting

    socketio = SocketIO(app, async_mode="threading")
    

    which seemingly solved the issue. I know eventlet is the recommended way to go here and I am not sure of the implications of using threading instead, but this is just a toy running on my home wifi, so could it be fine?

    Could you please give feedback on these points? It is much appreciated!

  • #185 Miguel Grinberg said

    @Peter: If switching to threading fixes your problem, then it is pretty clear that something that you are doing is incompatible with eventlet (or you haven't monkey patched correctly). If you use threading mode you lose the WebSocket support.

  • #186 Kilian said

    Hi Miguel,
    Thank you for all the resources you have made available online!
    I was wondering, would it be possible to use Flask-SocketIO with a
    socketio.emit(videodata, broadcast=True)
    to update the video images on all the clients in some way?

  • #187 Miguel Grinberg said

    @Kilian: If you implement a video player for the browser that works with Socket.IO then sure, you can send frames via emit. But the video player used in this article does not use Socket.IO, so it won't work that way.

  • #188 bieb said

    Hi Miguel!
    Thanks a lot for this tutorial.
    I would like to store the information which is loaded in a session variable inside the generator. However, when I tried to get session value outside generator, it returns None.
    Am I missing anything ?

    def gen(camera):
        while True:
            frame = camera.get_frame()
            session['info'] = info
            yield (b'--frame\r\n'
                   b'Content-Type: image/jpeg\r\n\r\n' + frame + b'\r\n')
    
    @app.route('/video_feed')
    def video_feed():
        return Response(stream_with_context(gen(Camera())),
                        mimetype='multipart/x-mixed-replace; boundary=frame')
    
  • #189 Miguel Grinberg said

    @bieb: This is not possible. To change the session a cookie needs to be delivered to the client. Because the stream is running, there is no way to send a cookie.

  • #190 Peter Lakner said

    Thanks for the articles, I found them very helpful. I am wondering if you have any suggestions for how to pull this off with something more efficient than MJPEG. I am working on something with very limited bandwidth.

  • #191 Miguel Grinberg said

    @Peter: a solution with a modern/advanced streaming format is going to be very different, I don't think this article can be adapted, it is going to take major changes. For starters, you will need an auxiliary process to do the encoding, maybe ffmpeg.

  • #192 Sunday Ajiroghene said

    Hello Miguel,

    This article is great.

    Let's imagine a scenario where I want to use a button to start the camera stream and to stop it whenever I choose. Later I decide to start the camera again, just like in a real scenario where the user chooses when to start, stop, and resume a stream.

    This seems so difficult for me to implement, I have tried it over and again but got the whole stuff messed up.

    Can you please help me out, solve this?

  • #193 Miguel Grinberg said

    @Sunday: that isn't really a good design, because you can have more than one user watching the stream. If one user stops the stream, then it will affect the other user. A more practical solution is to dynamically remove the <img> tag using JavaScript. That will stop showing the stream for the user, but the server will continue streaming to other clients.

  • #194 R_Moore said

    Hi Miguel,

    I am trying to use services like ngrok and localtunnel to test out my Flask application. The application I wrote uses the exact method you outlined in this blog to stream video from my built-in camera. On localhost, this works flawlessly. However, when I use ngrok and localtunnel and connect to their tunnelled URLs, the resulting stream has a very low frame rate. I assume that this is due to the high latency of the ngrok service, but I am also wondering if this method is suited for streaming outside of local networks. Excuse my lack of knowledge of technical terms.

    I am wondering if you have ever deployed this streaming method onto a server that can be accessed outside of a local network and gotten good performance?

    Forgive me, but I am lacking an understanding of how to take this deployed application that you have outlined above and allow it to connected to from another network. Can you recommend any methods for this?

  • #195 Miguel Grinberg said

    @R: This method just streams bytes over a network connection. There is nothing magical or strange about the method in itself. How fast or slow those bytes travel depends on the network conditions. Ngrok will cause a considerable delay, because the bytes now need to make two trips, first from localhost to ngrok, then from ngrok to your client.

    If you need a faster stream, then you can always reduce the frame rate or the frame dimensions. Or of course you can look for a different streaming algorithm. This one has the advantage that it is very easy to implement. You can always look at other methods, maybe encoding with ffmpeg if you want a more efficient compression.

  • #196 Adam Thompson said

    Hi Miguel,
    Thanks for this great article, I come back to it often. I've had this working great on a few projects. I was wondering if you have any suggestions for how to implement changing camera settings on the fly. I'm adding an interface for adjusting the camera resolution and framerate from the web. I'm new to threading in Python and it's not immediately clear to me what the best way would be to pass arguments to the camera. I imagine it will require restarting the camera thread anyway. I'm wrapping my head around it and will post back later if I find a solution I like. Thanks again!

  • #197 Miguel Grinberg said

    @Adam: passing arguments to the camera really depends on the camera and the Python camera library that you are using, in particular the way the library allows you to change camera settings. For safety and for compatibility with the widest range of cameras, I think it is best to assume that you will need to stop the camera, make the necessary changes, then start it again.

  • #198 Akshaya said

    Hi Miguel,
    I am trying to achieve the case below, but I am not able to.
    I want the browser to show the video (a sequence of image frames yielded from Python) and some changing text on the same page. If the image frame shows an apple, then the text "apple" should be shown, and once the frame shows the box, it should show "box".
    I am able to view the video, but the text is constant and does not update. Any idea how to solve my issue?
    Thanks in Advance

  • #199 Miguel Grinberg said

    @Akshaya: This was asked several times in these comments already. Have you reviewed them? There is no way to do what you want using motion-jpeg. You can send the video as shown in this article, and then figure out your own way to send and synchronize the text using JavaScript, or else you need to find a different video format that supports captions/subtitles/etc.

  • #200 Scarlito said

    Hi Miguel.
    Thanks for your great work ! (this tutorial, and of course the Mega tutorial !).
    This question has been asked several times (I just read all the comments), plus it's summarized at the end of the article.
    Unfortunately, I certainly missed something.
    I'd like to capture a frame and save it as a JPEG file. You wrote that the only thing to do is to take the latest frame and save it to a file. That seems simple, and it was exactly what I was looking for. I just don't understand how to actually save this frame.
    I suppose I'd have to use "camera.get_frame()" to get the frame, but how do I then save it as a JPEG?
    Thanks !
