Flask Video Streaming Revisited

Flask Video Streaming Server

Almost three years ago I wrote an article on this blog titled Video Streaming with Flask, in which I presented a very modest streaming server that used a Flask generator view function to stream Motion JPEG video to web browsers. My intention with that article was to show a simple yet practical use of streaming responses, a lesser-known feature of Flask.

That article is extremely popular, not because it teaches how to implement streaming responses, but because a lot of people want to implement streaming video servers. Unfortunately, my focus when I wrote it was not on creating a robust video server, so I frequently get questions and requests for advice from readers who want to use the server in a real application and quickly find its limitations. So today I'm going to revisit my streaming video server and describe a few improvements I've made to it.

Recap: Using Flask's Streaming for Video

I recommend you read the original article to familiarize yourself with the project. In short, this is a Flask server that uses a streaming response to provide a stream of video frames captured from a camera in Motion JPEG format. This format is very simple and not the most efficient, but it has the advantage that all browsers support it natively, without any client-side scripting. It is a fairly common format used by security cameras for that reason. To demonstrate the server, I implemented a camera driver for a Raspberry Pi with its camera module. For those who didn't have a Pi with a camera at hand, I also wrote an emulated camera driver that streams a sequence of JPEG images stored on disk.
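As a quick reference, the heart of that server is a generator function that wraps each JPEG frame in a multipart chunk, served through a Flask streaming response. The route from the original article looks roughly like this:

from flask import Flask, Response
from camera import Camera

app = Flask(__name__)

def gen(camera):
    """Yield a multipart chunk for each camera frame."""
    while True:
        frame = camera.get_frame()
        yield (b'--frame\r\n'
               b'Content-Type: image/jpeg\r\n\r\n' + frame + b'\r\n')

@app.route('/video_feed')
def video_feed():
    return Response(gen(Camera()),
                    mimetype='multipart/x-mixed-replace; boundary=frame')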

Running the Camera Only When There Are Viewers

One aspect of the original streaming server that people did not like is that the background thread that captures video frames from the Raspberry Pi camera starts when the first client connects to the stream, but then it never stops. A more efficient way to handle this background thread is to only have it running while there are viewers, so that the camera can be turned off when nobody is connected.

I implemented this improvement a while ago. The idea is that every time a client accesses a frame, the time of that access is recorded. The camera thread checks this timestamp, and if it is more than ten seconds old, it exits. With this change, when the server runs for ten seconds without any clients it shuts its camera off and stops all background activity. As soon as a client connects again, the thread is restarted.

Here are the relevant changes:

class Camera(object):
    # ...
    last_access = 0  # time of last client access to the camera

    # ...

    def get_frame(self):
        Camera.last_access = time.time()
        # ...

    @classmethod
    def _thread(cls):
        with picamera.PiCamera() as camera:
            # ...
            for foo in camera.capture_continuous(stream, 'jpeg', use_video_port=True):
                # ...
                # if there haven't been any clients asking for frames in
                # the last 10 seconds, stop the thread
                if time.time() - cls.last_access > 10:
                    break
        cls.thread = None

Simplifying the Camera Class

A common problem that a lot of people mentioned to me is that it is hard to add support for other cameras. The Camera class that I implemented for the Raspberry Pi is fairly complex because it uses a background capture thread to talk to the camera hardware.

To make this easier, I decided to move the generic functionality that does all the background processing of frames into a base class, leaving only the task of getting frames from the camera to be implemented in subclasses. The new BaseCamera class in module base_camera.py is this base class. Here is what its generic background thread looks like:

class BaseCamera(object):
    thread = None  # background thread that reads frames from camera
    frame = None  # current frame is stored here by background thread
    last_access = 0  # time of last client access to the camera
    # ...

    @staticmethod
    def frames():
        """Generator that returns frames from the camera."""
        raise RuntimeError('Must be implemented by subclasses.')

    @classmethod
    def _thread(cls):
        """Camera background thread."""
        print('Starting camera thread.')
        frames_iterator = cls.frames()
        for frame in frames_iterator:
            BaseCamera.frame = frame

            # if there haven't been any clients asking for frames in
            # the last 10 seconds, then stop the thread
            if time.time() - BaseCamera.last_access > 10:
                frames_iterator.close()
                print('Stopping camera thread due to inactivity.')
                break
        BaseCamera.thread = None

This new version of the camera thread has been made generic with the use of yet another generator. The thread expects the frames() method, a static method, to be a generator implemented in subclasses that are specific to each camera. Each item returned by the generator must be a video frame in JPEG format.
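The part of BaseCamera elided above is the code that starts this thread: the first client to instantiate the class kicks it off and waits until frames start arriving. Here is a sketch of that constructor, close to (but not necessarily identical to) the version in the repository:

import threading
import time

class BaseCamera(object):
    # ...

    def __init__(self):
        """Start the background camera thread if it isn't running yet."""
        if BaseCamera.thread is None:
            BaseCamera.last_access = time.time()

            # start the background frame capture thread
            BaseCamera.thread = threading.Thread(target=self._thread)
            BaseCamera.thread.start()

            # wait until frames become available
            while self.get_frame() is None:
                time.sleep(0)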

Here is how the emulated camera that returns static images can be adapted to work with this base class:

class Camera(BaseCamera):
    """An emulated camera implementation that streams a repeated sequence of
    files 1.jpg, 2.jpg and 3.jpg at a rate of one frame per second."""
    imgs = [open(f + '.jpg', 'rb').read() for f in ['1', '2', '3']]

    @staticmethod
    def frames():
        while True:
            time.sleep(1)
            yield Camera.imgs[int(time.time()) % 3]

Note how in this version the frames() generator forces a frame rate of one frame per second by simply sleeping for that amount of time between frames.

The camera subclass for the Raspberry Pi camera also becomes much simpler with this redesign:

import io
import time
import picamera
from base_camera import BaseCamera

class Camera(BaseCamera):
    @staticmethod
    def frames():
        with picamera.PiCamera() as camera:
            # let camera warm up
            time.sleep(2)

            stream = io.BytesIO()
            for foo in camera.capture_continuous(stream, 'jpeg', use_video_port=True):
                # return current frame
                stream.seek(0)
                yield stream.read()

                # reset stream for next frame
                stream.seek(0)
                stream.truncate()

OpenCV Camera Driver

A fair number of users complained that they did not have access to a Raspberry Pi equipped with a camera module, so they could not try this server with anything other than the emulated camera. Now that adding camera drivers is much easier, I wanted to also have a driver based on OpenCV, which supports most USB webcams and laptop cameras. Here is a simple camera driver for it:

import cv2
from base_camera import BaseCamera

class Camera(BaseCamera):
    @staticmethod
    def frames():
        camera = cv2.VideoCapture(0)
        if not camera.isOpened():
            raise RuntimeError('Could not start camera.')

        while True:
            # read current frame
            _, img = camera.read()

            # encode as a jpeg image and return it
            yield cv2.imencode('.jpg', img)[1].tobytes()

With this class, the first video camera reported by your system will be used. If you are using a laptop, this is likely your internal camera. If you are going to use this driver, you need to install the OpenCV bindings for Python:

$ pip install opencv-python
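If your system has more than one camera, you may want to make the device index configurable instead of hard-coding 0. Here is a minimal variation, using a hypothetical OPENCV_CAMERA_SOURCE environment variable for the index:

import os
import cv2
from base_camera import BaseCamera

class Camera(BaseCamera):
    # the variable name is just an example; device 0 remains the default
    video_source = int(os.environ.get('OPENCV_CAMERA_SOURCE', 0))

    @staticmethod
    def frames():
        camera = cv2.VideoCapture(Camera.video_source)
        if not camera.isOpened():
            raise RuntimeError('Could not start camera.')

        while True:
            # read current frame and encode it as a jpeg image
            _, img = camera.read()
            yield cv2.imencode('.jpg', img)[1].tobytes()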

Camera Selection

The project now supports three different camera drivers: emulated, Raspberry Pi and OpenCV. To make it easier to select which driver to use without having to edit the code, the Flask server looks for a CAMERA environment variable to know which class to import. This variable can be set to pi or opencv, and if it isn't set, then the emulated camera is used by default.

The way this is implemented is fairly generic. Whatever the value of the CAMERA environment variable is, the server will expect the driver to be in a module named camera_$CAMERA.py. The server will import this module and then look for a Camera class in it. The logic is actually quite simple:

from importlib import import_module
import os

# import camera driver
if os.environ.get('CAMERA'):
    Camera = import_module('camera_' + os.environ['CAMERA']).Camera
else:
    from camera import Camera

For example, to start an OpenCV session from bash, you can do this:

$ CAMERA=opencv python app.py

From a Windows command prompt you can do the same as follows:

> set CAMERA=opencv
> python app.py

Performance Improvements

Another observation that was made a few times is that the server consumes a lot of CPU. The reason is that there is no synchronization between the background thread capturing frames and the generator feeding those frames to the client. Both run as fast as they can, without regard for the speed of the other.

In general it makes sense for the background thread to run as fast as possible, because you want the frame rate to be as high as possible for each client. But you definitely do not want the generator that delivers frames to a client to ever run at a faster rate than the camera is producing frames, because that would mean duplicate frames will be sent to the client. While these duplicates do not cause any problems, they increase CPU and network usage without any benefit.

So there needs to be a mechanism by which the generator only delivers original frames to the client: if the delivery loop inside the generator is faster than the frame rate of the camera thread, the generator should wait until a new frame is available, so that it paces itself to match the camera rate. On the other hand, if the delivery loop runs slower than the camera thread, it should never fall behind; instead it should skip frames and always deliver the most current one. Sounds complicated, right?

What I wanted as a solution here is to have the camera thread signal the generators that are running when a new frame is available. The generators can then block while they wait for the signal before they deliver the next frame. In looking through synchronization primitives, I've found that threading.Event is the one that matches this behavior. So basically, each generator should have an event object, and then the camera thread should signal all the active event objects to inform all the running generators when a new frame is available. The generators deliver the frame and reset their event objects, and then go back to wait on them again for the next frame.

To avoid having to add event handling logic in the generator, I decided to implement a customized event class that uses the thread id of the caller to automatically create and manage a separate event for each client thread. This is somewhat complex, to be honest, but the idea came from how Flask's context local variables are implemented. The new event class is called CameraEvent, and has wait(), set(), and clear() methods. With the support of this class, the rate control mechanism can be added to the BaseCamera class:

class CameraEvent(object):
    # ...

class BaseCamera(object):
    # ...
    event = CameraEvent()

    # ...

    def get_frame(self):
        """Return the current camera frame."""
        BaseCamera.last_access = time.time()

        # wait for a signal from the camera thread
        BaseCamera.event.wait()
        BaseCamera.event.clear()

        return BaseCamera.frame

    @classmethod
    def _thread(cls):
        # ...
        for frame in frames_iterator:
            BaseCamera.frame = frame
            BaseCamera.event.set()  # send signal to clients

            # ...

The magic in the CameraEvent class is what enables multiple clients to wait individually for a new frame. The wait() method uses the current thread id to allocate an individual event object for each client and waits on it. The clear() method resets the event associated with the caller's thread id, so that each generator thread can run at its own speed. The set() method, called by the camera thread, sends a signal to the event objects allocated for all clients, and also removes any events that aren't being serviced by their owners, since that means the clients associated with those events have closed their connections and are gone. You can see the implementation of the CameraEvent class in the GitHub repository.
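For reference, here is a sketch of what such a class can look like, along the lines of the version in the repository (the actual code there may differ in details):

import time
import threading
try:
    from greenlet import getcurrent as get_ident  # gevent/eventlet tasks
except ImportError:
    from threading import get_ident  # regular threads

class CameraEvent(object):
    """An Event-like class that signals all active clients when a new frame
    is available."""
    def __init__(self):
        self.events = {}  # one [threading.Event, timestamp] pair per client

    def wait(self):
        """Invoked from each client's thread to wait for the next frame."""
        ident = get_ident()
        if ident not in self.events:
            # this is a new client: give it its own event object
            self.events[ident] = [threading.Event(), time.time()]
        return self.events[ident][0].wait()

    def set(self):
        """Invoked by the camera thread when a new frame is available."""
        now = time.time()
        remove = None
        for ident, event in self.events.items():
            if not event[0].is_set():
                # this client is waiting for a frame: signal it and
                # record the time of the signal
                event[0].set()
                event[1] = now
            elif now - event[1] > 5:
                # the event has stayed set for more than 5 seconds, so
                # the client is presumed gone and its event is discarded
                remove = ident
        if remove:
            del self.events[remove]

    def clear(self):
        """Invoked from each client's thread after a frame was processed."""
        self.events[get_ident()][0].clear()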

To give you an idea of the magnitude of the performance improvement, consider that the emulated camera driver consumed about 96% CPU before this change because it was constantly sending duplicate frames at a rate much higher than the one frame per second being produced. After these changes, the same stream consumes about 3% CPU. In both cases there was a single client viewing the stream. The OpenCV driver went from about 45% CPU down to 12% for a single client, with each new client adding about 3%.

Production Web Server

Lastly, I think if you plan to use this server for real, you should use a more robust web server than the one that comes with Flask. A very good choice is to use Gunicorn:

$ pip install gunicorn

With Gunicorn, you can run the server as follows (remember to set the CAMERA environment variable to the selected camera driver first):

$ gunicorn --threads 5 --workers 1 --bind 0.0.0.0:5000 app:app

The --threads 5 option tells Gunicorn to handle at most five concurrent requests, which means that up to five clients can watch the stream simultaneously. The --workers 1 option limits the server to a single process. This is required because only one process can connect to a camera to capture frames.

You can increase the number of threads somewhat, but if you find that you need a large number, it will probably be more efficient to use an asynchronous framework instead of threads. Gunicorn can be configured to work with the two frameworks that are compatible with Flask: gevent and eventlet. To make the video streaming server work with these frameworks, there is one small addition to the camera background thread:

class BaseCamera(object):
    # ...

    @classmethod
    def _thread(cls):
        # ...
        for frame in frames_iterator:
            BaseCamera.frame = frame
            BaseCamera.event.set()  # send signal to clients
            time.sleep(0)
            # ...

The only change here is the addition of a sleep(0) call in the camera capture loop. This is required for both eventlet and gevent because they use cooperative multitasking: each task must release the CPU, either by calling a function that does network I/O or explicitly. Since there is no I/O here, the sleep call is what releases the CPU.

Now you can run Gunicorn with the gevent or eventlet workers as follows:

$ CAMERA=opencv gunicorn --worker-class gevent --workers 1 --bind 0.0.0.0:5000 app:app

Here the --worker-class gevent option configures Gunicorn to use the gevent framework (you must install it with pip install gevent). If you prefer, --worker-class eventlet is also available. The --workers 1 option limits the server to a single process, as above. The eventlet and gevent workers in Gunicorn allocate a thousand concurrent clients by default, which should be much more than a server of this kind is able to support anyway.

Conclusion

All the changes described above are incorporated in the GitHub repository. I hope you get a better experience with these improvements.

Before concluding, I want to provide quick answers to other questions I have received about this server:

  • How to force the server to run at a fixed frame rate? Configure your camera to deliver frames at that rate, then sleep during each iteration of the camera capture loop for whatever is left of the frame period, so that the loop also runs at that rate (see the sketch after this list).
  • How to increase the frame rate? The server as described here delivers frames as fast as possible. If you need better frame rates, you can try configuring your camera for a smaller frame size.
  • How to add sound? That's really difficult. The Motion JPEG format does not support audio. You are going to need to stream the audio separately, and then add an audio player to the HTML page. Even if you manage to do all this, synchronization between audio and video is not going to be very accurate.
  • How to save the stream to disk on the server? Just save the sequence of JPEG files in the camera thread. For this you may want to remove the automatic mechanism that ends the background thread when there are no viewers.
  • How to add playback controls to the video player? Motion JPEG was not made for interactive operation by the user, but if you are set on doing this, with a little bit of trickery it may be possible to implement playback controls. If the server saves all jpeg images, then a pause can be implemented by having the server deliver the same frame over and over. When the user resumes playback, the server will have to deliver "old" images that are loaded from disk, since now the user would be in DVR mode instead of watching the stream live. This could be a very interesting project!
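On the first question, here is a minimal sketch of frame pacing applied to the OpenCV driver, assuming an example target of ten frames per second:

import time
import cv2
from base_camera import BaseCamera

class Camera(BaseCamera):
    @staticmethod
    def frames():
        fps = 10  # example target frame rate
        camera = cv2.VideoCapture(0)
        if not camera.isOpened():
            raise RuntimeError('Could not start camera.')

        while True:
            start = time.time()
            _, img = camera.read()
            yield cv2.imencode('.jpg', img)[1].tobytes()

            # sleep for whatever remains of the frame period
            delay = 1.0 / fps - (time.time() - start)
            if delay > 0:
                time.sleep(delay)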

That is all for now. If you have other questions please let me know!


224 comments
  • #51 PyEldar said

    Well, I mostly understand the GIL now, but I thought that when the OS decides not to let a thread run, the GIL can not do anything about it. On the other hand, now I see that when the OS wants to run one thread but the GIL does not give that thread the chance to run, the OS simply switches to the other thread that the GIL lets run, so in the end the GIL is in charge.

    Thanks again

  • #52 Lance said

    Hi Miguel,

    I'm working on a project and stumbled upon your work, which is extraordinary by the way. Hats off.
    What I'm trying to achieve is having a stream (MJPEG is ideal) over the network, and I've got that part covered.

    What I also want is the output of the camera on a small 2.8" PiTFT screen, which I can control through pygame.

    I'm able to stream the camera to the screen using pygame no problem. The problem is, as you boldly state:
    "This is required because only one process can connect to a camera to capture frames."

    I'm unable to do both at the same time, as to complicate things we obviously use

    camera.capture_continuous(stream, 'jpeg',use_video_port=True):

    while pygame seems to only understand

    camera.capture(stream, use_video_port=False, format='raw')

    I'm thinking of creating an MJPEG client with pygame... but it doesn't really seem elegant, as I would have to read the stream from localhost (potentially adding latency).

    I was hoping you could give a noob some pointers :)

  • #53 Miguel Grinberg said

    @Lance: you can switch to the capture format for your screen, and then for each frame you take the raw image data and compress it to jpeg for the mjpeg stream. It's going to be slower due to the software compression of every frame, but that way I think you can serve both the screen and the web clients from one source.

  • #54 Bradley said

    Hi Miguel,

    I really appreciate your work on streaming in flask. There is nothing else available for a flask tutorial on the subject. I believe the server implementation will work for my purposes.

    For some reason the client browser page never finishes loading and the rest of the page is never requested. I think it's due to the fact that the <img> element is changing and it is not the only element on the page. Do you have any suggestions for HTML elements that can ingest JPEGs without this problem? Perhaps Canvas?

  • #55 Miguel Grinberg said

    @Bradley: are you using the development server? The problem might be that your server is single-threaded. Once the browser starts displaying the stream, it won't be able to make additional connections to the server to get the rest of the page contents. Switch to a multi-process or multi-threaded server and you should be fine.

  • #56 vijay said

    The article was great, but I have an issue for which I would require your support.

    Using Flask, how can I detect the webcam of each client when running the web application?

    I tried using web sessions but I was not successful in detecting the webcam. Please suggest.

  • #57 Miguel Grinberg said

    @vijay: I'm confused. What does streaming have to do with the user's webcams? The camera that is streamed is connected to the server, not the clients. If you want to work with the client's web cam, then I suggest you look into the JavaScript APIs for video and audio, the server cannot access those directly.

  • #58 Phil said

    Thanks for this great update! I'm having a problem getting the OpenCV code to work, though. It keeps spitting out the "RuntimeError: Could not start camera." exception. Wondered if you had any obvious things I should check?

    Output is:
    * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)

    192.168.1.42 - - [04/Mar/2018 13:12:19] "GET / HTTP/1.1" 200 -
    Starting camera thread.
    Exception in thread Thread-3:
    Traceback (most recent call last):
      File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
        self.run()
      File "/usr/lib/python2.7/threading.py", line 754, in run
        self.__target(*self.__args, **self.__kwargs)
      File "/home/pi/scripts/stream/base_camera.py", line 93, in _thread
        for frame in frames_iterator:
      File "/home/pi/scripts/stream/camera_opencv.py", line 16, in frames
        raise RuntimeError('Could not start camera.')
    RuntimeError: Could not start camera.

  • #59 Miguel Grinberg said

    @Phil: that means that OpenCV is unable to connect to your camera. If your camera is connected and you are sure that it works, then the problem could be that OpenCV does not support that particular model.

  • #60 yang said

    Hi Miguel,
    Thank you for such an informative tutorial. The repo works great out of the box!
    I'm right now trying to deploy it to a Heroku server using an OpenCV camera. The app works as it should on localhost, but when I deploy it to Heroku, it always times out. Do you have any suggestions on how I should troubleshoot this?
    Thanks again!

  • #61 Miguel Grinberg said

    @yang: How do you intend the server running under Heroku to connect to your camera?

  • #62 yang said

    @miguel grinberg.

    I assumed it would ask for permission to use the camera when the script calls VideoCapture(). Is that a wrong assumption?

    I've used a workaround by capturing the image from the webcam, drawing it to a canvas and then sending the data URI to OpenCV to process (https://www.w3schools.com/tags/canvas_drawimage.asp). But it is rather slow.
    My app is running here: https://drowzee-drowzee.herokuapp.com/
    I am quite new to this and still working my way around it. How should I be implementing this?

  • #63 Miguel Grinberg said

    @yang: ask permission to who? This is server to client streaming, not client to server, or client to client. The camera needs to be directly accessible by the server. For example, a camera that is connected on the server's USB port, or an internal laptop camera. What you want to do is something completely different, where the client has the camera, not the server.

  • #64 Brian Hamill said

    Hi Miguel, great post!

    I just wanted to give a tip to anyone looking to save the stream: this can be done by using OpenCV's VideoWriter() method, which you can pass the current frame image into, something like

    while True:
        # read current frame
        _, img = camera.read()
        writer.write(img)
        yield cv2.imencode('.jpg', img)[1].tobytes()

    The only hard part here is figuring out how the constructor works for the video writer, something similar to

    writer = cv2.VideoWriter(filename, -1, 20.0, (int(camera.get(3)), int(camera.get(4))))

    # where camera.get(3) and get(4) are height and width, and need converted to integers
    # -1 is the fourcc which defaults to MP4
    # 20 is the frames per second, however there is probably a way to get this via another .get() method

    The video writer will need to be closed/released somewhere using writer.release() for it to properly save the video!

  • #65 Bradley said

    Hi Miguel,

    I have another question. I am trying to utilize your addition of a set_video_source() static method for setting up a video source or the open_cv example. I am creating a class video source and for some reason, it is not destructed when the server closes. If I create the video source directly in the frames() static method then the objects are deleted when the threads stop. Do you have a suggestion for creating a stream object (from an API) for the Camera() class so it is available in the static method and also have it delete on close?

    Here is how it is laid out:

    class Camera(BaseCamera):
        video_source = 0

        @staticmethod
        def set_video_source():
            Camera.video_source = API.CreateVideoSource

        @staticmethod
        def frames():
            Camera.set_video_source()  # where should this go?
            while True:
                yield Camera.video_source.get_next_frame()

  • #66 Miguel Grinberg said

    @Bradley: Not sure if I fully understand your question, but the video_source attribute is a class variable, so the only way to force it to delete is by assigning None, which leaves whatever object was set in that attribute without any references.

  • #67 Bradley said

    @Miguel Grinberg

    Thanks for the reply to the rushed question! You are correct the fact that video_source is a class variable is causing my issue. I am trying to find an elegant way to assign None to the variable when the server is shutdown in order to delete my source correctly. Within your implementation how would you recommend freeing class resources? With a destructor? Context variables? Apologies if this is a simple problem!

  • #68 Miguel Grinberg said

    @Bradley: If the server is shutting down, then what do you care about clearing this attribute? The process is going away so what does it matter? You may want to set the background thread to a daemon thread (thread.daemon = True) so that it does not prevent the process from exiting if that is the problem that you are having.

  • #69 Bradley said

    @Miguel Grinberg

    To provide a bit more context the video_source I am using is created using a python swig wrapped API which I am also developing. The video source is a GL render instance which currently is set to be torn down when the C++ API object is deleted. The problem is that if I store this object as a python class variable when the server is shutdown the object is never deleted and the render instance is not torn down, and I am unable to create another one.

    Is the correct solution to fix the backend of the API to delete itself if the resource stops being requested? I was thinking it would be much easier to adjust the server so that all of the references to the object are removed when the server is shutdown.

    Thanks for your help!

  • #70 Luis Bermudez said

    Hi Miguel, great article!

    The OpenCV implementation makes sense if you want to use your laptop camera. Does this assume that your Flask server is also installed on your laptop? I have my Flask server installed on the cloud (Ubuntu 16 via Digital Ocean), and I want to send my laptop's webcam video to my server, process it on the server (via OpenCV), and then display the video back on my client browser on my laptop. Does your above implementation work for something like this? (I'm assuming not)

    If not, what would you suggest? Use the Flask-SocketIO or use the Flask Streaming? Other? Thanks!

  • #71 Miguel Grinberg said

    @Luis: No, this project is for streaming from server to client. You need streaming in the reverse direction, in your example your laptop/camera is a client. You can use a WebSocket connection to reverse-stream video frames to the server, I did something like that for audio here: https://github.com/miguelgrinberg/socketio-examples/tree/master/audio.

  • #72 Miguel Grinberg said

    @Bradley: Okay, maybe you can make this work if you add a destructor to your video source object. Python will invoke it when it deletes the object, giving you a chance to do cleanup.

  • #73 Phil said

    As a newbie trying to learn, this is great - thank you. I've integrated your code into a carputer I'm building, for the reversing camera. It works fine in dev, running the Flask app through Python, but I can't seem to make the camera functionality work through Apache (the page displays a broken link picture and the words "<html page> didn't send any data").

    In your article, you mention Gunicorn as a preferred webserver and tell us to remember to set the CAMERA environment variable.

    I'd be grateful if you can confirm that I should be able to make this work with Apache, and perhaps point me in the right direction on setting the CAMERA environment variable, if this needs to be done anywhere other than in the Python script.

  • #74 Miguel Grinberg said

    @Phil: As long as the CAMERA variable is defined when the script is started by Apache, it does not really matter where it is set. You can also remove the variable from the script and hardcode which camera class to use if that makes it easier for your Apache-based solution.

  • #75 Chrom said

    Hello Miguel,
    Thank you very much for this tutorial. I combined your tutorial with part 4.1 of the picamera documentation (http://picamera.readthedocs.io/en/release-1.9/index.html) and managed to stream JPEGs from my Raspberry Pi 3. However, there is a latency of 1 second, which is unacceptable for my project. I first write the image to a temporary .jpg file, then stream that file like you did. I suspect that the latency is because of the writing and reading process. Is there any way to reduce this latency, like a method to stream without saving/reading JPEG frames? Just the names are enough, and I would really appreciate any detailed pointer. Also, how can I read the JPEG frames from my computer? I need the data to do some image processing too.
    For now I'm also looking for a method to stream just the raw RGB arrays to the computer, then let the computer read and show the image with OpenCV imshow (I need to do some processing on the image, so just the raw data is even better than encoded data from a JPEG image).
