Video Streaming with Flask

I'm sure by now you know that I have released a book and a couple of videos on Flask in cooperation with O'Reilly Media. While the coverage of the Flask framework in these is fairly complete, there are a small number of features that for one reason or another did not get mentioned much, so I thought it would be a good idea to write articles about them here.

This article is dedicated to streaming, an interesting feature that gives Flask applications the ability to provide large responses efficiently partitioned in small chunks, potentially over a long period of time. To illustrate the topic I'm going to show you how to build a live video streaming server!

NOTE: there is now a follow-up to this article, Flask Video Streaming Revisited, in which I describe some improvements to the streaming server introduced here.

What is Streaming?

Streaming is a technique in which the server provides the response to a request in chunks. I can think of a couple of reasons why this might be useful:

  • Very large responses. Having to assemble a response in memory only to return it to the client can be inefficient for very large responses. An alternative would be to write the response to disk and then return the file with flask.send_file(), but that adds I/O to the mix. Providing the response in small portions is a much better solution, assuming the data can be generated in chunks.
  • Real time data. For some applications a request may need to return data that comes from a real time source. A pretty good example of this is a real time video or audio feed. A lot of security cameras use this technique to stream video to web browsers.

Implementing Streaming With Flask

Flask provides native support for streaming responses through the use of generator functions. A generator is a special function that can be interrupted and resumed. Consider the following function:

def gen():
    yield 1
    yield 2
    yield 3

This is a function that runs in three steps, each returning a value. Describing how generator functions are implemented is outside the scope of this article, but if you are a bit curious the following shell session will give you an idea of how generators are used:

>>> x = gen()
>>> x
<generator object gen at 0x7f06f3059c30>
>>> next(x)
1
>>> next(x)
2
>>> next(x)
3
>>> next(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

You can see in this simple example that a generator function can return multiple results in sequence. Flask uses this characteristic of generator functions to implement streaming.

The example below shows how streaming makes it possible to generate a large data table without having to assemble the entire table in memory:

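from flask import Response, render_template, stream_with_context
from app import app  # assumption: the Flask instance is defined in the app package
from app.models import Stock

def generate_stock_table():
    # Each yield becomes one chunk of the response body.
    yield render_template('stock_header.html')
    for stock in Stock.query.all():
        yield render_template('stock_row.html', stock=stock)
    yield render_template('stock_footer.html')

@app.route('/stock-table')  # the URL is an assumption
def stock_table():
    # stream_with_context keeps the request context available while the
    # generator runs, which render_template() needs.
    return Response(stream_with_context(generate_stock_table()))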

In this example you can see how Flask works with generator functions. A route that returns a streamed response needs to return a Response object that is initialized with the generator function. Flask then takes care of invoking the generator and sending all the partial results as chunks to the client.

For this particular example, if you assume that Stock.query.all() returns the result of a database query as an iterable, then you can generate a potentially large table one row at a time. Regardless of the number of rows in the query, the memory consumption of the Python process does not keep growing from having to assemble a large response string.
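To make the memory argument concrete, here is a minimal sketch in plain Python. It uses string formatting as a stand-in for render_template() and a hypothetical iterable of (symbol, price) pairs in place of the Stock query; neither comes from the application above. It shows that a row is fetched and rendered only when the consumer asks for the next chunk.

```python
def generate_table(rows):
    """Yield an HTML table one piece at a time.

    `rows` is a hypothetical iterable of (symbol, price) pairs that stands
    in for the database query result.
    """
    yield '<table>\n'
    for symbol, price in rows:
        # Only the current row exists as a rendered string at this point.
        yield '<tr><td>{}</td><td>{:.2f}</td></tr>\n'.format(symbol, price)
    yield '</table>\n'


def fake_query(log):
    """Stand-in for a query that produces rows lazily; records each fetch."""
    for row in [('AAPL', 100.0), ('GOOG', 200.5), ('MSFT', 50.25)]:
        log.append(row[0])
        yield row


fetched = []
chunks = generate_table(fake_query(fetched))
header = next(chunks)      # nothing has been fetched yet
first_row = next(chunks)   # exactly one row has been fetched and rendered
```

Keep in mind that this only helps when the row source is itself lazy. A query method that returns a fully built list still holds every row in memory, and only the rendered HTML is produced incrementally.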

Multipart Responses

The table example above generates a traditional page in small portions, with all the parts concatenated into the final document. This is a good example of how to generate large responses, but something a little bit more exciting is to work with real time data.

An interesting use of streaming is to have each chunk replace the previous one in the page, as this enables streams to "play" or animate in the browser window. With this technique you can have each chunk in the stream be an image, and that gives you a cool video feed that runs in the browser!

The secret to implementing in-place updates is to use a multipart response. Multipart responses consist of a header that includes one of the multipart content types, followed by the parts, which are separated by a boundary marker and each have their own part-specific content type.

There are several multipart content types for different needs. For the purpose of having a stream where each part replaces the previous part the multipart/x-mixed-replace content type must be used. To help you get an idea of how this looks, here is the structure of a multipart video stream:

HTTP/1.1 200 OK
Content-Type: multipart/x-mixed-replace; boundary=frame

--frame
Content-Type: image/jpeg

<jpeg data here>
--frame
Content-Type: image/jpeg

<jpeg data here>
...

As you can see above, the structure is pretty simple. The main Content-Type header is set to multipart/x-mixed-replace and a boundary string is defined. Then each part is included, prefixed by two dashes and the part boundary string on their own line. The parts have their own Content-Type header, and each part can optionally include a Content-Length header with the length in bytes of the part payload, but at least for images browsers are able to deal with the stream without the length.
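The layout of a single part can be sketched as a small helper. The function name make_part() and its parameters are hypothetical and exist only for illustration; the byte layout follows the structure described above, including the optional Content-Length header.

```python
def make_part(payload, content_type=b'image/jpeg', boundary=b'frame',
              include_length=False):
    """Build one part of a multipart/x-mixed-replace stream as bytes."""
    # Two dashes plus the boundary string, then the part's own headers.
    lines = [b'--' + boundary, b'Content-Type: ' + content_type]
    if include_length:
        # Optional header with the size in bytes of the part payload.
        lines.append(b'Content-Length: ' + str(len(payload)).encode('ascii'))
    # A blank line separates the part headers from the payload.
    head = b'\r\n'.join(lines) + b'\r\n\r\n'
    return head + payload + b'\r\n'


part = make_part(b'<jpeg data here>', include_length=True)
```

In a real stream the payload would be the raw bytes of a JPEG image rather than the placeholder text used here.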

Building a Live Video Streaming Server

That is enough theory. Now it is time to build a complete application that streams live video to web browsers.

There are many ways to stream video to browsers, and each method has its benefits and disadvantages. The method that works well with the streaming feature of Flask is to stream a sequence of independent JPEG pictures. This is called Motion JPEG, and is used by many IP security cameras. This method has low latency, but quality is not the best, since JPEG compression is not very efficient for motion video.

Below you can see a surprisingly simple, yet complete web application that can serve a Motion JPEG stream:

#!/usr/bin/env python
from flask import Flask, render_template, Response
from camera import Camera

app = Flask(__name__)

@app.route('/')
def index():
    return render_template('index.html')

def gen(camera):
    # Endless generator: each iteration yields one JPEG frame as a multipart chunk.
    while True:
        frame = camera.get_frame()
        yield (b'--frame\r\n'
               b'Content-Type: image/jpeg\r\n\r\n' + frame + b'\r\n')

@app.route('/video_feed')
def video_feed():
    return Response(gen(Camera()),
                    mimetype='multipart/x-mixed-replace; boundary=frame')

if __name__ == '__main__':
    # The host value is an assumption; '0.0.0.0' listens on all interfaces.
    app.run(host='0.0.0.0', debug=True)

This application imports a Camera class that is in charge of providing the sequence of frames. Putting the camera control portion in a separate module is a good idea in this case, because it keeps the web application clean, simple and generic.

The application has two routes. The / route serves the main page, which is defined in the index.html template. Below you can see the contents of this template file:

    <title>Video Streaming Demonstration</title>
    <h1>Video Streaming Demonstration</h1>
    <img src="{{ url_for('video_feed') }}">

This is a simple HTML page with just a heading and an image tag. Note that the image tag's src attribute points to the second route of this application, and this is where the magic happens.

The /video_feed route returns the streaming response. Because this stream returns the images that are to be displayed in the web page, the URL to this route is in the src attribute of the image tag. The browser will automatically keep the image element updated by displaying the stream of JPEG images in it, since multipart responses are supported in most/all browsers (let me know if you find a browser that doesn't like this).

The generator function used in the /video_feed route is called gen(), and takes as an argument an instance of the Camera class. The mimetype argument is set as shown above, with the multipart/x-mixed-replace content type and a boundary set to the string "frame".

The gen() function enters a loop where it continuously returns frames from the camera as response chunks. The function asks the camera to provide a frame by calling the camera.get_frame() method, and then it yields this frame formatted as a response chunk with a content type of image/jpeg, as shown above.
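If you want to look at the chunks that gen() produces without opening a browser, one option is to drive the generator by hand. The sketch below repeats gen() from the application and pairs it with a hypothetical StubCamera class that returns placeholder bytes instead of real JPEG data. Because the generator never ends, itertools.islice() is used to take just the first two chunks.

```python
from itertools import islice


class StubCamera(object):
    """Hypothetical camera that returns numbered placeholder frames."""

    def __init__(self):
        self.count = 0

    def get_frame(self):
        self.count += 1
        return ('frame-%d' % self.count).encode('ascii')


def gen(camera):
    # Same generator as in the application above.
    while True:
        frame = camera.get_frame()
        yield (b'--frame\r\n'
               b'Content-Type: image/jpeg\r\n\r\n' + frame + b'\r\n')


# gen() never ends on its own, so take only the first two chunks.
chunks = list(islice(gen(StubCamera()), 2))
```

Each element of chunks is one complete part of the multipart response, which is what Flask would write to the client every time it asks the generator for more data.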

Obtaining Frames from a Video Camera

Now all that is left is to implement the Camera class, which will have to connect to the camera hardware and download live video frames from it. The nice thing about encapsulating the hardware dependent part of this application in a class is that this class can have different implementations for different people, but the rest of the application remains the same. You can think of this class as a device driver, which provides a uniform interface regardless of the actual hardware device in use.

The other advantage of having the Camera class separated from the rest of the application is that it is easy to fool the application into thinking there is a camera when in reality there is not, since the camera class can be implemented to emulate a camera without real hardware. In fact, while I was working on this application, the easiest way for me to test the streaming was to do that and not have to worry about the hardware until I had everything else running. Below you can see the simple emulated camera implementation that I used:

from time import time

class Camera(object):
    def __init__(self):
        self.frames = [open(f + '.jpg', 'rb').read() for f in ['1', '2', '3']]

    def get_frame(self):
        return self.frames[int(time()) % 3]

This implementation reads three images from disk called 1.jpg, 2.jpg and 3.jpg and then returns them one after another repeatedly, at a rate of one frame per second. The get_frame() method uses the current time in seconds to determine which of the three frames to return at any given moment. Pretty simple, right?
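The frame selection is just integer arithmetic on the clock. As a small worked example, the hypothetical helper below takes the timestamp as an argument so that the result can be checked with made-up values instead of the real time.

```python
def frame_index(now, frame_count=3):
    """Return which frame the emulated camera serves at timestamp `now`."""
    # int() drops the fractional second, so the index changes once a second.
    return int(now) % frame_count


# Hypothetical timestamps, in seconds, chosen only for illustration.
indexes = [frame_index(t) for t in (10.0, 10.9, 11.2, 12.5, 13.0)]
```

Index 0 corresponds to 1.jpg, index 1 to 2.jpg and index 2 to 3.jpg. Since get_frame() returns immediately, the same image can be sent many times within a given second; the picture in the browser only changes when the index does.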

To run this emulated camera I needed to create the three frames. Using GIMP I made the following images:

[Images: Frame 1, Frame 2, Frame 3]

Because the camera is emulated, this application runs in any environment, so you can run this right now! I have this application all ready to go on GitHub. If you are familiar with git you can clone it with the following command:

$ git clone

If you prefer to download it, then you can get a zip file here.

Once you have the application installed, create a virtual environment and install Flask in it. Then you can run the application as follows:

$ python

After you start the application enter http://localhost:5000 in your web browser and you will see the emulated video stream playing the 1, 2 and 3 images over and over. Pretty cool, right?

Once I had everything working I fired up my Raspberry Pi with its camera module and implemented a new Camera class that converts the Pi into a video streaming server, using the picamera package to control the hardware. I will not discuss this camera implementation here, but you can find it in the source code in file

If you have a Raspberry Pi and a camera module you can edit to import the Camera class from this module and then you will be able to live stream the Pi camera, like I'm doing in the following screenshot:

[Screenshot: live stream from the Raspberry Pi camera]

If you want to make this streaming application work with a different camera, then all you need to do is write another implementation of the Camera class. If you end up writing one I would appreciate it if you contribute it to my GitHub project.
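As a rough idea of what another implementation could look like, here is a sketch of a camera that needs no hardware and cycles through every JPEG file found in a folder. The class name DirectoryCamera and the clock parameter are my own inventions for this example and are not part of the project. The only thing the rest of the application relies on is a get_frame() method that returns the bytes of a JPEG image.

```python
import os
from time import time


class DirectoryCamera(object):
    """Hypothetical camera that cycles through the JPEG files in a folder."""

    def __init__(self, folder, clock=time):
        names = sorted(n for n in os.listdir(folder)
                       if n.lower().endswith('.jpg'))
        if not names:
            raise ValueError('no .jpg files found in ' + folder)
        self.frames = []
        for name in names:
            # Read each file fully and close it right away.
            with open(os.path.join(folder, name), 'rb') as f:
                self.frames.append(f.read())
        # The clock is injectable so the class can be tested with fixed times.
        self.clock = clock

    def get_frame(self):
        # Same one-frame-per-second selection as the emulated camera.
        return self.frames[int(self.clock()) % len(self.frames)]
```

A class like this can replace the emulated camera by changing the import in the application, since gen() only ever calls get_frame().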

Limitations of Streaming

When the Flask application serves regular requests the request cycle is short. The web worker receives the request, invokes the handler function and finally returns the response. Once the response is sent back to the client the worker is free and ready to take on another request.

When a request that uses streaming is received, the worker remains attached to the client for the duration of the stream. When working with long, never ending streams such as a video stream from a camera, a worker will stay locked to the client until the client disconnects. This effectively means that unless specific measures are taken, the application can only serve as many clients as there are web workers. When working with the Flask application in debug mode that means just one, so you will not be able to connect a second browser window to watch the stream from two places at the same time.

There are ways to overcome this important limitation. The best solution in my opinion is to use a coroutine based web server such as gevent, which Flask fully supports. With the use of coroutines gevent is able to handle multiple clients on a single worker thread, as gevent modifies the Python I/O functions to issue context switches as necessary.
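Another option, which comes up in the comments below, is to use a server that gives each client its own thread. The sketch below illustrates the idea with Python 3's standard library only, so it is a stand-in and not Flask's own server. ThreadingWSGIServer and make_threaded_server() are hypothetical names; the application passed in can be any WSGI callable, and a Flask application object is one.

```python
from socketserver import ThreadingMixIn
from wsgiref.simple_server import WSGIServer, make_server


class ThreadingWSGIServer(ThreadingMixIn, WSGIServer):
    """WSGI server that handles each request in its own thread."""
    daemon_threads = True  # open streams do not block interpreter exit


def make_threaded_server(wsgi_app, host='127.0.0.1', port=5000):
    """Return a server that can keep several streaming responses open.

    `wsgi_app` is any WSGI callable, for example a Flask application.
    """
    return make_server(host, port, wsgi_app, server_class=ThreadingWSGIServer)
```

Calling make_threaded_server(app).serve_forever() would then serve the application, with one thread per connected client. With Flask's development server, passing threaded=True to app.run() plays a similar role. Each open stream still occupies a thread for as long as the client stays connected, so this does not scale as far as the coroutine approach.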


In case you missed it above, the code that supports this article is available in my GitHub repository. There you can find a generic implementation of video streaming that does not require a camera, and also an implementation for the Raspberry Pi camera module. The follow-up article, Flask Video Streaming Revisited, describes some improvements I made after this article was originally published.

I hope this article shed some light on the topic of streaming. I concentrated on video streaming because that is an area in which I have some experience, but streaming has many more uses besides video. For example, this technique can be used to keep a connection between the client and the server alive for a long time, allowing the server to push new information the moment it becomes available. These days the WebSocket protocol is a more efficient way to achieve this, but WebSocket is fairly new and works only in modern browsers, while streaming will work on pretty much any browser you can think of.

If you have any questions feel free to write them below. I plan to continue documenting more of the lesser-known Flask topics, so I hope you connect with me in some way to know when more articles are published. I hope to see you in the next one!



  • #51 Steve Gale said

    Hi Miguel,

    Great article, it may be what I have been looking for.
    When I run the camera-PI file and connect remotely from my iPad I get a nice live feed of the video on the iPad, but on my Pi I get a full size video image which I have not been able to kill. I also get the same effect if I use epiphany on the Pi and connect to the local host. This happens whether I run the program from the command line or from a terminal in LXDE.
    In order to kill the app I am having to use SSH from my iPad and reboot.
    Any ideas as to what could be happening?
    As a work around and to get some experience of using flask myself I think I will add a button to the web page and get it to exit the program.


  • #52 Miguel Grinberg said

    @Steve: This is something I missed, because I normally do not have the Pi connected to a monitor, I log in remotely. I will need to try this, but hopefully commenting out the start_preview() call keeps the video overlay from appearing.

  • #53 richarde said

    Miguel, firstly thank you very much ... I've found your posts on Flask very useful while trying to build a custom web interface to allow mjpeg streaming and control over camera settings and gimbal movements for the Raspberry Pi. I wanted to use gevent to serve concurrent requests and to use WebSockets for some near real-time feedback of parameters like shutter speed, digital and analog gains.

    Even though my code didn't use threading, I couldn't get it running on gevent without monkey patching the standard library's threading module:
    from gevent import monkey; monkey.patch_thread()

    I noticed the picamera library used threading locks which I guess must be what's causing the issue. Am I missing something or is there a way to avoid monkey patching without modifying picamera?

  • #54 Miguel Grinberg said

    @richarde: Try running with debug=False and no monkey patching. I recall the Flask/Werkzeug reloader had some issues when gevent thread was not monkey patched. The picamera background thread does not need to be monkey patched I think.

  • #55 steve gale said

    Hi Miguel,

    yes you were right, commenting out start_preview solved my problem, it took me a while to realise that is what the problem was.

    I am now trying to modify your camera class to stream an opencv image. That is leading me down a path to understand the difference between io.BytesIO() stream and an opencv frame !

  • #56 richarde said

    Thanks for your quick reply Miguel.

    I was already running Flask with debug=False. When using gevent, Flask didn't respond to other requests while the mjpeg stream was active, unless I monkey patched the threading module. When using monkey patch, I intermittently get the following messages:

    Traceback (most recent call last):
    File "_ctypes/callbacks.c", line 314, in 'calling callback function'
    line 237, in _encoder_callback
    encoder._callback(port, buf)
    line 569, in _callback
    File "/usr/lib/python2.7/", line 386, in set
    File "_semaphore.pyx", line 112, in gevent._semaphore.Semaphore.acquire
    line 331, in switch
    return greenlet.switch(self)
    gevent.hub.LoopExit: This operation would block forever

    and ....

    Traceback (most recent call last):
    line 508, in handle_one_response
    line 495, in run_application
    line 484, in process_result
    for data in self.result:
    line 693, in next
    return self._next()
    line 81, in _iter_encoded
    for item in iterable:
    File "/home/pi/dev/code/micam/", line 22, in get_multipart
    for cap in cam.capture_continuous(stream, format='jpeg',
    use_video_port=True, resize=(648,365)):
    line 1742, in capture_continuous
    'Timed out waiting for capture to end')

    I've had good results with concurrent requests by simply using Werkzeug with threaded=True but I want to use gevent in order to make use of WebSockets as you showed in one of your recent posts. Maybe the picamera module is just not compatible with gevent asynchronous operations?

  • #57 Miguel Grinberg said

    @richarde: when using gevent the server should be able to handle multiple concurrent requests, with or without monkey patching. Are you saying that using gevent w/o monkey patching you were only able to have one client?

    Would you like to create a fork of my project on github and add your changes to it? Then I can review and test your version here.

  • #58 richarde said

    I forked your project today and added an additional route/request which worked both with threaded=True (tag v0.1) and gevent (tag v0.2) without monkey patching.

    The errors I encountered occurred in a much bigger application I built which didn't use an explicit background thread like yours, so I must have introduced something which prevented it from handling multiple concurrent requests without monkey patching.

    Thanks for help!

  • #59 Cameron said

    Hi Miguel,

    Thanks for these great tutorials. I think I'm going to have to invest in your O'Reilly book.

    I'm using most of your demonstration code out of the box for video streaming but am running into some issues I suspect are related to the threading concerns above and similar to this:

    I have two issues:

    1) once I start the stream the raspi camera module stays on indefinitely. this is unfortunate since I have a cronjob taking stills periodically which it cannot do if the camera resource is tied up. How can I release the camera resource when i navigate away from the page?

    2) when i'm streaming the image website using chrome as my browser i am unable to navigate away from the page until i "stop" the page at which point the debug flask server complains about broken pipes. In firefox things behave better but i still get broken pipes when i navigate away.

  • #60 Miguel Grinberg said

    @Cameron: regarding your first question, I'm asked a lot about that, so I'm working on an enhancement that will stop the background thread when there are no clients. I have a server that does that already, but I did not want to overcomplicate this example with it. Now it seems I have no choice, so look for that in the next week or so.
    About question #2, it is expected to see a broken pipe error on the server side. This happens because Flask is happily streaming and suddenly the socket connection goes away.

  • #61 Cameron said

    @Miguel Thanks for the speedy reply! I look forward to rev2. Any way I could take a peek at that server code before your write up?

    Also any thoughts as to why chrome won't let me navigate away without first 'stopping' the page?


  • #62 Miguel Grinberg said

    @Cameron: sure, feel free to take a look:

    This is pretty old though. I haven't tested this code in several months, and I never fully finished it, so it has some rough edges.

  • #63 John said

    @Miguel Grinberg

    I find the thread method is not really good, because evil clients can start x threads with the refresh button and CPU usage is at 100% in a few seconds. As we know, threads cannot be stopped.

  • #64 Miguel Grinberg said

    @John: not sure I understand what you are saying. This is a single background thread that serves all clients. And most of the heavy work in this thread is done by the Raspberry Pi hardware.

  • #65 John said


    For every client who sends a request to the server the number of threads increases by one.

    Exception in thread Thread-15:
    Traceback (most recent call last):
    File "/usr/lib/python2.7/", line 552, in __bootstrap_inner
    File "/usr/lib/python2.7/", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
    File "/home/pi/Python/Flask/", line 25, in _thread
    with picamera.PiCamera() as camera:
    File "/usr/lib/python2.7/dist-packages/picamera/", line 419, in __init

    self.STEREO_MODES[stereo_mode], stereo_decimate)
    File "/usr/lib/python2.7/dist-packages/picamera/", line 551, in _init_camera
    prefix="Camera component couldn't be enabled")
    File "/usr/lib/python2.7/dist-packages/picamera/", line 133, in mmal_check
    raise PiCameraMMALError(status, prefix)
    PiCameraMMALError: Camera component couldn't be enabled: Out of resources (other than memory)

    Another thing:

    Do you have any idea how to stop the pi-camera after x-seconds if server sends no response (respectively no client available anymore)?

    Thx for explanation generator functions.

  • #66 Miguel Grinberg said

    @John: the threads that serve each client do not do any video processing. What I meant is that there is a single thread talking to the camera, regardless of how many clients there are. If you run Werkzeug with --threaded then yes, each client will have a thread to itself, but these threads are lightweight. Can you easily reproduce the MMAL error above? Give me the steps, please.

  • #67 john said

    Just refresh your browser while picamera is streaming or just open a new page and you will see the traceback on your console...

  • #68 Miguel Grinberg said

    @john: You changed the web server. For this solution to work you obviously need a server that has a single worker (multiple threads within the worker are okay). How many worker processes does cherrypy create?

  • #69 John said

    Hi Miguel
    [...]How many worker processes does cherrypy create?[...]
    I don't know. How can I find out? The docu says: WSGI thread-pooled webserver and inside the file sands: 'wsgi.multiprocess': False, 'wsgi.multithread': True, but I don't know if this is the answer...
    I also tried ''gevent'' server. The issue is the same...
    Maybe I find out a solution over the weekend.

  • #70 Jose M. Alonso said

    Hi Miguel:
    I have mounted the camera on a pan/tilt system and defined a new Flask route with a form in order to receive orders to move the camera. It works but I get error messages like: “Broken pipe”, “mmal_vc_component_enable:failed to enable component ENOSPC”, “PiCameraMMALError: Camera component couldn’t be enabled: Out of resources (other than memory)”
    Any idea/suggestion?
    Thank you very much for the article and for your help

  • #71 Miguel Grinberg said

    Jose: You are probably starting multiple instances of the camera thread.

  • #72 Miguel Grinberg said

    For those interested, note that I have updated the Pi camera class in the Github project to automatically shut down when there are no clients viewing the video stream.

  • #73 John said

    "For those interested, note that I have updated the Pi camera class in the Github project to automatically shut down when there are no clients viewing the video stream."

    Many thanks! Splendid performance!

  • #74 Satish said

    Thanks for your wonderful tutorial,
    fan of yours when i started to work on Flask, your tutorials helped me a lot in implementing many things in flask.

    I don't know below question is relevant or not in this context, but don't have choice to post this question.

    Now, i want to read the multipart/form-data in the flask, The request contains JSON data as well as video or pictures or document in the form of stream data. My question is how to fetch this data from flask.request object ?
    I did - received only json object, how about the stream data?

  • #75 Miguel Grinberg said

    @Satish: not sure what you are trying to do, but as far as I know, browsers will not know how to interpret the jpeg stream if it comes in a multipart request. I recommend that you split the JSON part and put it in a separate request.
