FFmpeg 101
To stream or not to stream
Multimedia blog and other fancy stuff
26 July 2024 ffmpeg
A high-level architecture overview to get started with FFmpeg.
Contents:

- FFmpeg package content
- FFmpeg tools
- FFmpeg libraries
- FFmpeg simple player
Code repository: ffmpeg-101

## FFmpeg package content

FFmpeg is composed of a suite of tools and libraries.

### FFmpeg tools

The tools can be used to encode/decode/transcode a multitude of different audio and video formats, and to stream the encoded media over networks.
- ffmpeg: a command-line tool to convert multimedia files between formats
- ffplay: a simple media player based on SDL and the FFmpeg libraries
- ffprobe: a simple multimedia stream analyzer
### FFmpeg libraries

The libraries can be used to integrate those same features into your own product.

- libavformat: I/O and muxing/demuxing
- libavcodec: encoding/decoding
- libavfilter: graph-based filters for raw media
- libavdevice: input/output devices
- libavutil: common multimedia utilities
- libswresample: audio resampling, sample format conversion and audio mixing
- libswscale: color conversion and image scaling
- libpostproc: video post-processing (deblocking/noise filters)
## FFmpeg simple player

A basic usage of FFmpeg is to demux a multimedia stream (obtained from a file or from the network) into its audio and video streams and then to decode those streams into raw audio and raw video data. To manage the media streams, FFmpeg uses the following structures:
- AVFormatContext: a high-level structure providing sync, metadata and muxing for the streams
- AVStream: a continuous stream (audio or video)
- AVCodec: defines how data are encoded and decoded
- AVPacket: encoded data in the stream
- AVFrame: decoded data (raw video frame or raw audio samples)
The process used to demux and decode follows this logic: the AVFormatContext demuxes the input into AVPacket's belonging to the different AVStream's, and the matching AVCodec then decodes those packets into raw AVFrame's.
Here is the basic code needed to read an encoded multimedia stream from a file, analyze its content and demux the audio and video streams. These features are provided by the libavformat library, which uses the AVFormatContext and AVStream structures to store the information.

```c
// Allocate memory for the context structure
AVFormatContext* format_context = avformat_alloc_context();

// Open a multimedia file (like an mp4 file or any format recognized by FFmpeg)
avformat_open_input(&format_context, filename, NULL, NULL);
printf("File: %s, format: %s\n", filename, format_context->iformat->name);

// Analyze the file content and identify the streams within
avformat_find_stream_info(format_context, NULL);

// List the streams
for (unsigned int i = 0; i < format_context->nb_streams; ++i) {
    AVStream* stream = format_context->streams[i];

    printf("---- Stream %02d\n", i);
    printf("  Time base: %d/%d\n", stream->time_base.num, stream->time_base.den);
    printf("  Framerate: %d/%d\n", stream->r_frame_rate.num, stream->r_frame_rate.den);
    printf("  Start time: %" PRId64 "\n", stream->start_time);
    printf("  Duration: %" PRId64 "\n", stream->duration);
    printf("  Type: %s\n", av_get_media_type_string(stream->codecpar->codec_type));

    uint32_t fourcc = stream->codecpar->codec_tag;
    printf("  FourCC: %c%c%c%c\n", fourcc & 0xff, (fourcc >> 8) & 0xff,
           (fourcc >> 16) & 0xff, (fourcc >> 24) & 0xff);
}
```
```c
// Close the multimedia file and free the context structure
avformat_close_input(&format_context);
```

Once we've got the different streams from inside the multimedia file, we need to find specific codecs to decode the streams to raw audio and raw video data. All codecs are statically included in libavcodec. You can easily create your own codec by creating an instance of the FFCodec structure and registering it as an extern const FFCodec in libavcodec/allcodecs.c, but that is a topic for another post. To find the codec corresponding to the content of an AVStream, we can use the following code:

```c
// Stream obtained from the AVFormatContext structure in the former streams listing loop
AVStream* stream = format_context->streams[i];

// Search for a compatible codec
const AVCodec* codec = avcodec_find_decoder(stream->codecpar->codec_id);
if (!codec) {
    fprintf(stderr, "Unsupported codec\n");
    continue;
}
printf("  Codec: %s, bitrate: %" PRId64 "\n", codec->name, stream->codecpar->bit_rate);

if (codec->type == AVMEDIA_TYPE_VIDEO) {
    printf("  Video resolution: %dx%d\n", stream->codecpar->width, stream->codecpar->height);
} else if (codec->type == AVMEDIA_TYPE_AUDIO) {
    printf("  Audio: %d channels, sample rate: %d Hz\n",
           stream->codecpar->ch_layout.nb_channels, stream->codecpar->sample_rate);
}
```

With the right codec and codec parameters extracted from the AVStream information, we can now allocate the AVCodecContext structure that will be used to decode the corresponding stream. It is important to remember the index of the stream we want to decode from the former streams list (format_context->streams) because this index will be used later to identify the demuxed packets extracted by the AVFormatContext. In the following code we're going to select the first video stream contained in the multimedia file.

```c
// first_video_stream_index is determined during the streams listing in the former loop
int first_video_stream_index = ...;

AVStream* first_video_stream = format_context->streams[first_video_stream_index];
AVCodecParameters* first_video_stream_codec_params = first_video_stream->codecpar;
const AVCodec* first_video_stream_codec =
    avcodec_find_decoder(first_video_stream_codec_params->codec_id);

// Allocate memory for the decoding context structure
AVCodecContext* codec_context = avcodec_alloc_context3(first_video_stream_codec);

// Configure the decoder with the codec parameters
avcodec_parameters_to_context(codec_context, first_video_stream_codec_params);

// Open the decoder
avcodec_open2(codec_context, first_video_stream_codec, NULL);
```

Now that we have a running decoder, we can extract the demuxed packets using the AVFormatContext structure and decode them to raw video frames. For that we need two different structures:
- AVPacket, which contains the encoded packets extracted from the input multimedia file,
- AVFrame, which will contain the raw video frame after the AVCodecContext has decoded the former packets.
```c
// Allocate memory for the encoded packet structure
AVPacket* packet = av_packet_alloc();

// Allocate memory for the decoded frame structure
AVFrame* frame = av_frame_alloc();

// Demux the next packet from the input multimedia file
while (av_read_frame(format_context, packet) >= 0) {
    // The demuxed packet uses the stream index to identify the AVStream it is coming from
    printf("Packet received for stream %02d, pts: %" PRId64 "\n", packet->stream_index, packet->pts);

    // In our example we are only decoding the first video stream identified formerly by first_video_stream_index
    if (packet->stream_index == first_video_stream_index) {
        // Send the packet to the previously initialized decoder
        int res = avcodec_send_packet(codec_context, packet);
        if (res < 0) {
            fprintf(stderr, "Cannot send packet to the decoder: %s\n", av_err2str(res));
            break;
        }

        // The decoder (AVCodecContext) acts like a FIFO queue: we push the encoded packets on one end and we
        // need to poll the other end to fetch the decoded frames. The codec implementation may (or may not)
        // use different threads to perform the actual decoding.

        // Poll the running decoder to fetch all the decoded frames available so far
        while (res >= 0) {
            // Fetch the next available decoded frame
            res = avcodec_receive_frame(codec_context, frame);
            if (res == AVERROR(EAGAIN) || res == AVERROR_EOF) {
                // No more decoded frames are available in the decoder output queue, go to the next encoded packet
                break;
            } else if (res < 0) {
                fprintf(stderr, "Error while receiving a frame from the decoder: %s\n", av_err2str(res));
                goto end;
            }

            // Now the AVFrame structure contains a decoded raw video frame, we can process it further...
            printf("Frame %02" PRId64 ", type: %c, format: %d, pts: %03" PRId64 ", keyframe: %s\n",
                   codec_context->frame_num, av_get_picture_type_char(frame->pict_type), frame->format,
                   frame->pts, (frame->flags & AV_FRAME_FLAG_KEY) ? "true" : "false");

            // The AVFrame internal content is automatically unreffed and recycled during the next call to
            // avcodec_receive_frame(codec_context, frame)
        }
    }

    // Unref the packet internal content to recycle it for the next demuxed packet
    av_packet_unref(packet);
}
```
```c
// Free the previously allocated memory for the different FFmpeg structures
end:
    av_packet_free(&packet);
    av_frame_free(&frame);
    avcodec_free_context(&codec_context);
    avformat_close_input(&format_context);
```

The behavior of the former code is summarized in the following diagram:
You can download the full code here or directly access the code repository. To build the example you will need meson and ninja. If you have python and pip installed, you can install them very easily by calling `pip3 install meson ninja`. Then, once the example archive is extracted to a ffmpeg-101 folder, go to this folder and call `meson setup build`. It will automatically download the right version of FFmpeg if you don't have it already installed on your system. Then call `ninja -C build` to build the code, and `./build/ffmpeg-101 sample.mp4` to run it. You should obtain the following result:

```
File: sample.mp4, format: mov,mp4,m4a,3gp,3g2,mj2
---- Stream 00
  Time base: 1/3000
  Framerate: 30/1
  Start time: 0
  Duration: 30000
  Type: video
  FourCC: avc1
  Codec: h264, bitrate: 47094
  Video resolution: 206x80
---- Stream 01
  Time base: 1/44100
  Framerate: 0/0
  Start time: 0
  Duration: 440320
  Type: audio
  FourCC: mp4a
  Codec: aac, bitrate: 112000
  Audio: 2 channels, sample rate: 44100 Hz
Packet received for stream 00, pts: 0
Send video packet to decoder...
Frame 01, type: I, format: 0, pts: 000, keyframe: true
Packet received for stream 00, pts: 100
Send video packet to decoder...
Frame 02, type: P, format: 0, pts: 100, keyframe: false
Packet received for stream 00, pts: 200
Send video packet to decoder...
Frame 03, type: P, format: 0, pts: 200, keyframe: false
Packet received for stream 00, pts: 300
Send video packet to decoder...
Frame 04, type: P, format: 0, pts: 300, keyframe: false
Packet received for stream 00, pts: 400
Send video packet to decoder...
Frame 05, type: P, format: 0, pts: 400, keyframe: false
Packet received for stream 00, pts: 500
Send video packet to decoder...
Frame 06, type: P, format: 0, pts: 500, keyframe: false
Packet received for stream 00, pts: 600
Send video packet to decoder...
Frame 07, type: P, format: 0, pts: 600, keyframe: false
Packet received for stream 00, pts: 700
Send video packet to decoder...
Frame 08, type: P, format: 0, pts: 700, keyframe: false
Packet received for stream 01, pts: 0
Packet received for stream 01, pts: 1024
Packet received for stream 01, pts: 2048
Packet received for stream 01, pts: 3072
Packet received for stream 01, pts: 4096
Packet received for stream 01, pts: 5120
Packet received for stream 01, pts: 6144
Packet received for stream 01, pts: 7168
Packet received for stream 01, pts: 8192
Packet received for stream 01, pts: 9216
Packet received for stream 01, pts: 10240
Packet received for stream 01, pts: 11264
Packet received for stream 01, pts: 12288
Packet received for stream 01, pts: 13312
Packet received for stream 01, pts: 14336
Packet received for stream 01, pts: 15360
Packet received for stream 01, pts: 16384
Packet received for stream 01, pts: 17408
Packet received for stream 01, pts: 18432
Packet received for stream 01, pts: 19456
Packet received for stream 01, pts: 20480
Packet received for stream 01, pts: 21504
Packet received for stream 00, pts: 800
Send video packet to decoder...
Frame 09, type: P, format: 0, pts: 800, keyframe: false
Packet received for stream 00, pts: 900
Send video packet to decoder...
Frame 10, type: P, format: 0, pts: 900, keyframe: false
```