Using libav library
It has been a long time since I last wrote here and I really wanted to do it again, but my life has changed a lot lately: I recently changed job and city. Those who have followed me for a while know that I worked as a researcher in the “Artificial Vision Applications” group of the University of Córdoba (Spain). Now I work in Seville, at FADA-CATEC, and I’m really happy because I can keep working on topics closely related to my previous research, but applied to industry, so I get to see the final implementation and operation of this research in real systems.
In my first month at the new job I had to tackle new topics such as setting up an embedded system with GNU/Linux distributions (specifically, I worked with the BeagleBoard), and creating, transmitting and receiving video and data streams with the MPEG-2 TS standard (for transmitting data from an unmanned aircraft to a ground station and vice versa). For this second task I studied different alternatives, and after considering some proprietary software tools, which of course only work on .NET platforms and are quite expensive, in the end we decided to give the free tool libav a chance.
I already knew this library, and I knew that it was used in many projects. If libav isn’t familiar to you, maybe you know the name ffmpeg, which is one of the programs distributed with the library (actually, ffmpeg and Libav were a single project, but they seem to have split over differences between some of their administrators; those curious about the gossip can read more in the “Legal Threats” section of ffmpeg’s website). Despite its widespread use and the huge size of the library (it implements the majority of existing codecs), I was surprised by how little documentation and how few examples are available online. In fact, its web page only refers to two tutorials: one dated 02-2004, and another that gives no date but warns that it is outdated. Not even the Doxygen-generated documentation gives a brief description of the main structures used in the library. Given this situation, and having suffered it first-hand for nearly two weeks, I decided to publish a brief tutorial/introduction to the library that can be helpful for people like me who need to use it. And here I am, hoping that this post will serve as a reference for future generations ^^.
Installation
The modules that are part of the project Libav are (ordered by importance):
- libavcodec : Codecs implementation and definition of main structures.
- libavformat : Implementation of muxers/demuxers, protocols, and I/O structures.
- libavutil : Tools for the rest of modules.
- libswscale : Tool for conversion between formats, resize, etc.
- libavfilter: Filters.
- libavdevice : Input devices (firewire, v4l, v4l2, audio, etc).
As with other important projects, the different Linux distributions provide packages that install the library and its headers so that we can use it in our programs. However, because the library is constantly evolving and its interface sometimes changes, the versions commonly found in the distribution repositories are quite old. I recommend installing version 0.6.2, which can be downloaded directly from the website. To set up the project we need to run the configure script as follows:
./configure --disable-yasm --enable-shared
Then compile and install the library in the traditional way (make, followed by make install).
General introduction
Once we have the source code of Libav, and given how little documentation exists for the library, many of you will probably go straight to the source code to find out how everything works. There are a couple of examples in the code that show how to use many of the basic structures. These examples are in the following files:
- libavcodec/api-example.c
- libavformat/output-example.c
At first it’s normal to find some of the structure names a bit disconcerting. Some of these names don’t clearly reveal the purpose of the structures, or even the relationships between them. The following image shows how some of the structures are related (not all relationships are present) and some of their most important member variables.
AVFormatContext
It is the backbone on which most actions are performed. We must create one of these structures in order to:
- Open files or streams from which to read data.
- Save the streams we create to a file or send them over some protocol (such as UDP).
In the former case, the variable nb_streams stores the number of streams (AVStream) existing in the file or stream we are reading. These streams can be of different types: video, audio, data, subtitles, etc. In most cases each stream has an associated codec that is used to decode the data received through it. In the latter case, we create streams with the function av_new_stream and assign them the appropriate codec for encoding the data we are going to send.
AVOutputFormat
When we use the AVFormatContext structure to generate a file or data stream, we must first set its output format. This output format, or encapsulation of data, is implemented by the different muxers existing in the library. In this post I will work with the MPEG-2 TS muxer, but you can use any other, such as MP4, Matroska, AVI, etc. With the function guess_format we can specify the muxer to use, and it returns a pointer to an AVOutputFormat with the appropriate values. The structure AVInputFormat has the same task but in reverse. That is, it is implemented by the different demuxers existing in the library.
AVStream
Represents a data stream of a specific type: video, audio, subtitles, etc. To create a new stream we use the function av_new_stream.
AVCodecContext
Each stream has an associated AVCodecContext structure that determines the encoder or decoder to use. If we are setting up an encoder we must set appropriate values for the different variables of the structure. The fields that must be set are:
- width : width of images.
- height: height of images.
- pix_fmt: Format of pixels (RGB24, BGR24, YUV420, GRAY8, etc.)
- time_base: Indicates a measure used subsequently to calculate the frame rate in the reproduction.
There are countless more fields and flags that enable other features, but these parameters depend on the specific codec you choose and the type of compression you want to configure. Configuring these values properly can become an art … with so little documentation and such complex compression algorithms, if we don’t understand the details of the algorithm we have to test different settings until we reach an acceptable quality and an appropriate transmission rate.
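To make the role of time_base more concrete, here is a minimal sketch of how a time base maps to a frame rate and a frame duration. It does not use libav: the Rational struct is a hypothetical stand-in for AVRational, and the helper names are my own, chosen only for illustration.

```cpp
// Hypothetical stand-in for libav's AVRational (a numerator/denominator pair).
struct Rational {
    int num;
    int den;
};

// One frame lasts num/den seconds, so the frame rate is den/num frames per second.
inline double frames_per_second(Rational time_base) {
    return static_cast<double>(time_base.den) / time_base.num;
}

// Duration of a single frame, in milliseconds.
inline double frame_duration_ms(Rational time_base) {
    return 1000.0 * time_base.num / time_base.den;
}
```

With the time base {1,25} used throughout this post, this gives 25 frames per second and 40 ms per frame.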
AVFrame
Structure used for storing any kind of raw data (image, audio, etc.) before it is encoded into a stream or after it is decoded from one.
AVPacket
Structure used for writing/reading any kind of data (image, audio, etc.) to/from a Muxer/Demuxer.
Sample applications
Now I will show you a real example with two applications: one that encodes and transmits the data, and another that reads the stream created by the first one and decodes it. I will use small blocks of code to make it easier to understand the different parts that usually appear in this kind of application. I cannot give you the entire code of my application because I am not allowed to (legal issues). Furthermore, error checking statements are not shown in order to keep the code clear, but you must check the result of every function that can return an error!
Transmission & Encoding
This application performs the following steps:
- Muxer setup.
- Add video stream.
- Add data stream.
- Initialize muxer and output.
- Read images from disk.
- Encoding and sending data.
- Close muxer and free structures.
This is the declaration of the used variables:
AVOutputFormat * _of; // Output format
AVFormatContext * _ifc; // images format context
AVFormatContext * _tsfc; // muxer format context
AVCodecContext * _vCC; // Video Codec Context
AVCodecContext * _iCC; // Codec Context
AVCodecContext * _dCC; // Codec Context
struct SwsContext * _cvt_ctx; // Image Convert Context
AVStream * _vStream; // Video stream
AVStream * _dStream; // Data stream
AVFrame * _picin; // Input image
AVFrame * _picout; // Output image
AVPacket _pkt; // Packet
uint8_t * _vBuf; // Output video buffer
int _vSize; // Video buffer size
uint8_t * _dBuf; // Output data buffer
int _dSize; // data buffer size
Muxer setup
_of = guess_format("mpegts", NULL, NULL);
_tsfc = avformat_alloc_context();
_tsfc->oformat = _of;
snprintf(_tsfc->filename, sizeof(_tsfc->filename), "%s", path);
In the first line we look up the appropriate AVOutputFormat with the function guess_format. We can either name the muxer directly in the first argument, or pass the file path in the second argument and let the function try to determine the muxer from the file extension. In line 2 we create a new AVFormatContext for our muxer. In the following two lines we assign the AVOutputFormat pointer and the file path to our muxer.
It’s important to highlight that path can be either an output file name where the streams will be saved or a TCP or UDP location to which the data will be streamed. We don’t need to differentiate these cases … libav makes the distinction internally and automatically.
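The lookup described above (explicit muxer name first, file extension as a fallback) can be sketched without libav as follows. This is only an illustration of the idea, not libav’s actual implementation: MuxerEntry, kMuxers, the extension lists and guess_muxer are hypothetical names and values invented for this example.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical muxer table: short name plus comma-separated file extensions.
struct MuxerEntry {
    const char *name;
    const char *extensions;
};

static const MuxerEntry kMuxers[] = {
    {"mpegts",   "ts,m2t"},
    {"mp4",      "mp4"},
    {"matroska", "mkv"},
    {"avi",      "avi"},
};

// Returns true if ext is exactly one item of the comma-separated list.
static bool extension_in_list(const char *ext, const char *list) {
    const std::size_t len = std::strlen(ext);
    if (len == 0) return false;
    const char *p = list;
    while (*p) {
        const char *comma = std::strchr(p, ',');
        const std::size_t item_len =
            comma ? static_cast<std::size_t>(comma - p) : std::strlen(p);
        if (item_len == len && std::strncmp(p, ext, len) == 0) return true;
        if (!comma) break;
        p = comma + 1;
    }
    return false;
}

// An explicit short name wins; otherwise fall back to the file extension.
// Returns the matching muxer name, or nullptr if nothing matches.
const char *guess_muxer(const char *short_name, const char *filename) {
    const std::size_t count = sizeof(kMuxers) / sizeof(kMuxers[0]);
    if (short_name) {
        for (std::size_t i = 0; i < count; ++i)
            if (std::strcmp(kMuxers[i].name, short_name) == 0)
                return kMuxers[i].name;
    }
    if (filename) {
        const char *dot = std::strrchr(filename, '.');
        if (dot) {
            for (std::size_t i = 0; i < count; ++i)
                if (extension_in_list(dot + 1, kMuxers[i].extensions))
                    return kMuxers[i].name;
        }
    }
    return nullptr;
}
```

In this sketch a UDP location such as "udp://127.0.0.1:1234" has no recognisable extension, which is one reason to name the muxer explicitly ("mpegts") when streaming.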
Add video stream
// Add stream to AVFormatContext
_vStream = av_new_stream(_tsfc, 0);
_vCC = _vStream->codec; // Take the pointer
// Set parameters
_vCC->codec_id = CODEC_ID_MPEG2VIDEO;
_vCC->codec_type = CODEC_TYPE_VIDEO;
_vCC->width = 640;
_vCC->height = 480;
_vCC->time_base= (AVRational){1,25};
_vCC->pix_fmt = PIX_FMT_YUV420P;
// others ...
// Find and open codec
AVCodec *c = avcodec_find_encoder(_vCC->codec_id);
avcodec_open(_vCC, c);
// Allocate memory for video buffer and init AVFrame
_vSize = 200000;
_vBuf = (uint8_t *)av_malloc(_vSize);
_picout = alloc_picture(_vCC->pix_fmt, _vCC->width, _vCC->height);
There isn’t much to say here. I only want to highlight that the parameters we can set depend on the selected codec. If we set the parameters incorrectly, the application will likely fail when it tries to open the codec or encode some data.
Add data stream
_dStream = av_new_stream(_tsfc, 0);
_dCC=_dStream->codec;
avcodec_get_context_defaults2(_dCC, CODEC_TYPE_DATA);
_dCC->codec_type=CODEC_TYPE_DATA;
_dCC->codec_id=CODEC_ID_TEXT;
_dCC->time_base= (AVRational){1,25};
// allocate data buffer
_dSize=size;
_dBuf = (uint8_t *)av_malloc(_dSize);
As you can see, this section is very similar to the previous one, but with one peculiarity: data streams are not required to use a codec, so we can send the data in raw format.
Initialize muxer and output
// some formats want stream headers to be separate
if(_tsfc->oformat->flags & AVFMT_GLOBALHEADER)
    _dCC->flags |= CODEC_FLAG_GLOBAL_HEADER;
av_set_parameters(_tsfc, NULL);
dump_format(_tsfc, 0, _tsfc->filename, 1);
// open the output file, if needed
if (!(_of->flags & AVFMT_NOFILE))
    url_fopen(&_tsfc->pb, _tsfc->filename, URL_WRONLY);
av_write_header(_tsfc);
In line 4 we must call av_set_parameters even though we have no parameters to set (NULL). In the last line we write the header of each stream to the output.
Read images from disk for sending them later
av_open_input_file(&_ifc, path, NULL, 0 , NULL);
av_find_stream_info(_ifc);
assert(_ifc->streams[0]->codec->codec_type == AVMEDIA_TYPE_VIDEO); // the image must be exposed as a video stream
_iCC=_ifc->streams[0]->codec;
codec=avcodec_find_decoder(_iCC->codec_id); //Find decoder for images
avcodec_open(_iCC, codec); //Open codec
_picin=avcodec_alloc_frame();
av_read_frame(_ifc, &packet);
int frameFinished;
avcodec_decode_video2 (_iCC, _picin, &frameFinished, &packet);
assert(frameFinished);
The operations in lines 2 to 7 are needed only the first time we read an image from disk. If we are going to read several images and we know in advance that all of them have the same format, we can skip these lines on subsequent reads. In line 8 we read a packet containing the input image from the AVFormatContext, and in line 10 we decode this image.
Encoding and sending data
// ---- Send image ---
if (_vCC->pix_fmt != _iCC->pix_fmt) // If formats differ
{
    if (_cvt_ctx == NULL) // If converter is not created yet
        _cvt_ctx = sws_getContext(_iCC->width, _iCC->height, _iCC->pix_fmt, _vCC->width, _vCC->height, _vCC->pix_fmt, SWS_BICUBIC, NULL, NULL, NULL);
    // The fifth argument is the height of the source slice, so use the input height
    sws_scale(_cvt_ctx, _picin->data, _picin->linesize, 0, _iCC->height, _picout->data, _picout->linesize);
}
else
    _picout = _picin;
// Encode
out_size = avcodec_encode_video(_vCC, _vBuf, _vSize, _picout);
// Setup packet and send it
av_init_packet(&_pkt);
_pkt.stream_index= _vStream->index;
_pkt.pts= av_rescale_q(_vCC->coded_frame->pts, _vCC->time_base, _vStream->time_base);
if (_vCC->coded_frame->key_frame) // Flag only real key frames
    _pkt.flags |= AV_PKT_FLAG_KEY;
_pkt.data= _vBuf;
_pkt.size= out_size;
av_interleaved_write_frame(_tsfc, &_pkt);
av_free_packet(&_pkt);
// ---- Send data ---
av_init_packet(&_pkt);
_pkt.pts= av_rescale_q(_vCC->coded_frame->pts, _vCC->time_base, _dStream->time_base);
_pkt.flags |= AV_PKT_FLAG_KEY;
_pkt.stream_index= _dStream->index;
_pkt.data = _dBuf;
_pkt.size= _dSize;
av_interleaved_write_frame(_tsfc, &_pkt);
av_free_packet(&_pkt);
First we set up the converter. The libswscale module lets us create converters that transform images from their original format to the desired one with a single function call. Once the image is converted to the format specified in the video encoder, we compress it using the encoder. After that we send both video and data using AVPackets. We must pay special attention to the pts field of the AVPackets. This value is the time stamp that other applications will read in order to present the output properly.
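The arithmetic behind the pts conversion can be sketched on its own. The code below is a simplified illustration, not libav’s av_rescale_q: Rational is a hypothetical stand-in for AVRational, rescale_timestamp is a name invented here, and it assumes non-negative timestamps and no 64-bit overflow. It also assumes the stream time base is 1/90000, the clock rate used by MPEG-2 TS.

```cpp
#include <cstdint>

// Hypothetical stand-in for libav's AVRational.
struct Rational {
    int num;
    int den;
};

// Converts a timestamp counted in units of src into units of dst,
// i.e. computes ts * src / dst rounded to the nearest integer.
// Simplified: assumes ts >= 0 and that the products fit in 64 bits.
inline int64_t rescale_timestamp(int64_t ts, Rational src, Rational dst) {
    const int64_t numerator   = static_cast<int64_t>(src.num) * dst.den;
    const int64_t denominator = static_cast<int64_t>(src.den) * dst.num;
    return (ts * numerator + denominator / 2) / denominator;
}
```

Under these assumptions, frame number 1 at a codec time base of {1,25} becomes 3600 ticks of the 90 kHz stream clock, and frame number 25 becomes 90000 ticks, that is, exactly one second.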
Close muxer and free structures
av_write_trailer(_tsfc);
avcodec_close(_vStream->codec);
av_free(_picout);
av_free(_picin);
av_free(_vBuf);
av_free(_dBuf);
av_free(_tsfc);
Receiver/Decoder
The application that receives or reads the information and decodes it is much more straightforward. These are the steps it performs:
- Open file or location.
- Add video stream.
- Add data stream.
- Receive data.
This is the declaration of the variables that we are going to use:
AVFormatContext * _ifc; // Input Format context
struct SwsContext * _cvt_ctx; // image convert context
int _vIdxStream; // Video Index Stream
AVCodecContext * _vCC; // Video Codec Context
int _dIdxStream; // Data Index Stream
AVCodecContext * _dCC; // Data Codec Context
unsigned int _fr; // Frame rate for stream
AVFrame * _picin; // Input image
AVFrame * _picrgb; // Output image
uint8_t * _vBuf; // video buffer
int _vSize; // video Buffer size
uint8_t * _dBuf; // Data buffer
int _dSize; // Data buffer size
AVPacket _pkt; // Packet received by AV
int _width; // Width of images
int _height; // Height of images
unsigned int _frame; // Frame counter
bool _eof; // End of stream
unsigned int _sleep; // Sleep time in msecs.
Open file or location
av_open_input_file(&_ifc, path, NULL, 0, NULL);
av_find_stream_info(_ifc);
for (unsigned int i=0; i<_ifc->nb_streams; i++)
{
    if (_ifc->streams[i]->codec->codec_type==AVMEDIA_TYPE_VIDEO)
    {
        _vIdxStream=i;
        // Add video stream (see next section)
    }
    else if (_ifc->streams[i]->codec->codec_type==AVMEDIA_TYPE_DATA)
    {
        _dIdxStream=i;
        // Add data stream (see next section)
    }
}
The call in the first line is mandatory, but the call to av_find_stream_info in the second line can cause problems. The function av_open_input_file fills in some fields of the AVFormatContext structure, such as the number of streams, their types and the codec used by each one. However, it doesn’t provide other useful values such as the frame rate or the width and height of the images. These parameters are retrieved by av_find_stream_info, which analyses a few packets from the streams. But this function expects a specific format according to the detected demuxer. In the particular case in which we send video and data over UDP with the mpegts muxer, av_find_stream_info hangs, receiving packets without being able to determine these parameters. So, for this particular case, we have to set these parameters manually, as I explain later.
Add video stream
_vCC = _ifc->streams[i]->codec;
// Set parameters manually (if we haven't use av_find_stream_info)
_vCC->pix_fmt=PIX_FMT_YUV420P;
_vCC->width=640;
_vCC->height=480;
_vCC->bit_rate=104875200;
_vCC->time_base= (AVRational){1,25};
AVCodec * codec=avcodec_find_decoder(_vCC->codec_id);
avcodec_open(_vCC, codec);
_picin=avcodec_alloc_frame();
_picout=avcodec_alloc_frame();
_vSize=avpicture_get_size(PIX_FMT_BGR24, _vCC->width, _vCC->height);
_vBuf=new uint8_t[_vSize];
avpicture_fill((AVPicture *)_picout, _vBuf, _oFormat, _width, _height);
_cvt_ctx = sws_getContext(_vCC->width, _vCC->height, _vCC->pix_fmt, _vCC->width, _vCC->height, _oFormat, SWS_BICUBIC, NULL, NULL, NULL);
In the first line we take the pointer to the AVCodecContext structure so that we can access its fields easily. In lines 3-7 we set the parameters of the video stream we are going to receive. Again, we set these parameters by hand only because of the problem with av_find_stream_info described previously. If you don’t have problems with this function it’s always better to use it, because that way the parameters are obtained automatically. If we configure these parameters manually we need to know in advance the exact configuration of the data we are going to receive. After that we open the codec, allocate the AVFrames and the video buffer, and create the converter that transforms the received images into the desired format.
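To get a feel for the size of the video buffer allocated above, here is a small sketch of how many bytes one image needs in the two pixel formats used in this post. It does not call libav: the function names are my own, and the formulas assume tightly packed rows with no alignment padding, so treat the results as an estimate rather than as what avpicture_get_size is guaranteed to return.

```cpp
#include <cstddef>

// Bytes for a packed 24-bit image (such as BGR24): 3 bytes per pixel.
// Assumes tightly packed rows, with no alignment padding.
inline std::size_t packed24_size(int width, int height) {
    return static_cast<std::size_t>(width) * static_cast<std::size_t>(height) * 3;
}

// Bytes for planar YUV 4:2:0: one full-resolution luma plane plus two
// chroma planes subsampled by 2 in each direction (rounded up).
inline std::size_t yuv420p_size(int width, int height) {
    const std::size_t w = static_cast<std::size_t>(width);
    const std::size_t h = static_cast<std::size_t>(height);
    const std::size_t chroma_w = (w + 1) / 2;
    const std::size_t chroma_h = (h + 1) / 2;
    return w * h + 2 * chroma_w * chroma_h;
}
```

For the 640x480 images of this example, that is 921600 bytes in BGR24 and 460800 bytes in YUV420P.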
Add data stream
_dCC = _ifc->streams[i]->codec; // Take the pointer, as with the video stream
avcodec_get_context_defaults2(_dCC, CODEC_TYPE_DATA);
_dCC->time_base= (AVRational){1,25}; // Only if we didn't use av_find_stream_info
_dCC->codec_id=CODEC_ID_TEXT;
For the data stream we also need to set the time_base field if we didn’t use the function av_find_stream_info.
Receive data
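if (av_read_frame(_ifc, &_pkt)>=0)
{
    if (_pkt.stream_index==_vIdxStream) // packet from video stream
    {
        int fF = 0; // set to non-zero when a complete frame has been decoded
        avcodec_decode_video2(_vCC, _picin, &fF, &_pkt);
        if (fF) // convert only complete frames
            sws_scale(_cvt_ctx, _picin->data, _picin->linesize, 0, _vCC->height, _picout->data, _picout->linesize);
        _sleep = _pkt.duration / 90; // 90 kHz ticks to milliseconds
    }
    else if (_pkt.stream_index==_dIdxStream) // packet data stream
    {
        // never copy more bytes than the packet holds (std::min needs <algorithm>)
        std::copy(_pkt.data, _pkt.data+std::min(_pkt.size, _dSize), _dBuf);
    }
    av_free_packet(&_pkt);
}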
First we try to read a packet and then we check which stream it comes from (video or data). If it comes from the video stream we have to decode it and convert it to an appropriate format. With data streams we can handle the raw data however we want. In this case we copy the data to a previously allocated buffer.
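The division by 90 used to compute the sleep time deserves a short explanation: MPEG-2 TS time stamps count ticks of a 90 kHz clock, so 90 ticks correspond to one millisecond. A minimal sketch of that conversion, with names of my own choosing and assuming the packet duration is expressed in those 90 kHz ticks:

```cpp
#include <cstdint>

// MPEG-2 TS time stamps and durations count ticks of a 90 kHz clock.
const int64_t kTicksPerSecond = 90000;

// Converts a duration in 90 kHz ticks to whole milliseconds (truncating).
inline int64_t ticks_to_ms(int64_t ticks) {
    return ticks * 1000 / kTicksPerSecond;
}
```

A frame at 25 frames per second lasts 3600 ticks, which this conversion turns into 40 ms.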
Additional notes
Well, in this post I have tried to highlight some of the basic concepts of libav. This library implements most existing codecs and has been evolving for several years, so it is robust software. It also implements several network protocols (TCP, UDP, RTP, RTSP, etc.), so it can be used for a wide range of applications.
As I mentioned at the beginning, there is very little documentation and very few code samples for this library, and getting certain things done during your first days as a novice can be hell. If you know a little English, the best thing you can do is to ask directly on the mailing lists of the libav and ffmpeg projects and wait for some kind soul to give you a hand. Of course, you can also ask me if you don’t understand something in this post and I will try to help you as much as possible ^^.
Links
- http://es.wikitel.info/wiki/MPEG_2
- http://es.wikipedia.org/wiki/Transport_Stream
- http://libav.org/
- http://www.ffmpeg.org/
- Using libavformat and libavcodec (Tutorial 1 from libav webpage)
- An FFmpeg and SDL Tutorial (Tutorial 2 from libav webpage)



Hi!
First I want to thank you for this great tutorial!
I wanted to ask if av_write_header worked for your data stream. Because I tried to include a data stream to an mpeg file and I can’t manage to do a successful av_write_header / avformat_write_header. It always fails without any error message and I don’t know why.
Do you have any tips?
Thanks a lot!
Sabine
Hi sabine, I didn’t have problems with this function and It’s been a while since I do not use the library. I’m sorry for not being able to help you.
hello piponazo,
could you tell me how to send data? i think a mistake happened above. please have a look at these lines:
// Setup packet and send it
av_init_packet(&_pkt);
_pkt.stream_index= _vStream->index;
_pkt.pts= av_rescale_q(_vCC->coded_frame->pts, _vCC->time_base, _vStream->time_base);
_pkt.flags |= AV_PKT_FLAG_KEY;
_pkt.data= _vBuf;
_pkt.size= out_size;
av_interleaved_write_frame(_tsfc, &_pkt);
av_free_packet(&_pkt);
// ---- Send data ---
av_init_packet(&_pkt);
_pkt.pts= av_rescale_q(_vCC->coded_frame->pts, _vCC->time_base, _dStream->time_base);
_pkt.flags |= AV_PKT_FLAG_KEY;
_pkt.stream_index= _dStream->index;
_pkt.data = _dBuf;
_pkt.size= _dSize;
av_interleaved_write_frame(_tsfc, &_pkt);
av_free_packet(&_pkt)
setup packet and send data is the same?
@el porrito
Hi. It isn’t the same. In the first block of code I’m packing and sending the video stream (_vBuf & out_size) while in the second one I’m packing and sending the data stream (_dBuf & _dSize).
you are right, thx.
An excellent tutorial. It saved me lots of hours and headaches.
“zenkiú verymach!”