16 April 2017

Understanding FFMPEG's Group of Pictures (GOP) Options

Disclaimer: Most of the following was originally written by thljcl and is available unedited at ffmpeg-archive.org; this is a heavily edited version of that post.

Video Theory:

By definition, a video is a series of still images which, when shown on screen, creates the illusion of moving images. Frame rate is the frequency at which an imaging device produces unique consecutive images called frames. Frame rates may be constant (CFR) or variable (VFR).

A series of images in a sequence can be encoded into a single video file with lossless encoding (huffyuv, lagarith, x264 w/crf=0). It is possible to decode that video file back to an image sequence for editing purposes and get back the original frames. We can edit “video” frame by frame as an image sequence in image manipulation programs (Adobe Photoshop, mspaint, GIMP, ImageMagick, waifu2x), or in video editing software (Adobe Premiere, AviSynth, Sony Vegas). Editing can include modifying, adding, removing, duplicating frames and treating them as “video clips.”

In contrast to lossless encoding, lossy encoding discards information present in the image sequence, so it is not possible to extract the original frames perfectly. Lossy encoding is not a perfectly reversible process and will compromise image quality, especially if done successively. Videos meant for commercial release or distribution typically use highly compatible but lossy codecs. Distribution codecs are not meant for editing or for storing the “master” image sequence.

Q: What is the (technical) difference between -r and the “fps” filter?

Disclaimer: The following reflects the best of my knowledge and is based on thljcl’s, and this blog’s author’s, own thoughts and experiments.

The “fps” video filter handles frame rate conversion accurately with regard to not changing the video’s length; -r, on the other hand, is meant to assume a given frame rate and may change the video’s length. Depending on the usage scenario, -r is not a replacement for the video filter, nor vice versa.

ffmpeg can encode an image sequence into a single video file. For example:

ffmpeg -i "frames\f_%06d.png" -c:v libx264 -crf 0 output.mkv

Notice that no frame rate was specified. Since one was not specified, and there is no inherent frame rate for discrete images, the frame rate will be 25. It is 25 FPS because -r “25” is ffmpeg’s default for image-sequence (GOP) input.
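One way to confirm what actually ended up in the file (a quick check added here, assuming the ffprobe tool that normally ships alongside ffmpeg is available) is to ask ffprobe for the video stream’s frame-rate fields:

ffprobe -v error -select_streams v:0 -show_entries stream=r_frame_rate,avg_frame_rate output.mkv

For the command above, both fields should report 25/1.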

Note that to tell ffmpeg that the still images, i.e. the input, are at 24 FPS, -r is placed before -i. If placed after -i, -r would instead indicate the output rate. -crf “0”, short for “constant rate factor,” tells the libx264 encoder to use its lossless mode.

The film industry generally captures and produces film at 24 FPS. In that case:

ffmpeg -r 24 -i "frames\f_%06d.png" -c:v libx264 -crf 0 output.mkv

The discrete images can be extracted back out of output.mkv:

mkdir "frames"
ffmpeg -i output.mkv "frames\f_%06d.png"

This case omits -r both before and after -i. ffmpeg can detect the frame rate from the source output.mkv. Omission of -r in the output option (after -i) means the frame rate of the source should be used. The conversion between discrete images and a video (lossless encoding) is thus reversible. But what if -r is placed after -i?

ffmpeg -i "frames\f_%06d.png" -r 24 -c:v libx264 -crf 0 output.mkv

As before, omitting -r before -i for an image sequence means that ffmpeg will assume the source is at 25 FPS. Because -r is placed after -i, the output video will be 24 FPS. Achieving 24 FPS output, which is lower than the assumed 25 FPS input, is only possible if some frames are dropped. This is indeed a frame rate conversion from 25 FPS to 24 FPS. Since ffmpeg had to drop frames to do it, output.mkv will be missing frames. And if the images were actually meant to play at 24 FPS, assuming 25 FPS plays the video faster (at a constant frame rate), so it would also be “shorter” than expected.
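To verify that frames were actually dropped (a check added here, not part of the original post), ffprobe can count the decoded frames; the result can then be compared against the number of source images in .\frames\:

ffprobe -v error -count_frames -select_streams v:0 -show_entries stream=nb_read_frames -of default=noprint_wrappers=1:nokey=1 output.mkv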

However, -r selects frames with a different algorithm than the “fps” video filter.

ffmpeg -i "frames\f_%06d.png" -vf "fps=24" -c:v libx264 -crf 0 output.mkv

ffmpeg will add or discard different frames depending on whether -r or the “fps” filter is in use. With the “fps” filter, the frames in output.mkv will differ from those produced by -r after -i, and both will differ from the input.

In constant frame rate video, only a certain number of frames is both allowed and required for a given length; whenever frames are added to or removed from the image sequence, the video length changes as well. This is inevitable, and it is what happens when -r is used to specify the frame rate. The “fps” filter, by contrast, is meant to preserve the input length when converting between frame rates. The frame quantization and frame-rate rounding that the “fps” filter does internally will end up outputting different frames than -r, precisely because “fps” will not change the video length.
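A simple way to see the length difference (my own check, not from the original post) is to compare container durations with ffprobe, once for the file produced with -r and once for the file produced with the “fps” filter:

ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 output.mkv

With the “fps” filter, the reported duration should roughly match the input’s.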

Q: If -r is not meant for frame rate conversion, then when should it be used?

Unlike -r, the “fps” video filter cannot be used to specify the frame rate of a still-image sequence. The following is invalid:

ffmpeg -vf "fps=24" -i "frames\f_%06d.png" -c:v libx264 -crf 0 output.mkv

However, the following works as intended and should be used when working with GOPs:

ffmpeg -r 24 -i "frames\f_%06d.png" -c:v libx264 -crf 0 output.mkv
# or
ffmpeg -r 24000/1001 -i "frames\f_%06d.png" -c:v huffyuv output.avi

While -r can add and drop frames when changing frame rates, that is not the entire story of how -r works.

Q: How exactly does -r handle frame rate conversion compared to video filter “fps”? What does -r really do?

ffmpeg -i input.mkv -r 29.97 -c:v "libx264" -crf 0 output.mkv

Assume input.mkv has a constant frame rate of 24. Checking the media info of output.mkv, it reports a variable frame rate of 29.97. What does the 29.97 VFR meta-information present in the media file actually mean in this case?

mkdir frames
ffmpeg -r 1 -i output.mkv "frames\f_%06d.png"

Guess what! The number of frames did not change after the encoding.

Variable frame rate means the frame rate is not fixed and may change while the video is being played. Another way of thinking about VFR is that any given frame may be displayed on screen for a longer or shorter period than any other frame. It does not necessarily mean the frame rate actually does change, only that it can.

With 24 frames present for every second of video in “output.mkv” that has a VFR of up to 29.97 FPS, how many additional or fewer frames are necessary to achieve that 29.97 FPS? None. Assuming the media player can actually handle such media correctly, it has 24 frames every second to display, and so it just displays them. That is all. It is merely aware that at some point in time a frame may be displayed shorter or longer than another frame. Since the source has a CFR of 24, that never actually happens.

In this case, when actually playing back the file, and for the length of the video to remain unchanged, no change in either the number of frames or the actual frame rate of 24 CFR was necessary. If the media player cannot actually handle such media correctly during playback, the video file may be played at a different speed than originally intended.
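For the curious, per-frame display timing can be inspected directly (this check is an addition; the exact field names vary by ffprobe version, e.g. pts_time or pkt_pts_time):

ffprobe -v error -select_streams v:0 -show_frames output.mkv

If the gaps between successive frame timestamps are all equal, the file is effectively CFR no matter what the container metadata claims.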

So the only thing that changes when -r is placed after -i is the media info reported.

ffmpeg -r 1 -i output.mkv "frames\f_%06d.png"

Why is “-r 1” before -i? Because that will output how many frames the video file actually has. “-r 1” tells ffmpeg to select every frame.

If -r “1” is omitted, -r “29.97” would be assumed, since that is what the media information reports. However, instead of extracting every frame from the video, ffmpeg would add or drop frames to achieve 29.97 FPS, since the actual frame rate was reported as “variable,” in order to output a GOP. Again, when -r specifies a frame rate, it adds and removes frames as necessary to achieve that frame rate as CFR. But then ffmpeg would not reveal how many frames output.mkv actually has. Telling ffmpeg to select every frame causes it to ignore the frame rate contained in the source meta-information, so ffmpeg extracts the real frames from the video. If the video really had a VFR, this would cause a desync later during playback, which is why ffmpeg tries to honor the media information by default.
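As an aside not found in the original post: depending on the ffmpeg version, another way to force one image per decoded frame, ignoring the reported frame rate, is passthrough mode with -vsync 0 (newer builds spell this -fps_mode passthrough):

ffmpeg -i output.mkv -vsync 0 "frames\f_%06d.png"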

ffmpeg will actually add frames if the video is first converted into a series of still images. To illustrate that:

mkdir output
ffmpeg -i input.mkv -r 29.97 "output\f_%06d.png"
ffmpeg -r 29.97 -i "output\f_%06d.png" -c:v libx264 -crf 0 output.mkv

The argument is simple: because “variable frame rate” simply does not exist for a still-image sequence, the only way to achieve 29.97 FPS from a 24 FPS source is to add frames, as CFR video and image sequences require. The “fps” video filter will also force the output video to have a constant frame rate. Question: will the above output.mkv have the same, a shorter, or a longer length than input.mkv? Why? Think about it.

ffmpeg -i input.mkv -vf "fps=29.97" -c:v libx264 -crf 0 output.mkv

The answer is that, when converting to a higher frame rate, the “fps” filter and -r “29.97” actually create video of the same length; but upon closer inspection, each duplicates different frames.

When converting to a lower frame rate, -r will increase the video length by 1/24 to 1/12 of a second, depending on which codec is being used, while the “fps” filter largely preserves the video length up to the expected quantization. In other words, the “fps” filter converts between frame rates while preserving length, and -r is meant to tell ffmpeg how to select the frames actually present and at what frame rate. If the input does not match the output, -r can produce strange results. When extracting all of the frames, the “fps” filter will extract or duplicate different frames than -r, since it is intended to convert between frame rates while also preserving length.

The bottom line is that -r and the “fps” video filter are technically different from each other. The “fps” filter is meant for frame rate conversion. -r is meant for assuming a given framerate and is thus better suited to extracting and reading groups of pictures. Do not use the “fps” filter to extract or combine GOPs, and do not use -r to change the FPS of a video when working with GOPs or transcoding.

Use the following to output an image sequence:

ffmpeg -i output.mkv "frames\f_%06d.png"
#Or for CFR in a VFR container:
ffmpeg -r 1 -i output.mkv "frames\f_%06d.png"

And use the following to read and encode an image sequence losslessly:

ffmpeg -r 24 -i "frames\f_%06d.png" -c:v libx264 -crf 0 output.mkv
# or for a different fps, codec, and container:
ffmpeg -r 24000/1001 -i "frames\f_%06d.png" -c:v huffyuv output.avi

Suggestions:

  • Never use the “fps” filter when working with GOPs unless deliberately trying to change the framerate.
  • Never use the “fps” filter to specify input.
  • Never use -r to specify output (if keeping every frame).
  • Only use -r when specifying input and that input is a GOP.
  • When extracting images from video (regardless of CFR or VFR), do not specify either -r or use the “fps” filter. This will prompt ffmpeg to use the video’s FPS information from the video stream’s metadata and extract frames that will conform to that FPS.
  • Only if the number of frames does not match the metadata (e.g. CFR content encoded as VFR) is specifying -r 1 before the input advisable when extracting images from video. Even then, doing so is not necessary for retaining the existing length of the video.

Regarding Colorspace Compatibility:

Images are usually stored in the RGB colorspace. Many image readers, such as Windows Photo Viewer and Adobe Photoshop, cannot properly decode images in the YUV colorspace (technically YCbCr) and/or are picky about the format (jpeg, png, tif, bmp), the image bit-depth, or the precision. Others, like ImageMagick, IrfanView and waifu2x, offer more expansive support. If compatibility becomes an issue, the most likely compatible lossless image format to use is the RGB24 variant of Portable Network Graphics (PNG).

ffmpeg will convert YUV video into RGB24 automatically when exporting to this format. For other formats, like tif, use the “format” video filter via its -pix_fmt alias to specify the colorspace. Example:

ffmpeg -i input.mkv -pix_fmt rgb24 "frames\f_%06d.tif"

Use the following to check the available pixel formats:

ffmpeg -pix_fmts
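Relatedly, to see which pixel formats a particular encoder accepts, ffmpeg’s built-in per-encoder help can be queried (this tip is an addition to the original post):

ffmpeg -h encoder=libx264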

Unlike discrete frames (also called “images”), video is normally in the YUV colorspace (technically YCbCr), with the chroma downsampled by 3/4 and 8-bit color precision per channel. That is a very specific format. Most AVC/HEVC decoders, especially hardware ones, need video in that exact format to decode it properly. libx264 will automatically convert RGB input to YUV and, unless steps are taken to ensure otherwise, will use 8-bit precision. Depending upon various factors, chroma downsampling may or may not occur, however.

To specify the chroma subsampling level, use the “format” filter via its -pix_fmt alias:

ffmpeg -i "frames\f_%06d.tif" -c:v libx264 -pix_fmt yuv420p input.mkv
#or to disable chroma subsampling:
ffmpeg -i "frames\f_%06d.tif" -c:v libx264 -pix_fmt yuv444p input.mkv

To encode in the RGB colorspace:

ffmpeg -i "frames\f_%06d.tif" -c:v libx264rgb input.mkv

To encode video with a color precision different than 8-bit, either compile/link the 10/12-bit binary libraries dynamically (dll files) and use the appropriate pixel format or just download x264 statically linked variants (x264-10b.exe, x264-12b.exe) or x265 ones, rename appropriately, and use them with ffmpeg frontends like MeGUI or vEncode.
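As a rough sketch of what that looks like once a 10-bit-capable libx264 is in place (the pixel format below and whether a given build supports it are assumptions, not something the original post covers):

ffmpeg -r 24 -i "frames\f_%06d.png" -c:v libx264 -pix_fmt yuv420p10le -crf 18 output.mkv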

Technically, converting between colorspace formats generates rounding errors and so is not perfectly lossless. The difference is negligible as long as the correct conversion matrix is used. However, an inappropriate color matrix will distort colors dramatically. The specifics of how to choose the correct matrix, or various workarounds to avoid colorspace conversion entirely, vary dramatically based upon the specific programs and filters used in the workflow.
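As one illustration of steering the matrix in ffmpeg itself (the filter chain below is my own example and assumes BT.709 HD source material), the scale filter’s in_color_matrix option can force the intended coefficients when converting to RGB images:

ffmpeg -i input.mkv -vf "scale=in_color_matrix=bt709,format=rgb24" "frames\f_%06d.png"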

Random tips:

  • ffmpeg has this vague guide on it and some cryptic documentation.
  • For waifu2x, do not go above 16-bit precision per channel. 24 has issues.
  • Do not even try x265 yuv444p at 12-bit. Do not even try.
  • To load HEVC yuv444p video in AVISynth, encode to a lossless intermediary format like huffyuv, lagarith or an image sequence. Also see the Convert documentation.
  • If ImageReader() is not reading the chroma of some images correctly, then try CoronaSequence and vice-versa.
  • Many AVISynth filters need YV12 (4:2:0) or YV16/YUY2 (4:2:2) chroma. After loading an image sequence, use ConvertToYV12() and specify a color matrix:
ConvertToYV12(matrix="Rec601")   #(Rec601), PC.601, Rec709, PC.709

Summary Table:

Creating GOPs:

ffmpeg -i input.mkv "frames\f_%06d.png"
    Read the framerate from input.mkv and extract frames at that rate into .\frames\ in .png format.
ffmpeg -r 24 -i input.mkv "frames\f_%06d.png"
    Extract 24 frames per second from input.mkv and store them in .\frames\ in .png format.
ffmpeg -r 1 -i input.mkv "frames\f_%06d.png"
    Ignore the frame rate meta-information present in input.mkv and extract every frame to .\frames\ in .png format.
ffmpeg -i input.mkv -vf "fps=24" "frames\f_%06d.png"
    Read the framerate from input.mkv. Output 24 frames for every second into .\frames\ in .png format without changing the length of the input video.
ffmpeg -i input.mkv -r 24 "frames\f_%06d.png"
    Read input.mkv at its detected framerate, then add or drop frames to output 24 frames for every second into .\frames\ in .png format, with no regard for preserving every input frame.
ffmpeg -i input.mkv -pix_fmt rgb24 "frames\f_%06d.tif"
    Read the framerate from input.mkv, convert the frames to the rgb24 colorspace and place them in .\frames\ in .tif format.

Encoding from GOPs:

ffmpeg -i "frames\f_%06d.png" -c:v libx264 -crf 0 output.mkv
    Use the default framerate of 25 and encode the image sequence losslessly with libx264 into output.mkv.
ffmpeg -r 24000/1001 -i "frames\f_%06d.png" -c:v huffyuv output.avi
    Use a framerate of 23.976 and encode the image sequence with huffyuv into output.avi.
ffmpeg -r 24 -i "frames\f_%06d.png" -c:v libx264 -crf 18 output.mkv
    Use a framerate of 24 and encode the image sequence with libx264 at a CRF of 18 (decent quality) into output.mkv.

Sources:

  1. What is the (technical) difference between -r and the fps filter?
  2. Extract images frame by frame from a video file using FFMPEG
  3. FFMPEG An Intermediate Guide/image sequence
  4. VFR for Fansub Encoders - how, why, WTF?
  5. FFmpeg and H.264 Encoding Guide
