LuminousMonkey

NVEncoder vs x264

CPU video encoding, or GPU video encoding, which one should you use?

Anyone who is interested in streaming has probably, at some point, wondered what the best method for encoding their stream is. The answer, like most things, is "it depends".

A few factors affect which approach is best, but the biggest one, at the end of the day, is whether you have the processing power to use CPU-based encoding. GPU encoders can comfortably encode at 60 FPS or greater; your processor may not manage that, and even if it can, it may get bogged down with other tasks.

GPU encoders have the advantage of dedicated logic in the GPU that is used only for encoding, so while you will take a performance hit, it is so slight as to be unnoticeable. So why use CPU encoders at all? Quality. Although GPU encoders are great from a performance point of view, it is said that they suffer from poorer encoding quality, and typically, if you're a content creator, you want to make sure that your stream looks good.1

So, the point of this blog post, my gentle reader, is to try and give an objective answer to the question, "Is CPU-based encoding better than GPU-based encoding? And when should you use it?" I have Nvidia graphics cards, and x264 is pretty much the software option for CPU-based encoding, hence the title.

Typically, as I mentioned before, this is only a question if you have CPU power to spare, so generally I'm talking about a setup where you have a PC dedicated to streaming, with enough horsepower that CPU encoding is an option. You can still have a dedicated PC setup that uses GPU encoding, for example if you play on a console or play retro games on original hardware.

That said, Brogers uses x264 with a single PC setup, so it can vary because you can tweak the settings so it uses less CPU power, but are those settings worth it over using NVEnc? That's what this post is about!

Results

Basically, in most cases, x264 medium provides better quality. However, that doesn't mean it's the best thing to use.

My Rake buildfile results in a CSV file that's about 45MB in size. It's this large because I also collect things like the frame size in bytes, the type of frame, and a range of different bitrates, including a CRF2 option for x264 encoding. Putting this data into R lets us do some analysis so we can compare encoders.
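As a rough illustration of the kind of summary this analysis starts from, here is a minimal sketch in Python rather than R. The column names (encoder, bitrate_kbps, frame, vmaf) and the sample values are hypothetical, since the real CSV layout isn't shown here; the idea is just to average the per-frame VMAF score for each encoder and bitrate.

```python
import csv
import io
from collections import defaultdict
from statistics import fmean


def mean_vmaf_by_encoder(csv_file):
    """Average the per-frame VMAF score for each (encoder, bitrate) pair.

    Assumes hypothetical columns named encoder, bitrate_kbps and vmaf.
    """
    scores = defaultdict(list)
    for row in csv.DictReader(csv_file):
        key = (row["encoder"], int(row["bitrate_kbps"]))
        scores[key].append(float(row["vmaf"]))
    return {key: fmean(values) for key, values in scores.items()}


# Made-up sample data standing in for the real 45MB file.
sample = io.StringIO(
    "encoder,bitrate_kbps,frame,vmaf\n"
    "x264_medium,3500,0,80.0\n"
    "x264_medium,3500,1,82.0\n"
    "nvenc,3500,0,78.0\n"
    "nvenc,3500,1,79.0\n"
)

summary = mean_vmaf_by_encoder(sample)
for (encoder, bitrate), mean_score in sorted(summary.items()):
    print(f"{encoder} @ {bitrate} kbit/s: mean VMAF {mean_score:.2f}")
```

A mean on its own hides how the scores vary frame to frame, which is why the graphs and tests in this post work from the per-frame values.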

For example, we can graph NVEnc vs x264 medium at 2500, 3500 and 4500 kbit/s:

Quake Champions

A graph gives a much better idea of the situation. x264, using the medium preset, gives a better quality encode than NVEnc on my 1080Ti.

quake-champions-medium-3500.png
Figure 1: Quake Champions: NVenc vs x264 medium.
quake-champions-fast-3500.png
Figure 2: Quake Champions: NVenc vs x264 fast.

However, we're doing statistics, and judging by eye is not the thing to do. Instead, in R, we will test whether x264 is indeed better than NVEnc for quality encoding. To do this, we use a paired t-test, which tells us whether the per-frame difference between the two encoders is statistically significant.

t.test(quake_x264_1000$vmaf, quake_nvenc_1000$vmaf, paired = TRUE)
	Paired t-test

data:  quake_x264_1000$vmaf and quake_nvenc_1000$vmaf
t = 8.4274, df = 17999, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.2582942 0.4148613
sample estimates:
mean of the differences 
              0.3365778

Given this result, we can conclude that in this case x264 is giving better visual quality than NVEnc, although the mean difference is only about 0.34 VMAF points.
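For anyone curious what the paired t-test is doing with the per-frame scores, here is a minimal sketch in Python of how the mean difference, the t statistic and the degrees of freedom are computed. The function name and the five sample scores are made up for illustration; the p-value and confidence interval that R reports are left out to keep the sketch dependency-free.

```python
import math
import statistics


def paired_t(a, b):
    """Return (mean difference, t statistic, degrees of freedom).

    a and b are paired samples, e.g. VMAF scores for the same frames
    from two encoders. The differences must not all be identical,
    otherwise the standard deviation is zero and t is undefined.
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_diff = statistics.fmean(diffs)
    # Sample standard deviation of the differences (n - 1 denominator).
    sd = statistics.stdev(diffs)
    t = mean_diff / (sd / math.sqrt(n))
    return mean_diff, t, n - 1


# Made-up per-frame VMAF scores for the same five frames.
x264_scores = [72.0, 75.0, 71.0, 78.0, 74.0]
nvenc_scores = [70.0, 74.0, 72.0, 75.0, 73.0]

mean_diff, t, df = paired_t(x264_scores, nvenc_scores)
print(f"mean of the differences = {mean_diff:.4f}, t = {t:.4f}, df = {df}")
```

A positive mean difference means the first encoder scored higher on average, which is how the signs in the R output read as well.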

Doom (2016) - First Level

Things are a little different with Doom (2016). This recording was taken from the first level, on the surface of Mars.

doom-e1-medium-3500.png
Figure 3: Doom E1: NVenc vs x264 medium.
doom-e1-fast-3500.png
Figure 4: Doom E1: NVenc vs x264 fast.
t.test(doom_x264_1000$vmaf, doom_nvenc_1000$vmaf, paired = TRUE)
	Paired t-test

data:  doom_x264_1000$vmaf and doom_nvenc_1000$vmaf
t = -17.84, df = 18000, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.119254 -0.897653
sample estimates:
mean of the differences 
              -1.008454

This is a rare test where NVEnc comes out ahead of x264. I'm guessing (without any evidence to back it up, so take it with a grain of salt) that the foggy, red style of that level is what gives the NVEnc encoder the edge. I will have to record footage of some other levels to see if the trend holds. It may also be the motion blur effects, or something else in how Doom renders the final image, that makes it nicer for NVEnc.

doom-e1-nvenc-4500-frame.png
Figure 5: NVEnc performs better on the first Doom arena.

Fortnite

fortnite-medium-3500.png
Figure 6: Fortnite: NVenc vs x264 medium.
fortnite-fast-3500.png
Figure 7: Fortnite: NVenc vs x264 fast.
t.test(fortnite_x264_1000$vmaf, fortnite_nvenc_1000$vmaf, paired = TRUE)
	Paired t-test

data:  fortnite_x264_1000$vmaf and fortnite_nvenc_1000$vmaf
t = 79.388, df = 18000, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 4.382139 4.604007
sample estimates:
mean of the differences 
               4.493073

x264 comes out ahead with Fortnite. The graphic style of Fortnite presumably makes the job of both video encoders easier overall, compared to Doom and Quake Champions.

Forza Demo - Spring

forza-medium-3500.png
Figure 8: Forza: NVenc vs x264 medium.
forza-fast-3500.png
Figure 9: Forza: NVenc vs x264 fast.
t.test(forza_x264_1000$vmaf, forza_nvenc_1000$vmaf, paired = TRUE)
	Paired t-test

data:  forza_x264_1000$vmaf and forza_nvenc_1000$vmaf
t = 23.509, df = 18000, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.9822489 1.1609362
sample estimates:
mean of the differences 
               1.071593

Forza, and I would hazard a guess any game with a lot of screen motion, like racing sims, really pushes encoders, since the bitrate just isn't high enough to cope. However, x264 again comes out ahead of NVEnc.
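To put "the bitrate just isn't high enough" into numbers, here is a small Python sketch of the average budget each frame gets at a given bitrate. It assumes 60 frames per second and 1 kbit = 1000 bits; both are assumptions for illustration, as is the function name.

```python
def average_frame_budget_bytes(bitrate_kbps, fps):
    """Average number of bytes available per frame at a constant bitrate.

    Assumes 1 kbit = 1000 bits and ignores audio and container overhead.
    """
    bits_per_second = bitrate_kbps * 1000
    return bits_per_second / fps / 8


# The three bitrates used in the graphs, at an assumed 60 frames per second.
for bitrate in (2500, 3500, 4500):
    budget = average_frame_budget_bytes(bitrate, fps=60)
    print(f"{bitrate} kbit/s at 60 FPS: about {budget / 1000:.1f} kB per frame")
```

This is only an average: encoders spend more on some frames than others, but when most of the screen changes every frame there is little spare budget to move around.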

Objective Quality?

In most of the NVEnc vs x264 comparisons I've seen, they don't use an objective measurement of encoding quality. They'll play a game, encode it using NVEnc or x264, stop, then switch to the other method and do a side-by-side comparison of the footage. This isn't ideal. Since the video data is different you're going to get slightly different results, and you can't objectively measure them.

However, there is at least one review I know of that uses a sound method: record at lossless quality, then pipe that through your encoders so you have identical data… Well, almost. They used OBS and played the lossless video as a media source, then encoded off that. Again, you're getting slight variance… and still using fallible human organs to compare the footage.

Don't get me wrong, you still need to have eyes on the final result for a comparison; the stream is intended for humans, after all. The best approach would be to have an objective measurement, then do a visual comparison at points of interest in order to make a better informed judgement.

There are objective measurements of quality. I won't go into what is available, because I'm just going to use one: VMAF, the measurement Netflix developed and uses. Basically, you feed in your original video and the video you encoded, and it scores the quality of each frame from 0 to 100, with 0 being the lowest quality and 100 the highest. In tests where humans ranked the quality, the ratings were "bad", "poor", "fair", "good" and "excellent". These translate to something like 0 to 20 being "bad", with a score of 70 falling between "fair" and "good".

Armed with this, we can actually graph and compare encoders a bit more objectively. VMAF is what Netflix uses to gauge the quality of their encoding, since they encode so many videos you couldn't possibly have a human do it all, and they are all about having good quality for the lowest possible bandwidth.

Method

Alright, so we need a method to get frame-accurate comparisons of two different encoders. Easy, we:

  1. Record our footage in a lossless format.
  2. Encode in NVEnc.
  3. Encode in x264.
  4. Use VMAF to compare encodings against the original lossless format.

The recording stage is easy: I play a game on my dual-PC streaming setup and have my streaming PC encode to disk. I use MagicYUV as my lossless encoding format. For encoding, I use what all the streaming software uses in the background anyway, FFmpeg. FFmpeg is an open-source project that can convert to and from many different video and audio formats; it's the software that Netflix and YouTube use for their video encoding, and what OBS, et al. use.

obs-x264-encoder.png
Figure 10: x264 Advanced Encoder Settings are the FFmpeg command line options.
obs-ffmpeg.png
Figure 11: FFmpeg DLL in OBS log file.

So we can use FFmpeg as our encoder for x264 and NVEnc and then compare both against the original file using VMAF (which is convenient since VMAF can be compiled into FFmpeg).2
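As a sketch of what those steps could look like, here is a small Python example that only builds the FFmpeg command lines (it doesn't run them). The file names and helper functions are hypothetical, the bitrate and preset are just one of the combinations tested, and it assumes an FFmpeg build with libx264, NVENC and libvmaf enabled. It is not my actual Rake buildfile, only an illustration of the idea.

```python
import shlex


def encode_command(source, output, codec, bitrate_kbps, preset=None):
    """Build an FFmpeg command that re-encodes source video at a target bitrate."""
    cmd = ["ffmpeg", "-i", source, "-an",  # -an drops audio, we only score video
           "-c:v", codec, "-b:v", f"{bitrate_kbps}k"]
    if preset is not None:
        cmd += ["-preset", preset]
    cmd.append(output)
    return cmd


def vmaf_command(encoded, reference, log_path):
    """Build an FFmpeg command that scores encoded against reference with libvmaf.

    The libvmaf filter takes the encoded (distorted) video as its first
    input and the lossless reference as its second.
    """
    return ["ffmpeg", "-i", encoded, "-i", reference,
            "-lavfi", f"libvmaf=log_path={log_path}:log_fmt=json",
            "-f", "null", "-"]


# Hypothetical file names; lossless.mkv stands in for the MagicYUV recording.
commands = [
    encode_command("lossless.mkv", "x264.mkv", "libx264", 3500, preset="medium"),
    encode_command("lossless.mkv", "nvenc.mkv", "h264_nvenc", 3500),
    vmaf_command("x264.mkv", "lossless.mkv", "x264_vmaf.json"),
    vmaf_command("nvenc.mkv", "lossless.mkv", "nvenc_vmaf.json"),
]

for cmd in commands:
    # To actually run one: subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```

Because both encodes come from the same lossless file, each frame of each encode is compared against exactly the same source frame.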

So, with this all in mind, we just have to get a bunch of video files, and use both encoders and different bitrates and see what the results are. I wrote up a script file using Rake3 and collected the results.

Footnotes:

1

First and foremost, your stream should be about entertainment quality. Don't go spending any money on trying to get what may be a slight improvement over a cheaper option. Of course, this is a subject that has a bit more nuance, and it is something that is answered better by people who are more qualified than me.

2

VMAF is open source, and you can find it on GitHub at: https://github.com/Netflix/vmaf.

3

I had originally used Make, since I wanted to be able to have it generate any encodings I was missing automatically. It started to become a bit of a pain, so I switched to Rake, for no other reason than that it was another Make-like build system and I had coded in Ruby years and years ago. At the time of writing, I haven't released this build file, because it still has some manual parts that I want to automate.