Enabling Customizable GPU-Accelerated Video Transcoding Pipelines

Today, over 80% of internet traffic is video. This content is generated by and consumed across various devices, including IoT gadgets, smartphones, computers, and TVs. As pixel density and the number of connected devices grow, continued investment in fast, efficient, high-quality video encoding and decoding is essential.

The latest NVIDIA data center GPUs, such as the NVIDIA L40S and NVIDIA L4 Tensor Core, address demanding use cases, including AI training, inference, visual computing, cloud gaming, and video transcoding. By combining multiple NVIDIA video decoding (NVDEC) and video encoding (NVENC) video engines with advanced computing capabilities, these GPUs help partners accelerate and customize their transcoding pipelines.

V-Nova has ported their implementation of the MPEG-5 Part 2 Low-Complexity Enhancement Video Coding (LCEVC) standard to NVIDIA GPUs. LCEVC enhances existing video coding standards, leveraging NVENC video engines and the computational power of NVIDIA Ada architecture GPUs. This results in improved visual quality and spatial scalability, enabling video providers to build efficient transcoding ladders. These ladders are crucial for maintaining the best video quality across varying network conditions and diverse end devices.

This post demonstrates how NVIDIA technology enables highly efficient and customizable video transcoding pipelines. First, it provides an overview of LCEVC and its use of NVENC and the NVIDIA Video Codec SDK. A comparison of visual quality and performance against CPU-based implementations follows. Last, we highlight the cost-effectiveness of a joint transcoding solution using cloud computing instances.

Enhancing video coding standards with MPEG-5 Part 2 LCEVC

MPEG-5 LCEVC is a “codec enhancer” that boosts compression efficiency of any video codec, delivering higher quality at up to 40% lower bitrates while reducing the overall computational complexity compared to encoding at full resolution with the base codec.

LCEVC employs a hybrid-coding approach with two layers: a base layer and an enhancement layer (Figure 1). The base layer is compressed at a lower resolution, typically a quarter of the target resolution. The enhancement layer refines the video quality by encoding the residual information, that is, the difference between the upscaled base layer and the original frame.

Figure 1. LCEVC encode and decode flow

The low complexity of LCEVC enables efficient, high-performance hardware and software encoder and decoder implementations. V-Nova, in cooperation with NVIDIA, developed an LCEVC encoder for NVIDIA GPUs. Tight integration with NVENC ensures optimal video compression performance with minimal GPU resource usage. Key use cases include ultra-low-latency pixel streaming for VR/XR, cloud gaming, and dense transcoding for video streaming. In these scenarios, LCEVC reduces bandwidth and delivery costs while significantly improving service quality. Although this post focuses on LCEVC-enhanced HEVC, H.264/AVC is also supported and AV1 is on the roadmap.

Integrating NVENC and LCEVC for low-latency and latency-tolerant encoding

NVIDIA Video Codec SDK releases 12.1 and 12.2 introduce several new low-level APIs to enable higher degrees of control and flexibility for NVIDIA customers. These APIs add support for reconstructed frame output and access to encoder statistics, designed to maximize performance and ease of integration. Additionally, new encoding tools have been added for both low-latency (LL) and latency-tolerant encoding use cases that enable improved visual quality for NVENC.

Figure 2 illustrates the NVENC and LCEVC encoder integration.

Figure 2. Integration of NVENC with LCEVC encoder

The following functionalities were essential to combine LCEVC GPU implementation with NVENC video encoding engines:

Reconstructed frame output API: During frame encoding, the NVENCODE API provides both the compressed frame and the reconstructed frame available in device memory. This enables LCEVC to upsample and encode the reconstructed frame directly on the GPU, eliminating the need for memory copies over PCI Express or decoding the encoded bitstream to obtain reconstruction for further processing.

Encoder statistics API: Rate control poses a significant challenge for highly efficient LCEVC encoder implementations. Managing two layers instead of one, the LCEVC rate control requires detailed information about the base layer encoding process. LCEVC leverages the encoder statistics API to access per block QP and bit count.

New video coding tools: Among other additions, unidirectional B-frames and ultra high quality (UHQ) tuning information improve the compression efficiency for low-latency and latency-tolerant use cases, respectively. These new tools are beneficial not only when using NVENC alone but also in combination with LCEVC—that is, as its base layer.

Benchmarking CPU versus GPU video encoding

This section benchmarks CPU and GPU transcoding pipeline implementations, focusing on video compression and encoding speed. We examine two use cases: ultra-high-quality (UHQ) natural video transcoding and low-latency encoding for cloud gaming and pixel streaming. The tests include Full HD and UHD encoding at various streaming-relevant bitrates, comparing native HEVC encoders (NVENC HEVC and x265) with their LCEVC-enhanced versions. The comparison assesses quality and cost for both CPU and GPU pipelines.

Methodology

The tests run on comparable cloud computing instances with and without an NVIDIA GPU. Table 1 lists the hardware and encoding configurations for both cloud computing instance types. Table 2 shows the testing content per use case.

Hardware and CPU/GPU encoder configurationCloud computing instanceCPU: AMD EPYC 9R14 – 16 vCPUGPU: N/ACPU: AMD EPYC 7R13 – 8 vCPUGPU: NVIDIA L4 – 2x NVENCCost/h$0.88$0.98Encodersx265, LCEVC (CPU) x265NVENC HEVC, LCEVC (GPU) NVENC HEVCPresetMediumP4Table 1. Hardware and CPU/GPU encoder configurations (cost/h at time of writing)

Testing content and use casesUse cases/TuneLow Latency (LL)Latency-tolerant/Ultra High Quality (UHQ)Content typeGaming and NaturalNaturalInput videos11 videosResolution/Bitrate1080p60 (4, 7, 12, and 15 Mbps) and 2160p60 (12, 15, 22, and 30 Mbps)Table 2. Testing content and use cases

All encoders were tested using FFmpeg version 6.1. To ensure fair comparison, NVENC HEVC and x265 configurations closely match, aligning GOP size, number of B frames, and lookahead depth. Furthermore, to maximize hardware utilization, FFmpeg encodes multiple streams in parallel. See the full command lines used for testing.

The average per-stream encoding frames per second (FPS) as reported by FFmpeg has to be multiplied by the number of streams to derive the total FPS. To then determine the cost of encoding one hour of video, we combine this FPS with the respective instance cost. For LCEVC x265 and LCEVC NVENC HEVC, the same process was repeated.

While the latest version of LCEVC NVENC HEVC already shows strong performance, we expect further improvements as development continues. Therefore the presented performance evaluation should be considered conservative.

Visual quality results

To evaluate visual quality (VQ), we used a diverse set of video sequences, including natural scenes (for UHQ and LL) and gaming content (for LL), ensuring comprehensive testing. The collection of videos, which includes five in Full HD (1080p) and six in UHD (2160p) resolution. We assessed visual quality across various encoding scenarios with target bitrates of 4-15 Mbps for 1080p and 12-30 Mbps for UHD videos.

We used the Video Multi-Method Assessment Fusion (VMAF, using libvmaf 3.0.0) metric for its high correlation with MOS scores (subjective quality assessments). Additionally, VMAF No Enhancement Gain (VMAF-NEG) was calculated to account for potential image enhancement biases. LCEVC aims to maximize subjective visual quality. Several subjective quality assessments have been conducted by independent third parties for example during the MPEG standardization process. These results include BD-RATE (MOS) versus AVC, HEVC, and VVC.

Figures 3 through 6 present the rate-distortion (RD) curves, averaged across test content, separated by resolution, tune, and objective metric. These results demonstrate that:

NVENC HEVC (red solid line) consistently achieves higher coding efficiency than x265 (red dashed line).

Both NVENC HEVC and x265, when enhanced with LCEVC (green lines), outperform their native counterparts.

LCEVC consistently achieves bitrate savings for VMAF and VMAF-NEG.

Figure 3. 2160p60 tune LL rate-distortion curves comparing x265, LCEVC x265, NVENC HEVC, and LCEVC NVENC HEVC

Figure 4. 1080p60 tune LL rate-distortion curves comparing x265, LCEVC x265, NVENC HEVC, and LCEVC NVENC HEVC

Figure 5. 2160p60 tune UHQ rate-distortion curves comparing x265, LCEVC x265, NVENC HEVC, and LCEVC NVENC HEVC

Figure 6. 1080p60 tune UHQ rate-distortion curves comparing x265, LCEVC x265, NVENC HEVC and LCEVC NVENC HEVC

Table 3 shows a comparison between the bitrate savings achieved when adding LCEVC to NVENC HEVC and to x265. From this table, we reach the following conclusions:

BD-rate improvements of LCEVC NVENC HEVC versus NVENC HEVC are similar or higher to those of LCEVC x265 versus x265.

This efficiency is achieved through the V-Nova tight integration of LCEVC with NVENC, using NVENCODE APIs to provide reconstructed frames and encoder statistics for LCEVC’s rate control.

ResolutionTuneBD-RATE LCEVC NVENC HEVC versus NVENC HEVCBD-RATE LCEVC x265 versus x265VMAFVMAF-NEGVMAFVMAF-NEG2160p60UHQ-35.30%-18.74%-33.21%-20.04%LL-30.65%-16.53%-19.85%-7.92%1080p60UHQ-22.39%-4.42%-23.61%-7.29%LL-25.04%-11.44%-19.82%-9.06%Table 3. BD-RATE (VMAF/VMAF-NEG) between GPU encoders (NVENC HEVC and LCEVC NVENC HEVC) and CPU encoders (x265 and LCEVC x265)

Performance and cost results

Using the same command lines as the visual quality tests, we measured the encoding speed of CPU and GPU pipelines for low-latency and ultra high quality tunes, both for native HEVC encoders and their LCEVC-enhanced versions. From that, we derive the instance cost to encode 1 hour of video.

The results in Figure 7 show:

GPU-accelerated HEVC encoding is 2x-4x cheaper than CPU-based x265 for LL and UHQ tunes across resolutions with and without LCEVC.

LCEVC accelerates both CPU and GPU implementations, especially at higher resolutions and higher quality settings.

Combining NVIDIA NVENC with the LCEVC GPU encoder provides the highest throughput and thus the lowest cost of service at significantly improved visual quality.

Figure 7. Cost per hour of encoded video (USD) for LL and UHQ use cases for GPU video encoders (NVENC HEVC and LCEVC NVENC HEVC) and CPU encoders (x265 and LCEVC x265)

NVIDIA and V-Nova both have provided enhancements to performance and visual quality for their solutions through software updates. Because of its low-complexity, software-enhanced nature, V-Nova intends to further improve LCEVC encoding performance for existing hardware and thus cost.

Summary

The joint NVENC plus LCEVC transcoding solution, developed through the collaboration between NVIDIA and V-Nova, highlights how NVENCODE APIs enable tight integration with the LCEVC GPU encoder. This enables customers to combine the hardware-accelerated NVENC from NVIDIA with the LCEVC multi-layered codec enhancement from V-Nova. This integration improves visual quality, increases throughput, and reduces costs compared to CPU-based solutions.

Ready to get started? Download the NVIDIA Video Codec SDK and the LCEVC GPU encoder.