Powering the Next Wave of DPU-Accelerated Cloud Infrastructures with NVIDIA DOCA Platform Framework

Organizations are increasingly turning to accelerated computing to meet the demands of generative AI, 5G telecommunications, and sovereign clouds. NVIDIA has unveiled the DOCA Platform Framework (DPF), providing foundational building blocks to unlock the power of NVIDIA BlueField DPUs and optimize GPU-accelerated computing platforms. Serving as both an orchestration framework and an implementation blueprint, DPF enables developers, service providers, and enterprises to seamlessly create BlueField-accelerated, cloud-native software platforms.

By simplifying DPU provisioning, lifecycle management, and service orchestration, DPF makes BlueField DPUs broadly accessible in Kubernetes environments for accelerating AI and other modern workloads. Additionally, DPF fortifies the vibrant ecosystem of BlueField-accelerated applications and services, fueling advancements in scalable cloud platforms.

Addressing a key gap in cloud infrastructure

NVIDIA’s commitment to the CPU-GPU-DPU trifecta is well known, and with the introduction of DPF, NVIDIA is taking a bold leap forward in the DPU aspect of this architecture. DPF marks an important step toward a more modern cloud infrastructure, helping to redefine how BlueField DPUs are integrated into data centers to address key challenges in performance, efficiency, and security.

NVIDIA BlueField DPUs already offer a high-performance, scalable alternative to traditional, CPU-centric infrastructure, offloading critical networking, storage, and security functions from host CPUs to accelerate data center operations. However, until now, managing DPU-driven services at data-center scale has been a fragmented and cumbersome process.

This is where DPF comes in: a dedicated framework that simplifies the deployment, orchestration, and scaling of BlueField-accelerated cloud infrastructure. DPF extends Kubernetes control plane functionality to DPUs, enabling admins to deploy and orchestrate both NVIDIA DOCA services and third-party, DOCA-based services directly on BlueField DPUs. 

Equipped with a purpose-built SDK for seamless integration, DPF offers developers a consistent, modular toolkit to easily manage software across BlueField DPU fleets. This reduces time and complexity, enabling developers to focus on building robust software platforms and high-impact applications rather than managing DPU software orchestration. 

Additionally, DPF plays a crucial role in the ecosystem by enabling infrastructure independent software vendors (ISVs) to build and integrate BlueField applications with confidence. By providing standardized APIs and tools, DPF helps these applications operate consistently on BlueField-accelerated infrastructure. This in turn benefits service providers and enterprises, who can draw on a robust portfolio of accelerated services to build high-performance, secure, and efficient cloud platforms.

To simplify and streamline DPU management for cloud-native environments, DPF addresses two primary workflows:

  • DPU provisioning and lifecycle management: Covers the initial steps to deploy BlueField DPUs, including firmware and software installation and configuration, and ongoing maintenance tasks.
  • DPU service management and orchestration: Involves deploying and managing infrastructure services such as SDN controller software, storage target software, firewalls, load balancers, and more, including service function chaining.

Efficient DPU provisioning and lifecycle management

DPF provides end-to-end support for BlueField DPU provisioning and lifecycle management, automating processes like firmware updates, flashing, and configuration to streamline setup and reduce downtime. Key tasks such as provisioning, configuration, monitoring, and troubleshooting are simplified, making it easier to integrate and operate BlueField DPUs at scale.

DPF maintains an updated state for each BlueField across the data center, enabling dynamic responsiveness to DPU health. When a DPU requires maintenance, DPF can proactively drain the node in a controlled manner, minimizing or eliminating impact to active production workloads. Through rolling update capabilities, admins can control batch updates by specifying a percentage of BlueField DPUs to update at a time, avoiding mass updates that could impact system stability. Real-time health monitoring and alerting equip admins to rapidly identify and address issues, essential for high-reliability environments like telecom and AI-powered data centers.
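The percentage-based rolling update described above can be sketched in a few lines of Python. This is a minimal illustration of the batching idea only, not DPF code: the function name `plan_rolling_batches`, the `max_percent` parameter, and the DPU names are all hypothetical, and a real rollout would also drain each node and check its health before moving to the next batch.

```python
from typing import Iterator, List


def plan_rolling_batches(dpus: List[str], max_percent: int) -> Iterator[List[str]]:
    """Yield successive update batches, each at most max_percent of the fleet.

    Hypothetical helper for illustration; the batch size is rounded down
    but is never smaller than one DPU, so the rollout always makes progress.
    """
    if not 0 < max_percent <= 100:
        raise ValueError("max_percent must be between 1 and 100")
    batch_size = max(1, len(dpus) * max_percent // 100)
    for start in range(0, len(dpus), batch_size):
        yield dpus[start:start + batch_size]


# Example fleet of ten DPUs (made-up names), updated 30% at a time.
fleet = [f"dpu-{i:02d}" for i in range(10)]
batches = list(plan_rolling_batches(fleet, max_percent=30))
for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {batch}")
```

Capping each batch at a fraction of the fleet means a faulty image or configuration affects only that fraction before the rollout can be paused.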

Through exposed APIs and Custom Resource Definitions (CRDs), DPF automates the BlueField DPU lifecycle, enabling cloud operators to manage BlueField-bound services from their standard K8s control plane, providing a unified “single pane of glass” view and control over both K8s worker nodes and DPUs. 

The DPF implementation blueprint is based on upstream Kubernetes, allowing technology partners to adapt and scale the framework for diverse infrastructure requirements and enterprise products.

Comprehensive DPU service management and orchestration

DPF brings a new level of sophistication to cloud-native environments by enabling seamless integration of BlueField DPUs into Kubernetes-based workflows. By introducing a dedicated, secondary Kubernetes control plane, DPF empowers admins to efficiently manage NVIDIA DOCA services and third-party, DOCA-based applications deployed on BlueField DPUs. The DPF Operator manages this secondary DPU Kubernetes control plane autonomously, overseeing all aspects of service deployment, monitoring, and lifecycle management. 

DPF is designed to hide DPU management complexity from admins, who interact only with the primary Kubernetes control plane using familiar Kubernetes constructs and never need to manage the DPU control layer directly. DPF also provides flexibility for ISVs, enabling them to implement their own Kubernetes control plane for customized BlueField service management and orchestration.

By optimizing service orchestration across a fleet of BlueField DPUs, DPF simplifies the deployment and management of complex, distributed workloads. With robust lifecycle management capabilities, DPF supports seamless service updates, scaling, and rollbacks, ensuring that admins can efficiently manage changes without disrupting ongoing operations. Combined with DOCA service function chaining (SFC), DPF facilitates secure, efficient chaining of services—such as accelerated networking (CNIs), high-performance data services (CSIs), and firewall functions—to handle complex, multi-step tasks. 
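As a conceptual model of service function chaining, the sketch below passes a packet through an ordered list of functions, any of which may transform or drop it. It is plain Python and does not use the DOCA SFC API; the packet fields, the blocked port, the backend names, and the functions `firewall`, `load_balancer`, and `run_chain` are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional

Packet = Dict[str, object]
ServiceFn = Callable[[Packet], Optional[Packet]]

BLOCKED_PORTS = {23}                  # hypothetical policy: drop telnet
BACKENDS = ["backend-0", "backend-1"]  # hypothetical load-balancer targets


def firewall(pkt: Packet) -> Optional[Packet]:
    """Drop packets addressed to a blocked port, pass everything else."""
    return None if pkt["dst_port"] in BLOCKED_PORTS else pkt


def load_balancer(pkt: Packet) -> Optional[Packet]:
    """Pick a backend deterministically from the source address bytes."""
    index = sum(str(pkt["src_ip"]).encode()) % len(BACKENDS)
    return {**pkt, "backend": BACKENDS[index]}


def run_chain(chain: List[ServiceFn], pkt: Packet) -> Optional[Packet]:
    """Apply each service in order; stop as soon as one drops the packet."""
    current: Optional[Packet] = pkt
    for service in chain:
        current = service(current)
        if current is None:
            return None
    return current


chain = [firewall, load_balancer]
allowed = run_chain(chain, {"src_ip": "10.0.0.7", "dst_port": 443})
dropped = run_chain(chain, {"src_ip": "10.0.0.7", "dst_port": 23})
print(allowed, dropped)
```

The order of the list is the chain: placing the firewall first means dropped traffic never reaches the load balancer.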

To ensure smooth deployment, DPF provides predeployment verification, confirming that the DPU can host the required services and returning meaningful error messages when requirements aren’t met. Additionally, DPF offers monitoring and debugging features to help admins manage and troubleshoot services in real time, making it easier to achieve high reliability and transparency.
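A predeployment check of this kind can be pictured as comparing a service's requirements against what a DPU has available and collecting a readable message for every mismatch. The sketch below is an assumption-laden illustration, not the DPF implementation: the classes `DpuResources` and `ServiceRequirements`, their fields, the function `verify_placement`, and all the numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DpuResources:
    """Hypothetical view of what one DPU currently has available."""
    name: str
    free_cpu_cores: int
    free_memory_mib: int
    doca_version: Tuple[int, int]


@dataclass(frozen=True)
class ServiceRequirements:
    """Hypothetical requirements declared by a service."""
    name: str
    cpu_cores: int
    memory_mib: int
    min_doca_version: Tuple[int, int]


def verify_placement(dpu: DpuResources, svc: ServiceRequirements) -> List[str]:
    """Return one message per unmet requirement; an empty list means OK."""
    errors = []
    if dpu.free_cpu_cores < svc.cpu_cores:
        errors.append(f"{svc.name} needs {svc.cpu_cores} CPU cores, "
                      f"{dpu.name} has {dpu.free_cpu_cores} free")
    if dpu.free_memory_mib < svc.memory_mib:
        errors.append(f"{svc.name} needs {svc.memory_mib} MiB memory, "
                      f"{dpu.name} has {dpu.free_memory_mib} MiB free")
    if dpu.doca_version < svc.min_doca_version:
        errors.append(f"{svc.name} needs DOCA >= {svc.min_doca_version}, "
                      f"{dpu.name} runs {dpu.doca_version}")
    return errors


# Illustrative values only; they do not describe real hardware or releases.
small_dpu = DpuResources("dpu-01", free_cpu_cores=2, free_memory_mib=4096,
                         doca_version=(2, 5))
service = ServiceRequirements("example-firewall", cpu_cores=4,
                              memory_mib=2048, min_doca_version=(2, 7))
problems = verify_placement(small_dpu, service)
for message in problems:
    print(message)
```

Returning every failed check at once, rather than stopping at the first, lets an admin fix all the problems in one pass.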

Through DPF, admins gain intuitive, cloud-native tools for provisioning, managing, and orchestrating services on BlueField DPUs. This seamless integration with existing Kubernetes workflows accelerates time-to-deployment for advanced BlueField-accelerated applications across sectors such as telecommunications, cloud, and enterprise environments.

Modular architecture fosters ease of integration 

DPF is designed with a modular architecture that simplifies integration and enables tailored functionality for BlueField-accelerated infrastructures. This flexible design is built on a collection of core components and tools, giving developers, service providers, and enterprises a streamlined approach to provisioning and managing BlueField DPUs within cloud-native environments.

Figure 1 illustrates the DPF software stack, highlighting DPF functions operating on both the host and BlueField DPU. It also includes various infrastructure software services for networking, storage, and security, some of which expose accelerated IO interfaces to containerized workloads through Kubernetes plugins (CNI and CSI).

Figure 1. NVIDIA DOCA Platform Framework (DPF) stack diagram

These tools and services, provided through containers, Helm charts, and an implementation blueprint, equip developers with everything needed to integrate and build on DPF.

DPF Operator

At the heart of the DPF orchestration layer is the DPF Operator, which automates DPU provisioning, lifecycle management, and service orchestration. It provides Kubernetes users with a familiar cloud-native interface, simplifying complex configurations and enabling BlueField DPUs to be deployed and managed just like other cluster resources. With built-in support for automated updates and resource management, the DPF Operator makes it easy to deploy and maintain BlueField DPUs in production environments.
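Kubernetes operators generally work by reconciling a declared desired state against the observed state of the cluster. The sketch below illustrates that general pattern for a DPU fleet; it makes no claim about how the DPF Operator is implemented, and the function `reconcile`, the action names, and the image labels are hypothetical.

```python
from typing import Dict, List, Tuple


def reconcile(desired: Dict[str, str],
              observed: Dict[str, str]) -> List[Tuple[str, str]]:
    """Compare desired and observed DPU images and list the actions needed.

    Both arguments map a DPU name to an image label. Hypothetical helper
    illustrating the reconcile pattern, not an actual operator API.
    """
    actions: List[Tuple[str, str]] = []
    for dpu, wanted in sorted(desired.items()):
        current = observed.get(dpu)
        if current is None:
            actions.append(("provision", dpu))   # declared but not present
        elif current != wanted:
            actions.append(("update", dpu))      # present with a stale image
    for dpu in sorted(observed.keys() - desired.keys()):
        actions.append(("decommission", dpu))    # present but no longer declared
    return actions


desired = {"dpu-a": "image-v2", "dpu-b": "image-v2", "dpu-c": "image-v2"}
observed = {"dpu-a": "image-v2", "dpu-b": "image-v1", "dpu-d": "image-v1"}
actions = reconcile(desired, observed)
print(actions)
```

Because the function only computes the difference between two states, it can be rerun after every change until the list of actions is empty.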

DOCA for Host

The DOCA for Host software supplies a comprehensive set of provisioning tools that streamline the deployment and configuration of BlueField DPUs. DOCA for Host handles the firmware, BIOS, and system configurations needed to integrate the DPU with the host environment, ensuring a consistent and reliable setup across deployments.

OVS-DOCA

OVS-DOCA serves as the core networking stack within DPF, facilitating secure, high-performance network connectivity for BlueField-accelerated applications. It provides advanced networking functions and efficient traffic routing within Kubernetes environments, ensuring that BlueField resources can be fully utilized without compromising on performance or security. This foundation enables developers to build high-throughput, latency-sensitive applications with ease.

DOCA Services

A curated set of DOCA services hosted on NVIDIA NGC enhances the capabilities of the BlueField DPU, with DPF providing the tools to fetch and deploy these services directly on the BlueField as part of the Kubernetes cluster. These ready-to-use services—covering advanced monitoring, networking, storage, security, and more—expand BlueField functionality, enabling rapid deployment of critical services. Through NVIDIA NGC, users gain seamless access to an expanding repository of NVIDIA-certified services and applications that fully integrate with DPF. The initial DPF release includes HBN, OVN-Kubernetes, Telemetry, and BlueMan as the first set of DOCA services, with subsequent releases set to introduce support for additional services to further enhance functionality and expand integration capabilities.

In addition to NVIDIA services, DPF orchestrates third-party DOCA services that bring specialized functionality to the BlueField environment. From network security solutions to load balancing and firewall applications, third-party services enable users to create a robust ecosystem tailored to their specific needs. By embracing an open, modular architecture, DPF fosters collaboration with service vendors, providing users with a wider range of functionality and flexibility.

DPF empowers developers with the tools and services they need—packaged in containers, Helm charts, and an implementation blueprint—to easily integrate with DPF and build, customize, and deploy advanced BlueField-accelerated software platforms.

Lead the future of DPU-accelerated cloud computing with DPF

The NVIDIA DOCA Platform Framework (DPF) redefines cloud infrastructure for BlueField-accelerated environments, transforming how cloud services are provisioned and managed. In addition, the NVIDIA DPF roadmap signals exciting capabilities on the horizon. Upcoming features will bring zero-trust capabilities to bare-metal, BlueField-accelerated infrastructures, securing environments from the hardware layer up.

Developers, telcos, and enterprises are encouraged to explore the capabilities of DPF, download the blueprint, and experiment with building applications optimized for high-performance and scalable infrastructures. Get started with DPF today and lead the future of BlueField-accelerated cloud infrastructure.
