What’s New in F5 NGINX Gateway Fabric 2.4.0 

We’re excited to announce that F5 NGINX Gateway Fabric 2.4.0 has been released. This release represents a major milestone in the Gateway API journey, adding critical production features such as TCP/UDP routing, rate limiting, session persistence, and more. These new features will help operators deliver AI and modern applications more efficiently and securely. 

Summary of changes in this release

2.4.0 contains many new features and improvements. At a high level, these include:  

  • Rate Limiting support  
  • Session Persistence with NGINX OSS and NGINX Plus support (sticky cookie), including the ability to change the load balancing method 
  • Proxy Buffer support using the new ProxySettingsPolicy, with more NGINX directives to come 
  • TCP and UDP routing via TCPRoute and UDPRoute 
  • Enhanced TLS Listener configuration (ciphers) 
  • Authentication Filter with HTTP Basic Auth support 
  • Support for multiple InferencePool backends as part of the Gateway API Inference Extension  

Additional Features

  • Namespace Filtering: Watch specific namespaces instead of cluster-wide, reducing resource consumption in large clusters 
  • Custom Log Escape Formats: Specify escape format when customising data plane access logs 
  • Upstream Keep-Alive: Enabled by default with 16 connections to reduce connection overhead; configurable via the UpstreamSettingsPolicy CRD 
  • CRD Discovery: Improved compatibility with clusters running older Gateway API versions 
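
For instance, the default upstream keep-alive behavior can be tuned through the UpstreamSettingsPolicy CRD. Here is a minimal sketch; the `keepAlive` field names follow the existing UpstreamSettingsPolicy schema, but verify values and defaults against the 2.4.0 documentation:

```yaml
apiVersion: gateway.nginx.org/v1alpha1
kind: UpstreamSettingsPolicy
metadata:
  name: keepalive-tuning
spec:
  targetRefs:            # the Service whose upstream settings to tune
  - group: core
    kind: Service
    name: coffee
  keepAlive:
    connections: 32      # raise from the default of 16 idle keep-alive connections
    time: 1h             # maximum lifetime of a keep-alive connection
    timeout: 60s         # idle timeout before a kept-alive connection is closed
```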

Bug Fixes

  • Data Plane Stability: Fixed an issue where the data plane would unnecessarily restart when the control plane restarted 
  • Memory Optimization: Agent collector logs now write to stdout instead of disk, resolving memory consumption issues 

The following sections take a closer look at each of these important features and why they matter for our users.

Rate Limiting

Rate limiting is essential for API Gateway use-cases, and often the first safeguard developers implement when building and deploying APIs. With the new RateLimitPolicy API, you can now define rate limits declaratively as a Kubernetes policy and apply them directly to your routes. This brings NGINX’s powerful rate limiting capabilities into a Gateway API workflow, eliminating the need for manual NGINX configuration or custom annotations. Protect your services from traffic spikes, prevent abuse, and ensure fair resource allocation all through a simple, version-controlled policy. 
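
As a sketch of what this can look like in practice, the policy below attaches a rate limit to an HTTPRoute using the standard Gateway API policy-attachment pattern. The `rateLimit` field names (`rate`, `burst`) are illustrative assumptions modeled on NGINX's `limit_req` settings; consult the 2.4.0 reference documentation for the exact RateLimitPolicy schema:

```yaml
apiVersion: gateway.nginx.org/v1alpha1   # assumed group/version for the new policy
kind: RateLimitPolicy
metadata:
  name: coffee-rate-limit
spec:
  targetRefs:                 # which route this policy protects
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: coffee
  rateLimit:                  # illustrative fields mapping to NGINX limit_req
    rate: 10r/s               # sustained request rate allowed per client
    burst: 20                 # short bursts tolerated before requests are rejected
```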

Why it matters:

Rate limiting is essential for teams deploying API gateways. It ensures service stability during traffic spikes, helps mitigate DDoS attacks, and protects backend systems from overload. By controlling request volume, organisations maintain predictable performance and deliver a reliable experience to their users. 

In addition, GPU allocations are expensive and often scarce. Unlike CPU workloads that scale elastically, inference capacity is constrained by hardware availability and cost. Rate limiting protects that investment by preventing any single client from monopolizing premium compute resources and by ensuring the most important workloads receive sufficient GPU compute.

Who it helps: 

  • Platform teams managing multi-tenant environments who need to enforce usage quotas 
  • API developers protecting services from malicious or misbehaving clients 
  • Security engineers adding a layer of defence against volumetric attacks 
  • MLOps/AI engineers designing performant inference applications that need guaranteed resources for delivery 
  • Infrastructure teams managing GPU costs who need to prevent runaway consumption from individual clients or services 

Session Persistence – OSS & Plus via UpstreamSettingsPolicy CRD 

While NGINX Gateway Fabric already supports basic session persistence, you can now configure more granular cookie-based persistence. This is another important addition in this release. Again, we are bringing powerful NGINX features into Kubernetes deployments to make sure session loss is not a problem.  

This release introduces flexible session persistence options for both NGINX OSS and NGINX Plus users: 

  • IP Hash (OSS & Plus): Configure ip_hash via UpstreamSettingsPolicy to route clients to the same backend based on their IP address 
  • Cookie-Based Persistence (Plus only): Enable sessionPersistence on HTTPRoute and GRPCRoute rules for more precise, cookie-based session affinity (experimental)
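
Roughly, the two options look like this. The `loadBalancingMethod` field name in the UpstreamSettingsPolicy is an assumption based on the new load-balancing-method capability, so check the 2.4.0 docs; the `sessionPersistence` stanza follows the Gateway API's experimental-channel schema for route rules:

```yaml
# IP hash persistence (OSS & Plus) via UpstreamSettingsPolicy
apiVersion: gateway.nginx.org/v1alpha1
kind: UpstreamSettingsPolicy
metadata:
  name: sticky-by-ip
spec:
  targetRefs:
  - group: core
    kind: Service
    name: tea
  loadBalancingMethod: ip_hash   # assumed field name for the new LB method setting
---
# Cookie-based affinity (NGINX Plus only) on an HTTPRoute rule (experimental)
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: tea
spec:
  parentRefs:
  - name: gateway
  rules:
  - backendRefs:
    - name: tea
      port: 80
    sessionPersistence:
      sessionName: tea-session   # cookie name issued to clients
      type: Cookie
```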

Why it matters:

Session persistence, often called “sticky sessions,” ensures requests from the same client are consistently routed to the same backend server. This is critical for applications that store session state locally, such as shopping carts, authentication flows, or multi-step forms. Without it, users may experience data loss or broken workflows when requests land on different backends. 

For AI workloads, session affinity also reduces wasted GPU cycles. When requests scatter across backends, each instance may need to rebuild context or reload model state. Keeping sessions sticky avoids this redundant computation and makes better use of costly GPU time. This can also play a critical role in reducing “context bloat” or token overconsumption caused by reloading tool descriptions or other metadata frequently added at the beginning of a new session.

Who it helps:

  • E-commerce teams ensuring cart contents persist throughout a user’s session 
  • Application developers building stateful services that rely on local session storage 
  • Platform engineers migrating legacy stateful applications to Kubernetes 
  • Teams running conversational AI or other long-context window applications that maintain context across multi-turn interactions 

Authentication Filter: Basic Auth 

Version 2.4.0 marks the beginning of NGINX Gateway Fabric’s authentication capabilities. The new AuthenticationFilter introduces support for HTTP Basic Auth, allowing you to protect routes with username and password credentials with no external identity provider required. 

While Basic Auth is one of the simplest authentication methods, it remains valuable for internal tools, development environments, and scenarios where lightweight protection is sufficient. This is just the start; future releases will expand the AuthenticationFilter with additional authentication methods that are available in NGINX. 
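
As a sketch of the pattern, the filter below protects an `/admin` path with Basic Auth and is attached to a route rule via an `ExtensionRef` filter, the standard Gateway API mechanism for custom filters. The AuthenticationFilter field names (`basicAuth`, `secretRef`) and the Secret wiring are assumptions; verify the exact schema in the 2.4.0 documentation:

```yaml
apiVersion: gateway.nginx.org/v1alpha1   # assumed group/version
kind: AuthenticationFilter
metadata:
  name: admin-basic-auth
spec:
  basicAuth:
    secretRef:                 # assumed: Secret holding htpasswd-style credentials
      name: admin-credentials
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: admin
spec:
  parentRefs:
  - name: gateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /admin
    filters:
    - type: ExtensionRef       # attach the custom filter to this rule
      extensionRef:
        group: gateway.nginx.org
        kind: AuthenticationFilter
        name: admin-basic-auth
    backendRefs:
    - name: admin-svc
      port: 80
```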

Why it matters:

Basic Auth provides a quick, low-friction way to secure routes when simplicity is more important than advanced security features. Having this built into the Gateway API means one less tool to deploy and manage. 

Who it helps:

  • DevOps teams protecting internal dashboards and admin endpoints 
  • Developers securing staging and test environments without complex auth setup 
  • Platform engineers needing a lightweight option before implementing enterprise SSO 

TCP routing and UDP routing

NGINX Gateway Fabric now supports TCPRoute and UDPRoute resources, extending the Gateway API beyond HTTP/HTTPS to Layer 4 traffic. This enables you to proxy non-HTTP workloads such as databases (PostgreSQL, MySQL, Redis), DNS servers, message queues, and IoT protocols, all through the same gateway.
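
For example, exposing PostgreSQL through the gateway can look like the sketch below, which uses the Gateway API's TCPRoute schema. The gateway and listener names are illustrative, and the parent Gateway needs a matching listener with `protocol: TCP`:

```yaml
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TCPRoute
metadata:
  name: postgres
spec:
  parentRefs:
  - name: gateway                    # illustrative Gateway name
    sectionName: postgres-listener   # a listener with protocol: TCP
  rules:
  - backendRefs:
    - name: postgres                 # the PostgreSQL Service
      port: 5432
```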

Why it matters:

Modern platforms often run a mix of HTTP APIs and non-HTTP services. Without Layer 4 support, teams are forced to deploy separate load balancers or ingress solutions for TCP/UDP workloads. With TCPRoute and UDPRoute, you can consolidate traffic management into a single, consistent Gateway API workflow, simplifying operations and reducing infrastructure sprawl.

Who it helps:

  • Teams exposing PostgreSQL, MySQL, or Redis through the gateway 
  • Platform teams routing traffic to IoT devices 
  • Teams routing traffic to vector databases, model registries, or custom inference protocols 

Gateway API Inference Extension

This release adds support for multiple Inference Pool backends as part of the Gateway API Inference Extension. With this capability, a single HTTPRoute can now reference multiple InferencePools in its backendRefs, enabling traffic splitting across model variants, staged rollouts of new model versions, and routing across pools for capacity management. 
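
A sketch of traffic splitting across two pools, for example during a staged model rollout, is shown below. The InferencePool API group and version vary across Inference Extension releases, and the pool names here are hypothetical; adjust both to match your cluster:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - name: gateway
  rules:
  - backendRefs:
    - group: inference.networking.x-k8s.io   # Inference Extension API group (version-dependent)
      kind: InferencePool
      name: llama-v1-pool
      weight: 90                             # 90% of traffic to the current model
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: llama-v2-pool
      weight: 10                             # 10% canary to the new model version
```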

Why it matters:

Real inference deployments rarely involve a single homogeneous pool. Teams often need to route across different model versions, split traffic between fine-tuned LoRA adapters, or distribute load across capacity tiers. Supporting multiple pools in a single route eliminates awkward workarounds and aligns NGF with production inference patterns. As GPU costs dominate AI infrastructure budgets, intelligent routing across inference pools becomes essential for maximizing utilization and controlling spend.

Who it helps:

  • ML teams managing multiple model versions or LoRA adapters in production 
  • Platform engineers implementing canary or blue-green deployments for inference services 
  • Infrastructure teams optimizing GPU utilization across capacity tiers 

Wrap up and thinking ahead

This release strengthens F5 NGINX Gateway Fabric as a production-ready Gateway API implementation, adding support for key use-cases such as rate limiting, session persistence, proxy buffer configuration, TCP/UDP routing, and TLS listener configuration. These enhancements make it easier to run production-grade, high-throughput applications while maintaining reliability under load. Looking ahead to the next release, we will continue to expand authentication capabilities with additional methods, while staying conformant with the Kubernetes Gateway API. With GPU allocations commanding premium prices and limited availability, and with AI inference workloads that are bursty and may require longer context capabilities, the ability to protect, manage, and efficiently route inference traffic is no longer optional. 

We’d like to give a big thank-you for the following community contributions: 

  • Gateway-level support for Snippets via SnippetsPolicy (#4461). Thanks to @fabian4 for the implementation. 
  • Support for TCPRoute and UDPRoute (#4518). Thanks to @Skcey for championing this capability and helping make NGINX Gateway Fabric more enterprise-ready for Layer 4 use cases. 
  • Configurable escape formats for data plane access logs (#4530). Thanks to @michasHL for this enhancement. 

If you have any questions about this release or NGINX Gateway Fabric in general, we are happy to help. You can contact us on Slack or via email. 

Release notes and resources