Performance Tuning – Tips & Tricks

This blog post includes contributions from several NGINX team members, including Valentin Bartenev and Nick Shadrin.

Over the past few years, I’ve worked with a handful of partners for whom NGINX Plus performance was the primary concern. The conversation typically starts with their difficulty matching our published performance benchmarks. The challenge usually results from the partner jumping straight to a fixed use case, such as using existing SSL keys or targeting very large file payloads, and then seeing sub‑par performance from NGINX Plus.

To a certain degree, this is expected behavior. I always like to explain to partners that as a software component, NGINX Plus can run at near line‑rate speeds on any hardware that’s available to it when dealing with the most basic HTTP use case. In order to hit our published numbers with specific use cases, though, often NGINX Plus benefits from changes to its configuration as well as tweaks to low‑level OS and hardware settings.

In every case to date, our partners have been able to achieve the theoretical performance numbers for their specific use cases simply by focusing on the OS components and hardware settings that need to be configured to match the use case, and on how NGINX Plus interacts with those components.

Over the years, I’ve compiled this list of NGINX configuration, OS, and hardware tips, tricks, and tweaks. It’s intended to help our partners and customers achieve higher performance of both NGINX Open Source and NGINX Plus for their specific use cases.

This document is only a guide to a subset of configuration settings that can impact performance; it’s not an exhaustive list, nor is it necessarily appropriate to change every setting discussed below in your environment.

Note: We have revised this blog post since its original publication, and are reviewing it further to make it as strong as possible. Please add your own suggested changes and additions in the Comments below; we’ll incorporate what we can into the blog post.

The Tuning Workflow

I generally recommend the following workflow when tackling performance‑tuning issues:

  1. Test NGINX Plus performance in the most generic HTTP use case possible. This allows you to set the correct benchmarking baseline for your particular environment.

  2. Identify your specific use case. If, for instance, your application requires large file uploads, or if you’re dealing with high‑security large SSL key sizes, define the end‑goal use case first.

  3. Configure NGINX Plus for your use case and retest to determine the delta between theoretical performance in your environment and real‑world performance with your use case.

  4. Tweak one setting at a time by focusing on the settings that most apply to your use case. In other words, don’t change a bunch of sysctl settings at the same time you add new NGINX directives. Start small, with the features that are most applicable to your use case. For example, if high security is critical for your environment, change SSL key types and sizes first.

  5. If the change doesn’t impact performance, revert the setting back to the default. As you progress through each individual change, you’ll start to see a pattern where related settings tend to affect performance together. This allows you to home in on the groups of settings that you can later tweak together as needed.

It’s important to note that every deployment environment is unique and comes with its own networking and application performance requirements. It might not be advisable to change some of these values in production. Any of the configuration tweaks outlined below can produce dramatically different results depending on the application type and networking topology.

With NGINX having such strong roots in the open source community, many people over the years have contributed back to the performance conversations. Where applicable, I’ve included links to external resources for specific performance‑tuning suggestions from people who have already battle‑tested many of these solutions in production.

Tuning NGINX Configuration

Please refer to the NGINX reference documentation for details about supported values, default settings, and the scope within which each setting is supported.

SSL

This section describes how to remove slow and unnecessary ciphers from OpenSSL and NGINX.

When SSL performance is paramount, it’s always a good idea to try different key sizes and types in your environment, finding the correct balance for your specific security needs between longer keys for increased security and shorter keys for faster performance. An easy test is to move from more traditional RSA keys to Elliptic Curve Cryptography (ECC), which uses smaller key sizes and is therefore computationally faster for the same level of security.

To generate quick, self‑signed ECC P‑256 keys for testing, run these commands:

# openssl ecparam -out ./nginx-ecc-p256.key -name prime256v1 -genkey
# openssl req -new -key ./nginx-ecc-p256.key -out ./nginx-ecc-p256-csr.pem -subj '/CN=localhost'
# openssl req -x509 -nodes -days 30 -key ./nginx-ecc-p256.key -in ./nginx-ecc-p256-csr.pem -out ./nginx-ecc-p256.pem
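
To test with the generated certificate and key, a server block along the lines of the sketch below could be used; the file paths and port are placeholders, so adjust them to wherever you copied the files:

server {
    listen 443 ssl;
    server_name localhost;

    ssl_certificate     /etc/nginx/ssl/nginx-ecc-p256.pem;
    ssl_certificate_key /etc/nginx/ssl/nginx-ecc-p256.key;
}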

Compression

Gzip parameters provide granular control over how NGINX delivers content, so setting them incorrectly can decrease NGINX performance. Enabling gzip can save bandwidth, improving page load time on slow connections. (In local, synthetic benchmarks, enabling gzip might not show the same benefits as in the real world.) Try these settings for optimum performance:

  1. Enable gzip only for relevant content, such as text, JavaScript, and CSS files
  2. Do not increase the compression level, as this costs CPU effort without a commensurate increase in throughput
  3. Evaluate the effect of enabling compression by enabling and disabling gzip for different types and sizes of content
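
For example, a conservative starting point might look like the following sketch (the content types and minimum length are illustrative):

gzip on;
gzip_comp_level 1;                                        # keep the compression level low
gzip_types text/plain text/css application/javascript;    # compress only text-based content
gzip_min_length 1000;                                     # skip very small responses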

More information on granular gzip control can be found in the reference documentation for the NGINX gzip module.

Connection Handling

There are several tuning options related to connection handling. Please refer to the linked reference documentation for details on proper syntax, applicable configuration blocks (http, server, location), and so on.

  • accept_mutex off – All worker processes are notified about new connections (the default in NGINX 1.11.3 and later, and NGINX Plus R10 and later). If enabled, worker processes accept new connections by turns.

    We recommend keeping the default value (off) unless you have extensive knowledge of your app’s performance and the opportunity to test under a variety of conditions. The default can lead to inefficient use of system resources if the volume of new connections is low, and changing the value to on might be beneficial under some high loads.

  • keepalive 128 – Enables keepalive connections from NGINX Plus to upstream servers, defining the maximum number of idle keepalive connections preserved in the cache of each worker process. When this number is exceeded, the least recently used connections are closed. Without keepalives, there is more overhead, and both connections and ephemeral ports are used inefficiently. (A configuration sketch showing these directives in context appears after this list.)

    For HTTP traffic, when you include this directive in an upstream block you must also include the following directives in the configuration so that they apply to all location blocks that proxy traffic to that upstream group (you can place them in individual such location blocks, in the parent server blocks, or at the http level):

    • proxy_http_version 1.1 – NGINX Plus uses HTTP/1.1 for proxied requests
    • proxy_set_header Connection "" – NGINX Plus strips any Connection headers from the proxied request
  • multi_accept off – A worker process accepts one new connection at a time (the default). If enabled, a worker process accepts all new connections at once.

    We recommend keeping the default value (off), unless you’re sure there’s a benefit to changing it. Start performance testing with the default value to better measure predictable scale.

  • proxy_buffering on – NGINX Plus receives a response from the proxied server as soon as possible, and buffers it (the default). If disabled, NGINX Plus passes the response to the client synchronously, as soon as it is received, which increases the load on NGINX Plus.

    Disabling response buffering is necessary only for applications that need immediate access to the data stream.

  • listen 80 reuseport – Enables port sharding: an individual listening socket is created for each worker process (using the SO_REUSEPORT socket option), allowing the kernel to distribute incoming connections among worker processes. For details, see Socket Sharding in NGINX Release 1.9.1 on our blog.
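
Taken together, these directives might be combined as in the sketch below. The upstream name, server addresses, and listen port are placeholders for illustration, and the events settings simply show the defaults explicitly:

events {
    accept_mutex off;                        # the default: all workers are notified of new connections
    multi_accept off;                        # the default: accept one new connection at a time
}

http {
    upstream backend {
        server 192.0.2.10:8080;
        server 192.0.2.11:8080;
        keepalive 128;                       # idle keepalive connections cached per worker process
    }

    server {
        listen 80 reuseport;                 # per-worker listening sockets (SO_REUSEPORT)

        location / {
            proxy_pass http://backend;
            proxy_http_version 1.1;          # required for upstream keepalive connections
            proxy_set_header Connection "";  # strip the Connection header from proxied requests
            proxy_buffering on;              # the default: buffer the upstream response
        }
    }
}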

Logging

Logging is an important tool for managing and auditing your system. Logging large amounts of data, and storing large logs, can strain system resources, but we recommend that you disable logging only in very specific cases or for performance troubleshooting.

  • access_log off – Disables access logging.
  • access_log /path/to/access.log main buffer=16k – Enables buffered writes to the access log.

You may benefit from a centralized logging system based on the syslog protocol, available from many open source projects and commercial vendors. If you need metrics (which aggregate information initially recorded in logs) for NGINX and NGINX Plus servers, you can use NGINX Amplify.
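
For example, buffered local logging and syslog‑based logging might be configured as in the sketch below; the log path, syslog server address, and the main log format are assumptions for illustration:

access_log /var/log/nginx/access.log main buffer=16k flush=10s;   # write in 16-KB batches, at least every 10 seconds

# or ship access-log entries to a remote syslog collector
access_log syslog:server=192.0.2.20:514,facility=local7,tag=nginx,severity=info main;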

Thread Pooling

Thread pooling consists of a task queue and a number of threads that handle the queue. When a worker process needs to do a potentially long operation, instead of processing the operation by itself, it puts a task in the pool’s queue, from which it can be taken and processed by any free thread.

To enable thread pooling, include the aio threads directive. Note that the way thread pools are managed can be affected by other buffer‑related configuration settings. For complete information on tweaking other settings to support thread pooling, see our blog.
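
As a minimal sketch, assuming a pool named default and an illustrative /downloads/ location (neither is a recommendation):

# in the main (top-level) context: a pool of 32 threads with a bounded task queue
thread_pool default threads=32 max_queue=65536;

http {
    server {
        location /downloads/ {
            aio threads=default;   # hand potentially long file operations to the thread pool
        }
    }
}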

CPU Affinity

CPU affinity is used to control which CPUs NGINX Plus utilizes for individual worker processes (for background information, see the reference documentation for worker_cpu_affinity).

In most cases, we recommend the default auto parameter to the worker_processes directive; it sets the number of worker processes to match the number of available CPU cores.

However, when NGINX Plus is running in a containerized environment such as Docker, a system admin might choose to assign the container fewer cores than are available on the host machine. In this case, NGINX Plus detects the number of cores available on the host machine and rotates workers among the cores that are actually available within the container. To avoid that, reduce the number of workers by setting worker_processes to the number of cores available in the container.
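
For illustration (the two-core container scenario is an assumption):

worker_processes auto;       # the default recommendation: one worker per available CPU core
worker_cpu_affinity auto;    # bind each worker process to its own core

# In a container limited to 2 cores, size and pin the workers explicitly instead:
# worker_processes 2;
# worker_cpu_affinity 01 10;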

Testing CPU Affinity

It’s always best to load NGINX Plus with traffic similar to your production traffic. However, for basic testing, you can use a load generator such as wrk, as described here.

Load NGINX Plus with a quick wrk session:

# wrk -t 1 -c 50 -d 20s http://localhost/1k.bin

If necessary, you can create a simple 1k.bin file for testing with:

# dd if=/dev/zero of=1k.bin bs=1024 count=1

Run top in CPU view mode (by pressing 1 after top starts).

You can repeat the test with different numbers of worker processes and affinity bindings to see how performance scales. That’s an effective way to determine the appropriate subset of available cores to which NGINX Plus should be limited.

Sizing Recommendations

Here’s a very rough sizing approximation for general web serving and load balancing. The values might not be as appropriate for VOD streaming or CDNs.

CPU

Allocate 1 CPU core per 1–2 Gbps of unencrypted traffic.

Small (1–2 KB) responses and 1 response per connection increase CPU load.
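
For example, by this rule of thumb a server expected to handle roughly 10 Gbps of unencrypted traffic would need somewhere between 5 and 10 cores, with additional headroom if responses are small or connections are not reused.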

RAM

Allocate 1 GB for OS and other general needs.

The rest is divided among NGINX Plus buffers, socket buffers, and virtual memory cache, with a rough estimate of 1 MB per connection.
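
For example, a machine with 16 GB of RAM leaves roughly 15 GB after the OS allowance; at about 1 MB per connection, that works out to a ceiling on the order of 15,000 concurrent connections before memory becomes the limiting factor.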

Details

  • proxy_buffers (per connection)
  • The proxy_buffers size should be chosen to avoid disk I/O: if the response size is larger than (proxy_buffers size + proxy_buffer_size), the response may be written to disk, increasing I/O, response time, and so on (see the sketch below).
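
As an illustrative sketch, sized so that responses up to roughly 140 KB stay in memory (the values and the location are examples, not recommendations):

location / {
    proxy_pass http://backend;
    proxy_buffer_size 16k;    # buffer for the first part of the response (response headers)
    proxy_buffers 8 16k;      # up to 8 additional 16-KB buffers per connection
}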

Shared Memory Zones

On the surface, zones are used to store data about upstream servers that is shared by all worker processes, such as status, metrics, cookies, and health‑check results.

Zones can also affect how NGINX Plus distributes load among components such as worker processes, however. For full documentation on what a zone stores and affects, please refer to the NGINX Plus Admin Guide.

It’s not possible to prescribe exact settings because usage patterns differ so widely. Each feature, such as session persistence with the sticky directive, health checks, or DNS re‑resolving, affects the zone size. As an example, with the sticky route session‑persistence method and a single health check enabled, a 256‑KB zone can accommodate information about the indicated number of upstream servers:

  • 128 servers (each defined as an IP‑address:port pair)
  • 88 servers (each defined as hostname:port pair where the hostname resolves to a single IP address)
  • 12 servers (each defined as hostname:port pair where the hostname resolves to multiple IP addresses)

When creating zones, it’s important to note that the shared memory area is identified by the name of the zone. If you use the same name for all zones, then data for all the upstream groups is stored in that one zone, and the configured size may be exceeded.
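
A minimal sketch with one distinctly named zone per upstream group (the names, sizes, and addresses are placeholders):

upstream app_one {
    zone app_one 256k;        # each upstream group gets its own named shared-memory zone
    server 192.0.2.11:80;
}

upstream app_two {
    zone app_two 256k;        # a distinct name keeps this data out of app_one's zone
    server 192.0.2.12:80;
}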

Disk I/O

The limiting factor for disk I/O is the number of I/O operations per second (IOPS) the storage can sustain.

NGINX Plus depends on disk I/O and IOPS for a number of functions, including logging and caching.