gRPC Performance: How to Not Get Held Up by the Concurrent Stream Limit
1. What Is HTTP/2 and Why Does gRPC Use It?
Traditional HTTP/1.1 is a simple request-response protocol: you send one request, wait for a response, then send the next. This is fine for loading web pages, but it is very slow for high-performance communication between services.
HTTP/2 solved this by introducing multiplexing: multiple requests and responses can travel over the same single TCP connection at the same time. Each individual request-response pair travels inside something called a stream.
gRPC uses HTTP/2 as its transport protocol. This is one of the main reasons gRPC is so fast. You do not need a new TCP connection for every call — all calls share one connection.
Real-world analogy: HTTP/1.1 is like a single-lane road: cars must travel one at a time. HTTP/2 is like a multi-lane highway: many cars travel at once on the same road. The "streams" are the lanes.
2. What Is the Concurrent Stream Limit?
Even though HTTP/2 allows many streams on one connection, there is a limit on how many can be active at the same time. This is called the concurrent stream limit.
The default value is 100 concurrent streams per connection. This means:
- If you send 100 gRPC calls at the same moment, all 100 travel to the server simultaneously.
- If you send a 101st call while 100 are still in progress, the 101st call must wait in a queue until one of the first 100 finishes.
For most applications, 100 concurrent calls is more than enough. But for high-throughput applications — for example, a service that fires off thousands of gRPC calls at once — this limit becomes a bottleneck.
Real-world analogy: Imagine a highway with 100 lanes. A maximum of 100 cars can drive at once. If a 101st car arrives, it must park at the entrance and wait for one of the current 100 cars to exit before it can enter. If you keep adding more cars than the highway can hold, a queue builds up at the entrance.
3. When Does the Limit Hurt You?
This limit mainly affects you when you make many concurrent (parallel) gRPC calls, not sequential ones. Consider the difference:
Sequential calls (one after the other): Only one stream is active at any given time. You will never hit the limit of 100.
Concurrent calls (fired all at once): Many streams are active simultaneously. If you fire 500 calls at once, only 100 can run at a time and the remaining 400 form a queue. The queue adds latency and reduces throughput.
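The two patterns can be sketched with a minimal client. This assumes the stock `greet.proto` sample service and its generated `Greeter.GreeterClient`; any unary gRPC call behaves the same way:

```csharp
using System.Linq;
using System.Threading.Tasks;
using Grpc.Net.Client;

using var channel = GrpcChannel.ForAddress("https://localhost:5001");
var client = new Greeter.GreeterClient(channel);

// Sequential: only one stream is active at a time.
// The 100-stream limit is never reached.
for (var i = 0; i < 500; i++)
{
    await client.SayHelloAsync(new HelloRequest { Name = $"call-{i}" });
}

// Concurrent: 500 streams requested at once on one connection.
// Only 100 run immediately; the other 400 queue behind them.
var tasks = Enumerable.Range(0, 500)
    .Select(i => client.SayHelloAsync(new HelloRequest { Name = $"call-{i}" }).ResponseAsync);
await Task.WhenAll(tasks);
```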
4. Why Raising the Limit on the Server Is the Wrong Fix
Your first instinct might be: "just configure the server to allow 500 or 1,000 concurrent streams." This is possible in ASP.NET Core's Kestrel server, but it is not recommended. Here is why:
- Connection packet loss: HTTP/2 runs over a single TCP connection, and TCP's congestion control and loss recovery operate on the whole connection, not per stream. If you push too many streams over it, a single dropped packet can cause all streams (including the ones currently making progress) to slow down or stall.
- Thread contention: Multiple threads compete to write to the same underlying TCP connection. The more concurrent streams there are, the more threads fight for their turn to write, causing CPU overhead and delays.
- One bad connection blocks everything: All your streams share one TCP connection. If that connection experiences a problem (packet loss, timeout), every active stream is affected simultaneously.
Raising the server limit does not solve the root problem — it just delays the pain and introduces new problems.
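For reference only, here is roughly what that not-recommended server-side change looks like in a Kestrel-hosted gRPC service (a sketch; `GreeterService` is the sample service name):

```csharp
// Program.cs — NOT recommended; shown only so you know which knob this is.
var builder = WebApplication.CreateBuilder(args);

builder.WebHost.ConfigureKestrel(options =>
{
    // Default is 100. Raising it trades queueing for the problems described above.
    options.Limits.Http2.MaxStreamsPerConnection = 500;
});

builder.Services.AddGrpc();

var app = builder.Build();
app.MapGrpcService<GreeterService>();
app.Run();
```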
5. The Right Fix: Enable Multiple HTTP/2 Connections on the Client
The correct solution is to configure the gRPC client to open a new HTTP/2 connection when the current one reaches its concurrent stream limit. This way:
- Each HTTP/2 connection handles up to 100 streams.
- When the 101st stream is needed, a second TCP connection is opened automatically.
- The 101st stream travels on connection #2, and connections #1 and #2 both run at full capacity.
- When connection #2 is also full, a third is opened, and so on.
This is done with a single setting on the SocketsHttpHandler: EnableMultipleHttp2Connections = true.
Real-world analogy: Instead of making the existing 100-lane highway wider (which causes its own engineering problems), you build a second parallel highway. When the first highway is full, cars automatically use the second one. When that is full too, a third highway opens.
6. Example: Single Connection (Default Behavior)
This is what a standard gRPC client looks like without any concurrency configuration. It uses only one HTTP/2 connection no matter how many concurrent calls are made:
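A minimal sketch of such a client, again assuming the generated `Greeter.GreeterClient` from the `greet.proto` sample:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Grpc.Net.Client;

// Default channel: one HTTP/2 connection, at most 100 concurrent streams.
using var channel = GrpcChannel.ForAddress("https://localhost:5001");
var client = new Greeter.GreeterClient(channel);

const int count = 200;
var tasks = new List<Task>();

for (var i = 0; i < count; i++)
{
    tasks.Add(client.SayHelloAsync(new HelloRequest { Name = $"call-{i}" }).ResponseAsync);
}

// Calls 101–200 queue on the single connection until earlier streams finish.
await Task.WhenAll(tasks);
```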
If count is 200, then 200 tasks are fired simultaneously on a single connection. Since the connection can only handle 100 at a time, the other 100 wait in a queue. This causes unnecessary delays.
7. Example: Multiple Connections (Optimized)
The optimized version is almost identical, with one key difference: the GrpcChannelOptions are passed with a custom SocketsHttpHandler that has EnableMultipleHttp2Connections set to true:
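A sketch of the optimized client, under the same assumptions as the previous example:

```csharp
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using Grpc.Net.Client;

var handler = new SocketsHttpHandler
{
    // Open additional HTTP/2 connections once a connection's stream limit is hit.
    EnableMultipleHttp2Connections = true
};

using var channel = GrpcChannel.ForAddress("https://localhost:5001", new GrpcChannelOptions
{
    HttpHandler = handler
});
var client = new Greeter.GreeterClient(channel);

const int count = 200;
var tasks = Enumerable.Range(0, count)
    .Select(i => client.SayHelloAsync(new HelloRequest { Name = $"call-{i}" }).ResponseAsync);

// Calls 101–200 go out on a second, automatically opened connection.
await Task.WhenAll(tasks);
```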
With this setting, when calls 101 through 200 need to go out and the first connection is full, the gRPC channel automatically opens a second HTTP/2 connection for them. No waiting in a queue.
Note that the custom SocketsHttpHandler is passed via the HttpHandler property of GrpcChannelOptions. SocketsHttpHandler is the .NET class responsible for managing the underlying HTTP connections; setting EnableMultipleHttp2Connections = true on it tells it to open additional connections as needed.
8. Performance Comparison
After running each endpoint 50 times and averaging the results, here is the measured performance difference when making concurrent gRPC calls:
| Approach | HTTP/2 Connections | Average Request Time |
|---|---|---|
| Single connection (default) | 1 (max 100 streams) | ~35 seconds |
| Multiple connections enabled | As many as needed | ~30 seconds |
The difference grows significantly when the concurrency level is higher. The more calls you fire simultaneously, the more beneficial multiple connections become. For a single-threaded sequential workload, this setting makes no difference at all — but it does not hurt either.
Rule of thumb: If your application fires more than ~50 concurrent gRPC calls at the same time, enable EnableMultipleHttp2Connections = true. It is a one-line change and has no downsides for typical use cases.
9. Summary
- HTTP/2 allows multiple requests to travel over one TCP connection simultaneously. These are called streams.
- There is a default limit of 100 concurrent streams per connection. When this limit is reached, further calls must queue up.
- Raising the server-side limit is not the right solution — it introduces packet loss, thread contention, and reliability issues.
- The correct solution is to configure the client channel to open additional HTTP/2 connections when the current one is full.
- This is done with one line: EnableMultipleHttp2Connections = true on a SocketsHttpHandler passed to GrpcChannelOptions.
- This setting is irrelevant for purely sequential calls but provides a measurable benefit for applications making many concurrent gRPC calls.