gRPC Performance: When Streaming Is Better Than Individual (Unary) Calls
1. Recap: The Four Types of gRPC Calls
Before we dive in, let's quickly recall the four types of gRPC calls:
| Type | Client sends | Server sends back |
|---|---|---|
| Unary | One message | One message |
| Server streaming | One message | Many messages (stream) |
| Client streaming | Many messages (stream) | One message |
| Bi-directional streaming | Many messages (stream) | Many messages (stream) |
This article focuses on the difference between making many separate unary calls versus making one bi-directional streaming call that processes all the messages at once.
2. The Problem with Repeated Unary Calls
A unary call works like a normal function call: you send one request and get one response. This is perfectly fine when you only need to call it occasionally.
But imagine you need to process 1,000 items. With unary calls, you would make 1,000 separate gRPC calls — one per item:
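As a sketch, the unary loop might look like this (the client type `Monitor.MonitorClient` follows from the Monitor service introduced later; the unary RPC name `GetPerformanceStatsAsync` and the request fields are assumptions, since the article only names the streaming RPC explicitly):

```csharp
// One unary gRPC call per item: 1,000 items means 1,000 request/response cycles.
var client = new Monitor.MonitorClient(channel);

for (var i = 0; i < 1000; i++)
{
    // Each iteration opens its own HTTP/2 stream, sends headers and body,
    // waits for the server, receives the response, and closes the stream.
    var response = await client.GetPerformanceStatsAsync(
        new PerformanceStatusRequest { /* per-item data */ });
}
```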
Each of those 1,000 calls involves:
- Starting a new HTTP/2 stream on the connection
- Sending the request headers and request body
- Waiting for the server to respond
- Receiving the response headers and response body
- Closing the stream
All of that overhead is repeated 1,000 times. Even though each individual call is fast, the overhead accumulates: in tests, 1,000 unary calls over a channel that reused the same underlying connection took approximately 15 seconds.
Real-world analogy: Imagine a post office. Unary calls are like writing 1,000 individual letters, going to the post office for each one, waiting in line, handing over the letter, waiting to receive a reply, and then going back home — 1,000 times. Streaming is like going to the post office once with a bag containing all 1,000 letters, handing them all over at once, and waiting at the counter while the clerk processes each letter and hands you back 1,000 replies one by one.
3. What Is Bi-Directional Streaming?
Bi-directional streaming opens a single long-lived connection between client and server through which both sides can send as many messages as they want.
Instead of 1,000 separate request-response cycles, you:
- Open one streaming call
- Send all 1,000 request messages through the client stream
- Receive all 1,000 response messages from the server stream
- Close the call
The key benefit is that the per-call HTTP/2 overhead (opening a stream, sending headers, closing a stream) happens only once instead of 1,000 times. The messages themselves can flow continuously without any interruption.
4. Step 1: Define the Streaming RPC in the .proto File
Let's start from the beginning with a real example. We have a Monitor service in a performance.proto file that already has a unary RPC:
To add a bi-directional streaming RPC, we add a new entry to the Monitor service. The stream keyword before both the request and response types marks them as streams:
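A sketch of what performance.proto might look like after the change (the unary RPC name `GetPerformanceStats` and the empty message bodies are assumptions; the article only names the Monitor service, the streaming RPC GetManyPerformanceStats, and the two message types):

```proto
syntax = "proto3";

service Monitor {
  // Existing unary RPC: one request in, one response out.
  rpc GetPerformanceStats (PerformanceStatusRequest)
      returns (PerformanceStatusResponse);

  // New bi-directional streaming RPC: "stream" on both sides.
  rpc GetManyPerformanceStats (stream PerformanceStatusRequest)
      returns (stream PerformanceStatusResponse);
}

message PerformanceStatusRequest {
  // fields elided
}

message PerformanceStatusResponse {
  // fields elided
}
```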
The .proto file uses the same request and response message types as before. All we have changed is that both sides are now streams instead of single messages. After saving this file, the gRPC tooling regenerates the C# classes automatically.
5. Step 2: Implement the Server Side
On the server side, you need to override the new RPC method. The signature is different from a unary call:
- Instead of receiving a single `PerformanceStatusRequest`, you receive an `IAsyncStreamReader<PerformanceStatusRequest>` — an asynchronous reader you loop through to read incoming request messages one at a time.
- Instead of returning a single `PerformanceStatusResponse`, you write to an `IServerStreamWriter<PerformanceStatusResponse>` — you call `WriteAsync` on it to push response messages to the client.
The server processes exactly one response per request, maintaining a strict 1:1 mapping. But crucially, all of this happens over a single open stream — there is no per-message overhead of opening and closing HTTP/2 streams.
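A minimal sketch of the server override, assuming the generated base class from the proto above (the response fields are placeholders; `ReadAllAsync` is the Grpc.Core extension for iterating an `IAsyncStreamReader`):

```csharp
public override async Task GetManyPerformanceStats(
    IAsyncStreamReader<PerformanceStatusRequest> requestStream,
    IServerStreamWriter<PerformanceStatusResponse> responseStream,
    ServerCallContext context)
{
    // Read request messages one at a time as they arrive on the open stream...
    await foreach (var request in requestStream.ReadAllAsync())
    {
        // ...and write exactly one response per request (strict 1:1 mapping).
        await responseStream.WriteAsync(new PerformanceStatusResponse
        {
            // response fields elided
        });
    }
}
```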
6. Step 3: Implement the Client Side
The client side is slightly more complex because the sending and receiving happen concurrently. You need to:
- Open the streaming call with `client.GetManyPerformanceStats()`.
- Start a background task that reads from the response stream.
- Write all request messages to the request stream.
- Close the request stream (signal that you are done sending).
- Wait for the background reading task to finish (all responses received).
The important insight here is that the client writes to the request stream and the server writes to the response stream simultaneously. While the client is still sending request messages, the server is already processing the first few and sending back responses. This pipeline effect is what makes streaming so much faster.
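The five steps above can be sketched as follows (assuming a `Grpc.Net.Client` channel and the generated `Monitor.MonitorClient`; the request fields are placeholders):

```csharp
// Open the bi-directional streaming call: one HTTP/2 stream for everything.
using var call = client.GetManyPerformanceStats();

// Background task: read responses while we are still writing requests.
var readTask = Task.Run(async () =>
{
    await foreach (var response in call.ResponseStream.ReadAllAsync())
    {
        // handle each response as it arrives
    }
});

// Write all request messages to the request stream.
for (var i = 0; i < 1000; i++)
{
    await call.RequestStream.WriteAsync(
        new PerformanceStatusRequest { /* per-item data */ });
}

// Signal that we are done sending.
await call.RequestStream.CompleteAsync();

// Wait until every response has been received.
await readTask;
```

Note that the writes and the background reads overlap in time — that overlap is the pipeline effect described above.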
7. Step 4: Expose an API Endpoint That Uses Streaming
Now let's wire everything up in an ASP.NET Core controller to compare streaming and unary side by side:
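A hedged sketch of such a controller (the controller name, routes, and the unary RPC name `GetPerformanceStatsAsync` are assumptions for illustration; only the streaming RPC name comes from the article):

```csharp
[ApiController]
[Route("[controller]")]
public class PerformanceController : ControllerBase
{
    private readonly Monitor.MonitorClient _client;

    public PerformanceController(Monitor.MonitorClient client) => _client = client;

    // Streaming: one gRPC call, no matter how many items we process.
    [HttpGet("streaming/{count}")]
    public async Task<IActionResult> GetStreaming(int count)
    {
        using var call = _client.GetManyPerformanceStats();

        var readTask = Task.Run(async () =>
        {
            await foreach (var _ in call.ResponseStream.ReadAllAsync()) { }
        });

        for (var i = 0; i < count; i++)
            await call.RequestStream.WriteAsync(new PerformanceStatusRequest());

        await call.RequestStream.CompleteAsync();
        await readTask;
        return Ok($"Processed {count} items over one streaming call.");
    }

    // Unary: one gRPC call per item.
    [HttpGet("unary/{count}")]
    public async Task<IActionResult> GetUnary(int count)
    {
        for (var i = 0; i < count; i++)
            await _client.GetPerformanceStatsAsync(new PerformanceStatusRequest());

        return Ok($"Processed {count} items with {count} unary calls.");
    }
}
```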
Compare this to the unary version, which makes `count` separate calls. The calling code looks almost identical, but the underlying mechanism is completely different: the streaming version sends all messages over a single open stream, while the unary version opens and closes a new stream for each message.
8. Performance Comparison: Unary vs. Streaming
Here is the measured performance for processing 1,000 items in both approaches:
| Approach | gRPC calls made | Time for 1,000 items |
|---|---|---|
| Repeated unary calls (reused channel) | 1,000 | ~15 seconds |
| Single bi-directional streaming call | 1 | ~3 seconds |
The bi-directional streaming call is 5 times faster despite processing exactly the same number of messages and returning exactly the same number of results. The only difference is the mechanism used to transmit them.
The performance gain comes entirely from eliminating the per-call HTTP/2 overhead. With 1,000 unary calls, you pay that overhead 1,000 times. With one streaming call, you pay it once.
9. When Should You Use Streaming vs. Unary?
Streaming is not always the right choice. Here is a practical guide:
| Situation | Recommended approach |
|---|---|
| Occasional single calls (e.g., user login, one-time lookup) | Unary — simple and sufficient |
| Batch processing of many items in one operation | Bi-directional streaming |
| Real-time data feed from server (e.g., stock ticker) | Server streaming |
| Uploading many files or large chunks to the server | Client streaming |
| Chat applications, real-time collaborative editing | Bi-directional streaming |
| Simple CRUD-style operations triggered by user actions | Unary — easier to implement and reason about |
Rule of thumb: If you find yourself calling the same unary RPC many times in a loop to process a collection, that is a strong signal that a streaming RPC would serve you better.
10. Summary
- A unary call is one request and one response. It is the simplest form of gRPC communication and fine for occasional single calls.
- When you need to process many items, making repeated unary calls accumulates per-call HTTP/2 overhead that adds up significantly.
- Bi-directional streaming opens one long-lived call through which both client and server send as many messages as needed. The per-call overhead is paid only once.
- To define a streaming RPC in `.proto`, add the `stream` keyword before the request and/or response type: `rpc GetManyPerformanceStats (stream PerformanceStatusRequest) returns (stream PerformanceStatusResponse);`
- On the server, override the method with `IAsyncStreamReader` and `IServerStreamWriter` parameters. Read from one, write to the other.
- On the client, open the streaming call, start a background reading task, write all requests, close the request stream, and await the reading task.
- Measured result: 5x faster than repeated unary calls for 1,000 messages (~3 seconds vs. ~15 seconds).