Aggregation and Action in PLINQ
While many developers use PLINQ primarily for filtering (Where) and mapping (Select), its true power for data-heavy applications lies in how it handles Aggregation and Result Processing. By using specialized methods like Aggregate() and ForAll(), you can move beyond simple queries to high-performance data computation and side-effect execution.
1. Parallel Aggregation: High-Speed Computation
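In standard LINQ, Aggregate() is a fold: it walks the sequence one element at a time and threads a single accumulator through every step. PLINQ adds an Aggregate() overload that takes a seed factory, an update function, and a combine function. It is more involved to write, but it lets each worker thread accumulate independently, which can be much faster on large datasets when the per-element work is non-trivial.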
Instead of a single accumulator, PLINQ:
- Partitions the data into chunks.
- Calculates a local subtotal for each chunk on different threads.
- Combines those subtotals into a final result.
Code Example: Custom Aggregation
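The three steps above can be sketched with the ParallelEnumerable.Aggregate overload that takes a seed factory, an update function, a combine function, and a result selector. This is a minimal illustration rather than a tuned implementation: the class name AggregationSketch, the method SumOfSquares, and the choice of a sum of squares as the workload are assumptions made for the example.

```csharp
using System;
using System.Linq;

public static class AggregationSketch
{
    // Hypothetical helper: sums n * n over the input using per-partition subtotals.
    public static long SumOfSquares(int[] numbers)
    {
        return numbers
            .AsParallel()
            .Aggregate(
                () => 0L,                                  // seed factory: a fresh subtotal per partition
                (subtotal, n) => subtotal + (long)n * n,   // update: fold one element into the local subtotal
                (total, subtotal) => total + subtotal,     // combine: merge the subtotals of two partitions
                total => total);                           // result selector: final projection
    }

    public static void Main()
    {
        int[] numbers = Enumerable.Range(1, 1_000_000).ToArray();
        Console.WriteLine(SumOfSquares(numbers));
    }
}
```

The combine step here is plain addition, so the result does not depend on which partition finishes first. An operation that is not associative and commutative (string concatenation, for instance) would generally give results that vary from run to run.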
2. The ForAll Method: Parallel Execution
When you want to perform an action (like logging or saving to a database) for every item in a query, you usually use a foreach loop. However, a foreach loop over a PLINQ query consumes results sequentially: the query itself may run in parallel, but its results have to be merged and handed back one by one to the calling thread.
ForAll() is the parallel alternative. It executes the specified action on the worker threads directly, without merging the data back to the caller thread first.
Key Difference: foreach vs ForAll
| Feature | foreach over a query | ForAll (PLINQ) |
| --- | --- | --- |
| Execution Thread | Action runs on the calling thread | Action runs on multiple worker threads |
| Order | Sequential, one item at a time | Non-deterministic |
| Performance | Slower (results must be merged first) | Faster (no merge overhead) |
Code Example: Thread-Aware Execution
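A minimal sketch of ForAll is shown here, recording which managed thread handled each item. The names ForAllSketch and Process, and the even-number filter, are assumptions made for illustration. Results are collected in a ConcurrentBag<string> so that the delegate is safe to run on several threads at once.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

public static class ForAllSketch
{
    // Hypothetical helper: filters even numbers and records the thread that handled each one.
    public static ConcurrentBag<string> Process(int count)
    {
        var log = new ConcurrentBag<string>();

        Enumerable.Range(1, count)
            .AsParallel()
            .Where(n => n % 2 == 0)
            // The action runs on the worker threads; nothing is merged back to the caller.
            .ForAll(n => log.Add(
                $"Item {n} handled on thread {Environment.CurrentManagedThreadId}"));

        return log;
    }

    public static void Main()
    {
        // The order of these lines, and the thread ids in them, can differ between runs.
        foreach (string line in Process(20))
            Console.WriteLine(line);
    }
}
```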
Note: Because ForAll runs its delegate on multiple threads at once, any shared resource the delegate touches (such as a shared List<T> or Dictionary<TKey, TValue>) must be protected with a lock or replaced with a thread-safe collection to avoid race conditions.
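The two usual ways of protecting shared state inside ForAll can be sketched as follows. The class SharedStateSketch, its method names, and the word-length counting task are hypothetical and chosen only to keep the example small.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public static class SharedStateSketch
{
    // Option A: a plain Dictionary guarded by a lock.
    public static Dictionary<int, int> CountByLengthWithLock(string[] words)
    {
        var counts = new Dictionary<int, int>();
        var gate = new object();

        words.AsParallel().ForAll(word =>
        {
            lock (gate) // only one thread at a time may read or write the dictionary
            {
                counts.TryGetValue(word.Length, out int current);
                counts[word.Length] = current + 1;
            }
        });

        return counts;
    }

    // Option B: a ConcurrentDictionary, which handles the synchronization internally.
    public static ConcurrentDictionary<int, int> CountByLengthConcurrent(string[] words)
    {
        var counts = new ConcurrentDictionary<int, int>();

        words.AsParallel().ForAll(word =>
            counts.AddOrUpdate(word.Length, 1, (_, current) => current + 1));

        return counts;
    }

    public static void Main()
    {
        string[] words = { "a", "bb", "cc", "ddd" };
        Console.WriteLine(CountByLengthWithLock(words)[2]);
        Console.WriteLine(CountByLengthConcurrent(words)[2]);
    }
}
```

When the shared state is really just a running total, a parallel Aggregate() with per-partition accumulators often avoids the contention altogether.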
3. Preserving Sequence with AsOrdered()
As the ForAll section shows, PLINQ prioritizes speed over sequence. If you are processing data where the order of the source matters (for example, a time series of financial transactions), you must explicitly tell PLINQ to preserve the source order in its results.
Using AsOrdered():
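A minimal sketch, assuming a simple doubling projection as a stand-in for real work; the names AsOrderedSketch and DoubleInSourceOrder are illustrative. The elements are still processed in parallel, but the merged output follows the order of the source.

```csharp
using System;
using System.Linq;

public static class AsOrderedSketch
{
    // Hypothetical helper: doubles each element while keeping source order in the output.
    public static int[] DoubleInSourceOrder(int[] source)
    {
        return source
            .AsParallel()
            .AsOrdered()          // ask PLINQ to preserve source order when merging
            .Select(n => n * 2)   // work still runs on multiple threads
            .ToArray();
    }

    public static void Main()
    {
        int[] source = { 1, 2, 3, 4, 5 };
        Console.WriteLine(string.Join(", ", DoubleInSourceOrder(source)));
        // prints "2, 4, 6, 8, 10"
    }
}
```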
The Performance Cost: Be aware that .AsOrdered() adds overhead. PLINQ must buffer results and reassemble them in source order before handing them to you, which can reduce the speed gains of using parallelism in the first place.
Summary Checklist for Advanced PLINQ
- Use Aggregate() when you need to reduce a massive dataset into a single value (Sum, Product, Custom Statistics).
- Use ForAll() when you need to execute an independent action for every item and don't care about the order.
- Avoid AsOrdered() unless the business logic strictly requires sequential results, as it adds overhead.
- Thread Safety: Always ensure that delegates passed to Aggregate or ForAll do not modify shared global state without proper synchronization.