Why one can't submit an alternative thread pool to the streams API?

Sat Oct 26 00:45:09 UTC 2019

On Sun, Oct 13, 2019 at 9:17 AM Brian Goetz <brian.goetz at oracle.com> wrote:

> Don’t try to create and manage your own pools.  Don’t try to expose APIs
> that encourage your users to create and manage their own pools.  Use
> parallel streams for data parallelism, which automatically use the common
> pool. The common pool is sized to the number of cores; if you create more
> threads, then they just compete for the cores, and you’re paying for extra
> context switching and you will get less throughput — plus (much) more
> memory consumption, more configuration, and more complexity.
>
> The “I know what is happening” that you are seeking is an illusion.  Don’t
> be tempted by it.  99.9% of the time, you’re just going to make it worse.
>

This advice isn't optimal for every type of parallel workload. For example,
on spinning disks with high seek latency, reading and/or writing multiple
large files in parallel (one file per stream item) makes throughput
collapse once a parallel stream uses more than 2 or so threads, because
the disk head spends its time seeking between files instead of
transferring data.

Most of the time, when a job is submitted to a custom pool, the goal is to
control the level of parallelism. If you don't want people to use anything
other than the common pool, then you should at least add a provision to the
API to set the level of parallelism for a given stream, e.g. `.parallel(2)`.
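
For reference, here is a minimal sketch of the workaround people tend to
use today to get that effect. It relies on behavior that, as far as I
know, is not guaranteed by the streams specification: a parallel stream
started from inside a ForkJoinPool task runs its subtasks in that pool
rather than in the common pool. The names `BoundedParallelStream`,
`runWithParallelism` and `threadsUsed` are made up for illustration, and
the `Thread.sleep` call is only a stand-in for per-file I/O.

```java
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class BoundedParallelStream {

    // Hypothetical helper: runs the task inside a dedicated ForkJoinPool.
    // ASSUMPTION: a parallel stream started from within this task executes
    // in this pool, not the common pool. This is observed behavior, not a
    // documented guarantee.
    static <T> T runWithParallelism(int parallelism, Callable<T> task)
            throws Exception {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.submit(task).get();
        } finally {
            pool.shutdown();
        }
    }

    // Processes `items` dummy work items in a parallel stream and returns
    // the names of the threads that handled them.
    static Set<String> threadsUsed(int parallelism, int items)
            throws Exception {
        return runWithParallelism(parallelism, () ->
                IntStream.range(0, items)
                        .parallel()
                        .mapToObj(i -> {
                            try {
                                Thread.sleep(20); // stand-in for file I/O
                            } catch (InterruptedException e) {
                                Thread.currentThread().interrupt();
                            }
                            return Thread.currentThread().getName();
                        })
                        .collect(Collectors.toSet()));
    }

    public static void main(String[] args) throws Exception {
        Set<String> names = threadsUsed(2, 16);
        System.out.println(names.size() + " worker thread(s): " + names);
    }
}
```

The common pool can also be resized globally with the
`java.util.concurrent.ForkJoinPool.common.parallelism` system property,
but that changes the limit for every user of the common pool in the JVM,
not for one stream, which is why a per-stream setting would be preferable.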