FJP::setParallelism update

Fri May 27 14:25:33 UTC 2022

On 5/27/22 15:59, Dan Heidinga wrote:
> Definitely the common pool as it's managed by the JVM (though we won't
> change it if the parallelism has been set by env var).

Got it, thanks.

> For other pools, I'm not sure what the right behaviour is yet. They
> would have been tuned based on the # of processors of the machine they
> were running on prior to the checkpoint. After the checkpoint, if
> we're running on a different machine or have a different share of the
> # of processors, then previous tunings - even if explicit - are likely
> wrong.

On the other hand, wouldn't that be a performance problem rather than a
correctness issue?

As a user, I would prefer that my settings be preserved, if I am able to
change them, rather than adjusted automatically in a way that I probably
don't want and would need to revert.

> We could develop an adjustment factor: abs(prevCPUs - currentCPUs)
> and apply that to each pool in an attempt to preserve user intent, but
> it may be better to have users directly update their pools. 

I agree that having users set parallelism explicitly looks like the best
approach, since we don't know the intent. But the intent could probably be
expressed directly at the interface level, e.g. by setting parallelism as a
fraction of the available cores. Then it would be possible to distinguish
parallelism given as a fixed number from parallelism that is a function of
the available cores; only the latter needs adjustment.
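
A rough sketch of what I mean, just to illustrate the idea. ParallelismSpec
and its methods are hypothetical names, not an existing API, and the rounding
rule is an arbitrary choice:

```java
import java.util.concurrent.ForkJoinPool;

/** Hypothetical helper: parallelism as a fixed number or a fraction of cores. */
final class ParallelismSpec {
    private final int fixed;        // used only when fraction is NaN
    private final double fraction;  // NaN marks a fixed spec

    private ParallelismSpec(int fixed, double fraction) {
        this.fixed = fixed;
        this.fraction = fraction;
    }

    /** A fixed value: never adjusted, whatever machine we restore on. */
    static ParallelismSpec fixed(int n) {
        if (n < 1) throw new IllegalArgumentException("parallelism must be >= 1");
        return new ParallelismSpec(n, Double.NaN);
    }

    /** A value derived from the core count: the only kind that needs adjusting. */
    static ParallelismSpec fractionOfCores(double f) {
        if (!(f > 0.0)) throw new IllegalArgumentException("fraction must be > 0");
        return new ParallelismSpec(0, f);
    }

    boolean dependsOnCores() {
        return !Double.isNaN(fraction);
    }

    /** Resolves the spec against a core count; at least 1 (assumed rounding rule). */
    int resolve(int availableCores) {
        if (!dependsOnCores()) return fixed;
        return Math.max(1, (int) Math.round(fraction * availableCores));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        ParallelismSpec spec = ParallelismSpec.fractionOfCores(0.5);
        ForkJoinPool pool = new ForkJoinPool(spec.resolve(cores));
        System.out.println("parallelism = " + pool.getParallelism());
        pool.shutdown();
    }
}
```

After restore, only the specs for which dependsOnCores() is true would be
resolved again against the new core count; fixed ones stay as the user set
them.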

> This
> would be a great use for Volkier's "snapsafety" concept if we could
> treat use of a FJP as a warning/error that moved up the call stack and
> either prevented checkpointing or was addressed by someone who
> implemented Resource.

I think it is possible to hack in something similar right now. E.g.
make the FJP implement the Resource interface with methods that throw an
exception ("you should not have FJP") on checkpoint. To get checkpoint
working again, a programmer would need to override those interface methods
in the FJP with sensible logic for checkpoint/restore, e.g. do nothing.
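
Roughly like this. To keep the sketch self-contained, CheckpointResource is
a simplified local stand-in for the real Resource interface (its methods
take no context argument here), and the class names are made up:

```java
import java.util.concurrent.ForkJoinPool;

/** Simplified stand-in for the Resource interface (an assumption of this sketch). */
interface CheckpointResource {
    void beforeCheckpoint() throws Exception;
    void afterRestore() throws Exception;
}

/** A pool that refuses to be checkpointed unless a subclass opts in. */
class GuardedForkJoinPool extends ForkJoinPool implements CheckpointResource {
    GuardedForkJoinPool(int parallelism) {
        super(parallelism);
    }

    @Override
    public void beforeCheckpoint() throws Exception {
        // Default behaviour: fail the checkpoint so the pool gets noticed.
        throw new IllegalStateException("you should not have FJP");
    }

    @Override
    public void afterRestore() throws Exception {
        // Nothing to do by default.
    }
}

/** The programmer opts in by overriding the method with sensible logic. */
class CheckpointAwarePool extends GuardedForkJoinPool {
    CheckpointAwarePool(int parallelism) {
        super(parallelism);
    }

    @Override
    public void beforeCheckpoint() {
        // Do nothing: this pool is declared safe to checkpoint.
    }
}

class CheckpointGuardDemo {
    public static void main(String[] args) throws Exception {
        GuardedForkJoinPool guarded = new GuardedForkJoinPool(2);
        try {
            guarded.beforeCheckpoint();
        } catch (IllegalStateException e) {
            System.out.println("checkpoint refused: " + e.getMessage());
        }
        CheckpointAwarePool aware = new CheckpointAwarePool(2);
        aware.beforeCheckpoint(); // passes silently
        guarded.shutdown();
        aware.shutdown();
    }
}
```

In a real version the pool would also have to be registered with the
checkpoint machinery so that beforeCheckpoint is actually called; that part
is left out here.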

I'm not sure whether this fits the snapsafety concept.

Thanks,
Anton