[crac] RFR: Environment vars propagation into restored process

Mon Oct 3 15:24:38 UTC 2022

On Mon, 3 Oct 2022 14:23:04 GMT, Roman Marchenko <duke at openjdk.org> wrote:

>> One concern with this approach - it means that environment variables will change values after a restore.
>> 
>> It seems odd to call this a concern when it's the intended behaviour of this PR, but it is one.  Users typically cache environment variables in static fields or use them to make a one-time decision.  They don't expect them (at least at the Java layer) to change value during a run of the same process.
>> 
>> This change means two reads of the same env var can give different results at different times, which may put unsuspecting applications into inconsistent states if one location reads the env var before a restore and another reads it after.  That's going to be a hard-to-debug issue.
>> 
>> The VM may also read env vars and bind tightly to the value.  After a restore, native code will still see the original env while Java code sees the modified env.  Do we foresee any issues there?
>
> @DanHeidinga 
> Hi, 
> You're right in your concerns. Indeed the suggested enhancement changes the usual workflow, so users may be confused. 
> That is why we expect users to explicitly adapt their applications to this behaviour and make sure they work; otherwise there is no guarantee that an application runs successfully with CRaC.

@wkia You're right that users will need to adapt their applications to work with CRaC.  100% agree there.

The challenge for them will come when they use third-party libraries or update their existing applications to work with CRaC.  It's really easy to miss updating something, or not to realize the full blast radius of the changes required, when an env var becomes "stale" after a restore.
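
To make the "stale" case concrete, here is a minimal sketch. It does not use the CRaC API: the `env` map is a hypothetical stand-in for the process environment, the restore is simulated by mutating that map, and `StaleEnvDemo` and `SERVICE_ENDPOINT` are made-up names for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: `env` stands in for the process environment so
// the effect of a restore can be simulated without the CRaC API.
final class StaleEnvDemo {
    private final Map<String, String> env;

    // Read once at construction, the way a library might cache
    // System.getenv(...) in a static or final field.
    private final String cachedEndpoint;

    StaleEnvDemo(Map<String, String> env) {
        this.env = env;
        this.cachedEndpoint = env.get("SERVICE_ENDPOINT");
    }

    // Value captured before the (simulated) restore.
    String cached() {
        return cachedEndpoint;
    }

    // Value as seen by code that reads the environment again.
    String fresh() {
        return env.get("SERVICE_ENDPOINT");
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        env.put("SERVICE_ENDPOINT", "http://checkpoint-host");
        StaleEnvDemo demo = new StaleEnvDemo(env);

        // Simulated restore: the variable now carries the restore-time value.
        env.put("SERVICE_ENDPOINT", "http://restore-host");

        System.out.println("cached: " + demo.cached());
        System.out.println("fresh:  " + demo.fresh());
    }
}
```

The two accessors disagree after the simulated restore, which is the kind of inconsistency a third-party library could introduce without the application author noticing.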

To be safe, I think we need to review the use of env vars in the JDK and ensure that both the native code and the class libraries take correct action on changed env vars.

We should also consider doing something similar to the OpenJ9 approach, where we restrict the set of env vars available prior to the checkpoint (to minimize accidental use of the checkpoint env) and limit env var changes to only adding new env vars (so no inconsistencies arise).  This got them a long way in their work with Liberty, though they did eventually find it necessary to support overriding some env vars.
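
As a rough sketch of what such a policy could look like (this is not OpenJ9's implementation; `AddOnlyEnvPolicy`, `restrictBeforeCheckpoint`, and `mergeAddOnly` are hypothetical names, and plain maps stand in for the real environment):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical policy sketch: plain maps stand in for the process environment.
final class AddOnlyEnvPolicy {

    // Before the checkpoint, expose only an allowed subset of variables so
    // that code cannot accidentally bind to checkpoint-time values.
    static Map<String, String> restrictBeforeCheckpoint(
            Map<String, String> env, Set<String> allowed) {
        Map<String, String> visible = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : env.entrySet()) {
            if (allowed.contains(e.getKey())) {
                visible.put(e.getKey(), e.getValue());
            }
        }
        return visible;
    }

    // On restore, add variables that were not visible before, but never
    // change the value of one that was already visible.
    static Map<String, String> mergeAddOnly(
            Map<String, String> checkpointEnv, Map<String, String> restoreEnv) {
        Map<String, String> merged = new LinkedHashMap<>(checkpointEnv);
        for (Map.Entry<String, String> e : restoreEnv.entrySet()) {
            merged.putIfAbsent(e.getKey(), e.getValue());
        }
        return merged;
    }
}
```

Under this sketch a value read before the checkpoint can never differ from the same value read after the restore; the cost is that overrides are impossible, which is the limitation they eventually had to relax.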

With the approach in this PR, it will be hard for service engineers to know what the original env was and to debug issues related to changed env vars. Are there breadcrumbs we can leave to make that service work go more smoothly?
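
One possible shape for such breadcrumbs, purely as a sketch: compute the difference between the checkpoint-time and restore-time env and record it somewhere a service engineer can find it. `EnvBreadcrumbs` and `describeChanges` are hypothetical names, and the sketch records variable names only, on the assumption that values may contain secrets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper: summarizes how the env differs across a restore.
final class EnvBreadcrumbs {

    // Returns one line per variable that was added, removed, or changed,
    // ordered by variable name. Values are deliberately left out.
    static List<String> describeChanges(
            Map<String, String> before, Map<String, String> after) {
        TreeSet<String> names = new TreeSet<>(before.keySet());
        names.addAll(after.keySet());

        List<String> lines = new ArrayList<>();
        for (String name : names) {
            String oldValue = before.get(name);
            String newValue = after.get(name);
            if (oldValue == null) {
                lines.add("added " + name);
            } else if (newValue == null) {
                lines.add("removed " + name);
            } else if (!oldValue.equals(newValue)) {
                lines.add("changed " + name);
            }
        }
        return lines;
    }
}
```

The resulting lines could go to a log at restore time; where exactly they should be emitted is left open here.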

-------------

PR: https://git.openjdk.org/crac/pull/30