<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Apr 13, 2023 at 10:20 AM Radim Vansa <<a href="mailto:rvansa@azul.com">rvansa@azul.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

  <div>

    <p><br>

    </p>

    <div>On 13. 04. 23 15:20, Dan Heidinga

      wrote:<br>

    </div>

    <blockquote type="cite">

      <pre>

@Dan, this is very interesting!

Could you please elaborate a bit further, perhaps in the context of the CrackDemoExt.java sample?

Let me think on that.  I'll see if I can pull something together that shows the api use.

I put together a small example showing the use of SwitchPoint to toggle between phases: normal mode, beforeCheckpoint, afterRestore, normal mode. [0]

In the CRaCPhase class, there are two methods that take Function arguments that allow the user to provide phase-specific behaviour:

* beforeGuard, which allows switching from normal mode to checkpoint mode: <a href="https://github.com/DanHeidinga/SwitchPointExample/blob/b09fdb2a5d203950abc9de4facbd1435585bf3af/CRaCPhase.java#L15" target="_blank">https://github.com/DanHeidinga/SwitchPointExample/blob/b09fdb2a5d203950abc9de4facbd1435585bf3af/CRaCPhase.java#L15</a>

* aroundGuard, which allows switching from normal mode to checkpoint mode and back to normal mode: <a href="https://github.com/DanHeidinga/SwitchPointExample/blob/b09fdb2a5d203950abc9de4facbd1435585bf3af/CRaCPhase.java#L28" target="_blank">https://github.com/DanHeidinga/SwitchPointExample/blob/b09fdb2a5d203950abc9de4facbd1435585bf3af/CRaCPhase.java#L28</a>

There's a use of this pattern in the "Test" class [1] which transitions from a regular get to a locked get.

The ideas are all there though the code is a little unpleasant to work with due to the exception handling and general complexity of MethodHandles.
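
For readers without the repository open, the core of the guard pattern can be sketched roughly as follows. This is a minimal illustration and not the code from the linked CRaCPhase class: the class SwitchPointSketch and the methods fastGet, lockedGet, get and enterCheckpointMode are hypothetical names chosen for this sketch. Only SwitchPoint, MethodHandles and MethodType come from java.lang.invoke.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

final class SwitchPointSketch {
    // Hypothetical fast and slow paths; the names are illustrative only.
    static String fastGet() { return "fast"; }
    static String lockedGet() { return "locked"; }

    // Valid while the application is in normal mode.
    static final SwitchPoint NORMAL_MODE = new SwitchPoint();

    // Rooted in a static final field so the JIT may treat it as a constant.
    static final MethodHandle GET = buildGuard();

    static MethodHandle buildGuard() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType type = MethodType.methodType(String.class);
            MethodHandle fast = lookup.findStatic(SwitchPointSketch.class, "fastGet", type);
            MethodHandle slow = lookup.findStatic(SwitchPointSketch.class, "lockedGet", type);
            // Calls go to 'fast' until the SwitchPoint is invalidated, then to 'slow'.
            return NORMAL_MODE.guardWithTest(fast, slow);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static String get() {
        try {
            return (String) GET.invokeExact();
        } catch (Throwable t) {
            // invokeExact declares Throwable; this is the exception handling noise mentioned above.
            throw new RuntimeException(t);
        }
    }

    // In this sketch, a Resource's beforeCheckpoint would call this.
    static void enterCheckpointMode() {
        SwitchPoint.invalidateAll(new SwitchPoint[] { NORMAL_MODE });
    }
}
```

Calling SwitchPointSketch.get() returns "fast" until enterCheckpointMode() runs and "locked" afterwards. A SwitchPoint cannot be reset once invalidated, so going back to normal mode needs a fresh SwitchPoint and a new guard. The guard by itself also does nothing about a thread that is already executing the fast path when the switch happens.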

Radim has an RCU lock that uses SwitchPoints as well, though his API appears to be more pleasant for users: <a href="https://github.com/openjdk/crac/pull/58/files" target="_blank">https://github.com/openjdk/crac/pull/58/files</a></pre>

    </blockquote>

    <p><br>

    </p>

    <p>I think that it's not only about a nicer API; I think that your

      example does not prevent running Test.<span>getSpecialValueRaw()

        and resource </span><span>beforeCheckpoint/afterRestore

        concurrently - if one of the threads enters the

        Test.getSpecialValueRaw method there's nothing that would

        prevent calling beforeCheckpoint(). In other words, you'd need

        the special single-threaded mode.</span></p></div></blockquote><div><br></div><div>You're right.  Sufficiently bad timing with thread scheduling could allow the old value to be seen concurrently (or worse, even after) beforeCheckpoint/afterRestore.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>

    <p><span>While I've also used SwitchPoint as you

        suggested in my PR, can you tell me what the difference is between

        just reading a volatile variable (and deciding based on the

        value) and using this class? It seems that it's used mostly in

        scripting support, so I could imagine the utility of generating

        a compact MethodHandle, but is there really any magic?</span></p></div></blockquote><div><br></div><div>The benefit of SwitchPoints (which are built on top of MutableCallSite (MCS) and its syncAll behaviour) is that when rooted in a static final field or invokedynamic callsite, C2 can create a Dependency on the methods that call through the SwitchPoint (i.e. the underlying MCS) and force a deoptimization when the MCS.target MH is changed.  This makes the "if" check basically free, as there's a deopt when the SwitchPoint flips and forces the MCS target MH to change.</div><div><br></div><div>There are lots of caveats on the above analysis and on actually getting the optimization to happen in practice.  I'm not sure we can reliably provoke it without generating bytecode ourselves.</div><div><br></div><div>--Dan</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>

    <p><span>Radim<br>

      </span></p>

    <p><br>

    </p>

    <blockquote type="cite">

      <pre>[0] <a href="https://github.com/DanHeidinga/SwitchPointExample/blob/main/CRaCPhase.java" target="_blank">https://github.com/DanHeidinga/SwitchPointExample/blob/main/CRaCPhase.java</a>

[1] <a href="https://github.com/DanHeidinga/SwitchPointExample/blob/b09fdb2a5d203950abc9de4facbd1435585bf3af/CRaCPhase.java#L114-L140" target="_blank">https://github.com/DanHeidinga/SwitchPointExample/blob/b09fdb2a5d203950abc9de4facbd1435585bf3af/CRaCPhase.java#L114-L140</a>

--Dan

Needs more exploration and prototyping but would provide a potential path to reasonable performance by burying the extra locking in the fallback paths.  And it would be a single pattern to optimize, rather than all the variations users could produce.

--Dan

[0] <a href="https://blog.openj9.org/2022/10/14/openj9-criu-support-a-look-under-the-hood/" target="_blank">https://blog.openj9.org/2022/10/14/openj9-criu-support-a-look-under-the-hood/</a>

[1] <a href="https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/invoke/SwitchPoint.html" target="_blank">https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/invoke/SwitchPoint.html</a>

Thank you,

 - Christian

Cheers,

Radim

[1] <a href="https://en.wikipedia.org/wiki/Read-copy-update" target="_blank">https://en.wikipedia.org/wiki/Read-copy-update</a>

On 03. 04. 23 22:30, Christian Tzolov wrote:

</pre>

      <blockquote type="cite">

        <pre>Hi, I'm testing CRaC in the context of long-running applications (e.g. streaming, continuous processing ...) and I've stumbled on an issue related to the coordination of the restored threads.

For example, let's have a Processor that performs continuous computations. This processor depends on a ProcessorContext, and the latter must be fully initialized before the processor can process any data.

When the application is first started (i.e. not from a checkpoint), it ensures that the ProcessorContext is initialized before starting the Processor loop.

To leverage CRaC I've implemented a ProcessorContextResource that gracefully stops the context on beforeCheckpoint and then re-initializes it on afterRestore.

When the checkpoint is performed, CRaC calls ProcessorContextResource.beforeCheckpoint and also preserves the current Processor call stack. On restore, the processor's call stack is, as expected, restored at the point where it was stopped, but unfortunately it doesn't wait for ProcessorContextResource.afterRestore to complete. Unsurprisingly, this crashes the processor.
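
One conventional way to close this window, sketched here without the CRaC API so that it stays self-contained, is to have the processing loop take a read lock around each unit of work while the resource holds the write lock from beforeCheckpoint until afterRestore. The class CheckpointGate and its methods are hypothetical names for this sketch and are not taken from crac-demo. The sketch also assumes that beforeCheckpoint and afterRestore run on the same thread, because a ReentrantReadWriteLock write lock must be released by the thread that acquired it.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class CheckpointGate {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Stands in for "the ProcessorContext is initialized" in this sketch.
    private boolean contextReady = true;

    // Called by the processing loop for each unit of work.
    int process(int input) {
        lock.readLock().lock(); // blocks while a checkpoint or restore is in progress
        try {
            if (!contextReady) {
                throw new IllegalStateException("context not initialized");
            }
            return input + 1; // placeholder for the real computation
        } finally {
            lock.readLock().unlock();
        }
    }

    // Would be called from the resource's beforeCheckpoint.
    void beforeCheckpoint() {
        lock.writeLock().lock(); // waits for in-flight process() calls to finish
        contextReady = false;    // placeholder for stopping the context
    }

    // Would be called from the resource's afterRestore, on the same thread.
    void afterRestore() {
        contextReady = true;     // placeholder for re-initializing the context
        lock.writeLock().unlock();
    }

    boolean isPaused() {
        return lock.isWriteLocked();
    }
}
```

With this gate, a processor thread that is restored in the middle of its loop stays blocked on the read lock until afterRestore has released the write lock. The cost is a lock acquisition on every call to process, which is the overhead the SwitchPoint discussion in this thread tries to move off the fast path.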

The <a href="https://github.com/tzolov/crac-demo" target="_blank">https://github.com/tzolov/crac-demo</a> project illustrates this issue. The README explains how to reproduce it. The OUTPUT.md (<a href="https://github.com/tzolov/crac-demo/blob/main/OUTPUT.md" target="_blank">https://github.com/tzolov/crac-demo/blob/main/OUTPUT.md</a> ) offers terminal snapshots of the observed behavior.

I've used the latest JDK CRaC release:

   openjdk 17-crac 2021-09-14

   OpenJDK Runtime Environment (build 17-crac+5-19)

   OpenJDK 64-Bit Server VM (build 17-crac+5-19, mixed mode, sharing)

As I'm new to CRaC, I'd appreciate your thoughts on this issue.

Cheers,

Christian

</pre>

      </blockquote>

      <pre></pre>

    </blockquote>

  </div>

</blockquote></div></div>