<div dir="ltr">Brian, thanks for your comments!<div><br></div><div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> - Generate perfectly portable everything, even if that means generating worse everything.  This is the route you are suggesting, but obviously this has limits that will be reached pretty quickly anyway<br></blockquote><div><br></div><div>I wouldn't call for generating perfectly portable code. Portability would depend on which micro-architectures we are willing to support and on finding the common set of CPU features they all provide.</div><div><br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">-Record environmental dependencies, but be willing to toss them if they don’t match the runtime environment.  For things like an AOT’ed code cache, the code cache could be flushed and we could fall back to dynamic code generation.  <br></blockquote><div><br></div><div>I agree that recording the environmental dependencies would always be required. Even when generating portable code, we should record the dependencies, at least to verify that the AOT'ed code can be executed in the runtime environment.</div><div><br></div><div>To expand a bit more on the portability aspect, the idea behind choosing a set of CPU features is to make the code executable on a broader range of micro-architectures.</div><div>Let's say the application needs to be deployed to a cloud where the user may not know the micro-architecture of the systems on which the code will eventually be executed. 
</div><div>In such a case, if the ahead-of-time condensation produces code that uses a specific CPU feature available on the system it is running on, then the AOT'ed code may not be usable on the deployment systems.</div><div>We can fall back to dynamic code generation, but then we lose the startup benefits we would have had if the code had been more portable.</div><div><br></div><div>Even if the portable code is inferior in quality, it can still provide a comparatively quicker startup. At runtime the code can be upgraded (I guess based on profiling) to a more efficient version that exploits all the CPU features of the underlying architecture.</div><div><br></div><div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr">- Ashutosh Mehra</div></div></div><br></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Dec 23, 2022 at 3:15 PM Brian Goetz &lt;<a href="mailto:brian.goetz@oracle.com">brian.goetz@oracle.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<div style="overflow-wrap: break-word;">

You identify a problem (generated code has environmental dependencies which may not carry into other environments) and a solution (detune the generated code to something generic.)  I agree with the problem, but there are many other possible solutions, and we

 shouldn’t prematurely snap to one of them.  

<div><br>

</div>

<div>Moving some work from runtime to earlier phases (whether through a Leyden condenser, a classfile rewriter, etc) always creates the possibility that this work depends on some environmental characteristic of the early execution environment.  The

 list is endless: hardware, operating system, environment variables, time and time zone, JDK and other dependency versions, etc.  </div>

<div><br>

</div>

<div>Strategies for dealing with this include:</div>

<div><br>

</div>

<div> - Record environmental dependencies, and fail if the code is restarted when the dependencies are not met.  This is entirely reasonable when applied to things like “you ran your checkpoint on Intel/Windows, but you’re resuming on ARM/MacOS”; it

 is merely a matter of judgment how fine-grained to record and enforce.  </div>

<div><br>

</div>

<div> - Generate perfectly portable everything, even if that means generating worse everything.  This is the route you are suggesting, but obviously this has limits that will be reached pretty quickly anyway.  Nothing is going to save you from trying

 to resume on a different architecture / OS / incompatible JVM version / wrong class path / etc.  </div>

<div><br>

</div>

<div> - Record environmental dependencies, but be willing to toss them if they don’t match the runtime environment.  For things like an AOT’ed code cache, the code cache could be flushed and we could fall back to dynamic code generation.  </div>

<div><br>

</div>

<div> - Train on a variety of architectures and include multiple versions of the AOT’ed code in the binary.   </div>

<div><br>

</div>

<div>… and plenty of others.</div>
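<div><br></div>
<div>To make three of the strategies above concrete (fail on mismatch, discard and fall back, ship multiple versions), here is a minimal sketch. Everything in it is an assumption made for illustration: the class name <code>AotCachePolicy</code>, the <code>Action</code> enum, and the idea of encoding CPU features as bits in a <code>long</code> are hypothetical and do not describe how HotSpot, CRaC, or Leyden actually record dependencies.</div>
<pre>
```java
// Illustrative only: a hypothetical policy for deciding what to do with AOT'ed code.
// Feature sets are modelled as bit masks; this encoding is an assumption of the sketch.
final class AotCachePolicy {
    enum Action { USE_AOT_CODE, DISCARD_AND_JIT, FAIL }

    // AOT'ed code is usable only if every feature it was compiled against
    // is also present in the runtime environment (recorded is a subset of runtime).
    static boolean compatible(long recorded, long runtime) {
        return (recorded & runtime) == recorded;
    }

    // Strategy 1 (failOnMismatch = true): refuse to run on a mismatch.
    // Strategy 3 (failOnMismatch = false): flush the cache and compile dynamically.
    static Action decide(long recorded, long runtime, boolean failOnMismatch) {
        if (compatible(recorded, runtime)) {
            return Action.USE_AOT_CODE;
        }
        return failOnMismatch ? Action.FAIL : Action.DISCARD_AND_JIT;
    }

    // Strategy 4: several variants are shipped; pick the compatible one that
    // uses the most features. Returns -1 if no variant can run here.
    static int selectVariant(long[] variantFeatures, long runtime) {
        int best = -1;
        for (int i = 0; i < variantFeatures.length; i++) {
            if (!compatible(variantFeatures[i], runtime)) {
                continue;
            }
            if (best == -1
                    || Long.bitCount(variantFeatures[i]) > Long.bitCount(variantFeatures[best])) {
                best = i;
            }
        }
        return best;
    }
}
```
</pre>
<div>In this sketch the only thing that differs between "fail" and "toss and fall back" is the reaction to a failed subset check, which is why how fine-grained the recorded dependencies are matters more than the check itself.</div>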

<div><br>

</div>

<div>A key aspect of Leyden is not only ahead-of-time condensation, but recording constraints that capture assumptions inherent in those condensations, and prevent operations that would contradict those assumptions (e.g., prohibit redefinition of classes

 that have been used to generate AOT’ed code.) </div>

<div><br>

</div>

<div>

<div><br>

<blockquote type="cite">

<div>On Dec 23, 2022, at 12:46 PM, Ashutosh Mehra &lt;<a href="mailto:asmehra@redhat.com" target="_blank">asmehra@redhat.com</a>&gt; wrote:</div>

<br>

<div>

<div dir="ltr">

<div>Hello,</div>

<div><br>

</div>

Earlier this year we ran some experiments on the portability of checkpoints under Project CRaC and prepared a document [0] describing our findings.

<div>We feel parts of it would be relevant to Project Leyden as well, as it <i>may</i> enable ahead-of-time compilation in the future (I know we are not there yet!)</div>

<div><br>

<div>For the CRaC checkpoints, we found that the code generated by the C1/C2 compilers is not always portable because it uses architecture-specific instructions. </div>

<div>The same would hold true for AOT compilation as well.</div>

<div>To make the code portable, the C1/C2 compilers should be given a minimal set of CPU features that they are allowed to exploit during the codegen phase. </div>

<div>However, this can also hurt the performance of the generated code, because it would no longer use all the features of the underlying architecture.</div>

<div>So some performance may have to be traded for portability.</div>
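<div><br></div>
<div>As a rough illustration of what a "minimal set of CPU features" could mean, the sketch below intersects the feature sets of the micro-architectures we want to support. The class name <code>BaselineFeatures</code>, the feature constants, and their bit positions are all made up for this example; they are not the encoding used by C1/C2 or by the document in [0].</div>
<pre>
```java
// Illustrative only: hypothetical feature bits, not HotSpot's actual encoding.
final class BaselineFeatures {
    static final long SSE2    = 1L << 0;
    static final long AVX     = 1L << 1;
    static final long AVX2    = 1L << 2;
    static final long AVX512F = 1L << 3;

    // The portable baseline is the set of features present on every target:
    // the bitwise AND of all target feature masks.
    static long common(long... targets) {
        if (targets.length == 0) {
            throw new IllegalArgumentException("at least one target is required");
        }
        long mask = ~0L;
        for (long target : targets) {
            mask &= target;
        }
        return mask;
    }
}
```
</pre>
<div>For example, with three assumed targets supporting {SSE2, AVX, AVX2, AVX512F}, {SSE2, AVX, AVX2} and {SSE2, AVX}, the baseline is {SSE2, AVX}: code restricted to it runs on all three but gives up AVX2 and AVX512F on the machines that have them.</div>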

<div><br>

</div>

<div>If anyone has thoughts on this aspect of the problem, please share them.<br>

</div>

<div><br>

</div>

<div>

<div>[0] <a href="http://cr.openjdk.java.net/~heidinga/crac/Portability_of_checkpoints.pdf" target="_blank">http://cr.openjdk.java.net/~heidinga/crac/Portability_of_checkpoints.pdf</a></div>

<div><br>

</div>

<div>

<div>

<div dir="ltr">

<div dir="ltr">- Ashutosh Mehra</div>

</div>

</div>

</div>

</div>

</div>

</div>

</div>

</blockquote>

</div>

<br>

</div>

</div>

</blockquote></div>