<div dir="ltr">Hi Vladimir,<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Is this with the one-step workflow? With the one-step workflow we should ignore the decomp count because code is generated not<br>during execution but from training data in a forked VM - no deoptimization happens there.</blockquote><div><br></div><div>Yes, this is with the 1-step workflow.</div><div><br></div><div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr">- Ashutosh Mehra</div></div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jul 24, 2024 at 3:56 PM Vladimir Kozlov <<a href="mailto:vladimir.kozlov@oracle.com">vladimir.kozlov@oracle.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Thank you for the report, Ashutosh<br>
<br>
Is this with the one-step workflow? With the one-step workflow we should ignore the decomp count because code is generated not <br>
during execution but from training data in a forked VM - no deoptimization happens there.<br>
<br>
`decomp count` was introduced for the 5-step workflow, where we generate AOT code as we execute the application, with the idea that the <br>
production run will follow the same compilation/deoptimization steps.<br>
<br>
Actually, I implemented it before we started using TD (training data) to trigger compilation. Maybe this is the reason the 5-step <br>
workflow is slow now that we use TD. I need to check.<br>
<br>
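As a rough illustration of the mismatch discussed in this thread, here is a toy sketch of a cache lookup keyed on the decomp count. It is only a model under stated assumptions: the class name, method names, and the `ignore_decomp` switch are hypothetical and are not taken from SCCache.cpp.<br>
<pre>
```python
# Toy model of a code-cache lookup; every name here is hypothetical and
# only illustrates the idea, it does not mirror the real SCCache code.
class CodeCacheSketch:
    def __init__(self, ignore_decomp):
        # When True, the decomp count is left out of the lookup key.
        self.ignore_decomp = ignore_decomp
        self.entries = {}

    def _key(self, method, comp_level, decomp):
        if self.ignore_decomp:
            return (method, comp_level)
        return (method, comp_level, decomp)

    def store(self, method, comp_level, decomp, code):
        self.entries[self._key(method, comp_level, decomp)] = code

    def find(self, method, comp_level, decomp):
        # Returns None when no entry matches (a "Missing entry" case).
        return self.entries.get(self._key(method, comp_level, decomp))


METHOD = "StackMapGenerator::processBlock"

# Entry written with decomp 1, looked up with decomp 0: the lookup misses.
strict = CodeCacheSketch(ignore_decomp=False)
strict.store(METHOD, 4, 1, "aot-code")
strict_hit = strict.find(METHOD, 4, 0)

# Same store and lookup, but the decomp count is ignored: the lookup hits.
relaxed = CodeCacheSketch(ignore_decomp=True)
relaxed.store(METHOD, 4, 1, "aot-code")
relaxed_hit = relaxed.find(METHOD, 4, 0)
```
</pre>
In this toy model the strict lookup misses because the entry was stored with decomp 1 while the request carries decomp 0; dropping decomp from the key, as suggested for the one-step workflow, lets the same request find the entry.<br>
<br>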
Thanks,<br>
Vladimir K<br>
<br>
On 7/24/24 7:54 AM, Ashutosh Mehra wrote:<br>
> During the startup of a quarkus app, I see a particular method that gets C2 compiled almost every time in the production <br>
> run with the premain branch. I don't see this happening with the mainline.<br>
> The reason this method caught my attention is the significant amount of memory its C2 compilation consumes (between <br>
> 25-40 MB) compared to the other compilations.<br>
> The method in question is <br>
> jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z<br>
> <br>
> The assembly phase added two entries for this method in the code cache:<br>
> <br>
> [3.391s][info ][scc,nmethod ] 2631 (L4): Writing nmethod <br>
> 'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' (comp <br>
> level: 4, decomp: 1, has clinit barriers) to Startup Code Cache 'quarkus-getting-started.cds.code'<br>
> ...<br>
> [7.215s][info ][scc,nmethod ] 4354 (L4): Writing nmethod <br>
> 'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' (comp <br>
> level: 4, decomp: 1) to Startup Code Cache 'quarkus-getting-started.cds.code'<br>
> <br>
> In the production run the "preload" version was successfully loaded:<br>
> <br>
> [0.695s][info ][scc,nmethod ] 727 (L4): Preloading nmethod <br>
> 'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' (decomp: <br>
> 0, hash: 0x493f24e2, has clinit barriers)<br>
> <br>
> The PrintTieredEvents logs indicate this method was also sent for compilation during replay training:<br>
> <br>
> 0.877593: [force-compile level=4 <br>
> [jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z] @-1 <br>
> queues=0,0 rate=0.000000 load=0.007812 k=1.00,1.00 total=56,0 mdo=0(0),0(0) max levels=4,0 <br>
> compilable=c1,c1-osr,c2,c2-osr status=idle mtd: mdo=18830(8306), 0(0), deps=0]<br>
> <br>
> Ideally this request should have been fulfilled by the second entry in the code cache. But instead I see this message:<br>
> <br>
> [0.878s][info ][scc,nmethod] Missing entry for <br>
> 'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' <br>
> (comp_level 4, decomp: 0, hash: 0x493f24e2)<br>
> <br>
> This is followed by the C2 compilation of the method.<br>
> <br>
> It looks like the failure to find the second entry is due to a mismatch of the decomp count [0]. The decomp count is <br>
> stored in the MethodData.<br>
> Is it possible that the method data is not yet installed when replay training is done? If so, is that by design or a bug?<br>
> <br>
> [0] <br>
> <a href="https://github.com/openjdk/leyden/blob/ec5eb99653624d02a923a314ce40086753b240fc/src/hotspot/share/code/SCCache.cpp#L938" rel="noreferrer" target="_blank">https://github.com/openjdk/leyden/blob/ec5eb99653624d02a923a314ce40086753b240fc/src/hotspot/share/code/SCCache.cpp#L938</a> <br>
> <br>
> Thanks,<br>
> - Ashutosh Mehra<br>
<br>
</blockquote></div>