Missing entry in SCC due to decomp count mismatch

newer
git: openjdk/leyden: premain-ea:...

Ashutosh Mehra

24 Jul 2024 24 Jul '24

2:54 p.m.

During the startup of a quarkus app, I see a particular method that gets C2 compiled almost every time in the production run with the premain branch . I don't see this happening with the mainline. The reason this method caught my attention is the significant amount of memory its C2 compilation consumes (between 25-40 mb) compared to the other compilations. The method in question is jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z The assembly phase added two entries for this method in the code cache: [3.391s][info ][scc,nmethod ] 2631 (L4): Writing nmethod 'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' (comp level: 4, decomp: 1, has clinit barriers) to Startup Code Cache 'quarkus-getting-started.cds.code' ... [7.215s][info ][scc,nmethod ] 4354 (L4): Writing nmethod 'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' (comp level: 4, decomp: 1) to Startup Code Cache 'quarkus-getting-started.cds.code' In the production run the "preload" version was successfully loaded: [0.695s][info ][scc,nmethod ] 727 (L4): Preloading nmethod 'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' (decomp: 0, hash: 0x493f24e2, has clinit barriers) The PrintTieredEvents logs indicate this method was also sent for compilation during replay training: 0.877593: [force-compile level=4 [jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z] @-1 queues=0,0 rate=0.000000 load=0.007812 k=1.00,1.00 total=56,0 mdo=0(0),0(0) max levels=4,0 compilable=c1,c1-osr,c2,c2-osr status=idle mtd: mdo=18830(8306), 0(0), deps=0] Ideally this request should have been fulfilled by the second entry in the code cache. But instead I see this message: [0.878s][info ][scc,nmethod ] Missing entry for 'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' (comp_level 4, decomp: 0, hash: 0x493f24e2) This is followed by the C2 compilation of the method. It looks like the failure to find the second entry is due to a mismatch of the decomp count [0]. The decomp count is stored in the MethodData. Is it possible that the method data is not yet installed when replay training is done? If so, is that by design or a bug? [0] https://github.com/openjdk/leyden/blob/ec5eb99653624d02a923a314ce40086753b24... Thanks, - Ashutosh Mehra

Attachments:

attachment.html (text/html — 3.8 KB)

Show replies by date

Vladimir Kozlov

24 Jul 24 Jul

7:55 p.m.

Thank you for report, Ashutosh Is this with one step workflow? With one step workflow we should ignore decomp count because code is generated not during execution but based on training data in forked VM - no deoptimization happens there. `decomp count` was introduced for 5 steps workflow when we generate aot code as we execute application with idea that production run will follow the same compilation/deoptimization steps. Actually I implemented it before we start using TD to trigger compilation. May be this is the reason that 5 steps workflow is slow now when we use TD. I need to check. Thanks, Vladimir K On 7/24/24 7:54 AM, Ashutosh Mehra wrote:

...

During the startup of a quarkus app, I see a particular method that gets C2 compiled almost every time in the production run with the premain branch . I don't see this happening with the mainline. The reason this method caught my attention is the significant amount of memory its C2 compilation consumes (between 25-40 mb) compared to the other compilations. The method in question is jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z

The assembly phase added two entries for this method in the code cache:

[3.391s][info ][scc,nmethod ] 2631 (L4): Writing nmethod 'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' (comp level: 4, decomp: 1, has clinit barriers) to Startup Code Cache 'quarkus-getting-started.cds.code' ... [7.215s][info ][scc,nmethod ] 4354 (L4): Writing nmethod 'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' (comp level: 4, decomp: 1) to Startup Code Cache 'quarkus-getting-started.cds.code'

In the production run the "preload" version was successfully loaded:

[0.695s][info ][scc,nmethod ] 727 (L4): Preloading nmethod 'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' (decomp: 0, hash: 0x493f24e2, has clinit barriers)

The PrintTieredEventslogs indicate this method was also sent for compilation during replay training:

0.877593: [force-compile level=4 [jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z] @-1 queues=0,0 rate=0.000000 load=0.007812 k=1.00,1.00 total=56,0 mdo=0(0),0(0) max levels=4,0 compilable=c1,c1-osr,c2,c2-osr status=idle mtd: mdo=18830(8306), 0(0), deps=0]

Ideally this request should have been fulfilled by the second entry in the code cache. But instead I see this message:

[0.878s][info ][scc,nmethod] Missing entry for 'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' (comp_level 4, decomp: 0, hash: 0x493f24e2)

This is followed by the C2 compilation of the method.

It looks like the failure to find the second entry is due to a mismatch of the decomp count [0]. The decomp count is stored in the MethodData. Is it possible that the method data is not yet installed when replay training is done? If so, is that by design or a bug?

[0] https://github.com/openjdk/leyden/blob/ec5eb99653624d02a923a314ce40086753b24... <https://github.com/openjdk/leyden/blob/ec5eb99653624d02a923a314ce40086753b240fc/src/hotspot/share/code/SCCache.cpp#L938>

Thanks, - Ashutosh Mehra

Ashutosh Mehra

10:46 p.m.

Hi Vladimir,

...

Is this with one step workflow? With one step workflow we should ignore decomp count because code is generated not during execution but based on training data in forked VM - no deoptimization happens there.

Yes, this is with the 1-step workflow. - Ashutosh Mehra On Wed, Jul 24, 2024 at 3:56 PM Vladimir Kozlov <vladimir.kozlov@oracle.com> wrote:

...

Thank you for report, Ashutosh

Is this with one step workflow? With one step workflow we should ignore decomp count because code is generated not during execution but based on training data in forked VM - no deoptimization happens there.

`decomp count` was introduced for 5 steps workflow when we generate aot code as we execute application with idea that production run will follow the same compilation/deoptimization steps.

Actually I implemented it before we start using TD to trigger compilation. May be this is the reason that 5 steps workflow is slow now when we use TD. I need to check.

Thanks, Vladimir K

On 7/24/24 7:54 AM, Ashutosh Mehra wrote:

...
During the startup of a quarkus app, I see a particular method that gets C2 compiled almost every time in the production run with the premain branch . I don't see this happening with the mainline. The reason this method caught my attention is the significant amount of memory its C2 compilation consumes (between 25-40 mb) compared to the other compilations. The method in question is

jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z

...
The assembly phase added two entries for this method in the code cache:

[3.391s][info ][scc,nmethod ] 2631 (L4): Writing nmethod

'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' (comp

...
level: 4, decomp: 1, has clinit barriers) to Startup Code Cache 'quarkus-getting-started.cds.code' ... [7.215s][info ][scc,nmethod ] 4354 (L4): Writing nmethod

'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' (comp

...
level: 4, decomp: 1) to Startup Code Cache 'quarkus-getting-started.cds.code'

In the production run the "preload" version was successfully loaded:

[0.695s][info ][scc,nmethod ] 727 (L4): Preloading nmethod

'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' (decomp:

...
0, hash: 0x493f24e2, has clinit barriers)

The PrintTieredEventslogs indicate this method was also sent for compilation during replay training:

0.877593: [force-compile level=4

[jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z] @-1

...
queues=0,0 rate=0.000000 load=0.007812 k=1.00,1.00 total=56,0 mdo=0(0),0(0) max levels=4,0 compilable=c1,c1-osr,c2,c2-osr status=idle mtd: mdo=18830(8306), 0(0), deps=0]

Ideally this request should have been fulfilled by the second entry in the code cache. But instead I see this message:

[0.878s][info ][scc,nmethod] Missing entry for

'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z'

...
(comp_level 4, decomp: 0, hash: 0x493f24e2)

This is followed by the C2 compilation of the method.

It looks like the failure to find the second entry is due to a mismatch of the decomp count [0]. The decomp count is stored in the MethodData. Is it possible that the method data is not yet installed when replay training is done? If so, is that by design or a bug?

[0]

https://github.com/openjdk/leyden/blob/ec5eb99653624d02a923a314ce40086753b24...

...
< https://github.com/openjdk/leyden/blob/ec5eb99653624d02a923a314ce40086753b24...

Thanks, - Ashutosh Mehra

Vladimir Kozlov

26 Jul 26 Jul

12:18 a.m.

New subject: [External] : Re: Missing entry in SCC due to decomp count mismatch

I looked on code and for "Preloading" nmethod we ignore decompile counter. We check it only when looking for "normal" AOT code. The counter's value in "Preloading" and "Missing" messages is taking from current MDO [1] and not from the entry describing AOT code. I remember (Igor may confirm it) that CompilerCounters data is nulled for TD including _nof_decompiles value. That is why we see 0 when loading code. It seems we are "lucky" in most cases because we deoptimize (and update counter) may be level 2 (C1) AOT code or "Preloaded" code. That is what I see in my runs which has opposite issue: we record AOT code with `decomp == 0` and when we load it "preload" AOT code was deoptimized and counter updated in MDO and check failed when we tried to load "normal" AOT code. Anyway. As I said, for one-step workflow we should not use decompile counter. I may actually save only latest version of "normal" nmethod even for 5-step workflow. The question is which nmethod version corresponds to saved MDO? For on-step workflow the answer is simple and we can ignore the counter. For 5 steps I need to check corresponding MDO to save correct version. Thanks, Vladimir K [1] https://github.com/openjdk/leyden/blob/premain/src/hotspot/share/code/SCCach... [2] https://github.com/openjdk/leyden/blob/premain/src/hotspot/share/code/SCCach... On 7/24/24 3:46 PM, Ashutosh Mehra wrote:

...

Hi Vladimir,

Is this with one step workflow? With one step workflow we should ignore decomp count because code is generated not during execution but based on training data in forked VM - no deoptimization happens there.

Yes, this is with the 1-step workflow.

- Ashutosh Mehra

On Wed, Jul 24, 2024 at 3:56 PM Vladimir Kozlov <vladimir.kozlov@oracle.com <mailto:vladimir.kozlov@oracle.com>> wrote:

Thank you for report, Ashutosh

Is this with one step workflow? With one step workflow we should ignore decomp count because code is generated not during execution but based on training data in forked VM - no deoptimization happens there.

`decomp count` was introduced for 5 steps workflow when we generate aot code as we execute application with idea that production run will follow the same compilation/deoptimization steps.

Actually I implemented it before we start using TD to trigger compilation. May be this is the reason that 5 steps workflow is slow now when we use TD. I need to check.

Thanks, Vladimir K

On 7/24/24 7:54 AM, Ashutosh Mehra wrote: > During the startup of a quarkus app, I see a particular method that gets C2 compiled almost every time in the production > run with the premain branch . I don't see this happening with the mainline. > The reason this method caught my attention is the significant amount of memory its C2 compilation consumes (between > 25-40 mb) compared to the other compilations. > The method in question is > jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z > > The assembly phase added two entries for this method in the code cache: > > [3.391s][info ][scc,nmethod ] 2631 (L4): Writing nmethod > 'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' (comp > level: 4, decomp: 1, has clinit barriers) to Startup Code Cache 'quarkus-getting-started.cds.code' > ... > [7.215s][info ][scc,nmethod ] 4354 (L4): Writing nmethod > 'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' (comp > level: 4, decomp: 1) to Startup Code Cache 'quarkus-getting-started.cds.code' > > In the production run the "preload" version was successfully loaded: > > [0.695s][info ][scc,nmethod ] 727 (L4): Preloading nmethod > 'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' (decomp: > 0, hash: 0x493f24e2, has clinit barriers) > > The PrintTieredEventslogs indicate this method was also sent for compilation during replay training: > > 0.877593: [force-compile level=4 > [jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z] @-1 > queues=0,0 rate=0.000000 load=0.007812 k=1.00,1.00 total=56,0 mdo=0(0),0(0) max levels=4,0 > compilable=c1,c1-osr,c2,c2-osr status=idle mtd: mdo=18830(8306), 0(0), deps=0] > > Ideally this request should have been fulfilled by the second entry in the code cache. But instead I see this message: > > [0.878s][info ][scc,nmethod] Missing entry for > 'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' > (comp_level 4, decomp: 0, hash: 0x493f24e2) > > This is followed by the C2 compilation of the method. > > It looks like the failure to find the second entry is due to a mismatch of the decomp count [0]. The decomp count is > stored in the MethodData. > Is it possible that the method data is not yet installed when replay training is done? If so, is that by design or a bug? > > [0] > https://github.com/openjdk/leyden/blob/ec5eb99653624d02a923a314ce40086753b24... <https://urldefense.com/v3/__https://github.com/openjdk/leyden/blob/ec5eb99653624d02a923a314ce40086753b240fc/src/hotspot/share/code/SCCache.cpp*L938__;Iw!!ACWV5N9M2RV99hQ!MGy9Dev-630RGry8or7lUEBQ4OYoS8cYwif2Z56RKLsqk7kx5YSD65AzSK9yYycpOWGubLGwFIZSImuh3ykRKw$> > <https://github.com/openjdk/leyden/blob/ec5eb99653624d02a923a314ce40086753b24... <https://urldefense.com/v3/__https://github.com/openjdk/leyden/blob/ec5eb99653624d02a923a314ce40086753b240fc/src/hotspot/share/code/SCCache.cpp*L938__;Iw!!ACWV5N9M2RV99hQ!MGy9Dev-630RGry8or7lUEBQ4OYoS8cYwif2Z56RKLsqk7kx5YSD65AzSK9yYycpOWGubLGwFIZSImuh3ykRKw$>> > > Thanks, > - Ashutosh Mehra

Ashutosh Mehra

3:17 p.m.

New subject: [External] : Re: Missing entry in SCC due to decomp count mismatch

...

The counter's value in "Preloading" and "Missing" messages is taking from current MDO [1] and not from the entry describing AOT code.

That's right.

...

I remember (Igor may confirm it) that CompilerCounters data is nulled for TD including _nof_decompiles value. That is why we see 0 when loading code.

For "Missing" message, the counter is obtained from Method->method_data() which would return null if the MethodData recorded in the training data is not yet installed in the Method. If Method->method_data() is null, then it prints 0 as the counter value. This is the case I am seeing if I print the method data in the "Missing" message: [0.878s][info ][scc,nmethod ] Missing entry for 'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' (md: (nil), comp_level 4, decomp: 0, hash: 0x493f24e2) Notice that md is null. With one step workflow we should ignore decomp count because code is

...

generated not during execution but based on training data in forked VM - no deoptimization happens there.

hmm, I see some entries marked as "made not entrant" in the output of PrintCompilation during the assembly phase of 1-step workflow. S131 432 % 2 jdk.internal.util.ArraysSupport::unsignedHashCode @ 8 (36 bytes) made not entrant S133 433 2 jdk.internal.util.ArraysSupport::unsignedHashCode (36 bytes) made not entrant S158 506 2 java.util.HashMap::putVal (300 bytes) made not entrant Thanks, - Ashutosh Mehra On Thu, Jul 25, 2024 at 8:18 PM Vladimir Kozlov <vladimir.kozlov@oracle.com> wrote:

...

I looked on code and for "Preloading" nmethod we ignore decompile counter. We check it only when looking for "normal" AOT code.

The counter's value in "Preloading" and "Missing" messages is taking from current MDO [1] and not from the entry describing AOT code.

I remember (Igor may confirm it) that CompilerCounters data is nulled for TD including _nof_decompiles value. That is why we see 0 when loading code.

It seems we are "lucky" in most cases because we deoptimize (and update counter) may be level 2 (C1) AOT code or "Preloaded" code. That is what I see in my runs which has opposite issue: we record AOT code with `decomp == 0` and when we load it "preload" AOT code was deoptimized and counter updated in MDO and check failed when we tried to load "normal" AOT code.

Anyway. As I said, for one-step workflow we should not use decompile counter. I may actually save only latest version of "normal" nmethod even for 5-step workflow. The question is which nmethod version corresponds to saved MDO? For on-step workflow the answer is simple and we can ignore the counter. For 5 steps I need to check corresponding MDO to save correct version.

Thanks, Vladimir K

[1] https://github.com/openjdk/leyden/blob/premain/src/hotspot/share/code/SCCach... [2] https://github.com/openjdk/leyden/blob/premain/src/hotspot/share/code/SCCach...

On 7/24/24 3:46 PM, Ashutosh Mehra wrote:

...
Hi Vladimir,

Is this with one step workflow? With one step workflow we should ignore decomp count because code is generated not during execution but based on training data in forked VM - no deoptimization happens there.

Yes, this is with the 1-step workflow.

- Ashutosh Mehra

On Wed, Jul 24, 2024 at 3:56 PM Vladimir Kozlov < vladimir.kozlov@oracle.com <mailto:vladimir.kozlov@oracle.com>> wrote:

Thank you for report, Ashutosh

Is this with one step workflow? With one step workflow we should ignore decomp count because code is generated not during execution but based on training data in forked VM - no deoptimization happens there.

`decomp count` was introduced for 5 steps workflow when we generate aot code as we execute application with idea that production run will follow the same compilation/deoptimization steps.

Actually I implemented it before we start using TD to trigger compilation. May be this is the reason that 5 steps workflow is slow now when we use TD. I need to check.

Thanks, Vladimir K

On 7/24/24 7:54 AM, Ashutosh Mehra wrote: > During the startup of a quarkus app, I see a particular method that gets C2 compiled almost every time in the production > run with the premain branch . I don't see this happening with the mainline. > The reason this method caught my attention is the significant amount of memory its C2 compilation consumes (between > 25-40 mb) compared to the other compilations. > The method in question is > jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z > > The assembly phase added two entries for this method in the code cache: > > [3.391s][info ][scc,nmethod ] 2631 (L4): Writing nmethod > 'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' (comp > level: 4, decomp: 1, has clinit barriers) to Startup Code Cache 'quarkus-getting-started.cds.code' > ... > [7.215s][info ][scc,nmethod ] 4354 (L4): Writing nmethod > 'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' (comp > level: 4, decomp: 1) to Startup Code Cache 'quarkus-getting-started.cds.code' > > In the production run the "preload" version was successfully loaded: > > [0.695s][info ][scc,nmethod ] 727 (L4): Preloading nmethod > 'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' (decomp: > 0, hash: 0x493f24e2, has clinit barriers) > > The PrintTieredEventslogs indicate this method was also sent for compilation during replay training: > > 0.877593: [force-compile level=4 > [jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z] @-1 > queues=0,0 rate=0.000000 load=0.007812 k=1.00,1.00 total=56,0 mdo=0(0),0(0) max levels=4,0 > compilable=c1,c1-osr,c2,c2-osr status=idle mtd: mdo=18830(8306), 0(0), deps=0] > > Ideally this request should have been fulfilled by the second entry in the code cache. But instead I see this message: > > [0.878s][info ][scc,nmethod] Missing entry for > 'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' > (comp_level 4, decomp: 0, hash: 0x493f24e2) > > This is followed by the C2 compilation of the method. > > It looks like the failure to find the second entry is due to a mismatch of the decomp count [0]. The decomp count is > stored in the MethodData. > Is it possible that the method data is not yet installed when replay training is done? If so, is that by design or a bug? > > [0] >

https://github.com/openjdk/leyden/blob/ec5eb99653624d02a923a314ce40086753b24... < https://urldefense.com/v3/__https://github.com/openjdk/leyden/blob/ec5eb9965...

...
> <

https://github.com/openjdk/leyden/blob/ec5eb99653624d02a923a314ce40086753b24... < https://urldefense.com/v3/__https://github.com/openjdk/leyden/blob/ec5eb9965...

...
...
> > Thanks, > - Ashutosh Mehra

Vladimir Kozlov

4:28 p.m.

New subject: [External] : Re: Missing entry in SCC due to decomp count mismatch

Ashutosh, please try this patch: diff --git a/src/hotspot/share/code/SCCache.cpp b/src/hotspot/share/code/SCCache.cpp index 551f80e1411..87f2d718320 100644 --- a/src/hotspot/share/code/SCCache.cpp +++ b/src/hotspot/share/code/SCCache.cpp @@ -934,8 +934,7 @@ static bool check_entry(SCCEntry::Kind kind, uint id, uint comp_level, uint deco if (entry->kind() == kind) { assert(entry->id() == id, "sanity"); if (kind != SCCEntry::Code || (!entry->not_entrant() && !entry->has_clinit_barriers() && - entry->comp_level() == comp_level && - (comp_level == CompLevel_limited_profile || entry->decompile() == decomp))) { + (entry->comp_level() == comp_level))) { return true; // Found } } Thanks, Vladimir K On 7/26/24 8:17 AM, Ashutosh Mehra wrote:

...

The counter's value in "Preloading" and "Missing" messages is taking from current MDO [1] and not from the entry describing AOT code.

That's right.

I remember (Igor may confirm it) that CompilerCounters data is nulled for TD including _nof_decompiles value. That is why we see 0 when loading code.

For "Missing" message, the counter is obtained from Method->method_data() which would return null if the MethodData recorded in the training data is not yet installed in the Method. If Method->method_data() is null, then it prints 0 as the counter value. This is the case I am seeing if I print the method data in the "Missing" message:

[0.878s][info ][scc,nmethod ] Missing entry for 'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' (md: (nil), comp_level 4, decomp: 0, hash: 0x493f24e2)

Notice that md is null.

With one step workflow we should ignore decomp count because code is generated not during execution but based on training data in forked VM - no deoptimization happens there.

hmm, I see some entries marked as "made not entrant" in the output of PrintCompilation during the assembly phase of 1-step workflow.

S131 432 % 2 jdk.internal.util.ArraysSupport::unsignedHashCode @ 8 (36 bytes) made not entrant S133 433 2 jdk.internal.util.ArraysSupport::unsignedHashCode (36 bytes) made not entrant S158 506 2 java.util.HashMap::putVal (300 bytes) made not entrant

Thanks, - Ashutosh Mehra

On Thu, Jul 25, 2024 at 8:18 PM Vladimir Kozlov <vladimir.kozlov@oracle.com <mailto:vladimir.kozlov@oracle.com>> wrote:

I looked on code and for "Preloading" nmethod we ignore decompile counter. We check it only when looking for "normal" AOT code.

The counter's value in "Preloading" and "Missing" messages is taking from current MDO [1] and not from the entry describing AOT code.

I remember (Igor may confirm it) that CompilerCounters data is nulled for TD including _nof_decompiles value. That is why we see 0 when loading code.

It seems we are "lucky" in most cases because we deoptimize (and update counter) may be level 2 (C1) AOT code or "Preloaded" code. That is what I see in my runs which has opposite issue: we record AOT code with `decomp == 0` and when we load it "preload" AOT code was deoptimized and counter updated in MDO and check failed when we tried to load "normal" AOT code.

Anyway. As I said, for one-step workflow we should not use decompile counter. I may actually save only latest version of "normal" nmethod even for 5-step workflow. The question is which nmethod version corresponds to saved MDO? For on-step workflow the answer is simple and we can ignore the counter. For 5 steps I need to check corresponding MDO to save correct version.

Thanks, Vladimir K

[1] https://github.com/openjdk/leyden/blob/premain/src/hotspot/share/code/SCCach... <https://urldefense.com/v3/__https://github.com/openjdk/leyden/blob/premain/src/hotspot/share/code/SCCache.cpp*L2868__;Iw!!ACWV5N9M2RV99hQ!MUQ7NwYRlq4ggiM1yNmZO3xJb1OV-cpfGFcPgPZ9zA7vbDKlPtPvsVtsm-gca03z_WTzgi8AUzOWOYcRh3ntYA$> [2] https://github.com/openjdk/leyden/blob/premain/src/hotspot/share/code/SCCach... <https://urldefense.com/v3/__https://github.com/openjdk/leyden/blob/premain/src/hotspot/share/code/SCCache.cpp*L317__;Iw!!ACWV5N9M2RV99hQ!MUQ7NwYRlq4ggiM1yNmZO3xJb1OV-cpfGFcPgPZ9zA7vbDKlPtPvsVtsm-gca03z_WTzgi8AUzOWOYe9CuvRDQ$>

On 7/24/24 3:46 PM, Ashutosh Mehra wrote: > Hi Vladimir, > > Is this with one step workflow? With one step workflow we should ignore decomp count because code is generated not > during execution but based on training data in forked VM - no deoptimization happens there. > > > Yes, this is with the 1-step workflow. > > - Ashutosh Mehra > > > On Wed, Jul 24, 2024 at 3:56 PM Vladimir Kozlov <vladimir.kozlov@oracle.com <mailto:vladimir.kozlov@oracle.com> <mailto:vladimir.kozlov@oracle.com <mailto:vladimir.kozlov@oracle.com>>> wrote: > > Thank you for report, Ashutosh > > Is this with one step workflow? With one step workflow we should ignore decomp count because code is generated not > during execution but based on training data in forked VM - no deoptimization happens there. > > `decomp count` was introduced for 5 steps workflow when we generate aot code as we execute application with idea that > production run will follow the same compilation/deoptimization steps. > > Actually I implemented it before we start using TD to trigger compilation. May be this is the reason that 5 steps > workflow is slow now when we use TD. I need to check. > > Thanks, > Vladimir K > > On 7/24/24 7:54 AM, Ashutosh Mehra wrote: > > During the startup of a quarkus app, I see a particular method that gets C2 compiled almost every time in the > production > > run with the premain branch . I don't see this happening with the mainline. > > The reason this method caught my attention is the significant amount of memory its C2 compilation consumes (between > > 25-40 mb) compared to the other compilations. > > The method in question is > > jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z > > > > The assembly phase added two entries for this method in the code cache: > > > > [3.391s][info ][scc,nmethod ] 2631 (L4): Writing nmethod > > 'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' > (comp > > level: 4, decomp: 1, has clinit barriers) to Startup Code Cache 'quarkus-getting-started.cds.code' > > ... > > [7.215s][info ][scc,nmethod ] 4354 (L4): Writing nmethod > > 'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' > (comp > > level: 4, decomp: 1) to Startup Code Cache 'quarkus-getting-started.cds.code' > > > > In the production run the "preload" version was successfully loaded: > > > > [0.695s][info ][scc,nmethod ] 727 (L4): Preloading nmethod > > 'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' > (decomp: > > 0, hash: 0x493f24e2, has clinit barriers) > > > > The PrintTieredEventslogs indicate this method was also sent for compilation during replay training: > > > > 0.877593: [force-compile level=4 > > [jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z] @-1 > > queues=0,0 rate=0.000000 load=0.007812 k=1.00,1.00 total=56,0 mdo=0(0),0(0) max levels=4,0 > > compilable=c1,c1-osr,c2,c2-osr status=idle mtd: mdo=18830(8306), 0(0), deps=0] > > > > Ideally this request should have been fulfilled by the second entry in the code cache. But instead I see this > message: > > > > [0.878s][info ][scc,nmethod] Missing entry for > > 'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' > > (comp_level 4, decomp: 0, hash: 0x493f24e2) > > > > This is followed by the C2 compilation of the method. > > > > It looks like the failure to find the second entry is due to a mismatch of the decomp count [0]. The decomp count is > > stored in the MethodData. > > Is it possible that the method data is not yet installed when replay training is done? If so, is that by design > or a bug? > > > > [0] > > > https://github.com/openjdk/leyden/blob/ec5eb99653624d02a923a314ce40086753b24... <https://urldefense.com/v3/__https://github.com/openjdk/leyden/blob/ec5eb99653624d02a923a314ce40086753b240fc/src/hotspot/share/code/SCCache.cpp*L938__;Iw!!ACWV5N9M2RV99hQ!MUQ7NwYRlq4ggiM1yNmZO3xJb1OV-cpfGFcPgPZ9zA7vbDKlPtPvsVtsm-gca03z_WTzgi8AUzOWOYcE3EES5w$> <https://urldefense.com/v3/__https://github.com/openjdk/leyden/blob/ec5eb9965... <https://urldefense.com/v3/__https://github.com/openjdk/leyden/blob/ec5eb99653624d02a923a314ce40086753b240fc/src/hotspot/share/code/SCCache.cpp*L938__;Iw!!ACWV5N9M2RV99hQ!MGy9Dev-630RGry8or7lUEBQ4OYoS8cYwif2Z56RKLsqk7kx5YSD65AzSK9yYycpOWGubLGwFIZSImuh3ykRKw$>> > > > <https://github.com/openjdk/leyden/blob/ec5eb99653624d02a923a314ce40086753b24... <https://urldefense.com/v3/__https://github.com/openjdk/leyden/blob/ec5eb99653624d02a923a314ce40086753b240fc/src/hotspot/share/code/SCCache.cpp*L938__;Iw!!ACWV5N9M2RV99hQ!MUQ7NwYRlq4ggiM1yNmZO3xJb1OV-cpfGFcPgPZ9zA7vbDKlPtPvsVtsm-gca03z_WTzgi8AUzOWOYcE3EES5w$> <https://urldefense.com/v3/__https://github.com/openjdk/leyden/blob/ec5eb9965... <https://urldefense.com/v3/__https://github.com/openjdk/leyden/blob/ec5eb99653624d02a923a314ce40086753b240fc/src/hotspot/share/code/SCCache.cpp*L938__;Iw!!ACWV5N9M2RV99hQ!MGy9Dev-630RGry8or7lUEBQ4OYoS8cYwif2Z56RKLsqk7kx5YSD65AzSK9yYycpOWGubLGwFIZSImuh3ykRKw$>>> > > > > Thanks, > > - Ashutosh Mehra >

Ashutosh Mehra

6:53 p.m.

New subject: [External] : Re: Missing entry in SCC due to decomp count mismatch

Yes, this seems to work. I don't see the "Missing entry" message for StackMapGenerator::processBlock any more, and it is now able to locate the c2 compiled version in the code cache and use it. [0.911s][info ][scc,nmethod ] 2662 (L4): Reading nmethod 'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' (md: 0x7f99f00c5f68, decomp: 0, hash: 0x493f24e2) Thanks, - Ashutosh Mehra On Fri, Jul 26, 2024 at 12:28 PM Vladimir Kozlov <vladimir.kozlov@oracle.com> wrote:

...

Ashutosh, please try this patch:

diff --git a/src/hotspot/share/code/SCCache.cpp b/src/hotspot/share/code/SCCache.cpp index 551f80e1411..87f2d718320 100644 --- a/src/hotspot/share/code/SCCache.cpp +++ b/src/hotspot/share/code/SCCache.cpp @@ -934,8 +934,7 @@ static bool check_entry(SCCEntry::Kind kind, uint id, uint comp_level, uint deco if (entry->kind() == kind) { assert(entry->id() == id, "sanity"); if (kind != SCCEntry::Code || (!entry->not_entrant() && !entry->has_clinit_barriers() && - entry->comp_level() == comp_level && - (comp_level == CompLevel_limited_profile || entry->decompile() == decomp))) { + (entry->comp_level() == comp_level))) { return true; // Found } }

Thanks, Vladimir K

On 7/26/24 8:17 AM, Ashutosh Mehra wrote:

...
The counter's value in "Preloading" and "Missing" messages is taking

from current MDO [1] and not from the entry

...
describing AOT code.

That's right.

I remember (Igor may confirm it) that CompilerCounters data is

nulled for TD including _nof_decompiles value. That is

...
why we see 0 when loading code.

For "Missing" message, the counter is obtained from

Method->method_data() which would return null if the MethodData

...
recorded in the training data is not yet installed in the Method. If Method->method_data() is null, then it prints 0 as the counter value. This is the case I am seeing if I print the method data in the "Missing" message:

[0.878s][info ][scc,nmethod ] Missing entry for

'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z' (md:

...
(nil), comp_level 4, decomp: 0, hash: 0x493f24e2)

Notice that md is null.

With one step workflow we should ignore decomp count because code is generated not during execution but based on training data in forked VM - no deoptimization happens there.

hmm, I see some entries marked as "made not entrant" in the output of PrintCompilation during the assembly phase of 1-step workflow.

S131 432 % 2 jdk.internal.util.ArraysSupport::unsignedHashCode @ 8 (36 bytes) made not entrant S133 433 2 jdk.internal.util.ArraysSupport::unsignedHashCode (36 bytes) made not entrant S158 506 2 java.util.HashMap::putVal (300 bytes) made not entrant

Thanks, - Ashutosh Mehra

On Thu, Jul 25, 2024 at 8:18 PM Vladimir Kozlov < vladimir.kozlov@oracle.com <mailto:vladimir.kozlov@oracle.com>> wrote:

I looked on code and for "Preloading" nmethod we ignore decompile counter. We check it only when looking for "normal" AOT code.

The counter's value in "Preloading" and "Missing" messages is taking from current MDO [1] and not from the entry describing AOT code.

I remember (Igor may confirm it) that CompilerCounters data is nulled for TD including _nof_decompiles value. That is why we see 0 when loading code.

It seems we are "lucky" in most cases because we deoptimize (and update counter) may be level 2 (C1) AOT code or "Preloaded" code. That is what I see in my runs which has opposite issue: we record AOT code with `decomp == 0` and when we load it "preload" AOT code was deoptimized and counter updated in MDO and check failed when we tried to load "normal" AOT code.

Anyway. As I said, for one-step workflow we should not use decompile counter. I may actually save only latest version of "normal" nmethod even for 5-step workflow. The question is which nmethod version corresponds to saved MDO? For on-step workflow the answer is simple and we can ignore the counter. For 5 steps I need to check corresponding MDO to save correct version.

Thanks, Vladimir K

[1] https://github.com/openjdk/leyden/blob/premain/src/hotspot/share/code/SCCach... < https://urldefense.com/v3/__https://github.com/openjdk/leyden/blob/premain/s...

[2] https://github.com/openjdk/leyden/blob/premain/src/hotspot/share/code/SCCach... < https://urldefense.com/v3/__https://github.com/openjdk/leyden/blob/premain/s...

On 7/24/24 3:46 PM, Ashutosh Mehra wrote: > Hi Vladimir, > > Is this with one step workflow? With one step workflow we should ignore decomp count because code is generated not > during execution but based on training data in forked VM - no deoptimization happens there. > > > Yes, this is with the 1-step workflow. > > - Ashutosh Mehra > > > On Wed, Jul 24, 2024 at 3:56 PM Vladimir Kozlov < vladimir.kozlov@oracle.com <mailto:vladimir.kozlov@oracle.com> <mailto:vladimir.kozlov@oracle.com <mailto: vladimir.kozlov@oracle.com>>> wrote: > > Thank you for report, Ashutosh > > Is this with one step workflow? With one step workflow we should ignore decomp count because code is generated not > during execution but based on training data in forked VM - no deoptimization happens there. > > `decomp count` was introduced for 5 steps workflow when we generate aot code as we execute application with idea that > production run will follow the same compilation/deoptimization steps. > > Actually I implemented it before we start using TD to trigger compilation. May be this is the reason that 5 steps > workflow is slow now when we use TD. I need to check. > > Thanks, > Vladimir K > > On 7/24/24 7:54 AM, Ashutosh Mehra wrote: > > During the startup of a quarkus app, I see a particular method that gets C2 compiled almost every time in the > production > > run with the premain branch . I don't see this happening with the mainline. > > The reason this method caught my attention is the significant amount of memory its C2 compilation consumes (between > > 25-40 mb) compared to the other compilations. > > The method in question is > > jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z > > > > The assembly phase added two entries for this method in the code cache: > > > > [3.391s][info ][scc,nmethod ] 2631 (L4): Writing nmethod > >

'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z'

...
> (comp > > level: 4, decomp: 1, has clinit barriers) to Startup Code

Cache 'quarkus-getting-started.cds.code'

...
> > ... > > [7.215s][info ][scc,nmethod ] 4354 (L4): Writing nmethod > >

'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z'

...
> (comp > > level: 4, decomp: 1) to Startup Code Cache

'quarkus-getting-started.cds.code'

...
> > > > In the production run the "preload" version was

successfully loaded:

...
> > > > [0.695s][info ][scc,nmethod ] 727 (L4): Preloading nmethod > >

'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z'

...
> (decomp: > > 0, hash: 0x493f24e2, has clinit barriers) > > > > The PrintTieredEventslogs indicate this method was also

sent for compilation during replay training:

...
> > > > 0.877593: [force-compile level=4 > >

[jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z] @-1

...
> > queues=0,0 rate=0.000000 load=0.007812 k=1.00,1.00

total=56,0 mdo=0(0),0(0) max levels=4,0

...
> > compilable=c1,c1-osr,c2,c2-osr status=idle mtd:

mdo=18830(8306), 0(0), deps=0]

...
> > > > Ideally this request should have been fulfilled by the

second entry in the code cache. But instead I see this

...
> message: > > > > [0.878s][info ][scc,nmethod] Missing entry for > >

'jdk.internal.classfile.impl.StackMapGenerator::processBlock(Ljdk/internal/classfile/impl/RawBytecodeHelper;)Z'

...
> > (comp_level 4, decomp: 0, hash: 0x493f24e2) > > > > This is followed by the C2 compilation of the method. > > > > It looks like the failure to find the second entry is due

to a mismatch of the decomp count [0]. The

...
decomp count is > > stored in the MethodData. > > Is it possible that the method data is not yet installed

when replay training is done? If so, is that by

...
design > or a bug? > > > > [0] > > >

https://github.com/openjdk/leyden/blob/ec5eb99653624d02a923a314ce40086753b24... < https://urldefense.com/v3/__https://github.com/openjdk/leyden/blob/ec5eb99653624d02a923a314ce40086753b240fc/src/hotspot/share/code/SCCache.cpp*L938__;Iw!!ACWV5N9M2RV99hQ!MUQ7NwYRlq4ggiM1yNmZO3xJb1OV-cpfGFcPgPZ9zA7vbDKlPtPvsVtsm-gca03z_WTzgi8AUzOWOYcE3EES5w$> < https://urldefense.com/v3/__https://github.com/openjdk/leyden/blob/ec5eb9965... < https://urldefense.com/v3/__https://github.com/openjdk/leyden/blob/ec5eb9965...

...
...
> > > < https://github.com/openjdk/leyden/blob/ec5eb99653624d02a923a314ce40086753b24... < https://urldefense.com/v3/__https://github.com/openjdk/leyden/blob/ec5eb99653624d02a923a314ce40086753b240fc/src/hotspot/share/code/SCCache.cpp*L938__;Iw!!ACWV5N9M2RV99hQ!MUQ7NwYRlq4ggiM1yNmZO3xJb1OV-cpfGFcPgPZ9zA7vbDKlPtPvsVtsm-gca03z_WTzgi8AUzOWOYcE3EES5w$> < https://urldefense.com/v3/__https://github.com/openjdk/leyden/blob/ec5eb9965... < https://urldefense.com/v3/__https://github.com/openjdk/leyden/blob/ec5eb9965...

...
...
> > > > Thanks, > > - Ashutosh Mehra >

593

Age (days ago)

595

Last active (days ago)

List overview

Download

6 comments

2 participants

participants (2)

Ashutosh Mehra
Vladimir Kozlov