Mismatched ciMethodData in replay file.
Liu, Xin
xxinliu at amazon.com
Fri May 20 05:14:10 UTC 2022
Hi Vladimir,
Thank you for taking a look at this.
On 5/19/22 1:00 PM, Vladimir Kozlov wrote:
> Narrowed to hotspot-compiler list.
>
> You are right that it is weird. The dump is done from ciMethodData, which is a local (per compiler thread) clone of the MDO
> and should not be updated during compilation, unless there is a place where we still go into the VM to get some numbers.
>
Oh, I see! The Compile constructor calls ciMethod::ensure_method_data(),
which creates a snapshot of the MDO in the compiler's arena.
> One explanation is that during the dump we hit a safepoint and lost part of the output.
>
As you said, the data are local. The C2 thread sweeps the profile data in
2 rounds. I am lost here: we write the data to a fileStream, and it looks
like we don't yield for safepoint synchronization between the 2 rounds.
How could we lose data here?
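
To make the question concrete, here is a toy model of the two-round
scheme in Python. It is not the HotSpot code: dump_oops, the
between_rounds hook, and the klass names are hypothetical, and the cells
are a plain list. It only illustrates that, if nothing is lost while
writing, a declared count larger than the number of dumped entries means
the second round saw different data than the first.

```python
import io

def dump_oops(cells, out, between_rounds=None):
    """Two-round dump: round 1 counts non-null entries, round 2 writes them."""
    # Round 1: count the non-null entries and declare that count up front.
    count = sum(1 for c in cells if c is not None)
    out.write(f"oops {count}")
    # Hypothetical hook, only used to simulate a change between the rounds.
    if between_rounds is not None:
        between_rounds(cells)
    # Round 2: write an "index name" pair for every non-null entry.
    for index, klass in enumerate(cells):
        if klass is not None:
            out.write(f" {index} {klass}")

def clear_slot_9(cells):
    """Simulated interference: the second entry disappears after counting."""
    cells[9] = None

def run(between_rounds=None):
    cells = [None] * 10
    cells[7], cells[9] = "KlassA", "KlassB"  # made-up klass names
    out = io.StringIO()
    dump_oops(cells, out, between_rounds)
    return out.getvalue()

print(run())              # stable data: count and entries agree
print(run(clear_slot_9))  # changed data: count says 2, one entry is written
```

With stable data the two rounds always agree, which is why the observed
mismatch is puzzling for a thread-local snapshot.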
> I think we need a verification mode for the replay dump to catch such cases (a separate count to catch such a mismatch).
>
> Thanks,
> Vladimir K
>
Do you mean that such mismatched replay data are useless? I am still
trying to extract what went wrong in the C2 CompilerThread.
Among the 26 data fields, ciReplay manages to decode this sequence as
"ciVirtualCallData" because the header 0x70005 denotes bci=7 and
tag=virtual_call_data_tag (5).
0x70005 0x4d55 0x0 0x7f6a5841c3c0 0xa3 0x7f6a5841c470
_data->_cells[4] = 0x7f6a5841c470 is the ciKlass of the second profiled
receiver. It is an unmapped address.
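
For reference, the header decoding above can be reproduced with a few
lines of Python. This is only a sketch: it assumes the tag sits in the
low byte of the header cell and the bci in bits 16..31, which matches
the bci=7 / tag=5 reading of 0x70005; treating bits 8..15 as flags is my
assumption, and decode_header is a made-up helper name.

```python
def decode_header(cell):
    """Split a profile-data header cell into (tag, flags, bci).

    Assumed layout: tag in bits 0..7, flags in bits 8..15, bci in bits 16..31.
    """
    tag = cell & 0xFF
    flags = (cell >> 8) & 0xFF
    bci = (cell >> 16) & 0xFFFF
    return tag, flags, bci

# The header of the sequence discussed above.
tag, flags, bci = decode_header(0x70005)
print(f"tag={tag} flags={flags} bci={bci}")
```

Under the same assumption, the first cell of the dump, 0x40007, decodes
to tag=7 at bci=4.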
This may lead us to the cause of the C2 thread crash. If we ditch
ill-formed ciReplay files, we may miss the clue.
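
Rather than ditching such files, the mismatch can be flagged outside the
VM. Below is a rough Python sketch, not part of ciReplay: section_counts
and mismatches are hypothetical helpers. It assumes each 'oops' entry is
an "index klass-name" pair, that the 'data' section runs up to the
'oops' keyword, and that the 'oops' section runs up to the 'methods'
keyword. Only the tail of the line from my earlier mail (from 'data'
onward) is used as input.

```python
# Tail of the ciMethodData line quoted in the original mail, joined onto one line.
SAMPLE_TAIL = (
    "data 26 0x40007 0x402 0x70 0x4e9c 0x70005 0x4d55 0x0 0x7f6a5841c3c0 0xa3 "
    "0x7f6a5841c470 0xa4 0xc0003 0x4e9c 0x18 0x110002 0x529e 0x0 0x0 0x0 0x0 "
    "0x0 0x0 0x9 0x2 0x6 0x0 oops 2 7 "
    "com/example/ProductAttRouter$withRequestLoggingContext$1 methods 0"
)

def section_counts(line):
    """Return {section: (declared, found)} for the 'data' and 'oops' sections."""
    tokens = line.split()
    d = tokens.index("data")
    o = tokens.index("oops", d)
    m = tokens.index("methods", o)
    data_found = o - (d + 2)         # cells between the data count and 'oops'
    oops_found = (m - (o + 2)) // 2  # assumed "index name" pairs before 'methods'
    return {
        "data": (int(tokens[d + 1]), data_found),
        "oops": (int(tokens[o + 1]), oops_found),
    }

def mismatches(line):
    """Keep only the sections whose declared count differs from what is present."""
    return {k: v for k, v in section_counts(line).items() if v[0] != v[1]}

print(section_counts(SAMPLE_TAIL))
print(mismatches(SAMPLE_TAIL))
```

For this line the 'data' section is consistent (26 declared, 26
present), while the 'oops' section declares 2 entries and contains 1.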
thanks,
--lx
> On 5/18/22 11:14 PM, Liu, Xin wrote:
>> hi,
>>
>> I got a weird replay file, which was generated by 17.0.3+6-LTS. I don't
>> see that the relevant code has changed since then, so I think it is still
>> applicable to the tip of HotSpot.
>>
>> A customer shared the replay file
>> (https://github.com/corretto/corretto-17/issues/57#issuecomment-1130042063)
>> and I am trying to reproduce his failure. It was written from
>> VMError::report_and_die().
>>
>> One obstacle is the weird entries of ciMethodData. E.g., line 14130
>> declares that 2 non-null oops will follow (see the '2' after the tag
>> 'oops'); however, only one is recorded.
>>
>> ciMethodData kotlin/coroutines/jvm/internal/ContinuationImpl <init>
>> (Lkotlin/coroutines/Continuation;)V 2 21538 orig 80 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 data
>> 26 0x40007 0x402 0x70 0x4e9c 0x70005 0x4d55 0x0 0x7f6a5841c3c0 0xa3
>> 0x7f6a5841c470 0xa4 0xc0003 0x4e9c 0x18 0x110002 0x529e 0x0 0x0 0x0 0x0
>> 0x0 0x0 0x9 0x2 0x6 0x0 oops 2 7
>> com/example/ProductAttRouter$withRequestLoggingContext$1 methods 0
>>
>> Another mismatched entry is at line 14203. It says there are 11
>> oops, but only 6 are there.
>>
>> Those mismatched entries leave elements of rec->_classes uninitialized
>> and eventually crash ciReplay::initialize(). Have you seen them before?
>> I can patch HotSpot to handle this mismatch, but I wonder how it
>> happens.
>>
>> ciMethodData::dump_replay_data() iterates over _data in 2 rounds. The
>> 1st round counts the oops and the 2nd round dumps them.
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/ci/ciMethodData.cpp#L728
>>
>> Is it possible that the underlying data get updated on the fly? This
>> case involves Kotlin coroutines. I am not sure whether that is the same
>> threading environment as classic Java.
>>
>> thanks,
>> --lx
>>