[14] RFR(S): 8231501: VM crash in MethodData::clean_extra_data(CleanExtraDataClosure*): fatal error: unexpected tag 99

Tue Dec 3 07:12:42 UTC 2019

Thank you Tobias for your review!

Best regards,
Christian

On 02.12.19 14:07, Tobias Hartmann wrote:
> Hi Christian,
> 
> looks reasonable to me.
> 
> Best regards,
> Tobias
> 
> On 20.11.19 15:14, Christian Hagedorn wrote:
>> Hi
>>
>> Please review the following patch:
>> https://bugs.openjdk.java.net/browse/JDK-8231501
>> http://cr.openjdk.java.net/~chagedorn/8231501/webrev.00/
>>
>> The bug traces back to a race between the concurrent cleaning of method data, including its extra
>> data, in MethodData::clean_method_data() and the loading/copying of extra data for the ci method
>> data in ciMethodData::load_extra_data(). I reproduced the bug with the test [1], which extensively
>> cleans method data through the whitebox API [2].
>>
>> Before loading and copying the extra data from the MDO to the ciMDO in
>> ciMethodData::load_extra_data(), the metadata is prepared in a fixed-point iteration by cleaning all
>> SpeculativeTrapData entries of methods whose klasses are unloaded [3]. If it encounters such a dead
>> entry, it releases the extra data lock (due to ranking issues) and tries again later [4]. This
>> release of the lock triggers the bug: thread A may be waiting in the whitebox API method for the
>> extra data lock [2] in order to clean the extra data of the very same MDO for which thread B has
>> just released the lock at [4]. If that MDO actually contains SpeculativeTrapData entries, thread A
>> cleans them, but the ciMDO that thread B is preparing still contains the old, uncleaned MDO extra
>> data (because thread B only took a snapshot of the MDO earlier at [5]). Things go wrong once thread B
>> reacquires the lock after thread A. It tries to load the now cleaned extra data and immediately
>> finishes at [6] since there are no SpeculativeTrapData entries anymore. It has copied only a single
>> entry with tag DataLayout::no_tag [7] to the ciMDO, over what was a SpeculativeTrapData entry there.
>> This leaves a half-cleared entry (since a SpeculativeTrapData entry has an additional cell for the
>> method) and possibly further stale SpeculativeTrapData entries behind it:
>>
>>
>> Let's assume a little-endian ordering and that both 0x00007fff... addresses are real pointers to
>> methods. Tag 13 (0x0d) is used for SpeculativeTrapData and dp points to the first extra data entry:
>>
>> ciMDO extra data before thread B releases the lock at [4] (same extra data for MDO and ciMDO):
>> 0x800000040011000d 0x00007fffd4993c63 0x800000040011000d 0x00007fffd49b1a68 0x0000000000000000
>> dp: tag = 13 -> next entry = dp+16; dp+8: method 0x00007fffd4993c63
>> dp+16: tag = 13 -> next entry = dp+32; dp+24: method 0x00007fffd49b1a68
>> dp+32: tag = 0 -> end of extra data
>>
>> MDO extra data after thread A has cleaned the MDO and thread B has reacquired the lock (ciMDO extra
>> data is unchanged):
>> 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000
>> dp: tag = 0 -> end of extra data
>>
>>
>> Returning at [6] when the extra data loading from MDO to ciMDO is finished:
>> MDO extra data:
>> 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000
>> dp: tag = 0 -> end of extra data
>>
>> ciMDO extra data, only copied the first no_tag entry from MDO at [7] (8 bytes):
>> 0x0000000000000000 0x00007fffd4993c63 0x800000040011000d 0x00007fffd49b1a68 0x0000000000000000
>> dp: tag = 0 -> next entry = dp+8
>> dp+8: tag = 0x63 = 99 -> there is no tag 99 -> fatal...
>>
>>
>> The next time the ciMDO extra data is iterated, for example by using MethodData::next_extra(), the
>> iteration steps over the first no_tag entry, treats the method pointer at offset 8 as an entry
>> header and reads tag 99 (0x63, the lowest byte of 0x00007fffd4993c63). This causes the crash since
>> there is no tag 99.
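
The walk over the dumps above can be replayed with a small stand-alone model. This is only a sketch, not HotSpot code: the names tag_of and first_unexpected_tag are made up for illustration, a cell is assumed to be 8 bytes, and the walker is assumed to step over a no_tag entry by one cell, as the dump annotations describe.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy constants taken from the mail: 0 is no_tag, 13 (0x0d) is SpeculativeTrapData.
constexpr uint8_t kNoTag = 0;
constexpr uint8_t kSpeculativeTrapTag = 13;

// With little-endian ordering the tag is the lowest byte of the 8-byte header cell.
inline uint8_t tag_of(uint64_t header_cell) {
  return static_cast<uint8_t>(header_cell & 0xff);
}

// Hypothetical walker: returns -1 if every visited entry has a known tag,
// otherwise the first unknown tag it reads.
int first_unexpected_tag(const std::vector<uint64_t>& cells) {
  std::size_t i = 0;
  while (i < cells.size()) {
    const uint8_t tag = tag_of(cells[i]);
    if (tag == kNoTag) {
      i += 1;  // header cell only
    } else if (tag == kSpeculativeTrapTag) {
      i += 2;  // header cell plus one cell holding the method pointer
    } else {
      return tag;  // a method pointer was misread as an entry header
    }
  }
  return -1;
}
```

Fed with the five cells of the intact ciMDO dump, the walker sees 13, 13, 0 and stops cleanly. Fed with the half-cleared dump, it steps over the zeroed header and reads 0x63 from the leftover method pointer.
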
>>
>>
>> The fix is to completely zero out the current and all following SpeculativeTrapData entries in the
>> ciMDO if we encounter a no_tag in the MDO but a speculative_trap_data_tag in the ciMDO. The method
>> data is also cleaned in other places, so the bug is not tied to the whitebox API usage, but it
>> occurs very rarely.
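
The idea of the fix can be sketched on the same toy layout. This is not the code from the webrev: clear_stale_trap_entries is a hypothetical helper, cells are assumed to be 8 bytes, and a SpeculativeTrapData entry is assumed to occupy exactly two cells (header plus method).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy constant from the mail: tag 13 (0x0d) marks a SpeculativeTrapData entry.
constexpr uint8_t kSpeculativeTrapTag = 13;

// With little-endian ordering the tag is the lowest byte of the 8-byte header cell.
inline uint8_t tag_of(uint64_t header_cell) {
  return static_cast<uint8_t>(header_cell & 0xff);
}

// Hypothetical helper, meant to be called when the MDO entry at cell index i has
// no_tag while the ciMDO snapshot may still hold stale entries from index i onward.
void clear_stale_trap_entries(std::vector<uint64_t>& ci_cells, std::size_t i) {
  assert(i < ci_cells.size());
  // Zero each stale SpeculativeTrapData entry completely: header and method cell.
  while (i + 1 < ci_cells.size() && tag_of(ci_cells[i]) == kSpeculativeTrapTag) {
    ci_cells[i] = 0;
    ci_cells[i + 1] = 0;
    i += 2;
  }
}
```

Applied at index 0 to the intact ciMDO dump from above, this leaves five zero cells, which matches the cleaned MDO, so a later iteration no longer runs into a leftover method pointer.
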
>>
>> Thank you!
>>
>> Best regards,
>> Christian
>>
>>
>> [1]
>> http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/test/hotspot/jtreg/compiler/types/correctness/CorrectnessTest.java
>>
>> [2] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/prims/whitebox.cpp#l1137
>> [3] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l137
>> [4] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l115
>> [5] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l219
>> [6] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l191
>> [7] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l176