[15?] RFR (S): 8249192: MonitorInfo stores raw oops across safepoints

Wed Jul 22 02:13:51 UTC 2020

Hi Thomas,

The fix looks okay to me.
The 15 fix is identical to 16.

This file finally was not changed: src/hotspot/share/runtime/vframe_hp.cpp .
Several files need a copyright comment update.

What tests do you run?
We need at least tier3 to make sure there are no regressions in the 
JVMTI and JDI tests.

Thanks,
Serguei

On 7/21/20 11:00, Thomas Schatzl wrote:
> Hi Coleen and David,
>
>   thanks for your reviews.
>
> On 21.07.20 03:29, David Holmes wrote:
>> Hi Thomas,
>>
>> On 21/07/2020 12:49 am, Thomas Schatzl wrote:
>>> Forwarding to hotspot-dev where it belongs after wrongly sending to 
>>> hotspot-gc-dev.
>>
>> This touches serviceability code as well so cc'ing for good measure.
>>
>> Thanks for taking this one on as it wasn't actually a GC issue!
>
> I have been looking into strange G1 crashes with changed young gen 
> sizing in tier8, which looked very similar to the crashes in that 
> StackWalker/LocalsAndOperands.java test (thanks to Dean Long making us 
> aware of these crashes).
> I guessed correctly in hindsight, that the main difference between C2 
> and Graal would be GC timing, so the problem could be related. Also we 
> have some P2s for 15 with the same stack trace as in this reproducer 
> and the mentioned tier8 (Kitchensink, 24h dacapo) failures that never 
> reproduced... given that the issue reproduced much quicker in 
> LocalsAndOperands (and given its source code gives a pretty narrow 
> area where to look) it seemed easier to start with this one...
>
>>>
>>> -------- Forwarded Message --------
>>> Subject: [15?] RFR (S): 8249192: MonitorInfo stores raw oops across 
>>> safepoints
>>> Date: Mon, 20 Jul 2020 12:07:38 +0200
>>> From: Thomas Schatzl <thomas.schatzl at oracle.com>
>>> To: hotspot-dev at openjdk.java.net <hotspot-gc-dev at openjdk.java.net>
>>>
>>> Hi all,
>>>
>>>    can I get some reviews to handle'ize some raw oops in the 
>>> MonitorInfo class?
>>>
>>> (Afaiu only) in LiveFrameStream::monitors_to_object_array() we try 
>>> to allocate an objArray with raw oops held in the MonitorInfo class 
>>> that are passed in a GrowableArray. This allocation can lead to a 
>>> garbage collection, with the usual random crashes.
>>
>> Right - seems so obvious now. <sigh>
>>
>> Took me a while to convince myself no such similar problem was 
>> lurking in the JVM TI code.
>>
>>> This change changes the raw oops in MonitorInfo to Handles, 
>>
>> My main concern here was whether the MonitorInfo objects are thread 
>> confined. For the StackWalker API we are always dealing with the 
>> current thread so that is fine. For JVM TI, in mainline, we may be 
>> executing code in the calling thread or the target thread; and in 
>> older releases it will be the VMThread at a safepoint. But it seems 
>> that the MonitorInfo's are confined to whichever thread that is, and 
>> so Handle usage is safe.
>
>
>>
>>> and adds a few HandleMarks along the way to make these handles go 
>>> away asap.
>>
>> That, and the ResourceMark changes, were a bit hard to follow. 
>> Basically a HandleMark is now present in the scope of, or just above, 
>> the call to monitors(). The need for the additional ResourceMarks is 
>> far from clear though. In particular I wonder if the RM introduced in 
>> Deoptimization::revoke_from_deopt_handler interacts with the special 
>> DeoptResourceMark in its caller 
>> Deoptimization::fetch_unroll_info_helper? (I have no idea what a 
>> DeoptResourceMark is.)
>
> The DeoptResourceMark in this case seems to act just like a regular 
> ResourceMark, using the thread's resource area.
>
> Other than acting like a ResourceMark, the DeoptResourceMark only 
> seems to be an indicator used in some asserts to verify that no 
> further deoptimization is running.
>
> Looking at the called BiasedLocking::revoke_own_lock(), it adds its 
> own (regular) ResourceMark quite early (further indicating that normal 
> ResourceMarks and DeoptResourceMarks should be "compatible"), meaning 
> that only any resource object allocated in the resource area between 
> the added ResourceMark in Deoptimization::revoke_from_deopt_handler() 
> and the existing one in BiasedLocking::revoke_own_lock() would have a 
> different lifetime than before. There is no resource object allocation 
> in there, actually the only thing that happens is unpacking the 
> contents of the passed handle.
>
> The objects_to_revoke array in 
> Deoptimization::revoke_from_deopt_handler does not escape the method too.
>
> I was much more worried about the caching of a 
> GrowableArray<MonitorInfo> in Thread::cached_monitor_info going on 
> during biased locking...
>
>>
>>>
>>> This issue has been introduced in JDK-8140450: Implement 
>>> Stack-Walking API in jdk9.
>>>
>>> The CR has been triaged as P3, but I would like to ask whether it 
>>> might be good to increase its priority to P2 and apply for inclusion 
>>> in 15. My arguments are as follows:
>>>
>>> - the original issue why I started looking at this were lots of 
>>> seemingly random crashes (5 or 6 were reported and the change 
>>> temporarily backed out for this reason) in tier8 with a g1 change 
>>> that changed young gen sizing. These crashes including that young 
>>> gen sizing change are all gone now with this bugfix.
>>> I.e. this suggests that so far we seem to have not encountered this 
>>> issue more frequently due to pure luck wrt to generation sizing.
>>>
>>> - it affects all collectors (naturally).
>>>
>>> - there are quite a few user reported random crashes with IntelliJ 
>>> and variants, which due to the nature of IDEs tending to retrieve 
>>> stack traces fairly frequently would be more affected than usual. So 
>>> I suspect at least some of them to be caused by this issue, these 
>>> are the only raw oops I am aware of.
>>>
>>> My understanding of the cause and fix is fairly good, but I am no 
>>> expert in this area, so I would like to defer to you about this 
>>> suggestion. The change is imo important enough to be backported to 
>>> 11 and 15 anyway, but the question is about the risk/reward tradeoff 
>>> wrt to bringing it to 15 and not 15.0.1.
>>
>> I'd classify this as a P2 without doubt. As Dan noted there is no 
>> workaround as such.
>>
>>> CR:
>>> https://bugs.openjdk.java.net/browse/JDK-8249192
>>> Webrev:
>>> http://cr.openjdk.java.net/~tschatzl/8249192/webrev/
>>
>> src/hotspot/share/runtime/deoptimization.cpp
>>
>> The code in collect_monitors takes the monitor owner oop and 
>> Handelises it to add to its own GrowableArray of Handles. Is it worth 
>> exposing the MonitorInfo owner() in Handle form to avoid this 
>> unwrapping and re-wrapping?
>
> We talked a bit about this internally and came to the conclusion to 
> initially provide a small fix for 15 and 16, and do that suggested 
> refactoring in 16 only.
>
>>
>> src/hotspot/share/runtime/vframe.hpp
>>
>> I agree with Coleen that the MonitorInfo constructor should not take 
>> a Thread* but should itself materialize and use Thread::current().
>
> Fixed.
>
> New webrevs (jdk16):
>
> http://cr.openjdk.java.net/~tschatzl/8249192/webrev.0_to_1/ (diff)
> http://cr.openjdk.java.net/~tschatzl/8249192/webrev.1/ (full)
>
> jdk15:
>
> http://cr.openjdk.java.net/~tschatzl/8249192/webrev.jdk15.1/
>
> The only difference is that JDK-8247729 in 16 changed a
>
> ResourceMark rm;
>
> in jdk15 (jvmtiEnvBase.cpp:1029) from
>
> ResourceMark rm(current_thread);
>
> in jdk16 (jvmtiEnvBase.cpp:1008)
>
> which gives a merge error now. See also 
> https://hg.openjdk.java.net/jdk/jdk/rev/f8a9be0f9e1a#l2.82 .
>
> Started another tier1-5 run for jdk15; both versions passed 1.2k 
> iterations of the LocalsAndOperands.java test again.
>
> Thanks,
>   Thomas