[15?] RFR (S): 8249192: MonitorInfo stores raw oops across safepoints
serguei.spitsyn at oracle.com
serguei.spitsyn at oracle.com
Wed Jul 22 02:13:51 UTC 2020
Hi Thomas,
The fix looks okay to me.
The 15 fix is identical to 16.
This file finally was not changed: src/hotspot/share/runtime/vframe_hp.cpp .
Several files need a copyright comment update.
What tests do you run?
We need at least tier3 to make sure there are no regressions in the
JVMTI and JDI tests.
Thanks,
Serguei
On 7/21/20 11:00, Thomas Schatzl wrote:
> Hi Coleen and David,
>
> thanks for your reviews.
>
> On 21.07.20 03:29, David Holmes wrote:
>> Hi Thomas,
>>
>> On 21/07/2020 12:49 am, Thomas Schatzl wrote:
>>> Forwarding to hotspot-dev where it belongs after wrongly sending to
>>> hotspot-gc-dev.
>>
>> This touches serviceability code as well so cc'ing for good measure.
>>
>> Thanks for taking this one on as it wasn't actually a GC issue!
>
> I have been looking into strange G1 crashes with changed young gen
> sizing in tier8, which looked very similar to the crashes in that
> StackWalker/LocalsAndOperands.java test (thanks to Dean Long making us
> aware of these crashes).
> I guessed correctly in hindsight, that the main difference between C2
> and Graal would be GC timing, so the problem could be related. Also we
> have some P2s for 15 with the same stack trace as in this reproducer
> and the mentioned tier8 (Kitchensink, 24h dacapo) failures that never
> reproduced... given that the issue reproduced much quicker in
> LocalsAndOperands (and given its source code gives a pretty narrow
> area where to look) it seemed easier to start with this one...
>
>>>
>>> -------- Forwarded Message --------
>>> Subject: [15?] RFR (S): 8249192: MonitorInfo stores raw oops across
>>> safepoints
>>> Date: Mon, 20 Jul 2020 12:07:38 +0200
>>> From: Thomas Schatzl <thomas.schatzl at oracle.com>
>>> To: hotspot-dev at openjdk.java.net <hotspot-gc-dev at openjdk.java.net>
>>>
>>> Hi all,
>>>
>>> can I get some reviews to handle'ize some raw oops in the
>>> MonitorInfo class?
>>>
>>> (Afaiu only) in LiveFrameStream::monitors_to_object_array() we try
>>> to allocate an objArray with raw oops held in the MonitorInfo class
>>> that are passed in a GrowableArray. This allocation can lead to a
>>> garbage collection, with the usual random crashes.
>>
>> Right - seems so obvious now. <sigh>
>>
>> Took me a while to convince myself no such similar problem was
>> lurking in the JVM TI code.
>>
>>> This change changes the raw oops in MonitorInfo to Handles,
>>
>> My main concern here was whether the MonitorInfo objects are thread
>> confined. For the StackWalker API we are always dealing with the
>> current thread so that is fine. For JVM TI, in mainline, we may be
>> executing code in the calling thread or the target thread; and in
>> older releases it will be the VMThread at a safepoint. But it seems
>> that the MonitorInfo's are confined to whichever thread that is, and
>> so Handle usage is safe.
>
>
>>
>>> and adds a few HandleMarks along the way to make these handles go
>>> away asap.
>>
>> That, and the ResourceMark changes, were a bit hard to follow.
>> Basically a HandleMark is now present in the scope of, or just above,
>> the call to monitors(). The need for the additional ResourceMarks is
>> far from clear though. In particular I wonder if the RM introduced in
>> Deoptimization::revoke_from_deopt_handler interacts with the special
>> DeoptResourceMark in its caller
>> Deoptimization::fetch_unroll_info_helper? (I have no idea what a
>> DeoptResourceMark is.)
>
> The DeoptResourceMark in this case seems to act just like a regular
> ResourceMark, using the thread's resource area.
>
> Other than acting like a ResourceMark, the DeoptResourceMark only
> seems to be an indicator used in some asserts to verify that no
> further deoptimization is running.
>
> Looking at the called BiasedLocking::revoke_own_lock(), it adds its
> own (regular) ResourceMark quite early (further indicating that normal
> ResourceMarks and DeoptResourceMarks should be "compatible"), meaning
> that only any resource object allocated in the resource area between
> the added ResourceMark in Deoptimization::revoke_from_deopt_handler()
> and the existing one in BiasedLocking::revoke_own_lock() would have a
> different lifetime than before. There is no resource object allocation
> in there, actually the only thing that happens is unpacking the
> contents of the passed handle.
>
> The objects_to_revoke array in
> Deoptimization::revoke_from_deopt_handler does not escape the method too.
>
> I was much more worried about the caching of a
> GrowableArray<MonitorInfo> in Thread::cached_monitor_info going on
> during biased locking...
>
>>
>>>
>>> This issue has been introduced in JDK-8140450: Implement
>>> Stack-Walking API in jdk9.
>>>
>>> The CR has been triaged as P3, but I would like to ask whether it
>>> might be good to increase its priority to P2 and apply for inclusion
>>> in 15. My arguments are as follows:
>>>
>>> - the original issue why I started looking at this were lots of
>>> seemingly random crashes (5 or 6 were reported and the change
>>> temporarily backed out for this reason) in tier8 with a g1 change
>>> that changed young gen sizing. These crashes including that young
>>> gen sizing change are all gone now with this bugfix.
>>> I.e. this suggests that so far we seem to have not encountered this
>>> issue more frequently due to pure luck wrt to generation sizing.
>>>
>>> - it affects all collectors (naturally).
>>>
>>> - there are quite a few user reported random crashes with IntelliJ
>>> and variants, which due to the nature of IDEs tending to retrieve
>>> stack traces fairly frequently would be more affected than usual. So
>>> I suspect at least some of them to be caused by this issue, these
>>> are the only raw oops I am aware of.
>>>
>>> My understanding of the cause and fix is fairly good, but I am no
>>> expert in this area, so I would like to defer to you about this
>>> suggestion. The change is imo important enough to be backported to
>>> 11 and 15 anyway, but the question is about the risk/reward tradeoff
>>> wrt to bringing it to 15 and not 15.0.1.
>>
>> I'd classify this as a P2 without doubt. As Dan noted there is no
>> workaround as such.
>>
>>> CR:
>>> https://bugs.openjdk.java.net/browse/JDK-8249192
>>> Webrev:
>>> http://cr.openjdk.java.net/~tschatzl/8249192/webrev/
>>
>> src/hotspot/share/runtime/deoptimization.cpp
>>
>> The code in collect_monitors takes the monitor owner oop and
>> Handelises it to add to its own GrowableArray of Handles. Is it worth
>> exposing the MonitorInfo owner() in Handle form to avoid this
>> unwrapping and re-wrapping?
>
> We talked a bit about this internally and came to the conclusion to
> initially provide a small fix for 15 and 16, and do that suggested
> refactoring in 16 only.
>
>>
>> src/hotspot/share/runtime/vframe.hpp
>>
>> I agree with Coleen that the MonitorInfo constructor should not take
>> a Thread* but should itself materialize and use Thread::current().
>
> Fixed.
>
> New webrevs (jdk16):
>
> http://cr.openjdk.java.net/~tschatzl/8249192/webrev.0_to_1/ (diff)
> http://cr.openjdk.java.net/~tschatzl/8249192/webrev.1/ (full)
>
> jdk15:
>
> http://cr.openjdk.java.net/~tschatzl/8249192/webrev.jdk15.1/
>
> The only difference is that JDK-8247729 in 16 changed a
>
> ResourceMark rm;
>
> in jdk15 (jvmtiEnvBase.cpp:1029) from
>
> ResourceMark rm(current_thread);
>
> in jdk16 (jvmtiEnvBase.cpp:1008)
>
> which gives a merge error now. See also
> https://hg.openjdk.java.net/jdk/jdk/rev/f8a9be0f9e1a#l2.82 .
>
> Started another tier1-5 run for jdk15; both versions passed 1.2k
> iterations of the LocalsAndOperands.java test again.
>
> Thanks,
> Thomas
More information about the hotspot-dev
mailing list