RFR[13]: 8224531: SEGV while collecting Klass statistics
coleen.phillimore at oracle.com
coleen.phillimore at oracle.com
Wed Jul 3 12:10:22 UTC 2019
http://cr.openjdk.java.net/~eosterlund/8224531/webrev.00/src/hotspot/share/memory/heapInspection.cpp.frames.html
There's another object_iterate() in this file with a comment to change
it to safe_object_iterate(). Should you change that too?
Did you run the jvmti tests? There used to be tests that failed if dead
objects weren't found, but the tests may have been fixed.
The rest of the change looks good. Thank you for figuring this out!
Coleen
On 7/3/19 6:15 AM, Erik Österlund wrote:
> Hi,
>
> The heap inspection API performs pointer chasing through dead memory.
> In particular, it populates a table with classes based on a heap walk
> using CollectedHeap::object_iterate(). That API can give you dead
> objects as well, whereas CollectedHeap::safe_object_iterate() gives
> you only live objects.
> The possibly dead objects have their possibly dead Klass* recorded.
> Then we call Klass::collect_statistics on said possibly dead recorded
> classes. In there, we follow the possibly dead mirror oop of the
> possibly dead Klass, and subsequently read the Klass* of the possibly
> dead mirror.
> Here is an attempted ASCII art explaining this
>
> | Klass* | -> | |
> | hi I am a | | hi I am a |
> | dead object | | dead Klass |
> | |
> | oop mirror | -> | Klass* | -> | Class klass |
> | hi I am a |
> | dead mirror |
>
> So as you can see we pointer chase through this chain of possibly dead
> memory. What could possibly go wrong though?
> In CMS a crash can manifest in the following way:
>
> In a concurrent collection, both the object and its class die. They
> are dead once we reach final marking. But the memory is kept around
> and being swept by concurrent sweeping.
> The sweeping yields to young collections (controllable through the
> CMSYield flag). So what can happen is that the mirror is swept (which
> already is a problem in debug builds because we zap the free chunk of
> memory, but hold on there is a problem in in product builds too) and
> gets added to a free list. In the yielded safepoint we may perform a
> young collection that promotes objects to the memory of the freed
> chunk (where the mirror used to be, except due to coalescing of freed
> chunks, there might not be a Klass* pointer where there used to be one
> for the mirror). And then, before sweeping finishes, the heap
> inspection API is called. That API sometimes tries to perform a STW GC
> first to get only live objects, but that GC may fail because of the
> JNI gc locker. And then it just goes ahead calling the unsafe
> object_iterate API anyway. The object_iterate API will pass in the
> object to the closure but not the mirror as it has been freed and
> reused. Buuut... since we pointer chase through the dead object to the
> stale reference to the dead mirror, we eventually find ourselves in an
> awkward situation where we try to read and use a Klass* that might
> really be a primitive vaue now (read: crash).
>
> The general rule of thumb is that pointer chasing through dead memory
> should NOT be done. We allow it in a few rare situations with the
> following constraints: 1) You have to use AS_NO_KEEPALIVE when reading
> dead oops, or things can blow up, 2) You may only read dead oops if
> you are the GC and hence can control that the memory it points at has
> not and will not be freed until your read finishes (...because you are
> the GC).
> Neither of these two constraints hold here. We read the mirrors
> without AS_NO_KEEPALIVE in the pointer chase, and we can not control
> that the memory it points at has not been freed. Therefore, this is an
> invalid use of pointer chasing through dead memory. The fix is simple:
> use the safe_object_iterate API instead, which only hands out live
> objects. I also sprinkled in no_keepalive decorators on the mirrors
> because it's good practice to not use that for such use cases where
> you clobber the whole heap (causing it to be marked in ZGC) but really
> just read an some int or something from the oop, without publishing
> any references to the oop.
>
> I tested this with 100 kitchensink iterations without my fix (failed 2
> times) and 100 kitchensink iterations with my fix (failed 0 times).
>
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8224531
>
> Webrev:
> http://cr.openjdk.java.net/~eosterlund/8224531/webrev.00/
>
> Thanks,
> /Erik
More information about the hotspot-dev
mailing list