RFR[13]: 8224531: SEGV while collecting Klass statistics
Erik Österlund
erik.osterlund at oracle.com
Wed Jul 3 12:48:41 UTC 2019
Hi Coleen,
Thanks for the review.
On 2019-07-03 14:10, coleen.phillimore at oracle.com wrote:
>
> http://cr.openjdk.java.net/~eosterlund/8224531/webrev.00/src/hotspot/share/memory/heapInspection.cpp.frames.html
>
>
> There's another object_iterate() in this file with a comment to change
> it to safe_object_iterate(). Should you change that too?
It probably should change, but it does not seem to have the same
behaviour and I don't know if there is a real bug in that unrelated
code. But yeah I would prefer that to change too, but I think that
should be a separate change.
> Did you run the jvmti tests? There used to be tests that failed if
> dead objects weren't found, but the tests may have been fixed.
I will take it for a spin. Note though that behaviour relying on always
getting dead objects will fail; the caller of the API will race with
concurrent sweeping and either get or not get the dead objects depending
on whether the heap iteration happened to kick in before or after
sweeping. There is just no way you can rely on that. So if there is a
test failure because of that, the test is wrong. Nevertheless, I will
try to hunt down such tests.
Thanks,
/Erik
> The rest of the change looks good. Thank you for figuring this out!
> Coleen
>
> On 7/3/19 6:15 AM, Erik Österlund wrote:
>> Hi,
>>
>> The heap inspection API performs pointer chasing through dead memory.
>> In particular, it populates a table with classes based on a heap walk
>> using CollectedHeap::object_iterate(). That API can give you dead
>> objects as well, whereas CollectedHeap::safe_object_iterate() gives
>> you only live objects.
>> The possibly dead objects have their possibly dead Klass* recorded.
>> Then we call Klass::collect_statistics on said possibly dead recorded
>> classes. In there, we follow the possibly dead mirror oop of the
>> possibly dead Klass, and subsequently read the Klass* of the possibly
>> dead mirror.
>> Here is an attempted ASCII art explaining this
>>
>> | Klass* | -> | |
>> | hi I am a | | hi I am a |
>> | dead object | | dead Klass |
>> | |
>> | oop mirror | -> | Klass* | -> | Class klass |
>> | hi I am a |
>> | dead mirror |
>>
>> So as you can see we pointer chase through this chain of possibly
>> dead memory. What could possibly go wrong though?
>> In CMS a crash can manifest in the following way:
>>
>> In a concurrent collection, both the object and its class die. They
>> are dead once we reach final marking. But the memory is kept around
>> and being swept by concurrent sweeping.
>> The sweeping yields to young collections (controllable through the
>> CMSYield flag). So what can happen is that the mirror is swept (which
>> already is a problem in debug builds because we zap the free chunk of
>> memory, but hold on there is a problem in in product builds too) and
>> gets added to a free list. In the yielded safepoint we may perform a
>> young collection that promotes objects to the memory of the freed
>> chunk (where the mirror used to be, except due to coalescing of freed
>> chunks, there might not be a Klass* pointer where there used to be
>> one for the mirror). And then, before sweeping finishes, the heap
>> inspection API is called. That API sometimes tries to perform a STW
>> GC first to get only live objects, but that GC may fail because of
>> the JNI gc locker. And then it just goes ahead calling the unsafe
>> object_iterate API anyway. The object_iterate API will pass in the
>> object to the closure but not the mirror as it has been freed and
>> reused. Buuut... since we pointer chase through the dead object to
>> the stale reference to the dead mirror, we eventually find ourselves
>> in an awkward situation where we try to read and use a Klass* that
>> might really be a primitive vaue now (read: crash).
>>
>> The general rule of thumb is that pointer chasing through dead memory
>> should NOT be done. We allow it in a few rare situations with the
>> following constraints: 1) You have to use AS_NO_KEEPALIVE when
>> reading dead oops, or things can blow up, 2) You may only read dead
>> oops if you are the GC and hence can control that the memory it
>> points at has not and will not be freed until your read finishes
>> (...because you are the GC).
>> Neither of these two constraints hold here. We read the mirrors
>> without AS_NO_KEEPALIVE in the pointer chase, and we can not control
>> that the memory it points at has not been freed. Therefore, this is
>> an invalid use of pointer chasing through dead memory. The fix is
>> simple: use the safe_object_iterate API instead, which only hands out
>> live objects. I also sprinkled in no_keepalive decorators on the
>> mirrors because it's good practice to not use that for such use cases
>> where you clobber the whole heap (causing it to be marked in ZGC) but
>> really just read an some int or something from the oop, without
>> publishing any references to the oop.
>>
>> I tested this with 100 kitchensink iterations without my fix (failed
>> 2 times) and 100 kitchensink iterations with my fix (failed 0 times).
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8224531
>>
>> Webrev:
>> http://cr.openjdk.java.net/~eosterlund/8224531/webrev.00/
>>
>> Thanks,
>> /Erik
>
More information about the hotspot-dev
mailing list