RFR[13]: 8224531: SEGV while collecting Klass statistics
coleen.phillimore at oracle.com
coleen.phillimore at oracle.com
Wed Jul 3 12:50:02 UTC 2019
On 7/3/19 8:48 AM, Erik Österlund wrote:
> Hi Coleen,
>
> Thanks for the review.
>
> On 2019-07-03 14:10, coleen.phillimore at oracle.com wrote:
>>
>> http://cr.openjdk.java.net/~eosterlund/8224531/webrev.00/src/hotspot/share/memory/heapInspection.cpp.frames.html
>>
>>
>> There's another object_iterate() in this file with a comment to
>> change it to safe_object_iterate(). Should you change that too?
>
> It probably should change, but it does not seem to have the same
> behaviour and I don't know if there is a real bug in that unrelated
> code. But yeah I would prefer that to change too, but I think that
> should be a separate change.
That's fine.
>
>> Did you run the jvmti tests? There used to be tests that failed if
>> dead objects weren't found, but the tests may have been fixed.
>
> I will take it for a spin. Note though that behaviour relying on
> always getting dead objects will fail; the caller of the API will race
> with concurrent sweeping and either get or not get the dead objects
> depending on whether the heap iteration happened to kick in before or
> after sweeping. There is just no way you can rely on that. So if there
> is a test failure because of that, the test is wrong. Nevertheless, I
> will try to hunt down such tests.
>
Yes, please.
Thanks,
Coleen
> Thanks,
> /Erik
>
>> The rest of the change looks good. Thank you for figuring this out!
>> Coleen
>>
>> On 7/3/19 6:15 AM, Erik Österlund wrote:
>>> Hi,
>>>
>>> The heap inspection API performs pointer chasing through dead
>>> memory. In particular, it populates a table with classes based on a
>>> heap walk using CollectedHeap::object_iterate(). That API can give
>>> you dead objects as well, whereas
>>> CollectedHeap::safe_object_iterate() gives you only live objects.
>>> The possibly dead objects have their possibly dead Klass* recorded.
>>> Then we call Klass::collect_statistics on said possibly dead
>>> recorded classes. In there, we follow the possibly dead mirror oop
>>> of the possibly dead Klass, and subsequently read the Klass* of the
>>> possibly dead mirror.
>>> Here is an attempted ASCII art explaining this
>>>
>>> | Klass* | -> | |
>>> | hi I am a | | hi I am a |
>>> | dead object | | dead Klass |
>>> | |
>>> | oop mirror | -> | Klass* | -> | Class klass |
>>> | hi I am a |
>>> | dead mirror |
>>>
>>> So as you can see we pointer chase through this chain of possibly
>>> dead memory. What could possibly go wrong though?
>>> In CMS a crash can manifest in the following way:
>>>
>>> In a concurrent collection, both the object and its class die. They
>>> are dead once we reach final marking. But the memory is kept around
>>> and being swept by concurrent sweeping.
>>> The sweeping yields to young collections (controllable through the
>>> CMSYield flag). So what can happen is that the mirror is swept
>>> (which already is a problem in debug builds because we zap the free
>>> chunk of memory, but hold on there is a problem in in product builds
>>> too) and gets added to a free list. In the yielded safepoint we may
>>> perform a young collection that promotes objects to the memory of
>>> the freed chunk (where the mirror used to be, except due to
>>> coalescing of freed chunks, there might not be a Klass* pointer
>>> where there used to be one for the mirror). And then, before
>>> sweeping finishes, the heap inspection API is called. That API
>>> sometimes tries to perform a STW GC first to get only live objects,
>>> but that GC may fail because of the JNI gc locker. And then it just
>>> goes ahead calling the unsafe object_iterate API anyway. The
>>> object_iterate API will pass in the object to the closure but not
>>> the mirror as it has been freed and reused. Buuut... since we
>>> pointer chase through the dead object to the stale reference to the
>>> dead mirror, we eventually find ourselves in an awkward situation
>>> where we try to read and use a Klass* that might really be a
>>> primitive vaue now (read: crash).
>>>
>>> The general rule of thumb is that pointer chasing through dead
>>> memory should NOT be done. We allow it in a few rare situations with
>>> the following constraints: 1) You have to use AS_NO_KEEPALIVE when
>>> reading dead oops, or things can blow up, 2) You may only read dead
>>> oops if you are the GC and hence can control that the memory it
>>> points at has not and will not be freed until your read finishes
>>> (...because you are the GC).
>>> Neither of these two constraints hold here. We read the mirrors
>>> without AS_NO_KEEPALIVE in the pointer chase, and we can not control
>>> that the memory it points at has not been freed. Therefore, this is
>>> an invalid use of pointer chasing through dead memory. The fix is
>>> simple: use the safe_object_iterate API instead, which only hands
>>> out live objects. I also sprinkled in no_keepalive decorators on the
>>> mirrors because it's good practice to not use that for such use
>>> cases where you clobber the whole heap (causing it to be marked in
>>> ZGC) but really just read an some int or something from the oop,
>>> without publishing any references to the oop.
>>>
>>> I tested this with 100 kitchensink iterations without my fix (failed
>>> 2 times) and 100 kitchensink iterations with my fix (failed 0 times).
>>>
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8224531
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~eosterlund/8224531/webrev.00/
>>>
>>> Thanks,
>>> /Erik
>>
>
More information about the hotspot-dev
mailing list