RFR[13]: 8224531: SEGV while collecting Klass statistics
Erik Österlund
erik.osterlund at oracle.com
Wed Jul 3 10:15:47 UTC 2019
Hi,
The heap inspection API performs pointer chasing through dead memory. In
particular, it populates a table with classes based on a heap walk using
CollectedHeap::object_iterate(). That API can give you dead objects as
well, whereas CollectedHeap::safe_object_iterate() gives you only live
objects.
The possibly dead objects have their possibly dead Klass* recorded. Then
we call Klass::collect_statistics on said possibly dead recorded
classes. In there, we follow the possibly dead mirror oop of the
possibly dead Klass, and subsequently read the Klass* of the possibly
dead mirror.
Here is an attempted ASCII art explaining this
| Klass* | -> | |
| hi I am a | | hi I am a |
| dead object | | dead Klass |
| |
| oop mirror | -> | Klass* | -> | Class klass |
| hi I am a |
| dead mirror |
So as you can see we pointer chase through this chain of possibly dead
memory. What could possibly go wrong though?
In CMS a crash can manifest in the following way:
In a concurrent collection, both the object and its class die. They are
dead once we reach final marking. But the memory is kept around and
being swept by concurrent sweeping.
The sweeping yields to young collections (controllable through the
CMSYield flag). So what can happen is that the mirror is swept (which
already is a problem in debug builds because we zap the free chunk of
memory, but hold on there is a problem in in product builds too) and
gets added to a free list. In the yielded safepoint we may perform a
young collection that promotes objects to the memory of the freed chunk
(where the mirror used to be, except due to coalescing of freed chunks,
there might not be a Klass* pointer where there used to be one for the
mirror). And then, before sweeping finishes, the heap inspection API is
called. That API sometimes tries to perform a STW GC first to get only
live objects, but that GC may fail because of the JNI gc locker. And
then it just goes ahead calling the unsafe object_iterate API anyway.
The object_iterate API will pass in the object to the closure but not
the mirror as it has been freed and reused. Buuut... since we pointer
chase through the dead object to the stale reference to the dead mirror,
we eventually find ourselves in an awkward situation where we try to
read and use a Klass* that might really be a primitive vaue now (read:
crash).
The general rule of thumb is that pointer chasing through dead memory
should NOT be done. We allow it in a few rare situations with the
following constraints: 1) You have to use AS_NO_KEEPALIVE when reading
dead oops, or things can blow up, 2) You may only read dead oops if you
are the GC and hence can control that the memory it points at has not
and will not be freed until your read finishes (...because you are the GC).
Neither of these two constraints hold here. We read the mirrors without
AS_NO_KEEPALIVE in the pointer chase, and we can not control that the
memory it points at has not been freed. Therefore, this is an invalid
use of pointer chasing through dead memory. The fix is simple: use the
safe_object_iterate API instead, which only hands out live objects. I
also sprinkled in no_keepalive decorators on the mirrors because it's
good practice to not use that for such use cases where you clobber the
whole heap (causing it to be marked in ZGC) but really just read an some
int or something from the oop, without publishing any references to the oop.
I tested this with 100 kitchensink iterations without my fix (failed 2
times) and 100 kitchensink iterations with my fix (failed 0 times).
Bug:
https://bugs.openjdk.java.net/browse/JDK-8224531
Webrev:
http://cr.openjdk.java.net/~eosterlund/8224531/webrev.00/
Thanks,
/Erik
More information about the hotspot-dev
mailing list