RFC: One marking bitmap

Fri Oct 6 07:40:27 UTC 2017

On 10/05/2017 05:42 PM, Aleksey Shipilev wrote:
> On 10/05/2017 05:05 PM, Roman Kennke wrote:
>> On 10/05/2017 04:57 PM, Aleksey Shipilev wrote:
>>> On 10/05/2017 04:53 PM, Roman Kennke wrote:
>>>> On 10/05/2017 04:51 PM, Aleksey Shipilev wrote:
>>>>> On 10/05/2017 12:24 PM, Roman Kennke wrote:
>>>>>> The main problem has been concurrent class unloading. When we do this, we unload classes that may
>>>>>> still be referenced by unreachable objects in non-cset regions. When we try to iterate such a region
>>>>>> object-by-object, we may hit such an object with a dangling Klass* and crash. Always having a valid
>>>>>> bitmap around means we can iterate based on actual liveness info, which guarantees that we skip dead
>>>>>> objects with dangling Klass*.
>>>>> I do not understand this explanation. Actually, I don't understand how the second bitmap avoids this
>>>>> issue. One marking bitmap is *also* valid after class unloading (during final mark) has happened,
>>>>> and we can iterate over it safely. Can you give a more verbose example?
>>>> I probably missed the part where we cancel marking and then don't have a valid marking bitmap to
>>>> support safe iteration that skips unreachable objects...
>>> But if we *do* cancel the marking, we either proceed to a degenerated final mark that completes the
>>> bitmap, or we slide to Full GC. Where's the issue? I need a verbose example. And you probably need it
>>> too! And anyone who would be reading this thread years later -- too!
>> Ok, you're right. Sorry.
>> The issue is, or at least was (not sure it is still the case), that we would not slide into Full GC
>> directly. Instead, we do an update-refs pass to fix up any references before sliding into Full GC.
>> That pass would crash because of dangling Klass*.
> 
> So I guess it goes like this.
> 
> As you said: suppose the cycle completes with class unloading, and some dead objects with broken
> Klass* remain in the regions. Now, any iteration that consults the bitmaps is safe, because it will not
> visit those dead objects. This includes the update-refs phase and other marked_object_iterate users.
> 
> There are two complications:
>  a) Code that walks the heap without consulting the bitmaps. This is problematic, because it can
> indeed step on a broken Klass*. But it can also step on broken oop fields that point nowhere, or
> point to some garbage, or worse. So such code would seem to be already broken, and we would need to
> fix it if observed;
>  b) Code that walks the heap with bitmaps, but *during* concmark or *after* a cancelled mark.
> There, the bitmaps are incomplete: they don't include some objects that are alive, because concmark
> had not visited them yet. This should not lead to a crash, because an incomplete bitmap would not
> give you dead objects either way.
> 
> Notable examples of (b) are heap dumps, JVMTI heap walkers, and some of our debugging code.
> For heap dumps, we can force a Full GC to purge all dirty objects, and thus guarantee safe
> iteration. For JVMTI IterateOverHeap and friends, triggering a Full GC is going to be more
> complicated. This is where the second bitmap helps to handle things better. Is there a way out
> without a second bitmap?
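For readers following the argument, here is a toy Python model (not HotSpot code; every name in it is hypothetical) of why bitmap-guided iteration stays safe after class unloading. A region is a dict from start word to class descriptor, with `None` standing in for a dangling Klass*. An object's size can only be read through its class, just as a linear heap walk needs the Klass* to find the next object. `marked_walk` plays the role of `marked_object_iterate`-style iteration, `linear_walk` models case (a), and the incomplete bitmap models case (b).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Klass:
    name: str
    size_words: int  # toy assumption: every instance of a class has a fixed size


class DanglingKlassError(RuntimeError):
    """Stands in for the crash a real walker would hit on an unloaded Klass*."""


# A toy region: start word -> Klass, or None for a dead object whose class was unloaded.
Region = Dict[int, Optional[Klass]]


def size_of(region: Region, addr: int) -> int:
    klass = region[addr]
    if klass is None:
        raise DanglingKlassError(f"object at {addr} has a dangling Klass*")
    return klass.size_words


def linear_walk(region: Region, top: int) -> List[int]:
    """Parse every object by size, ignoring liveness (case (a) in the thread)."""
    visited, addr = [], 0
    while addr < top:
        visited.append(addr)
        addr += size_of(region, addr)  # needs the Klass* of dead objects too
    return visited


def marked_walk(region: Region, bitmap: List[bool], top: int) -> List[int]:
    """Visit only objects whose start bit is set; unmarked objects are never parsed."""
    visited, addr = [], 0
    while addr < top:
        if bitmap[addr]:
            visited.append(addr)
            addr += size_of(region, addr)  # only marked objects' Klass* is read
        else:
            addr += 1  # scan forward to the next marked bit
    return visited


A, B = Klass("A", 2), Klass("B", 3)
TOP = 8
region: Region = {0: A, 2: None, 5: B}  # words 2..4 hold a dead object of an unloaded class

complete = [False] * TOP
complete[0] = complete[5] = True  # mark finished: both live objects are marked

incomplete = [False] * TOP
incomplete[0] = True  # concmark cancelled before reaching the object at word 5

print(marked_walk(region, complete, TOP))    # [0, 5]
print(marked_walk(region, incomplete, TOP))  # [0]
try:
    linear_walk(region, TOP)
except DanglingKlassError as e:
    print("linear walk failed:", e)  # linear walk failed: object at 2 has a dangling Klass*
```

The incomplete bitmap misses the live object at word 5 but never hands out the dead one at word 2, which is the point of (b); the linear walk fails on the first dead object, as in (a).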

Thread continuity: Roman had suggested the actual change here:
  http://mail.openjdk.java.net/pipermail/shenandoah-dev/2017-October/003884.html
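The second-bitmap option asked about in the quoted message can be modeled in the same toy style. This is a hypothetical sketch, not Shenandoah's implementation and not necessarily what the linked change does. Walkers such as JVMTI IterateOverHeap would read `bitmap_for_walkers()`, which only ever returns a bitmap produced by a finished mark. An in-flight or cancelled concurrent mark therefore never exposes a partial bitmap, and walkers see a complete (if slightly stale) liveness picture.

```python
from typing import List


class TwoBitmapMarkingContext:
    """Hypothetical two-bitmap scheme: heap walkers read the bitmap from the last
    finished mark, while concurrent mark fills a separate one."""

    def __init__(self, words: int) -> None:
        self.complete: List[bool] = [False] * words     # result of the last finished mark
        self.in_progress: List[bool] = [False] * words  # filled by the current mark
        self.marking = False

    def start_mark(self) -> None:
        self.in_progress = [False] * len(self.complete)
        self.marking = True

    def mark(self, addr: int) -> None:
        assert self.marking, "marking is not active"
        self.in_progress[addr] = True

    def finish_mark(self) -> None:
        # Final mark succeeded: the freshly completed bitmap becomes the one walkers see.
        # (In the thread, class unloading happens at final mark; this model only swaps bitmaps.)
        self.complete, self.in_progress = self.in_progress, self.complete
        self.marking = False

    def cancel_mark(self) -> None:
        # Cancelled mark: the partial bitmap is simply ignored (the next start_mark clears it),
        # and the last complete bitmap stays in place for walkers.
        self.marking = False

    def bitmap_for_walkers(self) -> List[bool]:
        return self.complete


ctx = TwoBitmapMarkingContext(4)
ctx.start_mark()
ctx.mark(0)
ctx.mark(2)
ctx.finish_mark()

ctx.start_mark()
ctx.mark(1)                         # a new concurrent mark is in flight
print(ctx.bitmap_for_walkers())     # [True, False, True, False]
ctx.cancel_mark()
print(ctx.bitmap_for_walkers())     # [True, False, True, False]
```

Keeping two arrays doubles the bitmap memory compared with a single one, and this model ignores objects allocated since the last complete mark, which a real scheme would also have to handle.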

-Aleksey