Re: Semantics of CollectedHeap::requires_barriers()

Mon Feb 8 14:22:43 UTC 2021

Hi Ron,

>> Hi Ron,
>>
>> Thanks for the explanation!
>>
>> Does that mean that always returning true in requires_barriers() should be conservatively ok? If yes, why am I hitting this assert:

> 
> No, you must not always return true. An object that’s just been allocated since the last safepoint must return false.
> Otherwise, we’d never be able to freeze — we can only freeze frames into a chunk that doesn’t require barriers.

Ok.

>> Besides this, I think I know what to do for requires_barriers() in Shenandoah. In fact (if I understand it correctly) I don't think Shenandoah needs any barriers at all there: this is about copying stack-frames (which might contain oops) *into* the StackChunk. It should be ok for SATB because there are no previous oop locations in a new StackChunk, and it should be ok for LRB because LRBs are only relevant when *loading* from oop locations, but here we're storing to oop locations.
> 
> First, when storing frames, we already require that no barriers are needed. If they are, we can’t freeze (i.e. we can never freeze into a chunk
> whose requires_barriers returns true).

Ok, makes sense.

> Now, we are reusing the same chunk for multiple freezes, as long as it doesn’t require barriers. That means we are overwriting oops,
> which is not okay for SATB. Unless you perform the writes in memory that has been allocated after marking started. Such memory does not
> need to be marked through by SATB marking.

Aha, ok! In this case we'll require barriers for objects below 
top-at-mark-start during SATB marking.
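
To check my understanding of the SATB rule, here is a minimal sketch as a toy C++ model. It is not HotSpot code: Region, its fields and requires_barriers_satb() are hypothetical names, and treating "no marking in progress" as "no SATB barriers needed" is my assumption.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of one heap region. Hypothetical, not the HotSpot data structure.
struct Region {
    std::uintptr_t bottom;             // first address in the region
    std::uintptr_t top_at_mark_start;  // value of 'top' when marking started (TAMS)
    std::uintptr_t top;                // current allocation pointer
};

// Returns true if overwriting oops in a chunk at 'addr' would need SATB
// barriers, i.e. the chunk existed before marking started and may be
// traversed by the concurrent marker.
inline bool requires_barriers_satb(const Region& r, std::uintptr_t addr,
                                   bool marking_in_progress) {
    assert(r.bottom <= r.top_at_mark_start && r.top_at_mark_start <= r.top);
    assert(addr >= r.bottom && addr < r.top);
    if (!marking_in_progress) {
        return false;  // assumption: SATB only matters while marking runs
    }
    // Memory at or above TAMS was allocated after marking started and is
    // not marked through, so it can be overwritten without barriers.
    return addr < r.top_at_mark_start;
}
```

With a region spanning 0x1000 to 0x3000 and TAMS at 0x2000, a chunk at 0x1800 requires barriers during marking, while a chunk at 0x2800 does not and can be reused for further freezes.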

> We must not change the oop layout of something that is concurrently being traversed by the GC. Obviously, if you allow the chunk to be overwritten at
> any time, it will sooner or later be overwritten while a concurrent marking traversal is running over it, and that will inevitably crash. So it's not valid to mutate
> the layout or contents of a stack chunk that can be traversed concurrently by the GC, which again takes us back to using memory allocated
> since the last marking phase started, which is not traversed by concurrent marking.

Right, that is very useful to know.

>> However, we might need barriers when copying frames out of the stack chunk (e.g. when thawing frames), because then we'd be loading from oop locations. Is this taken care of already somehow?
> 
> If you need barriers when thawing, then requires_barriers must return false, and then it’s taken care of.

I assume you mean to return true when we require barriers when thawing?

>> Similar considerations should apply for ZGC. Or maybe I'm missing something?
> 
> ZGC doesn’t require barriers iff the object is in an allocating region. Something similar would need to be done for Shenandoah, I assume.

Yes, with the above explanations in mind, I think we require barriers 
when the object is below the update-watermark (aka top-at-evacuation-start) 
during the evac and update-refs phases, in addition to the SATB handling 
during marking.
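
Put together, the rule I have in mind looks roughly like the sketch below. Again this is a toy C++ model, not the Shenandoah implementation: GCPhase, RegionMarks and requires_barriers() are hypothetical names, and the model assumes exactly one phase is active at a time.

```cpp
#include <cstdint>

// Hypothetical GC phases for the toy model.
enum class GCPhase { Idle, Marking, Evacuation, UpdateRefs };

// Hypothetical per-region watermarks.
struct RegionMarks {
    std::uintptr_t top_at_mark_start;  // 'top' when marking started
    std::uintptr_t update_watermark;   // 'top' when evacuation started
};

// Returns true if a stack chunk at 'addr' requires barriers in 'phase'.
inline bool requires_barriers(GCPhase phase, const RegionMarks& m,
                              std::uintptr_t addr) {
    switch (phase) {
        case GCPhase::Marking:
            // SATB: only objects allocated before marking started.
            return addr < m.top_at_mark_start;
        case GCPhase::Evacuation:
        case GCPhase::UpdateRefs:
            // Objects below the update watermark may hold stale references.
            return addr < m.update_watermark;
        case GCPhase::Idle:
            return false;
    }
    return false;  // unreachable; keeps compilers quiet
}
```

In every phase a chunk allocated above the relevant watermark returns false, which matches the requirement that a freshly allocated chunk never requires barriers.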

> If Shenandoah *always* requires barriers when reading, then we might need to split requires_barriers into two different methods.

No, I don't think this will be necessary.

I think it would be *very* useful to put all those explanations in 
CollectedHeap::requires_barriers().

>> Do GCs need to treat StackChunk instances specially when marking or relocating?
> 
> Absolutely, but I think that’s done automatically, as InstanceStackChunkKlass overrides oop_oop_iterate etc.

Ok, I see.

>> I am a bit skeptical about ContMirror::allocate_stack_chunk() (in continuation.cpp around line 5965), it looks to me like it's trying to second-guess GC allocation mechanics by going directly to the TLAB. I'm not sure if this is a good idea, but can't exactly point to what might go wrong (except a slight breach of GC interface). Can you say what is the reason behind this?
> 
> I think this was supposed to be an optimisation. Not sure if it’s still required/effective.

It looks to me like something that GCs do anyway: try TLAB first, 
failing that, fall back to allocate a new TLAB, try that again, and 
failing that, try locked allocation. GCs may know better how to do that 
fast ;-) (Plus, there might be special situations like -XX:-UseTLAB, etc.)
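
For illustration, the fallback order I mean can be sketched as a toy C++ model. Nothing here is HotSpot code: Tlab, ToyHeap and AllocPath are hypothetical names, sizes are plain byte counters, and the rule that requests larger than a TLAB skip the refill step is my assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Which path satisfied the request (hypothetical, for illustration only).
enum class AllocPath { TlabFast, TlabRefill, SharedLocked, Failed };

// Thread-local buffer, modelled as a byte counter.
struct Tlab {
    std::size_t free = 0;
    bool try_allocate(std::size_t n) {
        if (n > free) return false;
        free -= n;
        return true;
    }
};

struct ToyHeap {
    std::size_t shared_free;  // bytes left in the shared space
    std::size_t tlab_size;    // size of a freshly carved TLAB
    bool use_tlab;            // models running with TLABs disabled
    std::mutex lock;          // guards shared_free

    ToyHeap(std::size_t shared, std::size_t tlab, bool use)
        : shared_free(shared), tlab_size(tlab), use_tlab(use) {
        assert(tlab_size > 0);
    }

    // Carve a new TLAB out of the shared space; the old remainder is dropped.
    bool refill(Tlab& tlab) {
        std::lock_guard<std::mutex> guard(lock);
        if (shared_free < tlab_size) return false;
        shared_free -= tlab_size;
        tlab.free = tlab_size;
        return true;
    }

    // Slow path: allocate directly from the shared space under the lock.
    bool allocate_shared(std::size_t n) {
        std::lock_guard<std::mutex> guard(lock);
        if (n > shared_free) return false;
        shared_free -= n;
        return true;
    }

    // TLAB first, then a new TLAB, then locked allocation.
    AllocPath allocate(Tlab& tlab, std::size_t n) {
        if (use_tlab) {
            if (tlab.try_allocate(n)) return AllocPath::TlabFast;
            if (n <= tlab_size && refill(tlab) && tlab.try_allocate(n)) {
                return AllocPath::TlabRefill;
            }
        }
        return allocate_shared(n) ? AllocPath::SharedLocked : AllocPath::Failed;
    }
};
```

A caller that goes to the TLAB directly would have to duplicate this decision tree, including the case where TLABs are disabled, which is why I would rather leave it to the GC's own allocation entry point.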

Thanks for taking the time to explain all this, it's very useful!
Roman