RFR (M) 8235573: Move JFR ObjectSample oop into OopStorage
    Markus Gronlund 
    markus.gronlund at oracle.com
       
    Wed Aug  5 10:37:04 UTC 2020
    
    
  
Hi Kim,
That is a good argument, thank you.
So we would need Atomic accesses to the variable _last_sweep to prevent word tearing on 32-bit platforms for GCs that perform concurrent reference processing.
About it currently being a clock value, and could instead be an natural number, that's a good observation too. We have thought about that change but never followed through with it, mainly because the stamping mechanism is not optimal from the start and we would like instead to come up with a better solution.
Thanks
Markus
-----Original Message-----
From: Kim Barrett 
Sent: den 5 augusti 2020 08:40
To: Markus Gronlund <markus.gronlund at oracle.com>
Cc: David Holmes <david.holmes at oracle.com>; Coleen Phillimore <coleen.phillimore at oracle.com>; hotspot-dev developers <hotspot-dev at openjdk.java.net>; hotspot-jfr-dev at openjdk.java.net
Subject: Re: RFR (M) 8235573: Move JFR ObjectSample oop into OopStorage
> On Aug 4, 2020, at 5:39 AM, Markus Gronlund <markus.gronlund at oracle.com> wrote:
> 
> About _last_sweep:
> 
> The value, representing the last time the GC passed over the sample instances, is only treated as a hint, used as input to construct a filter when serializing object samples as events along with their associated chains. The filter excludes samples that are short-lived, i.e. having timestamps after the _last_sweep value. _last_sweep is only read by a single thread which at that moment have exclusive access to the sampler instance. It is ok for it to read whatever value leaked to it, in fact it can be advantageous to see an older value as it makes the filter more effective in reducing noise. Therefore, I think it is ok to leave an undecorated access to this variable. Also, there is no ordering constraint, the value is just reset when constructing a new instance, and a race is ok here too.
In the old code, the write of _last_sweep was done during a weak_oops_do that was called by all GCs during a safepoint, thus providing the needed synchronization with the Java thread reads.
In the new code, for a GC that does STW reference processing, the GC callback that updates _last_sweep is from that STW safepoint context, so still not concurrent with the sampling.
However, for a GC that does concurrent reference processing, the GC callback is concurrent with mutator threads, so potentially concurrent with sampling.
So we have unsynchronized non-atomic read/write pairs, which is UB.  And that's UB being introduced by this change.
And it's not one of those "undetectable" UB's that we sometimes squint and pretend isn't happening and hope the compiler never becomes smart enough to find. The value involved is a 64bit timestamp. If run on a 32bit platform, word tearing could ensue for the reader.
A mitigating factor is that ZGC doesn't run on 32bit platforms and changing that is not on the roadmap.
I don't know if Shenandoah has concurrent reference processing (yet), but it does support 32bit platforms:
https://wiki.openjdk.java.net/display/shenandoah/Main
Looking at the use of _last_sweep, it's not clear that it really needs to be a clock value; maybe it could just be a (atomic) counter? But that doesn't look like a simple change to make.
    
    
More information about the hotspot-jfr-dev
mailing list