Frequently dereferenced weak references never get cleared

Fri Mar 15 11:22:22 UTC 2019

> Since dereferencing a weak reference (by definition) creates a strong
> reference, doing so in the middle of a concurrent SATB marker (e.g. like G1
> and Shenandoah) or a concurrent precise-wavefront marker (like C4 and ZGC)
> will keep the newly-strong-referenced object reachable, and considered live
> by the in-flight marking cycle. This has always been the case for these GC
> algorithms, and there does not seem to be a practical reason to reduce or
> remove these clearly-within-the-spec and very well behaved qualities.
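To make the effect above concrete, here is a toy model (not HotSpot code) of a concurrent marker whose weak-reference `get()` applies a keep-alive barrier while marking is in progress. This is roughly how G1's SATB barrier and ZGC's load barrier treat `Reference.get()`, but the model deliberately ignores roots, tracing and real concurrency. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model (hypothetical, not HotSpot code) of keep-alive on weak-ref get(). */
class ToyConcurrentMarker {

    /** A toy weak reference; the marker may clear it at the end of a cycle. */
    static final class ToyWeakRef {
        Object referent;
        ToyWeakRef(Object referent) { this.referent = referent; }
    }

    private final Set<Object> marked = new HashSet<>();
    private final List<ToyWeakRef> weakRefs = new ArrayList<>();
    private boolean marking = false;

    ToyWeakRef newWeakRef(Object referent) {
        ToyWeakRef ref = new ToyWeakRef(referent);
        weakRefs.add(ref);
        return ref;
    }

    /** Mutator-side get(): while marking, the returned referent is marked live. */
    Object get(ToyWeakRef ref) {
        Object r = ref.referent;
        if (marking && r != null) {
            marked.add(r); // keep-alive barrier: the new strong ref must not be missed
        }
        return r;
    }

    void startMarking() {
        marked.clear();
        marking = true;
        // A real marker would trace from the roots here; in this toy model
        // nothing else strongly references the weakly referenced objects.
    }

    /** End of cycle: clear every weak ref whose referent was not marked. */
    void finishMarking() {
        marking = false;
        for (ToyWeakRef ref : weakRefs) {
            if (ref.referent != null && !marked.contains(ref.referent)) {
                ref.referent = null;
            }
        }
    }

    public static void main(String[] args) {
        ToyConcurrentMarker gc = new ToyConcurrentMarker();
        ToyWeakRef hot = gc.newWeakRef(new Object());   // dereferenced in every cycle
        ToyWeakRef quiet = gc.newWeakRef(new Object()); // never dereferenced
        for (int cycle = 0; cycle < 3; cycle++) {
            gc.startMarking();
            gc.get(hot); // a periodic framework update lands mid-cycle
            gc.finishMarking();
        }
        System.out.println("hot cleared: " + (hot.referent == null));
        System.out.println("quiet cleared: " + (quiet.referent == null));
    }
}
```

The "hot" reference survives every cycle in which it is touched. Only a cycle with no `get()` at all (the "quiet" period) lets the marker clear it.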

It is a pity, since a weak reference is all about calling the dereferencing
get() method. There is little point in having a weak reference if you are not
supposed to use it to obtain a strong reference to the referent.

> Building mechanisms that strongly depend on the timely clearing of weak
> references that get dereferenced often (with no "quiet" period long enough
> for a marker to determine unreachability) is simply not a sound design. It
> "works" robustly when STW collectors are used for collecting the various
> generations (because the application is frozen for the duration of the
> mark). It also somewhat-works with multi-pass semi-concurrent markers that
> revisit mutated heap references and finish up with an STW "cleanup marking"
> (like the CMS marker does), as long as the strong references were not
> propagated to the heap, or were quickly overwritten there if they were. But
> these things only "work" when you make assumptions about the collector
> behavior that are clearly invalid.
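The design described in the original message (quoted at the end of this thread) reduces to something like the following hypothetical sketch. The framework hands out data objects, tracks them only with a `WeakReference` registered on a `ReferenceQueue`, and calls `get()` on every reference in its periodic update loop. Under a concurrent marker, that `get()` is exactly the frequent strengthening described above. The names `Quote`, `Subscription`, `publishUpdates` and `drainCleared` are illustrative assumptions.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a framework relying on weak-reference clearing (not thread-safe). */
class WeakFeedFramework {

    /** Data object handed to application code, which holds it strongly. */
    static final class Quote {
        volatile double price;
    }

    /** The framework keeps only a weak reference to each handed-out Quote. */
    private static final class Subscription extends WeakReference<Quote> {
        final String symbol;
        Subscription(Quote q, String symbol, ReferenceQueue<Quote> queue) {
            super(q, queue);
            this.symbol = symbol;
        }
    }

    private final ReferenceQueue<Quote> queue = new ReferenceQueue<>();
    private final List<Subscription> subscriptions = new ArrayList<>();

    Quote subscribe(String symbol) {
        Quote q = new Quote();
        subscriptions.add(new Subscription(q, symbol, queue));
        return q;
    }

    /** Called periodically (e.g. once per second). */
    void publishUpdates(double newPrice) {
        for (Subscription s : subscriptions) {
            Quote q = s.get(); // strong ref: a concurrent marker now treats q as live
            if (q != null) {
                q.price = newPrice;
            }
        }
    }

    /** Drains references the GC has cleared and releases their resources. */
    int drainCleared() {
        int released = 0;
        Reference<? extends Quote> ref;
        while ((ref = queue.poll()) != null) {
            Subscription s = (Subscription) ref;
            subscriptions.remove(s);
            // Hypothetical: deregister the network subscription for s.symbol here.
            released++;
        }
        return released;
    }

    int activeSubscriptions() { return subscriptions.size(); }
}
```

Because `publishUpdates` dereferences every reference about once per second, a `Quote` the application has dropped is likely to be touched during every concurrent marking cycle. It is then never found unreachable, which matches the behaviour reported with C4 and ZGC.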

I agree, but it is a design from 2002 that still has many benefits for us. It
is very hard to replace, as there is no real way to implement reference
counting on the Java language side (as you can in C++).
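To make the C++ comparison concrete: Java has no destructors, so reference counting has to be explicit, and every holder must release its claim. A minimal hypothetical sketch of such a handle follows (`RefCounted`, `Lease` and `onLastRelease` are illustrative names, not part of the framework discussed here). Typical use would be `try (RefCounted<String>.Lease lease = rc.acquire()) { ... }`.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical explicitly reference-counted resource, the nearest Java analogue
 * to a C++ shared_ptr. Nothing is automatic: every holder must close its lease.
 */
final class RefCounted<T> {
    private final T value;
    private final Runnable onLastRelease;
    // >= 0: number of open leases; -1: resource already released
    private final AtomicInteger count = new AtomicInteger(0);

    RefCounted(T value, Runnable onLastRelease) {
        this.value = value;
        this.onLastRelease = onLastRelease;
    }

    /** Opens a new lease; fails once the last lease has been closed. */
    Lease acquire() {
        while (true) {
            int c = count.get();
            if (c < 0) {
                throw new IllegalStateException("resource already released");
            }
            if (count.compareAndSet(c, c + 1)) {
                return new Lease();
            }
        }
    }

    /** One holder's claim on the resource, intended for try-with-resources. */
    final class Lease implements AutoCloseable {
        private boolean closed; // assumes a lease is used by a single thread

        T get() {
            if (closed) throw new IllegalStateException("lease closed");
            return value;
        }

        @Override
        public void close() {
            if (closed) return;
            closed = true;
            while (true) {
                int c = count.get();
                int next = (c == 1) ? -1 : c - 1; // closing the last lease releases
                if (count.compareAndSet(c, next)) {
                    if (next == -1) onLastRelease.run();
                    return;
                }
            }
        }
    }
}
```

The release is deterministic, but only if no holder forgets to close. Across many layers of code, that is exactly the bookkeeping weak references were used to avoid.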

> Since pretty much all the mainstream-maintained, non-deprecated, non-STW
> collectors in server-side JVMs (G1, C4, ZGC, Shenandoah) share the qualities
> described above, changing your design to correctly deal with weak reference
> semantics is probably your best bet going forward. There are some
> straightforward ways to build applications that use weak references without
> being susceptible to frequent-strengthening problems, including (as we
> discussed years ago) for your specific use pattern. The most common pattern
> I've seen used with significant success involves using keep-alive
> "sentinel" leaf objects (objects that hang off of your temporary stuff using
> weak refs or phantom refs, are only accessed when you actually want that
> stuff to remain alive, and do not themselves refer to other things), and
> using reference queues to process the detected death of those "sentinel"
> objects, intentionally and safely unlinking their associated "temporary
> stuff" object graphs (e.g. removing associated event listeners from
> listener queues, or cleanly taking items out of lookup tables) upon notice.
> From Java 9 on, java.lang.ref.Cleaner provides a way to do this sort of
> thing without even managing the queues yourself.
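One possible reading of the sentinel pattern, sketched for Java 8 (which the reply below says the application is still on), follows. The assumptions are these: the application holds a small `Handle` strongly; the framework holds the updatable data strongly and updates it directly; and the framework tracks the `Handle` only through a `PhantomReference`, whose `get()` always returns `null`, so the periodic update loop cannot strengthen it. `Handle`, `QuoteData` and the method names are hypothetical.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sentinel-pattern sketch (Java 8 compatible). */
class SentinelFeedFramework {

    /** Updatable data owned and strongly held by the framework. */
    static final class QuoteData {
        final String symbol;
        volatile double price;
        QuoteData(String symbol) { this.symbol = symbol; }
    }

    /** Sentinel handed to the application; the framework never dereferences it. */
    static final class Handle {
        private final QuoteData data; // data is framework-held anyway, so no lifetime extension
        Handle(QuoteData data) { this.data = data; }
        double price() { return data.price; }
    }

    private final ReferenceQueue<Handle> queue = new ReferenceQueue<>();
    // Phantom reference to each Handle -> its data; also keeps the references reachable
    private final Map<Reference<Handle>, QuoteData> live = new ConcurrentHashMap<>();

    Handle subscribe(String symbol) {
        QuoteData data = new QuoteData(symbol);
        Handle handle = new Handle(data);
        live.put(new PhantomReference<Handle>(handle, queue), data);
        return handle;
    }

    /** Periodic update touches only QuoteData, never a Handle. */
    void publishUpdates(double newPrice) {
        for (QuoteData d : live.values()) {
            d.price = newPrice;
        }
    }

    /** Unlinks data whose Handle the application has dropped. */
    int drainDead() {
        int released = 0;
        Reference<? extends Handle> ref;
        while ((ref = queue.poll()) != null) {
            QuoteData data = live.remove(ref);
            ref.clear(); // on Java 8, phantom referents are not cleared automatically
            if (data != null) {
                // Hypothetical: deregister the network subscription for data.symbol here.
                released++;
            }
        }
        return released;
    }

    int activeSubscriptions() { return live.size(); }
}
```

In this arrangement, how often the data is updated no longer matters. A `Handle` becomes eligible in any marking cycle during which the application itself no longer references it.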

We were stuck with Java 8 because our infrastructure ran on Solaris/Intel,
which Oracle no longer supports. Only now, as we move the application to an
even bigger financial institution, have we started the Java 9+ migration and
begun looking at GC alternatives. G1 has always performed poorly in our
case: because the complete heap is constantly mutated, it had issues with
remembered set processing. You know the C4 situation very well. ZGC and
Shenandoah are still experimental. CMS still does the daily job for us.

Yes, it finally seems to be the right time to rewire the references using
"sentinel" leaf objects, so that at least something gets cleared by the newer
garbage collectors (ZGC, Shenandoah) or C4. The weak referencing pattern is
used within 30+ layers, all of which we would need to address in the same way
to get everything working together again.
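For the Java 9+ migration, the same sentinel idea can be expressed with `java.lang.ref.Cleaner`, which owns the reference queue and runs the unlink action on its own daemon thread. This is a minimal hypothetical sketch (`Handle`, `QuoteData`, `release` are illustrative names). It assumes the unlink action captures only framework state and never the `Handle` itself, because the `Cleaner` documentation warns that capturing the registered object prevents it from ever becoming phantom reachable.

```java
import java.lang.ref.Cleaner;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical Java 9+ sentinel sketch using java.lang.ref.Cleaner. */
class CleanerFeedFramework {
    private static final Cleaner CLEANER = Cleaner.create();

    static final class QuoteData {
        final String symbol;
        volatile double price;
        QuoteData(String symbol) { this.symbol = symbol; }
    }

    /** Application-facing sentinel; the framework never dereferences it. */
    static final class Handle {
        private final QuoteData data;
        private final Cleaner.Cleanable cleanable;

        Handle(QuoteData data, Runnable unlink) {
            this.data = data;
            // 'unlink' must not capture this Handle, or it could never become unreachable.
            this.cleanable = CLEANER.register(this, unlink);
        }

        double price() { return data.price; }

        /** Optional early release; the unlink action runs at most once. */
        void release() { cleanable.clean(); }
    }

    private final Set<QuoteData> live = ConcurrentHashMap.newKeySet();

    Handle subscribe(String symbol) {
        QuoteData data = new QuoteData(symbol);
        live.add(data);
        // Captures only the framework and the data, never the Handle.
        return new Handle(data, () -> {
            live.remove(data);
            // Hypothetical: deregister the network subscription for data.symbol here.
        });
    }

    /** Periodic update touches only QuoteData, never a Handle. */
    void publishUpdates(double newPrice) {
        for (QuoteData d : live) {
            d.price = newPrice;
        }
    }

    int activeSubscriptions() { return live.size(); }
}
```

`Cleanable.clean()` lets a layer that knows it is finished release the subscription immediately. The Cleaner covers handles that are simply dropped.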

Anyway, thank you for your feedback. I still regret that in 2012 our client
did not go for the Azul C4 collector, including the weak reference handling
extension we had agreed on.

Regards
Michal

From: Gil Tene <gil at azul.com>
Sent: Friday, March 15, 2019 6:56
To: Ing. Michal Frajt (Luxonit s.r.o.) <michal.frajt at luxonit.com>
Cc: zgc-dev at openjdk.java.net
Subject: Re: Frequently dereferenced weak references never get cleared

> On Mar 14, 2019, at 2:26 PM, Ing. Michal Frajt (Luxonit s.r.o.)
> <michal.frajt at luxonit.com> wrote:
>
> Hi all,
>
> We are evaluating ZGC for financial applications and so far have seen
> promising results. However, there seems to be an issue with the handling of
> weak references that is affecting our custom data distribution framework.
>
> Since 2005 we have mainly been using the CMS collector for financial
> applications based on a custom data distribution framework. The framework
> uses weak references in a similar way to smart pointers in C++: application
> code holds strong references, while the framework provides all data via
> weak references only. The clearing of a weak reference signals to the
> framework that the data is no longer required by the application code. The
> data provided by the framework is always coupled with a network resource
> (receiving market updates) or a CPU resource (computing
> aggregations/risk/etc.). The framework has a strong interest in getting the
> weak reference cleared by the GC, as it can then deregister the network
> subscription (less data to read, parse and handle) or stop CPU-intensive
> computations. As long as the weak reference is not cleared (reported via
> the reference queue), the framework must keep updating the provided data,
> which requires dereferencing the weak references regularly. Such
> dereferencing happens on average once per second for each weakly
> referenced object.
>
> In order to get the weak references cleared regularly, we were using the
> incremental CMS (iCMS), which was always running in the background and
> scanning the complete heap. When iCMS was announced to be deprecated, we
> started working on a HotSpot extension and introduced the new
> CMSTriggerInterval parameter into 8u40
> (https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8038265), which
> specifies a maximum time between CMS collections. This way all our weak
> references (whose referents are not strongly referenced) are cleared within
> one or two CMS iterations, triggered either by the CMSTriggerInterval or for
> another reason (occupancy or an explicitly invoked concurrent GC, for
> example). We usually configure the CMSTriggerInterval to between 5 and 10
> minutes.
>
> Some years back (2012) we evaluated the Azul C4 collector and recognized an
> issue with weak reference processing. After some discussion, we were
> finally told by Gil Tene that C4 will never clear a weak reference if it is
> dereferenced within the C4 major cycle (at that time, around every 5
> minutes). We were offered a C4 extension where a weak reference might be
> dereferenced, but the strong reference could not be stored to any object,
> only kept at the thread/call stack level. If stored, even for a short
> moment, it would again require a full C4 major cycle without dereferencing
> to get it cleared. For this and other reasons, our client finally did not
> purchase the Azul C4 collector. We don't know how C4 addresses this issue in
> its current implementation.
>
> Today we are very pleased with the initial ZGC test results. The test
> scenario with CMS has a 150ms ParNew STW pause every 10 seconds. It changed
> to only 1ms every 5 minutes when using ZGC (if we understand the reporting
> correctly). We immediately tested weak reference processing and
> unfortunately observed the same behaviour as with the C4 collector: it
> seems that weak references that are frequently dereferenced are never
> cleared. Invoking an explicit concurrent GC via jcmd GC.run does not help
> either.
>
> Could you please explain ZGC's handling of weak references in relation to
> the described scenario? Is there a way out for us?
>
> Your input would be much appreciated.
>
> Best regards,
> Michal Frajt
> Luxonit