Frequently dereferenced weak references never get cleared

Fri Mar 15 05:55:47 UTC 2019

Since dereferencing a weak reference (by definition) creates a strong reference, doing so in the middle of a concurrent SATB marker (e.g. G1 and Shenandoah) or a concurrent precise-wavefront marker (like C4 and ZGC) will keep the newly-strong-referenced object reachable, and considered live by the in-flight marking cycle. This has always been the case for these GC algorithms, and there does not seem to be a practical reason to reduce or remove these clearly-within-the-spec and very well behaved qualities.

Building mechanisms that strongly depend on the timely clearing of weak references that get dereferenced often (with no "quiet" period long enough for a marker to determine unreachability) is simply not a sound design. It "works" robustly when STW collectors are used for collecting the various generations (because the application is frozen for the duration of the mark). It also somewhat-works with multi-pass semi-concurrent markers that revisit mutated heap references and finish up with an STW "cleanup marking" (like the CMS marker does), as long as the strong references were not propagated to the heap, or were quickly overwritten there if they were. But these things only "work" when you make assumptions about the collector behavior that are clearly invalid.

Since pretty much all the mainstream-maintained, non-deprecated, non-STW collectors in server-side JVMs (G1, C4, ZGC, Shenandoah) share the qualities described above, changing your design to correctly deal with weak reference semantics is probably your best bet going forward. There are some straightforward ways to build applications that use weak references without being susceptible to frequent-strengthening problems, including (as we discussed years ago) for your specific use pattern. The most common pattern I've seen used with significant success involves keep-alive "sentinel" leaf objects: objects that hang off of your temporary stuff via weak refs or phantom refs, that are only accessed when you actually want that stuff to remain alive, and that do not themselves refer to other things. Reference queues are then used to process the detected death of those "sentinel" objects, intentionally and safely unlinking their associated "temporary stuff" object graphs (e.g. removing associated event listeners from listener queues, or cleanly taking items out of lookup tables) upon notice. From Java 9 on, java.lang.ref.Cleaner provides a way to do this sort of thing without even managing the queues yourself.
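
To make the sentinel pattern a bit more concrete, here is a minimal sketch using java.lang.ref.Cleaner. It is an illustration under assumptions, not a description of any particular framework: the class SentinelDemo, the Handle type, the topic names and the per-topic price slot are all hypothetical. The point it tries to show is that the framework updates its data through a strongly held map and never touches the sentinel, so frequent updates do not keep re-strengthening the object whose death you are waiting for.

```java
import java.lang.ref.Cleaner;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical framework-side registry; every name here is illustrative. */
class SentinelDemo {
    private static final Cleaner CLEANER = Cleaner.create();

    /** Sentinel leaf object: the application holds it, it refers to no framework state. */
    static final class Handle {
        private final String topic;
        Handle(String topic) { this.topic = topic; }
        String topic() { return topic; }
    }

    // Framework state is held strongly and keyed by topic. It never refers to a Handle.
    private final Map<String, double[]> subscriptions = new ConcurrentHashMap<>();

    Handle subscribe(String topic) {
        subscriptions.put(topic, new double[1]);
        Handle handle = new Handle(topic);
        // The cleanup action must not capture 'handle' (or anything that reaches it),
        // otherwise the sentinel could never become unreachable.
        Map<String, double[]> subs = subscriptions;
        CLEANER.register(handle, () -> subs.remove(topic));
        return handle;
    }

    // The update path works on the strongly held slot and never touches the sentinel.
    void onMarketUpdate(String topic, double price) {
        double[] slot = subscriptions.get(topic);
        if (slot != null) {
            slot[0] = price;
        }
    }

    boolean isSubscribed(String topic) {
        return subscriptions.containsKey(topic);
    }

    double lastPrice(String topic) {
        double[] slot = subscriptions.get(topic);
        return slot == null ? Double.NaN : slot[0];
    }

    public static void main(String[] args) throws InterruptedException {
        SentinelDemo framework = new SentinelDemo();
        Handle handle = framework.subscribe("EURUSD");
        framework.onMarketUpdate("EURUSD", 1.13);
        System.out.println("subscribed: " + framework.isSubscribed("EURUSD"));

        handle = null; // the application drops its only strong reference
        // System.gc() is only a hint; if and when the cleanup runs depends on the collector.
        for (int i = 0; i < 50 && framework.isSubscribed("EURUSD"); i++) {
            System.gc();
            Thread.sleep(100);
        }
        System.out.println("still subscribed: " + framework.isSubscribed("EURUSD"));
    }
}
```

In this sketch the application code would keep the Handle for as long as it wants the data, and the cleanup action stands in for things like deregistering a network subscription. How promptly it fires after the Handle is dropped is still up to the collector's cycle timing.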

> On Mar 14, 2019, at 2:26 PM, Ing. Michal Frajt (Luxonit s.r.o.) <michal.frajt at luxonit.com> wrote:
> 
> Hi all,
> 
> 
> 
> We are evaluating ZGC for financial applications and so far have seen
> promising results. However there seems to be an issue with handling of weak
> references that is affecting our custom data distribution framework.
> 
> 
> 
> Since 2005 we are mainly using the CMS collector for financial applications
> based on a custom data distribution framework. The framework is based on
> weak references used in a similar way to smart pointers in C++. Application code
> holds strong references where framework provides all data via weak
> references only. A weak reference clear acts as an indication for the
> framework that data is not required by the application code anymore. The
> framework provided data are always coupled with a network resource
> (receiving market updates) or CPU resource (computing
> aggregations/risk/etc). The framework has huge interest to get the weak
> reference cleared by the GC as it can deregister network subscription (less
> data to read, less data to parse and handle) or stop CPU intensive
> computations. As long as the weak reference is not cleared (reported via the
> reference queue) the framework must permanently update provided data which
> simply requires to dereference weak references regularly. Such dereferencing
> happens on an average once per second for each weakly referenced object.
> 
> 
> 
> In order to get the weak references cleared regularly we were using the
> incremental CMS (iCMS) which was always running in the background and
> scanning the complete heap. When the iCMS got announced to be deprecated we
> started working on a Hotspot extension and introduced the new
> CMSTriggerInterval parameter into 8u40
> (https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8038265) to allow to
> specify a maximum time between CMS collections. This way all our weak
> references (which are not strongly referenced) are cleared within one or two
> CMS iterations invoked by the CMSTriggerInterval or by another reason
> (occupancy or explicit concurrent GC invoke for example). Usually we
> configure the CMSTriggerInterval between 5 and 10 minutes.
> 
> 
> 
> Some years back (2012) we evaluated Azul C4 collector and recognized an
> issue with the weak references processing. After some discussion we were
> finally told by Gil Tene that C4 will never clear a weak reference if
> dereferenced within the C4 major cycle (that time around each 5 minutes). We
> were offered a C4 extension where a weak reference might be dereferenced but
> the strong reference cannot be stored to any object, only kept at the
> thread/call stack level. If stored, even for a short moment, it will again
> require the full C4 major cycle without dereferencing it to get it cleared.
> Because of this and other reasons our client finally did not purchase the
> Azul C4 collector. We don't know how the C4 is addressing this issue in the
> current implementation.
> 
> 
> 
> Today we got very pleased with the ZGC initial testing results. The test
> scenario with CMS has ParNew STW 150ms every 10 seconds. It changed to 1ms
> only every 5 minutes when using ZGC (if understanding the reporting
> correctly). We immediately tested the weak references processing and
> unfortunately observed the same behaviour as with the C4 collector. It seems
> that weak references which are frequently dereferenced are never cleared.
> Invoking the explicit concurrent GC by the jcmd GC.run does not help either.
> 
> 
> 
> Could you please explain to us the ZGC weak references handling related to the
> described scenario? Is there a way out for us?
> 
> 
> 
> Your input would be much appreciated.
> 
> 
> 
> Best regards,
> Michal Frajt
> Luxonit
> 
>