JEP 132: More-prompt finalization
peter.levart at gmail.com
Sun May 31 19:32:00 UTC 2015
Thanks for the views and opinions. I'll try to address them inline...
On 05/29/2015 04:18 AM, David Holmes wrote:
> Hi Peter,
> I guess I'm very concerned about the premise that finalization should
> scale to millions of objects and be performed highly concurrently. To
> me that's sending the wrong message about finalization. It also isn't
> the most effective use of cpu resources - most people would want to do
> useful work on most cpu's most of the time.
Ok, fair enough. It shouldn't be necessary to scale finalization to
millions of objects or to perform it concurrently. Normal programs don't
need this. But there is a diagnostic command being developed at this
moment that displays the finalization queue. The utility of such a
command, as I understand it, is precisely to show when the finalization
thread cannot cope and Finalizer(s) accumulate. So there must be
some hypothetical programs that (ab)use finalization or are buggy
(deadlock) so that the queue builds up. To diagnose this, a diagnostic
command is helpful. To fix it, one has to fix the code. But what if the
problem is not so much the allocation/death rate of finalizable
instances as the heavy code in the finalize() methods of those
instances? I agree that such programs have a smell and should be
rewritten to not use finalization but other means of cleanup, such as
multiple threads removing WeakReferences from the queue, or
something completely different and not based on Reference(s). But
wouldn't it be nice if one could simply set a system property for the
max. number of threads processing Finalizer(s)?
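To make the "multiple threads removing WeakReferences from the queue"
alternative concrete, here is a minimal sketch. It is not code from the
prototype: the class WeakCleanupSketch, the CleanupRef type and the
method names are all made up for illustration. Only ReferenceQueue,
WeakReference and ConcurrentHashMap are real JDK APIs.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: cleanup driven by WeakReferences instead of finalize().
public class WeakCleanupSketch {

    // Hypothetical reference type that carries its own cleanup action.
    public static final class CleanupRef extends WeakReference<Object> {
        final Runnable cleanup;

        CleanupRef(Object referent, ReferenceQueue<Object> queue, Runnable cleanup) {
            super(referent, queue);
            this.cleanup = cleanup;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();

    // The references themselves must stay strongly reachable until processed,
    // otherwise they could be collected before they are ever enqueued.
    private final Set<CleanupRef> registered = ConcurrentHashMap.newKeySet();

    public CleanupRef register(Object referent, Runnable cleanup) {
        CleanupRef ref = new CleanupRef(referent, queue, cleanup);
        registered.add(ref);
        return ref;
    }

    // Runs the cleanup action at most once per reference.
    private boolean process(Reference<?> ref) {
        if (ref instanceof CleanupRef && registered.remove(ref)) {
            ((CleanupRef) ref).cleanup.run();
            return true;
        }
        return false;
    }

    // Non-blocking: processes whatever is enqueued right now, returns the count.
    public int drain() {
        int processed = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            if (process(ref)) {
                processed++;
            }
        }
        return processed;
    }

    // Starts daemon threads that all block on the same queue.
    public void startWorkers(int threads) {
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        process(queue.remove()); // blocks until a reference arrives
                    }
                } catch (InterruptedException e) {
                    // interrupted: let the worker exit
                }
            }, "cleanup-worker-" + i);
            t.setDaemon(true);
            t.start();
        }
    }
}
```

Calling startWorkers(n) gives n threads competing for the same
ReferenceQueue, which is the contention point discussed further down.
For a deterministic try-out, Reference.enqueue() can stand in for the
GC and drain() can be called from the test thread.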
I have prepared an improved variant of the prototype that employs a
single ReferenceHandler thread and adds a ForkJoinPool that by default
has a single worker thread which replaces the single finalization
thread. So by default, no more threads are used than currently. If one
wants, (s)he can increase the concurrency of finalization with a system
property.
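As a rough illustration of that configuration idea (not the prototype's
actual code), a pool whose parallelism defaults to 1 and can be raised
via a system property could look like the sketch below. The property
name "sketch.finalizer.parallelism" and the class name are invented for
this example; the prototype's real property name is not given here.

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical sketch: a ForkJoinPool sized from a system property, default 1.
public class FinalizerPoolSketch {

    // Invented property name, for illustration only.
    public static final String PROP = "sketch.finalizer.parallelism";

    // ForkJoinPool rejects parallelism above 0x7fff, so clamp to that.
    private static final int MAX_PARALLELISM = 0x7fff;

    // Falls back to 1 for missing, malformed or non-positive values.
    public static int parseParallelism(String value) {
        if (value == null) {
            return 1;
        }
        try {
            int parsed = Integer.parseInt(value.trim());
            return Math.min(MAX_PARALLELISM, Math.max(1, parsed));
        } catch (NumberFormatException e) {
            return 1;
        }
    }

    public static ForkJoinPool newPool() {
        return new ForkJoinPool(parseParallelism(System.getProperty(PROP)));
    }
}
```

With the property unset this yields a pool with a single worker, so no
more threads than a single dedicated finalization thread would use.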
I have also improved the benchmarks that now focus on CPU overhead when
processing references at more typical rates, rather than maximum
throughput. They show that all changes taken together practically halve
the CPU time overhead of finalization processing, so the freed CPU time
can be used for more useful work. I have also benchmarked the typical
asynchronous WeakReference processing scenario where one thread removes
enqueued WeakReferences from the queue. Results show about a 25%
decrease in CPU time overhead.
Why does the prototype reduce more overhead for finalization than for
WeakReference processing? The main improvement in the change is the use
of multiple doubly-linked lists for registration of Finalizer(s) and the
use of a lock-less algorithm for the lists. The WeakReference processing
benchmark also uses such lists internally to handle
registration/deregistration of WeakReferences, so the impact of this
part is minimal and the difference in processing overhead between the
original and changed JDK code is more obvious. (De)registration of
Finalizer(s) OTOH is part of JDK infrastructure, so the improvement to
registration list(s) also shows in the results. The results of the
WeakReference processing benchmark also indicate that reverting to the
use of a single finalization thread that just removes Finalizer(s) from
the ReferenceQueue could lower the overhead even a bit further, but then
it would not be possible to leverage FJ pool to simply configure the
parallelism of finalization. If parallel processing of Finalizer(s) is
an undesirable feature, I could restore the single finalization thread
and the CPU overhead of finalization would be reduced to about 40% of
current overhead with just the changes to data structures.
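To show why splitting one registration list into several reduces
contention, here is a simplified sketch. Note that it deliberately
swaps in a simpler technique: each list is guarded by its own lock
(lock striping), whereas the prototype uses a lock-less algorithm that
is not reproduced here. All names are made up for illustration.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch: several doubly-linked registration lists, one lock each.
// This is lock striping, NOT the lock-less algorithm used by the prototype.
public class StripedRegistrySketch {

    public static final class Node {
        Node prev, next;
        final Object payload;
        final int stripe;
        boolean linked;

        Node(Object payload, int stripe) {
            this.payload = payload;
            this.stripe = stripe;
        }
    }

    // One circular list per stripe; the sentinel head doubles as the lock.
    private final Node[] heads;

    public StripedRegistrySketch(int stripes) {
        heads = new Node[stripes];
        for (int i = 0; i < stripes; i++) {
            Node head = new Node(null, i);
            head.prev = head;
            head.next = head;
            heads[i] = head;
        }
    }

    // Links a new node into a randomly chosen stripe.
    public Node register(Object payload) {
        int stripe = ThreadLocalRandom.current().nextInt(heads.length);
        Node head = heads[stripe];
        Node node = new Node(payload, stripe);
        synchronized (head) {
            node.next = head.next;
            node.prev = head;
            head.next.prev = node;
            head.next = node;
            node.linked = true;
        }
        return node;
    }

    // Unlinks in O(1); returns false if the node was already removed.
    public boolean unregister(Node node) {
        Node head = heads[node.stripe];
        synchronized (head) {
            if (!node.linked) {
                return false;
            }
            node.prev.next = node.next;
            node.next.prev = node.prev;
            node.prev = null;
            node.next = null;
            node.linked = false;
            return true;
        }
    }

    // Counts the linked nodes, locking one stripe at a time.
    public int size() {
        int count = 0;
        for (Node head : heads) {
            synchronized (head) {
                for (Node n = head.next; n != head; n = n.next) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

Threads that register or unregister concurrently only contend when
they happen to hit the same stripe, which is the intuition behind using
multiple lists; a lock-less variant removes the per-list lock as well.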
So, for the curious, here's the improved prototype:
And here are the improved benchmarks (with some results inline):
The benchmark results in ThroughputBench.java show the output of the
test(s) when run under the Linux "time" command, which reports the
elapsed real time and the consumed user and system CPU times. I think
this is relevant for measuring CPU overhead.
So my question is: Is it or is it not desirable to have a configurable
means to parallelize the finalization processing? The reduction of CPU
overhead of infrastructure code should always be desirable, right?
On 05/29/2015 05:57 AM, Kirk Pepperdine wrote:
> Hi Peter,
> It is a very interesting proposal but to further David’s comments, the life-cycle cost of reference objects is horrendous, and the actual process of finalizing an object is only a fraction of that total cost. Unfortunately your micro-benchmark only focuses on one aspect of that cost. In other words, it isn’t very representative of a real concern. In the real world the finalizer *must* compete with mutator threads and since F-J is an “all threads on deck” implementation, it doesn’t play well with others. It creates a “tragedy of the commons”, that is, a situation where everyone behaves rationally with a common resource but to the detriment of the whole group. In short, parallelizing (F-Jing) *everything* in an application is simply not a good idea. We do not live in an infinite compute environment, which means we have to consider the impact of our actions on the entire group.
I changed the prototype to only use a single FJ thread by default
(configurable with a system property). Lowering the CPU overhead of
finalizer processing by 50% is also an improvement. I'm still keeping
the finalization FJ-pool for now because it is more scalable and has
less overhead than a solution with multiple threads removing references
from the same ReferenceQueue. Multiple threads come into play when the
FJ-pool is configured with parallelism > 1 or when user code calls
Runtime.runFinalization(), which translates to
ForkJoinPool.awaitQuiescence() and lends the calling thread to help the
pool execute the tasks.
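The awaitQuiescence() behaviour mentioned above can be seen in
isolation with the small sketch below. It only demonstrates the
standard ForkJoinPool API; the class and method names are made up, and
it does not show how the prototype wires Runtime.runFinalization() to
the pool.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: an external caller waits for, and may help, a FJ pool.
public class QuiescenceSketch {

    // Submits `tasks` trivial tasks to a single-worker pool, then waits for
    // quiescence. Returns the number of completed tasks, or -1 on timeout.
    public static int runAndAwait(int tasks) {
        ForkJoinPool pool = new ForkJoinPool(1);
        AtomicInteger done = new AtomicInteger();
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> done.incrementAndGet());
            }
            // A caller that is not a worker of this pool waits and/or assists
            // in running queued tasks until the pool is quiescent.
            boolean quiescent = pool.awaitQuiescence(10, TimeUnit.SECONDS);
            return quiescent ? done.get() : -1;
        } finally {
            pool.shutdown();
        }
    }
}
```

Even with a single worker configured, the calling thread may end up
executing some of the queued tasks itself, so tasks can run on more
than one thread while such a call is in progress.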
> This was one of the points of my recent article in Java Magazine which I wrote to try to counter some of the rhetoric I was hearing at conferences about the universal benefits of being able to easily parallelize streams in Java 8. Yes, I agree it’s a great feature but it must be used with discretion. Case in point. After I finished writing the article, I started running into a couple of early adopters that had swallowed the parallel message whole, indiscriminately parallelizing all of their streams. As you can imagine, they were quite surprised by the results and quickly worked to de-parallelize *all* of the streams in the application.
> To add some ability to parallelize the handling of reference objects seems like a good idea if you are collecting large numbers of reference objects (>10,000 per GC cycle). However if you are collecting large numbers of reference objects you’re most likely doing something else wrong. IME, finalization is extremely useful but really only for a limited number of use cases and none of them (to date) have resulted in the app burning through 1000s of final objects / sec.
> It would be interesting to know why you picked on this particular issue.
Well, JEP-132 was filed by Oracle, so I thought I'd try to tackle some
of its goals. I think I at least showed that the VM part of reference
handling is mostly not the performance problem (if there is a problem at
all), but the Java side could be modernized a bit.
> Kind regards,
On 05/29/2015 07:20 PM, Rezaei, Mohammad A. wrote:
> For what it's worth, I fully agree with David and Kirk around finalization not necessarily needing this treatment.
> However, I was hoping this would have the effect of improving (non-finalizable) reference handling. We've seen serious issues in WeakReference handling and have had to write some twisted code to deal with this.
Can you elaborate some more on what twists were necessary or what
problems you had?
> So I guess the question I have to Kirk and David is: do you feel a GC load of 10K WeakReferences per cycle is also "doing something else wrong"?
If there is an elegant way to achieve your goal without using
WeakReferences then it might be better to not use them. But it is also
true that WeakReferences frequently offer an elegant way to solve a
problem. The same goes with finalization which is sometimes even more
> Sorry if this is going off-topic.
You're spot on topic and thanks for your comment.
More information about the core-libs-dev