JEP 132: More-prompt finalization
Peter Levart
peter.levart at gmail.com
Sun May 31 19:32:00 UTC 2015
Hi,
Thanks for the views and opinions. I'll try to address them in-line...
On 05/29/2015 04:18 AM, David Holmes wrote:
> Hi Peter,
>
> I guess I'm very concerned about the premise that finalization should
> scale to millions of objects and be performed highly concurrently. To
> me that's sending the wrong message about finalization. It also isn't
> the most effective use of cpu resources - most people would want to do
> useful work on most cpu's most of the time.
>
> Cheers,
> David
@David
Ok, fair enough. It shouldn't be necessary to scale finalization to
millions of objects processed concurrently; normal programs don't need
this. But there is a diagnostic command being developed at the moment
that displays the finalization queue. The utility of such a command, as
I understand it, is precisely to show when the finalization thread
cannot keep up and Finalizer(s) accumulate. So there must be some
hypothetical programs that (ab)use finalization or are buggy (deadlock)
so that the queue builds up. To diagnose this, a diagnostic command is
helpful; to fix it, one has to fix the code. But what if the problem is
not so much the allocation/death rate of finalizable instances as the
heavy code in the finalize() methods of those instances? I agree that
such programs have a smell and should be rewritten to not use
finalization but some other means of cleanup, such as multiple threads
removing WeakReferences from a queue (roughly as in the sketch below),
or something completely different and not based on Reference(s) at all.
But wouldn't it be nice if one could simply set a system property for
the max. number of threads processing Finalizer(s)?
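To make that alternative concrete, here is a rough sketch of what I
mean by cleanup based on threads draining a ReferenceQueue instead of
finalization. The class names and structure are my own, just for
illustration; they are not taken from the webrev:

    import java.lang.ref.Reference;
    import java.lang.ref.ReferenceQueue;
    import java.lang.ref.WeakReference;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    // Illustration only: a weak reference that carries the cleanup action
    // for its referent's external resource.
    class CleanableRef extends WeakReference<Object> {
        private final Runnable cleanup;

        CleanableRef(Object referent, ReferenceQueue<Object> q, Runnable cleanup) {
            super(referent, q);
            this.cleanup = cleanup;
        }

        void clean() { cleanup.run(); }
    }

    class QueueCleaner {
        private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
        // The reference objects themselves must stay strongly reachable
        // until cleaned, otherwise they can be collected before they are
        // enqueued.
        private final Set<CleanableRef> pending = ConcurrentHashMap.newKeySet();

        CleanableRef register(Object referent, Runnable cleanup) {
            CleanableRef ref = new CleanableRef(referent, queue, cleanup);
            pending.add(ref);
            return ref;
        }

        // Start N daemon threads that block on the queue and run cleanups
        // as the GC enqueues references to dead referents.
        void start(int threads) {
            for (int i = 0; i < threads; i++) {
                Thread t = new Thread(() -> {
                    while (true) {
                        try {
                            Reference<?> r = queue.remove(); // blocks until enqueued
                            CleanableRef ref = (CleanableRef) r;
                            pending.remove(ref);
                            ref.clean();
                        } catch (InterruptedException e) {
                            return;
                        }
                    }
                }, "weakref-cleaner-" + i);
                t.setDaemon(true);
                t.start();
            }
        }
    }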
I have prepared an improved variant of the prototype that employs a
single ReferenceHandler thread and adds a ForkJoinPool which, by
default, has a single worker thread and replaces the single finalization
thread. So by default, no more threads are used than today. If one
wants, (s)he can increase the concurrency of finalization with a system
property, along the lines of the sketch below.
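Just to illustrate the configuration idea (the property name below is
made up for this example; see the webrev for what the prototype
actually reads):

    import java.util.concurrent.ForkJoinPool;

    class FinalizerPoolHolder {
        // Defaults to a single worker thread, i.e. the same number of
        // finalization threads as today; a larger value opts in to
        // parallel finalizer processing.
        static final ForkJoinPool FINALIZER_POOL = new ForkJoinPool(
            Math.max(1, Integer.getInteger("finalizer.parallelism", 1)));
    }

So something like -Dfinalizer.parallelism=4 would widen the pool, while
the default keeps a single thread.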
I have also improved the benchmarks, which now focus on CPU overhead
when processing references at more typical rates rather than on maximum
throughput. They show that all changes taken together practically halve
the CPU time overhead of finalization processing, so the freed CPU time
can be used for more useful work. I have also benchmarked the typical
asynchronous WeakReference processing scenario where one thread removes
enqueued WeakReferences from the queue. The results show about a 25%
decrease in CPU time overhead.
Why does the prototype reduce more overhead for finalization than for
WeakReference processing? The main improvement in the change is the use
of multiple doubly-linked lists for registration of Finalizer(s) and a
lock-less algorithm for those lists (a simplified sketch follows below).
The WeakReference processing benchmark also uses such lists internally
to handle registration/deregistration of WeakReferences, so the impact
of this part is minimal and the difference in processing overhead
between the original and the changed JDK code is more obvious.
(De)registration of Finalizer(s), OTOH, is part of the JDK
infrastructure, so the improvement to the registration list(s) also
shows up in the results. The results of the WeakReference processing
benchmark also indicate that reverting to a single finalization thread
that just removes Finalizer(s) from the ReferenceQueue could lower the
overhead a bit further, but then it would not be possible to leverage
the FJ pool to simply configure the parallelism of finalization. If
parallel processing of Finalizer(s) is an undesirable feature, I could
restore the single finalization thread, and the CPU overhead of
finalization would still drop to about 40% of the current overhead with
just the changes to the data structures.
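To make the data-structure change more concrete, here is a greatly
simplified sketch of the registration idea. The actual prototype uses a
lock-less algorithm on its doubly-linked lists; this sketch is my own
illustration and just uses one lock per stripe to keep it short:

    // Simplified illustration only, not the prototype's lock-less code:
    // registrations are spread over several independent doubly-linked
    // lists so that concurrently allocating threads rarely contend on
    // one lock.
    class StripedRegistry<T> {
        static final class Node<T> {
            final T item;
            final int stripe;
            Node<T> prev, next;
            Node(T item, int stripe) { this.item = item; this.stripe = stripe; }
        }

        private final int stripes = Runtime.getRuntime().availableProcessors();
        private final Node<T>[] heads;   // sentinel head per stripe
        private final Object[] locks;    // one lock per stripe

        @SuppressWarnings("unchecked")
        StripedRegistry() {
            heads = (Node<T>[]) new Node[stripes];
            locks = new Object[stripes];
            for (int i = 0; i < stripes; i++) {
                heads[i] = new Node<>(null, i);
                locks[i] = new Object();
            }
        }

        // Insert at the head of the stripe chosen by the current thread.
        Node<T> register(T item) {
            int s = (int) (Thread.currentThread().getId() % stripes);
            Node<T> n = new Node<>(item, s);
            synchronized (locks[s]) {
                Node<T> head = heads[s];
                n.next = head.next;
                n.prev = head;
                if (head.next != null) {
                    head.next.prev = n;
                }
                head.next = n;
            }
            return n;
        }

        // O(1) removal thanks to the doubly-linked structure.
        void unregister(Node<T> n) {
            synchronized (locks[n.stripe]) {
                n.prev.next = n.next;
                if (n.next != null) {
                    n.next.prev = n.prev;
                }
            }
        }
    }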
So, for the curious, here's the improved prototype:
http://cr.openjdk.java.net/~plevart/misc/JEP132/ReferenceHandling/webrev.02/
And here are the improved benchmarks (with some results inline):
http://cr.openjdk.java.net/~plevart/misc/JEP132/ReferenceHandling/refproc/
The benchmark results inline in ThroughputBench.java show the output of
the test(s) when run with the Linux "time" command, which reports the
elapsed real time and the consumed user and system CPU times. I think
this is the relevant measure for CPU overhead.
So my question is: Is it or is it not desirable to have a configurable
means to parallelize the finalization processing? The reduction of CPU
overhead of infrastructure code should always be desirable, right?
On 05/29/2015 05:57 AM, Kirk Pepperdine wrote:
> Hi Peter,
>
> It is a very interesting proposal but to further David’s comments, the life-cycle costs of reference objects is horrendous of which the actual process of finalizing an object is only a fraction of that total cost. Unfortunately your micro-benchmark only focuses on one aspect of that cost. In other words, it isn’t very representative of a real concern. In the real world the finalizer *must compete with mutator threads and since F-J is an “all threads on deck” implementation, it doesn’t play well with others. It creates a “tragedy of the commons”. That is situations where everyone behaves rationally with a common resource but to the detriment of the whole group”. In short, parallelizing (F-Jing) *everything* in an application is simply not a good idea. We do not live in an infinite compute environment which means to have to consider the impact of our actions to the entire group.
@Kirk
I changed the prototype to only use a single FJ thread by default
(configurable with a system property). Lowering the CPU overhead of
finalizer processing by about 50% is an improvement in itself. I'm still
keeping the finalization FJ pool for now because it is more scalable and
has less overhead than a solution with multiple threads removing
references from the same ReferenceQueue. This happens when the FJ pool
is configured with parallelism > 1 or when user code calls
Runtime.runFinalization(), which translates to
ForkJoinPool.awaitQuiescence() and lends the calling thread to help the
pool execute the tasks (roughly as in the sketch below).
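In my own words (again just a sketch, not the actual code in the
webrev), the runFinalization() path amounts to something like this:

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.TimeUnit;

    class RunFinalizationSketch {
        // The caller does not merely wait for the finalizer pool to
        // drain; via awaitQuiescence() it also helps execute pending
        // finalization tasks until the pool has no more work.
        static void runFinalization(ForkJoinPool finalizerPool) {
            finalizerPool.awaitQuiescence(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
        }
    }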
> This was one of the points of my recent article in Java Magazine which I wrote to try to counter some of the rhetoric I was hearing in conference about the universal benefits of being able easily parallelize streams in Java 8. Yes, I agree it’s a great feature but it must be used with discretion. Case in point. After I finished writing the article, I started running into a couple of early adopters that had swallowed the parallel message whole indiscriminately parallelizing all of their streams. As you can imagine, they were quite surprised by the results and quickly worked to de-parallelize *all* of the streams in the application.
>
> To add some ability to parallelize the handling of reference objects seems like a good idea if you are collecting large numbers of reference objects (>10,000 per GC cycle). However if you are collecting large numbers of reference objects you’re most likely doing something else wrong. IME, finalization is extremely useful but really only for a limited number of use cases and none of them (to date) have resulted in the app burning through 1000s of final objects / sec.
>
> It would be interesting to know why you picked on this particular issue.
Well, JEP 132 was filed by Oracle, so I thought I'd try to tackle some
of its goals. I think I have at least shown that the VM part of
reference handling is mostly not the performance problem (if there is a
problem at all), but that the Java side could be modernized a bit.
> Kind regards,
> Kirk
On 05/29/2015 07:20 PM, Rezaei, Mohammad A. wrote:
> For what it's worth, I fully agree with David and Kirk around finalization not necessarily needing this treatment.
>
> However, I was hoping this would have the effect of improving (non-finalizable) reference handling. We've seen serious issues in WeakReference handling and have had to write some twisted code to deal with this.
@Moh
Can you elaborate some more on what twists were necessary or what
problems you had?
> So I guess the question I have to Kirk and David is: do you feel a GC load of 10K WeakReferences per cycle is also "doing something else wrong"?
If there is an elegant way to achieve your goal without using
WeakReferences, then it might be better not to use them. But it is also
true that WeakReferences frequently offer an elegant way to solve a
problem. The same goes for finalization, which is sometimes even more
elegant.
> Sorry if this is going off-topic.
You're spot on topic and thanks for your comment.
> Thanks
> Moh
>
>
Regards, Peter