RFR: 8209189: Make CompiledMethod::do_unloading more concurrent

Erik Österlund erik.osterlund at oracle.com
Fri Oct 26 12:20:33 UTC 2018


Hi Vladimir,

Thank you for having a look at these changes.

On 2018-10-25 19:34, Vladimir Kozlov wrote:
> Hi Erik,
>
> First, what testing you did? This is significant rewrite and requires 
> a lot of testing. I don't see a link to test results in RFE. Any 
> performance testing?

In terms of testing, I have run this through tiers 1-6 in Mach5 with
no failures related to my changes.

As far as performance testing goes, I paid some extra attention to the
G1 parallel case, as it is the most performance-sensitive one.
I ran through DaCapo2009, SPECjvm98, SPECjvm2008, SPECjbb2000,
SPECjbb2005 and SPECjbb2015 with a fairly constrained heap size to
cause a lot of old GCs, and recorded the time taken for the parallel
class unloading and cleanup task, which is where my changes would
affect things.

Here is a summary of the results:
new DaCapo2009:
mean    19.46   min     4.97    max     41.94   std     10.74
old DaCapo2009:
mean    20.22   min     6.54    max     42.55   std     10.41

new SPECjvm98:
mean    2.4     min     0.34    max     6.43    std     1.66
old SPECjvm98:
mean    3.18    min     0.88    max     8.15    std     2.01

new SPECjvm2008:
mean    374.75  min     69.72   max     477.76  std     73.42
old SPECjvm2008:
mean    345.14  min     84.85   max     496.55  std     101.24

new SPECjbb2000:
mean    354.69  min     0.61    max     482.97  std     87.87
old SPECjbb2000:
mean    360.56  min     0.56    max     482.86  std     85.83

new SPECjbb2005:
mean    174.69  min     0.8     max     242.71  std     36.37
old SPECjbb2005:
mean    174.66  min     0.74    max     242.76  std     36.82

new SPECjbb2015:
mean    2.71    min     2.38    max     3.05    std     0.47
old SPECjbb2015:
mean    4.05    min     3.52    max     4.58    std     0.75

As we can see, my approach seems to perform mostly "a bit better". The 
odd one out where I don't perform better is SPECjvm2008, but that is 
also the benchmark where I see the highest standard deviations, so that 
might just be noise. The reason I think I get overall better results is 
that I always do everything in one pass over the code heap, instead of 
two passes with an intrusive wait barrier separating the two parallel 
phases. As a bonus, when you do it this way, for each nmethod you first 
walk the oop reloc entries to figure out whether the nmethod is alive, 
and then immediately clean that same nmethod, walking the same reloc 
entries again to find IC caches, while the caches are still warm. That 
improved data locality gives this approach an edge, and the bigger the 
code cache, the bigger that advantage becomes.

I have not looked at SerialGC and ParallelGC with the same level of 
detail, because they seem strictly less interesting (when you do a 
full STW GC, the unloading time is far from the biggest problem, 
which is why it's done with 1 thread). However, FWIW, the same improved 
data locality should be beneficial there too, by again doing everything 
in one single pass over the code cache instead of two.
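To make the single-pass structure concrete, here is a rough sketch. It
is a hypothetical, simplified model (the real code iterates
CompiledMethods in the code cache and walks oop relocation entries,
not a boolean flag); it only illustrates why the liveness check and
the cache cleaning end up touching the same data back to back:

```cpp
#include <cassert>
#include <vector>

// Stand-in for an nmethod: the flag models the result of walking the
// oop relocation entries (true means a dead oop was found).
struct Nmethod {
  bool has_broken_oops = false;
  bool unloaded = false;
  bool caches_cleaned = false;
};

// One pass over the code cache: for each nmethod, decide liveness and
// then immediately either unload it or clean its IC caches, while the
// relocation data just touched is still warm in the data cache.
static void do_unloading_pass(std::vector<Nmethod>& code_cache) {
  for (Nmethod& nm : code_cache) {
    if (nm.has_broken_oops) {   // "first walk": is it unloading?
      nm.unloaded = true;       // make the nmethod unloaded right away
    } else {
      nm.caches_cleaned = true; // "second walk" over the same entries
    }
  }
}
```

The old scheme corresponds to running the liveness check over the
whole code cache first, waiting on a barrier, and only then doing a
second full pass for the cleaning, by which point the relocation data
has long since been evicted.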

> It seems you removed the only case (in do_unloading()) where 
> CodeCache::do_unloading_nmethod_caches() is called. I can't find other 
> place where it is called after your change. Why keep it then?

Good point. Removed.

> It seems CompiledMethod::is_unloading() is performance critical. Can 
> we make IsUnloadingBehaviour::current()->is_unloading(this) as non 
> virtual call? I see only ClosureIsUnloadingBehaviour subclass of 
> IsUnloadingBehaviour.

Yes, is_unloading() does get pretty hot indeed. However, after it has 
been called once on an nmethod, the result of the computation is 
cached, so the virtual call you point out will only be made once per 
nmethod, and the cached value will be used after that. And I do need 
that call to be virtual so that ZGC can override it: there it will be 
called outside of safepoints, and we have to use locking to guarantee 
that computing the value is safe, especially since the immediate oops 
in the code stream are misaligned and will get patched concurrently by 
nmethod entry barriers.
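The caching shape is roughly the following. Names and the plain global
counter are hypothetical (the real code packs this state differently
and has to handle concurrency); the point is just that the virtual
slow path runs at most once per nmethod per unloading epoch:

```cpp
#include <cassert>
#include <cstdint>

struct Nmethod;

// Pluggable liveness check; ZGC would install its own subclass that
// takes a lock before inspecting the concurrently patched oops.
struct IsUnloadingBehaviour {
  virtual ~IsUnloadingBehaviour() {}
  virtual bool is_unloading(const Nmethod* nm) = 0;  // expensive walk
};

static uint64_t g_unloading_epoch = 1;  // bumped per unloading cycle

struct Nmethod {
  uint64_t cached_epoch = 0;  // epoch the cached result belongs to
  bool cached_result = false;

  bool is_unloading(IsUnloadingBehaviour* behaviour) {
    if (cached_epoch != g_unloading_epoch) {
      // First query this epoch: take the virtual slow path once.
      cached_result = behaviour->is_unloading(this);
      cached_epoch = g_unloading_epoch;
    }
    return cached_result;  // later queries this epoch are cache hits
  }
};

// Illustrative behaviour that counts how often the slow path runs.
struct CountingBehaviour : IsUnloadingBehaviour {
  int calls = 0;
  bool is_unloading(const Nmethod*) override { ++calls; return false; }
};
```

So the cost of the virtual dispatch is amortized away: everything
after the first query in an epoch is a couple of loads and a compare.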

Thanks,
/Erik

> Thanks,
> Vladimir
>
> On 10/19/18 7:22 AM, Erik Österlund wrote:
>> Hi,
>>
>> Today the nmethods are unloaded either serially or in parallel due to 
>> GC (and e.g. class unloading). With ZGC concurrent class unloading, a 
>> concurrent fashion is required. Rather than inventing yet a third way 
>> of unloading nmethods due to class unloading, it would be ideal if 
>> there could be one unified way of doing it that makes everyone happy.
>>
>> To solve this problem in a more general way, a new member function on 
>> CompiledMethod is added: is_unloading(). It returns whether a 
>> CompiledMethod has broken oops. In order to not have to iterate over 
>> all the oops every time this question is asked, it caches the result, 
>> so it needs to be computed only once per "epoch". Every time a 
>> CodeCache unloading is triggered by the GC, a new epoch is started, 
>> which forces re-computation of whether the CompiledMethod 
>> is_unloading() the first time it is called.
>>
>> So by having is_unloading() be lazily evaluated, it is now possible 
>> to build a do_unloading() method on nmethod that simply makes the 
>> nmethod unloaded if it is_unloading(), and otherwise cleans the 
>> various caches. Cleaning e.g. the inline caches of said nmethod, uses 
>> the same is_unloading() method on its targets to figure out if the IC 
>> cache should be cleaned or not. Again, the epoched caching mechanism 
>> makes sure we don't recompute this value.
>>
>> The new do_unloading() method may be called in both serial, parallel 
>> and concurrent contexts. So I removed the parallel variation of this 
>> that we had before, that unnecessarily postponed the unloading due to 
>> not having computed whether the nmethod should be unloaded or not. 
>> Since that is now computed on-demand lazily, there is no longer a 
>> need for postponing this, nor to have two phases for parallel nmethod 
>> unloading.
>>
>> While there is more work involved in making concurrent nmethod 
>> unloading work, this is a good cleanup towards that goal.
>>
>> Bug ID:
>> https://bugs.openjdk.java.net/browse/JDK-8209189
>>
>> Webrev:
>> cr.openjdk.java.net/~eosterlund/8209189/webrev.00/
>>
>> Thanks,
>> /Erik


