RFR: 8209189: Make CompiledMethod::do_unloading more concurrent
Vladimir Kozlov
vladimir.kozlov at oracle.com
Fri Oct 26 15:15:31 UTC 2018
Thank you, Erik, for the data.
Looks good. And I am fine with the virtual call in is_unloading(), since you will have another variant for ZGC - I
thought there was only one.
Thanks,
Vladimir
On 10/26/18 5:20 AM, Erik Österlund wrote:
> Hi Vladimir,
>
> Thank you for having a look at these changes.
>
> On 2018-10-25 19:34, Vladimir Kozlov wrote:
>> Hi Erik,
>>
>> First, what testing did you do? This is a significant rewrite and requires a lot of testing. I don't see a link to
>> test results in the RFE. Any performance testing?
>
> In terms of testing, I have run this through tiers 1-6 in mach5 with no failures related to my changes.
>
> As far as performance testing goes, I paid some extra attention to the G1 parallel case, as it is the most
> performance-sensitive one.
> I ran DaCapo2009, SPECjvm98, SPECjvm2008, SPECjbb2000, SPECjbb2005 and SPECjbb2015 with a fairly constrained heap
> size to cause a lot of old GCs. I recorded the time taken for the parallel class unloading and cleanup task, which is
> where my changes affect things.
>
> Here is a summary of the results:
> Benchmark          mean      min      max      std
> DaCapo2009   new    19.46     4.97    41.94    10.74
>              old    20.22     6.54    42.55    10.41
> SPECjvm98    new     2.40     0.34     6.43     1.66
>              old     3.18     0.88     8.15     2.01
> SPECjvm2008  new   374.75    69.72   477.76    73.42
>              old   345.14    84.85   496.55   101.24
> SPECjbb2000  new   354.69     0.61   482.97    87.87
>              old   360.56     0.56   482.86    85.83
> SPECjbb2005  new   174.69     0.80   242.71    36.37
>              old   174.66     0.74   242.76    36.82
> SPECjbb2015  new     2.71     2.38     3.05     0.47
>              old     4.05     3.52     4.58     0.75
>
> As we can see, my approach seems to perform mostly "a bit better". The odd one out where I don't perform better is
> SPECjvm2008, but that is also the benchmark with the highest standard deviations, so it might just be noise. The
> reason I think I get overall better results is that I always do everything in one pass over the code heap, instead of
> two passes with an intrusive wait barrier separating the two parallel phases. As a bonus, when you do it this way, you
> first walk the oop reloc entries of each nmethod to figure out whether it is alive, and then immediately clean that
> same nmethod, walking the same reloc entries again to find IC caches, while the caches are still warm. That improved
> data locality gives this approach an edge, and the bigger the code cache, the bigger that advantage becomes.
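>
> Roughly, the single pass has the following shape (a simplified sketch with approximate names, not the exact code from
> the webrev):
>
>   // One pass over the code heap: decide liveness and clean in the same visit,
>   // while the nmethod's relocation entries are still cache-hot.
>   void unload_and_clean_nmethods() {
>     CompiledMethodIterator iter;
>     while (iter.next()) {
>       CompiledMethod* cm = iter.method();
>       if (cm->is_unloading()) {       // first walk over the oop relocations; result is cached
>         cm->make_unloaded();          // broken oops: unload the nmethod
>       } else {
>         cm->cleanup_inline_caches();  // second walk over the same relocations, still warm
>       }
>     }
>   }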
>
> I have not looked at SerialGC and ParallelGC with the same level of detail, because they seem strictly less
> interesting (when you do a full STW GC, the unloading time is far from the biggest problem, which is why it's done
> with one thread). However, FWIW, the same improved data locality should be beneficial there too, by again doing
> everything in a single pass over the code cache instead of two.
>
>> It seems you removed the only place (in do_unloading()) where CodeCache::do_unloading_nmethod_caches() is called. I
>> can't find any other place where it is called after your change. Why keep it then?
>
> Good point. Removed.
>
>> It seems CompiledMethod::is_unloading() is performance critical. Can we make
>> IsUnloadingBehaviour::current()->is_unloading(this) a non-virtual call? I see only the ClosureIsUnloadingBehaviour
>> subclass of IsUnloadingBehaviour.
>
> Yes, is_unloading() does indeed get pretty hot. However, after it has been called once on an nmethod, the result of
> the computation is cached, so the virtual call you point out will only be made once per nmethod, and after that the
> cached value is used instead. And I need to override that virtual call for ZGC, as it will be called outside of
> safepoints, and we have to use locking to guarantee the safety of computing the value, especially since the immediate
> oops in the code stream are misaligned and will get patched concurrently by nmethod entry barriers.
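>
> For reference, the behaviour hook is roughly shaped like this (a sketch only; the declarations are approximations
> rather than copies from the webrev):
>
>   // GC-configurable policy for answering whether a compiled method has broken oops.
>   class IsUnloadingBehaviour {
>   public:
>     static IsUnloadingBehaviour* current();                  // installed by the active GC
>     virtual bool is_unloading(CompiledMethod* cm) const = 0;
>   };
>
>   // STW collectors can answer with a plain is-alive closure walk over the nmethod's oops.
>   // ZGC will plug in its own variant that takes a lock, because the query runs concurrently
>   // with nmethod entry barriers patching the immediate oops in the code stream.
>   class ClosureIsUnloadingBehaviour : public IsUnloadingBehaviour {
>     BoolObjectClosure* _cl;
>   public:
>     ClosureIsUnloadingBehaviour(BoolObjectClosure* cl) : _cl(cl) {}
>     virtual bool is_unloading(CompiledMethod* cm) const;     // walks cm's oops with _cl
>   };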
>
> Thanks,
> /Erik
>
>> Thanks,
>> Vladimir
>>
>> On 10/19/18 7:22 AM, Erik Österlund wrote:
>>> Hi,
>>>
>>> Today, nmethods are unloaded either serially or in parallel due to GC (and e.g. class unloading). With ZGC's
>>> concurrent class unloading, a concurrent fashion is required as well. Rather than inventing yet a third way of
>>> unloading nmethods due to class unloading, it would be ideal if there could be one unified way of doing it that
>>> makes everyone happy.
>>>
>>> To solve this problem in a more general way, a new member function on CompiledMethod is added: is_unloading(). It
>>> returns whether a CompiledMethod has broken oops. In order to not have to iterate over all the oops every time this
>>> question is asked, it caches the result, so it needs to be computed only once per "epoch". Every time a CodeCache
>>> unloading is triggered by the GC, a new epoch is started, which forces re-computation of whether the CompiledMethod
>>> is_unloading() the first time it is called.
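>>>
>>> A minimal sketch of the intended caching scheme (field and helper names here are made up for illustration):
>>>
>>>   bool CompiledMethod::is_unloading() {
>>>     // The state packs the unloading epoch together with the cached answer, so the
>>>     // oops only have to be inspected once per epoch.
>>>     uint8_t state = _is_unloading_state;
>>>     if (state_epoch(state) == CodeCache::unloading_epoch()) {
>>>       return state_result(state);                 // already computed in this epoch
>>>     }
>>>     bool result = IsUnloadingBehaviour::current()->is_unloading(this);
>>>     _is_unloading_state = make_state(CodeCache::unloading_epoch(), result);
>>>     return result;
>>>   }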
>>>
>>> So by having is_unloading() be lazily evaluated, it is now possible to build a do_unloading() method on nmethod that
>>> simply makes the nmethod unloaded if it is_unloading(), and otherwise cleans its various caches. Cleaning e.g. the
>>> inline caches of said nmethod uses the same is_unloading() method on its targets to figure out whether the IC cache
>>> should be cleaned or not. Again, the epoch-based caching mechanism makes sure we don't recompute this value.
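>>>
>>> Inline cache cleaning then reuses the same predicate on the call targets, roughly like this (illustrative only):
>>>
>>>   // Clean an inline cache if its target is a compiled method that is being unloaded.
>>>   static void clean_ic_if_target_is_unloading(CompiledIC* ic) {
>>>     CodeBlob* cb = CodeCache::find_blob(ic->ic_destination());
>>>     CompiledMethod* target = (cb != NULL) ? cb->as_compiled_method_or_null() : NULL;
>>>     if (target != NULL && target->is_unloading()) {  // cached per epoch, cheap to re-ask
>>>       ic->set_to_clean();
>>>     }
>>>   }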
>>>
>>> The new do_unloading() method may be called in serial, parallel and concurrent contexts. So I removed the parallel
>>> variation we had before, which unnecessarily postponed the unloading because it had not yet been computed whether
>>> the nmethod should be unloaded or not. Since that is now computed lazily on demand, there is no longer a need to
>>> postpone it, nor to have two phases for parallel nmethod unloading.
>>>
>>> While there is more work involved in making concurrent nmethod unloading work, this is a good cleanup towards that goal.
>>>
>>> Bug ID:
>>> https://bugs.openjdk.java.net/browse/JDK-8209189
>>>
>>> Webrev:
>>> cr.openjdk.java.net/~eosterlund/8209189/webrev.00/
>>>
>>> Thanks,
>>> /Erik
>