RFR: 8209189: Make CompiledMethod::do_unloading more concurrent
Erik Österlund
erik.osterlund at oracle.com
Fri Oct 26 15:18:07 UTC 2018
Hi Vladimir,
Thank you for the review!
/Erik
On 2018-10-26 17:15, Vladimir Kozlov wrote:
> Thank you, Erik, for data.
>
> Looks good. And I am fine with the virtual call in is_unloading() since
> you will have another variant for ZGC - I thought there was only one.
>
> Thanks,
> Vladimir
>
> On 10/26/18 5:20 AM, Erik Österlund wrote:
>> Hi Vladimir,
>>
>> Thank you for having a look at these changes.
>>
>> On 2018-10-25 19:34, Vladimir Kozlov wrote:
>>> Hi Erik,
>>>
>>> First, what testing did you do? This is a significant rewrite and
>>> requires a lot of testing. I don't see a link to test results in the
>>> RFE. Any performance testing?
>>
>> In terms of testing, I have run this through tier 1-6 in mach 5 with
>> no failures related to my changes.
>>
>> As far as performance testing goes, I paid some extra attention to
>> the G1 parallel case, as it is the most performance-sensitive. I ran
>> DaCapo2009, SPECjvm98, SPECjvm2008, SPECjbb2000, SPECjbb2005 and
>> SPECjbb2015 with a fairly constrained heap size to cause a lot of
>> old GCs, and recorded the time taken for the parallel class
>> unloading and cleanup task, which is where my changes would affect
>> things.
>>
>> Here is a summary of the results:
>> new DaCapo2009:
>> mean 19.46 min 4.97 max 41.94 std 10.74
>> old DaCapo2009:
>> mean 20.22 min 6.54 max 42.55 std 10.41
>>
>> new SPECjvm98:
>> mean 2.4 min 0.34 max 6.43 std 1.66
>> old SPECjvm98:
>> mean 3.18 min 0.88 max 8.15 std 2.01
>>
>> new SPECjvm2008:
>> mean 374.75 min 69.72 max 477.76 std 73.42
>> old SPECjvm2008:
>> mean 345.14 min 84.85 max 496.55 std 101.24
>>
>> new SPECjbb2000:
>> mean 354.69 min 0.61 max 482.97 std 87.87
>> old SPECjbb2000:
>> mean 360.56 min 0.56 max 482.86 std 85.83
>>
>> new SPECjbb2005:
>> mean 174.69 min 0.8 max 242.71 std 36.37
>> old SPECjbb2005:
>> mean 174.66 min 0.74 max 242.76 std 36.82
>>
>> new SPECjbb2015:
>> mean 2.71 min 2.38 max 3.05 std 0.47
>> old SPECjbb2015:
>> mean 4.05 min 3.52 max 4.58 std 0.75
>>
>> As we can see, my approach mostly performs a bit better. The odd one
>> out where it does not is SPECjvm2008, but that is also the benchmark
>> with the highest standard deviations, so the difference might just
>> be noise. The reason I think I get better results overall is that I
>> always do everything in one pass over the code heap, instead of two
>> parallel phases separated by an intrusive wait barrier. As a bonus,
>> when you do it this way, for each nmethod you first walk the oop
>> reloc entries to figure out whether the nmethod is alive, and then
>> immediately clean that same nmethod, walking the same reloc entries
>> again to find the IC caches, while the caches are still warm. That
>> improved data locality gives this approach an edge, and the bigger
>> the code cache, the bigger that advantage becomes.
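>>
>> The one-pass structure can be sketched roughly like this (not
>> HotSpot code; the Nmethod struct and flags here are made up for
>> illustration):

```cpp
#include <vector>

// Illustrative sketch of the single-pass structure described above:
// for each nmethod, liveness is decided and the caches are cleaned in
// the same visit, while its relocation data is still warm.
struct Nmethod {
  bool has_broken_oops;    // in HotSpot, computed by walking the oop relocs
  bool unloaded = false;
  bool ics_cleaned = false;
};

// One walk over the code heap; no second phase and no wait barrier.
void do_unloading_one_pass(std::vector<Nmethod>& code_heap) {
  for (Nmethod& nm : code_heap) {
    if (nm.has_broken_oops) {
      nm.unloaded = true;     // stands in for make_unloaded()
    } else {
      nm.ics_cleaned = true;  // clean IC caches immediately, same visit
    }
  }
}
```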
>>
>> I have not looked at SerialGC and ParallelGC with the same level of
>> detail, because they seem strictly less interesting (when you do a
>> full STW GC, the unloading time is far from the biggest problem,
>> which is why it is done with a single thread). However, FWIW, the
>> same improved data locality should be beneficial there too, again by
>> doing everything in one single pass over the code cache instead of
>> two.
>>
>>> It seems you removed the only case (in do_unloading()) where
>>> CodeCache::do_unloading_nmethod_caches() is called. I can't find
>>> other place where it is called after your change. Why keep it then?
>>
>> Good point. Removed.
>>
>>> It seems CompiledMethod::is_unloading() is performance critical. Can
>>> we make IsUnloadingBehaviour::current()->is_unloading(this) a
>>> non-virtual call? I see only the ClosureIsUnloadingBehaviour
>>> subclass of IsUnloadingBehaviour.
>>
>> Yes, is_unloading() does get pretty hot. However, after it has been
>> called once on an nmethod, the result of the computation is cached,
>> so the virtual call you point out is made only once per nmethod, and
>> the cached value is used after that. And I need to override that
>> virtual call for ZGC, as it will be called outside of safepoints,
>> and we have to use locking to guarantee the safety of computing the
>> value, especially since the immediate oops in the code stream are
>> misaligned and will get patched concurrently by nmethod entry
>> barriers.
>>
>> Thanks,
>> /Erik
>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 10/19/18 7:22 AM, Erik Österlund wrote:
>>>> Hi,
>>>>
>>>> Today, nmethods are unloaded either serially or in parallel due to
>>>> GC (and e.g. class unloading). With ZGC's concurrent class
>>>> unloading, a concurrent way of unloading is required. Rather than
>>>> inventing yet a third way of unloading nmethods due to class
>>>> unloading, it would be ideal to have one unified way of doing it
>>>> that makes everyone happy.
>>>>
>>>> To solve this problem in a more general way, a new member function
>>>> on CompiledMethod is added: is_unloading(). It returns whether a
>>>> CompiledMethod has broken oops. In order to not have to iterate
>>>> over all the oops every time this question is asked, it caches the
>>>> result, so it needs to be computed only once per "epoch". Every
>>>> time a CodeCache unloading is triggered by the GC, a new epoch is
>>>> started, which forces re-computation of whether the CompiledMethod
>>>> is_unloading() the first time it is called.
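>>>>
>>>> A rough sketch of that epoch-based caching (names, fields and the
>>>> state layout are made up for illustration; the real CompiledMethod
>>>> stores this state differently):

```cpp
#include <cstdint>

uint32_t g_unloading_epoch = 1;  // bumped once per GC unloading cycle
int g_compute_calls = 0;         // counts the expensive recomputations

struct CompiledMethodSketch {
  uint32_t _cached_epoch = 0;    // epoch the cached result belongs to
  bool _cached_result = false;
  bool _has_broken_oops;         // stand-in for walking the oop relocations

  explicit CompiledMethodSketch(bool broken) : _has_broken_oops(broken) {}

  // The expensive part: in HotSpot this walks the nmethod's oops.
  bool compute_is_unloading() {
    ++g_compute_calls;
    return _has_broken_oops;
  }

  bool is_unloading() {
    if (_cached_epoch != g_unloading_epoch) {  // stale or never computed
      _cached_result = compute_is_unloading();
      _cached_epoch = g_unloading_epoch;
    }
    return _cached_result;  // cached for the rest of this epoch
  }
};

// Starting a new unloading cycle invalidates every cached result at
// once, without touching the nmethods themselves.
void start_unloading_cycle() { ++g_unloading_epoch; }
```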
>>>>
>>>> With is_unloading() lazily evaluated, it is now possible to build a
>>>> do_unloading() method on nmethod that simply makes the nmethod
>>>> unloaded if it is_unloading(), and otherwise cleans its various
>>>> caches. Cleaning e.g. the inline caches of said nmethod uses the
>>>> same is_unloading() method on their targets to figure out whether
>>>> an IC cache entry should be cleaned or not. Again, the epoched
>>>> caching mechanism ensures we don't recompute this value.
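>>>>
>>>> A minimal sketch of that per-nmethod do_unloading() logic (again,
>>>> illustrative names only, with is_unloading() reduced to a stored
>>>> flag instead of the cached computation):

```cpp
#include <algorithm>
#include <vector>

struct NM {
  bool unloading;               // stand-in for the epoch-cached is_unloading()
  bool unloaded = false;
  std::vector<NM*> ic_targets;  // callees recorded in the inline caches

  bool is_unloading() const { return unloading; }

  void do_unloading() {
    if (is_unloading()) {
      unloaded = true;          // make_unloaded(): this nmethod had broken oops
      return;
    }
    // Otherwise clean the inline caches: drop IC entries whose target
    // nmethod is itself unloading, reusing the same cached query.
    ic_targets.erase(
        std::remove_if(ic_targets.begin(), ic_targets.end(),
                       [](NM* t) { return t->is_unloading(); }),
        ic_targets.end());
  }
};
```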
>>>>
>>>> The new do_unloading() method may be called in serial, parallel and
>>>> concurrent contexts. So I removed the parallel variation we had
>>>> before, which unnecessarily postponed the unloading because it had
>>>> not yet been computed whether the nmethod should be unloaded.
>>>> Since that is now computed lazily on demand, there is no longer a
>>>> need to postpone this, nor to have two phases for parallel nmethod
>>>> unloading.
>>>>
>>>> While there is more work involved in making concurrent nmethod
>>>> unloading work, this is a good cleanup towards that goal.
>>>>
>>>> Bug ID:
>>>> https://bugs.openjdk.java.net/browse/JDK-8209189
>>>>
>>>> Webrev:
>>>> cr.openjdk.java.net/~eosterlund/8209189/webrev.00/
>>>>
>>>> Thanks,
>>>> /Erik
>>