RFR: JDK-8229871: Improve performance of Method.copy() and leafCopy()

Peter Levart peter.levart at gmail.com
Wed Oct 23 21:09:54 UTC 2019


Hi Ogata,

I finally managed to find some time to experiment with this. To measure 
invocation performance I created the following JMH benchmark [1]. It 
measures the invocation speed of instance and static methods using either:
- direct invocation (bytecodes)
- invocation via constant Method instance
- invocation via variable Method instance

Here are the results using unmodified JDK 14 build (baseline):

Benchmark                                         Mode  Cnt Score   
Error  Units
ReflectionSpeedBenchmark.instanceDirect           avgt   10   2.272 ± 
0.002  ns/op
ReflectionSpeedBenchmark.instanceReflectiveConst  avgt   10  16.609 ± 
0.162  ns/op
ReflectionSpeedBenchmark.instanceReflectiveVar    avgt   10  16.715 ± 
0.163  ns/op
ReflectionSpeedBenchmark.staticDirect             avgt   10   2.275 ± 
0.012  ns/op
ReflectionSpeedBenchmark.staticReflectiveConst    avgt   10  16.351 ± 
0.330  ns/op
ReflectionSpeedBenchmark.staticReflectiveVar      avgt   10  16.259 ± 
0.196  ns/op

Your webrev.04 [2] has a slight (~ 6%) improvement for constant Method 
instance (i.e. assigned to static final field):

Benchmark                                         Mode  Cnt Score   
Error  Units
ReflectionSpeedBenchmark.instanceDirect           avgt   10   2.273 ± 
0.003  ns/op
ReflectionSpeedBenchmark.instanceReflectiveConst  avgt   10  15.628 ± 
0.115  ns/op
ReflectionSpeedBenchmark.instanceReflectiveVar    avgt   10  16.706 ± 
0.144  ns/op
ReflectionSpeedBenchmark.staticDirect             avgt   10   2.277 ± 
0.008  ns/op
ReflectionSpeedBenchmark.staticReflectiveConst    avgt   10  15.285 ± 
0.109  ns/op
ReflectionSpeedBenchmark.staticReflectiveVar      avgt   10  16.600 ± 
0.222  ns/op

Now I have prepared another variant [3] that replaces 
DelegatingMethodAccessorImpl with SlowFastMethodAccessorImpl and 
produces the following result:

Benchmark                                         Mode  Cnt Score   
Error  Units
ReflectionSpeedBenchmark.instanceDirect           avgt   10   2.371 ± 
0.027  ns/op
ReflectionSpeedBenchmark.instanceReflectiveConst  avgt   10   7.161 ± 
0.066  ns/op
ReflectionSpeedBenchmark.instanceReflectiveVar    avgt   10  16.501 ± 
0.154  ns/op
ReflectionSpeedBenchmark.staticDirect             avgt   10   2.373 ± 
0.017  ns/op
ReflectionSpeedBenchmark.staticReflectiveConst    avgt   10   6.971 ± 
0.103  ns/op
ReflectionSpeedBenchmark.staticReflectiveVar      avgt   10  15.893 ± 
0.110  ns/op

This is more than twice as fast as the baseline for constant Method 
instances while not degrading performance for variable Method instances.


[1] 
http://cr.openjdk.java.net/~plevart/jdk-dev/8229871_Method.methodAccessor/ReflectionSpeedBenchmark.java
[2] http://cr.openjdk.java.net/~ogatak/8229871/webrev.04/
[3] 
http://cr.openjdk.java.net/~plevart/jdk-dev/8229871_Method.methodAccessor/webrev.01/


Could you spin this one [3] on your SPECjbb2015 benchmark to see if it 
still performs favorably?


Regards, Peter

On 10/11/19 12:17 PM, Kazunori Ogata wrote:
> Hi Peter,
>
> Thank you for the comment and suggestion of the fix.
>
> I tried to pick up your change w.r.t. methodAccessor:
> https://cr.openjdk.java.net/~ogatak/8229871/webrev.04/
>
>
> Regarding micro benchmark, my original motivation of this change is to
> improve performance of Class.getMethods(), which calls Method.copy() for
> each declared method to create a copy of Method[].
>
> I measured my simple microbench:
> https://cr.openjdk.java.net/~ogatak/8229871/GetMethodsBench.java
>
> Base code: Elapsed time = 4808 ms
> webrev.01: Elapsed time = 4536 ms  (+   6%)
> webrev.02: Elapsed time = 2331 ms  (+106%)
> webrev.04: Elapsed time = 3746 ms  (+ 28%)
>
> I'll measure larger benchmark and try to think if we can reduce the
> overhead of memory barrier.
>
>
> Regards,
> Ogata
>
>
> Peter Levart <peter.levart at gmail.com> wrote on 2019/10/09 16:44:13:
>
>> From: Peter Levart <peter.levart at gmail.com>
>> To: Kazunori Ogata <OGATAK at jp.ibm.com>, core-libs-dev at openjdk.java.net
>> Date: 2019/10/09 16:44
>> Subject: [EXTERNAL] Re: RFR: JDK-8229871: Improve performance of
>> Method.copy() and leafCopy()
>>
>> Hi Ogata,
>>
>> May I just add that volatile field ensured that MethodAccessor object
> was
>> correctly published. DelegatingMethodAccessortImpl is not safe to be
>> published via data race because it contains plain `delegate` field
>> initialized in the constructor. In addition, the object that is first
>> assigned to that field is NativeMethodAccessorImpl which has plain
>> `parent` field. You can get NPE when invoking the Method.invoke from
>> multuiple threads if Method.methodAccessor is not volatile.
>>
>> In addition, It would be nice to have two microbenchmarks exercising:
>> a) Method copy performance
>> b) Method invocation performance
>>
>> Regards, Peter
>>
>> P.S. When exploring the possibility of an alternative MethodAccessor
>> implementation (using MethodHandle(s)):
>>
>>
> http://cr.openjdk.java.net/~plevart/jdk-dev/6824466_MHReflectionAccessors/
>> webrev.00.2/
>>
>> ...I found out that it was possible to re-arrange the play between
>> DelegatingMethodAccessorImpl, NativeMethodAccessorImpl and generated
>> MethodAccessor in such a way that the DelegatingMethodAccessortImpl
>> becomes safe to be published via data race. This allowed for
>> Method.methodAccessor field to become plain field. In addition this
> field
>> can be made @Stable which further optimizes access to MethodAccessor
>> instance when Method instance can be constant-folded, which showed in
>> special microbenchmarks.
>>
>> So perhaps you could try to use above implementation (just changes to
>> DelegatingMethodAccessorImpl, NativeMethodAccessorImpl and part of
>> Reflection factory but without MH* stuff) and measure it against current
>> and your implementation (which as shown above has a data-race flaw).
>> On 10/8/19 12:23 PM, Kazunori Ogata wrote:
>> Hi all,
>>
>> I posted two changes and got reply that performance evaluation is
> needed.
>> I found that making Method.methodAccessor non-volatile (webrev.02) is
>> better than avoid copying methodAccessor value when it is null
>> (webrev.01), as shown below.
>>
>> So I'd like to ask review of the former change.  I updated weberv using
>> the latest code base (though there was no difference from webrev.02):
>>
>> Webrev: http://cr.openjdk.java.net/~ogatak/8229871/webrev.03/
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8229871
>>
>>
>> For this performance evaluation, I calculated 75 percentile of 9 runs of
>> SPECjbb2015 and 60 percentile of 50 runs of DaCapo to omit outliers.  I
>> bound a JVM to a NUMA node and set the number of GC threads to the same
> as
>> the number of physical cores.  These tuning reduced run-to-run
> fluctuation
>> on POWER (as usual...).
>>
>> SPECjbb2015:
>>    webrev.02:  critical jOPS +1.6%, max jOPS +0.2%
>>    webrev.01:  critical jOPS +0.4%, max jOPS +0.2%
>>
>>
>> For DaCapo, some benchmark still improved performance and some degraded,
>> but the geometric mean of all benchmarks were small:
>>    weberv.02: +0.3%
>>    weberv.01: +0.2%
>>
>> The difference of improvement/degradation between the two changes in
> each
>> benchmark were less than 0.8%.
>>
>> The range of improvement/degradation in each benchmark were:
>>    webrev.02: between +2.4% and -1.0%
>>    webrev.01: between +1.6% and -1.8%
>>
>>
>> So I think webrev.02 (i.e., making methodAccessor non-volatile) is a
> good
>> change, since it improved SPECjbb critical jOPS by 1.6%.
>>
>>
>> Regards,
>> Ogata
>>
>>
>> Kazunori Ogata/Japan/IBM wrote on 2019/08/27 15:41:39:
>>
>> From: Kazunori Ogata/Japan/IBM
>> To: Mandy Chung <mandy.chung at oracle.com>
>> Cc: core-libs-dev at openjdk.java.net
>> Date: 2019/08/27 15:41
>> Subject: Re: RFR: JDK-8229871: Improve performance of Method.copy() and
>> leafCopy()
>> Hi Mandy,
>>
>> Let me post interim results of the performance evaluation, though I'm
>> still measuring benchmarks and analyzing them.
>>
>> For SPECjbb2015, skipping storing null (webrev.01) was faster than
>> making
>> methodAccessor non-volatile (webrev.02).  The improvements of each of
>> the
>> patches in maxJOPS/criticalJOPS were 2.6%/3.9% and 1.8%/2.9%,
>> respectively.  This is only an average of six runs.
>>
>> For DaCapo, the results were mixed.  In some benchmark, both of the
>> changes degraded performance.  In some others, webrev.01 was better, but
>
>> weberv.02 was better in some others.
>>
>> I'll continue evaluation, but it is helpful if you could give me some
>> hints on why webrev.01 can be better than webrev.02 in SPECjbb2015.
>>
>> Regards,
>> Ogata
>>
>> Kazunori Ogata/Japan/IBM wrote on 2019/08/21 20:02:41:
>>
>> From: Kazunori Ogata/Japan/IBM
>> To: Mandy Chung <mandy.chung at oracle.com>
>> Cc: core-libs-dev at openjdk.java.net
>> Date: 2019/08/21 20:02
>> Subject: Re: RFR: JDK-8229871: Improve performance of Method.copy()
>> and leafCopy()
>> Hi Mandy,
>>
>> Thank you for reviewing the webrev.  I updated it to add a space after
>> "if" and also put four spaces for indentation (it was three).
>>
>> http://cr.openjdk.java.net/~ogatak/8229871/webrev.01/
>>
>> Thank you so much for checking the history of fieldAccessor.  I was
>> surprised that fieldAccessor was made non-volatile in JDK5, but
>> methodAccessor was left as volatile for 15 years after that...
>>
>> I agree we need benchmark data.  My simple micro benchmark that
>> repeats
>> invoking Class.getMethods() improved performance by 70% when it made
>> non-
>> volatile (as shown in the following webrev).  I'll try to run larger
>> benchmarks, such as SPECjbb2015, to see real impact.
>>
>> http://cr.openjdk.java.net/~ogatak/8229871/webrev.02/
>>
>> Regards,
>> Ogata
>>
>> Mandy Chung <mandy.chung at oracle.com> wrote on 2019/08/21 01:21:42:
>>
>> From: Mandy Chung <mandy.chung at oracle.com>
>> To: Kazunori Ogata <OGATAK at jp.ibm.com>
>> Cc: core-libs-dev at openjdk.java.net
>> Date: 2019/08/21 01:21
>> Subject: [EXTERNAL] Re: RFR: JDK-8229871: Imporve performance of
>> Method.copy() and leafCopy()
>>
>> Hi Ogata,
>>
>> The patch looks okay.  Nit: please add a space between if and (.
>>
>> About volatile methodAccessor field, I checked the history.  Both
>> fieldAccessor and methodAccessor were started as volatile and the
>> fieldAccessor declaration was updated due to JDK-5044412.   As you
>> observe, I think the methodAccessor field could be made
>> non-volatile.
>> OTOH that might impact when it's inflated to spin bytecode for this
>> method invocation.  I don't know how importance to keep its volatile
>> vs
>> non-volatile in practice without doing benchmarking/real application
>> testing.
>>
>> Mandy
>>
>> On 8/19/19 2:51 AM, Kazunori Ogata wrote:
>> Hi,
>>
>> May I have review for "JDK-8229871: Imporve performance of
>> Method.copy()
>> and leafCopy()"?
>>
>> Method.copy() and leafCopy() creates a copy of a Method object
>> with
>> sharing MethodAccessor object. Since the methodAccessor field is a
>> volatile variable, copying this field needs memory fence to ensure
>> the
>> field is visible to all threads on the weak memory platforms such
>> as POWER
>> and ARM.
>>
>> When the methodAccessor of the root object is null (i.e., not
>> initialized
>> yet), we do not need to copy the null value because this field of
>> the
>> copied object has been initialized to null in the constructor. We
>> can
>> reduce overhead of the memory fence only when the root's
>> methodAccessor is
>> non-null. This change improved performance by 5.8% using a micro
>> benchmark
>> that repeatedly invokes Class.getMethods().
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8229871
>>
>> Webrev: http://cr.openjdk.java.net/~ogatak/8229871/webrev.00/
>>
>>
>> By the way, why Method.methodAccessor is volatile, while
>> Field.fieldAccessor and Field.overrideFieldAccessor are not
>> volatile?  I
>> know the use of volatile reduces probability of creating
>> duplicated method
>> accessor, but the chance still exists.  I couldn't find the
>> difference
>> between Method and Field classes to make Method.methodAccessor
>> volatile.
>> If we can make it non-volatile, it is more preferable than a quick
>> hack
>> above.
>>
>>
>> Regards,
>> Ogata
>>
>>
>>
>



More information about the core-libs-dev mailing list