RFR: JDK-8229871: Improve performance of Method.copy() and leafCopy()

Kazunori Ogata OGATAK at jp.ibm.com
Wed Dec 11 11:17:33 UTC 2019


Hi,

I re-evaluated the performance of Peter's improvement [1] using the same 
base version.  Sorry for taking long time, but it took time to verify if 
results are reliable.

The Peter's version reduced elapsed time of a micro bench [2] that 
repeatedly calls Class.getMethods() by 30.3%.

However, this change did not affect so much on the scores of macro 
benchmarks, like SPECjbb2015 and DaCapo.

For SPECjbb2015, critical jOPS improved by 0.4%, but max jOPS degraded by 
0.1%.  (75 percentile of 20 runs)  The magnitude of the 
improvement/degradation was very small, in any way.

For DaCapo, performance differences range from +2.9% (tradebeans) to -2.6% 
(lusearch) in 60 percentile of 50 runs.  Since DaCapo scores tends to 
fluctuate from run to run, I think this result is in a range of run-to-run 
variation.


May I ask to push this change to the repository?  I guess these programs 
don't use extensively use reflective method invocation, but it does 
improve performance of Method.copy().  Or should I need to find a macro 
benchmark that extensively use reflective method invocation? 


[1] http://cr.openjdk.java.net/~plevart/jdk-dev/
8229871_Method.methodAccessor/webrev.01/
[2] http://cr.openjdk.java.net/~ogatak/8229871/GetMethodsBench.java


Regards,
Ogata


Kazunori Ogata/Japan/IBM wrote on 2019/10/31 01:46:45:

> From: Kazunori Ogata/Japan/IBM
> To: Peter Levart <peter.levart at gmail.com>
> Cc: core-libs-dev at openjdk.java.net
> Date: 2019/10/31 01:46
> Subject: Re: RFR: JDK-8229871: Improve performance of Method.copy() and 
leafCopy()
> 
> Hi Peter,
> 
> Thank you very much for your updated fix and sorry to be late to reply.
> 
> I found that the performance data I posted earlier was wrong because I 
> fetched the latest code before building the JVM with your fix, while I 
> still used older JVM as the base version.  The new build picked up 
> JDK-8230020 [1], which reverts JDK-8225670 [2] that degraded performance 

> of SPECjbb2015.  Unfortunately, the base version only included [2]...
> 
> Your new version [3] apparently looks better.  I'll update my base JVM 
and
> measure the performance of [3].
> 
> Regards,
> Ogata
> 
> [1] https://bugs.openjdk.java.net/browse/JDK-8230020
> [2] https://bugs.openjdk.java.net/browse/JDK-8225670
> [3] http://cr.openjdk.java.net/~plevart/jdk-dev/
> 8229871_Method.methodAccessor/webrev.01/
> 
> Peter Levart <peter.levart at gmail.com> wrote on 2019/10/24 06:09:54:
> 
> > From: Peter Levart <peter.levart at gmail.com>
> > To: Kazunori Ogata <OGATAK at jp.ibm.com>
> > Cc: core-libs-dev at openjdk.java.net
> > Date: 2019/10/24 06:10
> > Subject: [EXTERNAL] Re: RFR: JDK-8229871: Improve performance of 
> > Method.copy() and leafCopy()
> > 
> > Hi Ogata,
> > 
> > I finally managed to find some time to experiment with this. To 
measure 
> > invocation performance I created the following JMH benchmark [1]. It 
> > measures the invocation speed of instance and static methods using 
either:
> > - direct invocation (bytecodes)
> > - invocation via constant Method instance
> > - invocation via variable Method instance
> > 
> > Here are the results using unmodified JDK 14 build (baseline):
> > 
> > Benchmark                                         Mode  Cnt Score   
> > Error  Units
> > ReflectionSpeedBenchmark.instanceDirect           avgt   10   2.272 ± 

> > 0.002  ns/op
> > ReflectionSpeedBenchmark.instanceReflectiveConst  avgt   10  16.609 ± 

> > 0.162  ns/op
> > ReflectionSpeedBenchmark.instanceReflectiveVar    avgt   10  16.715 ± 

> > 0.163  ns/op
> > ReflectionSpeedBenchmark.staticDirect             avgt   10   2.275 ± 

> > 0.012  ns/op
> > ReflectionSpeedBenchmark.staticReflectiveConst    avgt   10  16.351 ± 

> > 0.330  ns/op
> > ReflectionSpeedBenchmark.staticReflectiveVar      avgt   10  16.259 ± 

> > 0.196  ns/op
> > 
> > Your webrev.04 [2] has a slight (~ 6%) improvement for constant Method 

> > instance (i.e. assigned to static final field):
> > 
> > Benchmark                                         Mode  Cnt Score   
> > Error  Units
> > ReflectionSpeedBenchmark.instanceDirect           avgt   10   2.273 ± 

> > 0.003  ns/op
> > ReflectionSpeedBenchmark.instanceReflectiveConst  avgt   10  15.628 ± 

> > 0.115  ns/op
> > ReflectionSpeedBenchmark.instanceReflectiveVar    avgt   10  16.706 ± 

> > 0.144  ns/op
> > ReflectionSpeedBenchmark.staticDirect             avgt   10   2.277 ± 

> > 0.008  ns/op
> > ReflectionSpeedBenchmark.staticReflectiveConst    avgt   10  15.285 ± 

> > 0.109  ns/op
> > ReflectionSpeedBenchmark.staticReflectiveVar      avgt   10  16.600 ± 

> > 0.222  ns/op
> > 
> > Now I have prepared another variant [3] that replaces 
> > DelegatingMethodAccessorImpl with SlowFastMethodAccessorImpl and 
> > produces the following result:
> > 
> > Benchmark                                         Mode  Cnt Score   
> > Error  Units
> > ReflectionSpeedBenchmark.instanceDirect           avgt   10   2.371 ± 

> > 0.027  ns/op
> > ReflectionSpeedBenchmark.instanceReflectiveConst  avgt   10   7.161 ± 

> > 0.066  ns/op
> > ReflectionSpeedBenchmark.instanceReflectiveVar    avgt   10  16.501 ± 

> > 0.154  ns/op
> > ReflectionSpeedBenchmark.staticDirect             avgt   10   2.373 ± 

> > 0.017  ns/op
> > ReflectionSpeedBenchmark.staticReflectiveConst    avgt   10   6.971 ± 

> > 0.103  ns/op
> > ReflectionSpeedBenchmark.staticReflectiveVar      avgt   10  15.893 ± 

> > 0.110  ns/op
> > 
> > This is more than twice as fast as the baseline for constant Method 
> > instances while not degrading performance for variable Method 
instances.
> > 
> > 
> > [1] http://cr.openjdk.java.net/~plevart/jdk-dev/
> 8229871_Method.methodAccessor/ReflectionSpeedBenchmark.java
> > [2] http://cr.openjdk.java.net/~ogatak/8229871/webrev.04/
> > [3] http://cr.openjdk.java.net/~plevart/jdk-dev/
> 8229871_Method.methodAccessor/webrev.01/ 
> > 
> > 
> > Could you spin this one [3] on your SPECjbb2015 benchmark to see if it 

> > still performs favorably?
> > 
> > 
> > Regards, Peter
> > 
> > On 10/11/19 12:17 PM, Kazunori Ogata wrote:
> > > Hi Peter,
> > >
> > > Thank you for the comment and suggestion of the fix.
> > >
> > > I tried to pick up your change w.r.t. methodAccessor:
> > > https://urldefense.proofpoint.com/v2/url?
> > u=https-3A__cr.openjdk.java.net_-7Eogatak_8229871_webrev.
> > 04_&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=p-
> > 
FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=kWMN3Fiqhqdlc9lMvgHDA1VViBz9r2Eb-
> > K9uCUrU_Yw&s=-xRlUE3M_VEQ_pLDsVNMsIneJ7tKig8ElUy8vmAQoUM&e= 
> > >
> > >
> > > Regarding micro benchmark, my original motivation of this change is 
to
> > > improve performance of Class.getMethods(), which calls Method.copy() 
for
> > > each declared method to create a copy of Method[].
> > >
> > > I measured my simple microbench:
> > > https://urldefense.proofpoint.com/v2/url?
> > 
> 
u=https-3A__cr.openjdk.java.net_-7Eogatak_8229871_GetMethodsBench.java&d=DwIDaQ&c=jf_iaSHvJObTbx-
> > siA1ZOg&r=p-
> > 
FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=kWMN3Fiqhqdlc9lMvgHDA1VViBz9r2Eb-
> > K9uCUrU_Yw&s=Phaibyh6EWjUKos14T7aQfBzSGcH4stxqnhQFkEZsp4&e= 
> > >
> > > Base code: Elapsed time = 4808 ms
> > > webrev.01: Elapsed time = 4536 ms  (+   6%)
> > > webrev.02: Elapsed time = 2331 ms  (+106%)
> > > webrev.04: Elapsed time = 3746 ms  (+ 28%)
> > >
> > > I'll measure larger benchmark and try to think if we can reduce the
> > > overhead of memory barrier.
> > >
> > >
> > > Regards,
> > > Ogata
> > >
> > >
> > > Peter Levart <peter.levart at gmail.com> wrote on 2019/10/09 16:44:13:
> > >
> > >> From: Peter Levart <peter.levart at gmail.com>
> > >> To: Kazunori Ogata <OGATAK at jp.ibm.com>, 
core-libs-dev at openjdk.java.net
> > >> Date: 2019/10/09 16:44
> > >> Subject: [EXTERNAL] Re: RFR: JDK-8229871: Improve performance of
> > >> Method.copy() and leafCopy()
> > >>
> > >> Hi Ogata,
> > >>
> > >> May I just add that volatile field ensured that MethodAccessor 
object
> > > was
> > >> correctly published. DelegatingMethodAccessortImpl is not safe to 
be
> > >> published via data race because it contains plain `delegate` field
> > >> initialized in the constructor. In addition, the object that is 
first
> > >> assigned to that field is NativeMethodAccessorImpl which has plain
> > >> `parent` field. You can get NPE when invoking the Method.invoke 
from
> > >> multuiple threads if Method.methodAccessor is not volatile.
> > >>
> > >> In addition, It would be nice to have two microbenchmarks 
exercising:
> > >> a) Method copy performance
> > >> b) Method invocation performance
> > >>
> > >> Regards, Peter
> > >>
> > >> P.S. When exploring the possibility of an alternative 
MethodAccessor
> > >> implementation (using MethodHandle(s)):
> > >>
> > >>
> > > https://urldefense.proofpoint.com/v2/url?
> > 
> 
u=http-3A__cr.openjdk.java.net_-7Eplevart_jdk-2Ddev_6824466-5FMHReflectionAccessors_&d=DwIDaQ&c=jf_iaSHvJObTbx-
> > siA1ZOg&r=p-
> > 
FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=kWMN3Fiqhqdlc9lMvgHDA1VViBz9r2Eb-
> > K9uCUrU_Yw&s=LKR_2z3fvXB0IYUryijzgd-jH6wG3Mr2UmiOMKviFGU&e= 
> > >> webrev.00.2/
> > >>
> > >> ...I found out that it was possible to re-arrange the play between
> > >> DelegatingMethodAccessorImpl, NativeMethodAccessorImpl and 
generated
> > >> MethodAccessor in such a way that the DelegatingMethodAccessortImpl
> > >> becomes safe to be published via data race. This allowed for
> > >> Method.methodAccessor field to become plain field. In addition this
> > > field
> > >> can be made @Stable which further optimizes access to 
MethodAccessor
> > >> instance when Method instance can be constant-folded, which showed 
in
> > >> special microbenchmarks.
> > >>
> > >> So perhaps you could try to use above implementation (just changes 
to
> > >> DelegatingMethodAccessorImpl, NativeMethodAccessorImpl and part of
> > >> Reflection factory but without MH* stuff) and measure it against 
current
> > >> and your implementation (which as shown above has a data-race 
flaw).
> > >> On 10/8/19 12:23 PM, Kazunori Ogata wrote:
> > >> Hi all,
> > >>
> > >> I posted two changes and got reply that performance evaluation is
> > > needed.
> > >> I found that making Method.methodAccessor non-volatile (webrev.02) 
is
> > >> better than avoid copying methodAccessor value when it is null
> > >> (webrev.01), as shown below.
> > >>
> > >> So I'd like to ask review of the former change.  I updated weberv 
using
> > >> the latest code base (though there was no difference from 
webrev.02):
> > >>
> > >> Webrev: https://urldefense.proofpoint.com/v2/url?
> > u=http-3A__cr.openjdk.java.net_-7Eogatak_8229871_webrev.
> > 03_&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=p-
> > 
FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=kWMN3Fiqhqdlc9lMvgHDA1VViBz9r2Eb-
> > K9uCUrU_Yw&s=xm_iw74CmqAabV2cctZfI75t28_DCXP9VFVjHcnQXp4&e= 
> > >>
> > >> Bug: https://urldefense.proofpoint.com/v2/url?
> > 
u=https-3A__bugs.openjdk.java.net_browse_JDK-2D8229871&d=DwIDaQ&c=jf_iaSHvJObTbx-
> > siA1ZOg&r=p-
> > 
FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=kWMN3Fiqhqdlc9lMvgHDA1VViBz9r2Eb-
> > K9uCUrU_Yw&s=bzRkFq845mYFriH7TirkzA4JzG0m47x09kebpHfMgTw&e= 
> > >>
> > >>
> > >> For this performance evaluation, I calculated 75 percentile of 9 
runs of
> > >> SPECjbb2015 and 60 percentile of 50 runs of DaCapo to omit 
outliers.  I
> > >> bound a JVM to a NUMA node and set the number of GC threads to the 
same
> > > as
> > >> the number of physical cores.  These tuning reduced run-to-run
> > > fluctuation
> > >> on POWER (as usual...).
> > >>
> > >> SPECjbb2015:
> > >>    webrev.02:  critical jOPS +1.6%, max jOPS +0.2%
> > >>    webrev.01:  critical jOPS +0.4%, max jOPS +0.2%
> > >>
> > >>
> > >> For DaCapo, some benchmark still improved performance and some 
degraded,
> > >> but the geometric mean of all benchmarks were small:
> > >>    weberv.02: +0.3%
> > >>    weberv.01: +0.2%
> > >>
> > >> The difference of improvement/degradation between the two changes 
in
> > > each
> > >> benchmark were less than 0.8%.
> > >>
> > >> The range of improvement/degradation in each benchmark were:
> > >>    webrev.02: between +2.4% and -1.0%
> > >>    webrev.01: between +1.6% and -1.8%
> > >>
> > >>
> > >> So I think webrev.02 (i.e., making methodAccessor non-volatile) is 
a
> > > good
> > >> change, since it improved SPECjbb critical jOPS by 1.6%.
> > >>
> > >>
> > >> Regards,
> > >> Ogata
> > >>
> > >>
> > >> Kazunori Ogata/Japan/IBM wrote on 2019/08/27 15:41:39:
> > >>
> > >> From: Kazunori Ogata/Japan/IBM
> > >> To: Mandy Chung <mandy.chung at oracle.com>
> > >> Cc: core-libs-dev at openjdk.java.net
> > >> Date: 2019/08/27 15:41
> > >> Subject: Re: RFR: JDK-8229871: Improve performance of Method.copy() 
and
> > >> leafCopy()
> > >> Hi Mandy,
> > >>
> > >> Let me post interim results of the performance evaluation, though 
I'm
> > >> still measuring benchmarks and analyzing them.
> > >>
> > >> For SPECjbb2015, skipping storing null (webrev.01) was faster than
> > >> making
> > >> methodAccessor non-volatile (webrev.02).  The improvements of each 
of
> > >> the
> > >> patches in maxJOPS/criticalJOPS were 2.6%/3.9% and 1.8%/2.9%,
> > >> respectively.  This is only an average of six runs.
> > >>
> > >> For DaCapo, the results were mixed.  In some benchmark, both of the
> > >> changes degraded performance.  In some others, webrev.01 was 
better, but
> > >
> > >> weberv.02 was better in some others.
> > >>
> > >> I'll continue evaluation, but it is helpful if you could give me 
some
> > >> hints on why webrev.01 can be better than webrev.02 in SPECjbb2015.
> > >>
> > >> Regards,
> > >> Ogata
> > >>
> > >> Kazunori Ogata/Japan/IBM wrote on 2019/08/21 20:02:41:
> > >>
> > >> From: Kazunori Ogata/Japan/IBM
> > >> To: Mandy Chung <mandy.chung at oracle.com>
> > >> Cc: core-libs-dev at openjdk.java.net
> > >> Date: 2019/08/21 20:02
> > >> Subject: Re: RFR: JDK-8229871: Improve performance of Method.copy()
> > >> and leafCopy()
> > >> Hi Mandy,
> > >>
> > >> Thank you for reviewing the webrev.  I updated it to add a space 
after
> > >> "if" and also put four spaces for indentation (it was three).
> > >>
> > >> https://urldefense.proofpoint.com/v2/url?
> > u=http-3A__cr.openjdk.java.net_-7Eogatak_8229871_webrev.
> > 01_&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=p-
> > 
FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=kWMN3Fiqhqdlc9lMvgHDA1VViBz9r2Eb-
> > K9uCUrU_Yw&s=qLT9k5xsheWZfU7ocimSbEMANQDnelEUqqiR5X-Zio4&e= 
> > >>
> > >> Thank you so much for checking the history of fieldAccessor.  I was
> > >> surprised that fieldAccessor was made non-volatile in JDK5, but
> > >> methodAccessor was left as volatile for 15 years after that...
> > >>
> > >> I agree we need benchmark data.  My simple micro benchmark that
> > >> repeats
> > >> invoking Class.getMethods() improved performance by 70% when it 
made
> > >> non-
> > >> volatile (as shown in the following webrev).  I'll try to run 
larger
> > >> benchmarks, such as SPECjbb2015, to see real impact.
> > >>
> > >> https://urldefense.proofpoint.com/v2/url?
> > u=http-3A__cr.openjdk.java.net_-7Eogatak_8229871_webrev.
> > 02_&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=p-
> > 
FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=kWMN3Fiqhqdlc9lMvgHDA1VViBz9r2Eb-
> > K9uCUrU_Yw&s=aq50ONJW0fK7CBk1upVkJekAbRrDsZygPkWrjL_sM4I&e= 
> > >>
> > >> Regards,
> > >> Ogata
> > >>
> > >> Mandy Chung <mandy.chung at oracle.com> wrote on 2019/08/21 01:21:42:
> > >>
> > >> From: Mandy Chung <mandy.chung at oracle.com>
> > >> To: Kazunori Ogata <OGATAK at jp.ibm.com>
> > >> Cc: core-libs-dev at openjdk.java.net
> > >> Date: 2019/08/21 01:21
> > >> Subject: [EXTERNAL] Re: RFR: JDK-8229871: Imporve performance of
> > >> Method.copy() and leafCopy()
> > >>
> > >> Hi Ogata,
> > >>
> > >> The patch looks okay.  Nit: please add a space between if and (.
> > >>
> > >> About volatile methodAccessor field, I checked the history.  Both
> > >> fieldAccessor and methodAccessor were started as volatile and the
> > >> fieldAccessor declaration was updated due to JDK-5044412.   As you
> > >> observe, I think the methodAccessor field could be made
> > >> non-volatile.
> > >> OTOH that might impact when it's inflated to spin bytecode for this
> > >> method invocation.  I don't know how importance to keep its 
volatile
> > >> vs
> > >> non-volatile in practice without doing benchmarking/real 
application
> > >> testing.
> > >>
> > >> Mandy
> > >>
> > >> On 8/19/19 2:51 AM, Kazunori Ogata wrote:
> > >> Hi,
> > >>
> > >> May I have review for "JDK-8229871: Imporve performance of
> > >> Method.copy()
> > >> and leafCopy()"?
> > >>
> > >> Method.copy() and leafCopy() creates a copy of a Method object
> > >> with
> > >> sharing MethodAccessor object. Since the methodAccessor field is a
> > >> volatile variable, copying this field needs memory fence to ensure
> > >> the
> > >> field is visible to all threads on the weak memory platforms such
> > >> as POWER
> > >> and ARM.
> > >>
> > >> When the methodAccessor of the root object is null (i.e., not
> > >> initialized
> > >> yet), we do not need to copy the null value because this field of
> > >> the
> > >> copied object has been initialized to null in the constructor. We
> > >> can
> > >> reduce overhead of the memory fence only when the root's
> > >> methodAccessor is
> > >> non-null. This change improved performance by 5.8% using a micro
> > >> benchmark
> > >> that repeatedly invokes Class.getMethods().
> > >>
> > >> Bug: https://urldefense.proofpoint.com/v2/url?
> > 
u=https-3A__bugs.openjdk.java.net_browse_JDK-2D8229871&d=DwIDaQ&c=jf_iaSHvJObTbx-
> > siA1ZOg&r=p-
> > 
FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=kWMN3Fiqhqdlc9lMvgHDA1VViBz9r2Eb-
> > K9uCUrU_Yw&s=bzRkFq845mYFriH7TirkzA4JzG0m47x09kebpHfMgTw&e= 
> > >>
> > >> Webrev: https://urldefense.proofpoint.com/v2/url?
> > u=http-3A__cr.openjdk.java.net_-7Eogatak_8229871_webrev.
> > 00_&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=p-
> > 
FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=kWMN3Fiqhqdlc9lMvgHDA1VViBz9r2Eb-
> > K9uCUrU_Yw&s=lGQy-Xy0ofp8d551jCUZdwmZ_OD4sXsMaoRKWzwer4o&e= 
> > >>
> > >>
> > >> By the way, why Method.methodAccessor is volatile, while
> > >> Field.fieldAccessor and Field.overrideFieldAccessor are not
> > >> volatile?  I
> > >> know the use of volatile reduces probability of creating
> > >> duplicated method
> > >> accessor, but the chance still exists.  I couldn't find the
> > >> difference
> > >> between Method and Field classes to make Method.methodAccessor
> > >> volatile.
> > >> If we can make it non-volatile, it is more preferable than a quick
> > >> hack
> > >> above.
> > >>
> > >>
> > >> Regards,
> > >> Ogata
> > >>
> > >>
> > >>
> > >
> > 
> > 




More information about the core-libs-dev mailing list