RFR: JDK-8229871: Improve performance of Method.copy() and leafCopy()
Kazunori Ogata
OGATAK at jp.ibm.com
Wed Oct 30 16:46:44 UTC 2019
Hi Peter,
Thank you very much for your updated fix and sorry to be late to reply.
I found that the performance data I posted earlier was wrong because I
fetched the latest code before building the JVM with your fix, while I
still used older JVM as the base version. The new build picked up
JDK-8230020 [1], which reverts JDK-8225670 [2] that degraded performance
of SPECjbb2015. Unfortunately, the base version only included [2]...
Your new version [3] apparently looks better. I'll update my base JVM and
measure the performance of [3].
Regards,
Ogata
[1] https://bugs.openjdk.java.net/browse/JDK-8230020
[2] https://bugs.openjdk.java.net/browse/JDK-8225670
[3]
http://cr.openjdk.java.net/~plevart/jdk-dev/8229871_Method.methodAccessor/webrev.01/
Peter Levart <peter.levart at gmail.com> wrote on 2019/10/24 06:09:54:
> From: Peter Levart <peter.levart at gmail.com>
> To: Kazunori Ogata <OGATAK at jp.ibm.com>
> Cc: core-libs-dev at openjdk.java.net
> Date: 2019/10/24 06:10
> Subject: [EXTERNAL] Re: RFR: JDK-8229871: Improve performance of
> Method.copy() and leafCopy()
>
> Hi Ogata,
>
> I finally managed to find some time to experiment with this. To measure
> invocation performance I created the following JMH benchmark [1]. It
> measures the invocation speed of instance and static methods using
either:
> - direct invocation (bytecodes)
> - invocation via constant Method instance
> - invocation via variable Method instance
>
> Here are the results using unmodified JDK 14 build (baseline):
>
> Benchmark Mode Cnt Score
> Error Units
> ReflectionSpeedBenchmark.instanceDirect avgt 10 2.272 ±
> 0.002 ns/op
> ReflectionSpeedBenchmark.instanceReflectiveConst avgt 10 16.609 ±
> 0.162 ns/op
> ReflectionSpeedBenchmark.instanceReflectiveVar avgt 10 16.715 ±
> 0.163 ns/op
> ReflectionSpeedBenchmark.staticDirect avgt 10 2.275 ±
> 0.012 ns/op
> ReflectionSpeedBenchmark.staticReflectiveConst avgt 10 16.351 ±
> 0.330 ns/op
> ReflectionSpeedBenchmark.staticReflectiveVar avgt 10 16.259 ±
> 0.196 ns/op
>
> Your webrev.04 [2] has a slight (~ 6%) improvement for constant Method
> instance (i.e. assigned to static final field):
>
> Benchmark Mode Cnt Score
> Error Units
> ReflectionSpeedBenchmark.instanceDirect avgt 10 2.273 ±
> 0.003 ns/op
> ReflectionSpeedBenchmark.instanceReflectiveConst avgt 10 15.628 ±
> 0.115 ns/op
> ReflectionSpeedBenchmark.instanceReflectiveVar avgt 10 16.706 ±
> 0.144 ns/op
> ReflectionSpeedBenchmark.staticDirect avgt 10 2.277 ±
> 0.008 ns/op
> ReflectionSpeedBenchmark.staticReflectiveConst avgt 10 15.285 ±
> 0.109 ns/op
> ReflectionSpeedBenchmark.staticReflectiveVar avgt 10 16.600 ±
> 0.222 ns/op
>
> Now I have prepared another variant [3] that replaces
> DelegatingMethodAccessorImpl with SlowFastMethodAccessorImpl and
> produces the following result:
>
> Benchmark Mode Cnt Score
> Error Units
> ReflectionSpeedBenchmark.instanceDirect avgt 10 2.371 ±
> 0.027 ns/op
> ReflectionSpeedBenchmark.instanceReflectiveConst avgt 10 7.161 ±
> 0.066 ns/op
> ReflectionSpeedBenchmark.instanceReflectiveVar avgt 10 16.501 ±
> 0.154 ns/op
> ReflectionSpeedBenchmark.staticDirect avgt 10 2.373 ±
> 0.017 ns/op
> ReflectionSpeedBenchmark.staticReflectiveConst avgt 10 6.971 ±
> 0.103 ns/op
> ReflectionSpeedBenchmark.staticReflectiveVar avgt 10 15.893 ±
> 0.110 ns/op
>
> This is more than twice as fast as the baseline for constant Method
> instances while not degrading performance for variable Method instances.
>
>
> [1]
http://cr.openjdk.java.net/~plevart/jdk-dev/8229871_Method.methodAccessor/ReflectionSpeedBenchmark.java
> [2] http://cr.openjdk.java.net/~ogatak/8229871/webrev.04/
> [3]
http://cr.openjdk.java.net/~plevart/jdk-dev/8229871_Method.methodAccessor/webrev.01/
>
>
> Could you spin this one [3] on your SPECjbb2015 benchmark to see if it
> still performs favorably?
>
>
> Regards, Peter
>
> On 10/11/19 12:17 PM, Kazunori Ogata wrote:
> > Hi Peter,
> >
> > Thank you for the comment and suggestion of the fix.
> >
> > I tried to pick up your change w.r.t. methodAccessor:
> > https://urldefense.proofpoint.com/v2/url?
> u=https-3A__cr.openjdk.java.net_-7Eogatak_8229871_webrev.
> 04_&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=p-
>
FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=kWMN3Fiqhqdlc9lMvgHDA1VViBz9r2Eb-
> K9uCUrU_Yw&s=-xRlUE3M_VEQ_pLDsVNMsIneJ7tKig8ElUy8vmAQoUM&e=
> >
> >
> > Regarding micro benchmark, my original motivation of this change is to
> > improve performance of Class.getMethods(), which calls Method.copy()
for
> > each declared method to create a copy of Method[].
> >
> > I measured my simple microbench:
> > https://urldefense.proofpoint.com/v2/url?
>
u=https-3A__cr.openjdk.java.net_-7Eogatak_8229871_GetMethodsBench.java&d=DwIDaQ&c=jf_iaSHvJObTbx-
> siA1ZOg&r=p-
>
FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=kWMN3Fiqhqdlc9lMvgHDA1VViBz9r2Eb-
> K9uCUrU_Yw&s=Phaibyh6EWjUKos14T7aQfBzSGcH4stxqnhQFkEZsp4&e=
> >
> > Base code: Elapsed time = 4808 ms
> > webrev.01: Elapsed time = 4536 ms (+ 6%)
> > webrev.02: Elapsed time = 2331 ms (+106%)
> > webrev.04: Elapsed time = 3746 ms (+ 28%)
> >
> > I'll measure larger benchmark and try to think if we can reduce the
> > overhead of memory barrier.
> >
> >
> > Regards,
> > Ogata
> >
> >
> > Peter Levart <peter.levart at gmail.com> wrote on 2019/10/09 16:44:13:
> >
> >> From: Peter Levart <peter.levart at gmail.com>
> >> To: Kazunori Ogata <OGATAK at jp.ibm.com>,
core-libs-dev at openjdk.java.net
> >> Date: 2019/10/09 16:44
> >> Subject: [EXTERNAL] Re: RFR: JDK-8229871: Improve performance of
> >> Method.copy() and leafCopy()
> >>
> >> Hi Ogata,
> >>
> >> May I just add that volatile field ensured that MethodAccessor object
> > was
> >> correctly published. DelegatingMethodAccessortImpl is not safe to be
> >> published via data race because it contains plain `delegate` field
> >> initialized in the constructor. In addition, the object that is first
> >> assigned to that field is NativeMethodAccessorImpl which has plain
> >> `parent` field. You can get NPE when invoking the Method.invoke from
> >> multuiple threads if Method.methodAccessor is not volatile.
> >>
> >> In addition, It would be nice to have two microbenchmarks exercising:
> >> a) Method copy performance
> >> b) Method invocation performance
> >>
> >> Regards, Peter
> >>
> >> P.S. When exploring the possibility of an alternative MethodAccessor
> >> implementation (using MethodHandle(s)):
> >>
> >>
> > https://urldefense.proofpoint.com/v2/url?
>
u=http-3A__cr.openjdk.java.net_-7Eplevart_jdk-2Ddev_6824466-5FMHReflectionAccessors_&d=DwIDaQ&c=jf_iaSHvJObTbx-
> siA1ZOg&r=p-
>
FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=kWMN3Fiqhqdlc9lMvgHDA1VViBz9r2Eb-
> K9uCUrU_Yw&s=LKR_2z3fvXB0IYUryijzgd-jH6wG3Mr2UmiOMKviFGU&e=
> >> webrev.00.2/
> >>
> >> ...I found out that it was possible to re-arrange the play between
> >> DelegatingMethodAccessorImpl, NativeMethodAccessorImpl and generated
> >> MethodAccessor in such a way that the DelegatingMethodAccessortImpl
> >> becomes safe to be published via data race. This allowed for
> >> Method.methodAccessor field to become plain field. In addition this
> > field
> >> can be made @Stable which further optimizes access to MethodAccessor
> >> instance when Method instance can be constant-folded, which showed in
> >> special microbenchmarks.
> >>
> >> So perhaps you could try to use above implementation (just changes to
> >> DelegatingMethodAccessorImpl, NativeMethodAccessorImpl and part of
> >> Reflection factory but without MH* stuff) and measure it against
current
> >> and your implementation (which as shown above has a data-race flaw).
> >> On 10/8/19 12:23 PM, Kazunori Ogata wrote:
> >> Hi all,
> >>
> >> I posted two changes and got reply that performance evaluation is
> > needed.
> >> I found that making Method.methodAccessor non-volatile (webrev.02) is
> >> better than avoid copying methodAccessor value when it is null
> >> (webrev.01), as shown below.
> >>
> >> So I'd like to ask review of the former change. I updated weberv
using
> >> the latest code base (though there was no difference from webrev.02):
> >>
> >> Webrev: https://urldefense.proofpoint.com/v2/url?
> u=http-3A__cr.openjdk.java.net_-7Eogatak_8229871_webrev.
> 03_&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=p-
>
FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=kWMN3Fiqhqdlc9lMvgHDA1VViBz9r2Eb-
> K9uCUrU_Yw&s=xm_iw74CmqAabV2cctZfI75t28_DCXP9VFVjHcnQXp4&e=
> >>
> >> Bug: https://urldefense.proofpoint.com/v2/url?
>
u=https-3A__bugs.openjdk.java.net_browse_JDK-2D8229871&d=DwIDaQ&c=jf_iaSHvJObTbx-
> siA1ZOg&r=p-
>
FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=kWMN3Fiqhqdlc9lMvgHDA1VViBz9r2Eb-
> K9uCUrU_Yw&s=bzRkFq845mYFriH7TirkzA4JzG0m47x09kebpHfMgTw&e=
> >>
> >>
> >> For this performance evaluation, I calculated 75 percentile of 9 runs
of
> >> SPECjbb2015 and 60 percentile of 50 runs of DaCapo to omit outliers.
I
> >> bound a JVM to a NUMA node and set the number of GC threads to the
same
> > as
> >> the number of physical cores. These tuning reduced run-to-run
> > fluctuation
> >> on POWER (as usual...).
> >>
> >> SPECjbb2015:
> >> webrev.02: critical jOPS +1.6%, max jOPS +0.2%
> >> webrev.01: critical jOPS +0.4%, max jOPS +0.2%
> >>
> >>
> >> For DaCapo, some benchmark still improved performance and some
degraded,
> >> but the geometric mean of all benchmarks were small:
> >> weberv.02: +0.3%
> >> weberv.01: +0.2%
> >>
> >> The difference of improvement/degradation between the two changes in
> > each
> >> benchmark were less than 0.8%.
> >>
> >> The range of improvement/degradation in each benchmark were:
> >> webrev.02: between +2.4% and -1.0%
> >> webrev.01: between +1.6% and -1.8%
> >>
> >>
> >> So I think webrev.02 (i.e., making methodAccessor non-volatile) is a
> > good
> >> change, since it improved SPECjbb critical jOPS by 1.6%.
> >>
> >>
> >> Regards,
> >> Ogata
> >>
> >>
> >> Kazunori Ogata/Japan/IBM wrote on 2019/08/27 15:41:39:
> >>
> >> From: Kazunori Ogata/Japan/IBM
> >> To: Mandy Chung <mandy.chung at oracle.com>
> >> Cc: core-libs-dev at openjdk.java.net
> >> Date: 2019/08/27 15:41
> >> Subject: Re: RFR: JDK-8229871: Improve performance of Method.copy()
and
> >> leafCopy()
> >> Hi Mandy,
> >>
> >> Let me post interim results of the performance evaluation, though I'm
> >> still measuring benchmarks and analyzing them.
> >>
> >> For SPECjbb2015, skipping storing null (webrev.01) was faster than
> >> making
> >> methodAccessor non-volatile (webrev.02). The improvements of each of
> >> the
> >> patches in maxJOPS/criticalJOPS were 2.6%/3.9% and 1.8%/2.9%,
> >> respectively. This is only an average of six runs.
> >>
> >> For DaCapo, the results were mixed. In some benchmark, both of the
> >> changes degraded performance. In some others, webrev.01 was better,
but
> >
> >> weberv.02 was better in some others.
> >>
> >> I'll continue evaluation, but it is helpful if you could give me some
> >> hints on why webrev.01 can be better than webrev.02 in SPECjbb2015.
> >>
> >> Regards,
> >> Ogata
> >>
> >> Kazunori Ogata/Japan/IBM wrote on 2019/08/21 20:02:41:
> >>
> >> From: Kazunori Ogata/Japan/IBM
> >> To: Mandy Chung <mandy.chung at oracle.com>
> >> Cc: core-libs-dev at openjdk.java.net
> >> Date: 2019/08/21 20:02
> >> Subject: Re: RFR: JDK-8229871: Improve performance of Method.copy()
> >> and leafCopy()
> >> Hi Mandy,
> >>
> >> Thank you for reviewing the webrev. I updated it to add a space
after
> >> "if" and also put four spaces for indentation (it was three).
> >>
> >> https://urldefense.proofpoint.com/v2/url?
> u=http-3A__cr.openjdk.java.net_-7Eogatak_8229871_webrev.
> 01_&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=p-
>
FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=kWMN3Fiqhqdlc9lMvgHDA1VViBz9r2Eb-
> K9uCUrU_Yw&s=qLT9k5xsheWZfU7ocimSbEMANQDnelEUqqiR5X-Zio4&e=
> >>
> >> Thank you so much for checking the history of fieldAccessor. I was
> >> surprised that fieldAccessor was made non-volatile in JDK5, but
> >> methodAccessor was left as volatile for 15 years after that...
> >>
> >> I agree we need benchmark data. My simple micro benchmark that
> >> repeats
> >> invoking Class.getMethods() improved performance by 70% when it made
> >> non-
> >> volatile (as shown in the following webrev). I'll try to run larger
> >> benchmarks, such as SPECjbb2015, to see real impact.
> >>
> >> https://urldefense.proofpoint.com/v2/url?
> u=http-3A__cr.openjdk.java.net_-7Eogatak_8229871_webrev.
> 02_&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=p-
>
FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=kWMN3Fiqhqdlc9lMvgHDA1VViBz9r2Eb-
> K9uCUrU_Yw&s=aq50ONJW0fK7CBk1upVkJekAbRrDsZygPkWrjL_sM4I&e=
> >>
> >> Regards,
> >> Ogata
> >>
> >> Mandy Chung <mandy.chung at oracle.com> wrote on 2019/08/21 01:21:42:
> >>
> >> From: Mandy Chung <mandy.chung at oracle.com>
> >> To: Kazunori Ogata <OGATAK at jp.ibm.com>
> >> Cc: core-libs-dev at openjdk.java.net
> >> Date: 2019/08/21 01:21
> >> Subject: [EXTERNAL] Re: RFR: JDK-8229871: Imporve performance of
> >> Method.copy() and leafCopy()
> >>
> >> Hi Ogata,
> >>
> >> The patch looks okay. Nit: please add a space between if and (.
> >>
> >> About volatile methodAccessor field, I checked the history. Both
> >> fieldAccessor and methodAccessor were started as volatile and the
> >> fieldAccessor declaration was updated due to JDK-5044412. As you
> >> observe, I think the methodAccessor field could be made
> >> non-volatile.
> >> OTOH that might impact when it's inflated to spin bytecode for this
> >> method invocation. I don't know how importance to keep its volatile
> >> vs
> >> non-volatile in practice without doing benchmarking/real application
> >> testing.
> >>
> >> Mandy
> >>
> >> On 8/19/19 2:51 AM, Kazunori Ogata wrote:
> >> Hi,
> >>
> >> May I have review for "JDK-8229871: Imporve performance of
> >> Method.copy()
> >> and leafCopy()"?
> >>
> >> Method.copy() and leafCopy() creates a copy of a Method object
> >> with
> >> sharing MethodAccessor object. Since the methodAccessor field is a
> >> volatile variable, copying this field needs memory fence to ensure
> >> the
> >> field is visible to all threads on the weak memory platforms such
> >> as POWER
> >> and ARM.
> >>
> >> When the methodAccessor of the root object is null (i.e., not
> >> initialized
> >> yet), we do not need to copy the null value because this field of
> >> the
> >> copied object has been initialized to null in the constructor. We
> >> can
> >> reduce overhead of the memory fence only when the root's
> >> methodAccessor is
> >> non-null. This change improved performance by 5.8% using a micro
> >> benchmark
> >> that repeatedly invokes Class.getMethods().
> >>
> >> Bug: https://urldefense.proofpoint.com/v2/url?
>
u=https-3A__bugs.openjdk.java.net_browse_JDK-2D8229871&d=DwIDaQ&c=jf_iaSHvJObTbx-
> siA1ZOg&r=p-
>
FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=kWMN3Fiqhqdlc9lMvgHDA1VViBz9r2Eb-
> K9uCUrU_Yw&s=bzRkFq845mYFriH7TirkzA4JzG0m47x09kebpHfMgTw&e=
> >>
> >> Webrev: https://urldefense.proofpoint.com/v2/url?
> u=http-3A__cr.openjdk.java.net_-7Eogatak_8229871_webrev.
> 00_&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=p-
>
FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=kWMN3Fiqhqdlc9lMvgHDA1VViBz9r2Eb-
> K9uCUrU_Yw&s=lGQy-Xy0ofp8d551jCUZdwmZ_OD4sXsMaoRKWzwer4o&e=
> >>
> >>
> >> By the way, why Method.methodAccessor is volatile, while
> >> Field.fieldAccessor and Field.overrideFieldAccessor are not
> >> volatile? I
> >> know the use of volatile reduces probability of creating
> >> duplicated method
> >> accessor, but the chance still exists. I couldn't find the
> >> difference
> >> between Method and Field classes to make Method.methodAccessor
> >> volatile.
> >> If we can make it non-volatile, it is more preferable than a quick
> >> hack
> >> above.
> >>
> >>
> >> Regards,
> >> Ogata
> >>
> >>
> >>
> >
>
>
More information about the core-libs-dev
mailing list