[PATCH] 8217561 : X86: Add floating-point Math.min/max intrinsics, approval request
Bhateja, Jatin
jatin.bhateja at intel.com
Mon Feb 18 02:17:50 UTC 2019
Hi Bernard,
Thanks for your comments; please find my responses inline in the mail below.
Regards,
Jatin
> -----Original Message-----
> From: B. Blaser [mailto:bsrbnd at gmail.com]
> Sent: Sunday, February 17, 2019 9:06 PM
> To: Bhateja, Jatin <jatin.bhateja at intel.com>
> Cc: Vladimir Kozlov <vladimir.kozlov at oracle.com>; Nils Eliasson
> <nils.eliasson at oracle.com>; Viswanathan, Sandhya
> <sandhya.viswanathan at intel.com>; hotspot-compiler-
> dev at openjdk.java.net; eric.caspole at oracle.com
> Subject: Re: [PATCH] 8217561 : X86: Add floating-point Math.min/max
> intrinsics, approval request
>
> Hi Jatin, Vladimir, All,
>
> On Sun, 17 Feb 2019 at 10:02, Bhateja, Jatin <jatin.bhateja at intel.com>
> wrote:
> >
> > Hi Vladimir,
> >
> > I have added a benchmark over the revision shared by Bernard
> >
> > Following is the link to updated patch.
> >
> > http://cr.openjdk.java.net/~sviswanathan/Jatin/8217561/webrev.04/
> >
> > Regards,
> > Jatin
> >
> > > -----Original Message-----
> > > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> > > Sent: Saturday, February 16, 2019 7:19 AM
> > >
> > > Eric C. asked to add a JMH microbenchmark to show the performance of the
> > > new code. You may have missed his e-mail:
> > >
> > > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-February/032830.html
> > >
> > > We need to see the benefits from these changes and verify them before
> > > pushing them.
> > >
> > > Thanks,
> > > Vladimir
>
> Thanks for your benchmark, Jatin, which is a good start.
>
> Unfortunately, it's biased because the loops are far too predictable
I agree. That is why, to counter the predictability, special values (signed zeros, NaNs) were interleaved between the regular floating-point values, since the original implementation exits early in such scenarios.
> and the modulo operations are very expensive compared to min/max.
I agree that modulo is more expensive than min/max, but it is used only during the setup phase.
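As an illustration of the interleaving idea mentioned above, a data setup along these lines would mix NaNs and signed zeros among the random values so that a branch-based min/max cannot settle into a predictable early-exit path. This is a hypothetical sketch, not the code from the patch; the class and method names are made up:

```java
import java.util.Random;

public class SpecialValueMix {
    // Hypothetical sketch: every group of four slots gets NaN, -0.0 and +0.0
    // alongside a random value, so the special-value paths of Math.min/max
    // are exercised continuously rather than on a rare, well-predicted branch.
    static double[] mixedDoubles(int count, long seed) {
        Random r = new Random(seed);
        double[] a = new double[count];
        for (int i = 0; i < count; i++) {
            switch (i % 4) {
                case 0:  a[i] = Double.NaN; break;
                case 1:  a[i] = -0.0;       break;
                case 2:  a[i] = +0.0;       break;
                default: a[i] = r.nextDouble();
            }
        }
        return a;
    }

    public static void main(String[] args) {
        double[] a = mixedDoubles(8, 42L);
        System.out.println(Double.isNaN(a[0]));                   // NaN slot
        System.out.println(Double.doubleToRawLongBits(a[1]) < 0); // sign bit of -0.0
    }
}
```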
> Therefore, the results don't seem to be really representative.
>
> The legal header also looks unusual, and please use 4-space indentation
> for Java files.
>
> I wrote another benchmark, here under and also attached, to be added to
> webrev.04.
>
> Here are the results on x86_64 (xeon), before:
>
> $ make test TEST="micro:vm.compiler.FpMinMaxIntrinsics" \
>     MICRO="VM_OPTIONS='-XX:CompileCommand=print,org/openjdk/bench/vm/compiler/FpMinMaxIntrinsics.*Bench* -XX:-InlineMathNatives';FORK=1;WARMUP_ITER=1;ITER=3"
>
> 089 movsd XMM0, [R10 + #16 + RAX << #3] # double
> [...]
> 099 movsd XMM1, [R10 + #16 + R8 << #3] # double
> [...]
> 0c0 ucomisd XMM0, XMM1 test
>
>
> Benchmark Mode Cnt Score Error Units
> FpMinMaxIntrinsics.dMax avgt 3 8786.771 ± 1853.250 ns/op
> FpMinMaxIntrinsics.dMin avgt 3 9066.529 ± 1257.368 ns/op
> FpMinMaxIntrinsics.fMax avgt 3 8655.304 ± 2056.253 ns/op
> FpMinMaxIntrinsics.fMin avgt 3 8639.211 ± 1400.143 ns/op
>
>
> and after:
>
> $ make test TEST="micro:vm.compiler.FpMinMaxIntrinsics" \
>     MICRO="VM_OPTIONS='-XX:CompileCommand=print,org/openjdk/bench/vm/compiler/FpMinMaxIntrinsics.*Bench* -XX:+InlineMathNatives';FORK=1;WARMUP_ITER=1;ITER=3"
>
> 085 movsd XMM4, [R11 + #16 + RAX << #3] # double
> [...]
> 095 movsd XMM0, [R11 + #16 + R8 << #3] # double
> [...]
> 09c blendvpd XMM2,XMM0,XMM4,XMM0
> 09c blendvpd XMM1,XMM4,XMM0,XMM0
> 09c vmaxsd XMM3,XMM1,XMM2
> 09c cmppd.unordered XMM2,XMM1,XMM1
> 09c blendvpd XMM0,XMM3,XMM1,XMM2
>
>
> Benchmark Mode Cnt Score Error Units
> FpMinMaxIntrinsics.dMax avgt 3 3971.653 ± 855.886 ns/op
> FpMinMaxIntrinsics.dMin avgt 3 3787.000 ± 524.271 ns/op
> FpMinMaxIntrinsics.fMax avgt 3 3786.412 ± 1013.097 ns/op
> FpMinMaxIntrinsics.fMin avgt 3 4051.351 ± 2327.196 ns/op
>
>
> We see that the new instruction sequence is about 2x faster than the
> previous one.
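For context on why the intrinsic emits the longer blendvpd/cmppd.unordered sequence rather than a bare compare: Java's Math.min/max must propagate NaN and order -0.0 below +0.0, which a plain ucomisd plus conditional move does not give. A small standalone check of those semantics (illustrative only, not part of the patch):

```java
public class MinMaxEdgeCases {
    public static void main(String[] args) {
        // +0.0 is treated as larger than -0.0 by Math.max/min ...
        System.out.println(Math.max(-0.0, 0.0));       // prints 0.0
        System.out.println(Math.min(-0.0, 0.0));       // prints -0.0
        // ... and NaN propagates through both operations.
        System.out.println(Math.max(1.0, Double.NaN)); // prints NaN
        System.out.println(Math.min(Double.NaN, 1.0)); // prints NaN
    }
}
```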
> Could we have a Reviewer feedback and eventually a definitive approval for
> this?
>
> Thanks,
> Bernard
>
>
> package org.openjdk.bench.vm.compiler;
>
> import org.openjdk.jmh.annotations.*;
> import org.openjdk.jmh.infra.*;
>
> import java.util.concurrent.TimeUnit;
> import java.util.Random;
>
> @BenchmarkMode(Mode.AverageTime)
> @OutputTimeUnit(TimeUnit.NANOSECONDS)
> @State(Scope.Thread)
> public class FpMinMaxIntrinsics {
>     private static final int COUNT = 1000;
>
>     private double[] doubles = new double[COUNT];
>     private float[] floats = new float[COUNT];
>
>     private int c1, c2, s1, s2;
>
>     private Random r = new Random();
>
>     @Setup
>     public void init() {
>         c1 = s1 = step();
>         c2 = COUNT - (s2 = step());
>
>         for (int i = 0; i < COUNT; i++) {
>             floats[i] = r.nextFloat();
>             doubles[i] = r.nextDouble();
>         }
>     }
>
>     private int step() {
>         return (r.nextInt() & 0xf) + 1;
>     }
>
>     @Benchmark
>     public void dMax(Blackhole bh) {
>         for (int i = 0; i < COUNT; i++)
>             bh.consume(dMaxBench());
>     }
>
>     @Benchmark
>     public void dMin(Blackhole bh) {
>         for (int i = 0; i < COUNT; i++)
>             bh.consume(dMinBench());
>     }
>
>     @Benchmark
>     public void fMax(Blackhole bh) {
>         for (int i = 0; i < COUNT; i++)
>             bh.consume(fMaxBench());
>     }
>
>     @Benchmark
>     public void fMin(Blackhole bh) {
>         for (int i = 0; i < COUNT; i++)
>             bh.consume(fMinBench());
>     }
>
Compilation logs show that the [fd]MaxBench* methods are getting inlined, which is expected.
This means that the call to inc() will also be benchmarked, which looks like an overhead.
>     private double dMaxBench() {
>         inc();
>         return Math.max(doubles[c1], doubles[c2]);
>     }
>
>     private double dMinBench() {
>         inc();
>         return Math.min(doubles[c1], doubles[c2]);
>     }
>
>     private float fMaxBench() {
>         inc();
>         return Math.max(floats[c1], floats[c2]);
>     }
>
>     private float fMinBench() {
>         inc();
>         return Math.min(floats[c1], floats[c2]);
>     }
>
>     private void inc() {
>         c1 = c1 + s1 < COUNT ? c1 + s1 : (s1 = step());
>         c2 = c2 - s2 > 0 ? c2 - s2 : COUNT - (s2 = step());
>     }
> }
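Regarding the inc() overhead noted above: one possible (untested, purely illustrative) variation would be to precompute the index sequences during setup, so the measured loop performs only the array loads and the min/max itself. A plain-Java sketch of the idea, with made-up class and method names, mirroring the forward walk of inc()/step():

```java
import java.util.Random;

public class PrecomputedIndices {
    static final int COUNT = 1000;

    // Same wrap-around index walk as inc()/step() in the benchmark, but run
    // once up front so the index arithmetic stays out of the timed region.
    static int[] precompute(int count, long seed) {
        Random r = new Random(seed);
        int[] idx = new int[count];
        int s = (r.nextInt() & 0xf) + 1;  // step in [1, 16], as in step()
        int c = s;
        for (int i = 0; i < count; i++) {
            idx[i] = c;
            c = c + s < count ? c + s : (s = (r.nextInt() & 0xf) + 1);
        }
        return idx;
    }

    public static void main(String[] args) {
        int[] idx1 = precompute(COUNT, 1L);
        int[] idx2 = precompute(COUNT, 2L);
        double[] doubles = new double[COUNT];
        Random r = new Random(3L);
        for (int i = 0; i < COUNT; i++)
            doubles[i] = r.nextDouble();

        // The measured loop would then reduce to loads plus Math.max.
        double acc = 0.0;
        for (int i = 0; i < COUNT; i++)
            acc += Math.max(doubles[idx1[i]], doubles[idx2[i]]);
        System.out.println(acc >= 0.0);
    }
}
```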