"xdual(xdual()) should be identity" error when backporting Montgomery multiplication to JDK8 (PPC64le)

Gustavo Serra Scalet gustavo.scalet at eldorado.org.br
Wed Sep 27 14:10:15 UTC 2017


Hi,

Martin helped me on this one:
https://gist.github.com/anonymous/b3a8782aac9d56615d6e15db834335b4#file-montgomery-diff-L76

I was incrementing argcnt after it was being used on:
const Type** fields = TypeTuple::fields(argcnt);

So the check:
assert(argp == TypeFunc::Parms+argcnt, "correct decoding");

was valid but the fields array was not!


Now it is working. I'll send the backport for review soon.


> -----Original Message-----
> From: ppc-aix-port-dev [mailto:ppc-aix-port-dev-
> bounces at openjdk.java.net] On Behalf Of Gustavo Serra Scalet
> Sent: terça-feira, 26 de setembro de 2017 10:06
> To: Doerr, Martin <martin.doerr at sap.com>
> Cc: jdk8u-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
> Subject: RE: "xdual(xdual()) should be identity" error when backporting
> Montgomery multiplication to JDK8 (PPC64le)
> 
> 
> 
> > -----Original Message-----
> > From: Doerr, Martin
> 
> > Hi,
> >
> > did you feed the right argcnt into:
> > const Type** fields = TypeTuple::fields(argcnt); ?
> 
> Yes, that looks ok.
> 
> > If this is not the problem, I guess you'll have to debug.
> 
> Alright. I'll post the results for reference.
> 
> > Best regards,
> > Martin
> >
> >
> > -----Original Message-----
> > From: Gustavo Serra Scalet [mailto:gustavo.scalet at eldorado.org.br]
> > Sent: Montag, 25. September 2017 21:12
> > To: Doerr, Martin <martin.doerr at sap.com>
> > Cc: jdk8u-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
> > Subject: RE: "xdual(xdual()) should be identity" error when
> > backporting Montgomery multiplication to JDK8 (PPC64le)
> >
> > In order words, despite the proper JDK9 commit[1] I added these
> > changes due to CCallingConventionRequiresIntsAsLongs:
> > https://gist.github.com/anonymous/b3a8782aac9d56615d6e15db834335b4
> >
> > References:
> > [1] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/ce0dacc26f3d
> >
> > -----Original Message-----
> > From: ppc-aix-port-dev [mailto:ppc-aix-port-dev-
> > bounces at openjdk.java.net] On Behalf Of Gustavo Serra Scalet
> > Sent: segunda-feira, 25 de setembro de 2017 15:55
> > To: Doerr, Martin <martin.doerr at sap.com>
> > Cc: jdk8u-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
> > Subject: RE: "xdual(xdual()) should be identity" error when
> > backporting Montgomery multiplication to JDK8 (PPC64le)
> >
> > Yup:
> > Node* call = NULL;
> > if (CCallingConventionRequiresIntsAsLongs) {
> >   Node* len_I2L = ConvI2L(len);
> >   call = make_runtime_call(RC_LEAF,
> >                            OptoRuntime::montgomeryMultiply_Type(),
> >                            stubAddr, stubName, TypePtr::BOTTOM,
> >                            a_start, b_start, n_start, len_I2L XTOP,
> inv,
> >                            top(), m_start); } else {
> >   call = make_runtime_call(RC_LEAF,
> >                            OptoRuntime::montgomeryMultiply_Type(),
> >                            stubAddr, stubName, TypePtr::BOTTOM,
> >                            a_start, b_start, n_start, len, inv, top(),
> >                            m_start);
> > }
> > set_result(m);
> >
> >
> > Any other thing I may be missing?
> >
> > -----Original Message-----
> > From: Doerr, Martin [mailto:martin.doerr at sap.com]
> > Sent: segunda-feira, 25 de setembro de 2017 13:55
> > To: Gustavo Serra Scalet <gustavo.scalet at eldorado.org.br>
> > Cc: jdk8u-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
> > Subject: RE: "xdual(xdual()) should be identity" error when
> > backporting Montgomery multiplication to JDK8 (PPC64le)
> >
> > Hi Gustavo,
> >
> > Did you also change library_call?
> > Should be something like (example for multiplyToLen):
> >
> >     Node* call = NULL;
> >     if (CCallingConventionRequiresIntsAsLongs) {
> >       Node* xlen_I2L = ConvI2L(xlen);
> >       Node* ylen_I2L = ConvI2L(ylen);
> >       Node* zlen_I2L = ConvI2L(zlen);
> >       call = make_runtime_call(RC_LEAF|RC_NO_FP,
> >                                OptoRuntime::multiplyToLen_Type(),
> >                                stubAddr, stubName, TypePtr::BOTTOM,
> >                                x_start, xlen_I2L XTOP, y_start,
> >                                ylen_I2L XTOP, z_start, zlen_I2L XTOP);
> >     }
> >     else {
> >       call = make_runtime_call(RC_LEAF|RC_NO_FP,
> >                                OptoRuntime::multiplyToLen_Type(),
> >                                stubAddr, stubName, TypePtr::BOTTOM,
> >                                x_start, xlen, y_start, ylen,
> >                                z_start, zlen);
> >     }
> >
> >
> > Best regards,
> > Martin
> >
> >
> > -----Original Message-----
> > From: Gustavo Serra Scalet [mailto:gustavo.scalet at eldorado.org.br]
> > Sent: Montag, 25. September 2017 16:03
> > To: Doerr, Martin <martin.doerr at sap.com>
> > Cc: jdk8u-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
> > Subject: "xdual(xdual()) should be identity" error when backporting
> > Montgomery multiplication to JDK8 (PPC64le)
> >
> > Hi,
> >
> > Previously I faced some issues when backporting intrinsics to JDK8[1]
> > but after some help, manage to make it work.
> >
> > Despite following those rules related to CCallingConvention and Int ->
> > Long conversion, I still face issues not previously noticed on JDK9
> > when running SpecJVM2008's crypto.rsa bechmark.
> >
> > [release build]:
> >   Benchmark:   crypto.rsa
> >   Run mode:    timed run
> >   Test type:   multi
> >   Threads:     24
> >   Warmup:      120s
> >   Iterations:  1
> >   Run length:  240s
> > #
> > # A fatal error has been detected by the Java Runtime Environment:
> > #
> > #  SIGSEGV (0xb) at pc=0x00003fffa0cc6f84, pid=3698,
> > tid=0x00003fff7a34f1a0 # # JRE version: OpenJDK Runtime Environment
> > (8.0) (build 1.8.0-internal-gut_2017_04_12_09_19-b00)
> > # Java VM: OpenJDK 64-Bit Server VM (25.71-b00 mixed mode linux-ppc64
> > compressed oops) # Problematic frame:
> > # V  [libjvm.so+0x446f84]
> > ConnectionGraph::process_call_arguments(CallNode*)+0x1c4
> >
> >
> > [slowdebug build]:
> >   Benchmark:   crypto.rsa
> >   Run mode:    timed run
> >   Test type:   multi
> >   Threads:     24
> >   Warmup:      120s
> >   Iterations:  1
> >   Run length:  240s
> > # To suppress the following error report, specify this argument #
> > after
> > -XX: or in .hotspotrc:  SuppressErrorAt=/type.cpp:596 # # A fatal
> > error has been detected by the Java Runtime Environment:
> > #
> > #  Internal Error (/home/gut/jdk8u-
> > dev/hotspot/src/share/vm/opto/type.cpp:596), pid=6793,
> > tid=0x00003fff5847f1a0 #  assert(eq(dual_dual)) failed: xdual(xdual())
> > should be identity
> >
> >
> >
> > Any hints? I see the issue is when calling the last line of this code
> > on
> > runtime.cpp:
> > const TypeFunc* OptoRuntime::montgomeryMultiply_Type() {
> >   // create input type (domain)
> >   int num_args      = 7;
> >   int argcnt = num_args;
> >   const Type** fields = TypeTuple::fields(argcnt);
> >   int argp = TypeFunc::Parms;
> >   fields[argp++] = TypePtr::NOTNULL;    // a
> >   fields[argp++] = TypePtr::NOTNULL;    // b
> >   fields[argp++] = TypePtr::NOTNULL;    // n
> >   if (CCallingConventionRequiresIntsAsLongs) {
> >     argcnt += 1;                        // additional placeholders
> >     fields[argp++] = TypeLong::LONG;    // len
> >     fields[argp++] = TypeLong::HALF;    // placeholder
> >   } else {
> >     fields[argp++] = TypeInt::INT;      // len
> >   }
> >   fields[argp++] = TypeLong::LONG;      // inv
> >   fields[argp++] = Type::HALF;
> >   fields[argp++] = TypePtr::NOTNULL;    // result
> >   assert(argp == TypeFunc::Parms+argcnt, "correct decoding");
> >   const TypeTuple* domain = TypeTuple::make(TypeFunc::Parms+argcnt,
> > fields);
> >
> >
> >
> > Thanks
> >
> > References:
> > [1] http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2017-
> > April/002970.html
> >
> > -----Original Message-----
> > From: Doerr, Martin [mailto:martin.doerr at sap.com]
> > Sent: sexta-feira, 15 de setembro de 2017 04:39
> > To: Gustavo Serra Scalet <gustavo.scalet at eldorado.org.br>
> > Subject: RE: [10] RFR(M): 8185976: PPC64: Implement MulAdd and
> > SquareToLen intrinsics (off-list)
> >
> > Hi Gustavo,
> >
> > you can request backport by sending a
> > [8u-dev] Request for backport approval: 8145913: PPC64: add Montgomery
> > multiply intrinsic to jdk8u-dev at openjdk.java.net
> >
> > But first of all, you need to check if the change applies to 8u and
> > run tests.
> >
> > The following information is needed in the email:
> > Original Bug: https://bugs.openjdk.java.net/browse/JDK-8145913
> > Jdk9 change:
> > http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/ce0dacc26f3d
> > Jdk9 review thread: http://mail.openjdk.java.net/pipermail/hotspot-
> > compiler-dev/2015-December/020534.html
> > Statement: Change applies cleanly or describe which changes were
> > necessary You should also mention that it belongs to 8u102 backport of
> > https://bugs.openjdk.java.net/browse/JDK-8150152
> >
> > Once it's approved, you can ask a jdk8u reviewer or committer (which
> > I'm
> > not) to push it.
> >
> > Best regards,
> > Martin
> >
> >
> > -----Original Message-----
> > From: Gustavo Serra Scalet [mailto:gustavo.scalet at eldorado.org.br]
> > Sent: Donnerstag, 14. September 2017 21:00
> > To: Doerr, Martin <martin.doerr at sap.com>; 'hotspot-compiler-
> > dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
> > Subject: RE: [10] RFR(M): 8185976: PPC64: Implement MulAdd and
> > SquareToLen intrinsics
> >
> > Hi Martin,
> >
> > Short question about the Montgomery backport to JDK8:
> >
> > > > No, I used the jdk8u152-b01 (State of repository at Thu Apr 6
> > > > 14:15:31 2017). The reported performance speedup was calculated by
> > > > > running the following test (TestSquareToLen.java):
> > >
> > > Seems like JDK-8145913 has not been backported, yet. Sorry for not
> > > checking this earlier. So if you want to make RSA really fast, it
> > > should be so much better to backport that one. But I can still
> > > sponsor this change as it may be used elsewhere.
> >
> > Do we have a bug ID for the Montgomery backport to JDK8 so I can track
> > it? If not, should I request that backport or request this intrinsic
> > to be backported (once it's inside of JDK10)? I'm interested in this
> > kind of performance gain.
> >
> > Thanks
> >
> > >
> > > Best regards,
> > > Martin
> > >
> > >
> > > -----Original Message-----
> > > From: Gustavo Serra Scalet [mailto:gustavo.scalet at eldorado.org.br]
> > > Sent: Dienstag, 29. August 2017 22:37
> > > To: Doerr, Martin <martin.doerr at sap.com>; 'hotspot-compiler-
> > > dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
> > > Subject: RE: [10] RFR(M): 8185976: PPC64: Implement MulAdd and
> > > SquareToLen intrinsics
> > >
> > > Hi Martin,
> > >
> > > New changes:
> > > https://gut.github.io/openjdk/webrev/JDK-8185976/webrev.02/
> > >
> > > Check comments below, please.
> > >
> > > > -----Original Message-----
> > > > From: Doerr, Martin
> > > >
> > > > 1. Sign extending offset and len
> > > > Right, sign and zero extending is equivalent for offset and len
> > > > because they are guaranteed to be >=0 (by checks in Java). But you
> > > > can only rely on bit 32 (IBM notation) to be 0. Bit 0-31 may
> > > > contain
> > > garbage.
> > > > rldicl was incorrect. My mistake, sorry for that. Correct would be
> > > > rldic which also clears the least significant bits.
> > > > len should also get fixed e.g. by replacing cmpdi by extsw_ in
> > muladd.
> > >
> > > The s/rldicl/rldic/ was fixed for "offset", but "len" doesn't seem
> > > to need further changes as it's being cleared with clrldi, which is
> > > the same as rldic with no shift. Therefore it's treated
> > > appropriately as requested for "offset" parameter. Do you agree?
> > >
> > > > 2. Using 8 byte instructions for int The code which feeds stdu is
> > > > endianess specific. Doesn't work on all
> > > > PPC64 platforms.
> > >
> > > You are right. The way I'm building the 64 bits of the register
> > > depends on which kind of endianness it is run. For now it works only
> > > on little endian so I'm adding a switch (just like I did for SHA) to
> > > make it available only on little endian systems.
> > >
> > > > 3.Regarding Andrew's point: Superseded by Montgomery?
> > > > The Montgomery change got backported to jdk8u (JDK-8150152  in
> > 8u102).
> > > > I'd expect the performance improvement of these intrinsics to be
> > > > irrelevant for crypto.rsa. Did you measure with an older jdk8
> > release?
> > >
> > > No, I used the jdk8u152-b01 (State of repository at Thu Apr 6
> > > 14:15:31 2017). The reported performance speedup was calculated by
> > > running the following test (TestSquareToLen.java):
> > > import java.math.BigInteger;
> > >
> > > public class TestSquareToLen {
> > >
> > >     public static void main(String args[]) throws Exception {
> > >
> > >       int n = 10000000;
> > >       if (args.length >=1) {
> > >         n = Integer.parseInt(args[0]);
> > >       }
> > >
> > >       BigInteger b1 = new
> > > BigInteger("34893980923557359086350514982082503920002298311877320859
> > > 99
> > > 36
> > > 73955941838010214688430713917560492078731370166315598379312147549260
> > > 92
> > > 22
> > > 37802921102076092232721848082893366300577359694237268085206410301181
> > > 16
> > > 51
> > > 64401804883382348239081994789652420763585798455208997799631311315401
> > > 66
> > > 68 718795349783157384006672542605760392289645528307");
> > >       BigInteger b2 = BigInteger.valueOf(0);
> > >       BigInteger check = BigInteger.valueOf(1);
> > >       for (int i = 0; i < n; i++) {
> > >         b2 = b1.multiply(b1);
> > >         if (i == 0)
> > >           // Didn't JIT yet. Comparing against interpreted mode
> > >           check = b2;
> > >       }
> > >       if (b2.compareTo(check) == 0)
> > >         System.out.println("Check ok!");
> > >       else
> > >         System.out.println("Check failed!");
> > >    }
> > > }
> > >
> > >
> > > I got these results on JDK8 on my POWER8 machine:
> > > $ ./javac TestSquareToLen.java
> > > $ sudo perf stat -r 5 ./java -XX:-UseMulAddIntrinsic -XX:-
> > > UseSquareToLenIntrinsic TestSquareToLen Check ok!
> > > Check ok!
> > > Check ok!
> > > Check ok!
> > > Check ok!
> > >
> > >  Performance counter stats for './java -XX:-UseMulAddIntrinsic -XX:-
> > > UseSquareToLenIntrinsic TestSquareToLen' (5 runs):
> > >
> > >       15148.009557      task-clock (msec)         #    1.053 CPUs
> > > utilized            ( +-  0.48% )
> > >              2,425      context-switches          #    0.160 K/sec
> > > ( +-  5.84% )
> > >                356      cpu-migrations            #    0.023 K/sec
> > > ( +-  3.01% )
> > >              5,153      page-faults               #    0.340 K/sec
> > > ( +-  5.22% )
> > >     54,536,889,909      cycles                    #    3.600 GHz
> > > ( +-  0.56% )  (66.68%)
> > >        239,554,105      stalled-cycles-frontend   #    0.44%
> frontend
> > > cycles idle     ( +-  4.87% )  (49.90%)
> > >     27,683,316,001      stalled-cycles-backend    #   50.76% backend
> > > cycles idle      ( +-  0.56% )  (50.17%)
> > >    102,020,229,733      instructions              #    1.87  insn
> per
> > > cycle
> > >                                                   #    0.27  stalled
> > > cycles per insn  ( +-  0.14% )  (66.94%)
> > >      7,706,072,218      branches                  #  508.718 M/sec
> > > ( +-  0.23% )  (50.20%)
> > >        456,051,162      branch-misses             #    5.92% of all
> > > branches          ( +-  0.09% )  (50.07%)
> > >
> > >       14.390840733 seconds time elapsed ( +-  0.09% )
> > >
> > > $ sudo perf stat -r 5 ./java -XX:+UseMulAddIntrinsic -
> > > XX:+UseSquareToLenIntrinsic TestSquareToLen Check ok!
> > > Check ok!
> > > Check ok!
> > > Check ok!
> > > Check ok!
> > >
> > >  Performance counter stats for './java -XX:+UseMulAddIntrinsic -
> > > XX:+UseSquareToLenIntrinsic TestSquareToLen' (5 runs):
> > >
> > >       11368.141410      task-clock (msec)         #    1.045 CPUs
> > > utilized            ( +-  0.64% )
> > >              1,964      context-switches          #    0.173 K/sec
> > > ( +-  8.93% )
> > >                338      cpu-migrations            #    0.030 K/sec
> > > ( +-  7.65% )
> > >              5,627      page-faults               #    0.495 K/sec
> > > ( +-  6.15% )
> > >     41,100,168,967      cycles                    #    3.615 GHz
> > > ( +-  0.50% )  (66.36%)
> > >        309,052,316      stalled-cycles-frontend   #    0.75%
> frontend
> > > cycles idle     ( +-  2.84% )  (49.89%)
> > >     14,188,581,685      stalled-cycles-backend    #   34.52% backend
> > > cycles idle      ( +-  0.99% )  (50.34%)
> > >     77,846,029,829      instructions              #    1.89  insn
> per
> > > cycle
> > >                                                   #    0.18  stalled
> > > cycles per insn  ( +-  0.29% )  (66.96%)
> > >      8,435,216,989      branches                  #  742.005 M/sec
> > > ( +-  0.28% )  (50.17%)
> > >        339,903,936      branch-misses             #    4.03% of all
> > > branches          ( +-  0.27% )  (49.90%)
> > >
> > >       10.882357546 seconds time elapsed ( +-  0.24% )
> > >
> > >
> > > (out of curiosity, these numbers are 15.19s (+- 0.32%) and 13.42s
> > > (+-
> > > 0.53%) on JDK10)
> > >
> > > I may run for SpecJVM2008's crypto.rsa if you are interested.
> > >
> > > Thank you once again for reviewing this.
> > >
> > > Best regards,
> > > Gustavo
> > >
> > > > (I think the change is still acceptable as the intrinsics could be
> > > > used elsewhere and the implementation also exists on other
> > > > platforms.)
> > > >
> > > > Best regards,
> > > > Martin
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Gustavo Serra Scalet [mailto:gustavo.scalet at eldorado.org.br]
> > > > Sent: Mittwoch, 16. August 2017 18:50
> > > > To: Doerr, Martin <martin.doerr at sap.com>; 'hotspot-compiler-
> > > > dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
> > > > Subject: RE: [10] RFR(M): 8185976: PPC64: Implement MulAdd and
> > > > SquareToLen intrinsics
> > > >
> > > > Hi Martin,
> > > >
> > > > Thanks for dedicated review. It took me a while to be able to work
> > > > on this but I hope to have your points solved. Please check below
> > > > the review as well as my comments quoting your email:
> > > > https://gut.github.io/openjdk/webrev/JDK-8185976/webrev.01/
> > > >
> > > > > -----Original Message-----
> > > > > First of all, C2 does not perform sign extend when calling
> stubs.
> > > > > The int parms need to get zero/sign extended. (Could even be
> > > > > done without extra instructions by replacing sldi -> rldicl,
> > > > > cmpdi -> extsw_ in some
> > > > > cases.)
> > > >
> > > > Does it make a difference on my case?
> > > >
> > > > I guess you are talking about mulAdd preparation code. The only
> > > > aspect I found about him is to force the cast from 32 bits -> 64
> > > > bits by cleaning higher bits. Offset is a signed integer but it
> > > > can't be
> > > negative anyway.
> > > >
> > > > So I changed from:
> > > > sldi   (R5_ARG3, R5_ARG3, 2);
> > > >
> > > > to:
> > > > rldicl (R5_ARG3, R5_ARG3, 2, 32);  // always positive
> > > >
> > > >
> > > > > macroAssembler_ppc.cpp:
> > > > > - Indentation should be 2 spaces.
> > > >
> > > > Done
> > > >
> > > >
> > > > > stubGenerator_ppc:cpp:
> > > > > - or_, addi_ should get replaced by orr, addi when CR0 result is
> > > > > not needed.
> > > >
> > > > Done
> > > >
> > > > > - Where is lplw initialized?
> > > >
> > > > It should be initialized with 0, I missed that...
> > > >
> > > > > - I believe that the updating load/store instructions e.g. lwzu
> > > > > don't perform well on some processors. At least using stwu 2
> > > > > times in the loop doesn't make sense.
> > > >
> > > > You are right. I could manipulate the bits differently and ended
> > > > up with a single stdu in the loop. Neat! Although I could not
> > > > reduce the total number of instructions.
> > > >
> > > > > - Note: It should be possible to use 8 byte instead of 4 byte
> > > > > instructions: MacroAssembler::multiply64, addc, adde. But I'm
> > > > > not requesting to change that because I guess it would make the
> > > > > code very complicated, especially when supporting both endianess
> > > versions.
> > > >
> > > > Yes, that would require a new analysis on this code. May we
> > > > consider it next? As you said, I prefer having an initial version
> > > > that looks as simple as the original java code.
> > > >
> > > > > - The squareToLen stub implementation is very close the Java
> > > > > implementation. So it'd be interesting to understand what C2
> > > > > doesn't do as well as the hand written assembly code. Do you
> > > > > know that? (Not absolutely necessary for accepting this change
> > > > > as long as the stub is measurably faster.)
> > > >
> > > > I don't know either. Basically I chose doing it because I noticed
> > > > some performance gain on SpecJVM2008 when analyzing X64. Then,
> > > > taking a closer look, I didn't notice any AVX or some special
> > > > instructions on
> > > > X64 so I decided to try it on ppc64 by using some basic assembly.
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > Best regards,
> > > > > Martin
> > > > >
> > > > >
> > > > > -----Original Message-----
> > > > > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-
> > > > > bounces at openjdk.java.net] On Behalf Of Gustavo Serra Scalet
> > > > > Sent: Donnerstag, 10. August 2017 19:22
> > > > > To: 'hotspot-compiler-dev at openjdk.java.net' <hotspot-compiler-
> > > > > dev at openjdk.java.net>
> > > > > Subject: FW: [10] RFR(M): 8185976: PPC64: Implement MulAdd and
> > > > > SquareToLen intrinsics
> > > > >
> > > > >
> > > > >
> > > > > -----Original Message-----
> > > > > From: ppc-aix-port-dev [mailto:ppc-aix-port-dev-
> > > > > bounces at openjdk.java.net] On Behalf Of Gustavo Serra Scalet
> > > > > Sent: terça-feira, 8 de agosto de 2017 17:19
> > > > > To: ppc-aix-port-dev at openjdk.java.net
> > > > > Subject: [10] RFR(M): 8185976: PPC64: Implement MulAdd and
> > > > > SquareToLen intrinsics
> > > > >
> > > > > Hi,
> > > > >
> > > > > Could you please review this specific PPC64 change to hotspot?
> > > > > By implementing these intrinsics I noticed a small improvement
> > > > > with microbenchmarks analysis. On SpecJVM2008's crypto.rsa
> > > > > benchmark, only when backporting to JDK8 an improvement was
> noticed.
> > > > >
> > > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8185976
> > > > > Webrev: https://gut.github.io/openjdk/webrev/JDK-8185976/webrev/
> > > > >
> > > > > Motivation for this implementation:
> > > > > https://twitter.com/ijuma/status/698309312498835457
> > > > >
> > > > > Best regards,
> > > > > Gustavo Serra Scalet


More information about the jdk8u-dev mailing list