8179527: Implement intrinsic code for reverseBytes with load/store

Michihiro Horie HORIE at jp.ibm.com
Fri Jun 2 14:54:17 UTC 2017


Martin, Volker,

Thank you very much for double-checking the change with micro benchmarks
and pushing it.
I should attach benchmarks with their results first time when I posted
webrev, sorry.

Best regards,
--
Michihiro,
IBM Research - Tokyo



From:	"Doerr, Martin" <martin.doerr at sap.com>
To:	Michihiro Horie <HORIE at jp.ibm.com>, Volker Simonis
            <volker.simonis at gmail.com>
Cc:	Gustavo Bueno Romero <gromero at br.ibm.com>, Hiroshi H Horii
            <HORII at jp.ibm.com>, "hotspot-dev at openjdk.java.net"
            <hotspot-dev at openjdk.java.net>,
            "ppc-aix-port-dev at openjdk.java.net"
            <ppc-aix-port-dev at openjdk.java.net>
Date:	2017/06/02 22:39
Subject:	RE: 8179527: Implement intrinsic code for reverseBytes with
            load/store



Hi Michihiro,

thanks for providing the micro benchmarks. I have run some of them on
Power6 as additional test.
All testing was good and we have 2 reviews, so I’ve pushed it.

Best regards,
Martin


From: Michihiro Horie [mailto:HORIE at jp.ibm.com]
Sent: Freitag, 2. Juni 2017 14:16
To: Volker Simonis <volker.simonis at gmail.com>
Cc: Gustavo Bueno Romero <gromero at br.ibm.com>; Hiroshi H Horii
<HORII at jp.ibm.com>; hotspot-dev at openjdk.java.net; Doerr, Martin
<martin.doerr at sap.com>; ppc-aix-port-dev at openjdk.java.net
Subject: Re: 8179527: Implement intrinsic code for reverseBytes with
load/store



Volker,

I used OProfile's opannotate to verify the nodes are matched.

Best regards,
--
Michihiro,
IBM Research - Tokyo

Inactive hide details for Volker Simonis ---2017/06/02 20:11:22---Hi
Michihiro, thanks a lot for providing your micro-benchmarkVolker Simonis
---2017/06/02 20:11:22---Hi Michihiro, thanks a lot for providing your
micro-benchmarks.

From: Volker Simonis <volker.simonis at gmail.com>
To: Michihiro Horie <HORIE at jp.ibm.com>
Cc: Gustavo Bueno Romero <gromero at br.ibm.com>, Hiroshi H Horii <
HORII at jp.ibm.com>, "hotspot-dev at openjdk.java.net" <
hotspot-dev at openjdk.java.net>, "Doerr, Martin" <martin.doerr at sap.com>, "
ppc-aix-port-dev at openjdk.java.net" <ppc-aix-port-dev at openjdk.java.net>
Date: 2017/06/02 20:11
Subject: Re: 8179527: Implement intrinsic code for reverseBytes with
load/store




Hi Michihiro,

thanks a lot for providing your micro-benchmarks.
So, did you use PrintAssemby/PrintOtpoAssembly to verify that the new nodes
are matched?

Regards,
Volker


On Fri, Jun 2, 2017 at 11:28 AM, Michihiro Horie <HORIE at jp.ibm.com> wrote:
      Volker,

      Thank you very much for the 2nd review.
      Attached includes micro benchmarks I used for the verification.
      (See attached file: reverseBytesBench.tar.gz)

      Best regards,
      --
      Michihiro,
      IBM Research - Tokyo

      Inactive hide details for Volker Simonis ---2017/06/02 17:04:16---Hi
      Michihiro, Martin, thanks a lot for the change Michihiro aVolker
      Simonis ---2017/06/02 17:04:16---Hi Michihiro, Martin, thanks a lot
      for the change Michihiro and thanks for the thorough review

      From: Volker Simonis <volker.simonis at gmail.com>
      To: "Doerr, Martin" <martin.doerr at sap.com>
      Cc: Michihiro Horie <HORIE at jp.ibm.com>, Gustavo Bueno Romero <
      gromero at br.ibm.com>, Hiroshi H Horii <HORII at jp.ibm.com>, "
      hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>
      Date: 2017/06/02 17:04
      Subject: Re: 8179527: Implement intrinsic code for reverseBytes with
      load/store







      Hi Michihiro, Martin,

      thanks a lot for the change Michihiro and thanks for the thorough
      review Martin :)

      I'm just wondering if you had some test-cases to verify if the new
      nodes really match?

      Change looks good now.

      Regards,
      Volker


      On Thu, Jun 1, 2017 at 6:00 PM, Doerr, Martin <martin.doerr at sap.com>
      wrote:
                  Hi Michihiro,





                  thank you very much for providing the new webrev (with
                  support for older Power processors, which was missing in
                  webrev.02):


                  http://cr.openjdk.java.net/~horii/8179527/webrev.03/





                  I have built and executed the VM on a Power6 machine.
                  Thanks for supporting it.


                  I would have spent only one flag for
                  has_ldbrx/has_stdbrx, but I’m ok with it. The change
                  looks good to me and I will sponsor it after 2nd review.





                  Best regards,


                  Martin








                  From: Doerr, Martin
                  Sent: Mittwoch, 31. Mai 2017 15:54
                  To: 'Michihiro Horie' <HORIE at jp.ibm.com>; Volker Simonis
                  (volker.simonis at gmail.com) <volker.simonis at gmail.com>
                  Cc: Gustavo Bueno Romero <gromero at br.ibm.com>; Hiroshi H
                  Horii <HORII at jp.ibm.com>; hotspot-dev at openjdk.java.net


                  Subject: RE: 8179527: Implement intrinsic code for
                  reverseBytes with load/store





                  Hi Michihiro,





                  thank you for the new webrev with all of my suggestions.
                  Looks good to me.


                  There’s only a trailing whitespace which I had to remove
                  to satisfy “hg jcheck” (otherwise push attempt would get
                  rejected).





                  We’ll run tests. Maybe we can get a 2nd review in the
                  meantime?





                  Best regards,


                  Martin








                  From: Michihiro Horie [mailto:HORIE at jp.ibm.com]
                  Sent: Mittwoch, 31. Mai 2017 14:36
                  To: Doerr, Martin <martin.doerr at sap.com>
                  Cc: Gustavo Bueno Romero <gromero at br.ibm.com>; Hiroshi H
                  Horii <HORII at jp.ibm.com>; hotspot-dev at openjdk.java.net;
                  Simonis, Volker <volker.simonis at sap.com>;
                  ppc-aix-port-dev at openjdk.java.net
                  Subject: RE: 8179527: Implement intrinsic code for
                  reverseBytes with load/store





                  Martin,

                  Thank you very much for your helpful comments and
                  sponsoring this change.

                  Would you review the latest change?
                  http://cr.openjdk.java.net/~horii/8179527/webrev.02/

                  Best regards,
                  --
                  Michihiro,
                  IBM Research - Tokyo

                  Inactive hide details for "Doerr, Martin" ---2017/05/30
                  01:26:01---Hi Michihiro, thanks for the improved webrev.
                  This looks bet"Doerr, Martin" ---2017/05/30 01:26:01---Hi
                  Michihiro, thanks for the improved webrev. This looks
                  better, but I still have a couple of sugges

                  From: "Doerr, Martin" <martin.doerr at sap.com>
                  To: Michihiro Horie <HORIE at jp.ibm.com>
                  Cc: "hotspot-dev at openjdk.java.net" <
                  hotspot-dev at openjdk.java.net>, "Simonis, Volker" <
                  volker.simonis at sap.com>, Hiroshi H Horii <
                  HORII at jp.ibm.com>, Gustavo Bueno Romero <
                  gromero at br.ibm.com>
                  Date: 2017/05/30 01:26
                  Subject: RE: 8179527: Implement intrinsic code for
                  reverseBytes with load/store






                  Hi Michihiro,

                  thanks for the improved webrev. This looks better, but I
                  still have a couple of suggestions.

                  1.
                  I still don’t like match rules which contain nodes which
                  do something else (even though direct matching is
                  prohibited by predicate).
                  I think it would be better to remove “match(…)”,
                  “predicate(false)” and “ins_const(…)” and just describe
                  the “effect()”. At least, I’m not aware of why a match
                  rule should be needed for rldicl and extsh.

                  2.
                  I’d appreciate if you could remove “predicate
                  (UseCountLeadingZerosInstructionsPPC64)” from all
                  byte_reverse_... rules. They don’t make any sense (not
                  your fault).

                  3.
                  The costs seem not to be set appropriately in the
                  byte_reverse_... rules. E.g. instruction count *
                  DEFAULT_COST would be better.

                  4.
                  The load/store byte reversed instructions should use the
                  2 operand form (no explicit 0 for R0 to support
                  assertions).

                  Maybe we can find a 2nd reviewer if you provide a new
                  webrev. I can sponsor the change.

                  Thanks and best regards,
                  Martin


                  From: Michihiro Horie [mailto:HORIE at jp.ibm.com]
                  Sent: Montag, 8. Mai 2017 06:58
                  To: Doerr, Martin <martin.doerr at sap.com>; Lindenmaier,
                  Goetz <goetz.lindenmaier at sap.com>
                  Cc: Gustavo Serra Scalet <gustavo.scalet at eldorado.org.br
                  >; hotspot-dev at openjdk.java.net;
                  ppc-aix-port-dev at openjdk.java.net; Simonis, Volker <
                  volker.simonis at sap.com>; Hiroshi H Horii <
                  HORII at jp.ibm.com>; Gustavo Bueno Romero <
                  gromero at br.ibm.com>
                  Subject: RE: 8179527: Implement intrinsic code for
                  reverseBytes with load/store


                  Dear Martin, Gustavo,

                  Thank you very much for your helpful comments.

                  Fixed code is
                  http://cr.openjdk.java.net/~horii/8179527/webrev.01/

                  Dear Goetz,
                  Would you kindly review and sponsor this change?
                  I heard you are a C2 compiler expert and Martin is out
                  for a while.


                  Best regards,
                  --
                  Michihiro,
                  IBM Research - Tokyo

                  Inactive hide details for "Doerr, Martin" ---2017/05/03
                  02:24:18---Hi Michihiro and Gustavo, thank you very much
                  for implementi"Doerr, Martin" ---2017/05/03 02:24:18---Hi
                  Michihiro and Gustavo, thank you very much for
                  implementing this change.

                  From: "Doerr, Martin" <martin.doerr at sap.com>
                  To: Gustavo Serra Scalet <gustavo.scalet at eldorado.org.br
                  >, Michihiro Horie/Japan/IBM at IBMJP
                  Cc: "ppc-aix-port-dev at openjdk.java.net" <
                  ppc-aix-port-dev at openjdk.java.net>, "
                  hotspot-dev at openjdk.java.net" <
                  hotspot-dev at openjdk.java.net>, "Simonis, Volker" <
                  volker.simonis at sap.com>
                  Date: 2017/05/03 02:24
                  Subject: RE: 8179527: Implement intrinsic code for
                  reverseBytes with load/store







                  Hi Michihiro and Gustavo,

                  thank you very much for implementing this change.

                  @Gustavo: Thanks for taking a look.
                  I think that the direct match rules are just there to
                  satisfy match_rule_supported. They don't need to be fast,
                  they are just a fall back solution.
                  The goal is to exploit the byte reverse load and store
                  instructions which should match in more performance
                  critical cases.

                  Now my review:

                  assembler_ppc.hpp:
                  Looks good except a minor formatting request:
                  LDBRX_OPCODE = (31u << OPCODE_SHIFT | 532 << 1),
                  should be
                  LDBRX_OPCODE = (31u << OPCODE_SHIFT | 532u << 1),
                  to be consistent.
                  The comments // X-FORM should be aligned with the other
                  ones.

                  assembler_ppc.inline.hpp:
                  Good.

                  ppc.ad:
                  I'm concerned about the additional match rules which are
                  only used for the expand step. They could match directly
                  leading to incorrect code. What they match is not what
                  they do.
                  I suggest to implement the code directly in the
                  ins_encode. This would make the new code significantly
                  shorter and less error prone.
                  I think we don't need to optimize for Power6 anymore and
                  newer processors shouldn't really suffer under a little
                  less optimized instruction scheduling. Would you agree?

                  Displacements may be too large for "li" so I suggest to
                  use the "indirect" memory operand and let the compiler
                  handle it. I know that it may increase latency because
                  the compiler will need to insert an addition which could
                  better be matched into the memory operand of the load
                  which is harder to implement (it is possible to match an
                  addition in an operand).


                  Best regards,
                  Martin


                  -----Original Message-----
                  From: Gustavo Serra Scalet [
                  mailto:gustavo.scalet at eldorado.org.br]
                  Sent: Dienstag, 2. Mai 2017 17:05
                  To: Michihiro Horie <HORIE at jp.ibm.com>
                  Cc: ppc-aix-port-dev at openjdk.java.net;
                  hotspot-dev at openjdk.java.net; Simonis, Volker <
                  volker.simonis at sap.com>; Doerr, Martin <
                  martin.doerr at sap.com>
                  Subject: RE: 8179527: Implement intrinsic code for
                  reverseBytes with load/store

                  Hi Michihiro,

                  I wonder if there is no vectorized approach for
                  implementing your "bytes_reverse_long_Ex" instruct on
                  ppc.ad. Or did you avoid doing it so intentionally?

                  > -----Original Message-----
                  > From: ppc-aix-port-dev [mailto:ppc-aix-port-dev-
                  > bounces at openjdk.java.net] On Behalf Of Michihiro Horie
                  > Sent: terça-feira, 2 de maio de 2017 11:47
                  > To: ppc-aix-port-dev at openjdk.java.net;
                  hotspot-dev at openjdk.java.net;
                  > volker.simonis at sap.com; martin.doerr at sap.com
                  > Subject: 8179527: Implement intrinsic code for
                  reverseBytes with
                  > load/store
                  >
                  > Dear all,
                  >
                  > Would you please review following change?
                  >
                  > Bug: https://bugs.openjdk.java.net/browse/JDK-8179527
                  > Webrev:
                  http://cr.openjdk.java.net/~horii/8179527/webrev.00/
                  >
                  > I added new intrinsic code for reverseBytes() in ppc.ad
                  with
                  > * match(Set dst (ReverseBytesI/L/US/S (LoadI src)));
                  > * match(Set dst (StoreI dst (ReverseBytesI/L/US/S
                  src)));
                  >
                  >
                  > Best regards,
                  > --
                  > Michihiro,
                  > IBM Research - Tokyo









-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20170602/bf49f79a/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20170602/bf49f79a/graycol-0001.gif>


More information about the ppc-aix-port-dev mailing list