RFR 8214239 (S): Missing x86_64.ad patterns for clearing and setting long vector bits
B. Blaser
bsrbnd at gmail.com
Tue Nov 12 21:13:52 UTC 2019
Hi Vladimir Kozlov and Ivanov,
Please review the updated patch, revised according to Vladimir Ivanov's comments:
http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.01/
I've pushed it to jdk/submit as a second changeset on branch
"JDK-8214239", and the tests pass:
http://hg.openjdk.java.net/jdk/submit/rev/f961f7a454e4
Any feedback is welcome.
Thanks,
Bernard
On Tue, 12 Nov 2019 at 14:32, Vladimir Ivanov
<vladimir.x.ivanov at oracle.com> wrote:
>
> Thanks for the clarifications, Bernard.
>
> >>> http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.00/
> >>
> >> I don't see cases for non-constant masks John suggested covered. Have
> >> you tried to implement them? Any problems encountered or did you just
> >> leave them for future improvement?
> >
> > I haven't experimented with non-constant masks yet, which is why I left
> > them for future improvement (as I told John).
>
> Sounds good.
>
>
> >> Why do you limit the optimization to bits in upper half? Is it because
> >> ordinary andq/orq instructions work well for the rest? If that's the
> >> case, it deserves a comment.
> >
> > On a pure specification basis (the Intel optimization manual that Sandhya
> > pointed me to), AND/OR and BTR/BTS have the same latency=1, with
> > slightly better throughput for the former. When experimenting with
> > values <= 32-bit, I didn't observe much difference, or only an
> > imperceptible one in favor of AND/OR. But with pure 64-bit values, the
> > benefit is much more evident because BTR/BTS replaces both a MOV and
> > an AND/OR, which is simply better on a specification basis (latency=1 for
> > BTR/BTS vs latency=1+1 for MOV + AND/OR). So, I'll update the comments
> > as follows:
> >
> > // n should be a pure 64-bit power of 2 immediate because AND/OR works
> > // well enough for 8/32-bit values.
> > // n should be a pure 64-bit immediate given that not(n) is a power of
> > // 2 because AND/OR works well enough for 8/32-bit values.
>
> Looks good.
>
> >
> >> (immPow2NotL is a bit misleading: I read it as "power of 2, but not a
> >> long". What do you think about immL_NegPow2/immL_Pow2? Not sure how to
> >> encode that it's > 2^32, but I would just skip it for now.)
> >
> > I agree with immL_NotPow2/immL_Pow2, for the encoding, see below.
>
> One idea to try: you can move "log2_long(n->get_long()) > ..." check
> from operand declaration to the instruction.
>
> operand immL_Pow2() %{
> // ...
> predicate(is_power_of_2_long(n->get_long()));
> ...
>
> operand immL_NotPow2() %{
> // ...
> predicate(is_power_of_2_long(~n->get_long()));
> ...
>
> instruct btrL_mem_imm(memory dst, immL_NotPow2 con, rFlagsReg cr) %{
> predicate(log2_long(~in(2)->in(2)->get_long()) > 30);
> match(Set dst (StoreL dst (AndL (LoadL dst) con)));
> ...
>
> instruct btsL_mem_imm(memory dst, immL_Pow2 con, rFlagsReg cr) %{
> predicate(log2_long(in(2)->in(2)->get_long()) > 31);
> match(Set dst (StoreL dst (OrL (LoadL dst) con)));
> ...
>
> It looks more natural (though it requires more code) to do such
> operation-specific dispatching on instructions rather than on operands.
>
> Best regards,
> Vladimir Ivanov
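
To make the split between operand checks and instruction predicates concrete, here is a minimal standalone C++ sketch of the same tests applied to plain 64-bit constants. The helpers is_pow2 and log2_u64 are hypothetical stand-ins for HotSpot's is_power_of_2_long and log2_long, and selects_bts / selects_btr are illustrative names only. The thresholds (> 31 and > 30) are copied from the sketch quoted above. The real predicates walk the node graph (in(2)->in(2)) instead of taking an integer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for is_power_of_2_long: true when exactly one bit is set.
static bool is_pow2(uint64_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// Hypothetical stand-in for log2_long: index of the highest set bit.
// The caller must pass a non-zero value.
static int log2_u64(uint64_t x) {
  int i = -1;
  while (x != 0) {
    x >>= 1;
    ++i;
  }
  return i;
}

// Operand check (immL_Pow2) plus instruction predicate (btsL_mem_imm):
// OR with a single bit whose index is above 31.
static bool selects_bts(int64_t con) {
  uint64_t u = static_cast<uint64_t>(con);
  return is_pow2(u) && log2_u64(u) > 31;
}

// Operand check (immL_NotPow2) plus instruction predicate (btrL_mem_imm):
// AND with a mask whose complement is a single bit with index above 30.
static bool selects_btr(int64_t con) {
  uint64_t u = ~static_cast<uint64_t>(con);
  return is_pow2(u) && log2_u64(u) > 30;
}

// A few sample constants exercising both sides of each threshold.
static void run_examples() {
  assert(selects_bts(1LL << 40));     // single high bit: BTS candidate
  assert(!selects_bts(1LL << 31));    // bit 31 does not pass the "> 31" test
  assert(!selects_bts(3));            // not a power of 2
  assert(selects_btr(~(1LL << 31)));  // clears bit 31: BTR candidate
  assert(!selects_btr(~(1LL << 30))); // bit 30 does not pass the "> 30" test
  assert(!selects_btr(-1));           // complement is 0, not a power of 2
}
```

Calling run_examples() from a main function exercises both predicates. Constants that fail the checks would simply fall through to the ordinary AND/OR patterns.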