RFR 8214239 (S): Missing x86_64.ad patterns for clearing and setting long vector bits
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Tue Nov 12 13:32:40 UTC 2019
Thanks for the clarifications, Bernard.
>>> http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.00/
>>
>> I don't see the cases for non-constant masks that John suggested being
>> covered. Have you tried to implement them? Did you run into any
>> problems, or did you just leave them for future improvement?
>
> I haven't experimented with non-constant masks yet, which is why I left
> them for future improvement (as I told John).
Sounds good.
>> Why do you limit the optimization to bits in upper half? Is it because
>> ordinary andq/orq instructions work well for the rest? If that's the
>> case, it deserves a comment.
>
> On a pure specification basis (the Intel optimization manual that
> Sandhya pointed me to), AND/OR and BTR/BTS have the same latency (1),
> but AND/OR has slightly better throughput. When experimenting with
> values that fit in 32 bits, I didn't observe much difference, at most a
> barely perceptible one in favor of AND/OR. With pure 64-bit values,
> however, the benefit is much more evident because BTR/BTS replaces both
> a MOV and an AND/OR, which is simply better on a specification basis
> (latency 1 for BTR/BTS vs. 1+1 for MOV + AND/OR). So I'll update the
> comments as follows:
>
> // n should be a pure 64-bit power of 2 immediate because AND/OR works well enough for 8/32-bit values.
> // n should be a pure 64-bit immediate given that not(n) is a power of 2 because AND/OR works well enough for 8/32-bit values.
Looks good.
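To make the "pure 64-bit" criterion in these comments concrete: andq/orq with an immediate operand accept only a 32-bit immediate that the CPU sign-extends to 64 bits, so a mask outside that range must first be loaded into a register with a MOV. The following is a minimal C++ sketch, assuming that encoding rule; fits_simm32, set_mask, clear_mask and first_bit_needing_imm64 are hypothetical names for illustration, not HotSpot functions. It computes the lowest bit index from which the set mask 1 << k and the clear mask ~(1 << k) stop being encodable that way.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; they are not HotSpot functions.

// andq/orq with an immediate take a 32-bit value that the CPU sign-extends
// to 64 bits. A 64-bit value is encodable that way exactly when its top
// 33 bits (bits 31..63) are all zeros or all ones.
constexpr bool fits_simm32(uint64_t v) {
  constexpr uint64_t all_ones_33 = (uint64_t{1} << 33) - 1;
  const uint64_t top = v >> 31;
  return top == 0 || top == all_ones_33;
}

// Mask for setting bit k:   x | set_mask(k)    (orq, or btsq)
constexpr uint64_t set_mask(int k) { return uint64_t{1} << k; }

// Mask for clearing bit k:  x & clear_mask(k)  (andq, or btrq)
constexpr uint64_t clear_mask(int k) { return ~set_mask(k); }

// Lowest bit index whose mask can no longer be an andq/orq immediate,
// i.e. where AND/OR would first need an extra MOV of a 64-bit constant.
constexpr int first_bit_needing_imm64(bool clearing) {
  for (int k = 0; k < 64; k++) {
    const uint64_t mask = clearing ? clear_mask(k) : set_mask(k);
    if (!fits_simm32(mask)) return k;
  }
  return 64;
}

// Both the OR mask and the AND mask first stop fitting at bit 31.
static_assert(first_bit_needing_imm64(false) == 31, "OR mask");
static_assert(first_bit_needing_imm64(true) == 31, "AND mask");
```

Under this model, every bit index from 31 to 63 needs the MOV + AND/OR pair for a constant mask, which is the range where a single BTR/BTS with an immediate bit index saves an instruction.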
>
>> (immPow2NotL is a bit misleading: I read it as "power of 2, but not a
>> long". What do you think about immL_NegPow2/immL_Pow2? Not sure how to
>> encode that it's > 2^32, but I would just skip it for now.)
>
> I agree with immL_NotPow2/immL_Pow2; for the encoding, see below.
One idea to try: you can move the "log2_long(n->get_long()) > ..." check
from the operand declaration to the instruction.
operand immL_Pow2() %{
// ...
predicate(is_power_of_2_long(n->get_long()));
...
operand immL_NotPow2() %{
// ...
predicate(is_power_of_2_long(~n->get_long()));
...
instruct btrL_mem_imm(memory dst, immL_NotPow2 con, rFlagsReg cr) %{
predicate(log2_long(~n->in(3)->in(2)->get_long()) > 30);
match(Set dst (StoreL dst (AndL (LoadL dst) con)));
...
instruct btsL_mem_imm(memory dst, immL_Pow2 con, rFlagsReg cr) %{
predicate(log2_long(n->in(3)->in(2)->get_long()) > 31);
match(Set dst (StoreL dst (OrL (LoadL dst) con)));
...
It looks more natural (though it requires more code) to do such
operation-specific dispatching on instructions rather than on operands.
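The proposed split can be modeled outside ADL. Below is a minimal C++ sketch under stated assumptions: is_pow2 and log2_of_pow2 are hypothetical stand-ins for is_power_of_2_long and log2_long, and the operand_* / *_matches functions are hypothetical names for the operand and instruct predicates, using the thresholds (> 30 for BTR, > 31 for BTS) from the sketches above. The operand only checks the shape of the constant; each instruction adds its own bit-position threshold.

```cpp
#include <cstdint>

// Hypothetical model of the two-level check; this is not ADL or HotSpot code.

// Stand-in for is_power_of_2_long(): exactly one bit set.
constexpr bool is_pow2(uint64_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Stand-in for log2_long() applied to a power of two: index of the set bit.
constexpr int log2_of_pow2(uint64_t v) {
  int k = 0;
  while (v > 1) { v >>= 1; k++; }
  return k;
}

// Operand level: only the shape of the constant is checked.
constexpr bool operand_immL_Pow2(uint64_t c)    { return is_pow2(c); }
constexpr bool operand_immL_NotPow2(uint64_t c) { return is_pow2(~c); }

// Instruction level: each rule applies its own bit-position threshold
// (values copied from the btrL_mem_imm / btsL_mem_imm sketches).
constexpr bool btsL_mem_imm_matches(uint64_t c) {
  return operand_immL_Pow2(c) && log2_of_pow2(c) > 31;
}
constexpr bool btrL_mem_imm_matches(uint64_t c) {
  return operand_immL_NotPow2(c) && log2_of_pow2(~c) > 30;
}
```

For example, 1 << 4 still satisfies the immL_Pow2 operand check but fails the btsL_mem_imm predicate, so in this model it would be left to the ordinary OR rule, while the operand itself stays reusable by other instructions that need a different threshold.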
Best regards,
Vladimir Ivanov
More information about the hotspot-compiler-dev mailing list