RFR 8214239 (S): Missing x86_64.ad patterns for clearing and setting long vector bits

Vladimir Ivanov vladimir.x.ivanov at oracle.com
Tue Nov 12 13:32:40 UTC 2019


Thanks for the clarifications, Bernard.

>>> http://cr.openjdk.java.net/~bsrbnd/jdk8214239/webrev.00/
>>
>> I don't see cases for non-constant masks John suggested covered. Have
>> you tried to implement them? Any problems encountered or did you just
>> leave them for future improvement?
> 
> I didn't experiment with non-constant masks yet, which is why I left
> them for future improvements (as told to John).

Sounds good.


>> Why do you limit the optimization to bits in upper half? Is it because
>> ordinary andq/orq instructions work well for the rest? If that's the
>> case, it deserves a comment.
> 
> On a pure specification basis (Intel optimization manual that Sandhya
> pointed me to), AND/OR and BTR/BTS have the same latency=1, with
> slightly better throughput for the former, and when experimenting with
> values <= 32-bit, I didn't observe much difference, or only an
> imperceptible one in favor of AND/OR. But with pure 64-bit values, the
> benefit is much more evident because BTR/BTS replaces both a MOV and
> an AND/OR which is simply better on specification basis (latency=1 for
> BTR/BTS vs latency=1+1 for MOV + AND/OR). So, I'll update the comments
> as next:
> 
> // n should be a pure 64-bit power of 2 immediate because AND/OR works
> // well enough for 8/32-bit values.
> // n should be a pure 64-bit immediate given that not(n) is a power of
> // 2 because AND/OR works well enough for 8/32-bit values.

Looks good.
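
To illustrate the MOV + AND/OR point quoted above, here is a minimal standalone C++ sketch (not HotSpot code). It assumes the usual x86-64 encoding rule that 64-bit AND/OR accept only a sign-extended 32-bit immediate, so a mask outside that range has to be materialized in a register first. The helper names fits_simm32, set_mask and clear_mask are made up for this example.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of HotSpot): true if `v` can be encoded as
// the sign-extended 32-bit immediate accepted by 64-bit AND/OR.
constexpr bool fits_simm32(int64_t v) {
    return v >= std::numeric_limits<int32_t>::min() &&
           v <= std::numeric_limits<int32_t>::max();
}

// Mask with only bit `k` set (k in 0..63), as used by OR to set a bit.
constexpr int64_t set_mask(int k) {
    return static_cast<int64_t>(uint64_t{1} << k);
}

// Mask with only bit `k` cleared, as used by AND to clear a bit.
constexpr int64_t clear_mask(int k) {
    return ~set_mask(k);
}
```

For a bit in the upper half, for example bit 40, neither set_mask(40) nor clear_mask(40) fits, which is the case where a single BTS/BTR with an 8-bit bit index avoids the extra MOV.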

> 
>> (immPow2NotL is a bit misleading: I read it as "power of 2, but not a
>> long". What do you think about immL_NegPow2/immL_Pow2? Not sure how to
>> encode that it's > 2^32, but I would just skip it for now.)
> 
> I agree with immL_NotPow2/immL_Pow2; for the encoding, see below.

One idea to try: you can move the "log2_long(n->get_long()) > ..." check
from the operand declaration to the instruction.

operand immL_Pow2() %{
   // ...
   predicate(is_power_of_2_long(n->get_long()));
   ...

operand immL_NotPow2() %{
   // ...
   predicate(is_power_of_2_long(~n->get_long()));
   ...

instruct btrL_mem_imm(memory dst, immL_NotPow2 con, rFlagsReg cr) %{
   predicate(log2_long(~n->in(3)->in(2)->get_long()) > 30);
   match(Set dst (StoreL dst (AndL (LoadL dst) con)));
...

instruct btsL_mem_imm(memory dst, immL_Pow2 con, rFlagsReg cr) %{
   predicate(log2_long(n->in(3)->in(2)->get_long()) > 31);
   match(Set dst (StoreL dst (OrL (LoadL dst) con)));
...

It looks more natural (though it requires more code) to do such
operation-specific dispatching on instructions rather than on operands.
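
The split can be modelled outside of ADL with a small C++ sketch. This is only an illustration under assumptions: is_pow2 and log2_floor are simplified stand-ins for is_power_of_2_long and log2_long (they treat the value as unsigned, which may differ from the real helpers for bit 63), and the operand_/match_ function names are invented for the example. The thresholds 30 and 31 are the ones from the snippet above.

```cpp
#include <cstdint>

// Stand-in for is_power_of_2_long (assumption: value treated as unsigned).
bool is_pow2(int64_t v) {
    uint64_t u = static_cast<uint64_t>(v);
    return u != 0 && (u & (u - 1)) == 0;
}

// Stand-in for log2_long: index of the highest set bit; requires v != 0.
int log2_floor(int64_t v) {
    uint64_t u = static_cast<uint64_t>(v);
    int i = 0;
    while (u >>= 1) ++i;
    return i;
}

// Operand-level checks: only the shape of the constant.
bool operand_immL_Pow2(int64_t con)    { return is_pow2(con); }
bool operand_immL_NotPow2(int64_t con) { return is_pow2(~con); }

// Instruction-level checks: operand shape plus the operation-specific
// bit-index threshold.
bool match_bts(int64_t con) {
    return operand_immL_Pow2(con) && log2_floor(con) > 31;
}
bool match_btr(int64_t con) {
    return operand_immL_NotPow2(con) && log2_floor(~con) > 30;
}
```

With this layout the operands stay reusable by other instructions, and each instruction carries its own threshold.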

Best regards,
Vladimir Ivanov

