JDK-8214239 (?): Missing x86_64.ad patterns for clearing and setting long vector bits

John Rose john.r.rose at oracle.com
Thu Nov 7 01:01:34 UTC 2019


I recently saw LLVM compile a classification switch into a really tidy BT (bit test)
instruction, something like this:

  switch (ch) {
  case ';': case '/': case '.': case '[':  return 0;
  default: return 1;
  }
=>
  … range check …
  movabsq	$0x200000002003, %rcx
  btq	%rdi, %rcx

It made me wish for this change, plus some further work on how switch
itself is compiled.  Given Sandhya’s report, though, BTR may only be
helpful in limited cases.  In the case above, the bit-test instruction
subsumes a shift instruction.
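
For reference, the constant in the listing is consistent with a mask indexed
by ch - '.': bits 0, 1, 13 and 45 are set, matching '.', '/', ';' and '['.
A minimal Java sketch of the same idea follows.  The class and method names
are hypothetical, and the subtract-then-compare range check is only an
assumption about what the elided range check does.

```java
final class BitmaskClassify {
    // Bit (ch - '.') is set for each of '.', '/', ';', '['.
    static final long MASK = 0x200000002003L;

    // The original switch, kept as the reference behavior.
    static int classifySwitch(int ch) {
        switch (ch) {
        case ';': case '/': case '.': case '[': return 0;
        default: return 1;
        }
    }

    // Assumed equivalent: range check, then test one bit of the mask.
    static int classifyMask(int ch) {
        int idx = ch - '.';
        if (idx < 0 || idx > '[' - '.') {
            return 1;                       // outside the mask's range
        }
        // The shift-and-test here is what a single bit-test instruction can cover.
        return ((MASK >>> idx) & 1L) != 0 ? 0 : 1;
    }
}
```

Whether a given JIT turns the shift-and-test into a bit-test instruction
depends on its matching rules; the sketch only shows the source-level shape.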

Bernard’s JMH experiment suggests something else is going on besides
the throughput difference which Sandhya cites.  Maybe it’s a benchmark
artifact, or maybe it’s a good effect from smaller code.  I suggest jamming
more back-to-back BTRs together, to see if the throughput effect appears.
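
A rough sketch of what such a back-to-back kernel might look like in Java
follows.  The class and method names are hypothetical, and this is not
Bernard’s benchmark.  Each operation touches a different array element, since
several constant masks applied to one value could simply be folded into a
single AND.  The bit positions are above 31 so that the masks do not fit a
32-bit immediate.  Whether the JIT emits BTR here depends on the proposed
patterns being present.

```java
final class BackToBackBits {
    // Clears one high bit in each of four independent longs.
    // Assumes a.length >= 4.
    static void clearEach(long[] a) {
        a[0] &= ~(1L << 33);
        a[1] &= ~(1L << 40);
        a[2] &= ~(1L << 47);
        a[3] &= ~(1L << 55);
    }

    // Sets one high bit in each of four independent longs.
    // Assumes a.length >= 4.
    static void setEach(long[] a) {
        a[0] |= 1L << 33;
        a[1] |= 1L << 40;
        a[2] |= 1L << 47;
        a[3] |= 1L << 55;
    }
}
```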

— John

On Nov 6, 2019, at 4:34 PM, Viswanathan, Sandhya <sandhya.viswanathan at intel.com> wrote:
> 
> Hi Vladimir/Bernard,
> 
> I don’t see any restrictions or limitations on these instructions, other than that the “long” operation is only supported in 64-bit mode, as usual, so it should be restricted to the 64-bit JVM.
> 
> The code size improvement that Bernard demonstrates is significant for operation on longs.
> 
> It looks like the throughput for AND/OR is better than for BTR/BTS (0.25 vs 0.5), though. Please refer to Table C-17 in the document below:


