Bit set intrinsic

B. Blaser bsrbnd at gmail.com
Mon Nov 5 21:21:02 UTC 2018


On Wed, 31 Oct 2018 at 15:51, B. Blaser <bsrbnd at gmail.com> wrote:
>
> Last but not least, I implemented the c2 part (using the 8-bit
> AND/OR variant) to do sharper comparisons also on non-concurrent
> execution:
>
> http://cr.openjdk.java.net/~bsrbnd/boolpack/webrev.02/
>
> With 10e6 iterations the lock latency seems to be more or less
> negligible and removing it would make the intrinsic about 10% faster
> than BitSet without synchronization.

This difference actually seems to be due to the following ANDB/ORB
patterns, which are missing from x86_64.ad:

instruct andB_mem_rReg(memory dst, rRegI src, rFlagsReg cr)
%{
  match(Set dst (StoreB dst (AndI (LoadB dst) src)));
  effect(KILL cr);

  ins_cost(150);
  format %{ "andb    $dst, $src\t# byte" %}
  opcode(0x20);
  ins_encode(REX_breg_mem(src, dst), OpcP, reg_mem(src, dst));
  ins_pipe(ialu_mem_reg);
%}

instruct orB_mem_rReg(memory dst, rRegI src, rFlagsReg cr)
%{
  match(Set dst (StoreB dst (OrI (LoadB dst) src)));
  effect(KILL cr);

  ins_cost(150);
  format %{ "orb    $dst, $src\t# byte" %}
  opcode(0x08);
  ins_encode(REX_breg_mem(src, dst), OpcP, reg_mem(src, dst));
  ins_pipe(ialu_mem_reg);
%}

The next two lines:
1) bits[index>>>3] |= (byte)(1 << (index & 7));
2) bits[index>>>3] &= (byte)~(1 << (index & 7));

were assembled as:
1)
024       movsbl  R8, [RSI + #16 + R10]    # byte
02a       movl    R11, #1    # int
030       sall    R11, RCX
033       movsbl  R11, R11    # i2b
037       orl     R11, R8    # int
03a       movb    [RSI + #16 + R10], R11    # byte
2)
024       movsbl  R8, [RSI + #16 + R10]    # byte
02a       movl    R11, #1    # int
030       sall    R11, RCX
033       not    R11
036       movsbl  R11, R11    # i2b
03a       andl    R8, R11    # int
03d       movb    [RSI + #16 + R10], R8    # byte

instead of:
1)
024       movl    R11, #1    # int
02a       sall    R11, RCX
02d       movsbl  R11, R11    # i2b
031       orb    [RSI + #16 + R10], R11    # byte
2)
024       movl    R11, #1    # int
02a       sall    R11, RCX
02d       not    R11
030       movsbl  R11, R11    # i2b
034       andb    [RSI + #16 + R10], R11    # byte
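
For anyone who wants to reproduce the listings, a minimal sketch of a
byte-packed bit set built around the two statements might look like the
following. The class name BytePackedBits and its get() method are
hypothetical, chosen only for illustration; they are not taken from the
webrev. Only the bodies of set() and clear() come from the lines quoted
above.

```java
// Hypothetical illustration class; not part of the webrev.
final class BytePackedBits {
    private final byte[] bits;

    BytePackedBits(int nbits) {
        // One byte holds 8 bits; round up to a whole number of bytes.
        bits = new byte[(nbits + 7) >>> 3];
    }

    // Statement 1): byte load, int OR, byte store on the same element.
    void set(int index) {
        bits[index >>> 3] |= (byte) (1 << (index & 7));
    }

    // Statement 2): byte load, int AND with the inverted mask, byte store.
    void clear(int index) {
        bits[index >>> 3] &= (byte) ~(1 << (index & 7));
    }

    // Assumed helper, added here only to observe the result.
    boolean get(int index) {
        return (bits[index >>> 3] & (1 << (index & 7))) != 0;
    }

    public static void main(String[] args) {
        BytePackedBits b = new BytePackedBits(16);
        b.set(9);
        System.out.println(b.get(9));
        b.clear(9);
        System.out.println(b.get(9));
    }
}
```

The compound assignments promote the loaded byte to int, apply the
operation and narrow the result back to byte, which is the
StoreB/AndI/LoadB (or OrI) shape that the patterns above match.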

So, as a first step, I would create a JBS issue and send out an RFR on
hotspot-dev for this simple enhancement, if there are no objections.

Any opinion is welcome.

Thanks,
Bernard


More information about the compiler-dev mailing list