Bit set intrinsic

Mon Nov 5 21:24:12 UTC 2018

Just want to say that I like this effort. Please go ahead and create an
issue and send it out for review.

Roman

> On Wed, 31 Oct 2018 at 15:51, B. Blaser <bsrbnd at gmail.com> wrote:
>>
>> Last but not least, I implemented the C2 part (using the 8-bit
>> AND/OR variant) to allow sharper comparisons in non-concurrent
>> execution as well:
>>
>> http://cr.openjdk.java.net/~bsrbnd/boolpack/webrev.02/
>>
>> With 10e6 iterations the lock latency seems to be more or less
>> negligible and removing it would make the intrinsic about 10% faster
>> than BitSet without synchronization.
> 
> Which actually seems to be due to the following missing ANDB/ORB
> patterns in x86_64.ad:
> 
> instruct andB_mem_rReg(memory dst, rRegI src, rFlagsReg cr)
> %{
>   match(Set dst (StoreB dst (AndI (LoadB dst) src)));
>   effect(KILL cr);
> 
>   ins_cost(150);
>   format %{ "andb    $dst, $src\t# byte" %}
>   opcode(0x20);
>   ins_encode(REX_breg_mem(src, dst), OpcP, reg_mem(src, dst));
>   ins_pipe(ialu_mem_reg);
> %}
> 
> instruct orB_mem_rReg(memory dst, rRegI src, rFlagsReg cr)
> %{
>   match(Set dst (StoreB dst (OrI (LoadB dst) src)));
>   effect(KILL cr);
> 
>   ins_cost(150);
>   format %{ "orb    $dst, $src\t# byte" %}
>   opcode(0x08);
>   ins_encode(REX_breg_mem(src, dst), OpcP, reg_mem(src, dst));
>   ins_pipe(ialu_mem_reg);
> %}
> 
> The next two lines:
> 1) bits[index>>>3] |= (byte)(1 << (index & 7));
> 2) bits[index>>>3] &= (byte)~(1 << (index & 7));
> 
> were assembled as:
> 1)
> 024       movsbl  R8, [RSI + #16 + R10]    # byte
> 02a       movl    R11, #1    # int
> 030       sall    R11, RCX
> 033       movsbl  R11, R11    # i2b
> 037       orl     R11, R8    # int
> 03a       movb    [RSI + #16 + R10], R11    # byte
> 2)
> 024       movsbl  R8, [RSI + #16 + R10]    # byte
> 02a       movl    R11, #1    # int
> 030       sall    R11, RCX
> 033       not    R11
> 036       movsbl  R11, R11    # i2b
> 03a       andl    R8, R11    # int
> 03d       movb    [RSI + #16 + R10], R8    # byte
> 
> instead of:
> 1)
> 024       movl    R11, #1    # int
> 02a       sall    R11, RCX
> 02d       movsbl  R11, R11    # i2b
> 031       orb    [RSI + #16 + R10], R11    # byte
> 2)
> 024       movl    R11, #1    # int
> 02a       sall    R11, RCX
> 02d       not    R11
> 030       movsbl  R11, R11    # i2b
> 034       andb    [RSI + #16 + R10], R11    # byte
> 
> So, as a first step, I would probably create a JBS issue and send out
> an RFR on hotspot-dev for this simple enhancement, if there are no
> objections.
> 
> Any opinion is welcome.
> 
> Thanks,
> Bernard
>
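
For anyone following along, the two byte-wise updates quoted above can be
wrapped in a small class. This is only a sketch: the class name
BytePackedBits and its methods are hypothetical and are not taken from the
webrev. It merely shows the Java source shape (load byte, AND/OR, store
byte) that the proposed andB_mem_rReg/orB_mem_rReg patterns are meant to
match.

```java
// Hypothetical illustration class; not part of the webrev.
final class BytePackedBits {
    // One bit per index, eight indices packed into each byte.
    private final byte[] bits;

    BytePackedBits(int nbits) {
        // Round up to a whole number of bytes.
        bits = new byte[(nbits + 7) >>> 3];
    }

    // Same expression as line 1) in the quoted message.
    void set(int index) {
        bits[index >>> 3] |= (byte) (1 << (index & 7));
    }

    // Same expression as line 2) in the quoted message.
    void clear(int index) {
        bits[index >>> 3] &= (byte) ~(1 << (index & 7));
    }

    // Read back a single bit; the byte is sign-extended to int,
    // but the mask only looks at the low eight bits.
    boolean get(int index) {
        return (bits[index >>> 3] & (1 << (index & 7))) != 0;
    }

    public static void main(String[] args) {
        BytePackedBits b = new BytePackedBits(64);
        b.set(10);
        System.out.println(b.get(10));
        b.clear(10);
        System.out.println(b.get(10));
    }
}
```

Whether C2 actually emits a single orb/andb on memory for set() and
clear() depends on the matcher rules being present, so the generated code
should be checked rather than assumed.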