RFR: 8256431: [PPC64] Implement Base64 encodeBlock() for Power64-LE

Martin Doerr mdoerr at openjdk.java.net
Fri Dec 4 15:18:17 UTC 2020


On Thu, 3 Dec 2020 20:50:34 GMT, Corey Ashford <github.com+51754783+CoreyAshford at openjdk.org> wrote:

>> src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 4036:
>> 
>>> 4034:        // 5.4X slower.  So on P9, we replace lxvl with a conditional
>>> 4035:        // unaligned load sequence, based on the alignment of the address
>>> 4036:        // and the length of the data requested.
>> 
>> This code looks like it is more than 5.4X slower than fast lxvl and hence slower than slow lxvl.
>
> I spent quite a lot of time benchmarking different variations of replacement code, and arrived at this one.  In about 40% of the cases, lxvl outperforms this replacement by a bit, but in 60% of the cases, the replacement does quite a lot better than lxvl.  I have the spreadsheets that show it.  The 5.4X number is conservative because it includes overhead of the benchmark loop used to test it, so lxvl may in fact be quite a lot more than 5.4X slower.
> 
> That said, I don't really like having this code in there, and would be happy to get rid of it.  Since it's not used in the main loop, I'm guessing using just lxvl might not impact overall performance very much.  So I'm a bit on the fence about it, to be honest.

Thanks for providing this additional information. I'm sorry that I have only very limited time for this at the moment.
Hmm... I still think it's a bit lengthy for a code snippet which is only used outside of the loop.
On the other hand, it could be reused elsewhere.
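For readers who haven't used lxvl: it loads a variable number of bytes and zero-fills the rest of the register. The byte count comes from the most significant byte of RB (bits 0:7 in Power bit numbering) and is clamped to 16. The C++ below is a hypothetical scalar model of that data movement, in memory byte order only (element placement in LE mode is ignored). It sits next to the full-load-then-mask alternative that any replacement sequence has to match. Both helper names are made up for illustration. The second function is only safe when all 16 bytes at `ea` are readable, which lxvl itself does not require, and that is why a replacement has to look at the address alignment at all.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

using Bytes16 = std::array<uint8_t, 16>;

// Hypothetical scalar model of lxvl's data movement, in memory byte order.
// The byte count is the most significant byte of RB (bits 0:7 in Power bit
// numbering); counts above 16 are clamped to 16, and a count of 0 loads nothing.
Bytes16 lxvl_model(const uint8_t* ea, uint64_t rb) {
  unsigned nb = static_cast<unsigned>(rb >> 56);
  if (nb > 16) nb = 16;
  Bytes16 r{};                        // bytes nb..15 stay zero
  std::copy(ea, ea + nb, r.begin());  // reads only ea[0..nb-1]
  return r;
}

// Hypothetical alternative: load all 16 bytes, then clear bytes len..15.
// Only safe if ea[0..15] is readable memory, which lxvl does not need.
Bytes16 full_load_then_mask(const uint8_t* ea, unsigned len) {
  Bytes16 r;
  std::copy(ea, ea + 16, r.begin());
  for (unsigned i = len; i < 16; ++i) r[i] = 0;
  return r;
}
```

Because the count lives in the top byte, a caller holding `len` in the low bits of a register would need to shift it left by 56 before passing it as RB; a small value left in the low bits makes lxvl load nothing.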

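I believe it's possible to create the mask more quickly. Unfortunately, it doesn't work for len==16: lvsl/lvsr only look at the low 4 bits of len, so 16 produces the same all-zero mask as 0, and that case has to branch around the masking. It does cover the len==0 case, though: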
`cmpdi(CCR0,len,16)
bge(CCR0,skip_masking)

bind(no_load)
vspltisb(v0,0)
vspltisb(v1,-1)    // 5-bit signed immediate: -1 splats 0xFF into every byte
// BigEndian:
                   // Example: len=16 or 0               len=1                              len=15
lvsr(v2,len)       // 0x101112131415161718191A1B1C1D1E1F 0x0F101112131415161718191A1B1C1D1E 0x0102030405060708090A0B0C0D0E0F10
vperm(v2,v1,v0,v2) // 0x00000000000000000000000000000000 0xFF000000000000000000000000000000 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF00
// LittleEndian:
lvsl(v2,len)       // 0x000102030405060708090A0B0C0D0E0F 0x0102030405060708090A0B0C0D0E0F10 0x0F101112131415161718191A1B1C1D1E
vperm(v2,v0,v1,v2) // 0x00000000000000000000000000000000 0x000000000000000000000000000000FF 0x00FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

vand(result,v2,result)
bind(skip_masking)`
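To check the byte tables in the comments, here is a small scalar emulation in C++ of the instructions involved. It is a sketch under stated assumptions. A register is modeled as 16 bytes in the ISA's big-endian element numbering, and in little-endian mode the memory byte at offset i is assumed to land in register byte 15 - i after a full load. The LittleEndian case is emulated with lvsl, because that instruction produces the permute-control values listed in the LittleEndian comment row. The function names mirror the mnemonics but are plain illustrative C++, not HotSpot's Assembler API.

```cpp
#include <array>
#include <cstdint>

// One 128-bit vector register, modeled as 16 bytes in the ISA's big-endian
// element numbering: v[0] is the leftmost byte of the 0x... values above.
using Vec = std::array<uint8_t, 16>;

// vspltisb: splat a sign-extended 5-bit immediate into every byte.
Vec vspltisb(int si5) {
  Vec r;
  r.fill(static_cast<uint8_t>(si5));
  return r;
}

// lvsl: bytes sh..sh+15, where sh is the low 4 bits of the effective address.
Vec lvsl(uint64_t ea) {
  unsigned sh = static_cast<unsigned>(ea & 0xF);
  Vec r;
  for (unsigned i = 0; i < 16; ++i) r[i] = static_cast<uint8_t>(sh + i);
  return r;
}

// lvsr: bytes (16 - sh)..(31 - sh), same 4-bit sh.
Vec lvsr(uint64_t ea) {
  unsigned sh = static_cast<unsigned>(ea & 0xF);
  Vec r;
  for (unsigned i = 0; i < 16; ++i) r[i] = static_cast<uint8_t>(16 - sh + i);
  return r;
}

// vperm: the low 5 bits of each control byte select from the 32 bytes a||b.
Vec vperm(const Vec& a, const Vec& b, const Vec& c) {
  Vec r;
  for (unsigned i = 0; i < 16; ++i) {
    unsigned idx = c[i] & 0x1F;
    r[i] = idx < 16 ? a[idx] : b[idx - 16];
  }
  return r;
}

// BigEndian sequence: vperm(v2, v1 = all 0xFF, v0 = all 0x00, lvsr(len)).
Vec mask_be(uint64_t len) {
  return vperm(vspltisb(-1), vspltisb(0), lvsr(len));
}

// LittleEndian sequence: vperm(v2, v0 = all 0x00, v1 = all 0xFF, lvsl(len)).
Vec mask_le(uint64_t len) {
  return vperm(vspltisb(0), vspltisb(-1), lvsl(len));
}

// True if both masks keep exactly the first len bytes in memory order.
// BE: memory byte i sits in register byte i; LE: in register byte 15 - i.
bool masks_ok(uint64_t len) {
  const Vec be = mask_be(len);
  const Vec le = mask_le(len);
  for (unsigned i = 0; i < 16; ++i) {
    const uint8_t want = i < len ? 0xFF : 0x00;
    if (be[i] != want || le[15 - i] != want) return false;
  }
  return true;
}
```

With this model, `masks_ok(len)` holds for every len from 0 to 15, while `mask_be(16)` and `mask_le(16)` come out all-zero, which is why len>=16 has to branch to skip_masking before the vand. Feeding lvsr(1) into the little-endian vperm instead of lvsl(1) yields the same mask as `mask_le(15)`, i.e. the lengths come out mirrored.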

-------------

PR: https://git.openjdk.java.net/jdk/pull/1245

