RFR(S): [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding

Wed Jun 24 00:40:03 UTC 2020

Currently in java.util.Base64, there is a HotSpotIntrinsicCandidate and 
API for encodeBlock, but none for decoding.  This means that only 
encoding gets acceleration from the underlying CPU's vector hardware.

I'd like to propose adding a new intrinsic for decodeBlock.  The 
considerations I have for this new intrinsic's API:

  * Don't make any assumptions about the underlying capability of the 
hardware.  For example, do not impose any specific block size granularity.

  * Don't assume the underlying intrinsic can handle isMIME or isURL 
modes, but also let it decide whether it will process the data 
regardless of the settings of the two booleans.

  * Any remaining data that is not processed by the intrinsic will be 
processed by the pure Java implementation.  This allows the intrinsic to 
process whatever block sizes it's good at without the complexity of 
handling the end fragments.

  * If any illegal character is discovered in the decoding process, the 
intrinsic will simply return -1, instead of requiring it to throw a 
proper exception from the context of the intrinsic.  In the event of 
getting a -1 returned from the intrinsic, the Java Base64 library code 
simply calls the pure Java implementation to have it find the error and 
properly throw an exception.  This is a performance trade-off in the 
case of an error (which I expect to be very rare).

  * One thought I have for a further optimization (not implemented in 
the current patch) is that when the intrinsic decides not to process a 
block because of some combination of isURL and isMIME settings it 
doesn't handle, it could return extra bits in the return code, encoded 
as a negative number.  For example:

Illegal_Base64_char   = 0b001;
isMIME_unsupported    = 0b010;
isURL_unsupported     = 0b100;

These can be OR'd together as needed and then negated (flip the sign). 
The Base64 library code could then cache these flags, so it will know 
not to call the intrinsic again when a later decodeBlock call requests 
a mode the intrinsic has already reported as unsupported.  This will 
save the performance hit of calling the intrinsic when it is guaranteed 
to fail.
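
To make the calling contract and the flag idea above concrete, here is 
a rough sketch.  It is not the attached patch.  The class name, 
fakeIntrinsic, refusedModes and intrinsicCalls are all hypothetical, and 
the stand-in "intrinsic" is plain Java.  It also assumes that a 
non-negative return value is the number of bytes written to dst, which 
the proposal above does not spell out.  The public java.util.Base64 
decoders play the role of the pure Java implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class DecodeBlockSketch {
    // Flag values taken from the proposal above.
    static final int ILLEGAL_BASE64_CHAR = 0b001;
    static final int IS_MIME_UNSUPPORTED = 0b010;
    static final int IS_URL_UNSUPPORTED  = 0b100;

    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    int refusedModes = 0;    // hypothetical cache of modes the intrinsic refused
    int intrinsicCalls = 0;  // counts calls, for illustration only

    // Hypothetical stand-in for an intrinsic that supports neither MIME nor
    // URL mode. Assumption: a non-negative result is the number of bytes
    // written to dst; a negative result is the negated OR of the flags.
    int fakeIntrinsic(byte[] src, byte[] dst, boolean isURL, boolean isMIME) {
        intrinsicCalls++;
        int flags = (isMIME ? IS_MIME_UNSUPPORTED : 0)
                  | (isURL ? IS_URL_UNSUPPORTED : 0);
        if (flags != 0) return -flags;
        int dp = 0;
        // Handle only whole 4-character groups that end without padding;
        // the end fragment is left for the pure Java code.
        for (int sp = 0; sp + 4 <= src.length && src[sp + 3] != '='; sp += 4) {
            int bits = 0;
            for (int i = 0; i < 4; i++) {
                int v = ALPHABET.indexOf(src[sp + i]);
                if (v < 0) return -ILLEGAL_BASE64_CHAR;  // i.e. -1
                bits = (bits << 6) | v;
            }
            dst[dp++] = (byte) (bits >> 16);
            dst[dp++] = (byte) (bits >> 8);
            dst[dp++] = (byte) bits;
        }
        return dp;
    }

    byte[] decode(byte[] src, boolean isURL, boolean isMIME) {
        Base64.Decoder java = isMIME ? Base64.getMimeDecoder()
                            : isURL  ? Base64.getUrlDecoder()
                                     : Base64.getDecoder();
        int modeFlags = (isMIME ? IS_MIME_UNSUPPORTED : 0)
                      | (isURL ? IS_URL_UNSUPPORTED : 0);
        if ((refusedModes & modeFlags) != 0) {
            return java.decode(src);  // known refusal: skip the intrinsic
        }
        byte[] dst = new byte[(src.length + 3) / 4 * 3];
        int n = fakeIntrinsic(src, dst, isURL, isMIME);
        if (n < 0) {
            // Remember refused modes, then let pure Java decode the input
            // or find the bad character and throw the proper exception.
            refusedModes |= -n & (IS_MIME_UNSUPPORTED | IS_URL_UNSUPPORTED);
            return java.decode(src);
        }
        // Assumption: n bytes written means n / 3 * 4 source chars consumed.
        byte[] tail = java.decode(Arrays.copyOfRange(src, n / 3 * 4, src.length));
        byte[] out = Arrays.copyOf(dst, n + tail.length);
        System.arraycopy(tail, 0, out, n, tail.length);
        return out;
    }

    String decodeText(String b64, boolean isURL, boolean isMIME) {
        byte[] src = b64.getBytes(StandardCharsets.US_ASCII);
        return new String(decode(src, isURL, isMIME), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        DecodeBlockSketch s = new DecodeBlockSketch();
        System.out.println(s.decodeText("SGVsbG8gd29ybGQ=", false, false));
        s.decodeText("SGVs\r\nbG8=", false, true);  // refused once, then cached
        s.decodeText("SGVs\r\nbG8=", false, true);  // intrinsic not called again
        System.out.println("intrinsic calls: " + s.intrinsicCalls);
    }
}
```

Note that with these values the plain illegal-character case is still 
-1, so a caller that only checks for a negative result keeps working.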

I've tested the attached patch with an actual intrinsic coded up for 
Power9/Power10, but those runtime intrinsics and arch-specific patches 
aren't attached today.  I want to get some consensus on the 
library-level intrinsic API first.

Also attached is a simple test case to check that the new intrinsic API 
doesn't break anything.
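
For readers without the attachment, a check of this kind might look 
roughly like the sketch below.  This is not the attached 
TestBase64.java; the class and helper names are made up for 
illustration, and it only uses the public java.util.Base64 API.  It 
round-trips random data of many lengths and confirms that an illegal 
character is still reported as an IllegalArgumentException.

```java
import java.util.Arrays;
import java.util.Base64;
import java.util.Random;

public class Base64RoundTripSketch {
    // True if decoding the encoded form gives back the original bytes.
    static boolean roundTrips(Base64.Encoder enc, Base64.Decoder dec, byte[] data) {
        return Arrays.equals(data, dec.decode(enc.encode(data)));
    }

    // True if the decoder rejects the input with IllegalArgumentException.
    static boolean rejects(Base64.Decoder dec, String bad) {
        try {
            dec.decode(bad);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);  // fixed seed so failures are repeatable
        for (int len = 0; len <= 1024; len++) {
            byte[] data = new byte[len];
            rnd.nextBytes(data);
            boolean ok = roundTrips(Base64.getEncoder(), Base64.getDecoder(), data)
                      && roundTrips(Base64.getUrlEncoder(), Base64.getUrlDecoder(), data)
                      && roundTrips(Base64.getMimeEncoder(), Base64.getMimeDecoder(), data);
            if (!ok) throw new AssertionError("round trip failed at length " + len);
        }
        // '*' is in neither the basic nor the URL-safe alphabet.
        if (!rejects(Base64.getDecoder(), "SGVs*G8g")
                || !rejects(Base64.getUrlDecoder(), "SGVs*G8g")) {
            throw new AssertionError("illegal character was not rejected");
        }
        System.out.println("ok");
    }
}
```

The MIME decoder is left out of the rejection check because it skips 
characters outside the Base64 alphabet instead of throwing.  A run this 
short may also finish before the JIT compiles the decode path, so a 
real test would probably need more iterations to exercise an intrinsic.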

I'm open to any comments about this.

Thanks for your consideration,

- Corey

Corey Ashford
IBM Systems, Linux Technology Center, OpenJDK team
cjashfor at us dot ibm dot com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: decodeBlock_api-20200623.patch
Type: text/x-patch
Size: 3953 bytes
Desc: not available
URL: <https://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20200623/c1a7029f/decodeBlock_api-20200623.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: TestBase64.java
Type: text/x-java
Size: 1793 bytes
Desc: not available
URL: <https://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20200623/c1a7029f/TestBase64.java>