RFR(S): [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding
Currently in java.util.Base64, there is a HotSpotIntrinsicCandidate and API for encodeBlock, but none for decoding. This means that only encoding gets acceleration from the underlying CPU's vector hardware. I'd like to propose adding a new intrinsic for decodeBlock. The considerations I have for this new intrinsic's API: * Don't make any assumptions about the underlying capability of the hardware. For example, do not impose any specific block size granularity. * Don't assume the underlying intrinsic can handle isMIME or isURL modes, but also let them decide if they will process the data regardless of the settings of the two booleans. * Any remaining data that is not processed by the intrinsic will be processed by the pure Java implementation. This allows the intrinsic to process whatever block sizes it's good at without the complexity of handling the end fragments. * If any illegal character is discovered in the decoding process, the intrinsic will simply return -1, instead of requiring it to throw a proper exception from the context of the intrinsic. In the event of getting a -1 returned from the intrinsic, the Java Base64 library code simply calls the pure Java implementation to have it find the error and properly throw an exception. This is a performance trade-off in the case of an error (which I expect to be very rare). * One thought I have for a further optimization (not implemented in the current patch), is that when the intrinsic decides not to process a block because of some combination of isURL and isMIME settings it doesn't handle, it could return extra bits in the return code, encoded as a negative number. For example: Illegal_Base64_char = 0b001; isMIME_unsupported = 0b010; isURL_unsupported = 0b100; These can be OR'd together as needed and then negated (flip the sign). The Base64 library code could then cache these flags, so it will know not to call the intrinsic again when another decodeBlock is requested but with an unsupported mode. This will save the performance hit of calling the intrinsic when it is guaranteed to fail. I've tested the attached patch with an actual intrinsic coded up for Power9/Power10, but those runtime intrinsics and arch-specific patches aren't attached today. I want to get some consensus on the library-level intrinsic API first. Also attached is a simple test case to test that the new intrinsic API doesn't break anything. I'm open to any comments about this. Thanks for your consideration, - Corey Corey Ashford IBM Systems, Linux Technology Center, OpenJDK team cjashfor at us dot ibm dot com
Hi Corey, Following is the issue I created. https://bugs.openjdk.java.net/browse/JDK-8248188 I will upload a webrev when you're ready as we talked in private. Best regards, Michihiro From: "Corey Ashford" <cjashfor@linux.ibm.com> To: "hotspot-compiler-dev@openjdk.java.net" <hotspot-compiler-dev@openjdk.java.net>, "ppc-aix-port-dev@openjdk.java.net" <ppc-aix-port-dev@openjdk.java.net> Cc: Michihiro Horie/Japan/IBM@IBMJP, Kazunori Ogata/Japan/IBM@IBMJP, joserz@br.ibm.com Date: 2020/06/24 09:40 Subject: RFR(S): [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding Currently in java.util.Base64, there is a HotSpotIntrinsicCandidate and API for encodeBlock, but none for decoding. This means that only encoding gets acceleration from the underlying CPU's vector hardware. I'd like to propose adding a new intrinsic for decodeBlock. The considerations I have for this new intrinsic's API: * Don't make any assumptions about the underlying capability of the hardware. For example, do not impose any specific block size granularity. * Don't assume the underlying intrinsic can handle isMIME or isURL modes, but also let them decide if they will process the data regardless of the settings of the two booleans. * Any remaining data that is not processed by the intrinsic will be processed by the pure Java implementation. This allows the intrinsic to process whatever block sizes it's good at without the complexity of handling the end fragments. * If any illegal character is discovered in the decoding process, the intrinsic will simply return -1, instead of requiring it to throw a proper exception from the context of the intrinsic. In the event of getting a -1 returned from the intrinsic, the Java Base64 library code simply calls the pure Java implementation to have it find the error and properly throw an exception. This is a performance trade-off in the case of an error (which I expect to be very rare). * One thought I have for a further optimization (not implemented in the current patch), is that when the intrinsic decides not to process a block because of some combination of isURL and isMIME settings it doesn't handle, it could return extra bits in the return code, encoded as a negative number. For example: Illegal_Base64_char = 0b001; isMIME_unsupported = 0b010; isURL_unsupported = 0b100; These can be OR'd together as needed and then negated (flip the sign). The Base64 library code could then cache these flags, so it will know not to call the intrinsic again when another decodeBlock is requested but with an unsupported mode. This will save the performance hit of calling the intrinsic when it is guaranteed to fail. I've tested the attached patch with an actual intrinsic coded up for Power9/Power10, but those runtime intrinsics and arch-specific patches aren't attached today. I want to get some consensus on the library-level intrinsic API first. Also attached is a simple test case to test that the new intrinsic API doesn't break anything. I'm open to any comments about this. Thanks for your consideration, - Corey Corey Ashford IBM Systems, Linux Technology Center, OpenJDK team cjashfor at us dot ibm dot com [attachment "decodeBlock_api-20200623.patch" deleted by Michihiro Horie/Japan/IBM] [attachment "TestBase64.java" deleted by Michihiro Horie/Japan/IBM]
The webrev is located here: http://cr.openjdk.java.net/~mhorie/8248188/webrev.00/ On 6/23/20 6:23 PM, Michihiro Horie wrote:
Hi Corey,
Following is the issue I created. https://bugs.openjdk.java.net/browse/JDK-8248188
I will upload a webrev when you're ready as we talked in private.
Best regards, Michihiro
Inactive hide details for "Corey Ashford" ---2020/06/24 09:40:10---Currently in java.util.Base64, there is a HotSpotIntrinsicCa"Corey Ashford" ---2020/06/24 09:40:10---Currently in java.util.Base64, there is a HotSpotIntrinsicCandidate and API for encodeBlock, but no
From: "Corey Ashford" <cjashfor@linux.ibm.com> To: "hotspot-compiler-dev@openjdk.java.net" <hotspot-compiler-dev@openjdk.java.net>, "ppc-aix-port-dev@openjdk.java.net" <ppc-aix-port-dev@openjdk.java.net> Cc: Michihiro Horie/Japan/IBM@IBMJP, Kazunori Ogata/Japan/IBM@IBMJP, joserz@br.ibm.com Date: 2020/06/24 09:40 Subject: RFR(S): [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding
------------------------------------------------------------------------
Currently in java.util.Base64, there is a HotSpotIntrinsicCandidate and API for encodeBlock, but none for decoding. This means that only encoding gets acceleration from the underlying CPU's vector hardware.
I'd like to propose adding a new intrinsic for decodeBlock. The considerations I have for this new intrinsic's API:
* Don't make any assumptions about the underlying capability of the hardware. For example, do not impose any specific block size granularity.
* Don't assume the underlying intrinsic can handle isMIME or isURL modes, but also let them decide if they will process the data regardless of the settings of the two booleans.
* Any remaining data that is not processed by the intrinsic will be processed by the pure Java implementation. This allows the intrinsic to process whatever block sizes it's good at without the complexity of handling the end fragments.
* If any illegal character is discovered in the decoding process, the intrinsic will simply return -1, instead of requiring it to throw a proper exception from the context of the intrinsic. In the event of getting a -1 returned from the intrinsic, the Java Base64 library code simply calls the pure Java implementation to have it find the error and properly throw an exception. This is a performance trade-off in the case of an error (which I expect to be very rare).
* One thought I have for a further optimization (not implemented in the current patch), is that when the intrinsic decides not to process a block because of some combination of isURL and isMIME settings it doesn't handle, it could return extra bits in the return code, encoded as a negative number. For example:
Illegal_Base64_char = 0b001; isMIME_unsupported = 0b010; isURL_unsupported = 0b100;
These can be OR'd together as needed and then negated (flip the sign). The Base64 library code could then cache these flags, so it will know not to call the intrinsic again when another decodeBlock is requested but with an unsupported mode. This will save the performance hit of calling the intrinsic when it is guaranteed to fail.
I've tested the attached patch with an actual intrinsic coded up for Power9/Power10, but those runtime intrinsics and arch-specific patches aren't attached today. I want to get some consensus on the library-level intrinsic API first.
Also attached is a simple test case to test that the new intrinsic API doesn't break anything.
I'm open to any comments about this.
Thanks for your consideration,
- Corey
Corey Ashford IBM Systems, Linux Technology Center, OpenJDK team cjashfor at us dot ibm dot com [attachment "decodeBlock_api-20200623.patch" deleted by Michihiro Horie/Japan/IBM] [attachment "TestBase64.java" deleted by Michihiro Horie/Japan/IBM]
Michihiro Horie uploaded a new revision of the Base64 decodeBlock intrinsic API for me: http://cr.openjdk.java.net/~mhorie/8248188/webrev.01/ It has the following changes with respect to the original one posted: * In the event of encountering a non-base64 character, instead of having a separate error code of -1, the intrinsic can now just return either 0, or the number of data bytes produced up to the point where the illegal base64 character was encountered. This reduces the number of special cases, and also provides a way to speed up the process of finding the bad character by the slower, pure-Java algorithm. * The isMIME boolean is removed from the API for two reasons: - The current API is not sufficient to handle the isMIME case, because there isn't a strict relationship between the number of input bytes and the number of output bytes, because there can be an arbitrary number of non-base64 characters in the source. - If an intrinsic only implements the (isMIME == false) case as ours does, it will always return 0 bytes processed, which will slightly slow down the normal path of processing an (isMIME == true) instantiation. - We considered adding a separate hotspot candidate for the (isMIME == true) case, but since we don't have an intrinsic implementation to test that, we decided to leave it as a future optimization. Comments and suggestions are welcome. Thanks for your consideration. - Corey On 6/23/20 6:23 PM, Michihiro Horie wrote:
Hi Corey,
Following is the issue I created. https://bugs.openjdk.java.net/browse/JDK-8248188
I will upload a webrev when you're ready as we talked in private.
Best regards, Michihiro
Inactive hide details for "Corey Ashford" ---2020/06/24 09:40:10---Currently in java.util.Base64, there is a HotSpotIntrinsicCa"Corey Ashford" ---2020/06/24 09:40:10---Currently in java.util.Base64, there is a HotSpotIntrinsicCandidate and API for encodeBlock, but no
From: "Corey Ashford" <cjashfor@linux.ibm.com> To: "hotspot-compiler-dev@openjdk.java.net" <hotspot-compiler-dev@openjdk.java.net>, "ppc-aix-port-dev@openjdk.java.net" <ppc-aix-port-dev@openjdk.java.net> Cc: Michihiro Horie/Japan/IBM@IBMJP, Kazunori Ogata/Japan/IBM@IBMJP, joserz@br.ibm.com Date: 2020/06/24 09:40 Subject: RFR(S): [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding
------------------------------------------------------------------------
Currently in java.util.Base64, there is a HotSpotIntrinsicCandidate and API for encodeBlock, but none for decoding. This means that only encoding gets acceleration from the underlying CPU's vector hardware.
I'd like to propose adding a new intrinsic for decodeBlock. The considerations I have for this new intrinsic's API:
* Don't make any assumptions about the underlying capability of the hardware. For example, do not impose any specific block size granularity.
* Don't assume the underlying intrinsic can handle isMIME or isURL modes, but also let them decide if they will process the data regardless of the settings of the two booleans.
* Any remaining data that is not processed by the intrinsic will be processed by the pure Java implementation. This allows the intrinsic to process whatever block sizes it's good at without the complexity of handling the end fragments.
* If any illegal character is discovered in the decoding process, the intrinsic will simply return -1, instead of requiring it to throw a proper exception from the context of the intrinsic. In the event of getting a -1 returned from the intrinsic, the Java Base64 library code simply calls the pure Java implementation to have it find the error and properly throw an exception. This is a performance trade-off in the case of an error (which I expect to be very rare).
* One thought I have for a further optimization (not implemented in the current patch), is that when the intrinsic decides not to process a block because of some combination of isURL and isMIME settings it doesn't handle, it could return extra bits in the return code, encoded as a negative number. For example:
Illegal_Base64_char = 0b001; isMIME_unsupported = 0b010; isURL_unsupported = 0b100;
These can be OR'd together as needed and then negated (flip the sign). The Base64 library code could then cache these flags, so it will know not to call the intrinsic again when another decodeBlock is requested but with an unsupported mode. This will save the performance hit of calling the intrinsic when it is guaranteed to fail.
I've tested the attached patch with an actual intrinsic coded up for Power9/Power10, but those runtime intrinsics and arch-specific patches aren't attached today. I want to get some consensus on the library-level intrinsic API first.
Also attached is a simple test case to test that the new intrinsic API doesn't break anything.
I'm open to any comments about this.
Thanks for your consideration,
- Corey
Corey Ashford IBM Systems, Linux Technology Center, OpenJDK team cjashfor at us dot ibm dot com [attachment "decodeBlock_api-20200623.patch" deleted by Michihiro Horie/Japan/IBM] [attachment "TestBase64.java" deleted by Michihiro Horie/Japan/IBM]
participants (2)
-
Corey Ashford
-
Michihiro Horie