RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding
Corey Ashford
cjashfor at linux.ibm.com
Wed Aug 26 16:50:05 UTC 2020
Thanks for your careful review, Martin. I will consider what you have
said, and reply with comments/questions and possibly a revised webrev if
I think I can satisfy your concerns.
Regards,
- Corey
On 8/26/20 8:26 AM, Doerr, Martin wrote:
> Hi Corey,
>
> I should explain my comments regarding Base64.java better.
>
>> Let's be precise: "should process a multiple of four" => "must process a
>> multiple of four"
> Did you try to support non-multiple of 4 and this was intended as recommendation?
> I think making it a requirement and simplifying the logic in decode0 is better.
> Or what's the benefit of the recommendation?
>
>>> If any illegal base64 bytes are encountered in the source by the
>>> intrinsic, the intrinsic can return a data length of zero or any
>>> number of bytes before the place where the illegal base64 byte
>>> was encountered.
>> I think this has a drawback. Somebody may use a debugger and want to stop
>> when throwing IllegalArgumentException. He should see the position which
>> matches the Java implementation.
> This is probably hard to understand. Let me try to explain it by example:
> 1. 80 Bytes get processed by the intrinsic and 60 Bytes written to the destination array.
> 2. The intrinsic sees an illegal base64 Byte and it returns 12 which is allowed by your specification.
> 3. The compiled method containing the intrinsic hits a safepoint (e.g. in the large while loop in decodeBlockSlow).
> 4. A JVMTI agent (debugger) reads dp and dst.
> 5. The person using the debugger gets angry because more bytes than dp were written into dst. The JVM didn't follow the specified behavior.
>
> I guess we can and should avoid it by specifying that the intrinsic needs to return the dp value matching the number of Bytes written.
>
> Best regards,
> Martin
>
>
>> -----Original Message-----
>> From: Doerr, Martin
>> Sent: Dienstag, 25. August 2020 15:38
>> To: Corey Ashford <cjashfor at linux.ibm.com>; Michihiro Horie
>> <HORIE at jp.ibm.com>
>> Cc: hotspot-compiler-dev at openjdk.java.net; core-libs-dev <core-libs-
>> dev at openjdk.java.net>; Kazunori Ogata <OGATAK at jp.ibm.com>;
>> joserz at br.ibm.com
>> Subject: RE: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and
>> API for Base64 decoding
>>
>> Hi Corey,
>>
>> thanks for proposing this change. I have comments and suggestions
>> regarding various files.
>>
>>
>> Base64.java
>>
>> This is the only file which needs another review from core-libs-dev.
>> First of all, I like the idea to use a HotSpotIntrinsicCandidate which can
>> consume as many bytes as the implementation wants.
>>
>> Comment before decodeBlock:
>> Let's be precise: "should process a multiple of four" => "must process a
>> multiple of four"
>>
>>> If any illegal base64 bytes are encountered in the source by the
>>> intrinsic, the intrinsic can return a data length of zero or any
>>> number of bytes before the place where the illegal base64 byte
>>> was encountered.
>> I think this has a drawback. Somebody may use a debugger and want to stop
>> when throwing IllegalArgumentException. He should see the position which
>> matches the Java implementation.
>>
>> Please note that the comment indentation differs from other comments.
>>
>> decode0: Final "else" after return is redundant.
>>
>>
>> stubGenerator_ppc.cpp
>>
>> "__vector" breaks AIX build!
>> Does it work on Big Endian linux with old gcc (we require 7.3.1, now)?
>> Please either support Big Endian properly or #ifdef it out.
>> What exactly does it on linux?
>> I remember that we had tried such prefixes but were not satisfied. I think it
>> didn't enforce 16 Byte alignment if I remember correctly.
>>
>> Attention: C2 does no longer convert int/bool to 64 bit values (since JDK-
>> 8086069). So the argument registers for offset, length and isURL may contain
>> garbage in the higher bits.
>>
>> You may want to use load_const_optimized which produces shorter code.
>>
>> You may want to use __ align(32) to align unrolled_loop_start.
>>
>> I'll review the algorithm in detail when I find more time.
>>
>>
>> assembler_ppc.hpp
>> assembler_ppc.inline.hpp
>> vm_version_ppc.cpp
>> vm_version_ppc.hpp
>> Please rebase. Parts of the change were pushed as part of 8248190: Enable
>> Power10 system and implement new byte-reverse instructions
>>
>>
>> vmSymbols.hpp
>> Indentation looks odd at the end.
>>
>>
>> library_call.cpp
>> Good. Indentation style of the call parameters differs from encodeBlock.
>>
>>
>> runtime.cpp
>> Good.
>>
>>
>> aotCodeHeap.cpp
>> vmSymbols.cpp
>> shenandoahSupport.cpp
>> vmStructs_jvmci.cpp
>> shenandoahSupport.cpp
>> escape.cpp
>> runtime.hpp
>> stubRoutines.cpp
>> stubRoutines.hpp
>> vmStructs.cpp
>> Good and trivial.
>>
>>
>> Tests:
>> I think we should have JTREG tests to check for regressions in the future.
>>
>> Best regards,
>> Martin
>>
>>
>>> -----Original Message-----
>>> From: Corey Ashford <cjashfor at linux.ibm.com>
>>> Sent: Mittwoch, 19. August 2020 20:11
>>> To: Michihiro Horie <HORIE at jp.ibm.com>
>>> Cc: hotspot-compiler-dev at openjdk.java.net; core-libs-dev <core-libs-
>>> dev at openjdk.java.net>; Kazunori Ogata <OGATAK at jp.ibm.com>;
>>> joserz at br.ibm.com; Doerr, Martin <martin.doerr at sap.com>
>>> Subject: Re: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and
>>> API for Base64 decoding
>>>
>>> Michihiro Horie posted up a new iteration of this webrev for me. This
>>> time the webrev includes a complete implementation of the intrinsic for
>>> Power9 and Power10.
>>>
>>> You can find it here:
>>> http://cr.openjdk.java.net/~mhorie/8248188/webrev.02/
>>>
>>> Changes in webrev.02 vs. webrev.01:
>>>
>>> * The method header for the intrinsic in the Base64 code has been
>>> rewritten using the Javadoc style. The clarity of the comments has been
>>> improved and some verbosity has been removed. There are no additional
>>> functional changes to Base64.java.
>>>
>>> * The code needed to martial and check the intrinsic parameters has
>>> been added, using the base64 encodeBlock intrinsic as a guideline.
>>>
>>> * A complete intrinsic implementation for Power9 and Power10 is
>> included.
>>>
>>> * Adds some Power9 and Power10 assembler instructions needed by the
>>> intrinsic which hadn't been defined before.
>>>
>>> The intrinsic implementation in this patch accelerates the decoding of
>>> large blocks of base64 data by a factor of about 3.5X on Power9.
>>>
>>> I'm attaching two Java test cases I am using for testing and
>>> benchmarking. The TestBase64_VB encodes and decodes randomly-sized
>>> buffers of random data and checks that original data matches the
>>> encoded-then-decoded data. TestBase64Errors encodes a 48K block of
>>> random bytes, then corrupts each byte of the encoded data, one at a
>>> time, checking to see if the decoder catches the illegal byte.
>>>
>>> Any comments/suggestions would be appreciated.
>>>
>>> Thanks,
>>>
>>> - Corey
>>>
>>> On 7/27/20 6:49 PM, Corey Ashford wrote:
>>>> Michihiro Horie uploaded a new revision of the Base64 decodeBlock
>>>> intrinsic API for me:
>>>>
>>>> http://cr.openjdk.java.net/~mhorie/8248188/webrev.01/
>>>>
>>>> It has the following changes with respect to the original one posted:
>>>>
>>>> * In the event of encountering a non-base64 character, instead of
>>>> having a separate error code of -1, the intrinsic can now just return
>>>> either 0, or the number of data bytes produced up to the point where
>> the
>>>> illegal base64 character was encountered. This reduces the number of
>>>> special cases, and also provides a way to speed up the process of
>>>> finding the bad character by the slower, pure-Java algorithm.
>>>>
>>>> * The isMIME boolean is removed from the API for two reasons:
>>>> - The current API is not sufficient to handle the isMIME case,
>>>> because there isn't a strict relationship between the number of input
>>>> bytes and the number of output bytes, because there can be an arbitrary
>>>> number of non-base64 characters in the source.
>>>> - If an intrinsic only implements the (isMIME == false) case as ours
>>>> does, it will always return 0 bytes processed, which will slightly slow
>>>> down the normal path of processing an (isMIME == true) instantiation.
>>>> - We considered adding a separate hotspot candidate for the (isMIME
>>>> == true) case, but since we don't have an intrinsic implementation to
>>>> test that, we decided to leave it as a future optimization.
>>>>
>>>> Comments and suggestions are welcome. Thanks for your consideration.
>>>>
>>>> - Corey
>>>>
>>>> On 6/23/20 6:23 PM, Michihiro Horie wrote:
>>>>> Hi Corey,
>>>>>
>>>>> Following is the issue I created.
>>>>> https://bugs.openjdk.java.net/browse/JDK-8248188
>>>>>
>>>>> I will upload a webrev when you're ready as we talked in private.
>>>>>
>>>>> Best regards,
>>>>> Michihiro
>>>>>
>>>>> Inactive hide details for "Corey Ashford" ---2020/06/24
>>>>> 09:40:10---Currently in java.util.Base64, there is a
>>>>> HotSpotIntrinsicCa"Corey Ashford" ---2020/06/24 09:40:10---Currently
>>>>> in java.util.Base64, there is a HotSpotIntrinsicCandidate and API for
>>>>> encodeBlock, but no
>>>>>
>>>>> From: "Corey Ashford" <cjashfor at linux.ibm.com>
>>>>> To: "hotspot-compiler-dev at openjdk.java.net"
>>>>> <hotspot-compiler-dev at openjdk.java.net>,
>>>>> "ppc-aix-port-dev at openjdk.java.net" <ppc-aix-port-
>>> dev at openjdk.java.net>
>>>>> Cc: Michihiro Horie/Japan/IBM at IBMJP, Kazunori
>>> Ogata/Japan/IBM at IBMJP,
>>>>> joserz at br.ibm.com
>>>>> Date: 2020/06/24 09:40
>>>>> Subject: RFR(S): [PATCH] Add HotSpotIntrinsicCandidate and API for
>>>>> Base64 decoding
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>>
>>>>>
>>>>> Currently in java.util.Base64, there is a HotSpotIntrinsicCandidate and
>>>>> API for encodeBlock, but none for decoding. This means that only
>>>>> encoding gets acceleration from the underlying CPU's vector hardware.
>>>>>
>>>>> I'd like to propose adding a new intrinsic for decodeBlock. The
>>>>> considerations I have for this new intrinsic's API:
>>>>>
>>>>> * Don't make any assumptions about the underlying capability of the
>>>>> hardware. For example, do not impose any specific block size
>>>>> granularity.
>>>>>
>>>>> * Don't assume the underlying intrinsic can handle isMIME or isURL
>>>>> modes, but also let them decide if they will process the data regardless
>>>>> of the settings of the two booleans.
>>>>>
>>>>> * Any remaining data that is not processed by the intrinsic will be
>>>>> processed by the pure Java implementation. This allows the intrinsic to
>>>>> process whatever block sizes it's good at without the complexity of
>>>>> handling the end fragments.
>>>>>
>>>>> * If any illegal character is discovered in the decoding process, the
>>>>> intrinsic will simply return -1, instead of requiring it to throw a
>>>>> proper exception from the context of the intrinsic. In the event of
>>>>> getting a -1 returned from the intrinsic, the Java Base64 library code
>>>>> simply calls the pure Java implementation to have it find the error and
>>>>> properly throw an exception. This is a performance trade-off in the
>>>>> case of an error (which I expect to be very rare).
>>>>>
>>>>> * One thought I have for a further optimization (not implemented in
>>>>> the current patch), is that when the intrinsic decides not to process a
>>>>> block because of some combination of isURL and isMIME settings it
>>>>> doesn't handle, it could return extra bits in the return code, encoded
>>>>> as a negative number. For example:
>>>>>
>>>>> Illegal_Base64_char = 0b001;
>>>>> isMIME_unsupported = 0b010;
>>>>> isURL_unsupported = 0b100;
>>>>>
>>>>> These can be OR'd together as needed and then negated (flip the sign).
>>>>> The Base64 library code could then cache these flags, so it will know
>>>>> not to call the intrinsic again when another decodeBlock is requested
>>>>> but with an unsupported mode. This will save the performance hit of
>>>>> calling the intrinsic when it is guaranteed to fail.
>>>>>
>>>>> I've tested the attached patch with an actual intrinsic coded up for
>>>>> Power9/Power10, but those runtime intrinsics and arch-specific patches
>>>>> aren't attached today. I want to get some consensus on the
>>>>> library-level intrinsic API first.
>>>>>
>>>>> Also attached is a simple test case to test that the new intrinsic API
>>>>> doesn't break anything.
>>>>>
>>>>> I'm open to any comments about this.
>>>>>
>>>>> Thanks for your consideration,
>>>>>
>>>>> - Corey
>>>>>
>>>>>
>>>>> Corey Ashford
>>>>> IBM Systems, Linux Technology Center, OpenJDK team
>>>>> cjashfor at us dot ibm dot com
>>>>> [attachment "decodeBlock_api-20200623.patch" deleted by Michihiro
>>>>> Horie/Japan/IBM] [attachment "TestBase64.java" deleted by Michihiro
>>>>> Horie/Japan/IBM]
>>>>>
>>>>>
>>>>
>
More information about the hotspot-compiler-dev
mailing list