RFR: 8248188: Add IntrinsicCandidate and API for Base64 decoding [v2]

Martin Doerr mdoerr at openjdk.java.net
Wed Oct 7 16:29:13 UTC 2020


On Mon, 5 Oct 2020 18:29:58 GMT, CoreyAshford <github.com+51754783+CoreyAshford at openjdk.org> wrote:

>> This patch set encompasses the following commits:
>> 
>> - Adds a new HotSpot intrinsic candidate to the java.lang.Base64 class - decodeBlock(), and provides a flexible API for
>>   the intrinsic.  The API is similar to the existing encodeBlock intrinsic.
>> - Adds the code in HotSpot to check and martial the new intrinsic's arguments to the arch-specific intrinsic
>>   implementation
>> - Adds a Power64LE-specific implementation of the decodeBlock intrinsic.
>> - Adds a JMH microbenchmark for both Base64 encoding and encoding.
>> - Enhances the JTReg hotspot intrinsic "TestBase64.java" regression test to more fully test both decoding and encoding.
>
> CoreyAshford has updated the pull request with a new target base due to a merge or a rebase. The pull request now
> contains ten commits:
>  - AOT: Revert change to aotCodeHeap.cpp for decodeBlock
>    
>    Don't add the SET_AOT_GLOBAL_SYMBOL_VALUE macro for decode block until all
>    arches that implement AOT, implement the decodeBlock intrinsic.
>  - Base64.java decodeBlock: Changes from PR review
>    
>     * Make comparison safer and consistent with the while loop
>     * Update comment about the decodeBlock intrinsic so that it matches the new structure
>     * Add comment about the lack of a length check on the destination buffer
>     * As per issue 8138732, change HotSpotIntrinsicCandidate to IntrinsicCandidate
>  - stubGenerator_ppc.cpp: Changes from PR review
>    
>     * Fix clearing of upper bits to clear 32 bits instead of 31 (due to misreading of clrldi instruction)
>     * change and document loop_unrolls setting from 8 to 2 after re-running the benchmark
>     * align unrolled loop on a 32-byte boundary
>     * replace instruction used for checking isURL from a double word to single
>       word instruction since the register is effectively 32 bits wide
>     * cosmetic change to realign register comments.
>  - TestBase64.java: Changes from PR review
>    
>     * Use Utils.toByteArrays() method instead of a locally-defined method
>     * Generate the two non-Base64 tables dynamically rather than use static initialization
>     * Added comments describing the two above-mentioned arrays
>  - Expand the Base64 intrinsic regression test to cover decodeBlock
>    
>    This patch makes four significant changes:
>    
>    1) The Power implementation of the decodeBlock intrinsic, at least,
>    requires a decode length of at least 128 bytes, but the existing test cases
>    are much shorter, maxing out at 111 bytes.  So the patch adds a new input
>    data file which has longer test cases in it.
>    
>    2) The original test cases only covers the encoding of just the printable
>    subset of the 7-bit ASCII characters.  However, Base64 encoding requires
>    being able to encode arbitrary binary data, i.e. it must handle all 256
>    8-bit byte encodings.  To remedy this, but keep the original line-oriented
>    style of the input data, I added another input file type that uses a simple
>    ASCII hexadecimal encoding - two ASCII hex characters per 8-bit byte.  When
>    test0 is called, a new parameter is passed that specifies the type of the
>    input file, which is either the original ASCII type or the hexadecimal
>    format.  So to test both longer input data and arbitrary 8-bit data, the
>    newly added input test file has test cases which are both longer and
>    encoded in ASCII hex so as to give full 8-bit capability.  When reading
>    this type of file, test0 calls a newly-added function to translate the
>    ASCII hex to binary data.  Except for the first line of input data, which
>    contains all possible 8-bit values sequentially, the input data was
>    generated using a random length (between 111 and 520 bytes) buffer filled
>    with random 8-bit data, which should give adequate coverage.
>    
>    3) The original test did not test that the decoder detects illegal Base64
>    bytes.  This change chooses a random location in the encoded data to
>    corrupt with a randomly-chosen byte which is illegal for the specific
>    Base64 encoding that is chosen (i.e. standard or URLsafe).  It then calls
>    the decode function to verify that the illegal byte is detected and the
>    proper exception is thrown.
>    
>    4) The test iteration count was originally 100K, but that is far more than
>    enough iterations to test the intrinsic.  It takes 20K iterations on each
>    instrinsic for HotSpot C2 to begin calling it.  The test originally had
>    three types of encodings to test and called the encode intrinsic four times
>    for each iteration, which works out to 100K * 3 * 4 = 1.2M calls just to
>    encode.  Decode was called four times as well (now five because of the
>    illegal byte test).  I believe this is excessive and with the extra test
>    data I have added, the test was timing out after ten minutes of execution.
>    It appears that it is timing out, not because the intrinsics take a long
>    time to run, but because test0 generates an enormous number of discarded
>    data buffers for the GC system to recover (the test runs at about 39GB of
>    virtual memory on my test machine).  To remedy the timeout problem, I have
>    changed the code so that a warmup function of 20K repetitions is performed
>    on a fixed buffer, to activate the instrinsic(s).  After the warmup, I have
>    reduced the number of iterations to 5K on each test0 call.  This should
>    give adequate coverage.
>  - Add JMH benchmark for Base64 variable length buffer decoding
>  - Add Power9+ intrinsic implementation for Base64 decoding
>  - Add HotSpot code to implement Base64 decodeBlock API
>  - Add HotSpotIntrinsicCandidate and API for Base64 decoding

Hi Corey,
thanks for contributing this change. Looks basically good. Please address the inline comments from Roger and me.
Core libs part is reviewed by Roger and the whole change by me. The shared hotspot part is straight forward because
it's very similar to the encode intrinsic. So I think we only need a 2nd review for the PPC64 algorithm implementation.
I can sponsor the change when this is completed.

-------------

Changes requested by mdoerr (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/293


More information about the core-libs-dev mailing list