RFR: 8248188: Add IntrinsicCandidate and API for Base64 decoding [v2]

Thu Oct 8 06:53:42 UTC 2020

On Wed, 7 Oct 2020 16:26:04 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:

>> CoreyAshford has updated the pull request with a new target base due to a merge or a rebase. The pull request now
>> contains ten commits:
>>  - AOT: Revert change to aotCodeHeap.cpp for decodeBlock
>>    
>>    Don't add the SET_AOT_GLOBAL_SYMBOL_VALUE macro for decode block until all
>>    arches that implement AOT, implement the decodeBlock intrinsic.
>>  - Base64.java decodeBlock: Changes from PR review
>>    
>>     * Make comparison safer and consistent with the while loop
>>     * Update comment about the decodeBlock intrinsic so that it matches the new structure
>>     * Add comment about the lack of a length check on the destination buffer
>>     * As per issue 8138732, change HotSpotIntrinsicCandidate to IntrinsicCandidate
>>  - stubGenerator_ppc.cpp: Changes from PR review
>>    
>>     * Fix clearing of upper bits to clear 32 bits instead of 31 (due to misreading of clrldi instruction)
>>     * change and document loop_unrolls setting from 8 to 2 after re-running the benchmark
>>     * align unrolled loop on a 32-byte boundary
>>     * replace instruction used for checking isURL from a double word to single
>>       word instruction since the register is effectively 32 bits wide
>>     * cosmetic change to realign register comments.
>>  - TestBase64.java: Changes from PR review
>>    
>>     * Use Utils.toByteArrays() method instead of a locally-defined method
>>     * Generate the two non-Base64 tables dynamically rather than use static initialization
>>     * Added comments describing the two above-mentioned arrays
>>  - Expand the Base64 intrinsic regression test to cover decodeBlock
>>    
>>    This patch makes four significant changes:
>>    
>>    1) The Power implementation of the decodeBlock intrinsic, at least,
>>    requires a decode length of at least 128 bytes, but the existing test cases
>>    are much shorter, maxing out at 111 bytes.  So the patch adds a new input
>>    data file which has longer test cases in it.
>>    
>>    2) The original test cases only covers the encoding of just the printable
>>    subset of the 7-bit ASCII characters.  However, Base64 encoding requires
>>    being able to encode arbitrary binary data, i.e. it must handle all 256
>>    8-bit byte encodings.  To remedy this, but keep the original line-oriented
>>    style of the input data, I added another input file type that uses a simple
>>    ASCII hexadecimal encoding - two ASCII hex characters per 8-bit byte.  When
>>    test0 is called, a new parameter is passed that specifies the type of the
>>    input file, which is either the original ASCII type or the hexadecimal
>>    format.  So to test both longer input data and arbitrary 8-bit data, the
>>    newly added input test file has test cases which are both longer and
>>    encoded in ASCII hex so as to give full 8-bit capability.  When reading
>>    this type of file, test0 calls a newly-added function to translate the
>>    ASCII hex to binary data.  Except for the first line of input data, which
>>    contains all possible 8-bit values sequentially, the input data was
>>    generated using a random length (between 111 and 520 bytes) buffer filled
>>    with random 8-bit data, which should give adequate coverage.
>>    
>>    3) The original test did not test that the decoder detects illegal Base64
>>    bytes.  This change chooses a random location in the encoded data to
>>    corrupt with a randomly-chosen byte which is illegal for the specific
>>    Base64 encoding that is chosen (i.e. standard or URLsafe).  It then calls
>>    the decode function to verify that the illegal byte is detected and the
>>    proper exception is thrown.
>>    
>>    4) The test iteration count was originally 100K, but that is far more than
>>    enough iterations to test the intrinsic.  It takes 20K iterations on each
>>    instrinsic for HotSpot C2 to begin calling it.  The test originally had
>>    three types of encodings to test and called the encode intrinsic four times
>>    for each iteration, which works out to 100K * 3 * 4 = 1.2M calls just to
>>    encode.  Decode was called four times as well (now five because of the
>>    illegal byte test).  I believe this is excessive and with the extra test
>>    data I have added, the test was timing out after ten minutes of execution.
>>    It appears that it is timing out, not because the intrinsics take a long
>>    time to run, but because test0 generates an enormous number of discarded
>>    data buffers for the GC system to recover (the test runs at about 39GB of
>>    virtual memory on my test machine).  To remedy the timeout problem, I have
>>    changed the code so that a warmup function of 20K repetitions is performed
>>    on a fixed buffer, to activate the instrinsic(s).  After the warmup, I have
>>    reduced the number of iterations to 5K on each test0 call.  This should
>>    give adequate coverage.
>>  - Add JMH benchmark for Base64 variable length buffer decoding
>>  - Add Power9+ intrinsic implementation for Base64 decoding
>>  - Add HotSpot code to implement Base64 decodeBlock API
>>  - Add HotSpotIntrinsicCandidate and API for Base64 decoding
>
> Hi Corey,
> thanks for contributing this change. Looks basically good. Please address the inline comments from Roger and me.
> Core libs part is reviewed by Roger and the whole change by me. The shared hotspot part is straight forward because
> it's very similar to the encode intrinsic. So I think we only need a 2nd review for the PPC64 algorithm implementation.
> I can sponsor the change when this is completed.

Ready for another review.  I hope I addressed all of the issues raised.

-------------

PR: https://git.openjdk.java.net/jdk/pull/293