RFR: 8248188: Add IntrinsicCandidate and API for Base64 decoding [v2]
CoreyAshford
github.com+51754783+coreyashford at openjdk.java.net
Wed Oct 7 17:24:17 UTC 2020
On Wed, 7 Oct 2020 14:03:14 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:
>> CoreyAshford has updated the pull request with a new target base due to a merge or a rebase. The pull request now
>> contains ten commits:
>> - AOT: Revert change to aotCodeHeap.cpp for decodeBlock
>>
>> Don't add the SET_AOT_GLOBAL_SYMBOL_VALUE macro for decode block until all
>> arches that implement AOT, implement the decodeBlock intrinsic.
>> - Base64.java decodeBlock: Changes from PR review
>>
>> * Make comparison safer and consistent with the while loop
>> * Update comment about the decodeBlock intrinsic so that it matches the new structure
>> * Add comment about the lack of a length check on the destination buffer
>> * As per issue 8138732, change HotSpotIntrinsicCandidate to IntrinsicCandidate
>> - stubGenerator_ppc.cpp: Changes from PR review
>>
>> * Fix clearing of upper bits to clear 32 bits instead of 31 (due to misreading of clrldi instruction)
>> * change and document loop_unrolls setting from 8 to 2 after re-running the benchmark
>> * align unrolled loop on a 32-byte boundary
>> * replace instruction used for checking isURL from a double word to single
>> word instruction since the register is effectively 32 bits wide
>> * cosmetic change to realign register comments.
>> - TestBase64.java: Changes from PR review
>>
>> * Use Utils.toByteArrays() method instead of a locally-defined method
>> * Generate the two non-Base64 tables dynamically rather than use static initialization
>> * Added comments describing the two above-mentioned arrays
>> - Expand the Base64 intrinsic regression test to cover decodeBlock
>>
>> This patch makes four significant changes:
>>
>> 1) The Power implementation of the decodeBlock intrinsic, at least,
>> requires a decode length of at least 128 bytes, but the existing test cases
>> are much shorter, maxing out at 111 bytes. So the patch adds a new input
>> data file which has longer test cases in it.
>>
>> 2) The original test cases only covers the encoding of just the printable
>> subset of the 7-bit ASCII characters. However, Base64 encoding requires
>> being able to encode arbitrary binary data, i.e. it must handle all 256
>> 8-bit byte encodings. To remedy this, but keep the original line-oriented
>> style of the input data, I added another input file type that uses a simple
>> ASCII hexadecimal encoding - two ASCII hex characters per 8-bit byte. When
>> test0 is called, a new parameter is passed that specifies the type of the
>> input file, which is either the original ASCII type or the hexadecimal
>> format. So to test both longer input data and arbitrary 8-bit data, the
>> newly added input test file has test cases which are both longer and
>> encoded in ASCII hex so as to give full 8-bit capability. When reading
>> this type of file, test0 calls a newly-added function to translate the
>> ASCII hex to binary data. Except for the first line of input data, which
>> contains all possible 8-bit values sequentially, the input data was
>> generated using a random length (between 111 and 520 bytes) buffer filled
>> with random 8-bit data, which should give adequate coverage.
>>
>> 3) The original test did not test that the decoder detects illegal Base64
>> bytes. This change chooses a random location in the encoded data to
>> corrupt with a randomly-chosen byte which is illegal for the specific
>> Base64 encoding that is chosen (i.e. standard or URLsafe). It then calls
>> the decode function to verify that the illegal byte is detected and the
>> proper exception is thrown.
>>
>> 4) The test iteration count was originally 100K, but that is far more than
>> enough iterations to test the intrinsic. It takes 20K iterations on each
>> instrinsic for HotSpot C2 to begin calling it. The test originally had
>> three types of encodings to test and called the encode intrinsic four times
>> for each iteration, which works out to 100K * 3 * 4 = 1.2M calls just to
>> encode. Decode was called four times as well (now five because of the
>> illegal byte test). I believe this is excessive and with the extra test
>> data I have added, the test was timing out after ten minutes of execution.
>> It appears that it is timing out, not because the intrinsics take a long
>> time to run, but because test0 generates an enormous number of discarded
>> data buffers for the GC system to recover (the test runs at about 39GB of
>> virtual memory on my test machine). To remedy the timeout problem, I have
>> changed the code so that a warmup function of 20K repetitions is performed
>> on a fixed buffer, to activate the instrinsic(s). After the warmup, I have
>> reduced the number of iterations to 5K on each test0 call. This should
>> give adequate coverage.
>> - Add JMH benchmark for Base64 variable length buffer decoding
>> - Add Power9+ intrinsic implementation for Base64 decoding
>> - Add HotSpot code to implement Base64 decodeBlock API
>> - Add HotSpotIntrinsicCandidate and API for Base64 decoding
>
> src/java.base/share/classes/java/util/Base64.java line 812:
>
>> 810:
>> 811: while (sp < sl) {
>> 812: if (shiftto == 18 && sp + 4 < sl) { // fast path
>
> Please change to sp < s1 - 4. Current version is sensitive to integer overflow. That's not a real problem in the
> current code, because the next check catches that, but we should better avoid this with the new intrinsics.
Good catch. Will fix.
> src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 3818:
>
>> 3816: __ cmpd(CCR0, end, in);
>> 3817: __ blt_predict_not_taken(CCR0, unrolled_loop_exit);
>> 3818: __ align(32);
>
> align should be before bind(unrolled_loop_start)
oops, yes, that was dumb. Good catch! Will fix and re-benchmark.
> src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 3:
>
>> 1: /*
>> 2: * Copyright (c) 1997, 2020, Oracle and/or its affiliates. All rights reserved.
>> 3: * Copyright (c) 2012, 2020, SAP SE. All rights reserved.
>
> No comma before SAP SE, please! (See https://bugs.openjdk.java.net/browse/JDK-8252837)
Interesting. I was trying to make it consistent with the Oracle copyright. Ok, will fix.
> src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 3671:
>
>> 3669: // an advantage to keeping loop_unrolls small (to be able to process
>> 3670: // smaller buffers), 2 is clearly the best choice.
>> 3671: const unsigned loop_unrolls = 2;
>
> Unrolling should be re-evaluated after alignment is fixed. align(32) is currently at the wrong place (see my comment
> below).
Agreed.
> src/hotspot/cpu/ppc/vm_version_ppc.cpp line 160:
>
>> 158: if (UseBASE64Intrinsics) {
>> 159: warning("UseBASE64Intrinsics specified, but needs at least Power9.");
>> 160: FLAG_SET_DEFAULT(UseCharacterCompareIntrinsics, false);
>
> Copy & paste bug.
Oops, yes, will fix.
> src/hotspot/share/classfile/vmIntrinsics.cpp line 491:
>
>> 489: if (!UseGHASHIntrinsics) return true;
>> 490: break;
>> 491: case vmIntrinsics::_base64_decodeBlock:
>
> I'd prefer to use consistent order. You have inserted decode after encode at other places.
I agree. Will fix.
> src/hotspot/share/opto/library_call.cpp line 310:
>
>> 308: bool inline_base64_decodeBlock();
>> 309: bool inline_digestBase_implCompress(vmIntrinsics::ID id);
>> 310: bool inline_sha_implCompress(vmIntrinsics::ID id);
>
> Why is that in this change?
Good catch! I'm not sure what happened there. Will investigate.
> src/hotspot/share/opto/runtime.cpp line 1211:
>
>> 1209: // result type needed
>> 1210: fields = TypeTuple::fields(1);
>> 1211: fields[TypeFunc::Parms + 0] = TypeInt::INT; // dst ofs, or -1
>
> Why ", or -1" in the comment?
At one point the intrinsic would return -1 in the event of encountering a non-base64 byte. I will update the comment
to be correct for the revised semantics.
-------------
PR: https://git.openjdk.java.net/jdk/pull/293
More information about the shenandoah-dev
mailing list