RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding

Mon Aug 31 16:41:32 UTC 2020

On 8/27/20 8:07 AM, Doerr, Martin wrote:
> Hi Corey,
> 
>> If I make a requirement, I feel decode0 should check that the
>> requirement is met, and raise some kind of internal error if it isn't.
>> That actually was my first implementation, but I received some comments
>> during an internal review suggesting that I just "round down" the
>> destination count to the closest multiple of 3 less than or equal to the
>> returned value, rather than throw an internal exception which would
>> confuse users.  This "enforces" the rule, in some sense, without error
>> handling.  Do you have some thoughts about this?
> 
> I think the rounding logic is hard to understand and I'm not sure if it's correct (you're rounding up for the 1st computation of chars_decoded).
> If we don't use it, it will never get tested (because the intrinsic always returns a multiple of 3).
> I prefer having a more simple version which is easy to understand and for which we can test all cases.

I will see what I can do with the calculation of chars_decoded, at least 
in the comments, to make it more clear as to the "why" of the calculation.

I will remove the round down code: "dl = (dl / 3) * 3;" and leave it for 
intrinsics implementers/maintainers to check that assumption when the 
intrinsic returns.

> 
> I think we should be able to catch violations of this requirement by adding good JTREG tests.
> An illegal intrinsic implementation should never pass the tests. So I don't see a need to catch an illegal state in the Java source code in this case.
> I guess this will be best for intrinsic implementors for other platforms as well.
> 
> I'd appreciate more opinions on this.
> 
> 
>> I will double check that everything compiles and runs properly with gcc
>> 7.3.1.
> Please note that 7.3.1 is our minimum for Big Endian linux. For Little Endian it's 7.4.0.

Ah, that might explain why I wasn't able to find gcc-7.3.1 on RHEL 8.1 
(gcc-8.3.1) or Ubuntu 16.04 (gcc-7.4.0) for Power9.  As long as the code 
is enabled on little endian machines only, there should be no trouble 
with compilation.  I did compile and run the tests against 7.4.0, and it 
worked without a problem.

> You can also find this information here:
> https://wiki.openjdk.java.net/display/Build/Supported+Build+Platforms
> under "Other JDK 13 build platforms" which hasn't changed since then.
>

Great, thank you.

>> I will use __attribute__ ((align(16))) instead of __vector, and make
>> them arrays of 16 unsigned char.
> Maybe __vectors works as expected, too, now. Whatever we use, I'd appreciate to double-check the alignment e.g. by using gdb.

Ok, I will experiment with that with some small test cases and see if I 
can make the compiler stumble and not align the vector.  The lxv 
instruction can handle unaligned vectors in memory, but it would be 
better to have the vectors aligned for performance reasons.

> I don't remember what we had tried and why it didn't work as desired.
> 
> 
>> I was following what was done for encodeBlock, but it appears
>> encodeBlock's style isn't what is used for the other intrinsics.  I will
>> correct decodeBlock to use the prevailing style.  Another patch should
>> be added (not part of this webrev) to correct encodeBlock's style.
> In your code one '\' is not aligned with the other ones.

Yes, it's corrected now.

> 
> 
>> Ah, this is another thing I didn't know about.  I will make some
>> regression tests.
> Thanks. There's some documentation available:
> https://openjdk.java.net/jtreg/
> I guess your colleagues can assist you with that so you don't have to figure out everything alone.

Yes, thank you.  JTREG tests will be part of the next webrev version.

Regards,

- Corey

> 
> 
>> Thanks for your time on this.  As you can tell, I'm inexperienced in
>> writing openjdk code, so your patience and careful review is really
>> appreciated.
> I'm glad you work on contributions. I think we should welcome new contributors and assist as far as we can.
> 
> Best regards,
> Martin
> 
> 
>> -----Original Message-----
>> From: Corey Ashford <cjashfor at linux.ibm.com>
>> Sent: Donnerstag, 27. August 2020 00:17
>> To: Doerr, Martin <martin.doerr at sap.com>; Michihiro Horie
>> <HORIE at jp.ibm.com>
>> Cc: hotspot-compiler-dev at openjdk.java.net; core-libs-dev <core-libs-
>> dev at openjdk.java.net>; Kazunori Ogata <OGATAK at jp.ibm.com>;
>> joserz at br.ibm.com
>> Subject: Re: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and
>> API for Base64 decoding
>>
>> Hi Martin,
>>
>> Some inline responses below.
>>
>> On 8/26/20 8:26 AM, Doerr, Martin wrote:
>>
>>> Hi Corey,
>>>
>>> I should explain my comments regarding Base64.java better.
>>>
>>>> Let's be precise: "should process a multiple of four" => "must process a
>>>> multiple of four"
>>> Did you try to support non-multiple of 4 and this was intended as
>> recommendation?
>>> I think making it a requirement and simplifying the logic in decode0 is
>> better.
>>> Or what's the benefit of the recommendation?
>>
>> If I make a requirement, I feel decode0 should check that the
>> requirement is met, and raise some kind of internal error if it isn't.
>> That actually was my first implementation, but I received some comments
>> during an internal review suggesting that I just "round down" the
>> destination count to the closest multiple of 3 less than or equal to the
>> returned value, rather than throw an internal exception which would
>> confuse users.  This "enforces" the rule, in some sense, without error
>> handling.  Do you have some thoughts about this?
>>
>>>
>>>>> If any illegal base64 bytes are encountered in the source by the
>>>>> intrinsic, the intrinsic can return a data length of zero or any
>>>>> number of bytes before the place where the illegal base64 byte
>>>>> was encountered.
>>>> I think this has a drawback. Somebody may use a debugger and want to
>> stop
>>>> when throwing IllegalArgumentException. He should see the position
>> which
>>>> matches the Java implementation.kkkk
>>> This is probably hard to understand. Let me try to explain it by example:
>>> 1. 80 Bytes get processed by the intrinsic and 60 Bytes written to the
>> destination array.
>>> 2. The intrinsic sees an illegal base64 Byte and it returns 12 which is allowed
>> by your specification.
>>> 3. The compiled method containing the intrinsic hits a safepoint (e.g. in the
>> large while loop in decodeBlockSlow).
>>> 4. A JVMTI agent (debugger) reads dp and dst.
>>> 5. The person using the debugger gets angry because more bytes than dp
>> were written into dst. The JVM didn't follow the specified behavior.
>>>
>>> I guess we can and should avoid it by specifying that the intrinsic needs to
>> return the dp value matching the number of Bytes written.
>>
>> That's an interesting point.  I will change the specification, and the
>> intrinsic implementation.  Right now the Power9/10 intrinsic returns 0
>> when any illegal character is discovered, but I've been thinking about
>> returning the number of bytes already written, which will allow
>> decodeBlockSlow to more quickly find the offending character.  This
>> provides another good reason to make that change.
>>
>>>
>>> Best regards,
>>> Martin
>>>
>>>
>>>> -----Original Message-----
>>>> From: Doerr, Martin
>>>> Sent: Dienstag, 25. August 2020 15:38
>>>> To: Corey Ashford <cjashfor at linux.ibm.com>; Michihiro Horie
>>>> <HORIE at jp.ibm.com>
>>>> Cc: hotspot-compiler-dev at openjdk.java.net; core-libs-dev <core-libs-
>>>> dev at openjdk.java.net>; Kazunori Ogata <OGATAK at jp.ibm.com>;
>>>> joserz at br.ibm.com
>>>> Subject: RE: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate
>> and
>>>> API for Base64 decoding
>>>>
>>>> Hi Corey,
>>>>
>>>> thanks for proposing this change. I have comments and suggestions
>>>> regarding various files.
>>>>
>>>>
>>>> Base64.java
>>>>
>>>> This is the only file which needs another review from core-libs-dev.
>>>> First of all, I like the idea to use a HotSpotIntrinsicCandidate which can
>>>> consume as many bytes as the implementation wants.
>>>>
>>>> Comment before decodeBlock:
>>>> Let's be precise: "should process a multiple of four" => "must process a
>>>> multiple of four"
>>>>
>>>>> If any illegal base64 bytes are encountered in the source by the
>>>>> intrinsic, the intrinsic can return a data length of zero or any
>>>>> number of bytes before the place where the illegal base64 byte
>>>>> was encountered.
>>>> I think this has a drawback. Somebody may use a debugger and want to
>> stop
>>>> when throwing IllegalArgumentException. He should see the position
>> which
>>>> matches the Java implementation.
>>>>
>>>> Please note that the comment indentation differs from other comments.
>>
>> Will fix.
>>
>>>>
>>>> decode0: Final "else" after return is redundant.
>>
>> Will fix.
>>
>>>>
>>>>
>>>> stubGenerator_ppc.cpp
>>>>
>>>> "__vector" breaks AIX build!
>>>> Does it work on Big Endian linux with old gcc (we require 7.3.1, now)?
>>>> Please either support Big Endian properly or #ifdef it out.
>>
>> I have been compiling with only Advance Toolchain 13, which is 9.3.1,
>> and only on Linux.  It will not work with big endian, so it won't work
>> on AIX, however obviously it shouldn't break the AIX build, so I will
>> address that.  There's code to set UseBASE64Intrinsics to false on big
>> endian, but you're right -- I should ifdef all of the intrinsic code for
>> little endian for now.  Getting it to work on big endian / AIX shouldn't
>> be difficult, but it's not in my scope of work at the moment.
>>
>> I will double check that everything compiles and runs properly with gcc
>> 7.3.1.
>>
>>>> What exactly does it (do) on linux?
>>
>> It's an arch-specific type that's 16 bytes in size and aligned on a
>> 16-byte boundary.
>>
>>>> I remember that we had tried such prefixes but were not satisfied. I think
>> it
>>>> didn't enforce 16 Byte alignment if I remember correctly.
>>
>> I will use __attribute__ ((align(16))) instead of __vector, and make
>> them arrays of 16 unsigned char.
>>
>>>>
>>>> Attention: C2 does no longer convert int/bool to 64 bit values (since JDK-
>>>> 8086069). So the argument registers for offset, length and isURL may
>> contain
>>>> garbage in the higher bits.
>>
>> Wow, that's good to know!  I will mask off the incoming values.
>>
>>>>
>>>> You may want to use load_const_optimized which produces shorter code.
>>
>> Will fix.
>>
>>>>
>>>> You may want to use __ align(32) to align unrolled_loop_start.
>>
>> Will fix.
>>
>>>>
>>>> I'll review the algorithm in detail when I find more time.
>>>>
>>>>
>>>> assembler_ppc.hpp
>>>> assembler_ppc.inline.hpp
>>>> vm_version_ppc.cpp
>>>> vm_version_ppc.hpp
>>>> Please rebase. Parts of the change were pushed as part of 8248190:
>> Enable
>>>> Power10 system and implement new byte-reverse instructions
>>
>> Will do.
>>
>>>>
>>>>
>>>> vmSymbols.hpp
>>>> Indentation looks odd at the end.
>>
>> I was following what was done for encodeBlock, but it appears
>> encodeBlock's style isn't what is used for the other intrinsics.  I will
>> correct decodeBlock to use the prevailing style.  Another patch should
>> be added (not part of this webrev) to correct encodeBlock's style.
>>
>>>>
>>>>
>>>> library_call.cpp
>>>> Good. Indentation style of the call parameters differs from encodeBlock.
>>
>> Will fix.
>>
>>>>
>>>>
>>>> runtime.cpp
>>>> Good.
>>>>
>>>>
>>>> aotCodeHeap.cpp
>>>> vmSymbols.cpp
>>>> shenandoahSupport.cpp
>>>> vmStructs_jvmci.cpp
>>>> shenandoahSupport.cpp
>>>> escape.cpp
>>>> runtime.hpp
>>>> stubRoutines.cpp
>>>> stubRoutines.hpp
>>>> vmStructs.cpp
>>>> Good and trivial.
>>>>
>>>>
>>>> Tests:
>>>> I think we should have JTREG tests to check for regressions in the future.
>>
>> Ah, this is another thing I didn't know about.  I will make some
>> regression tests.
>>
>> Thanks for your time on this.  As you can tell, I'm inexperienced in
>> writing openjdk code, so your patience and careful review is really
>> appreciated.
>>
>> - Corey