RFR: 8248188: Add HotSpotIntrinsicCandidate and API for Base64 decoding
CoreyAshford
github.com+51754783+CoreyAshford at openjdk.java.net
Mon Sep 28 17:20:45 UTC 2020
On Mon, 28 Sep 2020 16:35:59 GMT, CoreyAshford <github.com+51754783+CoreyAshford at openjdk.org> wrote:
>> AOT support needs an update:
>> # Internal Error (jdk/src/hotspot/share/aot/aotCodeHeap.cpp:557), pid=345656, tid=364316
>> # guarantee(adr != NULL) failed: AOT Symbol not found _aot_stub_routines_base64_decodeBlock
>>
>> V [jvm.dll+0x1dbc6e] AOTCodeHeap::link_stub_routines_symbols+0xf7e (aotcodeheap.cpp:557)
>> V [jvm.dll+0x1d95e8] AOTCodeHeap::link_global_lib_symbols+0x2f8 (aotcodeheap.cpp:603)
>> V [jvm.dll+0x1dc616] AOTCodeHeap::load_klass_data+0x476 (aotcodeheap.cpp:840)
>> V [jvm.dll+0x1e1021] AOTLoader::load_for_klass+0x161 (aotloader.cpp:55)
>> V [jvm.dll+0x5fec96] InstanceKlass::initialize_impl+0x4e6 (instanceklass.cpp:1159)
>> V [jvm.dll+0x5fead6] InstanceKlass::initialize_impl+0x326 (instanceklass.cpp:1133)
>> V [jvm.dll+0xc5a633] Threads::initialize_java_lang_classes+0x93 (thread.cpp:3766)
>> V [jvm.dll+0xc57c32] Threads::create_vm+0xa12 (thread.cpp:4037)
>>
>> Can be reproduced by running JTREG tests:
>> compiler/aot/calls/fromAot
>
> Thanks for catching that! Will fix on next round.
Martin Doerr wrote:
...
> I can see you're using clrldi to clear the upper bits of the parameters. But seems like it clears one bit too few.
You're right. I misread the instruction as clearing bits 0 through N, but it actually builds a mask with ones in
bits N through 63 and zeroes elsewhere, then ANDs it with the source register, so only bits 0 through N-1 are cleared.
Will fix.
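
For illustration only, here is a minimal Java sketch of the clrldi semantics described above, assuming IBM bit
numbering (bit 0 is the most significant bit). The class and method names are hypothetical and are not part of the
patch; the sketch only models the mask-and-AND behaviour to show how an operand that is off by one leaves one stray
upper bit in place.

```java
public class ClrldiSketch {
    // Hypothetical model of PowerPC `clrldi dst, src, n`:
    // AND src with a mask that has ones in IBM-numbered bits n..63
    // (bit 0 = most significant), which clears the top n bits.
    static long clrldi(long src, int n) {
        if (n < 0 || n > 63) {
            throw new IllegalArgumentException("n must be in 0..63");
        }
        long mask = -1L >>> n;   // n leading zeroes, then (64 - n) ones
        return src & mask;
    }

    public static void main(String[] args) {
        // A 32-bit parameter with garbage in the upper half of the register.
        long dirty = 0xDEADBEEF_12345678L;

        // n = 32 clears all 32 upper bits.
        System.out.println(Long.toHexString(clrldi(dirty, 32)));  // prints 12345678

        // n = 31 clears one bit too few: the lowest of the upper bits survives.
        System.out.println(Long.toHexString(clrldi(dirty, 31)));  // prints 112345678
    }
}
```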
> You can also use cmpwi for the boolean one.
Ah, good! Thanks. Will change.
>
> I wonder about the loop unrolling. It doesn't look beneficial because the loop body is large.
> Did you measure performance gain by this unrolling?
> I think for aggressive tuning we'd have to apply techniques like modulo scheduling, but that's much more work.
> So please only use unrolling as far as a benefit is measurable.
I did test a prototype written in C using vector intrinsics, and an unroll factor of 8 was the sweet spot. However,
the structure of that code was a bit different, and I should have verified that the same amount of loop unrolling
makes sense for the Java intrinsic. I will perform those experiments.
>
> But you may want to align the loop start to help instruction fetch.
Interesting. I did add an align, but I must have lost it somehow during my patch cleanup. I will add it back.
Sorry for that mistake.
> We'll test it, but we don't have Power 10. You guys need to cover that.
I did test on Power10, but I wasn't able to do performance testing because I ran on an instruction-level simulator.
Real hardware will be available in the coming months.
Thanks for your careful look at the code, and the regression testing you've done.
-------------
PR: https://git.openjdk.java.net/jdk/pull/293