RFR: 8248188: Add IntrinsicCandidate and API for Base64 decoding [v4]
Paul Sandoz
psandoz at openjdk.java.net
Thu Oct 15 16:02:20 UTC 2020
On Wed, 14 Oct 2020 21:29:55 GMT, CoreyAshford <github.com+51754783+CoreyAshford at openjdk.org> wrote:
>> Hi Corey,
>>
>>> Are you thinking of a case where that produces a higher iteration count?
>> Sorry for the confusion. This is also fine:
>> length = sl - sp - 12
>> i = length / block_size
>> if (i <= 0) return 0
>>
>> But I still wonder why we should use 2 branches. Why not
>> srawi_
>> ble(CCR0, return_zero)
>> ?
>>
>>> Ah, I should have checked the calling conventions. I thought all of the CR* regs are volatile. I will fix that.
>> Actually, we do save and restore all CRs, so it’s not a real problem with the current implementation. But I prefer
>> staying closer to the elf ABI as long as there’s no good reason to do it differently.
>>> Your original comment said "2nd review", so I thought you meant you need to review it again after the changes.
>> We usually require at least 2 reviews by different people for all non-trivial changes. And I don’t consider the PPC64
>> part as trivial. In addition to that, I’m not familiar with Power 10.
>> Best regards,
>> Martin
>>
>>
>> From: CoreyAshford <notifications at github.com>
>> Sent: Dienstag, 13. Oktober 2020 22:59
>> To: openjdk/jdk <jdk at noreply.github.com>
>> Cc: Doerr, Martin <martin.doerr at sap.com>; Mention <mention at noreply.github.com>
>> Subject: Re: [openjdk/jdk] 8248188: Add IntrinsicCandidate and API for Base64 decoding (#293)
>>
>>
>> Hi Corey, thanks for taking some stuff out of the “too short” path. There may be a performance regression when decoding
>> many short arrays because of the stub call overhead and the usage of the slower part of the Java implementation. We
>> could do it a little better in many cases to compute the maximum possible iteration count i: i = (sl - sp) / block_size
>> if (i * block_size > sl - 12) i-- if (i <= 0) return 0 What do you think? Are you thinking of a case where that
>> produces a higher iteration count? It looks effectively the same to me. I don’t think branch prediction hints are
>> helpful for the “too short” check.
>> My thinking is that most of the time when the intrinsic is called, it will not take the early exit, but I suppose when
>> it is processing a sub-block_size buffer, it will return early every time. I will remove the hints.
>> And we should better use CCR1 instead of CCR2 which is specified as non-volatile.
>>
>> Ah, I should have checked the calling conventions. I thought all of the CR* regs are volatile. I will fix that.
>>
>> Did you already find a 2nd reviewer for the PPC64 part?
>>
>> Your original comment said "2nd review", so I thought you meant you need to review it again after the changes. So, no,
>> I haven't looked for or found a second reviewer. Any suggestions? The folks on the team here have been busy with other
>> work. Btw, I'm off today, so I will push commits to the above-mentioned issues tomorrow.
>>
>> —
>> You are receiving this because you were mentioned.
>> Reply to this email directly, view it on GitHub<https://github.com/openjdk/jdk/pull/293#issuecomment-708005545>, or
>> unsubscribe<https://github.com/notifications/unsubscribe-auth/AKR64KC2K7IZFAPXVQIVYZDSKS5SLANCNFSM4RVHNW5Q>.
>
>> Hi Corey,
>> Are you thinking of a case where that produces a higher iteration count?
>> Sorry for the confusion. This is also fine: length = sl - sp - 12 i = length / block_size if (i <= 0) return 0 But I
>> still wonder why we should use 2 branches. Why not srawi_ ble(CCR0, return_zero) ?
>
> You're right! I originally thought that the `srawi.` was setting only the Zero bit, but it sets others as well.
>
>> Ah, I should have checked the calling conventions. I thought all of the CR* regs are volatile. I will fix that.
>> Actually, we do save and restore all CRs, so it’s not a real problem with the current implementation. But I prefer
>> staying closer to the elf ABI as long as there’s no good reason to do it differently.
>
> Looks like I don't need that code at all now, but it's good to know for the future; I have an encode intrinsic in the
> works.
>> Your original comment said "2nd review", so I thought you meant you need to review it again after the changes.
>> We usually require at least 2 reviews by different people for all non-trivial changes. And I don’t consider the PPC64
>> part as trivial. In addition to that, I’m not familiar with Power 10.
>
> I received permission to request help from the GNU toolchain team here to review it. Due to family issues and work
> schedule on my end, it will be at least the middle of next week before I can get a reviewer to have a look.
> Thanks for your continued patience and help.
Please update
[compiler/graalunit/HotspotTest.java](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/graalunit/HotspotTest.java),
and add the intrinsic signature.
-------------
PR: https://git.openjdk.java.net/jdk/pull/293
More information about the core-libs-dev
mailing list