RFR: 8248188: Add IntrinsicCandidate and API for Base64 decoding [v4]
Martin Doerr
mdoerr at openjdk.java.net
Wed Oct 14 10:33:11 UTC 2020
On Wed, 14 Oct 2020 02:04:42 GMT, CoreyAshford <github.com+51754783+CoreyAshford at openjdk.org> wrote:
>>> Hi Corey, thanks for taking some stuff out of the “too short” path. There may be a performance regression when decoding
>>> many short arrays because of the stub call overhead and the usage of the slower part of the Java implementation. We
>>> could do it a little better in many cases to compute the maximum possible iteration count i: i = (sl - sp) / block_size
>>> if (i * block_size > sl - 12) i-- if (i <= 0) return 0 What do you think?
>>
>> Are you thinking of a case where that produces a higher iteration count? It looks effectively the same to me.
>>
>>> I don’t think branch prediction hints are helpful for the “too short” check.
>>
>> My thinking is that most of the time when the intrinsic is called, it will not take the early exit, but I suppose when
>> it is processing a sub-block_size buffer, it will return early every time. I will remove the hints.
>>> And we should better use CCR1 instead of CCR2 which is specified as non-volatile.
>>
>> Ah, I should have checked the calling conventions. I thought all of the CR* regs are volatile. I will fix that.
>>
>>> Did you already find a 2nd reviewer for the PPC64 part?
>>
>> Your original comment said "2nd review", so I thought you meant you need to review it again after the changes. So, no,
>> I haven't looked for or found a second reviewer. Any suggestions? The folks on the team here have been busy with
>> other work. Btw, I'm off today, so I will push commits to the above-mentioned issues tomorrow.
>
>>
>> > Did you already find a 2nd reviewer for the PPC64 part?
>>
>> Your original comment said "2nd review", so I thought you meant you need to review it again after the changes. So, no,
>> I haven't looked for or found a second reviewer. Any suggestions? The folks on the team here have been busy with other
>> work.
>
> I am actively asking for some help here, so maybe within a few days I can get a 2nd reviewer.
Hi Corey,
> Are you thinking of a case where that produces a higher iteration count?
Sorry for the confusion. This is also fine:
length = sl - sp - 12
i = length / block_size
if (i <= 0) return 0
But I still wonder why we should use 2 branches. Why not
srawi_
ble(CCR0, return_zero)
?
> Ah, I should have checked the calling conventions. I thought all of the CR* regs are volatile. I will fix that.
Actually, we do save and restore all CRs, so it’s not a real problem with the current implementation. But I prefer
staying closer to the elf ABI as long as there’s no good reason to do it differently.
> Your original comment said "2nd review", so I thought you meant you need to review it again after the changes.
We usually require at least 2 reviews by different people for all non-trivial changes. And I don’t consider the PPC64
part as trivial. In addition to that, I’m not familiar with Power 10.
Best regards,
Martin
From: CoreyAshford <notifications at github.com>
Sent: Dienstag, 13. Oktober 2020 22:59
To: openjdk/jdk <jdk at noreply.github.com>
Cc: Doerr, Martin <martin.doerr at sap.com>; Mention <mention at noreply.github.com>
Subject: Re: [openjdk/jdk] 8248188: Add IntrinsicCandidate and API for Base64 decoding (#293)
Hi Corey, thanks for taking some stuff out of the “too short” path. There may be a performance regression when decoding
many short arrays because of the stub call overhead and the usage of the slower part of the Java implementation. We
could do it a little better in many cases to compute the maximum possible iteration count i: i = (sl - sp) / block_size
if (i * block_size > sl - 12) i-- if (i <= 0) return 0 What do you think?
Are you thinking of a case where that produces a higher iteration count? It looks effectively the same to me.
I don’t think branch prediction hints are helpful for the “too short” check.
My thinking is that most of the time when the intrinsic is called, it will not take the early exit, but I suppose when
it is processing a sub-block_size buffer, it will return early every time. I will remove the hints.
And we should better use CCR1 instead of CCR2 which is specified as non-volatile.
Ah, I should have checked the calling conventions. I thought all of the CR* regs are volatile. I will fix that.
Did you already find a 2nd reviewer for the PPC64 part?
Your original comment said "2nd review", so I thought you meant you need to review it again after the changes. So, no,
I haven't looked for or found a second reviewer. Any suggestions? The folks on the team here have been busy with other
work.
Btw, I'm off today, so I will push commits to the above-mentioned issues tomorrow.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<https://github.com/openjdk/jdk/pull/293#issuecomment-708005545>, or
unsubscribe<https://github.com/notifications/unsubscribe-auth/AKR64KC2K7IZFAPXVQIVYZDSKS5SLANCNFSM4RVHNW5Q>.
-------------
PR: https://git.openjdk.java.net/jdk/pull/293
More information about the core-libs-dev
mailing list