RFR[15] JDK-8243114: Implement montgomery{Multiply, Square}intrinsics on Windows
David Holmes
david.holmes at oracle.com
Fri Jun 5 04:35:50 UTC 2020
Hi Simon,
On 4/06/2020 11:35 pm, Simon Tooke wrote:
> Hello, David, and thanks for the review!
>
> I've responded to your comments below, and intend to post a new patch
> for review today or tomorrow.
>
> Thanks again,
>
> -Simon
>
> On 2020-06-03 2:06 a.m., David Holmes wrote:
>> Hi Simon,
>>
>> On 23/05/2020 12:04 am, Sean Mullan wrote:
>>> Cross-posting to hotspot-dev for additional review since the code
>>> changes are in hotspot.
>>>
>>> --Sean
>>>
>>> On 5/21/20 1:24 PM, Simon Tooke wrote:
>>>> Hello,
>>>>
>>>> I'd like to request a review for:
>>>>
>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8243114
>>>>
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~stooke/webrevs/jdk-8243114-jdk/00/00/
>>>>
>>>> This change implements the given intrinsics on Windows.
>>>>
>>>> The Windows toolchain has no 128 bit integer types, and no inline
>>>> asm (on x64 and Arm). In addition, 'long' is 4 bytes, not 8, as it
>>>> is with gcc. This patch had to change some of the linux
>>>> implementation to account for these limitations.
>>
>> I can't really comment on the intrinsics themselves but a couple of
>> suggestions:
>>
>> - use explicitly sized types e.g. uint64_t instead of unsigned long long
>
> Yes, this hurt to type. A previous review suggested using julong, is
> that acceptable to you?
j* types should only be used when dealing with values that come from or
go to Java. Obviously julong is a misnomer as Java doesn't have an
unsigned type, but in general this is something we are trying to fix up
in the codebase. If these arrays are extracted from Java arrays then
using a j* type would be appropriate, but then I would expect jlong,
unless the algorithm explicitly requires unsigned types? If so julong
would be acceptable.
> (an aside: I'm now wondering if there is other code in the JDK that
> assumes long is 64bits - which is not true on all platforms...)
There has been such code, but hopefully there is no remaining actively
used code with that bug. There are using of "long" that should be
eradicated (there's an open JBS issue for that as I recall) but the
remaining uses don't seem to require the long be 64-bit.
>> - use the existing NOINLINE macro for the _declspec(noinline)
> Thanks, will do.
>>
>> The conditional compilation in this code has me quite confused.
>> Looking at the existing code we have:
>>
>> 3680 #ifndef _WINDOWS
>> 3681
>> 3682 #define ASM_SUBTRACT
>> 3683
>> 3684 #ifdef ASM_SUBTRACT
>> ...
>> 3702 #else // ASM_SUBTRACT
>> ...
>> 3719 #endif // ! ASM_SUBTRACT
>>
>> So all the above code is only built on not-Windows, and we
>> unconditionally define ASM_SUBTRACT, so lines 3702 to 3719 appear to
>> be dead code! I'm not at all sure whether that is actually a bug and
>> the windows ifndef should have had an endif after line 3682; or
>> whether we can just simplify the code.
> The dead code existed prior to this patch, so I wasn't proposing
> removing it. I'm happy to do so if that's for the best. As far as I
> can tell, it's for implementing these intrinsics on gcc platforms
> without assembler.
AFAICS Andrew originally had the ASM_SUBTRACT parts, with the
unconditional #define, but there was no explanation in the review thread
as to why the unused code remained present. Then before integration all
the code was wrapped by the ifndef Windows to exclude it from Windows.
I think it needs to be fixed as you are making changes to the Windows
part and it is very hard to establish how the dead code should look in
that case.
Thanks,
David
>>
>> Thanks,
>> David
>>
>>>>
>>>> My gratitude for Andrew Haley for doing the heavy lifting at the
>>>> core of this patch.
>>>>
>>>> The patch, if accepted, will be offered to 11u as a potential
>>>> backport. The changes apply cleanly modulo some line number changes.
>>>>
>>>> As for the speedup, this test case:
>>>>
>>>> BigInteger base = BigInteger.ONE.shiftLeft(1024);
>>>> long count = LongStream.rangeClosed(2, 100_000)
>>>> .mapToObj(n -> BigInteger.valueOf(n).add(base))
>>>> .filter(i -> i.isProbablePrime(50))
>>>> .count();
>>>>
>>>> goes from 1 minute 20 seconds down to about 35 seconds om my VM,
>>>> over multiple runs. As is the case on other platforms where the
>>>> intrinsics are supported, they will be enabled by default on Windows.
>>>>
>>>> Thank you for your time,
>>>>
>>>> -Simon Tooke
>>>>
>>>>
>>>> Principal Software Engineer,
>>>>
>>>> Red Hat Canada
>>>>
>>>>
>>>>
>>
>
More information about the security-dev
mailing list