RFR[15] JDK-8243114: Implement montgomery{Multiply, Square}intrinsics on Windows
David Holmes
david.holmes at oracle.com
Wed Jun 3 06:06:36 UTC 2020
Hi Simon,
On 23/05/2020 12:04 am, Sean Mullan wrote:
> Cross-posting to hotspot-dev for additional review since the code
> changes are in hotspot.
>
> --Sean
>
> On 5/21/20 1:24 PM, Simon Tooke wrote:
>> Hello,
>>
>> I'd like to request a review for:
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8243114
>>
>> Webrev: http://cr.openjdk.java.net/~stooke/webrevs/jdk-8243114-jdk/00/00/
>>
>> This change implements the given intrinsics on Windows.
>>
>> The Windows toolchain has no 128 bit integer types, and no inline asm
>> (on x64 and Arm). In addition, 'long' is 4 bytes, not 8, as it is
>> with gcc. This patch had to change some of the linux implementation
>> to account for these limitations.
I can't really comment on the intrinsics themselves but a couple of
suggestions:
- use explicitly sized types e.g. uint64_t instead of unsigned long long
- use the existing NOINLINE macro for the _declspec(noinline)
The conditional compilation in this code has me quite confused. Looking
at the existing code we have:
3680 #ifndef _WINDOWS
3681
3682 #define ASM_SUBTRACT
3683
3684 #ifdef ASM_SUBTRACT
...
3702 #else // ASM_SUBTRACT
...
3719 #endif // ! ASM_SUBTRACT
So all the above code is only built on not-Windows, and we
unconditionally define ASM_SUBTRACT, so lines 3702 to 3719 appear to be
dead code! I'm not at all sure whether that is actually a bug and the
windows ifndef should have had an endif after line 3682; or whether we
can just simplify the code.
Thanks,
David
>>
>> My gratitude for Andrew Haley for doing the heavy lifting at the core
>> of this patch.
>>
>> The patch, if accepted, will be offered to 11u as a potential
>> backport. The changes apply cleanly modulo some line number changes.
>>
>> As for the speedup, this test case:
>>
>> BigInteger base = BigInteger.ONE.shiftLeft(1024);
>> long count = LongStream.rangeClosed(2, 100_000)
>> .mapToObj(n -> BigInteger.valueOf(n).add(base))
>> .filter(i -> i.isProbablePrime(50))
>> .count();
>>
>> goes from 1 minute 20 seconds down to about 35 seconds om my VM, over
>> multiple runs. As is the case on other platforms where the intrinsics
>> are supported, they will be enabled by default on Windows.
>>
>> Thank you for your time,
>>
>> -Simon Tooke
>>
>>
>> Principle Software Engineer,
>>
>> Red Hat Canada
>>
>>
>>
More information about the security-dev
mailing list