RFR[15] JDK-8243114: Implement montgomery{Multiply, Square}intrinsics on Windows
David Holmes
david.holmes at oracle.com
Mon Jun 8 09:33:23 UTC 2020
Hi Simon,
On 6/06/2020 6:46 am, Simon Tooke wrote:
> Thanks again for the review.
>
> As per your and Andrew Haley's comments, I have updated the webrev:
>
> - used NOINLINE
>
> - used julong
>
> - deleted the block of unused code.
>
> Please let me know what you think.
>
> updated webrev:
> http://cr.openjdk.java.net/~stooke/webrevs/jdk-8243114-jdk/01/01/
All my comments have been addressed - thanks.
David
> Thanks,
>
> -Simon
>
>
>
> On 2020-06-05 12:35 a.m., David Holmes wrote:
>> Hi Simon,
>>
>> On 4/06/2020 11:35 pm, Simon Tooke wrote:
>>> Hello, David, and thanks for the review!
>>>
>>> I've responded to your comments below, and intend to post a new patch
>>> for review today or tomorrow.
>>>
>>> Thanks again,
>>>
>>> -Simon
>>>
>>> On 2020-06-03 2:06 a.m., David Holmes wrote:
>>>> Hi Simon,
>>>>
>>>> On 23/05/2020 12:04 am, Sean Mullan wrote:
>>>>> Cross-posting to hotspot-dev for additional review since the code
>>>>> changes are in hotspot.
>>>>>
>>>>> --Sean
>>>>>
>>>>> On 5/21/20 1:24 PM, Simon Tooke wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I'd like to request a review for:
>>>>>>
>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8243114
>>>>>>
>>>>>> Webrev:
>>>>>> http://cr.openjdk.java.net/~stooke/webrevs/jdk-8243114-jdk/00/00/
>>>>>>
>>>>>> This change implements the given intrinsics on Windows.
>>>>>>
>>>>>> The Windows toolchain has no 128 bit integer types, and no inline
>>>>>> asm (on x64 and Arm). In addition, 'long' is 4 bytes, not 8, as
>>>>>> it is with gcc. This patch had to change some of the linux
>>>>>> implementation to account for these limitations.
>>>>
>>>> I can't really comment on the intrinsics themselves but a couple of
>>>> suggestions:
>>>>
>>>> - use explicitly sized types e.g. uint64_t instead of unsigned long
>>>> long
>>>
>>> Yes, this hurt to type. A previous review suggested using julong, is
>>> that acceptable to you?
>>
>> j* types should only be used when dealing with values that come from
>> or go to Java. Obviously julong is a misnomer as Java doesn't have an
>> unsigned type, but in general this is something we are trying to fix
>> up in the codebase. If these arrays are extracted from Java arrays
>> then using a j* type would be appropriate, but then I would expect
>> jlong, unless the algorithm explicitly requires unsigned types? If so
>> julong would be acceptable.
>>
>>> (an aside: I'm now wondering if there is other code in the JDK that
>>> assumes long is 64bits - which is not true on all platforms...)
>>
>> There has been such code, but hopefully there is no remaining actively
>> used code with that bug. There are using of "long" that should be
>> eradicated (there's an open JBS issue for that as I recall) but the
>> remaining uses don't seem to require the long be 64-bit.
>>
>>>> - use the existing NOINLINE macro for the _declspec(noinline)
>>> Thanks, will do.
>>>>
>>>> The conditional compilation in this code has me quite confused.
>>>> Looking at the existing code we have:
>>>>
>>>> 3680 #ifndef _WINDOWS
>>>> 3681
>>>> 3682 #define ASM_SUBTRACT
>>>> 3683
>>>> 3684 #ifdef ASM_SUBTRACT
>>>> ...
>>>> 3702 #else // ASM_SUBTRACT
>>>> ...
>>>> 3719 #endif // ! ASM_SUBTRACT
>>>>
>>>> So all the above code is only built on not-Windows, and we
>>>> unconditionally define ASM_SUBTRACT, so lines 3702 to 3719 appear to
>>>> be dead code! I'm not at all sure whether that is actually a bug and
>>>> the windows ifndef should have had an endif after line 3682; or
>>>> whether we can just simplify the code.
>>> The dead code existed prior to this patch, so I wasn't proposing
>>> removing it. I'm happy to do so if that's for the best. As far as I
>>> can tell, it's for implementing these intrinsics on gcc platforms
>>> without assembler.
>>
>> AFAICS Andrew originally had the ASM_SUBTRACT parts, with the
>> unconditional #define, but there was no explanation in the review
>> thread as to why the unused code remained present. Then before
>> integration all the code was wrapped by the ifndef Windows to exclude
>> it from Windows.
>>
>> I think it needs to be fixed as you are making changes to the Windows
>> part and it is very hard to establish how the dead code should look in
>> that case.
>>
>> Thanks,
>> David
>>
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>>>
>>>>>> My gratitude for Andrew Haley for doing the heavy lifting at the
>>>>>> core of this patch.
>>>>>>
>>>>>> The patch, if accepted, will be offered to 11u as a potential
>>>>>> backport. The changes apply cleanly modulo some line number changes.
>>>>>>
>>>>>> As for the speedup, this test case:
>>>>>>
>>>>>> BigInteger base = BigInteger.ONE.shiftLeft(1024);
>>>>>> long count = LongStream.rangeClosed(2, 100_000)
>>>>>> .mapToObj(n -> BigInteger.valueOf(n).add(base))
>>>>>> .filter(i -> i.isProbablePrime(50))
>>>>>> .count();
>>>>>>
>>>>>> goes from 1 minute 20 seconds down to about 35 seconds om my VM,
>>>>>> over multiple runs. As is the case on other platforms where the
>>>>>> intrinsics are supported, they will be enabled by default on Windows.
>>>>>>
>>>>>> Thank you for your time,
>>>>>>
>>>>>> -Simon Tooke
>>>>>>
>>>>>>
>>>>>> Principal Software Engineer,
>>>>>>
>>>>>> Red Hat Canada
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>>
>
More information about the security-dev
mailing list