RFR[15] JDK-8243114: Implement montgomery{Multiply, Square}intrinsics on Windows

Simon Tooke stooke at redhat.com
Fri Jun 5 20:46:22 UTC 2020


Thanks again for the review.

As per your and Andrew Haley's comments, I have updated the webrev:

- used NOINLINE

- used julong

- deleted the block of unused code.

Please let me know what you think.

updated webrev: 
http://cr.openjdk.java.net/~stooke/webrevs/jdk-8243114-jdk/01/01/

Thanks,

-Simon



On 2020-06-05 12:35 a.m., David Holmes wrote:
> Hi Simon,
>
> On 4/06/2020 11:35 pm, Simon Tooke wrote:
>> Hello, David, and thanks for the review!
>>
>> I've responded to your comments below, and intend to post a new patch 
>> for review today or tomorrow.
>>
>> Thanks again,
>>
>> -Simon
>>
>> On 2020-06-03 2:06 a.m., David Holmes wrote:
>>> Hi Simon,
>>>
>>> On 23/05/2020 12:04 am, Sean Mullan wrote:
>>>> Cross-posting to hotspot-dev for additional review since the code 
>>>> changes are in hotspot.
>>>>
>>>> --Sean
>>>>
>>>> On 5/21/20 1:24 PM, Simon Tooke wrote:
>>>>> Hello,
>>>>>
>>>>> I'd like to request a review for:
>>>>>
>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8243114
>>>>>
>>>>> Webrev: 
>>>>> http://cr.openjdk.java.net/~stooke/webrevs/jdk-8243114-jdk/00/00/
>>>>>
>>>>> This change implements the given intrinsics on Windows.
>>>>>
>>>>> The Windows toolchain has no 128 bit integer types, and no inline 
>>>>> asm (on x64 and Arm).  In addition, 'long' is 4 bytes, not 8, as 
>>>>> it is with gcc.  This patch had to change some of the linux 
>>>>> implementation to account for these limitations.
>>>
>>> I can't really comment on the intrinsics themselves but a couple of 
>>> suggestions:
>>>
>>> - use explicitly sized types e.g. uint64_t instead of unsigned long 
>>> long
>>
>> Yes, this hurt to type.  A previous review suggested using julong, is 
>> that acceptable to you?
>
> j* types should only be used when dealing with values that come from 
> or go to Java. Obviously julong is a misnomer as Java doesn't have an 
> unsigned type, but in general this is something we are trying to fix 
> up in the codebase. If these arrays are extracted from Java arrays 
> then using a j* type would be appropriate, but then I would expect 
> jlong, unless the algorithm explicitly requires unsigned types? If so 
> julong would be acceptable.
>
>> (an aside: I'm now wondering if there is other code in the JDK that 
>> assumes long is 64bits - which is not true on all platforms...)
>
> There has been such code, but hopefully there is no remaining actively 
> used code with that bug. There are using of "long" that should be 
> eradicated (there's an open JBS issue for that as I recall) but the 
> remaining uses don't seem to require the long be 64-bit.
>
>>> - use the existing NOINLINE macro for the _declspec(noinline)
>> Thanks, will do.
>>>
>>> The conditional compilation in this code has me quite confused. 
>>> Looking at the existing code we have:
>>>
>>> 3680 #ifndef _WINDOWS
>>> 3681
>>> 3682 #define ASM_SUBTRACT
>>> 3683
>>> 3684 #ifdef ASM_SUBTRACT
>>> ...
>>> 3702 #else // ASM_SUBTRACT
>>> ...
>>> 3719 #endif // ! ASM_SUBTRACT
>>>
>>> So all the above code is only built on not-Windows, and we 
>>> unconditionally define ASM_SUBTRACT, so lines 3702 to 3719 appear to 
>>> be dead code! I'm not at all sure whether that is actually a bug and 
>>> the windows ifndef should have had an endif after line 3682; or 
>>> whether we can just simplify the code.
>> The dead code existed prior to this patch, so I wasn't proposing 
>> removing it.  I'm happy to do so if that's for the best.  As far as I 
>> can tell, it's for implementing these intrinsics on gcc platforms 
>> without assembler.
>
> AFAICS Andrew originally had the ASM_SUBTRACT parts, with the 
> unconditional #define, but there was no explanation in the review 
> thread as to why the unused code remained present. Then before 
> integration all the code was wrapped by the ifndef Windows to exclude 
> it from Windows.
>
> I think it needs to be fixed as you are making changes to the Windows 
> part and it is very hard to establish how the dead code should look in 
> that case.
>
> Thanks,
> David
>
>>>
>>> Thanks,
>>> David
>>>
>>>>>
>>>>> My gratitude for Andrew Haley for doing the heavy lifting at the 
>>>>> core of this patch.
>>>>>
>>>>> The patch, if accepted, will be offered to 11u as a potential 
>>>>> backport. The changes apply cleanly modulo some line number changes.
>>>>>
>>>>> As for the speedup, this test case:
>>>>>
>>>>> BigInteger base = BigInteger.ONE.shiftLeft(1024);
>>>>> long count = LongStream.rangeClosed(2, 100_000)
>>>>>      .mapToObj(n -> BigInteger.valueOf(n).add(base))
>>>>>      .filter(i -> i.isProbablePrime(50))
>>>>>      .count();
>>>>>
>>>>> goes from 1 minute 20 seconds down to about 35 seconds om my VM, 
>>>>> over multiple runs.  As is the case on other platforms where the 
>>>>> intrinsics are supported, they will be enabled by default on Windows.
>>>>>
>>>>> Thank you for your time,
>>>>>
>>>>> -Simon Tooke
>>>>>
>>>>>
>>>>> Principal Software Engineer,
>>>>>
>>>>> Red Hat Canada
>>>>>
>>>>>
>>>>>
>>>
>>
>




More information about the security-dev mailing list