RFR[15] JDK-8243114: Implement montgomery{Multiply, Square}intrinsics on Windows

David Holmes david.holmes at oracle.com
Mon Jun 8 09:33:23 UTC 2020


Hi Simon,

On 6/06/2020 6:46 am, Simon Tooke wrote:
> Thanks again for the review.
> 
> As per your and Andrew Haley's comments, I have updated the webrev:
> 
> - used NOINLINE
> 
> - used julong
> 
> - deleted the block of unused code.
> 
> Please let me know what you think.
> 
> updated webrev: 
> http://cr.openjdk.java.net/~stooke/webrevs/jdk-8243114-jdk/01/01/

All my comments have been addressed - thanks.

David

> Thanks,
> 
> -Simon
> 
> 
> 
> On 2020-06-05 12:35 a.m., David Holmes wrote:
>> Hi Simon,
>>
>> On 4/06/2020 11:35 pm, Simon Tooke wrote:
>>> Hello, David, and thanks for the review!
>>>
>>> I've responded to your comments below, and intend to post a new patch 
>>> for review today or tomorrow.
>>>
>>> Thanks again,
>>>
>>> -Simon
>>>
>>> On 2020-06-03 2:06 a.m., David Holmes wrote:
>>>> Hi Simon,
>>>>
>>>> On 23/05/2020 12:04 am, Sean Mullan wrote:
>>>>> Cross-posting to hotspot-dev for additional review since the code 
>>>>> changes are in hotspot.
>>>>>
>>>>> --Sean
>>>>>
>>>>> On 5/21/20 1:24 PM, Simon Tooke wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I'd like to request a review for:
>>>>>>
>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8243114
>>>>>>
>>>>>> Webrev: 
>>>>>> http://cr.openjdk.java.net/~stooke/webrevs/jdk-8243114-jdk/00/00/
>>>>>>
>>>>>> This change implements the given intrinsics on Windows.
>>>>>>
>>>>>> The Windows toolchain has no 128 bit integer types, and no inline 
>>>>>> asm (on x64 and Arm).  In addition, 'long' is 4 bytes, not 8, as 
>>>>>> it is with gcc.  This patch had to change some of the linux 
>>>>>> implementation to account for these limitations.
>>>>
>>>> I can't really comment on the intrinsics themselves but a couple of 
>>>> suggestions:
>>>>
>>>> - use explicitly sized types e.g. uint64_t instead of unsigned long 
>>>> long
>>>
>>> Yes, this hurt to type.  A previous review suggested using julong, is 
>>> that acceptable to you?
>>
>> j* types should only be used when dealing with values that come from 
>> or go to Java. Obviously julong is a misnomer as Java doesn't have an 
>> unsigned type, but in general this is something we are trying to fix 
>> up in the codebase. If these arrays are extracted from Java arrays 
>> then using a j* type would be appropriate, but then I would expect 
>> jlong, unless the algorithm explicitly requires unsigned types? If so 
>> julong would be acceptable.
>>
>>> (an aside: I'm now wondering if there is other code in the JDK that 
>>> assumes long is 64bits - which is not true on all platforms...)
>>
>> There has been such code, but hopefully there is no remaining actively 
>> used code with that bug. There are using of "long" that should be 
>> eradicated (there's an open JBS issue for that as I recall) but the 
>> remaining uses don't seem to require the long be 64-bit.
>>
>>>> - use the existing NOINLINE macro for the _declspec(noinline)
>>> Thanks, will do.
>>>>
>>>> The conditional compilation in this code has me quite confused. 
>>>> Looking at the existing code we have:
>>>>
>>>> 3680 #ifndef _WINDOWS
>>>> 3681
>>>> 3682 #define ASM_SUBTRACT
>>>> 3683
>>>> 3684 #ifdef ASM_SUBTRACT
>>>> ...
>>>> 3702 #else // ASM_SUBTRACT
>>>> ...
>>>> 3719 #endif // ! ASM_SUBTRACT
>>>>
>>>> So all the above code is only built on not-Windows, and we 
>>>> unconditionally define ASM_SUBTRACT, so lines 3702 to 3719 appear to 
>>>> be dead code! I'm not at all sure whether that is actually a bug and 
>>>> the windows ifndef should have had an endif after line 3682; or 
>>>> whether we can just simplify the code.
>>> The dead code existed prior to this patch, so I wasn't proposing 
>>> removing it.  I'm happy to do so if that's for the best.  As far as I 
>>> can tell, it's for implementing these intrinsics on gcc platforms 
>>> without assembler.
>>
>> AFAICS Andrew originally had the ASM_SUBTRACT parts, with the 
>> unconditional #define, but there was no explanation in the review 
>> thread as to why the unused code remained present. Then before 
>> integration all the code was wrapped by the ifndef Windows to exclude 
>> it from Windows.
>>
>> I think it needs to be fixed as you are making changes to the Windows 
>> part and it is very hard to establish how the dead code should look in 
>> that case.
>>
>> Thanks,
>> David
>>
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>>>
>>>>>> My gratitude for Andrew Haley for doing the heavy lifting at the 
>>>>>> core of this patch.
>>>>>>
>>>>>> The patch, if accepted, will be offered to 11u as a potential 
>>>>>> backport. The changes apply cleanly modulo some line number changes.
>>>>>>
>>>>>> As for the speedup, this test case:
>>>>>>
>>>>>> BigInteger base = BigInteger.ONE.shiftLeft(1024);
>>>>>> long count = LongStream.rangeClosed(2, 100_000)
>>>>>>      .mapToObj(n -> BigInteger.valueOf(n).add(base))
>>>>>>      .filter(i -> i.isProbablePrime(50))
>>>>>>      .count();
>>>>>>
>>>>>> goes from 1 minute 20 seconds down to about 35 seconds om my VM, 
>>>>>> over multiple runs.  As is the case on other platforms where the 
>>>>>> intrinsics are supported, they will be enabled by default on Windows.
>>>>>>
>>>>>> Thank you for your time,
>>>>>>
>>>>>> -Simon Tooke
>>>>>>
>>>>>>
>>>>>> Principal Software Engineer,
>>>>>>
>>>>>> Red Hat Canada
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>>
> 



More information about the security-dev mailing list