RFR 8167065 - Adding support for double precision shifting for x86

Tue Oct 11 18:28:44 UTC 2016

Thank you for explaining the situation.

Okay, add range check methods then. There are examples in the file. Their code should be optimized out when inlined by JIT because as you said the range is limited by mag.length. I don't think you 
need to check for NULL - it is checked outside intrinsic method.

Do you have normal benchmark which will show benefit of BigInteger.shift* intrinsics ?

Thanks,
Vladimir

On 10/11/16 11:19 AM, Lupusoru, Razvan A wrote:
> Vladimir,
>
> Thanks for taking a look! I was debating whether to include range checks since we know based on implementation that the arrays will not be null or out of range. However, I will definitely add them if we agree on the intrinsification.
>
> I do not have performance numbers on using pattern matching for BigInteger shifts. The reason is that current pattern matching implementation I provided relies on knowing constant information in order to confirm that the sum of the two shift counts is 32 (integer case). For BigInteger shifts, the shift count is not a constant, but an invariant (since it is an argument passed in). Being an invariant by itself is not a problem as long as we can walk the graph and determine that the two shifts have sum of 32. That said, I actually did not know how to implement such a solution since detecting the sum via the IR did not look obvious. And currently the value ranges don't seem to provide an easy way to get value range of an expression (like a sum expression).
>
> Thus I decided that in order to take advantage of the benefit of double precision shifting, intrinsification would be best. It also gets around a register allocation issue that I saw in BigInteger.shiftRight where the shiftCount invariants are placed in XMM registers and then loaded in GPRs in the loop.
>
> Thanks again for the quick look at my patch. Let me know if you have any other questions or concerns. I can also provide code snippets of what code used to look like before and it looks like now.
>
> --Razvan
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Tuesday, October 11, 2016 10:55 AM
> To: Lupusoru, Razvan A <razvan.a.lupusoru at intel.com>; hotspot-compiler-dev at openjdk.java.net
> Cc: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
> Subject: Re: RFR 8167065 - Adding support for double precision shifting for x86
>
> On 10/11/16 10:51 AM, Vladimir Kozlov wrote:
>> Hi Razvan,
>>
>> The biggest issue with this change is missing range checks for
>> accessed arrays (they are generated by JIT with normal compilation). Usually we add additional java method to do that when we intrinsify array accesses.
>> And I don't think we should intrinsify these small loops. When
>> performance improvement you will get for  BigInteger.shiftLeft and BigInteger.shiftRight if you only Pattern match their body code (mag[j++] << nBits | mag[j] >>> nBits2)?
>
> I mean "What performance improvement".
>
>>
>> Thanks,
>> Vladimir
>>
>> On 10/11/16 9:43 AM, Lupusoru, Razvan A wrote:
>>> Hello,
>>>
>>> I would like to propose some work that I have been recently looking
>>> into - which is use of double precision shifting on x86_64. Double
>>> precision shift is an operation which shifts but instead of bringing in zeros or ones, it brings in bits from another source. This is a powerful operation because it allows making some patterns more concise and enables speed up of code which uses masks to create same semantics.
>>>
>>> So I would like to propose a patch which adds support for double precision shifting in two ways:
>>>
>>> 1) Pattern matching is done to catch pattern A << C | B >>> D and A
>>> << C + B >>> D (where C + D is 32 and 64 for integers and longs
>>> respectively)
>>>
>>> 2) BigInteger methods shiftRight and shiftLeft are intrinsified and made to use double precision shifting.
>>>
>>> In order to confirm performance gains, small micros were created
>>> which test the above operations in tight loops. The tests were done on a Skylake i7-6700k. The following performance improvements were seen by me:
>>>
>>> - Intrinsification of BigInteger.shiftLeft: 15%
>>>
>>> - Intrinsification of BigInteger.shiftRight: 25%
>>>
>>> - Pattern replacement of pattern above with integers: 7.5%
>>>
>>> - Pattern replacement of patterns above with longs: 2%
>>>
>>> The bug is filed here:
>>> https://bugs.openjdk.java.net/browse/JDK-8167065
>>>
>>> The webrev for hotspot is available here:
>>> http://cr.openjdk.java.net/~mcberg/8167065/hotspot/webrev.02/
>>>
>>> The webrev for jdk (minor changes to allow intrinsification) is here:
>>> http://cr.openjdk.java.net/~mcberg/8167065/jdk/webrev.01/
>>>
>>> Thanks so much for your consideration!
>>>
>>> --Razvan
>>>