RFR (M): 8143353: Update for x86 sin and cos in the math lib

Sat Jan 9 06:43:40 UTC 2016

Good. I sponsor it.

Thanks,
Vladimir

On 1/8/16 6:16 PM, Deshpande, Vivek R wrote:
> Hi Vladimir,
>
> I have updated the patch with latest base source and split the macroAssembler_x86_libm.cpp  file into two files for your review.
> The patch is at this location:
> http://cr.openjdk.java.net/~vdeshpande/libm_sincos/8143353/hotspot/webrev.01/
>
> 64 bit code does not have less precise result or lower performance, by without using FPU instructions.
>
> Thank you.
> Regards,
> Vivek
>
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Wednesday, January 06, 2016 4:54 PM
> To: Deshpande, Vivek R; Joseph D. Darcy
> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler
> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math lib
>
> On 1/6/16 4:31 PM, Deshpande, Vivek R wrote:
>> HI Vladimir,
>>
>> Yes, the macroAssembler_x86_libm.cpp file is getting large, I could look into splitting it into two files macroAssembler_libm_x86_64.cpp and macroAssembler_libm_x86_32.cpp. Please let me know if that sounds good to you.
>
> Yes, if we keep separate code we should split the file (and adjust make files).
>
>>
>> The 64 bit code takes advantage of additional general purpose registers and 64 bit integer arithmetic and so we have two different versions for 32 bit and 64 bit.
>
> Okay, this is valid argument. Even so we may use push/pop on 32-bit to preserve registers.
>
>>
>> Regarding the FPU usage in cos/sin, we talked with the LIBM algorithm experts and they came back with the following:
>> "It would not be easy to remove FPU x87 instructions from libm_sincos_huge and libm_reduced_pi04l, they are designed with using extended precision from FPU in mind.  The performance for 32bit implementation for these that do not use x87 instructions may not be optimal. These two are only used for very large input arguments."
>
> I don't buy this argument. Do they mean that 64-bit code, which does not use FPU, produces less precise result for very large input arguments" ?
> Very large input arguments is very rare case, I think. Should we care about its performance?
> Note, 32-bit performance become less and less important.
>
> Okay, for now lets split the file. Late we can try to simplify/combine/factor out the code.
>
> Thanks,
> Vladimir
>
>
>>
>> Thank you.
>> Regards,
>> Vivek
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Wednesday, December 30, 2015 7:49 PM
>> To: Deshpande, Vivek R; Joseph D. Darcy
>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler
>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math
>> lib
>>
>> Hi Vivek,
>>
>> Why 32-bit code is so different from 64-bit code? You only use it if sse2 is available so XMM registers are present. Why to use FPU if you have SSE?
>>
>> 32-bit:
>>
>>     582   movsd(Address(rsp, 8), xmm0);
>>     583   fld_d(Address(rsp, 8));
>>     584   movsd(Address(rsp, 16), xmm6);
>>     585   fld_d(Address(rsp, 16));
>>     586   fmula(1);
>>
>> 64-bit:
>>
>>     295   mulsd(xmm0, xmm2);
>>
>> It is concerned to all LIBM 32-bit intrinsics.
>>
>> The main concern is that macroAssembler_x86_libm.cpp file become too large and it would be nice if 32-bit and 64-bit reuse the same code.
>>
>> Thanks,
>> Vladimir
>>
>> On 12/24/15 6:10 PM, Deshpande, Vivek R wrote:
>>> HI Vladimir
>>>
>>> I have updated the libm sin cos intrinsics for x86 for hotspot.
>>> The updated webrev for the same is at this location for your review.
>>> http://cr.openjdk.java.net/~vdeshpande/libm_sincos/8143353/hotspot/we
>>> b
>>> rev.00/
>>> Could you please review it.
>>>
>>> Regards,
>>> Vivek
>>>
>>>
>>> -----Original Message-----
>>> From: Deshpande, Vivek R
>>> Sent: Tuesday, December 22, 2015 5:42 PM
>>> To: 'Joseph D. Darcy'; Vladimir Kozlov
>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler
>>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the math
>>> lib
>>>
>>> HI All
>>>
>>> I have uploaded the patch for sin and cos tests with input and allowed outputs at this location for your review.
>>> http://cr.openjdk.java.net/~vdeshpande/libm_sincos/8143353/jdk/webrev.
>>> 00/ Bug ID: https://bugs.openjdk.java.net/browse/JDK-8143353
>>> Thank you.
>>>
>>> Regards,
>>> Vivek
>>>
>>> -----Original Message-----
>>> From: Joseph D. Darcy [mailto:joe.darcy at oracle.com]
>>> Sent: Friday, December 04, 2015 4:50 PM
>>> To: Deshpande, Vivek R; Vladimir Kozlov
>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler
>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math
>>> lib
>>>
>>> Hi Vivek,
>>>
>>> On 12/3/2015 2:01 PM, Deshpande, Vivek R wrote:
>>>> Hi
>>>>
>>>> Sure I will add the tests. Shall I use StrictMath result as a reference for exact result.
>>>> Let me know your thoughts.
>>>
>>> As a rough test of another sin/cos implementation, StrictMath.{sin,
>>> cos} can be used a reference with the following caveat: there isn't
>>> an indication of which why the error is in a StrictMath result. Let
>>> me given an example, if
>>>
>>>         StrictMath.sin(x) => y
>>>
>>> then one of the following should be true
>>>
>>>         Math.sin(x) => y
>>>         Math.sin(x) => Math.nextUp(y)
>>>         Math.sin(x) => Math.nextDown(y)
>>>
>>> That is, Math.sin(x) should either be the same as StrictMath.sin(x)
>>> OR equal to one of the floating-point numbers adjacent to that
>>> result. Of these three options, only two area allowed by the accuracy
>>> requirements of the StrictMath.sin specification. However, since
>>> StrictMath.sin doesn't give an indication of which way its error went
>>> (if it rounded up or down), there is no indication without additional
>>> work which of
>>> nextUp(y) and nextDown(y) is allowable (assuming  StrictMath.sin isn't buggy).
>>>
>>> HTH,
>>>
>>> -Joe
>>>
>>>
>>>>
>>>> Regards,
>>>> Vivek
>>>>
>>>> -----Original Message-----
>>>> From: joe darcy [mailto:joe.darcy at oracle.com]
>>>> Sent: Thursday, December 03, 2015 1:29 PM
>>>> To: Vladimir Kozlov; Deshpande, Vivek R
>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler
>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the
>>>> math lib
>>>>
>>>> Hello,
>>>>
>>>> On 12/3/2015 1:25 PM, Vladimir Kozlov wrote:
>>>>> Vivek,
>>>>>
>>>>> I think Joe is asking you to write these tests as hotspot
>>>>> regression test in hotspot/test/compiler.
>>>> Exactly; if not generally applicable sin/cos tests that could be hosted in the jdk repo (alongside the regression and unit tests for java.lang.Math), then test of intrinsics in the HotSpot repo alongside other tests targeting intrinsics.
>>>>
>>>> Thanks,
>>>>
>>>> -Joe
>>>>
>>>>> Vladimir
>>>>>
>>>>> On 12/3/15 1:22 PM, Deshpande, Vivek R wrote:
>>>>>> Hi Joe
>>>>>>
>>>>>> It would be great if you would please share the additional tests
>>>>>> with us.
>>>>>>
>>>>>> Regards,
>>>>>> Vivek
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: joe darcy [mailto:joe.darcy at oracle.com]
>>>>>> Sent: Thursday, December 03, 2015 1:17 PM
>>>>>> To: Vladimir Kozlov; Deshpande, Vivek R
>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler
>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the
>>>>>> math lib
>>>>>>
>>>>>> I think it is unwise for this large of an implementation change to
>>>>>> be pushed with no tests targeting the specifics of the new implementation.
>>>>>>
>>>>>> The worst-case tests in the jdk repo are the mathematical worst
>>>>>> cases for floating-point approximations, in other words the cases
>>>>>> were the exact mathematical answer is closes to half-way between
>>>>>> two representation floating-point numbers. Passing such tests is
>>>>>> necessary but not sufficient condition for a new implementation.
>>>>>>
>>>>>> Chers,
>>>>>>
>>>>>> -Joe
>>>>>>
>>>>>> On 12/3/2015 1:05 PM, Vladimir Kozlov wrote:
>>>>>>> Okay, looks reasonable to me.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On 12/3/15 11:06 AM, Deshpande, Vivek R wrote:
>>>>>>>> Hi Vladimir
>>>>>>>>
>>>>>>>> This is the link for the updated webrev with latest hotspot
>>>>>>>> source as base for your review.
>>>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.03/
>>>>>>>> Thank you.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Vivek
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Deshpande, Vivek R
>>>>>>>> Sent: Wednesday, December 02, 2015 10:33 PM
>>>>>>>> To: 'Vladimir Kozlov'; joe darcy
>>>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler
>>>>>>>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the
>>>>>>>> math lib
>>>>>>>>
>>>>>>>> Hi Vladimir
>>>>>>>>
>>>>>>>> This is the link for the updated webrev for your review.
>>>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.02/
>>>>>>>> Thank you.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Vivek
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>>>>>> Sent: Tuesday, December 01, 2015 6:06 PM
>>>>>>>> To: Deshpande, Vivek R; joe darcy
>>>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler
>>>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the
>>>>>>>> math lib
>>>>>>>>
>>>>>>>> Please send link to new webrev on cr server.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vladimir
>>>>>>>>
>>>>>>>> On 11/25/15 5:16 PM, Deshpande, Vivek R wrote:
>>>>>>>>> Hi Vladimir
>>>>>>>>>
>>>>>>>>> Please find the webrev with your suggested updates attached
>>>>>>>>> with the mail.
>>>>>>>>> We will update it in the jbs entry soon.
>>>>>>>>> Please let me know if it needs further changes.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Vivek
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Deshpande, Vivek R
>>>>>>>>> Sent: Tuesday, November 24, 2015 10:22 AM
>>>>>>>>> To: 'joe darcy'; Vladimir Kozlov
>>>>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler
>>>>>>>>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in
>>>>>>>>> the math lib
>>>>>>>>>
>>>>>>>>> HI Vladimir, Joe
>>>>>>>>>
>>>>>>>>> I have done the jtreg tests in hotspot and tests from jdk you
>>>>>>>>> have mentioned. It passed those tests.
>>>>>>>>> The ~4x gain is with XX:+UnlockDiagnosticVMOptions
>>>>>>>>> -XX:DisableIntrinsic=_dsin/_dcos over without that option.
>>>>>>>>> The performance gain is 3.2x over base jdk, that is over
>>>>>>>>> current fsin/fcos intrinsic. This gain is more realistic.
>>>>>>>>>
>>>>>>>>> Could I get those tests around the boundary values. Would
>>>>>>>>> WorstCaseTests.java jtreg test in jdk test those ?
>>>>>>>>> If yes, then it has passed those boundary cases.
>>>>>>>>>
>>>>>>>>> I would work on adding either diagnostic flag or just one flag
>>>>>>>>> for libm and send out the webrev soon.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Vivek
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: joe darcy [mailto:joe.darcy at oracle.com]
>>>>>>>>> Sent: Monday, November 23, 2015 6:28 PM
>>>>>>>>> To: Vladimir Kozlov; Deshpande, Vivek R
>>>>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler
>>>>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in
>>>>>>>>> the math lib
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> Just getting added to the thread..
>>>>>>>>>
>>>>>>>>> On 11/23/2015 5:13 PM, Vladimir Kozlov wrote:
>>>>>>>>>> Thank you, for explanation, Vivek.
>>>>>>>>>>
>>>>>>>>>> Please, run jdk/test/java/lang/Math/ jtreg tests in addition
>>>>>>>>>> to Hotspot tests.
>>>>>>>>>>
>>>>>>>>>> On 11/23/15 12:24 PM, Deshpande, Vivek R wrote:
>>>>>>>>>>> Hi Vladimir
>>>>>>>>>>>
>>>>>>>>>>> The result we obtain with LIBM are within +/- 1ulp from
>>>>>>>>>>> StrictMath result and not exact result. So I added the flag
>>>>>>>>>>> to switch between FDLIBM and LIBM.
>>>>>>>>>>>
>>>>>>>>>>> Quick explanation:
>>>>>>>>>>> This is what we observed with comparison to HPA Library
>>>>>>>>>>> (http://www.nongnu.org/hpalib/) explained with an example.
>>>>>>>>>>> LIBM Observed Math result=0.19457293629570213
>>>>>>>>>>> (4596178249117717083L) (StrictMath - 1ulp) Required result
>>>>>>>>>>> should be = 0.19457293629570216
>>>>>>>>>>> (4596178249117717084L) (StrictMath result) or
>>>>>>>>>>> 0.1945729362957022
>>>>>>>>>>> (4596178249117717085L) (StrictMath + 1ulp.) This means HPA
>>>>>>>>>>> library result is between the above two values and Exact
>>>>>>>>>>> result would be pretty close to it.
>>>>>>>>>>> So here StrictMath result is less than quad-precision result,
>>>>>>>>>>> Math result should be StrictMath or StrictMath + 1ulp and not
>>>>>>>>>>> StrictMath
>>>>>>>>>>> - 1ulp, according to our test.
>>>>>>>>>> Note, java.lang.Math allows to have 1ulp off (in both
>>>>>>>>>> direction, I
>>>>>>>>>> think) and it should be consistent for Interpreter and code
>>>>>>>>>> generated by JIT compilers:
>>>>>>>>>>
>>>>>>>>>> http://docs.oracle.com/javase/7/docs/api/java/lang/Math.html#s
>>>>>>>>>> i
>>>>>>>>>> n
>>>>>>>>>> %
>>>>>>>>>> 28
>>>>>>>>>> do
>>>>>>>>>> u
>>>>>>>>>> ble%29
>>>>>>>>>>
>>>>>>>>> That interpretation of the spec is not quite right. For the
>>>>>>>>> Math methods with a 1/2 ulp error bound, the floating-point
>>>>>>>>> result closest to the exact result must be returned. For the
>>>>>>>>> methods with a
>>>>>>>>> 1 ulp error bound, either of the floating-point result
>>>>>>>>> bracketing the true result can be returned, subject to the
>>>>>>>>> monotonicity constraints of the specification of the particular method.
>>>>>>>>>
>>>>>>>>>>> I have done the experiments with
>>>>>>>>>>> XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dsin and
>>>>>>>>>>> XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dcos.
>>>>>>>>>>> With this option, the interpreter would go through LIBM and C1 and c2 through FDLIBM.
>>>>>>>>>>> If we want to disable LIBM completely, we need the flags
>>>>>>>>>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic.
>>>>>>>>>> I was thinking about using existing
>>>>>>>>>> DirectiveSet::is_intrinsic_disabled() and
>>>>>>>>>> vmIntrinsics::is_disabled_by_flags(). You need to add
>>>>>>>>>> additional versions of functions which accept intrinsic ID
>>>>>>>>>> instead of methodHandle.
>>>>>>>>>>
>>>>>>>>>> If you still want to use flags make them diagnostic.
>>>>>>>>>> Or have one flag for all LIBM intrinsics -XX:+UseLibmIntrinsic.
>>>>>>>>>>
>>>>>>>>>>> Also the performance gain ~4x is with
>>>>>>>>>>> XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dsin/_dcos.
>>>>>>>>>> You confused me here. So you get 4x when only Interpreter use
>>>>>>>>>> LIBM code and compilers use FDLIB?
>>>>>>>>> Just to be clear, are you comparing the new code to FDLIBM
>>>>>>>>> (StrictMath) or to the existing fsin/fcos instrinsics (Math)?
>>>>>>>>>
>>>>>>>>> I'm part way through porting the FDLIBM code to Java (JDK-8134780:
>>>>>>>>> Port fdlibm to Java), which is providing a significant speed
>>>>>>>>> boost to the StrictMath methods that have been ported.
>>>>>>>>>
>>>>>>>>> I find the current patch *insufficient* as-is in terms of its
>>>>>>>>> testing.
>>>>>>>>> For example, part of patch says
>>>>>>>>>
>>>>>>>>> # For sin
>>>>>>>>>
>>>>>>>>> +//     This means that the main path is actually only taken for
>>>>>>>>> +//     2^-252 <= |X| < 90112.
>>>>>>>>>
>>>>>>>>> # For cos
>>>>>>>>>
>>>>>>>>> +//     This means that the main path is actually only taken for
>>>>>>>>> +//     2^-252 <= |X| < 90112.
>>>>>>>>>
>>>>>>>>> If nothing else, there are no tests at around those boundary
>>>>>>>>> values, which is unacceptable. There should also be some tests
>>>>>>>>> of values of interest to the algorithm in question.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> -Joe
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Vladimir
>>>>>>>>>>
>>>>>>>>>>> Let me know your thoughts on this. I would answer more
>>>>>>>>>>> questions and give more data if needed.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Vivek
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>>>>>>>>> Sent: Monday, November 23, 2015 10:37 AM
>>>>>>>>>>> To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net
>>>>>>>>>>> Cc: Viswanathan, Sandhya
>>>>>>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in
>>>>>>>>>>> the math lib
>>>>>>>>>>>
>>>>>>>>>>> On 11/20/15 12:22 PM, Vladimir Kozlov wrote:
>>>>>>>>>>>> What is the reason you decided to add new flags? exp() and
>>>>>>>>>>>> log() changes did not have flags.
>>>>>>>>>>>>
>>>>>>>>>>>> It would be interesting to see what happens if you disable
>>>>>>>>>>>> intrinsics using existing flag, for example:
>>>>>>>>>>>>
>>>>>>>>>>>>           -XX:+UnlockDiagnosticVMOptions
>>>>>>>>>>>> -XX:DisableIntrinsic=_dexp
>>>>>>>>>>> Hi Vivek,
>>>>>>>>>>>
>>>>>>>>>>> I want to point that you can do this experiment later. We can
>>>>>>>>>>> file bugs and fixed them after FC.
>>>>>>>>>>>
>>>>>>>>>>> For now, please, answer my question about flags only. This is
>>>>>>>>>>> the only thing holding it from push.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Vladimir
>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Vladimir
>>>>>>>>>>>>
>>>>>>>>>>>> On 11/20/15 12:03 PM, Deshpande, Vivek R wrote:
>>>>>>>>>>>>> Hi all
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would like to contribute a patch which optimizes
>>>>>>>>>>>>> Math.sin() and
>>>>>>>>>>>>> Math.cos() for 64 and 32 bit X86 architecture using Intel LIBM
>>>>>>>>>>>>>           implementation.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The improvement gives ~4.25x gain over base for both sin and cos.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The option to use the optimizations are
>>>>>>>>>>>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Could you please review and sponsor this patch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Bug-id:
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8143353
>>>>>>>>>>>>> webrev:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.01/
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks and regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Vivek
>>>>>>>>>>>>>
>>>