From aph at redhat.com Wed Jul 1 11:57:49 2015 From: aph at redhat.com (Andrew Haley) Date: Wed, 01 Jul 2015 12:57:49 +0100 Subject: RFR: 8130150: RSA Acceleration In-Reply-To: <5592D6AA.8020509@redhat.com> References: <557ABD2E.7050608@redhat.com> <557EFF94.5000006@oracle.com> <557F042D.4060707@redhat.com> <558033C4.8040104@redhat.com> <5582F936.5020008@oracle.com> <5582FACA.4060103@redhat.com> <5582FDCA.8010507@oracle.com> <55831BC8.9060001@oracle.com> <5583D414.5050502@redhat.com> <558D7D02.6070303@redhat.com> <559103D8.1010302@oracle.com> <559110BF.4090804@redhat.com> <5592D6AA.8020509@redhat.com> Message-ID: <5593D5BD.8000809@redhat.com> On 06/30/2015 06:49 PM, Andrew Haley wrote: > New webrevs: > > http://cr.openjdk.java.net/~aph/8130150-jdk > http://cr.openjdk.java.net/~aph/8130150-hs/ I made an error when preparing these webrevs. Please ignore. Andrew. From aph at redhat.com Wed Jul 1 14:56:40 2015 From: aph at redhat.com (Andrew Haley) Date: Wed, 01 Jul 2015 15:56:40 +0100 Subject: RFR: 8130150: RSA Acceleration In-Reply-To: <5593D5BD.8000809@redhat.com> References: <557ABD2E.7050608@redhat.com> <557EFF94.5000006@oracle.com> <557F042D.4060707@redhat.com> <558033C4.8040104@redhat.com> <5582F936.5020008@oracle.com> <5582FACA.4060103@redhat.com> <5582FDCA.8010507@oracle.com> <55831BC8.9060001@oracle.com> <5583D414.5050502@redhat.com> <558D7D02.6070303@redhat.com> <559103D8.1010302@oracle.com> <559110BF.4090804@redhat.com> <5592D6AA.8020509@redhat.com> <5593D5BD.8000809@redhat.com> Message-ID: <5593FFA8.8090506@redhat.com> Sorry for the mistake. New webrevs: http://cr.openjdk.java.net/~aph/8130150-hs-1/ http://cr.openjdk.java.net/~aph/8130150-jdk-1/ Andrew. From zoltan.majo at oracle.com Wed Jul 1 15:22:59 2015 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Wed, 01 Jul 2015 17:22:59 +0200 Subject: [9] RFR(S): 8130120: Handling of SHA intrinsics inconsistent across platforms Message-ID: <559405D3.9070900@oracle.com> Hi, please review the patch for JDK-8130120. Bug: https://bugs.openjdk.java.net/browse/JDK-8130120 Problem: Currently, the JVM prints different warning messages when SHA-based intrinsics are attempted to be enabled (e.g., aarch64 prints "SHA intrinsics are not available on this CPU" and x86 prints "SHA instructions are not available on this CPU"). Also, there are flag combinations that result in a warning on some platforms but not on other platforms (e.g., -XX:-UseSHA -XX:+UseSHA1Intrinsics prints a warning on x86 but it does not on aarch64 and on sparc). Solution: Change the handling of the UseSHA, UseSHA1Intrinsics, UseSHA256Intrinsics, and UseSHA512Intrinsics flags to work the same way on x86, aarch64, and sparc. Change warning messages to be consistent among the previously mentioned platforms and also to better match the flag's description. Update the tests in test/compiler/intrinsics/sha to match the new functionality. Webrev: http://cr.openjdk.java.net/~zmajo/8130120/webrev.00/ Testing: - full JPRT run (includes the updated tests that were executed on x86 and sparc), all tests pass; - locally executed the test/compiler/intrinsics/sha tests on aarch64; all tests pass. 
Thank you and best regards, Zoltan From vladimir.kozlov at oracle.com Wed Jul 1 18:47:37 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 01 Jul 2015 11:47:37 -0700 Subject: [9] RFR(S): 8130120: Handling of SHA intrinsics inconsistent across platforms In-Reply-To: <559405D3.9070900@oracle.com> References: <559405D3.9070900@oracle.com> Message-ID: <559435C9.3040508@oracle.com> Looks good but I would keep "on this CPU" at the end of messages to clear indicate that it is due to instructions are not available. Thanks, Vladimir On 7/1/15 8:22 AM, Zolt?n Maj? wrote: > Hi, > > > please review the patch for JDK-8130120. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8130120 > > Problem: Currently, the JVM prints different warning messages when > SHA-based intrinsics are attempted to be enabled (e.g., aarch64 prints > "SHA intrinsics are not available on this CPU" and x86 prints "SHA > instructions are not available on this CPU"). Also, there are flag > combinations that result in a warning on some platforms but not on other > platforms (e.g., -XX:-UseSHA -XX:+UseSHA1Intrinsics prints a warning on > x86 but it does not on aarch64 and on sparc). > > Solution: Change the handling of the UseSHA, UseSHA1Intrinsics, > UseSHA256Intrinsics, and UseSHA512Intrinsics flags to work the same way > on x86, aarch64, and sparc. Change warning messages to be consistent > among the previously mentioned platforms and also to better match the > flag's description. Update the tests in test/compiler/intrinsics/sha to > match the new functionality. > > Webrev: http://cr.openjdk.java.net/~zmajo/8130120/webrev.00/ > > Testing: > - full JPRT run (includes the updated tests that were executed on x86 > and sparc), all tests pass; > - locally executed the test/compiler/intrinsics/sha tests on aarch64; > all tests pass. > > Thank you and best regards, > > > Zoltan > From vladimir.kozlov at oracle.com Wed Jul 1 18:56:10 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 01 Jul 2015 11:56:10 -0700 Subject: RFR: 8130150: RSA Acceleration In-Reply-To: <5593FFA8.8090506@redhat.com> References: <557ABD2E.7050608@redhat.com> <557EFF94.5000006@oracle.com> <557F042D.4060707@redhat.com> <558033C4.8040104@redhat.com> <5582F936.5020008@oracle.com> <5582FACA.4060103@redhat.com> <5582FDCA.8010507@oracle.com> <55831BC8.9060001@oracle.com> <5583D414.5050502@redhat.com> <558D7D02.6070303@redhat.com> <559103D8.1010302@oracle.com> <559110BF.4090804@redhat.com> <5592D6AA.8020509@redhat.com> <5593D5BD.8000809@redhat.com> <5593FFA8.8090506@redhat.com> Message-ID: <559437CA.2070303@oracle.com> Looks good to me. Thanks, Vladimir On 7/1/15 7:56 AM, Andrew Haley wrote: > Sorry for the mistake. New webrevs: > > http://cr.openjdk.java.net/~aph/8130150-hs-1/ > http://cr.openjdk.java.net/~aph/8130150-jdk-1/ > > Andrew. > From michael.c.berg at intel.com Wed Jul 1 22:57:18 2015 From: michael.c.berg at intel.com (Berg, Michael C) Date: Wed, 1 Jul 2015 22:57:18 +0000 Subject: [9] RFR(S): 8130120: Handling of SHA intrinsics inconsistent across platforms In-Reply-To: <559435C9.3040508@oracle.com> References: <559405D3.9070900@oracle.com> <559435C9.3040508@oracle.com> Message-ID: Looks good, once Vladimir's note is added. 
Thanks, -Michael -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov Sent: Wednesday, July 01, 2015 11:48 AM To: hotspot-compiler-dev at openjdk.java.net Subject: Re: [9] RFR(S): 8130120: Handling of SHA intrinsics inconsistent across platforms Looks good but I would keep "on this CPU" at the end of messages to clear indicate that it is due to instructions are not available. Thanks, Vladimir On 7/1/15 8:22 AM, Zolt?n Maj? wrote: > Hi, > > > please review the patch for JDK-8130120. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8130120 > > Problem: Currently, the JVM prints different warning messages when > SHA-based intrinsics are attempted to be enabled (e.g., aarch64 prints > "SHA intrinsics are not available on this CPU" and x86 prints "SHA > instructions are not available on this CPU"). Also, there are flag > combinations that result in a warning on some platforms but not on > other platforms (e.g., -XX:-UseSHA -XX:+UseSHA1Intrinsics prints a > warning on > x86 but it does not on aarch64 and on sparc). > > Solution: Change the handling of the UseSHA, UseSHA1Intrinsics, > UseSHA256Intrinsics, and UseSHA512Intrinsics flags to work the same > way on x86, aarch64, and sparc. Change warning messages to be > consistent among the previously mentioned platforms and also to better > match the flag's description. Update the tests in > test/compiler/intrinsics/sha to match the new functionality. > > Webrev: http://cr.openjdk.java.net/~zmajo/8130120/webrev.00/ > > Testing: > - full JPRT run (includes the updated tests that were executed on x86 > and sparc), all tests pass; > - locally executed the test/compiler/intrinsics/sha tests on aarch64; > all tests pass. > > Thank you and best regards, > > > Zoltan > From zoltan.majo at oracle.com Thu Jul 2 12:17:00 2015 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Thu, 02 Jul 2015 14:17:00 +0200 Subject: [9] RFR(S): 8130120: Handling of SHA intrinsics inconsistent across platforms In-Reply-To: References: <559405D3.9070900@oracle.com> <559435C9.3040508@oracle.com> Message-ID: <55952BBC.4090605@oracle.com> Thank you, Vladimir and Michael, for the feedback! Here is the updated webrev: http://cr.openjdk.java.net/~zmajo/8130120/webrev.01/ All JPRT tests pass. I plan to push the newest webrev (webrev.01) on Friday (July 3) if no other issues come up by then. Thank you and best regards, Zoltan On 07/02/2015 12:57 AM, Berg, Michael C wrote: > Looks good, once Vladimir's note is added. > > Thanks, > -Michael > > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov > Sent: Wednesday, July 01, 2015 11:48 AM > To: hotspot-compiler-dev at openjdk.java.net > Subject: Re: [9] RFR(S): 8130120: Handling of SHA intrinsics inconsistent across platforms > > Looks good but I would keep "on this CPU" at the end of messages to clear indicate that it is due to instructions are not available. > > Thanks, > Vladimir > > On 7/1/15 8:22 AM, Zolt?n Maj? wrote: >> Hi, >> >> >> please review the patch for JDK-8130120. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8130120 >> >> Problem: Currently, the JVM prints different warning messages when >> SHA-based intrinsics are attempted to be enabled (e.g., aarch64 prints >> "SHA intrinsics are not available on this CPU" and x86 prints "SHA >> instructions are not available on this CPU"). 
Also, there are flag >> combinations that result in a warning on some platforms but not on >> other platforms (e.g., -XX:-UseSHA -XX:+UseSHA1Intrinsics prints a >> warning on >> x86 but it does not on aarch64 and on sparc). >> >> Solution: Change the handling of the UseSHA, UseSHA1Intrinsics, >> UseSHA256Intrinsics, and UseSHA512Intrinsics flags to work the same >> way on x86, aarch64, and sparc. Change warning messages to be >> consistent among the previously mentioned platforms and also to better >> match the flag's description. Update the tests in >> test/compiler/intrinsics/sha to match the new functionality. >> >> Webrev: http://cr.openjdk.java.net/~zmajo/8130120/webrev.00/ >> >> Testing: >> - full JPRT run (includes the updated tests that were executed on x86 >> and sparc), all tests pass; >> - locally executed the test/compiler/intrinsics/sha tests on aarch64; >> all tests pass. >> >> Thank you and best regards, >> >> >> Zoltan >> From vladimir.kozlov at oracle.com Thu Jul 2 15:04:32 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 02 Jul 2015 08:04:32 -0700 Subject: hg: jdk9/hs-comp/hotspot: 2 new changesets In-Reply-To: <201507021037.t62AbE9c014701@aojmv0008.oracle.com> References: <201507021037.t62AbE9c014701@aojmv0008.oracle.com> Message-ID: <55955300.7000304@oracle.com> Andrew, Did someone sponsored this push for you? It has shared code changes - it should go through JPRT. Thanks, Vladimir On 7/2/15 3:37 AM, aph at redhat.com wrote: > Changeset: 9fcbb6768a78 > Author: aph > Date: 2015-06-16 17:31 +0100 > URL: http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/9fcbb6768a78 > > 8130150: Implement BigInteger.montgomeryMultiply intrinsic > Summary: Add montgomeryMultiply intrinsics > Reviewed-by: kvn > > ! src/cpu/x86/vm/sharedRuntime_x86_64.cpp > ! src/cpu/x86/vm/stubGenerator_x86_64.cpp > ! src/cpu/x86/vm/vm_version_x86.cpp > ! src/share/vm/classfile/vmSymbols.hpp > ! src/share/vm/opto/c2_globals.hpp > ! src/share/vm/opto/escape.cpp > ! src/share/vm/opto/library_call.cpp > ! src/share/vm/opto/runtime.cpp > ! src/share/vm/opto/runtime.hpp > ! src/share/vm/runtime/sharedRuntime.hpp > ! src/share/vm/runtime/stubRoutines.cpp > ! src/share/vm/runtime/stubRoutines.hpp > + test/compiler/intrinsics/montgomerymultiply/MontgomeryMultiplyTest.java > > Changeset: d30647171e49 > Author: aph > Date: 2015-07-02 11:12 +0100 > URL: http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/d30647171e49 > > Merge > > ! src/cpu/x86/vm/sharedRuntime_x86_64.cpp > ! src/cpu/x86/vm/stubGenerator_x86_64.cpp > ! src/cpu/x86/vm/vm_version_x86.cpp > ! src/share/vm/classfile/vmSymbols.hpp > ! src/share/vm/opto/c2_globals.hpp > ! src/share/vm/opto/escape.cpp > ! src/share/vm/opto/library_call.cpp > ! src/share/vm/opto/runtime.cpp > ! src/share/vm/opto/runtime.hpp > ! src/share/vm/runtime/stubRoutines.cpp > ! 
src/share/vm/runtime/stubRoutines.hpp > - test/compiler/intrinsics/sha/cli/testcases/GenericTestCaseForSupportedSparcCPU.java > - test/compiler/intrinsics/sha/cli/testcases/UseSHAIntrinsicsSpecificTestCaseForUnsupportedSparcCPU.java > - test/compiler/intrinsics/sha/cli/testcases/UseSHASpecificTestCaseForSupportedSparcCPU.java > - test/compiler/intrinsics/sha/cli/testcases/UseSHASpecificTestCaseForUnsupportedSparcCPU.java > From vladimir.kozlov at oracle.com Thu Jul 2 15:09:16 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 02 Jul 2015 08:09:16 -0700 Subject: [9] RFR(S): 8130120: Handling of SHA intrinsics inconsistent across platforms In-Reply-To: <55952BBC.4090605@oracle.com> References: <559405D3.9070900@oracle.com> <559435C9.3040508@oracle.com> <55952BBC.4090605@oracle.com> Message-ID: <5595541C.1030402@oracle.com> Looks good. Thanks, Vladimir On 7/2/15 5:17 AM, Zolt?n Maj? wrote: > Thank you, Vladimir and Michael, for the feedback! > > Here is the updated webrev: > http://cr.openjdk.java.net/~zmajo/8130120/webrev.01/ > > All JPRT tests pass. > > I plan to push the newest webrev (webrev.01) on Friday (July 3) if no other issues come up by then. > > Thank you and best regards, > > > Zoltan > > > On 07/02/2015 12:57 AM, Berg, Michael C wrote: >> Looks good, once Vladimir's note is added. >> >> Thanks, >> -Michael >> >> -----Original Message----- >> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov >> Sent: Wednesday, July 01, 2015 11:48 AM >> To: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: [9] RFR(S): 8130120: Handling of SHA intrinsics inconsistent across platforms >> >> Looks good but I would keep "on this CPU" at the end of messages to clear indicate that it is due to instructions are >> not available. >> >> Thanks, >> Vladimir >> >> On 7/1/15 8:22 AM, Zolt?n Maj? wrote: >>> Hi, >>> >>> >>> please review the patch for JDK-8130120. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8130120 >>> >>> Problem: Currently, the JVM prints different warning messages when >>> SHA-based intrinsics are attempted to be enabled (e.g., aarch64 prints >>> "SHA intrinsics are not available on this CPU" and x86 prints "SHA >>> instructions are not available on this CPU"). Also, there are flag >>> combinations that result in a warning on some platforms but not on >>> other platforms (e.g., -XX:-UseSHA -XX:+UseSHA1Intrinsics prints a >>> warning on >>> x86 but it does not on aarch64 and on sparc). >>> >>> Solution: Change the handling of the UseSHA, UseSHA1Intrinsics, >>> UseSHA256Intrinsics, and UseSHA512Intrinsics flags to work the same >>> way on x86, aarch64, and sparc. Change warning messages to be >>> consistent among the previously mentioned platforms and also to better >>> match the flag's description. Update the tests in >>> test/compiler/intrinsics/sha to match the new functionality. >>> >>> Webrev: http://cr.openjdk.java.net/~zmajo/8130120/webrev.00/ >>> >>> Testing: >>> - full JPRT run (includes the updated tests that were executed on x86 >>> and sparc), all tests pass; >>> - locally executed the test/compiler/intrinsics/sha tests on aarch64; >>> all tests pass. 
>>> >>> Thank you and best regards, >>> >>> >>> Zoltan >>> > From zoltan.majo at oracle.com Thu Jul 2 15:24:22 2015 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Thu, 02 Jul 2015 17:24:22 +0200 Subject: [9] RFR(S): 8130120: Handling of SHA intrinsics inconsistent across platforms In-Reply-To: <5595541C.1030402@oracle.com> References: <559405D3.9070900@oracle.com> <559435C9.3040508@oracle.com> <55952BBC.4090605@oracle.com> <5595541C.1030402@oracle.com> Message-ID: <559557A6.3010207@oracle.com> Thank you, Vladimir! Best regards, Zoltan On 07/02/2015 05:09 PM, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 7/2/15 5:17 AM, Zolt?n Maj? wrote: >> Thank you, Vladimir and Michael, for the feedback! >> >> Here is the updated webrev: >> http://cr.openjdk.java.net/~zmajo/8130120/webrev.01/ >> >> All JPRT tests pass. >> >> I plan to push the newest webrev (webrev.01) on Friday (July 3) if no >> other issues come up by then. >> >> Thank you and best regards, >> >> >> Zoltan >> >> >> On 07/02/2015 12:57 AM, Berg, Michael C wrote: >>> Looks good, once Vladimir's note is added. >>> >>> Thanks, >>> -Michael >>> >>> -----Original Message----- >>> From: hotspot-compiler-dev >>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of >>> Vladimir Kozlov >>> Sent: Wednesday, July 01, 2015 11:48 AM >>> To: hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: [9] RFR(S): 8130120: Handling of SHA intrinsics >>> inconsistent across platforms >>> >>> Looks good but I would keep "on this CPU" at the end of messages to >>> clear indicate that it is due to instructions are >>> not available. >>> >>> Thanks, >>> Vladimir >>> >>> On 7/1/15 8:22 AM, Zolt?n Maj? wrote: >>>> Hi, >>>> >>>> >>>> please review the patch for JDK-8130120. >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8130120 >>>> >>>> Problem: Currently, the JVM prints different warning messages when >>>> SHA-based intrinsics are attempted to be enabled (e.g., aarch64 prints >>>> "SHA intrinsics are not available on this CPU" and x86 prints "SHA >>>> instructions are not available on this CPU"). Also, there are flag >>>> combinations that result in a warning on some platforms but not on >>>> other platforms (e.g., -XX:-UseSHA -XX:+UseSHA1Intrinsics prints a >>>> warning on >>>> x86 but it does not on aarch64 and on sparc). >>>> >>>> Solution: Change the handling of the UseSHA, UseSHA1Intrinsics, >>>> UseSHA256Intrinsics, and UseSHA512Intrinsics flags to work the same >>>> way on x86, aarch64, and sparc. Change warning messages to be >>>> consistent among the previously mentioned platforms and also to better >>>> match the flag's description. Update the tests in >>>> test/compiler/intrinsics/sha to match the new functionality. >>>> >>>> Webrev: http://cr.openjdk.java.net/~zmajo/8130120/webrev.00/ >>>> >>>> Testing: >>>> - full JPRT run (includes the updated tests that were executed on x86 >>>> and sparc), all tests pass; >>>> - locally executed the test/compiler/intrinsics/sha tests on aarch64; >>>> all tests pass. >>>> >>>> Thank you and best regards, >>>> >>>> >>>> Zoltan >>>> >> From vladimir.kozlov at oracle.com Thu Jul 2 21:34:55 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 02 Jul 2015 14:34:55 -0700 Subject: [9] RFR (S) 8080012: JVM times out with vdbench on SPARC M7-16 Message-ID: <5595AE7F.5040903@oracle.com> The author is Igor Veresov. 
http://cr.openjdk.java.net/~kvn/8080012/webrev/ On big SPARC machines (>1k cores) request for cacheline in PICL takes a lot of time since we asking it for each core. PICL daemon can't handle such stream of requests especially when several JVMs start at the same time (vdbench test). We were told that on sun4v machines (SPARC-T and -M series) you can't have different chips with different cache line size. So we can ask only one core. I reviewed the fix. The fix was tested on machine which showed the problem. Thanks, Vladimir From serkan at hazelcast.com Sat Jul 4 18:06:41 2015 From: serkan at hazelcast.com (=?UTF-8?B?U2Vya2FuIMOWemFs?=) Date: Sat, 4 Jul 2015 21:06:41 +0300 Subject: Array accesses using sun.misc.Unsafe cause data corruption or SIGSEGV In-Reply-To: References: Message-ID: Hi, I have added some logs to show that problem is caused by double scaling of offset (index) Here is my updated (log messages added) reproducer code: int count = 100000; long size = count * 8L; long baseAddress = unsafe.allocateMemory(size); System.out.println("Start address: " + Long.toHexString(baseAddress) + ", End address: " + Long.toHexString(baseAddress + size)); for (int i = 0; i < count; i++) { long address = baseAddress + (i * 8L); System.out.println( "Normal: " + Long.toHexString(address) + ", " + "If double scaled: " + Long.toHexString(baseAddress + (i * 8L * 8L))); long expected = i; unsafe.putLong(address, expected); unsafe.getLong(address); } After sometime it crashes as ... Current thread (0x0000000002068800): JavaThread "main" [_thread_in_Java, id=10412, stack(0x00000000023f0000,0x00000000024f0000)] siginfo: ExceptionCode=0xc0000005, reading address 0x0000000059061020 ... ... And here is output of the execution until crash: Start address: 58bbcfa0, End address: 58c804a0 Normal: 58bbcfa0, If double scaled: 58bbcfa0 Normal: 58bbcfa8, If double scaled: 58bbcfe0 Normal: 58bbcfb0, If double scaled: 58bbd020 ... ... Normal: 58c517b0, If double scaled: 59061020 As seen from the logs and crash dump, double scaled version of target address (*If double scaled: 59061020*) is the same with the problematic address (*siginfo: ExceptionCode=0xc0000005, reading address 0x0000000059061020*) that causes to crash while accessing it. So I think, it is obvious that the crash is caused by wrong optimization of index value since index is scaled two times (for *Unsafe::put* and *Unsafe::get*) instead of only one time. Then double scaled index points to invalid memory address. Regards. 
On Sun, Jun 14, 2015 at 2:39 PM, Serkan ?zal wrote: > Hi all, > > I had dived into the issue with JDK-HotSpot commits and > the issue arised after this commit: http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/a60a1309a03a > > Then I added some additional logs to *"vm/c1/c1_Canonicalizer.cpp"*: > void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) { > if (OptimizeUnsafes) do_UnsafeRawOp(x); > tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id %d, index = id %d, log2_scale = %d", > x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); > } > > void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) { > if (OptimizeUnsafes) do_UnsafeRawOp(x); > tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id %d, index = id %d, log2_scale = %d", > x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); > } > > > So I run the test by calculating address as > - *"int * long"* (int is index and long is 8l) > - *"long * long"* (the first long is index and the second long is 8l) > - *"int * int"* (the first int is index and the second int is 8) > > Here are the logs: > *int * long:*Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 27, log2_scale = 3 > Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 27, log2_scale = 3 > *long * long:*Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3 > Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 3 > *int * int:*Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 29, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 29, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id 15, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id 15, log2_scale = 0 > > As you can see, at the problematic runs (*"int * long"* and *"long * long"*) there are two scaling. > One for *"Unsafe.put"* and the other one is for* "Unsafe.get"* and these instructions points to > same *"base"* and *"index"* instructions. > This means that address is scaled one more time because there should be only one scale. > > > When I debugged the non-problematic run (*"int * int"*), > I saw that *"instr->as_ArithmeticOp();"* is always returns *"null" *then *"match_index_and_scale"* method returns* "false"* always. > So there is no scaling. > static bool match_index_and_scale(Instruction* instr, > Instruction** index, > int* log2_scale) { > ... 
> > ArithmeticOp* arith = instr->as_ArithmeticOp(); > if (arith != NULL) { > ... > } > > return false; > } > > > Then I have added my fix attempt to prevent multiple scaling for Unsafe instructions points to same index instruction like this: > void Canonicalizer::do_UnsafeRawOp(UnsafeRawOp* x) { > Instruction* base = NULL; > Instruction* index = NULL; > int log2_scale; > > if (match(x, &base, &index, &log2_scale)) { > x->set_base(base); > x->set_index(index); // The fix attempt here // ///////////////////////////// > if (index != NULL) { > if (index->is_pinned()) { > log2_scale = 0; > } else { > if (log2_scale != 0) { > index->pin(); > } > } > } // ///////////////////////////// > x->set_log2_scale(log2_scale); > if (PrintUnsafeOptimization) { > tty->print_cr("Canonicalizer: UnsafeRawOp id %d: base = id %d, index = id %d, log2_scale = %d", > x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); > } > } > } > In this fix attempt, if there is a scaling for the Unsafe instruction, I pin index instruction of that instruction > and at next calls, if the index instruction is pinned, I assummed that there is already scaling so no need to another scaling. > > After this fix, I rerun the problematic test (*"int * long"*) and it works with these logs: > *int * long (after fix):*Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3 > Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 21: base = id 8, index = id 11, log2_scale = 3 > Canonicalizer: do_UnsafeGetRaw id 23: base = id 8, index = id 11, log2_scale = 0 > > I am not sure my fix attempt is a really fix or maybe there are better fixes. > > Regards. > > -- > > Serkan ?ZAL > > >> Btw, (thanks to one my colleagues), when address calculation in the loop is >> converted to >> long address = baseAddress + (i * 8) >> test passes. Only difference is next long pointer is calculated using >> integer 8 instead of long 8. >> ``` >> for (int i = 0; i < count; i++) { >> long address = baseAddress + (i * 8); // <--- here, integer 8 instead >> of long 8 >> long expected = i; >> unsafe.putLong(address, expected); >> long actual = unsafe.getLong(address); >> if (expected != actual) { >> throw new AssertionError("Expected: " + expected + ", Actual: " + >> actual); >> } >> } >> ``` >> On Tue, Jun 9, 2015 at 1:07 PM Mehmet Dogan > wrote: >> >* Hi all, >> *> >> >* While I was testing my app using java 8, I encountered the previously >> *>* reported sun.misc.Unsafe issue. >> *> >> >* https://bugs.openjdk.java.net/browse/JDK-8076445 >> *> >> >* http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017685.html >> *> >> >* Issue status says it's resolved with resolution "Cannot Reproduce". But >> *>* unfortunately it's still reproducible using "1.8.0_60-ea-b18" and >> *>* "1.9.0-ea-b67". 
>> *> >> >* Test is very simple: >> *> >> >* ``` >> *>* public static void main(String[] args) throws Exception { >> *>* Unsafe unsafe = findUnsafe(); >> *>* // 10000 pass >> *>* // 100000 jvm crash >> *>* // 1000000 fail >> *>* int count = 100000; >> *>* long size = count * 8L; >> *>* long baseAddress = unsafe.allocateMemory(size); >> *> >> >* try { >> *>* for (int i = 0; i < count; i++) { >> *>* long address = baseAddress + (i * 8L); >> *> >> >* long expected = i; >> *>* unsafe.putLong(address, expected); >> *> >> >* long actual = unsafe.getLong(address); >> *> >> >* if (expected != actual) { >> *>* throw new AssertionError("Expected: " + expected + ", >> *>* Actual: " + actual); >> *>* } >> *>* } >> *>* } finally { >> *>* unsafe.freeMemory(baseAddress); >> *>* } >> *>* } >> *>* ``` >> *>* It's not failing up to version 1.8.0.31, by starting 1.8.0.40 test is >> *>* failing constantly. >> *> >> >* - With iteration count 10000, test is passing. >> *>* - With iteration count 100000, jvm is crashing with SIGSEGV. >> *>* - With iteration count 1000000, test is failing with AssertionError. >> *> >> >* When one of compilation (-Xint) or inlining (-XX:-Inline) or >> *>* on-stack-replacement (-XX:-UseOnStackReplacement) is disabled, test is not >> *>* failing at all. >> *> >> >* I tested on platforms: >> *>* - Centos-7/openjdk-1.8.0.45 >> *>* - OSX/oraclejdk-1.8.0.40 >> *>* - OSX/oraclejdk-1.8.0.45 >> *>* - OSX/oraclejdk-1.8.0_60-ea-b18 >> *>* - OSX/oraclejdk-1.9.0-ea-b67 >> *> >> >* Previous issue comment ( >> *>* https://bugs.openjdk.java.net/browse/JDK-8076445?focusedCommentId=13633043#comment-13633043 ) >> *>* says "Cannot reproduce based on the latest version". I hope that latest >> *>* version is not mentioning to '1.8.0_60-ea-b18' or '1.9.0-ea-b67'. Because >> *>* both are failing. >> *> >> >* I'm looking forward to hearing from you. >> *> >> >* Thanks, >> *>* -Mehmet Dogan- >> *>* -- >> *> >> >* @mmdogan >> *> > > > -- > Serkan ?ZAL > Remotest Software Engineer > GSM: +90 542 680 39 18 > Twitter: @serkan_ozal > -- Serkan ?ZAL Remotest Software Engineer GSM: +90 542 680 39 18 Twitter: @serkan_ozal -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Mon Jul 6 18:59:11 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 06 Jul 2015 11:59:11 -0700 Subject: [8u60] backport (S) 8080012: JVM times out with vdbench on SPARC M7-16 Message-ID: <559ACFFF.4000107@oracle.com> I would like to backport this to 8u60 (through 8u). Patch was applied cleanly. Backport was approved by release team. https://bugs.openjdk.java.net/browse/JDK-8080012 jdk9 webrev: http://cr.openjdk.java.net/~kvn/8080012/webrev/ On big SPARC machines (>1k cores) request for cacheline in PICL takes a lot of time since we asking it for each core. PICL daemon can't handle such stream of requests especially when several JVMs start at the same time (vdbench test). We were told that on sun4v machines (SPARC-T and -M series) you can't have different chips with different cache line size. So we can ask only one core. Thanks, Vladimir From igor.veresov at oracle.com Mon Jul 6 19:09:28 2015 From: igor.veresov at oracle.com (Igor Veresov) Date: Mon, 6 Jul 2015 12:09:28 -0700 Subject: [8u60] backport (S) 8080012: JVM times out with vdbench on SPARC M7-16 In-Reply-To: <559ACFFF.4000107@oracle.com> References: <559ACFFF.4000107@oracle.com> Message-ID: <55E2BF6D-2443-42F4-8A05-8E2BD529769F@oracle.com> Good. Thanks! 
igor > On Jul 6, 2015, at 11:59 AM, Vladimir Kozlov wrote: > > I would like to backport this to 8u60 (through 8u). Patch was applied cleanly. Backport was approved by release team. > > https://bugs.openjdk.java.net/browse/JDK-8080012 > jdk9 webrev: http://cr.openjdk.java.net/~kvn/8080012/webrev/ > > On big SPARC machines (>1k cores) request for cacheline in PICL takes a lot of time since we asking it for each core. PICL daemon can't handle such stream of requests especially when several JVMs start at the same time (vdbench test). > > We were told that on sun4v machines (SPARC-T and -M series) you can't have different chips with different cache line size. So we can ask only one core. > > Thanks, > Vladimir From vladimir.kozlov at oracle.com Mon Jul 6 19:17:12 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 06 Jul 2015 12:17:12 -0700 Subject: [8u60] backport (S) 8080012: JVM times out with vdbench on SPARC M7-16 In-Reply-To: <55E2BF6D-2443-42F4-8A05-8E2BD529769F@oracle.com> References: <559ACFFF.4000107@oracle.com> <55E2BF6D-2443-42F4-8A05-8E2BD529769F@oracle.com> Message-ID: <559AD438.9090309@oracle.com> Thank you, Igor. Looks like Poonam is pushing it already. So everything is good. Thanks, Vladimir On 7/6/15 12:09 PM, Igor Veresov wrote: > Good. Thanks! > > igor > >> On Jul 6, 2015, at 11:59 AM, Vladimir Kozlov wrote: >> >> I would like to backport this to 8u60 (through 8u). Patch was applied cleanly. Backport was approved by release team. >> >> https://bugs.openjdk.java.net/browse/JDK-8080012 >> jdk9 webrev: http://cr.openjdk.java.net/~kvn/8080012/webrev/ >> >> On big SPARC machines (>1k cores) request for cacheline in PICL takes a lot of time since we asking it for each core. PICL daemon can't handle such stream of requests especially when several JVMs start at the same time (vdbench test). >> >> We were told that on sun4v machines (SPARC-T and -M series) you can't have different chips with different cache line size. So we can ask only one core. >> >> Thanks, >> Vladimir > From vladimir.x.ivanov at oracle.com Tue Jul 7 14:29:01 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 07 Jul 2015 17:29:01 +0300 Subject: [9] RFR (M): VM should constant fold Unsafe.get*() loads from final fields In-Reply-To: <558AA9A1.8070603@oracle.com> References: <5581A26C.6090303@oracle.com> <810DE23B-6616-4465-B91D-4CD9A8FB267D@oracle.com> <558AA9A1.8070603@oracle.com> Message-ID: <559BE22D.9030206@oracle.com> Any volunteers to review updated version? Thanks! Best regards, Vladimir Ivanov On 6/24/15 3:59 PM, Vladimir Ivanov wrote: > John, Paul, thanks for review! > > Updated webrev: > http://cr.openjdk.java.net/~vlivanov/8078629/webrev.01/ > > I spotted a bug when field and accessor types mismatch, but the JIT > still constant-folds the load. The fix made expected result detection > even more complex, so I decided to get rid of it & WhiteBox hooks > altogether. The test exercises different code paths and compares > returned values now. > >> WB.isCompileConstant is a nice little thing. We should consider using >> it in java.lang.invoke >> to gate aggressive object-folding optimizations. That's one reason to >> consider putting it >> somewhere more central that WB. I can't propose a good place yet. >> (Unsafe is not quite right.) > Actually, there's already j.l.i.MethodHandleImpl.isCompileConstant. > Probably, compiler-specific interface is the right place for such > things. But, as I wrote before, I decided to avoid WB hooks. 
> >> The gating logic in library_call includes this extra term: && >> alias_type->field()->is_constant() >> Why not just drop it and let make_constant do the test (which it does)? > I wanted to stress that make_constant depends on whether the field is > constant or not. I failed to come up with a better method name > (try_make_constant? make_constant_attempt), so I decided to keep the > extra condition. > >> You have some lines with "/*require_const=*/" in two places; that >> can't be right. >> This is the result of functions with too many misc. arguments to keep >> track of. >> I don't have the code under my fingers, so I'm just guessing, but here >> are more suggestions: > Thanks! I tried to address all your suggestions in updated version. > > Best regards, > Vladimir Ivanov > >> >> I wish the is_autobox_cache condition could be more localized. Could >> we omit the boolean >> flag (almost always false), and where it is true, post-process the >> node? Would that make >> the code simpler? >> >> This leads me to notice that make_constant is not related strongly to >> GraphKit; it is really >> a call to the Type and CI modules to look for a singleton type, ending >> with either a NULL >> or a call to GraphKit::makecon. So you might consider changing Node* >> GK::make_constant >> to const Type* Type::make_constant. >> >> Now to pick at the argument salad we have in push_constant: The >> effect of is_autobox_cache >> could be transferred to a method Type[Ary]::cast_to_autobox_cache(true). >> And the effect of stable_type on make_constant(ciCon,bool,bool,Type*), >> could also be factored out, as post-processing step >> contype=contype->Type::join(stabletype). > > From edward.nevill at gmail.com Tue Jul 7 16:06:08 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Tue, 07 Jul 2015 17:06:08 +0100 Subject: RFR: 8130687: aarch64: add support for hardware crc32c Message-ID: <1436285168.1592.14.camel@mylittlepony.linaroharston> Hi, http://cr.openjdk.java.net/~enevill/8130687/webrev/hotspot.changeset adds support for crc32c on aarch64. This has previously been added for Sparc (see https://bugs.openjdk.java.net/browse/JDK-8073583) Performance measurements shows the throughput goes from ~620MB/s to 2938 MB/s == approx 4.7x performance improvement. Tested before and after with jtreg hotspot. In both cases Test results: passed: 867; failed: 3; error: 7 Please review. Thanks, Ed. From aph at redhat.com Tue Jul 7 16:08:25 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 07 Jul 2015 17:08:25 +0100 Subject: RFR: 8130687: aarch64: add support for hardware crc32c In-Reply-To: <1436285168.1592.14.camel@mylittlepony.linaroharston> References: <1436285168.1592.14.camel@mylittlepony.linaroharston> Message-ID: <559BF979.3010204@redhat.com> On 07/07/2015 05:06 PM, Edward Nevill wrote: > Please review. Looks good. Andrew. From vladimir.kozlov at oracle.com Tue Jul 7 17:33:16 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 07 Jul 2015 10:33:16 -0700 Subject: RFR: 8130687: aarch64: add support for hardware crc32c In-Reply-To: <1436285168.1592.14.camel@mylittlepony.linaroharston> References: <1436285168.1592.14.camel@mylittlepony.linaroharston> Message-ID: <559C0D5C.70106@oracle.com> Looks good. Thanks, Vladimir On 7/7/15 9:06 AM, Edward Nevill wrote: > Hi, > > http://cr.openjdk.java.net/~enevill/8130687/webrev/hotspot.changeset > > adds support for crc32c on aarch64. 
This has previously been added for Sparc (see https://bugs.openjdk.java.net/browse/JDK-8073583) > > Performance measurements shows the throughput goes from ~620MB/s to 2938 MB/s == approx 4.7x performance improvement. > > Tested before and after with jtreg hotspot. In both cases > > Test results: passed: 867; failed: 3; error: 7 > > Please review. > > Thanks, > Ed. > > From vladimir.kozlov at oracle.com Tue Jul 7 17:48:48 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 07 Jul 2015 10:48:48 -0700 Subject: [9] RFR (M): VM should constant fold Unsafe.get*() loads from final fields In-Reply-To: <559BE22D.9030206@oracle.com> References: <5581A26C.6090303@oracle.com> <810DE23B-6616-4465-B91D-4CD9A8FB267D@oracle.com> <558AA9A1.8070603@oracle.com> <559BE22D.9030206@oracle.com> Message-ID: <559C1100.7000503@oracle.com> Looks reasonable to me. graphKit.* files are listed without changes. Thanks, Vladimir K On 7/7/15 7:29 AM, Vladimir Ivanov wrote: > Any volunteers to review updated version? Thanks! > > Best regards, > Vladimir Ivanov > > On 6/24/15 3:59 PM, Vladimir Ivanov wrote: >> John, Paul, thanks for review! >> >> Updated webrev: >> http://cr.openjdk.java.net/~vlivanov/8078629/webrev.01/ >> >> I spotted a bug when field and accessor types mismatch, but the JIT >> still constant-folds the load. The fix made expected result detection >> even more complex, so I decided to get rid of it & WhiteBox hooks >> altogether. The test exercises different code paths and compares >> returned values now. >> >>> WB.isCompileConstant is a nice little thing. We should consider using >>> it in java.lang.invoke >>> to gate aggressive object-folding optimizations. That's one reason to >>> consider putting it >>> somewhere more central that WB. I can't propose a good place yet. >>> (Unsafe is not quite right.) >> Actually, there's already j.l.i.MethodHandleImpl.isCompileConstant. >> Probably, compiler-specific interface is the right place for such >> things. But, as I wrote before, I decided to avoid WB hooks. >> >>> The gating logic in library_call includes this extra term: && >>> alias_type->field()->is_constant() >>> Why not just drop it and let make_constant do the test (which it does)? >> I wanted to stress that make_constant depends on whether the field is >> constant or not. I failed to come up with a better method name >> (try_make_constant? make_constant_attempt), so I decided to keep the >> extra condition. >> >>> You have some lines with "/*require_const=*/" in two places; that >>> can't be right. >>> This is the result of functions with too many misc. arguments to keep >>> track of. >>> I don't have the code under my fingers, so I'm just guessing, but here >>> are more suggestions: >> Thanks! I tried to address all your suggestions in updated version. >> >> Best regards, >> Vladimir Ivanov >> >>> >>> I wish the is_autobox_cache condition could be more localized. Could >>> we omit the boolean >>> flag (almost always false), and where it is true, post-process the >>> node? Would that make >>> the code simpler? >>> >>> This leads me to notice that make_constant is not related strongly to >>> GraphKit; it is really >>> a call to the Type and CI modules to look for a singleton type, ending >>> with either a NULL >>> or a call to GraphKit::makecon. So you might consider changing Node* >>> GK::make_constant >>> to const Type* Type::make_constant. 
>>> >>> Now to pick at the argument salad we have in push_constant: The >>> effect of is_autobox_cache >>> could be transferred to a method Type[Ary]::cast_to_autobox_cache(true). >>> And the effect of stable_type on make_constant(ciCon,bool,bool,Type*), >>> could also be factored out, as post-processing step >>> contype=contype->Type::join(stabletype). >> >> From vladimir.x.ivanov at oracle.com Tue Jul 7 19:43:12 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 07 Jul 2015 22:43:12 +0300 Subject: [9] RFR (M): VM should constant fold Unsafe.get*() loads from final fields In-Reply-To: <559C1100.7000503@oracle.com> References: <5581A26C.6090303@oracle.com> <810DE23B-6616-4465-B91D-4CD9A8FB267D@oracle.com> <558AA9A1.8070603@oracle.com> <559BE22D.9030206@oracle.com> <559C1100.7000503@oracle.com> Message-ID: <559C2BD0.4030704@oracle.com> Thanks, Vladimir! Empty files are a webrev artifact when it works with a set of patches containing negating changes. Best regards, Vladimir Ivanov On 7/7/15 8:48 PM, Vladimir Kozlov wrote: > Looks reasonable to me. > > graphKit.* files are listed without changes. > > Thanks, > Vladimir K > > On 7/7/15 7:29 AM, Vladimir Ivanov wrote: >> Any volunteers to review updated version? Thanks! >> >> Best regards, >> Vladimir Ivanov >> >> On 6/24/15 3:59 PM, Vladimir Ivanov wrote: >>> John, Paul, thanks for review! >>> >>> Updated webrev: >>> http://cr.openjdk.java.net/~vlivanov/8078629/webrev.01/ >>> >>> I spotted a bug when field and accessor types mismatch, but the JIT >>> still constant-folds the load. The fix made expected result detection >>> even more complex, so I decided to get rid of it & WhiteBox hooks >>> altogether. The test exercises different code paths and compares >>> returned values now. >>> >>>> WB.isCompileConstant is a nice little thing. We should consider using >>>> it in java.lang.invoke >>>> to gate aggressive object-folding optimizations. That's one reason to >>>> consider putting it >>>> somewhere more central that WB. I can't propose a good place yet. >>>> (Unsafe is not quite right.) >>> Actually, there's already j.l.i.MethodHandleImpl.isCompileConstant. >>> Probably, compiler-specific interface is the right place for such >>> things. But, as I wrote before, I decided to avoid WB hooks. >>> >>>> The gating logic in library_call includes this extra term: && >>>> alias_type->field()->is_constant() >>>> Why not just drop it and let make_constant do the test (which it does)? >>> I wanted to stress that make_constant depends on whether the field is >>> constant or not. I failed to come up with a better method name >>> (try_make_constant? make_constant_attempt), so I decided to keep the >>> extra condition. >>> >>>> You have some lines with "/*require_const=*/" in two places; that >>>> can't be right. >>>> This is the result of functions with too many misc. arguments to keep >>>> track of. >>>> I don't have the code under my fingers, so I'm just guessing, but here >>>> are more suggestions: >>> Thanks! I tried to address all your suggestions in updated version. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>> >>>> I wish the is_autobox_cache condition could be more localized. Could >>>> we omit the boolean >>>> flag (almost always false), and where it is true, post-process the >>>> node? Would that make >>>> the code simpler? 
>>>> >>>> This leads me to notice that make_constant is not related strongly to >>>> GraphKit; it is really >>>> a call to the Type and CI modules to look for a singleton type, ending >>>> with either a NULL >>>> or a call to GraphKit::makecon. So you might consider changing Node* >>>> GK::make_constant >>>> to const Type* Type::make_constant. >>>> >>>> Now to pick at the argument salad we have in push_constant: The >>>> effect of is_autobox_cache >>>> could be transferred to a method >>>> Type[Ary]::cast_to_autobox_cache(true). >>>> And the effect of stable_type on make_constant(ciCon,bool,bool,Type*), >>>> could also be factored out, as post-processing step >>>> contype=contype->Type::join(stabletype). >>> >>> From roland.westrelin at oracle.com Wed Jul 8 08:24:20 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Wed, 8 Jul 2015 10:24:20 +0200 Subject: RFR: 8129920 - Vectorized loop unrolling In-Reply-To: References: <5591B1C4.302@oracle.com> Message-ID: <54A75B9E-CD4F-4546-9235-07655FB9E821@oracle.com> > Vladimir, please have a look at http://cr.openjdk.java.net/~mcberg/8129920/webrev.02 That looks good to me. Roland. From michael.haupt at oracle.com Thu Jul 9 14:46:21 2015 From: michael.haupt at oracle.com (Michael Haupt) Date: Thu, 9 Jul 2015 16:46:21 +0200 Subject: RFR(L): 6900757: minor bug fixes to LogCompilation tool Message-ID: <2C7A8387-3043-4A31-A178-F86E9C143D26@oracle.com> Dear all, please review and sponsor this change. RFE: https://bugs.openjdk.java.net/browse/JDK-6900757 Webrev: http://cr.openjdk.java.net/~mhaupt/6900757/webrev.00 This affects the LogCompilation tool sources *only*, with one exception in compileBroker.cpp, where an extension was necessary to properly attribute the compiler in the log message. Tested manually on various compilation logs. Thanks, Michael -- Dr. Michael Haupt | Principal Member of Technical Staff Phone: +49 331 200 7277 | Fax: +49 331 200 7561 Oracle Java Platform Group | LangTools Team | Nashorn Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany Oracle is committed to developing practices and products that help protect the environment -------------- next part -------------- An HTML attachment was scrubbed... URL: From roland.westrelin at oracle.com Thu Jul 9 15:34:08 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Thu, 9 Jul 2015 17:34:08 +0200 Subject: RFR(S): 8130858: CICompilerCount=1 when tiered is off is not allowed any more Message-ID: <2E1EC0B4-DC75-4208-9F2D-49F7D5C2929D@oracle.com> http://cr.openjdk.java.net/~roland/8130858/webrev.00/ It used to be possible to run with CICompilerCount=1 -XX:-TieredCompilation which I find useful during debugging so trace output of compiler threads are not intermixed. This was broken by 8122937. Roland. From vladimir.x.ivanov at oracle.com Thu Jul 9 15:37:27 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 09 Jul 2015 18:37:27 +0300 Subject: RFR(S): 8130858: CICompilerCount=1 when tiered is off is not allowed any more In-Reply-To: <2E1EC0B4-DC75-4208-9F2D-49F7D5C2929D@oracle.com> References: <2E1EC0B4-DC75-4208-9F2D-49F7D5C2929D@oracle.com> Message-ID: <559E9537.9010401@oracle.com> Looks good. Best regards, Vladimir Ivanov On 7/9/15 6:34 PM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8130858/webrev.00/ > > It used to be possible to run with CICompilerCount=1 -XX:-TieredCompilation which I find useful during debugging so trace output of compiler threads are not intermixed. 
This was broken by 8122937. > > Roland. > From vladimir.kozlov at oracle.com Thu Jul 9 16:09:05 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 09 Jul 2015 09:09:05 -0700 Subject: RFR(S): 8130858: CICompilerCount=1 when tiered is off is not allowed any more In-Reply-To: <2E1EC0B4-DC75-4208-9F2D-49F7D5C2929D@oracle.com> References: <2E1EC0B4-DC75-4208-9F2D-49F7D5C2929D@oracle.com> Message-ID: <559E9CA1.2050505@oracle.com> Nice! Thanks, Vladimir On 7/9/15 8:34 AM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8130858/webrev.00/ > > It used to be possible to run with CICompilerCount=1 -XX:-TieredCompilation which I find useful during debugging so trace output of compiler threads are not intermixed. This was broken by 8122937. > > Roland. > From vladimir.kozlov at oracle.com Thu Jul 9 16:23:58 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 09 Jul 2015 09:23:58 -0700 Subject: RFR(L): 6900757: minor bug fixes to LogCompilation tool In-Reply-To: <2C7A8387-3043-4A31-A178-F86E9C143D26@oracle.com> References: <2C7A8387-3043-4A31-A178-F86E9C143D26@oracle.com> Message-ID: <559EA01E.20608@oracle.com> Very nice work. Thank you for comments you added and new functionality. I think it is good for integration. Thanks, Vladimir On 7/9/15 7:46 AM, Michael Haupt wrote: > Dear all, > > please review and sponsor this change. > RFE: https://bugs.openjdk.java.net/browse/JDK-6900757 > Webrev: http://cr.openjdk.java.net/~mhaupt/6900757/webrev.00 > > This affects the LogCompilation tool sources *only*, with one exception in compileBroker.cpp, where an extension was > necessary to properly attribute the compiler in the log message. > > Tested manually on various compilation logs. > > Thanks, > > Michael > > -- > > Oracle > Dr. Michael Haupt | Principal Member of Technical Staff > Phone: +49 331 200 7277 | Fax: +49 331 200 7561 > OracleJava Platform Group | LangTools Team | Nashorn > Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany > Green Oracle Oracle is committed to developing practices and products that help > protect the environment > > From cnewland at chrisnewland.com Thu Jul 9 20:10:40 2015 From: cnewland at chrisnewland.com (Chris Newland) Date: Thu, 9 Jul 2015 21:10:40 +0100 Subject: Making PrintEscapeAnalysis a diagnostic option on product VM? In-Reply-To: <558430C6.1090102@oracle.com> References: <5583F6EE.7070901@oracle.com> <558406A0.6060301@oracle.com> <558430C6.1090102@oracle.com> Message-ID: Hi, I've found a way to output EA information via LogCompilation from the product VM including BCIs so that I can annotate bytecode but it relies on the following block from opto/compile.cpp being entered: void Compile::Init(int aliaslevel) { ... if (debug_info()->recording_non_safepoints()) { set_node_note_array(new(comp_arena()) GrowableArray (comp_arena(), 8, 0, NULL)); set_default_node_notes(Node_Notes::make(this)); } Without this, the Note_Notes are not present in the ideal nodes and I can't fully identify what was eliminated. It looks like this will only execute when DebugNonSafepoints is true and this is a diagnostic VM option. Does anyone have an alternative method for getting BCIs for eliminated allocs without using Note_Notes? Thanks, Chris On Fri, June 19, 2015 16:09, Vladimir Kozlov wrote: > Agree. > > > Vladimir K > > > On 6/19/15 5:10 AM, Vladimir Ivanov wrote: > >>> What do you think the next step is? 
>>> >>> >>> I'm not a committer but I'd be happy to submit a patch/webrev that >>> outputs LogCompilation XML for the kind of EA info I think would be >>> useful. >> Go for it. If you are a Contributor (signed OCA), we'll review and >> accept your patch with gratitude. Keep in mind, that when you touch >> LogCompilation output format, you should update logc tool >> (src/share/tools/LogCompilation/ [1]) as well. >> >> >>> I've just seen Vitaly's post and I agree a tty 1-liner for each >>> elimination would also be nice. >> Feel free to enhance -XX:+PrintEscapeAnalysis output as well, if you >> find it useful. >> >> Best regards, >> Vladimir Ivanov >> >> >> [1] >> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/tip/src/share/tools/L >> ogCompilation >>> >>> Thanks, >>> >>> >>> Chris >>> >>> >>> On Fri, June 19, 2015 12:03, Vladimir Ivanov wrote: >>> >>>> Chris, >>>> >>>> >>>> >>>> I'd suggest to look into enhancing LogCompilation output instead of >>>> parsing VM output. It doesn't require any flag changes and fits >>>> nicely into existing LogCompilation functionality, so we can >>>> integrate it into the product, relieving you and JITWatch users from >>>> building a companion VM. >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>> >>>> >>>> On 6/19/15 1:16 PM, Chris Newland wrote: >>>> >>>> >>>>> Hi, hope this is the correct list (perhaps serviceability?) >>>>> >>>>> >>>>> >>>>> I'm experimenting with some HotSpot changes that log escape >>>>> analysis decisions so that I can visualise eliminated allocations >>>>> at the source and bytecode levels in JITWatch[1]. >>>>> >>>>> My plan was to build a companion VM for JITWatch based on the >>>>> product VM >>>>> that would allow users to inspect some of the deeper workings such >>>>> as EA and DCE that are not present in the LogCompilation output. >>>>> >>>>> I mentioned this to some performance guys at Devoxx and they >>>>> didn't like the custom VM idea and suggested I put in a request to >>>>> consider making -XX:+PrintEscapeAnalysis available under >>>>> -XX:+UnlockDiagnosticVMOptions on >>>>> the product VM (it's currently a notproduct option). >>>>> >>>>> If this is something you would consider than could I also request >>>>> consideration of -XX:+PrintEliminateAllocations. >>>>> >>>>> All I would need is the class, method, and bci of each NoEscape >>>>> detected. >>>>> >>>>> Kind regards, >>>>> >>>>> >>>>> >>>>> Chris >>>>> >>>>> >>>>> >>>>> [1] https://github.com/AdoptOpenJDK/jitwatch >>>>> >>>>> >>>>> >>>> >>> >>> > From cnewland at chrisnewland.com Thu Jul 9 20:49:56 2015 From: cnewland at chrisnewland.com (Chris Newland) Date: Thu, 9 Jul 2015 21:49:56 +0100 Subject: Making PrintEscapeAnalysis a diagnostic option on product VM? In-Reply-To: References: <5583F6EE.7070901@oracle.com> <558406A0.6060301@oracle.com> <558430C6.1090102@oracle.com> Message-ID: Please ignore previous message. All this time I've been digging around the connection graph logging in escape.cpp when I should have been looking in macro.cpp. I believe the BCI information I need is actually already output by PhaseMacroExpand::eliminate_allocate_node() so I'll attempt to visualise this in JITWatch and submit a patch if I need anything further. Thanks, Chris On Thu, July 9, 2015 21:10, Chris Newland wrote: > Hi, > > > I've found a way to output EA information via LogCompilation from the > product VM including BCIs so that I can annotate bytecode but it relies on > the following block from opto/compile.cpp being entered: > > void Compile::Init(int aliaslevel) { ... 
>
> if (debug_info()->recording_non_safepoints()) {
>   set_node_note_array(new(comp_arena()) GrowableArray<Node_Notes*>
>                       (comp_arena(), 8, 0, NULL));
>   set_default_node_notes(Node_Notes::make(this));
> }
>
> Without this, the Node_Notes are not present in the ideal nodes and I
> can't fully identify what was eliminated.
>
> It looks like this will only execute when DebugNonSafepoints is true,
> and that is a diagnostic VM option.
>
> Does anyone have an alternative method for getting BCIs for eliminated
> allocs without using Node_Notes?
>
> Thanks,
>
> Chris
>
> On Fri, June 19, 2015 16:09, Vladimir Kozlov wrote:
>
>> Agree.
>>
>> Vladimir K
>>
>> [snip]

From vladimir.kozlov at oracle.com  Thu Jul  9 21:00:45 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 09 Jul 2015 14:00:45 -0700
Subject: Making PrintEscapeAnalysis a diagnostic option on product VM?
In-Reply-To: References: <5583F6EE.7070901@oracle.com>
	<558406A0.6060301@oracle.com> <558430C6.1090102@oracle.com>
Message-ID: <559EE0FD.10502@oracle.com>

Thank you for doing this, Chris.

Each call node (and an allocation is a call node) has an associated JVM
state: the information needed to deoptimize compiled code on the return
from the call. Deoptimization info allows execution to restart in the
interpreter, so it carries all the bci and inlined-method information.

Regards,
Vladimir

On 7/9/15 1:49 PM, Chris Newland wrote:
> Please ignore previous message.
>
> All this time I've been digging around the connection graph logging in
> escape.cpp when I should have been looking in macro.cpp.
>
> I believe the BCI information I need is actually already output by
> PhaseMacroExpand::eliminate_allocate_node() so I'll attempt to visualise
> this in JITWatch and submit a patch if I need anything further.
>
> Thanks,
>
> Chris
>
> [snip]
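A first cut at consuming the output Vladimir describes could scan a
-XX:+LogCompilation file for the eliminate_allocation elements that
PhaseMacroExpand::eliminate_allocate_node() emits. The sketch below is
illustrative only: it assumes a <jvms bci='..' method='..'/> shape
inside <eliminate_allocation> blocks, and the exact tag and attribute
names should be verified against a real hotspot.log before relying on it.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: list eliminated allocations from a LogCompilation file.
// Assumed shape (check against an actual log):
//   <eliminate_allocation type='...'>
//     <jvms bci='12' method='789'/>
//   </eliminate_allocation>
public class EliminatedAllocScanner {
    private static final Pattern JVMS =
        Pattern.compile("<jvms bci='(\\d+)' method='(\\d+)'");

    public static void main(String[] args) throws IOException {
        boolean inElim = false;
        for (String line : Files.readAllLines(Paths.get(args[0]))) {
            if (line.contains("<eliminate_allocation")) inElim = true;
            if (inElim) {
                Matcher m = JVMS.matcher(line);
                if (m.find()) {
                    // 'method' is an id into the log's <method> table,
                    // not a name; resolve it separately if needed.
                    System.out.println("eliminated alloc at bci " + m.group(1)
                        + " in method id " + m.group(2));
                }
            }
            if (line.contains("</eliminate_allocation>")) inElim = false;
        }
    }
}

The inElim gate matters because <jvms> elements also appear elsewhere in
the log (for example under uncommon traps), so only those nested inside
an eliminate_allocation block should be reported.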
From anthony.scarpino at oracle.com  Thu Jul  9 21:07:39 2015
From: anthony.scarpino at oracle.com (Anthony Scarpino)
Date: Thu, 09 Jul 2015 14:07:39 -0700
Subject: RFR: 8130341 GHASH 32bit intrinsics has AEADBadTagException
Message-ID: <559EE29B.6030307@oracle.com>

Hi all,

I need a review of my bug fix. Most of the lines are a test update, with
a small but important change to the 32-bit assembly to save & restore
registers around the GHASH op.

http://cr.openjdk.java.net/~ascarpino/8130341/webrev/

thanks

Tony

From vladimir.kozlov at oracle.com  Thu Jul  9 21:15:05 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 09 Jul 2015 14:15:05 -0700
Subject: RFR: 8130341 GHASH 32bit intrinsics has AEADBadTagException
In-Reply-To: <559EE29B.6030307@oracle.com>
References: <559EE29B.6030307@oracle.com>
Message-ID: <559EE459.4000202@oracle.com>

Looks good. Thank you for fixing this.

Thanks,
Vladimir

On 7/9/15 2:07 PM, Anthony Scarpino wrote:
> Hi all,
>
> I need a review of my bug fix. Most of the lines are a test update, with
> a small but important change to the 32-bit assembly to save & restore
> registers around the GHASH op.
>
> http://cr.openjdk.java.net/~ascarpino/8130341/webrev/
>
> thanks
>
> Tony

From michael.c.berg at intel.com  Thu Jul  9 21:19:06 2015
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Thu, 9 Jul 2015 21:19:06 +0000
Subject: RFR: 8130341 GHASH 32bit intrinsics has AEADBadTagException
In-Reply-To: <559EE459.4000202@oracle.com>
References: <559EE29B.6030307@oracle.com> <559EE459.4000202@oracle.com>
Message-ID:

Looks good Anthony.

Thanks,
-Michael

-----Original Message-----
From: hotspot-compiler-dev
[mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov
Sent: Thursday, July 09, 2015 2:15 PM
To: Anthony Scarpino; hotspot-compiler-dev at openjdk.java.net compiler
Subject: Re: RFR: 8130341 GHASH 32bit intrinsics has AEADBadTagException

Looks good. Thank you for fixing this.

Thanks,
Vladimir

On 7/9/15 2:07 PM, Anthony Scarpino wrote:
> Hi all,
>
> I need a review of my bug fix. Most of the lines are a test update,
> with a small but important change to the 32-bit assembly to save & restore
> registers around the GHASH op.
>
> http://cr.openjdk.java.net/~ascarpino/8130341/webrev/
>
> thanks
>
> Tony

From anthony.scarpino at oracle.com  Thu Jul  9 21:30:23 2015
From: anthony.scarpino at oracle.com (Anthony Scarpino)
Date: Thu, 09 Jul 2015 14:30:23 -0700
Subject: build error in jdk9/hs-comp/?
Message-ID: <559EE7EF.3070005@oracle.com>

Anyone else seeing the below error with CompileDemos.gmk in hs-comp? I
updated my repo and brought over a fresh one, but got the same failure.
"--enable-deploy=no" works around the problem.

$ make images
Compiling 5 files for BUILD_GENMODULESLIST
Building target 'images' in configuration 'linux-x86_64-normal-server-release'
Compiling 8 files for BUILD_TOOLS_LANGTOOLS
make[3]: CompileDemos.gmk: No such file or directory
make[3]: *** No rule to make target 'CompileDemos.gmk'.  Stop.
make[2]: *** [demos-deploy] Error 1

Tony

From john.r.rose at oracle.com  Thu Jul  9 21:34:58 2015
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 9 Jul 2015 14:34:58 -0700
Subject: Making PrintEscapeAnalysis a diagnostic option on product VM?
In-Reply-To: References: <5583F6EE.7070901@oracle.com>
	<558406A0.6060301@oracle.com> <558430C6.1090102@oracle.com>
Message-ID: <1BE18DE5-716C-47B8-9B07-640C080758F6@oracle.com>

On Jul 9, 2015, at 1:10 PM, Chris Newland wrote:
>
> It looks like this will only execute when DebugNonSafepoints is true and
> this is a diagnostic VM option.
>
> Does anyone have an alternative method for getting BCIs for eliminated
> allocs without using Node_Notes?

The node notes are incomplete and approximate mappings from optimized IR
back to source (BCI) locations. They may be useful when nothing more
accurate is available, but you can't rely on them to always tell the
truth. They were introduced to give hints to instruction-level profile
tools about where the instructions come from.

-- John

From vladimir.kozlov at oracle.com  Thu Jul  9 21:45:10 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 09 Jul 2015 14:45:10 -0700
Subject: build error in jdk9/hs-comp/?
In-Reply-To: <559EE7EF.3070005@oracle.com>
References: <559EE7EF.3070005@oracle.com>
Message-ID: <559EEB66.4010406@oracle.com>

No. We passed JPRT builds. And I saw your test JPRT job passed too today.
I would suggest creating a fresh clone of the hs-comp forest and building
it from scratch.
Vladimir

On 7/9/15 2:30 PM, Anthony Scarpino wrote:
> Anyone else seeing the below error with CompileDemos.gmk in hs-comp? I
> updated my repo and brought over a fresh one, but got the same failure.
> "--enable-deploy=no" works around the problem.
>
> $ make images
> Compiling 5 files for BUILD_GENMODULESLIST
> Building target 'images' in configuration 'linux-x86_64-normal-server-release'
> Compiling 8 files for BUILD_TOOLS_LANGTOOLS
> make[3]: CompileDemos.gmk: No such file or directory
> make[3]: *** No rule to make target 'CompileDemos.gmk'.  Stop.
> make[2]: *** [demos-deploy] Error 1
>
> Tony

From serkan at hazelcast.com  Sun Jul 12 10:29:34 2015
From: serkan at hazelcast.com (=?UTF-8?B?U2Vya2FuIMOWemFs?=)
Date: Sun, 12 Jul 2015 13:29:34 +0300
Subject: Array accesses using sun.misc.Unsafe cause data corruption or
	SIGSEGV
In-Reply-To: References: Message-ID:

Hi all,

I have created a webrev for review, including the patch, and shared it
for public access here:
https://s3.amazonaws.com/jdk-8087134/webrev.00/index.html

Regards.

On Sat, Jul 4, 2015 at 9:06 PM, Serkan Özal wrote:

> Hi,
>
> I have added some logs to show that the problem is caused by double
> scaling of the offset (index).
>
> Here is my updated (log messages added) reproducer code:
>
> int count = 100000;
> long size = count * 8L;
> long baseAddress = unsafe.allocateMemory(size);
> System.out.println("Start address: " + Long.toHexString(baseAddress) +
>                    ", End address: " + Long.toHexString(baseAddress + size));
>
> for (int i = 0; i < count; i++) {
>     long address = baseAddress + (i * 8L);
>     System.out.println(
>         "Normal: " + Long.toHexString(address) + ", " +
>         "If double scaled: " + Long.toHexString(baseAddress + (i * 8L * 8L)));
>     long expected = i;
>     unsafe.putLong(address, expected);
>     unsafe.getLong(address);
> }
>
> After some time it crashes with
>
> ...
> Current thread (0x0000000002068800): JavaThread "main" [_thread_in_Java,
> id=10412, stack(0x00000000023f0000,0x00000000024f0000)]
>
> siginfo: ExceptionCode=0xc0000005, reading address 0x0000000059061020
> ...
>
> And here is the output of the execution until the crash:
>
> Start address: 58bbcfa0, End address: 58c804a0
> Normal: 58bbcfa0, If double scaled: 58bbcfa0
> Normal: 58bbcfa8, If double scaled: 58bbcfe0
> Normal: 58bbcfb0, If double scaled: 58bbd020
> ...
> Normal: 58c517b0, If double scaled: 59061020
>
> As seen from the logs and the crash dump, the double-scaled version of
> the target address (If double scaled: 59061020) is the same as the
> faulting address (siginfo: ExceptionCode=0xc0000005, reading address
> 0x0000000059061020), which crashes when accessed.
>
> So I think it is obvious that the crash is caused by wrong optimization
> of the index value: the index is scaled twice (once for Unsafe::put and
> once for Unsafe::get) instead of only once, and the doubly scaled index
> then points to an invalid memory address.
>
> Regards.
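To make the double scaling concrete, here is a small sanity check of the
arithmetic in plain Java. It is an illustrative sketch, not part of the
original reproducer: the loop index 76034 (0x12902) is inferred from the
two addresses in the log above, not logged directly.

// Verifies that the faulting address from the crash dump equals the
// base address plus a doubly scaled index (i * 8 * 8).
public class DoubleScaleCheck {
    public static void main(String[] args) {
        long base = 0x58bbcfa0L;                 // "Start address" in the log
        long i = 76034L;                         // inferred loop index (0x12902)
        long scaledOnce  = base + i * 8L;        // what the source code computes
        long scaledTwice = base + i * 8L * 8L;   // what the miscompiled code computes
        System.out.println(Long.toHexString(scaledOnce));   // 58c517b0 (last "Normal" value)
        System.out.println(Long.toHexString(scaledTwice));  // 59061020 (the SIGSEGV address)
    }
}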
> > On Sun, Jun 14, 2015 at 2:39 PM, Serkan ?zal wrote: > >> Hi all, >> >> I had dived into the issue with JDK-HotSpot commits and >> the issue arised after this commit: http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/a60a1309a03a >> >> Then I added some additional logs to *"vm/c1/c1_Canonicalizer.cpp"*: >> void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) { >> if (OptimizeUnsafes) do_UnsafeRawOp(x); >> tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id %d, index = id %d, log2_scale = %d", >> x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); >> } >> >> void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) { >> if (OptimizeUnsafes) do_UnsafeRawOp(x); >> tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id %d, index = id %d, log2_scale = %d", >> x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); >> } >> >> >> So I run the test by calculating address as >> - *"int * long"* (int is index and long is 8l) >> - *"long * long"* (the first long is index and the second long is 8l) >> - *"int * int"* (the first int is index and the second int is 8) >> >> Here are the logs: >> *int * long:*Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 >> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 >> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 >> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 >> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 27, log2_scale = 3 >> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 27, log2_scale = 3 >> *long * long:*Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 >> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 >> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 >> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 >> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3 >> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 3 >> *int * int:*Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 >> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 >> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 >> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 >> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 29, log2_scale = 0 >> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 29, log2_scale = 0 >> Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id 15, log2_scale = 0 >> Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id 15, log2_scale = 0 >> >> As you can see, at the problematic runs (*"int * long"* and *"long * long"*) there are two scaling. >> One for *"Unsafe.put"* and the other one is for* "Unsafe.get"* and these instructions points to >> same *"base"* and *"index"* instructions. >> This means that address is scaled one more time because there should be only one scale. >> >> >> When I debugged the non-problematic run (*"int * int"*), >> I saw that *"instr->as_ArithmeticOp();"* is always returns *"null" *then *"match_index_and_scale"* method returns* "false"* always. >> So there is no scaling. >> static bool match_index_and_scale(Instruction* instr, >> Instruction** index, >> int* log2_scale) { >> ... 
>> >> ArithmeticOp* arith = instr->as_ArithmeticOp(); >> if (arith != NULL) { >> ... >> } >> >> return false; >> } >> >> >> Then I have added my fix attempt to prevent multiple scaling for Unsafe instructions points to same index instruction like this: >> void Canonicalizer::do_UnsafeRawOp(UnsafeRawOp* x) { >> Instruction* base = NULL; >> Instruction* index = NULL; >> int log2_scale; >> >> if (match(x, &base, &index, &log2_scale)) { >> x->set_base(base); >> x->set_index(index); // The fix attempt here // ///////////////////////////// >> if (index != NULL) { >> if (index->is_pinned()) { >> log2_scale = 0; >> } else { >> if (log2_scale != 0) { >> index->pin(); >> } >> } >> } // ///////////////////////////// >> x->set_log2_scale(log2_scale); >> if (PrintUnsafeOptimization) { >> tty->print_cr("Canonicalizer: UnsafeRawOp id %d: base = id %d, index = id %d, log2_scale = %d", >> x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); >> } >> } >> } >> In this fix attempt, if there is a scaling for the Unsafe instruction, I pin index instruction of that instruction >> and at next calls, if the index instruction is pinned, I assummed that there is already scaling so no need to another scaling. >> >> After this fix, I rerun the problematic test (*"int * long"*) and it works with these logs: >> *int * long (after fix):*Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 >> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 >> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 >> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 >> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3 >> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 0 >> Canonicalizer: do_UnsafePutRaw id 21: base = id 8, index = id 11, log2_scale = 3 >> Canonicalizer: do_UnsafeGetRaw id 23: base = id 8, index = id 11, log2_scale = 0 >> >> I am not sure my fix attempt is a really fix or maybe there are better fixes. >> >> Regards. >> >> -- >> >> Serkan ?ZAL >> >> >>> Btw, (thanks to one my colleagues), when address calculation in the loop is >>> converted to >>> long address = baseAddress + (i * 8) >>> test passes. Only difference is next long pointer is calculated using >>> integer 8 instead of long 8. >>> ``` >>> for (int i = 0; i < count; i++) { >>> long address = baseAddress + (i * 8); // <--- here, integer 8 instead >>> of long 8 >>> long expected = i; >>> unsafe.putLong(address, expected); >>> long actual = unsafe.getLong(address); >>> if (expected != actual) { >>> throw new AssertionError("Expected: " + expected + ", Actual: " + >>> actual); >>> } >>> } >>> ``` >>> On Tue, Jun 9, 2015 at 1:07 PM Mehmet Dogan > wrote: >>> >* Hi all, >>> *> >>> >* While I was testing my app using java 8, I encountered the previously >>> *>* reported sun.misc.Unsafe issue. >>> *> >>> >* https://bugs.openjdk.java.net/browse/JDK-8076445 >>> *> >>> >* http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017685.html >>> *> >>> >* Issue status says it's resolved with resolution "Cannot Reproduce". But >>> *>* unfortunately it's still reproducible using "1.8.0_60-ea-b18" and >>> *>* "1.9.0-ea-b67". 
>>> *> >>> >* Test is very simple: >>> *> >>> >* ``` >>> *>* public static void main(String[] args) throws Exception { >>> *>* Unsafe unsafe = findUnsafe(); >>> *>* // 10000 pass >>> *>* // 100000 jvm crash >>> *>* // 1000000 fail >>> *>* int count = 100000; >>> *>* long size = count * 8L; >>> *>* long baseAddress = unsafe.allocateMemory(size); >>> *> >>> >* try { >>> *>* for (int i = 0; i < count; i++) { >>> *>* long address = baseAddress + (i * 8L); >>> *> >>> >* long expected = i; >>> *>* unsafe.putLong(address, expected); >>> *> >>> >* long actual = unsafe.getLong(address); >>> *> >>> >* if (expected != actual) { >>> *>* throw new AssertionError("Expected: " + expected + ", >>> *>* Actual: " + actual); >>> *>* } >>> *>* } >>> *>* } finally { >>> *>* unsafe.freeMemory(baseAddress); >>> *>* } >>> *>* } >>> *>* ``` >>> *>* It's not failing up to version 1.8.0.31, by starting 1.8.0.40 test is >>> *>* failing constantly. >>> *> >>> >* - With iteration count 10000, test is passing. >>> *>* - With iteration count 100000, jvm is crashing with SIGSEGV. >>> *>* - With iteration count 1000000, test is failing with AssertionError. >>> *> >>> >* When one of compilation (-Xint) or inlining (-XX:-Inline) or >>> *>* on-stack-replacement (-XX:-UseOnStackReplacement) is disabled, test is not >>> *>* failing at all. >>> *> >>> >* I tested on platforms: >>> *>* - Centos-7/openjdk-1.8.0.45 >>> *>* - OSX/oraclejdk-1.8.0.40 >>> *>* - OSX/oraclejdk-1.8.0.45 >>> *>* - OSX/oraclejdk-1.8.0_60-ea-b18 >>> *>* - OSX/oraclejdk-1.9.0-ea-b67 >>> *> >>> >* Previous issue comment ( >>> *>* https://bugs.openjdk.java.net/browse/JDK-8076445?focusedCommentId=13633043#comment-13633043 ) >>> *>* says "Cannot reproduce based on the latest version". I hope that latest >>> *>* version is not mentioning to '1.8.0_60-ea-b18' or '1.9.0-ea-b67'. Because >>> *>* both are failing. >>> *> >>> >* I'm looking forward to hearing from you. >>> *> >>> >* Thanks, >>> *>* -Mehmet Dogan- >>> *>* -- >>> *> >>> >* @mmdogan >>> *> >> >> >> -- >> Serkan ?ZAL >> Remotest Software Engineer >> GSM: +90 542 680 39 18 >> Twitter: @serkan_ozal >> > > > > -- > Serkan ?ZAL > Remotest Software Engineer > GSM: +90 542 680 39 18 > Twitter: @serkan_ozal > -- Serkan ?ZAL Remotest Software Engineer GSM: +90 542 680 39 18 Twitter: @serkan_ozal -------------- next part -------------- An HTML attachment was scrubbed... URL: From martijnverburg at gmail.com Sun Jul 12 11:54:55 2015 From: martijnverburg at gmail.com (Martijn Verburg) Date: Sun, 12 Jul 2015 12:54:55 +0100 Subject: Array accesses using sun.misc.Unsafe cause data corruption or SIGSEGV In-Reply-To: References: Message-ID: Non reviewer here, but I'd add to the comment *why* you don't want to scale again. Cheers, Martijn On 12 July 2015 at 11:29, Serkan ?zal wrote: > Hi all, > > I have created a webrev for review including the patch and shared for > public access from here: > https://s3.amazonaws.com/jdk-8087134/webrev.00/index.html > > Regards. 
> [snip]
>
> --
> Serkan ÖZAL
> Remotest Software Engineer
> GSM: +90 542 680 39 18
> Twitter: @serkan_ozal

From serkan at hazelcast.com  Sun Jul 12 12:07:06 2015
From: serkan at hazelcast.com (=?UTF-8?B?U2Vya2FuIMOWemFs?=)
Date: Sun, 12 Jul 2015 15:07:06 +0300
Subject: Array accesses using sun.misc.Unsafe cause data corruption or
	SIGSEGV
In-Reply-To: References: Message-ID:

Hi Martijn,

Thanks for your interest and your comment.
>From my previous message (
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-June/018221.html
):

I added some additional logs to *"vm/c1/c1_Canonicalizer.cpp"*:

void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) {
  if (OptimizeUnsafes) do_UnsafeRawOp(x);
  tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id %d, index = id %d, log2_scale = %d",
                x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
}

void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) {
  if (OptimizeUnsafes) do_UnsafeRawOp(x);
  tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id %d, index = id %d, log2_scale = %d",
                x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
}

So I ran the test, calculating the address as:
- *"int * long"* (int is the index and long is 8L)
- *"long * long"* (the first long is the index and the second long is 8L)
- *"int * int"* (the first int is the index and the second int is 8)

Here are the logs:

*int * long:*
Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 27, log2_scale = 3
Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 27, log2_scale = 3

*long * long:*
Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3
Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 3

*int * int:*
Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 29, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 29, log2_scale = 0
Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id 15, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id 15, log2_scale = 0

As you can see, in the problematic runs (*"int * long"* and *"long * long"*)
the index is scaled twice: once for *"Unsafe.put"* and once for
*"Unsafe.get"*, and both instructions point to the same *"base"* and
*"index"* instructions. This means the address is scaled one extra time;
there should be only one scaling.

With this fix (or attempt, since I am not 100% sure it is the
perfect/optimal way), I prevent multiple scalings of the same index
instruction. Also, one of my previous messages
(http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-July/018383.html)
shows that when the index is scaled multiple times, the resulting address
can point anywhere in memory.
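For completeness, the three index expressions above can be exercised from
a single harness. This is a consolidated sketch, not the original test:
findUnsafe() is the usual reflective helper from the reproducer,
reproduced here so the snippet is self-contained.

import sun.misc.Unsafe;

// Exercises the three address computations in one loop. All three
// expressions are equal at the source level; the bug shows up in the
// put/get pair below when the JIT double-scales one of them.
public class ScalingVariants {
    public static void main(String[] args) throws Exception {
        Unsafe unsafe = findUnsafe();
        int count = 100000;
        long base = unsafe.allocateMemory(count * 8L);
        try {
            for (int i = 0; i < count; i++) {
                long a1 = base + (i * 8L);        // int * long  (was failing)
                long a2 = base + ((long) i * 8L); // long * long (was failing)
                long a3 = base + (i * 8);         // int * int   (was passing)
                if (a1 != a2 || a2 != a3) throw new AssertionError("index " + i);
                unsafe.putLong(a1, i);
                if (unsafe.getLong(a1) != i) throw new AssertionError("at " + i);
            }
        } finally {
            unsafe.freeMemory(base);
        }
    }

    private static Unsafe findUnsafe() throws Exception {
        java.lang.reflect.Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        return (Unsafe) f.get(null);
    }
}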
On Sun, Jul 12, 2015 at 2:54 PM, Martijn Verburg wrote:

> Non reviewer here, but I'd add to the comment *why* you don't want to
> scale again.
>
> Cheers,
> Martijn
>
> [snip]

--
Serkan ÖZAL
Remotest Software Engineer
GSM: +90 542 680 39 18
Twitter: @serkan_ozal

From michael.haupt at oracle.com  Mon Jul 13 07:04:08 2015
From: michael.haupt at oracle.com (Michael Haupt)
Date: Mon, 13 Jul 2015 09:04:08 +0200
Subject: RFR(L): 6900757: minor bug fixes to LogCompilation tool
In-Reply-To: <559EA01E.20608@oracle.com>
References: <2C7A8387-3043-4A31-A178-F86E9C143D26@oracle.com>
	<559EA01E.20608@oracle.com>
Message-ID:

Hi Vladimir,

thank you. Does this require another review? If not, could a sponsor
please step up? :-)

Best,

Michael

> On 09.07.2015 at 18:23, Vladimir Kozlov wrote:
>
> Very nice work. Thank you for the comments you added and the new
> functionality. I think it is good for integration.
>
> Thanks,
> Vladimir
>
> On 7/9/15 7:46 AM, Michael Haupt wrote:
>> Dear all,
>>
>> please review and sponsor this change.
>> RFE: https://bugs.openjdk.java.net/browse/JDK-6900757
>> Webrev: http://cr.openjdk.java.net/~mhaupt/6900757/webrev.00
>>
>> This affects the LogCompilation tool sources *only*, with one exception
>> in compileBroker.cpp, where an extension was necessary to properly
>> attribute the compiler in the log message.
>>
>> Tested manually on various compilation logs.
>>
>> Thanks,
>>
>> Michael

--
Dr. Michael Haupt | Principal Member of Technical Staff
Phone: +49 331 200 7277 | Fax: +49 331 200 7561
Oracle Java Platform Group | LangTools Team | Nashorn
Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany
Oracle is committed to developing practices and products that help protect the environment

From gerard.ziemski at oracle.com  Mon Jul 13 14:17:01 2015
From: gerard.ziemski at oracle.com (gerard ziemski)
Date: Mon, 13 Jul 2015 09:17:01 -0500
Subject: RFR (XXS): 8079156: 32 bit Java 9-fastdebug hit assertion in client
	mode with StackShadowPages flag value from 32 to 50
In-Reply-To: <55A34DF9.5050905@oracle.com>
References: <55A03481.2040709@oracle.com> <55A34DF9.5050905@oracle.com>
Message-ID: <55A3C85D.1060109@oracle.com>

hi David,

On 07/13/2015 12:34 AM, David Holmes wrote:
> Hi Gerard,
>
> On 11/07/2015 7:09 AM, gerard ziemski wrote:
>> (resending - forgot to include the issue number in the title)
>>
>> Hi all,
>>
>> Please review this very small fix:
>>
>> bug: https://bugs.openjdk.java.net/browse/JDK-8079156
>> webrev: http://cr.openjdk.java.net/~gziemski/8079156_rev0
>
> I'd like to hear from the compiler folk about this bug as I'm unclear
> on why StackBanging adds to the code buffer, and why this suddenly
> seems to be a new problem - did something change? Or have we not
> exercised the range of values before?

My best educated guess is that the issue was always there, but we never
exercised that path - Dmitry only added his testing framework for
exercising the ranges after we recently got the range/constraint check
feature in, and that's when we found it.

> I'd also like to understand whether the code buffer resizing should be
> rounded up (or whether it will be rounded up internally)? e.g. power
> of two, multiple of nK for some n etc.

I checked the instr sizes of existing CodeBuffers and some of the values
I saw are 416, 536, 544, 560, and 568, so I'm not sure what the
constraint here is, if there really is one (other than being divisible
by 4, which 112 is as well).

> The 112 value seems odd for 50 pages - is this 2 bytes (words?) per
> page plus some fixed overhead? Can it be expressed as a function of
> StackShadowPages rather than hardwiring to 112 which only works for
> values < 50?

Hardcoding the value for 50 pages should be OK here, since that's the
max value that StackShadowPages can take. Expressing it as a function
would not be all that simple - you would need to take into account that
the default size is enough for some StackShadowPages values (i.e. 32),
then find out the fixed size of the stack banging code. In the end you
would end up with some hardcoded values anyhow, so why not make it super
simple as we did here?

The other way is to calculate things dynamically, and I actually did
that: my first fix was based on creating a temp CodeBuffer and feeding
it only the shadow stack banging code to find out the exact size
requirement for that code, but I was told that this might confuse some
compiler code later that wouldn't expect it.
The other unknown was whether the temp code buffer code actually made it
into the cache (is it flushed by the destructor?). I tried to find a way
to wipe out the instr section before the destructor, but couldn't find
any APIs for doing so.

I don't know the answers to those issues, so even though I liked the
idea of using a temp buffer to find out precisely how much more memory
we used, in the end I settled on the simplest solution that works.

Would folks from the compiler team like to comment?

cheers

From zoltan.majo at oracle.com  Mon Jul 13 16:48:56 2015
From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=)
Date: Mon, 13 Jul 2015 18:48:56 +0200
Subject: [9] RFR(M): 8130832: Extend the WhiteBox API to provide information
	about the availability of compiler intrinsics
Message-ID: <55A3EBF8.2020208@oracle.com>

Hi,

please review the following patch for JDK-8130832.

Bug: https://bugs.openjdk.java.net/browse/JDK-8130832

Problem: Currently, compiler-related tests frequently use the
architecture descriptor string (e.g., "aarch64", "x86") to determine if a
certain compiler feature is available on the platform where the JVM is
executing. If the tested feature is a compiler intrinsic, using
architecture descriptor strings is an inaccurate way of determining if
the intrinsic is available. The reason is that the availability of
compiler intrinsics is guided by many factors (e.g., the values of
command-line flags and the instructions available on the platform), and
as a result a test might expect an intrinsic to be available when it is
in fact not.

Solution: This enhancement proposes adding a new WhiteBox method,
is_compiled_intrinsic_available(Executable method, int compileLevel),
that returns true if an intrinsic for method 'method' is available at
compile level 'compileLevel' (the final API might differ from the
proposed API). To test the new API, a new test,
hotspot/test/compiler/intrinsics/IntrinsicAvailableTest.java, is added.
Moreover, existing tests in hotspot/test/compiler/intrinsics/mathexact/sanity
are updated to use the newly added WhiteBox method.

Webrev:
- top: http://cr.openjdk.java.net/~zmajo/8130832/top/webrev.00/
- hotspot: http://cr.openjdk.java.net/~zmajo/8130832/hotspot/webrev.00/

Testing:
- full JPRT run in the hotspot testset (incl. all tests in
hotspot/compiler/intrinsics/mathexact), all tests pass;
- all hotspot JTREG tests executed locally on Linux x86_64; all tests
pass that pass with an unmodified VM; all tests were also executed with
-Xcomp;
- hotspot/test/compiler/intrinsics/IntrinsicAvailableTest.java and all
tests in hotspot/compiler/intrinsics/mathexact executed on arm64, all
tests pass;
- manual testing on Linux x86_64 to verify that the functionality of the
DisableIntrinsic flag is preserved.

Thank you and best regards,


Zoltan

From anthony.scarpino at oracle.com  Mon Jul 13 18:28:27 2015
From: anthony.scarpino at oracle.com (Anthony Scarpino)
Date: Mon, 13 Jul 2015 11:28:27 -0700
Subject: RFR 8131078: typos in ghash cpu message
Message-ID: <55A4034B.2090005@oracle.com>

Hi,

I need a quick review of the typos Andreas Kohn saw.
From anthony.scarpino at oracle.com  Mon Jul 13 18:28:27 2015
From: anthony.scarpino at oracle.com (Anthony Scarpino)
Date: Mon, 13 Jul 2015 11:28:27 -0700
Subject: RFR 8131078: typos in ghash cpu message
Message-ID: <55A4034B.2090005@oracle.com>

Hi,

I need a quick review of the typos Andreas Kohn saw.

http://cr.openjdk.java.net/~ascarpino/8131078/webrev/

Tony

From goetz.lindenmaier at sap.com  Mon Jul 13 18:41:20 2015
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Mon, 13 Jul 2015 18:41:20 +0000
Subject: RFR 8131078: typos in ghash cpu message
In-Reply-To: <55A4034B.2090005@oracle.com>
References: <55A4034B.2090005@oracle.com>
Message-ID: <4295855A5C1DE049A61835A1887419CC2D004F22@DEWDFEMB12A.global.corp.sap>

Hi Tony,

I would use singular for 'instruction', as at the other three occurrences of the same string.

Besides that: Reviewed.

Best regards,
Goetz.

-----Original Message-----
From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Anthony Scarpino
Sent: Monday, July 13, 2015 8:28 PM
To: hotspot-compiler-dev at openjdk.java.net compiler
Subject: RFR 8131078: typos in ghash cpu message

Hi,

I need a quick review of the typos Andreas Kohn saw.

http://cr.openjdk.java.net/~ascarpino/8131078/webrev/

Tony

From anthony.scarpino at oracle.com  Mon Jul 13 18:48:23 2015
From: anthony.scarpino at oracle.com (Anthony Scarpino)
Date: Mon, 13 Jul 2015 11:48:23 -0700
Subject: RFR 8131078: typos in ghash cpu message
In-Reply-To: <4295855A5C1DE049A61835A1887419CC2D004F22@DEWDFEMB12A.global.corp.sap>
References: <55A4034B.2090005@oracle.com> <4295855A5C1DE049A61835A1887419CC2D004F22@DEWDFEMB12A.global.corp.sap>
Message-ID: <55A407F7.7010908@oracle.com>

Sounds reasonable. I updated the webrev in place.

Tony

On 07/13/2015 11:41 AM, Lindenmaier, Goetz wrote:
> Hi Tony,
>
> I would use singular for 'instruction', as at the other three
> occurrences of the same string.
>
> Besides that: Reviewed.
>
> Best regards,
> Goetz.

From goetz.lindenmaier at sap.com  Mon Jul 13 18:53:17 2015
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Mon, 13 Jul 2015 18:53:17 +0000
Subject: RFR 8131078: typos in ghash cpu message
In-Reply-To: <55A407F7.7010908@oracle.com>
References: <55A4034B.2090005@oracle.com> <4295855A5C1DE049A61835A1887419CC2D004F22@DEWDFEMB12A.global.corp.sap> <55A407F7.7010908@oracle.com>
Message-ID: <4295855A5C1DE049A61835A1887419CC2D005F48@DEWDFEMB12A.global.corp.sap>

That's good, thanks!

Best regards,
Goetz.

-----Original Message-----
From: Anthony Scarpino [mailto:anthony.scarpino at oracle.com]
Sent: Monday, July 13, 2015 8:48 PM
To: Lindenmaier, Goetz; 'hotspot-compiler-dev at openjdk.java.net compiler'
Subject: Re: RFR 8131078: typos in ghash cpu message

Sounds reasonable. I updated the webrev in place.

Tony
From vladimir.kozlov at oracle.com  Mon Jul 13 18:56:35 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 13 Jul 2015 11:56:35 -0700
Subject: RFR 8131078: typos in ghash cpu message
In-Reply-To: <4295855A5C1DE049A61835A1887419CC2D005F48@DEWDFEMB12A.global.corp.sap>
References: <55A4034B.2090005@oracle.com> <4295855A5C1DE049A61835A1887419CC2D004F22@DEWDFEMB12A.global.corp.sap> <55A407F7.7010908@oracle.com> <4295855A5C1DE049A61835A1887419CC2D005F48@DEWDFEMB12A.global.corp.sap>
Message-ID: <55A409E3.9000006@oracle.com>

Looks good.

Thanks,
Vladimir

On 7/13/15 11:53 AM, Lindenmaier, Goetz wrote:
> That's good, thanks!
>
> Best regards,
> Goetz.

From vladimir.kozlov at oracle.com  Tue Jul 14 03:41:43 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 13 Jul 2015 20:41:43 -0700
Subject: [9] RFR(M): 8130832: Extend the WhiteBox API to provide information about the availability of compiler intrinsics
In-Reply-To: <55A3EBF8.2020208@oracle.com>
References: <55A3EBF8.2020208@oracle.com>
Message-ID: <55A484F7.4010705@oracle.com>

Nice work, Zoltan.

I am worried about the CompilerOracle::has_option_value() call and the name() == vmSymbols:: checks. WB code does a ThreadInVMfromNative transition and calls is_intrinsic_available_for() in VM state. A compiler thread did that only when calling into the CompilerOracle (ciMethod::has_option_value()); otherwise it uses the CI. But with your change the compiler does not make a transition - it calls is_intrinsic_available_for() in Native state. It may cause problems. The compiler should not access VM data directly. And, on the other hand, to access CI data (ciSymbol) we need to set up a ciEnv, but the WB current thread is not a compiler thread and the thread is already in VM state anyway.

I thought about how to solve this problem but nothing simple came up. The ugly solution is to have the has_option_value() call and the name() == vmSymbols:: checks outside is_intrinsic_available_for(). I mean to duplicate them in Compile::make_vm_intrinsic() and C2Compiler::is_intrinsic_available_for(). But I would like to avoid cloning code.

Another approach is that Compile::make_vm_intrinsic() can go into VM state for the is_intrinsic_available_for() call:

bool is_available = false;
{
  VM_ENTRY_MARK;
  methodHandle mh(THREAD, m->get_Method());
  methodHandle ct(THREAD, method()->get_Method());
  is_available = is_intrinsic_available_for(mh, ct, is_virtual);
}

But we usually do that in the CI and not in compiler code, which means we would have to move the method to the CI - but we have Matcher calls.

I would ask John's opinion on this problem.

You can simplify a little too. The methods intrinsic_does_virtual_dispatch_for() and intrinsic_predicates_needed_for() can take just the intrinsic id. Similar for the methods in C1.

There are several method->method_holder()->name() calls which could be done only once.

Use a different name for the 'method' local to avoid needing the Compile:: qualifier:

+ Method* method = m->get_Method();
+ Method* compilation_context = Compile::method()->get_Method();

Please add a comment to the new test explaining why you chose the crc32 intrinsic. Why?

Thanks,
Vladimir

On 7/13/15 9:48 AM, Zoltán Majó wrote:
> Hi,
>
> please review the following patch for JDK-8130832.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8130832 [...]

From vladimir.x.ivanov at oracle.com  Tue Jul 14 08:22:50 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 14 Jul 2015 11:22:50 +0300
Subject: RFR(L): 6900757: minor bug fixes to LogCompilation tool
In-Reply-To:
References: <2C7A8387-3043-4A31-A178-F86E9C143D26@oracle.com> <559EA01E.20608@oracle.com>
Message-ID: <55A4C6DA.70409@oracle.com>

Looks good! I'll push it for you.

Best regards,
Vladimir Ivanov

On 7/13/15 10:04 AM, Michael Haupt wrote:
> Hi Vladimir,
>
> thank you. Does this require another review? If not, could a sponsor
> please step up? :-)
>
> Best,
>
> Michael
>
>> Am 09.07.2015 um 18:23 schrieb Vladimir Kozlov:
>>
>> Very nice work. Thank you for comments you added and new functionality.
>> I think it is good for integration.
>>
>> Thanks,
>> Vladimir
>>
>> On 7/9/15 7:46 AM, Michael Haupt wrote:
>>> Dear all,
>>>
>>> please review and sponsor this change.
>>> RFE: https://bugs.openjdk.java.net/browse/JDK-6900757
>>> Webrev: http://cr.openjdk.java.net/~mhaupt/6900757/webrev.00 [...]

From michael.haupt at oracle.com  Tue Jul 14 08:36:56 2015
From: michael.haupt at oracle.com (Michael Haupt)
Date: Tue, 14 Jul 2015 10:36:56 +0200
Subject: RFR(L): 6900757: minor bug fixes to LogCompilation tool
In-Reply-To: <55A4C6DA.70409@oracle.com>
References: <2C7A8387-3043-4A31-A178-F86E9C143D26@oracle.com> <559EA01E.20608@oracle.com> <55A4C6DA.70409@oracle.com>
Message-ID: <81928C50-8011-4D09-88D5-1CDA59ECD95D@oracle.com>

Hi Vladimir,

thank you. I'll send the export your way.

Best,

Michael

> Am 14.07.2015 um 10:22 schrieb Vladimir Ivanov:
>
> Looks good! I'll push it for you.
>
> Best regards,
> Vladimir Ivanov

--
Dr. Michael Haupt | Principal Member of Technical Staff
Phone: +49 331 200 7277 | Fax: +49 331 200 7561
Oracle Java Platform Group | LangTools Team | Nashorn
Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany
Oracle is committed to developing practices and products that help protect the environment

From zoltan.majo at oracle.com  Tue Jul 14 16:27:14 2015
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Tue, 14 Jul 2015 18:27:14 +0200
Subject: [9] RFR(M): 8130832: Extend the WhiteBox API to provide information about the availability of compiler intrinsics
In-Reply-To: <55A484F7.4010705@oracle.com>
References: <55A3EBF8.2020208@oracle.com> <55A484F7.4010705@oracle.com>
Message-ID: <55A53862.7040308@oracle.com>

Hi Vladimir,

On 07/14/2015 05:41 AM, Vladimir Kozlov wrote:
> Nice work, Zoltan.

thank you!

> I am worried about the CompilerOracle::has_option_value() call and the
> name() == vmSymbols:: checks. WB code does a ThreadInVMfromNative
> transition and calls is_intrinsic_available_for() in VM state. [...]
> It may cause problems.

thank you for catching that problem, I did not think of it before.

> I thought about how to solve this problem but nothing simple came up.
> The ugly solution is to have the has_option_value() call and the
> name() == vmSymbols:: checks outside is_intrinsic_available_for().
> I mean to duplicate them in Compile::make_vm_intrinsic() and
> C2Compiler::is_intrinsic_available_for(). But I would like to avoid
> cloning code.

I agree with you.

> Another approach is that Compile::make_vm_intrinsic() can go into VM
> state for the is_intrinsic_available_for() call: [...]
> But we usually do that in the CI and not in compiler code, which means
> we would have to move the method to the CI - but we have Matcher calls.
>
> I would ask John's opinion on this problem.

I would prefer to keep is_intrinsic_available in compiler code (instead of moving it to the CI) because the compiler itself is in a better position than the CI to determine which method it can intrinsify under which conditions. But I'll ask John as well what he thinks and then we can decide how to proceed.

Until then, I've updated Compile::make_vm_intrinsic() to go into VM state as you've suggested (please see the webrev below).

> You can simplify a little too. The methods
> intrinsic_does_virtual_dispatch_for() and
> intrinsic_predicates_needed_for() can take just the intrinsic id.
> Similar for the methods in C1.

I changed that for both C1 and C2.

> There are several method->method_holder()->name() calls which could be
> done only once.

I changed that as well.

> Use a different name for the 'method' local to avoid needing the
> Compile:: qualifier: [...]

I've updated the variable name.

> Please add a comment to the new test explaining why you chose the
> crc32 intrinsic. Why?

I hoped that using a single test method for all tested compilation levels would keep the test simple. The crc32 intrinsic is available with both C1 and C2, and both intrinsics can be controlled with a single flag, UseCRC32Intrinsics, in both product and fastdebug builds. I updated the test to clarify that.

Here is the updated webrev (for hotspot):
http://cr.openjdk.java.net/~zmajo/8130832/hotspot/webrev.01/

I tested with
- the hotspot testset with JPRT
- all hotspot/compiler JTREG tests

All tests pass.

Thank you and best regards,

Zoltan

From vladimir.kozlov at oracle.com  Tue Jul 14 17:11:44 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 14 Jul 2015 10:11:44 -0700
Subject: [9] RFR(M): 8130832: Extend the WhiteBox API to provide information about the availability of compiler intrinsics
In-Reply-To: <55A53862.7040308@oracle.com>
References: <55A3EBF8.2020208@oracle.com> <55A484F7.4010705@oracle.com> <55A53862.7040308@oracle.com>
Message-ID: <55A542D0.50100@oracle.com>

Pass methodHandle parameters to Compile::is_intrinsic_available_for() in C2. CompilerOracle::has_option_value() has a methodHandle parameter. And it is safer to use a methodHandle.

And you forgot 'hg add' for the new test - it is not in the webrev.

Thanks,
Vladimir

On 7/14/15 9:27 AM, Zoltán Majó wrote:
> Hi Vladimir,
>
> On 07/14/2015 05:41 AM, Vladimir Kozlov wrote:
>> Nice work, Zoltan. [...]

From thomas.schatzl at oracle.com  Wed Jul 15 14:19:42 2015
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Wed, 15 Jul 2015 16:19:42 +0200
Subject: RFR (XXS): 8131344: Missing klass.inline.hpp include in compiler files
Message-ID: <1436969982.2282.41.camel@oracle.com>

Hi all,

while working on changes for PLAB handling improvements I found that with these changes compilation was no longer successful, because some compiler files were missing includes of oops/klass.inline.hpp, which they need for the Klass::encode_klass() method.

This change adds the missing includes to these files.

I intend to push this through the hs-rt tree since all the other changes will be pushed through that tree, but it changes compiler files only, so I have it out for review here.

CR:
https://bugs.openjdk.java.net/browse/JDK-8131344

Webrev:
http://cr.openjdk.java.net/~tschatzl/8131344/webrev/

Testing:
jprt

Thanks,
Thomas

From zoltan.majo at oracle.com  Wed Jul 15 14:54:19 2015
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Wed, 15 Jul 2015 16:54:19 +0200
Subject: [9] RFR(S): 8131326: Enable CheckIntrinsics in all types of builds
Message-ID: <55A6741B.7050607@oracle.com>

Hi,

please review the following patch for JDK-8131326.

Bug: https://bugs.openjdk.java.net/browse/JDK-8131326

Problem: The CheckIntrinsics flag added by JDK-8076112 is currently enabled only in debug builds. As a result, users of product builds might more easily overlook potential mismatches between VM-level and classfile-level intrinsics.

Solution: This enhancement enables the flag in all types of builds (incl. product builds). The check for orphan methods in src/share/vm/classfile/classFileParser.cpp is also controlled by the CheckIntrinsics flag. To limit the impact of that potentially expensive check on our product builds, this enhancement proposes to include that check only in debug builds.

Webrev: http://cr.openjdk.java.net/~zmajo/8131326/webrev.00/

Testing: JPRT run using the hotspot testset; all tests pass.
Thank you and best regards,

Zoltan

From vladimir.kozlov at oracle.com  Wed Jul 15 15:36:37 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 15 Jul 2015 08:36:37 -0700
Subject: [9] RFR(S): 8131326: Enable CheckIntrinsics in all types of builds
In-Reply-To: <55A6741B.7050607@oracle.com>
References: <55A6741B.7050607@oracle.com>
Message-ID: <55A67E05.5080400@oracle.com>

Looks good.

Thanks,
Vladimir

On 7/15/15 7:54 AM, Zoltán Majó wrote:
> Hi,
>
> please review the following patch for JDK-8131326.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8131326 [...]

From vladimir.kozlov at oracle.com  Wed Jul 15 15:39:29 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 15 Jul 2015 08:39:29 -0700
Subject: RFR (XXS): 8131344: Missing klass.inline.hpp include in compiler files
In-Reply-To: <1436969982.2282.41.camel@oracle.com>
References: <1436969982.2282.41.camel@oracle.com>
Message-ID: <55A67EB1.7030203@oracle.com>

Looks fine to me.

Thanks,
Vladimir

On 7/15/15 7:19 AM, Thomas Schatzl wrote:
> Hi all,
>
> while working on changes for PLAB handling improvements I found that
> with these changes compilation was no longer successful, because some
> compiler files were missing includes of oops/klass.inline.hpp. [...]
>
> Thanks,
> Thomas

From edward.nevill at gmail.com  Wed Jul 15 16:18:17 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Wed, 15 Jul 2015 17:18:17 +0100
Subject: RFR: 8131358: aarch64: test compiler/loopopts/superword/ProdRed_Float.java fails when run with debug VM
Message-ID: <1436977097.31596.7.camel@mylittlepony.linaroharston>

Hi,

http://cr.openjdk.java.net/~enevill/8131358/webrev/

fixes a typo in the match rule in vsub2f which causes an assertion failure in the jtreg/hotspot test compiler/loopopts/superword/ProdRed_Float.java.

Basically, vsub2f was matching AddVF instead of SubVF.

Thanks for the review,
Ed.
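For context, the shape of the rule is roughly this (a heavily abridged sketch; the operand and encoding details are illustrative only - see the webrev for the actual aarch64.ad text):

instruct vsub2F(vecD dst, vecD src1, vecD src2) %{
  match(Set dst (AddVF src1 src2));   // the typo: AddVF ...
  ins_encode %{
    __ fsub(...);                     // ... paired with a subtract encoding
  %}
%}

The fix simply changes the match line to

  match(Set dst (SubVF src1 src2));

so the node selected for vector subtracts is the one that actually emits fsub.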
From aph at redhat.com  Wed Jul 15 16:36:45 2015
From: aph at redhat.com (Andrew Haley)
Date: Wed, 15 Jul 2015 17:36:45 +0100
Subject: [aarch64-port-dev ] RFR: 8131358: aarch64: test compiler/loopopts/superword/ProdRed_Float.java fails when run with debug VM
In-Reply-To: <1436977097.31596.7.camel@mylittlepony.linaroharston>
References: <1436977097.31596.7.camel@mylittlepony.linaroharston>
Message-ID: <55A68C1D.9050006@redhat.com>

On 07/15/2015 05:18 PM, Edward Nevill wrote:
> http://cr.openjdk.java.net/~enevill/8131358/webrev/
>
> fixes a typo in the match rule in vsub2f which causes an assertion failure in the jtreg/hotspot test compiler/loopopts/superword/ProdRed_Float.java.
>
> Basically, vsub2f was matching AddVF instead of SubVF.

Yes, thanks.

Andrew.

From vladimir.kozlov at oracle.com  Wed Jul 15 16:47:54 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 15 Jul 2015 09:47:54 -0700
Subject: RFR: 8131358: aarch64: test compiler/loopopts/superword/ProdRed_Float.java fails when run with debug VM
In-Reply-To: <1436977097.31596.7.camel@mylittlepony.linaroharston>
References: <1436977097.31596.7.camel@mylittlepony.linaroharston>
Message-ID: <55A68EBA.9080708@oracle.com>

Looks good.

Thanks,
Vladimir

PS: for small changes like this you need only one review.

On 7/15/15 9:18 AM, Edward Nevill wrote:
> Hi,
>
> http://cr.openjdk.java.net/~enevill/8131358/webrev/ [...]

From zoltan.majo at oracle.com  Wed Jul 15 18:04:53 2015
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Wed, 15 Jul 2015 20:04:53 +0200
Subject: [9] RFR(S): 8131326: Enable CheckIntrinsics in all types of builds
In-Reply-To: <55A67E05.5080400@oracle.com>
References: <55A6741B.7050607@oracle.com> <55A67E05.5080400@oracle.com>
Message-ID: <55A6A0C5.5050708@oracle.com>

Thank you, Vladimir, for the review!

Best regards,

Zoltan

On 07/15/2015 05:36 PM, Vladimir Kozlov wrote:
> Looks good.
>
> Thanks,
> Vladimir [...]

From zoltan.majo at oracle.com  Wed Jul 15 18:15:28 2015
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Wed, 15 Jul 2015 20:15:28 +0200
Subject: [9] RFR(M): 8130832: Extend the WhiteBox API to provide information about the availability of compiler intrinsics
In-Reply-To: <55A542D0.50100@oracle.com>
References: <55A3EBF8.2020208@oracle.com> <55A484F7.4010705@oracle.com> <55A53862.7040308@oracle.com> <55A542D0.50100@oracle.com>
Message-ID: <55A6A340.3050204@oracle.com>

Hi Vladimir,

On 07/14/2015 07:11 PM, Vladimir Kozlov wrote:
> Pass methodHandle parameters to Compile::is_intrinsic_available_for()
> in C2. CompilerOracle::has_option_value() has a methodHandle parameter.
>
> And it is safer to use a methodHandle.

OK, I updated the method.

> And you forgot 'hg add' for the new test - it is not in the webrev.

Sorry for that. The test is included in the newest webrev:
- top: http://cr.openjdk.java.net/~zmajo/8130832/top/webrev.02/
- hotspot: http://cr.openjdk.java.net/~zmajo/8130832/hotspot/webrev.02/

I ran all JPRT tests from the hotspot testset, all pass.

Thank you and best regards,

Zoltan

From vladimir.kozlov at oracle.com  Wed Jul 15 18:44:59 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 15 Jul 2015 11:44:59 -0700
Subject: [9] RFR(M): 8130832: Extend the WhiteBox API to provide information about the availability of compiler intrinsics
In-Reply-To: <55A6A340.3050204@oracle.com>
References: <55A3EBF8.2020208@oracle.com> <55A484F7.4010705@oracle.com> <55A53862.7040308@oracle.com> <55A6A340.3050204@oracle.com>
Message-ID: <55A6AA2B.5030005@oracle.com>

Looks good to me. We still need John's opinion about the state transition.

Thanks,
Vladimir

On 7/15/15 11:15 AM, Zoltán Majó wrote:
> Hi Vladimir,
>
> OK, I updated the method. [...]
>
> Sorry for that. The test is included in the newest webrev:
> - top: http://cr.openjdk.java.net/~zmajo/8130832/top/webrev.02/
> - hotspot: http://cr.openjdk.java.net/~zmajo/8130832/hotspot/webrev.02/

From john.r.rose at oracle.com  Wed Jul 15 19:11:51 2015
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 15 Jul 2015 12:11:51 -0700
Subject: [9] RFR(M): 8130832: Extend the WhiteBox API to provide information about the availability of compiler intrinsics
In-Reply-To: <55A6AA2B.5030005@oracle.com>
References: <55A3EBF8.2020208@oracle.com> <55A484F7.4010705@oracle.com> <55A53862.7040308@oracle.com> <55A542D0.50100@oracle.com> <55A6A340.3050204@oracle.com> <55A6AA2B.5030005@oracle.com>
Message-ID: <7CC60DCE-330D-4C1B-81FB-C2C3B4DFDEA7@oracle.com>

On Jul 15, 2015, at 11:44 AM, Vladimir Kozlov wrote:
>
> Looks good to me. We still need John's opinion about the state transition.

Just sent a 1-1 reply; here it is FTR.

On Jul 14, 2015, at 9:42 AM, Zoltán Majó wrote:
>
> So far, I tried Vladimir's solution of going into VM state in Compile::make_vm_intrinsic() [2] and it works well.
>
> What we could also do is
> - (1) list in the switch statement in is_intrinsic_available_for() the intrinsic ids of all methods of interest (similarly to the way we do it for C1); that would eliminate the need to make checks based on a method's holder;
> - (2) for the DisableIntrinsic checks (that need to call CompilerOracle::has_option_value()) we could define a separate method that is called directly from a WhiteBox context and through the CI from make_vm_intrinsic.

This is going to be a good cleanup. But it is hard to follow, so please regard my comments as tentative.

Some comments:

I think the term "_for" is a noise word as deprecated in:
https://wiki.openjdk.java.net/display/HotSpot/StyleGuide#StyleGuide-NamingNaming

I agree with the tendency to factor stuff (when possible) away from the guts of the compilers.

Suggest Compile::intrinsic_does_virtual_dispatch_for be moved to vmIntrinsics::does_virtual_dispatch. It's really part of the vmIntrinsics contract. Same for can_trap (or whatever it is called). If it can't be wedged into vmSymbols.cpp, then at least consider abstractCompiler.cpp.

Similar comment about is_intrinsic_available[_for].
Because of the dependency on the compiler tier, it has to be virtual, of course.

Suggest a static vmIntrinsics::is_disabled_by_flags, to check for compiler-independent disabling logic.

Method::is_intrinsic_disabled is a good thought, but I would suggest making it a static method on vmIntrinsics, because the Method* pointer is just a wrapper around the intrinsic_id. Stripping the Method* would let you avoid a VM_ENTRY_MARK in ciMethod::* if the context argument is null (true for C1?).

The "flag soup" logic in C2 is frustrating, and may defeat an attempt to factor the -XX flag checking into vmIntrinsics, but I encourage you to try. The Matcher calls can be layered on separately, in C2-specific code. The vm_version checks can go in the same C1/C2-specific layer as the C2 matcher checks. (Or perhaps factored into abstractCompiler, but that may be overkill.)

Regarding your original question: I would prefer that the VM_ENTRY logic be confined to the CI, but there is no functional reason the compiler itself can't do a native-to-VM transition.

-- John

From martijnverburg at gmail.com  Thu Jul 16 08:36:19 2015
From: martijnverburg at gmail.com (Martijn Verburg)
Date: Thu, 16 Jul 2015 09:36:19 +0100
Subject: Array accesses using sun.misc.Unsafe cause data corruption or SIGSEGV
In-Reply-To:
References:
Message-ID:

Hi all,

Is there a reviewer who can take a look at the webrev? I guess you could combine the 2nd else / nested if, but I think it reads OK this way.

Cheers,
Martijn

On 12 July 2015 at 13:07, Serkan Özal wrote:

> Hi Martijn,
>
> Thanks for your interest and for your comment - it makes this thread a little bit hotter.
>
> From my previous message (http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-June/018221.html):
>
> I added some additional logs to "vm/c1/c1_Canonicalizer.cpp":
>
> void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) {
>   if (OptimizeUnsafes) do_UnsafeRawOp(x);
>   tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id %d, index = id %d, log2_scale = %d",
>                 x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
> }
>
> void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) {
>   if (OptimizeUnsafes) do_UnsafeRawOp(x);
>   tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id %d, index = id %d, log2_scale = %d",
>                 x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
> }
>
> So I ran the test by calculating the address as:
> - "int * long" (int is the index and long is 8L)
> - "long * long" (the first long is the index and the second long is 8L)
> - "int * int" (the first int is the index and the second int is 8)
>
> Here are the logs:
>
> int * long:
> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 27, log2_scale = 3
> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 27, log2_scale = 3
>
> long * long:
> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3
> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 3
>
> int * int:
> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 29, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 29, log2_scale = 0
> Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id 15, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id 15, log2_scale = 0
>
> As you can see, in the problematic runs ("int * long" and "long * long") there are two scalings:
> one for "Unsafe.put" and the other one for "Unsafe.get", and these instructions point to the
> same "base" and "index" instructions. This means that the address is scaled one more time than it
> should be, because there should be only one scale.
>
> With this fix (or attempt, since I am not 100% sure it is the perfect/optimum way), I prevent
> multiple scalings of the same index instruction.
>
> Also, one of my previous messages (http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-July/018383.html)
> shows that the index is scaled multiple times, and once it is scaled more than once it points
> somewhere else - anywhere - in memory.
>
> On Sun, Jul 12, 2015 at 2:54 PM, Martijn Verburg wrote:
>
>> Non reviewer here, but I'd add to the comment *why* you don't want to
>> scale again.
>>
>> Cheers,
>> Martijn
>>
>> On 12 July 2015 at 11:29, Serkan Özal wrote:
>>
>>> Hi all,
>>>
>>> I have created a webrev for review including the patch and shared it for
>>> public access here:
>>> https://s3.amazonaws.com/jdk-8087134/webrev.00/index.html
>>>
>>> Regards.
>>>
>>> On Sat, Jul 4, 2015 at 9:06 PM, Serkan Özal wrote:
>>>
>>>> Hi,
>>>>
>>>> I have added some logs to show that the problem is caused by double scaling
>>>> of the offset (index).
>>>>
>>>> Here is my updated (log messages added) reproducer code:
>>>>
>>>> int count = 100000;
>>>> long size = count * 8L;
>>>> long baseAddress = unsafe.allocateMemory(size);
>>>> System.out.println("Start address: " + Long.toHexString(baseAddress) +
>>>>                    ", End address: " + Long.toHexString(baseAddress + size));
>>>>
>>>> for (int i = 0; i < count; i++) {
>>>>     long address = baseAddress + (i * 8L);
>>>>     System.out.println(
>>>>         "Normal: " + Long.toHexString(address) + ", " +
>>>>         "If double scaled: " + Long.toHexString(baseAddress + (i * 8L * 8L)));
>>>>     long expected = i;
>>>>     unsafe.putLong(address, expected);
>>>>     unsafe.getLong(address);
>>>> }
>>>>
>>>> After some time it crashes as
>>>>
>>>> ...
>>>> Current thread (0x0000000002068800): JavaThread "main"
>>>> [_thread_in_Java, id=10412, stack(0x00000000023f0000,0x00000000024f0000)]
>>>>
>>>> siginfo: ExceptionCode=0xc0000005, reading address 0x0000000059061020
>>>> ...
>>>>
>>>> And here is the output of the execution until the crash:
>>>>
>>>> Start address: 58bbcfa0, End address: 58c804a0
>>>> Normal: 58bbcfa0, If double scaled: 58bbcfa0
>>>> Normal: 58bbcfa8, If double scaled: 58bbcfe0
>>>> Normal: 58bbcfb0, If double scaled: 58bbd020
>>>> ...
>>>> Normal: 58c517b0, If double scaled: 59061020
>>>>
>>>> As seen from the logs and the crash dump, the double-scaled version of the
>>>> target address (If double scaled: 59061020) is the same as the problematic
>>>> address (siginfo: ExceptionCode=0xc0000005, reading address 0x0000000059061020)
>>>> that causes the crash when accessed.
>>>>
>>>> So I think it is obvious that the crash is caused by wrong optimization of the
>>>> index value, since the index is scaled two times (for Unsafe::put and
>>>> Unsafe::get) instead of only one time. The double-scaled index then points to
>>>> an invalid memory address.
>>>>
>>>> Regards.
>>>>
>>>> On Sun, Jun 14, 2015 at 2:39 PM, Serkan Özal wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I had dived into the issue with the JDK HotSpot commits and
>>>>> the issue arose after this commit: http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/a60a1309a03a
>>>>>
>>>>> Then I added some additional logs to "vm/c1/c1_Canonicalizer.cpp":
>>>>> [same logging code and "int * long" / "long * long" / "int * int" logs as quoted at the top of this message]
>>>>>
>>>>> As you can see, in the problematic runs ("int * long" and "long * long") there are two scalings:
>>>>> one for "Unsafe.put" and the other one for "Unsafe.get", and these instructions point to the
>>>>> same "base" and "index" instructions. This means that the address is scaled one more time than
>>>>> it should be, because there should be only one scale.
>>>>>
>>>>> When I debugged the non-problematic run ("int * int"),
>>>>> I saw that "instr->as_ArithmeticOp()" always returns "null", and then the
>>>>> "match_index_and_scale" method always returns "false". So there is no scaling.
>>>>>
>>>>> static bool match_index_and_scale(Instruction* instr,
>>>>>                                   Instruction** index,
>>>>>                                   int* log2_scale) {
>>>>>   ...
>>>>>   ArithmeticOp* arith = instr->as_ArithmeticOp();
>>>>>   if (arith != NULL) {
>>>>>     ...
>>>>>   }
>>>>>   return false;
>>>>> }
>>>>>
>>>>> Then I added my fix attempt to prevent multiple scalings for Unsafe instructions
>>>>> that point to the same index instruction, like this:
>>>>>
>>>>> void Canonicalizer::do_UnsafeRawOp(UnsafeRawOp* x) {
>>>>>   Instruction* base = NULL;
>>>>>   Instruction* index = NULL;
>>>>>   int log2_scale;
>>>>>
>>>>>   if (match(x, &base, &index, &log2_scale)) {
>>>>>     x->set_base(base);
>>>>>     x->set_index(index);
>>>>>     // The fix attempt here //
>>>>>     /////////////////////////////
>>>>>     if (index != NULL) {
>>>>>       if (index->is_pinned()) {
>>>>>         log2_scale = 0;
>>>>>       } else {
>>>>>         if (log2_scale != 0) {
>>>>>           index->pin();
>>>>>         }
>>>>>       }
>>>>>     }
>>>>>     /////////////////////////////
>>>>>     x->set_log2_scale(log2_scale);
>>>>>     if (PrintUnsafeOptimization) {
>>>>>       tty->print_cr("Canonicalizer: UnsafeRawOp id %d: base = id %d, index = id %d, log2_scale = %d",
>>>>>                     x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
>>>>>     }
>>>>>   }
>>>>> }
>>>>>
>>>>> In this fix attempt, if there is a scaling for the Unsafe instruction, I pin the
>>>>> index instruction of that instruction, and on the following calls, if the index
>>>>> instruction is pinned, I assume that there is already a scaling, so there is no
>>>>> need for another one.
>>>>>
>>>>> After this fix, I reran the problematic test ("int * long") and it works, with these logs:
>>>>>
>>>>> int * long (after fix):
>>>>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>>>>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>>>>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>>>>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>>>>> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3
>>>>> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 0
>>>>> Canonicalizer: do_UnsafePutRaw id 21: base = id 8, index = id 11, log2_scale = 3
>>>>> Canonicalizer: do_UnsafeGetRaw id 23: base = id 8, index = id 11, log2_scale = 0
>>>>>
>>>>> I am not sure my fix attempt is really a fix, or maybe there are better fixes.
>>>>>
>>>>> Regards.
>>>>> >>>>> -- >>>>> >>>>> Serkan ?ZAL >>>>> >>>>> >>>>>> Btw, (thanks to one my colleagues), when address calculation in the loop is >>>>>> converted to >>>>>> long address = baseAddress + (i * 8) >>>>>> test passes. Only difference is next long pointer is calculated using >>>>>> integer 8 instead of long 8. >>>>>> ``` >>>>>> for (int i = 0; i < count; i++) { >>>>>> long address = baseAddress + (i * 8); // <--- here, integer 8 instead >>>>>> of long 8 >>>>>> long expected = i; >>>>>> unsafe.putLong(address, expected); >>>>>> long actual = unsafe.getLong(address); >>>>>> if (expected != actual) { >>>>>> throw new AssertionError("Expected: " + expected + ", Actual: " + >>>>>> actual); >>>>>> } >>>>>> } >>>>>> ``` >>>>>> On Tue, Jun 9, 2015 at 1:07 PM Mehmet Dogan > wrote: >>>>>> >* Hi all, >>>>>> *> >>>>>> >* While I was testing my app using java 8, I encountered the previously >>>>>> *>* reported sun.misc.Unsafe issue. >>>>>> *> >>>>>> >* https://bugs.openjdk.java.net/browse/JDK-8076445 >>>>>> *> >>>>>> >* http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017685.html >>>>>> *> >>>>>> >* Issue status says it's resolved with resolution "Cannot Reproduce". But >>>>>> *>* unfortunately it's still reproducible using "1.8.0_60-ea-b18" and >>>>>> *>* "1.9.0-ea-b67". >>>>>> *> >>>>>> >* Test is very simple: >>>>>> *> >>>>>> >* ``` >>>>>> *>* public static void main(String[] args) throws Exception { >>>>>> *>* Unsafe unsafe = findUnsafe(); >>>>>> *>* // 10000 pass >>>>>> *>* // 100000 jvm crash >>>>>> *>* // 1000000 fail >>>>>> *>* int count = 100000; >>>>>> *>* long size = count * 8L; >>>>>> *>* long baseAddress = unsafe.allocateMemory(size); >>>>>> *> >>>>>> >* try { >>>>>> *>* for (int i = 0; i < count; i++) { >>>>>> *>* long address = baseAddress + (i * 8L); >>>>>> *> >>>>>> >* long expected = i; >>>>>> *>* unsafe.putLong(address, expected); >>>>>> *> >>>>>> >* long actual = unsafe.getLong(address); >>>>>> *> >>>>>> >* if (expected != actual) { >>>>>> *>* throw new AssertionError("Expected: " + expected + ", >>>>>> *>* Actual: " + actual); >>>>>> *>* } >>>>>> *>* } >>>>>> *>* } finally { >>>>>> *>* unsafe.freeMemory(baseAddress); >>>>>> *>* } >>>>>> *>* } >>>>>> *>* ``` >>>>>> *>* It's not failing up to version 1.8.0.31, by starting 1.8.0.40 test is >>>>>> *>* failing constantly. >>>>>> *> >>>>>> >* - With iteration count 10000, test is passing. >>>>>> *>* - With iteration count 100000, jvm is crashing with SIGSEGV. >>>>>> *>* - With iteration count 1000000, test is failing with AssertionError. >>>>>> *> >>>>>> >* When one of compilation (-Xint) or inlining (-XX:-Inline) or >>>>>> *>* on-stack-replacement (-XX:-UseOnStackReplacement) is disabled, test is not >>>>>> *>* failing at all. >>>>>> *> >>>>>> >* I tested on platforms: >>>>>> *>* - Centos-7/openjdk-1.8.0.45 >>>>>> *>* - OSX/oraclejdk-1.8.0.40 >>>>>> *>* - OSX/oraclejdk-1.8.0.45 >>>>>> *>* - OSX/oraclejdk-1.8.0_60-ea-b18 >>>>>> *>* - OSX/oraclejdk-1.9.0-ea-b67 >>>>>> *> >>>>>> >* Previous issue comment ( >>>>>> *>* https://bugs.openjdk.java.net/browse/JDK-8076445?focusedCommentId=13633043#comment-13633043 ) >>>>>> *>* says "Cannot reproduce based on the latest version". I hope that latest >>>>>> *>* version is not mentioning to '1.8.0_60-ea-b18' or '1.9.0-ea-b67'. Because >>>>>> *>* both are failing. >>>>>> *> >>>>>> >* I'm looking forward to hearing from you. 
>>>>>> *> >* Thanks,
>>>>>> *>* -Mehmet Dogan-
>>>>>> *>* --
>>>>>> *>
>>>>>> >* @mmdogan
>>>>>> *>
>>>>>
>>>>> --
>>>>> Serkan ÖZAL
>>>>> Remotest Software Engineer
>>>>> GSM: +90 542 680 39 18
>>>>> Twitter: @serkan_ozal
>>>>
>>>> --
>>>> Serkan ÖZAL
>>>> Remotest Software Engineer
>>>> GSM: +90 542 680 39 18
>>>> Twitter: @serkan_ozal
>>>
>>> --
>>> Serkan ÖZAL
>>> Remotest Software Engineer
>>> GSM: +90 542 680 39 18
>>> Twitter: @serkan_ozal
>>
>
> --
> Serkan ÖZAL
> Remotest Software Engineer
> GSM: +90 542 680 39 18
> Twitter: @serkan_ozal

From thomas.schatzl at oracle.com  Thu Jul 16 08:41:29 2015
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 16 Jul 2015 10:41:29 +0200
Subject: RFR (XXS): 8131344: Missing klass.inline.hpp include in compiler files
In-Reply-To: <55A67EB1.7030203@oracle.com>
References: <1436969982.2282.41.camel@oracle.com> <55A67EB1.7030203@oracle.com>
Message-ID: <1437036089.2361.9.camel@oracle.com>

Hi Vladimir,

On Wed, 2015-07-15 at 08:39 -0700, Vladimir Kozlov wrote:
> Looks fine to me.
>
> Thanks,
> Vladimir

thanks for the quick review.

Thanks,
Thomas

From edward.nevill at gmail.com  Thu Jul 16 08:46:14 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Thu, 16 Jul 2015 09:46:14 +0100
Subject: RFR: 8131362: aarch64: C2 does not handle large stack offsets
Message-ID: <1437036374.31596.18.camel@mylittlepony.linaroharston>

Hi,

http://cr.openjdk.java.net/~enevill/8131362/webrev.01

Provides support for large spill offsets in C2 on aarch64.

In general the stack offset is limited to (1<<12) * sizeof(type). This is generally sufficient.

However for 128 bit vectors the limit can be as little as +256 bytes. This is because 128 bit vectors may not be 128 bit aligned, therefore they have to use a different form of load/store which only has a 9 bit signed offset instead of the 12 bit unsigned scaled offset.

This webrev fixes this by allowing stack offsets up to 1<<24 in all cases.

Tested before and after with jtreg hotspot & langtools. In both cases (before and after) the results were:-

Hotspot: passed: 876; failed: 3; error: 7
Langtools: Test results: passed: 3,246; error: 2

I have also tested to ensure that the code sequence for large offsets is correct by artificially reducing the limit at which the large code sequence is triggered.

The spill calculation is now done in spill_address in macroAssembler_aarch64.cpp and this is called in all cases.

Address MacroAssembler::spill_address(int size, int offset)
{
  assert(offset >= 0, "spill to negative address?");
  // Offset reachable ?
  //   Not aligned - 9 bits signed offset
  //   Aligned - 12 bits unsigned offset shifted
  Register base = sp;
  if ((offset & (size-1)) && offset >= (1<<8)) {
    add(rscratch2, base, offset & ((1<<12)-1));
    base = rscratch2;
    offset &= ~((1<<12)-1);
  }

  if (offset >= (1<<12) * size) {
    add(rscratch2, base, offset & (((1<<12)-1)<<12));
    base = rscratch2;
    offset &= ~(((1<<12)-1)<<12);
  }

  return Address(base, offset);
}

This can generate up to two additional instructions in the most degenerate cases (an unaligned offset larger than (1<<12) * size).

Thanks for the review,
Ed.
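To make the worst case concrete, the offset-splitting in spill_address above can be modelled in isolation. The sketch below is illustrative only: spill_address_model and its register-name strings are made up for the illustration, and the real code emits instructions through the macro assembler rather than printing them.

```cpp
#include <cstdio>

// Model of the decomposition in spill_address: first peel off the low
// 12 bits if the offset is unaligned (and too big for a 9-bit signed
// offset), then the next 12 bits if the remainder is still out of range
// for a scaled 12-bit unsigned offset.
void spill_address_model(int size, int offset) {
  const char* base = "sp";
  if ((offset & (size - 1)) && offset >= (1 << 8)) {
    std::printf("add rscratch2, %s, #0x%x\n", base, offset & ((1 << 12) - 1));
    base = "rscratch2";
    offset &= ~((1 << 12) - 1);
  }
  if (offset >= (1 << 12) * size) {
    std::printf("add rscratch2, %s, #0x%x\n", base, offset & (((1 << 12) - 1) << 12));
    base = "rscratch2";
    offset &= ~(((1 << 12) - 1) << 12);
  }
  std::printf("ldr/str ..., [%s, #0x%x]\n", base, offset);
}

int main() {
  spill_address_model(16, 0x100);     // small aligned offset: no extra instructions
  spill_address_model(16, 0x123456);  // unaligned and large: the two-add worst case
  return 0;
}
```

For a 16-byte spill at the unaligned offset 0x123456 this prints the two extra adds (#0x456 and then #0x123000) before the final access, matching the "up to two additional instructions" claim.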
From edward.nevill at gmail.com Thu Jul 16 09:25:30 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Thu, 16 Jul 2015 10:25:30 +0100 Subject: RFR: 8131483 : aarch64: illegal stlxr instructions Message-ID: <1437038730.31596.26.camel@mylittlepony.linaroharston> Hi, http://cr.openjdk.java.net/~enevill/8131483/webrev Fixes an issue reported by one of our partners where the aarch64 port is generating illegal instructions on their HW. The instruction in question is stlxr(rscratch1, end, rscratch1) According to the ARM ARM this is unpredictable and an implementation may treat this as undefined which our partners HW does. The relevant section from the ARM ARM is:- STLXR For a description of this instruction and the encoding, see STLXR on page C6-702. CONSTRAINED UNPREDICTABLE behavior If s == t || (pair && s == t2), then one of the following behaviors can occur: The instruction is UNDEFINED. The instruction executes as a NOP. The instruction performs the store to the specified address, but the value stored is UNKNOWN. If s == n && n != 31 then one of the following behaviors can occur: The instruction is UNDEFINED. The instruction executes as a NOP. The instruction performs the store to an UNKNOWN address. Thanks for the review, Ed. From aph at redhat.com Thu Jul 16 09:45:28 2015 From: aph at redhat.com (Andrew Haley) Date: Thu, 16 Jul 2015 10:45:28 +0100 Subject: [aarch64-port-dev ] RFR: 8131362: aarch64: C2 does not handle large stack offsets In-Reply-To: <1437036374.31596.18.camel@mylittlepony.linaroharston> References: <1437036374.31596.18.camel@mylittlepony.linaroharston> Message-ID: <55A77D38.7090106@redhat.com> On 16/07/15 09:46, Edward Nevill wrote: > Hi, > > http://cr.openjdk.java.net/~enevill/8131362/webrev.01 > > Provides support for large spill offsets in C2 on aarch64. Thanks. This is a clear improvement over what we have now. A few minor things. ~((1<<12)-1) is just -1<<12 I don't like the way that spill_address silently clobbers rscratch2 and callers of spill_address clobber rscratch1. This makes me rather nervous. We have had recent bugs which were caused by macros assuming they had exclusive use of scratch registers. Please consider passing a destination register down to spill_address and spill_copy128. I think if you do that the register usage will be much clearer to the reader. Andrew. From aph at redhat.com Thu Jul 16 09:49:37 2015 From: aph at redhat.com (Andrew Haley) Date: Thu, 16 Jul 2015 10:49:37 +0100 Subject: RFR: 8131483 : aarch64: illegal stlxr instructions In-Reply-To: <1437038730.31596.26.camel@mylittlepony.linaroharston> References: <1437038730.31596.26.camel@mylittlepony.linaroharston> Message-ID: <55A77E31.5070003@redhat.com> On 16/07/15 10:25, Edward Nevill wrote: > Hi, > > http://cr.openjdk.java.net/~enevill/8131483/webrev > > Fixes an issue reported by one of our partners where the aarch64 port is generating illegal instructions on their HW. > > The instruction in question is > > stlxr(rscratch1, end, rscratch1) > > According to the ARM ARM this is unpredictable and an implementation may treat this as undefined which our partners HW does. OK. Please assert this in Assembler::stlxr. Andrew. 
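As a sketch of the guard Andrew is asking for (illustrative only: the Register struct here is a stand-in and the real Assembler::stlxr signature and macro structure may differ; the two checks simply mirror the CONSTRAINED UNPREDICTABLE cases quoted from the ARM ARM above):

```cpp
#include <cassert>

// Toy stand-in for HotSpot's Register type, just to make the check runnable.
struct Register { int enc; };
inline bool operator!=(Register a, Register b) { return a.enc != b.enc; }

void stlxr(Register Rs, Register Rt, Register Rn) {
  // s == t is CONSTRAINED UNPREDICTABLE per the ARM ARM section above.
  assert(Rs != Rt && "unpredictable instruction: Rs == Rt");
  // s == n is CONSTRAINED UNPREDICTABLE unless n is 31 (the SP encoding).
  assert((Rs != Rn || Rn.enc == 31) && "unpredictable instruction: Rs == Rn");
  // ... emit the store-exclusive encoding here ...
}

int main() {
  Register r0{0}, r1{1}, r2{2};
  stlxr(r0, r1, r2);    // fine
  // stlxr(r0, r1, r0); // the shape of the reported bug; would trip the assert
  return 0;
}
```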
From adinn at redhat.com Thu Jul 16 09:50:59 2015 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 16 Jul 2015 10:50:59 +0100 Subject: RFR: 8131483 : aarch64: illegal stlxr instructions In-Reply-To: <1437038730.31596.26.camel@mylittlepony.linaroharston> References: <1437038730.31596.26.camel@mylittlepony.linaroharston> Message-ID: <55A77E83.7030401@redhat.com> The fix looks ok to me (I checked places where this code gets called and it is safe to clobber rscratch2). regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) On 16/07/15 10:25, Edward Nevill wrote: > Hi, > > http://cr.openjdk.java.net/~enevill/8131483/webrev > > Fixes an issue reported by one of our partners where the aarch64 port is generating illegal instructions on their HW. > > The instruction in question is > > stlxr(rscratch1, end, rscratch1) > > According to the ARM ARM this is unpredictable and an implementation may treat this as undefined which our partners HW does. The relevant section from the ARM ARM is:- > > STLXR > > For a description of this instruction and the encoding, see STLXR on page C6-702. > > CONSTRAINED UNPREDICTABLE behavior > > If s == t || (pair && s == t2), then one of the following behaviors can occur: > > The instruction is UNDEFINED. > The instruction executes as a NOP. > The instruction performs the store to the specified address, but the value stored is UNKNOWN. > > If s == n && n != 31 then one of the following behaviors can occur: > > The instruction is UNDEFINED. > The instruction executes as a NOP. > The instruction performs the store to an UNKNOWN address. > > Thanks for the review, > Ed. From adinn at redhat.com Thu Jul 16 10:03:03 2015 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 16 Jul 2015 11:03:03 +0100 Subject: RFR: 8131483 : aarch64: illegal stlxr instructions In-Reply-To: <1437038730.31596.26.camel@mylittlepony.linaroharston> References: <1437038730.31596.26.camel@mylittlepony.linaroharston> Message-ID: <55A78157.2080101@redhat.com> The fix looks ok to me. I checked places where this code gets called and rscratch2 is safe to use. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) On 16/07/15 10:25, Edward Nevill wrote: > Hi, > > http://cr.openjdk.java.net/~enevill/8131483/webrev > > Fixes an issue reported by one of our partners where the aarch64 port is generating illegal instructions on their HW. > > The instruction in question is > > stlxr(rscratch1, end, rscratch1) > > According to the ARM ARM this is unpredictable and an implementation may treat this as undefined which our partners HW does. The relevant section from the ARM ARM is:- > > STLXR > > For a description of this instruction and the encoding, see STLXR on page C6-702. > > CONSTRAINED UNPREDICTABLE behavior > > If s == t || (pair && s == t2), then one of the following behaviors can occur: > > The instruction is UNDEFINED. > The instruction executes as a NOP. > The instruction performs the store to the specified address, but the value stored is UNKNOWN. > > If s == n && n != 31 then one of the following behaviors can occur: > > The instruction is UNDEFINED. > The instruction executes as a NOP. 
> The instruction performs the store to an UNKNOWN address. > > Thanks for the review, > Ed. From goetz.lindenmaier at sap.com Thu Jul 16 12:25:40 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Thu, 16 Jul 2015 12:25:40 +0000 Subject: RFR(XS): 8131676: Fix warning 'negative int converted to unsigned' after 8085932. Message-ID: <4295855A5C1DE049A61835A1887419CC2D00691B@DEWDFEMB12A.global.corp.sap> Hi, A new warning kills the build with gcc 4.2. Could I please get a review and a sponsor for this tiny change? http://cr.openjdk.java.net/~goetz/webrevs/8131676-negUInt/webrev.01/ Best regards, Goetz. -------------- next part -------------- An HTML attachment was scrubbed... URL: From edward.nevill at gmail.com Thu Jul 16 13:52:55 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Thu, 16 Jul 2015 14:52:55 +0100 Subject: RFR: 8131483 : aarch64: illegal stlxr instructions In-Reply-To: <55A77E31.5070003@redhat.com> References: <1437038730.31596.26.camel@mylittlepony.linaroharston> <55A77E31.5070003@redhat.com> Message-ID: <1437054775.18306.4.camel@mylittlepony.linaroharston> On Thu, 2015-07-16 at 10:49 +0100, Andrew Haley wrote: > On 16/07/15 10:25, Edward Nevill wrote: > > > > Fixes an issue reported by one of our partners where the aarch64 port is generating illegal instructions on their HW. > > > > The instruction in question is > > > > stlxr(rscratch1, end, rscratch1) > > > > > > Please assert this in Assembler::stlxr. OK. New webrev @ http://cr.openjdk.java.net/~enevill/8131483/webrev.01 Thanks, Ed. From aph at redhat.com Thu Jul 16 13:58:58 2015 From: aph at redhat.com (Andrew Haley) Date: Thu, 16 Jul 2015 14:58:58 +0100 Subject: RFR: 8131483 : aarch64: illegal stlxr instructions In-Reply-To: <1437054775.18306.4.camel@mylittlepony.linaroharston> References: <1437038730.31596.26.camel@mylittlepony.linaroharston> <55A77E31.5070003@redhat.com> <1437054775.18306.4.camel@mylittlepony.linaroharston> Message-ID: <55A7B8A2.3020402@redhat.com> On 07/16/2015 02:52 PM, Edward Nevill wrote: > On Thu, 2015-07-16 at 10:49 +0100, Andrew Haley wrote: >> On 16/07/15 10:25, Edward Nevill wrote: >>> >>> Fixes an issue reported by one of our partners where the aarch64 port is generating illegal instructions on their HW. >>> >>> The instruction in question is >>> >>> stlxr(rscratch1, end, rscratch1) >>> >>> >> >> Please assert this in Assembler::stlxr. > > OK. New webrev @ > > http://cr.openjdk.java.net/~enevill/8131483/webrev.01 + assert(Rs != Rn, "unpredicatable instruction"); \ "unpredicatable"? "unpredictable," surely? :-) The fix is ok with that spelling change. Thanks, Andrew. From edward.nevill at gmail.com Thu Jul 16 14:23:24 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Thu, 16 Jul 2015 15:23:24 +0100 Subject: [aarch64-port-dev ] RFR: 8131483 : aarch64: illegal stlxr instructions In-Reply-To: <55A7B8A2.3020402@redhat.com> References: <1437038730.31596.26.camel@mylittlepony.linaroharston> <55A77E31.5070003@redhat.com> <1437054775.18306.4.camel@mylittlepony.linaroharston> <55A7B8A2.3020402@redhat.com> Message-ID: <1437056604.18306.7.camel@mylittlepony.linaroharston> On Thu, 2015-07-16 at 14:58 +0100, Andrew Haley wrote: > On 07/16/2015 02:52 PM, Edward Nevill wrote: > > On Thu, 2015-07-16 at 10:49 +0100, Andrew Haley wrote: > >> On 16/07/15 10:25, Edward Nevill wrote: > >>> > + assert(Rs != Rn, "unpredicatable instruction"); \ > > "unpredicatable"? "unpredictable," surely? :-) Oh, what a predictament! 
New webrev

http://cr.openjdk.java.net/~enevill/8131483/webrev.02

No need to respond if you are happy with this, but I do need a formal
*R*eviewer please,

Thanks,
Ed.

From koutheir at gmail.com  Thu Jul 16 14:28:43 2015
From: koutheir at gmail.com (Koutheir Attouchi)
Date: Thu, 16 Jul 2015 16:28:43 +0200
Subject: Ahead-Of-Time (AOT) compiler in Hotspot JVM
Message-ID:

Hi,

As far as I know, the Hotspot JVM does not have Ahead-Of-Time (AOT) compiler support. Are there any plans to implement this feature?

Thank you.

--
Koutheir Attouchi.

From vladimir.kozlov at oracle.com  Thu Jul 16 14:55:20 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 16 Jul 2015 07:55:20 -0700
Subject: RFR: 8131483 : aarch64: illegal stlxr instructions
In-Reply-To: <1437056604.18306.7.camel@mylittlepony.linaroharston>
References: <1437038730.31596.26.camel@mylittlepony.linaroharston> <55A77E31.5070003@redhat.com> <1437054775.18306.4.camel@mylittlepony.linaroharston> <55A7B8A2.3020402@redhat.com> <1437056604.18306.7.camel@mylittlepony.linaroharston>
Message-ID: <55A7C5D8.5090907@oracle.com>

Okay.

Thanks,
Vladimir

On 7/16/15 7:23 AM, Edward Nevill wrote:
> On Thu, 2015-07-16 at 14:58 +0100, Andrew Haley wrote:
>> On 07/16/2015 02:52 PM, Edward Nevill wrote:
>>> On Thu, 2015-07-16 at 10:49 +0100, Andrew Haley wrote:
>>>> On 16/07/15 10:25, Edward Nevill wrote:
>>>>>
>> + assert(Rs != Rn, "unpredicatable instruction"); \
>>
>> "unpredicatable"? "unpredictable," surely? :-)
>
> Oh, what a predictament!
>
> New webrev
>
> http://cr.openjdk.java.net/~enevill/8131483/webrev.02
>
> No need to respond if you are happy with this, but I do need a formal
> *R*eviewer please,
>
> Thanks,
> Ed.
>

From edward.nevill at gmail.com  Thu Jul 16 15:13:14 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Thu, 16 Jul 2015 16:13:14 +0100
Subject: [aarch64-port-dev ] RFR: 8131362: aarch64: C2 does not handle large stack offsets
In-Reply-To: <55A77D38.7090106@redhat.com>
References: <1437036374.31596.18.camel@mylittlepony.linaroharston> <55A77D38.7090106@redhat.com>
Message-ID: <1437059594.18306.20.camel@mylittlepony.linaroharston>

On Thu, 2015-07-16 at 10:45 +0100, Andrew Haley wrote:
> On 16/07/15 09:46, Edward Nevill wrote:
> > Hi,
> >
> >
> > Provides support for large spill offsets in C2 on aarch64.
> A few minor things.
>
> ~((1<<12)-1) is just -1<<12

Fixed.

> I don't like the way that spill_address silently clobbers rscratch2
> and callers of spill_address clobber rscratch1. This makes me rather
> nervous. We have had recent bugs which were caused by macros assuming
> they had exclusive use of scratch registers. Please consider passing
> a destination register down to spill_address and spill_copy128. I
> think if you do that the register usage will be much clearer to the
> reader.

OK. So what I have done is changed the declaration to have a 'tmp' Register which defaults to rscratch2 as follows:-

Address spill_address(int size, int offset, Register tmp=rscratch2);

That way people can see from the header that it needs a tmp which defaults to rscratch2.

Similarly for spill_copy128 we now have

void spill_copy128(int src_offset, int dst_offset,
                   Register tmp1=rscratch1, Register tmp2=rscratch2)

Is this OK? Or do you want to force people to name the tmp registers on every call?

New webrev.

http://cr.openjdk.java.net/~enevill/8131362/webrev.02

Regards,
Ed.
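A toy model of the API shape Ed describes may make the trade-off clearer (illustrative names only, not HotSpot code; the real declarations live in the aarch64 macro assembler, and this model merely records which scratch register would be clobbered):

```cpp
#include <cstdio>

// Toy model: the scratch register is a defaulted parameter, so existing
// call sites keep working while the clobber is visible in the signature
// and can be overridden where rscratch2 is already live.
struct Register { const char* name; };
static Register rscratch1{"rscratch1"}, rscratch2{"rscratch2"};

struct Address { Register base; int offset; };

Address spill_address(int size, int offset, Register tmp = rscratch2) {
  // A large or unaligned offset would be materialized into 'tmp' here;
  // this model only reports which register that would be.
  std::printf("spill size %d, offset %d, tmp %s\n", size, offset, tmp.name);
  return Address{tmp, offset};
}

int main() {
  spill_address(16, 1 << 16);             // default: clobbers rscratch2
  spill_address(16, 1 << 16, rscratch1);  // explicit: clobber named at the call site
  return 0;
}
```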
From aph at redhat.com  Thu Jul 16 15:46:30 2015
From: aph at redhat.com (Andrew Haley)
Date: Thu, 16 Jul 2015 16:46:30 +0100
Subject: [aarch64-port-dev ] RFR: 8131362: aarch64: C2 does not handle large stack offsets
In-Reply-To: <1437059594.18306.20.camel@mylittlepony.linaroharston>
References: <1437036374.31596.18.camel@mylittlepony.linaroharston> <55A77D38.7090106@redhat.com> <1437059594.18306.20.camel@mylittlepony.linaroharston>
Message-ID: <55A7D1D6.60508@redhat.com>

On 07/16/2015 04:13 PM, Edward Nevill wrote:
> OK. So what I have done is changed the declaration to have a 'tmp' Register which defaults to rscratch2 as follows:-
>
> Address spill_address(int size, int offset, Register tmp=rscratch2);
>
> That way people can see from the header that it needs a tmp which
> defaults to rscratch2.
>
> Similarly for spill_copy128 we now have
>
> void spill_copy128(int src_offset, int dst_offset,
>                    Register tmp1=rscratch1, Register tmp2=rscratch2)
>
> Is this OK? Or do you want to force people to name the tmp registers
> on every call?

OK, I can live with that.

Andrew.

From vladimir.kozlov at oracle.com  Thu Jul 16 18:35:28 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 16 Jul 2015 11:35:28 -0700
Subject: RFR(XS): 8131676: Fix warning 'negative int converted to unsigned' after 8085932.
In-Reply-To: <4295855A5C1DE049A61835A1887419CC2D00691B@DEWDFEMB12A.global.corp.sap>
References: <4295855A5C1DE049A61835A1887419CC2D00691B@DEWDFEMB12A.global.corp.sap>
Message-ID: <55A7F970.3020303@oracle.com>

Hi Goetz,

Looks good.

Do you see also next problem?:

hotspot/src/cpu/x86/vm/vm_version_x86.hpp:485: error: integer constant is too large for 'long' type
hotspot/src/cpu/x86/vm/vm_version_x86.hpp:705: error: integer constant is too large for 'long' type

Can you fix and test it too? Use CONST64:

#define CPU_AVX512VL CONST64(0x100000000)

On 7/16/15 5:25 AM, Lindenmaier, Goetz wrote:
> Hi,
>
> A new warning kills the build with gcc 4.2.
>
> Could I please get a review and a sponsor for this tiny change?
>
> http://cr.openjdk.java.net/~goetz/webrevs/8131676-negUInt/webrev.01/
>
> Best regards,
>
> Goetz.
>

From vladimir.kozlov at oracle.com  Thu Jul 16 18:49:05 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 16 Jul 2015 11:49:05 -0700
Subject: RFR: 8131362: aarch64: C2 does not handle large stack offsets
In-Reply-To: <1437036374.31596.18.camel@mylittlepony.linaroharston>
References: <1437036374.31596.18.camel@mylittlepony.linaroharston>
Message-ID: <55A7FCA1.6010008@oracle.com>

Hi Ed,

Should it be +8 instead of +4? Or these offsets are not in bytes?:

+ unspill(rscratch1, true, src_offset);
+ spill(rscratch1, true, dst_offset);
+ unspill(rscratch1, true, src_offset+4);
+ spill(rscratch1, true, dst_offset+4);

The size of each move is 8 bytes since you specified is64 = true.

> Hotspot: passed: 876; failed: 3; error: 7
> Langtools: Test results: passed: 3,246; error: 2

Can you add -ignore:quiet to jtreg commands so that tests which are marked @ignore are not treated as errors:
http://openjdk.java.net/jtreg/command-help.html

Thanks,
Vladimir

On 7/16/15 1:46 AM, Edward Nevill wrote:
> Hi,
>
> http://cr.openjdk.java.net/~enevill/8131362/webrev.01
>
> Provides support for large spill offsets in C2 on aarch64.
>
> In general the stack offset is limited to (1<<12) * sizeof(type). This is generally sufficient.
>
> However for 128 bit vectors the limit can be as little as +256 bytes.
This is because 128 bit vectors may not be 128 bit aligned, therefore they have to use a different form of load/store which only has a 9 bit signed offset instead of the 12 bit unsigned scaled offset. > > This webrev fixes this by allowing stack offsets up to 1<<24 in all cases. > > Tested before and after with jtreg hotspot & langtools. In both cases (before and after) the results were:- > > Hotspot: passed: 876; failed: 3; error: 7 > Langtools: Test results: passed: 3,246; error: 2 > > I have also tested to ensure that code sequence for large offsets is correct by artificially reducing the limit at which the large code sequence is triggered. > > The spill calculation is now done in spill_address in macroAssembler_aarch64.cpp and this is called in all cases. > > Address MacroAssembler::spill_address(int size, int offset) > { > assert(offset >= 0, "spill to negative address?"); > // Offset reachable ? > // Not aligned - 9 bits signed offset > // Aligned - 12 bits unsigned offset shifted > Register base = sp; > if ((offset & (size-1)) && offset >= (1<<8)) { > add(rscratch2, base, offset & ((1<<12)-1)); > base = rscratch2; > offset &= ~((1<<12)-1); > } > > if (offset >= (1<<12) * size) { > add(rscratch2, base, offset & (((1<<12)-1)<<12)); > base = rscratch2; > offset &= ~(((1<<12)-1)<<12); > } > > return Address(base, offset); > } > > This can generate up to two additional instructions in the most degenerate cases (an unaligned offset larger than (1<<12) * size). > > Thanks for the review, > Ed. > > From edward.nevill at gmail.com Fri Jul 17 08:52:39 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Fri, 17 Jul 2015 09:52:39 +0100 Subject: RFR: 8131362: aarch64: C2 does not handle large stack offsets In-Reply-To: <55A7FCA1.6010008@oracle.com> References: <1437036374.31596.18.camel@mylittlepony.linaroharston> <55A7FCA1.6010008@oracle.com> Message-ID: <1437123159.29276.16.camel@mint> On Thu, 2015-07-16 at 11:49 -0700, Vladimir Kozlov wrote: > Hi Ed, > > Should it be +8 instead of +4? Or these offsets are not in bytes?: > > + unspill(rscratch1, true, src_offset); > + spill(rscratch1, true, dst_offset); > + unspill(rscratch1, true, src_offset+4); > + spill(rscratch1, true, dst_offset+4); Ouch! Good catch. New webrev. http://cr.openjdk.java.net/~enevill/8131362/webrev.03/ > > Hotspot: passed: 876; failed: 3; error: 7 > > Langtools: Test results: passed: 3,246; error: 2 > > Can you add -ignore:quiet to jtreg commands so that tests which are > marked @ignore are not treated as error: Yes. I am using the -ignore:quiet option. Here is the command I am using for the hotspot run. 
/home/ed/images/jdk-spill2/bin/java -jar lib/jtreg.jar -nr -conc:48 -timeout:3 -othervm -jdk:/home/ed/images/jdk-spill2 -v1 -a -ignore:quiet /home/ed/jdk9-dev/hs-comp/hotspot/test The hotspot failures and errors are FAILED: compiler/intrinsics/classcast/NullCheckDroppingsTest.java FAILED: compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnSupportedCPU.java FAILED: serviceability/sa/jmap-hashcode/Test8028623.java Error: native_sanity/JniVersion.java Error: runtime/classFileParserBug/AnnotationTag.java Error: runtime/handlerInTry/LoadHandlerInTry.java Error: runtime/jni/8033445/DefaultMethods.java Error: runtime/jni/8025979/UninitializedStrings.java Error: runtime/jni/ToStringInInterfaceTest/ToStringTest.java Error: runtime/stackMapCheck/StackMapCheck.java and the langtools errors Error: tools/javac/annotations/typeAnnotations/classfile/T8010015.java Error: tools/javac/lambda/LambdaParserTest.java In both cases the set of errors/failures is the same before and after the patch. So yes, it is not ideal that we are seeing these. The only ideal number for a regression suite is 0. However it is a separate issue and is on my list of things to look at. All the best, Ed. From goetz.lindenmaier at sap.com Fri Jul 17 08:54:47 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 17 Jul 2015 08:54:47 +0000 Subject: RFR(XS): 8131676: Fix warning 'negative int converted to unsigned' after 8085932. In-Reply-To: <55A7F970.3020303@oracle.com> References: <4295855A5C1DE049A61835A1887419CC2D00691B@DEWDFEMB12A.global.corp.sap> <55A7F970.3020303@oracle.com> Message-ID: <4295855A5C1DE049A61835A1887419CC2D006BD9@DEWDFEMB12A.global.corp.sap> Hi Vladimir, Thanks for the review! I can't reproduce the problem in vm_version_x86.hpp. I tried on a row of 32 and 64 bit machines with different compilers ... I fixed it anyways: http://cr.openjdk.java.net/~goetz/webrevs/8131676-negUInt/webrev.02/ Best regards, Goetz -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Donnerstag, 16. Juli 2015 20:35 To: Lindenmaier, Goetz; 'hotspot-compiler-dev at openjdk.java.net compiler' Subject: Re: RFR(XS): 8131676: Fix warning 'negative int converted to unsigned' after 8085932. Hi Goetz, Looks good. Do you see also next problem?: hotspot/src/cpu/x86/vm/vm_version_x86.hpp:485: error: integer constant is too large for 'long' type hotspot/src/cpu/x86/vm/vm_version_x86.hpp:705: error: integer constant is too large for 'long' type Can you fix and test it too? Use CONST64: #define CPU_AVX512VL CONST64(0x100000000) On 7/16/15 5:25 AM, Lindenmaier, Goetz wrote: > Hi, > > A new warning kills the build with gcc 4.2. > > Could I please get a review and a sponsor for this tiny change? > > http://cr.openjdk.java.net/~goetz/webrevs/8131676-negUInt/webrev.01/ > > Best regards, > > Goetz. > From aph at redhat.com Fri Jul 17 09:01:17 2015 From: aph at redhat.com (Andrew Haley) Date: Fri, 17 Jul 2015 10:01:17 +0100 Subject: RFR: 8131362: aarch64: C2 does not handle large stack offsets In-Reply-To: <1437123159.29276.16.camel@mint> References: <1437036374.31596.18.camel@mylittlepony.linaroharston> <55A7FCA1.6010008@oracle.com> <1437123159.29276.16.camel@mint> Message-ID: <55A8C45D.2070009@redhat.com> On 17/07/15 09:52, Edward Nevill wrote: >> > Should it be +8 instead of +4? 
Or these offsets are not in bytes?: >> > >> > + unspill(rscratch1, true, src_offset); >> > + spill(rscratch1, true, dst_offset); >> > + unspill(rscratch1, true, src_offset+4); >> > + spill(rscratch1, true, dst_offset+4); > Ouch! Good catch. > > New webrev. > > http://cr.openjdk.java.net/~enevill/8131362/webrev.03/ I'm a bit more concerned that this did not fail in testing. I guess there were no tests at all for stack-stack spills. Andrew. From edward.nevill at gmail.com Fri Jul 17 09:12:39 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Fri, 17 Jul 2015 10:12:39 +0100 Subject: RFR: 8131362: aarch64: C2 does not handle large stack offsets In-Reply-To: <55A8C45D.2070009@redhat.com> References: <1437036374.31596.18.camel@mylittlepony.linaroharston> <55A7FCA1.6010008@oracle.com> <1437123159.29276.16.camel@mint> <55A8C45D.2070009@redhat.com> Message-ID: <1437124359.29276.18.camel@mint> On Fri, 2015-07-17 at 10:01 +0100, Andrew Haley wrote: > On 17/07/15 09:52, Edward Nevill wrote: > >> > Should it be +8 instead of +4? Or these offsets are not in bytes?: > >> > > >> > + unspill(rscratch1, true, src_offset); > >> > + spill(rscratch1, true, dst_offset); > >> > + unspill(rscratch1, true, src_offset+4); > >> > + spill(rscratch1, true, dst_offset+4); > > Ouch! Good catch. > > > > New webrev. > > > > http://cr.openjdk.java.net/~enevill/8131362/webrev.03/ > > I'm a bit more concerned that this did not fail in testing. I guess > there were no tests at all for stack-stack spills. Correct. And it would have to be a 128 bit vector stack-stack spill with an offset >= 512. How would you even provoke such a thing. Ed. From aph at redhat.com Fri Jul 17 09:29:25 2015 From: aph at redhat.com (Andrew Haley) Date: Fri, 17 Jul 2015 10:29:25 +0100 Subject: RFR: 8131362: aarch64: C2 does not handle large stack offsets In-Reply-To: <1437124359.29276.18.camel@mint> References: <1437036374.31596.18.camel@mylittlepony.linaroharston> <55A7FCA1.6010008@oracle.com> <1437123159.29276.16.camel@mint> <55A8C45D.2070009@redhat.com> <1437124359.29276.18.camel@mint> Message-ID: <55A8CAF5.5060601@redhat.com> On 17/07/15 10:12, Edward Nevill wrote: > On Fri, 2015-07-17 at 10:01 +0100, Andrew Haley wrote: >> On 17/07/15 09:52, Edward Nevill wrote: >>>>> Should it be +8 instead of +4? Or these offsets are not in bytes?: >>>>> >>>>> + unspill(rscratch1, true, src_offset); >>>>> + spill(rscratch1, true, dst_offset); >>>>> + unspill(rscratch1, true, src_offset+4); >>>>> + spill(rscratch1, true, dst_offset+4); >>> Ouch! Good catch. >>> >>> New webrev. >>> >>> http://cr.openjdk.java.net/~enevill/8131362/webrev.03/ >> >> I'm a bit more concerned that this did not fail in testing. I guess >> there were no tests at all for stack-stack spills. > > Correct. And it would have to be a 128 bit vector stack-stack spill with > an offset >= 512. How would you even provoke such a thing. With a highly-vectorizable test case with a zillion temporaries, I guess. But I don't know why HotSpot would ever do stack-stack spills. The very idea of stack-stack spilling makes no sense to me. Andrew. 
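For what it's worth, the kind of test case Andrew describes might look like the sketch below. This is hypothetical code, not from any webrev in this thread: many independent superword-friendly streams so that, after unrolling, more 128-bit vector temporaries are live than the 32 AArch64 SIMD registers. Whether a particular build actually emits a vector stack-to-stack spill from it is not guaranteed.

```java
// Hypothetical stress test: each statement in the loop body vectorizes
// independently, so vector register pressure grows with the number of
// streams times the unroll factor.
public class VectorSpillStress {
    static final int N = 1 << 16;
    static final int[] b = new int[N];
    static final int[] a0 = new int[N], a1 = new int[N], a2 = new int[N],
                       a3 = new int[N], a4 = new int[N], a5 = new int[N],
                       a6 = new int[N], a7 = new int[N], a8 = new int[N],
                       a9 = new int[N], a10 = new int[N], a11 = new int[N];

    static void kernel() {
        for (int i = 0; i < N; i++) {
            a0[i] += b[i]; a1[i] += b[i]; a2[i]  += b[i]; a3[i]  += b[i];
            a4[i] += b[i]; a5[i] += b[i]; a6[i]  += b[i]; a7[i]  += b[i];
            a8[i] += b[i]; a9[i] += b[i]; a10[i] += b[i]; a11[i] += b[i];
        }
    }

    public static void main(String[] args) {
        for (int iter = 0; iter < 20_000; iter++) {
            kernel();  // enough iterations for C2 to compile and OSR
        }
        System.out.println(a0[0] + a11[N - 1]);
    }
}
```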
From aph at redhat.com  Fri Jul 17 09:43:24 2015
From: aph at redhat.com (Andrew Haley)
Date: Fri, 17 Jul 2015 10:43:24 +0100
Subject: [aarch64-port-dev ] RFR: 8131362: aarch64: C2 does not handle large stack offsets
In-Reply-To: <55A8CAF5.5060601@redhat.com>
References: <1437036374.31596.18.camel@mylittlepony.linaroharston> <55A7FCA1.6010008@oracle.com> <1437123159.29276.16.camel@mint> <55A8C45D.2070009@redhat.com> <1437124359.29276.18.camel@mint> <55A8CAF5.5060601@redhat.com>
Message-ID: <55A8CE3C.1050203@redhat.com>

On 17/07/15 10:29, Andrew Haley wrote:
> On 17/07/15 10:12, Edward Nevill wrote:
>> On Fri, 2015-07-17 at 10:01 +0100, Andrew Haley wrote:
>>> On 17/07/15 09:52, Edward Nevill wrote:
>>>>>> Should it be +8 instead of +4? Or these offsets are not in bytes?:
>>>>>>
>>>>>> + unspill(rscratch1, true, src_offset);
>>>>>> + spill(rscratch1, true, dst_offset);
>>>>>> + unspill(rscratch1, true, src_offset+4);
>>>>>> + spill(rscratch1, true, dst_offset+4);
>>>> Ouch! Good catch.
>>>>
>>>> New webrev.
>>>>
>>>> http://cr.openjdk.java.net/~enevill/8131362/webrev.03/
>>>
>>> I'm a bit more concerned that this did not fail in testing. I guess
>>> there were no tests at all for stack-stack spills.
>>
>> Correct. And it would have to be a 128 bit vector stack-stack spill with
>> an offset >= 512. How would you even provoke such a thing.
>
> With a highly-vectorizable test case with a zillion temporaries, I guess.

Thinking some more: I think I'd add some special code to test it all once, then delete the special code. If that's what it takes, there isn't much choice.

Andrew.

From aleksey.shipilev at oracle.com  Fri Jul 17 13:29:32 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Fri, 17 Jul 2015 16:29:32 +0300
Subject: RFR (S) 8131682: C1 should use multibyte nops everywhere
Message-ID: <55A9033C.2030302@oracle.com>

Hi there,

C1 is not very good at inlining and intrinsifying methods, and hence the call performance is important there. One nit that we can see in the generated code on x86 is that C1 uses single-byte nops, even for long nop strides. This improvement fixes that:
https://bugs.openjdk.java.net/browse/JDK-8131682
http://cr.openjdk.java.net/~shade/8131682/webrev.00/

Testing:
- JPRT -testset hotspot on open platforms
- eyeballing the generated assembly with -XX:TieredStopAtLevel=1

(I understand the symmetric change is going to be needed in closed parts, but let's polish the open part first).

Thanks,
-Aleksey

From vladimir.kozlov at oracle.com  Fri Jul 17 15:21:04 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 17 Jul 2015 08:21:04 -0700
Subject: RFR: 8131362: aarch64: C2 does not handle large stack offsets
In-Reply-To: <1437123159.29276.16.camel@mint>
References: <1437036374.31596.18.camel@mylittlepony.linaroharston> <55A7FCA1.6010008@oracle.com> <1437123159.29276.16.camel@mint>
Message-ID: <55A91D60.5090107@oracle.com>

Looks good. So it is a real failure in testing. Thank you for letting us know.

Vladimir

On 7/17/15 1:52 AM, Edward Nevill wrote:
> On Thu, 2015-07-16 at 11:49 -0700, Vladimir Kozlov wrote:
>> Hi Ed,
>>
>> Should it be +8 instead of +4? Or these offsets are not in bytes?:
>>
>> + unspill(rscratch1, true, src_offset);
>> + spill(rscratch1, true, dst_offset);
>> + unspill(rscratch1, true, src_offset+4);
>> + spill(rscratch1, true, dst_offset+4);
>
> Ouch!
Good catch. > > New webrev. > > http://cr.openjdk.java.net/~enevill/8131362/webrev.03/ > >> > Hotspot: passed: 876; failed: 3; error: 7 >> > Langtools: Test results: passed: 3,246; error: 2 >> >> Can you add -ignore:quiet to jtreg commands so that tests which are >> marked @ignore are not treated as error: > > Yes. I am using the -ignore:quiet option. Here is the command I am using for the hotspot run. > > /home/ed/images/jdk-spill2/bin/java -jar lib/jtreg.jar -nr -conc:48 -timeout:3 -othervm -jdk:/home/ed/images/jdk-spill2 -v1 -a -ignore:quiet /home/ed/jdk9-dev/hs-comp/hotspot/test > > The hotspot failures and errors are > > FAILED: compiler/intrinsics/classcast/NullCheckDroppingsTest.java > FAILED: compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnSupportedCPU.java > FAILED: serviceability/sa/jmap-hashcode/Test8028623.java > Error: native_sanity/JniVersion.java > Error: runtime/classFileParserBug/AnnotationTag.java > Error: runtime/handlerInTry/LoadHandlerInTry.java > Error: runtime/jni/8033445/DefaultMethods.java > Error: runtime/jni/8025979/UninitializedStrings.java > Error: runtime/jni/ToStringInInterfaceTest/ToStringTest.java > Error: runtime/stackMapCheck/StackMapCheck.java > > and the langtools errors > > Error: tools/javac/annotations/typeAnnotations/classfile/T8010015.java > Error: tools/javac/lambda/LambdaParserTest.java > > In both cases the set of errors/failures is the same before and after the patch. > > So yes, it is not ideal that we are seeing these. The only ideal number for a regression suite is 0. However it is a separate issue and is on my list of things to look at. > > All the best, > Ed. > > From vladimir.kozlov at oracle.com Fri Jul 17 15:23:59 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 17 Jul 2015 08:23:59 -0700 Subject: RFR(XS): 8131676: Fix warning 'negative int converted to unsigned' after 8085932. In-Reply-To: <4295855A5C1DE049A61835A1887419CC2D006BD9@DEWDFEMB12A.global.corp.sap> References: <4295855A5C1DE049A61835A1887419CC2D00691B@DEWDFEMB12A.global.corp.sap> <55A7F970.3020303@oracle.com> <4295855A5C1DE049A61835A1887419CC2D006BD9@DEWDFEMB12A.global.corp.sap> Message-ID: <55A91E0F.2030303@oracle.com> Looks good. Thank you for fixing second issue. I will push it. Vladimir On 7/17/15 1:54 AM, Lindenmaier, Goetz wrote: > Hi Vladimir, > > Thanks for the review! > > I can't reproduce the problem in vm_version_x86.hpp. I tried on a row of > 32 and 64 bit machines with different compilers ... > I fixed it anyways: > http://cr.openjdk.java.net/~goetz/webrevs/8131676-negUInt/webrev.02/ > > Best regards, > Goetz > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Donnerstag, 16. Juli 2015 20:35 > To: Lindenmaier, Goetz; 'hotspot-compiler-dev at openjdk.java.net compiler' > Subject: Re: RFR(XS): 8131676: Fix warning 'negative int converted to unsigned' after 8085932. > > Hi Goetz, > > Looks good. > > Do you see also next problem?: > > hotspot/src/cpu/x86/vm/vm_version_x86.hpp:485: error: integer constant > is too large for 'long' type > hotspot/src/cpu/x86/vm/vm_version_x86.hpp:705: error: integer constant > is too large for 'long' type > > Can you fix and test it too? Use CONST64: > > #define CPU_AVX512VL CONST64(0x100000000) > > > On 7/16/15 5:25 AM, Lindenmaier, Goetz wrote: >> Hi, >> >> A new warning kills the build with gcc 4.2. >> >> Could I please get a review and a sponsor for this tiny change? 
>> >> http://cr.openjdk.java.net/~goetz/webrevs/8131676-negUInt/webrev.01/ >> >> Best regards, >> >> Goetz. >> From goetz.lindenmaier at sap.com Fri Jul 17 15:24:40 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 17 Jul 2015 15:24:40 +0000 Subject: RFR(XS): 8131676: Fix warning 'negative int converted to unsigned' after 8085932. In-Reply-To: <55A91E0F.2030303@oracle.com> References: <4295855A5C1DE049A61835A1887419CC2D00691B@DEWDFEMB12A.global.corp.sap> <55A7F970.3020303@oracle.com> <4295855A5C1DE049A61835A1887419CC2D006BD9@DEWDFEMB12A.global.corp.sap> <55A91E0F.2030303@oracle.com> Message-ID: <4295855A5C1DE049A61835A1887419CC2D006E52@DEWDFEMB12A.global.corp.sap> Thanks! Best regards, Goetz. -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Freitag, 17. Juli 2015 17:24 To: Lindenmaier, Goetz; 'hotspot-compiler-dev at openjdk.java.net compiler' Subject: Re: RFR(XS): 8131676: Fix warning 'negative int converted to unsigned' after 8085932. Looks good. Thank you for fixing second issue. I will push it. Vladimir On 7/17/15 1:54 AM, Lindenmaier, Goetz wrote: > Hi Vladimir, > > Thanks for the review! > > I can't reproduce the problem in vm_version_x86.hpp. I tried on a row of > 32 and 64 bit machines with different compilers ... > I fixed it anyways: > http://cr.openjdk.java.net/~goetz/webrevs/8131676-negUInt/webrev.02/ > > Best regards, > Goetz > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Donnerstag, 16. Juli 2015 20:35 > To: Lindenmaier, Goetz; 'hotspot-compiler-dev at openjdk.java.net compiler' > Subject: Re: RFR(XS): 8131676: Fix warning 'negative int converted to unsigned' after 8085932. > > Hi Goetz, > > Looks good. > > Do you see also next problem?: > > hotspot/src/cpu/x86/vm/vm_version_x86.hpp:485: error: integer constant > is too large for 'long' type > hotspot/src/cpu/x86/vm/vm_version_x86.hpp:705: error: integer constant > is too large for 'long' type > > Can you fix and test it too? Use CONST64: > > #define CPU_AVX512VL CONST64(0x100000000) > > > On 7/16/15 5:25 AM, Lindenmaier, Goetz wrote: >> Hi, >> >> A new warning kills the build with gcc 4.2. >> >> Could I please get a review and a sponsor for this tiny change? >> >> http://cr.openjdk.java.net/~goetz/webrevs/8131676-negUInt/webrev.01/ >> >> Best regards, >> >> Goetz. >> From john.r.rose at oracle.com Fri Jul 17 19:31:04 2015 From: john.r.rose at oracle.com (John Rose) Date: Fri, 17 Jul 2015 12:31:04 -0700 Subject: Array accesses using sun.misc.Unsafe cause data corruption or SIGSEGV In-Reply-To: References: Message-ID: <46530DA3-EF02-434A-9642-7046CF92729B@oracle.com> Thanks Serkan and Martijn for reporting and analyzing this. We had a very similar bug reported internally, and we just integrated a fix: http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/3816de51b5e7 Would you mind checking if it fixes your problem also? Best wishes, ? John On Jul 12, 2015, at 5:07 AM, Serkan ?zal wrote: > > Hi Martjin, > > Thanks for your interest and comment for making this thread a little bit more hot. 
> > > From my previous message (http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-June/018221.html ): > > I added some additional logs to "vm/c1/c1_Canonicalizer.cpp": > > > void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) { > if (OptimizeUnsafes) do_UnsafeRawOp(x); > tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id %d, index = id %d, log2_scale = %d", > x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); > } > > > void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) { > if (OptimizeUnsafes) do_UnsafeRawOp(x); > tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id %d, index = id %d, log2_scale = %d", > x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); > } > > > > So I run the test by calculating address as: > - "int * long" (int is index and long is 8l) > - "long * long" (the first long is index and the second long is 8l) > - "int * int" (the first int is index and the second int is 8) > > Here are the logs: > > > int * long: > > Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 27, log2_scale = 3 > Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 27, log2_scale = 3 > > long * long: > > Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3 > Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 3 > > int * int: > > Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 29, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 29, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id 15, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id 15, log2_scale = 0 > > As you can see, at the problematic runs ("int * long" and "long * long") there are two scaling. > One for "Unsafe.put" and the other one is for "Unsafe.get" and these instructions points to > same "base" and "index" instructions. This means that address is scaled one more time because there should be only one scale. > > > With this fix (or attempt since I am not %100 sure if it is perfect/optimum way or not), I prevent multiple scaling on the same index instruction. > > Also one of my previous messages (http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-July/018383.html ) shows that there are multiple scaling on the index so when it scaled multiple, anymore it shows somewhere or anywhere in the memory. 
> > On Sun, Jul 12, 2015 at 2:54 PM, Martijn Verburg > wrote: > Non reviewer here, but I'd add to the comment *why* you don't want to scale again. > > Cheers, > Martijn > > On 12 July 2015 at 11:29, Serkan ?zal > wrote: > Hi all, > > I have created a webrev for review including the patch and shared for public access from here: https://s3.amazonaws.com/jdk-8087134/webrev.00/index.html > > Regards. > > On Sat, Jul 4, 2015 at 9:06 PM, Serkan ?zal > wrote: > Hi, > > I have added some logs to show that problem is caused by double scaling of offset (index) > > Here is my updated (log messages added) reproducer code: > > > int count = 100000; > long size = count * 8L; > long baseAddress = unsafe.allocateMemory(size); > System.out.println("Start address: " + Long.toHexString(baseAddress) + > ", End address: " + Long.toHexString(baseAddress + size)); > > for (int i = 0; i < count; i++) { > long address = baseAddress + (i * 8L); > System.out.println( > "Normal: " + Long.toHexString(address) + ", " + > "If double scaled: " + Long.toHexString(baseAddress + (i * 8L * 8L))); > long expected = i; > unsafe.putLong(address, expected); > unsafe.getLong(address); > } > > > After sometime it crashes as > > > ... > Current thread (0x0000000002068800): JavaThread "main" [_thread_in_Java, id=10412, stack(0x00000000023f0000,0x00000000024f0000)] > > siginfo: ExceptionCode=0xc0000005, reading address 0x0000000059061020 > ... > ... > > > And here is output of the execution until crash: > > Start address: 58bbcfa0, End address: 58c804a0 > Normal: 58bbcfa0, If double scaled: 58bbcfa0 > Normal: 58bbcfa8, If double scaled: 58bbcfe0 > Normal: 58bbcfb0, If double scaled: 58bbd020 > ... > ... > Normal: 58c517b0, If double scaled: 59061020 > > > As seen from the logs and crash dump, double scaled version of target address (If double scaled: 59061020) is the same with the problematic address (siginfo: ExceptionCode=0xc0000005, reading address 0x0000000059061020) that causes to crash while accessing it. > > So I think, it is obvious that the crash is caused by wrong optimization of index value since index is scaled two times (for Unsafe::put and Unsafe::get) instead of only one time. Then double scaled index points to invalid memory address. > > Regards. 
> > On Sun, Jun 14, 2015 at 2:39 PM, Serkan ?zal > wrote: > Hi all, > > I had dived into the issue with JDK-HotSpot commits and > the issue arised after this commit: http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/a60a1309a03a > > Then I added some additional logs to "vm/c1/c1_Canonicalizer.cpp": > > void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) { > if (OptimizeUnsafes) do_UnsafeRawOp(x); > tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id %d, index = id %d, log2_scale = %d", > x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); > } > > void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) { > if (OptimizeUnsafes) do_UnsafeRawOp(x); > tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id %d, index = id %d, log2_scale = %d", > x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); > } > > > > So I run the test by calculating address as > - "int * long" (int is index and long is 8l) > - "long * long" (the first long is index and the second long is 8l) > - "int * int" (the first int is index and the second int is 8) > > Here are the logs: > > int * long: > Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 27, log2_scale = 3 > Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 27, log2_scale = 3 > > long * long: > Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3 > Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 3 > > int * int: > Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 29, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 29, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id 15, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id 15, log2_scale = 0 > > As you can see, at the problematic runs ("int * long" and "long * long") there are two scaling. > One for "Unsafe.put" and the other one is for "Unsafe.get" and these instructions points to > same "base" and "index" instructions. > This means that address is scaled one more time because there should be only one scale. > > > > When I debugged the non-problematic run ("int * int"), > I saw that "instr->as_ArithmeticOp();" is always returns "null" then "match_index_and_scale" method returns "false" always. > So there is no scaling. > > static bool match_index_and_scale(Instruction* instr, > Instruction** index, > int* log2_scale) { > ... > > ArithmeticOp* arith = instr->as_ArithmeticOp(); > if (arith != NULL) { > ... 
> } > > return false; > } > > > > Then I have added my fix attempt to prevent multiple scaling for Unsafe instructions points to same index instruction like this: > > void Canonicalizer::do_UnsafeRawOp(UnsafeRawOp* x) { > Instruction* base = NULL; > Instruction* index = NULL; > int log2_scale; > > if (match(x, &base, &index, &log2_scale)) { > x->set_base(base); > x->set_index(index); > // The fix attempt here > // ///////////////////////////// > if (index != NULL) { > if (index->is_pinned()) { > log2_scale = 0; > } else { > if (log2_scale != 0) { > index->pin(); > } > } > } > // ///////////////////////////// > x->set_log2_scale(log2_scale); > if (PrintUnsafeOptimization) { > tty->print_cr("Canonicalizer: UnsafeRawOp id %d: base = id %d, index = id %d, log2_scale = %d", > x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); > } > } > } > > In this fix attempt, if there is a scaling for the Unsafe instruction, I pin index instruction of that instruction > and at next calls, if the index instruction is pinned, I assummed that there is already scaling so no need to another scaling. > > After this fix, I rerun the problematic test ("int * long") and it works with these logs: > > int * long (after fix): > Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3 > Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 21: base = id 8, index = id 11, log2_scale = 3 > Canonicalizer: do_UnsafeGetRaw id 23: base = id 8, index = id 11, log2_scale = 0 > > I am not sure my fix attempt is a really fix or maybe there are better fixes. > > Regards. > > -- > > Serkan ?ZAL > > Btw, (thanks to one my colleagues), when address calculation in the loop is > converted to > long address = baseAddress + (i * 8) > test passes. Only difference is next long pointer is calculated using > integer 8 instead of long 8. > ``` > for (int i = 0; i < count; i++) { > long address = baseAddress + (i * 8); // <--- here, integer 8 instead > of long 8 > long expected = i; > unsafe.putLong(address, expected); > long actual = unsafe.getLong(address); > if (expected != actual) { > throw new AssertionError("Expected: " + expected + ", Actual: " + > actual); > } > } > ``` > On Tue, Jun 9, 2015 at 1:07 PM Mehmet Dogan > wrote: > > Hi all, > > > > While I was testing my app using java 8, I encountered the previously > > reported sun.misc.Unsafe issue. > > > > https://bugs.openjdk.java.net/browse/JDK-8076445 > > > > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017685.html > > > > Issue status says it's resolved with resolution "Cannot Reproduce". But > > unfortunately it's still reproducible using "1.8.0_60-ea-b18" and > > "1.9.0-ea-b67". 
> > > > Test is very simple: > > > > ``` > > public static void main(String[] args) throws Exception { > > Unsafe unsafe = findUnsafe(); > > // 10000 pass > > // 100000 jvm crash > > // 1000000 fail > > int count = 100000; > > long size = count * 8L; > > long baseAddress = unsafe.allocateMemory(size); > > > > try { > > for (int i = 0; i < count; i++) { > > long address = baseAddress + (i * 8L); > > > > long expected = i; > > unsafe.putLong(address, expected); > > > > long actual = unsafe.getLong(address); > > > > if (expected != actual) { > > throw new AssertionError("Expected: " + expected + ", > > Actual: " + actual); > > } > > } > > } finally { > > unsafe.freeMemory(baseAddress); > > } > > } > > ``` > > It's not failing up to version 1.8.0.31, by starting 1.8.0.40 test is > > failing constantly. > > > > - With iteration count 10000, test is passing. > > - With iteration count 100000, jvm is crashing with SIGSEGV. > > - With iteration count 1000000, test is failing with AssertionError. > > > > When one of compilation (-Xint) or inlining (-XX:-Inline) or > > on-stack-replacement (-XX:-UseOnStackReplacement) is disabled, test is not > > failing at all. > > > > I tested on platforms: > > - Centos-7/openjdk-1.8.0.45 > > - OSX/oraclejdk-1.8.0.40 > > - OSX/oraclejdk-1.8.0.45 > > - OSX/oraclejdk-1.8.0_60-ea-b18 > > - OSX/oraclejdk-1.9.0-ea-b67 > > > > Previous issue comment ( > > https://bugs.openjdk.java.net/browse/JDK-8076445?focusedCommentId=13633043#comment-13633043 ) > > says "Cannot reproduce based on the latest version". I hope that latest > > version is not mentioning to '1.8.0_60-ea-b18' or '1.9.0-ea-b67'. Because > > both are failing. > > > > I'm looking forward to hearing from you. > > > > Thanks, > > -Mehmet Dogan- > > -- > > > > @mmdogan > > > > -- > Serkan ?ZAL > Remotest Software Engineer > GSM: +90 542 680 39 18 > Twitter: @serkan_ozal > > > > -- > Serkan ?ZAL > Remotest Software Engineer > GSM: +90 542 680 39 18 > Twitter: @serkan_ozal > > > > -- > Serkan ?ZAL > Remotest Software Engineer > GSM: +90 542 680 39 18 > Twitter: @serkan_ozal > > > > > -- > Serkan ?ZAL > Remotest Software Engineer > GSM: +90 542 680 39 18 > Twitter: @serkan_ozal -------------- next part -------------- An HTML attachment was scrubbed... URL: From serkan at hazelcast.com Fri Jul 17 19:49:38 2015 From: serkan at hazelcast.com (=?UTF-8?B?U2Vya2FuIMOWemFs?=) Date: Fri, 17 Jul 2015 22:49:38 +0300 Subject: Array accesses using sun.misc.Unsafe cause data corruption or SIGSEGV In-Reply-To: <46530DA3-EF02-434A-9642-7046CF92729B@oracle.com> References: <46530DA3-EF02-434A-9642-7046CF92729B@oracle.com> Message-ID: Hi John, Yes, I have applied your fix and it works. Thanks! Since which JDK version this patch will be there? Regards. On Fri, Jul 17, 2015 at 10:31 PM, John Rose wrote: > Thanks Serkan and Martijn for reporting and analyzing this. > > We had a very similar bug reported internally, and we just integrated a > fix: > http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/3816de51b5e7 > > Would you mind checking if it fixes your problem also? > > Best wishes, > ? John > > On Jul 12, 2015, at 5:07 AM, Serkan ?zal wrote: > > > Hi Martjin, > > Thanks for your interest and comment for making this thread a little bit > more hot. 
> > > From my previous message ( > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-June/018221.html > ): > > I added some additional logs to *"vm/c1/c1_Canonicalizer.cpp"*: > > > void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) { > > if (OptimizeUnsafes) do_UnsafeRawOp(x); > > tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id %d, index = id %d, log2_scale = %d", > > x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); > > } > > > void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) { > > if (OptimizeUnsafes) do_UnsafeRawOp(x); > > tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id %d, index = id %d, log2_scale = %d", > > x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); > > } > > > > So I run the test by calculating address as: > > - *"int * long"* (int is index and long is 8l) > > - *"long * long"* (the first long is index and the second long is 8l) > > - *"int * int"* (the first int is index and the second int is 8) > > Here are the logs: > > > *int * long:* > > Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 > > Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 > > Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 > > Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 > > Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 27, log2_scale = 3 > > Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 27, log2_scale = 3 > > *long * long:* > > Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 > > Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 > > Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 > > Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 > > Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3 > > Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 3 > > *int * int:* > > Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 > > Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 > > Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 > > Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 > > Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 29, log2_scale = 0 > > Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 29, log2_scale = 0 > > Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id 15, log2_scale = 0 > > Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id 15, log2_scale = 0 > > As you can see, at the problematic runs (*"int * long"* and *"long * long"*) there are two scaling. > > One for *"Unsafe.put"* and the other one is for* "Unsafe.get"* and these instructions points to > > same *"base"* and *"index"* instructions. This means that address is scaled one more time because there should be only one scale. > > > > With this fix (or attempt since I am not %100 sure if it is perfect/optimum way or not), I prevent multiple scaling on the same index instruction. > > Also one of my previous messages (http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-July/018383.html) shows that there are multiple scaling on the index so when it scaled multiple, anymore it shows somewhere or anywhere in the memory. 
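[Editorial aside: the double-scaling effect Serkan describes can be made concrete with a small standalone sketch. This is illustrative C++, not HotSpot code; the base address and the log2_scale value are taken from the logs quoted elsewhere in this thread.]

```cpp
#include <cstdint>
#include <cstdio>

// Standalone illustration (not HotSpot code): an Unsafe raw access is
// canonicalized to base + (index << log2_scale). If the put and the get,
// which share the same index node, each fold the scale in, the effective
// address becomes base + (index << 3 << 3), i.e. base + index * 64
// instead of base + index * 8.
int main() {
  const int64_t base = 0x58bbcfa0LL;  // sample base address from the logs in this thread
  const int log2_scale = 3;           // 8-byte longs
  for (int64_t i = 0; i < 4; i++) {
    int64_t scaled_once  = base + (i << log2_scale);
    int64_t scaled_twice = base + (i << log2_scale << log2_scale);
    printf("i=%lld  once=0x%llx  twice=0x%llx\n",
           (long long)i, (long long)scaled_once, (long long)scaled_twice);
  }
  // For i=1 this prints once=0x58bbcfa8, twice=0x58bbcfe0,
  // matching the "Normal" vs. "If double scaled" addresses quoted below.
  return 0;
}
```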
> > > On Sun, Jul 12, 2015 at 2:54 PM, Martijn Verburg > wrote: > >> Non reviewer here, but I'd add to the comment *why* you don't want to >> scale again. >> >> Cheers, >> Martijn >> >> On 12 July 2015 at 11:29, Serkan ?zal wrote: >> >>> Hi all, >>> >>> I have created a webrev for review including the patch and shared for >>> public access from here: >>> https://s3.amazonaws.com/jdk-8087134/webrev.00/index.html >>> >>> Regards. >>> >>> On Sat, Jul 4, 2015 at 9:06 PM, Serkan ?zal >>> wrote: >>> >>>> Hi, >>>> >>>> I have added some logs to show that problem is caused by double scaling >>>> of offset (index) >>>> >>>> Here is my updated (log messages added) reproducer code: >>>> >>>> >>>> int count = 100000; >>>> long size = count * 8L; >>>> long baseAddress = unsafe.allocateMemory(size); >>>> System.out.println("Start address: " + Long.toHexString(baseAddress) + >>>> ", End address: " + Long.toHexString(baseAddress + >>>> size)); >>>> >>>> for (int i = 0; i < count; i++) { >>>> long address = baseAddress + (i * 8L); >>>> System.out.println( >>>> "Normal: " + Long.toHexString(address) + ", " + >>>> "If double scaled: " + Long.toHexString(baseAddress + (i * 8L * >>>> 8L))); >>>> long expected = i; >>>> unsafe.putLong(address, expected); >>>> unsafe.getLong(address); >>>> } >>>> >>>> >>>> After sometime it crashes as >>>> >>>> >>>> ... >>>> Current thread (0x0000000002068800): JavaThread "main" >>>> [_thread_in_Java, id=10412, stack(0x00000000023f0000,0x00000000024f0000)] >>>> >>>> siginfo: ExceptionCode=0xc0000005, reading address 0x0000000059061020 >>>> ... >>>> ... >>>> >>>> >>>> And here is output of the execution until crash: >>>> >>>> Start address: 58bbcfa0, End address: 58c804a0 >>>> Normal: 58bbcfa0, If double scaled: 58bbcfa0 >>>> Normal: 58bbcfa8, If double scaled: 58bbcfe0 >>>> Normal: 58bbcfb0, If double scaled: 58bbd020 >>>> ... >>>> ... >>>> Normal: 58c517b0, If double scaled: 59061020 >>>> >>>> >>>> As seen from the logs and crash dump, double scaled version of target >>>> address (*If double scaled: 59061020*) is the same with the >>>> problematic address (*siginfo: ExceptionCode=0xc0000005, reading >>>> address 0x0000000059061020*) that causes to crash while accessing it. >>>> >>>> So I think, it is obvious that the crash is caused by wrong >>>> optimization of index value since index is scaled two times (for >>>> *Unsafe::put* and *Unsafe::get*) instead of only one time. Then double >>>> scaled index points to invalid memory address. >>>> >>>> Regards. 
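[Editorial aside: the matching step referenced in this analysis, match_index_and_scale() in c1_Canonicalizer.cpp, recognizes address expressions of the form base + index * 2^k. The following is a simplified, self-contained sketch of that shape; the Expr tree and names are hypothetical and stand in for C1's Instruction graph, they are not the HotSpot sources.]

```cpp
#include <cstdint>

// Hypothetical Expr tree standing in for C1's Instruction graph.
// Mul and Shl nodes always have both children; Const uses 'value'.
struct Expr {
  enum Kind { Const, Var, Mul, Shl } kind;
  int64_t value;   // meaningful for Const
  Expr* left;
  Expr* right;
};

// Recognize (index * 2^k) or (index << k), returning k as the log2 scale.
// This mirrors the *shape* of C1's match_index_and_scale(), not its code.
static bool match_scale(Expr* e, Expr** index, int* log2_scale) {
  if (e->kind == Expr::Shl && e->right->kind == Expr::Const) {
    *index = e->left;
    *log2_scale = (int)e->right->value;
    return true;
  }
  if (e->kind == Expr::Mul && e->right->kind == Expr::Const) {
    int64_t c = e->right->value;
    if (c > 0 && (c & (c - 1)) == 0) {   // constant is a power of two
      int k = 0;
      while ((c >> k) > 1) k++;          // portable log2 for a power of two
      *index = e->left;
      *log2_scale = k;
      return true;
    }
  }
  return false;
}
```

[The bug discussed in this thread is not in the matching itself: it is that the canonicalizations of Unsafe.put and Unsafe.get each fold the recognized scale into ops that share the same index node, so the scale ends up applied twice.]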
>>>> >>>> On Sun, Jun 14, 2015 at 2:39 PM, Serkan ?zal >>>> wrote: >>>> >>>>> Hi all, >>>>> >>>>> I had dived into the issue with JDK-HotSpot commits and >>>>> the issue arised after this commit: http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/a60a1309a03a >>>>> >>>>> Then I added some additional logs to *"vm/c1/c1_Canonicalizer.cpp"*: >>>>> void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) { >>>>> if (OptimizeUnsafes) do_UnsafeRawOp(x); >>>>> tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id %d, index = id %d, log2_scale = %d", >>>>> x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); >>>>> } >>>>> >>>>> void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) { >>>>> if (OptimizeUnsafes) do_UnsafeRawOp(x); >>>>> tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id %d, index = id %d, log2_scale = %d", >>>>> x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); >>>>> } >>>>> >>>>> >>>>> So I run the test by calculating address as >>>>> - *"int * long"* (int is index and long is 8l) >>>>> - *"long * long"* (the first long is index and the second long is 8l) >>>>> - *"int * int"* (the first int is index and the second int is 8) >>>>> >>>>> Here are the logs: >>>>> *int * long:*Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 >>>>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 >>>>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 >>>>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 >>>>> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 27, log2_scale = 3 >>>>> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 27, log2_scale = 3 >>>>> *long * long:*Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 >>>>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 >>>>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 >>>>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 >>>>> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3 >>>>> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 3 >>>>> *int * int:*Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 >>>>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 >>>>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 >>>>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 >>>>> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 29, log2_scale = 0 >>>>> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 29, log2_scale = 0 >>>>> Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id 15, log2_scale = 0 >>>>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id 15, log2_scale = 0 >>>>> >>>>> As you can see, at the problematic runs (*"int * long"* and *"long * long"*) there are two scaling. >>>>> One for *"Unsafe.put"* and the other one is for* "Unsafe.get"* and these instructions points to >>>>> same *"base"* and *"index"* instructions. >>>>> This means that address is scaled one more time because there should be only one scale. 
>>>>> >>>>> >>>>> When I debugged the non-problematic run (*"int * int"*), >>>>> I saw that *"instr->as_ArithmeticOp();"* is always returns *"null" *then *"match_index_and_scale"* method returns* "false"* always. >>>>> So there is no scaling. >>>>> static bool match_index_and_scale(Instruction* instr, >>>>> Instruction** index, >>>>> int* log2_scale) { >>>>> ... >>>>> >>>>> ArithmeticOp* arith = instr->as_ArithmeticOp(); >>>>> if (arith != NULL) { >>>>> ... >>>>> } >>>>> >>>>> return false; >>>>> } >>>>> >>>>> >>>>> Then I have added my fix attempt to prevent multiple scaling for Unsafe instructions points to same index instruction like this: >>>>> void Canonicalizer::do_UnsafeRawOp(UnsafeRawOp* x) { >>>>> Instruction* base = NULL; >>>>> Instruction* index = NULL; >>>>> int log2_scale; >>>>> >>>>> if (match(x, &base, &index, &log2_scale)) { >>>>> x->set_base(base); >>>>> x->set_index(index); // The fix attempt here // ///////////////////////////// >>>>> if (index != NULL) { >>>>> if (index->is_pinned()) { >>>>> log2_scale = 0; >>>>> } else { >>>>> if (log2_scale != 0) { >>>>> index->pin(); >>>>> } >>>>> } >>>>> } // ///////////////////////////// >>>>> x->set_log2_scale(log2_scale); >>>>> if (PrintUnsafeOptimization) { >>>>> tty->print_cr("Canonicalizer: UnsafeRawOp id %d: base = id %d, index = id %d, log2_scale = %d", >>>>> x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); >>>>> } >>>>> } >>>>> } >>>>> In this fix attempt, if there is a scaling for the Unsafe instruction, I pin index instruction of that instruction >>>>> and at next calls, if the index instruction is pinned, I assummed that there is already scaling so no need to another scaling. >>>>> >>>>> After this fix, I rerun the problematic test (*"int * long"*) and it works with these logs: >>>>> *int * long (after fix):*Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 >>>>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 >>>>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 >>>>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 >>>>> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3 >>>>> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 0 >>>>> Canonicalizer: do_UnsafePutRaw id 21: base = id 8, index = id 11, log2_scale = 3 >>>>> Canonicalizer: do_UnsafeGetRaw id 23: base = id 8, index = id 11, log2_scale = 0 >>>>> >>>>> I am not sure my fix attempt is a really fix or maybe there are better fixes. >>>>> >>>>> Regards. >>>>> >>>>> -- >>>>> >>>>> Serkan ?ZAL >>>>> >>>>> >>>>>> Btw, (thanks to one my colleagues), when address calculation in the loop is >>>>>> converted to >>>>>> long address = baseAddress + (i * 8) >>>>>> test passes. Only difference is next long pointer is calculated using >>>>>> integer 8 instead of long 8. 
>>>>>> ``` >>>>>> for (int i = 0; i < count; i++) { >>>>>> long address = baseAddress + (i * 8); // <--- here, integer 8 instead >>>>>> of long 8 >>>>>> long expected = i; >>>>>> unsafe.putLong(address, expected); >>>>>> long actual = unsafe.getLong(address); >>>>>> if (expected != actual) { >>>>>> throw new AssertionError("Expected: " + expected + ", Actual: " + >>>>>> actual); >>>>>> } >>>>>> } >>>>>> ``` >>>>>> On Tue, Jun 9, 2015 at 1:07 PM Mehmet Dogan > wrote: >>>>>> >* Hi all, >>>>>> *> >>>>>> >* While I was testing my app using java 8, I encountered the previously >>>>>> *>* reported sun.misc.Unsafe issue. >>>>>> *> >>>>>> >* https://bugs.openjdk.java.net/browse/JDK-8076445 >>>>>> *> >>>>>> >* http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017685.html >>>>>> *> >>>>>> >* Issue status says it's resolved with resolution "Cannot Reproduce". But >>>>>> *>* unfortunately it's still reproducible using "1.8.0_60-ea-b18" and >>>>>> *>* "1.9.0-ea-b67". >>>>>> *> >>>>>> >* Test is very simple: >>>>>> *> >>>>>> >* ``` >>>>>> *>* public static void main(String[] args) throws Exception { >>>>>> *>* Unsafe unsafe = findUnsafe(); >>>>>> *>* // 10000 pass >>>>>> *>* // 100000 jvm crash >>>>>> *>* // 1000000 fail >>>>>> *>* int count = 100000; >>>>>> *>* long size = count * 8L; >>>>>> *>* long baseAddress = unsafe.allocateMemory(size); >>>>>> *> >>>>>> >* try { >>>>>> *>* for (int i = 0; i < count; i++) { >>>>>> *>* long address = baseAddress + (i * 8L); >>>>>> *> >>>>>> >* long expected = i; >>>>>> *>* unsafe.putLong(address, expected); >>>>>> *> >>>>>> >* long actual = unsafe.getLong(address); >>>>>> *> >>>>>> >* if (expected != actual) { >>>>>> *>* throw new AssertionError("Expected: " + expected + ", >>>>>> *>* Actual: " + actual); >>>>>> *>* } >>>>>> *>* } >>>>>> *>* } finally { >>>>>> *>* unsafe.freeMemory(baseAddress); >>>>>> *>* } >>>>>> *>* } >>>>>> *>* ``` >>>>>> *>* It's not failing up to version 1.8.0.31, by starting 1.8.0.40 test is >>>>>> *>* failing constantly. >>>>>> *> >>>>>> >* - With iteration count 10000, test is passing. >>>>>> *>* - With iteration count 100000, jvm is crashing with SIGSEGV. >>>>>> *>* - With iteration count 1000000, test is failing with AssertionError. >>>>>> *> >>>>>> >* When one of compilation (-Xint) or inlining (-XX:-Inline) or >>>>>> *>* on-stack-replacement (-XX:-UseOnStackReplacement) is disabled, test is not >>>>>> *>* failing at all. >>>>>> *> >>>>>> >* I tested on platforms: >>>>>> *>* - Centos-7/openjdk-1.8.0.45 >>>>>> *>* - OSX/oraclejdk-1.8.0.40 >>>>>> *>* - OSX/oraclejdk-1.8.0.45 >>>>>> *>* - OSX/oraclejdk-1.8.0_60-ea-b18 >>>>>> *>* - OSX/oraclejdk-1.9.0-ea-b67 >>>>>> *> >>>>>> >* Previous issue comment ( >>>>>> *>* https://bugs.openjdk.java.net/browse/JDK-8076445?focusedCommentId=13633043#comment-13633043 ) >>>>>> *>* says "Cannot reproduce based on the latest version". I hope that latest >>>>>> *>* version is not mentioning to '1.8.0_60-ea-b18' or '1.9.0-ea-b67'. Because >>>>>> *>* both are failing. >>>>>> *> >>>>>> >* I'm looking forward to hearing from you. 
>>>>>> *> >>>>>> >* Thanks, >>>>>> *>* -Mehmet Dogan- >>>>>> *>* -- >>>>>> *> >>>>>> >* @mmdogan >>>>>> *> >>>>> >>>>> >>>>> -- >>>>> Serkan ?ZAL >>>>> Remotest Software Engineer >>>>> GSM: +90 542 680 39 18 >>>>> Twitter: @serkan_ozal >>>>> >>>> >>>> >>>> >>>> -- >>>> Serkan ?ZAL >>>> Remotest Software Engineer >>>> GSM: +90 542 680 39 18 >>>> Twitter: @serkan_ozal >>>> >>> >>> >>> >>> -- >>> Serkan ?ZAL >>> Remotest Software Engineer >>> GSM: +90 542 680 39 18 >>> Twitter: @serkan_ozal >>> >> >> > > > -- > Serkan ?ZAL > Remotest Software Engineer > GSM: +90 542 680 39 18 > Twitter: @serkan_ozal > > > -- Serkan ?ZAL Remotest Software Engineer GSM: +90 542 680 39 18 Twitter: @serkan_ozal -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Fri Jul 17 20:27:31 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 17 Jul 2015 13:27:31 -0700 Subject: Array accesses using sun.misc.Unsafe cause data corruption or SIGSEGV In-Reply-To: References: <46530DA3-EF02-434A-9642-7046CF92729B@oracle.com> Message-ID: <55A96533.2050703@oracle.com> It is in released few days ago JDK 8u51: http://www.oracle.com/technetwork/java/javase/8u51-relnotes-2587590.html Regards, Vladimir On 7/17/15 12:49 PM, Serkan ?zal wrote: > Hi John, > > Yes, I have applied your fix and it works. > Thanks! > > Since which JDK version this patch will be there? > > Regards. > > On Fri, Jul 17, 2015 at 10:31 PM, John Rose > wrote: > > Thanks Serkan and Martijn for reporting and analyzing this. > > We had a very similar bug reported internally, and we just > integrated a fix: > http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/3816de51b5e7 > > Would you mind checking if it fixes your problem also? > > Best wishes, > ? John > > On Jul 12, 2015, at 5:07 AM, Serkan ?zal > wrote: >> >> Hi Martjin, >> >> Thanks for your interest and comment for making this thread a >> little bit more hot. 
>> >> >> From my previous message >> (http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-June/018221.html): >> >> I added some additional logs to *"vm/c1/c1_Canonicalizer.cpp"*: >> >> >> void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) { >> >> if (OptimizeUnsafes) do_UnsafeRawOp(x); >> >> tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id >> %d, index = id %d, log2_scale = %d", >> >> x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); >> >> } >> >> >> void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) { >> >> if (OptimizeUnsafes) do_UnsafeRawOp(x); >> >> tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id >> %d, index = id %d, log2_scale = %d", >> >> x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); >> >> } >> >> >> >> So I run the test by calculating address as: >> >> - *"int * long"* (int is index and long is 8l) >> >> - *"long * long"* (the first long is index and the second long >> is 8l) >> >> - *"int * int"* (the first int is index and the second int is 8) >> >> Here are the logs: >> >> >> *int * long:* >> >> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id >> 17, log2_scale = 0 >> >> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id >> 19, log2_scale = 0 >> >> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id >> 21, log2_scale = 0 >> >> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id >> 23, log2_scale = 0 >> >> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id >> 27, log2_scale = 3 >> >> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id >> 27, log2_scale = 3 >> >> *long * long:* >> >> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id >> 17, log2_scale = 0 >> >> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id >> 19, log2_scale = 0 >> >> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id >> 21, log2_scale = 0 >> >> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id >> 23, log2_scale = 0 >> >> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id >> 14, log2_scale = 3 >> >> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id >> 14, log2_scale = 3 >> >> *int * int:* >> >> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id >> 17, log2_scale = 0 >> >> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id >> 19, log2_scale = 0 >> >> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id >> 21, log2_scale = 0 >> >> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id >> 23, log2_scale = 0 >> >> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id >> 29, log2_scale = 0 >> >> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id >> 29, log2_scale = 0 >> >> Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id >> 15, log2_scale = 0 >> >> Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id >> 15, log2_scale = 0 >> >> As you can see, at the problematic runs (*"int * long"* and >> *"long * long"*) there are two scaling. >> >> One for *"Unsafe.put"* and the other one is for*"Unsafe.get"* >> and these instructions points to >> >> same *"base"* and *"index"* instructions. This means that >> address is scaled one more time because there should be only >> one scale. >> >> >> >> With this fix (or attempt since I am not %100 sure if it is >> perfect/optimum way or not), I prevent multiple scaling on the >> same index instruction. 
>> >> Also one of my previous messages >> (http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-July/018383.html) >> shows that there are multiple scaling on the index so when it >> scaled multiple, anymore it shows somewhere or anywhere in the memory. >> >> On Sun, Jul 12, 2015 at 2:54 PM, Martijn Verburg >> > wrote: >> >> Non reviewer here, but I'd add to the comment *why* you don't >> want to scale again. >> >> Cheers, >> Martijn >> >> On 12 July 2015 at 11:29, Serkan ?zal > > wrote: >> >> Hi all, >> >> I have created a webrev for review including the patch and >> shared for public access from here: >> https://s3.amazonaws.com/jdk-8087134/webrev.00/index.html >> >> Regards. >> >> On Sat, Jul 4, 2015 at 9:06 PM, Serkan ?zal >> > wrote: >> >> Hi, >> >> I have added some logs to show that problem is caused >> by double scaling of offset (index) >> >> Here is my updated (log messages added) reproducer code: >> >> >> int count = 100000; >> long size = count * 8L; >> long baseAddress = unsafe.allocateMemory(size); >> System.out.println("Start address: " + >> Long.toHexString(baseAddress) + >> ", End address: " + >> Long.toHexString(baseAddress + size)); >> >> for (int i = 0; i < count; i++) { >> long address = baseAddress + (i * 8L); >> System.out.println( >> "Normal: " + Long.toHexString(address) + ", " + >> "If double scaled: " + >> Long.toHexString(baseAddress + (i * 8L * 8L))); >> long expected = i; >> unsafe.putLong(address, expected); >> unsafe.getLong(address); >> } >> >> >> After sometime it crashes as >> >> >> ... >> Current thread (0x0000000002068800): JavaThread >> "main" [_thread_in_Java, id=10412, >> stack(0x00000000023f0000,0x00000000024f0000)] >> >> siginfo: ExceptionCode=0xc0000005, reading address >> 0x0000000059061020 >> ... >> ... >> >> >> And here is output of the execution until crash: >> >> Start address: 58bbcfa0, End address: 58c804a0 >> Normal: 58bbcfa0, If double scaled: 58bbcfa0 >> Normal: 58bbcfa8, If double scaled: 58bbcfe0 >> Normal: 58bbcfb0, If double scaled: 58bbd020 >> ... >> ... >> Normal: 58c517b0, If double scaled: 59061020 >> >> >> As seen from the logs and crash dump, double scaled >> version of target address (*If double scaled: >> 59061020*) is the same with the problematic address >> (*siginfo: ExceptionCode=0xc0000005, reading address >> 0x0000000059061020*) that causes to crash while >> accessing it. >> >> So I think, it is obvious that the crash is caused by >> wrong optimization of index value since index is >> scaled two times (for *Unsafe::put* and *Unsafe::get*) >> instead of only one time. Then double scaled index >> points to invalid memory address. >> >> Regards. 
>> On Sun, Jun 14, 2015 at 2:39 PM, Serkan Özal wrote:
>>
>> Hi all,
>>
>> I had dived into the issue with JDK-HotSpot commits and
>> the issue arised after this commit:
>> http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/a60a1309a03a
>>
>> Then I added some additional logs to "vm/c1/c1_Canonicalizer.cpp":
>>
>> void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) {
>>   if (OptimizeUnsafes) do_UnsafeRawOp(x);
>>   tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id %d, index = id %d, log2_scale = %d",
>>                 x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
>> }
>>
>> void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) {
>>   if (OptimizeUnsafes) do_UnsafeRawOp(x);
>>   tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id %d, index = id %d, log2_scale = %d",
>>                 x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
>> }
>>
>> So I run the test by calculating address as
>> - "int * long" (int is index and long is 8l)
>> - "long * long" (the first long is index and the second long is 8l)
>> - "int * int" (the first int is index and the second int is 8)
>>
>> Here are the logs:
>>
>> int * long:
>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 27, log2_scale = 3
>> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 27, log2_scale = 3
>>
>> long * long:
>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3
>> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 3
>>
>> int * int:
>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 29, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 29, log2_scale = 0
>> Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id 15, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id 15, log2_scale = 0
>>
>> As you can see, at the problematic runs ("int * long" and "long * long") there are two scalings:
>> one for "Unsafe.put" and the other one for "Unsafe.get", and these instructions point to the
>> same "base" and "index" instructions. This means that the address is scaled one more time,
>> because there should be only one scale.
>>
>> When I debugged the non-problematic run ("int * int"), I saw that
>> "instr->as_ArithmeticOp();" always returns "null", so the "match_index_and_scale"
>> method always returns "false". So there is no scaling.
>>
>> static bool match_index_and_scale(Instruction* instr,
>>                                   Instruction** index,
>>                                   int* log2_scale) {
>>   ...
>>   ArithmeticOp* arith = instr->as_ArithmeticOp();
>>   if (arith != NULL) {
>>     ...
>>   }
>>   return false;
>> }
>>
>> Then I have added my fix attempt to prevent multiple scaling for Unsafe instructions
>> pointing to the same index instruction, like this:
>>
>> void Canonicalizer::do_UnsafeRawOp(UnsafeRawOp* x) {
>>   Instruction* base = NULL;
>>   Instruction* index = NULL;
>>   int log2_scale;
>>
>>   if (match(x, &base, &index, &log2_scale)) {
>>     x->set_base(base);
>>     x->set_index(index);
>>     // The fix attempt here
>>     // /////////////////////////////
>>     if (index != NULL) {
>>       if (index->is_pinned()) {
>>         log2_scale = 0;
>>       } else {
>>         if (log2_scale != 0) {
>>           index->pin();
>>         }
>>       }
>>     }
>>     // /////////////////////////////
>>     x->set_log2_scale(log2_scale);
>>     if (PrintUnsafeOptimization) {
>>       tty->print_cr("Canonicalizer: UnsafeRawOp id %d: base = id %d, index = id %d, log2_scale = %d",
>>                     x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
>>     }
>>   }
>> }
>>
>> In this fix attempt, if there is a scaling for the Unsafe instruction, I pin the index
>> instruction of that instruction, and at the next calls, if the index instruction is pinned,
>> I assume that there is already scaling, so there is no need for another scaling.
>>
>> After this fix, I reran the problematic test ("int * long") and it works, with these logs:
>>
>> int * long (after fix):
>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3
>> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 0
>> Canonicalizer: do_UnsafePutRaw id 21: base = id 8, index = id 11, log2_scale = 3
>> Canonicalizer: do_UnsafeGetRaw id 23: base = id 8, index = id 11, log2_scale = 0
>>
>> I am not sure my fix attempt is a real fix; maybe there are better fixes.
>>
>> Regards.
>>
>> --
>> Serkan ÖZAL
>>
>>> Btw, (thanks to one of my colleagues), when the address calculation in the loop is
>>> converted to
>>>   long address = baseAddress + (i * 8)
>>> the test passes. The only difference is that the next long pointer is calculated
>>> using integer 8 instead of long 8.
>>> ```
>>> for (int i = 0; i < count; i++) {
>>>     long address = baseAddress + (i * 8); // <--- here, integer 8 instead of long 8
>>>     long expected = i;
>>>     unsafe.putLong(address, expected);
>>>     long actual = unsafe.getLong(address);
>>>     if (expected != actual) {
>>>         throw new AssertionError("Expected: " + expected + ", Actual: " + actual);
>>>     }
>>> }
>>> ```
>>> On Tue, Jun 9, 2015 at 1:07 PM Mehmet Dogan wrote:
>>> > Hi all,
>>> >
>>> > While I was testing my app using java 8, I encountered the previously
>>> > reported sun.misc.Unsafe issue.
>>> >
>>> > https://bugs.openjdk.java.net/browse/JDK-8076445
>>> > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017685.html
>>> >
>>> > Issue status says it's resolved with resolution "Cannot Reproduce". But
>>> > unfortunately it's still reproducible using "1.8.0_60-ea-b18" and
>>> > "1.9.0-ea-b67".
>>> >
>>> > Test is very simple:
>>> >
>>> > ```
>>> > public static void main(String[] args) throws Exception {
>>> >     Unsafe unsafe = findUnsafe();
>>> >     // 10000 pass
>>> >     // 100000 jvm crash
>>> >     // 1000000 fail
>>> >     int count = 100000;
>>> >     long size = count * 8L;
>>> >     long baseAddress = unsafe.allocateMemory(size);
>>> >
>>> >     try {
>>> >         for (int i = 0; i < count; i++) {
>>> >             long address = baseAddress + (i * 8L);
>>> >             long expected = i;
>>> >             unsafe.putLong(address, expected);
>>> >             long actual = unsafe.getLong(address);
>>> >             if (expected != actual) {
>>> >                 throw new AssertionError("Expected: " + expected + ", Actual: " + actual);
>>> >             }
>>> >         }
>>> >     } finally {
>>> >         unsafe.freeMemory(baseAddress);
>>> >     }
>>> > }
>>> > ```
>>> > It's not failing up to version 1.8.0.31; starting with 1.8.0.40 the test is
>>> > failing constantly.
>>> >
>>> > - With iteration count 10000, the test is passing.
>>> > - With iteration count 100000, the JVM is crashing with SIGSEGV.
>>> > - With iteration count 1000000, the test is failing with AssertionError.
>>> >
>>> > When one of compilation (-Xint) or inlining (-XX:-Inline) or
>>> > on-stack-replacement (-XX:-UseOnStackReplacement) is disabled, the test is
>>> > not failing at all.
>>> >
>>> > I tested on platforms:
>>> > - Centos-7/openjdk-1.8.0.45
>>> > - OSX/oraclejdk-1.8.0.40
>>> > - OSX/oraclejdk-1.8.0.45
>>> > - OSX/oraclejdk-1.8.0_60-ea-b18
>>> > - OSX/oraclejdk-1.9.0-ea-b67
>>> >
>>> > Previous issue comment (
>>> > https://bugs.openjdk.java.net/browse/JDK-8076445?focusedCommentId=13633043#comment-13633043 )
>>> > says "Cannot reproduce based on the latest version". I hope that latest
>>> > version is not referring to '1.8.0_60-ea-b18' or '1.9.0-ea-b67', because
>>> > both are failing.
>>> >
>>> > I'm looking forward to hearing from you.
>>> >
>>> > Thanks,
>>> > -Mehmet Dogan-
>>> > --
>>> > @mmdogan
>>
>> --
>> Serkan ÖZAL
>> Remotest Software Engineer
>> GSM: +90 542 680 39 18
>> Twitter: @serkan_ozal

From vlad.ureche at gmail.com  Fri Jul 17 22:47:15 2015
From: vlad.ureche at gmail.com (Vlad Ureche)
Date: Sat, 18 Jul 2015 00:47:15 +0200
Subject: [9] RFR (M): 8011858: Use Compile::live_nodes() instead of
	Compile::unique() in appropriate places
Message-ID:

Hi,

Please review the following patch for JDK-8011858. Big thanks to
Vladimir Kozlov for his patient guidance while working on this!

*Bug:* https://bugs.openjdk.java.net/browse/JDK-8011858

*Problem:* Throughout C2, local stacks are used to prevent recursive
calls from blowing up the system stack. These are sized based on the
total number of nodes in the compilation run (e.g. C->unique()).
Instead, they should be sized based on the live node count
(C->live_nodes()).

Now, with the increased difference between live_nodes (limited at
LiveNodeCountInliningCutoff, set to 40K) and unique nodes (which can go
up to 240K), it is important to not over-estimate the size of stacks.

*Solution:* This patch mirrors a patch written by Vladimir Kozlov for
JDK8u. It replaces the initial sizes from C->unique() to
C->live_nodes(), preserving any shifts (divisions) and offsets.
For example, in the compile.cpp patch:

- Node_Stack nstack(unique() >> 1);
+ Node_Stack nstack(live_nodes() >> 1);

There is an issue described at
https://bugs.openjdk.java.net/browse/JDK-8131702 where I took the
workaround from Vladimir's patch.

*Webrev:* http://cr.openjdk.java.net/~kvn/8011858/webrev/ or
http://vladureche.ro/webrev/8011858
(updated, includes a link to bug 8131702)

*Tests:* Running jtreg with the compiler, runtime and gc tests on the
dev branch shows the same status before and after the patch: 808 tests
passed, 16 failed and 6 errors. What would be a stable point where all
tests are expected to pass, so I can test the patch there? Maybe jdk9?

Thanks,
Vlad

From vladimir.kozlov at oracle.com  Sat Jul 18 01:24:06 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 17 Jul 2015 18:24:06 -0700
Subject: [9] RFR (M): 8011858: Use Compile::live_nodes() instead of
	Compile::unique() in appropriate places
In-Reply-To:
References:
Message-ID: <55A9AAB6.50505@oracle.com>

Thank you, Vlad

It looks good. We usually don't put bug id into comments. So your
previous version on cr.openjdk is fine.

A second reviewer should look at it and sponsor it, with you listed as
contributor (I see you signed OCA already).

Thanks,
Vladimir

On 7/17/15 3:47 PM, Vlad Ureche wrote:
> Hi,
>
> Please review the following patch for JDK-8011858. Big thanks to
> Vladimir Kozlov for his patient guidance while working on this!
>
> *Bug:* https://bugs.openjdk.java.net/browse/JDK-8011858
>
> *Problem:* Throughout C2, local stacks are used to prevent recursive
> calls from blowing up the system stack. These are sized based on the
> total number of nodes in the compilation run (e.g. C->unique()).
> Instead, they should be sized based on the live node count
> (C->live_nodes()).
>
> Now, with the increased difference between live_nodes (limited at
> LiveNodeCountInliningCutoff, set to 40K) and unique nodes (which can go
> up to 240K), it is important to not over-estimate the size of stacks.
>
> *Solution:* This patch mirrors a patch written by Vladimir Kozlov for
> JDK8u. It replaces the initial sizes from C->unique() to
> C->live_nodes(), preserving any shifts (divisions) and offsets. For
> example, in the compile.cpp patch:
>
> - Node_Stack nstack(unique() >> 1);
> + Node_Stack nstack(live_nodes() >> 1);
>
> There is an issue described at
> https://bugs.openjdk.java.net/browse/JDK-8131702 where I took the
> workaround from Vladimir's patch.
>
> *Webrev:* http://cr.openjdk.java.net/~kvn/8011858/webrev/ or
> http://vladureche.ro/webrev/8011858
> (updated, includes a link to bug 8131702)
>
> *Tests:* Running jtreg with the compiler, runtime and gc tests on the
> dev branch shows the same status before and after the patch: 808 tests
> passed, 16 failed and 6 errors. What would be a stable point where all
> tests are expected to pass, so I can test the patch there? Maybe jdk9?
> > Thanks, > Vlad > From martijnverburg at gmail.com Sat Jul 18 06:43:39 2015 From: martijnverburg at gmail.com (Martijn Verburg) Date: Sat, 18 Jul 2015 07:43:39 +0100 Subject: Array accesses using sun.misc.Unsafe cause data corruption or SIGSEGV In-Reply-To: <55A96533.2050703@oracle.com> References: <46530DA3-EF02-434A-9642-7046CF92729B@oracle.com> <55A96533.2050703@oracle.com> Message-ID: Fix works for me as well - thanks for following up, appreciate this was an obscure one in an officially unsupported API On Friday, 17 July 2015, Vladimir Kozlov wrote: > It is in released few days ago JDK 8u51: > > http://www.oracle.com/technetwork/java/javase/8u51-relnotes-2587590.html > > Regards, > Vladimir > > On 7/17/15 12:49 PM, Serkan ?zal wrote: > >> Hi John, >> >> Yes, I have applied your fix and it works. >> Thanks! >> >> Since which JDK version this patch will be there? >> >> Regards. >> >> On Fri, Jul 17, 2015 at 10:31 PM, John Rose > > wrote: >> >> Thanks Serkan and Martijn for reporting and analyzing this. >> >> We had a very similar bug reported internally, and we just >> integrated a fix: >> http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/3816de51b5e7 >> >> Would you mind checking if it fixes your problem also? >> >> Best wishes, >> ? John >> >> On Jul 12, 2015, at 5:07 AM, Serkan ?zal > > wrote: >> >>> >>> Hi Martjin, >>> >>> Thanks for your interest and comment for making this thread a >>> little bit more hot. >>> >>> >>> From my previous message >>> ( >>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-June/018221.html >>> ): >>> >>> I added some additional logs to *"vm/c1/c1_Canonicalizer.cpp"*: >>> >>> >>> void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) { >>> >>> if (OptimizeUnsafes) do_UnsafeRawOp(x); >>> >>> tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id >>> %d, index = id %d, log2_scale = %d", >>> >>> x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); >>> >>> } >>> >>> >>> void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) { >>> >>> if (OptimizeUnsafes) do_UnsafeRawOp(x); >>> >>> tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id >>> %d, index = id %d, log2_scale = %d", >>> >>> x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); >>> >>> } >>> >>> >>> >>> So I run the test by calculating address as: >>> >>> - *"int * long"* (int is index and long is 8l) >>> >>> - *"long * long"* (the first long is index and the second long >>> is 8l) >>> >>> - *"int * int"* (the first int is index and the second int is 8) >>> >>> Here are the logs: >>> >>> >>> *int * long:* >>> >>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id >>> 17, log2_scale = 0 >>> >>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id >>> 19, log2_scale = 0 >>> >>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id >>> 21, log2_scale = 0 >>> >>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id >>> 23, log2_scale = 0 >>> >>> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id >>> 27, log2_scale = 3 >>> >>> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id >>> 27, log2_scale = 3 >>> >>> *long * long:* >>> >>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id >>> 17, log2_scale = 0 >>> >>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id >>> 19, log2_scale = 0 >>> >>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id >>> 21, log2_scale = 0 >>> >>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id >>> 23, log2_scale = 0 >>> 
>>> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id >>> 14, log2_scale = 3 >>> >>> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id >>> 14, log2_scale = 3 >>> >>> *int * int:* >>> >>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id >>> 17, log2_scale = 0 >>> >>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id >>> 19, log2_scale = 0 >>> >>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id >>> 21, log2_scale = 0 >>> >>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id >>> 23, log2_scale = 0 >>> >>> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id >>> 29, log2_scale = 0 >>> >>> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id >>> 29, log2_scale = 0 >>> >>> Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id >>> 15, log2_scale = 0 >>> >>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id >>> 15, log2_scale = 0 >>> >>> As you can see, at the problematic runs (*"int * long"* and >>> *"long * long"*) there are two scaling. >>> >>> One for *"Unsafe.put"* and the other one is for*"Unsafe.get"* >>> and these instructions points to >>> >>> same *"base"* and *"index"* instructions. This means that >>> address is scaled one more time because there should be only >>> one scale. >>> >>> >>> >>> With this fix (or attempt since I am not %100 sure if it is >>> perfect/optimum way or not), I prevent multiple scaling on the >>> same index instruction. >>> >>> Also one of my previous messages >>> ( >>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-July/018383.html >>> ) >>> shows that there are multiple scaling on the index so when it >>> scaled multiple, anymore it shows somewhere or anywhere in the >>> memory. >>> >>> On Sun, Jul 12, 2015 at 2:54 PM, Martijn Verburg >>> > wrote: >>> >>> Non reviewer here, but I'd add to the comment *why* you don't >>> want to scale again. >>> >>> Cheers, >>> Martijn >>> >>> On 12 July 2015 at 11:29, Serkan ?zal >> > wrote: >>> >>> Hi all, >>> >>> I have created a webrev for review including the patch and >>> shared for public access from here: >>> https://s3.amazonaws.com/jdk-8087134/webrev.00/index.html >>> >>> Regards. >>> >>> On Sat, Jul 4, 2015 at 9:06 PM, Serkan ?zal >>> > wrote: >>> >>> Hi, >>> >>> I have added some logs to show that problem is caused >>> by double scaling of offset (index) >>> >>> Here is my updated (log messages added) reproducer code: >>> >>> >>> int count = 100000; >>> long size = count * 8L; >>> long baseAddress = unsafe.allocateMemory(size); >>> System.out.println("Start address: " + >>> Long.toHexString(baseAddress) + >>> ", End address: " + >>> Long.toHexString(baseAddress + size)); >>> >>> for (int i = 0; i < count; i++) { >>> long address = baseAddress + (i * 8L); >>> System.out.println( >>> "Normal: " + Long.toHexString(address) + ", " + >>> "If double scaled: " + >>> Long.toHexString(baseAddress + (i * 8L * 8L))); >>> long expected = i; >>> unsafe.putLong(address, expected); >>> unsafe.getLong(address); >>> } >>> >>> >>> After sometime it crashes as >>> >>> >>> ... >>> Current thread (0x0000000002068800): JavaThread >>> "main" [_thread_in_Java, id=10412, >>> stack(0x00000000023f0000,0x00000000024f0000)] >>> >>> siginfo: ExceptionCode=0xc0000005, reading address >>> 0x0000000059061020 >>> ... >>> ... 
>>> >>> >>> And here is output of the execution until crash: >>> >>> Start address: 58bbcfa0, End address: 58c804a0 >>> Normal: 58bbcfa0, If double scaled: 58bbcfa0 >>> Normal: 58bbcfa8, If double scaled: 58bbcfe0 >>> Normal: 58bbcfb0, If double scaled: 58bbd020 >>> ... >>> ... >>> Normal: 58c517b0, If double scaled: 59061020 >>> >>> >>> As seen from the logs and crash dump, double scaled >>> version of target address (*If double scaled: >>> 59061020*) is the same with the problematic address >>> (*siginfo: ExceptionCode=0xc0000005, reading address >>> 0x0000000059061020*) that causes to crash while >>> accessing it. >>> >>> So I think, it is obvious that the crash is caused by >>> wrong optimization of index value since index is >>> scaled two times (for *Unsafe::put* and *Unsafe::get*) >>> instead of only one time. Then double scaled index >>> points to invalid memory address. >>> >>> Regards. >>> >>> On Sun, Jun 14, 2015 at 2:39 PM, Serkan ?zal >>> > >>> wrote: >>> >>> Hi all, I had dived into the issue with >>> JDK-HotSpot commits and the issue arised after >>> this commit: >>> >>> http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/a60a1309a03a >>> Then I added some additional logs to >>> *"vm/c1/c1_Canonicalizer.cpp"*: void >>> Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) { >>> if (OptimizeUnsafes) do_UnsafeRawOp(x); >>> tty->print_cr("Canonicalizer: do_UnsafeGetRaw id >>> %d: base = id %d, index = id %d, log2_scale = %d", >>> x->id(), x->base()->id(), x->index()->id(), >>> x->log2_scale()); } void >>> Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) { >>> if (OptimizeUnsafes) do_UnsafeRawOp(x); >>> tty->print_cr("Canonicalizer: do_UnsafePutRaw id >>> %d: base = id %d, index = id %d, log2_scale = %d", >>> x->id(), x->base()->id(), x->index()->id(), >>> x->log2_scale()); } >>> >>> So I run the test by calculating address as - >>> *"int * long"* (int is index and long is 8l) - >>> *"long * long"* (the first long is index and the >>> second long is 8l) - *"int * int"* (the first int >>> is index and the second int is 8) Here are the >>> logs: *int * long:* Canonicalizer: do_UnsafeGetRaw >>> id 18: base = id 16, index = id 17, log2_scale = 0 >>> Canonicalizer: do_UnsafeGetRaw id 20: base = id >>> 16, index = id 19, log2_scale = 0 Canonicalizer: >>> do_UnsafeGetRaw id 22: base = id 16, index = id >>> 21, log2_scale = 0 Canonicalizer: do_UnsafeGetRaw >>> id 24: base = id 16, index = id 23, log2_scale = 0 >>> Canonicalizer: do_UnsafePutRaw id 33: base = id >>> 13, index = id 27, log2_scale = 3 Canonicalizer: >>> do_UnsafeGetRaw id 36: base = id 13, index = id >>> 27, log2_scale = 3*long * long:* Canonicalizer: >>> do_UnsafeGetRaw id 18: base = id 16, index = id >>> 17, log2_scale = 0 Canonicalizer: do_UnsafeGetRaw >>> id 20: base = id 16, index = id 19, log2_scale = 0 >>> Canonicalizer: do_UnsafeGetRaw id 22: base = id >>> 16, index = id 21, log2_scale = 0 Canonicalizer: >>> do_UnsafeGetRaw id 24: base = id 16, index = id >>> 23, log2_scale = 0 Canonicalizer: do_UnsafePutRaw >>> id 35: base = id 13, index = id 14, log2_scale = 3 >>> Canonicalizer: do_UnsafeGetRaw id 37: base = id >>> 13, index = id 14, log2_scale = 3*int * int:* >>> Canonicalizer: do_UnsafeGetRaw id 18: base = id >>> 16, index = id 17, log2_scale = 0 Canonicalizer: >>> do_UnsafeGetRaw id 20: base = id 16, index = id >>> 19, log2_scale = 0 Canonicalizer: do_UnsafeGetRaw >>> id 22: base = id 16, index = id 21, log2_scale = 0 >>> Canonicalizer: do_UnsafeGetRaw id 24: base = id >>> 16, index = id 23, log2_scale = 0 
Canonicalizer: >>> do_UnsafePutRaw id 33: base = id 13, index = id >>> 29, log2_scale = 0 Canonicalizer: do_UnsafeGetRaw >>> id 36: base = id 13, index = id 29, log2_scale = 0 >>> Canonicalizer: do_UnsafePutRaw id 19: base = id 8, >>> index = id 15, log2_scale = 0 Canonicalizer: >>> do_UnsafeGetRaw id 22: base = id 8, index = id 15, >>> log2_scale = 0As you can see, at the problematic >>> runs (*"int * long"* and *"long * long"*) there >>> are two scaling. One for *"Unsafe.put"* and the >>> other one is for*"Unsafe.get"* and these >>> instructions points to same *"base"* and *"index"* >>> instructions. This means that address is scaled >>> one more time because there should be only one scale. >>> >>> When I debugged the non-problematic run (*"int * >>> int"*), I saw that *"instr->as_ArithmeticOp();"* >>> is always returns *"null" *then >>> *"match_index_and_scale"* method returns*"false"* >>> always. So there is no scaling. static bool >>> match_index_and_scale(Instruction* instr, >>> Instruction** index, int* log2_scale) { ... >>> ArithmeticOp* arith = instr->as_ArithmeticOp(); if >>> (arith != NULL) { ... } return false; } >>> >>> Then I have added my fix attempt to prevent >>> multiple scaling for Unsafe instructions points to >>> same index instruction like this: void >>> Canonicalizer::do_UnsafeRawOp(UnsafeRawOp* x) { >>> Instruction* base = NULL; Instruction* index = >>> NULL; int log2_scale; if (match(x, &base, &index, >>> &log2_scale)) { x->set_base(base); >>> x->set_index(index); // The fix attempt here // >>> ///////////////////////////// if (index != NULL) { >>> if (index->is_pinned()) { log2_scale = 0; } else { >>> if (log2_scale != 0) { index->pin(); } } } // >>> ///////////////////////////// >>> x->set_log2_scale(log2_scale); if >>> (PrintUnsafeOptimization) { >>> tty->print_cr("Canonicalizer: UnsafeRawOp id %d: >>> base = id %d, index = id %d, log2_scale = %d", >>> x->id(), x->base()->id(), x->index()->id(), >>> x->log2_scale()); } } } In this fix attempt, if >>> there is a scaling for the Unsafe instruction, I >>> pin index instruction of that instruction and at >>> next calls, if the index instruction is pinned, I >>> assummed that there is already scaling so no need >>> to another scaling. After this fix, I rerun the >>> problematic test (*"int * long"*) and it works >>> with these logs: *int * long (after fix):* >>> Canonicalizer: do_UnsafeGetRaw id 18: base = id >>> 16, index = id 17, log2_scale = 0 Canonicalizer: >>> do_UnsafeGetRaw id 20: base = id 16, index = id >>> 19, log2_scale = 0 Canonicalizer: do_UnsafeGetRaw >>> id 22: base = id 16, index = id 21, log2_scale = 0 >>> Canonicalizer: do_UnsafeGetRaw id 24: base = id >>> 16, index = id 23, log2_scale = 0 Canonicalizer: >>> do_UnsafePutRaw id 35: base = id 13, index = id >>> 14, log2_scale = 3 Canonicalizer: do_UnsafeGetRaw >>> id 37: base = id 13, index = id 14, log2_scale = 0 >>> Canonicalizer: do_UnsafePutRaw id 21: base = id 8, >>> index = id 11, log2_scale = 3 Canonicalizer: >>> do_UnsafeGetRaw id 23: base = id 8, index = id 11, >>> log2_scale = 0I am not sure my fix attempt is a >>> really fix or maybe there are better fixes. >>> Regards. -- Serkan ?ZAL >>> >>> Btw, (thanks to one my colleagues), when >>> address calculation in the loop is >>> converted to long address = baseAddress + (i * >>> 8) test passes. Only difference is next long >>> pointer is calculated using >>> integer 8 instead of long 8. 
``` >>> for (int i = 0; i < count; i++) { >>> long address = baseAddress + (i * 8); // <--- >>> here, integer 8 instead >>> of long 8 long expected = i; >>> unsafe.putLong(address, expected); long actual >>> = unsafe.getLong(address); if (expected != >>> actual) { >>> throw new AssertionError("Expected: " + >>> expected + ", Actual: " + >>> actual); >>> } >>> } >>> ``` On Tue, Jun 9, 2015 at 1:07 PM Mehmet >>> Dogan >> < >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-compiler-dev>> >>> wrote: >/Hi all, /> >>> >/While I was testing my app using java 8, I >>> encountered the previously />/reported >>> sun.misc.Unsafe issue. /> >>> >/ >>> https://bugs.openjdk.java.net/browse/JDK-8076445 >>> /> >>> >/ >>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017685.html >>> /> >>> >/Issue status says it's resolved with >>> resolution "Cannot Reproduce". But >>> />/unfortunately it's still reproducible using >>> "1.8.0_60-ea-b18" and />/"1.9.0-ea-b67". /> >>> >/Test is very simple: /> >>> >/``` />/public static void main(String[] >>> args) throws Exception { />/Unsafe unsafe = >>> findUnsafe(); />/// 10000 pass />/// 100000 >>> jvm crash />/// 1000000 fail />/int count = >>> 100000; />/long size = count * 8L; />/long >>> baseAddress = unsafe.allocateMemory(size); /> >>> >/try { />/for (int i = 0; i < count; i++) { >>> />/long address = baseAddress + (i * 8L); /> >>> >/long expected = i; >>> />/unsafe.putLong(address, expected); /> >>> >/long actual = unsafe.getLong(address); /> >>> >/if (expected != actual) { />/throw new >>> AssertionError("Expected: " + expected + ", >>> />/Actual: " + actual); />/} />/} />/} finally >>> { />/unsafe.freeMemory(baseAddress); />/} />/} >>> />/``` />/It's not failing up to version >>> 1.8.0.31, by starting 1.8.0.40 test is >>> />/failing constantly. /> >>> >/- With iteration count 10000, test is >>> passing. />/- With iteration count 100000, jvm >>> is crashing with SIGSEGV. />/- With iteration >>> count 1000000, test is failing with >>> AssertionError. /> >>> >/When one of compilation (-Xint) or inlining >>> (-XX:-Inline) or />/on-stack-replacement >>> (-XX:-UseOnStackReplacement) is disabled, test >>> is not />/failing at all. /> >>> >/I tested on platforms: />/- >>> Centos-7/openjdk-1.8.0.45 />/- >>> OSX/oraclejdk-1.8.0.40 />/- >>> OSX/oraclejdk-1.8.0.45 />/- >>> OSX/oraclejdk-1.8.0_60-ea-b18 />/- >>> OSX/oraclejdk-1.9.0-ea-b67 /> >>> >/Previous issue comment ( >>> />/ >>> https://bugs.openjdk.java.net/browse/JDK-8076445?focusedCommentId=13633043#comment-13633043 >>> ) >>> />/says "Cannot reproduce based on the latest >>> version". I hope that latest />/version is not >>> mentioning to '1.8.0_60-ea-b18' or >>> '1.9.0-ea-b67'. Because />/both are failing. /> >>> >/I'm looking forward to hearing from you. 
>>> > Thanks,
>>> > -Mehmet Dogan-
>>> > --
>>> > @mmdogan
>>>
>>> --
>>> Serkan ÖZAL
>>> Remotest Software Engineer
>>> GSM: +90 542 680 39 18
>>> Twitter: @serkan_ozal
>>
>> --
>> Serkan ÖZAL
>> Remotest Software Engineer
>> GSM: +90 542 680 39 18
>> Twitter: @serkan_ozal

--
Cheers,
Martijn (Sent from Gmail Mobile)

From dean.long at oracle.com  Sat Jul 18 07:51:03 2015
From: dean.long at oracle.com (Dean Long)
Date: Sat, 18 Jul 2015 00:51:03 -0700
Subject: RFR (S) 8131682: C1 should use multibyte nops everywhere
In-Reply-To: <55A9033C.2030302@oracle.com>
References: <55A9033C.2030302@oracle.com>
Message-ID: <55AA0567.6070602@oracle.com>

I think we should distinguish the different uses and treat them accordingly:

1) padding nops for patching, executed

We need to be careful about inserting a fat nop here: if later patching
overwrites only part of the fat nop, the result is an illegal instruction.

2) padding nops for patching, never executed

It should be safe to insert a fat nop here, but there's no point if the
nops are not reachable and never executed.

3) alignment nops, never patched, executed

Fat nops are fine, but on some CPUs branching may be even better, so I
suggest using align() for this, and letting align() decide what to
generate. The change in check_icache() could use a version of align
that takes the target offset as an argument:

 348     align(CodeEntryAlignment, __ offset() + ic_cmp_size);

4) alignment nops, never patched, never executed

Doesn't matter what we emit here, but we might as well make it
understandable by humans using a debugger.

I believe the patching nops in c1_CodeStubs_x86.cpp and
c1_LIRAssembler.cpp are patched concurrently while the code is running,
not at a safepoint, so it's not clear to me if it's safe to use fat nops
on x86. I would consider those changes unsafe on x86 without further
analysis of what happens during patching.

dl

On 7/17/2015 6:29 AM, Aleksey Shipilev wrote:
> Hi there,
>
> C1 is not very good at inlining and intrisifying methods, and hence the
> call performance is important there. One nit that we can see in the
> generated code on x86 is that C1 uses the single-byte nops, even for
> long nop strides.
>
> This improvement fixes that:
>   https://bugs.openjdk.java.net/browse/JDK-8131682
>   http://cr.openjdk.java.net/~shade/8131682/webrev.00/
>
> Testing:
>   - JPRT -testset hotspot on open platforms
>   - eyeballing the generated assembly with -XX:TieredStopAtLevel=1
>
> (I understand the symmetric change is going to be needed in closed
> parts, but let's polish the open part first).
>
> Thanks,
> -Aleksey
>

From roland.westrelin at oracle.com  Mon Jul 20 10:05:09 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Mon, 20 Jul 2015 12:05:09 +0200
Subject: RFR(S): 8130858: CICompilerCount=1 when tiered is off is not allowed any more
In-Reply-To: <559E9CA1.2050505@oracle.com>
References: <2E1EC0B4-DC75-4208-9F2D-49F7D5C2929D@oracle.com> <559E9CA1.2050505@oracle.com>
Message-ID:

Thanks for the reviews, Vladimir & Vladimir.

Roland.
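A footnote on the multi-byte nops discussed in Dean's message above: the sequences in question are the usual recommended x86 long-NOP encodings (the bytes below are shown for illustration only; the exact sequences HotSpot emits are in the webrev):

```
1 byte : 90                         nop
2 bytes: 66 90                      osize-prefixed nop
3 bytes: 0F 1F 00                   nop DWORD PTR [eax]
4 bytes: 0F 1F 40 00                nop DWORD PTR [eax+0]
5 bytes: 0F 1F 44 00 00             nop DWORD PTR [eax+eax*1+0]
6 bytes: 66 0F 1F 44 00 00          osize nop WORD PTR [eax+eax*1+0]
7 bytes: 0F 1F 80 00 00 00 00       nop DWORD PTR [eax+0]
8 bytes: 0F 1F 84 00 00 00 00 00    nop DWORD PTR [eax+eax*1+0]
```

They also make Dean's caution concrete: patching over only a prefix of one of these leaves the tail bytes to be decoded as the start of a fresh, possibly illegal, instruction.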
From adinn at redhat.com  Mon Jul 20 13:22:28 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Mon, 20 Jul 2015 14:22:28 +0100
Subject: Query regarding ordering in G1 post-write barrier
Message-ID: <55ACF614.5020103@redhat.com>

$SUBJECT is in relation to the following code in method
GraphKit::g1_mark_card()

  . . .
  __ storeCM(__ ctrl(), card_adr, zero, oop_store, oop_alias_idx,
             card_bt, Compile::AliasIdxRaw);

  // Now do the queue work
  __ if_then(index, BoolTest::ne, zeroX); {

    Node* next_index = _gvn.transform(new SubXNode(index, __ ConX(sizeof(intptr_t))));
    Node* log_addr = __ AddP(no_base, buffer, next_index);

    // Order, see storeCM.
    __ store(__ ctrl(), log_addr, card_adr, T_ADDRESS, Compile::AliasIdxRaw, MemNode::unordered);
    __ store(__ ctrl(), index_adr, next_index, TypeX_X->basic_type(), Compile::AliasIdxRaw, MemNode::unordered);

  } __ else_(); {

    __ make_leaf_call(tf, CAST_FROM_FN_PTR(address, SharedRuntime::g1_wb_post),
                      "g1_wb_post", card_adr, __ thread());

  } __ end_if();
  . . .

The 3 stores StoreCM -> StoreP -> StoreX which mark the card and then
push the card address into the dirty queue appear to end up being
emitted in that order -- I assume by virtue of the memory links between
them (output of StoreCM is the mem input of StoreP, output of StoreP is
the mem input of StoreX). That order makes sense if the queue were to be
observed concurrently, i.e. mark the card to ensure the write is flagged
before it is performed, write the value, then decrement the index to
make the value write visible. So, that's ok on x86 where TSO means this
is also the order of visibility.

The StoreCM is 'ordered' i.e. it is flagged with mem ordering type =
mo_release. However, the latter pair of instructions are 'unordered'. I
looked at the G1 code which processes dirty queues and could not make
head nor tail (oops, apologies for any undercurrent of a pun) of when it
gets run.

So, the question is this: does dirty queue processing only happen in GC
threads when mutators cannot be writing them? Or is there a need on
non-TSO architectures to maintain some sort of consistency in the queue
update by serializing these writes?

Hmm, ok that seems to be two questions. Let's make that a starter for 10
and a bonus for whoever buzzes fastest.

regards,

Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland)

From thomas.schatzl at oracle.com  Mon Jul 20 13:44:47 2015
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Mon, 20 Jul 2015 15:44:47 +0200
Subject: Query regarding ordering in G1 post-write barrier
In-Reply-To: <55ACF614.5020103@redhat.com>
References: <55ACF614.5020103@redhat.com>
Message-ID: <1437399887.2272.76.camel@oracle.com>

Hi,

On Mon, 2015-07-20 at 14:22 +0100, Andrew Dinn wrote:
> $SUBJECT is in relation to the following code in method
> GraphKit::g1_mark_card()
[...]
> The StoreCM is 'ordered' i.e. it is flagged with mem ordering type =
> mo_release. However, the latter pair of instructions are 'unordered'. I
> looked at the G1 code which processes dirty queues and could not make
> head nor tail (oops, apologies for any undercurrent of a pun) of when it
> gets run.
>
> So, the question is this: does dirty queue processing only happen in GC
> threads when mutators cannot be writing them? Or is there a need on
> non-TSO architectures to maintain some sort of consistency in the queue
> update by serializing these writes?
Mutator threads and refinement threads are running concurrently. While GC threads are working on the queues, mutators never run. While refinement threads may set and clear the mark on the card table concurrently because they were processing that card concurrently, there is afaik sufficient synchronization between the card mark and the reference write on that in the code (in HeapRegion::oops_on_card_seq_iterate_careful()). The refinement threads do not modify or assume any value of the buffer's members, so there does not seem to be a need in the write barrier for further synchronization. Refinement threads and mutator threads take a lock to push and pop a new buffer (in SharedRuntime::g1_wb_post) to and from the buffer queue, which means that there is at least one cmpxchg/synchronization between the thread pushing the buffer and the refinement thread popping it. Which should be sufficient that the buffer's members are visible to all participants of this exchange. Note that GC threads do access the buffers, but there is a safepointing operation between the accesses. That should cover the eventualities. You can now tell us where I am wrong :) > Hmm, ok that seems to be two questions. Let's make that a starter for 10 > and a bonus for whoever buzzes fastest. Not sure if that answers your question. So what did I win? :-) Thanks, Thomas From aleksey.shipilev at oracle.com Mon Jul 20 13:52:15 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 20 Jul 2015 16:52:15 +0300 Subject: RFR (S) 8131782: C1 Class.cast optimization breaks when Class is loaded from static final Message-ID: <55ACFD0F.30207@oracle.com> Hi, On the road from Unsafe to VarHandles lies a small deficiency in C1 Class.cast/isInstance optimization: the canonicalizer folds constant class perfectly when it is coming from "inlined" constant, but not from static final, because the constant "shapes" are different: https://bugs.openjdk.java.net/browse/JDK-8131782 And here is the fix: http://cr.openjdk.java.net/~shade/8131782/webrev.01/ (There is another ClassConstant shape, which is, AFAIU, the constant "receiver" class as discovered within the static method. It does not seem to appear in current java.lang.Class, and there seems to be no way for users to stumble upon such a pattern. Which also means I can't do a targeted test for it.) Testing: * JPRT -testset hotspot on all open platforms; * Targeted benchmarks (see JIRA ticket) Thanks, -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From adinn at redhat.com Mon Jul 20 14:14:42 2015 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 20 Jul 2015 15:14:42 +0100 Subject: Query regarding ordering in G1 post-write barrier In-Reply-To: <1437399887.2272.76.camel@oracle.com> References: <55ACF614.5020103@redhat.com> <1437399887.2272.76.camel@oracle.com> Message-ID: <55AD0252.3040007@redhat.com> On 20/07/15 14:44, Thomas Schatzl wrote: > > . . . > > Refinement threads and mutator threads take a lock to push and pop a new > buffer (in SharedRuntime::g1_wb_post) to and from the buffer queue, > which means that there is at least one cmpxchg/synchronization between > the thread pushing the buffer and the refinement thread popping it. > > . . . > > Note that GC threads do access the buffers, but there is a safepointing > operation between the accesses. > > That should cover the eventualities. 
You can now tell us where I am > wrong :) Thanks for the explanation. I don't doubt that you (and the code) are right -- but I wanted to understand why. >> Hmm, ok that seems to be two questions. Let's make that a starter for 10 >> and a bonus for whoever buzzes fastest. > > Not sure if that answers your question. So what did I win? :-) Why glory***, naturally. regards, Andrew Dinn ----------- *** n.b. to find out what 'glory' means see Through The Looking Glass From Alexander.Alexeev at caviumnetworks.com Mon Jul 20 14:38:31 2015 From: Alexander.Alexeev at caviumnetworks.com (Alexeev, Alexander) Date: Mon, 20 Jul 2015 14:38:31 +0000 Subject: RFR: aarch64: Typo in SHA intrinsics flags handling code for aarch64 Message-ID: Hello Please review provided patch and sponsor if approved. Problem: SHA flags verification code checks condition for UseSHA256Intrinsics, but corrects UseSHA1Intrinsics. The patch: http://cr.openjdk.java.net/~aalexeev/1/webrev.00/ Regards, Alexander -------------- next part -------------- An HTML attachment was scrubbed... URL: From aleksey.shipilev at oracle.com Mon Jul 20 14:51:12 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 20 Jul 2015 17:51:12 +0300 Subject: RFR (S) 8131682: C1 should use multibyte nops everywhere In-Reply-To: <55AA0567.6070602@oracle.com> References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> Message-ID: <55AD0AE0.3060803@oracle.com> Hi Dean, Thanks for taking a look! Silly me, I should have left the call patching cases intact, because you're right, we should be able to patch the nops partially while still producing the correct instruction stream. Therefore, I reverted the cases where we do nop-ing for *instruction* patching, and added the comment there. Other places seem to use the nop sequences to provide the alignment, not for the general patching. Especially interesting for us is the case of aligning the patcheable immediate in the existing call. C2 does the nops in these cases. New webrev: http://cr.openjdk.java.net/~shade/8131682/webrev.01/ Testing: * JPRT -testset hotspot on open platforms; * Targeted benchmarks, plus eyeballing the assembly; Thanks, -Aleksey On 18.07.2015 10:51, Dean Long wrote: > I think we should distinguish the different uses and treat them > accordingly: > > 1) padding nops for patching, executed > > We need to be careful about inserting a fat nop here, if later patching > overwrites only part of the fat nop, resulting in an illegal intruction. > > 2) padding nops for patching, never executed > > It should be safe insert a fat nop here, but there's no point if the > nops are not reachable and never executed. > > > 3) alignment nops, never patched, executed > > Fat nops are fine, but on some CPUs branching may be even better, so I > suggest using align() for this, and letting align() decide what to > generate. The change in check_icache() could use a version of align > that takes the target offset as an argument: > > 348 align(CodeEntryAlignment,__ offset() + ic_cmp_size); > > 4) alignment nops, never patched, never executed > > Doesn't matter what we emit here, but we might as well make it > understandable by humans using a debugger. > > > I believe the patching nops in c1_CodeStubs_x86.cpp and > c1_LIRAssembler.cpp are patched concurrently while the code is running, > not at a safepoint, so it's not clear to me if it's safe to use fat nops > on x86. I would consider those changes unsafe on x86 without further > analysis of what happens during patching. 
> > dl > > On 7/17/2015 6:29 AM, Aleksey Shipilev wrote: >> Hi there, >> >> C1 is not very good at inlining and intrisifying methods, and hence the >> call performance is important there. One nit that we can see in the >> generated code on x86 is that C1 uses the single-byte nops, even for >> long nop strides. >> >> This improvement fixes that: >> https://bugs.openjdk.java.net/browse/JDK-8131682 >> http://cr.openjdk.java.net/~shade/8131682/webrev.00/ >> >> Testing: >> - JPRT -testset hotspot on open platforms >> - eyeballing the generated assembly with -XX:TieredStopAtLevel=1 >> >> (I understand the symmetric change is going to be needed in closed >> parts, but let's polish the open part first). >> >> Thanks, >> -Aleksey >> > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From john.r.rose at oracle.com Mon Jul 20 21:14:59 2015 From: john.r.rose at oracle.com (John Rose) Date: Mon, 20 Jul 2015 14:14:59 -0700 Subject: RFR (S) 8131782: C1 Class.cast optimization breaks when Class is loaded from static final In-Reply-To: <55ACFD0F.30207@oracle.com> References: <55ACFD0F.30207@oracle.com> Message-ID: On Jul 20, 2015, at 6:52 AM, Aleksey Shipilev wrote: > > Hi, > > On the road from Unsafe to VarHandles lies a small deficiency in C1 > Class.cast/isInstance optimization: the canonicalizer folds constant > class perfectly when it is coming from "inlined" constant, but not from > static final, because the constant "shapes" are different: > https://bugs.openjdk.java.net/browse/JDK-8131782 > > And here is the fix: > http://cr.openjdk.java.net/~shade/8131782/webrev.01/ > > (There is another ClassConstant shape, which is, AFAIU, the constant > "receiver" class as discovered within the static method. It does not > seem to appear in current java.lang.Class, and there seems to be no way > for users to stumble upon such a pattern. Which also means I can't do a > targeted test for it.) > > Testing: > * JPRT -testset hotspot on all open platforms; > * Targeted benchmarks (see JIRA ticket) I suggest a deeper fix, to the factory that produces the oddly formatted constant. That may help with other, similar constant folding problems. ? 
John diff --git a/src/share/vm/c1/c1_ValueType.cpp b/src/share/vm/c1/c1_ValueType.cpp --- a/src/share/vm/c1/c1_ValueType.cpp +++ b/src/share/vm/c1/c1_ValueType.cpp @@ -153,7 +153,19 @@ case T_FLOAT : return new FloatConstant (value.as_float ()); case T_DOUBLE : return new DoubleConstant(value.as_double()); case T_ARRAY : // fall through (ciConstant doesn't have an array accessor) - case T_OBJECT : return new ObjectConstant(value.as_object()); + case T_OBJECT : { + // FIXME: use common code with GraphBuilder::load_constant + ciObject* obj = value.as_object(); + if (obj->is_null_object()) + return objectNull; + if (obj->is_loaded()) { + if (obj->is_array()) + return new ArrayConstant(obj->as_array()); + else if (obj->is_instance()) + return new InstanceConstant(obj->as_instance()); + } + return new ObjectConstant(obj); + } } ShouldNotReachHere(); return illegalType; From john.r.rose at oracle.com Mon Jul 20 23:05:07 2015 From: john.r.rose at oracle.com (John Rose) Date: Mon, 20 Jul 2015 16:05:07 -0700 Subject: On constant folding of final field loads In-Reply-To: <5592E75A.1000500@oracle.com> References: <558DFBEC.6040700@oracle.com> <55911F68.9050809@oracle.com> <559143D2.2020201@oracle.com> <5592662C.40400@oracle.com> <5592E75A.1000500@oracle.com> Message-ID: On Jun 30, 2015, at 12:00 PM, Vladimir Ivanov wrote: > > Aleksey, > >>>> Big picture question: do we actually care about propagating final field >>>> values once the object escaped (and in this sense, available to be >>>> introspected by the compiler)? >>>> >>>> Java memory model does not guarantee the final field visibility when the >>>> object had escaped. The very reason why deserialization works is because >>>> the deserialized object had not yet been published. >>>> >>>> That is, are we in line with the spec and general expectations by >>>> folding the final values, *and* not deoptimizing on the store? >>> Can you elaborate on your point and interaction with JMM a bit? >>> >>> Are you talking about not tracking constant folded final field values at >>> all, since there are no guarantees by JMM such updates are visible? >> >> Yup. AFAIU the JMM, there is no guarantees you would see the updated >> value for final field after the object had leaked. So, spec-wise you may >> just use the final field values as constants. I think the only reason >> you have to do the dependency tracking is when constant folding depends >> on instance identity. >> >> So, my question is, do we knowingly make a goodwill call to deopt on >> final field store, even though it is not required by spec? I am not >> opposing the change, but I'd like us to understand the implications better. > That's a good question. I believe that the JMM doesn't give users any hope for changing the value of a final field, apart from objects under initialization, and the specific cases of System.setErr and its two evil twins. (Did I miss a third case? There aren't many.) Rather than wait for the unthinkable and throw a de-opt, I would prefer to make more positive checks against final field changing, and throw a suitable exception when (if ever) an application sets a final field not in a scenario envisioned by the JMM. The value of this would be that whatever context markers or annotations we use to make these checks will also help guide the *suppression* of the final field folding optimization. > I consider it more like a quality of implementation aspect. 
Neither Reflection nor Unsafe APIs are part of JVM/JLS spec, so I don't think possibility of final field updates should be taken into account there. Reflection allows apps. to emulate the source semantics of Java programs, and (independently) it provides access to some run-time metadata. Whatever it does with final should correspond (within reason) to source semantics. Unsafe is whatever we want it to be, as a simple, well-factored set of building blocks to implement low-level JVM operations and (independently) provide access to some run-time features of the hardware platform. Therefore, Unsafe and Reflection are partially coupled to final semantics. With that said, I think it may be undesirable to push final-bit checking into the Unsafe API. Unsafe loads and stores should map to single memory instructions (with doubly-indexed, unscaled addresses). If we add extra "tag" bits to (say) offsets, we will have to "untag" those offsets when the instruction executes (if the offsets are not JIT-time constants); that is an extra instruction. > In order to avoid surprises and inconsistencies (old value vs new value depending on execution path) which are *very* hard to track down, VM should either completely forbid final field changes or keep track of them and adapt accordingly. I like the "forbid" option, also known as "fail fast". I think (in general) we should (where we can) remove indeterminate behavior from the JVM specification, such as "what happens when I store a new value to a final at an unexpected time". We have enough bits in the object header to encode frozen-ness. This is an opposite property: slushiness-of-finals. We could require that the newInstance operation used by deserialization would create slushy objects. (The normal new/ sequence doesn't need this.) Ideally, we would want the deserializer to issue an explicit "publish" operation, which would clear the slushy flag. JITs would consult that flag to gate final-folding. Reflection (and other users of Unsafe) would consult the flag and throw a fail-fast error if it failed. There would have to be some way to limit the time an object is in the slushy state, ideally by enforcing an error on deserializers who neglect to publish (or discard) a slushy object. For example, we could require an annotation on deserialization methods, as we do today on caller-sensitive methods. That's the sort of thing I would prefer to see, to remove indeterminate behavior. > >> For example, I can see the change gives rise to some interesting >> low-level coding idioms, like: >> >> final boolean running = true; >> Field runningField = resolve(...); // reflective >> >> // run stuff for minutes >> void m() { >> while (running) { // compiler hoists, turns into while(true) >> // do stuff >> } >> } >> >> void hammerTime() { >> runningField.set(this, false); // deopt, break the loop! >> } >> >> Once we allow users to go crazy like that, it would be cruel to >> retract/break/change this behavior. You can simulate this (very interesting) pattern using the "target" variable of a MutableCallSite. I.e., the "fold then deopt" use case is supported by MutableCallSite.setTarget. Those variable semantics should *not* be overloaded on final. That pattern, if driven by a special variable, deserves a new kind of variable. The key parameters would be 1) allowed state transitions between blank, set, reset, dead, and 2) expected frequency of various transitions. The frequencies are guesses, not user contracts, and the JVM would have to measure and retune to cope with surprises. 
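For reference, the supported way to get exactly that fold-then-deopt behavior today is a MutableCallSite over a constant method handle. A minimal sketch (class and member names are illustrative, not from any webrev):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MutableCallSite;

class Worker {
    // stands in for the mutable 'running' flag of the idiom above
    static final MutableCallSite RUNNING =
            new MutableCallSite(MethodHandles.constant(boolean.class, true));
    static final MethodHandle RUNNING_TEST = RUNNING.dynamicInvoker();

    void m() throws Throwable {
        // the JIT folds the call site's current target to 'true' and
        // hoists the test, just like the hand-rolled final-field version
        while ((boolean) RUNNING_TEST.invokeExact()) {
            // do stuff
        }
    }

    void hammerTime() {
        RUNNING.setTarget(MethodHandles.constant(boolean.class, false));
        // forces dependent compiled code to notice the new target promptly --
        // the "deopt, break the loop!" step
        MutableCallSite.syncAll(new MutableCallSite[] { RUNNING });
    }
}
```

Note that setTarget alone only guarantees eventual visibility; syncAll is what forces threads to discard any cached (constant-folded) target.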
(One thing I wonder about: What could a "volatile final" be? My best suggestion is a moderate extension of blank finals: http://cr.openjdk.java.net/~jrose/draft/lazy-final.html ) >> But I speculate those cases are not pervasive. By and large, people care >> about final ops to jump through the barriers. For example, the final >> load can be commonned through the acquires / control flow. See e.g.: >> http://psy-lob-saw.blogspot.ru/2014/02/when-i-say-final-i-mean-final.html > > >>>>> Regarding alternative approaches to track the finality, an offset bitmap >>>>> on per-class basis can be used (containing locations of final fields). >>>>> Possible downsides are: (1) memory footprint (1/8th of instance size per >>>>> class); and (2) more complex checking logic (load a relevant piece of a >>>>> bitmap from a klass, instead of checking locally available offset >>>>> cookie). The advantage is that it is completely transparent to a user: >>>>> it doesn't change offset translation scheme. >>>> >>>> I like this one. Paying with slightly larger memory footprint for API >>>> compatibility sounds reasonable to me. >>> >>> I don't care about cases when Unsafe API is abused (e.g. raw memory >>> writes on absolute address or arbitrary offset in an object). In the >>> end, it's unsafe API, right? :-) Today's abuse = tomorrow's use. Whatever we might want to do with a memory instruction is a possible valid use for Unsafe. For Project Panama I expect we will be using the managed heap to store temporary native values. The envelope will be something like a new long[2], but the layout (after the envelope header = array base) will *not* be something the JVM knows about in detail; it will be sliced up by Unsafe operations into native bits and bytes. And likewise with malloc-buffers (where the whole VA is stuffed in the offset). >> Yeah, but with millions of users, we are in a bit of a (implicit) >> compatibility bind here ;) > > That's why I deliberately tried to omit compatibility aspect discussion for now :-) > > Unsafe is unique: it's not a supported API, but nonetheless many people rely on it. It means we can't throw it away (even in a major release), but still we are not as limited as with official public API. > > As part of Project Jigsaw there's already an attempt to do an incompatible change for Unsafe API. Depending on how it goes, we can get some insights how to address compatibility concerns (e.g. preserve original behavior in Java 8 compatibility mode). > > What I'm trying to understand right now, before diving into compatibility details, is whether Unsafe API allows offset encoding scheme change itself and what can be done to make it happen. The decision to make offsets opaque was mine; the idea was to hide more details of object layout, for example in case the JVM ever used non-flat layouts for objects. (It never has. It might for objects containing larger value types; we don't know yet.) At least some offsets want to be occur in arithmetic sequences (arrays and now misaligned accesses). Of course 64 bits can encode a lot of stuff, so it would be possible to mix together both symbolic information (type tags, finality and other mode tags) with pure offset or address information. Over 32 bits of arithmetic sequence range can co-exist with this, by putting the tags at either end of the word. But (back to my earlier comment) this makes it hard to compile Unsafe ops as single instructions. 
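For concreteness, one hypothetical high-tag layout (nothing like this is implemented; the shift and mask below are invented for illustration):

```java
// hypothetical: 16 tag bits at the top, 48 bits of raw byte offset below;
// arithmetic on the low bits stays linear, but every access pays an untag step
static final int  TAG_SHIFT   = 48;
static final long OFFSET_MASK = (1L << TAG_SHIFT) - 1;

static long makeCookie(long rawOffset, int tag) {
    return ((long) tag << TAG_SHIFT) | (rawOffset & OFFSET_MASK);
}
static long rawOffset(long cookie) {
    return cookie & OFFSET_MASK;           // the extra untag instruction
}
static int tagOf(long cookie) {
    return (int) (cookie >>> TAG_SHIFT);   // e.g. finality or type bits
}
```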
Folding tags into offsets will make it harder for Panama-type APIs to perform address arithmetic (they will have to work around the tags). The Unsafe API would have to expose operations like offsetAdd(long o, int delta) and offsetDifference(long o1, long o2). > Though offset value is explicitly described in API as an opaque offset cookie, I spotted 2 inconsistencies in the API itself: > > * Unsafe.get/set*Unaligned() require absolute offsets; > These methods were added in 9, so haven't leaked into public yet. Yep. That seems to push for a high-tag (color bits in the MSB of the offset), or (my preference) no tag or separate tag. You could also copy the alignment bits into the LSB to co-exist with a tag. (The "separate tag" option means something like having a query for the "tag" as well as the base and offset of a variable. The operations getInt, etc., would take an optional third argument, which would be the tag associated with the base and offset. This would allow address arithmetic to remain trivial, at the expense of retooling uses of Unsafe that need to be sensitive to tagging concerns.) > Andrew, can you comment on why you decided to stick with absolute offsets and not preserving Unsafe.getInt() addressing scheme? (The outcome is that the unaligned guys have the same signatures as the aligned ones.) > * Unsafe.copyMemory() > Source and destination addressing operate on offset cookies, but amount of copied data is expressed in bytes. In order to do bulk copies of consecutive memory blocks, the user should be able to convert offset cookies to byte offset and vice versa. There's no way to do that with current API. Right. > Are you aware of any other use cases when people rely on absolute offsets? > > I thought about VarHandles a bit and it seems they aren't a silver bullet - they should be based on Unsafe (or stripped Unsafe equivalent) anyway. > > Unsafe.fireDepChange is a viable option for Reflection and MethodHandles. I'll consider it during further explorations. The downside is that it puts responsibility of tracking final field changes on a user, which is error-prone. There are places in JDK where Unsafe is used directly and they should be analyzed whether a final field is updated or not on a case-by-case basis. Idea: If we go with a three-argument version of getInt, the legacy two-argument version could do a more laborious check. The best way to motivate users of Unsafe to refresh their code (probably) is to improve performance. Recovering lost performance (due to increased safety) is a tactic we can use too, although it is less enjoyable all around. I wish we had value types already; we could make a lot of this clearer if we were able to give cookies their own opaque 64-bit type. (Hacky idea: Use "double" as an envelope type for a second kind of cookie, since "long" is taken. Hacky idea killer: There is an implicit conversion from long to double, which is probably harmful.) > > It's basically opt-in vs opt-out approaches. I'd prefer a cleaner approach, if there's a solution for compatibility issues. > >>> So, my next question is how to proceed. Does changing API and providing >>> 2 set of functions working with absolute and encoded offsets solve the >>> problem? Or leaving Unsafe as is (but clarifying the API) and migrating >>> Reflection/j.l.i to VarHandles solve the problem? That's what I'm trying >>> to understand. >> >> I would think Reflection/j.l.i would eventually migrate to VarHandles >> anyway. Paul? 
The interim solution for encoding final field flags >> shouldn't leak into (even Unsafe) API, or at least should not break the >> existing APIs. >> >> I further think that an interim solution makes auxiliary single >> Unsafe.fireDepChange(Field f / long addr) or something, and uses it >> along with the Unsafe calls in Reflection/j.l.i, when wrappers know they >> are dealing with final fields. In other words, should we try to reuse >> the knowledge those wrappers already have, instead of trying to encode >> the same knowledge into offset cookies? > > >>>>> II. Managing relations between final fields and nmethods >>>>> Another aspect is how expensive dependency checking becomes. >> >>>> Isn't the underlying problem being the dependencies are searched >>>> linearly? At least in ConstantFieldDep, can we compartmentalize the >>>> dependencies by holder class in some sort of hash table? >>> In some cases (when coarse-grained (per-class) tracking is used), linear >>> traversal is fine, since all nmethods will be invalidated. >>> >>> In order to construct a more efficient data structure, you need a way to >>> order or hash oops. The problem with that is oops aren't stable - they >>> can change at any GC. So, either some stable value should be associated >>> with them (System.identityHashCode()?) or dependency tables should be >>> updated on every GC. >> >> Yeah, like Symbol::_identity_hash. > Symbol is an internal VM entity. Oops are different. They are just pointers to Java object (OOP = Ordinary Object Pointer). The only doable way is piggyback on object hash code. I won't dive into details here, but there are many intricate consequences. We sometimes use binary search instead of identity_hash, e.g., in the CI. We could create a data structure which carries a GC generation counter, and re-sort lazily as needed. In principle, the GC could help with re-sorting. The API could look like a fixed-sized table containing a pair of aligned arrays: Object[] key, int[] value. The arrays would have to be encapsulated, since they can change order at any moment (if GC kicks in), but it would be reasonable to work with snapshots of them for bulk queries. The underlying arrays could be ordinary Java arrays, perhaps with blocking to facilitate growth. Native methods or specially-marked non-interruptable methods would perform the required transactions. Access cost would be O(log(N)). This feels like it might be useful for things besides dependencies. >>> Unless existing machinery can be sped up to appropriate level, I >>> wouldn't consider complicating things so much. >> >> Okay. I just can't escape the feeling we keep band-aiding the linear >> searches everywhere in VM on case-to-case basis, instead of providing >> the asymptotic guarantees with better data structures. > Well, class-based dependency contexts have been working pretty well for KlassDeps. They worked pretty well for CallSiteDeps as well, once a more specific context was used (I introduced a specialized CallSite instance-based implementation because it is simpler to maintain). > > It's hard to come up with a narrow enough class context for ConstantFieldDeps, so, probably, it's a good time to consider a different approach to index nmethod dependencies. But assuming final field updates are rare (with the exception of deserialization), it can be not that important. > >>> The 3 optimizations I initially proposed allow to isolate >>> ConstantFieldDep from other kinds of dependencies, so dependency >>> traversal speed will affect only final field writes. 
Which is acceptable >>> IMO. >> >> Except for an overwhelming number of cases where the final field stores >> happen in the course of deserialization. What's particularly bad about >> this scenario is that you wouldn't see the time burned in the VM unless >> you employ the native profiler, as we discovered in Nashorn perf work. > Yes, deserialization is a good example. It's special because it operates on freshly created objects, which, as you noted, haven't escaped yet. It'd be nice if VM can skip dependency checking in such case (either automatically or with explicit hints). Agree. The "slushy bit" might help. Hard part: It would have to co-exist with identityHashCode (because deserialization uses that also IIRC). ? John > In order to diagnose performance problems with excessive dependency checking, VM can monitor it closely (UsePerfData counters + JFR events + tracing should provide enough information to spot issues). > >> Recapping the discussion in this thread, I think we would need to have a >> more thorough performance work for this change, since it touches the >> very core of the platform. I think many people outside the >> hotspot-compiler-dev understand some corner intricacies of the problem >> that we miss. JEP and outcry for public comments, maybe? > Yes, I planned to get quick feedback on the list and then file a JEP as a followup. > > Thanks again for the feedback, Aleksey! > > Best regards, > Vladimir Ivanov From edward.nevill at gmail.com Tue Jul 21 09:18:35 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Tue, 21 Jul 2015 10:18:35 +0100 Subject: RFR: 8132010: aarch64: Typo in SHA intrinsics flags handling code for aarch64 In-Reply-To: References: Message-ID: <1437470315.1575.9.camel@mylittlepony.linaroharston> On Mon, 2015-07-20 at 14:38 +0000, Alexeev, Alexander wrote: > Please review provided patch and sponsor if approved. > Problem: SHA flags verification code checks condition for > UseSHA256Intrinsics, but corrects UseSHA1Intrinsics. > The patch: > http://cr.openjdk.java.net/~aalexeev/1/webrev.00/ Hi Alexander, Thanks for fixing this. I will sponsor this patch. Here is the changeset. http://cr.openjdk.java.net/~enevill/8132010/webrev I have tested this before and after with hotspot jtreg Before: Test results: passed: 876; failed: 3; error: 7 After: Test results: passed: 877; failed: 2; error: 7 The 1 test fixed is the test compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnSupportedCPU.java This regression was introduced in the following changeset http://hg.openjdk.java.net/jdk9/dev/hotspot/rev/cd16fcb838d2 Could I have an official reviewer for this please. As this is a trivial 1 liner I think one reviewer should be sufficient. All the best, Ed. 
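For readers without the webrev handy, the shape of the one-liner being sponsored above is roughly the following (an illustrative sketch, not the exact source; warning() and FLAG_SET_DEFAULT are the usual HotSpot helpers):

```cpp
// vm_version_aarch64.cpp, SHA flag verification (sketch)
if (UseSHA256Intrinsics) {
  warning("SHA256 intrinsics are not available on this CPU");
  FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);   // the typo: wrong flag is reset
  // fix: FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
}
```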
From aleksey.shipilev at oracle.com Tue Jul 21 10:05:38 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Tue, 21 Jul 2015 13:05:38 +0300 Subject: RFR (S) 8131782: C1 Class.cast optimization breaks when Class is loaded from static final In-Reply-To: References: <55ACFD0F.30207@oracle.com> Message-ID: <55AE1972.4050106@oracle.com> On 21.07.2015 00:14, John Rose wrote: > On Jul 20, 2015, at 6:52 AM, Aleksey Shipilev wrote: >> On the road from Unsafe to VarHandles lies a small deficiency in C1 >> Class.cast/isInstance optimization: the canonicalizer folds constant >> class perfectly when it is coming from "inlined" constant, but not from >> static final, because the constant "shapes" are different: >> https://bugs.openjdk.java.net/browse/JDK-8131782 > I suggest a deeper fix, to the factory that produces the oddly formatted constant. > That may help with other, similar constant folding problems. All right, let's do that! http://cr.openjdk.java.net/~shade/8131782/webrev.02/ I respinned it through JRPT and my targeted benchmarks, and it performs the same as previous patch. Thanks, -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From aph at redhat.com Tue Jul 21 10:53:11 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 21 Jul 2015 11:53:11 +0100 Subject: On constant folding of final field loads In-Reply-To: References: <558DFBEC.6040700@oracle.com> <55911F68.9050809@oracle.com> <559143D2.2020201@oracle.com> <5592662C.40400@oracle.com> <5592E75A.1000500@oracle.com> Message-ID: <55AE2497.8040405@redhat.com> On 21/07/15 00:05, John Rose wrote: >> > Andrew, can you comment on why you decided to stick with absolute >> > offsets and not preserving Unsafe.getInt() addressing scheme? > (The outcome is that the unaligned guys have the same signatures as > the aligned ones.) Indeed it does. I had a look around at the way Unsafe.getXXX(Object, long) is used. One of the most common usages is with arrays. There, the offset is the result of address arithmetic so it cannot be an opaque cookie, and there is no way to make it so without breaking all usages with arrays. Also there is the guarantee that you can use Unsafe.getXXXUnaligned(null, address) to fetch data from an absolute address in memory. To discover that this latter usage is explicitly allowed surprised me, but it does mean that the offset can not be an opaque handle unless we special-case the null form. And I think we don't want to do that. Andrew. From tobias.hartmann at oracle.com Tue Jul 21 13:40:18 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 21 Jul 2015 15:40:18 +0200 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space Message-ID: <55AE4BC2.1090104@oracle.com> Hi, please review the following patch. https://bugs.openjdk.java.net/browse/JDK-8130309 http://cr.openjdk.java.net/~thartmann/8130309/webrev.00/ Problem: While C2 is emitting code, an assert is hit because the emitted code size does not correspond to the size of the instruction. The problem is that the code cache is full and therefore the creation of a to-interpreter stub failed. Instead of bailing out we continue and hit the assert. 
More precisely, we emit code for a CallStaticJavaDirectNode on aarch64 and emit two stubs (see 'aarch64_enc_java_static_call' in aarch64.ad): - MacroAssembler::emit_trampoline_stub() -> requires 64 bytes - CompiledStaticCall::emit_to_interp_stub() -> requires 56 bytes However, we only have 112 bytes of free space in the stub section of the code buffer and need to expand it. Since the code cache is full, the expansion fails and the corresponding code blob is freed. In CodeBuffer::free_blob() -> CodeBuffer::set_blob() we set addresses to 'badAddress' and therefore hit the assert. Solution: Even if we have enough space in the instruction section, we should check for a failed expansion of the stub section and bail out immediately. I added the corresponding check and also removed an unnecessary call to 'start_a_stub' after we already failed. Testing: - Failing test - JPRT Thanks, Tobias From edward.nevill at gmail.com Tue Jul 21 14:27:52 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Tue, 21 Jul 2015 15:27:52 +0100 Subject: RFR: 8132010: aarch64: Typo in SHA intrinsics flags handling code for aarch64 In-Reply-To: <1437470315.1575.9.camel@mylittlepony.linaroharston> References: <1437470315.1575.9.camel@mylittlepony.linaroharston> Message-ID: <1437488872.6057.3.camel@mylittlepony.linaroharston> On Tue, 2015-07-21 at 10:18 +0100, Edward Nevill wrote: > On Mon, 2015-07-20 at 14:38 +0000, Alexeev, Alexander wrote: > > > Please review provided patch and sponsor if approved. > > Problem: SHA flags verification code checks condition for > > UseSHA256Intrinsics, but corrects UseSHA1Intrinsics. > > The patch: > > http://cr.openjdk.java.net/~aalexeev/1/webrev.00/ > > Hi Alexander, > > Thanks for fixing this. I will sponsor this patch. > > Here is the changeset. > > http://cr.openjdk.java.net/~enevill/8132010/webrev Please disregard the above webrev. I had outstanding outgoing changes. Here is the corrected changeset. It is just a single line change in vm_version_aarch64.cpp http://cr.openjdk.java.net/~enevill/8132010/webrev.01 Sorry for the confusion, working on too many changesets at once. Ed. > > I have tested this before and after with hotspot jtreg > > Before: Test results: passed: 876; failed: 3; error: 7 > After: Test results: passed: 877; failed: 2; error: 7 > > The 1 test fixed is the test > > compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnSupportedCPU.java > > This regression was introduced in the following changeset > > http://hg.openjdk.java.net/jdk9/dev/hotspot/rev/cd16fcb838d2 > > Could I have an official reviewer for this please. As this is a trivial > 1 liner I think one reviewer should be sufficient. > > All the best, > Ed. > > From zoltan.majo at oracle.com Tue Jul 21 14:32:57 2015 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Tue, 21 Jul 2015 16:32:57 +0200 Subject: [aarch64-port-dev ] RFR: 8132010: aarch64: Typo in SHA intrinsics flags handling code for aarch64 In-Reply-To: <1437488872.6057.3.camel@mylittlepony.linaroharston> References: <1437470315.1575.9.camel@mylittlepony.linaroharston> <1437488872.6057.3.camel@mylittlepony.linaroharston> Message-ID: <55AE5819.3070506@oracle.com> Hi, the fix looks good to me (I'm not a *R*eviewer). Thank you and best regards, Zoltan On 07/21/2015 04:27 PM, Edward Nevill wrote: > On Tue, 2015-07-21 at 10:18 +0100, Edward Nevill wrote: >> On Mon, 2015-07-20 at 14:38 +0000, Alexeev, Alexander wrote: >> >>> Please review provided patch and sponsor if approved. 
>>> Problem: SHA flags verification code checks the condition for
>>> UseSHA256Intrinsics, but corrects UseSHA1Intrinsics.
>>> The patch:
>>> http://cr.openjdk.java.net/~aalexeev/1/webrev.00/
>>
>> Hi Alexander,
>>
>> Thanks for fixing this. I will sponsor this patch.
>>
>> Here is the changeset.
>>
>> http://cr.openjdk.java.net/~enevill/8132010/webrev
>
> Please disregard the above webrev. I had outstanding outgoing changes.
>
> Here is the corrected changeset. It is just a single line change in vm_version_aarch64.cpp
>
> http://cr.openjdk.java.net/~enevill/8132010/webrev.01
>
> Sorry for the confusion, working on too many changesets at once.
>
> Ed.
>
>> I have tested this before and after with hotspot jtreg
>>
>> Before: Test results: passed: 876; failed: 3; error: 7
>> After:  Test results: passed: 877; failed: 2; error: 7
>>
>> The 1 test fixed is the test
>>
>> compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnSupportedCPU.java
>>
>> This regression was introduced in the following changeset
>>
>> http://hg.openjdk.java.net/jdk9/dev/hotspot/rev/cd16fcb838d2
>>
>> Could I have an official reviewer for this please. As this is a trivial
>> 1 liner I think one reviewer should be sufficient.
>>
>> All the best,
>> Ed.

From alian567 at 126.com  Tue Jul 21 14:39:56 2015
From: alian567 at 126.com (=?GBK?B?wO7Rqb78?=)
Date: Tue, 21 Jul 2015 22:39:56 +0800 (CST)
Subject: error when building hotspot in aarch64.
Message-ID: <1ba1f4e5.fb28.14eb10e8755.Coremail.alian567@126.com>

configure cmd: configure --openjdk-target=aarch64 --with-debug-level=slowdebug

The make error happens in frame_aarch64.cpp, in

  frame::frame(....) { init(...); }

init is undefined. Is the init function missing its implementation?

From edward.nevill at gmail.com  Tue Jul 21 15:18:15 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Tue, 21 Jul 2015 16:18:15 +0100
Subject: RFR: 8131062: aarch64: add support for GHASH acceleration
Message-ID: <1437491895.6739.17.camel@mylittlepony.linaroharston>

Hi,

http://cr.openjdk.java.net/~enevill/8131062/webrev.0/

adds support for GHASH acceleration on aarch64 using the 128 bit pmull
and pmull2 instructions. This patch was contributed by
alexander.alexeev at caviumnetworks.com

Note that the 128 bit pmull instructions are not supported on all
aarch64 CPUs. The patch uses the HWCAP_PMULL bit from getauxval() to
determine whether the 128 bit pmull is supported.

I have tested this with jtreg / hotspot.

Without patch: Test results: passed: 876; failed: 3; error: 9
With patch:    Test results: passed: 876; failed: 3; error: 9

In both cases the set of failing/error tests is identical.

I have done some performance testing using TestAESMain from the
jtreg/hotspot test suite. Here are the results I get:-

java -XX:-UseGHASHIntrinsics -DcheckOutput=true -Dmode=GCM TestAESMain
  encode time = 66945.63635, decode time = 34085.08754

java -XX:+UseGHASHIntrinsics -DcheckOutput=true -Dmode=GCM TestAESMain
  encode time = 43469.38244, decode time = 17783.6603

This is an improvement of 54% and 92% respectively.

Alexander has done some benchmarking to measure the raw performance
improvement of GHASH on its own using the following benchmark.

http://cr.openjdk.java.net/~enevill/8131062/GHash.java

Here are the results he gets:-

-XX:-UseGHASHIntrinsics
Benchmark             Mode  Cnt    Score   Error  Units
GHash.calculateGHash  avgt    5  118.688 ± 0.009  us/op

-XX:+UseGHASHIntrinsics
Benchmark             Mode  Cnt    Score   Error  Units
GHash.calculateGHash  avgt    5   21.164 ± 1.763  us/op

This represents a 5.6X speed increase on the raw GHASH performance.

Thanks for your review,
Ed.

From zoltan.majo at oracle.com  Tue Jul 21 15:19:26 2015
From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=)
Date: Tue, 21 Jul 2015 17:19:26 +0200
Subject: [9] RFR(M): 8130832: Extend the WhiteBox API to provide information about the availability of compiler intrinsics
In-Reply-To: <7CC60DCE-330D-4C1B-81FB-C2C3B4DFDEA7@oracle.com>
References: <55A3EBF8.2020208@oracle.com> <55A484F7.4010705@oracle.com> <55A53862.7040308@oracle.com> <55A542D0.50100@oracle.com> <55A6A340.3050204@oracle.com> <55A6AA2B.5030005@oracle.com> <7CC60DCE-330D-4C1B-81FB-C2C3B4DFDEA7@oracle.com>
Message-ID: <55AE62FE.4070502@oracle.com>

Hi John,

thank you for the feedback!

On 07/15/2015 09:11 PM, John Rose wrote:
> On Jul 15, 2015, at 11:44 AM, Vladimir Kozlov wrote:
>>
>> Looks good to me. We still need John's opinion about state transition.
>
> Just sent a 1-1 reply; here it is FTR.
>
> On Jul 14, 2015, at 9:42 AM, Zoltán Majó wrote:
>>
>> So far, I tried Vladimir's solution of going into VM state in
>> Compile::make_vm_intrinsic() [2] and it works well.
>>
>> What we could also do is
>> - (1) list in the switch statement in is_intrinsic_available_for()
>> the intrinsic ids of all methods of interest (similarly to the way we
>> do it for C1); that would eliminate the need to make checks based on
>> a method's holder;
>> - (2) for the DisableIntrinsic checks (that need to call
>> CompilerOracle::has_option_value()) we could define a separate method
>> that is called directly from a WhiteBox context and through the CI
>> from make_vm_intrinsic.
>
> This is going to be a good cleanup. But it is hard to follow, so please
> regard my comments as tentative.
>
> Some comments:
>
> I think the term "_for" is a noise word as deprecated in:
> https://wiki.openjdk.java.net/display/HotSpot/StyleGuide#StyleGuide-Naming

I removed the "_for" suffix from relevant method names. It seems that
the style guide does not list "_for" as a noise word. I tried to update
the page, but I don't seem to have the necessary access rights.

> I agree with the tendency to factor stuff (when possible) away from
> the guts of the compilers.
>
> Suggest Compile::intrinsic_does_virtual_dispatch_for be moved to
> vmIntrinsics::does_virtual_dispatch.
> It's really part of the vmIntrinsics contract. Same for can_trap (or
> whatever it is called).
> If it can't be wedged into vmSymbols.cpp, then at least consider
> abstractCompiler.cpp.

I relocated the following methods:

  Compile::intrinsic_does_virtual_dispatch_for -> vmIntrinsics::does_virtual_dispatch
  Compile::intrinsic_predicates_needed_for     -> vmIntrinsics::predicates_needed
  GraphBuilder::intrinsic_can_trap             -> vmIntrinsics::can_trap
  GraphBuilder::intrinsic_preserves_state      -> vmIntrinsics::preserves_state

> Similar comment about is_intrinsic_available[_for]. Because of the
> dependency on the compiler tier, it has to be virtual, of course.

OK, I changed the name of AbstractCompiler::is_intrinsic_available_for
to AbstractCompiler::is_intrinsic_available. The method is virtual and
is overridden by Compiler and by C2Compiler.

> Suggest a static vmIntrinsics::is_disabled_by_flags, to check for
> compiler-independent disabling logic.
> Method::is_intrinsic_disabled is a good thought, but I would suggest
> making it a static method on vmIntrinsic, because the Method* pointer
> is just a wrapper around the intrinsic_id.
Stripping the Method* would let you avoid a > VM_ENTRY_MARK > in ciMethod::* if the context argument if null (true for C1?). I managed to extract most of the flag-disabling logic (the parts common to C1 and C2) to the vmIntrinsics::is_disabled_by_flags static method. The compiler-specific parts are implemented by the hierarchy starting at AbstractCompiler::is_intrinsic_disabled_by_flag. Some of the compiler-specific flag-disabling logic might be also considered as an inconsistency between C1 and C2 (please see below). > > The "flag soup" logic in C2 is frustrating, and may defeat an attempt > to factor > the -XX flag checking into vmIntrinsics, but I encourage you to try. > The Matcher calls can be layered on separately, in C2-specific code. > The vm_version checks can go in the same C1/C2-specific layer > as the C2 matcher checks. (Or perhaps factored into abstractCompiler, > but that may be overkill.) I factored the Matcher calls (for C2) and the vm_version checks (for C1) into the hierarchy starting at AbstractCompiler::is_intrinsic_supported(). Regarding the compiler-specific flag-disabling logic, we can make the following observations: 1) The DisableIntrinsic flag is C2-specific therefore it is currently included in C2Compiler::is_intrinsic_disabled_by_flag. 2) The InlineNatives flag disables most but not all intrinsics. There are some intrinsics (implemented by both C1 and C2) but that -XX:-InlineNatives turns off for C1 but leaves unaffected for C2. 3) The _getClass intrinsic (implemented by both C1 and C2) is turned off by -XX:-InlineClassNatives for C1 and is left unaffected by C2. 4) The _loadfence, _storefence, _fullfence, _compareAndSwapObject, _compareAndSwapLong, and _compareAndSwapInt intrinsics are turned off by -XX:-InlineUnsafeOps for C2 and are unaffected by C1. Compiler-specific functionality related to observations (1)-(4) is currently implemented in the hierarchy starting at AbstractCompiler::is_intrinsic_disabled_by_flag. If we decide to standardize some parts of flag processing, we can move the relevant functionality to vmIntrinsics::is_disabled_by_flag(). > > Regarding your original question: I would prefer that the VM_ENTRY > logic be confined to the CI, but there is no functional reason the > compiler itself can't do a native-to-VM transition. Thank you for clarifying! Currently, C1 can perform all checks without going into VM mode. For C2, only vmIntrinsics::is_disabled_by_flags() can be executed in native mode, it seems that the rest of the checks needed by C2 must be performed in VM mode. The logic in Compile::make_vm_intrinsic reflects these considerations. Here is the newest webrev: - top: http://cr.openjdk.java.net/~zmajo/8130832/top/ - hotspot: http://cr.openjdk.java.net/~zmajo/8130832/hotspot/webrev.03/ Testing: - all JTREG tests run locally (linux-x86_64), all pass; - all JPRT tests (testset hotspot, including the newly added test, compiler/intrinsics/IntrinsicAvailableTest.java), all tests pass; - all tests in hotspot/test/compiler/intrinsics/mathexact on aarch64, all tests pass. Thank you! Best regards, Zoltan > > ? 
John > From vladimir.x.ivanov at oracle.com Tue Jul 21 16:29:08 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 21 Jul 2015 19:29:08 +0300 Subject: On constant folding of final field loads In-Reply-To: <55AE2497.8040405@redhat.com> References: <558DFBEC.6040700@oracle.com> <55911F68.9050809@oracle.com> <559143D2.2020201@oracle.com> <5592662C.40400@oracle.com> <5592E75A.1000500@oracle.com> <55AE2497.8040405@redhat.com> Message-ID: <55AE7354.1040504@oracle.com> >>>> Andrew, can you comment on why you decided to stick with absolute >>>> offsets and not preserving Unsafe.getInt() addressing scheme? > >> (The outcome is that the unaligned guys have the same signatures as >> the aligned ones.) > > Indeed it does. > > I had a look around at the way Unsafe.getXXX(Object, long) is used. > One of the most common usages is with arrays. There, the offset is > the result of address arithmetic so it cannot be an opaque cookie, and > there is no way to make it so without breaking all usages with arrays. Yes, it can't be a completely opaque cookie, but it doesn't mean it should be a raw offset. Array addressing mode is the following: BASE + index * SCALE where both BASE & SCALE are produced by Unsafe. Such addressing scheme permits encodings which are linear in byte offset. > Also there is the guarantee that you can use > Unsafe.getXXXUnaligned(null, address) to fetch data from an absolute > address in memory. To discover that this latter usage is explicitly > allowed surprised me, but it does mean that the offset can not be an > opaque handle unless we special-case the null form. And I think we > don't want to do that. That's a valid argument. 1-1 correspondence between Unsafe methods & machine ops is appealing. I'm curious do you use any additional addressing modes for unaligned variants? getXXX(Ojbect, long) supports: (1) NULL + address, (2) obj + offset, and (3) base + index * scale (for arrays). Anything besides that for unalinged? Best regards, Vladimir Ivanov From aph at redhat.com Tue Jul 21 16:51:43 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 21 Jul 2015 17:51:43 +0100 Subject: On constant folding of final field loads In-Reply-To: <55AE7354.1040504@oracle.com> References: <558DFBEC.6040700@oracle.com> <55911F68.9050809@oracle.com> <559143D2.2020201@oracle.com> <5592662C.40400@oracle.com> <5592E75A.1000500@oracle.com> <55AE2497.8040405@redhat.com> <55AE7354.1040504@oracle.com> Message-ID: <55AE789F.9060902@redhat.com> On 07/21/2015 05:29 PM, Vladimir Ivanov wrote: >>>>> Andrew, can you comment on why you decided to stick with absolute >>>>> offsets and not preserving Unsafe.getInt() addressing scheme? >> >>> (The outcome is that the unaligned guys have the same signatures as >>> the aligned ones.) >> >> Indeed it does. >> >> I had a look around at the way Unsafe.getXXX(Object, long) is used. >> One of the most common usages is with arrays. There, the offset is >> the result of address arithmetic so it cannot be an opaque cookie, and >> there is no way to make it so without breaking all usages with arrays. > Yes, it can't be a completely opaque cookie, but it doesn't mean it > should be a raw offset. > > Array addressing mode is the following: > BASE + index * SCALE > > where both BASE & SCALE are produced by Unsafe. > > Such addressing scheme permits encodings which are linear in byte offset. Mmm, okay. I see what you mean: clever tricks are possible with some encodings. Even tag bits. 
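For reference, the BASE + index * SCALE scheme spelled out with the real sun.misc.Unsafe accessors (a minimal sketch):

```java
import sun.misc.Unsafe;

class ArrayAddressing {
    static long elementAt(Unsafe unsafe, long[] array, int i) {
        long base  = unsafe.arrayBaseOffset(long[].class);   // BASE, opaque in principle
        long scale = unsafe.arrayIndexScale(long[].class);   // SCALE, 8 for long[]
        return unsafe.getLong(array, base + (long) i * scale);
    }
}
```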
However, I can guarantee you that there is code in the JDK which knows that ARRAY_BYTE_INDEX_SCALE == 1. And lots of other places too, I'm sure.

>> Also there is the guarantee that you can use
>> Unsafe.getXXXUnaligned(null, address) to fetch data from an absolute
>> address in memory. To discover that this latter usage is explicitly
>> allowed surprised me, but it does mean that the offset can not be an
>> opaque handle unless we special-case the null form. And I think we
>> don't want to do that.
>
> That's a valid argument. 1-1 correspondence between Unsafe methods &
> machine ops is appealing.
>
> I'm curious whether you use any additional addressing modes for unaligned
> variants? getXXX(Object, long) supports: (1) NULL + address, (2) obj +
> offset, and (3) base + index * scale (for arrays). Anything besides these
> for unaligned?

I don't think that getXXXUnaligned places any restrictions on its callers at all. There aren't any other forms used in the JDK.

Andrew.

From serkan at hazelcast.com  Sat Jul 11 21:58:46 2015
From: serkan at hazelcast.com (=?UTF-8?B?U2Vya2FuIMOWemFs?=)
Date: Sun, 12 Jul 2015 00:58:46 +0300
Subject: Array accesses using sun.misc.Unsafe cause data corruption or SIGSEGV
In-Reply-To:
References:
Message-ID:

Hi all,

I have created a webrev for review including the patch, zipped it, and put it in my Dropbox. Here is its public link:

https://www.dropbox.com/s/9122hbk1vdryvby/JDK-8087134.zip?dl=0

It is also attached to this mail.

Regards.

On Sat, Jul 4, 2015 at 9:06 PM, Serkan Özal wrote:

> Hi,
>
> I have added some logs to show that the problem is caused by double scaling of the offset (index).
>
> Here is my updated (log messages added) reproducer code:
>
> int count = 100000;
> long size = count * 8L;
> long baseAddress = unsafe.allocateMemory(size);
> System.out.println("Start address: " + Long.toHexString(baseAddress) +
>                    ", End address: " + Long.toHexString(baseAddress + size));
>
> for (int i = 0; i < count; i++) {
>     long address = baseAddress + (i * 8L);
>     System.out.println(
>         "Normal: " + Long.toHexString(address) + ", " +
>         "If double scaled: " + Long.toHexString(baseAddress + (i * 8L * 8L)));
>     long expected = i;
>     unsafe.putLong(address, expected);
>     unsafe.getLong(address);
> }
>
> After some time it crashes like this:
>
> ...
> Current thread (0x0000000002068800): JavaThread "main" [_thread_in_Java,
>     id=10412, stack(0x00000000023f0000,0x00000000024f0000)]
>
> siginfo: ExceptionCode=0xc0000005, reading address 0x0000000059061020
> ...
> ...
>
> And here is the output of the execution until the crash:
>
> Start address: 58bbcfa0, End address: 58c804a0
> Normal: 58bbcfa0, If double scaled: 58bbcfa0
> Normal: 58bbcfa8, If double scaled: 58bbcfe0
> Normal: 58bbcfb0, If double scaled: 58bbd020
> ...
> ...
> Normal: 58c517b0, If double scaled: 59061020
>
> As seen from the logs and the crash dump, the double-scaled version of the target
> address (If double scaled: 59061020) is the same as the problematic
> address (siginfo: ExceptionCode=0xc0000005, reading address
> 0x0000000059061020) that causes the crash when it is accessed.
>
> So I think it is obvious that the crash is caused by a wrong optimization
> of the index value, since the index is scaled twice (once for Unsafe::put and
> once for Unsafe::get) instead of only once. The double-scaled index then points
> to an invalid memory address.
>
> Regards.
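As a sanity check on the double scaling, the logged addresses can be reproduced arithmetically; this snippet is not from the original mail, and the index value 76034 is back-computed from the logged data:

```
public class DoubleScalingCheck {
    public static void main(String[] args) {
        long baseAddress  = 0x58bbcfa0L;
        int  i            = 76034;  // inferred from the log, not stated in the mail
        long normal       = baseAddress + (i * 8L);       // one scaling
        long doubleScaled = baseAddress + (i * 8L * 8L);  // two scalings
        System.out.println(Long.toHexString(normal));        // 58c517b0, inside the block
        System.out.println(Long.toHexString(doubleScaled));  // 59061020, the faulting address
    }
}
```

The double-scaled value lands past the end of the 0x58bbcfa0..0x58c804a0 allocation, which is consistent with the SIGSEGV at 0x59061020.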
> > On Sun, Jun 14, 2015 at 2:39 PM, Serkan Özal wrote:
>> Hi all,
>>
>> I dived into the issue through the JDK HotSpot commits, and the issue arose after this commit:
>> http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/a60a1309a03a
>>
>> Then I added some additional logs to "vm/c1/c1_Canonicalizer.cpp":
>>
>> void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) {
>>   if (OptimizeUnsafes) do_UnsafeRawOp(x);
>>   tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id %d, index = id %d, log2_scale = %d",
>>                 x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
>> }
>>
>> void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) {
>>   if (OptimizeUnsafes) do_UnsafeRawOp(x);
>>   tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id %d, index = id %d, log2_scale = %d",
>>                 x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
>> }
>>
>> So I ran the test, calculating the address as
>> - "int * long" (the int is the index and the long is 8L)
>> - "long * long" (the first long is the index and the second long is 8L)
>> - "int * int" (the first int is the index and the second int is 8)
>>
>> Here are the logs:
>>
>> int * long:
>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 27, log2_scale = 3
>> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 27, log2_scale = 3
>>
>> long * long:
>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3
>> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 3
>>
>> int * int:
>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 29, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 29, log2_scale = 0
>> Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id 15, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id 15, log2_scale = 0
>>
>> As you can see, in the problematic runs ("int * long" and "long * long") there are two scalings.
>> One is for "Unsafe.put" and the other is for "Unsafe.get", and these instructions point to the
>> same "base" and "index" instructions.
>> This means that the address is scaled one extra time, because there should be only one scaling.
>>
>> When I debugged the non-problematic run ("int * int"),
>> I saw that "instr->as_ArithmeticOp()" always returns "null", so the "match_index_and_scale"
>> method always returns "false".
>> So there is no scaling.
>>
>> static bool match_index_and_scale(Instruction* instr,
>>                                   Instruction** index,
>>                                   int* log2_scale) {
>>   ...
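>>   // (Comment added for orientation; it is not in the original mail.)
>>   // match_index_and_scale recognizes index expressions of the form
>>   // "index * 2^n" or "index << n"; on a match, the multiply/shift is
>>   // stripped from the address computation and n is recorded as log2_scale.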
>>   ArithmeticOp* arith = instr->as_ArithmeticOp();
>>   if (arith != NULL) {
>>     ...
>>   }
>>
>>   return false;
>> }
>>
>> Then I added my fix attempt to prevent multiple scalings for Unsafe instructions that point to the same index instruction, like this:
>>
>> void Canonicalizer::do_UnsafeRawOp(UnsafeRawOp* x) {
>>   Instruction* base = NULL;
>>   Instruction* index = NULL;
>>   int log2_scale;
>>
>>   if (match(x, &base, &index, &log2_scale)) {
>>     x->set_base(base);
>>     x->set_index(index);
>>     // The fix attempt here
>>     // /////////////////////////////
>>     if (index != NULL) {
>>       if (index->is_pinned()) {
>>         log2_scale = 0;
>>       } else {
>>         if (log2_scale != 0) {
>>           index->pin();
>>         }
>>       }
>>     }
>>     // /////////////////////////////
>>     x->set_log2_scale(log2_scale);
>>     if (PrintUnsafeOptimization) {
>>       tty->print_cr("Canonicalizer: UnsafeRawOp id %d: base = id %d, index = id %d, log2_scale = %d",
>>                     x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
>>     }
>>   }
>> }
>>
>> In this fix attempt, if there is a scaling for the Unsafe instruction, I pin the index instruction of that instruction,
>> and on subsequent calls, if the index instruction is pinned, I assume that there is already a scaling, so no further scaling is needed.
>>
>> After this fix, I reran the problematic test ("int * long") and it works, with these logs:
>>
>> int * long (after fix):
>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3
>> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 0
>> Canonicalizer: do_UnsafePutRaw id 21: base = id 8, index = id 11, log2_scale = 3
>> Canonicalizer: do_UnsafeGetRaw id 23: base = id 8, index = id 11, log2_scale = 0
>>
>> I am not sure whether my fix attempt is a real fix; maybe there are better fixes.
>>
>> Regards.
>>
>> --
>> Serkan ÖZAL
>>
>>> Btw (thanks to one of my colleagues), when the address calculation in the loop is
>>> converted to
>>>     long address = baseAddress + (i * 8)
>>> the test passes. The only difference is that the next long pointer is calculated
>>> using the integer 8 instead of the long 8.
>>> ```
>>> for (int i = 0; i < count; i++) {
>>>     long address = baseAddress + (i * 8); // <--- here, integer 8 instead of long 8
>>>     long expected = i;
>>>     unsafe.putLong(address, expected);
>>>     long actual = unsafe.getLong(address);
>>>     if (expected != actual) {
>>>         throw new AssertionError("Expected: " + expected + ", Actual: " + actual);
>>>     }
>>> }
>>> ```
>>> On Tue, Jun 9, 2015 at 1:07 PM Mehmet Dogan wrote:
>>>> Hi all,
>>>>
>>>> While I was testing my app using java 8, I encountered the previously
>>>> reported sun.misc.Unsafe issue.
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8076445
>>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017685.html
>>>>
>>>> Issue status says it's resolved with resolution "Cannot Reproduce". But
>>>> unfortunately it's still reproducible using "1.8.0_60-ea-b18" and
>>>> "1.9.0-ea-b67".
>>>> Test is very simple:
>>>>
>>>> ```
>>>> public static void main(String[] args) throws Exception {
>>>>     Unsafe unsafe = findUnsafe();
>>>>     // 10000 pass
>>>>     // 100000 jvm crash
>>>>     // 1000000 fail
>>>>     int count = 100000;
>>>>     long size = count * 8L;
>>>>     long baseAddress = unsafe.allocateMemory(size);
>>>>
>>>>     try {
>>>>         for (int i = 0; i < count; i++) {
>>>>             long address = baseAddress + (i * 8L);
>>>>
>>>>             long expected = i;
>>>>             unsafe.putLong(address, expected);
>>>>
>>>>             long actual = unsafe.getLong(address);
>>>>
>>>>             if (expected != actual) {
>>>>                 throw new AssertionError("Expected: " + expected + ", Actual: " + actual);
>>>>             }
>>>>         }
>>>>     } finally {
>>>>         unsafe.freeMemory(baseAddress);
>>>>     }
>>>> }
>>>> ```
>>>> It does not fail up to version 1.8.0.31; starting with 1.8.0.40 the test fails constantly.
>>>>
>>>> - With iteration count 10000, the test passes.
>>>> - With iteration count 100000, the JVM crashes with SIGSEGV.
>>>> - With iteration count 1000000, the test fails with an AssertionError.
>>>>
>>>> When any one of compilation (-Xint), inlining (-XX:-Inline), or
>>>> on-stack replacement (-XX:-UseOnStackReplacement) is disabled, the test
>>>> does not fail at all.
>>>>
>>>> I tested on platforms:
>>>> - Centos-7/openjdk-1.8.0.45
>>>> - OSX/oraclejdk-1.8.0.40
>>>> - OSX/oraclejdk-1.8.0.45
>>>> - OSX/oraclejdk-1.8.0_60-ea-b18
>>>> - OSX/oraclejdk-1.9.0-ea-b67
>>>>
>>>> The previous issue comment
>>>> (https://bugs.openjdk.java.net/browse/JDK-8076445?focusedCommentId=13633043#comment-13633043)
>>>> says "Cannot reproduce based on the latest version". I hope that "latest
>>>> version" does not refer to '1.8.0_60-ea-b18' or '1.9.0-ea-b67', because
>>>> both are failing.
>>>>
>>>> I'm looking forward to hearing from you.
>>>>
>>>> Thanks,
>>>> -Mehmet Dogan-
>>>> --
>>>> @mmdogan

>> --
>> Serkan ÖZAL
>> Remotest Software Engineer
>> GSM: +90 542 680 39 18
>> Twitter: @serkan_ozal

> --
> Serkan ÖZAL
> Remotest Software Engineer
> GSM: +90 542 680 39 18
> Twitter: @serkan_ozal

--
Serkan ÖZAL
Remotest Software Engineer
GSM: +90 542 680 39 18
Twitter: @serkan_ozal

From irogers at google.com  Tue Jul 21 17:24:32 2015
From: irogers at google.com (Ian Rogers)
Date: Tue, 21 Jul 2015 10:24:32 -0700
Subject: On constant folding of final field loads
In-Reply-To: <558DFBEC.6040700@oracle.com>
References: <558DFBEC.6040700@oracle.com>
Message-ID:

Fwiw, Jikes RVM will propagate final field values without any checking of whether reflection or JNI is being used to abuse the meaning of finality. As Jikes RVM is seldom used to run anything other than (elderly) benchmarks, this has only turned up one problem, specifically in jython:

http://bugs.jython.org/issue1611

In the bug there is some (hopefully) interesting discussion on the meaning of finality, notably that the language and VM spec disagree on when final fields may be written. This was why the problem hadn't occurred with Java bytecode but did occur with Jython-generated bytecode.
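To make the abuse concrete, here is a minimal sketch of a reflective write to an instance final field; the Holder class is made up for illustration. Per the Field.set javadoc this is legal for non-static final fields once setAccessible(true) succeeds, which is exactly the case a constant-folding JIT has to worry about:

```
import java.lang.reflect.Field;

public class FinalMutationSketch {
    static final class Holder {
        final int x;
        Holder(int x) { this.x = x; }
    }

    public static void main(String[] args) throws Exception {
        Holder h = new Holder(1);
        Field f = Holder.class.getDeclaredField("x");
        f.setAccessible(true);   // lifts the final/access check for instance fields
        f.setInt(h, 2);          // would throw IllegalAccessException for a static final
        System.out.println(h.x); // prints 2 here; a JIT that folded h.x might still use 1
    }
}
```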
A related topic, maybe, is that perhaps Java finalizers should be allowed to write to final fields too, in particular for the case where they represent some kind of native resource.

Thanks,
Ian Rogers, Google.

On Fri, Jun 26, 2015 at 6:27 PM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:

> Hi there,
>
> Recently I started looking at constant folding of loads from instance final fields:
> https://bugs.openjdk.java.net/browse/JDK-8058164
>
> I made some progress and wanted to share my findings and initiate a discussion about the problems I spotted.
>
> Current prototype:
> http://cr.openjdk.java.net/~vlivanov/8058164/webrev.00/hotspot
> http://cr.openjdk.java.net/~vlivanov/8058164/webrev.00/jdk
>
> The idea is simple: JIT tracks final field changes and throws away nmethods which are affected.
>
> There are 2 parts of the problem:
> - how to track changes of final fields;
> - how to manage relations between final fields and nmethods.
>
> I. Tracking changes of final fields
>
> There are 4 ways to circumvent runtime limitations and change a final field value:
> - Reflection API (Field.setAccessible())
> - Unsafe
> - JNI
> - java.lang.invoke (MethodHandles)
>
> (It's also possible to write to a final field in a constructor, but I consider it as a corner case and haven't addressed yet. VM can ignore )
>
> Since the Reflection & java.lang.invoke APIs use Unsafe, it ends up with only 2 cases: JNI & Unsafe.
>
> For JNI it's possible to encode field "finality" in the jfieldID and check the corresponding bit in the Set*Field JNI methods before updating a field. There are already some data encoded in the field ID, so extending it to record the final bit as well went pretty smoothly.
>
> For Unsafe it's much more complex.
> I started with a similar approach (mostly implemented in the current prototype) - record the "finality" bit in the offset cookie and check it when performing a write.
>
> Though the Unsafe.objectFieldOffset/staticFieldOffset javadoc explicitly states that the returned value is not guaranteed to be a byte offset [1], after following that road I don't see how the offset encoding scheme can be changed.
>
> First of all, there are Unsafe.get*Unaligned methods (added in 9), which require a byte offset (Unsafe.getLong()):
>   "Fetches a value at some byte offset into a given Java object
>   ...
>   The specification of this method is the same as {@link
>   #getLong(Object, long)} except that the offset does not need to
>   have been obtained from {@link #objectFieldOffset} on the
>   {@link java.lang.reflect.Field} of some Java field."
>
> Unsafe.getInt supports 3 addressing modes:
>   (1) NULL + address
>   (2) oop + offset
>   (3) base + index * scale
>
> Since there are no methods in Unsafe to get byte offsets, there's no way to make (3) work with non-byte offsets for the Unaligned versions. Both base and scale should be either byte offsets or offset cookies to make things work. You can get a sense of the problems looking into the Unsafe & java.nio hacks I did to make things somewhat function after switching the offset encoding strategy.
>
> Also, Unsafe.copyMemory() doesn't work well with offset cookies (see the java.nio.Bits changes I did). Though source and destination addressing share the same mode with Unsafe.getInt() et al., the size of the copied region is defined in bytes. So, in order to perform bulk copies of consecutive memory blocks, the user should be able to convert an offset cookie to a byte offset and vice versa. There's no way to solve that with the current API right now.
>
> I don't want to touch compatibility concerns of switching from byte offsets to encoded offsets, but it looks like the Unsafe API needs some overhaul in 9 to make offset encoding viable.
>
> More realistically, since there are external dependencies on the Unsafe API, I'd prefer to leave sun.misc.Unsafe as is and switch to VarHandles (when they are available in 9) all over the JDK. Or temporarily make a private copy (finally :-)) of the field accessors from Unsafe, switch it to encoded offsets, and use it in the Reflection & java.lang.invoke APIs.
>
> Regarding alternative approaches to track the finality, an offset bitmap on a per-class basis can be used (containing the locations of final fields). Possible downsides are: (1) memory footprint (1/8th of instance size per class); and (2) more complex checking logic (load a relevant piece of a bitmap from a klass, instead of checking a locally available offset cookie). The advantage is that it is completely transparent to the user: it doesn't change the offset translation scheme.
>
> II. Managing relations between final fields and nmethods
>
> Nmethod dependencies suit that purpose pretty well, but some enhancements are needed.
>
> I envision 2 types of dependencies: (1) per-class (field holder); and (2) per-instance (value holder). The field holder is used as a context.
>
> Unless a final field is changed, there's no need to track a per-instance dependency. The VM optimistically starts in per-class mode and switches to per-instance mode when it sees a field change. The policy is chosen on a per-class basis. The VM should be pretty conservative, since false positives are expensive - a change of an unrelated field causes recompilation of all nmethods where the same field was inlined (even if the value was taken from a different instance). Aliasing also causes false positives (same instance, but different final field), so fields in the same class should be differentiated as well.
>
> Unlike methods, fields don't have any dedicated metadata associated with them. All data is confined in the holder klass. To be able to identify a field in a dependency, the byte offset can be used. Right now, the dependency management machinery supports only oops and metadata. So, it should be extended to support primitive values in dependencies (haven't done yet).
>
> Byte offsets + per-instance mode completely eliminate false positives.
>
> Another aspect is how expensive dependency checking becomes.
>
> I took a benchmark from Nashorn/Octane (Box2D), since MethodHandle inlining heavily relies on constant folding of instance final fields.
>
>                      Before    After
> checks (#)           420       12,5K
> nmethods checked(#)  3K        1,5M
> total time:          60ms      2s
> deps total           19K       26K
>
> Though the total number of dependencies in the VM didn't change much (+37% = 19K->26K), the total number of checked dependencies (500x: 3K -> 1,5M) and the time spent on dependency checking (30x: 60ms -> 2s) dramatically increased.
> > I don't want to touch compatibility concerns of switching from byte > offsets to encoded offsets, but it looks like Unsafe API needs some > overhaul in 9 to make offset encoding viable. > > More realistically, since there are external dependencies on Unsafe API, > I'd prefer to leave sun.misc.Unsafe as is and switch to VarHandles (when > they are available in 9) all over JDK. Or temporarily make a private copy > (finally :-)) of field accessors from Unsafe, switch it to encoded offsets, > and use it in Reflection & java.lang.invoke API. > > Regarding alternative approaches to track the finality, an offset bitmap > on per-class basis can be used (containing locations of final fields). > Possible downsides are: (1) memory footprint (1/8th of instance size per > class); and (2) more complex checking logic (load a relevant piece of a > bitmap from a klass, instead of checking locally available offset cookie). > The advantage is that it is completely transparent to a user: it doesn't > change offset translation scheme. > > > II. Managing relations between final fields and nmethods > > Nmethods dependencies suits that purpose pretty well, but some > enhancements are needed. > > I envision 2 types of dependencies: (1) per-class (field holder); and (2) > per-instance (value holder). Field holder is used as a context. > > Unless a final field is changed, there's no need to track per-instance > dependency. VM optimistically starts in per-class mode and switch to > per-instance mode when it sees a field change. The policy is chosen on > per-class basis. VM should be pretty conservative, since false positives > are expensive - a change of unrelated field causes recompilation of all > nmethods where the same field was inlined (even if the value was taken from > a different instance). Aliasing also causes false positives (same instance, > but different final field), so fields in the same class should be > differentiated as well. > > Unilke methods, fields don't have any dedicated metadata associated with > them. All data is confined in holder klass. To be able to identify a field > in a dependency, byte offset can be used. Right now, dependency management > machinery supports only oops and metadata. So, it should be extended to > support primitive values in dependencies (haven't done yet). > > Byte offset + per-instance modes completely eliminates false positives. > > Another aspect is how expensive dependency checking becomes. > > I took a benchmark from Nashorn/Octane (Box2D), since MethodHandle > inlining heavily relies on constant folding of instance final fields. > > Before After > checks (#) 420 12,5K > nmethods checked(#) 3K 1,5M > total time: 60ms 2s > deps total 19K 26K > > Though total number of dependencies in VM didn't change much (+37% = > 19K->26K), total number of checked dependencies (500x: 3K -> 1,5M) and time > spent on dependency checking (30x: 60ms -> 2s) dramatically increased. 
> > The reason is that constant field value dependencies created heavily > populated contextes which are regularly checked: > > #1 #2 #3/#4 > Before > KlassDep 254 47/2,632 > CallSiteDep 167 46/ 358 > > After > ConstantFieldDep 11,790 0/1,494,112 > KlassDep 286 41/ 2,769 > CallSiteDep 249 58/ 393 > > (#1 - dependency kind; #2 - total number of unique dependencies; > #3/#4 - invalidated nmethods/checked dependencies) > > I have 3 ideas how to improve performance of dependency checking: > > (1) split dependency context list (nmethodBucket) into 3 independent > lists (Klass, CallSite & ConstantValue); (IMPLEMENTED) > > It trades size for speed - duplicate nmethods are possible, but the lists > should be shorter on average. I already implemented it, but it didn't > improve the benchmark I'm playing with, since the fraction of > CallSite/Klass deps is very small compared to ConstantField. > > (2) group nmethodBucket entries into chunks of k-nmethods; (TODO) > > It should improve nmethod iteration speed in heavily populated contexts. > > (3) iterate only dependencies of appropriate kind; (TODO) > > There are 3 kinds of changes which require dependency checking: changes in > CHA (KlassDepChange), call site target change (CallSiteDepChange), and > constant field value change (ConstantFieldDepChange). Different types of > changes affect disjoint sets of dependencies. So, instead of enumerating > all dependencies in a nmethod, a more focused approach can be used (e.g. > check only call_site_target_value deps for CallSiteDepChange). > > Since dependencies are sorted by type when serialized in a nmethod, it's > possible to compute offsets for 3 disjoint sets and use them in DepStream > to iterate only relevant dependencies. > > I hope it'll significantly reduce dependency checking costs I'm seeing. > > That's all for now. Thanks! > > Best regards, > Vladimir Ivanov > > [1] "Do not expect to perform any sort of arithmetic on this offset; > it is just a cookie which is passed to the unsafe heap memory > accessors." > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Tue Jul 21 17:51:52 2015 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Tue, 21 Jul 2015 13:51:52 -0400 Subject: On constant folding of final field loads In-Reply-To: References: <558DFBEC.6040700@oracle.com> Message-ID: That jython bug report seems to be talking about static final fields though, whereas this thread is about instance final fields. Hotspot already considers static final fields as constants, and will propagate that into generated code; unfortunately, it doesn't do this for instance final fields. The problem here is that there are known java libs (and likely numerous internal/non-public ones) that use/rely on mutating final instance fields, and so blindly disregarding reflection/JNI/etc will break them (IMHO, they deserve it though! :)). On Tue, Jul 21, 2015 at 1:24 PM, Ian Rogers wrote: > Fwiw, Jikes RVM will propagate final field values without any checking if > reflection or JNI is being used to abuse the meaning of finality. As Jikes > RVM is seldom used to run anything other than (elderly) benchmarks this has > only turned up one problem, specifically in jython: > > http://bugs.jython.org/issue1611 > > In the bug there is some (hopefully) interesting discussion on the meaning > of finality, notably that the language and VM spec disagree on when final > fields may be written. 
This was why the problem hadn't occurred with Java > bytecode but did occur with Jython generated bytecode. > > A related topic maybe that perhaps Java finalizers should be allowed to > write to final fields too, in particular for the case that they represent > some kind of native resource. > > Thanks, > Ian Rogers, Google. > > > On Fri, Jun 26, 2015 at 6:27 PM, Vladimir Ivanov < > vladimir.x.ivanov at oracle.com> wrote: > >> Hi there, >> >> Recently I started looking at constant folding of loads from instance >> final fields: >> https://bugs.openjdk.java.net/browse/JDK-8058164 >> >> I made some progress and wanted to share my findings and initiate a >> discussion about the problems I spotted. >> >> Current prototype: >> http://cr.openjdk.java.net/~vlivanov/8058164/webrev.00/hotspot >> http://cr.openjdk.java.net/~vlivanov/8058164/webrev.00/jdk >> >> The idea is simple: JIT tracks final field changes and throws away >> nmethods which are affected. >> >> There are 2 parts of the problem: >> - how to track changes of final field >> - how to manage relation between final fields and nmethods; >> >> I. Tracking changes of final fields >> >> There are 4 ways to circumvent runtime limitations and change a final >> field value: >> - Reflection API (Field.setAccessible()) >> - Unsafe >> - JNI >> - java.lang.invoke (MethodHandles) >> >> (It's also possible to write to a final field in a constructor, but I >> consider it as a corner case and haven't addressed yet. VM can ignore ) >> >> Since Reflection & java.lang.invoke APIs use Unsafe, it ends up with only >> 2 cases: JNI & Unsafe. >> >> For JNI it's possible to encode field "finality" in jfieldID and check >> corresponding bit in Set*Field JNI methods before updating a field. There >> are already some data encoded in the field ID, so extending it to record >> final bit as well went pretty smooth. >> >> For Unsafe it's much more complex. >> I started with a similar approach (mostly implemented in the current >> prototype) - record "finality" bit in offset cookie and check it when >> performing a write. >> >> Though Unsafe.objectFieldOffset/staticFieldOffset javadoc explicitly >> states that returned value is not guaranteed to be a byte offset [1], after >> following that road I don't see how offset encoding scheme can be changed. >> >> First of all, there are Unsafe.get*Unaligned methods (added in 9), which >> require byte offset (Unsafe.getLong()): >> "Fetches a value at some byte offset into a given Java object >> ... >> The specification of this method is the same as {@link >> #getLong(Object, long)} except that the offset does not need to >> have been obtained from {@link #objectFieldOffset} on the >> {@link java.lang.reflect.Field} of some Java field." >> >> Unsafe.getInt supports 3 addressing modes: >> (1) NULL + address >> (2) oop + offset >> (3) base + index * scale >> >> Since there are no methods in Unsafe to get byte offsets, there's no way >> to make (3) work with non-byte offsets for Unaligned versions. Both base >> and scale should be either byte offsets or offset cookies to make things >> work. You can get a sense of the problems looking into Unsafe & java.nio >> hacks I did to make things somewhat function after switching offset >> encoding strategy. >> >> Also, Unsafe.copyMemory() doesn't work well with offset cookies (see >> java.nio.Bits changes I did). Though source and destination addressing >> shares the same mode with Unsage.getInt() et al., the size of the copied >> region is defined in bytes. 
So, in order to perform bulk copies of >> consecutive memory blocks, the user should be able to convert offset cookie >> to byte offset and vice versa. There's no way to solve that with current >> API right now. >> >> I don't want to touch compatibility concerns of switching from byte >> offsets to encoded offsets, but it looks like Unsafe API needs some >> overhaul in 9 to make offset encoding viable. >> >> More realistically, since there are external dependencies on Unsafe API, >> I'd prefer to leave sun.misc.Unsafe as is and switch to VarHandles (when >> they are available in 9) all over JDK. Or temporarily make a private copy >> (finally :-)) of field accessors from Unsafe, switch it to encoded offsets, >> and use it in Reflection & java.lang.invoke API. >> >> Regarding alternative approaches to track the finality, an offset bitmap >> on per-class basis can be used (containing locations of final fields). >> Possible downsides are: (1) memory footprint (1/8th of instance size per >> class); and (2) more complex checking logic (load a relevant piece of a >> bitmap from a klass, instead of checking locally available offset cookie). >> The advantage is that it is completely transparent to a user: it doesn't >> change offset translation scheme. >> >> >> II. Managing relations between final fields and nmethods >> >> Nmethods dependencies suits that purpose pretty well, but some >> enhancements are needed. >> >> I envision 2 types of dependencies: (1) per-class (field holder); and (2) >> per-instance (value holder). Field holder is used as a context. >> >> Unless a final field is changed, there's no need to track per-instance >> dependency. VM optimistically starts in per-class mode and switch to >> per-instance mode when it sees a field change. The policy is chosen on >> per-class basis. VM should be pretty conservative, since false positives >> are expensive - a change of unrelated field causes recompilation of all >> nmethods where the same field was inlined (even if the value was taken from >> a different instance). Aliasing also causes false positives (same instance, >> but different final field), so fields in the same class should be >> differentiated as well. >> >> Unilke methods, fields don't have any dedicated metadata associated with >> them. All data is confined in holder klass. To be able to identify a field >> in a dependency, byte offset can be used. Right now, dependency management >> machinery supports only oops and metadata. So, it should be extended to >> support primitive values in dependencies (haven't done yet). >> >> Byte offset + per-instance modes completely eliminates false positives. >> >> Another aspect is how expensive dependency checking becomes. >> >> I took a benchmark from Nashorn/Octane (Box2D), since MethodHandle >> inlining heavily relies on constant folding of instance final fields. >> >> Before After >> checks (#) 420 12,5K >> nmethods checked(#) 3K 1,5M >> total time: 60ms 2s >> deps total 19K 26K >> >> Though total number of dependencies in VM didn't change much (+37% = >> 19K->26K), total number of checked dependencies (500x: 3K -> 1,5M) and time >> spent on dependency checking (30x: 60ms -> 2s) dramatically increased. 
>> >> The reason is that constant field value dependencies created heavily >> populated contextes which are regularly checked: >> >> #1 #2 #3/#4 >> Before >> KlassDep 254 47/2,632 >> CallSiteDep 167 46/ 358 >> >> After >> ConstantFieldDep 11,790 0/1,494,112 >> KlassDep 286 41/ 2,769 >> CallSiteDep 249 58/ 393 >> >> (#1 - dependency kind; #2 - total number of unique dependencies; >> #3/#4 - invalidated nmethods/checked dependencies) >> >> I have 3 ideas how to improve performance of dependency checking: >> >> (1) split dependency context list (nmethodBucket) into 3 independent >> lists (Klass, CallSite & ConstantValue); (IMPLEMENTED) >> >> It trades size for speed - duplicate nmethods are possible, but the lists >> should be shorter on average. I already implemented it, but it didn't >> improve the benchmark I'm playing with, since the fraction of >> CallSite/Klass deps is very small compared to ConstantField. >> >> (2) group nmethodBucket entries into chunks of k-nmethods; (TODO) >> >> It should improve nmethod iteration speed in heavily populated contexts. >> >> (3) iterate only dependencies of appropriate kind; (TODO) >> >> There are 3 kinds of changes which require dependency checking: changes >> in CHA (KlassDepChange), call site target change (CallSiteDepChange), and >> constant field value change (ConstantFieldDepChange). Different types of >> changes affect disjoint sets of dependencies. So, instead of enumerating >> all dependencies in a nmethod, a more focused approach can be used (e.g. >> check only call_site_target_value deps for CallSiteDepChange). >> >> Since dependencies are sorted by type when serialized in a nmethod, it's >> possible to compute offsets for 3 disjoint sets and use them in DepStream >> to iterate only relevant dependencies. >> >> I hope it'll significantly reduce dependency checking costs I'm seeing. >> >> That's all for now. Thanks! >> >> Best regards, >> Vladimir Ivanov >> >> [1] "Do not expect to perform any sort of arithmetic on this offset; >> it is just a cookie which is passed to the unsafe heap memory >> accessors." >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias.hartmann at oracle.com Tue Jul 21 18:00:55 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 21 Jul 2015 20:00:55 +0200 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55AE4BC2.1090104@oracle.com> References: <55AE4BC2.1090104@oracle.com> Message-ID: <55AE88D7.4080005@oracle.com> [CC'ing aarch64-port-dev because aarch64 files are affected] On 21.07.2015 15:40, Tobias Hartmann wrote: > Hi, > > please review the following patch. > > https://bugs.openjdk.java.net/browse/JDK-8130309 > http://cr.openjdk.java.net/~thartmann/8130309/webrev.00/ > > Problem: > While C2 is emitting code, an assert is hit because the emitted code size does not correspond to the size of the instruction. The problem is that the code cache is full and therefore the creation of a to-interpreter stub failed. Instead of bailing out we continue and hit the assert. > > More precisely, we emit code for a CallStaticJavaDirectNode on aarch64 and emit two stubs (see 'aarch64_enc_java_static_call' in aarch64.ad): > - MacroAssembler::emit_trampoline_stub() -> requires 64 bytes > - CompiledStaticCall::emit_to_interp_stub() -> requires 56 bytes > However, we only have 112 bytes of free space in the stub section of the code buffer and need to expand it. 
Since the code cache is full, the expansion fails and the corresponding code blob is freed. In CodeBuffer::free_blob() -> CodeBuffer::set_blob() we set addresses to 'badAddress' and therefore hit the assert. > > Solution: > Even if we have enough space in the instruction section, we should check for a failed expansion of the stub section and bail out immediately. I added the corresponding check and also removed an unnecessary call to 'start_a_stub' after we already failed. > > Testing: > - Failing test > - JPRT > > Thanks, > Tobias > From vladimir.x.ivanov at oracle.com Tue Jul 21 18:20:10 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 21 Jul 2015 21:20:10 +0300 Subject: [9] RFR (XS): 8131675: EA fails with assert(false) failed: not unsafe or G1 barrier raw StoreP Message-ID: <55AE8D5A.1020906@oracle.com> http://cr.openjdk.java.net/~vlivanov/8131675/webrev.00 https://bugs.openjdk.java.net/browse/JDK-8131675 Newly introduced UnsafeGetConstantField test revealed an uncovered case in EA for unsafe accesses. The following code: U.putAddress(nativeAddr, 0x12345678L); where static final long nativeAddr = U.allocateMemory(16); static final Unsafe U = Unsafe.getUnsafe(); is parsed into the following IR: StoreP (ctrl) (mem) ConP ConP But EA doesn't expect to see a constant address and falls through the unsafe access detection logic hitting the assert right away. The fix is to treat all stores to raw pointers (except G1 barriers) as unsafe accesses and mark stored values as escaped. Testing: failed test, jprt Best regards, Vladimir Ivanov [1] test/compiler/unsafe/UnsafeGetConstantField.java From dean.long at oracle.com Tue Jul 21 18:21:52 2015 From: dean.long at oracle.com (Dean Long) Date: Tue, 21 Jul 2015 11:21:52 -0700 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55AE4BC2.1090104@oracle.com> References: <55AE4BC2.1090104@oracle.com> Message-ID: <55AE8DC0.5050000@oracle.com> Looks good to me. dl On 7/21/2015 6:40 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch. > > https://bugs.openjdk.java.net/browse/JDK-8130309 > http://cr.openjdk.java.net/~thartmann/8130309/webrev.00/ > > Problem: > While C2 is emitting code, an assert is hit because the emitted code size does not correspond to the size of the instruction. The problem is that the code cache is full and therefore the creation of a to-interpreter stub failed. Instead of bailing out we continue and hit the assert. > > More precisely, we emit code for a CallStaticJavaDirectNode on aarch64 and emit two stubs (see 'aarch64_enc_java_static_call' in aarch64.ad): > - MacroAssembler::emit_trampoline_stub() -> requires 64 bytes > - CompiledStaticCall::emit_to_interp_stub() -> requires 56 bytes > However, we only have 112 bytes of free space in the stub section of the code buffer and need to expand it. Since the code cache is full, the expansion fails and the corresponding code blob is freed. In CodeBuffer::free_blob() -> CodeBuffer::set_blob() we set addresses to 'badAddress' and therefore hit the assert. > > Solution: > Even if we have enough space in the instruction section, we should check for a failed expansion of the stub section and bail out immediately. I added the corresponding check and also removed an unnecessary call to 'start_a_stub' after we already failed. 
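For context on the 8131675 RFR above, the snippets from that mail assemble into roughly the following shape; this is a sketch, not the actual test (test/compiler/unsafe/UnsafeGetConstantField.java), and note that Unsafe.getUnsafe() only succeeds for code on the boot class path:

```
import sun.misc.Unsafe;

class RawStoreSketch {
    static final Unsafe U = Unsafe.getUnsafe();            // boot class path only
    static final long nativeAddr = U.allocateMemory(16);   // folds to a ConP constant

    static void store() {
        // With both the address and the value constant, this parses into a raw
        // StoreP whose operands are two ConP nodes, the case EA did not expect.
        U.putAddress(nativeAddr, 0x12345678L);
    }
}
```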
> > Testing:
> - Failing test
> - JPRT
>
> Thanks,
> Tobias

From vladimir.x.ivanov at oracle.com  Tue Jul 21 18:48:07 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 21 Jul 2015 21:48:07 +0300
Subject: On constant folding of final field loads
In-Reply-To:
References: <558DFBEC.6040700@oracle.com>
Message-ID: <55AE93E7.9090608@oracle.com>

Ian,

Thanks for sharing the reference! Though the jython problem was about static final fields, it is still applicable to the discussion. Multiple writes to instance final fields are allowed by the JVMS and should be correctly handled.

There's still some mismatch in field finality between the JVMS & the JLS, but it seems the JVMS was slightly tightened up since 2010. The JVM is required to throw a runtime exception when attempting to write to a final field from outside of <init>/<clinit> [1]. JVMS 6 [2] allowed writes to final fields anywhere in the current class.

Fixing the problem for static final fields doesn't look like a hard problem. The JIT can inspect the field holder class state and skip the optimization if the class hasn't been fully initialized yet.

Best regards,
Vladimir Ivanov

[1] http://docs.oracle.com/javase/specs/jvms/se8/html/jvms-6.html#jvms-6.5.putfield:
"Otherwise, if the field is final, it must be declared in the current class, and the instruction must occur in an instance initialization method (<init>) of the current class. Otherwise, an IllegalAccessError is thrown."

[2] https://docs.oracle.com/javase/specs/jvms/se6/html/Instructions2.doc11.html
"Otherwise, if the field is final, it must be declared in the current class. Otherwise, an IllegalAccessError is thrown."

On 7/21/15 8:24 PM, Ian Rogers wrote:
> Fwiw, Jikes RVM will propagate final field values without any checking
> if reflection or JNI is being used to abuse the meaning of finality. As
> Jikes RVM is seldom used to run anything other than (elderly) benchmarks
> this has only turned up one problem, specifically in jython:
>
> http://bugs.jython.org/issue1611
>
> In the bug there is some (hopefully) interesting discussion on the
> meaning of finality, notably that the language and VM spec disagree on
> when final fields may be written. This was why the problem hadn't
> occurred with Java bytecode but did occur with Jython generated bytecode.
>
> A related topic maybe that perhaps Java finalizers should be allowed to
> write to final fields too, in particular for the case that they
> represent some kind of native resource.
>
> Thanks,
> Ian Rogers, Google.
>
>
> On Fri, Jun 26, 2015 at 6:27 PM, Vladimir Ivanov
> wrote:
>
>     Hi there,
>
>     Recently I started looking at constant folding of loads from
>     instance final fields:
>     https://bugs.openjdk.java.net/browse/JDK-8058164
>
>     I made some progress and wanted to share my findings and initiate a
>     discussion about the problems I spotted.
>
>     Current prototype:
>     http://cr.openjdk.java.net/~vlivanov/8058164/webrev.00/hotspot
>     http://cr.openjdk.java.net/~vlivanov/8058164/webrev.00/jdk
>
>     The idea is simple: JIT tracks final field changes and throws away
>     nmethods which are affected.
>
>     There are 2 parts of the problem:
>     - how to track changes of final fields;
>     - how to manage relations between final fields and nmethods.
>
>     I. Tracking changes of final fields
>
>     There are 4 ways to circumvent runtime limitations and change a
>     final field value:
>     - Reflection API (Field.setAccessible())
>     - Unsafe
>     - JNI
>     - java.lang.invoke (MethodHandles)
>
>     (It's also possible to write to a final field in a constructor, but
>     I consider it as a corner case and haven't addressed yet.
VM can > ignore ) > > Since Reflection & java.lang.invoke APIs use Unsafe, it ends up with > only 2 cases: JNI & Unsafe. > > For JNI it's possible to encode field "finality" in jfieldID and > check corresponding bit in Set*Field JNI methods before updating a > field. There are already some data encoded in the field ID, so > extending it to record final bit as well went pretty smooth. > > For Unsafe it's much more complex. > I started with a similar approach (mostly implemented in the current > prototype) - record "finality" bit in offset cookie and check it > when performing a write. > > Though Unsafe.objectFieldOffset/staticFieldOffset javadoc explicitly > states that returned value is not guaranteed to be a byte offset > [1], after following that road I don't see how offset encoding > scheme can be changed. > > First of all, there are Unsafe.get*Unaligned methods (added in 9), > which require byte offset (Unsafe.getLong()): > "Fetches a value at some byte offset into a given Java object > ... > The specification of this method is the same as {@link > #getLong(Object, long)} except that the offset does not need to > have been obtained from {@link #objectFieldOffset} on the > {@link java.lang.reflect.Field} of some Java field." > > Unsafe.getInt supports 3 addressing modes: > (1) NULL + address > (2) oop + offset > (3) base + index * scale > > Since there are no methods in Unsafe to get byte offsets, there's no > way to make (3) work with non-byte offsets for Unaligned versions. > Both base and scale should be either byte offsets or offset cookies > to make things work. You can get a sense of the problems looking > into Unsafe & java.nio hacks I did to make things somewhat function > after switching offset encoding strategy. > > Also, Unsafe.copyMemory() doesn't work well with offset cookies (see > java.nio.Bits changes I did). Though source and destination > addressing shares the same mode with Unsage.getInt() et al., the > size of the copied region is defined in bytes. So, in order to > perform bulk copies of consecutive memory blocks, the user should be > able to convert offset cookie to byte offset and vice versa. There's > no way to solve that with current API right now. > > I don't want to touch compatibility concerns of switching from byte > offsets to encoded offsets, but it looks like Unsafe API needs some > overhaul in 9 to make offset encoding viable. > > More realistically, since there are external dependencies on Unsafe > API, I'd prefer to leave sun.misc.Unsafe as is and switch to > VarHandles (when they are available in 9) all over JDK. Or > temporarily make a private copy (finally :-)) of field accessors > from Unsafe, switch it to encoded offsets, and use it in Reflection > & java.lang.invoke API. > > Regarding alternative approaches to track the finality, an offset > bitmap on per-class basis can be used (containing locations of final > fields). Possible downsides are: (1) memory footprint (1/8th of > instance size per class); and (2) more complex checking logic (load > a relevant piece of a bitmap from a klass, instead of checking > locally available offset cookie). The advantage is that it is > completely transparent to a user: it doesn't change offset > translation scheme. > > > II. Managing relations between final fields and nmethods > > Nmethods dependencies suits that purpose pretty well, but some > enhancements are needed. > > I envision 2 types of dependencies: (1) per-class (field holder); > and (2) per-instance (value holder). Field holder is used as a context. 
> > Unless a final field is changed, there's no need to track > per-instance dependency. VM optimistically starts in per-class mode > and switch to per-instance mode when it sees a field change. The > policy is chosen on per-class basis. VM should be pretty > conservative, since false positives are expensive - a change of > unrelated field causes recompilation of all nmethods where the same > field was inlined (even if the value was taken from a different > instance). Aliasing also causes false positives (same instance, but > different final field), so fields in the same class should be > differentiated as well. > > Unilke methods, fields don't have any dedicated metadata associated > with them. All data is confined in holder klass. To be able to > identify a field in a dependency, byte offset can be used. Right > now, dependency management machinery supports only oops and > metadata. So, it should be extended to support primitive values in > dependencies (haven't done yet). > > Byte offset + per-instance modes completely eliminates false positives. > > Another aspect is how expensive dependency checking becomes. > > I took a benchmark from Nashorn/Octane (Box2D), since MethodHandle > inlining heavily relies on constant folding of instance final fields. > > Before After > checks (#) 420 12,5K > nmethods checked(#) 3K 1,5M > total time: 60ms 2s > deps total 19K 26K > > Though total number of dependencies in VM didn't change much (+37% = > 19K->26K), total number of checked dependencies (500x: 3K -> 1,5M) > and time spent on dependency checking (30x: 60ms -> 2s) dramatically > increased. > > The reason is that constant field value dependencies created heavily > populated contextes which are regularly checked: > > #1 #2 #3/#4 > Before > KlassDep 254 47/2,632 > CallSiteDep 167 46/ 358 > > After > ConstantFieldDep 11,790 0/1,494,112 > KlassDep 286 41/ 2,769 > CallSiteDep 249 58/ 393 > > (#1 - dependency kind; #2 - total number of unique dependencies; > #3/#4 - invalidated nmethods/checked dependencies) > > I have 3 ideas how to improve performance of dependency checking: > > (1) split dependency context list (nmethodBucket) into 3 > independent lists (Klass, CallSite & ConstantValue); (IMPLEMENTED) > > It trades size for speed - duplicate nmethods are possible, but the > lists should be shorter on average. I already implemented it, but it > didn't improve the benchmark I'm playing with, since the fraction of > CallSite/Klass deps is very small compared to ConstantField. > > (2) group nmethodBucket entries into chunks of k-nmethods; (TODO) > > It should improve nmethod iteration speed in heavily populated contexts. > > (3) iterate only dependencies of appropriate kind; (TODO) > > There are 3 kinds of changes which require dependency checking: > changes in CHA (KlassDepChange), call site target change > (CallSiteDepChange), and constant field value change > (ConstantFieldDepChange). Different types of changes affect disjoint > sets of dependencies. So, instead of enumerating all dependencies in > a nmethod, a more focused approach can be used (e.g. check only > call_site_target_value deps for CallSiteDepChange). > > Since dependencies are sorted by type when serialized in a nmethod, > it's possible to compute offsets for 3 disjoint sets and use them in > DepStream to iterate only relevant dependencies. > > I hope it'll significantly reduce dependency checking costs I'm seeing. > > That's all for now. Thanks! 
> > Best regards,
> Vladimir Ivanov
>
> [1] "Do not expect to perform any sort of arithmetic on this offset;
> it is just a cookie which is passed to the unsafe heap memory
> accessors."

From john.r.rose at oracle.com  Tue Jul 21 19:02:43 2015
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 21 Jul 2015 12:02:43 -0700
Subject: [9] RFR(M): 8130832: Extend the WhiteBox API to provide information about the availability of compiler intrinsics
In-Reply-To: <55AE62FE.4070502@oracle.com>
References: <55A3EBF8.2020208@oracle.com> <55A484F7.4010705@oracle.com> <55A53862.7040308@oracle.com> <55A542D0.50100@oracle.com> <55A6A340.3050204@oracle.com> <55A6AA2B.5030005@oracle.com> <7CC60DCE-330D-4C1B-81FB-C2C3B4DFDEA7@oracle.com> <55AE62FE.4070502@oracle.com>
Message-ID:

Yes, that will work, and I think it is cleaner than what we had before, as well as providing the new required functionality.

Reviewed; please get a second reviewer.

-- John

P.S. If the unit tests want to test (via the whitebox API) whether an intrinsic was compiled successfully, we might want to expose Compile::gather_intrinsic_statistics, etc. But not in this change set.

P.P.S. As I think I said before, I wish we had a way to consolidate the switch statements further (into vmSymbols.hpp). But I don't see a clean way to do it.

On Jul 21, 2015, at 8:19 AM, Zoltán Majó wrote:
>
> Here is the newest webrev:
> - top: http://cr.openjdk.java.net/~zmajo/8130832/top/
> - hotspot: http://cr.openjdk.java.net/~zmajo/8130832/hotspot/webrev.03/

From john.r.rose at oracle.com  Tue Jul 21 19:04:42 2015
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 21 Jul 2015 12:04:42 -0700
Subject: On constant folding of final field loads
In-Reply-To: <55AE93E7.9090608@oracle.com>
References: <558DFBEC.6040700@oracle.com> <55AE93E7.9090608@oracle.com>
Message-ID: <2BB402A2-0A7B-4DF6-BE81-CC336663CEE7@oracle.com>

On Jul 21, 2015, at 11:48 AM, Vladimir Ivanov wrote:
>
> Fixing the problem for static final fields doesn't look like a hard problem. JIT can inspect field holder class state and skip the optimization if the class hasn't been fully initialized yet.

And a slushy bit (for use by reflection, etc.) can handle the corresponding non-static case.

-- John

From dean.long at oracle.com  Tue Jul 21 20:28:12 2015
From: dean.long at oracle.com (Dean Long)
Date: Tue, 21 Jul 2015 13:28:12 -0700
Subject: RFR (S) 8131682: C1 should use multibyte nops everywhere
In-Reply-To: <55AD0AE0.3060803@oracle.com>
References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> <55AD0AE0.3060803@oracle.com>
Message-ID: <55AEAB5C.8050307@oracle.com>

This version looks good.

dl

On 7/20/2015 7:51 AM, Aleksey Shipilev wrote:
> Hi Dean,
>
> Thanks for taking a look!
>
> Silly me, I should have left the call patching cases intact, because
> you're right, we should be able to patch the nops partially while still
> producing the correct instruction stream. Therefore, I reverted the
> cases where we do nop-ing for *instruction* patching, and added the
> comment there.
>
> Other places seem to use the nop sequences to provide the alignment, not
> for the general patching. Especially interesting for us is the case of
> aligning the patcheable immediate in the existing call. C2 does the nops
> in these cases.
> > New webrev:
> http://cr.openjdk.java.net/~shade/8131682/webrev.01/
>
> Testing:
> * JPRT -testset hotspot on open platforms;
> * Targeted benchmarks, plus eyeballing the assembly;
>
> Thanks,
> -Aleksey
>
> On 18.07.2015 10:51, Dean Long wrote:
>> I think we should distinguish the different uses and treat them
>> accordingly:
>>
>> 1) padding nops for patching, executed
>>
>> We need to be careful about inserting a fat nop here, if later patching
>> overwrites only part of the fat nop, resulting in an illegal instruction.
>>
>> 2) padding nops for patching, never executed
>>
>> It should be safe to insert a fat nop here, but there's no point if the
>> nops are not reachable and never executed.
>>
>> 3) alignment nops, never patched, executed
>>
>> Fat nops are fine, but on some CPUs branching may be even better, so I
>> suggest using align() for this, and letting align() decide what to
>> generate. The change in check_icache() could use a version of align
>> that takes the target offset as an argument:
>>
>> 348 align(CodeEntryAlignment, __ offset() + ic_cmp_size);
>>
>> 4) alignment nops, never patched, never executed
>>
>> Doesn't matter what we emit here, but we might as well make it
>> understandable by humans using a debugger.
>>
>> I believe the patching nops in c1_CodeStubs_x86.cpp and
>> c1_LIRAssembler.cpp are patched concurrently while the code is running,
>> not at a safepoint, so it's not clear to me if it's safe to use fat nops
>> on x86. I would consider those changes unsafe on x86 without further
>> analysis of what happens during patching.
>>
>> dl
>>
>> On 7/17/2015 6:29 AM, Aleksey Shipilev wrote:
>>> Hi there,
>>>
>>> C1 is not very good at inlining and intrinsifying methods, and hence the
>>> call performance is important there. One nit that we can see in the
>>> generated code on x86 is that C1 uses single-byte nops, even for
>>> long nop strides.
>>>
>>> This improvement fixes that:
>>> https://bugs.openjdk.java.net/browse/JDK-8131682
>>> http://cr.openjdk.java.net/~shade/8131682/webrev.00/
>>>
>>> Testing:
>>> - JPRT -testset hotspot on open platforms
>>> - eyeballing the generated assembly with -XX:TieredStopAtLevel=1
>>>
>>> (I understand the symmetric change is going to be needed in closed
>>> parts, but let's polish the open part first).
>>>
>>> Thanks,
>>> -Aleksey

From tobias.hartmann at oracle.com  Wed Jul 22 06:06:06 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 22 Jul 2015 08:06:06 +0200
Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space
In-Reply-To: <55AE8DC0.5050000@oracle.com>
References: <55AE4BC2.1090104@oracle.com> <55AE8DC0.5050000@oracle.com>
Message-ID: <55AF32CE.2000008@oracle.com>

Thanks, Dean.

Best,
Tobias

On 21.07.2015 20:21, Dean Long wrote:
> Looks good to me.
>
> dl
>
> On 7/21/2015 6:40 AM, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch.
>>
>> https://bugs.openjdk.java.net/browse/JDK-8130309
>> http://cr.openjdk.java.net/~thartmann/8130309/webrev.00/
>>
>> Problem:
>> While C2 is emitting code, an assert is hit because the emitted code size does not correspond to the size of the instruction. The problem is that the code cache is full and therefore the creation of a to-interpreter stub failed. Instead of bailing out we continue and hit the assert.
>>
>> More precisely, we emit code for a CallStaticJavaDirectNode on aarch64 and emit two stubs (see 'aarch64_enc_java_static_call' in aarch64.ad):
>> - MacroAssembler::emit_trampoline_stub() -> requires 64 bytes
>> - CompiledStaticCall::emit_to_interp_stub() -> requires 56 bytes
>> However, we only have 112 bytes of free space in the stub section of the code buffer and need to expand it.
>> >> More precisely, we emit code for a CallStaticJavaDirectNode on aarch64 and emit two stubs (see 'aarch64_enc_java_static_call' in aarch64.ad): >> - MacroAssembler::emit_trampoline_stub() -> requires 64 bytes >> - CompiledStaticCall::emit_to_interp_stub() -> requires 56 bytes >> However, we only have 112 bytes of free space in the stub section of the code buffer and need to expand it. Since the code cache is full, the expansion fails and the corresponding code blob is freed. In CodeBuffer::free_blob() -> CodeBuffer::set_blob() we set addresses to 'badAddress' and therefore hit the assert. >> >> Solution: >> Even if we have enough space in the instruction section, we should check for a failed expansion of the stub section and bail out immediately. I added the corresponding check and also removed an unnecessary call to 'start_a_stub' after we already failed. >> >> Testing: >> - Failing test >> - JPRT >> >> Thanks, >> Tobias > From adinn at redhat.com Wed Jul 22 07:47:06 2015 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 22 Jul 2015 08:47:06 +0100 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55AE88D7.4080005@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55AE88D7.4080005@oracle.com> Message-ID: <55AF4A7A.6080505@redhat.com> Hi Thomas, The patch looks good so it could be pushed as is as far as I am concerned (*not* a hotspot/openjdk reviewer). However, note that the ppc port also suffers from the same problem. It employs an identical routine emit_trampoline_stub defined in the Arch Description file (ppc.ad). You might want to include a tweak to the ppc code as part of this fix or maybe leave it to Volker/Goetz et al. n.b. the problem also affects both the ppc jdk8 code but is not present in AArch64 jdk8. Since the ppc fix probably needs a backport it might be better to leave that fix as a separate step? regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) From zoltan.majo at oracle.com Wed Jul 22 07:59:56 2015 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Wed, 22 Jul 2015 09:59:56 +0200 Subject: [9] RFR(M): 8130832: Extend the WhiteBox API to provide information about the availability of compiler intrinsics In-Reply-To: References: <55A3EBF8.2020208@oracle.com> <55A484F7.4010705@oracle.com> <55A53862.7040308@oracle.com> <55A542D0.50100@oracle.com> <55A6A340.3050204@oracle.com> <55A6AA2B.5030005@oracle.com> <7CC60DCE-330D-4C1B-81FB-C2C3B4DFDEA7@oracle.com> <55AE62FE.4070502@oracle.com> Message-ID: <55AF4D7C.1030706@oracle.com> Hi John, On 07/21/2015 09:02 PM, John Rose wrote: > Yes, that will work, and I think it is cleaner than what we had > before, as well as providing the new required functionality. > > Reviewed; please get a second reviewer. thank you for the review! I'll ask Vladimir K., maybe he has time to look at the newest webrev. > > ? John > > P.S. If the unit tests want to test (via the whitebox API) whether an > intrinsic was compiled successfully, we might want to > expose Compile::gather_intrinsic_statistics, etc. But not in this > change set. That is an interesting idea. We'd also have to see if current tests require such functionality or if SQE plans to add tests requiring that functionality. > > P.P.S. 
As I think I said before, I wish we had a way to consolidate
> the switch statements further (into vmSymbols.hpp). But I don't see a
> clean way to do it.

Yes, that would be nice. I've not seen a good way to do that, partly because of inconsistencies in the way C1 and C2 depend on the values of command-line flags.

Thank you!

Best regards,


Zoltán

>
> On Jul 21, 2015, at 8:19 AM, Zoltán Majó wrote:
>>
>> Here is the newest webrev:
>> - top: http://cr.openjdk.java.net/~zmajo/8130832/top/
>>
>> - hotspot: http://cr.openjdk.java.net/~zmajo/8130832/hotspot/webrev.03/
>>
>

From tobias.hartmann at oracle.com Wed Jul 22 08:04:57 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 22 Jul 2015 10:04:57 +0200
Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space
In-Reply-To: <55AF4A7A.6080505@redhat.com>
References: <55AE4BC2.1090104@oracle.com> <55AE88D7.4080005@oracle.com> <55AF4A7A.6080505@redhat.com>
Message-ID: <55AF4EA9.6070002@oracle.com>

Hi Andrew,

On 22.07.2015 09:47, Andrew Dinn wrote:
> Hi Thomas,
>
> The patch looks good so it could be pushed as is as far as I am
> concerned (*not* a hotspot/openjdk reviewer).

Thanks for taking a look!

> However, note that the ppc port also suffers from the same problem. It
> employs an identical routine emit_trampoline_stub defined in the Arch
> Description file (ppc.ad). You might want to include a tweak to the ppc
> code as part of this fix or maybe leave it to Volker/Goetz et al.

What problem are you referring to? The actual problem I fixed with this patch is platform independent. It's caused by C2 code not bailing out if the platform dependent code was unable to create a stub (see changes in 'output.cpp'). I only removed the call to 'start_a_stub' in the aarch64 code because it is useless. I don't see this call on ppc though.

Thanks,
Tobias

> n.b. the problem also affects the ppc jdk8 code but is not present
> in AArch64 jdk8. Since the ppc fix probably needs a backport it might be
> better to leave that fix as a separate step?
>
> regards,
>
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in UK and Wales under Company Registration No. 3798903
> Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
> (USA), Michael O'Neill (Ireland)
>

From aleksey.shipilev at oracle.com Wed Jul 22 08:10:51 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Wed, 22 Jul 2015 11:10:51 +0300
Subject: RFR (S) 8131682: C1 should use multibyte nops everywhere
In-Reply-To: <55AEAB5C.8050307@oracle.com>
References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> <55AD0AE0.3060803@oracle.com> <55AEAB5C.8050307@oracle.com>
Message-ID: <55AF500B.9000505@oracle.com>

Thanks for review, Dean!

I'd like to hear the opinions of AArch64 and Power folks, since we contaminate their assemblers a bit to gain access to x86 fat nops.

-Aleksey

On 21.07.2015 23:28, Dean Long wrote:
> This version looks good.
>
> dl
>
> On 7/20/2015 7:51 AM, Aleksey Shipilev wrote:
>> Hi Dean,
>>
>> Thanks for taking a look!
>>
>> Silly me, I should have left the call patching cases intact, because
>> you're right, we should be able to patch the nops partially while still
>> producing the correct instruction stream. Therefore, I reverted the
>> cases where we do nop-ing for *instruction* patching, and added the
>> comment there.
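(As a point of reference, the x86 "fat nops" in question are the multi-byte NOP forms recommended by the Intel manuals -- illustrative encodings only, not a quote of the webrev:)

    // What Assembler::nop(int) can emit instead of a run of single 0x90s:
    //   1 byte : 90                        nop
    //   2 bytes: 66 90                     nop (with operand-size prefix)
    //   3 bytes: 0F 1F 00                  nop DWORD ptr [eax]
    //   4 bytes: 0F 1F 40 00               nop DWORD ptr [eax+0]
    //   8 bytes: 0F 1F 84 00 00 00 00 00   nop DWORD ptr [eax+eax*1+0]
    // A 10-byte pad thus becomes one 8-byte NOP plus one 2-byte NOP:
    // two instructions for the decoder instead of ten.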
>> >> Other places seem to use the nop sequences to provide the alignment, not >> for the general patching. Especially interesting for us is the case of >> aligning the patcheable immediate in the existing call. C2 does the nops >> in these cases. >> >> New webrev: >> http://cr.openjdk.java.net/~shade/8131682/webrev.01/ >> >> Testing: >> * JPRT -testset hotspot on open platforms; >> * Targeted benchmarks, plus eyeballing the assembly; >> >> Thanks, >> -Aleksey >> >> On 18.07.2015 10:51, Dean Long wrote: >>> I think we should distinguish the different uses and treat them >>> accordingly: >>> >>> 1) padding nops for patching, executed >>> >>> We need to be careful about inserting a fat nop here, if later patching >>> overwrites only part of the fat nop, resulting in an illegal intruction. >>> >>> 2) padding nops for patching, never executed >>> >>> It should be safe insert a fat nop here, but there's no point if the >>> nops are not reachable and never executed. >>> >>> >>> 3) alignment nops, never patched, executed >>> >>> Fat nops are fine, but on some CPUs branching may be even better, so I >>> suggest using align() for this, and letting align() decide what to >>> generate. The change in check_icache() could use a version of align >>> that takes the target offset as an argument: >>> >>> 348 align(CodeEntryAlignment,__ offset() + ic_cmp_size); >>> >>> 4) alignment nops, never patched, never executed >>> >>> Doesn't matter what we emit here, but we might as well make it >>> understandable by humans using a debugger. >>> >>> >>> I believe the patching nops in c1_CodeStubs_x86.cpp and >>> c1_LIRAssembler.cpp are patched concurrently while the code is running, >>> not at a safepoint, so it's not clear to me if it's safe to use fat nops >>> on x86. I would consider those changes unsafe on x86 without further >>> analysis of what happens during patching. >>> >>> dl >>> >>> On 7/17/2015 6:29 AM, Aleksey Shipilev wrote: >>>> Hi there, >>>> >>>> C1 is not very good at inlining and intrisifying methods, and hence the >>>> call performance is important there. One nit that we can see in the >>>> generated code on x86 is that C1 uses the single-byte nops, even for >>>> long nop strides. >>>> >>>> This improvement fixes that: >>>> https://bugs.openjdk.java.net/browse/JDK-8131682 >>>> http://cr.openjdk.java.net/~shade/8131682/webrev.00/ >>>> >>>> Testing: >>>> - JPRT -testset hotspot on open platforms >>>> - eyeballing the generated assembly with -XX:TieredStopAtLevel=1 >>>> >>>> (I understand the symmetric change is going to be needed in closed >>>> parts, but let's polish the open part first). >>>> >>>> Thanks, >>>> -Aleksey >>>> >> > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From adinn at redhat.com Wed Jul 22 08:32:59 2015 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 22 Jul 2015 09:32:59 +0100 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55AF4EA9.6070002@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55AE88D7.4080005@oracle.com> <55AF4A7A.6080505@redhat.com> <55AF4EA9.6070002@oracle.com> Message-ID: <55AF553B.4060906@redhat.com> On 22/07/15 09:04, Tobias Hartmann wrote: > On 22.07.2015 09:47, Andrew Dinn wrote: . . . >> However, note that the ppc port also suffers from the same problem. 
>> It employs an identical routine emit_trampoline_stub defined in the
>> Arch Description file (ppc.ad). You might want to include a tweak
>> to the ppc code as part of this fix or maybe leave it to
>> Volker/Goetz et al.
>
> What problem are you referring to? The actual problem I fixed with
> this patch is platform independent. It's caused by C2 code not
> bailing out if the platform dependent code was unable to create a
> stub (see changes in 'output.cpp'). I only removed the call to
> 'start_a_stub' in the aarch64 code because it is useless. I don't see
> this call on ppc though.

Oops, apologies, this is my mistake! The AArch64 code was cloned off the ppc code and when I looked at the ppc versions this morning I saw they included the same repeated call to start_a_stub that you removed from the AArch64 tree. Evidently I was still high on crack from last night's Dionysian debauch (or something like that :-) since, as you say, there is no such call.

I'll just go get another cup of coffee . . .

regards,

Andrew Dinn
-----------

From tobias.hartmann at oracle.com Wed Jul 22 09:09:01 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 22 Jul 2015 11:09:01 +0200
Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space
In-Reply-To: <55AF553B.4060906@redhat.com>
References: <55AE4BC2.1090104@oracle.com> <55AE88D7.4080005@oracle.com> <55AF4A7A.6080505@redhat.com> <55AF4EA9.6070002@oracle.com> <55AF553B.4060906@redhat.com>
Message-ID: <55AF5DAD.1030608@oracle.com>

Okay, no worries :)

Best,
Tobias

On 22.07.2015 10:32, Andrew Dinn wrote:
> On 22/07/15 09:04, Tobias Hartmann wrote:
>> On 22.07.2015 09:47, Andrew Dinn wrote: . . .
>>> However, note that the ppc port also suffers from the same problem.
>>> It employs an identical routine emit_trampoline_stub defined in the
>>> Arch Description file (ppc.ad). You might want to include a tweak
>>> to the ppc code as part of this fix or maybe leave it to
>>> Volker/Goetz et al.
>>
>> What problem are you referring to? The actual problem I fixed with
>> this patch is platform independent. It's caused by C2 code not
>> bailing out if the platform dependent code was unable to create a
>> stub (see changes in 'output.cpp'). I only removed the call to
>> 'start_a_stub' in the aarch64 code because it is useless. I don't see
>> this call on ppc though.
>
> Oops, apologies, this is my mistake! The AArch64 code was cloned off the
> ppc code and when I looked at the ppc versions this morning I saw they
> included the same repeated call to start_a_stub that you removed from
> the AArch64 tree. Evidently I was still high on crack from last night's
> Dionysian debauch (or something like that :-) since, as you say, there
> is no such call.
>
> I'll just go get another cup of coffee . . .
> > regards, > > > Andrew Dinn > ----------- > From edward.nevill at gmail.com Wed Jul 22 09:50:25 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Wed, 22 Jul 2015 10:50:25 +0100 Subject: [aarch64-port-dev ] RFR: 8131362: aarch64: C2 does not handle large stack offsets In-Reply-To: <55A8CE3C.1050203@redhat.com> References: <1437036374.31596.18.camel@mylittlepony.linaroharston> <55A7FCA1.6010008@oracle.com> <1437123159.29276.16.camel@mint> <55A8C45D.2070009@redhat.com> <1437124359.29276.18.camel@mint> <55A8CAF5.5060601@redhat.com> <55A8CE3C.1050203@redhat.com> Message-ID: <1437558625.14729.11.camel@mylittlepony.linaroharston> On Fri, 2015-07-17 at 10:43 +0100, Andrew Haley wrote: > On 17/07/15 10:29, Andrew Haley wrote: > > On 17/07/15 10:12, Edward Nevill wrote: > >> On Fri, 2015-07-17 at 10:01 +0100, Andrew Haley wrote: > >>> On 17/07/15 09:52, Edward Nevill wrote: > >>>>>> Should it be +8 instead of +4? Or these offsets are not in bytes?: > >>>>>> > >>>>>> + unspill(rscratch1, true, src_offset); > >>>>>> + spill(rscratch1, true, dst_offset); > >>>>>> + unspill(rscratch1, true, src_offset+4); > >>>>>> + spill(rscratch1, true, dst_offset+4); > >>>> Ouch! Good catch. > >>>> > >>>> New webrev. > >>>> > >>>> http://cr.openjdk.java.net/~enevill/8131362/webrev.03/ > >>> > >>> I'm a bit more concerned that this did not fail in testing. I guess > >>> there were no tests at all for stack-stack spills. > >> > >> Correct. And it would have to be a 128 bit vector stack-stack spill with > >> an offset >= 512. How would you even provoke such a thing. > > > > With a highly-vectorizable test case with a zillion temporaries, I guess. > > Thinking some more: I think I'd add some special code to test it all > once, then delete the special code. If that's what it takes, there > isn't much choice. So what I did was I reduced the number of vector registers from 32 to 2. Even then spill_copy128 was never called. So maybe it just never does stack-stack spills on vector registers. However it does do stack-stack spills on general purpose registers. So I reduced the number of general purpose registers to 2 and faked the size of the general purpose registers at 128 instead of 64. spill_copy128 was then called and I verified that the generated code was correct. I also verified that the code was actually being executed by setting a breakpoint on the spill copy code. I have also verified that both branches of the if in spill_copy128 are tested by changing the condition to force execution of each branch. So, although we still don't know if it will ever call spill_copy128 for a vector register at least we have confidence that the spill_copy code does the right thing if it is ever called. OK to push? Ed. From roland.westrelin at oracle.com Wed Jul 22 13:54:56 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Wed, 22 Jul 2015 15:54:56 +0200 Subject: [9] RFR (XS): 8131675: EA fails with assert(false) failed: not unsafe or G1 barrier raw StoreP In-Reply-To: <55AE8D5A.1020906@oracle.com> References: <55AE8D5A.1020906@oracle.com> Message-ID: > http://cr.openjdk.java.net/~vlivanov/8131675/webrev.00 Looks good to me. Roland. From cnewland at chrisnewland.com Wed Jul 22 14:04:03 2015 From: cnewland at chrisnewland.com (Chris Newland) Date: Wed, 22 Jul 2015 15:04:03 +0100 Subject: Lock coarsening LogCompilation output? 
Message-ID:

Hi,

I'm building support into JITWatch for highlighting eliminated heap allocations and locks (either via elision or coarsening) and I'm confused by the LogCompilation output in this case:

Given the source code here:
https://github.com/AdoptOpenJDK/jitwatch/blob/master/src/main/resources/examples/LockCoarsen.java

Which consists of two synchronized(this) regions separated by a single statement modifying a local primitive.

I get the following LogCompilation output:
https://gist.github.com/chriswhocodes/124984ce8078290485e7

Which has the following elimination info in the optimizer phase:

<eliminate_lock lock='1'>
<jvms bci='61' method='...'/>
</eliminate_lock>
<eliminate_lock lock='0'>
</eliminate_lock>

BCI 61 refers to the 2nd synchronized block's monitor enter which I believe will be eliminated due to coarsening, but I don't understand what the <eliminate_lock lock='0'> tag means, and why it doesn't contain a jvms tag?

Does it mean the first synchronized block has also been eliminated (via elision)? I don't see any "lock cmpxchg" related to the first synchronized block in the PrintAssembly?

Thanks,

Chris

From vladimir.x.ivanov at oracle.com Wed Jul 22 16:14:26 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 22 Jul 2015 19:14:26 +0300
Subject: On constant folding of final field loads
In-Reply-To:
References: <558DFBEC.6040700@oracle.com> <55911F68.9050809@oracle.com> <559143D2.2020201@oracle.com> <5592662C.40400@oracle.com> <5592E75A.1000500@oracle.com>
Message-ID: <55AFC162.4030807@oracle.com>

John,

>> In order to avoid surprises and inconsistencies (old value vs new value depending on execution path) which are *very* hard to track down, VM should either completely forbid final field changes or keep track of them and adapt accordingly.
>
> I like the "forbid" option, also known as "fail fast". I think (in general)
> we should (where we can) remove indeterminate behavior from the
> JVM specification, such as "what happens when I store a new value
> to a final at an unexpected time".
>
> We have enough bits in the object header to encode frozen-ness.
> This is an opposite property: slushiness-of-finals. We could require
> that the newInstance operation used by deserialization would create
> slushy objects. (The normal new/<init> sequence doesn't need this.)
> Ideally, we would want the deserializer to issue an explicit "publish"
> operation, which would clear the slushy flag. JITs would consult
> that flag to gate final-folding. Reflection (and other users of
> Unsafe) would consult the flag and throw a fail-fast error if it
> failed. There would have to be some way to limit the time
> an object is in the slushy state, ideally by enforcing an error
> on deserializers who neglect to publish (or discard) a slushy
> object. For example, we could require an annotation on
> deserialization methods, as we do today on caller-sensitive
> methods.
>
> That's the sort of thing I would prefer to see, to remove
> indeterminate behavior.

Yes, the fail-fast approach is very appealing and I like the slushy-bit idea, but it is more intrusive (from the user perspective), unfortunately.

Though Reflection & MethodHandles can be instrumented with slushy-bit checks and Unsafe left as-is, what can be done for JNI?

Do you think it is acceptable for the JVM to throw exceptions on "illegal" (slushy bit off) final field writes? The SetXXXField JNI functions aren't declared to throw any exceptions [1], so it seems like an intrusive change, even with JNI spec adjustments.
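(To make the fail-fast idea concrete, the kind of VM-side gate being debated might look like the following. This is purely hypothetical -- is_slushy() and the exception choice are invented here for illustration; no such HotSpot API exists:)

    // Hypothetical fail-fast gate for reflective/JNI stores to finals:
    void check_final_field_store(oop obj, fieldDescriptor* fd, TRAPS) {
      if (fd->is_final() && !obj->mark()->is_slushy()) { // is_slushy(): invented
        THROW_MSG(vmSymbols::java_lang_IllegalStateException(),
                  "store to final field of an already-published object");
      }
    }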
Thinking more about "slushiness", limiting the time an object is in that state doesn't look like an easy problem to solve, considering there are 3 interacting operations (instantiate, initialize, freeze). VM can do some analysis to ensure slushy objects don't escape, but it looks either too fragile or too complex to implement. I don't see a big problem with "runaway" objects with slushy bit on. It means JIT will be always conservative when working with them. Or are you mostly concerned about abuse of slushy objects? It can be mitigated by: (1) requiring a user to perform additional actions to set slushy bit (e.g. calling specialized newInstance() equivalent or marking caller method akin to caller-sensitive methods); in that case it is less likely a user won't freeze previously allocated object; (2) providing VM diagnostics to detect runaway slushy objects; In the end, it is expert level API. I don't think many people write their own deserialization frameworks :-) If you bungle it, you are on your own. >> Though offset value is explicitly described in API as an opaque offset cookie, I spotted 2 inconsistencies in the API itself: >> >> * Unsafe.get/set*Unaligned() require absolute offsets; >> These methods were added in 9, so haven't leaked into public yet. > > Yep. That seems to push for a high-tag (color bits in the MSB of > the offset), or (my preference) no tag or separate tag. > You could also copy the alignment bits into the LSB to co-exist > with a tag. > > (The "separate tag" option means something like having a > query for the "tag" as well as the base and offset of a variable. > The operations getInt, etc., would take an optional third argument, > which would be the tag associated with the base and offset. > This would allow address arithmetic to remain trivial, at > the expense of retooling uses of Unsafe that need to be > sensitive to tagging concerns.) Separate tag has the same shortcoming as high-tag: it is not translated into a single machine instruction. VM needs to inspect the tag before performing a field update, though original setXXX() methods aren't affected. Considering fail-fast approach and slushy bits, I'm inclined to leave Unsafe as is (no tag approach). Probably, adding diagnostic mode to VM signalling when a final field is updated using Unsafe. Best regards, Vladimir Ivanov [1] https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#Set_type_Field_routines From vladimir.x.ivanov at oracle.com Wed Jul 22 17:10:03 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 22 Jul 2015 20:10:03 +0300 Subject: Lock coarsening LogCompilation output? In-Reply-To: References: Message-ID: <55AFCE6B.1080108@oracle.com> Chris, Second eliminate_lock corresponds to unlock operation (lock='0'), which should be also eliminated (corresponding diagnostic logic in 9 [1]). VM doesn't attach JVM state to the unlock node in the product binaries (see as_Unlock()->dbg_jvms()), but it is present in fastdebug/slowdebug binaries [2]. Not sure why the state is pruned in product. Probably, there are some performance after-effects from keeping the state for unlock node. 
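(For readers following the LogCompilation output: the diagnostic has roughly this shape -- a simplified sketch of the logic referenced in [1], not the exact source; the real code records more attributes:)

    // Emitted per eliminated monitorenter/monitorexit pair:
    // lock='1' for the lock node, lock='0' for the unlock node.
    void log_eliminated_lock(CompileLog* log, bool is_lock, JVMState* jvms) {
      log->head("eliminate_lock lock='%d'", is_lock ? 1 : 0);
      if (jvms != NULL) { // debug info is pruned for unlock nodes in product
        log->elem("jvms bci='%d' method='%d'",
                  jvms->bci(), log->identify(jvms->method()));
      }
      log->tail("eliminate_lock");
    }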
Best regards,
Vladimir Ivanov

[1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/0d3c20ac648e/src/share/vm/opto/callnode.cpp#l1886
[2] latest hs-comp 9 fastdebug build:

On 7/22/15 5:04 PM, Chris Newland wrote:
> Hi,
>
> I'm building support into JITWatch for highlighting eliminated heap
> allocations and locks (either via elision or coarsening) and I'm confused
> by the LogCompilation output in this case:
>
> Given the source code here:
> https://github.com/AdoptOpenJDK/jitwatch/blob/master/src/main/resources/examples/LockCoarsen.java
>
> Which consists of two synchronized(this) regions separated by a single
> statement modifying a local primitive.
>
> I get the following LogCompilation output:
> https://gist.github.com/chriswhocodes/124984ce8078290485e7
>
> Which has the following elimination info in the optimizer phase:
>
> <eliminate_lock lock='1'>
> <jvms bci='61' method='...'/>
> </eliminate_lock>
> <eliminate_lock lock='0'>
> </eliminate_lock>
>
> BCI 61 refers to the 2nd synchronized block's monitor enter which I
> believe will be eliminated due to coarsening, but I don't understand what
> the <eliminate_lock lock='0'> tag means, and why it doesn't contain a
> jvms tag?
>
> Does it mean the first synchronized block has also been eliminated (via
> elision)? I don't see any "lock cmpxchg" related to the first
> synchronized block in the PrintAssembly?
>
> Thanks,
>
> Chris

From vladimir.x.ivanov at oracle.com Wed Jul 22 17:21:45 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 22 Jul 2015 20:21:45 +0300
Subject: [9] RFR (XS): 8131675: EA fails with assert(false) failed: not unsafe or G1 barrier raw StoreP
In-Reply-To:
References: <55AE8D5A.1020906@oracle.com>
Message-ID: <55AFD129.40109@oracle.com>

Thanks, Roland!

Best regards,
Vladimir Ivanov

On 7/22/15 4:54 PM, Roland Westrelin wrote:
>> http://cr.openjdk.java.net/~vlivanov/8131675/webrev.00
>
> Looks good to me.
>
> Roland.
>

From varming at gmail.com Wed Jul 22 17:37:36 2015
From: varming at gmail.com (Carsten Varming)
Date: Wed, 22 Jul 2015 13:37:36 -0400
Subject: Fwd: Tiered compilation and virtual call heuristics
In-Reply-To:
References:
Message-ID:

Dear Hotspot compiler group,

I have had a few issues with tiered compilation in JDK8 lately and was wondering if you have some comments or ideas for the given problem.

Here is my problem as I currently understand it. Feel free to correct any misunderstandings I may have. With tiered compilation the heuristics for inlining virtual calls seem to degrade quite a bit. I think this is due to MethodData objects being created much earlier with tiered than without. This causes the tracking of the hottest target methods at a virtual call site to go awry, due to the limit (2) on the number of MethodData objects that can be associated with a bci in a method. It seems like the only virtual call targets tracked are the targets that are warm when C1 is invoked.
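(Conceptually, the per-call-site receiver profile being described is this -- an illustrative layout, not the actual MethodData/ReceiverTypeData classes:)

    // Two receiver rows per virtual call site. Once both rows are claimed
    // by receivers that were warm during early (tier 3) profiling, a
    // later-hot receiver never gets a row; its calls only grow the
    // row-less counter, so C2 sees a seemingly megamorphic site.
    struct ReceiverRow     { Klass* receiver; uint count; };
    struct CallSiteProfile {
      ReceiverRow rows[2];  // cf. the limit (2) mentioned above
      uint        count;    // calls whose receiver matched no row
    };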
Without tiered Instance$1.count is the only hot method. I wonder if you guys have seen this problem in the wild or if I just happen to be unlucky. Increasing BciProfileWidth should help in my case, but it is not a product flag. Do you have any experience regarding cost of increasing BciProfileWidth? Do you have any thoughts on throwing out MethodData objects for virtual call sites that turns out to be pretty cold? Thank you in advance for your thoughts, Carsten -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Main.java Type: text/x-java Size: 906 bytes Desc: not available URL: From vladimir.x.ivanov at oracle.com Wed Jul 22 18:14:24 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 22 Jul 2015 21:14:24 +0300 Subject: [9] RFR (XXS): 8132168: PrintIdealGraphLevel range should be [0..4] Message-ID: <55AFDD80.2050009@oracle.com> http://cr.openjdk.java.net/~vlivanov/8132168/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8132168 PrintIdealGraphLevel=4 is used to dump IR during parsing (see parse2.cpp:2387). Best regards, Vladimir Ivanov From john.r.rose at oracle.com Wed Jul 22 20:14:44 2015 From: john.r.rose at oracle.com (John Rose) Date: Wed, 22 Jul 2015 13:14:44 -0700 Subject: On constant folding of final field loads In-Reply-To: <55AFC162.4030807@oracle.com> References: <558DFBEC.6040700@oracle.com> <55911F68.9050809@oracle.com> <559143D2.2020201@oracle.com> <5592662C.40400@oracle.com> <5592E75A.1000500@oracle.com> <55AFC162.4030807@oracle.com> Message-ID: <40E34041-BE42-4F8F-8C81-7DD4ACABD380@oracle.com> On Jul 22, 2015, at 9:14 AM, Vladimir Ivanov wrote: > >> That's the sort of thing I would prefer to see, to remove >> indeterminate behavior. > Yes, fail-fast approach is very appealing and I like slushy bit idea, but it is more intrusive (from user perspective), unfortunately. > > Though Reflection & MethodHandles can be instrument with slushy bit checks and Unsafe left as-is, what can be done for JNI? I would prefer to treat JNI (for these use cases) the same as Unsafe, to keep our story simpler. But it is true that we could add a few more checks to JNI than Unsafe. > Do you think it is acceptable for JVM to throw exceptions on "illegal" (slushy bit off) final field writes? Yes, that's the key feature of "fail fast". The user finds out immediately. > SetXXXField JNI functions aren't declared to throw any exceptions [1], so it seems like an intrusive change, even with JNI spec adjustments. In some cases (in the JMM legal framework) we would be within our rights to silently nullify (ignore and discard) illegal stores to final fields (in any API). Nullification is indistinguishable from the store occurring but never being observed in a future read. This is possible if either the store is delayed indefinitely, or if all threads (and compiled methods) have previously performed a caching read of the original final value. Of course, the threads would not have physically done so, but perhaps the JMM could be tortured to allow some sort of OOTA caching read of the original final value. But silent nullification is a desperation move. Users deserve an exception or other signal so they can fix their code quickly. > Thinking more about "slushiness", limiting the time an object is in that state doesn't look like an easy problem to solve, considering there are 3 interacting operations (instantiate, initialize, freeze). 
The important thing is to get library writers to "put a lid" on the slushy state by adding an explicit freeze operation. A mixture of performance and robustness concerns would motivate most of them to get with the program. But the non-compliant libraries would, as you say, leak slushy objects. Normally allocated objects should not have the slushy bit set, except (perhaps, but probably not) when they escape during construction. > VM can do some analysis to ensure slushy objects don't escape, but it looks either too fragile or too complex to implement. Agree. > I don't see a big problem with "runaway" objects with slushy bit on. It means JIT will be always conservative when working with them. Or are you mostly concerned about abuse of slushy objects? A slushy object won't optimize fully (in all cases). And its surprising mutability provides a little window for bugs to sneak in. It would be on library writers to fix this. I suppose we could also add an operation to assert frozen-ness, either inside the JIT or explicitly as a tool for users. In some cases, a JIT could generate good code optimistically for properly frozen objects, and de-opt when it runs into a stray slushy object. If the assertion were at the user level, it could fail with an exception; this isn't exactly fail-fast since the bad operation that leaked the slushy object would be long gone. > It can be mitigated by: > (1) requiring a user to perform additional actions to set slushy bit (e.g. calling specialized newInstance() equivalent or marking caller method akin to caller-sensitive methods); in that case it is less likely a user won't freeze previously allocated object; (Even better than annotations would be split types, a la the verifier. But the JVM does not have a highly refined type system.) Any hack that creates a blank instance (Unsafe.newInstance, JNIEnv.AllocObject) without also running a constructor can and should set the slushy bit. This is something we can control, ultimately. > (2) providing VM diagnostics to detect runaway slushy objects; (Yes, such as a user-visible assertion.) > In the end, it is expert level API. I don't think many people write their own deserialization frameworks :-) If you bungle it, you are on your own. And if you own a deserialization framework, you need to read the news and update your code. > ? > Separate tag has the same shortcoming as high-tag: it is not translated into a single machine instruction. VM needs to inspect the tag before performing a field update, though original setXXX() methods aren't affected. > > Considering fail-fast approach and slushy bits, I'm inclined to leave Unsafe as is (no tag approach). Probably, adding diagnostic mode to VM signalling when a final field is updated using Unsafe. That sounds pretty good. One caveat: Unsafe and automagic behavior (even in debug mode) don't usually go together. An alternative to automagic mode would be a *separate* Unsafe API to enable library writers to make the check (in their own debug mode). Something that would decode the (Object,long) pair into a java.lang.reflect.Field (or similar name), which could then be checked for finality. Plus the aforementioned "assertSlushy" operation. ? John -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From michael.c.berg at intel.com Thu Jul 23 05:55:12 2015
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Thu, 23 Jul 2015 05:55:12 +0000
Subject: RFR: 8132160 - support for AVX 512 call frames and stack management
Message-ID:

Hi Folks,

I would like to contribute AVX 512 call frame and stack management changes. I need two reviewers to examine this patch and comment as needed:

Bug-id: https://bugs.openjdk.java.net/browse/JDK-8132160
webrev: http://cr.openjdk.java.net/~mcberg/8132160/webrev.01/

These changes simplify frame management on 32-bit and 64-bit systems which support EVEX and extend more complete frame save and restore functionality as well as stack management for calls, traps and explicit exception paths. These changes also move CPUID queries into the assembler object state and add more state rules to a large class of instructions while simplifying their use. Also added is support for vectorizing double precision sqrt which is available through the math library.

Many generated stubs and internal functions also now have predicated mask management for EVEX added.

Thanks,
Michael

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From roland.westrelin at oracle.com Thu Jul 23 07:20:08 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Thu, 23 Jul 2015 09:20:08 +0200
Subject: [9] RFR (XXS): 8132168: PrintIdealGraphLevel range should be [0..4]
In-Reply-To: <55AFDD80.2050009@oracle.com>
References: <55AFDD80.2050009@oracle.com>
Message-ID: <5FF697E2-3D94-4650-A213-B3D3F4F6354B@oracle.com>

> http://cr.openjdk.java.net/~vlivanov/8132168/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8132168
>
> PrintIdealGraphLevel=4 is used to dump IR during parsing (see parse2.cpp:2387).

Ok. Do we want to update the description string:

369           "Level of detail of the ideal graph printout. "                  \
370           "System-wide value, 0=nothing is printed, 3=all details printed. "\
371           "Level of detail of printouts can be set on a per-method level "  \
372           "as well by using CompileCommand=option.")

to cover the "4" case?

Roland.

From aph at redhat.com Thu Jul 23 09:23:17 2015
From: aph at redhat.com (Andrew Haley)
Date: Thu, 23 Jul 2015 10:23:17 +0100
Subject: [aarch64-port-dev ] RFR: 8131362: aarch64: C2 does not handle large stack offsets
In-Reply-To: <1437558625.14729.11.camel@mylittlepony.linaroharston>
References: <1437036374.31596.18.camel@mylittlepony.linaroharston> <55A7FCA1.6010008@oracle.com> <1437123159.29276.16.camel@mint> <55A8C45D.2070009@redhat.com> <1437124359.29276.18.camel@mint> <55A8CAF5.5060601@redhat.com> <55A8CE3C.1050203@redhat.com> <1437558625.14729.11.camel@mylittlepony.linaroharston>
Message-ID: <55B0B285.7010904@redhat.com>

On 22/07/15 10:50, Edward Nevill wrote:
> So, although we still don't know if it will ever call spill_copy128 for a vector register at least we have confidence that the spill_copy code does the right thing if it is ever called.
>
> OK to push?

Thanks. Fine by me,

Andrew.

From aph at redhat.com Thu Jul 23 10:02:19 2015
From: aph at redhat.com (Andrew Haley)
Date: Thu, 23 Jul 2015 11:02:19 +0100
Subject: User-defined deserialization [Was: On constant folding of final field loads]
In-Reply-To:
References: <558DFBEC.6040700@oracle.com> <55911F68.9050809@oracle.com> <559143D2.2020201@oracle.com> <5592662C.40400@oracle.com> <5592E75A.1000500@oracle.com>
Message-ID: <55B0BBAB.6020604@redhat.com>

On 21/07/15 00:05, John Rose wrote:
> We have enough bits in the object header to encode frozen-ness.
> This is an opposite property: slushiness-of-finals. We could
> require that the newInstance operation used by deserialization would
> create slushy objects. (The normal new/<init> sequence doesn't need
> this.) Ideally, we would want the deserializer to issue an explicit
> "publish" operation, which would clear the slushy flag. JITs would
> consult that flag to gate final-folding. Reflection (and other
> users of Unsafe) would consult the flag and throw a fail-fast error
> if it failed. There would have to be some way to limit the time an
> object is in the slushy state, ideally by enforcing an error on
> deserializers who neglect to publish (or discard) a slushy object.
> For example, we could require an annotation on deserialization
> methods, as we do today on caller-sensitive methods.
>
> That's the sort of thing I would prefer to see, to remove
> indeterminate behavior.

Which reminds me: an issue has arisen regarding Unsafe and user-defined deserialization. Some middleware uses non-Java-standard serialization protocols which require some sort of backdoor mechanism for creating objects which have no zero-arg constructor. Unsafe is used to do this, as is ReflectionFactory, but both are internal APIs.

I don't think there's anything inherently unreasonable about people wanting to write their own serialization protocols, but having to call internal APIs is fragile. I suppose we would like to tell the runtime that an object is no longer slushy, but there is no "official" interface for serializers to create objects in the first place.

Andrew.

From vladimir.x.ivanov at oracle.com Thu Jul 23 10:37:23 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 23 Jul 2015 13:37:23 +0300
Subject: [9] RFR (XXS): 8132168: PrintIdealGraphLevel range should be [0..4]
In-Reply-To: <5FF697E2-3D94-4650-A213-B3D3F4F6354B@oracle.com>
References: <55AFDD80.2050009@oracle.com> <5FF697E2-3D94-4650-A213-B3D3F4F6354B@oracle.com>
Message-ID: <55B0C3E3.3020202@oracle.com>

Thanks, Roland. I'll adjust the description before pushing.

Best regards,
Vladimir Ivanov

On 7/23/15 10:20 AM, Roland Westrelin wrote:
>> http://cr.openjdk.java.net/~vlivanov/8132168/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8132168
>>
>> PrintIdealGraphLevel=4 is used to dump IR during parsing (see parse2.cpp:2387).
>
> Ok. Do we want to update the description string:
>
> 369           "Level of detail of the ideal graph printout. "                  \
> 370           "System-wide value, 0=nothing is printed, 3=all details printed. "\
> 371           "Level of detail of printouts can be set on a per-method level "  \
> 372           "as well by using CompileCommand=option.")
>
> to cover the "4" case?
>
> Roland.
>

From tobias.hartmann at oracle.com Thu Jul 23 11:23:42 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 23 Jul 2015 13:23:42 +0200
Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space
In-Reply-To: <55AE4BC2.1090104@oracle.com>
References: <55AE4BC2.1090104@oracle.com>
Message-ID: <55B0CEBE.4010100@oracle.com>

Hi,

after running some more tests, I noticed that my fix is incomplete. The problem is that checking for a failed expansion in 'Compile::fill_buffer' may be too late in case we emit additional instructions immediately after we failed to create the stub.
For example, 'CallStaticJavaHandle' emits code to restore the stack pointer after 'Java_Static_Call' on Sparc (see sparc.ad): 9991 ins_encode(preserve_SP, Java_Static_Call(meth), restore_SP, call_epilog); If we don't bail out immediately after allocation of the stub failed, we crash in 'restore_SP' while trying to emit code into the now freed code blob. This problem exists on all platforms. Although we do not always emit code (for example on aarch64, -XX:VerifyStackAtCalls is currently unimplemented), I added checks for robustness / completeness. I also added checks to the corresponding C1 code. Here is the new webrev: http://cr.openjdk.java.net/~thartmann/8130309/webrev.01/ Thanks, Tobias On 21.07.2015 15:40, Tobias Hartmann wrote: > Hi, > > please review the following patch. > > https://bugs.openjdk.java.net/browse/JDK-8130309 > http://cr.openjdk.java.net/~thartmann/8130309/webrev.00/ > > Problem: > While C2 is emitting code, an assert is hit because the emitted code size does not correspond to the size of the instruction. The problem is that the code cache is full and therefore the creation of a to-interpreter stub failed. Instead of bailing out we continue and hit the assert. > > More precisely, we emit code for a CallStaticJavaDirectNode on aarch64 and emit two stubs (see 'aarch64_enc_java_static_call' in aarch64.ad): > - MacroAssembler::emit_trampoline_stub() -> requires 64 bytes > - CompiledStaticCall::emit_to_interp_stub() -> requires 56 bytes > However, we only have 112 bytes of free space in the stub section of the code buffer and need to expand it. Since the code cache is full, the expansion fails and the corresponding code blob is freed. In CodeBuffer::free_blob() -> CodeBuffer::set_blob() we set addresses to 'badAddress' and therefore hit the assert. > > Solution: > Even if we have enough space in the instruction section, we should check for a failed expansion of the stub section and bail out immediately. I added the corresponding check and also removed an unnecessary call to 'start_a_stub' after we already failed. > > Testing: > - Failing test > - JPRT > > Thanks, > Tobias > From roland.westrelin at oracle.com Thu Jul 23 13:51:57 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Thu, 23 Jul 2015 15:51:57 +0200 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55B0CEBE.4010100@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> Message-ID: <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> > http://cr.openjdk.java.net/~thartmann/8130309/webrev.01/ assembler.cpp 68 Compile::current()->env()->record_failure("CodeCache is full?); That assumes we are calling this from c2 but it can be called from c1 as well. Did you add code for c1 to be on the safe side or have you observed problems with c1? I don?t understand that part: "the corresponding code blob is freed. In CodeBuffer::free_blob() -> CodeBuffer::set_blob() we set addresses to 'badAddress' and therefore hit the assert.? What addresses are set to badAddress? Roland. > > Thanks, > Tobias > > On 21.07.2015 15:40, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch. >> >> https://bugs.openjdk.java.net/browse/JDK-8130309 >> http://cr.openjdk.java.net/~thartmann/8130309/webrev.00/ >> >> Problem: >> While C2 is emitting code, an assert is hit because the emitted code size does not correspond to the size of the instruction. 
The problem is that the code cache is full and therefore the creation of a to-interpreter stub failed. Instead of bailing out we continue and hit the assert. >> >> More precisely, we emit code for a CallStaticJavaDirectNode on aarch64 and emit two stubs (see 'aarch64_enc_java_static_call' in aarch64.ad): >> - MacroAssembler::emit_trampoline_stub() -> requires 64 bytes >> - CompiledStaticCall::emit_to_interp_stub() -> requires 56 bytes >> However, we only have 112 bytes of free space in the stub section of the code buffer and need to expand it. Since the code cache is full, the expansion fails and the corresponding code blob is freed. In CodeBuffer::free_blob() -> CodeBuffer::set_blob() we set addresses to 'badAddress' and therefore hit the assert. >> >> Solution: >> Even if we have enough space in the instruction section, we should check for a failed expansion of the stub section and bail out immediately. I added the corresponding check and also removed an unnecessary call to 'start_a_stub' after we already failed. >> >> Testing: >> - Failing test >> - JPRT >> >> Thanks, >> Tobias >> From vivek.r.deshpande at intel.com Thu Jul 23 20:26:45 2015 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Thu, 23 Jul 2015 20:26:45 +0000 Subject: RFR (M): 8132207: Update for x86 exp in the math lib Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56651D97@ORSMSX106.amr.corp.intel.com> Hi all I would like to contribute a patch which optimizes Math.exp() for 64 and 32 bit X86 architecture using Intel LIBM implementation. Please review and sponsor this patch. Bug-id: https://bugs.openjdk.java.net/browse/JDK-8132207 webrev: http://cr.openjdk.java.net/~mcberg/8132207/webrev.01/ Thanks, Vivek -------------- next part -------------- An HTML attachment was scrubbed... URL: From igor.veresov at oracle.com Thu Jul 23 23:45:29 2015 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 23 Jul 2015 16:45:29 -0700 Subject: Tiered compilation and virtual call heuristics In-Reply-To: References: Message-ID: <6F5B24D5-F8D9-4C72-B978-93EDD7F353DE@oracle.com> I guess you got a little bit unlucky :). Increasing BciProfileWidth will only work in the case when one of the types appears in the profile later and then dominates other usages (see TypeProfileMajorReceiverPercent, it?s 90% by default). So, by the time C2 see the callsite the receiver type you want should be recorded and have more than 90% of counts. The overhead of BciProfileWidth may be substantial for the startup (slower profiling), but it won?t affect the final code (unless inlining happens differently, in a bad way). Another approach would be to delay the moment when the profiling starts. By, basically, bumping up the values of these guys: product(intx, Tier3InvocationThreshold, 200, \ "Compile if number of method invocations crosses this " \ "threshold") \ \ product(intx, Tier3MinInvocationThreshold, 100, \ "Minimum invocation to compile at tier 3") \ \ product(intx, Tier3CompileThreshold, 2000, \ "Threshold at which tier 3 compilation is invoked (invocation " \ "minimum must be satisfied)") \ Compilation will happen, when the following predicate gets true: return (i >= Tier3InvocationThreshold * scale) || (i >= Tier3MinInvocationThreshold * scale && i + b >= Tier3CompileThreshold * scale); Here, i is invocations, b is backedges, scale is something dynamic but is >= 1. 
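(Spelled out as code, the trigger predicate quoted above amounts to this -- simplified; in the real policy the scale factor is derived from compile queue lengths:)

    // A method is queued for a tier 3 (profiled C1) compile once enough
    // invocations (i) and backedges (b) accumulate. Raising the Tier3*
    // thresholds delays this point, and with it the creation of
    // MethodData objects, so profiling sees steady-state receivers.
    bool should_compile_tier3(int i, int b, double scale) {
      return (i >= Tier3InvocationThreshold * scale) ||
             (i >= Tier3MinInvocationThreshold * scale &&
              i + b >= Tier3CompileThreshold * scale);
    }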
igor > On Jul 22, 2015, at 10:37 AM, Carsten Varming wrote: > > Dear Hotspot compiler group, > > I have had a few issues with tiered compilation in JDK8 lately and was wondering if you have some comments or ideas for the given problem. > > Here is my problem as I currently understand it. Feel free to correct any misunderstandings I may have. With tiered compilation the heuristics for inlining virtual calls seems to degrade quite a bit. I think this is due to MethodData objects being created much earlier with tiered than without. This causes the tracking of the hottest target methods at a virtual call site to go awry, due to the limit (2) on the number of MethodData objects that can be associated with a bci in a method. It seems like the only virtual call targets tracked are the targets that are warm when when C1 is invoked. > > The program ends up with all call-sites in scala.collection.IndexedSeqOptimized.slice using virtual dispatch with tiered and bimorphic call sites without tiered. The end result with tiered is a tripling of the cpu required to run the program, and instruction pointers from the compiled slice method end up in 90% of all cpu samples (collected with perf at 4kHz). > > The problem is with a small application built in Scala on top of Netty. I have written a small sample program (see attached Main.java) to spare you the details (and to be able to give you code). > > When I run the sample program with tiered then the call to count end up being a virtual call, due to Instance$3.count and Instance4.count being warm when C1 kicks in. Without tiered Instance$1.count is the only hot method. > > I wonder if you guys have seen this problem in the wild or if I just happen to be unlucky. Increasing BciProfileWidth should help in my case, but it is not a product flag. Do you have any experience regarding cost of increasing BciProfileWidth? Do you have any thoughts on throwing out MethodData objects for virtual call sites that turns out to be pretty cold? > > Thank you in advance for your thoughts, > Carsten > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aleksey.shipilev at oracle.com Fri Jul 24 09:49:10 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Fri, 24 Jul 2015 12:49:10 +0300 Subject: RFR (S) 8131682: C1 should use multibyte nops everywhere In-Reply-To: <55AF500B.9000505@oracle.com> References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> <55AD0AE0.3060803@oracle.com> <55AEAB5C.8050307@oracle.com> <55AF500B.9000505@oracle.com> Message-ID: <55B20A16.7020300@oracle.com> (explicitly cc'ing AArch64 and PPC folks) Thanks, -Aleksey On 22.07.2015 11:10, Aleksey Shipilev wrote: > Thanks for review, Dean! > > I'd like to hear the opinions of AArch64 and Power folks, since we > contaminate their assemblers a bit to gain access to x86 fat nops. > > -Aleksey > > On 21.07.2015 23:28, Dean Long wrote: >> This version looks good. >> >> dl >> >> On 7/20/2015 7:51 AM, Aleksey Shipilev wrote: >>> Hi Dean, >>> >>> Thanks for taking a look! >>> >>> Silly me, I should have left the call patching cases intact, because >>> you're right, we should be able to patch the nops partially while still >>> producing the correct instruction stream. Therefore, I reverted the >>> cases where we do nop-ing for *instruction* patching, and added the >>> comment there. >>> >>> Other places seem to use the nop sequences to provide the alignment, not >>> for the general patching. 
Especially interesting for us is the case of >>> aligning the patcheable immediate in the existing call. C2 does the nops >>> in these cases. >>> >>> New webrev: >>> http://cr.openjdk.java.net/~shade/8131682/webrev.01/ >>> >>> Testing: >>> * JPRT -testset hotspot on open platforms; >>> * Targeted benchmarks, plus eyeballing the assembly; >>> >>> Thanks, >>> -Aleksey >>> >>> On 18.07.2015 10:51, Dean Long wrote: >>>> I think we should distinguish the different uses and treat them >>>> accordingly: >>>> >>>> 1) padding nops for patching, executed >>>> >>>> We need to be careful about inserting a fat nop here, if later patching >>>> overwrites only part of the fat nop, resulting in an illegal intruction. >>>> >>>> 2) padding nops for patching, never executed >>>> >>>> It should be safe insert a fat nop here, but there's no point if the >>>> nops are not reachable and never executed. >>>> >>>> >>>> 3) alignment nops, never patched, executed >>>> >>>> Fat nops are fine, but on some CPUs branching may be even better, so I >>>> suggest using align() for this, and letting align() decide what to >>>> generate. The change in check_icache() could use a version of align >>>> that takes the target offset as an argument: >>>> >>>> 348 align(CodeEntryAlignment,__ offset() + ic_cmp_size); >>>> >>>> 4) alignment nops, never patched, never executed >>>> >>>> Doesn't matter what we emit here, but we might as well make it >>>> understandable by humans using a debugger. >>>> >>>> >>>> I believe the patching nops in c1_CodeStubs_x86.cpp and >>>> c1_LIRAssembler.cpp are patched concurrently while the code is running, >>>> not at a safepoint, so it's not clear to me if it's safe to use fat nops >>>> on x86. I would consider those changes unsafe on x86 without further >>>> analysis of what happens during patching. >>>> >>>> dl >>>> >>>> On 7/17/2015 6:29 AM, Aleksey Shipilev wrote: >>>>> Hi there, >>>>> >>>>> C1 is not very good at inlining and intrisifying methods, and hence the >>>>> call performance is important there. One nit that we can see in the >>>>> generated code on x86 is that C1 uses the single-byte nops, even for >>>>> long nop strides. >>>>> >>>>> This improvement fixes that: >>>>> https://bugs.openjdk.java.net/browse/JDK-8131682 >>>>> http://cr.openjdk.java.net/~shade/8131682/webrev.00/ >>>>> >>>>> Testing: >>>>> - JPRT -testset hotspot on open platforms >>>>> - eyeballing the generated assembly with -XX:TieredStopAtLevel=1 >>>>> >>>>> (I understand the symmetric change is going to be needed in closed >>>>> parts, but let's polish the open part first). >>>>> >>>>> Thanks, >>>>> -Aleksey >>>>> >>> >> > > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL:

From aleksey.shipilev at oracle.com Fri Jul 24 09:50:26 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Fri, 24 Jul 2015 12:50:26 +0300
Subject: RFR (S) 8131782: C1 Class.cast optimization breaks when Class is loaded from static final
In-Reply-To: <55AE1972.4050106@oracle.com>
References: <55ACFD0F.30207@oracle.com> <55AE1972.4050106@oracle.com>
Message-ID: <55B20A62.2090307@oracle.com>

On 21.07.2015 13:05, Aleksey Shipilev wrote:
> On 21.07.2015 00:14, John Rose wrote:
>> On Jul 20, 2015, at 6:52 AM, Aleksey Shipilev wrote:
>>> On the road from Unsafe to VarHandles lies a small deficiency in C1
>>> Class.cast/isInstance optimization: the canonicalizer folds constant
>>> class perfectly when it is coming from "inlined" constant, but not from
>>> static final, because the constant "shapes" are different:
>>> https://bugs.openjdk.java.net/browse/JDK-8131782
>
>> I suggest a deeper fix, to the factory that produces the oddly formatted constant.
>> That may help with other, similar constant folding problems.
>
> All right, let's do that!
> http://cr.openjdk.java.net/~shade/8131782/webrev.02/
>
> I respinned it through JPRT and my targeted benchmarks, and it performs
> the same as the previous patch.

Any other reviews pending? If not, please sponsor!

Thanks,
-Aleksey

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL:

From goetz.lindenmaier at sap.com Fri Jul 24 10:38:10 2015
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Fri, 24 Jul 2015 10:38:10 +0000
Subject: RFR (S) 8131682: C1 should use multibyte nops everywhere
In-Reply-To: <55B20A16.7020300@oracle.com>
References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> <55AD0AE0.3060803@oracle.com> <55AEAB5C.8050307@oracle.com> <55AF500B.9000505@oracle.com> <55B20A16.7020300@oracle.com>
Message-ID: <4295855A5C1DE049A61835A1887419CC2D00A472@DEWDFEMB12A.global.corp.sap>

Hi Aleksey,

thanks for pointing us to that change!
Looks good, but does not compile. The default arg should only be in the header. See below.
Ppc part reviewed and I don't need a new webrev.

Best regards,
Goetz.

--- a/src/cpu/ppc/vm/assembler_ppc.inline.hpp
+++ b/src/cpu/ppc/vm/assembler_ppc.inline.hpp @@ -210,7 +210,7 @@ inline void Assembler::extsw( Register a, Register s) { emit_int32(EXTSW_OPCODE | rta(a) | rs(s) | rc(0)); } // extended mnemonics -inline void Assembler::nop() { Assembler::ori(R0, R0, 0); } +inline void Assembler::nop(int count) { for (int i = 0; i < count; i++) { Assembler::ori(R0, R0, 0); } } // NOP for FP and BR units (different versions to allow them to be in one group) inline void Assembler::fpnop0() { Assembler::fmr(F30, F30); } inline void Assembler::fpnop1() { Assembler::fmr(F31, F31); } g++ 4.8.3: In file included from /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/share/vm/asm/assembler.inline.hpp:43:0, from /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/cpu/ppc/vm/macroAssembler_ppc.inline.hpp:29, from /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/share/vm/asm/macroAssembler.inline.hpp:43, from ../generated/adfiles/ad_ppc_64.cpp:56: /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/cpu/ppc/vm/assembler_ppc.inline.hpp:213:41: error: default argument given for parameter 1 of void Assembler::nop(int) [-fpermissive] inline void Assembler::nop(int count = 1) { for(int i = 0; i < count; i++) ^ In file included from /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/share/vm/asm/assembler.hpp:434:0, from /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/cpu/ppc/vm/nativeInst_ppc.hpp:29, from /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/share/vm/code/nativeInst.hpp:41, from ../generated/adfiles/ad_ppc_64.hpp:57, from ../generated/adfiles/ad_ppc_64.cpp:54: /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/cpu/ppc/vm/assembler_ppc.hpp:1383:15: error: after previous specification in void Assembler::nop(int) [-fpermissive] inline void nop(int count = 1); ^ -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Aleksey Shipilev Sent: Friday, July 24, 2015 11:49 AM To: Dean Long; hotspot compiler Cc: ppc-aix-port-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net Subject: Re: RFR (S) 8131682: C1 should use multibyte nops everywhere * PGP Signed by an unknown key (explicitly cc'ing AArch64 and PPC folks) Thanks, -Aleksey On 22.07.2015 11:10, Aleksey Shipilev wrote: > Thanks for review, Dean! > > I'd like to hear the opinions of AArch64 and Power folks, since we > contaminate their assemblers a bit to gain access to x86 fat nops. > > -Aleksey > > On 21.07.2015 23:28, Dean Long wrote: >> This version looks good. >> >> dl >> >> On 7/20/2015 7:51 AM, Aleksey Shipilev wrote: >>> Hi Dean, >>> >>> Thanks for taking a look! >>> >>> Silly me, I should have left the call patching cases intact, because >>> you're right, we should be able to patch the nops partially while still >>> producing the correct instruction stream. Therefore, I reverted the >>> cases where we do nop-ing for *instruction* patching, and added the >>> comment there. >>> >>> Other places seem to use the nop sequences to provide the alignment, not >>> for the general patching. Especially interesting for us is the case of >>> aligning the patcheable immediate in the existing call. C2 does the nops >>> in these cases. 
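(In other words, the fix being suggested is the usual C++ split -- a sketch:)

    // assembler_ppc.hpp -- the declaration carries the default argument:
    inline void nop(int count = 1);

    // assembler_ppc.inline.hpp -- the definition must not repeat it:
    inline void Assembler::nop(int count) {
      for (int i = 0; i < count; i++) { Assembler::ori(R0, R0, 0); }
    }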
>>> >>> New webrev: >>> http://cr.openjdk.java.net/~shade/8131682/webrev.01/ >>> >>> Testing: >>> * JPRT -testset hotspot on open platforms; >>> * Targeted benchmarks, plus eyeballing the assembly; >>> >>> Thanks, >>> -Aleksey >>> >>> On 18.07.2015 10:51, Dean Long wrote: >>>> I think we should distinguish the different uses and treat them >>>> accordingly: >>>> >>>> 1) padding nops for patching, executed >>>> >>>> We need to be careful about inserting a fat nop here, if later patching >>>> overwrites only part of the fat nop, resulting in an illegal intruction. >>>> >>>> 2) padding nops for patching, never executed >>>> >>>> It should be safe insert a fat nop here, but there's no point if the >>>> nops are not reachable and never executed. >>>> >>>> >>>> 3) alignment nops, never patched, executed >>>> >>>> Fat nops are fine, but on some CPUs branching may be even better, so I >>>> suggest using align() for this, and letting align() decide what to >>>> generate. The change in check_icache() could use a version of align >>>> that takes the target offset as an argument: >>>> >>>> 348 align(CodeEntryAlignment,__ offset() + ic_cmp_size); >>>> >>>> 4) alignment nops, never patched, never executed >>>> >>>> Doesn't matter what we emit here, but we might as well make it >>>> understandable by humans using a debugger. >>>> >>>> >>>> I believe the patching nops in c1_CodeStubs_x86.cpp and >>>> c1_LIRAssembler.cpp are patched concurrently while the code is running, >>>> not at a safepoint, so it's not clear to me if it's safe to use fat nops >>>> on x86. I would consider those changes unsafe on x86 without further >>>> analysis of what happens during patching. >>>> >>>> dl >>>> >>>> On 7/17/2015 6:29 AM, Aleksey Shipilev wrote: >>>>> Hi there, >>>>> >>>>> C1 is not very good at inlining and intrisifying methods, and hence the >>>>> call performance is important there. One nit that we can see in the >>>>> generated code on x86 is that C1 uses the single-byte nops, even for >>>>> long nop strides. >>>>> >>>>> This improvement fixes that: >>>>> https://bugs.openjdk.java.net/browse/JDK-8131682 >>>>> http://cr.openjdk.java.net/~shade/8131682/webrev.00/ >>>>> >>>>> Testing: >>>>> - JPRT -testset hotspot on open platforms >>>>> - eyeballing the generated assembly with -XX:TieredStopAtLevel=1 >>>>> >>>>> (I understand the symmetric change is going to be needed in closed >>>>> parts, but let's polish the open part first). >>>>> >>>>> Thanks, >>>>> -Aleksey >>>>> >>> >> > > * Unknown Key * 0x62A119A7 From tobias.hartmann at oracle.com Fri Jul 24 11:29:21 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 24 Jul 2015 13:29:21 +0200 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> Message-ID: <55B22191.9070904@oracle.com> Hi Roland, thanks for the review! On 23.07.2015 15:51, Roland Westrelin wrote: > assembler.cpp > > 68 Compile::current()->env()->record_failure("CodeCache is full?); > > That assumes we are calling this from c2 but it can be called from c1 as well. You are right. I moved this code to the C2 methods calling 'AbstractAssembler::start_a_stub()'. The corresponding C1 methods already contain a call to 'bailout()'. 
I also noticed that we multiply the 'to_interp_stub_size()' by 2 when creating the stub (see compiledIC_sparc.cpp): 68 __ start_a_stub(to_interp_stub_size()*2); This seems to be unnecessary and causes problems in "Compile::scratch_emit_size" because we fix the stub section of the CodeBuffer to MAX_stubs_size and therefore fail to emit the to-interpreter stub since to_interp_stub_size()*2 > MAX_stubs_size. I removed the multiplication and added an assert to 'Compile::scratch_emit_size()'. > Did you add code for c1 to be on the safe side or have you observed problems with c1? I did not encounter the problem with C1 but added the checks to be on the safe side. Actually, we do call bailout() in C1 'LIR_Assembler::emit_static_call_stub()' if the stub cannot be created but without checking for bailed_out() in the calling method we will continue to emit code and crash (see explanation below). > "the corresponding code blob is freed. In CodeBuffer::free_blob() -> CodeBuffer::set_blob() we set addresses to 'badAddress' and therefore hit the assert.? > > What addresses are set to badAddress? If CodeBuffer::expand() fails, the corresponding buffer blob in the code cache is freed and code section addresses are set to 'badAddress' (see CodeBuffer::set_blob(NULL)). If we continue to emit code into this buffer or use its section addresses, we fail. The assert "wrong size of mach node" is hit because we use CodeBuffer::insts_size() which is now -1647855886. Another instance of this bug fails because we continue to emit code, call CodeSection::emit_int32() and fail because end() is set to 'badAddress'. I did some more testing with "java -XX:+StressCodeBuffers -Xcomp -version" and found more places where we have to check for a failed CodeBuffer expansion to not continue to emit code and crash. I added the corresponding checks. Here is the new webrev: http://cr.openjdk.java.net/~thartmann/8130309/webrev.02/ Thanks, Tobias > > Roland. > >> >> Thanks, >> Tobias >> >> On 21.07.2015 15:40, Tobias Hartmann wrote: >>> Hi, >>> >>> please review the following patch. >>> >>> https://bugs.openjdk.java.net/browse/JDK-8130309 >>> http://cr.openjdk.java.net/~thartmann/8130309/webrev.00/ >>> >>> Problem: >>> While C2 is emitting code, an assert is hit because the emitted code size does not correspond to the size of the instruction. The problem is that the code cache is full and therefore the creation of a to-interpreter stub failed. Instead of bailing out we continue and hit the assert. >>> >>> More precisely, we emit code for a CallStaticJavaDirectNode on aarch64 and emit two stubs (see 'aarch64_enc_java_static_call' in aarch64.ad): >>> - MacroAssembler::emit_trampoline_stub() -> requires 64 bytes >>> - CompiledStaticCall::emit_to_interp_stub() -> requires 56 bytes >>> However, we only have 112 bytes of free space in the stub section of the code buffer and need to expand it. Since the code cache is full, the expansion fails and the corresponding code blob is freed. In CodeBuffer::free_blob() -> CodeBuffer::set_blob() we set addresses to 'badAddress' and therefore hit the assert. >>> >>> Solution: >>> Even if we have enough space in the instruction section, we should check for a failed expansion of the stub section and bail out immediately. I added the corresponding check and also removed an unnecessary call to 'start_a_stub' after we already failed. 
>>> >>> Testing: >>> - Failing test >>> - JPRT >>> >>> Thanks, >>> Tobias >>> > From Andrey.Saenkov at oracle.com Fri Jul 24 15:37:12 2015 From: Andrey.Saenkov at oracle.com (Andrey Saenkov) Date: Fri, 24 Jul 2015 18:37:12 +0300 Subject: RFR: 8079667: port vm/compiler/AESIntrinsics/CheckIntrinsics into jtreg In-Reply-To: <55B259AF.2070700@oracle.com> References: <55B259AF.2070700@oracle.com> Message-ID: <55B25BA8.4040201@oracle.com> Hi, could you please review the patch which ports tests for AES Intrinsics from closed jdk to the open. Unfortunately bug is marked as confidential, so you can't view its content unless you have required permissions. Description: Tests for AESIntrinsics ported from closed jdk to open jdk Link to webrev: http://cr.openjdk.java.net/~iignatyev/asaenkov/8079667/webrev.00/ Link to bug: https://jbs.oracle.com/bugs/browse/JDK-8079667 Testing done: Tested locally on Ubuntu x64 with AES support and tested on all other platforms using inner tool. Tests fail on sparc due to: JDK-8131778 -- Andrey Saenkov -------------- next part -------------- An HTML attachment was scrubbed... URL: From christian.thalinger at oracle.com Fri Jul 24 16:21:26 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 24 Jul 2015 09:21:26 -0700 Subject: RFR (M): 8132207: Update for x86 exp in the math lib In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56651D97@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56651D97@ORSMSX106.amr.corp.intel.com> Message-ID: > On Jul 23, 2015, at 1:26 PM, Deshpande, Vivek R wrote: > > Hi all > > I would like to contribute a patch which optimizes Math.exp() for 64 and 32 bit X86 architecture using Intel LIBM implementation. > Please review and sponsor this patch. > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8132207 > > webrev: > http://cr.openjdk.java.net/~mcberg/8132207/webrev.01/ I have waited for this a long time and I applaud the change! But I?m baffled by the complexity of the implementation ;-) Are there any comments in the original Intel libm implementation which we could add here as well? > > Thanks, > Vivek -------------- next part -------------- An HTML attachment was scrubbed... URL: From vivek.r.deshpande at intel.com Thu Jul 23 18:01:11 2015 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Thu, 23 Jul 2015 18:01:11 +0000 Subject: RFR (M): 8132207: Update for x86 exp in the math lib Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A5665199D@ORSMSX106.amr.corp.intel.com> Hi all I would like to contribute a patch which optimizes Math.exp() for 64 and 32 bit X86 architecture using Intel LIBM implementation. Please review and sponsor this patch. Bug-id: https://bugs.openjdk.java.net/browse/JDK-8132207 webrev: http://cr.openjdk.java.net/~mcberg/8132207/webrev.01/ Thanks, Vivek -------------- next part -------------- An HTML attachment was scrubbed... URL: From sandhya.viswanathan at intel.com Fri Jul 24 17:04:00 2015 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Fri, 24 Jul 2015 17:04:00 +0000 Subject: RFR (M): 8132207: Update for x86 exp in the math lib In-Reply-To: References: <53E8E64DB2403849AFD89B7D4DAC8B2A56651D97@ORSMSX106.amr.corp.intel.com> Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B633A7A4D@FMSMSX112.amr.corp.intel.com> Hi Christian, This is generated code by the ICC, that?s why you don?t see any line by line comments here. The algorithm used by Intel libm is given in comments as a preamble. 
Best Regards, Sandhya From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Christian Thalinger Sent: Friday, July 24, 2015 9:21 AM To: Deshpande, Vivek R Cc: Vladimir.Kozlov at oracle.com; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR (M): 8132207: Update for x86 exp in the math lib On Jul 23, 2015, at 1:26 PM, Deshpande, Vivek R > wrote: Hi all I would like to contribute a patch which optimizes Math.exp() for 64 and 32 bit X86 architecture using Intel LIBM implementation. Please review and sponsor this patch. Bug-id: https://bugs.openjdk.java.net/browse/JDK-8132207 webrev: http://cr.openjdk.java.net/~mcberg/8132207/webrev.01/ I have waited for this a long time and I applaud the change! But I?m baffled by the complexity of the implementation ;-) Are there any comments in the original Intel libm implementation which we could add here as well? Thanks, Vivek -------------- next part -------------- An HTML attachment was scrubbed... URL: From christian.thalinger at oracle.com Fri Jul 24 17:18:10 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 24 Jul 2015 10:18:10 -0700 Subject: RFR (M): 8132207: Update for x86 exp in the math lib In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B633A7A4D@FMSMSX112.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56651D97@ORSMSX106.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B633A7A4D@FMSMSX112.amr.corp.intel.com> Message-ID: <4B888824-AC50-4E46-8693-D1A52E86A72B@oracle.com> Got it. Thanks. > On Jul 24, 2015, at 10:04 AM, Viswanathan, Sandhya wrote: > > Hi Christian, > This is generated code by the ICC, that?s why you don?t see any line by line comments here. The algorithm used by Intel libm is given in comments as a preamble. > Best Regards, > Sandhya > > > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Christian Thalinger > Sent: Friday, July 24, 2015 9:21 AM > To: Deshpande, Vivek R > Cc: Vladimir.Kozlov at oracle.com; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR (M): 8132207: Update for x86 exp in the math lib > > > On Jul 23, 2015, at 1:26 PM, Deshpande, Vivek R > wrote: > > Hi all > > I would like to contribute a patch which optimizes Math.exp() for 64 and 32 bit X86 architecture using Intel LIBM implementation. > Please review and sponsor this patch. > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8132207 > > webrev: > http://cr.openjdk.java.net/~mcberg/8132207/webrev.01/ > > I have waited for this a long time and I applaud the change! But I?m baffled by the complexity of the implementation ;-) Are there any comments in the original Intel libm implementation which we could add here as well? > > > > Thanks, > Vivek > -------------- next part -------------- An HTML attachment was scrubbed... URL: From christos at zoulas.com Fri Jul 24 18:37:26 2015 From: christos at zoulas.com (Christos Zoulas) Date: Fri, 24 Jul 2015 14:37:26 -0400 Subject: RFR (M): 8132207: Update for x86 exp in the math lib In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A5665199D@ORSMSX106.amr.corp.intel.com> from "Deshpande, Vivek R" (Jul 23, 6:01pm) Message-ID: <20150724183726.3B9C717FDA8@rebar.astron.com> On Jul 23, 6:01pm, vivek.r.deshpande at intel.com ("Deshpande, Vivek R") wrote: -- Subject: RFR (M): 8132207: Update for x86 exp in the math lib | Hi all | | I would like to contribute a patch which optimizes Math.exp() for 64 and 32= | bit X86 architecture using Intel LIBM implementation. 
| Please review and sponsor this patch. | | Bug-id: https://bugs.openjdk.java.net/browse/JDK-8132207 | | webrev: | http://cr.openjdk.java.net/~mcberg/8132207/webrev.01/ I would be very careful with changes like this. Are you sure that this produces identical results with the current implementation? In the past we've had problems with the values of transcendental functions because JIT replaced the java implementations with native copies and sometimes this were off by 1ULP. This made the following code fail: public class LogTest { public static void main(String[] args) { double n = Integer.parseInt("17197"); double d = Math.log(n); System.out.println("n=" + n + ",log(n)=" + d); for (int i = 0; i < 100000; i++) { double e = Math.log(n); if (e != d) { System.err.println("ERROR after " + i + " iterations:\n" + "previous value: " + d + " (" + Long.toHexString(Double.doubleToLongBits(d)) + ")\n" + " current value: " + e + " (" + Long.toHexString(Double.doubleToLongBits(e)) + ")"); System.exit(1); } } System.err.println("SUCCESS!"); System.exit(0); } } christos From vivek.r.deshpande at intel.com Fri Jul 24 18:56:51 2015 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Fri, 24 Jul 2015 18:56:51 +0000 Subject: RFR (M): 8132207: Update for x86 exp in the math lib In-Reply-To: <20150724183726.3B9C717FDA8@rebar.astron.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A5665199D@ORSMSX106.amr.corp.intel.com> from "Deshpande, Vivek R" (Jul 23, 6:01pm) <20150724183726.3B9C717FDA8@rebar.astron.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56655B27@ORSMSX106.amr.corp.intel.com> Hi Christos You have a very good point. We have made sure that, interpreter, c1 and c2 give same result by using same stub. These results also meet the 1 ulp requirement. Regards, Vivek -----Original Message----- From: Christos Zoulas [mailto:christos at zoulas.com] Sent: Friday, July 24, 2015 11:37 AM To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net Cc: Vladimir.Kozlov at oracle.com Subject: Re: RFR (M): 8132207: Update for x86 exp in the math lib On Jul 23, 6:01pm, vivek.r.deshpande at intel.com ("Deshpande, Vivek R") wrote: -- Subject: RFR (M): 8132207: Update for x86 exp in the math lib | Hi all | | I would like to contribute a patch which optimizes Math.exp() for 64 | and 32= bit X86 architecture using Intel LIBM implementation. | Please review and sponsor this patch. | | Bug-id: https://bugs.openjdk.java.net/browse/JDK-8132207 | | webrev: | http://cr.openjdk.java.net/~mcberg/8132207/webrev.01/ I would be very careful with changes like this. Are you sure that this produces identical results with the current implementation? In the past we've had problems with the values of transcendental functions because JIT replaced the java implementations with native copies and sometimes this were off by 1ULP. 
This made the following code fail: public class LogTest { public static void main(String[] args) { double n = Integer.parseInt("17197"); double d = Math.log(n); System.out.println("n=" + n + ",log(n)=" + d); for (int i = 0; i < 100000; i++) { double e = Math.log(n); if (e != d) { System.err.println("ERROR after " + i + " iterations:\n" + "previous value: " + d + " (" + Long.toHexString(Double.doubleToLongBits(d)) + ")\n" + " current value: " + e + " (" + Long.toHexString(Double.doubleToLongBits(e)) + ")"); System.exit(1); } } System.err.println("SUCCESS!"); System.exit(0); } } christos From christos at zoulas.com Fri Jul 24 19:17:20 2015 From: christos at zoulas.com (Christos Zoulas) Date: Fri, 24 Jul 2015 15:17:20 -0400 Subject: RFR (M): 8132207: Update for x86 exp in the math lib In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56655B27@ORSMSX106.amr.corp.intel.com> from "Deshpande, Vivek R" (Jul 24, 6:56pm) Message-ID: <20150724191720.E77B117FDA8@rebar.astron.com> On Jul 24, 6:56pm, vivek.r.deshpande at intel.com ("Deshpande, Vivek R") wrote: -- Subject: RE: RFR (M): 8132207: Update for x86 exp in the math lib | Hi Christos | | You have a very good point. We have made sure that, interpreter, c1 and c2 = | give same result by using same stub. | These results also meet the 1 ulp requirement. Excellent, many thanks! I was just making sure :-) I hate debugging these kinds of things... christos From dean.long at oracle.com Fri Jul 24 20:19:36 2015 From: dean.long at oracle.com (Dean Long) Date: Fri, 24 Jul 2015 13:19:36 -0700 Subject: RFR (S) 8131782: C1 Class.cast optimization breaks when Class is loaded from static final In-Reply-To: <55B20A62.2090307@oracle.com> References: <55ACFD0F.30207@oracle.com> <55AE1972.4050106@oracle.com> <55B20A62.2090307@oracle.com> Message-ID: <55B29DD8.40008@oracle.com> I can push it for you. Do you need another review? dl On 7/24/2015 2:50 AM, Aleksey Shipilev wrote: > On 21.07.2015 13:05, Aleksey Shipilev wrote: >> On 21.07.2015 00:14, John Rose wrote: >>> On Jul 20, 2015, at 6:52 AM, Aleksey Shipilev wrote: >>>> On the road from Unsafe to VarHandles lies a small deficiency in C1 >>>> Class.cast/isInstance optimization: the canonicalizer folds constant >>>> class perfectly when it is coming from "inlined" constant, but not from >>>> static final, because the constant "shapes" are different: >>>> https://bugs.openjdk.java.net/browse/JDK-8131782 >>> I suggest a deeper fix, to the factory that produces the oddly formatted constant. >>> That may help with other, similar constant folding problems. >> All right, let's do that! >> http://cr.openjdk.java.net/~shade/8131782/webrev.02/ >> >> I respinned it through JRPT and my targeted benchmarks, and it performs >> the same as previous patch. > Any other reviews pending? If not, please sponsor! > > Thanks, > -Aleksey > > From dean.long at oracle.com Fri Jul 24 20:34:02 2015 From: dean.long at oracle.com (Dean Long) Date: Fri, 24 Jul 2015 13:34:02 -0700 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55B22191.9070904@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> Message-ID: <55B2A13A.4000604@oracle.com> On 7/24/2015 4:29 AM, Tobias Hartmann wrote: > Hi Roland, > > thanks for the review! 
> > On 23.07.2015 15:51, Roland Westrelin wrote: >> assembler.cpp >> >> 68 Compile::current()->env()->record_failure("CodeCache is full?); >> >> That assumes we are calling this from c2 but it can be called from c1 as well. > You are right. I moved this code to the C2 methods calling 'AbstractAssembler::start_a_stub()'. The corresponding C1 methods already contain a call to 'bailout()'. In the new webrev, aarch64 emit_trampoline_stub still calls Compile::current()->env()->record_failure, and it appears that emit_trampoline_stub can be called from C1. Shouldn't we fix it so that ciEnv::record_failure works correctly from C1? Why does C1 need a different bailout message? dl > I also noticed that we multiply the 'to_interp_stub_size()' by 2 when creating the stub (see compiledIC_sparc.cpp): > > 68 __ start_a_stub(to_interp_stub_size()*2); > > This seems to be unnecessary and causes problems in "Compile::scratch_emit_size" because we fix the stub section of the CodeBuffer to MAX_stubs_size and therefore fail to emit the to-interpreter stub since to_interp_stub_size()*2 > MAX_stubs_size. I removed the multiplication and added an assert to 'Compile::scratch_emit_size()'. > >> Did you add code for c1 to be on the safe side or have you observed problems with c1? > I did not encounter the problem with C1 but added the checks to be on the safe side. Actually, we do call bailout() in C1 'LIR_Assembler::emit_static_call_stub()' if the stub cannot be created but without checking for bailed_out() in the calling method we will continue to emit code and crash (see explanation below). > >> "the corresponding code blob is freed. In CodeBuffer::free_blob() -> CodeBuffer::set_blob() we set addresses to 'badAddress' and therefore hit the assert.? >> >> What addresses are set to badAddress? > If CodeBuffer::expand() fails, the corresponding buffer blob in the code cache is freed and code section addresses are set to 'badAddress' (see CodeBuffer::set_blob(NULL)). If we continue to emit code into this buffer or use its section addresses, we fail. The assert "wrong size of mach node" is hit because we use CodeBuffer::insts_size() which is now -1647855886. Another instance of this bug fails because we continue to emit code, call CodeSection::emit_int32() and fail because end() is set to 'badAddress'. > > I did some more testing with "java -XX:+StressCodeBuffers -Xcomp -version" and found more places where we have to check for a failed CodeBuffer expansion to not continue to emit code and crash. I added the corresponding checks. > > Here is the new webrev: > http://cr.openjdk.java.net/~thartmann/8130309/webrev.02/ > > Thanks, > Tobias > >> Roland. >> >>> Thanks, >>> Tobias >>> >>> On 21.07.2015 15:40, Tobias Hartmann wrote: >>>> Hi, >>>> >>>> please review the following patch. >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8130309 >>>> http://cr.openjdk.java.net/~thartmann/8130309/webrev.00/ >>>> >>>> Problem: >>>> While C2 is emitting code, an assert is hit because the emitted code size does not correspond to the size of the instruction. The problem is that the code cache is full and therefore the creation of a to-interpreter stub failed. Instead of bailing out we continue and hit the assert. 
>>>> >>>> More precisely, we emit code for a CallStaticJavaDirectNode on aarch64 and emit two stubs (see 'aarch64_enc_java_static_call' in aarch64.ad): >>>> - MacroAssembler::emit_trampoline_stub() -> requires 64 bytes >>>> - CompiledStaticCall::emit_to_interp_stub() -> requires 56 bytes >>>> However, we only have 112 bytes of free space in the stub section of the code buffer and need to expand it. Since the code cache is full, the expansion fails and the corresponding code blob is freed. In CodeBuffer::free_blob() -> CodeBuffer::set_blob() we set addresses to 'badAddress' and therefore hit the assert. >>>> >>>> Solution: >>>> Even if we have enough space in the instruction section, we should check for a failed expansion of the stub section and bail out immediately. I added the corresponding check and also removed an unnecessary call to 'start_a_stub' after we already failed. >>>> >>>> Testing: >>>> - Failing test >>>> - JPRT >>>> >>>> Thanks, >>>> Tobias >>>> From dean.long at oracle.com Fri Jul 24 21:21:44 2015 From: dean.long at oracle.com (Dean Long) Date: Fri, 24 Jul 2015 14:21:44 -0700 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55B2A13A.4000604@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <55B2A13A.4000604@oracle.com> Message-ID: <55B2AC68.7030900@oracle.com> If TraceJumps causes problems on Sparc, then I think the Sparc version of to_interp_stub_size() needs to be adjusted to take TraceJumps into account. dl From john.r.rose at oracle.com Sat Jul 25 00:12:35 2015 From: john.r.rose at oracle.com (John Rose) Date: Fri, 24 Jul 2015 17:12:35 -0700 Subject: RFR (S) 8131782: C1 Class.cast optimization breaks when Class is loaded from static final In-Reply-To: <55B29DD8.40008@oracle.com> References: <55ACFD0F.30207@oracle.com> <55AE1972.4050106@oracle.com> <55B20A62.2090307@oracle.com> <55B29DD8.40008@oracle.com> Message-ID: On Jul 24, 2015, at 1:19 PM, Dean Long wrote: > > I can push it for you. Do you need another review? I don't think he does; it's a simple change. ? John -------------- next part -------------- An HTML attachment was scrubbed... URL: From dean.long at oracle.com Sat Jul 25 01:22:41 2015 From: dean.long at oracle.com (Dean Long) Date: Fri, 24 Jul 2015 18:22:41 -0700 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55B2A13A.4000604@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <55B2A13A.4000604@oracle.com> Message-ID: <55B2E4E1.6060503@oracle.com> On 7/24/2015 1:34 PM, Dean Long wrote: > On 7/24/2015 4:29 AM, Tobias Hartmann wrote: >> Hi Roland, >> >> thanks for the review! >> >> On 23.07.2015 15:51, Roland Westrelin wrote: >>> assembler.cpp >>> >>> 68 Compile::current()->env()->record_failure("CodeCache is full?); >>> >>> That assumes we are calling this from c2 but it can be called from >>> c1 as well. >> You are right. I moved this code to the C2 methods calling >> 'AbstractAssembler::start_a_stub()'. The corresponding C1 methods >> already contain a call to 'bailout()'. 
> > In the new webrev, aarch64 emit_trampoline_stub still calls > Compile::current()->env()->record_failure, > and it appears that emit_trampoline_stub can be called from C1. > Shouldn't we fix it so that > ciEnv::record_failure works correctly from C1? Why does C1 need a > different bailout message? > > dl I went ahead and file a separate RFE for the bailout issue: https://bugs.openjdk.java.net/browse/JDK-8132354 dl From dean.long at oracle.com Sat Jul 25 01:35:17 2015 From: dean.long at oracle.com (Dean Long) Date: Fri, 24 Jul 2015 18:35:17 -0700 Subject: RFR (S) 8131782: C1 Class.cast optimization breaks when Class is loaded from static final In-Reply-To: References: <55ACFD0F.30207@oracle.com> <55AE1972.4050106@oracle.com> <55B20A62.2090307@oracle.com> <55B29DD8.40008@oracle.com> Message-ID: <55B2E7D5.7060806@oracle.com> OK I will push it now. I did 'hg ci -u shade' so that Aleksey gets credit for it, and jcheck isn't complaining. Does anyone know if JPRT does more checks for Committer status? If so I'll have to redo it and add a Contributed-by line. dl changeset: 8724:df802f98b828 tag: tip user: shade date: Fri Jul 24 21:29:11 2015 -0400 files: src/share/vm/c1/c1_ValueType.cpp description: 8131782: C1 Class.cast optimization breaks when Class is loaded from static final Summary: change as_ValueType() to return InstanceConstant when appropriate Reviewed-by: jrose On 7/24/2015 5:12 PM, John Rose wrote: > On Jul 24, 2015, at 1:19 PM, Dean Long > wrote: >> >> I can push it for you. Do you need another review? > > I don't think he does; it's a simple change. ? John -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.r.rose at oracle.com Sat Jul 25 02:32:02 2015 From: john.r.rose at oracle.com (John Rose) Date: Fri, 24 Jul 2015 19:32:02 -0700 Subject: RFR (S) 8131782: C1 Class.cast optimization breaks when Class is loaded from static final In-Reply-To: <55B2E7D5.7060806@oracle.com> References: <55ACFD0F.30207@oracle.com> <55AE1972.4050106@oracle.com> <55B20A62.2090307@oracle.com> <55B29DD8.40008@oracle.com> <55B2E7D5.7060806@oracle.com> Message-ID: You are fine. Aleksey is an Author for JDK 9: http://openjdk.java.net/census#shade Committers who sponsor changes are expected to use the correct Author (not themselves). It's a syntax error to push a non-Author changeset to an OpenJDK repo. BTW, jcheck does not consult the OJN census AFAIK. ? John On Jul 24, 2015, at 6:35 PM, Dean Long wrote: > > OK I will push it now. I did 'hg ci -u shade' so that Aleksey gets credit for it, and jcheck isn't complaining. Does anyone know if JPRT does more checks for Committer status? If so I'll have to redo it and add a Contributed-by line. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias.hartmann at oracle.com Mon Jul 27 05:44:27 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 27 Jul 2015 07:44:27 +0200 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55B2A13A.4000604@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <55B2A13A.4000604@oracle.com> Message-ID: <55B5C53B.5080606@oracle.com> Hi Dean, On 24.07.2015 22:34, Dean Long wrote: > On 7/24/2015 4:29 AM, Tobias Hartmann wrote: >> Hi Roland, >> >> thanks for the review! 
>> >> On 23.07.2015 15:51, Roland Westrelin wrote: >>> assembler.cpp >>> >>> 68 Compile::current()->env()->record_failure("CodeCache is full?); >>> >>> That assumes we are calling this from c2 but it can be called from c1 as well. >> You are right. I moved this code to the C2 methods calling 'AbstractAssembler::start_a_stub()'. The corresponding C1 methods already contain a call to 'bailout()'. > > In the new webrev, aarch64 emit_trampoline_stub still calls Compile::current()->env()->record_failure, > and it appears that emit_trampoline_stub can be called from C1. Shouldn't we fix it so that > ciEnv::record_failure works correctly from C1? Why does C1 need a different bailout message? Roland was referring to AbstractAssembler::start_a_stub() which is called from C1 and C2. However, CompiledStaticCall::emit_to_interp_stub() is only used by C2. C1 emits stubs via LIR_Assembler, for example LIR_Assembler::emit_static_call_stub(). I agree that it would be best to have a shared bailout mechanism, +1 to JDK-8132354. Best, Tobias > dl > >> I also noticed that we multiply the 'to_interp_stub_size()' by 2 when creating the stub (see compiledIC_sparc.cpp): >> >> 68 __ start_a_stub(to_interp_stub_size()*2); >> >> This seems to be unnecessary and causes problems in "Compile::scratch_emit_size" because we fix the stub section of the CodeBuffer to MAX_stubs_size and therefore fail to emit the to-interpreter stub since to_interp_stub_size()*2 > MAX_stubs_size. I removed the multiplication and added an assert to 'Compile::scratch_emit_size()'. >> >>> Did you add code for c1 to be on the safe side or have you observed problems with c1? >> I did not encounter the problem with C1 but added the checks to be on the safe side. Actually, we do call bailout() in C1 'LIR_Assembler::emit_static_call_stub()' if the stub cannot be created but without checking for bailed_out() in the calling method we will continue to emit code and crash (see explanation below). >> >>> "the corresponding code blob is freed. In CodeBuffer::free_blob() -> CodeBuffer::set_blob() we set addresses to 'badAddress' and therefore hit the assert.? >>> >>> What addresses are set to badAddress? >> If CodeBuffer::expand() fails, the corresponding buffer blob in the code cache is freed and code section addresses are set to 'badAddress' (see CodeBuffer::set_blob(NULL)). If we continue to emit code into this buffer or use its section addresses, we fail. The assert "wrong size of mach node" is hit because we use CodeBuffer::insts_size() which is now -1647855886. Another instance of this bug fails because we continue to emit code, call CodeSection::emit_int32() and fail because end() is set to 'badAddress'. >> >> I did some more testing with "java -XX:+StressCodeBuffers -Xcomp -version" and found more places where we have to check for a failed CodeBuffer expansion to not continue to emit code and crash. I added the corresponding checks. >> >> Here is the new webrev: >> http://cr.openjdk.java.net/~thartmann/8130309/webrev.02/ >> >> Thanks, >> Tobias >> >>> Roland. >>> >>>> Thanks, >>>> Tobias >>>> >>>> On 21.07.2015 15:40, Tobias Hartmann wrote: >>>>> Hi, >>>>> >>>>> please review the following patch. >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8130309 >>>>> http://cr.openjdk.java.net/~thartmann/8130309/webrev.00/ >>>>> >>>>> Problem: >>>>> While C2 is emitting code, an assert is hit because the emitted code size does not correspond to the size of the instruction. 
The problem is that the code cache is full and therefore the creation of a to-interpreter stub failed. Instead of bailing out we continue and hit the assert. >>>>> >>>>> More precisely, we emit code for a CallStaticJavaDirectNode on aarch64 and emit two stubs (see 'aarch64_enc_java_static_call' in aarch64.ad): >>>>> - MacroAssembler::emit_trampoline_stub() -> requires 64 bytes >>>>> - CompiledStaticCall::emit_to_interp_stub() -> requires 56 bytes >>>>> However, we only have 112 bytes of free space in the stub section of the code buffer and need to expand it. Since the code cache is full, the expansion fails and the corresponding code blob is freed. In CodeBuffer::free_blob() -> CodeBuffer::set_blob() we set addresses to 'badAddress' and therefore hit the assert. >>>>> >>>>> Solution: >>>>> Even if we have enough space in the instruction section, we should check for a failed expansion of the stub section and bail out immediately. I added the corresponding check and also removed an unnecessary call to 'start_a_stub' after we already failed. >>>>> >>>>> Testing: >>>>> - Failing test >>>>> - JPRT >>>>> >>>>> Thanks, >>>>> Tobias >>>>> > From tobias.hartmann at oracle.com Mon Jul 27 05:58:14 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 27 Jul 2015 07:58:14 +0200 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55B2AC68.7030900@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <55B2A13A.4000604@oracle.com> <55B2AC68.7030900@oracle.com> Message-ID: <55B5C876.6020701@oracle.com> On 24.07.2015 23:21, Dean Long wrote: > If TraceJumps causes problems on Sparc, then I think the Sparc version of to_interp_stub_size() needs to be adjusted to take TraceJumps into account. No, the Sparc version of to_interp_stub_size() already takes TraceJumps into account: 90 int CompiledStaticCall::to_interp_stub_size() { 91 // This doesn't need to be accurate but it must be larger or equal to 92 // the real size of the stub. 93 return (NativeMovConstReg::instruction_size + // sethi/setlo; 94 NativeJump::instruction_size + // sethi; jmp; nop 95 (TraceJumps ? 20 * BytesPerInstWord : 0) ); 96 } The problem is that the additional code needed for TraceJumps does not fit into the scratch buffer because we only allocate 'MAX_stubs_size' for stubs and cannot expand the buffer (see Compile::scratch_emit_size()). Other solutions would be to increase 'MAX_stubs_size' (which does not make sense because this is only a debug case) or to not emit the stub if we are 'in_scratch_emit_size()'. I saw you filed JDK-8132344 which should take care of this issue in general. Best, Tobias > > dl From aleksey.shipilev at oracle.com Mon Jul 27 08:51:59 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 27 Jul 2015 11:51:59 +0300 Subject: RFR (S) 8131782: C1 Class.cast optimization breaks when Class is loaded from static final In-Reply-To: References: <55ACFD0F.30207@oracle.com> <55AE1972.4050106@oracle.com> <55B20A62.2090307@oracle.com> <55B29DD8.40008@oracle.com> <55B2E7D5.7060806@oracle.com> Message-ID: <55B5F12F.5050904@oracle.com> Thanks for pushing the change, Dean! -Aleksey On 07/25/2015 05:32 AM, John Rose wrote: > You are fine. 
> Aleksey is an Author for JDK 9: http://openjdk.java.net/census#shade > Committers who sponsor changes are expected to use the correct Author > (not themselves). > It's a syntax error to push a non-Author changeset to an OpenJDK repo. > BTW, jcheck does not consult the OJN census AFAIK. > > ? John > > On Jul 24, 2015, at 6:35 PM, Dean Long > wrote: >> >> OK I will push it now. I did 'hg ci -u shade' so that Aleksey gets >> credit for it, and jcheck isn't complaining. Does anyone know if JPRT >> does more checks for Committer status? If so I'll have to redo it and >> add a Contributed-by line. > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From aleksey.shipilev at oracle.com Mon Jul 27 09:13:53 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 27 Jul 2015 12:13:53 +0300 Subject: RFR (S) 8131682: C1 should use multibyte nops everywhere In-Reply-To: <4295855A5C1DE049A61835A1887419CC2D00A472@DEWDFEMB12A.global.corp.sap> References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> <55AD0AE0.3060803@oracle.com> <55AEAB5C.8050307@oracle.com> <55AF500B.9000505@oracle.com> <55B20A16.7020300@oracle.com> <4295855A5C1DE049A61835A1887419CC2D00A472@DEWDFEMB12A.global.corp.sap> Message-ID: <55B5F651.5090509@oracle.com> Thanks Goetz! Fixed the assembler_ppc.inline.hpp. Andrew/Edward, are you OK with AArch64 part? http://cr.openjdk.java.net/~shade/8131682/webrev.02/ Thanks, -Aleksey On 07/24/2015 01:38 PM, Lindenmaier, Goetz wrote: > Hi Aleksey, > > thanks for pointing us to that change! > Looks good, but does not compile. Default arg should only be in the header. > See below. > > Ppc part reviewed and I don?t need a new webrev. > > Best regards, > Goetz. > > --- a/src/cpu/ppc/vm/assembler_ppc.inline. 
> +++ b/src/cpu/ppc/vm/assembler_ppc.inline.hpp > @@ -210,7 +210,7 @@ > inline void Assembler::extsw( Register a, Register s) { emit_int32(EXTSW_OPCODE | rta(a) | rs(s) | rc(0)); } > > // extended mnemonics > -inline void Assembler::nop() { Assembler::ori(R0, R0, 0); } > +inline void Assembler::nop(int count) { for (int i = 0; i < count; i++) { Assembler::ori(R0, R0, 0); } } > // NOP for FP and BR units (different versions to allow them to be in one group) > inline void Assembler::fpnop0() { Assembler::fmr(F30, F30); } > inline void Assembler::fpnop1() { Assembler::fmr(F31, F31); } > > > g++ 4.8.3: > In file included from /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/share/vm/asm/assembler.inline.hpp:43:0, > from /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/cpu/ppc/vm/macroAssembler_ppc.inline.hpp:29, > from /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/share/vm/asm/macroAssembler.inline.hpp:43, > from ../generated/adfiles/ad_ppc_64.cpp:56: > /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/cpu/ppc/vm/assembler_ppc.inline.hpp:213:41: error: default argument given for parameter 1 of void Assembler::nop(int) [-fpermissive] > inline void Assembler::nop(int count = 1) { for(int i = 0; i < count; i++) > ^ > In file included from /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/share/vm/asm/assembler.hpp:434:0, > from /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/cpu/ppc/vm/nativeInst_ppc.hpp:29, > from /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/share/vm/code/nativeInst.hpp:41, > from ../generated/adfiles/ad_ppc_64.hpp:57, > from ../generated/adfiles/ad_ppc_64.cpp:54: > /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/cpu/ppc/vm/assembler_ppc.hpp:1383:15: error: after previous specification in void Assembler::nop(int) [-fpermissive] > inline void nop(int count = 1); > ^ > > > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Aleksey Shipilev > Sent: Friday, July 24, 2015 11:49 AM > To: Dean Long; hotspot compiler > Cc: ppc-aix-port-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net > Subject: Re: RFR (S) 8131682: C1 should use multibyte nops everywhere > > * PGP Signed by an unknown key > > (explicitly cc'ing AArch64 and PPC folks) > > Thanks, > -Aleksey > > On 22.07.2015 11:10, Aleksey Shipilev wrote: >> Thanks for review, Dean! >> >> I'd like to hear the opinions of AArch64 and Power folks, since we >> contaminate their assemblers a bit to gain access to x86 fat nops. >> >> -Aleksey >> >> On 21.07.2015 23:28, Dean Long wrote: >>> This version looks good. >>> >>> dl >>> >>> On 7/20/2015 7:51 AM, Aleksey Shipilev wrote: >>>> Hi Dean, >>>> >>>> Thanks for taking a look! >>>> >>>> Silly me, I should have left the call patching cases intact, because >>>> you're right, we should be able to patch the nops partially while still >>>> producing the correct instruction stream. Therefore, I reverted the >>>> cases where we do nop-ing for *instruction* patching, and added the >>>> comment there. >>>> >>>> Other places seem to use the nop sequences to provide the alignment, not >>>> for the general patching. Especially interesting for us is the case of >>>> aligning the patcheable immediate in the existing call. C2 does the nops >>>> in these cases. 
>>>> >>>> New webrev: >>>> http://cr.openjdk.java.net/~shade/8131682/webrev.01/ >>>> >>>> Testing: >>>> * JPRT -testset hotspot on open platforms; >>>> * Targeted benchmarks, plus eyeballing the assembly; >>>> >>>> Thanks, >>>> -Aleksey >>>> >>>> On 18.07.2015 10:51, Dean Long wrote: >>>>> I think we should distinguish the different uses and treat them >>>>> accordingly: >>>>> >>>>> 1) padding nops for patching, executed >>>>> >>>>> We need to be careful about inserting a fat nop here, if later patching >>>>> overwrites only part of the fat nop, resulting in an illegal intruction. >>>>> >>>>> 2) padding nops for patching, never executed >>>>> >>>>> It should be safe insert a fat nop here, but there's no point if the >>>>> nops are not reachable and never executed. >>>>> >>>>> >>>>> 3) alignment nops, never patched, executed >>>>> >>>>> Fat nops are fine, but on some CPUs branching may be even better, so I >>>>> suggest using align() for this, and letting align() decide what to >>>>> generate. The change in check_icache() could use a version of align >>>>> that takes the target offset as an argument: >>>>> >>>>> 348 align(CodeEntryAlignment,__ offset() + ic_cmp_size); >>>>> >>>>> 4) alignment nops, never patched, never executed >>>>> >>>>> Doesn't matter what we emit here, but we might as well make it >>>>> understandable by humans using a debugger. >>>>> >>>>> >>>>> I believe the patching nops in c1_CodeStubs_x86.cpp and >>>>> c1_LIRAssembler.cpp are patched concurrently while the code is running, >>>>> not at a safepoint, so it's not clear to me if it's safe to use fat nops >>>>> on x86. I would consider those changes unsafe on x86 without further >>>>> analysis of what happens during patching. >>>>> >>>>> dl >>>>> >>>>> On 7/17/2015 6:29 AM, Aleksey Shipilev wrote: >>>>>> Hi there, >>>>>> >>>>>> C1 is not very good at inlining and intrisifying methods, and hence the >>>>>> call performance is important there. One nit that we can see in the >>>>>> generated code on x86 is that C1 uses the single-byte nops, even for >>>>>> long nop strides. >>>>>> >>>>>> This improvement fixes that: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8131682 >>>>>> http://cr.openjdk.java.net/~shade/8131682/webrev.00/ >>>>>> >>>>>> Testing: >>>>>> - JPRT -testset hotspot on open platforms >>>>>> - eyeballing the generated assembly with -XX:TieredStopAtLevel=1 >>>>>> >>>>>> (I understand the symmetric change is going to be needed in closed >>>>>> parts, but let's polish the open part first). >>>>>> >>>>>> Thanks, >>>>>> -Aleksey >>>>>> >>>> >>> >> >> > > > > * Unknown Key > * 0x62A119A7 > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From adinn at redhat.com Mon Jul 27 09:35:09 2015 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 27 Jul 2015 10:35:09 +0100 Subject: [aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte nops everywhere In-Reply-To: <55B5F651.5090509@oracle.com> References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> <55AD0AE0.3060803@oracle.com> <55AEAB5C.8050307@oracle.com> <55AF500B.9000505@oracle.com> <55B20A16.7020300@oracle.com> <4295855A5C1DE049A61835A1887419CC2D00A472@DEWDFEMB12A.global.corp.sap> <55B5F651.5090509@oracle.com> Message-ID: <55B5FB4D.4070603@redhat.com> On 27/07/15 10:13, Aleksey Shipilev wrote: > Thanks Goetz! Fixed the assembler_ppc.inline.hpp. > > Andrew/Edward, are you OK with AArch64 part? 
> http://cr.openjdk.java.net/~shade/8131682/webrev.02/ Yes, it's fine. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) From aph at redhat.com Mon Jul 27 10:21:47 2015 From: aph at redhat.com (Andrew Haley) Date: Mon, 27 Jul 2015 11:21:47 +0100 Subject: [aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte nops everywhere In-Reply-To: <55B5F651.5090509@oracle.com> References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> <55AD0AE0.3060803@oracle.com> <55AEAB5C.8050307@oracle.com> <55AF500B.9000505@oracle.com> <55B20A16.7020300@oracle.com> <4295855A5C1DE049A61835A1887419CC2D00A472@DEWDFEMB12A.global.corp.sap> <55B5F651.5090509@oracle.com> Message-ID: <55B6063B.3070604@redhat.com> On 27/07/15 10:13, Aleksey Shipilev wrote: > Thanks Goetz! Fixed the assembler_ppc.inline.hpp. > > Andrew/Edward, are you OK with AArch64 part? > http://cr.openjdk.java.net/~shade/8131682/webrev.02/ I agree that it looks good. Please have a look to see how many NOPs take the same time as a branch. Thanks, Andrew. From roland.westrelin at oracle.com Mon Jul 27 10:29:29 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Mon, 27 Jul 2015 12:29:29 +0200 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55B22191.9070904@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> Message-ID: > Here is the new webrev: > http://cr.openjdk.java.net/~thartmann/8130309/webrev.02/ CompiledStaticCall::emit_to_interp_stub() is compiler independent code. Shouldn?t the call be ciEnv::current()->record_failure() even if the method is only called from c2 (for now?)? (which is what Dean suggested as well I think) Roland. From aleksey.shipilev at oracle.com Mon Jul 27 10:53:20 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 27 Jul 2015 13:53:20 +0300 Subject: [aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte nops everywhere In-Reply-To: <55B6063B.3070604@redhat.com> References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> <55AD0AE0.3060803@oracle.com> <55AEAB5C.8050307@oracle.com> <55AF500B.9000505@oracle.com> <55B20A16.7020300@oracle.com> <4295855A5C1DE049A61835A1887419CC2D00A472@DEWDFEMB12A.global.corp.sap> <55B5F651.5090509@oracle.com> <55B6063B.3070604@redhat.com> Message-ID: <55B60DA0.30501@oracle.com> On 07/27/2015 01:21 PM, Andrew Haley wrote: > On 27/07/15 10:13, Aleksey Shipilev wrote: >> Thanks Goetz! Fixed the assembler_ppc.inline.hpp. >> >> Andrew/Edward, are you OK with AArch64 part? >> http://cr.openjdk.java.net/~shade/8131682/webrev.02/ > > I agree that it looks good. Please have a look to see how many NOPs take the > same time as a branch. Thanks! I don't quite believe we should spend time trying branches for nops, at least for x86. The change we are discussing follows the Intel Optimization Reference Manual 3.5.1.10 "Using NOPs", which Assembler::align for x86 seems to implement with some bells and whistles. Agner agrees on using multi-byte nops (0F 1F ...) on modern x86 chips as well; up to the point he claims 4 insn/clock throughput for them. Is there a vendor-recommended strategy for using something else? 
Even if it's so, this calls for experimenting with Assembler::align itself (that also touches C2 usages), and not the C1-specific usages this trivial change addresses. Thanks again, -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From roland.westrelin at oracle.com Mon Jul 27 11:31:21 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Mon, 27 Jul 2015 13:31:21 +0200 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> Message-ID: <77322805-2B38-4B85-98EF-9ABE45A31BDA@oracle.com> >> Here is the new webrev: >> http://cr.openjdk.java.net/~thartmann/8130309/webrev.02/ > > CompiledStaticCall::emit_to_interp_stub() is compiler independent code. Shouldn?t the call be ciEnv::current()->record_failure() even if the method is only called from c2 (for now?)? (which is what Dean suggested as well I think) Actually, why not have emit_to_interp_stub() returns an error and bail out from compilation in the caller? Roland. From aph at redhat.com Mon Jul 27 12:07:12 2015 From: aph at redhat.com (Andrew Haley) Date: Mon, 27 Jul 2015 13:07:12 +0100 Subject: [aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte nops everywhere In-Reply-To: <55B60DA0.30501@oracle.com> References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> <55AD0AE0.3060803@oracle.com> <55AEAB5C.8050307@oracle.com> <55AF500B.9000505@oracle.com> <55B20A16.7020300@oracle.com> <4295855A5C1DE049A61835A1887419CC2D00A472@DEWDFEMB12A.global.corp.sap> <55B5F651.5090509@oracle.com> <55B6063B.3070604@redhat.com> <55B60DA0.30501@oracle.com> Message-ID: <55B61EF0.40803@redhat.com> On 07/27/2015 11:53 AM, Aleksey Shipilev wrote: > On 07/27/2015 01:21 PM, Andrew Haley wrote: >> On 27/07/15 10:13, Aleksey Shipilev wrote: >>> Thanks Goetz! Fixed the assembler_ppc.inline.hpp. >>> >>> Andrew/Edward, are you OK with AArch64 part? >>> http://cr.openjdk.java.net/~shade/8131682/webrev.02/ >> >> I agree that it looks good. Please have a look to see how many NOPs take the >> same time as a branch. > > Thanks! > > I don't quite believe we should spend time trying branches for nops, at > least for x86. The change we are discussing follows the Intel > Optimization Reference Manual 3.5.1.10 "Using NOPs", which > Assembler::align for x86 seems to implement with some bells and > whistles. Agner agrees on using multi-byte nops (0F 1F ...) on modern > x86 chips as well; up to the point he claims 4 insn/clock throughput for > them. Sure. My apologies: I responded to the wrong person. My interest is about AArch64. Andrew. 
From tobias.hartmann at oracle.com Mon Jul 27 14:36:11 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 27 Jul 2015 16:36:11 +0200 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <77322805-2B38-4B85-98EF-9ABE45A31BDA@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <77322805-2B38-4B85-98EF-9ABE45A31BDA@oracle.com> Message-ID: <55B641DB.1010608@oracle.com> On 27.07.2015 13:31, Roland Westrelin wrote: >>> Here is the new webrev: >>> http://cr.openjdk.java.net/~thartmann/8130309/webrev.02/ >> >> CompiledStaticCall::emit_to_interp_stub() is compiler independent code. Shouldn?t the call be ciEnv::current()->record_failure() even if the method is only called from c2 (for now?)? (which is what Dean suggested as well I think) Right, I missed that and got confused because the method is guarded by "#ifdef COMPILER2" on Sparc. > Actually, why not have emit_to_interp_stub() returns an error and bail out from compilation in the caller? I changed 'emit_to_interpr_stub()' accordingly and now bail out from the caller if it fails. I also had to adapt the ppc code and the 'emit_trampoline_stub()' method on aarch64. I left the special handling of -XX:TraceJumps in the Sparc code since JDK-8132344 will fix it. Here is the new webrev: http://cr.openjdk.java.net/~thartmann/8130309/webrev.03/ Thanks, Tobias > > Roland. > From roland.westrelin at oracle.com Mon Jul 27 14:57:17 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Mon, 27 Jul 2015 16:57:17 +0200 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55B641DB.1010608@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <77322805-2B38-4B85-98EF-9ABE45A31BDA@oracle.com> <55B641DB.1010608@oracle.com> Message-ID: <5B5E5666-647E-4B96-9254-07CB477DE6AD@oracle.com> > http://cr.openjdk.java.net/~thartmann/8130309/webrev.03/ That looks good to me. Roland. From vladimir.kozlov at oracle.com Mon Jul 27 16:38:04 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 27 Jul 2015 09:38:04 -0700 Subject: [9] RFR(M): 8130832: Extend the WhiteBox API to provide information about the availability of compiler intrinsics In-Reply-To: <55AF4D7C.1030706@oracle.com> References: <55A3EBF8.2020208@oracle.com> <55A484F7.4010705@oracle.com> <55A53862.7040308@oracle.com> <55A542D0.50100@oracle.com> <55A6A340.3050204@oracle.com> <55A6AA2B.5030005@oracle.com> <7CC60DCE-330D-4C1B-81FB-C2C3B4DFDEA7@oracle.com> <55AE62FE.4070502@oracle.com> <55AF4D7C.1030706@oracle.com> Message-ID: <55B65E6C.6050404@oracle.com> WB method isIntrinsicAvailableForMethod0 has parameter compilationContext. So I don't think you need is_intrinsic_available(methodHandle method) variant - it is not used from WB. You left is_intrinsic_available*_for* in comments in abstractCompiler.hpp Following the same naming logic I would suggest to remove "ForMethod" in WB new methods names. 
Missing 'virtual' in c2compiler.hpp Leftover line in c2compiler.cpp: + //return Compile::is_intrinsic_available(method, compilation_context, false); Indention is off - keep following lines at the same level as !compilation_context (one space after "("): + (!compilation_context.is_null() && + CompilerOracle::has_option_value(compilation_context, "DisableIntrinsic", disable_intr) && + strstr(disable_intr, vmIntrinsics::name_at(id)) != NULL) Remove 'return false' because following InlineUnsafeOps check disable some intrinsics: + case vmIntrinsics::_Reference_get: + return false; + break; Also I would prefer vmIntrinsics::is_disabled_by_flags() was called from compiler's is_intrinsic_disabled_by_flag() flag. Additionally I think we should not have difference in behavior of is_intrinsic_disabled_by_flag() in C1 and C2. Both compilers should follow the same rules. If we disable intrinsic on command line - C1 should not intrinsify it. The only difference should be in supported intrinsics. If you think it is big change - file an other bug (it is bug, not rfe) to fix it separately. Thanks, Vladimir On 7/22/15 12:59 AM, Zolt?n Maj? wrote: > Hi John, > > > On 07/21/2015 09:02 PM, John Rose wrote: >> Yes, that will work, and I think it is cleaner than what we had >> before, as well as providing the new required functionality. >> >> Reviewed; please get a second reviewer. > > thank you for the review! I'll ask Vladimir K., maybe he has time to > look at the newest webrev. > >> >> ? John >> >> P.S. If the unit tests want to test (via the whitebox API) whether an >> intrinsic was compiled successfully, we might want to expose >> Compile::gather_intrinsic_statistics, etc. But not in this change set. > > That is an interesting idea. We'd also have to see if current tests > require such functionality or if SQE plans to add tests requiring that > functionality. > >> >> P.P.S. As I think I said before, I wish we had a way to consolidate >> the switch statements further (into vmSymbols.hpp). But I don't see a >> clean way to do it. > > Yes, that would be nice. I've not seen a good way to do that, partly > because inconsistencies between the way C1 and C2 depends on the value > of command-line flags. > > Thank you! > > Best regards, > > > Zoltan > >> >> On Jul 21, 2015, at 8:19 AM, Zolt?n Maj? > > wrote: >>> >>> Here is the newest webrev: >>> - top:http://cr.openjdk.java.net/~zmajo/8130832/top/ >>> >>> - >>> hotspot:http://cr.openjdk.java.net/~zmajo/8130832/hotspot/webrev.03/ >>> >> > From vladimir.kozlov at oracle.com Mon Jul 27 17:29:21 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 27 Jul 2015 10:29:21 -0700 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55B641DB.1010608@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <77322805-2B38-4B85-98EF-9ABE45A31BDA@oracle.com> <55B641DB.1010608@oracle.com> Message-ID: <55B66A71.8050704@oracle.com> Use ciEnv()::current() instead of Compile::current()->env(). compile.cpp why special case for StressCodeBuffers? Otherwise looks good. Thanks, Vladimir On 7/27/15 7:36 AM, Tobias Hartmann wrote: > > On 27.07.2015 13:31, Roland Westrelin wrote: >>>> Here is the new webrev: >>>> http://cr.openjdk.java.net/~thartmann/8130309/webrev.02/ >>> >>> CompiledStaticCall::emit_to_interp_stub() is compiler independent code. 
Shouldn?t the call be ciEnv::current()->record_failure() even if the method is only called from c2 (for now?)? (which is what Dean suggested as well I think) > > Right, I missed that and got confused because the method is guarded by "#ifdef COMPILER2" on Sparc. > >> Actually, why not have emit_to_interp_stub() returns an error and bail out from compilation in the caller? > > I changed 'emit_to_interpr_stub()' accordingly and now bail out from the caller if it fails. I also had to adapt the ppc code and the 'emit_trampoline_stub()' method on aarch64. > > I left the special handling of -XX:TraceJumps in the Sparc code since JDK-8132344 will fix it. > > Here is the new webrev: > http://cr.openjdk.java.net/~thartmann/8130309/webrev.03/ > > Thanks, > Tobias > >> >> Roland. >> From dean.long at oracle.com Mon Jul 27 17:44:09 2015 From: dean.long at oracle.com (Dean Long) Date: Mon, 27 Jul 2015 10:44:09 -0700 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55B641DB.1010608@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <77322805-2B38-4B85-98EF-9ABE45A31BDA@oracle.com> <55B641DB.1010608@oracle.com> Message-ID: <55B66DE9.2070000@oracle.com> Looks good. I wish the bailout/record_failure could be done in start_a_stub and not in the callers, but we can clean that up as part of 8132354. dl On 7/27/2015 7:36 AM, Tobias Hartmann wrote: > Here is the new webrev: > http://cr.openjdk.java.net/~thartmann/8130309/webrev.03/ From tobias.hartmann at oracle.com Tue Jul 28 07:19:54 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 28 Jul 2015 09:19:54 +0200 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55B66A71.8050704@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <77322805-2B38-4B85-98EF-9ABE45A31BDA@oracle.com> <55B641DB.1010608@oracle.com> <55B66A71.8050704@oracle.com> Message-ID: <55B72D1A.1030706@oracle.com> Thanks, Vladimir. On 27.07.2015 19:29, Vladimir Kozlov wrote: > Use ciEnv()::current() instead of Compile::current()->env(). Done. > compile.cpp why special case for StressCodeBuffers? Right, the special case is not necessary since StressCodeBuffers should not affect the scratch buffer. I removed it. New webrev: http://cr.openjdk.java.net/~thartmann/8130309/webrev.04/ Thanks, Tobias > Otherwise looks good. > > Thanks, > Vladimir > > On 7/27/15 7:36 AM, Tobias Hartmann wrote: >> >> On 27.07.2015 13:31, Roland Westrelin wrote: >>>>> Here is the new webrev: >>>>> http://cr.openjdk.java.net/~thartmann/8130309/webrev.02/ >>>> >>>> CompiledStaticCall::emit_to_interp_stub() is compiler independent code. Shouldn?t the call be ciEnv::current()->record_failure() even if the method is only called from c2 (for now?)? (which is what Dean suggested as well I think) >> >> Right, I missed that and got confused because the method is guarded by "#ifdef COMPILER2" on Sparc. >> >>> Actually, why not have emit_to_interp_stub() returns an error and bail out from compilation in the caller? >> >> I changed 'emit_to_interpr_stub()' accordingly and now bail out from the caller if it fails. I also had to adapt the ppc code and the 'emit_trampoline_stub()' method on aarch64. 
>> >> I left the special handling of -XX:TraceJumps in the Sparc code since JDK-8132344 will fix it. >> >> Here is the new webrev: >> http://cr.openjdk.java.net/~thartmann/8130309/webrev.03/ >> >> Thanks, >> Tobias >> >>> >>> Roland. >>> From tobias.hartmann at oracle.com Tue Jul 28 07:20:11 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 28 Jul 2015 09:20:11 +0200 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <5B5E5666-647E-4B96-9254-07CB477DE6AD@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <77322805-2B38-4B85-98EF-9ABE45A31BDA@oracle.com> <55B641DB.1010608@oracle.com> <5B5E5666-647E-4B96-9254-07CB477DE6AD@oracle.com> Message-ID: <55B72D2B.6080608@oracle.com> Thanks, Roland. Best, Tobias On 27.07.2015 16:57, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~thartmann/8130309/webrev.03/ > > That looks good to me. > > Roland. > From tobias.hartmann at oracle.com Tue Jul 28 07:20:24 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 28 Jul 2015 09:20:24 +0200 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55B66DE9.2070000@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <77322805-2B38-4B85-98EF-9ABE45A31BDA@oracle.com> <55B641DB.1010608@oracle.com> <55B66DE9.2070000@oracle.com> Message-ID: <55B72D38.2040705@oracle.com> Thanks, Dean. Best, Tobias On 27.07.2015 19:44, Dean Long wrote: > Looks good. I wish the bailout/record_failure could be done in start_a_stub and not in the callers, but we can clean that up as part of 8132354. > > dl > > On 7/27/2015 7:36 AM, Tobias Hartmann wrote: >> Here is the new webrev: >> http://cr.openjdk.java.net/~thartmann/8130309/webrev.03/ > From michael.haupt at oracle.com Tue Jul 28 08:56:05 2015 From: michael.haupt at oracle.com (Michael Haupt) Date: Tue, 28 Jul 2015 10:56:05 +0200 Subject: RFR(M): 8004073: Implement C2 Ideal node specific dump() method Message-ID: <0FE82414-5869-480D-B56A-1F3677A569ED@oracle.com> Dear all, please review and sponsor this change. RFE: https://bugs.openjdk.java.net/browse/JDK-8004073 Webrev: http://cr.openjdk.java.net/~mhaupt/8004073/webrev.00 This change extends the dumping facilities of the C2 IR Node hierarchy. Node::dump() is used in debugging sessions to print information about an IR node. The API is extended by these new entry points: * void Node::dump_comp() -> Dump the node in compact form. * void Node::dump_rel(), void Node::dump_rel_comp() -> Dump the node (in compact form) and all nodes related to it. Mark the current node in the output. The notion of "related" nodes is of course a property of the node itself, or rather, of its class. This is configured in this virtual method: * virtual void Node::rel(GrowableArray<Node*> *in_rel, GrowableArray<Node*> *out_rel, bool compact) -> Collect all related nodes. Store the incoming related nodes in the in_rel array, and the outgoing related nodes in the out_rel array. In case compact representation is desired, possibly collect fewer nodes.
This method must be overridden by all subclasses of Node that, in their notion of what related nodes are, deviate from the default behaviour as specified in the implementation of Node::rel() in the Node class itself. The default is to collect all inputs and outputs till depth 1, including both data and control nodes, ignoring compactness. There are several auxiliary methods. Node collection is chiefly facilitated by this method: * void Node::collect_nodes(GrowableArray<Node*> *ns, int d, bool ctrl, bool data) -> Collect nodes till depth d (positive: inputs, negative: outputs), including *only* control or data nodes (this is controlled by the two bool arguments, and setting both to true is nonsensical). Furthermore, there exist pre-defined collectors for common cases: * void Node::collect_nodes_in_all_data(GrowableArray<Node*> *ns, bool ctrl) -> Collect the entire data input graph. Include control nodes only if requested. * void Node::collect_nodes_in_all_ctrl(GrowableArray<Node*> *ns, bool data) -> Collect the entire control input graph. Include data nodes only if requested. * void Node::collect_nodes_out_all_ctrl_boundary(GrowableArray<Node*> *ns) -> Collect all output nodes, stopping at control nodes, including these. * void Node::collect_nodes_in_data_out_1(GrowableArray<Node*> *is, GrowableArray<Node*> *os, bool compact) -> Collect the entire data input graph, and outputs till depth 1. Regarding compact dumping, subclasses of Node should override this virtual method: * virtual void dump_comp_spec(outputStream *st) -> Dump the specifics of a node in compact form. This method is supposed to operate in the fashion of Node::dump_spec(). The default behaviour for compact dumping is to dump a node's name and index. Specific notions of "related" have been added to the following node classes: * AbsNode and subclasses * AddNode and subclasses * AddPNode * AtanDNode * BinaryNode * BoolNode * CosDNode * CountBitsNode and subclasses * Div{D,F,I,L}Node * ExpDNode * GotoNode * HaltNode * Log{10D,D}Node * LShift{I,L}Node * Mod{D,F,I,L}Node * MulHiLNode * Mul{D,F,I,L}Node and subclasses * DivModNode and subclasses * IfNode * JumpNode * SafePointNode and subclasses (may require more detail) * StartNode and subclass * NegNode and subclasses * PowDNode * IfProjNode and subclasses * JProjNode and subclass * ParmNode * ReductionNode and subclasses * Round{Double,Float}Node * RShift{I,L}Node * SqrtDNode * SubNode and subclasses * TanDNode * AddV{B,D,F,I,L,S,_}Node * DivV{D,F}Node * LShiftV{B,I,L,S}Node * MulV{D,F,I,S}Node * OrVNode * RShiftV{B,I,L,S}Node * SubV{B,D,F,I,L,S}Node * URShiftV{B,I,L,S}Node * XorVNode * URShift{I,L}Node Here is a sample session in LLDB, showing the different dumps for an IfNode: * thread #28: tid = 0x10d1ce3, 0x000000010353be17 libjvm.dylib`IfNode::Ideal(this=0x0000000104888760, phase=0x000000011cf62368, can_reshape=<unavailable>) + 77 at ifnode.cpp:1297, name = 'Java: C2 CompilerThread0', stop reason = breakpoint 1.1 frame #0: 0x000000010353be17 libjvm.dylib`IfNode::Ideal(this=0x0000000104888760, phase=0x000000011cf62368, can_reshape=<unavailable>) + 77 at ifnode.cpp:1297 1294 if (remove_dead_region(phase, can_reshape)) return this; 1295 // No Def-Use info?
1296 if (!can_reshape) return NULL; -> 1297 PhaseIterGVN *igvn = phase->is_IterGVN(); 1298 1299 // Don't bother trying to transform a dead if 1300 if (in(0)->is_top()) return NULL; (lldb) expr -- this->dump() 82 If === 61 79 [[ 83 84 ]] P=0.999999, C=-1.000000 !jvms: String::charAt @ bci:27 (lldb) expr -- this->dump_comp() If(82)P=0.999999, C=-1.000000 (lldb) expr -- this->dump_rel() 10 Parm === 3 [[ 38 38 65 32 ]] Parm0: java/lang/String:NotNull:exact * Oop:java/lang/String:NotNull:exact * !jvms: String::charAt @ bci:-1 38 AddP === _ 10 10 37 [[ 39 ]] Oop:java/lang/String:NotNull:exact+12 * [narrow] !jvms: String::charAt @ bci:6 39 LoadN === _ 7 38 [[ 40 ]] @java/lang/String:exact+12 * [narrow], name=value, idx=4; #narrowoop: char[int:>=0]:exact * !jvms: String::charAt @ bci:6 40 DecodeN === _ 39 [[ 55 42 ]] #char[int:>=0]:exact * !jvms: String::charAt @ bci:6 37 ConL === 0 [[ 38 56 ]] #long:12 55 CastPP === 47 40 [[ 56 56 97 97 96 86 ]] #char[int:>=0]:NotNull:exact * !jvms: String::charAt @ bci:9 56 AddP === _ 55 55 37 [[ 57 ]] !jvms: String::charAt @ bci:9 7 Parm === 3 [[ 99 98 86 65 57 32 50 39 ]] Memory Memory: @BotPTR *+bot, idx=Bot; !jvms: String::charAt @ bci:-1 57 LoadRange === _ 7 56 [[ 65 58 78 ]] @bottom[int:>=0]+12 * [narrow], idx=5; #int:>=0 !jvms: String::charAt @ bci:9 11 Parm === 3 [[ 78 93 65 32 24 32 65 58 86 ]] Parm1: int !jvms: String::charAt @ bci:-1 78 CmpU === _ 11 57 [[ 79 ]] !jvms: String::charAt @ bci:27 79 Bool === _ 78 [[ 82 ]] [lt] !jvms: String::charAt @ bci:27 82 > If === 61 79 [[ 83 84 ]] P=0.999999, C=-1.000000 !jvms: String::charAt @ bci:27 83 IfTrue === 82 [[ 99 98 ]] #1 !jvms: String::charAt @ bci:27 84 IfFalse === 82 [[ 86 ]] #0 !jvms: String::charAt @ bci:27 99 Return === 83 6 7 8 9 returns 98 [[ 0 ]] 98 LoadUS === 83 7 96 [[ 99 ]] @char[int:>=0]:exact+any *, idx=6; #char !jvms: String::charAt @ bci:27 86 CallStaticJava === 84 6 7 8 9 ( 85 1 1 55 11 ) [[ 87 ]] # Static uncommon_trap(reason='range_check' action='make_not_entrant') void ( int ) C=0.000100 String::charAt @ bci:27 !jvms: String::charAt @ bci:27 (lldb) expr -- this->dump_rel_comp() If(82)P=0.999999, C=-1.000000 Bool(79)[lt] CmpU(78) Parm(11)1:int LoadRange(57) @bottom[int:>=0]+12 * [narrow], idx=5; #int:>=0 IfTrue(83)[99][98]#1 IfFalse(84)[86]#0 Return(99) LoadUS(98) @char[int:>=0]:exact+any *, idx=6; #char CallStaticJava(86)uncommon_trap Best, Michael -- Dr. Michael Haupt | Principal Member of Technical Staff Phone: +49 331 200 7277 | Fax: +49 331 200 7561 Oracle Java Platform Group | LangTools Team | Nashorn Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany Oracle is committed to developing practices and products that help protect the environment From aph at redhat.com Tue Jul 28 09:35:14 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 28 Jul 2015 10:35:14 +0100 Subject: [aarch64-port-dev ] RFR: 8132010: aarch64: Typo in SHA intrinsics flags handling code for aarch64 In-Reply-To: <55AE5819.3070506@oracle.com> References: <1437470315.1575.9.camel@mylittlepony.linaroharston> <1437488872.6057.3.camel@mylittlepony.linaroharston> <55AE5819.3070506@oracle.com> Message-ID: <55B74CD2.8080104@redhat.com> On 21/07/15 15:32, Zoltán Majó wrote: > the fix looks good to me (I'm not a *R*eviewer). And to me, too. Official reviewer, please. Thanks, Andrew.
From aph at redhat.com Tue Jul 28 09:36:36 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 28 Jul 2015 10:36:36 +0100 Subject: [aarch64-port-dev ] RFR: 8131062: aarch64: add support for GHASH acceleration In-Reply-To: <1437491895.6739.17.camel@mylittlepony.linaroharston> References: <1437491895.6739.17.camel@mylittlepony.linaroharston> Message-ID: <55B74D24.7060508@redhat.com> On 21/07/15 16:18, Edward Nevill wrote: > http://cr.openjdk.java.net/~enevill/8131062/webrev.0/ > > adds support for GHASH acceleration on aarch64 using the 128 bit pmull and pmull2 instructions. Looks good to me, thanks. Official reviewer, please. Andrew. From zoltan.majo at oracle.com Tue Jul 28 10:17:00 2015 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Tue, 28 Jul 2015 12:17:00 +0200 Subject: [9] RFR(M): 8130832: Extend the WhiteBox API to provide information about the availability of compiler intrinsics In-Reply-To: <55B65E6C.6050404@oracle.com> References: <55A3EBF8.2020208@oracle.com> <55A484F7.4010705@oracle.com> <55A53862.7040308@oracle.com> <55A542D0.50100@oracle.com> <55A6A340.3050204@oracle.com> <55A6AA2B.5030005@oracle.com> <7CC60DCE-330D-4C1B-81FB-C2C3B4DFDEA7@oracle.com> <55AE62FE.4070502@oracle.com> <55AF4D7C.1030706@oracle.com> <55B65E6C.6050404@oracle.com> Message-ID: <55B7569C.60504@oracle.com> Hi Vladimir, thank you for the feedback! On 07/27/2015 06:38 PM, Vladimir Kozlov wrote: > WB method isIntrinsicAvailableForMethod0 has parameter > compilationContext. So I don't think you need > is_intrinsic_available(methodHandle method) variant - it is not used > from WB. You're right, I removed that variant. > You left is_intrinsic_available*_for* in comments in abstractCompiler.hpp Updated. I also updated the text in the comments at other places so that they are more accurate than before. > > Following the same naming logic I would suggest to remove "ForMethod" > in WB new methods names. Updated the method names as well. > > Missing 'virtual' in c2compiler.hpp Updated as well. > > Leftover line in c2compiler.cpp: > > + //return Compile::is_intrinsic_available(method, > compilation_context, false); Thank you for spotting that, removed it. > > Indention is off - keep following lines at the same level as > !compilation_context (one space after "("): > > + (!compilation_context.is_null() && > + CompilerOracle::has_option_value(compilation_context, > "DisableIntrinsic", disable_intr) && > + strstr(disable_intr, vmIntrinsics::name_at(id)) != NULL) Updated the indentation. > > Remove 'return false' because following InlineUnsafeOps check disable > some intrinsics: > > + case vmIntrinsics::_Reference_get: > + return false; > + break; Oh, I missed that! Updated. > > Also I would prefer vmIntrinsics::is_disabled_by_flags() was called > from compiler's is_intrinsic_disabled_by_flag() flag. I moved the call to the compiler-specific is_intrinsic_disabled_by_flag methods. > > Additionally I think we should not have difference in behavior of > is_intrinsic_disabled_by_flag() in C1 and C2. Both compilers should > follow the same rules. If we disable intrinsic on command line - C1 > should not intrinsify it. The only difference should be in supported > intrinsics. If you think it is big change - file an other bug (it is > bug, not rfe) to fix it separately. Yes, I also think it makes sense that flags behave the same way for all compilers. 
But I would like to keep that as a separate issue, partly because I haven't figured out all details yet for implementing a fix and partly because the current issue is related to some "critical" nightly failures we have. So I filed JDK-8132457: "Unify command-line flags controlling the usage of compiler intrinsics" for addressing inconsistencies in processing intrinsic-related command-line flags. I would like to push the newest webrev (if you are fine with it) and then continue with JDK-8132457. I hope that is OK. Here is the newest webrev: - top: http://cr.openjdk.java.net/~zmajo/8130832/top/webrev.04/ - hotspot: http://cr.openjdk.java.net/~zmajo/8130832/hotspot/webrev.04/ Testing: - JPRT (testset "hotspot" + tests in compiler/intrinsics/mathexact); all tests pass. Thank you and best regards, Zoltan > > Thanks, > Vladimir > > On 7/22/15 12:59 AM, Zolt?n Maj? wrote: >> Hi John, >> >> >> On 07/21/2015 09:02 PM, John Rose wrote: >>> Yes, that will work, and I think it is cleaner than what we had >>> before, as well as providing the new required functionality. >>> >>> Reviewed; please get a second reviewer. >> >> thank you for the review! I'll ask Vladimir K., maybe he has time to >> look at the newest webrev. >> >>> >>> ? John >>> >>> P.S. If the unit tests want to test (via the whitebox API) whether an >>> intrinsic was compiled successfully, we might want to expose >>> Compile::gather_intrinsic_statistics, etc. But not in this change set. >> >> That is an interesting idea. We'd also have to see if current tests >> require such functionality or if SQE plans to add tests requiring that >> functionality. >> >>> >>> P.P.S. As I think I said before, I wish we had a way to consolidate >>> the switch statements further (into vmSymbols.hpp). But I don't see a >>> clean way to do it. >> >> Yes, that would be nice. I've not seen a good way to do that, partly >> because inconsistencies between the way C1 and C2 depends on the value >> of command-line flags. >> >> Thank you! >> >> Best regards, >> >> >> Zoltan >> >>> >>> On Jul 21, 2015, at 8:19 AM, Zolt?n Maj? >> > wrote: >>>> >>>> Here is the newest webrev: >>>> - top:http://cr.openjdk.java.net/~zmajo/8130832/top/ >>>> >>>> - >>>> hotspot:http://cr.openjdk.java.net/~zmajo/8130832/hotspot/webrev.03/ >>>> >>> >> From goetz.lindenmaier at sap.com Tue Jul 28 13:03:35 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 28 Jul 2015 13:03:35 +0000 Subject: [aarch64-port-dev ] RFR: 8131062: aarch64: add support for GHASH acceleration In-Reply-To: <55B74D24.7060508@redhat.com> References: <1437491895.6739.17.camel@mylittlepony.linaroharston> <55B74D24.7060508@redhat.com> Message-ID: <4295855A5C1DE049A61835A1887419CC2D014F39@DEWDFEMB12A.global.corp.sap> Hi Edward, the change looks good! Reviewed. Best regards, Goetz. -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Andrew Haley Sent: Dienstag, 28. Juli 2015 11:37 To: edward.nevill at gmail.com; hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net Subject: Re: [aarch64-port-dev ] RFR: 8131062: aarch64: add support for GHASH acceleration On 21/07/15 16:18, Edward Nevill wrote: > http://cr.openjdk.java.net/~enevill/8131062/webrev.0/ > > adds support for GHASH acceleration on aarch64 using the 128 bit pmull and pmull2 instructions. Looks good to me, thanks. Official reviewer, please. Andrew. 
From goetz.lindenmaier at sap.com Tue Jul 28 13:05:29 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 28 Jul 2015 13:05:29 +0000 Subject: RFR: 8132010: aarch64: Typo in SHA intrinsics flags handling code for aarch64 In-Reply-To: <1437488872.6057.3.camel@mylittlepony.linaroharston> References: <1437470315.1575.9.camel@mylittlepony.linaroharston> <1437488872.6057.3.camel@mylittlepony.linaroharston> Message-ID: <4295855A5C1DE049A61835A1887419CC2D014F4F@DEWDFEMB12A.global.corp.sap> Hi Edward, webrev.01 looks good. Reviewed. Best regards, Goetz. -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Edward Nevill Sent: Dienstag, 21. Juli 2015 16:28 To: Alexeev, Alexander Cc: hotspot compiler; aarch64-port-dev at openjdk.java.net Subject: Re: RFR: 8132010: aarch64: Typo in SHA intrinsics flags handling code for aarch64 On Tue, 2015-07-21 at 10:18 +0100, Edward Nevill wrote: > On Mon, 2015-07-20 at 14:38 +0000, Alexeev, Alexander wrote: > > > Please review provided patch and sponsor if approved. > > Problem: SHA flags verification code checks condition for > > UseSHA256Intrinsics, but corrects UseSHA1Intrinsics. > > The patch: > > http://cr.openjdk.java.net/~aalexeev/1/webrev.00/ > > Hi Alexander, > > Thanks for fixing this. I will sponsor this patch. > > Here is the changeset. > > http://cr.openjdk.java.net/~enevill/8132010/webrev Please disregard the above webrev. I had outstanding outgoing changes. Here is the corrected changeset. It is just a single line change in vm_version_aarch64.cpp http://cr.openjdk.java.net/~enevill/8132010/webrev.01 Sorry for the confusion, working on too many changesets at once. Ed. > > I have tested this before and after with hotspot jtreg > > Before: Test results: passed: 876; failed: 3; error: 7 > After: Test results: passed: 877; failed: 2; error: 7 > > The 1 test fixed is the test > > compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnSupportedCPU.java > > This regression was introduced in the following changeset > > http://hg.openjdk.java.net/jdk9/dev/hotspot/rev/cd16fcb838d2 > > Could I have an official reviewer for this please. As this is a trivial > 1 liner I think one reviewer should be sufficient. > > All the best, > Ed. > > From vladimir.kozlov at oracle.com Tue Jul 28 13:22:17 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 28 Jul 2015 06:22:17 -0700 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55B72D1A.1030706@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <77322805-2B38-4B85-98EF-9ABE45A31BDA@oracle.com> <55B641DB.1010608@oracle.com> <55B66A71.8050704@oracle.com> <55B72D1A.1030706@oracle.com> Message-ID: <55B78209.6050604@oracle.com> Looks good. Thanks, Vladimir On 7/28/15 12:19 AM, Tobias Hartmann wrote: > Thanks, Vladimir. > > On 27.07.2015 19:29, Vladimir Kozlov wrote: >> Use ciEnv()::current() instead of Compile::current()->env(). > > Done. > >> compile.cpp why special case for StressCodeBuffers? > > Right, the special case is not necessary since StressCodeBuffers should not affect the scratch buffer. I removed it. > > New webrev: > http://cr.openjdk.java.net/~thartmann/8130309/webrev.04/ > > Thanks, > Tobias > >> Otherwise looks good. 
>> >> Thanks, >> Vladimir >> >> On 7/27/15 7:36 AM, Tobias Hartmann wrote: >>> >>> On 27.07.2015 13:31, Roland Westrelin wrote: >>>>>> Here is the new webrev: >>>>>> http://cr.openjdk.java.net/~thartmann/8130309/webrev.02/ >>>>> >>>>> CompiledStaticCall::emit_to_interp_stub() is compiler independent code. Shouldn't the call be ciEnv::current()->record_failure() even if the method is only called from c2 (for now?)? (which is what Dean suggested as well I think) >>> >>> Right, I missed that and got confused because the method is guarded by "#ifdef COMPILER2" on Sparc. >>> >>>> Actually, why not have emit_to_interp_stub() return an error and bail out from compilation in the caller? >>> >>> I changed 'emit_to_interp_stub()' accordingly and now bail out from the caller if it fails. I also had to adapt the ppc code and the 'emit_trampoline_stub()' method on aarch64. >>> >>> I left the special handling of -XX:TraceJumps in the Sparc code since JDK-8132344 will fix it. >>> >>> Here is the new webrev: >>> http://cr.openjdk.java.net/~thartmann/8130309/webrev.03/ >>> >>> Thanks, >>> Tobias >>> >>>> >>>> Roland. >>>> From roland.westrelin at oracle.com Tue Jul 28 13:26:16 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 28 Jul 2015 15:26:16 +0200 Subject: RFR(S): 8130858: CICompilerCount=1 when tiered is off is not allowed any more In-Reply-To: References: <2E1EC0B4-DC75-4208-9F2D-49F7D5C2929D@oracle.com> <559E9CA1.2050505@oracle.com> Message-ID: <75027CE4-5A5F-46AA-9B3A-513D7F6EA597@oracle.com> For the record, I asked Gerard to take a look at that change and he recommended I move the code to the commandLineFlagConstraintsCompiler.* files. Here is what I'm pushing: http://cr.openjdk.java.net/~roland/8130858/webrev.01/ Roland. From vladimir.kozlov at oracle.com Tue Jul 28 13:27:26 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 28 Jul 2015 06:27:26 -0700 Subject: RFR: 8132010: aarch64: Typo in SHA intrinsics flags handling code for aarch64 In-Reply-To: <1437488872.6057.3.camel@mylittlepony.linaroharston> References: <1437470315.1575.9.camel@mylittlepony.linaroharston> <1437488872.6057.3.camel@mylittlepony.linaroharston> Message-ID: <55B7833E.8010209@oracle.com> Looks good. Thanks, Vladimir On 7/21/15 7:27 AM, Edward Nevill wrote: > On Tue, 2015-07-21 at 10:18 +0100, Edward Nevill wrote: >> On Mon, 2015-07-20 at 14:38 +0000, Alexeev, Alexander wrote: >> >>> Please review the provided patch and sponsor if approved. >>> Problem: SHA flags verification code checks condition for >>> UseSHA256Intrinsics, but corrects UseSHA1Intrinsics. >>> The patch: >>> http://cr.openjdk.java.net/~aalexeev/1/webrev.00/ >> >> Hi Alexander, >> >> Thanks for fixing this. I will sponsor this patch. >> >> Here is the changeset. >> >> http://cr.openjdk.java.net/~enevill/8132010/webrev > > Please disregard the above webrev. I had outstanding outgoing changes. > > Here is the corrected changeset. It is just a single line change in vm_version_aarch64.cpp > > http://cr.openjdk.java.net/~enevill/8132010/webrev.01 > > Sorry for the confusion, working on too many changesets at once. > > Ed.
> >> >> I have tested this before and after with hotspot jtreg >> >> Before: Test results: passed: 876; failed: 3; error: 7 > After: Test results: passed: 877; failed: 2; error: 7 > > The 1 test fixed is the test > > compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnSupportedCPU.java > > This regression was introduced in the following changeset > > http://hg.openjdk.java.net/jdk9/dev/hotspot/rev/cd16fcb838d2 > > Could I have an official reviewer for this please. As this is a trivial > 1 liner I think one reviewer should be sufficient. > > All the best, > Ed. > > From vladimir.kozlov at oracle.com Tue Jul 28 13:30:56 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 28 Jul 2015 06:30:56 -0700 Subject: RFR: 8131062: aarch64: add support for GHASH acceleration In-Reply-To: <1437491895.6739.17.camel@mylittlepony.linaroharston> References: <1437491895.6739.17.camel@mylittlepony.linaroharston> Message-ID: <55B78410.1050108@oracle.com> Looks good to me. Thanks, Vladimir On 7/21/15 8:18 AM, Edward Nevill wrote: > Hi, > > http://cr.openjdk.java.net/~enevill/8131062/webrev.0/ > > adds support for GHASH acceleration on aarch64 using the 128 bit pmull and pmull2 instructions. > > This patch was contributed by alexander.alexeev at caviumnetworks.com > > Note that the 128 bit pmull instructions are not supported on all aarch64. The patch uses the HWCAP_PMULL bit from getauxval() to determine whether the 128 bit pmull is supported. > > I have tested this with jtreg / hotspot. > > Without patch: Test results: passed: 876; failed: 3; error: 9 > With patch: Test results: passed: 876; failed: 3; error: 9 > > In both cases the set of failing/error tests is identical. > > I have done some performance testing using TestAESMain from the jtreg/hotspot test suite. Here are the results I get:- > > java -XX:-UseGHASHIntrinsics -DcheckOutput=true -Dmode=GCM TestAESMain > > encode time = 66945.63635, decode time = 34085.08754 > > java -XX:+UseGHASHIntrinsics -DcheckOutput=true -Dmode=GCM TestAESMain > > encode time = 43469.38244, decode time = 17783.6603 > > This is an improvement of 54% and 92% respectively. > > Alexander has done some benchmarking to measure the raw performance improvement of GHASH on its own using the following benchmark. > > http://cr.openjdk.java.net/~enevill/8131062/GHash.java > > Here are the results he gets:- > > -XX:-UseGHASHIntrinsics. > > Benchmark Mode Cnt Score Error Units > GHash.calculateGHash avgt 5 118.688 ± 0.009 us/op > > -XX:+UseGHASHIntrinsics > Benchmark Mode Cnt Score Error Units > GHash.calculateGHash avgt 5 21.164 ± 1.763 us/op > > This represents a 5.6X speed increase on the raw GHASH performance. > > Thanks for your review, > > Ed. > > From roland.westrelin at oracle.com Tue Jul 28 14:05:40 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 28 Jul 2015 16:05:40 +0200 Subject: RFR(M): 8130847: Cloned object's fields observed as null after C2 escape analysis Message-ID: <071D5DCC-9B77-4A86-9DA1-6C97BC99F496@oracle.com> http://cr.openjdk.java.net/~roland/8130847/webrev.00/ When an allocation which is the destination of an ArrayCopyNode is eliminated, the fields' values recorded at a safepoint (to reallocate the object) do not take the ArrayCopyNode into account at all and the effect of the ArrayCopyNode is lost on a deoptimization. This fix records values from the source of the ArrayCopyNode, emitting new loads if necessary.
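To sketch the idea (hypothetical helper name and approximate signatures - the actual code is in the webrev): when a safepoint needs the value of a field/element of the eliminated destination, we take it from the source of the copy, emitting a load if no existing node holds the value:

// Sketch only: load_from_arraycopy_src is a hypothetical name and the
// signatures are approximate; see the webrev for the real code.
static Node* load_from_arraycopy_src(PhaseIterGVN* igvn, ArrayCopyNode* ac,
                                     Node* ctl, Node* mem,
                                     const TypePtr* adr_type,
                                     intptr_t offset, BasicType bt) {
  // The source of the copy holds the value the destination would have had.
  Node* src = ac->in(ArrayCopyNode::Src);
  // Address the element in the source at the matching offset.
  Node* adr = igvn->transform(new AddPNode(src, src, igvn->MakeConX(offset)));
  const Type* val_type = Type::get_const_basic_type(bt);
  // The load stands in for the value recorded at the safepoint and is used
  // to reconstruct the object when we deoptimize.
  Node* ld = LoadNode::make(*igvn, ctl, mem, adr, adr_type, val_type, bt,
                            MemNode::unordered);
  return igvn->transform(ld);
}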
I also use the opportunity to pin the loads generated in LoadNode::can_see_arraycopy_value() because they depend on all checks that validate the array copy and not only on the check that immediately dominates. Roland. From tobias.hartmann at oracle.com Tue Jul 28 14:08:38 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 28 Jul 2015 16:08:38 +0200 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55B78209.6050604@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <77322805-2B38-4B85-98EF-9ABE45A31BDA@oracle.com> <55B641DB.1010608@oracle.com> <55B66A71.8050704@oracle.com> <55B72D1A.1030706@oracle.com> <55B78209.6050604@oracle.com> Message-ID: <55B78CE6.1050302@oracle.com> Thanks, Vladimir. Best, Tobias On 28.07.2015 15:22, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 7/28/15 12:19 AM, Tobias Hartmann wrote: >> Thanks, Vladimir. >> >> On 27.07.2015 19:29, Vladimir Kozlov wrote: >>> Use ciEnv()::current() instead of Compile::current()->env(). >> >> Done. >> >>> compile.cpp why special case for StressCodeBuffers? >> >> Right, the special case is not necessary since StressCodeBuffers should not affect the scratch buffer. I removed it. >> >> New webrev: >> http://cr.openjdk.java.net/~thartmann/8130309/webrev.04/ >> >> Thanks, >> Tobias >> >>> Otherwise looks good. >>> >>> Thanks, >>> Vladimir >>> >>> On 7/27/15 7:36 AM, Tobias Hartmann wrote: >>>> >>>> On 27.07.2015 13:31, Roland Westrelin wrote: >>>>>>> Here is the new webrev: >>>>>>> http://cr.openjdk.java.net/~thartmann/8130309/webrev.02/ >>>>>> >>>>>> CompiledStaticCall::emit_to_interp_stub() is compiler independent code. Shouldn?t the call be ciEnv::current()->record_failure() even if the method is only called from c2 (for now?)? (which is what Dean suggested as well I think) >>>> >>>> Right, I missed that and got confused because the method is guarded by "#ifdef COMPILER2" on Sparc. >>>> >>>>> Actually, why not have emit_to_interp_stub() returns an error and bail out from compilation in the caller? >>>> >>>> I changed 'emit_to_interpr_stub()' accordingly and now bail out from the caller if it fails. I also had to adapt the ppc code and the 'emit_trampoline_stub()' method on aarch64. >>>> >>>> I left the special handling of -XX:TraceJumps in the Sparc code since JDK-8132344 will fix it. >>>> >>>> Here is the new webrev: >>>> http://cr.openjdk.java.net/~thartmann/8130309/webrev.03/ >>>> >>>> Thanks, >>>> Tobias >>>> >>>>> >>>>> Roland. >>>>> From vladimir.kozlov at oracle.com Tue Jul 28 14:12:42 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 28 Jul 2015 07:12:42 -0700 Subject: [9] RFR(M): 8130832: Extend the WhiteBox API to provide information about the availability of compiler intrinsics In-Reply-To: <55B7569C.60504@oracle.com> References: <55A3EBF8.2020208@oracle.com> <55A484F7.4010705@oracle.com> <55A53862.7040308@oracle.com> <55A542D0.50100@oracle.com> <55A6A340.3050204@oracle.com> <55A6AA2B.5030005@oracle.com> <7CC60DCE-330D-4C1B-81FB-C2C3B4DFDEA7@oracle.com> <55AE62FE.4070502@oracle.com> <55AF4D7C.1030706@oracle.com> <55B65E6C.6050404@oracle.com> <55B7569C.60504@oracle.com> Message-ID: <55B78DDA.1010705@oracle.com> This looks good. Thanks, Vladimir On 7/28/15 3:17 AM, Zolt?n Maj? wrote: > Hi Vladimir, > > > thank you for the feedback! 
> > On 07/27/2015 06:38 PM, Vladimir Kozlov wrote: >> WB method isIntrinsicAvailableForMethod0 has parameter >> compilationContext. So I don't think you need >> is_intrinsic_available(methodHandle method) variant - it is not used >> from WB. > > You're right, I removed that variant. > >> You left is_intrinsic_available*_for* in comments in abstractCompiler.hpp > > Updated. I also updated the text in the comments at other places so that > they are more accurate than before. > >> >> Following the same naming logic I would suggest to remove "ForMethod" >> in WB new methods names. > > Updated the method names as well. > >> >> Missing 'virtual' in c2compiler.hpp > > Updated as well. > >> >> Leftover line in c2compiler.cpp: >> >> + //return Compile::is_intrinsic_available(method, >> compilation_context, false); > > Thank you for spotting that, removed it. > >> >> Indention is off - keep following lines at the same level as >> !compilation_context (one space after "("): >> >> + (!compilation_context.is_null() && >> + CompilerOracle::has_option_value(compilation_context, >> "DisableIntrinsic", disable_intr) && >> + strstr(disable_intr, vmIntrinsics::name_at(id)) != NULL) > > Updated the indentation. > >> >> Remove 'return false' because following InlineUnsafeOps check disable >> some intrinsics: >> >> + case vmIntrinsics::_Reference_get: >> + return false; >> + break; > > Oh, I missed that! Updated. > >> >> Also I would prefer vmIntrinsics::is_disabled_by_flags() was called >> from compiler's is_intrinsic_disabled_by_flag() flag. > > I moved the call to the compiler-specific is_intrinsic_disabled_by_flag > methods. > >> >> Additionally I think we should not have difference in behavior of >> is_intrinsic_disabled_by_flag() in C1 and C2. Both compilers should >> follow the same rules. If we disable intrinsic on command line - C1 >> should not intrinsify it. The only difference should be in supported >> intrinsics. If you think it is big change - file an other bug (it is >> bug, not rfe) to fix it separately. > > Yes, I also think it makes sense that flags behave the same way for all > compilers. But I would like to keep that as a separate issue, partly > because I haven't figured out all details yet for implementing a fix and > partly because the current issue is related to some "critical" nightly > failures we have. > > So I filed JDK-8132457: "Unify command-line flags controlling the usage > of compiler intrinsics" for addressing inconsistencies in processing > intrinsic-related command-line flags. I would like to push the newest > webrev (if you are fine with it) and then continue with JDK-8132457. I > hope that is OK. > > Here is the newest webrev: > - top: http://cr.openjdk.java.net/~zmajo/8130832/top/webrev.04/ > - hotspot: http://cr.openjdk.java.net/~zmajo/8130832/hotspot/webrev.04/ > > Testing: > - JPRT (testset "hotspot" + tests in compiler/intrinsics/mathexact); all > tests pass. > > Thank you and best regards, > > > Zoltan > > >> >> Thanks, >> Vladimir >> >> On 7/22/15 12:59 AM, Zolt?n Maj? wrote: >>> Hi John, >>> >>> >>> On 07/21/2015 09:02 PM, John Rose wrote: >>>> Yes, that will work, and I think it is cleaner than what we had >>>> before, as well as providing the new required functionality. >>>> >>>> Reviewed; please get a second reviewer. >>> >>> thank you for the review! I'll ask Vladimir K., maybe he has time to >>> look at the newest webrev. >>> >>>> >>>> ? John >>>> >>>> P.S. 
If the unit tests want to test (via the whitebox API) whether an >>>> intrinsic was compiled successfully, we might want to expose >>>> Compile::gather_intrinsic_statistics, etc. But not in this change set. >>> >>> That is an interesting idea. We'd also have to see if current tests >>> require such functionality or if SQE plans to add tests requiring that >>> functionality. >>> >>>> >>>> P.P.S. As I think I said before, I wish we had a way to consolidate >>>> the switch statements further (into vmSymbols.hpp). But I don't see a >>>> clean way to do it. >>> >>> Yes, that would be nice. I've not seen a good way to do that, partly >>> because inconsistencies between the way C1 and C2 depends on the value >>> of command-line flags. >>> >>> Thank you! >>> >>> Best regards, >>> >>> >>> Zoltan >>> >>>> >>>> On Jul 21, 2015, at 8:19 AM, Zolt?n Maj? >>> > wrote: >>>>> >>>>> Here is the newest webrev: >>>>> - top:http://cr.openjdk.java.net/~zmajo/8130832/top/ >>>>> >>>>> - >>>>> hotspot:http://cr.openjdk.java.net/~zmajo/8130832/hotspot/webrev.03/ >>>>> >>>> >>> > From zoltan.majo at oracle.com Tue Jul 28 15:05:30 2015 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Tue, 28 Jul 2015 17:05:30 +0200 Subject: [9] RFR(M): 8130832: Extend the WhiteBox API to provide information about the availability of compiler intrinsics In-Reply-To: <55B78DDA.1010705@oracle.com> References: <55A3EBF8.2020208@oracle.com> <55A484F7.4010705@oracle.com> <55A53862.7040308@oracle.com> <55A542D0.50100@oracle.com> <55A6A340.3050204@oracle.com> <55A6AA2B.5030005@oracle.com> <7CC60DCE-330D-4C1B-81FB-C2C3B4DFDEA7@oracle.com> <55AE62FE.4070502@oracle.com> <55AF4D7C.1030706@oracle.com> <55B65E6C.6050404@oracle.com> <55B7569C.60504@oracle.com> <55B78DDA.1010705@oracle.com> Message-ID: <55B79A3A.4050303@oracle.com> Thank you, Vladimir, for the review! Best regards, Zoltan On 07/28/2015 04:12 PM, Vladimir Kozlov wrote: > This looks good. > > Thanks, > Vladimir > > On 7/28/15 3:17 AM, Zolt?n Maj? wrote: >> Hi Vladimir, >> >> >> thank you for the feedback! >> >> On 07/27/2015 06:38 PM, Vladimir Kozlov wrote: >>> WB method isIntrinsicAvailableForMethod0 has parameter >>> compilationContext. So I don't think you need >>> is_intrinsic_available(methodHandle method) variant - it is not used >>> from WB. >> >> You're right, I removed that variant. >> >>> You left is_intrinsic_available*_for* in comments in >>> abstractCompiler.hpp >> >> Updated. I also updated the text in the comments at other places so that >> they are more accurate than before. >> >>> >>> Following the same naming logic I would suggest to remove "ForMethod" >>> in WB new methods names. >> >> Updated the method names as well. >> >>> >>> Missing 'virtual' in c2compiler.hpp >> >> Updated as well. >> >>> >>> Leftover line in c2compiler.cpp: >>> >>> + //return Compile::is_intrinsic_available(method, >>> compilation_context, false); >> >> Thank you for spotting that, removed it. >> >>> >>> Indention is off - keep following lines at the same level as >>> !compilation_context (one space after "("): >>> >>> + (!compilation_context.is_null() && >>> + CompilerOracle::has_option_value(compilation_context, >>> "DisableIntrinsic", disable_intr) && >>> + strstr(disable_intr, vmIntrinsics::name_at(id)) != NULL) >> >> Updated the indentation. >> >>> >>> Remove 'return false' because following InlineUnsafeOps check disable >>> some intrinsics: >>> >>> + case vmIntrinsics::_Reference_get: >>> + return false; >>> + break; >> >> Oh, I missed that! Updated. 
>> >>> >>> Also I would prefer vmIntrinsics::is_disabled_by_flags() was called >>> from compiler's is_intrinsic_disabled_by_flag() flag. >> >> I moved the call to the compiler-specific is_intrinsic_disabled_by_flag >> methods. >> >>> >>> Additionally I think we should not have difference in behavior of >>> is_intrinsic_disabled_by_flag() in C1 and C2. Both compilers should >>> follow the same rules. If we disable intrinsic on command line - C1 >>> should not intrinsify it. The only difference should be in supported >>> intrinsics. If you think it is big change - file an other bug (it is >>> bug, not rfe) to fix it separately. >> >> Yes, I also think it makes sense that flags behave the same way for all >> compilers. But I would like to keep that as a separate issue, partly >> because I haven't figured out all details yet for implementing a fix and >> partly because the current issue is related to some "critical" nightly >> failures we have. >> >> So I filed JDK-8132457: "Unify command-line flags controlling the usage >> of compiler intrinsics" for addressing inconsistencies in processing >> intrinsic-related command-line flags. I would like to push the newest >> webrev (if you are fine with it) and then continue with JDK-8132457. I >> hope that is OK. >> >> Here is the newest webrev: >> - top: http://cr.openjdk.java.net/~zmajo/8130832/top/webrev.04/ >> - hotspot: http://cr.openjdk.java.net/~zmajo/8130832/hotspot/webrev.04/ >> >> Testing: >> - JPRT (testset "hotspot" + tests in compiler/intrinsics/mathexact); all >> tests pass. >> >> Thank you and best regards, >> >> >> Zoltan >> >> >>> >>> Thanks, >>> Vladimir >>> >>> On 7/22/15 12:59 AM, Zolt?n Maj? wrote: >>>> Hi John, >>>> >>>> >>>> On 07/21/2015 09:02 PM, John Rose wrote: >>>>> Yes, that will work, and I think it is cleaner than what we had >>>>> before, as well as providing the new required functionality. >>>>> >>>>> Reviewed; please get a second reviewer. >>>> >>>> thank you for the review! I'll ask Vladimir K., maybe he has time to >>>> look at the newest webrev. >>>> >>>>> >>>>> ? John >>>>> >>>>> P.S. If the unit tests want to test (via the whitebox API) whether an >>>>> intrinsic was compiled successfully, we might want to expose >>>>> Compile::gather_intrinsic_statistics, etc. But not in this change >>>>> set. >>>> >>>> That is an interesting idea. We'd also have to see if current tests >>>> require such functionality or if SQE plans to add tests requiring that >>>> functionality. >>>> >>>>> >>>>> P.P.S. As I think I said before, I wish we had a way to consolidate >>>>> the switch statements further (into vmSymbols.hpp). But I don't >>>>> see a >>>>> clean way to do it. >>>> >>>> Yes, that would be nice. I've not seen a good way to do that, partly >>>> because inconsistencies between the way C1 and C2 depends on the value >>>> of command-line flags. >>>> >>>> Thank you! >>>> >>>> Best regards, >>>> >>>> >>>> Zoltan >>>> >>>>> >>>>> On Jul 21, 2015, at 8:19 AM, Zolt?n Maj? 
>>>> > wrote: >>>>> >>>>> Here is the newest webrev: >>>>> - top:http://cr.openjdk.java.net/~zmajo/8130832/top/ >>>>> >>>>> - >>>>> hotspot:http://cr.openjdk.java.net/~zmajo/8130832/hotspot/webrev.03/ >>>>> >>>> >>> >> From vladimir.kozlov at oracle.com Tue Jul 28 16:29:50 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 28 Jul 2015 09:29:50 -0700 Subject: RFR(M): 8130847: Cloned object's fields observed as null after C2 escape analysis In-Reply-To: <071D5DCC-9B77-4A86-9DA1-6C97BC99F496@oracle.com> References: <071D5DCC-9B77-4A86-9DA1-6C97BC99F496@oracle.com> Message-ID: <55B7ADFE.3060109@oracle.com> The next change puzzles me: - if (!call->may_modify(tinst, phase)) { + if (call->may_modify(tinst, phase)) { - mem = call->in(TypeFunc::Memory); + assert(call->is_ArrayCopy(), "ArrayCopy is the only call node that doesn't make allocation escape"); Why only ArrayCopy? I think it is true for most calls. What set of tests did you run? Method naming is confusing. membar_for_arraycopy() does not check for membar but for calls which can modify. handle_arraycopy() could be make_arraycopy_load(). Add explicit check: && strcmp(_name, "unsafe_arraycopy") != 0) Thanks, Vladimir On 7/28/15 7:05 AM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8130847/webrev.00/ > > When an allocation which is the destination of an ArrayCopyNode is eliminated, the fields' values recorded at a safepoint (to reallocate the object) do not take the ArrayCopyNode into account at all and the effect of the ArrayCopyNode is lost on a deoptimization. This fix records values from the source of the ArrayCopyNode, emitting new loads if necessary. > > I also use the opportunity to pin the loads generated in LoadNode::can_see_arraycopy_value() because they depend on all checks that validate the array copy and not only on the check that immediately dominates. > > Roland. > From vladimir.kozlov at oracle.com Tue Jul 28 17:24:30 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 28 Jul 2015 10:24:30 -0700 Subject: RFR(M): 8004073: Implement C2 Ideal node specific dump() method In-Reply-To: <0FE82414-5869-480D-B56A-1F3677A569ED@oracle.com> References: <0FE82414-5869-480D-B56A-1F3677A569ED@oracle.com> Message-ID: <55B7BACE.9080100@oracle.com> Some first observations. Before looking at the webrev, can you use the whole words Node::related(), dump_related(), dump_related_compact(), dump_compact()? "comp" could be confused with "compiled". It is more typing in the debugger but it is clearer. Also from this->dump_rel() in your example I see that you dump a lot more input nodes than I expect (only up to inputs of CmpU node). But this->dump_rel_comp() produces correct set of nodes. It would be nice if you can avoid using macro: +#ifndef PRODUCT + REL_IN_DATA_OUT_1; +#endif "Arithmetic nodes" are most common data nodes (vs control nodes this->is_CFG() == true). Maybe instead of specialized rel() methods you can use some flag checks in the Node::rel() method. Thanks, Vladimir On 7/28/15 1:56 AM, Michael Haupt wrote: > Dear all, > > please review and sponsor this change. > RFE: https://bugs.openjdk.java.net/browse/JDK-8004073 > Webrev: http://cr.openjdk.java.net/~mhaupt/8004073/webrev.00 > > This change extends the dumping facilities of the C2 IR Node hierarchy. > Node::dump() is used in debugging sessions to print information about an > IR node. The API is extended by these new entry points: > > * void Node::dump_comp() > -> Dump the node in compact form. > > * void Node::dump_rel(), void Node::dump_rel_comp() > -> Dump the node (in compact form) and all nodes related to it. Mark the > current node in the output.
> > * void Node::dump_rel(), void Node::dump_rel_comp() > -> Dump the node (in compact form) and all nodes related to it. Mark the > current node in the output. > > The notion of "related" nodes is of course a property of the node > itself, or rather, of its class. This is configured in this virtual method: > > * virtual void Node::rel(GrowableArray *in_rel, > GrowableArray *out_rel, bool compact) > -> Collect all related nodes. Store the incoming related nodes in the > in_rel > array, and the outgoing related nodes in the out_rel array. In > case compact > representation is desired, possibly collect less nodes. > > This method must be overridden by all subclasses of Node that, in their > notion of what related nodes are, divert from the default behaviour as > specified in the implementation of Node::rel() in the Node class itself. > The default is to collect all inputs and outputs till depth 1, including > both data and control nodes, ignoring compactness. > > There are several auxiliary methods. Node collection is chiefly > facilitated by this method: > > * void Node::collect_nodes(GrowableArray *ns, int d, bool ctrl, > bool data) > -> Collect nodes till depth d (positive: inputs, negative: outputs), > including > *only* control or data nodes (this is controlled by the two bool > arguments, > and setting both to true is nonsensical). > > Furthermore, there exist pre-defined collectors for common cases: > > * void Node::collect_nodes_in_all_data(GrowableArray *ns, bool ctrl) > -> Collect the entire data input graph. Include control nodes only if > requested. > > * void Node::collect_nodes_in_all_ctrl(GrowableArray *ns, bool data) > -> Collect the entire control input graph. Include data nodes only if > requested. > > * void Node::collect_nodes_out_all_ctrl_boundary(GrowableArray *ns) > -> Collect all output nodes, stopping at control nodes, including these. > > * void Node::collect_nodes_in_data_out_1(GrowableArray *is, > GrowableArray *os, bool compact) > -> Collect the entire data input graph, and outputs till depth 1. > > Regarding compact dumping, subclasses of Node should override this > virtual method: > > * virtual void dump_comp_spec(outputStream *st) > -> Dump the specifics of a node in compact form. This method is > supposed to > operate in the fashion of Node::dump_spec(). > > The default behaviour for compact dumping is to dump a node's name and > index. 
> > Specific notions of "related" have been added to the following node classes: > * AbsNode and subclasses > * AddNode and subclasses > * AddPNode > * AtanDNode > * BinaryNode > * BoolNode > * CosDNode > * CountBitsNode and subclasses > * Div{D,F,I,L}Node > * ExpDNode > * GotoNode > * HaltNode > * Log{10D,D}Node > * LShift{I,L}Node > * Mod{D,F,I,L}Node > * MulHiLNode > * Mul{D,F,I,L}Node and subclasses > * DivModNode and subclasses > * IfNode > * JumpNode > * SafePointNode and subclasses (may require more detail) > * StartNode and subclass > * NegNode and subclasses > * PowDNode > * IfProjNode and subclasses > * JProjNode and subclass > * ParmNode > * ReductionNode and subclasses > * Round{Double,Float}Node > * RShift{I,L}Node > * SqrtDNode > * SubNode and subclasses > * TanDNode > * AddV{B,D,F,I,L,S,_}Node > * DivV{D,F}Node > * LShiftV{B,I,L,S}Node > * MulV{D,F,I,S}Node > * OrVNode > * RShiftV{B,I,L,S}Node > * SubV{B,D,F,I,L,S}Node > * URShiftV{B,I,L,S}Node > * XorVNode > * URShift{I,L}Node > > Here is a sample session in LLDB, showing the different dumps for an IfNode: > > * thread #28: tid = 0x10d1ce3, 0x000000010353be17 > libjvm.dylib`IfNode::Ideal(this=0x0000000104888760, > phase=0x000000011cf62368, can_reshape=) + 77 at > ifnode.cpp:1297, name = 'Java: C2 CompilerThread0', stop reason = > breakpoint 1.1 > frame #0: 0x000000010353be17 > libjvm.dylib`IfNode::Ideal(this=0x0000000104888760, > phase=0x000000011cf62368, can_reshape=) + 77 at ifnode.cpp:1297 > 1294 if (remove_dead_region(phase, can_reshape)) return this; > 1295 // No Def-Use info? > 1296 if (!can_reshape) return NULL; > -> 1297 PhaseIterGVN *igvn = phase->is_IterGVN(); > 1298 > 1299 // Don't bother trying to transform a dead if > 1300 if (in(0)->is_top()) return NULL; > (lldb) expr -- this->dump() > 82If=== 61 79 [[ 83 84 ]] P=0.999999, C=-1.000000 !jvms: > String::charAt @ bci:27 > (lldb) expr -- this->dump_comp() > If(82)P=0.999999, C=-1.000000 > (lldb) expr -- this->dump_rel() > 10Parm=== 3 [[ 38 38 65 32 ]] Parm0: > java/lang/String:NotNull:exact * Oop:java/lang/String:NotNull:exact * > !jvms: String::charAt @ bci:-1 > 38AddP=== _ 10 10 37 [[ 39 > ]] Oop:java/lang/String:NotNull:exact+12 * [narrow] !jvms: > String::charAt @ bci:6 > 39LoadN=== _ 7 38 [[ 40 ]] @java/lang/String:exact+12 * [narrow], > name=value, idx=4; #narrowoop: char[int:>=0]:exact * > !jvms: String::charAt @ bci:6 > 40DecodeN=== _ 39 [[ 55 42 ]] #char[int:>=0]:exact * !jvms: > String::charAt @ bci:6 > 37ConL=== 0 [[ 38 56 ]] #long:12 > 55CastPP=== 47 40 [[ 56 56 97 97 96 86 > ]] #char[int:>=0]:NotNull:exact * !jvms: String::charAt @ bci:9 > 56AddP=== _ 55 55 37 [[ 57 ]] !jvms: String::charAt @ bci:9 > 7Parm=== 3 [[ 99 98 86 65 57 32 50 39 ]] Memory Memory: > @BotPTR *+bot, idx=Bot; !jvms: String::charAt @ bci:-1 > 57LoadRange=== _ 7 56 [[ 65 58 78 ]] @bottom[int:>=0]+12 * > [narrow], idx=5; #int:>=0 !jvms: String::charAt @ bci:9 > 11Parm=== 3 [[ 78 93 65 32 24 32 65 58 86 ]] Parm1: int > !jvms: String::charAt @ bci:-1 > 78CmpU=== _ 11 57 [[ 79 ]] !jvms: String::charAt @ bci:27 > 79Bool=== _ 78 [[ 82 ]] [lt] !jvms: String::charAt @ bci:27 > 82 >If=== 61 79 [[ 83 84 ]] P=0.999999, C=-1.000000 !jvms: > String::charAt @ bci:27 > 83IfTrue=== 82 [[ 99 98 ]] #1 !jvms: String::charAt @ bci:27 > 84IfFalse=== 82 [[ 86 ]] #0 !jvms: String::charAt @ bci:27 > 99Return=== 83 6 7 8 9 returns 98 [[ 0 ]] > 98LoadUS=== 83 7 96 [[ 99 ]] @char[int:>=0]:exact+any *, idx=6; > #char !jvms: String::charAt @ bci:27 > 86CallStaticJava=== 84 6 7 8 9 ( 85 1 1 55 11 ) [[ 87 ]] # 
> Static > uncommon_trap(reason='range_check' action='make_not_entrant') void ( > int ) C=0.000100 String::charAt @ bci:27 !jvms: String::charAt @ bci:27 > (lldb) expr -- this->dump_rel_comp() > If(82)P=0.999999, > C=-1.000000 Bool(79)[lt] CmpU(78) Parm(11)1:int LoadRange(57) > @bottom[int:>=0]+12 * [narrow], idx=5; #int:>=0 > IfTrue(83)[99][98]#1 IfFalse(84)[86]#0 Return(99) LoadUS(98) > @char[int:>=0]:exact+any *, idx=6; #char CallStaticJava(86)uncommon_trap > > Best, > > Michael > > -- > > Dr. Michael Haupt | Principal Member of Technical Staff > Phone: +49 331 200 7277 | Fax: +49 331 200 7561 > Oracle Java Platform Group | LangTools Team | Nashorn > Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, > Germany > Oracle is committed to > developing practices and products that help protect the environment > > From roland.westrelin at oracle.com Tue Jul 28 18:20:01 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 28 Jul 2015 20:20:01 +0200 Subject: RFR(M): 8080289: Intermediate writes in a loop not eliminated by optimizer In-Reply-To: References: <55789088.5050405@oracle.com> <9DE849F6-9E4A-4649-A174-793726D67FD5@oracle.com> <5580738D.9070900@oracle.com> <5581C465.7070803@oracle.com> <05B8872C-194E-4596-9291-C62262AAA930@oracle.com> <55891C4A.6020002@oracle.com> <7EBACFD5-8D27-4EB3-A6F6-851BFAAF6737@oracle.com> <5589232B.7080705@oracle.com> Message-ID: <3883E157-55BE-4E00-87AD-C5572F92C0A4@oracle.com> >>> What about >>> >>> volatile int y; >>> volatile int x; >>> >>> y=1 >>> x=1 >>> y=2 >>> >>> transformed to: >>> >>> x=1 >>> y=2 >>> >>> ? >> >> I think this is not allowed, since operations over "x" get tied up in >> the synchronization order. > > Thanks. Then for support_IRIW_for_not_multiple_copy_atomic_cpu true, I don't see how incorrect reordering is prevented. I took another look and I was wrong about that. void Parse::do_put_xxx(Node* obj, ciField* field, bool is_field) { bool is_vol = field->is_volatile(); // If reference is volatile, prevent following memory ops from // floating down past the volatile write. Also prevents commoning // another volatile read. if (is_vol) insert_mem_bar(Op_MemBarRelease); The barrier prevents y=1 from being optimized out. Roland. From roland.westrelin at oracle.com Tue Jul 28 18:26:33 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 28 Jul 2015 20:26:33 +0200 Subject: RFR(M): 8130847: Cloned object's fields observed as null after C2 escape analysis In-Reply-To: <55B7ADFE.3060109@oracle.com> References: <071D5DCC-9B77-4A86-9DA1-6C97BC99F496@oracle.com> <55B7ADFE.3060109@oracle.com> Message-ID: <49EC4EF4-2ADD-4F53-80DB-1B498E91E8A2@oracle.com> Thanks for looking at this, Vladimir. > The next change puzzles me: > > - if (!call->may_modify(tinst, phase)) { > + if (call->may_modify(tinst, phase)) { > - mem = call->in(TypeFunc::Memory); > + assert(call->is_ArrayCopy(), "ArrayCopy is the only call node that doesn't make allocation escape"); > > Why only ArrayCopy? I think it is true for most calls. What set of tests did you run? I ran: java/lang, java/util, compiler, closed, runtime, gc jtreg tests nsk.stress, vm.compiler, vm.regression, nsk.regression, nsk.monitoring from ute jprt I'm not sure if I did CTW or not but I can if you think it makes sense. Aren't arguments of calls marked as ArgEscape so an object that is an argument to a call cannot be scalar replaced? > Method naming is confusing. membar_for_arraycopy() does not check for membar but for calls which can modify.
handle_arraycopy() could be make_arraycopy_load(). > > Add explicit check: > && strcmp(_name, "unsafe_arraycopy") != 0) Thanks for the suggestions. I'll make the suggested changes. Roland. > > Thanks, > Vladimir > > On 7/28/15 7:05 AM, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~roland/8130847/webrev.00/ >> >> When an allocation which is the destination of an ArrayCopyNode is eliminated, the fields' values recorded at a safepoint (to reallocate the object) do not take the ArrayCopyNode into account at all and the effect of the ArrayCopyNode is lost on a deoptimization. This fix records values from the source of the ArrayCopyNode, emitting new loads if necessary. >> >> I also use the opportunity to pin the loads generated in LoadNode::can_see_arraycopy_value() because they depend on all checks that validate the array copy and not only on the check that immediately dominates. >> >> Roland. >> From rednaxelafx at gmail.com Tue Jul 28 20:43:19 2015 From: rednaxelafx at gmail.com (Krystal Mok) Date: Tue, 28 Jul 2015 13:43:19 -0700 Subject: RFR (S) 8131782: C1 Class.cast optimization breaks when Class is loaded from static final In-Reply-To: <55B5F12F.5050904@oracle.com> References: <55ACFD0F.30207@oracle.com> <55AE1972.4050106@oracle.com> <55B20A62.2090307@oracle.com> <55B29DD8.40008@oracle.com> <55B2E7D5.7060806@oracle.com> <55B5F12F.5050904@oracle.com> Message-ID: Hi Aleksey, Thanks for fixing this in OpenJDK! I actually noticed the same issue a few weeks ago [1], but somehow I had missed the reply that Roland sent me, so I didn't send out a request for review for my version of the change. But now that it's fixed, everything's all right ;-) Thanks, Kris [1]: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-June/018179.html On Mon, Jul 27, 2015 at 1:51 AM, Aleksey Shipilev <aleksey.shipilev at oracle.com> wrote: > Thanks for pushing the change, Dean! > > -Aleksey > > On 07/25/2015 05:32 AM, John Rose wrote: > > You are fine. > > Aleksey is an Author for JDK 9: http://openjdk.java.net/census#shade > > Committers who sponsor changes are expected to use the correct Author > > (not themselves). > > It's a syntax error to push a non-Author changeset to an OpenJDK repo. > > BTW, jcheck does not consult the OJN census AFAIK. > > > > -- John > > > > On Jul 24, 2015, at 6:35 PM, Dean Long > > >> wrote: > >> > >> OK I will push it now. I did 'hg ci -u shade' so that Aleksey gets > >> credit for it, and jcheck isn't complaining. Does anyone know if JPRT > >> does more checks for Committer status? If so I'll have to redo it and > >> add a Contributed-by line. > > > From vladimir.kozlov at oracle.com Tue Jul 28 22:42:08 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 28 Jul 2015 15:42:08 -0700 Subject: RFR(M): 8130847: Cloned object's fields observed as null after C2 escape analysis In-Reply-To: <49EC4EF4-2ADD-4F53-80DB-1B498E91E8A2@oracle.com> References: <071D5DCC-9B77-4A86-9DA1-6C97BC99F496@oracle.com> <55B7ADFE.3060109@oracle.com> <49EC4EF4-2ADD-4F53-80DB-1B498E91E8A2@oracle.com> Message-ID: <55B80540.40904@oracle.com> I misread assert's message - I thought "only arraycopy which modifies an allocate makes it escape". Have to read more carefully. Wait, original code looks buggy - 'mem' is set to input memory regardless of the may_modify() result: ! if (!call->may_modify(tinst, phase)) { ! mem = call->in(TypeFunc::Memory); } mem = in->in(TypeFunc::Memory); That is what got my attention.
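To restate the fixed logic in context (my paraphrase of the patched loop, not the exact webrev code):

} else if (in->is_Call()) {
  CallNode* call = in->as_Call();
  if (call->may_modify(tinst, phase)) {
    // Only an arraycopy can modify the field without making the
    // allocation escape; its value comes from the copy's source.
    assert(call->is_ArrayCopy(), "ArrayCopy is the only call node that doesn't make allocation escape");
  }
  // In all cases step past the call to its input memory.
  mem = in->in(TypeFunc::Memory);
}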
From vladimir.kozlov at oracle.com Tue Jul 28 22:42:08 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 28 Jul 2015 15:42:08 -0700
Subject: RFR(M): 8130847: Cloned object's fields observed as null after C2 escape analysis
In-Reply-To: <49EC4EF4-2ADD-4F53-80DB-1B498E91E8A2@oracle.com>
References: <071D5DCC-9B77-4A86-9DA1-6C97BC99F496@oracle.com> <55B7ADFE.3060109@oracle.com> <49EC4EF4-2ADD-4F53-80DB-1B498E91E8A2@oracle.com>
Message-ID: <55B80540.40904@oracle.com>

I misread assert's message - I thought "only arraycopy which modifies an allocate makes it escape". Have to read more carefully.

Wait, original code looks buggy - 'mem' is set to input memory regardless of the may_modify() result:

!         if (!call->may_modify(tinst, phase)) {
!           mem = call->in(TypeFunc::Memory);
          }
          mem = in->in(TypeFunc::Memory);

That is what got my attention.

Your change looks fine in this code - we skip all call nodes except arraycopy which modifies this field/element. And we can skip all calls because we are processing a non-escaping allocation.

Thanks,
Vladimir

On 7/28/15 11:26 AM, Roland Westrelin wrote:
> Thanks for looking at this, Vladimir.
>
>> The next change puzzles me:
>>
>> -    if (!call->may_modify(tinst, phase)) {
>> +    if (call->may_modify(tinst, phase)) {
>> -      mem = call->in(TypeFunc::Memory);
>> +      assert(call->is_ArrayCopy(), "ArrayCopy is the only call node that doesn't make allocation escape");
>>
>> Why only ArrayCopy? I think it is most of calls. What set of tests you ran?
>
> I ran:
>
> java/lang, java/util, compiler, closed, runtime, gc jtreg tests
> nsk.stress, vm.compiler, vm.regression, nsk.regression, nsk.monitoring from ute
> jprt
>
> I'm not sure if I did CTW or not, but I can if you think it makes sense.
>
> Aren't arguments of calls marked as ArgEscape, so an object that is an argument to a call cannot be scalar replaced?
>
>> Methods naming is confusing. membar_for_arraycopy() does not check for membar but for calls which can modify. handle_arraycopy() could be make_arraycopy_load().
>>
>> Add explicit check:
>>   && strcmp(_name, "unsafe_arraycopy") != 0)
>
> Thanks for the suggestions. I'll make the suggested changes.
>
> Roland.
>
>> On 7/28/15 7:05 AM, Roland Westrelin wrote:
>>> http://cr.openjdk.java.net/~roland/8130847/webrev.00/
>>>
>>> When an allocation which is the destination of an ArrayCopyNode is eliminated, field's values recorded at a safepoint (to reallocate the object) do not take the ArrayCopyNode into account at all, and the effect of the ArrayCopyNode is lost on a deoptimization. This fix records values from the source of the ArrayCopyNode, emitting new loads if necessary.
>>>
>>> I also use the opportunity to pin the loads generated in LoadNode::can_see_arraycopy_value() because they depend on all checks that validate the array copy and not only on the check that immediately dominates.
>>>
>>> Roland.
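A rough Java shape of the failure mode described in the RFR (my reconstruction, not the test from the webrev): the copy made through an ArrayCopyNode does not escape, so it can be scalar-replaced, and a later deoptimization has to rematerialize it with the source's field values:

    public class CloneFields implements Cloneable {
        int f;

        CloneFields(int f) { this.f = f; }

        static int test(CloneFields src) throws CloneNotSupportedException {
            // The clone is backed by an ArrayCopyNode; if the copy does not
            // escape, escape analysis may eliminate the allocation. On
            // deoptimization the object is reallocated from values recorded
            // at the safepoint, which before the fix ignored the
            // ArrayCopyNode's effect.
            CloneFields copy = (CloneFields) src.clone();
            return copy.f; // must be src.f, never the default 0
        }

        public static void main(String[] args) throws Exception {
            CloneFields src = new CloneFields(42);
            for (int i = 0; i < 20_000; i++) {
                if (test(src) != 42) throw new AssertionError("lost field value");
            }
        }
    }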
From michael.haupt at oracle.com Wed Jul 29 08:54:24 2015
From: michael.haupt at oracle.com (Michael Haupt)
Date: Wed, 29 Jul 2015 10:54:24 +0200
Subject: RFR(M): 8004073: Implement C2 Ideal node specific dump() method
In-Reply-To: <55B7BACE.9080100@oracle.com>
References: <0FE82414-5869-480D-B56A-1F3677A569ED@oracle.com> <55B7BACE.9080100@oracle.com>
Message-ID: <1AB40291-4B32-4B33-8621-183585854169@oracle.com>

Hi Vladimir,

thank you for your comments. I have uploaded a revised webrev to http://cr.openjdk.java.net/~mhaupt/8004073/webrev.01; some replies are inlined below.

> On 28.07.2015 19:24, Vladimir Kozlov wrote:
> Before looking to webrev, can you use whole word Node::related(), dump_related(), dump_related_compact(), dump_compact()? "comp" could be confused for "compiled". It is more typing in debugger but it is more clear.

Done.

> Also from this->dump_rel() in your example I see that you dump a lot more input nodes than I expect (only up to inputs of CmpU node).
> But this->dump_rel_comp() produces correct set of nodes.

The depth of output can be controlled with the method Node::dump_related(int d_in, int d_out); in my initial post I had not mentioned this method. The default output is also formatted in a way that makes clear where the current node (>) is, and where all the inputs (before) and outputs (after) are. Regarding the notion of "related nodes", YMMV.

For additional illustration, I've added an implementation of related() for PhiNode.

> It would be nice if you can avoid using macro:
>
> +#ifndef PRODUCT
> +  REL_IN_DATA_OUT_1;
> +#endif
>
> "Arithmetic nodes" are most common data nodes (vs control nodes this->is_CFG() == true). May be instead of a specialized rel() method you can use some flags checks in Node::rel() method.

Done.

Best,

Michael

--
Dr. Michael Haupt | Principal Member of Technical Staff
Phone: +49 331 200 7277 | Fax: +49 331 200 7561
Oracle Java Platform Group | LangTools Team | Nashorn
Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany
Oracle is committed to developing practices and products that help protect the environment

From aleksey.shipilev at oracle.com Wed Jul 29 08:58:51 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Wed, 29 Jul 2015 11:58:51 +0300
Subject: RFR (S) 8019968: Reference CAS induces GC store barrier even on failure
Message-ID: <55B895CB.4060105@oracle.com>

Hi,

I would like to suggest a fix for:
https://bugs.openjdk.java.net/browse/JDK-8019968

In short, current reference CAS intrinsic blindly emits post_barrier, ignoring the CAS result. In some cases, notably contended CAS spin-loops, we fail the CAS a lot, and thus go for a post_barrier excessively. Instead, we can conditionalize on the result of the store itself, and put the post_barrier only on the success path:
http://cr.openjdk.java.net/~shade/8019968/webrev.01/

More performance results here:
http://cr.openjdk.java.net/~shade/8019968/notes.txt

Thanks,
-Aleksey

P.S. Thanks to Vladimir Ivanov, who manhandled me through dealing with GraphKit/IdealKit interop.
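For context, the kind of contended spin-loop this fix targets (an illustrative sketch, not the benchmark from the notes): under contention most compareAndSet attempts fail, and before the fix each failed attempt still went through the GC post-barrier:

    import java.util.concurrent.atomic.AtomicReference;

    public final class TreiberStack<T> {
        private static final class Node<T> {
            final T value;
            final Node<T> next;
            Node(T value, Node<T> next) { this.value = value; this.next = next; }
        }

        private final AtomicReference<Node<T>> head = new AtomicReference<>();

        public void push(T v) {
            Node<T> h;
            do {
                h = head.get();
                // On a failed compareAndSet no reference is stored, so no
                // card needs dirtying; the fix emits the post-barrier only
                // when the CAS succeeds.
            } while (!head.compareAndSet(h, new Node<>(v, h)));
        }

        public static void main(String[] args) throws InterruptedException {
            TreiberStack<Integer> s = new TreiberStack<>();
            Runnable r = () -> { for (int i = 0; i < 100_000; i++) s.push(i); };
            Thread t1 = new Thread(r), t2 = new Thread(r);
            t1.start(); t2.start(); t1.join(); t2.join();
        }
    }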
From aleksey.shipilev at oracle.com Wed Jul 29 09:00:25 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Wed, 29 Jul 2015 12:00:25 +0300
Subject: RFR (S) 8131782: C1 Class.cast optimization breaks when Class is loaded from static final
In-Reply-To:
References: <55ACFD0F.30207@oracle.com> <55AE1972.4050106@oracle.com> <55B20A62.2090307@oracle.com> <55B29DD8.40008@oracle.com> <55B2E7D5.7060806@oracle.com> <55B5F12F.5050904@oracle.com>
Message-ID: <55B89629.8060102@oracle.com>

Ah. That would save me half a day digging in HotSpot. Thanks should go to John Rose who suggested the factory fix, not the Class.cast intrinsic one.

-Aleksey

On 07/28/2015 11:43 PM, Krystal Mok wrote:
> Hi Aleksey,
>
> Thanks for fixing this in OpenJDK!
> I actually noticed the same issue a few weeks ago [1], but somehow I had
> missed the reply that Roland sent me, so I didn't send out a request for
> review for my version of the change.
> But now that it's fixed, everything's all right ;-)
>
> Thanks,
> Kris
>
> [1]: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-June/018179.html
>
> On Mon, Jul 27, 2015 at 1:51 AM, Aleksey Shipilev wrote:
>
>     Thanks for pushing the change, Dean!
>
>     -Aleksey
>
>     On 07/25/2015 05:32 AM, John Rose wrote:
>     > You are fine.
>     > Aleksey is an Author for JDK 9: http://openjdk.java.net/census#shade
>     > Committers who sponsor changes are expected to use the correct Author
>     > (not themselves).
>     > It's a syntax error to push a non-Author changeset to an OpenJDK repo.
>     > BTW, jcheck does not consult the OJN census AFAIK.
>     >
>     > -- John
>     >
>     > On Jul 24, 2015, at 6:35 PM, Dean Long wrote:
>     >>
>     >> OK I will push it now. I did 'hg ci -u shade' so that Aleksey gets
>     >> credit for it, and jcheck isn't complaining. Does anyone know if JPRT
>     >> does more checks for Committer status? If so I'll have to redo it and
>     >> add a Contributed-by line.

From adinn at redhat.com Wed Jul 29 09:24:38 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Wed, 29 Jul 2015 10:24:38 +0100
Subject: RFR (S) 8019968: Reference CAS induces GC store barrier even on failure
In-Reply-To: <55B895CB.4060105@oracle.com>
References: <55B895CB.4060105@oracle.com>
Message-ID: <55B89BD6.7050209@redhat.com>

On 29/07/15 09:58, Aleksey Shipilev wrote:
> I would like to suggest a fix for:
> https://bugs.openjdk.java.net/browse/JDK-8019968
>
> In short, current reference CAS intrinsic blindly emits
> post_barrier, ignoring the CAS result. In some cases, notably
> contended CAS spin-loops, we fail the CAS a lot, and thus go for a
> post_barrier excessively. Instead, we can conditionalize on the
> result of the store itself, and put the post_barrier only on
> success path: http://cr.openjdk.java.net/~shade/8019968/webrev.01/
>
> More performance results here:
> http://cr.openjdk.java.net/~shade/8019968/notes.txt

Nice! The code looks fine and your test results are very convincing. I'll be interested to see how this looks on AArch64.

That said, I am afraid you still need a Reviewer!

regards,


Andrew Dinn
-----------

From aleksey.shipilev at oracle.com Wed Jul 29 09:57:36 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Wed, 29 Jul 2015 12:57:36 +0300
Subject: RFR (S) 8019968: Reference CAS induces GC store barrier even on failure
In-Reply-To: <55B89BD6.7050209@redhat.com>
References: <55B895CB.4060105@oracle.com> <55B89BD6.7050209@redhat.com>
Message-ID: <55B8A390.4000203@oracle.com>

On 07/29/2015 12:24 PM, Andrew Dinn wrote:
> On 29/07/15 09:58, Aleksey Shipilev wrote:
>> I would like to suggest a fix for:
>> https://bugs.openjdk.java.net/browse/JDK-8019968
>>
>> In short, current reference CAS intrinsic blindly emits
>> post_barrier, ignoring the CAS result. In some cases, notably
>> contended CAS spin-loops, we fail the CAS a lot, and thus go for a
>> post_barrier excessively. Instead, we can conditionalize on the
>> result of the store itself, and put the post_barrier only on
>> success path: http://cr.openjdk.java.net/~shade/8019968/webrev.01/
>>
>> More performance results here:
>> http://cr.openjdk.java.net/~shade/8019968/notes.txt
>
> Nice! The code looks fine and your test results are very convincing.
> I'll be interested to see how this looks on AArch64.

Thanks Andrew!

The change passes JPRT, so an AArch64 build is available. The benchmark JAR mentioned in the issue comments would run without intervention, taking around 40 minutes. You are very welcome to try, while Reviewers are taking a look. I can do that only next week.
> That said, I am afraid you still need a Reviewer!

That reminds me I haven't spelled out what testing was done:

 * JPRT on all open platforms
 * Targeted benchmarks
 * Eyeballing the generated x86 assembly

Thanks,
-Aleksey

From roland.westrelin at oracle.com Wed Jul 29 10:51:36 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Wed, 29 Jul 2015 12:51:36 +0200
Subject: RFR(XS): 8132525: java -client -XX:+TieredCompilation -XX:CICompilerCount=1 -version asserts since 8130858
Message-ID: <88D46D2A-1EB6-4507-A5C1-B19FF47555BE@oracle.com>

http://cr.openjdk.java.net/~roland/8132525/webrev.00/

My recent change to the validation of CICompilerCount triggers an assert failure: with -client -XX:+TieredCompilation, TieredCompilation is true when the CICompilerCount option is validated, and an erroneous value is used for the min number of threads. TieredCompilation is forced off later in the command line validation process.

Roland.

From adinn at redhat.com Wed Jul 29 10:55:48 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Wed, 29 Jul 2015 11:55:48 +0100
Subject: RFR: 8078743: AARCH64: Extend use of stlr to cater for volatile object stores
Message-ID: <55B8B134.2050803@redhat.com>

The following webrev is a follow-on to the fix posted for JDK-8078263. The earlier fix optimized *non-object* volatile puts/gets to use stlr/ldar and elide the associated leading/trailing dmb instructions. This one extends the previous optimization to cover volatile object puts.

http://cr.openjdk.java.net/~adinn/8078743/webrev.03/

The fix involves identifying certain Ideal Graph configurations in which generation of leading and trailing dmb can be avoided in favour of generating an stlr instruction. As a consequence, the fix is sensitive to the current GC configs and also to whether the value being written is null, potentially null, or known notnull. I have tested it using 5 GC configs:

  G1GC
  CMS+UseCondCardMark
  CMS-UseCondCardMark
  Parallel+UseCondCardMark
  Parallel-UseCondCardMark

The last two configs are much of a muchness as regards what the patch code does, since with Parallel (or Serial) GC the patch code does not need to look at the code which does the card mark -- but I tested against both just to be sure.

Testing involved

i) eyeballing the code generated for normal and unsafe volatile puts with null, possibly null and notnull values

ii) exercising a large program (build and run a sample project in NetBeans) and eyeballing the code generated for methods of ConcurrentHashMap

iii) running the full jcstress suite

Comments and reviews very welcome

n.b. I have a 3rd patch queued which performs a similar optimization for CAS operations (drop the dmbs in favour of an ldaxr/stlxr pair).

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland)
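The store shape this patch targets, as a trivial Java sketch (names are mine, not from the webrev): a volatile *object* field write, which with the patch can be emitted on AArch64 as a single stlr plus the GC card mark, instead of a plain str bracketed by leading and trailing dmb instructions:

    class Publisher {
        volatile Object ref;      // volatile object field

        void publish(Object o) {
            ref = o;              // volatile object put: the case the
                                  // patch lowers to stlr + card mark
        }
    }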
From roland.westrelin at oracle.com Wed Jul 29 13:57:34 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Wed, 29 Jul 2015 15:57:34 +0200
Subject: RFR(M): 8130847: Cloned object's fields observed as null after C2 escape analysis
In-Reply-To: <55B7ADFE.3060109@oracle.com>
References: <071D5DCC-9B77-4A86-9DA1-6C97BC99F496@oracle.com> <55B7ADFE.3060109@oracle.com>
Message-ID: <9A22C312-E1B5-4851-82C2-CE5FBCF75869@oracle.com>

> The next change puzzles me:
>
> -    if (!call->may_modify(tinst, phase)) {
> +    if (call->may_modify(tinst, phase)) {
> -      mem = call->in(TypeFunc::Memory);
> +      assert(call->is_ArrayCopy(), "ArrayCopy is the only call node that doesn't make allocation escape");
>
> Why only ArrayCopy? I think it is most of calls. What set of tests you ran?
>
> Methods naming is confusing. membar_for_arraycopy() does not check for membar but for calls which can modify. handle_arraycopy() could be make_arraycopy_load().

What about:

static bool may_modify(const TypeOopPtr *t_oop, MemBarNode* mb, PhaseTransform *phase);

instead of membar_for_arraycopy()

So ArrayCopyNode would have:

virtual bool may_modify(const TypeOopPtr *t_oop, PhaseTransform *phase);

and

static bool may_modify(const TypeOopPtr *t_oop, MemBarNode* mb, PhaseTransform *phase);

that do the same thing, except the static method also looks for a graph pattern starting from a MemBar.

Roland.

> Add explicit check:
>   && strcmp(_name, "unsafe_arraycopy") != 0)
>
> Thanks,
> Vladimir
>
> On 7/28/15 7:05 AM, Roland Westrelin wrote:
>> http://cr.openjdk.java.net/~roland/8130847/webrev.00/
>>
>> When an allocation which is the destination of an ArrayCopyNode is eliminated, field's values recorded at a safepoint (to reallocate the object) do not take the ArrayCopyNode into account at all, and the effect of the ArrayCopyNode is lost on a deoptimization. This fix records values from the source of the ArrayCopyNode, emitting new loads if necessary.
>>
>> I also use the opportunity to pin the loads generated in LoadNode::can_see_arraycopy_value() because they depend on all checks that validate the array copy and not only on the check that immediately dominates.
>>
>> Roland.

From aleksey.shipilev at oracle.com Wed Jul 29 14:01:00 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Wed, 29 Jul 2015 17:01:00 +0300
Subject: [aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte nops everywhere
In-Reply-To: <55B6063B.3070604@redhat.com>
References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> <55AD0AE0.3060803@oracle.com> <55AEAB5C.8050307@oracle.com> <55AF500B.9000505@oracle.com> <55B20A16.7020300@oracle.com> <4295855A5C1DE049A61835A1887419CC2D00A472@DEWDFEMB12A.global.corp.sap> <55B5F651.5090509@oracle.com> <55B6063B.3070604@redhat.com>
Message-ID: <55B8DC9C.7010003@oracle.com>

On 07/27/2015 01:21 PM, Andrew Haley wrote:
> On 27/07/15 10:13, Aleksey Shipilev wrote:
>> Thanks Goetz! Fixed the assembler_ppc.inline.hpp.
>>
>> Andrew/Edward, are you OK with AArch64 part?
>> http://cr.openjdk.java.net/~shade/8131682/webrev.02/
>
> I agree that it looks good.

So, we have reviews from Dean Long, Goetz Lindenmaier, Andrew Dinn, and Andrew Haley. Still no Capital (R)eviewers.

Otherwise, I think we are good to go. I respinned the JPRT with open+closed sources, and it would seem the changes in closed sources are not required.

Please review and sponsor!
Thanks,
-Aleksey

From vladimir.x.ivanov at oracle.com Wed Jul 29 14:16:19 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 29 Jul 2015 17:16:19 +0300
Subject: [aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte nops everywhere
In-Reply-To: <55B8DC9C.7010003@oracle.com>
References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> <55AD0AE0.3060803@oracle.com> <55AEAB5C.8050307@oracle.com> <55AF500B.9000505@oracle.com> <55B20A16.7020300@oracle.com> <4295855A5C1DE049A61835A1887419CC2D00A472@DEWDFEMB12A.global.corp.sap> <55B5F651.5090509@oracle.com> <55B6063B.3070604@redhat.com> <55B8DC9C.7010003@oracle.com>
Message-ID: <55B8E033.4060900@oracle.com>

Looks good.

Best regards,
Vladimir Ivanov

On 7/29/15 5:01 PM, Aleksey Shipilev wrote:
> On 07/27/2015 01:21 PM, Andrew Haley wrote:
>> On 27/07/15 10:13, Aleksey Shipilev wrote:
>>> Thanks Goetz! Fixed the assembler_ppc.inline.hpp.
>>>
>>> Andrew/Edward, are you OK with AArch64 part?
>>> http://cr.openjdk.java.net/~shade/8131682/webrev.02/
>>
>> I agree that it looks good.
>
> So, we have reviews from Dean Long, Goetz Lindenmaier, Andrew Dinn, and
> Andrew Haley. Still no Capital (R)eviewers.
>
> Otherwise, I think we are good to go. I respinned the JPRT with
> open+closed sources, and it would seem the changes in closed sources are
> not required.
>
> Please review and sponsor!
>
> Thanks,
> -Aleksey

From dean.long at oracle.com Wed Jul 29 16:30:24 2015
From: dean.long at oracle.com (Dean Long)
Date: Wed, 29 Jul 2015 09:30:24 -0700
Subject: [aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte nops everywhere
In-Reply-To: <55B8DC9C.7010003@oracle.com>
References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> <55AD0AE0.3060803@oracle.com> <55AEAB5C.8050307@oracle.com> <55AF500B.9000505@oracle.com> <55B20A16.7020300@oracle.com> <4295855A5C1DE049A61835A1887419CC2D00A472@DEWDFEMB12A.global.corp.sap> <55B5F651.5090509@oracle.com> <55B6063B.3070604@redhat.com> <55B8DC9C.7010003@oracle.com>
Message-ID: <55B8FFA0.4070105@oracle.com>

On 7/29/2015 7:01 AM, Aleksey Shipilev wrote:
> So, we have reviews from Dean Long, Goetz Lindenmaier, Andrew Dinn, and
> Andrew Haley. Still no Capital (R)eviewers.
>
> Otherwise, I think we are good to go. I respinned the JPRT with
> open+closed sources, and it would seem the changes in closed sources are
> not required.

The changes to sparc and ppc may not be required anymore.

dl

> Please review and sponsor!
>
> Thanks,
> -Aleksey

From vladimir.kozlov at oracle.com Wed Jul 29 17:09:22 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 29 Jul 2015 10:09:22 -0700
Subject: RFR(XS): 8132525: java -client -XX:+TieredCompilation -XX:CICompilerCount=1 -version asserts since 8130858
In-Reply-To: <88D46D2A-1EB6-4507-A5C1-B19FF47555BE@oracle.com>
References: <88D46D2A-1EB6-4507-A5C1-B19FF47555BE@oracle.com>
Message-ID: <55B908C2.1040301@oracle.com>

Okay.
Thanks,
Vladimir

On 7/29/15 3:51 AM, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/8132525/webrev.00/
>
> My recent change to the validation of CICompilerCount triggers an assert failure: with -client -XX:+TieredCompilation, TieredCompilation is true when the CICompilerCount option is validated, and an erroneous value is used for the min number of threads. TieredCompilation is forced off later in the command line validation process.
>
> Roland.

From john.r.rose at oracle.com Wed Jul 29 17:53:51 2015
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 29 Jul 2015 10:53:51 -0700
Subject: RFR (S) 8131782: C1 Class.cast optimization breaks when Class is loaded from static final
In-Reply-To: <55B89629.8060102@oracle.com>
References: <55ACFD0F.30207@oracle.com> <55AE1972.4050106@oracle.com> <55B20A62.2090307@oracle.com> <55B29DD8.40008@oracle.com> <55B2E7D5.7060806@oracle.com> <55B5F12F.5050904@oracle.com> <55B89629.8060102@oracle.com>
Message-ID: <0AC6A98C-8773-4A8D-9EE8-6DC9D207A0CA@oracle.com>

It's possible that Kris's suggestion helped me (through some unconscious brain backplane) to find the same fix location. In any case, he called it first; thanks Kris, and keep up the good work!

-- John

On Jul 29, 2015, at 2:00 AM, Aleksey Shipilev wrote:
>
> Ah. That would save me half a day digging in HotSpot. Thanks should go
> to John Rose who suggested the factory fix, not the Class.cast intrinsic
> one.
>
> -Aleksey
>
> On 07/28/2015 11:43 PM, Krystal Mok wrote:
>> Hi Aleksey,
>>
>> Thanks for fixing this in OpenJDK!
>> I actually noticed the same issue a few weeks ago [1], but somehow I had
>> missed the reply that Roland sent me, so I didn't send out a request for
>> review for my version of the change.
>> But now that it's fixed, everything's all right ;-)
>>
>> Thanks,
>> Kris
>>
>> [1]: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-June/018179.html
>>
>> On Mon, Jul 27, 2015 at 1:51 AM, Aleksey Shipilev wrote:
>>
>>     Thanks for pushing the change, Dean!
>>
>>     -Aleksey
>>
>>     On 07/25/2015 05:32 AM, John Rose wrote:
>>> You are fine.
>>> Aleksey is an Author for JDK 9: http://openjdk.java.net/census#shade
>>> Committers who sponsor changes are expected to use the correct Author
>>> (not themselves).
>>> It's a syntax error to push a non-Author changeset to an OpenJDK repo.
>>> BTW, jcheck does not consult the OJN census AFAIK.
>>>
>>> -- John
>>>
>>> On Jul 24, 2015, at 6:35 PM, Dean Long wrote:
>>>>
>>>> OK I will push it now. I did 'hg ci -u shade' so that Aleksey gets
>>>> credit for it, and jcheck isn't complaining. Does anyone know if JPRT
>>>> does more checks for Committer status? If so I'll have to redo it and
>>>> add a Contributed-by line.
From gerard.ziemski at oracle.com Wed Jul 29 18:34:35 2015
From: gerard.ziemski at oracle.com (gerard ziemski)
Date: Wed, 29 Jul 2015 13:34:35 -0500
Subject: RFR(XS): 8132525: java -client -XX:+TieredCompilation -XX:CICompilerCount=1 -version asserts since 8130858
In-Reply-To: <88D46D2A-1EB6-4507-A5C1-B19FF47555BE@oracle.com>
References: <88D46D2A-1EB6-4507-A5C1-B19FF47555BE@oracle.com>
Message-ID: <55B91CBB.2040800@oracle.com>

hi Roland,

You put

+  // With a client VM, -XX:+TieredCompilation causes TieredCompilation
+  // to be true here (the option is validated later) and
+  // min_number_of_compiler_threads to exceed CI_COMPILER_COUNT.
+  min_number_of_compiler_threads = MIN2(min_number_of_compiler_threads, CI_COMPILER_COUNT);

into the src/share/vm/runtime/commandLineFlagConstraintsCompiler.cpp file, but that's a constraint function - there should not be any code that sets anything there. Is there somewhere else we can put this code instead of the constraint function?

cheers

On 07/29/2015 05:51 AM, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/8132525/webrev.00/
>
> My recent change to the validation of CICompilerCount triggers an assert failure: with -client -XX:+TieredCompilation, TieredCompilation is true when the CICompilerCount option is validated, and an erroneous value is used for the min number of threads. TieredCompilation is forced off later in the command line validation process.
>
> Roland.

From roland.westrelin at oracle.com Wed Jul 29 18:48:44 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Wed, 29 Jul 2015 20:48:44 +0200
Subject: RFR(XS): 8132525: java -client -XX:+TieredCompilation -XX:CICompilerCount=1 -version asserts since 8130858
In-Reply-To: <55B91CBB.2040800@oracle.com>
References: <88D46D2A-1EB6-4507-A5C1-B19FF47555BE@oracle.com> <55B91CBB.2040800@oracle.com>
Message-ID: <02EC85F7-226D-4D17-89F1-F59A2E6DB0FE@oracle.com>

Hi Gerard,

Thanks for looking at this.

> You put
>
> +  // With a client VM, -XX:+TieredCompilation causes TieredCompilation
> +  // to be true here (the option is validated later) and
> +  // min_number_of_compiler_threads to exceed CI_COMPILER_COUNT.
> +  min_number_of_compiler_threads = MIN2(min_number_of_compiler_threads, CI_COMPILER_COUNT);
>
> into the src/share/vm/runtime/commandLineFlagConstraintsCompiler.cpp file, but that's a constraint function - there should not be any code that sets anything there. Is there somewhere else we can put this code instead of the constraint function?

min_number_of_compiler_threads is a local variable, so it shouldn't be a problem, right?

Roland.

> cheers
>
> On 07/29/2015 05:51 AM, Roland Westrelin wrote:
>> http://cr.openjdk.java.net/~roland/8132525/webrev.00/
>>
>> My recent change to the validation of CICompilerCount triggers an assert failure: with -client -XX:+TieredCompilation, TieredCompilation is true when the CICompilerCount option is validated, and an erroneous value is used for the min number of threads. TieredCompilation is forced off later in the command line validation process.
>> Roland.

From gerard.ziemski at oracle.com Wed Jul 29 18:50:00 2015
From: gerard.ziemski at oracle.com (gerard ziemski)
Date: Wed, 29 Jul 2015 13:50:00 -0500
Subject: RFR(XS): 8132525: java -client -XX:+TieredCompilation -XX:CICompilerCount=1 -version asserts since 8130858
In-Reply-To: <02EC85F7-226D-4D17-89F1-F59A2E6DB0FE@oracle.com>
References: <88D46D2A-1EB6-4507-A5C1-B19FF47555BE@oracle.com> <55B91CBB.2040800@oracle.com> <02EC85F7-226D-4D17-89F1-F59A2E6DB0FE@oracle.com>
Message-ID: <55B92058.3010207@oracle.com>

Right, looks good.

cheers

On 07/29/2015 01:48 PM, Roland Westrelin wrote:
> Hi Gerard,
>
> Thanks for looking at this.
>
>> You put
>> +  // With a client VM, -XX:+TieredCompilation causes TieredCompilation
>> +  // to be true here (the option is validated later) and
>> +  // min_number_of_compiler_threads to exceed CI_COMPILER_COUNT.
>> +  min_number_of_compiler_threads = MIN2(min_number_of_compiler_threads, CI_COMPILER_COUNT);
>> into the src/share/vm/runtime/commandLineFlagConstraintsCompiler.cpp file, but that's a constraint function - there should not be any code that sets anything there. Is there somewhere else we can put this code instead of the constraint function?
>
> min_number_of_compiler_threads is a local variable, so it shouldn't be a problem, right?
>
> Roland.
>
>> cheers
>>
>> On 07/29/2015 05:51 AM, Roland Westrelin wrote:
>>> http://cr.openjdk.java.net/~roland/8132525/webrev.00/
>>>
>>> My recent change to the validation of CICompilerCount triggers an assert failure: with -client -XX:+TieredCompilation, TieredCompilation is true when the CICompilerCount option is validated, and an erroneous value is used for the min number of threads. TieredCompilation is forced off later in the command line validation process.
>>>
>>> Roland.

From roland.westrelin at oracle.com Wed Jul 29 18:57:09 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Wed, 29 Jul 2015 20:57:09 +0200
Subject: RFR(XS): 8132525: java -client -XX:+TieredCompilation -XX:CICompilerCount=1 -version asserts since 8130858
In-Reply-To: <55B92058.3010207@oracle.com>
References: <88D46D2A-1EB6-4507-A5C1-B19FF47555BE@oracle.com> <55B91CBB.2040800@oracle.com> <02EC85F7-226D-4D17-89F1-F59A2E6DB0FE@oracle.com> <55B92058.3010207@oracle.com>
Message-ID:

Thanks Gerard & Vladimir for the reviews.

Roland.

From vladimir.kozlov at oracle.com Thu Jul 30 00:48:39 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 29 Jul 2015 17:48:39 -0700
Subject: RFR(M): 8130847: Cloned object's fields observed as null after C2 escape analysis
In-Reply-To: <9A22C312-E1B5-4851-82C2-CE5FBCF75869@oracle.com>
References: <071D5DCC-9B77-4A86-9DA1-6C97BC99F496@oracle.com> <55B7ADFE.3060109@oracle.com> <9A22C312-E1B5-4851-82C2-CE5FBCF75869@oracle.com>
Message-ID: <55B97467.3000404@oracle.com>

On 7/29/15 6:57 AM, Roland Westrelin wrote:
>> The next change puzzles me:
>>
>> -    if (!call->may_modify(tinst, phase)) {
>> +    if (call->may_modify(tinst, phase)) {
>> -      mem = call->in(TypeFunc::Memory);
>> +      assert(call->is_ArrayCopy(), "ArrayCopy is the only call node that doesn't make allocation escape");
>>
>> Why only ArrayCopy? I think it is most of calls. What set of tests you ran?
>>
>> Methods naming is confusing. membar_for_arraycopy() does not check for membar but for calls which can modify. handle_arraycopy() could be make_arraycopy_load().
>
> What about:
>
> static bool may_modify(const TypeOopPtr *t_oop, MemBarNode* mb, PhaseTransform *phase);
>
> instead of membar_for_arraycopy()
>
> So ArrayCopyNode would have:
>
> virtual bool may_modify(const TypeOopPtr *t_oop, PhaseTransform *phase);
>
> and
>
> static bool may_modify(const TypeOopPtr *t_oop, MemBarNode* mb, PhaseTransform *phase);
>
> that do the same thing, except the static method also looks for a graph pattern starting from a MemBar.

Yes, it is better.

Thanks,
Vladimir

> Roland.
>
>> Add explicit check:
>>   && strcmp(_name, "unsafe_arraycopy") != 0)
>>
>> Thanks,
>> Vladimir
>>
>> On 7/28/15 7:05 AM, Roland Westrelin wrote:
>>> http://cr.openjdk.java.net/~roland/8130847/webrev.00/
>>>
>>> When an allocation which is the destination of an ArrayCopyNode is eliminated, field's values recorded at a safepoint (to reallocate the object) do not take the ArrayCopyNode into account at all, and the effect of the ArrayCopyNode is lost on a deoptimization. This fix records values from the source of the ArrayCopyNode, emitting new loads if necessary.
>>>
>>> I also use the opportunity to pin the loads generated in LoadNode::can_see_arraycopy_value() because they depend on all checks that validate the array copy and not only on the check that immediately dominates.
>>>
>>> Roland.

From vladimir.kozlov at oracle.com Thu Jul 30 02:08:56 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 29 Jul 2015 19:08:56 -0700
Subject: Fwd: Tiered compilation and virtual call heuristics
In-Reply-To:
References:
Message-ID: <55B98738.3040502@oracle.com>

Hi Carsten,

The main issue here is that without Tiered, the Interpreter starts collecting profiling information only after 3300 invocations (InterpreterProfilePercentage). As a result, data from the first invocations is not recorded. On the other hand, with Tiered, C1 compilation (with profiling code) is triggered after 100 invocations. So you have a lot more data, as you observed.

If you can sacrifice some startup performance you can try to use CompileThresholdScaling to increase compilation thresholds to delay compilations. Or you can also try to increase Tier3InvocationThreshold and Tier3CompileThreshold to delay only C1 compilation. Here is the formula from simpleThresholdPolicy.inline.hpp:

  return (i >= Tier3InvocationThreshold * scale) ||
         (i >= Tier3MinInvocationThreshold * scale && i + b >= Tier3CompileThreshold * scale);

But if you have a real "flat" profile (all called methods are relatively warm) nothing will help you. If you have some methods which are relatively hot you can solve that by trying to call them at the beginning. For example, if you had count400(0) called first (or second) you will get a record for it in the MDO. And then you can try to lower TypeProfileMajorReceiverPercent to avoid the virtual call at least for one hot method (recorded in the MDO):

  product(intx, TypeProfileMajorReceiverPercent, 90,
          "% of major receiver type to all profiled receivers")

Regards,
Vladimir

On 7/22/15 10:37 AM, Carsten Varming wrote:
> Dear Hotspot compiler group,
>
> I have had a few issues with tiered compilation in JDK8 lately and was
> wondering if you have some comments or ideas for the given problem.
>
> Here is my problem as I currently understand it. Feel free to correct
> any misunderstandings I may have. With tiered compilation the heuristics
> for inlining virtual calls seem to degrade quite a bit. I think this is
> due to MethodData objects being created much earlier with tiered than
> without.
> This causes the tracking of the hottest target methods at a
> virtual call site to go awry, due to the limit (2) on the number of
> MethodData objects that can be associated with a bci in a method. It
> seems like the only virtual call targets tracked are the targets that
> are warm when C1 is invoked.
>
> The program ends up with all call-sites in
> scala.collection.IndexedSeqOptimized.slice using virtual dispatch with
> tiered, and bimorphic call sites without tiered. The end result with
> tiered is a tripling of the cpu required to run the program, and
> instruction pointers from the compiled slice method end up in 90% of all
> cpu samples (collected with perf at 4kHz).
>
> The problem is with a small application built in Scala on top of Netty.
> I have written a small sample program (see attached Main.java) to spare
> you the details (and to be able to give you code).
>
> When I run the sample program with tiered then the call to count ends up
> being a virtual call, due to Instance$3.count and Instance4.count being
> warm when C1 kicks in. Without tiered, Instance$1.count is the only hot
> method.
>
> I wonder if you guys have seen this problem in the wild or if I just
> happen to be unlucky. Increasing BciProfileWidth should help in my case,
> but it is not a product flag. Do you have any experience regarding the cost
> of increasing BciProfileWidth? Do you have any thoughts on throwing out
> MethodData objects for virtual call sites that turn out to be pretty cold?
>
> Thank you in advance for your thoughts,
> Carsten
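A stripped-down Java sketch of the warm-up shape Carsten describes (illustrative only; his attached Main.java is not reproduced here): two receivers that are hot only during warm-up occupy both profiled-receiver slots at the call site, so the receiver that dominates steady state never makes it into the MethodData and the call stays virtual:

    public class ProfilePollution {
        interface Counter { int count(); }
        static final class A implements Counter { public int count() { return 1; } }
        static final class B implements Counter { public int count() { return 2; } }
        static final class C implements Counter { public int count() { return 3; } }

        static int call(Counter c) {
            return c.count(); // virtual call site; only two receiver types are profiled
        }

        public static void main(String[] args) {
            int s = 0;
            // warm-up: A and B fill the two profiled-receiver slots
            for (int i = 0; i < 10_000; i++) s += call(new A()) + call(new B());
            // steady state: C is the hot receiver, but it was never recorded,
            // so the call is not devirtualized/inlined for C
            for (int i = 0; i < 1_000_000; i++) s += call(new C());
            System.out.println(s);
        }
    }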
From vladimir.kozlov at oracle.com Thu Jul 30 02:15:14 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 29 Jul 2015 19:15:14 -0700
Subject: RFR(M): 8004073: Implement C2 Ideal node specific dump() method
In-Reply-To: <1AB40291-4B32-4B33-8621-183585854169@oracle.com>
References: <0FE82414-5869-480D-B56A-1F3677A569ED@oracle.com> <55B7BACE.9080100@oracle.com> <1AB40291-4B32-4B33-8621-183585854169@oracle.com>
Message-ID: <55B988B2.5000409@oracle.com>

This looks good to me. You need a second reviewer to look on this since changes are big.

Thanks.
Vladimir

On 7/29/15 1:54 AM, Michael Haupt wrote:
> Hi Vladimir,
>
> thank you for your comments. I have uploaded a revised webrev to
> http://cr.openjdk.java.net/~mhaupt/8004073/webrev.01; some replies are
> inlined below.
>
>> On 28.07.2015 19:24, Vladimir Kozlov wrote:
>> Before looking to webrev, can you use whole word Node::related(),
>> dump_related(), dump_related_compact(), dump_compact()? "comp" could
>> be confused for "compiled". It is more typing in debugger but it is
>> more clear.
>
> Done.
>
>> Also from this->dump_rel() in your example I see that you dump a lot
>> more input nodes than I expect (only up to inputs of CmpU node).
>> But this->dump_rel_comp() produces correct set of nodes.
>
> The depth of output can be controlled with the method
> Node::dump_related(int d_in, int d_out); in my initial post I had not
> mentioned this method. The default output is also formatted in a way
> that makes clear where the current node (>) is, and where all the inputs
> (before) and outputs (after) are. Regarding the notion of "related
> nodes", YMMV.
>
> For additional illustration, I've added an implementation of related()
> for PhiNode.
>
>> It would be nice if you can avoid using macro:
>>
>> +#ifndef PRODUCT
>> +  REL_IN_DATA_OUT_1;
>> +#endif
>>
>> "Arithmetic nodes" are most common data nodes (vs control nodes
>> this->is_CFG() == true). May be instead of a specialized rel() method you
>> can use some flags checks in Node::rel() method.
>
> Done.
>
> Best,
>
> Michael

From michael.haupt at oracle.com Thu Jul 30 05:06:51 2015
From: michael.haupt at oracle.com (Michael Haupt)
Date: Thu, 30 Jul 2015 07:06:51 +0200
Subject: RFR(M): 8004073: Implement C2 Ideal node specific dump() method
In-Reply-To: <55B988B2.5000409@oracle.com>
References: <0FE82414-5869-480D-B56A-1F3677A569ED@oracle.com> <55B7BACE.9080100@oracle.com> <1AB40291-4B32-4B33-8621-183585854169@oracle.com> <55B988B2.5000409@oracle.com>
Message-ID: <585D977C-2C99-4F96-BC3A-FB3EE56304C9@oracle.com>

Hi Vladimir,

thank you. Anyone, please review ...

Best,

Michael

> On 30.07.2015 04:15, Vladimir Kozlov wrote:
>
> This looks good to me. You need a second reviewer to look on this since changes are big.
>
> Thanks.
> Vladimir
>
> On 7/29/15 1:54 AM, Michael Haupt wrote:
>> Hi Vladimir,
>>
>> thank you for your comments. I have uploaded a revised webrev to
>> http://cr.openjdk.java.net/~mhaupt/8004073/webrev.01; some replies are
>> inlined below.
>>
>>> On 28.07.2015 19:24, Vladimir Kozlov wrote:
>>> Before looking to webrev, can you use whole word Node::related(),
>>> dump_related(), dump_related_compact(), dump_compact()? "comp" could
>>> be confused for "compiled". It is more typing in debugger but it is
>>> more clear.
>>
>> Done.
>>
>>> Also from this->dump_rel() in your example I see that you dump a lot
>>> more input nodes than I expect (only up to inputs of CmpU node).
>>> But this->dump_rel_comp() produces correct set of nodes.
>>
>> The depth of output can be controlled with the method
>> Node::dump_related(int d_in, int d_out); in my initial post I had not
>> mentioned this method. The default output is also formatted in a way
>> that makes clear where the current node (>) is, and where all the inputs
>> (before) and outputs (after) are. Regarding the notion of "related
>> nodes", YMMV.
>>
>> For additional illustration, I've added an implementation of related()
>> for PhiNode.
>>
>>> It would be nice if you can avoid using macro:
>>>
>>> +#ifndef PRODUCT
>>> +  REL_IN_DATA_OUT_1;
>>> +#endif
>>>
>>> "Arithmetic nodes" are most common data nodes (vs control nodes
>>> this->is_CFG() == true). May be instead of a specialized rel() method you
>>> can use some flags checks in Node::rel() method.
>>
>> Done.
>>
>> Best,
>>
>> Michael

--
Dr. Michael Haupt | Principal Member of Technical Staff
Phone: +49 331 200 7277 | Fax: +49 331 200 7561
Oracle Java Platform Group | LangTools Team | Nashorn
Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany
Oracle is committed to developing practices and products that help protect the environment
From vladimir.x.ivanov at oracle.com Thu Jul 30 09:33:45 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 30 Jul 2015 12:33:45 +0300
Subject: [9] RFR (M): 8011858: Use Compile::live_nodes() instead of Compile::unique() in appropriate places
In-Reply-To: <55A9AAB6.50505@oracle.com>
References: <55A9AAB6.50505@oracle.com>
Message-ID: <55B9EF79.1040907@oracle.com>

Looks good.

I'll sponsor the change.

Best regards,
Vladimir Ivanov

On 7/18/15 4:24 AM, Vladimir Kozlov wrote:
> Thank you, Vlad
>
> It looks good. We usually don't put bug id into comments. So your
> previous version on cr.openjdk is fine.
>
> Second reviewer should look on and sponsor it with you listed as
> contributor (I see you signed OCA already).
>
> Thanks,
> Vladimir
>
> On 7/17/15 3:47 PM, Vlad Ureche wrote:
>> Hi,
>>
>> Please review the following patch for JDK-8011858. Big thanks to
>> Vladimir Kozlov for his patient guidance while working on this!
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8011858
>>
>> Problem: Throughout C2, local stacks are used to prevent recursive
>> calls from blowing up the system stack. These are sized based on the
>> total number of nodes in the compilation run (e.g. C->unique()).
>> Instead, they should be sized based on the live node count
>> (C->live_nodes()).
>>
>> Now, with the increased difference between live_nodes (limited at
>> LiveNodeCountInliningCutoff, set to 40K) and unique nodes (which can go
>> up to 240K), it is important to not over-estimate the size of stacks.
>>
>> Solution: This patch mirrors a patch written by Vladimir Kozlov for
>> JDK8u. It replaces the initial sizes from C->unique() to
>> C->live_nodes(), preserving any shifts (divisions) and offsets. For
>> example, in the compile.cpp patch:
>>
>> - Node_Stack nstack(unique() >> 1);
>> + Node_Stack nstack(live_nodes() >> 1);
>>
>> There is an issue described at
>> https://bugs.openjdk.java.net/browse/JDK-8131702 where I took the
>> workaround from Vladimir's patch.
>>
>> Webrev: http://cr.openjdk.java.net/~kvn/8011858/webrev/ or
>> http://vladureche.ro/webrev/8011858 (updated, includes a link to bug
>> 8131702)
>>
>> Tests: Running jtreg with the compiler, runtime and gc tests on the
>> dev branch shows the same status before and after the patch: 808 tests
>> passed, 16 failed and 6 errors. What would be a stable point where all
>> tests are expected to pass, so I can test the patch there? Maybe jdk9?
>>
>> Thanks,
>> Vlad
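The sizing idea, restated as a generic sketch (plain Java, not HotSpot code; the names are mine): allocate scratch structures from the live count, which inlining caps at around 40K, rather than from the total-ever-allocated count, which can be several times larger:

    import java.util.ArrayDeque;

    final class WorklistSizing {
        // Mirrors "Node_Stack nstack(live_nodes() >> 1)": keep the original
        // shift, only swap the base quantity from totalAllocated to liveCount.
        static ArrayDeque<Integer> makeWorklist(int liveCount) {
            return new ArrayDeque<>(Math.max(1, liveCount >> 1));
        }

        public static void main(String[] args) {
            // liveCount = 40_000 vs. totalAllocated up to 240_000: sizing by
            // the former avoids over-reserving by up to 6x for one traversal.
            ArrayDeque<Integer> worklist = makeWorklist(40_000);
            System.out.println("worklist ready, initial capacity hint applied: " + worklist.isEmpty());
        }
    }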
From roland.westrelin at oracle.com Thu Jul 30 18:29:37 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Thu, 30 Jul 2015 20:29:37 +0200
Subject: RFR(M): 8130847: Cloned object's fields observed as null after C2 escape analysis
In-Reply-To: <55B97467.3000404@oracle.com>
References: <071D5DCC-9B77-4A86-9DA1-6C97BC99F496@oracle.com> <55B7ADFE.3060109@oracle.com> <9A22C312-E1B5-4851-82C2-CE5FBCF75869@oracle.com> <55B97467.3000404@oracle.com>
Message-ID:

Updated webrev with Vladimir's comments:

http://cr.openjdk.java.net/~roland/8130847/webrev.01/

Roland.

> On Jul 30, 2015, at 2:48 AM, Vladimir Kozlov wrote:
>
> On 7/29/15 6:57 AM, Roland Westrelin wrote:
>>>> The next change puzzles me:
>>>>
>>>> -    if (!call->may_modify(tinst, phase)) {
>>>> +    if (call->may_modify(tinst, phase)) {
>>>> -      mem = call->in(TypeFunc::Memory);
>>>> +      assert(call->is_ArrayCopy(), "ArrayCopy is the only call node that doesn't make allocation escape");
>>>>
>>>> Why only ArrayCopy? I think it is most of calls. What set of tests you ran?
>>>>
>>>> Methods naming is confusing. membar_for_arraycopy() does not check for membar but for calls which can modify. handle_arraycopy() could be make_arraycopy_load().
>>>
>>> What about:
>>>
>>> static bool may_modify(const TypeOopPtr *t_oop, MemBarNode* mb, PhaseTransform *phase);
>>>
>>> instead of membar_for_arraycopy()
>>>
>>> So ArrayCopyNode would have:
>>>
>>> virtual bool may_modify(const TypeOopPtr *t_oop, PhaseTransform *phase);
>>>
>>> and
>>>
>>> static bool may_modify(const TypeOopPtr *t_oop, MemBarNode* mb, PhaseTransform *phase);
>>>
>>> that do the same thing, except the static method also looks for a graph pattern starting from a MemBar.
>>
>> Yes, it is better.
>>
>> Thanks,
>> Vladimir
>>
>>> Roland.
From vladimir.kozlov at oracle.com Thu Jul 30 18:34:34 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 30 Jul 2015 11:34:34 -0700
Subject: RFR(M): 8130847: Cloned object's fields observed as null after C2 escape analysis
In-Reply-To:
References: <071D5DCC-9B77-4A86-9DA1-6C97BC99F496@oracle.com> <55B7ADFE.3060109@oracle.com> <9A22C312-E1B5-4851-82C2-CE5FBCF75869@oracle.com> <55B97467.3000404@oracle.com>
Message-ID: <55BA6E3A.2030007@oracle.com>

Looks good to me.

Thanks,
Vladimir

On 7/30/15 11:29 AM, Roland Westrelin wrote:
> Updated webrev with Vladimir's comments:
>
> http://cr.openjdk.java.net/~roland/8130847/webrev.01/
>
> Roland.

From vladimir.kozlov at oracle.com Thu Jul 30 19:19:19 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 30 Jul 2015 12:19:19 -0700
Subject: RFR: 8078743: AARCH64: Extend use of stlr to cater for volatile object stores
In-Reply-To: <55B8B134.2050803@redhat.com>
References: <55B8B134.2050803@redhat.com>
Message-ID: <55BA78B7.7030300@oracle.com>

First, thank you for extensive comments - they help.

Second, does it really help? I don't see any numbers.

Next pattern (returning NULL or false) is repeated several times. Maybe make a separate function for it.

+  if (opcode != Op_MemBarRelease) {
+    if (opcode != Op_MemBarCPUOrder)
+      return NULL;
+    MemBarNode *parent = parent_membar(leading);
+    if (!parent || !parent->Opcode() == Op_MemBarRelease)
+      return NULL;
+  }

Can you replace retry_feed: with while(cond) ?

Next could be one line, Node *x = trailing->in(TypeFunc::Memory):

+  Node *x;
+  // the Mem feed to the membar should be a merge
+  x = trailing->in(TypeFunc::Memory);

Thanks,
Vladimir

On 7/29/15 3:55 AM, Andrew Dinn wrote:
> The following webrev is a follow-on to the fix posted for JDK-8078263.
> The earlier fix optimized *non-object* volatile puts/gets to use
> stlr/ldar and elide the associated leading/trailing dmb instructions.
> This one extends the previous optimization to cover volatile object puts.
>
> http://cr.openjdk.java.net/~adinn/8078743/webrev.03/
>
> The fix involves identifying certain Ideal Graph configurations in which
> generation of leading and trailing dmb can be avoided in favour of
> generating an stlr instruction. As a consequence, the fix is sensitive to
> the current GC configs and also to whether the value being written is
> null, potentially null, or known notnull. I have tested it using 5 GC
> configs:
>
> G1GC
> CMS+UseCondCardMark
> CMS-UseCondCardMark
> Parallel+UseCondCardMark
> Parallel-UseCondCardMark
>
> The last two configs are much of a muchness as regards what the patch
> code does, since with Parallel (or Serial) GC the patch code does not
> need to look at the code which does the card mark -- but I tested
> against both just to be sure.
>
> Testing involved
>
> i) eyeballing the code generated for normal and unsafe volatile puts
> with null, possibly null and notnull values
>
> ii) exercising a large program (build and run a sample project in
> NetBeans) and eyeballing the code generated for methods of ConcurrentHashMap
>
> iii) running the full jcstress suite
>
> Comments and reviews very welcome
>
> n.b. I have a 3rd patch queued which performs a similar optimization for
> CAS operations (drop the dmbs in favour of an ldaxr/stlxr pair).
> regards,
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in UK and Wales under Company Registration No. 3798903
> Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland)

From vladimir.kozlov at oracle.com Fri Jul 31 03:03:02 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 30 Jul 2015 20:03:02 -0700
Subject: RFR (S) 8019968: Reference CAS induces GC store barrier even on failure
In-Reply-To: <55B8A390.4000203@oracle.com>
References: <55B895CB.4060105@oracle.com> <55B89BD6.7050209@redhat.com> <55B8A390.4000203@oracle.com>
Message-ID: <55BAE566.5020904@oracle.com>

I think the test is wrong. It should be:

  if_then(load_store, BoolTest::eq, oldval, PROB_STATIC_FREQUENT);

Thanks,
Vladimir

On 7/29/15 2:57 AM, Aleksey Shipilev wrote:
> On 07/29/2015 12:24 PM, Andrew Dinn wrote:
>> On 29/07/15 09:58, Aleksey Shipilev wrote:
>>> I would like to suggest a fix for:
>>> https://bugs.openjdk.java.net/browse/JDK-8019968
>>>
>>> In short, current reference CAS intrinsic blindly emits
>>> post_barrier, ignoring the CAS result. In some cases, notably
>>> contended CAS spin-loops, we fail the CAS a lot, and thus go for a
>>> post_barrier excessively. Instead, we can conditionalize on the
>>> result of the store itself, and put the post_barrier only on
>>> success path: http://cr.openjdk.java.net/~shade/8019968/webrev.01/
>>>
>>> More performance results here:
>>> http://cr.openjdk.java.net/~shade/8019968/notes.txt
>>
>> Nice! The code looks fine and your test results are very convincing.
>> I'll be interested to see how this looks on AArch64.
>
> Thanks Andrew!
>
> The change passes JPRT, so an AArch64 build is available. The benchmark JAR
> mentioned in the issue comments would run without intervention, taking
> around 40 minutes. You are very welcome to try, while Reviewers are
> taking a look. I can do that only next week.
>
>> That said, I am afraid you still need a Reviewer!
>
> That reminds me I haven't spelled out what testing was done:
>
> * JPRT on all open platforms
> * Targeted benchmarks
> * Eyeballing the generated x86 assembly
>
> Thanks,
> -Aleksey
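To unpack Vladimir's point, a plain-Java analogue of the semantics (not IdealKit code; the lock is only there to make the sketch correct): a CAS-style primitive hands back the value it loaded, and success means that loaded value equals the expected one, hence the comparison of load_store against oldval:

    final class CasSemantics {
        // The primitive returns the previously loaded value; the store
        // happened iff that value equals the expected one, so only then
        // is a GC post-barrier due.
        static int compareAndExchange(int[] cell, int expected, int newValue) {
            synchronized (cell) {
                int loaded = cell[0];          // the "load_store" result
                if (loaded == expected) {      // success test: loaded == expected
                    cell[0] = newValue;
                }
                return loaded;
            }
        }

        public static void main(String[] args) {
            int[] cell = {0};
            System.out.println(compareAndExchange(cell, 0, 7)); // 0: success
            System.out.println(compareAndExchange(cell, 0, 9)); // 7: failure
        }
    }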
From zoltan.majo at oracle.com Fri Jul 31 10:02:06 2015
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Fri, 31 Jul 2015 12:02:06 +0200
Subject: [9] RFR(S): 8132457: Unify command-line flags controlling the usage of compiler intrinsics
Message-ID: <55BB479E.8000402@oracle.com>

Hi,

please review the following patch for JDK-8132457.

Bug: https://bugs.openjdk.java.net/browse/JDK-8132457

Problem: There are four cases when flags controlling intrinsics for C1 and C2 behave inconsistently:

1) The DisableIntrinsic flag is C2-specific.

2) The InlineNatives flag disables most but not all intrinsics. Some intrinsics (implemented by both C1 and C2) are turned off by -XX:-InlineNatives for C1 but are left on for C2.

3) The _getClass intrinsic (implemented by both C1 and C2) is turned off by -XX:-InlineClassNatives for C1 and is left unaffected by C2.

4) The _loadfence, _storefence, _fullfence, _compareAndSwapObject, _compareAndSwapLong, and _compareAndSwapInt intrinsics are turned off by -XX:-InlineUnsafeOps for C2 and are unaffected by C1.

Solution: Unify command-line flags controlling intrinsic processing. Processing of command-line flags is now done only in vmIntrinsics::is_disabled_by_flags and there is no compiler-specific flag processing.

The inconsistencies listed in the problem description were addressed the following way:

1) Extend the C1 compiler to consider the DisableIntrinsic flag when checking if an intrinsic is available.

2) -XX:-InlineNatives turns off most intrinsics but leaves on some intrinsics (the same set of intrinsics is left on for both C1 and C2).

3) -XX:-InlineClassNatives turns off the _getClass intrinsic for both C1 and C2.

4) -XX:-InlineUnsafeOps turns off the _loadfence, _storefence, _fullfence, _compareAndSwapObject, _compareAndSwapLong, and _compareAndSwapInt intrinsics for both C1 and C2.

Webrev: http://cr.openjdk.java.net/~zmajo/8132457/webrev.00/

Testing:
- JPRT run, testset hotspot, all tests pass;
- all JTREG tests in hotspot/test, all tests pass;
- local testing of DisableIntrinsic with both C1 and C2.

Thank you and best regards,


Zoltan
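As a quick way to see these flags in action (a sketch; the flag names come from the patch, the observation method is my suggestion), one can run a small probe with diagnostic options and check whether the callee is still reported as an intrinsic in the inlining output:

    // Run with, e.g.:
    //   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining \
    //        -XX:DisableIntrinsic=_getClass GetClassProbe
    // and compare the output with and without DisableIntrinsic; with the
    // patch, C1 and C2 should both honor the flag.
    public class GetClassProbe {
        static Class<?> probe(Object o) {
            return o.getClass();   // _getClass intrinsic candidate
        }

        public static void main(String[] args) {
            Object[] objs = { "s", 42, new Object() };
            for (int i = 0; i < 100_000; i++) {
                probe(objs[i % objs.length]);
            }
        }
    }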
>> It cleans up the graph in some cases like this:
>>
>> static void test_after_5(int idx) {
>>     for (int i = 0; i < 1000; i++) {
>>         array[idx] = i;
>>         array[idx+1] = i;
>>         array[idx+2] = i;
>>         array[idx+3] = i;
>>         array[idx+4] = i;
>>         array[idx+5] = i;
>>     }
>> }
>>
>> all stores are sunk out of the loop, but that happens after iteration splitting and so there are multiple redundant copies of each store that are not collapsed.
>>
>> This said, we currently reorder the stores even if it's less aggressive than what I'm proposing. With program:
>>
>> st4
>> st3
>> st2
>> st1
>>
>> If st1, st3 and st4 are on one slice and st2 is on another, and if st1 and st3 store to the same address, we optimize st3 out:
>>
>> st4
>> st2
>> st1
>>
>> so st3=st1 may only be visible after st2.
>>
>> Also, the way I read the first table in this:
>>
>> http://gee.cs.oswego.edu/dl/jmm/cookbook.html
>>
>> it's allowed to reorder normal stores with normal stores
>>
>>>> so we need to change the memory input of st2 when we find st3 can be removed. In the code, at that point, this=st1, st = st3 and prev=st2.
>>>
>>> In this case the code should be:
>>>
>>> if (st->in(MemNode::Address)->eqv_uncast(address) &&
>>> ...
>>> } else {
>>>   prev = st;
>>> }
>>>
>>> to update 'prev' with 'st' only if 'st' is not removed.
>>
>> You're right.
>>
>>> Also, I think, st->in(MemNode::Memory) could be put in a local var since it is used several times in this code.
>>>
>>>>> You need to set improved = true since 'this' will not change. We also use 'make_progress' as the variable name in such cases.
>>>>
>>>> In the example above, if we remove st2, we modify this, right?
>>>
>>> We need to call Ideal() again if store inputs are changed. So if st2 is removed then the inputs of st1 are changed, so we need to rerun Ideal(). This allows us to avoid having your new loop in Ideal().
>>
>> Sorry, I don't understand this. Are you saying there's no need for a loop at all? Or are you saying that as soon as there's a similar store that is found we should return from Ideal, which will be called again to maybe find other similar stores?
>>
>>>>> We'll find a path from the head that doesn't go through the store and that exits the loop. What the comment doesn't say is that with example 2 below:
>>>>>
>>>>> for (int i = 0; i < 10; i++) {
>>>>>   if (some_condition) {
>>>>>     uncommon_trap();
>>>>>   }
>>>>>   array[idx] = 999;
>>>>> }
>>>>>
>>>>> my verification code would find the early exit as well.
>>>>>
>>>>> It's verification code only because if we have example 1 above, then we have a memory Phi to merge both branches of the if. So the pattern that we look for in PhaseIdealLoop::try_move_store_before_loop() won't match: the loop's memory Phi backedge won't be the store. If we have example 2 above, then the loop's memory Phi doesn't have a single memory use. So I don't think we need to check that the store post-dominates the loop head in product. That's my reasoning anyway, and the verification code is there to verify it.
>>>>
>>>> I missed the 'mem->in(LoopNode::LoopBackControl) == n' condition, which reduces the cases to only one store to this address in the loop - good.
>>>>
>>>> How do you check in a product VM that there are no other exits from a loop (your example 2)? Is it guarded by the mem->outcnt() == 1 check?
>>
>> Yes.
>>
>>>>>> Should you check phi == NULL instead of assert to make sure you have only one Phi node?
>>>>>
>>>>> Can there be more than one memory Phi for a particular slice that has in(0) == n_loop->_head?
>>>>> I would have expected that to be impossible.
>>>>
>>>> BOTTOM (all slices) Phi?
>>>
>>> Wouldn't there be a MergeMem between the store and the Phi then?
>>>
>>> For the record, the webrev:
>>>
>>> http://cr.openjdk.java.net/~roland/8080289/webrev.00/
>>>
>>> Roland.

From adinn at redhat.com Fri Jul 31 10:33:42 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 31 Jul 2015 11:33:42 +0100
Subject: RFR: 8078743: AARCH64: Extend use of stlr to cater for volatile object stores
In-Reply-To: <55BA78B7.7030300@oracle.com>
References: <55B8B134.2050803@redhat.com> <55BA78B7.7030300@oracle.com>
Message-ID: <55BB4F06.7090603@redhat.com>

Hi Vladimir,

Thank you very much for the feedback.

On 30/07/15 20:19, Vladimir Kozlov wrote:
> First, thank you for extensive comments - they help.

They were a necessity for me as much as anyone else :-)

> Second, does it really help? I don't see any numbers.

Hmm, running on prejudice, maybe try science? Good idea!

I will obtain numbers.

> Next pattern (returning NULL or false) is repeated several times. Maybe
> make a separate function for it.
>
> + if (opcode != Op_MemBarRelease) {
> +   if (opcode != Op_MemBarCPUOrder)
> +     return NULL;
> +   MemBarNode *parent = parent_membar(leading);
> +   if (!parent || !parent->Opcode() == Op_MemBarRelease)
> +     return NULL;
> + }

Yes, that's a very good idea.

> Can you replace retry_feed: with while(cond) ?

Of course.

> Next could be one line Node *x = trailing
> + Node *x;
> + // the Mem feed to the membar should be a merge
> + x = trailing->in(TypeFunc::Memory);

Yes, I agree.

I'll post a revised webrev with the above fixed after I obtain some performance figures.

regards,

Andrew Dinn
-----------

From roland.westrelin at oracle.com Fri Jul 31 11:42:00 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Fri, 31 Jul 2015 13:42:00 +0200
Subject: RFR(M): 8004073: Implement C2 Ideal node specific dump() method
In-Reply-To: <1AB40291-4B32-4B33-8621-183585854169@oracle.com>
References: <0FE82414-5869-480D-B56A-1F3677A569ED@oracle.com> <55B7BACE.9080100@oracle.com> <1AB40291-4B32-4B33-8621-183585854169@oracle.com>
Message-ID:

> thank you for your comments. I have uploaded a revised webrev to http://cr.openjdk.java.net/~mhaupt/8004073/webrev.01; some replies are inlined below.

That looks good to me.

In subnode.cpp the comment:
1439 //---------------------------------rel-----------------------------------------
doesn't match the function name.

Roland.

From michael.haupt at oracle.com Fri Jul 31 11:54:38 2015
From: michael.haupt at oracle.com (Michael Haupt)
Date: Fri, 31 Jul 2015 13:54:38 +0200
Subject: RFR(M): 8004073: Implement C2 Ideal node specific dump() method
In-Reply-To:
References: <0FE82414-5869-480D-B56A-1F3677A569ED@oracle.com> <55B7BACE.9080100@oracle.com> <1AB40291-4B32-4B33-8621-183585854169@oracle.com>
Message-ID: <418CAD45-6BFA-43C1-8CCD-AD16CAA4227E@oracle.com>

Hi Roland,

> On 31.07.2015 at 13:42, Roland Westrelin wrote:
>> thank you for your comments. I have uploaded a revised webrev to http://cr.openjdk.java.net/~mhaupt/8004073/webrev.01; some replies are inlined below.
>
> That looks good to me.
>
> In subnode.cpp the comment:
> 1439 //---------------------------------rel-----------------------------------------
> doesn't match the function name.

thank you very much. This was the case also in several other places; I've fixed those occurrences.

Best,

Michael

--
Dr. Michael Haupt | Principal Member of Technical Staff
Phone: +49 331 200 7277 | Fax: +49 331 200 7561
Oracle Java Platform Group | LangTools Team | Nashorn
Oracle Deutschland B.V. & Co.
KG, Schiffbauergasse 14 | 14467 Potsdam, Germany

Oracle is committed to developing practices and products that help protect the environment

From aph at redhat.com Fri Jul 31 13:32:45 2015
From: aph at redhat.com (Andrew Haley)
Date: Fri, 31 Jul 2015 14:32:45 +0100
Subject: RFR: 8078743: AARCH64: Extend use of stlr to cater for volatile object stores
In-Reply-To: <55BB4F06.7090603@redhat.com>
References: <55B8B134.2050803@redhat.com> <55BA78B7.7030300@oracle.com> <55BB4F06.7090603@redhat.com>
Message-ID: <55BB78FD.3050002@redhat.com>

Hi,

On 07/31/2015 11:33 AM, Andrew Dinn wrote:
> On 30/07/15 20:19, Vladimir Kozlov wrote:
>> First, thank you for extensive comments - they help.
>
> They were a necessity for me as much as anyone else :-)
>
>> Second, does it really help? I don't see any numbers.
>
> Hmm, running on prejudice, maybe try science? Good idea!
>
> I will obtain numbers.

That's not easy because AArch64 is a specification, not an implementation. Going the route of load acquire/store release may not help much on some chips, but conversations I've had with ARM architects tell me that they should be preferred. In particular, store release for volatiles means that we can avoid a full fence.

Current status: on one out-of-order implementation of AArch64 I see no difference between "stlr" and "dmb st; str; dmb ish". On another, this time an in-order processor, "stlr" is 40% faster. This is just the execution of a few instructions, like this:

.L3:
    ldr  w2, [x1]
    add  w2, w2, 1
    str  w2, [x1]
    stlr x4, [x3]
    subs x0, x0, #1
    bne  .L3

versus this:

.L3:
    ldr  w2, [x1]
    add  w2, w2, 1
    str  w2, [x1]
    dmb st; str x4, [x3]; dmb ish
    subs x0, x0, #1
    bne  .L3

The guidelines from ARM are that we should optimize for the simpler in-order processors; it won't help the out-of-order parts very much, but it won't hurt either.

Andrew.

From vladimir.kozlov at oracle.com Fri Jul 31 15:18:12 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 31 Jul 2015 08:18:12 -0700
Subject: [9] RFR(S): 8132457: Unify command-line flags controlling the usage of compiler intrinsics
In-Reply-To: <55BB479E.8000402@oracle.com>
References: <55BB479E.8000402@oracle.com>
Message-ID: <55BB91B4.805@oracle.com>

Very nice cleanup. Thank you, Zoltan.

Vladimir

On 7/31/15 3:02 AM, Zoltán Majó wrote:
> Hi,
>
> please review the following patch for JDK-8132457.
> [...]
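Tying Andrew Haley's measured loops above back to Java source: the stlr is what the 8078743 patch emits for a volatile object store sitting next to a plain field update. A rough standalone equivalent is sketched below (class and field names are invented for illustration):

    // VolatileStoreLoop.java -- sketch of the kind of loop being timed above.
    public class VolatileStoreLoop {
        int plain;                // the ldr/add/str triple in the loops above
        volatile Object sentinel; // the stlr (or dmb st; str; dmb ish) store

        void run(Object v, long iters) {
            for (long i = 0; i < iters; i++) {
                plain++;          // plain load, increment, store
                sentinel = v;     // volatile object store
            }
        }

        public static void main(String[] args) {
            new VolatileStoreLoop().run(new Object(), 100_000_000L);
        }
    }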
From vladimir.kozlov at oracle.com Fri Jul 31 15:20:27 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 31 Jul 2015 08:20:27 -0700
Subject: RFR(M): 8080289: Intermediate writes in a loop not eliminated by optimizer
In-Reply-To:
References: <55789088.5050405@oracle.com> <9DE849F6-9E4A-4649-A174-793726D67FD5@oracle.com> <5580738D.9070900@oracle.com> <5581C465.7070803@oracle.com>
Message-ID: <55BB923B.1050900@oracle.com>

This looks better. Reviewed.

thanks,
Vladimir

On 7/31/15 3:20 AM, Roland Westrelin wrote:
> Here is a new webrev for this that takes Vladimir's comments into account:
>
> http://cr.openjdk.java.net/~roland/8080289/webrev.01/
>
> Roland.
> [...]
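As source-level context for the 8080289 change reviewed above: the optimization targets loops in which every iteration overwrites the same location, so only the final store is observable after the loop. A hedged sketch (the class, method, and counts are made up; the array-store shape mirrors Roland's test_after_5 earlier in the thread):

    // IntermediateWrites.java -- illustration only, not one of the actual tests.
    public class IntermediateWrites {
        static int[] array = new int[64];

        static void test(int idx) {
            for (int i = 0; i < 1000; i++) {
                array[idx] = i;  // every write but the last is dead ...
            }
            // ... so C2 may keep i in a register and sink a single store of
            // the final value below the loop.
        }

        public static void main(String[] args) {
            for (int k = 0; k < 20_000; k++) {  // warm up so the method gets compiled
                test(3);
            }
            System.out.println(array[3]);       // prints 999 either way
        }
    }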
From adinn at redhat.com Fri Jul 31 16:17:13 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 31 Jul 2015 17:17:13 +0100
Subject: [aarch64-port-dev ] [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space
In-Reply-To: <55B72D38.2040705@oracle.com>
References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <77322805-2B38-4B85-98EF-9ABE45A31BDA@oracle.com> <55B641DB.1010608@oracle.com> <55B66DE9.2070000@oracle.com> <55B72D38.2040705@oracle.com>
Message-ID: <55BB9F89.3010203@redhat.com>

On 28/07/15 08:20, Tobias Hartmann wrote:
>> On 7/27/2015 7:36 AM, Tobias Hartmann wrote:
>>> Here is the new webrev:
>>> http://cr.openjdk.java.net/~thartmann/8130309/webrev.03/

I'm getting a problem building the latest AArch64 hs-comp because bailout is undefined:

: In member function 'virtual void ArrayCopyStub::emit_code(LIR_Assembler*)':
/home/adinn/openjdk/hs-comp/hotspot/src/cpu/aarch64/vm/c1_CodeStubs_aarch64.cpp:337:39: error: 'bailout' was not declared in this scope
   bailout("trampoline stub overflow");

I cannot find a declaration for bailout anywhere. Is this something which escaped from the lab too early, or is it just that the AArch64 part of the patch got omitted?

regards,

Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland)

From adinn at redhat.com Fri Jul 31 17:06:55 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 31 Jul 2015 18:06:55 +0100
Subject: [aarch64-port-dev ] [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space
In-Reply-To: <55BB9F89.3010203@redhat.com>
References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <77322805-2B38-4B85-98EF-9ABE45A31BDA@oracle.com> <55B641DB.1010608@oracle.com> <55B66DE9.2070000@oracle.com> <55B72D38.2040705@oracle.com> <55BB9F89.3010203@redhat.com>
Message-ID: <55BBAB2F.1080100@redhat.com>

On 31/07/15 17:17, Andrew Dinn wrote:
> I'm getting a problem building the latest AArch64 hs-comp because
> bailout is undefined:
> [...]
>    bailout("trampoline stub overflow");

Ok, the problem is that the call is happening inside ArrayCopyStub::emit_code(LIR_Assembler* ce), so it actually needs to be

  ce->bailout("trampoline stub overflow");

However, that won't work because bailout is private to LIR_Assembler.
So we also need a friend declaration in c1_LIRAssembler_aarch64.hpp.

There is also a problem with the change to MacroAssembler::trampoline_call:

pp:688:10: error: invalid conversion from 'unsigned int' to 'address {aka unsigned char*}' [-fpermissive]
   return start_offset;

I believe the return value probably ought to be pc() -- the value is not used as far as I can see, but it needs to be a non-NULL address to indicate that everything worked ok.

I will raise a JIRA for this and post a webrev asap.

regards,

Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland)

From ysr1729 at gmail.com Fri Jul 31 18:19:02 2015
From: ysr1729 at gmail.com (Srinivas Ramakrishna)
Date: Fri, 31 Jul 2015 11:19:02 -0700
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
Message-ID:

Hello GC and Compiler teams!

One of our services that runs with several thousand threads recently noticed an increase in safepoint stop times, but not gc times, upon transitioning to JDK 8.

Further investigation revealed that most of the delta was related to the so-called pre-gc/vmop "cleanup" phase when various book-keeping activities are performed, and more specifically in the portion that walks java thread stacks single-threaded (!) and updates the hotness counters for the active nmethods. This code appears to be new to JDK 8 (in jdk 7 one would walk the stacks only during code cache sweeps).

I have two questions:
(1) has anyone else (typically, I'd expect applications with many hundreds or thousands of threads) noticed this regression?
(2) Can we do better, for example, by:
    (a) doing these updates by walking thread stacks in multiple worker threads in parallel, or best of all:
    (b) doing these updates when we walk the thread stacks during GC, and skipping this phase entirely for non-GC safepoints (with attendant loss in frequency of this update in low GC frequency scenarios).

It seems kind of silly to do GCs with many worker threads, but do these thread stack walks single-threaded when the work is embarrassingly parallel (one could predicate the parallelization on the measured stack sizes and thread population, if there was concern about the overhead of activating and deactivating the thread gangs for the work).

A followup question: Any guesses as to how code cache sweep/eviction quality might be compromised if one were to dispense with these hotness updates entirely (or at a much reduced frequency), as a temporary workaround to the performance problem?

Thoughts/Comments? In particular, has this issue been addressed perhaps in newer JVMs?

Thanks for any comments, feedback, pointers!
-- ramki

PS: for comparison, here's data with +TraceSafepointCleanup from JDK 7 (first, where this isn't done) vs JDK 8 (where this is done) with a program that has a few thousand threads:

JDK 7:
..
2827.308: [sweeping nmethods, 0.0000020 secs]
2828.679: [sweeping nmethods, 0.0000030 secs]
2829.984: [sweeping nmethods, 0.0000030 secs]
2830.956: [sweeping nmethods, 0.0000030 secs]
..

JDK 8:
..
7368.634: [mark nmethods, 0.0177030 secs]
7369.587: [mark nmethods, 0.0178305 secs]
7370.479: [mark nmethods, 0.0180260 secs]
7371.503: [mark nmethods, 0.0186494 secs]
..
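A reproducer along the lines Ramki describes might look like the following sketch (the thread count, stack depth, and timings are arbitrary; the TraceSafepointCleanupTime flag name is taken from the safepoint code quoted later in this thread):

    // ManyThreadsSafepointDemo.java -- hedged sketch of a reproducer.
    // Starts a few thousand mostly-idle threads with moderately deep stacks,
    // then forces periodic safepoints so the "mark nmethods" phase is visible.
    public class ManyThreadsSafepointDemo {
        static void deepStack(int depth) throws InterruptedException {
            if (depth > 0) {
                deepStack(depth - 1);
                return;
            }
            Thread.sleep(Long.MAX_VALUE);  // park with ~100 frames on the stack
        }

        public static void main(String[] args) throws Exception {
            for (int i = 0; i < 4000; i++) {   // may need -Xss256k on small machines
                Thread t = new Thread(() -> {
                    try {
                        deepStack(100);
                    } catch (InterruptedException ignored) {
                    }
                });
                t.setDaemon(true);
                t.start();
            }
            for (int i = 0; i < 100; i++) {    // each GC safepoint walks all the stacks
                System.gc();
                Thread.sleep(1000);
            }
        }
    }

Run with -XX:+TraceSafepointCleanupTime and compare the reported cleanup times on JDK 7 and JDK 8.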
From vitalyd at gmail.com Fri Jul 31 18:31:48 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Fri, 31 Jul 2015 14:31:48 -0400
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
In-Reply-To:
References:
Message-ID:

Ramki, are you running tiered compilation?

sent from my phone
On Jul 31, 2015 2:19 PM, "Srinivas Ramakrishna" <ysr1729 at gmail.com> wrote:

> Hello GC and Compiler teams!
> [...]

From ysr1729 at gmail.com Fri Jul 31 18:33:14 2015
From: ysr1729 at gmail.com (Srinivas Ramakrishna)
Date: Fri, 31 Jul 2015 11:33:14 -0700
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
In-Reply-To:
References:
Message-ID:

Yes.

On Fri, Jul 31, 2015 at 11:31 AM, Vitaly Davidovich <vitalyd at gmail.com> wrote:

> Ramki, are you running tiered compilation?
> [...]

From vladimir.kozlov at oracle.com Fri Jul 31 18:43:12 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 31 Jul 2015 11:43:12 -0700
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
In-Reply-To:
References:
Message-ID: <55BBC1C0.3030709@oracle.com>

Hi Ramki,

Did you fill up the CodeCache? The sweeper starts scanning aggressively only with a full CodeCache:

  // Force stack scanning if there is only 10% free space in the code cache.
  // We force stack scanning only non-profiled code heap gets full, since critical
  // allocation go to the non-profiled heap and we must be make sure that there is
  // enough space.
  double free_percent = 1 / CodeCache::reverse_free_ratio(CodeBlobType::MethodNonProfiled) * 100;
  if (free_percent <= StartAggressiveSweepingAt) {
    do_stack_scanning();
  }

Vladimir

On 7/31/15 11:33 AM, Srinivas Ramakrishna wrote:
> Yes.
> [...]

From ysr1729 at gmail.com Fri Jul 31 18:48:53 2015
From: ysr1729 at gmail.com (Srinivas Ramakrishna)
Date: Fri, 31 Jul 2015 11:48:53 -0700
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
In-Reply-To: <55BBC1C0.3030709@oracle.com>
References: <55BBC1C0.3030709@oracle.com>
Message-ID:

Hi Vladimir --

I noticed the increase even with Initial and Reserved set to the default of 240 MB, but actual usage much lower (less than a quarter).

Look at this code path. Note that this is invoked at every safepoint (although it says "periodically" in the comment). In the mark_active_nmethods() method, there's a thread iteration in both branches of the if. I haven't checked to see which of the two was the culprit here, yet (if either).
// Various cleaning tasks that should be done periodically at safepoints
void SafepointSynchronize::do_cleanup_tasks() {
  ....
  {
    TraceTime t4("mark nmethods", TraceSafepointCleanupTime);
    NMethodSweeper::mark_active_nmethods();
  }
  ..
}

void NMethodSweeper::mark_active_nmethods() {
  ...
  if (!sweep_in_progress()) {
    _seen = 0;
    _sweep_fractions_left = NmethodSweepFraction;
    _current = CodeCache::first_nmethod();
    _traversals += 1;
    _total_time_this_sweep = Tickspan();

    if (PrintMethodFlushing) {
      tty->print_cr("### Sweep: stack traversal %d", _traversals);
    }
    Threads::nmethods_do(&mark_activation_closure);

  } else {
    // Only set hotness counter
    Threads::nmethods_do(&set_hotness_closure);
  }

  OrderAccess::storestore();
}

On Fri, Jul 31, 2015 at 11:43 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:

> Hi Ramki,
>
> Did you fill up the CodeCache? The sweeper starts scanning aggressively only with a full CodeCache:
> [...]

From ysr1729 at gmail.com Fri Jul 31 19:07:18 2015
From: ysr1729 at gmail.com (Srinivas Ramakrishna)
Date: Fri, 31 Jul 2015 12:07:18 -0700
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
In-Reply-To:
References: <55BBC1C0.3030709@oracle.com>
Message-ID:

Hi Vladimir --

Here's a snapshot of the counters:

sun.ci.codeCacheCapacity=251658240
sun.ci.codeCacheMaxCapacity=251658240
sun.ci.codeCacheMethodsReclaimedNum=3450
sun.ci.codeCacheSweepsTotalNum=58
sun.ci.codeCacheSweepsTotalTimeMillis=1111
sun.ci.codeCacheUsed=35888704

Notice that the code cache usage is less than 35 MB of the 240 MB capacity, yet it seems we have had 58 sweeps already, and safepoint cleanup says:

[mark nmethods, 0.0165062 secs]

Even if the two closures do little or no work, the single-threaded walk over the deep stacks of a thousand threads will cost time for applications with many threads, and this is now done at each safepoint irrespective of the sweeper activity as far as I can tell. It seems as if this work should be somehow rolled up (via a suitable injection) into GC's thread walks that are done in parallel, rather than doing this in a pre-GC phase (unless I am missing some reason that the sequencing is necessary, which it doesn't seem to be here).

-- ramki

On Fri, Jul 31, 2015 at 11:48 AM, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:

> Hi Vladimir --
>
> I noticed the increase even with Initial and Reserved set to the default of 240 MB, but actual usage much lower (less than a quarter).
> [...]

From ysr1729 at gmail.com Fri Jul 31 19:08:50 2015
From: ysr1729 at gmail.com (Srinivas Ramakrishna)
Date: Fri, 31 Jul 2015 12:08:50 -0700
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
In-Reply-To:
References: <55BBC1C0.3030709@oracle.com>
Message-ID:

BTW, I think some of those are counters that we added (a patch for which is attached to the openjdk ticket I opened a few months ago, I think)...

-- ramki

On Fri, Jul 31, 2015 at 12:07 PM, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:

> Hi Vladimir --
>
> Here's a snapshot of the counters:
> [...]
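As an aside on the counters above: the same usage number can also be read in-process through the standard memory pool API, which makes it easy to correlate occupancy with sweep activity over time (a sketch; on a JDK 8 VM without a segmented code cache the pool is named "Code Cache"):

    // CodeCachePoll.java -- sketch for watching code cache occupancy.
    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;

    public class CodeCachePoll {
        public static void main(String[] args) throws InterruptedException {
            while (true) {
                for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                    if (pool.getName().startsWith("Code Cache")) {  // non-heap pool
                        System.out.printf("%s: used=%d max=%d%n",
                                pool.getName(),
                                pool.getUsage().getUsed(),
                                pool.getUsage().getMax());
                    }
                }
                Thread.sleep(5000);  // poll interval, arbitrary
            }
        }
    }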
In particular, has this issue been addressed >>>> perhaps in newer JVMs? >>>> >>>> Thanks for any comments, feedback, pointers! >>>> -- ramki >>>> >>>> PS: for comparison, here's data with +TraceSafepointCleanup from >>>> JDK 7 (first, where this isn't done) >>>> vs JDK 8 (where this is done) with a program that has a few >>>> thousands of threads: >>>> >>>> >>>> >>>> JDK 7: >>>> .. >>>> 2827.308: [sweeping nmethods, 0.0000020 secs] >>>> 2828.679: [sweeping nmethods, 0.0000030 secs] >>>> 2829.984: [sweeping nmethods, 0.0000030 secs] >>>> 2830.956: [sweeping nmethods, 0.0000030 secs] >>>> .. >>>> >>>> JDK 8: >>>> .. >>>> 7368.634: [mark nmethods, 0.0177030 secs] >>>> 7369.587: [mark nmethods, 0.0178305 secs] >>>> 7370.479: [mark nmethods, 0.0180260 secs] >>>> 7371.503: [mark nmethods, 0.0186494 secs] >>>> .. >>>> >>>> >>>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.c.berg at intel.com Fri Jul 31 19:47:24 2015 From: michael.c.berg at intel.com (Berg, Michael C) Date: Fri, 31 Jul 2015 19:47:24 +0000 Subject: 8132160 - support for AVX 512 call frames and stack management In-Reply-To: References: Message-ID: I have revised the webrev below with more EVEX based changes for 64-bit and 32-bit. Accordingly, I still need two reviewers. Vladimir, can you be first reviewer? Thanks, -Michael From: Berg, Michael C Sent: Wednesday, July 22, 2015 10:55 PM To: 'hotspot-compiler-dev at openjdk.java.net' Subject: RFR: 8132160 - support for AVX 512 call frames and stack management Hi Folks, I would like to contribute AVX 512 call frame and stack management changes. I need two reviewers to examine this patch and comment as needed: Bug-id: https://bugs.openjdk.java.net/browse/JDK-8132160 webrev: http://cr.openjdk.java.net/~mcberg/8132160/webrev.01/ These changes simplify frame management on 32-bit and 64-bit systems which support EVEX and extend more complete frame save and restore functionality as well as stack management for calls, traps and explicit exception paths. These changes also move CPUID queries into the assembler object state and add more state rules to a large class of instructions while simplifying their use. Also added is support for vectorizing double precision sqrt which is available through the math library. Many generated stubs and internal functions also now have predicated mask management for EVEX added. Thanks, Michael -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Fri Jul 31 21:28:35 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 31 Jul 2015 14:28:35 -0700 Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8) In-Reply-To: References: <55BBC1C0.3030709@oracle.com> Message-ID: <55BBE883.1080308@oracle.com> Got it. Yes, it is issue with thousands java threads. You are the first pointing this problem. File bug on compiler. We will look what we can do. Most likely we need parallelize this work. Method's hotness is used only for UseCodeCacheFlushing. You can try to guard Threads::nmethods_do(&set_hotness_closure); with this flag and switch it off. We need mark_as_seen_on_stack so leave it. Thanks, Vladimir On 7/31/15 11:48 AM, Srinivas Ramakrishna wrote: > Hi Vladimir -- > > I noticed the increase even with Initial and Reserved set to the default > of 240 MB, but actual usage much lower (less than a quarter). > > Look at this code path. 
Note that this is invoked at every safepoint > (although it says "periodically" in the comment). > In the mark_active_nmethods() method, there's a thread iteration in both > branches of the if. I haven't checked to > see which of the two was the culprit here, yet (if either). > > // Various cleaning tasks that should be done periodically at safepoints > > void SafepointSynchronize::do_cleanup_tasks() { > > .... > > { > > TraceTime t4("mark nmethods", TraceSafepointCleanupTime); > > NMethodSweeper::mark_active_nmethods(); > > } > > .. > > } > > > void NMethodSweeper::mark_active_nmethods() { > > ... > > if (!sweep_in_progress()) { > > _seen = 0; > > _sweep_fractions_left = NmethodSweepFraction; > > _current = CodeCache::first_nmethod(); > > _traversals += 1; > > _total_time_this_sweep = Tickspan(); > > > if (PrintMethodFlushing) { > > tty->print_cr("### Sweep: stack traversal %d", _traversals); > > } > > Threads::nmethods_do(&mark_activation_closure); > > > } else { > > // Only set hotness counter > > Threads::nmethods_do(&set_hotness_closure); > > } > > > OrderAccess::storestore(); > > } > > > On Fri, Jul 31, 2015 at 11:43 AM, Vladimir Kozlov > > wrote: > > Hi Ramki, > > Did you fill up CodeCache? It start scanning aggressive only with > full CodeCache: > > // Force stack scanning if there is only 10% free space in the > code cache. > // We force stack scanning only non-profiled code heap gets full, > since critical > // allocation go to the non-profiled heap and we must be make > sure that there is > // enough space. > double free_percent = 1 / > CodeCache::reverse_free_ratio(CodeBlobType::MethodNonProfiled) * 100; > if (free_percent <= StartAggressiveSweepingAt) { > do_stack_scanning(); > } > > Vladimir > > On 7/31/15 11:33 AM, Srinivas Ramakrishna wrote: > > > Yes. > > > On Fri, Jul 31, 2015 at 11:31 AM, Vitaly Davidovich > > >> wrote: > > Ramki, are you running tiered compilation? > > sent from my phone > > On Jul 31, 2015 2:19 PM, "Srinivas Ramakrishna" > > >> wrote: > > > Hello GC and Compiler teams! > > One of our services that runs with several thousand threads > recently noticed an increase > in safepoint stop times, but not gc times, upon > transitioning to > JDK 8. > > Further investigation revealed that most of the delta was > related to the so-called > pre-gc/vmop "cleanup" phase when various book-keeping > activities > are performed, > and more specifically in the portion that walks java thread > stacks single-threaded (!) > and updates the hotness counters for the active > nmethods. This > code appears to > be new to JDK 8 (in jdk 7 one would walk the stacks > only during > code cache sweeps). > > I have two questions: > (1) has anyone else (typically, I'd expect applications > with > many hundreds or thousands of threads) > noticed this regression? > (2) Can we do better, for example, by: > (a) doing these updates by walking thread stacks in > multiple worker threads in parallel, or best of all: > (b) doing these updates when we walk the thread > stacks > during GC, and skipping this phase entirely > for non-GC safepoints (with attendant loss in > frequency of this update in low GC frequency > scenarios). > > It seems kind of silly to do GC's with many multiple worker > threads, but do these thread stack > walks single-threaded when it is embarrasingly parallel > (one > could predicate the parallelization > based on the measured stack sizes and thread population, if > there was concern on the ovrhead of > activating and deactivating the thread gangs for the work). 
>
>                 A followup question: Any guesses as to how code cache
>                 sweep/eviction quality might be compromised if one were
>                 to dispense with these hotness updates entirely (or do
>                 them at a much reduced frequency), as a temporary
>                 workaround to the performance problem?
>
>                 Thoughts/Comments? In particular, has this issue been
>                 addressed perhaps in newer JVMs?
>
>                 Thanks for any comments, feedback, pointers!
>                 -- ramki
>
>                 PS: for comparison, here's data with
>                 +TraceSafepointCleanupTime from JDK 7 (first, where this
>                 isn't done) vs JDK 8 (where this is done) with a program
>                 that has a few thousand threads:
>
>                 JDK 7:
>                 ..
>                 2827.308: [sweeping nmethods, 0.0000020 secs]
>                 2828.679: [sweeping nmethods, 0.0000030 secs]
>                 2829.984: [sweeping nmethods, 0.0000030 secs]
>                 2830.956: [sweeping nmethods, 0.0000030 secs]
>                 ..
>
>                 JDK 8:
>                 ..
>                 7368.634: [mark nmethods, 0.0177030 secs]
>                 7369.587: [mark nmethods, 0.0178305 secs]
>                 7370.479: [mark nmethods, 0.0180260 secs]
>                 7371.503: [mark nmethods, 0.0186494 secs]
>                 ..

From ysr1729 at gmail.com  Fri Jul 31 22:02:06 2015
From: ysr1729 at gmail.com (Srinivas Ramakrishna)
Date: Fri, 31 Jul 2015 15:02:06 -0700
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
In-Reply-To: <55BBE883.1080308@oracle.com>
References: <55BBC1C0.3030709@oracle.com> <55BBE883.1080308@oracle.com>
Message-ID:

OK, will do and add you as a watcher; thanks, Vladimir! (I don't yet know
whether, with tiered and a necessarily bounded, if large, code cache,
flushing will in fact eventually become necessary, w.r.t. your suggested
temporary workaround.)

Have a good weekend!
-- ramki

On Fri, Jul 31, 2015 at 2:28 PM, Vladimir Kozlov wrote:

> Got it. Yes, it is an issue with thousands of Java threads. You are the
> first to point out this problem. File a bug against the compiler; we
> will look at what we can do. Most likely we need to parallelize this
> work.
>
> A method's hotness is used only for UseCodeCacheFlushing. You can try
> to guard Threads::nmethods_do(&set_hotness_closure) with this flag and
> switch it off.
>
> We still need mark_as_seen_on_stack, so leave that in.
>
> Thanks,
> Vladimir
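
(For the archives: if I'm reading the JDK 8 sweeper.cpp right, the guard
suggested above would look roughly like the following -- an untested
sketch, not a reviewed patch; only the added UseCodeCacheFlushing test is
new, the surrounding lines are the existing code quoted earlier in this
thread:

  } else if (UseCodeCacheFlushing) {
    // Only walk the thread stacks to update nmethod hotness counters
    // when the sweeper can actually use them to flush cold methods;
    // running with -XX:-UseCodeCacheFlushing then skips this
    // single-threaded walk at safepoints.
    Threads::nmethods_do(&set_hotness_closure);
  }

combined with running with -XX:-UseCodeCacheFlushing until a parallel
version is available.)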

From ysr1729 at gmail.com  Fri Jul 31 22:04:28 2015
From: ysr1729 at gmail.com (Srinivas Ramakrishna)
Date: Fri, 31 Jul 2015 15:04:28 -0700
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
In-Reply-To: <20150731234445.0000459c.ecki@zusammenkunft.net>
References: <55BBC1C0.3030709@oracle.com> <20150731234445.0000459c.ecki@zusammenkunft.net>
Message-ID:

Hi Bernd --

It doesn't seem coupled to GC; here's an example snapshot:

sun.ci.codeCacheCapacity=13041664
sun.ci.codeCacheMaxCapacity=50331648
sun.ci.codeCacheMethodsReclaimedNum=4281
sun.ci.codeCacheSweepsTotalNum=409
sun.ci.codeCacheSweepsTotalTimeMillis=1541
sun.ci.codeCacheUsed=10864704

sun.gc.collector.0.invocations=6319
sun.gc.collector.1.invocations=6

BTW, to a question of Vitaly's on this thread earlier: this code executes
irrespective of whether you are running tiered or not (the above is from a
run with tiered explicitly off -- the earlier numbers were with tiered
on -- yet note the excessive time spent in the stack walk as part of this
code in JDK 8):

[mark nmethods, 0.0171828 secs]

On a similar note (and regarding Vladimir's earlier suggestion of turning
off code cache flushing), and with the understanding that there were lots
of changes related to tiered compilation and code cache flushing between
7 and 8: turning on tiered in 7 leads to the occasional (rare) long
safepoint for the stack walks, but otherwise not.

And finally, the same issue must exist in 9 as well, albeit based on code
inspection only; we are not running with JDK 9 yet.

Have a good weekend!
-- ramki

On Fri, Jul 31, 2015 at 2:44 PM, Bernd Eckenfels wrote:

> On Fri, 31 Jul 2015 12:07:18 -0700, Srinivas Ramakrishna wrote:
>
> > sun.ci.codeCacheSweepsTotalNum=58
> ...
> > Notice that the code cache usage is less than 35 MB, for the 240 MB
> > capacity, yet it seems we have had 58 sweeps already
>
> I would also be interested in what causes this. Is this caused by
> System.gc maybe? (We do see sweeps and decreasing code cache usage on
> systems where no pressure should exist.)
>
> Regards,
> Bernd

From vitalyd at gmail.com  Fri Jul 31 22:08:14 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Fri, 31 Jul 2015 18:08:14 -0400
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
In-Reply-To:
References: <55BBC1C0.3030709@oracle.com> <55BBE883.1080308@oracle.com>
Message-ID:

Ramki, are you actually seeing better peak perf with tiered than C2? I
experimented with it on a real workload and it was a net loss for peak
perf (anywhere from 8-20% worse than C2, but also quite unstable); this
was with a very large code cache to play it safe, but no other tuning.
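
For concreteness, the comparison was essentially just flipping the tiered
flag with an oversized code cache, everything else at defaults (the exact
size below is from memory, so treat it as illustrative rather than what
we ran):

  java -XX:+TieredCompilation -XX:ReservedCodeCacheSize=512m ...
  java -XX:-TieredCompilation -XX:ReservedCodeCacheSize=512m ...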
sent from my phone

On Jul 31, 2015 6:02 PM, "Srinivas Ramakrishna" wrote:

> OK, will do and add you as a watcher; thanks, Vladimir! (I don't yet
> know whether, with tiered and a necessarily bounded, if large, code
> cache, flushing will in fact eventually become necessary, w.r.t. your
> suggested temporary workaround.)
>
> Have a good weekend!
> -- ramki

From vitalyd at gmail.com  Fri Jul 31 22:10:51 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Fri, 31 Jul 2015 18:10:51 -0400
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
In-Reply-To:
References: <55BBC1C0.3030709@oracle.com> <20150731234445.0000459c.ecki@zusammenkunft.net>
Message-ID:

Yes, I misremembered -- I was thinking that turning off tiered disabled
the sweeps at safepoints; sorry for the noise.
sent from my phone

On Jul 31, 2015 6:04 PM, "Srinivas Ramakrishna" wrote:

> Hi Bernd --
>
> It doesn't seem coupled to GC; here's an example snapshot:
> ...

From ecki at zusammenkunft.net  Fri Jul 31 21:44:45 2015
From: ecki at zusammenkunft.net (Bernd Eckenfels)
Date: Fri, 31 Jul 2015 23:44:45 +0200
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
In-Reply-To:
References: <55BBC1C0.3030709@oracle.com>
Message-ID: <20150731234445.0000459c.ecki@zusammenkunft.net>

On Fri, 31 Jul 2015 12:07:18 -0700, Srinivas Ramakrishna wrote:

> sun.ci.codeCacheSweepsTotalNum=58
...
> Notice that the code cache usage is less than 35 MB, for the 240 MB
> capacity, yet it seems we have had 58 sweeps already

I would also be interested in what causes this. Is this caused by
System.gc maybe? (We do see sweeps and decreasing code cache usage on
systems where no pressure should exist.)

Regards,
Bernd
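
P.S. For anyone who wants to pull the same numbers: the sun.ci.* values
quoted above are ordinary jvmstat perf counters, so with UsePerfData on
(the default) something like the following should print them for a
running VM. This is just a sketch -- <pid> stands for the target JVM's
process id:

  jcmd <pid> PerfCounter.print | grep sun.ci.codeCache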