From david.holmes at oracle.com Thu May 4 01:13:25 2017 From: david.holmes at oracle.com (David Holmes) Date: Thu, 4 May 2017 11:13:25 +1000 Subject: jdk10/jdk10 is broken on 32-bit linux Message-ID: <1810b41c-28cd-1c00-4e07-939286428348@oracle.com> With the latest changes in jdk10/jdk10 we are seeing failures of all jtreg agentvm mode tests on 32-bit linux binaries due to socket connection failures. And also some OSX failures. This seems to be have been caused by: 8165437: Evaluate the use of gettimeofday in Networking code http://hg.openjdk.java.net/jdk10/jdk10/jdk/rev/7cdde79d6a46 due to a truncation issue through using long instead of jlong. I've notified net-dev and will file a P1 bug to either have this fixed or backed out. David ----- From john_platts at hotmail.com Fri May 5 03:04:04 2017 From: john_platts at hotmail.com (John Platts) Date: Fri, 5 May 2017 03:04:04 +0000 Subject: Add support for Unicode versions of JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs on Windows platforms Message-ID: The JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs methods in the JNI invocation API expect ANSI strings on Windows platforms instead of Unicode-encoded strings. This is an issue on Windows-based platforms since some of the option strings that are passed into JNI_CreateJavaVM might contain Unicode characters that are not in the ANSI encoding on Windows platforms. There is support for UTF-16 literals on Windows platforms with wchar_t and wide character literals prefixed with the L prefix, and on platforms that support C11 and C++11 with char16_t and UTF-16 character literals that are prefixed with the u prefix. jchar is currently defined to be a typedef for unsigned short on all platforms, but char16_t is a separate type and not a typedef for unsigned short or jchar in C++11 and later. jchar should be changed to be a typedef for wchar_t on Windows platforms and to be a typedef for char16_t on non-Windows platforms that support the char16_t type. This change will make it possible to define jchar character and string literals on Windows platforms and on non-Windows platforms that support the C11 or C++11 standard. The JCHAR_LITERAL macro should be added to the JNI header and defined as follows on Windows: #define JCHAR_LITERAL(x) L ## x The JCHAR_LITERAL macro should be added to the JNI header and defined as follows on non-Windows platforms: #define JCHAR_LITERAL(x) u ## x Here is how the Unicode version of JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs could be defined: typedef struct JavaVMUnicodeOption { const jchar *optionString; /* the option as a string in UTF-16 encoding */ void *extraInfo; } JavaVMUnicodeOption; typedef struct JavaVMUnicodeInitArgs { jint version; jint nOptions; JavaVMUnicodeOption *options; jboolean ignoreUnrecognized; } JavaVMUnicodeInitArgs; jint JNI_CreateJavaVMUnicode(JavaVM **pvm, void **penv, void *args); jint JNI_GetDefaultJavaVMInitArgs(void *args); The java.exe wrapper should use wmain instead of main on Windows platforms, and the javaw.exe wrapper should use wWinMain instead of WinMain on Windows platforms. This change, along with the support for Unicode-enabled version of the JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs methods, would allow the JVM to be launched with arguments that contain Unicode characters that are not in the platform-default encoding. All of the Windows platforms that Java SE 10 and later VMs would be supported on do support Unicode. 
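To make the proposal concrete, here is a rough sketch of what a launcher built against these declarations might look like. JNI_CreateJavaVMUnicode, JavaVMUnicodeOption, JavaVMUnicodeInitArgs and JCHAR_LITERAL are the hypothetical additions proposed above, not existing jni.h exports, and the class-path value is only an example of an option that cannot survive a round trip through the ANSI code page:

#include <jni.h>

int wmain(int argc, wchar_t *argv[]) {
    JavaVM *vm;
    void *env;

    JavaVMUnicodeOption options[1];
    /* example option containing a character outside the ANSI code page */
    options[0].optionString = JCHAR_LITERAL("-Djava.class.path=C:\\\u00DCbungen\\app.jar");
    options[0].extraInfo = NULL;

    JavaVMUnicodeInitArgs vm_args;
    vm_args.version = JNI_VERSION_1_8;
    vm_args.nOptions = 1;
    vm_args.options = options;
    vm_args.ignoreUnrecognized = JNI_FALSE;

    if (JNI_CreateJavaVMUnicode(&vm, &env, &vm_args) != JNI_OK) {
        return 1;
    }
    /* ... find and invoke the main class through (JNIEnv *) env ... */
    vm->DestroyJavaVM();
    return 0;
}

Because wmain already delivers argv as UTF-16, no lossy conversion through the platform code page is needed at any point.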
Adding support for Unicode versions of JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs will allow Unicode characters that are not in the platform-default encoding on Windows platforms to be supported in command-line arguments that are passed to the JVM. From david.holmes at oracle.com Fri May 5 04:07:22 2017 From: david.holmes at oracle.com (David Holmes) Date: Fri, 5 May 2017 14:07:22 +1000 Subject: Add support for Unicode versions of JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs on Windows platforms In-Reply-To: References: Message-ID: Hi John, The JNI is defined to use Modified UTF-8 format for strings, so any Unicode character should be handled if passed in in the right format. Updating the JNI specification and implementation to accept UTF-16 directly would be a major undertaking. Is the issue here that you want a tool, like the java launcher, to accept arbitrary Unicode strings in a end-user friendly manner and then have it perform the modified UTF-8 conversion when invoking the VM? Can you give a concrete example of what you would like to be able to pass as arguments to the JVM? Thanks, David On 5/05/2017 1:04 PM, John Platts wrote: > The JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs methods in the JNI invocation API expect ANSI strings on Windows platforms instead of Unicode-encoded strings. This is an issue on Windows-based platforms since some of the option strings that are passed into JNI_CreateJavaVM might contain Unicode characters that are not in the ANSI encoding on Windows platforms. > > > There is support for UTF-16 literals on Windows platforms with wchar_t and wide character literals prefixed with the L prefix, and on platforms that support C11 and C++11 with char16_t and UTF-16 character literals that are prefixed with the u prefix. > > > jchar is currently defined to be a typedef for unsigned short on all platforms, but char16_t is a separate type and not a typedef for unsigned short or jchar in C++11 and later. jchar should be changed to be a typedef for wchar_t on Windows platforms and to be a typedef for char16_t on non-Windows platforms that support the char16_t type. This change will make it possible to define jchar character and string literals on Windows platforms and on non-Windows platforms that support the C11 or C++11 standard. > > > The JCHAR_LITERAL macro should be added to the JNI header and defined as follows on Windows: > > #define JCHAR_LITERAL(x) L ## x > > > The JCHAR_LITERAL macro should be added to the JNI header and defined as follows on non-Windows platforms: > > #define JCHAR_LITERAL(x) u ## x > > > Here is how the Unicode version of JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs could be defined: > > typedef struct JavaVMUnicodeOption { > const jchar *optionString; /* the option as a string in UTF-16 encoding */ > void *extraInfo; > } JavaVMUnicodeOption; > > typedef struct JavaVMUnicodeInitArgs { > jint version; > jint nOptions; > JavaVMUnicodeOption *options; > jboolean ignoreUnrecognized; > } JavaVMUnicodeInitArgs; > > jint JNI_CreateJavaVMUnicode(JavaVM **pvm, void **penv, void *args); > jint JNI_GetDefaultJavaVMInitArgs(void *args); > > The java.exe wrapper should use wmain instead of main on Windows platforms, and the javaw.exe wrapper should use wWinMain instead of WinMain on Windows platforms. 
This change, along with the support for Unicode-enabled version of the JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs methods, would allow the JVM to be launched with arguments that contain Unicode characters that are not in the platform-default encoding. > > All of the Windows platforms that Java SE 10 and later VMs would be supported on do support Unicode. Adding support for Unicode versions of JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs will allow Unicode characters that are not in the platform-default encoding on Windows platforms to be supported in command-line arguments that are passed to the JVM. > From aph at redhat.com Fri May 5 17:10:17 2017 From: aph at redhat.com (Andrew Haley) Date: Fri, 5 May 2017 18:10:17 +0100 Subject: RFR: 8179701: AArch64: Reinstate FP as an allocatable register Message-ID: <4b03ad5f-3b42-9111-a77b-7cfc6b340f1c@redhat.com> http://cr.openjdk.java.net/~aph/8179701/ OK? Andrew. From joe.darcy at oracle.com Fri May 5 22:35:57 2017 From: joe.darcy at oracle.com (joe darcy) Date: Fri, 5 May 2017 15:35:57 -0700 Subject: Coming soon: CSR review for JDK 10 API and interface changes Message-ID: Hello, As has been in the works recently [1], the "Compatibility & Specification Review" process (CSR) is coming to JDK 10 soon. The CSR process is a replacement for the long-running JDK-internal CCC process. A sampling of JDK 9 CCC requests have been screened and imported to a temporary CCC project in JBS: https://bugs.openjdk.java.net/issues/?jql=project%20%3D%20ccc More detail on the imported issues can be found in the CSR discussion list [2] and the CSR wiki page discusses motivation for the process along with other supporting information. [3] Please look over the imported issues to get sense for what the CSR is looking for. Assuming no show-stopper issues are found with the CSR issue type, I'd like to start using the CSR process to review JDK 10 API and other interfaces changes shortly after May 12, 2017. Thanks, -Joe [1] http://mail.openjdk.java.net/pipermail/gb-discuss/2017-January/000320.html [2] http://mail.openjdk.java.net/pipermail/csr-discuss/2017-May/000025.html [3] https://wiki.openjdk.java.net/display/csr/Main From david.holmes at oracle.com Mon May 8 00:47:08 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 8 May 2017 10:47:08 +1000 Subject: Add support for Unicode versions of JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs on Windows platforms In-Reply-To: References: Message-ID: Added back jdk10-dev as a bcc. Added hotspot-dev and core-libs-dev (for launcher) for follow up discussions. Hi John, On 8/05/2017 10:33 AM, John Platts wrote: > I actually did a search through the code that implements > JNI_CreateJavaVM, and I found that the conversion of the strings is done > using java_lang_String::create_from_platform_dependent_str, which > converts from the platform-default encoding to Unicode. In the case of > Windows-based platforms, the conversion is done based on the ANSI > character encoding instead of UTF-8 or Modified UTF-8. > > > The platform encoding detection logic on Windows is implemented > java_props_md.c, which can be found at > jdk/src/windows/native/java/lang/java_props_md.c in releases prior to > JDK 9 and at src/java.base/windows/native/libjava/java_props_md.c in JDK > 9 and later. The encoding used for command-line arguments passed into > the JNI invocation API is Cp1252 for English locales on Windows > platforms, and not Modified UTF-8 or UTF-8. 
> > > The documentation found > at http://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/invocation.html also > states that the strings passed into JNI_CreateJavaVM are in the > platform-default encoding. Thanks for the additional details. I assume you are referring to: typedef struct JavaVMOption { char *optionString; /* the option as a string in the default platform encoding */ that comment should not form part of the specification as it is non-normative text. If the intent is truly to use the platform default encoding and not UTF-8 then that should be very clearly spelt out in the spec! That said, the implementation is following this so it is a limitation. I suspect this is historical. > A version of JNI_CreateJavaVM that takes UTF-16-encoded strings should > be added to the JNI Invocation API. The java.exe launchers and javaw.exe > launchers should also be updated to use the UTF-16 version of the > JNI_CreateJavaVM function on Windows platforms and to use wmain and > wWinMain instead of main and WinMain. Why versions for UTF-16 instead of the missing UTF-8 variants? As I said the whole spec is intended to be based around UTF-8 so we would not want to throw in just a couple of UTF-16 based usages. Thanks, David > > A few files in HotSpot would need to be changed in order to implement > the UTF-16 version of JNI_CreateJavaVM, but the change would improve > consistency across different locales on Windows platforms and allow > arguments that contain Unicode characters that are not available in the > platform-default encoding to be passed into the JVM on the command line. > > > The UTF-16-based version of JNI_CreateJavaVM also makes it easier to > allocate string objects that contain non-ASCII characters as the strings > are already in UTF-16 format, at least in cases where the strings > contain Unicode characters that are not in Latin-1 or on VMs that do not > support compact Latin-1 strings. > > > The UTF-16-based version of JNI_CreateJavaVM should probably be > implemented as a separate function so that the solution could be > backported to JDK 8 and JDK 9 updates and so that backwards > compatibility with the current JNI_CreateJavaVM implementation is > maintained. > > > Here is what the new UTF-16-based API might look like: > > typedef struct JavaVMInitArgs_UTF16 { > jint version; > jint nOptions; > JavaVMOptionUTF16 *options; > jboolean ignoreUnrecognized; > } JavaVMInitArgs; > > > typedef struct JavaVMOption_UTF16 { > char *optionString; /* the option as a string in the default > platform encoding */ > void *extraInfo; > } JavaVMOptionUTF16; > > /* vm_args is an pointer to a JavaVMInitArgs_UTF16 structure */ > > jint JNI_CreateJavaVM_UTF16(JavaVM **p_vm, void **p_env, void *vm_args); > > > /* vm_args is a pointer to a JavaVMInitArgs_UTF16 structure */ > > jint JNI_GetDefaultJavaVMInitArgs_UTF16(void *vm_args); > > ------------------------------------------------------------------------ > *From:* David Holmes > *Sent:* Thursday, May 4, 2017 11:07 PM > *To:* John Platts; jdk10-dev at openjdk.java.net > *Subject:* Re: Add support for Unicode versions of JNI_CreateJavaVM and > JNI_GetDefaultJavaVMInitArgs on Windows platforms > > Hi John, > > The JNI is defined to use Modified UTF-8 format for strings, so any > Unicode character should be handled if passed in in the right format. > Updating the JNI specification and implementation to accept UTF-16 > directly would be a major undertaking. 
> > Is the issue here that you want a tool, like the java launcher, to > accept arbitrary Unicode strings in a end-user friendly manner and then > have it perform the modified UTF-8 conversion when invoking the VM? > > Can you give a concrete example of what you would like to be able to > pass as arguments to the JVM? > > Thanks, > David > > On 5/05/2017 1:04 PM, John Platts wrote: >> The JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs methods in the JNI invocation API expect ANSI strings on Windows platforms instead of Unicode-encoded strings. This is an issue on Windows-based platforms since some of the option strings that are passed into JNI_CreateJavaVM might contain Unicode characters that are not in > the ANSI encoding on Windows platforms. >> >> >> There is support for UTF-16 literals on Windows platforms with wchar_t and wide character literals prefixed with the L prefix, and on platforms that support C11 and C++11 with char16_t and UTF-16 character literals that are prefixed with the u prefix. >> >> >> jchar is currently defined to be a typedef for unsigned short on all platforms, but char16_t is a separate type and not a typedef for unsigned short or jchar in C++11 and later. jchar should be changed to be a typedef for wchar_t on Windows platforms and to be a typedef for char16_t on non-Windows platforms that support the > char16_t type. This change will make it possible to define jchar > character and string literals on Windows platforms and on non-Windows > platforms that support the C11 or C++11 standard. >> >> >> The JCHAR_LITERAL macro should be added to the JNI header and defined as follows on Windows: >> >> #define JCHAR_LITERAL(x) L ## x >> >> >> The JCHAR_LITERAL macro should be added to the JNI header and defined as follows on non-Windows platforms: >> >> #define JCHAR_LITERAL(x) u ## x >> >> >> Here is how the Unicode version of JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs could be defined: >> >> typedef struct JavaVMUnicodeOption { >> const jchar *optionString; /* the option as a string in UTF-16 encoding */ >> void *extraInfo; >> } JavaVMUnicodeOption; >> >> typedef struct JavaVMUnicodeInitArgs { >> jint version; >> jint nOptions; >> JavaVMUnicodeOption *options; >> jboolean ignoreUnrecognized; >> } JavaVMUnicodeInitArgs; >> >> jint JNI_CreateJavaVMUnicode(JavaVM **pvm, void **penv, void *args); >> jint JNI_GetDefaultJavaVMInitArgs(void *args); >> >> The java.exe wrapper should use wmain instead of main on Windows platforms, and the javaw.exe wrapper should use wWinMain instead of WinMain on Windows platforms. This change, along with the support for Unicode-enabled version of the JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs methods, would allow the JVM to be > launched with arguments that contain Unicode characters that are not in > the platform-default encoding. >> >> All of the Windows platforms that Java SE 10 and later VMs would be supported on do support Unicode. Adding support for Unicode versions of JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs will allow Unicode characters that are not in the platform-default encoding on Windows platforms to be supported in command-line arguments > that are passed to the JVM. 
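As a reference point for the conversion being discussed (the launcher taking the UTF-16 argv that wmain already provides and producing char-based option strings), the Win32 side is small. The sketch below uses WideCharToMultiByte and produces standard UTF-8, which is not quite JNI's modified UTF-8: U+0000 and supplementary characters are encoded differently and would still need special handling. utf16_to_utf8 is only an illustrative helper name.

#include <windows.h>
#include <string>

/* Convert one UTF-16 argument to UTF-8. Assumes the argument has no
   embedded NUL; supplementary characters come out as plain 4-byte UTF-8
   rather than the surrogate-pair form used by modified UTF-8. */
static std::string utf16_to_utf8(const wchar_t *ws) {
    int n = WideCharToMultiByte(CP_UTF8, 0, ws, -1, NULL, 0, NULL, NULL);
    if (n <= 0) {
        return std::string();
    }
    std::string out(n, '\0');
    WideCharToMultiByte(CP_UTF8, 0, ws, -1, &out[0], n, NULL, NULL);
    out.resize(n - 1);   // drop the terminating NUL included in n
    return out;
}

Which char-based entry point then consumes the result, the modified UTF-8 form the JNI specification intends or a new UTF-8 variant, is exactly the design question raised above.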
>> From rwestrel at redhat.com Tue May 9 09:04:09 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 09 May 2017 11:04:09 +0200 Subject: RFR: 8179701: AArch64: Reinstate FP as an allocatable register In-Reply-To: <4b03ad5f-3b42-9111-a77b-7cfc6b340f1c@redhat.com> References: <4b03ad5f-3b42-9111-a77b-7cfc6b340f1c@redhat.com> Message-ID: > http://cr.openjdk.java.net/~aph/8179701/ > > OK? Looks good. Roland. From aph at redhat.com Tue May 9 17:20:20 2017 From: aph at redhat.com (Andrew Haley) Date: Tue, 9 May 2017 18:20:20 +0100 Subject: CFV: New JDK 10 Reviewer: Andrew Dinn Message-ID: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> [My apologies to everyone: apparently I have to ask about JDK 10 as well.] I hereby nominate Andrew Dinn (adinn) to JDK 10 Reviewer. Andrew co-wrote the original AArch64 port. He is the author of 15 patches in the HotSpot sources, but that does not reflect the true extent of his contribution because he is the author of 341 patches in the aarch64-port project which I merged into OpenJDK. His considerable expertise, particularly with the C2 compiler, will be of great value to the project. Votes are due by 23 May, 2017. Only current Open|JDK 10 Reviewers [1] are eligible to vote on this nomination. Votes must be cast in the open by replying to this mailing list. For Three-Vote Consensus voting instructions, see [2]. Andrew Haley. [1] http://openjdk.java.net/census [2] http://openjdk.java.net/projects/#reviewer-vote From claes.redestad at oracle.com Tue May 9 17:23:17 2017 From: claes.redestad at oracle.com (Claes Redestad) Date: Tue, 9 May 2017 19:23:17 +0200 Subject: CFV: New JDK 10 Reviewer: Andrew Dinn In-Reply-To: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> References: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> Message-ID: <0f9254e8-892c-c6ee-6982-278ce8d5d52f@oracle.com> Vote: yes On 2017-05-09 19:20, Andrew Haley wrote: > [My apologies to everyone: apparently I have to ask about JDK 10 as > well.] > > I hereby nominate Andrew Dinn (adinn) to JDK 10 Reviewer. > > Andrew co-wrote the original AArch64 port. He is the author of 15 > patches in the HotSpot sources, but that does not reflect the true > extent of his contribution because he is the author of 341 patches in > the aarch64-port project which I merged into OpenJDK. His > considerable expertise, particularly with the C2 compiler, will be of > great value to the project. > > Votes are due by 23 May, 2017. > > Only current Open|JDK 10 Reviewers [1] are eligible to vote > on this nomination. Votes must be cast in the open by replying > to this mailing list. > > For Three-Vote Consensus voting instructions, see [2]. > > Andrew Haley. > > > [1] http://openjdk.java.net/census > [2] http://openjdk.java.net/projects/#reviewer-vote From ashipile at redhat.com Tue May 9 17:27:10 2017 From: ashipile at redhat.com (Aleksey Shipilev) Date: Tue, 9 May 2017 19:27:10 +0200 Subject: CFV: New JDK 10 Reviewer: Andrew Dinn In-Reply-To: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> References: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> Message-ID: <92ac3209-652b-cf28-068f-24dbb0ce288d@redhat.com> Vote: yes -Aleksey On 05/09/2017 07:20 PM, Andrew Haley wrote: > [My apologies to everyone: apparently I have to ask about JDK 10 as > well.] > > I hereby nominate Andrew Dinn (adinn) to JDK 10 Reviewer. > > Andrew co-wrote the original AArch64 port. 
He is the author of 15 > patches in the HotSpot sources, but that does not reflect the true > extent of his contribution because he is the author of 341 patches in > the aarch64-port project which I merged into OpenJDK. His > considerable expertise, particularly with the C2 compiler, will be of > great value to the project. > > Votes are due by 23 May, 2017. > > Only current Open|JDK 10 Reviewers [1] are eligible to vote > on this nomination. Votes must be cast in the open by replying > to this mailing list. > > For Three-Vote Consensus voting instructions, see [2]. > > Andrew Haley. > > > [1] http://openjdk.java.net/census > [2] http://openjdk.java.net/projects/#reviewer-vote > From mandy.chung at oracle.com Tue May 9 17:27:48 2017 From: mandy.chung at oracle.com (Mandy Chung) Date: Tue, 9 May 2017 10:27:48 -0700 Subject: CFV: New JDK 10 Reviewer: Andrew Dinn In-Reply-To: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> References: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> Message-ID: Vote: yes Mandy From zgu at redhat.com Tue May 9 17:28:51 2017 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 9 May 2017 13:28:51 -0400 Subject: CFV: New JDK 10 Reviewer: Andrew Dinn In-Reply-To: <0f9254e8-892c-c6ee-6982-278ce8d5d52f@oracle.com> References: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> <0f9254e8-892c-c6ee-6982-278ce8d5d52f@oracle.com> Message-ID: <69635cc2-e441-41b8-9787-bee3a8ad42d8@redhat.com> Vote: yes -Zhengyu On 05/09/2017 01:23 PM, Claes Redestad wrote: > Vote: yes > > On 2017-05-09 19:20, Andrew Haley wrote: >> [My apologies to everyone: apparently I have to ask about JDK 10 as >> well.] >> >> I hereby nominate Andrew Dinn (adinn) to JDK 10 Reviewer. >> >> Andrew co-wrote the original AArch64 port. He is the author of 15 >> patches in the HotSpot sources, but that does not reflect the true >> extent of his contribution because he is the author of 341 patches in >> the aarch64-port project which I merged into OpenJDK. His >> considerable expertise, particularly with the C2 compiler, will be of >> great value to the project. >> >> Votes are due by 23 May, 2017. >> >> Only current Open|JDK 10 Reviewers [1] are eligible to vote >> on this nomination. Votes must be cast in the open by replying >> to this mailing list. >> >> For Three-Vote Consensus voting instructions, see [2]. >> >> Andrew Haley. >> >> >> [1] http://openjdk.java.net/census >> [2] http://openjdk.java.net/projects/#reviewer-vote > From coleen.phillimore at oracle.com Tue May 9 17:29:58 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Tue, 9 May 2017 13:29:58 -0400 Subject: CFV: New JDK 10 Reviewer: Andrew Dinn In-Reply-To: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> References: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> Message-ID: <9cfc2ff1-2cdb-4505-203a-8d49f7819c7c@oracle.com> Vote: yes On 5/9/17 1:20 PM, Andrew Haley wrote: > [My apologies to everyone: apparently I have to ask about JDK 10 as > well.] > > I hereby nominate Andrew Dinn (adinn) to JDK 10 Reviewer. > > Andrew co-wrote the original AArch64 port. He is the author of 15 > patches in the HotSpot sources, but that does not reflect the true > extent of his contribution because he is the author of 341 patches in > the aarch64-port project which I merged into OpenJDK. His > considerable expertise, particularly with the C2 compiler, will be of > great value to the project. > > Votes are due by 23 May, 2017. > > Only current Open|JDK 10 Reviewers [1] are eligible to vote > on this nomination. 
Votes must be cast in the open by replying > to this mailing list. > > For Three-Vote Consensus voting instructions, see [2]. > > Andrew Haley. > > > [1] http://openjdk.java.net/census > [2] http://openjdk.java.net/projects/#reviewer-vote From vladimir.kozlov at oracle.com Tue May 9 17:30:06 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 9 May 2017 10:30:06 -0700 Subject: CFV: New JDK 10 Reviewer: Andrew Dinn In-Reply-To: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> References: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> Message-ID: Vote: yes On 5/9/17 10:20 AM, Andrew Haley wrote: > [My apologies to everyone: apparently I have to ask about JDK 10 as > well.] > > I hereby nominate Andrew Dinn (adinn) to JDK 10 Reviewer. > > Andrew co-wrote the original AArch64 port. He is the author of 15 > patches in the HotSpot sources, but that does not reflect the true > extent of his contribution because he is the author of 341 patches in > the aarch64-port project which I merged into OpenJDK. His > considerable expertise, particularly with the C2 compiler, will be of > great value to the project. > > Votes are due by 23 May, 2017. > > Only current Open|JDK 10 Reviewers [1] are eligible to vote > on this nomination. Votes must be cast in the open by replying > to this mailing list. > > For Three-Vote Consensus voting instructions, see [2]. > > Andrew Haley. > > > [1] http://openjdk.java.net/census > [2] http://openjdk.java.net/projects/#reviewer-vote > From tobias.hartmann at oracle.com Tue May 9 17:30:48 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 9 May 2017 19:30:48 +0200 Subject: CFV: New JDK 10 Reviewer: Andrew Dinn In-Reply-To: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> References: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> Message-ID: <86f2d2f8-5be1-7e81-472c-03b1a36a9680@oracle.com> Vote: yes Best regards, Tobias On 09.05.2017 19:20, Andrew Haley wrote: > [My apologies to everyone: apparently I have to ask about JDK 10 as > well.] > > I hereby nominate Andrew Dinn (adinn) to JDK 10 Reviewer. > > Andrew co-wrote the original AArch64 port. He is the author of 15 > patches in the HotSpot sources, but that does not reflect the true > extent of his contribution because he is the author of 341 patches in > the aarch64-port project which I merged into OpenJDK. His > considerable expertise, particularly with the C2 compiler, will be of > great value to the project. > > Votes are due by 23 May, 2017. > > Only current Open|JDK 10 Reviewers [1] are eligible to vote > on this nomination. Votes must be cast in the open by replying > to this mailing list. > > For Three-Vote Consensus voting instructions, see [2]. > > Andrew Haley. > > > [1] http://openjdk.java.net/census > [2] http://openjdk.java.net/projects/#reviewer-vote > From paul.sandoz at oracle.com Tue May 9 17:33:46 2017 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 9 May 2017 10:33:46 -0700 Subject: CFV: New JDK 10 Reviewer: Andrew Dinn In-Reply-To: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> References: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> Message-ID: <7A49A671-A2A6-497E-8C16-0F81C1B206BC@oracle.com> Vote: yes Paul. 
From Roger.Riggs at Oracle.com Tue May 9 17:55:18 2017 From: Roger.Riggs at Oracle.com (Roger Riggs) Date: Tue, 9 May 2017 13:55:18 -0400 Subject: CFV: New JDK 10 Reviewer: Andrew Dinn In-Reply-To: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> References: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> Message-ID: Vote: yes On 5/9/2017 1:20 PM, Andrew Haley wrote: > I hereby nominate Andrew Dinn (adinn) to JDK 10 Reviewer. From daniel.fuchs at oracle.com Tue May 9 18:27:00 2017 From: daniel.fuchs at oracle.com (Daniel Fuchs) Date: Tue, 9 May 2017 19:27:00 +0100 Subject: CFV: New JDK 10 Reviewer: Andrew Dinn In-Reply-To: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> References: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> Message-ID: <4ec6070d-1b2f-2ea6-f74e-2d788ddf0a68@oracle.com> Vote: yes -- daniel On 09/05/17 18:20, Andrew Haley wrote: > [My apologies to everyone: apparently I have to ask about JDK 10 as > well.] > > I hereby nominate Andrew Dinn (adinn) to JDK 10 Reviewer. From david.holmes at oracle.com Tue May 9 21:21:11 2017 From: david.holmes at oracle.com (David Holmes) Date: Wed, 10 May 2017 07:21:11 +1000 Subject: CFV: New JDK 10 Reviewer: Andrew Dinn In-Reply-To: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> References: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> Message-ID: Vote: yes David On 10/05/2017 3:20 AM, Andrew Haley wrote: > [My apologies to everyone: apparently I have to ask about JDK 10 as > well.] > > I hereby nominate Andrew Dinn (adinn) to JDK 10 Reviewer. > > Andrew co-wrote the original AArch64 port. He is the author of 15 > patches in the HotSpot sources, but that does not reflect the true > extent of his contribution because he is the author of 341 patches in > the aarch64-port project which I merged into OpenJDK. His > considerable expertise, particularly with the C2 compiler, will be of > great value to the project. > > Votes are due by 23 May, 2017. > > Only current Open|JDK 10 Reviewers [1] are eligible to vote > on this nomination. Votes must be cast in the open by replying > to this mailing list. > > For Three-Vote Consensus voting instructions, see [2]. > > Andrew Haley. > > > [1] http://openjdk.java.net/census > [2] http://openjdk.java.net/projects/#reviewer-vote > From serguei.spitsyn at oracle.com Tue May 9 22:28:59 2017 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 9 May 2017 15:28:59 -0700 Subject: CFV: New JDK 10 Reviewer: Andrew Dinn In-Reply-To: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> References: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> Message-ID: Vote: yes From kim.barrett at oracle.com Wed May 10 07:21:37 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 10 May 2017 03:21:37 -0400 Subject: CFV: New JDK 10 Reviewer: Andrew Dinn In-Reply-To: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> References: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> Message-ID: <2BB55CAD-830B-4EC8-90B3-B36E75AE90E9@oracle.com> vote: yes > On May 9, 2017, at 1:20 PM, Andrew Haley wrote: > > [My apologies to everyone: apparently I have to ask about JDK 10 as > well.] > > I hereby nominate Andrew Dinn (adinn) to JDK 10 Reviewer. > > Andrew co-wrote the original AArch64 port. He is the author of 15 > patches in the HotSpot sources, but that does not reflect the true > extent of his contribution because he is the author of 341 patches in > the aarch64-port project which I merged into OpenJDK. 
His > considerable expertise, particularly with the C2 compiler, will be of > great value to the project. > > Votes are due by 23 May, 2017. > > Only current Open|JDK 10 Reviewers [1] are eligible to vote > on this nomination. Votes must be cast in the open by replying > to this mailing list. > > For Three-Vote Consensus voting instructions, see [2]. > > Andrew Haley. > > > [1] http://openjdk.java.net/census > [2] http://openjdk.java.net/projects/#reviewer-vote From thomas.stuefe at gmail.com Wed May 10 08:57:11 2017 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Wed, 10 May 2017 10:57:11 +0200 Subject: CFV: New JDK 10 Reviewer: Andrew Dinn In-Reply-To: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> References: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> Message-ID: Vote: yes On Tue, May 9, 2017 at 7:20 PM, Andrew Haley wrote: > [My apologies to everyone: apparently I have to ask about JDK 10 as > well.] > > I hereby nominate Andrew Dinn (adinn) to JDK 10 Reviewer. > > Andrew co-wrote the original AArch64 port. He is the author of 15 > patches in the HotSpot sources, but that does not reflect the true > extent of his contribution because he is the author of 341 patches in > the aarch64-port project which I merged into OpenJDK. His > considerable expertise, particularly with the C2 compiler, will be of > great value to the project. > > Votes are due by 23 May, 2017. > > Only current Open|JDK 10 Reviewers [1] are eligible to vote > on this nomination. Votes must be cast in the open by replying > to this mailing list. > > For Three-Vote Consensus voting instructions, see [2]. > > Andrew Haley. > > > [1] http://openjdk.java.net/census > [2] http://openjdk.java.net/projects/#reviewer-vote > From omajid at redhat.com Wed May 10 14:45:04 2017 From: omajid at redhat.com (Omair Majid) Date: Wed, 10 May 2017 10:45:04 -0400 Subject: CFV: New JDK 10 Reviewer: Andrew Dinn In-Reply-To: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> References: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> Message-ID: <20170510144504.GC10767@redhat.com> Vote: Yes * Andrew Haley [2017-05-09 13:21]: > I hereby nominate Andrew Dinn (adinn) to JDK 10 Reviewer. Thanks, Omair -- PGP Key: 66484681 (http://pgp.mit.edu/) Fingerprint = F072 555B 0A17 3957 4E95 0056 F286 F14F 6648 4681 From vladimir.x.ivanov at oracle.com Wed May 10 15:06:40 2017 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 10 May 2017 18:06:40 +0300 Subject: CFV: New JDK 10 Reviewer: Andrew Dinn In-Reply-To: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> References: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> Message-ID: <8025768d-d24d-22bc-a5e3-cd1bcbad380c@oracle.com> Vote: yes Best regards, Vladimir Ivanov On 5/9/17 8:20 PM, Andrew Haley wrote: > I hereby nominate Andrew Dinn (adinn) to JDK 10 Reviewer. From rwestrel at redhat.com Thu May 11 07:07:24 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 11 May 2017 09:07:24 +0200 Subject: CFV: New JDK 10 Reviewer: Andrew Dinn In-Reply-To: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> References: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> Message-ID: Vote: yes Roland. 
From peter.levart at gmail.com Thu May 11 21:13:16 2017 From: peter.levart at gmail.com (Peter Levart) Date: Thu, 11 May 2017 23:13:16 +0200 Subject: CFV: New JDK 10 Reviewer: Andrew Dinn In-Reply-To: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> References: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> Message-ID: Vote: yes Regards, Peter On 05/09/2017 07:20 PM, Andrew Haley wrote: > [My apologies to everyone: apparently I have to ask about JDK 10 as > well.] > > I hereby nominate Andrew Dinn (adinn) to JDK 10 Reviewer. > > Andrew co-wrote the original AArch64 port. He is the author of 15 > patches in the HotSpot sources, but that does not reflect the true > extent of his contribution because he is the author of 341 patches in > the aarch64-port project which I merged into OpenJDK. His > considerable expertise, particularly with the C2 compiler, will be of > great value to the project. > > Votes are due by 23 May, 2017. > > Only current Open|JDK 10 Reviewers [1] are eligible to vote > on this nomination. Votes must be cast in the open by replying > to this mailing list. > > For Three-Vote Consensus voting instructions, see [2]. > > Andrew Haley. > > > [1] http://openjdk.java.net/census > [2] http://openjdk.java.net/projects/#reviewer-vote From christian.tornqvist at oracle.com Fri May 12 16:47:04 2017 From: christian.tornqvist at oracle.com (Christian Tornqvist) Date: Fri, 12 May 2017 09:47:04 -0700 Subject: RFR(XS): 8180304 - Add tests to ProblemList that fails on Windows when running with subst or different drive than source code is on. Message-ID: <32E07ACF-9025-4978-AEC3-0BCDA12EE921@oracle.com> Hi everyone, Please review this small change that adds a number of JDK and Langtools tests to ProblemList. They all fail on Windows when running with a jtreg workdir that is either on a drive that has been created using subst or when the source code for the tests are on a different drive than the workdir. Webrev: http://cr.openjdk.java.net/~ctornqvi/webrev/8180304/ Thanks, Christian From george.triantafillou at oracle.com Fri May 12 16:49:34 2017 From: george.triantafillou at oracle.com (George Triantafillou) Date: Fri, 12 May 2017 12:49:34 -0400 Subject: RFR(XS): 8180304 - Add tests to ProblemList that fails on Windows when running with subst or different drive than source code is on. In-Reply-To: <32E07ACF-9025-4978-AEC3-0BCDA12EE921@oracle.com> References: <32E07ACF-9025-4978-AEC3-0BCDA12EE921@oracle.com> Message-ID: Hi Christian, Looks good. -George On 5/12/2017 12:47 PM, Christian Tornqvist wrote: > Hi everyone, > > Please review this small change that adds a number of JDK and Langtools tests to ProblemList. They all fail on Windows when running with a jtreg workdir that is either on a drive that has been created using subst or when the source code for the tests are on a different drive than the workdir. > > Webrev: http://cr.openjdk.java.net/~ctornqvi/webrev/8180304/ > > Thanks, > Christian From kumar.x.srinivasan at oracle.com Fri May 12 18:54:54 2017 From: kumar.x.srinivasan at oracle.com (Kumar Srinivasan) Date: Fri, 12 May 2017 11:54:54 -0700 Subject: RFR(XS): 8180304 - Add tests to ProblemList that fails on Windows when running with subst or different drive than source code is on. In-Reply-To: <32E07ACF-9025-4978-AEC3-0BCDA12EE921@oracle.com> References: <32E07ACF-9025-4978-AEC3-0BCDA12EE921@oracle.com> Message-ID: <591604FE.4040304@oracle.com> Looks good. Kumar > Hi everyone, > > Please review this small change that adds a number of JDK and Langtools tests to ProblemList. 
They all fail on Windows when running with a jtreg workdir that is either on a drive that has been created using subst or when the source code for the tests are on a different drive than the workdir. > > Webrev: http://cr.openjdk.java.net/~ctornqvi/webrev/8180304/ > > Thanks, > Christian From chihiro.ito at oracle.com Thu May 18 06:22:09 2017 From: chihiro.ito at oracle.com (chihiro ito) Date: Thu, 18 May 2017 15:22:09 +0900 Subject: RFR: Apply UL to PrintCodeCacheOnCompilation Message-ID: <591D3D91.9050901@oracle.com> Hi all, I apply Unified JVM Logging to log of PrintCodeCacheOnCompilation option. Logs which applied this is following. Could you possibly review for this following small change? If review is ok, please commit this as cito. Sample Log: [1.370s][debug][compilation,codecache] CodeHeap 'non-profiled nmethods': size=120036Kb used=13Kb max_used=13Kb free=120022Kb [1.372s][debug][compilation,codecache] CodeHeap 'profiled nmethods': size=120032Kb used=85Kb max_used=85Kb free=119946Kb [1.372s][debug][compilation,codecache] CodeHeap 'non-nmethods': size=5692Kb used=2648Kb max_used=2655Kb free=3043Kb Source: diff --git a/src/share/vm/compiler/compileBroker.cpp b/src/share/vm/compiler/compileBroker.cpp --- a/src/share/vm/compiler/compileBroker.cpp +++ b/src/share/vm/compiler/compileBroker.cpp @@ -1726,6 +1726,34 @@ tty->print("%s", s.as_string()); } +// wrapper for CodeCache::print_summary() using outputStream +static void codecache_print(outputStream* out, bool detailed) +{ + ResourceMark rm; + stringStream s; + + // Dump code cache into a buffer + { + MutexLockerEx mu(CodeCache_lock, Mutex::_no_safepoint_check_flag); + CodeCache::print_summary(&s, detailed); + } + + char* summary = s.as_string(); + char* cr_pos; + + do { + cr_pos = strchr(summary, '\n'); + if (cr_pos != NULL) { + *cr_pos = '\0'; + } + if (strlen(summary)!=0) { + out->print_cr("%s", summary); + } + + summary = cr_pos + 1; + } while (cr_pos != NULL); +} + void CompileBroker::post_compile(CompilerThread* thread, CompileTask* task, EventCompilation& event, bool success, ciEnv* ci_env) { if (success) { @@ -1939,6 +1967,10 @@ tty->print_cr("time: %d inlined: %d bytes", (int)time.milliseconds(), task->num_inlined_bytecodes()); } + Log(compilation, codecache) log; + if (log.is_debug()) + codecache_print(log.debug_stream(), /* detailed= */ false); + if (PrintCodeCacheOnCompilation) codecache_print(/* detailed= */ false); Regards, Chihiro -- Chihiro Ito | Principal Consultant | +81.90.6148.8815 Oracle Consultant ORACLE Japan | Akasaka Center Bldg. | Motoakasaka 1-3-13 | 1070051 Minato-ku, Tokyo, JAPAN Oracle is committed to developing practices and products that help protect the environment From david.holmes at oracle.com Thu May 18 06:40:22 2017 From: david.holmes at oracle.com (David Holmes) Date: Thu, 18 May 2017 16:40:22 +1000 Subject: RFR: Apply UL to PrintCodeCacheOnCompilation In-Reply-To: <591D3D91.9050901@oracle.com> References: <591D3D91.9050901@oracle.com> Message-ID: <27b3260b-9ba9-134d-76cf-83e7212d1ca0@oracle.com> Hi Chihiro, Reviews take place on the mailing list for the area the code change relates to - in this case it looks like hotspot-compiler-dev at opnejdk.java.net. Please send your RFR over there. Thanks, David On 18/05/2017 4:22 PM, chihiro ito wrote: > Hi all, > > I apply Unified JVM Logging to log of PrintCodeCacheOnCompilation > option. Logs which applied this is following. > Could you possibly review for this following small change? If review is > ok, please commit this as cito. 
> > Sample Log: > [1.370s][debug][compilation,codecache] CodeHeap 'non-profiled nmethods': > size=120036Kb used=13Kb max_used=13Kb free=120022Kb > [1.372s][debug][compilation,codecache] CodeHeap 'profiled nmethods': > size=120032Kb used=85Kb max_used=85Kb free=119946Kb > [1.372s][debug][compilation,codecache] CodeHeap 'non-nmethods': > size=5692Kb used=2648Kb max_used=2655Kb free=3043Kb > > Source: > diff --git a/src/share/vm/compiler/compileBroker.cpp > b/src/share/vm/compiler/compileBroker.cpp > --- a/src/share/vm/compiler/compileBroker.cpp > +++ b/src/share/vm/compiler/compileBroker.cpp > @@ -1726,6 +1726,34 @@ > tty->print("%s", s.as_string()); > } > > +// wrapper for CodeCache::print_summary() using outputStream > +static void codecache_print(outputStream* out, bool detailed) > +{ > + ResourceMark rm; > + stringStream s; > + > + // Dump code cache into a buffer > + { > + MutexLockerEx mu(CodeCache_lock, Mutex::_no_safepoint_check_flag); > + CodeCache::print_summary(&s, detailed); > + } > + > + char* summary = s.as_string(); > + char* cr_pos; > + > + do { > + cr_pos = strchr(summary, '\n'); > + if (cr_pos != NULL) { > + *cr_pos = '\0'; > + } > + if (strlen(summary)!=0) { > + out->print_cr("%s", summary); > + } > + > + summary = cr_pos + 1; > + } while (cr_pos != NULL); > +} > + > void CompileBroker::post_compile(CompilerThread* thread, CompileTask* > task, EventCompilation& event, bool success, ciEnv* ci_env) { > > if (success) { > @@ -1939,6 +1967,10 @@ > tty->print_cr("time: %d inlined: %d bytes", > (int)time.milliseconds(), task->num_inlined_bytecodes()); > } > > + Log(compilation, codecache) log; > + if (log.is_debug()) > + codecache_print(log.debug_stream(), /* detailed= */ false); > + > if (PrintCodeCacheOnCompilation) > codecache_print(/* detailed= */ false); > > > Regards, > Chihiro > > From chihiro.ito at oracle.com Thu May 18 12:53:56 2017 From: chihiro.ito at oracle.com (chihiro ito) Date: Thu, 18 May 2017 21:53:56 +0900 Subject: RFR: Apply UL to PrintCodeCacheOnCompilation In-Reply-To: <27b3260b-9ba9-134d-76cf-83e7212d1ca0@oracle.com> References: <591D3D91.9050901@oracle.com> <27b3260b-9ba9-134d-76cf-83e7212d1ca0@oracle.com> Message-ID: <591D9964.40208@oracle.com> Hi David Thank you for your reply. I try to send to hotspot-compiler-dev. Regards, Chihiro On 2017/05/18 15:40, David Holmes wrote: > Hi Chihiro, > > Reviews take place on the mailing list for the area the code change > relates to - in this case it looks like > hotspot-compiler-dev at opnejdk.java.net. Please send your RFR over there. > > Thanks, > David > > On 18/05/2017 4:22 PM, chihiro ito wrote: >> Hi all, >> >> I apply Unified JVM Logging to log of PrintCodeCacheOnCompilation >> option. Logs which applied this is following. >> Could you possibly review for this following small change? If review is >> ok, please commit this as cito. 
>> >> Sample Log: >> [1.370s][debug][compilation,codecache] CodeHeap 'non-profiled nmethods': >> size=120036Kb used=13Kb max_used=13Kb free=120022Kb >> [1.372s][debug][compilation,codecache] CodeHeap 'profiled nmethods': >> size=120032Kb used=85Kb max_used=85Kb free=119946Kb >> [1.372s][debug][compilation,codecache] CodeHeap 'non-nmethods': >> size=5692Kb used=2648Kb max_used=2655Kb free=3043Kb >> >> Source: >> diff --git a/src/share/vm/compiler/compileBroker.cpp >> b/src/share/vm/compiler/compileBroker.cpp >> --- a/src/share/vm/compiler/compileBroker.cpp >> +++ b/src/share/vm/compiler/compileBroker.cpp >> @@ -1726,6 +1726,34 @@ >> tty->print("%s", s.as_string()); >> } >> >> +// wrapper for CodeCache::print_summary() using outputStream >> +static void codecache_print(outputStream* out, bool detailed) >> +{ >> + ResourceMark rm; >> + stringStream s; >> + >> + // Dump code cache into a buffer >> + { >> + MutexLockerEx mu(CodeCache_lock, Mutex::_no_safepoint_check_flag); >> + CodeCache::print_summary(&s, detailed); >> + } >> + >> + char* summary = s.as_string(); >> + char* cr_pos; >> + >> + do { >> + cr_pos = strchr(summary, '\n'); >> + if (cr_pos != NULL) { >> + *cr_pos = '\0'; >> + } >> + if (strlen(summary)!=0) { >> + out->print_cr("%s", summary); >> + } >> + >> + summary = cr_pos + 1; >> + } while (cr_pos != NULL); >> +} >> + >> void CompileBroker::post_compile(CompilerThread* thread, CompileTask* >> task, EventCompilation& event, bool success, ciEnv* ci_env) { >> >> if (success) { >> @@ -1939,6 +1967,10 @@ >> tty->print_cr("time: %d inlined: %d bytes", >> (int)time.milliseconds(), task->num_inlined_bytecodes()); >> } >> >> + Log(compilation, codecache) log; >> + if (log.is_debug()) >> + codecache_print(log.debug_stream(), /* detailed= */ false); >> + >> if (PrintCodeCacheOnCompilation) >> codecache_print(/* detailed= */ false); >> >> >> Regards, >> Chihiro >> >> -- Chihiro Ito | Principal Consultant | +81.90.6148.8815 Oracle Consultant ORACLE Japan | Akasaka Center Bldg. | Motoakasaka 1-3-13 | 1070051 Minato-ku, Tokyo, JAPAN Oracle is committed to developing practices and products that help protect the environment From christoph.langer at sap.com Thu May 18 13:41:02 2017 From: christoph.langer at sap.com (Langer, Christoph) Date: Thu, 18 May 2017 13:41:02 +0000 Subject: New JDK 10 Reviewer: Andrew Dinn In-Reply-To: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> References: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> Message-ID: <67a2157d0ab34eb49240e5dbaece5970@sap.com> Vote: Yes > -----Original Message----- > From: jdk10-dev [mailto:jdk10-dev-bounces at openjdk.java.net] On Behalf > Of Andrew Haley > Sent: Dienstag, 9. Mai 2017 19:20 > To: jdk10-dev at openjdk.java.net > Subject: CFV: New JDK 10 Reviewer: Andrew Dinn > > [My apologies to everyone: apparently I have to ask about JDK 10 as > well.] > > I hereby nominate Andrew Dinn (adinn) to JDK 10 Reviewer. > > Andrew co-wrote the original AArch64 port. He is the author of 15 > patches in the HotSpot sources, but that does not reflect the true > extent of his contribution because he is the author of 341 patches in > the aarch64-port project which I merged into OpenJDK. His > considerable expertise, particularly with the C2 compiler, will be of > great value to the project. > > Votes are due by 23 May, 2017. > > Only current Open|JDK 10 Reviewers [1] are eligible to vote > on this nomination. Votes must be cast in the open by replying > to this mailing list. > > For Three-Vote Consensus voting instructions, see [2]. 
> > Andrew Haley. > > > [1] http://openjdk.java.net/census > [2] http://openjdk.java.net/projects/#reviewer-vote From volker.simonis at gmail.com Thu May 18 13:54:00 2017 From: volker.simonis at gmail.com (Volker Simonis) Date: Thu, 18 May 2017 15:54:00 +0200 Subject: CFV: New JDK 10 Reviewer: Andrew Dinn In-Reply-To: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> References: <177a88db-ad04-21da-2fb9-3b9564e3863f@redhat.com> Message-ID: Vote: yes On Tue, May 9, 2017 at 7:20 PM, Andrew Haley wrote: > [My apologies to everyone: apparently I have to ask about JDK 10 as > well.] > > I hereby nominate Andrew Dinn (adinn) to JDK 10 Reviewer. > > Andrew co-wrote the original AArch64 port. He is the author of 15 > patches in the HotSpot sources, but that does not reflect the true > extent of his contribution because he is the author of 341 patches in > the aarch64-port project which I merged into OpenJDK. His > considerable expertise, particularly with the C2 compiler, will be of > great value to the project. > > Votes are due by 23 May, 2017. > > Only current Open|JDK 10 Reviewers [1] are eligible to vote > on this nomination. Votes must be cast in the open by replying > to this mailing list. > > For Three-Vote Consensus voting instructions, see [2]. > > Andrew Haley. > > > [1] http://openjdk.java.net/census > [2] http://openjdk.java.net/projects/#reviewer-vote From joe.darcy at oracle.com Wed May 24 01:00:07 2017 From: joe.darcy at oracle.com (Joseph D. Darcy) Date: Tue, 23 May 2017 18:00:07 -0700 Subject: CSR issue type now available in JDK project of JBS for compatibility and specification review of JDK 10 changes Message-ID: <5924DB17.80202@oracle.com> Hello, As previewed recently [1], the "Compatibility & Specification Review" process (CSR) is now available for JDK 10 changes. To create a CSR request, from a bug in JBS with a fixVersion of JDK 10 select More -> Create CSR and then fill in the pre-populated outline in the description field and set the other fields as appropriate. If you have questions about the process, please first read through the material on the CSR wiki page: https://wiki.openjdk.java.net/display/csr/Main including the FAQ and then send me email if the question is not already covered. Finding an example request for a similar JDK 9 change https://bugs.openjdk.java.net/issues/?jql=project%20%3D%20ccc may provide guidance or a template to follow for a JDK 10 change. I expect to refine the CSR FAQ and other documentation in the coming months as we start using the new process. I also expect some adjustments to the process will be made as we break it in. Thanks, -Joe [1] http://mail.openjdk.java.net/pipermail/jdk10-dev/2017-May/000193.html From adinn at redhat.com Thu May 25 13:12:53 2017 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 25 May 2017 14:12:53 +0100 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException Message-ID: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> The following webrev fixes a race condition that is present in jdk10 and also jdk9 and jdk8. It is caused by a misplaced volatile keyword that faild to ensure correct ordering of writes by the compiler. Reviews welcome. http://cr.openjdk.java.net/~adinn/8181085/webrev.00/ Backporting: This same fix is required in jdk9 and jdk8. Testing: The reproducer posted with the original issue manifests the NPE reliably on jdk8. 
It does not manifest on jdk9/10 but that is only thanks to changes introduced into the resolution process in jdk9 which change the timing of execution. However, without this fix the out-of-order write problem is still present in jdk9/10, as can be seen by eyeballing the compiled code for ConstantPoolCacheEntry::set_direct_or_vtable_call. The patch has been validated on jdk8 by running the reproducer. It stops any resulting NPEs. The code for ConstantPoolCacheEntry::set_direct_or_vtable_call on jdk8-10 has been eyeballed to ensure that post-patch the assignments now occur in the correct order. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From aph at redhat.com Thu May 25 13:30:26 2017 From: aph at redhat.com (Andrew Haley) Date: Thu, 25 May 2017 14:30:26 +0100 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> Message-ID: <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> On 25/05/17 14:12, Andrew Dinn wrote: > The following webrev fixes a race condition that is present in jdk10 and > also jdk9 and jdk8. It is caused by a misplaced volatile keyword that > faild to ensure correct ordering of writes by the compiler. Reviews welcome. Can you explain why we don't need a memory fence there? Andrew. From adinn at redhat.com Thu May 25 13:57:32 2017 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 25 May 2017 14:57:32 +0100 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> Message-ID: On 25/05/17 14:30, Andrew Haley wrote: > On 25/05/17 14:12, Andrew Dinn wrote: >> The following webrev fixes a race condition that is present in jdk10 and >> also jdk9 and jdk8. It is caused by a misplaced volatile keyword that >> faild to ensure correct ordering of writes by the compiler. Reviews welcome. > > Can you explain why we don't need a memory fence there? We do need one and we have one. The assignments executed in the relevant method in cpCache.cpp (i.e. ConstantPoolCacheEntry::set_direct_or_vtable_call) are . . . set_f1(method()); . . . set_bytecode_1(invoke_code); . . . If you look at the definition of these two methods they are void set_f1(Metadata* f1) { Metadata* existing_f1 = (Metadata*)_f1; // read once assert(existing_f1 == NULL || existing_f1 == f1, "illegal field change"); _f1 = f1; } and void ConstantPoolCacheEntry::set_bytecode_1(Bytecodes::Code code) { #ifdef ASSERT // Read once. volatile Bytecodes::Code c = bytecode_1(); assert(c == 0 || c == code || code == 0, "update must be consistent"); #endif // Need to flush pending stores here before bytecode is written. OrderAccess::release_store_ptr(&_indices, _indices | ((u_char)code << bytecode_1_shift)); On x86 the release_store_ptr operation just reduces to an assignment of volatile field _indices. That alone doesn't stop the compiler re-ordering it before the assignment of f1. Making both fields volatile does stop them being re-ordered. 
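Reduced to a stand-alone fragment (purely illustrative, not the HotSpot code itself), the hazard is the following: C++ only orders volatile accesses with respect to each other, so the compiler is free to sink the plain store below the later volatile store unless a compiler barrier sits between them or both stores are volatile.

#include <stdint.h>

struct Entry {
    void              *f1;       // plain field, standing in for _f1
    volatile intptr_t  indices;  // publication word, standing in for _indices
};

void publish(Entry *e, void *method, intptr_t bits) {
    e->f1 = method;                   // plain store: nothing forbids moving it
                                      // below the volatile store...
    // asm volatile("" ::: "memory"); // ...unless a compiler-only barrier (the
    //                                // usual GCC idiom) or a release fence is
    //                                // placed here
    e->indices = bits;                // volatile store that readers test first
}

A reader that observes the new indices value while f1 still holds its old NULL is exactly the spurious NullPointerException reported in the bug.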
regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From aph at redhat.com Thu May 25 14:07:45 2017 From: aph at redhat.com (Andrew Haley) Date: Thu, 25 May 2017 15:07:45 +0100 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> Message-ID: <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> On 25/05/17 14:57, Andrew Dinn wrote: > On x86 the release_store_ptr operation just reduces to an assignment of > volatile field _indices. That alone doesn't stop the compiler > re-ordering it before the assignment of f1. Making both fields volatile > does stop them being re-ordered. Please bear with me. We have to set f1 and then bytecode_1. We do not want the store to bytecode_1 to move before the store to f1. OrderAccess::release_store_ptr() should be strong enough to guarantee that, regardless of whether f1 is volatile or not. If it's not, there should be a compiler fence in release_store_ptr(). Andrew. From adinn at redhat.com Thu May 25 14:25:06 2017 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 25 May 2017 15:25:06 +0100 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> Message-ID: On 25/05/17 15:07, Andrew Haley wrote: > On 25/05/17 14:57, Andrew Dinn wrote: >> On x86 the release_store_ptr operation just reduces to an assignment of >> volatile field _indices. That alone doesn't stop the compiler >> re-ordering it before the assignment of f1. Making both fields volatile >> does stop them being re-ordered. > > Please bear with me. We have to set f1 and then bytecode_1. We do not > want the store to bytecode_1 to move before the store to f1. > > OrderAccess::release_store_ptr() should be strong enough to guarantee that, > regardless of whether f1 is volatile or not. > If it's not, there should be a compiler fence in release_store_ptr(). On a weak architecture like AArch64 OrderAccess::release_store_ptr() will be translated to an ordered write. That will ensure that order of generated store instructions and order of memory system visibility for those stores reflect source order. On x86 OrderAccess::release_store_ptr() reduces to a simple write. That's because TCO means that there is no need to do anything in order to ensure that /memory visibility/ order respects instruction generation/execution order. However, on x86 there most definitely /is/ a need to ensure that the compiler generates these store instructions in source order. That's why both fields need to be volatile. A C++ compiler may not re-order volatile writes. Yes, C++ volatile pretty much sux doesn't it! regards, Andrew Dinn ----------- Of course I'm respectable. I'm old. Politicians, ugly buildings, and whores all get respectable if they last long enough. --John Huston in "Chinatown." ---------------------------------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 
03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From aph at redhat.com Thu May 25 14:38:55 2017 From: aph at redhat.com (Andrew Haley) Date: Thu, 25 May 2017 15:38:55 +0100 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> Message-ID: <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> On 25/05/17 15:25, Andrew Dinn wrote: > On 25/05/17 15:07, Andrew Haley wrote: >> On 25/05/17 14:57, Andrew Dinn wrote: >>> On x86 the release_store_ptr operation just reduces to an assignment of >>> volatile field _indices. That alone doesn't stop the compiler >>> re-ordering it before the assignment of f1. Making both fields volatile >>> does stop them being re-ordered. >> >> Please bear with me. We have to set f1 and then bytecode_1. We do not >> want the store to bytecode_1 to move before the store to f1. >> >> OrderAccess::release_store_ptr() should be strong enough to guarantee that, >> regardless of whether f1 is volatile or not. >> If it's not, there should be a compiler fence in release_store_ptr(). > > On a weak architecture like AArch64 OrderAccess::release_store_ptr() > will be translated to an ordered write. That will ensure that order of > generated store instructions and order of memory system visibility for > those stores reflect source order. > > On x86 OrderAccess::release_store_ptr() reduces to a simple write. > That's because TCO means that there is no need to do anything in order > to ensure that /memory visibility/ order respects instruction > generation/execution order. > > However, on x86 there most definitely /is/ a need to ensure that the > compiler generates these store instructions in source order. That's why > both fields need to be volatile. A C++ compiler may not re-order > volatile writes. Well, that's wrong. The bug is in OrderAccess::release_store_ptr(), which must not allow this reordering. Put a proper release barrier in there before the store, and all will be well: __atomic_thread_fence(__ATOMIC_RELEASE); There's really no need to make both fields volatile. And to do so leaves a lurking bug for any other unsuspecting user of release_store_ptr(). Andrew. From adinn at redhat.com Thu May 25 15:03:24 2017 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 25 May 2017 16:03:24 +0100 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> Message-ID: <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> On 25/05/17 15:38, Andrew Haley wrote: > Well, that's wrong. The bug is in OrderAccess::release_store_ptr(), > which must not allow this reordering. Put a proper release barrier > in there before the store, and all will be well: > > __atomic_thread_fence(__ATOMIC_RELEASE); > > There's really no need to make both fields volatile. And to do so > leaves a lurking bug for any other unsuspecting user of > release_store_ptr(). Oops. Apologies for this but I misread the gdb output when I ran this on jdk10. The re-ordering of the store instructions is not happening in jdk10 or jdk9. 
It does happen on jdk8 (that probably explains why the reproducer only manifests the NPE in jdk8 :-). The jdk10 implementation of release_Store_ptr has indeed already been reworked to insert a compiler barrier (using "asm volatile memory") but not a memory store barrier. I retract this patch. The problem with jdk8 still exists. It probably needs fixing by backporting the changes to the store_release etc. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From aph at redhat.com Thu May 25 15:33:27 2017 From: aph at redhat.com (Andrew Haley) Date: Thu, 25 May 2017 16:33:27 +0100 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> Message-ID: <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> On 25/05/17 16:03, Andrew Dinn wrote: > The jdk10 implementation of release_Store_ptr has indeed already been > reworked to insert a compiler barrier (using "asm volatile memory") but > not a memory store barrier. Cool. An asm volatile memory barrier is a bit stronger than is perhaps needed, but it almost certainly will make no difference, and is compatible with old releases of GCC. Andrew. From paul.hohensee at gmail.com Thu May 25 20:29:52 2017 From: paul.hohensee at gmail.com (Paul Hohensee) Date: Thu, 25 May 2017 13:29:52 -0700 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> Message-ID: I don't know that you want to retract the patch. There's still a bug here imo that your patch fixes. The pointer formal parameter types for all orderAccess methods are volatile in order to force the compiler, at the point of method invocation, to order the memory access through the pointer within the method with respect to other volatile accesses. Accesses in the caller won't be ordered by the compiler with respect to the access in the orderAccess method unless the caller accesses are also volatile. And that's the bug. If we want the compiler to not reorder accesses to _f1, _f1 must be declared volatile. volatile MethodData* _f1; says that the MethodData is volatile (i.e., all accesses to parts of the MethodData object are volatile), which isn't what we want if we're intent on ordering with respect to accesses to _f1. MethodData* volatile _f1; is the way to do that. Thanks, Paul On Thu, May 25, 2017 at 8:33 AM, Andrew Haley wrote: > On 25/05/17 16:03, Andrew Dinn wrote: > > The jdk10 implementation of release_Store_ptr has indeed already been > > reworked to insert a compiler barrier (using "asm volatile memory") but > > not a memory store barrier. > > Cool. 
An asm volatile memory barrier is a bit stronger than is > perhaps needed, but it almost certainly will make no difference, > and is compatible with old releases of GCC. > > Andrew. > From david.holmes at oracle.com Fri May 26 02:20:09 2017 From: david.holmes at oracle.com (David Holmes) Date: Fri, 26 May 2017 12:20:09 +1000 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> Message-ID: On 26/05/2017 6:29 AM, Paul Hohensee wrote: > I don't know that you want to retract the patch. There's still a bug here > imo that your patch fixes. I agree. This is a common error when dealing with pointer variables, especially when looking at surrounding usage on non-pointer variables. We need the _f1 pointer to be volatile, not the thing to which the _f1 pointer points (well it's possible we may need both, I haven't dived that deep). Any variable passed to an OrderAccess, or Atomic, function should be volatile to minimise the chances the C compiler will do something unexpected with it. I don't even know what to make of the vmStructs.cpp existing code! David > The pointer formal parameter types for all orderAccess methods are volatile > in order to force the compiler, at the point of method invocation, to order > the memory access through the pointer within the method with respect to > other volatile accesses. Accesses in the caller won't be ordered by the > compiler with respect to the access in the orderAccess method unless the > caller accesses are also volatile. And that's the bug. > > If we want the compiler to not reorder accesses to _f1, _f1 must be > declared volatile. > > volatile MethodData* _f1; > > says that the MethodData is volatile (i.e., all accesses to parts of the > MethodData object are volatile), which isn't what we want if we're intent > on ordering with respect to accesses to _f1. > > MethodData* volatile _f1; > > is the way to do that. > > Thanks, > > Paul > > On Thu, May 25, 2017 at 8:33 AM, Andrew Haley wrote: > >> On 25/05/17 16:03, Andrew Dinn wrote: >>> The jdk10 implementation of release_Store_ptr has indeed already been >>> reworked to insert a compiler barrier (using "asm volatile memory") but >>> not a memory store barrier. >> >> Cool. An asm volatile memory barrier is a bit stronger than is >> perhaps needed, but it almost certainly will make no difference, >> and is compatible with old releases of GCC. >> >> Andrew. >> From adinn at redhat.com Fri May 26 08:11:27 2017 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 26 May 2017 09:11:27 +0100 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> Message-ID: <464bf18f-8d57-e8a7-f3a2-5ebbd4a993c1@redhat.com> On 26/05/17 03:20, David Holmes wrote: > On 26/05/2017 6:29 AM, Paul Hohensee wrote: >> I don't know that you want to retract the patch. 
There's still a bug here >> imo that your patch fixes. > > I agree. This is a common error when dealing with pointer variables, > especially when looking at surrounding usage on non-pointer variables. > We need the _f1 pointer to be volatile, not the thing to which the _f1 > pointer points (well it's possible we may need both, I haven't dived > that deep). > > Any variable passed to an OrderAccess, or Atomic, function should be > volatile to minimise the chances the C compiler will do something > unexpected with it. > > I don't even know what to make of the vmStructs.cpp existing code! Hmm, well this is a conundrum then. One piece of advice from the two of you and another from Andrew Haley (who is 'technically' my boss --- i.e. he's the technical lead for my team :-). Your view is precisely what I originally assumed was at play here i.e. that where successive writes to fields must be seen by other threads in the correct order that is to be achieved on x86 by making both fields volatile. This guarantees sequencing of generated store instructions by the compiler in accordance with source order and, hence, because x86 is TCO, visibility of those store instructions in that same order. Of course, that is inadequate on weak-memory models like ppc and AArch64. So, to make your suggestion work properly for all architectures the second write (at least, if not the first) also needs to be implemented using a call to store_release. That will definitely ensure that the first write is visible before the second on all architectures. It has been our hope (Andrew's and mine) since we completed the AArch64 port that all pairs of stores which require ordering do indeed employ a store_release (we have had to correct a few cases over the last few years). Andrew's belief seems to be that your model is error prone and is fixed more correctly by introducing a memory and/or compiler barrier into the implementation of release_store. If instead release_store is used consistently whenever the second of a pair of writes needs to be guaranteed to be visible after the first then it will provide the desired outcome. This belief seems indeed to be backed up by the changes made to the jdk9 code base quite some while back (the ones I failed to notice). The relevant commit is 7143664: Clean up OrderAccess implementations and usage (n.b. I believe the author is one D Holmes :-) I think Andrew's view is probably sound (and not just because he is my boss). Since we must use release_store everywhere we want visibility of writes to be ordered then also requiring both the fields involved to be volatile is redundant. Given what little C++ volatile declarations do achieve it might be wiser not to be using volatile declarations at all. We would certainly start finding missing store_release calls quicker ;-) regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 
03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From aph at redhat.com Fri May 26 08:26:01 2017 From: aph at redhat.com (Andrew Haley) Date: Fri, 26 May 2017 09:26:01 +0100 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> Message-ID: <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> On 26/05/17 03:20, David Holmes wrote: > Any variable passed to an OrderAccess, or Atomic, function should be > volatile to minimise the chances the C compiler will do something > unexpected with it. That's not much more than paranoia, IMO. If the barriers are strong enough it'll be fine. The problem was, I suppose, with old compilers which didn't handle memory barriers properly, but we should be moving towards standard ways of doing these things. Standard atomics have been available since C++11 (I think) and GCC has had support since long before then. Maybe in the JDK10 timeframe we can look at upgrading the compilers for all platforms. Andrew. From david.holmes at oracle.com Fri May 26 09:20:19 2017 From: david.holmes at oracle.com (David Holmes) Date: Fri, 26 May 2017 19:20:19 +1000 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> Message-ID: <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> Hi Andrew, On 26/05/2017 6:26 PM, Andrew Haley wrote: > On 26/05/17 03:20, David Holmes wrote: >> Any variable passed to an OrderAccess, or Atomic, function should be >> volatile to minimise the chances the C compiler will do something >> unexpected with it. > > That's not much more than paranoia, IMO. If the barriers are strong > enough it'll be fine. The problem was, I suppose, with old compilers > which didn't handle memory barriers properly, but we should be moving > towards standard ways of doing these things. Standard atomics have > been available since C++11 (I think) and GCC has had support since long > before then. The issue isn't just the barriers that might be involved inside orderAccess methods. If these variables are being used in racy lock-free code then they should be marked volatile to ensure other compiler optimizations don't interfere. Perhaps that is paranoia, but I'd rather a little harmless paranoia than try to debug what might otherwise go wrong. Regardless of anything else the declaration(s) of _f1 are "wrong" under our existing approach to lock-free code. Fixing those declarations may or may not make any difference to the observed spurious NPE problem. The backport of the improved compiler_barrier is a separate issue. > Maybe in the JDK10 timeframe we can look at upgrading the compilers > for all platforms. I have no doubt we will upgrade compilers, but whether we try to use C++11 features/APIs is a different matter. 
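As a point of reference, here is a minimal sketch of what the same publish/consume pattern could look like written directly against C++11 std::atomic (illustrative only: the names are invented and this is not how the HotSpot sources are currently written):

    #include <atomic>
    #include <cassert>

    struct Entry {
      void* f1;                   // payload published by the resolver
      std::atomic<int> bytecode;  // 0 means "not yet resolved"
    };

    void publish(Entry* e, void* method) {
      e->f1 = method;                                   // plain store
      e->bytecode.store(1, std::memory_order_release);  // release: the f1
                                                        // store may not be
                                                        // reordered after it
    }

    void consume(Entry* e) {
      if (e->bytecode.load(std::memory_order_acquire) != 0) {
        assert(e->f1 != nullptr);  // guaranteed by the release/acquire pair
      }
    }

The release/acquire pair constrains both the compiler and the hardware in one portable construct, which is the attraction of moving to the standard facilities.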
IIRC there are already some open RFEs to look into this. Thanks, David > Andrew. > From adinn at redhat.com Fri May 26 09:35:27 2017 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 26 May 2017 10:35:27 +0100 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> Message-ID: <6021f2b1-4b0a-5229-aaa1-57d608f4a49c@redhat.com> Hi David, On 26/05/17 10:20, David Holmes wrote: > Hi Andrew, > > On 26/05/2017 6:26 PM, Andrew Haley wrote: >> On 26/05/17 03:20, David Holmes wrote: >>> Any variable passed to an OrderAccess, or Atomic, function should be >>> volatile to minimise the chances the C compiler will do something >>> unexpected with it. >> >> That's not much more than paranoia, IMO. If the barriers are strong >> enough it'll be fine. The problem was, I suppose, with old compilers >> which didn't handle memory barriers properly, but we should be moving >> towards standard ways of doing these things. Standard atomics have >> been available since C++11 (I think) and GCC has had support since long >> before then. > > The issue isn't just the barriers that might be involved inside > orderAccess methods. If these variables are being used in racy lock-free > code then they should be marked volatile to ensure other compiler > optimizations don't interfere. Perhaps that is paranoia, but I'd rather > a little harmless paranoia than try to debug what might otherwise go wrong. I don't understand what you are suggesting here. How is such racy, lock-free code ever going to work on architectures with weak memory models? > ... regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From david.holmes at oracle.com Fri May 26 09:40:40 2017 From: david.holmes at oracle.com (David Holmes) Date: Fri, 26 May 2017 19:40:40 +1000 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: <6021f2b1-4b0a-5229-aaa1-57d608f4a49c@redhat.com> References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> <6021f2b1-4b0a-5229-aaa1-57d608f4a49c@redhat.com> Message-ID: <77ef6b38-b5a4-8922-4250-57ce13c4dda8@oracle.com> On 26/05/2017 7:35 PM, Andrew Dinn wrote: > Hi David, > > On 26/05/17 10:20, David Holmes wrote: >> Hi Andrew, >> >> On 26/05/2017 6:26 PM, Andrew Haley wrote: >>> On 26/05/17 03:20, David Holmes wrote: >>>> Any variable passed to an OrderAccess, or Atomic, function should be >>>> volatile to minimise the chances the C compiler will do something >>>> unexpected with it. >>> >>> That's not much more than paranoia, IMO. 
If the barriers are strong >>> enough it'll be fine. The problem was, I suppose, with old compilers >>> which didn't handle memory barriers properly, but we should be moving >>> towards standard ways of doing these things. Standard atomics have >>> been available since C++11 (I think) and GCC has had support since long >>> before then. >> >> The issue isn't just the barriers that might be involved inside >> orderAccess methods. If these variables are being used in racy lock-free >> code then they should be marked volatile to ensure other compiler >> optimizations don't interfere. Perhaps that is paranoia, but I'd rather >> a little harmless paranoia than try to debug what might otherwise go wrong. > > I don't understand what you are suggesting here. How is such racy, > lock-free code ever going to work on architectures with weak memory models? By using load-acquire/store-release and atomic operations - that's how you write lock-free algorithms. David >> ... > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander > From david.holmes at oracle.com Fri May 26 09:43:31 2017 From: david.holmes at oracle.com (David Holmes) Date: Fri, 26 May 2017 19:43:31 +1000 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: <464bf18f-8d57-e8a7-f3a2-5ebbd4a993c1@redhat.com> References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <464bf18f-8d57-e8a7-f3a2-5ebbd4a993c1@redhat.com> Message-ID: <5dfd5bdc-1019-cd5c-f3c6-6e9e86c527d5@oracle.com> One important correction: > 7143664: Clean up OrderAccess implementations and usage > > (n.b. I believe the author is one D Holmes No that work was done by Erik Osterlund (before he joined Oracle). I was only the sponsor. 7143664: Clean up OrderAccess implementations and usage Summary: Clarify and correct the abstract model for memory barriers provided by the orderAccess class. Refactor the implementations using template specialization to allow the bulk of the code to be shared, with platform specific customizations applied as needed. Reviewed-by: acorn, dcubed, dholmes, dlong, goetz, kbarrett, sgehwolf Contributed-by: Erik Osterlund Cheers, David ----- On 26/05/2017 6:11 PM, Andrew Dinn wrote: > On 26/05/17 03:20, David Holmes wrote: >> On 26/05/2017 6:29 AM, Paul Hohensee wrote: >>> I don't know that you want to retract the patch. There's still a bug here >>> imo that your patch fixes. >> >> I agree. This is a common error when dealing with pointer variables, >> especially when looking at surrounding usage on non-pointer variables. >> We need the _f1 pointer to be volatile, not the thing to which the _f1 >> pointer points (well it's possible we may need both, I haven't dived >> that deep). >> >> Any variable passed to an OrderAccess, or Atomic, function should be >> volatile to minimise the chances the C compiler will do something >> unexpected with it. >> >> I don't even know what to make of the vmStructs.cpp existing code! > > Hmm, well this is a conundrum then. One piece of advice from the two of > you and another from Andrew Haley (who is 'technically' my boss --- i.e. 
> he's the technical lead for my team :-). > > Your view is precisely what I originally assumed was at play here i.e. > that where successive writes to fields must be seen by other threads in > the correct order that is to be achieved on x86 by making both fields > volatile. This guarantees sequencing of generated store instructions by > the compiler in accordance with source order and, hence, because x86 is > TCO, visibility of those store instructions in that same order. > > Of course, that is inadequate on weak-memory models like ppc and > AArch64. So, to make your suggestion work properly for all architectures > the second write (at least, if not the first) also needs to be > implemented using a call to store_release. That will definitely ensure > that the first write is visible before the second on all architectures. > It has been our hope (Andrew's and mine) since we completed the AArch64 > port that all pairs of stores which require ordering do indeed employ a > store_release (we have had to correct a few cases over the last few years). > > Andrew's belief seems to be that your model is error prone and is fixed > more correctly by introducing a memory and/or compiler barrier into the > implementation of release_store. If instead release_store is used > consistently whenever the second of a pair of writes needs to be > guaranteed to be visible after the first then it will provide the > desired outcome. This belief seems indeed to be backed up by the changes > made to the jdk9 code base quite some while back (the ones I failed to > notice). The relevant commit is > > 7143664: Clean up OrderAccess implementations and usage > > (n.b. I believe the author is one D Holmes :-) > > I think Andrew's view is probably sound (and not just because he is my > boss). Since we must use release_store everywhere we want visibility of > writes to be ordered then also requiring both the fields involved to be > volatile is redundant. Given what little C++ volatile declarations do > achieve it might be wiser not to be using volatile declarations at all. > We would certainly start finding missing store_release calls quicker ;-) > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 
03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander > From adinn at redhat.com Fri May 26 09:48:59 2017 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 26 May 2017 10:48:59 +0100 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: <77ef6b38-b5a4-8922-4250-57ce13c4dda8@oracle.com> References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> <6021f2b1-4b0a-5229-aaa1-57d608f4a49c@redhat.com> <77ef6b38-b5a4-8922-4250-57ce13c4dda8@oracle.com> Message-ID: On 26/05/17 10:40, David Holmes wrote: > On 26/05/2017 7:35 PM, Andrew Dinn wrote: >> Hi David, >> >> On 26/05/17 10:20, David Holmes wrote: >>> Hi Andrew, >>> >>> On 26/05/2017 6:26 PM, Andrew Haley wrote: >>>> On 26/05/17 03:20, David Holmes wrote: >>>>> Any variable passed to an OrderAccess, or Atomic, function should be >>>>> volatile to minimise the chances the C compiler will do something >>>>> unexpected with it. >>>> >>>> That's not much more than paranoia, IMO. If the barriers are strong >>>> enough it'll be fine. The problem was, I suppose, with old compilers >>>> which didn't handle memory barriers properly, but we should be moving >>>> towards standard ways of doing these things. Standard atomics have >>>> been available since C++11 (I think) and GCC has had support since long >>>> before then. >>> >>> The issue isn't just the barriers that might be involved inside >>> orderAccess methods. If these variables are being used in racy lock-free >>> code then they should be marked volatile to ensure other compiler >>> optimizations don't interfere. Perhaps that is paranoia, but I'd rather >>> a little harmless paranoia than try to debug what might otherwise go >>> wrong. >> >> I don't understand what you are suggesting here. How is such racy, >> lock-free code ever going to work on architectures with weak memory >> models? > > By using load-acquire/store-release and atomic operations - that's how > you write lock-free algorithms. Now I'm even more confused. Surely, the implementations of load-acquire/store-release and atomic operations themselves guarantee that 'other compiler optimizations don't interfere'. Why doesn't that make the volatile declarations redundant? regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 
03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From david.holmes at oracle.com Fri May 26 11:02:27 2017 From: david.holmes at oracle.com (David Holmes) Date: Fri, 26 May 2017 21:02:27 +1000 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> <6021f2b1-4b0a-5229-aaa1-57d608f4a49c@redhat.com> <77ef6b38-b5a4-8922-4250-57ce13c4dda8@oracle.com> Message-ID: <57be32fc-be71-b13c-0cd3-8fbb899df688@oracle.com> On 26/05/2017 7:48 PM, Andrew Dinn wrote: > On 26/05/17 10:40, David Holmes wrote: >> On 26/05/2017 7:35 PM, Andrew Dinn wrote: >>> Hi David, >>> >>> On 26/05/17 10:20, David Holmes wrote: >>>> Hi Andrew, >>>> >>>> On 26/05/2017 6:26 PM, Andrew Haley wrote: >>>>> On 26/05/17 03:20, David Holmes wrote: >>>>>> Any variable passed to an OrderAccess, or Atomic, function should be >>>>>> volatile to minimise the chances the C compiler will do something >>>>>> unexpected with it. >>>>> >>>>> That's not much more than paranoia, IMO. If the barriers are strong >>>>> enough it'll be fine. The problem was, I suppose, with old compilers >>>>> which didn't handle memory barriers properly, but we should be moving >>>>> towards standard ways of doing these things. Standard atomics have >>>>> been available since C++11 (I think) and GCC has had support since long >>>>> before then. >>>> >>>> The issue isn't just the barriers that might be involved inside >>>> orderAccess methods. If these variables are being used in racy lock-free >>>> code then they should be marked volatile to ensure other compiler >>>> optimizations don't interfere. Perhaps that is paranoia, but I'd rather >>>> a little harmless paranoia than try to debug what might otherwise go >>>> wrong. >>> >>> I don't understand what you are suggesting here. How is such racy, >>> lock-free code ever going to work on architectures with weak memory >>> models? >> >> By using load-acquire/store-release and atomic operations - that's how >> you write lock-free algorithms. > > Now I'm even more confused. Surely, the implementations of > load-acquire/store-release and atomic operations themselves guarantee > that 'other compiler optimizations don't interfere'. Why doesn't that > make the volatile declarations redundant? Good question. Perhaps with the right implementation it does. But for the last 15+ years as far as I am aware the general "wisdom" has been that use of C/C++ volatile was a necessary, but nowhere near sufficient, condition when writing such algorithms with 'hand-crafted' memory barriers and atomic instructions that are outside the C/C++ language, and the compiler. Cheers, David > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 
03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander > From aph at redhat.com Fri May 26 12:35:29 2017 From: aph at redhat.com (Andrew Haley) Date: Fri, 26 May 2017 13:35:29 +0100 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> Message-ID: <3d963255-2006-de03-4b81-87a3dd64ef72@redhat.com> On 26/05/17 10:20, David Holmes wrote: > Hi Andrew, > > On 26/05/2017 6:26 PM, Andrew Haley wrote: >> On 26/05/17 03:20, David Holmes wrote: >>> Any variable passed to an OrderAccess, or Atomic, function should be >>> volatile to minimise the chances the C compiler will do something >>> unexpected with it. >> >> That's not much more than paranoia, IMO. If the barriers are strong >> enough it'll be fine. The problem was, I suppose, with old compilers >> which didn't handle memory barriers properly, but we should be moving >> towards standard ways of doing these things. Standard atomics have >> been available since C++11 (I think) and GCC has had support since long >> before then. > > The issue isn't just the barriers that might be involved inside > orderAccess methods. If these variables are being used in racy > lock-free code then they should be marked volatile to ensure other > compiler optimizations don't interfere. Perhaps that is paranoia, > but I'd rather a little harmless paranoia than try to debug what > might otherwise go wrong. I'm always leery of this kind of reasoning because the hardware I most care about has a very weakly-ordered memory system and will reorder everything in the absence of synchronization. If it is actually necessary to use volatile on a TSO machine to get multi-thread ordering then it is almost certainly incorrect code, because volatile is not sufficient to do what is needed on non-TSO hardware. So, if you "fix" code on a TSO machine by using volatile, you are making work for me because I'll have to debug it on a non-TSO machine. Fix it in a portable way by using the correct primitives and it's correct everywhere, it's easier to reason about, and you lost nothing. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From paul.hohensee at gmail.com Fri May 26 13:47:00 2017 From: paul.hohensee at gmail.com (Paul Hohensee) Date: Fri, 26 May 2017 06:47:00 -0700 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: <3d963255-2006-de03-4b81-87a3dd64ef72@redhat.com> References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> <3d963255-2006-de03-4b81-87a3dd64ef72@redhat.com> Message-ID: What David said, and a little history. 
orderAccess was originally written (by me, though not as well as Erik's rewrite) in order to support ia64, which also has a very weakly ordered memory system. The idea is that there are two sources of potential reordering, the first by the C++ compilers and the second by the hardware. Using the volatile specifier consistently blocks the C++ compilers from reordering, and the orderAccess methods block the hardware from reordering. The idea was to minimize the number of required hardware barriers (which can be quite expensive), so the model allows for code that need only prevent compiler reordering. Another way to put it is that it allows for the minimal use of hardware barriers. An alternative would be to use only orderAccess methods to access data that require ordering. The reason that works is because the formal parameter types on the orderAccess methods' pointer formals are marked volatile, thus preventing the C++ compilers from, say, inlining orderAccess methods and reordering accesses derived from them. I'm not a ppc memory ordering expert, but from the discussion it seems to me that there are two bugs, one fixed by amending the ppc implementation of release_store_ptr and the other by marking _f1 volatile. Thanks, Paul On Fri, May 26, 2017 at 5:35 AM, Andrew Haley wrote: > On 26/05/17 10:20, David Holmes wrote: > > Hi Andrew, > > > > On 26/05/2017 6:26 PM, Andrew Haley wrote: > >> On 26/05/17 03:20, David Holmes wrote: > >>> Any variable passed to an OrderAccess, or Atomic, function should be > >>> volatile to minimise the chances the C compiler will do something > >>> unexpected with it. > >> > >> That's not much more than paranoia, IMO. If the barriers are strong > >> enough it'll be fine. The problem was, I suppose, with old compilers > >> which didn't handle memory barriers properly, but we should be moving > >> towards standard ways of doing these things. Standard atomics have > >> been available since C++11 (I think) and GCC has had support since long > >> before then. > > > > The issue isn't just the barriers that might be involved inside > > orderAccess methods. If these variables are being used in racy > > lock-free code then they should be marked volatile to ensure other > > compiler optimizations don't interfere. Perhaps that is paranoia, > > but I'd rather a little harmless paranoia than try to debug what > > might otherwise go wrong. > > I'm always leery of this kind of reasoning because the hardware I most > care about has a very weakly-ordered memory system and will reorder > everything in the absence of synchronization. If it is actually > necessary to use volatile on a TSO machine to get multi-thread > ordering then it is almost certainly incorrect code, because volatile > is not sufficient to do what is needed on non-TSO hardware. > > So, if you "fix" code on a TSO machine by using volatile, you are > making work for me because I'll have to debug it on a non-TSO machine. > Fix it in a portable way by using the correct primitives and it's > correct everywhere, it's easier to reason about, and you lost nothing. > > -- > Andrew Haley > Java Platform Lead Engineer > Red Hat UK Ltd. 
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > From david.holmes at oracle.com Fri May 26 13:57:32 2017 From: david.holmes at oracle.com (David Holmes) Date: Fri, 26 May 2017 23:57:32 +1000 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: <3d963255-2006-de03-4b81-87a3dd64ef72@redhat.com> References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> <3d963255-2006-de03-4b81-87a3dd64ef72@redhat.com> Message-ID: On 26/05/2017 10:35 PM, Andrew Haley wrote: > On 26/05/17 10:20, David Holmes wrote: >> Hi Andrew, >> >> On 26/05/2017 6:26 PM, Andrew Haley wrote: >>> On 26/05/17 03:20, David Holmes wrote: >>>> Any variable passed to an OrderAccess, or Atomic, function should be >>>> volatile to minimise the chances the C compiler will do something >>>> unexpected with it. >>> >>> That's not much more than paranoia, IMO. If the barriers are strong >>> enough it'll be fine. The problem was, I suppose, with old compilers >>> which didn't handle memory barriers properly, but we should be moving >>> towards standard ways of doing these things. Standard atomics have >>> been available since C++11 (I think) and GCC has had support since long >>> before then. >> >> The issue isn't just the barriers that might be involved inside >> orderAccess methods. If these variables are being used in racy >> lock-free code then they should be marked volatile to ensure other >> compiler optimizations don't interfere. Perhaps that is paranoia, >> but I'd rather a little harmless paranoia than try to debug what >> might otherwise go wrong. > > I'm always leery of this kind of reasoning because the hardware I most > care about has a very weakly-ordered memory system and will reorder > everything in the absence of synchronization. If it is actually > necessary to use volatile on a TSO machine to get multi-thread > ordering then it is almost certainly incorrect code, because volatile > is not sufficient to do what is needed on non-TSO hardware. > > So, if you "fix" code on a TSO machine by using volatile, you are > making work for me because I'll have to debug it on a non-TSO machine. No we do not "fix" the code by adding volatile. We as rule mark all variables involved as "volatile" because it is the only thing we can do to tell the compiler that there are things going on it is not aware of. In addition we use barriers and atomic instructions to be correct on every platform strong or weakly ordered - at least that is the intent. Now it may be that if your compiler is truly multi-thread aware and fence aware and atomic aware, and you use all those things directly that you don't need to also use "volatile". But the JVM does not at this time exist in that world. David > Fix it in a portable way by using the correct primitives and it's > correct everywhere, it's easier to reason about, and you lost nothing. 
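To spell out where the volatile qualifier has to sit for that convention to work (an invented field name, purely for illustration; the real declarations live in cpCache.hpp):

    class Metadata;

    struct Example {
      // "pointer to volatile Metadata": qualifies accesses made through the
      // pointer, i.e. to the Metadata object itself - not what is wanted here.
      volatile Metadata* f1_object_volatile;

      // "volatile pointer to Metadata": qualifies accesses to the pointer
      // field itself, so the compiler may not reorder or elide its loads and
      // stores relative to other volatile accesses - this is the intent.
      Metadata* volatile f1;
    };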
> From paul.hohensee at gmail.com Fri May 26 14:09:06 2017 From: paul.hohensee at gmail.com (Paul Hohensee) Date: Fri, 26 May 2017 07:09:06 -0700 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> <3d963255-2006-de03-4b81-87a3dd64ef72@redhat.com> Message-ID: Note that the current model doesn't prevent one from using orderAccess for all accesses to orderable data, so one can use that model if desired. On Fri, May 26, 2017 at 6:47 AM, Paul Hohensee wrote: > What David said, and a little history. > > orderAccess was originally written (by me, though not as well as Erik's > rewrite) in order to support ia64, which also has a very weakly ordered > memory system. The idea is that there are two sources of potential > reordering, the first by the C++ compilers and the second by the hardware. > Using the volatile specifier consistently blocks the C++ compilers from > reordering, and the orderAccess methods block the hardware from reordering. > The idea was to minimize the number of required hardware barriers (which > can be quite expensive), so the model allows for code that need only > prevent compiler reordering. Another way to put it is that it allows for > the minimal use of hardware barriers. > > An alternative would be to use only orderAccess methods to access data > that require ordering. The reason that works is because the formal > parameter types on the orderAccess methods' pointer formals are marked > volatile, thus preventing the C++ compilers from, say, inlining orderAccess > methods and reordering accesses derived from them. > > I'm not a ppc memory ordering expert, but from the discussion it seems to > me that there are two bugs, one fixed by amending the ppc implementation of > release_store_ptr and the other by marking _f1 volatile. > > Thanks, > > Paul > > On Fri, May 26, 2017 at 5:35 AM, Andrew Haley wrote: > >> On 26/05/17 10:20, David Holmes wrote: >> > Hi Andrew, >> > >> > On 26/05/2017 6:26 PM, Andrew Haley wrote: >> >> On 26/05/17 03:20, David Holmes wrote: >> >>> Any variable passed to an OrderAccess, or Atomic, function should be >> >>> volatile to minimise the chances the C compiler will do something >> >>> unexpected with it. >> >> >> >> That's not much more than paranoia, IMO. If the barriers are strong >> >> enough it'll be fine. The problem was, I suppose, with old compilers >> >> which didn't handle memory barriers properly, but we should be moving >> >> towards standard ways of doing these things. Standard atomics have >> >> been available since C++11 (I think) and GCC has had support since long >> >> before then. >> > >> > The issue isn't just the barriers that might be involved inside >> > orderAccess methods. If these variables are being used in racy >> > lock-free code then they should be marked volatile to ensure other >> > compiler optimizations don't interfere. Perhaps that is paranoia, >> > but I'd rather a little harmless paranoia than try to debug what >> > might otherwise go wrong. 
>> >> I'm always leery of this kind of reasoning because the hardware I most >> care about has a very weakly-ordered memory system and will reorder >> everything in the absence of synchronization. If it is actually >> necessary to use volatile on a TSO machine to get multi-thread >> ordering then it is almost certainly incorrect code, because volatile >> is not sufficient to do what is needed on non-TSO hardware. >> >> So, if you "fix" code on a TSO machine by using volatile, you are >> making work for me because I'll have to debug it on a non-TSO machine. >> Fix it in a portable way by using the correct primitives and it's >> correct everywhere, it's easier to reason about, and you lost nothing. >> >> -- >> Andrew Haley >> Java Platform Lead Engineer >> Red Hat UK Ltd. >> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 >> > > From volker.simonis at gmail.com Fri May 26 16:03:10 2017 From: volker.simonis at gmail.com (Volker Simonis) Date: Fri, 26 May 2017 16:03:10 +0000 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> <3d963255-2006-de03-4b81-87a3dd64ef72@redhat.com> Message-ID: Volatile not only prevents reordering by the compiler. It also prevents other, otherwise legal transformations/optimizations (like for example reloading a variable [1]) which have to be prevented in order to write correct, lock free programs. So I think declaring the variables involved in such algorithms volatile is currently still necessary. Regards, Volker [1] RFR(XS): JDK-8129440 G1 crash during concurrent root region scan ( http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2015-June/013928.html) Paul Hohensee schrieb am Fr. 26. Mai 2017 um 17:09: > Note that the current model doesn't prevent one from using orderAccess for > all accesses to orderable data, so one can use that model if desired. > > On Fri, May 26, 2017 at 6:47 AM, Paul Hohensee > wrote: > > > What David said, and a little history. > > > > orderAccess was originally written (by me, though not as well as Erik's > > rewrite) in order to support ia64, which also has a very weakly ordered > > memory system. The idea is that there are two sources of potential > > reordering, the first by the C++ compilers and the second by the > hardware. > > Using the volatile specifier consistently blocks the C++ compilers from > > reordering, and the orderAccess methods block the hardware from > reordering. > > The idea was to minimize the number of required hardware barriers (which > > can be quite expensive), so the model allows for code that need only > > prevent compiler reordering. Another way to put it is that it allows for > > the minimal use of hardware barriers. > > > > An alternative would be to use only orderAccess methods to access data > > that require ordering. The reason that works is because the formal > > parameter types on the orderAccess methods' pointer formals are marked > > volatile, thus preventing the C++ compilers from, say, inlining > orderAccess > > methods and reordering accesses derived from them. 
> > > > I'm not a ppc memory ordering expert, but from the discussion it seems to > > me that there are two bugs, one fixed by amending the ppc implementation > of > > release_store_ptr and the other by marking _f1 volatile. > > > > Thanks, > > > > Paul > > > > On Fri, May 26, 2017 at 5:35 AM, Andrew Haley wrote: > > > >> On 26/05/17 10:20, David Holmes wrote: > >> > Hi Andrew, > >> > > >> > On 26/05/2017 6:26 PM, Andrew Haley wrote: > >> >> On 26/05/17 03:20, David Holmes wrote: > >> >>> Any variable passed to an OrderAccess, or Atomic, function should be > >> >>> volatile to minimise the chances the C compiler will do something > >> >>> unexpected with it. > >> >> > >> >> That's not much more than paranoia, IMO. If the barriers are strong > >> >> enough it'll be fine. The problem was, I suppose, with old compilers > >> >> which didn't handle memory barriers properly, but we should be moving > >> >> towards standard ways of doing these things. Standard atomics have > >> >> been available since C++11 (I think) and GCC has had support since > long > >> >> before then. > >> > > >> > The issue isn't just the barriers that might be involved inside > >> > orderAccess methods. If these variables are being used in racy > >> > lock-free code then they should be marked volatile to ensure other > >> > compiler optimizations don't interfere. Perhaps that is paranoia, > >> > but I'd rather a little harmless paranoia than try to debug what > >> > might otherwise go wrong. > >> > >> I'm always leery of this kind of reasoning because the hardware I most > >> care about has a very weakly-ordered memory system and will reorder > >> everything in the absence of synchronization. If it is actually > >> necessary to use volatile on a TSO machine to get multi-thread > >> ordering then it is almost certainly incorrect code, because volatile > >> is not sufficient to do what is needed on non-TSO hardware. > >> > >> So, if you "fix" code on a TSO machine by using volatile, you are > >> making work for me because I'll have to debug it on a non-TSO machine. > >> Fix it in a portable way by using the correct primitives and it's > >> correct everywhere, it's easier to reason about, and you lost nothing. > >> > >> -- > >> Andrew Haley > >> Java Platform Lead Engineer > >> Red Hat UK Ltd. > >> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > >> > > > > > From aph at redhat.com Fri May 26 16:09:33 2017 From: aph at redhat.com (Andrew Haley) Date: Fri, 26 May 2017 17:09:33 +0100 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> <3d963255-2006-de03-4b81-87a3dd64ef72@redhat.com> Message-ID: <45c4ea7f-3137-76ae-d20a-df35922f4266@redhat.com> On 26/05/17 17:03, Volker Simonis wrote: > Volatile not only prevents reordering by the compiler. It also > prevents other, otherwise legal transformations/optimizations (like > for example reloading a variable [1]) which have to be prevented in > order to write correct, lock free programs. Yes, but so do compiler barriers. 
> So I think declaring the variables involved in such algorithms > volatile is currently still necessary. IMO, only if compiler barriers don't work; and that implies broken compilers. But from the responses I've seen, the assumption is that the compilers used to build HotSpot are broken. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From kim.barrett at oracle.com Sat May 27 02:44:47 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 26 May 2017 22:44:47 -0400 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: <45c4ea7f-3137-76ae-d20a-df35922f4266@redhat.com> References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> <3d963255-2006-de03-4b81-87a3dd64ef72@redhat.com> <45c4ea7f-3137-76ae-d20a-df35922f4266@redhat.com> Message-ID: > On May 26, 2017, at 12:09 PM, Andrew Haley wrote: > > On 26/05/17 17:03, Volker Simonis wrote: > >> Volatile not only prevents reordering by the compiler. It also >> prevents other, otherwise legal transformations/optimizations (like >> for example reloading a variable [1]) which have to be prevented in >> order to write correct, lock free programs. > > Yes, but so do compiler barriers. > >> So I think declaring the variables involved in such algorithms >> volatile is currently still necessary. > > IMO, only if compiler barriers don't work; and that implies broken > compilers. But from the responses I've seen, the assumption is that > the compilers used to build HotSpot are broken. Compiler barriers don't work if they aren't present. And for TCO systems, that problem exists in jdk8. It was Erik O's jdk9 changes that introduced compiler barriers. Before then, in code like the following: x = new_x; OrderAccess::release_store(&y, new_y); on TCO systems, the compiler was free to move the store of x after the store of y if x is not volatile, because there is no compile barrier in the release_store. Old compilers tended to treat volatile accesses as stronger constraints than required by the standard. Newer compilers, not so much. Hence the sprinkling of volatile pixie dust. It might be worthwhile backporting the compile barriers to jdk8. It's a separate question whether y needs to be volatile. In that snippet, strictly speaking, it doesn't, as the release_store parameter takes care of that. However, there's been a sort of informal use of volatile declarations to flag such variables are "interesting" and as a sort of marker for future std::atomic<> or the like. 
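A minimal sketch of the jdk8-era hazard described here, together with the kind of compiler-only barrier the jdk9 rework added (GCC-style inline asm; the helper names are invented and this is a simplification, not the actual OrderAccess sources):

    // jdk8-style: the release store is just a store through a volatile
    // pointer. Under TSO the hardware preserves the order, but nothing stops
    // the compiler from sinking a preceding non-volatile store below it.
    static inline void release_store_old(volatile int* p, int v) {
      *p = v;
    }

    // jdk9-style: a compiler barrier before the store forbids the compiler
    // from moving any memory access across it.
    static inline void release_store_new(volatile int* p, int v) {
      __asm__ volatile ("" : : : "memory");  // compiler-only barrier
      *p = v;
    }

    int x;           // plain, non-volatile
    volatile int y;

    void writer(int new_x, int new_y) {
      x = new_x;
      release_store_old(&y, new_y);  // the store of x may be compiled after
                                     // this; release_store_new() would pin
                                     // the compile-time order
    }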
From kim.barrett at oracle.com Sat May 27 02:45:23 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 26 May 2017 22:45:23 -0400 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> <3d963255-2006-de03-4b81-87a3dd64ef72@redhat.com> Message-ID: > On May 26, 2017, at 12:03 PM, Volker Simonis wrote: > > Volatile not only prevents reordering by the compiler. It also prevents > other, otherwise legal transformations/optimizations (like for example > reloading a variable [1]) which have to be prevented in order to write > correct, lock free programs. > > So I think declaring the variables involved in such algorithms volatile is > currently still necessary. Seems like the thing to do would be to use Atomic::load instead of a bare reference. From aph at redhat.com Sat May 27 06:44:54 2017 From: aph at redhat.com (Andrew Haley) Date: Sat, 27 May 2017 07:44:54 +0100 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> <3d963255-2006-de03-4b81-87a3dd64ef72@redhat.com> <45c4ea7f-3137-76ae-d20a-df35922f4266@redhat.com> Message-ID: On 27/05/17 03:44, Kim Barrett wrote: > It might be worthwhile backporting the compile barriers to jdk8. Certainly, IMO. They're necessary for correctness. Standard C++ now treats all code with data races as undefined behaviour, and we've got to get used to that. The more we tell the C++ compiler about what we want, the more it can optimize and the faster our JVMs will be. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From volker.simonis at gmail.com Sat May 27 08:27:54 2017 From: volker.simonis at gmail.com (Volker Simonis) Date: Sat, 27 May 2017 10:27:54 +0200 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> <3d963255-2006-de03-4b81-87a3dd64ef72@redhat.com> Message-ID: On Sat, May 27, 2017 at 4:45 AM, Kim Barrett wrote: >> On May 26, 2017, at 12:03 PM, Volker Simonis wrote: >> >> Volatile not only prevents reordering by the compiler. 
It also prevents >> other, otherwise legal transformations/optimizations (like for example >> reloading a variable [1]) which have to be prevented in order to write >> correct, lock free programs. >> >> So I think declaring the variables involved in such algorithms volatile is >> currently still necessary. > > Seems like the thing to do would be to use Atomic::load instead of a > bare reference. > Yes, but Atomic::load is not overloaded for oop/narrowOop and the naming doesn't really express what we want to achieve. The proposed fix was to change 'oopDesc::load_heap_oop()' such that it casts its plain pointer argument into a 'pointer to volatile' argument. Unfortunately, I've just realized, that this fix (i.e. JDK-8129440 [2]) was never pushed which I think is bad (we have it in our SAP JVM since long time). The comment on 'oopDesc::load_heap_oop()' clearly states that it is "Called by GC to check for null before decoding". This obviously can not work reliably if the oop is reloaded a second time after the null check (and before the decoding). I don't see how a compiler barrier could help here because this is not a question of reordering. [2] https://bugs.openjdk.java.net/browse/JDK-8129440 From volker.simonis at gmail.com Sat May 27 09:10:33 2017 From: volker.simonis at gmail.com (Volker Simonis) Date: Sat, 27 May 2017 11:10:33 +0200 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: <45c4ea7f-3137-76ae-d20a-df35922f4266@redhat.com> References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <678f73ae-0993-f5ef-e3dc-6a6940dd0a0c@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> <3d963255-2006-de03-4b81-87a3dd64ef72@redhat.com> <45c4ea7f-3137-76ae-d20a-df35922f4266@redhat.com> Message-ID: On Fri, May 26, 2017 at 6:09 PM, Andrew Haley wrote: > On 26/05/17 17:03, Volker Simonis wrote: > >> Volatile not only prevents reordering by the compiler. It also >> prevents other, otherwise legal transformations/optimizations (like >> for example reloading a variable [1]) which have to be prevented in >> order to write correct, lock free programs. > > Yes, but so do compiler barriers. > Please correct me if I'm wrong, but I thought "compiler barriers" are to prevent reordering by the compiler. However, this is a question of optimization. If you have two subsequent loads from the same address, the compiler is free to do only the first load and keep the value in a register if the address is not pointing to a volatile value. This is one of the well known semantics of volatile. But there's another, less known 'optimization' which is possible, if an address is not pointing to a volatile value. If there's just a single load, the compiler is free to reload that value a second time later on (instead of spilling it to the stack or to another register). And that was exactly the problem with JDK-8129440 [2]: static inline oop load_heap_oop(oop* p) { return *p; } ... template inline void G1RootRegionScanClosure::do_oop_nv(T* p) { // 1. load 'heap_oop' from 'p' T heap_oop = oopDesc::load_heap_oop(p); if (!oopDesc::is_null(heap_oop)) { // 2. Compiler reloads 'heap_oop' from 'p' which may now be null! 
oop obj = oopDesc::decode_heap_oop_not_null(heap_oop); HeapRegion* hr = _g1h->heap_region_containing((HeapWord*) obj); _cm->grayRoot(obj, hr); } } How would a compiler barrier help here? How would it look like and where would it have to be placed to? I think this problem can currently only be solved reliably by declaring the loaded value 'volatile'. Regards, Volker [2] https://bugs.openjdk.java.net/browse/JDK-8129440 >> So I think declaring the variables involved in such algorithms >> volatile is currently still necessary. > > IMO, only if compiler barriers don't work; and that implies broken > compilers. But from the responses I've seen, the assumption is that > the compilers used to build HotSpot are broken. > > -- > Andrew Haley > Java Platform Lead Engineer > Red Hat UK Ltd. > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Sun May 28 08:45:19 2017 From: aph at redhat.com (Andrew Haley) Date: Sun, 28 May 2017 09:45:19 +0100 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> <3d963255-2006-de03-4b81-87a3dd64ef72@redhat.com> <45c4ea7f-3137-76ae-d20a-df35922f4266@redhat.com> Message-ID: On 27/05/17 10:10, Volker Simonis wrote: > On Fri, May 26, 2017 at 6:09 PM, Andrew Haley wrote: >> On 26/05/17 17:03, Volker Simonis wrote: >> >>> Volatile not only prevents reordering by the compiler. It also >>> prevents other, otherwise legal transformations/optimizations (like >>> for example reloading a variable [1]) which have to be prevented in >>> order to write correct, lock free programs. >> >> Yes, but so do compiler barriers. > > Please correct me if I'm wrong, but I thought "compiler barriers" are > to prevent reordering by the compiler. However, this is a question of > optimization. If you have two subsequent loads from the same address, > the compiler is free to do only the first load and keep the value in a > register if the address is not pointing to a volatile value. No it isn't: that is precisely what a compiler barrier prevents. A compiler barrier (from the POV of the compiler) clobbers all of the memory state. Neither reads nor writes may move past a compiler barrier. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. 
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From erik.osterlund at oracle.com Mon May 29 12:20:26 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Mon, 29 May 2017 14:20:26 +0200 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> <3d963255-2006-de03-4b81-87a3dd64ef72@redhat.com> <45c4ea7f-3137-76ae-d20a-df35922f4266@redhat.com> Message-ID: <592C120A.1080908@oracle.com> Hi Andrew, I just thought I'd put my opinions in here as I see I have been mentioned a few times already. First of all, I find using the volatile keyword on things that are involved in lock-free protocols meaningful from a readability point of view. It allows the reader of the code to see care is needed here. About the compiler barriers - you are right. Volatile should indeed not be necessary if the compiler barriers do everything right. The compiler should not reorder things and it should not prevent reloading. On windows we rely on the deprecated _ReadWriteBarrier(). According to MSDN, it guarantees: "The _ReadWriteBarrier intrinsic limits the compiler optimizations that can remove or reorder memory accesses across the point of the call." This should cut it. The GCC memory clobber is defined as: "The "memory" clobber tells the compiler that the assembly code performs memory reads or writes to items other than those listed in the input and output operands (for example, accessing the memory pointed to by one of the input parameters). To ensure memory contains correct values, GCC may need to flush specific register values to memory before executing the asm. Further, the compiler does not assume that any values read from memory before an asm remain unchanged after that asm; it reloads them as needed. Using the "memory" clobber effectively forms a read/write memory barrier for the compiler." This seems to only guarantee values will not be re-ordered. But in the documentation for ExtendedAsm it also states: "You will also want to add the volatile keyword if the memory affected is not listed in the inputs or outputs of the asm, as the `memory' clobber does not count as a side-effect of the asm." and "The volatile keyword indicates that the instruction has important side-effects. GCC will not delete a volatile asm if it is reachable. (The instruction can still be deleted if GCC can prove that control-flow will never reach the location of the instruction.) Note that even a volatile asm instruction can be moved relative to other code, including across jump instructions." This is a bit vague, but seems to suggest that by making the asm statement volatile and having a memory clobber, it definitely will not reload variables. About not re-ordering non-volatile accesses, it shouldn't but it is not quite clearly stated. I have never observed such a re-ordering across a volatile memory clobber. But the semantics seem a bit vague. As for clang, the closest to a definition of what it does I have seen is: "A clobber constraint is indicated by a ?~? prefix. A clobber does not consume an input operand, nor generate an output. Clobbers cannot use any of the general constraint code letters ? 
they may use only explicit register constraints, e.g. ?~{eax}?. The one exception is that a clobber string of ?~{memory}? indicates that the assembly writes to arbitrary undeclared memory locations ? not only the memory pointed to by a declared indirect output." Apart from sweeping statements saying clang inline assembly is largely compatible and working similar to GCC, I have not seen clear guarantees. And then there are more compilers. As a conclusion, by using volatile in addition to OrderAccess you rely on standardized compiler semantics (at least for volatile-to-volatile re-orderings and re-loading, but not for volatile-to-nonvolatile, but that's another can of worms), and regrettably if you rely on OrderAccess memory model doing what it says it will do, then it should indeed work without volatile, but to make that work, OrderAccess relies on non-standardized compiler-specific barriers. In practice it should work well on all our supported compilers without volatile. And if it didn't, it would indeed be a bug in OrderAccess that needs to be solved in OrderAccess. Personally though, I am a helmet-on-synchronization kind of person, so I would take precaution anyway and use volatile whenever possible, because 1) it makes the code more readable, and 2) it provides one extra layer of safety that is more standardized. It seems that over the years it has happened multiple times that we assumed OrderAccess is bullet proof, and then realized that it wasn't and observed a crash that would never have happened if the code was written in a helmet-on-synchronization way. At least that's how I feel about it. Now one might argue that by using C++11 atomics that are standardized, all these problems would go away as we would rely in standardized primitives and then just trust the compiler. But then there could arise problems when the C++ compiler decides to be less conservative than we want, e.g. by not doing fence in sequentially consistent loads to optimize for non-multiple copy atomic CPUs arguing that IRIW issues that violate sequential consistency are non-issues in practice. That makes those loads "almost" sequentially consistent, which might be good enough. But it feels good to have a choice here to be more conservative. To have the synchronization helmet on. Meta summary: 1) Current OrderAccess without volatile: - should work, but relies on compiler-specific not standardized and sometimes poorly documented compiler barriers. 2) Current OrderAccess with volatile: - relies on standardized volatile semantics to guarantee compiler reordering and reloading issues do not occur. 3) C++11 Atomic backend for OrderAccess - relies on standardized semantics to guarantee compiler and hardware reordering issues - nevertheless isn't always flawless, and when it isn't, it gets painful Hope this sheds some light on the trade-offs. Thanks, /Erik On 2017-05-28 10:45, Andrew Haley wrote: > On 27/05/17 10:10, Volker Simonis wrote: >> On Fri, May 26, 2017 at 6:09 PM, Andrew Haley wrote: >>> On 26/05/17 17:03, Volker Simonis wrote: >>> >>>> Volatile not only prevents reordering by the compiler. It also >>>> prevents other, otherwise legal transformations/optimizations (like >>>> for example reloading a variable [1]) which have to be prevented in >>>> order to write correct, lock free programs. >>> Yes, but so do compiler barriers. >> Please correct me if I'm wrong, but I thought "compiler barriers" are >> to prevent reordering by the compiler. However, this is a question of >> optimization. 
If you have two subsequent loads from the same address, >> the compiler is free to do only the first load and keep the value in a >> register if the address is not pointing to a volatile value. > No it isn't: that is precisely what a compiler barrier prevents. A > compiler barrier (from the POV of the compiler) clobbers all of > the memory state. Neither reads nor writes may move past a compiler > barrier. > From volker.simonis at gmail.com Mon May 29 17:02:25 2017 From: volker.simonis at gmail.com (Volker Simonis) Date: Mon, 29 May 2017 19:02:25 +0200 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: <592C120A.1080908@oracle.com> References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> <3d963255-2006-de03-4b81-87a3dd64ef72@redhat.com> <45c4ea7f-3137-76ae-d20a-df35922f4266@redhat.com> <592C120A.1080908@oracle.com> Message-ID: Hi Erik, thanks for the nice summary. Just for the sake of completeness, here's the corresponding documentation for the xlc compiler barrier [1]. It kind of implements the gcc syntax, but the wording is slightly different: "Add memory to the list of clobbered registers if assembler instructions can change a memory location in an unpredictable fashion. The memory clobber ensures that the data used after the completion of the assembly statement is valid and synchronized. However, the memory clobber can result in many unnecessary reloads, reducing the benefits of hardware prefetching. Thus, the memory clobber can impose a performance penalty and should be used with caution." We haven't used it until now, so I can not say if it really does what it is supposed to do. I'm also concerned about the performance warning. It seems like the "unnecessary reloads" can really hurt on architectures like ppc which have much more registers than x86. Declaring a memory location 'volatile' seems much more simple and light-weight in order to achieve the desired effect. So I tend to agree with you and David that we should proceed to mark things with 'volatile'. Sorry for constantly "spamming" this thread with another problem (i.e. JDK-8129440 [2]) but I still think that it is related and important. In its current state, the way how "load_heap_oop()" and its application works is broken. And this is not because of a problem in OrderAccess, but because of missing compiler barriers: static inline oop load_heap_oop(oop* p) { return *p; } ... template inline void G1RootRegionScanClosure::do_oop_nv(T* p) { // 1. load 'heap_oop' from 'p' T heap_oop = oopDesc::load_heap_oop(p); if (!oopDesc::is_null(heap_oop)) { // 2. Compiler reloads 'heap_oop' from 'p' which may now be null! oop obj = oopDesc::decode_heap_oop_not_null(heap_oop); HeapRegion* hr = _g1h->heap_region_containing((HeapWord*) obj); _cm->grayRoot(obj, hr); } } Notice that we don't need memory barriers here - all we need is to prevent the compiler from loading the oop (i.e. 'heap_oop') a second time. After Andrews explanation (thanks for that!) 
and Martin's examples from Google, I think we could fix this by rewriting 'load_heap_oop()' (and friends) as follows: static inline oop load_heap_oop(oop* p) { oop o = *p; __asm__ volatile ("" : : : "memory"); return o; } In order to make this consistent across all platforms, we would probably have to introduce a new, public "compiler barrier" function in OrderAccess (e.g. 'OrderAccess::compiler_barrier()' because we don't currently seem to have a cross-platform concept for "compiler-only barriers"). But I'm still not convinced that it would be better than simply writing (and that's the way how we've actually solved it internally): static inline oop load_heap_oop(oop* p) { return * (volatile oop*) p; } Declaring that single memory location to be 'volatile' seems to be a much more local change compared to globally "clobbering" all the memory. And it doesn't rely on a the compilers providing a compiler barrier. It does however rely on the compiler doing the "right thing" for volatile - but after all what has been said here so far, that seems more likely? The problem may also depend on the specific compiler/cpu combination. For ppc64, both gcc (on linux) and xlc (on aix), do the right thing for volatile variables - they don't insert any memory barriers (i.e. no instructions) but just access the corresponding variables as if there was a compiler barrier. This is exactly what we currently want in HotSpot, because fine-grained control of memory barriers is controlled by the use of OrderAccess (and OrderAccess implies "compiler barrier", at least after the latest fixes). Any thoughts? Should we introduce a cross-platform, "compiler-only barrier" or should we stick to using "volatile" for such cases? Regards, Volker [1] https://www.ibm.com/support/knowledgecenter/SSGH2K_13.1.0/com.ibm.xlc131.aix.doc/language_ref/asm.html [2] https://bugs.openjdk.java.net/browse/JDK-8129440 On Mon, May 29, 2017 at 2:20 PM, Erik ?sterlund wrote: > Hi Andrew, > > I just thought I'd put my opinions in here as I see I have been mentioned a > few times already. > > First of all, I find using the volatile keyword on things that are involved > in lock-free protocols meaningful from a readability point of view. It > allows the reader of the code to see care is needed here. > > About the compiler barriers - you are right. Volatile should indeed not be > necessary if the compiler barriers do everything right. The compiler should > not reorder things and it should not prevent reloading. > > On windows we rely on the deprecated _ReadWriteBarrier(). According to MSDN, > it guarantees: > > "The _ReadWriteBarrier intrinsic limits the compiler optimizations that can > remove or reorder memory accesses across the point of the call." > > This should cut it. > > The GCC memory clobber is defined as: > > "The "memory" clobber tells the compiler that the assembly code performs > memory reads or writes to items other than those listed in the input and > output operands (for example, accessing the memory pointed to by one of the > input parameters). To ensure memory contains correct values, GCC may need to > flush specific register values to memory before executing the asm. Further, > the compiler does not assume that any values read from memory before an asm > remain unchanged after that asm; it reloads them as needed. Using the > "memory" clobber effectively forms a read/write memory barrier for the > compiler." > > This seems to only guarantee values will not be re-ordered. 
But in the > documentation for ExtendedAsm it also states: > > "You will also want to add the volatile keyword if the memory affected is > not listed in the inputs or outputs of the asm, as the `memory' clobber does > not count as a side-effect of the asm." > > and > > "The volatile keyword indicates that the instruction has important > side-effects. GCC will not delete a volatile asm if it is reachable. (The > instruction can still be deleted if GCC can prove that control-flow will > never reach the location of the instruction.) Note that even a volatile asm > instruction can be moved relative to other code, including across jump > instructions." > > This is a bit vague, but seems to suggest that by making the asm statement > volatile and having a memory clobber, it definitely will not reload > variables. About not re-ordering non-volatile accesses, it shouldn't but it > is not quite clearly stated. I have never observed such a re-ordering across > a volatile memory clobber. But the semantics seem a bit vague. > > As for clang, the closest to a definition of what it does I have seen is: > > "A clobber constraint is indicated by a ?~? prefix. A clobber does not > consume an input operand, nor generate an output. Clobbers cannot use any of > the general constraint code letters ? they may use only explicit register > constraints, e.g. ?~{eax}?. The one exception is that a clobber string of > ?~{memory}? indicates that the assembly writes to arbitrary undeclared > memory locations ? not only the memory pointed to by a declared indirect > output." > > Apart from sweeping statements saying clang inline assembly is largely > compatible and working similar to GCC, I have not seen clear guarantees. And > then there are more compilers. > > As a conclusion, by using volatile in addition to OrderAccess you rely on > standardized compiler semantics (at least for volatile-to-volatile > re-orderings and re-loading, but not for volatile-to-nonvolatile, but that's > another can of worms), and regrettably if you rely on OrderAccess memory > model doing what it says it will do, then it should indeed work without > volatile, but to make that work, OrderAccess relies on non-standardized > compiler-specific barriers. In practice it should work well on all our > supported compilers without volatile. And if it didn't, it would indeed be a > bug in OrderAccess that needs to be solved in OrderAccess. > > Personally though, I am a helmet-on-synchronization kind of person, so I > would take precaution anyway and use volatile whenever possible, because 1) > it makes the code more readable, and 2) it provides one extra layer of > safety that is more standardized. It seems that over the years it has > happened multiple times that we assumed OrderAccess is bullet proof, and > then realized that it wasn't and observed a crash that would never have > happened if the code was written in a helmet-on-synchronization way. At > least that's how I feel about it. > > Now one might argue that by using C++11 atomics that are standardized, all > these problems would go away as we would rely in standardized primitives and > then just trust the compiler. But then there could arise problems when the > C++ compiler decides to be less conservative than we want, e.g. by not doing > fence in sequentially consistent loads to optimize for non-multiple copy > atomic CPUs arguing that IRIW issues that violate sequential consistency are > non-issues in practice. 
That makes those loads "almost" sequentially > consistent, which might be good enough. But it feels good to have a choice > here to be more conservative. To have the synchronization helmet on. > > Meta summary: > 1) Current OrderAccess without volatile: > - should work, but relies on compiler-specific not standardized and > sometimes poorly documented compiler barriers. > > 2) Current OrderAccess with volatile: > - relies on standardized volatile semantics to guarantee compiler > reordering and reloading issues do not occur. > > 3) C++11 Atomic backend for OrderAccess > - relies on standardized semantics to guarantee compiler and hardware > reordering issues > - nevertheless isn't always flawless, and when it isn't, it gets painful > > Hope this sheds some light on the trade-offs. > > Thanks, > /Erik > > > On 2017-05-28 10:45, Andrew Haley wrote: >> >> On 27/05/17 10:10, Volker Simonis wrote: >>> >>> On Fri, May 26, 2017 at 6:09 PM, Andrew Haley wrote: >>>> >>>> On 26/05/17 17:03, Volker Simonis wrote: >>>> >>>>> Volatile not only prevents reordering by the compiler. It also >>>>> prevents other, otherwise legal transformations/optimizations (like >>>>> for example reloading a variable [1]) which have to be prevented in >>>>> order to write correct, lock free programs. >>>> >>>> Yes, but so do compiler barriers. >>> >>> Please correct me if I'm wrong, but I thought "compiler barriers" are >>> to prevent reordering by the compiler. However, this is a question of >>> optimization. If you have two subsequent loads from the same address, >>> the compiler is free to do only the first load and keep the value in a >>> register if the address is not pointing to a volatile value. >> >> No it isn't: that is precisely what a compiler barrier prevents. A >> compiler barrier (from the POV of the compiler) clobbers all of >> the memory state. Neither reads nor writes may move past a compiler >> barrier. >> > From erik.osterlund at oracle.com Mon May 29 17:56:22 2017 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Mon, 29 May 2017 19:56:22 +0200 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> <3d963255-2006-de03-4b81-87a3dd64ef72@redhat.com> <45c4ea7f-3137-76ae-d20a-df35922f4266@redhat.com> <592C120A.1080908@oracle.com> Message-ID: Hi Volker, Thank you for filling in more compiler info. If there is a choice between providing a new compiler barrier interface and defining its semantics vs using existing volatile semantics, then volatile semantics seems better to me. Also, my new Access API allows you to do BasicAccess::load_oop(addr) to perform load_heap_oop and load_decode_heap_oop with volatile semantics. Sounds like that would help here. Thanks, /Erik > On 29 May 2017, at 19:02, Volker Simonis wrote: > > Hi Erik, > > thanks for the nice summary. Just for the sake of completeness, here's > the corresponding documentation for the xlc compiler barrier [1]. It > kind of implements the gcc syntax, but the wording is slightly > different: > > "Add memory to the list of clobbered registers if assembler > instructions can change a memory location in an unpredictable fashion. 
> The memory clobber ensures that the data used after the completion of > the assembly statement is valid and synchronized. > However, the memory clobber can result in many unnecessary reloads, > reducing the benefits of hardware prefetching. Thus, the memory > clobber can impose a performance penalty and should be used with > caution." > > We haven't used it until now, so I can not say if it really does what > it is supposed to do. I'm also concerned about the performance > warning. It seems like the "unnecessary reloads" can really hurt on > architectures like ppc which have much more registers than x86. > Declaring a memory location 'volatile' seems much more simple and > light-weight in order to achieve the desired effect. So I tend to > agree with you and David that we should proceed to mark things with > 'volatile'. > > Sorry for constantly "spamming" this thread with another problem (i.e. > JDK-8129440 [2]) but I still think that it is related and important. > In its current state, the way how "load_heap_oop()" and its > application works is broken. And this is not because of a problem in > OrderAccess, but because of missing compiler barriers: > > static inline oop load_heap_oop(oop* p) { return *p; } > ... > template > inline void G1RootRegionScanClosure::do_oop_nv(T* p) { > // 1. load 'heap_oop' from 'p' > T heap_oop = oopDesc::load_heap_oop(p); > if (!oopDesc::is_null(heap_oop)) { > // 2. Compiler reloads 'heap_oop' from 'p' which may now be null! > oop obj = oopDesc::decode_heap_oop_not_null(heap_oop); > HeapRegion* hr = _g1h->heap_region_containing((HeapWord*) obj); > _cm->grayRoot(obj, hr); > } > } > > Notice that we don't need memory barriers here - all we need is to > prevent the compiler from loading the oop (i.e. 'heap_oop') a second > time. After Andrews explanation (thanks for that!) and Martin's > examples from Google, I think we could fix this by rewriting > 'load_heap_oop()' (and friends) as follows: > > static inline oop load_heap_oop(oop* p) { > oop o = *p; > __asm__ volatile ("" : : : "memory"); > return o; > } > > In order to make this consistent across all platforms, we would > probably have to introduce a new, public "compiler barrier" function > in OrderAccess (e.g. 'OrderAccess::compiler_barrier()' because we > don't currently seem to have a cross-platform concept for > "compiler-only barriers"). But I'm still not convinced that it would > be better than simply writing (and that's the way how we've actually > solved it internally): > > static inline oop load_heap_oop(oop* p) { return * (volatile oop*) p; } > > Declaring that single memory location to be 'volatile' seems to be a > much more local change compared to globally "clobbering" all the > memory. And it doesn't rely on a the compilers providing a compiler > barrier. It does however rely on the compiler doing the "right thing" > for volatile - but after all what has been said here so far, that > seems more likely? > > The problem may also depend on the specific compiler/cpu combination. > For ppc64, both gcc (on linux) and xlc (on aix), do the right thing > for volatile variables - they don't insert any memory barriers (i.e. > no instructions) but just access the corresponding variables as if > there was a compiler barrier. This is exactly what we currently want > in HotSpot, because fine-grained control of memory barriers is > controlled by the use of OrderAccess (and OrderAccess implies > "compiler barrier", at least after the latest fixes). > > Any thoughts? 
Should we introduce a cross-platform, "compiler-only > barrier" or should we stick to using "volatile" for such cases? > > Regards, > Volker > > [1] https://www.ibm.com/support/knowledgecenter/SSGH2K_13.1.0/com.ibm.xlc131.aix.doc/language_ref/asm.html > [2] https://bugs.openjdk.java.net/browse/JDK-8129440 > > On Mon, May 29, 2017 at 2:20 PM, Erik ?sterlund > wrote: >> Hi Andrew, >> >> I just thought I'd put my opinions in here as I see I have been mentioned a >> few times already. >> >> First of all, I find using the volatile keyword on things that are involved >> in lock-free protocols meaningful from a readability point of view. It >> allows the reader of the code to see care is needed here. >> >> About the compiler barriers - you are right. Volatile should indeed not be >> necessary if the compiler barriers do everything right. The compiler should >> not reorder things and it should not prevent reloading. >> >> On windows we rely on the deprecated _ReadWriteBarrier(). According to MSDN, >> it guarantees: >> >> "The _ReadWriteBarrier intrinsic limits the compiler optimizations that can >> remove or reorder memory accesses across the point of the call." >> >> This should cut it. >> >> The GCC memory clobber is defined as: >> >> "The "memory" clobber tells the compiler that the assembly code performs >> memory reads or writes to items other than those listed in the input and >> output operands (for example, accessing the memory pointed to by one of the >> input parameters). To ensure memory contains correct values, GCC may need to >> flush specific register values to memory before executing the asm. Further, >> the compiler does not assume that any values read from memory before an asm >> remain unchanged after that asm; it reloads them as needed. Using the >> "memory" clobber effectively forms a read/write memory barrier for the >> compiler." >> >> This seems to only guarantee values will not be re-ordered. But in the >> documentation for ExtendedAsm it also states: >> >> "You will also want to add the volatile keyword if the memory affected is >> not listed in the inputs or outputs of the asm, as the `memory' clobber does >> not count as a side-effect of the asm." >> >> and >> >> "The volatile keyword indicates that the instruction has important >> side-effects. GCC will not delete a volatile asm if it is reachable. (The >> instruction can still be deleted if GCC can prove that control-flow will >> never reach the location of the instruction.) Note that even a volatile asm >> instruction can be moved relative to other code, including across jump >> instructions." >> >> This is a bit vague, but seems to suggest that by making the asm statement >> volatile and having a memory clobber, it definitely will not reload >> variables. About not re-ordering non-volatile accesses, it shouldn't but it >> is not quite clearly stated. I have never observed such a re-ordering across >> a volatile memory clobber. But the semantics seem a bit vague. >> >> As for clang, the closest to a definition of what it does I have seen is: >> >> "A clobber constraint is indicated by a ?~? prefix. A clobber does not >> consume an input operand, nor generate an output. Clobbers cannot use any of >> the general constraint code letters ? they may use only explicit register >> constraints, e.g. ?~{eax}?. The one exception is that a clobber string of >> ?~{memory}? indicates that the assembly writes to arbitrary undeclared >> memory locations ? not only the memory pointed to by a declared indirect >> output." 
>> >> Apart from sweeping statements saying clang inline assembly is largely >> compatible and working similar to GCC, I have not seen clear guarantees. And >> then there are more compilers. >> >> As a conclusion, by using volatile in addition to OrderAccess you rely on >> standardized compiler semantics (at least for volatile-to-volatile >> re-orderings and re-loading, but not for volatile-to-nonvolatile, but that's >> another can of worms), and regrettably if you rely on OrderAccess memory >> model doing what it says it will do, then it should indeed work without >> volatile, but to make that work, OrderAccess relies on non-standardized >> compiler-specific barriers. In practice it should work well on all our >> supported compilers without volatile. And if it didn't, it would indeed be a >> bug in OrderAccess that needs to be solved in OrderAccess. >> >> Personally though, I am a helmet-on-synchronization kind of person, so I >> would take precaution anyway and use volatile whenever possible, because 1) >> it makes the code more readable, and 2) it provides one extra layer of >> safety that is more standardized. It seems that over the years it has >> happened multiple times that we assumed OrderAccess is bullet proof, and >> then realized that it wasn't and observed a crash that would never have >> happened if the code was written in a helmet-on-synchronization way. At >> least that's how I feel about it. >> >> Now one might argue that by using C++11 atomics that are standardized, all >> these problems would go away as we would rely in standardized primitives and >> then just trust the compiler. But then there could arise problems when the >> C++ compiler decides to be less conservative than we want, e.g. by not doing >> fence in sequentially consistent loads to optimize for non-multiple copy >> atomic CPUs arguing that IRIW issues that violate sequential consistency are >> non-issues in practice. That makes those loads "almost" sequentially >> consistent, which might be good enough. But it feels good to have a choice >> here to be more conservative. To have the synchronization helmet on. >> >> Meta summary: >> 1) Current OrderAccess without volatile: >> - should work, but relies on compiler-specific not standardized and >> sometimes poorly documented compiler barriers. >> >> 2) Current OrderAccess with volatile: >> - relies on standardized volatile semantics to guarantee compiler >> reordering and reloading issues do not occur. >> >> 3) C++11 Atomic backend for OrderAccess >> - relies on standardized semantics to guarantee compiler and hardware >> reordering issues >> - nevertheless isn't always flawless, and when it isn't, it gets painful >> >> Hope this sheds some light on the trade-offs. >> >> Thanks, >> /Erik >> >> >>> On 2017-05-28 10:45, Andrew Haley wrote: >>> >>>> On 27/05/17 10:10, Volker Simonis wrote: >>>> >>>>> On Fri, May 26, 2017 at 6:09 PM, Andrew Haley wrote: >>>>> >>>>>> On 26/05/17 17:03, Volker Simonis wrote: >>>>>> >>>>>> Volatile not only prevents reordering by the compiler. It also >>>>>> prevents other, otherwise legal transformations/optimizations (like >>>>>> for example reloading a variable [1]) which have to be prevented in >>>>>> order to write correct, lock free programs. >>>>> >>>>> Yes, but so do compiler barriers. >>>> >>>> Please correct me if I'm wrong, but I thought "compiler barriers" are >>>> to prevent reordering by the compiler. However, this is a question of >>>> optimization. 
If you have two subsequent loads from the same address, >>>> the compiler is free to do only the first load and keep the value in a >>>> register if the address is not pointing to a volatile value. >>> >>> No it isn't: that is precisely what a compiler barrier prevents. A >>> compiler barrier (from the POV of the compiler) clobbers all of >>> the memory state. Neither reads nor writes may move past a compiler >>> barrier. >>> >> From david.holmes at oracle.com Mon May 29 20:55:36 2017 From: david.holmes at oracle.com (David Holmes) Date: Tue, 30 May 2017 06:55:36 +1000 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> <3d963255-2006-de03-4b81-87a3dd64ef72@redhat.com> <45c4ea7f-3137-76ae-d20a-df35922f4266@redhat.com> <592C120A.1080908@oracle.com> Message-ID: <561661eb-b463-726a-1d32-84ef5f32af13@oracle.com> On 30/05/2017 3:02 AM, Volker Simonis wrote: > Sorry for constantly "spamming" this thread with another problem (i.e. > JDK-8129440 [2]) but I still think that it is related and important. > In its current state, the way how "load_heap_oop()" and its > application works is broken. And this is not because of a problem in > OrderAccess, but because of missing compiler barriers: > > static inline oop load_heap_oop(oop* p) { return *p; } > ... > template > inline void G1RootRegionScanClosure::do_oop_nv(T* p) { > // 1. load 'heap_oop' from 'p' > T heap_oop = oopDesc::load_heap_oop(p); > if (!oopDesc::is_null(heap_oop)) { > // 2. Compiler reloads 'heap_oop' from 'p' which may now be null! > oop obj = oopDesc::decode_heap_oop_not_null(heap_oop); Do you mean that the compiler has not stashed heap_oop somewhere and re-executes oopDesc::load_heap_oop(p) again? That would be quite nasty I think in general as it breaks any logic that wants to read a non-local variable once to get it into a local and reuse that knowing that it won't change even if the real variable does! David ----- > HeapRegion* hr = _g1h->heap_region_containing((HeapWord*) obj); > _cm->grayRoot(obj, hr); > } > } > > Notice that we don't need memory barriers here - all we need is to > prevent the compiler from loading the oop (i.e. 'heap_oop') a second > time. After Andrews explanation (thanks for that!) and Martin's > examples from Google, I think we could fix this by rewriting > 'load_heap_oop()' (and friends) as follows: > > static inline oop load_heap_oop(oop* p) { > oop o = *p; > __asm__ volatile ("" : : : "memory"); > return o; > } > > In order to make this consistent across all platforms, we would > probably have to introduce a new, public "compiler barrier" function > in OrderAccess (e.g. 'OrderAccess::compiler_barrier()' because we > don't currently seem to have a cross-platform concept for > "compiler-only barriers"). But I'm still not convinced that it would > be better than simply writing (and that's the way how we've actually > solved it internally): > > static inline oop load_heap_oop(oop* p) { return * (volatile oop*) p; } > > Declaring that single memory location to be 'volatile' seems to be a > much more local change compared to globally "clobbering" all the > memory. And it doesn't rely on a the compilers providing a compiler > barrier. 
It does however rely on the compiler doing the "right thing" > for volatile - but after all what has been said here so far, that > seems more likely? > > The problem may also depend on the specific compiler/cpu combination. > For ppc64, both gcc (on linux) and xlc (on aix), do the right thing > for volatile variables - they don't insert any memory barriers (i.e. > no instructions) but just access the corresponding variables as if > there was a compiler barrier. This is exactly what we currently want > in HotSpot, because fine-grained control of memory barriers is > controlled by the use of OrderAccess (and OrderAccess implies > "compiler barrier", at least after the latest fixes). > > Any thoughts? Should we introduce a cross-platform, "compiler-only > barrier" or should we stick to using "volatile" for such cases? > > Regards, > Volker > > [1] https://www.ibm.com/support/knowledgecenter/SSGH2K_13.1.0/com.ibm.xlc131.aix.doc/language_ref/asm.html > [2] https://bugs.openjdk.java.net/browse/JDK-8129440 > > On Mon, May 29, 2017 at 2:20 PM, Erik ?sterlund > wrote: >> Hi Andrew, >> >> I just thought I'd put my opinions in here as I see I have been mentioned a >> few times already. >> >> First of all, I find using the volatile keyword on things that are involved >> in lock-free protocols meaningful from a readability point of view. It >> allows the reader of the code to see care is needed here. >> >> About the compiler barriers - you are right. Volatile should indeed not be >> necessary if the compiler barriers do everything right. The compiler should >> not reorder things and it should not prevent reloading. >> >> On windows we rely on the deprecated _ReadWriteBarrier(). According to MSDN, >> it guarantees: >> >> "The _ReadWriteBarrier intrinsic limits the compiler optimizations that can >> remove or reorder memory accesses across the point of the call." >> >> This should cut it. >> >> The GCC memory clobber is defined as: >> >> "The "memory" clobber tells the compiler that the assembly code performs >> memory reads or writes to items other than those listed in the input and >> output operands (for example, accessing the memory pointed to by one of the >> input parameters). To ensure memory contains correct values, GCC may need to >> flush specific register values to memory before executing the asm. Further, >> the compiler does not assume that any values read from memory before an asm >> remain unchanged after that asm; it reloads them as needed. Using the >> "memory" clobber effectively forms a read/write memory barrier for the >> compiler." >> >> This seems to only guarantee values will not be re-ordered. But in the >> documentation for ExtendedAsm it also states: >> >> "You will also want to add the volatile keyword if the memory affected is >> not listed in the inputs or outputs of the asm, as the `memory' clobber does >> not count as a side-effect of the asm." >> >> and >> >> "The volatile keyword indicates that the instruction has important >> side-effects. GCC will not delete a volatile asm if it is reachable. (The >> instruction can still be deleted if GCC can prove that control-flow will >> never reach the location of the instruction.) Note that even a volatile asm >> instruction can be moved relative to other code, including across jump >> instructions." >> >> This is a bit vague, but seems to suggest that by making the asm statement >> volatile and having a memory clobber, it definitely will not reload >> variables. 
About not re-ordering non-volatile accesses, it shouldn't but it >> is not quite clearly stated. I have never observed such a re-ordering across >> a volatile memory clobber. But the semantics seem a bit vague. >> >> As for clang, the closest to a definition of what it does I have seen is: >> >> "A clobber constraint is indicated by a ?~? prefix. A clobber does not >> consume an input operand, nor generate an output. Clobbers cannot use any of >> the general constraint code letters ? they may use only explicit register >> constraints, e.g. ?~{eax}?. The one exception is that a clobber string of >> ?~{memory}? indicates that the assembly writes to arbitrary undeclared >> memory locations ? not only the memory pointed to by a declared indirect >> output." >> >> Apart from sweeping statements saying clang inline assembly is largely >> compatible and working similar to GCC, I have not seen clear guarantees. And >> then there are more compilers. >> >> As a conclusion, by using volatile in addition to OrderAccess you rely on >> standardized compiler semantics (at least for volatile-to-volatile >> re-orderings and re-loading, but not for volatile-to-nonvolatile, but that's >> another can of worms), and regrettably if you rely on OrderAccess memory >> model doing what it says it will do, then it should indeed work without >> volatile, but to make that work, OrderAccess relies on non-standardized >> compiler-specific barriers. In practice it should work well on all our >> supported compilers without volatile. And if it didn't, it would indeed be a >> bug in OrderAccess that needs to be solved in OrderAccess. >> >> Personally though, I am a helmet-on-synchronization kind of person, so I >> would take precaution anyway and use volatile whenever possible, because 1) >> it makes the code more readable, and 2) it provides one extra layer of >> safety that is more standardized. It seems that over the years it has >> happened multiple times that we assumed OrderAccess is bullet proof, and >> then realized that it wasn't and observed a crash that would never have >> happened if the code was written in a helmet-on-synchronization way. At >> least that's how I feel about it. >> >> Now one might argue that by using C++11 atomics that are standardized, all >> these problems would go away as we would rely in standardized primitives and >> then just trust the compiler. But then there could arise problems when the >> C++ compiler decides to be less conservative than we want, e.g. by not doing >> fence in sequentially consistent loads to optimize for non-multiple copy >> atomic CPUs arguing that IRIW issues that violate sequential consistency are >> non-issues in practice. That makes those loads "almost" sequentially >> consistent, which might be good enough. But it feels good to have a choice >> here to be more conservative. To have the synchronization helmet on. >> >> Meta summary: >> 1) Current OrderAccess without volatile: >> - should work, but relies on compiler-specific not standardized and >> sometimes poorly documented compiler barriers. >> >> 2) Current OrderAccess with volatile: >> - relies on standardized volatile semantics to guarantee compiler >> reordering and reloading issues do not occur. >> >> 3) C++11 Atomic backend for OrderAccess >> - relies on standardized semantics to guarantee compiler and hardware >> reordering issues >> - nevertheless isn't always flawless, and when it isn't, it gets painful >> >> Hope this sheds some light on the trade-offs. 
>> >> Thanks, >> /Erik >> >> >> On 2017-05-28 10:45, Andrew Haley wrote: >>> >>> On 27/05/17 10:10, Volker Simonis wrote: >>>> >>>> On Fri, May 26, 2017 at 6:09 PM, Andrew Haley wrote: >>>>> >>>>> On 26/05/17 17:03, Volker Simonis wrote: >>>>> >>>>>> Volatile not only prevents reordering by the compiler. It also >>>>>> prevents other, otherwise legal transformations/optimizations (like >>>>>> for example reloading a variable [1]) which have to be prevented in >>>>>> order to write correct, lock free programs. >>>>> >>>>> Yes, but so do compiler barriers. >>>> >>>> Please correct me if I'm wrong, but I thought "compiler barriers" are >>>> to prevent reordering by the compiler. However, this is a question of >>>> optimization. If you have two subsequent loads from the same address, >>>> the compiler is free to do only the first load and keep the value in a >>>> register if the address is not pointing to a volatile value. >>> >>> No it isn't: that is precisely what a compiler barrier prevents. A >>> compiler barrier (from the POV of the compiler) clobbers all of >>> the memory state. Neither reads nor writes may move past a compiler >>> barrier. >>> >> From aph at redhat.com Tue May 30 08:50:36 2017 From: aph at redhat.com (Andrew Haley) Date: Tue, 30 May 2017 09:50:36 +0100 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: <592C120A.1080908@oracle.com> References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> <3d963255-2006-de03-4b81-87a3dd64ef72@redhat.com> <45c4ea7f-3137-76ae-d20a-df35922f4266@redhat.com> <592C120A.1080908@oracle.com> Message-ID: <6e340bdf-da1a-5a6b-6db4-8d040cdf0b54@redhat.com> On 29/05/17 13:20, Erik ?sterlund wrote: > As a conclusion, by using volatile in addition to OrderAccess you > rely on standardized compiler semantics (at least for > volatile-to-volatile re-orderings and re-loading, but not for > volatile-to-nonvolatile, but that's another can of worms), and > regrettably if you rely on OrderAccess memory model doing what it > says it will do, then it should indeed work without volatile, but to > make that work, OrderAccess relies on non-standardized > compiler-specific barriers. In practice it should work well on all > our supported compilers without volatile. And if it didn't, it would > indeed be a bug in OrderAccess that needs to be solved in > OrderAccess. Right. And, target by target, we can rework OrderAccess to use real C++11 atomics, and everything will be better. Eventually. It's important that we do so because racy accesses are undefined behaviour in C++11. (And, arguably, before that, but I'm not going to go there.) > Personally though, I am a helmet-on-synchronization kind of person, > so I would take precaution anyway and use volatile whenever > possible, because 1) it makes the code more readable, and 2) it > provides one extra layer of safety that is more standardized. It > seems that over the years it has happened multiple times that we > assumed OrderAccess is bullet proof, and then realized that it > wasn't and observed a crash that would never have happened if the > code was written in a helmet-on-synchronization way. At least that's > how I feel about it. 
I have no problem with that. What I *do* have a problem with is the use of volatile to fix bugs that really need to be corrected with proper barriers. > Now one might argue that by using C++11 atomics that are > standardized, all these problems would go away as we would rely in > standardized primitives and then just trust the compiler. And I absolutely do argue that. In fact, it is the only correct way to go with C++11 compilers. IMO. > But then there could arise problems when the C++ compiler decides to > be less conservative than we want, e.g. by not doing fence in > sequentially consistent loads to optimize for non-multiple copy > atomic CPUs arguing that IRIW issues that violate sequential > consistency are non-issues in practice. A C++ compiler will not decide to do that. C++ compiler authors know well enough what sequential consistency means. Besides, if there is any idiom in the JVM that actually requires IRIW we should remove it as soon as possible. > That makes those loads "almost" sequentially consistent, which might > be good enough. But it feels good to have a choice here to be more > conservative. To have the synchronization helmet on. I have no real problem with that. Using volatile has the problem, from my point of view, that it might conceal bugs that would be revealed on a weakly-ordered machine that you or I then have to fix, but I can live with it. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Tue May 30 09:04:58 2017 From: aph at redhat.com (Andrew Haley) Date: Tue, 30 May 2017 10:04:58 +0100 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> <3d963255-2006-de03-4b81-87a3dd64ef72@redhat.com> <45c4ea7f-3137-76ae-d20a-df35922f4266@redhat.com> <592C120A.1080908@oracle.com> Message-ID: <0a79603e-6c6d-8eaf-6fe9-0774317f0560@redhat.com> On 29/05/17 18:02, Volker Simonis wrote: > "Add memory to the list of clobbered registers if assembler > instructions can change a memory location in an unpredictable fashion. > The memory clobber ensures that the data used after the completion of > the assembly statement is valid and synchronized. > However, the memory clobber can result in many unnecessary reloads, > reducing the benefits of hardware prefetching. Thus, the memory > clobber can impose a performance penalty and should be used with > caution." > > We haven't used it until now, so I can not say if it really does what > it is supposed to do. I'm also concerned about the performance > warning. It seems like the "unnecessary reloads" can really hurt on > architectures like ppc which have much more registers than x86. > Declaring a memory location 'volatile' seems much more simple and > light-weight in order to achieve the desired effect. So I tend to > agree with you and David that we should proceed to mark things with > 'volatile'. If volatile is what is needed, yes. The problem that we're discussing is that on x86, OrderAccess was actually incorrect: it should work with all accesses, not just volatile ones. The addition of volatile was potentially papering over a bug. 
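A small illustration of that "papering over" point, assuming a strongly ordered machine and made-up variable names (data, flag, publish):

    #include <stdint.h>

    int32_t data;            // plain field being published
    volatile int32_t flag;   // only the flag is volatile

    void publish(int32_t v) {
      data = v;   // volatile on 'flag' orders it only against other
      flag = 1;   // volatile accesses; the plain store to 'data' may in
                  // principle still be sunk below the store to 'flag'.
    }

In practice most compilers are conservative here, which is why sprinkling volatile often appears to fix such bugs; the ordering is only actually guaranteed once a compiler barrier (or a real atomic release store) sits between the two stores.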
> Sorry for constantly "spamming" this thread with another problem (i.e. > JDK-8129440 [2]) but I still think that it is related and important. > In its current state, the way how "load_heap_oop()" and its > application works is broken. And this is not because of a problem in > OrderAccess, but because of missing compiler barriers: > > static inline oop load_heap_oop(oop* p) { return *p; } > ... > template > inline void G1RootRegionScanClosure::do_oop_nv(T* p) { > // 1. load 'heap_oop' from 'p' > T heap_oop = oopDesc::load_heap_oop(p); > if (!oopDesc::is_null(heap_oop)) { > // 2. Compiler reloads 'heap_oop' from 'p' which may now be null! > oop obj = oopDesc::decode_heap_oop_not_null(heap_oop); > HeapRegion* hr = _g1h->heap_region_containing((HeapWord*) obj); > _cm->grayRoot(obj, hr); > } > } > > Notice that we don't need memory barriers here - all we need is to > prevent the compiler from loading the oop (i.e. 'heap_oop') a second > time. After Andrews explanation (thanks for that!) and Martin's > examples from Google, I think we could fix this by rewriting > 'load_heap_oop()' (and friends) as follows: > > static inline oop load_heap_oop(oop* p) { > oop o = *p; > __asm__ volatile ("" : : : "memory"); > return o; > } I wouldn't do that: it's much too violent an action because it clobbers all of memory. You don't want to do it every time anyone reads an oop from the heap. > In order to make this consistent across all platforms, we would > probably have to introduce a new, public "compiler barrier" function > in OrderAccess (e.g. 'OrderAccess::compiler_barrier()' because we > don't currently seem to have a cross-platform concept for > "compiler-only barriers"). But I'm still not convinced that it would > be better than simply writing (and that's the way how we've actually > solved it internally): > > static inline oop load_heap_oop(oop* p) { return * (volatile oop*) p; } That looks better. It's still UB post-C++11, but it should be OK. > Declaring that single memory location to be 'volatile' seems to be a > much more local change compared to globally "clobbering" all the > memory. And it doesn't rely on a the compilers providing a compiler > barrier. It does however rely on the compiler doing the "right thing" > for volatile - but after all what has been said here so far, that > seems more likely? It does. The problem here is that the compiler is not being told what is going on, and as the saying goes 'If you lie to the compiler, it will get its revenge.' > Any thoughts? Should we introduce a cross-platform, "compiler-only > barrier" or should we stick to using "volatile" for such cases? Eventually it will have to be C++11 atomics, which give you exactly the language you need to express this stuff. The above would be a relaxed atomic load. Andrew. 
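For comparison, a self-contained sketch of what such a relaxed atomic load looks like in C++11; 'slot' and 'scan' are hypothetical stand-ins, not HotSpot types:

    #include <atomic>
    #include <cstdio>

    // Hypothetical field, declared atomic from the start.
    static std::atomic<int*> slot{nullptr};

    void scan() {
      // Exactly one load is emitted, with no fences, and the compiler is
      // not allowed to silently re-read 'slot' in place of 'p' later on.
      int* p = slot.load(std::memory_order_relaxed);
      if (p != nullptr) {
        // 'p' is still non-null here even if another thread has cleared
        // the slot in the meantime - the null check and the use operate on
        // the same loaded value, which is the property the load_heap_oop()
        // discussion above is after.
        std::printf("%d\n", *p);
      }
    }

    int main() {
      static int value = 42;
      slot.store(&value, std::memory_order_relaxed);
      scan();
      return 0;
    }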
From aph at redhat.com Tue May 30 09:59:21 2017 From: aph at redhat.com (Andrew Haley) Date: Tue, 30 May 2017 10:59:21 +0100 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: <0a79603e-6c6d-8eaf-6fe9-0774317f0560@redhat.com> References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> <3d963255-2006-de03-4b81-87a3dd64ef72@redhat.com> <45c4ea7f-3137-76ae-d20a-df35922f4266@redhat.com> <592C120A.1080908@oracle.com> <0a79603e-6c6d-8eaf-6fe9-0774317f0560@redhat.com> Message-ID: Just to be clear on my position: we want to constrain the compiler as little as we can, while maintaining correctness. x86 OrderAccess was incorrect, so compiler barriers had to be added. Where volatile is sufficient we should use that today, but bear in mind that racy accesses are now undefined behaviour in C++. Andrew. From erik.osterlund at oracle.com Tue May 30 10:57:52 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Tue, 30 May 2017 12:57:52 +0200 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: <6e340bdf-da1a-5a6b-6db4-8d040cdf0b54@redhat.com> References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> <3d963255-2006-de03-4b81-87a3dd64ef72@redhat.com> <45c4ea7f-3137-76ae-d20a-df35922f4266@redhat.com> <592C120A.1080908@oracle.com> <6e340bdf-da1a-5a6b-6db4-8d040cdf0b54@redhat.com> Message-ID: <592D5030.2020904@oracle.com> On 2017-05-30 10:50, Andrew Haley wrote: > On 29/05/17 13:20, Erik ?sterlund wrote: > >> As a conclusion, by using volatile in addition to OrderAccess you >> rely on standardized compiler semantics (at least for >> volatile-to-volatile re-orderings and re-loading, but not for >> volatile-to-nonvolatile, but that's another can of worms), and >> regrettably if you rely on OrderAccess memory model doing what it >> says it will do, then it should indeed work without volatile, but to >> make that work, OrderAccess relies on non-standardized >> compiler-specific barriers. In practice it should work well on all >> our supported compilers without volatile. And if it didn't, it would >> indeed be a bug in OrderAccess that needs to be solved in >> OrderAccess. > Right. And, target by target, we can rework OrderAccess to use real > C++11 atomics, and everything will be better. Eventually. I do not completely disagree, but I see drawbacks with that too. I am not convinced C++11 is a silver bullet. Note that we lose some explicit control that might end up biting us. And when it does, it will be even harder to detect as we have sold ourselves to the C++11 atomic silver bullet, abstracting away the generated code. For example, C++11 atomic accesses were designed to play nicely with other C++11 atomic accesses. Both the load-side and store-side have to look in very specific ways for e.g. the seq_cst semantics to hold. For example depending if you want seq_cst to have IRIW constraints or not, some PPC compiler could choose to have the sync instruction on either the load side or the store side. 
Since all seq_cst accesses are controlled by C++11 and go through their compiler, they can make that choice as long as the accesses stay inside of their "ABI". But the choice needs to be consistent with the choice we make in the JVM and our hand crafted assembly. That is, our hand crafted assembly code has to go by the same "ABI". And we can no longer guarantee it does as we have lost control over what instructions are generated.

One concrete example that comes to mind is the JNIFastGetField optimization on ARMv7. The memory model of ARMv7 does not respect causality between loads and stores. Therefore, in theory (and maybe in practice), problems can arise when three threads are involved in a synchronization dance where consistent causality chains are assumed. In the JNIFastGetField optimization we do the following with hand coded assembly:

1) load safepoint counter (written by VM thread)
2) speculatively load primitive value from object (possibly clobbered by a GC thread)
3) load safepoint counter again (written by VM thread) and check it did not change

These loads are all normal loads in hand coded assembly. Now for this synchronization to work, it is assumed that the store to the safepoint counter observed at 1) happens-before the store observed by the speculative load of the primitive value at 2). Due to the lack of causality in the memory model, this is explicitly not guaranteed to hold with normal loads and stores on ARMv7, and hence unless we had proper synchronization in the runtime, we could observe clobbered values from these optimized JNI getters and think they are okay. But since our OrderAccess::fence translates to dmb sy specifically (which is conservative), the store will bubble up to the top level of the hierarchical memory model of ARMv7, and therefore we can break the pathological causality chain issue in the JVM by issuing OrderAccess::fence when storing the safepoint counter values. That way, our hand crafted assembly will work with normal loads in the fast path. If OrderAccess::fence translated to anything other than dmb sy, this would break. So if we went with the proposed C++11 mappings from https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html then e.g. a seq_cst store on ARMv7 would translate into dmb ish; str; dmb ish, and that would not suffice to break the causality chain.

Of course, it might be that the OS layer saved us from the above pathology anyway, either intentionally or unintentionally, but that is beside the point I am trying to make. The point is that the hand coded assembly and other dynamically generated code have to emit accesses that are compatible with the machine code generated by OrderAccess in the runtime. And when we give that control away to C++11 and abstract away the generated machine code, things might go horribly wrong in the most unexpected and obscure ways that I think would be a nightmare to debug.

Having said that, I am not convinced C++11 is not a good idea either. I would just like to balance out the view that C++11 is a synchronization silver bullet for the JVM that is simply a superior solution without any pitfalls and that doing anything else is wrong. There are things to be considered there as well, like the extent of possible ABI incompatibilities.

> It's important that we do so because racy accesses are undefined
> behaviour in C++11. (And, arguably, before that, but I'm not going to
> go there.)

What paragraph are we referring to here that would break OrderAccess in C++11?
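To make the speculative-read protocol described above concrete, here is a rough C++ rendering (the real JNIFastGetField path is hand-coded assembly; the type and function names here are invented, and volatile only pins the compiler's ordering of the three loads, while the hardware-side ordering on ARMv7 is exactly the dmb sy point made above):

struct SafepointCounter {
  volatile int value;                          // updated by the VM thread around safepoints
};

// Returns true and fills *result only if no safepoint intervened between the
// two counter reads; otherwise the caller must fall back to the slow path.
inline bool speculative_get_int(const SafepointCounter* sc,
                                const int* field,
                                int* result) {
  int before = sc->value;                      // 1) load safepoint counter
  int value  = *(const volatile int*) field;   // 2) speculative field load
  int after  = sc->value;                      // 3) reload counter and compare
  if (before != after) {
    return false;                              //    possibly clobbered by GC
  }
  *result = value;
  return true;
}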
>> Personally though, I am a helmet-on-synchronization kind of person,
>> so I would take precaution anyway and use volatile whenever
>> possible, because 1) it makes the code more readable, and 2) it
>> provides one extra layer of safety that is more standardized. It
>> seems that over the years it has happened multiple times that we
>> assumed OrderAccess is bullet proof, and then realized that it
>> wasn't and observed a crash that would never have happened if the
>> code was written in a helmet-on-synchronization way. At least that's
>> how I feel about it.
> I have no problem with that. What I *do* have a problem with is the
> use of volatile to fix bugs that really need to be corrected with
> proper barriers.

I think you misunderstood me here. I did not propose to use volatile so we don't have to fix bugs in OrderAccess. Conversely, I said if we find such issues, we should definitely fix them in OrderAccess. But despite that, I personally like the pragmatic safety approach, and would use volatile in my lock-free code anyway to make it a) more readable, and b) provide an extra level of safety against our increasingly aggressive compilers. It's like wearing a helmet when biking. You don't expect to fall and should not fall, but why take risks if you don't have to and there is an easy way of preventing disaster if that happens. At least that's how I think about it myself.

>> Now one might argue that by using C++11 atomics that are
>> standardized, all these problems would go away as we would rely in
>> standardized primitives and then just trust the compiler.
> And I absolutely do argue that. In fact, it is the only correct way
> to go with C++11 compilers. IMO.

Not entirely convinced that statement is true, as I think I mentioned previously.

>> But then there could arise problems when the C++ compiler decides to
>> be less conservative than we want, e.g. by not doing fence in
>> sequentially consistent loads to optimize for non-multiple copy
>> atomic CPUs arguing that IRIW issues that violate sequential
>> consistency are non-issues in practice.
> A C++ compiler will not decide to do that. C++ compiler authors know
> well enough what sequential consistency means. Besides, if there is
> any idiom in the JVM that actually requires IRIW we should remove it
> as soon as possible.

At the risk of going slightly off topic... C++ compiler authors have indeed done that in the past. And I have a huge problem with this. I think the exposed model semantics need to be respected. If the model says seq_cst, then the generated code should be seq_cst and not "almost" seq_cst. You can't expect users of a memory model to have to know that it intentionally (rather than accidentally) violates the guarantees because it was considered close enough and that nobody should be able to observe the difference in practice. (don't get me started on that one)

The issue is not whether an algorithm depends on IRIW or not. The issue is that we have to explicitly reason about IRIW to prove that it works. The lack of IRIW violates seq_cst, and by extension linearization points that rely on seq_cst, and by extension algorithms that rely on linearization points. By breaking the very building blocks that were used to reason about algorithms and their correctness, we rely on chance for it to work. The algorithm may or may not work. It probably does work without IRIW constraints in the vast majority of cases.
But we have to explicitly reason about that expanded state machine of possible races caused by IRIW issues to actually know that it works rather than leaving it to chance. Reasoning about this extended state machine can take a lot of work and puts the bar unreasonably high for writing synchronized code in my opinion. And I think the alternative of leaving it to chance (albeit with good odds) seems like an unfortunate choice. >> That makes those loads "almost" sequentially consistent, which might >> be good enough. But it feels good to have a choice here to be more >> conservative. To have the synchronization helmet on. > I have no real problem with that. Using volatile has the problem, > from my point of view, that it might conceal bugs that would be > revealed on a weakly-ordered machine that you or I then have to fix, > but I can live with it. I do not see how using volatile has anything to do with weakly ordered machines. We use it where it is compiler reorderings specifically that need to be prevented. If it is not just a compiler reordering that needed to be prevented, then of course the use of volatile is incorrect and a bug. Either way, relying on C++11 atomics might also conceal bugs that would be revealed on a weakly-ordered machine due to conflicting ABIs between the statically generated C++ code and the dynamically generated code, as previously mentioned. Thanks, /Erik From aph at redhat.com Tue May 30 12:45:41 2017 From: aph at redhat.com (Andrew Haley) Date: Tue, 30 May 2017 13:45:41 +0100 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: <592D5030.2020904@oracle.com> References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> <3d963255-2006-de03-4b81-87a3dd64ef72@redhat.com> <45c4ea7f-3137-76ae-d20a-df35922f4266@redhat.com> <592C120A.1080908@oracle.com> <6e340bdf-da1a-5a6b-6db4-8d040cdf0b54@redhat.com> <592D5030.2020904@oracle.com> Message-ID: <64fda788-4ff6-978e-4476-a5af36bd708a@redhat.com> On 30/05/17 11:57, Erik ?sterlund wrote: > > Having said that, I am not convinced C++11 is not a good idea either. I > would just like to balance out the view that C++11 is a synchronization > silver bullet for the JVM that is simply a superior solution without any > pitfalls and that doing anything else is wrong. There are things to be > considered there as well, like the extent of possible ABI incompatibilities. Sure, and I appreciate your comment, but you never got the idea that using C++11 atomics is a synchronization silver bullet from me: IMO it's necessary but (probably) not sufficient. >> It's important that we do so because racy accesses are undefined >> behaviour in C++11. (And, arguably, before that, but I'm not going to >> go there.) > > What paragraph are we referring to here that would break OrderAccess > in C++11? Nowhere: it's racy accesses *without* synchronization that are UB. >>> Personally though, I am a helmet-on-synchronization kind of person, >>> so I would take precaution anyway and use volatile whenever >>> possible, because 1) it makes the code more readable, and 2) it >>> provides one extra layer of safety that is more standardized. 
It >>> seems that over the years it has happened multiple times that we >>> assumed OrderAccess is bullet proof, and then realized that it >>> wasn't and observed a crash that would never have happened if the >>> code was written in a helmet-on-synchronization way. At least that's >>> how I feel about it. >> >> I have no problem with that. What I *do* have a problem with is the >> use of volatile to fix bugs that really need to be corrected with >> proper barriers. > > I think you misunderstood me here. I did not propose to use volatile so > we don't have to fix bugs in OrderAccess. Conversely, I said if we find > such issues, we should definitely fix them in OrderAccess. But despite > that, I personally like the pragmatic safety approach, and would use > volatile in my lock-free code anyway to make it a) more readable, and b) > provide an extra level of safety against our increasingly aggressive > compilers. It's like wearing a helmet when biking. You don't expect to > fall and should not fall, but why take risks if you don't have to and > there is an easy way of preventing disaster if that happens. At least > that's how I think about it myself. As I said, I have no problem with that. I'm happy with that justification for volatile, even when not strictly necessary, as long as it's not done in places that would significantly impede performance. I'm sure you would agree with that anyway. You have to remember where this discussion started, which was a proposed use of volatile to fix a bug where a barrier was needed. > The issue is not whether an algorithm depends on IRIW or not. The issue > is that we have to explicitly reason about IRIW to prove that it works. > The lack of IRIW violates seq_cst and by extension linearizaiton points > that rely in seq_cst, and by extension algorithms that rely on > linearization points. By breaking the very building blocks that were > used to reason about algorithms and their correctness, we rely on chance > for it to work. The algorithm may or may not work. It probably does work > without IRIW constraints in the vast majority of cases. But we have to > explicitly reason about that expanded state machine of possible races > caused by IRIW issues to actually know that it works rather than leaving > it to chance. Reasoning about this extended state machine can take a lot > of work and puts the bar unreasonably high for writing synchronized code > in my opinion. And I think the alternative of leaving it to chance > (albeit with good odds) seems like an unfortunate choice. Sure, and I know all of that, and it sounds like you are arguing with a point that someone else made. Where sequential consistency really is required in the VM, seq_cst in the C++ compiler must really be seq_cst. But whoever thought otherwise? No-one, as far as I know. > Either way, relying on C++11 atomics might also conceal bugs that would > be revealed on a weakly-ordered machine due to conflicting ABIs between > the statically generated C++ code and the dynamically generated code, as > previously mentioned. We have to make sure that the generated code is ABI-compatible, of course. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. 
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From volker.simonis at gmail.com Tue May 30 15:37:18 2017 From: volker.simonis at gmail.com (Volker Simonis) Date: Tue, 30 May 2017 17:37:18 +0200 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> <3d963255-2006-de03-4b81-87a3dd64ef72@redhat.com> <45c4ea7f-3137-76ae-d20a-df35922f4266@redhat.com> <592C120A.1080908@oracle.com> Message-ID: On Mon, May 29, 2017 at 7:56 PM, Erik Osterlund wrote: > Hi Volker, > > Thank you for filling in more compiler info. > > If there is a choice between providing a new compiler barrier interface and defining its semantics vs using existing volatile semantics, then volatile semantics seems better to me. > > Also, my new Access API allows you to do BasicAccess::load_oop(addr) to perform load_heap_oop and load_decode_heap_oop with volatile semantics. Sounds like that would help here. Sorry for my ignorance, but what is the "new Access API" and "BasicAccess"? It actually sounds quite interesting :) > > Thanks, > /Erik > >> On 29 May 2017, at 19:02, Volker Simonis wrote: >> >> Hi Erik, >> >> thanks for the nice summary. Just for the sake of completeness, here's >> the corresponding documentation for the xlc compiler barrier [1]. It >> kind of implements the gcc syntax, but the wording is slightly >> different: >> >> "Add memory to the list of clobbered registers if assembler >> instructions can change a memory location in an unpredictable fashion. >> The memory clobber ensures that the data used after the completion of >> the assembly statement is valid and synchronized. >> However, the memory clobber can result in many unnecessary reloads, >> reducing the benefits of hardware prefetching. Thus, the memory >> clobber can impose a performance penalty and should be used with >> caution." >> >> We haven't used it until now, so I can not say if it really does what >> it is supposed to do. I'm also concerned about the performance >> warning. It seems like the "unnecessary reloads" can really hurt on >> architectures like ppc which have much more registers than x86. >> Declaring a memory location 'volatile' seems much more simple and >> light-weight in order to achieve the desired effect. So I tend to >> agree with you and David that we should proceed to mark things with >> 'volatile'. >> >> Sorry for constantly "spamming" this thread with another problem (i.e. >> JDK-8129440 [2]) but I still think that it is related and important. >> In its current state, the way how "load_heap_oop()" and its >> application works is broken. And this is not because of a problem in >> OrderAccess, but because of missing compiler barriers: >> >> static inline oop load_heap_oop(oop* p) { return *p; } >> ... >> template >> inline void G1RootRegionScanClosure::do_oop_nv(T* p) { >> // 1. load 'heap_oop' from 'p' >> T heap_oop = oopDesc::load_heap_oop(p); >> if (!oopDesc::is_null(heap_oop)) { >> // 2. Compiler reloads 'heap_oop' from 'p' which may now be null! 
>> oop obj = oopDesc::decode_heap_oop_not_null(heap_oop); >> HeapRegion* hr = _g1h->heap_region_containing((HeapWord*) obj); >> _cm->grayRoot(obj, hr); >> } >> } >> >> Notice that we don't need memory barriers here - all we need is to >> prevent the compiler from loading the oop (i.e. 'heap_oop') a second >> time. After Andrews explanation (thanks for that!) and Martin's >> examples from Google, I think we could fix this by rewriting >> 'load_heap_oop()' (and friends) as follows: >> >> static inline oop load_heap_oop(oop* p) { >> oop o = *p; >> __asm__ volatile ("" : : : "memory"); >> return o; >> } >> >> In order to make this consistent across all platforms, we would >> probably have to introduce a new, public "compiler barrier" function >> in OrderAccess (e.g. 'OrderAccess::compiler_barrier()' because we >> don't currently seem to have a cross-platform concept for >> "compiler-only barriers"). But I'm still not convinced that it would >> be better than simply writing (and that's the way how we've actually >> solved it internally): >> >> static inline oop load_heap_oop(oop* p) { return * (volatile oop*) p; } >> >> Declaring that single memory location to be 'volatile' seems to be a >> much more local change compared to globally "clobbering" all the >> memory. And it doesn't rely on a the compilers providing a compiler >> barrier. It does however rely on the compiler doing the "right thing" >> for volatile - but after all what has been said here so far, that >> seems more likely? >> >> The problem may also depend on the specific compiler/cpu combination. >> For ppc64, both gcc (on linux) and xlc (on aix), do the right thing >> for volatile variables - they don't insert any memory barriers (i.e. >> no instructions) but just access the corresponding variables as if >> there was a compiler barrier. This is exactly what we currently want >> in HotSpot, because fine-grained control of memory barriers is >> controlled by the use of OrderAccess (and OrderAccess implies >> "compiler barrier", at least after the latest fixes). >> >> Any thoughts? Should we introduce a cross-platform, "compiler-only >> barrier" or should we stick to using "volatile" for such cases? >> >> Regards, >> Volker >> >> [1] https://www.ibm.com/support/knowledgecenter/SSGH2K_13.1.0/com.ibm.xlc131.aix.doc/language_ref/asm.html >> [2] https://bugs.openjdk.java.net/browse/JDK-8129440 >> >> On Mon, May 29, 2017 at 2:20 PM, Erik ?sterlund >> wrote: >>> Hi Andrew, >>> >>> I just thought I'd put my opinions in here as I see I have been mentioned a >>> few times already. >>> >>> First of all, I find using the volatile keyword on things that are involved >>> in lock-free protocols meaningful from a readability point of view. It >>> allows the reader of the code to see care is needed here. >>> >>> About the compiler barriers - you are right. Volatile should indeed not be >>> necessary if the compiler barriers do everything right. The compiler should >>> not reorder things and it should not prevent reloading. >>> >>> On windows we rely on the deprecated _ReadWriteBarrier(). According to MSDN, >>> it guarantees: >>> >>> "The _ReadWriteBarrier intrinsic limits the compiler optimizations that can >>> remove or reorder memory accesses across the point of the call." >>> >>> This should cut it. 
>>> >>> The GCC memory clobber is defined as: >>> >>> "The "memory" clobber tells the compiler that the assembly code performs >>> memory reads or writes to items other than those listed in the input and >>> output operands (for example, accessing the memory pointed to by one of the >>> input parameters). To ensure memory contains correct values, GCC may need to >>> flush specific register values to memory before executing the asm. Further, >>> the compiler does not assume that any values read from memory before an asm >>> remain unchanged after that asm; it reloads them as needed. Using the >>> "memory" clobber effectively forms a read/write memory barrier for the >>> compiler." >>> >>> This seems to only guarantee values will not be re-ordered. But in the >>> documentation for ExtendedAsm it also states: >>> >>> "You will also want to add the volatile keyword if the memory affected is >>> not listed in the inputs or outputs of the asm, as the `memory' clobber does >>> not count as a side-effect of the asm." >>> >>> and >>> >>> "The volatile keyword indicates that the instruction has important >>> side-effects. GCC will not delete a volatile asm if it is reachable. (The >>> instruction can still be deleted if GCC can prove that control-flow will >>> never reach the location of the instruction.) Note that even a volatile asm >>> instruction can be moved relative to other code, including across jump >>> instructions." >>> >>> This is a bit vague, but seems to suggest that by making the asm statement >>> volatile and having a memory clobber, it definitely will not reload >>> variables. About not re-ordering non-volatile accesses, it shouldn't but it >>> is not quite clearly stated. I have never observed such a re-ordering across >>> a volatile memory clobber. But the semantics seem a bit vague. >>> >>> As for clang, the closest to a definition of what it does I have seen is: >>> >>> "A clobber constraint is indicated by a ?~? prefix. A clobber does not >>> consume an input operand, nor generate an output. Clobbers cannot use any of >>> the general constraint code letters ? they may use only explicit register >>> constraints, e.g. ?~{eax}?. The one exception is that a clobber string of >>> ?~{memory}? indicates that the assembly writes to arbitrary undeclared >>> memory locations ? not only the memory pointed to by a declared indirect >>> output." >>> >>> Apart from sweeping statements saying clang inline assembly is largely >>> compatible and working similar to GCC, I have not seen clear guarantees. And >>> then there are more compilers. >>> >>> As a conclusion, by using volatile in addition to OrderAccess you rely on >>> standardized compiler semantics (at least for volatile-to-volatile >>> re-orderings and re-loading, but not for volatile-to-nonvolatile, but that's >>> another can of worms), and regrettably if you rely on OrderAccess memory >>> model doing what it says it will do, then it should indeed work without >>> volatile, but to make that work, OrderAccess relies on non-standardized >>> compiler-specific barriers. In practice it should work well on all our >>> supported compilers without volatile. And if it didn't, it would indeed be a >>> bug in OrderAccess that needs to be solved in OrderAccess. >>> >>> Personally though, I am a helmet-on-synchronization kind of person, so I >>> would take precaution anyway and use volatile whenever possible, because 1) >>> it makes the code more readable, and 2) it provides one extra layer of >>> safety that is more standardized. 
It seems that over the years it has >>> happened multiple times that we assumed OrderAccess is bullet proof, and >>> then realized that it wasn't and observed a crash that would never have >>> happened if the code was written in a helmet-on-synchronization way. At >>> least that's how I feel about it. >>> >>> Now one might argue that by using C++11 atomics that are standardized, all >>> these problems would go away as we would rely in standardized primitives and >>> then just trust the compiler. But then there could arise problems when the >>> C++ compiler decides to be less conservative than we want, e.g. by not doing >>> fence in sequentially consistent loads to optimize for non-multiple copy >>> atomic CPUs arguing that IRIW issues that violate sequential consistency are >>> non-issues in practice. That makes those loads "almost" sequentially >>> consistent, which might be good enough. But it feels good to have a choice >>> here to be more conservative. To have the synchronization helmet on. >>> >>> Meta summary: >>> 1) Current OrderAccess without volatile: >>> - should work, but relies on compiler-specific not standardized and >>> sometimes poorly documented compiler barriers. >>> >>> 2) Current OrderAccess with volatile: >>> - relies on standardized volatile semantics to guarantee compiler >>> reordering and reloading issues do not occur. >>> >>> 3) C++11 Atomic backend for OrderAccess >>> - relies on standardized semantics to guarantee compiler and hardware >>> reordering issues >>> - nevertheless isn't always flawless, and when it isn't, it gets painful >>> >>> Hope this sheds some light on the trade-offs. >>> >>> Thanks, >>> /Erik >>> >>> >>>> On 2017-05-28 10:45, Andrew Haley wrote: >>>> >>>>> On 27/05/17 10:10, Volker Simonis wrote: >>>>> >>>>>> On Fri, May 26, 2017 at 6:09 PM, Andrew Haley wrote: >>>>>> >>>>>>> On 26/05/17 17:03, Volker Simonis wrote: >>>>>>> >>>>>>> Volatile not only prevents reordering by the compiler. It also >>>>>>> prevents other, otherwise legal transformations/optimizations (like >>>>>>> for example reloading a variable [1]) which have to be prevented in >>>>>>> order to write correct, lock free programs. >>>>>> >>>>>> Yes, but so do compiler barriers. >>>>> >>>>> Please correct me if I'm wrong, but I thought "compiler barriers" are >>>>> to prevent reordering by the compiler. However, this is a question of >>>>> optimization. If you have two subsequent loads from the same address, >>>>> the compiler is free to do only the first load and keep the value in a >>>>> register if the address is not pointing to a volatile value. >>>> >>>> No it isn't: that is precisely what a compiler barrier prevents. A >>>> compiler barrier (from the POV of the compiler) clobbers all of >>>> the memory state. Neither reads nor writes may move past a compiler >>>> barrier. 
>>>> >>> > From volker.simonis at gmail.com Tue May 30 15:40:36 2017 From: volker.simonis at gmail.com (Volker Simonis) Date: Tue, 30 May 2017 17:40:36 +0200 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: <561661eb-b463-726a-1d32-84ef5f32af13@oracle.com> References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> <3d963255-2006-de03-4b81-87a3dd64ef72@redhat.com> <45c4ea7f-3137-76ae-d20a-df35922f4266@redhat.com> <592C120A.1080908@oracle.com> <561661eb-b463-726a-1d32-84ef5f32af13@oracle.com> Message-ID: On Mon, May 29, 2017 at 10:55 PM, David Holmes wrote: > > > On 30/05/2017 3:02 AM, Volker Simonis wrote: >> >> Sorry for constantly "spamming" this thread with another problem (i.e. >> JDK-8129440 [2]) but I still think that it is related and important. >> In its current state, the way how "load_heap_oop()" and its >> application works is broken. And this is not because of a problem in >> OrderAccess, but because of missing compiler barriers: >> >> static inline oop load_heap_oop(oop* p) { return *p; } >> ... >> template >> inline void G1RootRegionScanClosure::do_oop_nv(T* p) { >> // 1. load 'heap_oop' from 'p' >> T heap_oop = oopDesc::load_heap_oop(p); >> if (!oopDesc::is_null(heap_oop)) { >> // 2. Compiler reloads 'heap_oop' from 'p' which may now be null! >> oop obj = oopDesc::decode_heap_oop_not_null(heap_oop); > > > Do you mean that the compiler has not stashed heap_oop somewhere and > re-executes oopDesc::load_heap_oop(p) again? That would be quite nasty I > think in general as it breaks any logic that wants to read a non-local > variable once to get it into a local and reuse that knowing that it won't > change even if the real variable does! > Yes, that's exactly what I mean and that's exactly what we've observed on AIX with xlc. Notice that the compiler is free to do such transformations if the load is not from a volatile field. That's why we've opened the bug and fixed out internal version. But we still think this fix needs to go into OpenJDK as well. Regards, Volker > David > ----- > > >> HeapRegion* hr = _g1h->heap_region_containing((HeapWord*) obj); >> _cm->grayRoot(obj, hr); >> } >> } >> >> Notice that we don't need memory barriers here - all we need is to >> prevent the compiler from loading the oop (i.e. 'heap_oop') a second >> time. After Andrews explanation (thanks for that!) and Martin's >> examples from Google, I think we could fix this by rewriting >> 'load_heap_oop()' (and friends) as follows: >> >> static inline oop load_heap_oop(oop* p) { >> oop o = *p; >> __asm__ volatile ("" : : : "memory"); >> return o; >> } >> >> In order to make this consistent across all platforms, we would >> probably have to introduce a new, public "compiler barrier" function >> in OrderAccess (e.g. 'OrderAccess::compiler_barrier()' because we >> don't currently seem to have a cross-platform concept for >> "compiler-only barriers"). 
But I'm still not convinced that it would >> be better than simply writing (and that's the way how we've actually >> solved it internally): >> >> static inline oop load_heap_oop(oop* p) { return * (volatile oop*) p; } >> >> Declaring that single memory location to be 'volatile' seems to be a >> much more local change compared to globally "clobbering" all the >> memory. And it doesn't rely on a the compilers providing a compiler >> barrier. It does however rely on the compiler doing the "right thing" >> for volatile - but after all what has been said here so far, that >> seems more likely? >> >> The problem may also depend on the specific compiler/cpu combination. >> For ppc64, both gcc (on linux) and xlc (on aix), do the right thing >> for volatile variables - they don't insert any memory barriers (i.e. >> no instructions) but just access the corresponding variables as if >> there was a compiler barrier. This is exactly what we currently want >> in HotSpot, because fine-grained control of memory barriers is >> controlled by the use of OrderAccess (and OrderAccess implies >> "compiler barrier", at least after the latest fixes). >> >> Any thoughts? Should we introduce a cross-platform, "compiler-only >> barrier" or should we stick to using "volatile" for such cases? >> >> Regards, >> Volker >> >> [1] >> https://www.ibm.com/support/knowledgecenter/SSGH2K_13.1.0/com.ibm.xlc131.aix.doc/language_ref/asm.html >> [2] https://bugs.openjdk.java.net/browse/JDK-8129440 >> >> On Mon, May 29, 2017 at 2:20 PM, Erik ?sterlund >> wrote: >>> >>> Hi Andrew, >>> >>> I just thought I'd put my opinions in here as I see I have been mentioned >>> a >>> few times already. >>> >>> First of all, I find using the volatile keyword on things that are >>> involved >>> in lock-free protocols meaningful from a readability point of view. It >>> allows the reader of the code to see care is needed here. >>> >>> About the compiler barriers - you are right. Volatile should indeed not >>> be >>> necessary if the compiler barriers do everything right. The compiler >>> should >>> not reorder things and it should not prevent reloading. >>> >>> On windows we rely on the deprecated _ReadWriteBarrier(). According to >>> MSDN, >>> it guarantees: >>> >>> "The _ReadWriteBarrier intrinsic limits the compiler optimizations that >>> can >>> remove or reorder memory accesses across the point of the call." >>> >>> This should cut it. >>> >>> The GCC memory clobber is defined as: >>> >>> "The "memory" clobber tells the compiler that the assembly code performs >>> memory reads or writes to items other than those listed in the input and >>> output operands (for example, accessing the memory pointed to by one of >>> the >>> input parameters). To ensure memory contains correct values, GCC may need >>> to >>> flush specific register values to memory before executing the asm. >>> Further, >>> the compiler does not assume that any values read from memory before an >>> asm >>> remain unchanged after that asm; it reloads them as needed. Using the >>> "memory" clobber effectively forms a read/write memory barrier for the >>> compiler." >>> >>> This seems to only guarantee values will not be re-ordered. But in the >>> documentation for ExtendedAsm it also states: >>> >>> "You will also want to add the volatile keyword if the memory affected is >>> not listed in the inputs or outputs of the asm, as the `memory' clobber >>> does >>> not count as a side-effect of the asm." 
>>> >>> and >>> >>> "The volatile keyword indicates that the instruction has important >>> side-effects. GCC will not delete a volatile asm if it is reachable. (The >>> instruction can still be deleted if GCC can prove that control-flow will >>> never reach the location of the instruction.) Note that even a volatile >>> asm >>> instruction can be moved relative to other code, including across jump >>> instructions." >>> >>> This is a bit vague, but seems to suggest that by making the asm >>> statement >>> volatile and having a memory clobber, it definitely will not reload >>> variables. About not re-ordering non-volatile accesses, it shouldn't but >>> it >>> is not quite clearly stated. I have never observed such a re-ordering >>> across >>> a volatile memory clobber. But the semantics seem a bit vague. >>> >>> As for clang, the closest to a definition of what it does I have seen is: >>> >>> "A clobber constraint is indicated by a ?~? prefix. A clobber does not >>> consume an input operand, nor generate an output. Clobbers cannot use any >>> of >>> the general constraint code letters ? they may use only explicit register >>> constraints, e.g. ?~{eax}?. The one exception is that a clobber string of >>> ?~{memory}? indicates that the assembly writes to arbitrary undeclared >>> memory locations ? not only the memory pointed to by a declared indirect >>> output." >>> >>> Apart from sweeping statements saying clang inline assembly is largely >>> compatible and working similar to GCC, I have not seen clear guarantees. >>> And >>> then there are more compilers. >>> >>> As a conclusion, by using volatile in addition to OrderAccess you rely on >>> standardized compiler semantics (at least for volatile-to-volatile >>> re-orderings and re-loading, but not for volatile-to-nonvolatile, but >>> that's >>> another can of worms), and regrettably if you rely on OrderAccess memory >>> model doing what it says it will do, then it should indeed work without >>> volatile, but to make that work, OrderAccess relies on non-standardized >>> compiler-specific barriers. In practice it should work well on all our >>> supported compilers without volatile. And if it didn't, it would indeed >>> be a >>> bug in OrderAccess that needs to be solved in OrderAccess. >>> >>> Personally though, I am a helmet-on-synchronization kind of person, so I >>> would take precaution anyway and use volatile whenever possible, because >>> 1) >>> it makes the code more readable, and 2) it provides one extra layer of >>> safety that is more standardized. It seems that over the years it has >>> happened multiple times that we assumed OrderAccess is bullet proof, and >>> then realized that it wasn't and observed a crash that would never have >>> happened if the code was written in a helmet-on-synchronization way. At >>> least that's how I feel about it. >>> >>> Now one might argue that by using C++11 atomics that are standardized, >>> all >>> these problems would go away as we would rely in standardized primitives >>> and >>> then just trust the compiler. But then there could arise problems when >>> the >>> C++ compiler decides to be less conservative than we want, e.g. by not >>> doing >>> fence in sequentially consistent loads to optimize for non-multiple copy >>> atomic CPUs arguing that IRIW issues that violate sequential consistency >>> are >>> non-issues in practice. That makes those loads "almost" sequentially >>> consistent, which might be good enough. But it feels good to have a >>> choice >>> here to be more conservative. 
To have the synchronization helmet on. >>> >>> Meta summary: >>> 1) Current OrderAccess without volatile: >>> - should work, but relies on compiler-specific not standardized and >>> sometimes poorly documented compiler barriers. >>> >>> 2) Current OrderAccess with volatile: >>> - relies on standardized volatile semantics to guarantee compiler >>> reordering and reloading issues do not occur. >>> >>> 3) C++11 Atomic backend for OrderAccess >>> - relies on standardized semantics to guarantee compiler and hardware >>> reordering issues >>> - nevertheless isn't always flawless, and when it isn't, it gets >>> painful >>> >>> Hope this sheds some light on the trade-offs. >>> >>> Thanks, >>> /Erik >>> >>> >>> On 2017-05-28 10:45, Andrew Haley wrote: >>>> >>>> >>>> On 27/05/17 10:10, Volker Simonis wrote: >>>>> >>>>> >>>>> On Fri, May 26, 2017 at 6:09 PM, Andrew Haley wrote: >>>>>> >>>>>> >>>>>> On 26/05/17 17:03, Volker Simonis wrote: >>>>>> >>>>>>> Volatile not only prevents reordering by the compiler. It also >>>>>>> prevents other, otherwise legal transformations/optimizations (like >>>>>>> for example reloading a variable [1]) which have to be prevented in >>>>>>> order to write correct, lock free programs. >>>>>> >>>>>> >>>>>> Yes, but so do compiler barriers. >>>>> >>>>> >>>>> Please correct me if I'm wrong, but I thought "compiler barriers" are >>>>> to prevent reordering by the compiler. However, this is a question of >>>>> optimization. If you have two subsequent loads from the same address, >>>>> the compiler is free to do only the first load and keep the value in a >>>>> register if the address is not pointing to a volatile value. >>>> >>>> >>>> No it isn't: that is precisely what a compiler barrier prevents. A >>>> compiler barrier (from the POV of the compiler) clobbers all of >>>> the memory state. Neither reads nor writes may move past a compiler >>>> barrier. >>>> >>> > From david.holmes at oracle.com Tue May 30 21:33:47 2017 From: david.holmes at oracle.com (David Holmes) Date: Wed, 31 May 2017 07:33:47 +1000 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: <64fda788-4ff6-978e-4476-a5af36bd708a@redhat.com> References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> <3d963255-2006-de03-4b81-87a3dd64ef72@redhat.com> <45c4ea7f-3137-76ae-d20a-df35922f4266@redhat.com> <592C120A.1080908@oracle.com> <6e340bdf-da1a-5a6b-6db4-8d040cdf0b54@redhat.com> <592D5030.2020904@oracle.com> <64fda788-4ff6-978e-4476-a5af36bd708a@redhat.com> Message-ID: <35858f93-1625-af37-6d1a-773fddd73654@oracle.com> On 30/05/2017 10:45 PM, Andrew Haley wrote: > You have to remember where this discussion started, which was a > proposed use of volatile to fix a bug where a barrier was needed. No that was not the case, as has been pointed out numerous times. There were two bugs: 1. Incorrect placement of volatile in a declaration 2. Need to backport the compiler_barrier changes for OrderAccess. No one suggested doing #1 in lieu of #2. We wanted #1 as well as #2. 
David From volker.simonis at gmail.com Tue May 30 22:24:39 2017 From: volker.simonis at gmail.com (Volker Simonis) Date: Tue, 30 May 2017 22:24:39 +0000 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <67244460-ac75-44a9-4b2b-0bd5f36bd74f@redhat.com> <88a3d832-9875-8a3a-529c-a3a3dfe295de@redhat.com> <69ad2e6a-74fb-e0bb-07df-3b25c143f81d@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> <3d963255-2006-de03-4b81-87a3dd64ef72@redhat.com> <45c4ea7f-3137-76ae-d20a-df35922f4266@redhat.com> <592C120A.1080908@oracle.com> Message-ID: Volker Simonis schrieb am Di. 30. Mai 2017 um 17:37: > On Mon, May 29, 2017 at 7:56 PM, Erik Osterlund > wrote: > > Hi Volker, > > > > Thank you for filling in more compiler info. > > > > If there is a choice between providing a new compiler barrier interface > and defining its semantics vs using existing volatile semantics, then > volatile semantics seems better to me. > > > > Also, my new Access API allows you to do > BasicAccess::load_oop(addr) to perform load_heap_oop and > load_decode_heap_oop with volatile semantics. Sounds like that would help > here. > > Sorry for my ignorance, but what is the "new Access API" and > "BasicAccess"? It actually sounds quite interesting :) > Sorry, my bad:( Please ignore this mail, I totally forgot about the new GC interface... > > > > > Thanks, > > /Erik > > > >> On 29 May 2017, at 19:02, Volker Simonis > wrote: > >> > >> Hi Erik, > >> > >> thanks for the nice summary. Just for the sake of completeness, here's > >> the corresponding documentation for the xlc compiler barrier [1]. It > >> kind of implements the gcc syntax, but the wording is slightly > >> different: > >> > >> "Add memory to the list of clobbered registers if assembler > >> instructions can change a memory location in an unpredictable fashion. > >> The memory clobber ensures that the data used after the completion of > >> the assembly statement is valid and synchronized. > >> However, the memory clobber can result in many unnecessary reloads, > >> reducing the benefits of hardware prefetching. Thus, the memory > >> clobber can impose a performance penalty and should be used with > >> caution." > >> > >> We haven't used it until now, so I can not say if it really does what > >> it is supposed to do. I'm also concerned about the performance > >> warning. It seems like the "unnecessary reloads" can really hurt on > >> architectures like ppc which have much more registers than x86. > >> Declaring a memory location 'volatile' seems much more simple and > >> light-weight in order to achieve the desired effect. So I tend to > >> agree with you and David that we should proceed to mark things with > >> 'volatile'. > >> > >> Sorry for constantly "spamming" this thread with another problem (i.e. > >> JDK-8129440 [2]) but I still think that it is related and important. > >> In its current state, the way how "load_heap_oop()" and its > >> application works is broken. And this is not because of a problem in > >> OrderAccess, but because of missing compiler barriers: > >> > >> static inline oop load_heap_oop(oop* p) { return *p; } > >> ... > >> template > >> inline void G1RootRegionScanClosure::do_oop_nv(T* p) { > >> // 1. load 'heap_oop' from 'p' > >> T heap_oop = oopDesc::load_heap_oop(p); > >> if (!oopDesc::is_null(heap_oop)) { > >> // 2. 
Compiler reloads 'heap_oop' from 'p' which may now be null! > >> oop obj = oopDesc::decode_heap_oop_not_null(heap_oop); > >> HeapRegion* hr = _g1h->heap_region_containing((HeapWord*) obj); > >> _cm->grayRoot(obj, hr); > >> } > >> } > >> > >> Notice that we don't need memory barriers here - all we need is to > >> prevent the compiler from loading the oop (i.e. 'heap_oop') a second > >> time. After Andrews explanation (thanks for that!) and Martin's > >> examples from Google, I think we could fix this by rewriting > >> 'load_heap_oop()' (and friends) as follows: > >> > >> static inline oop load_heap_oop(oop* p) { > >> oop o = *p; > >> __asm__ volatile ("" : : : "memory"); > >> return o; > >> } > >> > >> In order to make this consistent across all platforms, we would > >> probably have to introduce a new, public "compiler barrier" function > >> in OrderAccess (e.g. 'OrderAccess::compiler_barrier()' because we > >> don't currently seem to have a cross-platform concept for > >> "compiler-only barriers"). But I'm still not convinced that it would > >> be better than simply writing (and that's the way how we've actually > >> solved it internally): > >> > >> static inline oop load_heap_oop(oop* p) { return * (volatile oop*) p; } > >> > >> Declaring that single memory location to be 'volatile' seems to be a > >> much more local change compared to globally "clobbering" all the > >> memory. And it doesn't rely on a the compilers providing a compiler > >> barrier. It does however rely on the compiler doing the "right thing" > >> for volatile - but after all what has been said here so far, that > >> seems more likely? > >> > >> The problem may also depend on the specific compiler/cpu combination. > >> For ppc64, both gcc (on linux) and xlc (on aix), do the right thing > >> for volatile variables - they don't insert any memory barriers (i.e. > >> no instructions) but just access the corresponding variables as if > >> there was a compiler barrier. This is exactly what we currently want > >> in HotSpot, because fine-grained control of memory barriers is > >> controlled by the use of OrderAccess (and OrderAccess implies > >> "compiler barrier", at least after the latest fixes). > >> > >> Any thoughts? Should we introduce a cross-platform, "compiler-only > >> barrier" or should we stick to using "volatile" for such cases? > >> > >> Regards, > >> Volker > >> > >> [1] > https://www.ibm.com/support/knowledgecenter/SSGH2K_13.1.0/com.ibm.xlc131.aix.doc/language_ref/asm.html > >> [2] https://bugs.openjdk.java.net/browse/JDK-8129440 > >> > >> On Mon, May 29, 2017 at 2:20 PM, Erik ?sterlund > >> wrote: > >>> Hi Andrew, > >>> > >>> I just thought I'd put my opinions in here as I see I have been > mentioned a > >>> few times already. > >>> > >>> First of all, I find using the volatile keyword on things that are > involved > >>> in lock-free protocols meaningful from a readability point of view. It > >>> allows the reader of the code to see care is needed here. > >>> > >>> About the compiler barriers - you are right. Volatile should indeed > not be > >>> necessary if the compiler barriers do everything right. The compiler > should > >>> not reorder things and it should not prevent reloading. > >>> > >>> On windows we rely on the deprecated _ReadWriteBarrier(). According to > MSDN, > >>> it guarantees: > >>> > >>> "The _ReadWriteBarrier intrinsic limits the compiler optimizations > that can > >>> remove or reorder memory accesses across the point of the call." > >>> > >>> This should cut it. 
> >>> > >>> The GCC memory clobber is defined as: > >>> > >>> "The "memory" clobber tells the compiler that the assembly code > performs > >>> memory reads or writes to items other than those listed in the input > and > >>> output operands (for example, accessing the memory pointed to by one > of the > >>> input parameters). To ensure memory contains correct values, GCC may > need to > >>> flush specific register values to memory before executing the asm. > Further, > >>> the compiler does not assume that any values read from memory before > an asm > >>> remain unchanged after that asm; it reloads them as needed. Using the > >>> "memory" clobber effectively forms a read/write memory barrier for the > >>> compiler." > >>> > >>> This seems to only guarantee values will not be re-ordered. But in the > >>> documentation for ExtendedAsm it also states: > >>> > >>> "You will also want to add the volatile keyword if the memory affected > is > >>> not listed in the inputs or outputs of the asm, as the `memory' > clobber does > >>> not count as a side-effect of the asm." > >>> > >>> and > >>> > >>> "The volatile keyword indicates that the instruction has important > >>> side-effects. GCC will not delete a volatile asm if it is reachable. > (The > >>> instruction can still be deleted if GCC can prove that control-flow > will > >>> never reach the location of the instruction.) Note that even a > volatile asm > >>> instruction can be moved relative to other code, including across jump > >>> instructions." > >>> > >>> This is a bit vague, but seems to suggest that by making the asm > statement > >>> volatile and having a memory clobber, it definitely will not reload > >>> variables. About not re-ordering non-volatile accesses, it shouldn't > but it > >>> is not quite clearly stated. I have never observed such a re-ordering > across > >>> a volatile memory clobber. But the semantics seem a bit vague. > >>> > >>> As for clang, the closest to a definition of what it does I have seen > is: > >>> > >>> "A clobber constraint is indicated by a ?~? prefix. A clobber does not > >>> consume an input operand, nor generate an output. Clobbers cannot use > any of > >>> the general constraint code letters ? they may use only explicit > register > >>> constraints, e.g. ?~{eax}?. The one exception is that a clobber string > of > >>> ?~{memory}? indicates that the assembly writes to arbitrary undeclared > >>> memory locations ? not only the memory pointed to by a declared > indirect > >>> output." > >>> > >>> Apart from sweeping statements saying clang inline assembly is largely > >>> compatible and working similar to GCC, I have not seen clear > guarantees. And > >>> then there are more compilers. > >>> > >>> As a conclusion, by using volatile in addition to OrderAccess you rely > on > >>> standardized compiler semantics (at least for volatile-to-volatile > >>> re-orderings and re-loading, but not for volatile-to-nonvolatile, but > that's > >>> another can of worms), and regrettably if you rely on OrderAccess > memory > >>> model doing what it says it will do, then it should indeed work without > >>> volatile, but to make that work, OrderAccess relies on non-standardized > >>> compiler-specific barriers. In practice it should work well on all our > >>> supported compilers without volatile. And if it didn't, it would > indeed be a > >>> bug in OrderAccess that needs to be solved in OrderAccess. 
> >>> > >>> Personally though, I am a helmet-on-synchronization kind of person, so > I > >>> would take precaution anyway and use volatile whenever possible, > because 1) > >>> it makes the code more readable, and 2) it provides one extra layer of > >>> safety that is more standardized. It seems that over the years it has > >>> happened multiple times that we assumed OrderAccess is bullet proof, > and > >>> then realized that it wasn't and observed a crash that would never have > >>> happened if the code was written in a helmet-on-synchronization way. At > >>> least that's how I feel about it. > >>> > >>> Now one might argue that by using C++11 atomics that are standardized, > all > >>> these problems would go away as we would rely in standardized > primitives and > >>> then just trust the compiler. But then there could arise problems when > the > >>> C++ compiler decides to be less conservative than we want, e.g. by not > doing > >>> fence in sequentially consistent loads to optimize for non-multiple > copy > >>> atomic CPUs arguing that IRIW issues that violate sequential > consistency are > >>> non-issues in practice. That makes those loads "almost" sequentially > >>> consistent, which might be good enough. But it feels good to have a > choice > >>> here to be more conservative. To have the synchronization helmet on. > >>> > >>> Meta summary: > >>> 1) Current OrderAccess without volatile: > >>> - should work, but relies on compiler-specific not standardized and > >>> sometimes poorly documented compiler barriers. > >>> > >>> 2) Current OrderAccess with volatile: > >>> - relies on standardized volatile semantics to guarantee compiler > >>> reordering and reloading issues do not occur. > >>> > >>> 3) C++11 Atomic backend for OrderAccess > >>> - relies on standardized semantics to guarantee compiler and hardware > >>> reordering issues > >>> - nevertheless isn't always flawless, and when it isn't, it gets > painful > >>> > >>> Hope this sheds some light on the trade-offs. > >>> > >>> Thanks, > >>> /Erik > >>> > >>> > >>>> On 2017-05-28 10:45, Andrew Haley wrote: > >>>> > >>>>> On 27/05/17 10:10, Volker Simonis wrote: > >>>>> > >>>>>> On Fri, May 26, 2017 at 6:09 PM, Andrew Haley > wrote: > >>>>>> > >>>>>>> On 26/05/17 17:03, Volker Simonis wrote: > >>>>>>> > >>>>>>> Volatile not only prevents reordering by the compiler. It also > >>>>>>> prevents other, otherwise legal transformations/optimizations (like > >>>>>>> for example reloading a variable [1]) which have to be prevented in > >>>>>>> order to write correct, lock free programs. > >>>>>> > >>>>>> Yes, but so do compiler barriers. > >>>>> > >>>>> Please correct me if I'm wrong, but I thought "compiler barriers" are > >>>>> to prevent reordering by the compiler. However, this is a question of > >>>>> optimization. If you have two subsequent loads from the same address, > >>>>> the compiler is free to do only the first load and keep the value in > a > >>>>> register if the address is not pointing to a volatile value. > >>>> > >>>> No it isn't: that is precisely what a compiler barrier prevents. A > >>>> compiler barrier (from the POV of the compiler) clobbers all of > >>>> the memory state. Neither reads nor writes may move past a compiler > >>>> barrier. 
> >>>> > >>> > > >

From david.holmes at oracle.com Wed May 31 00:49:07 2017 From: david.holmes at oracle.com (David Holmes) Date: Wed, 31 May 2017 10:49:07 +1000 Subject: RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException In-Reply-To: <35858f93-1625-af37-6d1a-773fddd73654@oracle.com> References: <092b320b-3230-cb69-99d3-1d7777723578@redhat.com> <648fc2bf-34d0-2796-dd79-9e498536b23b@redhat.com> <806cff0e-a48f-9a31-c384-caa83c423e8b@redhat.com> <2062698c-838b-ad17-0d35-acfc7706878d@oracle.com> <3d963255-2006-de03-4b81-87a3dd64ef72@redhat.com> <45c4ea7f-3137-76ae-d20a-df35922f4266@redhat.com> <592C120A.1080908@oracle.com> <6e340bdf-da1a-5a6b-6db4-8d040cdf0b54@redhat.com> <592D5030.2020904@oracle.com> <64fda788-4ff6-978e-4476-a5af36bd708a@redhat.com> <35858f93-1625-af37-6d1a-773fddd73654@oracle.com> Message-ID:

Correction and apology ...

On 31/05/2017 7:33 AM, David Holmes wrote:
>
>
> On 30/05/2017 10:45 PM, Andrew Haley wrote:
>> You have to remember where this discussion started, which was a
>> proposed use of volatile to fix a bug where a barrier was needed.
>
> No that was not the case, as has been pointed out numerous times. There
> were two bugs:
>
> 1. Incorrect placement of volatile in a declaration
> 2. Need to backport the compiler_barrier changes for OrderAccess.
>
> No one suggested doing #1 in lieu of #2. We wanted #1 as well as #2.

My apologies, the very original proposal was just to fix #1 without any apparent knowledge of #2. When #2 was pointed out it was then proposed to drop #1. Paul and I then chimed in that #1 still needed to be fixed to follow, let's say, "hotspot style", even if, in the presence of correct (and correctly used) compiler barriers, the volatile should not be needed.

David

> David
>
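For readers following along, the placement distinction behind #1 is the usual C++ one (a generic illustration with invented member names, not the actual patch):

class Method;

struct Example {
  volatile Method* _m_pointee_volatile;  // "pointer to volatile Method": the
                                         // object pointed to is treated as
                                         // volatile, but reads of the pointer
                                         // field itself may still be cached
  Method* volatile _m_pointer_volatile;  // "volatile pointer to Method": every
                                         // read of the field itself is forced
                                         // back to memory, which is what a
                                         // racily-updated field requires
};

Only the second form tells the compiler that the pointer value can change underneath the reader, which is the kind of distinction that gets lost when volatile ends up in the wrong position in a declaration.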