[aarch64-port-dev ] RFR(S): 8248657: Windows: strengthening in ThreadCritical regarding memory model

Ludovic Henry luhenry at microsoft.com
Wed Jul 15 17:00:02 UTC 2020


Hi David,

>> I can confirm that calls such as SetEvent have an implicit memory barrier as they are syscalls. This specific instance would then not suffer from any memory reordering issues.
>
> That is good to know. But this is something that Microsoft should be
> documenting explicitly - even if just a blanket statement that all
> syscalls (which are what exactly?) provide an implicit memory barrier
> (of what type exactly?).

I don't think we can assume SetEvent has a barrier just because it is a syscall (even though syscalls do guarantee a barrier); it's more that SetEvent is the equivalent of sem_post. If you could not assume that sem_post or SetEvent guarantees a memory barrier (full, or at least store_release), then you could not trust any standard locking mechanism: what would be the point of synchronizing if the CPU could move loads and stores out of the critical section?
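
To make that argument concrete, here is a minimal sketch in portable C++11 rather than Win32. `SimpleEvent` and `observed_owner_after_wait` are hypothetical names invented for this illustration, not HotSpot or Windows code; the event is emulated with std::mutex and std::condition_variable, whose unlock/lock pairing the C++ standard defines as a synchronization point. A plain store made before set() is therefore visible to a thread returning from wait(), which is the property assumed of SetEvent and the wait functions.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for an event object; stays signaled once set.
class SimpleEvent {
 public:
  void set() {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      signaled_ = true;
    }  // unlocking the mutex publishes everything written before set()
    cv_.notify_all();
  }

  void wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return signaled_; });
  }  // having locked the mutex, the waiter sees the setter's earlier writes

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool signaled_ = false;
};

// Returns the value of a plain (non-atomic) variable as seen by the waiter.
int observed_owner_after_wait() {
  int lock_owner = 1234;  // deliberately not atomic
  SimpleEvent lock_event;
  int seen = -1;

  std::thread waiter([&] {
    lock_event.wait();
    seen = lock_owner;  // ordered after the store below via the event
  });

  lock_owner = 0;    // plain store, like the one in the destructor
  lock_event.set();  // must not let the store above appear to come later
  waiter.join();
  return seen;
}
```

Calling observed_owner_after_wait() returns 0 on any conforming implementation. If set() did not order the preceding store, the waiter could legitimately observe the stale 1234, and no lock built on such a primitive would be usable.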

>> Also, would jcstress help root out these kinds of issues in Hotspot, or does that only test the code generated with the Interpreter/C1/C2? We run successfully jcstress in `tough` mode.
>
> jcstress tests will execute the native runtime code of course, but they
> won't be "stressing" it as such.

Makes sense, thanks for the clarification.

--
Ludovic

I agree with you on the value of more explicit documentation, and I'll go look for it. If it doesn't exist, I'll request that it be documented somewhere on docs.microsoft.com. In the meantime, it is safe to assume that SetEvent contains a memory barrier with at least store_release semantics. Similarly, WaitForSingleObject and WaitForMultipleObjects have at least a load_acquire memory barrier, and are also syscalls (which actually guarantee a full memory barrier).
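
As a rough model of the minimum guarantee described here, the following sketch pairs a store_release with a load_acquire using C++11 atomics. It does not call the Win32 API; `signaled`, `payload` and `read_payload_after_acquire` are hypothetical names chosen for the example, with the release store standing in for SetEvent and the acquire load standing in for a successful wait.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Returns the payload value observed by the consumer thread.
int read_payload_after_acquire() {
  int payload = 0;                    // plain data protected by the flag
  std::atomic<bool> signaled{false};  // models the event state
  int seen = -1;

  std::thread consumer([&] {
    // Acquire load: models the "at least load_acquire" side of a wait.
    while (!signaled.load(std::memory_order_acquire)) {
      std::this_thread::yield();
    }
    seen = payload;  // cannot be reordered before the acquire load
  });

  payload = 42;  // cannot be reordered after the release store
  // Release store: models the "at least store_release" side of SetEvent.
  signaled.store(true, std::memory_order_release);

  consumer.join();
  return seen;
}
```

With this pairing the function returns 42 on both TSO and weakly ordered hardware. Replacing both orderings with std::memory_order_relaxed would make the read of payload a data race, which is the failure mode the thread is worried about on AArch64.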

________________________________________
From: David Holmes <david.holmes at oracle.com>
Sent: Monday, July 13, 2020 19:25
To: Ludovic Henry; Andrew Haley; Thomas Stüfe
Cc: Kim Barrett; hotspot-runtime-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; openjdk-aarch64
Subject: Re: [aarch64-port-dev ] RFR(S): 8248657: Windows: strengthening in ThreadCritical regarding memory model

Hi Ludovic,

On 14/07/2020 11:28 am, Ludovic Henry wrote:
> Hello,
>
>> But if we are dealing with non-TSO races then it would be good to get
>> some guidance from Microsoft as to the memory ordering properties of
>> various API's to ensure that we are maintaining correct ordering. For
>> example, in the destructor we have:
>>
>>     lock_owner = 0;
>>     // No lost wakeups, lock_event stays signaled until reset.
>>     DWORD ret = SetEvent(lock_event);
>>
>> but unless we are guaranteed that the store to lock_owner cannot be
>> reordered by the compiler or the hardware, to appear to be after the
>> SetEvent, then the logic is broken. Generally, because Windows only
>> supported TSO systems, we have assumed that the compiler will not
>> reorder code across these kind of API calls. But now we also need
>> hardware guarantees.
>
> I can confirm that calls such as SetEvent have an implicit memory barrier as they are syscalls. This specific instance would then not suffer from any memory reordering issues.

That is good to know. But this is something that Microsoft should be
documenting explicitly - even if just a blanket statement that all
syscalls (which are what exactly?) provide an implicit memory barrier
(of what type exactly?).

> As for the general question around platforms with weaker memory models, AArch64 is not the first such platform that MSVC and Windows have been ported to. It is safe to assume that MSVC has a similar approach to GCC and Clang on memory reordering optimizations. [1] also gives some pointers on some MSVC specific knobs for working around the weaker memory model.

The /volatile:ms is the kind of build control I was wondering about.
Thanks for the pointer.

> Also, would jcstress help root out these kinds of issues in Hotspot, or does that only test the code generated with the Interpreter/C1/C2? We run successfully jcstress in `tough` mode.

jcstress tests will execute the native runtime code of course, but they
won't be "stressing" it as such.

Cheers,
David
-----

> I hope this helps to answer your questions.
>
> [1] https://docs.microsoft.com/en-us/cpp/build/common-visual-cpp-arm-migration-issues?view=vs-2019#volatile-keyword-default-behavior
>
> --
> Ludovic
> ________________________________________
> From: Andrew Haley <aph at redhat.com>
> Sent: Monday, July 13, 2020 01:36
> To: David Holmes; Thomas Stüfe
> Cc: Kim Barrett; Ludovic Henry; hotspot-runtime-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; openjdk-aarch64
> Subject: Re: [aarch64-port-dev ] RFR(S): 8248657: Windows: strengthening in ThreadCritical regarding memory model
>
> On 13/07/2020 06:48, David Holmes wrote:
>> Hi Thomas,
>>
>> On 13/07/2020 2:41 pm, Thomas Stüfe wrote:
>>>
>>> Can a compiler reorder system calls and stores? How would it determine
>>> if this is safe to do?
>
> I very much doubt it.
>
>> A compiler can reorder anything it likes if it can determine it is safe
>> to do so. :)
>
> I'm fairly sure the compiler doesn't care about that!
>
>>> I'd be surprised if Microsoft loosened up reordering since this would
>>> mean existing software cannot just be recompiled for arm and expected to
>>> work. But this is just a guess of course.
>>
>> It's an interesting point because I would expect there to be a lot of
>> software written for Windows that contains assumptions of TSO that would
>> in fact fail when run on Aarch64. I don't know if there are any special
>> mechanisms to force a binary to run in TSO mode on Aarch64 under Windows
>> (or build flags), that would allow for ease of migration.
>
> There's no standard hardware mechanism that would do so.
>
> I've been very surprised at how little software has broken on AArch64
> because of memory ordering. Like you, I initially assumed that stuff
> would break all over the place, but by and large it was OK. I know of
> two reasons: firstly, programmers are pretty conservative and tend to
> use simple and reliable mechanisms such as safe publication and
> mutexes for inter-thread communication. But also, and maybe more
> importantly, the kinds of reordering the hardware can do are not very
> different from those compilers do. Therefore, anyone playing fast and
> loose with TSO has probably already been bitten by the compiler.
>
>> But unless all Windows software will run in such a mode there is a
>> need for MS to document what the memory consistency properties of
>> various APIs are (as POSIX does [1]).
>
> Indeed. I would have thought it existed somewhere.
>
> --
> Andrew Haley  (he/him)
> Java Platform Lead Engineer
> Red Hat UK Ltd. <https://www.redhat.com/>
> https://keybase.io/andrewhaley
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
>