RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v3]

Tue Feb 28 02:56:02 UTC 2023

On Mon, 27 Feb 2023 08:49:18 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> I don't think we want this to be on by default on platforms where StoreLoad fences don't cause substantial global overheads. The benefit on such platforms is rather low, and needing the last couple of nanoseconds of transition speed does not seem to be a normal use case that default settings should optimize for. Conversely, the global synchronization can be rather intrusive, especially when it involves handshakes with N threads and you need to perform global synchronization across the entire machine for each thread poked. I would be much more afraid of that issue out of the box than of native transitions that are a couple of nanoseconds slower.

Hi Erik, StoreLoad fences cause substantial overhead on any multi-socket system, including x86_64. The benefit may be small on single-socket systems, but can the VM distinguish between the two? We are currently looking for benchmarks that show a negative effect of enabling it. The SPEC benchmarks don't seem to be affected by it. Note that we typically use only one membarrier syscall when we handshake all threads. If you know of any workload that suffers, it would be great to hear about it. David is currently also checking benchmarks. We should discuss further details in https://github.com/openjdk/jdk/pull/12753.
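
For readers following the thread, here is a minimal sketch of the fence being discussed, written with standard C++ atomics. It is not HotSpot code: the names `thread_state`, `poll_armed`, `transition_sees_poll`, `handshaker_sees_native`, and `demo` are hypothetical and exist only for illustration. The mutator stores its new state and then loads the poll word; the StoreLoad fence between the two is the per-transition cost. In the asymmetric scheme, that fence is dropped from the transition and the handshaking thread instead issues a process-wide barrier (on Linux, the membarrier syscall).

```cpp
#include <atomic>
#include <cassert>

// Hypothetical state, for illustration only.
std::atomic<int> thread_state{0};  // 0 = in Java, 1 = in native
std::atomic<int> poll_armed{0};    // set by the handshaking thread

// Mutator side: publish the new state, then check the poll word.
// The seq_cst fence orders the store before the load (StoreLoad).
bool transition_sees_poll() {
    thread_state.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return poll_armed.load(std::memory_order_relaxed) != 0;
}

// Handshaker side: arm the poll, then inspect the thread state.
// With a fence on both sides, at least one side observes the other.
bool handshaker_sees_native() {
    poll_armed.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return thread_state.load(std::memory_order_relaxed) == 1;
}

// Single-threaded walk-through of both sides (no real concurrency).
void demo() {
    assert(!transition_sees_poll());   // poll not armed yet
    assert(handshaker_sees_native());  // state was already published
}
```

The sketch shows only the symmetric variant, where both sides pay for a fence. The trade-off debated above is whether removing the mutator-side fence is worth making the handshaker's side a machine-wide operation.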

-------------

PR: https://git.openjdk.org/jdk/pull/12708