Critical JNI and (Shenandoah) pinning questions
David Holmes
david.holmes at oracle.com
Sun Aug 25 09:55:48 UTC 2019
On 25/08/2019 3:24 am, Ioannis Tsakpinis wrote:
> Hi Florian,
>
> On Fri, 23 Aug 2019 at 13:04, Florian Weimer <fweimer at redhat.com> wrote:
>> Why isn't the VZEROUPPER performed after using AVX2 registers? It's
>> supposed to cost approximately zero in that context.
>
> No idea.
>
>> Have you tried replacing the VZEROUPPER with a NOP of equal length?
>> Maybe it's just an instruction alignment issue.
>
> No, and I'm not really qualified to do such low-level tuning. I'm only
> trying to demonstrate that the current implementation is not ideal. A
> JVM expert should review it, test across different hardware/architectures
> and decide what the best approach is.
>
> After further testing today, I've also identified the other source of
> overhead in JDK 10+ compared to JDK 8: JDK-8213436 [1]. This explains
> why non-critical JNI with the patched JDK 14 is not as fast as JDK 8.
> The reasoning behind switching to UseMembar by default is sound, but
> I'm wondering whether the memory barrier mask is unnecessarily strict.
> It currently looks like this (all bits are used):
>
> __ membar(Assembler::Membar_mask_bits(
> Assembler::LoadLoad | Assembler::LoadStore |
> Assembler::StoreLoad | Assembler::StoreStore));
This is a full "fence", which AIUI is the requirement in this context.
David
-----
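Here is a simplified model of why a full fence is needed, or more precisely its StoreLoad component. When a thread leaves native code, it stores its new state and then reads the safepoint state. The VM thread does the reverse: it stores the safepoint state and then reads the thread's state. This is the classic store-buffering (Dekker) pattern. Only a StoreLoad barrier on both sides rules out the outcome where each side misses the other's store.

The sketch below is an illustration under assumptions, not HotSpot code. The names `Handshake`, `thread_in_trans`, `safepoint_pending`, `both_missed` and `count_both_missed` are hypothetical, and `std::atomic_thread_fence(std::memory_order_seq_cst)` stands in for `Assembler::membar`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-ins for the two flags in a Dekker-style handshake
// (illustrative names, not HotSpot identifiers).
struct Handshake {
    std::atomic<int> thread_in_trans{0};    // published by the Java thread
    std::atomic<int> safepoint_pending{0};  // published by the VM thread
};

// One store-buffering round: each side stores its own flag, optionally runs a
// full (seq_cst) fence, then loads the other side's flag. Returns true when
// both loads saw 0, i.e. each side missed the other's store.
bool both_missed(bool use_full_fence) {
    Handshake h;
    int java_saw = -1;
    int vm_saw = -1;
    std::thread java([&] {
        h.thread_in_trans.store(1, std::memory_order_relaxed);
        if (use_full_fence) std::atomic_thread_fence(std::memory_order_seq_cst);
        java_saw = h.safepoint_pending.load(std::memory_order_relaxed);
    });
    std::thread vm([&] {
        h.safepoint_pending.store(1, std::memory_order_relaxed);
        if (use_full_fence) std::atomic_thread_fence(std::memory_order_seq_cst);
        vm_saw = h.thread_in_trans.load(std::memory_order_relaxed);
    });
    java.join();  // join() makes java_saw / vm_saw safe to read below
    vm.join();
    return java_saw == 0 && vm_saw == 0;
}

// Counts how many of `rounds` rounds produced the both-missed outcome.
int count_both_missed(int rounds, bool use_full_fence) {
    int hits = 0;
    for (int i = 0; i < rounds; ++i) {
        if (both_missed(use_full_fence)) ++hits;
    }
    return hits;
}
```

With `use_full_fence == true`, the C++ memory model forbids the both-zero outcome, so `count_both_missed(n, true)` is always 0. With `false` that outcome is allowed, and on x86 it can occur via the store buffer, although thread start-up skew makes it rare in a loop like this.

x86's TSO model already preserves LoadLoad, LoadStore and StoreStore ordering. So StoreLoad is the only part of the mask that needs an actual instruction there (e.g. `mfence` or a `lock`-prefixed operation), and dropping the other three bits would not by itself make the barrier cheaper on x86.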
> Benchmark results on a Coffee Lake Xeon (better single-core performance
> than my Ryzen):
>
> JDK 8
> Standard JNI: ~4.3ns
> Critical JNI: ~4.3ns
> JDK 8 -XX:+UseMembar
> Standard JNI: ~8.0ns
> Critical JNI: ~7.7ns
> JDK 12
> Standard JNI: ~8.4ns
> Critical JNI: ~8.1ns
> Patched JDK 14 with VZEROUPPER
> Standard JNI: ~8.4ns
> Critical JNI: ~3.4ns (!)
> Patched JDK 14 without VZEROUPPER
> Standard JNI: ~7.9ns
> Critical JNI: ~2.9ns (!!)
>
> Note that the VZEROUPPER overhead on this machine is smaller than on
> Ryzen, but it's still measurable.
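The deltas in these figures can be worked out directly as a rough check. This assumes the effects are additive and takes the approximate "~" values at face value, which a single-machine microbenchmark does not guarantee. The struct and function names below are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// Approximate per-call times (ns) from the Coffee Lake Xeon results above.
struct Timing {
    double standard_ns;
    double critical_ns;
};

constexpr Timing kJdk8{4.3, 4.3};
constexpr Timing kJdk8UseMembar{8.0, 7.7};
constexpr Timing kJdk12{8.4, 8.1};
constexpr Timing kJdk14WithVzeroupper{8.4, 3.4};
constexpr Timing kJdk14WithoutVzeroupper{7.9, 2.9};

// Extra time attributed to a feature: time with it minus time without it.
// Assumes the effects are additive, which the benchmark does not establish.
double overhead_ns(double with_ns, double without_ns) {
    return with_ns - without_ns;
}

// Overhead as a fraction of the time without the feature.
double relative_overhead(double with_ns, double without_ns) {
    return (with_ns - without_ns) / without_ns;
}

// Speedup of the faster configuration over the slower one (2.0 = twice as fast).
double speedup(double slow_ns, double fast_ns) {
    return slow_ns / fast_ns;
}
```

With these inputs, UseMembar accounts for about 3.7ns of JDK 8's standard-JNI time (8.0 - 4.3). That is close to the whole 4.1ns gap between JDK 8 and JDK 12 (8.4 - 4.3), which is consistent with JDK-8213436 being the main remaining overhead.

VZEROUPPER costs about 0.5ns per call on both the standard and the critical path, roughly 17% of the 2.9ns critical call. The patched critical path without VZEROUPPER is about 2.8x faster than JDK 12's (8.1 / 2.9).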
>
> These findings suggest the following RFEs:
>
> 1. Skip check_needs_gc_for_critical_native() in primitive-only JNI
> critical natives, regardless of GC algorithm and object-pinning
> support. Without this change, CriticalJNINatives is completely useless
> and actually dangerous.
>
> 2. Skip the switch to "native transition" and the safepoint polling in
> primitive-only JNI critical natives.
>
> 3. Re-evaluate the use of VZEROUPPER instructions throughout the JNI
> wrapper code. Could there be fewer of them? Could they be eliminated
> entirely or emitted only for the specific CPU models that need them?
> Will benefit both standard and critical JNI natives.
>
> 4. Re-evaluate the memory barrier emitted before the safepoint poll.
> Could it be relaxed while preserving correctness? Will benefit standard
> JNI (critical JNI natives skip the barrier with #2).
>
> 5. Backport any changes for #1-4 to JDK 8u & 11u. JDK 8 also needs
> JDK-8167408 [2] and JDK-8167409 [3].
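RFEs #1 and #2 depend on recognizing "primitive-only" critical natives. One way to decide this is to scan the method's JVM type descriptor. If neither the parameters nor the return type contain an object (`L...;`) or array (`[`) type, the native receives no heap references. In that case there is nothing to pin or lock, which is the premise for skipping `check_needs_gc_for_critical_native()`.

This is a hypothetical sketch, not HotSpot's actual logic; `is_primitive_only_descriptor` is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True for the JVM primitive type codes that may appear as parameters.
static bool is_primitive_code(char c) {
    switch (c) {
        case 'B': case 'C': case 'D': case 'F':
        case 'I': case 'J': case 'S': case 'Z':
            return true;
        default:
            return false;
    }
}

// Hypothetical helper (not a HotSpot API): returns true when a method
// descriptor such as "(IJ)I" mentions no object ("L...;") or array ("[")
// types. Malformed descriptors are rejected (false).
bool is_primitive_only_descriptor(const std::string& desc) {
    if (desc.empty() || desc[0] != '(') return false;
    std::size_t close = desc.find(')');
    if (close == std::string::npos) return false;
    // Every parameter must be a single primitive code.
    for (std::size_t i = 1; i < close; ++i) {
        if (!is_primitive_code(desc[i])) return false;  // 'L', '[' or garbage
    }
    // Return type must be exactly one primitive code or 'V' (void).
    if (close + 2 != desc.size()) return false;
    char ret = desc[close + 1];
    return ret == 'V' || is_primitive_code(ret);
}
```

For example, `(IJ)I` (an `int`/`long` to `int` function) qualifies. `([BI)I` does not, because its `byte[]` argument is passed to a `JavaCritical_` function as a length/pointer pair pointing into the Java heap.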
>
> The "skip_native_trans" patch implements #1 and #2. I would also gladly
> help with #5 if necessary (haven't signed the OCA yet, but I will).
> Ideally, HotSpot engineers would take a look at #3 and #4 and test
> everything thoroughly.
>
> Thanks,
>
> - Ioannis
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8213436
> [2] https://bugs.openjdk.java.net/browse/JDK-8167408
> [3] https://bugs.openjdk.java.net/browse/JDK-8167409
>