GetPrimitiveArrayCritical vs GetByteArrayRegion: 140x slow-down using -Xcheck:jni and java.util.zip.DeflaterOutputStream

Ian Rogers irogers at google.com
Fri Mar 16 17:19:58 UTC 2018


Thanks Paul, very interesting.

On Fri, Mar 16, 2018 at 9:21 AM Paul Sandoz <paul.sandoz at oracle.com> wrote:

> Hi Ian, Thomas,
>
> Some background on the bulk copying for byte buffers after talking with
> Mikael who worked on these changes a few years ago.
>
> Please correct the following if needed as our memory is hazy :-)
>
> IIRC at the time we felt this was a reasonable thing to do because C2 did
> not strip mine the loops for the bulk copying of large Java arrays, i.e. the
> issue was there anyway for more common cases. However, I believe that may
> no longer hold in some cases after Roland implemented loop strip
> mining in C2 [1]. So we should go back and revisit/check the current
> support in buffers and Java arrays (System.arraycopy etc.).
>

The C2 issue is a well-known TTSP issue :-) It's great that there is a strip
mining optimization; revisiting the bulk copies would be great!
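
To make the term concrete, here is a minimal sketch of what loop strip
mining does to a long copy loop (plain C for illustration only; the chunk
size and poll placement are assumptions, not how C2 actually emits code):

    /* One long counted loop is split into an outer loop over fixed-size
     * chunks; a safepoint poll can then sit on the outer back-edge instead
     * of inside the hot inner loop, bounding time-to-safepoint. */
    void copy_strip_mined(char *dst, const char *src, size_t len) {
        const size_t CHUNK = 4096;               /* illustrative chunk size */
        for (size_t done = 0; done < len; ) {
            size_t n = len - done;
            if (n > CHUNK) n = CHUNK;
            for (size_t i = 0; i < n; i++) {     /* hot inner loop, no poll */
                dst[done + i] = src[done + i];
            }
            done += n;
            /* a safepoint poll would go here, once per chunk */
        }
    }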


> (This is also something we need to consider if we modify buffers to
> support capacities larger than Integer.MAX_VALUE. Also connects with
> Project Panama.)
>
> If Thomas has not done so or does not plan to i can log an issue for you.
>

That'd be great. I wonder if identifying more TTSP issues should also be a
bug. It's interesting to observe that overlooking TTSP in C2 motivated the
Unsafe.copyMemory change, permitting a fresh TTSP issue. If TTSP is a
first-class issue then maybe we can deprecate JNI critical regions to
support that effort :-)

Thanks,
Ian


> Paul.
>
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8186027
>
> On Mar 15, 2018, at 10:49 AM, Ian Rogers <irogers at google.com> wrote:
>
> +hotspot-gc-dev
>
> On Thu, Mar 15, 2018 at 1:25 AM Thomas Schatzl <thomas.schatzl at oracle.com>
> wrote:
>
> Hi,
>
> On Thu, 2018-03-15 at 01:00 +0000, Ian Rogers wrote:
>
> An old data point on how large a critical region should be comes from
> java.nio.Bits. In JDK 9 the code migrated into Unsafe, but in JDK 8 the
> copies within a critical region were bounded to at most 1MB each:
> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/native/java/nio/Bits.c#l88
> This is inconsistent with Deflater and ObjectOutputStream, which both allow
> unlimited array sizes and thereby unbounded critical regions.
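
For context, the JDK 8 pattern behind that 1MB bound looked roughly like the
sketch below (a simplified, hedged reconstruction in JNI C, not the actual
Bits.c source; names and details are illustrative):

    #include <jni.h>
    #include <string.h>

    #define MBYTE 1048576

    /* Copy 'length' bytes out of a Java byte[] at most 1MB at a time,
     * releasing the array between chunks so a safepoint can be reached. */
    static void copy_from_byte_array(JNIEnv *env, jbyteArray src, jlong srcPos,
                                     char *dst, jlong length) {
        while (length > 0) {
            jlong size = (length > MBYTE) ? MBYTE : length;
            jbyte *bytes = (*env)->GetPrimitiveArrayCritical(env, src, NULL);
            if (bytes == NULL) {
                return;                      /* OutOfMemoryError is pending */
            }
            memcpy(dst, bytes + srcPos, (size_t)size);
            (*env)->ReleasePrimitiveArrayCritical(env, src, bytes, JNI_ABORT);
            srcPos += size;
            dst += size;
            length -= size;
        }
    }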
>
> In JDK 9 the copies starve the garbage collector in nio Bits too, as
> there is no longer a 1MB limit on the copy sizes:
> http://hg.openjdk.java.net/jdk/jdk/rev/f70e100d3195
> which came from:
> https://bugs.openjdk.java.net/browse/JDK-8149596
>
> Perhaps this is a regression that simply hasn't been demonstrated because
> of the testing challenge.
> [...]
> It doesn't seem unreasonable to have the loops for the copies occur
> in 1MB chunks, but JDK-8149596 lost this, so I'm confused about what
> the HotSpot standpoint is.
>
>
> Please file a bug (seems to be a core-libs/java.nio regression?),
> preferably with some kind of regression test. Also file enhancements (I
> would guess) for the other cases allowing unlimited arrays.
>
>
> I don't have permissions to file bugs; there's some catch-22 scenario in
> getting them. Happy to have a bug filed, or to file one were that not an
> issue. Happy to create a test case, but I can't see any existing ones for
> TTSP issues. This feels like a potential use case for JMH: perhaps run the
> benchmark while having a separate thread run a GC benchmark.
>
> Should there be a bug to add, in debug mode, a TTSP watcher thread whose
> job it is to bring "random" threads into safepoints and report on tardy
> ones?
> Should there be a bug to warn on being in a JNI critical region for more
> than just a short period?
> Seems like there should be a bug on Unsafe.copyMemory and
> Unsafe.copySwapMemory having TTSP issues.
> Seems like there should be a bug on all uses of criticals that don't chunk
> their critical region work based on some bound (like the 1MB chunks for nio
> Bits). How are these bounds set? A past reference that I've lost had the
> bound be the equivalent of 65535 bytecodes, due to the expectation of GC
> work at least once per method invocation or loop back-edge - I thought this
> was in a spec somewhere but now I can't find it. The bytecode bound feels
> as arbitrary as 1MB; a time period would be better, but that depends on the
> kind of GC you want, as delays matter more for a concurrent GC than for a
> non-concurrent one. Clearly the chunk size shouldn't just be unbounded, yet
> that appears to currently be the norm in the JDK.
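
A time-based bound could, for instance, adapt the chunk size toward a target
budget per critical section. A minimal sketch of that idea, where the
~200 microsecond budget, starting chunk size, and growth policy are all made
up for illustration:

    #include <jni.h>
    #include <string.h>
    #include <time.h>

    /* Copy out of a Java byte[] in chunks, adapting the chunk size so that
     * each critical section stays near a fixed time budget. */
    static void copy_time_bounded(JNIEnv *env, jbyteArray src, char *dst,
                                  jsize len) {
        jsize chunk = 64 * 1024;                  /* illustrative start size */
        for (jsize done = 0; done < len; ) {
            jsize n = (len - done < chunk) ? (len - done) : chunk;
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            jbyte *bytes = (*env)->GetPrimitiveArrayCritical(env, src, NULL);
            if (bytes == NULL) {
                return;                           /* OutOfMemoryError pending */
            }
            memcpy(dst + done, bytes + done, (size_t)n);
            (*env)->ReleasePrimitiveArrayCritical(env, src, bytes, JNI_ABORT);
            clock_gettime(CLOCK_MONOTONIC, &t1);
            long us = (t1.tv_sec - t0.tv_sec) * 1000000L +
                      (t1.tv_nsec - t0.tv_nsec) / 1000L;
            if (us < 100 && chunk < (1 << 20)) {
                chunk *= 2;                       /* well under budget: grow */
            } else if (us > 200 && chunk > 4096) {
                chunk /= 2;                       /* over budget: shrink */
            }
            done += n;
        }
    }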
>
> The original reason for coming here was a 140x slow-down with -Xcheck:jni
> in Deflater.deflate. There are a few options there that it's useful to
> enumerate:
> 1) rewrite in Java, but there are correctness and open-source-related issues
> 2) remove underflow/overflow protection from critical arrays (revert
> JDK-6311046, or perhaps bound protection to arrays of a particular small
> size) - this removes checking and doesn't deal with TTSP
> 3) add a critical array slice API to JNI so that copies with -Xcheck:jni
> aren't unbounded (martinrb@ proposed this) - keeps checking but doesn't
> deal with TTSP
> 4) rewrite primitive array criticals with GetArrayRegion, as O(n) copying
> beats the "silent killer" TTSP (effectively deprecating the critical APIs)
>
> In general (i.e. not just the deflate case) I think (1) is the most
> preferable. (2) and (3) both have TTSP issues. (4) isn't great
> performance-wise, which motivates more use of approach (1), but I think
> deprecating criticals may just be the easiest and sanest way forward. I
> think that discussion is worth having on an e-mail thread rather than in a
> bug.
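
As a concrete illustration of option (4), a native method that currently
grabs the whole array with GetPrimitiveArrayCritical could instead copy the
data out with GetByteArrayRegion, trading an O(n) copy for never blocking
safepoints. A minimal hedged sketch (the process_bytes helper and the
overall shape are assumptions, not Deflater's actual code):

    #include <jni.h>
    #include <stdlib.h>

    /* Hypothetical native helper that consumes 'len' bytes of input. */
    extern void process_bytes(const jbyte *buf, jsize len);

    static void process_array_without_criticals(JNIEnv *env, jbyteArray arr) {
        jsize len = (*env)->GetArrayLength(env, arr);
        jbyte *buf = (jbyte *)malloc((size_t)len);
        if (buf == NULL) {
            return;            /* real code would throw OutOfMemoryError */
        }
        /* Copies out of the Java heap; no GC-blocking critical section. */
        (*env)->GetByteArrayRegion(env, arr, 0, len, buf);
        process_bytes(buf, len);
        free(buf);
    }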
>
>
> Long TTSP is a performance bug like any other.
>
> In a way criticals are better than Unsafe, as they may
> pin the memory and not starve the GC, which is what Shenandoah does.
>
>
> (Region-based) object pinning has its own share of problems:
>
> - it is only (relatively) easily implemented in region-based collectors
>
> - it may slow down pauses a bit in the presence of pinned regions/objects
> (for non-concurrent copying collectors)
>
> - excessive use of pinning may cause OOME and VM exit, probably earlier
> than the GC locker would. The GC locker seems to provide a more gradual
> degradation. E.g. pinning regions typically makes those regions
> unavailable for allocation,
> i.e. you still should not use it for many, very long-lived objects.
> Of course this somewhat depends on the sophistication of the
> implementation.
>
> I think region-based pinning would be a good addition to collectors other
> than Shenandoah too. It has been on our minds for a long
> time, but there are so many other more important issues :), so of
> course we are eager to see contributions in this area. ;)
>
> If you are interested in working on this, please ping us on hotspot-gc-
> dev for implementation ideas to get you jump-started.
>
> Thanks,
>  Thomas
>
>
> I'd rather deprecate criticals than build upon the complexity, but I'm very
> glad this is a concern.
>
> Thanks,
> Ian
>
>
>

