GetPrimitiveArrayCritical vs GetByteArrayRegion: 140x slow-down using -Xcheck:jni and java.util.zip.DeflaterOutputStream

Ian Rogers irogers at google.com
Thu Mar 15 17:49:53 UTC 2018


+hotspot-gc-dev

On Thu, Mar 15, 2018 at 1:25 AM Thomas Schatzl <thomas.schatzl at oracle.com>
wrote:

> Hi,
>
> On Thu, 2018-03-15 at 01:00 +0000, Ian Rogers wrote:
> > An old data point on how large a critical region should be comes from
> > java.nio.Bits. In JDK 9 the code migrated into unsafe, but in JDK 8
> > the copies within a critical region were bound at most copying 1MB:
> > http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/
> > native/java/nio/Bits.c#l88 This is inconsistent with Deflater and
> > ObjectOutputStream which both allow unlimited arrays and thereby
> > critical region sizes.
> >
> > In JDK 9 the copies starve the garbage collector in nio Bits too as
> > there is no 1MB limit to the copy sizes:
> > http://hg.openjdk.java.net/jdk/jdk/rev/f70e100d3195
> > which came from:
> > https://bugs.openjdk.java.net/browse/JDK-8149596
> >
> > Perhaps this is a regression not demonstrated due to the testing
> > challenge.
> > [...]
> > It doesn't seem unreasonable to have the loops for the copies occur
> > in 1MB chunks but JDK-8149596 lost this and so I'm confused on what
> > the HotSpot stand point is.
>
> Please file a bug (seems to be a core-libs/java.nio regression?),
> preferably with some kind of regression test. Also file enhancements (I
> would guess) for the other cases allowing unlimited arrays.
>

I don't have permissions to file bugs; there's some catch-22 scenario in
getting them. I'd be happy to have a bug filed, or to file one myself were
that not an issue. I'm also happy to create a test case, but I can't see
any existing ones for TTSP issues. This feels like a potential use case for
jmh: perhaps run the benchmark while a separate thread runs a GC benchmark.

Should there be a bug to add, in debug mode, a TTSP watcher thread whose
job it is to bring "random" threads into safepoints and report on tardy
ones?
Should there be a bug to warn when a thread stays in a JNI critical region
for more than a short period?
Seems like there should be a bug on Unsafe.copyMemory and
Unsafe.copySwapMemory having TTSP issues.
Seems like there should be a bug on all uses of criticals that don't chunk
their critical region work based on some bound (like the 1MB chunks for nio
Bits)? How are these bounds set? A past reference that I've lost had the
bound be the equivalent of 65535 bytecodes, due to the expectation of GC
work happening at least once per method or loop backedge - I thought this
was in a spec somewhere but now I can't find it. The bytecode size feels as
arbitrary as 1MB; a time period would be better, but that can depend on the
kind of GC you want, as delays matter more for concurrent GC than
non-concurrent. Clearly the chunk size shouldn't just be unbounded, yet
that appears to currently be the norm in the JDK.
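
For concreteness, here is a minimal sketch of the 1MB-chunked pattern, in
the style of the JDK 8 Bits.c loop linked above; copy_from_array_chunked is
a hypothetical helper, not an existing JDK function:

#include <jni.h>
#include <string.h>

#define MBYTE (1024 * 1024)

/* Copy 'length' bytes from a Java byte[] starting at 'offset' into 'dst',
 * holding the critical region for at most 1MB of copying at a time.
 * Hypothetical helper sketching the JDK 8 java/nio/Bits.c approach. */
static int copy_from_array_chunked(JNIEnv *env, jbyteArray src,
                                   jsize offset, jsize length, char *dst) {
    while (length > 0) {
        jsize chunk = (length > MBYTE) ? MBYTE : length;
        jbyte *base = (*env)->GetPrimitiveArrayCritical(env, src, NULL);
        if (base == NULL) {
            return -1;  /* OutOfMemoryError is already pending */
        }
        memcpy(dst, base + offset, (size_t)chunk);
        /* JNI_ABORT: read-only use, nothing to copy back. */
        (*env)->ReleasePrimitiveArrayCritical(env, src, base, JNI_ABORT);
        /* The thread is outside the critical region here, so a safepoint
         * (and GC) can make progress between chunks. */
        offset += chunk;
        dst += chunk;
        length -= chunk;
    }
    return 0;
}

The price is re-fetching the array base per chunk, which is why where the
bound sits matters.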

The original reason for coming here was a 140x slow-down with -Xcheck:jni
in Deflater.deflate. There are a few options there that it's useful to
enumerate:
1) rewrite in Java, but there are correctness and open-source-related
issues
2) remove underflow/overflow protection from critical arrays (revert
JDK-6311046, or perhaps bound the protection to arrays below a particular
small size) - this removes checking and doesn't deal with TTSP
3) add a critical array slice API to JNI so that copies with -Xcheck:jni
aren't unbounded (martinrb@ proposed this) - keeps checking but doesn't
deal with TTSP
4) rewrite primitive array criticals with GetArrayRegion, as O(n) copying
beats the "silent killer" TTSP (effectively deprecating the critical APIs);
a sketch follows this list

In general (i.e. not just the deflate case) I think (1) is the most
preferable. (2) and (3) both have TTSP issues. (4) isn't great
performance-wise, which motivates more use of approach (1), but I think
deprecating criticals may just be the easiest and sanest way forward. I
think that discussion is worth having on an e-mail thread rather than in a
bug.


> Long TTSP is a performance bug like any other.
>
> > In a way criticals are better than unsafe as they may
> > pin the memory and not starve GC, which shenandoah does.
>
> (Region based) Object pinning has its own share of problems:
>
>  - only (relatively) easily implemented in region based collectors
>
>  - may slow down pause a bit in presence of pinned regions/objects (for
> non-concurrent copying collectors)
>
>  - excessive use of pinning may cause OOME and VM exit probably earlier
> than the gc locker. GC locker seems to provide a more gradual
> degradation. E.g. pinning regions typically makes these regions
> unavailable for allocation.
> I.e. you still should not use it for many, very long living objects.
> Of course this somewhat depends on the sophistication of the
> implementation.
>
> I think region based pinning would be a good addition to other
> collectors than Shenandoah too. It has been on our minds for a long
> time, but there are so many other more important issues :), so of
> course we are eager to see contributions in this area. ;)
>
> If you are interested in working on this, please ping us on hotspot-gc-
> dev for implementation ideas to get you jump-started.
>
> Thanks,
>   Thomas
>

I'd rather deprecate criticals than build upon the complexity, but I'm very
glad this is a concern.

Thanks,
Ian

