GetPrimitiveArrayCritical vs GetByteArrayRegion: 140x slow-down using -Xcheck:jni and java.util.zip.DeflaterOutputStream

Ian Rogers irogers at google.com
Thu Mar 15 01:00:13 UTC 2018


An old data point on how large a critical region should be comes from
java.nio.Bits. In JDK 9 the code migrated into Unsafe, but in JDK 8 the
copies were bounded so that at most 1MB was copied within any one
critical region:
http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/native/java/nio/Bits.c#l88
This is inconsistent with Deflater and ObjectOutputStream, which both
allow arbitrarily large arrays and thereby arbitrarily large critical
regions.
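
For concreteness, the bounded pattern looks roughly like the sketch below
(not the actual Bits.c code; copyBounded and the 1MB CHUNK constant are
only illustrative):

  #include <jni.h>
  #include <string.h>

  #define CHUNK (1024 * 1024)  /* 1MB per critical section, as in JDK 8 Bits.c */

  /* Illustrative helper: copy a whole byte[] into native memory without
   * holding the array critical for more than CHUNK bytes at a time, so
   * the GC locker can be released between chunks. */
  static void copyBounded(JNIEnv *env, jbyteArray src, char *dst) {
      jsize length = (*env)->GetArrayLength(env, src);
      jsize off = 0;
      while (off < length) {
          jsize n = length - off;
          if (n > CHUNK) n = CHUNK;
          jbyte *p = (*env)->GetPrimitiveArrayCritical(env, src, NULL);
          if (p == NULL) return;  /* OutOfMemoryError is pending */
          memcpy(dst + off, p + off, (size_t) n);
          (*env)->ReleasePrimitiveArrayCritical(env, src, p, JNI_ABORT);
          off += n;
      }
  }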

In JDK 9 the copies can starve the garbage collector in nio's Bits too,
as there is no longer a 1MB limit on the copy sizes:
http://hg.openjdk.java.net/jdk/jdk/rev/f70e100d3195
which came from:
https://bugs.openjdk.java.net/browse/JDK-8149596

Perhaps this is a regression that hasn't been demonstrated because of the
difficulty of testing for it. There is a related time-to-safepoint
discussion thread here:
https://groups.google.com/d/msg/mechanical-sympathy/f3g8pry-o1A/x6NptTDslcIJ
"silent killer"
It doesn't seem unreasonable to have the copy loops operate in 1MB chunks,
but JDK-8149596 lost this, so I'm confused about what the HotSpot
standpoint is. In a way criticals are better than Unsafe, as they may pin
the memory and not starve the GC, which is what Shenandoah does.
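
For comparison, the GetByteArrayRegion approach from the subject line
copies only the requested slice and never enters a critical region; again
just a sketch (copySlice is illustrative):

  #include <jni.h>
  #include <stdlib.h>

  /* Illustrative alternative: GetByteArrayRegion copies just the slice,
   * so the cost is O(len) of the slice rather than O(array length), and
   * the GC is never blocked because no critical region is entered. */
  static jbyte *copySlice(JNIEnv *env, jbyteArray src, jsize off, jsize len) {
      jbyte *buf = malloc((size_t) len);
      if (buf == NULL) return NULL;
      (*env)->GetByteArrayRegion(env, src, off, len, buf);
      if ((*env)->ExceptionCheck(env)) {  /* bad off/len bounds */
          free(buf);
          return NULL;
      }
      return buf;  /* caller frees */
  }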

Thanks,
Ian



On Wed, Mar 7, 2018 at 10:16 AM Ian Rogers <irogers at google.com> wrote:

> Thanks Martin! Profiling shows most of the time spent in this code is in
> the call to libz's deflate. I worry that increasing the buffer size
> increases that per-call work and holds the critical region for longer.
> Profiling likely won't show this issue, as there needs to be contention
> on the GC locker.
>
> In HotSpot:
>
> http://hg.openjdk.java.net/jdk/jdk/file/2854589fd853/src/hotspot/share/gc/shared/gcLocker.hpp#l34
> "Avoid calling these if at all possible" could be taken to suggest that
> JNI critical regions should also be avoided if at all possible. I think
> HotSpot and the JDK are out of step if this is the case, and there could
> be work done to remove JNI critical regions from the JDK and replace them
> either with Java code (JITs are better now) or with Get/Set...ArrayRegion.
> This does appear to be an O(1) to O(n) transition, so perhaps the HotSpot
> folks could speak to it.
>
> Thanks,
> Ian
>
>
> On Tue, Mar 6, 2018 at 6:44 PM Martin Buchholz <martinrb at google.com>
> wrote:
>
>> Thanks Ian and Sherman for the excellent presentation and memories of
>> ancient efforts.
>>
>> Yes, Sherman, I still have a vague memory that attempting to touch any
>> implementation detail in this area was asking for trouble and someone
>> would complain.  I was happy to let you deal with those problems!
>>
>> There's a continual struggle in the industry to enable more checking at
>> test time, and -Xcheck:jni does look like it should be possible to
>> routinely turn on for running all tests.  (Google tests run with a time
>> limit, and so any low-level performance regression immediately causes test
>> failures, for better or worse)
>>
>> Our problem reduces to accessing a primitive array slice from native
>> code.  The only way to get O(1) access is via GetPrimitiveArrayCritical,
>> BUT when it fails you have to pay for a copy of the entire array.  An
>> obvious solution is to introduce a slice variant GetPrimitiveArrayRegionCritical
>> that would only degrade to a copy of the slice.  Offhand that seems
>> relatively easy to implement though we would hold our noses at adding yet
>> more *Critical* functions to the JNI spec.  In spirit though it's a
>> straightforward generalization.
>>
>> Implementing Deflater in pure Java seems very reasonable and we've had
>> good success with "nearby" code, but we likely cannot reuse the GNU
>> Classpath code.
>>
>> Thanks for pointing out
>> JDK-6311046: -Xcheck:jni should support checking of
>> GetPrimitiveArrayCritical
>> which went into jdk8 in u40.
>>
>> We can probably be smarter about choosing a better buffer size, e.g. in
>> ZipOutputStream.
>>
>> Here's an idea:  In code like this
>>   try (DeflaterOutputStream dout = new DeflaterOutputStream(deflated)) {
>>       dout.write(inflated, 0, inflated.length);
>>   }
>> when the DeflaterOutputStream is given an input that is clearly too large
>> for the current buffer size, reorganize internals dynamically to use a much
>> bigger buffer size.
>>
>> It's possible (but hard work!) to adjust algorithms based on whether
>> critical array access is available.  It would be nice if we could get the
>> JVM to tell us (but it might depend, e.g. on the size of the array).
>>
>

