GetPrimitiveArrayCritical vs GetByteArrayRegion: 140x slow-down using -Xcheck:jni and java.util.zip.DeflaterOutputStream

Mon Mar 5 19:15:09 UTC 2018

Thanks! Changing the DeflaterOutputStream buffer size to something other
than the default reduces the number of JNI native calls and is a possible
workaround here. As this is an implementation detail, could the change be
made in the JDK? Unfortunately, larger input sizes will bring the
regression back, as the number of calls is "input size / buffer size". A
JNI critical may give direct access to the array but, depending on the GC,
may require a lock, so lock contention may be a significant issue with the
code and contribute to tail latencies. In my original post I mentioned that
this is difficult to measure, and I think good practice is to avoid JNI
critical regions.
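
To make the buffer-size workaround concrete, here is a minimal sketch (not
taken from the JDK or from the original benchmark). The class and helper
names (DeflateBufferSketch, CountingDeflater, countDeflateCalls) are
hypothetical. The count relies on the assumption that
DeflaterOutputStream drains its Deflater through deflate(byte[], int, int),
which is an implementation detail, and it only approximates the number of
native calls. 512 is the default DeflaterOutputStream buffer size and
8192 * 64 is the size suggested in the quoted message below.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Random;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

class DeflateBufferSketch {

    /** Hypothetical helper: a Deflater that counts deflate(byte[], int, int) calls. */
    static final class CountingDeflater extends Deflater {
        int calls;

        @Override
        public int deflate(byte[] b, int off, int len) {
            calls++; // one call per output buffer handed to the deflater
            return super.deflate(b, off, len);
        }
    }

    /** Compresses input with the given stream buffer size and returns the call count. */
    static int countDeflateCalls(byte[] input, int bufferSize) throws IOException {
        CountingDeflater deflater = new CountingDeflater();
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            // The three-argument constructor sets the size of the internal output buffer.
            try (DeflaterOutputStream out =
                     new DeflaterOutputStream(sink, deflater, bufferSize)) {
                out.write(input);
            }
            return deflater.calls;
        } finally {
            deflater.end(); // a caller-supplied Deflater is not ended by close()
        }
    }

    public static void main(String[] args) throws IOException {
        // Random bytes are close to incompressible, so output size is about input size.
        byte[] input = new byte[1 << 20];
        new Random(42).nextBytes(input);

        System.out.println("buffer 512:       " + countDeflateCalls(input, 512));
        System.out.println("buffer 8192 * 64: " + countDeflateCalls(input, 8192 * 64));
    }
}
```

The larger buffer reduces the call count for a fixed input, but the count
still grows with the amount of data, which is why this only postpones the
problem.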

Thanks,
Ian

On Mon, Mar 5, 2018 at 10:41 AM Xueming Shen <xueming.shen at oracle.com>
wrote:

> On 03/05/2018 10:28 AM, Xueming Shen wrote:
> > On 03/05/2018 08:34 AM, Ian Rogers wrote:
> >> Firstly, we're not running -Xcheck:jni in production code :-) During
> >> development and testing it doesn't seem an unreasonable flag to enable,
> >> but a 140x regression is too much to get developers to swallow.
> >>
> >> There are 2 performance considerations:
> >> 1) the performance of -Xcheck:jni, which probably shouldn't be orders of
> >> magnitude worse than without the flag.
> >> 2) the problems associated with JNI criticals, for which
> >> GetByteArrayRegion is a panacea, but one that introduces a copying
> >> overhead.
> >>
> >>
> >
> > The reason the GetByteArrayCritical was/is being used here is exactly to
> > avoid the copy overhead, which was an issue escalated in the past. Though
> > the "copy overhead" appears to be much bigger for the GBAC when
> > -Xcheck:jni is used here.
> >
> > Another issue with the DeflaterOutputStream is that the default buf size
> > is relatively small, for historical reasons. So with a
> > DeflaterOutputStream(deflated, new Deflater(), 8192 * 64),
> > is which a bigger buf/8192*64,  the performance is close to the run with
> > the -Xcheck:jni
> >
>
> typo:
>
> in which a bigger buf/8192*64 is used, .... run without the -Xcheck:jni is
> specified.
>
> -Sherman
>
>