Performance regression in java.util.zip.Deflater
Dave Bristor
David.Bristor at Sun.COM
Thu Dec 20 22:33:49 UTC 2007
Hi Ig,
Thanks for the suggestions. Here's some more history.
Clemens Eisserer wrote:
> Hi again,
>
>> Sun engineers have tried to get reasonable performance
>> without using JNI_Get*Critical, since that introduces other
>> serious performance problems. It was my belief that any
>> pathological n^2 performance problems had been truly fixed.
> At least the code in JDK7u23 looks like (n^2)/2 or something like
> that, it copies every time the whole bytes which are left, including
> malloc/free.
Successive attempts to address this performance / scalability problem have
focused on minimizing the amount of data copied. As noted, the fix is in
DeflaterOutputStream.write, where stride bytes at a time are deflated, *not*
the entire user-provided data buffer. This results in more JNI calls (and
consequently more malloc-copy-deflate-free) but does not stall GC.
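
For illustration, the striding approach can be sketched roughly as below. This is
only a minimal sketch against the public java.util.zip.Deflater API, not the
actual DeflaterOutputStream code: the class name StridedDeflate, the method
deflateInStrides, and the STRIDE value of 512 are all assumptions made for the
example.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

class StridedDeflate {
    // Hypothetical stride size, chosen only for this example.
    static final int STRIDE = 512;

    // Compresses input by handing the Deflater at most STRIDE bytes per
    // setInput call, so each native call only has to copy a small slice
    // rather than everything that remains of the user's buffer.
    static byte[] deflateInStrides(byte[] input) {
        Deflater def = new Deflater();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        try {
            for (int off = 0; off < input.length; off += STRIDE) {
                int len = Math.min(STRIDE, input.length - off);
                def.setInput(input, off, len);
                // Drain compressed output until this stride is consumed.
                while (!def.needsInput()) {
                    int n = def.deflate(buf, 0, buf.length);
                    out.write(buf, 0, n);
                }
            }
            // Signal end of input and flush whatever is still buffered.
            def.finish();
            while (!def.finished()) {
                int n = def.deflate(buf, 0, buf.length);
                out.write(buf, 0, n);
            }
        } finally {
            def.end(); // release the native zlib state
        }
        return out.toByteArray();
    }
}
```

The trade-off is the one described above: a smaller stride means more JNI
round trips, but each one touches only a bounded amount of data.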
>> Sun engineers have tried to get reasonable performance
>> without using JNI_Get*Critical, since that introduces other
>> serious performance problems.
> Could please tell me when and why. As far as I understood the problem
> with the *Critical*-Functions is that they hinder the JVM in doing
> some operations (GC, ...) which limits scalability.
Prior to 1.5.0_u7 the *Critical* functions were used, but for the sake of
6206933, their use was replaced with data copying.
> If this is the only reason, using them may not be that bad if the
> Get*ArrayRegion also has some GC-atomic behaviour. Copying 50mb data
> atomically also blocks the GC, doesn't it?
>
> I am working on a fix which processes the data in "strides", therefore
> the lock is only held a short time. Is this really a bad idea, except
> for the additional JNI overhead?
The observations I've made show that the use of strides results in 2x slower
performance as compared with the *Critical* functions. Not ideal, but
much better than the ~10x worse performance of early attempts at
resolving the issue.
FWIW, we looked into using DirectByteBuffer but did not like the idea of
keeping 2 copies of data around.
Moving the striding from DeflaterOutputStream to Deflater (and possibly
providing similar functionality in the Inflater side) seems like a Good Idea.
IIRC, we put the striding into DeflaterOutputStream because that has the buffer
whose size is known (and optionally provided by the user when the instance is
created).
Thanks,
Dave
>
> Thanks, lg Clemens