Performance regression in java.util.zip.Deflater

Thu Dec 20 22:33:49 UTC 2007

Hi Ig,

Thanks for the suggestions.  Here's some more history.

Clemens Eisserer wrote:
> Hi again,
> 
>> Sun engineers have tried to get reasonable performance
>> without using JNI_Get*Critical, since that introduces other
>> serious performance problems.  It was my belief that any
>> pathological n^2 performance problems had been truly fixed.
> At least the code in JDK7u23 looks like (n^2)/2 or something like
> that, it copies every time the whole bytes which are left, including
> malloc/free.

Successive attempts to address this performance / scalability problem have 
focused on minimizing the amount of data copied.  As noted, the fix is in 
DeflaterOutputStream.write, where stride bytes at a time are deflated, *not* 
the entire user-provided data buffer.  This results in more JNI calls (and 
consequently more malloc-copy-deflate-free cycles) but does not stall GC.
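
To make the striding idea concrete, here is a minimal sketch of the same pattern written against the public Deflater API.  It is not the actual DeflaterOutputStream.write code; the class name, the helper methods and the 512-byte stride are assumptions chosen for illustration.  Each setInput call hands the native side only one stride, so each JNI call copies at most that many bytes.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustrative sketch only: StridedDeflate and its methods are hypothetical names.
public class StridedDeflate {

    // Assumed stride size for illustration; the real value is an implementation detail.
    static final int STRIDE = 512;

    // Deflate 'data' by feeding the Deflater at most 'stride' bytes per setInput call.
    static byte[] deflateInStrides(byte[] data, int stride) {
        Deflater deflater = new Deflater();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[stride];
        try {
            for (int off = 0; off < data.length; off += stride) {
                int len = Math.min(stride, data.length - off);
                deflater.setInput(data, off, len);
                // Drain output until the deflater has consumed this stride.
                while (!deflater.needsInput()) {
                    int n = deflater.deflate(buf, 0, buf.length);
                    out.write(buf, 0, n);
                }
            }
            // Flush whatever zlib still holds internally.
            deflater.finish();
            while (!deflater.finished()) {
                int n = deflater.deflate(buf, 0, buf.length);
                out.write(buf, 0, n);
            }
        } finally {
            deflater.end(); // release native zlib memory
        }
        return out.toByteArray();
    }

    // Helper used only to check the round trip.
    static byte[] inflate(byte[] compressed, int originalLength) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] result = new byte[originalLength];
            int total = 0;
            while (total < originalLength && !inflater.finished()) {
                int n = inflater.inflate(result, total, originalLength - total);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // no further progress possible
                }
                total += n;
            }
            return result;
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt deflate stream", e);
        } finally {
            inflater.end();
        }
    }
}
```

A caller would use StridedDeflate.deflateInStrides(data, StridedDeflate.STRIDE).  A smaller stride means shorter native calls but more of them, which is the trade-off described above.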

>> Sun engineers have tried to get reasonable performance
>> without using JNI_Get*Critical, since that introduces other
>> serious performance problems.
> Could you please tell me when and why? As far as I understood the problem
> with the *Critical*-Functions is that they hinder the JVM in doing
> some operations (GC, ...) which limits scalability.

Prior to 1.5.0_u7 the *Critical* functions were used, but for the sake of 
6206933, their use was replaced with data copying.

> If this is the only reason, using them may not be that bad if the
> Get*ArrayRegion also has some GC-atomic behaviour. Copying 50mb data
> atomically also blocks the GC, doesn't it?
> 
> I am working on a fix which processes the data in "strides", therefore
> the lock is only held a short time. Is this really a bad idea, except
> for the additional JNI overhead?

In my measurements, the use of strides is about 2x slower than the 
*Critical* functions.  Not ideal, but much better than the ~10x slowdown 
seen with earlier attempts at resolving the issue.

FWIW, we looked into using DirectByteBuffer but did not like the idea of 
keeping 2 copies of data around.

Moving the striding from DeflaterOutputStream to Deflater (and possibly 
providing similar functionality in the Inflater side) seems like a Good Idea.
IIRC, we put the striding into DeflaterOutputStream because that has the buffer 
whose size is known (and optionally provided by the user when the instance is 
created).

Thanks,
	Dave

> 
> Thanks, lg Clemens