4206909 - adding Z_SYNC_FLUSH support to deflaters

Martin Buchholz martinrb at google.com
Sat Sep 5 01:21:33 UTC 2009


On Fri, Sep 4, 2009 at 18:02, Xueming Shen <Xueming.Shen at sun.com> wrote:
> Thanks for the comments.
>
> Brandon Long wrote:
>>
>> In the new Deflater.deflate, you are checking valid flush values against
>> NO_FLUSH twice (and missing FULL_FLUSH)
>>
>
> Good catch - a bad copy/paste.  More unit tests will be written to catch
> this kind of problem after we finalize the APIs.
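
For concreteness, the corrected check presumably looks something like
this (a sketch only, assuming the three flush constants this patch
proposes):

    // Hypothetical sketch of the argument check in
    // Deflater.deflate(byte[] b, int off, int len, int flush):
    if (flush != NO_FLUSH && flush != SYNC_FLUSH && flush != FULL_FLUSH)
        throw new IllegalArgumentException("Invalid flush mode: " + flush);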
>
>> In the new DeflaterOutputStream.flush, you loop as long as some output
>> is generated, which can call deflate an extra time.  Normally, code
>> that does this loops until def.needsInput() returns true and the output
>> is less than the size of the output buffer.  I would have thought it
>> would have generated extra output, but then the loop would never
>> finish... so I guess this must work.
>>
>>
>
> The current code works.  But your suggestion of using len < b.length
> (which in zlib terms means avail_out > 0) seems a better and correct
> approach, which can save an extra round.  A quick scan of zlib's
> deflate/flush_pending shows it should work; I will update to that after
> giving the code a more careful read and running some tests.  Given that
> each/every write() loops till needsInput() returns true, I don't think
> we need to consider this in flush().
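
For concreteness, a flush() using that termination condition would look
roughly like this (a sketch only; def, buf, and out are
DeflaterOutputStream's existing fields, and SYNC_FLUSH is the constant
this patch adds):

    public void flush() throws IOException {
        if (!def.finished()) {
            int len;
            // Drain until deflate() leaves room in the output buffer
            // (zlib's avail_out > 0), i.e. no pending output remains.
            do {
                len = def.deflate(buf, 0, buf.length, Deflater.SYNC_FLUSH);
                if (len > 0)
                    out.write(buf, 0, len);
            } while (len == buf.length);
        }
        out.flush();
    }

The len == buf.length test is exactly the "output is less than the size
of the output buffer" condition above: a full buffer may mean zlib still
has pending output, so deflate() is called again until the buffer is not
filled.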
>
>> Overall, I'm least happy with #4, since I feel it leaves a bug.
>> flush() on a stream should flush everything I've written to the stream.
>> The bug is that it currently doesn't, and this change doesn't fix it.
>> It makes it possible for people to fix it (which isn't possible
>> currently without using a completely separate implementation), but it
>> doesn't fix the bug.
>>

I think "bug" is going too far.  There are OutputStreams that can't dispose
of previously read data without reading some more first.  It depends on
the transformation.  DeflaterOutputStream is in a gray area -
it "could" write all the data to its underlying stream, but at the cost of
sacrificing some compression (which is its purpose).

> Understood and "agreed":-) I'm wiling to change position under more
> pressure:-)  And we
> can add that anytime. It's better than put it in now and have to take it out
> later or add in some
> ugly workaround.

Maybe we should understand the risk.  Doing a SYNC_FLUSH on every
DOS.flush() won't cause the compression/decompression to "break";
you will just have very slow and very bad compression.
How bad could it be?  Suppose we test with random data, doing a
SYNC_FLUSH on every byte.  Presumably the "compressed" output will be
larger than the input by some factor.  If that factor is close to 1,
then it's probably OK... it's "only" a performance problem.
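
That worst case is easy to measure.  Here is a sketch (the class name
and buffer sizes are made up; it assumes the four-argument deflate()
this patch proposes):

    import java.util.Random;
    import java.util.zip.Deflater;

    public class SyncFlushOverhead {
        public static void main(String[] args) {
            byte[] input = new byte[1 << 16];
            new Random(42).nextBytes(input);      // incompressible random data
            Deflater def = new Deflater();
            byte[] out = new byte[256];
            long total = 0;
            for (byte b : input) {
                def.setInput(new byte[] { b });   // feed one byte at a time
                int n;
                do {                              // SYNC_FLUSH after every byte
                    n = def.deflate(out, 0, out.length, Deflater.SYNC_FLUSH);
                    total += n;
                } while (n == out.length);        // loop again if buffer filled
            }
            def.finish();
            for (int n; (n = def.deflate(out)) > 0; )
                total += n;                       // drain the final block
            def.end();
            System.out.printf("in=%d out=%d factor=%.2f%n",
                              input.length, total, (double) total / input.length);
        }
    }

If the printed factor stays near 1, per-flush SYNC_FLUSH is indeed
"only" a performance problem, not a correctness one.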

Anyways, I am leaning towards changing DOS to do the Right Thing.

Martin
