4206909 - adding Z_SYNC_FLUSH support to deflaters
Martin Buchholz
martinrb at google.com
Sat Sep 5 01:21:33 UTC 2009
On Fri, Sep 4, 2009 at 18:02, Xueming Shen <Xueming.Shen at sun.com> wrote:
> Thanks for the comments.
>
> Brandon Long wrote:
>>
>> In the new Deflater.deflate, you are checking valid flush values against
>> NO_FLUSH twice (and missing FULL_FLUSH)
>>
>
> Good catch; that was a bad copy/paste. More unit tests will be written to
> catch this kind of problem once we finalize the APIs.
>
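For the record, the corrected check presumably wants to read something like
this (a sketch only, using the three flush constants the patch proposes):

    import java.util.zip.Deflater;

    class FlushModeCheck {
        // Hypothetical helper mirroring the intended validation in the new
        // Deflater.deflate(byte[], int, int, int flush) overload: reject
        // anything that is not one of the three supported flush modes.
        static void checkFlush(int flush) {
            if (flush != Deflater.NO_FLUSH
                    && flush != Deflater.SYNC_FLUSH
                    && flush != Deflater.FULL_FLUSH)
                throw new IllegalArgumentException("bad flush mode: " + flush);
        }
    }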
>> In the new DeflaterOutputStream.flush, you loop as long as some output is
>> generated, which can call deflate an extra time. Normally, code that
>> does this loops until def.needsInput() is true and the output is less
>> than the size of the output buffer. I would have thought it would have
>> generated extra output, but then the loop would never finish... so I
>> guess this must work.
>>
>>
>
> The current code works, but your suggestion of using len < b.length (which
> really means zlib's avail_out > 0) seems a better and correct approach, and
> it can save an extra round. A quick scan of zlib's deflate/flush_pending
> shows it should work; I will update to that after giving the code a more
> careful read and running some tests. Given that each write() loops until
> needsInput() returns true, I don't think we need to consider this in
> flush().
>
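As I read it, the revised loop would stop as soon as a deflate() call fails
to fill the output buffer, since len < buf.length means zlib had avail_out
left over and everything pending has been flushed. Sketched as a subclass
override, assuming the patch's four-argument deflate() (the real change
would of course live in DeflaterOutputStream.flush itself):

    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.zip.Deflater;
    import java.util.zip.DeflaterOutputStream;

    class SyncFlushingStream extends DeflaterOutputStream {
        SyncFlushingStream(OutputStream out, Deflater def, int size) {
            super(out, def, size);
        }

        @Override public void flush() throws IOException {
            if (!def.finished()) {
                int len;
                while ((len = def.deflate(buf, 0, buf.length,
                                          Deflater.SYNC_FLUSH)) > 0) {
                    out.write(buf, 0, len);
                    if (len < buf.length)
                        break;  // avail_out > 0: all pending output flushed
                }
            }
            out.flush();
        }
    }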
>> Overall, I'm least happy with #4, since I feel it leaves a bug: flush()
>> on a stream should flush everything I've written to the stream. The
>> bug is that it currently doesn't, and this change doesn't fix it. It makes
>> it possible for people to fix it themselves (which isn't possible today
>> without using a completely separate implementation), but it doesn't fix the bug.
>>
I think "bug" is going too far. There are OutputStreams that can't dispose
of previously read data without reading some more first. It depends on
the transformation. DeflaterOutputStream is in a gray area -
it "could" write all the data to its underlying stream, but at the cost of
sacrificing some compression (which is its purpose).
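For what it's worth, one middle ground between "fix flush() for everyone"
and "leave it alone" would be an opt-in flag. A sketch of the caller's
side, with a hypothetical boolean constructor argument (true = do a
SYNC_FLUSH on every flush()):

    import java.io.ByteArrayOutputStream;
    import java.util.zip.Deflater;
    import java.util.zip.DeflaterOutputStream;

    public class SyncFlushDemo {
        public static void main(String[] args) throws Exception {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            // Hypothetical fourth argument enabling SYNC_FLUSH on flush().
            DeflaterOutputStream dos =
                new DeflaterOutputStream(sink, new Deflater(), 512, true);
            dos.write("hello, flush".getBytes("UTF-8"));
            dos.flush();
            // With the opt-in, everything written so far is decodable from
            // sink; without it, some bytes may still sit in the deflater.
            System.out.println("bytes on the wire: " + sink.size());
        }
    }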
> Understood and "agreed":-) I'm wiling to change position under more
> pressure:-) And we
> can add that anytime. It's better than put it in now and have to take it out
> later or add in some
> ugly workaround.
Maybe we should understand the risk. Doing a SYNC_FLUSH on every
DOS.flush() won't cause the compression/decompression to "break";
you will just get very slow and very bad compression.
How bad could it be? Suppose we test with random data
and do a SYNC_FLUSH after every byte. Presumably the "compressed"
output will be larger than the input by some factor. If that factor
is close to 1,
then it's probably OK... it's "only" a performance problem.
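Here is the kind of quick experiment I have in mind (a sketch, assuming the
patch's four-argument deflate(); the sizes are arbitrary):

    import java.util.Random;
    import java.util.zip.Deflater;

    public class SyncFlushBlowup {
        public static void main(String[] args) {
            final int N = 100_000;
            byte[] input = new byte[N];
            new Random(42).nextBytes(input);    // incompressible input
            Deflater def = new Deflater();
            byte[] out = new byte[256];
            long total = 0;
            // Worst case: feed one byte at a time, SYNC_FLUSH after each.
            for (int i = 0; i < N; i++) {
                def.setInput(input, i, 1);
                int n;
                while ((n = def.deflate(out, 0, out.length,
                                        Deflater.SYNC_FLUSH)) > 0)
                    total += n;
            }
            def.end();
            System.out.printf("expansion factor: %.2f%n",
                              (double) total / N);
        }
    }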
Anyways, I am leaning towards changing DOS to do the Right Thing.
Martin