ZipFileSystem performance regression
Lennart Börjeson
lenborje at gmail.com
Tue Apr 23 07:19:52 UTC 2019
Any chance we might get this repaired in time for the Java 13 ramp down?
Best regards,
/Lennart Börjeson
> 16 apr. 2019 kl. 23:02 skrev Langer, Christoph <christoph.langer at sap.com>:
>
> Hi,
>
> I also think the regression should be repaired and maybe we can have an option like "lazy compress" to avoid compression on write but defer it to zipfs closing time.
>
> It should also be possible to parallelize deflation during close, shouldn't it?
>
> Best regards
> Christoph
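
[Editor's note: the "parallelize deflation during close" idea could be sketched roughly as below. This is an illustrative sketch only, not the actual zipfs internals; the `deflateAll`/`deflate` names are made up for the example. Entries are deflated concurrently on the common pool, and the results are collected in entry order so a single writer thread can then stream them to the archive.]

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;
import java.util.zip.Deflater;

public class ParallelDeflate {

    // Deflate one entry's bytes with a raw (nowrap) deflater, as zip entries use.
    static byte[] deflate(byte[] data) {
        Deflater def = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try {
            def.setInput(data);
            def.finish();
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!def.finished()) {
                bos.write(buf, 0, def.deflate(buf));
            }
            return bos.toByteArray();
        } finally {
            def.end();
        }
    }

    // Submit every entry to the common pool, then collect the results in
    // entry order so one writer thread can stream them to the archive.
    static List<byte[]> deflateAll(List<byte[]> entries) throws Exception {
        List<Future<byte[]>> tasks = new ArrayList<>();
        for (byte[] e : entries) {
            tasks.add(ForkJoinPool.commonPool().submit(() -> deflate(e)));
        }
        List<byte[]> compressed = new ArrayList<>();
        for (Future<byte[]> t : tasks) {
            compressed.add(t.get());
        }
        return compressed;
    }
}
```

As Sherman notes further down the thread, the gain from this approach is bounded by the final single-threaded write to the underlying filesystem.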
>
>> -----Original Message-----
>> From: core-libs-dev <core-libs-dev-bounces at openjdk.java.net> On Behalf
>> Of Xueming Shen
>> Sent: Dienstag, 16. April 2019 22:50
>> To: Lennart Börjeson <lenborje at gmail.com>
>> Cc: core-libs-dev at openjdk.java.net
>> Subject: Re: ZipFileSystem performance regression
>>
>> Well, I have to admit I didn't expect your use scenario when I made the
>> change. My thinking was that, for a filesystem, runtime access performance
>> carries more weight than shutdown performance... Basically you are not
>> using zipfs as a filesystem, but as an alternative jar tool that happens
>> to have better concurrent in/out performance. Yes, back then I was
>> working on using zipfs as a memory filesystem. One possible usage is for
>> javac to use it as its (temp?) filesystem to write out compiled class
>> files ... so I thought I could get better performance by keeping those
>> classes uncompressed until the zip/jar filesystem is closed and written
>> to a "jar" file.
>>
>> That said, a regression is a regression; we probably want to get the
>> performance back for your use scenario. I just wanted to give you some
>> background on what happened back then.
>>
>>
>> -Sherman
>>
>>
>> On 4/16/19 12:54 PM, Lennart Börjeson wrote:
>>> I’m using the tool I wrote to compress directories with thousands of log
>>> files. The standard zip utility (as well as my utility when run with JDK
>>> 12) takes up to an hour of user time to create the archive; on our
>>> server-class 40+ core machines this is reduced to 1–2 minutes.
>>>
>>> So while I understand the motivation for the change, I don’t get why you
>>> would want to use zipfs for what in essence is a RAM disk, *unless* you
>>> want it compressed in memory?
>>>
>>> Oh well. Do we need a new option for this?
>>>
>>> /Lennart Börjeson
>>>
>>> Electrogramma ab iPhono meo missum est
>>>
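
[Editor's note: a tool of the kind described above can be sketched as below. This is a reconstruction for illustration, not Lennart's actual code. Under the pre-JDK-12 behavior, each Files.copy into the zip filesystem deflated its entry on the calling worker thread, so the parallel stream spread the compression work across cores; after the change, all deflation moved to the single-threaded close().]

```java
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.stream.Stream;

public class ParallelZip {

    // Copy every regular file under srcDir into a new zip archive via zipfs,
    // using a parallel stream so per-entry work runs on multiple cores.
    public static void zipDirectory(Path srcDir, Path zipFile) throws Exception {
        URI uri = URI.create("jar:" + zipFile.toUri());
        try (FileSystem zfs = FileSystems.newFileSystem(uri, Map.of("create", "true"))) {
            try (Stream<Path> files = Files.walk(srcDir)) {
                files.filter(Files::isRegularFile).parallel().forEach(p -> {
                    try {
                        Path target = zfs.getPath("/", srcDir.relativize(p).toString());
                        if (target.getParent() != null) {
                            Files.createDirectories(target.getParent());
                        }
                        Files.copy(p, target);
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                });
            }
        } // close() assembles and writes the final archive
    }
}
```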
>>>> 16 apr. 2019 kl. 21:44 skrev Xueming Shen <xueming.shen at gmail.com>:
>>>>
>>>> One of the motivations back then was to speed up access to those
>>>> entries: you don't have to deflate/inflate new/updated entries during
>>>> the lifetime of the zip filesystem. Updated entries only get compressed
>>>> when they go to storage. So the regression is more of a trade-off
>>>> between the performance of different usages. (It also simplifies the
>>>> logic for handling different types of entries ...)
>>>>
>>>>
>>>> One idea I experimented with long ago for the jar tool was to
>>>> concurrently write out entries that need compression ... it does gain
>>>> some performance improvement on multi-core machines, but not a lot, as
>>>> it ends up coming back to the main thread to write out to the
>>>> underlying filesystem.
>>>>
>>>>
>>>> -Sherman
>>>>
>>>>> On 4/16/19 5:21 AM, Claes Redestad wrote:
>>>>> Both before and after this regression, it seems the default behavior is
>>>>> not to use a temporary file (until ZFS.sync(), which writes to a temp
>>>>> file and then moves it in place, but that's different from what happens
>>>>> with the useTempFile option enabled). Instead entries (and the backing
>>>>> zip file system) are kept in-memory.
>>>>>
>>>>> The cause of the issue here is instead that no deflation happens until
>>>>> sync(), even when writing to entries in-memory. Previously, the
>>>>> deflation happened eagerly, then the result of that was copied into
>>>>> the zip file during sync().
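>>>>>

[Editor's note: the eager behavior Claes describes can be modeled as below. This is a simplified illustration, not the actual ZipFileSystem code; the `EagerEntry` class is invented for the example. The point is that compression happens as the entry's output stream is written and closed, so the thread doing the write pays the CPU cost, and sync() only copies already-deflated bytes.]

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.util.zip.DeflaterOutputStream;

// Simplified model of an in-memory zip entry that deflates eagerly:
// bytes are compressed as the caller writes them, so the thread doing
// Files.copy pays the compression cost, not the thread running sync().
class EagerEntry {
    private final ByteArrayOutputStream compressed = new ByteArrayOutputStream();

    // Stream handed to the writer; compresses on the fly.
    OutputStream newOutputStream() {
        return new DeflaterOutputStream(compressed);
    }

    // What sync() would copy into the archive: already-deflated bytes.
    byte[] compressedBytes() {
        return compressed.toByteArray();
    }
}
```

In the post-regression ("lazy") model, the entry would instead keep the raw bytes and all deflation would pile up in the single-threaded sync().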
>>>>>
>>>>> I've written a proof-of-concept patch that restores the behavior of
>>>>> eagerly compressing entries when the method is METHOD_DEFLATED and the
>>>>> target is to store byte[]s in-memory (the default scenario):
>>>>>
>>>>> http://cr.openjdk.java.net/~redestad/scratch/zfs.eager_deflation.00/
>>>>>
>>>>> This restores performance of parallel zip to that of 11.0.2 for the
>>>>> default case. It still has a similar regression for the case where
>>>>> useTempFile is enabled, but that should be easily addressed if this
>>>>> looks like a way forward?
>>>>>
>>>>> (I've not yet created a bug as I got too caught up in trying to figure
>>>>> out what was going on here...)
>>>>>
>>>>> Thanks!
>>>>>
>>>>> /Claes
>>>>>
>>>>>> On 2019-04-16 09:29, Alan Bateman wrote:
>>>>>>> On 15/04/2019 14:32, Lennart Börjeson wrote:
>>>>>>> :
>>>>>>>
>>>>>>> Previously, the deflation was done in the call to Files.copy, thus
>>>>>>> executed in parallel, and the final ZipFileSystem.close() didn't do
>>>>>>> much.
>>>>>>>
>>>>>> Can you submit a bug? When creating/updating a zip file with zipfs,
>>>>>> closing the file system creates the zip file. Someone needs to check,
>>>>>> but it may have been that the temporary files (on the file system
>>>>>> hosting the zip file) were deflated when writing (which is surprising
>>>>>> but may have been the case).
>>>>>>
>>>>>> -Alan