RFR(JDK12/NIO) 8202285: (fs) Add a method to Files for comparing file contents

Joe Wang huizhe.wang at oracle.com
Thu Sep 20 00:34:10 UTC 2018



On 9/19/18, 2:02 PM, Roger Riggs wrote:
> Hi Alan,
>
> On 9/19/18 4:14 PM, Alan Bateman wrote:
>> On 19/09/2018 19:48, Joe Wang wrote:
>>> Hi,
>>>
>>> After much discussion and 10 iterations of reviews, this proposal 
>>> has evolved from what was the original isSameContent method to a 
>>> mismatch method. API-wise, a compare method was also considered as 
>>> it looked like just a short step forward from mismatch, however, it 
>>> was eventually dropped since there is no convincing use case 
>>> comparing files lexicographically by contents. Impl-wise, extensive 
>>> performance benchmarking has been done to compare a buffered reading 
>>> vs memory mapping, the result was that a simple buffered reading 
>>> performed better among small files, and those with the mismatched 
>>> byte closer to the beginning of files. Since the proposed method's 
>>> targeted files are small ones, the impl currently does a buffered 
>>> reading only.
>>>
>>> Please review.
>>>
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8202285
>>>
>>> specdiff: 
>>> http://cr.openjdk.java.net/~joehw/jdk12/8202285/specdiff/java/nio/file/Files.html
>>>
>>> webrev: http://cr.openjdk.java.net/~joehw/jdk12/8202285/webrev/
>> Starting out using buffered I/O is probably okay but I assume we will 
>> want to change this in the future to having it use memory mapped I/O 
>> beyond a certain threshold.
> This came up in off-line discussions, it seems unlikely that two files 
> will differ only in the last of 100Mb
> and it will require a separate code path that will very infrequently 
> be exercised.  So I'd still to a single
> technique even if it is slightly slower for very large files to keep 
> the size of the code in check.
> If it shows up later as a performance problem it can be added.

+1
>
> $.02, Roger
>
>>
>> Can you explain the use of toRealPath and comparing the names? That 
>> shouldn't be needed. Also the catching of ProviderMismatchExcepiton 
>> seems a bit strange too. Can you replace mismatchByAttrs with 
>> isSameFile? You could call this from Files.mismatch and then use the 
>> supporting implementation for the case that the files are not the same.

The purpose of toRealPath is so that we can do a quick Path comparison 
by following symlinks if any, I'll add a test to compare a file and its 
symbolic link. ProviderMismatchExcepiton was added in one of the 
iterations when we defined UOE, shall be removed in the next update 
(I'll have an update after some performance work to compare with direct 
buffer is done).

mismatchByAttrs cannot be replaced with isSameFile, it does one edge 
case check more than isSameFile when the size of one of the files is 
zero. mismatchByAttrs opens the files once vs twice if isSameFile and 
size(path) are called.

Thanks,
Joe

>>
>> -Alan
>


More information about the core-libs-dev mailing list