Hashing files/bytes <was> Re: RFR(JDK11/NIO) 8202285: (fs) Add a method to Files for comparing file contents

forax at univ-mlv.fr forax at univ-mlv.fr
Wed May 2 06:53:21 UTC 2018


----- Original Message -----
> From: "John Rose" <john.r.rose at oracle.com>
> To: "Remi Forax" <forax at univ-mlv.fr>
> Cc: "Paul Sandoz" <paul.sandoz at oracle.com>, "nio-dev" <nio-dev at openjdk.java.net>, "core-libs-dev"
> <core-libs-dev at openjdk.java.net>
> Sent: Wednesday, May 2, 2018 07:35:38
> Subject: Re: Hashing files/bytes <was> Re: RFR(JDK11/NIO) 8202285: (fs) Add a method to Files for comparing file contents

> Here's another potential stacking:
> 
> Define an interface ByteSequence, similar to CharSequence,
> as a zero-copy reference to some stored bytes somewhere.
> (Give it a long length.)  Define bulk methods on it like hash
> and mismatch and transferTo.  Then make File and ByteBuffer
> implement it.  Deal with the cross-product of source and
> destination types underneath the interface.
> 
> (Also I want ByteSequence as a way to encapsulate resource
> data for class files and condy, using zero-copy methods.
> The types byte[] and String don't scale and require copies.)

Your ByteSequence is ByteBuffer!
A ByteBuffer can be a mapped file or wrap a byte array,
mismatch is compareTo, transferTo is put(ByteBuffer), and hash should be messageDigest.digest(ByteBuffer), which doesn't exist but should.
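
A minimal sketch of that idea, not a proposed API. The class name MappedFileHash and its helper methods are made up for illustration. It assumes the file fits in a single mapping (under 2 GB) and a JDK 11 build, where ByteBuffer.mismatch is available. Since digest(ByteBuffer) does not exist, it goes through MessageDigest.update(ByteBuffer), which does.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, for illustration only.
final class MappedFileHash {

    // Maps the whole file read-only. The mapping stays valid after the
    // channel is closed. Assumes the file is smaller than 2 GB.
    static ByteBuffer map(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    // Hashes the file contents without copying them into a byte[] first.
    static byte[] hash(Path path, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        digest.update(map(path)); // consumes the buffer's remaining bytes
        return digest.digest();
    }

    // Index of the first differing byte, or -1 if the contents are equal.
    static int mismatch(Path a, Path b) throws IOException {
        return map(a).mismatch(map(b));
    }
}
```

Usage would be something like MappedFileHash.hash(Path.of("myfile.txt"), "SHA-1"). For larger files the mapping would have to be done in chunks, with one update call per chunk.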

> 
> — John

Rémi

> 
> On May 1, 2018, at 3:04 PM, forax at univ-mlv.fr wrote:
>> 
>> ----- Original Message -----
>>> From: "Paul Sandoz" <paul.sandoz at oracle.com>
>>> To: "Remi Forax" <forax at univ-mlv.fr>
>>> Cc: "Alan Bateman" <Alan.Bateman at oracle.com>, "nio-dev"
>>> <nio-dev at openjdk.java.net>, "core-libs-dev"
>>> <core-libs-dev at openjdk.java.net>
>>> Sent: Tuesday, May 1, 2018 00:37:57
>>> Subject: Hashing files/bytes <was> Re: RFR(JDK11/NIO) 8202285: (fs) Add a method
>>> to Files for comparing file contents
>> 
>>> Thanks, better than I expected with the transferTo method we recently added, but
>>> I think we could do even better for the ease-of-use case of "give me the hash
>>> of this file's contents or these bytes or this byte buffer".
>> 
>> Yes, it could be a nice addition to java.nio.file.Files, and in that case the
>> documentation of the method that compares contents can refer to this new
>> method.
>> 
>>> 
>>> Paul.
>> 
>> Rémi
>> 
>>> 
>>>> On Apr 30, 2018, at 3:23 PM, Remi Forax <forax at univ-mlv.fr> wrote:
>>>> 
>>>>> 
>>>>> To Remi’s point this might dissuade/guide developers from using this method when
>>>>> there are other more efficient techniques available when operating at larger
>>>>> scales. However, it is unfortunately harder than it should be in Java to hash
>>>>> the contents of a file, a byte[] or ByteBuffer, according to some chosen
>>>>> algorithm (or a good default).
>>>> 
>>>> it's 6 lines of code
>>>> 
>>>> var digest = MessageDigest.getInstance("SHA1");
>>>> try(var input = Files.newInputStream(Path.of("myfile.txt"));
>>>>     var output = new DigestOutputStream(OutputStream.nullOutputStream(), digest)) {
>>>>   input.transferTo(output);
>>>> }
>>>> var hash = digest.digest();
>>>> 
>>>> or 3 lines if you don't mind loading the whole file into memory
>>>> 
>>>> var digest = MessageDigest.getInstance("SHA1");
>>>> digest.update(Files.readAllBytes(Path.of("myfile.txt")));
>>>> var hash = digest.digest();
>>>> 
>>>>> 
>>>>> Paul.
>>>> 
>>>> Rémi

