Hashing files/bytes <was> Re: RFR(JDK11/NIO) 8202285: (fs) Add a method to Files for comparing file contents
forax at univ-mlv.fr
Wed May 2 06:53:21 UTC 2018
----- Original Message -----
> From: "John Rose" <john.r.rose at oracle.com>
> To: "Remi Forax" <forax at univ-mlv.fr>
> Cc: "Paul Sandoz" <paul.sandoz at oracle.com>, "nio-dev" <nio-dev at openjdk.java.net>, "core-libs-dev"
> <core-libs-dev at openjdk.java.net>
> Sent: Wednesday, May 2, 2018 07:35:38
> Subject: Re: Hashing files/bytes <was> Re: RFR(JDK11/NIO) 8202285: (fs) Add a method to Files for comparing file contents
> Here's another potential stacking:
>
> Define an interface ByteSequence, similar to CharSequence,
> as a zero-copy reference to some stored bytes somewhere.
> (Give it a long length.) Define bulk methods on it like hash
> and mismatch and transferTo. Then make File and ByteBuffer
> implement it. Deal with the cross-product of source and
> destination types underneath the interface.
>
> (Also I want ByteSequence as a way to encapsulate resource
> data for class files and condy, using zero-copy methods.
> The types byte[] and String don't scale and require copies.)
Your ByteSequence is ByteBuffer!
A ByteBuffer can be a mapped file or wrap a byte array;
mismatch is compareTo, transferTo is put(ByteBuffer), and hash should be messageDigest.digest(ByteBuffer), which doesn't exist but should.
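
To make that mapping concrete, here is a minimal sketch, assuming JDK 11 or later. The class and method names (ByteBufferSketch, map, hash, firstMismatch, copy) are hypothetical and only for illustration. Since digest(ByteBuffer) does not exist, the sketch uses the existing MessageDigest.update(ByteBuffer) followed by digest(); it also uses ByteBuffer.mismatch, which reports the index of the first differing byte where compareTo only reports an ordering.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class: ByteBuffer playing the role of "ByteSequence".
final class ByteBufferSketch {

    // Map a file read-only. The mapping stays valid after the channel is closed.
    // Assumption: the file is smaller than 2 GiB, the limit of a single mapping.
    static ByteBuffer map(Path file) throws IOException {
        try (var channel = FileChannel.open(file, StandardOpenOption.READ)) {
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    // "hash": update(ByteBuffer) consumes the remaining bytes, then digest() finishes.
    // duplicate() keeps the caller's position and limit untouched.
    static byte[] hash(ByteBuffer bytes, String algorithm) throws NoSuchAlgorithmException {
        var digest = MessageDigest.getInstance(algorithm);
        digest.update(bytes.duplicate());
        return digest.digest();
    }

    // "mismatch": relative index of the first differing byte, or -1 if none.
    static int firstMismatch(ByteBuffer a, ByteBuffer b) {
        return a.duplicate().mismatch(b.duplicate());
    }

    // "transferTo": bulk put of the remaining bytes into a new heap buffer.
    static ByteBuffer copy(ByteBuffer source) {
        var target = ByteBuffer.allocate(source.remaining());
        target.put(source.duplicate());
        return target.flip();
    }
}
```

Usage would look like ByteBufferSketch.hash(ByteBufferSketch.map(Path.of("myfile.txt")), "SHA-256"). Note that ByteBuffer has an int length, so this does not cover the long length asked for above.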
>
> — John
Rémi
>
> On May 1, 2018, at 3:04 PM, forax at univ-mlv.fr wrote:
>>
>> ----- Original Message -----
>>> From: "Paul Sandoz" <paul.sandoz at oracle.com>
>>> To: "Remi Forax" <forax at univ-mlv.fr>
>>> Cc: "Alan Bateman" <Alan.Bateman at oracle.com>, "nio-dev"
>>> <nio-dev at openjdk.java.net>, "core-libs-dev"
>>> <core-libs-dev at openjdk.java.net>
>>> Sent: Tuesday, May 1, 2018 00:37:57
>>> Subject: Hashing files/bytes <was> Re: RFR(JDK11/NIO) 8202285: (fs) Add a method
>>> to Files for comparing file contents
>>
>>> Thanks, better than I expected with the transferTo method we recently added, but
>>> I think we could do even better for the ease-of-use case of "give me the hash
>>> of this file's contents or these bytes or this byte buffer".
>>
>> Yes, it would be a nice addition to java.nio.file.Files, and in that case the
>> method that compares contents can have a reference in its documentation to this
>> new method.
>>
>>>
>>> Paul.
>>
>> Rémi
>>
>>>
>>>> On Apr 30, 2018, at 3:23 PM, Remi Forax <forax at univ-mlv.fr> wrote:
>>>>
>>>>>
>>>>> To Remi’s point this might dissuade/guide developers from using this method when
>>>>> there are other more efficient techniques available when operating at larger
>>>>> scales. However, it is unfortunately harder than it should be in Java to hash
>>>>> the contents of a file, a byte[] or ByteBuffer, according to some chosen
>>>>> algorithm (or a good default).
>>>>
>>>> it's 6 lines of code
>>>>
>>>> var digest = MessageDigest.getInstance("SHA1");
>>>> try (var input = Files.newInputStream(Path.of("myfile.txt"));
>>>>      var output = new DigestOutputStream(OutputStream.nullOutputStream(), digest)) {
>>>>   input.transferTo(output);
>>>> }
>>>> var hash = digest.digest();
>>>>
>>>> or 3 lines if you don't mind loading the whole file into memory
>>>>
>>>> var digest = MessageDigest.getInstance("SHA1");
>>>> digest.update(Files.readAllBytes(Path.of("myfile.txt")));
>>>> var hash = digest.digest();
>>>>
>>>>>
>>>>> Paul.
>>>>
>>>> Rémi
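
For the ease-of-use case raised in the thread ("give me the hash of this file's contents"), the streaming snippet quoted above could be wrapped in a helper along these lines. This is only a sketch, assuming JDK 11 or later; FileHashSketch, hash and toHex are hypothetical names, not a proposed or existing java.nio.file.Files API.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, for illustration only.
final class FileHashSketch {

    // Stream the file through a DigestOutputStream so the whole file
    // never has to be loaded into memory at once.
    static byte[] hash(Path file, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        var digest = MessageDigest.getInstance(algorithm);
        try (var input = Files.newInputStream(file);
             var output = new DigestOutputStream(OutputStream.nullOutputStream(), digest)) {
            input.transferTo(output);
        }
        return digest.digest();
    }

    // Render the digest as lowercase hexadecimal, two characters per byte.
    static String toHex(byte[] bytes) {
        var sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

A caller would then write FileHashSketch.toHex(FileHashSketch.hash(Path.of("myfile.txt"), "SHA-256")). The algorithm is passed in by name here because the thread leaves open what a good default would be.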