RFR (JDK11/NIO) 8201276: (fs) Add methods to Files for reading/writing a string from/to a file
Roger Riggs
Roger.Riggs at Oracle.com
Tue Jun 12 17:37:56 UTC 2018
Hi Joe,
Looks good to me.
Typo: StringCoding.java:1026 "unmappble" (no new webrev needed)
Regards, Roger
On 6/12/2018 12:52 PM, Joe Wang wrote:
> Hi all,
>
> It's been a while since the 1st round of reviews. Thank you all for
> the valuable comments and suggestions! Note that Roger's last
> response went to nio-dev, but not core-libs-dev, I've therefore
> attached it below.
>
> CSR [2]: the CSR is now approved. Note the write method has been
> changed to writeString.
>
> Impl [3]: for performance reason and the different requirement from
> byte-string conversion for malformed/unmappable character handing, the
> implementation uses a specific method separate from the existing ones.
> Both Sherman and Alan preferred specific method than adding parameters
> to the APIs that might be error prone. Thanks Sherman for the code!
>
> JMH benchmark tests showed the new read method performs better than an
> operation that reads the bytes and convert to string. The gains start
> to be significant even at quite reasonable sizes (< 10K).
>
>
> [1] JBS: https://bugs.openjdk.java.net/browse/JDK-8201276
> [2] CSR: https://bugs.openjdk.java.net/browse/JDK-8202055
> [3] webrevs: http://cr.openjdk.java.net/~joehw/jdk11/8201276/webrev/
>
> Please review.
>
> Thanks,
> Joe
>
> On 5/15/18, 7:51 AM, Roger Riggs wrote:
>> Hi Joe, et. al.
>>
>> My $.02 on line separators:
>>
>> - We should avoid clever tricks trying to solve problems that are
>> infrequent such as cross OS file line operations.
>> Most files will be read/written on a single system so the line
>> separators will be as expected.
>>
>> - Have/use APIs that split lines consistently accepting both line
>> separators so developer code can be agnostic to line separators.
>> aka BufferedReader.readLine for developers that are processing
>> the contents *as lines*.
>> Those other methods already exist; if there are any gaps in line
>> oriented processing that's a separate task.
>>
>> - These new file methods are defined to handle Charset
>> encoding/decoding and buffering.
>> Since there are other methods to deal with files as lines these
>> methods should not look for or break to lines.
>>
>> - Performance: adding code to look for line characters will slow it
>> down and in most cases would have no impact
>> since the line endings will match the system.
>>
>> - The strings passed to writeString (CharSequence) should have been
>> constructed using the proper line separators.
>> There are any number of methods that insert the os specific line
>> terminator and its easy to write File.separator as needed.
>>
>> As for the method naming:
>>
>> I'd prefer "writeString" as a familiar term that isn't stretched too
>> far by making the argument be a CharSequence.
>> its fine to call a CharSequence a string (with a lower case s).
>>
>> $.02, Roger
>>
>>
>> On 4/27/18 6:33 PM, Joe Wang wrote:
>>>
>>>
>>> On 4/27/2018 12:50 PM, forax at univ-mlv.fr wrote:
>>>> ----- Mail original -----
>>>>> De: "Joe Wang" <huizhe.wang at oracle.com>
>>>>> À: "Remi Forax" <forax at univ-mlv.fr>, "Alan Bateman"
>>>>> <Alan.Bateman at oracle.com>
>>>>> Cc: "nio-dev" <nio-dev at openjdk.java.net>, "core-libs-dev"
>>>>> <core-libs-dev at openjdk.java.net>
>>>>> Envoyé: Vendredi 27 Avril 2018 21:21:01
>>>>> Objet: Re: RFR (JDK11/NIO) 8201276: (fs) Add methods to Files for
>>>>> reading/writing a string from/to a file
>>>>> Hi Rémi, Alan,
>>>> Hi Joe,
>>>>
>>>>> I'm not sure we'd want to replace system dependent line separators
>>>>> with
>>>>> '\n'. There are cases as you described where the replacement makes
>>>>> sense. However, there are other cases too. For example, the
>>>>> purpose is
>>>>> to read, edit a text file and then write it back. If this is on
>>>>> Windows,
>>>>> and if line separators are replaced with '\n', that would cause the
>>>>> resulting file not formatted properly in for example Notepad.
>>>>> There are
>>>>> cases where users write text files on Windows using Java, and only
>>>>> found
>>>>> the lines were not separated in Notepad.
>>>> I agree that why the counterpart of readString() that write the
>>>> string should inserts the platform's line separator.
>>>> BTW, i just found that those methods are not named writeString, or
>>>> writeCharSequence, i think they should.
>>>
>>> While readString() does not modify the original content (e.g. by
>>> replacing the platform's line separator with '\n'), write(String)
>>> won't either, by adding extra characters such as the line separator.
>>>
>>> I would think interfaces shall read along with the parameters.
>>> readString(Path) == read as a String from the specified Path
>>> (one could argue for readToString, readAsString, but we generally
>>> omit the preps)
>>> write(Path, CharSequence) == write the CharSequence to the file,
>>> since CharSequence is already in the method signature as a
>>> parameter, we probably don't want to add that to the name, otherwise
>>> it would read like repeating the word CharSequence.
>>>
>>> It is in a similar situation as write(Path, Iterable) where it was
>>> defined as writeLines(Path, Iterable).
>>>
>>>>
>>>>> Files.write(Path, Iterable) is also specified to terminate each line
>>>>> with the platform's line separator. If readString does the
>>>>> replacement,
>>>>> it would be inconsistent.
>>>>
>>>> Anyway, if we look for consistency the methods writeCharSequence
>>>> should transform the \n in the CharSequence to the platform's line
>>>> separator.
>>>>
>>>> Files.write(Path, Iterable) is is not a counterpart of
>>>> readString(), it's consistent with Files.lines() or
>>>> Files.readLines() (or BufferedReader.readLine()) that all suppress
>>>> the line separators. Anyway, Files.write(path,
>>>> readString(path)::line) will be consistent if you replace the line
>>>> separators or not because String.line() suppresses the line
>>>> separators.
>>>
>>> readString pairs with write(String), therefore it's more like
>>> Files.write(path, readString(path)) than readString(path)::line. The
>>> use case:
>>>
>>> String s = Files.read(path);
>>> Files.write(path, s.replace("/config/location", "/new/location"));
>>>
>>> would then work as expected.
>>>
>>> These two methods are one-off (open/read/write/close file)
>>> operation. write(String) therefore is not intended for
>>> adding/appending multiple Strings from other operations such as
>>> String.line(). If an app needs to put the result of String.line() or
>>> any other processes into a file using this method, it needs to take
>>> care of adding the line separator itself when necessary. "when
>>> necessary" would be a judgement call by the developer.
>>>
>>> That said, if there's a consensus on the idea of terminating the
>>> string with a line separator, then readString method would need to
>>> strip it off to go with the write method.
>>>
>>>>
>>>>> I would think readString behaves like readAllBytes, that would
>>>>> simply read all content faithfully.
>>>> readString does an interpolation due to the Charset so it's not the
>>>> real content, again, the idea is that developers should not have to
>>>> care about the platform's line separators (they have more important
>>>> things to think).
>>>>
>>>> The other solution is to just not introduce those new methods,
>>>> after all Files.lines().collect(Collectors.joining("\n")) already
>>>> does the job, no ?
>>>
>>> While there are many ways to do it, none is as straight-forward.
>>> "Read (entire) file to String"/"Write String to file" are popular
>>> requests from users. Read to string -> do some String manipulation
>>> -> write it back is such a simple use case, it really needs to be
>>> very easy to do as illustrated in the above code snippet.
>>>
>>> Cheers,
>>> Joe
>>>
>>>>
>>>>> Best,
>>>>> Joe
>>>> regards,
>>>> Rémi
>>>>
>>>>> On 4/27/2018 4:43 AM, forax at univ-mlv.fr wrote:
>>>>>> Hi Alan,
>>>>>>
>>>>>> People do not read the documentation.
>>>>>> So adding something in the documentation about when a method
>>>>>> should be used or
>>>>>> not is never a solution.
>>>>>>
>>>>>> Here the user want a String and provides a charset so you have no
>>>>>> way but to
>>>>>> decode the content to substitute the line separator.
>>>>>>
>>>>>> cheers,
>>>>>> Rémi
>>>>>>
>>>>>> ----- Mail original -----
>>>>>>> De: "Alan Bateman" <Alan.Bateman at oracle.com>
>>>>>>> À: "Remi Forax" <forax at univ-mlv.fr>, "Joe Wang"
>>>>>>> <huizhe.wang at oracle.com>
>>>>>>> Cc: "nio-dev" <nio-dev at openjdk.java.net>, "core-libs-dev"
>>>>>>> <core-libs-dev at openjdk.java.net>
>>>>>>> Envoyé: Vendredi 27 Avril 2018 13:34:12
>>>>>>> Objet: Re: RFR (JDK11/NIO) 8201276: (fs) Add methods to Files for
>>>>>>> reading/writing a string from/to a file
>>>>>>> On 27/04/2018 12:29, Remi Forax wrote:
>>>>>>>> I think that having a readString that includes OS dependent
>>>>>>>> line separators is a
>>>>>>>> mistake,
>>>>>>>> Java does a great job to try to shield the developer from the
>>>>>>>> kind of things
>>>>>>>> that makes programs behave differently on different platforms.
>>>>>>>>
>>>>>>>> readString should subtitute (\r)?\n to \n otherwise either
>>>>>>>> people will do a call
>>>>>>>> replace() which is less efficient or will learn the lesson the
>>>>>>>> hard way.
>>>>>>>>
>>>>>>>> raw string literal does the same substitution for the same reason.
>>>>>>>>
>>>>>>> Yes, there are several discussion points around this and
>>>>>>> somewhat timely
>>>>>>> with multi-string support.
>>>>>>>
>>>>>>> One thing that I think Joe will need in this API is some note to
>>>>>>> make it
>>>>>>> clearer what the intended usage is (as I think the intention is
>>>>>>> simple
>>>>>>> cases with mostly single lines of text).
>>>>>>>
>>>>>>> -Alan.
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/nio-dev/attachments/20180612/44d9b945/attachment-0001.html>
More information about the nio-dev
mailing list