RFR (JDK11/NIO) 8201276: (fs) Add methods to Files for reading/writing a string from/to a file
Joe Wang
huizhe.wang at oracle.com
Wed Jun 13 19:52:30 UTC 2018
Thanks Roger!
I pushed the changeset after s/unmappble/unmappable, it found three of
them :-)
Best,
Joe
On 6/12/18, 10:37 AM, Roger Riggs wrote:
> Hi Joe,
>
> Looks good to me.
>
> Typo: StringCoding.java:1026 "unmappble" (no new webrev needed)
>
> Regards, Roger
>
>
> On 6/12/2018 12:52 PM, Joe Wang wrote:
>> Hi all,
>>
>> It's been a while since the 1st round of reviews. Thank you all for
>> the valuable comments and suggestions! Note that Roger's last
>> response went to nio-dev, but not core-libs-dev, I've therefore
>> attached it below.
>>
>> CSR [2]: the CSR is now approved. Note the write method has been
>> changed to writeString.
>>
>> Impl [3]: for performance reason and the different requirement from
>> byte-string conversion for malformed/unmappable character handing,
>> the implementation uses a specific method separate from the existing
>> ones. Both Sherman and Alan preferred specific method than adding
>> parameters to the APIs that might be error prone. Thanks Sherman for
>> the code!
>>
>> JMH benchmark tests showed the new read method performs better than
>> an operation that reads the bytes and convert to string. The gains
>> start to be significant even at quite reasonable sizes (< 10K).
>>
>>
>> [1] JBS: https://bugs.openjdk.java.net/browse/JDK-8201276
>> [2] CSR: https://bugs.openjdk.java.net/browse/JDK-8202055
>> [3] webrevs: http://cr.openjdk.java.net/~joehw/jdk11/8201276/webrev/
>>
>> Please review.
>>
>> Thanks,
>> Joe
>>
>> On 5/15/18, 7:51 AM, Roger Riggs wrote:
>>> Hi Joe, et. al.
>>>
>>> My $.02 on line separators:
>>>
>>> - We should avoid clever tricks trying to solve problems that are
>>> infrequent such as cross OS file line operations.
>>> Most files will be read/written on a single system so the line
>>> separators will be as expected.
>>>
>>> - Have/use APIs that split lines consistently accepting both line
>>> separators so developer code can be agnostic to line separators.
>>> aka BufferedReader.readLine for developers that are processing
>>> the contents *as lines*.
>>> Those other methods already exist; if there are any gaps in line
>>> oriented processing that's a separate task.
>>>
>>> - These new file methods are defined to handle Charset
>>> encoding/decoding and buffering.
>>> Since there are other methods to deal with files as lines these
>>> methods should not look for or break to lines.
>>>
>>> - Performance: adding code to look for line characters will slow it
>>> down and in most cases would have no impact
>>> since the line endings will match the system.
>>>
>>> - The strings passed to writeString (CharSequence) should have been
>>> constructed using the proper line separators.
>>> There are any number of methods that insert the os specific line
>>> terminator and its easy to write File.separator as needed.
>>>
>>> As for the method naming:
>>>
>>> I'd prefer "writeString" as a familiar term that isn't stretched too
>>> far by making the argument be a CharSequence.
>>> its fine to call a CharSequence a string (with a lower case s).
>>>
>>> $.02, Roger
>>>
>>>
>>> On 4/27/18 6:33 PM, Joe Wang wrote:
>>>>
>>>>
>>>> On 4/27/2018 12:50 PM, forax at univ-mlv.fr wrote:
>>>>> ----- Mail original -----
>>>>>> De: "Joe Wang" <huizhe.wang at oracle.com>
>>>>>> À: "Remi Forax" <forax at univ-mlv.fr>, "Alan Bateman"
>>>>>> <Alan.Bateman at oracle.com>
>>>>>> Cc: "nio-dev" <nio-dev at openjdk.java.net>, "core-libs-dev"
>>>>>> <core-libs-dev at openjdk.java.net>
>>>>>> Envoyé: Vendredi 27 Avril 2018 21:21:01
>>>>>> Objet: Re: RFR (JDK11/NIO) 8201276: (fs) Add methods to Files for
>>>>>> reading/writing a string from/to a file
>>>>>> Hi Rémi, Alan,
>>>>> Hi Joe,
>>>>>
>>>>>> I'm not sure we'd want to replace system dependent line
>>>>>> separators with
>>>>>> '\n'. There are cases as you described where the replacement makes
>>>>>> sense. However, there are other cases too. For example, the
>>>>>> purpose is
>>>>>> to read, edit a text file and then write it back. If this is on
>>>>>> Windows,
>>>>>> and if line separators are replaced with '\n', that would cause the
>>>>>> resulting file not formatted properly in for example Notepad.
>>>>>> There are
>>>>>> cases where users write text files on Windows using Java, and
>>>>>> only found
>>>>>> the lines were not separated in Notepad.
>>>>> I agree that why the counterpart of readString() that write the
>>>>> string should inserts the platform's line separator.
>>>>> BTW, i just found that those methods are not named writeString, or
>>>>> writeCharSequence, i think they should.
>>>>
>>>> While readString() does not modify the original content (e.g. by
>>>> replacing the platform's line separator with '\n'), write(String)
>>>> won't either, by adding extra characters such as the line separator.
>>>>
>>>> I would think interfaces shall read along with the parameters.
>>>> readString(Path) == read as a String from the specified Path
>>>> (one could argue for readToString, readAsString, but we generally
>>>> omit the preps)
>>>> write(Path, CharSequence) == write the CharSequence to the
>>>> file, since CharSequence is already in the method signature as a
>>>> parameter, we probably don't want to add that to the name,
>>>> otherwise it would read like repeating the word CharSequence.
>>>>
>>>> It is in a similar situation as write(Path, Iterable) where it was
>>>> defined as writeLines(Path, Iterable).
>>>>
>>>>>
>>>>>> Files.write(Path, Iterable) is also specified to terminate each line
>>>>>> with the platform's line separator. If readString does the
>>>>>> replacement,
>>>>>> it would be inconsistent.
>>>>>
>>>>> Anyway, if we look for consistency the methods writeCharSequence
>>>>> should transform the \n in the CharSequence to the platform's line
>>>>> separator.
>>>>>
>>>>> Files.write(Path, Iterable) is is not a counterpart of
>>>>> readString(), it's consistent with Files.lines() or
>>>>> Files.readLines() (or BufferedReader.readLine()) that all suppress
>>>>> the line separators. Anyway, Files.write(path,
>>>>> readString(path)::line) will be consistent if you replace the line
>>>>> separators or not because String.line() suppresses the line
>>>>> separators.
>>>>
>>>> readString pairs with write(String), therefore it's more like
>>>> Files.write(path, readString(path)) than readString(path)::line.
>>>> The use case:
>>>>
>>>> String s = Files.read(path);
>>>> Files.write(path, s.replace("/config/location", "/new/location"));
>>>>
>>>> would then work as expected.
>>>>
>>>> These two methods are one-off (open/read/write/close file)
>>>> operation. write(String) therefore is not intended for
>>>> adding/appending multiple Strings from other operations such as
>>>> String.line(). If an app needs to put the result of String.line()
>>>> or any other processes into a file using this method, it needs to
>>>> take care of adding the line separator itself when necessary. "when
>>>> necessary" would be a judgement call by the developer.
>>>>
>>>> That said, if there's a consensus on the idea of terminating the
>>>> string with a line separator, then readString method would need to
>>>> strip it off to go with the write method.
>>>>
>>>>>
>>>>>> I would think readString behaves like readAllBytes, that would
>>>>>> simply read all content faithfully.
>>>>> readString does an interpolation due to the Charset so it's not
>>>>> the real content, again, the idea is that developers should not
>>>>> have to care about the platform's line separators (they have more
>>>>> important things to think).
>>>>>
>>>>> The other solution is to just not introduce those new methods,
>>>>> after all Files.lines().collect(Collectors.joining("\n")) already
>>>>> does the job, no ?
>>>>
>>>> While there are many ways to do it, none is as straight-forward.
>>>> "Read (entire) file to String"/"Write String to file" are popular
>>>> requests from users. Read to string -> do some String manipulation
>>>> -> write it back is such a simple use case, it really needs to be
>>>> very easy to do as illustrated in the above code snippet.
>>>>
>>>> Cheers,
>>>> Joe
>>>>
>>>>>
>>>>>> Best,
>>>>>> Joe
>>>>> regards,
>>>>> Rémi
>>>>>
>>>>>> On 4/27/2018 4:43 AM, forax at univ-mlv.fr wrote:
>>>>>>> Hi Alan,
>>>>>>>
>>>>>>> People do not read the documentation.
>>>>>>> So adding something in the documentation about when a method
>>>>>>> should be used or
>>>>>>> not is never a solution.
>>>>>>>
>>>>>>> Here the user want a String and provides a charset so you have
>>>>>>> no way but to
>>>>>>> decode the content to substitute the line separator.
>>>>>>>
>>>>>>> cheers,
>>>>>>> Rémi
>>>>>>>
>>>>>>> ----- Mail original -----
>>>>>>>> De: "Alan Bateman" <Alan.Bateman at oracle.com>
>>>>>>>> À: "Remi Forax" <forax at univ-mlv.fr>, "Joe Wang"
>>>>>>>> <huizhe.wang at oracle.com>
>>>>>>>> Cc: "nio-dev" <nio-dev at openjdk.java.net>, "core-libs-dev"
>>>>>>>> <core-libs-dev at openjdk.java.net>
>>>>>>>> Envoyé: Vendredi 27 Avril 2018 13:34:12
>>>>>>>> Objet: Re: RFR (JDK11/NIO) 8201276: (fs) Add methods to Files for
>>>>>>>> reading/writing a string from/to a file
>>>>>>>> On 27/04/2018 12:29, Remi Forax wrote:
>>>>>>>>> I think that having a readString that includes OS dependent
>>>>>>>>> line separators is a
>>>>>>>>> mistake,
>>>>>>>>> Java does a great job to try to shield the developer from the
>>>>>>>>> kind of things
>>>>>>>>> that makes programs behave differently on different platforms.
>>>>>>>>>
>>>>>>>>> readString should subtitute (\r)?\n to \n otherwise either
>>>>>>>>> people will do a call
>>>>>>>>> replace() which is less efficient or will learn the lesson the
>>>>>>>>> hard way.
>>>>>>>>>
>>>>>>>>> raw string literal does the same substitution for the same
>>>>>>>>> reason.
>>>>>>>>>
>>>>>>>> Yes, there are several discussion points around this and
>>>>>>>> somewhat timely
>>>>>>>> with multi-string support.
>>>>>>>>
>>>>>>>> One thing that I think Joe will need in this API is some note
>>>>>>>> to make it
>>>>>>>> clearer what the intended usage is (as I think the intention is
>>>>>>>> simple
>>>>>>>> cases with mostly single lines of text).
>>>>>>>>
>>>>>>>> -Alan.
>>>>
>>>
>
More information about the core-libs-dev
mailing list