RFR (JDK11/NIO) 8201276: (fs) Add methods to Files for reading/writing a string from/to a file

Joe Wang huizhe.wang at oracle.com
Wed Jun 13 19:52:30 UTC 2018


Thanks Roger!

I pushed the changeset after s/unmappble/unmappable, it found three of 
them :-)

Best,
Joe

On 6/12/18, 10:37 AM, Roger Riggs wrote:
> Hi Joe,
>
> Looks good to me.
>
> Typo: StringCoding.java:1026 "unmappble"  (no new webrev needed)
>
> Regards, Roger
>
>
> On 6/12/2018 12:52 PM, Joe Wang wrote:
>> Hi all,
>>
>> It's been a while since the 1st round of reviews. Thank you all for 
>> the valuable comments and suggestions!  Note that Roger's last 
>> response went to nio-dev, but not core-libs-dev, I've therefore 
>> attached it below.
>>
>> CSR [2]: the CSR is now approved. Note the write method has been 
>> changed to writeString.
>>
>> Impl [3]: for performance reason and the different requirement from 
>> byte-string conversion for malformed/unmappable character handing, 
>> the implementation uses a specific method separate from the existing 
>> ones. Both Sherman and Alan preferred specific method than adding 
>> parameters to the APIs that might be error prone. Thanks Sherman for 
>> the code!
>>
>> JMH benchmark tests showed the new read method performs better than 
>> an operation that reads the bytes and convert to string. The gains 
>> start to be significant even at quite reasonable sizes (< 10K).
>>
>>
>> [1] JBS: https://bugs.openjdk.java.net/browse/JDK-8201276
>> [2] CSR: https://bugs.openjdk.java.net/browse/JDK-8202055
>> [3] webrevs: http://cr.openjdk.java.net/~joehw/jdk11/8201276/webrev/
>>
>> Please review.
>>
>> Thanks,
>> Joe
>>
>> On 5/15/18, 7:51 AM, Roger Riggs wrote:
>>> Hi Joe, et. al.
>>>
>>> My $.02 on line separators:
>>>
>>>  - We should avoid clever tricks trying to solve problems that are 
>>> infrequent such as cross OS file line operations.
>>>     Most files will be read/written on a single system so the line 
>>> separators will be as expected.
>>>
>>>  - Have/use APIs that split lines consistently accepting both line 
>>> separators so developer code can be agnostic to line separators.
>>>     aka BufferedReader.readLine for developers that are processing 
>>> the contents *as lines*.
>>>    Those other methods already exist; if there are any gaps in line 
>>> oriented processing that's a separate task.
>>>
>>>  - These new file methods are defined to handle Charset 
>>> encoding/decoding and buffering.
>>>     Since there are other methods to deal with files as lines these 
>>> methods should not look for or break to lines.
>>>
>>>  - Performance: adding code to look for line characters will slow it 
>>> down and in most cases would have no impact
>>>     since the line endings will match the system.
>>>
>>>  - The strings passed to writeString (CharSequence) should have been 
>>> constructed using the proper line separators.
>>>     There are any number of methods that insert the os specific line 
>>> terminator and its easy to write File.separator as needed.
>>>
>>> As for the method naming:
>>>
>>> I'd prefer "writeString" as a familiar term that isn't stretched too 
>>> far by making the argument be a CharSequence.
>>> its fine to call a CharSequence a string (with a lower case s).
>>>
>>> $.02, Roger
>>>
>>>
>>> On 4/27/18 6:33 PM, Joe Wang wrote:
>>>>
>>>>
>>>> On 4/27/2018 12:50 PM, forax at univ-mlv.fr wrote:
>>>>> ----- Mail original -----
>>>>>> De: "Joe Wang" <huizhe.wang at oracle.com>
>>>>>> À: "Remi Forax" <forax at univ-mlv.fr>, "Alan Bateman" 
>>>>>> <Alan.Bateman at oracle.com>
>>>>>> Cc: "nio-dev" <nio-dev at openjdk.java.net>, "core-libs-dev" 
>>>>>> <core-libs-dev at openjdk.java.net>
>>>>>> Envoyé: Vendredi 27 Avril 2018 21:21:01
>>>>>> Objet: Re: RFR (JDK11/NIO) 8201276: (fs) Add methods to Files for 
>>>>>> reading/writing a string from/to a file
>>>>>> Hi Rémi, Alan,
>>>>> Hi Joe,
>>>>>
>>>>>> I'm not sure we'd want to replace system dependent line 
>>>>>> separators with
>>>>>> '\n'. There are cases as you described where the replacement makes
>>>>>> sense. However,  there are other cases too. For example, the 
>>>>>> purpose is
>>>>>> to read, edit a text file and then write it back. If this is on 
>>>>>> Windows,
>>>>>> and if line separators are replaced with '\n', that would cause the
>>>>>> resulting file not formatted properly in for example Notepad. 
>>>>>> There are
>>>>>> cases where users write text files on Windows using Java, and 
>>>>>> only found
>>>>>> the lines were not separated in Notepad.
>>>>> I agree that why the counterpart of readString() that write the 
>>>>> string should inserts the platform's line separator.
>>>>> BTW, i just found that those methods are not named writeString, or 
>>>>> writeCharSequence, i think they should.
>>>>
>>>> While readString() does not modify the original content (e.g. by 
>>>> replacing the platform's line separator with '\n'), write(String) 
>>>> won't either, by adding extra characters such as the line separator.
>>>>
>>>> I would think interfaces shall read along with the parameters.
>>>>     readString(Path) == read as a String from the specified Path 
>>>> (one could argue for readToString, readAsString, but we generally 
>>>> omit the preps)
>>>>     write(Path, CharSequence) == write the CharSequence to the 
>>>> file, since CharSequence is already in the method signature as a 
>>>> parameter, we probably don't want to add that to the name, 
>>>> otherwise it would read like repeating the word CharSequence.
>>>>
>>>> It is in a similar situation as write(Path, Iterable) where it was 
>>>> defined as writeLines(Path, Iterable).
>>>>
>>>>>
>>>>>> Files.write(Path, Iterable) is also specified to terminate each line
>>>>>> with the platform's line separator. If readString does the 
>>>>>> replacement,
>>>>>> it would be inconsistent.
>>>>>
>>>>> Anyway, if we look for consistency the methods writeCharSequence 
>>>>> should transform the \n in the CharSequence to the platform's line 
>>>>> separator.
>>>>>
>>>>> Files.write(Path, Iterable) is is not a counterpart of 
>>>>> readString(), it's consistent with Files.lines() or 
>>>>> Files.readLines() (or BufferedReader.readLine()) that all suppress 
>>>>> the line separators. Anyway, Files.write(path, 
>>>>> readString(path)::line) will be consistent if you replace the line 
>>>>> separators or not because String.line() suppresses the line 
>>>>> separators.
>>>>
>>>> readString pairs with write(String), therefore it's more like 
>>>> Files.write(path, readString(path)) than readString(path)::line. 
>>>> The use case:
>>>>
>>>>     String s = Files.read(path);
>>>>     Files.write(path, s.replace("/config/location", "/new/location"));
>>>>
>>>> would then work as expected.
>>>>
>>>> These two methods are one-off (open/read/write/close file) 
>>>> operation. write(String) therefore is not intended for 
>>>> adding/appending multiple Strings from other operations such as 
>>>> String.line(). If an app needs to put the result of String.line() 
>>>> or any other processes into a file using this method, it needs to 
>>>> take care of adding the line separator itself when necessary. "when 
>>>> necessary" would be a judgement call by the developer.
>>>>
>>>> That said, if there's a consensus on the idea of terminating the 
>>>> string with a line separator, then readString method would need to 
>>>> strip it off to go with the write method.
>>>>
>>>>>
>>>>>> I would think readString behaves like readAllBytes, that would 
>>>>>> simply read all content faithfully.
>>>>> readString does an interpolation due to the Charset so it's not 
>>>>> the real content, again, the idea is that developers should not 
>>>>> have to care about the platform's line separators (they have more 
>>>>> important things to think).
>>>>>
>>>>> The other solution is to just not introduce those new methods, 
>>>>> after all Files.lines().collect(Collectors.joining("\n")) already 
>>>>> does the job, no ?
>>>>
>>>> While there are many ways to do it, none is as straight-forward. 
>>>> "Read (entire) file to String"/"Write String to file" are popular 
>>>> requests from users. Read to string -> do some String manipulation 
>>>> -> write it back is such a simple use case, it really needs to be 
>>>> very easy to do as illustrated in the above code snippet.
>>>>
>>>> Cheers,
>>>> Joe
>>>>
>>>>>
>>>>>> Best,
>>>>>> Joe
>>>>> regards,
>>>>> Rémi
>>>>>
>>>>>> On 4/27/2018 4:43 AM, forax at univ-mlv.fr wrote:
>>>>>>> Hi Alan,
>>>>>>>
>>>>>>> People do not read the documentation.
>>>>>>> So adding something in the documentation about when a method 
>>>>>>> should be used or
>>>>>>> not is never a solution.
>>>>>>>
>>>>>>> Here the user want a String and provides a charset so you have 
>>>>>>> no way but to
>>>>>>> decode the content to substitute the line separator.
>>>>>>>
>>>>>>> cheers,
>>>>>>> Rémi
>>>>>>>
>>>>>>> ----- Mail original -----
>>>>>>>> De: "Alan Bateman" <Alan.Bateman at oracle.com>
>>>>>>>> À: "Remi Forax" <forax at univ-mlv.fr>, "Joe Wang" 
>>>>>>>> <huizhe.wang at oracle.com>
>>>>>>>> Cc: "nio-dev" <nio-dev at openjdk.java.net>, "core-libs-dev"
>>>>>>>> <core-libs-dev at openjdk.java.net>
>>>>>>>> Envoyé: Vendredi 27 Avril 2018 13:34:12
>>>>>>>> Objet: Re: RFR (JDK11/NIO) 8201276: (fs) Add methods to Files for
>>>>>>>> reading/writing a string from/to a file
>>>>>>>> On 27/04/2018 12:29, Remi Forax wrote:
>>>>>>>>> I think that having a readString that includes OS dependent 
>>>>>>>>> line separators is a
>>>>>>>>> mistake,
>>>>>>>>> Java does a great job to try to shield the developer from the 
>>>>>>>>> kind of things
>>>>>>>>> that makes programs behave differently on different platforms.
>>>>>>>>>
>>>>>>>>> readString should subtitute (\r)?\n to \n otherwise either 
>>>>>>>>> people will do a call
>>>>>>>>> replace() which is less efficient or will learn the lesson the 
>>>>>>>>> hard way.
>>>>>>>>>
>>>>>>>>> raw string literal does the same substitution for the same 
>>>>>>>>> reason.
>>>>>>>>>
>>>>>>>> Yes, there are several discussion points around this and 
>>>>>>>> somewhat timely
>>>>>>>> with multi-string support.
>>>>>>>>
>>>>>>>> One thing that I think Joe will need in this API is some note 
>>>>>>>> to make it
>>>>>>>> clearer what the intended usage is (as I think the intention is 
>>>>>>>> simple
>>>>>>>> cases with mostly single lines of text).
>>>>>>>>
>>>>>>>> -Alan.
>>>>
>>>
>


More information about the core-libs-dev mailing list