RFR 8072773 (fs) Files.lines needs a better splitting implementation for stream source

Wed Jun 3 16:18:55 UTC 2015

On 03/06/2015 16:53, Paul Sandoz wrote:
> Hi,
>
> Please review an optimization for Files.lines for certain charsets:
>
>    http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8072773-File-lines/webrev/
>
> If a charset is, say, US-ASCII or UTF-8, it is possible to implement an efficient splitting Spliterator that scans bytes from a mid-point to search for line feed characters.
>
> Splitting uses a mapped byte buffer. Traversal uses FileChannel reads at an offset. In previous incarnations I tried to use a mapped byte buffer for both, but for some reason the traversal performance was not good (both on Mac and x86). In any case I am happy with the current approach, as there is minimal layering between the FileChannel and the BufferedReader used to read the lines.
>
> Sequential performance is similar to (the same as or better than) the current approach. Parallel performance is much better than the current approach.
>
> Some advice on two aspects would be most appreciated:
>
> 1) Is there an easy way to determine the sub-set of supported charsets that are applicable?
>
> 2) We should try to explicitly unmap the mapped byte buffer when the stream is closed, using some sort of shared secret. How can I do that?
>
As this code path is only for the default provider case, there's a
good chance that the channel will be a FileChannelImpl, in which case
you can call its unmap method (directly or via a shared secret). It is
possible to interpose on the default provider, though, so you can't be
guaranteed that it is a FileChannelImpl.

In passing, you might consider moving ByteBufferLinesSpliterator to its
own source file because Files is getting very big.

-Alan.
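
For readers following the thread, the splitting idea from the quoted
message (scan forward from the mid-point of a byte range until a line
feed is found, then split just after it) can be sketched roughly as
below. This is not the code in the webrev. The class name
LineSplitSketch and the method findSplit are hypothetical, a heap
ByteBuffer stands in for the mapped byte buffer, and only '\n' is
treated as a line terminator (carriage returns are ignored). The scan
is safe for US-ASCII and UTF-8 because the byte 0x0A never occurs
inside a multi-byte UTF-8 sequence.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the webrev implementation.
final class LineSplitSketch {

    // Returns the index just past the first '\n' found at or after the
    // mid-point of [lo, hi), or -1 if no split leaves a non-empty
    // right-hand range (the caller should then not split).
    static int findSplit(ByteBuffer buf, int lo, int hi) {
        int mid = (lo + hi) >>> 1;
        for (int i = mid; i < hi; i++) {
            // Absolute get: does not disturb the buffer's position.
            if (buf.get(i) == '\n') {
                int split = i + 1;
                return split < hi ? split : -1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] bytes = "alpha\nbeta\ngamma\ndelta\n"
                .getBytes(StandardCharsets.US_ASCII);
        ByteBuffer buf = ByteBuffer.wrap(bytes);

        int split = findSplit(buf, 0, bytes.length);
        // Both halves begin at a line start, so each can be decoded
        // and read line by line independently.
        String left = new String(bytes, 0, split, StandardCharsets.US_ASCII);
        String right = new String(bytes, split, bytes.length - split,
                StandardCharsets.US_ASCII);
        System.out.println("split index: " + split);
        System.out.print("left:\n" + left);
        System.out.print("right:\n" + right);
    }
}
```

Each half produced this way starts on a line boundary, which is what
lets the two halves be traversed in parallel without coordinating on a
shared reader.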