RFR 8072773 (fs) Files.lines needs a better splitting implementation for stream source

Wed Jun 3 15:53:27 UTC 2015

Hi,

Please review an optimization for Files.lines for certain charsets:

  http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8072773-File-lines/webrev/

If a charset is, say, US-ASCII or UTF-8, it is possible to implement an efficient splitting Spliterator that scans bytes from a mid-point to search for line feed characters.
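
To illustrate the idea (this is only a rough sketch, not the code in the webrev): in US-ASCII and UTF-8 the byte 0x0A can only ever mean a line feed, so a split point can be found by scanning raw bytes without decoding. The class and method names below (LineSplitSketch, findSplit) are hypothetical and chosen just for this example.

```java
import java.nio.ByteBuffer;

class LineSplitSketch {
    /**
     * Scans forward from the mid-point of [lo, hi) for a line feed byte.
     * Returns the index just past that line feed, so the range can be
     * split into [lo, split) and [split, hi). Returns -1 when no line
     * feed leaves a non-empty second half, meaning the range should not
     * be split any further.
     */
    static int findSplit(ByteBuffer buf, int lo, int hi) {
        int mid = lo + (hi - lo) / 2;
        for (int i = mid; i < hi; i++) {
            // Absolute get: does not disturb the buffer's position.
            if (buf.get(i) == '\n') {
                int split = i + 1;
                return split < hi ? split : -1;
            }
        }
        return -1;
    }
}
```

A spliterator's trySplit could call something like findSplit on the mapped buffer and, on a non-negative result, hand the prefix range to a new spliterator while keeping the suffix for itself.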

Splitting uses a mapped byte buffer. Traversal uses FileChannel.read at an offset. In previous incarnations I tried to use a mapped byte buffer for both, but for some reason the traversal performance was not good (on both Mac and x86). In any case I am happy with the current approach, as there is minimal layering between the FileChannel and the BufferedReader used to read the lines.
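
For readers who want a feel for the traversal side, here is a minimal sketch of reading the lines in a byte range [start, end) using positional FileChannel reads wrapped in a BufferedReader. It is an assumption-laden illustration rather than the webrev code: the names RangeLinesSketch and readLines are made up, and the range is assumed to begin and end on line boundaries.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

class RangeLinesSketch {
    /** Reads the lines held in bytes [start, end) of the given file. */
    static List<String> readLines(Path file, long start, long end, Charset cs)
            throws IOException {
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            // Thin channel view over [start, end) using positional reads,
            // so the FileChannel's own position is never touched.
            ReadableByteChannel range = new ReadableByteChannel() {
                long pos = start;

                @Override
                public int read(ByteBuffer dst) throws IOException {
                    if (pos >= end) {
                        return -1; // end of this range
                    }
                    int savedLimit = dst.limit();
                    int max = (int) Math.min(dst.remaining(), end - pos);
                    dst.limit(dst.position() + max); // never read past 'end'
                    try {
                        int n = fc.read(dst, pos);
                        if (n > 0) {
                            pos += n;
                        }
                        return n;
                    } finally {
                        dst.limit(savedLimit);
                    }
                }

                @Override
                public boolean isOpen() {
                    return fc.isOpen();
                }

                @Override
                public void close() throws IOException {
                    fc.close();
                }
            };

            List<String> lines = new ArrayList<>();
            try (BufferedReader reader = new BufferedReader(
                    Channels.newReader(range, cs.newDecoder(), -1))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
            return lines;
        }
    }
}
```

The only layer between the FileChannel and the BufferedReader here is the small ReadableByteChannel view plus the decoder that Channels.newReader supplies.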

Sequential performance is similar to (the same as or better than) the current approach. Parallel performance is much better than the current approach.

Some advice on two aspects would be most appreciated:

1) Is there an easy way to determine the sub-set of supported charsets that are applicable?

2) We should try to explicitly unmap the mapped byte buffer when the stream is closed, using some sort of shared secret. How can I do that?

Paul.