RFR 8072773 (fs) Files.lines needs a better splitting implementation for stream source
Paul Sandoz
paul.sandoz at oracle.com
Wed Jun 3 19:58:58 UTC 2015
On Jun 3, 2015, at 9:19 PM, Xueming Shen <xueming.shen at oracle.com> wrote:
> On 06/03/2015 08:53 AM, Paul Sandoz wrote:
>> Hi,
>>
>> Please review an optimization for Files.lines for certain charsets:
>>
>> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8072773-File-lines/webrev/
>>
>> If a charset is, say, US-ASCII or UTF-8, it is possible to implement an efficient splitting Spliterator that scans bytes from a mid-point to search for line feed characters.
>>
>> Splitting uses a mapped byte buffer. Traversal uses FileChannel reads at an offset. In previous incarnations I tried to use a mapped byte buffer for both, but for some reason the traversal performance was not good (both on Mac and x86). In any case I am happy with the current approach, as there is minimal layering between the FileChannel and the BufferedReader used to read the lines.
>>
>> Sequential performance is similar to (the same as or better than) the current approach. Parallel performance is much better than the current approach.
>>
>> Some advice on two aspects would be most appreciated:
>>
>> 1) Is there an easy way to determine the sub-set of supported charsets that are applicable?
>>
>
> It's easy though a little heavy :-)
Thanks, that is a little heavy, but I suppose computed values for charsets could be stashed in a static ConcurrentHashMap (CHM).
Paul.
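
A rough sketch of that caching idea, assuming a static ConcurrentHashMap keyed by Charset. The names LineTerminatorCache, computeLfCr and lfCr are hypothetical, and computeLfCr is only a simplified stand-in for the getLFCR check quoted below, not the code from the webrev. Since ConcurrentHashMap rejects null values, "not applicable" is stored as an empty Optional:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class LineTerminatorCache {
    // Hypothetical cache of per-charset results. An empty Optional records
    // "fast path not applicable", because the map cannot hold null values.
    private static final ConcurrentMap<Charset, Optional<byte[]>> CACHE =
            new ConcurrentHashMap<>();

    // Simplified stand-in for getLFCR (an assumption, not the original code):
    // LF and CR must encode to exactly one byte each and decode back unchanged.
    static byte[] computeLfCr(Charset cs) {
        try {
            if (!cs.canEncode()) {
                return null;
            }
            byte[] ba = "\n\r".getBytes(cs);
            if (ba.length == 2 && "\n\r".equals(new String(ba, cs))) {
                return ba;
            }
        } catch (RuntimeException e) {
            // treat any failure as "not applicable"
        }
        return null;
    }

    // Computes the value on first use and reuses it afterwards.
    // Callers must not modify the returned array, since it is shared.
    static Optional<byte[]> lfCr(Charset cs) {
        return CACHE.computeIfAbsent(cs, c -> Optional.ofNullable(computeLfCr(c)));
    }

    public static void main(String[] args) {
        System.out.println(lfCr(StandardCharsets.UTF_8).isPresent());   // prints "true"
        System.out.println(lfCr(StandardCharsets.UTF_16).isPresent());  // prints "false"
    }
}
```

With something like this the heavy encode/decode round trip would run once per charset rather than once per Files.lines call.
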
> getLFCR returns a byte[] holding the encoded "byte" form of \n and \r in a
> particular encoding, provided each of them maps to a single byte. We can then
> use b[0] for \n and b[1] for \r in trySplit(). This makes the new fast version
> work for most charsets.
>
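> private static byte[] getLFCR(Charset cs) {
>     try {
>         if (cs.canEncode()) {
>             // Encode LF and CR together; each must map to exactly one byte.
>             ByteBuffer bb = cs.newEncoder()
>                               .encode(CharBuffer.wrap(new char[] { '\n', '\r' }));
>             if (bb.remaining() == 2) {
>                 // Round-trip check: the two bytes must decode back to LF, CR.
>                 CharBuffer cb = cs.newDecoder().decode(bb);
>                 if (cb.remaining() == 2 &&
>                     cb.get() == '\n' && cb.get() == '\r') {
>                     bb.flip();  // decode() consumed bb; move back to position 0
>                     byte[] ba = new byte[2];
>                     bb.get(ba);
>                     return ba;
>                 }
>             }
>         }
>     } catch (Exception x) {
>         // any encode/decode failure means the fast path is not applicable
>     }
>     return null;
> }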
>
> -sherman
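
For illustration, a minimal sketch of how the two bytes might be used when looking for a split point from the mid-point of a byte range, as described above. This is not the code from the webrev: SplitPointSketch and findSplit are hypothetical names, and the sketch works on any ByteBuffer (a mapped buffer would be read the same way through absolute get). It also assumes the encoded LF and CR bytes never occur inside the encoding of another character, which holds for US-ASCII and UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class SplitPointSketch {
    // Scans forward from the mid-point of [lo, hi) for a line terminator.
    // Returns the index of the first byte after that terminator, or -1 if no
    // split is possible (no terminator found, or the right half would be empty).
    static int findSplit(ByteBuffer buf, int lo, int hi, byte lf, byte cr) {
        int mid = lo + (hi - lo) / 2;
        for (int i = mid; i < hi; i++) {
            byte b = buf.get(i);            // absolute read, position is untouched
            int split = -1;
            if (b == lf) {
                split = i + 1;
            } else if (b == cr) {
                // Treat CR LF as a single terminator.
                boolean crlf = i + 1 < hi && buf.get(i + 1) == lf;
                split = crlf ? i + 2 : i + 1;
            }
            if (split != -1) {
                return split < hi ? split : -1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = "ab\ncd\nef".getBytes(StandardCharsets.US_ASCII);
        ByteBuffer buf = ByteBuffer.wrap(data);
        // The mid-point is index 4 ('d'); the next LF is at index 5.
        System.out.println(findSplit(buf, 0, data.length, (byte) '\n', (byte) '\r')); // prints "6"
    }
}
```

The left half [lo, split) then ends on a complete line and the right half [split, hi) starts on one, so each half can be traversed independently.
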