RFR JDK-8164278: java.util.Base64.EncOutputStream/DecInputStream is slower than corresponding version in javax.mail package

Tue Feb 6 02:00:45 UTC 2018

Hi,

Please help review the change for  JDK-8164278.

issue: https://bugs.openjdk.java.net/browse/JDK-8164278
webrev: http://cr.openjdk.java.net/~sherman/8164278/webrev

jmh.src: http://cr.openjdk.java.net/~sherman/8164278/Base64BM.java
jmh.result: http://cr.openjdk.java.net/~sherman/8164278/base64.bm

Base64.Decoder.decode0:
     Adopted the "similar" optimization approach we took in 
Base64.Encoder.encode0()
     to add a "fast path" to decode a block of 4-byte units together 
(current implementation
     decodes one single byte per while/loop. The jmh benchmark result 
indicates a big speed
     boost  (those decodeArray/Mime/Url results, from 30% to 2 times 
faster, depends on
     input size).

Base64.Encoder.encode0()
     It appears encode0() was fully optimized in 1.8. Can't get it 
faster :-) Tried to use
     Unsafe.getLong/putLong instead of byte by byte access. But it 
appears the 8-byte
     "vectorization" does not bring us enough speed up, the performance 
is the same as the
     current one. See encode00() at
     http://cr.openjdk.java.net/~sherman/8164278/webrev.00

Base64.Encoder.wrap(OutputStream)/EncOutputStream.write():
     If my memory serves me right, the current implementation was under 
the assumption that
     the underlying output stream probably is buffered (if invoker 
cares). It would be a redundant
     if EncOutputStream buffers bytes again. It appears this is a wrong 
assumption. It is much
     slower to write 4 bytes separately, compared to bundle them 
together in a byte[4] and write
     into underlying, even the underlying output stream is a 
ByteArrayOutputStream.
     Again, the proposed change is to add a fast path loop, as we do in 
encode0(), to decode/
     write a block of 3-byte->4-byte unites. It appears this fast loop 
can help the compiler to
     optimize away some boundary checks, therefor is much faster.
     The jmh result Base64BM.encodeOS suggests the new implementation is 
almost 4 times faster
     and is almost the same as java.mail's stream encoder.

Base64.Decoder.wrap(InputStream)/DecInputStream.read():
     Same as the approach we take for decode0(), to add a fast path 
decode block of 4-byte unites
     together.
     The jmh result Base64BM.decodeOS (the name probably should be 
decodeIS, for InputStream,
     but anyway...) shows the proposed one is 4 times faster than the 
existing impl and double
     the  java.mail (Base64BM.decodeOS_javamail) implementation.

     However, there is a side-effect of adding a buffering mechanism 
into DecInputStream. The
     current implementation read bytes from the underlying stream one by 
one, it never reads
     more bytes than it needs, which means it should/is supposed to just 
stop at the last byte
     that it needs to decode, when there is "=" present in the stream.
     With buffering, it's possible more bytes (after "=", which 
indicates "end of base64 stream") might
     be read/consumed in and buffered.  A concern? if this is indeed a 
concern, the only alternative
     might be to add a separate method to support this 
"faster-buffered-decoder"?

Thanks,
Sherman