Lower overhead String encoding/decoding

Peter Levart peter.levart at gmail.com
Mon Sep 29 09:50:54 UTC 2014


Hi,

On 09/22/2014 01:25 PM, Richard Warburton wrote:
> Hi all,
>
> A long-standing issue with Strings in Java is the ease and performance of
> creating a String from a ByteBuffer. People who are using NIO to bring in
> data off the network will be receiving that data in the form of ByteBuffers
> and converting it to some form of String, for example RESTful systems
> receiving XML or JSON messages.
>
> The current workaround is to create a byte[] from the ByteBuffer - a
> copying action for any direct ByteBuffer - and then pass that to the
> String constructor.

An alternative is to use CharsetDecoder to program a "decoding 
operation" on input ByteBuffer(s), writing the result to CharBuffer(s). 
If the result fits in a single (big enough) CharBuffer, it can be 
converted to a String via a simple CharBuffer.toString(), which is a 
copying operation. In situations where the number of resulting 
characters can be anticipated in advance (for example when we know 
the number of bytes to be decoded and the charset uses a fixed, or 
nearly fixed as with UTF-8, number of bytes per char), a simple static 
utility method somewhere in the java.nio package could be used to 
optimize this operation:

     public static String decodeString(CharsetDecoder dec, ByteBuffer in)
         throws CharacterCodingException {

         CharBuffer cb = dec.decode(in);

         if (cb.length() == cb.hb.length) {
             // optimized no-copy String construction
             return SharedSecrets.getJavaLangAccess().newStringUnsafe(cb.hb);
         } else {
             return cb.toString();
         }
     }
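
For comparison, the same decoding can be sketched with public API only, at 
the cost of the final copy in CharBuffer.toString(). This is just a minimal 
sketch and not part of any patch: the class name DecodeSketch and the helper 
decodeToString are made up for illustration, and the output buffer is sized 
from CharsetDecoder.maxCharsPerByte() as an upper bound.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class DecodeSketch {

    // Hypothetical helper: decodes the remaining bytes of 'in' (heap or direct)
    // into a String without first copying them into an intermediate byte[].
    static String decodeToString(CharsetDecoder dec, ByteBuffer in)
            throws CharacterCodingException {
        // Upper bound on the number of chars the remaining bytes can produce.
        int maxChars = (int) Math.ceil(in.remaining() * (double) dec.maxCharsPerByte());
        CharBuffer out = CharBuffer.allocate(maxChars);

        dec.reset();
        // 'true' tells the decoder that no further input will follow.
        CoderResult cr = dec.decode(in, out, true);
        if (!cr.isUnderflow()) {
            cr.throwException(); // malformed or unmappable input
        }
        cr = dec.flush(out);
        if (!cr.isUnderflow()) {
            cr.throwException();
        }

        out.flip();
        // CharBuffer.toString() copies the decoded chars into the new String.
        return out.toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        // Simulate bytes that arrived from the network in a direct buffer.
        ByteBuffer direct = ByteBuffer.allocateDirect(16);
        direct.put("h\u00e9llo".getBytes(StandardCharsets.UTF_8));
        direct.flip();

        System.out.println(decodeToString(StandardCharsets.UTF_8.newDecoder(), direct));
    }
}
```

The only copy left here is the one made by toString(); the bytes are never 
copied into a byte[] first.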


>   I'd like to propose that we add an additional constructor to the
> String class that takes a ByteBuffer as an argument, and directly create
> the char[] value inside the String from the ByteBuffer.
>
> Similarly if you have a String that you want to encode onto the wire then
> you need to call String.getBytes(), then write your byte[] into a
> ByteBuffer or send it over the network.

Again, an alternative is to use CharBuffer.wrap(CharSequence cs, int 
start, int end) to wrap a String in a CharBuffer facade and then use 
CharsetEncoder to encode it directly into a resulting ByteBuffer. No 
additional copying is needed.
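
A minimal sketch of that encoding path, using public API only. The class 
name EncodeSketch and the helper encodeInto are made up for illustration, 
and the caller is assumed to supply a ByteBuffer with enough remaining space:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class EncodeSketch {

    // Hypothetical helper: encodes 's' straight into 'out' (heap or direct)
    // without going through String.getBytes() and an intermediate byte[].
    static void encodeInto(String s, CharsetEncoder enc, ByteBuffer out)
            throws CharacterCodingException {
        // Read-only CharBuffer view over the String; the chars are not copied.
        CharBuffer in = CharBuffer.wrap(s, 0, s.length());

        enc.reset();
        // 'true' tells the encoder that no further input will follow.
        CoderResult cr = enc.encode(in, out, true);
        if (!cr.isUnderflow()) {
            cr.throwException(); // 'out' too small, or unmappable input
        }
        cr = enc.flush(out);
        if (!cr.isUnderflow()) {
            cr.throwException();
        }
    }

    public static void main(String[] args) throws CharacterCodingException {
        ByteBuffer out = ByteBuffer.allocateDirect(16);
        encodeInto("h\u00e9llo", StandardCharsets.UTF_8.newEncoder(), out);
        out.flip(); // ready to be written to a channel

        System.out.println(out.remaining() + " bytes encoded");
    }
}
```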

Regards, Peter

> This ends up allocating a byte[] to
> do the copy and also trimming the byte[] back down again, usually
> allocating another byte[]. To address this problem I've added a couple of
> getBytes() overloads that take byte[] and ByteBuffer arguments and write
> directly to those buffers.
>
> I've put together a patch that implements this to demonstrate the overall
> direction.
>
> http://cr.openjdk.java.net/~rwarburton/string-patch-webrev-5/
>
> I'm happy to take any feedback on direction or the patch itself or the
> overall idea/approach. I think there are a number of similar API situations
> in other places as well, for example StringBuffer/StringBuilder instances
> which could have similar appends directly from ByteBuffer instances instead
> of byte[] instances.
>
> I'll also be at JavaOne next week, so if you want to talk about this, just
> let me know.
>
> regards,
>
>    Richard Warburton
>
>    http://insightfullogic.com
>    @RichardWarburto <http://twitter.com/richardwarburto>
>
> PS: I appreciate that I'm adding a method to the public API, which
> consequently requires standardisation, but I think that this could get
> incorporated into the Java 9 umbrella JSR.



