String encoding to ByteBuffer

Brian Burkhalter brian.burkhalter at oracle.com
Mon Mar 13 23:16:09 UTC 2023


Redirecting to nio-dev which is the more appropriate forum for this topic.

> On Feb 26, 2023, at 3:39 PM, Carl M <java at rkive.org> wrote:
> 
> I'm looking into adding a fast path case for encoding Strings into ByteBuffers, and wanted to get feedback on a possible approach.  My use case is taking mostly-ASCII, UTF-8 Strings and writing them to the disk/network.  To do this today, there are two approaches which both have drawbacks:
> 
> 1.  Use String.getBytes(StandardCharsets.UTF_8), and call ByteBuffer.put().  The downside of this approach is that I need to make a copy of the String's byte[] value.  The upside of this approach is that ByteBuffer uses the intrinsic copy methods, which are fast.
> 
> 2.  Wrap the String in a CharBuffer, and call CharsetEncoder.encode(CharBuffer, ByteBuffer, boolean).  This avoids copying the String value.  However, when using the UTF_8 encoder, there is no fast path for writing to direct ByteBuffers.  sun.nio.cs.UTF_8.encodeLoop() only has fast paths for when the destination is array based.  This allocates less memory, but is overall slower in my JMH benchmark.
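
For readers following along, the two approaches above might look roughly like the sketch below. The class and method names (EncodeApproaches, viaGetBytes, viaEncoder) are made up for illustration and are not part of any proposal in this thread; only the java.nio calls are existing API.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class EncodeApproaches {
    // Approach 1: encode into a temporary byte[] and bulk-copy it into dst.
    static void viaGetBytes(String s, ByteBuffer dst) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8); // allocates a copy
        dst.put(utf8); // throws BufferOverflowException if dst is too small
    }

    // Approach 2: wrap the String (no copy) and encode directly into dst.
    static CoderResult viaEncoder(String s, ByteBuffer dst) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        CharBuffer src = CharBuffer.wrap(s);
        CoderResult r = enc.encode(src, dst, true); // true: no more input follows
        if (r.isUnderflow()) {
            r = enc.flush(dst); // finish the encoding operation
        }
        return r; // OVERFLOW means dst filled up before the input was consumed
    }

    public static void main(String[] args) {
        ByteBuffer a = ByteBuffer.allocateDirect(32);
        ByteBuffer b = ByteBuffer.allocateDirect(32);
        viaGetBytes("h\u00e9llo", a);
        viaEncoder("h\u00e9llo", b);
        System.out.println(a.position() + " " + b.position());
    }
}
```

Both methods leave the same bytes in the destination; the difference is the temporary byte[] in the first and the slower encode loop for direct buffers described in the second.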
> 
> To fix this, I looked at adding an overload to CharsetEncoder to accept a String (or a CharSequence), and a ByteBuffer as a destination.  However, this is not easily doable, since it's hard to call it in a loop.  In the case that the String overflows the BB, the caller needs to be able to provide a new BB and resume from where they left off.  The CharBuffer approach works here because it keeps the position last read, and can resume from there.  
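
The resume-after-overflow pattern described above can be sketched as follows, assuming the caller simply allocates a fresh direct buffer whenever the current one fills up. ResumableEncode, encodeInChunks and chunkSize are hypothetical names chosen for this example; the point is that the CharBuffer's position carries the resume index between calls.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class ResumableEncode {
    static List<ByteBuffer> encodeInChunks(String s, int chunkSize) {
        if (chunkSize < 4) {
            // A supplementary character needs 4 UTF-8 bytes; a smaller buffer
            // could overflow forever without making progress.
            throw new IllegalArgumentException("chunkSize must be >= 4");
        }
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        CharBuffer src = CharBuffer.wrap(s); // src.position() is the resume index
        List<ByteBuffer> out = new ArrayList<>();
        ByteBuffer dst = ByteBuffer.allocateDirect(chunkSize);
        while (true) {
            CoderResult r = enc.encode(src, dst, true);
            if (r.isUnderflow()) {
                r = enc.flush(dst);
            }
            if (r.isUnderflow()) {
                break; // all input consumed and flushed
            }
            if (r.isError()) {
                throw new IllegalArgumentException("cannot encode input: " + r);
            }
            // Overflow: hand off the full buffer and continue where src left off.
            dst.flip();
            out.add(dst);
            dst = ByteBuffer.allocateDirect(chunkSize);
        }
        dst.flip();
        out.add(dst);
        return out;
    }

    public static void main(String[] args) {
        List<ByteBuffer> chunks = encodeInChunks("hello world, h\u00e9llo", 8);
        for (ByteBuffer c : chunks) {
            System.out.println(c.remaining() + " bytes");
        }
    }
}
```

A String-accepting overload would need some other way to report how many chars were consumed, since a String has no position to advance.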
> 
> To encode a String, we need to know the character index reached so far in order to resume with a new buffer.  However, the return type of CharsetEncoder's encode method is CoderResult, and its length() method can't be called in underflow cases.  This means that there isn't a usable return type here (neither int nor CoderResult can be used).
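
The CoderResult limitation mentioned above can be checked with a small sketch. As documented, CoderResult.length() throws UnsupportedOperationException unless the result describes an error, so it is unavailable for both UNDERFLOW and OVERFLOW. The class and method names here (UnderflowLength, lengthIsUsable) are hypothetical.

```java
import java.nio.charset.CoderResult;

// Hypothetical helper class, for illustration only.
class UnderflowLength {
    // Returns true if length() can be called on the given result.
    static boolean lengthIsUsable(CoderResult r) {
        try {
            r.length();
            return true;
        } catch (UnsupportedOperationException e) {
            return false; // not an error result, so there is no length
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthIsUsable(CoderResult.UNDERFLOW));
        System.out.println(lengthIsUsable(CoderResult.OVERFLOW));
        System.out.println(lengthIsUsable(CoderResult.malformedForLength(1)));
    }
}
```

So a CoderResult cannot carry a "chars consumed" count for the non-error outcomes, which is the gap the paragraph above describes.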
> 
> Another, almost-possible solution I was considering is adding a special case to UTF_8 for direct buffer destinations, and a corresponding JLA.encodeASCII overload that accepts a ByteBuffer.  The challenge here is that a wrapped CharBuffer doesn't have an array, and so doesn't get the fast path copying.
> 
> The reason I am reaching out here is that I am looking for feedback on my analysis of the existing API.  I am wondering what API compromises could be made to fast path writing Strings to direct buffers, which I feel is probably a common operation.  The only reasonable way I can see to implement this is a new return type, which also seems undesirable.
> 
> Carl


