RFR: JDK-8021560,(str) String constructors that take ByteBuffer

Thu Feb 15 21:55:03 UTC 2018

> public String(ByteBuffer bytes, Charset cs);
> public String(ByteBuffer bytes, String csname);

I think these constructors make good sense. They avoid an extra copy to an 
intermediate byte[].
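
For comparison, here is a minimal sketch of how a caller can get a String out of
a ByteBuffer today, without the proposed constructors. The class and method
names are hypothetical and only for illustration. Both variants work on a
duplicate of the buffer so the caller's position is left alone. Whether the
proposed constructors would advance the position is not something this sketch
assumes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

class ByteBufferToString {
    // Hypothetical helper: copies the remaining bytes into a temporary
    // byte[] and decodes that. This is the extra copy the proposed
    // constructors are meant to avoid.
    static String viaArray(ByteBuffer bytes, Charset cs) {
        byte[] tmp = new byte[bytes.remaining()];
        bytes.duplicate().get(tmp);   // duplicate: caller's position untouched
        return new String(tmp, cs);
    }

    // Hypothetical helper: decodes through an intermediate CharBuffer
    // using the existing Charset.decode(ByteBuffer) method.
    static String viaDecode(ByteBuffer bytes, Charset cs) {
        return cs.decode(bytes.duplicate()).toString();
    }
}
```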

One issue (also mentioned by Stephen Colebourne) is whether we need the csname 
overload. Arguably it's not needed if we have the Charset overload. And the 
csname overload throws UnsupportedEncodingException, which is checked. But the 
csname overload is apparently faster, since the decoder can be cached, and it's 
unclear when this can be remedied for the Charset case....

I could go either way on this one.

**

I'd also suggest adding a CharBuffer constructor:

     public String(CharBuffer cbuf)

This would be semantically equivalent to

     public String(char[] value, int offset, int count)

except using the chars from the CharBuffer between the buffer's position and its 
limit.
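
A rough sketch of those semantics, written as a static helper because the
constructor itself does not exist. The name fromCharBuffer is hypothetical, and
leaving the buffer's position unchanged is an assumption of this sketch, not
something settled in the review. Note that CharBuffer.toString() already returns
the chars between position and limit.

```java
import java.nio.CharBuffer;

class CharBufferToString {
    // Hypothetical stand-in for the suggested String(CharBuffer) constructor.
    static String fromCharBuffer(CharBuffer cbuf) {
        if (cbuf.hasArray()) {
            // Writable heap buffer: same as String(char[] value, int offset, int count)
            return new String(cbuf.array(),
                              cbuf.arrayOffset() + cbuf.position(),
                              cbuf.remaining());
        }
        // Read-only or non-array-backed buffer: copy the remaining chars.
        char[] tmp = new char[cbuf.remaining()];
        cbuf.duplicate().get(tmp);    // duplicate: caller's position untouched
        return new String(tmp);
    }
}
```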

**

Regarding the getBytes() overloads:

> public int getBytes(byte[] dst, int offset, Charset cs);
> public int getBytes(byte[] dst, int offset, String csname);
> public int getBytes(ByteBuffer bytes, Charset cs);
> public int getBytes(ByteBuffer bytes, String csname);

> On 2/13/18, 12:41 AM, Alan Bateman wrote:
>> These four methods encode as many characters as possible into the destination 
>> byte[] or buffer but don't give any indication that the destination didn't 
>> have enough space to encode the entire string. I thus worry they could be a 
>> hazard and result in buggy code. If there is insufficient space then the user 
>> of the API doesn't know how many characters were encoded so it's not easy to 
>> substring and call getBytes again to encode the remaining characters. There is 
>> also the issue of how to size the destination. What would you think about 
>> having them fail when there is insufficient space? If they do fail then there 
>> is a side effect that they will have written to the destination so that would 
>> need to be documented too.

I share Alan's concern here.

If the intent is to reuse a byte[] or a ByteBuffer, then there needs to be an
effective way to handle the case where the provided array/buffer doesn't have
enough room to receive the encoded string. A variety of ways of dealing with
this have been mentioned, such as throwing an exception; returning a negative
value to indicate failure, possibly also encoding the number of bytes written;
or even allocating a fresh array or buffer of the proper size and returning that
instead. The caller would have to check the return value and take care to handle
all the cases properly. This is likely to be fairly error-prone.
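
To make the hazard concrete, here is a sketch of just one of those options, the
"return a negative value" variant. The helper encodeInto is hypothetical and is
not the proposed API; it is built on the existing CharsetEncoder, and the choice
of -1 and of replacing unencodable characters are assumptions of the sketch.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

class EncodeInto {
    // Hypothetical helper: returns the number of bytes written starting at
    // dst[offset], or -1 if dst does not have enough room.
    static int encodeInto(String s, byte[] dst, int offset, Charset cs) {
        CharsetEncoder enc = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer out = ByteBuffer.wrap(dst, offset, dst.length - offset);
        CoderResult cr = enc.encode(CharBuffer.wrap(s), out, true);
        if (cr.isUnderflow()) {
            cr = enc.flush(out);
        }
        if (cr.isOverflow()) {
            // Side effect: dst may already hold a partial encoding, and the
            // caller has no way to learn how many chars were consumed.
            return -1;
        }
        return out.position() - offset;
    }
}
```

Every caller then has to remember to test for -1 and decide what to do with a
destination that may have been partially overwritten.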

This also raises the question in my mind of what these getBytes() methods are 
intended for.

On the one hand, they might be useful for the caller to manage its own memory 
allocation and reuse of arrays/buffers. If so, then it's necessary for 
intermediate results from partial processing to be handled properly. If the 
destination fills up, there needs to be a way to report how much of the input 
was consumed, so that a subsequent operation can pick up where the previous one 
left off. (This was one of David Lloyd's points.) If there's sufficient room in 
the destination, there needs to be a way to report this and how much space 
remains in the destination. One could contemplate adding all this information to 
the API. This eventually leads to

     CharsetEncoder.encode(CharBuffer in, ByteBuffer out, boolean endOfInput)

which has all the necessary partial progress state in the buffers.
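
As an illustration of how the buffers carry that state, here is a minimal sketch
of encoding a String through a small, reused ByteBuffer. The class name, the
ByteArrayOutputStream sink, and the fixed choice of UTF-8 are assumptions made
for the example. The input CharBuffer's position records how much has been
consumed, and the output ByteBuffer's position records how much has been
written.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class ChunkedEncode {
    // Encodes s as UTF-8 through one reusable buffer of bufSize bytes,
    // draining the buffer into a sink each time it fills up.
    static byte[] encodeWithSmallBuffer(String s, int bufSize) {
        if (bufSize < 4) {
            // A surrogate pair needs 4 bytes in UTF-8; a smaller buffer
            // could overflow forever without making progress.
            throw new IllegalArgumentException("bufSize must be at least 4");
        }
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CharBuffer in = CharBuffer.wrap(s);
        ByteBuffer out = ByteBuffer.allocate(bufSize);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();

        CoderResult cr;
        do {
            // OVERFLOW: out is full, in.position() marks where to resume.
            // UNDERFLOW: all input has been consumed.
            cr = enc.encode(in, out, true);
            drain(out, sink);
        } while (cr.isOverflow());

        do {
            cr = enc.flush(out);
            drain(out, sink);
        } while (cr.isOverflow());

        return sink.toByteArray();
    }

    // Copies the bytes written so far into the sink and resets the buffer.
    private static void drain(ByteBuffer out, ByteArrayOutputStream sink) {
        out.flip();
        sink.write(out.array(), out.arrayOffset(), out.limit());
        out.clear();
    }
}
```

The loop is short, but it is already more machinery than a convenience method
on String would want to expose.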

On the other hand, maybe the intent of these APIs is for convenience. I'd 
observe that String already has this method:

     public byte[] getBytes(Charset)

which returns the encoded bytes in a newly allocated array of the proper size.
This is pretty convenient. It doesn't let the caller reuse a destination array 
or buffer... but that's what brings in all the partial result edge cases.

Bottom line is that I'm not entirely sure of the use of these new getBytes() 
overloads. Maybe I've missed a use case where these work; if so, maybe somebody 
can describe it.

s'marks