RFR: JDK-8021560,(str) String constructors that take ByteBuffer

Tue Feb 13 14:37:44 UTC 2018

On Tue, Feb 13, 2018 at 2:41 AM, Alan Bateman <Alan.Bateman at oracle.com> wrote:
> On 13/02/2018 06:24, Xueming Shen wrote:
>>
>> Hi,
>>
>> Please help review the proposal to add following constructors and methods
>> in String
>> class to take ByteBuffer as the input and output data buffer.
>>
>> public String(ByteBuffer bytes, Charset cs);
>> public String(ByteBuffer bytes, String csname);
>
> These constructors looks good (for the parameter names then I assume you
> meant "src" rather than "bytes" here).
>
>> public int getBytes(byte[] dst, int offset, Charset cs);
>> public int getBytes(byte[] dst, int offset, String csname);
>> public int getBytes(ByteBuffer bytes, Charset cs);
>> public int getBytes(ByteBuffer bytes, Charset csn);
>
> These four methods encode as many characters as possible into the
> destination byte[] or buffer but don't give any indication that the
> destination didn't have enough space to encode the entire string. I thus
> worry they could be a hazard and result in buggy code. If there is
> insufficient space then the user of the API doesn't know how many characters
> were encoded so it's not easy to substring and call getBytes again to encode
> the remaining characters. There is also the issue of how to size the
> destination. What would you think about having them fail when there is
> insufficient space? If they do fail then there is a side effect that they
> will have written to the destination so that would need to be documented
> too.

The variants that write to a ByteBuffer are more flexible, because the
buffer position can be advanced by the number of bytes written, so the
method _could_ return the number of _chars_ actually encoded instead.
But that count is not particularly useful without variants which accept
an offset into the string, unless it can be shown that
s.substring(coffs).getBytes(xxx) is reasonably efficient.
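
For comparison, here is a rough sketch (not part of the proposal) of how
the existing CharsetEncoder API already reports progress on both sides:
the CharBuffer position says how many chars were consumed, the ByteBuffer
position says how many bytes were produced, and the CoderResult says
whether the destination filled up. The class name, method names, and the
fixed choice of UTF-8 are mine, purely for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class ChunkedEncode {
    // Illustrative helper (not a JDK method): encodes s as UTF-8 through a
    // small fixed-size buffer, draining the buffer each time it fills up.
    static byte[] encodeInChunks(String s, int chunkSize) {
        if (chunkSize < 4) {
            // A supplementary code point needs 4 bytes in UTF-8; a smaller
            // buffer could overflow forever without making progress.
            throw new IllegalArgumentException("chunkSize must be >= 4");
        }
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        CharBuffer in = CharBuffer.wrap(s);
        ByteBuffer out = ByteBuffer.allocate(chunkSize);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        while (true) {
            CoderResult cr = enc.encode(in, out, true);
            // in.position() is now the number of chars consumed so far;
            // out.position() is the number of bytes produced in this round.
            drain(out, sink);
            if (cr.isError()) {
                throw new IllegalArgumentException("cannot encode: " + cr);
            }
            if (cr.isUnderflow()) {
                break; // all input consumed
            }
            // Otherwise the result was an overflow: go around again.
        }
        // Let the encoder emit any trailing bytes (none for UTF-8).
        while (true) {
            CoderResult cr = enc.flush(out);
            drain(out, sink);
            if (cr.isUnderflow()) {
                break;
            }
        }
        return sink.toByteArray();
    }

    // Copies the bytes written so far into sink and resets the buffer.
    private static void drain(ByteBuffer out, ByteArrayOutputStream sink) {
        out.flip();
        sink.write(out.array(), 0, out.limit());
        out.clear();
    }
}
```

This works today, but it needs a CharBuffer wrapper and a fair amount of
ceremony, which is presumably what the proposed String methods are trying
to avoid.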

It might be better to shuffle this around a little and instead have
something like Charset[Encoder].getBytes(int codePoint, byte[] b, int
offs, int len) and .getBytes(int codePoint, ByteBuffer buf), which
return the number of bytes written, or e.g. -1 or -count if there isn't
enough space in the target.  Then it would be less onerous for users to
write simple for-each-codepoint loops which encode as far as is
reasonable but no farther, without too many error-handling gymnastics.
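
To make the idea concrete, here is a minimal sketch of what such a loop
might look like. Nothing here exists in the JDK: putCodePoint is a
hypothetical stand-in for the suggested per-code-point method, hand-rolled
for UTF-8 only, and encodeWhileRoom is just one possible caller. I picked
the "-1 on insufficient space" convention arbitrarily.

```java
import java.nio.ByteBuffer;

public class CodePointEncode {
    // Hypothetical helper, UTF-8 only, NOT a JDK API.
    // Writes one code point to buf and returns the number of bytes written,
    // or returns -1 and leaves buf untouched if there is not enough room.
    static int putCodePoint(int cp, ByteBuffer buf) {
        if (!Character.isValidCodePoint(cp) || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("not an encodable code point: " + cp);
        }
        int n = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (buf.remaining() < n) {
            return -1;
        }
        if (n == 1) {
            buf.put((byte) cp);
        } else if (n == 2) {
            buf.put((byte) (0xC0 | (cp >> 6)));
            buf.put((byte) (0x80 | (cp & 0x3F)));
        } else if (n == 3) {
            buf.put((byte) (0xE0 | (cp >> 12)));
            buf.put((byte) (0x80 | ((cp >> 6) & 0x3F)));
            buf.put((byte) (0x80 | (cp & 0x3F)));
        } else {
            buf.put((byte) (0xF0 | (cp >> 18)));
            buf.put((byte) (0x80 | ((cp >> 12) & 0x3F)));
            buf.put((byte) (0x80 | ((cp >> 6) & 0x3F)));
            buf.put((byte) (0x80 | (cp & 0x3F)));
        }
        return n;
    }

    // Encodes code points of s starting at char index 'from' until buf is
    // full. Returns the char index of the first code point NOT encoded
    // (s.length() if everything fit), so the caller can resume from there.
    static int encodeWhileRoom(String s, int from, ByteBuffer buf) {
        int i = from;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (putCodePoint(cp, buf) < 0) {
                break; // out of space: stop cleanly on a code point boundary
            }
            i += Character.charCount(cp);
        }
        return i;
    }
}
```

The caller always knows exactly where encoding stopped, never ends up with
half a character in the buffer, and can resume with a fresh or drained
buffer from the returned index without any substring() calls.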

-- 
- DML