RFR: JDK-8021560,(str) String constructors that take ByteBuffer
Stuart Marks
stuart.marks at oracle.com
Thu Feb 15 21:55:03 UTC 2018
> public String(ByteBuffer bytes, Charset cs);
> public String(ByteBuffer bytes, String csname);
I think these constructors make good sense. They avoid an extra copy to an
intermediate byte[].
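For comparison, here is a minimal sketch of what a caller has to write today without the proposed constructors, assuming only the existing String(byte[], Charset) constructor. The class and method names (ByteBufferToString, viaTempArray) are hypothetical, and the choice not to advance the caller's buffer position is this sketch's own, not something the proposal specifies.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ByteBufferToString {

    // Hypothetical helper showing the current workaround: copy the buffer's
    // remaining bytes into a temporary byte[] and decode that array. The
    // temporary array is the extra copy the proposed
    // String(ByteBuffer, Charset) constructor would avoid.
    static String viaTempArray(ByteBuffer bytes, Charset cs) {
        ByteBuffer src = bytes.duplicate();     // leave the caller's position untouched
        byte[] tmp = new byte[src.remaining()]; // intermediate allocation
        src.get(tmp);                           // intermediate copy
        return new String(tmp, cs);             // decode into the String's own storage
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap("caf\u00e9".getBytes(StandardCharsets.UTF_8));
        String s = viaTempArray(buf, StandardCharsets.UTF_8);
        System.out.println(s.equals("caf\u00e9")); // true
        System.out.println(buf.position());        // 0
    }
}
```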
One issue (also mentioned by Stephen Colebourne) is whether we need the csname
overload. Arguably it's not needed if we have the Charset overload. And the
csname overload throws UnsupportedEncodingException, which is checked. But the
csname overload is apparently faster, since the decoder can be cached, and it's
unclear when this can be remedied for the Charset case....
I could go either way on this one.
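The checked-exception difference can already be seen with the existing byte[] constructors, which have the same Charset/csname pair. This is only a sketch of the call-site ergonomics under that assumption; it says nothing about the performance difference mentioned above. The class name CharsetVsName is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

final class CharsetVsName {

    // Charset overload: no checked exception, because the caller already
    // holds a Charset object.
    static String decodeWithCharset(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Charset-name overload: the name is looked up at run time, so the
    // constructor declares the checked UnsupportedEncodingException, and
    // callers must handle it even for "UTF-8", which every Java platform
    // is required to support.
    static String decodeWithName(byte[] bytes) {
        try {
            return new String(bytes, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {0x68, 0x69};
        System.out.println(decodeWithCharset(data)); // hi
        System.out.println(decodeWithName(data));    // hi
    }
}
```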
**
I'd also suggest adding a CharBuffer constructor:
public String(CharBuffer cbuf)
This would be semantically equivalent to
public String(char[] value, int offset, int count)
except using the chars from the CharBuffer between the buffer's position and its
limit.
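A minimal sketch of the intended semantics, written as a hypothetical static helper (stringOf) since String(CharBuffer) is only being suggested here. It maps the buffer's position and limit onto String(char[], int, int) when the buffer exposes its array, and otherwise copies the remaining chars without moving the caller's position.

```java
import java.nio.CharBuffer;

final class CharBufferToString {

    // Hypothetical stand-in for the suggested String(CharBuffer) constructor:
    // the result holds the chars between the buffer's position and limit.
    static String stringOf(CharBuffer cbuf) {
        if (cbuf.hasArray()) {
            // Array-backed buffer: same as String(char[] value, int offset, int count).
            return new String(cbuf.array(),
                              cbuf.arrayOffset() + cbuf.position(),
                              cbuf.remaining());
        }
        // No accessible array (e.g. read-only buffers, or buffers wrapping a
        // CharSequence): copy the remaining chars from a duplicate so the
        // caller's position is not changed.
        char[] tmp = new char[cbuf.remaining()];
        cbuf.duplicate().get(tmp);
        return new String(tmp);
    }

    public static void main(String[] args) {
        CharBuffer cb = CharBuffer.wrap("hello, world".toCharArray());
        cb.position(7);
        cb.limit(12);
        System.out.println(stringOf(cb));  // world
        System.out.println(cb.toString()); // world
    }
}
```

Note that the existing CharBuffer.toString() already returns the same position-to-limit range, so the new constructor would mainly be a matter of discoverability and symmetry with the proposed ByteBuffer constructors.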
**
Regarding the getBytes() overloads:
> public int getBytes(byte[] dst, int offset, Charset cs);
> public int getBytes(byte[] dst, int offset, String csname);
> public int getBytes(ByteBuffer bytes, Charset cs);
> public int getBytes(ByteBuffer bytes, String csname);
> On 2/13/18, 12:41 AM, Alan Bateman wrote:
>> These four methods encode as many characters as possible into the destination
>> byte[] or buffer but don't give any indication that the destination didn't
>> have enough space to encode the entire string. I thus worry they could be a
>> hazard and result in buggy code. If there is insufficient space then the user
>> of the API doesn't know how many characters were encoded so it's not easy to
>> substring and call getBytes again to encode the remaining characters. There is
>> also the issue of how to size the destination. What would you think about
>> having them fail when there is insufficient space? If they do fail then there
>> is a side effect that they will have written to the destination so that would
>> need to be documented too.
I share Alan's concern here.
If the intent is to reuse a byte[] or a ByteBuffer, then there needs to be an
effective way to handle the case where the provided array/buffer doesn't have
enough room to receive the encoded string. A variety of ways of dealing with
this have been mentioned, such as throwing an exception; returning a negative
value to indicate failure, possibly also encoding the number of bytes written;
or even allocating a fresh array or buffer of the proper size and returning that
instead. The caller would have to check the return value and take care to handle
all the cases properly. This is likely to be fairly error-prone.
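To make the exception-throwing option concrete, here is a sketch of a hypothetical encodeOrThrow helper (not an existing or proposed JDK method) built on the existing CharsetEncoder API. It throws BufferOverflowException when the whole string doesn't fit. It also exhibits the side effect Alan mentions: with OpenJDK's UTF-8 encoder, the bytes that fit are written into the destination before the overflow is detected.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

final class EncodeOrFail {

    // Hypothetical "fail when there is insufficient space" variant of
    // getBytes(ByteBuffer, Charset). Returns the number of bytes written.
    // On overflow it throws, but bytes may already have been written and
    // dst's position advanced past them.
    static int encodeOrThrow(String s, ByteBuffer dst, Charset cs)
            throws CharacterCodingException {
        CharsetEncoder enc = cs.newEncoder();  // default action: REPORT errors
        CharBuffer in = CharBuffer.wrap(s);
        int start = dst.position();
        CoderResult cr = enc.encode(in, dst, true);
        if (cr.isUnderflow()) {
            cr = enc.flush(dst);               // all input consumed; flush any state
        }
        if (cr.isOverflow()) {
            throw new BufferOverflowException();
        }
        if (cr.isError()) {
            cr.throwException();               // malformed or unmappable input
        }
        return dst.position() - start;
    }

    public static void main(String[] args) throws CharacterCodingException {
        ByteBuffer big = ByteBuffer.allocate(16);
        System.out.println(encodeOrThrow("hello", big, StandardCharsets.UTF_8)); // 5

        ByteBuffer small = ByteBuffer.allocate(3);
        try {
            encodeOrThrow("hello", small, StandardCharsets.UTF_8);
        } catch (BufferOverflowException e) {
            // The call failed, yet some bytes were already written.
            System.out.println("overflow after writing " + small.position() + " bytes");
        }
    }
}
```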
This also raises the question in my mind of what these getBytes() methods are
intended for.
On the one hand, they might be useful for the caller to manage its own memory
allocation and reuse of arrays/buffers. If so, then it's necessary for
intermediate results from partial processing to be handled properly. If the
destination fills up, there needs to be a way to report how much of the input
was consumed, so that a subsequent operation can pick up where the previous one
left off. (This was one of David Lloyd's points.) If there's sufficient room in
the destination, there needs to be a way to report this and how much space
remains in the destination. One could contemplate adding all this information to
the API. This eventually leads to
CharsetEncoder.encode(CharBuffer in, ByteBuffer out, boolean endOfInput)
which has all the necessary partial progress state in the buffers.
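As an illustration of how that state is carried, here is a sketch that encodes a String through a small, reused ByteBuffer and drains each full chunk into a ByteArrayOutputStream. The class and method names are hypothetical; only the CharsetEncoder calls (encode with endOfInput, then flush) are the existing API. The input CharBuffer's position records how much has been consumed, and the output buffer's position records how much is pending, so each call resumes where the last one stopped.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ChunkedEncode {

    // Encode s through the reusable 'chunk' buffer, draining it into sink
    // whenever the encoder reports OVERFLOW.
    static void encodeInChunks(String s, Charset cs, ByteBuffer chunk,
                               ByteArrayOutputStream sink)
            throws CharacterCodingException {
        CharsetEncoder enc = cs.newEncoder();
        CharBuffer in = CharBuffer.wrap(s);
        chunk.clear();
        while (true) {
            CoderResult cr = enc.encode(in, chunk, true);
            if (cr.isUnderflow()) {
                break;                    // all input consumed
            }
            if (cr.isOverflow()) {
                drainOrFail(chunk, sink); // chunk full: empty it and resume
            } else {
                cr.throwException();      // malformed or unmappable input
            }
        }
        while (enc.flush(chunk).isOverflow()) {
            drainOrFail(chunk, sink);
        }
        drain(chunk, sink);               // write whatever is still pending
    }

    // OVERFLOW with an empty chunk means no progress is possible.
    private static void drainOrFail(ByteBuffer chunk, ByteArrayOutputStream sink) {
        if (chunk.position() == 0) {
            throw new IllegalArgumentException("chunk too small for one character");
        }
        drain(chunk, sink);
    }

    // Move the pending bytes to sink and make the chunk reusable.
    private static void drain(ByteBuffer chunk, ByteArrayOutputStream sink) {
        chunk.flip();
        while (chunk.hasRemaining()) {
            sink.write(chunk.get());
        }
        chunk.clear();
    }

    public static void main(String[] args) throws CharacterCodingException {
        String s = "caf\u00e9 au lait";
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        encodeInChunks(s, StandardCharsets.UTF_8, ByteBuffer.allocate(4), sink);
        System.out.println(Arrays.equals(sink.toByteArray(),
                                         s.getBytes(StandardCharsets.UTF_8))); // true
    }
}
```

Even this small loop has to deal with overflow, flushing, error results, and a destination too small to make progress, which is the bookkeeping a getBytes() overload would otherwise have to expose.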
On the other hand, maybe the intent of these APIs is for convenience. I'd
observe that String already has this method:
public byte[] getBytes(Charset)
which returns the encoded bytes in a newly allocated array of the proper size.
This is pretty convenient. It doesn't let the caller reuse a destination array
or buffer... but that's what brings in all the partial result edge cases.
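For contrast, a caller who wants bytes in an existing array can already combine the convenient method with a copy. This is a sketch using a hypothetical copyInto helper, not a proposed API. It pays for an extra allocation and copy, but because the encoded length is known before anything is written, it gets all-or-nothing behavior for free: a too-small destination is left untouched.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class GetBytesThenCopy {

    // Hypothetical helper: encode with the existing String.getBytes(Charset),
    // then copy into dst at offset. Throws without modifying dst if the
    // encoded bytes do not fit.
    static int copyInto(String s, byte[] dst, int offset, Charset cs) {
        if (offset < 0 || offset > dst.length) {
            throw new IndexOutOfBoundsException("offset " + offset);
        }
        byte[] encoded = s.getBytes(cs);  // the allocation the new overloads would avoid
        int room = dst.length - offset;
        if (encoded.length > room) {
            throw new IndexOutOfBoundsException(
                "need " + encoded.length + " bytes, have " + room);
        }
        System.arraycopy(encoded, 0, dst, offset, encoded.length);
        return encoded.length;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[8];
        System.out.println(copyInto("hi", buf, 2, StandardCharsets.UTF_8)); // 2
        byte[] tiny = new byte[2];
        try {
            copyInto("hello", tiny, 0, StandardCharsets.UTF_8);
        } catch (IndexOutOfBoundsException e) {
            System.out.println(tiny[0] == 0 && tiny[1] == 0); // true: dst untouched
        }
    }
}
```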
Bottom line is that I'm not entirely sure of the use of these new getBytes()
overloads. Maybe I've missed a use case where these work; if so, maybe somebody
can describe it.
s'marks