Lower overhead String encoding/decoding
Xueming Shen
xueming.shen at oracle.com
Thu Sep 25 16:24:27 UTC 2014
Hi Richard, couple comments after a quick scan.
(1) #474, the IOBE probably is no longer needed in case of ByteBuffer.
parameter.
(2) for methods that have the ByteBuffer as input, it would be desirable
to specify clearly that
the bytes are read from "position" to "limit", and whether or not
the "position" will be advanced
after decoding/encoding (important).
(3) getBytes(byte[], offset, charset)
while I understand it might be useful to have such a method in
certain circumstance, it is
usually complicated to make it right/easy for user to actually use
it. Consider the fact that
the returned number of bytes encoded has no straightforward
connection to how many
underlying chars have been encoded (so the user of this method
really have no idea how
many underlying "chars" have been encoded into the dest buffer, is
that enough? how big
the buffer need to be to encode the whole string? ....) especially
the possibility that the last
couple byte space might be short of encoding a "char". Not like
the getChars(), which has
a easy, clear and direct link between the out going chars and
underlying chars. I would
suggest it might be better to leave it out.
(4) StringCoding.decode() #239 "remaining()" should be used to return
limit - position?
(5) in case of "untrusted", it might be more straightforward to get all
"bytes" out of the buffer
first (you are allocating a byte buffer here anyway, I don;t see
obvious benefit to get a
direct buffer, btw) and then pass it to the original/existing
byte[]->char[] decoding
implementation. We probably will take a deep look at the
implementation later when
the public api settled.
-Sherman
Richard Warburton wrote:
> Hi Alan,
>
> Thanks for the feedback.
>
> The direction seems reasonable but I wonder about the offset/length (and
>> destOffet) parameters as this isn't consistent with how ByteBuffers were
>> originally intended to be used. That is, when you read the bytes from the
>> wire into a ByteBuffer and flip it then the position and limit will delimit
>> the bytes to be decoded.
>>
>> If the constructor is changed to String(ByteBuffer in, Charset cs) and
>> decodes the remaining bytes in the buffer to a String using the specified
>> Charset then I think would be more consistent. Also I think this would give
>> you a solution to the underflow case.
>>
> I've updated the webrev to reflect this, removing the offset and length
> parameters and using position() and limit() instead.
>
> http://cr.openjdk.java.net/~rwarburton/string-patch-webrev-6/
>
> Similarly if getBytes is replaced with with a getBytes or
>> encode(ByteBuffer, Charset cs) then then it would encode as many characters
>> as possible into the output buffer and I think would be more consistent and
>> also help with overflow case.
>>
> I've also applied the this to the getBytes() method. I chose the getBytes()
> method name for consistency with the existing getBytes() method that
> returns a byte[]. To my mind encode() is a more natural name for the
> method, which you mention in your email, do people have a preference here?
>
> regards,
>
> Richard Warburton
>
> http://insightfullogic.com
> @RichardWarburto <http://twitter.com/richardwarburto>
More information about the core-libs-dev
mailing list