Lower overhead String encoding/decoding
Xueming Shen
xueming.shen at oracle.com
Thu Sep 25 16:44:39 UTC 2014
On 9/25/14 9:24 AM, Xueming Shen wrote:
> Hi Richard, couple comments after a quick scan.
>
> (1) #474, the IOBE probably is no longer needed in case of
> ByteBuffer. parameter.
>
> (2) for methods that have the ByteBuffer as input, it would be
> desirable to specify clearly that
> the bytes are read from "position" to "limit", and whether or not
> the "position" will be advanced
> after decoding/encoding (important).
>
> (3) getBytes(byte[], offset, charset)
> while I understand it might be useful to have such a method in
> certain circumstance, it is
> usually complicated to make it right/easy for user to actually
> use it. Consider the fact that
> the returned number of bytes encoded has no straightforward
> connection to how many
> underlying chars have been encoded (so the user of this method
> really have no idea how
> many underlying "chars" have been encoded into the dest buffer,
> is that enough? how big
> the buffer need to be to encode the whole string? ....)
> especially the possibility that the last
> couple byte space might be short of encoding a "char". Not like
> the getChars(), which has
> a easy, clear and direct link between the out going chars and
> underlying chars. I would
> suggest it might be better to leave it out.
I would assume this is true for the ByteBuffer version as well...just
doubt the usefulness of
a method that it might only give you the bytes of part of the target
string object. Given the
getBytes(ByteBuffer) version has the additional position/limit info form
the buffer itself, a
"workaround" is to return the "number of underlying chars copied"...then
you need one more
parameter to let user to copy from "index' for next round.
-sherman
>
> (4) StringCoding.decode() #239 "remaining()" should be used to return
> limit - position?
>
> (5) in case of "untrusted", it might be more straightforward to get
> all "bytes" out of the buffer
> first (you are allocating a byte buffer here anyway, I don;t see
> obvious benefit to get a
> direct buffer, btw) and then pass it to the original/existing
> byte[]->char[] decoding
> implementation. We probably will take a deep look at the
> implementation later when
> the public api settled.
>
> -Sherman
>
>
> Richard Warburton wrote:
>> Hi Alan,
>>
>> Thanks for the feedback.
>>
>> The direction seems reasonable but I wonder about the offset/length (and
>>> destOffet) parameters as this isn't consistent with how ByteBuffers
>>> were
>>> originally intended to be used. That is, when you read the bytes
>>> from the
>>> wire into a ByteBuffer and flip it then the position and limit will
>>> delimit
>>> the bytes to be decoded.
>>>
>>> If the constructor is changed to String(ByteBuffer in, Charset cs) and
>>> decodes the remaining bytes in the buffer to a String using the
>>> specified
>>> Charset then I think would be more consistent. Also I think this
>>> would give
>>> you a solution to the underflow case.
>>>
>> I've updated the webrev to reflect this, removing the offset and length
>> parameters and using position() and limit() instead.
>>
>> http://cr.openjdk.java.net/~rwarburton/string-patch-webrev-6/
>>
>> Similarly if getBytes is replaced with with a getBytes or
>>> encode(ByteBuffer, Charset cs) then then it would encode as many
>>> characters
>>> as possible into the output buffer and I think would be more
>>> consistent and
>>> also help with overflow case.
>>>
>> I've also applied the this to the getBytes() method. I chose the
>> getBytes()
>> method name for consistency with the existing getBytes() method that
>> returns a byte[]. To my mind encode() is a more natural name for the
>> method, which you mention in your email, do people have a preference
>> here?
>>
>> regards,
>>
>> Richard Warburton
>>
>> http://insightfullogic.com
>> @RichardWarburto <http://twitter.com/richardwarburto>
>
More information about the core-libs-dev
mailing list