Lower overhead String encoding/decoding

Thu Sep 25 16:24:27 UTC 2014

Hi Richard, couple comments after a quick scan.

(1)  #474, the IOBE probably is no longer needed in case of ByteBuffer. 
parameter.

(2) for methods that have the ByteBuffer as input, it would be desirable 
to specify clearly that
      the bytes are read from "position" to "limit", and whether or not 
the "position" will be advanced
      after decoding/encoding (important).

(3) getBytes(byte[], offset, charset)
      while I understand it might be useful to have such a method in 
certain circumstance, it is
      usually complicated to make it right/easy for user to actually use 
it. Consider the fact that
      the returned number of bytes encoded has no straightforward 
connection to how many
      underlying chars have been encoded (so the user of this method 
really have no idea how
      many underlying "chars" have been encoded into the dest buffer, is 
that enough? how big
      the buffer need to be to encode the whole string? ....) especially 
the possibility that the last
      couple byte space might be short of encoding a "char". Not like 
the getChars(), which has
      a easy, clear and direct link between the out going chars and 
underlying chars. I would
      suggest it might be better to leave it out.

(4) StringCoding.decode() #239  "remaining()" should be used to return 
limit - position?

(5) in case of "untrusted", it might be more straightforward to get all 
"bytes" out of the buffer
      first (you are allocating a byte buffer here anyway, I don;t see 
obvious benefit to get a
      direct buffer, btw) and then pass it to the original/existing 
byte[]->char[] decoding
      implementation. We probably will take a deep look at the 
implementation later when
      the public api settled.

-Sherman

Richard Warburton wrote:
> Hi Alan,
>
> Thanks for the feedback.
>
> The direction seems reasonable but I wonder about the offset/length (and
>> destOffet) parameters as this isn't consistent with how ByteBuffers were
>> originally intended to be used. That is, when you read the bytes from the
>> wire into a ByteBuffer and flip it then the position and limit will delimit
>> the bytes to be decoded.
>>
>> If the constructor is changed to String(ByteBuffer in, Charset cs) and
>> decodes the remaining bytes in the buffer to a String using the specified
>> Charset then I think would be more consistent. Also I think this would give
>> you a solution to the underflow case.
>>
> I've updated the webrev to reflect this, removing the offset and length
> parameters and using position() and limit() instead.
>
> http://cr.openjdk.java.net/~rwarburton/string-patch-webrev-6/
>
> Similarly if getBytes is replaced with with a getBytes or
>> encode(ByteBuffer, Charset cs) then then it would encode as many characters
>> as possible into the output buffer and I think would be more consistent and
>> also help with overflow case.
>>
> I've also applied the this to the getBytes() method. I chose the getBytes()
> method name for consistency with the existing getBytes() method that
> returns a byte[]. To my mind encode() is a more natural name for the
> method, which you mention in your email, do people have a preference here?
>
> regards,
>
>    Richard Warburton
>
>    http://insightfullogic.com
>    @RichardWarburto <http://twitter.com/richardwarburto>