RFR 8170155: StringBuffer and StringBuilder stream methods are not late-binding

Thu Dec 1 19:17:32 UTC 2016

On 11/28/2016 11:27 AM, Paul Sandoz wrote:
>> On 25 Nov 2016, at 02:47, Tobias Hartmann<tobias.hartmann at oracle.com>  wrote:
>>
>>> I'm not sure if it is still desired to do the same boundary check in case of LATIN1
>>> for the benefit of consistency.  Assume there might be concurrent access/operation
>>> between val = this.value and count = this.count; (for StringBuilder) for example,
>>> the this.value got doubled and the this.count got increased. One will end up with
>>> StringIndexOutOfBoundsException() from checkOffset, but the other will be ioobe
>>> from vm?
>> You mean when hitting a check in an intrinsic compared to when hitting a check in the Java wrapper?
>>
> Not quite. There is a spliterator implementation for each coder, in the case of the LATIN1 coder there are no associated intrinsics. I think it’s ok just to perform the explicit bounds check for the UTF16 coder, since for the LATIN1 bounds checks will be performed by baloads.
>
> However, i think there is bug. The coder could change from LATIN1 to UTF16, while holding a reference to a byte[] value as LATIN1.
>
> For StringBuilder i could fix that by atomically reading the value/count/coder, but there is still an issue with StringBuffer. Thus, in the UTF16 coder spliterator we need to perform the explicit bounds before *every* StringUTF16.getChar().
>
> Webrev updated:
>
>    http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8170155-string-buffer-builder-late-binding/webrev/
>
> Paul.
>

Hi Paul,

Seems like we do have a bug here. But I think to use "charAt()" might not be the
desired/correct approach here?

For StringBuffer, since we have to guarantee the correctness in concurrent use scenario,
I would assume we need a synchronized version of getSupplier(...) to return a late binding
and thread-safe Supplier in StringBuffer, similar to "getBytes(...)" (we have a un-synced
version of getBytes() in AbstractStringBuilder and a synced version at the bottom of the
StringBuffer.java). This should be for the latin1 case as well.

For StringBuilder, since the only concern is the bounds check, I think the existing
checkOffset(...) should be good enough, even in case the coder changes from latin1
to utf16, as the checkOffset(...) checks the "count" against "val.length >> coder" on
the local copy,  the getChar() access after that should be safe (though might not be
correct, if it's a utf16 coder on a latin1 byte[]).

Sherman

>> Actually, bound checks should always be done in the Java wrapper method that calls the (unchecked) intrinsic. In general, the unchecked intrinsic should not be called directly. StringUTF16.putChar/getChar() is an exception because there are places where we we need to omit the check for performance reasons.
>>
>> I'm planning to revisit this with JDK-8156534 in JDK 10.
>>
>> Best regards,
>> Tobias
>>
>>> Sherman
>>>
>>>
>>>> If so i propose the following to make this clearer:
>>>>
>>>> return StreamSupport.intStream(
>>>>          () ->   {
>>>>              byte[] val = this.value; int count = this.count;
>>>>              if (coder == LATIN1) {
>>>>                  return new StringLatin1.CharsSpliterator(val, 0, count, 0);
>>>>              } else {
>>>>                  // Perform an explicit bounds check since HotSpot
>>>>                  // intrinsics to access UTF-16 characters in the byte[]
>>>>                  // array will not perform bounds checks
>>>>                  checkOffset(count, val.length>>   coder);
>>>>                  return new StringUTF16.CharsSpliterator(val, 0, count, 0);
>>>>              }
>>>>          },
>>>>          Spliterator.ORDERED | Spliterator.SIZED | Spliterator.SUBSIZED,
>>>>          false);
>>>>
>>>> Paul.