RFR 8170155: StringBuffer and StringBuilder stream methods are not late-binding
Xueming Shen
xueming.shen at oracle.com
Thu Dec 1 19:17:32 UTC 2016
On 11/28/2016 11:27 AM, Paul Sandoz wrote:
>> On 25 Nov 2016, at 02:47, Tobias Hartmann<tobias.hartmann at oracle.com> wrote:
>>
>>> I'm not sure if it is still desired to do the same boundary check in case of LATIN1
>>> for the benefit of consistency. Assume there might be concurrent access/operation
>>> between val = this.value and count = this.count; (for StringBuilder) for example,
>>> the this.value got doubled and the this.count got increased. One will end up with
>>> StringIndexOutOfBoundsException() from checkOffset, but the other will be ioobe
>>> from vm?
>> You mean when hitting a check in an intrinsic compared to when hitting a check in the Java wrapper?
>>
> Not quite. There is a spliterator implementation for each coder, in the case of the LATIN1 coder there are no associated intrinsics. I think it’s ok just to perform the explicit bounds check for the UTF16 coder, since for the LATIN1 bounds checks will be performed by baloads.
>
> However, i think there is bug. The coder could change from LATIN1 to UTF16, while holding a reference to a byte[] value as LATIN1.
>
> For StringBuilder i could fix that by atomically reading the value/count/coder, but there is still an issue with StringBuffer. Thus, in the UTF16 coder spliterator we need to perform the explicit bounds before *every* StringUTF16.getChar().
>
> Webrev updated:
>
> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8170155-string-buffer-builder-late-binding/webrev/
>
> Paul.
>
Hi Paul,
Seems like we do have a bug here. But I think to use "charAt()" might not be the
desired/correct approach here?
For StringBuffer, since we have to guarantee the correctness in concurrent use scenario,
I would assume we need a synchronized version of getSupplier(...) to return a late binding
and thread-safe Supplier in StringBuffer, similar to "getBytes(...)" (we have a un-synced
version of getBytes() in AbstractStringBuilder and a synced version at the bottom of the
StringBuffer.java). This should be for the latin1 case as well.
For StringBuilder, since the only concern is the bounds check, I think the existing
checkOffset(...) should be good enough, even in case the coder changes from latin1
to utf16, as the checkOffset(...) checks the "count" against "val.length >> coder" on
the local copy, the getChar() access after that should be safe (though might not be
correct, if it's a utf16 coder on a latin1 byte[]).
Sherman
>> Actually, bound checks should always be done in the Java wrapper method that calls the (unchecked) intrinsic. In general, the unchecked intrinsic should not be called directly. StringUTF16.putChar/getChar() is an exception because there are places where we we need to omit the check for performance reasons.
>>
>> I'm planning to revisit this with JDK-8156534 in JDK 10.
>>
>> Best regards,
>> Tobias
>>
>>> Sherman
>>>
>>>
>>>> If so i propose the following to make this clearer:
>>>>
>>>> return StreamSupport.intStream(
>>>> () -> {
>>>> byte[] val = this.value; int count = this.count;
>>>> if (coder == LATIN1) {
>>>> return new StringLatin1.CharsSpliterator(val, 0, count, 0);
>>>> } else {
>>>> // Perform an explicit bounds check since HotSpot
>>>> // intrinsics to access UTF-16 characters in the byte[]
>>>> // array will not perform bounds checks
>>>> checkOffset(count, val.length>> coder);
>>>> return new StringUTF16.CharsSpliterator(val, 0, count, 0);
>>>> }
>>>> },
>>>> Spliterator.ORDERED | Spliterator.SIZED | Spliterator.SUBSIZED,
>>>> false);
>>>>
>>>> Paul.
More information about the core-libs-dev
mailing list