Request for sponsor: JDK-8221430: StringBuffer(CharSequence) constructor truncates when -XX:-CompactStrings specified

Ivan Gerasimov ivan.gerasimov at oracle.com
Sat Mar 30 06:29:29 UTC 2019


It was a good suggestion to run micro benchmarks!

First, it turned out that (at least in some scenarios) 
StringBuilder(String) is not intrinsified.
I.e., running benchmarks with/without '-XX:+UnlockDiagnosticVMOptions 
-XX:DisableIntrinsic=_StringBuilder_String' led to the same numbers.

Here they are prior the fix:
Benchmark                               Mode  Cnt   Score Error  Units
StringBuilders.fromLatin1String         avgt   18  16.378 ± 0.378  ns/op
StringBuilders.fromLatin1StringBuilder  avgt   18  19.975 ± 0.195  ns/op
StringBuilders.fromUtf8String           avgt   18  19.946 ± 2.774  ns/op
StringBuilders.fromUtf8StringBuilder    avgt   18  34.317 ± 0.415  ns/op

Second, with the fix from the webrev.01, the performance has degraded:
Benchmark                               Mode  Cnt   Score Error  Units
StringBuilders.fromLatin1String         avgt   18  24.517 ± 0.965  ns/op
StringBuilders.fromLatin1StringBuilder  avgt   18  20.025 ± 0.678  ns/op
StringBuilders.fromUtf8String           avgt   18  26.102 ± 0.627  ns/op
StringBuilders.fromUtf8StringBuilder    avgt   18  23.141 ± 0.394  ns/op

So, it appears to be necessary to handle the case StringBuilder(String) 
separately from the general case.

And here there is the new webrev:
http://cr.openjdk.java.net/~igerasim/8221430/02/webrev/

This variant restores the performance, and shows some improvement in a 
case of StringBuilder(StringBuilder-with-UTF8-content):
Benchmark                               Mode  Cnt   Score Error  Units
StringBuilders.fromLatin1String         avgt   18  15.742 ± 0.189  ns/op
StringBuilders.fromLatin1StringBuilder  avgt   18  18.671 ± 0.109  ns/op
StringBuilders.fromUtf8String           avgt   18  17.172 ± 0.100  ns/op
StringBuilders.fromUtf8StringBuilder    avgt   18  21.885 ± 0.138  ns/op

With kind regards,
Ivan


On 3/28/19 2:24 PM, Ivan Gerasimov wrote:
> Apologize for the delay, got distracted by other urgent things.
>
> I've got some surprising micro-benchmark results, and need to 
> understand them before pushing the fix.
>
> I'll verify the results and then share details.
>
> With kind regards,
> Ivan
>
>
> On 3/28/19 9:41 AM, Roger Riggs wrote:
>> Hi Andrew,
>>
>> I'm fine with Ivan's version too.
>>
>> Roger
>>
>>
>> On 03/28/2019 12:13 PM, Andrew Leonard wrote:
>>> Roger, Ivan, thanks for your help discussing this issue. Fyi, I am 
>>> off on leave for the next week, so I +1 Ivan's last webrev if that's 
>>> the way you decide to go..
>>> Thanks
>>> Andrew
>>>
>>> Andrew Leonard
>>> Java Runtimes Development
>>> IBM Hursley
>>> IBM United Kingdom Ltd
>>> Phone internal: 245913, external: 01962 815913
>>> internet email: andrew_m_leonard at uk.ibm.com
>>>
>>>
>>>
>>>
>>> From: Roger Riggs <Roger.Riggs at oracle.com>
>>> To: Ivan Gerasimov <ivan.gerasimov at oracle.com>, Andrew Leonard 
>>> <andrew_m_leonard at uk.ibm.com>
>>> Cc: core-libs-dev at openjdk.java.net
>>> Date: 27/03/2019 13:39
>>> Subject: Re: Request for sponsor: JDK-8221430: 
>>> StringBuffer(CharSequence) constructor truncates when 
>>> -XX:-CompactStrings specified
>>> ------------------------------------------------------------------------ 
>>>
>>>
>>>
>>>
>>> Hi Ivan,
>>>
>>> On 03/26/2019 07:30 PM, Ivan Gerasimov wrote:
>>> The main source of confusion here seems to be due to that the coder 
>>> is passed in as an argument, so it either needs to be trusted or has 
>>> to be verified in the constructor.
>>>
>>> The design for coder is that it *is* trusted in all 
>>> internal/implementation APIs.
>>> Subsequent changes should not weaken this invariant, it will cause 
>>> doubt
>>> in the code and cast doubt on all of the performance work that has
>>> been done to optimize its use.
>>>
>>> So, let's instead introduce AbstractStringBuilder(CharSequence) and 
>>> make it do all manipulations.
>>> _
>>> __http://cr.openjdk.java.net/~igerasim/8221430/01/webrev/_ 
>>> <http://cr.openjdk.java.net/%7Eigerasim/8221430/01/webrev/>
>>>
>>> Note that the common case of StringBuilder(String) already gets 
>>> intrinsified by hotspot.
>>>
>>> What do you think?
>>>
>>> This looks good, the logic is in one place.
>>>
>>> Since StringBuilder(String) is an intrinsic, my doubt about a slight 
>>> performance
>>> impact is unwarranted in that specific case.
>>>
>>> Are there any StringBuffer/Builder jmh tests than be easily rerun to 
>>> compare?
>>>
>>> Thanks, Roger
>>>
>>>
>>> With kind regards,
>>> Ivan
>>>
>>> On 3/26/19 1:04 PM, Ivan Gerasimov wrote:
>>> From the design point of view, I believe it is better to have the 
>>> constructor AbstractStringBuilder(int, int, int) to check if the 
>>> coder argument makes sense with respect to the value of 
>>> COMPACT_STRING, so it won't be possible to create a StringBuilder 
>>> with the coder==LATIN1, when it is not supported.
>>>
>>> For calculating the coderHint then, it is not necessary to check 
>>> COMPACT_STRING:  If the CharSequence argument is in fact String or 
>>> AbstractStringBuilder, the coder is known, otherwise LATIN1 can be 
>>> passed in as a hint (for an arbitrary CharSequence it is not 100% 
>>> accurate anyway).
>>> The constructor AbstractStringBuilder(int, int, int) will then 
>>> either use the provided coder, or adjust it if necessary.
>>>
>>> Will we agree on something like following?
>>> _
>>> __http://cr.openjdk.java.net/~igerasim/8221430/00/webrev/_ 
>>> <http://cr.openjdk.java.net/%7Eigerasim/8221430/00/webrev/>
>>>
>>> With kind regards,
>>>
>>> Ivan
>>>
>>>
>>> On 3/26/19 12:14 PM, Roger Riggs wrote:
>>> Hi,
>>>
>>> We've got the subject open and its fresh, there's no need for a 
>>> separate review cycle.
>>>
>>> The first fix (-01) does not seem to be consistent with the original 
>>> design
>>> and handling of the coder.  The StringBuilder(String) and 
>>> StringBuffer(String)
>>> constructors are the pattern that should be followed for determining
>>> the coder for the new instance.
>>>
>>> Checking for COMPACT_STRING in two places (the AbstractStringBuilder 
>>> and
>>> the sub classes) is unnecessary and distributes the information 
>>> about the
>>> correct coder across two classes where determining what it should be
>>> in the subclass has more complete  information and can correctly 
>>> determine
>>> the coder once.
>>>
>>> We can likely find a reviewer to be a tie-breaker if Ivan sees it as 
>>> desirable.
>>>
>>> Thanks, Roger
>>>
>>>
>>> On 03/26/2019 02:38 PM, Andrew Leonard wrote:
>>> Sorry chaps, I think my brain is getting confused!, I think we have 
>>> conflicting reviews here?
>>>
>>> Roger, I added the getCharSequenceCoder() to AbstractStringBuilder 
>>> so it was only defined in one place..
>>> I agree with this being called in StringBuffer/Builder then we don't 
>>> need the change to AbstractStringBuild() constuctor, however Ivan 
>>> wants getCharSequenceCoder() that done as a separate "bug".
>>>
>>> So I think it comes down to do we do this as 2 "bugs" or 1 ?
>>>
>>> Thanks
>>> Andrew
>>>
>>> Andrew Leonard
>>> Java Runtimes Development
>>> IBM Hursley
>>> IBM United Kingdom Ltd
>>> Phone internal: 245913, external: 01962 815913
>>> internet email: _andrew_m_leonard at uk.ibm.com_ 
>>> <mailto:andrew_m_leonard at uk.ibm.com>
>>>
>>>
>>>
>>>
>>> From: Roger Riggs _<Roger.Riggs at oracle.com>_ 
>>> <mailto:Roger.Riggs at oracle.com>
>>> To: Andrew Leonard _<andrew_m_leonard at uk.ibm.com>_ 
>>> <mailto:andrew_m_leonard at uk.ibm.com>, Ivan Gerasimov 
>>> _<ivan.gerasimov at oracle.com>_ <mailto:ivan.gerasimov at oracle.com>
>>> Cc: _core-libs-dev at openjdk.java.net_ 
>>> <mailto:core-libs-dev at openjdk.java.net>
>>> Date: 26/03/2019 18:19
>>> Subject: Re: Request for sponsor: JDK-8221430: 
>>> StringBuffer(CharSequence) constructor truncates when 
>>> -XX:-CompactStrings specified
>>> ------------------------------------------------------------------------ 
>>>
>>>
>>>
>>>
>>> Hi Andrew,
>>>
>>> You are going to have to change the same code twice
>>> because the changes should be the StringBuffer and StringBuilder
>>> constructors and would remove the code that is added to
>>> the AbstractStringBuilder constructor.  That's a waste of review 
>>> cycles.
>>>
>>>
>>> On 03/26/2019 11:45 AM, Andrew Leonard wrote:
>>> Hi Roger,
>>> No worries, the more the merrier!
>>> So that was one of my reasoning for adding getCharSequenceCode() 
>>> was, I think what you're suggesting is my webrev.01, 
>>> __http://cr.openjdk.java.net/~aleonard/8221430/webrev.01/__ 
>>> <http://cr.openjdk.java.net/%7Ealeonard/8221430/webrev.01/_>_<http://cr.openjdk.java.net/%7Ealeonard/8221430/webrev.01/>_ 
>>> <http://cr.openjdk.java.net/%7Ealeonard/8221430/webrev.01/>
>>> Ivan's view is that behaviour was an extended issue, which it is, 
>>> but I thought it was nice to add..
>>>
>>> Which patch do we favour? webrev-01 or -02 ?
>>>
>>> Neither, there should be no change to the AbstractStringBuilder 
>>> constructor
>>> and the change should be done in the subclass constructors.
>>>
>>> Roger
>>>
>>>
>>> Thanks
>>> Andrew
>>>
>>> Andrew Leonard
>>> Java Runtimes Development
>>> IBM Hursley
>>> IBM United Kingdom Ltd
>>> Phone internal: 245913, external: 01962 815913
>>> internet email: _andrew_m_leonard at uk.ibm.com_ 
>>> _<mailto:andrew_m_leonard at uk.ibm.com>_ 
>>> <mailto:andrew_m_leonard at uk.ibm.com>
>>>
>>>
>>>
>>>
>>> From: Roger Riggs __<Roger.Riggs at oracle.com>_ 
>>> <mailto:Roger.Riggs at oracle.com>_ _<mailto:Roger.Riggs at oracle.com>_ 
>>> <mailto:Roger.Riggs at oracle.com>
>>> To: _core-libs-dev at openjdk.java.net_ 
>>> _<mailto:core-libs-dev at openjdk.java.net>_ 
>>> <mailto:core-libs-dev at openjdk.java.net>
>>> Date: 26/03/2019 15:24
>>> Subject: Re: Request for sponsor: JDK-8221430: 
>>> StringBuffer(CharSequence) constructor truncates when 
>>> -XX:-CompactStrings specified
>>> Sent by: "core-libs-dev" __<core-libs-dev-bounces at openjdk.java.net>_ 
>>> <mailto:core-libs-dev-bounces at openjdk.java.net>_ 
>>> _<mailto:core-libs-dev-bounces at openjdk.java.net>_ 
>>> <mailto:core-libs-dev-bounces at openjdk.java.net>
>>> ------------------------------------------------------------------------ 
>>>
>>>
>>>
>>>
>>> Hi Andrew,
>>>
>>> Sorry to be late to the review.
>>>
>>> For symmetry with the constructors StringBuffer(String), and
>>> StringBuilder(String)
>>> the determine the coder based on the input argument, I would recommend
>>> using the getCharSequenceCoder added in the -01 webrev and calling
>>> it from the StringBuffer(CharSeq...), and StringBuilder(CharSeq...)
>>> constructors.
>>>
>>> It would be symmetric with the getCoder() method (line 1635)
>>> and select the appropriate coder base on the input value (if known.)
>>>
>>> Thanks, Roger
>>>
>>>
>>>
>>> On 03/26/2019 10:57 AM, Andrew Leonard wrote:
>>> > Hi Ivan,
>>> > Yes, i'm happy with that, as you say the simple constructor change 
>>> fixes
>>> > the immediate issue, but not necessarily the extended issue of a
>>> > non-compactable CharSequence in COMPACT_STRINGS mode, but that's 
>>> probably
>>> > an enhanced issue to cover in a separate bug...
>>> > I've created a new webrev.02 with just the constructor change and the
>>> > testcase:
>>> > __http://cr.openjdk.java.net/~aleonard/8221430/webrev.02/__ 
>>> <http://cr.openjdk.java.net/%7Ealeonard/8221430/webrev.02/_>_<http://cr.openjdk.java.net/%7Ealeonard/8221430/webrev.02/>_ 
>>> <http://cr.openjdk.java.net/%7Ealeonard/8221430/webrev.02/>
>>> >
>>> > Thanks
>>> > Andrew
>>> >
>>> > Andrew Leonard
>>> > Java Runtimes Development
>>> > IBM Hursley
>>> > IBM United Kingdom Ltd
>>> > Phone internal: 245913, external: 01962 815913
>>> > internet email: _andrew_m_leonard at uk.ibm.com_ 
>>> _<mailto:andrew_m_leonard at uk.ibm.com>_ 
>>> <mailto:andrew_m_leonard at uk.ibm.com>
>>> >
>>> >
>>> >
>>> >
>>> > From:   Ivan Gerasimov __<ivan.gerasimov at oracle.com>_ 
>>> <mailto:ivan.gerasimov at oracle.com>_ 
>>> _<mailto:ivan.gerasimov at oracle.com>_ <mailto:ivan.gerasimov at oracle.com>
>>> > To:     Andrew Leonard __<andrew_m_leonard at uk.ibm.com>_ 
>>> <mailto:andrew_m_leonard at uk.ibm.com>_ 
>>> _<mailto:andrew_m_leonard at uk.ibm.com>_ 
>>> <mailto:andrew_m_leonard at uk.ibm.com>
>>> > Cc: _core-libs-dev at openjdk.java.net_ 
>>> _<mailto:core-libs-dev at openjdk.java.net>_ 
>>> <mailto:core-libs-dev at openjdk.java.net>
>>> > Date:   26/03/2019 01:18
>>> > Subject:        Re: Request for sponsor: JDK-8221430:
>>> > StringBuffer(CharSequence) constructor truncates when 
>>> -XX:-CompactStrings
>>> > specified
>>> >
>>> >
>>> >
>>> > Thanks Andrew!
>>> > Introducing getCharSequenceCoder() is actually an enhancement, 
>>> which may
>>> > improve pre-allocation in certain cases.
>>> > It's not actually required to restore correctness of 
>>> StringBuilder/Buffer
>>> > constructors.
>>> > I recommend to separate it from this bug fix, so it can be discussed
>>> > separately, and determined if this is the best approach to this
>>> > enhancement.
>>> > If you agree, I can adjust your latest patch accordingly, run it 
>>> through
>>> > the Mach5 test systems and push it on your behalf.
>>> > With kind regards,
>>> > Ivan
>>> >
>>> > On 3/25/19 5:00 PM, Andrew Leonard wrote:
>>> > Hi Ivan,
>>> > Here is my updated webrev:
>>> > __http://cr.openjdk.java.net/~aleonard/8221430/webrev.01/__ 
>>> <http://cr.openjdk.java.net/%7Ealeonard/8221430/webrev.01/_>_<http://cr.openjdk.java.net/%7Ealeonard/8221430/webrev.01/>_ 
>>> <http://cr.openjdk.java.net/%7Ealeonard/8221430/webrev.01/>
>>> >
>>> > Thanks
>>> > Andrew
>>> >
>>> > Andrew Leonard
>>> > Java Runtimes Development
>>> > IBM Hursley
>>> > IBM United Kingdom Ltd
>>> > Phone internal: 245913, external: 01962 815913
>>> > internet email: _andrew_m_leonard at uk.ibm.com_ 
>>> _<mailto:andrew_m_leonard at uk.ibm.com>_ 
>>> <mailto:andrew_m_leonard at uk.ibm.com>
>>> >
>>> >
>>> >
>>> >
>>> > From:        Ivan Gerasimov __<ivan.gerasimov at oracle.com>_ 
>>> <mailto:ivan.gerasimov at oracle.com>_ 
>>> _<mailto:ivan.gerasimov at oracle.com>_ <mailto:ivan.gerasimov at oracle.com>
>>> > To:        Andrew Leonard __<andrew_m_leonard at uk.ibm.com>_ 
>>> <mailto:andrew_m_leonard at uk.ibm.com>_ 
>>> _<mailto:andrew_m_leonard at uk.ibm.com>_ 
>>> <mailto:andrew_m_leonard at uk.ibm.com>
>>> > Cc: _core-libs-dev at openjdk.java.net_ 
>>> _<mailto:core-libs-dev at openjdk.java.net>_ 
>>> <mailto:core-libs-dev at openjdk.java.net>
>>> > Date:        25/03/2019 22:55
>>> > Subject:        Re: Request for sponsor: JDK-8221430:
>>> > StringBuffer(CharSequence) constructor truncates when 
>>> -XX:-CompactStrings
>>> > specified
>>> >
>>> >
>>> >
>>> > I was thinking of organizing code similar to what is done in
>>> > AbstractStringBuilder(int):
>>> >
>>> > if (COMPACT_STRINGS && coderHint == LATIN1) {
>>> >      value = new byte[capacity];
>>> >      coder = LATIN1;
>>> > } else {
>>> >      value = StringUTF16.newBytesFor(capacity);
>>> >      coder = UTF16;
>>> > }
>>> >
>>> > With kind regards,
>>> > Ivan
>>> >
>>> > On 3/25/19 3:45 PM, Andrew Leonard wrote:
>>> > Hi Ivan,
>>> > I think I see what you're saying, you mean we also need to correct 
>>> this
>>> > line in AbstractStringBuilder
>>> > constructor:
>>> > value = (coder == LATIN1)
>>> >                 ? new byte[capacity] : 
>>> StringUTF16.newBytesFor(capacity);
>>> > to be maybe:
>>> > value = (COMPACT_STRINGS && coder == LATIN1)
>>> >                 ? new byte[capacity] : 
>>> StringUTF16.newBytesFor(capacity);
>>> >
>>> > The passed in coder stills need to be correct, since with 
>>> COMPACT_STRINGS
>>> > a string could be UTF16 if
>>> > it cannot be compacted, so it's more than just a hint isn't it?
>>> >
>>> > Thanks
>>> > Andrew
>>> >
>>> > Andrew Leonard
>>> > Java Runtimes Development
>>> > IBM Hursley
>>> > IBM United Kingdom Ltd
>>> > Phone internal: 245913, external: 01962 815913
>>> > internet email: _andrew_m_leonard at uk.ibm.com_ 
>>> _<mailto:andrew_m_leonard at uk.ibm.com>_ 
>>> <mailto:andrew_m_leonard at uk.ibm.com>
>>> >
>>> >
>>> >
>>> >
>>> > From:        Ivan Gerasimov __<ivan.gerasimov at oracle.com>_ 
>>> <mailto:ivan.gerasimov at oracle.com>_ 
>>> _<mailto:ivan.gerasimov at oracle.com>_ <mailto:ivan.gerasimov at oracle.com>
>>> > To:        Andrew Leonard __<andrew_m_leonard at uk.ibm.com>_ 
>>> <mailto:andrew_m_leonard at uk.ibm.com>_ 
>>> _<mailto:andrew_m_leonard at uk.ibm.com>_ 
>>> <mailto:andrew_m_leonard at uk.ibm.com>,
>>> > _core-libs-dev at openjdk.java.net_ 
>>> _<mailto:core-libs-dev at openjdk.java.net>_ 
>>> <mailto:core-libs-dev at openjdk.java.net>
>>> > Date:        25/03/2019 22:20
>>> > Subject:        Re: Request for sponsor: JDK-8221430:
>>> > StringBuffer(CharSequence) constructor truncates when 
>>> -XX:-CompactStrings
>>> > specified
>>> >
>>> >
>>> >
>>> > Hi Andrew!
>>> >
>>> > Thanks for finding this bug!
>>> >
>>> > Your fix solves the problem.
>>> >
>>> > However, I think the main issue is that the constructor
>>> > AbstractStringBuilder(byte,int,int) is now broken:  as you 
>>> discovered,
>>> > it allows to create a string buffer with the coder LATIN1 when
>>> > COMPACT_STRINGS is false.
>>> >
>>> > Wouldn't it make sense to rename the argument of the constructor to,
>>> > say, coderHint, and then either use it as the coder if
>>> > COMPACT_STRINGS==true, or discard it otherwise.
>>> >
>>> > What do you think?
>>> >
>>> > With kind regards,
>>> > Ivan
>>> >
>>> > On 3/25/19 12:45 PM, Andrew Leonard wrote:
>>> >> Hi,
>>> >> Please can I request a sponsor for this fix to a JDK-13 regression?
>>> >>
>>> >> Patch with jtreg testcase here:
>>> >> __http://cr.openjdk.java.net/~aleonard/8221430/webrev.00/__ 
>>> <http://cr.openjdk.java.net/%7Ealeonard/8221430/webrev.00/_>_<http://cr.openjdk.java.net/%7Ealeonard/8221430/webrev.00/>_ 
>>> <http://cr.openjdk.java.net/%7Ealeonard/8221430/webrev.00/>
>>> >>
>>> >> bug: __https://bugs.openjdk.java.net/browse/JDK-8221430__
>>> >>
>>> >> Many thanks
>>> >> Andrew
>>> >>
>>> >> Andrew Leonard
>>> >> Java Runtimes Development
>>> >> IBM Hursley
>>> >> IBM United Kingdom Ltd
>>> >> Phone internal: 245913, external: 01962 815913
>>> >> internet email: _andrew_m_leonard at uk.ibm.com_ 
>>> _<mailto:andrew_m_leonard at uk.ibm.com>_ 
>>> <mailto:andrew_m_leonard at uk.ibm.com>
>>> >>
>>> >> Unless stated otherwise above:
>>> >> IBM United Kingdom Limited - Registered in England and Wales with 
>>> number
>>> >> 741598.
>>> >> Registered office: PO Box 41, North Harbour, Portsmouth, 
>>> Hampshire PO6
>>> > 3AU
>>>
>>>
>>>
>>>
>>>
>>> Unless stated otherwise above:
>>> IBM United Kingdom Limited - Registered in England and Wales with 
>>> number 741598.
>>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire 
>>> PO6 3AU
>>>
>>>
>>>
>>> Unless stated otherwise above:
>>> IBM United Kingdom Limited - Registered in England and Wales with 
>>> number 741598.
>>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire 
>>> PO6 3AU
>>>
>>>
>>>
>>>
>>>
>>>
>>> Unless stated otherwise above:
>>> IBM United Kingdom Limited - Registered in England and Wales with 
>>> number 741598.
>>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire 
>>> PO6 3AU
>>
>

-- 
With kind regards,
Ivan Gerasimov



More information about the core-libs-dev mailing list