Request for sponsor: JDK-8221430: StringBuffer(CharSequence) constructor truncates when -XX:-CompactStrings specified
Ivan Gerasimov
ivan.gerasimov at oracle.com
Sat Mar 30 06:29:29 UTC 2019
It was a good suggestion to run micro benchmarks!
First, it turned out that (at least in some scenarios)
StringBuilder(String) is not intrinsified.
I.e., running benchmarks with/without '-XX:+UnlockDiagnosticVMOptions
-XX:DisableIntrinsic=_StringBuilder_String' led to the same numbers.
Here they are prior the fix:
Benchmark Mode Cnt Score Error Units
StringBuilders.fromLatin1String avgt 18 16.378 ± 0.378 ns/op
StringBuilders.fromLatin1StringBuilder avgt 18 19.975 ± 0.195 ns/op
StringBuilders.fromUtf8String avgt 18 19.946 ± 2.774 ns/op
StringBuilders.fromUtf8StringBuilder avgt 18 34.317 ± 0.415 ns/op
Second, with the fix from the webrev.01, the performance has degraded:
Benchmark Mode Cnt Score Error Units
StringBuilders.fromLatin1String avgt 18 24.517 ± 0.965 ns/op
StringBuilders.fromLatin1StringBuilder avgt 18 20.025 ± 0.678 ns/op
StringBuilders.fromUtf8String avgt 18 26.102 ± 0.627 ns/op
StringBuilders.fromUtf8StringBuilder avgt 18 23.141 ± 0.394 ns/op
So, it appears to be necessary to handle the case StringBuilder(String)
separately from the general case.
And here there is the new webrev:
http://cr.openjdk.java.net/~igerasim/8221430/02/webrev/
This variant restores the performance, and shows some improvement in a
case of StringBuilder(StringBuilder-with-UTF8-content):
Benchmark Mode Cnt Score Error Units
StringBuilders.fromLatin1String avgt 18 15.742 ± 0.189 ns/op
StringBuilders.fromLatin1StringBuilder avgt 18 18.671 ± 0.109 ns/op
StringBuilders.fromUtf8String avgt 18 17.172 ± 0.100 ns/op
StringBuilders.fromUtf8StringBuilder avgt 18 21.885 ± 0.138 ns/op
With kind regards,
Ivan
On 3/28/19 2:24 PM, Ivan Gerasimov wrote:
> Apologize for the delay, got distracted by other urgent things.
>
> I've got some surprising micro-benchmark results, and need to
> understand them before pushing the fix.
>
> I'll verify the results and then share details.
>
> With kind regards,
> Ivan
>
>
> On 3/28/19 9:41 AM, Roger Riggs wrote:
>> Hi Andrew,
>>
>> I'm fine with Ivan's version too.
>>
>> Roger
>>
>>
>> On 03/28/2019 12:13 PM, Andrew Leonard wrote:
>>> Roger, Ivan, thanks for your help discussing this issue. Fyi, I am
>>> off on leave for the next week, so I +1 Ivan's last webrev if that's
>>> the way you decide to go..
>>> Thanks
>>> Andrew
>>>
>>> Andrew Leonard
>>> Java Runtimes Development
>>> IBM Hursley
>>> IBM United Kingdom Ltd
>>> Phone internal: 245913, external: 01962 815913
>>> internet email: andrew_m_leonard at uk.ibm.com
>>>
>>>
>>>
>>>
>>> From: Roger Riggs <Roger.Riggs at oracle.com>
>>> To: Ivan Gerasimov <ivan.gerasimov at oracle.com>, Andrew Leonard
>>> <andrew_m_leonard at uk.ibm.com>
>>> Cc: core-libs-dev at openjdk.java.net
>>> Date: 27/03/2019 13:39
>>> Subject: Re: Request for sponsor: JDK-8221430:
>>> StringBuffer(CharSequence) constructor truncates when
>>> -XX:-CompactStrings specified
>>> ------------------------------------------------------------------------
>>>
>>>
>>>
>>>
>>> Hi Ivan,
>>>
>>> On 03/26/2019 07:30 PM, Ivan Gerasimov wrote:
>>> The main source of confusion here seems to be due to that the coder
>>> is passed in as an argument, so it either needs to be trusted or has
>>> to be verified in the constructor.
>>>
>>> The design for coder is that it *is* trusted in all
>>> internal/implementation APIs.
>>> Subsequent changes should not weaken this invariant, it will cause
>>> doubt
>>> in the code and cast doubt on all of the performance work that has
>>> been done to optimize its use.
>>>
>>> So, let's instead introduce AbstractStringBuilder(CharSequence) and
>>> make it do all manipulations.
>>> _
>>> __http://cr.openjdk.java.net/~igerasim/8221430/01/webrev/_
>>> <http://cr.openjdk.java.net/%7Eigerasim/8221430/01/webrev/>
>>>
>>> Note that the common case of StringBuilder(String) already gets
>>> intrinsified by hotspot.
>>>
>>> What do you think?
>>>
>>> This looks good, the logic is in one place.
>>>
>>> Since StringBuilder(String) is an intrinsic, my doubt about a slight
>>> performance
>>> impact is unwarranted in that specific case.
>>>
>>> Are there any StringBuffer/Builder jmh tests than be easily rerun to
>>> compare?
>>>
>>> Thanks, Roger
>>>
>>>
>>> With kind regards,
>>> Ivan
>>>
>>> On 3/26/19 1:04 PM, Ivan Gerasimov wrote:
>>> From the design point of view, I believe it is better to have the
>>> constructor AbstractStringBuilder(int, int, int) to check if the
>>> coder argument makes sense with respect to the value of
>>> COMPACT_STRING, so it won't be possible to create a StringBuilder
>>> with the coder==LATIN1, when it is not supported.
>>>
>>> For calculating the coderHint then, it is not necessary to check
>>> COMPACT_STRING: If the CharSequence argument is in fact String or
>>> AbstractStringBuilder, the coder is known, otherwise LATIN1 can be
>>> passed in as a hint (for an arbitrary CharSequence it is not 100%
>>> accurate anyway).
>>> The constructor AbstractStringBuilder(int, int, int) will then
>>> either use the provided coder, or adjust it if necessary.
>>>
>>> Will we agree on something like following?
>>> _
>>> __http://cr.openjdk.java.net/~igerasim/8221430/00/webrev/_
>>> <http://cr.openjdk.java.net/%7Eigerasim/8221430/00/webrev/>
>>>
>>> With kind regards,
>>>
>>> Ivan
>>>
>>>
>>> On 3/26/19 12:14 PM, Roger Riggs wrote:
>>> Hi,
>>>
>>> We've got the subject open and its fresh, there's no need for a
>>> separate review cycle.
>>>
>>> The first fix (-01) does not seem to be consistent with the original
>>> design
>>> and handling of the coder. The StringBuilder(String) and
>>> StringBuffer(String)
>>> constructors are the pattern that should be followed for determining
>>> the coder for the new instance.
>>>
>>> Checking for COMPACT_STRING in two places (the AbstractStringBuilder
>>> and
>>> the sub classes) is unnecessary and distributes the information
>>> about the
>>> correct coder across two classes where determining what it should be
>>> in the subclass has more complete information and can correctly
>>> determine
>>> the coder once.
>>>
>>> We can likely find a reviewer to be a tie-breaker if Ivan sees it as
>>> desirable.
>>>
>>> Thanks, Roger
>>>
>>>
>>> On 03/26/2019 02:38 PM, Andrew Leonard wrote:
>>> Sorry chaps, I think my brain is getting confused!, I think we have
>>> conflicting reviews here?
>>>
>>> Roger, I added the getCharSequenceCoder() to AbstractStringBuilder
>>> so it was only defined in one place..
>>> I agree with this being called in StringBuffer/Builder then we don't
>>> need the change to AbstractStringBuild() constuctor, however Ivan
>>> wants getCharSequenceCoder() that done as a separate "bug".
>>>
>>> So I think it comes down to do we do this as 2 "bugs" or 1 ?
>>>
>>> Thanks
>>> Andrew
>>>
>>> Andrew Leonard
>>> Java Runtimes Development
>>> IBM Hursley
>>> IBM United Kingdom Ltd
>>> Phone internal: 245913, external: 01962 815913
>>> internet email: _andrew_m_leonard at uk.ibm.com_
>>> <mailto:andrew_m_leonard at uk.ibm.com>
>>>
>>>
>>>
>>>
>>> From: Roger Riggs _<Roger.Riggs at oracle.com>_
>>> <mailto:Roger.Riggs at oracle.com>
>>> To: Andrew Leonard _<andrew_m_leonard at uk.ibm.com>_
>>> <mailto:andrew_m_leonard at uk.ibm.com>, Ivan Gerasimov
>>> _<ivan.gerasimov at oracle.com>_ <mailto:ivan.gerasimov at oracle.com>
>>> Cc: _core-libs-dev at openjdk.java.net_
>>> <mailto:core-libs-dev at openjdk.java.net>
>>> Date: 26/03/2019 18:19
>>> Subject: Re: Request for sponsor: JDK-8221430:
>>> StringBuffer(CharSequence) constructor truncates when
>>> -XX:-CompactStrings specified
>>> ------------------------------------------------------------------------
>>>
>>>
>>>
>>>
>>> Hi Andrew,
>>>
>>> You are going to have to change the same code twice
>>> because the changes should be the StringBuffer and StringBuilder
>>> constructors and would remove the code that is added to
>>> the AbstractStringBuilder constructor. That's a waste of review
>>> cycles.
>>>
>>>
>>> On 03/26/2019 11:45 AM, Andrew Leonard wrote:
>>> Hi Roger,
>>> No worries, the more the merrier!
>>> So that was one of my reasoning for adding getCharSequenceCode()
>>> was, I think what you're suggesting is my webrev.01,
>>> __http://cr.openjdk.java.net/~aleonard/8221430/webrev.01/__
>>> <http://cr.openjdk.java.net/%7Ealeonard/8221430/webrev.01/_>_<http://cr.openjdk.java.net/%7Ealeonard/8221430/webrev.01/>_
>>> <http://cr.openjdk.java.net/%7Ealeonard/8221430/webrev.01/>
>>> Ivan's view is that behaviour was an extended issue, which it is,
>>> but I thought it was nice to add..
>>>
>>> Which patch do we favour? webrev-01 or -02 ?
>>>
>>> Neither, there should be no change to the AbstractStringBuilder
>>> constructor
>>> and the change should be done in the subclass constructors.
>>>
>>> Roger
>>>
>>>
>>> Thanks
>>> Andrew
>>>
>>> Andrew Leonard
>>> Java Runtimes Development
>>> IBM Hursley
>>> IBM United Kingdom Ltd
>>> Phone internal: 245913, external: 01962 815913
>>> internet email: _andrew_m_leonard at uk.ibm.com_
>>> _<mailto:andrew_m_leonard at uk.ibm.com>_
>>> <mailto:andrew_m_leonard at uk.ibm.com>
>>>
>>>
>>>
>>>
>>> From: Roger Riggs __<Roger.Riggs at oracle.com>_
>>> <mailto:Roger.Riggs at oracle.com>_ _<mailto:Roger.Riggs at oracle.com>_
>>> <mailto:Roger.Riggs at oracle.com>
>>> To: _core-libs-dev at openjdk.java.net_
>>> _<mailto:core-libs-dev at openjdk.java.net>_
>>> <mailto:core-libs-dev at openjdk.java.net>
>>> Date: 26/03/2019 15:24
>>> Subject: Re: Request for sponsor: JDK-8221430:
>>> StringBuffer(CharSequence) constructor truncates when
>>> -XX:-CompactStrings specified
>>> Sent by: "core-libs-dev" __<core-libs-dev-bounces at openjdk.java.net>_
>>> <mailto:core-libs-dev-bounces at openjdk.java.net>_
>>> _<mailto:core-libs-dev-bounces at openjdk.java.net>_
>>> <mailto:core-libs-dev-bounces at openjdk.java.net>
>>> ------------------------------------------------------------------------
>>>
>>>
>>>
>>>
>>> Hi Andrew,
>>>
>>> Sorry to be late to the review.
>>>
>>> For symmetry with the constructors StringBuffer(String), and
>>> StringBuilder(String)
>>> the determine the coder based on the input argument, I would recommend
>>> using the getCharSequenceCoder added in the -01 webrev and calling
>>> it from the StringBuffer(CharSeq...), and StringBuilder(CharSeq...)
>>> constructors.
>>>
>>> It would be symmetric with the getCoder() method (line 1635)
>>> and select the appropriate coder base on the input value (if known.)
>>>
>>> Thanks, Roger
>>>
>>>
>>>
>>> On 03/26/2019 10:57 AM, Andrew Leonard wrote:
>>> > Hi Ivan,
>>> > Yes, i'm happy with that, as you say the simple constructor change
>>> fixes
>>> > the immediate issue, but not necessarily the extended issue of a
>>> > non-compactable CharSequence in COMPACT_STRINGS mode, but that's
>>> probably
>>> > an enhanced issue to cover in a separate bug...
>>> > I've created a new webrev.02 with just the constructor change and the
>>> > testcase:
>>> > __http://cr.openjdk.java.net/~aleonard/8221430/webrev.02/__
>>> <http://cr.openjdk.java.net/%7Ealeonard/8221430/webrev.02/_>_<http://cr.openjdk.java.net/%7Ealeonard/8221430/webrev.02/>_
>>> <http://cr.openjdk.java.net/%7Ealeonard/8221430/webrev.02/>
>>> >
>>> > Thanks
>>> > Andrew
>>> >
>>> > Andrew Leonard
>>> > Java Runtimes Development
>>> > IBM Hursley
>>> > IBM United Kingdom Ltd
>>> > Phone internal: 245913, external: 01962 815913
>>> > internet email: _andrew_m_leonard at uk.ibm.com_
>>> _<mailto:andrew_m_leonard at uk.ibm.com>_
>>> <mailto:andrew_m_leonard at uk.ibm.com>
>>> >
>>> >
>>> >
>>> >
>>> > From: Ivan Gerasimov __<ivan.gerasimov at oracle.com>_
>>> <mailto:ivan.gerasimov at oracle.com>_
>>> _<mailto:ivan.gerasimov at oracle.com>_ <mailto:ivan.gerasimov at oracle.com>
>>> > To: Andrew Leonard __<andrew_m_leonard at uk.ibm.com>_
>>> <mailto:andrew_m_leonard at uk.ibm.com>_
>>> _<mailto:andrew_m_leonard at uk.ibm.com>_
>>> <mailto:andrew_m_leonard at uk.ibm.com>
>>> > Cc: _core-libs-dev at openjdk.java.net_
>>> _<mailto:core-libs-dev at openjdk.java.net>_
>>> <mailto:core-libs-dev at openjdk.java.net>
>>> > Date: 26/03/2019 01:18
>>> > Subject: Re: Request for sponsor: JDK-8221430:
>>> > StringBuffer(CharSequence) constructor truncates when
>>> -XX:-CompactStrings
>>> > specified
>>> >
>>> >
>>> >
>>> > Thanks Andrew!
>>> > Introducing getCharSequenceCoder() is actually an enhancement,
>>> which may
>>> > improve pre-allocation in certain cases.
>>> > It's not actually required to restore correctness of
>>> StringBuilder/Buffer
>>> > constructors.
>>> > I recommend to separate it from this bug fix, so it can be discussed
>>> > separately, and determined if this is the best approach to this
>>> > enhancement.
>>> > If you agree, I can adjust your latest patch accordingly, run it
>>> through
>>> > the Mach5 test systems and push it on your behalf.
>>> > With kind regards,
>>> > Ivan
>>> >
>>> > On 3/25/19 5:00 PM, Andrew Leonard wrote:
>>> > Hi Ivan,
>>> > Here is my updated webrev:
>>> > __http://cr.openjdk.java.net/~aleonard/8221430/webrev.01/__
>>> <http://cr.openjdk.java.net/%7Ealeonard/8221430/webrev.01/_>_<http://cr.openjdk.java.net/%7Ealeonard/8221430/webrev.01/>_
>>> <http://cr.openjdk.java.net/%7Ealeonard/8221430/webrev.01/>
>>> >
>>> > Thanks
>>> > Andrew
>>> >
>>> > Andrew Leonard
>>> > Java Runtimes Development
>>> > IBM Hursley
>>> > IBM United Kingdom Ltd
>>> > Phone internal: 245913, external: 01962 815913
>>> > internet email: _andrew_m_leonard at uk.ibm.com_
>>> _<mailto:andrew_m_leonard at uk.ibm.com>_
>>> <mailto:andrew_m_leonard at uk.ibm.com>
>>> >
>>> >
>>> >
>>> >
>>> > From: Ivan Gerasimov __<ivan.gerasimov at oracle.com>_
>>> <mailto:ivan.gerasimov at oracle.com>_
>>> _<mailto:ivan.gerasimov at oracle.com>_ <mailto:ivan.gerasimov at oracle.com>
>>> > To: Andrew Leonard __<andrew_m_leonard at uk.ibm.com>_
>>> <mailto:andrew_m_leonard at uk.ibm.com>_
>>> _<mailto:andrew_m_leonard at uk.ibm.com>_
>>> <mailto:andrew_m_leonard at uk.ibm.com>
>>> > Cc: _core-libs-dev at openjdk.java.net_
>>> _<mailto:core-libs-dev at openjdk.java.net>_
>>> <mailto:core-libs-dev at openjdk.java.net>
>>> > Date: 25/03/2019 22:55
>>> > Subject: Re: Request for sponsor: JDK-8221430:
>>> > StringBuffer(CharSequence) constructor truncates when
>>> -XX:-CompactStrings
>>> > specified
>>> >
>>> >
>>> >
>>> > I was thinking of organizing code similar to what is done in
>>> > AbstractStringBuilder(int):
>>> >
>>> > if (COMPACT_STRINGS && coderHint == LATIN1) {
>>> > value = new byte[capacity];
>>> > coder = LATIN1;
>>> > } else {
>>> > value = StringUTF16.newBytesFor(capacity);
>>> > coder = UTF16;
>>> > }
>>> >
>>> > With kind regards,
>>> > Ivan
>>> >
>>> > On 3/25/19 3:45 PM, Andrew Leonard wrote:
>>> > Hi Ivan,
>>> > I think I see what you're saying, you mean we also need to correct
>>> this
>>> > line in AbstractStringBuilder
>>> > constructor:
>>> > value = (coder == LATIN1)
>>> > ? new byte[capacity] :
>>> StringUTF16.newBytesFor(capacity);
>>> > to be maybe:
>>> > value = (COMPACT_STRINGS && coder == LATIN1)
>>> > ? new byte[capacity] :
>>> StringUTF16.newBytesFor(capacity);
>>> >
>>> > The passed in coder stills need to be correct, since with
>>> COMPACT_STRINGS
>>> > a string could be UTF16 if
>>> > it cannot be compacted, so it's more than just a hint isn't it?
>>> >
>>> > Thanks
>>> > Andrew
>>> >
>>> > Andrew Leonard
>>> > Java Runtimes Development
>>> > IBM Hursley
>>> > IBM United Kingdom Ltd
>>> > Phone internal: 245913, external: 01962 815913
>>> > internet email: _andrew_m_leonard at uk.ibm.com_
>>> _<mailto:andrew_m_leonard at uk.ibm.com>_
>>> <mailto:andrew_m_leonard at uk.ibm.com>
>>> >
>>> >
>>> >
>>> >
>>> > From: Ivan Gerasimov __<ivan.gerasimov at oracle.com>_
>>> <mailto:ivan.gerasimov at oracle.com>_
>>> _<mailto:ivan.gerasimov at oracle.com>_ <mailto:ivan.gerasimov at oracle.com>
>>> > To: Andrew Leonard __<andrew_m_leonard at uk.ibm.com>_
>>> <mailto:andrew_m_leonard at uk.ibm.com>_
>>> _<mailto:andrew_m_leonard at uk.ibm.com>_
>>> <mailto:andrew_m_leonard at uk.ibm.com>,
>>> > _core-libs-dev at openjdk.java.net_
>>> _<mailto:core-libs-dev at openjdk.java.net>_
>>> <mailto:core-libs-dev at openjdk.java.net>
>>> > Date: 25/03/2019 22:20
>>> > Subject: Re: Request for sponsor: JDK-8221430:
>>> > StringBuffer(CharSequence) constructor truncates when
>>> -XX:-CompactStrings
>>> > specified
>>> >
>>> >
>>> >
>>> > Hi Andrew!
>>> >
>>> > Thanks for finding this bug!
>>> >
>>> > Your fix solves the problem.
>>> >
>>> > However, I think the main issue is that the constructor
>>> > AbstractStringBuilder(byte,int,int) is now broken: as you
>>> discovered,
>>> > it allows to create a string buffer with the coder LATIN1 when
>>> > COMPACT_STRINGS is false.
>>> >
>>> > Wouldn't it make sense to rename the argument of the constructor to,
>>> > say, coderHint, and then either use it as the coder if
>>> > COMPACT_STRINGS==true, or discard it otherwise.
>>> >
>>> > What do you think?
>>> >
>>> > With kind regards,
>>> > Ivan
>>> >
>>> > On 3/25/19 12:45 PM, Andrew Leonard wrote:
>>> >> Hi,
>>> >> Please can I request a sponsor for this fix to a JDK-13 regression?
>>> >>
>>> >> Patch with jtreg testcase here:
>>> >> __http://cr.openjdk.java.net/~aleonard/8221430/webrev.00/__
>>> <http://cr.openjdk.java.net/%7Ealeonard/8221430/webrev.00/_>_<http://cr.openjdk.java.net/%7Ealeonard/8221430/webrev.00/>_
>>> <http://cr.openjdk.java.net/%7Ealeonard/8221430/webrev.00/>
>>> >>
>>> >> bug: __https://bugs.openjdk.java.net/browse/JDK-8221430__
>>> >>
>>> >> Many thanks
>>> >> Andrew
>>> >>
>>> >> Andrew Leonard
>>> >> Java Runtimes Development
>>> >> IBM Hursley
>>> >> IBM United Kingdom Ltd
>>> >> Phone internal: 245913, external: 01962 815913
>>> >> internet email: _andrew_m_leonard at uk.ibm.com_
>>> _<mailto:andrew_m_leonard at uk.ibm.com>_
>>> <mailto:andrew_m_leonard at uk.ibm.com>
>>> >>
>>> >> Unless stated otherwise above:
>>> >> IBM United Kingdom Limited - Registered in England and Wales with
>>> number
>>> >> 741598.
>>> >> Registered office: PO Box 41, North Harbour, Portsmouth,
>>> Hampshire PO6
>>> > 3AU
>>>
>>>
>>>
>>>
>>>
>>> Unless stated otherwise above:
>>> IBM United Kingdom Limited - Registered in England and Wales with
>>> number 741598.
>>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire
>>> PO6 3AU
>>>
>>>
>>>
>>> Unless stated otherwise above:
>>> IBM United Kingdom Limited - Registered in England and Wales with
>>> number 741598.
>>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire
>>> PO6 3AU
>>>
>>>
>>>
>>>
>>>
>>>
>>> Unless stated otherwise above:
>>> IBM United Kingdom Limited - Registered in England and Wales with
>>> number 741598.
>>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire
>>> PO6 3AU
>>
>
--
With kind regards,
Ivan Gerasimov
More information about the core-libs-dev
mailing list