String concatenation tweaks

Alex Buckley alex.buckley at oracle.com
Fri Mar 13 22:18:52 UTC 2015


Thanks for analyzing how the lower layers of your '+' implementation can 
fail, and for prefiguring the failures so as to know that if you start 
concatenating the subexpression strings then you'll be able to finish. 
This mitigates my concern about excessively eager evaluation of 
subexpressions when concatenation might yet fail -- now it "can't fail", 
so from the outside it looks left-associative. Nice!

Alex

On 3/13/2015 2:40 PM, Louis Wasserman wrote:
> Got it.  I think the only cases we have to worry about, then, are buffer
> size overflows resulting in NegativeArraySizeException, or possibly an
> explicitly thrown OutOfMemoryError (which is StringBuilder's response
> when the buffer size tries to exceed Integer.MAX_VALUE).  I think we
> might conceivably deal with this by rewriting the bytecode to -- I think
> we can improve on this with jump hackery to avoid repetition, but
> essentially --
>
> int length = 3; // sum of the constant strings; we can check that this
> won't overflow at compile time but I think it couldn't no matter what
> because of method length constraints
> String mStr = m().toString();
> length += mStr.length();
> if (length < 0) {
>    throw new OutOfMemoryError();
> }
> String nStr = n().toString();
> length += nStr.length();
> if (length < 0) {
>    throw new OutOfMemoryError();
> }
>
> This continues to expand the bytecode, but probably manageably -- we
> don't actually need a local for length; if (int < 0) is easy in
> bytecode, and we can have only one OOME site that all the ifs jump to?
>
> On Fri, Mar 13, 2015 at 12:59 PM Alex Buckley <alex.buckley at oracle.com
> <mailto:alex.buckley at oracle.com>> wrote:
>
>     I do recognize that the proposed implementation doesn't reorder the
>     evaluation of subexpressions.
>
>     When discussing the proposed implementation of '+' -- whose key element
>     is calling append(String) with a pre-computed value -- I was careful to
>     set aside asynchronous OOMEs, but I see that even synchronous OOMEs are
>     sidetracking us. My concern is not heap pressure; my concern is
>     arbitrary unchecked exceptions arising from the <init>(int) and
>     append(String) calls.
>
>     For sake of argument, I'll simplify "unchecked exceptions" to just
>     RuntimeExceptions, not Errors. If you can guarantee that no
>     RuntimeExceptions are thrown synchronously during the execution of those
>     method bodies on the JVM, then '+' cannot fail and the timing of
>     subexpression evaluation is unobservable (ordering is still observable,
>     as required). I think this guarantee is just a matter of reviewing the
>     method bodies.
>
>     Alex
>
>     On 3/12/2015 6:01 PM, Louis Wasserman wrote:
>      > I confess I'm not sure how that "quality" goal would be achievable in
>      > bytecode without deliberately allocating arrays we then discard.
>      >
>      > For what it's worth, delaying or avoiding OOMEs seems a desirable
>     goal
>      > in general, and up to a constant multiplicative factor, this
>      > implementation seems to allocate the same amount in the same order.
>      > That is, we're still computing m().toString() before
>     n().toString(), and
>      > up to a constant multiplicative factor, m().toString() allocates the
>      > same number of bytes as the StringBuilder the status quo
>     generates.  So
>      > if m() does something like allocate a char[Integer.MAX_VALUE], we
>     still
>      > OOM at the appropriate time.
>      >
>      > Other notes: this implementation would tend to decrease maximum
>      > allocation, so it'd reduce OOMEs.  Also, since the StringBuilder will
>      > never need further expansion and we're only using the String and
>      > primitive overloads of append, the only way for append to OOME
>     would be
>      > in append(float) and append(double), which allocate a FloatingDecimal
>      > (which may, in turn, allocate a new thread-local char[26] if one
>     isn't
>      > already there).
>      >
>      > On Thu, Mar 12, 2015 at 4:28 PM Alex Buckley
>     <alex.buckley at oracle.com <mailto:alex.buckley at oracle.com>
>      > <mailto:alex.buckley at oracle.__com
>     <mailto:alex.buckley at oracle.com>>> wrote:
>      >
>      >     More abstract presentation. Given the expression:
>      >
>      >         "foo" + m() + n()
>      >
>      >     you must not evaluate n() if evaluation of "foo" + m() completes
>      >     abruptly. The proposed implementation evaluates n() regardless.
>      >
>      >     All is not lost. In the proposed implementation, the abrupt
>     completion
>      >     of "foo" + m() could occur because an append call fails or
>     (thanks to
>      >     Jon for pointing this out) the StringBuilder ctor fails. The
>      >     quality-of-implementation issue is thus: if the proposed
>     implementation
>      >     is of sufficiently high quality to guarantee that the ctor
>     and the first
>      >     append both succeed, then the evaluation of "foo" + m() will
>     always
>      >     complete normally, and it would be an unobservable (thus
>     acceptable)
>      >     implementation detail to evaluate n() early.
>      >
>      >     Alex
>      >
>      >     On 3/11/2015 10:26 PM, Jeremy Manson wrote:
>      >      > Isn't Louis's proposed behavior equivalent to saying "the
>     rightmost
>      >      > concatenation threw an OOME" instead of "some
>     concatenation in the
>      >      > middle threw an OOME"?
>      >      >
>      >      > It's true that the intermediate String concatenations haven't
>      >     occurred
>      >      > at that point, but in the JDK's current implementation,
>     that's true,
>      >      > too: the concatenations that have occurred at that point are
>      >      > StringBuilder ones, not String ones.  If any of the append
>     operations
>      >      > throws an OOME, no Strings have been created at all, either in
>      >     Louis's
>      >      > implementation or in the JDK's.
>      >      >
>      >      > Ultimately, isn't this a quality of implementation issue?
>     And if so,
>      >      > isn't it a quality of implementation issue that doesn't
>     provide any
>      >      > additional quality?  I can't imagine code whose semantics
>     relies on
>      >      > this, and if they do, they are relying on something
>      >      > implementation-dependent.
>      >      >
>      >      > Jeremy
>      >      >
>      >      > On Wed, Mar 11, 2015 at 6:13 PM, Alex Buckley
>      >     <alex.buckley at oracle.com <mailto:alex.buckley at oracle.com>
>     <mailto:alex.buckley at oracle.__com <mailto:alex.buckley at oracle.com>>
>      >      > <mailto:alex.buckley at oracle.
>     <mailto:alex.buckley at oracle.>____com
>      >     <mailto:alex.buckley at oracle.__com
>     <mailto:alex.buckley at oracle.com>>>> wrote:
>      >      >
>      >      >     On 3/11/2015 2:01 PM, Louis Wasserman wrote:
>      >      >
>      >      >         So for example, "foo" + myInt + myString + "bar" +
>     myObj
>      >     would be
>      >      >         compiled to the equivalent of
>      >      >
>      >      >         int myIntTmp = myInt;
>      >      >         String myStringTmp = String.valueOf(myString); //
>     defend
>      >     against
>      >      >         null
>      >      >         String myObjTmp =
>      >     String.valueOf(String.valueOf(______myObj)); // defend
>      >      >         against evil toString implementations returning null
>      >      >
>      >      >         return new StringBuilder(
>      >      >                17 // length of "foo" (3) + max length of
>     myInt (11) +
>      >      >         length of
>      >      >         "bar" (3)
>      >      >                + myStringTmp.length()
>      >      >                + myObjTmp.length())
>      >      >              .append("foo")
>      >      >              .append(myIntTmp)
>      >      >              .append(myStringTmp)
>      >      >              .append("bar")
>      >      >              .append(myObjTmp)
>      >      >              .toString();
>      >      >
>      >      >         As far as language constraints go, the JLS is
>     (apparently
>      >      >         deliberately)
>      >      >         vague about how string concatenation is
>     implemented.  "An
>      >      >         implementation
>      >      >         may choose to perform conversion and concatenation
>     in one
>      >     step
>      >      >         to avoid
>      >      >         creating and then discarding an intermediate String
>      >     object. To
>      >      >         increase
>      >      >         the performance of repeated string concatenation,
>     a Java
>      >      >         compiler may
>      >      >         use the StringBuffer class or a similar technique to
>      >     reduce the
>      >      >         number
>      >      >         of intermediate String objects that are created by
>      >     evaluation of an
>      >      >         expression."  We see no reason this approach would not
>      >     qualify as a
>      >      >         "similar technique."
>      >      >
>      >      >
>      >      >     The really key property of the string concatenation
>     operator is
>      >      >     left-associativity. Later subexpressions must not be
>      >     evaluated until
>      >      >     earlier subexpressions have been successfully
>     evaluated AND
>      >      >     concatenated. Consider this expression:
>      >      >
>      >      >        "foo" + m() + n()
>      >      >
>      >      >     which JLS8 15.8 specifies to mean:
>      >      >
>      >      >        ("foo" + m()) + n()
>      >      >
>      >      >     We know from JLS8 15.6 that if m() throws, then foo+m()
>      >     throws, and
>      >      >     n() will never be evaluated.
>      >      >
>      >      >     Happily, your translation doesn't appear to catch and
>     swallow
>      >      >     exceptions when eagerly evaluating each subexpression in
>      >     turn, so I
>      >      >     believe you won't evaluate n() if m() already threw.
>      >      >
>      >      >     Unhappily, a call to append(..) can in general fail with
>      >      >     OutOfMemoryError. (I'm not talking about asynchronous
>      >     exceptions in
>      >      >     general, but rather the sense that append(..)
>     manipulates the
>      >     heap
>      >      >     so an OOME is at least plausible.) In the OpenJDK
>      >     implementation, if
>      >      >     blah.append(m()) fails with OOME, then n() hasn't been
>      >     evaluated yet
>      >      >     -- that's "real" left-associativity. In the proposed
>      >     implementation,
>      >      >     it's possible that more memory is available when
>     evaluating
>      >     m() and
>      >      >     n() upfront than at the time of an append call, so n() is
>      >     evaluated
>      >      >     even if append(<<tmp result of m()>>) fails -- that's not
>      >      >     left-associative.
>      >      >
>      >      >     Perhaps you can set my mind at ease that append(..) can't
>      >     fail with
>      >      >     OOME?
>      >      >
>      >      >     Alex
>      >      >
>      >      >
>      >
>


More information about the compiler-dev mailing list