String concatenation tweaks
lowasser at google.com
Fri Mar 13 01:01:01 UTC 2015
I confess I'm not sure how that "quality" goal would be achievable in
bytecode without deliberately allocating arrays we then discard.
For what it's worth, delaying or avoiding OOMEs seems a desirable goal in
general, and up to a constant multiplicative factor, this implementation
seems to allocate the same amount in the same order. That is, we're still
computing m().toString() before n().toString(), and up to a constant
multiplicative factor, m().toString() allocates the same number of bytes as
the StringBuilder the status quo generates. So if m() does something like
allocate a char[Integer.MAX_VALUE], we still OOM at the appropriate time.
Other notes: this implementation would tend to decrease maximum allocation,
so it'd reduce OOMEs. Also, since the StringBuilder will never need
further expansion and we're only using the String and primitive overloads
of append, the only way for append to OOME would be in append(float) and
append(double), which allocate a FloatingDecimal (which may, in turn,
allocate a new thread-local char if one isn't already there).
On Thu, Mar 12, 2015 at 4:28 PM Alex Buckley <alex.buckley at oracle.com>
> More abstract presentation. Given the expression:
> "foo" + m() + n()
> you must not evaluate n() if evaluation of "foo" + m() completes
> abruptly. The proposed implementation evaluates n() regardless.
> All is not lost. In the proposed implementation, the abrupt completion
> of "foo" + m() could occur because an append call fails or (thanks to
> Jon for pointing this out) the StringBuilder ctor fails. The
> quality-of-implementation issue is thus: if the proposed implementation
> is of sufficiently high quality to guarantee that the ctor and the first
> append both succeed, then the evaluation of "foo" + m() will always
> complete normally, and it would be an unobservable (thus acceptable)
> implementation detail to evaluate n() early.
> On 3/11/2015 10:26 PM, Jeremy Manson wrote:
> > Isn't Louis's proposed behavior equivalent to saying "the rightmost
> > concatenation threw an OOME" instead of "some concatenation in the
> > middle threw an OOME"?
> > It's true that the intermediate String concatenations haven't occurred
> > at that point, but in the JDK's current implementation, that's true,
> > too: the concatenations that have occurred at that point are
> > StringBuilder ones, not String ones. If any of the append operations
> > throws an OOME, no Strings have been created at all, either in Louis's
> > implementation or in the JDK's.
> > Ultimately, isn't this a quality of implementation issue? And if so,
> > isn't it a quality of implementation issue that doesn't provide any
> > additional quality? I can't imagine code whose semantics relies on
> > this, and if they do, they are relying on something
> > implementation-dependent.
> > Jeremy
> > On Wed, Mar 11, 2015 at 6:13 PM, Alex Buckley <alex.buckley at oracle.com
> > <mailto:alex.buckley at oracle.com>> wrote:
> > On 3/11/2015 2:01 PM, Louis Wasserman wrote:
> > So for example, "foo" + myInt + myString + "bar" + myObj would be
> > compiled to the equivalent of
> > int myIntTmp = myInt;
> > String myStringTmp = String.valueOf(myString); // defend against
> > null
> > String myObjTmp = String.valueOf(String.valueOf(__myObj)); //
> > against evil toString implementations returning null
> > return new StringBuilder(
> > 17 // length of "foo" (3) + max length of myInt (11) +
> > length of
> > "bar" (3)
> > + myStringTmp.length()
> > + myObjTmp.length())
> > .append("foo")
> > .append(myIntTmp)
> > .append(myStringTmp)
> > .append("bar")
> > .append(myObjTmp)
> > .toString();
> > As far as language constraints go, the JLS is (apparently
> > deliberately)
> > vague about how string concatenation is implemented. "An
> > implementation
> > may choose to perform conversion and concatenation in one step
> > to avoid
> > creating and then discarding an intermediate String object. To
> > increase
> > the performance of repeated string concatenation, a Java
> > compiler may
> > use the StringBuffer class or a similar technique to reduce the
> > number
> > of intermediate String objects that are created by evaluation of
> > expression." We see no reason this approach would not qualify
> as a
> > "similar technique."
> > The really key property of the string concatenation operator is
> > left-associativity. Later subexpressions must not be evaluated until
> > earlier subexpressions have been successfully evaluated AND
> > concatenated. Consider this expression:
> > "foo" + m() + n()
> > which JLS8 15.8 specifies to mean:
> > ("foo" + m()) + n()
> > We know from JLS8 15.6 that if m() throws, then foo+m() throws, and
> > n() will never be evaluated.
> > Happily, your translation doesn't appear to catch and swallow
> > exceptions when eagerly evaluating each subexpression in turn, so I
> > believe you won't evaluate n() if m() already threw.
> > Unhappily, a call to append(..) can in general fail with
> > OutOfMemoryError. (I'm not talking about asynchronous exceptions in
> > general, but rather the sense that append(..) manipulates the heap
> > so an OOME is at least plausible.) In the OpenJDK implementation, if
> > blah.append(m()) fails with OOME, then n() hasn't been evaluated yet
> > -- that's "real" left-associativity. In the proposed implementation,
> > it's possible that more memory is available when evaluating m() and
> > n() upfront than at the time of an append call, so n() is evaluated
> > even if append(<<tmp result of m()>>) fails -- that's not
> > left-associative.
> > Perhaps you can set my mind at ease that append(..) can't fail with
> > OOME?
> > Alex
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the compiler-dev