Gap Buffer based AbstractStringBuilder implementation

Osvaldo Doederlein opinali at gmail.com
Mon Nov 23 13:25:06 UTC 2009


Hi Jesús,

Sorry for the noise; I simply forgot to consider the case of a
StringBuilder shared between threads (I'm well aware of the JMM etc). A
partial "fix" that shares the buffer only in StringBuffer seems useless,
because we're many years past JDK 1.5 and most code uses StringBuilder now;
the tradeoff of synchronization versus copying is pretty bad, as you
confirmed again.

I agree that getting some help from HotSpot seems to be the only solution.
HotSpot is certainly able to detect the easiest case of confinement through
escape analysis (EA), and for StringBuilder this should detect virtually all
uses. HotSpot would also need to compile toString() as an intrinsic,
inlining alternative non-copying code when it detects that the buffer object
is confined.

I would like to see a more general solution; we may have many other cases of
potentially great optimizations that are not done just because they're not
thread-safe, but could be done for thread-confined objects. A raw
suggestion:

public String toString () {
  return Unsafe.isConfined(this) ? optimizedToString() : standardToString();
}

In this case HotSpot only has to provide one extra helper, isConfined(),
that returns true iff the argument can be proven by the compiler to be
confined to a single thread. This doesn't look difficult because it's not a
new operation, just an accessor to information that the optimizer already
has. The result is a compile-time constant (per compilation site) so the
code generated for toString() has no extra calls or branches, it's just a
straight call (or inlining) of either optimizedToString() or
standardToString(). (For the interpreter, C1, or C2 with EA disabled,
isConfined() would just always return false.) Now the big advantage of this
approach is that the library team doesn't need to poke the compiler guys to
add extra intrinsic compilation for every method that may benefit from
optimizations which are only safe for confined objects.
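
To make the shape of this concrete, here is a minimal sketch of the dispatch
in plain Java. Everything in it is an assumption for illustration: isConfined()
is a hypothetical stub (no such method exists in Unsafe or anywhere in the
JDK), Buf is a toy class rather than AbstractStringBuilder, and the
"optimized" path still copies because sharing the array would need a
package-private String constructor. The stub returns false, which is the
conservative answer suggested above for the interpreter.

```java
import java.util.Arrays;

public class ConfinedDispatchSketch {

    // Hypothetical stand-in for the proposed Unsafe.isConfined(); the JDK has
    // no such method. Plain Java cannot see the JIT's escape analysis, so this
    // stub always gives the conservative answer.
    static boolean isConfined(Object o) {
        return false;
    }

    // Toy builder, for illustration only (not AbstractStringBuilder).
    static final class Buf {
        private char[] value = new char[16];
        private int count;
        String lastPath = "none"; // records which branch toString() took

        Buf append(String s) {
            int needed = count + s.length();
            if (needed > value.length) {
                value = Arrays.copyOf(value, Math.max(needed, value.length * 2));
            }
            s.getChars(0, s.length(), value, count);
            count = needed;
            return this;
        }

        @Override
        public String toString() {
            return isConfined(this) ? optimizedToString() : standardToString();
        }

        // Placeholder for the non-copying path. Sharing value with the String
        // would need a package-private String constructor, so this still copies.
        String optimizedToString() {
            lastPath = "optimized";
            return new String(value, 0, count);
        }

        // The always-safe path: copy the used part of the buffer.
        String standardToString() {
            lastPath = "standard";
            return new String(value, 0, count);
        }
    }

    public static void main(String[] args) {
        Buf b = new Buf().append("gap").append(" buffer");
        String s = b.toString();
        System.out.println(s + " via " + b.lastPath);
    }
}
```

With a real compiler-provided isConfined(), the ternary in toString() would
fold to one of the two calls at each compilation site; in this sketch it
always takes the standard path.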

A+
Osvaldo

2009/11/22 Jesús Viñuales <serverperformance at gmail.com>

> Osvaldo Doederlein wrote:
> >
> > Em 22/11/2009 05:55, Thomas Hawtin escreveu:
> >>
> >> There is a security issue there. When multiple threads are involved,
> >> it is possible (though not necessarily easy) to create a mutable String
> >> if the backing char[] is shared.
> >>
> >> Tom Hawtin
> >
> > That's true. But there's apparently a simple solution
> >
> >     public String toStringShared() {
> >         // createShared() is a package-protected helper/ctor
> >         String ret = String.createShared(value, 0, count);
> >         // Reset value, so evil user can't abuse the buffer to
> >         // change the String.
> >         value = EMPTY;
> >         count = 0;
> >         return ret;
> >     }
> >     private static final char[] EMPTY = new char[0];
> >
> > This solution should be safe, without need of escape/alias analysis,
> > because StringBuilder and StringBuffer don't have any methods that
> > return a new mutable object that shares the same char[]. The only API
> > that aliases the buffer is subSequence(), but this returns a
> > CharSequence, which is a read-only object.
> >
> > A+
> > Osvaldo
>
> I don't agree. That solution isn't safe because the methods involved aren't
> synchronized (in StringBuilder), nor do you have any guarantee within the
> Java memory model about the visibility to other threads of your changes to
> the value and count variables ... unless they are volatile. And if you have
> to set more than one variable (value and count) atomically, the volatile
> approach doesn't help you. It may also cause the String to appear to mutate
> if one thread calls toString() while another is between the read of shared
> and the insert/append/delete operation (or, even worse, executing the
> operation itself).
>
> I'm pretty sure that the only solution is a copy-on-write approach based on
> a volatile boolean flag, and not a never-copy one as Andrew said (and I
> remember that the GNU Classpath implementation even addressed the "unused
> space consumption problem" by checking in the toString method how much
> unused space the buffer had and, if the underlying char[] was too big,
> making a copy instead of sharing it).
>
> Anyway, read carefully the evaluation section of
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6219959. Reintroducing
> the copy-on-write approach was tested by Sun in 2006, and "It was
> discovered that the reintroduction of the sharing code caused a
> reproducible regression on the order of 4% in SPECjbb2005 scores",
> presumably because of its impact on the GC or something similar. If you
> see the prototype description, it is perfect: using a volatile flag,
> testing whether to share or to copy the char[] in the toString method, etc.
>
> I tried different approaches last year, and even posted one of them in this
> forum (as you can see in archives) but with no luck.
>
> My guess is that this kind of COW optimization is a job for HotSpot via
> escape analysis... or, at the end of the chain, for the MMU of the CPU.
>
> Regards,
> Jesus
>
>
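
For reference, the copy-on-write scheme discussed in the quoted message can
be sketched roughly as below. This is only an illustration under stated
assumptions: CowBuilderSketch, SharedText, toShared() and ensureWritable()
are hypothetical names, SharedText stands in for a String that shares the
builder's char[] (the JDK offers no public way to create one), and the code
is not the Sun prototype or the GNU Classpath implementation. It shows the
single-threaded mechanism only; the volatile flag does not close the race
described above.

```java
import java.util.Arrays;

public class CowBuilderSketch {

    // Hypothetical read-only view standing in for a String that shares the
    // builder's char[].
    static final class SharedText implements CharSequence {
        private final char[] chars;
        private final int length;

        SharedText(char[] chars, int length) {
            this.chars = chars;
            this.length = length;
        }

        @Override public int length() { return length; }

        @Override public char charAt(int index) {
            if (index < 0 || index >= length) throw new IndexOutOfBoundsException();
            return chars[index];
        }

        @Override public CharSequence subSequence(int start, int end) {
            if (start < 0 || end > length || start > end) throw new IndexOutOfBoundsException();
            return new String(chars, start, end - start);
        }

        @Override public String toString() { return new String(chars, 0, length); }
    }

    private char[] value = new char[16];
    private int count;
    private volatile boolean shared; // true while a SharedText aliases value

    // Copies the array before a write if it is shared or too small.
    // NOT thread-safe: another thread can call toShared() between the read of
    // shared here and the write that follows, which is the window described
    // in the quoted message.
    private void ensureWritable(int needed) {
        if (shared || needed > value.length) {
            int newLength = needed > value.length
                    ? Math.max(needed, value.length * 2)
                    : value.length;
            value = Arrays.copyOf(value, newLength);
            shared = false;
        }
    }

    public CowBuilderSketch append(String s) {
        ensureWritable(count + s.length());
        s.getChars(0, s.length(), value, count);
        count += s.length();
        return this;
    }

    public void setCharAt(int index, char c) {
        if (index < 0 || index >= count) throw new IndexOutOfBoundsException();
        ensureWritable(count);
        value[index] = c;
    }

    // Hands out the current array without copying and marks it as shared.
    public SharedText toShared() {
        shared = true;
        return new SharedText(value, count);
    }
}
```

The copy is deferred from toShared() to the next mutation, so a builder that
is converted once and then discarded never copies at all.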

More information about the core-libs-dev mailing list