RFR: 8176188: jdk/internal/misc/JavaLangAccess/NewUnsafeString.java failing since 9-b93

Martin Buchholz martinrb at google.com
Mon Dec 4 23:46:18 UTC 2017


Thanks, Claes.

I think we're in agreement!

I did that shared String optimization for StringJoiner.  It's a lot harder
to justify in the new String world because we have to handle non-ASCII, and
the non-ASCII case is actually fairly likely.

Does it make sense to keep COMPACT_STRINGS as an option in openjdk?  If
essentially every user runs with that flag turned on, bit rot is likely to
set in, and even if not, we have to remain ever vigilant to not break
anyone turning it off.

The code in Long.fastUUID is indeed ugly.  I've never heard of UUID
creation being a bottleneck.  At Google it sometimes seems all our java
performance problems are with zip file manipulation.

I might have avoided the code duplication in Long.fastUUID by:
- build the all-ASCII byte[] unconditionally
- if COMPACT_STRINGS, then use the secret LATIN1 constructor, else use the
public constructor(byte[], ISO_8859_1)

One idea for a public performant API is to add something like
Charset.toString(byte[]) or Charset.toString(byte[], int start, int length)
via default method on Charset.  Then; override that in ISO_8859_1 to do
only a single copy (and can we cheat somehow to do no copies?)

On Mon, Dec 4, 2017 at 1:47 PM, Claes Redestad <claes.redestad at oracle.com>
wrote:

> Hi Martin,
>
> On 2017-12-04 22:06, Martin Buchholz wrote:
>
>> I'm rather sad about what happened to our non-copying String
>> constructions for trusted code.  This is a performance regression with the
>> change in String representation that should have blocked that change IMO.
>> I think we should have a plan for moving in the opposite direction.  I
>> don't think we can implement something as ambitious as Rust's ownership
>> tracking, so have to restrict ourselves to trusted code.  The use case that
>> keeps coming up is constructing zip entry names, which are much more likely
>> to be pure ASCII than their file contents.
>>
>> I don't have a good design for how one could do that, and who the trusted
>> set of callers is (at least java.base, not java.lang), but I think we
>> should set a direction.
>>
>
> as I alluded to in a footnote there exists a non-copying String(byte[]
> value, byte coder) constructor - the problem is that it's somewhat
> cumbersome to use:
>
> - first off, the caller needs to be aware about the value of
> String.COMPACT_STRINGS: if false, all strings needs to be UTF-16 encoded
> and the coder byte always set to String.UTF16
> - secondly, the caller needs to know if the byte[] you're constructing
> needs to be LATIN-1 or UTF-16 encoded up front and act accordingly
>
> Some of the more performance sensitive uses outside of java.lang was
> addressed by the Compact Strings update, for example the implementation
> backing java.util.UUID was somewhat surprisingly moved into
> java.lang.Long::fastUUID[1]. Something similar is doable for the java.sql
> types, but further complicated by those classes being in a different
> module, and ultimately questionable since their implementations in JDK 9
> are quite a bit more performant than in any previous release (thus not
> technically a regression).
>
> That leaves StringJoiner as the one case that stands out. And the fact
> that existing uses of String(byte[], byte) are a bit of an eye-sore[1!!1!].
>
> One idea I'm tinkering with here is to have a trusted, package-private
> SharedStringBuilder added to java.lang and exposed via SharedSecrets. It'd
> more or less mimic StringBuilder (including deal with inflating the byte[]
> when necessary, encapsulate the awkward String.COMPACT_STRINGS checks etc)
> but enable calling String(byte[], byte) in the toString() call.  To be
> effective it'll only have a single constructor taking the capacity, and
> should probably throw IOOBE rather than resize the internal buffer. Some
> cases like Long::fastUUID could probably be much simplified by using such a
> builder instead (for a very minimal overhead). Does that sound reasonable?
> At any rate I think of this as a possible follow-up RFE, and not an
> alternative to the cleanup/"bugfix" at hand.
>
> Thanks!
>
> /Claes
>
> [1] http://hg.openjdk.java.net/jdk/jdk/file/532cdc178e42/src/jav
> a.base/share/classes/java/lang/Long.java#l427
>
>


More information about the core-libs-dev mailing list