RFR: 8176188: jdk/internal/misc/JavaLangAccess/NewUnsafeString.java failing since 9-b93

Claes Redestad claes.redestad at oracle.com
Mon Dec 4 21:47:20 UTC 2017


Hi Martin,

On 2017-12-04 22:06, Martin Buchholz wrote:
> I'm rather sad about what happened to our non-copying String 
> constructions for trusted code.  This is a performance regression with 
> the change in String representation that should have blocked that 
> change IMO.  I think we should have a plan for moving in the opposite 
> direction.  I don't think we can implement something as ambitious as 
> Rust's ownership tracking, so have to restrict ourselves to trusted 
> code.  The use case that keeps coming up is constructing zip entry 
> names, which are much more likely to be pure ASCII than their file 
> contents.
>
> I don't have a good design for how one could do that, and who the 
> trusted set of callers is (at least java.base, not java.lang), but I 
> think we should set a direction.

As I alluded to in a footnote, there exists a non-copying String(byte[] 
value, byte coder) constructor. The problem is that it's somewhat 
cumbersome to use:

- first off, the caller needs to be aware of the value of 
String.COMPACT_STRINGS: if false, all strings need to be UTF-16 encoded 
and the coder byte always set to String.UTF16
- secondly, the caller needs to know up front whether the byte[] being 
constructed needs to be LATIN-1 or UTF-16 encoded, and act accordingly
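
To make the two points above concrete, here is a rough, standalone sketch of the decision a caller has to make before it can hand a byte[] to that constructor. Everything in it is an illustrative assumption: CoderSketch, chooseCoder and encode are hypothetical names, the constants only mirror the package-private String.LATIN1 and String.UTF16 values, and the big-endian UTF-16 layout is chosen for readability rather than taken from the JDK.

```java
// Hypothetical illustration, not JDK code: models the checks a caller of the
// package-private String(byte[], byte) constructor has to perform itself.
final class CoderSketch {
    static final byte LATIN1 = 0; // mirrors String.LATIN1
    static final byte UTF16 = 1;  // mirrors String.UTF16

    // Point 1: with compact strings disabled, everything must be UTF-16.
    // Point 2: otherwise the caller must know whether every char fits in one byte.
    static byte chooseCoder(char[] chars, boolean compactStrings) {
        if (!compactStrings) {
            return UTF16;
        }
        for (char c : chars) {
            if (c > 0xFF) {
                return UTF16;
            }
        }
        return LATIN1;
    }

    // Encodes the chars to match the chosen coder. The UTF-16 branch writes
    // big-endian pairs purely for illustration (an assumption of this sketch).
    static byte[] encode(char[] chars, byte coder) {
        if (coder == LATIN1) {
            byte[] out = new byte[chars.length];
            for (int i = 0; i < chars.length; i++) {
                out[i] = (byte) chars[i];
            }
            return out;
        }
        byte[] out = new byte[chars.length * 2];
        for (int i = 0; i < chars.length; i++) {
            out[2 * i] = (byte) (chars[i] >> 8);
            out[2 * i + 1] = (byte) chars[i];
        }
        return out;
    }
}
```

A pure-ASCII zip entry name would take the one-byte-per-char path when compact strings are enabled, and the two-byte path otherwise, which is exactly the bookkeeping each call site currently has to repeat.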

Some of the more performance-sensitive uses outside of java.lang were 
addressed by the Compact Strings update; for example, the implementation 
backing java.util.UUID was somewhat surprisingly moved into 
java.lang.Long::fastUUID[1]. Something similar is doable for the 
java.sql types, but further complicated by those classes being in a 
different module, and ultimately questionable since their 
implementations in JDK 9 are quite a bit more performant than in any 
previous release (thus not technically a regression).

That leaves StringJoiner as the one case that stands out. And the fact 
that existing uses of String(byte[], byte) are a bit of an eye-sore[1!!1!].

One idea I'm tinkering with here is to have a trusted, package-private 
SharedStringBuilder added to java.lang and exposed via SharedSecrets. 
It'd more or less mimic StringBuilder (including dealing with inflating 
the byte[] when necessary, encapsulating the awkward 
String.COMPACT_STRINGS checks etc) but enable calling String(byte[], byte) in the toString() 
call.  To be effective it'll only have a single constructor taking the 
capacity, and should probably throw IOOBE rather than resize the 
internal buffer. Some cases like Long::fastUUID could probably be much 
simplified by using such a builder instead (for a very minimal 
overhead). Does that sound reasonable? At any rate I think of this as a 
possible follow-up RFE, and not an alternative to the cleanup/"bugfix" 
at hand.
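
A very rough sketch of what such a builder could look like, to make the idea easier to discuss. This is not a proposed patch: SharedStringBuilderSketch and all of its members are hypothetical, COMPACT_STRINGS here is a local stand-in for String.COMPACT_STRINGS, the UTF-16 bytes are stored big-endian for simplicity, and toString() goes through the public, copying String constructors because the non-copying String(byte[], byte) is only reachable from inside java.lang.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a fixed-capacity, inflating builder. Not JDK code.
final class SharedStringBuilderSketch {
    private static final byte LATIN1 = 0; // mirrors String.LATIN1
    private static final byte UTF16 = 1;  // mirrors String.UTF16
    // Stand-in for String.COMPACT_STRINGS (assumed enabled in this sketch).
    private static final boolean COMPACT_STRINGS = true;

    private final int capacity; // maximum number of chars, never grown
    private byte[] value;
    private byte coder;
    private int count;          // number of chars appended so far

    // Single constructor taking the capacity, as described above.
    SharedStringBuilderSketch(int capacity) {
        this.capacity = capacity;
        this.coder = COMPACT_STRINGS ? LATIN1 : UTF16;
        this.value = new byte[coder == LATIN1 ? capacity : capacity * 2];
    }

    SharedStringBuilderSketch append(char c) {
        if (count == capacity) {
            // Throw rather than resize the internal buffer.
            throw new IndexOutOfBoundsException("capacity " + capacity + " exceeded");
        }
        if (coder == LATIN1 && c > 0xFF) {
            inflate();
        }
        if (coder == LATIN1) {
            value[count] = (byte) c;
        } else {
            value[2 * count] = (byte) (c >> 8);
            value[2 * count + 1] = (byte) c;
        }
        count++;
        return this;
    }

    SharedStringBuilderSketch append(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            append(s.charAt(i));
        }
        return this;
    }

    // Widen the buffer from one byte per char to two bytes per char.
    private void inflate() {
        byte[] wide = new byte[capacity * 2];
        for (int i = 0; i < count; i++) {
            wide[2 * i + 1] = value[i]; // high byte stays 0
        }
        value = wide;
        coder = UTF16;
    }

    @Override
    public String toString() {
        // A trusted version inside java.lang would hand over the byte[] via
        // String(byte[], byte) without copying; this sketch has to copy.
        return coder == LATIN1
                ? new String(value, 0, count, StandardCharsets.ISO_8859_1)
                : new String(value, 0, count * 2, StandardCharsets.UTF_16BE);
    }
}
```

Usage would look like new SharedStringBuilderSketch(36).append(...).toString(), with the caller sizing the buffer exactly. A real non-copying version would additionally need the buffer to be exactly full (or otherwise never shared again) when toString() is called, which this sketch does not attempt to enforce.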

Thanks!

/Claes

[1] 
http://hg.openjdk.java.net/jdk/jdk/file/532cdc178e42/src/java.base/share/classes/java/lang/Long.java#l427


