RFR: String Density/Compact String JEP 254
Xueming Shen
xueming.shen at oracle.com
Fri Oct 2 21:13:38 UTC 2015
Hi,
Please review the change for JEP 254/Compact String project.
JEP 254: http://openjdk.java.net/jeps/254
Issue: https://bugs.openjdk.java.net/browse/JDK-8054307
Webrevs: http://cr.openjdk.java.net/~sherman/8054307/jdk/
http://cr.openjdk.java.net/~thartmann/compact_strings/webrev/hotspot
Description:
The String Density project changes the internal representation of the
String class from a UTF-16 char array to a byte array plus an encoding
flag field. The new String class stores characters encoded either as
ISO-8859-1/Latin-1 (one byte per character), or as UTF-16 (two bytes
per character), based upon the contents of the string. The encoding
flag indicates which encoding is used. It offers reduced memory footprint
while maintaining throughput performance. See JEP 254 for additional
information.
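To make the encoding choice concrete, here is a minimal sketch of the
selection described above. It is not the code in the webrev: the class
name, the constant values and the big-endian byte order used for the
UTF-16 case are all assumptions made for illustration only.

```java
class CompactStringSketch {
    // Hypothetical flag values, chosen for this sketch only.
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    /** Returns LATIN1 if every char is below U+0100, otherwise UTF16. */
    static byte chooseCoder(char[] chars) {
        for (char c : chars) {
            if (c > 0xFF) {
                return UTF16;
            }
        }
        return LATIN1;
    }

    /** Packs the chars into a byte[] using one or two bytes per char. */
    static byte[] encode(char[] chars, byte coder) {
        if (coder == LATIN1) {
            byte[] out = new byte[chars.length];
            for (int i = 0; i < chars.length; i++) {
                out[i] = (byte) chars[i];          // keep the lower 8 bits
            }
            return out;
        }
        byte[] out = new byte[chars.length * 2];
        for (int i = 0; i < chars.length; i++) {
            // Assumption: big-endian layout, picked only for readability.
            out[2 * i] = (byte) (chars[i] >> 8);
            out[2 * i + 1] = (byte) chars[i];
        }
        return out;
    }

    public static void main(String[] args) {
        char[] latin = "hello".toCharArray();
        char[] mixed = "hello\u4e16".toCharArray();
        System.out.println(encode(latin, chooseCoder(latin)).length); // 5
        System.out.println(encode(mixed, chooseCoder(mixed)).length); // 12
    }
}
```

In this sketch a single non-Latin-1 character moves the whole string to
two bytes per character, which is why the footprint saving depends on
the contents of the string.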
Implementation repo/try out:
http://hg.openjdk.java.net/jdk9/sandbox/ branch: JDK-8054307-branch
$ hg clone http://hg.openjdk.java.net/jdk9/sandbox/
$ cd sandbox
$ sh ./get_source.sh
$ sh ./common/bin/hgforest.sh up -r JDK-8054307-branch
$ make configure
$ make images
Implementation Notes:
- To change the internal representation of the String and the String
builder classes (AbstractStringBuilder, StringBuilder and StringBuffer)
from a UTF-16 char array to a byte array plus an encoding flag field.
If all characters of the String object are Unicode Latin-1 characters
(UTF-16 value < \u0100), the new representation stores them in a
single-byte format, using the lower 8 bits of each character's 16-bit
UTF-16 value, and sets the encoding flag to LATIN1. If any character
in the String object is NOT a Unicode Latin-1 character, it stores the
characters in a 2-byte format with their UTF-16 values and sets the
flag to UTF16.
- To change the method implementation of the String class and its builders
to function on the new internal character storage, mainly by delegating to
two implementation classes, StringUTF16 and StringLatin1.
- To update the StringCoding class to decode/encode the String between
String.byte[]/coder(LATIN1/UTF16) <-> byte[] (native encoding) instead
of the original String.char[] <-> byte[] (native encoding).
- To update the HotSpot compiler (new and updated intrinsics), GC (String
Deduplication mods) and Runtime to work with the new internal "byte[] +
coder flag" representation.
See Tobias's note for details of the HotSpot changes:
http://cr.openjdk.java.net/~thartmann/compact_strings/hotspot-impl-note
- To add a VM option "CompactStrings" (default is true) to provide a
switch-off mechanism. When it is switched off, the String characters
are always stored in UTF16 encoding (always 2 bytes, but still in a
byte[], instead of the original char[]).
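The delegation mentioned in the notes above can be pictured roughly as
follows. This is only a sketch under stated assumptions: Latin1Ops and
Utf16Ops are hypothetical stand-ins for StringLatin1 and StringUTF16,
the flag values are illustrative, and the UTF-16 byte order is assumed
to be big-endian purely to keep the example readable.

```java
class CoderDispatchSketch {
    // Hypothetical flag values, chosen for this sketch only.
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    /** Hypothetical stand-in for the one-byte-per-char implementation. */
    static final class Latin1Ops {
        static int length(byte[] value) {
            return value.length;
        }
        static char charAt(byte[] value, int index) {
            return (char) (value[index] & 0xFF);   // mask to avoid sign extension
        }
    }

    /** Hypothetical stand-in for the two-bytes-per-char implementation. */
    static final class Utf16Ops {
        static int length(byte[] value) {
            return value.length >> 1;
        }
        static char charAt(byte[] value, int index) {
            int hi = value[2 * index] & 0xFF;      // assumed big-endian layout
            int lo = value[2 * index + 1] & 0xFF;
            return (char) ((hi << 8) | lo);
        }
    }

    // Each operation checks the coder flag and delegates.
    static int length(byte[] value, byte coder) {
        return coder == LATIN1 ? Latin1Ops.length(value) : Utf16Ops.length(value);
    }

    static char charAt(byte[] value, byte coder, int index) {
        return coder == LATIN1 ? Latin1Ops.charAt(value, index)
                               : Utf16Ops.charAt(value, index);
    }

    public static void main(String[] args) {
        byte[] latin = {0x68, (byte) 0xE9};        // 'h', U+00E9
        byte[] wide = {0x00, 0x68, 0x4E, 0x16};    // 'h', U+4E16
        System.out.println(length(latin, LATIN1));                       // 2
        System.out.println(Integer.toHexString(charAt(latin, LATIN1, 1))); // e9
        System.out.println(length(wide, UTF16));                         // 2
        System.out.println(Integer.toHexString(charAt(wide, UTF16, 1)));  // 4e16
    }
}
```

With the option switched off, only the UTF16 branch would ever be taken
in a layout like this one, since every String is then stored with two
bytes per character.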
Supporting performance artifacts:
- Report(s) on memory footprint impact
http://cr.openjdk.java.net/~shade/density/string-density-report.pdf
Latest SPECjbb2005 footprint reduction and throughput numbers for both
Intel (Linux) and SPARC, which show that the Compact String binaries
use less memory and have higher throughput.
latest: http://cr.openjdk.java.net/~sherman/8054307/specjbb2005
old: http://cr.openjdk.java.net/~huntch/string-density/reports/String-Density-SPARC-jbb2005-Report.pdf
- Throughput performance impact via String API micro-benchmarks
http://cr.openjdk.java.net/~thartmann/compact_strings/microbenchmarks/Haswell_090915.pdf
http://cr.openjdk.java.net/~thartmann/compact_strings/microbenchmarks/IvyBridge_090915.pdf
http://cr.openjdk.java.net/~thartmann/compact_strings/microbenchmarks/Sparc_090915.pdf
http://cr.openjdk.java.net/~sherman/8054307/string-coding.txt
Thanks,
Sherman