JEP 254: Compact Strings
Aleksey Shipilev
aleksey.shipilev at oracle.com
Mon Jun 1 13:31:07 UTC 2015
On 06/01/2015 03:54 PM, Vitaly Davidovich wrote:
> While it's true that the denser format will require fewer cachelines, my
> experience is that most strings are smaller than a single cacheline
> worth of storage, maybe two lines in some cases; there's just a ton of
> them in the heap. So the heap footprint should be substantially
> reduced, but I'm not sure the cache pollution will be significantly reduced.
This calculation assumes object allocations are granular to cache
lines. They are not: if a String takes less space within a cache line,
*more* object data can be squeezed in alongside it. In other words, with
compact Strings, the entire dataset takes fewer cache lines, thus
improving performance.
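
To put rough numbers on that, here is a minimal sketch of the arithmetic.
The 16-byte array header, 8-byte object alignment, 64-byte cache line, and
the class and method names are all assumptions for illustration, not
figures from the JEP. It also counts only the backing arrays and assumes
they are packed back to back, which a real heap does not guarantee.

```java
public class CompactStringFootprint {
    // Assumed values for illustration only; real JVMs and CPUs may differ.
    static final int ARRAY_HEADER_BYTES = 16;
    static final int OBJECT_ALIGNMENT = 8;
    static final int CACHE_LINE_BYTES = 64;

    // Size of one backing array, rounded up to the object alignment.
    public static long arrayBytes(int length, int bytesPerChar) {
        long raw = ARRAY_HEADER_BYTES + (long) length * bytesPerChar;
        return (raw + OBJECT_ALIGNMENT - 1) / OBJECT_ALIGNMENT * OBJECT_ALIGNMENT;
    }

    // Cache lines spanned by `count` such arrays laid out contiguously.
    public static long cacheLinesFor(long count, int length, int bytesPerChar) {
        long total = count * arrayBytes(length, bytesPerChar);
        return (total + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES;
    }

    public static void main(String[] args) {
        int length = 20;    // characters per string, well under one cache line
        long count = 1000;  // number of strings in the dataset

        // Two bytes per character (char[]) versus one (compact byte[]).
        System.out.println("char[] lines: " + cacheLinesFor(count, length, 2));
        System.out.println("byte[] lines: " + cacheLinesFor(count, length, 1));
    }
}
```

Under these assumptions each array fits inside a single cache line in
either format, yet the dataset as a whole still spans fewer lines in the
compact form, because neighbouring objects share the space that was freed.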
> There's currently no vectorization of char[] scanning (or any
> vectorization other than memcpy for that matter) - are you referring to
> the recent Intel contributions here or there's a plan to further improve
> vectorization in time for this JEP? Just curious.
Many String methods are heavily intrinsified, and those intrinsics are
vectorized. String::equals, String::compareTo, and some
encoding/decoding come to mind.
I really, really invite you to read the collateral materials from the
JEP, where we explored quite a few performance characteristics already.
Thanks,
-Aleksey.