JEP 254: Compact Strings

Vitaly Davidovich vitalyd at gmail.com
Mon Jun 1 12:54:54 UTC 2015


Hi Aleksey,

While it's true that the denser format will require fewer cachelines, my
experience is that most strings are smaller than a single cacheline worth
of storage, maybe two lines in some cases; there's just a ton of them in
the heap.  So the heap footprint should be substantially reduced, but I'm
not sure the cache pollution will be significantly reduced.
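
To put rough numbers on the cacheline point, here is a back-of-the-envelope sketch. The 64-byte line size, the 16-byte array header, and the line-aligned array start are my assumptions for illustration, not measurements, and the class and method names are made up.

```java
public class CacheLineEstimate {
    // Assumptions for illustration only: 64-byte cache lines, a 16-byte
    // array header, and a backing array that starts on a line boundary.
    static final int CACHE_LINE = 64;
    static final int ARRAY_HEADER = 16;

    // Number of cache lines covered by a backing array holding `length`
    // characters at `bytesPerChar` bytes each (2 for char[], 1 for byte[]).
    static int linesTouched(int length, int bytesPerChar) {
        int bytes = ARRAY_HEADER + length * bytesPerChar;
        return (bytes + CACHE_LINE - 1) / CACHE_LINE; // round up
    }

    public static void main(String[] args) {
        for (int len : new int[] {8, 24, 40, 100}) {
            System.out.println(len + " chars: char[] = " + linesTouched(len, 2)
                    + " line(s), byte[] = " + linesTouched(len, 1) + " line(s)");
        }
    }
}
```

Under those assumptions, anything up to 24 chars covers a single line in either encoding, so the denser format only starts saving lines for longer strings.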

There's currently no vectorization of char[] scanning (or any vectorization
other than memcpy, for that matter).  Are you referring to the recent Intel
contributions here, or is there a plan to further improve vectorization in
time for this JEP? Just curious.

I agree that string fusion is separate from this change, and we've
discussed this before.  It just seems to me like that's a bigger perf
problem today, since even tiny/small strings (very common, IME) incur the
indirection and bloat overhead, so I would have liked to see that addressed
first.  If you're saying that's fully on Valhalla's plate, ok, but I
haven't seen anything proposed there yet.
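
As a rough illustration of the bloat for small strings, the sketch below compares the footprint of a String plus its separate backing array against a fused layout. All sizes are assumptions: a 64-bit VM with compressed references (12-byte object header, 16-byte array header, 8-byte alignment), and a String holding just a 4-byte array reference and a 4-byte hash. The fused layout is purely hypothetical (header, hash, length, then the characters inline); nothing like it has been proposed in this thread.

```java
public class StringFootprint {
    // Assumed sizes, for illustration only.
    static final int OBJECT_HEADER = 12;
    static final int ARRAY_HEADER = 16;
    static final int ALIGN = 8;

    // Round an object size up to the assumed allocation alignment.
    static int align(int bytes) {
        return (bytes + ALIGN - 1) / ALIGN * ALIGN;
    }

    // String object (header + array reference + hash) plus separate array.
    static int separate(int length, int bytesPerChar) {
        int stringObject = align(OBJECT_HEADER + 4 + 4);
        int backingArray = align(ARRAY_HEADER + length * bytesPerChar);
        return stringObject + backingArray;
    }

    // Hypothetical fused object: header + hash + length + inline characters.
    static int fused(int length, int bytesPerChar) {
        return align(OBJECT_HEADER + 4 + 4 + length * bytesPerChar);
    }

    public static void main(String[] args) {
        int len = 5;
        System.out.println("separate, 2 bytes/char: " + separate(len, 2));
        System.out.println("separate, 1 byte/char:  " + separate(len, 1));
        System.out.println("fused,    2 bytes/char: " + fused(len, 2));
        System.out.println("fused,    1 byte/char:  " + fused(len, 1));
    }
}
```

With these assumed sizes, a 5-character string goes from 56 to 48 bytes with compaction alone, but to 32 bytes with the hypothetical fused layout in either encoding, which is why fusion looks like the bigger win for tiny strings.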

Thanks

On Jun 1, 2015 4:50 AM, "Aleksey Shipilev" <aleksey.shipilev at oracle.com>
wrote:

> On 05/18/2015 05:35 PM, Vitaly Davidovich wrote:
> > This part is a bit unclear for the proposed changes.  While it's true
> > that single byte encoding will be denser than two byte, most string ops
> > end up walking the backing store linearly; prefetch (either implicit h/w
> > or software-assisted) could hide the memory access latency.
>
> It will still pollute the caches though, and generally incur more
> instructions to be executed (e.g. think about the vectorized scan of the
> char[] array -- the compressed version will take 2x less instructions).
>
>
> > Personally, what I'd like to see is fusing storage of String with its
> > backing data, irrespective of encoding (i.e. removing the indirection to
> > fetch the char[] or byte[]).
>
> This is not the target for this JEP, and the groundwork for
> String-char[] fusion is handled elsewhere (I put my hopes at Valhalla
> that will explore the exact path to add the "exotic" object shapes into
> the runtime).
>
> String-char[] fusion neither conflicts with the Compact String
> optimization, nor provides the alternative. Removing the "excess"
> headers from backing char[] array would solve the "static" overhead in
> Strings, while the String compaction would further compact the backing
> storage.
>
> Thanks,
> -Aleksey.
>
>
>



More information about the core-libs-dev mailing list