More memory-efficient internal representation for Strings: call for more data

Tue Dec 2 23:24:12 UTC 2014

String construction is a big performance issue for JDBC drivers. Most 
queries return some number of Strings. The overwhelming majority of 
those Strings will be short lived. The cost of constructing these 
Strings from network bytes is a large fraction of total execution 
time. Any increase in the cost of constructing a String will far 
outweigh any reduction in memory use, at least for query results.

All of the proposed compression methods require an additional scan of 
the entire string. That's exactly the wrong direction. Something like 
the following pseudo-code is common inside a driver.

   {
     char[] c = new char[n];
     // fill the buffer from the decoded network data
     for (int i = 0; i < n; i++) c[i] = charSource.next();
     return new String(c);  // the constructor copies c
   }

The array copy inside the String constructor is a significant 
fraction of JDBC driver execution time. Adding another scan on top 
of it makes things worse, regardless of the transient benefit of 
more compact storage. In the case of a query result the String will 
likely never be promoted out of new space, so the benefit of 
compression would be minimal.
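
To make the two costs concrete, here is a minimal Java sketch. The 
CharSource interface, readString and fitsInLatin1 are hypothetical 
names invented for illustration; they are not taken from any driver 
or from the JEP. The only library behaviour relied on is that 
new String(char[]) copies its argument. The Latin-1 check is just my 
guess at the kind of extra pass a compressing representation might 
need, not a description of any proposed implementation.

```java
public class StringConstructionSketch {

    // Hypothetical stand-in for the driver's decoded network data.
    interface CharSource {
        char next();
    }

    // The pattern from the pseudo-code: fill a char[] and hand it to String.
    // new String(char[]) copies the array, so the String never aliases buf.
    static String readString(CharSource source, int n) {
        char[] buf = new char[n];
        for (int i = 0; i < n; i++) {
            buf[i] = source.next();
        }
        return new String(buf);
    }

    // Assumed shape of the extra pass: check whether every char fits in
    // one byte (Latin-1, i.e. <= 0xFF). This touches every char once more.
    static boolean fitsInLatin1(char[] chars) {
        for (char ch : chars) {
            if (ch > 0xFF) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        char[] buf = {'a', 'b', 'c'};
        String s = new String(buf);
        buf[0] = 'z';               // mutate the source array afterwards
        System.out.println(s);      // prints "abc": the constructor copied

        System.out.println(fitsInLatin1(buf));                          // true
        System.out.println(fitsInLatin1(new char[] {'a', '\u20AC'}));   // false
    }
}
```

For an all-Latin-1 result, which is the common case according to the 
report, a scan like this cannot exit early, so it reads every 
character in addition to the copy that is already being paid for.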

I don't dispute that Strings occupy a significant fraction of the 
heap or that a lot of those bytes are zero. And I certainly agree 
that reducing memory footprint is valuable, but any worsening of 
String construction time will likely be a problem.

Douglas

At 02:13 PM 12/2/2014, core-libs-dev-request at openjdk.java.net wrote:
>Date: Wed, 03 Dec 2014 00:59:10 +0300
>From: Aleksey Shipilev <aleksey.shipilev at oracle.com>
>To: Java Core Libs <core-libs-dev at openjdk.java.net>
>Cc: charlie hunt <charlie.hunt at oracle.com>
>Subject: More memory-efficient internal representation for Strings:
>         call for more data
>Message-ID: <547E362E.5010107 at oracle.com>
>Content-Type: text/plain; charset=utf-8
>
>Hi,
>
>As you may already know, we are looking into more memory efficient
>representation for Strings:
>  https://bugs.openjdk.java.net/browse/JDK-8054307
>
>As part of preliminary performance work for this JEP, we have to collect
>the empirical data on usual characteristics of Strings and char[]-s
>normal applications have, as well as figure out the early estimates for
>the improvements based on that data. What we have so far is written up
>here:
>  http://cr.openjdk.java.net/~shade/density/string-density-report.pdf
>
>We would appreciate if people who are interested in this JEP can provide
>the additional data on their applications. It is double-interesting to
>have the data for the applications that process String data outside
>Latin1 plane. Our current data says these cases are rather rare. Please
>read the current report draft, and try to process your own heap dumps
>using the instructions in the Appendix.
>
>Thanks,
>-Aleksey.