More memory-efficient internal representation for Strings: call for more data

Douglas Surber douglas.surber at oracle.com
Wed Dec 3 00:42:46 UTC 2014


The most common operation on most Strings in query results is to do 
nothing. Just construct the String, hold onto it while the rest of 
the transaction completes, then drop it on the floor. Probably the 
next most common is to encode the chars to write them to an 
OutputStream or send them back to the database. I'd be curious how a 
compact representation would help those operations.

SPECjEnterprise is a widely used standard benchmark. It probably uses 
mostly (or even entirely) ASCII characters so it's not representative 
of many customers.
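The extra scan that compression requires can be sketched as follows. This is a hypothetical holder class, not the design proposed for the JDK. Its constructor first scans every char to see whether it fits in Latin-1 (0x00 to 0xFF). It then either narrows the chars into a byte[] or falls back to copying the char[]. On ASCII-only benchmark data, every string takes the compressed path. On data with characters above 0xFF, the scan is pure overhead on top of the copy.

```java
import java.util.Arrays;

// Hypothetical sketch (not the JDK implementation): store one byte per char
// when every char fits in ISO-8859-1, otherwise keep a UTF-16 copy.
final class CompactSketch {
    final byte[] latin1; // non-null when compressed
    final char[] utf16;  // non-null when not compressed

    CompactSketch(char[] src) {
        // Extra pass over the whole input, before any copying happens.
        if (fitsInLatin1(src)) {
            byte[] b = new byte[src.length];
            for (int i = 0; i < src.length; i++) {
                b[i] = (byte) src[i]; // safe: every char is <= 0xFF
            }
            latin1 = b;
            utf16 = null;
        } else {
            // Incompressible: the scan was wasted work and we still copy.
            latin1 = null;
            utf16 = Arrays.copyOf(src, src.length);
        }
    }

    static boolean fitsInLatin1(char[] cs) {
        for (char c : cs) {
            if (c > 0xFF) return false;
        }
        return true;
    }

    boolean isCompressed() {
        return latin1 != null;
    }
}
```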

My definition of "sane limits" might be different than yours. As far 
as I'm concerned String construction is already too slow and should 
be made faster by eliminating the char[] copy when possible.
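The char[] copy exists because a String must be immutable, yet the caller still holds a reference to the array it passed in. The following minimal demonstration uses only the public String(char[]) constructor. The class name is illustrative.

```java
// Why String(char[]) copies: the caller can still write to the array.
final class DefensiveCopyDemo {
    static String build() {
        char[] c = {'a', 'b', 'c'};
        String s = new String(c); // the constructor copies c
        c[0] = 'z';               // later writes to c do not affect s
        return s;
    }
}
```

Eliminating the copy would therefore require a way for the caller to hand ownership of the array over to String. The public constructors do not provide one.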

Douglas

At 03:47 PM 12/2/2014, Aleksey Shipilev wrote:
>Hi Douglas,
>
>On 12/03/2014 02:24 AM, Douglas Surber wrote:
> > String construction is a big performance issue for JDBC drivers. Most
> > queries return some number of Strings. The overwhelming majority of
> > those Strings will be short-lived. The cost of constructing these
> > Strings from network bytes is a large fraction of total execution time.
> > Any increase in the cost of constructing a String will far outweigh any
> > reduction in memory use, at least for query results.
>
>You will also have to take into account that shorter (compressed)
>Strings allow for more efficient operations on them. Not to mention
>that GC costs are usually "hidden" from naive performance estimates:
>even if you perceive the mutator as spending more time doing work, that
>might be offset by an easier job for the GC.
>
> > All of the proposed compression methods require an additional scan of
> > the entire string. That's exactly the wrong direction. Something like
> > the following pseudo-code is common inside a driver.
> >
> >   {
> >     char[] c = new char[n];
> >     for (int i = 0; i < n; i++) c[i] = charSource.next();
> >     return new String(c);
> >   }
>
>Good to know. We will be assessing the String(char[]) construction
>performance in the course of this performance work. What would you say
>is a characteristic high-level benchmark for the scenario you are
>describing?
>
> > The array copy inside the String constructor is a significant fraction
> > of JDBC driver execution time. Adding an additional scan on top of it
> > makes things worse regardless of the transient benefit of more compact
> > storage. In the case of a query result, the String will likely never
> > be promoted out of new space, so the benefit of compression would be
> > minimal.
>
>It's hard to say at this point. We want to understand what footprint
>improvements we are talking about. I agree that if the cost-benefit
>analysis says performance degrades beyond sane limits, even if we are
>happy with the memory savings, there is little reason to push this
>into the general JDK.
>
>Thanks,
>-Aleksey
>
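One rough way to estimate the footprint improvement under discussion is to compare backing-array sizes for char[] and byte[]. The sketch below assumes a 64-bit HotSpot-style layout with a 16-byte array header and 8-byte object alignment, and it ignores the String object itself. Actual numbers depend on the JVM and its flags, and both constants are assumptions.

```java
// Hypothetical estimate of backing-array size; header and alignment are assumptions.
final class FootprintSketch {
    static final int ARRAY_HEADER = 16; // assumed array header size in bytes
    static final int ALIGNMENT = 8;     // assumed object alignment in bytes

    // Size of an array with `length` elements of `elementSize` bytes each,
    // rounded up to the alignment boundary.
    static long arrayBytes(long length, int elementSize) {
        long raw = ARRAY_HEADER + length * elementSize;
        return (raw + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Bytes saved by storing n Latin-1 chars in a byte[] instead of a char[].
    static long savedBytes(long n) {
        return arrayBytes(n, 2) - arrayBytes(n, 1);
    }
}
```

Under these assumptions, a 10-char string saves 8 bytes (40 vs. 32) and a 100-char string saves 96 bytes (216 vs. 120). Whether those savings matter for Strings that never leave new space is the question the thread is weighing.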




More information about the core-libs-dev mailing list