RFR JDK-8059510 Compact symbol table layout inside shared archive

Thu Oct 30 23:45:11 UTC 2014

Hi John,

Great comments! Please see in-line

On 10/29/14, 12:46 PM, John Rose wrote:
> I have a few points about the impact of this change on startup
> performance, and on trends in our code base:
>
> 1. We can live with small performance regressions on some benchmarks.
> Otherwise we'd never get anywhere.  So I am not saying that the
> current (very interesting and multi-facted) conversation must continue
> a long time before we can push any code.
>
> 2. Aleksey's challenge is valid, and I don't see a strong reply to it
> yet.  Work like this, that emphasizes compactness and sharability can
> usually be given the benefit of the doubt for startup.  But if we
> observe a problem we should make a more careful measurement.  If this
> change goes in with a measurable regression, we need a follow-up
> conversation (centered around a tracking bug) about quantifying the
> performance regression and fixing it.  (It's possible the conversation
> ends by saying "we decided we don't care and here's why", but I doubt
> it will end that way.)
>
> 3. The benchmark Aleksey chose, Nashorn startup, may well be an
> outlier.  Dynamic language runtimes create lots of tiny methods, and
> are likely to put heavy loads on symbol resolution, including
> resolution of symbols that are not known statically.  For these, the
> first-look (front-end) table being proposed might be a loss,
> especially if the Nashorn runtime is not in the first-look table.
>
> 4. Possible workarounds: #1 Track hit rates and start looking at the
> static table *second* if it fails to produce enough hits, compared to
> the dynamic table.  #2 When that happens, incrementally migrate
> (rehash) frequently found symbols in the static table into the main
> table.  #3 Re-evaluate the hashing algorithm.  (More on this below.)
>
I actually was thinking about the same thing to defeat (hmm, optimize 
for) Aleksey's benchmark. I think we can do something simpler -- first 
search the table where you found a match last time.
> 5. I strongly support moving towards position-independent, shareable,
> pointer-free data structures, and I want us to learn to do it well
> over time.  (Ioi, you are one of our main explorers here!)  To me, a
> performance regression is a suggestion that we have something more to
> learn.  And, it's not a surprise that we don't have it all figured out
> yet.
>
> 6. For work like this, we need to agree upon at least one or two
> startup performance tests to shake out bottlenecks early and give
> confidence.  People who work on startup performance should know how to
> run them and be able to quote them.
>
> 7. There's a vibrant literature about offline (statically created)
> hash tables, and lots of tricks floating around, such as perfect or
> semi-perfect hash functions, and multiple-choice hashing, and
> locality-aware structures.  I can think of several algorithmic tweaks
> I'd like to try on this code.  (If they haven't already been tried or
> discarded: I assume Ioi has already tried a number of things.)
> Moreover, this is not just doorknob-polishing, because (as said in
> point 5) we want to develop our competency with these sorts of data
> structures.
>
Do you have any suggestion for a good perfect hash function for this case?

> 8. I found the code hard to read, so it was hard to reason about the
> algorithm.  The core routine, "lookup", has no comments and only one
> assert.  It uses only primitive C types and (perhaps in the name of
> performance) is not factored into sub-functions.  The code which
> generates the static table is also hard to reason about in similar
> ways.  The bucket size parameter choice (a crucial compromise between
> performance and compactness) is 4 but the motivation and implications
> are left for the reader to puzzle out.  Likewise, the use of hardware
> division instead of power-of-two table size is presented without
> comment, despite the fact that we favor power-of-two in other parts of
> our stack, and a power-of-two solution would be reasonable here (given
> the bucket size).
Now that I am reading my code again, I am getting dizzy :-)

I think one problem is the initialization of all these variables:

template <class T, class N> T CompactHashtable<T, N>::lookup(const N* 
name, unsigned int hash, int len) {
   if (_entry_count > 0) {
     assert(!DumpSharedSpaces, "run-time only");
     int index = hash % _bucket_count;
     jushort* bucket_sizes = (jushort*)(_buckets + _bucket_count);
     juint bucket_start = _buckets[index];
     juint bucket_size = bucket_sizes[index];
     juint* bucket = _buckets + bucket_start;
     juint* bucket_end = bucket + bucket_size * 2;

bucket_sizes is actually a constant, so we can save that when the table 
is initialized.

We'll try to rewrite it with more clarity.
> 9. Perhaps I wouldn't care as much about the code style if the code
> were not performance sensitive or if it were a unique and isolated
> part of the JVM.  In this case, I expect this code (and other similar
> code) to improve, over time, in readability and robustness, as we
> learn to work with this new kind of data structure.  So even if we
> decide there is no significant regression here, and decide to push it
> as-is, we still need to use it as an example to help us get better at
> writing easy-to-read code which works with pointer-free data.
>
> 10. I would like to see (posted somewhere or attached to the bug) a
> sample list of the symbols in a typical symbol table.  Perhaps this
> already exists and I missed it.  I think it would be friendly to put
> some comments in the code that help the reader estimate numbers like
> table size, bucket length, number of queries, number of probes per
> query, symbol length statistics (mean, histogram).  Of course such
> comments go stale over time, but so does the algorithm, and being
> coy about the conditions of the moment doesn't help us in the long
> run.  Even a short comment is better than none, for example:
>
I think we can add most static statistics (is there such as word) in 
the  new "jcmd VM.symboltable" diagnostic command.

Dynamic statistics, such as access frqeuency, would be harder to do, 
especially it may slow things down even further (one extra global flag 
check).

Thanks

- Ioi
> http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee27509/src/share/vm/classfile/vmSymbols.cpp#l206
>
> It is a good time to have these conversations.
>
> Best wishes,
> — John
>
> On Oct 13, 2014, at 11:46 AM, Aleksey Shipilev <aleksey.shipilev at oracle.com> wrote:
>
>> Hi Jiangli,
>>
>> On 13.10.2014 18:26, Jiangli Zhou wrote:
>>> On 10/13/2014 03:18 AM, Aleksey Shipilev wrote:
>>>> On 13.10.2014 03:32, David Holmes wrote:
>>>>> On 11/10/2014 1:47 PM, Jiangli Zhou wrote:
>>>>> Also is the benchmarking being done on dedicated systems?
>>>> Also, specjvm98 is meaningless to estimate the classloading costs.
>>>> Please try specjvm2008:startup.* tests?
>>> The specjvm run was for Gerard's question about standard benchmarks.
>> SPECjvm2008 is a standard benchmark. In fact, it is a benchmark that
>> deprecates SPECjvm98.
>>
>>> These are not benchmarks specifically for classloading.
>> There are benchmarks that try to estimate the startup costs.
>> SPECjvm2008:startup.* tests are one of them.
>>
>>> However, I agree it's a good idea to run standard benchmarks to
>>> confirm there is no overall performance degradation. From all the
>>> benchmarks including classloading measurements, we have confirmed
>>> that this specific change does not have negative impact on
>>> classloading itself and the overall performance.
>> Excellent. What are those benchmarks? May we see those? Because I have a
>> counter-example in this thread that this change *does* negatively impact
>> classloading.
>>
>> -Aleksey.
>>