RFR JDK-8059510 Compact symbol table layout inside shared archive

John Rose john.r.rose at oracle.com
Wed Oct 29 19:46:38 UTC 2014


I have a few points about the impact of this change on startup performance, and on trends in our code base:

1. We can live with small performance regressions on some benchmarks.  Otherwise we'd never get anywhere.  So I am not saying that the current (very interesting and multi-faceted) conversation must continue for a long time before we can push any code.

2. Aleksey's challenge is valid, and I don't see a strong reply to it yet.  Work like this, which emphasizes compactness and shareability, can usually be given the benefit of the doubt on startup.  But if we observe a problem we should make a more careful measurement.  If this change goes in with a measurable regression, we need a follow-up conversation (centered around a tracking bug) about quantifying the performance regression and fixing it.  (It's possible the conversation ends by saying "we decided we don't care and here's why", but I doubt it will end that way.)

3. The benchmark Aleksey chose, Nashorn startup, may well be an outlier.  Dynamic language runtimes create lots of tiny methods, and are likely to put heavy loads on symbol resolution, including resolution of symbols that are not known statically.  For these, the first-look (front-end) table being proposed might be a loss, especially if the Nashorn runtime is not in the first-look table.

4. Possible workarounds:  #1 Track hit rates, and start consulting the static table *second* if it produces too few hits relative to the dynamic table.  #2 When that happens, incrementally migrate (rehash) frequently found symbols from the static table into the main table.  #3 Re-evaluate the hashing algorithm.  (More on this below.)
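To make workaround #1 concrete, here is a minimal sketch (not HotSpot code; all names and thresholds are illustrative) of a two-level lookup that counts probes against the static table and demotes it to second place once its observed hit rate falls below one half:

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical sketch of hit-rate-driven lookup ordering between a
// read-only static (shared-archive) table and a process-local dynamic
// table. Standard containers stand in for the real compact structures.
class TwoLevelSymbolLookup {
  std::unordered_set<std::string> _static_table;   // read-only, from archive
  std::unordered_set<std::string> _dynamic_table;  // process-local
  long _static_probes = 0;
  long _static_hits = 0;

 public:
  explicit TwoLevelSymbolLookup(std::unordered_set<std::string> static_syms)
      : _static_table(std::move(static_syms)) {}

  bool static_first() const {
    // Give the static table the benefit of the doubt until we have enough
    // samples; afterwards require a 50% hit rate for it to stay first.
    return _static_probes < 1000 || _static_hits * 2 >= _static_probes;
  }

  // Returns true if the symbol was already interned in either table;
  // otherwise interns it dynamically and returns false.
  bool lookup_or_insert(const std::string& sym) {
    if (static_first()) {
      _static_probes++;
      if (_static_table.count(sym)) { _static_hits++; return true; }
      if (_dynamic_table.count(sym)) return true;
    } else {
      if (_dynamic_table.count(sym)) return true;
      _static_probes++;
      if (_static_table.count(sym)) { _static_hits++; return true; }
    }
    _dynamic_table.insert(sym);
    return false;
  }
};
```

A workload that mostly misses the static table (as a dynamic-language runtime might) would flip the probe order after the warm-up window, paying the static-table cost only once per miss streak.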

5. I strongly support moving towards position-independent, shareable, pointer-free data structures, and I want us to learn to do it well over time.  (Ioi, you are one of our main explorers here!)  To me, a performance regression is a suggestion that we have something more to learn.  And, it's not a surprise that we don't have it all figured out yet.

6. For work like this, we need to agree upon at least one or two startup performance tests to shake out bottlenecks early and give confidence.  People who work on startup performance should know how to run them and be able to quote them.

7. There's a vibrant literature about offline (statically created) hash tables, and lots of tricks floating around, such as perfect or semi-perfect hash functions, and multiple-choice hashing, and locality-aware structures.  I can think of several algorithmic tweaks I'd like to try on this code.  (If they haven't already been tried or discarded: I assume Ioi has already tried a number of things.)  Moreover, this is not just doorknob-polishing, because (as said in point 5) we want to develop our competency with these sorts of data structures.
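As one example of the tricks mentioned above, here is an illustrative sketch (again, not HotSpot code; the hash functions are placeholders) of multiple-choice hashing for an offline table: at build time each symbol is placed in the less loaded of two candidate buckets, which is known to keep the maximum bucket length small, and at lookup time we probe at most two buckets:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <initializer_list>
#include <string>
#include <vector>

// Two-choice bucketed hash table built offline. A real static table would
// serialize the buckets into a pointer-free, position-independent layout;
// std::vector is used here only to keep the sketch short.
struct TwoChoiceTable {
  std::vector<std::vector<std::string>> buckets;

  explicit TwoChoiceTable(size_t nbuckets) : buckets(nbuckets) {}

  size_t h1(const std::string& s) const {
    return std::hash<std::string>{}(s) % buckets.size();
  }
  size_t h2(const std::string& s) const {
    // A second, independent-ish hash; any decent mixing step works here.
    return (std::hash<std::string>{}(s) * 0x9E3779B97F4A7C15ULL >> 17)
           % buckets.size();
  }

  // Offline placement: put the symbol in the currently shorter bucket.
  void build_insert(const std::string& s) {
    size_t a = h1(s), b = h2(s);
    buckets[buckets[a].size() <= buckets[b].size() ? a : b].push_back(s);
  }

  // Lookup probes at most the two candidate buckets.
  bool contains(const std::string& s) const {
    for (size_t idx : {h1(s), h2(s)})
      for (const std::string& k : buckets[idx])
        if (k == s) return true;
    return false;
  }
};
```

Because placement happens offline, one could go further and record which choice each bucket took, or fall back to a perfect-hash construction for the static symbol set, trading build time for probe count.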

8. I found the code hard to read, so it was hard to reason about the algorithm.  The core routine, "lookup", has no comments and only one assert.  It uses only primitive C types and (perhaps in the name of performance) is not factored into sub-functions.  The code which generates the static table is hard to reason about in similar ways.  The bucket size parameter choice (a crucial compromise between performance and compactness) is 4, but the motivation and implications are left for the reader to puzzle out.  Likewise, the use of hardware division instead of a power-of-two table size is presented without comment, despite the fact that we favor power-of-two in other parts of our stack, and a power-of-two solution would be reasonable here (given the bucket size).
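For the record, the power-of-two alternative I have in mind looks something like the following sketch (illustrative only): the bucket index becomes a bitwise AND instead of a hardware division, with a cheap finalizer (of the murmur3 family) compensating for the high bits that the mask discards:

```cpp
#include <cassert>
#include <cstdint>

// Cheap avalanche finalizer so the low-order index bits depend on all
// input bits; constants are from the murmur3 32-bit finalizer.
static inline uint32_t mix(uint32_t h) {
  h ^= h >> 16;  h *= 0x85ebca6bu;
  h ^= h >> 13;  h *= 0xc2b2ae35u;
  h ^= h >> 16;
  return h;
}

// table_size must be a power of two: index by mask, no division issued.
static inline uint32_t bucket_index_mask(uint32_t hash, uint32_t table_size) {
  return mix(hash) & (table_size - 1);
}

// The division-based indexing it would replace, for comparison.
static inline uint32_t bucket_index_div(uint32_t hash, uint32_t table_size) {
  return hash % table_size;
}
```

For a power-of-two size the masked index is exactly `mix(hash) % table_size`, so the only behavioral difference from division is the mixing step, and the integer-divide latency disappears from the probe path.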

9. Perhaps I wouldn't care as much about the code style if the code were not performance sensitive, or if it were a unique and isolated part of the JVM.  In this case, I expect this code (and other similar code) to improve, over time, in readability and robustness, as we learn to work with this new kind of data structure.  So even if we decide there is no significant regression here and push it as-is, we still need to use it as an example to help us get better at writing easy-to-read code which works with pointer-free data.

10. I would like to see (posted somewhere or attached to the bug) a sample list of the symbols in a typical symbol table.  Perhaps this already exists and I missed it.  I think it would be friendly to put some comments in the code that help the reader estimate numbers like table size, bucket length, number of queries, number of probes per query, symbol length statistics (mean, histogram).  Of course such comments go stale over time, but so does the algorithm, and being coy about the conditions of the moment doesn't help us in the long run.  Even a short comment is better than none, for example:
  http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee27509/src/share/vm/classfile/vmSymbols.cpp#l206

It is a good time to have these conversations.

Best wishes,
— John

On Oct 13, 2014, at 11:46 AM, Aleksey Shipilev <aleksey.shipilev at oracle.com> wrote:

> Hi Jiangli,
> 
> On 13.10.2014 18:26, Jiangli Zhou wrote:
>> On 10/13/2014 03:18 AM, Aleksey Shipilev wrote:
>>> On 13.10.2014 03:32, David Holmes wrote:
>>>> On 11/10/2014 1:47 PM, Jiangli Zhou wrote:
>>>> Also is the benchmarking being done on dedicated systems?
>>> Also, specjvm98 is meaningless to estimate the classloading costs.
>>> Please try specjvm2008:startup.* tests?
>> 
>> The specjvm run was for Gerard's question about standard benchmarks.
> 
> SPECjvm2008 is a standard benchmark. In fact, it is a benchmark that
> deprecates SPECjvm98.
> 
>> These are not benchmarks specifically for classloading. 
> 
> There are benchmarks that try to estimate the startup costs.
> SPECjvm2008:startup.* tests are one of them.
> 
>> However, I agree it's a good idea to run standard benchmarks to
>> confirm there is no overall performance degradation. From all the
>> benchmarks including classloading measurements, we have confirmed
>> that this specific change does not have negative impact on
>> classloading itself and the overall performance.
> 
> Excellent. What are those benchmarks? May we see those? Because I have a
> counter-example in this thread that this change *does* negatively impact
> classloading.
> 
> -Aleksey.
> 



More information about the hotspot-runtime-dev mailing list