RFR JDK-8059510 Compact symbol table layout inside shared archive
Jiangli Zhou
jiangli.zhou at oracle.com
Wed Dec 3 21:26:14 UTC 2014
On 12/03/2014 01:20 PM, Jiangli Zhou wrote:
> Hi Ioi,
>
> Thanks for the comments.
>
> On 12/03/2014 12:06 PM, Ioi Lam wrote:
>> Hi Jiangli,
>>
>> The changes look good. Just a few nits:
>>
>> (1) src/share/vm/services/diagnosticCommand.cpp:
>>
>> This seems to be not needed
>> 27 #include "classfile/compactHashtable.hpp"
>
> The #include of the header file is needed. The VM.symboltable and
> VM.stringtable definitions are moved to compactHashtable.hpp to be
> more close to their usage.
>
>>
>> (2) test/runtime/SharedArchiveFile/DumpSymbolAndStringTable.java:
>> 43 output.shouldContain("DumpSymbolAndStringTable");
>> 48 output.shouldContain("java.lang.String");Maybe these two
>> should include the full line, and the terminating \n?
>
> Sounds good. Added.
>
>> Also, why is this necessary for VM.stringtable but not done for
>> VM.symboltable?
>>
>> 49 } catch (RuntimeException e) {
>> 50 output.shouldContain("Unknown diagnostic command");
>
> Fixed for VM.symboltable. Thanks for catching that.
>
>>
>> (3) src/share/vm/classfile/compactHashtable.cpp:
>>
>> 50 stats->hashentry_bytes = _num_entries * 8; // estimates only
>>
>> Is this always an conservative estimate (more than actually needed)?
>
> Yes. Some entries will have only the 4-byte offset, but we don't know
> how many entries at this point. So we use a conservative estimate.
>
>> This should be explained in the comments.
>
> Will do.
>
>> Also, if the actual number is different, stats->hashentry_bytes
>> should be adjusted so that the statistics can be printed correctly.
>
> Ok.
>
>>
>> Indentation: 4->2
>> 134 assert(bucket_type == COMPACT_BUCKET_TYPE, "Bad bucket
>> type");
>
> Fixed.
>
>>
>> Is the a reason to use guarantee here and not assert (I realize that
>> I wrote these two lines, and I don't remember why :-)?
>> 142 guarantee(deltax < max_delta, "range check");
>> 147 guarantee(count == _bucket_sizes[index], "sanity");
>
> I'm guessing you wanted to have the check in non-debug build also.
>
>>
>> I think these two blocks can be rewritten to avoid the use of the #ifdef
>> 162 #ifdef _LP64
>> 163 *p++ = juint(base_address >> 32);
>> 164 #else
>> 165 *p++ = 0;
>> 166 #endif
>> 167 *p++ = juint(base_address & 0xffffffff); // base address
>>
>> 205 juint upper = *p++;
>> 206 juint lower = *p++;
>> 207 #ifdef _LP64
>> 208 _base_address = (uintx(upper) << 32 ) + uintx(lower);
>> 209 #else
>> 210 _base_address = uintx(lower);
>> 211 #endif
>>
>> ->
>>
>> 163 *p++ = juint(base_address >> 32);
>> 167 *p++ = juint(base_address & 0xffffffff);
>>
>> 205 juint upper = *p++;
>> 206 juint lower = *p++;
>> 208 _base_address = (uintx(upper) << 32 ) + uintx(lower);
>
> Sounds good.
>
>>
>> Also, do you have statistics of the percentage of COMPACT_BUCKET_TYPE
>> vs REGULAR_BUCKET_TYPE?
>
> Yes. :) With the default classlist, the compact bucket is about 6%.
Here is better formatted table:
empty buckets 1 entry buckets 2 entries buckets 3 entries buckets 4
entries buckets 5 entries buckets 6 entries buckets 7 entries
buckets 8 entries buckets >8 entries
25 71
172
247 229 158 108 87 29 26
Thanks,
Jiangli
>
>
> empty buckets 1 entry buckets
> 2 entries buckets
> 3 entries buckets
> 4 entries buckets
> 5 entries buckets
> 6 entries buckets
> 7 entries buckets
> 8 entries buckets
> >8 entries
> Bucket size 4
> 25
> 71
> 172
> 247
> 229
> 158
> 108
> 87
> 29
> 26
>
>
> Compressed
>
>
>
>
>
>
>
>
>
>
> Thanks,
> Jiangli
>
>>
>> Thanks
>> - Ioi
>>
>>
>> On 12/2/14, 1:44 PM, Jiangli Zhou wrote:
>>> Hi,
>>>
>>> I finally got back to this, sorry for the delay. Please see the
>>> following new webre.
>>>
>>> http://cr.openjdk.java.net/~jiangli/8059510/webrev.06/
>>>
>>> New in the webrev:
>>>
>>> 1. Further compression of the compact table
>>>
>>> * Remove the bucket_size table. With the sequential layout of the
>>> buckets, lookup process can seek to the start of the next bucket
>>> without the need of the current bucket size. For the last bucket,
>>> it can seek to the end of the table. The table end offset is
>>> added to the archived data.
>>> * Bucket with exactly one entry is marked as 'compact' bucket,
>>> whose entry only contains the symbol offset. The symbol hash is
>>> eliminated for 'compact' buckets. Lookup compares the symbol
>>> directly in that case.
>>>
>>> 2. The shared symbol table is not always looked up first. The last
>>> table that fetches symbol successfully is used for lookup.
>>>
>>> 3. Added a lot more comments in compactHashtable.hpp with details of
>>> the compact table layout and dump/lookup process.
>>>
>>> I measured using the classloading benchmark that Aleksey pointed to
>>> me. This benchmark loads classes using user defined classloader.
>>> There is a very small degradation shown in the benchmark comparing
>>> 'before' and 'after' with archive dumped with the default
>>> configuration. When symbols from the test is added to the shared
>>> table, there is an observable speedup in the benchmark. The speedup
>>> is also very small.
>>>
>>> Thanks,
>>> Jiangli
>>>
>>>
>>> On 10/29/2014 02:55 PM, Jiangli Zhou wrote:
>>>> Hi John,
>>>>
>>>> Thank you for the thoughts on this! Yes, it's a good time to have
>>>> these conversations. Please see some quick responses from me below,
>>>> with more details to follow.
>>>>
>>>> On 10/29/2014 12:46 PM, John Rose wrote:
>>>>> I have a few points about the impact of this change on startup
>>>>> performance, and on trends in our code base:
>>>>>
>>>>> 1. We can live with small performance regressions on some
>>>>> benchmarks. Otherwise we'd never get anywhere. So I am not
>>>>> saying that the current (very interesting and multi-facted)
>>>>> conversation must continue a long time before we can push any code.
>>>>>
>>>>> 2. Aleksey's challenge is valid, and I don't see a strong reply to
>>>>> it yet. Work like this, that emphasizes compactness and
>>>>> sharability can usually be given the benefit of the doubt for
>>>>> startup. But if we observe a problem we should make a more
>>>>> careful measurement. If this change goes in with a measurable
>>>>> regression, we need a follow-up conversation (centered around a
>>>>> tracking bug) about quantifying the performance regression and
>>>>> fixing it. (It's possible the conversation ends by saying "we
>>>>> decided we don't care and here's why", but I doubt it will end
>>>>> that way.)
>>>>
>>>> Besides my classloading benchmark results posted in earlier
>>>> message, we have asked performance team to help us measure startup,
>>>> classloding, and memory saving regarding this change, and verified
>>>> there was no regression in startup/classloading. The results were
>>>> not posted in the thread however. They were added to the bug report
>>>> for JDK-8059510 <https://bugs.openjdk.java.net/browse/JDK-8059510>.
>>>>
>>>>> 3. The benchmark Aleksey chose, Nashorn startup, may well be an
>>>>> outlier. Dynamic language runtimes create lots of tiny methods,
>>>>> and are likely to put heavy loads on symbol resolution, including
>>>>> resolution of symbols that are not known statically. For these,
>>>>> the first-look (front-end) table being proposed might be a loss,
>>>>> especially if the Nashorn runtime is not in the first-look table.
>>>>>
>>>>> 4. Possible workarounds: #1 Track hit rates and start looking at
>>>>> the static table *second* if it fails to produce enough hits,
>>>>> compared to the dynamic table. #2 When that happens,
>>>>> incrementally migrate (rehash) frequently found symbols in the
>>>>> static table into the main table. #3 Re-evaluate the hashing
>>>>> algorithm. (More on this below.)
>>>>
>>>> That's interesting thinking. I'll follow up on this.
>>>>
>>>>> 5. I strongly support moving towards position-independent,
>>>>> shareable, pointer-free data structures, and I want us to learn to
>>>>> do it well over time. (Ioi, you are one of our main explorers
>>>>> here!) To me, a performance regression is a suggestion that we
>>>>> have something more to learn. And, it's not a surprise that we
>>>>> don't have it all figured out yet.
>>>>
>>>> Point taken! Position-independent is one of our goal for the
>>>> archived data. We'll be start looking into removing direct pointers
>>>> in the shared class data.
>>>>
>>>>> 6. For work like this, we need to agree upon at least one or two
>>>>> startup performance tests to shake out bottlenecks early and give
>>>>> confidence. People who work on startup performance should know how
>>>>> to run them and be able to quote them.
>>>>
>>>> Thank you for bring this up. Totally agree. For me, personally I've
>>>> played with different benchmarks and applications with different
>>>> focus over time. It would be good to have some commonly agreed
>>>> startup performance tests for this. A standalone classloading
>>>> benchmark, HelloWorld and Nashorn startup probably are good
>>>> choices. We also have an internal application for startup and
>>>> memory saving measurement.
>>>>
>>>>> 7. There's a vibrant literature about offline (statically created)
>>>>> hash tables, and lots of tricks floating around, such as perfect
>>>>> or semi-perfect hash functions, and multiple-choice hashing, and
>>>>> locality-aware structures. I can think of several algorithmic
>>>>> tweaks I'd like to try on this code. (If they haven't already
>>>>> been tried or discarded: I assume Ioi has already tried a number
>>>>> of things.) Moreover, this is not just doorknob-polishing,
>>>>> because (as said in point 5) we want to develop our competency
>>>>> with these sorts of data structures.
>>>>>
>>>>> 8. I found the code hard to read, so it was hard to reason about
>>>>> the algorithm. The core routine, "lookup", has no comments and
>>>>> only one assert. It uses only primitive C types and (perhaps in
>>>>> the name of performance) is not factored into sub-functions. The
>>>>> code which generates the static table is also hard to reason about
>>>>> in similar ways. The bucket size parameter choice (a crucial
>>>>> compromise between performance and compactness) is 4 but the
>>>>> motivation and implications are left for the reader to puzzle
>>>>> out. Likewise, the use of hardware division instead of
>>>>> power-of-two table size is presented without comment, despite the
>>>>> fact that we favor power-of-two in other parts of our stack, and a
>>>>> power-of-two solution would be reasonable here (given the bucket
>>>>> size).
>>>>>
>>>>> 9. Perhaps I wouldn't care as much about the code style if the
>>>>> code were not performance sensitive or if it were a unique and
>>>>> isolated part of the JVM. In this case, I expect this code (and
>>>>> other similar code) to improve, over time, in readability and
>>>>> robustness, as we learn to work with this new kind of data
>>>>> structure. So even if we decide there is no significant
>>>>> regression here, and decide to push it as-is, we still need to use
>>>>> it as an example to help us get better at writing easy-to-read
>>>>> code which works with pointer-free data.
>>>>>
>>>>> 10. I would like to see (posted somewhere or attached to the bug)
>>>>> a sample list of the symbols in a typical symbol table. Perhaps
>>>>> this already exists and I missed it. I think it would be friendly
>>>>> to put some comments in the code that help the reader estimate
>>>>> numbers like table size, bucket length, number of queries, number
>>>>> of probes per query, symbol length statistics (mean, histogram).
>>>>> Of course such comments go stale over time, but so does the
>>>>> algorithm, and being coy about the conditions of the moment
>>>>> doesn't help us in the long run. Even a short comment is better
>>>>> than none, for example:
>>>>> http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee27509/src/share/vm/classfile/vmSymbols.cpp#l206
>>>>
>>>> Those are good suggestions. I'll add a sample list o the symbols to
>>>> the bug report and try to put more comments in the code.
>>>>
>>>> Thanks!
>>>>
>>>> Jiangli
>>>>
>>>>> It is a good time to have these conversations.
>>>>>
>>>>> Best wishes,
>>>>> — John
>>>>>
>>>>> On Oct 13, 2014, at 11:46 AM, Aleksey
>>>>> Shipilev<aleksey.shipilev at oracle.com> wrote:
>>>>>
>>>>>> Hi Jiangli,
>>>>>>
>>>>>> On 13.10.2014 18:26, Jiangli Zhou wrote:
>>>>>>> On 10/13/2014 03:18 AM, Aleksey Shipilev wrote:
>>>>>>>> On 13.10.2014 03:32, David Holmes wrote:
>>>>>>>>> On 11/10/2014 1:47 PM, Jiangli Zhou wrote:
>>>>>>>>> Also is the benchmarking being done on dedicated systems?
>>>>>>>> Also, specjvm98 is meaningless to estimate the classloading costs.
>>>>>>>> Please try specjvm2008:startup.* tests?
>>>>>>> The specjvm run was for Gerard's question about standard
>>>>>>> benchmarks.
>>>>>> SPECjvm2008 is a standard benchmark. In fact, it is a benchmark that
>>>>>> deprecates SPECjvm98.
>>>>>>
>>>>>>> These are not benchmarks specifically for classloading.
>>>>>> There are benchmarks that try to estimate the startup costs.
>>>>>> SPECjvm2008:startup.* tests are one of them.
>>>>>>
>>>>>>> However, I agree it's a good idea to run standard benchmarks to
>>>>>>> confirm there is no overall performance degradation. From all the
>>>>>>> benchmarks including classloading measurements, we have confirmed
>>>>>>> that this specific change does not have negative impact on
>>>>>>> classloading itself and the overall performance.
>>>>>> Excellent. What are those benchmarks? May we see those? Because I
>>>>>> have a
>>>>>> counter-example in this thread that this change *does* negatively
>>>>>> impact
>>>>>> classloading.
>>>>>>
>>>>>> -Aleksey.
>>>>>>
>>>>
>>>
>>
>
More information about the hotspot-runtime-dev
mailing list