RFR JDK-8059510 Compact symbol table layout inside shared archive

Fri Oct 10 23:06:08 UTC 2014

On 10/10/14, 1:29 AM, Aleksey Shipilev wrote:
> Hi Jiangli!
>
> On 10/09/2014 11:51 PM, Jiangli Zhou wrote:
>>> Anyhow, running the classloading benchmark from JDK-8053904 on
>>> Nashorn-generated class files, using the -Xshare:on in both cases,
>>> yields a small degradation:
>>>
>>>    current: 351 +- 2 ms/op
>>>    patched: 357 +- 2 ms/op
>>>
>>> Therefore, I have to ask: what do we try to gain here?
>> Thank you so much for looking into this! The main goal here is for
>> memory saving. There are two benefits of the separate compact table. One
>> is making the shared table read-only by separating it from the runtime
>> table as the runtime symbol table might be rehashed. Making the shared
>> table read-only avoids write into the memory region and improves memory
>> sharing. The other one is smaller entries in the shared table. The
>> reduction was quite big. The original table uses 24-byte entries on
>> 64-bit machine and 12-byte entries on 32-bit machine, while the new
>> table uses 8-byte for each entry.
> I understand why the footprint may be better, but do we have an
> observable improvement that justifies doing this? I wouldn't bother if
> there was no performance implications: in fact, most footprint changes
> we do implicitly improve the performance because of better locality, etc.
>
> But, the test above gives 2% degradation in class loading performance,
> and that does not sound as improvement... We seem to trade this in for
> better footprint, but optimizing footprint just for the sake of it does
> not sound like a good approach to me.
>
> I have to wonder if we should instead invest into optimizing the
> SymbolTable footprint instead of patching it up with front-end
> compressed map. We can dig up the story about Long front-cache in
> java.util.HashMap -- which helped in some narrow cases, but was largely
> a big performance and maintainability nuisance. I would not like us to
> redo the same in native hash tables.
>
> Thanks,
> -Aleksey.

Jiangli & Aleksey,

You can see the size of the symbol table by passing -XX:+PrintSharedSpaces
during dumping and adding up the SymbolHashentry and SymbolBuckets bytes.

This change has no impact on any other memory sizes, except for these 
two blocks.

You can do two dumps with before/after VMs to see the size differences.

In both the before and after cases, these two blocks are contiguous in
memory. They are accessed in small pieces (each hash entry is about 10
to 32 bytes) and in a random pattern. For any non-trivial program, most
of the pages of these two blocks will end up paged in.

So the before/after size difference will be a good indication of the 
change in memory pressure.

Here's an example BEFORE dump of -Xshare:dump with the default class list.

Detailed metadata info (rw includes md and mc):
                        ro_cnt   ro_bytes     % |   rw_cnt   rw_bytes     % |  all_cnt  all_bytes     %
--------------------+---------------------------+---------------------------+--------------------------
Unknown             :        1         40   0.0 |        1         40   0.0 |        2         80   0.0
Class               :        0          0   0.0 |     2455    1911272  15.5 |     2455    1911272   9.9
Symbol              :    50347    1895592  27.2 |        0          0   0.0 |    50347    1895592   9.8
TypeArrayU1         :    14051     513880   7.4 |     2454     404288   3.3 |    16505     918168   4.8
TypeArrayU2         :     5110     293280   4.2 |        0          0   0.0 |     5110     293280   1.5
TypeArrayU4         :     2560     161280   2.3 |        0          0   0.0 |     2560     161280   0.8
TypeArrayU8         :     3930     340792   4.9 |        0          0   0.0 |     3930     340792   1.8
TypeArrayOther      :        0          0   0.0 |        0          0   0.0 |        0          0   0.0
Method              :        0          0   0.0 |    33568    2968496  24.1 |    33568    2968496  15.4
ConstMethod         :    33568    3763296  54.0 |        0          0   0.0 |    33568    3763296  19.5
MethodData          :        0          0   0.0 |        0          0   0.0 |        0          0   0.0
ConstantPool        :        0          0   0.0 |     2454    3221848  26.2 |     2454    3221848  16.7
ConstantPoolCache   :        0          0   0.0 |     2441    2186736  17.8 |     2441    2186736  11.3
Annotation          :       30        960   0.0 |        0          0   0.0 |       30        960   0.0
MethodCounters      :        0          0   0.0 |        0          0   0.0 |        0          0   0.0
Deallocated         :        0          0   0.0 |        0          0   0.0 |        0          0   0.0
SymbolHashentry     :        0          0   0.0 |    50347    1208328   9.8 |    50347   *1208328*  6.3  <<HERE
SymbolBuckets       :        0          0   0.0 |    20011     160088   1.3 |    20011    *160088*  0.8  <<HERE
Other               :        0          0   0.0 |        0     253597   2.1 |        0     253597   1.3
--------------------+---------------------------+---------------------------+--------------------------
Total               :   109597    6969120 100.0 |   113731   12314693 100.0 |   223328   19283813 100.0
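
If you want to script the before/after comparison, here is a rough Python
sketch that adds up the all_bytes column of the two rows marked in the dump
above. The helper name symbol_table_bytes is made up for illustration, and
the parsing assumes each row sits on one unwrapped line, laid out as in this
example dump (without the asterisk and <<HERE markers I added by hand); the
real output may differ between VM builds.

```python
import re

# Two rows copied from the BEFORE dump, with the mail-wrapped lines joined
# and the hand-added "*" and "<<HERE" markers removed.
DUMP = """\
SymbolHashentry     :        0          0   0.0 |    50347    1208328   9.8 |    50347    1208328   6.3
SymbolBuckets       :        0          0   0.0 |    20011     160088   1.3 |    20011     160088   0.8
"""

ROWS_OF_INTEREST = ("SymbolHashentry", "SymbolBuckets")


def symbol_table_bytes(dump_text):
    """Sum the all_bytes column of the SymbolHashentry and SymbolBuckets rows."""
    total = 0
    for line in dump_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or name.strip() not in ROWS_OF_INTEREST:
            continue
        # Each row holds three (count, bytes, percent) groups: ro, rw, all.
        numbers = re.findall(r"\d+(?:\.\d+)?", rest)
        total += int(numbers[7])  # all_bytes is the 8th number in the row
    return total


before = symbol_table_bytes(DUMP)
print("symbol table bytes (BEFORE):", before)
```

Running the same function over the output of the patched VM and subtracting
the two totals gives the size difference for these two blocks.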

We have a large test case where SymbolHashentry is 6,774,528 bytes in
the BEFORE case (282,272 entries at 24 bytes each), so the AFTER case
will save about 4MB in runtime memory usage.
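
The arithmetic behind that estimate can be sketched as follows. This is only
a back-of-the-envelope calculation: it takes the 24-byte (64-bit) and 8-byte
entry sizes from Jiangli's description quoted above and ignores any bucket or
header overhead in the compact table, so the real saving may be somewhat
different. The function name entry_savings is hypothetical.

```python
BEFORE_ENTRY_BYTES = 24  # 64-bit entry size in the original table (quoted above)
AFTER_ENTRY_BYTES = 8    # per-entry size in the compact table (quoted above)


def entry_savings(entries, before=BEFORE_ENTRY_BYTES, after=AFTER_ENTRY_BYTES):
    """Estimated bytes saved on hash entries alone, ignoring bucket overhead."""
    return entries * (before - after)


entries = 282_272

# Sanity check: the BEFORE size reported for the large test case.
assert entries * BEFORE_ENTRY_BYTES == 6_774_528

saved = entry_savings(entries)
print(f"estimated saving: {saved} bytes ({saved / (1024 * 1024):.1f} MiB)")
```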

- Ioi