RFR JDK-8059510 Compact symbol table layout inside shared archive

Fri Oct 10 20:10:41 UTC 2014

Hi Gerard,

On 10/10/2014 08:12 AM, Gerard Ziemski wrote:
> hi Jiangli,
>
> On 10/9/2014 2:11 PM, Jiangli Zhou wrote:
>> Hi Gerard,
>>
>> Thank you very much for the review. Please see my comments below.
>>
>> On 10/09/2014 08:04 AM, Gerard Ziemski wrote:
>>> hi Jiangli,
>>>
>>> I'm a reviewer with small "r" and I'm still going through your code 
>>> and learning as I go, but so far I have 2 items as my 
>>> feedback/questions:
>>>
>>> #1 Re: "SymbolTable::lookup”
>>>
>>>  Symbol* SymbolTable::lookup(int index, const char* name,
>>>                                int len, unsigned int hash) {
>>> +  Symbol* s = _shared_table.lookup(name, hash, len);
>>> +  if (s != NULL) {
>>> +    return s;
>>> +  }
>>> +
>>>    int count = 0;
>>>    for (HashtableEntry<Symbol*, mtSymbol>* e = bucket(index); e != 
>>> NULL; e = e->next()) {
>>>      count++;  // count all entries in this bucket, not just ones 
>>> with same hash
>>>      if (e->hash() == hash) {
>>>        Symbol* sym = e->literal();
>>>
>>> a) Do we need to evaluate the lookup time performance, now that some 
>>> entries will have to be looked up in 2 separate tables in 
>>> "SymbolTable::lookup"?
>>>
>>> b) Shared table is being looked at 1st, is this the case we expect?
>>
>> Those are very good questions. The shared symbol table lookup are 
>> fast since we can very efficiently locate the specific bucket with 
>> pre-calculated bucket sizes. The shared table is searched first 
>> because the symbols contained in that are from archived classes, 
>> which are the ones used during bootstrap (by default). Separating the 
>> symbols into two sets do introducing some overhead. In this case, I 
>> think the effect is negligible. The data from Aleksey's benchmark for 
>> classloading showed very small difference between the patched and 
>> non-patched version.
>
> You might be very well right that the performance hit is negligible, 
> but my point is that you haven't shown that this issue isn't a problem 
> by backing it up with actual performance data. You use Aleksey's own 
> benchmark to prove your point, which only came up during the review 
> and which actually shows the opposite (though only a slight 
> regression). I would think that we need real performance data that 
> will prove your assumptions without any doubt.

You have a very good point. I apologize for not providing my first-hand 
benchmark data. Here are some classloading benchmark results on 
linux-i586 and linux-arm (soft-float vfp) platforms. 17436 classes were 
loaded from bootclasspath. For both before and after, the shared archive 
were used. 10 samples were collected for both before and after.

*Linux ARMv7 tegra board*
Before(average): 7.9505s
After(average)   :  7.8601s

*Linux Intel i5*
Before(average): 1.2162s
After(average)   : 1.1457s

On both x86 and ARM platforms, the results do not show any performance 
degradation. After actually had better results on both platforms.

Here are the raw data:

*Linux ARMv7 tegra board*
Before
====
7.893s
8.141s
7.821s
7.849s
7.707s
8.223s
7.791s
7.783s
8.269s
8.028s

1.211s
1.213s
1.208s
1.197s
1.232s
1.229s
1.221s
1.215s
1.221s
1.215s

*Linux Intel i5*
Before
====
1.211s
1.213s
1.208s
1.197s
1.232s
1.229s
1.221s
1.215s
1.221s
1.215s

After
===
1.146s
1.138s
1.135s
1.142s
1.138s
1.176s
1.150s
1.147s
1.147s
1.138s

>
>
>>>
>>> #2 Re: "compact_table_size"
>>>
>>> int compact_table_size(int num_entries) {
>>>
>>>   const int buksize = (int)SharedSymbolTableBucketSize;
>>>   int num_buckets = (num_entries + buksize - 1) / buksize;
>>>
>>>   // make sure it's a multiple of 2, so we can easily skip over
>>>   // the compact_bucket_sizes table.
>>>   num_buckets = (num_buckets + 1) & (~0x01);
>>>
>>>   return num_buckets;
>>> }
>>>
>>> a) I found “compact_table_size” hard to understand: can it be 
>>> implemented by easier to read code, perhaps the following:
>>>
>>> int compact_table_size(int num_entries) {
>>>   int num_buckets = num_entries/SharedSymbolTableBucketSize;
>>>
>>>   // make sure it's a multiple of 2, so we can easily skip over
>>>   // the compact_bucket_sizes table.
>>>   return num_buckets + (num_buckets%2);
>>> }
>>
>> The code you proposed above would allocate too many buckets in most 
>> cases. For example, if 'num_entries' is 2000 and 
>> SharedSymbolTableBucketSize is 4, the above would allocate 750 
>> buckets (while the actual number of buckets should be 500).
>
> Actually it returns the correct number, ie. 500:
>
> int num_buckets = 2000/4; // ==> 500
> return 500 + (500%2); // 500 + 0 ==> 500
>
> and in my opinion is much easier to read and understand - just please 
> take a look at how you calculate num_bucket, ie ((x+y-1)/y) vs (x/y) 
> and how you make the number even,  ie. ((x+1) & (~0x01)) vs (x + (x%2))

Sorry I miss read '(num_buckets%2)' as '(num_buckets/2)'. However, there 
is still an issue with the above code. It has more chance of collision 
due to insufficient bucket in some cases (when the num_entries is not 
divisible by SharedSymbolTableBucketSize and the result of the quotient 
is multiple of 2). If num_entries is 2003 and 
SharedSymbolTableBucketSize is 4, the above computation would give 500, 
while ideally there should be at least 501 buckets.

Thank you for looking this in great details! Really appreciate this kind 
of code review. :)

Jiangli

>
>
> cheers