RFR JDK-8059510 Compact symbol table layout inside shared archive

Mon Oct 13 16:31:17 UTC 2014

Hi David,

On 10/12/2014 06:32 PM, David Holmes wrote:
> On 11/10/2014 1:47 PM, Jiangli Zhou wrote:
>> On 10/10/2014 04:18 PM, Ioi Lam wrote:
>>>
>>> On 10/10/14, 2:06 PM, Jiangli Zhou wrote:
>>>> Hi Gerard,
>>>>
>>>> On 10/10/2014 01:44 PM, Gerard Ziemski wrote:
>>>>> hi Jiangli,
>>>>>
>>>>> On 10/10/2014 3:10 PM, Jiangli Zhou wrote:
>>>>>> Hi Gerard,
>>>>>>
>>>>>> On 10/10/2014 08:12 AM, Gerard Ziemski wrote:
>>>>>>> hi Jiangli,
>>>>>>>
>>>>>>> On 10/9/2014 2:11 PM, Jiangli Zhou wrote:
>>>>>>>> Hi Gerard,
>>>>>>>>
>>>>>>>> Thank you very much for the review. Please see my comments below.
>>>>>>>>
>>>>>>>> On 10/09/2014 08:04 AM, Gerard Ziemski wrote:
>>>>>>>>> hi Jiangli,
>>>>>>>>>
>>>>>>>>> I'm a reviewer with small "r" and I'm still going through your
>>>>>>>>> code and learning as I go, but so far I have 2 items as my
>>>>>>>>> feedback/questions:
>>>>>>>>>
>>>>>>>>> #1 Re: "SymbolTable::lookup"
>>>>>>>>>
>>>>>>>>>  Symbol* SymbolTable::lookup(int index, const char* name,
>>>>>>>>>                                int len, unsigned int hash) {
>>>>>>>>> +  Symbol* s = _shared_table.lookup(name, hash, len);
>>>>>>>>> +  if (s != NULL) {
>>>>>>>>> +    return s;
>>>>>>>>> +  }
>>>>>>>>> +
>>>>>>>>>    int count = 0;
>>>>>>>>>    for (HashtableEntry<Symbol*, mtSymbol>* e = bucket(index);
>>>>>>>>>         e != NULL; e = e->next()) {
>>>>>>>>>      count++;  // count all entries in this bucket, not just
>>>>>>>>>                // ones with same hash
>>>>>>>>>      if (e->hash() == hash) {
>>>>>>>>>        Symbol* sym = e->literal();
>>>>>>>>>
>>>>>>>>> a) Do we need to evaluate the lookup time performance, now that
>>>>>>>>> some entries will have to be looked up in 2 separate tables in
>>>>>>>>> "SymbolTable::lookup"?
>>>>>>>>>
>>>>>>>>> b) The shared table is searched first; is this the order we
>>>>>>>>> expect?
>>>>>>>>
>>>>>>>> Those are very good questions. The shared symbol table lookup is
>>>>>>>> fast since we can very efficiently locate the specific bucket
>>>>>>>> with pre-calculated bucket sizes. The shared table is searched
>>>>>>>> first because the symbols contained in it are from archived
>>>>>>>> classes, which are the ones used during bootstrap (by default).
>>>>>>>> Separating the symbols into two sets does introduce some
>>>>>>>> overhead. In this case, I think the effect is negligible. The
>>>>>>>> data from Aleksey's benchmark for classloading showed a very small
>>>>>>>> difference between the patched and non-patched versions.
>>>>>>>
>>>>>>> You may very well be right that the performance hit is
>>>>>>> negligible, but my point is that you haven't shown that this issue
>>>>>>> isn't a problem by backing it up with actual performance data. You
>>>>>>> use Aleksey's own benchmark to prove your point, which only came
>>>>>>> up during the review and which actually shows the opposite (though
>>>>>>> only a slight regression). I would think that we need real
>>>>>>> performance data that will prove your assumptions without any
>>>>>>> doubt.
>>>>>>
>>>>>> You have a very good point. I apologize for not providing my
>>>>>> first-hand benchmark data. Here are some classloading benchmark
>>>>>> results on linux-i586 and linux-arm (soft-float vfp) platforms.
>>>>>> 17436 classes were loaded from bootclasspath. For both before and
>>>>>> after, the shared archive was used. 10 samples were collected for
>>>>>> both before and after.
>>>>>>
>>>>>> *Linux ARMv7 tegra board*
>>>>>> Before (average): 7.9505s
>>>>>> After (average):  7.8601s
>>>>>>
>>>>>> *Linux Intel i5*
>>>>>> Before (average): 1.2162s
>>>>>> After (average):  1.1457s
>>>>>
>>>>> This looks promising, but it also looks like a specialized benchmark
>>>>> designed to test shared archive behavior. Do we have performance
>>>>> regression numbers from standard benchmarks (i.e. refworkload) that
>>>>> do not use the shared archive path?
>>>>
>>>> The test used was designed for benchmarking classloading speed, not
>>>> specifically for testing shared archive behavior. Shared archive was
>>>> used for both before and after because the shared symbol table would
>>>> only be used in that case. The potential performance impact of
>>>> looking up the shared symbol table would only manifest in that case.
>>>> When class data sharing is not enabled, the shared symbol table is
>>>> not used at all.
>>>>
>>>> I'll run specjvm with refworkload.
>>>>
>>> I remember I ran a bunch of refworkload before and there was no
>>> significant difference before/after this change. But I can't seem to
>>> find the e-mail now :-(
>>
>> Here are the specjvm runs on the ARMv7 tegra board. There is no
>> measurable loss with the change.
>>
>> ==============================================================================
>>
>> logs.specjvm.before:
>>    Benchmark           Samples        Mean     Stdev   Geomean   Weight
>>    specjvm98                 8       81.33      1.47
>> ==============================================================================
>>
>> logs.specjvm.after:
>>    Benchmark           Samples        Mean     Stdev   %Diff       P   Significant
>>    specjvm98                 8       81.72      0.70    0.48   0.509             *
>> ==============================================================================
>>
>
> Sample size is too small to give meaningful results.

Please see my other email regarding the sample size for specjvm.

>
> Also is the benchmarking being done on dedicated systems?

I don't know which systems are dedicated. The device that I used for the
above runs was a quiet machine; no other application was running at the
time. All the binaries and benchmarks were local and not on an NFS
mount. That's usually considered a good benchmarking environment.

Thanks,
Jiangli
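
[Note on the specjvm numbers quoted in this thread: the sample-size
concern can be made concrete by recomputing the summary statistics from
the quoted table. The sketch below is only an illustration. The thread
does not say which test the benchmark harness uses to produce its P
column, so the choice of Welch's t statistic here is an assumption, and
the names Summary, percent_diff and welch_t are hypothetical.]

```cpp
#include <cmath>

// Summary of one benchmark run set, as printed in the quoted table.
struct Summary {
    double mean;
    double stdev;  // sample standard deviation
    int n;         // number of samples
};

// Relative change of the mean, in percent (the "%Diff" column).
double percent_diff(const Summary& before, const Summary& after) {
    return 100.0 * (after.mean - before.mean) / before.mean;
}

struct Welch {
    double t;   // Welch's t statistic
    double df;  // Welch-Satterthwaite degrees of freedom
};

// Welch's unequal-variance t statistic (assumed test, see the note above).
Welch welch_t(const Summary& a, const Summary& b) {
    double va = a.stdev * a.stdev / a.n;  // squared standard error of a
    double vb = b.stdev * b.stdev / b.n;  // squared standard error of b
    double t = (b.mean - a.mean) / std::sqrt(va + vb);
    double df = (va + vb) * (va + vb) /
                (va * va / (a.n - 1) + vb * vb / (b.n - 1));
    return {t, df};
}
```

[With the quoted values (before: 81.33, 1.47, 8; after: 81.72, 0.70, 8)
this gives a difference of about 0.48% and a t statistic of roughly 0.68
at about 10 degrees of freedom, well below the value of about 2.23
needed for significance at the 5% level. With 8 samples per side, a
difference this small cannot be distinguished from noise in either
direction.]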

>
> Thanks,
> David
>
>>
>> Thanks,
>> Jiangli
>>
>>>
>>> - Ioi
>>>
>>>
>>>
>>
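
[Note on item #1 of this thread: the lookup order under discussion, a
read-only shared table with pre-calculated bucket boundaries searched
first and the regular table searched second, can be sketched as below.
This is not the HotSpot code. CompactTable, TwoLevelTable, build_compact
and simple_hash are hypothetical names, the hash function is a
placeholder, and std::unordered_set stands in for the dynamic symbol
table. bucket_count is assumed to be greater than zero.]

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <unordered_set>
#include <vector>

// Placeholder hash; the real symbol hash is not shown in the thread.
static uint32_t simple_hash(const char* s, size_t len) {
    uint32_t h = 0;
    for (size_t i = 0; i < len; ++i) h = 31 * h + static_cast<unsigned char>(s[i]);
    return h;
}

// Read-only table: entries sit in one flat array, and
// bucket_start[b] .. bucket_start[b + 1] delimits bucket b.
struct CompactTable {
    struct Entry { uint32_t hash; const char* name; };
    std::vector<uint32_t> bucket_start;  // size is bucket_count + 1
    std::vector<Entry> entries;

    const char* lookup(const char* name, uint32_t hash, size_t len) const {
        if (bucket_start.size() < 2) return nullptr;  // empty table
        size_t b = hash % (bucket_start.size() - 1);
        for (uint32_t i = bucket_start[b]; i < bucket_start[b + 1]; ++i) {
            const Entry& e = entries[i];
            if (e.hash == hash && std::strlen(e.name) == len &&
                std::memcmp(e.name, name, len) == 0) {
                return e.name;
            }
        }
        return nullptr;
    }
};

// Build the flat layout once: count per bucket, prefix-sum, then fill.
static CompactTable build_compact(const std::vector<const char*>& names,
                                  size_t bucket_count) {
    CompactTable t;
    t.bucket_start.assign(bucket_count + 1, 0);
    for (const char* n : names) {
        ++t.bucket_start[simple_hash(n, std::strlen(n)) % bucket_count + 1];
    }
    for (size_t b = 0; b < bucket_count; ++b) {
        t.bucket_start[b + 1] += t.bucket_start[b];
    }
    t.entries.resize(names.size());
    std::vector<uint32_t> next(t.bucket_start.begin(), t.bucket_start.end() - 1);
    for (const char* n : names) {
        uint32_t h = simple_hash(n, std::strlen(n));
        t.entries[next[h % bucket_count]++] = {h, n};
    }
    return t;
}

// Shared table first, dynamic table second, as in the patch under review.
struct TwoLevelTable {
    CompactTable shared;
    std::unordered_set<std::string> dynamic;

    const char* lookup(const char* name, size_t len) const {
        uint32_t h = simple_hash(name, len);
        if (const char* s = shared.lookup(name, h, len)) return s;
        auto it = dynamic.find(std::string(name, len));
        return it == dynamic.end() ? nullptr : it->c_str();
    }
};
```

[The extra cost for a symbol that lives only in the dynamic table is one
failed probe of a single shared bucket, which is the overhead the
benchmark numbers in this thread are trying to measure.]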