RFR JDK-8059510 Compact symbol table layout inside shared archive

Thu Oct 9 15:04:11 UTC 2014

hi Jiangli,

I'm a reviewer with small "r" and I'm still going through your code and 
learning as I go, but so far I have 2 items as my feedback/questions:

#1 Re: "SymbolTable::lookup”

  Symbol* SymbolTable::lookup(int index, const char* name,
                                int len, unsigned int hash) {
+  Symbol* s = _shared_table.lookup(name, hash, len);
+  if (s != NULL) {
+    return s;
+  }
+
    int count = 0;
    for (HashtableEntry<Symbol*, mtSymbol>* e = bucket(index); e != NULL; e = e->next()) {
      count++;  // count all entries in this bucket, not just ones with same hash
      if (e->hash() == hash) {
        Symbol* sym = e->literal();

a) Do we need to evaluate the lookup time performance, now that some entries will have to be looked up in 2 separate tables in "SymbolTable::lookup"?

b) Shared table is being looked at 1st, is this the case we expect?

#2 Re: "compact_table_size"

int compact_table_size(int num_entries) {

   const int buksize = (int)SharedSymbolTableBucketSize;
   int num_buckets = (num_entries + buksize - 1) / buksize;

   // make sure it's a multiple of 2, so we can easily skip over
   // the compact_bucket_sizes table.
   num_buckets = (num_buckets + 1) & (~0x01);

   return num_buckets;
}

a) I found “compact_table_size” hard to understand: can it be implemented by easier to read code, perhaps the following:

int compact_table_size(int num_entries) {
   int num_buckets = num_entries/SharedSymbolTableBucketSize;

   // make sure it's a multiple of 2, so we can easily skip over
   // the compact_bucket_sizes table.
   return num_buckets + (num_buckets%2);
}

cheers

On 10/6/2014 9:04 PM, Jiangli Zhou wrote:
> Hi Ioi and John,
>
> Here is the updated webrev that uses 'base_address' for the shared 
> compact symbol table on both 32-bit and 4-bit platforms:
>
> http://cr.openjdk.java.net/~jiangli/8059510/webrev.01/
>
> Thanks,
> Jiangli
>
> On 10/03/2014 09:29 AM, Jiangli Zhou wrote:
>> I had the same thought as John about the symbol entry for LP64 & 
>> !LP64 briefly, when I took over the code. But didn't pursue further. 
>> I agree the differentiation between LP64 & !LP64 here is not 
>> necessary. I'll change it.
>>
>> John, thanks for the review and comments.
>>
>> Thanks,
>> Jiangli
>>
>> On 10/02/2014 10:12 PM, Ioi Lam wrote:
>>> Hi John,
>>>
>>> I agree that the extra addition instruction in 32-bit would be 
>>> noise. I think it's OK to remove the #ifdef and always use the offset.
>>>
>>> If there were more savings in 32-bit (like if it were possible to 
>>> save an extra 2 bytes), then I would put up a fight, but I will 
>>> concede this one :-)
>>>
>>> Thanks
>>> - Ioi
>>>
>>> On 10/2/14, 6:57 PM, John Rose wrote:
>>>> I like the new compact position-independent format, but I am 
>>>> uncomfortable with
>>>> adding #ifdefs unless there is a good reason to do so. What's the 
>>>> reason here?
>>>>
>>>> Maybe this is really a question for Ioi, but why use two data 
>>>> structures when one will do the job?
>>>>
>>>> That is, if you have to implement and support the 
>>>> position-independent format for LP64,
>>>> why not use it also for !LP64?  The extra "addl" instructions and 
>>>> table base pointer are noise.
>>>>
>>>> And it's not just one localized #ifdef; there are several in the 
>>>> proposed changeset.
>>>> If we do relocatable images in the future, the divergent relocation 
>>>> rules will cause even more.
>>>>
>>>> Overall, we should be supporting both 32- and 64-bit systems in 
>>>> common code,
>>>> and more so over time, not splitting new code with #ifdefs.
>>>>
>>>> — John
>>>>
>>>> P.S. One might think, "what's another #ifdef when there are so many?".
>>>> It's a judgement call, of course.  But note these two grep counts:
>>>> $ cat $(hg loc -I src/share/vm) | grep -c '#.*LP64'
>>>> 236
>>>> $ cat ~/Downloads/hotspot-7.patch | grep -c '#.*LP64'
>>>> 9
>>>> The proposed change adds, all by itself, 4% to our #ifdef load for 
>>>> LP64.
>>>>
>>>> On Oct 2, 2014, at 2:33 PM, Jiangli Zhou <jiangli.zhou at oracle.com> 
>>>> wrote:
>>>>
>>>>> Please review the webrev for JDK-8059510 
>>>>> <https://bugs.openjdk.java.net/browse/JDK-8059510> for JDK9: 
>>>>> http://cr.openjdk.java.net/~jiangli/8059510/webrev.00/.
>>>>>
>>>>> The shared classes in the CDS archive and runtime loaded classes 
>>>>> used the same symbol table, which was a hashtable with 24-byte 
>>>>> entries on 64-bit machine or 12-byte entries on 32-bit machine. It 
>>>>> also used a pointer for bucket slot. In the webrev, we separate 
>>>>> the symbol table for shared classes and runtime classes into two. 
>>>>> While the runtime symbole table remain unchanged, the shared 
>>>>> classes use a much compact table, which uses 8-byte per entry on 
>>>>> both 32-bit and 64-bit machines. Each entry contains the symbol 
>>>>> hash (4-byte). On 32-bit machine, it contains the pointer (4-byte) 
>>>>> to the symbol. On 64-bit machine, it uses 4-byte offset from the 
>>>>> base of the table.
>>>>>
>>>>> // juint hash;
>>>>> //#ifdef _LP64
>>>>> // juint offset; /* Symbol *sym = (Symbol*)(SharedBaseAddress + 
>>>>> offset) */
>>>>> //#else
>>>>> // Symbol* sym;
>>>>> //#endif
>>>>>
>>>>>
>>>>> The shared symbol lookup is quick. The targeting bucket is 
>>>>> calculated using the hash (bucket index = hash % _bucket_count). 
>>>>> The bucket sizes are pre-calculated and also stored in the archive 
>>>>> along with the symbol table. So we don't need to calculate the 
>>>>> bucket sizes at runtime.
>>>>>
>>>>> The separate shared symbol table in the archive is now read-only 
>>>>> during runtime. No entry is added/removed from the shared symbol 
>>>>> table. Rehashing of the runtime symbol table does not affect the 
>>>>> shared symbol table in the archive either. This helps memory 
>>>>> sharing by avoid writes to the shared memory.
>>>>>
>>>>> As part of the change, two dumping utilities were added to jcmd 
>>>>> for dumping symbol table and string table.
>>>>>
>>>>> The majority of the code in the webrev were contributed by Ioi Lam.
>>>>>
>>>>> Thanks,
>>>>> Jiangli
>>>>>
>>>>>
>>>
>>
>
>