RFR JDK-8059510 Compact symbol table layout inside shared archive

Wed Oct 8 17:11:09 UTC 2014

Thanks, Ioi.

Jiangli

On 10/08/2014 10:07 AM, Ioi Lam wrote:
> Hi Jiangli,
>
> The changes look good to me.
>
> Thanks
> - Ioi
>
> On 10/6/14, 7:04 PM, Jiangli Zhou wrote:
>> Hi Ioi and John,
>>
>> Here is the updated webrev that uses 'base_address' for the shared 
>> compact symbol table on both 32-bit and 4-bit platforms:
>>
>> http://cr.openjdk.java.net/~jiangli/8059510/webrev.01/
>>
>> Thanks,
>> Jiangli
>>
>> On 10/03/2014 09:29 AM, Jiangli Zhou wrote:
>>> I had the same thought as John about the symbol entry for LP64 & 
>>> !LP64 briefly, when I took over the code. But didn't pursue further. 
>>> I agree the differentiation between LP64 & !LP64 here is not 
>>> necessary. I'll change it.
>>>
>>> John, thanks for the review and comments.
>>>
>>> Thanks,
>>> Jiangli
>>>
>>> On 10/02/2014 10:12 PM, Ioi Lam wrote:
>>>> Hi John,
>>>>
>>>> I agree that the extra addition instruction in 32-bit would be 
>>>> noise. I think it's OK to remove the #ifdef and always use the offset.
>>>>
>>>> If there were more savings in 32-bit (like if it were possible to 
>>>> save an extra 2 bytes), then I would put up a fight, but I will 
>>>> concede this one :-)
>>>>
>>>> Thanks
>>>> - Ioi
>>>>
>>>> On 10/2/14, 6:57 PM, John Rose wrote:
>>>>> I like the new compact position-independent format, but I am 
>>>>> uncomfortable with
>>>>> adding #ifdefs unless there is a good reason to do so. What's the 
>>>>> reason here?
>>>>>
>>>>> Maybe this is really a question for Ioi, but why use two data 
>>>>> structures when one will do the job?
>>>>>
>>>>> That is, if you have to implement and support the 
>>>>> position-independent format for LP64,
>>>>> why not use it also for !LP64?  The extra "addl" instructions and 
>>>>> table base pointer are noise.
>>>>>
>>>>> And it's not just one localized #ifdef; there are several in the 
>>>>> proposed changeset.
>>>>> If we do relocatable images in the future, the divergent 
>>>>> relocation rules will cause even more.
>>>>>
>>>>> Overall, we should be supporting both 32- and 64-bit systems in 
>>>>> common code,
>>>>> and more so over time, not splitting new code with #ifdefs.
>>>>>
>>>>> — John
>>>>>
>>>>> P.S. One might think, "what's another #ifdef when there are so 
>>>>> many?".
>>>>> It's a judgement call, of course.  But note these two grep counts:
>>>>> $ cat $(hg loc -I src/share/vm) | grep -c '#.*LP64'
>>>>> 236
>>>>> $ cat ~/Downloads/hotspot-7.patch | grep -c '#.*LP64'
>>>>> 9
>>>>> The proposed change adds, all by itself, 4% to our #ifdef load for 
>>>>> LP64.
>>>>>
>>>>> On Oct 2, 2014, at 2:33 PM, Jiangli Zhou <jiangli.zhou at oracle.com> 
>>>>> wrote:
>>>>>
>>>>>> Please review the webrev for JDK-8059510 
>>>>>> <https://bugs.openjdk.java.net/browse/JDK-8059510> for JDK9: 
>>>>>> http://cr.openjdk.java.net/~jiangli/8059510/webrev.00/.
>>>>>>
>>>>>> The shared classes in the CDS archive and runtime loaded classes 
>>>>>> used the same symbol table, which was a hashtable with 24-byte 
>>>>>> entries on 64-bit machine or 12-byte entries on 32-bit machine. 
>>>>>> It also used a pointer for bucket slot. In the webrev, we 
>>>>>> separate the symbol table for shared classes and runtime classes 
>>>>>> into two. While the runtime symbole table remain unchanged, the 
>>>>>> shared classes use a much compact table, which uses 8-byte per 
>>>>>> entry on both 32-bit and 64-bit machines. Each entry contains the 
>>>>>> symbol hash (4-byte). On 32-bit machine, it contains the pointer 
>>>>>> (4-byte) to the symbol. On 64-bit machine, it uses 4-byte offset 
>>>>>> from the base of the table.
>>>>>>
>>>>>> // juint hash;
>>>>>> //#ifdef _LP64
>>>>>> // juint offset; /* Symbol *sym = (Symbol*)(SharedBaseAddress + 
>>>>>> offset) */
>>>>>> //#else
>>>>>> // Symbol* sym;
>>>>>> //#endif
>>>>>>
>>>>>>
>>>>>> The shared symbol lookup is quick. The targeting bucket is 
>>>>>> calculated using the hash (bucket index = hash % _bucket_count). 
>>>>>> The bucket sizes are pre-calculated and also stored in the 
>>>>>> archive along with the symbol table. So we don't need to 
>>>>>> calculate the bucket sizes at runtime.
>>>>>>
>>>>>> The separate shared symbol table in the archive is now read-only 
>>>>>> during runtime. No entry is added/removed from the shared symbol 
>>>>>> table. Rehashing of the runtime symbol table does not affect the 
>>>>>> shared symbol table in the archive either. This helps memory 
>>>>>> sharing by avoid writes to the shared memory.
>>>>>>
>>>>>> As part of the change, two dumping utilities were added to jcmd 
>>>>>> for dumping symbol table and string table.
>>>>>>
>>>>>> The majority of the code in the webrev were contributed by Ioi Lam.
>>>>>>
>>>>>> Thanks,
>>>>>> Jiangli
>>>>>>
>>>>>>
>>>>
>>>
>>
>