RFR(S): 8087143: Reduce Symbol::_identity_hash to 2 bytes

David Holmes david.holmes at oracle.com
Mon Jun 22 00:20:39 UTC 2015


On 20/06/2015 3:17 PM, Yumin Qi wrote:
> Ioi,
>    Thanks.  I missed the size(...) call. Updated webrev02, convert _body
> from size 1 array to size 2 array and have done tests for it.

Why did you add:

  118  public:

and make a bunch of previously private methods (and the enum) public ??

Thanks,
David


> Thanks
> Yumin
>
> On 6/19/2015 6:39 PM, Ioi Lam wrote:
>> Yumin,
>>
>> The new version of Symbol::size over-calculates the required size by 1
>> byte. That's why your average Symbol size is only reduced from 44.198
>> to 43.317 bytes (a reduction of 0.88 bytes).
>>
>> The code:
>>
>>     jbyte _body[1];
>>     static int size(int length) {
>>        size_t sz = heap_word_size(sizeof(Symbol) + (length > 1 ?
>>     length - 1 : 0));
>>         return align_object_size(sz);
>>     }
>>
>> should be changed to
>>
>>     jbyte _body[2];
>>     static int size(int length) {
>>        size_t sz = heap_word_size(sizeof(Symbol) + (length > 2 ?
>>     length - 2 : 0));
>>        return align_object_size(sz);
>>     }
>>
>> When you have this
>>
>>     class Symbol : public MetaspaceObj {
>>        ATOMIC_SHORT_PAIR(
>>          volatile short _refcount,  // needs atomic operation
>>          unsigned short _length     // number of UTF8 characters in
>>     the symbol (does not need atomic op)
>>        );
>>        short _identity_hash;
>>        jbyte _body[1];
>>     }
>>
>> Even though you have declared only 7 bytes of fields, sizeof(Symbol)
>> is 8 bytes, not 7. That's because the C++ compiler implicitly adds 1
>> byte of padding. So if your string is "ab", Symbol::size() would use
>>
>>      sizeof(Symbol) + (length > 1 ? length - 1 : 0)  ==>  8 + 2 - 1
>> ==> 9 bytes
>>
>> even though 8 bytes are enough.
>>
>> Please see more comments below:
>>
>> On 6/19/15 5:20 PM, Yumin Qi wrote:
>>> Hi, Coleen and Ioi
>>>
>>>   The revised webrev:
>>> http://cr.openjdk.java.net/~minqi/8087143/webrev02/
>>>
>>>   In this version, I have tested with Specjbb2005:
>>>   1) change _body to one byte, contribute _body[0] to the calculation
>>> of identity_hash. No rehash request found during the test. Like the
>>> Warnings mentioned by Coleen in her email;
>>>   2)  Following data is the output, no. of literals is same as no. of
>>> entries which refers to no. of symbol.
>>>
>>> before:
>>> SymbolTable statistics:
>>> Number of buckets       :     20011 =    320176 bytes, avg 16.000
>>> Number of entries       :      2016 =     64512 bytes, avg 32.000
>>> Number of literals      :      2016 =     89104 bytes, avg 44.198
>>> Total footprint         :           =    473792 bytes
>>> Average bucket size     :     0.101
>>> Variance of bucket size :     0.100
>>> Std. dev. of bucket size:     0.316
>>> Maximum bucket size     :         3
>>>
>>> after:
>>> SymbolTable statistics:
>>> Number of buckets       :     20011 =    320176 bytes, avg 16.000
>>> Number of entries       :      2021 =     64672 bytes, avg 32.000
>>> Number of literals      :      2021 =     87544 bytes, avg 43.317
>>> Total footprint         :           =    472392 bytes
>>> Average bucket size     :     0.101
>>> Variance of bucket size :     0.100
>>> Std. dev. of bucket size:     0.316
>>> Maximum bucket size     :         3
>>>
>>>   About no. of literals, the calculation of the size is:
>>>
>>> template <class T, MEMFLAGS F> int RehashableHashtable<T,
>>> F>::literal_size(Symbol *symbol) {
>>>   return symbol->size() * HeapWordSize;
>>> }
>>>
>>>   And the size is just the _length of the symbol which I think is the
>>> length of the characters, not including  Symbol itself. Cannot use
>>> this data to display how many bytes we saved.
>>
>> This is not true.  If it were true, you wouldn't have seen any changes
>> in the "average literal" size printed above.
>>
>> Symbol::size() is defined to be this:
>>
>>   int size()                { return size(utf8_length()); }
>>
>> which calls the static Symbol::size(int) function, which includes the
>> size of the header.
>>
>>> 3) correct a inaccurate calculation of size for Symbol in old version.
>>>       size_t sz = heap_word_size(sizeof(SymbolBase) + (length > 0 ?
>>> length : 0));
>>>       It will allocate one extra byte when length > 0 since we
>>> already reserve one byte for _body.
>> Your description is incorrect. In the old version, SymbolBase did not
>> include _body. Please see comments in JDK-8009575.
>>
>> So why did we need SymbolBase for 8009575 (the previous version) but
>> don't need it for 8087143 (the current RFE)?
>>
>> After the fix for 8009575, we needed 8 bytes of header for Symbol, so
>> if Symbol was declared as this:
>>
>>     class Symbol : public MetaspaceObj {
>>        ATOMIC_SHORT_PAIR(
>>          volatile short _refcount,  // needs atomic operation
>>          unsigned short _length     // number of UTF8 characters in
>>     the symbol (does not need atomic op)
>>        );
>>        int _identity_hash;
>>        jbyte _body[1];
>>     }
>>
>> It would occupy 9 bytes. Most compilers would automatically pad it up
>> to 12 bytes.
>>
>> I could have written Symbol as the following to consider the
>> pad-to-12-bytes case:
>>
>>     class Symbol : public MetaspaceObj {
>>        ATOMIC_SHORT_PAIR(
>>          volatile short _refcount,  // needs atomic operation
>>          unsigned short _length     // number of UTF8 characters in
>>     the symbol (does not need atomic op)
>>        );
>>        int _identity_hash;
>>        jbyte _body[4];    // align up to 12 bytes
>>
>>        static int size(int length) {
>>          size_t sz = heap_word_size(sizeof(Symbol) + (length > 4 ?
>>     length - 4 : 0));
>>          return align_object_size(sz);
>>        }
>>     }
>>
>> However, the amount of structure padding by the C compiler is not
>> well-defined. Although most of our compiler would pad to 12 bytes,
>> some compiler may decide to pad to 16 byte. While this is highly
>> unlikely, I didn't want to take the chance. Explicitly splitting out a
>> SymbolBase class, which is 8 bytes, would make our size calculation
>> always correct.
>>
>> Anyway, for the current fix, we reduced the Symbol header to only 6
>> bytes. Adding a body[2] will make sizeof(Symbol) to 8 bytes. Virtually
>> no compiler would add any padding to a 8-byte structure, so using
>> sizeof(Symbol) is good enough.
>>
>> Please trace the Symbol::size(int) function inside gdb to verify that
>> it performs the calculation correctly.
>>
>>>      Now we save 3 bytes for every symbol (for symbol whose length is
>>> greater than 1).
>>>   4) remove java.cpp change, which is not needed, as Coleen pointed
>>> out, we can use -XX:+PrintStringTableStatistics to get this data,
>>> which also is a product flag.
>>>   5) for .jsa files:
>>>        specjbb2005-org.jsa:   4026368
>>>        specjbb2005.jsa:          4018176
>>>        Saved about 8K.
>>>
>> Here is the output extracted from "java  -Xshare:dump
>> -XX:+PrintSharedSpaces" on 64-bit Linux.
>>
>> Before any changes:
>>
>> Symbol              :    50913    1925736  27.4 | 0          0   0.0
>> |    50913    1925736  10.5
>>
>> After your webrev02/
>>
>> Symbol              :    50913    1889520  27.0 | 0          0   0.0
>> |    50913    1889520  10.3
>>
>> After changing _body[1] => _body[2]
>>
>> Symbol              :    50913    1856248  26.6 | 0          0   0.0
>> |    50913    1856248  10.1
>>
>> The average Symbol size changed from
>>
>>        37.82 bytes to 37.11 (-0.71) to 36.45 (-1.37)
>>
>> As I mentioned in the bug report, the reduction (1.37) is smaller than
>> the theoretical value (2 bytes) because we have many small symbols,
>> and metaspace.cpp enforces a 24-byte minimum allocation size for each
>> MetaspaceObj on 64-bit platforms.
>>
>> Thanks
>> - Ioi
>>
>>
>>> Thanks
>>> Yumin
>>>
>>>
>>> On 6/19/2015 11:21 AM, Coleen Phillimore wrote:
>>>>
>>>> Hi,  This is okay, one comment below.
>>>>
>>>> On 6/19/15 12:21 PM, Yumin Qi wrote:
>>>>> Coleen,
>>>>>
>>>>>   Thanks.
>>>>>
>>>>> On 6/19/2015 5:14 AM, Coleen Phillimore wrote:
>>>>>>
>>>>>> Yumin,
>>>>>>
>>>>>> If the identity hash isn't unique enough you get a system
>>>>>> dictionary warning like this, which doesn't stop the JVM.   Could
>>>>>> you check your test logs to see if this warning is output in any
>>>>>> of them (jtreg keeps logs around, just checking the jtreg test
>>>>>> results should be sufficient, I think).
>>>>>>
>>>>>> Log : HotSpot(TM) Server VM warning: Performance bug:
>>>>>> SystemDictionary lookup_count=6918 lookup_length=3907
>>>>>> average=0.564759 load=0.266601
>>>>>>
>>>>>> I think with including _body[0] in the function you may make the
>>>>>> hash random enough.   So this was one of my concerns with this
>>>>>> change.
>>>>>>
>>>>> I will check that.
>>>>>> In
>>>>>> http://cr.openjdk.java.net/~minqi/8087143/webrev01/src/share/vm/runtime/java.cpp.udiff.html
>>>>>>
>>>>>>
>>>>>> Are you sure you want all of the SymbolTable dump?  Maybe it
>>>>>> should be under PrintSymbolTableSizeStatistics instead?
>>>>>>
>>>>> I will check and make change based on the output of statistics,
>>>>> maybe it is too much to print out all SymbolTable.
>>>>
>>>> It's way more than anyone needs normally.   Maybe only print it if
>>>> one adds +Verbose to PrintStringTableSizeStatistics and when we do
>>>> logging, it'll be -Xlog:symboltable=trace or a lower logging level
>>>> than info.
>>>>
>>>> Thanks,
>>>> Coleen
>>>>>> In
>>>>>> http://cr.openjdk.java.net/~minqi/8087143/webrev01/src/share/vm/oops/symbol.hpp.udiff.html
>>>>>>
>>>>>>
>>>>>> -// We separate the fields in SymbolBase from Symbol::_body so that
>>>>>> -// Symbol::size(int) can correctly calculate the space needed.
>>>>>>
>>>>>> I'm not sure what that comment meant.  With SymbolBase gone (which
>>>>>> I think looks nicer), can you still correctly calculate the space
>>>>>> needed?
>>>>>>
>>>>> Yes, now the _body changed from 1 byte array to 2 bytes array.
>>>>> The original calculation is not correct I think:  if symbol length
>>>>> > 0, we allocate *_ONE_* extra byte for it:
>>>>> -    size_t sz = heap_word_size(sizeof(SymbolBase) + (length > 0 ?
>>>>> length : 0));
>>>>> +    size_t sz = heap_word_size(sizeof(Symbol) + (length > 2 ?
>>>>> length - 2 : 0));
>>>>>
>>>>>> Other than these things, this change looks good.  Nice to find
>>>>>> some space savings with something so frequent as symbols.
>>>>>>
>>>>>> Thanks,
>>>>>> Coleen
>>>>>>
>>>>>> On 6/18/15 8:40 PM, Yumin Qi wrote:
>>>>>>> Ioi,
>>>>>>>
>>>>>>>   Thanks for review, I will add that (for print out purpose, not
>>>>>>> in the final change?)
>>>>>>>
>>>>>>> Yumin
>>>>>>>
>>>>>>> On 6/18/2015 3:34 PM, ioi.lam at oracle.com wrote:
>>>>>>>> Hi Yumin,
>>>>>>>>
>>>>>>>> I wonder why the average symbol size didn't change. They are
>>>>>>>> both about 34.25 bytes long.
>>>>>>>>
>>>>>>>> When I tested the change, -Xshare:dump -XX:+PrintSharedSpaces
>>>>>>>> show about 1.3 bytes reduction per symbol.
>>>>>>>>
>>>>>>>> Also, could you add some code to print out the statistics of the
>>>>>>>> system dictionary to show if there's any change in the hashtable
>>>>>>>> balance?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>> On Jun 18, 2015, at 1:46 PM, Yumin Qi <yumin.qi at oracle.com> wrote:
>>>>>>>>>
>>>>>>>>> Please review the small change for
>>>>>>>>>
>>>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8087143
>>>>>>>>> webrev: http://cr.openjdk.java.net/~minqi/8087143/webrev01/
>>>>>>>>>
>>>>>>>>> Summary: _identity_hash is an integer in Symbol (SymbolBase),
>>>>>>>>> it is used to compute hash bucket index by modulus division of
>>>>>>>>> table size. Currently in hotspot, no table size is more than
>>>>>>>>> 65535 so we can use short instead.  For case with table size
>>>>>>>>> over 65535 we can use the first two bytes of symbol data to be
>>>>>>>>> as the upper 16 bits for the calculation but rare cases.
>>>>>>>>>
>>>>>>>>> This is a minor change, also eliminates SymbolBase which seems
>>>>>>>>> redundant.  Also add print out SymbolTable statistics under
>>>>>>>>> debug flag PrintSystemDictionaryAtExit.
>>>>>>>>>
>>>>>>>>> Tests: JPRT, Specjbb2005.
>>>>>>>>>
>>>>>>>>> The SymbolTable statistics for specjbb2005:
>>>>>>>>>
>>>>>>>>> Before Change:
>>>>>>>>> SymbolTable statistics:
>>>>>>>>> Number of buckets       :     20011 =    320176 bytes, avg 16.000
>>>>>>>>> Number of entries       :     49914 =   1597248 bytes, avg 32.000
>>>>>>>>> Number of literals      :     49914 =   1709688 bytes, avg 34.253
>>>>>>>>> Total footprint         :           =   3627112 bytes
>>>>>>>>> Average bucket size     :     2.494
>>>>>>>>> Variance of bucket size :     2.483
>>>>>>>>> Std. dev. of bucket size:     1.576
>>>>>>>>> Maximum bucket size     :        10
>>>>>>>>>
>>>>>>>>> After Change:
>>>>>>>>>
>>>>>>>>> SymbolTable statistics:
>>>>>>>>> Number of buckets       :     20011 =    320176 bytes, avg 16.000
>>>>>>>>> Number of entries       :     49911 =   1597152 bytes, avg 32.000
>>>>>>>>> Number of literals      :     49911 =   1709544 bytes, avg 34.252
>>>>>>>>> Total footprint         :           =   3626872 bytes
>>>>>>>>> Average bucket size     :     2.494
>>>>>>>>> Variance of bucket size :     2.483
>>>>>>>>> Std. dev. of bucket size:     1.576
>>>>>>>>> Maximum bucket size     :        10
>>>>>>>>>
>>>>>>>>> There is no big change for the hashtable balance.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Yumin
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


More information about the hotspot-runtime-dev mailing list