RFR(S): 8087143: Reduce Symbol::_identity_hash to 2 bytes

Yumin Qi yumin.qi at oracle.com
Mon Jun 22 16:36:31 UTC 2015


David,

On 6/21/2015 5:20 PM, David Holmes wrote:
> On 20/06/2015 3:17 PM, Yumin Qi wrote:
>> Ioi,
>>    Thanks.  I missed the size(...) call. Updated webrev02, convert _body
>> from size 1 array to size 2 array and have done tests for it.
>
> Why did you add:
>
>  118  public:
>
> and make a bunch of previously private methods (and the enum) public ??
>
You are right, here we use friend classes instead of declaring the 
members as public to limit usage from other classes.

class Symbol : public MetaspaceObj {
   friend class VMStructs;
   friend class SymbolTable;
   friend class MoveSymbols;

I will modify according to original definition.

Thanks
Yumin

> Thanks,
> David
>
>
>> Thanks
>> Yumin
>>
>> On 6/19/2015 6:39 PM, Ioi Lam wrote:
>>> Yumin,
>>>
>>> The new version of Symbol::size over-calculates the required size by 1
>>> byte. That's why your average Symbol size is only reduced from 44.198
>>> to 43.317 bytes (a reduction of 0.88 bytes).
>>>
>>> The code:
>>>
>>>     jbyte _body[1];
>>>     static int size(int length) {
>>>        size_t sz = heap_word_size(sizeof(Symbol) + (length > 1 ?
>>>     length - 1 : 0));
>>>         return align_object_size(sz);
>>>     }
>>>
>>> should be changed to
>>>
>>>     jbyte _body[2];
>>>     static int size(int length) {
>>>        size_t sz = heap_word_size(sizeof(Symbol) + (length > 2 ?
>>>     length - 2 : 0));
>>>        return align_object_size(sz);
>>>     }
>>>
>>> When you have this
>>>
>>>     class Symbol : public MetaspaceObj {
>>>        ATOMIC_SHORT_PAIR(
>>>          volatile short _refcount,  // needs atomic operation
>>>          unsigned short _length     // number of UTF8 characters in
>>>     the symbol (does not need atomic op)
>>>        );
>>>        short _identity_hash;
>>>        jbyte _body[1];
>>>     }
>>>
>>> Even though you have declared only 7 bytes of fields, sizeof(Symbol)
>>> is 8 bytes, not 7. That's because the C++ compiler implicitly adds 1
>>> byte of padding. So if your string is "ab", Symbol::size() would use
>>>
>>>      sizeof(Symbol) + (length > 1 ? length - 1 : 0) ==>  8 + 2 - 1
>>> ==> 9 bytes
>>>
>>> even though 8 bytes are enough.
>>>
>>> Please see more comments below:
>>>
>>> On 6/19/15 5:20 PM, Yumin Qi wrote:
>>>> Hi, Coleen and Ioi
>>>>
>>>>   The revised webrev:
>>>> http://cr.openjdk.java.net/~minqi/8087143/webrev02/
>>>>
>>>>   In this version, I have tested with Specjbb2005:
>>>>   1) change _body to one byte, contribute _body[0] to the calculation
>>>> of identity_hash. No rehash request found during the test. Like the
>>>> Warnings mentioned by Coleen in her email;
>>>>   2)  Following data is the output, no. of literals is same as no. of
>>>> entries which refers to no. of symbol.
>>>>
>>>> before:
>>>> SymbolTable statistics:
>>>> Number of buckets       :     20011 =    320176 bytes, avg 16.000
>>>> Number of entries       :      2016 =     64512 bytes, avg 32.000
>>>> Number of literals      :      2016 =     89104 bytes, avg 44.198
>>>> Total footprint         :           =    473792 bytes
>>>> Average bucket size     :     0.101
>>>> Variance of bucket size :     0.100
>>>> Std. dev. of bucket size:     0.316
>>>> Maximum bucket size     :         3
>>>>
>>>> after:
>>>> SymbolTable statistics:
>>>> Number of buckets       :     20011 =    320176 bytes, avg 16.000
>>>> Number of entries       :      2021 =     64672 bytes, avg 32.000
>>>> Number of literals      :      2021 =     87544 bytes, avg 43.317
>>>> Total footprint         :           =    472392 bytes
>>>> Average bucket size     :     0.101
>>>> Variance of bucket size :     0.100
>>>> Std. dev. of bucket size:     0.316
>>>> Maximum bucket size     :         3
>>>>
>>>>   About no. of literals, the calculation of the size is:
>>>>
>>>> template <class T, MEMFLAGS F> int RehashableHashtable<T,
>>>> F>::literal_size(Symbol *symbol) {
>>>>   return symbol->size() * HeapWordSize;
>>>> }
>>>>
>>>>   And the size is just the _length of the symbol which I think is the
>>>> length of the characters, not including  Symbol itself. Cannot use
>>>> this data to display how many bytes we saved.
>>>
>>> This is not true.  If it were true, you wouldn't have seen any changes
>>> in the "average literal" size printed above.
>>>
>>> Symbol::size() is defined to be this:
>>>
>>>   int size()                { return size(utf8_length()); }
>>>
>>> which calls the static Symbol::size(int) function, which includes the
>>> size of the header.
>>>
>>>> 3) correct a inaccurate calculation of size for Symbol in old version.
>>>>       size_t sz = heap_word_size(sizeof(SymbolBase) + (length > 0 ?
>>>> length : 0));
>>>>       It will allocate one extra byte when length > 0 since we
>>>> already reserve one byte for _body.
>>> Your description is incorrect. In the old version, SymbolBase did not
>>> include _body. Please see comments in JDK-8009575.
>>>
>>> So why did we need SymbolBase for 8009575 (the previous version) but
>>> don't need it for 8087143 (the current RFE)?
>>>
>>> After the fix for 8009575, we needed 8 bytes of header for Symbol, so
>>> if Symbol was declared as this:
>>>
>>>     class Symbol : public MetaspaceObj {
>>>        ATOMIC_SHORT_PAIR(
>>>          volatile short _refcount,  // needs atomic operation
>>>          unsigned short _length     // number of UTF8 characters in
>>>     the symbol (does not need atomic op)
>>>        );
>>>        int _identity_hash;
>>>        jbyte _body[1];
>>>     }
>>>
>>> It would occupy 9 bytes. Most compilers would automatically pad it up
>>> to 12 bytes.
>>>
>>> I could have written Symbol as the following to consider the
>>> pad-to-12-bytes case:
>>>
>>>     class Symbol : public MetaspaceObj {
>>>        ATOMIC_SHORT_PAIR(
>>>          volatile short _refcount,  // needs atomic operation
>>>          unsigned short _length     // number of UTF8 characters in
>>>     the symbol (does not need atomic op)
>>>        );
>>>        int _identity_hash;
>>>        jbyte _body[4];    // align up to 12 bytes
>>>
>>>        static int size(int length) {
>>>          size_t sz = heap_word_size(sizeof(Symbol) + (length > 4 ?
>>>     length - 4 : 0));
>>>          return align_object_size(sz);
>>>        }
>>>     }
>>>
>>> However, the amount of structure padding by the C compiler is not
>>> well-defined. Although most of our compiler would pad to 12 bytes,
>>> some compiler may decide to pad to 16 byte. While this is highly
>>> unlikely, I didn't want to take the chance. Explicitly splitting out a
>>> SymbolBase class, which is 8 bytes, would make our size calculation
>>> always correct.
>>>
>>> Anyway, for the current fix, we reduced the Symbol header to only 6
>>> bytes. Adding a body[2] will make sizeof(Symbol) to 8 bytes. Virtually
>>> no compiler would add any padding to a 8-byte structure, so using
>>> sizeof(Symbol) is good enough.
>>>
>>> Please trace the Symbol::size(int) function inside gdb to verify that
>>> it performs the calculation correctly.
>>>
>>>>      Now we save 3 bytes for every symbol (for symbol whose length is
>>>> greater than 1).
>>>>   4) remove java.cpp change, which is not needed, as Coleen pointed
>>>> out, we can use -XX:+PrintStringTableStatistics to get this data,
>>>> which also is a product flag.
>>>>   5) for .jsa files:
>>>>        specjbb2005-org.jsa:   4026368
>>>>        specjbb2005.jsa:          4018176
>>>>        Saved about 8K.
>>>>
>>> Here is the output extracted from "java  -Xshare:dump
>>> -XX:+PrintSharedSpaces" on 64-bit Linux.
>>>
>>> Before any changes:
>>>
>>> Symbol              :    50913    1925736  27.4 | 0 0   0.0
>>> |    50913    1925736  10.5
>>>
>>> After your webrev02/
>>>
>>> Symbol              :    50913    1889520  27.0 | 0 0   0.0
>>> |    50913    1889520  10.3
>>>
>>> After changing _body[1] => _body[2]
>>>
>>> Symbol              :    50913    1856248  26.6 | 0 0   0.0
>>> |    50913    1856248  10.1
>>>
>>> The average Symbol size changed from
>>>
>>>        37.82 bytes to 37.11 (-0.71) to 36.45 (-1.37)
>>>
>>> As I mentioned in the bug report, the reduction (1.37) is smaller than
>>> the theoretical value (2 bytes) because we have many small symbols,
>>> and metaspace.cpp enforces a 24-byte minimum allocation size for each
>>> MetaspaceObj on 64-bit platforms.
>>>
>>> Thanks
>>> - Ioi
>>>
>>>
>>>> Thanks
>>>> Yumin
>>>>
>>>>
>>>> On 6/19/2015 11:21 AM, Coleen Phillimore wrote:
>>>>>
>>>>> Hi,  This is okay, one comment below.
>>>>>
>>>>> On 6/19/15 12:21 PM, Yumin Qi wrote:
>>>>>> Coleen,
>>>>>>
>>>>>>   Thanks.
>>>>>>
>>>>>> On 6/19/2015 5:14 AM, Coleen Phillimore wrote:
>>>>>>>
>>>>>>> Yumin,
>>>>>>>
>>>>>>> If the identity hash isn't unique enough you get a system
>>>>>>> dictionary warning like this, which doesn't stop the JVM.   Could
>>>>>>> you check your test logs to see if this warning is output in any
>>>>>>> of them (jtreg keeps logs around, just checking the jtreg test
>>>>>>> results should be sufficient, I think).
>>>>>>>
>>>>>>> Log : HotSpot(TM) Server VM warning: Performance bug:
>>>>>>> SystemDictionary lookup_count=6918 lookup_length=3907
>>>>>>> average=0.564759 load=0.266601
>>>>>>>
>>>>>>> I think with including _body[0] in the function you may make the
>>>>>>> hash random enough.   So this was one of my concerns with this
>>>>>>> change.
>>>>>>>
>>>>>> I will check that.
>>>>>>> In
>>>>>>> http://cr.openjdk.java.net/~minqi/8087143/webrev01/src/share/vm/runtime/java.cpp.udiff.html 
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Are you sure you want all of the SymbolTable dump? Maybe it
>>>>>>> should be under PrintSymbolTableSizeStatistics instead?
>>>>>>>
>>>>>> I will check and make change based on the output of statistics,
>>>>>> maybe it is too much to print out all SymbolTable.
>>>>>
>>>>> It's way more than anyone needs normally.   Maybe only print it if
>>>>> one adds +Verbose to PrintStringTableSizeStatistics and when we do
>>>>> logging, it'll be -Xlog:symboltable=trace or a lower logging level
>>>>> than info.
>>>>>
>>>>> Thanks,
>>>>> Coleen
>>>>>>> In
>>>>>>> http://cr.openjdk.java.net/~minqi/8087143/webrev01/src/share/vm/oops/symbol.hpp.udiff.html 
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -// We separate the fields in SymbolBase from Symbol::_body so that
>>>>>>> -// Symbol::size(int) can correctly calculate the space needed.
>>>>>>>
>>>>>>> I'm not sure what that comment meant.  With SymbolBase gone (which
>>>>>>> I think looks nicer), can you still correctly calculate the space
>>>>>>> needed?
>>>>>>>
>>>>>> Yes, now the _body changed from 1 byte array to 2 bytes array.
>>>>>> The original calculation is not correct I think:  if symbol length
>>>>>> > 0, we allocate *_ONE_* extra byte for it:
>>>>>> -    size_t sz = heap_word_size(sizeof(SymbolBase) + (length > 0 ?
>>>>>> length : 0));
>>>>>> +    size_t sz = heap_word_size(sizeof(Symbol) + (length > 2 ?
>>>>>> length - 2 : 0));
>>>>>>
>>>>>>> Other than these things, this change looks good.  Nice to find
>>>>>>> some space savings with something so frequent as symbols.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Coleen
>>>>>>>
>>>>>>> On 6/18/15 8:40 PM, Yumin Qi wrote:
>>>>>>>> Ioi,
>>>>>>>>
>>>>>>>>   Thanks for review, I will add that (for print out purpose, not
>>>>>>>> in the final change?)
>>>>>>>>
>>>>>>>> Yumin
>>>>>>>>
>>>>>>>> On 6/18/2015 3:34 PM, ioi.lam at oracle.com wrote:
>>>>>>>>> Hi Yumin,
>>>>>>>>>
>>>>>>>>> I wonder why the average symbol size didn't change. They are
>>>>>>>>> both about 34.25 bytes long.
>>>>>>>>>
>>>>>>>>> When I tested the change, -Xshare:dump -XX:+PrintSharedSpaces
>>>>>>>>> show about 1.3 bytes reduction per symbol.
>>>>>>>>>
>>>>>>>>> Also, could you add some code to print out the statistics of the
>>>>>>>>> system dictionary to show if there's any change in the hashtable
>>>>>>>>> balance?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>>> On Jun 18, 2015, at 1:46 PM, Yumin Qi <yumin.qi at oracle.com> 
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Please review the small change for
>>>>>>>>>>
>>>>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8087143
>>>>>>>>>> webrev: http://cr.openjdk.java.net/~minqi/8087143/webrev01/
>>>>>>>>>>
>>>>>>>>>> Summary: _identity_hash is an integer in Symbol (SymbolBase),
>>>>>>>>>> it is used to compute hash bucket index by modulus division of
>>>>>>>>>> table size. Currently in hotspot, no table size is more than
>>>>>>>>>> 65535 so we can use short instead.  For case with table size
>>>>>>>>>> over 65535 we can use the first two bytes of symbol data to be
>>>>>>>>>> as the upper 16 bits for the calculation but rare cases.
>>>>>>>>>>
>>>>>>>>>> This is a minor change, also eliminates SymbolBase which seems
>>>>>>>>>> redundant.  Also add print out SymbolTable statistics under
>>>>>>>>>> debug flag PrintSystemDictionaryAtExit.
>>>>>>>>>>
>>>>>>>>>> Tests: JPRT, Specjbb2005.
>>>>>>>>>>
>>>>>>>>>> The SymbolTable statistics for specjbb2005:
>>>>>>>>>>
>>>>>>>>>> Before Change:
>>>>>>>>>> SymbolTable statistics:
>>>>>>>>>> Number of buckets       :     20011 =    320176 bytes, avg 
>>>>>>>>>> 16.000
>>>>>>>>>> Number of entries       :     49914 =   1597248 bytes, avg 
>>>>>>>>>> 32.000
>>>>>>>>>> Number of literals      :     49914 =   1709688 bytes, avg 
>>>>>>>>>> 34.253
>>>>>>>>>> Total footprint         :           =   3627112 bytes
>>>>>>>>>> Average bucket size     :     2.494
>>>>>>>>>> Variance of bucket size :     2.483
>>>>>>>>>> Std. dev. of bucket size:     1.576
>>>>>>>>>> Maximum bucket size     :        10
>>>>>>>>>>
>>>>>>>>>> After Change:
>>>>>>>>>>
>>>>>>>>>> SymbolTable statistics:
>>>>>>>>>> Number of buckets       :     20011 =    320176 bytes, avg 
>>>>>>>>>> 16.000
>>>>>>>>>> Number of entries       :     49911 =   1597152 bytes, avg 
>>>>>>>>>> 32.000
>>>>>>>>>> Number of literals      :     49911 =   1709544 bytes, avg 
>>>>>>>>>> 34.252
>>>>>>>>>> Total footprint         :           =   3626872 bytes
>>>>>>>>>> Average bucket size     :     2.494
>>>>>>>>>> Variance of bucket size :     2.483
>>>>>>>>>> Std. dev. of bucket size:     1.576
>>>>>>>>>> Maximum bucket size     :        10
>>>>>>>>>>
>>>>>>>>>> There is no big change for the hashtable balance.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Yumin
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>



More information about the hotspot-runtime-dev mailing list