RFR(S): 8087143: Reduce Symbol::_identity_hash to 2 bytes
Ioi Lam
ioi.lam at oracle.com
Sat Jun 20 01:39:21 UTC 2015
Yumin,
The new version of Symbol::size over-calculates the required size by 1
byte. That's why your average Symbol size is only reduced from 44.198 to
43.317 bytes (a reduction of 0.88 bytes).
The code:
jbyte _body[1];
static int size(int length) {
size_t sz = heap_word_size(sizeof(Symbol) + (length > 1 ? length
- 1 : 0));
return align_object_size(sz);
}
should be changed to
jbyte _body[2];
static int size(int length) {
size_t sz = heap_word_size(sizeof(Symbol) + (length > 2 ? length
- 2 : 0));
return align_object_size(sz);
}
When you have this
class Symbol : public MetaspaceObj {
ATOMIC_SHORT_PAIR(
volatile short _refcount, // needs atomic operation
unsigned short _length // number of UTF8 characters in the
symbol (does not need atomic op)
);
short _identity_hash;
jbyte _body[1];
}
Even though you have declared only 7 bytes of fields, sizeof(Symbol) is
8 bytes, not 7. That's because the C++ compiler implicitly adds 1 byte
of padding. So if your string is "ab", Symbol::size() would use
sizeof(Symbol) + (length > 1 ? length - 1 : 0) ==> 8 + 2 - 1 ==>
9 bytes
even though 8 bytes are enough.
Please see more comments below:
On 6/19/15 5:20 PM, Yumin Qi wrote:
> Hi, Coleen and Ioi
>
> The revised webrev:
> http://cr.openjdk.java.net/~minqi/8087143/webrev02/
>
> In this version, I have tested with Specjbb2005:
> 1) change _body to one byte, contribute _body[0] to the calculation
> of identity_hash. No rehash request found during the test. Like the
> Warnings mentioned by Coleen in her email;
> 2) Following data is the output, no. of literals is same as no. of
> entries which refers to no. of symbol.
>
> before:
> SymbolTable statistics:
> Number of buckets : 20011 = 320176 bytes, avg 16.000
> Number of entries : 2016 = 64512 bytes, avg 32.000
> Number of literals : 2016 = 89104 bytes, avg 44.198
> Total footprint : = 473792 bytes
> Average bucket size : 0.101
> Variance of bucket size : 0.100
> Std. dev. of bucket size: 0.316
> Maximum bucket size : 3
>
> after:
> SymbolTable statistics:
> Number of buckets : 20011 = 320176 bytes, avg 16.000
> Number of entries : 2021 = 64672 bytes, avg 32.000
> Number of literals : 2021 = 87544 bytes, avg 43.317
> Total footprint : = 472392 bytes
> Average bucket size : 0.101
> Variance of bucket size : 0.100
> Std. dev. of bucket size: 0.316
> Maximum bucket size : 3
>
> About no. of literals, the calculation of the size is:
>
> template <class T, MEMFLAGS F> int RehashableHashtable<T,
> F>::literal_size(Symbol *symbol) {
> return symbol->size() * HeapWordSize;
> }
>
> And the size is just the _length of the symbol which I think is the
> length of the characters, not including Symbol itself. Cannot use
> this data to display how many bytes we saved.
This is not true. If it were true, you wouldn't have seen any changes
in the "average literal" size printed above.
Symbol::size() is defined to be this:
int size() { return size(utf8_length()); }
which calls the static Symbol::size(int) function, which includes the
size of the header.
> 3) correct a inaccurate calculation of size for Symbol in old version.
> size_t sz = heap_word_size(sizeof(SymbolBase) + (length > 0 ?
> length : 0));
> It will allocate one extra byte when length > 0 since we already
> reserve one byte for _body.
Your description is incorrect. In the old version, SymbolBase did not
include _body. Please see comments in JDK-8009575.
So why did we need SymbolBase for 8009575 (the previous version) but
don't need it for 8087143 (the current RFE)?
After the fix for 8009575, we needed 8 bytes of header for Symbol, so if
Symbol was declared as this:
class Symbol : public MetaspaceObj {
ATOMIC_SHORT_PAIR(
volatile short _refcount, // needs atomic operation
unsigned short _length // number of UTF8 characters in the
symbol (does not need atomic op)
);
int _identity_hash;
jbyte _body[1];
}
It would occupy 9 bytes. Most compilers would automatically pad it up to
12 bytes.
I could have written Symbol as the following to consider the
pad-to-12-bytes case:
class Symbol : public MetaspaceObj {
ATOMIC_SHORT_PAIR(
volatile short _refcount, // needs atomic operation
unsigned short _length // number of UTF8 characters in the
symbol (does not need atomic op)
);
int _identity_hash;
jbyte _body[4]; // align up to 12 bytes
static int size(int length) {
size_t sz = heap_word_size(sizeof(Symbol) + (length > 4 ?
length - 4 : 0));
return align_object_size(sz);
}
}
However, the amount of structure padding by the C compiler is not
well-defined. Although most of our compiler would pad to 12 bytes, some
compiler may decide to pad to 16 byte. While this is highly unlikely, I
didn't want to take the chance. Explicitly splitting out a SymbolBase
class, which is 8 bytes, would make our size calculation always correct.
Anyway, for the current fix, we reduced the Symbol header to only 6
bytes. Adding a body[2] will make sizeof(Symbol) to 8 bytes. Virtually
no compiler would add any padding to a 8-byte structure, so using
sizeof(Symbol) is good enough.
Please trace the Symbol::size(int) function inside gdb to verify that it
performs the calculation correctly.
> Now we save 3 bytes for every symbol (for symbol whose length is
> greater than 1).
> 4) remove java.cpp change, which is not needed, as Coleen pointed
> out, we can use -XX:+PrintStringTableStatistics to get this data,
> which also is a product flag.
> 5) for .jsa files:
> specjbb2005-org.jsa: 4026368
> specjbb2005.jsa: 4018176
> Saved about 8K.
>
Here is the output extracted from "java -Xshare:dump
-XX:+PrintSharedSpaces" on 64-bit Linux.
Before any changes:
Symbol : 50913 1925736 27.4 | 0 0 0.0
| 50913 1925736 10.5
After your webrev02/
Symbol : 50913 1889520 27.0 | 0 0 0.0
| 50913 1889520 10.3
After changing _body[1] => _body[2]
Symbol : 50913 1856248 26.6 | 0 0 0.0
| 50913 1856248 10.1
The average Symbol size changed from
37.82 bytes to 37.11 (-0.71) to 36.45 (-1.37)
As I mentioned in the bug report, the reduction (1.37) is smaller than
the theoretical value (2 bytes) because we have many small symbols, and
metaspace.cpp enforces a 24-byte minimum allocation size for each
MetaspaceObj on 64-bit platforms.
Thanks
- Ioi
> Thanks
> Yumin
>
>
> On 6/19/2015 11:21 AM, Coleen Phillimore wrote:
>>
>> Hi, This is okay, one comment below.
>>
>> On 6/19/15 12:21 PM, Yumin Qi wrote:
>>> Coleen,
>>>
>>> Thanks.
>>>
>>> On 6/19/2015 5:14 AM, Coleen Phillimore wrote:
>>>>
>>>> Yumin,
>>>>
>>>> If the identity hash isn't unique enough you get a system
>>>> dictionary warning like this, which doesn't stop the JVM. Could you
>>>> check your test logs to see if this warning is output in any of
>>>> them (jtreg keeps logs around, just checking the jtreg test results
>>>> should be sufficient, I think).
>>>>
>>>> Log : HotSpot(TM) Server VM warning: Performance bug:
>>>> SystemDictionary lookup_count=6918 lookup_length=3907
>>>> average=0.564759 load=0.266601
>>>>
>>>> I think with including _body[0] in the function you may make the
>>>> hash random enough. So this was one of my concerns with this change.
>>>>
>>> I will check that.
>>>> In
>>>> http://cr.openjdk.java.net/~minqi/8087143/webrev01/src/share/vm/runtime/java.cpp.udiff.html
>>>>
>>>> Are you sure you want all of the SymbolTable dump? Maybe it should
>>>> be under PrintSymbolTableSizeStatistics instead?
>>>>
>>> I will check and make change based on the output of statistics,
>>> maybe it is too much to print out all SymbolTable.
>>
>> It's way more than anyone needs normally. Maybe only print it if
>> one adds +Verbose to PrintStringTableSizeStatistics and when we do
>> logging, it'll be -Xlog:symboltable=trace or a lower logging level
>> than info.
>>
>> Thanks,
>> Coleen
>>>> In
>>>> http://cr.openjdk.java.net/~minqi/8087143/webrev01/src/share/vm/oops/symbol.hpp.udiff.html
>>>>
>>>> -// We separate the fields in SymbolBase from Symbol::_body so that
>>>> -// Symbol::size(int) can correctly calculate the space needed.
>>>>
>>>> I'm not sure what that comment meant. With SymbolBase gone (which
>>>> I think looks nicer), can you still correctly calculate the space
>>>> needed?
>>>>
>>> Yes, now the _body changed from 1 byte array to 2 bytes array.
>>> The original calculation is not correct I think: if symbol length >
>>> 0, we allocate *_ONE_* extra byte for it:
>>> - size_t sz = heap_word_size(sizeof(SymbolBase) + (length > 0 ? length : 0));
>>> + size_t sz = heap_word_size(sizeof(Symbol) + (length > 2 ? length - 2 : 0));
>>>
>>>> Other than these things, this change looks good. Nice to find some
>>>> space savings with something so frequent as symbols.
>>>>
>>>> Thanks,
>>>> Coleen
>>>>
>>>> On 6/18/15 8:40 PM, Yumin Qi wrote:
>>>>> Ioi,
>>>>>
>>>>> Thanks for review, I will add that (for print out purpose, not
>>>>> in the final change?)
>>>>>
>>>>> Yumin
>>>>>
>>>>> On 6/18/2015 3:34 PM, ioi.lam at oracle.com wrote:
>>>>>> Hi Yumin,
>>>>>>
>>>>>> I wonder why the average symbol size didn't change. They are both
>>>>>> about 34.25 bytes long.
>>>>>>
>>>>>> When I tested the change, -Xshare:dump -XX:+PrintSharedSpaces
>>>>>> show about 1.3 bytes reduction per symbol.
>>>>>>
>>>>>> Also, could you add some code to print out the statistics of the
>>>>>> system dictionary to show if there's any change in the hashtable
>>>>>> balance?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>> On Jun 18, 2015, at 1:46 PM, Yumin Qi <yumin.qi at oracle.com> wrote:
>>>>>>>
>>>>>>> Please review the small change for
>>>>>>>
>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8087143
>>>>>>> webrev: http://cr.openjdk.java.net/~minqi/8087143/webrev01/
>>>>>>>
>>>>>>> Summary: _identity_hash is an integer in Symbol (SymbolBase), it
>>>>>>> is used to compute hash bucket index by modulus division of
>>>>>>> table size. Currently in hotspot, no table size is more than
>>>>>>> 65535 so we can use short instead. For case with table size
>>>>>>> over 65535 we can use the first two bytes of symbol data to be
>>>>>>> as the upper 16 bits for the calculation but rare cases.
>>>>>>>
>>>>>>> This is a minor change, also eliminates SymbolBase which seems
>>>>>>> redundant. Also add print out SymbolTable statistics under
>>>>>>> debug flag PrintSystemDictionaryAtExit.
>>>>>>>
>>>>>>> Tests: JPRT, Specjbb2005.
>>>>>>>
>>>>>>> The SymbolTable statistics for specjbb2005:
>>>>>>>
>>>>>>> Before Change:
>>>>>>> SymbolTable statistics:
>>>>>>> Number of buckets : 20011 = 320176 bytes, avg 16.000
>>>>>>> Number of entries : 49914 = 1597248 bytes, avg 32.000
>>>>>>> Number of literals : 49914 = 1709688 bytes, avg 34.253
>>>>>>> Total footprint : = 3627112 bytes
>>>>>>> Average bucket size : 2.494
>>>>>>> Variance of bucket size : 2.483
>>>>>>> Std. dev. of bucket size: 1.576
>>>>>>> Maximum bucket size : 10
>>>>>>>
>>>>>>> After Change:
>>>>>>>
>>>>>>> SymbolTable statistics:
>>>>>>> Number of buckets : 20011 = 320176 bytes, avg 16.000
>>>>>>> Number of entries : 49911 = 1597152 bytes, avg 32.000
>>>>>>> Number of literals : 49911 = 1709544 bytes, avg 34.252
>>>>>>> Total footprint : = 3626872 bytes
>>>>>>> Average bucket size : 2.494
>>>>>>> Variance of bucket size : 2.483
>>>>>>> Std. dev. of bucket size: 1.576
>>>>>>> Maximum bucket size : 10
>>>>>>>
>>>>>>> There is no big change for the hashtable balance.
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Yumin
>>>>>
>>>>
>>>
>>
>
More information about the hotspot-runtime-dev
mailing list