RFR (S) JDK-8009575 - Reduce Symbol::_refcount from 4 bytes to 2 bytes
Ioi Lam
ioi.lam at oracle.com
Tue Mar 26 22:05:51 PDT 2013
On 03/26/2013 09:24 PM, David Holmes wrote:
> Hi Ioi,
>
> First your frames view in the webrev continues to be broken (missing
> file) :(
>
I am not sure what's wrong. I'll try to update to a new webrev version.
I always use "sdiffs" so I wasn't too bothered by it :-)
> On 27/03/2013 7:03 AM, Ioi Lam wrote:
>> Please review:
>>
>> http://cr.openjdk.java.net/~iklam/8009575/symbol_refcount_001/
>> <http://cr.openjdk.java.net/%7Eiklam/8009575/symbol_refcount_001/>
>
> Looking at generateJvmOffsets.cpp:
>
> I don't understand this part of the code so can't comment.
>
> Looking at vmStructs.cpp:
>
> I don't see any SA changes so I assume these values never actually get
> accessed ?? How do you verify this part of the change?
>
The SA .java code does not expose the _refcount field. So the only
change visible to SA is the offset of _length has changed (but size of
_length remains 2 bytes). I will try to do some SA sanity tests, but it
will probably work (famous last words).
>> Bug: Reduce Symbol::_refcount from 4 bytes to 2 bytes
>>
>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8009575
>> https://jbs.oracle.com/bugs/browse/JDK-8009575
>>
>>
>> Summary of fix:
>>
>> (1) Symbol::_refcount has been changed to a short.
>>
>> Note that _refcount needs to be incremented atomically using
>> Atom::add, which requires a 32-bit word in memory. So I glue the
>> _refcount and _length fields together using union and ENDIAN
>> tricks, which is kind of ugly but faster.
>>
>> The prettier alternative would be to use shift and masking to
>> access the _refcount and _length components, but performance
>> would be slightly slower.
>>
>> If anyone has a more elegant solution, please let me know.
>
> Not sure why shift and mask would be slower as the compiler will have
> to generate shifts and masks anyway ??
Many (all?) architectures have 2-byte loads, so loading a two byte
u.s._length would be a single LOAD instruction. In contrast, loading a
32-bit word and then masking off the upper 16 bits would take 2 ~ 3
instructions. Some compilers (such as gcc) would be smart enough to
generate a 2-byte LOAD instead, but I don't want to count on that.
There's also the fact that _refcount is volatile but _length is not. So
always combining them in a 32-bit volatile may make reading of _length
more expensive.
>
>> #------------------------------------------------------
>> union {
>> volatile int _refcount_and_length;
>> struct {
>> #ifdef VM_LITTLE_ENDIAN
>> unsigned short _length; // number of UTF8 characters in the
>> symbol
>> volatile short _refcount;
>> #else
>> volatile short _refcount;
>> unsigned short _length; // number of UTF8 characters in the
>> symbol
>> #endif
>> } s;
>> } u;
>>
>> void Symbol::increment_refcount() {
>> if (u.s._refcount >= 0) {
>> // Adding 0x10000 to _refcount_and_length is equivalent to
>> adding 0x1 to
>> // _refcount (w.r.t overflow/underflow).
>> (void)Atomic::add(0x10000, &u._refcount_and_length);
>> ....
>> }
>> #------------------------------------------------------
>
> As per side email a key enabler here is that _length is immutable once
> set.
>
> I was going to suggest that the fields be swapped so that we just
> add/subtract 1, but I see now that the order is necessary to ensure
> the overflow detection works without modifying _length.
>
This is correct. Underflowing (from 0 to -1) shouldn't happen, unless
there is a coding error. However, today such coding error would just
leave the Symbol pinned forever (_refcount == -1). However, with the new
code (if _refcount were placed at lower 16 bit), then underflowing would
cause _length to be corrupt, and may cause more severe problems. So I
decided to put refcount at the upper 16 bit just for bug-for-bug
compatibility (if any) as today's code.
>> (2) Symbol::size(int) was wasting one byte per Symbol instance. I fixed
>> it by
>> splitting part of the fields from Symbol into SymbolBase.
>
> I don't understand this part. What were we wasting? And given
> alignment constraints did we in fact waste it?
>
The old code was
class Symbol : public MetaspaceObj {
int _refcount;
int _identity_hash;
unsigned short _length;
jbyte _body[1];
static int size(int length) {
size_t sz = heap_word_size(sizeof(Symbol) + (length > 0 ? length -
1 : 0));
return align_object_size(sz);
}
It assumes that sizeof(Symbol) is 11 bytes (2 ints + 1 short + 1 byte),
hence the "length - 1". I.e., if length == 2, size() => 11 + 2 - 1 = 12.
However, on most platforms, sizeof(Symbol) == 12, so we get (12 + 2 - 1)
= 13 bytes.
Hence, while a 2-byte symbol could be stored in 12 bytes, we would
allocate 13 bytes (rounded up to 16) for the Symbol.
On average, we would waste one byte per symbol -- we have a more-or-less
even distribution of (length % 4).
>> Footprint result:
>>
>> The theoretical reduction is 3 bytes per symbol -- 2 bytes
>> from _refcount, and 1 byte from the fixed Symbol::size(int)
>> function.
>> In actual measurement, CDS image (about 3000 classes, with about
>> 50000 symbols) is reduced by about 140KB, or about 2.8 bytes per
>> Symbol, or about 0.6% of the whole CDS image.
>
> I'm not sure that makes sense. I would expect you to save 0 bytes or 4
> bytes depending on alignment (assuming the Symbol is 32-bit aligned to
> start with).
You're correct that sometimes you save 4 bytes, sometimes none. Due to
the even distribution of (length % 4), on average you save ~ 3 bytes per
Symbol.
Thanks
- Ioi
>
>
> Thanks,
> David
>
>>
>> Testing:
>>
>> * UTE (vm.runtime.testlist, vm.quick.testlist
>> vm.parallel_class_loading.testlist)
>>
>> Thanks,
>> Ioi
>>
More information about the hotspot-runtime-dev
mailing list