compressed oops and 64-bit header words
Clemens Eisserer
linuxhippy at gmail.com
Fri May 9 03:54:59 PDT 2008
> Thanks Vladimir - I didn't realize that the extra 32 bits were being
> used for a field. This is work that we're considering doing - mostly,
> I wanted to hear feedback, and find out whether you were already doing
> this.
>
> So the real question from my standpoint is what we're missing when we
> think about this, and whether it's viable at all.
Wouldn't it be great if Google would dedicate resources to HotSpot's
development?
Thank god HotSpot does not have a public interface; who knows whether
it would lead to another Android ;)
lg Clemens
>
> Dan
>
> On Thu, May 8, 2008 at 8:12 AM, Vladimir Kozlov <Vladimir.Kozlov at sun.com> wrote:
>> Dan,
>>
>> It is not 2 64-bit words, it is 1 and a half :)
>> since the klass is 32 bits and we use the other 32 bits for a field.
>> So the overhead is only 4 bytes. Also don't forget that
>> all objects are aligned to 8 bytes in the heap even
>> in the 32-bit VM, so the average overhead will be less.
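>> A rough sketch of the two layouts I mean (illustrative only, with made-up
>> names, not the real HotSpot declarations):
>>
>>   #include <cstdint>
>>
>>   // 32-bit VM: 8-byte object header.
>>   struct Header32 {
>>     uint32_t mark;          // mark word
>>     uint32_t klass;         // klass pointer
>>   };
>>
>>   // 64-bit VM with compressed oops: the narrow klass leaves a 4-byte
>>   // gap that can hold the first instance field (or the array length),
>>   // so the cost over 32-bit is only the wider mark word, i.e. 4 bytes
>>   // before 8-byte alignment rounding.
>>   struct Header64Compressed {
>>     uint64_t mark;          // mark word, still 64 bits
>>     uint32_t narrow_klass;  // compressed klass pointer
>>     uint32_t gap;           // first field / array length can go here
>>   };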
>>
>> I want to be clear that it is not that we are totally against
>> your suggestion. It is the resources needed to implement it
>> which we don't have currently.
>> On the other hand, the VM is open source now, so you or your colleagues
>> can do it and help us all.
>>
>> Thanks,
>> Vladimir
>>
>> Dan Grove wrote:
>>>
>>> Thanks Vladimir. I'm still worried about the memory bloat from having
>>> (effectively) 2 64-bit words in the object header, rather than 2 32-bit
>>> words. If we consider an average (non-array) object size around 30-40 bytes,
>>> this is a significant overhead. It seems that if users were willing to
>>> declare that they were running inside a 4GB virtual address space (and in my
>>> case, users would be willing to do so in order to avoid memory bloat), we
>>> should be able to do this.
>>>
>>> On Linux, I believe that if the process were running under a "ulimit -v
>>> XXXX" shell, we could guarantee that all addresses would fit in 32 bits,
>>> even for a 64-bit VM. Do you agree that this would make sense?
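>>> Roughly what I have in mind, as a sketch (the names and the 4GB check are
>>> mine, not anything in the VM today):
>>>
>>>   #include <sys/resource.h>
>>>   #include <cstdint>
>>>
>>>   // If the address-space limit set by "ulimit -v" is 4GB or less,
>>>   // every virtual address the process can map fits in 32 bits.
>>>   static bool fits_in_32bit_address_space() {
>>>     rlimit rl;
>>>     if (getrlimit(RLIMIT_AS, &rl) != 0) {
>>>       return false;  // can't tell, so be conservative
>>>     }
>>>     const uint64_t four_gb = uint64_t(4) * 1024 * 1024 * 1024;
>>>     return rl.rlim_cur != RLIM_INFINITY && rl.rlim_cur <= four_gb;
>>>   }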
>>>
>>> Dan
>>>
>>> 2008/5/5 Vladimir Kozlov <Vladimir.Kozlov at sun.com
>>> <mailto:Vladimir.Kozlov at sun.com>>:
>>> > Dan,
>>> >
>>> > Thank you for the paper.
>>> > I think the benefit they have with the compressed header comes
>>> > mostly from a compressed vtable pointer, which in our VM corresponds
>>> > to the klass pointer, which is also compressed.
>>> > So in this sense we also have a compressed header.
>>> >
>>> > I can not say what performance benefit we have now with
>>> > compressed oops, since the generated code for klass pointer
>>> > loads/stores is currently not what we would like to have
>>> > (and we are working to improve it).
>>> >
>>> > I doubt that a compressed mark word will make a big difference.
>>> > But I may be wrong.
>>> >
>>> >
>>> >
>>> > Thanks,
>>> > Vladimir
>>> >
>>> > Dan Grove wrote:
>>> >
>>> > > Hi Coleen-
>>> > >
>>> > > I'm not worried about the shift instruction - I agree that it's
>>> > > unlikely to matter. What I am worried about is having the standard
>>> > > object header be 2 64-bit words (well, 1 64-bit word, 1 32-bit
>>> > > word, and 32 bits of pad).
>>> > >
>>> > > What I'm worried about is the increase in memory footprint and its
>>> > > impact on performance. I was pointed to
>>> > > http://ieeexplore.ieee.org/iel5/9012/28612/01281667.pdf?arnumber=1281667
>>> > > , which (conveniently) breaks out the performance impact of
>>> > > compressing the header versus compressing references versus both.
>>> > >
>>> > > So what I would really be interested in would be a way to have both the
>>> > > pointers/words in the header and the oops be 32 bits. I think this
>>> > > would be a good win, when coupled with the extra registers when using
>>> > > the 64-bit ABI.
>>> > >
>>> > > Dan
>>> > >
>>> > > On Mon, May 5, 2008 at 3:47 PM, Coleen Phillimore
>>> > > <Coleen.Phillimore at sun.com <mailto:Coleen.Phillimore at sun.com>> wrote:
>>> > >
>>> > > > Hi,
>>> > > > It made sense when I first read it, but in order to have 32-bit pointers in
>>> > > > #3, I can't imagine not having to encode and decode them by some heap base
>>> > > > in order to dereference these pointers, so the only difference between #2
>>> > > > and #3 is the shift instruction to get to 32G. We didn't believe that the
>>> > > > shift causes much of a performance penalty, so we didn't implement it this
>>> > > > way. We would like to measure this at some point though, and if it is
>>> > > > faster we could add this mode fairly easily.
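>>> > > > In other words, the two decodes would only differ by the shift; roughly
>>> > > > (a sketch, not the actual generated code):
>>> > > >
>>> > > >   #include <cstdint>
>>> > > >
>>> > > >   // Mode #2: heap up to 32G, compressed value is (ptr - heap_base) >> 3.
>>> > > >   static inline void* decode_with_shift(uint32_t v, char* heap_base) {
>>> > > >     return heap_base + (uintptr_t(v) << 3);
>>> > > >   }
>>> > > >
>>> > > >   // Mode #3: heap fits in 4G, same decode minus the shift.
>>> > > >   static inline void* decode_without_shift(uint32_t v, char* heap_base) {
>>> > > >     return heap_base + v;
>>> > > >   }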
>>> > > >
>>> > > > thanks!
>>> > > > Coleen
>>> > > >
>>> > > >
>>> > > >
>>> > > > Dan Grove wrote:
>>> > > >
>>> > > >
>>> > > > > Thanks Coleen and Vladimir-
>>> > > > >
>>> > > > > What I'm wondering is whether there could be a third mode:
>>> > > > >
>>> > > > > 1. > 32GB - uses uncompressed pointers
>>> > > > > 2. (something less than 4GB) < Xmx < 32GB - uses compressed pointers
>>> > > > >    (along with 64-bit mark word), 64-bit ABI
>>> > > > > 3. whole app fits in 4GB - uses 32-bit pointers in heap, but 64-bit ABI.
>>> > > > >
>>> > > > > The idea here is that I'd prefer to pay no penalty over 32-bit when my
>>> > > > > app runs in 64-bit mode and the app fits in 4GB of memory (my reason
>>> > > > > for this is that I want to support our JNI libraries only in 64-bit
>>> > > > > mode, and deprecate the 32-bit JNI libraries).
>>> > > > >
>>> > > > > Does this make any sense to you?
>>> > > > >
>>> > > > > Dan
>>> > > > >
>>> > > > > On Mon, May 5, 2008 at 12:20 PM, Coleen Phillimore - Sun Microsystems
>>> > > > > <Coleen.Phillimore at sun.com <mailto:Coleen.Phillimore at sun.com>> wrote:
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > > Actually, we are using the gap for a field and array length in the code
>>> > > > > > now, but the code Vladimir showed me makes the allocation code a lot cleaner
>>> > > > > > for the instance field case.
>>> > > > > >
>>> > > > > > In the array case in 64 bits, compressing the _klass pointer into 32 bits
>>> > > > > > allows us to move the _length field into the other 32 bits, which because of
>>> > > > > > alignment saves 64 bits. There was a 32 bit alignment gap after the _length
>>> > > > > > field, if not compressed with the klass pointer.
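>>> > > > > > Schematically (just a sketch of the two array layouts, with made-up names,
>>> > > > > > not the real declarations):
>>> > > > > >
>>> > > > > >   #include <cstdint>
>>> > > > > >
>>> > > > > >   // Uncompressed 64-bit array header: 8 (mark) + 8 (klass) + 4 (length)
>>> > > > > >   // + 4 (alignment gap before the elements) = 24 bytes.
>>> > > > > >   struct WideArrayHeader {
>>> > > > > >     uint64_t mark;
>>> > > > > >     uint64_t klass;
>>> > > > > >     uint32_t length;
>>> > > > > >     uint32_t pad;        // the 32-bit gap after _length
>>> > > > > >   };
>>> > > > > >
>>> > > > > >   // Compressed klass: _length moves into the other 32 bits,
>>> > > > > >   // 8 (mark) + 4 (narrow klass) + 4 (length) = 16 bytes, saving 64 bits.
>>> > > > > >   struct NarrowArrayHeader {
>>> > > > > >     uint64_t mark;
>>> > > > > >     uint32_t narrow_klass;
>>> > > > > >     uint32_t length;
>>> > > > > >   };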
>>> > > > > >
>>> > > > > > The mark word can also contain a forwarding pointer used during GC, so
>>> > > > > > can't be 32 bits.
>>> > > > > >
>>> > > > > > The compression that we use allows for 32G because we shift into the least
>>> > > > > > significant bits - the algorithm is (ptr-heap_base)>>3.
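>>> > > > > > As a sketch (my names, not the VM code), that is 2^32 compressed values
>>> > > > > > times the 8-byte object alignment, hence 32G of coverage:
>>> > > > > >
>>> > > > > >   #include <cstdint>
>>> > > > > >
>>> > > > > >   // 2^32 values * 8-byte alignment = 32G of addressable heap.
>>> > > > > >   static inline uint32_t encode_oop(char* p, char* heap_base) {
>>> > > > > >     return uint32_t((p - heap_base) >> 3);
>>> > > > > >   }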
>>> > > > > >
>>> > > > > > Coleen
>>> > > > > >
>>> > > > > >
>>> > > > > >
>>> > > > > > Vladimir Kozlov wrote:
>>> > > > > >
>>> > > > > >
>>> > > > > >
>>> > > > > >
>>> > > > > > > Dan,
>>> > > > > > >
>>> > > > > > > Only the mark word is 64 bits. The klass pointer is 32 bits, but
>>> > > > > > > in the current implementation the gap after the klass is not used.
>>> > > > > > >
>>> > > > > > > I am working on using the gap for a field or the array's length.
>>> > > > > > >
>>> > > > > > > The mark word may contain a 64-bit thread pointer (for Biased Locking).
>>> > > > > > >
>>> > > > > > > Thanks,
>>> > > > > > > Vladimir
>>> > > > > > >
>>> > > > > > > Dan Grove wrote:
>>> > > > > > >
>>> > > > > > >
>>> > > > > > >
>>> > > > > > >
>>> > > > > > > > Hi-
>>> > > > > > > >
>>> > > > > > > > I talked some with Nikolay Igotti about compressed oops in
>>> > > > > > > > OpenJDK7. He tells me that the mark word and class pointer remain 64
>>> > > > > > > > bits when compressed oops are being used. It seems that this leaves a
>>> > > > > > > > fair amount of the bloat in place when moving from 32->64 bits.
>>> > > > > > > >
>>> > > > > > > > I'm interested in deprecating 32-bit VMs at my employer at some
>>> > > > > > > > point. Doing this is going to require that 64-bit VMs have as little
>>> > > > > > > > bloat as possible. Has there been any consideration of making the mark
>>> > > > > > > > word and class pointer 32 bits in cases where the VM fits within 4GB?
>>> > > > > > > > It seems like this would be a major win. A second benefit here is that
>>> > > > > > > > the "add and shift" currently required on dereference of compressed
>>> > > > > > > > oops could be eliminated in cases where the VM fits inside 4GB.
>>> > > > > > > >
>>> > > > > > > > Dan
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > > >
>>> > >
>>> >
>>>
>>
>