compressed oops and 64-bit header words

Vladimir Kozlov Vladimir.Kozlov at Sun.COM
Mon May 5 16:15:46 PDT 2008


Coleen is right. In the implementation we are working on, the shift
and decode/encode instructions "will" fold into the address
expression. So the "only" penalty you will pay is the additional
memory for the 64-bit mark words. And on x86 you will still win big
even with this penalty, since 64-bit mode gives you more registers
for local values (fewer stack/memory accesses).
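
To illustrate what "folds into the address expression" means, here is a
minimal sketch of the decode step, using hypothetical names (narrowOop,
heap_base) rather than the real VM declarations:

    #include <cstdint>

    // Illustrative types only; the actual HotSpot types differ.
    typedef uint32_t narrowOop;   // compressed reference as stored in the heap
    typedef char*    oop;         // full-width reference used by generated code

    static char* heap_base;       // assumed start of the reserved Java heap

    // Decode: widen the 32-bit value, scale it by the 8-byte object
    // alignment, then add the heap base.
    inline oop decode(narrowOop v) {
      return heap_base + ((uintptr_t)v << 3);
    }

On x86-64 the base register and the scale-by-8 can be expressed directly
in the addressing mode of a load or store, so a field access through a
compressed oop need not emit separate shift/add instructions.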

Thanks,
Vladimir

Coleen Phillimore wrote:
> 
> Hi,
> It made sense when I first read it, but in order to have 32-bit pointers
> in #3, I can't imagine not having to encode and decode them against some
> heap base in order to dereference them, so the only difference between
> #2 and #3 is the shift instruction needed to reach 32G.  We didn't
> believe that the shift causes much of a performance penalty, so we
> didn't implement it this way.  We would like to measure this at some
> point though, and if it is faster we could add this mode fairly easily.
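
A rough sketch of the difference being discussed, under the assumption
that #3 keeps a (possibly zero) heap base and simply drops the shift;
the names are illustrative, not the real VM code:

    #include <cstdint>

    typedef uint32_t narrowOop;   // illustrative only
    static char* heap_base;       // assumed start of the reserved heap

    // Mode #2 (heap up to 32G): scale by 8, then add the base.
    inline char* decode_shifted(narrowOop v) {
      return heap_base + ((uintptr_t)v << 3);
    }

    // Hypothetical mode #3 (whole heap below 4G): the 32-bit value is
    // already a byte offset, so only the base add remains.
    inline char* decode_unshifted(narrowOop v) {
      return heap_base + (uintptr_t)v;
    }

Since the shift normally folds into the addressing mode anyway, whether
dropping it is measurable is exactly the question left open above.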
> 
> thanks!
> Coleen
> 
> Dan Grove wrote:
>> Thanks Coleen and Vladimir-
>>
>> What I'm wondering is whether there could be a third mode:
>>
>> 1. Xmx > 32GB - uses uncompressed pointers
>> 2. (something less than 4GB) < Xmx < 32GB - uses compressed pointers
>> (along with 64-bit mark word), 64-bit ABI
>> 3. whole app fits in 4GB - uses 32-bit pointers in heap, but 64-bit ABI.
>>
>> The idea here is that I'd prefer to pay no penalty over 32-bit when my
>> app runs in 64-bit mode and the app fits in 4GB of memory (my reason
>> for this is that I want to support our JNI libraries only in 64-bit
>> mode, and deprecate the 32-bit JNI libraries).
>>
>> Does this make any sense to you?
>>
>> Dan
>>
>> On Mon, May 5, 2008 at 12:20 PM, Coleen Phillimore - Sun Microsystems
>> <Coleen.Phillimore at sun.com> wrote:
>>  
>>>  Actually, we are using the gap for a field and for the array length in
>>> the code now, but the code Vladimir showed me makes the allocation code
>>> a lot cleaner for the instance field case.
>>>
>>>  In the array case in 64 bits, compressing the _klass pointer into 32
>>> bits allows us to move the _length field into the other 32 bits, which,
>>> because of alignment, saves 64 bits.  If the klass pointer is not
>>> compressed, there is a 32-bit alignment gap after the _length field.
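
A rough picture of the array header savings being described, using
illustrative struct declarations rather than the real HotSpot headers:

    #include <cstdint>

    // Illustrative layouts only; the actual VM declarations differ.

    struct ArrayHeaderUncompressed {   // 64-bit VM, no compressed oops
      uint64_t mark;                   // mark word
      void*    klass;                  // 64-bit klass pointer
      uint32_t length;                 // array length
      // 32-bit alignment gap here before the elements
    };                                 // 24 bytes of header

    struct ArrayHeaderCompressed {     // 64-bit VM, compressed klass
      uint64_t mark;                   // mark word (still 64 bits)
      uint32_t klass;                  // compressed klass pointer
      uint32_t length;                 // length moves into the old gap
    };                                 // 16 bytes of header: 64 bits saved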
>>>
>>>  The mark word can also contain a forwarding pointer used during GC, so
>>> it can't be 32 bits.
>>>
>>>  The compression that we use allows for 32G because we shift the three
>>> zero alignment bits out of the least significant positions - the
>>> algorithm is (ptr - heap_base) >> 3.
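
A minimal sketch of that encoding and the arithmetic behind the 32G
limit, again with hypothetical names:

    #include <cstdint>

    static char* heap_base;   // assumed start of the reserved heap

    // Encode: subtract the base, then drop the three zero bits that
    // 8-byte object alignment guarantees.
    inline uint32_t encode(char* p) {
      return (uint32_t)((uintptr_t)(p - heap_base) >> 3);
    }

    // 32 bits of compressed value, each unit worth 8 bytes:
    // 2^32 * 8 = 2^35 bytes = 32G of addressable heap.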
>>>
>>>  Coleen
>>>
>>>
>>>
>>>  Vladimir Kozlov wrote:
>>>
>>>    
>>>> Dan,
>>>>
>>>> Only the mark word is 64 bits. The klass pointer is 32 bits, but in
>>>> the current implementation the gap after the klass is not used.
>>>>
>>>> I am working on using the gap for a field or an array's length.
>>>>
>>>> The mark word may contain a 64-bit thread pointer (for Biased Locking).
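
For the instance case, a similarly rough sketch of the header layout
being described, with illustrative names only:

    #include <cstdint>

    // Illustrative only; the real HotSpot object header layout differs.
    struct ObjectHeaderCompressed {
      uint64_t mark;    // 64-bit mark word; may hold a thread pointer
                        // when Biased Locking is in use
      uint32_t klass;   // compressed klass pointer
      uint32_t gap;     // unused here; a 32-bit field or an array
                        // length could be placed here to reclaim it
    };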
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> Dan Grove wrote:
>>>>
>>>>      
>>>>> Hi-
>>>>>
>>>>> I talked some with Nikolay Igotti about compressed oops in OpenJDK7.
>>>>> He tells me that the mark word and class pointer remain 64 bits when
>>>>> compressed oops are being used. It seems that this leaves a fair
>>>>> amount of the bloat in place when moving from 32->64 bits.
>>>>>
>>>>> I'm interested in deprecating 32-bit VMs at my employer at some
>>>>> point. Doing this is going to require that 64-bit VMs have as little
>>>>> bloat as possible. Has there been any consideration of making the mark
>>>>> word and class pointer 32 bits in cases where the VM fits within 4GB?
>>>>> It seems like this would be a major win. A second benefit here is that
>>>>> the "add and shift" currently required on dereference of compressed
>>>>> oops could be eliminated in cases where the VM fits inside 4GB.
>>>>>
>>>>> Dan
>>>>>
>>>>>         
> 
