compressed oops and 64-bit header words

Wed May 7 23:28:53 PDT 2008

Thanks Vladimir. I'm still worried about the memory bloat from having 
(effectively) 2 64-bit words in the object header, rather than 2 32-bit 
words. If we consider an average (non-array) object size of around 30-40 
bytes, this is a significant overhead. It seems that if users were willing 
to declare that they were running inside a 4GB virtual address space (and in 
my case, users would be willing to do so in order to avoid memory bloat), we 
should be able to keep both header words at 32 bits. 
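
To put rough numbers on that overhead, here is a small Python sketch of the
header arithmetic. It assumes the 30-40 byte figure is the total object size
on a 32-bit VM, that each header field is aligned to its own size, and that
the header is padded to the word size. The function names and layout lists
are illustrative only, not HotSpot code.

```python
def header_bytes(field_sizes, word_size):
    """Return the header size in bytes for fields laid out in order.

    Assumption: each field starts at a multiple of its own size, and the
    header ends on a word_size boundary.
    """
    offset = 0
    for size in field_sizes:
        offset = (offset + size - 1) // size * size  # round up to field alignment
        offset += size
    return (offset + word_size - 1) // word_size * word_size  # pad to a whole word


# Hypothetical layouts, listed as [mark word, klass pointer] sizes in bytes.
HEADER_32BIT = header_bytes([4, 4], word_size=4)             # 32-bit VM
HEADER_64BIT = header_bytes([8, 8], word_size=8)             # 64-bit, uncompressed
HEADER_64BIT_COMPRESSED = header_bytes([8, 4], word_size=8)  # 64-bit mark, 32-bit klass, pad


def header_growth_percent(object_bytes_32bit):
    """Extra header bytes on 64-bit, as a percentage of the 32-bit object size."""
    extra = HEADER_64BIT_COMPRESSED - HEADER_32BIT
    return 100.0 * extra / object_bytes_32bit


if __name__ == "__main__":
    for size in (30, 40):
        growth = header_growth_percent(size)
        print(f"{size}-byte object: +{growth:.1f}% from the header alone")
```

Under these assumptions the header grows by 8 bytes, whether or not the klass
pointer is compressed, unless the 32-bit pad is reused for a field.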

On Linux, I believe that if the process were running in a shell with 
"ulimit -v XXXX" set, we could guarantee that all addresses would fit in 
32 bits, even for a 64-bit VM. Do you agree that this would make sense?
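
A minimal sketch of the check such a VM might make at startup, written in
Python for brevity. It assumes "ulimit -v" maps to the RLIMIT_AS resource
limit (as it does in bash on Linux, where the shell value is in KiB and
getrlimit reports bytes); the helper names are hypothetical. Note that the
limit caps the total size of the address space rather than where mappings are
placed, so on its own it is a precondition, not a guarantee, that every
address fits in 32 bits.

```python
import resource  # Unix-only standard library module

FOUR_GIB = 1 << 32


def limit_fits_in_32_bits(soft_limit_bytes):
    """True if an address-space limit is finite and no larger than 4 GiB."""
    if soft_limit_bytes == resource.RLIM_INFINITY:
        return False  # "ulimit -v unlimited": nothing has been declared
    return 0 < soft_limit_bytes <= FOUR_GIB


def process_declares_4gib_space():
    """Check the current process's soft RLIMIT_AS against the 4 GiB bound."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_AS)
    return limit_fits_in_32_bits(soft)


if __name__ == "__main__":
    print("address space limited to 4 GiB:", process_declares_4gib_space())
```

For example, a shell started with "ulimit -v 4194304" (4 GiB expressed in
KiB) would pass this check.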

Dan

2008/5/5 Vladimir Kozlov <Vladimir.Kozlov at sun.com>:
> Dan,
> 
> Thank you for the paper.
> I think the benefit they have with the compressed header comes
> mostly from a compressed vtable pointer, which in our VM corresponds
> to the klass pointer, which is also compressed.
> So in this sense we also have a compressed header.
> 
> I cannot say what performance benefit we have now with
> compressed oops, since the generated code for klass pointer
> loads/stores is currently not what we would like to have
> (and we are working to improve it).
> 
> I doubt that a compressed mark word will make a big difference.
> But I may be wrong.
> 
> 
> 
> Thanks,
> Vladimir
> 
> Dan Grove wrote:
> 
> > Hi Coleen-
> > 
> > I'm not worried about the shift instruction - I agree that it's
> > unlikely to matter. What I am worried about is having the standard
> > object header take up 2 64-bit words (well, 1 64-bit word, 1 32-bit
> > word, and 32 bits of pad).
> > 
> > What I'm worried about is the increase in memory footprint and its
> > impact on performance. I was pointed to
> > http://ieeexplore.ieee.org/iel5/9012/28612/01281667.pdf?arnumber=1281667
> > , which (conveniently) breaks out the performance impact of
> > compressing the header versus compressing references versus both.
> > 
> > So what I would really be interested in would be a way to have both the
> > pointers/words in the header and the oops be 32 bits. I think this
> > would be a good win, when coupled with the extra registers when using
> > the 64-bit ABI.
> > 
> > Dan
> > 
> > On Mon, May 5, 2008 at 3:47 PM, Coleen Phillimore
> > <Coleen.Phillimore at sun.com> wrote:
> > 
> > > Hi,
> > > > It made sense when I first read it but in order to have 32 bit pointers in
> > > > #3, I can't imagine not having to encode and decode them by some heap base
> > > > in order to dereference these pointers, so the only difference between #2
> > > > and #3 is the shift instruction to get to 32G. We didn't believe that the
> > > > shift causes much of a performance penalty so we didn't implement it this
> > > > way. We would like to measure this at some point though, and if it is
> > > > faster could add this mode fairly easily.
> > > 
> > > thanks!
> > > Coleen
> > > 
> > > 
> > > 
> > > Dan Grove wrote:
> > > 
> > > 
> > > > > Thanks Coleen and Vladimir-
> > > > 
> > > > What I'm wondering is whether there could be a third mode:
> > > > 
> > > > 1. > 32GB - uses uncompressed pointers
> > > > 2. (something less than 4GB) < Xmx < 32GB - uses compressed pointers
> > > > (along with 64-bit mark word), 64-bit ABI
> > > > > 3. whole app fits in 4GB - uses 32-bit pointers in heap, but 64-bit ABI.
> > > > > 
> > > > > The idea here is that I'd prefer to pay no penalty over 32-bit when my
> > > > app runs in 64-bit mode and the app fits in 4GB of memory (my reason
> > > > for this is that I want to support our JNI libraries only in 64-bit
> > > > mode, and deprecate the 32-bit JNI libraries).
> > > > 
> > > > Does this make any sense to you?
> > > > 
> > > > Dan
> > > > 
> > > > > On Mon, May 5, 2008 at 12:20 PM, Coleen Phillimore - Sun Microsystems
> > > > > <Coleen.Phillimore at sun.com> wrote:
> > > > > 
> > > > > > Actually, we are using the gap for a field and array length in the code
> > > > > > now, but the code Vladimir showed me makes the allocation code a lot cleaner
> > > > > > for the instance field case.
> > > > > > 
> > > > > > In the array case in 64 bits, compressing the _klass pointer into 32 bits
> > > > > > allows us to move the _length field into the other 32 bits, which because of
> > > > > > alignment saves 64 bits. There was a 32 bit alignment gap after the _length
> > > > > > field, if not compressed with the klass pointer.
> > > > > > 
> > > > > > The mark word can also contain a forwarding pointer used during GC, so
> > > > > > can't be 32 bits.
> > > > > > 
> > > > > > The compression that we use allows for 32G because we shift into the least
> > > > > > significant bits - the algorithm is (ptr-heap_base)>>3.
> > > > > 
> > > > > Coleen
> > > > > 
> > > > > 
> > > > > 
> > > > > Vladimir Kozlov wrote:
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > > Dan,
> > > > > > 
> > > > > > > Only the mark word is 64 bits. The klass pointer is 32 bits, but
> > > > > > > in the current implementation the gap after klass is not used.
> > > > > > > 
> > > > > > > I am working on using the gap for a field or an array's length.
> > > > > > > 
> > > > > > > The mark word may contain a 64-bit thread pointer (for Biased
> > > > > > > Locking).
> > > > > > > 
> > > > > > Thanks,
> > > > > > Vladimir
> > > > > > 
> > > > > > Dan Grove wrote:
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > Hi-
> > > > > > > 
> > > > > > > > I talked some with Nikolay Igotti about compressed oops in
> > > > > > > > OpenJDK7. He tells me that the mark word and class pointer remain 64
> > > > > > > > bits when compressed oops are being used. It seems that this leaves a
> > > > > > > > fair amount of the bloat in place when moving from 32->64 bits.
> > > > > > > > 
> > > > > > > > I'm interested in deprecating 32-bit VMs at my employer at some
> > > > > > > > point. Doing this is going to require that 64-bit VMs have as little
> > > > > > > > bloat as possible. Has there been any consideration of making the mark
> > > > > > > > word and class pointer 32 bits in cases where the VM fits within 4GB?
> > > > > > > > It seems like this would be a major win. A second benefit here is that
> > > > > > > > the "add and shift" currently required on dereference of compressed
> > > > > > > > oops could be eliminated in cases where the VM fits inside 4GB.
> > > > > > > > 
> > > > > > > > Dan
> 
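
The (ptr-heap_base)>>3 scheme Coleen describes in the quoted thread, and the
shift-free 4GB mode Dan asks about, can be sketched as follows. This is an
illustration in Python using plain integers as addresses, not HotSpot code:
encode, decode, max_heap_bytes, and any heap base passed to them are made-up
names and values, and the 8-byte object alignment is inferred from the shift
of 3.

```python
OOP_BITS = 32  # width of a compressed reference


def encode(ptr, heap_base, shift=3):
    """Compress an address to a 32-bit value: (ptr - heap_base) >> shift."""
    offset = ptr - heap_base
    if offset < 0 or offset % (1 << shift):
        raise ValueError("address below heap base or not aligned")
    narrow = offset >> shift
    if narrow >= 1 << OOP_BITS:
        raise ValueError("address out of range for a 32-bit compressed oop")
    return narrow


def decode(narrow, heap_base, shift=3):
    """Recover the address: the "add and shift" paid on each dereference."""
    return heap_base + (narrow << shift)


def max_heap_bytes(shift):
    """Largest heap addressable with a 32-bit value and the given shift."""
    return (1 << OOP_BITS) << shift
```

With shift=3, max_heap_bytes gives 32 GiB, matching the 32G figure in the
thread. With shift=0 it gives 4 GiB, and if the heap base is also zero,
decode returns its input unchanged, which is the case where the add and
shift could be dropped.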
