RFR: 8258576: Try to get zerobased CCS if heap is above 32 and CDS is disabled [v2]

Fri Dec 18 15:49:56 UTC 2020

On Fri, 18 Dec 2020 14:40:39 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>>> Hm, I don't understand that one. OopEncodingHeapMax is normally 32G. CompressedOops.end() is the end of the coops heap; probably way higher than 32G. So you will get an underflow in the substraction, won't you?
>> 
>> It should not be beyond 32G if heap was places below 32G, no?
>> 
>> What I meant was: if the heap end is below 32g, and there is enough space between heap end and narrow klass pointer encoding range, place the ccs at heap end, otherwise try a low address.
>> 
>> But the more I look at this coding the more I think this should not depend on heap and UseCompressedOops at all. UseCompressedOops does not really matter. Existence of the heap only matters insofar as it is a maybe large mapping which maybe got placed into the address range we covet. But other mappings are large too, and live in that same range. So why care for the heap in particular.
>> 
>> It would make the coding way simpler and increase chances of placing ccs if we were to just start at a low address, attempt to attach at regular intervals - while observing platform restrictions (see CompressedOops::is_valid_base()) - until the 32g ceiling is reached. We do not really need to care for heap placement. If heap happens to live below 32G too - irregardless of UseCompressedOops - then some of our mmaps would fail. But that is fine.
>> 
>> For example, if we were to start at, say, 4g, and attempt to attach at 8, 12, 16...28g, I think this would work just fine on all platforms. It would not take much time either. mmap is fast if it fails. Plus, this is a rare corner case anyway.
>> 
>> And note that the low address does not have to be HeapBaseMinAddress either. ccs can live anywhere. The dependence on HeapBaseMinAddress is incidental.
>> 
>>> > It would be interesting to know why aarch64 and OSX don't use shift 0 in the new test case.
>>> 
>>> HeapBaseMinAddress seems to be 2G on all platforms. On AARCH64 2G is considered invalid because it is not 4G aligned. I must admit that I don't understand this. Below 32G the CCS will be zerobased. There is no need to load the base address efficiently into a register. I think that part could be removed, don't you think that too?
>>> 
>>> ```
>>> bool CompressedKlassPointers::is_valid_base(address p) {
>>> #ifdef AARCH64
>>>   // Below 32G, base must be aligned to 4G.
>>>   // Above that point, base must be aligned to 32G
>>>   if (p < (address)(32 * G)) {
>>>     return is_aligned(p, 4 * G);
>>>   }
>>>   return is_aligned(p, (4 << LogKlassAlignmentInBytes) * G);
>>> #else
>>>   return true;
>>> #endif
>>> }
>>> ```
>>> 
>> 
>> No, this coding is correct. See comment above the function. This coding returns true if the given value is usable as a _narrow klass base_. On aarch64, 0 is valid, if its not 0, it has to be 4G aligned (below 32G), 4G<<3 above. 2G is not a correct base. 2G would be valid as start of ccs though. But that distinction would have to be clearly coded out, atm its not. See metaspace.cpp:717 and :739. But I am not sure if its worth the complexity.
>> 
>> For more details, please see: https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2020-April/008757.html , also, the review thread for JDK-8243392.
>> 
>> Please do not change this coding in CompressedClassPointers without talking with the aarch64 devs, since this was done with them together as part of JDK-8243392. Should you change that, could you please do it in a seperate RFE?
>> 
>> One problem with this coding is since this is not really used much (CDS is enabled 99% of the time) it is not that well covered by tests. I run into that problem repeatedly in the past.
>> 
>>> On macos `Xlog:os*=debug` prints
>>> 
>>> ```
>>> [0.052s][debug][os       ] Attempt to reserve memory at 0x0000000080000000 for 1073741824 bytes failed, errno 2
>>> ```
>>> 
>>> errno 2 seems to be ENOENT. I don't know what this means. I don't see any mapping at 2G with vmmap.
>> 
>> errno may be stale. It is printed miles away from the failing mmap call.
>
> One other aspect to check, if ccs is placed below the heap, is whether it gets placed in the way of the sbrk. Important for platforms which implement malloc via sbrk, because you could get malloc fails later. It is the case for AIX and Solaris at least (the latter matters when downporting). For AIX, I added safety coding right into os::reserve_memory. It may also be an issue on MacOS, that should be tested. I believe glibc is fine.

> 
> 
> > Hm, I don't understand that one. OopEncodingHeapMax is normally 32G. CompressedOops.end() is the end of the coops heap; probably way higher than 32G. So you will get an underflow in the substraction, won't you?
> 
> It should not be beyond 32G if heap was places below 32G, no?
> 

Ok, I see, but this is not the point of the enhancement. The point here is to
make use of the lower 32G if the heap is above that limit.

And I'd like to keep that scope. I'm planning to downport the small change to 11u
where CDS is off by default.

Of course you can further optimize this in a follow-up.

> But the more I look at this coding the more I think this should not depend on heap and UseCompressedOops at all. UseCompressedOops does not really matter.

That observation is correct. I wanted to eliminate the dependency but
unfortunately I haven't found something in CollectedHeap that takes the noaccess
prefix required for the not-zerobased coops heap into account (the noaccess
prefix is a protected page used for implicit null
checks). CompressedOops::base() does.

> And note that the low address does not have to be HeapBaseMinAddress either. ccs can live anywhere. The dependence on HeapBaseMinAddress is incidental.

I think the naming is incidental. AFAIK the intend is not to use the address space below HeapBaseMinAddress.

> > > It would be interesting to know why aarch64 and OSX don't use shift 0 in the new test case.
> > 
> > 
> > HeapBaseMinAddress seems to be 2G on all platforms. On AARCH64 2G is considered invalid because it is not 4G aligned. I must admit that I don't understand this. Below 32G the CCS will be zerobased. There is no need to load the base address efficiently into a register. I think that part could be removed, don't you think that too?
> > ```
> > bool CompressedKlassPointers::is_valid_base(address p) {
> > #ifdef AARCH64
> >   // Below 32G, base must be aligned to 4G.
> >   // Above that point, base must be aligned to 32G
> >   if (p < (address)(32 * G)) {
> >     return is_aligned(p, 4 * G);
> >   }
> >   return is_aligned(p, (4 << LogKlassAlignmentInBytes) * G);
> > #else
> >   return true;
> > #endif
> > }
> > ```
> 
> No, this coding is correct. See comment above the function.

The comment says

// Given an address p, return true if p can be used as an encoding base.
//  (Some platforms have restrictions of what constitutes a valid base address).

But the parameter passed is not the encoding base. It is the desired beginning
of ccs. If ccs is mapped at 2G then the encoding base is 0 (actually it is 0 if
the ccs end is <= KlassEncodingMetaspaceMax, see
CompressedKlassPointers::initialize).  So either the function or its users are
not correct.

In other words: mapping ccs at 2G would result in a zerobased ccs. I'd be extremely
surprised if 0x0 as encoding base is actually invalid on aarch64. IMHO opinion
it is a bug that it is not allowed to map ccs at 2g on aarch64.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1815