Request for review (S): 7103665 HeapWord*ParallelScavengeHeap::failed_mem_allocate(unsigned long, bool)+0x97

Fri Mar 23 17:11:10 UTC 2012

Jon,

On 2012-03-23 16:57, Jon Masamitsu wrote:
> Bengt,
>
> The changes you made are correct and an improvement, so if this
> is time critical, I'm good with this version.  I do still have some
> questions below.

OK. Thanks.

>
> On 3/23/2012 8:31 AM, Bengt Rutisson wrote:
>> Jon,
>>
>> Thanks for looking at this!
>>
>> 23 mar 2012 kl. 15:46 skrev Jon Masamitsu<jon.masamitsu at oracle.com>:
>>
>>> Bengt,
>>>
>>> Is it correct that there is still the possibility that during the final
>>> iteration of the loop to do the filling, that the space to be filled
>>> can be too small to fit an object?  I note the assertion
>>>
>>>>    98           assert(words_to_fill>= CollectedHeap::min_fill_size(),
>>>>    99             err_msg("Remaining siz ("SIZE_FORMAT ") is too 
>>>> small to fill (based on " SIZE_FORMAT " and " SIZE_FORMAT ")",
>>>>   100             words_to_fill, words_left_to_fill, 
>>>> CollectedHeap::filler_array_max_size()));
>>> You could fix that by checking for the situation in the next to last
>>> iteration and shortening the filler for the next to last iteration?
>> I think this should be safe. The heap size is object aligned and we 
>> allocate aligned objects, so the hole should be object aligned, right?
>>
>> The max array size is also aligned. So I think we are safe. But to be 
>> sure I added the assert.
>>
> This should be safe in 32bit but I'm not sure about 64bit.  The hole 
> can be objected aligned
> but still an odd number of words?

Good point. But I think it is unlikely that you get holes that are 
larger than what can be covered with one Integer array on 32 bit 
plaforms, right?

Can it be an issue with compressed oops on 64 bit? I think we are safe 
as long as we still don't have compressed headers. But please correct me 
if I am wrong. My head is spinning a bit from all the word/byte/jint 
conversions.

>
>>> Also a question below.
>>>
>>>
>>> On 3/23/2012 5:56 AM, Bengt Rutisson wrote:
>>>> Hi all,
>>>>
>>>> Could I have a couple of reviews for this change? This is on the 
>>>> critical-watch list for hs23, so it would be great if I could get 
>>>> some reviews already today.
>>>>
>>>> http://cr.openjdk.java.net/~brutisso/7103665/webrev.00/
>>>>
>>>> Some background:
>>>>
>>>> There are two problems that this fix solves:
>>>>
>>>> (1) When we run with -XX:+UseNUMA we have to make the heap parsable 
>>>> since we leave some holes in the heap. We do this by faking Integer 
>>>> arrays in those places. However, on a large heap those holes can be 
>>>> larger than what can be covered by a single Integer array. The fix 
>>>> is to allocate several arrays in those cases.
>>>>
>>>> The easy fix would have been to call 
>>>> CollectedHeap::fill_with_objects() instead of 
>>>> CollectedHeap::fill_with_object(). Note the extra "s" in the method 
>>>> name. But MutableNUMASpace::ensure_parsability(), where the change 
>>>> is needed, need to know about all the objects that are allocated so 
>>>> I saw no better solution than to implement my own loop. Any ideas 
>>>> on how I could re-use fill_with_objects() instead are welcome.
>>> Do you need to know all the objects that are allocated so that you can
>>> do the invalid region detection?
>>>
>>> 113             if (crossing_start != crossing_end) {
>>> 114               // If object header crossed a small page boundary 
>>> we mark the area
>>> 115               // as invalid rounding it to a page_size().
>>>
>> Yes, that's why.
>
> If you used fill_with_objects()  (with an "s"), you do know where 
> first filler object is so
> you could step through the objects using the object sizes to find the 
> next filler object.
> Then it seems like you could do this check (which I can't say I 
> understand what's going
> on there) in a second loop.

Right. That's probably a nicer way of doing it. Especially since it is 
only needed on Solaris (I think).

However, I'd like to get this in to have a theoretical chance of porting 
it to hs23. So, since you stated that you are ok with it as it is I'll 
go ahead and push what I have now.

Thanks again for the review and the good comments!
Bengt

>
> Jon
>> Bengt
>>
>>
>>> Jon
>>>> (2) CollectedHeap::_filler_array_max_size should be the maximum 
>>>> size that can be covered by a fake Integer array. This value is in 
>>>> words, but since the word size is different on different platforms 
>>>> but the Integer size is always the same we need to convert between 
>>>> word and sizeof(jint) at some point. Unfortunately we do the 
>>>> conversion in the wrong direction, which means that on 64 bit 
>>>> platforms we end up with a max size that is 4 times larger than 
>>>> intended. This in turn makes us overflow an int when we convert 
>>>> from a word size to the length of the array that we are setting up.
>>>>
>>>> Here is the code that overflows:
>>>>
>>>> 328 void
>>>> 329 CollectedHeap::fill_with_array(HeapWord* start, size_t words, 
>>>> bool zap)
>>>> 330 {
>>>> 331   assert(words>= filler_array_min_size(), "too small for an 
>>>> array");
>>>> 332   assert(words<= filler_array_max_size(), "too big for a single 
>>>> object");
>>>> 333
>>>> 334   const size_t payload_size = words - filler_array_hdr_size();
>>>> 335   const size_t len = payload_size * HeapWordSize / sizeof(jint);
>>>> 336
>>>> 337   // Set the length first for concurrent GC.
>>>> 338   ((arrayOop)start)->set_length((int)len);
>>>> 339   post_allocation_setup_common(Universe::intArrayKlassObj(), 
>>>> start, words);
>>>> 340   DEBUG_ONLY(zap_filler_array(start, words, zap);)
>>>> 341 }
>>>>
>>>> If filler_array_max_size() on line 332, which returns 
>>>> CollectedHeap::_filler_array_max_size, has a too large value we 
>>>> will overflow the int conversation on line 338.
>>>>
>>>> Testing:
>>>> This fix solves the issue that was found in the reproducer that I 
>>>> could set up on a Linux x64 machine. I have asked the one who 
>>>> initially reported the issue to run on their Solaris amd64 setup. I 
>>>> will also try to set up a reproducer on a Solaris machine.
>>>>
>>>> I also ran the fix through JPRT. It did not fail, but there are no 
>>>> NUMA tests in there as far as I know. But the change for issue (2) 
>>>> was hopefully tested.
>>>>
>>>> Thanks,
>>>> Bengt