RFR: 8226236: [TESTBUG] win32: gc/metaspace/TestCapacityUntilGCWrapAround.java fails

Thomas Stuefe stuefe at openjdk.java.net
Wed Oct 14 06:40:16 UTC 2020


On Wed, 14 Oct 2020 03:17:47 GMT, Yasumasa Suenaga <ysuenaga at openjdk.org> wrote:

>> Hi Yasumasa,
>> 
>> sorry, I do not understand what the error is yet and to me the patch does seem wrong in a number of places.
>> 
>> Both test and whitebox function seem to be doing what they are supposed to do.
>> 
>> Is this error intermittent or always reproducable? If the latter, since when (since the code seems old)?
>> 
>> Have the AdoptOpenJDK folks done some pre-analysis?
>> 
>> ----
>> 
>> From just looking at the code (cannot build 32bit right now), WB_IncMetaspaceCapacityUntilGC() seems correct to me.
>> 
>> - in comes a jlong with 4G-1, so FFFF-FFFF
>> - it fits into a 32bit size_t so its fine
>> - we calculate aligned_inc by aligning the value down somewhat, probably a page: FFFF-F000
>> - we feed this value FFFF-F000 as *inc* value to  MetaspaceGC::inc_capacity_until_GC
>> - it returns false. We throw an exception.
>> 
>> In MetaspaceGC::inc_capacity_until_GC:
>> 
>> bool MetaspaceGC::inc_capacity_until_GC(size_t v, size_t* new_cap_until_GC, size_t* old_cap_until_GC, bool* can_retry) {
>>   assert_is_aligned(v, Metaspace::commit_alignment());
>> 
>>   size_t old_capacity_until_GC = _capacity_until_GC;
>>   size_t new_value = old_capacity_until_GC + v;
>> 
>>   if (new_value < old_capacity_until_GC) {
>>     // The addition wrapped around, set new_value to aligned max value.
>>     new_value = align_down(max_uintx, Metaspace::commit_alignment());
>>   }
>> 
>>   if (new_value > MaxMetaspaceSize) {
>>     if (can_retry != NULL) {
>>       *can_retry = false;
>>     }
>>     return false;
>>   }
>> 
>> we enter with size_t v = FFFF-F000
>> 
>> We calc the new value. Since new value = old value + inc, this will overflow
>> 
>> We sense the overflow and correct the new value to be max_uintx aligned down by commit alignment.
>> 
>> Which will be smaller than MaxMetaspaceSize. Test does not set it therefore it is max_uintx.
>> 
>> So far all good IIUC.
>> 
>> 
>>   if (can_retry != NULL) {
>>     *can_retry = true;
>>   }
>>   size_t prev_value = Atomic::cmpxchg(&_capacity_until_GC, old_capacity_until_GC, new_value);
>> 
>>   if (old_capacity_until_GC != prev_value) {
>>     return false;
>>   }
>> 
>>   if (new_cap_until_GC != NULL) {
>>     *new_cap_until_GC = new_value;
>>   }
>>   if (old_cap_until_GC != NULL) {
>>     *old_cap_until_GC = old_capacity_until_GC;
>>   }
>>   return true;
>> 
>> From looking at this code, the only way I can see this function returning false would be if a concurrent thread
>> modified this threshold. Which should be super rare.
>> Hence my initial question about intermittentness. If its reproducable I do not think I understand it yet.
>> 
>> Thanks, Thomas
>
> I think TestCapacityUntilGCWrapAround.java has two problems:
> 
> 1. It might attempt to increase metaspace size over MaxMetaspaceSize (this PR fixes it)
> 2. Overflow test would always fail because `MetaspaceGC::inc_capacity_until_GC()` always returns `false`
> 
> This test has introduced in JDK-8049599, but I cannot know details because JBS ticket is closed, but we can see its
> commit: https://github.com/openjdk/jdk/commit/6f4355a3a6be2482c388c2022fc7c3c4b2f481c5   I enabled this test on 64bit
> platforms because these problems might happen on them.
> If you agree with the fix for 1. , I will add the test for 2. to this PR.

I found @shipilev's analysis in the bug report and understand better what happens now.

I do not think this is a test error, the test is correct and shows a real issue. Just pasting my JBS comment here:

The test verifies that calling MetaspaceGC::inc_capacity_until_GC() repeatedly will not overflow the gc threshold.

1) MetaspaceGC::ergo_initialize()

MaxMetaspaceSize, if left unspecified, is supposed to be "infinite". In reality it defaults to max_uintx. But it gets
aligned down by Metaspace::reserve_alignment. That one is either one of os::vm_allocation_granularity or 4 pages,
whatever is larger. The 4 pages thing I recently added with JDK-8245707 to shake loose misuse of this alignment value
(exactly cases like this).

So on Windows, Metaspace::reserve_alignment has always been os::vm_allocation_granularity (64K). On Linux, it was
always 4K, until recently it switched to 16K.

This explains the 16K aligned value we see for MaxMetaspaceSize in Alexeys test: 4294950912 (FFFFC000).

2) MetaspaceGC::inc_capacity_until_GC()

The increase value causes an overflow - its supposed to do that. Overflow gets handled by:

  if (new_value < old_capacity_until_GC) {
    // The addition wrapped around, set new_value to aligned max value.
    new_value = align_down(max_uintx, Metaspace::commit_alignment());
  }

which sets the new threshold at max_uintx aligned down by Metaspace::commit_alignment(), which is just os::vm_page size.

So the new value is 4294963200, resp. FFFFF000

So the problem is that one value gets aligned down by Metaspace::reserve_alignment() (os::vm_allocation_granularity()),
the other by Metaspace::commit_alignment (os::vm_page_size()). Then we compare those values.

So the test is okay. It showed us a real issue. I am not 100% sure what the correct behavior would be. Maybe increasing
the gc threshold beyond MaxMetaspaceSize should not be an error at all, but we should just cap out at that value?

Possible solutions:

A) As stated above, just cap.

b) All that alignment business is actually unnecessary and we could remove it for clarity after JEP387 is out of the
door. MaxMetaspaceSize does not need to be aligned to anything, neither does the GC threshold.

C) as a small workaround, at (2) one probably could use Metaspace::reserve_alignment, not commit_alignment. That makes
no sense semantically but would remove this error.

(This must have been always a problem on Windows, and only recently on Linux, right?)

-------------

PR: https://git.openjdk.java.net/jdk/pull/628



More information about the hotspot-gc-dev mailing list