RFR: 8049599: MetaspaceGC::_capacity_until_GC can overflow

Tue Sep 30 12:43:17 UTC 2014

All,

got some great feedback from StefanK:
- Use cmpxchg_ptr instead of cmpxchg
- Add a comment describing the while loop in expand_and_allocate
- Add a comment in the test describing the overflow attempt
- Change the loop in expand_and_allocate to do/while
- Shorten the names of the local variables in expand_and_allocate

The result can be seen in the following webrevs:
- full: http://cr.openjdk.java.net/~ehelin/8049599/webrev.02/
- inc: http://cr.openjdk.java.net/~ehelin/8049599/webrev.01-02/

Thanks,
Erik

On 2014-09-24 18:32, Erik Helin wrote:
> All,
>
> I've reworked the patch quite a bit based on (great!) internal feedback
> from StefanK and Mikael Gerdin. The patch still uses an overflow check
> and a CAS to update the high-water mark (HWM), but the new behavior
> should be the same as the old one (which used Atomic::add_ptr).
>
> With the current code, each thread always increments the HWM, but there
> is a race in that another thread can allocate metadata (due to the
> increased HWM) before the thread that increased the HWM gets around to
> allocate. With the new code, each thread will increase the pointer at
> most once using a CAS. Even if increasing the HWM fails, the allocation
> attempt might still succeed (for the reason described above).
>
> There is a theoretical problem of starvation in the new code, a thread
> might forever fail to increase the HWM and forever fail to allocate due
> to contention, but in practice this should not be a problem. In the
> current code, Atomic::add_ptr is implemented as a CAS in a
> (theoretically) never ending loop on non-x86 CPUs, so the same
> theoretical starvation problem is present in the current code as well
> (on non-x86 CPUs that is).
>
> Webrevs:
> - full:
>    http://cr.openjdk.java.net/~ehelin/8049599/webrev.01/
> - incremental:
>    http://cr.openjdk.java.net/~ehelin/8049599/webrev.00-01/
>
> Testing:
> - JPRT
> - Aurora:
>    - Kitchensink
>    - Weblogic+Medrec
>    - runThese
>    - vm.quick, regression, gc, compiler, runtime, parallel class loading,
>      metaspace, oom
>    - JTReg tests
> - Running newly added JTREG test
>
> Thanks,
> Erik
>
> On 2014-08-20 12:27, Erik Helin wrote:
>> Hi all,
>>
>> this patch fixes a problem where Metaspace::_capacityUntilGC can
>> overflow ("wrap around"). Since _capacityUntilGC is treated as a size_t
>> everywhere it used, we won't calculate with negative numbers, but an
>> eventual wrap around will still cause unnecessary GCs.
>>
>> The problem is solved by detecting an eventual wrap around in
>> Metaspace::incCapacityUntilGC. The overflow check means that
>> _capacityUntilGC now must be updated with a CAS. If the CAS fails more
>> than five times due to contention, no update will be done, because this
>> means that other threads must have incremented _capacityUntilGC (it is
>> decremented only during a safepoint). This also means that a thread
>> calling incCapacityUntilGC might have "its" requested memory "stolen" by
>> another thread, but incCapacityUntilGC has never given any fairness
>> guarantees.
>>
>> The patch also adds two functions to the WhiteBox API to be able to
>> update and read Metaspace::_capacityUntilGC from a JTREG test.
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8049599
>>
>> Webrev:
>> http://cr.openjdk.java.net/~ehelin/8049599/webrev.00/
>>
>> Testing:
>> - JPRT
>> - Aurora ad-hoc testing (on all platforms, both 32-bit and 64-bit):
>>    - Kitchensink, runThese and Dacapo
>>    - JTREG tests
>>    - Parallel Class Loading testlist
>>    - GC, runtime and compiler testlists
>>    - OOM and stress testlists
>> - Running newly added JTREG test
>>
>> Thanks,
>> Erik