RFR: 8243961: ForceNUMA and only one available NUMA node fails assertion on Windows
Thomas Schatzl
thomas.schatzl at oracle.com
Mon May 4 09:43:59 UTC 2020
Hi,
On 02.05.20 11:59, Kim Barrett wrote:
>> On Apr 30, 2020, at 6:24 AM, Thomas Schatzl
>> <thomas.schatzl at oracle.com> wrote:
>>
>> Hi,
>>
>> On 29.04.20 14:28, Kim Barrett wrote:
>>> Please review this Windows-specific change to the initialization
>>> of UseNUMAInterleaving. […] CR:
>>> https://bugs.openjdk.java.net/browse/JDK-8243961 Webrev:
>>> https://cr.openjdk.java.net/~kbarrett/8243961/open.00/ Testing:
>>> mach5 tier1-3 mach5 testing on Windows with explict +UseG1GC
>>> +UseNUMA +ForceNUMA and each of
>>> +/-/(default)UseNUMAInterleaving. TestUseNUMAInterleaving fails
>>> for some of those test configurations on non-NUMA hardware,
>>> because the test expects UseNUMAInterleaving to be enabled if
>>> UseNUMA is enabled. As the test failure can only happen with
>>> +ForceNUMA, that configuration doesn't seem to otherwise lead to
>>> other problems, +ForceNUMA doesn't seem to be a tested
>>> configuration, and ForceNUMA is going away (JDK-8243628), we'll
>>> not worry about that failure.
>>
>> Change is okay, but what do you think about always disabling
>> UseNUMA regardless of ForceNUMA on Windows instead?
>>
>> There does not seem to be a code path in the windows OS files that
>> acts on UseNUMA. For NUMA related methods it returns default
>> values, so any path in the GCs will just do nothing or act on
>> useless default values from the os layer and do weird things.
>>
>> UseNUMAInterleaving is completely separate from UseNUMA
>> functionality anyway (although "UseNUMA" is misleading at least to
>> me, but that's another discussion).
>>
>> There is the problem then that probably many people enable UseNUMA
>> to get UseNUMAInterleaving, but that could be handled in a windows
>> specific way too. I.e. when -XX:+UseNUMA and -XX:+/-ForceNUMA is
>> set, disable UseNUMA and enable UseNUMAInterleaving.
>>
>> That sounds much easier to understand to me than the current
>> change.
>>
>> Some collectors (Z, Shenandoah) will set UseNUMA, but this
>> evaluation is before os::init_2 so above idea would do the right
>> thing as neither have any special UseNUMA path. I.e. both actually
>> seem to want UseNUMAInterleaving. Shenandoah explicitly says so, so
>> actually changing them to enable UseNUMAInterleaving instead of
>> UseNUMA would be good too imo.
>>
>> On non-Windows there will be no difference to now.
>>
>> Thomas
>
> [Added ppc-aix-port-dev, as there is a small AIX change here.]
>
[...]
> What I've ended up doing is moving the conditional enabling of
> UseNUMAInterleaving into the platform-specific code, where there's
> enough information to consistently get it right. I've also made all
> the os variants be explicit about UseNUMA and UseNUMAInterleaving;
> those that have no support for either now unconditionally set them
> false. (This happens to keep TestUseNUMAInterleaving working on
> those platforms.)
>
> New webrev: https://cr.openjdk.java.net/~kbarrett/8243961/open.01/
>
> (No incremental; the code change to os_windows.cpp is the same, but
> the rationale and commentary are entirely different.)
os_linux.cpp: s/explicilty/explictly (pre-existing, may ignore)
There is an inconsistency in the use of FLAG_SET_ERGO vs. directly
setting flag values when the implementation does not support NUMA. E.g.
on bsd FLAG_SET_ERGO is used to disable UseNUMA*, on Windows it is
directly set to false when not setting ForceNUMA.
Since ForceNUMA is "soon" going away it's not a big issue for me though.
>
> Testing: mach5 tier1-3, normally, with -XX:+UseNUMA -XX:+ForceNUMA
> added.
>
> TestUseNUMAInterleaving still fails on Windows with +UseNUMA
> +ForceNUMA on a single-node machine, because +UseNUMAInterleaving
> gets turned off, which is not what the test expects.
One option could be to add a @requires vm.opt.ForceNUMA != true to avoid
unnecessary failures.
>
> Manually verified that on Linux -XX:+UseNUMAInterleaving alone gets
> overridden off when libnuma_init is patched to fail.
>
>
Thanks,
Thomas
More information about the ppc-aix-port-dev
mailing list