[ping] Re: [11] RFR(M): 8189922: UseNUMA memory interleaving vs membind
David Holmes
david.holmes at oracle.com
Tue Jul 10 21:39:04 UTC 2018
Hi Gustavo,
On 11/07/2018 6:14 AM, Gustavo Romero wrote:
> Hi Swati,
>
> As David pointed out, it's necessary to determine if that bug qualifies
> as P3 in order to get it into JDK 11 RDP1.
>
> AFAICS, that bug was never triaged explicitly and got its current
> priority (P4) from the default.
Actually no, the P4 was from the (Oracle internal) ILW prioritization
scheme.
For this to be a P3 it needs to be shown either that the impact is quite
significant (IIUC it's only a mild performance issue based on the bug
report); or that the likelihood of this being encountered is very high
(again it seems not that likely based on the info in the bug report).
HTH.
David
-----
>
> Once the correct integration version is defined, I can sponsor that change
> for you. I think there won't be any update releases for JDK 11 (contrary to
> what happened for JDK 10), but we can look at how the distros are handling
> it and find out whether there is a chance to get the change into the
> distros once it's pushed to JDK 12.
>
>
> David, Alan,
>
> I could not find any documentation on how to formally triage a bug. For
> instance, on [1] I see Alan used markers such as "ILW =" and "MLH =", but I
> don't know if these markers are only for Oracle-internal use. Do you know
> how I could triage that bug? I understand its integration risk is small,
> but even so I think it's necessary to bring up additional information to
> factor into a final bug priority.
>
> Thanks.
>
>
> Best regards,
> Gustavo
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8206953
>
> On 07/03/2018 03:06 AM, David Holmes wrote:
>> Looks fine.
>>
>> Thanks,
>> David
>>
>> On 3/07/2018 3:08 PM, Swati Sharma wrote:
>>> Hi David,
>>>
>>> I have added a NULL check for _numa_bitmask_isbitset in the
>>> isbound_to_single_node() method.
>>>
>>> Hosted: http://cr.openjdk.java.net/~gromero/8189922/v2/
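>>>
>>> For reference, a minimal sketch of how the guarded check looks after
>>> this change (simplified; the actual webrev may differ in names and
>>> structure). _numa_get_membind and _numa_bitmask_isbitset stand for the
>>> dlsym-resolved libnuma entry points, so both are checked for NULL
>>> before use:
>>>
>>>     #include <numa.h>   // for struct bitmask
>>>
>>>     // dlsym-resolved libnuma entry points (NULL if unresolved).
>>>     static struct bitmask* (*_numa_get_membind)(void) = NULL;
>>>     static int (*_numa_bitmask_isbitset)(struct bitmask*, unsigned int) = NULL;
>>>
>>>     // Sketch only: true when the process memory binding covers exactly
>>>     // one NUMA node.
>>>     static bool isbound_to_single_node(int max_node) {
>>>       if (_numa_get_membind == NULL || _numa_bitmask_isbitset == NULL) {
>>>         return false;  // symbols missing: cannot tell, assume not bound
>>>       }
>>>       struct bitmask* bound = _numa_get_membind();
>>>       int nodes = 0;
>>>       for (int node = 0; node <= max_node; node++) {
>>>         if (_numa_bitmask_isbitset(bound, node)) {
>>>           nodes++;
>>>         }
>>>       }
>>>       return nodes == 1;
>>>     }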
>>>
>>> Swati
>>>
>>> On Mon, Jul 2, 2018 at 5:54 AM, David Holmes
>>> <david.holmes at oracle.com> wrote:
>>>
>>> Hi Swati,
>>>
>>> I took a look at this, though I'm not familiar with the functional
>>> operation of the NUMA APIs - I'm relying on Gustavo and Derek to
>>> spot any actual usage errors there.
>>>
>>> In isbound_to_single_node() there is no NULL check for
>>> _numa_bitmask_isbitset (which seems to be the normal pattern for
>>> using all of these function pointers).
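>>>
>>> For reference, that pattern looks roughly like this (a simplified
>>> sketch, not the actual os_linux.cpp code): the libnuma symbols are
>>> resolved via dlopen/dlsym at initialization, so any of the pointers
>>> can legitimately be NULL and each call site must check for that.
>>>
>>>     #include <dlfcn.h>
>>>     #include <numa.h>   // for struct bitmask
>>>
>>>     // Sketch: dlsym-resolved entry points; NULL if the symbol (or
>>>     // libnuma itself) is missing.
>>>     static struct bitmask* (*_numa_get_membind)(void) = NULL;
>>>     static int (*_numa_bitmask_isbitset)(struct bitmask*, unsigned int) = NULL;
>>>
>>>     static void resolve_libnuma_symbols() {
>>>       void* handle = dlopen("libnuma.so.1", RTLD_LAZY);
>>>       if (handle == NULL) {
>>>         return;  // no libnuma: pointers stay NULL, NUMA paths are skipped
>>>       }
>>>       _numa_get_membind = (struct bitmask* (*)(void))
>>>           dlsym(handle, "numa_get_membind");
>>>       _numa_bitmask_isbitset = (int (*)(struct bitmask*, unsigned int))
>>>           dlsym(handle, "numa_bitmask_isbitset");
>>>     }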
>>>
>>> Otherwise this seems fine.
>>>
>>> Thanks,
>>> David
>>>
>>>
>>> On 30/06/2018 2:46 AM, Swati Sharma wrote:
>>>
>>> Hi,
>>>
>>> Could I get a review for this change, which affects the JVM when there
>>> are pinned memory nodes, please?
>>>
>>> It has already been reviewed and tested on PPC64 and on AARCH64 by
>>> Gustavo and Derek; however, neither of them is a Reviewer, so I need
>>> additional reviews for this change.
>>>
>>>
>>> Thanks in advance.
>>>
>>> Swati
>>>
>>> On Tue, Jun 19, 2018 at 5:58 PM, Swati Sharma
>>> <swatibits14 at gmail.com> wrote:
>>>
>>> Hi All,
>>>
>>> Here is the numa information of the system :
>>> swati at java-diesel1:~$ numactl -H
>>> available: 8 nodes (0-7)
>>> node 0 cpus: 0 1 2 3 4 5 6 7 64 65 66 67 68 69 70 71
>>> node 0 size: 64386 MB
>>> node 0 free: 64134 MB
>>> node 1 cpus: 8 9 10 11 12 13 14 15 72 73 74 75 76 77 78 79
>>> node 1 size: 64509 MB
>>> node 1 free: 64232 MB
>>> node 2 cpus: 16 17 18 19 20 21 22 23 80 81 82 83 84 85 86 87
>>> node 2 size: 64509 MB
>>> node 2 free: 64215 MB
>>> node 3 cpus: 24 25 26 27 28 29 30 31 88 89 90 91 92 93 94 95
>>> node 3 size: 64509 MB
>>> node 3 free: 64157 MB
>>> node 4 cpus: 32 33 34 35 36 37 38 39 96 97 98 99 100 101
>>> 102 103
>>> node 4 size: 64509 MB
>>> node 4 free: 64336 MB
>>> node 5 cpus: 40 41 42 43 44 45 46 47 104 105 106 107 108 109
>>> 110 111
>>> node 5 size: 64509 MB
>>> node 5 free: 64352 MB
>>> node 6 cpus: 48 49 50 51 52 53 54 55 112 113 114 115 116 117
>>> 118 119
>>> node 6 size: 64509 MB
>>> node 6 free: 64359 MB
>>> node 7 cpus: 56 57 58 59 60 61 62 63 120 121 122 123 124 125
>>> 126 127
>>> node 7 size: 64508 MB
>>> node 7 free: 64350 MB
>>> node distances:
>>> node 0 1 2 3 4 5 6 7
>>> 0: 10 16 16 16 32 32 32 32
>>> 1: 16 10 16 16 32 32 32 32
>>> 2: 16 16 10 16 32 32 32 32
>>> 3: 16 16 16 10 32 32 32 32
>>> 4: 32 32 32 32 10 16 16 16
>>> 5: 32 32 32 32 16 10 16 16
>>> 6: 32 32 32 32 16 16 10 16
>>> 7: 32 32 32 32 16 16 16 10
>>>
>>> Thanks,
>>> Swati
>>>
>>> On Tue, Jun 19, 2018 at 12:00 AM, Gustavo Romero
>>> <gromero at linux.vnet.ibm.com> wrote:
>>>
>>> Hi Swati,
>>>
>>> On 06/16/2018 02:52 PM, Swati Sharma wrote:
>>>
>>> Hi All,
>>>
>>> This is my first patch. I would appreciate it if anyone
>>> could review the fix:
>>>
>>> Bug    : https://bugs.openjdk.java.net/browse/JDK-8189922
>>> Webrev : http://cr.openjdk.java.net/~gromero/8189922/v1
>>>
>>> The bug is about the JVM flag UseNUMA, which bypasses the
>>> user-specified numactl --membind option and divides the whole heap
>>> into lgrps according to the NUMA nodes available on the system.
>>>
>>> The proposed solution is to disable UseNUMA if the process is bound
>>> to a single NUMA node. If it is bound to more than one node, the
>>> lgrps are created according to the bound nodes only. If there is no
>>> binding, the JVM divides the whole heap based on the number of NUMA
>>> nodes available on the system, as before.
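>>>
>>> A small standalone sketch of this decision logic (not the actual
>>> HotSpot change, just an illustration using the libnuma C API directly;
>>> it assumes contiguous node numbering and builds with
>>> "g++ membind_check.cpp -lnuma", where the file name is hypothetical):
>>>
>>>     #include <numa.h>
>>>     #include <stdio.h>
>>>
>>>     int main() {
>>>       if (numa_available() == -1) {
>>>         printf("libnuma not available: NUMA-aware allocation stays off\n");
>>>         return 0;
>>>       }
>>>       // Nodes this process is allowed to allocate memory from
>>>       // (reflects numactl --membind, if any).
>>>       struct bitmask* bound = numa_get_membind();
>>>       int max_node = numa_max_node();
>>>       int bound_nodes = 0;
>>>       for (int node = 0; node <= max_node; node++) {
>>>         if (numa_bitmask_isbitset(bound, node)) {
>>>           bound_nodes++;
>>>         }
>>>       }
>>>       if (bound_nodes == 1) {
>>>         printf("bound to a single node: disable UseNUMA\n");
>>>       } else if (bound_nodes <= max_node) {
>>>         printf("bound to %d nodes: create lgrps only for those nodes\n",
>>>                bound_nodes);
>>>       } else {
>>>         printf("no membind restriction: use all %d nodes\n", bound_nodes);
>>>       }
>>>       return 0;
>>>     }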
>>>
>>> I appreciate Gustavo's help in fixing the thread allocation based on
>>> NUMA distance for membind, which was a dangling issue associated with
>>> the main patch.
>>>
>>>
>>> Thanks. I have no further comments on it. LGTM.
>>>
>>>
>>> Best regards,
>>> Gustavo
>>>
>>> PS: Please provide the numactl -H information when possible. It helps
>>> to promptly grasp the actual NUMA topology in question :)
>>>
>>> Tested the fix by running the SPECjbb2015 composite workload on an
>>> 8-NUMA-node system.
>>> Case 1 : Single NUMA node bind
>>> numactl --cpunodebind=0 --membind=0 java -Xmx24g
>>> -Xms24g -Xmn22g
>>> -XX:+UseNUMA
>>> -Xlog:gc*=debug:file=gc.log:time,uptimemillis
>>> <composite_application>
>>> Before Patch: gc.log
>>> eden space 22511616K(22GB), 12% used
>>> lgrp 0 space 2813952K, 100% used
>>> lgrp 1 space 2813952K, 0% used
>>> lgrp 2 space 2813952K, 0% used
>>> lgrp 3 space 2813952K, 0% used
>>> lgrp 4 space 2813952K, 0% used
>>> lgrp 5 space 2813952K, 0% used
>>> lgrp 6 space 2813952K, 0% used
>>> lgrp 7 space 2813952K, 0% used
>>> After Patch : gc.log
>>> eden space 46718976K(45GB), 99% used(NUMA disabled)
>>>
>>> Case 2 : Multiple NUMA node bind
>>> numactl --cpunodebind=0,7 --membind=0,7 java -Xms50g
>>> -Xmx50g -Xmn45g
>>> -XX:+UseNUMA
>>> -Xlog:gc*=debug:file=gc.log:time,uptimemillis
>>> <composite_application>
>>> Before Patch :gc.log
>>> eden space 46718976K, 6% used
>>> lgrp 0 space 5838848K, 14% used
>>> lgrp 1 space 5838848K, 0% used
>>> lgrp 2 space 5838848K, 0% used
>>> lgrp 3 space 5838848K, 0% used
>>> lgrp 4 space 5838848K, 0% used
>>> lgrp 5 space 5838848K, 0% used
>>> lgrp 6 space 5838848K, 0% used
>>> lgrp 7 space 5847040K, 35% used
>>> After Patch : gc.log
>>> eden space 46718976K(45GB), 99% used
>>> lgrp 0 space 23359488K(23.5GB), 100% used
>>> lgrp 7 space 23359488K(23.5GB), 99% used
>>>
>>>
>>> Note: The proposed solution is only for the numactl --membind option.
>>> The fix does not cover --cpunodebind and localalloc, which are tracked
>>> by a separate bug, https://bugs.openjdk.java.net/browse/JDK-8205051,
>>> and a fix for that is in progress.
>>>
>>> Thanks,
>>> Swati Sharma
>>> Software Engineer -2 at AMD
>>>
>>>
>>>
>>>
>>>
>>
>