From HORIE at jp.ibm.com Tue May 2 14:47:01 2017
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Tue, 2 May 2017 23:47:01 +0900
Subject: 8179527: Implement intrinsic code for reverseBytes with load/store

Dear all,

Would you please review the following change?

Bug: https://bugs.openjdk.java.net/browse/JDK-8179527
Webrev: http://cr.openjdk.java.net/~horii/8179527/webrev.00/

I added new intrinsic code for reverseBytes() in ppc.ad with
* match(Set dst (ReverseBytesI/L/US/S (LoadI src)));
* match(Set dst (StoreI dst (ReverseBytesI/L/US/S src)));

Best regards,
--
Michihiro,
IBM Research - Tokyo

From gustavo.scalet at eldorado.org.br Tue May 2 15:05:09 2017
From: gustavo.scalet at eldorado.org.br (Gustavo Serra Scalet)
Date: Tue, 2 May 2017 15:05:09 +0000
Subject: 8179527: Implement intrinsic code for reverseBytes with load/store
Message-ID: <3507c10563a84106ac6c2e8d2554c053@serv030.corp.eldorado.org.br>

Hi Michihiro,

I wonder if there is no vectorized approach for implementing your
"bytes_reverse_long_Ex" instruct on ppc.ad. Or did you intentionally avoid
doing so?

From martin.doerr at sap.com Tue May 2 17:23:29 2017
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 2 May 2017 17:23:29 +0000
Subject: 8179527: Implement intrinsic code for reverseBytes with load/store
In-Reply-To: <3507c10563a84106ac6c2e8d2554c053@serv030.corp.eldorado.org.br>
References: <3507c10563a84106ac6c2e8d2554c053@serv030.corp.eldorado.org.br>
Message-ID: <7827421e2c6447f4ae406434f5bb3d25@sap.com>

Hi Michihiro and Gustavo,

thank you very much for implementing this change.

@Gustavo: Thanks for taking a look. I think that the direct match rules are
just there to satisfy match_rule_supported. They don't need to be fast, they
are just a fallback solution. The goal is to exploit the byte reverse load and
store instructions which should match in more performance critical cases.

Now my review:

assembler_ppc.hpp:
Looks good except a minor formatting request:
LDBRX_OPCODE = (31u << OPCODE_SHIFT | 532 << 1),
should be
LDBRX_OPCODE = (31u << OPCODE_SHIFT | 532u << 1),
to be consistent. The comments
// X-FORM
should be aligned with the other ones.

assembler_ppc.inline.hpp:
Good.

ppc.ad:
I'm concerned about the additional match rules which are only used for the
expand step. They could match directly, leading to incorrect code. What they
match is not what they do. I suggest implementing the code directly in the
ins_encode. This would make the new code significantly shorter and less
error-prone. I think we don't need to optimize for Power6 anymore and newer
processors shouldn't really suffer under a little less optimized instruction
scheduling. Would you agree?

Displacements may be too large for "li" so I suggest using the "indirect"
memory operand and letting the compiler handle it. I know that it may increase
latency because the compiler will need to insert an addition which could
better be matched into the memory operand of the load, which is harder to
implement (it is possible to match an addition in an operand).

Best regards,
Martin
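To make the opcode constant and the matched pattern from the review above more concrete, here is a minimal C++ sketch. It is not taken from the webrev. It assumes the primary opcode sits in the top 6 bits of a 32-bit instruction word (so OPCODE_SHIFT is taken to be 26), and the helpers reverse_bytes64 and load_reversed64 are hypothetical names that only model in software what a ReverseBytesL node fused with a load would compute.

```cpp
#include <cstdint>
#include <cstring>

// Assumption: primary opcode occupies the top 6 bits of the instruction word.
constexpr uint32_t OPCODE_SHIFT = 26u;

// X-form encoding as written in the review: primary opcode 31, extended
// opcode 532 shifted left by one bit. Both literals are unsigned for
// consistency, as requested.
constexpr uint32_t LDBRX_OPCODE = (31u << OPCODE_SHIFT | 532u << 1);
static_assert(LDBRX_OPCODE == 0x7C000428u, "unexpected ldbrx encoding");

// Hypothetical helper: software model of a 64-bit byte reversal
// (the operation Long.reverseBytes performs).
inline uint64_t reverse_bytes64(uint64_t v) {
  uint64_t r = 0;
  for (int i = 0; i < 8; ++i) {
    r = (r << 8) | (v & 0xFFu);  // move the lowest byte of v to the end of r
    v >>= 8;
  }
  return r;
}

// Hypothetical helper: models the fused pattern "reverse the bytes of a
// loaded value", i.e. the result a byte-reversed load would produce.
inline uint64_t load_reversed64(const void* src) {
  uint64_t v;
  std::memcpy(&v, src, sizeof v);  // plain load in host byte order
  return reverse_bytes64(v);
}
```

The point of matching the load and the reversal together is that a single byte-reversed load instruction can replace the plain load followed by a separate reversal sequence.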
From rwestrel at redhat.com Wed May 3 08:04:37 2017
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 03 May 2017 10:04:37 +0200
Subject: PPC change needed for: [10] RFR(M): 8176506: C2: loop unswitching and unsafe accesses cause crash

Just a heads up that the change below has some platform specific code and is
likely to need PPC specific code.

Thanks,
Roland.
-------------- next part --------------
An embedded message was scrubbed...
From: Roland Westrelin
Subject: Re: [10] RFR(M): 8176506: C2: loop unswitching and unsafe accesses cause crash
Date: Fri, 28 Apr 2017 10:46:18 +0200
Size: 3343

From gromero at linux.vnet.ibm.com Wed May 3 13:27:08 2017
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Wed, 3 May 2017 10:27:08 -0300
Subject: [10] RFR (S) 8175813: PPC64: "mbind: Invalid argument" when -XX:+UseNUMA is used
In-Reply-To: <59000AC0.7050507@linux.vnet.ibm.com>
References: <58C1AE06.9060609@linux.vnet.ibm.com> <58EEAF7B.6020708@linux.vnet.ibm.com> <59000AC0.7050507@linux.vnet.ibm.com>
Message-ID: <5909DAAC.3070202@linux.vnet.ibm.com>

Hi community,

I understand that there is nothing that can be done additionally regarding this
issue, at this point, on the PPC64 side.

It's a change in the shared code - but that in effect does not change anything in
the numa detection mechanism for other platforms - and hence it needs a joint
community effort to review the change and a sponsor to run it against the JPRT.

I know it's a stabilizing moment of OpenJDK 9, but since that issue is of
great concern on PPC64 (especially on POWER8 machines) I would be very glad if
the community could point out directions on how that change could move on.

Thank you!

Best regards,
Gustavo

On 25-04-2017 23:49, Gustavo Romero wrote:
> Dear Volker,
>
> On 24-04-2017 14:08, Volker Simonis wrote:
>> Hi Gustavo,
>>
>> thanks for addressing this problem and sorry for my late reply. I
>> think this is a good change which definitely improves the situation
>> for uncommon NUMA configurations without changing the handling for
>> common topologies.
>
> Thanks a lot for reviewing the change!
>
>
>> It would be great if somebody could run this through JPRT, but as
>> Gustavo mentioned, I don't expect any regressions.
>>
>> @Igor: I think you've been the original author of the NUMA-aware
>> allocator port to Linux (i.e. "6684395: Port NUMA-aware allocator to
>> linux"). If you could find some spare minutes to take a look at this
>> change, your comment would be very much appreciated :)
>>
>> Following some minor comments from me:
>>
>> - in os::numa_get_groups_num() you now use numa_num_configured_nodes()
>> to get the actual number of configured nodes. This is good and
>> certainly an improvement over the previous implementation. However,
>> the man page for numa_num_configured_nodes() mentions that the
>> returned count may contain currently disabled nodes. Do we currently
>> handle disabled nodes? What will be the consequence if we would use
>> such a disabled node (e.g. mbind() warnings)?
>
> In [1] 'numa_memnode_ptr' is set to keep a list of *just nodes with memory
> found in /sys/devices/system/node/* Hence numa_num_configured_nodes() just
> returns the number of nodes in 'numa_memnode_ptr' [2], thus just returns the
> number of nodes with memory in the system. To the best of my knowledge there is
> no system configuration on Linux/PPC64 that could match such a notion of
> "disabled nodes" as it appears in libnuma's manual. If it is enabled, it's in
> that dir and just the ones with memory will be taken into account. If it's
> disabled (somehow), it's not in the dir, so won't be taken into account (i.e. no
> mbind() tried against it).
>
> On Power it's possible to have a numa node without memory (memory-less node, a
> case covered in this change), a numa node without cpus at all but with memory
> (a configured node anyway, so a case already covered) but to disable a specific
> numa node so it does not appear in /sys/devices/system/node/* it's only possible
> from the inners of the control module. Or other rare condition not visible /
> adjustable from the OS. Also I'm not aware of a case where a node is in this
> dir but is at the same time flagged as something like "disabled". There are
> cpu/memory hotplugs, but that does not change numa nodes status AFAIK.
>
> [1] https://github.com/numactl/numactl/blob/master/libnuma.c#L334-L347
> [2] https://github.com/numactl/numactl/blob/master/libnuma.c#L614-L618
>
>
>> - the same question applies to the usage of
>> Linux::isnode_in_configured_nodes() within os::numa_get_leaf_groups().
>> Does isnode_in_configured_nodes() (i.e. the node set defined by
>> 'numa_all_nodes_ptr') take into account the disabled nodes or not? Can
>> this be a potential problem (i.e. if we use a disabled node)?
>
> On the meaning of "disabled nodes", it's the same case as above, so to the
> best of my knowledge it's not a potential problem.
>
> Anyway 'numa_all_nodes_ptr' just includes the configured nodes (with memory),
> i.e. "all nodes on which the calling task may allocate memory". It's exactly
> the same pointer returned by numa_get_membind() v2 [3] which:
>
> "returns the mask of nodes from which memory can currently be allocated"
>
> and that is used, for example, in "numactl --show" to show nodes from where
> memory can be allocated [4, 5].
>
> [3] https://github.com/numactl/numactl/blob/master/libnuma.c#L1147
> [4] https://github.com/numactl/numactl/blob/master/numactl.c#L144
> [5] https://github.com/numactl/numactl/blob/master/numactl.c#L177
>
>
>> - I'd like to suggest renaming the 'index' part of the following
>> variables and functions to 'nindex' ('node_index' is probably too long)
>> in the following code, to emphasize that we have node indexes pointing
>> to actual, not always consecutive node numbers:
>>
>> 2879 // Create an index -> node mapping, since nodes are not
>> always consecutive
>> 2880 _index_to_node = new (ResourceObj::C_HEAP, mtInternal)
>> GrowableArray(0, true);
>> 2881 rebuild_index_to_node_map();
>
> Simple change but much better to read indeed. Done.
>
>
>> - can you please wrap the following one-line else statement into curly
>> braces (it's more readable and we usually do it that way in HotSpot
>> although there are no formal style guidelines :)
>>
>> 2953 } else
>> 2954 // Current node is already a configured node.
>> 2955 closest_node = index_to_node()->at(i);
>
> Done.
>
>
>> - in os::Linux::rebuild_cpu_to_node_map(), if you set
>> 'closest_distance' to INT_MAX at the beginning of the loop, you can
>> later avoid the check for '|| !closest_distance'. Also, according to
>> the man page, numa_distance() returns 0 if it can not determine the
>> distance. So with the above change, the condition on line 2974 should
>> read:
>>
>> 2947 if (distance && distance < closest_distance) {
>>
>
> Sure, much better to set the initial condition as distant as possible and
> adjust to a closer one bit by bit improving the if condition. Done.
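The closest-node search suggested in the last review comment can be sketched in isolation as follows. This is only an illustration under stated assumptions, not the code from the webrev: closest_configured_node and example_distance are hypothetical names, and example_distance is a stand-in for numa_distance() that hard-codes the distance table of the four-node example topology from the original review request (nodes 0, 1, 16, 17) and returns 0 for unknown nodes to mimic the "cannot determine the distance" case.

```cpp
#include <climits>
#include <vector>

// Hypothetical stand-in for numa_distance(): distances for the example
// topology with nodes 0, 1, 16 and 17. Returns 0 when either node is unknown.
int example_distance(int from, int to) {
  static const int ids[4] = {0, 1, 16, 17};
  static const int table[4][4] = {
      {10, 20, 40, 40},
      {20, 10, 40, 40},
      {40, 40, 10, 20},
      {40, 40, 20, 10}};
  int row = -1;
  int col = -1;
  for (int i = 0; i < 4; ++i) {
    if (ids[i] == from) row = i;
    if (ids[i] == to) col = i;
  }
  return (row < 0 || col < 0) ? 0 : table[row][col];
}

// Returns the configured node closest to 'node', or -1 if no distance
// could be determined for any candidate.
int closest_configured_node(int node,
                            const std::vector<int>& configured_nodes,
                            int (*distance_fn)(int, int)) {
  int closest_node = -1;
  int closest_distance = INT_MAX;  // start as far away as possible
  for (int candidate : configured_nodes) {
    int distance = distance_fn(node, candidate);
    // A distance of 0 means "unknown", so it must never win the comparison.
    if (distance && distance < closest_distance) {
      closest_distance = distance;
      closest_node = candidate;
    }
  }
  return closest_node;
}
```

With the configured nodes {0, 16}, the memory-less node 1 maps to node 0 and the memory-less node 17 maps to node 16. Starting from INT_MAX removes the need for a separate "no distance recorded yet" check inside the loop.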
>
>
>> Finally, and not directly related to your change, I'd suggest the
>> following clean-ups:
>>
>> - remove the usage of 'NCPUS = 32768' in
>> os::Linux::rebuild_cpu_to_node_map(). The comment on that line is
>> unclear to me and probably related to an older version/problem of
>> libnuma? I think we should simply use
>> numa_allocate_cpumask()/numa_free_cpumask() instead.
>>
>> - we still use the NUMA version 1 function prototypes (e.g.
>> "numa_node_to_cpus(int node, unsigned long *buffer, int buffer_len)"
>> instead of "numa_node_to_cpus(int node, struct bitmask *mask)", but
>> also "numa_interleave_memory()" and maybe others). I think we should
>> switch all prototypes to the new NUMA version 2 interface which you've
>> already used for the new functions which you've added.
>
> I agree. Could I open a new bug to address these clean-ups?
>
>
>> That said, I think these changes all require libnuma 2.0 (see
>> os::Linux::libnuma_dlsym). So before starting this, you should make
>> sure that libnuma 2.0 is available on all platforms to which you'd
>> like to down-port this change. For jdk10 we could definitely do it,
>> for jdk9 probably also, for jdk8 I'm not so sure.
>
> libnuma v1 last release dates back to 2008, but any idea how could I check that
> for sure since it's on shared code?
>
> new webrev: http://cr.openjdk.java.net/~gromero/8175813/v3/
>
> Thank you!
>
> Best regards,
> Gustavo
>
>
>> Regards,
>> Volker
>>
>> On Thu, Apr 13, 2017 at 12:51 AM, Gustavo Romero
>> wrote:
>>> Hi,
>>>
>>> Any update on it?
>>>
>>> Thank you.
>>>
>>> Regards,
>>> Gustavo
>>>
>>> On 09-03-2017 16:33, Gustavo Romero wrote:
>>>> Hi,
>>>>
>>>> Could the following webrev be reviewed please?
>>>>
>>>> It improves the numa node detection when non-consecutive or memory-less nodes
>>>> exist in the system.
>>>>
>>>> webrev: http://cr.openjdk.java.net/~gromero/8175813/v2/
>>>> bug : https://bugs.openjdk.java.net/browse/JDK-8175813
>>>>
>>>> Currently, although no problem exists when the JVM detects numa nodes that are
>>>> consecutive and have memory, for example in a numa topology like:
>>>>
>>>> available: 2 nodes (0-1)
>>>> node 0 cpus: 0 8 16 24 32
>>>> node 0 size: 65258 MB
>>>> node 0 free: 34 MB
>>>> node 1 cpus: 40 48 56 64 72
>>>> node 1 size: 65320 MB
>>>> node 1 free: 150 MB
>>>> node distances:
>>>> node   0   1
>>>>   0:  10  20
>>>>   1:  20  10,
>>>>
>>>> it fails on detecting numa nodes to be used in the Parallel GC in a numa
>>>> topology like:
>>>>
>>>> available: 4 nodes (0-1,16-17)
>>>> node 0 cpus: 0 8 16 24 32
>>>> node 0 size: 130706 MB
>>>> node 0 free: 7729 MB
>>>> node 1 cpus: 40 48 56 64 72
>>>> node 1 size: 0 MB
>>>> node 1 free: 0 MB
>>>> node 16 cpus: 80 88 96 104 112
>>>> node 16 size: 130630 MB
>>>> node 16 free: 5282 MB
>>>> node 17 cpus: 120 128 136 144 152
>>>> node 17 size: 0 MB
>>>> node 17 free: 0 MB
>>>> node distances:
>>>> node   0   1  16  17
>>>>   0:  10  20  40  40
>>>>   1:  20  10  40  40
>>>>  16:  40  40  10  20
>>>>  17:  40  40  20  10,
>>>>
>>>> where node 16 is not consecutive in relation to 1 and also nodes 1 and 17 have
>>>> no memory.
>>>>
>>>> If a topology like that exists, os::numa_make_local() will receive a local group
>>>> id as a hint that is not available in the system to be bound (it will receive
>>>> all nodes from 0 to 17), causing a proliferation of "mbind: Invalid argument"
>>>> messages:
>>>>
>>>> http://cr.openjdk.java.net/~gromero/logs/jdk10_pristine.log
>>>>
>>>> That change improves the detection by making the JVM numa API aware of the
>>>> existence of numa nodes that are non-consecutive from 0 to the highest node
>>>> number and also of nodes that might be memory-less nodes, i.e. that might not
>>>> be, in libnuma terms, a configured node. Hence just the configured nodes will
>>>> be available:
>>>>
>>>> http://cr.openjdk.java.net/~gromero/logs/jdk10_numa_patched.log
>>>>
>>>> The change has no effect on numa topologies where the problem does not occur,
>>>> i.e. no change in the number of nodes and no change in the cpu to node map. On
>>>> numa topologies where memory-less nodes exist (like in the last example above),
>>>> cpus from a memory-less node won't be able to bind locally so they are mapped
>>>> to the closest node, otherwise they would not be associated with any node and
>>>> MutableNUMASpace::cas_allocate() would pick a node randomly, compromising the
>>>> performance.
>>>>
>>>> I found no regressions on x64 for the following numa topology:
>>>>
>>>> available: 2 nodes (0-1)
>>>> node 0 cpus: 0 1 2 3 8 9 10 11
>>>> node 0 size: 24102 MB
>>>> node 0 free: 19806 MB
>>>> node 1 cpus: 4 5 6 7 12 13 14 15
>>>> node 1 size: 24190 MB
>>>> node 1 free: 21951 MB
>>>> node distances:
>>>> node   0   1
>>>>   0:  10  21
>>>>   1:  21  10
>>>>
>>>> I understand that fixing the current numa detection is a prerequisite to enable
>>>> UseNUMA by default [1] and to extend the numa-aware allocation to the G1 GC [2].
>>>>
>>>> Thank you.
>>>> >>>> >>>> Best regards, >>>> Gustavo >>>> >>>> [1] https://bugs.openjdk.java.net/browse/JDK-8046153 (JEP 163: Enable NUMA Mode by Default When Appropriate) >>>> [2] https://bugs.openjdk.java.net/browse/JDK-8046147 (JEP 157: G1 GC: NUMA-Aware Allocation) >>>> >>> >> > From volker.simonis at gmail.com Wed May 3 14:34:16 2017 From: volker.simonis at gmail.com (Volker Simonis) Date: Wed, 3 May 2017 16:34:16 +0200 Subject: [10] RFR (S) 8175813: PPC64: "mbind: Invalid argument" when -XX:+UseNUMA is used In-Reply-To: <5909DAAC.3070202@linux.vnet.ibm.com> References: <58C1AE06.9060609@linux.vnet.ibm.com> <58EEAF7B.6020708@linux.vnet.ibm.com> <59000AC0.7050507@linux.vnet.ibm.com> <5909DAAC.3070202@linux.vnet.ibm.com> Message-ID: Hi, I've reviewed Gustavo's change and I'm fine with the latest version at: http://cr.openjdk.java.net/~gromero/8175813/v3/ Can somebody please sponsor the change? Thank you and best regards, Volker On Wed, May 3, 2017 at 3:27 PM, Gustavo Romero wrote: > Hi community, > > I understand that there is nothing that can be done additionally regarding this > issue, at this point, on the PPC64 side. > > It's a change in the shared code - but that in effect does not change anything in > the numa detection mechanism for other platforms - and hence it's necessary a > conjoint community effort to review the change and a sponsor to run it against > the JPRT. > > I know it's a stabilizing moment of OpenJDK 9, but since that issue is of > great concern on PPC64 (specially on POWER8 machines) I would be very glad if > the community could point out directions on how that change could move on. > > Thank you! > > Best regards, > Gustavo > > On 25-04-2017 23:49, Gustavo Romero wrote: >> Dear Volker, >> >> On 24-04-2017 14:08, Volker Simonis wrote: >>> Hi Gustavo, >>> >>> thanks for addressing this problem and sorry for my late reply. 
I >>> think this is a good change which definitely improves the situation >>> for uncommon NUMA configurations without changing the handling for >>> common topologies. >> >> Thanks a lot for reviewing the change! >> >> >>> It would be great if somebody could run this trough JPRT, but as >>> Gustavo mentioned, I don't expect any regressions. >>> >>> @Igor: I think you've been the original author of the NUMA-aware >>> allocator port to Linux (i.e. "6684395: Port NUMA-aware allocator to >>> linux"). If you could find some spare minutes to take a look at this >>> change, your comment would be very much appreciated :) >>> >>> Following some minor comments from me: >>> >>> - in os::numa_get_groups_num() you now use numa_num_configured_nodes() >>> to get the actual number of configured nodes. This is good and >>> certainly an improvement over the previous implementation. However, >>> the man page for numa_num_configured_nodes() mentions that the >>> returned count may contain currently disabled nodes. Do we currently >>> handle disabled nodes? What will be the consequence if we would use >>> such a disabled node (e.g. mbind() warnings)? >> >> In [1] 'numa_memnode_ptr' is set to keep a list of *just nodes with memory in >> found in /sys/devices/system/node/* Hence numa_num_configured_nodes() just >> returns the number of nodes in 'numa_memnode_ptr' [2], thus just returns the >> number of nodes with memory in the system. To the best of my knowledge there is >> no system configuration on Linux/PPC64 that could match such a notion of >> "disabled nodes" as it appears in libnuma's manual. If it is enabled, it's in >> that dir and just the ones with memory will be taken into account. If it's >> disabled (somehow), it's not in the dir, so won't be taken into account (i.e. no >> mbind() tried against it). 
>> >> On Power it's possible to have a numa node without memory (memory-less node, a >> case covered in this change), a numa node without cpus at all but with memory >> (a configured node anyway, so a case already covered) but to disable a specific >> numa node so it does not appear in /sys/devices/system/node/* it's only possible >> from the inners of the control module. Or other rare condition not invisible / >> adjustable from the OS. Also I'm not aware of a case where a node is in this >> dir but is at the same time flagged as something like "disabled". There are >> cpu/memory hotplugs, but that does not change numa nodes status AFAIK. >> >> [1] https://github.com/numactl/numactl/blob/master/libnuma.c#L334-L347 >> [2] https://github.com/numactl/numactl/blob/master/libnuma.c#L614-L618 >> >> >>> - the same question applies to the usage of >>> Linux::isnode_in_configured_nodes() within os::numa_get_leaf_groups(). >>> Does isnode_in_configured_nodes() (i.e. the node set defined by >>> 'numa_all_nodes_ptr' take into account the disabled nodes or not? Can >>> this be a potential problem (i.e. if we use a disabled node). >> >> On the meaning of "disabled nodes", it's the same case as above, so to the >> best of knowledge it's not a potential problem. >> >> Anyway 'numa_all_nodes_ptr' just includes the configured nodes (with memory), >> i.e. "all nodes on which the calling task may allocate memory". It's exactly >> the same pointer returned by numa_get_membind() v2 [3] which: >> >> "returns the mask of nodes from which memory can currently be allocated" >> >> and that is used, for example, in "numactl --show" to show nodes from where >> memory can be allocated [4, 5]. 
>> >> [3] https://github.com/numactl/numactl/blob/master/libnuma.c#L1147 >> [4] https://github.com/numactl/numactl/blob/master/numactl.c#L144 >> [5] https://github.com/numactl/numactl/blob/master/numactl.c#L177 >> >> >>> - I'd like to suggest renaming the 'index' part of the following >>> variables and functions to 'nindex' ('node_index' is probably to long) >>> in the following code, to emphasize that we have node indexes pointing >>> to actual, not always consecutive node numbers: >>> >>> 2879 // Create an index -> node mapping, since nodes are not >>> always consecutive >>> 2880 _index_to_node = new (ResourceObj::C_HEAP, mtInternal) >>> GrowableArray(0, true); >>> 2881 rebuild_index_to_node_map(); >> >> Simple change but much better to read indeed. Done. >> >> >>> - can you please wrap the following one-line else statement into curly >>> braces (it's more readable and we usually do it that way in HotSpot >>> although there are no formal style guidelines :) >>> >>> 2953 } else >>> 2954 // Current node is already a configured node. >>> 2955 closest_node = index_to_node()->at(i); >> >> Done. >> >> >>> - in os::Linux::rebuild_cpu_to_node_map(), if you set >>> 'closest_distance' to INT_MAX at the beginning of the loop, you can >>> later avoid the check for '|| !closest_distance'. Also, according to >>> the man page, numa_distance() returns 0 if it can not determine the >>> distance. So with the above change, the condition on line 2974 should >>> read: >>> >>> 2947 if (distance && distance < closest_distance) { >>> >> >> Sure, much better to set the initial condition as distant as possible and >> adjust to a closer one bit by bit improving the if condition. Done. >> >> >>> Finally, and not directly related to your change, I'd suggest the >>> following clean-ups: >>> >>> - remove the usage of 'NCPUS = 32768' in >>> os::Linux::rebuild_cpu_to_node_map(). The comment on that line is >>> unclear to me and probably related to an older version/problem of >>> libnuma? 
I think we should simply use >>> numa_allocate_cpumask()/numa_free_cpumask() instead. >>> >>> - we still use the NUMA version 1 function prototypes (e.g. >>> "numa_node_to_cpus(int node, unsigned long *buffer, int buffer_len)" >>> instead of "numa_node_to_cpus(int node, struct bitmask *mask)", but >>> also "numa_interleave_memory()" and maybe others). I think we should >>> switch all prototypes to the new NUMA version 2 interface which you've >>> already used for the new functions which you've added. >> >> I agree. Could I open a new bug to address these clean-ups? >> >> >>> That said, I think these changes all require libnuma 2.0 (see >>> os::Linux::libnuma_dlsym). So before starting this, you should make >>> sure that libnuma 2.0 is available on all platforms to which you'd >>> like to down-port this change. For jdk10 we could definitely do it, >>> for jdk9 probably also, for jdk8 I'm not so sure. >> >> libnuma v1 last release dates back to 2008, but any idea how could I check that >> for sure since it's on shared code? >> >> new webrev: http://cr.openjdk.java.net/~gromero/8175813/v3/ >> >> Thank you! >> >> Best regards, >> Gustavo >> >> >>> Regards, >>> Volker >>> >>> On Thu, Apr 13, 2017 at 12:51 AM, Gustavo Romero >>> wrote: >>>> Hi, >>>> >>>> Any update on it? >>>> >>>> Thank you. >>>> >>>> Regards, >>>> Gustavo >>>> >>>> On 09-03-2017 16:33, Gustavo Romero wrote: >>>>> Hi, >>>>> >>>>> Could the following webrev be reviewed please? >>>>> >>>>> It improves the numa node detection when non-consecutive or memory-less nodes >>>>> exist in the system. 
>>>>> >>>>> webrev: http://cr.openjdk.java.net/~gromero/8175813/v2/ >>>>> bug : https://bugs.openjdk.java.net/browse/JDK-8175813 >>>>> >>>>> Currently, although no problem exists when the JVM detects numa nodes that are >>>>> consecutive and have memory, for example in a numa topology like: >>>>> >>>>> available: 2 nodes (0-1) >>>>> node 0 cpus: 0 8 16 24 32 >>>>> node 0 size: 65258 MB >>>>> node 0 free: 34 MB >>>>> node 1 cpus: 40 48 56 64 72 >>>>> node 1 size: 65320 MB >>>>> node 1 free: 150 MB >>>>> node distances: >>>>> node 0 1 >>>>> 0: 10 20 >>>>> 1: 20 10, >>>>> >>>>> it fails on detecting numa nodes to be used in the Parallel GC in a numa >>>>> topology like: >>>>> >>>>> available: 4 nodes (0-1,16-17) >>>>> node 0 cpus: 0 8 16 24 32 >>>>> node 0 size: 130706 MB >>>>> node 0 free: 7729 MB >>>>> node 1 cpus: 40 48 56 64 72 >>>>> node 1 size: 0 MB >>>>> node 1 free: 0 MB >>>>> node 16 cpus: 80 88 96 104 112 >>>>> node 16 size: 130630 MB >>>>> node 16 free: 5282 MB >>>>> node 17 cpus: 120 128 136 144 152 >>>>> node 17 size: 0 MB >>>>> node 17 free: 0 MB >>>>> node distances: >>>>> node 0 1 16 17 >>>>> 0: 10 20 40 40 >>>>> 1: 20 10 40 40 >>>>> 16: 40 40 10 20 >>>>> 17: 40 40 20 10, >>>>> >>>>> where node 16 is not consecutive in relation to 1 and also nodes 1 and 17 have >>>>> no memory. >>>>> >>>>> If a topology like that exists, os::numa_make_local() will receive a local group >>>>> id as a hint that is not available in the system to be bound (it will receive >>>>> all nodes from 0 to 17), causing a proliferation of "mbind: Invalid argument" >>>>> messages: >>>>> >>>>> http://cr.openjdk.java.net/~gromero/logs/jdk10_pristine.log >>>>> >>>>> That change improves the detection by making the JVM numa API aware of the >>>>> existence of numa nodes that are non-consecutive from 0 to the highest node >>>>> number and also of nodes that might be memory-less nodes, i.e. that might not >>>>> be, in libnuma terms, a configured node. 
Hence just the configured nodes will >>>>> be available: >>>>> >>>>> http://cr.openjdk.java.net/~gromero/logs/jdk10_numa_patched.log >>>>> >>>>> The change has no effect on numa topologies were the problem does not occur, >>>>> i.e. no change in the number of nodes and no change in the cpu to node map. On >>>>> numa topologies where memory-less nodes exist (like in the last example above), >>>>> cpus from a memory-less node won't be able to bind locally so they are mapped >>>>> to the closest node, otherwise they would be not associate to any node and >>>>> MutableNUMASpace::cas_allocate() would pick a node randomly, compromising the >>>>> performance. >>>>> >>>>> I found no regressions on x64 for the following numa topology: >>>>> >>>>> available: 2 nodes (0-1) >>>>> node 0 cpus: 0 1 2 3 8 9 10 11 >>>>> node 0 size: 24102 MB >>>>> node 0 free: 19806 MB >>>>> node 1 cpus: 4 5 6 7 12 13 14 15 >>>>> node 1 size: 24190 MB >>>>> node 1 free: 21951 MB >>>>> node distances: >>>>> node 0 1 >>>>> 0: 10 21 >>>>> 1: 21 10 >>>>> >>>>> I understand that fixing the current numa detection is a prerequisite to enable >>>>> UseNUMA by the default [1] and to extend the numa-aware allocation to the G1 GC [2]. >>>>> >>>>> Thank you. 
>>>>> >>>>> >>>>> Best regards, >>>>> Gustavo >>>>> >>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8046153 (JEP 163: Enable NUMA Mode by Default When Appropriate) >>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8046147 (JEP 157: G1 GC: NUMA-Aware Allocation) >>>>> >>>> >>> >> > From david.holmes at oracle.com Thu May 4 01:50:27 2017 From: david.holmes at oracle.com (David Holmes) Date: Thu, 4 May 2017 11:50:27 +1000 Subject: [10] RFR (S) 8175813: PPC64: "mbind: Invalid argument" when -XX:+UseNUMA is used In-Reply-To: References: <58C1AE06.9060609@linux.vnet.ibm.com> <58EEAF7B.6020708@linux.vnet.ibm.com> <59000AC0.7050507@linux.vnet.ibm.com> <5909DAAC.3070202@linux.vnet.ibm.com> Message-ID: <0e89961f-e5da-cb85-e30d-33e424b69a0b@oracle.com> Hi Volker, Gustavo, I will try to take a look at this again, but may be a day or two. David On 4/05/2017 12:34 AM, Volker Simonis wrote: > Hi, > > I've reviewed Gustavo's change and I'm fine with the latest version at: > > http://cr.openjdk.java.net/~gromero/8175813/v3/ > > Can somebody please sponsor the change? > > Thank you and best regards, > Volker > > > On Wed, May 3, 2017 at 3:27 PM, Gustavo Romero > wrote: >> Hi community, >> >> I understand that there is nothing that can be done additionally regarding this >> issue, at this point, on the PPC64 side. >> >> It's a change in the shared code - but that in effect does not change anything in >> the numa detection mechanism for other platforms - and hence it's necessary a >> conjoint community effort to review the change and a sponsor to run it against >> the JPRT. >> >> I know it's a stabilizing moment of OpenJDK 9, but since that issue is of >> great concern on PPC64 (specially on POWER8 machines) I would be very glad if >> the community could point out directions on how that change could move on. >> >> Thank you! 
>> >> Best regards, >> Gustavo >> >> On 25-04-2017 23:49, Gustavo Romero wrote: >>> Dear Volker, >>> >>> On 24-04-2017 14:08, Volker Simonis wrote: >>>> Hi Gustavo, >>>> >>>> thanks for addressing this problem and sorry for my late reply. I >>>> think this is a good change which definitely improves the situation >>>> for uncommon NUMA configurations without changing the handling for >>>> common topologies. >>> >>> Thanks a lot for reviewing the change! >>> >>> >>>> It would be great if somebody could run this trough JPRT, but as >>>> Gustavo mentioned, I don't expect any regressions. >>>> >>>> @Igor: I think you've been the original author of the NUMA-aware >>>> allocator port to Linux (i.e. "6684395: Port NUMA-aware allocator to >>>> linux"). If you could find some spare minutes to take a look at this >>>> change, your comment would be very much appreciated :) >>>> >>>> Following some minor comments from me: >>>> >>>> - in os::numa_get_groups_num() you now use numa_num_configured_nodes() >>>> to get the actual number of configured nodes. This is good and >>>> certainly an improvement over the previous implementation. However, >>>> the man page for numa_num_configured_nodes() mentions that the >>>> returned count may contain currently disabled nodes. Do we currently >>>> handle disabled nodes? What will be the consequence if we would use >>>> such a disabled node (e.g. mbind() warnings)? >>> >>> In [1] 'numa_memnode_ptr' is set to keep a list of *just nodes with memory in >>> found in /sys/devices/system/node/* Hence numa_num_configured_nodes() just >>> returns the number of nodes in 'numa_memnode_ptr' [2], thus just returns the >>> number of nodes with memory in the system. To the best of my knowledge there is >>> no system configuration on Linux/PPC64 that could match such a notion of >>> "disabled nodes" as it appears in libnuma's manual. If it is enabled, it's in >>> that dir and just the ones with memory will be taken into account. 
>>> If it's disabled (somehow), it's not in the dir, so won't be taken into account
>>> (i.e. no mbind() tried against it).
>>>
>>> On Power it's possible to have a numa node without memory (memory-less node, a
>>> case covered in this change), a numa node without cpus at all but with memory
>>> (a configured node anyway, so a case already covered) but to disable a specific
>>> numa node so it does not appear in /sys/devices/system/node/* it's only possible
>>> from the inners of the control module. Or other rare condition not visible /
>>> adjustable from the OS. Also I'm not aware of a case where a node is in this
>>> dir but is at the same time flagged as something like "disabled". There are
>>> cpu/memory hotplugs, but that does not change numa nodes status AFAIK.
>>>
>>> [1] https://github.com/numactl/numactl/blob/master/libnuma.c#L334-L347
>>> [2] https://github.com/numactl/numactl/blob/master/libnuma.c#L614-L618
>>>
>>>
>>>> - the same question applies to the usage of
>>>> Linux::isnode_in_configured_nodes() within os::numa_get_leaf_groups().
>>>> Does isnode_in_configured_nodes() (i.e. the node set defined by
>>>> 'numa_all_nodes_ptr') take into account the disabled nodes or not? Can
>>>> this be a potential problem (i.e. if we use a disabled node)?
>>>
>>> On the meaning of "disabled nodes", it's the same case as above, so to the
>>> best of my knowledge it's not a potential problem.
>>>
>>> Anyway 'numa_all_nodes_ptr' just includes the configured nodes (with memory),
>>> i.e. "all nodes on which the calling task may allocate memory". It's exactly
>>> the same pointer returned by numa_get_membind() v2 [3] which:
>>>
>>> "returns the mask of nodes from which memory can currently be allocated"
>>>
>>> and that is used, for example, in "numactl --show" to show nodes from where
>>> memory can be allocated [4, 5].
>>>
>>> [3] https://github.com/numactl/numactl/blob/master/libnuma.c#L1147
>>> [4] https://github.com/numactl/numactl/blob/master/numactl.c#L144
>>> [5] https://github.com/numactl/numactl/blob/master/numactl.c#L177
>>>
>>>
>>>> - I'd like to suggest renaming the 'index' part of the following
>>>> variables and functions to 'nindex' ('node_index' is probably too long)
>>>> in the following code, to emphasize that we have node indexes pointing
>>>> to actual, not always consecutive node numbers:
>>>>
>>>> 2879 // Create an index -> node mapping, since nodes are not always consecutive
>>>> 2880 _index_to_node = new (ResourceObj::C_HEAP, mtInternal) GrowableArray<int>(0, true);
>>>> 2881 rebuild_index_to_node_map();
>>>
>>> Simple change but much better to read indeed. Done.
>>>
>>>
>>>> - can you please wrap the following one-line else statement into curly
>>>> braces (it's more readable and we usually do it that way in HotSpot
>>>> although there are no formal style guidelines :)
>>>>
>>>> 2953 } else
>>>> 2954 // Current node is already a configured node.
>>>> 2955 closest_node = index_to_node()->at(i);
>>>
>>> Done.
>>>
>>>
>>>> - in os::Linux::rebuild_cpu_to_node_map(), if you set
>>>> 'closest_distance' to INT_MAX at the beginning of the loop, you can
>>>> later avoid the check for '|| !closest_distance'. Also, according to
>>>> the man page, numa_distance() returns 0 if it can not determine the
>>>> distance. So with the above change, the condition on line 2974 should
>>>> read:
>>>>
>>>> 2947 if (distance && distance < closest_distance) {
>>>>
>>>
>>> Sure, much better to set the initial condition as distant as possible and
>>> adjust to a closer one bit by bit improving the if condition. Done.
>>>
>>>
>>>> Finally, and not directly related to your change, I'd suggest the
>>>> following clean-ups:
>>>>
>>>> - remove the usage of 'NCPUS = 32768' in
>>>> os::Linux::rebuild_cpu_to_node_map().
>>>> The comment on that line is unclear to me and probably related to an
>>>> older version/problem of libnuma? I think we should simply use
>>>> numa_allocate_cpumask()/numa_free_cpumask() instead.
>>>>
>>>> - we still use the NUMA version 1 function prototypes (e.g.
>>>> "numa_node_to_cpus(int node, unsigned long *buffer, int buffer_len)"
>>>> instead of "numa_node_to_cpus(int node, struct bitmask *mask)", but
>>>> also "numa_interleave_memory()" and maybe others). I think we should
>>>> switch all prototypes to the new NUMA version 2 interface which you've
>>>> already used for the new functions which you've added.
>>>
>>> I agree. Could I open a new bug to address these clean-ups?
>>>
>>>
>>>> That said, I think these changes all require libnuma 2.0 (see
>>>> os::Linux::libnuma_dlsym). So before starting this, you should make
>>>> sure that libnuma 2.0 is available on all platforms to which you'd
>>>> like to down-port this change. For jdk10 we could definitely do it,
>>>> for jdk9 probably also, for jdk8 I'm not so sure.
>>>
>>> libnuma v1 last release dates back to 2008, but any idea how could I check that
>>> for sure since it's on shared code?
>>>
>>> new webrev: http://cr.openjdk.java.net/~gromero/8175813/v3/
>>>
>>> Thank you!
>>>
>>> Best regards,
>>> Gustavo
>>>
>>>
>>>> Regards,
>>>> Volker
>>>>
>>>> On Thu, Apr 13, 2017 at 12:51 AM, Gustavo Romero wrote:
>>>>> Hi,
>>>>>
>>>>> Any update on it?
>>>>>
>>>>> Thank you.
>>>>>
>>>>> Regards,
>>>>> Gustavo
>>>>>
>>>>> On 09-03-2017 16:33, Gustavo Romero wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Could the following webrev be reviewed please?
>>>>>>
>>>>>> It improves the numa node detection when non-consecutive or memory-less nodes
>>>>>> exist in the system.
>>>>>>
>>>>>> webrev: http://cr.openjdk.java.net/~gromero/8175813/v2/
>>>>>> bug   : https://bugs.openjdk.java.net/browse/JDK-8175813
>>>>>>
>>>>>> Currently, although no problem exists when the JVM detects numa nodes that are
>>>>>> consecutive and have memory, for example in a numa topology like:
>>>>>>
>>>>>> available: 2 nodes (0-1)
>>>>>> node 0 cpus: 0 8 16 24 32
>>>>>> node 0 size: 65258 MB
>>>>>> node 0 free: 34 MB
>>>>>> node 1 cpus: 40 48 56 64 72
>>>>>> node 1 size: 65320 MB
>>>>>> node 1 free: 150 MB
>>>>>> node distances:
>>>>>> node   0   1
>>>>>>   0:  10  20
>>>>>>   1:  20  10,
>>>>>>
>>>>>> it fails on detecting numa nodes to be used in the Parallel GC in a numa
>>>>>> topology like:
>>>>>>
>>>>>> available: 4 nodes (0-1,16-17)
>>>>>> node 0 cpus: 0 8 16 24 32
>>>>>> node 0 size: 130706 MB
>>>>>> node 0 free: 7729 MB
>>>>>> node 1 cpus: 40 48 56 64 72
>>>>>> node 1 size: 0 MB
>>>>>> node 1 free: 0 MB
>>>>>> node 16 cpus: 80 88 96 104 112
>>>>>> node 16 size: 130630 MB
>>>>>> node 16 free: 5282 MB
>>>>>> node 17 cpus: 120 128 136 144 152
>>>>>> node 17 size: 0 MB
>>>>>> node 17 free: 0 MB
>>>>>> node distances:
>>>>>> node   0   1  16  17
>>>>>>   0:  10  20  40  40
>>>>>>   1:  20  10  40  40
>>>>>>  16:  40  40  10  20
>>>>>>  17:  40  40  20  10,
>>>>>>
>>>>>> where node 16 is not consecutive in relation to 1 and also nodes 1 and 17 have
>>>>>> no memory.
>>>>>>
>>>>>> If a topology like that exists, os::numa_make_local() will receive a local group
>>>>>> id as a hint that is not available in the system to be bound (it will receive
>>>>>> all nodes from 0 to 17), causing a proliferation of "mbind: Invalid argument"
>>>>>> messages:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~gromero/logs/jdk10_pristine.log
>>>>>>
>>>>>> That change improves the detection by making the JVM numa API aware of the
>>>>>> existence of numa nodes that are non-consecutive from 0 to the highest node
>>>>>> number and also of nodes that might be memory-less nodes, i.e.
>>>>>> that might not be, in libnuma terms, a configured node. Hence just the
>>>>>> configured nodes will be available:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~gromero/logs/jdk10_numa_patched.log
>>>>>>
>>>>>> The change has no effect on numa topologies where the problem does not occur,
>>>>>> i.e. no change in the number of nodes and no change in the cpu to node map. On
>>>>>> numa topologies where memory-less nodes exist (like in the last example above),
>>>>>> cpus from a memory-less node won't be able to bind locally so they are mapped
>>>>>> to the closest node, otherwise they would not be associated with any node and
>>>>>> MutableNUMASpace::cas_allocate() would pick a node randomly, compromising the
>>>>>> performance.
>>>>>>
>>>>>> I found no regressions on x64 for the following numa topology:
>>>>>>
>>>>>> available: 2 nodes (0-1)
>>>>>> node 0 cpus: 0 1 2 3 8 9 10 11
>>>>>> node 0 size: 24102 MB
>>>>>> node 0 free: 19806 MB
>>>>>> node 1 cpus: 4 5 6 7 12 13 14 15
>>>>>> node 1 size: 24190 MB
>>>>>> node 1 free: 21951 MB
>>>>>> node distances:
>>>>>> node   0   1
>>>>>>   0:  10  21
>>>>>>   1:  21  10
>>>>>>
>>>>>> I understand that fixing the current numa detection is a prerequisite to enable
>>>>>> UseNUMA by default [1] and to extend the numa-aware allocation to the G1 GC [2].
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>>
>>>>>> Best regards,
>>>>>> Gustavo
>>>>>>
>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8046153 (JEP 163: Enable NUMA Mode by Default When Appropriate)
>>>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8046147 (JEP 157: G1 GC: NUMA-Aware Allocation)
>>>>>>
>>>>>
>>>>
>>>
>>

From david.holmes at oracle.com  Fri May  5 00:32:17 2017
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 5 May 2017 10:32:17 +1000
Subject: [10] RFR (S) 8175813: PPC64: "mbind: Invalid argument" when -XX:+UseNUMA is used
In-Reply-To:
References: <58C1AE06.9060609@linux.vnet.ibm.com> <58EEAF7B.6020708@linux.vnet.ibm.com> <59000AC0.7050507@linux.vnet.ibm.com> <5909DAAC.3070202@linux.vnet.ibm.com>
Message-ID:

Hi Volker, Gustavo,

On 4/05/2017 12:34 AM, Volker Simonis wrote:
> Hi,
>
> I've reviewed Gustavo's change and I'm fine with the latest version at:
>
> http://cr.openjdk.java.net/~gromero/8175813/v3/

Nothing has really changed for me since I first looked at this - I don't know NUMA and I can't comment on any of the details. But no-one else has commented negatively so they are implicitly okay with this, or else they should have spoken up. So with Volker as the Reviewer and myself as a second reviewer, I will sponsor this. I'll run the current patch through JPRT while awaiting the final version.

One thing I was unclear on with all this numa code is the expectation regarding all those dynamically looked up functions - is it expected that we will have them all or else have none? It wasn't at all obvious what would happen if we don't have those functions but still executed this code - assuming that is even possible. I guess I would have expected that no numa code would execute unless -XX:+UseNUMA was set, in which case the VM would abort if any of the libnuma functions could not be found. That way we wouldn't need the null checks for the function pointers.
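
To make the question above concrete, here is a minimal sketch of the two safeguards being weighed: switching NUMA support off when a symbol cannot be resolved, and null-checking the function pointer before every call. It is not taken from os_linux.cpp. NumaApiSketch, int_fn, resolver_fn and the two fake resolvers are hypothetical names, and the resolver parameter only stands in for a dlsym()-style lookup so the sketch has no dependency on libnuma being installed.

```cpp
#include <cassert>

// Hypothetical types for this sketch only; not HotSpot's declarations.
typedef int (*int_fn)();                     // shape of a zero-argument entry point
typedef int_fn (*resolver_fn)(const char*);  // stand-in for a dlsym()-style lookup

struct NumaApiSketch {
  int_fn max_node = nullptr;  // stays null when the symbol cannot be resolved
  bool   use_numa = true;     // plays the role of the UseNUMA flag

  // Resolve the entry point; on failure switch NUMA off rather than abort.
  void init(resolver_fn resolve) {
    assert(resolve != nullptr);
    max_node = resolve("numa_max_node");
    if (max_node == nullptr) {
      use_numa = false;
    }
  }

  // Null-checked wrapper: callable even if init() failed or was never run.
  int highest_node() const {
    return max_node != nullptr ? max_node() : -1;
  }
};

// Two fake resolvers for trying the sketch without libnuma installed.
static int fake_max_node() { return 17; }
static int_fn resolve_all(const char*)  { return fake_max_node; }
static int_fn resolve_none(const char*) { return nullptr; }
```

With resolve_all the flag stays on and highest_node() forwards to the resolved function; with resolve_none the flag is cleared and the wrapper still returns -1 instead of crashing. If the VM aborted on a failed lookup instead, the null check in the wrapper would be redundant.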
Style nits:
- we should avoid implicit booleans, so the isnode_in_* functions should return bool not int; and check "distance != 0"
- spaces around operators eg. node=0 should be node = 0

Thanks,
David

From gromero at linux.vnet.ibm.com  Fri May  5 19:43:35 2017
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Fri, 5 May 2017 16:43:35 -0300
Subject: [10] RFR (S) 8175813: PPC64: "mbind: Invalid argument" when -XX:+UseNUMA is used
In-Reply-To:
References: <58C1AE06.9060609@linux.vnet.ibm.com> <58EEAF7B.6020708@linux.vnet.ibm.com> <59000AC0.7050507@linux.vnet.ibm.com> <5909DAAC.3070202@linux.vnet.ibm.com>
Message-ID: <590CD5E7.10809@linux.vnet.ibm.com>

Hi David,

On 04-05-2017 21:32, David Holmes wrote:
> Hi Volker, Gustavo,
>
> On 4/05/2017 12:34 AM, Volker Simonis wrote:
>> Hi,
>>
>> I've reviewed Gustavo's change and I'm fine with the latest version at:
>>
>> http://cr.openjdk.java.net/~gromero/8175813/v3/
>
> Nothing has really changed for me since I first looked at this - I don't know NUMA and I can't comment on any of the details. But no-one else has commented negatively so they are implicitly okay with
> this, or else they should have spoken up. So with Volker as the Reviewer and myself as a second reviewer, I will sponsor this.
> I'll run the current patch through JPRT while awaiting the final version.

Thanks a lot for reviewing and sponsoring the change.


> One thing I was unclear on with all this numa code is the expectation regarding all those dynamically looked up functions - is it expected that we will have them all or else have none? It wasn't at
> all obvious what would happen if we don't have those functions but still executed this code - assuming that is even possible. I guess I would have expected that no numa code would execute unless
> -XX:+UseNUMA was set, in which case the VM would abort if any of the libnuma functions could not be found. That way we wouldn't need the null checks for the function pointers.

If libnuma is not available on the system, os::Linux::libnuma_init() will return
false and the JVM will refuse to enable the UseNUMA features instead of aborting:

4904   if (UseNUMA) {
4905     if (!Linux::libnuma_init()) {
4906       UseNUMA = false;
4907     } else {

I understand those null checks as part of the initial design of the JVM numa API,
enforcing protection against the use of its methods in other parts of the code
when the API failed to initialize properly, even though it's expected that
UseNUMA = false should suffice to protect against such usages.

That said, I could not find any recent Linux distribution that does not support
the libnuma v2 API (and so also the v1 API). On Ubuntu it will be installed as a
dependency of the metapackage ubuntu-standard, and because that requires
"irqbalance" it also requires libnuma.
Libnuma was updated from libnuma v1 to v2 around mid 2008:

numactl (2.0.1-1) unstable; urgency=low

  * New upstream
  * patches/static-lib.patch: update
  * debian/watch: update to new SGI location

 -- Ian Wienand  Sat, 07 Jun 2008 14:18:22 -0700

numactl (1.0.2-1) unstable; urgency=low

  * New upstream
  * Closes: #442690 -- Add to rules a hack to remove libnuma.a after unpatching
  * Update README.debian

 -- Ian Wienand  Wed, 03 Oct 2007 21:49:27 +1000

It's similar on RHEL, where "irqbalance" is in core group. Regarding the libnuma
version it was also updated in 2008 to v2, so since Fedora 11 contains v2, hence
RHEL 6 and RHEL 7 contains it:

* Wed Feb 25 2009 Fedora Release Engineering - 2.0.2-3
- Rebuilt for https://fedoraproject.org/wiki/Fedora_11_Mass_Rebuild

* Mon Sep 29 2008 Neil Horman - 2.0.2-2
- Fix build break due to register selection in asm

* Mon Sep 29 2008 Neil Horman - 2.0.2-1
- Update rawhide to version 2.0.2 of numactl

* Fri Apr 25 2008 Neil Horman - 1.0.2-6
- Fix buffer size passing and arg sanity check for physcpubind (bz 442521)

Also, the last release of libnuma v1 dates back to 2008:
https://github.com/numactl/numactl/releases/tag/v1.0.2

So it looks like libnuma v2 absence on Linux is by now uncommon.


> Style nits:
> - we should avoid implicit booleans, so the isnode_in_* functions should return bool not int; and check "distance != 0"
> - spaces around operators eg. node=0 should be node = 0

new webrev: http://cr.openjdk.java.net/~gromero/8175813/v4/

Thank you and best regards,
Gustavo

> Thanks,
> David
>>>>>>> >>>>>>> If a topology like that exists, os::numa_make_local() will receive a local group >>>>>>> id as a hint that is not available in the system to be bound (it will receive >>>>>>> all nodes from 0 to 17), causing a proliferation of "mbind: Invalid argument" >>>>>>> messages: >>>>>>> >>>>>>> http://cr.openjdk.java.net/~gromero/logs/jdk10_pristine.log >>>>>>> >>>>>>> That change improves the detection by making the JVM numa API aware of the >>>>>>> existence of numa nodes that are non-consecutive from 0 to the highest node >>>>>>> number and also of nodes that might be memory-less nodes, i.e. that might not >>>>>>> be, in libnuma terms, a configured node. Hence just the configured nodes will >>>>>>> be available: >>>>>>> >>>>>>> http://cr.openjdk.java.net/~gromero/logs/jdk10_numa_patched.log >>>>>>> >>>>>>> The change has no effect on numa topologies were the problem does not occur, >>>>>>> i.e. no change in the number of nodes and no change in the cpu to node map. On >>>>>>> numa topologies where memory-less nodes exist (like in the last example above), >>>>>>> cpus from a memory-less node won't be able to bind locally so they are mapped >>>>>>> to the closest node, otherwise they would be not associate to any node and >>>>>>> MutableNUMASpace::cas_allocate() would pick a node randomly, compromising the >>>>>>> performance. >>>>>>> >>>>>>> I found no regressions on x64 for the following numa topology: >>>>>>> >>>>>>> available: 2 nodes (0-1) >>>>>>> node 0 cpus: 0 1 2 3 8 9 10 11 >>>>>>> node 0 size: 24102 MB >>>>>>> node 0 free: 19806 MB >>>>>>> node 1 cpus: 4 5 6 7 12 13 14 15 >>>>>>> node 1 size: 24190 MB >>>>>>> node 1 free: 21951 MB >>>>>>> node distances: >>>>>>> node 0 1 >>>>>>> 0: 10 21 >>>>>>> 1: 21 10 >>>>>>> >>>>>>> I understand that fixing the current numa detection is a prerequisite to enable >>>>>>> UseNUMA by the default [1] and to extend the numa-aware allocation to the G1 GC [2]. >>>>>>> >>>>>>> Thank you. 
>>>>>>>
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Gustavo
>>>>>>>
>>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8046153 (JEP 163: Enable NUMA Mode by Default When Appropriate)
>>>>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8046147 (JEP 157: G1 GC: NUMA-Aware Allocation)
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>

From volker.simonis at gmail.com Sat May 6 06:59:15 2017
From: volker.simonis at gmail.com (Volker Simonis)
Date: Sat, 6 May 2017 08:59:15 +0200
Subject: [10] RFR (S) 8175813: PPC64: "mbind: Invalid argument" when -XX:+UseNUMA is used
In-Reply-To: <590CD5E7.10809@linux.vnet.ibm.com>
References: <58C1AE06.9060609@linux.vnet.ibm.com> <58EEAF7B.6020708@linux.vnet.ibm.com> <59000AC0.7050507@linux.vnet.ibm.com> <5909DAAC.3070202@linux.vnet.ibm.com> <590CD5E7.10809@linux.vnet.ibm.com>
Message-ID:

On Fri, May 5, 2017 at 9:43 PM, Gustavo Romero wrote:
> Hi David,
>
> On 04-05-2017 21:32, David Holmes wrote:
>> Hi Volker, Gustavo,
>>
>> On 4/05/2017 12:34 AM, Volker Simonis wrote:
>>> Hi,
>>>
>>> I've reviewed Gustavo's change and I'm fine with the latest version at:
>>>
>>> http://cr.openjdk.java.net/~gromero/8175813/v3/
>>
>> Nothing has really changed for me since I first looked at this - I don't know NUMA and I can't comment on any of the details. But no-one else has commented negatively so they are implicitly okay with
>> this, or else they should have spoken up. So with Volker as the Reviewer and myself as a second reviewer, I will sponsor this. I'll run the current patch through JPRT while awaiting the final version.
>
> Thanks a lot for reviewing and sponsoring the change.
>
>
>> One thing I was unclear on with all this numa code is the expectation regarding all those dynamically looked up functions - is it expected that we will have them all or else have none? It wasn't at
>> all obvious what would happen if we don't have those functions but still executed this code - assuming that is even possible.
>> I guess I would have expected that no numa code would execute unless
>> -XX:+UseNUMA was set, in which case the VM would abort if any of the libnuma functions could not be found. That way we wouldn't need the null checks for the function pointers.
>
> If libnuma is not available in the system os::Linux::libnuma_init() will return
> false and JVM will refuse to enable the UseNUMA features instead of aborting:
>
> 4904   if (UseNUMA) {
> 4905     if (!Linux::libnuma_init()) {
> 4906       UseNUMA = false;
> 4907     } else {
>
> I understand those null checks as part of the initial design of JVM numa api to
> enforce protection against the usage of its methods in other parts of the code
> when JVM api failed to initialize properly, even though it's expected that
> UseNUMA = false should suffice to protect against such usages.
>
> That said, I could not find any recent Linux distribution that does not support
> libnuma v2 api (and so also v1 api). On Ubuntu it will be installed as a
> dependency of metapackage ubuntu-standard and because that requires "irqbalance"
> it also requires libnuma. Libnuma was updated from libnuma v1 to v2
> around mid 2008:
>
> numactl (2.0.1-1) unstable; urgency=low
>
>   * New upstream
>   * patches/static-lib.patch: update
>   * debian/watch: update to new SGI location
>
>  -- Ian Wienand  Sat, 07 Jun 2008 14:18:22 -0700
>
> numactl (1.0.2-1) unstable; urgency=low
>
>   * New upstream
>   * Closes: #442690 -- Add to rules a hack to remove libnuma.a after
>     unpatching
>   * Update README.debian
>
>
>  -- Ian Wienand  Wed, 03 Oct 2007 21:49:27 +1000
>
>
> It's similar on RHEL, where "irqbalance" is in core group.
> Regarding the libnuma version, it was also updated to v2 in 2008, and since
> Fedora 11 contains v2, RHEL 6 and RHEL 7 contain it as well:
>
>   * Wed Feb 25 2009 Fedora Release Engineering - 2.0.2-3
>   - Rebuilt for https://fedoraproject.org/wiki/Fedora_11_Mass_Rebuild
>
>   * Mon Sep 29 2008 Neil Horman - 2.0.2-2
>   - Fix build break due to register selection in asm
>
>   * Mon Sep 29 2008 Neil Horman - 2.0.2-1
>   - Update rawhide to version 2.0.2 of numactl
>
>   * Fri Apr 25 2008 Neil Horman - 1.0.2-6
>   - Fix buffer size passing and arg sanity check for physcpubind (bz 442521)
>
>
> Also, the last release of libnuma v1 dates back to 2008:
> https://github.com/numactl/numactl/releases/tag/v1.0.2
>
> So it looks like the absence of libnuma v2 on Linux is by now uncommon.
>
>
>> Style nits:
>> - we should avoid implicit booleans, so the isnode_in_* functions should return bool not int; and check "distance != 0"
>> - spaces around operators, e.g. node=0 should be node = 0
>
> new webrev: http://cr.openjdk.java.net/~gromero/8175813/v4/
>

Still good :) Thumbs up!

And thanks a lot for digging into the history of libnuma and its
incarnation in various Linux distros. That's really useful
information!

Regards,
Volker

>
> Thank you and best regards,
> Gustavo
>
>> Thanks,
>> David
>>
>>> Can somebody please sponsor the change?
>>>
>>> Thank you and best regards,
>>> Volker
>>>
>>>
>>> On Wed, May 3, 2017 at 3:27 PM, Gustavo Romero
>>> wrote:
>>>> Hi community,
>>>>
>>>> I understand that there is nothing that can be done additionally regarding this
>>>> issue, at this point, on the PPC64 side.
>>>>
>>>> It's a change in the shared code - but that in effect does not change anything in
>>>> the numa detection mechanism for other platforms - and hence a joint community
>>>> effort is necessary to review the change, along with a sponsor to run it against
>>>> the JPRT.
>>>> >>>> I know it's a stabilizing moment of OpenJDK 9, but since that issue is of >>>> great concern on PPC64 (specially on POWER8 machines) I would be very glad if >>>> the community could point out directions on how that change could move on. >>>> >>>> Thank you! >>>> >>>> Best regards, >>>> Gustavo >>>> >>>> On 25-04-2017 23:49, Gustavo Romero wrote: >>>>> Dear Volker, >>>>> >>>>> On 24-04-2017 14:08, Volker Simonis wrote: >>>>>> Hi Gustavo, >>>>>> >>>>>> thanks for addressing this problem and sorry for my late reply. I >>>>>> think this is a good change which definitely improves the situation >>>>>> for uncommon NUMA configurations without changing the handling for >>>>>> common topologies. >>>>> >>>>> Thanks a lot for reviewing the change! >>>>> >>>>> >>>>>> It would be great if somebody could run this trough JPRT, but as >>>>>> Gustavo mentioned, I don't expect any regressions. >>>>>> >>>>>> @Igor: I think you've been the original author of the NUMA-aware >>>>>> allocator port to Linux (i.e. "6684395: Port NUMA-aware allocator to >>>>>> linux"). If you could find some spare minutes to take a look at this >>>>>> change, your comment would be very much appreciated :) >>>>>> >>>>>> Following some minor comments from me: >>>>>> >>>>>> - in os::numa_get_groups_num() you now use numa_num_configured_nodes() >>>>>> to get the actual number of configured nodes. This is good and >>>>>> certainly an improvement over the previous implementation. However, >>>>>> the man page for numa_num_configured_nodes() mentions that the >>>>>> returned count may contain currently disabled nodes. Do we currently >>>>>> handle disabled nodes? What will be the consequence if we would use >>>>>> such a disabled node (e.g. mbind() warnings)? 
>>>>> >>>>> In [1] 'numa_memnode_ptr' is set to keep a list of *just nodes with memory in >>>>> found in /sys/devices/system/node/* Hence numa_num_configured_nodes() just >>>>> returns the number of nodes in 'numa_memnode_ptr' [2], thus just returns the >>>>> number of nodes with memory in the system. To the best of my knowledge there is >>>>> no system configuration on Linux/PPC64 that could match such a notion of >>>>> "disabled nodes" as it appears in libnuma's manual. If it is enabled, it's in >>>>> that dir and just the ones with memory will be taken into account. If it's >>>>> disabled (somehow), it's not in the dir, so won't be taken into account (i.e. no >>>>> mbind() tried against it). >>>>> >>>>> On Power it's possible to have a numa node without memory (memory-less node, a >>>>> case covered in this change), a numa node without cpus at all but with memory >>>>> (a configured node anyway, so a case already covered) but to disable a specific >>>>> numa node so it does not appear in /sys/devices/system/node/* it's only possible >>>>> from the inners of the control module. Or other rare condition not invisible / >>>>> adjustable from the OS. Also I'm not aware of a case where a node is in this >>>>> dir but is at the same time flagged as something like "disabled". There are >>>>> cpu/memory hotplugs, but that does not change numa nodes status AFAIK. >>>>> >>>>> [1] https://github.com/numactl/numactl/blob/master/libnuma.c#L334-L347 >>>>> [2] https://github.com/numactl/numactl/blob/master/libnuma.c#L614-L618 >>>>> >>>>> >>>>>> - the same question applies to the usage of >>>>>> Linux::isnode_in_configured_nodes() within os::numa_get_leaf_groups(). >>>>>> Does isnode_in_configured_nodes() (i.e. the node set defined by >>>>>> 'numa_all_nodes_ptr' take into account the disabled nodes or not? Can >>>>>> this be a potential problem (i.e. if we use a disabled node). 
>>>>> >>>>> On the meaning of "disabled nodes", it's the same case as above, so to the >>>>> best of knowledge it's not a potential problem. >>>>> >>>>> Anyway 'numa_all_nodes_ptr' just includes the configured nodes (with memory), >>>>> i.e. "all nodes on which the calling task may allocate memory". It's exactly >>>>> the same pointer returned by numa_get_membind() v2 [3] which: >>>>> >>>>> "returns the mask of nodes from which memory can currently be allocated" >>>>> >>>>> and that is used, for example, in "numactl --show" to show nodes from where >>>>> memory can be allocated [4, 5]. >>>>> >>>>> [3] https://github.com/numactl/numactl/blob/master/libnuma.c#L1147 >>>>> [4] https://github.com/numactl/numactl/blob/master/numactl.c#L144 >>>>> [5] https://github.com/numactl/numactl/blob/master/numactl.c#L177 >>>>> >>>>> >>>>>> - I'd like to suggest renaming the 'index' part of the following >>>>>> variables and functions to 'nindex' ('node_index' is probably to long) >>>>>> in the following code, to emphasize that we have node indexes pointing >>>>>> to actual, not always consecutive node numbers: >>>>>> >>>>>> 2879 // Create an index -> node mapping, since nodes are not >>>>>> always consecutive >>>>>> 2880 _index_to_node = new (ResourceObj::C_HEAP, mtInternal) >>>>>> GrowableArray(0, true); >>>>>> 2881 rebuild_index_to_node_map(); >>>>> >>>>> Simple change but much better to read indeed. Done. >>>>> >>>>> >>>>>> - can you please wrap the following one-line else statement into curly >>>>>> braces (it's more readable and we usually do it that way in HotSpot >>>>>> although there are no formal style guidelines :) >>>>>> >>>>>> 2953 } else >>>>>> 2954 // Current node is already a configured node. >>>>>> 2955 closest_node = index_to_node()->at(i); >>>>> >>>>> Done. >>>>> >>>>> >>>>>> - in os::Linux::rebuild_cpu_to_node_map(), if you set >>>>>> 'closest_distance' to INT_MAX at the beginning of the loop, you can >>>>>> later avoid the check for '|| !closest_distance'. 
Also, according to >>>>>> the man page, numa_distance() returns 0 if it can not determine the >>>>>> distance. So with the above change, the condition on line 2974 should >>>>>> read: >>>>>> >>>>>> 2947 if (distance && distance < closest_distance) { >>>>>> >>>>> >>>>> Sure, much better to set the initial condition as distant as possible and >>>>> adjust to a closer one bit by bit improving the if condition. Done. >>>>> >>>>> >>>>>> Finally, and not directly related to your change, I'd suggest the >>>>>> following clean-ups: >>>>>> >>>>>> - remove the usage of 'NCPUS = 32768' in >>>>>> os::Linux::rebuild_cpu_to_node_map(). The comment on that line is >>>>>> unclear to me and probably related to an older version/problem of >>>>>> libnuma? I think we should simply use >>>>>> numa_allocate_cpumask()/numa_free_cpumask() instead. >>>>>> >>>>>> - we still use the NUMA version 1 function prototypes (e.g. >>>>>> "numa_node_to_cpus(int node, unsigned long *buffer, int buffer_len)" >>>>>> instead of "numa_node_to_cpus(int node, struct bitmask *mask)", but >>>>>> also "numa_interleave_memory()" and maybe others). I think we should >>>>>> switch all prototypes to the new NUMA version 2 interface which you've >>>>>> already used for the new functions which you've added. >>>>> >>>>> I agree. Could I open a new bug to address these clean-ups? >>>>> >>>>> >>>>>> That said, I think these changes all require libnuma 2.0 (see >>>>>> os::Linux::libnuma_dlsym). So before starting this, you should make >>>>>> sure that libnuma 2.0 is available on all platforms to which you'd >>>>>> like to down-port this change. For jdk10 we could definitely do it, >>>>>> for jdk9 probably also, for jdk8 I'm not so sure. >>>>> >>>>> libnuma v1 last release dates back to 2008, but any idea how could I check that >>>>> for sure since it's on shared code? >>>>> >>>>> new webrev: http://cr.openjdk.java.net/~gromero/8175813/v3/ >>>>> >>>>> Thank you! 
>>>>> >>>>> Best regards, >>>>> Gustavo >>>>> >>>>> >>>>>> Regards, >>>>>> Volker >>>>>> >>>>>> On Thu, Apr 13, 2017 at 12:51 AM, Gustavo Romero >>>>>> wrote: >>>>>>> Hi, >>>>>>> >>>>>>> Any update on it? >>>>>>> >>>>>>> Thank you. >>>>>>> >>>>>>> Regards, >>>>>>> Gustavo >>>>>>> >>>>>>> On 09-03-2017 16:33, Gustavo Romero wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> Could the following webrev be reviewed please? >>>>>>>> >>>>>>>> It improves the numa node detection when non-consecutive or memory-less nodes >>>>>>>> exist in the system. >>>>>>>> >>>>>>>> webrev: http://cr.openjdk.java.net/~gromero/8175813/v2/ >>>>>>>> bug : https://bugs.openjdk.java.net/browse/JDK-8175813 >>>>>>>> >>>>>>>> Currently, although no problem exists when the JVM detects numa nodes that are >>>>>>>> consecutive and have memory, for example in a numa topology like: >>>>>>>> >>>>>>>> available: 2 nodes (0-1) >>>>>>>> node 0 cpus: 0 8 16 24 32 >>>>>>>> node 0 size: 65258 MB >>>>>>>> node 0 free: 34 MB >>>>>>>> node 1 cpus: 40 48 56 64 72 >>>>>>>> node 1 size: 65320 MB >>>>>>>> node 1 free: 150 MB >>>>>>>> node distances: >>>>>>>> node 0 1 >>>>>>>> 0: 10 20 >>>>>>>> 1: 20 10, >>>>>>>> >>>>>>>> it fails on detecting numa nodes to be used in the Parallel GC in a numa >>>>>>>> topology like: >>>>>>>> >>>>>>>> available: 4 nodes (0-1,16-17) >>>>>>>> node 0 cpus: 0 8 16 24 32 >>>>>>>> node 0 size: 130706 MB >>>>>>>> node 0 free: 7729 MB >>>>>>>> node 1 cpus: 40 48 56 64 72 >>>>>>>> node 1 size: 0 MB >>>>>>>> node 1 free: 0 MB >>>>>>>> node 16 cpus: 80 88 96 104 112 >>>>>>>> node 16 size: 130630 MB >>>>>>>> node 16 free: 5282 MB >>>>>>>> node 17 cpus: 120 128 136 144 152 >>>>>>>> node 17 size: 0 MB >>>>>>>> node 17 free: 0 MB >>>>>>>> node distances: >>>>>>>> node 0 1 16 17 >>>>>>>> 0: 10 20 40 40 >>>>>>>> 1: 20 10 40 40 >>>>>>>> 16: 40 40 10 20 >>>>>>>> 17: 40 40 20 10, >>>>>>>> >>>>>>>> where node 16 is not consecutive in relation to 1 and also nodes 1 and 17 have >>>>>>>> no memory. 
>>>>>>>> >>>>>>>> If a topology like that exists, os::numa_make_local() will receive a local group >>>>>>>> id as a hint that is not available in the system to be bound (it will receive >>>>>>>> all nodes from 0 to 17), causing a proliferation of "mbind: Invalid argument" >>>>>>>> messages: >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~gromero/logs/jdk10_pristine.log >>>>>>>> >>>>>>>> That change improves the detection by making the JVM numa API aware of the >>>>>>>> existence of numa nodes that are non-consecutive from 0 to the highest node >>>>>>>> number and also of nodes that might be memory-less nodes, i.e. that might not >>>>>>>> be, in libnuma terms, a configured node. Hence just the configured nodes will >>>>>>>> be available: >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~gromero/logs/jdk10_numa_patched.log >>>>>>>> >>>>>>>> The change has no effect on numa topologies were the problem does not occur, >>>>>>>> i.e. no change in the number of nodes and no change in the cpu to node map. On >>>>>>>> numa topologies where memory-less nodes exist (like in the last example above), >>>>>>>> cpus from a memory-less node won't be able to bind locally so they are mapped >>>>>>>> to the closest node, otherwise they would be not associate to any node and >>>>>>>> MutableNUMASpace::cas_allocate() would pick a node randomly, compromising the >>>>>>>> performance. >>>>>>>> >>>>>>>> I found no regressions on x64 for the following numa topology: >>>>>>>> >>>>>>>> available: 2 nodes (0-1) >>>>>>>> node 0 cpus: 0 1 2 3 8 9 10 11 >>>>>>>> node 0 size: 24102 MB >>>>>>>> node 0 free: 19806 MB >>>>>>>> node 1 cpus: 4 5 6 7 12 13 14 15 >>>>>>>> node 1 size: 24190 MB >>>>>>>> node 1 free: 21951 MB >>>>>>>> node distances: >>>>>>>> node 0 1 >>>>>>>> 0: 10 21 >>>>>>>> 1: 21 10 >>>>>>>> >>>>>>>> I understand that fixing the current numa detection is a prerequisite to enable >>>>>>>> UseNUMA by the default [1] and to extend the numa-aware allocation to the G1 GC [2]. 
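The mapping of cpus from memory-less nodes to the closest configured node, as described above, can be illustrated with a minimal sketch. It is not the HotSpot code from the webrev: the Topology struct and the helpers node_distance(), is_configured(), closest_configured_node() and sample_topology() are hypothetical names introduced only for this illustration, and the data is the 4-node topology (0, 1, 16, 17) quoted in this thread. It assumes a node counts as configured when it has memory, starts the search at INT_MAX, and treats a distance of 0 as "unknown", following the review comments on numa_distance().

```cpp
#include <cassert>
#include <climits>
#include <map>
#include <vector>

// Hypothetical stand-in for the data libnuma would report; not HotSpot code.
struct Topology {
  std::vector<int> nodes;                       // node ids, not necessarily consecutive
  std::map<int, long> size_mb;                  // node id -> memory size in MB
  std::map<int, std::map<int, int> > distance;  // [from][to] -> NUMA distance
};

// Stand-in for numa_distance(): returns 0 when the distance is unknown.
static int node_distance(const Topology& t, int from, int to) {
  auto row = t.distance.find(from);
  if (row == t.distance.end()) return 0;
  auto cell = row->second.find(to);
  return cell == row->second.end() ? 0 : cell->second;
}

// Assumption for this sketch: a node is "configured" when it has memory.
static bool is_configured(const Topology& t, int node) {
  auto it = t.size_mb.find(node);
  return it != t.size_mb.end() && it->second > 0;
}

// For every node, return the node its cpus should be associated with:
// the node itself if it is configured, otherwise the closest configured
// node, or -1 if no configured node with a known distance exists.
static std::map<int, int> closest_configured_node(const Topology& t) {
  std::map<int, int> result;
  for (int node : t.nodes) {
    if (is_configured(t, node)) {
      result[node] = node;  // already a configured node
      continue;
    }
    int closest_node = -1;
    int closest_distance = INT_MAX;  // start as distant as possible
    for (int candidate : t.nodes) {
      if (!is_configured(t, candidate)) continue;
      int distance = node_distance(t, node, candidate);
      assert(distance >= 0);  // the stand-in never returns a negative value
      // Explicit "!= 0" check: 0 means the distance could not be determined.
      if (distance != 0 && distance < closest_distance) {
        closest_distance = distance;
        closest_node = candidate;
      }
    }
    result[node] = closest_node;
  }
  return result;
}

// The 4-node topology from the report: nodes 1 and 17 have no memory.
static Topology sample_topology() {
  Topology t;
  t.nodes = {0, 1, 16, 17};
  t.size_mb = {{0, 130706}, {1, 0}, {16, 130630}, {17, 0}};
  t.distance = {
    {0,  {{0, 10}, {1, 20}, {16, 40}, {17, 40}}},
    {1,  {{0, 20}, {1, 10}, {16, 40}, {17, 40}}},
    {16, {{0, 40}, {1, 40}, {16, 10}, {17, 20}}},
    {17, {{0, 40}, {1, 40}, {16, 20}, {17, 10}}},
  };
  return t;
}
```

Calling closest_configured_node(sample_topology()) maps node 1 to node 0 (distance 20 versus 40) and node 17 to node 16, while nodes 0 and 16 map to themselves. Initializing the running minimum to INT_MAX is what makes a separate "no distance recorded yet" check unnecessary.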
>>>>>>>>
>>>>>>>> Thank you.
>>>>>>>>
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Gustavo
>>>>>>>>
>>>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8046153 (JEP 163: Enable NUMA Mode by Default When Appropriate)
>>>>>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8046147 (JEP 157: G1 GC: NUMA-Aware Allocation)
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>

From david.holmes at oracle.com Sun May 7 20:45:09 2017
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 8 May 2017 06:45:09 +1000
Subject: [10] RFR (S) 8175813: PPC64: "mbind: Invalid argument" when -XX:+UseNUMA is used
In-Reply-To: <590CD5E7.10809@linux.vnet.ibm.com>
References: <58C1AE06.9060609@linux.vnet.ibm.com> <58EEAF7B.6020708@linux.vnet.ibm.com> <59000AC0.7050507@linux.vnet.ibm.com> <5909DAAC.3070202@linux.vnet.ibm.com> <590CD5E7.10809@linux.vnet.ibm.com>
Message-ID: <4b26117f-508d-90ee-1ced-2a2c720a1047@oracle.com>

Hi Gustavo,

On 6/05/2017 5:43 AM, Gustavo Romero wrote:
> Hi David,
>
> On 04-05-2017 21:32, David Holmes wrote:
>> Hi Volker, Gustavo,
>>
>> On 4/05/2017 12:34 AM, Volker Simonis wrote:
>>> Hi,
>>>
>>> I've reviewed Gustavo's change and I'm fine with the latest version at:
>>>
>>> http://cr.openjdk.java.net/~gromero/8175813/v3/
>>
>> Nothing has really changed for me since I first looked at this - I don't know NUMA and I can't comment on any of the details. But no-one else has commented negatively so they are implicitly okay with
>> this, or else they should have spoken up. So with Volker as the Reviewer and myself as a second reviewer, I will sponsor this. I'll run the current patch through JPRT while awaiting the final version.
>
> Thanks a lot for reviewing and sponsoring the change.
>
>
>> One thing I was unclear on with all this numa code is the expectation regarding all those dynamically looked up functions - is it expected that we will have them all or else have none?
>> It wasn't at all obvious what would happen if we don't have those functions but still executed this code - assuming that is even possible. I guess I would have expected that no numa code would execute unless
>> -XX:+UseNUMA was set, in which case the VM would abort if any of the libnuma functions could not be found. That way we wouldn't need the null checks for the function pointers.
>
> If libnuma is not available in the system os::Linux::libnuma_init() will return
> false and JVM will refuse to enable the UseNUMA features instead of aborting:
>
> 4904   if (UseNUMA) {
> 4905     if (!Linux::libnuma_init()) {
> 4906       UseNUMA = false;
> 4907     } else {
>
> I understand those null checks as part of the initial design of JVM numa api to
> enforce protection against the usage of its methods in other parts of the code
> when JVM api failed to initialize properly, even though it's expected that
> UseNUMA = false should suffice to protect against such usages.

Ok. Seems like they should be asserts rather than runtime checks if all
the paths are properly guarded by UseNUMA - but that isn't your problem.

> That said, I could not find any recent Linux distribution that does not support
> libnuma v2 api (and so also v1 api). On Ubuntu it will be installed as a
> dependency of metapackage ubuntu-standard and because that requires "irqbalance"
> it also requires libnuma. Libnuma was updated from libnuma v1 to v2
> around mid 2008:

Thanks for the additional info.

> numactl (2.0.1-1) unstable; urgency=low
>
>   * New upstream
>   * patches/static-lib.patch: update
>   * debian/watch: update to new SGI location
>
>  -- Ian Wienand  Sat, 07 Jun 2008 14:18:22 -0700
>
> numactl (1.0.2-1) unstable; urgency=low
>
>   * New upstream
>   * Closes: #442690 -- Add to rules a hack to remove libnuma.a after
>     unpatching
>   * Update README.debian
>
>
>  -- Ian Wienand  Wed, 03 Oct 2007 21:49:27 +1000
>
>
> It's similar on RHEL, where "irqbalance" is in core group.
> Regarding the libnuma version, it was also updated to v2 in 2008, and since
> Fedora 11 contains v2, RHEL 6 and RHEL 7 contain it as well:
>
>   * Wed Feb 25 2009 Fedora Release Engineering - 2.0.2-3
>   - Rebuilt for https://fedoraproject.org/wiki/Fedora_11_Mass_Rebuild
>
>   * Mon Sep 29 2008 Neil Horman - 2.0.2-2
>   - Fix build break due to register selection in asm
>
>   * Mon Sep 29 2008 Neil Horman - 2.0.2-1
>   - Update rawhide to version 2.0.2 of numactl
>
>   * Fri Apr 25 2008 Neil Horman - 1.0.2-6
>   - Fix buffer size passing and arg sanity check for physcpubind (bz 442521)
>
>
> Also, the last release of libnuma v1 dates back to 2008:
> https://github.com/numactl/numactl/releases/tag/v1.0.2
>
> So it looks like the absence of libnuma v2 on Linux is by now uncommon.
>
>
>> Style nits:
>> - we should avoid implicit booleans, so the isnode_in_* functions should return bool not int; and check "distance != 0"
>> - spaces around operators, e.g. node=0 should be node = 0
>
> new webrev: http://cr.openjdk.java.net/~gromero/8175813/v4/

Looks good. Changes being pushed now.

David
-----

>
> Thank you and best regards,
> Gustavo
>
>> Thanks,
>> David
>>
>>> Can somebody please sponsor the change?
>>>
>>> Thank you and best regards,
>>> Volker
>>>
>>>
>>> On Wed, May 3, 2017 at 3:27 PM, Gustavo Romero
>>> wrote:
>>>> Hi community,
>>>>
>>>> I understand that there is nothing that can be done additionally regarding this
>>>> issue, at this point, on the PPC64 side.
>>>>
>>>> It's a change in the shared code - but that in effect does not change anything in
>>>> the numa detection mechanism for other platforms - and hence a joint community
>>>> effort is necessary to review the change, along with a sponsor to run it against
>>>> the JPRT.
>>>>
>>>> I know it's a stabilizing moment of OpenJDK 9, but since that issue is of
>>>> great concern on PPC64 (especially on POWER8 machines) I would be very glad if
>>>> the community could point out directions on how that change could move on.
>>>> >>>> Thank you! >>>> >>>> Best regards, >>>> Gustavo >>>> >>>> On 25-04-2017 23:49, Gustavo Romero wrote: >>>>> Dear Volker, >>>>> >>>>> On 24-04-2017 14:08, Volker Simonis wrote: >>>>>> Hi Gustavo, >>>>>> >>>>>> thanks for addressing this problem and sorry for my late reply. I >>>>>> think this is a good change which definitely improves the situation >>>>>> for uncommon NUMA configurations without changing the handling for >>>>>> common topologies. >>>>> >>>>> Thanks a lot for reviewing the change! >>>>> >>>>> >>>>>> It would be great if somebody could run this trough JPRT, but as >>>>>> Gustavo mentioned, I don't expect any regressions. >>>>>> >>>>>> @Igor: I think you've been the original author of the NUMA-aware >>>>>> allocator port to Linux (i.e. "6684395: Port NUMA-aware allocator to >>>>>> linux"). If you could find some spare minutes to take a look at this >>>>>> change, your comment would be very much appreciated :) >>>>>> >>>>>> Following some minor comments from me: >>>>>> >>>>>> - in os::numa_get_groups_num() you now use numa_num_configured_nodes() >>>>>> to get the actual number of configured nodes. This is good and >>>>>> certainly an improvement over the previous implementation. However, >>>>>> the man page for numa_num_configured_nodes() mentions that the >>>>>> returned count may contain currently disabled nodes. Do we currently >>>>>> handle disabled nodes? What will be the consequence if we would use >>>>>> such a disabled node (e.g. mbind() warnings)? >>>>> >>>>> In [1] 'numa_memnode_ptr' is set to keep a list of *just nodes with memory in >>>>> found in /sys/devices/system/node/* Hence numa_num_configured_nodes() just >>>>> returns the number of nodes in 'numa_memnode_ptr' [2], thus just returns the >>>>> number of nodes with memory in the system. To the best of my knowledge there is >>>>> no system configuration on Linux/PPC64 that could match such a notion of >>>>> "disabled nodes" as it appears in libnuma's manual. 
If it is enabled, it's in >>>>> that dir and just the ones with memory will be taken into account. If it's >>>>> disabled (somehow), it's not in the dir, so won't be taken into account (i.e. no >>>>> mbind() tried against it). >>>>> >>>>> On Power it's possible to have a numa node without memory (memory-less node, a >>>>> case covered in this change), a numa node without cpus at all but with memory >>>>> (a configured node anyway, so a case already covered) but to disable a specific >>>>> numa node so it does not appear in /sys/devices/system/node/* it's only possible >>>>> from the inners of the control module. Or other rare condition not invisible / >>>>> adjustable from the OS. Also I'm not aware of a case where a node is in this >>>>> dir but is at the same time flagged as something like "disabled". There are >>>>> cpu/memory hotplugs, but that does not change numa nodes status AFAIK. >>>>> >>>>> [1] https://github.com/numactl/numactl/blob/master/libnuma.c#L334-L347 >>>>> [2] https://github.com/numactl/numactl/blob/master/libnuma.c#L614-L618 >>>>> >>>>> >>>>>> - the same question applies to the usage of >>>>>> Linux::isnode_in_configured_nodes() within os::numa_get_leaf_groups(). >>>>>> Does isnode_in_configured_nodes() (i.e. the node set defined by >>>>>> 'numa_all_nodes_ptr' take into account the disabled nodes or not? Can >>>>>> this be a potential problem (i.e. if we use a disabled node). >>>>> >>>>> On the meaning of "disabled nodes", it's the same case as above, so to the >>>>> best of knowledge it's not a potential problem. >>>>> >>>>> Anyway 'numa_all_nodes_ptr' just includes the configured nodes (with memory), >>>>> i.e. "all nodes on which the calling task may allocate memory". 
It's exactly >>>>> the same pointer returned by numa_get_membind() v2 [3] which: >>>>> >>>>> "returns the mask of nodes from which memory can currently be allocated" >>>>> >>>>> and that is used, for example, in "numactl --show" to show nodes from where >>>>> memory can be allocated [4, 5]. >>>>> >>>>> [3] https://github.com/numactl/numactl/blob/master/libnuma.c#L1147 >>>>> [4] https://github.com/numactl/numactl/blob/master/numactl.c#L144 >>>>> [5] https://github.com/numactl/numactl/blob/master/numactl.c#L177 >>>>> >>>>> >>>>>> - I'd like to suggest renaming the 'index' part of the following >>>>>> variables and functions to 'nindex' ('node_index' is probably to long) >>>>>> in the following code, to emphasize that we have node indexes pointing >>>>>> to actual, not always consecutive node numbers: >>>>>> >>>>>> 2879 // Create an index -> node mapping, since nodes are not >>>>>> always consecutive >>>>>> 2880 _index_to_node = new (ResourceObj::C_HEAP, mtInternal) >>>>>> GrowableArray(0, true); >>>>>> 2881 rebuild_index_to_node_map(); >>>>> >>>>> Simple change but much better to read indeed. Done. >>>>> >>>>> >>>>>> - can you please wrap the following one-line else statement into curly >>>>>> braces (it's more readable and we usually do it that way in HotSpot >>>>>> although there are no formal style guidelines :) >>>>>> >>>>>> 2953 } else >>>>>> 2954 // Current node is already a configured node. >>>>>> 2955 closest_node = index_to_node()->at(i); >>>>> >>>>> Done. >>>>> >>>>> >>>>>> - in os::Linux::rebuild_cpu_to_node_map(), if you set >>>>>> 'closest_distance' to INT_MAX at the beginning of the loop, you can >>>>>> later avoid the check for '|| !closest_distance'. Also, according to >>>>>> the man page, numa_distance() returns 0 if it can not determine the >>>>>> distance. 
So with the above change, the condition on line 2974 should >>>>>> read: >>>>>> >>>>>> 2947 if (distance && distance < closest_distance) { >>>>>> >>>>> >>>>> Sure, much better to set the initial condition as distant as possible and >>>>> adjust to a closer one bit by bit improving the if condition. Done. >>>>> >>>>> >>>>>> Finally, and not directly related to your change, I'd suggest the >>>>>> following clean-ups: >>>>>> >>>>>> - remove the usage of 'NCPUS = 32768' in >>>>>> os::Linux::rebuild_cpu_to_node_map(). The comment on that line is >>>>>> unclear to me and probably related to an older version/problem of >>>>>> libnuma? I think we should simply use >>>>>> numa_allocate_cpumask()/numa_free_cpumask() instead. >>>>>> >>>>>> - we still use the NUMA version 1 function prototypes (e.g. >>>>>> "numa_node_to_cpus(int node, unsigned long *buffer, int buffer_len)" >>>>>> instead of "numa_node_to_cpus(int node, struct bitmask *mask)", but >>>>>> also "numa_interleave_memory()" and maybe others). I think we should >>>>>> switch all prototypes to the new NUMA version 2 interface which you've >>>>>> already used for the new functions which you've added. >>>>> >>>>> I agree. Could I open a new bug to address these clean-ups? >>>>> >>>>> >>>>>> That said, I think these changes all require libnuma 2.0 (see >>>>>> os::Linux::libnuma_dlsym). So before starting this, you should make >>>>>> sure that libnuma 2.0 is available on all platforms to which you'd >>>>>> like to down-port this change. For jdk10 we could definitely do it, >>>>>> for jdk9 probably also, for jdk8 I'm not so sure. >>>>> >>>>> libnuma v1 last release dates back to 2008, but any idea how could I check that >>>>> for sure since it's on shared code? >>>>> >>>>> new webrev: http://cr.openjdk.java.net/~gromero/8175813/v3/ >>>>> >>>>> Thank you! 
>>>>> >>>>> Best regards, >>>>> Gustavo >>>>> >>>>> >>>>>> Regards, >>>>>> Volker >>>>>> >>>>>> On Thu, Apr 13, 2017 at 12:51 AM, Gustavo Romero >>>>>> wrote: >>>>>>> Hi, >>>>>>> >>>>>>> Any update on it? >>>>>>> >>>>>>> Thank you. >>>>>>> >>>>>>> Regards, >>>>>>> Gustavo >>>>>>> >>>>>>> On 09-03-2017 16:33, Gustavo Romero wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> Could the following webrev be reviewed please? >>>>>>>> >>>>>>>> It improves the numa node detection when non-consecutive or memory-less nodes >>>>>>>> exist in the system. >>>>>>>> >>>>>>>> webrev: http://cr.openjdk.java.net/~gromero/8175813/v2/ >>>>>>>> bug : https://bugs.openjdk.java.net/browse/JDK-8175813 >>>>>>>> >>>>>>>> Currently, although no problem exists when the JVM detects numa nodes that are >>>>>>>> consecutive and have memory, for example in a numa topology like: >>>>>>>> >>>>>>>> available: 2 nodes (0-1) >>>>>>>> node 0 cpus: 0 8 16 24 32 >>>>>>>> node 0 size: 65258 MB >>>>>>>> node 0 free: 34 MB >>>>>>>> node 1 cpus: 40 48 56 64 72 >>>>>>>> node 1 size: 65320 MB >>>>>>>> node 1 free: 150 MB >>>>>>>> node distances: >>>>>>>> node 0 1 >>>>>>>> 0: 10 20 >>>>>>>> 1: 20 10, >>>>>>>> >>>>>>>> it fails on detecting numa nodes to be used in the Parallel GC in a numa >>>>>>>> topology like: >>>>>>>> >>>>>>>> available: 4 nodes (0-1,16-17) >>>>>>>> node 0 cpus: 0 8 16 24 32 >>>>>>>> node 0 size: 130706 MB >>>>>>>> node 0 free: 7729 MB >>>>>>>> node 1 cpus: 40 48 56 64 72 >>>>>>>> node 1 size: 0 MB >>>>>>>> node 1 free: 0 MB >>>>>>>> node 16 cpus: 80 88 96 104 112 >>>>>>>> node 16 size: 130630 MB >>>>>>>> node 16 free: 5282 MB >>>>>>>> node 17 cpus: 120 128 136 144 152 >>>>>>>> node 17 size: 0 MB >>>>>>>> node 17 free: 0 MB >>>>>>>> node distances: >>>>>>>> node 0 1 16 17 >>>>>>>> 0: 10 20 40 40 >>>>>>>> 1: 20 10 40 40 >>>>>>>> 16: 40 40 10 20 >>>>>>>> 17: 40 40 20 10, >>>>>>>> >>>>>>>> where node 16 is not consecutive in relation to 1 and also nodes 1 and 17 have >>>>>>>> no memory. 
>>>>>>>> >>>>>>>> If a topology like that exists, os::numa_make_local() will receive a local group >>>>>>>> id as a hint that is not available in the system to be bound (it will receive >>>>>>>> all nodes from 0 to 17), causing a proliferation of "mbind: Invalid argument" >>>>>>>> messages: >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~gromero/logs/jdk10_pristine.log >>>>>>>> >>>>>>>> That change improves the detection by making the JVM numa API aware of the >>>>>>>> existence of numa nodes that are non-consecutive from 0 to the highest node >>>>>>>> number and also of nodes that might be memory-less nodes, i.e. that might not >>>>>>>> be, in libnuma terms, a configured node. Hence just the configured nodes will >>>>>>>> be available: >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~gromero/logs/jdk10_numa_patched.log >>>>>>>> >>>>>>>> The change has no effect on numa topologies were the problem does not occur, >>>>>>>> i.e. no change in the number of nodes and no change in the cpu to node map. On >>>>>>>> numa topologies where memory-less nodes exist (like in the last example above), >>>>>>>> cpus from a memory-less node won't be able to bind locally so they are mapped >>>>>>>> to the closest node, otherwise they would be not associate to any node and >>>>>>>> MutableNUMASpace::cas_allocate() would pick a node randomly, compromising the >>>>>>>> performance. >>>>>>>> >>>>>>>> I found no regressions on x64 for the following numa topology: >>>>>>>> >>>>>>>> available: 2 nodes (0-1) >>>>>>>> node 0 cpus: 0 1 2 3 8 9 10 11 >>>>>>>> node 0 size: 24102 MB >>>>>>>> node 0 free: 19806 MB >>>>>>>> node 1 cpus: 4 5 6 7 12 13 14 15 >>>>>>>> node 1 size: 24190 MB >>>>>>>> node 1 free: 21951 MB >>>>>>>> node distances: >>>>>>>> node 0 1 >>>>>>>> 0: 10 21 >>>>>>>> 1: 21 10 >>>>>>>> >>>>>>>> I understand that fixing the current numa detection is a prerequisite to enable >>>>>>>> UseNUMA by the default [1] and to extend the numa-aware allocation to the G1 GC [2]. 
>>>>>>>> >>>>>>>> Thank you. >>>>>>>> >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Gustavo >>>>>>>> >>>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8046153 (JEP 163: Enable NUMA Mode by Default When Appropriate) >>>>>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8046147 (JEP 157: G1 GC: NUMA-Aware Allocation) >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >> > From HORIE at jp.ibm.com Mon May 8 04:58:25 2017 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Mon, 8 May 2017 13:58:25 +0900 Subject: 8179527: Implement intrinsic code for reverseBytes with load/store In-Reply-To: <7827421e2c6447f4ae406434f5bb3d25@sap.com> References: <3507c10563a84106ac6c2e8d2554c053@serv030.corp.eldorado.org.br> <7827421e2c6447f4ae406434f5bb3d25@sap.com> Message-ID: Dear Martin, Gustavo, Thank you very much for your helpful comments. Fixed code is http://cr.openjdk.java.net/~horii/8179527/webrev.01/ Dear Goetz, Would you kindly review and sponsor this change? I heard you are a C2 compiler expert and Martin is out for a while. Best regards, -- Michihiro, IBM Research - Tokyo From: "Doerr, Martin" To: Gustavo Serra Scalet , Michihiro Horie/Japan/IBM at IBMJP Cc: "ppc-aix-port-dev at openjdk.java.net" , "hotspot-dev at openjdk.java.net" , "Simonis, Volker" Date: 2017/05/03 02:24 Subject: RE: 8179527: Implement intrinsic code for reverseBytes with load/store Hi Michihiro and Gustavo, thank you very much for implementing this change. @Gustavo: Thanks for taking a look. I think that the direct match rules are just there to satisfy match_rule_supported. They don't need to be fast, they are just a fall back solution. The goal is to exploit the byte reverse load and store instructions which should match in more performance critical cases. Now my review: assembler_ppc.hpp: Looks good except a minor formatting request: LDBRX_OPCODE = (31u << OPCODE_SHIFT | 532 << 1), should be LDBRX_OPCODE = (31u << OPCODE_SHIFT | 532u << 1), to be consistent. 
The comments // X-FORM should be aligned with the other ones.

assembler_ppc.inline.hpp:
Good.

ppc.ad:
I'm concerned about the additional match rules which are only used for the expand step. They could match directly leading to incorrect code. What they match is not what they do.
I suggest to implement the code directly in the ins_encode. This would make the new code significantly shorter and less error prone.
I think we don't need to optimize for Power6 anymore and newer processors shouldn't really suffer under a little less optimized instruction scheduling. Would you agree?

Displacements may be too large for "li" so I suggest to use the "indirect" memory operand and let the compiler handle it. I know that it may increase latency because the compiler will need to insert an addition which could better be matched into the memory operand of the load which is harder to implement (it is possible to match an addition in an operand).

Best regards,
Martin

-----Original Message-----
From: Gustavo Serra Scalet [mailto:gustavo.scalet at eldorado.org.br]
Sent: Tuesday, 2 May 2017 17:05
To: Michihiro Horie
Cc: ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; Simonis, Volker; Doerr, Martin
Subject: RE: 8179527: Implement intrinsic code for reverseBytes with load/store

Hi Michihiro,

I wonder if there is no vectorized approach for implementing your "bytes_reverse_long_Ex" instruct on ppc.ad. Or did you avoid doing it so intentionally?

> -----Original Message-----
> From: ppc-aix-port-dev [mailto:ppc-aix-port-dev-bounces at openjdk.java.net] On Behalf Of Michihiro Horie
> Sent: Tuesday, 2 May 2017 11:47
> To: ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net;
> volker.simonis at sap.com; martin.doerr at sap.com
> Subject: 8179527: Implement intrinsic code for reverseBytes with
> load/store
>
> Dear all,
>
> Would you please review following change?
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8179527
> Webrev: http://cr.openjdk.java.net/~horii/8179527/webrev.00/
>
> I added new intrinsic code for reverseBytes() in ppc.ad with
> * match(Set dst (ReverseBytesI/L/US/S (LoadI src)));
> * match(Set dst (StoreI dst (ReverseBytesI/L/US/S src)));
>
>
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo

From gromero at linux.vnet.ibm.com Mon May 8 14:21:51 2017
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Mon, 8 May 2017 11:21:51 -0300
Subject: [10] RFR (S) 8175813: PPC64: "mbind: Invalid argument" when -XX:+UseNUMA is used
In-Reply-To: <4b26117f-508d-90ee-1ced-2a2c720a1047@oracle.com>
References: <58C1AE06.9060609@linux.vnet.ibm.com> <58EEAF7B.6020708@linux.vnet.ibm.com> <59000AC0.7050507@linux.vnet.ibm.com> <5909DAAC.3070202@linux.vnet.ibm.com> <590CD5E7.10809@linux.vnet.ibm.com> <4b26117f-508d-90ee-1ced-2a2c720a1047@oracle.com>
Message-ID: <59107EFF.9000805@linux.vnet.ibm.com>

Hi David, Volker

Thanks a lot for reviewing and pushing the change!

Regards,
Gustavo

On 07-05-2017 17:45, David Holmes wrote:
> Hi Gustavo,
>
> On 6/05/2017 5:43 AM, Gustavo Romero wrote:
>> Hi David,
>>
>> On 04-05-2017 21:32, David Holmes wrote:
>>> Hi Volker, Gustavo,
>>>
>>> On 4/05/2017 12:34 AM, Volker Simonis wrote:
>>>> Hi,
>>>>
>>>> I've reviewed Gustavo's change and I'm fine with the latest version at:
>>>>
>>>> http://cr.openjdk.java.net/~gromero/8175813/v3/
>>>
>>> Nothing has really changed for me since I first looked at this - I don't know NUMA and I can't comment on any of the details. But no-one else has commented negatively so they are implicitly okay with
>>> this, or else they should have spoken up.
So with Volker as the Reviewer and myself as a second reviewer, I will sponsor this. I'll run the current patch through JPRT while awaiting the final version. >> >> Thanks a lot for reviewing and sponsoring the change. >> >> >>> One thing I was unclear on with all this numa code is the expectation regarding all those dynamically looked up functions - is it expected that we will have them all or else have none? It wasn't at >>> all obvious what would happen if we don't have those functions but still executed this code - assuming that is even possible. I guess I would have expected that no numa code would execute unless >>> -XX:+UseNUMA was set, in which case the VM would abort if any of the libnuma functions could not be found. That way we wouldn't need the null checks for the function pointers. >> >> If libnuma is not available in the system os::Linux::libnuma_init() will return >> false and JVM will refuse to enable the UseNUMA features instead of aborting: >> >> 4904 if (UseNUMA) { >> 4905 if (!Linux::libnuma_init()) { >> 4906 UseNUMA = false; >> 4907 } else { >> >> I understand those null checks as part of the initial design of JVM numa api to >> enforce protection against the usage of its methods in other parts of the code >> when JVM api failed to initialize properly, even tho it's expected that >> UseNUMA = false should suffice to protect such a usages. > > Ok. Seems like they should be asserts rather than runtime checks if all the paths are properly guarded by UseNUMA - but that isn't your problem. > >> That said, I could not find any recent Linux distribution that does not support >> libnuma v2 api (and so also v1 api). On Ubuntu it will be installed as a >> dependency of metapackage ubuntu-standard and because that requires "irqbalance" >> it also requires libnuma. Libnuma was updated from libnuma v1 to v2 >> around mid 2008: > > Thanks for the additional info. 
> >> numactl (2.0.1-1) unstable; urgency=low >> >> * New upstream >> * patches/static-lib.patch: update >> * debian/watch: update to new SGI location >> >> -- Ian Wienand Sat, 07 Jun 2008 14:18:22 -0700 >> >> numactl (1.0.2-1) unstable; urgency=low >> >> * New upstream >> * Closes: #442690 -- Add to rules a hack to remove libnuma.a after >> unpatching >> * Update README.debian >> >> >> -- Ian Wienand Wed, 03 Oct 2007 21:49:27 +1000 >> >> >> It's similar on RHEL, where "irqbalance" is in core group. Regarding >> the libnuma version it was also updated in 2008 to v2, so since >> Fedora 11 contains v2, hence RHEL 6 and RHEL 7 contains it: >> >> * Wed Feb 25 2009 Fedora Release Engineering - 2.0.2-3 >> - Rebuilt for https://fedoraproject.org/wiki/Fedora_11_Mass_Rebuild >> >> * Mon Sep 29 2008 Neil Horman - 2.0.2-2 >> - Fix build break due to register selection in asm >> >> * Mon Sep 29 2008 Neil Horman - 2.0.2-1 >> - Update rawhide to version 2.0.2 of numactl >> >> * Fri Apr 25 2008 Neil Horman - 1.0.2-6 >> - Fix buffer size passing and arg sanity check for physcpubind (bz 442521) >> >> >> Also, the last release of libnuma v1 dates back to 2008: >> https://github.com/numactl/numactl/releases/tag/v1.0.2 >> >> So it looks like libnuma v2 absence on Linux is by now uncommon. >> >> >>> Style nits: >>> - we should avoid implicit booleans, so the isnode_in_* functions should return bool not int; and check "distance != 0" >>> - spaces around operators eg. node=0 should be node = 0 >> >> new webrev: http://cr.openjdk.java.net/~gromero/8175813/v4/ > > Looks good. Changes being pushed now. > > David > ----- > >> >> Thank you and best regards, >> Gustavo >> >>> Thanks, >>> David >>> >>>> Can somebody please sponsor the change? 
>>>> >>>> Thank you and best regards, >>>> Volker >>>> >>>> >>>> On Wed, May 3, 2017 at 3:27 PM, Gustavo Romero >>>> wrote: >>>>> Hi community, >>>>> >>>>> I understand that there is nothing that can be done additionally regarding this >>>>> issue, at this point, on the PPC64 side. >>>>> >>>>> It's a change in the shared code - but that in effect does not change anything in >>>>> the numa detection mechanism for other platforms - and hence it's necessary a >>>>> conjoint community effort to review the change and a sponsor to run it against >>>>> the JPRT. >>>>> >>>>> I know it's a stabilizing moment of OpenJDK 9, but since that issue is of >>>>> great concern on PPC64 (specially on POWER8 machines) I would be very glad if >>>>> the community could point out directions on how that change could move on. >>>>> >>>>> Thank you! >>>>> >>>>> Best regards, >>>>> Gustavo >>>>> >>>>> On 25-04-2017 23:49, Gustavo Romero wrote: >>>>>> Dear Volker, >>>>>> >>>>>> On 24-04-2017 14:08, Volker Simonis wrote: >>>>>>> Hi Gustavo, >>>>>>> >>>>>>> thanks for addressing this problem and sorry for my late reply. I >>>>>>> think this is a good change which definitely improves the situation >>>>>>> for uncommon NUMA configurations without changing the handling for >>>>>>> common topologies. >>>>>> >>>>>> Thanks a lot for reviewing the change! >>>>>> >>>>>> >>>>>>> It would be great if somebody could run this trough JPRT, but as >>>>>>> Gustavo mentioned, I don't expect any regressions. >>>>>>> >>>>>>> @Igor: I think you've been the original author of the NUMA-aware >>>>>>> allocator port to Linux (i.e. "6684395: Port NUMA-aware allocator to >>>>>>> linux"). If you could find some spare minutes to take a look at this >>>>>>> change, your comment would be very much appreciated :) >>>>>>> >>>>>>> Following some minor comments from me: >>>>>>> >>>>>>> - in os::numa_get_groups_num() you now use numa_num_configured_nodes() >>>>>>> to get the actual number of configured nodes. 
This is good and >>>>>>> certainly an improvement over the previous implementation. However, >>>>>>> the man page for numa_num_configured_nodes() mentions that the >>>>>>> returned count may contain currently disabled nodes. Do we currently >>>>>>> handle disabled nodes? What will be the consequence if we would use >>>>>>> such a disabled node (e.g. mbind() warnings)? >>>>>> >>>>>> In [1] 'numa_memnode_ptr' is set to keep a list of *just nodes with memory in >>>>>> found in /sys/devices/system/node/* Hence numa_num_configured_nodes() just >>>>>> returns the number of nodes in 'numa_memnode_ptr' [2], thus just returns the >>>>>> number of nodes with memory in the system. To the best of my knowledge there is >>>>>> no system configuration on Linux/PPC64 that could match such a notion of >>>>>> "disabled nodes" as it appears in libnuma's manual. If it is enabled, it's in >>>>>> that dir and just the ones with memory will be taken into account. If it's >>>>>> disabled (somehow), it's not in the dir, so won't be taken into account (i.e. no >>>>>> mbind() tried against it). >>>>>> >>>>>> On Power it's possible to have a numa node without memory (memory-less node, a >>>>>> case covered in this change), a numa node without cpus at all but with memory >>>>>> (a configured node anyway, so a case already covered) but to disable a specific >>>>>> numa node so it does not appear in /sys/devices/system/node/* it's only possible >>>>>> from the inners of the control module. Or other rare condition not invisible / >>>>>> adjustable from the OS. Also I'm not aware of a case where a node is in this >>>>>> dir but is at the same time flagged as something like "disabled". There are >>>>>> cpu/memory hotplugs, but that does not change numa nodes status AFAIK. 
>>>>>> >>>>>> [1] https://github.com/numactl/numactl/blob/master/libnuma.c#L334-L347 >>>>>> [2] https://github.com/numactl/numactl/blob/master/libnuma.c#L614-L618 >>>>>> >>>>>> >>>>>>> - the same question applies to the usage of >>>>>>> Linux::isnode_in_configured_nodes() within os::numa_get_leaf_groups(). >>>>>>> Does isnode_in_configured_nodes() (i.e. the node set defined by >>>>>>> 'numa_all_nodes_ptr' take into account the disabled nodes or not? Can >>>>>>> this be a potential problem (i.e. if we use a disabled node). >>>>>> >>>>>> On the meaning of "disabled nodes", it's the same case as above, so to the >>>>>> best of knowledge it's not a potential problem. >>>>>> >>>>>> Anyway 'numa_all_nodes_ptr' just includes the configured nodes (with memory), >>>>>> i.e. "all nodes on which the calling task may allocate memory". It's exactly >>>>>> the same pointer returned by numa_get_membind() v2 [3] which: >>>>>> >>>>>> "returns the mask of nodes from which memory can currently be allocated" >>>>>> >>>>>> and that is used, for example, in "numactl --show" to show nodes from where >>>>>> memory can be allocated [4, 5]. >>>>>> >>>>>> [3] https://github.com/numactl/numactl/blob/master/libnuma.c#L1147 >>>>>> [4] https://github.com/numactl/numactl/blob/master/numactl.c#L144 >>>>>> [5] https://github.com/numactl/numactl/blob/master/numactl.c#L177 >>>>>> >>>>>> >>>>>>> - I'd like to suggest renaming the 'index' part of the following >>>>>>> variables and functions to 'nindex' ('node_index' is probably to long) >>>>>>> in the following code, to emphasize that we have node indexes pointing >>>>>>> to actual, not always consecutive node numbers: >>>>>>> >>>>>>> 2879 // Create an index -> node mapping, since nodes are not >>>>>>> always consecutive >>>>>>> 2880 _index_to_node = new (ResourceObj::C_HEAP, mtInternal) >>>>>>> GrowableArray(0, true); >>>>>>> 2881 rebuild_index_to_node_map(); >>>>>> >>>>>> Simple change but much better to read indeed. Done. 
>>>>>> >>>>>> >>>>>>> - can you please wrap the following one-line else statement into curly >>>>>>> braces (it's more readable and we usually do it that way in HotSpot >>>>>>> although there are no formal style guidelines :) >>>>>>> >>>>>>> 2953 } else >>>>>>> 2954 // Current node is already a configured node. >>>>>>> 2955 closest_node = index_to_node()->at(i); >>>>>> >>>>>> Done. >>>>>> >>>>>> >>>>>>> - in os::Linux::rebuild_cpu_to_node_map(), if you set >>>>>>> 'closest_distance' to INT_MAX at the beginning of the loop, you can >>>>>>> later avoid the check for '|| !closest_distance'. Also, according to >>>>>>> the man page, numa_distance() returns 0 if it can not determine the >>>>>>> distance. So with the above change, the condition on line 2974 should >>>>>>> read: >>>>>>> >>>>>>> 2947 if (distance && distance < closest_distance) { >>>>>>> >>>>>> >>>>>> Sure, much better to set the initial condition as distant as possible and >>>>>> adjust to a closer one bit by bit improving the if condition. Done. >>>>>> >>>>>> >>>>>>> Finally, and not directly related to your change, I'd suggest the >>>>>>> following clean-ups: >>>>>>> >>>>>>> - remove the usage of 'NCPUS = 32768' in >>>>>>> os::Linux::rebuild_cpu_to_node_map(). The comment on that line is >>>>>>> unclear to me and probably related to an older version/problem of >>>>>>> libnuma? I think we should simply use >>>>>>> numa_allocate_cpumask()/numa_free_cpumask() instead. >>>>>>> >>>>>>> - we still use the NUMA version 1 function prototypes (e.g. >>>>>>> "numa_node_to_cpus(int node, unsigned long *buffer, int buffer_len)" >>>>>>> instead of "numa_node_to_cpus(int node, struct bitmask *mask)", but >>>>>>> also "numa_interleave_memory()" and maybe others). I think we should >>>>>>> switch all prototypes to the new NUMA version 2 interface which you've >>>>>>> already used for the new functions which you've added. >>>>>> >>>>>> I agree. Could I open a new bug to address these clean-ups? 
>>>>>> >>>>>> >>>>>>> That said, I think these changes all require libnuma 2.0 (see >>>>>>> os::Linux::libnuma_dlsym). So before starting this, you should make >>>>>>> sure that libnuma 2.0 is available on all platforms to which you'd >>>>>>> like to down-port this change. For jdk10 we could definitely do it, >>>>>>> for jdk9 probably also, for jdk8 I'm not so sure. >>>>>> >>>>>> libnuma v1 last release dates back to 2008, but any idea how could I check that >>>>>> for sure since it's on shared code? >>>>>> >>>>>> new webrev: http://cr.openjdk.java.net/~gromero/8175813/v3/ >>>>>> >>>>>> Thank you! >>>>>> >>>>>> Best regards, >>>>>> Gustavo >>>>>> >>>>>> >>>>>>> Regards, >>>>>>> Volker >>>>>>> >>>>>>> On Thu, Apr 13, 2017 at 12:51 AM, Gustavo Romero >>>>>>> wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> Any update on it? >>>>>>>> >>>>>>>> Thank you. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Gustavo >>>>>>>> >>>>>>>> On 09-03-2017 16:33, Gustavo Romero wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> Could the following webrev be reviewed please? >>>>>>>>> >>>>>>>>> It improves the numa node detection when non-consecutive or memory-less nodes >>>>>>>>> exist in the system. 
>>>>>>>>> >>>>>>>>> webrev: http://cr.openjdk.java.net/~gromero/8175813/v2/ >>>>>>>>> bug : https://bugs.openjdk.java.net/browse/JDK-8175813 >>>>>>>>> >>>>>>>>> Currently, although no problem exists when the JVM detects numa nodes that are >>>>>>>>> consecutive and have memory, for example in a numa topology like: >>>>>>>>> >>>>>>>>> available: 2 nodes (0-1) >>>>>>>>> node 0 cpus: 0 8 16 24 32 >>>>>>>>> node 0 size: 65258 MB >>>>>>>>> node 0 free: 34 MB >>>>>>>>> node 1 cpus: 40 48 56 64 72 >>>>>>>>> node 1 size: 65320 MB >>>>>>>>> node 1 free: 150 MB >>>>>>>>> node distances: >>>>>>>>> node 0 1 >>>>>>>>> 0: 10 20 >>>>>>>>> 1: 20 10, >>>>>>>>> >>>>>>>>> it fails on detecting numa nodes to be used in the Parallel GC in a numa >>>>>>>>> topology like: >>>>>>>>> >>>>>>>>> available: 4 nodes (0-1,16-17) >>>>>>>>> node 0 cpus: 0 8 16 24 32 >>>>>>>>> node 0 size: 130706 MB >>>>>>>>> node 0 free: 7729 MB >>>>>>>>> node 1 cpus: 40 48 56 64 72 >>>>>>>>> node 1 size: 0 MB >>>>>>>>> node 1 free: 0 MB >>>>>>>>> node 16 cpus: 80 88 96 104 112 >>>>>>>>> node 16 size: 130630 MB >>>>>>>>> node 16 free: 5282 MB >>>>>>>>> node 17 cpus: 120 128 136 144 152 >>>>>>>>> node 17 size: 0 MB >>>>>>>>> node 17 free: 0 MB >>>>>>>>> node distances: >>>>>>>>> node 0 1 16 17 >>>>>>>>> 0: 10 20 40 40 >>>>>>>>> 1: 20 10 40 40 >>>>>>>>> 16: 40 40 10 20 >>>>>>>>> 17: 40 40 20 10, >>>>>>>>> >>>>>>>>> where node 16 is not consecutive in relation to 1 and also nodes 1 and 17 have >>>>>>>>> no memory. 
>>>>>>>>> >>>>>>>>> If a topology like that exists, os::numa_make_local() will receive a local group >>>>>>>>> id as a hint that is not available in the system to be bound (it will receive >>>>>>>>> all nodes from 0 to 17), causing a proliferation of "mbind: Invalid argument" >>>>>>>>> messages: >>>>>>>>> >>>>>>>>> http://cr.openjdk.java.net/~gromero/logs/jdk10_pristine.log >>>>>>>>> >>>>>>>>> That change improves the detection by making the JVM numa API aware of the >>>>>>>>> existence of numa nodes that are non-consecutive from 0 to the highest node >>>>>>>>> number and also of nodes that might be memory-less nodes, i.e. that might not >>>>>>>>> be, in libnuma terms, a configured node. Hence just the configured nodes will >>>>>>>>> be available: >>>>>>>>> >>>>>>>>> http://cr.openjdk.java.net/~gromero/logs/jdk10_numa_patched.log >>>>>>>>> >>>>>>>>> The change has no effect on numa topologies were the problem does not occur, >>>>>>>>> i.e. no change in the number of nodes and no change in the cpu to node map. On >>>>>>>>> numa topologies where memory-less nodes exist (like in the last example above), >>>>>>>>> cpus from a memory-less node won't be able to bind locally so they are mapped >>>>>>>>> to the closest node, otherwise they would be not associate to any node and >>>>>>>>> MutableNUMASpace::cas_allocate() would pick a node randomly, compromising the >>>>>>>>> performance. 
>>>>>>>>>
>>>>>>>>> I found no regressions on x64 for the following numa topology:
>>>>>>>>>
>>>>>>>>> available: 2 nodes (0-1)
>>>>>>>>> node 0 cpus: 0 1 2 3 8 9 10 11
>>>>>>>>> node 0 size: 24102 MB
>>>>>>>>> node 0 free: 19806 MB
>>>>>>>>> node 1 cpus: 4 5 6 7 12 13 14 15
>>>>>>>>> node 1 size: 24190 MB
>>>>>>>>> node 1 free: 21951 MB
>>>>>>>>> node distances:
>>>>>>>>> node 0 1
>>>>>>>>> 0: 10 21
>>>>>>>>> 1: 21 10
>>>>>>>>>
>>>>>>>>> I understand that fixing the current numa detection is a prerequisite to enable
>>>>>>>>> UseNUMA by the default [1] and to extend the numa-aware allocation to the G1 GC [2].
>>>>>>>>>
>>>>>>>>> Thank you.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Gustavo
>>>>>>>>>
>>>>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8046153 (JEP 163: Enable NUMA Mode by Default When Appropriate)
>>>>>>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8046147 (JEP 157: G1 GC: NUMA-Aware Allocation)
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>
>

From mikael.vidstedt at oracle.com Tue May 9 21:29:13 2017
From: mikael.vidstedt at oracle.com (Mikael Vidstedt)
Date: Tue, 9 May 2017 14:29:13 -0700
Subject: RFR(S): 8180003: Remove sys/ prefix from poll.h and signal.h includes
Message-ID:

Please review this small change which removes the sys/ prefix from a bunch of includes of poll.h and signal.h.

hotspot: http://cr.openjdk.java.net/~mikael/webrevs/8180003/webrev.00/hotspot/webrev/
jdk: http://cr.openjdk.java.net/~mikael/webrevs/8180003/webrev.00/jdk/webrev/

Using the sys/ prefix works on many platforms, but the POSIX spec makes it clear that the poll.h and signal.h header files should be included without the prefix.

I have verified that this change works on all the Oracle supported platforms, but I could use some help verifying it on AIX.

Cheers,
Mikael
From brian.burkhalter at oracle.com Tue May 9 21:45:04 2017
From: brian.burkhalter at oracle.com (Brian Burkhalter)
Date: Tue, 9 May 2017 14:45:04 -0700
Subject: RFR(S): 8180003: Remove sys/ prefix from poll.h and signal.h includes
In-Reply-To:
References:
Message-ID: <83DEDA3B-2BD5-4F99-A3BE-2F3AE8F2C39B@oracle.com>

On May 9, 2017, at 2:29 PM, Mikael Vidstedt wrote:

> Please review this small change which removes the sys/ prefix from a bunch of includes of poll.h and signal.h.
>
> hotspot: http://cr.openjdk.java.net/~mikael/webrevs/8180003/webrev.00/hotspot/webrev/
> jdk: http://cr.openjdk.java.net/~mikael/webrevs/8180003/webrev.00/jdk/webrev/

The JDK NIO changes look fine at least.

> Using the sys/ prefix works on many platforms, but the posix spec makes it clear that the poll.h and signal.h header files should be included without the prefix.

Just had to look ...

[1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/poll.h.html
[2] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html

> I have verified that this change works on all the Oracle supported platforms, but I could use some help verifying it on AIX.

Good about the Oracle platforms.

Brian

From david.holmes at oracle.com Tue May 9 23:19:23 2017
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 10 May 2017 09:19:23 +1000
Subject: RFR(S): 8180003: Remove sys/ prefix from poll.h and signal.h includes
In-Reply-To:
References:
Message-ID: <3e50ead4-6b81-c0f2-1654-847390981357@oracle.com>

Hi Mikael,

To repeat myself from:

http://mail.openjdk.java.net/pipermail/portola-dev/2017-April/000025.html

Changes look okay. I agree with the rationale.

Looking at actual implementations, Linux and macOS are trivially fine (poll.h just includes sys/poll.h). Solaris is non-trivially fine - poll.h does more than what sys/poll.h does, but nothing that affects our sources.
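
A minimal sketch of the include style under review, assuming a POSIX system. The helper names poll_ignored_fd and sigint_in_set are hypothetical and do not come from the webrev; they only exercise one declaration from each header after including it without the sys/ prefix:

```c
/* Sketch only: the helper names are hypothetical and not part of the webrev. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <poll.h>    /* POSIX spelling, rather than <sys/poll.h> */
#include <signal.h>  /* POSIX spelling, rather than <sys/signal.h> */

/* Polls one ignored descriptor (fd < 0) with a zero timeout.
 * POSIX specifies that such an entry is skipped and its revents cleared,
 * so the call returns 0 without blocking. */
int poll_ignored_fd(void) {
    struct pollfd pfd;
    pfd.fd = -1;
    pfd.events = POLLIN;
    pfd.revents = 0;
    int ready = poll(&pfd, 1, 0);
    /* If nothing is ready, no event bits may be reported. */
    assert(ready != 0 || pfd.revents == 0);
    return ready;
}

/* Builds a signal set holding only SIGINT. Returns 1 if SIGINT is a member,
 * 0 if it is not, and -1 if the set could not be built. */
int sigint_in_set(void) {
    sigset_t set;
    if (sigemptyset(&set) != 0 || sigaddset(&set, SIGINT) != 0) {
        return -1;
    }
    return sigismember(&set, SIGINT);
}
```

If this compiles on a given platform, the unprefixed headers provide the declarations used here; it says nothing about the other declarations the HotSpot and JDK sources rely on, which is why the AIX verification requested in the thread is still needed.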
Thanks, David :) On 10/05/2017 7:29 AM, Mikael Vidstedt wrote: > > Please review this small change which removes the sys/ prefix from a bunch of includes of poll.h and signal.h. > > hotspot: http://cr.openjdk.java.net/~mikael/webrevs/8180003/webrev.00/hotspot/webrev/ > jdk: http://cr.openjdk.java.net/~mikael/webrevs/8180003/webrev.00/jdk/webrev/ > > Using the sys/ prefix works on many platforms, but the posix spec makes it clear that the poll.h and signal.h header files should be included without the prefix. > > I have verified that this change works on all the Oracle supported platforms, but I could use some help verifying it on AIX. > > Cheers, > Mikael > From HORIE at jp.ibm.com Thu May 11 06:46:32 2017 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Thu, 11 May 2017 15:46:32 +0900 Subject: Optimizing byte reverse code for int value In-Reply-To: <174bf72968b5473cb3757a4f1c125bf7@sap.com> References: <622fa3e77da546dfb5155a1e4afacd7c@sap.com>, <362a21f4-277c-c3f3-f7f0-08b55c8b2b0b@redhat.com> <89abbea5-9998-2e4d-62d3-e1f3e9bbd1d5@redhat.com> <2e13a32b56cd4d9f89758f4042602e9a@sap.com> <174bf72968b5473cb3757a4f1c125bf7@sap.com> Message-ID: Martin, Thanks a lot for your helpful comments. I fixed my code. http://cr.openjdk.java.net/~horii/8178294/webrev.06/ >@Andrew: Do you think this is the right way to do it and is there a chance to get it in jdk8u? Andrew, I would be grateful if you would approve this change for jdk8u. Best regards, -- Michihiro, IBM Research - Tokyo From: "Doerr, Martin" To: Michihiro Horie/Japan/IBM at IBMJP, "aph at redhat.com" Cc: Gustavo Bueno Romero , Hiroshi H Horii/Japan/IBM at IBMJP, "hotspot-dev at openjdk.java.net" , "ppc-aix-port-dev at openjdk.java.net" , "Simonis, Volker" Date: 2017/04/26 18:04 Subject: RE: Optimizing byte reverse code for int value Hi Michihiro, this looks better, now. Just a few comments: - I think "UseUnalignedAccesses" should be used instead of #ifdef SPARC. Other platforms can also be affected.
- In theory, I think that an ordered load may get matched which would get replaced by an unordered one. I guess this would probably never occur, but I think such changes should be absolutely bullet proof :) Besides that, it looks correct to me. @Andrew: Do you think this is the right way to do it and is there a chance to get it in jdk8u? Best regards, Martin From: Michihiro Horie [mailto:HORIE at jp.ibm.com] Sent: Mittwoch, 26. April 2017 05:10 To: Doerr, Martin Cc: aph at redhat.com; Gustavo Bueno Romero ; Hiroshi H Horii ; hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker Subject: RE: Optimizing byte reverse code for int value Martin, Thanks a lot for your comments. I fixed my code. Webrev: http://cr.openjdk.java.net/~horii/8178294/webrev.05/ Best regards, -- Michihiro, IBM Research - Tokyo From: "Doerr, Martin" To: Michihiro Horie/Japan/IBM at IBMJP Cc: "aph at redhat.com" , Hiroshi H Horii/Japan/IBM at IBMJP, " hotspot-dev at openjdk.java.net" , " ppc-aix-port-dev at openjdk.java.net" , "Simonis, Volker" , Gustavo Bueno Romero < gromero at br.ibm.com> Date: 2017/04/24 18:11 Subject: RE: Optimizing byte reverse code for int value Hi Michihiro, please note that I'm not a jdk8u reviewer. However, I have taken a quick look and I have the following concerns: 1. I think it's incorrect for Big Endian. 2. The pattern can also match for an unaligned 4 byte address which would break platforms like SPARC. 3. I couldn't see checks for shift amount and masks. Best regards, Martin From: Michihiro Horie [mailto:HORIE at jp.ibm.com] Sent: Freitag, 21. 
April 2017 18:18 To: Doerr, Martin Cc: aph at redhat.com; Hiroshi H Horii ; hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker ; Gustavo Bueno Romero Subject: RE: Optimizing byte reverse code for int value Would you review the following change for jdk8? Webrev: http://cr.openjdk.java.net/~horii/8178294/webrev.01/ Our byte-reverse optimization now works in shared code. I tested it with jtreg on x86, ppc64, and ppc64le. Best regards, -- Michihiro, IBM Research - Tokyo ----- Original message ----- From: "Doerr, Martin" To: Michihiro Horie/Japan/IBM at IBMJP Cc: "aph at redhat.com" , Hiroshi H Horii/Japan/IBM at IBMJP, " hotspot-dev at openjdk.java.net" , " ppc-aix-port-dev at openjdk.java.net" , "Simonis, Volker" Subject: RE: Optimizing byte reverse code for int value Date: Wed, Apr 12, 2017 12:13 AM Hi Michihiro, thanks for the quick reply. I think Andrew's idea is to optimize in the shared code instead of the platform backends. I haven't thought about where this could be done. Or would it be possible to backport jdk (especially Unsafe) changes? If the required changes are small enough and we don't have to touch any public interface, this might be an option, too. We'll appreciate it if you take care of the new match rules for PPC64. Thanks a lot. Best regards, Martin From: Michihiro Horie [mailto:HORIE at jp.ibm.com] Sent: Dienstag, 11. April 2017 16:55 To: Doerr, Martin Cc: aph at redhat.com; Hiroshi H Horii ; hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker Subject: RE: Optimizing byte reverse code for int value Andrew, Martin, Thanks a lot for your helpful feedback. >Have you considered it as a generic optimization for all processors? We would support all processors for our byte-reverse optimization to make it generic. Currently, I just finished adding match rules for little endian and big endian on PPC64, and am testing it on AIX.
>In addition, I noticed that we don't have match rules which exploit byte reverse load/store instructions on PPC64. We would like to handle adding match rules for byte reverse load/store instructions on PPC64 for JDK10 if you would not mind. Would it be fine with you? Best regards, -- Michihiro, IBM Research - Tokyo ----- Original message ----- From: "Doerr, Martin" To: Andrew Haley , Michihiro Horie/Japan/IBM at IBMJP Cc: "Simonis, Volker" , " ppc-aix-port-dev at openjdk.java.net" , " hotspot-dev at openjdk.java.net" , Hiroshi H Horii/Japan/IBM at IBMJP Subject: RE: Optimizing byte reverse code for int value Date: Tue, Apr 11, 2017 10:44 PM Hi Andrew, thank you for your helpful comments. I fully agree with you. In addition, I noticed that we don't have match rules which exploit byte reverse load/store instructions on PPC64. SPARC already has them: match(Set dst (ReverseBytesI/L/US/S (LoadI src))); match(Set dst (StoreI dst (ReverseBytesI/L/US/S src))); I think we should add them for jdk10. They should be used when the platform endianness doesn't match the bigEndian parameter in Unsafe methods. Best regards, Martin -----Original Message----- From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Andrew Haley Sent: Dienstag, 11. April 2017 13:02 To: Michihiro Horie Cc: Simonis, Volker ; ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; Hiroshi H Horii Subject: Re: Optimizing byte reverse code for int value On 11/04/17 11:36, Michihiro Horie wrote: > Thank you very much for letting us know Unsafe.getIntUnaligned is available in > JDK9. I do agree we should fix Java source code. > We think our byte-reverse optimization would still work on jdk8u as Hiroshi > mentioned. Would you agree on this point? I do, but I do not agree that this patch should necessarily be done in the PowerPC-specific back end. Have you considered it as a generic optimization for all processors? Andrew. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Derek.White at cavium.com Thu May 11 18:33:18 2017 From: Derek.White at cavium.com (White, Derek) Date: Thu, 11 May 2017 18:33:18 +0000 Subject: Optimizing byte reverse code for int value In-Reply-To: <4bdec074-3884-497e-ec86-f5a2dab6202f@redhat.com> References: <622fa3e77da546dfb5155a1e4afacd7c@sap.com> <362a21f4-277c-c3f3-f7f0-08b55c8b2b0b@redhat.com> <89abbea5-9998-2e4d-62d3-e1f3e9bbd1d5@redhat.com> <2e13a32b56cd4d9f89758f4042602e9a@sap.com> <174bf72968b5473cb3757a4f1c125bf7@sap.com> <4bdec074-3884-497e-ec86-f5a2dab6202f@redhat.com> Message-ID: Hi Michihiro, Not a jdk8u reviewer OR C2 expert, but a possible simplification: I think a tree like: // AndI // /\ // LoadB ConI(255) will get turned into a LoadUBNode, via AndINode::Ideal() and AndINode::Identity(). It certainly should, considering how often this code pattern is used! If so, you should be able to simplify your pattern matching greatly. - Derek -----Original Message----- From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Andrew Haley Sent: Thursday, May 11, 2017 5:02 AM To: Michihiro Horie ; Doerr, Martin Cc: ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; Hiroshi H Horii ; Simonis, Volker Subject: Re: Optimizing byte reverse code for int value On 11/05/17 07:46, Michihiro Horie wrote: > Thanks a lot for your helpful comments. I fixed my code. > http://cr.openjdk.java.net/~horii/8178294/webrev.06/ > >> @Andrew: Do you think this is the right way to do it and is there a >> chance > to get it in jdk8u? > Andrew, I would be grateful if you would approve this change for jdk8u. The list of jdk8u reviewers is at http://openjdk.java.net/census#jdk8u. You'll want someone who is on the HotSpot team. 
I have mixed feelings about this patch. It seems too specific to me: if you had something that would work with any integer type it would be more useful, I feel. And - generally speaking - the rule is that patches go into JDK 9 first, but JDK 9 is closed for enhancements. So, I'm sorry for the bad news. Your patch looks interesting and useful but I do not know how to get it committed. Andrew. From thomas.stuefe at gmail.com Tue May 16 12:50:32 2017 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Tue, 16 May 2017 14:50:32 +0200 Subject: RFR (xxs): 8180424: Another build issue on AIX after 8034174 Message-ID: Hi all, may I have a review for this tiny fix: Issue: https://bugs.openjdk.java.net/browse/JDK-8180424 webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8180424-another-build-issue-on-aix-after-8034174/webrev.00/webrev/ The prototypes for NET_RecvFrom and NET_Accept do not match their implementations for AIX since 8034174. This did not lead to an error in jdk9 because there, the header (net_util_md.h) was not included by aix_close.c. In JDK10, it is included and therefore does not build. I believe this did not lead to a runtime error on jdk9, at least not for the typical values involved; the mismatch is between int* and unsigned int* (native socklen_t). Kind Regards, Thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: From christoph.langer at sap.com Wed May 17 15:11:26 2017 From: christoph.langer at sap.com (Langer, Christoph) Date: Wed, 17 May 2017 15:11:26 +0000 Subject: RFR (xxs): 8180424: Another build issue on AIX after 8034174 In-Reply-To: References: Message-ID: <57ba6104846c4ca2b8fd496f119ee853@sap.com> Hi Thomas, this looks good and should definitely be downported to JDK9 as soon as possible. Best regards Christoph From: ppc-aix-port-dev [mailto:ppc-aix-port-dev-bounces at openjdk.java.net] On Behalf Of Thomas Stüfe Sent: Dienstag, 16. 
Mai 2017 14:51 To: ppc-aix-port-dev at openjdk.java.net; net-dev Subject: RFR (xxs): 8180424: Another build issue on AIX after 8034174 Hi all, may I have a review for this tiny fix: Issue: https://bugs.openjdk.java.net/browse/JDK-8180424 webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8180424-another-build-issue-on-aix-after-8034174/webrev.00/webrev/ The prototypes for NET_RecvFrom and NET_Accept do not match their implementations for AIX since 8034174. This did not lead to an error in jdk9 because there, the header (net_util_md.h) was not included by aix_close.c. In JDK10, it is included and therefore does not build. I believe this did not lead to a runtime error on jdk9, at least not for the typical values involved; the mismatch is between int* and unsigned int* (native socklen_t). Kind Regards, Thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: From vyom.tewari at oracle.com Wed May 17 15:17:26 2017 From: vyom.tewari at oracle.com (Vyom Tewari) Date: Wed, 17 May 2017 20:47:26 +0530 Subject: RFR (xxs): 8180424: Another build issue on AIX after 8034174 In-Reply-To: References: Message-ID: Hi Thomas, the fix looks good to me, but I am not a jdk10 reviewer. Thanks, Vyom On Tuesday 16 May 2017 06:20 PM, Thomas Stüfe wrote: > Hi all, > > may I have a review for this tiny fix: > > Issue: https://bugs.openjdk.java.net/browse/JDK-8180424 > webrev: > http://cr.openjdk.java.net/~stuefe/webrevs/8180424-another-build-issue-on-aix-after-8034174/webrev.00/webrev/ > > > The prototypes for NET_RecvFrom and NET_Accept do not match their > implementations for AIX since 8034174. This did not lead to an error > in jdk9 because there, the header (net_util_md.h) was not included by > aix_close.c. In JDK10, it is included and therefore does not build. > > I believe this did not lead to a runtime error on jdk9, at least not > for the typical values involved; the mismatch is between int* and > unsigned int* (native socklen_t). 
> > Kind Regards, Thomas From thomas.stuefe at gmail.com Thu May 18 10:07:55 2017 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Thu, 18 May 2017 12:07:55 +0200 Subject: RFR (xxs): 8180424: Another build issue on AIX after 8034174 In-Reply-To: References: Message-ID: Thanks guys! I requested a fix for jdk9. Let's see how that goes. Best Regards, Thomas On Wed, May 17, 2017 at 5:17 PM, Vyom Tewari wrote: > Hi Thomas, > > the fix looks good to me, but I am not a jdk10 reviewer. > > Thanks, > > Vyom > > > On Tuesday 16 May 2017 06:20 PM, Thomas Stüfe wrote: > >> Hi all, >> >> may I have a review for this tiny fix: >> >> Issue: https://bugs.openjdk.java.net/browse/JDK-8180424 >> webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8180424-another-build-issue-on-aix-after-8034174/webrev.00/webrev/ >> >> The prototypes for NET_RecvFrom and NET_Accept do not match their >> implementations for AIX since 8034174. This did not lead to an error in >> jdk9 because there, the header (net_util_md.h) was not included by >> aix_close.c. In JDK10, it is included and therefore does not build. >> >> I believe this did not lead to a runtime error on jdk9, at least not for >> the typical values involved; the mismatch is between int* and unsigned int* >> (native socklen_t). >> >> Kind Regards, Thomas >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mikael.vidstedt at oracle.com Fri May 26 21:22:34 2017 From: mikael.vidstedt at oracle.com (Mikael Vidstedt) Date: Fri, 26 May 2017 14:22:34 -0700 Subject: RFR(S): 8180184: Add DATA and FSIZE to os::Posix::print_rlimit_info Message-ID: <17B78E0D-C38B-4C8C-9F52-FA572563D485@oracle.com> Please review the following fix which adds RLIMIT_DATA and RLIMIT_FSIZE to the rlimit related data in the crash dump, and cleans up/unifies some of the related code. 
Bug: https://bugs.openjdk.java.net/browse/JDK-8180184 Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8180184/webrev.00/hotspot/webrev/ Tested using JPRT. Manually verified that the crash dump contains the expected information. Thanks to Thomas for helping verify that the change works as expected on AIX as well! Cheers, Mikael -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.holmes at oracle.com Mon May 29 07:10:52 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 29 May 2017 17:10:52 +1000 Subject: RFR(S): 8180184: Add DATA and FSIZE to os::Posix::print_rlimit_info In-Reply-To: <17B78E0D-C38B-4C8C-9F52-FA572563D485@oracle.com> References: <17B78E0D-C38B-4C8C-9F52-FA572563D485@oracle.com> Message-ID: <92b540d7-dc21-047a-a1e8-8c191a3f7b74@oracle.com> Hi Mikael, Looks okay - good to see the code sharing. I wonder if the C compiler converts x/20124 into x >>10 ? :) Cheers, David On 27/05/2017 7:22 AM, Mikael Vidstedt wrote: > > Please review the following fix which adds RLIMIT_DATA and RLIMIT_FISZE to the rlimit related data in the crash dump, and cleans up/unifies some of the related code. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8180184 > Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8180184/webrev.00/hotspot/webrev/ > > Tested using JPRT. Manually verified that the crash dump contains the expected information. > > Thanks to Thomas for helping verify that the change works as expected on AIX as well! 
> > Cheers, > Mikael > From david.holmes at oracle.com Mon May 29 07:12:06 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 29 May 2017 17:12:06 +1000 Subject: RFR(S): 8180184: Add DATA and FSIZE to os::Posix::print_rlimit_info In-Reply-To: <92b540d7-dc21-047a-a1e8-8c191a3f7b74@oracle.com> References: <17B78E0D-C38B-4C8C-9F52-FA572563D485@oracle.com> <92b540d7-dc21-047a-a1e8-8c191a3f7b74@oracle.com> Message-ID: <9e60e467-6cc1-62a9-f84d-7fc4cd03fc0e@oracle.com> On 29/05/2017 5:10 PM, David Holmes wrote: > Hi Mikael, > > Looks okay - good to see the code sharing. > > I wonder if the C compiler converts x/20124 into x >>10 ? :) Don't know what happened there: x/1024 into x>>10 :) David > Cheers, > David > > On 27/05/2017 7:22 AM, Mikael Vidstedt wrote: >> >> Please review the following fix which adds RLIMIT_DATA and >> RLIMIT_FISZE to the rlimit related data in the crash dump, and cleans >> up/unifies some of the related code. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8180184 >> >> Webrev: >> http://cr.openjdk.java.net/~mikael/webrevs/8180184/webrev.00/hotspot/webrev/ >> >> >> >> Tested using JPRT. Manually verified that the crash dump contains the >> expected information. >> >> Thanks to Thomas for helping verify that the change works as expected >> on AIX as well! >> >> Cheers, >> Mikael >> From thomas.stuefe at gmail.com Mon May 29 08:22:39 2017 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Mon, 29 May 2017 10:22:39 +0200 Subject: RFR(S): 8180184: Add DATA and FSIZE to os::Posix::print_rlimit_info In-Reply-To: <17B78E0D-C38B-4C8C-9F52-FA572563D485@oracle.com> References: <17B78E0D-C38B-4C8C-9F52-FA572563D485@oracle.com> Message-ID: Hi Mikael, looks fine. Small nit, we never seem to check the return value of getrlimit(). But seeing that the only way to fail getrlimit() would be to specify an invalid limit constant, maybe this is ok. 
Best Regards, Thomas On Fri, May 26, 2017 at 11:22 PM, Mikael Vidstedt < mikael.vidstedt at oracle.com> wrote: > > Please review the following fix which adds RLIMIT_DATA and RLIMIT_FISZE to > the rlimit related data in the crash dump, and cleans up/unifies some of > the related code. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8180184 > Webrev: http://cr.openjdk.java.net/~mikael/webrevs/ > 8180184/webrev.00/hotspot/webrev/ > > Tested using JPRT. Manually verified that the crash dump contains the > expected information. > > Thanks to Thomas for helping verify that the change works as expected on > AIX as well! > > Cheers, > Mikael > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.stuefe at gmail.com Tue May 30 09:46:41 2017 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Tue, 30 May 2017 11:46:41 +0200 Subject: JDK10: RFR(xxs): 8181207: 8177809 breaks AIX 5.3, 6.1 builds Message-ID: Hi all, may I have please a review for this tiny change: Bug: https://bugs.openjdk.java.net/browse/JDK-8181207 webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8181207-8177809-breaks-AIX-builds/webrev.00/webrev/ This reverts 8177809 for AIX because it leads to build errors on older AIX systems. We want to retain the ability to build on older AIX releases. Thanks, Thomas -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mikael.vidstedt at oracle.com Wed May 31 00:20:43 2017 From: mikael.vidstedt at oracle.com (Mikael Vidstedt) Date: Tue, 30 May 2017 17:20:43 -0700 Subject: RFR(S): 8180184: Add DATA and FSIZE to os::Posix::print_rlimit_info In-Reply-To: References: <17B78E0D-C38B-4C8C-9F52-FA572563D485@oracle.com> Message-ID: David, I assume that the C++ compilers will all convert it to a simple shift by ten, but I'm not going to verify it :) Thomas, I agree that in theory the return value from getrlimit should be checked, but I chose not to make any further modifications as part of this change. Thanks to both of you for the reviews! Cheers, Mikael > On May 29, 2017, at 1:22 AM, Thomas Stüfe wrote: > > Hi Mikael, > > looks fine. > > Small nit, we never seem to check the return value of getrlimit(). But seeing that the only way to fail getrlimit() would be to specify an invalid limit constant, maybe this is ok. > > Best Regards, Thomas > > On Fri, May 26, 2017 at 11:22 PM, Mikael Vidstedt > wrote: > > Please review the following fix which adds RLIMIT_DATA and RLIMIT_FSIZE to the rlimit related data in the crash dump, and cleans up/unifies some of the related code. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8180184 > Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8180184/webrev.00/hotspot/webrev/ > > Tested using JPRT. Manually verified that the crash dump contains the expected information. > > Thanks to Thomas for helping verify that the change works as expected on AIX as well! > > Cheers, > Mikael > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vyom.tewari at oracle.com Wed May 31 04:13:23 2017 From: vyom.tewari at oracle.com (Vyom Tewari) Date: Wed, 31 May 2017 09:43:23 +0530 Subject: JDK10: RFR(xxs): 8181207: 8177809 breaks AIX 5.3, 6.1 builds In-Reply-To: References: Message-ID: <489a79d1-d484-e5ae-e33d-7b32ec307a64@oracle.com> Hi Thomas, The change looks good to me, but I am not an official reviewer. 
Thanks, Vyom On Tuesday 30 May 2017 03:16 PM, Thomas Stüfe wrote: > Hi all, > > may I have please a review for this tiny change: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8181207 > webrev: > http://cr.openjdk.java.net/~stuefe/webrevs/8181207-8177809-breaks-AIX-builds/webrev.00/webrev/ > > This reverts 8177809 for AIX because it leads to build errors on older AIX > systems. We want to retain the ability to build on older AIX releases. > > Thanks, Thomas From thomas.stuefe at gmail.com Wed May 31 05:46:10 2017 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Wed, 31 May 2017 07:46:10 +0200 Subject: JDK10: RFR(xxs): 8181207: 8177809 breaks AIX 5.3, 6.1 builds In-Reply-To: <489a79d1-d484-e5ae-e33d-7b32ec307a64@oracle.com> References: <489a79d1-d484-e5ae-e33d-7b32ec307a64@oracle.com> Message-ID: Thank you Vyom! On Wed, May 31, 2017 at 6:13 AM, Vyom Tewari wrote: > Hi Thomas, > > The change looks good to me, but I am not an official reviewer. > > Thanks, > > Vyom > > > > On Tuesday 30 May 2017 03:16 PM, Thomas Stüfe wrote: > >> Hi all, >> >> may I have please a review for this tiny change: >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8181207 >> webrev: >> http://cr.openjdk.java.net/~stuefe/webrevs/8181207-8177809-breaks-AIX-builds/webrev.00/webrev/ >> >> This reverts 8177809 for AIX because it leads to build errors on older AIX >> systems. We want to retain the ability to build on older AIX releases. >> >> Thanks, Thomas >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From volker.simonis at gmail.com Wed May 31 08:49:27 2017 From: volker.simonis at gmail.com (Volker Simonis) Date: Wed, 31 May 2017 10:49:27 +0200 Subject: JDK10: RFR(xxs): 8181207: 8177809 breaks AIX 5.3, 6.1 builds In-Reply-To: References: Message-ID: Hi Thomas, as far as I can see, AIX supports both the st_[a,c,m]time members in the stat64 structure for seconds and the corresponding st_[a,c,m]time_n members for nanosecond resolution since at least 5.3. Can you please use both - there's no reason to discriminate against AIX here :) Also, can you please change the code such that we have: #ifdef MACOSX ... #else #ifdef AIX ... #else ... #endif #endif I don't really like using "ifndef XXX" for everything else except XXX. Thank you and best regards, Volker On Tue, May 30, 2017 at 11:46 AM, Thomas Stüfe wrote: > Hi all, > > may I have please a review for this tiny change: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8181207 > webrev: > http://cr.openjdk.java.net/~stuefe/webrevs/8181207-8177809-breaks-AIX-builds/webrev.00/webrev/ > > This reverts 8177809 for AIX because it leads to build errors on older AIX > systems. We want to retain the ability to build on older AIX releases. > > Thanks, Thomas From HORIE at jp.ibm.com Wed May 31 12:36:27 2017 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Wed, 31 May 2017 20:36:27 +0800 Subject: 8179527: Implement intrinsic code for reverseBytes with load/store In-Reply-To: References: <3507c10563a84106ac6c2e8d2554c053@serv030.corp.eldorado.org.br> <7827421e2c6447f4ae406434f5bb3d25@sap.com> Message-ID: Martin, Thank you very much for your helpful comments and sponsoring this change. Would you review the latest change? 
http://cr.openjdk.java.net/~horii/8179527/webrev.02/ Best regards, -- Michihiro, IBM Research - Tokyo From: "Doerr, Martin" To: Michihiro Horie Cc: "hotspot-dev at openjdk.java.net" , "Simonis, Volker" , Hiroshi H Horii , Gustavo Bueno Romero Date: 2017/05/30 01:26 Subject: RE: 8179527: Implement intrinsic code for reverseBytes with load/store Hi Michihiro, thanks for the improved webrev. This looks better, but I still have a couple of suggestions. 1. I still don't like match rules which contain nodes which do something else (even though direct matching is prohibited by predicate). I think it would be better to remove "match(...)", "predicate(false)" and "ins_const(...)" and just describe the "effect()". At least, I'm not aware of why a match rule should be needed for rldicl and extsh. 2. I'd appreciate if you could remove "predicate (UseCountLeadingZerosInstructionsPPC64)" from all byte_reverse_... rules. They don't make any sense (not your fault). 3. The costs seem not to be set appropriately in the byte_reverse_... rules. E.g. instruction count * DEFAULT_COST would be better. 4. The load/store byte reversed instructions should use the 2 operand form (no explicit 0 for R0 to support assertions). Maybe we can find a 2nd reviewer if you provide a new webrev. I can sponsor the change. Thanks and best regards, Martin From: Michihiro Horie [mailto:HORIE at jp.ibm.com] Sent: Montag, 8. Mai 2017 06:58 To: Doerr, Martin ; Lindenmaier, Goetz Cc: Gustavo Serra Scalet ; hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker ; Hiroshi H Horii ; Gustavo Bueno Romero Subject: RE: 8179527: Implement intrinsic code for reverseBytes with load/store Dear Martin, Gustavo, Thank you very much for your helpful comments. Fixed code is http://cr.openjdk.java.net/~horii/8179527/webrev.01/ Dear Goetz, Would you kindly review and sponsor this change? I heard you are a C2 compiler expert and Martin is out for a while. 
Best regards, -- Michihiro, IBM Research - Tokyo From: "Doerr, Martin" To: Gustavo Serra Scalet , Michihiro Horie/Japan/IBM at IBMJP Cc: "ppc-aix-port-dev at openjdk.java.net" , "hotspot-dev at openjdk.java.net" , "Simonis, Volker" Date: 2017/05/03 02:24 Subject: RE: 8179527: Implement intrinsic code for reverseBytes with load/store Hi Michihiro and Gustavo, thank you very much for implementing this change. @Gustavo: Thanks for taking a look. I think that the direct match rules are just there to satisfy match_rule_supported. They don't need to be fast, they are just a fall back solution. The goal is to exploit the byte reverse load and store instructions which should match in more performance critical cases. Now my review: assembler_ppc.hpp: Looks good except a minor formatting request: LDBRX_OPCODE = (31u << OPCODE_SHIFT | 532 << 1), should be LDBRX_OPCODE = (31u << OPCODE_SHIFT | 532u << 1), to be consistent. The comments // X-FORM should be aligned with the other ones. assembler_ppc.inline.hpp: Good. ppc.ad: I'm concerned about the additional match rules which are only used for the expand step. They could match directly leading to incorrect code. What they match is not what they do. I suggest to implement the code directly in the ins_encode. This would make the new code significantly shorter and less error prone. I think we don't need to optimize for Power6 anymore and newer processors shouldn't really suffer under a little less optimized instruction scheduling. Would you agree? Displacements may be too large for "li" so I suggest to use the "indirect" memory operand and let the compiler handle it. 
I know that it may increase latency because the compiler will need to insert an addition which could better be matched into the memory operand of the load which is harder to implement (it is possible to match an addition in an operand). Best regards, Martin -----Original Message----- From: Gustavo Serra Scalet [mailto:gustavo.scalet at eldorado.org.br] Sent: Dienstag, 2. Mai 2017 17:05 To: Michihiro Horie Cc: ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; Simonis, Volker ; Doerr, Martin < martin.doerr at sap.com> Subject: RE: 8179527: Implement intrinsic code for reverseBytes with load/store Hi Michihiro, I wonder if there is no vectorized approach for implementing your "bytes_reverse_long_Ex" instruct on ppc.ad. Or did you avoid doing it so intentionally? > -----Original Message----- > From: ppc-aix-port-dev [mailto:ppc-aix-port-dev- > bounces at openjdk.java.net] On Behalf Of Michihiro Horie > Sent: terça-feira, 2 de maio de 2017 11:47 > To: ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; > volker.simonis at sap.com; martin.doerr at sap.com > Subject: 8179527: Implement intrinsic code for reverseBytes with > load/store > > Dear all, > > Would you please review following change? > > Bug: https://bugs.openjdk.java.net/browse/JDK-8179527 > Webrev: http://cr.openjdk.java.net/~horii/8179527/webrev.00/ > > I added new intrinsic code for reverseBytes() in ppc.ad with > * match(Set dst (ReverseBytesI/L/US/S (LoadI src))); > * match(Set dst (StoreI dst (ReverseBytesI/L/US/S src))); > > > Best regards, > -- > Michihiro, > IBM Research - Tokyo -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From thomas.stuefe at gmail.com Wed May 31 15:29:40 2017 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Wed, 31 May 2017 17:29:40 +0200 Subject: JDK10: RFR(xxs): 8181207: 8177809 breaks AIX 5.3, 6.1 builds In-Reply-To: References: Message-ID: Hi Volker, Good suggestions! I completely overlooked the ..._n members in stat64 struct. It seems it is even documented: https://www.ibm.com/support/knowledgecenter/ssw_aix_72/com.ibm.aix.files/stat.h.htm new webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8181207-8177809-breaks-AIX-builds/webrev.01/webrev/ ..Thomas On Wed, May 31, 2017 at 10:49 AM, Volker Simonis wrote: > Hi Thomas, > > as far as I can see, AIX supports both the st_[a,c,m]time members in > the stat64 structure for seconds and the corresponding > st_[a,c,m]time_n members for nanosecond resolution since at least 5.3. > Can you please use both - there's no reason to discriminate against AIX here > :) > > Also, can you please change the code such that we have: > > #ifdef MACOSX > ... > #else > #ifdef AIX > ... > #else > ... > #endif > #endif > > I don't really like using "ifndef XXX" for everything else except XXX. > > Thank you and best regards, > Volker > > > On Tue, May 30, 2017 at 11:46 AM, Thomas Stüfe > wrote: > > Hi all, > > > > may I have please a review for this tiny change: > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8181207 > > webrev: > > http://cr.openjdk.java.net/~stuefe/webrevs/8181207-8177809-breaks-AIX-builds/webrev.00/webrev/ > > > > This reverts 8177809 for AIX because it leads to build errors on older > AIX > > systems. We want to retain the ability to build on older AIX releases. > > > > Thanks, Thomas > -------------- next part -------------- An HTML attachment was scrubbed... 
From christoph.langer at sap.com Wed May 31 21:41:14 2017
From: christoph.langer at sap.com (Langer, Christoph)
Date: Wed, 31 May 2017 21:41:14 +0000
Subject: JDK10: RFR(xxs): 8181207: 8177809 breaks AIX 5.3, 6.1 builds
In-Reply-To:
References:
Message-ID: <05c5fe05f8cb4c8b831255600a0eb2e9@sap.com>

Hi Thomas,

looks good. Some suggestions about formatting:

a) You could write your code like this:

    #if defined(_AIX)
    ...
    #elif defined(MACOSX)
    ...
    #else
    ...
    #endif

That way the code has 3 clear sections and you don't have to nest an #ifdef block inside another #ifdef.

b) Lines 234, 235 (AIX block), rather write:

    rv  = (jlong)sb.st_mtime * 1000;
    rv += (jlong)sb.st_mtime_n / 1000000;

Then it looks aligned with the MACOSX and the default section.

Best regards
Christoph