From HORIE at jp.ibm.com  Tue May  2 14:47:01 2017
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Tue, 2 May 2017 23:47:01 +0900
Subject: 8179527: Implement intrinsic code for reverseBytes with load/store
Message-ID: <OF1FB78FBB.78C06521-ON00258114.004FD8C6-49258114.005135A3@notes.na.collabserv.com>


Dear all,

Would you please review the following change?

Bug: https://bugs.openjdk.java.net/browse/JDK-8179527
Webrev: http://cr.openjdk.java.net/~horii/8179527/webrev.00/

I added new intrinsic code for reverseBytes() in ppc.ad with
* match(Set dst (ReverseBytesI/L/US/S (LoadI src)));
* match(Set dst (StoreI dst (ReverseBytesI/L/US/S src)));
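
The idea behind fusing a byte reversal with the adjacent load can be sketched in C++. This is not code from the webrev, and all function names below are hypothetical. The sketch only checks that byte-reversing a value loaded in big-endian order gives the same result as a single byte-reversed read of the same four bytes, which is roughly what a rule matching ReverseBytesI over LoadI relies on.

```cpp
#include <cstdint>

// Plain byte swap of a 32-bit value (hypothetical helper, mirrors what a
// reverseBytes() call on an int computes).
constexpr uint32_t reverse_bytes32(uint32_t v) {
  return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
         ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

// Read four bytes in big-endian order (a normal load on a big-endian machine).
constexpr uint32_t load_be32(const uint8_t* p) {
  return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
         (uint32_t(p[2]) << 8)  |  uint32_t(p[3]);
}

// Read the same four bytes in the opposite order (what a byte-reversed load
// would deliver on a big-endian machine).
constexpr uint32_t load_le32(const uint8_t* p) {
  return (uint32_t(p[3]) << 24) | (uint32_t(p[2]) << 16) |
         (uint32_t(p[1]) << 8)  |  uint32_t(p[0]);
}

// Load followed by reverse versus one byte-reversed load, on sample bytes.
constexpr bool fused_matches_separate() {
  const uint8_t mem[4] = {0x11, 0x22, 0x33, 0x44};
  return reverse_bytes32(load_be32(mem)) == load_le32(mem);
}

static_assert(fused_matches_separate(), "fused and separate forms must agree");
```

The store direction is the mirror image: reversing a register value and then storing it leaves the same bytes in memory as one byte-reversed store.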


Best regards,
--
Michihiro,
IBM Research - Tokyo

From gustavo.scalet at eldorado.org.br  Tue May  2 15:05:09 2017
From: gustavo.scalet at eldorado.org.br (Gustavo Serra Scalet)
Date: Tue, 2 May 2017 15:05:09 +0000
Subject: 8179527: Implement intrinsic code for reverseBytes with load/store
In-Reply-To: <OF1FB78FBB.78C06521-ON00258114.004FD8C6-49258114.005135A3@notes.na.collabserv.com>
References: <OF1FB78FBB.78C06521-ON00258114.004FD8C6-49258114.005135A3@notes.na.collabserv.com>
Message-ID: <3507c10563a84106ac6c2e8d2554c053@serv030.corp.eldorado.org.br>

Hi Michihiro,

I wonder whether there is a vectorized approach for implementing your "bytes_reverse_long_Ex" instruct in ppc.ad. Or did you avoid it intentionally?



From martin.doerr at sap.com  Tue May  2 17:23:29 2017
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 2 May 2017 17:23:29 +0000
Subject: 8179527: Implement intrinsic code for reverseBytes with load/store
In-Reply-To: <3507c10563a84106ac6c2e8d2554c053@serv030.corp.eldorado.org.br>
References: <OF1FB78FBB.78C06521-ON00258114.004FD8C6-49258114.005135A3@notes.na.collabserv.com>
 <3507c10563a84106ac6c2e8d2554c053@serv030.corp.eldorado.org.br>
Message-ID: <7827421e2c6447f4ae406434f5bb3d25@sap.com>

Hi Michihiro and Gustavo,

thank you very much for implementing this change.

@Gustavo: Thanks for taking a look.
I think that the direct match rules are just there to satisfy match_rule_supported. They don't need to be fast; they are just a fallback solution.
The goal is to exploit the byte-reverse load and store instructions, which should match in more performance-critical cases.

Now my review:

assembler_ppc.hpp:
Looks good except for a minor formatting request:
LDBRX_OPCODE  = (31u << OPCODE_SHIFT |  532 << 1),
should be
LDBRX_OPCODE  = (31u << OPCODE_SHIFT | 532u << 1),
to be consistent.
The comments // X-FORM should be aligned with the other ones.
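
A small C++ sketch shows why this is only a consistency request. It assumes OPCODE_SHIFT is 26, which places the primary opcode in the top six bits of the instruction word; that value is an assumption for this sketch, not quoted from the webrev. Mixing a signed and an unsigned literal yields the same 32-bit pattern, because the signed operand is converted to unsigned before the bitwise OR.

```cpp
#include <cstdint>

// Assumption for this sketch: the primary opcode sits in the top 6 bits.
constexpr uint32_t OPCODE_SHIFT = 26u;

// The form as posted: the extended opcode 532 is a signed literal.
constexpr uint32_t LDBRX_OPCODE_MIXED    = (31u << OPCODE_SHIFT |  532 << 1);

// The requested form: both literals are unsigned.
constexpr uint32_t LDBRX_OPCODE_UNSIGNED = (31u << OPCODE_SHIFT | 532u << 1);

// Both spellings encode the same instruction word.
static_assert(LDBRX_OPCODE_MIXED == LDBRX_OPCODE_UNSIGNED,
              "the 'u' suffix changes the spelling, not the value");
```

Both constants evaluate to 0x7C000428 under this assumption.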

assembler_ppc.inline.hpp:
Good.

ppc.ad:
I'm concerned about the additional match rules which are only used for the expand step. They could match directly, leading to incorrect code. What they match is not what they do.
I suggest implementing the code directly in the ins_encode. This would make the new code significantly shorter and less error-prone.
I think we don't need to optimize for Power6 anymore, and newer processors shouldn't really suffer from slightly less optimized instruction scheduling. Would you agree?

Displacements may be too large for "li", so I suggest using the "indirect" memory operand and letting the compiler handle it. I know that this may increase latency, because the compiler will need to insert an addition which would better be matched into the memory operand of the load; that is harder to implement (it is possible to match an addition in an operand).
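
The displacement concern can be sketched as a range check. "li" loads a 16-bit signed immediate, so a displacement outside that range cannot be materialized by a single "li". The helper name below is hypothetical and only illustrates the check.

```cpp
#include <cstdint>

// True if disp fits the 16-bit signed immediate field used by "li".
// Hypothetical helper for illustration; not taken from the webrev.
constexpr bool fits_simm16(int64_t disp) {
  return disp >= -32768 && disp <= 32767;
}

static_assert(fits_simm16(32767),   "largest encodable displacement");
static_assert(fits_simm16(-32768),  "smallest encodable displacement");
static_assert(!fits_simm16(32768),  "one past the range needs another strategy");
```

With an "indirect" operand the address is already complete in a register, so no such check is needed in the encoding.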


Best regards,
Martin




From rwestrel at redhat.com  Wed May  3 08:04:37 2017
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 03 May 2017 10:04:37 +0200
Subject: PPC change needed for: [10] RFR(M): 8176506: C2: loop unswitching and
 unsafe accesses cause crash
Message-ID: <dk6zieuqsbu.fsf@rwestrel.remote.csb>


Just a heads up that the change below has some platform specific code
and is likely to need PPC specific code.

Thanks,
Roland.

-------------- next part --------------
An embedded message was scrubbed...
From: Roland Westrelin <rwestrel at redhat.com>
Subject: Re: [10] RFR(M): 8176506: C2: loop unswitching and unsafe accesses cause crash
Date: Fri, 28 Apr 2017 10:46:18 +0200
Size: 3343
URL: <http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20170503/4419c10d/attachment.mht>

From gromero at linux.vnet.ibm.com  Wed May  3 13:27:08 2017
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Wed, 3 May 2017 10:27:08 -0300
Subject: [10] RFR (S) 8175813: PPC64: "mbind: Invalid argument" when
 -XX:+UseNUMA is used
In-Reply-To: <59000AC0.7050507@linux.vnet.ibm.com>
References: <58C1AE06.9060609@linux.vnet.ibm.com>
 <58EEAF7B.6020708@linux.vnet.ibm.com>
 <CA+3eh11Wy_JBOxnoX4g=x7tWuqbUny8HUVq4gYqXoo5fz_AAFw@mail.gmail.com>
 <59000AC0.7050507@linux.vnet.ibm.com>
Message-ID: <5909DAAC.3070202@linux.vnet.ibm.com>

Hi community,

I understand that, at this point, nothing more can be done about this issue on
the PPC64 side.

The change is in shared code, although in effect it does not change anything in
the numa detection mechanism for other platforms, so a joint community effort is
needed to review it, along with a sponsor to run it through JPRT.

I know OpenJDK 9 is in its stabilization phase, but since this issue is of
great concern on PPC64 (especially on POWER8 machines), I would be very glad if
the community could point out how this change could move forward.

Thank you!

Best regards,
Gustavo

On 25-04-2017 23:49, Gustavo Romero wrote:
> Dear Volker,
> 
> On 24-04-2017 14:08, Volker Simonis wrote:
>> Hi Gustavo,
>>
>> thanks for addressing this problem and sorry for my late reply. I
>> think this is a good change which definitely improves the situation
>> for uncommon NUMA configurations without changing the handling for
>> common topologies.
> 
> Thanks a lot for reviewing the change!
> 
> 
>> It would be great if somebody could run this through JPRT, but as
>> Gustavo mentioned, I don't expect any regressions.
>>
>> @Igor: I think you've been the original author of the NUMA-aware
>> allocator port to Linux (i.e. "6684395: Port NUMA-aware allocator to
>> linux"). If you could find some spare minutes to take a look at this
>> change, your comment would be very much appreciated :)
>>
>> Following some minor comments from me:
>>
>> - in os::numa_get_groups_num() you now use numa_num_configured_nodes()
>> to get the actual number of configured nodes. This is good and
>> certainly an improvement over the previous implementation. However,
>> the man page for numa_num_configured_nodes() mentions that the
>> returned count may contain currently disabled nodes. Do we currently
>> handle disabled nodes? What will be the consequence if we would use
>> such a disabled node (e.g. mbind() warnings)?
> 
> In [1] 'numa_memnode_ptr' is set to keep a list of *just nodes with memory
> found in /sys/devices/system/node/*. Hence numa_num_configured_nodes() just
> returns the number of nodes in 'numa_memnode_ptr' [2], thus just returns the
> number of nodes with memory in the system. To the best of my knowledge there is
> no system configuration on Linux/PPC64 that could match such a notion of
> "disabled nodes" as it appears in libnuma's manual. If it is enabled, it's in
> that dir and just the ones with memory will be taken into account. If it's
> disabled (somehow), it's not in the dir, so won't be taken into account (i.e. no
> mbind() tried against it).
> 
> On Power it's possible to have a numa node without memory (memory-less node, a
> case covered in this change), a numa node without cpus at all but with memory
> (a configured node anyway, so a case already covered), but disabling a specific
> numa node so that it does not appear in /sys/devices/system/node/* is only possible
> from the internals of the control module, or under another rare condition not visible to /
> adjustable from the OS. Also, I'm not aware of a case where a node is in this
> dir but is at the same time flagged as something like "disabled". There are
> cpu/memory hotplugs, but that does not change numa nodes status AFAIK.
> 
> [1] https://github.com/numactl/numactl/blob/master/libnuma.c#L334-L347
> [2] https://github.com/numactl/numactl/blob/master/libnuma.c#L614-L618
> 
> 
>> - the same question applies to the usage of
>> Linux::isnode_in_configured_nodes() within os::numa_get_leaf_groups().
>> Does isnode_in_configured_nodes() (i.e. the node set defined by
>> 'numa_all_nodes_ptr') take into account the disabled nodes or not? Can
>> this be a potential problem (i.e. if we use a disabled node)?
> 
> On the meaning of "disabled nodes", it's the same case as above, so to the
> best of my knowledge it's not a potential problem.
> 
> Anyway 'numa_all_nodes_ptr' just includes the configured nodes (with memory),
> i.e. "all nodes on which the calling task may allocate memory". It's exactly
> the same pointer returned by numa_get_membind() v2 [3] which:
> 
> "returns the mask of nodes from which memory can currently be allocated"
> 
> and that is used, for example, in "numactl --show" to show nodes from where
> memory can be allocated [4, 5].
> 
> [3] https://github.com/numactl/numactl/blob/master/libnuma.c#L1147
> [4] https://github.com/numactl/numactl/blob/master/numactl.c#L144
> [5] https://github.com/numactl/numactl/blob/master/numactl.c#L177
> 
> 
>> - I'd like to suggest renaming the 'index' part of the following
>> variables and functions to 'nindex' ('node_index' is probably too long)
>> in the following code, to emphasize that we have node indexes pointing
>> to actual, not always consecutive node numbers:
>>
>> 2879         // Create an index -> node mapping, since nodes are not
>> always consecutive
>> 2880         _index_to_node = new (ResourceObj::C_HEAP, mtInternal)
>> GrowableArray<int>(0, true);
>> 2881         rebuild_index_to_node_map();
> 
> Simple change but much better to read indeed. Done.
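
The index-to-node mapping discussed here can be sketched in C++ as follows. This is not the HotSpot code; it uses plain arrays instead of GrowableArray, the names are hypothetical, and "configured" is simplified to "exists and has memory". The sample data is the four-node topology (0, 1, 16, 17) from the original report in this thread.

```cpp
// Hypothetical bounds and markers for this sketch only.
constexpr int  kMaxNodes   = 32;   // capacity of the map
constexpr long kNoSuchNode = -1;   // node number does not exist

struct NodeMap {
  int node[kMaxNodes] = {};  // nindex -> actual node number
  int length = 0;            // number of configured nodes
};

// size_mb[n] holds the memory of node n in MB, or kNoSuchNode.
// Only nodes that exist and have memory receive an nindex.
constexpr NodeMap build_nindex_to_node(const long* size_mb, int highest_node) {
  NodeMap map{};
  for (int n = 0; n <= highest_node && map.length < kMaxNodes; n++) {
    if (size_mb[n] > 0) {
      map.node[map.length++] = n;
    }
  }
  return map;
}

// Topology from the report: nodes 0 and 16 have memory, 1 and 17 have none,
// and node numbers 2..15 do not exist.
constexpr NodeMap example_map() {
  long size_mb[18] = {};
  for (int n = 0; n < 18; n++) size_mb[n] = kNoSuchNode;
  size_mb[0]  = 130706;
  size_mb[1]  = 0;
  size_mb[16] = 130630;
  size_mb[17] = 0;
  return build_nindex_to_node(size_mb, 17);
}

static_assert(example_map().length == 2, "two configured nodes");
```

In this sketch nindex 0 maps to node 0 and nindex 1 maps to node 16, which is the reason an index must not be used directly as a node number.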
> 
> 
>> - can you please wrap the following one-line else statement into curly
>> braces (it's more readable and we usually do it that way in HotSpot
>> although there are no formal style guidelines :)
>>
>> 2953      } else
>> 2954        // Current node is already a configured node.
>> 2955        closest_node = index_to_node()->at(i);
> 
> Done.
> 
> 
>> - in os::Linux::rebuild_cpu_to_node_map(), if you set
>> 'closest_distance' to INT_MAX at the beginning of the loop, you can
>> later avoid the check for '|| !closest_distance'. Also, according to
>> the man page, numa_distance() returns 0 if it can not determine the
>> distance. So with the above change, the condition on line 2974 should
>> read:
>>
>> 2947           if (distance && distance < closest_distance) {
>>
> 
> Sure, much better to set the initial condition as distant as possible and
> adjust to a closer one bit by bit improving the if condition. Done.
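
The simplified search can be sketched in C++ like this. It is an illustration under assumptions, not the code from the webrev: the function name is hypothetical, and the distances are passed in as an array instead of being queried through numa_distance(). A distance of 0 is treated as "unknown", as described above.

```cpp
#include <climits>

// Returns the configured node closest to a memory-less node, or -1 if no
// distance is known. distance[i] is the distance to configured[i]; 0 means
// the distance could not be determined.
constexpr int closest_configured_node(const int* configured,
                                      const int* distance, int count) {
  int closest_node = -1;
  int closest_distance = INT_MAX;  // start as far away as possible
  for (int i = 0; i < count; i++) {
    if (distance[i] != 0 && distance[i] < closest_distance) {
      closest_distance = distance[i];
      closest_node = configured[i];
    }
  }
  return closest_node;
}

// Distances taken from the four-node topology in the original report:
// node 1 is 20 away from node 0 and 40 away from node 16.
constexpr int closest_for_node_1() {
  const int configured[2] = {0, 16};
  const int distance[2]   = {20, 40};
  return closest_configured_node(configured, distance, 2);
}

// Node 17 is 40 away from node 0 and 20 away from node 16.
constexpr int closest_for_node_17() {
  const int configured[2] = {0, 16};
  const int distance[2]   = {40, 20};
  return closest_configured_node(configured, distance, 2);
}

static_assert(closest_for_node_1() == 0, "node 1 maps to node 0");
```

Starting from INT_MAX removes the need for a separate "no candidate yet" test inside the loop condition.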
> 
> 
>> Finally, and not directly related to your change, I'd suggest the
>> following clean-ups:
>>
>> - remove the usage of 'NCPUS = 32768' in
>> os::Linux::rebuild_cpu_to_node_map(). The comment on that line is
>> unclear to me and probably related to an older version/problem of
>> libnuma? I think we should simply use
>> numa_allocate_cpumask()/numa_free_cpumask() instead.
>>
>> - we still use the NUMA version 1 function prototypes (e.g.
>> "numa_node_to_cpus(int node, unsigned long *buffer, int buffer_len)"
>> instead of "numa_node_to_cpus(int node, struct bitmask *mask)", but
>> also "numa_interleave_memory()" and maybe others). I think we should
>> switch all prototypes to the new NUMA version 2 interface which you've
>> already used for the new functions which you've added.
> 
> I agree. Could I open a new bug to address these clean-ups?
> 
> 
>> That said, I think these changes all require libnuma 2.0 (see
>> os::Linux::libnuma_dlsym). So before starting this, you should make
>> sure that libnuma 2.0 is available on all platforms to which you'd
>> like to down-port this change. For jdk10 we could definitely do it,
>> for jdk9 probably also, for jdk8 I'm not so sure.
> 
> The last libnuma v1 release dates back to 2008, but any idea how I could check
> that for sure, since it's in shared code?
> 
> new webrev: http://cr.openjdk.java.net/~gromero/8175813/v3/
> 
> Thank you!
> 
> Best regards,
> Gustavo
> 
> 
>> Regards,
>> Volker
>>
>> On Thu, Apr 13, 2017 at 12:51 AM, Gustavo Romero
>> <gromero at linux.vnet.ibm.com> wrote:
>>> Hi,
>>>
>>> Any update on it?
>>>
>>> Thank you.
>>>
>>> Regards,
>>> Gustavo
>>>
>>> On 09-03-2017 16:33, Gustavo Romero wrote:
>>>> Hi,
>>>>
>>>> Could the following webrev be reviewed please?
>>>>
>>>> It improves the numa node detection when non-consecutive or memory-less nodes
>>>> exist in the system.
>>>>
>>>> webrev: http://cr.openjdk.java.net/~gromero/8175813/v2/
>>>> bug   : https://bugs.openjdk.java.net/browse/JDK-8175813
>>>>
>>>> Currently, although no problem exists when the JVM detects numa nodes that are
>>>> consecutive and have memory, for example in a numa topology like:
>>>>
>>>> available: 2 nodes (0-1)
>>>> node 0 cpus: 0 8 16 24 32
>>>> node 0 size: 65258 MB
>>>> node 0 free: 34 MB
>>>> node 1 cpus: 40 48 56 64 72
>>>> node 1 size: 65320 MB
>>>> node 1 free: 150 MB
>>>> node distances:
>>>> node   0   1
>>>>   0:  10  20
>>>>   1:  20  10,
>>>>
>>>> it fails on detecting numa nodes to be used in the Parallel GC in a numa
>>>> topology like:
>>>>
>>>> available: 4 nodes (0-1,16-17)
>>>> node 0 cpus: 0 8 16 24 32
>>>> node 0 size: 130706 MB
>>>> node 0 free: 7729 MB
>>>> node 1 cpus: 40 48 56 64 72
>>>> node 1 size: 0 MB
>>>> node 1 free: 0 MB
>>>> node 16 cpus: 80 88 96 104 112
>>>> node 16 size: 130630 MB
>>>> node 16 free: 5282 MB
>>>> node 17 cpus: 120 128 136 144 152
>>>> node 17 size: 0 MB
>>>> node 17 free: 0 MB
>>>> node distances:
>>>> node   0   1  16  17
>>>>   0:  10  20  40  40
>>>>   1:  20  10  40  40
>>>>  16:  40  40  10  20
>>>>  17:  40  40  20  10,
>>>>
>>>> where node 16 is not consecutive in relation to 1 and also nodes 1 and 17 have
>>>> no memory.
>>>>
>>>> If a topology like that exists, os::numa_make_local() will receive a local group
>>>> id as a hint that is not available in the system to be bound (it will receive
>>>> all nodes from 0 to 17), causing a proliferation of "mbind: Invalid argument"
>>>> messages:
>>>>
>>>> http://cr.openjdk.java.net/~gromero/logs/jdk10_pristine.log
>>>>
>>>> That change improves the detection by making the JVM numa API aware of the
>>>> existence of numa nodes that are non-consecutive from 0 to the highest node
>>>> number and also of nodes that might be memory-less nodes, i.e. that might not
>>>> be, in libnuma terms, a configured node. Hence just the configured nodes will
>>>> be available:
>>>>
>>>> http://cr.openjdk.java.net/~gromero/logs/jdk10_numa_patched.log
>>>>
>>>> The change has no effect on numa topologies where the problem does not occur,
>>>> i.e. no change in the number of nodes and no change in the cpu to node map. On
>>>> numa topologies where memory-less nodes exist (like in the last example above),
>>>> cpus from a memory-less node won't be able to bind locally so they are mapped
>>>> to the closest node; otherwise they would not be associated with any node and
>>>> MutableNUMASpace::cas_allocate() would pick a node randomly, compromising the
>>>> performance.
>>>>
>>>> I found no regressions on x64 for the following numa topology:
>>>>
>>>> available: 2 nodes (0-1)
>>>> node 0 cpus: 0 1 2 3 8 9 10 11
>>>> node 0 size: 24102 MB
>>>> node 0 free: 19806 MB
>>>> node 1 cpus: 4 5 6 7 12 13 14 15
>>>> node 1 size: 24190 MB
>>>> node 1 free: 21951 MB
>>>> node distances:
>>>> node   0   1
>>>>   0:  10  21
>>>>   1:  21  10
>>>>
>>>> I understand that fixing the current numa detection is a prerequisite for enabling
>>>> UseNUMA by default [1] and for extending the numa-aware allocation to the G1 GC [2].
>>>>
>>>> Thank you.
>>>>
>>>>
>>>> Best regards,
>>>> Gustavo
>>>>
>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8046153 (JEP 163: Enable NUMA Mode by Default When Appropriate)
>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8046147 (JEP 157: G1 GC: NUMA-Aware Allocation)
>>>>
>>>
>>
> 


From volker.simonis at gmail.com  Wed May  3 14:34:16 2017
From: volker.simonis at gmail.com (Volker Simonis)
Date: Wed, 3 May 2017 16:34:16 +0200
Subject: [10] RFR (S) 8175813: PPC64: "mbind: Invalid argument" when
 -XX:+UseNUMA is used
In-Reply-To: <5909DAAC.3070202@linux.vnet.ibm.com>
References: <58C1AE06.9060609@linux.vnet.ibm.com>
 <58EEAF7B.6020708@linux.vnet.ibm.com>
 <CA+3eh11Wy_JBOxnoX4g=x7tWuqbUny8HUVq4gYqXoo5fz_AAFw@mail.gmail.com>
 <59000AC0.7050507@linux.vnet.ibm.com> <5909DAAC.3070202@linux.vnet.ibm.com>
Message-ID: <CA+3eh12Qt9bK7U94i2-KH6abU_9puonF-PO0TDrOHOXidBvFVA@mail.gmail.com>

Hi,

I've reviewed Gustavo's change and I'm fine with the latest version at:

http://cr.openjdk.java.net/~gromero/8175813/v3/

Can somebody please sponsor the change?

Thank you and best regards,
Volker



From david.holmes at oracle.com  Thu May  4 01:50:27 2017
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 4 May 2017 11:50:27 +1000
Subject: [10] RFR (S) 8175813: PPC64: "mbind: Invalid argument" when
 -XX:+UseNUMA is used
In-Reply-To: <CA+3eh12Qt9bK7U94i2-KH6abU_9puonF-PO0TDrOHOXidBvFVA@mail.gmail.com>
References: <58C1AE06.9060609@linux.vnet.ibm.com>
 <58EEAF7B.6020708@linux.vnet.ibm.com>
 <CA+3eh11Wy_JBOxnoX4g=x7tWuqbUny8HUVq4gYqXoo5fz_AAFw@mail.gmail.com>
 <59000AC0.7050507@linux.vnet.ibm.com> <5909DAAC.3070202@linux.vnet.ibm.com>
 <CA+3eh12Qt9bK7U94i2-KH6abU_9puonF-PO0TDrOHOXidBvFVA@mail.gmail.com>
Message-ID: <0e89961f-e5da-cb85-e30d-33e424b69a0b@oracle.com>

Hi Volker, Gustavo,

I will try to take a look at this again, but it may be a day or two.

David

On 4/05/2017 12:34 AM, Volker Simonis wrote:
> Hi,
>
> I've reviewed Gustavo's change and I'm fine with the latest version at:
>
> http://cr.openjdk.java.net/~gromero/8175813/v3/
>
> Can somebody please sponsor the change?
>
> Thank you and best regards,
> Volker
>
>
> On Wed, May 3, 2017 at 3:27 PM, Gustavo Romero
> <gromero at linux.vnet.ibm.com> wrote:
>> Hi community,
>>
>> I understand that there is nothing that can be done additionally regarding this
>> issue, at this point, on the PPC64 side.
>>
>> It's a change in the shared code - though in effect it does not change anything in
>> the numa detection mechanism for other platforms - and hence a joint community
>> effort is necessary to review the change, along with a sponsor to run it through
>> JPRT.
>>
>> I know OpenJDK 9 is in a stabilization phase, but since this issue is of
>> great concern on PPC64 (especially on POWER8 machines) I would be very glad if
>> the community could point out how this change could move forward.
>>
>> Thank you!
>>
>> Best regards,
>> Gustavo
>>
>> On 25-04-2017 23:49, Gustavo Romero wrote:
>>> Dear Volker,
>>>
>>> On 24-04-2017 14:08, Volker Simonis wrote:
>>>> Hi Gustavo,
>>>>
>>>> thanks for addressing this problem and sorry for my late reply. I
>>>> think this is a good change which definitely improves the situation
>>>> for uncommon NUMA configurations without changing the handling for
>>>> common topologies.
>>>
>>> Thanks a lot for reviewing the change!
>>>
>>>
>>>> It would be great if somebody could run this through JPRT, but as
>>>> Gustavo mentioned, I don't expect any regressions.
>>>>
>>>> @Igor: I think you've been the original author of the NUMA-aware
>>>> allocator port to Linux (i.e. "6684395: Port NUMA-aware allocator to
>>>> linux"). If you could find some spare minutes to take a look at this
>>>> change, your comment would be very much appreciated :)
>>>>
>>>> Following some minor comments from me:
>>>>
>>>> - in os::numa_get_groups_num() you now use numa_num_configured_nodes()
>>>> to get the actual number of configured nodes. This is good and
>>>> certainly an improvement over the previous implementation. However,
>>>> the man page for numa_num_configured_nodes() mentions that the
>>>> returned count may contain currently disabled nodes. Do we currently
>>>> handle disabled nodes? What will be the consequence if we would use
>>>> such a disabled node (e.g. mbind() warnings)?
>>>
>>> In [1] 'numa_memnode_ptr' is set to keep a list of just the nodes with memory
>>> found in /sys/devices/system/node/*. Hence numa_num_configured_nodes() just
>>> returns the number of nodes in 'numa_memnode_ptr' [2], thus just returns the
>>> number of nodes with memory in the system. To the best of my knowledge there is
>>> no system configuration on Linux/PPC64 that could match such a notion of
>>> "disabled nodes" as it appears in libnuma's manual. If it is enabled, it's in
>>> that dir and just the ones with memory will be taken into account. If it's
>>> disabled (somehow), it's not in the dir, so won't be taken into account (i.e. no
>>> mbind() tried against it).
>>>
>>> On Power it's possible to have a numa node without memory (memory-less node, a
>>> case covered in this change), a numa node without cpus at all but with memory
>>> (a configured node anyway, so a case already covered), but disabling a specific
>>> numa node so that it does not appear in /sys/devices/system/node/* is only possible
>>> from the internals of the control module, or under some other rare condition not
>>> visible / adjustable from the OS. Also I'm not aware of a case where a node is in this
>>> dir but is at the same time flagged as something like "disabled". There are
>>> cpu/memory hotplugs, but that does not change numa nodes status AFAIK.
>>>
>>> [1] https://github.com/numactl/numactl/blob/master/libnuma.c#L334-L347
>>> [2] https://github.com/numactl/numactl/blob/master/libnuma.c#L614-L618
>>>
>>>
>>>> - the same question applies to the usage of
>>>> Linux::isnode_in_configured_nodes() within os::numa_get_leaf_groups().
>>>> Does isnode_in_configured_nodes() (i.e. the node set defined by
>>>> 'numa_all_nodes_ptr' take into account the disabled nodes or not? Can
>>>> this be a potential problem (i.e. if we use a disabled node).
>>>
>>> On the meaning of "disabled nodes", it's the same case as above, so to the
>>> best of my knowledge it's not a potential problem.
>>>
>>> Anyway 'numa_all_nodes_ptr' just includes the configured nodes (with memory),
>>> i.e. "all nodes on which the calling task may allocate memory". It's exactly
>>> the same pointer returned by numa_get_membind() v2 [3] which:
>>>
>>> "returns the mask of nodes from which memory can currently be allocated"
>>>
>>> and that is used, for example, in "numactl --show" to show nodes from where
>>> memory can be allocated [4, 5].
>>>
>>> [3] https://github.com/numactl/numactl/blob/master/libnuma.c#L1147
>>> [4] https://github.com/numactl/numactl/blob/master/numactl.c#L144
>>> [5] https://github.com/numactl/numactl/blob/master/numactl.c#L177
>>>
>>>
>>>> - I'd like to suggest renaming the 'index' part of the following
>>>> variables and functions to 'nindex' ('node_index' is probably too long)
>>>> in the following code, to emphasize that we have node indexes pointing
>>>> to actual, not always consecutive node numbers:
>>>>
>>>> 2879         // Create an index -> node mapping, since nodes are not
>>>> always consecutive
>>>> 2880         _index_to_node = new (ResourceObj::C_HEAP, mtInternal)
>>>> GrowableArray<int>(0, true);
>>>> 2881         rebuild_index_to_node_map();
>>>
>>> Simple change but much better to read indeed. Done.
>>>
>>>
>>>> - can you please wrap the following one-line else statement into curly
>>>> braces (it's more readable and we usually do it that way in HotSpot
>>>> although there are no formal style guidelines :)
>>>>
>>>> 2953      } else
>>>> 2954        // Current node is already a configured node.
>>>> 2955        closest_node = index_to_node()->at(i);
>>>
>>> Done.
>>>
>>>
>>>> - in os::Linux::rebuild_cpu_to_node_map(), if you set
>>>> 'closest_distance' to INT_MAX at the beginning of the loop, you can
>>>> later avoid the check for '|| !closest_distance'. Also, according to
>>>> the man page, numa_distance() returns 0 if it can not determine the
>>>> distance. So with the above change, the condition on line 2974 should
>>>> read:
>>>>
>>>> 2947           if (distance && distance < closest_distance) {
>>>>
>>>
>>> Sure, much better to set the initial condition as distant as possible and
>>> adjust to a closer one bit by bit improving the if condition. Done.
>>>
>>>
>>>> Finally, and not directly related to your change, I'd suggest the
>>>> following clean-ups:
>>>>
>>>> - remove the usage of 'NCPUS = 32768' in
>>>> os::Linux::rebuild_cpu_to_node_map(). The comment on that line is
>>>> unclear to me and probably related to an older version/problem of
>>>> libnuma? I think we should simply use
>>>> numa_allocate_cpumask()/numa_free_cpumask() instead.
>>>>
>>>> - we still use the NUMA version 1 function prototypes (e.g.
>>>> "numa_node_to_cpus(int node, unsigned long *buffer, int buffer_len)"
>>>> instead of "numa_node_to_cpus(int node, struct bitmask *mask)", but
>>>> also "numa_interleave_memory()" and maybe others). I think we should
>>>> switch all prototypes to the new NUMA version 2 interface which you've
>>>> already used for the new functions which you've added.
>>>
>>> I agree. Could I open a new bug to address these clean-ups?
>>>
>>>
>>>> That said, I think these changes all require libnuma 2.0 (see
>>>> os::Linux::libnuma_dlsym). So before starting this, you should make
>>>> sure that libnuma 2.0 is available on all platforms to which you'd
>>>> like to down-port this change. For jdk10 we could definitely do it,
>>>> for jdk9 probably also, for jdk8 I'm not so sure.
>>>
>>> The last libnuma v1 release dates back to 2008, but any idea how I could check that
>>> for sure, since it's in shared code?
>>>
>>> new webrev: http://cr.openjdk.java.net/~gromero/8175813/v3/
>>>
>>> Thank you!
>>>
>>> Best regards,
>>> Gustavo
>>>
>>>
>>>> Regards,
>>>> Volker
>>>>
>>>> On Thu, Apr 13, 2017 at 12:51 AM, Gustavo Romero
>>>> <gromero at linux.vnet.ibm.com> wrote:
>>>>> Hi,
>>>>>
>>>>> Any update on it?
>>>>>
>>>>> Thank you.
>>>>>
>>>>> Regards,
>>>>> Gustavo
>>>>>
>>>>> On 09-03-2017 16:33, Gustavo Romero wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Could the following webrev be reviewed please?
>>>>>>
>>>>>> It improves the numa node detection when non-consecutive or memory-less nodes
>>>>>> exist in the system.
>>>>>>
>>>>>> webrev: http://cr.openjdk.java.net/~gromero/8175813/v2/
>>>>>> bug   : https://bugs.openjdk.java.net/browse/JDK-8175813
>>>>>>
>>>>>> Currently, although no problem exists when the JVM detects numa nodes that are
>>>>>> consecutive and have memory, for example in a numa topology like:
>>>>>>
>>>>>> available: 2 nodes (0-1)
>>>>>> node 0 cpus: 0 8 16 24 32
>>>>>> node 0 size: 65258 MB
>>>>>> node 0 free: 34 MB
>>>>>> node 1 cpus: 40 48 56 64 72
>>>>>> node 1 size: 65320 MB
>>>>>> node 1 free: 150 MB
>>>>>> node distances:
>>>>>> node   0   1
>>>>>>   0:  10  20
>>>>>>   1:  20  10,
>>>>>>
>>>>>> it fails on detecting numa nodes to be used in the Parallel GC in a numa
>>>>>> topology like:
>>>>>>
>>>>>> available: 4 nodes (0-1,16-17)
>>>>>> node 0 cpus: 0 8 16 24 32
>>>>>> node 0 size: 130706 MB
>>>>>> node 0 free: 7729 MB
>>>>>> node 1 cpus: 40 48 56 64 72
>>>>>> node 1 size: 0 MB
>>>>>> node 1 free: 0 MB
>>>>>> node 16 cpus: 80 88 96 104 112
>>>>>> node 16 size: 130630 MB
>>>>>> node 16 free: 5282 MB
>>>>>> node 17 cpus: 120 128 136 144 152
>>>>>> node 17 size: 0 MB
>>>>>> node 17 free: 0 MB
>>>>>> node distances:
>>>>>> node   0   1  16  17
>>>>>>   0:  10  20  40  40
>>>>>>   1:  20  10  40  40
>>>>>>  16:  40  40  10  20
>>>>>>  17:  40  40  20  10,
>>>>>>
>>>>>> where node 16 is not consecutive in relation to 1 and also nodes 1 and 17 have
>>>>>> no memory.
>>>>>>
>>>>>> If a topology like that exists, os::numa_make_local() will receive a local group
>>>>>> id as a hint that is not available in the system to be bound (it will receive
>>>>>> all nodes from 0 to 17), causing a proliferation of "mbind: Invalid argument"
>>>>>> messages:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~gromero/logs/jdk10_pristine.log
>>>>>>
>>>>>> That change improves the detection by making the JVM numa API aware of the
>>>>>> existence of numa nodes that are non-consecutive from 0 to the highest node
>>>>>> number and also of nodes that might be memory-less nodes, i.e. that might not
>>>>>> be, in libnuma terms, a configured node. Hence just the configured nodes will
>>>>>> be available:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~gromero/logs/jdk10_numa_patched.log
>>>>>>
>>>>>> The change has no effect on numa topologies where the problem does not occur,
>>>>>> i.e. no change in the number of nodes and no change in the cpu to node map. On
>>>>>> numa topologies where memory-less nodes exist (like in the last example above),
>>>>>> cpus from a memory-less node won't be able to bind locally so they are mapped
>>>>>> to the closest node, otherwise they would not be associated with any node and
>>>>>> MutableNUMASpace::cas_allocate() would pick a node randomly, compromising the
>>>>>> performance.
>>>>>>
>>>>>> I found no regressions on x64 for the following numa topology:
>>>>>>
>>>>>> available: 2 nodes (0-1)
>>>>>> node 0 cpus: 0 1 2 3 8 9 10 11
>>>>>> node 0 size: 24102 MB
>>>>>> node 0 free: 19806 MB
>>>>>> node 1 cpus: 4 5 6 7 12 13 14 15
>>>>>> node 1 size: 24190 MB
>>>>>> node 1 free: 21951 MB
>>>>>> node distances:
>>>>>> node   0   1
>>>>>>   0:  10  21
>>>>>>   1:  21  10
>>>>>>
>>>>>> I understand that fixing the current numa detection is a prerequisite to enable
>>>>>> UseNUMA by default [1] and to extend the numa-aware allocation to the G1 GC [2].
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>>
>>>>>> Best regards,
>>>>>> Gustavo
>>>>>>
>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8046153 (JEP 163: Enable NUMA Mode by Default When Appropriate)
>>>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8046147 (JEP 157: G1 GC: NUMA-Aware Allocation)
>>>>>>
>>>>>
>>>>
>>>
>>

From david.holmes at oracle.com  Fri May  5 00:32:17 2017
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 5 May 2017 10:32:17 +1000
Subject: [10] RFR (S) 8175813: PPC64: "mbind: Invalid argument" when
 -XX:+UseNUMA is used
In-Reply-To: <CA+3eh12Qt9bK7U94i2-KH6abU_9puonF-PO0TDrOHOXidBvFVA@mail.gmail.com>
References: <58C1AE06.9060609@linux.vnet.ibm.com>
 <58EEAF7B.6020708@linux.vnet.ibm.com>
 <CA+3eh11Wy_JBOxnoX4g=x7tWuqbUny8HUVq4gYqXoo5fz_AAFw@mail.gmail.com>
 <59000AC0.7050507@linux.vnet.ibm.com> <5909DAAC.3070202@linux.vnet.ibm.com>
 <CA+3eh12Qt9bK7U94i2-KH6abU_9puonF-PO0TDrOHOXidBvFVA@mail.gmail.com>
Message-ID: <b7fe8dec-e9cc-913e-52a0-690ec664ba88@oracle.com>

Hi Volker, Gustavo,

On 4/05/2017 12:34 AM, Volker Simonis wrote:
> Hi,
>
> I've reviewed Gustavo's change and I'm fine with the latest version at:
>
> http://cr.openjdk.java.net/~gromero/8175813/v3/

Nothing has really changed for me since I first looked at this - I don't 
know NUMA and I can't comment on any of the details. But no-one else has 
commented negatively so they are implicitly okay with this, or else they 
should have spoken up. So with Volker as the Reviewer and myself as a 
second reviewer, I will sponsor this. I'll run the current patch through 
JPRT while awaiting the final version.

One thing I was unclear on with all this numa code is the expectation 
regarding all those dynamically looked up functions - is it expected 
that we will have them all or else have none? It wasn't at all obvious 
what would happen if we don't have those functions but still executed 
this code - assuming that is even possible. I guess I would have 
expected that no numa code would execute unless -XX:+UseNUMA was set, in 
which case the VM would abort if any of the libnuma functions could not 
be found. That way we wouldn't need the null checks for the function 
pointers.

Style nits:
- we should avoid implicit booleans, so the isnode_in_* functions should 
return bool not int; and check "distance != 0"
- spaces around operators, e.g. node=0 should be node = 0
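
To make the two nits concrete, here is a minimal sketch of the intended style. It
is not the code from the webrev: NodeSet, is_node_in_set(),
closest_configured_node() and example_distance() are hypothetical names used only
for illustration, and the distance table is the 4-node example topology quoted
earlier in this thread, with 0 standing for "distance unknown".

```cpp
#include <cassert>
#include <climits>
#include <vector>

// Hypothetical stand-in for the set of configured (memory-backed) nodes.
struct NodeSet {
  std::vector<int> nodes;
};

// Returns bool rather than int, so callers do not depend on an implicit
// int-to-bool conversion.
bool is_node_in_set(const NodeSet& set, int node) {
  for (int n : set.nodes) {
    if (n == node) {
      return true;
    }
  }
  return false;
}

// Hypothetical distance lookup; a result of 0 means "distance unknown".
typedef int (*distance_fn_t)(int from, int to);

// Returns 'from' itself if it is configured, otherwise the configured node
// with the smallest known distance, or -1 if no distance is known.
int closest_configured_node(const NodeSet& configured, int from,
                            distance_fn_t distance_fn) {
  assert(distance_fn != nullptr);
  if (is_node_in_set(configured, from)) {
    return from;
  }
  int closest_node = -1;
  int closest_distance = INT_MAX;  // start as far away as possible
  for (int candidate : configured.nodes) {
    int distance = distance_fn(from, candidate);
    // Explicit comparison against 0 instead of relying on "if (distance)".
    if (distance != 0 && distance < closest_distance) {
      closest_distance = distance;
      closest_node = candidate;
    }
  }
  return closest_node;
}

// Distance table of the example topology with nodes 0, 1, 16 and 17:
// 10 to itself, 20 within a pair (0-1 or 16-17), 40 across pairs.
int example_distance(int from, int to) {
  auto known = [](int n) { return n == 0 || n == 1 || n == 16 || n == 17; };
  if (!known(from) || !known(to)) {
    return 0;  // unknown node: distance cannot be determined
  }
  if (from == to) {
    return 10;
  }
  return (from / 16) == (to / 16) ? 20 : 40;
}
```

With nodes 0 and 16 as the only configured nodes, the memory-less node 1 maps to
node 0 and the memory-less node 17 maps to node 16, since those are the closest
nodes that have memory.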

Thanks,
David


From gromero at linux.vnet.ibm.com  Fri May  5 19:43:35 2017
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Fri, 5 May 2017 16:43:35 -0300
Subject: [10] RFR (S) 8175813: PPC64: "mbind: Invalid argument" when
 -XX:+UseNUMA is used
In-Reply-To: <b7fe8dec-e9cc-913e-52a0-690ec664ba88@oracle.com>
References: <58C1AE06.9060609@linux.vnet.ibm.com>
 <58EEAF7B.6020708@linux.vnet.ibm.com>
 <CA+3eh11Wy_JBOxnoX4g=x7tWuqbUny8HUVq4gYqXoo5fz_AAFw@mail.gmail.com>
 <59000AC0.7050507@linux.vnet.ibm.com> <5909DAAC.3070202@linux.vnet.ibm.com>
 <CA+3eh12Qt9bK7U94i2-KH6abU_9puonF-PO0TDrOHOXidBvFVA@mail.gmail.com>
 <b7fe8dec-e9cc-913e-52a0-690ec664ba88@oracle.com>
Message-ID: <590CD5E7.10809@linux.vnet.ibm.com>

Hi David,

On 04-05-2017 21:32, David Holmes wrote:
> Hi Volker, Gustavo,
> 
> On 4/05/2017 12:34 AM, Volker Simonis wrote:
>> Hi,
>>
>> I've reviewed Gustavo's change and I'm fine with the latest version at:
>>
>> http://cr.openjdk.java.net/~gromero/8175813/v3/
> 
> Nothing has really changed for me since I first looked at this - I don't know NUMA and I can't comment on any of the details. But no-one else has commented negatively so they are implicitly okay with
> this, or else they should have spoken up. So with Volker as the Reviewer and myself as a second reviewer, I will sponsor this. I'll run the current patch through JPRT while awaiting the final version.

Thanks a lot for reviewing and sponsoring the change.


> One thing I was unclear on with all this numa code is the expectation regarding all those dynamically looked up functions - is it expected that we will have them all or else have none? It wasn't at
> all obvious what would happen if we don't have those functions but still executed this code - assuming that is even possible. I guess I would have expected that no numa code would execute unless
> -XX:+UseNUMA was set, in which case the VM would abort if any of the libnuma functions could not be found. That way we wouldn't need the null checks for the function pointers.

If libnuma is not available on the system, os::Linux::libnuma_init() will return
false and the JVM will refuse to enable the UseNUMA features instead of aborting:

  if (UseNUMA) {
    if (!Linux::libnuma_init()) {
      UseNUMA = false;
    } else {

I understand those null checks as part of the initial design of the JVM numa api,
meant to guard against the use of its methods in other parts of the code when the
api failed to initialize properly, even though UseNUMA = false is expected to be
enough to prevent such uses.
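
As a rough illustration of the two approaches discussed here (null checks in the
wrappers versus failing hard when a function is missing), consider the sketch
below. It is only a model under stated assumptions: NumaApi, numa_api_init(),
decide_use_numa() and the two wrappers are hypothetical names, not HotSpot code,
and the function pointers stand in for entries that would normally be resolved
dynamically from libnuma.

```cpp
#include <cassert>

// Hypothetical table of dynamically resolved entry points. A null pointer
// means the symbol lookup failed.
typedef int (*int_fn_t)(void);

struct NumaApi {
  int_fn_t max_node = nullptr;
  int_fn_t num_configured_nodes = nullptr;
};

// Initialization succeeds only if every required entry point was resolved.
bool numa_api_init(const NumaApi& api) {
  return api.max_node != nullptr && api.num_configured_nodes != nullptr;
}

// The flag is cleared, rather than the VM aborting, when initialization fails.
bool decide_use_numa(bool requested, const NumaApi& api) {
  return requested && numa_api_init(api);
}

// Defensive wrapper: the null check makes the call safe even if some caller
// ignores the flag; 'fallback' is returned when the function is missing.
int numa_num_configured_nodes_or(const NumaApi& api, int fallback) {
  return api.num_configured_nodes != nullptr ? api.num_configured_nodes()
                                             : fallback;
}

// Strict wrapper: treats a missing function as a programming error, which is
// the alternative where no null check is needed at the call sites.
int numa_max_node_strict(const NumaApi& api) {
  assert(api.max_node != nullptr);
  return api.max_node();
}
```

The defensive wrapper costs one comparison per call but stays safe if the api is
used while the flag is off; the strict wrapper is simpler but relies on every
caller honoring the flag.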

That said, I could not find any recent Linux distribution that does not support
the libnuma v2 api (and so also the v1 api). On Ubuntu, libnuma is installed as a
dependency of the metapackage ubuntu-standard: that metapackage requires
"irqbalance", which in turn requires libnuma. Libnuma was updated from v1 to v2
around mid 2008:

numactl (2.0.1-1) unstable; urgency=low

  * New upstream
  * patches/static-lib.patch: update
  * debian/watch: update to new SGI location

 -- Ian Wienand <ianw at debian.org>  Sat, 07 Jun 2008 14:18:22 -0700

numactl (1.0.2-1) unstable; urgency=low

  * New upstream
  * Closes: #442690 -- Add to rules a hack to remove libnuma.a after
    unpatching
  * Update README.debian


 -- Ian Wienand <ianw at debian.org>  Wed, 03 Oct 2007 21:49:27 +1000


It's similar on RHEL, where "irqbalance" is in the core group. Regarding
the libnuma version, it was also updated to v2 in 2008, so Fedora 11
contains v2 and hence so do RHEL 6 and RHEL 7:

* Wed Feb 25 2009 Fedora Release Engineering <rel-eng at lists.fedoraproject.org> - 2.0.2-3
- Rebuilt for https://fedoraproject.org/wiki/Fedora_11_Mass_Rebuild

* Mon Sep 29 2008 Neil Horman <nhorman at redhat.com> - 2.0.2-2
- Fix build break due to register selection in asm

* Mon Sep 29 2008 Neil Horman <nhorman at redhat.com> - 2.0.2-1
- Update rawhide to version 2.0.2 of numactl

* Fri Apr 25 2008 Neil Horman <nhorman at redhat.com> - 1.0.2-6
- Fix buffer size passing and arg sanity check for physcpubind (bz 442521)


Also, the last release of libnuma v1 dates back to 2008:
https://github.com/numactl/numactl/releases/tag/v1.0.2

So it looks like the absence of libnuma v2 on Linux is by now uncommon.


> Style nits:
> - we should avoid implicit booleans, so the isnode_in_* functions should return bool not int; and check "distance != 0"
> - spaces around operators eg. node=0 should be node = 0

new webrev: http://cr.openjdk.java.net/~gromero/8175813/v4/


Thank you and best regards,
Gustavo

> Thanks,
> David
> 
>> Can somebody please sponsor the change?
>>
>> Thank you and best regards,
>> Volker
>>
>>
>> On Wed, May 3, 2017 at 3:27 PM, Gustavo Romero
>> <gromero at linux.vnet.ibm.com> wrote:
>>> Hi community,
>>>
>>> I understand that there is nothing that can be done additionally regarding this
>>> issue, at this point, on the PPC64 side.
>>>
>>> It's a change in the shared code - though in effect it does not change anything in
>>> the numa detection mechanism for other platforms - and hence a joint community
>>> effort is necessary to review the change, along with a sponsor to run it through
>>> JPRT.
>>>
>>> I know OpenJDK 9 is in a stabilization phase, but since this issue is of
>>> great concern on PPC64 (especially on POWER8 machines) I would be very glad if
>>> the community could point out how this change could move forward.
>>>
>>> Thank you!
>>>
>>> Best regards,
>>> Gustavo
>>>
>>> On 25-04-2017 23:49, Gustavo Romero wrote:
>>>> Dear Volker,
>>>>
>>>> On 24-04-2017 14:08, Volker Simonis wrote:
>>>>> Hi Gustavo,
>>>>>
>>>>> thanks for addressing this problem and sorry for my late reply. I
>>>>> think this is a good change which definitely improves the situation
>>>>> for uncommon NUMA configurations without changing the handling for
>>>>> common topologies.
>>>>
>>>> Thanks a lot for reviewing the change!
>>>>
>>>>
>>>>> It would be great if somebody could run this through JPRT, but as
>>>>> Gustavo mentioned, I don't expect any regressions.
>>>>>
>>>>> @Igor: I think you've been the original author of the NUMA-aware
>>>>> allocator port to Linux (i.e. "6684395: Port NUMA-aware allocator to
>>>>> linux"). If you could find some spare minutes to take a look at this
>>>>> change, your comment would be very much appreciated :)
>>>>>
>>>>> Following some minor comments from me:
>>>>>
>>>>> - in os::numa_get_groups_num() you now use numa_num_configured_nodes()
>>>>> to get the actual number of configured nodes. This is good and
>>>>> certainly an improvement over the previous implementation. However,
>>>>> the man page for numa_num_configured_nodes() mentions that the
>>>>> returned count may contain currently disabled nodes. Do we currently
>>>>> handle disabled nodes? What will be the consequence if we would use
>>>>> such a disabled node (e.g. mbind() warnings)?
>>>>
>>>> In [1] 'numa_memnode_ptr' is set to keep a list of just the nodes with memory
>>>> found in /sys/devices/system/node/*. Hence numa_num_configured_nodes() just
>>>> returns the number of nodes in 'numa_memnode_ptr' [2], i.e. the number of
>>>> nodes with memory in the system. To the best of my knowledge there is
>>>> no system configuration on Linux/PPC64 that could match such a notion of
>>>> "disabled nodes" as it appears in libnuma's manual. If it is enabled, it's in
>>>> that dir and just the ones with memory will be taken into account. If it's
>>>> disabled (somehow), it's not in the dir, so won't be taken into account (i.e. no
>>>> mbind() tried against it).
>>>>
>>>> On Power it's possible to have a numa node without memory (memory-less node, a
>>>> case covered in this change), a numa node without cpus at all but with memory
>>>> (a configured node anyway, so a case already covered). But disabling a specific
>>>> numa node so that it does not appear in /sys/devices/system/node/* is only possible
>>>> from the internals of the control module, or under some other rare condition not
>>>> visible / adjustable from the OS. Also I'm not aware of a case where a node is in this
>>>> dir but is at the same time flagged as something like "disabled". There are
>>>> cpu/memory hotplugs, but that does not change numa nodes status AFAIK.
>>>>
>>>> [1] https://github.com/numactl/numactl/blob/master/libnuma.c#L334-L347
>>>> [2] https://github.com/numactl/numactl/blob/master/libnuma.c#L614-L618
>>>>
>>>>
>>>>> - the same question applies to the usage of
>>>>> Linux::isnode_in_configured_nodes() within os::numa_get_leaf_groups().
>>>>> Does isnode_in_configured_nodes() (i.e. the node set defined by
>>>>> 'numa_all_nodes_ptr') take into account the disabled nodes or not? Can
>>>>> this be a potential problem (i.e. if we use a disabled node)?
>>>>
>>>> On the meaning of "disabled nodes", it's the same case as above, so to the
>>>> best of my knowledge it's not a potential problem.
>>>>
>>>> Anyway 'numa_all_nodes_ptr' just includes the configured nodes (with memory),
>>>> i.e. "all nodes on which the calling task may allocate memory". It's exactly
>>>> the same pointer returned by numa_get_membind() v2 [3] which:
>>>>
>>>> "returns the mask of nodes from which memory can currently be allocated"
>>>>
>>>> and that is used, for example, in "numactl --show" to show nodes from where
>>>> memory can be allocated [4, 5].
>>>>
>>>> [3] https://github.com/numactl/numactl/blob/master/libnuma.c#L1147
>>>> [4] https://github.com/numactl/numactl/blob/master/numactl.c#L144
>>>> [5] https://github.com/numactl/numactl/blob/master/numactl.c#L177
>>>>
>>>>
>>>>> - I'd like to suggest renaming the 'index' part of the following
>>>>> variables and functions to 'nindex' ('node_index' is probably too long)
>>>>> in the following code, to emphasize that we have node indexes pointing
>>>>> to actual, not always consecutive node numbers:
>>>>>
>>>>> 2879         // Create an index -> node mapping, since nodes are not always consecutive
>>>>> 2880         _index_to_node = new (ResourceObj::C_HEAP, mtInternal) GrowableArray<int>(0, true);
>>>>> 2881         rebuild_index_to_node_map();
>>>>
>>>> Simple change but much better to read indeed. Done.
>>>>
>>>>
>>>>> - can you please wrap the following one-line else statement into curly
>>>>> braces (it's more readable and we usually do it that way in HotSpot
>>>>> although there are no formal style guidelines :)
>>>>>
>>>>> 2953      } else
>>>>> 2954        // Current node is already a configured node.
>>>>> 2955        closest_node = index_to_node()->at(i);
>>>>
>>>> Done.
>>>>
>>>>
>>>>> - in os::Linux::rebuild_cpu_to_node_map(), if you set
>>>>> 'closest_distance' to INT_MAX at the beginning of the loop, you can
>>>>> later avoid the check for '|| !closest_distance'. Also, according to
>>>>> the man page, numa_distance() returns 0 if it can not determine the
>>>>> distance. So with the above change, the condition on line 2974 should
>>>>> read:
>>>>>
>>>>> 2947           if (distance && distance < closest_distance) {
>>>>>
>>>>
>>>> Sure, much better to set the initial condition as distant as possible and
>>>> adjust to a closer one bit by bit improving the if condition. Done.
>>>>
>>>>
>>>>> Finally, and not directly related to your change, I'd suggest the
>>>>> following clean-ups:
>>>>>
>>>>> - remove the usage of 'NCPUS = 32768' in
>>>>> os::Linux::rebuild_cpu_to_node_map(). The comment on that line is
>>>>> unclear to me and probably related to an older version/problem of
>>>>> libnuma? I think we should simply use
>>>>> numa_allocate_cpumask()/numa_free_cpumask() instead.
>>>>>
>>>>> - we still use the NUMA version 1 function prototypes (e.g.
>>>>> "numa_node_to_cpus(int node, unsigned long *buffer, int buffer_len)"
>>>>> instead of "numa_node_to_cpus(int node, struct bitmask *mask)", but
>>>>> also "numa_interleave_memory()" and maybe others). I think we should
>>>>> switch all prototypes to the new NUMA version 2 interface which you've
>>>>> already used for the new functions which you've added.
>>>>
>>>> I agree. Could I open a new bug to address these clean-ups?
>>>>
>>>>
>>>>> That said, I think these changes all require libnuma 2.0 (see
>>>>> os::Linux::libnuma_dlsym). So before starting this, you should make
>>>>> sure that libnuma 2.0 is available on all platforms to which you'd
>>>>> like to down-port this change. For jdk10 we could definitely do it,
>>>>> for jdk9 probably also, for jdk8 I'm not so sure.
>>>>
>>>> The last libnuma v1 release dates back to 2008, but any idea how I could check that
>>>> for sure, since it's in shared code?
>>>>
>>>> new webrev: http://cr.openjdk.java.net/~gromero/8175813/v3/
>>>>
>>>> Thank you!
>>>>
>>>> Best regards,
>>>> Gustavo
>>>>
>>>>
>>>>> Regards,
>>>>> Volker
>>>>>
>>>>> On Thu, Apr 13, 2017 at 12:51 AM, Gustavo Romero
>>>>> <gromero at linux.vnet.ibm.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Any update on it?
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>> Regards,
>>>>>> Gustavo
>>>>>>
>>>>>> On 09-03-2017 16:33, Gustavo Romero wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Could the following webrev be reviewed please?
>>>>>>>
>>>>>>> It improves the numa node detection when non-consecutive or memory-less nodes
>>>>>>> exist in the system.
>>>>>>>
>>>>>>> webrev: http://cr.openjdk.java.net/~gromero/8175813/v2/
>>>>>>> bug   : https://bugs.openjdk.java.net/browse/JDK-8175813
>>>>>>>
>>>>>>> Currently, although no problem exists when the JVM detects numa nodes that are
>>>>>>> consecutive and have memory, for example in a numa topology like:
>>>>>>>
>>>>>>> available: 2 nodes (0-1)
>>>>>>> node 0 cpus: 0 8 16 24 32
>>>>>>> node 0 size: 65258 MB
>>>>>>> node 0 free: 34 MB
>>>>>>> node 1 cpus: 40 48 56 64 72
>>>>>>> node 1 size: 65320 MB
>>>>>>> node 1 free: 150 MB
>>>>>>> node distances:
>>>>>>> node   0   1
>>>>>>>   0:  10  20
>>>>>>>   1:  20  10,
>>>>>>>
>>>>>>> it fails to detect the numa nodes to be used in the Parallel GC in a numa
>>>>>>> topology like:
>>>>>>>
>>>>>>> available: 4 nodes (0-1,16-17)
>>>>>>> node 0 cpus: 0 8 16 24 32
>>>>>>> node 0 size: 130706 MB
>>>>>>> node 0 free: 7729 MB
>>>>>>> node 1 cpus: 40 48 56 64 72
>>>>>>> node 1 size: 0 MB
>>>>>>> node 1 free: 0 MB
>>>>>>> node 16 cpus: 80 88 96 104 112
>>>>>>> node 16 size: 130630 MB
>>>>>>> node 16 free: 5282 MB
>>>>>>> node 17 cpus: 120 128 136 144 152
>>>>>>> node 17 size: 0 MB
>>>>>>> node 17 free: 0 MB
>>>>>>> node distances:
>>>>>>> node   0   1  16  17
>>>>>>>   0:  10  20  40  40
>>>>>>>   1:  20  10  40  40
>>>>>>>  16:  40  40  10  20
>>>>>>>  17:  40  40  20  10,
>>>>>>>
>>>>>>> where node 16 is not consecutive in relation to 1 and also nodes 1 and 17 have
>>>>>>> no memory.
>>>>>>>
>>>>>>> If a topology like that exists, os::numa_make_local() will receive a local group
>>>>>>> id as a hint that is not available in the system to be bound (it will receive
>>>>>>> all nodes from 0 to 17), causing a proliferation of "mbind: Invalid argument"
>>>>>>> messages:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~gromero/logs/jdk10_pristine.log
>>>>>>>
>>>>>>> That change improves the detection by making the JVM numa API aware of the
>>>>>>> existence of numa nodes that are non-consecutive from 0 to the highest node
>>>>>>> number and also of nodes that might be memory-less nodes, i.e. that might not
>>>>>>> be, in libnuma terms, a configured node. Hence just the configured nodes will
>>>>>>> be available:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~gromero/logs/jdk10_numa_patched.log
>>>>>>>
>>>>>>> The change has no effect on numa topologies where the problem does not occur,
>>>>>>> i.e. no change in the number of nodes and no change in the cpu to node map. On
>>>>>>> numa topologies where memory-less nodes exist (like in the last example above),
>>>>>>> cpus from a memory-less node won't be able to bind locally so they are mapped
>>>>>>> to the closest node; otherwise they would not be associated with any node and
>>>>>>> MutableNUMASpace::cas_allocate() would pick a node randomly, compromising the
>>>>>>> performance.
>>>>>>>
>>>>>>> I found no regressions on x64 for the following numa topology:
>>>>>>>
>>>>>>> available: 2 nodes (0-1)
>>>>>>> node 0 cpus: 0 1 2 3 8 9 10 11
>>>>>>> node 0 size: 24102 MB
>>>>>>> node 0 free: 19806 MB
>>>>>>> node 1 cpus: 4 5 6 7 12 13 14 15
>>>>>>> node 1 size: 24190 MB
>>>>>>> node 1 free: 21951 MB
>>>>>>> node distances:
>>>>>>> node   0   1
>>>>>>>   0:  10  21
>>>>>>>   1:  21  10
>>>>>>>
>>>>>>> I understand that fixing the current numa detection is a prerequisite for enabling
>>>>>>> UseNUMA by default [1] and for extending the numa-aware allocation to the G1 GC [2].
>>>>>>>
>>>>>>> Thank you.
>>>>>>>
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Gustavo
>>>>>>>
>>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8046153 (JEP 163: Enable NUMA Mode by Default When Appropriate)
>>>>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8046147 (JEP 157: G1 GC: NUMA-Aware Allocation)
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
> 
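
The index -> node mapping discussed in the quoted review above (renamed to 'nindex' there) can be sketched in isolation as follows. This is only an illustration, not the HotSpot implementation: build_nindex_to_node() is a hypothetical helper, a std::vector stands in for GrowableArray<int>, and the choice to keep only nodes with memory is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds a dense "node index" -> "node id" map from a sparse list of node ids.
// Only nodes flagged in 'keep' are entered (assumption: nodes with memory).
std::vector<int> build_nindex_to_node(const std::vector<int>& node_ids,
                                      const std::vector<bool>& keep) {
  assert(node_ids.size() == keep.size());
  std::vector<int> nindex_to_node;
  for (std::size_t i = 0; i < node_ids.size(); i++) {
    if (keep[i]) {
      nindex_to_node.push_back(node_ids[i]);
    }
  }
  return nindex_to_node;
}
```

For the four-node example in the original report (nodes 0, 1, 16, 17, where 1 and 17 have no memory) this gives a two-entry map: index 0 refers to node 0 and index 1 refers to node 16. Callers then iterate over indexes 0..1 instead of assuming node ids 0..17 all exist.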


From volker.simonis at gmail.com  Sat May  6 06:59:15 2017
From: volker.simonis at gmail.com (Volker Simonis)
Date: Sat, 6 May 2017 08:59:15 +0200
Subject: [10] RFR (S) 8175813: PPC64: "mbind: Invalid argument" when
 -XX:+UseNUMA is used
In-Reply-To: <590CD5E7.10809@linux.vnet.ibm.com>
References: <58C1AE06.9060609@linux.vnet.ibm.com>
 <58EEAF7B.6020708@linux.vnet.ibm.com>
 <CA+3eh11Wy_JBOxnoX4g=x7tWuqbUny8HUVq4gYqXoo5fz_AAFw@mail.gmail.com>
 <59000AC0.7050507@linux.vnet.ibm.com> <5909DAAC.3070202@linux.vnet.ibm.com>
 <CA+3eh12Qt9bK7U94i2-KH6abU_9puonF-PO0TDrOHOXidBvFVA@mail.gmail.com>
 <b7fe8dec-e9cc-913e-52a0-690ec664ba88@oracle.com>
 <590CD5E7.10809@linux.vnet.ibm.com>
Message-ID: <CA+3eh13cf8Xhe0+ibmNx5KTjZd+XtD_CZUdYSfOkz+KouJPK6w@mail.gmail.com>

On Fri, May 5, 2017 at 9:43 PM, Gustavo Romero
<gromero at linux.vnet.ibm.com> wrote:
> Hi David,
>
> On 04-05-2017 21:32, David Holmes wrote:
>> Hi Volker, Gustavo,
>>
>> On 4/05/2017 12:34 AM, Volker Simonis wrote:
>>> Hi,
>>>
>>> I've reviewed Gustavo's change and I'm fine with the latest version at:
>>>
>>> http://cr.openjdk.java.net/~gromero/8175813/v3/
>>
>> Nothing has really changed for me since I first looked at this - I don't know NUMA and I can't comment on any of the details. But no-one else has commented negatively so they are implicitly okay with
>> this, or else they should have spoken up. So with Volker as the Reviewer and myself as a second reviewer, I will sponsor this. I'll run the current patch through JPRT while awaiting the final version.
>
> Thanks a lot for reviewing and sponsoring the change.
>
>
>> One thing I was unclear on with all this numa code is the expectation regarding all those dynamically looked up functions - is it expected that we will have them all or else have none? It wasn't at
>> all obvious what would happen if we don't have those functions but still executed this code - assuming that is even possible. I guess I would have expected that no numa code would execute unless
>> -XX:+UseNUMA was set, in which case the VM would abort if any of the libnuma functions could not be found. That way we wouldn't need the null checks for the function pointers.
>
> If libnuma is not available in the system os::Linux::libnuma_init() will return
> false and JVM will refuse to enable the UseNUMA features instead of aborting:
>
> 4904   if (UseNUMA) {
> 4905     if (!Linux::libnuma_init()) {
> 4906       UseNUMA = false;
> 4907     } else {
>
> I understand those null checks as part of the initial design of the JVM numa api,
> protecting against the use of its methods in other parts of the code
> when the api failed to initialize properly, even though it's expected that
> UseNUMA = false should suffice to prevent such usages.
>
> That said, I could not find any recent Linux distribution that does not support
> libnuma v2 api (and so also v1 api). On Ubuntu it will be installed as a
> dependency of metapackage ubuntu-standard and because that requires "irqbalance"
> it also requires libnuma. Libnuma was updated from libnuma v1 to v2
> around mid 2008:
>
> numactl (2.0.1-1) unstable; urgency=low
>
>   * New upstream
>   * patches/static-lib.patch: update
>   * debian/watch: update to new SGI location
>
>  -- Ian Wienand <ianw at debian.org>  Sat, 07 Jun 2008 14:18:22 -0700
>
> numactl (1.0.2-1) unstable; urgency=low
>
>   * New upstream
>   * Closes: #442690 -- Add to rules a hack to remove libnuma.a after
>     unpatching
>   * Update README.debian
>
>
>  -- Ian Wienand <ianw at debian.org>  Wed, 03 Oct 2007 21:49:27 +1000
>
>
> It's similar on RHEL, where "irqbalance" is in the core group. The
> libnuma version there was also updated to v2 in 2008, so Fedora 11
> contains v2 and hence RHEL 6 and RHEL 7 contain it as well:
>
> * Wed Feb 25 2009 Fedora Release Engineering <rel-eng at lists.fedoraproject.org> - 2.0.2-3
> - Rebuilt for https://fedoraproject.org/wiki/Fedora_11_Mass_Rebuild
>
> * Mon Sep 29 2008 Neil Horman <nhorman at redhat.com> - 2.0.2-2
> - Fix build break due to register selection in asm
>
> * Mon Sep 29 2008 Neil Horman <nhorman at redhat.com> - 2.0.2-1
> - Update rawhide to version 2.0.2 of numactl
>
> * Fri Apr 25 2008 Neil Horman <nhorman at redhat.com> - 1.0.2-6
> - Fix buffer size passing and arg sanity check for physcpubind (bz 442521)
>
>
> Also, the last release of libnuma v1 dates back to 2008:
> https://github.com/numactl/numactl/releases/tag/v1.0.2
>
> So it looks like libnuma v2 absence on Linux is by now uncommon.
>
>
>> Style nits:
>> - we should avoid implicit booleans, so the isnode_in_* functions should return bool not int; and check "distance != 0"
>> - spaces around operators eg. node=0 should be node = 0
>
> new webrev: http://cr.openjdk.java.net/~gromero/8175813/v4/
>

Still good :) Thumbs up!

And thanks a lot for digging into the history of libnuma and its
incarnation in various Linux distros. That's really useful
information!

Regards,
Volker

>
> Thank you and best regards,
> Gustavo
>
>> Thanks,
>> David
>>
>>> Can somebody please sponsor the change?
>>>
>>> Thank you and best regards,
>>> Volker
>>>
>>>
>>> On Wed, May 3, 2017 at 3:27 PM, Gustavo Romero
>>> <gromero at linux.vnet.ibm.com> wrote:
>>>> Hi community,
>>>>
>>>> I understand that there is nothing that can be done additionally regarding this
>>>> issue, at this point, on the PPC64 side.
>>>>
>>>> It's a change in the shared code - although in effect it does not change anything
>>>> in the numa detection mechanism for other platforms - and hence it needs a joint
>>>> community effort to review the change and a sponsor to run it through JPRT.
>>>>
>>>> I know OpenJDK 9 is in a stabilization phase, but since this issue is of
>>>> great concern on PPC64 (especially on POWER8 machines) I would be very glad if
>>>> the community could point out how this change could move forward.
>>>>
>>>> Thank you!
>>>>
>>>> Best regards,
>>>> Gustavo
>>>>
>>>> On 25-04-2017 23:49, Gustavo Romero wrote:
>>>>> Dear Volker,
>>>>>
>>>>> On 24-04-2017 14:08, Volker Simonis wrote:
>>>>>> Hi Gustavo,
>>>>>>
>>>>>> thanks for addressing this problem and sorry for my late reply. I
>>>>>> think this is a good change which definitely improves the situation
>>>>>> for uncommon NUMA configurations without changing the handling for
>>>>>> common topologies.
>>>>>
>>>>> Thanks a lot for reviewing the change!
>>>>>
>>>>>
>>>>>> It would be great if somebody could run this through JPRT, but as
>>>>>> Gustavo mentioned, I don't expect any regressions.
>>>>>>
>>>>>> @Igor: I think you've been the original author of the NUMA-aware
>>>>>> allocator port to Linux (i.e. "6684395: Port NUMA-aware allocator to
>>>>>> linux"). If you could find some spare minutes to take a look at this
>>>>>> change, your comment would be very much appreciated :)
>>>>>>
>>>>>> Following some minor comments from me:
>>>>>>
>>>>>> - in os::numa_get_groups_num() you now use numa_num_configured_nodes()
>>>>>> to get the actual number of configured nodes. This is good and
>>>>>> certainly an improvement over the previous implementation. However,
>>>>>> the man page for numa_num_configured_nodes() mentions that the
>>>>>> returned count may contain currently disabled nodes. Do we currently
>>>>>> handle disabled nodes? What will be the consequence if we would use
>>>>>> such a disabled node (e.g. mbind() warnings)?
>>>>>
>>>>> In [1] 'numa_memnode_ptr' is set to keep a list of just the nodes with memory
>>>>> found in /sys/devices/system/node/*. Hence numa_num_configured_nodes() just
>>>>> returns the number of nodes in 'numa_memnode_ptr' [2], i.e. the number of
>>>>> nodes with memory in the system. To the best of my knowledge there is
>>>>> no system configuration on Linux/PPC64 that could match such a notion of
>>>>> "disabled nodes" as it appears in libnuma's manual. If it is enabled, it's in
>>>>> that dir and just the ones with memory will be taken into account. If it's
>>>>> disabled (somehow), it's not in the dir, so won't be taken into account (i.e. no
>>>>> mbind() tried against it).
>>>>>
>>>>> On Power it's possible to have a numa node without memory (memory-less node, a
>>>>> case covered in this change), a numa node without cpus at all but with memory
>>>>> (a configured node anyway, so a case already covered). But disabling a specific
>>>>> numa node so that it does not appear in /sys/devices/system/node/* is only possible
>>>>> from the internals of the control module, or under some other rare condition not
>>>>> visible / adjustable from the OS. Also I'm not aware of a case where a node is in this
>>>>> dir but is at the same time flagged as something like "disabled". There are
>>>>> cpu/memory hotplugs, but that does not change numa nodes status AFAIK.
>>>>>
>>>>> [1] https://github.com/numactl/numactl/blob/master/libnuma.c#L334-L347
>>>>> [2] https://github.com/numactl/numactl/blob/master/libnuma.c#L614-L618
>>>>>
>>>>>
>>>>>> - the same question applies to the usage of
>>>>>> Linux::isnode_in_configured_nodes() within os::numa_get_leaf_groups().
>>>>>> Does isnode_in_configured_nodes() (i.e. the node set defined by
>>>>>> 'numa_all_nodes_ptr') take into account the disabled nodes or not? Can
>>>>>> this be a potential problem (i.e. if we use a disabled node)?
>>>>>
>>>>> On the meaning of "disabled nodes", it's the same case as above, so to the
>>>>> best of my knowledge it's not a potential problem.
>>>>>
>>>>> Anyway 'numa_all_nodes_ptr' just includes the configured nodes (with memory),
>>>>> i.e. "all nodes on which the calling task may allocate memory". It's exactly
>>>>> the same pointer returned by numa_get_membind() v2 [3] which:
>>>>>
>>>>> "returns the mask of nodes from which memory can currently be allocated"
>>>>>
>>>>> and that is used, for example, in "numactl --show" to show nodes from where
>>>>> memory can be allocated [4, 5].
>>>>>
>>>>> [3] https://github.com/numactl/numactl/blob/master/libnuma.c#L1147
>>>>> [4] https://github.com/numactl/numactl/blob/master/numactl.c#L144
>>>>> [5] https://github.com/numactl/numactl/blob/master/numactl.c#L177
>>>>>
>>>>>
>>>>>> - I'd like to suggest renaming the 'index' part of the following
>>>>>> variables and functions to 'nindex' ('node_index' is probably too long)
>>>>>> in the following code, to emphasize that we have node indexes pointing
>>>>>> to actual, not always consecutive node numbers:
>>>>>>
>>>>>> 2879         // Create an index -> node mapping, since nodes are not always consecutive
>>>>>> 2880         _index_to_node = new (ResourceObj::C_HEAP, mtInternal) GrowableArray<int>(0, true);
>>>>>> 2881         rebuild_index_to_node_map();
>>>>>
>>>>> Simple change but much better to read indeed. Done.
>>>>>
>>>>>
>>>>>> - can you please wrap the following one-line else statement into curly
>>>>>> braces (it's more readable and we usually do it that way in HotSpot
>>>>>> although there are no formal style guidelines :)
>>>>>>
>>>>>> 2953      } else
>>>>>> 2954        // Current node is already a configured node.
>>>>>> 2955        closest_node = index_to_node()->at(i);
>>>>>
>>>>> Done.
>>>>>
>>>>>
>>>>>> - in os::Linux::rebuild_cpu_to_node_map(), if you set
>>>>>> 'closest_distance' to INT_MAX at the beginning of the loop, you can
>>>>>> later avoid the check for '|| !closest_distance'. Also, according to
>>>>>> the man page, numa_distance() returns 0 if it can not determine the
>>>>>> distance. So with the above change, the condition on line 2974 should
>>>>>> read:
>>>>>>
>>>>>> 2947           if (distance && distance < closest_distance) {
>>>>>>
>>>>>
>>>>> Sure, much better to set the initial condition as distant as possible and
>>>>> adjust to a closer one bit by bit improving the if condition. Done.
>>>>>
>>>>>
>>>>>> Finally, and not directly related to your change, I'd suggest the
>>>>>> following clean-ups:
>>>>>>
>>>>>> - remove the usage of 'NCPUS = 32768' in
>>>>>> os::Linux::rebuild_cpu_to_node_map(). The comment on that line is
>>>>>> unclear to me and probably related to an older version/problem of
>>>>>> libnuma? I think we should simply use
>>>>>> numa_allocate_cpumask()/numa_free_cpumask() instead.
>>>>>>
>>>>>> - we still use the NUMA version 1 function prototypes (e.g.
>>>>>> "numa_node_to_cpus(int node, unsigned long *buffer, int buffer_len)"
>>>>>> instead of "numa_node_to_cpus(int node, struct bitmask *mask)", but
>>>>>> also "numa_interleave_memory()" and maybe others). I think we should
>>>>>> switch all prototypes to the new NUMA version 2 interface which you've
>>>>>> already used for the new functions which you've added.
>>>>>
>>>>> I agree. Could I open a new bug to address these clean-ups?
>>>>>
>>>>>
>>>>>> That said, I think these changes all require libnuma 2.0 (see
>>>>>> os::Linux::libnuma_dlsym). So before starting this, you should make
>>>>>> sure that libnuma 2.0 is available on all platforms to which you'd
>>>>>> like to down-port this change. For jdk10 we could definitely do it,
>>>>>> for jdk9 probably also, for jdk8 I'm not so sure.
>>>>>
>>>>> The last libnuma v1 release dates back to 2008, but any idea how I could check that
>>>>> for sure, since it's in shared code?
>>>>>
>>>>> new webrev: http://cr.openjdk.java.net/~gromero/8175813/v3/
>>>>>
>>>>> Thank you!
>>>>>
>>>>> Best regards,
>>>>> Gustavo
>>>>>
>>>>>
>>>>>> Regards,
>>>>>> Volker
>>>>>>
>>>>>> On Thu, Apr 13, 2017 at 12:51 AM, Gustavo Romero
>>>>>> <gromero at linux.vnet.ibm.com> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Any update on it?
>>>>>>>
>>>>>>> Thank you.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Gustavo
>>>>>>>
>>>>>>> On 09-03-2017 16:33, Gustavo Romero wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Could the following webrev be reviewed please?
>>>>>>>>
>>>>>>>> It improves the numa node detection when non-consecutive or memory-less nodes
>>>>>>>> exist in the system.
>>>>>>>>
>>>>>>>> webrev: http://cr.openjdk.java.net/~gromero/8175813/v2/
>>>>>>>> bug   : https://bugs.openjdk.java.net/browse/JDK-8175813
>>>>>>>>
>>>>>>>> Currently, although no problem exists when the JVM detects numa nodes that are
>>>>>>>> consecutive and have memory, for example in a numa topology like:
>>>>>>>>
>>>>>>>> available: 2 nodes (0-1)
>>>>>>>> node 0 cpus: 0 8 16 24 32
>>>>>>>> node 0 size: 65258 MB
>>>>>>>> node 0 free: 34 MB
>>>>>>>> node 1 cpus: 40 48 56 64 72
>>>>>>>> node 1 size: 65320 MB
>>>>>>>> node 1 free: 150 MB
>>>>>>>> node distances:
>>>>>>>> node   0   1
>>>>>>>>   0:  10  20
>>>>>>>>   1:  20  10,
>>>>>>>>
>>>>>>>> it fails to detect the numa nodes to be used in the Parallel GC in a numa
>>>>>>>> topology like:
>>>>>>>>
>>>>>>>> available: 4 nodes (0-1,16-17)
>>>>>>>> node 0 cpus: 0 8 16 24 32
>>>>>>>> node 0 size: 130706 MB
>>>>>>>> node 0 free: 7729 MB
>>>>>>>> node 1 cpus: 40 48 56 64 72
>>>>>>>> node 1 size: 0 MB
>>>>>>>> node 1 free: 0 MB
>>>>>>>> node 16 cpus: 80 88 96 104 112
>>>>>>>> node 16 size: 130630 MB
>>>>>>>> node 16 free: 5282 MB
>>>>>>>> node 17 cpus: 120 128 136 144 152
>>>>>>>> node 17 size: 0 MB
>>>>>>>> node 17 free: 0 MB
>>>>>>>> node distances:
>>>>>>>> node   0   1  16  17
>>>>>>>>   0:  10  20  40  40
>>>>>>>>   1:  20  10  40  40
>>>>>>>>  16:  40  40  10  20
>>>>>>>>  17:  40  40  20  10,
>>>>>>>>
>>>>>>>> where node 16 is not consecutive in relation to 1 and also nodes 1 and 17 have
>>>>>>>> no memory.
>>>>>>>>
>>>>>>>> If a topology like that exists, os::numa_make_local() will receive a local group
>>>>>>>> id as a hint that is not available in the system to be bound (it will receive
>>>>>>>> all nodes from 0 to 17), causing a proliferation of "mbind: Invalid argument"
>>>>>>>> messages:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~gromero/logs/jdk10_pristine.log
>>>>>>>>
>>>>>>>> That change improves the detection by making the JVM numa API aware of the
>>>>>>>> existence of numa nodes that are non-consecutive from 0 to the highest node
>>>>>>>> number and also of nodes that might be memory-less nodes, i.e. that might not
>>>>>>>> be, in libnuma terms, a configured node. Hence just the configured nodes will
>>>>>>>> be available:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~gromero/logs/jdk10_numa_patched.log
>>>>>>>>
>>>>>>>> The change has no effect on numa topologies where the problem does not occur,
>>>>>>>> i.e. no change in the number of nodes and no change in the cpu to node map. On
>>>>>>>> numa topologies where memory-less nodes exist (like in the last example above),
>>>>>>>> cpus from a memory-less node won't be able to bind locally so they are mapped
>>>>>>>> to the closest node; otherwise they would not be associated with any node and
>>>>>>>> MutableNUMASpace::cas_allocate() would pick a node randomly, compromising the
>>>>>>>> performance.
>>>>>>>>
>>>>>>>> I found no regressions on x64 for the following numa topology:
>>>>>>>>
>>>>>>>> available: 2 nodes (0-1)
>>>>>>>> node 0 cpus: 0 1 2 3 8 9 10 11
>>>>>>>> node 0 size: 24102 MB
>>>>>>>> node 0 free: 19806 MB
>>>>>>>> node 1 cpus: 4 5 6 7 12 13 14 15
>>>>>>>> node 1 size: 24190 MB
>>>>>>>> node 1 free: 21951 MB
>>>>>>>> node distances:
>>>>>>>> node   0   1
>>>>>>>>   0:  10  21
>>>>>>>>   1:  21  10
>>>>>>>>
>>>>>>>> I understand that fixing the current numa detection is a prerequisite for enabling
>>>>>>>> UseNUMA by default [1] and for extending the numa-aware allocation to the G1 GC [2].
>>>>>>>>
>>>>>>>> Thank you.
>>>>>>>>
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Gustavo
>>>>>>>>
>>>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8046153 (JEP 163: Enable NUMA Mode by Default When Appropriate)
>>>>>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8046147 (JEP 157: G1 GC: NUMA-Aware Allocation)
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>

From david.holmes at oracle.com  Sun May  7 20:45:09 2017
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 8 May 2017 06:45:09 +1000
Subject: [10] RFR (S) 8175813: PPC64: "mbind: Invalid argument" when
 -XX:+UseNUMA is used
In-Reply-To: <590CD5E7.10809@linux.vnet.ibm.com>
References: <58C1AE06.9060609@linux.vnet.ibm.com>
 <58EEAF7B.6020708@linux.vnet.ibm.com>
 <CA+3eh11Wy_JBOxnoX4g=x7tWuqbUny8HUVq4gYqXoo5fz_AAFw@mail.gmail.com>
 <59000AC0.7050507@linux.vnet.ibm.com> <5909DAAC.3070202@linux.vnet.ibm.com>
 <CA+3eh12Qt9bK7U94i2-KH6abU_9puonF-PO0TDrOHOXidBvFVA@mail.gmail.com>
 <b7fe8dec-e9cc-913e-52a0-690ec664ba88@oracle.com>
 <590CD5E7.10809@linux.vnet.ibm.com>
Message-ID: <4b26117f-508d-90ee-1ced-2a2c720a1047@oracle.com>

Hi Gustavo,

On 6/05/2017 5:43 AM, Gustavo Romero wrote:
> Hi David,
>
> On 04-05-2017 21:32, David Holmes wrote:
>> Hi Volker, Gustavo,
>>
>> On 4/05/2017 12:34 AM, Volker Simonis wrote:
>>> Hi,
>>>
>>> I've reviewed Gustavo's change and I'm fine with the latest version at:
>>>
>>> http://cr.openjdk.java.net/~gromero/8175813/v3/
>>
>> Nothing has really changed for me since I first looked at this - I don't know NUMA and I can't comment on any of the details. But no-one else has commented negatively so they are implicitly okay with
>> this, or else they should have spoken up. So with Volker as the Reviewer and myself as a second reviewer, I will sponsor this. I'll run the current patch through JPRT while awaiting the final version.
>
> Thanks a lot for reviewing and sponsoring the change.
>
>
>> One thing I was unclear on with all this numa code is the expectation regarding all those dynamically looked up functions - is it expected that we will have them all or else have none? It wasn't at
>> all obvious what would happen if we don't have those functions but still executed this code - assuming that is even possible. I guess I would have expected that no numa code would execute unless
>> -XX:+UseNUMA was set, in which case the VM would abort if any of the libnuma functions could not be found. That way we wouldn't need the null checks for the function pointers.
>
> If libnuma is not available in the system os::Linux::libnuma_init() will return
> false and JVM will refuse to enable the UseNUMA features instead of aborting:
>
> 4904   if (UseNUMA) {
> 4905     if (!Linux::libnuma_init()) {
> 4906       UseNUMA = false;
> 4907     } else {
>
> I understand those null checks as part of the initial design of the JVM numa api,
> protecting against the use of its methods in other parts of the code
> when the api failed to initialize properly, even though it's expected that
> UseNUMA = false should suffice to prevent such usages.

Ok. Seems like they should be asserts rather than runtime checks if all 
the paths are properly guarded by UseNUMA - but that isn't your problem.
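
To make the distinction concrete, here is a minimal standalone sketch of the pattern being discussed: initialization fails softly and switches the feature off, after which guarded code paths can assert on the function pointers instead of re-checking them at run time. It is not HotSpot code. NumaApi, numa_api_init(), resolve_use_numa() and groups_num() are hypothetical names, and the fake_* functions only stand in for dynamically looked-up libnuma entry points.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical table of dynamically looked-up entry points.
struct NumaApi {
  int (*max_node)();
  int (*num_configured_nodes)();
};

// Mirrors the idea of an init function that returns false when a lookup failed.
bool numa_api_init(const NumaApi& api) {
  return api.max_node != nullptr && api.num_configured_nodes != nullptr;
}

// Returns the effective flag: a failed init disables the feature instead of aborting.
bool resolve_use_numa(bool requested, const NumaApi& api) {
  if (requested && !numa_api_init(api)) {
    return false;
  }
  return requested;
}

// A guarded caller: once use_numa is known to be true, the pointer is
// asserted rather than checked at run time.
int groups_num(bool use_numa, const NumaApi& api) {
  if (!use_numa) {
    return 1;  // single group when NUMA support is off
  }
  assert(api.num_configured_nodes != nullptr);
  return api.num_configured_nodes();
}

// Stand-ins for real library functions, used only to exercise the sketch.
int fake_max_node() { return 17; }
int fake_num_configured_nodes() { return 2; }
```

With a fully populated table the flag stays on and groups_num() reports 2 groups. With a missing entry point the flag is cleared and every caller falls back to the single-group path.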

> That said, I could not find any recent Linux distribution that does not support
> libnuma v2 api (and so also v1 api). On Ubuntu it will be installed as a
> dependency of metapackage ubuntu-standard and because that requires "irqbalance"
> it also requires libnuma. Libnuma was updated from libnuma v1 to v2
> around mid 2008:

Thanks for the additional info.

> numactl (2.0.1-1) unstable; urgency=low
>
>   * New upstream
>   * patches/static-lib.patch: update
>   * debian/watch: update to new SGI location
>
>  -- Ian Wienand <ianw at debian.org>  Sat, 07 Jun 2008 14:18:22 -0700
>
> numactl (1.0.2-1) unstable; urgency=low
>
>   * New upstream
>   * Closes: #442690 -- Add to rules a hack to remove libnuma.a after
>     unpatching
>   * Update README.debian
>
>
>  -- Ian Wienand <ianw at debian.org>  Wed, 03 Oct 2007 21:49:27 +1000
>
>
> It's similar on RHEL, where "irqbalance" is in core group. Regarding
> the libnuma version it was also updated in 2008 to v2, so since
> Fedora 11 contains v2, hence RHEL 6 and RHEL 7 contains it:
>
> * Wed Feb 25 2009 Fedora Release Engineering <rel-eng at lists.fedoraproject.org> - 2.0.2-3
> - Rebuilt for https://fedoraproject.org/wiki/Fedora_11_Mass_Rebuild
>
> * Mon Sep 29 2008 Neil Horman <nhorman at redhat.com> - 2.0.2-2
> - Fix build break due to register selection in asm
>
> * Mon Sep 29 2008 Neil Horman <nhorman at redhat.com> - 2.0.2-1
> - Update rawhide to version 2.0.2 of numactl
>
> * Fri Apr 25 2008 Neil Horman <nhorman at redhat.com> - 1.0.2-6
> - Fix buffer size passing and arg sanity check for physcpubind (bz 442521)
>
>
> Also, the last release of libnuma v1 dates back to 2008:
> https://github.com/numactl/numactl/releases/tag/v1.0.2
>
> So it looks like libnuma v2 absence on Linux is by now uncommon.
>
>
>> Style nits:
>> - we should avoid implicit booleans, so the isnode_in_* functions should return bool not int; and check "distance != 0"
>> - spaces around operators eg. node=0 should be node = 0
>
> new webrev: http://cr.openjdk.java.net/~gromero/8175813/v4/

Looks good. Changes being pushed now.

David
-----
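
For readers following the review comments quoted below, here is a small sketch of the two ideas discussed there: an index -> node map that only holds configured nodes (nodes with memory), and a closest-node search that starts from INT_MAX and skips a distance of 0. It is a sketch under stated assumptions, not the patch itself. NodeInfo, example_topology() and example_distance() are hypothetical helpers; the node ids, sizes and distances are copied from the 4-node topology in the original report, and example_distance() stands in for libnuma's numa_distance().

```cpp
#include <cassert>
#include <climits>
#include <vector>

// Illustrative data only, taken from the 4-node example in the report.
struct NodeInfo {
  int  id;
  long size_mb;
};

inline std::vector<NodeInfo> example_topology() {
  return { {0, 130706}, {1, 0}, {16, 130630}, {17, 0} };
}

// Hypothetical stand-in for numa_distance(): returns 0 if a node is unknown.
inline int example_distance(int from, int to) {
  static const int ids[4] = {0, 1, 16, 17};
  static const int dist[4][4] = {
    {10, 20, 40, 40},
    {20, 10, 40, 40},
    {40, 40, 10, 20},
    {40, 40, 20, 10},
  };
  int fi = -1, ti = -1;
  for (int i = 0; i < 4; i++) {
    if (ids[i] == from) fi = i;
    if (ids[i] == to)   ti = i;
  }
  return (fi < 0 || ti < 0) ? 0 : dist[fi][ti];
}

// Node index -> node number, keeping only nodes that have memory.
inline std::vector<int> build_nindex_to_node(const std::vector<NodeInfo>& nodes) {
  std::vector<int> nindex_to_node;
  for (const NodeInfo& n : nodes) {
    if (n.size_mb > 0) nindex_to_node.push_back(n.id);
  }
  return nindex_to_node;
}

// Returns bool (not int), as requested in the style nits.
inline bool is_node_in_configured_nodes(int node, const std::vector<int>& nindex_to_node) {
  for (int configured : nindex_to_node) {
    if (configured == node) return true;
  }
  return false;
}

// A configured node maps to itself; a memory-less node maps to the
// configured node with the smallest known (non-zero) distance.
inline int closest_configured_node(int node, const std::vector<int>& nindex_to_node) {
  assert(!nindex_to_node.empty());
  if (is_node_in_configured_nodes(node, nindex_to_node)) return node;
  int closest_distance = INT_MAX;
  int closest_node = -1;
  for (int candidate : nindex_to_node) {
    int distance = example_distance(node, candidate);
    if (distance != 0 && distance < closest_distance) {
      closest_distance = distance;
      closest_node = candidate;
    }
  }
  return closest_node;
}
```

On the example topology the map holds nodes 0 and 16 only, so cpus on memory-less node 1 are associated with node 0 and cpus on memory-less node 17 with node 16.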

>
> Thank you and best regards,
> Gustavo
>
>> Thanks,
>> David
>>
>>> Can somebody please sponsor the change?
>>>
>>> Thank you and best regards,
>>> Volker
>>>
>>>
>>> On Wed, May 3, 2017 at 3:27 PM, Gustavo Romero
>>> <gromero at linux.vnet.ibm.com> wrote:
>>>> Hi community,
>>>>
>>>> I understand that there is nothing that can be done additionally regarding this
>>>> issue, at this point, on the PPC64 side.
>>>>
>>>> It's a change in the shared code - but that in effect does not change anything in
>>>> the numa detection mechanism for other platforms - and hence it needs a
>>>> joint community effort to review the change and a sponsor to run it against
>>>> the JPRT.
>>>>
>>>> I know it's a stabilizing moment of OpenJDK 9, but since that issue is of
>>>> great concern on PPC64 (especially on POWER8 machines) I would be very glad if
>>>> the community could point out directions on how that change could move on.
>>>>
>>>> Thank you!
>>>>
>>>> Best regards,
>>>> Gustavo
>>>>
>>>> On 25-04-2017 23:49, Gustavo Romero wrote:
>>>>> Dear Volker,
>>>>>
>>>>> On 24-04-2017 14:08, Volker Simonis wrote:
>>>>>> Hi Gustavo,
>>>>>>
>>>>>> thanks for addressing this problem and sorry for my late reply. I
>>>>>> think this is a good change which definitely improves the situation
>>>>>> for uncommon NUMA configurations without changing the handling for
>>>>>> common topologies.
>>>>>
>>>>> Thanks a lot for reviewing the change!
>>>>>
>>>>>
>>>>>> It would be great if somebody could run this through JPRT, but as
>>>>>> Gustavo mentioned, I don't expect any regressions.
>>>>>>
>>>>>> @Igor: I think you've been the original author of the NUMA-aware
>>>>>> allocator port to Linux (i.e. "6684395: Port NUMA-aware allocator to
>>>>>> linux"). If you could find some spare minutes to take a look at this
>>>>>> change, your comment would be very much appreciated :)
>>>>>>
>>>>>> Following some minor comments from me:
>>>>>>
>>>>>> - in os::numa_get_groups_num() you now use numa_num_configured_nodes()
>>>>>> to get the actual number of configured nodes. This is good and
>>>>>> certainly an improvement over the previous implementation. However,
>>>>>> the man page for numa_num_configured_nodes() mentions that the
>>>>>> returned count may contain currently disabled nodes. Do we currently
>>>>>> handle disabled nodes? What will be the consequence if we would use
>>>>>> such a disabled node (e.g. mbind() warnings)?
>>>>>
>>>>> In [1] 'numa_memnode_ptr' is set to keep a list of just the nodes with memory
>>>>> found in /sys/devices/system/node/*. Hence numa_num_configured_nodes() just
>>>>> returns the number of nodes in 'numa_memnode_ptr' [2], thus just returns the
>>>>> number of nodes with memory in the system. To the best of my knowledge there is
>>>>> no system configuration on Linux/PPC64 that could match such a notion of
>>>>> "disabled nodes" as it appears in libnuma's manual. If it is enabled, it's in
>>>>> that dir and just the ones with memory will be taken into account. If it's
>>>>> disabled (somehow), it's not in the dir, so won't be taken into account (i.e. no
>>>>> mbind() tried against it).
>>>>>
>>>>> On Power it's possible to have a numa node without memory (memory-less node, a
>>>>> case covered in this change), a numa node without cpus at all but with memory
>>>>> (a configured node anyway, so a case already covered) but to disable a specific
>>>>> numa node so it does not appear in /sys/devices/system/node/* it's only possible
>>>>> from the internals of the control module, or some other rare condition not visible /
>>>>> adjustable from the OS. Also I'm not aware of a case where a node is in this
>>>>> dir but is at the same time flagged as something like "disabled". There are
>>>>> cpu/memory hotplugs, but that does not change numa nodes status AFAIK.
>>>>>
>>>>> [1] https://github.com/numactl/numactl/blob/master/libnuma.c#L334-L347
>>>>> [2] https://github.com/numactl/numactl/blob/master/libnuma.c#L614-L618
>>>>>
>>>>>
>>>>>> - the same question applies to the usage of
>>>>>> Linux::isnode_in_configured_nodes() within os::numa_get_leaf_groups().
>>>>>> Does isnode_in_configured_nodes() (i.e. the node set defined by
>>>>>> 'numa_all_nodes_ptr') take into account the disabled nodes or not? Can
>>>>>> this be a potential problem (i.e. if we use a disabled node)?
>>>>>
>>>>> On the meaning of "disabled nodes", it's the same case as above, so to the
>>>>> best of my knowledge it's not a potential problem.
>>>>>
>>>>> Anyway 'numa_all_nodes_ptr' just includes the configured nodes (with memory),
>>>>> i.e. "all nodes on which the calling task may allocate memory". It's exactly
>>>>> the same pointer returned by numa_get_membind() v2 [3] which:
>>>>>
>>>>> "returns the mask of nodes from which memory can currently be allocated"
>>>>>
>>>>> and that is used, for example, in "numactl --show" to show nodes from where
>>>>> memory can be allocated [4, 5].
>>>>>
>>>>> [3] https://github.com/numactl/numactl/blob/master/libnuma.c#L1147
>>>>> [4] https://github.com/numactl/numactl/blob/master/numactl.c#L144
>>>>> [5] https://github.com/numactl/numactl/blob/master/numactl.c#L177
>>>>>
>>>>>
>>>>>> - I'd like to suggest renaming the 'index' part of the following
>>>>>> variables and functions to 'nindex' ('node_index' is probably too long)
>>>>>> in the following code, to emphasize that we have node indexes pointing
>>>>>> to actual, not always consecutive node numbers:
>>>>>>
>>>>>> 2879         // Create an index -> node mapping, since nodes are not
>>>>>> always consecutive
>>>>>> 2880         _index_to_node = new (ResourceObj::C_HEAP, mtInternal)
>>>>>> GrowableArray<int>(0, true);
>>>>>> 2881         rebuild_index_to_node_map();
>>>>>
>>>>> Simple change but much better to read indeed. Done.
>>>>>
>>>>>
>>>>>> - can you please wrap the following one-line else statement into curly
>>>>>> braces (it's more readable and we usually do it that way in HotSpot
>>>>>> although there are no formal style guidelines :)
>>>>>>
>>>>>> 2953      } else
>>>>>> 2954        // Current node is already a configured node.
>>>>>> 2955        closest_node = index_to_node()->at(i);
>>>>>
>>>>> Done.
>>>>>
>>>>>
>>>>>> - in os::Linux::rebuild_cpu_to_node_map(), if you set
>>>>>> 'closest_distance' to INT_MAX at the beginning of the loop, you can
>>>>>> later avoid the check for '|| !closest_distance'. Also, according to
>>>>>> the man page, numa_distance() returns 0 if it can not determine the
>>>>>> distance. So with the above change, the condition on line 2974 should
>>>>>> read:
>>>>>>
>>>>>> 2947           if (distance && distance < closest_distance) {
>>>>>>
>>>>>
>>>>> Sure, much better to set the initial condition as distant as possible and
>>>>> adjust to a closer one bit by bit improving the if condition. Done.
>>>>>
>>>>>
>>>>>> Finally, and not directly related to your change, I'd suggest the
>>>>>> following clean-ups:
>>>>>>
>>>>>> - remove the usage of 'NCPUS = 32768' in
>>>>>> os::Linux::rebuild_cpu_to_node_map(). The comment on that line is
>>>>>> unclear to me and probably related to an older version/problem of
>>>>>> libnuma? I think we should simply use
>>>>>> numa_allocate_cpumask()/numa_free_cpumask() instead.
>>>>>>
>>>>>> - we still use the NUMA version 1 function prototypes (e.g.
>>>>>> "numa_node_to_cpus(int node, unsigned long *buffer, int buffer_len)"
>>>>>> instead of "numa_node_to_cpus(int node, struct bitmask *mask)", but
>>>>>> also "numa_interleave_memory()" and maybe others). I think we should
>>>>>> switch all prototypes to the new NUMA version 2 interface which you've
>>>>>> already used for the new functions which you've added.
>>>>>
>>>>> I agree. Could I open a new bug to address these clean-ups?
>>>>>
>>>>>
>>>>>> That said, I think these changes all require libnuma 2.0 (see
>>>>>> os::Linux::libnuma_dlsym). So before starting this, you should make
>>>>>> sure that libnuma 2.0 is available on all platforms to which you'd
>>>>>> like to down-port this change. For jdk10 we could definitely do it,
>>>>>> for jdk9 probably also, for jdk8 I'm not so sure.
>>>>>
>>>>> libnuma v1 last release dates back to 2008, but any idea how could I check that
>>>>> for sure since it's on shared code?
>>>>>
>>>>> new webrev: http://cr.openjdk.java.net/~gromero/8175813/v3/
>>>>>
>>>>> Thank you!
>>>>>
>>>>> Best regards,
>>>>> Gustavo
>>>>>
>>>>>
>>>>>> Regards,
>>>>>> Volker
>>>>>>
>>>>>> On Thu, Apr 13, 2017 at 12:51 AM, Gustavo Romero
>>>>>> <gromero at linux.vnet.ibm.com> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Any update on it?
>>>>>>>
>>>>>>> Thank you.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Gustavo
>>>>>>>
>>>>>>> On 09-03-2017 16:33, Gustavo Romero wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Could the following webrev be reviewed please?
>>>>>>>>
>>>>>>>> It improves the numa node detection when non-consecutive or memory-less nodes
>>>>>>>> exist in the system.
>>>>>>>>
>>>>>>>> webrev: http://cr.openjdk.java.net/~gromero/8175813/v2/
>>>>>>>> bug   : https://bugs.openjdk.java.net/browse/JDK-8175813
>>>>>>>>
>>>>>>>> Currently, although no problem exists when the JVM detects numa nodes that are
>>>>>>>> consecutive and have memory, for example in a numa topology like:
>>>>>>>>
>>>>>>>> available: 2 nodes (0-1)
>>>>>>>> node 0 cpus: 0 8 16 24 32
>>>>>>>> node 0 size: 65258 MB
>>>>>>>> node 0 free: 34 MB
>>>>>>>> node 1 cpus: 40 48 56 64 72
>>>>>>>> node 1 size: 65320 MB
>>>>>>>> node 1 free: 150 MB
>>>>>>>> node distances:
>>>>>>>> node   0   1
>>>>>>>>   0:  10  20
>>>>>>>>   1:  20  10,
>>>>>>>>
>>>>>>>> it fails on detecting numa nodes to be used in the Parallel GC in a numa
>>>>>>>> topology like:
>>>>>>>>
>>>>>>>> available: 4 nodes (0-1,16-17)
>>>>>>>> node 0 cpus: 0 8 16 24 32
>>>>>>>> node 0 size: 130706 MB
>>>>>>>> node 0 free: 7729 MB
>>>>>>>> node 1 cpus: 40 48 56 64 72
>>>>>>>> node 1 size: 0 MB
>>>>>>>> node 1 free: 0 MB
>>>>>>>> node 16 cpus: 80 88 96 104 112
>>>>>>>> node 16 size: 130630 MB
>>>>>>>> node 16 free: 5282 MB
>>>>>>>> node 17 cpus: 120 128 136 144 152
>>>>>>>> node 17 size: 0 MB
>>>>>>>> node 17 free: 0 MB
>>>>>>>> node distances:
>>>>>>>> node   0   1  16  17
>>>>>>>>   0:  10  20  40  40
>>>>>>>>   1:  20  10  40  40
>>>>>>>>  16:  40  40  10  20
>>>>>>>>  17:  40  40  20  10,
>>>>>>>>
>>>>>>>> where node 16 is not consecutive in relation to 1 and also nodes 1 and 17 have
>>>>>>>> no memory.
>>>>>>>>
>>>>>>>> If a topology like that exists, os::numa_make_local() will receive a local group
>>>>>>>> id as a hint that is not available in the system to be bound (it will receive
>>>>>>>> all nodes from 0 to 17), causing a proliferation of "mbind: Invalid argument"
>>>>>>>> messages:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~gromero/logs/jdk10_pristine.log
>>>>>>>>
>>>>>>>> That change improves the detection by making the JVM numa API aware of the
>>>>>>>> existence of numa nodes that are non-consecutive from 0 to the highest node
>>>>>>>> number and also of nodes that might be memory-less nodes, i.e. that might not
>>>>>>>> be, in libnuma terms, a configured node. Hence just the configured nodes will
>>>>>>>> be available:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~gromero/logs/jdk10_numa_patched.log
>>>>>>>>
>>>>>>>> The change has no effect on numa topologies where the problem does not occur,
>>>>>>>> i.e. no change in the number of nodes and no change in the cpu to node map. On
>>>>>>>> numa topologies where memory-less nodes exist (like in the last example above),
>>>>>>>> cpus from a memory-less node won't be able to bind locally so they are mapped
>>>>>>>> to the closest node, otherwise they would not be associated with any node and
>>>>>>>> MutableNUMASpace::cas_allocate() would pick a node randomly, compromising the
>>>>>>>> performance.
>>>>>>>>
>>>>>>>> I found no regressions on x64 for the following numa topology:
>>>>>>>>
>>>>>>>> available: 2 nodes (0-1)
>>>>>>>> node 0 cpus: 0 1 2 3 8 9 10 11
>>>>>>>> node 0 size: 24102 MB
>>>>>>>> node 0 free: 19806 MB
>>>>>>>> node 1 cpus: 4 5 6 7 12 13 14 15
>>>>>>>> node 1 size: 24190 MB
>>>>>>>> node 1 free: 21951 MB
>>>>>>>> node distances:
>>>>>>>> node   0   1
>>>>>>>>   0:  10  21
>>>>>>>>   1:  21  10
>>>>>>>>
>>>>>>>> I understand that fixing the current numa detection is a prerequisite to enable
>>>>>>>> UseNUMA by default [1] and to extend the numa-aware allocation to the G1 GC [2].
>>>>>>>>
>>>>>>>> Thank you.
>>>>>>>>
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Gustavo
>>>>>>>>
>>>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8046153 (JEP 163: Enable NUMA Mode by Default When Appropriate)
>>>>>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8046147 (JEP 157: G1 GC: NUMA-Aware Allocation)
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>

From HORIE at jp.ibm.com  Mon May  8 04:58:25 2017
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Mon, 8 May 2017 13:58:25 +0900
Subject: 8179527: Implement intrinsic code for reverseBytes with load/store
In-Reply-To: <7827421e2c6447f4ae406434f5bb3d25@sap.com>
References: <OF1FB78FBB.78C06521-ON00258114.004FD8C6-49258114.005135A3@notes.na.collabserv.com>
 <3507c10563a84106ac6c2e8d2554c053@serv030.corp.eldorado.org.br>
 <7827421e2c6447f4ae406434f5bb3d25@sap.com>
Message-ID: <OFC09A152C.17EC3547-ON0025811A.0018ED01-4925811A.001B526B@notes.na.collabserv.com>


Dear Martin, Gustavo,

Thank you very much for your helpful comments.

Fixed code is
http://cr.openjdk.java.net/~horii/8179527/webrev.01/

Dear Goetz,
Would you kindly review and sponsor this change?
I heard you are a C2 compiler expert and Martin is out for a while.


Best regards,
--
Michihiro,
IBM Research - Tokyo


From:	"Doerr, Martin" <martin.doerr at sap.com>
To:	Gustavo Serra Scalet <gustavo.scalet at eldorado.org.br>,
            Michihiro Horie/Japan/IBM at IBMJP
Cc:	"ppc-aix-port-dev at openjdk.java.net"
            <ppc-aix-port-dev at openjdk.java.net>,
            "hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>,
            "Simonis, Volker" <volker.simonis at sap.com>
Date:	2017/05/03 02:24
Subject:	RE: 8179527: Implement intrinsic code for reverseBytes with
            load/store


Hi Michihiro and Gustavo,

thank you very much for implementing this change.

@Gustavo: Thanks for taking a look.
I think that the direct match rules are just there to satisfy
match_rule_supported. They don't need to be fast, they are just a fall back
solution.
The goal is to exploit the byte reverse load and store instructions which
should match in more performance critical cases.
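
To make the pattern concrete, here is a minimal C++ sketch of what the combined shapes mean: "reverse the bytes of a loaded value" and "store the byte-reversed value". The function names are hypothetical and chosen for this example only. On PPC64 a fused match rule can cover each shape with a single byte-reversed load or store (for instance lwbrx/stwbrx for 32-bit values), whereas the sketch spells the two steps out separately.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Plain byte reversal of a 32-bit value, i.e. what ReverseBytesI computes.
constexpr uint32_t reverse_bytes32(uint32_t v) {
  return ((v & 0x000000FFu) << 24) |
         ((v & 0x0000FF00u) << 8)  |
         ((v & 0x00FF0000u) >> 8)  |
         ((v & 0xFF000000u) >> 24);
}

// Shape "ReverseBytesI (LoadI src)": a load followed by a reversal.
inline uint32_t load_reversed32(const void* src) {
  assert(src != nullptr);
  uint32_t v;
  std::memcpy(&v, src, sizeof v);   // the load
  return reverse_bytes32(v);        // the reversal
}

// Shape "StoreI dst (ReverseBytesI src)": a reversal followed by a store.
inline void store_reversed32(void* dst, uint32_t v) {
  assert(dst != nullptr);
  const uint32_t reversed = reverse_bytes32(v);  // the reversal
  std::memcpy(dst, &reversed, sizeof reversed);  // the store
}
```

A value written with store_reversed32() and read back with load_reversed32() comes out unchanged, whatever the host endianness, because the two reversals cancel.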

Now my review:

assembler_ppc.hpp:
Looks good except a minor formatting request:
LDBRX_OPCODE  = (31u << OPCODE_SHIFT |  532 << 1),
should be
LDBRX_OPCODE  = (31u << OPCODE_SHIFT | 532u << 1),
to be consistent.
The comments // X-FORM should be aligned with the other ones.

assembler_ppc.inline.hpp:
Good.

ppc.ad:
I'm concerned about the additional match rules which are only used for the
expand step. They could match directly leading to incorrect code. What they
match is not what they do.
I suggest to implement the code directly in the ins_encode. This would make
the new code significantly shorter and less error prone.
I think we don't need to optimize for Power6 anymore and newer processors
shouldn't really suffer under a little less optimized instruction
scheduling. Would you agree?

Displacements may be too large for "li" so I suggest to use the "indirect"
memory operand and let the compiler handle it. I know that it may increase
latency because the compiler will need to insert an addition which could
better be matched into the memory operand of the load which is harder to
implement (it is possible to match an addition in an operand).
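
For illustration, the range concern can be written down as a one-line check. "li" loads a 16-bit signed immediate, so any displacement outside -32768..32767 cannot be materialized with it. The helper name fits_simm16 is made up for this sketch.

```cpp
#include <cstdint>

// True if the displacement fits the 16-bit signed immediate field of "li".
constexpr bool fits_simm16(int64_t disp) {
  return disp >= -32768 && disp <= 32767;
}

static_assert(fits_simm16(0), "zero displacement always fits");
static_assert(!fits_simm16(1 << 16), "a 64 KiB offset needs more than li");
```

With the "indirect" memory operand the instruct only ever sees a base register, so no such range check is needed in the encoding.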


Best regards,
Martin


-----Original Message-----
From: Gustavo Serra Scalet [mailto:gustavo.scalet at eldorado.org.br]
Sent: Tuesday, May 2, 2017 17:05
To: Michihiro Horie <HORIE at jp.ibm.com>
Cc: ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net;
Simonis, Volker <volker.simonis at sap.com>; Doerr, Martin
<martin.doerr at sap.com>
Subject: RE: 8179527: Implement intrinsic code for reverseBytes with
load/store

Hi Michihiro,

I wonder if there is no vectorized approach for implementing your
"bytes_reverse_long_Ex" instruct on ppc.ad. Or did you avoid doing it so
intentionally?

> -----Original Message-----
> From: ppc-aix-port-dev [mailto:ppc-aix-port-dev-
> bounces at openjdk.java.net] On Behalf Of Michihiro Horie
> Sent: Tuesday, May 2, 2017 11:47
> To: ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net;
> volker.simonis at sap.com; martin.doerr at sap.com
> Subject: 8179527: Implement intrinsic code for reverseBytes with
> load/store
>
> Dear all,
>
> Would you please review following change?
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8179527
> Webrev: http://cr.openjdk.java.net/~horii/8179527/webrev.00/
>
> I added new intrinsic code for reverseBytes() in ppc.ad with
> * match(Set dst (ReverseBytesI/L/US/S (LoadI src)));
> * match(Set dst (StoreI dst (ReverseBytesI/L/US/S src)));
>
>
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo



From gromero at linux.vnet.ibm.com  Mon May  8 14:21:51 2017
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Mon, 8 May 2017 11:21:51 -0300
Subject: [10] RFR (S) 8175813: PPC64: "mbind: Invalid argument" when
 -XX:+UseNUMA is used
In-Reply-To: <4b26117f-508d-90ee-1ced-2a2c720a1047@oracle.com>
References: <58C1AE06.9060609@linux.vnet.ibm.com>
 <58EEAF7B.6020708@linux.vnet.ibm.com>
 <CA+3eh11Wy_JBOxnoX4g=x7tWuqbUny8HUVq4gYqXoo5fz_AAFw@mail.gmail.com>
 <59000AC0.7050507@linux.vnet.ibm.com> <5909DAAC.3070202@linux.vnet.ibm.com>
 <CA+3eh12Qt9bK7U94i2-KH6abU_9puonF-PO0TDrOHOXidBvFVA@mail.gmail.com>
 <b7fe8dec-e9cc-913e-52a0-690ec664ba88@oracle.com>
 <590CD5E7.10809@linux.vnet.ibm.com>
 <4b26117f-508d-90ee-1ced-2a2c720a1047@oracle.com>
Message-ID: <59107EFF.9000805@linux.vnet.ibm.com>

Hi David, Volker

Thanks a lot for reviewing and pushing the change!


Regards,
Gustavo

On 07-05-2017 17:45, David Holmes wrote:
> Hi Gustavo,
> 
> On 6/05/2017 5:43 AM, Gustavo Romero wrote:
>> Hi David,
>>
>> On 04-05-2017 21:32, David Holmes wrote:
>>> Hi Volker, Gustavo,
>>>
>>> On 4/05/2017 12:34 AM, Volker Simonis wrote:
>>>> Hi,
>>>>
>>>> I've reviewed Gustavo's change and I'm fine with the latest version at:
>>>>
>>>> http://cr.openjdk.java.net/~gromero/8175813/v3/
>>>
>>> Nothing has really changed for me since I first looked at this - I don't know NUMA and I can't comment on any of the details. But no-one else has commented negatively so they are implicitly okay with
>>> this, or else they should have spoken up. So with Volker as the Reviewer and myself as a second reviewer, I will sponsor this. I'll run the current patch through JPRT while awaiting the final version.
>>
>> Thanks a lot for reviewing and sponsoring the change.
>>
>>
>>> One thing I was unclear on with all this numa code is the expectation regarding all those dynamically looked up functions - is it expected that we will have them all or else have none? It wasn't at
>>> all obvious what would happen if we don't have those functions but still executed this code - assuming that is even possible. I guess I would have expected that no numa code would execute unless
>>> -XX:+UseNUMA was set, in which case the VM would abort if any of the libnuma functions could not be found. That way we wouldn't need the null checks for the function pointers.
>>
>> If libnuma is not available in the system os::Linux::libnuma_init() will return
>> false and JVM will refuse to enable the UseNUMA features instead of aborting:
>>
>> 4904   if (UseNUMA) {
>> 4905     if (!Linux::libnuma_init()) {
>> 4906       UseNUMA = false;
>> 4907     } else {
>>
>> I understand those null checks as part of the initial design of JVM numa api to
>> enforce protection against the usage of its methods in other parts of the code
>> when JVM api failed to initialize properly, even though it's expected that
>> UseNUMA = false should suffice to protect against such usages.
> 
> Ok. Seems like they should be asserts rather than runtime checks if all the paths are properly guarded by UseNUMA - but that isn't your problem.
> 
>> That said, I could not find any recent Linux distribution that does not support
>> libnuma v2 api (and so also v1 api). On Ubuntu it will be installed as a
>> dependency of metapackage ubuntu-standard and because that requires "irqbalance"
>> it also requires libnuma. Libnuma was updated from libnuma v1 to v2
>> around mid 2008:
> 
> Thanks for the additional info.
> 
>> numactl (2.0.1-1) unstable; urgency=low
>>
>>   * New upstream
>>   * patches/static-lib.patch: update
>>   * debian/watch: update to new SGI location
>>
>>  -- Ian Wienand <ianw at debian.org>  Sat, 07 Jun 2008 14:18:22 -0700
>>
>> numactl (1.0.2-1) unstable; urgency=low
>>
>>   * New upstream
>>   * Closes: #442690 -- Add to rules a hack to remove libnuma.a after
>>     unpatching
>>   * Update README.debian
>>
>>
>>  -- Ian Wienand <ianw at debian.org>  Wed, 03 Oct 2007 21:49:27 +1000
>>
>>
>> It's similar on RHEL, where "irqbalance" is in core group. Regarding
>> the libnuma version it was also updated in 2008 to v2, so since
>> Fedora 11 contains v2, hence RHEL 6 and RHEL 7 contains it:
>>
>> * Wed Feb 25 2009 Fedora Release Engineering <rel-eng at lists.fedoraproject.org> - 2.0.2-3
>> - Rebuilt for https://fedoraproject.org/wiki/Fedora_11_Mass_Rebuild
>>
>> * Mon Sep 29 2008 Neil Horman <nhorman at redhat.com> - 2.0.2-2
>> - Fix build break due to register selection in asm
>>
>> * Mon Sep 29 2008 Neil Horman <nhorman at redhat.com> - 2.0.2-1
>> - Update rawhide to version 2.0.2 of numactl
>>
>> * Fri Apr 25 2008 Neil Horman <nhorman at redhat.com> - 1.0.2-6
>> - Fix buffer size passing and arg sanity check for physcpubind (bz 442521)
>>
>>
>> Also, the last release of libnuma v1 dates back to 2008:
>> https://github.com/numactl/numactl/releases/tag/v1.0.2
>>
>> So it looks like libnuma v2 absence on Linux is by now uncommon.
>>
>>
>>> Style nits:
>>> - we should avoid implicit booleans, so the isnode_in_* functions should return bool not int; and check "distance != 0"
>>> - spaces around operators eg. node=0 should be node = 0
>>
>> new webrev: http://cr.openjdk.java.net/~gromero/8175813/v4/
> 
> Looks good. Changes being pushed now.
> 
> David
> -----
> 
>>
>> Thank you and best regards,
>> Gustavo
>>
>>> Thanks,
>>> David
>>>
>>>> Can somebody please sponsor the change?
>>>>
>>>> Thank you and best regards,
>>>> Volker
>>>>
>>>>
>>>> On Wed, May 3, 2017 at 3:27 PM, Gustavo Romero
>>>> <gromero at linux.vnet.ibm.com> wrote:
>>>>> Hi community,
>>>>>
>>>>> I understand that there is nothing that can be done additionally regarding this
>>>>> issue, at this point, on the PPC64 side.
>>>>>
>>>>> It's a change in the shared code - but that in effect does not change anything in
>>>>> the numa detection mechanism for other platforms - and hence it needs a
>>>>> joint community effort to review the change and a sponsor to run it against
>>>>> the JPRT.
>>>>>
>>>>> I know it's a stabilizing moment of OpenJDK 9, but since that issue is of
>>>>> great concern on PPC64 (especially on POWER8 machines) I would be very glad if
>>>>> the community could point out directions on how that change could move on.
>>>>>
>>>>> Thank you!
>>>>>
>>>>> Best regards,
>>>>> Gustavo
>>>>>
>>>>> On 25-04-2017 23:49, Gustavo Romero wrote:
>>>>>> Dear Volker,
>>>>>>
>>>>>> On 24-04-2017 14:08, Volker Simonis wrote:
>>>>>>> Hi Gustavo,
>>>>>>>
>>>>>>> thanks for addressing this problem and sorry for my late reply. I
>>>>>>> think this is a good change which definitely improves the situation
>>>>>>> for uncommon NUMA configurations without changing the handling for
>>>>>>> common topologies.
>>>>>>
>>>>>> Thanks a lot for reviewing the change!
>>>>>>
>>>>>>
>>>>>>> It would be great if somebody could run this through JPRT, but as
>>>>>>> Gustavo mentioned, I don't expect any regressions.
>>>>>>>
>>>>>>> @Igor: I think you've been the original author of the NUMA-aware
>>>>>>> allocator port to Linux (i.e. "6684395: Port NUMA-aware allocator to
>>>>>>> linux"). If you could find some spare minutes to take a look at this
>>>>>>> change, your comment would be very much appreciated :)
>>>>>>>
>>>>>>> Following some minor comments from me:
>>>>>>>
>>>>>>> - in os::numa_get_groups_num() you now use numa_num_configured_nodes()
>>>>>>> to get the actual number of configured nodes. This is good and
>>>>>>> certainly an improvement over the previous implementation. However,
>>>>>>> the man page for numa_num_configured_nodes() mentions that the
>>>>>>> returned count may contain currently disabled nodes. Do we currently
>>>>>>> handle disabled nodes? What will be the consequence if we would use
>>>>>>> such a disabled node (e.g. mbind() warnings)?
>>>>>>
>>>>>> In [1] 'numa_memnode_ptr' is set to keep a list of just the nodes with memory
>>>>>> found in /sys/devices/system/node/*. Hence numa_num_configured_nodes() just
>>>>>> returns the number of nodes in 'numa_memnode_ptr' [2], thus just returns the
>>>>>> number of nodes with memory in the system. To the best of my knowledge there is
>>>>>> no system configuration on Linux/PPC64 that could match such a notion of
>>>>>> "disabled nodes" as it appears in libnuma's manual. If it is enabled, it's in
>>>>>> that dir and just the ones with memory will be taken into account. If it's
>>>>>> disabled (somehow), it's not in the dir, so won't be taken into account (i.e. no
>>>>>> mbind() tried against it).
>>>>>>
>>>>>> On Power it's possible to have a numa node without memory (memory-less node, a
>>>>>> case covered in this change), a numa node without cpus at all but with memory
>>>>>> (a configured node anyway, so a case already covered) but to disable a specific
>>>>>> numa node so it does not appear in /sys/devices/system/node/* it's only possible
>>>>>> from the internals of the control module, or some other rare condition not visible /
>>>>>> adjustable from the OS. Also I'm not aware of a case where a node is in this
>>>>>> dir but is at the same time flagged as something like "disabled". There are
>>>>>> cpu/memory hotplugs, but that does not change numa nodes status AFAIK.
>>>>>>
>>>>>> [1] https://github.com/numactl/numactl/blob/master/libnuma.c#L334-L347
>>>>>> [2] https://github.com/numactl/numactl/blob/master/libnuma.c#L614-L618
>>>>>>
>>>>>>
>>>>>>> - the same question applies to the usage of
>>>>>>> Linux::isnode_in_configured_nodes() within os::numa_get_leaf_groups().
>>>>>>> Does isnode_in_configured_nodes() (i.e. the node set defined by
>>>>>>> 'numa_all_nodes_ptr') take into account the disabled nodes or not? Can
>>>>>>> this be a potential problem (i.e. if we use a disabled node)?
>>>>>>
>>>>>> On the meaning of "disabled nodes", it's the same case as above, so to the
>>>>>> best of my knowledge it's not a potential problem.
>>>>>>
>>>>>> Anyway 'numa_all_nodes_ptr' just includes the configured nodes (with memory),
>>>>>> i.e. "all nodes on which the calling task may allocate memory". It's exactly
>>>>>> the same pointer returned by numa_get_membind() v2 [3] which:
>>>>>>
>>>>>> "returns the mask of nodes from which memory can currently be allocated"
>>>>>>
>>>>>> and that is used, for example, in "numactl --show" to show nodes from where
>>>>>> memory can be allocated [4, 5].
>>>>>>
>>>>>> [3] https://github.com/numactl/numactl/blob/master/libnuma.c#L1147
>>>>>> [4] https://github.com/numactl/numactl/blob/master/numactl.c#L144
>>>>>> [5] https://github.com/numactl/numactl/blob/master/numactl.c#L177
>>>>>>
>>>>>>
>>>>>>> - I'd like to suggest renaming the 'index' part of the following
>>>>>>> variables and functions to 'nindex' ('node_index' is probably too long)
>>>>>>> in the following code, to emphasize that we have node indexes pointing
>>>>>>> to actual, not always consecutive node numbers:
>>>>>>>
>>>>>>> 2879         // Create an index -> node mapping, since nodes are not
>>>>>>> always consecutive
>>>>>>> 2880         _index_to_node = new (ResourceObj::C_HEAP, mtInternal)
>>>>>>> GrowableArray<int>(0, true);
>>>>>>> 2881         rebuild_index_to_node_map();
>>>>>>
>>>>>> Simple change but much better to read indeed. Done.
>>>>>>
>>>>>>
>>>>>>> - can you please wrap the following one-line else statement into curly
>>>>>>> braces (it's more readable and we usually do it that way in HotSpot
>>>>>>> although there are no formal style guidelines :)
>>>>>>>
>>>>>>> 2953      } else
>>>>>>> 2954        // Current node is already a configured node.
>>>>>>> 2955        closest_node = index_to_node()->at(i);
>>>>>>
>>>>>> Done.
>>>>>>
>>>>>>
>>>>>>> - in os::Linux::rebuild_cpu_to_node_map(), if you set
>>>>>>> 'closest_distance' to INT_MAX at the beginning of the loop, you can
>>>>>>> later avoid the check for '|| !closest_distance'. Also, according to
>>>>>>> the man page, numa_distance() returns 0 if it can not determine the
>>>>>>> distance. So with the above change, the condition on line 2974 should
>>>>>>> read:
>>>>>>>
>>>>>>> 2947           if (distance && distance < closest_distance) {
>>>>>>>
>>>>>>
>>>>>> Sure, much better to set the initial condition as distant as possible and
>>>>>> adjust to a closer one bit by bit improving the if condition. Done.
>>>>>>
>>>>>>
>>>>>>> Finally, and not directly related to your change, I'd suggest the
>>>>>>> following clean-ups:
>>>>>>>
>>>>>>> - remove the usage of 'NCPUS = 32768' in
>>>>>>> os::Linux::rebuild_cpu_to_node_map(). The comment on that line is
>>>>>>> unclear to me and probably related to an older version/problem of
>>>>>>> libnuma? I think we should simply use
>>>>>>> numa_allocate_cpumask()/numa_free_cpumask() instead.
>>>>>>>
>>>>>>> - we still use the NUMA version 1 function prototypes (e.g.
>>>>>>> "numa_node_to_cpus(int node, unsigned long *buffer, int buffer_len)"
>>>>>>> instead of "numa_node_to_cpus(int node, struct bitmask *mask)", but
>>>>>>> also "numa_interleave_memory()" and maybe others). I think we should
>>>>>>> switch all prototypes to the new NUMA version 2 interface which you've
>>>>>>> already used for the new functions which you've added.
>>>>>>
>>>>>> I agree. Could I open a new bug to address these clean-ups?
>>>>>>
>>>>>>
>>>>>>> That said, I think these changes all require libnuma 2.0 (see
>>>>>>> os::Linux::libnuma_dlsym). So before starting this, you should make
>>>>>>> sure that libnuma 2.0 is available on all platforms to which you'd
>>>>>>> like to down-port this change. For jdk10 we could definitely do it,
>>>>>>> for jdk9 probably also, for jdk8 I'm not so sure.
>>>>>>
>>>>>> The last libnuma v1 release dates back to 2008, but any idea how I could check that
>>>>>> for sure, since it's in shared code?
>>>>>>
>>>>>> new webrev: http://cr.openjdk.java.net/~gromero/8175813/v3/
>>>>>>
>>>>>> Thank you!
>>>>>>
>>>>>> Best regards,
>>>>>> Gustavo
>>>>>>
>>>>>>
>>>>>>> Regards,
>>>>>>> Volker
>>>>>>>
>>>>>>> On Thu, Apr 13, 2017 at 12:51 AM, Gustavo Romero
>>>>>>> <gromero at linux.vnet.ibm.com> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Any update on it?
>>>>>>>>
>>>>>>>> Thank you.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Gustavo
>>>>>>>>
>>>>>>>> On 09-03-2017 16:33, Gustavo Romero wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Could the following webrev be reviewed please?
>>>>>>>>>
>>>>>>>>> It improves the numa node detection when non-consecutive or memory-less nodes
>>>>>>>>> exist in the system.
>>>>>>>>>
>>>>>>>>> webrev: http://cr.openjdk.java.net/~gromero/8175813/v2/
>>>>>>>>> bug   : https://bugs.openjdk.java.net/browse/JDK-8175813
>>>>>>>>>
>>>>>>>>> Currently, although no problem exists when the JVM detects numa nodes that are
>>>>>>>>> consecutive and have memory, for example in a numa topology like:
>>>>>>>>>
>>>>>>>>> available: 2 nodes (0-1)
>>>>>>>>> node 0 cpus: 0 8 16 24 32
>>>>>>>>> node 0 size: 65258 MB
>>>>>>>>> node 0 free: 34 MB
>>>>>>>>> node 1 cpus: 40 48 56 64 72
>>>>>>>>> node 1 size: 65320 MB
>>>>>>>>> node 1 free: 150 MB
>>>>>>>>> node distances:
>>>>>>>>> node   0   1
>>>>>>>>>   0:  10  20
>>>>>>>>>   1:  20  10,
>>>>>>>>>
>>>>>>>>> it fails on detecting numa nodes to be used in the Parallel GC in a numa
>>>>>>>>> topology like:
>>>>>>>>>
>>>>>>>>> available: 4 nodes (0-1,16-17)
>>>>>>>>> node 0 cpus: 0 8 16 24 32
>>>>>>>>> node 0 size: 130706 MB
>>>>>>>>> node 0 free: 7729 MB
>>>>>>>>> node 1 cpus: 40 48 56 64 72
>>>>>>>>> node 1 size: 0 MB
>>>>>>>>> node 1 free: 0 MB
>>>>>>>>> node 16 cpus: 80 88 96 104 112
>>>>>>>>> node 16 size: 130630 MB
>>>>>>>>> node 16 free: 5282 MB
>>>>>>>>> node 17 cpus: 120 128 136 144 152
>>>>>>>>> node 17 size: 0 MB
>>>>>>>>> node 17 free: 0 MB
>>>>>>>>> node distances:
>>>>>>>>> node   0   1  16  17
>>>>>>>>>   0:  10  20  40  40
>>>>>>>>>   1:  20  10  40  40
>>>>>>>>>  16:  40  40  10  20
>>>>>>>>>  17:  40  40  20  10,
>>>>>>>>>
>>>>>>>>> where node 16 is not consecutive in relation to 1 and also nodes 1 and 17 have
>>>>>>>>> no memory.
>>>>>>>>>
>>>>>>>>> If a topology like that exists, os::numa_make_local() will receive a local group
>>>>>>>>> id as a hint that is not available in the system to be bound (it will receive
>>>>>>>>> all nodes from 0 to 17), causing a proliferation of "mbind: Invalid argument"
>>>>>>>>> messages:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~gromero/logs/jdk10_pristine.log
>>>>>>>>>
>>>>>>>>> That change improves the detection by making the JVM numa API aware of the
>>>>>>>>> existence of numa nodes that are non-consecutive from 0 to the highest node
>>>>>>>>> number and also of nodes that might be memory-less nodes, i.e. that might not
>>>>>>>>> be, in libnuma terms, a configured node. Hence just the configured nodes will
>>>>>>>>> be available:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~gromero/logs/jdk10_numa_patched.log
>>>>>>>>>
>>>>>>>>> The change has no effect on numa topologies where the problem does not occur,
>>>>>>>>> i.e. no change in the number of nodes and no change in the cpu to node map. On
>>>>>>>>> numa topologies where memory-less nodes exist (like in the last example above),
>>>>>>>>> cpus from a memory-less node won't be able to bind locally so they are mapped
>>>>>>>>> to the closest node, otherwise they would not be associated with any node and
>>>>>>>>> MutableNUMASpace::cas_allocate() would pick a node randomly, compromising the
>>>>>>>>> performance.
>>>>>>>>>
>>>>>>>>> I found no regressions on x64 for the following numa topology:
>>>>>>>>>
>>>>>>>>> available: 2 nodes (0-1)
>>>>>>>>> node 0 cpus: 0 1 2 3 8 9 10 11
>>>>>>>>> node 0 size: 24102 MB
>>>>>>>>> node 0 free: 19806 MB
>>>>>>>>> node 1 cpus: 4 5 6 7 12 13 14 15
>>>>>>>>> node 1 size: 24190 MB
>>>>>>>>> node 1 free: 21951 MB
>>>>>>>>> node distances:
>>>>>>>>> node   0   1
>>>>>>>>>   0:  10  21
>>>>>>>>>   1:  21  10
>>>>>>>>>
>>>>>>>>> I understand that fixing the current numa detection is a prerequisite to enabling
>>>>>>>>> UseNUMA by default [1] and to extending the numa-aware allocation to the G1 GC [2].
>>>>>>>>>
>>>>>>>>> Thank you.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Gustavo
>>>>>>>>>
>>>>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8046153 (JEP 163: Enable NUMA Mode by Default When Appropriate)
>>>>>>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8046147 (JEP 157: G1 GC: NUMA-Aware Allocation)
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>
> 


From mikael.vidstedt at oracle.com  Tue May  9 21:29:13 2017
From: mikael.vidstedt at oracle.com (Mikael Vidstedt)
Date: Tue, 9 May 2017 14:29:13 -0700
Subject: RFR(S): 8180003: Remove sys/ prefix from poll.h and signal.h includes
Message-ID: <A35A8531-58FE-4E4C-942D-7B563F9A4BD2@oracle.com>


Please review this small change which removes the sys/ prefix from a bunch of includes of poll.h and signal.h.

hotspot: http://cr.openjdk.java.net/~mikael/webrevs/8180003/webrev.00/hotspot/webrev/
jdk: http://cr.openjdk.java.net/~mikael/webrevs/8180003/webrev.00/jdk/webrev/

Using the sys/ prefix works on many platforms, but the POSIX spec makes it clear that the poll.h and signal.h header files should be included without the prefix.
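
A minimal sketch of the prefix-free spelling, assuming a POSIX system. The helper names are hypothetical and are not taken from the webrev; the only point is that <poll.h> and <signal.h> on their own provide everything the calls below need:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>     /* POSIX location, not <sys/poll.h> */
#include <signal.h>   /* POSIX location, not <sys/signal.h> */
#include <unistd.h>

/* Hypothetical helper: returns 1 if a pipe holding one queued byte is
 * reported readable by poll(), 0 if not, -1 if a system call fails. */
int readable_after_write(void) {
    int fds[2];
    if (pipe(fds) != 0) {
        return -1;
    }
    int result = -1;
    if (write(fds[1], "x", 1) == 1) {
        struct pollfd pfd;
        pfd.fd = fds[0];
        pfd.events = POLLIN;
        pfd.revents = 0;
        int n = poll(&pfd, 1, 0);   /* timeout 0: never blocks */
        if (n >= 0) {
            result = (n == 1 && (pfd.revents & POLLIN)) ? 1 : 0;
        }
    }
    close(fds[0]);
    close(fds[1]);
    return result;
}

/* Hypothetical helper: builds an empty signal set using only names
 * declared by <signal.h> and reports whether SIGINT is a member. */
int sigint_in_empty_set(void) {
    sigset_t set;
    if (sigemptyset(&set) != 0) {
        return -1;
    }
    return sigismember(&set, SIGINT);
}
```

If this compiles on a given platform without the sys/ variants, that platform's headers behave as the spec describes.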

I have verified that this change works on all the Oracle supported platforms, but I could use some help verifying it on AIX.

Cheers,
Mikael


From brian.burkhalter at oracle.com  Tue May  9 21:45:04 2017
From: brian.burkhalter at oracle.com (Brian Burkhalter)
Date: Tue, 9 May 2017 14:45:04 -0700
Subject: RFR(S): 8180003: Remove sys/ prefix from poll.h and signal.h
 includes
In-Reply-To: <A35A8531-58FE-4E4C-942D-7B563F9A4BD2@oracle.com>
References: <A35A8531-58FE-4E4C-942D-7B563F9A4BD2@oracle.com>
Message-ID: <83DEDA3B-2BD5-4F99-A3BE-2F3AE8F2C39B@oracle.com>


On May 9, 2017, at 2:29 PM, Mikael Vidstedt <mikael.vidstedt at oracle.com> wrote:

> Please review this small change which removes the sys/ prefix from a bunch of includes of poll.h and signal.h.
> 
> hotspot: http://cr.openjdk.java.net/~mikael/webrevs/8180003/webrev.00/hotspot/webrev/
> jdk: http://cr.openjdk.java.net/~mikael/webrevs/8180003/webrev.00/jdk/webrev/

The JDK NIO changes look fine at least.

> Using the sys/ prefix works on many platforms, but the posix spec makes it clear that the poll.h and signal.h header files should be included without the prefix.

Just had to look ...

[1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/poll.h.html
[2] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html

> I have verified that this change works on all the Oracle supported platforms, but I could use some help verifying it on AIX.

Good about the Oracle platforms.

Brian

From david.holmes at oracle.com  Tue May  9 23:19:23 2017
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 10 May 2017 09:19:23 +1000
Subject: RFR(S): 8180003: Remove sys/ prefix from poll.h and signal.h
 includes
In-Reply-To: <A35A8531-58FE-4E4C-942D-7B563F9A4BD2@oracle.com>
References: <A35A8531-58FE-4E4C-942D-7B563F9A4BD2@oracle.com>
Message-ID: <3e50ead4-6b81-c0f2-1654-847390981357@oracle.com>

Hi Mikael,

To repeat myself from:

http://mail.openjdk.java.net/pipermail/portola-dev/2017-April/000025.html

Changes look okay. I agree with the rationale.

Looking at actual implementations, Linux and macOS are trivially fine
(poll.h just includes sys/poll.h). Solaris is non-trivially fine -
poll.h does more than what sys/poll.h does, but nothing that affects our 
sources.

Thanks,
David

:)

On 10/05/2017 7:29 AM, Mikael Vidstedt wrote:
>
> Please review this small change which removes the sys/ prefix from a bunch of includes of poll.h and signal.h.
>
> hotspot: http://cr.openjdk.java.net/~mikael/webrevs/8180003/webrev.00/hotspot/webrev/
> jdk: http://cr.openjdk.java.net/~mikael/webrevs/8180003/webrev.00/jdk/webrev/
>
> Using the sys/ prefix works on many platforms, but the posix spec makes it clear that the poll.h and signal.h header files should be included without the prefix.
>
> I have verified that this change works on all the Oracle supported platforms, but I could use some help verifying it on AIX.
>
> Cheers,
> Mikael
>

From HORIE at jp.ibm.com  Thu May 11 06:46:32 2017
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Thu, 11 May 2017 15:46:32 +0900
Subject: Optimizing byte reverse code for int value
In-Reply-To: <174bf72968b5473cb3757a4f1c125bf7@sap.com>
References: <622fa3e77da546dfb5155a1e4afacd7c@sap.com>,
 <362a21f4-277c-c3f3-f7f0-08b55c8b2b0b@redhat.com>
 <OFAF77E611.37600CDE-ON002580FB.001DEF7C-492580FB.0020094D@notes.na.collabserv.com>
 <acab7aa6-3685-d4fc-e93a-33099251f830@redhat.com>
 <OF6F0869C1.8027F3E2-ON492580FB.00600291-492580FB.006210B0@notes.na.collabserv.com>
 <OFC1B580DF.9ABD9B32-ON002580FF.0039A3FB-002580FF.003A49AC@notes.na.collabserv.com>
 <89abbea5-9998-2e4d-62d3-e1f3e9bbd1d5@redhat.com>
 <OFA45720EF.81931F53-ON002580FF.00507F13-002580FF.0051FAFE@notes.na.collabserv.com>
 <OF7EC4D38F.9CDD929F-ON00258109.0056ADAD-00258109.005994CC@notes.na.collabserv.com>
 <2e13a32b56cd4d9f89758f4042602e9a@sap.com>
 <OFEF526CAC.E20698FB-ON0025810D.0042D5D1-4925810E.00115AC1@notes.na.collabserv.com>
 <174bf72968b5473cb3757a4f1c125bf7@sap.com>
Message-ID: <OF9C800689.BAD95653-ON0025811D.0024B441-4925811D.00253875@notes.na.collabserv.com>


Martin,

Thanks a lot for your helpful comments. I fixed my code.
http://cr.openjdk.java.net/~horii/8178294/webrev.06/

> @Andrew: Do you think this is the right way to do it and is there a chance
> to get it in jdk8u?
Andrew, I would be grateful if you would approve this change for jdk8u.

Best regards,
--
Michihiro,
IBM Research - Tokyo


From:	"Doerr, Martin" <martin.doerr at sap.com>
To:	Michihiro Horie/Japan/IBM at IBMJP, "aph at redhat.com"
            <aph at redhat.com>
Cc:	Gustavo Bueno Romero <gromero at br.ibm.com>, Hiroshi H
            Horii/Japan/IBM at IBMJP, "hotspot-dev at openjdk.java.net"
            <hotspot-dev at openjdk.java.net>,
            "ppc-aix-port-dev at openjdk.java.net"
            <ppc-aix-port-dev at openjdk.java.net>, "Simonis, Volker"
            <volker.simonis at sap.com>
Date:	2017/04/26 18:04
Subject:	RE: Optimizing byte reverse code for int value


Hi Michihiro,

this looks better, now.

Just a few comments:
  - I think "UseUnalignedAccesses" should be used instead of
    #ifdef SPARC. Other platforms can also be affected.
  - In theory, I think that an ordered load may get matched
    which would get replaced by an unordered one. I guess this would
    probably never occur, but I think such changes should be absolutely
    bullet proof :-)

Besides that, it looks correct to me.

@Andrew: Do you think this is the right way to do it and is there a chance
to get it in jdk8u?

Best regards,
Martin


From: Michihiro Horie [mailto:HORIE at jp.ibm.com]
Sent: Mittwoch, 26. April 2017 05:10
To: Doerr, Martin <martin.doerr at sap.com>
Cc: aph at redhat.com; Gustavo Bueno Romero <gromero at br.ibm.com>; Hiroshi H
Horii <HORII at jp.ibm.com>; hotspot-dev at openjdk.java.net;
ppc-aix-port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>
Subject: RE: Optimizing byte reverse code for int value


Martin,

Thanks a lot for your comments. I fixed my code.
Webrev: http://cr.openjdk.java.net/~horii/8178294/webrev.05/

Best regards,
--
Michihiro,
IBM Research - Tokyo


From: "Doerr, Martin" <martin.doerr at sap.com>
To: Michihiro Horie/Japan/IBM at IBMJP
Cc: "aph at redhat.com" <aph at redhat.com>, Hiroshi H Horii/Japan/IBM at IBMJP, "
hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>, "
ppc-aix-port-dev at openjdk.java.net" <ppc-aix-port-dev at openjdk.java.net>,
"Simonis, Volker" <volker.simonis at sap.com>, Gustavo Bueno Romero <
gromero at br.ibm.com>
Date: 2017/04/24 18:11
Subject: RE: Optimizing byte reverse code for int value


Hi Michihiro,

please note that I'm not a jdk8u reviewer.
However, I have taken a quick look and I have the following concerns:
            1. I think it's incorrect for Big Endian.
            2. The pattern can also match for an unaligned 4 byte address
            which would break platforms like SPARC.
            3. I couldn't see checks for shift amount and masks.

Best regards,
Martin


From: Michihiro Horie [mailto:HORIE at jp.ibm.com]
Sent: Freitag, 21. April 2017 18:18
To: Doerr, Martin <martin.doerr at sap.com>
Cc: aph at redhat.com; Hiroshi H Horii <HORII at jp.ibm.com>;
hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis,
Volker <volker.simonis at sap.com>; Gustavo Bueno Romero <gromero at br.ibm.com>
Subject: RE: Optimizing byte reverse code for int value

Would you review following change for jdk8?

Webrev: http://cr.openjdk.java.net/~horii/8178294/webrev.01/

Our byte-reverse optimization now works in shared code. I tested it with
jtreg on x86, ppc64, and ppc64le.
Best regards,
--
Michihiro,
IBM Research - Tokyo


----- Original message -----
From: "Doerr, Martin" <martin.doerr at sap.com>
To: Michihiro Horie/Japan/IBM at IBMJP
Cc: "aph at redhat.com" <aph at redhat.com>, Hiroshi H Horii/Japan/IBM at IBMJP, "
hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>, "
ppc-aix-port-dev at openjdk.java.net" <ppc-aix-port-dev at openjdk.java.net>,
"Simonis, Volker" <volker.simonis at sap.com>
Subject: RE: Optimizing byte reverse code for int value
Date: Wed, Apr 12, 2017 12:13 AM


Hi Michihiro,


thanks for the quick reply.


I think Andrew's idea is to optimize in the shared code instead of the
platform backends. I haven't thought about where this could be done.


Or would it be possible to backport jdk (especially Unsafe) changes? If the
required changes are small enough and we don't have to touch any public
interface, this might be an option, too.


We'd appreciate it if you took care of the new match rules for PPC64. Thanks
a lot.


Best regards,


Martin


From: Michihiro Horie [mailto:HORIE at jp.ibm.com]
Sent: Dienstag, 11. April 2017 16:55
To: Doerr, Martin <martin.doerr at sap.com>
Cc: aph at redhat.com; Hiroshi H Horii <HORII at jp.ibm.com>;
hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis,
Volker <volker.simonis at sap.com>
Subject: RE: Optimizing byte reverse code for int value


Andrew, Martin,


Thanks a lot for your helpful feedback.


>Have you considered it as a generic optimization for all processors?


We would support all processors for our byte-reverse optimization to make
it generic.


Currently, I just finished adding match rules for little endian and big
endian on PPC64, and am testing it in AIX.


>In addition, I noticed that we don't have match rules which exploit byte
>reverse load/store instructions on PPC64.


We would like to handle adding match rules for byte reverse load/store
instructions on PPC64 for JDK10 if you would not mind.


Would it be fine with you?


Best regards,
--
Michihiro,
IBM Research - Tokyo


----- Original message -----
From: "Doerr, Martin" <martin.doerr at sap.com>
To: Andrew Haley <aph at redhat.com>, Michihiro Horie/Japan/IBM at IBMJP
Cc: "Simonis, Volker" <volker.simonis at sap.com>, "
ppc-aix-port-dev at openjdk.java.net" <ppc-aix-port-dev at openjdk.java.net>, "
hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>, Hiroshi H
Horii/Japan/IBM at IBMJP
Subject: RE: Optimizing byte reverse code for int value
Date: Tue, Apr 11, 2017 10:44 PM


Hi Andrew,

thank you for your helpful comments. I fully agree with you.

In addition, I noticed that we don't have match rules which exploit byte
reverse load/store instructions on PPC64.
SPARC already has them:
match(Set dst (ReverseBytesI/L/US/S (LoadI src)));
match(Set dst (StoreI dst (ReverseBytesI/L/US/S src)));
I think we should add them for jdk10. They should be used when the platform
endianness doesn't match the bigEndian parameter in Unsafe methods.
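
The relationship behind such rules can be sketched in plain C rather than in .ad syntax. The function names below are illustrative only and do not come from ppc.ad or the SPARC ad file. The sketch shows that reading four bytes in the "other" byte order gives the same value as reading them in one order and then swapping the bytes, which is the work a fused byte-reversing load would do in a single step:

```c
#include <assert.h>
#include <stdint.h>

/* Swap the four bytes of a 32-bit value; this is the operation a
 * ReverseBytesI node stands for. */
uint32_t reverse_bytes_u32(uint32_t v) {
    return (v >> 24) |
           ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) |
           (v << 24);
}

/* Assemble a value from four bytes, lowest address holding the least
 * significant byte (little-endian interpretation). */
uint32_t load_le32(const unsigned char *p) {
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* Same bytes, lowest address holding the most significant byte
 * (big-endian interpretation). */
uint32_t load_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) |
           (uint32_t)p[3];
}
```

For any buffer, load_be32(p) equals reverse_bytes_u32(load_le32(p)), independent of the endianness of the machine running the code.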

Best regards,
Martin


-----Original Message-----
From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf
Of Andrew Haley
Sent: Dienstag, 11. April 2017 13:02
To: Michihiro Horie <HORIE at jp.ibm.com>
Cc: Simonis, Volker <volker.simonis at sap.com>;
ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; Hiroshi H
Horii <HORII at jp.ibm.com>
Subject: Re: Optimizing byte reverse code for int value

On 11/04/17 11:36, Michihiro Horie wrote:
> Thank you very much for letting us know Unsafe.getIntUnaligned is available in
> JDK9. I do agree we should fix Java source code.
> We think our byte-reverse optimization would still work on jdk8u as Hiroshi
> mentioned. Would you agree on this point?

I do, but I do not agree that this patch should necessarily be done in
the PowerPC-specific back end. Have you considered it as a generic
optimization for all processors?

Andrew.



From Derek.White at cavium.com  Thu May 11 18:33:18 2017
From: Derek.White at cavium.com (White, Derek)
Date: Thu, 11 May 2017 18:33:18 +0000
Subject: Optimizing byte reverse code for int value
In-Reply-To: <4bdec074-3884-497e-ec86-f5a2dab6202f@redhat.com>
References: <622fa3e77da546dfb5155a1e4afacd7c@sap.com>
 <362a21f4-277c-c3f3-f7f0-08b55c8b2b0b@redhat.com>
 <OFAF77E611.37600CDE-ON002580FB.001DEF7C-492580FB.0020094D@notes.na.collabserv.com>
 <acab7aa6-3685-d4fc-e93a-33099251f830@redhat.com>
 <OF6F0869C1.8027F3E2-ON492580FB.00600291-492580FB.006210B0@notes.na.collabserv.com>
 <OFC1B580DF.9ABD9B32-ON002580FF.0039A3FB-002580FF.003A49AC@notes.na.collabserv.com>
 <89abbea5-9998-2e4d-62d3-e1f3e9bbd1d5@redhat.com>
 <OFA45720EF.81931F53-ON002580FF.00507F13-002580FF.0051FAFE@notes.na.collabserv.com>
 <OF7EC4D38F.9CDD929F-ON00258109.0056ADAD-00258109.005994CC@notes.na.collabserv.com>
 <2e13a32b56cd4d9f89758f4042602e9a@sap.com>
 <OFEF526CAC.E20698FB-ON0025810D.0042D5D1-4925810E.00115AC1@notes.na.collabserv.com>
 <174bf72968b5473cb3757a4f1c125bf7@sap.com>
 <OF9C800689.BAD95653-ON0025811D.0024B441-4925811D.00253875@notes.na.collabserv.com>
 <4bdec074-3884-497e-ec86-f5a2dab6202f@redhat.com>
Message-ID: <CY1PR0701MB163281F38DD1261AC2C6A62984ED0@CY1PR0701MB1632.namprd07.prod.outlook.com>

Hi Michihiro,

Not a jdk8u reviewer OR C2 expert, but a possible simplification:

I think a tree like:

 //     AndI
 //     /  \
 // LoadB   ConI(255)

will get turned into a LoadUBNode, via AndINode::Ideal() and AndINode::Identity(). It certainly should, considering how often this code pattern is used!

If so, you should be able to simplify your pattern matching greatly.
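
The arithmetic behind that folding can be checked in plain C. This is only a sketch of the value equivalence, not of the C2 node transformation itself, and the function names are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extending byte load widened to int, then masked with 255:
 * the LoadB + AndI(ConI(255)) shape from the tree above. */
int load_signed_then_mask(const int8_t *p) {
    return (int)*p & 255;
}

/* Zero-extending byte load: the value an unsigned byte load yields. */
int load_unsigned(const uint8_t *p) {
    return (int)*p;
}

/* Returns 1 if both forms agree for every possible byte value. */
int forms_agree_for_all_bytes(void) {
    for (int v = 0; v < 256; v++) {
        uint8_t raw = (uint8_t)v;
        /* Read the same byte through a signed view. */
        const int8_t *signed_view = (const int8_t *)&raw;
        if (load_signed_then_mask(signed_view) != load_unsigned(&raw)) {
            return 0;
        }
    }
    return 1;
}
```

Since the two forms agree on all 256 inputs, a matcher that sees the unsigned load no longer has to recognize the mask separately.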

 - Derek

-----Original Message-----
From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Andrew Haley
Sent: Thursday, May 11, 2017 5:02 AM
To: Michihiro Horie <HORIE at jp.ibm.com>; Doerr, Martin <martin.doerr at sap.com>
Cc: ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; Hiroshi H Horii <HORII at jp.ibm.com>; Simonis, Volker <volker.simonis at sap.com>
Subject: Re: Optimizing byte reverse code for int value

On 11/05/17 07:46, Michihiro Horie wrote:

> Thanks a lot for your helpful comments. I fixed my code.
> http://cr.openjdk.java.net/~horii/8178294/webrev.06/
> 
>> @Andrew: Do you think this is the right way to do it and is there a
>> chance to get it in jdk8u?
> Andrew, I would be grateful if you would approve this change for jdk8u.

The list of jdk8u reviewers is at
http://openjdk.java.net/census#jdk8u.  You'll want someone who is on the HotSpot team.

I have mixed feelings about this patch.  It seems too specific to me:
if you had something that would work with any integer type it would be more useful, I feel.  And - generally speaking - the rule is that patches go into JDK 9 first, but JDK 9 is closed for enhancements.

So, I'm sorry for the bad news.  Your patch looks interesting and useful but I do not know how to get it committed.

Andrew.


From thomas.stuefe at gmail.com  Tue May 16 12:50:32 2017
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Tue, 16 May 2017 14:50:32 +0200
Subject: RFR (xxs): 8180424: Another build issue on AIX after 8034174
Message-ID: <CAA-vtUxE4PA3QmS6JCTnRdjqmeQVqxEPk7d55yoRDmxVCfWV4Q@mail.gmail.com>

Hi all,

may I have a review for this tiny fix:

Issue: https://bugs.openjdk.java.net/browse/JDK-8180424
webrev:
http://cr.openjdk.java.net/~stuefe/webrevs/8180424-another-build-issue-on-aix-after-8034174/webrev.00/webrev/

The prototypes for NET_RecvFrom and NET_Accept do not match their
implementations for AIX since 8034174. This did not lead to an error in
jdk9 because there, the header (net_util_md.h) was not included by
aix_close.c. In JDK10, it is included and therefore does not build.

I believe this did not lead to a runtime error on jdk9, at least not for
the typical values involved; the mismatch is between int* and unsigned int*
(native socklen_t).
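
Why typical values are harmless can be sketched as follows, assuming socklen_t has the same size as int (checked at compile time below). The helper name is hypothetical and this is not code from net_util_md.h or aix_close.c:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

/* Assumption of this sketch: socklen_t is int-sized. */
_Static_assert(sizeof(socklen_t) == sizeof(int),
               "sketch assumes socklen_t is int-sized");

/* Copy the object representation of an int length into a socklen_t,
 * mimicking a callee that reads the caller's int through a
 * socklen_t pointer. */
socklen_t read_int_as_socklen(int len) {
    socklen_t out;
    memcpy(&out, &len, sizeof out);
    return out;
}
```

Non-negative lengths such as the size of a socket address structure come through unchanged; only a negative int would be seen as a very large unsigned length.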

Kind Regards, Thomas

From christoph.langer at sap.com  Wed May 17 15:11:26 2017
From: christoph.langer at sap.com (Langer, Christoph)
Date: Wed, 17 May 2017 15:11:26 +0000
Subject: RFR (xxs): 8180424: Another build issue on AIX after 8034174
In-Reply-To: <CAA-vtUxE4PA3QmS6JCTnRdjqmeQVqxEPk7d55yoRDmxVCfWV4Q@mail.gmail.com>
References: <CAA-vtUxE4PA3QmS6JCTnRdjqmeQVqxEPk7d55yoRDmxVCfWV4Q@mail.gmail.com>
Message-ID: <57ba6104846c4ca2b8fd496f119ee853@sap.com>

Hi Thomas,

this looks good and should definitely be downported to JDK9 as soon as possible.

Best regards
Christoph

From: ppc-aix-port-dev [mailto:ppc-aix-port-dev-bounces at openjdk.java.net] On Behalf Of Thomas Stüfe
Sent: Dienstag, 16. Mai 2017 14:51
To: ppc-aix-port-dev at openjdk.java.net; net-dev <net-dev at openjdk.java.net>
Subject: RFR (xxs): 8180424: Another build issue on AIX after 8034174

Hi all,

may I have a review for this tiny fix:

Issue: https://bugs.openjdk.java.net/browse/JDK-8180424
webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8180424-another-build-issue-on-aix-after-8034174/webrev.00/webrev/

The prototypes for NET_RecvFrom and NET_Accept do not match their implementations for AIX since 8034174. This did not lead to an error in jdk9 because there, the header (net_util_md.h) was not included by aix_close.c. In JDK10, it is included and therefore does not build.

I believe this did not lead to a runtime error on jdk9, at least not for the typical values involved; the mismatch is between int* and unsigned int* (native socklen_t).

Kind Regards, Thomas

From vyom.tewari at oracle.com  Wed May 17 15:17:26 2017
From: vyom.tewari at oracle.com (Vyom Tewari)
Date: Wed, 17 May 2017 20:47:26 +0530
Subject: RFR (xxs): 8180424: Another build issue on AIX after 8034174
In-Reply-To: <CAA-vtUxE4PA3QmS6JCTnRdjqmeQVqxEPk7d55yoRDmxVCfWV4Q@mail.gmail.com>
References: <CAA-vtUxE4PA3QmS6JCTnRdjqmeQVqxEPk7d55yoRDmxVCfWV4Q@mail.gmail.com>
Message-ID: <dfbcd3f8-1956-34d2-8549-6921e9a0d0a1@oracle.com>

Hi Thomas,

The fix looks good to me, but I am not a jdk10 reviewer.

Thanks,

Vyom


On Tuesday 16 May 2017 06:20 PM, Thomas Stüfe wrote:
> Hi all,
>
> may I have a review for this tiny fix:
>
> Issue: https://bugs.openjdk.java.net/browse/JDK-8180424
> webrev: 
> http://cr.openjdk.java.net/~stuefe/webrevs/8180424-another-build-issue-on-aix-after-8034174/webrev.00/webrev/
>
> The prototypes for NET_RecvFrom and NET_Accept do not match their 
> implementations for AIX since 8034174. This did not lead to an error 
> in jdk9 because there, the header (net_util_md.h) was not included by 
> aix_close.c. In JDK10, it is included and therefore does not build.
>
> I believe this did not lead to a runtime error on jdk9, at least not 
> for the typical values involved; the mismatch is between int* and 
> unsigned int* (native socklen_t).
>
> Kind Regards, Thomas


From thomas.stuefe at gmail.com  Thu May 18 10:07:55 2017
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Thu, 18 May 2017 12:07:55 +0200
Subject: RFR (xxs): 8180424: Another build issue on AIX after 8034174
In-Reply-To: <dfbcd3f8-1956-34d2-8549-6921e9a0d0a1@oracle.com>
References: <CAA-vtUxE4PA3QmS6JCTnRdjqmeQVqxEPk7d55yoRDmxVCfWV4Q@mail.gmail.com>
 <dfbcd3f8-1956-34d2-8549-6921e9a0d0a1@oracle.com>
Message-ID: <CAA-vtUyyJa0Ji7PKK2TVbR1EuksGCad8ycZrwOcZpc=-YYRKpw@mail.gmail.com>

Thanks guys!

I requested a fix for jdk9. Let's see how that goes.

Best Regards, Thomas

On Wed, May 17, 2017 at 5:17 PM, Vyom Tewari <vyom.tewari at oracle.com> wrote:

> Hi Thomas,
>
> fix look good to me, but i am not jdk10 reviewer.
>
> Thanks,
>
> Vyom
>
>
> On Tuesday 16 May 2017 06:20 PM, Thomas Stüfe wrote:
>
>> Hi all,
>>
>> may I have a review for this tiny fix:
>>
>> Issue: https://bugs.openjdk.java.net/browse/JDK-8180424
>> webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8180424-another-build-issue-on-aix-after-8034174/webrev.00/webrev/
>>
>> The prototypes for NET_RecvFrom and NET_Accept do not match their
>> implementations for AIX since 8034174. This did not lead to an error in
>> jdk9 because there, the header (net_util_md.h) was not included by
>> aix_close.c. In JDK10, it is included and therefore does not build.
>>
>> I believe this did not lead to a runtime error on jdk9, at least not for
>> the typical values involved; the mismatch is between int* and unsigned int*
>> (native socklen_t).
>>
>> Kind Regards, Thomas
>>
>
>

From mikael.vidstedt at oracle.com  Fri May 26 21:22:34 2017
From: mikael.vidstedt at oracle.com (Mikael Vidstedt)
Date: Fri, 26 May 2017 14:22:34 -0700
Subject: RFR(S): 8180184: Add DATA and FSIZE to os::Posix::print_rlimit_info
Message-ID: <17B78E0D-C38B-4C8C-9F52-FA572563D485@oracle.com>


Please review the following fix which adds RLIMIT_DATA and RLIMIT_FSIZE to the rlimit-related data in the crash dump, and cleans up/unifies some of the related code.

Bug: https://bugs.openjdk.java.net/browse/JDK-8180184
Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8180184/webrev.00/hotspot/webrev/

Tested using JPRT. Manually verified that the crash dump contains the expected information.

Thanks to Thomas for helping verify that the change works as expected on AIX as well!

Cheers,
Mikael


From david.holmes at oracle.com  Mon May 29 07:10:52 2017
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 29 May 2017 17:10:52 +1000
Subject: RFR(S): 8180184: Add DATA and FSIZE to
 os::Posix::print_rlimit_info
In-Reply-To: <17B78E0D-C38B-4C8C-9F52-FA572563D485@oracle.com>
References: <17B78E0D-C38B-4C8C-9F52-FA572563D485@oracle.com>
Message-ID: <92b540d7-dc21-047a-a1e8-8c191a3f7b74@oracle.com>

Hi Mikael,

Looks okay - good to see the code sharing.

I wonder if the C compiler converts x/20124 into x >>10 ? :)

Cheers,
David

On 27/05/2017 7:22 AM, Mikael Vidstedt wrote:
> 
> Please review the following fix which adds RLIMIT_DATA and RLIMIT_FSIZE to the rlimit-related data in the crash dump, and cleans up/unifies some of the related code.
> 
> Bug: https://bugs.openjdk.java.net/browse/JDK-8180184
> Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8180184/webrev.00/hotspot/webrev/
> 
> Tested using JPRT. Manually verified that the crash dump contains the expected information.
> 
> Thanks to Thomas for helping verify that the change works as expected on AIX as well!
> 
> Cheers,
> Mikael
> 

From david.holmes at oracle.com  Mon May 29 07:12:06 2017
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 29 May 2017 17:12:06 +1000
Subject: RFR(S): 8180184: Add DATA and FSIZE to
 os::Posix::print_rlimit_info
In-Reply-To: <92b540d7-dc21-047a-a1e8-8c191a3f7b74@oracle.com>
References: <17B78E0D-C38B-4C8C-9F52-FA572563D485@oracle.com>
 <92b540d7-dc21-047a-a1e8-8c191a3f7b74@oracle.com>
Message-ID: <9e60e467-6cc1-62a9-f84d-7fc4cd03fc0e@oracle.com>

On 29/05/2017 5:10 PM, David Holmes wrote:
> Hi Mikael,
> 
> Looks okay - good to see the code sharing.
> 
> I wonder if the C compiler converts x/20124 into x >>10 ? :)

Don't know what happened there: x/1024 into x>>10 :)
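
For what it is worth, the two forms give the same result for unsigned operands, which is the case for rlim_t since POSIX defines it as an unsigned integer type. A small sketch with made-up function names, not taken from the webrev:

```c
#include <assert.h>

/* Bytes to kilobytes by division. */
unsigned long kb_by_division(unsigned long bytes) {
    return bytes / 1024;
}

/* Bytes to kilobytes by shifting; same result for unsigned operands. */
unsigned long kb_by_shift(unsigned long bytes) {
    return bytes >> 10;
}

/* Signed division truncates toward zero, so for negative values a
 * plain shift is not a drop-in replacement. */
long signed_kb_by_division(long bytes) {
    return bytes / 1024;
}
```

A compiler is therefore free to pick the shift for the unsigned case, while the signed case needs extra care for negative inputs.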

David

> Cheers,
> David
> 
> On 27/05/2017 7:22 AM, Mikael Vidstedt wrote:
>>
>> Please review the following fix which adds RLIMIT_DATA and 
>> RLIMIT_FSIZE to the rlimit-related data in the crash dump, and cleans 
>> up/unifies some of the related code.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8180184
>> Webrev: 
>> http://cr.openjdk.java.net/~mikael/webrevs/8180184/webrev.00/hotspot/webrev/
>>
>>
>> Tested using JPRT. Manually verified that the crash dump contains the 
>> expected information.
>>
>> Thanks to Thomas for helping verify that the change works as expected 
>> on AIX as well!
>>
>> Cheers,
>> Mikael
>>

From thomas.stuefe at gmail.com  Mon May 29 08:22:39 2017
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Mon, 29 May 2017 10:22:39 +0200
Subject: RFR(S): 8180184: Add DATA and FSIZE to
 os::Posix::print_rlimit_info
In-Reply-To: <17B78E0D-C38B-4C8C-9F52-FA572563D485@oracle.com>
References: <17B78E0D-C38B-4C8C-9F52-FA572563D485@oracle.com>
Message-ID: <CAA-vtUxbVrNWfcZr5tr-_AAm+y2fvxbeQQAgW-WK=mvgLY8=EQ@mail.gmail.com>

Hi Mikael,

looks fine.

Small nit, we never seem to check the return value of getrlimit(). But
seeing that the only way to fail getrlimit() would be to specify an invalid
limit constant, maybe this is ok.

Best Regards, Thomas
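
A minimal sketch of what checking the return value could look like, in C. This is not the code from the webrev; format_rlimit and its output format are hypothetical, and only getrlimit(), struct rlimit and RLIM_INFINITY are the standard POSIX pieces.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: formats the soft limit of one resource into buf.
 * Returns 0 on success, -1 if getrlimit() failed. */
static int format_rlimit(int resource, const char *name, char *buf, size_t len) {
    struct rlimit rl;
    assert(name != NULL && buf != NULL && len > 0);
    if (getrlimit(resource, &rl) != 0) {
        /* Report the failure instead of printing uninitialized data. */
        snprintf(buf, len, "%s: n/a", name);
        return -1;
    }
    if (rl.rlim_cur == RLIM_INFINITY) {
        snprintf(buf, len, "%s: infinity", name);
    } else {
        snprintf(buf, len, "%s: %lluk", name,
                 (unsigned long long)(rl.rlim_cur / 1024));
    }
    return 0;
}
```

The failure branch costs one extra line per limit and keeps the crash dump from showing garbage if a constant is ever not supported on some platform.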

On Fri, May 26, 2017 at 11:22 PM, Mikael Vidstedt <
mikael.vidstedt at oracle.com> wrote:

>
> Please review the following fix which adds RLIMIT_DATA and RLIMIT_FSIZE to
> the rlimit related data in the crash dump, and cleans up/unifies some of
> the related code.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8180184
> Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8180184/webrev.00/hotspot/webrev/
>
> Tested using JPRT. Manually verified that the crash dump contains the
> expected information.
>
> Thanks to Thomas for helping verify that the change works as expected on
> AIX as well!
>
> Cheers,
> Mikael
>
>

From thomas.stuefe at gmail.com  Tue May 30 09:46:41 2017
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Tue, 30 May 2017 11:46:41 +0200
Subject: JDK10: RFR(xxs): 8181207: 8177809 breaks AIX 5.3, 6.1 builds
Message-ID: <CAA-vtUyVyMSF08CNnWsX_gL9w77BZGz-wV9XPMarbO=gKZMX6Q@mail.gmail.com>

Hi all,

may I have please a review for this tiny change:

Bug: https://bugs.openjdk.java.net/browse/JDK-8181207
webrev:
http://cr.openjdk.java.net/~stuefe/webrevs/8181207-8177809-breaks-AIX-builds/webrev.00/webrev/

This reverts 8177809 for AIX because it leads to build errors on older AIX
systems. We want to retain the ability to build on older AIX releases.

Thanks, Thomas

From mikael.vidstedt at oracle.com  Wed May 31 00:20:43 2017
From: mikael.vidstedt at oracle.com (Mikael Vidstedt)
Date: Tue, 30 May 2017 17:20:43 -0700
Subject: RFR(S): 8180184: Add DATA and FSIZE to
 os::Posix::print_rlimit_info
In-Reply-To: <CAA-vtUxbVrNWfcZr5tr-_AAm+y2fvxbeQQAgW-WK=mvgLY8=EQ@mail.gmail.com>
References: <17B78E0D-C38B-4C8C-9F52-FA572563D485@oracle.com>
 <CAA-vtUxbVrNWfcZr5tr-_AAm+y2fvxbeQQAgW-WK=mvgLY8=EQ@mail.gmail.com>
Message-ID: <C42F2331-4B8C-48DA-9B62-EF96E2CF2D45@oracle.com>


David, I assume that the C++ compilers will all convert it to a simple shift by ten, but I'm not going to verify it :)

Thomas, I agree that in theory the return value from getrlimit should be checked, but I chose not to make any further modifications as part of this change.

Thanks to both of you for the reviews!

Cheers,
Mikael

> On May 29, 2017, at 1:22 AM, Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
> 
> Hi Mikael,
> 
> looks fine. 
> 
> Small nit, we never seem to check the return value of getrlimit(). But seeing that the only way to fail getrlimit() would be to specify an invalid limit constant, maybe this is ok.
> 
> Best Regards, Thomas
> 
> On Fri, May 26, 2017 at 11:22 PM, Mikael Vidstedt <mikael.vidstedt at oracle.com> wrote:
> 
> Please review the following fix which adds RLIMIT_DATA and RLIMIT_FSIZE to the rlimit related data in the crash dump, and cleans up/unifies some of the related code.
> 
> Bug: https://bugs.openjdk.java.net/browse/JDK-8180184
> Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8180184/webrev.00/hotspot/webrev/
> 
> Tested using JPRT. Manually verified that the crash dump contains the expected information.
> 
> Thanks to Thomas for helping verify that the change works as expected on AIX as well!
> 
> Cheers,
> Mikael
> 
> 


From vyom.tewari at oracle.com  Wed May 31 04:13:23 2017
From: vyom.tewari at oracle.com (Vyom Tewari)
Date: Wed, 31 May 2017 09:43:23 +0530
Subject: JDK10: RFR(xxs): 8181207: 8177809 breaks AIX 5.3, 6.1 builds
In-Reply-To: <CAA-vtUyVyMSF08CNnWsX_gL9w77BZGz-wV9XPMarbO=gKZMX6Q@mail.gmail.com>
References: <CAA-vtUyVyMSF08CNnWsX_gL9w77BZGz-wV9XPMarbO=gKZMX6Q@mail.gmail.com>
Message-ID: <489a79d1-d484-e5ae-e33d-7b32ec307a64@oracle.com>

Hi Thomas,

Change looks good to me, but I am not an official reviewer.

Thanks,

Vyom


On Tuesday 30 May 2017 03:16 PM, Thomas Stüfe wrote:
> Hi all,
>
> may I have please a review for this tiny change:
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8181207
> webrev:
> http://cr.openjdk.java.net/~stuefe/webrevs/8181207-8177809-breaks-AIX-builds/webrev.00/webrev/
>
> This reverts 8177809 for AIX because it leads to build errors on older AIX
> systems. We want to retain the ability to build on older AIX releases.
>
> Thanks, Thomas


From thomas.stuefe at gmail.com  Wed May 31 05:46:10 2017
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Wed, 31 May 2017 07:46:10 +0200
Subject: JDK10: RFR(xxs): 8181207: 8177809 breaks AIX 5.3, 6.1 builds
In-Reply-To: <489a79d1-d484-e5ae-e33d-7b32ec307a64@oracle.com>
References: <CAA-vtUyVyMSF08CNnWsX_gL9w77BZGz-wV9XPMarbO=gKZMX6Q@mail.gmail.com>
 <489a79d1-d484-e5ae-e33d-7b32ec307a64@oracle.com>
Message-ID: <CAA-vtUyhiiEcMnw54BaOu6UJSTZZ7rjDvtS_wa7zXfHNLaJ90A@mail.gmail.com>

Thank you Vyom!

On Wed, May 31, 2017 at 6:13 AM, Vyom Tewari <vyom.tewari at oracle.com> wrote:

> Hi Thomas,
>
> Change looks good to me, but I am not an official reviewer.
>
> Thanks,
>
> Vyom
>
>
>
> On Tuesday 30 May 2017 03:16 PM, Thomas Stüfe wrote:
>
>> Hi all,
>>
>> may I have please a review for this tiny change:
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8181207
>> webrev:
>> http://cr.openjdk.java.net/~stuefe/webrevs/8181207-8177809-breaks-AIX-builds/webrev.00/webrev/
>>
>> This reverts 8177809 for AIX because it leads to build errors on older AIX
>> systems. We want to retain the ability to build on older AIX releases.
>>
>> Thanks, Thomas
>>
>
>

From volker.simonis at gmail.com  Wed May 31 08:49:27 2017
From: volker.simonis at gmail.com (Volker Simonis)
Date: Wed, 31 May 2017 10:49:27 +0200
Subject: JDK10: RFR(xxs): 8181207: 8177809 breaks AIX 5.3, 6.1 builds
In-Reply-To: <CAA-vtUyVyMSF08CNnWsX_gL9w77BZGz-wV9XPMarbO=gKZMX6Q@mail.gmail.com>
References: <CAA-vtUyVyMSF08CNnWsX_gL9w77BZGz-wV9XPMarbO=gKZMX6Q@mail.gmail.com>
Message-ID: <CA+3eh11SQoQCyNFsN==o3S2dhc1i6TOy1TSNGdrpgp9nFvBJZw@mail.gmail.com>

Hi Thomas,

as far as I can see, AIX supports both, the st_[a,c,m]time members in
the stat64 structure for seconds and the corresponding
st_[a,c,m]time_n members for nanosecond resolution since at least 5.3.
Can you please use both - there's no reason to discriminate AIX here
:)

Also, can you please change the code such that we have:

#ifdef MACOSX
...
#else
#ifdef AIX
...
#else
...
#endif
#endif

I don't really like using "ifndef XXX" for everything else except XXX.

Thank you and best regards,
Volker
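
To illustrate the point about using both members, here is a small sketch in C of how a seconds value and a nanosecond remainder combine into milliseconds. The helper name to_millis is hypothetical; on AIX the two inputs would come from st_mtime and st_mtime_n as described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: combines whole seconds and a nanosecond remainder
 * into milliseconds since the epoch. */
static int64_t to_millis(int64_t seconds, int64_t nanos) {
    assert(nanos >= 0 && nanos < 1000000000);
    return seconds * 1000 + nanos / 1000000;
}
```

Dropping the nanosecond part would simply truncate every timestamp to a whole second, which is the loss of resolution that using both members avoids.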


On Tue, May 30, 2017 at 11:46 AM, Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
> Hi all,
>
> may I have please a review for this tiny change:
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8181207
> webrev:
> http://cr.openjdk.java.net/~stuefe/webrevs/8181207-8177809-breaks-AIX-builds/webrev.00/webrev/
>
> This reverts 8177809 for AIX because it leads to build errors on older AIX
> systems. We want to retain the ability to build on older AIX releases.
>
> Thanks, Thomas

From HORIE at jp.ibm.com  Wed May 31 12:36:27 2017
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Wed, 31 May 2017 20:36:27 +0800
Subject: 8179527: Implement intrinsic code for reverseBytes with load/store
In-Reply-To: <ec51524d3e3a4cd9b4cc8c245555e192@sap.com>
References: <OF1FB78FBB.78C06521-ON00258114.004FD8C6-49258114.005135A3@notes.na.collabserv.com>
 <3507c10563a84106ac6c2e8d2554c053@serv030.corp.eldorado.org.br>
 <7827421e2c6447f4ae406434f5bb3d25@sap.com>
 <OFC09A152C.17EC3547-ON0025811A.0018ED01-4925811A.001B526B@notes.na.collabserv.com>
 <ec51524d3e3a4cd9b4cc8c245555e192@sap.com>
Message-ID: <OFE1B20852.64237EB0-ON00258131.00449E1E-48258131.004541AC@notes.na.collabserv.com>

Martin,

Thank you very much for your helpful comments and sponsoring this change.

Would you review the latest change?
http://cr.openjdk.java.net/~horii/8179527/webrev.02/

Best regards,
--
Michihiro,
IBM Research - Tokyo


From:	"Doerr, Martin" <martin.doerr at sap.com>
To:	Michihiro Horie <HORIE at jp.ibm.com>
Cc:	"hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>,
            "Simonis, Volker" <volker.simonis at sap.com>, Hiroshi H Horii
            <HORII at jp.ibm.com>, Gustavo Bueno Romero <gromero at br.ibm.com>
Date:	2017/05/30 01:26
Subject:	RE: 8179527: Implement intrinsic code for reverseBytes with
            load/store


Hi Michihiro,

thanks for the improved webrev. This looks better, but I still have a
couple of suggestions.

1.
I still don't like match rules which contain nodes which do something else
(even though direct matching is prohibited by predicate).
I think it would be better to remove "match(...)", "predicate(false)" and
"ins_const(...)" and just describe the "effect()". At least, I'm not aware of
why a match rule should be needed for rldicl and extsh.

2.
I'd appreciate if you could remove "predicate
(UseCountLeadingZerosInstructionsPPC64)" from all byte_reverse_... rules.
They don't make any sense (not your fault).

3.
The costs seem not to be set appropriately in the byte_reverse_... rules.
E.g. instruction count * DEFAULT_COST would be better.

4.
The load/store byte reversed instructions should use the 2 operand form (no
explicit 0 for R0 to support assertions).

Maybe we can find a 2nd reviewer if you provide a new webrev. I can sponsor
the change.

Thanks and best regards,
Martin


From: Michihiro Horie [mailto:HORIE at jp.ibm.com]
Sent: Monday, May 8, 2017 06:58
To: Doerr, Martin <martin.doerr at sap.com>; Lindenmaier, Goetz
<goetz.lindenmaier at sap.com>
Cc: Gustavo Serra Scalet <gustavo.scalet at eldorado.org.br>;
hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis,
Volker <volker.simonis at sap.com>; Hiroshi H Horii <HORII at jp.ibm.com>;
Gustavo Bueno Romero <gromero at br.ibm.com>
Subject: RE: 8179527: Implement intrinsic code for reverseBytes with
load/store


Dear Martin, Gustavo,

Thank you very much for your helpful comments.

Fixed code is
http://cr.openjdk.java.net/~horii/8179527/webrev.01/

Dear Goetz,
Would you kindly review and sponsor this change?
I heard you are a C2 compiler expert and Martin is out for a while.


Best regards,
--
Michihiro,
IBM Research - Tokyo

From: "Doerr, Martin" <martin.doerr at sap.com>
To: Gustavo Serra Scalet <gustavo.scalet at eldorado.org.br>, Michihiro
Horie/Japan/IBM at IBMJP
Cc: "ppc-aix-port-dev at openjdk.java.net" <ppc-aix-port-dev at openjdk.java.net
>, "hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>, "Simonis,
Volker" <volker.simonis at sap.com>
Date: 2017/05/03 02:24
Subject: RE: 8179527: Implement intrinsic code for reverseBytes with
load/store


Hi Michihiro and Gustavo,

thank you very much for implementing this change.

@Gustavo: Thanks for taking a look.
I think that the direct match rules are just there to satisfy
match_rule_supported. They don't need to be fast, they are just a fall back
solution.
The goal is to exploit the byte reverse load and store instructions which
should match in more performance critical cases.
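
As a rough illustration in C (not the ppc.ad code), the two cases look like this: a register-to-register reversal built from several shifts and masks, and a load followed by a reversal, which is the pattern a byte-reversed load such as lwbrx can cover in a single instruction. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Fallback: reverse the four bytes of a register value with shifts and masks. */
static uint32_t reverse_bytes_32(uint32_t x) {
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}

/* Load followed by reversal: the combined pattern that the new
 * load/store match rules are meant to catch. */
static uint32_t load_reversed_32(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);  /* plain load without alignment assumptions */
    return reverse_bytes_32(v);
}
```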

Now my review:

assembler_ppc.hpp:
Looks good except a minor formatting request:
LDBRX_OPCODE  = (31u << OPCODE_SHIFT |  532 << 1),
should be
LDBRX_OPCODE  = (31u << OPCODE_SHIFT | 532u << 1),
to be consistent.
The comments // X-FORM should be aligned with the other ones.
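
For readers not familiar with the encoding, a small sketch in C of how such an X-form opcode constant is put together. It assumes OPCODE_SHIFT is 26, i.e. the primary opcode sits in the top 6 bits of the 32-bit instruction word; xform_opcode is a made-up helper, not part of assembler_ppc.hpp.

```c
#include <stdint.h>

/* Assumption: primary opcode occupies the top 6 bits of the instruction word. */
enum { OPCODE_SHIFT = 26 };

/* Hypothetical helper: primary opcode plus extended opcode shifted left by 1. */
static uint32_t xform_opcode(uint32_t primary, uint32_t extended) {
    return (primary << OPCODE_SHIFT) | (extended << 1);
}
```

With primary opcode 31 and extended opcode 532 this gives 0x7C000428. Writing both operands with the u suffix keeps the whole expression unsigned, which is the consistency point above.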

assembler_ppc.inline.hpp:
Good.

ppc.ad:
I'm concerned about the additional match rules which are only used for the
expand step. They could match directly leading to incorrect code. What they
match is not what they do.
I suggest to implement the code directly in the ins_encode. This would make
the new code significantly shorter and less error prone.
I think we don't need to optimize for Power6 anymore and newer processors
shouldn't really suffer under a little less optimized instruction
scheduling. Would you agree?

Displacements may be too large for "li" so I suggest to use the "indirect"
memory operand and let the compiler handle it. I know that it may increase
latency because the compiler will need to insert an addition which could
better be matched into the memory operand of the load which is harder to
implement (it is possible to match an addition in an operand).


Best regards,
Martin


-----Original Message-----
From: Gustavo Serra Scalet [mailto:gustavo.scalet at eldorado.org.br]
Sent: Tuesday, May 2, 2017 17:05
To: Michihiro Horie <HORIE at jp.ibm.com>
Cc: ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net;
Simonis, Volker <volker.simonis at sap.com>; Doerr, Martin <
martin.doerr at sap.com>
Subject: RE: 8179527: Implement intrinsic code for reverseBytes with
load/store

Hi Michihiro,

I wonder if there is no vectorized approach for implementing your
"bytes_reverse_long_Ex" instruct on ppc.ad. Or did you avoid doing it so
intentionally?

> -----Original Message-----
> From: ppc-aix-port-dev [mailto:ppc-aix-port-dev-
> bounces at openjdk.java.net] On Behalf Of Michihiro Horie
> Sent: Tuesday, May 2, 2017 11:47
> To: ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net;
> volker.simonis at sap.com; martin.doerr at sap.com
> Subject: 8179527: Implement intrinsic code for reverseBytes with
> load/store
>
> Dear all,
>
> Would you please review following change?
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8179527
> Webrev: http://cr.openjdk.java.net/~horii/8179527/webrev.00/
>
> I added new intrinsic code for reverseBytes() in ppc.ad with
> * match(Set dst (ReverseBytesI/L/US/S (LoadI src)));
> * match(Set dst (StoreI dst (ReverseBytesI/L/US/S src)));
>
>
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo



From thomas.stuefe at gmail.com  Wed May 31 15:29:40 2017
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Wed, 31 May 2017 17:29:40 +0200
Subject: JDK10: RFR(xxs): 8181207: 8177809 breaks AIX 5.3, 6.1 builds
In-Reply-To: <CA+3eh11SQoQCyNFsN==o3S2dhc1i6TOy1TSNGdrpgp9nFvBJZw@mail.gmail.com>
References: <CAA-vtUyVyMSF08CNnWsX_gL9w77BZGz-wV9XPMarbO=gKZMX6Q@mail.gmail.com>
 <CA+3eh11SQoQCyNFsN==o3S2dhc1i6TOy1TSNGdrpgp9nFvBJZw@mail.gmail.com>
Message-ID: <CAA-vtUzG-CtmO=p_iLW=_yFNg1Gp1LoCnkd0mCnOAOJJ_nLw8Q@mail.gmail.com>

Hi Volker,

Good suggestions! I completely overlooked the ..._n members in stat64
struct. It seems it is even documented:
https://www.ibm.com/support/knowledgecenter/ssw_aix_72/com.ibm.aix.files/stat.h.htm

new webrev:
http://cr.openjdk.java.net/~stuefe/webrevs/8181207-8177809-breaks-AIX-builds/webrev.01/webrev/

..Thomas

On Wed, May 31, 2017 at 10:49 AM, Volker Simonis <volker.simonis at gmail.com>
wrote:

> Hi Thomas,
>
> as far as I can see, AIX supports both, the st_[a,c,m]time members in
> the stat64 structure for seconds and the corresponding
> st_[a,c,m]time_n members for nanosecond resolution since at least 5.3.
> Can you please use both - there's no reason to discriminate AIX here
> :)
>
> Also, can you please change the code such that we have:
>
> #ifdef MACOSX
> ...
> #else
> #ifdef AIX
> ...
> #else
> ...
> #endif
> #endif
>
> I don't really like using "ifndef XXX" for everything else except XXX.
>
> Thank you and best regards,
> Volker
>
>
> On Tue, May 30, 2017 at 11:46 AM, Thomas Stüfe <thomas.stuefe at gmail.com>
> wrote:
> > Hi all,
> >
> > may I have please a review for this tiny change:
> >
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8181207
> > webrev:
> > http://cr.openjdk.java.net/~stuefe/webrevs/8181207-8177809-breaks-AIX-builds/webrev.00/webrev/
> >
> > This reverts 8177809 for AIX because it leads to build errors on older AIX
> > systems. We want to retain the ability to build on older AIX releases.
> >
> > Thanks, Thomas
>

From christoph.langer at sap.com  Wed May 31 21:41:14 2017
From: christoph.langer at sap.com (Langer, Christoph)
Date: Wed, 31 May 2017 21:41:14 +0000
Subject: JDK10: RFR(xxs): 8181207: 8177809 breaks AIX 5.3, 6.1 builds
In-Reply-To: <CAA-vtUzG-CtmO=p_iLW=_yFNg1Gp1LoCnkd0mCnOAOJJ_nLw8Q@mail.gmail.com>
References: <CAA-vtUyVyMSF08CNnWsX_gL9w77BZGz-wV9XPMarbO=gKZMX6Q@mail.gmail.com>
 <CA+3eh11SQoQCyNFsN==o3S2dhc1i6TOy1TSNGdrpgp9nFvBJZw@mail.gmail.com>
 <CAA-vtUzG-CtmO=p_iLW=_yFNg1Gp1LoCnkd0mCnOAOJJ_nLw8Q@mail.gmail.com>
Message-ID: <05c5fe05f8cb4c8b831255600a0eb2e9@sap.com>

Hi Thomas,

looks good.

Some suggestions about formatting:

a) you could write your code like this:

#if defined(_AIX)
...
#elif defined(MACOSX)
...
#else
...
#endif

That way the code has 3 clear sections and you don't have to nest one #ifdef block inside another.

b)  Line 234, 235 (AIX block), rather write:
             rv   = (jlong)sb.st_mtime * 1000;
             rv += (jlong)sb.st_mtime_n / 1000000;
Then it looks aligned with the MACOSX and the default section.

Best regards
Christoph
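
Put together, suggestions a) and b) could look roughly like the sketch below. It is not the code from the webrev: it uses plain stat() and int64_t instead of stat64 and jlong so that it compiles outside the JDK, tests __APPLE__ where the JDK build uses MACOSX, and the function name is hypothetical. The st_mtime_n member in the AIX branch is taken from the discussion above.

```c
#define _DEFAULT_SOURCE  /* glibc: expose st_mtim even under strict -std= modes */
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical helper: last-modified time of path in milliseconds,
 * or 0 if stat() fails. */
static int64_t last_modified_millis(const char *path) {
    struct stat sb;
    int64_t rv = 0;
    if (stat(path, &sb) == 0) {
#if defined(_AIX)
        rv  = (int64_t)sb.st_mtime * 1000;
        rv += (int64_t)sb.st_mtime_n / 1000000;
#elif defined(__APPLE__)
        rv  = (int64_t)sb.st_mtimespec.tv_sec * 1000;
        rv += (int64_t)sb.st_mtimespec.tv_nsec / 1000000;
#else
        rv  = (int64_t)sb.st_mtim.tv_sec * 1000;
        rv += (int64_t)sb.st_mtim.tv_nsec / 1000000;
#endif
    }
    return rv;
}
```

Each platform gets exactly one branch with the same two-line shape, so a missing or mistyped member shows up immediately when the three sections are compared side by side.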


From: ppc-aix-port-dev [mailto:ppc-aix-port-dev-bounces at openjdk.java.net] On Behalf Of Thomas Stüfe
Sent: Wednesday, May 31, 2017 17:30
To: Volker Simonis <volker.simonis at gmail.com>
Cc: ppc-aix-port-dev at openjdk.java.net; Java Core Libs <core-libs-dev at openjdk.java.net>
Subject: Re: JDK10: RFR(xxs): 8181207: 8177809 breaks AIX 5.3, 6.1 builds

Hi Volker,

Good suggestions! I completely overlooked the ..._n members in stat64 struct. It seems it is even documented: https://www.ibm.com/support/knowledgecenter/ssw_aix_72/com.ibm.aix.files/stat.h.htm

new webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8181207-8177809-breaks-AIX-builds/webrev.01/webrev/

..Thomas

On Wed, May 31, 2017 at 10:49 AM, Volker Simonis <volker.simonis at gmail.com> wrote:
Hi Thomas,

as far as I can see, AIX supports both, the st_[a,c,m]time members in
the stat64 structure for seconds and the corresponding
st_[a,c,m]time_n members for nanosecond resolution since at least 5.3.
Can you please use both - there's no reason to discriminate AIX here
:)

Also, can you please change the code such that we have:

#ifdef MACOSX
...
#else
#ifdef AIX
...
#else
...
#endif
#endif

I don't really like using "ifndef XXX" for everything else except XXX.

Thank you and best regards,
Volker


On Tue, May 30, 2017 at 11:46 AM, Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
> Hi all,
>
> may I have please a review for this tiny change:
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8181207
> webrev:
> http://cr.openjdk.java.net/~stuefe/webrevs/8181207-8177809-breaks-AIX-builds/webrev.00/webrev/
>
> This reverts 8177809 for AIX because it leads to build errors on older AIX
> systems. We want to retain the ability to build on older AIX releases.
>
> Thanks, Thomas
