RFR: JDK-8144312: Remove limitations on the default number of jobs in the build

Wed Dec 16 14:08:23 UTC 2015

On 2015-12-15 19:38, Martin Buchholz wrote:
> Actually calling nproc as a separate process at runtime is interesting
> but totally unorthodox.
>
> I think the configury pain is the usual: detecting sched.h,
> sched_getaffinity, CPU_COUNT, don't forget _GNU_SOURCE, check you're
> on a glibc system, probably check at runtime too, so use dlsym to
> access sched_getaffinity, look for similar hacks on non-glibc systems.
> Worry about systems with more than 1024 cpus.  Worry about
> sched_getaffinity returning a higher number than the old way.
>
> Is that enough things to worry about?

Are you talking about JDK-6515172? I was thinking about how to implement a
proper check in the configure script, where calling a separate process is
not so unorthodox after all. ;-)

I'd still like to see some real-world confirmation that nproc does 
indeed return the correct number of cpus in a Docker environment.

/Magnus

>
> On Tue, Dec 15, 2015 at 5:28 AM, Magnus Ihse Bursie
> <magnus.ihse.bursie at oracle.com> wrote:
>> On 2015-12-15 04:27, Martin Buchholz wrote:
>>> My current mental model is
>>> configured cpus >= online cpus >= allowed cpus
>>> In a traditional system they are all the same.
>>>
>>> I experimented and saw that cpusets are indeed turned on in some
>>> systems used for testing at Google.
>>> I.e. allowed cpus is a strict subset of online cpus.
>>>
>>> It seems likely that the following would be a better implementation of
>>> availableProcessors on Linux:
>>>
>>>     cpu_set_t s;
>>>     return (sched_getaffinity(0, sizeof(s), &s) == 0)
>>>         ? CPU_COUNT(&s) : fallback_to_old_way();
>>>
>>> with all the pain in configury.
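
A slightly fuller sketch of the snippet above, for illustration only. It
assumes Linux with glibc. The name allowed_cpus is made up for this example,
and fallback_to_old_way is a hypothetical stand-in that uses
sysconf(_SC_NPROCESSORS_ONLN), since the thread does not spell out what the
old way is. It does not attempt the dlsym or non-glibc handling mentioned
elsewhere in the thread.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* glibc: exposes sched_getaffinity() and CPU_COUNT() */
#endif
#include <sched.h>
#include <unistd.h>

/* Hypothetical stand-in for the "old way": count the online CPUs. */
static int fallback_to_old_way(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return (n > 0) ? (int)n : 1;
}

/* Number of CPUs the calling process is allowed to run on. */
int allowed_cpus(void)
{
    cpu_set_t s;

    CPU_ZERO(&s);
    /* pid 0 means the calling thread. The call can fail, for example
     * when the kernel's mask does not fit in a fixed-size cpu_set_t
     * (systems with more than 1024 CPUs), so keep the fallback. */
    if (sched_getaffinity(0, sizeof(s), &s) == 0) {
        int n = CPU_COUNT(&s);
        if (n > 0)
            return n;
    }
    return fallback_to_old_way();
}
```

On a system without cpusets or an affinity restriction, this should report
the same number as the fallback.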
>>
>> Making system calls from configure is more than acceptably difficult. :-)
>> But if nproc does this, we can do something like checking if nproc is
>> present, and if so, if it returns a non-zero value, we use it, otherwise we
>> fall back to the current method. Is that what you're suggesting?
>>
>> /Magnus
>>
>>
>>
>>> On Mon, Dec 14, 2015 at 6:58 AM, Mikael Gerdin <mikael.gerdin at oracle.com>
>>> wrote:
>>>> Hi David,
>>>>
>>>> On 2015-12-11 14:21, David Holmes wrote:
>>>>> On 11/12/2015 11:16 PM, Magnus Ihse Bursie wrote:
>>>>>> On 2015-12-03 03:11, Roger Riggs wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> It would be useful to figure out the number of cpus available when in
>>>>>>> a container.
>>>>>>> Some comments have been added to:
>>>>>>> 8140793 <https://bugs.openjdk.java.net/browse/JDK-8140793>
>>>>>>> getAvailableProcessors may incorrectly report the number of cpus in
>>>>>>> Docker container
>>>>>>>
>>>>>>> But so far we haven't dug deep enough.   Suggestions are welcome.
>>>>>>
>>>>>>
>>>>>> http://serverfault.com/questions/691659/count-number-of-allowed-cpus-in-a-docker-container
>>>>>>
>>>>>> suggests running nproc. I'm not sure if that can be counted on to be
>>>>>> present, but we could certainly check for it.
>>>>>
>>>>> I'd like to know how nproc does it so we can try to apply the same logic
>>>>> in the VM for Runtime.availableProcessors. Can someone actually confirm
>>>>> that it returns the number of processors available to the container?
>>>>
>>>> I don't have a container at hand but running nproc under strace suggests
>>>> that it calls sched_getaffinity and counts the number of set bits in the
>>>> cpu affinity mask:
>>>>
>>>> $ strace -e trace=sched_getaffinity nproc
>>>> sched_getaffinity(0, 128, {f, 0, 0, 0}) = 32
>>>> 4
>>>> +++ exited with 0 +++
>>>>
>>>> It would be nice if anyone with access to a system where the number of
>>>> cpus is limited in a similar manner to a docker container could run the
>>>> above command and see if it
>>>> 1) returns the correct number of cpus
>>>> 2) works as I think, that is, it counts the number of set bits in the
>>>> array which is the third syscall argument.
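
The bit counting described in 2) can be sketched as follows. This is only
an illustration of the idea, not nproc's actual source. The function name
count_set_bits and the representation of the mask as an array of unsigned
long words are assumptions made for the example.

```c
#include <stddef.h>

/* Count the set bits across an array of affinity-mask words. */
int count_set_bits(const unsigned long *words, size_t nwords)
{
    int count = 0;

    for (size_t i = 0; i < nwords; i++) {
        unsigned long w = words[i];
        while (w != 0) {
            w &= w - 1;   /* clear the lowest set bit */
            count++;
        }
    }
    return count;
}
```

Applied to the mask from the strace output, {0xf, 0, 0, 0}, this gives 4,
which matches what nproc printed. The 32 returned by the syscall would then
be the size of the mask in bytes (four 8-byte words, assuming a 64-bit
system), not a cpu count.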
>>>>
>>>>
>>>> /Mikael
>>>>
>>>>
>>>>
>>>>> David
>>>>>
>>>>>> /Magnus
>>>>>>
>>>>>>> Roger
>>>>>>>
>>>>>>>
>>>>>>> On 12/2/15 6:59 PM, Martin Buchholz wrote:
>>>>>>>> Not to say you shouldn't do this, but I worry that increasingly
>>>>>>>> computing is being done in "containers" where e.g. the number of
>>>>>>>> cpus is doubling every year but only a small number are available
>>>>>>>> to actually be used by a given process.  If availableProcessors
>>>>>>>> reports 1 million, what should we do?  (no need to answer...)
>>>>>>>>
>>>>>>>> On Tue, Dec 1, 2015 at 1:55 AM, Erik Joelsson
>>>>>>>> <erik.joelsson at oracle.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> The current heuristic for choosing the default value of the -j
>>>>>>>>> flag to make needs some tweaking.
>>>>>>>>>
>>>>>>>>> In JDK 9, it looks at the amount of memory and the number of cpus
>>>>>>>>> in the system. It divides memory by 1024 to get a safe number of
>>>>>>>>> jobs that will fit into memory. The lower of that number and the
>>>>>>>>> number of cpus is then picked. The number is then scaled down to
>>>>>>>>> about 90% of the number of cpus to leave some resources for other
>>>>>>>>> activities. It is also capped at 16.
>>>>>>>>>
>>>>>>>>> Since we now have the build using "nice" to make sure the build
>>>>>>>>> isn't bogging down the system, I see no reason to do the 90%
>>>>>>>>> scaling anymore. Also, the performance issues that forced us to
>>>>>>>>> cap at 16 have long been fixed, and even if we don't scale well
>>>>>>>>> beyond 16, we do still scale. So I propose we remove that
>>>>>>>>> arbitrary limitation too.
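
The heuristic described above, and the proposed change, can be sketched
roughly as follows. This is not the actual configure logic (which lives in
the build's shell/autoconf scripts); it is a C illustration under
assumptions: memory is given in MB, the 90% scaling truncates, the steps
run in the order the description lists them, and the result never drops
below one job. The function names are made up for the example.

```c
/* Old default as described: min(memory / 1024, cpus), scaled to about
 * 90%, then capped at 16. */
int default_jobs_old(long memory_mb, int cpus)
{
    long jobs = memory_mb / 1024;   /* jobs that safely fit in memory */

    if (jobs > cpus)
        jobs = cpus;                /* take the lower of the two */
    jobs = jobs * 9 / 10;           /* ~90% (assumed: truncating) */
    if (jobs > 16)
        jobs = 16;                  /* the arbitrary cap */
    if (jobs < 1)
        jobs = 1;                   /* assumption: at least one job */
    return (int)jobs;
}

/* Proposed default: same minimum, without the 90% scaling or the cap. */
int default_jobs_proposed(long memory_mb, int cpus)
{
    long jobs = memory_mb / 1024;

    if (jobs > cpus)
        jobs = cpus;
    if (jobs < 1)
        jobs = 1;                   /* assumption: at least one job */
    return (int)jobs;
}
```

Under these assumptions, a machine with 32 cpus and 32768 MB of memory
would go from 16 jobs to 32, and one with 8 cpus and 4096 MB from 3 to 4.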
>>>>>>>>>
>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8144312
>>>>>>>>> Webrev: http://cr.openjdk.java.net/~erikj/8144312/webrev.01/
>>>>>>>>>
>>>>>>>>> /Erik
>>>>>>>>>