RFR: JDK-8144312: Remove limitations on the default number of jobs in the build

Martin Buchholz martinrb at google.com
Tue Dec 15 18:38:19 UTC 2015


Actually calling nproc as a separate process at runtime is interesting
but totally unorthodox.

I think the configury pain is the usual: detecting sched.h,
sched_getaffinity, and CPU_COUNT (don't forget _GNU_SOURCE), checking
that you're on a glibc system, probably checking at runtime too, so
using dlsym to access sched_getaffinity, and looking for similar hacks
on non-glibc systems.
Worry about systems with more than 1024 cpus.  Worry about
sched_getaffinity returning a higher number than the old way.

Is that enough things to worry about?

On Tue, Dec 15, 2015 at 5:28 AM, Magnus Ihse Bursie
<magnus.ihse.bursie at oracle.com> wrote:
> On 2015-12-15 04:27, Martin Buchholz wrote:
>>
>> My current mental model is
>> configured cpus >= online cpus >= allowed cpus
>> In a traditional system they are all the same.
>>
>> I experimented and saw that cpusets are indeed turned on in some
>> systems used for testing at Google.
>> I.e. allowed cpus is a strict subset of online cpus.
>>
>> It seems likely that the following would be a better implementation of
>> availableProcessors on Linux:
>>
>>    cpu_set_t s;
>>    return (sched_getaffinity(0, sizeof(s), &s) == 0) ? CPU_COUNT(&s) :
>> fallback_to_old_way();
>>
>> with all the pain in configury.
>
>
> Making system calls from configure is more difficult than is acceptable. :-)
> But if nproc does this, we can do something like checking whether nproc is
> present and, if so, whether it returns a non-zero value; if it does, we use
> it, otherwise we fall back to the current method. Is that what you're
> suggesting?
>
> /Magnus
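
The configure-time probe could be as simple as this hypothetical
sketch (not the actual patch; fall_back_to_current_method is a made-up
stand-in for the existing detection logic, and NUM_CORES is just an
illustrative variable name):

```shell
#!/bin/sh
# Hypothetical sketch: prefer nproc when it is present and returns a
# sane value, otherwise fall back to the current detection method.
fall_back_to_current_method() {
  # Stand-in for the existing detection, e.g. /proc/cpuinfo on Linux.
  n=$(grep -c '^processor' /proc/cpuinfo 2>/dev/null)
  if [ -z "$n" ] || [ "$n" -eq 0 ]; then n=1; fi
  echo "$n"
}

if NUM_CORES=$(nproc 2>/dev/null) && [ "$NUM_CORES" -gt 0 ] 2>/dev/null; then
  : # nproc worked and gave a non-zero count; use it.
else
  NUM_CORES=$(fall_back_to_current_method)
fi
echo "Number of cores: $NUM_CORES"
```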
>
>
>
>>
>> On Mon, Dec 14, 2015 at 6:58 AM, Mikael Gerdin <mikael.gerdin at oracle.com>
>> wrote:
>>>
>>> Hi David,
>>>
>>> On 2015-12-11 14:21, David Holmes wrote:
>>>>
>>>> On 11/12/2015 11:16 PM, Magnus Ihse Bursie wrote:
>>>>>
>>>>> On 2015-12-03 03:11, Roger Riggs wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> It would be useful to figure out the number of cpus available when in
>>>>>> a container.
>>>>>> Some comments have been added to:
>>>>>> 8140793 <https://bugs.openjdk.java.net/browse/JDK-8140793>
>>>>>> getAvailableProcessors may incorrectly report the number of cpus in
>>>>>> Docker container
>>>>>>
>>>>>> But so far we haven't dug deep enough.  Suggestions are welcome.
>>>>>
>>>>>
>>>>>
>>>>> http://serverfault.com/questions/691659/count-number-of-allowed-cpus-in-a-docker-container
>>>>>
>>>>> suggests running nproc. I'm not sure if that can be counted on to be
>>>>> present, but we could certainly check for it.
>>>>
>>>>
>>>> I'd like to know how nproc does it so we can try to apply the same logic
>>>> in the VM for Runtime.availableProcessors. Can someone actually confirm
>>>> that it returns the number of processors available to the container?
>>>
>>>
>>> I don't have a container at hand but running nproc under strace suggests
>>> that it calls sched_getaffinity and counts the number of set bits in the
>>> cpu
>>> affinity mask:
>>>
>>> $ strace -e trace=sched_getaffinity nproc
>>> sched_getaffinity(0, 128, {f, 0, 0, 0}) = 32
>>> 4
>>> +++ exited with 0 +++
>>>
>>> It would be nice if anyone with access to a system where the number
>>> of cpus is limited in a similar manner to a docker container could
>>> run the above command and see if it
>>> 1) returns the correct number of cpus
>>> 2) works as I think, that is, it counts the number of set bits in
>>> the array which is the third syscall argument.
>>>
>>>
>>> /Mikael
>>>
>>>
>>>
>>>> David
>>>>
>>>>> /Magnus
>>>>>
>>>>>> Roger
>>>>>>
>>>>>>
>>>>>> On 12/2/15 6:59 PM, Martin Buchholz wrote:
>>>>>>>
>>>>>>> Not to say you shouldn't do this, but I worry that increasingly
>>>>>>> computing is being done in "containers" where e.g. the number of
>>>>>>> cpus is doubling every year but only a small number are available
>>>>>>> to actually be used by a given process.  If availableProcessors
>>>>>>> reports 1 million, what should we do?  (no need to answer...)
>>>>>>>
>>>>>>> On Tue, Dec 1, 2015 at 1:55 AM, Erik Joelsson
>>>>>>> <erik.joelsson at oracle.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> The current heuristic for choosing the default value of the -j
>>>>>>>> flag passed to make needs some tweaking.
>>>>>>>>
>>>>>>>> In JDK 9, it looks at the amount of memory and the number of
>>>>>>>> cpus in the system. It divides the amount of memory (in MB) by
>>>>>>>> 1024 to get a safe number of jobs that will fit into memory,
>>>>>>>> i.e. roughly one job per GB. The lower of that number and the
>>>>>>>> number of cpus is then picked. The number is then scaled down
>>>>>>>> to about 90% of the number of cpus to leave some resources for
>>>>>>>> other activities. It is also capped at 16.
>>>>>>>>
>>>>>>>> Since we now have the build using "nice" to make sure the
>>>>>>>> build isn't bogging down the system, I see no reason to do the
>>>>>>>> 90% scaling anymore. Also, the performance issues that forced
>>>>>>>> us to cap at 16 have long been fixed, and even if we don't
>>>>>>>> scale well beyond 16, we do still scale. So I propose we remove
>>>>>>>> that arbitrary limitation too.
>>>>>>>>
>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8144312
>>>>>>>> Webrev: http://cr.openjdk.java.net/~erikj/8144312/webrev.01/
>>>>>>>>
>>>>>>>> /Erik
>>>>>>>>
>
