RFR: JDK-8144312: Remove limitations on the default number of jobs in the build
Magnus Ihse Bursie
magnus.ihse.bursie at oracle.com
Tue Dec 15 13:28:06 UTC 2015
On 2015-12-15 04:27, Martin Buchholz wrote:
> My current mental model is
> configured cpus >= online cpus >= allowed cpus
> In a traditional system they are all the same.
>
> I experimented and saw that cpusets are indeed turned on in some
> systems used for testing at Google.
> I.e. allowed cpus is a strict subset of online cpus.
>
> It seems likely that the following would be a better implementation of
> availableProcessors on Linux:
>
> cpu_set_t s;
> return (sched_getaffinity(0, sizeof(s), &s) == 0) ? CPU_COUNT(&s) :
> fallback_to_old_way();
>
> with all the pain in configury.
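Martin's mental model above can be observed directly from userspace; a minimal sketch, assuming Linux with glibc's getconf and coreutils' nproc available:

```shell
# The three counts in the mental model, largest to smallest:
configured=$(getconf _NPROCESSORS_CONF)  # configured cpus
online=$(getconf _NPROCESSORS_ONLN)      # online cpus
allowed=$(nproc)                         # allowed cpus (affinity mask)
echo "configured=$configured online=$online allowed=$allowed"
```

On a traditional system all three print the same number; under cpusets or a restricted affinity mask, `allowed` drops below `online`.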
Making system calls from configure is more difficult than I'd find
acceptable. :-) But if nproc does this, we could check whether nproc is
present and, if so, whether it returns a non-zero value; if it does, we
use it, otherwise we fall back to the current method. Is that what
you're suggesting?
/Magnus
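The check described above could look roughly like this in a POSIX shell fragment; the function name and the getconf fallback are illustrative, not the actual configure code:

```shell
# Hypothetical configure-time probe: prefer nproc (which respects the
# process's cpu affinity mask) and fall back to the current method
# (sketched here as getconf) if nproc is missing or gives nothing usable.
detect_num_cores() {
  if command -v nproc > /dev/null 2>&1; then
    NUM_CORES=$(nproc)
  fi
  if [ -z "$NUM_CORES" ] || [ "$NUM_CORES" -le 0 ] 2> /dev/null; then
    NUM_CORES=$(getconf _NPROCESSORS_ONLN 2> /dev/null || echo 1)
  fi
}

detect_num_cores
echo "NUM_CORES=$NUM_CORES"
```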
>
> On Mon, Dec 14, 2015 at 6:58 AM, Mikael Gerdin <mikael.gerdin at oracle.com> wrote:
>> Hi David,
>>
>> On 2015-12-11 14:21, David Holmes wrote:
>>> On 11/12/2015 11:16 PM, Magnus Ihse Bursie wrote:
>>>> On 2015-12-03 03:11, Roger Riggs wrote:
>>>>> Hi,
>>>>>
>>>>> It would be useful to figure out the number of cpus available when
>>>>> running in a container.
>>>>> Some comments have been added to:
>>>>> 8140793 <https://bugs.openjdk.java.net/browse/JDK-8140793>
>>>>> getAvailableProcessors may incorrectly report the number of cpus in a
>>>>> Docker container
>>>>>
>>>>> But so far we haven't dug deep enough. Suggestions are welcome.
>>>>
>>>> http://serverfault.com/questions/691659/count-number-of-allowed-cpus-in-a-docker-container
>>>>
>>>> suggests running nproc. I'm not sure if that can be counted on to be
>>>> present, but we could certainly check for it.
>>>
>>> I'd like to know how nproc does it so we can try to apply the same logic
>>> in the VM for Runtime.availableProcessors. Can someone actually confirm
>>> that it returns the number of processors available to the container?
>>
>> I don't have a container at hand but running nproc under strace suggests
>> that it calls sched_getaffinity and counts the number of set bits in the cpu
>> affinity mask:
>>
>> $ strace -e trace=sched_getaffinity nproc
>> sched_getaffinity(0, 128, {f, 0, 0, 0}) = 32
>> 4
>> +++ exited with 0 +++
>>
>> It would be nice if anyone with access to a system where the number of
>> cpus is limited in a manner similar to a docker container could run the
>> above command and see if it
>> 1) returns the correct number of cpus
>> 2) works as I think it does, that is, counts the number of set bits in
>> the cpu mask passed as the third syscall argument.
>>
>>
>> /Mikael
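Lacking a container, the check Mikael asks for can be approximated on any Linux box that has util-linux's taskset: restrict the affinity mask to a single cpu and see whether nproc follows it. A sketch:

```shell
# Pin the affinity mask to cpu 0 and compare nproc's answer with the
# number of online cpus. If nproc counts set bits in the affinity mask
# (via sched_getaffinity), the restricted run should report 1.
online=$(getconf _NPROCESSORS_ONLN)
if command -v taskset > /dev/null 2>&1; then
  restricted=$(taskset -c 0 nproc)
else
  restricted=1  # taskset not available here; nothing to compare
fi
echo "online=$online restricted=$restricted"
```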
>>
>>
>>
>>> David
>>>
>>>> /Magnus
>>>>
>>>>> Roger
>>>>>
>>>>>
>>>>> On 12/2/15 6:59 PM, Martin Buchholz wrote:
>>>>>> Not to say you shouldn't do this, but I worry that increasingly,
>>>>>> computing is being done in "containers" where e.g. the number of
>>>>>> cpus is doubling every year but only a small number are available
>>>>>> to actually be used by a given process. If availableProcessors
>>>>>> reports 1 million, what should we do? (no need to answer...)
>>>>>>
>>>>>> On Tue, Dec 1, 2015 at 1:55 AM, Erik Joelsson
>>>>>> <erik.joelsson at oracle.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> The current heuristic for figuring out the default value of the -j
>>>>>>> flag passed to make needs some tweaking.
>>>>>>>
>>>>>>> In JDK 9, it looks at the amount of memory and the number of cpus
>>>>>>> in the system. It divides memory by 1024 to get a safe number of
>>>>>>> jobs that will fit into memory. The lower of that number and the
>>>>>>> number of cpus is then picked. The number is then scaled down to
>>>>>>> about 90% of the number of cpus to leave some resources for other
>>>>>>> activities. It is also capped at 16.
>>>>>>>
>>>>>>> Since we now have the build using "nice" to make sure the build
>>>>>>> isn't bogging down the system, I see no reason to do the 90%
>>>>>>> scaling anymore. Also, the performance issues that forced us to
>>>>>>> cap at 16 have long been fixed, and even if we don't scale well
>>>>>>> beyond 16, we do still scale. So I propose we remove that
>>>>>>> arbitrary limitation too.
>>>>>>>
>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8144312
>>>>>>> Webrev: http://cr.openjdk.java.net/~erikj/8144312/webrev.01/
>>>>>>>
>>>>>>> /Erik
>>>>>>>
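The heuristic Erik describes, and his proposed simplification, can be sketched in shell arithmetic. The numbers and variable names below are illustrative only (MEMORY_SIZE assumed to be in MB), not the actual configure code:

```shell
# Illustrative inputs, not read from a real machine.
MEMORY_SIZE=32768   # MB
NUM_CORES=24

# JDK 9 heuristic: min(memory/1024, cpus), scaled to ~90%, capped at 16.
MEMORY_JOBS=$(( MEMORY_SIZE / 1024 ))
JOBS=$(( MEMORY_JOBS < NUM_CORES ? MEMORY_JOBS : NUM_CORES ))
JOBS=$(( JOBS * 90 / 100 ))
[ "$JOBS" -gt 16 ] && JOBS=16
[ "$JOBS" -lt 1 ] && JOBS=1

# Proposed: keep the memory bound, drop the 90% scaling and the cap of 16.
NEW_JOBS=$(( MEMORY_JOBS < NUM_CORES ? MEMORY_JOBS : NUM_CORES ))

echo "old=$JOBS new=$NEW_JOBS"
```

With these sample inputs the old heuristic lands on the 16-job cap while the proposed one uses all 24 cores, which is the arbitrary limitation the webrev removes.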
More information about the build-dev mailing list