RFR: 2178143: VM crashes if the number of bound CPUs changed during runtime
Dmitry Samersoff
dmitry.samersoff at oracle.com
Thu Mar 21 09:16:13 UTC 2013
Yumin,
My two cents.
1. I think MP vs. SP and the number of active CPUs are different problems, so
we should have a separate boolean AssumeMP flag and an
integer NumberOfProcessors flag.
NumberOfProcessors > 1 sets AssumeMP = true, but NumberOfProcessors
< 2 *doesn't set* AssumeMP = false.
2. In the long term, IMHO, we should move to always being MP. If the cost of
that is too high for some platform, make it a compile-time, platform-dependent
decision.
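
A minimal sketch of the one-way rule in point 1. The names MPFlags and resolve_mp_flags are hypothetical and exist only for illustration; this is not HotSpot code and says nothing about how the flags would actually be declared.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two proposed flags (not HotSpot code).
struct MPFlags {
    bool assume_mp;             // the proposed boolean AssumeMP flag
    int  number_of_processors;  // the proposed integer NumberOfProcessors flag
};

// One-way rule: a processor count above 1 switches AssumeMP on,
// but a count below 2 never switches it off.
inline MPFlags resolve_mp_flags(MPFlags f) {
    assert(f.number_of_processors >= 0);
    if (f.number_of_processors > 1) {
        f.assume_mp = true;
    }
    return f;
}
```

With this rule, a user who sets AssumeMP explicitly keeps it even when only one processor is reported.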
-Dmitry
On 2013-03-21 09:28, David Holmes wrote:
> Hi Yumin,
>
> On 21/03/2013 2:37 PM, Yumin Qi wrote:
> <snip>
>> I think this is only a workaround, not a solution for the specific
>> use case; perhaps there is no perfect solution for it. If customers decide
>> to use this flag, they should be aware that performance is no longer
>> the first priority. To use the number of available
>> processors, we would need to add code to get that number, rather than use
>> hotspot's os::active_processor_count(), which returns the number of
>> live processors. So do you think I could just use a flag, -XX:+AssumeMP,
>> to work around this problem? Since GC threads will not be based on this
>> assumption, I agree (Harold pointed this out too) that one flag is
>> simpler. In fact, the code for is_MP() is obsolete for today's
>> computers, which are all multi-core (even cell chips), so I think a better
>> solution is to remove the calls to is_MP() everywhere in hotspot. I will
>> prepare another webrev with -XX:+AssumeMP for the next code review.
>
> As per my follow up email I think AssumeMP is the way to go for now. We
> just need to decide whether we think the default should be true or false.
>
> Removing is_MP() altogether is more problematic because of the range of
> platforms this has to work on. A build time solution would define is_MP()
> as a constant for those platforms that want that, and then the is_MP
> calls will not appear in the generated code - while still allowing other
> platforms to only insert MP code when needed. But that disallows any
> runtime configuration for those platforms we assume are always MP.
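
A rough sketch of the build-time option described above. ALWAYS_MP is a hypothetical macro, and the sketch_os namespace, processor_count() and barriers_issued() are stand-ins invented for this example, not HotSpot's os class.

```cpp
#include <cassert>

// ALWAYS_MP is a hypothetical build-time macro a platform could define.
// #define ALWAYS_MP 1

namespace sketch_os {
    // Stand-in for the processor count recorded at VM startup.
    inline int& processor_count() { static int n = 1; return n; }

#ifdef ALWAYS_MP
    // Constant: the compiler can fold every is_MP() check away.
    constexpr bool is_MP() { return true; }
#else
    // Runtime check: MP-only code is used only when needed.
    inline bool is_MP() { return processor_count() != 1; }
#endif
}

// Example caller: counts the barriers that would be issued for a
// number of stores; none are issued when is_MP() is false.
inline int barriers_issued(int stores) {
    return sketch_os::is_MP() ? stores : 0;
}
```

With ALWAYS_MP defined, the runtime branch disappears, which is also why such a platform can no longer be configured at runtime.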
>
>> For number of GC Threads:
>> unsigned int Abstract_VM_Version::nof_parallel_worker_threads(
>>                                       unsigned int num,
>>                                       unsigned int den,
>>                                       unsigned int switch_pt) {
>>   if (FLAG_IS_DEFAULT(ParallelGCThreads)) {
>>     assert(ParallelGCThreads == 0, "Default ParallelGCThreads is not 0");
>>     // For very large machines, there are diminishing returns
>>     // for large numbers of worker threads. Instead of
>>     // hogging the whole system, use a fraction of the workers for every
>>     // processor after the first 8. For example, on a 72 cpu machine
>>     // and a chosen fraction of 5/8
>>     // use 8 + (72 - 8) * (5/8) == 48 worker threads.
>>     unsigned int ncpus = (unsigned int) os::active_processor_count();
>>     return (ncpus <= switch_pt) ?
>>            ncpus :
>>            (switch_pt + ((ncpus - switch_pt) * num) / den);
>>   } else {
>>     return ParallelGCThreads;
>>   }
>> }
>>
>> The call to this function is:
>> unsigned int Abstract_VM_Version::calc_parallel_worker_threads() {
>>   return nof_parallel_worker_threads(5, 8, 8);
>> }
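
To make the arithmetic in the quoted function easy to check, here is a standalone sketch of the same formula. worker_threads is a hypothetical name, and the processor count is passed in as a parameter rather than read from os::active_processor_count().

```cpp
#include <cassert>

// Same arithmetic as the quoted nof_parallel_worker_threads(), with the
// processor count supplied by the caller (hypothetical helper, not HotSpot code).
inline unsigned int worker_threads(unsigned int ncpus,
                                   unsigned int num,
                                   unsigned int den,
                                   unsigned int switch_pt) {
    assert(den != 0);
    // Up to switch_pt processors: one worker per processor.
    // Beyond that: add num/den workers for each further processor.
    return (ncpus <= switch_pt)
        ? ncpus
        : switch_pt + ((ncpus - switch_pt) * num) / den;
}
```

With the arguments used above (5, 8, 8), one processor gives 1 worker, 8 give 8, 16 give 13, and 72 give 48.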
>>
>> We can see that, if active_processor_count is 1, it will return 1, and
>> the VM will run with a single GC thread. So the better choices may be:
>>
>> 1) Use the processor count, not the active processor count, for
>> ParallelGCThreads; that decision is up to the GC team.
>
> Given this occurs at VM startup there may not be any difference. It
> depends on the OS and any "container" facility (like Solaris zones) as
> to what number of processors will be seen to "exist" versus what are
> seen to be "available".
>
> But this is a GC ergonomics issue distinct from the is_MP problem.
>
>> 2) Recommended usage is
>>
>> -XX:+AssumeMP -XX:ParallelGCThreads=<number>
>
> It is hard to know whether the people launching the VM will have the
> necessary knowledge as to what to put here.
>
> David
> -----
>
>> Thanks
>> Yumin
>>
>>>> 2178143: VM crashes if the number of bound CPUs changed during
>>>> runtime.
>>>>
>>>> Situation: The customer first configures only one CPU online and takes the
>>>> others offline to run a Java application; after the Java program has
>>>> started, more CPUs are brought back online. Since the VM started on a
>>>> single CPU, os::is_MP() returns false, but once more CPUs are available
>>>> the OS schedules the app on multiple CPUs. This caused SEGVs in various
>>>> places where data consistency was broken. The solution is to supply a
>>>> flag that makes the VM assume it is running on MP, so that locking is forced.
>>>>
>>>> http://cr.openjdk.java.net/~minqi/2178143/
>>>>
>>>> Thanks
>>>> Yumin
>>
--
Dmitry Samersoff
Oracle Java development team, Saint Petersburg, Russia
* Give Rabbit time, and he'll always get the answer