RFR: 2178143: VM crashes if the number of bound CPUs changed during runtime

Thu Mar 21 09:16:13 UTC 2013

Yumin,

My few cents.

1. I think MP/SP and the number of active CPUs are different problems, so
we should have a separate boolean AssumeMP flag and an
integer NumberOfProcessors flag.

Setting NumberOfProcessors > 1 sets AssumeMP = true, but NumberOfProcessors
< 2 *doesn't set* AssumeMP = false.
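
A minimal sketch of that one-way relationship between the two flags. The
helper name effective_assume_mp and the "0 means not given" convention are
assumptions for illustration only; neither exists in hotspot.

```cpp
#include <cassert>

// Hypothetical helper, not hotspot code: computes the effective AssumeMP
// value from the two proposed flags.
inline bool effective_assume_mp(bool assume_mp, int number_of_processors) {
  assert(number_of_processors >= 0);  // 0 means "flag not given" in this sketch
  if (number_of_processors > 1) {
    return true;     // NumberOfProcessors > 1 implies AssumeMP
  }
  return assume_mp;  // NumberOfProcessors < 2 leaves AssumeMP untouched
}
```

So NumberOfProcessors=1 together with +AssumeMP still yields MP-safe code;
only an explicit -AssumeMP can turn it off.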

2. In the long term, IMHO, we should move to always being MP. If the cost
of that is too high for some platform, make it a compile-time,
platform-dependent decision.
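
Roughly what I mean by a compile-time decision, as a sketch only. The
ALWAYS_MP macro, g_processor_count and barriers_needed are made-up names
for illustration, not existing hotspot defines or functions.

```cpp
#include <cassert>

// Hypothetical build-time switch; a platform that cannot afford
// always-MP code would build with -DALWAYS_MP=0.
#ifndef ALWAYS_MP
#define ALWAYS_MP 1
#endif

// Stand-in for the processor count the VM samples at startup.
int g_processor_count = 1;

inline bool is_MP() {
#if ALWAYS_MP
  return true;                   // constant, so is_MP() branches fold away
#else
  return g_processor_count > 1;  // runtime check for the remaining platforms
#endif
}

// Example call site: MP-only work is done only when is_MP() is true.
inline int barriers_needed(int stores) {
  assert(stores >= 0);
  return is_MP() ? stores : 0;
}
```

With ALWAYS_MP=1 the result no longer depends on how many CPUs were online
at startup, which is the failure mode in this bug.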

-Dmitry

On 2013-03-21 09:28, David Holmes wrote:
> Hi Yumin,
> 
> On 21/03/2013 2:37 PM, Yumin Qi wrote:
> <snip>
>> I think this is only a workaround and not a solution for the specific
>> use case, or there is no perfect solution for it. If customers decide
>> to use this flag, they should be aware that performance is not the
>> first consideration. To use the number of available processors, we
>> need to add code to get that number, rather than use hotspot's
>> os::active_processor_count(), which returns the number of live
>> processors. So do you think I could just use a flag, -XX:+AssumeMP, to
>> work around this problem? Since GC threads will not be based on this
>> assumption, I agree (Harold pointed this out too) that one flag is
>> simpler. In fact, the code for is_MP() is obsolete for today's
>> computers, which are all multi-core (even cell chips), so I think a
>> better solution is to remove the calls to is_MP() everywhere in
>> hotspot. I will prepare another webrev with -XX:+AssumeMP for the next
>> code review.
> 
> As per my follow up email I think AssumeMP is the way to go for now. We
> just need to decide whether we think the default should be true or false.
> 
> Removing is_MP() altogether is more problematic because of the range of
> platforms this has to work on. A build-time solution would define is_MP()
> as a constant for those platforms that want that, and then the is_MP
> calls will not appear in the generated code - while still allowing other
> platforms to only insert MP code when needed. But that disallows any
> runtime configuration for those platforms we assume are always MP.
> 
>> For number of GC Threads:
>> unsigned int Abstract_VM_Version::nof_parallel_worker_threads(
>>                                                        unsigned int num,
>>                                                        unsigned int den,
>>                                                        unsigned int switch_pt) {
>>    if (FLAG_IS_DEFAULT(ParallelGCThreads)) {
>>      assert(ParallelGCThreads == 0, "Default ParallelGCThreads is not 0");
>>      // For very large machines, there are diminishing returns
>>      // for large numbers of worker threads.  Instead of
>>      // hogging the whole system, use a fraction of the workers for every
>>      // processor after the first 8.  For example, on a 72 cpu machine
>>      // and a chosen fraction of 5/8
>>      // use 8 + (72 - 8) * (5/8) == 48 worker threads.
>>      unsigned int ncpus = (unsigned int) os::active_processor_count();
>>      return (ncpus <= switch_pt) ?
>>             ncpus :
>>            (switch_pt + ((ncpus - switch_pt) * num) / den);
>>    } else {
>>      return ParallelGCThreads;
>>    }
>> }
>>
>> the call to this function is
>> unsigned int Abstract_VM_Version::calc_parallel_worker_threads() {
>>    return nof_parallel_worker_threads(5, 8, 8);
>> }
>>
>> We can see that if active_processor_count is 1, it will return 1, and
>> the VM will run with a single GC thread. So the better choices may be:
>>
>> 1) Use the processor count, not the active processor count, for
>> ParallelGCThreads; that is up to the GC team to decide.
> 
> Given this occurs at VM startup there may not be any difference. It
> depends on the OS and any "container" facility (like Solaris zones) as
> to what number of processors will be seen to "exist" versus what are
> seen to be "available".
> 
> But this is a GC ergonomics issue distinct from the is_MP problem.
> 
>> 2) The recommended usage is
>>
>>    -XX:+AssumeMP -XX:ParallelGCThreads=<number>
> 
> It is hard to know whether the people launching the VM will have the
> necessary knowledge as to what to put here.
> 
> David
> -----
> 
>> Thanks
>> Yumin
>>
>>>> 2178143: VM crashes if the number of bound CPUs changed during
>>>> runtime.
>>>>
>>>> Situation: The customer first configures only one CPU online and takes
>>>> the others offline to run a Java application; after the Java program
>>>> has started, more CPUs are brought back online. Since the VM started
>>>> on a single CPU, os::is_MP() returns false, but once more CPUs are
>>>> available, the OS schedules the app on multiple CPUs. This caused
>>>> SEGVs in various places where data consistency was broken. The
>>>> solution is to supply a flag to assume the VM is running on MP, so
>>>> that locking is forced.
>>>>
>>>> http://cr.openjdk.java.net/~minqi/2178143/
>>>>
>>>> Thanks
>>>> Yumin
>>
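
For reference, the arithmetic in the quoted nof_parallel_worker_threads()
can be checked in isolation. This is only a sketch: worker_threads_for is a
made-up name, and the processor count is passed in as a parameter instead of
being read from os::active_processor_count().

```cpp
#include <cassert>

// Same formula as the quoted function, with ncpus supplied by the caller.
inline unsigned int worker_threads_for(unsigned int ncpus,
                                       unsigned int num,
                                       unsigned int den,
                                       unsigned int switch_pt) {
  assert(den != 0);  // the fraction num/den must be well defined
  return (ncpus <= switch_pt)
             ? ncpus
             : switch_pt + ((ncpus - switch_pt) * num) / den;
}
```

With the 5/8 fraction and a switch point of 8, a VM that sees one active
CPU at startup gets one GC thread, whatever AssumeMP says, which is why the
thread count is a separate question from is_MP().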

-- 
Dmitry Samersoff
Oracle Java development team, Saint Petersburg, Russia
* Give Rabbit time, and he'll always get the answer