RFR: 2178143: VM crashes if the number of bound CPUs changed during runtime

David Holmes david.holmes at oracle.com
Thu Mar 21 21:05:51 UTC 2013


Yumin,

The problem is environments where the number of configured CPUs changes 
not the online CPUs. Try using Solaris zones and dynamically changing 
the number of CPUs for the zone. If you start with one CPU then the VM 
stores 1 in _processor_count and forever thereafter is_MP() returns 
false. But the number of CPUs available can be increased and so we crash 
due to missing memory barriers etc.

David

On 22/03/2013 1:56 AM, Yumin Qi wrote:
> Karen,
>
>    The problem may not look like the is_MP problem, I rechecked the code:
>
>    // Interface for detecting multiprocessor system
>    static inline bool is_MP() {
>      assert(_processor_count > 0, "invalid processor count");
>      return _processor_count > 1;
>    }
>
>    The _processor_count is calculated in initialization:
>
>    1) BSD:
>    if (sysctl(mib, 2, &cpu_val, &len, NULL, 0) != -1 && cpu_val >= 1) {
>         assert(len == sizeof(cpu_val), "unexpected data size");
>         set_processor_count(cpu_val);
>    }
>    else {
>         set_processor_count(1);   // fallback
>    }
>
>    2) Linux:
>    set_processor_count(sysconf(_SC_NPROCESSORS_CONF));
>    if (processor_count() == 1) {
>      pid_t pid = os::Linux::gettid();
>      char fname[32];
>      jio_snprintf(fname, sizeof(fname), "/proc/%d", pid);
>      FILE *fp = fopen(fname, "r");
>      if (fp == NULL) {
>        unsafe_chroot_detected = true;
>      } else {
>        fclose(fp);
>      }
>    }
>
>    3) Solaris:
> void os::Solaris::initialize_system_info() {
>    set_processor_count(sysconf(_SC_NPROCESSORS_CONF));
>    _processors_online = sysconf (_SC_NPROCESSORS_ONLN);
>    _physical_memory = (julong)sysconf(_SC_PHYS_PAGES) *
> (julong)sysconf(_SC_PAGESIZE);
> }
>
> sysconf(_SC_NPROCESSORS_CONF) will return number of CPUs not the number
> of online.
>
> That is, it does not relate to user configuration of cpus
> online/offline. It always returns the real number of processors.  If the
> system equipped with multiple processors, is_MP() will always be true or
> it will be false for a single CPU. Any how, we will not escape from this
> call. Am I right here? If so, the problem is not caused by is_MP(), must
> be somewhere else.
>
> BTW, it should use processor_count() function to instead of
> _processor_count. Let compiler to do the optimization.
>
> Thanks
> Yumin
>
>
>
>
> On 3/21/2013 6:54 AM, Karen Kinnear wrote:
>> Yumin,
>>
>> I really like the suggestion from David and from Harold that we
>> simplify this fix to just be a boolean - isMP, and limit the scope of
>> the change to isMP. It won't change the number of processors, and
>> won't tune everything in the libraries etc. according to actual cpus or
>> processors sets. The original goal was to allow folks to get the MP
>> locking even if there is only one processor when the vm starts up.
>> Let's scale back to that if you don't mind.
>>
>> thanks,
>> Karen
>>
>> On Mar 21, 2013, at 12:37 AM, Yumin Qi wrote:
>>
>>> Hi, David
>>>
>>> On 3/20/2013 6:02 PM, David Holmes wrote:
>>>> On 21/03/2013 7:27 AM, Yumin Qi wrote:
>>>>> Hi, can I have your code review of a small change?
>>>> Not really small conceptually. :)
>>>>
>>>> I don't think this form of the fix addresses the underlying issue as
>>>> discussed in the bug report. If the variable was renamed
>>>> MinimumNumberOfProcessors, or MinimumAssumedProcessors, then simply
>>>> using it to turn on is_MP would be okay. Such a flag would suit the
>>>> initial problem perfectly. Of course any value >2 would be
>>>> semantically indistinct, so this really acts as a boolean flag -
>>>> AssumeMP.
>>>>
>>>> The more general NumberOfProcessors approach, which I'm still unsure
>>>> of, should to me control what is reported for available-processors.
>>>> That way it would affect everything in the VM, libraries and
>>>> application code that configures itself based on the number of
>>>> available processors. The main usecase for that, in my opinion,
>>>> would be for apps running on large systems but you want to constrain
>>>> it to using a subset of the physical CPUs (without having to
>>>> configure processor sets). That is a different kind of problem and a
>>>> different kind of flag. Using NumberOfProcessors but not having it
>>>> control anything except is_MP just seems wrong - and using it to
>>>> replace available_processors is not a complex change.
>>>>
>>>> The VM is not designed for dynamic adaptation of threads/pools so if
>>>> the number of processors does change dynamically neither of the
>>>> above options are going to provide solutions to the potential
>>>> performance problems that will be encountered (too many or too few
>>>> threads). Any apps that starts on single core (as reported by the
>>>> OS) is going to be under-provisioned.
>>>>
>>>> David
>>>> -----
>>>>
>>> I think this is only a workaround and not a solution for the specific
>>> use case, or  there is no perfect solution for it. If customers
>>> decided to use this flag,  they should be aware that they are ready
>>> not to consider performance at first.  For using number of available
>>> processors, we need to add code to get that number, not the one in
>>> hotspot os::active_processor_count() which will return the number of
>>> live processors.  So do you think I could just use a flag
>>> -XX:+AssumeMP work around  this problem? Since GC threads will not be
>>> based on this assumption, I agree (Harold pointed this out either)
>>> that one flag is simpler.  In fact, the code for is_MP() is obsolete
>>> for today's computers, all with multi-cores (even for cell chips) so
>>> I think better solution is remove call to is_MP() in all places in
>>> hotspot. I will prepare another webrev with -XX:+AssumeMP for next
>>> codereview.
>>>
>>> For number of GC Threads:
>>> unsigned int Abstract_VM_Version::nof_parallel_worker_threads(
>>>                                                       unsigned int num,
>>>                                                       unsigned int den,
>>>                                                       unsigned int
>>> switch_pt) {
>>>   if (FLAG_IS_DEFAULT(ParallelGCThreads)) {
>>>     assert(ParallelGCThreads == 0, "Default ParallelGCThreads is not
>>> 0");
>>>     // For very large machines, there are diminishing returns
>>>     // for large numbers of worker threads.  Instead of
>>>     // hogging the whole system, use a fraction of the workers for every
>>>     // processor after the first 8.  For example, on a 72 cpu machine
>>>     // and a chosen fraction of 5/8
>>>     // use 8 + (72 - 8) * (5/8) == 48 worker threads.
>>>     unsigned int ncpus = (unsigned int) os::active_processor_count();
>>>     return (ncpus <= switch_pt) ?
>>>            ncpus :
>>>           (switch_pt + ((ncpus - switch_pt) * num) / den);
>>>   } else {
>>>     return ParallelGCThreads;
>>>   }
>>> }
>>>
>>> the call to this function is
>>> unsigned int Abstract_VM_Version::calc_parallel_worker_threads() {
>>>   return nof_parallel_worker_threads(5, 8, 8);
>>> }
>>>
>>> We can see that, if active_processor_count is 1, it will return 1,
>>> and the VM will run with single GC thread. So the better choices maybe:
>>>
>>> 1) get processor count not active processor count for
>>> ParallelGCThreads, that is up to decision from GC team.
>>> 2) Recommend usage is
>>>
>>>   -XX:+AssumeMP -XX:ParallelGCThreads=<number>
>>>
>>> Thanks
>>> Yumin
>>>
>>>>> 2178143:  VM crashes if the number of bound CPUs changed during
>>>>> runtime.
>>>>>
>>>>> Situation: Customer first configure only one CPU online and turn
>>>>> others
>>>>> offline to run java application, after java program started, bring
>>>>> more
>>>>> CPUs back online. Since VM started on a single CPU, os::is_MP() will
>>>>> return false, but after more CPUs available, OS will schedule the app
>>>>> run on multiple CPUs, this caused SEGV in various places where data
>>>>> consistency was broken. The solution is supply a flag to assume it is
>>>>> running on MP, so lock is forced to be called.
>>>>>
>>>>> http://cr.openjdk.java.net/~minqi/2178143/
>>>>>
>>>>> Thanks
>>>>> Yumin
>



More information about the hotspot-gc-dev mailing list