RFR: 8058935: CPU detection gives 0 cores per cpu, 2 threads per core in Amazon EC2 environment

Vladimir Kozlov vladimir.kozlov at oracle.com
Fri Nov 21 17:40:57 UTC 2014


Okay. Looks good.

Thanks,
Vladimir

On 11/21/14 9:19 AM, Vladimir Kempik wrote:
> Hello
>
>
>  >That check was added long ago for 6968646 and is present in jdk7 and
> 6update. And the failure happened in a JDK which has it:
>
> I meant that this check failed to do its job; there is no other way to get
> cores_per_cpu == 0 on an Intel CPU in this function.
>
>
>  >One note - do you need to check (result == 0) in threads_per_core() too?
>
> For result to be 0 in cores_per_cpu(), where
>
> result = _cpuid_info.tpl_cpuidB1_ebx.bits.logical_cpus /
>   _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus;
>
> _cpuid_info.tpl_cpuidB1_ebx.bits.logical_cpus needs to be zero and
> _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus needs to be nonzero. In this
> case threads_per_core() isn't affected:
>
> if (is_intel() && supports_processor_topology()) {
> result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus;
>
> If _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus == 0, then we would
> crash in cores_per_cpu() with a division by zero anyway.
>
> That was my reason not to edit threads_per_core().
>
> Thanks, Vladimir
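
The arithmetic in the argument above can be checked with a small standalone model. This is only a sketch: the `LeafB` struct and the `topology_*` function names are hypothetical stand-ins for the two `logical_cpus` fields quoted in the thread, not HotSpot code. Note that unsigned integer division also yields 0 whenever the numerator is merely smaller than the denominator; the case discussed here is the numerator being exactly 0.

```cpp
#include <cstdint>

// Hypothetical model of the two CPUID leaf 0xB values discussed above.
struct LeafB {
    std::uint32_t b0_logical_cpus;  // stands in for tpl_cpuidB0_ebx.bits.logical_cpus
    std::uint32_t b1_logical_cpus;  // stands in for tpl_cpuidB1_ebx.bits.logical_cpus
};

// Topology path of threads_per_core(): reads the B0 value only.
constexpr std::uint32_t topology_threads_per_core(const LeafB& c) {
    return c.b0_logical_cpus;
}

// Topology path of cores_per_cpu(): B1 / B0.
// Precondition (assumed, not checked here): b0_logical_cpus != 0.
constexpr std::uint32_t topology_cores_per_cpu(const LeafB& c) {
    return c.b1_logical_cpus / c.b0_logical_cpus;
}
```

With `LeafB{2, 0}` the model gives 0 cores per cpu and 2 threads per core, which is the combination named in the bug title; with `LeafB{2, 20}` it gives 10 cores and 2 threads.
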
> On 21.11.2014 20:08, Vladimir Kozlov wrote:
>> > (((_cpuid_info.tpl_cpuidB0_eax & 0x1f) |
>> _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus) != 0);
>>
>> That check was added long ago for 6968646 and is present in jdk7 and
>> 6update. And the failure happened in a JDK which has it:
>>
>> # JRE version: Java(TM) SE Runtime Environment (7.0_51-b13) (build
>> 1.7.0_51-b13)
>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.51-b03 mixed mode
>> linux-amd64 compressed oops)
>>
>> But if Dmitry is right we can do nothing here. So your change seems
>> valid in that case.
>>
>> One note - do you need to check (result == 0) in threads_per_core() too?
>>
>> Thanks,
>> Vladimir
>>
>> On 11/21/14 7:31 AM, Vladimir Kempik wrote:
>>> Hello
>>>
>>> Thanks for looking into this.
>>>
>>> It's impossible to collect the needed data at the moment; the bug isn't
>>> reproducible now. And the cpuid dump I've collected from an
>>> ec2 virtual machine says that supports_processor_topology() should
>>> report false now:
>>>
>>> static bool supports_processor_topology() {
>>>    return (_cpuid_info.std_max_function >= 0xB) &&
>>>    // eax[4:0] | ebx[0:15] == 0 indicates invalid topology level.
>>>    // Some cpus have max cpuid >= 0xB but do not support processor topology.
>>>    (((_cpuid_info.tpl_cpuidB0_eax & 0x1f) | _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus) != 0);
>>> }
>>>
>>>
>>>   which comes from this being false:
>>>
>>> (((_cpuid_info.tpl_cpuidB0_eax & 0x1f) |
>>> _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus) != 0);
>>>
>>> The check I've added is a sanity check to prevent the same crashes in the future.
>>>
>>> Thanks. Vladimir
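
To make the quoted check easier to experiment with, here is a minimal standalone sketch of the same boolean expression. The `TopologyLeaf` struct and its field names are hypothetical; only the expression itself is taken from the function quoted above.

```cpp
#include <cstdint>

// Hypothetical stand-in for the CPUID fields the check reads; not the HotSpot struct.
struct TopologyLeaf {
    std::uint32_t std_max_function;  // highest supported standard CPUID leaf
    std::uint32_t b0_eax;            // stands in for tpl_cpuidB0_eax
    std::uint32_t b0_logical_cpus;   // stands in for tpl_cpuidB0_ebx.bits.logical_cpus
};

// Same shape as the quoted check: leaf 0xB must exist, and
// (eax[4:0] | logical_cpus) == 0 marks an invalid topology level.
constexpr bool supports_processor_topology(const TopologyLeaf& c) {
    return c.std_max_function >= 0xB &&
           ((c.b0_eax & 0x1f) | c.b0_logical_cpus) != 0;
}
```

In this model `TopologyLeaf{0xD, 0, 0}` is rejected even though leaf 0xB exists, while `TopologyLeaf{0xD, 1, 2}` is accepted. The check only inspects the B0 level, so it says nothing about whether the B1 value is sane.
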
>>>
>>>
>>> On 17.11.2014 22:47, Vladimir Kozlov wrote:
>>>> According to the following document, the cpu has 10 cores (and 2 threads per
>>>> core):
>>>>
>>>> http://ark.intel.com/products/75275/Intel-Xeon-Processor-E5-2670-v2-25M-Cache-2_50-GHz
>>>>
>>>>
>>>> hs_err in the bug report reports only 2 processors, and the following lines
>>>> are missing:
>>>>
>>>> physical id    : 0
>>>> siblings    : 4
>>>> core id        : 0
>>>> cpu cores    : 4
>>>> apicid        : 0
>>>> initial apicid    : 0
>>>>
>>>> I assume it is some kind of virtual environment in which cpuid
>>>> topology is not working (at least our code does not
>>>> work).
>>>> We may be missing some checks which indicate that topology is not
>>>> supported.
>>>> It would be nice if you could put all topology and related cpuid bits
>>>> from amazon ec2 in the bug report.
>>>> Checking for 0 could be fine, but if it is not 0 it could still be
>>>> wrong if topology info is not supported.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 11/17/14 8:20 AM, Vladimir Kempik wrote:
>>>>> Hi,
>>>>>
>>>>> Please review the patch adding a sanity check to cores_per_cpu():
>>>>>
>>>>> http://cr.openjdk.java.net/~vkempik/8058935/webrev.00/
>>>>> https://bugs.openjdk.java.net/browse/JDK-8058935
>>>>>
>>>>> A few months ago we got reports of java crashing in the amazon ec2
>>>>> environment (they use Xen).
>>>>> https://bugs.openjdk.java.net/browse/JDK-8058935
>>>>> https://bugs.openjdk.java.net/browse/JDK-8058937
>>>>>
>>>>> JVM args used to trigger the crash: -XX:+UnlockCommercialFeatures
>>>>> -XX:+FlightRecorder
>>>>>
>>>>> After investigation I think the crash could only have happened if
>>>>> supports_processor_topology() returned true and
>>>>> _cpuid_info.tpl_cpuidB1_ebx.bits.logical_cpus was zero.
>>>>>
>>>>> I wasn't able to reproduce the bug on the amazon ec2 cloud at
>>>>> present.
>>>>>
>>>>> The patch adds a sanity check: if cpu topology was used and resulted
>>>>> in 0 cores per cpu, then fall back to the non-topology variant, which
>>>>> can't result in 0 cores per cpu.
>>>>>
>>>>> Testing: JPRT.
>>>>>
>>>>> Thanks,
>>>>> Vladimir.
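
The fallback described in this message can be sketched as a standalone function. This is a model under stated assumptions, not the code in the webrev: `CpuidModel` and all of its field names are hypothetical, and the non-topology variant is assumed to compute its result as a stored count plus one, which is what would make a zero result impossible on that path. The guard against a zero divisor is an extra precaution in this sketch and is not part of the patch as described.

```cpp
#include <cstdint>

// Hypothetical model of the inputs; none of these names come from HotSpot.
struct CpuidModel {
    bool          topology_supported;     // outcome of a supports_processor_topology()-style check
    std::uint32_t b0_logical_cpus;        // threads per core from the topology leaf
    std::uint32_t b1_logical_cpus;        // logical cpus per package from the topology leaf
    std::uint32_t legacy_cores_minus_one; // assumed legacy encoding: cores - 1 (a small bit field)
};

constexpr std::uint32_t cores_per_cpu(const CpuidModel& c) {
    std::uint32_t result = 0;
    // Topology variant; the zero-divisor guard is an addition in this sketch.
    if (c.topology_supported && c.b0_logical_cpus != 0) {
        result = c.b1_logical_cpus / c.b0_logical_cpus;
    }
    // Sanity check: topology unused or it produced 0, so use the legacy value,
    // which is at least 1 under the "count plus one" assumption.
    if (result == 0) {
        result = c.legacy_cores_minus_one + 1;
    }
    return result;
}
```

For example, `cores_per_cpu(CpuidModel{true, 2, 0, 3})` falls back and returns 4 in this model, whereas `CpuidModel{true, 2, 20, 3}` keeps the topology result of 10.
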
>>>
>


More information about the hotspot-runtime-dev mailing list