RFR: 8058935: CPU detection gives 0 cores per cpu, 2 threads per core in Amazon EC2 environment

Vladimir Kempik vladimir.kempik at oracle.com
Fri Nov 21 17:19:18 UTC 2014


Hello


 >That check was added long ago for 6968646 and is present in jdk7 and
 >6update. And the failure happened in a jdk which has it:

I meant that this check failed to do its job: there is no other way to get
cores_per_cpu == 0 on an Intel cpu in this function.
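
For context, the check being discussed looks only at sub-leaf 0 of CPUID
leaf 0xB, so it says nothing about the level-1 count that cores_per_cpu()
uses as its dividend. Below is a minimal standalone sketch of that predicate.
The struct and function names are hypothetical, and the register values
passed in are assumed inputs, not data from the EC2 machine.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of the registers the check reads (not HotSpot's types).
struct LeafB0Sample {
  uint32_t max_std_function;  // highest supported standard CPUID leaf
  uint32_t eax;               // leaf 0xB, sub-leaf 0: eax
  uint32_t ebx;               // leaf 0xB, sub-leaf 0: ebx
};

// Mirrors the predicate quoted in this thread: the level is treated as
// valid when eax[4:0] | ebx[15:0] is non-zero and leaf 0xB exists.
static bool topology_level0_valid(const LeafB0Sample& r) {
  const uint32_t shift        = r.eax & 0x1f;    // eax[4:0]
  const uint32_t logical_cpus = r.ebx & 0xffff;  // ebx[15:0]
  return r.max_std_function >= 0xB && (shift | logical_cpus) != 0;
}
```

With assumed inputs, `topology_level0_valid({0xD, 1, 2})` is true while
`topology_level0_valid({0xD, 0, 0})` is false. Nothing in the predicate
constrains the sub-leaf 1 value.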


 >One note - do you need to check (result == 0) in threads_per_core() too?

For result to be 0 in cores_per_cpu(), where

result = _cpuid_info.tpl_cpuidB1_ebx.bits.logical_cpus /
  _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus;

_cpuid_info.tpl_cpuidB1_ebx.bits.logical_cpus needs to be zero (or, more
generally, smaller than the divisor) and
_cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus needs to be non-zero. In this
case threads_per_core() isn't affected:

if (is_intel() && supports_processor_topology()) {
result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus;

If _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus == 0, then we would
crash in cores_per_cpu() with a division by zero anyway.

That was my reason not to edit threads_per_core().
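
The reasoning above can be sketched as a small standalone model. This is
not the actual patch: the struct, its field names, and the function name
are hypothetical, the non-topology source is assumed to be stored as
"cores minus one", and the divisor guard is an extra that the change under
review does not add.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the CPUID fields named in this thread.
struct CpuidSample {
  uint32_t b0_logical_cpus;         // leaf 0xB, sub-leaf 0, ebx[15:0]
  uint32_t b1_logical_cpus;         // leaf 0xB, sub-leaf 1, ebx[15:0]
  uint32_t legacy_cores_minus_one;  // assumed non-topology value (cores - 1)
};

static uint32_t cores_per_cpu_sketch(const CpuidSample& c, bool supports_topology) {
  uint32_t result = 0;
  // Extra guard (not part of the reviewed change): skip the division when
  // the divisor is zero instead of crashing.
  if (supports_topology && c.b0_logical_cpus != 0) {
    result = c.b1_logical_cpus / c.b0_logical_cpus;
  }
  // Sanity check: a zero result means the topology data was unusable, so
  // fall back to the non-topology value, which is at least 1.
  if (result == 0) {
    result = c.legacy_cores_minus_one + 1;
  }
  return result;
}
```

For example, with the assumed values `{2, 0, 9}` and topology reported as
supported, the division yields 0 and the fallback is used, while `{2, 20, 9}`
takes the topology path.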

Thanks, Vladimir

On 21.11.2014 20:08, Vladimir Kozlov wrote:
> > (((_cpuid_info.tpl_cpuidB0_eax & 0x1f) | 
> _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus) != 0);
>
> That check was added long ago for 6968646 and is present in jdk7 and
> 6update. And the failure happened in a jdk which has it:
>
> # JRE version: Java(TM) SE Runtime Environment (7.0_51-b13) (build 
> 1.7.0_51-b13)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.51-b03 mixed mode 
> linux-amd64 compressed oops)
>
> But if Dmitry is right we can do nothing here. So your change seems
> valid in that case.
>
> One note - do you need to check (result == 0) in threads_per_core() too?
>
> Thanks,
> Vladimir
>
> On 11/21/14 7:31 AM, Vladimir Kempik wrote:
>> Hello
>>
>> Thanks for looking into this.
>>
>> It's impossible to collect the needed data at the moment because the
>> bug isn't reproducible now. The cpuid dump I've collected from an
>> ec2 virtual machine says that supports_processor_topology() should
>> report false now:
>>
>> static bool supports_processor_topology() {
>>    return (_cpuid_info.std_max_function >= 0xB) &&
>>    // eax[4:0] | ebx[0:15] == 0 indicates invalid topology level.
>>    // Some cpus have max cpuid >= 0xB but do not support processor topology.
>>    (((_cpuid_info.tpl_cpuidB0_eax & 0x1f) |
>>      _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus) != 0);
>> }
>>
>>
>>   which comes from this being false:
>>
>> (((_cpuid_info.tpl_cpuidB0_eax & 0x1f) | 
>> _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus) != 0);
>>
>> The check I've added is a sanity check to prevent the same crashes in
>> the future.
>>
>> Thanks. Vladimir
>>
>>
>> On 17.11.2014 22:47, Vladimir Kozlov wrote:
>>> According to the following document, the cpu has 10 cores (and 2
>>> threads per core):
>>>
>>> http://ark.intel.com/products/75275/Intel-Xeon-Processor-E5-2670-v2-25M-Cache-2_50-GHz 
>>>
>>>
>>> hs_err in the bug report reports only 2 processors and the following
>>> lines are missing:
>>>
>>> physical id    : 0
>>> siblings    : 4
>>> core id        : 0
>>> cpu cores    : 4
>>> apicid        : 0
>>> initial apicid    : 0
>>>
>>> I assume it is some kind of virtual environment in which cpuid
>>> topology is not working (at least our code does not work).
>>> We may be missing some checks which indicate that topology is not
>>> supported.
>>> It would be nice if you could put all topology and related cpuid bits
>>> from Amazon EC2 in the bug report.
>>> Checking for 0 could be fine, but if it is not 0 it could still be
>>> wrong if topology info is not supported.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 11/17/14 8:20 AM, Vladimir Kempik wrote:
>>>> Hi,
>>>>
>>>> Please review patch adding sanity check to cores_per_cpu():
>>>>
>>>> http://cr.openjdk.java.net/~vkempik/8058935/webrev.00/
>>>> https://bugs.openjdk.java.net/browse/JDK-8058935
>>>>
>>>> A few months ago we got reports of Java crashing in the Amazon EC2
>>>> environment (they use Xen).
>>>> https://bugs.openjdk.java.net/browse/JDK-8058935
>>>> https://bugs.openjdk.java.net/browse/JDK-8058937
>>>>
>>>> JVM args used to trigger the crash: -XX:+UnlockCommercialFeatures
>>>> -XX:+FlightRecorder
>>>>
>>>> After investigation I think the crash could only have happened if
>>>> supports_processor_topology() returned true and
>>>> _cpuid_info.tpl_cpuidB1_ebx.bits.logical_cpus was zero.
>>>>
>>>> I wasn't able to reproduce the bug on the Amazon EC2 cloud at
>>>> present.
>>>>
>>>> The patch adds a sanity check: if cpu topology was used and resulted
>>>> in 0 cores per cpu, then fall back to the non-topology variant, which
>>>> can't result in 0 cores per cpu.
>>>>
>>>> Testing: JPRT.
>>>>
>>>> Thanks,
>>>> Vladimir.
>>


