RFR: 8058935: CPU detection gives 0 cores per cpu, 2 threads per core in Amazon EC2 environment

Vladimir Kozlov vladimir.kozlov at oracle.com
Fri Nov 21 17:08:06 UTC 2014


 > (((_cpuid_info.tpl_cpuidB0_eax & 0x1f) | _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus) != 0);

That check was added long ago for 6968646 and is present in jdk7 and 6update. And the failure happened in a JDK that has it:

# JRE version: Java(TM) SE Runtime Environment (7.0_51-b13) (build 1.7.0_51-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.51-b03 mixed mode linux-amd64 compressed oops)

But if Dmitry is right, there is nothing more we can do here, so your change seems valid in that case.

One note - do you need to check (result == 0) in threads_per_core() too?
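
For illustration, the fallback applied to both functions might look like the
sketch below. It is not the webrev: TopologyInfo and its field names are
hypothetical stand-ins for the cached cpuid leaf 0xB values, the legacy_*
fields stand for whatever the non-topology path returns (assumed to be at
least 1), and the guard against a zero divisor is an extra precaution added
for the sketch.

```cpp
#include <cstdint>

// Hypothetical stand-in for the cached cpuid data; names are illustrative
// and do not match the real VM_Version layout.
struct TopologyInfo {
  uint32_t std_max_function;  // highest supported standard cpuid leaf
  uint32_t b0_eax;            // leaf 0xB, subleaf 0: EAX
  uint32_t b0_logical_cpus;   // leaf 0xB, subleaf 0: EBX[15:0]
  uint32_t b1_logical_cpus;   // leaf 0xB, subleaf 1: EBX[15:0]
  uint32_t legacy_cores;      // non-topology result, assumed >= 1
  uint32_t legacy_threads;    // non-topology result, assumed >= 1
};

// Same shape as the check quoted above: leaf 0xB must exist and
// subleaf 0 must not report an invalid (all-zero) level.
bool supports_processor_topology(const TopologyInfo& t) {
  return t.std_max_function >= 0xB &&
         ((t.b0_eax & 0x1f) | t.b0_logical_cpus) != 0;
}

uint32_t threads_per_core(const TopologyInfo& t) {
  uint32_t result = 0;
  if (supports_processor_topology(t)) {
    result = t.b0_logical_cpus;
  }
  // Sanity check: never report 0, fall back to the non-topology value.
  return result != 0 ? result : t.legacy_threads;
}

uint32_t cores_per_cpu(const TopologyInfo& t) {
  uint32_t result = 0;
  // The zero-divisor guard is an assumption of this sketch.
  if (supports_processor_topology(t) && t.b0_logical_cpus != 0) {
    result = t.b1_logical_cpus / t.b0_logical_cpus;
  }
  // Sanity check: never report 0, fall back to the non-topology value.
  return result != 0 ? result : t.legacy_cores;
}
```

With values like those suspected in the bug report (topology reported as
supported, but subleaf 1 giving 0 logical cpus), cores_per_cpu() returns the
legacy value instead of 0.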

Thanks,
Vladimir

On 11/21/14 7:31 AM, Vladimir Kempik wrote:
> Hello
>
> Thanks for looking into this.
>
> It's impossible to collect the needed data at the moment because the bug is no longer reproducible. And the cpuid dump I've
> collected from an ec2 virtual machine says that supports_processor_topology() should now report false:
>
> static bool supports_processor_topology() {
>    return (_cpuid_info.std_max_function >= 0xB) &&
>    // eax[4:0] | ebx[0:15] == 0 indicates invalid topology level.
>    // Some cpus have max cpuid >= 0xB but do not support processor topology.
>    (((_cpuid_info.tpl_cpuidB0_eax & 0x1f) | _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus) != 0);
> }
>
>
>   which comes from this being false:
>
> (((_cpuid_info.tpl_cpuidB0_eax & 0x1f) | _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus) != 0);
>
> The check I've added is a sanity check to prevent the same crashes in the future.
>
> Thanks. Vladimir
>
>
> On 17.11.2014 22:47, Vladimir Kozlov wrote:
>> According to the following document, the cpu has 10 cores (and 2 threads per core):
>>
>> http://ark.intel.com/products/75275/Intel-Xeon-Processor-E5-2670-v2-25M-Cache-2_50-GHz
>>
>> The hs_err in the bug report shows only 2 processors, and the following lines are missing:
>>
>> physical id    : 0
>> siblings    : 4
>> core id        : 0
>> cpu cores    : 4
>> apicid        : 0
>> initial apicid    : 0
>>
>> I assume it is some kind of virtual environment in which cpuid topology does not work (at least our code does not
>> work).
>> We may be missing some checks which indicate that topology is not supported.
>> It would be nice if you could put all topology and related cpuid bits from amazon ec2 in the bug report.
>> Checking for 0 could be fine, but a non-zero value could still be wrong if topology info is not supported.
>>
>> Thanks,
>> Vladimir
>>
>> On 11/17/14 8:20 AM, Vladimir Kempik wrote:
>>> Hi,
>>>
>>> Please review patch adding sanity check to cores_per_cpu():
>>>
>>> http://cr.openjdk.java.net/~vkempik/8058935/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8058935
>>>
>>> A few months ago we got reports of java crashing in the amazon ec2
>>> environment (they use Xen).
>>> https://bugs.openjdk.java.net/browse/JDK-8058935
>>> https://bugs.openjdk.java.net/browse/JDK-8058937
>>>
>>> JVM args used to trigger the crash: -XX:+UnlockCommercialFeatures
>>> -XX:+FlightRecorder
>>>
>>> After investigation I think the crash could only have happened if
>>> supports_processor_topology() returned true and
>>> _cpuid_info.tpl_cpuidB1_ebx.bits.logical_cpus was zero.
>>>
>>> I haven't been able to reproduce the bug on the amazon ec2 cloud recently.
>>>
>>> The patch adds a sanity check: if cpu topology was used and resulted in 0
>>> cores per cpu, then fall back to the non-topology variant, which can't result
>>> in 0 cores per cpu.
>>>
>>> Testing: JPRT.
>>>
>>> Thanks,
>>> Vladimir.
>

