RFR: 8058935: CPU detection gives 0 cores per cpu, 2 threads per core in Amazon EC2 environment
Vladimir Kempik
vladimir.kempik at oracle.com
Fri Nov 21 17:19:18 UTC 2014
Hello
>That check was added long ago for 6968646 and is present in jdk7 and
6update. And the failure happened in a jdk which has it:
I meant that this check failed to do its job; there is no other way to get
cores_per_cpu == 0 on an Intel CPU in this function.
>One note - do you need to check (result == 0) in threads_per_core() too?
For result to be 0 in cores_per_cpu(), where
    result = _cpuid_info.tpl_cpuidB1_ebx.bits.logical_cpus /
             _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus;
_cpuid_info.tpl_cpuidB1_ebx.bits.logical_cpus needs to be zero and
_cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus needs to be non-zero. In this
case threads_per_core() isn't affected:
    if (is_intel() && supports_processor_topology()) {
      result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus;
If _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus == 0, then we would
crash in cores_per_cpu() with a division by zero anyway.
That was my reason not to edit threads_per_core().
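
The argument above can be sketched in a few lines of standalone C++. This is
only an illustration under stated assumptions, not the actual HotSpot code or
the webrev: TopologyInfo and its field names are hypothetical stand-ins for
the _cpuid_info fields mentioned above, and legacy_cores stands for whatever
the non-topology path would return (assumed to be at least 1).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the few _cpuid_info fields discussed above.
struct TopologyInfo {
  bool     topology_supported;  // models supports_processor_topology()
  uint32_t b0_logical_cpus;     // models tpl_cpuidB0_ebx.bits.logical_cpus
  uint32_t b1_logical_cpus;     // models tpl_cpuidB1_ebx.bits.logical_cpus
  uint32_t legacy_cores;        // assumed non-topology result, >= 1
};

// Sketch of cores_per_cpu() with a sanity check on the topology result.
uint32_t cores_per_cpu(const TopologyInfo& info) {
  uint32_t result = 0;
  if (info.topology_supported) {
    // A zero divisor would fault here, so it is treated as a precondition.
    assert(info.b0_logical_cpus != 0);
    result = info.b1_logical_cpus / info.b0_logical_cpus;
  }
  // Sanity check: fall back when topology is unusable or yields 0 cores.
  if (!info.topology_supported || result == 0) {
    result = info.legacy_cores;
  }
  return result;
}

// Simplified sketch of threads_per_core(): it reads only the B0 value,
// so a zero B1 value does not change its result.
uint32_t threads_per_core(const TopologyInfo& info) {
  if (info.topology_supported) {
    return info.b0_logical_cpus;
  }
  return 1;  // simplification: the real non-topology path is not modelled
}
```

With B1 == 0 and B0 == 2 the quotient is 0, so the sketch falls back to
legacy_cores, while threads_per_core() still returns 2. Integer division also
truncates to 0 whenever the B1 value is smaller than the B0 value, so a
check on the result covers that case as well.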
Thanks, Vladimir
On 21.11.2014 20:08, Vladimir Kozlov wrote:
> > (((_cpuid_info.tpl_cpuidB0_eax & 0x1f) |
> _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus) != 0);
>
> That check was added long ago for 6968646 and is present in jdk7 and
> 6update. And the failure happened in a jdk which has it:
>
> # JRE version: Java(TM) SE Runtime Environment (7.0_51-b13) (build
> 1.7.0_51-b13)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.51-b03 mixed mode
> linux-amd64 compressed oops)
>
> But if Dmitry is right, we can do nothing here. So your change seems
> valid in that case.
>
> One note - do you need to check (result == 0) in threads_per_core() too?
>
> Thanks,
> Vladimir
>
> On 11/21/14 7:31 AM, Vladimir Kempik wrote:
>> Hello
>>
>> Thanks for looking into this.
>>
>> It's impossible to collect the needed data at the moment, because the bug
>> isn't reproducible now. And the cpuid dump I've collected from an
>> ec2 virtual machine says that supports_processor_topology() should
>> report false now:
>>
>> static bool supports_processor_topology() {
>>   return (_cpuid_info.std_max_function >= 0xB) &&
>>          // eax[4:0] | ebx[0:15] == 0 indicates invalid topology level.
>>          // Some cpus have max cpuid >= 0xB but do not support processor topology.
>>          (((_cpuid_info.tpl_cpuidB0_eax & 0x1f) |
>>          _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus) != 0);
>> }
>>
>>
>> which comes from this being false:
>>
>> (((_cpuid_info.tpl_cpuidB0_eax & 0x1f) |
>> _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus) != 0);
>>
>> The check I've added is a sanity check to prevent the same crashes in the future.
>>
>> Thanks. Vladimir
>>
>>
>> On 17.11.2014 22:47, Vladimir Kozlov wrote:
>>> According to the following document, the cpu has 10 cores (and 2 threads
>>> per core):
>>>
>>> http://ark.intel.com/products/75275/Intel-Xeon-Processor-E5-2670-v2-25M-Cache-2_50-GHz
>>>
>>>
>>> The hs_err file in the bug report shows only 2 processors, and the
>>> following lines are missing:
>>>
>>> physical id : 0
>>> siblings : 4
>>> core id : 0
>>> cpu cores : 4
>>> apicid : 0
>>> initial apicid : 0
>>>
>>> I assume it is some kind of virtual environment in which cpuid
>>> topology is not working (at least our code does not work).
>>> We may be missing some checks which indicate that topology is not
>>> supported.
>>> It would be nice if you could put all topology and related cpuid bits
>>> from amazon ec2 in the bug report.
>>> Checking for 0 could be fine, but a non-zero value could still be
>>> wrong if topology info is not supported.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 11/17/14 8:20 AM, Vladimir Kempik wrote:
>>>> Hi,
>>>>
>>>> Please review this patch, which adds a sanity check to cores_per_cpu():
>>>>
>>>> http://cr.openjdk.java.net/~vkempik/8058935/webrev.00/
>>>> https://bugs.openjdk.java.net/browse/JDK-8058935
>>>>
>>>> A few months ago we got reports of java crashing in the amazon ec2
>>>> environment (they use Xen).
>>>> https://bugs.openjdk.java.net/browse/JDK-8058935
>>>> https://bugs.openjdk.java.net/browse/JDK-8058937
>>>>
>>>> JVM args used to trigger the crash: -XX:+UnlockCommercialFeatures
>>>> -XX:+FlightRecorder
>>>>
>>>> After investigation I think the crash could only have happened if
>>>> supports_processor_topology() returned true and
>>>> _cpuid_info.tpl_cpuidB1_ebx.bits.logical_cpus was zero.
>>>>
>>>> I have not been able to reproduce the bug on the amazon ec2 cloud
>>>> recently.
>>>>
>>>> The patch adds a sanity check: if cpu topology was used and resulted
>>>> in 0 cores per cpu, then fall back to the non-topology variant, which
>>>> can't result in 0 cores per cpu.
>>>>
>>>> Testing: JPRT.
>>>>
>>>> Thanks,
>>>> Vladimir.
>>