segfault with 8u5
Martin Traverso
mtraverso at gmail.com
Mon Sep 15 23:10:13 UTC 2014
It turns out the crash only happens with that code and in other places that
use "RetryDriver":
https://github.com/facebook/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/RetryDriver.java
Unfortunately, RetryDriver is a general class that takes a Callable and
dispatches to it while applying some retry policy based on which exceptions
are seen. It's used in a lot of places so I have no idea which sequence of
calls results in this crash. Let me know if there's anything else you'd
like me to try to narrow down the exact sequence of events that triggers it.
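
For readers who have not opened the link, the pattern can be sketched roughly as follows. This is a minimal illustration, not the Presto code: RetrySketch, run, counting, maxAttempts and stopOn are hypothetical names, the retry policy (retry everything except one exception type) is an assumption, and the counting wrapper is only a guess at the general shape of a stats-collecting Callable. It is not known to reproduce the crash.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of the pattern; this is NOT the Presto implementation.
final class RetrySketch {
    private RetrySketch() {}

    // Wraps a task so that every exception it throws is counted and then rethrown.
    // Stand-in (assumption) for a stats-collecting Callable.
    static <V> Callable<V> counting(Callable<V> task, AtomicLong failures) {
        return () -> {
            try {
                return task.call();
            } catch (Exception e) {
                failures.incrementAndGet();
                throw e;
            }
        };
    }

    // Calls the task up to maxAttempts times. An exception that is an instance
    // of stopOn is rethrown immediately; any other Exception causes another attempt.
    static <V> V run(int maxAttempts, Class<? extends Exception> stopOn, Callable<V> task)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                if (stopOn.isInstance(e)) {
                    throw e; // policy: this exception type is not retried
                }
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt failed; surface the last failure
    }
}
```

A caller would pass something like RetrySketch.run(3, IllegalStateException.class, RetrySketch.counting(task, failures)), so the wrapped call() sits directly under the retry loop, with one exception path per catch branch.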
Martin
On Fri, Sep 12, 2014 at 1:20 PM, Martin Traverso <mtraverso at gmail.com>
wrote:
>
>> It would be nice if you could give a Java code example that shows what
>> HiveMetastoreApiStats$1::call() does. It doesn't need to be exactly that
>> method, which could be proprietary and closed. I will file a bug, and I need
>> a small test case to show the problem.
>>
>
> The code is here:
>
>
> https://github.com/facebook/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/metastore/HiveMetastoreApiStats.java
>
>
> I've seen the crash in other places, so let me run it a few times and I'll
> provide you with other examples.
>
> Martin
>
>
>> Regards,
>> Vladimir
>>
>>
>>> Thanks!
>>> Martin
>>>
>>>
>>> On Thu, Sep 11, 2014 at 8:08 PM, Martin Traverso <mtraverso at gmail.com> wrote:
>>>
>>> Good news (so far)!
>>>
>>> I've had the system running for a couple of hours with this patch
>>> without problems (it would crash within a few minutes before). I'll
>>> leave it running a bit longer to be sure, but this looks promising.
>>>
>>> Thanks!
>>> Martin
>>>
>>>
>>> On Thu, Sep 11, 2014 at 6:03 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>>
>>> Nice!
>>>
>>> You can remove the changes I gave you, apply the following code, and see
>>> if it helps. It may crash in another place.
>>>
>>> Vladimir
>>>
>>> diff -r fe1f65b0a2d8 src/share/vm/opto/doCall.cpp
>>> --- a/src/share/vm/opto/doCall.cpp Wed Sep 10 09:05:31 2014 -0700
>>> +++ b/src/share/vm/opto/doCall.cpp Thu Sep 11 18:00:29 2014 -0700
>>> @@ -802,6 +802,11 @@
>>>      if( ex_node->is_Phi() ) {
>>>        ex_klass_node = new (C) PhiNode( ex_node->in(0), TypeKlassPtr::OBJECT );
>>>        for( uint i = 1; i < ex_node->req(); i++ ) {
>>> +        Node* ex_in = ex_node->in(i);
>>> +        if (ex_in == top() || ex_in == NULL) {
>>> +          ex_klass_node->init_req(i, top());
>>> +          continue;
>>> +        }
>>>          Node* p = basic_plus_adr( ex_node->in(i), ex_node->in(i), oopDesc::klass_offset_in_bytes() );
>>>          Node* k = _gvn.transform( LoadKlassNode::make(_gvn, immutable_memory(), p, TypeInstPtr::KLASS, TypeKlassPtr::OBJECT) );
>>>          ex_klass_node->init_req( i, k );
>>>
>>>
>>>
>>> On 9/11/14 5:31 PM, Martin Traverso wrote:
>>>
>>> Vladimir,
>>>
>>> Did you get the following message in the output before the crash?
>>> "*** Exception not InstPtr"
>>>
>>>
>>> Not that I can tell.
>>>
>>>
>>> It could happen if it is a dead part of the IR graph.
>>> Can you add the following debug output code to HotSpot's doCall.cpp
>>> (patch attached) and test with it?
>>>
>>>
>>> Here's the output:
>>>
>>> ******** Expecting TypeClassPtr 1 ********
>>>
>>> *** compilation continue
>>>
>>> *** save_ex_node:
>>>
>>> 900 CastPP === 899 575 [[ 891 909 909 1016 932 1110 958 978 970 993 1016 1004 1004 1013 ]] #java/lang/Exception:NotNull * Oop:java/lang/Exception:NotNull * !jvms: HiveMetastoreApiStats$1::call @ bci:97 RetryDriver::run @ bci:25
>>>
>>> 1013 CheckCastPP === 1011 900 [[ 923 1021 1021 1031 1040 1050 1066 1085 1077 1100 1110 1176 1176 ]] #org/apache/hadoop/hive/metastore/api/NoSuchObjectException:NotNull:exact * Oop:org/apache/hadoop/hive/metastore/api/NoSuchObjectException:NotNull:exact * !jvms: HiveMetastoreApiStats$1::call @ bci:118 RetryDriver::run @ bci:25
>>>
>>> 1000 Region === 1000 1097 _ 990 [[ 1000 942 1107 1108 1109 1110 1121 1124 1175 ]] !jvms: HiveMetastoreApiStats$1::call @ bci:135 RetryDriver::run @ bci:25
>>>
>>> 1110 Phi === 1000 1013 1 900 [[ 942 1115 1132 1150 1142 1165 1115 1172 1172 ]] #java/lang/Exception:NotNull * Oop:java/lang/Exception:NotNull * !jvms: HiveMetastoreApiStats$1::call @ bci:135 RetryDriver::run @ bci:25
>>>
>>> *** ex_node:
>>>
>>> 900 CastPP === 899 575 [[ 891 909 909 1016 932 1110 958 978 970 993 1016 1004 1004 1013 ]] #java/lang/Exception:NotNull * Oop:java/lang/Exception:NotNull * !jvms: HiveMetastoreApiStats$1::call @ bci:97 RetryDriver::run @ bci:25
>>>
>>> 1013 CheckCastPP === 1011 900 [[ 923 1021 1021 1031 1040 1050 1066 1085 1077 1100 1110 1176 1176 ]] #org/apache/hadoop/hive/metastore/api/NoSuchObjectException:NotNull:exact * Oop:org/apache/hadoop/hive/metastore/api/NoSuchObjectException:NotNull:exact * !jvms: HiveMetastoreApiStats$1::call @ bci:118 RetryDriver::run @ bci:25
>>>
>>> 1000 Region === 1000 1097 _ 990 [[ 1000 942 1107 1108 1109 1110 1121 1124 1175 ]] !jvms: HiveMetastoreApiStats$1::call @ bci:135 RetryDriver::run @ bci:25
>>>
>>> 1110 Phi === 1000 1013 1 900 [[ 942 1115 1132 1150 1142 1165 1115 1172 1172 ]] #java/lang/Exception:NotNull * Oop:java/lang/Exception:NotNull * !jvms: HiveMetastoreApiStats$1::call @ bci:135 RetryDriver::run @ bci:25
>>>
>>>
>>>
>>> I may need to ask you to do more such testing, if that is fine with you.
>>>
>>>
>>> Absolutely! Just let me know.
>>>
>>> Thanks,
>>> Martin
>>>
>>>
>>>
>>>
>>> Regards,
>>> Vladimir
>>>
>>> On 9/11/14 2:05 PM, Martin Traverso wrote:
>>>
>>> Hi Vladimir,
>>>
>>> I finally got around to building a fastdebug VM. I don't see the first
>>> crash anymore (great!), but the second one still happens. Here's
>>> the output:
>>>
>>> https://gist.github.com/martint/abea9be3df700236ec0b
>>>
>>> Let me know if there's any other information you'd like me to gather.
>>>
>>> Thanks!
>>> Martin
>>>
>>> On Wed, Jul 30, 2014 at 10:29 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>>
>>> Martin,
>>>
>>> It would also be nice if you could build a fastdebug VM and run with it:
>>> cd hotspot/make; make fastdebug LP64=1
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>
>>> On 7/30/14 9:40 AM, Vladimir Kozlov wrote:
>>>
>>> 8029381 was fixed in jdk9 and 8u20 (which should be released soon):
>>>
>>> http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/0b9500028980
>>>
>>> I will look into the C2 crash more. I don't remember any recent
>>> problems in catch_inline_exceptions().
>>>
>>> Regards,
>>> Vladimir
>>>
>>> On 5/19/14 12:24 PM, Martin Traverso wrote:
>>>
>>> The failure happened in the C1 JIT compiler (first tier).
>>> You can try to switch off tiered compilation with -XX:-TieredCompilation.
>>>
>>>
>>> That seems to have worked around this particular issue.
>>>
>>> However, we ran into another crash:
>>>
>>> Stack: [0x00000000430c8000,0x00000000431c9000], sp=0x00000000431c5980, free space=1014k
>>> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
>>>
>>> V [libjvm.so+0x814115] LoadKlassNode::make(PhaseGVN&, Node*, Node*, TypePtr const*, TypeKlassPtr const*)+0x45
>>> V [libjvm.so+0x51ab06] Parse::catch_inline_exceptions(SafePointNode*)+0x936
>>> V [libjvm.so+0x8c1a5a] Parse::do_exceptions()+0xba
>>> V [libjvm.so+0x8c6100] Parse::do_one_block()+0x180
>>> V [libjvm.so+0x8c6377] Parse::do_all_blocks()+0x127
>>> V [libjvm.so+0x8c95d3] Parse::Parse(JVMState*, ciMethod*, float, Parse*)+0x15a3
>>> V [libjvm.so+0x3b6529] ParseGenerator::generate(JVMState*, Parse*)+0x99
>>> V [libjvm.so+0x3b7202] PredictedCallGenerator::generate(JVMState*, Parse*)+0x2a2
>>> V [libjvm.so+0x51aefd] Parse::do_call()+0x1cd
>>> V [libjvm.so+0x8d3c7a] Parse::do_one_bytecode()+0x32da
>>> V [libjvm.so+0x8c60f8] Parse::do_one_block()+0x178
>>> V [libjvm.so+0x8c6377] Parse::do_all_blocks()+0x127
>>> V [libjvm.so+0x8c95d3] Parse::Parse(JVMState*, ciMethod*, float, Parse*)+0x15a3
>>> V [libjvm.so+0x3b6529] ParseGenerator::generate(JVMState*, Parse*)+0x99
>>> V [libjvm.so+0x46111c] Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool)+0x128c
>>> V [libjvm.so+0x3b5008] C2Compiler::compile_method(ciEnv*, ciMethod*, int)+0x198
>>> V [libjvm.so+0x46982a] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xc8a
>>> V [libjvm.so+0x46c230] CompileBroker::compiler_thread_loop()+0x620
>>> V [libjvm.so+0x9e303f] JavaThread::thread_main_inner()+0xdf
>>> V [libjvm.so+0x9e3205] JavaThread::run()+0x1b5
>>> V [libjvm.so+0x8a00c8] java_start(Thread*)+0x108
>>>
>>>
>>>
>>> Full dump here:
>>> https://gist.github.com/martint/783cf3e30c17fc897423
>>>
>>>
>>> The only bug I found that could be related is the following:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8029381
>>>
>>>
>>> Unfortunately, I can't see this bug report. I get redirected to the
>>> login screen.
>>>
>>>
>>> Thanks!
>>> Martin
>>>
>>>
>>>
>>>
>>>
>>>
>
More information about the hotspot-compiler-dev
mailing list