segfault with 8u5

Martin Traverso mtraverso at gmail.com
Mon Sep 15 23:10:13 UTC 2014


It turns out the crash only happens with that code and in other places that
use "RetryDriver":

https://github.com/facebook/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/RetryDriver.java

Unfortunately, RetryDriver is a general class that takes a Callable and
dispatches to it while applying some retry policy based on which exceptions
are seen. It's used in a lot of places, so I don't know which sequence of
calls results in this crash. Let me know if there's anything else you'd
like me to try in order to narrow down the exact sequence of events that
triggers it.
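
For anyone who doesn't want to open the link, the general shape of such a
class is roughly the following. This is only a minimal sketch under my own
assumptions, not the actual RetryDriver code: the class name SimpleRetry,
the run() signature, and the "stopOn" list are made up for illustration.
It calls the Callable, rethrows immediately for exception types on the stop
list, and retries anything else until the attempts run out.

```java
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical, simplified stand-in for a retry helper; NOT Presto's RetryDriver.
public final class SimpleRetry {
    private SimpleRetry() {}

    // Calls `callable` up to `maxAttempts` times.
    // Exceptions whose type is listed in `stopOn` are rethrown right away;
    // any other exception triggers another attempt until attempts run out.
    public static <V> V run(int maxAttempts,
                            List<Class<? extends Exception>> stopOn,
                            Callable<V> callable) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        int attempt = 0;
        while (true) {
            attempt++;
            try {
                return callable.call();
            } catch (Exception e) {
                // Policy decision based on which exception was seen.
                for (Class<? extends Exception> type : stopOn) {
                    if (type.isInstance(e)) {
                        throw e;
                    }
                }
                if (attempt >= maxAttempts) {
                    throw e;
                }
            }
        }
    }
}
```

Because every caller goes through the same try/catch, the exception types
that reach it depend entirely on the Callable passed in, which is why I
can't tell which call site is the one that triggers the crash.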

Martin

On Fri, Sep 12, 2014 at 1:20 PM, Martin Traverso <mtraverso at gmail.com>
wrote:

>
>> It would be nice if you can give a java code example which shows what
>> HiveMetastoreApiStats$1::call() does. It doesn't need to be exactly that
>> method which could be proprietary and closed. I will file a bug and I need
>> a small test case to show the problem.
>>
>
> The code is here:
>
>
> https://github.com/facebook/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/metastore/HiveMetastoreApiStats.java
>
>
> I've seen the crash in other places, so let me run it a few times and I'll
> provide you with other examples.
>
> Martin
>
>
>> Regards,
>> Vladimir
>>
>>
>>> Thanks!
>>> Martin
>>>
>>>
>>> On Thu, Sep 11, 2014 at 8:08 PM, Martin Traverso <mtraverso at gmail.com>
>>> wrote:
>>>
>>>     Good news (so far)!
>>>
>>>     I've had the system running for a couple of hours with this patch
>>>     without problems (it would crash within a few minutes before). I'll
>>>     leave it running a bit longer to be sure, but this looks promising.
>>>
>>>     Thanks!
>>>     Martin
>>>
>>>
>>>     On Thu, Sep 11, 2014 at 6:03 PM, Vladimir Kozlov
>>>     <vladimir.kozlov at oracle.com> wrote:
>>>
>>>         Nice!
>>>
>>>         You can remove the changes I gave you, apply the following code,
>>>         and see if it helps. It may crash in another place.
>>>
>>>         Vladimir
>>>
>>>         diff -r fe1f65b0a2d8 src/share/vm/opto/doCall.cpp
>>>         --- a/src/share/vm/opto/doCall.cpp      Wed Sep 10 09:05:31 2014 -0700
>>>         +++ b/src/share/vm/opto/doCall.cpp      Thu Sep 11 18:00:29 2014 -0700
>>>         @@ -802,6 +802,11 @@
>>>               if( ex_node->is_Phi() ) {
>>>                 ex_klass_node = new (C) PhiNode( ex_node->in(0), TypeKlassPtr::OBJECT );
>>>                 for( uint i = 1; i < ex_node->req(); i++ ) {
>>>         +        Node* ex_in = ex_node->in(i);
>>>         +        if (ex_in == top() || ex_in == NULL) {
>>>         +          ex_klass_node->init_req(i, top());
>>>         +          continue;
>>>         +        }
>>>                   Node* p = basic_plus_adr( ex_node->in(i), ex_node->in(i), oopDesc::klass_offset_in_bytes() );
>>>                   Node* k = _gvn.transform( LoadKlassNode::make(_gvn, immutable_memory(), p, TypeInstPtr::KLASS, TypeKlassPtr::OBJECT) );
>>>                   ex_klass_node->init_req( i, k );
>>>
>>>
>>>
>>>         On 9/11/14 5:31 PM, Martin Traverso wrote:
>>>
>>>             Vladimir,
>>>
>>>                  Did you get next message in output before the crash?
>>>                  "*** Exception not InstPtr"
>>>
>>>
>>>             Not that I can tell.
>>>
>>>
>>>                  It could happen if it is dead part of the IR graph.
>>>                  Can you add next debug output code to Hotspot's
>>>                  doCall.cpp (patch attached) and test with it?
>>>
>>>
>>>             Here's the output:
>>>
>>>             ******** Expecting TypeClassPtr 1 ********
>>>
>>>             *** compilation continue
>>>
>>>             *** save_ex_node:
>>>
>>>                900    CastPP  ===  899  575  [[ 891  909  909  1016  932  1110  958  978  970  993  1016  1004  1004  1013 ]]  #java/lang/Exception:NotNull *  Oop:java/lang/Exception:NotNull * !jvms: HiveMetastoreApiStats$1::call @ bci:97 RetryDriver::run @ bci:25
>>>
>>>                1013   CheckCastPP     ===  1011  900  [[ 923  1021  1021  1031  1040  1050  1066  1085  1077  1100  1110  1176  1176 ]]  #org/apache/hadoop/hive/metastore/api/NoSuchObjectException:NotNull:exact *  Oop:org/apache/hadoop/hive/metastore/api/NoSuchObjectException:NotNull:exact * !jvms: HiveMetastoreApiStats$1::call @ bci:118 RetryDriver::run @ bci:25
>>>
>>>                1000   Region  ===  1000  1097 _  990  [[ 1000  942  1107  1108  1109  1110  1121  1124  1175 ]]  !jvms: HiveMetastoreApiStats$1::call @ bci:135 RetryDriver::run @ bci:25
>>>
>>>                1110   Phi     ===  1000  1013  1  900  [[ 942  1115  1132  1150  1142  1165  1115  1172  1172 ]]  #java/lang/Exception:NotNull *  Oop:java/lang/Exception:NotNull * !jvms: HiveMetastoreApiStats$1::call @ bci:135 RetryDriver::run @ bci:25
>>>
>>>             *** ex_node:
>>>
>>>                900    CastPP  ===  899  575  [[ 891  909  909  1016  932  1110  958  978  970  993  1016  1004  1004  1013 ]]  #java/lang/Exception:NotNull *  Oop:java/lang/Exception:NotNull * !jvms: HiveMetastoreApiStats$1::call @ bci:97 RetryDriver::run @ bci:25
>>>
>>>                1013   CheckCastPP     ===  1011  900  [[ 923  1021  1021  1031  1040  1050  1066  1085  1077  1100  1110  1176  1176 ]]  #org/apache/hadoop/hive/metastore/api/NoSuchObjectException:NotNull:exact *  Oop:org/apache/hadoop/hive/metastore/api/NoSuchObjectException:NotNull:exact * !jvms: HiveMetastoreApiStats$1::call @ bci:118 RetryDriver::run @ bci:25
>>>
>>>                1000   Region  ===  1000  1097 _  990  [[ 1000  942  1107  1108  1109  1110  1121  1124  1175 ]]  !jvms: HiveMetastoreApiStats$1::call @ bci:135 RetryDriver::run @ bci:25
>>>
>>>                1110   Phi     ===  1000  1013  1  900  [[ 942  1115  1132  1150  1142  1165  1115  1172  1172 ]]  #java/lang/Exception:NotNull *  Oop:java/lang/Exception:NotNull * !jvms: HiveMetastoreApiStats$1::call @ bci:135 RetryDriver::run @ bci:25
>>>
>>>
>>>
>>>                  I may need to ask you do more such testing if it is fine
>>>                  with you.
>>>
>>>
>>>             Absolutely! Just let me know.
>>>
>>>             Thanks,
>>>             Martin
>>>
>>>
>>>
>>>
>>>                  Regards,
>>>                  Vladimir
>>>
>>>                  On 9/11/14 2:05 PM, Martin Traverso wrote:
>>>
>>>                      Hi Vladimir,
>>>
>>>                      I finally got around to building a fastdebug VM. I
>>>                      don't see the first crash anymore (great!), but the
>>>                      second one still happens. Here's the output:
>>>
>>>                      https://gist.github.com/martint/abea9be3df700236ec0b
>>>
>>>                      Let me know if there's any other information you'd
>>>                      like me to gather.
>>>
>>>                      Thanks!
>>>                      Martin
>>>
>>>                      On Wed, Jul 30, 2014 at 10:29 AM, Vladimir Kozlov
>>>                      <vladimir.kozlov at oracle.com> wrote:
>>>
>>>                           Martin,
>>>
>>>                           It would be also nice if you can build fastdebug
>>>                           VM and run with it.
>>>                           cd hotspot/make; make fastdebug LP64=1
>>>
>>>                           Thanks,
>>>                           Vladimir
>>>
>>>
>>>                           On 7/30/14 9:40 AM, Vladimir Kozlov wrote:
>>>
>>>                               8029381 was fixed in jdk9 and 8u20 (which
>>>                               should be released soon):
>>>
>>>                               http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/0b9500028980
>>>
>>>                               I will look on C2 crash more. I don't remember
>>>                               any recent problems in
>>>                               catch_inline_exceptions().
>>>
>>>                               Regards,
>>>                               Vladimir
>>>
>>>                               On 5/19/14 12:24 PM, Martin Traverso wrote:
>>>
>>>                                        The failure happened in C1 JIT
>>>                                        compiler (first tier). You can try
>>>                                        to switch off -XX:-TieredCompilation.
>>>
>>>
>>>                                   That seemed to have worked around this
>>>                                   particular issue.
>>>
>>>                                   However, we ran into another crash:
>>>
>>>                                   Stack: [0x00000000430c8000,0x00000000431c9000], sp=0x00000000431c5980, free space=1014k
>>>                                   Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
>>>
>>>                                   V [libjvm.so+0x814115] LoadKlassNode::make(PhaseGVN&, Node*, Node*, TypePtr const*, TypeKlassPtr const*)+0x45
>>>                                   V [libjvm.so+0x51ab06] Parse::catch_inline_exceptions(SafePointNode*)+0x936
>>>                                   V [libjvm.so+0x8c1a5a] Parse::do_exceptions()+0xba
>>>                                   V [libjvm.so+0x8c6100] Parse::do_one_block()+0x180
>>>                                   V [libjvm.so+0x8c6377] Parse::do_all_blocks()+0x127
>>>                                   V [libjvm.so+0x8c95d3] Parse::Parse(JVMState*, ciMethod*, float, Parse*)+0x15a3
>>>                                   V [libjvm.so+0x3b6529] ParseGenerator::generate(JVMState*, Parse*)+0x99
>>>                                   V [libjvm.so+0x3b7202] PredictedCallGenerator::generate(JVMState*, Parse*)+0x2a2
>>>                                   V [libjvm.so+0x51aefd] Parse::do_call()+0x1cd
>>>                                   V [libjvm.so+0x8d3c7a] Parse::do_one_bytecode()+0x32da
>>>                                   V [libjvm.so+0x8c60f8] Parse::do_one_block()+0x178
>>>                                   V [libjvm.so+0x8c6377] Parse::do_all_blocks()+0x127
>>>                                   V [libjvm.so+0x8c95d3] Parse::Parse(JVMState*, ciMethod*, float, Parse*)+0x15a3
>>>                                   V [libjvm.so+0x3b6529] ParseGenerator::generate(JVMState*, Parse*)+0x99
>>>                                   V [libjvm.so+0x46111c] Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool)+0x128c
>>>                                   V [libjvm.so+0x3b5008] C2Compiler::compile_method(ciEnv*, ciMethod*, int)+0x198
>>>                                   V [libjvm.so+0x46982a] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xc8a
>>>                                   V [libjvm.so+0x46c230] CompileBroker::compiler_thread_loop()+0x620
>>>                                   V [libjvm.so+0x9e303f] JavaThread::thread_main_inner()+0xdf
>>>                                   V [libjvm.so+0x9e3205] JavaThread::run()+0x1b5
>>>                                   V [libjvm.so+0x8a00c8] java_start(Thread*)+0x108
>>>
>>>
>>>
>>>                                   Full dump here:
>>>                                   https://gist.github.com/martint/783cf3e30c17fc897423
>>>
>>>
>>>                                        The only bug I found which could be
>>>                                        related is next:
>>>
>>>                                        https://bugs.openjdk.java.net/browse/JDK-8029381
>>>
>>>
>>>                                   Unfortunately, I can't see this bug
>>>                                   report. I get redirected to the login
>>>                                   screen.
>>>
>>>
>>>                                   Thanks!
>>>                                   Martin
>>>
>>>
>>>
>>>
>>>
>>>
>


More information about the hotspot-compiler-dev mailing list