RFR(M): 8166317: InterpreterCodeSize should be computed
Volker Simonis
volker.simonis at gmail.com
Mon Oct 9 19:24:57 UTC 2017
Hi Vladimir,
I've analyzed the crash. The problem is Sparc specific because on
Sparc we do not call the SharedRuntime for G1 pre/post barriers (i.e.
SharedRuntime::g1_wb_pre() / SharedRuntime::g1_wb_post()) like on
other architectures. Instead we lazily create assembler stubs on the
fly (generate_satb_log_enqueue_if_necessary() /
generate_dirty_card_log_enqueue_if_necessary()) when they are needed.
This happens during the generation of the interpreter and allocates
additional memory in the code cache, so we can no longer shrink the
memory which was initially allocated for the interpreter.
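Schematically, the code cache then looks roughly like this when
TemplateInterpreter::initialize() tries to give back the unused tail
(simplified sketch based on the hs_err output below):

  [ interpreter blob (worst-case size) | unused tail ][ G1 barrier stub(s) ]

CodeHeap::deallocate_tail() can only trim the topmost allocation in the
heap, so it hits the "Intermediate allocation!" guarantee.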
Unfortunately we can't easily generate these stubs during
'stubRoutines_init1()' because
'generate_dirty_card_log_enqueue_if_necessary()' needs the byte map
base address which is only initialized in
'CardTableModRefBS::initialize()' during 'universe_init()', which
happens after 'stubRoutines_init1()'.
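For reference, the relevant ordering in init_globals() is roughly the
following (simplified sketch from memory, not the exact code):

  jint init_globals() {
    ...
    stubRoutines_init1();   // too early - the byte map base is not known yet
    universe_init();        // CardTableModRefBS::initialize() runs in here
    ...
    interpreter_init();     // SPARC lazily generates the G1 barrier stubs in here
    ...
  }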
I'm still thinking about a good way to fix this without too many
platform-specific ifdefs.
Regards,
Volker
On Tue, Oct 3, 2017 at 9:46 PM, Vladimir Kozlov
<vladimir.kozlov at oracle.com> wrote:
> I rebased it. But there is a problem with the changes: the VM hits the guarantee() in this
> code when run on SPARC, in both fastdebug and product builds.
> The crash happens during the build. We can't push this - the problem should be
> investigated and fixed first.
>
> Thanks,
> Vladimir
>
> make/Main.gmk:443: recipe for target 'generate-link-opt-data' failed
> /usr/ccs/bin/bash: line 4: 9349 Abort (core dumped)
> /s/build/solaris-sparcv9-debug/support/interim-image/bin/java
> -XX:DumpLoadedClassList=/s/build/solaris-sparcv9-debug/support/link_opt/classlist
> -Djava.lang.invoke.MethodHandle.TRACE_RESOLVE=true -cp
> /s/build/solaris-sparcv9-debug/support/classlist.jar
> build.tools.classlist.HelloClasslist 2>&1 >
> /s/build/solaris-sparcv9-debug/support/link_opt/default_jli_trace.txt
> make[3]: *** [/s/build/solaris-sparcv9-debug/support/link_opt/classlist]
> Error 134
> make[2]: *** [generate-link-opt-data] Error 1
>
>
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # Internal Error (/s/open/src/hotspot/share/memory/heap.cpp:233), pid=9349,
> tid=2
> # guarantee(b == block_at(_next_segment - actual_number_of_segments))
> failed: Intermediate allocation!
> #
> # JRE version: (10.0) (fastdebug build )
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug
> 10-internal+0-2017-09-30-014154.8166317, mixed mode, tiered, compressed
> oops, g1 gc, solaris-sparc)
> # Core dump will be written. Default location: /s/open/make/core or
> core.9349
> #
> # If you would like to submit a bug report, please visit:
> # http://bugreport.java.com/bugreport/crash.jsp
> #
>
> --------------- S U M M A R Y ------------
>
> Command Line:
> -XX:DumpLoadedClassList=/s/build/solaris-sparcv9-debug/support/link_opt/classlist
> -Djava.lang.invoke.MethodHandle.TRACE_RESOLVE=true
> build.tools.classlist.HelloClasslist
>
> Host: sca00dbv, Sparcv9 64 bit 3600 MHz, 16 cores, 32G, Oracle Solaris 11.2
> SPARC
> Time: Sat Sep 30 03:29:46 2017 UTC elapsed time: 0 seconds (0d 0h 0m 0s)
>
> --------------- T H R E A D ---------------
>
> Current thread (0x000000010012f000): JavaThread "Unknown thread"
> [_thread_in_vm, id=2, stack(0x0007fffef9700000,0x0007fffef9800000)]
>
> Stack: [0x0007fffef9700000,0x0007fffef9800000], sp=0x0007fffef97ff020,
> free space=1020k
> Native frames: (J=compiled Java code, A=aot compiled Java code,
> j=interpreted, Vv=VM code, C=native code)
> V [libjvm.so+0x1f94508] void VMError::report_and_die(int,const char*,const
> char*,void*,Thread*,unsigned char*,void*,void*,const char*,int,unsigned
> long)+0xa58
> V [libjvm.so+0x1f93a3c] void VMError::report_and_die(Thread*,const
> char*,int,const char*,const char*,void*)+0x3c
> V [libjvm.so+0xd02f38] void report_vm_error(const char*,int,const
> char*,const char*,...)+0x78
> V [libjvm.so+0xfc219c] void CodeHeap::deallocate_tail(void*,unsigned
> long)+0xec
> V [libjvm.so+0xbf4f14] void CodeCache::free_unused_tail(CodeBlob*,unsigned
> long)+0xe4
> V [libjvm.so+0x1e0ae70] void StubQueue::deallocate_unused_tail()+0x40
> V [libjvm.so+0x1e7452c] void TemplateInterpreter::initialize()+0x19c
> V [libjvm.so+0x1051220] void interpreter_init()+0x20
> V [libjvm.so+0x10116e0] int init_globals()+0xf0
> V [libjvm.so+0x1ed8548] int
> Threads::create_vm(JavaVMInitArgs*,bool*)+0x4a8
> V [libjvm.so+0x11c7b58] int
> JNI_CreateJavaVM_inner(JavaVM_**,void**,void*)+0x108
> C [libjli.so+0x7950] InitializeJVM+0x100
>
>
> On 10/2/17 7:55 AM, coleen.phillimore at oracle.com wrote:
>>
>>
>> I can sponsor this for you once you rebase and fix these compilation errors.
>> Thanks,
>> Coleen
>>
>> On 9/30/17 12:28 AM, Volker Simonis wrote:
>>>
>>> Hi Vladimir,
>>>
>>> thanks a lot for remembering these changes!
>>>
>>> Regards,
>>> Volker
>>>
>>>
>>> Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote on Fri, 29 Sep 2017 at 15:47:
>>>
>>> I hit a build failure when I tried to push the changes:
>>>
>>> src/hotspot/share/code/codeBlob.hpp(162) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data
>>> src/hotspot/share/code/codeBlob.hpp(163) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data
>>>
>>> I am going to fix it by casting to (int):
>>>
>>> + void adjust_size(size_t used) {
>>> + _size = (int)used;
>>> + _data_offset = (int)used;
>>> + _code_end = (address)this + used;
>>> + _data_end = (address)this + used;
>>> + }
>>>
>>> Note, the CodeCache size can't be more than 2Gb (max_int), so such casting is fine.
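>>>
>>> (For reference, the new CodeCache::free_unused_tail() from the webrev uses
>>> adjust_size() roughly like this - a sketch from memory, not the exact code:
>>>
>>>   void CodeCache::free_unused_tail(CodeBlob* cb, size_t used) {
>>>     // account for the CodeBlob header in front of the interpreter's stub buffer
>>>     used += CodeBlob::align_code_offset(cb->header_size());
>>>     get_code_heap(cb)->deallocate_tail(cb, used);
>>>     cb->adjust_size(used);
>>>   }
>>>
>>> so the (int) casts only affect the blob's own bookkeeping.)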
>>>
>>> Vladimir
>>>
>>> On 9/6/17 6:20 AM, Volker Simonis wrote:
>>> > On Tue, Sep 5, 2017 at 9:36 PM, <coleen.phillimore at oracle.com> wrote:
>>> >>
>>> >> I was going to make the same comment about the friend declaration in v1, so
>>> >> v2 looks better to me. Looks good. Thank you for finding a solution to
>>> >> this problem that we've had for a long time. I will sponsor this (remind me
>>> >> if I forget after the 18th).
>>> >>
>>> >
>>> > Thanks Coleen! I've updated
>>> >
>>> > http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v2/
>>> >
>>> > in-place and added you as a second reviewer.
>>> >
>>> > Regards,
>>> > Volker
>>> >
>>> >
>>> >> thanks,
>>> >> Coleen
>>> >>
>>> >>
>>> >>
>>> >> On 9/5/17 1:17 PM, Vladimir Kozlov wrote:
>>> >>>
>>> >>> On 9/5/17 9:49 AM, Volker Simonis wrote:
>>> >>>>
>>> >>>> On Fri, Sep 1, 2017 at 6:16 PM, Vladimir Kozlov
>>> >>>> <vladimir.kozlov at oracle.com> wrote:
>>> >>>>>
>>> >>>>> Maybe add a new CodeBlob method to adjust the sizes instead of directly
>>> >>>>> setting them in CodeCache::free_unused_tail(). Then you would not need the
>>> >>>>> friend class CodeCache in CodeBlob.
>>> >>>>>
>>> >>>>
>>> >>>> Changed as suggested (I didn't like the friend declaration either :)
>>> >>>>
>>> >>>>> Also, I think the adjustment for header_size should be done in
>>> >>>>> CodeCache::free_unused_tail() to limit the scope of code which knows about
>>> >>>>> the blob layout.
>>> >>>>>
>>> >>>>
>>> >>>> Yes, that's much cleaner. Please find the updated webrev here:
>>> >>>>
>>> >>>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v2/
>>>
>>> >>>
>>> >>>
>>> >>> Good.
>>> >>>
>>> >>>>
>>> >>>> I've also found another "day 1" problem in StubQueue::next():
>>> >>>>
>>> >>>>    Stub* next(Stub* s) const          { int i = index_of(s) + stub_size(s);
>>> >>>> -                                       if (i == _buffer_limit) i = 0;
>>> >>>> +                                       // Only wrap around in the non-contiguous case (see stubss.cpp)
>>> >>>> +                                       if (i == _buffer_limit && _queue_end < _buffer_limit) i = 0;
>>> >>>>                                         return (i == _queue_end) ? NULL : stub_at(i);
>>> >>>>                                       }
>>> >>>>
>>> >>>> The problem was that the method was not prepared to handle the case
>>> >>>> where _buffer_limit == _queue_end == _buffer_size, which led to an
>>> >>>> endless loop when iterating over a StubQueue with StubQueue::next()
>>> >>>> until next() returns NULL (as is done, for example, with
>>> >>>> -XX:+PrintInterpreter). But with the new, trimmed CodeBlob we run into
>>> >>>> exactly this situation.
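>>> >>>>
>>> >>>> The iteration in question is essentially (sketch):
>>> >>>>
>>> >>>>   for (Stub* s = first(); s != NULL; s = next(s)) {
>>> >>>>     // e.g. print the stub
>>> >>>>   }
>>> >>>>
>>> >>>> With _buffer_limit == _queue_end == _buffer_size the old next() wrapped i
>>> >>>> around to 0 instead of returning NULL, so this loop never terminated.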
>>> >>>
>>> >>>
>>> >>> Okay.
>>> >>>
>>> >>>>
>>> >>>> While doing this last fix I also noticed that "StubQueue::stubs_do()",
>>> >>>> "StubQueue::queues_do()" and "StubQueue::register_queue()" don't seem
>>> >>>> to be used anywhere in the open code base (please correct me if I'm
>>> >>>> wrong). What do you think, maybe we should remove this code in a
>>> >>>> follow-up change if it is really not needed?
>>> >>>
>>> >>>
>>> >>> register_queue() is used in the constructor. The other two you can remove.
>>> >>> stub_code_begin() and stub_code_end() are not used either - remove them.
>>> >>> I thought we run on Linux with a flag which warns about unused code.
>>> >>>
>>> >>>>
>>> >>>> Finally, could you please run the new version through JPRT and sponsor
>>> >>>> it once jdk10/hs is open again?
>>> >>>
>>> >>>
>>> >>> Will do when the jdk10 "consolidation" is finished. Please remind me later if
>>> >>> I forget.
>>> >>>
>>> >>> Thanks,
>>> >>> Vladimir
>>> >>>
>>> >>>>
>>> >>>> Thanks,
>>> >>>> Volker
>>> >>>>
>>> >>>>> Thanks,
>>> >>>>> Vladimir
>>> >>>>>
>>> >>>>>
>>> >>>>> On 9/1/17 8:46 AM, Volker Simonis wrote:
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> Hi,
>>> >>>>>>
>>> >>>>>> I've decided to split the fix for the 'CodeHeap::contains_blob()'
>>> >>>>>> problem into its own issue "8187091: ReturnBlobToWrongHeapTest fails
>>> >>>>>> because of problems in CodeHeap::contains_blob()"
>>> >>>>>> (https://bugs.openjdk.java.net/browse/JDK-8187091) and started a new
>>> >>>>>> review thread for discussing it at:
>>> >>>>>>
>>> >>>>>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-September/028206.html
>>> >>>>>>
>>> >>>>>> So please let's keep this thread for discussing the interpreter code
>>> >>>>>> size issue only. I've prepared a new version of the webrev which is
>>> >>>>>> the same as the first one with the only difference that the change to
>>> >>>>>> 'CodeHeap::contains_blob()' has been removed:
>>> >>>>>>
>>> >>>>>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v1/
>>> >>>>>>
>>> >>>>>> Thanks,
>>> >>>>>> Volker
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> On Thu, Aug 31, 2017 at 6:35 PM, Volker Simonis
>>> >>>>>> <volker.simonis at gmail.com> wrote:
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> On Thu, Aug 31, 2017 at 6:05 PM, Vladimir Kozlov
>>> >>>>>>> <vladimir.kozlov at oracle.com> wrote:
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>> Very good change. Thank you, Volker.
>>> >>>>>>>>
>>> >>>>>>>> About contains_blob(): the problem is that AOTCompiledMethod is
>>> >>>>>>>> allocated in CHeap and not in the AOT code section (which is RO):
>>> >>>>>>>>
>>> >>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/8acd232fb52a/src/share/vm/aot/aotCompiledMethod.hpp#l124
>>> >>>>>>>>
>>> >>>>>>>> It is allocated in CHeap after the AOT library is loaded. Its
>>> >>>>>>>> code_begin() points into the AOT code section, but the AOTCompiledMethod*
>>> >>>>>>>> itself points outside it (to normal malloced space), so you can't use the
>>> >>>>>>>> (char*)blob address.
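>>> >>>>>>>>
>>> >>>>>>>> Schematically (just to illustrate, not actual code):
>>> >>>>>>>>
>>> >>>>>>>>   AOTCompiledMethod* m = ...;    // malloced, lives in CHeap
>>> >>>>>>>>   m->code_begin();               // points into the AOT library's code section
>>> >>>>>>>>
>>> >>>>>>>>   contains((char*)m)             // false - CHeap, not the AOT code section
>>> >>>>>>>>   contains(m->code_begin())      // true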
>>> >>>>>>>>
>>> >>>>>>>
>>> >>>>>>> Thanks for the explanation - now I got it.
>>> >>>>>>>
>>> >>>>>>>> There are 2 ways to fix it, I think.
>>> >>>>>>>> One is to add a new field to CodeBlobLayout and set it to the blob* address
>>> >>>>>>>> for normal CodeCache blobs and to code_begin for AOT code.
>>> >>>>>>>> The second is to use contains(blob->code_end() - 1), assuming that the AOT
>>> >>>>>>>> code size is never zero.
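>>> >>>>>>>>
>>> >>>>>>>> I.e. roughly (sketch of the second option):
>>> >>>>>>>>
>>> >>>>>>>>   bool contains_blob(const CodeBlob* blob) const {
>>> >>>>>>>>     // last byte of the blob's code; for AOT this assumes the code size is never zero
>>> >>>>>>>>     return contains(blob->code_end() - 1);
>>> >>>>>>>>   }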
>>> >>>>>>>>
>>> >>>>>>>
>>> >>>>>>> I'll give it a try tomorrow and will send out a new webrev.
>>> >>>>>>>
>>> >>>>>>> Regards,
>>> >>>>>>> Volker
>>> >>>>>>>
>>> >>>>>>>> Thanks,
>>> >>>>>>>> Vladimir
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>> On 8/31/17 5:43 AM, Volker Simonis wrote:
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> On Thu, Aug 31, 2017 at 12:14 PM, Claes Redestad
>>> >>>>>>>>> <claes.redestad at oracle.com> wrote:
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>> On 2017-08-31 08:54, Volker Simonis wrote:
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> While working on this, I found another problem which is related to
>>> >>>>>>>>>>> the fix of JDK-8183573 and leads to crashes when executing the JTreg
>>> >>>>>>>>>>> test compiler/codecache/stress/ReturnBlobToWrongHeapTest.java.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> The problem is that JDK-8183573 replaced
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>   virtual bool contains_blob(const CodeBlob* blob) const {
>>> >>>>>>>>>>>     return low_boundary() <= (char*) blob && (char*) blob < high();
>>> >>>>>>>>>>>   }
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> by:
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>   bool contains_blob(const CodeBlob* blob) const {
>>> >>>>>>>>>>>     return contains(blob->code_begin());
>>> >>>>>>>>>>>   }
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> But that may be wrong in the corner case where the size of the
>>> >>>>>>>>>>> CodeBlob's payload is zero (i.e. the CodeBlob consists only of the
>>> >>>>>>>>>>> 'header' - i.e. the C++ object itself), because in that case
>>> >>>>>>>>>>> CodeBlob::code_begin() points right behind the CodeBlob's header,
>>> >>>>>>>>>>> which is a memory location that doesn't belong to the CodeBlob anymore.
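>>> >>>>>>>>>>>
>>> >>>>>>>>>>> In other words, for a zero-payload blob (schematically):
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>   [ CodeBlob header ][ next allocation ... ]
>>> >>>>>>>>>>>   ^ blob             ^ blob->code_begin() == blob->code_end()
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> so the inclusion check tests an address which may already belong to
>>> >>>>>>>>>>> the next blob (or even to a different CodeHeap).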
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>> I recall this change was somehow necessary to allow merging
>>> >>>>>>>>>> AOTCodeHeap::contains_blob and CodeHeap::contains_blob into
>>> >>>>>>>>>> one devirtualized method, so you need to ensure all AOT tests
>>> >>>>>>>>>> pass with this change (on linux-x64).
>>> >>>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> All of hotspot/test/aot and hotspot/test/jvmci executed and passed
>>> >>>>>>>>> successfully. Are there any other tests I should check?
>>> >>>>>>>>>
>>> >>>>>>>>> That said, it is a little hard to follow the stages of your change.
>>> >>>>>>>>> It seems like
>>> >>>>>>>>>
>>> >>>>>>>>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.00/
>>> >>>>>>>>>
>>> >>>>>>>>> was reviewed [1] but then finally the slightly changed version from
>>> >>>>>>>>>
>>> >>>>>>>>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.01/
>>> >>>>>>>>>
>>> >>>>>>>>> was checked in and linked to the bug report.
>>> >>>>>>>>>
>>> >>>>>>>>> The first, reviewed version of the change still had a correct version
>>> >>>>>>>>> of 'CodeHeap::contains_blob(const CodeBlob* blob)' while the second,
>>> >>>>>>>>> checked in version has the faulty version of that method.
>>> >>>>>>>>>
>>> >>>>>>>>> I don't know why you finally did that change to 'contains_blob()' but
>>> >>>>>>>>> I don't see any reason why we shouldn't be able to directly use the
>>> >>>>>>>>> blob's address for inclusion checking. From what I understand, it
>>> >>>>>>>>> should ALWAYS be contained in the corresponding CodeHeap so no reason
>>> >>>>>>>>> to mess with 'CodeBlob::code_begin()'.
>>> >>>>>>>>>
>>> >>>>>>>>> Please let me know if I'm missing something.
>>> >>>>>>>>>
>>> >>>>>>>>> [1]
>>> >>>>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-July/026624.html
>>> >>>>>>>>>
>>> >>>>>>>>>> I can't help but wonder if we'd not be better served by disallowing
>>> >>>>>>>>>> zero-sized payloads. Is this something that can ever actually
>>> >>>>>>>>>> happen except by abuse of the white box API?
>>> >>>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> The corresponding test (ReturnBlobToWrongHeapTest.java) specifically
>>> >>>>>>>>> wants to allocate "segment sized" blocks, which is most easily achieved
>>> >>>>>>>>> by allocating zero-sized CodeBlobs. And I think there's nothing wrong
>>> >>>>>>>>> about it if we handle the inclusion tests correctly.
>>> >>>>>>>>>
>>> >>>>>>>>> Thank you and best regards,
>>> >>>>>>>>> Volker
>>> >>>>>>>>>
>>> >>>>>>>>>> /Claes
>>> >>
>>> >>
>>>
>>
>