RFR(M): 8166317: InterpreterCodeSize should be computed

Christian Thalinger cthalinger at twitter.com
Mon Oct 9 19:45:58 UTC 2017


> On Oct 9, 2017, at 9:24 AM, Volker Simonis <volker.simonis at gmail.com> wrote:
> 
> Hi Vladimir,
> 
> I've analyzed the crash. The problem is Sparc specific because on
> Sparc we do not call the SharedRuntime for G1 pre/post barriers (i.e.
> SharedRuntime::g1_wb_pre() / SharedRuntime::g1_wb_post()) like on
> other architectures. Instead we lazily create assembler stubs on the
> fly (generate_satb_log_enqueue_if_necessary() /
> generate_dirty_card_log_enqueue_if_necessary()) when they are needed.

Why are we using these stubs on SPARC?  Can we get rid of them and just call into the runtime instead?

> This happens during the generation of the interpreter and allocates
> more memory in the code cache such that we can't shrink the memory
> which was initially allocated for the interpreter any more.
> 
> Unfortunately we can't easily generate these stubs during
> 'stubRoutines_init1()' because
> 'generate_dirty_card_log_enqueue_if_necessary()' needs the byte map
> base address which is only initialized in
> 'CardTableModRefBS::initialize()' during 'univers_init()' which
> happens after 'stubRoutines_init1()'.
> 
> I'm still thinking about a good way to fix this without too many
> platfrom-specific ifdefs.
> 
> Regards,
> Volker
> 
> 
> On Tue, Oct 3, 2017 at 9:46 PM, Vladimir Kozlov
> <vladimir.kozlov at oracle.com> wrote:
>> I rebased it. But there is problem with changes. VM hit guarantee() in this
>> code when run on SPARC in both, fastdebug and product, builds.
>> Crash happens during build. We can't push this - problem should be
>> investigated and fixed first.
>> 
>> Thanks,
>> Vladimir
>> 
>> make/Main.gmk:443: recipe for target 'generate-link-opt-data' failed
>> /usr/ccs/bin/bash: line 4:  9349 Abort                   (core dumped)
>> /s/build/solaris-sparcv9-debug/support/interim-image/bin/java
>> -XX:DumpLoadedClassList=/s/build/solaris-sparcv9-debug/support/link_opt/classlist
>> -Djava.lang.invoke.MethodHandle.TRACE_RESOLVE=true -cp
>> /s/build/solaris-sparcv9-debug/support/classlist.jar
>> build.tools.classlist.HelloClasslist 2>&1 >
>> /s/build/solaris-sparcv9-debug/support/link_opt/default_jli_trace.txt
>> make[3]: *** [/s/build/solaris-sparcv9-debug/support/link_opt/classlist]
>> Error 134
>> make[2]: *** [generate-link-opt-data] Error 1
>> 
>> 
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  Internal Error (/s/open/src/hotspot/share/memory/heap.cpp:233), pid=9349,
>> tid=2
>> #  guarantee(b == block_at(_next_segment - actual_number_of_segments))
>> failed: Intermediate allocation!
>> #
>> # JRE version:  (10.0) (fastdebug build )
>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug
>> 10-internal+0-2017-09-30-014154.8166317, mixed mode, tiered, compressed
>> oops, g1 gc, solaris-sparc)
>> # Core dump will be written. Default location: /s/open/make/core or
>> core.9349
>> #
>> # If you would like to submit a bug report, please visit:
>> #   http://bugreport.java.com/bugreport/crash.jsp
>> #
>> 
>> ---------------  S U M M A R Y ------------
>> 
>> Command Line:
>> -XX:DumpLoadedClassList=/s/build/solaris-sparcv9-debug/support/link_opt/classlist
>> -Djava.lang.invoke.MethodHandle.TRACE_RESOLVE=true
>> build.tools.classlist.HelloClasslist
>> 
>> Host: sca00dbv, Sparcv9 64 bit 3600 MHz, 16 cores, 32G, Oracle Solaris 11.2
>> SPARC
>> Time: Sat Sep 30 03:29:46 2017 UTC elapsed time: 0 seconds (0d 0h 0m 0s)
>> 
>> ---------------  T H R E A D  ---------------
>> 
>> Current thread (0x000000010012f000):  JavaThread "Unknown thread"
>> [_thread_in_vm, id=2, stack(0x0007fffef9700000,0x0007fffef9800000)]
>> 
>> Stack: [0x0007fffef9700000,0x0007fffef9800000],  sp=0x0007fffef97ff020,
>> free space=1020k
>> Native frames: (J=compiled Java code, A=aot compiled Java code,
>> j=interpreted, Vv=VM code, C=native code)
>> V  [libjvm.so+0x1f94508]  void VMError::report_and_die(int,const char*,const
>> char*,void*,Thread*,unsigned char*,void*,void*,const char*,int,unsigned
>> long)+0xa58
>> V  [libjvm.so+0x1f93a3c]  void VMError::report_and_die(Thread*,const
>> char*,int,const char*,const char*,void*)+0x3c
>> V  [libjvm.so+0xd02f38]  void report_vm_error(const char*,int,const
>> char*,const char*,...)+0x78
>> V  [libjvm.so+0xfc219c]  void CodeHeap::deallocate_tail(void*,unsigned
>> long)+0xec
>> V  [libjvm.so+0xbf4f14]  void CodeCache::free_unused_tail(CodeBlob*,unsigned
>> long)+0xe4
>> V  [libjvm.so+0x1e0ae70]  void StubQueue::deallocate_unused_tail()+0x40
>> V  [libjvm.so+0x1e7452c]  void TemplateInterpreter::initialize()+0x19c
>> V  [libjvm.so+0x1051220]  void interpreter_init()+0x20
>> V  [libjvm.so+0x10116e0]  int init_globals()+0xf0
>> V  [libjvm.so+0x1ed8548]  int
>> Threads::create_vm(JavaVMInitArgs*,bool*)+0x4a8
>> V  [libjvm.so+0x11c7b58]  int
>> JNI_CreateJavaVM_inner(JavaVM_**,void**,void*)+0x108
>> C  [libjli.so+0x7950]  InitializeJVM+0x100
>> 
>> 
>> On 10/2/17 7:55 AM, coleen.phillimore at oracle.com wrote:
>>> 
>>> 
>>> I can sponsor this for you once you rebase, and fix these compilation
>>> errors.
>>> Thanks,
>>> Coleen
>>> 
>>> On 9/30/17 12:28 AM, Volker Simonis wrote:
>>>> 
>>>> Hi Vladimir,
>>>> 
>>>> thanks a lot for remembering these changes!
>>>> 
>>>> Regards,
>>>> Volker
>>>> 
>>>> 
>>>> Vladimir Kozlov <vladimir.kozlov at oracle.com
>>>> <mailto:vladimir.kozlov at oracle.com>> schrieb am Fr. 29. Sep. 2017 um 15:47:
>>>> 
>>>>    I hit build failure when tried to push changes:
>>>> 
>>>>    src/hotspot/share/code/codeBlob.hpp(162) : warning C4267: '=' :
>>>> conversion from 'size_t' to 'int', possible loss of data
>>>>    src/hotspot/share/code/codeBlob.hpp(163) : warning C4267: '=' :
>>>> conversion from 'size_t' to 'int', possible loss of data
>>>> 
>>>>    I am going to fix it by casting (int):
>>>> 
>>>>    +  void adjust_size(size_t used) {
>>>>    +    _size = (int)used;
>>>>    +    _data_offset = (int)used;
>>>>    +    _code_end = (address)this + used;
>>>>    +    _data_end = (address)this + used;
>>>>    +  }
>>>> 
>>>>    Note, CodeCache size can't more than 2Gb (max_int) so such casting is
>>>> fine.
>>>> 
>>>>    Vladimir
>>>> 
>>>>    On 9/6/17 6:20 AM, Volker Simonis wrote:
>>>>> On Tue, Sep 5, 2017 at 9:36 PM,  <coleen.phillimore at oracle.com
>>>> <mailto:coleen.phillimore at oracle.com>> wrote:
>>>>>> 
>>>>>> I was going to make the same comment about the friend declaration
>>>> in v1, so
>>>>>> v2 looks better to me.  Looks good.  Thank you for finding a
>>>> solution to
>>>>>> this problem that we've had for a long time.  I will sponsor this
>>>> (remind me
>>>>>> if I forget after the 18th).
>>>>>> 
>>>>> 
>>>>> Thanks Coleen! I've updated
>>>>> 
>>>>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v2/
>>>> <http://cr.openjdk.java.net/%7Esimonis/webrevs/2017/8166317.v2/>
>>>>> 
>>>>> in-place and added you as a second reviewer.
>>>>> 
>>>>> Regards,
>>>>> Volker
>>>>> 
>>>>> 
>>>>>> thanks,
>>>>>> Coleen
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 9/5/17 1:17 PM, Vladimir Kozlov wrote:
>>>>>>> 
>>>>>>> On 9/5/17 9:49 AM, Volker Simonis wrote:
>>>>>>>> 
>>>>>>>> On Fri, Sep 1, 2017 at 6:16 PM, Vladimir Kozlov
>>>>>>>> <vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>>
>>>> wrote:
>>>>>>>>> 
>>>>>>>>> May be add new CodeBlob's method to adjust sizes instead of
>>>> directly
>>>>>>>>> setting
>>>>>>>>> them in  CodeCache::free_unused_tail(). Then you would not need
>>>> friend
>>>>>>>>> class
>>>>>>>>> CodeCache in CodeBlob.
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> Changed as suggested (I didn't liked the friend declaration as
>>>> well :)
>>>>>>>> 
>>>>>>>>> Also I think adjustment to header_size should be done in
>>>>>>>>> CodeCache::free_unused_tail() to limit scope of code who knows
>>>> about
>>>>>>>>> blob
>>>>>>>>> layout.
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> Yes, that's much cleaner. Please find the updated webrev here:
>>>>>>>> 
>>>>>>>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v2/
>>>> <http://cr.openjdk.java.net/%7Esimonis/webrevs/2017/8166317.v2/>
>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Good.
>>>>>>> 
>>>>>>>> 
>>>>>>>> I've also found another "day 1" problem in StubQueue::next():
>>>>>>>> 
>>>>>>>>     Stub* next(Stub* s) const         { int i =
>>>>>>>> index_of(s) + stub_size(s);
>>>>>>>> -          if (i ==
>>>>>>>> _buffer_limit) i = 0;
>>>>>>>> +          // Only wrap
>>>>>>>> around in the non-contiguous case (see stubss.cpp)
>>>>>>>> +          if (i ==
>>>>>>>> _buffer_limit && _queue_end < _buffer_limit) i = 0;
>>>>>>>>           return (i ==
>>>>>>>> _queue_end) ? NULL : stub_at(i);
>>>>>>>>         }
>>>>>>>> 
>>>>>>>> The problem was that the method was not prepared to handle the
>>>> case
>>>>>>>> where _buffer_limit == _queue_end == _buffer_size which lead to
>>>> an
>>>>>>>> infinite recursion when iterating over a StubQueue with
>>>>>>>> StubQueue::next() until next() returns NULL (as this was for
>>>> example
>>>>>>>> done with -XX:+PrintInterpreter). But with the new, trimmed
>>>> CodeBlob
>>>>>>>> we run into exactly this situation.
>>>>>>> 
>>>>>>> 
>>>>>>> Okay.
>>>>>>> 
>>>>>>>> 
>>>>>>>> While doing this last fix I also noticed that
>>>> "StubQueue::stubs_do()",
>>>>>>>> "StubQueue::queues_do()" and "StubQueue::register_queue()" don't
>>>> seem
>>>>>>>> to be used anywhere in the open code base (please correct me if
>>>> I'm
>>>>>>>> wrong). What do you think, maybe we should remove this code in a
>>>>>>>> follow up change if it is really not needed?
>>>>>>> 
>>>>>>> 
>>>>>>> register_queue() is used in constructor. Other 2 you can remove.
>>>>>>> stub_code_begin() and stub_code_end() are not used too -remove.
>>>>>>> I thought we run on linux with flag which warn about unused code.
>>>>>>> 
>>>>>>>> 
>>>>>>>> Finally, could you please run the new version through JPRT and
>>>> sponsor
>>>>>>>> it once jdk10/hs will be opened again?
>>>>>>> 
>>>>>>> 
>>>>>>> Will do when jdk10 "consolidation" is finished. Please, remind me
>>>> later if
>>>>>>> I forget.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Volker
>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Vladimir
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 9/1/17 8:46 AM, Volker Simonis wrote:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> I've decided to split the fix for the
>>>> 'CodeHeap::contains_blob()'
>>>>>>>>>> problem into its own issue "8187091: ReturnBlobToWrongHeapTest
>>>> fails
>>>>>>>>>> because of problems in CodeHeap::contains_blob()"
>>>>>>>>>> (https://bugs.openjdk.java.net/browse/JDK-8187091) and started
>>>> a new
>>>>>>>>>> review thread for discussing it at:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-September/028206.html
>>>>>>>>>> 
>>>>>>>>>> So please lets keep this thread for discussing the interpreter
>>>> code
>>>>>>>>>> size issue only. I've prepared a new version of the webrev
>>>> which is
>>>>>>>>>> the same as the first one with the only difference that the
>>>> change to
>>>>>>>>>> 'CodeHeap::contains_blob()' has been removed:
>>>>>>>>>> 
>>>>>>>>>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v1/
>>>> <http://cr.openjdk.java.net/%7Esimonis/webrevs/2017/8166317.v1/>
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Volker
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Thu, Aug 31, 2017 at 6:35 PM, Volker Simonis
>>>>>>>>>> <volker.simonis at gmail.com <mailto:volker.simonis at gmail.com>>
>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Thu, Aug 31, 2017 at 6:05 PM, Vladimir Kozlov
>>>>>>>>>>> <vladimir.kozlov at oracle.com
>>>> <mailto:vladimir.kozlov at oracle.com>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Very good change. Thank you, Volker.
>>>>>>>>>>>> 
>>>>>>>>>>>> About contains_blob(). The problem is that AOTCompiledMethod
>>>>>>>>>>>> allocated
>>>>>>>>>>>> in
>>>>>>>>>>>> CHeap and not in aot code section (which is RO):
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/8acd232fb52a/src/share/vm/aot/aotCompiledMethod.hpp#l124
>>>>>>>>>>>> 
>>>>>>>>>>>> It is allocated in CHeap after AOT library is loaded. Its
>>>>>>>>>>>> code_begin()
>>>>>>>>>>>> points to AOT code section but AOTCompiledMethod* points
>>>> outside it
>>>>>>>>>>>> (to
>>>>>>>>>>>> normal malloced space) so you can't use (char*)blob address.
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Thanks for the explanation - now I got it.
>>>>>>>>>>> 
>>>>>>>>>>>> There are 2 ways to fix it, I think.
>>>>>>>>>>>> One is to add new field to CodeBlobLayout and set it to
>>>> blob* address
>>>>>>>>>>>> for
>>>>>>>>>>>> normal CodeCache blobs and to code_begin for AOT code.
>>>>>>>>>>>> Second is to use contains(blob->code_end() - 1) assuming
>>>> that AOT
>>>>>>>>>>>> code
>>>>>>>>>>>> is
>>>>>>>>>>>> never zero.
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> I'll give it a try tomorrow and will send out a new webrev.
>>>>>>>>>>> 
>>>>>>>>>>> Regards,
>>>>>>>>>>> Volker
>>>>>>>>>>> 
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Vladimir
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On 8/31/17 5:43 AM, Volker Simonis wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 12:14 PM, Claes Redestad
>>>>>>>>>>>>> <claes.redestad at oracle.com
>>>> <mailto:claes.redestad at oracle.com>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 2017-08-31 08:54, Volker Simonis wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> While working on this, I found another problem which is
>>>> related to
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> fix of JDK-8183573 and leads to crashes when executing
>>>> the JTreg
>>>>>>>>>>>>>>> test
>>>>>>>>>>>>>>> compiler/codecache/stress/ReturnBlobToWrongHeapTest.java.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The problem is that JDK-8183573 replaced
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>       virtual bool contains_blob(const CodeBlob* blob)
>>>> const {
>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>> low_boundary() <= (char*) blob && (char*) blob < high();
>>>> }
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> by:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>       bool contains_blob(const CodeBlob* blob) const {
>>>> return
>>>>>>>>>>>>>>> contains(blob->code_begin()); }
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> But that my be wrong in the corner case where the size of
>>>> the
>>>>>>>>>>>>>>> CodeBlob's payload is zero (i.e. the CodeBlob consists
>>>> only of the
>>>>>>>>>>>>>>> 'header' - i.e. the C++ object itself) because in that
>>>> case
>>>>>>>>>>>>>>> CodeBlob::code_begin() points right behind the CodeBlob's
>>>> header
>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>> is a memory location which doesn't belong to the CodeBlob
>>>> anymore.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I recall this change was somehow necessary to allow
>>>> merging
>>>>>>>>>>>>>> AOTCodeHeap::contains_blob and CodeHead::contains_blob
>>>> into
>>>>>>>>>>>>>> one devirtualized method, so you need to ensure all AOT
>>>> tests
>>>>>>>>>>>>>> pass with this change (on linux-x64).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> All of hotspot/test/aot and hotspot/test/jvmci executed and
>>>> passed
>>>>>>>>>>>>> successful. Are there any other tests I should check?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> That said, it is a little hard to follow the stages of your
>>>> change.
>>>>>>>>>>>>> It
>>>>>>>>>>>>> seems like
>>>>>>>>>>>>> 
>>>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.00/
>>>> <http://cr.openjdk.java.net/%7Eredestad/scratch/codeheap_contains.00/>
>>>>>>>>>>>>> was reviewed [1] but then finally the slightly changed
>>>> version from
>>>>>>>>>>>>> 
>>>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.01/
>>>> <http://cr.openjdk.java.net/%7Eredestad/scratch/codeheap_contains.01/>
>>>> 
>>>>>>>>>>>>> was
>>>>>>>>>>>>> checked in and linked to the bug report.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The first, reviewed version of the change still had a
>>>> correct
>>>>>>>>>>>>> version
>>>>>>>>>>>>> of 'CodeHeap::contains_blob(const CodeBlob* blob)' while
>>>> the second,
>>>>>>>>>>>>> checked in version has the faulty version of that method.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I don't know why you finally did that change to
>>>> 'contains_blob()'
>>>>>>>>>>>>> but
>>>>>>>>>>>>> I don't see any reason why we shouldn't be able to directly
>>>> use the
>>>>>>>>>>>>> blob's address for inclusion checking. From what I
>>>> understand, it
>>>>>>>>>>>>> should ALWAYS be contained in the corresponding CodeHeap so
>>>> no
>>>>>>>>>>>>> reason
>>>>>>>>>>>>> to mess with 'CodeBlob::code_begin()'.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Please let me know if I'm missing something.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> [1]
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-July/026624.html
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I can't help to wonder if we'd not be better served by
>>>> disallowing
>>>>>>>>>>>>>> zero-sized payloads. Is this something that can ever
>>>> actually
>>>>>>>>>>>>>> happen except by abuse of the white box API?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The corresponding test (ReturnBlobToWrongHeapTest.java)
>>>> specifically
>>>>>>>>>>>>> wants to allocate "segment sized" blocks which is most
>>>> easily
>>>>>>>>>>>>> achieved
>>>>>>>>>>>>> by allocation zero-sized CodeBlobs. And I think there's
>>>> nothing
>>>>>>>>>>>>> wrong
>>>>>>>>>>>>> about it if we handle the inclusion tests correctly.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thank you and best regards,
>>>>>>>>>>>>> Volker
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> /Claes
>>>>>> 
>>>>>> 
>>>> 
>>> 
>> 



More information about the hotspot-dev mailing list