[9] RFR(S): 8153527: break_tty_lock_for_safepoint causes "assert(false) failed: bad tag in log" and broken compile log
Nils Eliasson
nils.eliasson at oracle.com
Mon Apr 25 08:33:20 UTC 2016
Thank you Vladimir!
//Nils
On 2016-04-23 03:00, Vladimir Kozlov wrote:
> Yes, this fix is good enough.
>
> Usually when engineers are using print assembly they know about this
> problem and use CICompilerCount=1 to have only one compiler thread to
> print output.
>
> Thanks,
> Vladimir
>
> On 4/22/16 6:53 AM, Nils Eliasson wrote:
>> Hi,
>>
>> Background:
>> The compilelog can get corrupted and the VM may assert on "failed:
>> bad tag in log" when printing opto_assembly. (Print assembly turns on
>> print opto_assembly if hs_dis is not present.)
>>
>> When printing opto_assembly in output.cpp we may loose the ttylock
>> (break_tty_lock_for_safepoint) due to a safepoint in both
>> print_metadata and dump_asm. Another thread can claim the lock and start
>> printing. When the safepoint is over both threads will think they own
>> the lock. The content will look ok thanks to the xml stream adding
>> the writing thread tag to the log. The closing xml-tag has two
>> problems: 1) It uses a raw_print and may get intermingled with other
>> output 2) The xml tag stack tracking may see a bad sequence of tags.
>>
>> Solution:
>> Retake the ttylock before printing the closing print_optoassembly
>> tag. (I have only observed this safepoint issue with
>> print_optoassembly.)
>>
>> If another tag already has the lock and is printing print_nmethod for
>> example, print opto_assembly will block. Here we can have two variants:
>>
>> 1) the other thread will print something else (like print_nmethod) -
>> then that tag will be closed before releaseing the lock, and the tag
>> stack will be consistent but the output may look like
>> <print_optoassembly>...<print_nmethod>...</print_nmethod></print_optoassembly>
>>
>>
>> 2) the other thread is also printing opto_assembly. Then that thread
>> may yield the look during a safepoint while the first one is retaking
>> the look. Then we can get <print_optoassembly
>> id=1><print_optoassembly id=2></print_optoassembly (for id
>> 1)></print_optoassembly (for id 2)> or <print_optoassembly
>> id=1><print_optoassembly id=2></print_optoassembly (for id
>> 2)></print_optoassembly
>> (for id 1)>
>>
>> Fortunately the xml stack consistency will be ok since it doesn't
>> make any difference on what thread wrote the print_optoassembly tag.
>>
>> Pre-mortem
>> If this issue pops-up again we must investigate if there are more
>> places in the compile log code that yields the tty lock on a
>> safepoint. NoSafePointVerifiers don't seem to check on all transitions.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153527
>> Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.02
>>
>> Testing:
>> All regression tests.
>>
>> Regards,
>> Nils Eliasson
>>
>> On 2016-04-20 22:07, Vladimir Kozlov wrote:
>>> On 4/20/16 12:56 PM, Nils Eliasson wrote:
>>>> Hi,
>>>>
>>>> Thanks for the help,
>>>>
>>>> I got it to work, and added NoSafePointVerifiers to make sure I hadn't
>>>> missed anything. Then after many test iterations it failed again. It
>>>> didn't fail on the NSPV, but in dump_asm we blocked on a VM entry
>>>> to get
>>>> a ciSymbol->as_utf8. Now I am considering if I should direct
>>>> dump_asm to
>>>> the temporary buffer too, or relax the tag checks in the xml and
>>>> accept
>>>> that the output may need to be sorted by writer-thread before use. The
>>>> output looks like:
>>>>
>>>> <writer id=1/>
>>>> <print_optoassembly>
>>>> ...
>>>> releases tty when blocking on a safepoint
>>>> <writer id=2/>
>>>> <print_nmethod>
>>>> ...
>>>> <writer id=1/> // back again after safepoint writing without
>>>> ttylock now.
>>>> </print_optoassembly> // Here we fail on an assert today when we
>>>> expect
>>>> a closing print_nmethod tag
>>>> <writer id=2/>
>>>> </print_nmethod>
>>>>
>>>> This is malformed xml but has enough information to be reconstructed.
>>>> Would this be an acceptable output?
>>>
>>> Yes, I think it is acceptable - we don't loose information. And it
>>> is not worse than it was before.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>>
>>>> Regards,
>>>> Nils
>>>>
>>>>
>>>> On 2016-04-18 19:30, Vladimir Kozlov wrote:
>>>>> tty would have the same problem but it use C_HEAP to allocate:
>>>>>
>>>>> defaultStream::instance = new(ResourceObj::C_HEAP, mtInternal)
>>>>> defaultStream();
>>>>>
>>>>> Please, look if you can do something similar.
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 4/18/16 4:24 AM, Nils Eliasson wrote:
>>>>>> Resizeable is better, but then we assert on expanding the
>>>>>> stringbuffer
>>>>>> while being under a different ResourceMark.
>>>>>>
>>>>>> Regards,
>>>>>> Nils
>>>>>>
>>>>>> On 2016-04-15 19:44, Vladimir Kozlov wrote:
>>>>>>> Use resizable stream:
>>>>>>>
>>>>>>> stringStream(size_t initial_bufsize = 256);
>>>>>>>
>>>>>>> 1024 may not be enough.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On 4/15/16 8:10 AM, Nils Eliasson wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Please review this fix of print opto_assembly.
>>>>>>>>
>>>>>>>> Summary:
>>>>>>>> The compilelog can get corrupted and the VM may assert on "failed:
>>>>>>>> bad tag in log".
>>>>>>>>
>>>>>>>> When printing assembly in output.cpp we first take the ttylock,
>>>>>>>> print
>>>>>>>> the head and then the method metadata. However the
>>>>>>>> metadata printing makes a vm entry and may block for a
>>>>>>>> safepoint and
>>>>>>>> will then release the lock
>>>>>>>> (break_tty_lock_for_safepoint). After that some of the other
>>>>>>>> compiler
>>>>>>>> thread that haven't safepointed will take the lock
>>>>>>>> and the broken log will be a fact when the safepoint is over
>>>>>>>> and the
>>>>>>>> first thread starts logging again.
>>>>>>>>
>>>>>>>> Solution:
>>>>>>>> Print the method metadata to a temporary buffer, then take the tty
>>>>>>>> lock.
>>>>>>>>
>>>>>>>> Testing:
>>>>>>>> Repro from bug stops failing.
>>>>>>>> Running :hotspot_all
>>>>>>>> (http://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1508-10854)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153527
>>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.01/
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Nils Eliasson
>>>>>>
>>>>
>>
More information about the hotspot-compiler-dev
mailing list