[9] RFR(S): 8153527: break_tty_lock_for_safepoint causes "assert(false) failed: bad tag in log" and broken compile log

Vladimir Kozlov vladimir.kozlov at oracle.com
Sat Apr 23 01:00:28 UTC 2016


Yes, this fix is good enough.

Usually when engineers are using print assembly they know about this problem and use CICompilerCount=1 to have only one compiler thread to print output.

Thanks,
Vladimir

On 4/22/16 6:53 AM, Nils Eliasson wrote:
> Hi,
>
> Background:
> The compilelog can get corrupted and the VM may assert on "failed: bad tag in log" when printing opto_assembly. (Print assembly turns on print opto_assembly if hs_dis is not present.)
>
> When printing opto_assembly in output.cpp we may loose the ttylock (break_tty_lock_for_safepoint) due to a safepoint in both print_metadata and dump_asm. Another thread can claim the lock and start
> printing. When the safepoint is over both threads will think they own the lock. The content will look ok thanks to the xml stream adding the writing thread tag to the log. The closing xml-tag has two
> problems: 1) It uses a raw_print and may get intermingled with other output 2) The xml tag stack tracking may see a bad sequence of tags.
>
> Solution:
> Retake the ttylock before printing the closing print_optoassembly tag. (I have only observed this safepoint issue with print_optoassembly.)
>
> If another tag already has the lock and is printing print_nmethod for example, print opto_assembly will block. Here we can have two variants:
>
> 1) the other thread will print something else (like print_nmethod) - then that tag will be closed before releaseing the lock, and the tag stack will be consistent but the output may look like
> <print_optoassembly>...<print_nmethod>...</print_nmethod></print_optoassembly>
>
> 2) the other thread is also printing opto_assembly. Then that thread may yield the look during a safepoint while the first one is retaking the look. Then we can get <print_optoassembly
> id=1><print_optoassembly id=2></print_optoassembly (for id 1)></print_optoassembly (for id 2)> or <print_optoassembly id=1><print_optoassembly id=2></print_optoassembly (for id 2)></print_optoassembly
> (for id 1)>
>
> Fortunately the xml stack consistency will be ok since it doesn't make any difference on what thread wrote the print_optoassembly tag.
>
> Pre-mortem
> If this issue pops-up again we must investigate if there are more places in the compile log code that yields the tty lock on a safepoint. NoSafePointVerifiers don't seem to check on all transitions.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8153527
> Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.02
>
> Testing:
> All regression tests.
>
> Regards,
> Nils Eliasson
>
> On 2016-04-20 22:07, Vladimir Kozlov wrote:
>> On 4/20/16 12:56 PM, Nils Eliasson wrote:
>>> Hi,
>>>
>>> Thanks for the help,
>>>
>>> I got it to work, and added NoSafePointVerifiers to make sure I hadn't
>>> missed anything. Then after many test iterations it failed again. It
>>> didn't fail on the NSPV, but in dump_asm we blocked on a VM entry to get
>>> a ciSymbol->as_utf8. Now I am considering if I should direct dump_asm to
>>> the temporary buffer too, or relax the tag checks in the xml and accept
>>> that the output may need to be sorted by writer-thread before use. The
>>> output looks like:
>>>
>>> <writer id=1/>
>>> <print_optoassembly>
>>> ...
>>> releases tty when blocking on a safepoint
>>> <writer id=2/>
>>> <print_nmethod>
>>> ...
>>> <writer id=1/>         // back again after safepoint writing without
>>> ttylock now.
>>> </print_optoassembly>  // Here we fail on an assert today when we expect
>>> a closing print_nmethod tag
>>> <writer id=2/>
>>> </print_nmethod>
>>>
>>> This is malformed xml but has enough information to be reconstructed.
>>> Would this be an acceptable output?
>>
>> Yes, I think it is acceptable - we don't loose information. And it is not worse than it was before.
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> Regards,
>>> Nils
>>>
>>>
>>> On 2016-04-18 19:30, Vladimir Kozlov wrote:
>>>> tty would have the same problem but it use C_HEAP to allocate:
>>>>
>>>>     defaultStream::instance = new(ResourceObj::C_HEAP, mtInternal)
>>>> defaultStream();
>>>>
>>>> Please, look if you can do something similar.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 4/18/16 4:24 AM, Nils Eliasson wrote:
>>>>> Resizeable is better, but then we assert on expanding the stringbuffer
>>>>> while being under a different ResourceMark.
>>>>>
>>>>> Regards,
>>>>> Nils
>>>>>
>>>>> On 2016-04-15 19:44, Vladimir Kozlov wrote:
>>>>>> Use resizable stream:
>>>>>>
>>>>>> stringStream(size_t initial_bufsize = 256);
>>>>>>
>>>>>> 1024 may not be enough.
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 4/15/16 8:10 AM, Nils Eliasson wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Please review this fix of print opto_assembly.
>>>>>>>
>>>>>>> Summary:
>>>>>>> The compilelog can get corrupted and the VM may assert on "failed:
>>>>>>> bad tag in log".
>>>>>>>
>>>>>>> When printing assembly in output.cpp we first take the ttylock, print
>>>>>>> the head and then the method metadata. However the
>>>>>>> metadata printing makes a vm entry and may block for a safepoint and
>>>>>>> will then release the lock
>>>>>>> (break_tty_lock_for_safepoint). After that some of the other compiler
>>>>>>> thread that haven't safepointed will take the lock
>>>>>>> and the broken log will be a fact when the safepoint is over and the
>>>>>>> first thread starts logging again.
>>>>>>>
>>>>>>> Solution:
>>>>>>> Print the method metadata to a temporary buffer, then take the tty
>>>>>>> lock.
>>>>>>>
>>>>>>> Testing:
>>>>>>> Repro from bug stops failing.
>>>>>>> Running :hotspot_all
>>>>>>> (http://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1508-10854)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153527
>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.01/
>>>>>>>
>>>>>>> Regards,
>>>>>>> Nils Eliasson
>>>>>
>>>
>


More information about the hotspot-compiler-dev mailing list