RFR 8191278: MappedByteBuffer bulk access memory failures are not handled gracefully

Tue Jun 4 11:31:24 UTC 2019

Hi Vladimir,

Thanks for the review.

Best regards,

Jamsheed

On 04/06/19 6:36 AM, Vladimir Kozlov wrote:
> Looks good.
>
> Thanks,
> Vladimir
>
> On 6/3/19 11:42 AM, Jamsheed wrote:
>> Hi Vladimir, Dan,
>>
>> added two cases in  Assembler::locate_operand as movdqa was 
>> unrecognized. update webrev in place
>>
>> +    case 0x6F: // movdq
>> +    case 0x7F: // movdq
>>
>> full diff : http://cr.openjdk.java.net/~jcm/8191278/webrev.06/
>>
>> http://cr.openjdk.java.net/~jcm/8191278/webrev.05/ + (incremental 
>> patch) http://cr.openjdk.java.net/~jcm/8191278/webrev.06_02/
>>
>> Best regards,
>>
>> Jamsheed
>>
>> On 31/05/19 10:34 PM, Jamsheed wrote:
>>> Hi Vladimir, Dan,
>>>
>>> On 15/05/19 2:24 AM, Vladimir Kozlov wrote:
>>>> Sorry, I missed this. Please "ping" if you do not get responses.
>>>>
>>>> It become even more complicated :( but I understand why you are 
>>>> doing this way. You did great job.
>>>>
>>>> I am thinking about separating the fix for arraycopy stubs fix and 
>>>> adding graceful exit for Unsafe_CopyMemory and Unsafe_CopySwapMemory.
>>>   It would be really good if it can be done in runtime, i tried this 
>>> approach as it seemed far more easier.
>>>
>>>   aarch64 port cpp implementation actually crashes for 
>>> Unsafe_CopyMemory.
>>>
>>>>
>>>> My main concern is new swap code complicates reliable arraycopy 
>>>> code for very corner case. And you implemented it only for x64 anyway.
>>>
>>> copySwap cpp code may work for all cases (depends on compiler again) 
>>> , so i am ok with removing the changes.
>>>
>>> but today when i was testing the removed code[1], found another 
>>> issue. some x86 instructions generated by cpp copyswap code  is not 
>>> recognized by our assembler and it crashes in signalHandler 
>>> Assembler::locate_next_instruction(pc). i will investigate and file 
>>> a bug.
>>>
>>> Best regards,
>>>
>>> Jamsheed
>>>
>>> [1]
>>>
>>> full diff : http://cr.openjdk.java.net/~jcm/8191278/webrev.06/
>>>
>>> http://cr.openjdk.java.net/~jcm/8191278/webrev.05/ + (incremental 
>>> patch) http://cr.openjdk.java.net/~jcm/8191278/webrev.06_02/
>>>
>>>>
>>>> I would like to hear what Runtime group think.
>>>>
>>>> On 4/30/19 8:17 PM, Jamsheed wrote:
>>>>> Hi Vladimir,
>>>>>
>>>>> Thank you for all the feedback.
>>>>>
>>>>> please find the revised webrev.
>>>>>
>>>>> http://cr.openjdk.java.net/~jcm/8191278/webrev.05/
>>>>>
>>>>> what changed
>>>>>
>>>>> 1) Unsafe Copy Memory regions, error exits are captured using 
>>>>> UnsafeCopyMemoryMark, UnsafeCopyMemory.
>>>>
>>>> This is good.
>>>>
>>>>>
>>>>> 2) all Unsafe copy (intrinsic or native) ,uses array copy stub.
>>>>
>>>> Right, otherwise we would have to duplicate logic in Copy:: 
>>>> platform specific C++ methods. But may be it is fine to do in C++ 
>>>> in this case. Or not do that at all as other platforms.
>>>>
>>>>>
>>>>> 3) Unsafe copyswap x64 implementation extends arraycopy stub and 
>>>>> used as reliable exit(fast exit too)*. it is not implemented for 
>>>>> other platforms.
>>>>
>>>> As I said I have concern about this.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>>>
>>>>> *copySwap was tested using copySwap test in jdk dir(with different 
>>>>> buffer sizes)
>>>>>
>>>>> newly added test tested on linux(aarch64,arm32,x86(32/64)) and 
>>>>> platforms in mach5. + mach(1-5)
>>>>>
>>>>> ppc test is not done.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Jamsheed
>>>>>
>>>>>
>>>>> On 12/04/19 9:14 PM, Vladimir Kozlov wrote:
>>>>>> Hi Jamsheed
>>>>>>
>>>>>> I think new methods and the table don't need to be part of 
>>>>>> StubRoutines class. Your new class is visible in all places too. 
>>>>>> You can move methods and table into new class. Then you don't 
>>>>>> need long names and new code will be in one place.
>>>>>>
>>>>>> Vladimir
>>>>>>
>>>>>> On 4/11/19 11:00 PM, Jamsheed wrote:
>>>>>>> Hi Vladimir,
>>>>>>>
>>>>>>> On 12/04/19 12:20 AM, Vladimir Kozlov wrote:
>>>>>>>> On 4/11/19 10:25 AM, Jamsheed wrote:
>>>>>>>>> Hi Vladimir,
>>>>>>>>>
>>>>>>>>> the runtime calls uses indirect call, and it is not that 
>>>>>>>>> straight forward to compare dst i guess.
>>>>>>>>
>>>>>>>> Okay. Yes, we may load dest into register since it is 
>>>>>>>> considered far call (outside CodeCache).
>>>>>>>> But at least you can find nmethod. So we can do 
>>>>>>>> nm->has_unsafe_access() check.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> will use doing_unsafe_access and error table as you suggested, 
>>>>>>>>> doing_unsafe_access for intrinsic call doesn't require 
>>>>>>>>> volatile semantics in c2 i believe.
>>>>>>>>
>>>>>>>> Yes, we don't need fragile frame walking if we use 
>>>>>>>> doing_unsafe_access.
>>>>>>>>
>>>>>>>> There is MemBarCPUOrder already in inline_unsafe_copyMemory() 
>>>>>>>> which will prevent store instruction moves in generated code. 
>>>>>>>> But it does not prevent CPU (Aarch64?) to schedule store in 
>>>>>>>> different place.
>>>>>>>>
>>>>>>>> On other hand we need to read it in Signal handle. I would 
>>>>>>>> assume all stores should be flushed when we hit SEGV.
>>>>>>> yes
>>>>>>>>
>>>>>>>> I thought about avoiding your error table. But you really need 
>>>>>>>> continuation PC for graceful return.
>>>>>>>> I was thinking to have a new separate stub to restore registers 
>>>>>>>> and pop frame. But return code in stubs varies unfortunately. 
>>>>>>>> So we need a table.
>>>>>>>>
>>>>>>>> One complain about table is its name too long. And it should be 
>>>>>>>> unsafe_copymemory to hint intrinsic. Can it be 
>>>>>>>> unsafe_copymemory_error and UnsafeCopyMemoryError class.
>>>>>>>> StubRoutines:: is_unsafe_copymemory() and 
>>>>>>>> next_pc_for_unsafe_copymemory_error()
>>>>>>> yes
>>>>>>>>
>>>>>>>> I did not get why you providing next PC only for 64 bit VM.
>>>>>>>
>>>>>>> next_pc is calculated for all case(both 32 bit and 64 bit). this 
>>>>>>> should work for c2-intrisics at-least except for arm.
>>>>>>>
>>>>>>> fast exit is implemented only for x64, as of now.
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> all code whose behavior is unpredictable will be removed. like 
>>>>>>>>> arm int/c1, non intrinsic c2 (And other platforms).
>>>>>>>>
>>>>>>>> Okay.
>>>>>>>
>>>>>>> so i am planning to remove graceful exit for all unpredictable 
>>>>>>> cases. so old behavior will be seen if there is an exit at 
>>>>>>> startup(SIGBUS crash).
>>>>>>>
>>>>>>> and steady state use will be mostly c2 intrinsic and will have 
>>>>>>> graceful exit.
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Jamsheed
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vladimir
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>>
>>>>>>>>> Jamsheed
>>>>>>>>>
>>>>>>>>> On 11/04/19 5:17 AM, Jamsheed wrote:
>>>>>>>>>> Hi Vladimir,
>>>>>>>>>>
>>>>>>>>>> On 10/04/19 10:03 PM, Vladimir Kozlov wrote:
>>>>>>>>>>> Okay, I see what you did. But it seems incomplete. You did 
>>>>>>>>>>> not set continue_pc for some platforms. Why?
>>>>>>>>>> for some platform i don't have testing setup, others are not 
>>>>>>>>>> very much used in servers(32 bit case).
>>>>>>>>>>>
>>>>>>>>>>> Also this will trigger the code if we hit segv for normal 
>>>>>>>>>>> arraycopy. You may want to lookup caller frame to get 
>>>>>>>>>>> address from call instruction and compare it with 
>>>>>>>>>>> _unsafe_arraycopy:
>>>>>>>>>>>
>>>>>>>>>>> if (StubCodeDesc::desc_for(pc)) {
>>>>>>>>>>>   frame fr = os::fetch_frame_from_context(uc);
>>>>>>>>>>>   address ret_pc = fr.sender_pc();
>>>>>>>>>>>   CodeBlob* cb = CodeCache::find_blob_unsafe(ret_pc);
>>>>>>>>>>>   CompiledMethod* nm = (cb != NULL) ? 
>>>>>>>>>>> cb->as_compiled_method_or_null() : NULL;
>>>>>>>>>>>   if (nm != NULL && NativeCall::is_call_before(ret_pc)) {
>>>>>>>>>>>     address dest = nativeCall_before(ret_pc)->destination();
>>>>>>>>>>>     if (dest == StubRoutines::_unsafe_arraycopy) {
>>>>>>>>>>>
>>>>>>>>>>> But you need to verify if it works.
>>>>>>>>>>
>>>>>>>>>> this should work i guess.
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>>
>>>>>>>>>> Jamsheed
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Vladimir
>>>>>>>>>>>
>>>>>>>>>>> On 4/9/19 8:08 PM, Jamsheed wrote:
>>>>>>>>>>>> Hi Vladimir,
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you for looking at this.
>>>>>>>>>>>>
>>>>>>>>>>>> On 10/04/19 4:01 AM, Vladimir Kozlov wrote:
>>>>>>>>>>>>> Hi Jamsheed,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Instead of finding PC in stubs we should use something 
>>>>>>>>>>>>> similar to GuardUnsafeAccess to set thread's 
>>>>>>>>>>>>> doing_unsafe_access flag when we call copy stub for unsafe 
>>>>>>>>>>>>> memory as you suggested first (in bug report).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Interpreter set the flag for Unsafe.CopyMemory0() and 
>>>>>>>>>>>>> Unsafe.copySwapMemory0(). C2 has intrinsic only for 
>>>>>>>>>>>>> CopyMemory0():
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/6ad0281a654e/src/hotspot/share/opto/library_call.cpp#l4189 
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> It only use unsafe_arraycopy stab:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/6ad0281a654e/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp#l2434 
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Setting on it's entry and cleaning on exit 
>>>>>>>>>>>>> Thread::_doing_unsafe_access field should be enough. Right?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Or I am missing something?
>>>>>>>>>>>>
>>>>>>>>>>>> initially thought of implementing it that way[1], but as it 
>>>>>>>>>>>> is having both store and volatile semantics went with this 
>>>>>>>>>>>> zero overhead solution.
>>>>>>>>>>>>
>>>>>>>>>>>> also, that doesn't provide me  continuation pc, which is 
>>>>>>>>>>>> required for fast exit for bulkaccess or even for graceful 
>>>>>>>>>>>> exit in platform like arm.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> An other thing which bothering me is Harold's comment:
>>>>>>>>>>>>>
>>>>>>>>>>>>> "Unfortunately, "CompiledMethod* nm" gets set to NULL and 
>>>>>>>>>>>>> so handle_unsafe_access() is not executed."
>>>>>>>>>>>>>
>>>>>>>>>>>>> Where/why nm is NULLed?
>>>>>>>>>>>> as we are in BufferBlob/RuntimeBlob(stub frame).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Actually I think the whole code for "// BugId 4454115" is 
>>>>>>>>>>>>> questionable since it replaces any crash (most likely not 
>>>>>>>>>>>>> related to unsafe access) in compiled method which has at 
>>>>>>>>>>>>> least one unsafe access with exception. May be we should 
>>>>>>>>>>>>> use PcDesc to record unsafe instructions and compare with 
>>>>>>>>>>>>> SEGV PC. But it is separate RFE. For this one we need to 
>>>>>>>>>>>>> fix only Unsafe.CopyMemory0() C2 inrinsic.
>>>>>>>>>>>>
>>>>>>>>>>>> Right, Ok.
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Jamsheed
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Vladimir
>>>>>>>>>>>>>
>>>>>>>>>>>> [1]http://cr.openjdk.java.net/~jcm/8191278/webrev.00/src/hotspot/share/opto/library_call.cpp.udiff.html 
>>>>>>>>>>>>
>>>>>>>>>>>>> On 4/8/19 4:21 AM, Tobias Hartmann wrote:
>>>>>>>>>>>>>> Hi Jamsheed,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 05.04.19 15:11, Jamsheed wrote:
>>>>>>>>>>>>>>> On 04/04/19 7:23 PM, Andrew Haley wrote:
>>>>>>>>>>>>>>>>> this looks reasonable to me although the code is 
>>>>>>>>>>>>>>>>> getting quite complicated to handle this edge case.
>>>>>>>>>>>>>>>> Yeah, it really is. Can't we just assume that *any* 
>>>>>>>>>>>>>>>> fault in these stubs is
>>>>>>>>>>>>>>>> caused via Unsafe, and get rid of bool 
>>>>>>>>>>>>>>>> unsafe_copy_code_range?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for the feedback Tobias, Andrew, removed 
>>>>>>>>>>>>>>> unsafe_copy_code_range.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> webrev: http://cr.openjdk.java.net/~jcm/8191278/webrev.04/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think what Andrew meant is basically to go with webrev.01:
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jcm/8191278/webrev.01/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Right?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Tobias
>>>>>>>>>>>>>>