[9] RFR(S): 8144212: JDK 9 b93 breaks Apache Lucene due to compact strings

Mon Jan 18 07:09:51 UTC 2016

Thanks, Vladimir!

Best,
Tobias

On 15.01.2016 19:14, Vladimir Kozlov wrote:
> Very good.
> 
> Thanks,
> Vladimir
> 
> On 1/13/16 4:00 AM, Tobias Hartmann wrote:
>> Thanks, Vladimir.
>>
>> On 12.01.2016 20:24, Vladimir Kozlov wrote:
>>>> My solution is to capture both the byte[] and char[] memory by using a MergeMem node as input to inflate_string.
>>>
>>> Yes, that is right solution here.
>>
>> I changed the implementation to only capture the byte[] and char[] memory:
>> http://cr.openjdk.java.net/~thartmann/8144212/webrev.03/
>>
>> The method GraphKit::capture_memory(src_type, dst_type) returns a new MergeMemNode if the src and dst types are different, merging the two.
>>
>> Best,
>> Tobias
>>
>>> On 1/12/16 5:59 AM, Tobias Hartmann wrote:
>>>> On 11.01.2016 21:00, Vladimir Kozlov wrote:
>>>>> On 1/11/16 7:20 AM, Tobias Hartmann wrote:
>>>>>> On 08.01.2016 20:41, Vladimir Kozlov wrote:
>>>>>>> On 1/8/16 2:37 AM, Tobias Hartmann wrote:
>>>>>>>> On 07.01.2016 21:49, Vladimir Kozlov wrote:
>>>>>>>>> On 1/7/16 6:52 AM, Tobias Hartmann wrote:
>>>>>>>>>> Hi Vladimir,
>>>>>>>>>>
>>>>>>>>>> On 07.01.2016 00:58, Vladimir Kozlov wrote:
>>>>>>>>>>> Andrew is right.
>>>>>>>>>>
>>>>>>>>>> Yes, he's right that the membar is not needed in this case. I noticed that GraphKit::inflate_string() sets the output memory to TypeAryPtr::BYTES although inflate writes to a char[] array in this case. This caused the subsequent char load to be on a different slice allowing C2 to move the load to before the intrinsic.
>>>>>>>>>
>>>>>>>>> Right. It was the root of this bug, see below.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I fixed this for the inflate and compress intrinsics.
>>>>>>>>>>
>>>>>>>>>>> GraphKit::inflate_string() should have SCMemProjNode as compress_string() does to prevent loads move up.
>>>>>>>>>>> StrInflatedCopyNode is not memory node.
>>>>>>>>>>
>>>>>>>>>> Okay, why are above changes not sufficient to prevent the load from moving up? Also, the comment for SCMemProjNode says:
>>>>>>>>>
>>>>>>>>> I did not get the question. Is it before your webrev.01 change? Or even with the change?
>>>>>>>>
>>>>>>>> I meant with webrev.01 but you answered my question below.
>>>>>>>>
>>>>>>>>>>       // This class defines a projection of the memory  state of a store conditional node.
>>>>>>>>>>       // These nodes return a value, but also update memory.
>>>>>>>>>>
>>>>>>>>>> But inflate does not return any value.
>>>>>>>>>
>>>>>>>>> Hmm, according to bottom type inflate produce memory:
>>>>>>>>>
>>>>>>>>> StrInflatedCopyNode::bottom_type() const { return Type::MEMORY; }
>>>>>>>>>
>>>>>>>>> So it really does not need SCMemProjNode. Sorry about that.
>>>>>>>>> So load was LoadUS which is char load and originally memory slice of inflate was incorrect BYTES.
>>>>>>>>
>>>>>>>> Exactly.
>>>>>>>>
>>>>>>>>> Instead of SCMemProjNode we should have to change the idx of your dst_type:
>>>>>>>>>
>>>>>>>>> set_memory(str, dst_type);
>>>>>>>>
>>>>>>>> Yes, that's what I do now in webrev.01 by passing the dst_type as an argument to inflate_string.
>>>>>>>>
>>>>>>>>> And you should rollback part of changes in escape.cpp and macro.cpp.
>>>>>>>>
>>>>>>>> Okay, I'll to that.
>>>>>>>>
>>>>>>>>>> Here is the new webrev, including the SCMemProjNode and adapting escape analysis and macro expansion accordingly:
>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/8144212/webrev.01/
>>>>>>>>>
>>>>>>>>> In general when src & dst arrays have different type we may need to use TypeOopPtr::BOTTOM to prevent related store & loads bypass these copy nodes.
>>>>>>>>
>>>>>>>> Okay, should we then use BOTTOM for both the input and output type?
>>>>>>>
>>>>>>> Only input. Output type corresponds to dst array type which you set correctly now.
>>>>>>
>>>>>> It seems like that this is not sufficient. As Roland pointed out (off-thread), there may still be a problem in the following case:
>>>>>>     StoreC
>>>>>>     inflate_string
>>>>>>     LoadC
>>>>>>
>>>>>> The memory graph (def->use) now looks like this:
>>>>>>     LoadC -> inflate_string -> ByteMem
>>>>>>                   ... StoreC-> CharMem
>>>>>
>>>>> I did not get this. If StoreC node is created before inflate_string - inflate_string should point to it be barrier for LoadC.
>>>>
>>>> Note that the StoreC and inflate_string are *not* writing to the same char[] array. The test looks like this:
>>>>
>>>>    char c1[] = new char[1];
>>>>    char c2[] = new char[1];
>>>>
>>>>    c2[0] = 42;
>>>>    // Inflate String from byte[] to char[]
>>>>    s.getChars(0, 1, c1, 0);
>>>>    // Read char[] memory written before inflation
>>>>    return c2[0];
>>>>
>>>> The result should be 42. The problem is that inflate_string does not point to StoreC because inflate_string uses a byte[] as input and in this case also writes to a different char[]. Even if we set the input to BOTTOM, inflate_string points to 7 Parm (BOTTOM) but not to the char[] memory produced by 96 StoreC:
>>>> http://cr.openjdk.java.net/~thartmann/8144212/inflate_bottom.png
>>>>
>>>> 349 LoadUS then reads from the output char[] memory of inflate_string which does not include the result of StoreC. The test fails because the return value is != 42.
>>>>
>>>> My solution is to capture both the byte[] and char[] memory by using a MergeMem node as input to inflate_string.
>>>>
>>>>>    If StoreC followed inflate_string and LoadC followed StoreC - LoadC should point to StoreC. If LoadC does not follow StoreC then result is relaxed.
>>>>
>>>> Yes, these cases work fine.
>>>>
>>>> Thanks,
>>>> Tobias
>>>>
>>>>>> The intrinsic hides the dependency between LoadC and StoreC, causing the load to read from memory not containing the result of the StoreC. I was able to write a regression test for this (see 'TestStringIntrinsicMemoryFlow::testInflate2').
>>>>>>
>>>>>> Setting the input to BOTTOM, generates the following graph:
>>>>>> http://cr.openjdk.java.net/~thartmann/8144212/inflate_bottom.png
>>>>>> The 349 LoadUS does not read the result of the 96 StoreC because the StrInflateCopyNode does not capture it's memory. The test fails.
>>>>>>
>>>>>> I adapted the fix to emit a MergeMemoryNode to capture the entire memory state as input to the intrinsic. The graph then looks like this:
>>>>>>     LoadC -> inflate_string -> MergeMem(ByteMem, StoreC(CharMem))
>>>>>> http://cr.openjdk.java.net/~thartmann/8144212/inflate_merge.png
>>>>>>
>>>>>> Here is the new webrev:
>>>>>> http://cr.openjdk.java.net/~thartmann/8144212/webrev.02/
>>>>>> Probably, we could also only capture the byte and char slices instead of merging everything. What do you think?
>>>>>>
>>>>>> Best,
>>>>>> Tobias
>>>>>>
>>>>>>>>>> Related question:
>>>>>>>>>> In library_call.cpp, I now use TypeAryPtr::get_array_body_type(dst_elem) to get the correct TypeAryPtr for the destination (we support both BYTES and CHARS). For a char[] destination, it returns:
>>>>>>>>>>       char[int:>=0]:exact+any *
>>>>>>>>>>
>>>>>>>>>> which is equal to the type of the char load.
>>>>>>>>>
>>>>>>>>> Please, explain this. I thought string's array will always be byte[] when compressed strings are enabled. Is it used for getChars() which returns char array?
>>>>>>>>
>>>>>>>> Yes, both the compress and inflate intrinsics are used for different types of src and dst arrays. See comment in library_call.cpp:
>>>>>>>>
>>>>>>>> // compressIt == true --> generate a compressed copy operation (compress char[]/byte[] to byte[])
>>>>>>>> //   int StringUTF16.compress(char[] src, int srcOff, byte[] dst, int dstOff, int len)
>>>>>>>> //   int StringUTF16.compress(byte[] src, int srcOff, byte[] dst, int dstOff, int len)
>>>>>>>> // compressIt == false --> generate an inflated copy operation (inflate byte[] to char[]/byte[])
>>>>>>>> //   void StringLatin1.inflate(byte[] src, int srcOff, char[] dst, int dstOff, int len)
>>>>>>>> //   void StringLatin1.inflate(byte[] src, int srcOff, byte[] dst, int dstOff, int len)
>>>>>>>>
>>>>>>>> I.e., the inflate intrinsic is used for inflation from byte[] to byte[]/char[].
>>>>>>>>
>>>>>>>>> Should we also be more careful in inflate_string_slow()? Is it used?
>>>>>>>>
>>>>>>>> No, inflate_string_slow() is only called from PhaseStringOpts::copy_latin1_string() where it is used to inflate from byte[] to byte[].
>>>>>>>>
>>>>>>>>>> I also tried to derive the type from the array by using dst_type->isa_aryptr(). However, this returns a more specific type:
>>>>>>>>>>       char[int:1]:NotNull:exact *
>>>>>>>>>>
>>>>>>>>>> Using this results in C2 assuming that the subsequent char load is independent and again moving it to before the intrinsic. I don't understand why that is. Shouldn't the second type be a "subtype" of the first type?
>>>>>>>>>
>>>>>>>>> It is indeed strange. What memory type of LoadUS? It could be bug.
>>>>>>>>
>>>>>>>> LoadUS has memory type "char[int:>=0]:exact+any *" which has alias index 4. dst_type->isa_aryptr() returns memory type "char[int:1]:NotNull:exact *" which has alias index 8.
>>>>>>>>
>>>>>>>> I will look into this again and try to understand what happens.
>>>>>>>
>>>>>>> It could that aryptr is pointer to array and load type is pointer to array's element.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Tobias
>>>>>>>>
>>>>>>>>>>> On 1/6/16 5:34 AM, Andrew Haley wrote:
>>>>>>>>>>>> On 01/06/2016 01:06 PM, Tobias Hartmann wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> The problem here is that C2 reorders memory instructions and moves
>>>>>>>>>>>>> an array load before an array store. The MemBarCPUOrder is now used
>>>>>>>>>>>>> (compiler internally) to prevent this. We do the same for normal
>>>>>>>>>>>>> array copys in PhaseMacroExpand::expand_arraycopy_node(). No actual
>>>>>>>>>>>>> code is emitted. See also the comment in memnode.hpp:
>>>>>>>>>>>>>
>>>>>>>>>>>>>        // Ordering within the same CPU.  Used to order unsafe memory references
>>>>>>>>>>>>>        // inside the compiler when we lack alias info.  Not needed "outside" the
>>>>>>>>>>>>>        // compiler because the CPU does all the ordering for us.
>>>>>>>>>>>>>
>>>>>>>>>>>>> "CPU does all the ordering for us" means that even with a relaxed
>>>>>>>>>>>>> memory ordering, loads are never moved before dependent stores.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Or did I misunderstand your question?
>>>>>>>>>>>>
>>>>>>>>>>>> No, I don't think so.  I was just checking: I am very aware that
>>>>>>>>>>>> HotSpot has presented those of use with relaxed memory order machines
>>>>>>>>>>>> with some interesting gotchas over the years, that's all.  I'm a bit
>>>>>>>>>>>> surprised that C2 needs this barrier, given that there is a
>>>>>>>>>>>> read-after-write dependency, but never mind.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Andrew.
>>>>>>>>>>>>