[vector] ClassCastException in C2 with Mask rebracketing

Mon Feb 26 18:18:55 UTC 2018

Adam,

Could you please run the test with a fastdebug build?

There should be additional debugging output printed right before the crash:

   2690   } else {
   2691 #ifndef PRODUCT
   2692     tty->print_cr("vbox"); vbox->dump(3);
   2693     tty->print_cr("vect"); vbox->dump(3);
   2694 #endif // PRODUCT
   2695     fatal("");
   2696     return NULL;
   2697   }

Regarding the division-by-zero bug, it looks like a problem with 
initializing the vector store into a newly created box. I'll try to spot the bug.
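
To make the suspected symptom concrete, here is a minimal scalar sketch 
(not the Vector API code itself). It assumes the non-intrinsic path applies 
a per-lane modulo; the class and method names are hypothetical and 
Math.floorMod is only a stand-in for whatever the lambda in IntVector does. 
If the broadcast divisor's backing store were left zero-initialized, every 
lane would divide by zero:

```java
import java.util.Arrays;

public class ScalarFloorMod {
    // Hypothetical scalar stand-in for a lane-wise floorMod:
    // applies Math.floorMod to each pair of lanes.
    public static int[] floorModLanes(int[] a, int[] b) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            // Math.floorMod throws ArithmeticException when the divisor is zero.
            r[i] = Math.floorMod(a[i], b[i]);
        }
        return r;
    }

    // Mimics a broadcast: every lane holds the same value.
    public static int[] broadcast(int lanes, int value) {
        int[] v = new int[lanes];
        Arrays.fill(v, value);
        return v;
    }

    public static void main(String[] args) {
        int[] a = {61, -1, 120, 7, 0, 59, 60, -60};

        // Properly initialized divisor: all lanes are 60.
        System.out.println(Arrays.toString(floorModLanes(a, broadcast(8, 60))));

        // Assumed failure mode: the store into the new box never happened,
        // so the lanes still hold the default value 0.
        int[] uninitialized = new int[8];
        try {
            floorModLanes(a, uninitialized);
        } catch (ArithmeticException e) {
            System.out.println("ArithmeticException: " + e.getMessage());
        }
    }
}
```

This only models the symptom; it does not show where C2 loses the store.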

Best regards,
Vladimir Ivanov

On 2/23/18 6:10 PM, Adam Pocock wrote:
> Hi Vladimir,
> 
> That patch does fix the issue in the binary search. Your patch didn't 
> apply cleanly on top of my source tree + Razvan's patch (I pulled just 
> before applying both Razvan & your patches), but it seems to be ok. I 
> got "1 out of 20 hunks FAILED -- saving rejects to file 
> src/hotspot/share/opto/library_call.cpp.rej", the two functions 
> LibraryCallKit::addMasking and LibraryCallKit::inline_bin_vector_op 
> didn't remove cleanly, so I just commented them out and applied the 
> remainder of that hunk manually (which changed the call signature of 
> LibraryCallKit::inline_un_vector_op).
> 
> I returned to running the application which uses the binary search, and 
> now it's just core dumping on me at compile.cpp line 2695 (with vector 
> intrinsics turned on). With them turned off it occasionally gives me an 
> ArithmeticException (divide by zero) out of IntVector.floorMod, but the 
> argument to floorMod is an IntSpecies.broadcast(arg). I made the argument 
> to the broadcast a final field (which is set to 60 on construction) to 
> ensure it wasn't modified (which it wasn't by my code) and now the 
> exception is dependent on whether I put a print statement before the 
> floorMod call to check it's still 60 (with print statement, no 
> exception, without print statement ArithmeticException dividing by 
> zero). With the print statement it then gets into an infinite loop in 
> the binary search function, but this infinite loop happens before the 
> arithmetic exception would happen in the version without (as I have a 
> print statement at the top counting the number of documents it's 
> processed, and it's variable when it loops infinitely, but it happens 
> before the ArithmeticException).
> 
> Stack trace for the ArithmeticException:
> Caused by: java.lang.ArithmeticException: / by zero
>      at jdk.incubator.vector/jdk.incubator.vector.IntVector.lambda$floorMod$35(IntVector.java:317)
>      at jdk.incubator.vector/jdk.incubator.vector.Int256Vector.bOp(Int256Vector.java:89)
>      at jdk.incubator.vector/jdk.incubator.vector.Int256Vector.bOp(Int256Vector.java:33)
>      at jdk.incubator.vector/jdk.incubator.vector.IntVector.floorMod(IntVector.java:317)
>      at com.oracle.labs.mlrg.topicmodel.util.vector.VectorRNG.nextInt(VectorRNG.java:142)
> 
> Attaching IntelliJ's debugger to it causes the JVM to dump core 
> whether or not it's using vector intrinsics, but I haven't tried 
> setting IntelliJ's JVM to panama; I don't know if that will have an effect.
> 
> Testing just the vectorRNG.nextInt call indicates that C2 is failing to 
> compile it properly with "-XX:-UseVectorApiIntrinsics", as it gives the 
> ArithmeticException when I ask for 1 million random numbers (dies around 
> the 10000th iteration), however when using the intrinsics and/or with 
> -XX:TieredStopAtLevel=3 it completes all the iterations.
> 
> I'm not sure what other things I should try.
> 
> Thanks,
> 
> Adam
> 
> 
> 
> On 23/02/18 07:17, Vladimir Ivanov wrote:
>> Adam,
>>
>> Please, try the following patch (on top of the one Razvan sent out 
>> yesterday [1]):
>>
>> http://cr.openjdk.java.net/~vlivanov/panama/vector.generalized_intrinsics/webrev.07 
>>
>>
>> It fixes the CCE bug for me.
>>
>> Regarding generated code quality, I see issues with vector box 
>> elimination and they are caused by:
>>   (1) no intrinsic for Mask.not() yet [2]
>>
>>   (2) interface calls hinder the analysis by wrapping VectorBox nodes 
>> in CheckCastPP nodes, which hide exact class info by casting them to 
>> interfaces
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1] 
>> http://cr.openjdk.java.net/~rlupusoru/panama/webrev_maskboxingavx512_02/index.html 
>>
>>
>> [2] @ 259   jdk.incubator.vector.AbstractMask::not (5 bytes) inline (hot)
>>      \-> TypeProfile (721/721 counts) = jdk/incubator/vector/Float256Vector$Float256Mask
>>       @ 1   jdk.incubator.vector.AbstractMask::not (10 bytes) inline (hot)
>>         @ 1 java.lang.invoke.LambdaForm$MH/100555887::linkToTargetMethod (8 bytes) force inline by annotation
>>           @ 4   java.lang.invoke.LambdaForm$MH/611437735::invoke (8 bytes)   force inline by annotation
>>         @ 6 jdk.incubator.vector.Float256Vector$Float256Mask::uOp (6 bytes) inline (hot)
>>           @ 2 jdk.incubator.vector.Float256Vector$Float256Mask::uOp (61 bytes) too big
>>
>> [3]
>>
>> On 2/21/18 11:59 PM, Adam Pocock wrote:
>>> Ok. I'm trying to work around this issue by turning off C2 
>>> compilation of the binarySearchCDF method, so I can do correctness 
>>> testing on the rest of the SIMD code, and maybe put it in a profiler 
>>> to see how many cache misses I'm causing etc.
>>>
>>> I'm using "-XX:CompilerDirectivesFile=filename" pointed at a 
>>> directives file containing:
>>>
>>> [
>>> {
>>>      match: "com/oracle/labs/mlrg/topicmodel/util/vector/BinarySearch.binarySearchCDF*",
>>>      c2: {
>>>              Exclude: true,
>>>      },
>>> },
>>> {
>>>      match: "*Float256Vector$Float256Mask.rebracket*",
>>>      c2: {
>>>              Exclude: true,
>>>      },
>>> },
>>> {
>>>      match: "*Int256Vector$Int256Mask.rebracket*",
>>>      c2: {
>>>              Exclude: true,
>>>      },
>>> },
>>> ]
>>>
>>> but all I've managed to do is move around the place where I get a 
>>> ClassCastException. It's still in the binarySearch but it moves to 
>>> different rebracket operations. I did once manage to move it to a 
>>> completely incomprehensible place where it dies with 
>>> ClassCastException from Float256Vector to Int256Vector, in a function 
>>> which only takes IntVector arguments, but that was after the binary 
>>> search call and I suspect something weird is happening.
>>>
>>> Also now when I turn off the vector intrinsics with 
>>> "-XX:-UseVectorApiIntrinsics" and remove the compiler directives, 
>>> Hotspot core dumps midway through execution. It runs past that point 
>>> if I turn off C2 with "-XX:TieredStopAtLevel=3".
>>>
>>> Feb 21, 2018 3:53:08 PM com.oracle.labs.mlrg.topicmodel.model.train.SSCA train
>>> INFO: Iteration 0
>>> #
>>> # A fatal error has been detected by the Java Runtime Environment:
>>> #
>>> #  Internal Error (compile.cpp:2695), pid=21949, tid=21976
>>> #  Error: fatal error
>>> #
>>> # JRE version: OpenJDK Runtime Environment (11.0) (build 11-internal+0-adhoc.apocock.panama)
>>> # Java VM: OpenJDK 64-Bit Server VM (11-internal+0-adhoc.apocock.panama, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
>>> # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P" (or dumping to /local/Repositories/TopicModel/reuters-test/core.21949)
>>>
>>> Should I just wait till the generalised intrinsics support has fully 
>>> landed? Is there some Hotspot incantation which will let me test out 
>>> areas of the code which don't use casting?
>>>
>>> Thanks,
>>>
>>> Adam
>>>
>>> On 16/02/18 18:14, Vladimir Ivanov wrote:
>>>>
>>>>> Yep, the problem went away with "-XX:-UseVectorApiIntrinsics". My 
>>>>> vector shapes are unchanging though, everything is fixed at S256Bit 
>>>>> as that's what I've got on my desktop. I thought the rebracket for 
>>>>> things like masks were supposed to be optimised out, as a Float 256 
>>>>> Mask is the same bit string as an Integer 256 Mask.
>>>>
>>>> Yes, rebracketing turns into a no-op when boxes go away. Otherwise, 
>>>> the operand has to be boxed/reboxed.
>>>>
>>>>> Any idea when the generalized intrinsics will land?
>>>>
>>>> The first batch is already there, but there are no exact dates for 
>>>> when the rest will follow.
>>>>
>>>> Best regards,
>>>> Vladimir Ivanov
>>>>
>>>>> On 15/02/18 12:50, Vladimir Ivanov wrote:
>>>>>> Adam,
>>>>>>
>>>>>> Thanks for the report.
>>>>>>
>>>>>> I hit a similar problem with vectors and tracked it down to 
>>>>>> changes in Parse::do_call (disabled C->optimize_virtual_call() on 
>>>>>> vectors). The bug is that a devirtualized non-inlined call can float 
>>>>>> before the type check it depends on. This leads to the wrong method 
>>>>>> being called when the type check fails and manifests as a CCE in the 
>>>>>> interpreter after deoptimization (due to the failed type check). It 
>>>>>> usually happens when vector shapes change at runtime: C2 produces 
>>>>>> a method specialized for some particular vector shape, and then the 
>>>>>> method observes a different vector shape for the first time.
>>>>>>
>>>>>> The problem is specific to original intrinsics which rely on some 
>>>>>> inlining tweaks (like in Parse::do_call) to make intrinsification 
>>>>>> more reliable.
>>>>>>
>>>>>> I suggest you try -XX:-UseVectorApiIntrinsics and check whether 
>>>>>> the problem goes away.
>>>>>>
>>>>>> Generalized intrinsics will be used (where available) and they 
>>>>>> shouldn't be prone to that problem (relevant code path in 
>>>>>> Parse::do_call is used only for original intrinsics).
>>>>>>
>>>>>> Unfortunately, not all operations are covered by generalized 
>>>>>> intrinsics yet. So, generated code quality may suffer as well.
>>>>>>
>>>>>> I didn't bother fixing the bug because the plan is to replace 
>>>>>> original intrinsics with generalized ones and remove inlining tweaks.
>>>>>>
>>>>>> Best regards,
>>>>>> Vladimir Ivanov
>>>>>>
>>>>>> On 2/15/18 8:11 PM, Adam Pocock wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've been working on more involved machine learning demos on top 
>>>>>>> of the vector API. As part of that I built a binary search demo 
>>>>>>> that searches n vectors at the same time using an n-wide SIMD. 
>>>>>>> This keeps rebracketing the Mask from Integer to Float and back 
>>>>>>> again as it searches through the arrays.
>>>>>>>
>>>>>>> Paul Sandoz has been helping me debug it, and when this code is 
>>>>>>> run using C1 or lower it executes fine, but when it's recompiled 
>>>>>>> with C2 (triggered by executing binarySearchCDF in a loop with 
>>>>>>> the same arguments and a print statement, takes about 300-500 
>>>>>>> iterations) it throws a ClassCastException (stack trace below). 
>>>>>>> Turning off C2 with "-XX:TieredStopAtLevel=3" allows the loop to 
>>>>>>> complete.
>>>>>>>
>>>>>>> Code:
>>>>>>>
>>>>>>>      public static <S extends Vector.Shape> IntVector<S> binarySearchCDF(IntSpecies<S> spec, float[][] input, int fromIndex, int toIndex, FloatVector<S> key) {
>>>>>>>          IntVector<S> low = spec.broadcast(fromIndex);
>>>>>>>          IntVector<S> high = spec.broadcast(toIndex - 1);
>>>>>>>          IntVector<S> one = spec.broadcast(1);
>>>>>>>
>>>>>>>          Mask<Float,S> mask = key.species().trueMask();
>>>>>>>
>>>>>>>          int[] indicesBuffer = new int[key.length()];
>>>>>>>          float[] valuesBuffer = new float[key.length()];
>>>>>>>
>>>>>>>          while (mask.anyTrue()) {
>>>>>>>              IntVector<S> mid = low.add(high,mask.rebracket(Integer.class)).shiftR(1);
>>>>>>>              mid.intoArray(indicesBuffer,0);
>>>>>>>              for (int i = 0; i < valuesBuffer.length; i++) {
>>>>>>>                  valuesBuffer[i] = input[i][indicesBuffer[i]];
>>>>>>>              }
>>>>>>>              FloatVector<S> values = key.species().fromArray(valuesBuffer,0);
>>>>>>>
>>>>>>>              Mask<Integer,S> lessThanKey = values.lessThan(key).and(mask).rebracket(Integer.class);
>>>>>>>              low = low.blend(mid.add(one),lessThanKey);
>>>>>>>              Mask<Integer,S> greaterThanKey = values.greaterThan(key).and(mask).rebracket(Integer.class);
>>>>>>>              high = high.blend(mid.sub(one),greaterThanKey);
>>>>>>>              Mask<Integer,S> equalsKey = values.equal(key).and(mask).rebracket(Integer.class);
>>>>>>>              low = low.blend(mid,equalsKey);
>>>>>>>              mask = mask.and(equalsKey.rebracket(Float.class).not());
>>>>>>>              mask = mask.and(low.lessThan(high).rebracket(Float.class));
>>>>>>>          }
>>>>>>>
>>>>>>>          return low;
>>>>>>>      }
>>>>>>>
>>>>>>> Stack trace:
>>>>>>>
>>>>>>> Caused by: java.lang.ClassCastException: jdk.incubator.vector/jdk.incubator.vector.Int256Vector$Int256Mask cannot be cast to jdk.incubator.vector/jdk.incubator.vector.Float256Vector$Float256Mask
>>>>>>>      at jdk.incubator.vector/jdk.incubator.vector.Float256Vector$Float256Mask.and(Float256Vector.java:488)
>>>>>>>      at jdk.incubator.vector/jdk.incubator.vector.Float256Vector$Float256Mask.and(Float256Vector.java:430)
>>>>>>>      at mlrg.topicmodel/com.oracle.labs.mlrg.topicmodel.util.vector.BinarySearch.binarySearchCDF(BinarySearch.java:37)
>>>>>>>
>>>>>>> Line 37 is:
>>>>>>>              mask = mask.and(equalsKey.rebracket(Float.class).not());
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Adam
>>>>>>>
>>>>>
>>>
>
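
For correctness testing of the SIMD code, a scalar baseline can help 
separate compiler bugs from logic bugs. The sketch below mirrors the 
control flow of the quoted binarySearchCDF one lane at a time, using plain 
arrays instead of the Vector API. The class name is hypothetical, and it 
assumes fromIndex < toIndex and no NaN values (a NaN would never match any 
comparison, so the sketch rejects it instead of looping):

```java
public class ScalarBinarySearch {
    // Lane-by-lane scalar mirror of the vectorized loop: lane i searches
    // input[i] for key[i] and returns the final value of "low" for that lane.
    public static int[] binarySearchCDF(float[][] input, int fromIndex, int toIndex, float[] key) {
        int[] result = new int[key.length];
        for (int lane = 0; lane < key.length; lane++) {
            result[lane] = searchLane(input[lane], fromIndex, toIndex, key[lane]);
        }
        return result;
    }

    private static int searchLane(float[] row, int fromIndex, int toIndex, float key) {
        if (Float.isNaN(key)) {
            throw new IllegalArgumentException("NaN key is not supported in this sketch");
        }
        int low = fromIndex;
        int high = toIndex - 1;
        boolean active = true;              // corresponds to this lane's mask bit
        while (active) {
            int mid = (low + high) >> 1;
            float value = row[mid];
            if (Float.isNaN(value)) {
                throw new IllegalArgumentException("NaN value is not supported in this sketch");
            }
            boolean equal = false;
            if (value < key) {
                low = mid + 1;              // blend(mid.add(one), lessThanKey)
            } else if (value > key) {
                high = mid - 1;             // blend(mid.sub(one), greaterThanKey)
            } else {
                low = mid;                  // blend(mid, equalsKey)
                equal = true;
            }
            // mask = mask.and(not equal).and(low < high)
            active = !equal && low < high;
        }
        return low;
    }

    public static void main(String[] args) {
        float[][] input = {
            {0.1f, 0.3f, 0.6f, 1.0f},
            {0.1f, 0.3f, 0.6f, 1.0f},
        };
        int[] r = binarySearchCDF(input, 0, 4, new float[] {0.6f, 0.5f});
        System.out.println(java.util.Arrays.toString(r));
    }
}
```

Comparing this lane by lane against the vector version's intoArray output, 
with and without C2, should show whether a mismatch comes from the 
algorithm or from the generated code.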