[vector] ClassCastException in C2 with Mask rebracketing

Adam Pocock adam.pocock at oracle.com
Fri Feb 23 15:10:39 UTC 2018


Hi Vladimir,

That patch does fix the issue in the binary search. It didn't apply 
cleanly on top of my source tree plus Razvan's patch (I pulled just 
before applying both patches), but the result seems to be ok. I got 
"1 out of 20 hunks FAILED -- saving rejects to file 
src/hotspot/share/opto/library_call.cpp.rej": the removal of the two 
functions LibraryCallKit::addMasking and 
LibraryCallKit::inline_bin_vector_op didn't apply cleanly, so I 
commented them out and applied the remainder of that hunk manually 
(which changed the call signature of 
LibraryCallKit::inline_un_vector_op).

I returned to running the application which uses the binary search, and 
with vector intrinsics turned on it now core dumps at compile.cpp line 
2695. With them turned off it occasionally gives me an 
ArithmeticException (divide by zero) out of IntVector.floorMod, even 
though the argument to floorMod is an IntSpecies.broadcast(arg). I made 
the argument to the broadcast a final field (set to 60 on construction) 
to ensure it wasn't modified (my code never modifies it), and now the 
exception depends on whether I put a print statement before the 
floorMod call to check it's still 60: with the print statement there is 
no exception, without it I get the ArithmeticException.

With the print statement the run instead gets into an infinite loop in 
the binary search function. I have a print statement at the top 
counting the number of documents processed, and although the count at 
which it hangs varies, it is always before the point where the 
ArithmeticException happens in the version without the print.
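
To separate a logic problem from a miscompilation, one lane of the 
binary search can be checked against a plain scalar version. The sketch 
below is only an illustration: the class name, the method names and the 
pass cap are assumptions, not code from the application. It mirrors the 
per-lane updates of the quoted binarySearchCDF and throws instead of 
spinning if a lane stops making progress.

```java
// Hypothetical scalar reference; names are illustrative only.
class ScalarBinarySearchReference {

    // Assumed safety cap: a halving search over an int range needs far fewer passes.
    static final int MAX_PASSES = 64;

    // One lane of the quoted binarySearchCDF, with plain ints and a boolean
    // standing in for IntVector and Mask.
    static int searchLane(float[] row, int fromIndex, int toIndex, float key) {
        int low = fromIndex;
        int high = toIndex - 1;
        boolean active = true; // this lane's mask bit
        int passes = 0;
        while (active) {
            if (++passes > MAX_PASSES) {
                throw new IllegalStateException(
                    "lane did not converge: low=" + low + " high=" + high + " key=" + key);
            }
            int mid = (low + high) >> 1;
            float value = row[mid];
            if (value < key) {
                low = mid + 1;
            } else if (value > key) {
                high = mid - 1;
            } else if (value == key) {
                low = mid;
                active = false;
            }
            // If none of the comparisons holds (a NaN key or value), low and
            // high are left unchanged and the lane stays active.
            active = active && low < high;
        }
        return low;
    }

    // Row i of input is searched for keys[i], like lane i of the vector version.
    static int[] searchAll(float[][] input, int fromIndex, int toIndex, float[] keys) {
        int[] result = new int[keys.length];
        for (int i = 0; i < keys.length; i++) {
            result[i] = searchLane(input[i], fromIndex, toIndex, keys[i]);
        }
        return result;
    }
}
```

Comparing searchAll against the vector result for the same inputs would 
show whether the lanes diverge. The NaN case is one way such a loop 
could fail to terminate; whether that is what happens here is not 
established.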

Stack trace for the ArithmeticException:
Caused by: java.lang.ArithmeticException: / by zero
     at jdk.incubator.vector/jdk.incubator.vector.IntVector.lambda$floorMod$35(IntVector.java:317)
     at jdk.incubator.vector/jdk.incubator.vector.Int256Vector.bOp(Int256Vector.java:89)
     at jdk.incubator.vector/jdk.incubator.vector.Int256Vector.bOp(Int256Vector.java:33)
     at jdk.incubator.vector/jdk.incubator.vector.IntVector.floorMod(IntVector.java:317)
     at com.oracle.labs.mlrg.topicmodel.util.vector.VectorRNG.nextInt(VectorRNG.java:142)

Attaching IntelliJ's debugger to it causes the JVM to dump core whether 
or not it's using vector intrinsics. I haven't tried setting IntelliJ's 
own JVM to the panama build, and I don't know whether that would have 
an effect.

Testing just the vectorRNG.nextInt call indicates that C2 is failing to 
compile it properly with "-XX:-UseVectorApiIntrinsics": it gives the 
ArithmeticException when I ask for 1 million random numbers (it dies 
around the 10000th iteration). With the intrinsics enabled and/or with 
-XX:TieredStopAtLevel=3 it completes all the iterations.
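
For comparison, a scalar stand-in for the same kind of test can be run 
under the same flags. This is a minimal sketch, assuming 8 int lanes 
(one 256-bit vector) and a fixed divisor of 60; the class name, method 
names and seed are made up for illustration and it does not use the 
vector API at all. It only pins down what floorMod with a non-zero 
broadcast divisor should produce: every result in [0, divisor) and no 
ArithmeticException.

```java
import java.util.SplittableRandom;

// Hypothetical scalar reference; names are illustrative only.
class FloorModReference {

    // Scalar stand-in for a lane-wise floorMod against a broadcast divisor.
    static int[] floorModLanes(int[] lanes, int divisor) {
        int[] out = new int[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            out[i] = Math.floorMod(lanes[i], divisor);
        }
        return out;
    }

    // Repeats the operation often enough for the JIT to compile the hot loop
    // and checks every lane; returns the number of iterations completed.
    static int run(int iterations, int divisor) {
        SplittableRandom rng = new SplittableRandom(42); // arbitrary fixed seed
        int[] lanes = new int[8];                        // assumed: 8 ints per 256-bit vector
        for (int iter = 0; iter < iterations; iter++) {
            for (int i = 0; i < lanes.length; i++) {
                lanes[i] = rng.nextInt();
            }
            for (int value : floorModLanes(lanes, divisor)) {
                if (value < 0 || value >= divisor) {
                    throw new IllegalStateException(
                        "iteration " + iter + ": result " + value + " out of range");
                }
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println("completed " + run(1_000_000, 60) + " iterations");
    }
}
```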

I'm not sure what other things I should try.

Thanks,

Adam



On 23/02/18 07:17, Vladimir Ivanov wrote:
> Adam,
>
> Please, try the following patch (on top of the one Razvan sent out 
> yesterday [1]):
>
> http://cr.openjdk.java.net/~vlivanov/panama/vector.generalized_intrinsics/webrev.07 
>
>
> It fixes CCE bug for me.
>
> Regarding generated code quality, I see issues with vector box 
> elimination and they are caused by:
>   (1) no intrinsic for Mask.not() yet [2]
>
>   (2) interface calls hinder the analysis by wrapping VectorBox nodes 
> in CheckCastPP nodes, which hide exact class info by casting them to 
> interfaces
>
> Best regards,
> Vladimir Ivanov
>
> [1] 
> http://cr.openjdk.java.net/~rlupusoru/panama/webrev_maskboxingavx512_02/index.html
>
> [2] @ 259   jdk.incubator.vector.AbstractMask::not (5 bytes) inline (hot)
>      \-> TypeProfile (721/721 counts) = 
> jdk/incubator/vector/Float256Vector$Float256Mask
>       @ 1   jdk.incubator.vector.AbstractMask::not (10 bytes) inline 
> (hot)
>         @ 1 
> java.lang.invoke.LambdaForm$MH/100555887::linkToTargetMethod (8 bytes) 
> force inline by annotation
>           @ 4   java.lang.invoke.LambdaForm$MH/611437735::invoke (8 
> bytes)   force inline by annotation
>         @ 6 jdk.incubator.vector.Float256Vector$Float256Mask::uOp (6 
> bytes) inline (hot)
>           @ 2 jdk.incubator.vector.Float256Vector$Float256Mask::uOp 
> (61 bytes) too big
>
> [3]
>
> On 2/21/18 11:59 PM, Adam Pocock wrote:
>> Ok. I'm trying to work around this issue by turning off C2 
>> compilation of the binarySearchCDF method, so I can do correctness 
>> testing on the rest of the SIMD code, and maybe put it in a profiler 
>> to see how many cache misses I'm causing etc.
>>
>> I'm using "-XX:CompilerDirectivesFile=filename" pointed at a 
>> directives file containing:
>>
>> [
>> {
>>      match: 
>> "com/oracle/labs/mlrg/topicmodel/util/vector/BinarySearch.binarySearchCDF*", 
>>
>>      c2: {
>>              Exclude: true,
>>      },
>> },
>> {
>>      match: "*Float256Vector$Float256Mask.rebracket*",
>>      c2: {
>>              Exclude: true,
>>      },
>> },
>> {
>>      match: "*Int256Vector$Int256Mask.rebracket*",
>>      c2: {
>>              Exclude: true,
>>      },
>> },
>> ]
>>
>> but all I've managed to do is move around the place where I get a 
>> ClassCastException. It's still in the binarySearch but it moves to 
>> different rebracket operations. I did once manage to move it to a 
>> completely incomprehensible place where it dies with 
>> ClassCastException from Float256Vector to Int256Vector, in a function 
>> which only takes IntVector arguments, but that was after the binary 
>> search call and I suspect something weird is happening.
>>
>> Also now when I turn off the vector intrinsics with 
>> "-XX:-UseVectorApiIntrinsics" and remove the compiler directives, 
>> Hotspot core dumps midway through execution. It runs past that point 
>> if I turn off C2 with "-XX:TieredStopAtLevel=3".
>>
>> Feb 21, 2018 3:53:08 PM 
>> com.oracle.labs.mlrg.topicmodel.model.train.SSCA train
>> INFO: Iteration 0
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  Internal Error (compile.cpp:2695), pid=21949, tid=21976
>> #  Error: fatal error
>> #
>> # JRE version: OpenJDK Runtime Environment (11.0) (build 
>> 11-internal+0-adhoc.apocock.panama)
>> # Java VM: OpenJDK 64-Bit Server VM 
>> (11-internal+0-adhoc.apocock.panama, mixed mode, tiered, compressed 
>> oops, g1 gc, linux-amd64)
>> # Core dump will be written. Default location: Core dumps may be 
>> processed with "/usr/share/apport/apport %p %s %c %d %P" (or dumping 
>> to /local/Repositories/TopicModel/reuters-test/core.21949)
>>
>> Should I just wait till the generalised intrinsics support has fully 
>> landed? Is there some Hotspot incantation which will let me test out 
>> areas of the code which don't use casting?
>>
>> Thanks,
>>
>> Adam
>>
>> On 16/02/18 18:14, Vladimir Ivanov wrote:
>>>
>>>> Yep, the problem went away with "-XX:-UseVectorApiIntrinsics". My 
>>>> vector shapes are unchanging though, everything is fixed at S256Bit 
>>>> as that's what I've got on my desktop. I thought the rebracket for 
>>>> things like masks were supposed to be optimised out, as a Float 256 
>>>> Mask is the same bit string as an Integer 256 Mask.
>>>
>>> Yes, rebracketing turns into a no-op when boxes go away. Otherwise, 
>>> the operand has to be boxed/reboxed.
>>>
>>>> Any idea when the generalized intrinsics will land?
>>>
>>> The first batch is already there, but there are no exact dates for 
>>> when the rest will follow.
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>>> On 15/02/18 12:50, Vladimir Ivanov wrote:
>>>>> Adam,
>>>>>
>>>>> Thanks for the report.
>>>>>
>>>>> I hit a similar problem with vectors and tracked it down to 
>>>>> changes in Parse::do_call (disabled C->optimize_virtual_call() on 
>>>>> vectors). The bug is that a devirtualized non-inlined call can 
>>>>> float before the type check it depends on. It leads to the wrong 
>>>>> method being called when the type check fails and manifests as a 
>>>>> CCE in the interpreter after deoptimization (due to the failed type 
>>>>> check). It usually happens when vector shapes change at runtime: 
>>>>> C2 produces a method specialized for some particular vector shape 
>>>>> and then the method observes a different vector shape for the 
>>>>> first time.
>>>>>
>>>>> The problem is specific to original intrinsics which rely on some 
>>>>> inlining tweaks (like in Parse::do_call) to make intrinsification 
>>>>> more reliable.
>>>>>
>>>>> I suggest you try -XX:-UseVectorApiIntrinsics and check whether 
>>>>> the problem goes away.
>>>>>
>>>>> Generalized intrinsics will be used (where available) and they 
>>>>> shouldn't be prone to that problem (relevant code path in 
>>>>> Parse::do_call is used only for original intrinsics).
>>>>>
>>>>> Unfortunately, not all operations are covered by generalized 
>>>>> intrinsics yet. So, generated code quality may suffer as well.
>>>>>
>>>>> I didn't bother fixing the bug because the plan is to replace 
>>>>> original intrinsics with generalized ones and remove inlining tweaks.
>>>>>
>>>>> Best regards,
>>>>> Vladimir Ivanov
>>>>>
>>>>> On 2/15/18 8:11 PM, Adam Pocock wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I've been working on more involved machine learning demos on top 
>>>>>> of the vector API. As part of that I built a binary search demo 
>>>>>> that searches n vectors at the same time using an n-wide SIMD. 
>>>>>> This keeps rebracketing the Mask from Integer to Float and back 
>>>>>> again as it searches through the arrays.
>>>>>>
>>>>>> Paul Sandoz has been helping me debug it, and when this code is 
>>>>>> run using C1 or lower it executes fine, but when it's recompiled 
>>>>>> with C2 (triggered by executing binarySearchCDF in a loop with 
>>>>>> the same arguments and a print statement, takes about 300-500 
>>>>>> iterations) it throws a ClassCastException (stack trace below). 
>>>>>> Turning off C2 with "-XX:TieredStopAtLevel=3" allows the loop to 
>>>>>> complete.
>>>>>>
>>>>>> Code:
>>>>>>
>>>>>>      public static <S extends Vector.Shape> IntVector<S> 
>>>>>> binarySearchCDF(IntSpecies<S> spec, float[][] input, int 
>>>>>> fromIndex, int toIndex, FloatVector<S> key) {
>>>>>>          IntVector<S> low = spec.broadcast(fromIndex);
>>>>>>          IntVector<S> high = spec.broadcast(toIndex - 1);
>>>>>>          IntVector<S> one = spec.broadcast(1);
>>>>>>
>>>>>>          Mask<Float,S> mask = key.species().trueMask();
>>>>>>
>>>>>>          int[] indicesBuffer = new int[key.length()];
>>>>>>          float[] valuesBuffer = new float[key.length()];
>>>>>>
>>>>>>          while (mask.anyTrue()) {
>>>>>>              IntVector<S> mid = 
>>>>>> low.add(high,mask.rebracket(Integer.class)).shiftR(1);
>>>>>>              mid.intoArray(indicesBuffer,0);
>>>>>>              for (int i = 0; i < valuesBuffer.length; i++) {
>>>>>>                  valuesBuffer[i] = input[i][indicesBuffer[i]];
>>>>>>              }
>>>>>>              FloatVector<S> values = 
>>>>>> key.species().fromArray(valuesBuffer,0);
>>>>>>
>>>>>>              Mask<Integer,S> lessThanKey = 
>>>>>> values.lessThan(key).and(mask).rebracket(Integer.class);
>>>>>>              low = low.blend(mid.add(one),lessThanKey);
>>>>>>              Mask<Integer,S> greaterThanKey = 
>>>>>> values.greaterThan(key).and(mask).rebracket(Integer.class);
>>>>>>              high = high.blend(mid.sub(one),greaterThanKey);
>>>>>>              Mask<Integer,S> equalsKey = 
>>>>>> values.equal(key).and(mask).rebracket(Integer.class);
>>>>>>              low = low.blend(mid,equalsKey);
>>>>>>              mask = 
>>>>>> mask.and(equalsKey.rebracket(Float.class).not());
>>>>>>              mask = 
>>>>>> mask.and(low.lessThan(high).rebracket(Float.class));
>>>>>>          }
>>>>>>
>>>>>>          return low;
>>>>>>      }
>>>>>>
>>>>>> Stack trace:
>>>>>>
>>>>>> Caused by: java.lang.ClassCastException: 
>>>>>> jdk.incubator.vector/jdk.incubator.vector.Int256Vector$Int256Mask 
>>>>>> cannot be cast to 
>>>>>> jdk.incubator.vector/jdk.incubator.vector.Float256Vector$Float256Mask 
>>>>>>
>>>>>>      at 
>>>>>> jdk.incubator.vector/jdk.incubator.vector.Float256Vector$Float256Mask.and(Float256Vector.java:488) 
>>>>>>
>>>>>>      at 
>>>>>> jdk.incubator.vector/jdk.incubator.vector.Float256Vector$Float256Mask.and(Float256Vector.java:430) 
>>>>>>
>>>>>>      at 
>>>>>> mlrg.topicmodel/com.oracle.labs.mlrg.topicmodel.util.vector.BinarySearch.binarySearchCDF(BinarySearch.java:37) 
>>>>>>
>>>>>>
>>>>>> Line 37 is:
>>>>>>              mask = 
>>>>>> mask.and(equalsKey.rebracket(Float.class).not());
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Adam
>>>>>>
>>>>
>>

-- 
Adam Pocock
Principal Member of Technical Staff
Machine Learning Research Group
Oracle Labs, Burlington, MA


