[vector] ClassCastException in C2 with Mask rebracketing

Wed Feb 21 20:59:11 UTC 2018

Ok. I'm trying to work around this issue by turning off C2 compilation 
of the binarySearchCDF method, so I can do correctness testing on the 
rest of the SIMD code, and maybe put it in a profiler to see how many 
cache misses I'm causing etc.

I'm using "-XX:CompilerDirectivesFile=filename" pointed at a directives 
file containing:

[
{
     match: "com/oracle/labs/mlrg/topicmodel/util/vector/BinarySearch.binarySearchCDF*",
     c2: {
             Exclude: true,
     },
},
{
     match: "*Float256Vector$Float256Mask.rebracket*",
     c2: {
             Exclude: true,
     },
},
{
     match: "*Int256Vector$Int256Mask.rebracket*",
     c2: {
             Exclude: true,
     },
},
]

but all I've managed to do is move around the place where I get a 
ClassCastException. It's still in the binarySearch but it moves to 
different rebracket operations. I did once manage to move it to a 
completely incomprehensible place where it dies with ClassCastException 
from Float256Vector to Int256Vector, in a function which only takes 
IntVector arguments, but that was after the binary search call and I 
suspect something weird is happening.

Also now when I turn off the vector intrinsics with 
"-XX:-UseVectorApiIntrinsics" and remove the compiler directives, 
Hotspot core dumps midway through execution. It runs past that point if 
I turn off C2 with "-XX:TieredStopAtLevel=3".

Feb 21, 2018 3:53:08 PM com.oracle.labs.mlrg.topicmodel.model.train.SSCA train
INFO: Iteration 0
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (compile.cpp:2695), pid=21949, tid=21976
#  Error: fatal error
#
# JRE version: OpenJDK Runtime Environment (11.0) (build 11-internal+0-adhoc.apocock.panama)
# Java VM: OpenJDK 64-Bit Server VM (11-internal+0-adhoc.apocock.panama, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P" (or dumping to /local/Repositories/TopicModel/reuters-test/core.21949)

Should I just wait until the generalised intrinsics support has fully 
landed? Or is there some Hotspot incantation which will let me test the 
areas of the code which don't use casting?
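
For the correctness testing mentioned above, one option that does not 
depend on C2 or the intrinsics at all is to compare against a scalar 
per-lane version of the same loop. The following is only a minimal 
sketch: the class name ScalarBinarySearch and the float[] keys parameter 
are made up for illustration (they are not part of the vector API or of 
my real code), and it assumes non-negative indices and no NaNs in the 
input. It copies the control flow of binarySearchCDF quoted below, 
including the "low < high" termination test, so it reproduces that 
loop's behaviour rather than a textbook binary search.

```java
import java.util.Arrays;

/** Hypothetical scalar baseline; not part of the vector API or the original code. */
public final class ScalarBinarySearch {
    private ScalarBinarySearch() {}

    /**
     * Runs the search loop independently per lane: lane i searches
     * input[i] for keys[i] over the index range [fromIndex, toIndex).
     */
    public static int[] binarySearchCDF(float[][] input, int fromIndex,
                                        int toIndex, float[] keys) {
        int[] result = new int[keys.length];
        for (int lane = 0; lane < keys.length; lane++) {
            result[lane] = searchLane(input[lane], fromIndex, toIndex, keys[lane]);
        }
        return result;
    }

    private static int searchLane(float[] row, int fromIndex, int toIndex, float key) {
        int low = fromIndex;
        int high = toIndex - 1;
        boolean active = true;            // plays the role of one mask lane
        while (active) {                  // the mask starts all-true, so the body runs at least once
            int mid = (low + high) >>> 1; // assumes non-negative indices
            float value = row[mid];
            if (value < key) {
                low = mid + 1;
            } else if (value > key) {
                high = mid - 1;
            } else {
                low = mid;                // exact match (assumes no NaN)
                active = false;
            }
            active = active && low < high;
        }
        return low;
    }

    public static void main(String[] args) {
        float[][] input = {
            {0.1f, 0.3f, 0.6f, 1.0f},
            {0.1f, 0.3f, 0.6f, 1.0f},
            {0.1f, 0.3f, 0.6f, 1.0f},
        };
        float[] keys = {0.5f, 0.3f, 0.9f};
        // prints [2, 1, 3]
        System.out.println(Arrays.toString(binarySearchCDF(input, 0, 4, keys)));
    }
}
```

Copying the vector result out with intoArray and comparing it against 
this with Arrays.equals would show whether a given run diverges before 
the exception is thrown.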

Thanks,

Adam

On 16/02/18 18:14, Vladimir Ivanov wrote:
>
>> Yep, the problem went away with "-XX:-UseVectorApiIntrinsics". My 
>> vector shapes are unchanging though, everything is fixed at S256Bit 
>> as that's what I've got on my desktop. I thought the rebracket for 
>> things like masks were supposed to be optimised out, as a Float 256 
>> Mask is the same bit string as an Integer 256 Mask.
>
> Yes, rebracketing turns into a no-op when boxes go away. Otherwise, 
> the operand has to be boxed/reboxed.
>
>> Any idea when the generalized intrinsics will land?
>
> The first batch is already there, but there are no exact dates for 
> when the rest will follow.
>
> Best regards,
> Vladimir Ivanov
>
>> On 15/02/18 12:50, Vladimir Ivanov wrote:
>>> Adam,
>>>
>>> Thanks for the report.
>>>
>>> I hit a similar problem with vectors and tracked it down to a 
>>> change in Parse::do_call (disabled C->optimize_virtual_call() on 
>>> vectors). The bug is that a devirtualized, non-inlined call can float 
>>> before the type check it depends on. That leads to the wrong method 
>>> being called when the type check fails, and manifests as a CCE in the 
>>> interpreter after deoptimization (due to the failed type check). It 
>>> usually happens when vector shapes change at runtime: C2 produces a 
>>> method specialized for one particular vector shape, and the type 
>>> check then fails the first time the method observes a different 
>>> vector shape.
>>>
>>> The problem is specific to original intrinsics which rely on some 
>>> inlining tweaks (like in Parse::do_call) to make intrinsification 
>>> more reliable.
>>>
>>> I suggest you try -XX:-UseVectorApiIntrinsics and check whether 
>>> the problem goes away.
>>>
>>> Generalized intrinsics will be used (where available) and they 
>>> shouldn't be prone to that problem (relevant code path in 
>>> Parse::do_call is used only for original intrinsics).
>>>
>>> Unfortunately, not all operations are covered by generalized 
>>> intrinsics yet. So, generated code quality may suffer as well.
>>>
>>> I didn't bother fixing the bug because the plan is to replace 
>>> original intrinsics with generalized ones and remove inlining tweaks.
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> On 2/15/18 8:11 PM, Adam Pocock wrote:
>>>> Hi,
>>>>
>>>> I've been working on more involved machine learning demos on top of 
>>>> the vector API. As part of that I built a binary search demo that 
>>>> searches n vectors at the same time using an n-wide SIMD. This 
>>>> keeps rebracketing the Mask from Integer to Float and back again as 
>>>> it searches through the arrays.
>>>>
>>>> Paul Sandoz has been helping me debug it, and when this code is run 
>>>> using C1 or lower it executes fine, but when it's recompiled with 
>>>> C2 (triggered by executing binarySearchCDF in a loop with the same 
>>>> arguments and a print statement, takes about 300-500 iterations) it 
>>>> throws a ClassCastException (stack trace below). Turning off C2 
>>>> with "-XX:TieredStopAtLevel=3" allows the loop to complete.
>>>>
>>>> Code:
>>>>
>>>>      public static <S extends Vector.Shape> IntVector<S> binarySearchCDF(
>>>>              IntSpecies<S> spec, float[][] input, int fromIndex,
>>>>              int toIndex, FloatVector<S> key) {
>>>>          IntVector<S> low = spec.broadcast(fromIndex);
>>>>          IntVector<S> high = spec.broadcast(toIndex - 1);
>>>>          IntVector<S> one = spec.broadcast(1);
>>>>
>>>>          Mask<Float,S> mask = key.species().trueMask();
>>>>
>>>>          int[] indicesBuffer = new int[key.length()];
>>>>          float[] valuesBuffer = new float[key.length()];
>>>>
>>>>          while (mask.anyTrue()) {
>>>>              IntVector<S> mid =
>>>>                  low.add(high,mask.rebracket(Integer.class)).shiftR(1);
>>>>              mid.intoArray(indicesBuffer,0);
>>>>              for (int i = 0; i < valuesBuffer.length; i++) {
>>>>                  valuesBuffer[i] = input[i][indicesBuffer[i]];
>>>>              }
>>>>              FloatVector<S> values =
>>>>                  key.species().fromArray(valuesBuffer,0);
>>>>
>>>>              Mask<Integer,S> lessThanKey =
>>>>                  values.lessThan(key).and(mask).rebracket(Integer.class);
>>>>              low = low.blend(mid.add(one),lessThanKey);
>>>>              Mask<Integer,S> greaterThanKey =
>>>>                  values.greaterThan(key).and(mask).rebracket(Integer.class);
>>>>              high = high.blend(mid.sub(one),greaterThanKey);
>>>>              Mask<Integer,S> equalsKey =
>>>>                  values.equal(key).and(mask).rebracket(Integer.class);
>>>>              low = low.blend(mid,equalsKey);
>>>>              mask = mask.and(equalsKey.rebracket(Float.class).not());
>>>>              mask =
>>>>                  mask.and(low.lessThan(high).rebracket(Float.class));
>>>>          }
>>>>
>>>>          return low;
>>>>      }
>>>>
>>>> Stack trace:
>>>>
>>>> Caused by: java.lang.ClassCastException: jdk.incubator.vector/jdk.incubator.vector.Int256Vector$Int256Mask cannot be cast to jdk.incubator.vector/jdk.incubator.vector.Float256Vector$Float256Mask
>>>>      at jdk.incubator.vector/jdk.incubator.vector.Float256Vector$Float256Mask.and(Float256Vector.java:488)
>>>>      at jdk.incubator.vector/jdk.incubator.vector.Float256Vector$Float256Mask.and(Float256Vector.java:430)
>>>>      at mlrg.topicmodel/com.oracle.labs.mlrg.topicmodel.util.vector.BinarySearch.binarySearchCDF(BinarySearch.java:37)
>>>>
>>>> Line 37 is:
>>>>              mask = mask.and(equalsKey.rebracket(Float.class).not());
>>>>
>>>> Thanks,
>>>>
>>>> Adam
>>>>
>>

-- 
Adam Pocock
Principal Member of Technical Staff
Machine Learning Research Group
Oracle Labs, Burlington, MA