[vector] ClassCastException in C2 with Mask rebracketing
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Mon Feb 26 18:18:55 UTC 2018
Adam,
Could you please run the test with a fastdebug build?
There should be additional debugging output printed right before the crash:
2690 } else {
2691 #ifndef PRODUCT
2692 tty->print_cr("vbox"); vbox->dump(3);
2693 tty->print_cr("vect"); vbox->dump(3);
2694 #endif // PRODUCT
2695 fatal("");
2696 return NULL;
2697 }
Regarding the division-by-zero bug, it looks like a problem with the
initializing vector store into a newly created box. I'll try to spot the bug.
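
To make the suspected failure mode concrete, here is a minimal scalar sketch in plain Java. It does not use jdk.incubator.vector: the class ScalarFloorMod and its helpers bOp, broadcast and floorMod are hypothetical stand-ins that only mimic the shape of the non-intrinsic path visible in the quoted stack trace (floorMod going through a lane-wise bOp). The assumption being illustrated is that if the lanes of the broadcast divisor are read as zero, for instance because the store that initializes the box is lost, the per-lane Math.floorMod raises an ArithmeticException.

```java
// Hypothetical scalar model of a lane-wise binary operation; not the incubator API.
final class ScalarFloorMod {
    // Applies f to each pair of lanes, similar in spirit to the bOp frames in the stack trace.
    static int[] bOp(int[] a, int[] b, java.util.function.IntBinaryOperator f) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = f.applyAsInt(a[i], b[i]);
        }
        return r;
    }

    // Fills every lane with the same value, like a broadcast.
    static int[] broadcast(int lanes, int value) {
        int[] r = new int[lanes];
        java.util.Arrays.fill(r, value);
        return r;
    }

    // Lane-wise floorMod built on top of bOp.
    static int[] floorMod(int[] a, int[] b) {
        return bOp(a, b, (p, q) -> Math.floorMod(p, q));
    }

    public static void main(String[] args) {
        int[] x = {61, 125, -1, 7, 0, 59, 60, 3600};
        // Properly initialized divisor: every lane is 60.
        System.out.println(java.util.Arrays.toString(floorMod(x, broadcast(8, 60))));
        // Divisor whose initializing store never happened: the lanes are still 0.
        try {
            floorMod(x, new int[8]);
        } catch (ArithmeticException e) {
            System.out.println("ArithmeticException: " + e.getMessage());
        }
    }
}
```

The second call fails on the first lane, which would be consistent with an exception that appears or disappears depending on whether the divisor's contents are observed (for example by a print statement) before the call.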
Best regards,
Vladimir Ivanov
On 2/23/18 6:10 PM, Adam Pocock wrote:
> Hi Vladimir,
>
> That patch does fix the issue in the binary search. Your patch didn't
> apply cleanly on top of my source tree + Razvan's patch (I pulled just
> before applying both Razvan & your patches), but it seems to be ok. I
> got "1 out of 20 hunks FAILED -- saving rejects to file
> src/hotspot/share/opto/library_call.cpp.rej", the two functions
> LibraryCallKit::addMasking and LibraryCallKit::inline_bin_vector_op
> didn't remove cleanly, so I just commented them out and applied the
> remainder of that hunk manually (which changed the call signature of
> LibraryCallKit::inline_un_vector_op).
>
> I returned to running the application which uses the binary search, and
> now it's just core dumping on me at compile.cpp line 2695 (with vector
> intrinsics turned on). With them turned off it occasionally gives me an
> ArithmeticException (divide by zero) out of IntVector.floorMod, but the
> argument to floorMod is an IntSpecies.broadcast(arg). I made the argument
> to the broadcast a final field (which is set to 60 on construction) to
> ensure it wasn't modified (which it wasn't by my code) and now the
> exception is dependent on whether I put a print statement before the
> floorMod call to check it's still 60 (with print statement, no
> exception, without print statement ArithmeticException dividing by
> zero). With the print statement it then gets into an infinite loop in
> the binary search function, but this infinite loop happens before the
> arithmetic exception would happen in the version without (as I have a
> print statement at the top counting the number of documents it's
> processed, and it's variable when it loops infinitely, but it happens
> before the ArithmeticException).
>
> Stack trace for the ArithmeticException:
> Caused by: java.lang.ArithmeticException: / by zero
>     at jdk.incubator.vector/jdk.incubator.vector.IntVector.lambda$floorMod$35(IntVector.java:317)
>     at jdk.incubator.vector/jdk.incubator.vector.Int256Vector.bOp(Int256Vector.java:89)
>     at jdk.incubator.vector/jdk.incubator.vector.Int256Vector.bOp(Int256Vector.java:33)
>     at jdk.incubator.vector/jdk.incubator.vector.IntVector.floorMod(IntVector.java:317)
>     at com.oracle.labs.mlrg.topicmodel.util.vector.VectorRNG.nextInt(VectorRNG.java:142)
>
> Attaching IntelliJ's debugger to it causes the JVM to dump core no
> matter if it's using vector intrinsics or not, but I haven't tried
> setting IntelliJ's JVM to panama, I don't know if that will have an effect.
>
> Testing just the vectorRNG.nextInt call indicates that C2 is failing to
> compile it properly with "-XX:-UseVectorApiIntrinsics", as it gives the
> ArithmeticException when I ask for 1 million random numbers (dies around
> the 10000th iteration), however when using the intrinsics and/or with
> -XX:TieredStopAtLevel=3 it completes all the iterations.
>
> I'm not sure what other things I should try.
>
> Thanks,
>
> Adam
>
>
>
> On 23/02/18 07:17, Vladimir Ivanov wrote:
>> Adam,
>>
>> Please, try the following patch (on top of the one Razvan sent out
>> yesterday [1]):
>>
>> http://cr.openjdk.java.net/~vlivanov/panama/vector.generalized_intrinsics/webrev.07
>>
>>
>> It fixes CCE bug for me.
>>
>> Regarding generated code quality, I see issues with vector box
>> elimination and they are caused by:
>> (1) no intrinsic for Mask.not() yet [2]
>>
>> (2) interface calls hinder the analysis by wrapping VectorBox nodes
>> into CheckCastPP nodes, which hide exact class info by casting them to
>> interfaces
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1]
>> http://cr.openjdk.java.net/~rlupusoru/panama/webrev_maskboxingavx512_02/index.html
>>
>>
>> [2] @ 259 jdk.incubator.vector.AbstractMask::not (5 bytes) inline (hot)
>> \-> TypeProfile (721/721 counts) = jdk/incubator/vector/Float256Vector$Float256Mask
>> @ 1 jdk.incubator.vector.AbstractMask::not (10 bytes) inline (hot)
>> @ 1 java.lang.invoke.LambdaForm$MH/100555887::linkToTargetMethod (8 bytes) force inline by annotation
>> @ 4 java.lang.invoke.LambdaForm$MH/611437735::invoke (8 bytes) force inline by annotation
>> @ 6 jdk.incubator.vector.Float256Vector$Float256Mask::uOp (6 bytes) inline (hot)
>> @ 2 jdk.incubator.vector.Float256Vector$Float256Mask::uOp (61 bytes) too big
>>
>> [3]
>>
>> On 2/21/18 11:59 PM, Adam Pocock wrote:
>>> Ok. I'm trying to work around this issue by turning off C2
>>> compilation of the binarySearchCDF method, so I can do correctness
>>> testing on the rest of the SIMD code, and maybe put it in a profiler
>>> to see how many cache misses I'm causing etc.
>>>
>>> I'm using "-XX:CompilerDirectivesFile=filename" pointed at a
>>> directives file containing:
>>>
>>> [
>>> {
>>> match:
>>> "com/oracle/labs/mlrg/topicmodel/util/vector/BinarySearch.binarySearchCDF*",
>>>
>>> c2: {
>>> Exclude: true,
>>> },
>>> },
>>> {
>>> match: "*Float256Vector$Float256Mask.rebracket*",
>>> c2: {
>>> Exclude: true,
>>> },
>>> },
>>> {
>>> match: "*Int256Vector$Int256Mask.rebracket*",
>>> c2: {
>>> Exclude: true,
>>> },
>>> },
>>> ]
>>>
>>> but all I've managed to do is move around the place where I get a
>>> ClassCastException. It's still in the binarySearch but it moves to
>>> different rebracket operations. I did once manage to move it to a
>>> completely incomprehensible place where it dies with
>>> ClassCastException from Float256Vector to Int256Vector, in a function
>>> which only takes IntVector arguments, but that was after the binary
>>> search call and I suspect something weird is happening.
>>>
>>> Also now when I turn off the vector intrinsics with
>>> "-XX:-UseVectorApiIntrinsics" and remove the compiler directives,
>>> Hotspot core dumps midway through execution. It runs past that point
>>> if I turn off C2 with "-XX:TieredStopAtLevel=3".
>>>
>>> Feb 21, 2018 3:53:08 PM
>>> com.oracle.labs.mlrg.topicmodel.model.train.SSCA train
>>> INFO: Iteration 0
>>> #
>>> # A fatal error has been detected by the Java Runtime Environment:
>>> #
>>> # Internal Error (compile.cpp:2695), pid=21949, tid=21976
>>> # Error: fatal error
>>> #
>>> # JRE version: OpenJDK Runtime Environment (11.0) (build
>>> 11-internal+0-adhoc.apocock.panama)
>>> # Java VM: OpenJDK 64-Bit Server VM
>>> (11-internal+0-adhoc.apocock.panama, mixed mode, tiered, compressed
>>> oops, g1 gc, linux-amd64)
>>> # Core dump will be written. Default location: Core dumps may be
>>> processed with "/usr/share/apport/apport %p %s %c %d %P" (or dumping
>>> to /local/Repositories/TopicModel/reuters-test/core.21949)
>>>
>>> Should I just wait till the generalised intrinsics support has fully
>>> landed? Is there some Hotspot incantation which will let me test out
>>> areas of the code which don't use casting.
>>>
>>> Thanks,
>>>
>>> Adam
>>>
>>> On 16/02/18 18:14, Vladimir Ivanov wrote:
>>>>
>>>>> Yep, the problem went away with "-XX:-UseVectorApiIntrinsics". My
>>>>> vector shapes are unchanging though, everything is fixed at S256Bit
>>>>> as that's what I've got on my desktop. I thought the rebracket for
>>>>> things like masks were supposed to be optimised out, as a Float 256
>>>>> Mask is the same bit string as an Integer 256 Mask.
>>>>
>>>> Yes, rebracketing turns into a no-op when boxes go away. Otherwise,
>>>> the operand has to be boxed/reboxed.
>>>>
>>>>> Any idea when the generalized intrinsics will land?
>>>>
>>>> The first batch is already there, but no exact dates when the rest
>>>> follow.
>>>>
>>>> Best regards,
>>>> Vladimir Ivanov
>>>>
>>>>> On 15/02/18 12:50, Vladimir Ivanov wrote:
>>>>>> Adam,
>>>>>>
>>>>>> Thanks for the report.
>>>>>>
>>>>>> I hit a similar problem with vectors and tracked it down to a
>>>>>> change in Parse::do_call (disabled C->optimize_virtual_call() on
>>>>>> vectors). The bug is that a devirtualized non-inlined call can float
>>>>>> before the type check it depends on. It leads to the wrong method
>>>>>> being called when the type check fails and manifests as a CCE in the
>>>>>> interpreter after deoptimization (due to the failed type check). It
>>>>>> usually happens when vector shapes change at runtime: C2 produces
>>>>>> a method specialized for some particular vector shape, and the
>>>>>> failure shows up the first time the method observes a different
>>>>>> vector shape.
>>>>>>
>>>>>> The problem is specific to original intrinsics which rely on some
>>>>>> inlining tweaks (like in Parse::do_call) to make intrinsification
>>>>>> more reliable.
>>>>>>
>>>>>> I suggest you try -XX:-UseVectorApiIntrinsics and check whether
>>>>>> the problem goes away.
>>>>>>
>>>>>> Generalized intrinsics will be used (where available) and they
>>>>>> shouldn't be prone to that problem (relevant code path in
>>>>>> Parse::do_call is used only for original intrinsics).
>>>>>>
>>>>>> Unfortunately, not all operations are covered by generalized
>>>>>> intrinsics yet. So, generated code quality may suffer as well.
>>>>>>
>>>>>> I didn't bother fixing the bug because the plan is to replace
>>>>>> original intrinsics with generalized ones and remove inlining tweaks.
>>>>>>
>>>>>> Best regards,
>>>>>> Vladimir Ivanov
>>>>>>
>>>>>> On 2/15/18 8:11 PM, Adam Pocock wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've been working on more involved machine learning demos on top
>>>>>>> of the vector API. As part of that I built a binary search demo
>>>>>>> that searches n vectors at the same time using an n-wide SIMD.
>>>>>>> This keeps rebracketing the Mask from Integer to Float and back
>>>>>>> again as it searches through the arrays.
>>>>>>>
>>>>>>> Paul Sandoz has been helping me debug it, and when this code is
>>>>>>> run using C1 or lower it executes fine, but when it's recompiled
>>>>>>> with C2 (triggered by executing binarySearchCDF in a loop with
>>>>>>> the same arguments and a print statement, takes about 300-500
>>>>>>> iterations) it throws a ClassCastException (stack trace below).
>>>>>>> Turning off C2 with "-XX:TieredStopAtLevel=3" allows the loop to
>>>>>>> complete.
>>>>>>>
>>>>>>> Code:
>>>>>>>
>>>>>>> public static <S extends Vector.Shape> IntVector<S> binarySearchCDF(
>>>>>>>         IntSpecies<S> spec, float[][] input, int fromIndex, int toIndex, FloatVector<S> key) {
>>>>>>>     IntVector<S> low = spec.broadcast(fromIndex);
>>>>>>>     IntVector<S> high = spec.broadcast(toIndex - 1);
>>>>>>>     IntVector<S> one = spec.broadcast(1);
>>>>>>>
>>>>>>>     Mask<Float,S> mask = key.species().trueMask();
>>>>>>>
>>>>>>>     int[] indicesBuffer = new int[key.length()];
>>>>>>>     float[] valuesBuffer = new float[key.length()];
>>>>>>>
>>>>>>>     while (mask.anyTrue()) {
>>>>>>>         IntVector<S> mid = low.add(high,mask.rebracket(Integer.class)).shiftR(1);
>>>>>>>         mid.intoArray(indicesBuffer,0);
>>>>>>>         for (int i = 0; i < valuesBuffer.length; i++) {
>>>>>>>             valuesBuffer[i] = input[i][indicesBuffer[i]];
>>>>>>>         }
>>>>>>>         FloatVector<S> values = key.species().fromArray(valuesBuffer,0);
>>>>>>>
>>>>>>>         Mask<Integer,S> lessThanKey = values.lessThan(key).and(mask).rebracket(Integer.class);
>>>>>>>         low = low.blend(mid.add(one),lessThanKey);
>>>>>>>         Mask<Integer,S> greaterThanKey = values.greaterThan(key).and(mask).rebracket(Integer.class);
>>>>>>>         high = high.blend(mid.sub(one),greaterThanKey);
>>>>>>>         Mask<Integer,S> equalsKey = values.equal(key).and(mask).rebracket(Integer.class);
>>>>>>>         low = low.blend(mid,equalsKey);
>>>>>>>         mask = mask.and(equalsKey.rebracket(Float.class).not());
>>>>>>>         mask = mask.and(low.lessThan(high).rebracket(Float.class));
>>>>>>>     }
>>>>>>>
>>>>>>>     return low;
>>>>>>> }
>>>>>>>
>>>>>>> Stack trace:
>>>>>>>
>>>>>>> Caused by: java.lang.ClassCastException: jdk.incubator.vector/jdk.incubator.vector.Int256Vector$Int256Mask cannot be cast to jdk.incubator.vector/jdk.incubator.vector.Float256Vector$Float256Mask
>>>>>>>     at jdk.incubator.vector/jdk.incubator.vector.Float256Vector$Float256Mask.and(Float256Vector.java:488)
>>>>>>>     at jdk.incubator.vector/jdk.incubator.vector.Float256Vector$Float256Mask.and(Float256Vector.java:430)
>>>>>>>     at mlrg.topicmodel/com.oracle.labs.mlrg.topicmodel.util.vector.BinarySearch.binarySearchCDF(BinarySearch.java:37)
>>>>>>>
>>>>>>> Line 37 is:
>>>>>>>     mask = mask.and(equalsKey.rebracket(Float.class).not());
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Adam
>>>>>>>
>>>>>
>>>
>
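
For correctness testing of the quoted binarySearchCDF without involving C2 or the vector intrinsics, a scalar baseline may help. The sketch below is a restatement in plain Java under stated assumptions: the class ScalarBinarySearch and its methods searchLane and search are hypothetical names, shiftR(1) is assumed to be a logical shift, and each lane is searched independently. The loop deliberately keeps the original update order and termination test (low < high), so it is meant to be compared lane by lane with the vector version rather than with java.util.Arrays.binarySearch.

```java
// Hypothetical scalar baseline for the vectorized binarySearchCDF; plain Java only.
final class ScalarBinarySearch {
    // Searches row[fromIndex, toIndex) for key, mirroring one lane of the vector loop.
    // Like the original, it assumes a non-empty range and a non-NaN key.
    static int searchLane(float[] row, int fromIndex, int toIndex, float key) {
        int low = fromIndex;
        int high = toIndex - 1;
        boolean active = true;             // one lane of the mask
        while (active) {
            int mid = (low + high) >>> 1;  // shiftR(1), assumed logical; same for indices >= 0
            float value = row[mid];
            if (value < key) {
                low = mid + 1;
            } else if (value > key) {
                high = mid - 1;
            } else if (value == key) {
                low = mid;
                active = false;            // mask.and(equalsKey.not())
            }
            active = active && low < high; // mask.and(low.lessThan(high))
        }
        return low;
    }

    // One independent search per lane: lane i searches input[i] for keys[i].
    static int[] search(float[][] input, int fromIndex, int toIndex, float[] keys) {
        int[] result = new int[keys.length];
        for (int i = 0; i < keys.length; i++) {
            result[i] = searchLane(input[i], fromIndex, toIndex, keys[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        float[] cdf = {0.1f, 0.3f, 0.6f, 1.0f};
        float[][] input = {cdf, cdf, cdf};
        float[] keys = {0.5f, 0.3f, 0.9f};
        System.out.println(java.util.Arrays.toString(search(input, 0, 4, keys)));
    }
}
```

Running the same inputs through the vector version under -XX:TieredStopAtLevel=3 and under C2, and comparing both against this baseline, would separate algorithmic differences from compiler-induced ones.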