[code-reflection] RFR: Model verifier [v2]

Fri Oct 11 17:39:39 UTC 2024

On Fri, 11 Oct 2024 07:10:14 GMT, Adam Sotona <asotona at openjdk.org> wrote:

>> src/java.base/share/classes/java/lang/reflect/code/interpreter/Interpreter.java line 424:
>> 
>>> 422:             MethodHandle mh = resolveToMethodHandle(il, co.invokeDescriptor(), co.invokeKind());
>>> 423:             Object[] values = o.operands().stream().map(oc::getValue).toArray();
>>> 424:             target = eraseInts(mh.type(), target, values);
>> 
>> Can you provide an example of why this is required? I think i can guess based on changes to the lifter. Can this impact other areas too like field and array access?
>> 
>> I think in general we have to be careful here. The properties of lifted models are having a broader impact. Arguably, this could be a code model transformation instead.
>> 
>> Terminology-wise "erase" is reserved for references. IIUC correctly we are operating on values of basic primitive types and i believe converting to them non-basic types based on corresponding method parameter types.
>
> Example is provided by the test: https://github.com/openjdk/babylon/blob/104ef7dda29d8f580b6f3f330436365764d0b3c7/test/jdk/java/lang/reflect/code/bytecode/TestLiftCustomBytecode.java#L74
> 
> Yes, it can impact also other areas and more similar Interpreter corrections might be necessary.
> 
> I'm aware of the impact of "erasing" int sub-types. We have to make a decision if all int sub-types are freely assignable (according to JVMS) or their assignability is strictly specified (as in JLS). This PR is a step toward the first option, where Interpreter should not fail on an attempt to call a method with short parameter when passing a boolean as an argument.
> 
> A transformation inserting explicit type conversions is the option if we decide to go the assignability-strict way. 
> 
> Exactly the same issue is with objects assignability. Should we allow to pass an argument declared as of  j.l.Object type to a method with other specific type parameter or should we inject explicit casts?

The interpreter is indeed "relaxed" about references and boxing conversions, opting to use Method/VarHandle.invoke rather than invokeExact etc, guarding at runtime with casts and boxing/unboxing conversions.

This does mean there is some possible ambiguity when building code models explicitly and invoking say method `m(int )` when the method's argument's type is unintentionally declared to be symbolically  an `Integer`. We don't differentiate because all runtime values are boxed and it is simpler not to track in more detail.

e.g.,

        CoreOp.FuncOp f = CoreOp.func("f", FunctionType.functionType(JavaType.VOID, JavaType.J_L_INTEGER))
                .body(b -> {
                    Block.Parameter i = b.parameters().get(0);
                    b.op(CoreOp.invoke(MethodRef.method(T.class, "m", void.class, int.class), i));
                    b.op(CoreOp._return());
                });
        System.out.println(f.toText());

        // Works, the Integer parameter is unboxed to an int argument
        Interpreter.invoke(MethodHandles.lookup(),
                f, 1);

Change to:

        CoreOp.FuncOp f = CoreOp.func("f", FunctionType.functionType(JavaType.VOID, JavaType.J_L_STRING))
                .body(b -> {
                    Block.Parameter i = b.parameters().get(0);
                    b.op(CoreOp.invoke(MethodRef.method(T.class, "m", void.class, int.class), i));
                    b.op(CoreOp._return());
                });
        System.out.println(f.toText());

And the interpreter will rightly fail in MethodHandle.asType conversion when interpreting the invoke operation.

We could make the interpreter perform stronger runtime checks according to the symbolic types thereby identify issues closer to the problem. Although perhaps that is a validation step? So far though I like the simplicity and using reflection to catch the issues.

However, I think basic narrowing primitive conversions (int to short etc) are a little different because they can in general be lossy and there is no failure, by design, in such cases. This strongly suggests to me that such implicit and lossy primitive conversions should be something that is not the default behaviour and we should opt in. Then that behaviour can be used for models lifted from bytecode that operate on basic primitive types.

In effect what we have now is this kind of transformation pipeline

  Java source -> Model-high -> Model-low -> bytecode -> Model-low-basic

Then ideally we round trip on:

  bytecode -> Model-low-basic -> bytecode -> Model-low-basic

And ideally we could also support reverse engineering to sharper types

  Model-low-basic -> Model-low

And we could of course support

  Model-low -> Model-low-basic

But what i am wary of right now is making Model-low become Model-low-basic or some blurring of two.

-------------

PR Review Comment: https://git.openjdk.org/babylon/pull/247#discussion_r1797253360