Intrinsics
Gilles Duboscq
gilles.m.duboscq at oracle.com
Tue Sep 18 15:24:43 UTC 2018
Hi Martin,
One way to do that is to combine both solutions: use "projection" nodes to model the "multiple outputs" part while still emitting the whole code sequence at once.
In the graph you have
```
add128Node = Add128Node(long low1, long high1, long low2, long high2)
result[0] = Add128LowNode(add128Node) // this is a projection of Add128Node
result[1] = Add128HighNode(add128Node) // this is a projection of Add128Node
return = Add128CarryNode(add128Node) // this is a projection of Add128Node
```
In `Add128Node.generate`, you will need to generate a LIR Op that has 3 results:
```
class Add128Op extends AMD64LIRInstruction {
    @Use({REG, STACK}) protected AllocatableValue low1; // TODO might need HINTs
    @Use({REG, STACK}) protected AllocatableValue low2;
    @Use({REG, STACK}) protected AllocatableValue high1;
    @Use({REG, STACK}) protected AllocatableValue high2;
    @Def({REG}) protected AllocatableValue lowResult;
    @Def({REG}) protected AllocatableValue highResult;
    @Def({REG}) protected AllocatableValue carryResult;
    ...
    void emitCode(CompilationResultBuilder crb, AMD64MacroAssembler masm) {
        // see AMD64Binary.CommutativeTwoOp#emitCode
        AllocatableValue lowInput;
        if (sameRegister(lowResult, low2)) {
            lowInput = low1;
        } else {
            AMD64Move.move(crb, masm, lowResult, low1);
            lowInput = low2;
        }
        // TODO deal with stack vs reg etc.
        masm.add(asRegister(lowResult), asRegister(lowInput));
        // TODO set up highInput (not defined in this sketch), stack vs reg etc.
        masm.adc(highResult, highInput);
        AMD64ControlFlow.cmove(crb, masm, carryResult, false, ConditionFlag.CarrySet, false,
            new ConstantValue(toRegisterKind(AMD64Kind.BYTE), JavaConstant.forBoolean(true)),
            new ConstantValue(toRegisterKind(AMD64Kind.BYTE), JavaConstant.forBoolean(false)));
    }
}
```
During `Add128Node.generate`, remember the values you used for `lowResult`, `highResult`, and `carryResult`:
```
AllocatableValue low1Value = tool.operand(low1);
...
this.lowResultValue = tool.getLIRGeneratorTool().newVariable(LIRKind.value(AMD64Kind.QWORD));
...
tool.setResult(this, tool.getLIRGeneratorTool().append(new Add128Op(
    low1Value, low2Value, high1Value, high2Value,
    lowResultValue, highResultValue, carryResultValue)));
```
In `Add128LowNode.generate`, just do: `tool.setResult(this, getAdd128Node().getLowResultValue());`
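To check the intrinsic against expected results, a plain-Java version of the method described in the original question could look roughly like the sketch below. It assumes that "overflows" means an unsigned carry out of the high word, matching the `carry == 1` pseudocode; the class name `Add128Reference` is made up for illustration and is not part of Graal.

```java
// Hypothetical reference implementation, not Graal code.
final class Add128Reference {
    private Add128Reference() {
    }

    /**
     * Adds two 128-bit values given as (low, high) pairs of longs.
     * Stores the low word in result[0] and the high word in result[1].
     * Returns true if there is an unsigned carry out of the high word (assumed meaning of "overflow").
     */
    static boolean add128(long low1, long high1, long low2, long high2, long[] result) {
        long low = low1 + low2;
        // The unsigned low sum wrapped around iff it is smaller than one of its operands.
        long carryIntoHigh = Long.compareUnsigned(low, low1) < 0 ? 1L : 0L;

        long highPartial = high1 + high2;
        boolean carryFromHighSum = Long.compareUnsigned(highPartial, high1) < 0;

        long high = highPartial + carryIntoHigh;
        // Adding the incoming carry can itself wrap (only when highPartial is all ones).
        boolean carryFromIncrement = Long.compareUnsigned(high, highPartial) < 0;

        result[0] = low;
        result[1] = high;
        return carryFromHighSum || carryFromIncrement;
    }
}
```

Comparing the three projections of the intrinsic against this method on a few edge cases (all-ones low word, all-ones high word) should catch mistakes in the `adc`/carry handling.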
I hope that helps.
Gilles
On 14/09/18 20:17, Martin Traverso wrote:
> Hi,
>
> I'm playing around with Graal, and as an experiment, I'm trying to see what
> it would take to intrinsify some operations to do math on 128-bit values.
>
> I have a method with the following signature:
>
> boolean add128(long low1, long high1, long low2, long high2, long[] result)
>
> It computes the sum of two 128-bit integers encoded in two longs each and
> stores the result in the 2-element array that's provided via the last
> argument. It returns true if the sum overflows.
>
> I'd like to emit the equivalent of the following assembly pseudocode:
>
> result[0] = ADD low1 low2
> result[1] = ADC high1 high2
> return = (carry == 1)
>
> From what I gathered so far, I should add a new node (e.g., Add128Node) and
> register a graph builder plugin that swaps invocations to that
> method with the new node.
>
> But that's where I'm getting stuck. Two paths I've started exploring:
> 1. Lower the Add128Node into operations that perform the sums of the high
> vs low parts (e.g., Add128LowNode, Add128HighNode), do the assignments to
> the resulting array, etc. This would seem to require modeling operations
> that produce multiple outputs (low + low produces one value + carry). Is
> this even possible?
> 2. Make Add128Node LIRLowerable and generate the whole sequence of
> low-level operations in one shot. I'm not sure how the assignments to the
> output array and return value would fit here, though.
>
> I'm sure I'm missing something obvious, so I appreciate any pointers or
> suggestions. Are there similar examples I can draw inspiration from?
>
> Thanks,
> Martin
>