That seems likely; we are aiming in both of those directions,
to support direct programming with AVX-type registers (not
just x86-specific ones, by the way) and to support by-value
return of structured objects in general (starting with "minimal value types").

For now, though, every method is limited to at most 64 bits
of return value "payload", which means that 128-bit operations
need to be split into two method calls, or else buffer their
result into a temp object (e.g., array).  The JIT knows how
to combine two intrinsic calls into a single machine operation,
in some very limited circumstances, notably the classic
"div/rem" instructions.  This technique would probably also work
for 64x64-to-128-bit multiplies.  (Also AES-128, by the way.)
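
To make the two workarounds concrete, here is a rough sketch of what
they might look like for a signed 64x64-to-128-bit multiply.  The class
and method names (WideMultiply, low, high, full) are made up for
illustration.  The sketch assumes Math.multiplyHigh is available (it is
in Java 9 and later), and it does not assume the JIT actually fuses the
two calls into one machine instruction.

```java
// Illustrative sketch only; WideMultiply and its methods are hypothetical names.
final class WideMultiply {
    // Low 64 bits of the 128-bit product: ordinary long multiplication
    // wraps around, which leaves exactly the low half.
    static long low(long a, long b) {
        return a * b;
    }

    // High 64 bits of the signed 128-bit product.
    // Assumes Java 9 or later, where Math.multiplyHigh exists.
    static long high(long a, long b) {
        return Math.multiplyHigh(a, b);
    }

    // The alternative workaround: buffer both halves in a temp array,
    // as { high, low }, at the cost of an allocation unless the JIT
    // manages to eliminate it.
    static long[] full(long a, long b) {
        return new long[] { high(a, b), low(a, b) };
    }

    public static void main(String[] args) {
        long a = 1L << 62;
        long b = 4L;
        // Two separate calls, each returning at most 64 bits of payload.
        System.out.println("high = " + high(a, b) + ", low = " + low(a, b));
    }
}
```

The split form keeps each return value within 64 bits; the array form
returns everything at once but goes through the heap.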

