[aarch64-port-dev ] RFR: AARCH64: Changes to HotSpot shared code

Fri Nov 14 06:15:25 UTC 2014

On 11/13/2014 1:31 AM, Andrew Haley wrote:
> On 12/11/14 20:23, Dean Long wrote:
>> On 11/11/2014 11:02 AM, Andrew Haley wrote:
>>> http://cr.openjdk.java.net/~aph/aarch64-JDK-8064611/hotspot.patch
>>>
>>> Everything except cpu/ and os_cpu/.
>>>
>>> Most of this is obvious and trivial, with a few exceptions.
>>>
>>> In memory/metaspace.cpp, we allocated the memory for metadata in a
>>> different way.  This is because we want to be able to decode and
>>> encode compressed metadata pointers with a single instruction, and we
>>> can always do that iff the base address is of a particular form.
>>>
>>> In opto/, we have made some changes in order to be able to use AArch64
>>> store release instructions for volatile field stores.  These don't
>>> require leading or trailing barriers.  We have tried several times to
>>> do this without changing shared code, but it is impossible with the
>>> current back-end interface.
>> Is this something ppc64 can also take advantage of?  I hope Vladimir can
>> suggest a more flexible way to do this, perhaps with a runtime flag.
> Perhaps so, but as far as I'm aware AArch64 is the only CPU with
> exactly these semantics.  From my point of view, it would be ideal if
> we simply emitted volatile store and volatile load as nodes and let
> the back end handle them.  But if we do that we lose the opportunity
> to coalesce barriers in C2 optimization.  Hmmm....  :-)
>
>>> In several places a release store is used where the AArch64 memory
>>> model makes it unnecessary.  From earlier emails on this list we
>>> discovered that the only architecture which requires this release
>>> store is IA64, and OpenJDK does not support it anyway.  We should
>>> perhaps look at re-engineering the way that memory barriers and memory
>>> accesses are handled in HotSpot with a view to pushing all these
>>> architecture-dependent assumptions out to the back ends.
>> I agree.  More comments below.
>>> Andrew.
>> c1_Canonicalizer.cpp
>>       Can this be handled in the back-end?  I imagine other platforms,
>> such as x86, have similar limitations.
> It certainly could be.  Maybe pd_valid_shift_count() ?  But I'm striving
> not to touch any other ports.
What I'm actually wondering is what happens if you remove the AARCH64
log2_scale check altogether.  As far as I can tell, it isn't needed,
because in do_UnsafeGetRaw the scale is only used directly in the
LIR_Address for X86 and ARM.  For other platforms, we do:

       LIR_Opr tmp = new_pointer_register();
       __ shift_left(index_op, log2_scale, tmp);
       addr = new LIR_Address(base_op, tmp, dst_type);

so you don't have to worry about a mis-scaled load on AARCH64.
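
To illustrate why the explicit shift makes the scale check unnecessary,
here is a minimal standalone sketch (plain C++, not HotSpot code; the
function names are hypothetical) showing that shifting the index into a
temporary and then forming an unscaled base+index address gives the same
result as a scaled addressing mode would:

```cpp
#include <cassert>
#include <cstdint>

// What a scaled addressing mode computes in one step:
// base + (index << log2_scale).
inline uintptr_t scaled_address(uintptr_t base, uintptr_t index, int log2_scale) {
  assert(log2_scale >= 0 && log2_scale <= 3);
  return base + (index << log2_scale);
}

// What the generic path computes: shift the index into a temporary first,
// then form a plain base + tmp address with no scale on the memory operand.
inline uintptr_t shifted_then_unscaled(uintptr_t base, uintptr_t index, int log2_scale) {
  assert(log2_scale >= 0 && log2_scale <= 3);
  uintptr_t tmp = index << log2_scale;  // corresponds to the shift_left above
  return base + tmp;                    // corresponds to LIR_Address(base_op, tmp, ...)
}
```

Since the memory operand never carries a scale on this path, the width of
the access and the scale cannot disagree.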

>> c1_LIR.cpp
>>       It looks like you need a temp for convert because your backend
>> is checking the FPSR.
>>       What happens if you ignore the FPSR, do you get a wrong result?
> I've looked for a while, and I'm sorry but I don't understand which
> hunk this refers to.

This one, for lir_convert:

#if defined(PPC) || defined(TARGET_ARCH_aarch64)
       if (opConvert->_tmp1->is_valid()) do_temp(opConvert->_tmp1);
       if (opConvert->_tmp2->is_valid()) do_temp(opConvert->_tmp2);
#endif

I'm wondering if, for example, d2l in the back-end needs to check FPSR.IOC?
If you get the correct result even if FPSR.IOC is set, then you should
be able to simply ignore FPSR.IOC.
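
For reference, the result that d2l has to produce under the Java rules can
be sketched in portable C++ as below.  This is only an illustration
(java_d2l is a hypothetical name, not a HotSpot function).  My
understanding is that the AArch64 FCVTZS instruction already yields these
values, returning 0 for NaN and saturating on overflow while setting the
invalid-operation (IOC) flag as a side effect, but that is an assumption
worth checking against the architecture manual:

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Java-style double-to-long conversion, written portably.
// NaN maps to 0; out-of-range values saturate to the nearest long.
inline int64_t java_d2l(double d) {
  if (std::isnan(d)) return 0;
  // 2^63 is exactly representable as a double; anything >= it saturates.
  const double two63 = 9223372036854775808.0;
  if (d >= two63)  return std::numeric_limits<int64_t>::max();
  if (d <= -two63) return std::numeric_limits<int64_t>::min();
  return static_cast<int64_t>(d);  // in range: truncates toward zero
}
```

If the hardware conversion matches this for every input, the flag carries
no extra information and the temps would not be needed.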

dl

>> c1_LinearScan.cpp
>>       I'm not familiar with what the changed code is doing.  Can you
>> explain why it applies to x86 and aarch64?
> It certainly was at the time.  I'll investigate to see if this is
> still needed.
>
>> c1_Runtime1.cpp
>>       This will break our closed port that uses NOP instructions for
>>       patching.
> Ah, interesting.  I spent quite a lot of time kicking around ideas for
> C1 patching, but (to my surprise) deoptimizing instead didn't seem to
> have significant adverse effect.
>
>>       How about moving your deopt-instead-of-patch support
>>       into Runtime1::patch_code() and enable it with a read-only
>>       platform-specific developer runtime flag
>>       (see INTPRESSURE for example)?
> Okay.  I'll have a look at that.
>
>> compiledIC.hpp
>>       You should be able to use set_inst_mark()/cbuf.insts_mark() to set
>> and retrieve the mark address.
> Okay.
>
>> arguments.cpp
>>       I wish there was a way to fix ReservedCodeCacheSize in the back-end.
> Indeed.
>
> Thanks,
> Andrew.