RFR: aarch32: softfp patch for template interpreter

Wed Oct 19 15:50:20 UTC 2016

Dear Yangyongyong, Ed, All,

(Feeling myself in a position of Bob V) would like to give you a small heads up on a fact that Azul’s almost finished our version of soft-float support. We’ve quickly taken a look at the changes made by Huawei and found that we’ve taken a bit different approach. Huawei guys were quicker but at the same time we have something to offer beyond the current state:
- soft-float support in c1
- VFP support in the binary compiled for soft-float. This is different to softfp: the binary should still be compiled with -mfloat-abi=soft, however if the VFP is present on the target system it’s used by both interpreter and the compiler
We’re finishing the test runs and should finalize the changes in the next very few days. Hope you’ll find them useful

Thanks,
Andrey

> On 19 Oct 2016, at 14:41, Edward Nevill <edward.nevill at gmail.com> wrote:
> 
> Hi yangyongyong,
> 
> On Mon, 2016-10-17 at 12:53 +0000, yangyongyong wrote:
>>  
>> Attached is a softfp patch for template interpreter. Two pairs of macros are defined in globalDefinitions_aarch32.hpp to isolate softfp code: HARD_FLOAT_CC/ SOFT_FLOAT_CC and HARD_FLOAT/ SOFT_FLOAT.
> 
> Attachments are stripped when sent to the OpenJDK mailing lists. If you want to post patches you should put them on a web site and post a link.
> 
> I have put your patch at
> 
> http://cr.openjdk.java.net/~enevill/softfp/161014-softfp.hg.diff
> 
> so others can review it.
> 
> I have looked through the patch and have a few comments.
> 
>> HARD_FLOAT_CC/SOFT_FLOAT_CC defines C calling convention. HARD_FLOAT are used to indicate that a platform can take advantage of hardfp instructions regardless of its calling convention, while SOFT_FLOAT means hardfp instructions are not supported or are disallowed to use.
>>  
>> Only three combinations are possible configurations for ARM platforms:
>> 1. HARD_FLOAT/ HARD_FLOAT_CC are defined when __ARM_PCS_VFP is defined.
>> 2. HARD_FLOAT/ SOFT_FLOAT_CC are defined when gcc runs with "-mfloat-abi=softfp" option.
>> 3. SOFT_FLOAT/ SOFT_FLOAT_CC when gcc runs with "-mfloat-abi=soft" option.
> 
> Some instructions on how to build the different variants would be useful.
> 
> Should I define -mfloat-abi=softfp or -mfloat-abi=soft in CFLAGS, or CXXFLAGS or should I use the --with-extra-cflags=XXX, or --with-extra-cxxflags=XXX. Should I build with --with-jvm-variant=core, or will it work if I build with --with-jvm-variant=client and then use -Xint to run only the template interpreter.
> 
> I can probably work this out myself, as can most readers here, but why should I? If you want people to build/test/review your patch give clear instructions on how to build with it.
> 
> Some comments on the patch itself
> 
> @@ -1733,6 +1742,18 @@
>    if(imm_instr(decode, Rd, Rn, imm, cond, s)) return;
>    if(imm_instr(cpart, Rd, Rn, -imm, cond, s)) return;
> 
> +  if (Rd == Rn) {
> +    // TODO: may use 4 adds here, need optimization.
> +    int i;
> +    for (i = 0;i < 4;i++) {
> +      uint32_t imm8 = (imm & (0xFF << (i * 8)));
> +      if (imm8 == 0)
> +        continue;
> +      bool ret = imm_instr(decode, Rd, Rd, imm8, cond, s);
> +      assert(ret, "should be");
> +    }
> +    return;
> +  }
> 
> 
> This is patching the following in assembler_aarch32.cpp
> 
> void Assembler::add_sub_imm(int decode, Register Rd, Register Rn, int imm,
>                             Condition cond, bool s) {
> 
> I assume you have added this code because in places you do call add_sub_imm with immediates which cannot be handled in a single instruction and with Rd==Rn.
> 
> However, this is not correct if the S bit is set. Eg.
> 
>  SUBS	Rd, Rd, #0xcafebabe
> 
> does not set the flags in the same way as
> 
>  SUBS	Rd, Rd, #0xbe
>  SUBS  Rd, Rd, #0xba00
>  SUBS	Rd, Rd, #0xfe0000
>  SUBS	Rd, Rd, #0xca000000
> 
> At the very least this needs an assert that you are not trying to set the flags (ie. s==false). However, I think it would be prefereable for the caller to pass in a temp which can be used in this case as it will generate better code using movw/movt/add. The temp can be option and set to noreg by default.
> 
> 
>    //TODO Misc instructions
> +  void clz(Register Rd, Register Rm, Condition cond = C_DFLT) {
> +    starti;
> +    f(cond, 31, 28), f(0b000101101111, 27, 16);
> +    rf(Rd, 12), f(0b11110001, 11, 4);
> +    rf(Rm, 0);
> +  }
> +
> 
> I can't find any usage of this clz method. Is there meant to be any?
> 
> +void swap_register(Register r1, Register r2) {
> +  eor(r1, r1, r2);
> +  eor(r2, r1, r2);
> +  eor(r1, r1, r2);
> +}
> 
> I dislike this use of eor to swap regs. It is possible to use a temp. mov tmp, r1; mov r1, r2; mov r2, tmp is clearer and may also be more efficient on some implementations.
> 
> 
> I am in the process of building and testing this which will take a few days as there are quite a few combinations (hardfp/softfp, core/client, release/debug) and I want to be sure that all combinations still work and nothing has been broken.
> 
> It would be really helpful if someone else could help review this patch as well as it is quite a big patch.
> 
> All the best,
> Ed.
>