Possible bug: compiler generating `ldc` instead of `sipush` and `iconst_m1` for characters in range [32768; 65535]

JARvis PROgrammer mrjarviscraft at gmail.com
Fri Aug 9 11:13:06 UTC 2019


Yes, have had it tested now with int casting and it does really produce
wrong results.
Thanks again for participation

пт, 9 авг. 2019 г., 13:20 Maurizio Cimadamore <
maurizio.cimadamore at oracle.com>:

> Actually, thinking more about this...
>
> I'm not convinced it is a bug.
>
> Consider the following simple example:
>
> class Foo {
>    public static int m(char c) {
>       return c;
>    }
>
>    public static void main(String[] args) {
>       System.out.println(m((char)-1));
>    }
> }
>
> This currently prints "65535". The bytecode for the invocation in 'main'
> is as follows:
>
> 3: ldc           #3                  // int 65535
> 5: invokestatic  #4                  // Method m:(C)I
>
> Now, if we were to replace the "ldc 65535" with a iconst_m1, the net
> effect would be to load a 32-bit wide -1 constant. So this example would
> end up printing -1, that is, no overflow/truncation behavior would be
> exposed. So, I think that the reason why 'less optimal' opcodes are used is
> precisely that: to respect the semantics of truncation vs. sign extension.
>
> sipush has a similar issue - as the JVMS says:
>
> "The immediate unsigned *byte1* and *byte2* values are assembled into an
> intermediate short, where the value of the short is (*byte1* << 8) |
> *byte2*. *The intermediate value is then sign-extended to an **int*
> *value*. That *value* is pushed onto the operand stack."
>
> So again, you have a sign extension, which will not respect the truncation
> semantics of the language. The instruction bipush is similar.
>
> So, bipush, sipush can only be safely use to handle chars if they handle
> positive values - if they are used with negative values, the sign extension
> embedded in the opcode would lead to the same results as using iconst_m1.
>
> The only way then to use these bytecodes in the "right" way would be to
> immediately follow the load instruction with a i2c instruction. So it's a
> trade off between number of instructions and constant pool pressure.
>
> In conlusion, I don't think anything needs to be changed here.
>
> Maurizio
> On 09/08/2019 00:52, Maurizio Cimadamore wrote:
>
> Yes, there seems to be some suboptimality here - which I have also been
> able to reproduce with other compilers (namely ecj).
>
> I guess the code didn't check for all the possible optimization
> opportunities - yes, values from 128 to 255 can still be encoded in a
> (negative) byte and values from 32768 to 65535 can be encoded in a
> (negative) short, so no ldc should be needed, at least in principle.
>
> In terms of perfomances though - this is hardly going to make any
> difference, as the JIT will easily see through the redundancy. An argument
> can be made though that we are wasting valuable constant pool estate when
> we can avoid it, so, IMHO that would be the main reason to fix something
> like this.
>
> Cheers
> Maurizio
> On 08/08/2019 18:16, JARvis PROgrammer wrote:
>
> Hi there, it seems that the javac compiler generates non-optimal bytecode
> for pushing values of (compile-time) type `char`.
> In order to test it I've first written a class which has a method
> accepting `char`s which writes them to a file (so that results can be
> easily compared):
>
> private static void consumeChar(final char character) throws IOException {
>     try (final BufferedWriter writer = new BufferedWriter(new FileWriter(new File("Javac.txt"), true))) {
>         writer.write(Character.toString(character));    }
> }
>
> And then added code to `main` which simply invokes this method for all
> corner cases and nearby char values {(char) 0, (char) 1, (char) 2, (char)
> 3, (char) 4, (char) 5, (char) 6, (char) 7, (char) 126, (char) 127, (char)
> 128, (char) 129, (char) 32766, (char) 32767, (char) 32768, (char) 32769,
> (char) 65534, (char) 65535}. After this the class was compiled using
> `javac` (both tested on release 11 and early-access 13).
> The resulting bytecode uses *iconst#* for range [(char) 0; (char) 5],
> *bipush* for range from [(char) 6; (char) 127], *sipush* for range
> [(char) 128; (char) 32767] and (strangely) *ldc* for range [(char) 32768;
> (char) 65535].
> In order to check if the strange behavour is not the only way to achieve
> pushing "big" `char` values onto the stack, I've recreated the similar
> class using ObjectWeb ASM using *iconst_m1* instruction for pusing
> ((char) 65535) and *sipush* (with negative values) for pushing values of
> range [(char) 32768; (char) 65534].
> The following function describes this behaviour:
>
> private static void pushChar(final MethodVisitor method, final int charCode) {
>     switch (charCode) {
>         case 65535: {
>             method.visitInsn(ICONST_M1);            break;        }
>         case 1: {
>             method.visitInsn(ICONST_1);            break;        }
>         case 2: {
>             method.visitInsn(ICONST_2);            break;        }
>         case 3: {
>             method.visitInsn(ICONST_3);            break;        }
>         case 4: {
>             method.visitInsn(ICONST_4);            break;        }
>         case 5: {
>             method.visitInsn(ICONST_5);            break;        } default: {
>             if (charCode <= Byte.MAX_VALUE) method.visitIntInsn(BIPUSH, charCode
>             else method.visitIntInsn(SIPUSH, (short) charCode);        }
>     }
> }
>
> (PS, in theory, *bipush* might also be aplicable for some values where
> *sipush* is used, but I didn't test it)
> The generated class once run via java (release 11 and early-access 13 were
> used) generated the file with content equal to the one of the
> javac-compiled class.
>
> And so, it seems to be a bug that `javac` uses *ldc* instruction instead
> of more primitive ones while those may handle all range of characters.
>
> All files used for test purposes (including javap call results) are
> attached to this message:
> JavacMain.java - source code of class compiled via javac
> JavacMain.class - compiled JavacMain.java (javac -g:none --enable-preview)
> JavacMain.javap - disassembled JavacMain.class (javac  -v -p -s -c)
> Javac.txt - file written by running JavacMain.class
>
> AsmMain_Generator.java - source code of class generating AsmMain.class
> AsmMain.class - generated replica of JavacMain.class using *iconst_m1* instruction
> for pusing ((char) 65535) and *sipush* (with negative values) for pushing
> values of range [(char) 32768; (char) 65534]
> Asm  Main.javap - disassembled  AsmMain.class (javac  -v -p -s -c)
> Asm.txt - file written by running AsmMain.class
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/compiler-dev/attachments/20190809/ea125af0/attachment-0001.html>


More information about the compiler-dev mailing list