Possible bug: compiler generating `ldc` instead of `sipush` and `iconst_m1` for characters in range [32768; 65535]

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Fri Aug 9 10:20:32 UTC 2019


Actually, thinking more about this...

I'm not convinced it is a bug.

Consider the following simple example:

class Foo {
    public static int m(char c) {
       return c;
    }

    public static void main(String[] args) {
       System.out.println(m((char)-1));
    }
}

This currently prints "65535". The bytecode for the invocation in 'main' 
is as follows:

3: ldc           #3                  // int 65535
5: invokestatic  #4                  // Method m:(C)I

Now, if we were to replace the "ldc 65535" with a iconst_m1, the net 
effect would be to load a 32-bit wide -1 constant. So this example would 
end up printing -1, that is, no overflow/truncation behavior would be 
exposed. So, I think that the reason why 'less optimal' opcodes are used 
is precisely that: to respect the semantics of truncation vs. sign 
extension.

sipush has a similar issue - as the JVMS says:

"The immediate unsigned /byte1/ and /byte2/ values are assembled into an 
intermediate |short|, where the value of the |short| is (/byte1/ |<<| 8) 
| /byte2/. *The intermediate value is then sign-extended to an 
**|int|****/value/*. That /value/ is pushed onto the operand stack."

So again, you have a sign extension, which will not respect the 
truncation semantics of the language. The instruction bipush is similar.

So, bipush, sipush can only be safely use to handle chars if they handle 
positive values - if they are used with negative values, the sign 
extension embedded in the opcode would lead to the same results as using 
iconst_m1.

The only way then to use these bytecodes in the "right" way would be to 
immediately follow the load instruction with a i2c instruction. So it's 
a trade off between number of instructions and constant pool pressure.

In conlusion, I don't think anything needs to be changed here.

Maurizio

On 09/08/2019 00:52, Maurizio Cimadamore wrote:
>
> Yes, there seems to be some suboptimality here - which I have also 
> been able to reproduce with other compilers (namely ecj).
>
> I guess the code didn't check for all the possible optimization 
> opportunities - yes, values from 128 to 255 can still be encoded in a 
> (negative) byte and values from 32768 to 65535 can be encoded in a 
> (negative) short, so no ldc should be needed, at least in principle.
>
> In terms of perfomances though - this is hardly going to make any 
> difference, as the JIT will easily see through the redundancy. An 
> argument can be made though that we are wasting valuable constant pool 
> estate when we can avoid it, so, IMHO that would be the main reason to 
> fix something like this.
>
> Cheers
> Maurizio
>
> On 08/08/2019 18:16, JARvis PROgrammer wrote:
>> Hi there, it seems that the javac compiler generates non-optimal 
>> bytecode for pushing values of (compile-time) type `char`.
>> In order to test it I've first written a class which has a method 
>> accepting `char`s which writes them to a file (so that results can be 
>> easily compared):
>> private static void consumeChar(final char character)throws IOException {
>>      try (final BufferedWriter writer =new BufferedWriter(new FileWriter(new File("Javac.txt"), true))) {
>>          writer.write(Character.toString(character)); }
>> }
>> And then added code to `main` which simply invokes this method for 
>> all corner cases and nearby char values {(char) 0, (char) 1, (char) 
>> 2, (char) 3, (char) 4, (char) 5, (char) 6, (char) 7, (char) 126, 
>> (char) 127, (char) 128, (char) 129, (char) 32766, (char) 32767, 
>> (char) 32768, (char) 32769, (char) 65534, (char) 65535}. After this 
>> the class was compiled using `javac` (both tested on release 11 and 
>> early-access 13).
>> The resulting bytecode uses /iconst#/ for range [(char) 0; (char) 5], 
>> /bipush/ for range from [(char) 6; (char) 127], /sipush/ for range 
>> [(char) 128; (char) 32767] and (strangely) /ldc/ for range [(char) 
>> 32768; (char) 65535].
>> In order to check if the strange behavour is not the only way to 
>> achieve pushing "big" `char` values onto the stack, I've recreated 
>> the similar class using ObjectWeb ASM using /iconst_m1/ instruction 
>> for pusing ((char) 65535) and /sipush/ (with negative values) for 
>> pushing values of range [(char) 32768; (char) 65534].
>> The following function describes this behaviour:
>> private static void pushChar(final MethodVisitor method, final int charCode) {
>>      switch (charCode) {
>>          case 65535: {
>>              method.visitInsn(ICONST_M1);break;}
>>          case 1: {
>>              method.visitInsn(ICONST_1);break;}
>>          case 2: {
>>              method.visitInsn(ICONST_2);break;}
>>          case 3: {
>>              method.visitInsn(ICONST_3);break;}
>>          case 4: {
>>              method.visitInsn(ICONST_4);break;}
>>          case 5: {
>>              method.visitInsn(ICONST_5);break;}default: {
>>              if (charCode <= Byte.MAX_VALUE) method.visitIntInsn(BIPUSH, charCode
>>              else method.visitIntInsn(SIPUSH, (short) charCode);}
>>      }
>> }
>> (PS, in theory, /bipush/ might also be aplicable for some values 
>> where /sipush/ is used, but I didn't test it)
>> The generated class once run via java (release 11 and early-access 13 
>> were used) generated the file with content equal to the one of the 
>> javac-compiled class.
>>
>> And so, it seems to be a bug that `javac` uses /ldc/ instruction 
>> instead of more primitive ones while those may handle all range of 
>> characters.
>>
>> All files used for test purposes (including javap call results) are 
>> attached to this message:
>> JavacMain.java - source code of class compiled via javac
>> JavacMain.class - compiled JavacMain.java (javac -g:none 
>> --enable-preview)
>> JavacMain.javap - disassembled JavacMain.class (javac  -v -p -s -c)
>> Javac.txt - file written by running JavacMain.class
>>
>> AsmMain_Generator.java - source code of class generating AsmMain.class
>> AsmMain.class - generated replica of JavacMain.class using 
>> /iconst_m1/ instruction for pusing ((char) 65535) and /sipush/ (with 
>> negative values) for pushing values of range [(char) 32768; (char) 65534]
>> Asm  Main.javap - disassembled  AsmMain.class (javac  -v -p -s -c)
>> Asm.txt - file written by running AsmMain.class
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/compiler-dev/attachments/20190809/3fea3edb/attachment-0001.html>


More information about the compiler-dev mailing list