PROPOSAL: Binary Literals

Derek Foster vapor1 at teleport.com
Wed Mar 25 11:06:03 PDT 2009


Hmm. Second try at sending to the list. Let's see if this works. (In the
meantime, I noticed that Bruce Chapman has mentioned something similar in
another proposal of his, so I think we are in agreement on this. This proposal
should not be taken as competing with his similar proposal: I'd quite like
to see type suffixes for bytes, shorts, etc. added to Java, in addition to
binary literals.) Anyway...




Add binary literals to Java.

AUTHOR(S): Derek Foster

OVERVIEW

In some programming domains, use of binary numbers (typically as bitmasks,
bit-shifts, etc.) is very common. However, Java code, due to its C heritage,
has traditionally forced programmers to represent numbers in only decimal,
octal, or hexadecimal. (In practice, octal is rarely used, and is present
mostly for backwards compatibility with C.)

When the data being dealt with is fundamentally bit-oriented, however, using
hexadecimal to represent ranges of bits requires an extra degree of
translation for the programmer, and this can often become a source of errors.
For instance, if a technical specification lists specific values of interest
in binary (for example, in a compression encoding algorithm or in the
specifications for a network protocol, or for communicating with a bitmapped
hardware device) then a programmer coding to that specification must
translate each such value from its binary representation into hexadecimal.
Checking to see if this translation has been done correctly is accomplished
by back-translating the numbers. In most cases, programmers do these
translations in their heads, and HOPEFULLY get them right. However, errors
can easily creep in, and re-verifying the results is not straightforward
enough to be done frequently.

Furthermore, in many cases, the binary representation of a number makes it
much clearer what is actually intended than the hexadecimal one does. For
instance, this:

private static final int BITMASK = 0x1E;

does not immediately make it clear that the bitmask being declared comprises
a single contiguous range of four bits.

In many cases, it would be more natural for the programmer to be able to
write the numbers in binary in the source code, eliminating the need for
manual translation to hexadecimal entirely.
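
As a rough sketch of this point (it assumes a compiler that accepts the
proposed "0b" syntax; MaskDemo and isContiguous are illustrative names, not
part of the proposal), the two spellings below denote the same int, but only
the binary one shows the run of four set bits directly:

```java
// Illustrative only: MaskDemo and isContiguous are hypothetical names.
final class MaskDemo {
    static final int HEX_MASK = 0x1E;
    static final int BIN_MASK = 0b00011110; // same value, bits visible

    // True if the set bits of a nonzero mask form one contiguous run.
    static boolean isContiguous(int mask) {
        int shifted = mask >>> Integer.numberOfTrailingZeros(mask);
        return mask != 0 && (shifted & (shifted + 1)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(HEX_MASK == BIN_MASK);
        System.out.println(Integer.bitCount(BIN_MASK));
        System.out.println(isContiguous(BIN_MASK));
    }
}
```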


FEATURE SUMMARY:

In addition to the existing "1" (decimal), "01" (octal) and "0x1"
(hexadecimal) forms of specifying numeric literals, a new form "0b1" (binary)
would be added.

Note that this is the same syntax as has been used as an extension by the GCC
C/C++ compilers for many years, and also is used in the Ruby language, as
well as in the Python language.


MAJOR ADVANTAGE:

It is no longer necessary for programmers to translate binary numbers to and
from hexadecimal in order to use them in Java programs.


MAJOR BENEFIT:

Code using bitwise operations is more readable and easier to verify against
technical specifications that use binary numbers to specify constants.

Routines that are bit-oriented are easier to understand when an artificial
translation to hexadecimal is not required in order to fulfill the
constraints of the language.

MAJOR DISADVANTAGE:

Someone might incorrectly think that "0b1" represented the same value as
hexadecimal number "0xB1". However, note that this problem has existed for
octal/decimal for many years (confusion between "050" and "50") and does not
seem to be a major issue.
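
To make the possible confusion concrete, here is a small sketch (assuming
the proposed "0b" prefix is accepted; PrefixDemo is a made-up class name)
that places the look-alike literals side by side:

```java
// Illustrative only: PrefixDemo is a hypothetical name.
final class PrefixDemo {
    static final int BINARY_ONE = 0b1;  // binary: one
    static final int HEX_B1 = 0xB1;     // hexadecimal: 11 * 16 + 1
    static final int OCTAL_50 = 050;    // octal: 5 * 8
    static final int DECIMAL_50 = 50;   // decimal: fifty

    public static void main(String[] args) {
        System.out.println(BINARY_ONE + " vs " + HEX_B1);   // 1 vs 177
        System.out.println(OCTAL_50 + " vs " + DECIMAL_50); // 40 vs 50
    }
}
```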


ALTERNATIVES:

Users could continue to write the numbers as decimal, octal, or hexadecimal,
and would continue to have the problems observed in this document.

Another alternative would be for code to translate at runtime from binary
strings, such as:

   int BITMASK = Integer.parseInt("00001110", 2);

Besides the obvious extra verbosity, there are several problems with this:

* Calling a method such as Integer.parseInt at runtime will typically make it
impossible for the compiler to inline the value of this constant, since its
value has been taken from a runtime method call. Inlining is important,
because code that does bitwise parsing is often very low-level code in tight
loops that must execute quickly. (This is particularly the case for mobile
applications and other applications that run on severely resource-constrained
environments, which is one of the cases where binary numbers would be most
valuable, since talking to low-level hardware is one of the primary use cases
for this feature.)

* Constants such as the above cannot be used as case labels in 'switch'
statements, because they are not compile-time constant expressions.

* Any errors in the string to be parsed (for instance, an extra space) will
result in runtime exceptions, rather than compile-time errors as would have
occurred in normal parsing. If such a value is declared 'static', this will
result in some very ugly exceptions at runtime.
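
The last two points can be sketched as follows. This is only an
illustration under the assumption that the proposed literal syntax is
available; ParseVersusLiteral, describe and failsAtRuntime are hypothetical
names chosen for the example.

```java
// Illustrative only: all names in this class are hypothetical.
final class ParseVersusLiteral {
    // A compile-time constant, so it may appear as a case label.
    static final int LITERAL_MASK = 0b00001110;
    // Not a compile-time constant: computed when the class initializes.
    static final int PARSED_MASK = Integer.parseInt("00001110", 2);

    static String describe(int value) {
        switch (value) {
            case LITERAL_MASK: return "mask";
            // case PARSED_MASK: would be rejected by the compiler
            // (a constant expression is required here).
            default: return "other";
        }
    }

    // Returns true if parsing the string as binary fails at run time.
    static boolean failsAtRuntime(String bits) {
        try {
            Integer.parseInt(bits, 2);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(PARSED_MASK));
        System.out.println(failsAtRuntime("0000 1110"));
    }
}
```

A stray space inside a literal would instead be caught by the compiler.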


EXAMPLES:

// An 8-bit 'byte' literal.
byte aByte = (byte)0b00100001;

// A 16-bit 'short' literal.
short aShort = (short)0b1010000101000101;

// Some 32-bit 'int' literals.
int anInt1 = 0b10100001010001011010000101000101;
int anInt2 = 0b101;
// The B can be upper or lower case, as with the x in "0x45".
int anInt3 = 0B101;

// A 64-bit 'long' literal. Note the "L" suffix, as would also be used
// for a long in decimal, hexadecimal, or octal.
long aLong =
0b01010000101000101101000010100010110100001010001011010000101000101L;
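
A note on the casts used above, as a sketch only. It assumes that binary
literals would follow the same typing rules as hexadecimal ones (an
unsuffixed literal is an int, and a literal with all 32 bits written out may
set the sign bit). LiteralWidthDemo is an illustrative name.

```java
// Illustrative only: LiteralWidthDemo is a hypothetical name.
final class LiteralWidthDemo {
    // The int value 33 fits in a byte, so no cast is required here.
    static final byte SMALL = 0b00100001;
    // 0b10000001 is the int 129; the cast keeps the low 8 bits, giving -127.
    static final byte HIGH_BIT = (byte) 0b10000001;
    // Same idea for 16 bits: the cast reinterprets bit 15 as the sign.
    static final short A_SHORT = (short) 0b1010000101000101;
    // All 32 bits written out: the top bit is the sign bit, as in hex.
    static final int FULL_INT = 0b10100001010001011010000101000101;

    public static void main(String[] args) {
        System.out.println(SMALL);
        System.out.println(HIGH_BIT);
        System.out.println(FULL_INT == 0xA145A145);
    }
}
```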

SIMPLE EXAMPLE:

class Foo {
  public static void main(String[] args) {
    System.out.println("The value 10100001 in decimal is " + 0b10100001);
  }
}


ADVANCED EXAMPLE:

// Binary constants could be used in code that needs to be
// easily checkable against a specifications document, such
// as this simulator for a hypothetical 8-bit microprocessor:

public State decodeInstruction(int instruction, State state) {
  if ((instruction & 0b11100000) == 0b00000000) {
    final int register = instruction & 0b00001111;
    switch (instruction & 0b11110000) {
      case 0b00000000: return state.nop();
      case 0b00010000: return state.copyAccumTo(register);
      case 0b00100000: return state.addToAccum(register);
      case 0b00110000: return state.subFromAccum(register);
      case 0b01000000: return state.multiplyAccumBy(register);
      case 0b01010000: return state.divideAccumBy(register);
      case 0b01100000: return state.setAccumFrom(register);
      case 0b01110000: return state.returnFromCall();
      default: throw new IllegalArgumentException();
    }
  } else {
    final int address = instruction & 0b00011111;
    switch (instruction & 0b11100000) {
      case 0b00100000: return state.jumpTo(address);
      case 0b01000000: return state.jumpIfAccumZeroTo(address);
      case 0b10000000: return state.jumpIfAccumNonzeroTo(address);
      case 0b01100000: return state.setAccumFromMemory(address);
      case 0b10100000: return state.writeAccumToMemory(address);
      case 0b11000000: return state.callTo(address);
      default: throw new IllegalArgumentException();
    }
  }
}

// Binary literals can be used to make a bitmap more readable:

public static final short[] HAPPY_FACE = {
   (short)0b0000011111100000,
   (short)0b0000100000010000,
   (short)0b0001000000001000,
   (short)0b0010000000000100,
   (short)0b0100000000000010,
   (short)0b1000011001100001,
   (short)0b1000011001100001,
   (short)0b1000000000000001,
   (short)0b1000000000000001,
   (short)0b1001000000001001,
   (short)0b1000100000010001,
   (short)0b0100011111100010,
   (short)0b0010000000000100,
   (short)0b0001000000001000,
   (short)0b0000100000010000,
   (short)0b0000011111100000
};

// Binary literals can make relationships
// among data more apparent than they would
// be in hex or octal.
//
// For instance, what does the following
// array contain? In hexadecimal, it's hard to tell:
public static final int[] PHASES = {
    0x31, 0x62, 0xC4, 0x89, 0x13, 0x26, 0x4C, 0x98
};

// In binary, it's obvious that a number is being
// rotated left one bit at a time.
public static final int[] PHASES = {
    0b00110001,
    0b01100010,
    0b11000100,
    0b10001001,
    0b00010011,
    0b00100110,
    0b01001100,
    0b10011000,
};
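
As a quick check of the rotation claim, the following sketch regenerates the
table from its first entry. It assumes the proposed literal syntax;
PhaseDemo, rotateLeft8 and generate are hypothetical names used only for
this illustration.

```java
import java.util.Arrays;

// Illustrative only: all names in this class are hypothetical.
final class PhaseDemo {
    static final int[] PHASES = {
        0b00110001, 0b01100010, 0b11000100, 0b10001001,
        0b00010011, 0b00100110, 0b01001100, 0b10011000,
    };

    // Rotates the low 8 bits of value left by one position.
    static int rotateLeft8(int value) {
        return ((value << 1) | (value >>> 7)) & 0xFF;
    }

    // Builds count successive rotations, starting from seed itself.
    static int[] generate(int seed, int count) {
        int[] out = new int[count];
        int v = seed & 0xFF;
        for (int i = 0; i < count; i++) {
            out[i] = v;
            v = rotateLeft8(v);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.equals(PHASES, generate(0b00110001, 8)));
    }
}
```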


DETAILS

SPECIFICATION:

Section 3.10.1 ("Integer Literals") of the JLS3 should be changed to add the
following:

IntegerLiteral:
        DecimalIntegerLiteral
        HexIntegerLiteral       
        OctalIntegerLiteral
        BinaryIntegerLiteral         // Added

BinaryIntegerLiteral:
        BinaryNumeral IntegerTypeSuffix_opt

BinaryNumeral:
        0 b BinaryDigits
        0 B BinaryDigits

BinaryDigits:
        BinaryDigit
        BinaryDigit BinaryDigits

BinaryDigit: one of
        0 1
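
The grammar above can be approximated by a regular expression. The
following is a minimal sketch, not how javac would implement it, and it
assumes that IntegerTypeSuffix is "l" or "L" as in the existing JLS3
grammar. BinaryLiteralSyntax and isBinaryIntegerLiteral are made-up names.

```java
import java.util.regex.Pattern;

// Illustrative only: a recognizer for the proposed BinaryIntegerLiteral.
final class BinaryLiteralSyntax {
    // 0, then b or B, then one or more binary digits, then optional l or L.
    private static final Pattern BINARY_LITERAL =
            Pattern.compile("0[bB][01]+[lL]?");

    static boolean isBinaryIntegerLiteral(String token) {
        return BINARY_LITERAL.matcher(token).matches();
    }

    public static void main(String[] args) {
        System.out.println(isBinaryIntegerLiteral("0b101"));  // true
        System.out.println(isBinaryIntegerLiteral("0b102"));  // false
    }
}
```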

COMPILATION:

Binary literals would be compiled to class files in the same fashion as
existing decimal, hexadecimal, and octal literals are. No special support or
changes to the class file format are needed.

TESTING:

The feature can be tested in the same way as existing decimal, hexadecimal,
and octal literals are: Create a bunch of constants in source code, including
the maximum and minimum positive and negative values for integer and long
types, and verify at runtime that they have the correct values.
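
A sketch of what such boundary constants might look like, assuming that
binary literals treat a fully written-out top bit as the sign bit in the
same way hexadecimal literals do. BoundaryLiterals is an illustrative name,
and a real test suite would cover many more values.

```java
// Illustrative only: BoundaryLiterals is a hypothetical name.
final class BoundaryLiterals {
    static final int INT_MAX = 0b01111111111111111111111111111111;
    static final int INT_MIN = 0b10000000000000000000000000000000;
    static final int INT_MINUS_ONE = 0b11111111111111111111111111111111;

    static final long LONG_MAX =
        0b0111111111111111111111111111111111111111111111111111111111111111L;
    static final long LONG_MIN =
        0b1000000000000000000000000000000000000000000000000000000000000000L;
    static final long LONG_MINUS_ONE =
        0b1111111111111111111111111111111111111111111111111111111111111111L;

    public static void main(String[] args) {
        System.out.println(INT_MAX == Integer.MAX_VALUE);
        System.out.println(INT_MIN == Integer.MIN_VALUE);
        System.out.println(LONG_MAX == Long.MAX_VALUE);
        System.out.println(LONG_MIN == Long.MIN_VALUE);
    }
}
```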


LIBRARY SUPPORT:

The methods Integer.decode(String) and Long.decode(String) should be modified
to parse binary numbers (as specified above) in addition to their existing
support for decimal, hexadecimal, and octal numbers.
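
One way the requested behaviour could look, written as a standalone helper
rather than as a change to the JDK. BinaryDecode.decode is a hypothetical
method: it handles the "0b"/"0B" prefix itself and hands everything else to
the existing Integer.decode. A Long variant would follow the same pattern.

```java
// Hypothetical helper, not a JDK method: a sketch of how decode could
// accept the proposed 0b/0B prefix while deferring to Integer.decode
// for decimal, hexadecimal, and octal input.
final class BinaryDecode {
    static int decode(String text) {
        int index = 0;
        boolean negative = false;
        if (text.startsWith("-")) {
            negative = true;
            index = 1;
        } else if (text.startsWith("+")) {
            index = 1;
        }
        if (text.startsWith("0b", index) || text.startsWith("0B", index)) {
            String digits = text.substring(index + 2);
            if (digits.startsWith("-") || digits.startsWith("+")) {
                throw new NumberFormatException(
                        "Sign in wrong position: " + text);
            }
            // Keep the sign attached so Integer.MIN_VALUE stays in range.
            return Integer.parseInt((negative ? "-" : "") + digits, 2);
        }
        return Integer.decode(text);
    }

    public static void main(String[] args) {
        System.out.println(decode("0b101"));  // 5
        System.out.println(decode("-0B11"));  // -3
        System.out.println(decode("0x1F"));   // 31
    }
}
```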


REFLECTIVE APIS:

No updates to the reflection APIs are needed.


OTHER CHANGES:

No other changes are needed.


MIGRATION:

Individual decimal, hexadecimal, or octal constants in existing code can be
updated to binary as a programmer desires.


COMPATIBILITY


BREAKING CHANGES:

This feature would not break any existing programs, since the suggested
syntax is currently considered to be a compile-time error.


EXISTING PROGRAMS:

Class file format does not change, so existing programs can use class files
compiled with the new feature without problems.


REFERENCES:

The GCC/G++ compiler, which already supports this syntax (as of version 4.3)
as an extension to standard C/C++.
http://gcc.gnu.org/gcc-4.3/changes.html

The Ruby language, which supports binary literals:
http://wordaligned.org/articles/binary-literals

The Python language added binary literals in version 2.6:
http://docs.python.org/dev/whatsnew/2.6.html#pep-3127-integer-literal-support-and-syntax

EXISTING BUGS:

"Language support for literal numbers in binary and other bases"
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5025288

URL FOR PROTOTYPE (optional):

None.


