PROPOSAL: Binary Literals

Wed Mar 25 11:50:41 PDT 2009

See http://www.jroller.com/scolebourne/entry/changing_java_adding_simpler_primitive
for my take on this from long ago.

In particular, I'd suggest allowing a character to separate groups of digits in long binary literals:

int anInt1 = 0b10100001_01000101_10100001_01000101;

Much more readable.
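A minimal sketch of the suggestion, assuming a Java SE 7 or later compiler, which accepts underscores between the digits of any numeric literal, including binary ones. The separators are purely visual, so both constants below hold the same value.

```java
// Assumes Java SE 7+: binary literals and underscores between digits are both accepted.
class Separators {
    // Grouped into bytes with underscores; the compiler ignores the underscores.
    static final int WITH_SEPARATORS = 0b10100001_01000101_10100001_01000101;

    // The same 32 bits written without separators (as in the proposal's anInt1).
    static final int WITHOUT_SEPARATORS = 0b10100001010001011010000101000101;

    public static void main(String[] args) {
        System.out.println(WITH_SEPARATORS == WITHOUT_SEPARATORS); // true
        System.out.println(Integer.toHexString(WITH_SEPARATORS));  // a145a145
    }
}
```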

Stephen

2009/3/25 Derek Foster <vapor1 at teleport.com>:
> Hmm. Second try at sending to the list. Let's see if this works. (In the
> meantime, I noticed that Bruce Chapman has mentioned something similar in
> another of his proposals, so I think we are in agreement on this. This
> proposal should not be taken as competing with his: I'd quite like to see
> type suffixes for bytes, shorts, etc. added to Java, in addition to binary
> literals.) Anyway...
>
>
>
>
> Add binary literals to Java.
>
> AUTHOR(S): Derek Foster
>
> OVERVIEW
>
> In some programming domains, use of binary numbers (typically as bitmasks,
> bit-shifts, etc.) is very common. However, Java code, due to its C heritage,
> has traditionally forced programmers to represent numbers in only decimal,
> octal, or hexadecimal. (In practice, octal is rarely used, and is present
> mostly for backwards compatibility with C.)
>
> When the data being dealt with is fundamentally bit-oriented, however, using
> hexadecimal to represent ranges of bits requires an extra degree of
> translation for the programmer, and this can often become a source of errors.
> For instance, if a technical specification lists specific values of interest
> in binary (for example, in a compression encoding algorithm or in the
> specifications for a network protocol, or for communicating with a bitmapped
> hardware device) then a programmer coding to that specification must
> translate each such value from its binary representation into hexadecimal.
> Checking to see if this translation has been done correctly is accomplished
> by back-translating the numbers. In most cases, programmers do these
> translations in their heads and HOPEFULLY get them right. However, errors
> can easily creep in, and re-verifying the results is not straightforward
> enough to be done frequently.
>
> Furthermore, in many cases the binary representation of a number makes it
> much clearer what is actually intended than the hexadecimal one does. For
> instance, this:
>
> private static final int BITMASK = 0x1E;
>
> does not immediately make it clear that the bitmask being declared comprises
> a single contiguous range of four bits.
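The point can be checked mechanically. A small sketch, assuming a compiler that supports the proposed syntax (Java SE 7 and later do): the two spellings are the same value, and a hypothetical isContiguous helper (not part of any library) confirms that the set bits form a single run.

```java
// Assumes Java SE 7+ for the 0b form.
class BitmaskCheck {
    static final int HEX_MASK = 0x1E;
    static final int BINARY_MASK = 0b00011110; // bits 1..4 set: the run is visible

    // True if mask is non-zero and its set bits form one contiguous run.
    static boolean isContiguous(int mask) {
        if (mask == 0) return false;
        // Shift the run down so that it starts at bit 0.
        int shifted = mask >>> Integer.numberOfTrailingZeros(mask);
        // x & (x + 1) is zero exactly when x has the form 0...01...1.
        return (shifted & (shifted + 1)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(HEX_MASK == BINARY_MASK);    // true
        System.out.println(Integer.bitCount(HEX_MASK)); // 4
        System.out.println(isContiguous(HEX_MASK));     // true
    }
}
```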
>
> In many cases, it would be more natural for the programmer to be able to
> write the numbers in binary in the source code, eliminating the need for
> manual translation to hexadecimal entirely.
>
>
> FEATURE SUMMARY:
>
> In addition to the existing "1" (decimal), "01" (octal), and "0x1"
> (hexadecimal) forms of specifying numeric literals, a new form "0b1" (binary)
> would be added.
>
> Note that this is the same syntax that the GCC C/C++ compilers support as an
> extension (as of version 4.3), and it is also used in the Ruby and Python
> languages.
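For illustration, a sketch (assuming a compiler that supports the proposed syntax; Java SE 7 and later do) of the same value written in all four forms:

```java
// Assumes Java SE 7+ for the 0b form.
class FourForms {
    static final int DECIMAL = 9;
    static final int OCTAL = 011;     // leading 0: octal
    static final int HEX = 0x9;       // 0x prefix: hexadecimal
    static final int BINARY = 0b1001; // 0b prefix: binary (the proposed form)

    public static void main(String[] args) {
        // All four literals denote the same int value.
        System.out.println(DECIMAL == OCTAL && OCTAL == HEX && HEX == BINARY); // true
    }
}
```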
>
>
> MAJOR ADVANTAGE:
>
> It is no longer necessary for programmers to translate binary numbers to and
> from hexadecimal in order to use them in Java programs.
>
>
> MAJOR BENEFIT:
>
> Code using bitwise operations is more readable and easier to verify against
> technical specifications that use binary numbers to specify constants.
>
> Routines that are bit-oriented are easier to understand when an artificial
> translation to hexadecimal is not required in order to fulfill the
> constraints of the language.
>
> MAJOR DISADVANTAGE:
>
> Someone might incorrectly think that "0b1" represents the same value as the
> hexadecimal number "0xB1". However, note that this problem has existed for
> octal/decimal for many years (confusion between "050" and "50") and does not
> seem to be a major issue.
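The values involved in both kinds of confusion, as a small sketch (assuming Java SE 7 or later for the 0b form):

```java
// Assumes Java SE 7+ for the 0b form.
class PrefixConfusion {
    static final int BINARY_ONE = 0b1;   // binary 1   -> 1
    static final int HEX_B1 = 0xB1;      // hex B1     -> 177
    static final int OCTAL_FIFTY = 050;  // octal 50   -> 40
    static final int DECIMAL_FIFTY = 50; // decimal 50 -> 50

    public static void main(String[] args) {
        System.out.println(BINARY_ONE + " " + HEX_B1);         // 1 177
        System.out.println(OCTAL_FIFTY + " " + DECIMAL_FIFTY); // 40 50
    }
}
```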
>
>
> ALTERNATIVES:
>
> Users could continue to write the numbers as decimal, octal, or hexadecimal,
> and would continue to have the problems observed in this document.
>
> Another alternative would be for code to translate at runtime from binary
> strings, such as:
>
>   int BITMASK = Integer.parseInt("00001110", 2);
>
> Besides the obvious extra verbosity, there are several problems with this:
>
> * Calling a method such as Integer.parseInt at runtime means the value is not
> a compile-time constant, so the compiler cannot inline it the way it inlines
> constant expressions. Inlining is important, because code that does bitwise
> parsing is often very low-level code in tight loops that must execute
> quickly. (This is particularly true of mobile applications and other
> applications that run in severely resource-constrained environments. These
> are also among the cases where binary numbers would be most valuable, since
> talking to low-level hardware is one of the primary use cases for this
> feature.)
>
> * Constants such as the above cannot be used as 'case' labels in 'switch'
> statements, since case labels must be compile-time constants.
>
> * Any errors in the string to be parsed (for instance, an extra space) will
> result in runtime exceptions, rather than the compile-time errors that a
> malformed literal would produce. If such a value is a 'static' field, the
> NumberFormatException is thrown from the class's static initializer and
> surfaces as an ExceptionInInitializerError, and later uses of the class fail
> with NoClassDefFoundError.
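The points above can be seen in a small sketch, assuming a compiler that supports the proposed syntax (Java SE 7 and later do). The class and method names are illustrative only.

```java
// Assumes Java SE 7+ for the binary literal.
class RuntimeParsing {
    // A compile-time constant: usable as a case label and inlined by javac.
    static final int LITERAL_MASK = 0b00001110;

    // Not a compile-time constant: computed when the class is initialized.
    static final int PARSED_MASK = Integer.parseInt("00001110", 2);

    static String describe(int value) {
        switch (value) {
            case LITERAL_MASK: return "mask";
            // "case PARSED_MASK:" would not compile: a case label must be a
            // constant expression.
            default: return "other";
        }
    }

    // A stray space is only detected at run time, by parseInt.
    static boolean parsesWithStraySpace() {
        try {
            Integer.parseInt("0000 1110", 2);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(14));           // mask
        System.out.println(parsesWithStraySpace()); // false
    }
}
```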
>
>
> EXAMPLES:
>
> // An 8-bit 'byte' literal.
> byte aByte = (byte)0b00100001;
>
> // A 16-bit 'short' literal.
> short aShort = (short)0b1010000101000101;
>
> // Some 32-bit 'int' literals.
> int anInt1 = 0b10100001010001011010000101000101;
> int anInt2 = 0b101;
> int anInt3 = 0B101; // The B can be upper or lower case, like the x in "0x45".
>
> // A 64-bit 'long' literal. Note the "L" suffix, as would also be used
> // for a long in decimal, hexadecimal, or octal.
> long aLong =
> 0b01010000101000101101000010100010110100001010001011010000101000101L;
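To make the examples easy to check, a sketch (assuming Java SE 7 or later) comparing each literal with the hexadecimal spelling of the same bits. The short literal only fits after the cast, since 0xA145 (41285) exceeds Short.MAX_VALUE; the long literal's leading 0 is harmless, as the remaining 64 digits are the int pattern repeated twice.

```java
// Assumes Java SE 7+ for binary literals.
class ExampleValues {
    static final byte A_BYTE = (byte) 0b00100001;
    static final short A_SHORT = (short) 0b1010000101000101;
    static final int AN_INT1 = 0b10100001010001011010000101000101;
    static final int AN_INT2 = 0b101;
    static final int AN_INT3 = 0B101;
    static final long A_LONG =
        0b01010000101000101101000010100010110100001010001011010000101000101L;

    public static void main(String[] args) {
        // Each binary literal matches the hexadecimal spelling of the same bits.
        System.out.println(A_BYTE == 0x21);                // true
        System.out.println(A_SHORT == (short) 0xA145);     // true
        System.out.println(AN_INT1 == 0xA145A145);         // true
        System.out.println(AN_INT2 == 5 && AN_INT3 == 5);  // true
        System.out.println(A_LONG == 0xA145A145A145A145L); // true
    }
}
```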
>
> SIMPLE EXAMPLE:
>
> class Foo {
>   public static void main(String[] args) {
>     System.out.println("The value 10100001 in decimal is " + 0b10100001);
>   }
> }
>
>
> ADVANCED EXAMPLE:
>
> // Binary constants could be used in code that needs to be
> // easily checkable against a specifications document, such
> // as this simulator for a hypothetical 8-bit microprocessor:
>
> public State decodeInstruction(int instruction, State state) {
>   if ((instruction & 0b10000000) == 0b00000000) {
>     // Register instructions: 0ooorrrr (opcode in bits 6-4, register in 3-0)
>     final int register = instruction & 0b00001111;
>     switch (instruction & 0b11110000) {
>       case 0b00000000: return state.nop();
>       case 0b00010000: return state.copyAccumTo(register);
>       case 0b00100000: return state.addToAccum(register);
>       case 0b00110000: return state.subFromAccum(register);
>       case 0b01000000: return state.multiplyAccumBy(register);
>       case 0b01010000: return state.divideAccumBy(register);
>       case 0b01100000: return state.setAccumFrom(register);
>       case 0b01110000: return state.returnFromCall();
>       default: throw new IllegalArgumentException();
>     }
>   } else {
>     // Memory instructions: 1oooaaaa (opcode in bits 6-4, address in 3-0)
>     final int address = instruction & 0b00001111;
>     switch (instruction & 0b11110000) {
>       case 0b10000000: return state.jumpTo(address);
>       case 0b10010000: return state.jumpIfAccumZeroTo(address);
>       case 0b10100000: return state.jumpIfAccumNonzeroTo(address);
>       case 0b10110000: return state.setAccumFromMemory(address);
>       case 0b11000000: return state.writeAccumToMemory(address);
>       case 0b11010000: return state.callTo(address);
>       default: throw new IllegalArgumentException();
>     }
>   }
> }
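The decoder above depends on a State class that is not shown. As a self-contained alternative, here is a table-driven sketch under an assumed encoding: top bit 0 means a register instruction (0ooorrrr), top bit 1 a memory instruction (1oooaaaa). The encoding and the mnemonics are hypothetical, chosen only to show binary masks pulling fields out of an instruction byte.

```java
// Assumes Java SE 7+ for binary literals. Hypothetical 8-bit instruction set:
//   0ooo rrrr -> register instruction (3-bit opcode, 4-bit register)
//   1ooo aaaa -> memory instruction   (3-bit opcode, 4-bit address)
class TableDecoder {
    private static final String[] REGISTER_OPS = {
        "NOP", "COPY", "ADD", "SUB", "MUL", "DIV", "LOAD", "RET"
    };
    private static final String[] MEMORY_OPS = {
        "JMP", "JZ", "JNZ", "LDM", "STM", "CALL"
    };

    static String decode(int instruction) {
        int opcode = (instruction & 0b01110000) >>> 4; // bits 6..4
        int operand = instruction & 0b00001111;        // bits 3..0
        if ((instruction & 0b10000000) == 0b00000000) {
            return REGISTER_OPS[opcode] + " r" + operand;
        }
        if (opcode >= MEMORY_OPS.length) {
            throw new IllegalArgumentException("undefined opcode: " + instruction);
        }
        return MEMORY_OPS[opcode] + " " + operand;
    }

    public static void main(String[] args) {
        System.out.println(decode(0b00100011)); // ADD r3
        System.out.println(decode(0b10011010)); // JZ 10
    }
}
```

Keeping the opcodes in arrays means the mnemonic order can be compared line by line with an opcode table in a specification.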
>
> // Binary literals can be used to make a bitmap more readable:
>
> public static final short[] HAPPY_FACE = {
>   (short)0b0000011111100000,
>   (short)0b0000100000010000,
>   (short)0b0001000000001000,
>   (short)0b0010000000000100,
>   (short)0b0100000000000010,
>   (short)0b1000011001100001,
>   (short)0b1000011001100001,
>   (short)0b1000000000000001,
>   (short)0b1000000000000001,
>   (short)0b1001000000001001,
>   (short)0b1000100000010001,
>   (short)0b0100011111100010,
>   (short)0b0010000000000100,
>   (short)0b0001000000001000,
>   (short)0b0000100000010000,
>   (short)0b0000011111100000
> };
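A sketch of how rows like these might be drawn, assuming Java SE 7 or later; the renderRow helper and the '#'/'.' characters are illustrative choices, not part of the proposal.

```java
// Assumes Java SE 7+ for binary literals.
class BitmapRender {
    // Renders one 16-bit row, most significant bit first: '#' for 1, '.' for 0.
    static String renderRow(short row) {
        StringBuilder sb = new StringBuilder(16);
        for (int bit = 15; bit >= 0; bit--) {
            // row is widened to int (sign-extended), but bits 15..0 are unchanged.
            sb.append(((row >>> bit) & 1) == 1 ? '#' : '.');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        short[] sample = {
            (short) 0b0000011111100000, // top of the face
            (short) 0b1000011001100001, // row with the eyes
        };
        for (short row : sample) {
            System.out.println(renderRow(row));
        }
    }
}
```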
>
> // Binary literals can make relationships
> // among data more apparent than they would
> // be in hex or octal.
> //
> // For instance, what does the following
> // array contain? In hexadecimal, it's hard to tell:
> public static final int[] PHASES = {
>    0x31, 0x62, 0xC4, 0x89, 0x13, 0x26, 0x4C, 0x98
> };
>
> // In binary, it's obvious that a number is being
> // rotated left one bit at a time.
> public static final int[] PHASES = {
>    0b00110001,
>    0b01100010,
>    0b11000100,
>    0b10001001,
>    0b00010011,
>    0b00100110,
>    0b01001100,
>    0b10011000,
> };
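The rotation claim can also be checked in code. A minimal sketch, assuming Java SE 7 or later; rotateLeft8 is a hypothetical helper that rotates within 8 bits (Integer.rotateLeft would rotate all 32 bits instead).

```java
// Assumes Java SE 7+ for binary literals.
class PhaseCheck {
    static final int[] PHASES = {
        0b00110001, 0b01100010, 0b11000100, 0b10001001,
        0b00010011, 0b00100110, 0b01001100, 0b10011000,
    };

    // Rotates an 8-bit value (0..255) left by one bit, keeping the result in 8 bits.
    static int rotateLeft8(int value) {
        return ((value << 1) | (value >>> 7)) & 0xFF;
    }

    // True if each entry is the previous one rotated left by one bit.
    static boolean isRotationSequence(int[] values) {
        for (int i = 1; i < values.length; i++) {
            if (values[i] != rotateLeft8(values[i - 1])) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isRotationSequence(PHASES)); // true
    }
}
```

Rotating the last entry once more gives back the first, so the table describes a closed cycle of eight phases.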
>
>
> DETAILS
>
> SPECIFICATION:
>
> Section 3.10.1 ("Integer Literals") of the JLS3 should be changed to add the
> following:
>
> IntegerLiteral:
>        DecimalIntegerLiteral
>        HexIntegerLiteral
>        OctalIntegerLiteral
>        BinaryIntegerLiteral         // Added
>
> BinaryIntegerLiteral:
>        BinaryNumeral IntegerTypeSuffix_opt
>
> BinaryNumeral:
>        0 b BinaryDigits
>        0 B BinaryDigits
>
> BinaryDigits:
>        BinaryDigit
>        BinaryDigit BinaryDigits
>
> BinaryDigit: one of
>        0 1
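For illustration, the lexical productions above can be expressed as a regular expression. This is a sketch only: it checks syntax, not whether the value fits in an int or long, and it assumes IntegerTypeSuffix is 'l' or 'L' as in JLS3.

```java
import java.util.regex.Pattern;

// BinaryIntegerLiteral: BinaryNumeral IntegerTypeSuffix_opt
// BinaryNumeral:        0 b BinaryDigits | 0 B BinaryDigits
// BinaryDigits:         one or more of 0 1
class BinaryLiteralSyntax {
    private static final Pattern BINARY_INTEGER_LITERAL =
        Pattern.compile("0[bB][01]+[lL]?");

    // True if the whole token matches the proposed production (range is not checked).
    static boolean isBinaryIntegerLiteral(String token) {
        return BINARY_INTEGER_LITERAL.matcher(token).matches();
    }

    public static void main(String[] args) {
        System.out.println(isBinaryIntegerLiteral("0b101"));  // true
        System.out.println(isBinaryIntegerLiteral("0B1L"));   // true
        System.out.println(isBinaryIntegerLiteral("0b102"));  // false
    }
}
```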
>
> COMPILATION:
>
> Binary literals would be compiled to class files in the same fashion as
> existing decimal, hexadecimal, and octal literals are. No special support or
> changes to the class file format are needed.
>
> TESTING:
>
> The feature can be tested in the same way as existing decimal, hexadecimal,
> and octal literals are: create a set of constants in source code, including
> boundary values such as the largest positive and most negative values of the
> int and long types, and verify at runtime that they have the correct values.
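A sketch of such boundary tests, assuming Java SE 7 or later. The underscores (accepted by Java SE 7+) are there only so the digit counts can be checked by eye: four groups of 8 bits for int, eight for long.

```java
// Assumes Java SE 7+, which accepts binary literals and underscores in them.
class BoundaryLiterals {
    static final int INT_MAX = 0b01111111_11111111_11111111_11111111;
    static final int INT_MIN = 0b10000000_00000000_00000000_00000000;
    static final int INT_MINUS_ONE = 0b11111111_11111111_11111111_11111111;
    static final long LONG_MAX =
        0b01111111_11111111_11111111_11111111_11111111_11111111_11111111_11111111L;
    static final long LONG_MIN =
        0b10000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000L;

    // Compares each literal with the corresponding library constant.
    static boolean allCorrect() {
        return INT_MAX == Integer.MAX_VALUE
            && INT_MIN == Integer.MIN_VALUE
            && INT_MINUS_ONE == -1
            && LONG_MAX == Long.MAX_VALUE
            && LONG_MIN == Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(allCorrect()); // true
    }
}
```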
>
>
> LIBRARY SUPPORT:
>
> The methods Integer.decode(String) and Long.decode(String) should be modified
> to parse binary numbers (as specified above) in addition to their existing
> support for decimal, hexadecimal, and octal numbers.
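A minimal sketch of the requested behaviour, written as a hypothetical helper rather than a change to the JDK, since Integer.decode itself does not accept a 0b/0B prefix. It handles the binary case and defers to Integer.decode for decimal, octal, and hexadecimal input.

```java
// Hypothetical helper (not a JDK method): Integer.decode plus a 0b/0B prefix.
class BinaryDecode {
    static int decode(String s) {
        int index = 0;
        boolean negative = false;
        if (s.startsWith("-")) {
            negative = true;
            index = 1;
        } else if (s.startsWith("+")) {
            index = 1;
        }
        if (s.startsWith("0b", index) || s.startsWith("0B", index)) {
            String digits = s.substring(index + 2);
            // Reject "0b" alone and a second sign such as "0b-1", as decode does for "0x-1".
            if (digits.isEmpty() || digits.charAt(0) == '-' || digits.charAt(0) == '+') {
                throw new NumberFormatException("Malformed binary number: " + s);
            }
            // Parse with the sign attached so the most negative int does not overflow.
            return Integer.parseInt(negative ? "-" + digits : digits, 2);
        }
        return Integer.decode(s); // decimal, octal (0), hex (0x, 0X, #)
    }

    public static void main(String[] args) {
        System.out.println(decode("0b101")); // 5
        System.out.println(decode("-0B11")); // -3
        System.out.println(decode("0x1F"));  // 31
    }
}
```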
>
>
> REFLECTIVE APIS:
>
> No updates to the reflection APIs are needed.
>
>
> OTHER CHANGES:
>
> No other changes are needed.
>
>
> MIGRATION:
>
> Individual decimal, hexadecimal, or octal constants in existing code can be
> updated to binary as a programmer desires.
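For such a migration, a small hypothetical helper could print an existing constant in the proposed binary form, zero-padded to a chosen width, ready to paste over the old hex or octal spelling. This is a sketch; the name toBinaryLiteral is illustrative.

```java
// Hypothetical migration aid: formats an int as a zero-padded binary literal.
class ToBinaryLiteral {
    static String toBinaryLiteral(int value, int width) {
        String bits = Integer.toBinaryString(value); // unsigned form, no leading zeros
        if (bits.length() > width) {
            throw new IllegalArgumentException("value does not fit in " + width + " bits");
        }
        StringBuilder sb = new StringBuilder("0b");
        for (int i = bits.length(); i < width; i++) {
            sb.append('0'); // pad on the left to the requested width
        }
        return sb.append(bits).toString();
    }

    public static void main(String[] args) {
        System.out.println(toBinaryLiteral(0x1E, 8)); // 0b00011110
        System.out.println(toBinaryLiteral(0x31, 8)); // 0b00110001
    }
}
```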
>
>
> COMPATIBILITY
>
>
> BREAKING CHANGES:
>
> This feature would not break any existing programs, since the suggested
> syntax is currently a compile-time error.
>
>
> EXISTING PROGRAMS:
>
> Class file format does not change, so existing programs can use class files
> compiled with the new feature without problems.
>
>
> REFERENCES:
>
> The GCC/G++ compiler, which already supports this syntax (as of version 4.3)
> as an extension to standard C/C++.
> http://gcc.gnu.org/gcc-4.3/changes.html
>
> The Ruby language, which supports binary literals:
> http://wordaligned.org/articles/binary-literals
>
> The Python language added binary literals in version 2.6:
> http://docs.python.org/dev/whatsnew/2.6.html#pep-3127-integer-literal-support-and-syntax
>
> EXISTING BUGS:
>
> "Language support for literal numbers in binary and other bases"
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5025288
>
> URL FOR PROTOTYPE (optional):
>
> None.
>
>