PROPOSAL: Binary Literals

Reinier Zwitserloot reinier at zwitserloot.com
Wed Mar 25 13:07:07 PDT 2009


Of all low impact coin submissions, this just isn't very compelling.  
Real Programmers can count in hexadecimal ever since they were A years  
old, after all.

  --Reinier Zwitserloot



On Mar 25, 2009, at 20:35, Gaseous Fumes wrote:

> A couple of quick notes:
>
> 1) Ruby already has this. (The underscores in numbers, that is)
>
> 2) I considered adding this to the proposal as I was writing it, but  
> decided it was an orthogonal issue and deserved its own proposal,  
> since it has nothing per se to do with binary literals. (As James  
> mentions, it would make sense for all numbers, not just binary ones.)
>
> 3) I encourage someone else to write it up as a proposal. If I get  
> done with the other proposals I intend to submit, and still have  
> time, I might do it myself, but if you want to ensure that the  
> proposal gets proposed, I suggest you don't wait for me. One caveat:  
> Consider the impact on Integer.parseInt, Long.decode(), etc. (I  
> suggest the decode methods get changed to accept underscores, but  
> the parseInt ones don't.)
>
> 4) An observation to Joe Darcy and the other Sun engineers involved  
> in reviewing proposals: The expected "about five" limit on proposals  
> really encourages people to lump a bunch of semi-related things into  
> one proposal rather than making each their own proposal, even when  
> the latter would be a more logical way of getting individual  
> orthogonal ideas reviewed separately. I think this is a problem.
>
> For instance, the "null safe operators" proposal (all or nothing)  
> vs. splitting each of them out as individual proposals. There have  
> been a number of proposals put forth where I thought "I agree with  
> half of this proposal and wish the other half wasn't in the same  
> proposal."
>
> I hope that the "about" in "about five" is flexible enough to allow  
> a bunch of very minor proposals to expand that limit well past five  
> if they seem good ideas and easy to implement with few  
> repercussions. Five seems like a pretty low number to me, given that  
> it's been MANY years since it was even possible for users to suggest  
> changes to Java (basically, since JDK 5 was in the planning stages,  
> as JDK 6 was announced to be a "no language changes" release), and  
> there has been much evolution in other programming languages during  
> that time. I think that good ideas should make it into Java (and bad  
> ideas shouldn't) subject to the necessary manpower in review and  
> implementation, regardless of the number of proposals used to submit  
> them. Otherwise, Java risks getting left in the dust as other  
> languages become much easier to use, much faster.
>
> Derek
>
> -----Original Message-----
>> From: james lowden <jl0235 at yahoo.com>
>> Sent: Mar 25, 2009 12:07 PM
>> To: coin-dev at openjdk.java.net
>> Subject: Re: PROPOSAL: Binary Literals
>>
>>
>> Actually, that's a good idea in general for long numeric  
>> constants.  9_000_000_000_000_000L is easier to parse than  
>> 9000000000000000L.
>>
>>
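A small sketch of the digit-grouping idea, for anyone who wants to try it. It assumes a compiler that accepts underscores inside numeric literals (Java 7 and later do); the class name and the `parses` helper are made up for illustration. As far as I know, the runtime parsing methods reject underscores, which is the caveat raised above about parseInt and decode.

```java
// Illustrative only: UnderscoreDemo and parses() are hypothetical names.
final class UnderscoreDemo {
    // The two constants from the message above, grouped and ungrouped.
    static final long GROUPED = 9_000_000_000_000_000L;
    static final long PLAIN = 9000000000000000L;

    // The same grouping applied to a 32-bit binary literal.
    static final int GROUPED_BITS = 0b10100001_01000101_10100001_01000101;
    static final int PLAIN_BITS = 0b10100001010001011010000101000101;

    // Returns true if Long.parseLong accepts the text at run time.
    static boolean parses(String text) {
        try {
            Long.parseLong(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(GROUPED == PLAIN);
        System.out.println(GROUPED_BITS == PLAIN_BITS);
        System.out.println(parses("9_000"));
    }
}
```

The underscores exist only in source text; the compiled constants are identical.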
>> --- On Wed, 3/25/09, Stephen Colebourne <scolebourne at joda.org> wrote:
>>
>>> From: Stephen Colebourne <scolebourne at joda.org>
>>> Subject: Re: PROPOSAL: Binary Literals
>>> To: coin-dev at openjdk.java.net
>>> Date: Wednesday, March 25, 2009, 1:50 PM
>>> See
>>> http://www.jroller.com/scolebourne/entry/changing_java_adding_simpler_primitive
>>> for my take on this from long ago.
>>>
>>> In particular, I'd suggest allowing a character to
>>> separate long binary strings:
>>>
>>> int anInt1 = 0b10100001_01000101_10100001_01000101;
>>>
>>> much more readable.
>>>
>>> Stephen
>>>
>>> 2009/3/25 Derek Foster <vapor1 at teleport.com>:
>>>> Hmm. Second try at sending to the list. Let's see if this works. (In the
>>>> meantime, I noticed that Bruce Chapman has mentioned something similar in
>>>> another proposal of his, so I think we are in agreement on this. This
>>>> proposal should not be taken as competing with his similar proposal: I'd
>>>> quite like to see type suffixes for bytes, shorts, etc. added to Java, in
>>>> addition to binary literals.) Anyway...
>>>>
>>>>
>>>>
>>>>
>>>> Add binary literals to Java.
>>>>
>>>> AUTHOR(S): Derek Foster
>>>>
>>>> OVERVIEW
>>>>
>>>> In some programming domains, use of binary numbers (typically as bitmasks,
>>>> bit-shifts, etc.) is very common. However, Java code, due to its C heritage,
>>>> has traditionally forced programmers to represent numbers in only decimal,
>>>> octal, or hexadecimal. (In practice, octal is rarely used, and is present
>>>> mostly for backwards compatibility with C.)
>>>>
>>>> When the data being dealt with is fundamentally bit-oriented, however, using
>>>> hexadecimal to represent ranges of bits requires an extra degree of
>>>> translation for the programmer, and this can often become a source of errors.
>>>> For instance, if a technical specification lists specific values of interest
>>>> in binary (for example, in a compression encoding algorithm or in the
>>>> specifications for a network protocol, or for communicating with a bitmapped
>>>> hardware device) then a programmer coding to that specification must
>>>> translate each such value from its binary representation into hexadecimal.
>>>> Checking to see if this translation has been done correctly is accomplished
>>>> by back-translating the numbers. In most cases, programmers do these
>>>> translations in their heads, and HOPEFULLY get them right. However, errors
>>>> can easily creep in, and re-verifying the results is not straightforward
>>>> enough to be done frequently.
>>>>
>>>> Furthermore, in many cases, the binary representation of a number makes it
>>>> much clearer what is actually intended than the hexadecimal one does. For
>>>> instance, this:
>>>>
>>>> private static final int BITMASK = 0x1E;
>>>>
>>>> does not immediately make it clear that the bitmask being declared comprises
>>>> a single contiguous range of four bits.
>>>>
>>>> In many cases, it would be more natural for the programmer to be able to
>>>> write the numbers in binary in the source code, eliminating the need for
>>>> manual translation to hexadecimal entirely.
>>>>
>>>>
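A minimal sketch of the BITMASK point above, assuming a compiler that accepts the proposed `0b` syntax (Java 7 and later do). The class and method names are invented for illustration and are not part of the proposal.

```java
// Illustrative only: MaskDemo and isContiguousRun are hypothetical names.
final class MaskDemo {
    static final int HEX_MASK = 0x1E;
    static final int BIN_MASK = 0b00011110; // same value, bit layout visible

    // True when the set bits of mask form one unbroken run.
    static boolean isContiguousRun(int mask) {
        if (mask == 0) {
            return false;
        }
        // Drop the trailing zeros; a contiguous run then looks like 0...01...1,
        // and adding one to such a value clears every set bit.
        int shifted = mask >>> Integer.numberOfTrailingZeros(mask);
        return (shifted & (shifted + 1)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(HEX_MASK == BIN_MASK);
        System.out.println(Integer.toBinaryString(HEX_MASK));
        System.out.println(Integer.bitCount(HEX_MASK));
        System.out.println(isContiguousRun(HEX_MASK));
    }
}
```

With the hexadecimal form, a reader has to run something like `isContiguousRun` mentally; with the binary form the run of four ones is visible in the source.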
>>>> FEATURE SUMMARY:
>>>>
>>>> In addition to the existing "1" (decimal), "01" (octal) and "0x1"
>>>> (hexadecimal) forms of specifying numeric literals, a new form "0b1" (binary)
>>>> would be added.
>>>>
>>>> Note that this is the same syntax as has been used as an extension by the GCC
>>>> C/C++ compilers for many years, and also is used in the Ruby language, as
>>>> well as in the Python language.
>>>>
>>>>
>>>> MAJOR ADVANTAGE:
>>>>
>>>> It is no longer necessary for programmers to translate binary numbers to and
>>>> from hexadecimal in order to use them in Java programs.
>>>>
>>>>
>>>> MAJOR BENEFIT:
>>>>
>>>> Code using bitwise operations is more readable and easier to verify against
>>>> technical specifications that use binary numbers to specify constants.
>>>>
>>>> Routines that are bit-oriented are easier to understand when an artificial
>>>> translation to hexadecimal is not required in order to fulfill the
>>>> constraints of the language.
>>>>
>>>> MAJOR DISADVANTAGE:
>>>>
>>>> Someone might incorrectly think that "0b1" represented the same value as
>>>> hexadecimal number "0xB1". However, note that this problem has existed for
>>>> octal/decimal for many years (confusion between "050" and "50") and does not
>>>> seem to be a major issue.
>>>>
>>>>
>>>> ALTERNATIVES:
>>>>
>>>> Users could continue to write the numbers as decimal, octal, or hexadecimal,
>>>> and would continue to have the problems observed in this document.
>>>>
>>>> Another alternative would be for code to translate at runtime from binary
>>>> strings, such as:
>>>>
>>>>   int BITMASK = Integer.parseInt("00001110", 2);
>>>>
>>>> Besides the obvious extra verbosity, there are several problems with this:
>>>>
>>>> * Calling a method such as Integer.parseInt at runtime will typically make it
>>>> impossible for the compiler to inline the value of this constant, since its
>>>> value has been taken from a runtime method call. Inlining is important,
>>>> because code that does bitwise parsing is often very low-level code in tight
>>>> loops that must execute quickly. (This is particularly the case for mobile
>>>> applications and other applications that run in severely resource-constrained
>>>> environments, which is one of the cases where binary numbers would be most
>>>> valuable, since talking to low-level hardware is one of the primary use cases
>>>> for this feature.)
>>>>
>>>> * Constants such as the above cannot be used as 'case' labels in 'switch'
>>>> statements.
>>>>
>>>> * Any errors in the string to be parsed (for instance, an extra space) will
>>>> result in runtime exceptions, rather than compile-time errors as would have
>>>> occurred in normal parsing. If such a value is declared 'static', this will
>>>> result in some very ugly exceptions at runtime.
>>>>
>>>>
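The three problems listed above can be seen side by side in a short sketch. It assumes a compiler with binary literal support (Java 7 and later); the class, field, and method names are hypothetical.

```java
// Illustrative only: all names in this class are hypothetical.
final class ParseVersusLiteral {
    // A compile-time constant expression: usable as a case label.
    static final int LITERAL_MASK = 0b00001110;

    // Initialized by a method call, so it is not a constant expression.
    // Writing "case PARSED_MASK:" below would be rejected by the compiler.
    static final int PARSED_MASK = Integer.parseInt("00001110", 2);

    static String describe(int value) {
        switch (value) {
            case LITERAL_MASK: return "mask";
            default: return "other";
        }
    }

    // Returns true if parsing the text as base 2 fails at run time.
    static boolean failsAtRuntime(String text) {
        try {
            Integer.parseInt(text, 2);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(LITERAL_MASK == PARSED_MASK);
        System.out.println(describe(PARSED_MASK));
        System.out.println(failsAtRuntime("00001110 ")); // trailing space
    }
}
```

Had the malformed string been used in the static initializer itself, the failure would surface while the class is being initialized rather than at compile time.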
>>>> EXAMPLES:
>>>>
>>>> // An 8-bit 'byte' literal.
>>>> byte aByte = (byte)0b00100001;
>>>>
>>>> // A 16-bit 'short' literal.
>>>> short aShort = (short)0b1010000101000101;
>>>>
>>>> // Some 32-bit 'int' literals.
>>>> int anInt1 = 0b10100001010001011010000101000101;
>>>> int anInt2 = 0b101;
>>>> int anInt3 = 0B101; // The B can be upper or lower case, as per the x in "0x45".
>>>>
>>>> // A 64-bit 'long' literal. Note the "L" suffix, as would also be used
>>>> // for a long in decimal, hexadecimal, or octal.
>>>> long aLong = 0b01010000101000101101000010100010110100001010001011010000101000101L;
>>>>
>>>> SIMPLE EXAMPLE:
>>>>
>>>> class Foo {
>>>>   public static void main(String[] args) {
>>>>     System.out.println("The value 10100001 in decimal is " + 0b10100001);
>>>>   }
>>>> }
>>>>
>>>>
>>>> ADVANCED EXAMPLE:
>>>>
>>>> // Binary constants could be used in code that needs to be
>>>> // easily checkable against a specifications document, such
>>>> // as this simulator for a hypothetical 8-bit microprocessor:
>>>>
>>>> public State decodeInstruction(int instruction, State state) {
>>>>  if ((instruction & 0b11100000) == 0b00000000) {
>>>>    final int register = instruction & 0b00001111;
>>>>    switch (instruction & 0b11110000) {
>>>>      case 0b00000000: return state.nop();
>>>>      case 0b00010000: return state.copyAccumTo(register);
>>>>      case 0b00100000: return state.addToAccum(register);
>>>>      case 0b00110000: return state.subFromAccum(register);
>>>>      case 0b01000000: return state.multiplyAccumBy(register);
>>>>      case 0b01010000: return state.divideAccumBy(register);
>>>>      case 0b01100000: return state.setAccumFrom(register);
>>>>      case 0b01110000: return state.returnFromCall();
>>>>      default: throw new IllegalArgumentException();
>>>>    }
>>>>  } else {
>>>>    final int address = instruction & 0b00011111;
>>>>    switch (instruction & 0b11100000) {
>>>>      case 0b00100000: return state.jumpTo(address);
>>>>      case 0b01000000: return state.jumpIfAccumZeroTo(address);
>>>>      case 0b01100000: return state.jumpIfAccumNonzeroTo(address);
>>>>      case 0b10000000: return state.setAccumFromMemory(address);
>>>>      case 0b10100000: return state.writeAccumToMemory(address);
>>>>      case 0b11000000: return state.callTo(address);
>>>>      default: throw new IllegalArgumentException();
>>>>    }
>>>>  }
>>>> }
>>>>
>>>> // Binary literals can be used to make a bitmap more readable:
>>>>
>>>> public static final short[] HAPPY_FACE = {
>>>>   (short)0b0000011111100000,
>>>>   (short)0b0000100000010000,
>>>>   (short)0b0001000000001000,
>>>>   (short)0b0010000000000100,
>>>>   (short)0b0100000000000010,
>>>>   (short)0b1000011001100001,
>>>>   (short)0b1000011001100001,
>>>>   (short)0b1000000000000001,
>>>>   (short)0b1000000000000001,
>>>>   (short)0b1001000000001001,
>>>>   (short)0b1000100000010001,
>>>>   (short)0b0100011111100010,
>>>>   (short)0b0010000000000100,
>>>>   (short)0b0001000000001000,
>>>>   (short)0b0000100000010000,
>>>>   (short)0b0000011111100000
>>>> };
>>>>
>>>> // Binary literals can make relationships
>>>> // among data more apparent than they would
>>>> // be in hex or octal.
>>>> //
>>>> // For instance, what does the following
>>>> // array contain? In hexadecimal, it's hard to tell:
>>>> public static final int[] PHASES = {
>>>>    0x31, 0x62, 0xC4, 0x89, 0x13, 0x26, 0x4C, 0x98
>>>> };
>>>>
>>>> // In binary, it's obvious that a number is being
>>>> // rotated left one bit at a time.
>>>> public static final int[] PHASES = {
>>>>    0b00110001,
>>>>    0b01100010,
>>>>    0b11000100,
>>>>    0b10001001,
>>>>    0b00010011,
>>>>    0b00100110,
>>>>    0b01001100,
>>>>    0b10011000,
>>>> };
>>>>
>>>>
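The rotation claim for the PHASES array can be checked mechanically. This is a sketch only, assuming binary literal support in the compiler (Java 7 and later); `PhaseCheck`, `rotateLeft8`, and `isRotationSequence` are names invented here.

```java
import java.util.Arrays;

// Illustrative only: the class and helper names are hypothetical.
final class PhaseCheck {
    static final int[] HEX_PHASES = {
        0x31, 0x62, 0xC4, 0x89, 0x13, 0x26, 0x4C, 0x98
    };

    static final int[] BIN_PHASES = {
        0b00110001, 0b01100010, 0b11000100, 0b10001001,
        0b00010011, 0b00100110, 0b01001100, 0b10011000,
    };

    // Rotates an 8-bit value (0..255) left by one bit position.
    static int rotateLeft8(int value) {
        return ((value << 1) | (value >>> 7)) & 0xFF;
    }

    // True when each element is the previous one rotated left by one bit.
    static boolean isRotationSequence(int[] values) {
        for (int i = 1; i < values.length; i++) {
            if (values[i] != rotateLeft8(values[i - 1])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.equals(HEX_PHASES, BIN_PHASES));
        System.out.println(isRotationSequence(BIN_PHASES));
    }
}
```

Both arrays hold the same values; only the binary form lets a reader see the pattern without running the check.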
>>>> DETAILS
>>>>
>>>> SPECIFICATION:
>>>>
>>>> Section 3.10.1 ("Integer Literals") of the JLS3 should be changed to add the
>>>> following:
>>>>
>>>> IntegerLiteral:
>>>>        DecimalIntegerLiteral
>>>>        HexIntegerLiteral
>>>>        OctalIntegerLiteral
>>>>        BinaryIntegerLiteral         // Added
>>>>
>>>> BinaryIntegerLiteral:
>>>>        BinaryNumeral IntegerTypeSuffix_opt
>>>>
>>>> BinaryNumeral:
>>>>        0 b BinaryDigits
>>>>        0 B BinaryDigits
>>>>
>>>> BinaryDigits:
>>>>        BinaryDigit
>>>>        BinaryDigit BinaryDigits
>>>>
>>>> BinaryDigit: one of
>>>>        0 1
>>>>
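As a rough cross-check of the grammar above, the BinaryIntegerLiteral production can be expressed as a regular expression. This is a sketch under the assumption that IntegerTypeSuffix is `l` or `L`, as for the existing integer literal forms; the class and method names are hypothetical, and a real compiler's lexer would not work this way.

```java
import java.util.regex.Pattern;

// Illustrative only: a regex rendering of the proposed grammar.
final class BinaryLiteralGrammar {
    // BinaryNumeral:  0 b BinaryDigits | 0 B BinaryDigits
    // BinaryDigits:   one or more of 0 1
    // IntegerTypeSuffix_opt: an optional l or L (assumed)
    private static final Pattern BINARY_LITERAL =
            Pattern.compile("0[bB][01]+[lL]?");

    // True when the whole token matches the BinaryIntegerLiteral production.
    static boolean isBinaryIntegerLiteral(String token) {
        return BINARY_LITERAL.matcher(token).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"0b1", "0B101L", "0b", "0b102", "0xB1"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + isBinaryIntegerLiteral(sample));
        }
    }
}
```

Note that the grammar as written requires at least one digit after the prefix, so a bare "0b" is not a literal.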
>>>> COMPILATION:
>>>>
>>>> Binary literals would be compiled to class files in the same fashion as
>>>> existing decimal, hexadecimal, and octal literals are. No special support or
>>>> changes to the class file format are needed.
>>>>
>>>> TESTING:
>>>>
>>>> The feature can be tested in the same way as existing decimal, hexadecimal,
>>>> and octal literals are: Create a bunch of constants in source code, including
>>>> the maximum and minimum positive and negative values for integer and long
>>>> types, and verify them at runtime to have the correct values.
>>>>
>>>>
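A sketch of the kind of boundary test described above. It assumes Java 7 or later, where a 32-digit binary literal with the top bit set denotes a negative int, as with hexadecimal. The underscores are the digit-grouping idea from elsewhere in this thread, used here only to make the digit counts easy to verify; the class name is hypothetical.

```java
// Illustrative only: BoundaryLiterals is a hypothetical test class.
final class BoundaryLiterals {
    // 32-bit boundaries, written as four groups of eight bits.
    static final int INT_MAX = 0b01111111_11111111_11111111_11111111;
    static final int INT_MIN = 0b10000000_00000000_00000000_00000000;
    static final int INT_MINUS_ONE = 0b11111111_11111111_11111111_11111111;

    // 64-bit boundaries, written as eight groups of eight bits.
    static final long LONG_MAX =
        0b01111111_11111111_11111111_11111111_11111111_11111111_11111111_11111111L;
    static final long LONG_MIN =
        0b10000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000L;

    // True when every literal has the value the library constants report.
    static boolean allMatch() {
        return INT_MAX == Integer.MAX_VALUE
            && INT_MIN == Integer.MIN_VALUE
            && INT_MINUS_ONE == -1
            && LONG_MAX == Long.MAX_VALUE
            && LONG_MIN == Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(allMatch());
    }
}
```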
>>>> LIBRARY SUPPORT:
>>>>
>>>> The methods Integer.decode(String) and Long.decode(String) should be modified
>>>> to parse binary numbers (as specified above) in addition to their existing
>>>> support for decimal, hexadecimal, and octal numbers.
>>>>
>>>>
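To make the library suggestion concrete, here is a sketch of what binary-aware decoding could look like when layered on the existing methods. `BinaryDecode.decode` is a hypothetical helper, not a JDK method, and its handling of signs is an assumption modeled on how Integer.decode treats its other prefixes.

```java
// Illustrative only: BinaryDecode.decode is a hypothetical helper.
final class BinaryDecode {
    // Accepts an optional sign followed by 0b/0B and binary digits;
    // anything else is handed to the existing Integer.decode.
    static int decode(String text) {
        String rest = text;
        String sign = "";
        if (rest.startsWith("-")) {
            sign = "-";
            rest = rest.substring(1);
        } else if (rest.startsWith("+")) {
            rest = rest.substring(1);
        }
        if (rest.startsWith("0b") || rest.startsWith("0B")) {
            String digits = rest.substring(2);
            // Reject a sign after the prefix, which parseInt alone would accept.
            if (digits.startsWith("-") || digits.startsWith("+")) {
                throw new NumberFormatException("Sign in wrong position: " + text);
            }
            // Parsing with the sign attached keeps Integer.MIN_VALUE reachable.
            return Integer.parseInt(sign + digits, 2);
        }
        return Integer.decode(text);
    }

    public static void main(String[] args) {
        System.out.println(decode("0b101"));
        System.out.println(decode("-0B11"));
        System.out.println(decode("0x1F"));
    }
}
```

Unlike a source literal, this sketch treats a 32-digit string with the top bit set as out of range, which matches how Integer.decode already treats "0x80000000".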
>>>> REFLECTIVE APIS:
>>>>
>>>> No updates to the reflection APIs are needed.
>>>>
>>>>
>>>> OTHER CHANGES:
>>>>
>>>> No other changes are needed.
>>>>
>>>>
>>>> MIGRATION:
>>>>
>>>> Individual decimal, hexadecimal, or octal constants in existing code can be
>>>> updated to binary as a programmer desires.
>>>>
>>>>
>>>> COMPATIBILITY
>>>>
>>>>
>>>> BREAKING CHANGES:
>>>>
>>>> This feature would not break any existing programs, since the suggested
>>>> syntax is currently considered to be a compile-time error.
>>>>
>>>>
>>>> EXISTING PROGRAMS:
>>>>
>>>> Class file format does not change, so existing programs can use class files
>>>> compiled with the new feature without problems.
>>>>
>>>>
>>>> REFERENCES:
>>>>
>>>> The GCC/G++ compiler, which already supports this syntax (as of version 4.3)
>>>> as an extension to standard C/C++.
>>>> http://gcc.gnu.org/gcc-4.3/changes.html
>>>>
>>>> The Ruby language, which supports binary literals:
>>>> http://wordaligned.org/articles/binary-literals
>>>>
>>>> The Python language added binary literals in version 2.6:
>>>> http://docs.python.org/dev/whatsnew/2.6.html#pep-3127-integer-literal-support-and-syntax
>>>>
>>>> EXISTING BUGS:
>>>>
>>>> "Language support for literal numbers in binary and other bases"
>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5025288
>>>>
>>>> URL FOR PROTOTYPE (optional):
>>>>
>>>> None.
>>>>
>>>>
>>
>>
>>
>>
>
>