PROPOSAL: Binary Literals

Stephen Colebourne scolebourne at joda.org
Wed Mar 25 15:17:09 PDT 2009


 > A couple of quick notes:
 >
 > 1) Ruby already has this. (The underscores in numbers, that is)

So does Fan. Yes, someone should write up underscores in numbers as a 
separate proposal. It won't be me.


 > Of all low impact coin submissions, this just isn't very compelling.
 > Real Programmers can count in hexadecimal ever since they were A years
 > old, after all.

Actually, my experience suggests that lots can't count in hex - we have 
calculators and the internet for that kind of thing these days.

There are definitely use cases for this too, but they occur relatively 
rarely. For example, I once encoded the 30 year repeating pattern of 
leap years in the Islamic calendar system as a binary. This was a mess 
to write:

Years to encode: 2, 5, 7, 10, 13, 16, 18, 21, 24, 26 & 29

Final int value: 623191204

Binary literal: 0b00100101_00100101_00100100_10100100
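
As a rough sketch of how such a value can be built and checked by hand today 
(the class and method names are hypothetical, and it assumes that bit n of 
the int marks year n of the 30 year cycle):

```java
public class LeapYearPattern {
    // Leap years within the 30 year cycle, as listed above.
    static final int[] LEAP_YEARS = {2, 5, 7, 10, 13, 16, 18, 21, 24, 26, 29};

    // Assumption: bit n of the pattern is set when year n of the cycle is a leap year.
    static int encode(int[] years) {
        int pattern = 0;
        for (int year : years) {
            pattern |= 1 << year;
        }
        return pattern;
    }

    // Tests the bit for (year mod 30) in the encoded pattern.
    static boolean isLeapYear(int pattern, int year) {
        return (pattern & (1 << (year % 30))) != 0;
    }

    public static void main(String[] args) {
        int pattern = encode(LEAP_YEARS);
        System.out.println(pattern);                 // prints 623191204
        System.out.println(isLeapYear(pattern, 7));  // prints true
        System.out.println(isLeapYear(pattern, 8));  // prints false
    }
}
```

With a binary literal the loop is unnecessary, since the pattern can be read 
straight off the source.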

It is also beneficial in teaching.

But the truth is it will always be low priority (mind you, I'd have 
placed it higher than hexadecimal floating point literals...)

Stephen




Reinier Zwitserloot wrote:
> Of all low impact coin submissions, this just isn't very compelling.  
> Real Programmers can count in hexadecimal ever since they were A years  
> old, after all.
> 
>   --Reinier Zwitserloot
> 
> 
> 
> On Mar 25, 2009, at 20:35, Gaseous Fumes wrote:
> 
>> A couple of quick notes:
>>
>> 1) Ruby already has this. (The underscores in numbers, that is)
>>
>> 2) I considered adding this to the proposal as I was writing it, but  
>> decided it was an orthogonal issue and deserved its own proposal,  
>> since it has nothing per se to do with binary literals. (As James  
>> mentions, it would make sense for all numbers, not just binary ones.)
>>
>> 3) I encourage someone else to write it up as a proposal. If I get  
>> done with the other proposals I intend to submit, and still have  
>> time, I might do it myself, but if you want to ensure that the  
>> proposal gets proposed, I suggest you don't wait for me. One caveat:  
>> Consider the impact on Integer.parseInt, Long.decode(), etc. (I  
>> suggest the decode methods get changed to accept underscores, but  
>> the parseInt ones don't.)
>>
>> 4) An observation to Joe Darcy and the other Sun engineers involved  
>> in reviewing proposals: The expected "about five" limit on proposals  
>> really encourages people to lump a bunch of semi-related things into  
>> one proposal rather than making each their own proposal, even when  
>> the latter would be a more logical way of getting individual  
>> orthogonal ideas reviewed separately. I think this is a problem.
>>
>> For instance, the "null safe operators" proposal (all or nothing)  
>> vs. splitting each of them out as individual proposals. There have  
>> been a number of proposals put forth where I thought "I agree with  
>> half of this proposal and wish the other half wasn't in the same  
>> proposal."
>>
>> I hope that the "about" in "about five" is flexible enough to allow  
>> a bunch of very minor proposals to expand that limit well past five  
>> if they seem good ideas and easy to implement with few  
>> repercussions. Five seems like a pretty low number to me, given that  
>> it's been MANY years since it was even possible for users to suggest  
>> changes to Java (basically, since JDK 5 was in the planning stages,  
>> as JDK 6 was announced to be a "no language changes" release), and  
>> there has been much evolution in other programming languages during  
>> that time. I think that good ideas should make it into Java (and bad  
>> ideas shouldn't) subject to the necessary manpower in review and  
>> implementation, regardless of the number of proposals used to submit  
>> them. Otherwise, Java risks getting left in the dust as other  
>> languages become much easier to use, much faster.
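
On the caveat in point 3 about Integer.parseInt and Long.decode, here is a 
minimal sketch of the split being suggested. The helper name 
decodeWithUnderscores is hypothetical; the real Long.decode does not accept 
underscores, so the helper strips them first.

```java
public class UnderscoreDecode {
    // Hypothetical helper: removes '_' separators, then defers to the real Long.decode.
    // Simplification: misplaced underscores (leading, trailing, inside "0x") are not rejected.
    static long decodeWithUnderscores(String text) {
        return Long.decode(text.replace("_", ""));
    }

    // Returns true if the strict parser accepts the text unchanged.
    static boolean parseIntAccepts(String text) {
        try {
            Integer.parseInt(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeWithUnderscores("9_000_000_000_000_000")); // prints 9000000000000000
        System.out.println(decodeWithUnderscores("0xFF_FF"));               // prints 65535
        System.out.println(parseIntAccepts("1_000"));                       // prints false
    }
}
```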
>>
>> Derek
>>
>> -----Original Message-----
>>> From: james lowden <jl0235 at yahoo.com>
>>> Sent: Mar 25, 2009 12:07 PM
>>> To: coin-dev at openjdk.java.net
>>> Subject: Re: PROPOSAL: Binary Literals
>>>
>>>
>>> Actually, that's a good idea in general for long numeric  
>>> constants.  9_000_000_000_000_000L is easier to parse than  
>>> 9000000000000000L.
>>>
>>>
>>> --- On Wed, 3/25/09, Stephen Colebourne <scolebourne at joda.org> wrote:
>>>
>>>> From: Stephen Colebourne <scolebourne at joda.org>
>>>> Subject: Re: PROPOSAL: Binary Literals
>>>> To: coin-dev at openjdk.java.net
>>>> Date: Wednesday, March 25, 2009, 1:50 PM
>>>> See
>>>> http://www.jroller.com/scolebourne/entry/changing_java_adding_simpler_primitive
>>>> for my take on this from long ago.
>>>>
>>>> In particular, I'd suggest allowing a character to
>>>> separate long binary strings:
>>>>
>>>> int anInt1 = 0b10100001_01000101_10100001_01000101;
>>>>
>>>> much more readable.
>>>>
>>>> Stephen
>>>>
>>>> 2009/3/25 Derek Foster <vapor1 at teleport.com>:
>>>>> Hmm. Second try at sending to the list. Let's see if this works. (In the
>>>>> meantime, I noticed that Bruce Chapman has mentioned something similar in
>>>>> another proposal of his, so I think we are in agreement on this. This
>>>>> proposal should not be taken as competing with his similar proposal: I'd
>>>>> quite like to see type suffixes for bytes, shorts, etc. added to Java, in
>>>>> addition to binary literals.) Anyway...
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Add binary literals to Java.
>>>>>
>>>>> AUTHOR(S): Derek Foster
>>>>>
>>>>> OVERVIEW
>>>>>
>>>>> In some programming domains, use of binary numbers (typically as bitmasks,
>>>>> bit-shifts, etc.) is very common. However, Java code, due to its C heritage,
>>>>> has traditionally forced programmers to represent numbers in only decimal,
>>>>> octal, or hexadecimal. (In practice, octal is rarely used, and is present
>>>>> mostly for backwards compatibility with C.)
>>>>>
>>>>> When the data being dealt with is fundamentally bit-oriented, however, using
>>>>> hexadecimal to represent ranges of bits requires an extra degree of
>>>>> translation for the programmer, and this can often become a source of errors.
>>>>> For instance, if a technical specification lists specific values of interest
>>>>> in binary (for example, in a compression encoding algorithm or in the
>>>>> specifications for a network protocol, or for communicating with a bitmapped
>>>>> hardware device) then a programmer coding to that specification must
>>>>> translate each such value from its binary representation into hexadecimal.
>>>>> Checking to see if this translation has been done correctly is accomplished
>>>>> by back-translating the numbers. In most cases, programmers do these
>>>>> translations in their heads, and HOPEFULLY get them right. However, errors
>>>>> can easily creep in, and re-verifying the results is not straightforward
>>>>> enough to be done frequently.
>>>>>
>>>>> Furthermore, in many cases, the binary representation of a number makes it
>>>>> much clearer what is actually intended than the hexadecimal one. For
>>>>> instance, this:
>>>>>
>>>>> private static final int BITMASK = 0x1E;
>>>>>
>>>>> does not immediately make it clear that the bitmask being declared comprises
>>>>> a single contiguous range of four bits.
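
For comparison, a small sketch of the same mask written both ways (the class 
and constant names are made up for illustration). It assumes a compiler that 
accepts the proposed 0b prefix.

```java
public class BitmaskDemo {
    static final int BITMASK_HEX = 0x1E;
    static final int BITMASK_BIN = 0b00011110; // bits 1 through 4 set

    public static void main(String[] args) {
        System.out.println(BITMASK_HEX == BITMASK_BIN);          // prints true
        System.out.println(Integer.bitCount(BITMASK_HEX));       // prints 4
        System.out.println(Integer.toBinaryString(BITMASK_HEX)); // prints 11110
    }
}
```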
>>>>>
>>>>> In many cases, it would be more natural for the programmer to be able to
>>>>> write the numbers in binary in the source code, eliminating the need for
>>>>> manual translation to hexadecimal entirely.
>>>>>
>>>>>
>>>>> FEATURE SUMMARY:
>>>>>
>>>>> In addition to the existing "1" (decimal), "01" (octal) and "0x1"
>>>>> (hexadecimal) forms of specifying numeric literals, a new form "0b1" (binary)
>>>>> would be added.
>>>>>
>>>>> Note that this is the same syntax as has been used as an extension by the GCC
>>>>> C/C++ compilers for many years, and also is used in the Ruby language, as
>>>>> well as in the Python language.
>>>>>
>>>>>
>>>>> MAJOR ADVANTAGE:
>>>>>
>>>>> It is no longer necessary for programmers to translate binary numbers to and
>>>>> from hexadecimal in order to use them in Java programs.
>>>>>
>>>>> MAJOR BENEFIT:
>>>>>
>>>>> Code using bitwise operations is more readable and easier to verify against
>>>>> technical specifications that use binary numbers to specify constants.
>>>>> Routines that are bit-oriented are easier to understand when an artificial
>>>>> translation to hexadecimal is not required in order to fulfill the
>>>>> constraints of the language.
>>>>>
>>>>> MAJOR DISADVANTAGE:
>>>>>
>>>>> Someone might incorrectly think that "0b1" represented the same value as
>>>>> hexadecimal number "0xB1". However, note that this problem has existed for
>>>>> octal/decimal for many years (confusion between "050" and "50") and does not
>>>>> seem to be a major issue.
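
To make the two confusions concrete, here is a short sketch (class and 
constant names are hypothetical) that prints the four values involved, 
assuming the proposed 0b prefix is available:

```java
public class PrefixConfusion {
    static final int BINARY_ONE = 0b1;   // binary 1
    static final int HEX_B1 = 0xB1;      // hexadecimal B1
    static final int OCTAL_050 = 050;    // octal 50
    static final int DECIMAL_50 = 50;    // decimal 50

    public static void main(String[] args) {
        System.out.println(BINARY_ONE);  // prints 1
        System.out.println(HEX_B1);      // prints 177
        System.out.println(OCTAL_050);   // prints 40
        System.out.println(DECIMAL_50);  // prints 50
    }
}
```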
>>>>>
>>>>>
>>>>> ALTERNATIVES:
>>>>>
>>>>> Users could continue to write the numbers as decimal, octal, or hexadecimal,
>>>>> and would continue to have the problems observed in this document.
>>>>>
>>>>> Another alternative would be for code to translate at runtime from binary
>>>>> strings, such as:
>>>>>
>>>>>   int BITMASK = Integer.parseInt("00001110", 2);
>>>>>
>>>>> Besides the obvious extra verbosity, there are several problems with this:
>>>>>
>>>>> * Calling a method such as Integer.parseInt at runtime will typically make it
>>>>> impossible for the compiler to inline the value of this constant, since its
>>>>> value has been taken from a runtime method call. Inlining is important,
>>>>> because code that does bitwise parsing is often very low-level code in tight
>>>>> loops that must execute quickly. (This is particularly the case for mobile
>>>>> applications and other applications that run on severely resource-constrained
>>>>> environments, which is one of the cases where binary numbers would be most
>>>>> valuable, since talking to low-level hardware is one of the primary use cases
>>>>> for this feature.)
>>>>>
>>>>> * Constants such as the above cannot be used as selectors in 'switch'
>>>>> statements.
>>>>>
>>>>> * Any errors in the string to be parsed (for instance, an extra space) will
>>>>> result in runtime exceptions, rather than compile-time errors as would have
>>>>> occurred in normal parsing. If such a value is declared 'static', this will
>>>>> result in some very ugly exceptions at runtime.
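
The points above can be sketched roughly as follows. The class and member 
names are hypothetical, and the literal form assumes a compiler that accepts 
the proposed 0b prefix.

```java
public class ParseVersusLiteral {
    // A literal is a compile-time constant, so it can serve as a case label.
    static final int MASK_LITERAL = 0b00001110;

    // Parsed when the class is initialized; not a compile-time constant.
    static final int MASK_PARSED = Integer.parseInt("00001110", 2);

    static String describe(int value) {
        switch (value) {
            case MASK_LITERAL: return "mask"; // "case MASK_PARSED:" would not compile
            default: return "other";
        }
    }

    // Returns true if the text parses as a binary int, false on NumberFormatException.
    static boolean parses(String text) {
        try {
            Integer.parseInt(text, 2);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(MASK_LITERAL == MASK_PARSED); // prints true
        System.out.println(describe(14));                // prints mask
        System.out.println(parses("0000 1110"));         // prints false
    }
}
```

The stray space is only detected when parses runs, whereas a malformed 
literal would be rejected by the compiler.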
>>>>>
>>>>>
>>>>> EXAMPLES:
>>>>>
>>>>> // An 8-bit 'byte' literal.
>>>>> byte aByte = (byte)0b00100001;
>>>>>
>>>>> // A 16-bit 'short' literal.
>>>>> short aShort = (short)0b1010000101000101;
>>>>>
>>>>> // Some 32-bit 'int' literals.
>>>>> int anInt1 = 0b10100001010001011010000101000101;
>>>>> int anInt2 = 0b101;
>>>>> int anInt3 = 0B101; // The B can be upper or lower case as per the x in "0x45".
>>>>>
>>>>> // A 64-bit 'long' literal. Note the "L" suffix, as would also be used
>>>>> // for a long in decimal, hexadecimal, or octal.
>>>>> long aLong =
>>>>>     0b01010000101000101101000010100010110100001010001011010000101000101L;
>>>>>
>>>>> SIMPLE EXAMPLE:
>>>>>
>>>>> class Foo {
>>>>>   public static void main(String[] args) {
>>>>>     System.out.println("The value 10100001 in decimal is " + 0b10100001);
>>>>>   }
>>>>> }
>>>>>
>>>>>
>>>>> ADVANCED EXAMPLE:
>>>>>
>>>>> // Binary constants could be used in code that needs to be
>>>>> // easily checkable against a specifications document, such
>>>>> // as this simulator for a hypothetical 8-bit microprocessor:
>>>>> public State decodeInstruction(int instruction, State state) {
>>>>>  if ((instruction & 0b11100000) == 0b00000000) {
>>>>>    final int register = instruction & 0b00001111;
>>>>>    switch (instruction & 0b11110000) {
>>>>>      case 0b00000000: return state.nop();
>>>>>      case 0b00010000: return state.copyAccumTo(register);
>>>>>      case 0b00100000: return state.addToAccum(register);
>>>>>      case 0b00110000: return state.subFromAccum(register);
>>>>>      case 0b01000000: return state.multiplyAccumBy(register);
>>>>>      case 0b01010000: return state.divideAccumBy(register);
>>>>>      case 0b01100000: return state.setAccumFrom(register);
>>>>>      case 0b01110000: return state.returnFromCall();
>>>>>      default: throw new IllegalArgumentException();
>>>>>    }
>>>>>  } else {
>>>>>    final int address = instruction & 0b00011111;
>>>>>    switch (instruction & 0b11100000) {
>>>>>      case 0b00100000: return state.jumpTo(address);
>>>>>      case 0b01000000: return state.jumpIfAccumZeroTo(address);
>>>>>      case 0b10000000: return state.jumpIfAccumNonzeroTo(address);
>>>>>      case 0b01100000: return state.setAccumFromMemory(address);
>>>>>      case 0b10100000: return state.writeAccumToMemory(address);
>>>>>      case 0b11000000: return state.callTo(address);
>>>>>      default: throw new IllegalArgumentException();
>>>>>    }
>>>>>  }
>>>>> }
>>>>>
>>>>> // Binary literals can be used to make a bitmap more readable:
>>>>> public static final short[] HAPPY_FACE = {
>>>>>   (short)0b0000011111100000,
>>>>>   (short)0b0000100000010000,
>>>>>   (short)0b0001000000001000,
>>>>>   (short)0b0010000000000100,
>>>>>   (short)0b0100000000000010,
>>>>>   (short)0b1000011001100001,
>>>>>   (short)0b1000011001100001,
>>>>>   (short)0b1000000000000001,
>>>>>   (short)0b1000000000000001,
>>>>>   (short)0b1001000000001001,
>>>>>   (short)0b1000100000010001,
>>>>>   (short)0b0100011111100010,
>>>>>   (short)0b0010000000000100,
>>>>>   (short)0b0001000000001000,
>>>>>   (short)0b0000100000010000,
>>>>>   (short)0b0000011111100000,
>>>>> };
>>>>>
>>>>> // Binary literals can make relationships
>>>>> // among data more apparent than they would
>>>>> // be in hex or octal.
>>>>> //
>>>>> // For instance, what does the following
>>>>> // array contain? In hexadecimal, it's hard to tell:
>>>>> public static final int[] PHASES = {
>>>>>    0x31, 0x62, 0xC4, 0x89, 0x13, 0x26, 0x4C, 0x98
>>>>> };
>>>>>
>>>>> // In binary, it's obvious that a number is being
>>>>> // rotated left one bit at a time.
>>>>> public static final int[] PHASES = {
>>>>>    0b00110001,
>>>>>    0b01100010,
>>>>>    0b11000100,
>>>>>    0b10001001,
>>>>>    0b00010011,
>>>>>    0b00100110,
>>>>>    0b01001100,
>>>>>    0b10011000,
>>>>> };
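
The rotation claim can also be checked mechanically. This is only a sketch: 
the helper names rotateLeft8 and isRotationSequence are hypothetical, and the 
rotation is assumed to be over the low 8 bits.

```java
public class PhasesCheck {
    static final int[] PHASES = {
        0b00110001, 0b01100010, 0b11000100, 0b10001001,
        0b00010011, 0b00100110, 0b01001100, 0b10011000,
    };

    // Rotates the low 8 bits of value left by one position.
    static int rotateLeft8(int value) {
        return ((value << 1) | (value >>> 7)) & 0xFF;
    }

    // Returns true if every entry is the previous entry rotated left by one bit.
    static boolean isRotationSequence(int[] values) {
        for (int i = 1; i < values.length; i++) {
            if (values[i] != rotateLeft8(values[i - 1])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isRotationSequence(PHASES)); // prints true
        System.out.println(PHASES[2] == 0xC4);          // prints true
    }
}
```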
>>>>>
>>>>>
>>>>> DETAILS
>>>>>
>>>>> SPECIFICATION:
>>>>>
>>>>> Section 3.10.1 ("Integer Literals") of the JLS3 should be changed to add the
>>>>> following:
>>>>>
>>>>> IntegerLiteral:
>>>>>        DecimalIntegerLiteral
>>>>>        HexIntegerLiteral
>>>>>        OctalIntegerLiteral
>>>>>        BinaryIntegerLiteral         // Added
>>>>>
>>>>> BinaryIntegerLiteral:
>>>>>        BinaryNumeral IntegerTypeSuffix_opt
>>>>>
>>>>> BinaryNumeral:
>>>>>        0 b BinaryDigits
>>>>>        0 B BinaryDigits
>>>>>
>>>>> BinaryDigits:
>>>>>        BinaryDigit
>>>>>        BinaryDigit BinaryDigits
>>>>>
>>>>> BinaryDigit: one of
>>>>>        0 1
>>>>>
>>>>> COMPILATION:
>>>>>
>>>>> Binary literals would be compiled to class files in the same fashion as
>>>>> existing decimal, hexadecimal, and octal literals are. No special support or
>>>>> changes to the class file format are needed.
>>>>>
>>>>> TESTING:
>>>>>
>>>>> The feature can be tested in the same way as existing decimal, hexadecimal,
>>>>> and octal literals are: Create a bunch of constants in source code, including
>>>>> the maximum and minimum positive and negative values for integer and long
>>>>> types, and verify them at runtime to have the correct values.
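
A minimal sketch of what such a boundary test might look like. The class and 
constant names are hypothetical, and the digit grouping with underscores is 
used only to keep the bit counts readable; it assumes a compiler that accepts 
both binary literals and underscores in numbers.

```java
public class BinaryLiteralBounds {
    // 32-bit boundaries: sign bit clear with all others set, and sign bit only.
    static final int INT_MAX = 0b01111111_11111111_11111111_11111111;
    static final int INT_MIN = 0b10000000_00000000_00000000_00000000;
    static final int INT_MINUS_ONE = 0b11111111_11111111_11111111_11111111;

    // 64-bit boundaries, written with the L suffix.
    static final long LONG_MAX =
        0b01111111_11111111_11111111_11111111_11111111_11111111_11111111_11111111L;
    static final long LONG_MIN =
        0b10000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000L;

    public static void main(String[] args) {
        System.out.println(INT_MAX == Integer.MAX_VALUE); // prints true
        System.out.println(INT_MIN == Integer.MIN_VALUE); // prints true
        System.out.println(INT_MINUS_ONE == -1);          // prints true
        System.out.println(LONG_MAX == Long.MAX_VALUE);   // prints true
        System.out.println(LONG_MIN == Long.MIN_VALUE);   // prints true
    }
}
```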
>>>>>
>>>>> LIBRARY SUPPORT:
>>>>>
>>>>> The methods Integer.decode(String) and Long.decode(String) should be modified
>>>>> to parse binary numbers (as specified above) in addition to their existing
>>>>> support for decimal, hexadecimal, and octal numbers.
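
As an illustration only, the requested behaviour could be approximated by a 
wrapper like the one below. BinaryDecode.decode is a hypothetical helper, not 
part of the JDK; it handles the "0b"/"0B" prefix itself and passes everything 
else to the existing Integer.decode.

```java
public class BinaryDecode {
    // Hypothetical helper: accepts an optional sign followed by 0b/0B and binary digits,
    // and falls back to the real Integer.decode for decimal, hexadecimal, and octal.
    static int decode(String text) {
        int signLength = (text.startsWith("-") || text.startsWith("+")) ? 1 : 0;
        String sign = text.startsWith("-") ? "-" : "";
        String body = text.substring(signLength);
        if (body.startsWith("0b") || body.startsWith("0B")) {
            String digits = body.substring(2);
            if (digits.startsWith("-") || digits.startsWith("+")) {
                throw new NumberFormatException("Sign character in wrong position: " + text);
            }
            // Integer.parseInt with radix 2 rejects empty input and non-binary digits.
            return Integer.parseInt(sign + digits, 2);
        }
        return Integer.decode(text);
    }

    public static void main(String[] args) {
        System.out.println(decode("0b101")); // prints 5
        System.out.println(decode("-0B11")); // prints -3
        System.out.println(decode("0x1F"));  // prints 31
        System.out.println(decode("010"));   // prints 8
    }
}
```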
>>>>>
>>>>>
>>>>> REFLECTIVE APIS:
>>>>>
>>>>> No updates to the reflection APIs are needed.
>>>>>
>>>>>
>>>>> OTHER CHANGES:
>>>>>
>>>>> No other changes are needed.
>>>>>
>>>>>
>>>>> MIGRATION:
>>>>>
>>>>> Individual decimal, hexadecimal, or octal constants in existing code can be
>>>>> updated to binary as a programmer desires.
>>>>>
>>>>>
>>>>> COMPATIBILITY
>>>>>
>>>>>
>>>>> BREAKING CHANGES:
>>>>>
>>>>> This feature would not break any existing programs, since the suggested
>>>>> syntax is currently considered to be a compile-time error.
>>>>>
>>>>> EXISTING PROGRAMS:
>>>>>
>>>>> Class file format does not change, so existing programs can use class files
>>>>> compiled with the new feature without problems.
>>>>>
>>>>>
>>>>> REFERENCES:
>>>>>
>>>>> The GCC/G++ compiler, which already supports this syntax (as of version 4.3)
>>>>> as an extension to standard C/C++.
>>>>> http://gcc.gnu.org/gcc-4.3/changes.html
>>>>>
>>>>> The Ruby language, which supports binary literals:
>>>>> http://wordaligned.org/articles/binary-literals
>>>>>
>>>>> The Python language added binary literals in version 2.6:
>>>>> http://docs.python.org/dev/whatsnew/2.6.html#pep-3127-integer-literal-support-and-syntax
>>>>>
>>>>> EXISTING BUGS:
>>>>>
>>>>> "Language support for literal numbers in binary and other bases"
>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5025288
>>>>>
>>>>> URL FOR PROTOTYPE (optional):
>>>>>
>>>>> None.
>>>>>
>>>>>
>>>
>>>
>>>
>>
> 
> 
> 


