PROPOSAL: Binary Literals

Matthias Ernst matthias at mernst.org
Thu Mar 26 00:29:49 PDT 2009


On Thu, Mar 26, 2009 at 12:47 AM, Reinier Zwitserloot
<reinier at zwitserloot.com> wrote:
> Good points. Another useful method:
>
> public class Byte {
>     public static byte withBitsSet(int... positions);
>     public static byte withBitsCleared(int... positions);
>     public static byte of(String bitString);
> }


>
> where withBitsSet starts with 0 and then turns on the bit at each listed
> position, and withBitsCleared starts with -1 and turns off the bit at
> each listed position.
>
> 0x1C (binary 00011100) can then be written as any of:
>
> Byte.withBitsSet(4, 3, 2);
> Byte.withBitsCleared(7, 6, 5, 1, 0);
> Byte.of("00011100");
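A minimal sketch of the three factory methods follows. It assumes they live in a hypothetical `Bits` class, because `java.lang.Byte` is final and has no such methods; bit 0 is taken to be the least significant bit, and `of` is assumed to require exactly eight '0'/'1' characters.

```java
// Hypothetical helper class: these methods are NOT part of java.lang.Byte.
final class Bits {
    private Bits() {}

    /** Starts from all zeros and turns on each listed bit position (0 = least significant). */
    static byte withBitsSet(int... positions) {
        int value = 0;
        for (int p : positions) {
            checkPosition(p);
            value |= 1 << p;
        }
        return (byte) value;
    }

    /** Starts from all ones (-1) and turns off each listed bit position. */
    static byte withBitsCleared(int... positions) {
        int value = -1;
        for (int p : positions) {
            checkPosition(p);
            value &= ~(1 << p);
        }
        return (byte) value; // keep only the low 8 bits
    }

    /** Parses exactly eight '0'/'1' characters, most significant bit first. */
    static byte of(String bitString) {
        if (bitString.length() != 8) {
            throw new IllegalArgumentException("expected 8 bits: " + bitString);
        }
        int value = 0;
        for (int i = 0; i < 8; i++) {
            char c = bitString.charAt(i);
            if (c != '0' && c != '1') {
                throw new IllegalArgumentException("not a binary digit: " + c);
            }
            value = (value << 1) | (c - '0');
        }
        return (byte) value;
    }

    private static void checkPosition(int p) {
        if (p < 0 || p > 7) {
            throw new IllegalArgumentException("bit position out of range: " + p);
        }
    }

    public static void main(String[] args) {
        // All three spellings produce 0x1C.
        System.out.println(withBitsSet(4, 3, 2));           // 28
        System.out.println(withBitsCleared(7, 6, 5, 1, 0)); // 28
        System.out.println(of("00011100"));                 // 28
    }
}
```

Assigning one of these calls to a `static final` field evaluates it once at class initialization, but the field is then not a compile-time constant, so it cannot be used as a `case` label.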

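JDK 7's BitSet will allow this (BitSet.set(int, int) returns void and
excludes its upper bound, so setting bits 2 to 4 takes a separate call):
BitSet bits = new BitSet(); bits.set(2, 5);
byte b = bits.toByteArray()[0]; // 0x1C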

I believe this proposal could be turned into a new
BitSet.valueOf(String) method with higher chances of acceptance?
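The JDK has no BitSet.valueOf(String); the JDK 7 valueOf overloads take a long[], byte[], LongBuffer or ByteBuffer. A minimal sketch of what such a parser might look like, written as a static method in a hypothetical BitSets class (both names are illustrative) and reading the string most significant bit first, the way a literal is written:

```java
import java.util.BitSet;

// Hypothetical helper: java.util.BitSet has no valueOf(String) method.
final class BitSets {
    private BitSets() {}

    /** Parses e.g. "00011100"; the last character is bit 0. */
    static BitSet bitSetOf(String bits) {
        int n = bits.length();
        BitSet result = new BitSet(n);
        for (int i = 0; i < n; i++) {
            char c = bits.charAt(n - 1 - i); // walk from the rightmost character (bit 0)
            if (c == '1') {
                result.set(i);
            } else if (c != '0') {
                throw new IllegalArgumentException("not a binary digit: " + c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet bits = bitSetOf("00011100");
        System.out.println(bits);                  // {2, 3, 4}
        System.out.println(bits.toByteArray()[0]); // 28, i.e. 0x1C
    }
}
```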

> That WOULD be a useful library addition. Not only are they almost as
> short as the proposed byte literal, but anybody who doesn't have the
> first clue about bit masking can at least find some javadoc this way
> and learn. They would be awfully slow compared to a literal, but
> presumably any such constant needs to be calculated only once, so any
> slowness in the methods should never have a measurable impact. It's a
> shame the JVM can't memoize, though.
>
>
>
>  --Reinier Zwitserloot
>
>
>
> On Mar 25, 2009, at 23:17, Stephen Colebourne wrote:
>
>>> A couple of quick notes:
>>>
>>> 1) Ruby already has this. (The underscores in numbers, that is)
>>
>> So does Fan. Yes, someone should write up underscores in numbers as a
>> separate proposal. It won't be me.
>>
>>
>>> Of all low impact coin submissions, this just isn't very compelling.
>>> Real Programmers can count in hexadecimal ever since they were A
>>> years old, after all.
>>
>> Actually, my experience suggests that many developers can't count in
>> hex; we have calculators and the internet for that kind of thing these
>> days.
>>
>> There are definitely use cases for this too, but they occur relatively
>> rarely. For example, I once encoded the 30-year repeating pattern of
>> leap years in the Islamic calendar system as a binary number, with bit
>> n set when year n of the cycle is a leap year. This was a mess to
>> write:
>>
>> Years to encode: 2, 5, 7, 10, 13, 16, 18, 21, 24, 26 & 29
>>
>> Final int value: 623191204
>>
>> Binary literal: 0b00100101_00100101_00100100_10100100
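A short program can derive both the decimal value and the binary literal from the list of years, removing the hand translation entirely. A sketch, assuming bit n of the mask stands for year n of the cycle (so year 29 is the highest bit used) and using the binary-literal and underscore syntax that Java SE 7 later adopted; the class and method names are illustrative:

```java
// Hypothetical example class; assumes bit n <=> year n of the 30-year cycle.
final class IslamicLeapMask {
    static final int[] LEAP_YEARS = {2, 5, 7, 10, 13, 16, 18, 21, 24, 26, 29};

    /** Sets one bit per listed year. */
    static int mask(int... years) {
        int mask = 0;
        for (int y : years) {
            mask |= 1 << y;
        }
        return mask;
    }

    /** Tests the bit for the given year of the cycle. */
    static boolean isLeap(int mask, int yearOfCycle) {
        return (mask & (1 << yearOfCycle)) != 0;
    }

    public static void main(String[] args) {
        int m = mask(LEAP_YEARS);
        System.out.println(m); // 623191204
        System.out.println(m == 0b00100101_00100101_00100100_10100100); // true
        System.out.println(Integer.toBinaryString(m));
    }
}
```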
>>
>> It is also beneficial in teaching.
>>
>> But the truth is it will always be low priority (mind you, I'd have
>> placed it higher than hexadecimal floating point literals...)
>>
>> Stephen
>>
>>
>>
>>
>> Reinier Zwitserloot wrote:
>>> Of all low impact coin submissions, this just isn't very compelling.
>>> Real Programmers can count in hexadecimal ever since they were A
>>> years old, after all.
>>>
>>>  --Reinier Zwitserloot
>>>
>>>
>>>
>>> On Mar 25, 2009, at 20:35, Gaseous Fumes wrote:
>>>
>>>> A couple of quick notes:
>>>>
>>>> 1) Ruby already has this. (The underscores in numbers, that is)
>>>>
>>>> 2) I considered adding this to the proposal as I was writing it, but
>>>> decided it was an orthogonal issue and deserved its own proposal,
>>>> since it has nothing per se to do with binary literals. (As James
>>>> mentions, it would make sense for all numbers, not just binary
>>>> ones.)
>>>>
>>>> 3) I encourage someone else to write it up as a proposal. If I get
>>>> done with the other proposals I intend to submit, and still have
>>>> time, I might do it myself, but if you want to ensure that the
>>>> proposal gets proposed, I suggest you don't wait for me. One caveat:
>>>> Consider the impact on Integer.parseInt, Long.decode(), etc. (I
>>>> suggest the decode methods get changed to accept underscores, but
>>>> the parseInt ones don't.)
>>>>
>>>> 4) An observation to Joe Darcy and the other Sun engineers involved
>>>> in reviewing proposals: The expected "about five" limit on proposals
>>>> really encourages people to lump a bunch of semi-related things into
>>>> one proposal rather than making each their own proposal, even when
>>>> the latter would be a more logical way of getting individual
>>>> orthogonal ideas reviewed separately. I think this is a problem.
>>>>
>>>> For instance, consider the "null safe operators" proposal (all or
>>>> nothing) vs. splitting each operator out as an individual proposal.
>>>> There have been a number of proposals put forth where I thought "I
>>>> agree with half of this proposal and wish the other half wasn't in
>>>> the same proposal."
>>>>
>>>> I hope that the "about" in "about five" is flexible enough to allow
>>>> a bunch of very minor proposals to expand that limit well past five
>>>> if they seem good ideas and easy to implement with few
>>>> repercussions. Five seems like a pretty low number to me, given that
>>>> it's been MANY years since it was even possible for users to suggest
>>>> changes to Java (basically, since JDK 5 was in the planning stages,
>>>> as JDK 6 was announced to be a "no language changes" release), and
>>>> there has been much evolution in other programming languages during
>>>> that time. I think that good ideas should make it into Java (and bad
>>>> ideas shouldn't) subject to the necessary manpower in review and
>>>> implementation, regardless of the number of proposals used to submit
>>>> them. Otherwise, Java risks getting left in the dust as other
>>>> languages become much easier to use, much faster.
>>>>
>>>> Derek
>>>>
>>>> -----Original Message-----
>>>>> From: james lowden <jl0235 at yahoo.com>
>>>>> Sent: Mar 25, 2009 12:07 PM
>>>>> To: coin-dev at openjdk.java.net
>>>>> Subject: Re: PROPOSAL: Binary Literals
>>>>>
>>>>>
>>>>> Actually, that's a good idea in general for long numeric
>>>>> constants.  9_000_000_000_000_000L is easier to parse than
>>>>> 9000000000000000L.
>>>>>
>>>>>
>>>>> --- On Wed, 3/25/09, Stephen Colebourne <scolebourne at joda.org>
>>>>> wrote:
>>>>>
>>>>>> From: Stephen Colebourne <scolebourne at joda.org>
>>>>>> Subject: Re: PROPOSAL: Binary Literals
>>>>>> To: coin-dev at openjdk.java.net
>>>>>> Date: Wednesday, March 25, 2009, 1:50 PM
>>>>>> See
>>>>>> http://www.jroller.com/scolebourne/entry/changing_java_adding_simpler_primitive
>>>>>> for my take on this from long ago.
>>>>>>
>>>>>> In particular, I'd suggest allowing a character to
>>>>>> separate long binary strings:
>>>>>>
>>>>>> int anInt1 = 0b10100001_01000101_10100001_01000101;
>>>>>>
>>>>>> much more readable.
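For reference, Java SE 7 eventually adopted both binary literals and underscores between digits (JLS section 3.10.1), so with a Java 7 or later compiler the following compiles; the class and field names are illustrative:

```java
// Hypothetical example class; requires a Java 7+ compiler.
final class UnderscoreLiterals {
    // Underscores are allowed only between digits and do not change the value.
    static final int ANINT1 = 0b10100001_01000101_10100001_01000101;
    static final long BIG = 9_000_000_000_000_000L;

    public static void main(String[] args) {
        System.out.println(ANINT1 == 0xA145A145);        // true
        System.out.println(BIG == 9000000000000000L);    // true
        System.out.println(Integer.toHexString(ANINT1)); // a145a145
    }
}
```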
>>>>>>
>>>>>> Stephen
>>>>>>
>>>>>> 2009/3/25 Derek Foster <vapor1 at teleport.com>:
>>>>>>> Hmm. Second try at sending to the list. Let's see if this works.
>>>>>>> (In the meantime, I noticed that Bruce Chapman has mentioned
>>>>>>> something similar in another of his proposals, so I think we are in
>>>>>>> agreement on this. This proposal should not be taken as competing
>>>>>>> with his similar proposal: I'd quite like to see type suffixes for
>>>>>>> bytes, shorts, etc. added to Java, in addition to binary literals.)
>>>>>>> Anyway...
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Add binary literals to Java.
>>>>>>>
>>>>>>> AUTHOR(S): Derek Foster
>>>>>>>
>>>>>>> OVERVIEW
>>>>>>>
>>>>>>> In some programming domains, use of binary numbers (typically as
>>>>>>> bitmasks, bit-shifts, etc.) is very common. However, Java code, due
>>>>>>> to its C heritage, has traditionally forced programmers to represent
>>>>>>> numbers in only decimal, octal, or hexadecimal. (In practice, octal
>>>>>>> is rarely used, and is present mostly for backwards compatibility
>>>>>>> with C.)
>>>>>>>
>>>>>>> When the data being dealt with is fundamentally bit-oriented,
>>>>>>> however, using hexadecimal to represent ranges of bits requires an
>>>>>>> extra degree of translation for the programmer, and this can often
>>>>>>> become a source of errors. For instance, if a technical
>>>>>>> specification lists specific values of interest in binary (for
>>>>>>> example, in a compression encoding algorithm, in the specification
>>>>>>> for a network protocol, or for communicating with a bitmapped
>>>>>>> hardware device), then a programmer coding to that specification
>>>>>>> must translate each such value from its binary representation into
>>>>>>> hexadecimal. Checking whether this translation has been done
>>>>>>> correctly means back-translating the numbers. In most cases,
>>>>>>> programmers do these translations in their heads, and HOPEFULLY get
>>>>>>> them right. However, errors can easily creep in, and re-verifying
>>>>>>> the results is not straightforward enough to be done frequently.
>>>>>>>
>>>>>>> Furthermore, in many cases the binary representation of a number
>>>>>>> makes it much clearer what is actually intended than the
>>>>>>> hexadecimal one. For instance, this:
>>>>>>>
>>>>>>> private static final int BITMASK = 0x1E;
>>>>>>>
>>>>>>> does not immediately make it clear that the bitmask being declared
>>>>>>> comprises a single contiguous range of four bits.
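Spelled in binary, the same mask shows its four-bit run directly. A small sketch using the binary-literal syntax that Java SE 7 later adopted; the `BitmaskDemo` class and its `isContiguous` helper are illustrative, not JDK methods:

```java
// Hypothetical example class; requires a Java 7+ compiler for the 0b literal.
final class BitmaskDemo {
    private static final int BITMASK_HEX = 0x1E;
    private static final int BITMASK_BIN = 0b00011110; // bits 1 through 4

    /** True if mask is non-zero and its set bits form one unbroken run. */
    static boolean isContiguous(int mask) {
        if (mask == 0) {
            return false;
        }
        int shifted = mask >>> Integer.numberOfTrailingZeros(mask); // run now starts at bit 0
        // A run of ones starting at bit 0 shares no set bit with itself plus one.
        return (shifted & (shifted + 1)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(BITMASK_HEX == BITMASK_BIN);          // true
        System.out.println(Integer.bitCount(BITMASK_HEX));       // 4
        System.out.println(isContiguous(BITMASK_HEX));           // true
        System.out.println(Integer.toBinaryString(BITMASK_HEX)); // 11110
    }
}
```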
>>>>>>>
>>>>>>> In many cases, it would be more natural for the programmer to be
>>>>>>> able to write the numbers in binary in the source code, eliminating
>>>>>>> the need for manual translation to hexadecimal entirely.
>>>>>>>
>>>>>>>
>>>>>>> FEATURE SUMMARY:
>>>>>>>
>>>>>>> In addition to the existing "1" (decimal), "01" (octal) and "0x1"
>>>>>>> (hexadecimal) forms of specifying numeric literals, a new form
>>>>>>> "0b1" (binary) would be added.
>>>>>>>
>>>>>>> Note that this is the same syntax as has been used as an extension
>>>>>>> by the GCC C/C++ compilers (as of version 4.3), and is also used in
>>>>>>> the Ruby language, as well as in the Python language.
>>>>>>>
>>>>>>>
>>>>>>> MAJOR ADVANTAGE:
>>>>>>>
>>>>>>> It is no longer necessary for programmers to translate binary
>>>>>>> numbers to and from hexadecimal in order to use them in Java
>>>>>>> programs.
>>>>>>>
>>>>>>> MAJOR BENEFIT:
>>>>>>>
>>>>>>> Code using bitwise operations is more readable and easier to verify
>>>>>>> against technical specifications that use binary numbers to specify
>>>>>>> constants. Routines that are bit-oriented are easier to understand
>>>>>>> when an artificial translation to hexadecimal is not required in
>>>>>>> order to fulfill the constraints of the language.
>>>>>>>
>>>>>>> MAJOR DISADVANTAGE:
>>>>>>>
>>>>>>> Someone might incorrectly think that "0b1" represents the same value
>>>>>>> as the hexadecimal number "0xB1". However, note that this problem
>>>>>>> has existed for octal/decimal for many years (confusion between
>>>>>>> "050" and "50") and does not seem to be a major issue.
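To make the possible confusion concrete, here are the four spellings side by side; the `0b` form needs a Java 7+ compiler, and the class name is illustrative:

```java
// Hypothetical example class: same-looking digits, very different values.
final class PrefixConfusion {
    public static void main(String[] args) {
        System.out.println(0b1);  // 1   (binary)
        System.out.println(0xB1); // 177 (hexadecimal)
        System.out.println(050);  // 40  (octal)
        System.out.println(50);   // 50  (decimal)
    }
}
```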
>>>>>>>
>>>>>>>
>>>>>>> ALTERNATIVES:
>>>>>>>
>>>>>>> Users could continue to write the numbers as decimal, octal, or
>>>>>>> hexadecimal, and would continue to have the problems observed in
>>>>>>> this document. Another alternative would be for code to translate
>>>>>>> at runtime from binary strings, such as:
>>>>>>>
>>>>>>> private static final int BITMASK = Integer.parseInt("00001110", 2);
>>>>>>>
>>>>>>> Besides the obvious extra verbosity, there are several problems
>>>>>>> with this:
>>>>>>> * Calling a method such as Integer.parseInt at runtime means the
>>>>>>> field is not a compile-time constant, so the compiler cannot inline
>>>>>>> its value. Inlining is important, because code that does bitwise
>>>>>>> parsing is often very low-level code in tight loops that must
>>>>>>> execute quickly. (This is particularly the case for mobile
>>>>>>> applications and other applications that run in severely
>>>>>>> resource-constrained environments, which is one of the cases where
>>>>>>> binary numbers would be most valuable, since talking to low-level
>>>>>>> hardware is one of the primary use cases for this feature.)
>>>>>>>
>>>>>>> * Constants such as the above cannot be used as case labels in
>>>>>>> 'switch' statements, since case labels must be compile-time
>>>>>>> constants.
>>>>>>>
>>>>>>> * Any errors in the string to be parsed (for instance, an extra
>>>>>>> space) will result in runtime exceptions, rather than the
>>>>>>> compile-time errors that a malformed literal would produce. If such
>>>>>>> a value is declared 'static', the failure surfaces as an
>>>>>>> ExceptionInInitializerError when the class is first initialized.
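The three drawbacks can be seen together in one small program. A sketch, assuming a Java 7+ compiler for the binary literal; the class, field and nested-class names are illustrative, and the stray space in the string is a deliberately injected typo:

```java
// Hypothetical example class contrasting a literal with a parsed constant.
final class ParseVsLiteral {
    // A compile-time constant: usable as a case label and inlined by javac.
    static final int MASK_LITERAL = 0b00011110;

    // Not a constant expression: computed when the class is initialized.
    static final int MASK_PARSED = Integer.parseInt("00011110", 2);

    static class Broken {
        // The stray space makes parseInt throw NumberFormatException during
        // class initialization; the JVM reports it as ExceptionInInitializerError.
        static final int MASK = Integer.parseInt("0001 1110", 2);
    }

    static String describe(int value) {
        switch (value) {
            case MASK_LITERAL: return "mask"; // compiles: MASK_LITERAL is a constant
            // case MASK_PARSED:              // would not compile: not a constant expression
            default: return "other";
        }
    }

    public static void main(String[] args) {
        System.out.println(MASK_LITERAL == MASK_PARSED); // true
        System.out.println(describe(0x1E));              // mask
        try {
            System.out.println(Broken.MASK);
        } catch (ExceptionInInitializerError e) {
            System.out.println(e.getCause().getClass().getSimpleName()); // NumberFormatException
        }
    }
}
```

After a failed static initializer, the class is left unusable: any later access to `Broken` throws NoClassDefFoundError instead.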
>>>>>>>
>>>>>>>
>>>>>>> EXAMPLES:
>>>>>>>
>>>>>>> // An 8-bit 'byte' literal.
>>>>>>> byte aByte = (byte)0b00100001;
>>>>>>>
>>>>>>> // A 16-bit 'short' literal.
>>>>>>> short aShort = (short)0b1010000101000101;
>>>>>>>
>>>>>>> // Some 32-bit 'int' literals.
>>>>>>> int anInt1 = 0b10100001010001011010000101000101;
>>>>>>> int anInt2 = 0b101;
>>>>>>> int anInt3 = 0B101; // The B can be upper or lower case as per the x in "0x45".
>>>>>>>
>>>>>>> // A 64-bit 'long' literal. Note the "L" suffix, as would also be used
>>>>>>> // for a long in decimal, hexadecimal, or octal.
>>>>>>> long aLong =
>>>>>>>     0b01010000101000101101000010100010110100001010001011010000101000101L;
>>>>>>>
>>>>>>> SIMPLE EXAMPLE:
>>>>>>>
>>>>>>> class Foo {
>>>>>>>   public static void main(String[] args) {
>>>>>>>     System.out.println("The value 10100001 in decimal is " + 0b10100001);
>>>>>>>   }
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> ADVANCED EXAMPLE:
>>>>>>>
>>>>>>> // Binary constants could be used in code that needs to be
>>>>>>> // easily checkable against a specifications document, such
>>>>>>> // as this simulator for a hypothetical 8-bit microprocessor.
>>>>>>> // Encoding: 0ooorrrr = register operation (opcode o, register r),
>>>>>>> //           1oooaaaa = memory/control operation (opcode o, address a).
>>>>>>> public State decodeInstruction(int instruction, State state) {
>>>>>>>   if ((instruction & 0b10000000) == 0b00000000) {
>>>>>>>     final int register = instruction & 0b00001111;
>>>>>>>     switch (instruction & 0b11110000) {
>>>>>>>       case 0b00000000: return state.nop();
>>>>>>>       case 0b00010000: return state.copyAccumTo(register);
>>>>>>>       case 0b00100000: return state.addToAccum(register);
>>>>>>>       case 0b00110000: return state.subFromAccum(register);
>>>>>>>       case 0b01000000: return state.multiplyAccumBy(register);
>>>>>>>       case 0b01010000: return state.divideAccumBy(register);
>>>>>>>       case 0b01100000: return state.setAccumFrom(register);
>>>>>>>       case 0b01110000: return state.returnFromCall();
>>>>>>>       default: throw new IllegalArgumentException();
>>>>>>>     }
>>>>>>>   } else {
>>>>>>>     final int address = instruction & 0b00001111;
>>>>>>>     switch (instruction & 0b11110000) {
>>>>>>>       case 0b10000000: return state.jumpTo(address);
>>>>>>>       case 0b10010000: return state.jumpIfAccumZeroTo(address);
>>>>>>>       case 0b10100000: return state.jumpIfAccumNonzeroTo(address);
>>>>>>>       case 0b10110000: return state.setAccumFromMemory(address);
>>>>>>>       case 0b11000000: return state.writeAccumToMemory(address);
>>>>>>>       case 0b11010000: return state.callTo(address);
>>>>>>>       default: throw new IllegalArgumentException();
>>>>>>>     }
>>>>>>>   }
>>>>>>> }
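Because the proposal's decoder depends on an undefined `State` type, here is a self-contained variant that returns a description string instead. It is a sketch under its own assumed encoding (top bit 0: 3-bit opcode plus 4-bit register; top bit 1: 3-bit opcode plus 4-bit address); the `ToyDecoder` name and the returned strings are illustrative, and it needs a Java 7+ compiler:

```java
// Hypothetical toy decoder for an assumed 8-bit instruction encoding.
final class ToyDecoder {
    static String decode(int instruction) {
        if ((instruction & 0b1000_0000) == 0) {
            // 0ooo rrrr: opcode in bits 6..4, register in bits 3..0
            int register = instruction & 0b0000_1111;
            switch (instruction & 0b1111_0000) {
                case 0b0000_0000: return "nop()";
                case 0b0001_0000: return "copyAccumTo(" + register + ")";
                case 0b0010_0000: return "addToAccum(" + register + ")";
                case 0b0011_0000: return "subFromAccum(" + register + ")";
                case 0b0100_0000: return "multiplyAccumBy(" + register + ")";
                case 0b0101_0000: return "divideAccumBy(" + register + ")";
                case 0b0110_0000: return "setAccumFrom(" + register + ")";
                case 0b0111_0000: return "returnFromCall()";
                default: throw new AssertionError("unreachable: top bit is 0");
            }
        } else {
            // 1ooo aaaa: opcode in bits 6..4, address in bits 3..0
            int address = instruction & 0b0000_1111;
            switch (instruction & 0b1111_0000) {
                case 0b1000_0000: return "jumpTo(" + address + ")";
                case 0b1001_0000: return "jumpIfAccumZeroTo(" + address + ")";
                case 0b1010_0000: return "jumpIfAccumNonzeroTo(" + address + ")";
                case 0b1011_0000: return "setAccumFromMemory(" + address + ")";
                case 0b1100_0000: return "writeAccumToMemory(" + address + ")";
                case 0b1101_0000: return "callTo(" + address + ")";
                default: throw new IllegalArgumentException(
                        "undefined opcode: " + Integer.toBinaryString(instruction));
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(decode(0b0010_0011)); // addToAccum(3)
        System.out.println(decode(0b1010_0101)); // jumpIfAccumNonzeroTo(5)
    }
}
```

Each case label can be compared digit for digit against an opcode table, which is the readability argument the proposal makes.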
>>>>>>>
>>>>>>> // Binary literals can be used to make a bitmap more readable:
>>>>>>> public static final short[] HAPPY_FACE = {
>>>>>>>   (short)0b0000011111100000,
>>>>>>>   (short)0b0000100000010000,
>>>>>>>   (short)0b0001000000001000,
>>>>>>>   (short)0b0010000000000100,
>>>>>>>   (short)0b0100000000000010,
>>>>>>>   (short)0b1000011001100001,
>>>>>>>   (short)0b1000011001100001,
>>>>>>>   (short)0b1000000000000001,
>>>>>>>   (short)0b1000000000000001,
>>>>>>>   (short)0b1001000000001001,
>>>>>>>   (short)0b1000100000010001,
>>>>>>>   (short)0b0100011111100010,
>>>>>>>   (short)0b0010000000000100,
>>>>>>>   (short)0b0001000000001000,
>>>>>>>   (short)0b0000100000010000,
>>>>>>>   (short)0b0000011111100000
>>>>>>> };
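To check that the binary rows really draw the intended picture, a small renderer can print each row with '#' for a set bit. A sketch assuming a Java 7+ compiler; the `BitmapDemo` class and `render` method are illustrative, and the array is the one from the proposal:

```java
// Hypothetical renderer for a 16x16 bitmap stored one row per short.
final class BitmapDemo {
    // Casts are needed because values above 0x7FFF do not fit in a short.
    static final short[] HAPPY_FACE = {
        (short) 0b0000011111100000,
        (short) 0b0000100000010000,
        (short) 0b0001000000001000,
        (short) 0b0010000000000100,
        (short) 0b0100000000000010,
        (short) 0b1000011001100001,
        (short) 0b1000011001100001,
        (short) 0b1000000000000001,
        (short) 0b1000000000000001,
        (short) 0b1001000000001001,
        (short) 0b1000100000010001,
        (short) 0b0100011111100010,
        (short) 0b0010000000000100,
        (short) 0b0001000000001000,
        (short) 0b0000100000010000,
        (short) 0b0000011111100000,
    };

    /** Renders one row, most significant bit on the left: '#' for 1, '.' for 0. */
    static String render(short row) {
        StringBuilder sb = new StringBuilder(16);
        for (int bit = 15; bit >= 0; bit--) {
            sb.append(((row >> bit) & 1) != 0 ? '#' : '.');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (short row : HAPPY_FACE) {
            System.out.println(render(row));
        }
    }
}
```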
>>>>>>>
>>>>>>> // Binary literals can make relationships
>>>>>>> // among data more apparent than they would
>>>>>>> // be in hex or octal.
>>>>>>> //
>>>>>>> // For instance, what does the following
>>>>>>> // array contain? In hexadecimal, it's hard to tell:
>>>>>>> public static final int[] PHASES = {
>>>>>>>   0x31, 0x62, 0xC4, 0x89, 0x13, 0x26, 0x4C, 0x98
>>>>>>> };
>>>>>>>
>>>>>>> // In binary, it's obvious that a number is being
>>>>>>> // rotated left one bit at a time.
>>>>>>> public static final int[] PHASES = {
>>>>>>>   0b00110001,
>>>>>>>   0b01100010,
>>>>>>>   0b11000100,
>>>>>>>   0b10001001,
>>>>>>>   0b00010011,
>>>>>>>   0b00100110,
>>>>>>>   0b01001100,
>>>>>>>   0b10011000,
>>>>>>> };
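The rotation claim can be checked mechanically. `Integer.rotateLeft` rotates all 32 bits, so this sketch uses its own 8-bit rotation; the `PhaseCheck` class and `rotateLeft8` helper are illustrative:

```java
// Hypothetical check that each PHASES entry is the previous one rotated left by 1 bit.
final class PhaseCheck {
    static final int[] PHASES = {
        0b00110001, 0b01100010, 0b11000100, 0b10001001,
        0b00010011, 0b00100110, 0b01001100, 0b10011000,
    };

    /** Rotates the low 8 bits of value left by one position. */
    static int rotateLeft8(int value) {
        return ((value << 1) | (value >>> 7)) & 0xFF;
    }

    public static void main(String[] args) {
        for (int i = 1; i < PHASES.length; i++) {
            System.out.println(rotateLeft8(PHASES[i - 1]) == PHASES[i]); // true each time
        }
        // The sequence also wraps around: rotating the last entry yields the first.
        System.out.println(rotateLeft8(PHASES[PHASES.length - 1]) == PHASES[0]); // true
    }
}
```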
>>>>>>>
>>>>>>>
>>>>>>> DETAILS
>>>>>>>
>>>>>>> SPECIFICATION:
>>>>>>>
>>>>>>> Section 3.10.1 ("Integer Literals") of the JLS3 should be changed to
>>>>>>> add the following:
>>>>>>>
>>>>>>> IntegerLiteral:
>>>>>>>       DecimalIntegerLiteral
>>>>>>>       HexIntegerLiteral
>>>>>>>       OctalIntegerLiteral
>>>>>>>       BinaryIntegerLiteral         // Added
>>>>>>>
>>>>>>> BinaryIntegerLiteral:
>>>>>>>       BinaryNumeral IntegerTypeSuffix_opt
>>>>>>>
>>>>>>> BinaryNumeral:
>>>>>>>       0 b BinaryDigits
>>>>>>>       0 B BinaryDigits
>>>>>>>
>>>>>>> BinaryDigits:
>>>>>>>       BinaryDigit
>>>>>>>       BinaryDigit BinaryDigits
>>>>>>>
>>>>>>> BinaryDigit: one of
>>>>>>>       0 1
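The BinaryIntegerLiteral production above is simple enough to express as a regular expression, which can be handy for tooling or for test-case generation. A sketch of a recognizer for exactly this grammar (no underscores, which Java SE 7 later also permitted between digits); the class and method names are illustrative:

```java
import java.util.regex.Pattern;

// Hypothetical recognizer for the BinaryIntegerLiteral grammar as proposed.
final class BinaryLiteralSyntax {
    // 0, then b or B, then one or more binary digits, then an optional l/L suffix.
    private static final Pattern BINARY_INTEGER_LITERAL = Pattern.compile("0[bB][01]+[lL]?");

    static boolean isBinaryIntegerLiteral(String token) {
        return BINARY_INTEGER_LITERAL.matcher(token).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"0b101", "0B101", "0b1L", "0b", "0b102", "0x1"}) {
            System.out.println(s + " -> " + isBinaryIntegerLiteral(s));
        }
    }
}
```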
>>>>>>>
>>>>>>> COMPILATION:
>>>>>>>
>>>>>>> Binary literals would be compiled to class files in the same
>>>>>>> fashion as existing decimal, hexadecimal, and octal literals are. No
>>>>>>> special support or changes to the class file format are needed.
>>>>>>>
>>>>>>> TESTING:
>>>>>>>
>>>>>>> The feature can be tested in the same way as existing decimal,
>>>>>>> hexadecimal, and octal literals are: create a bunch of constants in
>>>>>>> source code, including the maximum and minimum positive and negative
>>>>>>> values for the int and long types, and verify at runtime that they
>>>>>>> have the correct values.
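A minimal sketch of the boundary checks described above, assuming a Java 7 or later compiler; the underscores (also Java 7 syntax, not part of this proposal's grammar) only group the digits in fours so they can be counted, and the class name is illustrative. An int literal with more than 32 significant binary digits, or a long literal with more than 64, is a compile-time error, so those cases belong in negative compiler tests rather than here.

```java
// Hypothetical runtime checks of binary literals at the int and long limits.
final class BinaryLiteralLimits {
    public static void main(String[] args) {
        // int: at most 32 significant binary digits
        check(0b0111_1111_1111_1111_1111_1111_1111_1111 == Integer.MAX_VALUE, "int max");
        check(0b1000_0000_0000_0000_0000_0000_0000_0000 == Integer.MIN_VALUE, "int min");
        check(0b1111_1111_1111_1111_1111_1111_1111_1111 == -1, "int -1");
        // long: at most 64 significant binary digits, with the L suffix
        check(0b0111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111L
                == Long.MAX_VALUE, "long max");
        check(0b1000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000L
                == Long.MIN_VALUE, "long min");
        System.out.println("all limits ok");
    }

    static void check(boolean ok, String what) {
        if (!ok) {
            throw new AssertionError(what);
        }
    }
}
```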
>>>>>>>
>>>>>>> LIBRARY SUPPORT:
>>>>>>>
>>>>>>> The methods Integer.decode(String) and Long.decode(String) should be
>>>>>>> modified to parse binary numbers (as specified above) in addition to
>>>>>>> their existing support for decimal, hexadecimal, and octal numbers.
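The JDK's own Integer.decode accepts an optional sign followed by a decimal number, 0x/0X/# hexadecimal, or leading-zero octal, but no 0b form. A minimal sketch of the proposed int behaviour as a hypothetical wrapper (the BinaryDecode class name is illustrative) that handles the binary prefix itself and delegates everything else to Integer.decode:

```java
// Hypothetical wrapper: Integer.decode itself does not accept a 0b prefix.
final class BinaryDecode {
    /** Like Integer.decode, but also accepts [+-]0b/0B followed by binary digits. */
    static int decode(String s) {
        int i = 0;
        boolean negative = false;
        if (!s.isEmpty() && (s.charAt(0) == '-' || s.charAt(0) == '+')) {
            negative = s.charAt(0) == '-';
            i = 1;
        }
        if (s.startsWith("0b", i) || s.startsWith("0B", i)) {
            String digits = s.substring(i + 2);
            // As with Integer.decode, a sign is only allowed before the prefix.
            if (digits.isEmpty() || digits.charAt(0) == '-' || digits.charAt(0) == '+') {
                throw new NumberFormatException("bad binary number: " + s);
            }
            // Parse as long so that -2^31 (Integer.MIN_VALUE) does not overflow.
            long magnitude = Long.parseLong(digits, 2);
            long value = negative ? -magnitude : magnitude;
            if (value < Integer.MIN_VALUE || value > Integer.MAX_VALUE) {
                throw new NumberFormatException("out of int range: " + s);
            }
            return (int) value;
        }
        return Integer.decode(s); // decimal, hexadecimal and octal as before
    }

    public static void main(String[] args) {
        System.out.println(decode("0b101")); // 5
        System.out.println(decode("-0B11")); // -3
        System.out.println(decode("0x1C"));  // 28
    }
}
```

Going through Long.parseLong mirrors how Integer.decode accepts "-0x80000000" but rejects "0x80000000": a minus sign followed by a one and 31 zeros decodes to Integer.MIN_VALUE, while the unsigned 32-digit form is out of range.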
>>>>>>>
>>>>>>>
>>>>>>> REFLECTIVE APIS:
>>>>>>>
>>>>>>> No updates to the reflection APIs are needed.
>>>>>>>
>>>>>>>
>>>>>>> OTHER CHANGES:
>>>>>>>
>>>>>>> No other changes are needed.
>>>>>>>
>>>>>>>
>>>>>>> MIGRATION:
>>>>>>>
>>>>>>> Individual decimal, hexadecimal, or octal constants in existing code
>>>>>>> can be updated to binary as a programmer desires.
>>>>>>>
>>>>>>>
>>>>>>> COMPATIBILITY
>>>>>>>
>>>>>>>
>>>>>>> BREAKING CHANGES:
>>>>>>>
>>>>>>> This feature would not break any existing programs, since the
>>>>>>> suggested syntax is currently considered to be a compile-time error.
>>>>>>>
>>>>>>> EXISTING PROGRAMS:
>>>>>>>
>>>>>>> The class file format does not change, so existing programs can use
>>>>>>> class files compiled with the new feature without problems.
>>>>>>>
>>>>>>>
>>>>>>> REFERENCES:
>>>>>>>
>>>>>>> The GCC/G++ compiler, which already supports this syntax (as of
>>>>>>> version 4.3) as an extension to standard C/C++:
>>>>>>> http://gcc.gnu.org/gcc-4.3/changes.html
>>>>>>>
>>>>>>> The Ruby language, which supports binary literals:
>>>>>>> http://wordaligned.org/articles/binary-literals
>>>>>>>
>>>>>>> The Python language added binary literals in version 2.6:
>>>>>>> http://docs.python.org/dev/whatsnew/2.6.html#pep-3127-integer-literal-support-and-syntax
>>>>>>>
>>>>>>> EXISTING BUGS:
>>>>>>>
>>>>>>> "Language support for literal numbers in binary and other bases"
>>>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5025288
>>>>>>>
>>>>>>> URL FOR PROTOTYPE (optional):
>>>>>>>
>>>>>>> None.
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>


