PROPOSAL: Underscores in numbers
Derek Foster
vapor1 at teleport.com
Fri Apr 3 00:45:56 PDT 2009
-----Original Message-----
>From: Bruce Chapman <brucechapman at paradise.net.nz>
>Sent: Mar 31, 2009 3:14 AM
>To: Derek Foster <vapor1 at teleport.com>
>Cc: coin-dev at openjdk.java.net
>Subject: Re: PROPOSAL: Underscores in numbers
>
>Derek,
>thanks for writing this up, it saved me doing it. (and it would
>complement my integer literal proposals nicely)
You're welcome!
Personally, I definitely hope that your type suffixes for bytes and shorts proposal goes through. Lack of that seems a weird omission in the language, and I am tired of casting to bytes.
I am not as keen on the other proposal with its "0h52" notation, however. That seems to be mixing radix and type in a way that is not very consistent with how the rest of the language works.
>A couple of comments:
>
>1. I don't really like most of your decimal examples because (apart from
>the money one) although we talk about these as numbers, they are not
>numbers really, just identifiers whose significant elements consist
>solely of digits. As an explanation, prepending 00 to the front of any
>of these would yield a different (or invalid) phone/credit card/SS
>number, whereas prepending 00 to a number does not (but not in java
>where it makes an octal or a compiler error :). Similarly, adding two
>together or subtracting them makes no sense.
I think that pure numbers (underscores used for order-of-magnitude grouping of digits) and formatted numbers (underscores used in standard groups) like these are both valid use cases for this feature. They are definitely different use cases, but it is nice that the feature can be used for both.
>Real countable number examples that could be useful are populations,
>national debt, Long.MAX_VALUE etc, as well as the hex and binary
>examples you use later.
I'm not averse to adding more examples. I tried to make each existing example show a different likely use of the feature. So I really only needed one example to show the use of underscores to group big numbers in three-digit chunks (as commas would be used, for Americans anyway).
>I am not convinced of the utility of multiple consecutive underscore
>separators, and I think the example that uses that is confusing and
>asking for trouble. IMHO it would add to the value if you removed that
>option, but you might have some good use case you are thinking about.
>Similarly for underscore at either end seems strange. It might take more
>effort to describe it such that these are illegal, but I don't think
>anyone would complain at the absence of that form.
I left them in because I couldn't see a good reason to exclude them, and it's possible that someone might want to use them for some form of formatting that I haven't thought of. Perhaps someone might want to space the digits out to make a number line up with the characters in a comment field that's above it, for instance:
// AreaCode Exchange Number
long x = 555______555___1212L;
long y = 555______857___5309L;
The question really is whether adding multiples causes actual problems. As far as I can see, it doesn't. If someone is abusing them to write unclear code, then it seems to me that that person is really the problem. I don't think the feature in and of itself particularly encourages abuse. It could be used in ways that make code clearer rather than harder to read. I don't really expect it to be used much, though.
>Any reason why in the syntax you didn't just add underscore to the
>various XXXXDigits forms (and maybe change the name a little) rather
>than the more complex approach, then describe the erasure of the
>underscore in the description of each form?
Because that would have made things like "0x_" or "_._" be parseable as syntactically legal numbers, which would then become illegal once the underscores were removed.
Getting the grammar right for the underscores proposal turned out to be much trickier than I had anticipated. (It probably took me two hours.) I initially did it wrong, and had to revise it once I realized that I couldn't ever allow an underscore to appear if it wasn't next to a digit.
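
A minimal sketch of that difference, using regular expressions as a stand-in for the lexer rules for hex integer literals. The class name, the method names, and the regexes themselves are illustrative assumptions, not part of the proposal or of any compiler:

```java
import java.util.regex.Pattern;

class HexUnderscoreGrammarSketch {
    // Naive rule: treat '_' as just another hex digit.
    static final Pattern NAIVE = Pattern.compile("0[xX][0-9a-fA-F_]+");

    // Stricter rule: underscores are allowed around and between digits,
    // but at least one real hex digit must be present.
    static final Pattern STRICT = Pattern.compile("0[xX]_*[0-9a-fA-F][0-9a-fA-F_]*");

    static boolean naiveAccepts(String s) {
        return NAIVE.matcher(s).matches();
    }

    static boolean strictAccepts(String s) {
        return STRICT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"0x_52", "0x5_2", "0x52_", "0x_", "0_x52"};
        for (String s : samples) {
            System.out.println(s + " naive=" + naiveAccepts(s) + " strict=" + strictAccepts(s));
        }
    }
}
```

The naive pattern accepts "0x_", which has no digits left once the underscore is erased; the stricter pattern rejects it.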
Derek
>Bruce
>
>Derek Foster wrote:
>> When I posted my Binary Literals proposal, I got feedback from several people stating that they would like to see a proposal regarding underscores in numbers, in the style of Ruby, since that would make numbers of all types (but especially binary ones) easier to read. Here is such a proposal.
>>
>> AUTHOR(S): Derek Foster
>>
>> OVERVIEW
>>
>> In Java, numbers of various types are currently expressed in their pure form, as a long string of digits possibly interspersed with other punctuation (periods, an exponent specifier, etc.) needed to separate distinct sections of the number. While this is easy for a compiler to process, it is often difficult for a human being to visually parse.
>>
>> The ability of a human to visually separate items tops out somewhere near "seven plus or minus two" items. Research done by telephone companies suggests that for many practical purposes, the longest string of numbers an average human can successfully hold in memory at a time is around three or four. Also, it is difficult for the human eye to find common positions in numbers that have no break in their visual structure.
>>
>> As a result, most numbers that humans deal with in day-to-day life have separators included in them to break the long sequence of digits into manageable chunks that are easy to deal with as separate entities. This includes items such as (apologies to non-USA readers...):
>>
>> Phone numbers: 555-555-1212
>> Credit card numbers: 1234-5678-9012-3456
>> Social security numbers: 999-99-9999
>> Monetary amounts: $12,345,132.12
>>
>> and a wide variety of other types of numbers.
>>
>> However, Java provides no way to add these kinds of visual separators into a number. Java expects the number to be essentially an unbroken string of digits.
>>
>> This proposal suggests that Java follow the lead of the Ruby programming language in allowing the underscore character to be inserted into numbers in most positions, for readability purposes.
>>
>>
>> FEATURE SUMMARY:
>>
>> Java numeric literals will allow underscores to be placed in (nearly) arbitrary positions within the number, at the programmer's discretion, for readability purposes. These underscores shall be ignored by the compiler for the purposes of code generation.
>>
>>
>> MAJOR ADVANTAGE:
>>
>> Programmers won't have to visually parse long strings of digits (a task humans are quite poor at). The internal digit-oriented structure of many numbers can be made more clear.
>>
>>
>> MAJOR BENEFIT:
>>
>> Increased readability of code.
>>
>>
>> MAJOR DISADVANTAGE:
>>
>> The number parsers in the Java compiler would have to be adjusted to parse and ignore the underscores. This is a small amount of effort, but nonzero. There might also be some small performance impact.
>>
>> If someone were to use this feature inappropriately, it could result in difficult-to-read code.
>>
>>
>> ALTERNATIVES:
>>
>> Do without separators in numbers, or use some other character for them.
>>
>>
>> EXAMPLES
>>
>>
>> SIMPLE EXAMPLE: Show the simplest possible program utilizing the new feature.
>>
>> long phoneNumber = 555_555_1212L;
>>
>>
>> ADVANCED EXAMPLE:
>>
>> long creditCardNumber = 1234_5678_9012_3456L;
>> long socialSecurityNumbers = 999_99_9999L;
>> double monetaryAmount = 12_345_132.12;
>> long hexBytes = 0xFF_EC_DE_5E;
>> long hexWords = 0xFFEC_DE5E;
>> long maxLong = 0x7fff_ffff_ffff_ffffL;
>> long alsoMaxLong = 9_223_372_036_854_775_807L;
>> double whyWouldYouEverDoThis = 0x1_.ffff_ffff_ffff_fp10_23;
>>
>> (Additionally, if binary literals are ever added to the Java language, the following might also be possible...
>> byte nybbles = 0b0010_0101;
>> long bytes = 0b11010010_01101001_10010100_10010010;
>> int weirdBitfields = 0b000_10_101;
>> )
>>
>> Note that underscores can be placed around and between digits, but that underscores cannot be placed by themselves in positions where a string of digits would normally be expected:
>>
>> int x1 = _52; // This is an identifier, not a numeric literal.
>> int x2 = 5_2; // OK. (decimal literal)
>> int x3 = 52_; // OK. (decimal literal)
>> int x4 = 5_______2; // OK. (decimal literal)
>>
>> int x5 = 0_x52; // Illegal. Can't put underscores in the "0x" radix prefix.
>> int x6 = 0x_52; // OK. (hexadecimal literal)
>> int x7 = 0x5_2; // OK. (hexadecimal literal)
>> int x8 = 0x52_; // OK. (hexadecimal literal)
>> int x9 = 0x_; // Illegal. Not a valid hex number with the underscore removed.
>>
>> int x10 = 0_52; // OK. (octal literal)
>> int x11 = 05_2; // OK. (octal literal)
>> int x12 = 052_; // OK. (octal literal)
>>
>>
>> DETAILS
>>
>> SPECIFICATION:
>>
>> DECIMALS:
>>
>> Section 3.10.1 ("Integer Literals") of the Java Language Specification 3rd edition shall be modified like so:
>>
>> Underscores:
>> _
>> _ Underscores
>>
>> DecimalNumeral:
>> 0
>> NonZeroDigit DigitsAndUnderscores_opt
>>
>> DigitsAndUnderscores:
>> Underscores_opt Digit Underscores_opt
>> DigitsAndUnderscores Digit Underscores_opt
>>
>> Digit:
>> 0
>> NonZeroDigit
>>
>> NonZeroDigit: one of
>> 1 2 3 4 5 6 7 8 9
>>
>> HexNumeral:
>> 0 x HexDigitsAndUnderscores
>> 0 X HexDigitsAndUnderscores
>>
>> HexDigitsAndUnderscores:
>> Underscores_opt HexDigit Underscores_opt
>> Underscores_opt HexDigit HexDigitsAndUnderscores
>>
>> HexDigit: one of
>> 0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F
>>
>> OctalNumeral:
>> 0 OctalDigitsAndUnderscores
>>
>> OctalDigitsAndUnderscores:
>> Underscores_opt OctalDigit Underscores_opt
>> Underscores_opt OctalDigit OctalDigitsAndUnderscores
>>
>> OctalDigit: one of
>> 0 1 2 3 4 5 6 7
>>
>> Section 3.10.2 ("Floating-Point Literals") would be modified as follows:
>>
>>
>> FloatingPointLiteral:
>> DecimalFloatingPointLiteral
>> HexadecimalFloatingPointLiteral
>>
>> DecimalFloatingPointLiteral:
>> Digit DigitsAndUnderscores_opt . DigitsAndUnderscores_opt ExponentPart_opt FloatTypeSuffix_opt
>> . DigitsAndUnderscores ExponentPart_opt FloatTypeSuffix_opt
>> Digit DigitsAndUnderscores_opt ExponentPart FloatTypeSuffix_opt
>> Digit DigitsAndUnderscores_opt ExponentPart_opt FloatTypeSuffix
>>
>> ExponentPart:
>> ExponentIndicator SignedInteger
>>
>> ExponentIndicator: one of
>> e E
>>
>> SignedInteger:
>> Sign_opt DigitsAndUnderscores
>>
>> Sign: one of
>> + -
>>
>> FloatTypeSuffix: one of
>> f F d D
>>
>> HexadecimalFloatingPointLiteral:
>> HexSignificand BinaryExponent FloatTypeSuffix_opt
>>
>> HexSignificand:
>> HexNumeral
>> HexNumeral .
>> 0x HexDigitsAndUnderscores_opt . HexDigitsAndUnderscores
>> 0X HexDigitsAndUnderscores_opt . HexDigitsAndUnderscores
>>
>> BinaryExponent:
>> BinaryExponentIndicator SignedInteger
>>
>> BinaryExponentIndicator: one of
>> p P
>>
>>
>> COMPILATION:
>>
>> Numbers containing underscores are to be parsed and evaluated by the compiler exactly as if the underscores were not present. The above grammar ensures that removing underscores will not result in an unparseable number.
>>
>> A simple strategy for achieving this: once the compiler's lexer has parsed a number and determined it to be syntactically valid according to the above grammar, any underscores it contains may be removed (by something as simple as numberAsString.replaceAll("_","")) before the number is passed on to the code that would have parsed it prior to this proposal.
>>
>> More performant approaches are certainly possible.
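
The strip-then-parse strategy described above could be sketched roughly like this for decimal integer literals. This is only an illustration under stated assumptions: the class and method names are hypothetical, a regular expression stands in for the lexer's grammar check, and type suffixes are not handled.

```java
class StripUnderscoresSketch {
    // Regex stand-in for the proposed DecimalNumeral production:
    // "0", or a non-zero digit optionally followed by a run of digits
    // and underscores that contains at least one digit.
    private static final String DECIMAL_NUMERAL = "0|[1-9](_*[0-9][0-9_]*)?";

    // Hypothetical helper: validate first, then strip the underscores
    // and hand the result to the existing parser.
    static long parseDecimalLiteral(String literal) {
        if (!literal.matches(DECIMAL_NUMERAL)) {
            throw new NumberFormatException("Not a valid decimal literal: " + literal);
        }
        // String.replace performs a plain (non-regex) substitution.
        return Long.parseLong(literal.replace("_", ""));
    }

    public static void main(String[] args) {
        System.out.println(parseDecimalLiteral("555_555_1212"));
        System.out.println(parseDecimalLiteral("5_______2"));
    }
}
```

Validating before stripping matters: a string such as "_52" would otherwise be silently accepted once its underscore is gone.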
>>
>>
>> TESTING: How can the feature be tested?
>>
>> A variety of literals may be generated, of the cross product of each of the following radices:
>>
>> hex, decimal, octal
>>
>> with each of the following types:
>>
>> byte, char, short, int, long, float, double
>>
>> such that, for each possible numeric field in the result, one or more underscores are inserted at the beginning, in the middle, and at the end of the digits.
>>
>> Note that the above grammar is specifically designed to disallow any underscores from appearing which are not either preceded by or followed by a digit.
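
One way the underscore placements described above might be generated is sketched below. The class and method names are hypothetical, and the sketch only covers a single digit field that follows a radix prefix; decimal literals would need separate handling, since a leading underscore turns them into identifiers.

```java
import java.util.ArrayList;
import java.util.List;

class UnderscoreTestLiteralSketch {
    // Hypothetical generator: returns the digit field with one underscore
    // inserted at the beginning, in the middle, and at the end.
    // Assumes digits has at least two characters.
    static List<String> variants(String prefix, String digits, String suffix) {
        List<String> out = new ArrayList<>();
        int mid = digits.length() / 2;
        out.add(prefix + "_" + digits + suffix);
        out.add(prefix + digits.substring(0, mid) + "_" + digits.substring(mid) + suffix);
        out.add(prefix + digits + "_" + suffix);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(variants("0x", "52", ""));   // hexadecimal forms
        System.out.println(variants("0", "52", ""));    // octal forms
        System.out.println(variants("0x", "7fff", "L")); // hexadecimal long forms
    }
}
```

Each generated string could then be compiled as a literal and compared against the same literal written without underscores.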
>>
>>
>> LIBRARY SUPPORT:
>>
>> Methods such as Integer.decode(String) and Long.decode(String) should probably be updated to ignore underscores in their inputs, since these methods attempt to parse according to Java conventions.
>>
>> I suggest that methods such as Integer.parseInt(), Float.parseFloat(), etc. should probably NOT be updated to ignore underscores, since these methods deal with numbers in their pure form, and are more focused and much more widely used. To alter them to ignore underscores would introduce ambiguity in (and have a performance impact on) various parsing code that uses them.
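
To make the distinction concrete, here is a rough sketch of an underscore-tolerant decode written as a wrapper around the existing method. decodeIgnoringUnderscores is a hypothetical helper, not a JDK method, and it does not enforce the grammar above (it would also accept a string such as "0_x52"). Integer.parseInt is left untouched and keeps rejecting underscores.

```java
class DecodeUnderscoresSketch {
    // Hypothetical helper: strip underscores, then defer to Integer.decode,
    // which already understands the "0x" and leading-"0" radix prefixes.
    static int decodeIgnoringUnderscores(String s) {
        return Integer.decode(s.replace("_", ""));
    }

    // Reports whether the unmodified Integer.parseInt accepts the input.
    static boolean parseIntAccepts(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeIgnoringUnderscores("0x5_2"));
        System.out.println(decodeIgnoringUnderscores("05_2"));
        System.out.println(parseIntAccepts("5_2"));
    }
}
```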
>>
>>
>> REFLECTIVE APIS:
>>
>> No changes to reflective APIs are needed.
>>
>>
>> OTHER CHANGES:
>>
>> No other changes are needed.
>>
>>
>> MIGRATION:
>>
>> Underscores can be inserted into numbers within an existing code base as desired for readability.
>>
>>
>> COMPATIBILITY
>>
>> BREAKING CHANGES:
>>
>> Since use of underscores within numbers was previously a syntax error, this should not break any existing programs.
>>
>>
>> EXISTING PROGRAMS:
>>
>> This feature does not affect the format of class files. It is purely a notational convenience. Hence, interaction with existing class files would not be affected.
>>
>> REFERENCES
>>
>> EXISTING BUGS:
>>
>> A search of the Bug Database did not find any bug IDs related to this proposal.
>>
>> URL FOR PROTOTYPE (optional):
>>
>> None.
>>
>>
>>
>>
>