RFR: 8197594 - String and character repeat

James Laskey james.laskey at oracle.com
Sun Feb 18 09:37:34 UTC 2018


Didn’t I hear someone mentioning “\U1D11A” at some point?

> On Feb 18, 2018, at 1:10 AM, Stuart Marks <stuart.marks at oracle.com> wrote:
> 
> Fair enough. I'll be less unhappy if there is a way to convert from a code point to a String, as requested by JDK-4993841. This will reduce
> 
>    new String(Character.toChars(codepoint)).repeat(count)
> 
> to
> 
>    Character.toString(codepoint).repeat(count)
> 
> But this is still fairly roundabout. Since most cases are constants, the advice is to use a string literal instead of a char literal. This works for BMP characters, e.g. "-".repeat(10) or "\u2501".repeat(15). But if I want a non-BMP character as a string literal, I have to encode it into a surrogate pair myself. For example, a string literal containing the character U+1D11A MUSICAL SYMBOL FIVE-LINE STAFF would be "\uD834\uDD1A". Ugh! Or, I could just call a function and live with it not being a constant. It would be nice if there were an escape sequence that allowed any Unicode code point, including supplementary characters, to be put into a string literal.
> 
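
A short Java sketch of the surrogate-pair arithmetic described above, checking that U+1D11A really does become "\uD834\uDD1A". The names SurrogatePairs, encodeByHand, and encodeWithLibrary are illustrative only, not JDK API; encodeWithLibrary assumes Java 11 or later, where the Character.toString(int) overload exists.

```java
// Sketch: two ways to turn a supplementary code point into a Java String.
// Class and method names are hypothetical, for illustration only.
final class SurrogatePairs {
    private SurrogatePairs() {}

    /** Encodes a supplementary code point as a UTF-16 surrogate pair by hand. */
    static String encodeByHand(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point: " + codePoint);
        }
        int offset = codePoint - 0x10000;                // 20-bit value
        char high = (char) (0xD800 + (offset >>> 10));   // top 10 bits -> high surrogate
        char low = (char) (0xDC00 + (offset & 0x3FF));   // bottom 10 bits -> low surrogate
        return new String(new char[] {high, low});
    }

    /** Same result via the library; Character.toString(int) requires Java 11+. */
    static String encodeWithLibrary(int codePoint) {
        return Character.toString(codePoint);
    }
}
```

Both SurrogatePairs.encodeByHand(0x1D11A) and SurrogatePairs.encodeWithLibrary(0x1D11A) should equal the literal "\uD834\uDD1A". The two halves are also available individually through Character.highSurrogate(int) and Character.lowSurrogate(int) (Java 7+).
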
> s'marks
> 
>> On Feb 16, 2018, at 18:02, Brian Goetz <brian.goetz at oracle.com> wrote:
>> 
>> Disagree.  
>> 
>> On #2, most of the time the char being repeated is already a literal.  So just make it a string.  
>> 
>> On #3, better to aim for String.ofCodePoint(int) and compose with repeat.  
>> 
>> Down to one method again :)
>> 
>> Sent from my MacBook Wheel
>> 
>>> On Feb 16, 2018, at 5:13 PM, Stuart Marks <stuart.marks at oracle.com> wrote:
>>> 
>>> Let me put in an argument for handling code points:
>>> 
>>>> 3. public static String repeat(final int codepoint, final int count)
>>> 
>>> Most of the String and Character API handles code points on an equal footing with chars. I think this is important, as over time Unicode is continuing to add supplementary characters -- those that can't be represented in a Java char value. Examples abound of how such characters are mishandled. Therefore, I believe Java APIs should have full support for code points.
>>> 
>>> This is a small thing, and some might consider it a rare case -- how often does one need to repeat something like an emoji? The issue however isn't that particular use case. Instead what's required is the ability to handle *any Unicode character* uniformly, regardless of whether or not it's a supplementary character. The way to do that is to deal with code points, so any Java API that deals with character data must also handle code points.
>>> 
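
One way a code-point overload like #3 could be written, as a minimal sketch. CodePointRepeat is a hypothetical name, not JDK API, and the choice to throw IllegalArgumentException for a negative count or an invalid code point is an assumption. It relies only on long-standing methods (StringBuilder.appendCodePoint, Character.charCount, Character.isValidCodePoint, Math.multiplyExact).

```java
// Hypothetical static helper in the spirit of proposal #3; not JDK API.
final class CodePointRepeat {
    private CodePointRepeat() {}

    /** Returns codepoint repeated count times; supplementary code points become surrogate pairs. */
    static String repeat(int codepoint, int count) {
        if (count < 0) {
            throw new IllegalArgumentException("count is negative: " + count);
        }
        if (!Character.isValidCodePoint(codepoint)) {
            throw new IllegalArgumentException("invalid code point: " + codepoint);
        }
        // 1 char per repetition for BMP characters, 2 for supplementary ones;
        // multiplyExact throws ArithmeticException if the size would overflow an int.
        int capacity = Math.multiplyExact(Character.charCount(codepoint), count);
        StringBuilder sb = new StringBuilder(capacity);
        for (int i = 0; i < count; i++) {
            sb.appendCodePoint(codepoint);
        }
        return sb.toString();
    }
}
```

For example, CodePointRepeat.repeat(0x1D11A, 2) yields a string of length 4 holding two code points, while CodePointRepeat.repeat('-', 3) gives "---". BMP and supplementary characters go through the same call, which is the uniform handling argued for above.
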
>>> If we were to add just one method:
>>> 
>>>> 1. public String repeat(final int count)
>>> 
>>> the workaround is to take the character, turn it into a string, and call the repeat() method on it. For a 'char' value, this isn't too bad, but I'd argue it isn't pretty either:
>>> 
>>>  Character.toString(charVal).repeat(n)
>>> 
>>> But this only handles BMP characters, not supplementary characters. Unfortunately, there's no direct way to turn a code point into a string -- you have to turn it into a char array first! Thus, to get a string from a code point and repeat it, you have to do this:
>>> 
>>>  new String(Character.toChars(codepoint)).repeat(count)
>>> 
>>> This is enough indirection that it's hard to discover, and I suspect that most people won't put in the effort to do this correctly, resulting in more code that mishandles supplementary characters.
>>> 
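
The two workarounds above, side by side, as a sketch assuming Java 11+ for String.repeat. The wrapper names RepeatWorkarounds, repeatChar, and repeatCodePoint are hypothetical. The sketch shows why the char form is a trap: a supplementary code point only fits it after an explicit (char) cast, and that cast silently keeps just the low 16 bits.

```java
// Sketch of the two workarounds quoted above; String.repeat requires Java 11+.
// Class and method names are hypothetical, for illustration only.
final class RepeatWorkarounds {
    private RepeatWorkarounds() {}

    /** char-based workaround: correct only for BMP characters. */
    static String repeatChar(char ch, int count) {
        return Character.toString(ch).repeat(count);
    }

    /** code-point workaround: Character.toChars yields 1 char (BMP) or a 2-char surrogate pair. */
    static String repeatCodePoint(int codePoint, int count) {
        return new String(Character.toChars(codePoint)).repeat(count);
    }
}
```

RepeatWorkarounds.repeatChar((char) 0x1D11A, 3) compiles but repeats U+D11A, an unrelated Hangul syllable, whereas RepeatWorkarounds.repeatCodePoint(0x1D11A, 3) produces three copies of the staff symbol (six chars).
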
>>> Thus, I think we need to add API #3 that performs the repeat function on code points.
>>> 
>>> (Hm, the lack of Character.toString(codepoint) is covered by JDK-4993841, which is closed. I think I'll reopen it.)
>>> 
>>>> 2. public static String repeat(final char ch, final int count)
>>> 
>>> I can see that this API is not as important as one that handles code points, and it seems to be less frequently used according to Louis W's analysis. But if you have char data you want to repeat, not having this seems like an omission; it seems backwards to have to create a string from the char, only for repeat() to extract that char from that String in order to repeat it. Thus I'd vote for inclusion of this method as well.
>>> 
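
A sketch of what a char overload like #2 could look like if it filled a char[] directly rather than building a one-character String and having repeat() read it back. CharRepeat is a hypothetical name, and this says nothing about how the JDK itself implements repeat().

```java
import java.util.Arrays;

// Hypothetical static helper in the spirit of proposal #2; not JDK API.
final class CharRepeat {
    private CharRepeat() {}

    /** Returns ch repeated count times without an intermediate one-char String. */
    static String repeat(char ch, int count) {
        if (count < 0) {
            throw new IllegalArgumentException("count is negative: " + count);
        }
        char[] buf = new char[count];
        Arrays.fill(buf, ch);      // every slot gets the same char
        return new String(buf);    // the constructor copies buf
    }
}
```

For instance, CharRepeat.repeat('=', 5) returns "=====". Like the Character.toString(charVal) workaround, it still only covers BMP characters.
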
>>> s'marks
>>> 
>>> 
>>>> On 2/16/18 5:10 AM, Jim Laskey wrote:
>>>> We’re going with the one instance method (Louis clinched it.) with recommended enhancements and not touching CharSequence.
>>>> Working it up now.
>>>> — Jim
>>>>> On Feb 16, 2018, at 7:46 AM, Alan Bateman <Alan.Bateman at oracle.com> wrote:
>>>>> 
>>>>> On 15/02/2018 17:20, Jim Laskey wrote:
>>>>>> This is a pre-CSR code review [1] for String repeat methods (Enhancement).
>>>>>> 
>>>>>> The proposal is to introduce four new methods:
>>>>>> 
>>>>>> 1. public String repeat(final int count)
>>>>>> 2. public static String repeat(final char ch, final int count)
>>>>>> 3. public static String repeat(final int codepoint, final int count)
>>>>>> 4. public static String repeat(final CharSequence seq, final int count)
>>>>>> 
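
For comparison, a CharSequence overload like #4 can be expressed in one line on top of the instance method (#1). A sketch assuming Java 11+ for String.repeat; SequenceRepeat is a hypothetical name, not JDK API.

```java
import java.util.Objects;

// Hypothetical static helper in the spirit of proposal #4, built on String.repeat (Java 11+).
final class SequenceRepeat {
    private SequenceRepeat() {}

    /** Returns the characters of seq repeated count times. */
    static String repeat(CharSequence seq, int count) {
        Objects.requireNonNull(seq, "seq");
        // toString() snapshots the sequence; String.repeat rejects a negative count
        return seq.toString().repeat(count);
    }
}
```

SequenceRepeat.repeat(new StringBuilder("ab"), 3) returns "ababab".
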
>>>>> Just catching up on this thread and it's hard to see where the bidding is currently at. Are you planning to send an updated proposal? A list of methods, even if it's just one, is okay (implementation can follow later).
>>>>> 
>>>>> -Alan
>>> 
>> 
> 



More information about the core-libs-dev mailing list