RFR: 8197594 - String and character repeat
Stuart Marks
stuart.marks at oracle.com
Tue Feb 27 00:23:02 UTC 2018
To close the loop on this, I've reopened JDK-4993841, which requests adding an
API Character.toString(int) which converts an int codepoint value to a String.
(This seems like the obvious API, in parallel with Character.toString(char), but
of course alternatives could be considered.)
s'marks
On 2/20/18 11:46 AM, Louis Wasserman wrote:
> I'm with Brian: adding a separate API to make it easier to get from a
> codepoint to a String seems independently merited, and makes the single
> repeat API work well for that case. A very quick regex-powered search
> comes up with 183 hits in Google for (new
> String|String.copyValueOf|String.valueOf)(Character.toChars(..)).
>
> I do, however, recommend a separate thread for discussing that API :)
>
> On Tue, Feb 20, 2018 at 11:33 AM Kevin Bourrillion <kevinb at google.com>
> wrote:
>
>> Just to add another dimension to this data: most of the usages of our
>> repeat method (~75%) are in test code. These tests usually just want any
>> old test string of a certain length. Repeating a single character is the
>> obvious way to get that.
>>
>> Among production code usages (~25%), there are a few roughly equal use
>> cases: ascii indentation/alignment, redaction, and Martin's expected case
>> of "drawing" with ASCII symbols, and "other".
>>
>>
>>
>> On Thu, Feb 15, 2018 at 12:52 PM, Louis Wasserman <lowasser at google.com>
>> wrote:
>>
>>> I don't think there's a case for demand to merit having a
>>> repeat(CharSequence, int) at all.
>>>
>>> I did an analysis of usages of Guava's Strings.repeat on Google's
>>> codebase. Users might be rolling their own implementations, too, but this
>>> should be a very good proxy for demand.
>>>
>>> StringRepeat_SingleConstantChar = 4.475 K // strings with .length() == 1
>>> StringRepeat_SingleConstantCodePoint = 28 // strings with
>>> .codePointCount(...) == 1
>>> StringRepeat_MultiCodePointConstant = 1.156 K // constant strings neither
>>> of the above
>>> StringRepeat_CharSequenceToString = 2 //
>>> Strings.repeat(CharSequence.toString(), n)
>>> StringRepeat_NoneOfTheAbove = 248
>>>
>>> Notably, it seems like basically nobody needs to repeat a CharSequence --
>>> definitely not enough demand to merit the awkwardness of e.g.
>>> Rope.repeat(n) inheriting a repeat returning a String.
>>>
>>> Based on this data, I'd recommend providing one and only one method of
>>> this
>>> type: String.repeat(int). There's no real advantage to a static
>>> repeat(char, int) method when the overwhelming majority of these are
>>> constants: e.g. compare SomeUtilClass.repeat('*', n) versus "*".repeat(n).
>>> Character.toString(c).repeat(n) isn't a bad workaround if you don't have a
>>> constant char. There also isn't much demand for dealing with the code
>>> point case specially; the String.repeat(int) method seems like it'd handle
>>> that just fine.
>>>
>>> On Thu, Feb 15, 2018 at 11:44 AM Jim Laskey <james.laskey at oracle.com>
>>> wrote:
>>>
>>>>
>>>>
>>>>> On Feb 15, 2018, at 3:36 PM, Ivan Gerasimov <
>>> ivan.gerasimov at oracle.com>
>>>> wrote:
>>>>>
>>>>> Hello!
>>>>>
>>>>> The link with the webrev returned 404, but I could find it at this
>>>> location: http://cr.openjdk.java.net/~jlaskey/8197594/webrev-00/
>>>>>
>>>>> A few minor comments:
>>>>>
>>>>> 1)
>>>>>
>>>>> This check:
>>>>>
>>>>> 2992 final long limit = (long)count * 2L;
>>>>> 2993 if ((long)Integer.MAX_VALUE < limit) {
>>>>>
>>>>> can be possibly simplified as
>>>>> if (count > Integer.MAX_VALUE - count) {
>>>>
>>>> Good.
>>>>
>>>>>
>>>>> 2)
>>>>> Should String repeat(final int codepoint, final int count) be
>>> optimized
>>>> for codepoints that can be represented with a single char?
>>>>>
>>>>> E.g. like this:
>>>>>
>>>>> public static String repeat(final int codepoint, final int count) {
>>>>> return Character.isBmpCodePoint(codepoint))
>>>>> ? repeat((char) codepoint, count)
>>>>> : (new String(Character.toChars(codepoint))).repeat(count);
>>>>> }
>>>>
>>>> Yes, avoid array allocation.
>>>>
>>>>>
>>>>> 3)
>>>>> Using long arithmetic can possibly be avoided in the common path of
>>>> repeat(final int count):
>>>>>
>>>>> E.g. like this:
>>>>>
>>>>> if (count < 0) {
>>>>> throw new IllegalArgumentException("count is negative, " +
>>>> count);
>>>>> } else if (count == 1) {
>>>>> return this;
>>>>> } else if (count == 0) {
>>>>> return "";
>>>>> }
>>>>> final int len = value.length;
>>>>> if (Integer.MAX_VALUE / count < len) {
>>>>> throw new IllegalArgumentException(
>>>>> "Resulting string exceeds maximum string length:
>>> " +
>>>> ((long)len * (long)count));
>>>>> }
>>>>> final int limit = count * len;
>>>>
>>>> Good.
>>>>
>>>> Thank you.
>>>>
>>>>>
>>>>> With kind regards,
>>>>> Ivan
>>>>>
>>>>> On 2/15/18 9:20 AM, Jim Laskey wrote:
>>>>>> This is a pre-CSR code review [1] for String repeat methods
>>>> (Enhancement).
>>>>>>
>>>>>> The proposal is to introduce four new methods;
>>>>>>
>>>>>> 1. public String repeat(final int count)
>>>>>> 2. public static String repeat(final char ch, final int count)
>>>>>> 3. public static String repeat(final int codepoint, final int count)
>>>>>> 4. public static String repeat(final CharSequence seq, final int
>>> count)
>>>>>>
>>>>>> For the sake of transparency, only 1 is necessary, 2-4 are
>>> convenience
>>>> methods.
>>>>>> In the case of 2, “*”.repeat(10) performs as well as
>>> String.repeat(‘*’,
>>>> 10).
>>>>>> 3 and 4 convert to String before calling 1.
>>>>>>
>>>>>> Performance runs with jmh (results as comment in [2]) show that these
>>>>>> methods are significantly faster that StringBuilder equivalents.
>>>>>> - fewer memory allocations
>>>>>> - fewer char to byte array conversions
>>>>>> - faster pyramid replication vs O(N) copying
>>>>>>
>>>>>> I left StringBuilder out of scope. It falls under the category of
>>>>>> Appendables#append with repeat. A much bigger project.
>>>>>>
>>>>>> All comments welcome. Especially around the need for convenience
>>>>>> methods, the JavaDoc content and expanding the tests.
>>>>>>
>>>>>> — Jim
>>>>>>
>>>>>> [1] webrev:
>>>> http://cr.openjdk.java.net//oj/home/jlaskey/8197594/webrev-00
>>>>>> [2] jbs: https://bugs.openjdk.java.net/browse/JDK-8197594
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> With kind regards,
>>>>> Ivan Gerasimov
>>>>>
>>>>
>>>>
>>>
>>
>>
>>
>> --
>> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com
>>
More information about the core-libs-dev
mailing list