RFR: 8197594 - String and character repeat
Louis Wasserman
lowasser at google.com
Tue Feb 20 19:46:00 UTC 2018
I'm with Brian: adding a separate API to make it easier to get from a
codepoint to a String seems independently merited, and makes the single
repeat API work well for that case. A very quick regex-powered search
comes up with 183 hits in Google for (new
String|String.copyValueOf|String.valueOf)(Character.toChars(..)).
I do, however, recommend a separate thread for discussing that API :)
On Tue, Feb 20, 2018 at 11:33 AM Kevin Bourrillion <kevinb at google.com>
wrote:
> Just to add another dimension to this data: most of the usages of our
> repeat method (~75%) are in test code. These tests usually just want any
> old test string of a certain length. Repeating a single character is the
> obvious way to get that.
>
> Among production code usages (~25%), there are a few roughly equal use
> cases: ascii indentation/alignment, redaction, and Martin's expected case
> of "drawing" with ASCII symbols, and "other".
>
>
>
> On Thu, Feb 15, 2018 at 12:52 PM, Louis Wasserman <lowasser at google.com>
> wrote:
>
>> I don't think there's a case for demand to merit having a
>> repeat(CharSequence, int) at all.
>>
>> I did an analysis of usages of Guava's Strings.repeat on Google's
>> codebase. Users might be rolling their own implementations, too, but this
>> should be a very good proxy for demand.
>>
>> StringRepeat_SingleConstantChar = 4.475 K // strings with .length() == 1
>> StringRepeat_SingleConstantCodePoint = 28 // strings with
>> .codePointCount(...) == 1
>> StringRepeat_MultiCodePointConstant = 1.156 K // constant strings neither
>> of the above
>> StringRepeat_CharSequenceToString = 2 //
>> Strings.repeat(CharSequence.toString(), n)
>> StringRepeat_NoneOfTheAbove = 248
>>
>> Notably, it seems like basically nobody needs to repeat a CharSequence --
>> definitely not enough demand to merit the awkwardness of e.g.
>> Rope.repeat(n) inheriting a repeat returning a String.
>>
>> Based on this data, I'd recommend providing one and only one method of
>> this
>> type: String.repeat(int). There's no real advantage to a static
>> repeat(char, int) method when the overwhelming majority of these are
>> constants: e.g. compare SomeUtilClass.repeat('*', n) versus "*".repeat(n).
>> Character.toString(c).repeat(n) isn't a bad workaround if you don't have a
>> constant char. There also isn't much demand for dealing with the code
>> point case specially; the String.repeat(int) method seems like it'd handle
>> that just fine.
>>
>> On Thu, Feb 15, 2018 at 11:44 AM Jim Laskey <james.laskey at oracle.com>
>> wrote:
>>
>> >
>> >
>> > > On Feb 15, 2018, at 3:36 PM, Ivan Gerasimov <
>> ivan.gerasimov at oracle.com>
>> > wrote:
>> > >
>> > > Hello!
>> > >
>> > > The link with the webrev returned 404, but I could find it at this
>> > location: http://cr.openjdk.java.net/~jlaskey/8197594/webrev-00/
>> > >
>> > > A few minor comments:
>> > >
>> > > 1)
>> > >
>> > > This check:
>> > >
>> > > 2992 final long limit = (long)count * 2L;
>> > > 2993 if ((long)Integer.MAX_VALUE < limit) {
>> > >
>> > > can be possibly simplified as
>> > > if (count > Integer.MAX_VALUE - count) {
>> >
>> > Good.
>> >
>> > >
>> > > 2)
>> > > Should String repeat(final int codepoint, final int count) be
>> optimized
>> > for codepoints that can be represented with a single char?
>> > >
>> > > E.g. like this:
>> > >
>> > > public static String repeat(final int codepoint, final int count) {
>> > > return Character.isBmpCodePoint(codepoint))
>> > > ? repeat((char) codepoint, count)
>> > > : (new String(Character.toChars(codepoint))).repeat(count);
>> > > }
>> >
>> > Yes, avoid array allocation.
>> >
>> > >
>> > > 3)
>> > > Using long arithmetic can possibly be avoided in the common path of
>> > repeat(final int count):
>> > >
>> > > E.g. like this:
>> > >
>> > > if (count < 0) {
>> > > throw new IllegalArgumentException("count is negative, " +
>> > count);
>> > > } else if (count == 1) {
>> > > return this;
>> > > } else if (count == 0) {
>> > > return "";
>> > > }
>> > > final int len = value.length;
>> > > if (Integer.MAX_VALUE / count < len) {
>> > > throw new IllegalArgumentException(
>> > > "Resulting string exceeds maximum string length:
>> " +
>> > ((long)len * (long)count));
>> > > }
>> > > final int limit = count * len;
>> >
>> > Good.
>> >
>> > Thank you.
>> >
>> > >
>> > > With kind regards,
>> > > Ivan
>> > >
>> > > On 2/15/18 9:20 AM, Jim Laskey wrote:
>> > >> This is a pre-CSR code review [1] for String repeat methods
>> > (Enhancement).
>> > >>
>> > >> The proposal is to introduce four new methods;
>> > >>
>> > >> 1. public String repeat(final int count)
>> > >> 2. public static String repeat(final char ch, final int count)
>> > >> 3. public static String repeat(final int codepoint, final int count)
>> > >> 4. public static String repeat(final CharSequence seq, final int
>> count)
>> > >>
>> > >> For the sake of transparency, only 1 is necessary, 2-4 are
>> convenience
>> > methods.
>> > >> In the case of 2, “*”.repeat(10) performs as well as
>> String.repeat(‘*’,
>> > 10).
>> > >> 3 and 4 convert to String before calling 1.
>> > >>
>> > >> Performance runs with jmh (results as comment in [2]) show that these
>> > >> methods are significantly faster that StringBuilder equivalents.
>> > >> - fewer memory allocations
>> > >> - fewer char to byte array conversions
>> > >> - faster pyramid replication vs O(N) copying
>> > >>
>> > >> I left StringBuilder out of scope. It falls under the category of
>> > >> Appendables#append with repeat. A much bigger project.
>> > >>
>> > >> All comments welcome. Especially around the need for convenience
>> > >> methods, the JavaDoc content and expanding the tests.
>> > >>
>> > >> — Jim
>> > >>
>> > >> [1] webrev:
>> > http://cr.openjdk.java.net//oj/home/jlaskey/8197594/webrev-00
>> > >> [2] jbs: https://bugs.openjdk.java.net/browse/JDK-8197594
>> > >>
>> > >>
>> > >
>> > > --
>> > > With kind regards,
>> > > Ivan Gerasimov
>> > >
>> >
>> >
>>
>
>
>
> --
> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com
>
More information about the core-libs-dev
mailing list