RFR: 8197594 - String and character repeat
Louis Wasserman
lowasser at google.com
Thu Feb 15 20:52:11 UTC 2018
I don't think there's a case for demand to merit having a
repeat(CharSequence, int) at all.
I did an analysis of usages of Guava's Strings.repeat on Google's
codebase. Users might be rolling their own implementations, too, but this
should be a very good proxy for demand.
StringRepeat_SingleConstantChar = 4.475 K // strings with .length() == 1
StringRepeat_SingleConstantCodePoint = 28 // strings with
.codePointCount(...) == 1
StringRepeat_MultiCodePointConstant = 1.156 K // constant strings neither
of the above
StringRepeat_CharSequenceToString = 2 //
Strings.repeat(CharSequence.toString(), n)
StringRepeat_NoneOfTheAbove = 248
Notably, it seems like basically nobody needs to repeat a CharSequence --
definitely not enough demand to merit the awkwardness of e.g.
Rope.repeat(n) inheriting a repeat returning a String.
Based on this data, I'd recommend providing one and only one method of this
type: String.repeat(int). There's no real advantage to a static
repeat(char, int) method when the overwhelming majority of these are
constants: e.g. compare SomeUtilClass.repeat('*', n) versus "*".repeat(n).
Character.toString(c).repeat(n) isn't a bad workaround if you don't have a
constant char. There also isn't much demand for dealing with the code
point case specially; the String.repeat(int) method seems like it'd handle
that just fine.
On Thu, Feb 15, 2018 at 11:44 AM Jim Laskey <james.laskey at oracle.com> wrote:
>
>
> > On Feb 15, 2018, at 3:36 PM, Ivan Gerasimov <ivan.gerasimov at oracle.com>
> wrote:
> >
> > Hello!
> >
> > The link with the webrev returned 404, but I could find it at this
> location: http://cr.openjdk.java.net/~jlaskey/8197594/webrev-00/
> >
> > A few minor comments:
> >
> > 1)
> >
> > This check:
> >
> > 2992 final long limit = (long)count * 2L;
> > 2993 if ((long)Integer.MAX_VALUE < limit) {
> >
> > can be possibly simplified as
> > if (count > Integer.MAX_VALUE - count) {
>
> Good.
>
> >
> > 2)
> > Should String repeat(final int codepoint, final int count) be optimized
> for codepoints that can be represented with a single char?
> >
> > E.g. like this:
> >
> > public static String repeat(final int codepoint, final int count) {
> > return Character.isBmpCodePoint(codepoint))
> > ? repeat((char) codepoint, count)
> > : (new String(Character.toChars(codepoint))).repeat(count);
> > }
>
> Yes, avoid array allocation.
>
> >
> > 3)
> > Using long arithmetic can possibly be avoided in the common path of
> repeat(final int count):
> >
> > E.g. like this:
> >
> > if (count < 0) {
> > throw new IllegalArgumentException("count is negative, " +
> count);
> > } else if (count == 1) {
> > return this;
> > } else if (count == 0) {
> > return "";
> > }
> > final int len = value.length;
> > if (Integer.MAX_VALUE / count < len) {
> > throw new IllegalArgumentException(
> > "Resulting string exceeds maximum string length: " +
> ((long)len * (long)count));
> > }
> > final int limit = count * len;
>
> Good.
>
> Thank you.
>
> >
> > With kind regards,
> > Ivan
> >
> > On 2/15/18 9:20 AM, Jim Laskey wrote:
> >> This is a pre-CSR code review [1] for String repeat methods
> (Enhancement).
> >>
> >> The proposal is to introduce four new methods;
> >>
> >> 1. public String repeat(final int count)
> >> 2. public static String repeat(final char ch, final int count)
> >> 3. public static String repeat(final int codepoint, final int count)
> >> 4. public static String repeat(final CharSequence seq, final int count)
> >>
> >> For the sake of transparency, only 1 is necessary, 2-4 are convenience
> methods.
> >> In the case of 2, “*”.repeat(10) performs as well as String.repeat(‘*’,
> 10).
> >> 3 and 4 convert to String before calling 1.
> >>
> >> Performance runs with jmh (results as comment in [2]) show that these
> >> methods are significantly faster that StringBuilder equivalents.
> >> - fewer memory allocations
> >> - fewer char to byte array conversions
> >> - faster pyramid replication vs O(N) copying
> >>
> >> I left StringBuilder out of scope. It falls under the category of
> >> Appendables#append with repeat. A much bigger project.
> >>
> >> All comments welcome. Especially around the need for convenience
> >> methods, the JavaDoc content and expanding the tests.
> >>
> >> — Jim
> >>
> >> [1] webrev:
> http://cr.openjdk.java.net//oj/home/jlaskey/8197594/webrev-00
> >> [2] jbs: https://bugs.openjdk.java.net/browse/JDK-8197594
> >>
> >>
> >
> > --
> > With kind regards,
> > Ivan Gerasimov
> >
>
>
More information about the core-libs-dev
mailing list