PROPOSAL: Simplified StringBuffer/StringBuilder syntax
Gabriel Belingueres
belingueres at gmail.com
Sat Mar 28 15:09:30 PDT 2009
I compiled this code:
String getString(List<String> baz) {
int i = 2;
String foo = "abcd" + "efgh";
if (i % 2 == 0) {
foo += "ghij" + 42 + "klmn";
}
for (String bar : baz) {
foo += bar + "\n";
}
return foo;
}
which decompiles to this:
String getString(List baz)
{
int i = 2;
String foo = "abcdefgh";
if(i % 2 == 0)
foo = (new
StringBuilder(String.valueOf(foo))).append("ghij42klmn").toString();
for(Iterator iterator = baz.iterator(); iterator.hasNext();)
{
String bar = (String)iterator.next();
foo = (new
StringBuilder(String.valueOf(foo))).append(bar).append("\n").toString();
}
return foo;
}
Maybe another solution would be to make the compiler smarter.
Here, the String foo is a local variable whose intermediate values are
never used except as the left operand of the next concatenation (its
final value is read only once, by the return statement), which makes it
a perfect candidate for an optimization that creates only one StringBuilder.
I'm astonished to find that the compiler didn't optimize the +=
assignment, though.
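For comparison, here is a hand-written sketch of what such an optimization could produce for the method above: a single StringBuilder for the whole method, converted to a String only once, at the return. The class name SingleBuilder and the sample list are illustrative, not part of any existing compiler.

```java
import java.util.Arrays;
import java.util.List;

class SingleBuilder {
    // The method from the message above, written with String +=.
    static String original(List<String> baz) {
        int i = 2;
        String foo = "abcd" + "efgh";
        if (i % 2 == 0) {
            foo += "ghij" + 42 + "klmn";
        }
        for (String bar : baz) {
            foo += bar + "\n";
        }
        return foo;
    }

    // Hand-written equivalent: one StringBuilder for the whole method,
    // converted to a String only at the return.
    static String optimized(List<String> baz) {
        int i = 2;
        StringBuilder foo = new StringBuilder("abcd").append("efgh");
        if (i % 2 == 0) {
            foo.append("ghij").append(42).append("klmn");
        }
        for (String bar : baz) {
            foo.append(bar).append("\n");
        }
        return foo.toString();
    }

    public static void main(String[] args) {
        List<String> baz = Arrays.asList("x", "y");
        System.out.println(original(baz).equals(optimized(baz))); // true
    }
}
```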
2009/3/28 Derek Foster <vapor1 at teleport.com>:
> Simplified StringBuffer/StringBuilder syntax.
>
> AUTHOR: Derek Foster
>
>
> OVERVIEW
>
> The Java language is designed to use immutable strings. Strings are intended to be constructed via one of two builder classes: StringBuffer (older, and slower due to synchronization) and StringBuilder (newer, and faster). Using these classes can make constructing Strings out of multiple pieces considerably more efficient than repeatedly concatenating strings together. These classes are often used by the Java compiler to concatenate strings behind the scenes, so that an expression like this:
>
> String foo = "abcd" + 42 + "efgh";
>
> might be compiled as:
>
> String foo = new StringBuilder("abcd").append(42).append("efgh").toString();
>
> (Note: For clarity in this overview, I am ignoring the impact of constant folding by the compiler. Constant folding will be discussed more later.)
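As a small illustration of constant folding: when every operand is a compile-time constant, the compiler joins the pieces itself and no builder is used at runtime. Because constant strings are interned, the folded result is the very same object as the equivalent literal, which a runtime concatenation never is. The class and method names are illustrative.

```java
class ConstantFolding {
    // Both operands are literals: folded at compile time into "abcdefgh".
    static boolean literalsFolded() {
        return ("abcd" + "efgh") == "abcdefgh";
    }

    // A final local initialized with a constant is a constant variable,
    // so this concatenation is folded as well.
    static boolean finalLocalFolded() {
        final String a = "abcd";
        return (a + "efgh") == "abcdefgh";
    }

    // A non-final local is not a constant: the concatenation runs at
    // runtime and yields a newly created, non-interned String.
    static boolean runtimeConcatFolded() {
        String a = "abcd";
        return (a + "efgh") == "abcdefgh";
    }

    public static void main(String[] args) {
        System.out.println(literalsFolded());      // true
        System.out.println(finalLocalFolded());    // true
        System.out.println(runtimeConcatFolded()); // false
    }
}
```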
>
> Unfortunately, the syntax for constructing strings using these builder classes differs significantly from that of constructing strings using string concatenation. This has proved to be a barrier to their use. For instance, programmers can easily write and understand the following:
>
> String getString(List<String> baz) {
> String foo = "abcd" + "efgh";
> if (whatever) {
> foo += "ghij" + 42 + "klmn";
> }
> for (String bar : baz) {
> foo += bar + "\n";
> }
> return foo;
> }
>
> This syntax is clear and readable. It's also quite inefficient, since each += statement creates a new StringBuilder, copies the current contents of foo into it, and throws the builder away after converting the result back to a String. This results in many unnecessary memory allocations. For instance, the 'foreach' loop copies O(N^2) characters in total for N list elements, since each execution of the loop body allocates a new String containing all of the prior contents.
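The quadratic cost can be made concrete under a simple model. Assume (as a simplification) that foo starts with L0 characters and that each of the N iterations appends k characters. Iteration i produces a new String of L0 + i*k characters, every one of which must be copied, so the loop copies at least

```latex
\sum_{i=1}^{N} \left( L_0 + i k \right) = N L_0 + k \, \frac{N(N+1)}{2} \in \Omega(N^2)
```

characters. With L0 = 0, k = 1 and N = 1000, that is already 500,500 characters copied to build a 1,000-character result.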
>
> Regardless, the programmers who wrote such code may legitimately wonder why on earth someone would want to use a StringBuilder to do the same thing, when the syntax is so much more verbose and awkward:
>
> String getString(List<String> baz) {
> StringBuilder foo = new StringBuilder("abcd");
> foo.append("efgh");
> if (whatever) {
> foo.append("ghij");
> foo.append(42);
> foo.append("klmn");
> }
> for (String bar : baz) {
> foo.append(bar);
> foo.append("\n");
> }
> return foo.toString();
> }
>
> Note that now, the important parts of the code (the strings being appended) are swallowed up in the syntax required to invoke the "append" method repeatedly. This impairs readability significantly. However, it is much more efficient. The loop now likely performs only O(log N) reallocations (and O(N) total copying) instead of copying O(N^2) characters, since the StringBuilder can roughly double its allocated capacity each time the existing capacity is exceeded, and most append operations simply use memory that has already been allocated.
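To see where the logarithmic count comes from, the following sketch simulates a growable buffer and counts reallocations. The growth rule (new capacity = 2 * old capacity + 2, or the required length if that is larger) is an assumption modeled on common StringBuilder implementations, not something the StringBuilder specification guarantees; the class and method names are hypothetical. Starting from the default capacity of 16, appending 1,000 one-character strings triggers only 6 reallocations, and appending 1,000,000 of them only 16.

```java
class GrowthModel {
    // Hypothetical model of a growable character buffer. The growth rule
    // below is an assumption, not a guarantee of the StringBuilder spec.
    static int reallocations(int initialCapacity, int chunks, int chunkLength) {
        int capacity = initialCapacity;
        int length = 0;
        int count = 0;
        for (int c = 0; c < chunks; c++) {
            int required = length + chunkLength;
            if (required > capacity) {
                // Grow once; the existing 'length' characters get copied.
                capacity = Math.max(2 * capacity + 2, required);
                count++;
            }
            length = required;
        }
        return count;
    }

    public static void main(String[] args) {
        // 16 is the documented initial capacity of new StringBuilder().
        System.out.println(reallocations(16, 1000, 1));      // 6
        System.out.println(reallocations(16, 1000000, 1));   // 16
    }
}
```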
>
> Even using the chaining syntax allowed by the append method doesn't make things much better:
>
> String getString(List<String> baz) {
> StringBuilder foo = new StringBuilder("abcd").append("efgh");
> if (whatever) {
> foo.append("ghij").append(42).append("klmn");
> }
> for (String bar : baz) {
> foo.append(bar).append("\n");
> }
> return foo.toString();
> }
>
> This code has the same problem as the previous example in that the syntax necessary to call "append" multiple times dwarfs the values actually being appended. This makes it easy to miss bugs when reading the code. Also, long chains of method calls like these are seldom handled well by automatic formatting utilities such as are found in Eclipse and other development tools.
>
> As a result, many readability-minded programmers have simply decided that StringBuffer and StringBuilder are not worth the trouble to use in the vast majority of cases, due to their uglier syntax. These programmers have decided that directly concatenating strings to other strings is worth it for the simplicity of syntax even though it results in considerably less efficient code (with lots of extra memory allocation and more work for the garbage collector) than using StringBuffer or StringBuilder.
>
> Other, more efficiency-minded programmers have written large string-processing methods with large numbers of calls to "append" (often, each on a separate line), which results in code that is quite efficient, but three or four times as long as it would be if it were written using string concatenation. Even simple algorithms can become huge when written in this style.
>
> Yet other programmers try to mix the approaches, like this:
>
> foo.append("ghij"+42+"klmn");
>
> This is in between the other examples given above, both in readability and efficiency, since it is typically expanded by a compiler to something like:
>
> foo.append(new StringBuilder("ghij").append(42).append("klmn").toString());
>
> rather than
>
> foo.append("ghij").append(42).append("klmn");
>
> which would be more efficient.
>
>
> This proposal attempts to introduce new syntax to remove the currently existing tradeoff between readability and efficiency when performing string concatenation.
>
>
> FEATURE SUMMARY:
>
> This proposal suggests that StringBuilder and StringBuffer should allow syntax similar to that which is used to append and assign strings to each other. In particular, the Java language should be modified to allow the "+=" and "=" operators to be used with a StringBuffer or StringBuilder on the left-hand side and a String-valued expression on the right-hand side. The following forms of expressions would then become legal:
>
> StringBuilder foo = "abc";
> foo += "abc";
> foo = "abc";
>
> Furthermore, the following special cases would be recognized, and optimized further by the compiler:
>
> StringBuilder foo = "abc" + 42 + "def";
> foo += "abc" + 42 + "def";
> foo = "abc" + 42 + "def";
>
>
> Using desugarings as described below, these would be expanded by the compiler into code that is as efficient as writing expressions using the existing StringBuilder/StringBuffer APIs.
>
>
> MAJOR ADVANTAGE:
>
> The syntax for creating strings using the efficient StringBuffer and StringBuilder classes will become simpler and clearer, with less clutter, which will give people fewer reasons to avoid their use.
>
>
> MAJOR BENEFIT:
>
> Elimination of the existing tradeoff between efficiency and readability will mean that programmers will have either more readable programs or more efficient programs, depending on which of these alternatives they were used to choosing.
>
>
> MAJOR DISADVANTAGE:
>
> Compiler vendors would have to implement the new feature. This feature has been designed to be relatively easy to implement, but it will still take some effort.
>
> Some programmers might be confused by the fact that these formerly illegal expressions are now legal and have defined semantics.
>
>
> ALTERNATIVES:
>
> The standard workarounds to this problem are shown above. Each has drawbacks, in either efficiency or readability.
>
> As another alternative, it would be possible for a compiler to do more extensive analysis of String expressions, considering the entire body of a method, and invisibly substituting a StringBuilder for the String up until the point it was needed for assignment to another String, passed to a String-valued method parameter, or returned. With such a change, it might be unnecessary for programmers to ever explicitly use StringBuilder or StringBuffer in their code. This type of sophisticated analysis and optimization is in principle possible, but would be quite a challenge for a compiler vendor to implement. Also, efficiency-minded Java programmers would have to be "untrained" from the widespread advice that using StringBuilders directly is the way to achieve efficient code.
>
>
> SIMPLE EXAMPLE:
>
> The following method written using current syntax:
>
> String getFoo() {
> StringBuilder foo = new StringBuilder("abc");
> foo.append("def");
> return foo.toString();
> }
>
> could be reduced to the following with the new syntax:
>
> String getFoo() {
> StringBuilder foo = "abc";
> foo += "def";
> return foo.toString();
> }
>
>
> ADVANCED EXAMPLE:
>
> Assuming the existence of the following class:
>
> class Person {
> public String name;
> public int age;
> public int weight;
> }
>
> The following method:
>
> class People {
> private List<Person> people = ...;
> public String toString() {
> StringBuilder result = new StringBuilder("{");
> for (Person person : people) {
> result.append("{name=").append(person.name);
> result.append(",age=").append(person.age);
> result.append(",weight=").append(person.weight).append("}");
> }
> result.append("}");
> return result.toString();
> }
> }
>
> could be reduced by the new constructs to:
>
> class People {
> private List<Person> people = ...;
> public String toString() {
> StringBuilder result = "{";
> for (Person person : people) {
> result += "{name=" + person.name + ",age=" + person.age + ",weight=" + person.weight + "}";
> }
> result += "}";
> return result.toString();
> }
> }
>
> and would be just as efficient (and would ideally translate to the same compiler-generated code).
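For reference, under the desugaring described below, the proposed loop body corresponds to a single append chain. The following sketch writes that expansion out in today's Java so the output can be checked; the Person constructor, the People constructor, and the sample data are hypothetical additions, since the original leaves the list contents elided.

```java
import java.util.Arrays;
import java.util.List;

class Person {
    public String name;
    public int age;
    public int weight;

    // Convenience constructor (not in the original class).
    Person(String name, int age, int weight) {
        this.name = name;
        this.age = age;
        this.weight = weight;
    }
}

class People {
    private final List<Person> people;

    People(List<Person> people) {
        this.people = people;
    }

    @Override
    public String toString() {
        StringBuilder result = new StringBuilder("{");
        for (Person person : people) {
            // Expansion of: result += "{name=" + person.name + ",age=" + person.age
            //                         + ",weight=" + person.weight + "}";
            result.append("{name=").append(person.name)
                  .append(",age=").append(person.age)
                  .append(",weight=").append(person.weight)
                  .append("}");
        }
        result.append("}");
        return result.toString();
    }

    public static void main(String[] args) {
        People p = new People(Arrays.asList(
                new Person("Ann", 30, 60),
                new Person("Bob", 40, 80)));
        System.out.println(p); // {{name=Ann,age=30,weight=60}{name=Bob,age=40,weight=80}}
    }
}
```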
>
>
> DETAILS
>
> For conciseness, the discussion below refers to java.lang.StringBuilder only, but the intent of this proposal is that java.lang.StringBuffer be treated in the same manner.
>
>
> SPECIFICATION:
>
> Note that in the following, the intent is not to create special type-conversion rules between String and StringBuilder, since that would complicate method overloading and other mechanisms of Java and would dramatically widen the impact of this proposal. As such, this proposal only seeks to add new, very limited-use overloads of the existing = and += operators, without altering how the String and StringBuilder types are otherwise used within the language.
>
>
> INITIALIZATION: A declaration of the form:
>
> StringBuilder SB = S;
>
> where S is an expression of type String, shall be considered to have meaning as defined below. (Previously, this was a compile-time error.)
>
>
> CONCATENATION: An expression of the form
>
> SB += S
>
> where SB is an RValue expression of type StringBuilder, and S is an expression of type String, shall be considered to have meaning as defined below. (Previously, this was a compile-time error.)
>
> SELF-CONCATENATION:
>
> An expression of the form
>
> SB = SB + S
>
> where SB is an LValue expression of type StringBuilder, and S is an expression of type String, shall be considered to have meaning as defined below. (Previously, this was a compile-time error.) Note that SB must be provable by the compiler to denote the same variable in both instances.
>
>
> ASSIGNMENT: An expression of the form:
>
> SB = S
>
> where SB is an LValue expression of type StringBuilder, and S is an expression of type String, shall be considered to have meaning as defined below. (Previously, this was a compile-time error.)
>
>
>
> COMPILATION:
>
> The expressions as shown above shall be compiled to normal class files, desugared as follows.
>
> In the following discussion, the expression "S" shall refer to an arbitrary String expression.
>
> In the following discussion, the expression "A + B + C + ..." refers to a special case of String expressions: namely, a String concatenation expression consisting of an arbitrary number of operands being concatenated together (of which at least one is a String, as per the normal Java rules on the "+" String concatenation operator). Optimizations for this common special case are as shown below.
>
> [For the purpose of detecting this special case, constant folding optimizations should first be applied by the compiler. Also, redundant parentheses enclosing string concatenation subexpressions should be flattened prior to analysis. For instance, "A + B + (C + D)" may be treated the same as "A + B + C + D" if "C + D" is also a String concatenation expression.]
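One subtlety in identifying the operands: "+" is left-associative and only becomes string concatenation once a String operand is involved, so 1 + 2 + "x" parses as (1 + 2) + "x". Its concatenation operands are therefore (1 + 2) and "x", not 1, 2 and "x". The sketch below (illustrative names) shows that appending the correct operands reproduces the String result, while naively splitting at every "+" does not.

```java
class OperandFlattening {
    // Plain string concatenation: (1 + 2) + "x" evaluates to "3x".
    static String concatenation() {
        return 1 + 2 + "x";
    }

    // Correct desugaring: the numeric subexpression stays intact.
    static String correctAppends() {
        return new StringBuilder().append(1 + 2).append("x").toString();
    }

    // Incorrect desugaring: treating every '+' as concatenation.
    static String naiveAppends() {
        return new StringBuilder().append(1).append(2).append("x").toString();
    }

    public static void main(String[] args) {
        System.out.println(concatenation());  // 3x
        System.out.println(correctAppends()); // 3x
        System.out.println(naiveAppends());   // 12x
    }
}
```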
>
> Within the context of the preceding definitions, then:
>
> Declarations of the "INITIALIZATION" form specified above:
>
> StringBuilder SB = S;
> StringBuilder SB = A + B + C + ...;
>
> shall be desugared to:
>
> StringBuilder SB = new StringBuilder(S);
> StringBuilder SB = new StringBuilder(String.valueOf(A)).append(B).append(C)....;
>
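A caution on the INITIALIZATION desugaring: StringBuilder has a StringBuilder(int capacity) constructor, so if the first operand A has type int (or char, which widens to int), passing it straight to the constructor sets the initial capacity instead of appending the value. Converting the first operand with String.valueOf, which is also what javac emits in the decompiled output at the top of this message, avoids the problem. The class name below is illustrative.

```java
class ConstructorOverload {
    // Intended meaning: StringBuilder sb = a + "abc";  i.e. "42abc".
    static String direct() {
        int a = 42;
        // Resolves to StringBuilder(int capacity): the 42 is lost.
        return new StringBuilder(a).append("abc").toString();
    }

    static String viaValueOf() {
        int a = 42;
        // Resolves to StringBuilder(String): the 42 is kept.
        return new StringBuilder(String.valueOf(a)).append("abc").toString();
    }

    static String charCase() {
        char c = 'Z';
        // char widens to int, so this also selects the capacity constructor.
        return new StringBuilder(c).append("abc").toString();
    }

    public static void main(String[] args) {
        System.out.println(direct());     // abc
        System.out.println(viaValueOf()); // 42abc
        System.out.println(charCase());   // abc
    }
}
```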
>
> Expressions of the "CONCATENATION" form specified above:
>
> SB += S
> SB += A + B + C + ...
>
> shall be desugared to:
>
> SB.append(S)
> SB.append(A).append(B).append(C)....
>
>
> Expressions of the "SELF-CONCATENATION" form specified above:
>
> SB = SB + S
> SB = SB + A + B + C + ...
>
> shall be desugared to:
>
> SB.append(S)
> SB.append(A).append(B).append(C)....
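A semantic consequence of desugaring SB += S and SB = SB + S to SB.append(...) is that the existing builder object is modified in place, so any other reference to the same StringBuilder observes the change. With String, += instead creates a new String and rebinds only the variable on the left. The sketch below (illustrative names) contrasts the two using today's syntax.

```java
class AliasingContrast {
    // String +=: a new String is created; the alias still sees the old value.
    static String stringAlias() {
        String s = "abc";
        String alias = s;
        s += "def";
        return alias;
    }

    // Desugared StringBuilder +=: append mutates the shared object.
    static String builderAlias() {
        StringBuilder sb = new StringBuilder("abc");
        StringBuilder alias = sb;
        sb.append("def"); // the expansion of the proposed: sb += "def"
        return alias.toString();
    }

    public static void main(String[] args) {
        System.out.println(stringAlias());  // abc
        System.out.println(builderAlias()); // abcdef
    }
}
```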
>
>
> Expressions of the "ASSIGNMENT" form specified above:
>
> SB = S
> SB = A + B + C + ...
>
> shall be desugared to:
>
> SB = new StringBuilder(S)
> SB = new StringBuilder(String.valueOf(A)).append(B).append(C)....
>
>
>
>
> TESTING:
>
> Expressions and statements of the above types can be constructed and compared with the results of their desugared equivalents.
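A minimal sketch of such a test, written in today's Java: each check builds the expected String with ordinary concatenation and compares it with the hand-desugared StringBuilder form. The class and helper names are hypothetical; a real compiler test would instead compile the new syntax and compare the results against these expansions.

```java
class DesugaringCheck {
    // Throws if the builder's contents differ from the expected String.
    static void check(String expected, StringBuilder actual) {
        if (!expected.equals(actual.toString())) {
            throw new AssertionError("expected \"" + expected
                    + "\" but got \"" + actual + "\"");
        }
    }

    // Each step pairs a proposed form (in the comment) with its expansion.
    static void runAll() {
        int n = 42;

        // INITIALIZATION: StringBuilder sb = "abc" + n + "def";
        StringBuilder sb = new StringBuilder("abc").append(n).append("def");
        String s = "abc" + n + "def";
        check(s, sb);

        // CONCATENATION: sb += "ghi" + n;
        sb.append("ghi").append(n);
        s += "ghi" + n;
        check(s, sb);

        // SELF-CONCATENATION: sb = sb + "jkl" + n;
        sb.append("jkl").append(n);
        s = s + "jkl" + n;
        check(s, sb);

        // ASSIGNMENT: sb = "xyz" + n;
        sb = new StringBuilder("xyz").append(n);
        s = "xyz" + n;
        check(s, sb);
    }

    public static void main(String[] args) {
        runAll();
        System.out.println("all desugarings match");
    }
}
```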
>
>
> LIBRARY SUPPORT:
>
> No changes to supporting libraries are needed.
>
>
> REFLECTIVE APIS:
>
> No changes to reflective APIs are needed.
>
>
> OTHER CHANGES:
>
> No other changes to the Java platform are needed.
>
>
> MIGRATION:
>
> See simple and advanced examples above.
>
>
>
> COMPATIBILITY
>
>
> BREAKING CHANGES:
>
> Since all of the proposed forms currently cause compile-time errors, this change will not break any existing programs.
>
>
> EXISTING PROGRAMS:
>
> Since the class file format does not need to change as a result of this feature, interaction with existing class files is not affected.
>
>
> REFERENCES
>
>
> EXISTING BUGS:
>
> I searched the bug database but was unable to find any enhancement proposals similar to this one.
>
>
> URL FOR PROTOTYPE (optional):
>
> None
>
>
>