Proposal: Replace foreach loop with Iterable.forEach in String.join(CharSequence, Iterable)

Claes Redestad claes.redestad at oracle.com
Tue Aug 17 19:06:50 UTC 2021


Hi Don,

On 2021-08-17 19:05, Donald Raab wrote:
> Hi Claes,
> 
>> so the win comes from forEach on the synchronized collection implicitly
>> being synchronized atomically, whereas foreach will synchronize on
>> each iterator operation and thus allow racy modification of elements
>> in between iteration, right?
> 
> Yes, this win will be applied to any type that does something special in forEach. For instance we have MultiReader collections in Eclipse Collections, which take a read lock on forEach, and throws UnsupportedOperationException on iterator. This change would make String.join safe to use for these as well, without requiring an explicit lock in client code.

Thanks, I think this will help inform the decision-making.

> 
>>
>> Seems reasonable to consider, though it'd have to percolate through to most production deployments before you could use it - and even then
>> it seems somewhat fragile. And isn't this just one of many utility
>> methods that would have to be fixed to avoid issues with hidden
>> iterators completely?
>>
> 
> Yes, this is one small step. Synchronized collections wouldn’t become less tricky to use, but there would be one less hidden iterator to worry about.

Just thinking in terms of whether this ought to be a concerted effort.
And perhaps some paranoid fear it might spiral out of control unless
we're wary. :-)

Fixing the OpenJDK implementation is one thing, but I think this ought
to be also considered on a specification (and JCK) level to avoid
creating a dependency on the implementation.


> 
>> String.join was recently changed to this by JDK-8265237[1]:
>>
>>         for (CharSequence cs: elements) {
>>             if (size >= elems.length) {
>>                 elems = Arrays.copyOf(elems, elems.length << 1);
>>             }
>>             elems[size++] = String.valueOf(cs);
>>         }
>>
> 
> This looks like the code for the overloaded String.join which takes an array. I was referring to only the version that takes Iterable.

I'm looking at the Iterable-based implementation in openjdk/jdk:
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L3320

> 
>> Of course the body in the new code could be extracted to the same
>> effect. The drawback in either implementation is that it would by
>> necessity be a capturing lambda, so might add a small allocation
>> overhead. I'm biased towards avoiding performance overheads, but maybe
>> someone else can make a call on whether this is a worthwhile thing to
>> attempt.
>>
> 
> There is a trade-off here. You remove the need to create an iterator for all types (due to foreach loop), and replace with forEach with a method reference, which IIRC might be slightly better than having a lambda. The underlying collection (let’s say ArrayList) will be called which will also not create an iterator but iterate directly over the ArrayList internal array.

Method references and lambdas should boil down to the same thing at
runtime. The differentiator for performance is when you need to capture
some state, in which the lambda won't be a singleton but instead each
invocation will allocate an object that captures the state. I believe
this would happen with joiner::add pre-JDK-8265237 (the method ref will
capture joiner).

But you're right that for many (most?) collections then forEach is
specialized and we could become more or less performance neutral. I've
prototyped a solution using a local Consumer class, along with a
microbenchmark, here:

https://github.com/openjdk/jdk/pull/5152/files

Both baseline and prototype allocates about as much (ArrayList-based),
while the prototype actually comes out slightly faster:

Benchmark 
(count)  (length)  (mode)  Mode  Cnt    Score    Error   Units
StringJoinerBenchmark.joinIterable 
    5         1   latin  avgt   10  362.882 ± 6.789   ns/op # 18-b09 
baseline
StringJoinerBenchmark.joinIterable 
    5         1   latin  avgt   10  394.977 ± 9.448   ns/op # prototype

Collections without a forEach specialization might suffer since the 
default method will go via the iterator, but that should be acceptable
from a performance point of view.

Thanks,
Claes

> 
> Thanks,
> Don
> 
>> Thanks!
>>
>> /Claes
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8265237
>>
>> On 2021-08-17 06:55, Donald Raab wrote:
>>> The following code:
>>> for (CharSequence cs: elements) {
>>>      joiner.add(cs);
>>> }
>>> Can be replaced with:
>>> elements.forEach(joiner::add);
>>> This would have the minor benefit of making it safe to use String.join with synchronized collections without requiring a client side lock. There are likely other opportunities like this in the JDK.
>>> Also, with the addition of forEach on Iterable, and Predicate in Java 8, the following design FAQ is outdated.
>>> https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/util/doc-files/coll-designfaq.html#a6 <https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/util/doc-files/coll-designfaq.html#a6>
>>> Thanks,
>>> Don
> 


More information about the core-libs-dev mailing list