[NEW BUG]: Avoid allocations in String.replace(CharSequence, CharSequence) in case no replacement happened
Hey, I just noticed a small improvement for String.replace(CharSequence, CharSequence) in case the target sequence is not found. There is the possibility of an early-out in case there is nothing to replace. Yet there is a call to replacement.toString(); While this is usually not a problem in case of String parameters, a StringBuilder - or other more "dynamic" CharSequence.toString() implementations - will likely produce unnecessary allocations here. I think the JDK could prevent that with the given patch below. I would be very happy if this is sponsored for JDK 10. Cheers, Christoph ===== PATCH ====== diff --git a/src/java.base/share/classes/java/lang/String.java b/src/java.base/share/classes/java/lang/String.java --- a/src/java.base/share/classes/java/lang/String.java +++ b/src/java.base/share/classes/java/lang/String.java @@ -2177,11 +2177,11 @@ */ public String replace(CharSequence target, CharSequence replacement) { String tgtStr = target.toString(); - String replStr = replacement.toString(); int j = indexOf(tgtStr); if (j < 0) { return this; } + String replStr = replacement.toString(); int tgtLen = tgtStr.length(); int tgtLen1 = Math.max(tgtLen, 1); int thisLen = length();
Seems like a good idea. You probably want to null check 'replacement' before the bail out as the method is specified as throwing NPE in that case. On Wed, Mar 1, 2017 at 4:37 PM Christoph Dreis <christoph.dreis@freenet.de> wrote:
Hey,
I just noticed a small improvement for String.replace(CharSequence, CharSequence) in case the target sequence is not found.
There is the possibility of an early-out in case there is nothing to replace. Yet there is a call to replacement.toString(); While this is usually not a problem in case of String parameters, a StringBuilder - or other more "dynamic" CharSequence.toString() implementations - will likely produce unnecessary allocations here.
I think the JDK could prevent that with the given patch below.
I would be very happy if this is sponsored for JDK 10.
Cheers, Christoph
===== PATCH ====== diff --git a/src/java.base/share/classes/java/lang/String.java b/src/java.base/share/classes/java/lang/String.java --- a/src/java.base/share/classes/java/lang/String.java +++ b/src/java.base/share/classes/java/lang/String.java @@ -2177,11 +2177,11 @@ */ public String replace(CharSequence target, CharSequence replacement) { String tgtStr = target.toString(); - String replStr = replacement.toString(); int j = indexOf(tgtStr); if (j < 0) { return this; } + String replStr = replacement.toString(); int tgtLen = tgtStr.length(); int tgtLen1 = Math.max(tgtLen, 1); int thisLen = length();
-- Sent from my phone
Hey Vitaly,
Seems like a good idea. You probably want to null check 'replacement' before the bail out as the method is specified as throwing NPE in that case.
Thanks for your feedback. See the adjusted patch below. ===== PATCH ====== diff --git a/src/java.base/share/classes/java/lang/String.java b/src/java.base/share/classes/java/lang/String.java --- a/src/java.base/share/classes/java/lang/String.java +++ b/src/java.base/share/classes/java/lang/String.java @@ -2176,12 +2176,13 @@ * @since 1.5 */ public String replace(CharSequence target, CharSequence replacement) { + Objects.requireNonNull(replacement); String tgtStr = target.toString(); - String replStr = replacement.toString(); int j = indexOf(tgtStr); if (j < 0) { return this; } + String replStr = replacement.toString(); int tgtLen = tgtStr.length(); int tgtLen1 = Math.max(tgtLen, 1); int thisLen = length();
Hi Christoph, I think it's worth adding a small comment to the end of `String tgtStr = target.toString();`, saying that the null check is implicit, in case people read the code in the future and get confused as to why only `replacement` is apparently null-checked. What do you think? Kind regards, Jonathan On 1 March 2017 at 23:47, Christoph Dreis <christoph.dreis@freenet.de> wrote:
Hey Vitaly,
Seems like a good idea. You probably want to null check 'replacement' before the bail out as the method is specified as throwing NPE in that case.
Thanks for your feedback. See the adjusted patch below.
===== PATCH ======
diff --git a/src/java.base/share/classes/java/lang/String.java b/src/java.base/share/classes/java/lang/String.java --- a/src/java.base/share/classes/java/lang/String.java +++ b/src/java.base/share/classes/java/lang/String.java @@ -2176,12 +2176,13 @@ * @since 1.5 */ public String replace(CharSequence target, CharSequence replacement) { + Objects.requireNonNull(replacement); String tgtStr = target.toString(); - String replStr = replacement.toString(); int j = indexOf(tgtStr); if (j < 0) { return this; } + String replStr = replacement.toString(); int tgtLen = tgtStr.length(); int tgtLen1 = Math.max(tgtLen, 1); int thisLen = length();
As mentioned offline, I would move the null check right before "return this". On Wed, Mar 1, 2017 at 6:50 PM Christoph Dreis <christoph.dreis@freenet.de> wrote:
Hey Vitaly,
Seems like a good idea. You probably want to null check 'replacement' before the bail out as the method is specified as throwing NPE in that case.
Thanks for your feedback. See the adjusted patch below.
===== PATCH ======
diff --git a/src/java.base/share/classes/java/lang/String.java b/src/java.base/share/classes/java/lang/String.java --- a/src/java.base/share/classes/java/lang/String.java +++ b/src/java.base/share/classes/java/lang/String.java @@ -2176,12 +2176,13 @@ * @since 1.5 */ public String replace(CharSequence target, CharSequence replacement) { + Objects.requireNonNull(replacement); String tgtStr = target.toString(); - String replStr = replacement.toString(); int j = indexOf(tgtStr); if (j < 0) { return this; } + String replStr = replacement.toString(); int tgtLen = tgtStr.length(); int tgtLen1 = Math.max(tgtLen, 1); int thisLen = length();
-- Sent from my phone
Hey Vitaly and Jonathan,
As mentioned offline, I would move the null check right before "return this".
@Vitaly: Thanks again. Adjusted. @Jonathan: Thanks. Thought about that as well, but I'd probably rather go with explaining the explicit nullcheck. See the adjusted patch below. What do you think? ===== PATCH ====== diff --git a/src/java.base/share/classes/java/lang/String.java b/src/java.base/share/classes/java/lang/String.java --- a/src/java.base/share/classes/java/lang/String.java +++ b/src/java.base/share/classes/java/lang/String.java @@ -2177,11 +2177,13 @@ */ public String replace(CharSequence target, CharSequence replacement) { String tgtStr = target.toString(); - String replStr = replacement.toString(); int j = indexOf(tgtStr); if (j < 0) { + // Explicit nullcheck of replacement to fulfill NPE contract in early out + Objects.requireNonNull(replacement); return this; } + String replStr = replacement.toString(); int tgtLen = tgtStr.length(); int tgtLen1 = Math.max(tgtLen, 1); int thisLen = length();
Hi Christoph and Vitaly, I am struggling to understand why the null-check for `replacement` has been moved within the `if (j < 0) { ... }` block. Could one of you explain to me the reasoning behind this change? I ask because, AFAICT, making it so that it only throws an NPE if `replacement` is null *and* `j < 0`, rather than throwing NPE at the very start of the method, seems to be going against the class-level doc which says, "Unless otherwise noted, passing a null argument to a constructor or method in this class will cause a NullPointerException to be thrown." Other than that, I am equally happy with `replacement` itself having an explicit-nullness-related comment instead of target having an implicit-nullness comment - it seems to deliver the same sort of message. :) Kind regards, Jonathan On 2 March 2017 at 07:26, Christoph Dreis <christoph.dreis@freenet.de> wrote:
Hey Vitaly and Jonathan,
As mentioned offline, I would move the null check right before "return this".
@Vitaly: Thanks again. Adjusted. @Jonathan: Thanks. Thought about that as well, but I'd probably rather go with explaining the explicit nullcheck.
See the adjusted patch below. What do you think?
===== PATCH ====== diff --git a/src/java.base/share/classes/java/lang/String.java b/src/java.base/share/classes/java/lang/String.java --- a/src/java.base/share/classes/java/lang/String.java +++ b/src/java.base/share/classes/java/lang/String.java @@ -2177,11 +2177,13 @@ */ public String replace(CharSequence target, CharSequence replacement) { String tgtStr = target.toString(); - String replStr = replacement.toString(); int j = indexOf(tgtStr); if (j < 0) { + // Explicit nullcheck of replacement to fulfill NPE contract in early out + Objects.requireNonNull(replacement); return this; } + String replStr = replacement.toString(); int tgtLen = tgtStr.length(); int tgtLen1 = Math.max(tgtLen, 1); int thisLen = length();
On Thu, Mar 2, 2017 at 8:26 AM, Jonathan Bluett-Duncan < jbluettduncan@gmail.com> wrote:
Hi Christoph and Vitaly,
I am struggling to understand why the null-check for `replacement` has been moved within the `if (j < 0) { ... }` block. Could one of you explain to me the reasoning behind this change?
I ask because, AFAICT, making it so that it only throws an NPE if `replacement` is null *and* `j < 0`, rather than throwing NPE at the very start of the method, seems to be going against the class-level doc which says, "Unless otherwise noted, passing a null argument to a constructor or method in this class will cause a NullPointerException to be thrown."
If there's no early bail out, there's an implicit null check when replacement.toString() is called; in fact, if a null is never seen (based on profile), there won't be any code generated for the null check (sigsegv handling will be used instead). If we add an explicit null check at the top of the method, I'm not confident that the JIT will eliminate it in cases where the method does not bail out early. It probably doesn't matter, really, but I don't see much harm in delaying the explicit null check until the last moment, so to speak.
Other than that, I am equally happy with `replacement` itself having an explicit-nullness-related comment instead of target having an implicit-nullness comment - it seems to deliver the same sort of message. :)
Kind regards, Jonathan
On 2 March 2017 at 07:26, Christoph Dreis <christoph.dreis@freenet.de> wrote:
Hey Vitaly and Jonathan,
As mentioned offline, I would move the null check right before "return this".
@Vitaly: Thanks again. Adjusted. @Jonathan: Thanks. Thought about that as well, but I'd probably rather go with explaining the explicit nullcheck.
See the adjusted patch below. What do you think?
===== PATCH ====== diff --git a/src/java.base/share/classes/java/lang/String.java b/src/java.base/share/classes/java/lang/String.java --- a/src/java.base/share/classes/java/lang/String.java +++ b/src/java.base/share/classes/java/lang/String.java @@ -2177,11 +2177,13 @@ */ public String replace(CharSequence target, CharSequence replacement) { String tgtStr = target.toString(); - String replStr = replacement.toString(); int j = indexOf(tgtStr); if (j < 0) { + // Explicit nullcheck of replacement to fulfill NPE contract in early out + Objects.requireNonNull(replacement); return this; } + String replStr = replacement.toString(); int tgtLen = tgtStr.length(); int tgtLen1 = Math.max(tgtLen, 1); int thisLen = length();
Hi Vitaly,
If there's no early bail out, there's an implicit null check when replacement.toString() is called...
Oh apologies! I had missed in the code that NPE does get thrown if `replacement` is null, no matter if the `if (j < 0) { ... }` block is entered or not, because `replacement.toString()` gets called immediately after the if block, which of course executes an implicit null-check. In that case, the patch LGTM. :) Kind regards, Jonathan
Hi Vitaly, I don't see any contract to throw an NPE: https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#replace-java... In any case, what's the reasonable of checking an argument, which is not used in that case? Am 02.03.2017 um 00:18 schrieb Vitaly Davidovich:
Seems like a good idea. You probably want to null check 'replacement' before the bail out as the method is specified as throwing NPE in that case.
On 03/02/2017 11:38 AM, Ulf Zibis wrote:
Hi Vitaly,
I don't see any contract to throw an NPE: https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#replace-java...
"Unless otherwise noted, passing anullargument to a constructor or method in this class will cause a|NullPointerException| <https://docs.oracle.com/javase/8/docs/api/java/lang/NullPointerException.html>to be thrown." /Claes
On Thu, Mar 2, 2017 at 6:45 AM Claes Redestad <claes.redestad@oracle.com> wrote:
On 03/02/2017 11:38 AM, Ulf Zibis wrote:
Hi Vitaly,
I don't see any contract to throw an NPE:
https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#replace-java...
"Unless otherwise noted, passing anullargument to a constructor or method in this class will cause a|NullPointerException| < https://docs.oracle.com/javase/8/docs/api/java/lang/NullPointerException.htm...
to be thrown."
/Claes
Ah, it was moved to class level docs in 8 - thanks Claes. As a side comment, I certainly understand the desire to not repeat yourself for each method's javadoc, but at the same time it's a bit less user friendly as most people don't read class level docs nearly as frequently as method level. -- Sent from my phone
On Thu, Mar 2, 2017 at 6:38 AM Ulf Zibis <Ulf.Zibis@cosoco.de> wrote:
Hi Vitaly,
I don't see any contract to throw an NPE:
https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#replace-java...
Hmm, I must've looked at java 7 docs, which do call out the NPE: http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(java....) Was this changed intentionally in 8?
In any case, what's the reasonable of checking an argument, which is not used in that case?
My understanding is contracts like the above (let's assume it still called out the NPE in 8) are checked even if the argument isn't used in some cases. The other case where I believe this is done is when passing a Supplier to some method that uses it to obtain a default value - even if it's not needed, it's checked for null because most (all?) such methods stipulate that it cannot be null.
Am 02.03.2017 um 00:18 schrieb Vitaly Davidovich:
Seems like a good idea. You probably want to null check 'replacement' before the bail out as the method is specified as throwing NPE in that case.
--
Sent from my phone
HI, first sorry for missing the class wide definition. Am 02.03.2017 um 12:45 schrieb Vitaly Davidovich:
On Thu, Mar 2, 2017 at 6:38 AM Ulf Zibis <Ulf.Zibis@cosoco.de <mailto:Ulf.Zibis@cosoco.de>> wrote:
In any case, what's the reasonable of checking an argument, which is not used in that case?
My understanding is contracts like the above (let's assume it still called out the NPE in 8) are checked even if the argument isn't used in some cases. The other case where I believe this is done is when passing a Supplier to some method that uses it to obtain a default value - even if it's not needed, it's checked for null because most (all?) such methods stipulate that it cannot be null.
On the other hand, the null check is a waste of performance and footprint (implicits performance cost too). I can't imagine, if any programmer would rely on such contract, instead of doing the check explicitly if of significance. I think, the original reasonable of the contract is to rule out any other exception type or undefined behaviour in case of relevance of the argument. So I would vote for stating the contract more precisely in that way. Also, if I remember correct, if a variable, here replStr, is only referenced later, the compiler can move the execution of here replacement.toString() to a later position, so it is not clear to me, if the original implementation ever fulfilled the contract. -Ulf
On Thu, Mar 2, 2017 at 8:32 AM Ulf Zibis <Ulf.Zibis@cosoco.de> wrote:
HI,
first sorry for missing the class wide definition.
Am 02.03.2017 um 12:45 schrieb Vitaly Davidovich:
On Thu, Mar 2, 2017 at 6:38 AM Ulf Zibis <Ulf.Zibis@cosoco.de <mailto:
Ulf.Zibis@cosoco.de>> wrote:
In any case, what's the reasonable of checking an argument, which is
not used in that case?
My understanding is contracts like the above (let's assume it still
called out the NPE in 8) are
checked even if the argument isn't used in some cases. The other case where I believe this is done is when passing a Supplier to some method that uses it to obtain a default value - even if it's not needed, it's checked for null because most (all?) such methods stipulate that it cannot be null. On the other hand, the null check is a waste of performance and footprint (implicits performance cost too).
Implicit null checks aren't always eliminated (e.g. calling an empty devirtualized method will still test the receiver for null) but generally don't cost anything. Explicit null checks can sometimes be folded into the implicit one by the optimizer, but not always.
I can't imagine, if any programmer would rely on such contract, instead of doing the check explicitly if of significance.
I think it's to have consistent behavior, irrespective of control flow. It can detect bugs early on. I agree callers can do their own checks, but this consistency is what I've noticed JDK to prefer.
I think, the original reasonable of the contract is to rule out any other exception type or undefined behaviour in case of relevance of the argument. So I would vote for stating the contract more precisely in that way.
Also, if I remember correct, if a variable, here replStr, is only referenced later, the compiler can move the execution of here replacement.toString() to a later position, so it is not clear to me, if the original implementation ever fulfilled the contract.
Compiler can only do that if it proves a bunch of things, such as no observable side effects as a result of sinking it down. If CharSequence.toString isn't devirtualized there, eg, it definitely cannot do it. I wouldn't bet on it.
-Ulf
--
Sent from my phone
Hey Vitaly, Thanks for clearing the way. As all things seem to be clarified now, is there someone who is willing to sponsor the patch? Cheers, Christoph ====== PATCH ====== diff --git a/src/java.base/share/classes/java/lang/String.java b/src/java.base/share/classes/java/lang/String.jav --- a/src/java.base/share/classes/java/lang/String.java +++ b/src/java.base/share/classes/java/lang/String.java @@ -2177,11 +2177,13 @@ */ public String replace(CharSequence target, CharSequence replacement) { String tgtStr = target.toString(); - String replStr = replacement.toString(); int j = indexOf(tgtStr); if (j < 0) { + // Explicit nullcheck of replacement to fulfill NPE contract in early out + Objects.requireNonNull(replacement); return this; } + String replStr = replacement.toString(); int tgtLen = tgtStr.length(); int tgtLen1 = Math.max(tgtLen, 1); int thisLen = length();
Am 02.03.2017 um 14:55 schrieb Vitaly Davidovich:
Implicit null checks ... generally don't cost anything. Any additional byte of code in CPU cache prevents other code from staying there.
I can't imagine, if any programmer would rely on such contract, instead of doing the check explicitly if of significance.
I think it's to have consistent behavior, irrespective of control flow. It can detect bugs early on.
Yes, serving bug detection almost always has some cost and should be pondered, but the issue of this patch is to increase performance. Here I suspect the real value of the bug detection help.
I think, the original reasonable of the contract is to rule out any other exception type or undefined behaviour in case of relevance of the argument. So I would vote for stating the contract more precisely in that way.
Also, if I remember correct, if a variable, here replStr, is only referenced later, the compiler can move the execution of here replacement.toString() to a later position, so it is not clear to me, if the original implementation ever fulfilled the contract.
Compiler can only do that if it proves a bunch of things, such as no observable side effects as a result of sinking it down. If CharSequence.toString isn't devirtualized there, eg, it definitely cannot do it. I wouldn't bet on it.
For the original code in case of inlining, I guess, the optimizer would place the null check before the if block and the allocation of the replacement String after. For the new code, the optimizer can detect the duplicate null check inside and outside the if block, so can move it before the if block. So both java codes could result in the same machine code. IMHO the effective optimization should be proved by HS disassembler. There is also the possibility, that an allocation of an intermediate String object is never done, because the anyway involved StringBuilder perhaps can insert the replacement sequence without before instantiating a String object. -Ulf
On Thu, Mar 2, 2017 at 11:42 AM, Ulf Zibis <Ulf.Zibis@cosoco.de> wrote:
Am 02.03.2017 um 14:55 schrieb Vitaly Davidovich:
Implicit null checks ... generally don't cost anything.
Any additional byte of code in CPU cache prevents other code from staying there.
When the optimization applies, there shouldn't be *any* instructions for the null check on the path - it's done via signal handling.
I can't imagine, if any programmer would rely on such contract, instead of doing the check explicitly if of significance.
I think it's to have consistent behavior, irrespective of control flow. It can detect bugs early on.
Yes, serving bug detection almost always has some cost and should be pondered, but the issue of this patch is to increase performance. Here I suspect the real value of the bug detection help.
Performance by way of removing an allocation. There's already code in that method that's much more expensive than a test against a register. So while I suggested to move the explicit null check into the if block, I don't think it'll really matter. But removing an allocation is always a good thing.
I think, the original reasonable of the contract is to rule out any
other exception type or undefined behaviour in case of relevance of the argument. So I would vote for stating the contract more precisely in that way.
Also, if I remember correct, if a variable, here replStr, is only referenced later, the compiler can move the execution of here replacement.toString() to a later position, so it is not clear to me, if the original implementation ever fulfilled the contract.
Compiler can only do that if it proves a bunch of things, such as no observable side effects as a result of sinking it down. If CharSequence.toString isn't devirtualized there, eg, it definitely cannot do it. I wouldn't bet on it.
For the original code in case of inlining, I guess, the optimizer would place the null check before the if block and the allocation of the replacement String after. For the new code, the optimizer can detect the duplicate null check inside and outside the if block, so can move it before the if block. So both java codes could result in the same machine code. IMHO the effective optimization should be proved by HS disassembler.
This is Sufficiently Smart Compiler territory. It's much better to hand-code this properly, particularly since nothing is lost in readability or maintainability in this case.
There is also the possibility, that an allocation of an intermediate String object is never done, because the anyway involved StringBuilder perhaps can insert the replacement sequence without before instantiating a String object.
I know C2 can remove some intermediate string allocations, but I would be very impressed if it handled this particular method in the way you describe.
-Ulf
Am 02.03.2017 um 18:16 schrieb Vitaly Davidovich:
When the optimization applies, there shouldn't be *any* instructions for the null check on the path - it's done via signal handling. Thanks for clarification.
This is Sufficiently Smart Compiler territory. It's much better to hand-code this properly, particularly since nothing is lost in readability or maintainability in this case. Convinced!
-Ulf
Hey Ulf,
Hi Vitaly,
I don't see any contract to throw an NPE: https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#replace- java.lang.CharSequence-java.lang.CharSequence-
In any case, what's the reasonable of checking an argument, which is not used in that case?
Am 02.03.2017 um 00:18 schrieb Vitaly Davidovich:
Seems like a good idea. You probably want to null check 'replacement' before the bail out as the method is specified as throwing NPE in that case.
It is stated in the class comment. "Unless otherwise noted, passing a null argument to a constructor or method in this class will cause a NullPointerException to be thrown." Cheers, Christoph
participants (5)
-
Christoph Dreis
-
Claes Redestad
-
Jonathan Bluett-Duncan
-
Ulf Zibis
-
Vitaly Davidovich