Impact of code difference in Collection#contains() worth improving?
John Rose
john.r.rose at oracle.com
Wed Aug 27 19:42:18 UTC 2014
On Aug 27, 2014, at 6:48 AM, Fabian Lange <lange.fabian at gmail.com> wrote:
> Hi all,
> I have been involved recently in some theoretical or nonsensical
> discussions about microbenchmarking, JIT compilation, generated assembly and so forth.
> One example was LinkedList vs ArrayList.
>
> What I noticed is that those two have a different implementation for
> contains():
>
> ArrayList:
>
> public boolean contains(Object o) {
>     return indexOf(o) >= 0;
> }
>
> LinkedList:
>
> public boolean contains(Object o) {
>     return indexOf(o) != -1;
> }
>
> Logically this is of course identical due to the contract of indexOf, which
> returns either -1 or the >= 0 index of the element.
>
> This code has been like this almost forever, and I was wondering if this
> actually makes a difference in CPU cycles.
If it makes a difference, (a) that difference is unimportant, and (b) it is not controllable at the source code level.
Ad (a), the only time the choice of an instruction or two would make even a slight difference is if the list is empty. Otherwise, the cost of traversing the list and testing each element will swamp any small difference in the comparison.
Ad (b), the JIT will unrecognizably optimize both versions into surprising instruction sequences. If they differ, the differences will be surprising, and will depend on things you weren't looking at, such as branch frequencies or the complexity of nearby code structures.
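The logical equivalence is easy to check directly: List.indexOf is specified to return -1 when the element is absent and a non-negative index otherwise, so the two tests can never disagree. A minimal sketch (class and method names are mine, for illustration):

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class ContainsEquivalence {
    // ArrayList's form of the test
    static boolean viaGreaterEqual(List<?> list, Object o) {
        return list.indexOf(o) >= 0;
    }

    // LinkedList's form of the test
    static boolean viaNotEqual(List<?> list, Object o) {
        return list.indexOf(o) != -1;
    }

    public static void main(String[] args) {
        List<String> list = new LinkedList<>(Arrays.asList("a", "b", "c"));
        // Present, absent, and null probes all give identical results,
        // because indexOf returns either -1 or an index >= 0.
        for (Object probe : new Object[] { "a", "z", null }) {
            if (viaGreaterEqual(list, probe) != viaNotEqual(list, probe)) {
                throw new AssertionError("implementations disagree for " + probe);
            }
        }
        System.out.println("both forms agree");
    }
}
```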
> And in fact this code compiles into different assembler instructions. The
> array list does a test against 0 and conditional move, while the linked
> list does a jump equals on -1.
On Thursdays, they compile to the same thing. On every full moon, they swap instruction sequences. ...That's not strictly true, but you get the idea. Changing the Java source code operators in response to observed instruction sequences is a losing game, unless there is a large profit directly available.
So what's the winning game? That might be to file a bug against the optimizer (the server compiler) if you see truly bad instructions. But the "badness" has to be more than speculative. Unless it has a measurable and significant cost for plausible applications (not nanobenchmarks), the bug will not (and should not) get time from compiler engineers.
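To make "measurable" concrete: any such report needs numbers from a realistic workload. A deliberately naive System.nanoTime() sketch is shown below just to illustrate that traversal cost dominates the comparison; a real bug report would need a proper harness such as JMH, which handles warmup and dead-code elimination:

```java
import java.util.ArrayList;
import java.util.List;

public class ContainsCost {
    // Naive timing, for illustration only: on a list of 100,000 elements,
    // a miss in contains() must traverse everything, so the one-instruction
    // difference in the final comparison is noise by comparison.
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) list.add(i);

        Object absent = -1;          // worst case: full traversal
        long t0 = System.nanoTime();
        boolean found = false;
        for (int i = 0; i < 100; i++) {
            found |= list.contains(absent);
        }
        long elapsed = System.nanoTime() - t0;
        System.out.println("found=" + found + ", ~" + (elapsed / 100) + " ns per miss");
    }
}
```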
Often, as with a bit of dust on your camera lens, the best course is not to touch it at all, for fear of making things worse.
— John
> Again that is not surprising, because the actual java source is different.
> But I wonder if both options are equally good in cold performance and when
> jitted based on parameter values.
>
> Wouldn't one implementation be better than the other? And why is the
> "better" implementation not used in both classes (and maybe other Collections
> which use indexOf)?
>
> Is the answer that this has always been like this and the benefit is not
> worth the risk of touching ancient code?
>
> And if not for performance, would code clarity and similarity be an
> argument?
>
> Or can the answer be found on a different mailing list? Then let me know :)
>
>
> Fabian
More information about the core-libs-dev mailing list