RFR (2nd round) 8071627: Code refactoring to override == operator of Symbol*

Thu Apr 30 19:42:47 UTC 2015

I have a few followup points on this question.  Actually, it reminds me a lot of some struggles in Java with object references and their sometimes-broken == operation.

Calvin's V2 patch shows that there are about 200 of the risky Symbol*::== operations.  It introduces 347 new lines to fix them.  With high confidence it finds all the the risky operations (because he started with handles).  As you point out, Coleen, our confidence about introducing new bugs is not so good.  But I like the patch because it is relatively localized.  Though it is not as localized as I had hoped, since 200 is not a small count.

Calvin's V1 patch, by comparison, has 1530 new lines, four times more than V2.  Nearly all of those are trivial changes from Symbol* to SymbolRef (or maybe SymbolHandle in V3).  If we go with that then we invest in more FooHandles instead of Foo* types.  The change to a class incrementally increases the cost of working with the code.  TempNewSymbol did this, and was painful though worth it (and hard to get right) because it helped us get our reference counting right.  Will SymbolHandle be worth it enough?

This is a real question, because C++ programmers know that some pointer operations are just tricky, and so they already have a certain amount of defensiveness about expressions like p==q (oh, was that strcmp I wanted?) or p++ (oh, wait, I wanted p->increment()).  Given that defensiveness, the incremental benefit of adding more defenses is… debatable.  (Hence this debate.)

On Apr 22, 2015, at 2:11 PM, Coleen Phillimore <coleen.phillimore at oracle.com> wrote:
> 
> Hi Calvin, and fellow JVM engineers,
> 
> I prefer a modification of your first version of this change much better.
> 
> I really don't like this...  It feels very unsafe to me.  I don't know how to run any tools to make sure I don't break this!  Honestly this seems wrong and there are too many places that compare Symbols even though the changeset is smaller.
> 
> We triage bugs in the runtime group weekly with SQE.  This change will cause bugs that have various symptoms and will be hard to trace to this root cause.  The bugs will mostly land in the Runtime component of the JVM because in fact, the Symbol class is mostly used by the runtime component of the JVM.  In addition, running internal tools to find these errors *monthly* is too late and running them individually adds overhead and friction to making changes in the JVM.  More overhead is the last thing we need!

I'm talking with the Parfait folks today about whether their tools can be configured to diagnose suspicious uses of Symbol*::==, and also things like char*::== and Foo*::+(int) and Foo*::-(Foo*).  I think they can.  Would that help, say, if we had nightly scans to find problems that escape programmer diligence?  The programmer diligence is always the first line of defense though:  There is no excuse for not knowing how C pointers work.

> Having to use ->equals() is clunky too.

It's clunky but in thousands fewer places, if you agree with me that SymbolHandle is clunky also.

> For better or worse, the JVM is written in C++ which has operator overloading for these purposes.  Modern C++ programming already avoids raw pointers in favore of smart pointers!

This, and the previous point about access to tools, is the strongest point in your argument, IMO.

> The JVM code has historically avoided raw metadata pointers, first because of PermGen but now because the values pointed to have semantics that we want to encapsulate.  I admit that it was nice using raw pointers and brought all of us back to a simpler day but they're not safe in general for this sort of system software.
> 
> In the JVM code, we have Handles:
> 
> 1. oops => Handle because they may move with garbage collection.
> 2. Method* => methodHandle because they may get deallocated with class redefinition (same for Klass eventually)
> 3. Symbol* => SymbolHandle because pointer equality isn't sufficient to ensure equality
> 
> The other objection for Calvin's first change was that it's a lot of code changed.  But there's a lot of other large code changes going forward at this time.  This is the simplest of large changes, ie. simple renaming.   This feature is needed for others going forward to support our important customers.  This amount of code change is justified.
> 
> Embedded in this mail, if you've read this far, is a suggestion to rename SymbolRef (a name I hate) to SymbolHandle.  Because that's what it now is.

I am leaning towards agreeing, despite the thousands of lines of noisy changes required, because we won't need tooling support if we bite the bullet and dump the Symbol pointers.  But I keep coming back to this:  Where does it stop?  How many of our object classes need auxiliary handle classes?  Are all of our pointer types just bugs waiting to happen?  If so, why are we fixating on Symbol*::==?  If not, why is Symbol*::== such a uniquely  bad problem?

Thanks, Coleen.

— John

> Thanks,
> Coleen
> 
> 
> On 4/15/15, 3:56 PM, Calvin Cheung wrote:
>> Please review this second version of the fix.
>> 
>> This version has 2 new functions (equals() and not_equals()) in the Symbol class.
>> It replaces the Symbol* == and != comparisons with those 2 function calls.