FW: Interface evolution via virtual extension methods - Extending Interfaces By Breaking Application Logic (Dangerous)

Elvis Ligu elvis_ligu at hotmail.com
Tue Mar 13 09:06:46 PDT 2012




From: elvis_ligu at hotmail.com
To: brian.goetz at oracle.com
Subject: RE: Interface evolution via virtual extension methods - Extending Interfaces By Breaking Application Logic (Dangerous)
Date: Tue, 13 Mar 2012 16:40:14 +0100

(I just saw my email, and I apologize for the horrible formatting.)
I will try to explain what I mean with a very simple example. Before starting, let me say that I don't know much about compilers or how things work in detail, so I can't propose anything at that level. I am looking at the problem from an SE perspective: how to achieve a smooth transition from Java 7 to Java 8 while avoiding extra complexity and accidental breakage of client code. By breaking client code I mean either that the client's application no longer compiles (the less harmful case) or that its application logic breaks (the worst case is when the application compiles, runs, and passes all tests, but its logic is compromised in some way).
Before continuing, this is how I interpret the following concepts:
- Client code, API's client, and client in general -> the client is someone else (let's say a developer) separate from the API. For example, I am a client of the Collections API in Java SE 7, and my code is the client code. You can assume either that my classes are the API's client or that I (as a developer) am the client. Generally speaking, the client is independent of the API's development process and uses the API according to its public contract (its public interface, whether made of classes or interfaces).
- Caller -> I would like to avoid confusion between an API's client and a class's client. So when I say caller I mean the client of a specific class (and/or interface). E.g., if class A contains the statement b.print(); I would say class A (the instance of A) is the caller of class B (the instance of B).
- Java Interface -> explicitly or implicitly, a Java interface (e.g., interface Service {}) is part of the contract of the API. It doesn't provide any implementation (and it shouldn't), and its purpose is to define a communication point between the implementation provider and the caller. Another purpose is to define an implicit logic. E.g., Comparator has the method compare(): the method signature defines the communication rules (the caller must call this method using its signature), and the logic is that compare() takes two arguments of the same type and returns their order: 0 -> equal, positive integer -> obj1 > obj2, negative integer -> obj1 < obj2. Of course nothing prevents a client from breaking the implicit logic an interface imposes, since this depends on the implementation; however, this is not an issue from the API developer's perspective.
Our example - the API:

    interface Socket {
        ...
        // method added in release 1.1
        public void consumePackets(final PacketBuffer<Packet> packets) default {
            final List<Packet> buffer = new ArrayList<Packet>();
            synchronized (packets) {
                while (packets.hasNext()) buffer.add(packets.next());
            }
            final PacketSequence sequence = new PacketSequence(buffer.toArray(new Packet[0]));

            // an empty sequence will be returned if no previous one exists
            final PacketSequence current = getPreviousSequence().merge(sequence);
            setCurrentSequence(current);
        }
        ...
    }

    interface SecureSocket extends Socket { }

    interface RTPSocket {
        ...
        // this is defined in release 2.0
        public void consumePackets(final PacketBuffer<Packet> packets) default {
            final List<Packet> buffer = new ArrayList<Packet>();
            synchronized (packets) {
                while (packets.hasNext()) {
                    Packet p = packets.next();
                    buffer.add(p);
                    // send each packet to handler to avoid delay
                    transmitPacketsToRTPHandlers(new Packet[]{p});
                }
            }
            final PacketSequence sequence = new PacketSequence(buffer.toArray(new Packet[0]));

            // an empty sequence will be returned if no previous one exists
            final PacketSequence current = getPreviousSequence().merge(sequence);
            setCurrentSequence(current);
        }
        ...
    }

    class DefaultSocket implements Socket {}
    class DefaultSecureSocket extends DefaultSocket implements SecureSocket {}
    class DefaultRTPSocket extends DefaultSocket implements RTPSocket {}

Here the API designers have designed an API that provides socket communication to its clients. In release 1.0 the Socket interface didn't have a consumePackets() method. In release 1.1 the API designers wanted to extend the Socket interface, so they introduced consumePackets() with a default implementation because they didn't want to break clients' code. In release 2.0 they decided to change the default implementation of consumePackets() because they observed that their DefaultRTPSocket suffered delays when this method was called. They decided to deliver each packet to the handlers while consuming the packets and to leave the rest as is.
Our example - API's client:

    // defined after release 1.1 of API
    class SecureRTPSocket extends DefaultSecureSocket implements RTPSocket {
        ...
        public void setCurrentSequence(PacketSequence sequence) {
            doubleBufferSequence(sequence);
            transmitPacketsToRTPHandlers(getPacketsArray());
        }

        // this is not override, this is defined here
        public int reset() { ... };
    }

Here the API's client, using version 1.1, decided to build a new secure RTP socket. He used the default implementation of consumePackets() because it met his requirements. Because he had already observed the delay produced by consumePackets(), he decided to use a double buffer to avoid it. His application worked perfectly until he replaced the old API v1.1 with the new API v2.0. After the replacement his application started to have delays! In this case the delay comes from the fact that the API designers overrode the default implementation of consumePackets(), so the handler now gets duplicate packets: once per packet from the new consumePackets() and once more from the client's setCurrentSequence()!!! Of course the handler is smart enough to reject duplicate packets. But the problem is still there, so the client is going to investigate it. The client looks at DefaultSecureSocket and at Socket#consumePackets() but can't find the problem. Why is he not looking at RTPSocket? Well, for 15 years Java developers have not expected an interface to affect their implementation without breaking the contract (see the definition above).
I know such a powerful mechanism would be great for Java, and used wisely it would give an API developer a lot of flexibility. On the other hand it increases the possibility of writing fragile code. Of course the above example is a bad one and can be easily avoided, but again, virtual methods increase the risks. Think of C++ and its power, think of multiple inheritance: one can say that it is powerful; however (in my opinion) it complicates things and can lead to more fragile code. As you stated this is an old problem, because it is already present with abstract classes, for example, but as I previously said, Java developers have never looked at interfaces as if they were abstract classes. Should they start doing so now?
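The duplicate-delivery scenario above can be tried out with a minimal sketch. It uses the default-method syntax Java 8 eventually shipped (the `default` modifier goes before the return type) rather than the draft syntax of this thread. Several things here are assumptions made only for illustration: packets are plain Strings, the two API releases are modelled side by side as RTPSocketV1 and RTPSocketV2, RTPSocket is taken to extend Socket (as the word "override" suggests), and the names ClientSocketBase, ClientOnV1, ClientOnV2, handlerLog and ConsumePacketsDemo are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified, hypothetical stand-ins for the API: packets are plain Strings.
interface Socket {
    void transmitPacketsToRTPHandlers(List<String> packets);
    void setCurrentSequence(List<String> sequence);

    // Release 1.1 default: buffer everything, then publish the sequence once.
    default void consumePackets(List<String> packets) {
        setCurrentSequence(new ArrayList<>(packets));
    }
}

// RTPSocket as in API v1.1: inherits the Socket default unchanged.
interface RTPSocketV1 extends Socket { }

// RTPSocket as in API v2.0: overrides the default and also forwards
// every packet to the handlers while consuming.
interface RTPSocketV2 extends Socket {
    @Override
    default void consumePackets(List<String> packets) {
        List<String> buffer = new ArrayList<>();
        for (String p : packets) {
            buffer.add(p);
            transmitPacketsToRTPHandlers(Collections.singletonList(p));
        }
        setCurrentSequence(buffer);
    }
}

// Client code, written once against v1.1 and never changed afterwards.
abstract class ClientSocketBase implements Socket {
    final List<String> handlerLog = new ArrayList<>(); // what the handlers saw

    @Override
    public void transmitPacketsToRTPHandlers(List<String> packets) {
        handlerLog.addAll(packets);
    }

    // The client's own workaround for the v1.1 delay: push packets from here.
    @Override
    public void setCurrentSequence(List<String> sequence) {
        transmitPacketsToRTPHandlers(sequence);
    }
}

class ClientOnV1 extends ClientSocketBase implements RTPSocketV1 { }
class ClientOnV2 extends ClientSocketBase implements RTPSocketV2 { }

class ConsumePacketsDemo {
    public static void main(String[] args) {
        List<String> packets = Arrays.asList("p1", "p2");
        ClientOnV1 v1 = new ClientOnV1();
        ClientOnV2 v2 = new ClientOnV2();
        v1.consumePackets(packets);
        v2.consumePackets(packets);
        System.out.println("v1.1 handlers saw: " + v1.handlerLog);
        System.out.println("v2.0 handlers saw: " + v2.handlerLog);
    }
}
```

With the same, unchanged client class, the handlers see [p1, p2] against the v1.1 interface and [p1, p2, p1, p2] against the v2.0 interface. Nothing fails to compile, which is exactly the silent change in application logic described above.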
As I can see you are working at Oracle, and I understand that making a change to an existing Java API is a very critical decision: to be or not to be. One of the big problems you have to face is backward compatibility. I was reading an email from someone on this mailing list who was concerned about the Collections API and the changes that may be made in Java 8. He had extended a Collections interface and added some methods, let's say foreach(). If you now add a method foreach(), it could break his API, and this is not what you want. In the above example, if the client has a method int reset() in SecureRTPSocket, the API designers will not be able to add a method void reset() to the Socket interface, at least not without breaking the client's code. The problem is that API developers don't know how API clients are using their API, so they cannot make any assumptions. Virtual methods give you the (virtual reality) sense that by adding a method to an interface you will not break the API, but as we can easily see this is not the case.
* Here I am not criticizing virtual methods; I like the idea very much. I am raising some of the facts we need to consider when dealing with them.
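The int reset() versus void reset() clash can be sketched as follows, again in the syntax Java 8 eventually shipped. The open() method and the ResetClashDemo class are hypothetical additions so that the example is self-contained. The conflicting default is left commented out so that the sketch compiles.

```java
// Hypothetical API at v1.1: Socket does not declare reset() yet.
interface Socket {
    void open();

    // Hypothetical v2.0 addition. Uncommenting it stops SecureRTPSocket from
    // compiling: its int reset() would have to override this void reset(),
    // and the return types are incompatible.
    // default void reset() { }
}

// Client class written against v1.1, with its own unrelated reset().
class SecureRTPSocket implements Socket {
    private int resets = 0;

    @Override
    public void open() { }

    // Not an override in v1.1: it simply counts how often it was called.
    public int reset() {
        return ++resets;
    }
}

class ResetClashDemo {
    public static void main(String[] args) {
        SecureRTPSocket socket = new SecureRTPSocket();
        // Compiles and runs only while Socket has no void reset().
        System.out.println(socket.reset());
    }
}
```

The default body protects classes that lack the method, but it cannot protect a class that already uses the same name and parameter list with a different return type.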
However, I think there is an interesting solution by which we can avoid such problems, especially API breaks (from the perspective of the API designer). Looking at the above example, let's say the API designers would like to add a method void reset() to the Socket interface. In this case the client's code could break, because there is an int reset() method. To avoid this, we can have another keyword: override. For SecureRTPSocket to see the new virtual method, the client has to write something like this:
 private override void reset(){...}
If the client doesn't write something like this, then this method is not visible to the client's class SecureRTPSocket. Someone might say this is not possible, because what if someone makes a call like this:
 Socket socket = getSocket(); socket.reset();
In such a case, if getSocket() has return type Socket and its implementation returns a SecureRTPSocket, there are two possibilities: 1) method reset() is already present in SecureRTPSocket, 2) method reset() is not present. Here is how it could work:
1) In version 1.1 the client didn't know of any Socket#reset() method, so there is no statement in his code that calls Socket#reset(). If the Socket#reset() method is ignored in v2.0, his code will continue to be compatible even if there is an int SecureRTPSocket#reset().
2) There is a void SecureRTPSocket#reset() method. Even in this case the Socket#reset() method will be ignored if called as above. At least in the client's code there will not be any Socket#reset() call, because his code was written against v1.1.
3) When a caller calls Socket#reset() and this method is ignored in the underlying implementation class, in our case SecureRTPSocket, an UnsupportedOperationException will be thrown. This makes sense because the old client code is ignoring Socket#reset(), which simply means "I don't have any implementation for this".
4) When there exists a void SecureRTPSocket#reset(). Again, if the caller calls void Socket#reset() and void SecureRTPSocket#reset() existed before void Socket#reset(), then an UnsupportedOperationException is thrown. This makes sense because the old client code will not contain any call to Socket#reset(); instead it could contain calls to SecureRTPSocket#reset(), and these will work normally because no resolution against the Socket interface is required.
5) When there exists an override void DefaultSocket#reset(). In this case void Socket#reset() will be visible to DefaultSocket, and therefore to DefaultSecureSocket and DefaultRTPSocket, and so also to SecureRTPSocket, because SecureRTPSocket extends DefaultSecureSocket. If there exists a void SecureRTPSocket#reset(), it just overrides void DefaultSecureSocket#reset(), so there is no problem. If there exists an int SecureRTPSocket#reset(), then the client's code will break (it will not compile at all). This makes sense because when a client extends the API's classes it means he knows what he is doing and understands the internals of those classes, so he is aware of any change. In our example a good developer would simply use encapsulation and have something like this: class SecureRTPSocket implements SecureSocket, RTPSocket, using a private DefaultSecureSocket to which it delegates some of its methods while implementing the rest itself. However, if this is not a good approach, another alternative would be to ignore the virtual methods just like in the above cases. This last approach requires extra effort, because API developers have to write override void reset() in each implementation class (Default...Socket), but at least it ensures 100% backward compatibility. Think of void Collection#foreach: it would require each Collection subtype to write override void foreach(). Is this affordable? Well, virtual methods are used to extend an API, so I think the effort is worth it.
6) What does the body of override void reset() contain? Either a) it completely overrides the supertype's default implementation, or b) it writes something like SuperType.reset() in order to accept the supertype's default implementation.
7) Where the client has already extended the Socket interface: let's say he has an interface ISecureRTPSocket and the SecureRTPSocket class implements it. In this case void Socket#reset() is a virtual method, so it is visible to ISecureRTPSocket as a virtual one. But it is not visible to SecureRTPSocket because there doesn't exist any override void SecureRTPSocket#reset(). Here we don't have any client code compatibility issues; it works just like the above cases.
8) Where there doesn't exist any override void SecureRTPSocket#reset() (look at the example above). If the first approach is implemented, then if override void DefaultSocket#reset() exists, the reset() method will be visible to all subclasses just like any other method. If the second approach is implemented and there is no override void DefaultSocket#reset(), then reset() will not be visible to DefaultSecureSocket, so an override void DefaultSecureSocket#reset() would be a compilation error (not in our case, though: DefaultSecureSocket implements SecureSocket, reset() is visible to SecureSocket, and so override void DefaultSecureSocket#reset() is allowed). The rule is simple: if we have an interface A, and interfaces B and C extend A, then the virtual method A#m() will be visible to B and C. If class Bimp implements B and Dimp extends Bimp, then if override Bimp#m() exists, override Dimp#m() may exist too; if not, Dimp must explicitly implement A, B or C so that we can have override Dimp#m().
9) The compiler and runtime will go crazy with all this complexity. Actually I can't give any reliable answer to this. From a higher level I think the compiler and runtime can treat virtual methods just like any other method, with some small differences. For interface-to-interface resolution they can easily be treated just like other methods; after all, backward compatibility is kept because an interface is not required to have an implementation, so a virtual method in my interface will not break other interfaces. For classes, the compiler and/or runtime can distinguish virtual methods from regular methods by keeping (let's say) an extra bit per method. When this bit is 0 it is a regular method and resolution is the same as in Java 7; when this bit is 1 it is a virtual method and resolution follows the rules I stated above. Especially if the second approach is implemented (see 5) above), the compiler can easily keep track of virtual methods, because the keyword override will be present in every class that implements the virtual method, so it can deal with their (strange) resolutions and conflicts.
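The proposed override keyword does not exist in Java, but the runtime behaviour described in 3) can be approximated with default methods as Java 8 eventually shipped them: the interface's default body only signals "not opted in", and a class opts in by overriding it. The following is a rough sketch under that assumption, not the proposal itself; LegacySocket, ResettableSocket, OptInDemo and tryReset are hypothetical names.

```java
interface Socket {
    // Hypothetical v2.0 addition: the default only reports "not opted in".
    default void reset() {
        throw new UnsupportedOperationException("reset");
    }
}

// Written against v1.1: knows nothing about reset() and still compiles.
class LegacySocket implements Socket { }

// Written against v2.0: opts in by overriding. The proposal would spell
// this "override void reset()"; Java only offers the @Override annotation.
class ResettableSocket implements Socket {
    boolean wasReset = false;

    @Override
    public void reset() {
        wasReset = true;
    }
}

class OptInDemo {
    // Returns true if the socket behind the interface has opted in to reset().
    static boolean tryReset(Socket socket) {
        try {
            socket.reset();
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryReset(new LegacySocket()));
        System.out.println(tryReset(new ResettableSocket()));
    }
}
```

java.util.Iterator.remove() in Java 8 follows the same pattern: its default implementation throws UnsupportedOperationException. The emulation only covers classes that lack the method. A legacy class with an int reset() would still fail to compile, which is the case the override keyword is meant to address.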
Conclusions:
I think the override keyword approach and resolving virtual methods in this way doesn't break any OOP principles (well, I am just an undergraduate student who is going to graduate soon, so I can't say this for sure :-)). Indeed it encourages good OOP because it requires the programmer's attention. Think of this scenario: I am about to work at a company and my boss has just given me a job description (the interface, which is the contract). I take on my responsibilities and try to do my best to do my job according to its description. My boss has a new job description for me and has asked to meet me to announce it. In the meantime some guy from the staff, who already knows that my boss has a new job description for me, comes to me and asks me to do a job that is in the new job description, not in the one I have. What should I do in this case? Should I try to do what he says, at the risk of doing something that I am not prepared to do, or should I say to him UnsupportedJobException? When I get my new job description from my boss I sit down, learn how to do it, and write in my notes override JobDesc#newJob1(), override JobDesc#newJob2() and so on. The next time the guy from the staff comes to me I will be prepared and will happily do what he asks, taking on my responsibilities.
Thank you for hearing me out. I know there are brilliant minds out there who could make all of my ideas useless; unfortunately I am not an experienced Software Engineer (well, actually not even a new SE :-)), so I could not think of all the tricks. At least I tried to squeeze my little mind and express my thoughts (damn English, I really want to master you :-).
I am looking forward to hearing from you.

> Subject: Re: Interface evolution via virtual extension methods - Extending Interfaces By Breaking Application Logic (Dangerous)
> From: brian.goetz at oracle.com
> Date: Sun, 11 Mar 2012 21:51:35 -0700
> CC: lambda-dev at openjdk.java.net
> To: elvis_ligu at hotmail.com
> 
> Thanks for your comments.  Responses inline: 
> 
> > 1 - There is a risk of breaking client code: consider a developer who has already implemented a comparator like this:
> >         abstract class MyComparator implements Comparator { protected int order = 1; public void reverseOrder() { return -1*order;} }
> >     and uses this somewhere in his code like this:
> >         MyComparator c = new MyComparator() { public int compare(Object o1, Object o2) { return order* (o1.toString().compareTo(o2.toString())); }}
> >         .... c.reverseOrder();
> >         Arrays.sort(list, c);
> >     If oracle developers decide to add such a method in Comparator interface then they can not ensure that client code will not break; two methods with the same signature but with a different returned type can not compile (at least not in java 7).
> 
> Yes, this is a risk.  Note that this is not new with extension methods; this risk exists for adding new methods to existing classes (abstract or concrete) too.  
> 
> The alternative is: never add any new methods to libraries.  We find this alternative unappealing!  
> 
> > 2 - There is a risk of breaking client logic: according to Brian's paper, section 3. Method Dispatch, we can have a diamond-shaped hierarchy as:
> >     interface A { void m() default X.a; }
> >     interface B extends A { }
> >     interface C extends A { }
> >     class D implements B, C { }
> > When the compiler sees d.m() and if there is not an implementation of m() in class D then it will call the default method implementation A.m().
> 
> The compiler does *not* short-circuit calls to D.m() to A.m().  The compiler emits an invokespecial with D.m(); the runtime will perform the resolution, and if the situation is still like this by the time the classes are actually loaded, then the VM will decide to dispatch to A.m().  (This is no different than today; the compiler's job is simply to sanity-check linkage and "fail fast" if the compiler can see the call would not succeed if made at runtime with the classes as they are at compile time, but the runtime builds vtables based on the classes it sees at runtime.)  
> 
> > The client of this API can easily "make the mistake" of relying on default implementation of A.m().
> 
> Again, this is nothing new.  Let's say class C has a method q(), and D extends C but does not override q().  Clients of D could easily "make the mistake" of assuming calls to D.q() are going to end up at C.q(), but that is simply a mistake.  
> 
> > For some reason in the future the developer of the API would like to add a default implementation of m() in B. The problem in this case is bigger than in case 1;
> > a) if the client of API have already extended interface A with B1 and provided a default implementation for m, and has a "class E implements B, B1" the client code will break.
> 
> At this point E will not compile, due to competing implementations in B and B1.  
> 
> > b) the default m() in B will probably break invariants of a "class MyClass implements B, C" and cause no compile error.
> 
> Why?  If B.m() overrides A.m(), we expect B.m() to implement the contract of A.m().  Note that interfaces have no state (barring heroic actions), so the protection of state-based invariants still falls entirely to C and its subclasses.  The implementation of m() in A or B can only call other interface methods, and if any of those affect state in C, eventually we'll be calling code in C to actually touch the state.  
> 
> > 1 - How to keep backward compatibility? Well at least in the transition phase from java 7 to java 8 nothing can ensure the java developers that adding a virtual method in Collection interface will be compatible with the old java. So my suggestion is another keyword, let say override. Let say we add reverseOrder() in interface Comparator. In this case the java compiler will make the reverseOrder() visible to Comparator's subtypes only and only if the subtype explicitly define: override reverseOrder().
> 
> I believe this is a request for "opting in" to extension methods, rather than having them actually inherited?  I invite you to propose exactly how this might work in more detail.  
> 
> > So in case 1 MyComparator (the client code) is not affected; reverseOrder() is not visible in MyComparator.  2 - How to avoid making API fragile by default (case 2)? My suggestion is the same as the above: the keyword override. By writing override reverseOrder() the API's client is required to explicitly define which default implementation he is relying on, specifying SuperType.super.m() or make a "default none" in case of an interface. If the client doesn't specify override, there i!
> 
> You say "client" when I think you mean "subclass".  Are you really suggesting the opt-in should be at the *client* site, or are you speaking only at the inheritance site?  
> 
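For anyone who wants to try the diamond case from the quoted exchange, here is a small sketch in the syntax Java 8 eventually shipped. It assumes m() returns a String so the chosen implementation is observable. B2, B1, D2, E and DiamondDemo are hypothetical names, with B2 standing for "B after a default m() was added to it".

```java
interface A {
    default String m() { return "A.m"; }
}

interface B extends A { }
interface C extends A { }

// The plain diamond: D has no m() of its own, so A's default is used.
class D implements B, C { }

// "B" after the API developer adds a default m() to it.
interface B2 extends A {
    @Override
    default String m() { return "B2.m"; }
}

// B2.m() overrides A.m(), so it is chosen over the copy reached through C.
class D2 implements B2, C { }

// The client's own extension of A with another default.
interface B1 extends A {
    @Override
    default String m() { return "B1.m"; }
}

// Two unrelated defaults compete, so E must override m() itself, and it
// names the default it relies on with the InterfaceName.super.m() form.
class E implements B2, B1 {
    @Override
    public String m() { return B2.super.m(); }
}

class DiamondDemo {
    public static void main(String[] args) {
        System.out.println(new D().m());
        System.out.println(new D2().m());
        System.out.println(new E().m());
    }
}
```

Without the explicit override in E, the class is rejected at compile time, which matches the "E will not compile" answer in the quoted reply. The B2.super.m() call is how a class names the default it relies on in released Java.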
