RFR [16/java.xml] 8248348: Regression caused by the update to BCEL 6.0

Joe Wang huizhe.wang at oracle.com
Sat Jun 27 23:48:30 UTC 2020



On 6/26/2020 12:46 PM, Stuart Marks wrote:
>
>
> On 6/25/20 4:53 PM, Joe Wang wrote:
>> Please review a fix to a BCEL regression. At issue was the addition 
>> of hashCode() method to Instruction.java in BCEL 6.0. This hashCode() 
>> method was implemented to return the instruction's opcode that 
>> unfortunately is mutable. The problem hasn't showed up until the code 
>> path led to calls to remove from a HashSet a target that has been 
>> changed since it was added to the HashSet. The proposed patch is to 
>> make the hashCode final/immutable.
>>
>> This patch implies that a target object is considered the same one 
>> even if its field values may have been changed. It therefore may not 
>> be appropriate in other situations (or may cause problems). However, 
>> since it had always had no hashCode() override before BCEL 6.0, 
>> thereby relying on Object's identity hash code, its use case has been 
>> limited and time-tested. It can benefit from this patch in that it 
>> provides the same function as Object's hash code, and then serves as 
>> a reminder (to any who might read the code) how it was used (objects 
>> considered to be the same over the course as far as the hashCode is 
>> concerned).
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8248348
>> webrevs: http://cr.openjdk.java.net/~joehw/jdk16/8248348/webrev/
>
> Hi Joe,
>
> This fix seems vaguely wrong, but in the context of what I understand 
> it's doing, it seems that BCEL or the surrounding JAXP code is even 
> more wrong. :-)

The patch is surely unusual, because of this unusual case. As far as 
which one is more wrong, I believe the burden is mostly on BCEL although 
JAXP might share a bit, at the least it exposed the issue.

>
> From what I understand, this bugfix was prompted by a case where an 
> Instruction was added to a HashSet. The Instruction was then mutated 
> while in the HashSet, and this resulted in it not being removed from 
> the HashSet in the future, when it was necessary to do so, thus 
> leaving a stale entry. How could this have possibly worked?!?

It works in most majority of the cases, although I think the BCEL 
developers admitted they had limited tests. This is because while 
Instruction allows mutation, all but one type of instructions actually 
do,  and the mutated one don't get removed from the HashSet in a normal 
build-up process. But XML applications can go berserk sometimes, and in 
this case resulted in a method exceeding the max method size. When that 
happens, a rebuild process involved many copies/clones, resetting 
targeters, the later included removing targeters (among them the mutated 
ones) from the HashSet. Given that such a big stylesheet is unusual, it 
makes it an edge case. At least, we haven't received any error reports 
from the 20% who adopted Java 11.

>
> I'm speculating a bit, but I could imagine code like this:
>
>     var inst = new Instruction(ORIG_OPCODE);
>     var set = new HashSet<Instruction>();
>     set.add(inst);
>     ...
>     inst.setOpcode(NEW_OPCODE); // (1) mutates inst while in HashSet!
>     ...
>     set.remove(inst); // (2)
>
> In the current version, the hashCode() is simply the opcode, and 
> equals() compares opcodes (and possibly other things). This is correct 
> from the standpoint of Instruction itself, as equal items have the 
> same hashcode. However, the above code fails because at line (2), the 
> remove() method looks in the "wrong" hash bucket and can't find the 
> instruction, so it does nothing.

Yes, it will be correct for the majority cases where opcodes don't 
change. Also note that HashSet checks hashcode, and then the key's 
identity before key.equals, which is why this patch works. When the 
build process calls on an instruction to dispose itself, it removes 
itself from the targeter HashSet. But while it is itself, if the 
hashcode has changed, it won't even find itself!  (sort of lost itself) 
(that's too many "itself")

>
> The patch changes the class to compute the hashcode at construction 
> time, and makes the hashcode constant, so it "fixes" the above code, 
> since the remove() method will look in the same bucket and find the 
> desired instance. But now Instruction breaks the hashCode/equals 
> contract, since it's possible for two instances to be equal but to 
> have different hashcodes. (Consider an Instruction that was created 
> with NEW_OPCODE. It can be equal to inst, but its hashcode will differ.)

It would break the hashCode/equals contract if it was implemented 
normally. But this is unusual, the equals implementation was nothing but 
normal: for branch instructions, it always returns false!  Thus, this 
unusual hashCode implementation will not break the contract.

>
> It seems to me that this patch is compensating for broken code 
> elsewhere in BCEL or JAXP by adding more brokenness. It could be that 
> is ok in this narrow usage context. That is, as far as we know, this 
> "shaded" copy of BCEL is used *only* by JAXP and not used in general, 
> and it may be that the particular ways that JAXP uses BCEL aren't 
> broken (and are indeed fixed) by this change. But I'd like to see some 
> assurance of that. It could be that there are bugs introduced by this 
> change that simply haven't been uncovered by the testing we've done yet.

I mentioned above, I believe the burden was mostly on BCEL. JAXP/Xalan 
might have complicated things, the way it rebuilds the instruction list 
when it encounters an over-the-limit method, or it could have been 
smarter to avoid building such a large list. But it was all done through 
BCEL's public APIs. Nothing JAXP/Xalan has done was not within the 
functions BCEL provided.

In BCEL, not only the equals were implemented with some special 
treatment for certain subtypes, the choose of HashSet to store mutable 
(instruction) objects may arguably not have been the best.

>
> Another approach is to figure out what is mutating the Instruction 
> while it's in a HashSet and fix that instead.

Re-setting Instruction target triggered it. But unfortunately that's 
necessary as far as the current process goes.

>
> Unfortunately, either of these approaches seems to involve an 
> arbitrary amount of work. :-(

Very true, and that's why I provided this shaky yet 
working-for-the-hacky-code patch, it's a patch on patch rather than 
something one could do properly when designing a new code.

That's not to say the BCEL engineers hadn't put in a great effort. 
They've worked on it for about half a year related to this 
hashCode/Equals issue in BCEL 6.0. They also deliberated replacing 
HashSet. But I guess eventually reality sets in, it would have been a 
major architectural work.

All that's said, while it has its (maybe major) problems as this issue 
revealed, BCEL has been a stable component for JAXP/java.xml. This issue 
introduced by the changes in BCEL 6.0, I believe, is an edge case as 
explained previously. If the instruction list is built 
straight-forwardly, it would not have run into this issue. This patch 
resolves the case, restores the delicate balance and hopefully keep BCEL 
as stable as before.

-Joe

>
> s'marks



More information about the core-libs-dev mailing list