Request for review JDK-8185591 guarantee(_byte_map[_guard_index] == last_card) failed: card table guard has been modified
Thomas Schatzl
thomas.schatzl at oracle.com
Wed Nov 22 08:28:33 UTC 2017
On Tue, 2017-11-21 at 18:24 -0500, Kim Barrett wrote:
> > On Nov 21, 2017, at 5:18 PM, Alexander Harlap <alexander.harlap at oracle.com> wrote:
> >
> > Please review change for JDK-8185591 <https://bugs.openjdk.java.net/browse/JDK-8185591> -
> > guarantee(_byte_map[_guard_index] == last_card) failed: card table guard has been modified
> >
> > Change is located at http://cr.openjdk.java.net/~aharlap/8185591/webrev.01/
> >
> > Problem was in mishandling zero count in code generated by
> > gen_write_ref_array_post_barrier().
> >
> > Code is machine specific. Suggested fix for arm, sparc and x86_64.
> >
> > No changes are required for x86_32 - the zero count case is already
> > handled properly (same as for s390 and ppc).
> >
> > Aarch64 code has the same problem as arm, sparc and x86_64, but I did
> > not include this platform in the suggested changeset.
> >
> > I attached a possible change for aarch64
> > stubGenerator_aarch64.cpp.diff
> >
> > Testing was done with JPRT.
> >
> > Thank you,
> >
> > Alex
>
> Thanks for tracking this down.
Thanks! Looks good to me.
> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp
> 1270 __ testl(count, count);
> 1271 __ jcc(Assembler::zero, L_done); // zero count - nothing to do
>
> Instead of that, might it be better to instead add something like
> this?
>
> 1277 __ subptr(end, start); // end --> cards count
> + 1278 __ jcc(Assembler::less, L_done); // negative inclusive count - nothing to do
>
> Similar question for all of the affected platforms.
>
> I don't currently have a strong preference either way, but wonder if
> there's a good reason to choose one over the other.
Basically, the test instruction, particularly in combination with jcc,
has some fast paths in processors. On current ones the difference is
pretty small, judging from the optimization manual: current Intel
manuals only indicate that the test instruction (macro-)fuses with a
following jcc on all flags, while the sub does not [0]. On earlier
processors (probably including AMD ones), only test fuses with the jcc.
I.e., it is a nano-optimization.
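
To make the two check placements compared above concrete, here is a
minimal C++ model of the card-range arithmetic. It is only a sketch, not
the stub code: the function card_stores, the Check enum and the
constants are hypothetical names for illustration, and it assumes
512-byte cards, 8-byte oops, and a store loop that always executes at
least once.

```cpp
#include <cassert>
#include <cstdint>

// Assumptions for illustration only: 512-byte cards, 8-byte oops.
constexpr int kCardShift = 9;
constexpr std::int64_t kOopSize = 8;

// Where the early exit is placed (hypothetical names).
enum class Check { None, CountIsZero, DiffIsNegative };

// Returns how many card-table stores the modeled barrier performs for
// an array of `count` oops starting at address `start`.
int card_stores(std::int64_t start, std::int64_t count, Check check) {
    assert(start >= kOopSize && count >= 0);

    // Models "testl(count, count); jcc(zero, L_done)".
    if (check == Check::CountIsZero && count == 0) return 0;

    // Inclusive address of the last oop; precedes start when count == 0.
    std::int64_t end = start + count * kOopSize - kOopSize;
    std::int64_t diff = (end >> kCardShift) - (start >> kCardShift);

    // Models "subptr(end, start); jcc(less, L_done)".
    if (check == Check::DiffIsNegative && diff < 0) return 0;

    // Store-then-test loop: the body runs at least once.
    int stores = 0;
    do {
        ++stores;  // stands in for the byte store that dirties one card
    } while (--diff >= 0);
    return stores;
}
```

For a card-aligned start and a zero count, this model performs one
store without a check and none with either check. It does not attempt
to reproduce the actual guard-card failure.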
Thanks,
Thomas
[0] https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf , 3-13