RFR: JDK-8160350 cannot truss jdk9 [solaris]

Fri Jul 1 20:31:51 UTC 2016

On 01/07/2016 20:44, Kim Barrett wrote:

> This doesn't look right to me.
>
> The BSD implementation of os::elapsedVTime presently just calls
> elapsedTime, with a comment "better than nothing, but not much".  I
> think it's worse than nothing, and should actually fail, with
> os::supports_vtime returning false for this platform.  Similarly,
> elapsedVtime on some other platforms falls back to elapsedTime, which
> I suspect is a mistake.

I think os::elapsedVTime() is right on Windows, AIX, Solaris and Linux, 
and could be made right on OSX as well if the Mach APIs were used, and 
on FreeBSD it could use getrusage(). I think the problem is the other 
BSDs, I couldn't find anything suitable to use for those.

And I agree the fallback to elapsedTime() also seems suspect as it's 
only ever done if the vtime mechanism for a given platform fails. If 
there was ever a transient failure and elapsedVTime() switched between 
vtime and elapsed time, you'd get some *very* bizarre values.

> The problem is that the way one deals with vtime is different from the
> way one deals with elapsedTime values.  The predicate is needed so
> that code that wants to use vtime if available can be properly written
> to fall back to something else when it's not.  (Note that I think
> there might be bugs in how G1 is attempting to do that in some
> places.)

I agree what's there at present is a bit of a mess, which is why the 
first version of the patch (attached to the bug) just addressed the 
Solaris issue that caused truss to fail - that *definitely* needs fixing 
as it will cause a significant observability issue on Solaris.

> I have a note to self to look into this and probably report some bugs,
> but haven't gotten around to doing so.
>
> So while we might no longer need os::enable_vtime and
> os::vtime_enabled, I think getting rid of os::supports_vtime isn't
> right.

Two main options seem feasible:

1. Revert to the V1 patch which just fixes the Solaris issue and leaves 
everything else alone.

2. Respin the V2 patch and leave os::supports_vtime() in place, and the 
two calls to it (from g1CollectedHeap.cpp and 
g1YoungRemSetSamplingThread.cpp). Remove os::enable_vtime() and 
os::vtime_enabled().

For #2 we could in addition change BSD code so it returns the correct 
value from os::supports_vtime() (false) but I'm cautious about doing 
that as it would start using a new codepath that hasn't previously been 
exercised.

Also for #2 we could remove the fallback code, the question being what 
to replace it with? Returning 0 could potentially cause the generation 
of negative values elsewhere, if the call to os::elapsedVTime() had 
previously succeeded.

> OTOH, the present usage of vtime is quite limited.  As noted, it's
> only used by G1, and there I think only for some logging / stats
> reporting.  Maybe we don't really need it at all?  I don't have an
> opinion on that, but other folks in the GC team (at least) might.

Who should we ask for a GC opinion?

I'm leaning towards switching back to the V1 of the patch which just 
fixes the Solaris issue, and in addition logging another bug which 
covers the larger cross-platform issues. That seems like the lowest-risk 
approach whist still fixing a problem on Solaris that *absolutely* needs 
to be addressed, but I'll welcome any advice.

Thanks,

-- 
Alan Burlison
--