RFR: JDK-8160350 cannot truss jdk9 [solaris]

Fri Jul 1 20:32:30 UTC 2016

 > I'm leaning towards switching back to the V1 of the patch which just
 > fixes the Solaris issue, and in addition logging another bug which
 > covers the larger cross-platform issues. That seems like the lowest-risk
 > approach whist still fixing a problem on Solaris that *absolutely* needs
 > to be addressed, but I'll welcome any advice.

Vote: Use JDK-8160350 to fix Solaris and file a new bug for the
     larger cross-platform issues.

Dan

On 7/1/16 2:31 PM, Alan Burlison wrote:
> On 01/07/2016 20:44, Kim Barrett wrote:
>
>> This doesn't look right to me.
>>
>> The BSD implementation of os::elapsedVTime presently just calls
>> elapsedTime, with a comment "better than nothing, but not much".  I
>> think it's worse than nothing, and should actually fail, with
>> os::supports_vtime returning false for this platform. Similarly,
>> elapsedVtime on some other platforms falls back to elapsedTime, which
>> I suspect is a mistake.
>
> I think os::elapsedVTime() is right on Windows, AIX, Solaris and 
> Linux, and could be made right on OSX as well if the Mach APIs were 
> used, and on FreeBSD it could use getrusage(). I think the problem is 
> the other BSDs, I couldn't find anything suitable to use for those.
>
> And I agree the fallback to elapsedTime() also seems suspect as it's 
> only ever done if the vtime mechanism for a given platform fails. If 
> there was ever a transient failure and elapsedVTime() switched between 
> vtime and elapsed time, you'd get some *very* bizarre values.
>
>> The problem is that the way one deals with vtime is different from the
>> way one deals with elapsedTime values.  The predicate is needed so
>> that code that wants to use vtime if available can be properly written
>> to fall back to something else when it's not.  (Note that I think
>> there might be bugs in how G1 is attempting to do that in some
>> places.)
>
> I agree what's there at present is a bit of a mess, which is why the 
> first version of the patch (attached to the bug) just addressed the 
> Solaris issue that caused truss to fail - that *definitely* needs 
> fixing as it will cause a significant observability issue on Solaris.
>
>> I have a note to self to look into this and probably report some bugs,
>> but haven't gotten around to doing so.
>>
>> So while we might no longer need os::enable_vtime and
>> os::vtime_enabled, I think getting rid of os::supports_vtime isn't
>> right.
>
> Two main options seem feasible:
>
> 1. Revert to the V1 patch which just fixes the Solaris issue and 
> leaves everything else alone.
>
> 2. Respin the V2 patch and leave os::supports_vtime() in place, and 
> the two calls to it (from g1CollectedHeap.cpp and 
> g1YoungRemSetSamplingThread.cpp). Remove os::enable_vtime() and 
> os::vtime_enabled().
>
> For #2 we could in addition change BSD code so it returns the correct 
> value from os::supports_vtime() (false) but I'm cautious about doing 
> that as it would start using a new codepath that hasn't previously 
> been exercised.
>
> Also for #2 we could remove the fallback code, the question being what 
> to replace it with? Returning 0 could potentially cause the generation 
> of negative values elsewhere, if the call to os::elapsedVTime() had 
> previously succeeded.
>
>> OTOH, the present usage of vtime is quite limited.  As noted, it's
>> only used by G1, and there I think only for some logging / stats
>> reporting.  Maybe we don't really need it at all?  I don't have an
>> opinion on that, but other folks in the GC team (at least) might.
>
> Who should we ask for a GC opinion?
>
> I'm leaning towards switching back to the V1 of the patch which just 
> fixes the Solaris issue, and in addition logging another bug which 
> covers the larger cross-platform issues. That seems like the 
> lowest-risk approach whist still fixing a problem on Solaris that 
> *absolutely* needs to be addressed, but I'll welcome any advice.
>
> Thanks,
>