[8u] JDK-8191393: Random crashes during cfree+0x1c

Tue Apr 7 15:20:13 UTC 2020

Hi, Xin.

Could you explain the nature of the race condition between concurrent mark threads and the vm thread? And why _gclog_reentry is needed (i.e., how the vm thread can call write() from rotate_gc_log())?

The comment on rotate_log() in ostream.cpp (you can see most of it in the webrev) is:

// rotate_log must be called from VMThread at safepoint. In case need change parameters
// for gc log rotation from thread other than VMThread, a sub type of VM_Operation
// should be created and be submitted to VMThread's operation queue. DO NOT call this
// function directly. Currently, it is safe to rotate log at safepoint through VMThread.
// That is, no mutator threads and concurrent GC threads run parallel with VMThread to
// write to gc log file at safepoint. If in future, changes made for mutator threads or
// concurrent GC threads to run parallel with VMThread at safepoint, write and rotate_log
// must be synchronized.

I'd take a look at the CMS (and G1) concurrent mark code to discover why it's running during a safepoint. That might be where the problem should be fixed, rather than in the logging code.

Assuming the fix should be in the logging code:

In vmThread.*:

The names of simple getters are conventionally the same as the field name. In this case, I'd use _reentry_gclog as the field name rather than _reentry_gclog_writing.

In ostream.*:

I don't understand the comment in gcLogFileStream::write() that says

	// can't use Thread::current() because thread is NULL before initialization

It'd be good for the comment to explain how it can happen that a GC log entry can be written before Thread::current() is valid.

In rotate_log(), you can hoist the assignment to thread out of the assert and use it instead of vmthread in the locked region.

In rotate_log(), it looks like you added rotate_gc_log_impl() because part of the race is in should_rotate(). But, you don't need to add rotate_gc_log_impl(), you can just put the new code into the existing rotate_log().

Thanks,
Paul

On 4/6/20, 3:47 AM, "jdk8u-dev on behalf of Liu, Xin" <jdk8u-dev-bounces at openjdk.java.net on behalf of xxinliu at amazon.com> wrote:

    Hi,  Reviewers, 

    May I request review this bugfix? 
    Bug: https://bugs.openjdk.java.net/browse/JDK-8191393
    Webrev: https://cr.openjdk.java.net/~xliu/8191393/webrev/

    I believe that root cause  is the race condition between ConcurrentMarkThread and the VMThread.
    We can’t directly backport from upstream because upstream is based on unified logging framework. further, TIP doesn’t have reentry problem while jdk8u has. 
    Instead, this patch uses a mutex to protect FILE* fileStream::_file and a Boolean flag to avoid reentry.  

    Of course, a mutex  is not free. Therefore, I didn't use it if users don't emit log to file. 

    Test: 
    I ran the stress test over 24 hours and no problem is identified. 
    I passed tier1 test of jtreg. 

    Thanks,
    --lx