From serguei.spitsyn at oracle.com Sat Nov 1 09:59:39 2014
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Sat, 01 Nov 2014 02:59:39 -0700
Subject: RFR: 8038468: java/lang/instrument/ParallelTransformerLoader.sh fails with ClassCircularityError
In-Reply-To: <5454218D.40009@oracle.com>
References: <543C591E.8010602@oracle.com> <544AB477.4000204@oracle.com> <544ADC07.6080904@oracle.com> <544AE76A.9030701@oracle.com> <544E5123.1060202@oracle.com> <544E8844.1070907@oracle.com> <0FB37288-5995-4E0B-B005-E00A4FB3B22A@oracle.com> <5454218D.40009@oracle.com>
Message-ID: <5454AF0B.4060008@oracle.com>

On 10/31/14 4:55 PM, Yumin Qi wrote:
> Karen,
>
> Thanks for your detailed message for debugging. Yes, from my
> debugging, the exception did happen in TestThread rather than the main
> thread. I have no idea why in the end the exception was reported in
> the main thread.
>
> You mention
>
> So that change to the test would be:
> in TestTransformer:
>     if (loader != null) {
>         if (tName.equals("TestThread")) {
>         {
>             loadClasses(3);
>         }
>         }
>         return null;
>     }
>
> The loader is the one defined in the test case, right?

Not sure I understand your question correctly.

If the thread is the TestThread then most likely the answer is "Yes".
This one is expected:
    sClassLoader = new URLClassLoader(new URL[] {sURL});

The class loading for TestThread has to happen in loadClasses(2).
I wonder if we ever observe any other loader for the TestThread.

The question is because the TestThread is pretty simple:

    private static class TestThread extends Thread {
        private final int fIndex;
        public TestThread(int index) {   <== it is called with index = 2
            super("TestThread");
            fIndex = index;
        }
        public void run() { loadClasses(fIndex); }
    }

Thanks,
Serguei

> The system class loader is never null.
> I will try this change, let's see if it can work it out.
>
> Thanks
> Yumin
>
> On 10/31/2014 3:29 PM, Karen Kinnear wrote:
>> Yumin,
>>
>> From your earlier exception stack trace (many thanks) you reported:
>>
>> Exception in thread "main" java.lang.ClassCircularityError: (no - I don't know why this is in thread "main")
>> sun/misc/URLClassPath$JarLoader$2
>>     at sun.misc.URLClassPath$JarLoader.checkResource(URLClassPath.java:771)
>>     at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:843)
>>     at sun.misc.URLClassPath.getResource(URLClassPath.java:199)
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:364)
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:426)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:359)
>>     at java.lang.Class.forName0(Native Method)
>>     at java.lang.Class.forName(Class.java:340)
>>     at ParallelTransformerLoaderApp.loadClasses(ParallelTransformerLoaderApp.java:83)
>>     at ParallelTransformerLoaderApp.main(ParallelTransformerLoaderApp.java:45)
>>
>>
>> So I ran with -XX:AbortVMOnException=java.lang.ClassCircularityError -XX:+ShowMessageBoxOnError to get
>> a log file and stack trace. See my instructions below on how to do that.
>>
>> I did this, attached a debugger, which didn't help enough since I needed to see the java stack frames,
>> and got an hs_err_log also, so the stack traces came from the error log.
>>
>> The stack trace was on Thread 2, which in the hs_err_log was TestThread (which makes sense for what the test logic says).
>> See later in email for stack traces from Thread 2.
>>
>> Summary of stack trace:
>>
>> TestThread:
>>   loadClasses(#) -> forName(TestClass#, URLClassLoader)
>>   vm calls out to URLClassLoader.loadClass(String) which is inherited from java.lang.ClassLoader.loadClass(String)
>>   ... calls java.net.URLClassLoader.findClass(...) which calls
>>   DoPrivileged java.net.URLClassLoader$1.run which calls
>>   sun.misc.URLClassPath.getResource(name, false) which calls
>>   sun.misc.URLClassPath$JarLoader.getResource which calls
>>   sun.misc.URLClassPath$JarLoader.checkResource which tries to call sun.misc.URLClassPath$JarLoader$2
>>   - and then the transformer jumps in with loadClasses(#) (which we know is 3) and walks the same logic which tries to load sun.misc.URLClassPath$JarLoader$2 again
>>
>> Note that in the placeholder table information that Yumin printed, the circularity error is on sun.misc.URLClassPath$JarLoader$2 with the null == boot loader, which
>> makes sense -- that is the appropriate defining loader, and therefore the one the CFLH would intercept during the defineClass phase.
>>
>> In the sun.misc.URLClassPath.java file, in the class JarLoader, in the method checkResource
>>     ... return new Resource() { ... }
>> -- I do not know why that generates sun.misc.URLClassPath$JarLoader$1, $2 and $3 at build time or when that was added.
>> I would guess that is when the bug started happening.
>>
>> When I have a successful run, sun.misc.URLClassPath$JarLoader$2 loads before any TestClass1 loads.
>>
>> My belief is that the point of the test is to test parallel class loading for URL class loaders.
>> I don't think the point is to test the bootstrap class loader, nor to test bootstrapping - i.e. running the agent before
>> we have loaded sufficient classes to allow loading URLClassLoader classes.
>>
>> What I suggested to Yumin that he try would be to change the test to NOT intercept boot loader loads, so that sun.misc.URLClassPath$JarLoader$#
>> can load which will in turn allow classes loaded by a URLClassLoader subclass to load.
>>
>> So that change to the test would be:
>> in TestTransformer:
>>     if (loader != null) {
>>         if (tName.equals("TestThread")) {
>>         {
>>             loadClasses(3);
>>         }
>>         }
>>         return null;
>>     }
>> // I also suspect with that change, we can remove the sleep loop
>> Note: there was a printed message which said that the Thread "Signal Dispatcher" has called transform(), which I
>> ignored, however it is good that we don't call loadClass on that thread - which is part of what the sleep loop does -
>> but that would be handled by the boot loader screening above
>>
>> Alternatively we can preload the URLClassPath classes, but I don't think we want to do that, or
>> we can have the agent explicitly screen on a variety of jdk bootstrapping classes. But I think the cleaner
>> solution is to screen on the boot loader.
>>
>> Does that make any sense to others?
>>
>> thanks,
>> Karen
>>
>> p.s. How to run with hotspot flags (jtreg has a -show:rerun option, but with a shell script in the test, this is more complex, so
>> the following should be easier):
>>
>> So what I did was run the test once for it to pass (not your script, but just once with jtreg) so that it generated
>> the $DST/work directory.
>> I then created a rerun.csh script - attached - you can modify for your own $DST directory.
>> I used it to be able to quickly rerun the test without the jtreg framework and compile time etc. but mostly
>> to be able to actually add hotspot command-line flags.
>>
>>
>>
>> p.p.s.
details from the error log (let me know if you want me to attach the error log to the bug report) >> >> note: error log shows last 10 events including: >> Event: 0.928 loading class sun/misc/URLClassPath$JarLoader$2 >> Event: 0.928 loading class TestClass3 >> Event: 0.929 loading class TestClass3 done >> Event: 0.929 loading class java/lang/ClassCircularityError >> Event: 0.929 loading class java/lang/ClassCircularityError done >> >> TestThread >> >> java frames: >> >> j sun.misc.URLClassPath$JarLoader.checkResource(Ljava/lang/String;ZLjava/util/jar/JarEntry;)Lsun/misc/Resource;+42 >> j sun.misc.URLClassPath$JarLoader.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+54 >> j sun.misc.URLClassPath.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+53 >> j java.net.URLClassLoader$1.run()Ljava/lang/Class;+26 >> j java.net.URLClassLoader$1.run()Ljava/lang/Object;+1 >> v ~StubRoutines::call_stub >> j java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;+0 >> j java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class;+13 >> j java.lang.ClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;+70 >> j java.lang.ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class;+3 >> v ~StubRoutines::call_stub >> j java.lang.Class.forName0(Ljava/lang/String;ZLjava/lang/ClassLoader;Ljava/lang/Class;)Ljava/lang/Class;+0 >> j java.lang.Class.forName(Ljava/lang/String;ZLjava/lang/ClassLoader;)Ljava/lang/Class;+49 >> j ParallelTransformerLoaderAgent$TestTransformer.loadClasses(I)V+25 >> j ParallelTransformerLoaderAgent$TestTransformer.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[B)[B+81 >> j sun.instrument.TransformerManager.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[B)[B+50 >> j 
sun.instrument.InstrumentationImpl.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[BZ)[B+34 >> v ~StubRoutines::call_stub >> j sun.misc.URLClassPath$JarLoader.checkResource(Ljava/lang/String;ZLjava/util/jar/JarEntry;)Lsun/misc/Resource;+42 >> j sun.misc.URLClassPath$JarLoader.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+54 >> j sun.misc.URLClassPath.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+53 >> j java.net.URLClassLoader$1.run()Ljava/lang/Class;+26 >> j java.net.URLClassLoader$1.run()Ljava/lang/Object;+1 >> v ~StubRoutines::call_stub >> j java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;+0 >> j java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class;+13 >> j java.lang.ClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;+70 >> j java.lang.ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class;+3 >> v ~StubRoutines::call_stub >> j java.lang.Class.forName0(Ljava/lang/String;ZLjava/lang/ClassLoader;Ljava/lang/Class;)Ljava/lang/Class;+0 >> j java.lang.Class.forName(Ljava/lang/String;ZLjava/lang/ClassLoader;)Ljava/lang/Class;+49 >> j ParallelTransformerLoaderApp.loadClasses(I)V+25 >> j ParallelTransformerLoaderApp$TestThread.run()V+4 >> v ~StubRoutines::call_stub >> >> >> >> >> detailed frames: >> >> V [libjvm.so+0x760f5a] Exceptions::_throw_msg(Thread*, char const*, int, Symbol*, char const*)+0x7c >> V [libjvm.so+0xce005c] SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, Handle, Thread*)+0x7d8 >> V [libjvm.so+0xcde9e5] SystemDictionary::resolve_or_null(Symbol*, Handle, Handle, Thread*)+0x26d >> V [libjvm.so+0xcde435] SystemDictionary::resolve_or_fail(Symbol*, Handle, Handle, bool, Thread*)+0x39 >> V [libjvm.so+0x690fbc] ConstantPool::klass_at_impl(constantPoolHandle, int, Thread*)+0x3cc >> V [libjvm.so+0x5398cb] ConstantPool::klass_at(int, Thread*)+0x55 >> V 
[libjvm.so+0x8b1f3c] InterpreterRuntime::_new(JavaThread*, ConstantPool*, int)+0x14a >> j sun.misc.URLClassPath$JarLoader.checkResource(Ljava/lang/String;ZLjava/util/jar/JarEntry;)Lsun/misc/Resource;+42 >> j sun.misc.URLClassPath$JarLoader.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+54 >> j sun.misc.URLClassPath.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+53 >> j java.net.URLClassLoader$1.run()Ljava/lang/Class;+26 >> j java.net.URLClassLoader$1.run()Ljava/lang/Object;+1 >> v ~StubRoutines::call_stub >> V [libjvm.so+0x8c3060] JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x6b2 >> V [libjvm.so+0xba06bc] os::os_exception_wrapper(void (*)(JavaValue*, methodHandle*, JavaCallArguments*, Thread*), JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x3a >> V [libjvm.so+0x8c29a7] JavaCalls::call(JavaValue*, methodHandle, JavaCallArguments*, Thread*)+0x7d >> V [libjvm.so+0x972a80] JVM_DoPrivileged+0x63d >> j java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;+0 >> j java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class;+13 >> j java.lang.ClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;+70 >> j java.lang.ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class;+3 >> v ~StubRoutines::call_stub >> V [libjvm.so+0x8c3060] JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x6b2 >> V [libjvm.so+0xba06bc] os::os_exception_wrapper(void (*)(JavaValue*, methodHandle*, JavaCallArguments*, Thread*), JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x3a >> V [libjvm.so+0x8c29a7] JavaCalls::call(JavaValue*, methodHandle, JavaCallArguments*, Thread*)+0x7d >> V [libjvm.so+0x8c1ec7] JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x1cb >> V [libjvm.so+0x8c2086] JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Handle, 
Thread*)+0xb0 >> V [libjvm.so+0xce2096] SystemDictionary::load_instance_class(Symbol*, Handle, Thread*)+0x4ea >> V [libjvm.so+0xce00a8] SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, Handle, Thread*)+0x824 >> V [libjvm.so+0xcde9e5] SystemDictionary::resolve_or_null(Symbol*, Handle, Handle, Thread*)+0x26d >> V [libjvm.so+0xcde435] SystemDictionary::resolve_or_fail(Symbol*, Handle, Handle, bool, Thread*)+0x39 >> V [libjvm.so+0x98c89e] find_class_from_class_loader(JNIEnv_*, Symbol*, unsigned char, Handle, Handle, unsigned char, Thread*)+0x49 >> V [libjvm.so+0x96f681] JVM_FindClassFromCaller+0x39d >> C [libjava.so+0xdfd0] Java_java_lang_Class_forName0+0x130 >> j java.lang.Class.forName0(Ljava/lang/String;ZLjava/lang/ClassLoader;Ljava/lang/Class;)Ljava/lang/Class;+0 >> j java.lang.Class.forName(Ljava/lang/String;ZLjava/lang/ClassLoader;)Ljava/lang/Class;+49 >> j ParallelTransformerLoaderAgent$TestTransformer.loadClasses(I)V+25 >> j ParallelTransformerLoaderAgent$TestTransformer.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[B)[B+81 >> j sun.instrument.TransformerManager.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[B)[B+50 >> j sun.instrument.InstrumentationImpl.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[BZ)[B+34 >> v ~StubRoutines::call_stub >> V [libjvm.so+0x8c3060] JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x6b2 >> V [libjvm.so+0xba06bc] os::os_exception_wrapper(void (*)(JavaValue*, methodHandle*, JavaCallArguments*, Thread*), JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x3a >> V [libjvm.so+0x8c29a7] JavaCalls::call(JavaValue*, methodHandle, JavaCallArguments*, Thread*)+0x7d >> V [libjvm.so+0x911bfb] jni_invoke_nonstatic(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*)+0x3cd >> V 
[libjvm.so+0x916918] jni_CallObjectMethod+0x388 >> C [libinstrument.so+0x4eb5] transformClassFile+0x1e5 >> C [libinstrument.so+0x1e06] eventHandlerClassFileLoadHook+0x96 >> V [libjvm.so+0xa04afa] JvmtiClassFileLoadHookPoster::post_to_env(JvmtiEnv*, bool)+0x1a8 >> V [libjvm.so+0xa0485e] JvmtiClassFileLoadHookPoster::post_all_envs()+0x8a >> V [libjvm.so+0xa047c6] JvmtiClassFileLoadHookPoster::post()+0x18 >> V [libjvm.so+0x9fb6e1] JvmtiExport::post_class_file_load_hook(Symbol*, Handle, Handle, unsigned char**, unsigned char**, JvmtiCachedClassFileData**)+0x85 >> V [libjvm.so+0x5cd17d] ClassFileParser::parseClassFile(Symbol*, ClassLoaderData*, Handle, KlassHandle, GrowableArray*, TempNewSymbol&, bool, Thread*)+0x2af >> V [libjvm.so+0x5dd441] ClassFileParser::parseClassFile(Symbol*, ClassLoaderData*, Handle, TempNewSymbol&, bool, Thread*)+0x95 >> V [libjvm.so+0x5daf03] ClassLoader::load_classfile(Symbol*, Thread*)+0x2ed >> V [libjvm.so+0xce1cc4] SystemDictionary::load_instance_class(Symbol*, Handle, Thread*)+0x118 >> V [libjvm.so+0xce00a8] SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, Handle, Thread*)+0x824 >> V [libjvm.so+0xcde9e5] SystemDictionary::resolve_or_null(Symbol*, Handle, Handle, Thread*)+0x26d >> V [libjvm.so+0xcde435] SystemDictionary::resolve_or_fail(Symbol*, Handle, Handle, bool, Thread*)+0x39 >> V [libjvm.so+0x690fbc] ConstantPool::klass_at_impl(constantPoolHandle, int, Thread*)+0x3cc >> V [libjvm.so+0x5398cb] ConstantPool::klass_at(int, Thread*)+0x55 >> V [libjvm.so+0x8b1f3c] InterpreterRuntime::_new(JavaThread*, ConstantPool*, int)+0x14a >> j sun.misc.URLClassPath$JarLoader.checkResource(Ljava/lang/String;ZLjava/util/jar/JarEntry;)Lsun/misc/Resource;+42 >> j sun.misc.URLClassPath$JarLoader.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+54 >> j sun.misc.URLClassPath.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+53 >> j java.net.URLClassLoader$1.run()Ljava/lang/Class;+26 >> j 
java.net.URLClassLoader$1.run()Ljava/lang/Object;+1 >> v ~StubRoutines::call_stub >> V [libjvm.so+0x8c3060] JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x6b2 >> V [libjvm.so+0xba06bc] os::os_exception_wrapper(void (*)(JavaValue*, methodHandle*, JavaCallArguments*, Thread*), JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x3a >> V [libjvm.so+0x8c29a7] JavaCalls::call(JavaValue*, methodHandle, JavaCallArguments*, Thread*)+0x7d >> V [libjvm.so+0x972a80] JVM_DoPrivileged+0x63d >> j java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;+0 >> j java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class;+13 >> j java.lang.ClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;+70 >> j java.lang.ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class; >> v ~StubRoutines::call_stub >> V [libjvm.so+0x8c3060] JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x6b2 >> V [libjvm.so+0xba06bc] os::os_exception_wrapper(void (*)(JavaValue*, methodHandle*, JavaCallArguments*, Thread*), JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x3a >> V [libjvm.so+0x8c29a7] JavaCalls::call(JavaValue*, methodHandle, JavaCallArguments*, Thread*)+0x7d >> V [libjvm.so+0x8c1ec7] JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x1cb >> V [libjvm.so+0x8c2086] JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Handle, Thread*)+0xb0 >> V [libjvm.so+0xce2096] SystemDictionary::load_instance_class(Symbol*, Handle, Thread*)+0x4ea >> V [libjvm.so+0xce00a8] SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, Handle, Thread*)+0x824 >> V [libjvm.so+0xcde9e5] SystemDictionary::resolve_or_null(Symbol*, Handle, Handle, Thread*)+0x26d >> V [libjvm.so+0xcde435] SystemDictionary::resolve_or_fail(Symbol*, Handle, Handle, bool, Thread*)+0x39 >> V 
[libjvm.so+0x98c89e] find_class_from_class_loader(JNIEnv_*, Symbol*, unsigned char, Handle, Handle, unsigned char, Thread*)+0x49
>> V [libjvm.so+0x96f681] JVM_FindClassFromCaller+0x39d
>> C [libjava.so+0xdfd0] Java_java_lang_Class_forName0+0x130
>> j java.lang.Class.forName0(Ljava/lang/String;ZLjava/lang/ClassLoader;Ljava/lang/Class;)Ljava/lang/Class;+0
>> j java.lang.Class.forName(Ljava/lang/String;ZLjava/lang/ClassLoader;)Ljava/lang/Class;+49
>> j ParallelTransformerLoaderApp.loadClasses(I)V+25
>> ......

From karen.kinnear at oracle.com Sat Nov 1 12:47:07 2014
From: karen.kinnear at oracle.com (Karen Kinnear)
Date: Sat, 1 Nov 2014 08:47:07 -0400
Subject: RFR: 8038468: java/lang/instrument/ParallelTransformerLoader.sh fails with ClassCircularityError
In-Reply-To: <5454AF0B.4060008@oracle.com>
References: <543C591E.8010602@oracle.com> <544AB477.4000204@oracle.com> <544ADC07.6080904@oracle.com> <544AE76A.9030701@oracle.com> <544E5123.1060202@oracle.com> <544E8844.1070907@oracle.com> <0FB37288-5995-4E0B-B005-E00A4FB3B22A@oracle.com> <5454218D.40009@oracle.com> <5454AF0B.4060008@oracle.com>
Message-ID: <48A070EE-B8B8-4C3E-B29C-1A45BF48C84B@oracle.com>

So the loader this code refers to is the loader that is called when we get a CFLH.
Run the test with -XX:+TraceClassLoading and on the error situation - see which class loader
is trying to load URLClassPath$JarLoader$2 - that should be the boot loader, and that is the
value of the "loader" coming in to the TestTransformer for the situation below.

The way I read this - the boot loader is trying to load URLClassPath$JarLoader$2 in order to be able
to let the URLClassLoader do its findClass, and the agent intercepts that and tries to call loadClasses,
which itself will use the loader in the test case, which puts us into an infinite loop - except
circularity detection catches us.

See if that makes sense with experimentation please.
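
To make the screening idea concrete, here is a minimal sketch of a transformer that ignores boot loader loads (loader == null) and only triggers the nested load on the thread named "TestThread". It is an illustration under assumptions, not the actual ParallelTransformerLoaderAgent code: the class name BootLoaderScreeningTransformer, the helper shouldTriggerNestedLoad and the Runnable that stands in for loadClasses(3) are all hypothetical.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;

// Hypothetical illustration only; not the real test agent.
class BootLoaderScreeningTransformer implements ClassFileTransformer {

    // Hypothetical stand-in for the test's loadClasses(3) call.
    private final Runnable nestedLoad;

    BootLoaderScreeningTransformer(Runnable nestedLoad) {
        this.nestedLoad = nestedLoad;
    }

    // A null loader denotes the boot loader. Screening it out lets classes
    // such as sun.misc.URLClassPath$JarLoader$2 finish loading before the
    // agent starts a nested load through the URLClassLoader.
    static boolean shouldTriggerNestedLoad(ClassLoader loader, String threadName) {
        return loader != null && "TestThread".equals(threadName);
    }

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        if (shouldTriggerNestedLoad(loader, Thread.currentThread().getName())) {
            nestedLoad.run();
        }
        return null; // null means the class bytes are left unchanged
    }

    public static void main(String[] args) {
        ClassLoader appLoader = ClassLoader.getSystemClassLoader();
        // Boot loader load on TestThread: screened out.
        System.out.println(shouldTriggerNestedLoad(null, "TestThread"));
        // Non-boot loader load on TestThread: nested load is triggered.
        System.out.println(shouldTriggerNestedLoad(appLoader, "TestThread"));
        // Any other thread: never triggers the nested load.
        System.out.println(shouldTriggerNestedLoad(appLoader, "main"));
    }
}
```

Keeping the decision in a static helper makes it possible to check the screening rule without installing an agent; whether this is enough to remove the sleep loop in the real test would still need to be confirmed by running it.
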
thanks,
Karen

On Nov 1, 2014, at 5:59 AM, serguei.spitsyn at oracle.com wrote:

> On 10/31/14 4:55 PM, Yumin Qi wrote:
>> Karen,
>>
>> Thanks for your detailed message for debugging. Yes, from my debugging, the exception did happen in TestThread rather than the main thread. I have no idea why in the end the exception was reported in the main thread.
>>
>> You mention
>>
>> So that change to the test would be:
>> in TestTransformer:
>>     if (loader != null) {
>>         if (tName.equals("TestThread")) {
>>         {
>>             loadClasses(3);
>>         }
>>         }
>>         return null;
>>     }
>>
>> The loader is the one defined in the test case, right?
>
> Not sure I understand your question correctly.
>
> If the thread is the TestThread then most likely the answer is "Yes".
> This one is expected:
>     sClassLoader = new URLClassLoader(new URL[] {sURL});
>
> The class loading for TestThread has to happen in loadClasses(2).
> I wonder if we ever observe any other loader for the TestThread.
>
> The question is because the TestThread is pretty simple:
>
>     private static class TestThread extends Thread {
>         private final int fIndex;
>         public TestThread(int index) {   <== it is called with index = 2
>             super("TestThread");
>             fIndex = index;
>         }
>         public void run() { loadClasses(fIndex); }
>     }
>
> Thanks,
> Serguei
>
>
>> The system class loader is never null.
>> I will try this change, let's see if it can work it out.
java.net.URLClassLoader$1.run()Ljava/lang/Class;+26 >>> j java.net.URLClassLoader$1.run()Ljava/lang/Object;+1 >>> v ~StubRoutines::call_stub >>> V [libjvm.so+0x8c3060] JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x6b2 >>> V [libjvm.so+0xba06bc] os::os_exception_wrapper(void (*)(JavaValue*, methodHandle*, JavaCallArguments*, Thread*), JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x3a >>> V [libjvm.so+0x8c29a7] JavaCalls::call(JavaValue*, methodHandle, JavaCallArguments*, Thread*)+0x7d >>> V [libjvm.so+0x972a80] JVM_DoPrivileged+0x63d >>> j java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;+0 >>> j java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class;+13 >>> j java.lang.ClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;+70 >>> j java.lang.ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class; >>> v ~StubRoutines::call_stub >>> V [libjvm.so+0x8c3060] JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x6b2 >>> V [libjvm.so+0xba06bc] os::os_exception_wrapper(void (*)(JavaValue*, methodHandle*, JavaCallArguments*, Thread*), JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x3a >>> V [libjvm.so+0x8c29a7] JavaCalls::call(JavaValue*, methodHandle, JavaCallArguments*, Thread*)+0x7d >>> V [libjvm.so+0x8c1ec7] JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x1cb >>> V [libjvm.so+0x8c2086] JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Handle, Thread*)+0xb0 >>> V [libjvm.so+0xce2096] SystemDictionary::load_instance_class(Symbol*, Handle, Thread*)+0x4ea >>> V [libjvm.so+0xce00a8] SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, Handle, Thread*)+0x824 >>> V [libjvm.so+0xcde9e5] SystemDictionary::resolve_or_null(Symbol*, Handle, Handle, Thread*)+0x26d >>> V [libjvm.so+0xcde435] 
SystemDictionary::resolve_or_fail(Symbol*, Handle, Handle, bool, Thread*)+0x39 >>> V [libjvm.so+0x98c89e] find_class_from_class_loader(JNIEnv_*, Symbol*, unsigned char, Handle, Handle, unsigned char, Thread*)+0x49 >>> V [libjvm.so+0x96f681] JVM_FindClassFromCaller+0x39d >>> C [libjava.so+0xdfd0] Java_java_lang_Class_forName0+0x130 >>> j java.lang.Class.forName0(Ljava/lang/String;ZLjava/lang/ClassLoader;Ljava/lang/Class;)Ljava/lang/Class;+0 >>> j java.lang.Class.forName(Ljava/lang/String;ZLjava/lang/ClassLoader;)Ljava/lang/Class;+49 >>> j ParallelTransformerLoaderApp.loadClasses(I)V+25 >>> ...... > From peter.levart at gmail.com Sat Nov 1 16:40:02 2014 From: peter.levart at gmail.com (Peter Levart) Date: Sat, 01 Nov 2014 17:40:02 +0100 Subject: [9] RFR(L) 8013267 : move MemberNameTable from native code to Java heap, use to intern MemberNames In-Reply-To: References: <594FE416-D445-426E-B7A9-C75F012ADE16@oracle.com> <0BDCD5DD-CBFB-4A3D-9BAD-7F9912E9237C@oracle.com> <5453C230.8010709@oracle.com> <9747A3A3-B7F4-407A-8E00-1E647D9DC2D1@oracle.com> <545406BC.20005@gmail.com> Message-ID: <54550CE2.9090605@gmail.com> On 10/31/2014 11:59 PM, David Chase wrote: > Thanks very much, I shall attend to these irregularities. > > David Hi David, Just a nit (in Class.ClassData): 2537 if (oldCapacity > 0) { 2538 element_data[oldCapacity] = element_data[oldCapacity - 1]; 2539 // all array elements are non-null and sorted, increase size. 2540 // if store to element_data above floats below 2541 // store to size on the next line, that will be 2542 // inconsistent to the VM if a safepoint occurs here. 
2543 size += 1;
2544 for (int i = oldCapacity; i > index; i--) {
2545     // pre: element_data[i] is duplicated at [i+1]
2546     element_data[i] = element_data[i - 1];
2547     // post: element_data[i-1] is duplicated at [i]
2548 }
2549 // element_data[index] is duplicated at [index+1]
2550 element_data[index] = (Comparable) e;
2551 } else {

In line 2544, you could start the for loop with (int i = oldCapacity - 1; ...), since you have already moved the last element before incrementing the size. Also, I would grasp the code more quickly if "oldCapacity" were called "oldSize".

Now just a thought...

What is the expected ratio of intern() calls that insert a new element to those that just return an existing interned element? If calls that return an existing element are frequent, and since you already carefully arrange insertion so that the VM can see a "consistent" state without null elements at any safepoint, I wonder if intern() could itself perform an optimistic search without holding an exclusive lock.

This is just speculation, but would the following code work?
    private Comparable[] elementData() {
        Comparable[] elementData = this.elementData;
        if (elementData == null) {
            synchronized (this) {
                elementData = this.elementData;
                if (elementData == null) {
                    this.elementData = elementData = new Comparable[1];
                }
            }
        }
        return elementData;
    }

    private final StampedLock lock = new StampedLock();

    public <E extends Comparable<E>> E intern(Class klass, E memberName, int redefined_count) {
        int size, index = 0;
        Comparable[] elementData;
        // try to take an optimistic-read stamp
        long rstamp = lock.tryOptimisticRead();
        long wstamp = 0L;

        if (rstamp != 0L) { // successful
            // 1st read size so that it doesn't overshoot the actual elementData.length
            size = this.size;
            // 2nd read elementData
            elementData = elementData();

            index = Arrays.binarySearch(elementData, 0, size, memberName);
            if (index >= 0) {
                E element = (E) elementData[index];
                // validate that our reads were not disturbed by any writes
                if (lock.validate(rstamp)) {
                    return element;
                }
            }

            // try to convert to write lock
            wstamp = lock.tryConvertToWriteLock(rstamp);
        }

        if (wstamp == 0L) {
            // either tryOptimisticRead or tryConvertToWriteLock failed -
            // must acquire write lock and re-read/re-try search
            wstamp = lock.writeLock();
            size = this.size;
            elementData = elementData();
            index = Arrays.binarySearch(elementData, 0, size, memberName);
            if (index >= 0) {
                E element = (E) elementData[index];
                lock.unlockWrite(wstamp);
                return element;
            }
        }

        // we have a write lock and are sure there was no element found
        E element = add(klass, ~index, memberName, redefined_count);

        lock.unlockWrite(wstamp);
        return element;
    }

The only thing that would have to change in the add() method is to publish new elements safely. Code doing a binary search under an optimistic read could observe an unsafely published MemberName, and comparing with such an instance could lead to an NPE, for example.
To remedy this, the newly inserted MemberName would have to be published using a volatile write to the array slot (using Unsafe). Moving existing elements up and down the array does not have to be performed with volatile writes, since they have already been published.

Do you think this would be worth the effort?

Regards, Peter

> On 2014-10-31, at 6:01 PM, Peter Levart wrote:
>
>> On 10/31/2014 07:11 PM, David Chase wrote:
>>> I found a lurking bug and updated the webrevs - I was mistaken
>>> about this version having passed the ute tests (but now, for real, it does).
>>>
>>> I also converted Christian's test code into a jtreg test (which passes):
>>>
>>>
>>> http://cr.openjdk.java.net/~drchase/8013267/hotspot.05/
>>> http://cr.openjdk.java.net/~drchase/8013267/jdk.05/
>> Hi David,
>>
>> I'll just comment on the JDK side of things.
>>
>> In Class.ClassData.intern(), there is a part that synchronizes on the elementData (a volatile field holding an array of Comparable(s)):
>>
>> 2500 synchronized (elementData) {
>> 2501     final int index = Arrays.binarySearch(elementData, 0, size, memberName);
>> 2502     if (index >= 0) {
>> 2503         return (E) elementData[index];
>> 2504     }
>> 2505     // Not found, add carefully.
>> 2506     return add(klass, ~index, memberName, redefined_count);
>> 2507 }
>>
>> Inside this synchronized block, the add() method is called, which can call the grow() method:
>>
>> 2522 if (oldCapacity + 1 > element_data.length ) {
>> 2523     // Replacing array with a copy is safe; elements are identical.
>> 2524     grow(oldCapacity + 1);
>> 2525     element_data = elementData;
>> 2526 }
>>
>> The grow() method creates a copy of the elementData array and stores the copy in this volatile field (line 2584):
>>
>> 2577 private void grow(int minCapacity) {
>> 2578     // overflow-conscious code
>> 2579     int oldCapacity = elementData.length;
>> 2580     int newCapacity = oldCapacity + (oldCapacity >> 1);
>> 2581     if (newCapacity - minCapacity < 0)
>> 2582         newCapacity = minCapacity;
>> 2583     // minCapacity is usually close to size, so this is a win:
>> 2584     elementData = Arrays.copyOf(elementData, newCapacity);
>> 2585 }
>>
>> A concurrent call to intern() can therefore synchronize on a different monitor, so two threads can end up inserting elements into the same array at the same time. Ouch!
>>
>>
>>
>> Also, lazy construction of the ClassData instance:
>>
>> 2593 private ClassData classData() {
>> 2594     if (this.classData != null) {
>> 2595         return this.classData;
>> 2596     }
>> 2597     synchronized (this) {
>> 2598         if (this.classData == null) {
>> 2599             this.classData = new ClassData<>();
>> 2600         }
>> 2601     }
>> 2602     return this.classData;
>> 2603 }
>>
>> This synchronizes on the j.l.Class instance, which can interfere with user synchronization (think synchronized static methods). This is dangerous.
>>
>> There's an inner class Class.Atomic which is a home for Unsafe machinery in j.l.Class. You can add a casClassData method to it and use it to atomically install the ClassData instance without synchronized blocks.
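The lock-free lazy install suggested above could look roughly like the sketch below. It is only an illustration under stated assumptions: Holder and ClassData are hypothetical stand-ins for java.lang.Class and Class.ClassData, and AtomicReferenceFieldUpdater is used here in place of the Unsafe-based Class.Atomic helper, to which a casClassData method would first have to be added.

```java
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

public class LazyInstallSketch {
    // Hypothetical stand-in for Class.ClassData.
    public static final class ClassData { }

    // Hypothetical stand-in for the object owning the lazily created field.
    public static final class Holder {
        // The field must be volatile for AtomicReferenceFieldUpdater to accept it.
        private volatile ClassData classData;

        private static final AtomicReferenceFieldUpdater<Holder, ClassData> UPDATER =
            AtomicReferenceFieldUpdater.newUpdater(Holder.class, ClassData.class, "classData");

        public ClassData classData() {
            ClassData cd = classData;
            if (cd == null) {
                ClassData fresh = new ClassData();
                // Exactly one thread wins the CAS; a loser drops its own
                // instance and re-reads the winner's, which is non-null by then.
                cd = UPDATER.compareAndSet(this, null, fresh) ? fresh : classData;
            }
            return cd;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Holder holder = new Holder();
        ClassData[] seen = new ClassData[4];
        Thread[] threads = new Thread[seen.length];
        for (int i = 0; i < threads.length; i++) {
            final int slot = i;
            threads[i] = new Thread(() -> seen[slot] = holder.classData());
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // Every thread must have observed the same installed instance.
        for (ClassData cd : seen) {
            if (cd != seen[0]) {
                throw new AssertionError("threads saw different ClassData instances");
            }
        }
        System.out.println("single instance installed");
    }
}
```

A thread that loses the race simply discards its freshly built instance, so no monitor on the owning object is ever taken and user code that synchronizes on that object is unaffected.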
>>
>>
>>
>> Regards, Peter
>>

From david.r.chase at oracle.com Sat Nov 1 17:03:28 2014
From: david.r.chase at oracle.com (David Chase)
Date: Sat, 1 Nov 2014 13:03:28 -0400
Subject: [9] RFR(L) 8013267 : move MemberNameTable from native code to Java heap, use to intern MemberNames
In-Reply-To: <54550CE2.9090605@gmail.com>
References: <594FE416-D445-426E-B7A9-C75F012ADE16@oracle.com> <0BDCD5DD-CBFB-4A3D-9BAD-7F9912E9237C@oracle.com> <5453C230.8010709@oracle.com> <9747A3A3-B7F4-407A-8E00-1E647D9DC2D1@oracle.com> <545406BC.20005@gmail.com> <54550CE2.9090605@gmail.com>
Message-ID: <13FF3B65-027D-4152-8CEF-0F31773976EA@oracle.com>

Hello Peter,

I think it is expected that inserting-interns will be asymptotically rare - classes have a finite number of methods, after all. I'm not sure if that is worth doing right now, since this is also a bug fix - maybe the performance enhancements go in as an RFE.

Maybe other reviewers will have an opinion?

David

On 2014-11-01, at 12:40 PM, Peter Levart wrote:

>
> On 10/31/2014 11:59 PM, David Chase wrote:
>> Thanks very much, I shall attend to these irregularities.
>>
>> David
>>
>
> Hi David,
>
> Just a nit (in Class.ClassData):
>
> 2537 if (oldCapacity > 0) {
> 2538     element_data[oldCapacity] = element_data[oldCapacity - 1];
> 2539     // all array elements are non-null and sorted, increase size.
> 2540     // if store to element_data above floats below
> 2541     // store to size on the next line, that will be
> 2542     // inconsistent to the VM if a safepoint occurs here.
> 2543 size += 1;
> 2544 for (int i = oldCapacity; i > index; i--) {
> 2545     // pre: element_data[i] is duplicated at [i+1]
> 2546     element_data[i] = element_data[i - 1];
> 2547     // post: element_data[i-1] is duplicated at [i]
> 2548 }
> 2549 // element_data[index] is duplicated at [index+1]
> 2550 element_data[index] = (Comparable) e;
> 2551 } else {
>
> In line 2544, you could start the for loop with (int i = oldCapacity - 1; ...), since you have already moved the last element before incrementing the size. Also, I would grasp the code more quickly if "oldCapacity" were called "oldSize".
>
> Now just a thought...
>
> What is the expected ratio of intern() calls that insert a new element to those that just return an existing interned element? If calls that return an existing element are frequent, and since you already carefully arrange insertion so that the VM can see a "consistent" state without null elements at any safepoint, I wonder if intern() could itself perform an optimistic search without holding an exclusive lock.
>
> This is just speculation, but would the following code work?
>
>     private Comparable[] elementData() {
>         Comparable[] elementData = this.elementData;
>         if (elementData == null) {
>             synchronized (this) {
>                 elementData = this.elementData;
>                 if (elementData == null) {
>                     this.elementData = elementData = new Comparable[1];
>                 }
>             }
>         }
>         return elementData;
>     }
>
>     private final StampedLock lock = new StampedLock();
>
>     public <E extends Comparable<E>> E intern(Class klass, E memberName, int redefined_count) {
>         int size, index = 0;
>         Comparable[] elementData;
>         // try to take an optimistic-read stamp
>         long rstamp = lock.tryOptimisticRead();
>         long wstamp = 0L;
>
>         if (rstamp != 0L) { // successful
>             // 1st read size so that it doesn't overshoot the actual elementData.length
>             size = this.size;
>             // 2nd read elementData
>             elementData = elementData();
>
>             index = Arrays.binarySearch(elementData, 0, size, memberName);
>             if (index >= 0) {
>                 E element = (E) elementData[index];
>                 // validate that our reads were not disturbed by any writes
>                 if (lock.validate(rstamp)) {
>                     return element;
>                 }
>             }
>
>             // try to convert to write lock
>             wstamp = lock.tryConvertToWriteLock(rstamp);
>         }
>
>         if (wstamp == 0L) {
>             // either tryOptimisticRead or tryConvertToWriteLock failed -
>             // must acquire write lock and re-read/re-try search
>             wstamp = lock.writeLock();
>             size = this.size;
>             elementData = elementData();
>             index = Arrays.binarySearch(elementData, 0, size, memberName);
>             if (index >= 0) {
>                 E element = (E) elementData[index];
>                 lock.unlockWrite(wstamp);
>                 return element;
>             }
>         }
>
>         // we have a write lock and are sure there was no element found
>         E element = add(klass, ~index, memberName, redefined_count);
>
>         lock.unlockWrite(wstamp);
>         return element;
>     }
>
>
>
> The only thing that would have to change in the add() method is to publish new elements safely. Code doing a binary search under an optimistic read could observe an unsafely published MemberName, and comparing with such an instance could lead to an NPE, for example.
> To remedy this, the newly inserted MemberName would have to be published using a volatile write to the array slot (using Unsafe). Moving existing elements up and down the array does not have to be performed with volatile writes, since they have already been published.
>
> Do you think this would be worth the effort?
>
> Regards, Peter
>
>
>> On 2014-10-31, at 6:01 PM, Peter Levart wrote:
>>
>>
>>> On 10/31/2014 07:11 PM, David Chase wrote:
>>>
>>>> I found a lurking bug and updated the webrevs - I was mistaken
>>>> about this version having passed the ute tests (but now, for real, it does).
>>>>
>>>> I also converted Christian's test code into a jtreg test (which passes):
>>>>
>>>>
>>>>
>>>> http://cr.openjdk.java.net/~drchase/8013267/hotspot.05/
>>>> http://cr.openjdk.java.net/~drchase/8013267/jdk.05/
>>> Hi David,
>>>
>>> I'll just comment on the JDK side of things.
>>>
>>> In Class.ClassData.intern(), there is a part that synchronizes on the elementData (a volatile field holding an array of Comparable(s)):
>>>
>>> 2500 synchronized (elementData) {
>>> 2501     final int index = Arrays.binarySearch(elementData, 0, size, memberName);
>>> 2502     if (index >= 0) {
>>> 2503         return (E) elementData[index];
>>> 2504     }
>>> 2505     // Not found, add carefully.
>>> 2506     return add(klass, ~index, memberName, redefined_count);
>>> 2507 }
>>>
>>> Inside this synchronized block, the add() method is called, which can call the grow() method:
>>>
>>> 2522 if (oldCapacity + 1 > element_data.length ) {
>>> 2523     // Replacing array with a copy is safe; elements are identical.
>>> 2524     grow(oldCapacity + 1);
>>> 2525     element_data = elementData;
>>> 2526 }
>>>
>>> The grow() method creates a copy of the elementData array and stores the copy in this volatile field (line 2584):
>>>
>>> 2577 private void grow(int minCapacity) {
>>> 2578     // overflow-conscious code
>>> 2579     int oldCapacity = elementData.length;
>>> 2580     int newCapacity = oldCapacity + (oldCapacity >> 1);
>>> 2581     if (newCapacity - minCapacity < 0)
>>> 2582         newCapacity = minCapacity;
>>> 2583     // minCapacity is usually close to size, so this is a win:
>>> 2584     elementData = Arrays.copyOf(elementData, newCapacity);
>>> 2585 }
>>>
>>> A concurrent call to intern() can therefore synchronize on a different monitor, so two threads can end up inserting elements into the same array at the same time. Ouch!
>>>
>>>
>>>
>>> Also, lazy construction of the ClassData instance:
>>>
>>> 2593 private ClassData classData() {
>>> 2594     if (this.classData != null) {
>>> 2595         return this.classData;
>>> 2596     }
>>> 2597     synchronized (this) {
>>> 2598         if (this.classData == null) {
>>> 2599             this.classData = new ClassData<>();
>>> 2600         }
>>> 2601     }
>>> 2602     return this.classData;
>>> 2603 }
>>>
>>> This synchronizes on the j.l.Class instance, which can interfere with user synchronization (think synchronized static methods). This is dangerous.
>>>
>>> There's an inner class Class.Atomic which is a home for Unsafe machinery in j.l.Class. You can add a casClassData method to it and use it to atomically install the ClassData instance without synchronized blocks.
>>> >>> >>> >>> Regards, Peter >>> >>> > From david.r.chase at oracle.com Mon Nov 3 00:05:16 2014 From: david.r.chase at oracle.com (David Chase) Date: Sun, 2 Nov 2014 19:05:16 -0500 Subject: [9] RFR(L) 8013267 : move MemberNameTable from native code to Java heap, use to intern MemberNames In-Reply-To: References: <594FE416-D445-426E-B7A9-C75F012ADE16@oracle.com> <0BDCD5DD-CBFB-4A3D-9BAD-7F9912E9237C@oracle.com> <5453C230.8010709@oracle.com> <9747A3A3-B7F4-407A-8E00-1E647D9DC2D1@oracle.com> Message-ID: <1D195ED3-8BD2-46CB-9D70-29CF809D9F5A@oracle.com> On 2014-10-31, at 5:45 PM, Vitaly Davidovich wrote: > The volatile load prevents subsequent loads and stores from reordering with it, but that doesn't stop C from moving before the B store. So breaking B into the load (call it BL) and store (BS) you can still get this ordering: A, BL, C, BS I think this should do the trick. element_data[oldCapacity] = element_data[oldCapacity - 1]; // all array elements are non-null and sorted, increase size. // if store to element_data above floats below // store to size on the next line, that will be // inconsistent to the VM if a safepoint occurs here. size += 1; // Load of volatile size prevents movement of element_data store for (int i = size - 1; i > index; i--) { The change is to load the volatile size for the loop bound; this stops the stores in the loop from moving earlier, right? David From david.holmes at oracle.com Mon Nov 3 02:29:29 2014 From: david.holmes at oracle.com (David Holmes) Date: Mon, 03 Nov 2014 12:29:29 +1000 Subject: RFR 8062116: JVMTI GetClassMethods is Slow In-Reply-To: References: <5453AC95.3060904@oracle.com> Message-ID: <5456E889.8080100@oracle.com> Again adding in serviceability. David On 1/11/2014 6:17 AM, Jeremy Manson wrote: > Thanks, Coleen - I saw that you committed it, but the change had a long > contributed-by line, so I wasn't sure whether you were the right person to > reach out to. 
> > Jeremy > > On Fri, Oct 31, 2014 at 8:36 AM, Coleen Phillimore < > coleen.phillimore at oracle.com> wrote: > >> >> Jeremy, >> I will review and sponsor this for you since I wrote the original code. >> Thanks, >> Coleen >> >> >> On 10/30/14, 1:02 PM, Jeremy Manson wrote: >> >>> There's a significant regression in the speed of JVMTI GetClassMethods in >>> JDK8. I've tracked this down to allocation of jmethodids in a tight loop. >>> The issue can be addressed by preallocating enough space for all of the >>> jmethodids when starting the operation and not iterating over all of the >>> existing jmethodids when you allocate a new one. >>> >>> A patch is here: >>> >>> http://cr.openjdk.java.net/~jmanson/8062116/webrev.00/ >>> >>> A reproducible test case can be found here: >>> >>> http://cr.openjdk.java.net/~jmanson/8062116/repro/ >>> >>> It's a benchmark, though: I have no idea how to turn it into a test. >>> >>> For whoever reviews it: can you explain to me why it is okay that this >>> code >>> reuses jmethodIDs (in JNIMethodBlock::add_method? I can imagine a lot of >>> problems stemming from accidental reuse. >>> >>> Jeremy >>> >> >> From david.holmes at oracle.com Mon Nov 3 02:39:27 2014 From: david.holmes at oracle.com (David Holmes) Date: Mon, 03 Nov 2014 12:39:27 +1000 Subject: RFR: 8038468: java/lang/instrument/ParallelTransformerLoader.sh fails with ClassCircularityError In-Reply-To: <5454218D.40009@oracle.com> References: <543C591E.8010602@oracle.com> <544AB477.4000204@oracle.com> <544ADC07.6080904@oracle.com> <544AE76A.9030701@oracle.com> <544E5123.1060202@oracle.com> <544E8844.1070907@oracle.com> <0FB37288-5995-4E0B-B005-E00A4FB3B22A@oracle.com> <5454218D.40009@oracle.com> Message-ID: <5456EADF.4050203@oracle.com> On 1/11/2014 9:55 AM, Yumin Qi wrote: > Karen, > > Thanks for your detail message for debugging. Yes, from my debugging, > the exception did happen in TestThread other than main thread. 
I have no > idea why in the end the exception was reported in main thread. Until that question is answered I will remain uneasy about simply tweaking the test until it no longer fails. I would also like to know when it started failing - Karen alludes to the possible introduction of a new inner class at some point. Thanks, David > You mention > > So that change to the test would be: > in TestTransformer: > if (loader != null) { > if (tName.equals("TestThread")) { > { > loadClasses(3); > } > } > return null; > } > > > The loader is the one defined in the test case, right? The system class > loader is never null. > I will try this change, let's see if it can work it out. > > Thanks > Yumin > > On 10/31/2014 3:29 PM, Karen Kinnear wrote: >> Yumin, >> >> From your earlier exception stack trace (many thanks) you reported: >> >> Exception in thread "main" java.lang.ClassCircularityError: (no - I >> don't know why this is in thread "main") >> sun/misc/URLClassPath$JarLoader$2 >> at sun.misc.URLClassPath$JarLoader.checkResource(URLClassPath.java:771) >> at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:843) >> at sun.misc.URLClassPath.getResource(URLClassPath.java:199) >> at java.net.URLClassLoader$1.run(URLClassLoader.java:364) >> at java.net.URLClassLoader$1.run(URLClassLoader.java:361) >> at java.security.AccessController.doPrivileged(Native Method) >> at java.net.URLClassLoader.findClass(URLClassLoader.java:360) >> at java.lang.ClassLoader.loadClass(ClassLoader.java:426) >> at java.lang.ClassLoader.loadClass(ClassLoader.java:359) >> at java.lang.Class.forName0(Native Method) >> at java.lang.Class.forName(Class.java:340) >> at >> ParallelTransformerLoaderApp.loadClasses(ParallelTransformerLoaderApp.java:83) >> >> at >> ParallelTransformerLoaderApp.main(ParallelTransformerLoaderApp.java:45) >> >> >> So I ran with -XX:AbortVMOnException=java.lang.ClassCircularityError >> -XX:+ShowMessageBoxOnError to get >> a log file and stack trace. 
See my instructions below on how to do that. >> >> I did this, attached a debugger, which didn't help enough since I >> needed to see the java stack frames, >> and got an hs_err_log also, so the stack traces came from the error >> log. >> >> The stack trace was on Thread 2, which in the hs_err_log was >> TestThread (which makes sense for what the test logic says). >> See later in email for stack traces from Thread 2. >> >> Summary of stack trace: >> >> TestThread: >> loadClasses(#) -> forName(TestClass#, URLClassLoader) >> vm calls out to URLClassLoader.loadClass(String) which is >> inherited from java.lang.ClassLoader.loadClass(String) >> ... calls java.net.URLClassLoader.findClass(...) which calls >> DoPrivileged java.net.URLClassLoader$1.run which calls >> sun.misc.URLClassPath.getResource(name, false) which calls >> sun.misc.URLClassPath$JarLoader.getResource which calls >> sun.misc.URLClassPath$JarLoader.checkResource which >> tries to call sun.misc.URLClassPath$JarLoader$2 >> - and then the transformer jumps in with loadClasses(# (which we >> know is 3) and walks the same logic which tries to load >> sun.misc.URLClassPath$JarLoader$2 again >> >> Note that in the placeholder table information that Yumin printed, the >> circularity error is on sun.misc.URLClassPath$JarLoader$2 with the >> null == boot loader, which >> makes sense -- that is the appropriate defining loader, and therefore >> the one the CFLH would intercept during the defineClass phase. >> >> In the sun.misc.URLClassPath.java file, in the class JarLoader, in the >> method checkResource >> ... return new Resource() { ... } >> -- I do not know why that generates sun.misc.URLClassPath$JarLoader$1, >> $2 and $3 at build time or when that was added. >> I would guess that is when the bug started happening. >> >> When I have a successful run, sun.misc.URLClassPath$JarLoader$2 loads >> before any TestClass1 loads. 
>> >> My belief is that the point of the test is to test parallel class >> loading for URL class loaders. >> I don't think the point is to test the bootstrap class loader, nor to >> test bootstrapping - i.e. running the agent before >> we have loaded sufficient classes to allow loading URLClassLoader >> classes. >> >> What I suggested to Yumin that he try would be to change the test to >> NOT intercept boot loader loads, so that >> sun.misc.URLClassPath$JarLoader$# >> can load which will in turn allow classes loaded by a URLClassLoader >> subclass to load. >> >> So that change to the test would be: >> in TestTransformer: >> if (loader != null) { >> if (tName.equals("TestThread")) { >> { >> loadClasses(3); >> } >> } >> return null; >> } >> // I also suspect with that change, we can remove the sleep loop >> Note: there was a printed message which said that the Thread "Signal >> Dispatcher" has called transform(), which I >> ignored, however it is good that we don't call loadClass on that >> thread - which is part of what the sleep loop does - >> but that would be handled by the boot loader screening above >> >> Alternatively we can preload the URLClassPath classes, but I don't >> think we want to do that, or >> we can have the agent explicitly screen on a variety of jdk >> bootstrapping classes. But I think the cleaner >> solution is to screen on the boot loader. >> >> Does that make any sense to others? >> >> thanks, >> Karen >> >> p.s. How to run with hotspot flags (jtreg has a -show:rerun option, >> but with a shell script in the test, this is more complex, so >> the following should be easier): >> >> So what I did was run the test once for it to pass (not your script, >> but just once with jtreg) so that it generated >> the $DST/work directory. >> I then created a rerun.csh script - attached - you can modify for your >> own $DST directory. >> I used it to be able to quickly rerun the test without the jtreg >> framework and compile time etc. 
but mostly >> to be able to actually add hotspot command-line flags. >> >> >> >> >> p.p.s. details from the error log (let me know if you want me to >> attach the error log to the bug report) >> >> note: error log shows last 10 events including: >> Event: 0.928 loading class sun/misc/URLClassPath$JarLoader$2 >> Event: 0.928 loading class TestClass3 >> Event: 0.929 loading class TestClass3 done >> Event: 0.929 loading class java/lang/ClassCircularityError >> Event: 0.929 loading class java/lang/ClassCircularityError done >> >> TestThread >> >> java frames: >> >> j >> sun.misc.URLClassPath$JarLoader.checkResource(Ljava/lang/String;ZLjava/util/jar/JarEntry;)Lsun/misc/Resource;+42 >> >> j >> sun.misc.URLClassPath$JarLoader.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+54 >> >> j >> sun.misc.URLClassPath.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+53 >> >> j java.net.URLClassLoader$1.run()Ljava/lang/Class;+26 >> j java.net.URLClassLoader$1.run()Ljava/lang/Object;+1 >> v ~StubRoutines::call_stub >> j >> java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;+0 >> >> j >> java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class;+13 >> j >> java.lang.ClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;+70 >> j java.lang.ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class;+3 >> v ~StubRoutines::call_stub >> j >> java.lang.Class.forName0(Ljava/lang/String;ZLjava/lang/ClassLoader;Ljava/lang/Class;)Ljava/lang/Class;+0 >> >> j >> java.lang.Class.forName(Ljava/lang/String;ZLjava/lang/ClassLoader;)Ljava/lang/Class;+49 >> >> j ParallelTransformerLoaderAgent$TestTransformer.loadClasses(I)V+25 >> j >> ParallelTransformerLoaderAgent$TestTransformer.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[B)[B+81 >> >> j >> 
sun.instrument.TransformerManager.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[B)[B+50 >> >> j >> sun.instrument.InstrumentationImpl.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[BZ)[B+34 >> >> v ~StubRoutines::call_stub >> j >> sun.misc.URLClassPath$JarLoader.checkResource(Ljava/lang/String;ZLjava/util/jar/JarEntry;)Lsun/misc/Resource;+42 >> >> j >> sun.misc.URLClassPath$JarLoader.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+54 >> >> j >> sun.misc.URLClassPath.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+53 >> >> j java.net.URLClassLoader$1.run()Ljava/lang/Class;+26 >> j java.net.URLClassLoader$1.run()Ljava/lang/Object;+1 >> v ~StubRoutines::call_stub >> j >> java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;+0 >> >> j >> java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class;+13 >> j >> java.lang.ClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;+70 >> j java.lang.ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class;+3 >> v ~StubRoutines::call_stub >> j >> java.lang.Class.forName0(Ljava/lang/String;ZLjava/lang/ClassLoader;Ljava/lang/Class;)Ljava/lang/Class;+0 >> >> j >> java.lang.Class.forName(Ljava/lang/String;ZLjava/lang/ClassLoader;)Ljava/lang/Class;+49 >> >> j ParallelTransformerLoaderApp.loadClasses(I)V+25 >> j ParallelTransformerLoaderApp$TestThread.run()V+4 >> v ~StubRoutines::call_stub >> >> >> >> detailed frames: >> >> V [libjvm.so+0x760f5a] Exceptions::_throw_msg(Thread*, char const*, >> int, Symbol*, char const*)+0x7c >> V [libjvm.so+0xce005c] >> SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, >> Handle, Thread*)+0x7d8 >> V [libjvm.so+0xcde9e5] SystemDictionary::resolve_or_null(Symbol*, >> Handle, Handle, Thread*)+0x26d >> V [libjvm.so+0xcde435] SystemDictionary::resolve_or_fail(Symbol*, >> 
Handle, Handle, bool, Thread*)+0x39 >> V [libjvm.so+0x690fbc] >> ConstantPool::klass_at_impl(constantPoolHandle, int, Thread*)+0x3cc >> V [libjvm.so+0x5398cb] ConstantPool::klass_at(int, Thread*)+0x55 >> V [libjvm.so+0x8b1f3c] InterpreterRuntime::_new(JavaThread*, >> ConstantPool*, int)+0x14a >> j >> sun.misc.URLClassPath$JarLoader.checkResource(Ljava/lang/String;ZLjava/util/jar/JarEntry;)Lsun/misc/Resource;+42 >> >> j >> sun.misc.URLClassPath$JarLoader.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+54 >> >> j >> sun.misc.URLClassPath.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+53 >> >> j java.net.URLClassLoader$1.run()Ljava/lang/Class;+26 >> j java.net.URLClassLoader$1.run()Ljava/lang/Object;+1 >> v ~StubRoutines::call_stub >> V [libjvm.so+0x8c3060] JavaCalls::call_helper(JavaValue*, >> methodHandle*, JavaCallArguments*, Thread*)+0x6b2 >> V [libjvm.so+0xba06bc] os::os_exception_wrapper(void (*)(JavaValue*, >> methodHandle*, JavaCallArguments*, Thread*), JavaValue*, >> methodHandle*, JavaCallArguments*, Thread*)+0x3a >> V [libjvm.so+0x8c29a7] JavaCalls::call(JavaValue*, methodHandle, >> JavaCallArguments*, Thread*)+0x7d >> V [libjvm.so+0x972a80] JVM_DoPrivileged+0x63d >> j >> java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;+0 >> >> j >> java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class;+13 >> j >> java.lang.ClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;+70 >> j java.lang.ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class;+3 >> v ~StubRoutines::call_stub >> V [libjvm.so+0x8c3060] JavaCalls::call_helper(JavaValue*, >> methodHandle*, JavaCallArguments*, Thread*)+0x6b2 >> V [libjvm.so+0xba06bc] os::os_exception_wrapper(void (*)(JavaValue*, >> methodHandle*, JavaCallArguments*, Thread*), JavaValue*, >> methodHandle*, JavaCallArguments*, Thread*)+0x3a >> V [libjvm.so+0x8c29a7] JavaCalls::call(JavaValue*, methodHandle, >> 
JavaCallArguments*, Thread*)+0x7d >> V [libjvm.so+0x8c1ec7] JavaCalls::call_virtual(JavaValue*, >> KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x1cb >> V [libjvm.so+0x8c2086] JavaCalls::call_virtual(JavaValue*, Handle, >> KlassHandle, Symbol*, Symbol*, Handle, Thread*)+0xb0 >> V [libjvm.so+0xce2096] >> SystemDictionary::load_instance_class(Symbol*, Handle, Thread*)+0x4ea >> V [libjvm.so+0xce00a8] >> SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, >> Handle, Thread*)+0x824 >> V [libjvm.so+0xcde9e5] SystemDictionary::resolve_or_null(Symbol*, >> Handle, Handle, Thread*)+0x26d >> V [libjvm.so+0xcde435] SystemDictionary::resolve_or_fail(Symbol*, >> Handle, Handle, bool, Thread*)+0x39 >> V [libjvm.so+0x98c89e] find_class_from_class_loader(JNIEnv_*, >> Symbol*, unsigned char, Handle, Handle, unsigned char, Thread*)+0x49 >> V [libjvm.so+0x96f681] JVM_FindClassFromCaller+0x39d >> C [libjava.so+0xdfd0] Java_java_lang_Class_forName0+0x130 >> j >> java.lang.Class.forName0(Ljava/lang/String;ZLjava/lang/ClassLoader;Ljava/lang/Class;)Ljava/lang/Class;+0 >> >> j >> java.lang.Class.forName(Ljava/lang/String;ZLjava/lang/ClassLoader;)Ljava/lang/Class;+49 >> >> j ParallelTransformerLoaderAgent$TestTransformer.loadClasses(I)V+25 >> j >> ParallelTransformerLoaderAgent$TestTransformer.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[B)[B+81 >> >> j >> sun.instrument.TransformerManager.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[B)[B+50 >> >> j >> sun.instrument.InstrumentationImpl.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[BZ)[B+34 >> >> v ~StubRoutines::call_stub >> V [libjvm.so+0x8c3060] JavaCalls::call_helper(JavaValue*, >> methodHandle*, JavaCallArguments*, Thread*)+0x6b2 >> V [libjvm.so+0xba06bc] os::os_exception_wrapper(void (*)(JavaValue*, >> methodHandle*, JavaCallArguments*, 
Thread*), JavaValue*, >> methodHandle*, JavaCallArguments*, Thread*)+0x3a >> V [libjvm.so+0x8c29a7] JavaCalls::call(JavaValue*, methodHandle, >> JavaCallArguments*, Thread*)+0x7d >> V [libjvm.so+0x911bfb] jni_invoke_nonstatic(JNIEnv_*, JavaValue*, >> _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*)+0x3cd >> V [libjvm.so+0x916918] jni_CallObjectMethod+0x388 >> C [libinstrument.so+0x4eb5] transformClassFile+0x1e5 >> C [libinstrument.so+0x1e06] eventHandlerClassFileLoadHook+0x96 >> V [libjvm.so+0xa04afa] >> JvmtiClassFileLoadHookPoster::post_to_env(JvmtiEnv*, bool)+0x1a8 >> V [libjvm.so+0xa0485e] >> JvmtiClassFileLoadHookPoster::post_all_envs()+0x8a >> V [libjvm.so+0xa047c6] JvmtiClassFileLoadHookPoster::post()+0x18 >> V [libjvm.so+0x9fb6e1] >> JvmtiExport::post_class_file_load_hook(Symbol*, Handle, Handle, >> unsigned char**, unsigned char**, JvmtiCachedClassFileData**)+0x85 >> V [libjvm.so+0x5cd17d] ClassFileParser::parseClassFile(Symbol*, >> ClassLoaderData*, Handle, KlassHandle, GrowableArray*, >> TempNewSymbol&, bool, Thread*)+0x2af >> V [libjvm.so+0x5dd441] ClassFileParser::parseClassFile(Symbol*, >> ClassLoaderData*, Handle, TempNewSymbol&, bool, Thread*)+0x95 >> V [libjvm.so+0x5daf03] ClassLoader::load_classfile(Symbol*, >> Thread*)+0x2ed >> V [libjvm.so+0xce1cc4] >> SystemDictionary::load_instance_class(Symbol*, Handle, Thread*)+0x118 >> V [libjvm.so+0xce00a8] >> SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, >> Handle, Thread*)+0x824 >> V [libjvm.so+0xcde9e5] SystemDictionary::resolve_or_null(Symbol*, >> Handle, Handle, Thread*)+0x26d >> V [libjvm.so+0xcde435] SystemDictionary::resolve_or_fail(Symbol*, >> Handle, Handle, bool, Thread*)+0x39 >> V [libjvm.so+0x690fbc] >> ConstantPool::klass_at_impl(constantPoolHandle, int, Thread*)+0x3cc >> V [libjvm.so+0x5398cb] ConstantPool::klass_at(int, Thread*)+0x55 >> V [libjvm.so+0x8b1f3c] InterpreterRuntime::_new(JavaThread*, >> ConstantPool*, int)+0x14a >> j >> 
sun.misc.URLClassPath$JarLoader.checkResource(Ljava/lang/String;ZLjava/util/jar/JarEntry;)Lsun/misc/Resource;+42 >> >> j >> sun.misc.URLClassPath$JarLoader.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+54 >> >> j >> sun.misc.URLClassPath.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+53 >> >> j java.net.URLClassLoader$1.run()Ljava/lang/Class;+26 >> j java.net.URLClassLoader$1.run()Ljava/lang/Object;+1 >> v ~StubRoutines::call_stub >> V [libjvm.so+0x8c3060] JavaCalls::call_helper(JavaValue*, >> methodHandle*, JavaCallArguments*, Thread*)+0x6b2 >> V [libjvm.so+0xba06bc] os::os_exception_wrapper(void (*)(JavaValue*, >> methodHandle*, JavaCallArguments*, Thread*), JavaValue*, >> methodHandle*, JavaCallArguments*, Thread*)+0x3a >> V [libjvm.so+0x8c29a7] JavaCalls::call(JavaValue*, methodHandle, >> JavaCallArguments*, Thread*)+0x7d >> V [libjvm.so+0x972a80] JVM_DoPrivileged+0x63d >> j >> java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;+0 >> >> j >> java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class;+13 >> j >> java.lang.ClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;+70 >> j java.lang.ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class; >> v ~StubRoutines::call_stub >> V [libjvm.so+0x8c3060] JavaCalls::call_helper(JavaValue*, >> methodHandle*, JavaCallArguments*, Thread*)+0x6b2 >> V [libjvm.so+0xba06bc] os::os_exception_wrapper(void (*)(JavaValue*, >> methodHandle*, JavaCallArguments*, Thread*), JavaValue*, >> methodHandle*, JavaCallArguments*, Thread*)+0x3a >> V [libjvm.so+0x8c29a7] JavaCalls::call(JavaValue*, methodHandle, >> JavaCallArguments*, Thread*)+0x7d >> V [libjvm.so+0x8c1ec7] JavaCalls::call_virtual(JavaValue*, >> KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x1cb >> V [libjvm.so+0x8c2086] JavaCalls::call_virtual(JavaValue*, Handle, >> KlassHandle, Symbol*, Symbol*, Handle, Thread*)+0xb0 >> V 
[libjvm.so+0xce2096] >> SystemDictionary::load_instance_class(Symbol*, Handle, Thread*)+0x4ea >> V [libjvm.so+0xce00a8] >> SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, >> Handle, Thread*)+0x824 >> V [libjvm.so+0xcde9e5] SystemDictionary::resolve_or_null(Symbol*, >> Handle, Handle, Thread*)+0x26d >> V [libjvm.so+0xcde435] SystemDictionary::resolve_or_fail(Symbol*, >> Handle, Handle, bool, Thread*)+0x39 >> V [libjvm.so+0x98c89e] find_class_from_class_loader(JNIEnv_*, >> Symbol*, unsigned char, Handle, Handle, unsigned char, Thread*)+0x49 >> V [libjvm.so+0x96f681] JVM_FindClassFromCaller+0x39d >> C [libjava.so+0xdfd0] Java_java_lang_Class_forName0+0x130 >> j >> java.lang.Class.forName0(Ljava/lang/String;ZLjava/lang/ClassLoader;Ljava/lang/Class;)Ljava/lang/Class;+0 >> >> j >> java.lang.Class.forName(Ljava/lang/String;ZLjava/lang/ClassLoader;)Ljava/lang/Class;+49 >> >> j ParallelTransformerLoaderApp.loadClasses(I)V+25 >> ...... >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Oct 27, 2014, at 2:00 PM, serguei.spitsyn at oracle.com wrote: >> >>> Ok. >>> >>> Thanks, Dan! >>> Serguei >>> >>> >>> On 10/27/14 7:05 AM, Daniel D. Daugherty wrote: >>>>> The test case was added by Dan. >>>>> We may want to ask him to clarify the test case purpose. 
>>>>> (added Dan to the to-list) >>>> Here's the changeset that added the test: >>>> >>>> $ hg log -v -r bca8bf23ac59 >>>> test/java/lang/instrument/ParallelTransformerLoader.sh >>>> changeset: 132:bca8bf23ac59 >>>> user: dcubed >>>> date: Mon Mar 24 15:05:09 2008 -0700 >>>> files: test/java/lang/instrument/ParallelTransformerLoader.sh >>>> test/java/lang/instrument/ParallelTransformerLoaderAgent.java >>>> test/java/lang/instrument/ParallelTransformerLoaderApp.java >>>> test/java/lang/instrument/TestClass1.java >>>> test/java/lang/instrument/TestClass2.java >>>> test/java/lang/instrument/TestClass3.java >>>> description: >>>> 5088398: 3/2 java.lang.instrument TCK test deadlock (test11) >>>> Summary: Add regression test for single-threaded bootstrap classloader. >>>> Reviewed-by: sspitsyn >>>> >>>> >>>> Based on my e-mail archive for this bug and from the bug report itself, >>>> it looks like we got this test from Wily Labs. The original bug was a >>>> deadlock that stopped being reproducible after: >>>> >>>> Karen fixed the bootstrap class loader to work in parallel via: >>>> >>>> 4997893 4/5 Investigate allowing bootstrap loader to work in >>>> parallel >>>> >>>> with that fix in place the deadlock no longer reproduces. >>>> I'm planning to use this bug as the vehicle for getting >>>> the test program into the INSTRUMENT_REGRESSION test suite. >>>> >>>> *** (#2 of 2): 2008-02-29 18:20:17 GMT+00:00 daniel.daugherty at sun.com >>>> >>>> >>>> A careful reading of JDK-5088398 might reveal the intentions of this >>>> test... >>>> >>>> Dan >>>> >>>> >>>> On 10/24/14 5:57 PM, serguei.spitsyn at oracle.com wrote: >>>>> Yumin, >>>>> >>>>> On 10/24/14 4:08 PM, Yumin Qi wrote: >>>>>> Serguei, >>>>>> >>>>>> Thanks for your comments. >>>>>> This test happens intermittently, but now it can repeat with 8/9. >>>>>> Loading TestClass1 in main thread while loading TestClass2 in >>>>>> TestThread in parallel. 
They both will call transform since >>>>>> TestClass[1-3] are loaded via agent. When loading TestClass2, it >>>>>> will call loading TestClass3 in TestThread. >>>>>> Note in the main thread, for loop: >>>>>> >>>>>> for (int i = 0; i < kNumIterations; i++) >>>>>> { >>>>>> // load some classes from multiple threads >>>>>> (this thread and one other) >>>>>> Thread testThread = new TestThread(2); >>>>>> testThread.start(); >>>>>> loadClasses(1); >>>>>> >>>>>> // log that it completed and reset for the >>>>>> next iteration >>>>>> testThread.join(); >>>>>> System.out.print("."); >>>>>> ParallelTransformerLoaderAgent.generateNewClassLoader(); >>>>>> } >>>>>> >>>>>> The loader got renewed after testThread.join(). So both threads >>>>>> are using the exact same class loader. >>>>> You are right, thanks. >>>>> It means that all three classes (TesClass1, TestClass2 and TestClass3) >>>>> are loaded by the same class loader in each iteration. >>>>> >>>>> However, I see more cases when the TestClass3 gets loaded. >>>>> It happens in a CFLH event when any other class (not TestClass*) in >>>>> the system is loaded. >>>>> The class loading thread can be any, not only "main" or "TestClass" >>>>> thread. >>>>> I suspect this test case mostly targets class loading that happens >>>>> on other threads. >>>>> It is because of the lines: >>>>> // In 160_03 and older, transform() is called >>>>> // with the "system_loader_lock" held and that >>>>> // prevents the bootstrap class loaded from >>>>> // running in parallel. If we add a slight >>>>> sleep >>>>> // delay here when the transform() call is not >>>>> // main or TestThread, then the deadlock in >>>>> // 160_03 and older is much more reproducible. >>>>> if (!tName.equals("main") && >>>>> !tName.equals("TestThread")) { >>>>> System.out.println("Thread '" + tName + >>>>> "' has called transform()"); >>>>> try { >>>>> Thread.sleep(500); >>>>> } catch (InterruptedException ie) { >>>>> } >>>>> } >>>>> >>>>> What about the following? 
>>>>> >>>>> In the ParallelTransformerLoaderAgent.java make this change: >>>>> if (!tName.equals("main")) >>>>> => if (tName.equals("TestThread")) >>>>> >>>>> Does such updated test still failing? >>>>> >>>>>> After create a new class loader, next loop will use the loader. >>>>>> This is why quite often on the stack trace we can see it resolves >>>>>> JarLoader$2. >>>>>> >>>>>> I am not quite understand the test case either. Loading TestClass3 >>>>>> inside transform using the same classloader will cause call to >>>>>> transform again and form a circle. Nonetheless, if we see >>>>>> TestClass2 already loaded, the loop will end but that still is a >>>>>> risk. >>>>> In fact, I don't like that the test loads the class TestClass3 at >>>>> the TestClass3 CFLH event. >>>>> However, it is interesting to know why we did not see (is it the >>>>> case?) this issue before. >>>>> Also, it is interesting why the test stops failing with you fix >>>>> (replacing loader with SystemClassLoader). >>>>> >>>>> The test case was added by Dan. >>>>> We may want to ask him to clarify the test case purpose. >>>>> (added Dan to the to-list) >>>>> >>>>> Thanks, >>>>> Serguei >>>>> >>>>>> Thanks >>>>>> Yumin >>>>>> >>>>>> On 10/24/2014 1:20 PM, serguei.spitsyn at oracle.com wrote: >>>>>>> Hi Yumin, >>>>>>> >>>>>>> Below is some analysis to make sure I understand the test >>>>>>> scenario correctly. >>>>>>> >>>>>>> The ParallelTransformerLoaderApp.main() executes a 1000 iteration >>>>>>> loop. >>>>>>> At each iteration it does: >>>>>>> - creates and starts a new TestThread >>>>>>> - loads TestClass1 with the current class loader: >>>>>>> ParallelTransformerLoaderAgent.getClassLoader() >>>>>>> - changes the current class loader with new one: >>>>>>> ParallelTransformerLoaderAgent.generateNewClassLoader() >>>>>>> >>>>>>> The TestThread loads the TestClass2 concurrently with the main >>>>>>> thread. 
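A minimal sketch of the screening idea raised in this thread: ignore class-file-load events that come from the bootstrap loader (loader == null) and start the nested load only on the thread named "TestThread". The class name ScreeningTransformer, the nestedLoad callback (standing in for the agent's loadClasses(3)) and the nestedLoads counter are assumptions made for illustration; this is not the code of ParallelTransformerLoaderAgent.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the screening discussed in the thread; not the test's actual agent.
final class ScreeningTransformer implements ClassFileTransformer {
    private final Runnable nestedLoad;                      // stand-in for loadClasses(3)
    final AtomicInteger nestedLoads = new AtomicInteger();  // counts nested load requests

    ScreeningTransformer(Runnable nestedLoad) {
        this.nestedLoad = nestedLoad;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        // loader == null means the bootstrap loader; never start a nested load for it.
        if (loader != null && "TestThread".equals(Thread.currentThread().getName())) {
            nestedLoads.incrementAndGet();
            nestedLoad.run();
        }
        return null; // null tells the JVM the class bytes are unchanged
    }

    public static void main(String[] args) {
        ScreeningTransformer t =
            new ScreeningTransformer(() -> System.out.println("nested load requested"));
        t.transform(null, "sun/misc/URLClassPath$JarLoader$2", null, null, new byte[0]);
        System.out.println("nested loads after a boot-loader event: " + t.nestedLoads.get());
    }
}
```

With this shape, a load of a JDK-internal class through the bootstrap loader passes through the transformer untouched, so it cannot re-enter the loader that is already in the middle of resolving it.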
>>>>>>> >>>>>>> At the CFLH events, the ParallelTransformerLoaderAgent does the >>>>>>> class retransformation. >>>>>>> If the thread loading the class is not "main", it loads the class >>>>>>> TestClass3 >>>>>>> with the current class loader >>>>>>> ParallelTransformerLoaderAgent.getClassLoader(). >>>>>>> >>>>>>> Sometimes, the TestClass2 and TestClass3 are loaded by the same >>>>>>> class loader recursively. >>>>>>> It happens if the class loader has not been changed between >>>>>>> loading TestClass2 and TestClass3 classes. >>>>>>> >>>>>>> I'm not convinced yet the test is incorrect. >>>>>>> And it is not clear why do we get a ClassCircularityError. >>>>>>> >>>>>>> Please, let me know if the above understanding is wrong. >>>>>>> I also see the reply from David and share his concerns. >>>>>>> >>>>>>> It is not clear if this failure is a regression. >>>>>>> Did we observe this issue before? >>>>>>> If - NOT then when and why had this failure started to appear? >>>>>>> >>>>>>> Unfortunately, it is impossible to look at the test run history >>>>>>> at the moment. >>>>>>> The Aurora is at a maintenance. >>>>>>> >>>>>>> Thanks, >>>>>>> Serguei >>>>>>> >>>>>>> On 10/13/14 3:58 PM, Yumin Qi wrote: >>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8038468 >>>>>>>> webrev:*http://cr.openjdk.java.net/~minqi/8038468/webrev00/ >>>>>>>> >>>>>>>> the bug marked as confidential so post the webrev internally. >>>>>>>> >>>>>>>> Problem: The test case tries to load a class from the same jar >>>>>>>> via agent in the middle of loading another class from the jar >>>>>>>> via same class loader in same thread. The call happens in >>>>>>>> transform which is a rare case --- in middle of loading class, >>>>>>>> loading another class. The result is a CircularityError. 
When >>>>>>>> first class is in loading, in vm we put JarLoader$2 on place >>>>>>>> holder table, then we start the defineClass, which calls >>>>>>>> transform, begins loading the second class so go along the same >>>>>>>> routine for loading JarLoader$2 first, found it already in >>>>>>>> placeholder table. A CircularityError is thrown. >>>>>>>> Fix: The test case should not call loading class with same class >>>>>>>> loader in same thread from same jar in 'transform' method. I >>>>>>>> modify it loading with system class loader and we expect see >>>>>>>> ClassNotFoundException. Detail see bug comments. >>>>>>>> >>>>>>>> Thanks >>>>>>>> Yumin * > From david.holmes at oracle.com Mon Nov 3 03:49:45 2014 From: david.holmes at oracle.com (David Holmes) Date: Mon, 03 Nov 2014 13:49:45 +1000 Subject: [9] RFR(L) 8013267 : move MemberNameTable from native code to Java heap, use to intern MemberNames In-Reply-To: <1D195ED3-8BD2-46CB-9D70-29CF809D9F5A@oracle.com> References: <594FE416-D445-426E-B7A9-C75F012ADE16@oracle.com> <0BDCD5DD-CBFB-4A3D-9BAD-7F9912E9237C@oracle.com> <5453C230.8010709@oracle.com> <9747A3A3-B7F4-407A-8E00-1E647D9DC2D1@oracle.com> <1D195ED3-8BD2-46CB-9D70-29CF809D9F5A@oracle.com> Message-ID: <5456FB59.60905@oracle.com> On 3/11/2014 10:05 AM, David Chase wrote: > > On 2014-10-31, at 5:45 PM, Vitaly Davidovich wrote: > >> The volatile load prevents subsequent loads and stores from reordering with it, but that doesn't stop C from moving before the B store. So breaking B into the load (call it BL) and store (BS) you can still get this ordering: A, BL, C, BS > > I think this should do the trick. > > element_data[oldCapacity] = element_data[oldCapacity - 1]; > // all array elements are non-null and sorted, increase size. > // if store to element_data above floats below > // store to size on the next line, that will be > // inconsistent to the VM if a safepoint occurs here. 
> size += 1; > // Load of volatile size prevents movement of element_data store > for (int i = size - 1; i > index; i--) { > > The change is to load the volatile size for the loop bound; this stops the stores > in the loop from moving earlier, right? Treating volatile accesses like memory barriers is playing a bit fast-and-loose with the spec+implementation. The basic happens-before relationship for volatiles states that if a volatile read sees a value X, then the volatile write that wrote X happened-before the read [1]. But in this code there are no checks of the values of the volatile fields. Instead you are relying on a volatile read "acting like acquire()" and a volatile write "acting like release()". That said you are trying to "synchronize" the hotspot code with the JDK code so you have stepped outside the JMM in any case and reasoning about what is and is not allowed is somewhat moot - unless the hotspot code always uses Java-style accesses to the Java-level variables. BTW the Java side of this needs to be reviewed on core-libs-dev at openjdk.java.net David H. [1] http://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.4.4 > David > From david.holmes at oracle.com Mon Nov 3 04:44:59 2014 From: david.holmes at oracle.com (David Holmes) Date: Mon, 03 Nov 2014 14:44:59 +1000 Subject: RFR(S) Contended Locking fast enter bucket (8061553) In-Reply-To: <5452C0B4.4070601@oracle.com> References: <5452C0B4.4070601@oracle.com> Message-ID: <5457084B.6070808@oracle.com> Hi Dan, Looks good. Couple of nits and one semantic query below ... src/cpu/sparc/vm/macroAssembler_sparc.cpp Formatting changes were a bit of a distraction. --- src/cpu/x86/vm/macroAssembler_x86.cpp Formatting changes were a bit of a distraction. 1929 // unconditionally set stackBox->_displaced_header = 3 1930 movptr(Address(boxReg, 0), (int32_t)intptr_t(markOopDesc::unused_mark())); At 1870 we refer to box rather than stackBox. 
Also it takes some sleuthing to realize that "3" here is somehow a pseudonym for unused_mark(). Back up at 1808 we have a to-do: 1808 // use markOop::unused_mark() instead of "3". so the current change seems to be implementing that, even though other uses of "3" are left untouched. --- src/share/vm/runtime/sharedRuntime.cpp 1794 JRT_BLOCK_ENTRY(void, SharedRuntime::complete_monitor_locking_C(oopDesc* _obj, BasicLock* lock, JavaThread* thread)) 1795 if (!SafepointSynchronize::is_synchronizing()) { 1796 if (ObjectSynchronizer::quick_enter(_obj, thread, lock)) return; Is it necessary to check is_synchronizing? If we are executing this code we are not at a safepoint and the quick_enter wont change that, so I'm not sure what we are guarding against. --- src/share/vm/runtime/synchronizer.cpp Minor nit: line 153 the usual acronym is NPE (for NullPointerException) not NPX Nit: 159 Thread * const ox Please change ox to owner. --- Thanks, David On 31/10/2014 8:50 AM, Daniel D. Daugherty wrote: > Greetings, > > I have the Contended Locking fast enter bucket ready for review. > > The code changes in this bucket are primarily a quick_enter() > function that works on inflated but uncontended Java monitors. > This quick_enter() function is used on the "slow path" for Java > Monitor enter operations when the built-in "fast path" (read > assembly code) doesn't work. 
> > This work is being tracked by the following bug ID: > > JDK-8061553 Contended Locking fast enter bucket > https://bugs.openjdk.java.net/browse/JDK-8061553 > > Here is the webrev URL: > > http://cr.openjdk.java.net/~dcubed/8061553-webrev/0-jdk9-hs-rt/ > > Here is the JEP link: > > https://bugs.openjdk.java.net/browse/JDK-8046133 > > 8061553 summary of changes: > > macroAssembler_sparc.cpp: MacroAssembler::compiler_lock_object() > > - clean up spacing around some > 'ObjectMonitor::owner_offset_in_bytes() - 2' uses > - remove optional (EmitSync & 64) code > - change from cmp() to andcc() so icc.zf flag is set > > macroAssembler_x86.cpp: MacroAssembler::fast_lock() > > - remove optional (EmitSync & 2) code > - rewrite LP64 inflated lock code that tries to CAS in > the new owner value to be more efficient > > interfaceSupport.hpp: > > - add JRT_BLOCK_NO_ASYNC to permit splitting a > JRT_BLOCK_ENTRY into two pieces. > > sharedRuntime.cpp: SharedRuntime::complete_monitor_locking_C() > > - change entry type from JRT_ENTRY_NO_ASYNC to JRT_BLOCK_ENTRY > to permit ObjectSynchronizer::quick_enter() call > - add JRT_BLOCK_NO_ASYNC use if the quick_enter() doesn't work > to revert to JRT_ENTRY_NO_ASYNC-like semantics > > synchronizer.[ch]pp: > > - add ObjectSynchronizer::quick_enter() for entering an > inflated but unowned Java monitor without thread state > changes > > Testing: > > - Aurora Adhoc RT/SVC baseline batch > - JPRT test jobs > - MonitorEnterStresser micro-benchmark (in process) > - CallTimerGrid stress testing (in process) > - Aurora performance testing: > - out of the box for the "promotion" and 32-bit server configs > - heavy weight monitors for the "promotion" and 32-bit server configs > (-XX:-UseBiasedLocking -XX:+UseHeavyMonitors) > (in process) > > > Thanks, in advance, for any comments, questions or suggestions. 
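The quick_enter() idea described above (claim an inflated but unowned monitor on the slow path without any thread state change) can be illustrated with a toy model in Java. This is only a sketch under stated assumptions: ToyMonitor, quickEnter, slowEnter and exit are hypothetical names, the real ObjectMonitor is C++ inside HotSpot and tracks far more state, and the spinning slow path here merely stands in for the real blocking path.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model only; not HotSpot code.
final class ToyMonitor {
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    /** Fast attempt: succeeds only if the monitor is currently unowned; never blocks. */
    boolean quickEnter(Thread self) {
        return owner.compareAndSet(null, self);
    }

    /** Slow-path stand-in: keeps retrying until the owner field can be claimed. */
    void slowEnter(Thread self) {
        while (!owner.compareAndSet(null, self)) {
            Thread.yield();
        }
    }

    /** Try the cheap path first and fall back to the slow path only on failure. */
    void enter(Thread self) {
        if (!quickEnter(self)) {
            slowEnter(self);
        }
    }

    void exit(Thread self) {
        if (!owner.compareAndSet(self, null)) {
            throw new IllegalMonitorStateException("not the owner");
        }
    }

    public static void main(String[] args) {
        ToyMonitor m = new ToyMonitor();
        Thread me = Thread.currentThread();
        System.out.println("quick enter on unowned monitor: " + m.quickEnter(me));
        m.exit(me);
    }
}
```

The point of the split is that the common uncontended case costs a single compare-and-set, and only a failed attempt pays for the heavier path.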
> > Dan From yumin.qi at oracle.com Mon Nov 3 05:52:39 2014 From: yumin.qi at oracle.com (Yumin Qi) Date: Sun, 02 Nov 2014 21:52:39 -0800 Subject: RFR: 8062247: Allow WhiteBox test to access JVM offsets In-Reply-To: <5453DB4F.70709@oracle.com> References: <544F150A.3090500@oracle.com> <54518A60.9080800@oracle.com> <5453DB4F.70709@oracle.com> Message-ID: <54571827.5050807@oracle.com> Misha, It is a generic name; for now it only targets FileMapHeader, but other VM data structures can be added in the future if needed. Maybe a name like getOffsetForName(String name) is better? Thanks Yumin On 10/31/2014 11:56 AM, Mikhailo Seledtsov wrote: > Hi Yumin, > > The name getOffsets() seems too generic. Perhaps we could rename it > to be more specific to the task. > > Thank you, > Misha > > On 10/29/2014 5:46 PM, Yumin Qi wrote: >> Please review the new changeset at the same location. >> The new API supplies an interface to get a data member's offset by its name. >> http://cr.openjdk.java.net/~minqi/8062247/webrev00/ >> >> Thanks >> Yumin >> >> On 10/27/2014 9:01 PM, Yumin Qi wrote: >>> Please review >>> >>> bug: https://bugs.openjdk.java.net/browse/JDK-8062247 >>> webrev: http://cr.openjdk.java.net/~minqi/8062247/webrev00/ >>> >>> Summary: An internal test failed because the variable offsets changed in >>> hotspot. The way the test gets the offsets is hard-coded. To reduce >>> the risk from future changes of hotspot offsets, the fix adds a >>> WhiteBox API function to get a map for FileMapHeaderInfo, which >>> returns the members' offsets in a Hashtable. >>> >>> Tests: JPRT, jtreg. 
>>> >>> Thanks >>> Yumin >> > From david.r.chase at oracle.com Mon Nov 3 12:49:55 2014 From: david.r.chase at oracle.com (David Chase) Date: Mon, 3 Nov 2014 07:49:55 -0500 Subject: [9] RFR(L) 8013267 : move MemberNameTable from native code to Java heap, use to intern MemberNames In-Reply-To: <5456FB59.60905@oracle.com> References: <594FE416-D445-426E-B7A9-C75F012ADE16@oracle.com> <0BDCD5DD-CBFB-4A3D-9BAD-7F9912E9237C@oracle.com> <5453C230.8010709@oracle.com> <9747A3A3-B7F4-407A-8E00-1E647D9DC2D1@oracle.com> <1D195ED3-8BD2-46CB-9D70-29CF809D9F5A@oracle.com> <5456FB59.60905@oracle.com> Message-ID: <632A5C98-B386-4625-BE12-355241581955@oracle.com> On 2014-11-02, at 10:49 PM, David Holmes wrote: >> The change is to load the volatile size for the loop bound; this stops the stores >> in the loop from moving earlier, right? > > Treating volatile accesses like memory barriers is playing a bit fast-and-loose with the spec+implementation. The basic happens-before relationship for volatiles states that if a volatile read sees a value X, then the volatile write that wrote X happened-before the read [1]. But in this code there are no checks of the values of the volatile fields. Instead you are relying on a volatile read "acting like acquire()" and a volatile write "acting like release()". > > That said you are trying to "synchronize" the hotspot code with the JDK code so you have stepped outside the JMM in any case and reasoning about what is and is not allowed is somewhat moot - unless the hotspot code always uses Java-style accesses to the Java-level variables. My main concern is that the compiler is inhibited from any peculiar code motion; I assume that taking a safe point has a bit of barrier built into it anyway, especially given that the worry case is safepoint + JVMTI. Given the worry, what's the best way to spell "barrier" here? 
I could synchronize on classData (it would be a recursive lock in the current version of the code) synchronized (this) { size++; } or I could synchronize on elementData (no longer used for a lock elsewhere, so always uncontended) synchronized (elementData) { size++; } or is there some Unsafe thing that would be better? (core-libs-dev - there will be another webrev coming. This is a runtime+jdk patch.) David > BTW the Java side of this needs to be reviewed on core-libs-dev at openjdk.java.net > > David H. > > [1] http://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.4.4 > > >> David From peter.levart at gmail.com Mon Nov 3 16:16:53 2014 From: peter.levart at gmail.com (Peter Levart) Date: Mon, 03 Nov 2014 17:16:53 +0100 Subject: [9] RFR(L) 8013267 : move MemberNameTable from native code to Java heap, use to intern MemberNames In-Reply-To: <632A5C98-B386-4625-BE12-355241581955@oracle.com> References: <594FE416-D445-426E-B7A9-C75F012ADE16@oracle.com> <0BDCD5DD-CBFB-4A3D-9BAD-7F9912E9237C@oracle.com> <5453C230.8010709@oracle.com> <9747A3A3-B7F4-407A-8E00-1E647D9DC2D1@oracle.com> <1D195ED3-8BD2-46CB-9D70-29CF809D9F5A@oracle.com> <5456FB59.60905@oracle.com> <632A5C98-B386-4625-BE12-355241581955@oracle.com> Message-ID: <5457AA75.8090103@gmail.com> On 11/03/2014 01:49 PM, David Chase wrote: > On 2014-11-02, at 10:49 PM, David Holmes wrote: >>> The change is to load the volatile size for the loop bound; this stops the stores >>> in the loop from moving earlier, right? >> Treating volatile accesses like memory barriers is playing a bit fast-and-loose with the spec+implementation. The basic happens-before relationship for volatiles states that if a volatile read sees a value X, then the volatile write that wrote X happened-before the read [1]. But in this code there are no checks of the values of the volatile fields. Instead you are relying on a volatile read "acting like acquire()" and a volatile write "acting like release()". 
>> >> That said you are trying to "synchronize" the hotspot code with the JDK code so you have stepped outside the JMM in any case and reasoning about what is and is not allowed is somewhat moot - unless the hotspot code always uses Java-style accesses to the Java-level variables. > My main concern is that the compiler is inhibited from any peculiar code motion; I assume that taking a safe point has a bit of barrier built into it anyway, especially given that the worry case is safepoint + JVMTI. > > Given the worry, what's the best way to spell "barrier" here? > I could synchronize on classData (it would be a recursive lock in the current version of the code) > synchronized (this) { size++; } > or I could synchronize on elementData (no longer used for a lock elsewhere, so always uncontended) > synchronized (elementData) { size++; } > or is there some Unsafe thing that would be better? > > (core-libs-dev - there will be another webrev coming. This is a runtime+jdk patch.) > > David Hi David, You're worried that writes moving array elements up for one slot would bubble up before write of size = size+1, right? If that happens, VM could skip an existing (last) element and not update it. It seems that Unsafe.storeFence() between size++ and moving of elements could do, as the javadoc for it says: /** * Ensures lack of reordering of stores before the fence * with loads or stores after the fence. * @since 1.8 */ public native void storeFence(); Regards, Peter > >> BTW the Java side of this needs to be reviewed on core-libs-dev at openjdk.java.net >> >> David H. 
>>
>> [1] http://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.4.4
>>
>>
>>> David

From david.r.chase at oracle.com Mon Nov 3 16:36:59 2014
From: david.r.chase at oracle.com (David Chase)
Date: Mon, 3 Nov 2014 11:36:59 -0500
Subject: [9] RFR(L) 8013267 : move MemberNameTable from native code to Java heap, use to intern MemberNames
In-Reply-To: <5457AA75.8090103@gmail.com>
References: <594FE416-D445-426E-B7A9-C75F012ADE16@oracle.com> <0BDCD5DD-CBFB-4A3D-9BAD-7F9912E9237C@oracle.com> <5453C230.8010709@oracle.com> <9747A3A3-B7F4-407A-8E00-1E647D9DC2D1@oracle.com> <1D195ED3-8BD2-46CB-9D70-29CF809D9F5A@oracle.com> <5456FB59.60905@oracle.com> <632A5C98-B386-4625-BE12-355241581955@oracle.com> <5457AA75.8090103@gmail.com>
Message-ID:

>> My main concern is that the compiler is inhibited from any peculiar code motion; I assume that taking a safe point has a bit of barrier built into it anyway, especially given that the worry case is safepoint + JVMTI.
>>
>> Given the worry, what's the best way to spell "barrier" here?
>> I could synchronize on classData (it would be a recursive lock in the current version of the code)
>>     synchronized (this) { size++; }
>> or I could synchronize on elementData (no longer used for a lock elsewhere, so always uncontended)
>>     synchronized (elementData) { size++; }
>> or is there some Unsafe thing that would be better?
>
> You're worried that writes moving array elements up for one slot would bubble up before write of size = size+1, right? If that happens, VM could skip an existing (last) element and not update it.

Exactly, with the restriction that it would be compiler-induced bubbling, not architectural. Which is both better and worse: I don't have to worry about crazy hardware, but the rules of the java/jvm "memory model" are not as thoroughly defined as those for java itself.

I added a method to Atomic (.storeFence()). New webrev to come after I rebuild and retest.
Thanks much,

David

> It seems that Unsafe.storeFence() between size++ and moving of elements could do, as the javadoc for it says:
>
>     /**
>      * Ensures lack of reordering of stores before the fence
>      * with loads or stores after the fence.
>      * @since 1.8
>      */
>     public native void storeFence();

From peter.levart at gmail.com Mon Nov 3 16:42:30 2014
From: peter.levart at gmail.com (Peter Levart)
Date: Mon, 03 Nov 2014 17:42:30 +0100
Subject: [9] RFR(L) 8013267 : move MemberNameTable from native code to Java heap, use to intern MemberNames
In-Reply-To: <5457AA75.8090103@gmail.com>
References: <594FE416-D445-426E-B7A9-C75F012ADE16@oracle.com> <0BDCD5DD-CBFB-4A3D-9BAD-7F9912E9237C@oracle.com> <5453C230.8010709@oracle.com> <9747A3A3-B7F4-407A-8E00-1E647D9DC2D1@oracle.com> <1D195ED3-8BD2-46CB-9D70-29CF809D9F5A@oracle.com> <5456FB59.60905@oracle.com> <632A5C98-B386-4625-BE12-355241581955@oracle.com> <5457AA75.8090103@gmail.com>
Message-ID: <5457B076.10205@gmail.com>

On 11/03/2014 05:16 PM, Peter Levart wrote:
> On 11/03/2014 01:49 PM, David Chase wrote:
>> On 2014-11-02, at 10:49 PM, David Holmes wrote:
>>>> The change is to load the volatile size for the loop bound; this
>>>> stops the stores
>>>> in the loop from moving earlier, right?
>>> Treating volatile accesses like memory barriers is playing a bit
>>> fast-and-loose with the spec+implementation. The basic
>>> happens-before relationship for volatiles states that if a volatile
>>> read sees a value X, then the volatile write that wrote X
>>> happened-before the read [1]. But in this code there are no checks
>>> of the values of the volatile fields. Instead you are relying on a
>>> volatile read "acting like acquire()" and a volatile write "acting
>>> like release()".
>>>
>>> That said you are trying to "synchronize" the hotspot code with the
>>> JDK code so you have stepped outside the JMM in any case and
>>> reasoning about what is and is not allowed is somewhat moot - unless
>>> the hotspot code always uses Java-style accesses to the Java-level
>>> variables.
>> My main concern is that the compiler is inhibited from any peculiar
>> code motion; I assume that taking a safe point has a bit of barrier
>> built into it anyway, especially given that the worry case is
>> safepoint + JVMTI.
>>
>> Given the worry, what's the best way to spell "barrier" here?
>> I could synchronize on classData (it would be a recursive lock in the
>> current version of the code)
>>     synchronized (this) { size++; }
>> or I could synchronize on elementData (no longer used for a lock
>> elsewhere, so always uncontended)
>>     synchronized (elementData) { size++; }
>> or is there some Unsafe thing that would be better?
>>
>> (core-libs-dev: there will be another webrev coming. This is a
>> runtime+jdk patch.)
>>
>> David
>
> Hi David,
>
> You're worried that writes moving array elements up for one slot would
> bubble up before write of size = size+1, right? If that happens, VM
> could skip an existing (last) element and not update it.
>
> It seems that Unsafe.storeFence() between size++ and moving of
> elements could do, as the javadoc for it says:
>
>     /**
>      * Ensures lack of reordering of stores before the fence
>      * with loads or stores after the fence.
>      * @since 1.8
>      */
>     public native void storeFence();

You might need a storeFence() between each two writes into the array too.
Your moving loop is the following:

    for (int i = oldCapacity; i > index; i--) {
        // pre: element_data[i] is duplicated at [i+1]
        element_data[i] = element_data[i - 1];
        // post: element_data[i-1] is duplicated at [i]
    }

If we start unrolling, it becomes:

    w1: element_data[old_capacity - 0] = element_data[old_capacity - 1];
    w2: element_data[old_capacity - 1] = element_data[old_capacity - 2];
    w3: element_data[old_capacity - 2] = element_data[old_capacity - 3];
    ...

Can the compiler reorder w2 and w3 (just the writes, not the whole statements)? Say that it reads a chunk of elements into registers and then writes them out, but in a different order, and a safepoint check comes inside this chunk of writes... This is hypothetical, but the compiler could do it without breaking the local semantics...

Peter

>
>
> Regards, Peter
>
>
>
>>
>>> BTW the Java side of this needs to be reviewed on
>>> core-libs-dev at openjdk.java.net
>>>
>>> David H.
>>>
>>> [1]
>>> http://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.4.4
>>>
>>>
>>>> David
>

From christian.tornqvist at oracle.com Mon Nov 3 16:58:12 2014
From: christian.tornqvist at oracle.com (Christian Tornqvist)
Date: Mon, 3 Nov 2014 11:58:12 -0500
Subject: RFR(L): 8056049: getProcessCpuLoad() stops working in one process when a different process exits
In-Reply-To: <88c4f758-f8b7-45d2-96e6-d940bb40fcb3@default>
References: <88c4f758-f8b7-45d2-96e6-d940bb40fcb3@default>
Message-ID: <002f01cff787$5b106340$113129c0$@oracle.com>

Hi Markus,

Thanks for the detailed walkthrough, this looks really good.
Thanks,
Christian

-----Original Message-----
From: hotspot-runtime-dev [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of Markus Grönlund
Sent: Wednesday, October 29, 2014 5:23 AM
To: hotspot-runtime-dev at openjdk.java.net
Subject: FW: RFR(L): 8056049: getProcessCpuLoad() stops working in one process when a different process exits

Hi,

Trying my luck with Runtime as well - any Windows people that might be able to do a review?

Bug: https://bugs.openjdk.java.net/browse/JDK-8056049
Webrev: http://cr.openjdk.java.net/~mgronlun/8056049/webrev01/

Thanks in advance
Markus

-----Original Message-----
From: Markus Grönlund
Sent: 24 October 2014 12:04
To: core-libs-dev Libs
Subject: FW: RFR(L): 8056049: getProcessCpuLoad() stops working in one process when a different process exits

Also sending this to core-libs.

Thanks in advance
Markus

From: Markus Grönlund
Sent: 22 October 2014 11:44
To: serviceability-dev at openjdk.java.net; jmx-dev at openjdk.java.net
Subject: RFR(L): 8056049: getProcessCpuLoad() stops working in one process when a different process exits

Greetings,

Kindly asking for reviews for the following changeset.

Bug: https://bugs.openjdk.java.net/browse/JDK-8056049
Webrev: http://cr.openjdk.java.net/~mgronlun/8056049/webrev01/

Description:

The issue is Windows specific. The problem relates to using the Performance Data Helper API (PDH), more specifically how to use the "Process" PDH object in PDH queries:

// code comment extract

/*
 * Working against the Process object and its related counters is inherently problematic
 * when using the PDH API:
 *
 * For PDH, a process is not primarily identified by its process id,
 * but with a sequential number, for example \Process(java#0), \Process(java#1), ....
 * The really bad part is that this list is reset as soon as one process exits:
 * If \Process(java#1) exits, \Process(java#3) now becomes \Process(java#2) etc.
 *
 * The PDH query api requires a process identifier to be submitted when registering
 * a query, but as soon as the list resets, the query is invalidated (since the name
 * changed).
 *
 * Solution:
 * The #number identifier for a Process query can only decrease after process creation.
 *
 * Therefore we create an array of counter queries for all process object instances
 * up to and including ourselves:
 *
 * Ex. we come in as third process instance (java#2), we then create and register
 * queries for the following Process object instances:
 * java#0, java#1, java#2
 *
 * currentQueryIndexForProcess() keeps track of the current "correct" query
 * (in order to keep this index valid when the list resets from underneath,
 * ensure to call getCurrentQueryIndexForProcess() before every query involving
 * Process object instance data).
 */

I have already fixed this in the VM as of https://bugs.openjdk.java.net/browse/JDK-8019921

In the process of fixing this issue now in the JDK, I realized that the previous implementation of using PDH in the JDK was a bit convoluted - especially if you would like to reuse functionality / add new counters.

Therefore this change also includes an overall rewrite of how the JDK will interface with the PDH library, a rewrite which (hopefully) improves both readability and extensibility.

I can do a code walkthrough live if anyone is interested to know the exact details of this change.

Testing completed: Testset SVC (includes jdk_instrument, jdk_management, jdk_jmx, jdk_jdi)

Thanks in advance
Markus
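
For readers who want the index-tracking idea from the code comment above in runnable form, here is a minimal Java sketch. It is not the code in the webrev: the `instances` list is a hypothetical stand-in for the PDH instance list (position i holds the pid published as \Process(java#i)), and the helper below only borrows the name `currentQueryIndexForProcess` for illustration. It relies solely on the property stated in the comment, that a process's #number can only decrease after creation.

```java
import java.util.ArrayList;
import java.util.List;

class PdhIndexSketch {
    // Hypothetical stand-in for the PDH instance list:
    // position i holds the pid currently published as \Process(java#i).
    static final List<Long> instances = new ArrayList<>();

    // Illustrative helper (an assumption, not the JDK's native code):
    // the instance number can only shrink, so search downward from the
    // last index known to be ours and return the slot that holds our pid.
    static int currentQueryIndexForProcess(long myPid, int lastKnownIndex) {
        int start = Math.min(lastKnownIndex, instances.size() - 1);
        for (int i = start; i >= 0; i--) {
            if (instances.get(i) == myPid) {
                return i;
            }
        }
        return -1; // our instance is no longer listed
    }

    public static void main(String[] args) {
        instances.add(100L); // java#0
        instances.add(200L); // java#1
        instances.add(300L); // java#2, this process
        int index = currentQueryIndexForProcess(300L, 2);
        System.out.println("before exit: java#" + index);

        instances.remove(Long.valueOf(200L)); // java#1 exits, the list shifts down
        index = currentQueryIndexForProcess(300L, index);
        System.out.println("after exit:  java#" + index);
    }
}
```

Because one query is registered up front for every instance from java#0 to the process's initial number, moving to a lower index never requires registering a new query; only the index of the "correct" query changes.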

From david.r.chase at oracle.com Mon Nov 3 17:28:00 2014
From: david.r.chase at oracle.com (David Chase)
Date: Mon, 3 Nov 2014 12:28:00 -0500
Subject: [9] RFR(L) 8013267 : move MemberNameTable from native code to Java heap, use to intern MemberNames
In-Reply-To: <5457B076.10205@gmail.com>
References: <594FE416-D445-426E-B7A9-C75F012ADE16@oracle.com> <0BDCD5DD-CBFB-4A3D-9BAD-7F9912E9237C@oracle.com> <5453C230.8010709@oracle.com> <9747A3A3-B7F4-407A-8E00-1E647D9DC2D1@oracle.com> <1D195ED3-8BD2-46CB-9D70-29CF809D9F5A@oracle.com> <5456FB59.60905@oracle.com> <632A5C98-B386-4625-BE12-355241581955@oracle.com> <5457AA75.8090103@gmail.com> <5457B076.10205@gmail.com>
Message-ID: <73CF1882-22F1-4D9A-B37A-EAC9BCF675B0@oracle.com>

On 2014-11-03, at 11:42 AM, Peter Levart wrote:
>> You're worried that writes moving array elements up for one slot would bubble up before write of size = size+1, right? If that happens, VM could skip an existing (last) element and not update it.
>>
>> It seems that Unsafe.storeFence() between size++ and moving of elements could do, as the javadoc for it says:
>>
>>     /**
>>      * Ensures lack of reordering of stores before the fence
>>      * with loads or stores after the fence.
>>      * @since 1.8
>>      */
>>     public native void storeFence();
> You might need a storeFence() between each two writes into the array too. Your moving loop is the following:
>
>     for (int i = oldCapacity; i > index; i--) {
>         // pre: element_data[i] is duplicated at [i+1]
>         element_data[i] = element_data[i - 1];
>         // post: element_data[i-1] is duplicated at [i]
>     }
>
>
> If we start unrolling, it becomes:
>
>     w1: element_data[old_capacity - 0] = element_data[old_capacity - 1];
>     w2: element_data[old_capacity - 1] = element_data[old_capacity - 2];
>     w3: element_data[old_capacity - 2] = element_data[old_capacity - 3];
>     ...
>
> Can compiler reorder w2 and w3 (just writes - not the whole statements)? Say that it reads a chunk of elements into the registers and then writes them out, but in different order, and a check for safepoint comes inside this chunk of writes... This is hypothetical, but it could do it without breaking the local semantics...

I think you are right, certainly in theory, and if I don't hear someone else declaring that in practice we're both just being paranoid, I'll do that too. Seems like it might eventually slow things down to do all those fences.

David

From mikhailo.seledtsov at oracle.com Mon Nov 3 20:05:55 2014
From: mikhailo.seledtsov at oracle.com (Mikhailo Seledtsov)
Date: Mon, 03 Nov 2014 12:05:55 -0800
Subject: RFR: 8062247: Allow WhiteBox test to access JVM offsets
In-Reply-To: <54571827.5050807@oracle.com>
References: <544F150A.3090500@oracle.com> <54518A60.9080800@oracle.com> <5453DB4F.70709@oracle.com> <54571827.5050807@oracle.com>
Message-ID: <5457E023.20801@oracle.com>

Hi Yumin,

If this API is intended to get offsets for various data structures, I would expect a data structure type identifier to be passed as a parameter. For instance,

    public native int getOffset(String dataStructureId, String fieldName)

where
    dataStructureId would be some kind of ID for the data structure, either the data structure name or an internal alias
    fieldName is the name of the field for which the offset value is returned

A specific error code could be returned for an unsupported dataStructureId.

Alternatively, this could be an API specific to a given data structure, e.g. getMySpecifiedDatastructOffset(String fieldName)

Thank you,
Misha

On 11/2/2014 9:52 PM, Yumin Qi wrote:
> Misha,
>
> It is a generic name, now it only targets on FileMapHeader, it can
> add other data structure of vm if needed in future. Maybe a name like
> getOffsetForName(String name) is better?
>
> Thanks
> Yumin
>
> On 10/31/2014 11:56 AM, Mikhailo Seledtsov wrote:
>> Hi Yumin,
>>
>> The name getOffsets() seems too generic. Perhaps, we could rename it
>> to be more specific to the task.
>>
>> Thank you,
>> Misha
>>
>> On 10/29/2014 5:46 PM, Yumin Qi wrote:
>>> Please review the new changeset at same location.
>>> New API supply an interface to get data member offset by it's name.
>>> http://cr.openjdk.java.net/~minqi/8062247/webrev00/
>>>
>>> Thanks
>>> Yumin
>>>
>>> On 10/27/2014 9:01 PM, Yumin Qi wrote:
>>>> Please review
>>>>
>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8062247
>>>> webrev: http://cr.openjdk.java.net/~minqi/8062247/webrev00/
>>>>
>>>> Summary: Internal test failed since the variable offsets changed in
>>>> hotspot. The way to get offset in the test is hard-coded. To reduce
>>>> the risk of future changes of hotspot offsets, the fix add a
>>>> WhiteBox API function to get a map for FileMapHeaderInfo, which
>>>> return the members' offsets in a Hashtable.
>>>>
>>>> Tests: JPRT, jtreg.
>>>>
>>>> Thanks
>>>> Yumin
>>>
>>
>

From peter.levart at gmail.com Mon Nov 3 20:09:29 2014
From: peter.levart at gmail.com (Peter Levart)
Date: Mon, 03 Nov 2014 21:09:29 +0100
Subject: [9] RFR(L) 8013267 : move MemberNameTable from native code to Java heap, use to intern MemberNames
In-Reply-To:
References: <594FE416-D445-426E-B7A9-C75F012ADE16@oracle.com> <0BDCD5DD-CBFB-4A3D-9BAD-7F9912E9237C@oracle.com> <5453C230.8010709@oracle.com> <9747A3A3-B7F4-407A-8E00-1E647D9DC2D1@oracle.com> <1D195ED3-8BD2-46CB-9D70-29CF809D9F5A@oracle.com> <5456FB59.60905@oracle.com> <632A5C98-B386-4625-BE12-355241581955@oracle.com> <5457AA75.8090103@gmail.com>
Message-ID: <5457E0F9.8090004@gmail.com>

Hi David,

I was thinking about the fact that java.lang.invoke code is leaking into java.lang.Class. Perhaps, if you don't mind rewriting the code, a better code structure would be if the j.l.Class changes only consisted of adding a simple:

    + // A reference to canonicalizing cache of java.lang.invoke.MemberName(s)
    + // for members declared by class represented by this Class object
    + private transient volatile Object memberNameData;

...and nothing else.

All the logic could live in MemberName itself (together with Unsafe machinery for accessing/cas-ing Class.memberNameData).

Now to an idea about implementation. Since VM code is not doing any binary-search and only linearly scans the array when it has to update MemberNames, the code could be changed to scan a linked-list of MemberName(s) instead. You could add a field to MemberName:

    class MemberName {
        ...
        // next MemberName in chain of interned MemberNames for particular declaring class
        private MemberName next;

Have a volatile field in MemberNameData (or ClassData - whatever you call it):

    class MemberNameData {
        ...
        // a chain of interned MemberName(s) for particular declaring class
        // accessed by VM when it has to modify them in-place
        private volatile MemberName memberNames;

        MemberName add(Class klass, int index, MemberName mn, int redefined_count) {
            mn.next = memberNames;
            memberNames = mn;
            if (jla.getClassRedefinedCount(klass) == redefined_count) { // no changes to class
                ...
                ... code to update array of sorted MemberName(s) with new 'mn'
                ...
                return mn;
            }
            // lost race, undo insertion
            memberNames = mn.next;
            return null;
        }

This way all the worries about ordering of writes into array and/or size are gone. The array is still used to quickly search for an element, but VM only scans the linked-list.

What do you think of this?

Regards, Peter

On 11/03/2014 05:36 PM, David Chase wrote:
>>> My main concern is that the compiler is inhibited from any peculiar code motion; I assume that taking a safe point has a bit of barrier built into it anyway, especially given that the worry case is safepoint + JVMTI.
>>>
>>> Given the worry, what's the best way to spell "barrier" here?
>>> I could synchronize on classData (it would be a recursive lock in the current version of the code)
>>>     synchronized (this) { size++; }
>>> or I could synchronize on elementData (no longer used for a lock elsewhere, so always uncontended)
>>>     synchronized (elementData) { size++; }
>>> or is there some Unsafe thing that would be better?
>> You're worried that writes moving array elements up for one slot would bubble up before write of size = size+1, right? If that happens, VM could skip an existing (last) element and not update it.
> exactly, with the restriction that it would be compiler-induced bubbling, not architectural.
> Which is both better and worse: I don't have to worry about crazy hardware, but the rules
> of java/jvm "memory model" are not as thoroughly defined as those for java itself.
>
> I added a method to Atomic (.storeFence()). New webrev to come after I rebuild and retest.
>
> Thanks much,
>
> David
>
>> It seems that Unsafe.storeFence() between size++ and moving of elements could do, as the javadoc for it says:
>>
>>     /**
>>      * Ensures lack of reordering of stores before the fence
>>      * with loads or stores after the fence.
>>      * @since 1.8
>>      */
>>     public native void storeFence();

From coleen.phillimore at oracle.com Mon Nov 3 20:19:54 2014
From: coleen.phillimore at oracle.com (Coleen Phillimore)
Date: Mon, 03 Nov 2014 15:19:54 -0500
Subject: RFR 8062116: JVMTI GetClassMethods is Slow
In-Reply-To:
References:
Message-ID: <5457E36A.3020800@oracle.com>

Hi Jeremy,

I reviewed your new code and it looks fine. I had one comment in
http://cr.openjdk.java.net/~jmanson/8062116/webrev.00/src/share/vm/prims/jvmtiEnv.cpp.udiff.html

The name "need_to_resolve" doesn't make sense when reading this code. Isn't it more like "need_to_ensure_space"? The current name makes me think of method resolution, which it doesn't do.

I was trying to find a way to make this new code not appear twice (maybe with a local jvmtiEnv function get_jmethodID(m) - instanceK_h is m->method_holder()).

Also, I was trying to figure out if the new class in utilities called chunkedList.hpp could be used to store jmethodIDs, since the data structures are similar. There are still more things that JNIMethodBlock has to do, so I think a specialized structure is still needed (which is why I originally wrote it to be very simple).

I'm not sure if the comment above it still applies. Maybe only the first and third sentences. Can you rewrite the comment slightly?

Your other comments in the changes are good.

I can't completely answer your question about reusing free_methods - but if a jmethodID is created provisionally in InstanceKlass::get_jmethod_id and not needed because it loses the race in the method id cache, it's never handed back to native code, so it's safe to reuse. This is different from jmethodIDs for methods that are unloaded. They are cleared and never reused. At least that's my reading of this caching code, but it's pretty complicated stuff.

I've also run our nsk and jck vm/jvmti on this change and they all passed.

I'd be happy to sponsor it with these suggested changes, and it needs another reviewer.

Thanks for diagnosing and fixing this problem!

Coleen

On 10/30/2014 01:02 PM, Jeremy Manson wrote:
> There's a significant regression in the speed of JVMTI GetClassMethods in
> JDK8. I've tracked this down to allocation of jmethodids in a tight loop.
> The issue can be addressed by preallocating enough space for all of the
> jmethodids when starting the operation and not iterating over all of the
> existing jmethodids when you allocate a new one.
>
> A patch is here:
>
> http://cr.openjdk.java.net/~jmanson/8062116/webrev.00/
>
> A reproducible test case can be found here:
>
> http://cr.openjdk.java.net/~jmanson/8062116/repro/
>
> It's a benchmark, though: I have no idea how to turn it into a test.
>
> For whoever reviews it: can you explain to me why it is okay that this code
> reuses jmethodIDs (in JNIMethodBlock::add_method)? I can imagine a lot of
> problems stemming from accidental reuse.
>
> Jeremy

From david.r.chase at oracle.com Mon Nov 3 20:41:30 2014
From: david.r.chase at oracle.com (David Chase)
Date: Mon, 3 Nov 2014 15:41:30 -0500
Subject: [9] RFR(L) 8013267 : move MemberNameTable from native code to Java heap, use to intern MemberNames
In-Reply-To: <5457E0F9.8090004@gmail.com>
References: <594FE416-D445-426E-B7A9-C75F012ADE16@oracle.com> <0BDCD5DD-CBFB-4A3D-9BAD-7F9912E9237C@oracle.com> <5453C230.8010709@oracle.com> <9747A3A3-B7F4-407A-8E00-1E647D9DC2D1@oracle.com> <1D195ED3-8BD2-46CB-9D70-29CF809D9F5A@oracle.com> <5456FB59.60905@oracle.com> <632A5C98-B386-4625-BE12-355241581955@oracle.com> <5457AA75.8090103@gmail.com> <5457E0F9.8090004@gmail.com>
Message-ID:

On 2014-11-03, at 3:09 PM, Peter Levart wrote:

> Hi David,
>
> I was thinking about the fact that java.lang.invoke code is leaking into java.lang.Class. Perhaps, If you don't mind rewriting the code, a better code structure would be, if j.l.Class changes only consisted of adding a simple:
> ...
> This way all the worries about ordering of writes into array and/or size are gone. The array is still used to quickly search for an element, but VM only scans the linked-list.
>
> What do you think of this?

I'm not sure. I know Coleen Ph would like to see that happen.

A couple of people have vague plans to move more of the MemberName resolution into core libs. (Years ago I worked on a VM where *all* of this occurred in Java, but some of it was ahead of time.)

I heard mention of "we want to put more stuff in there" but I got the impression that already happened (there's reflection data, for example) so I'm not sure that makes sense.

There's also a proposal from people in the runtime to just use a jmethodid, take the hit of an extra indirection, and have no need for this worrisome jvm/java concurrency.

And if we instead wrote a hash table that only grew, and never relocated elements, we could (I think) allow non-synchronized O(1) probes of the table from the Java side, synchronized O(1) insertions from the Java side, and because nothing moves, a smaller dance with the VM. I'm rather tempted to look into this; given the amount of work it would take to do the benchmarking to see if (a) jmethodid would have acceptable performance or (b) the existing costs are too high, I could instead just write fast code and be done.

And another way to view this is that we're now quibbling about performance, when we still have an existing correctness problem that this patch solves, so maybe we should just get this done and then file an RFE.

David

From yumin.qi at oracle.com Mon Nov 3 21:15:17 2014
From: yumin.qi at oracle.com (Yumin Qi)
Date: Mon, 03 Nov 2014 13:15:17 -0800
Subject: RFR: 8062247: Allow WhiteBox test to access JVM offsets
In-Reply-To: <5457E023.20801@oracle.com>
References: <544F150A.3090500@oracle.com> <54518A60.9080800@oracle.com> <5453DB4F.70709@oracle.com> <54571827.5050807@oracle.com> <5457E023.20801@oracle.com>
Message-ID: <5457F065.7070300@oracle.com>

If you check the name passed to this function, it already tells you the data structure (I just changed it to getOffsetForName, and it is in testing):

    wb.getOffsetForName("FileMapHeader::_crc")

There is no need to give type info (that would require building a list of type info, as in vmStructs). This is simple code for testing purposes; I think we should keep it simple. I changed it so that if the offset name is not supported, an exception is thrown instead. Will post the webrev today after JPRT has finished.

Thanks
Yumin

On 11/3/2014 12:05 PM, Mikhailo Seledtsov wrote:
> Hi Yumin,
>
> If this API is intended to get offsets for various data structures, I
> would expect a data structure type identified to be passed as a
> parameter.
For instance, > > public native int getOffset(String dataStructureId, String fieldName) > where > dataStructureId would be some kind of ID for data structure, > either data structure name or internal alias > fieldName - the name of the field for which the offset value > is returned > A specific error code could be returned for unsupported > dataStructureId > > Alternatively, this could be an API specific to a given > data-structure. E.g. getMySpecifiedDatastructOffset(String fieldName) > > Thank you, > Misha > > > On 11/2/2014 9:52 PM, Yumin Qi wrote: >> Misha, >> >> It is a generic name, now it only targets on FileMapHeader, it can >> add other data structure of vm if needed in future. Maybe a name like >> getOffsetForName(String name) is better? >> >> Thanks >> Yumin >> >> On 10/31/2014 11:56 AM, Mikhailo Seledtsov wrote: >>> Hi Yumin, >>> >>> The name getOffsets() seems too generic. Perhaps, we could rename >>> it to be more specific to the task. >>> >>> Thank you, >>> Misha >>> >>> On 10/29/2014 5:46 PM, Yumin Qi wrote: >>>> Please review the new changeset at same location. >>>> New API supply an interface to get data member offset by it's name. >>>> http://cr.openjdk.java.net/~minqi/8062247/webrev00/ >>>> >>>> Thanks >>>> Yumin >>>> >>>> On 10/27/2014 9:01 PM, Yumin Qi wrote: >>>>> Please review >>>>> >>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8062247 >>>>> webrev: http://cr.openjdk.java.net/~minqi/8062247/webrev00/ >>>>> >>>>> Summary: Internal test failed since the variable offsets changed >>>>> in hotspot. The way to get offset in the test is hard-coded. To >>>>> reduce the risk of future changes of hotspot offsets, the fix add >>>>> a WhiteBox API function to get a map for FileMapHeaderInfo, which >>>>> return the members' offsets in a Hashtable. >>>>> >>>>> Tests: JPRT, jtreg. 
>>>>>
>>>>> Thanks
>>>>> Yumin
>>>>
>>>
>>
>

From christian.thalinger at oracle.com Mon Nov 3 21:30:07 2014
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 3 Nov 2014 13:30:07 -0800
Subject: [9] RFR(L) 8013267 : move MemberNameTable from native code to Java heap, use to intern MemberNames
In-Reply-To:
References: <594FE416-D445-426E-B7A9-C75F012ADE16@oracle.com> <0BDCD5DD-CBFB-4A3D-9BAD-7F9912E9237C@oracle.com> <5453C230.8010709@oracle.com> <9747A3A3-B7F4-407A-8E00-1E647D9DC2D1@oracle.com> <1D195ED3-8BD2-46CB-9D70-29CF809D9F5A@oracle.com> <5456FB59.60905@oracle.com> <632A5C98-B386-4625-BE12-355241581955@oracle.com> <5457AA75.8090103@gmail.com> <5457E0F9.8090004@gmail.com>
Message-ID: <00703487-00EB-4E43-9613-01EE9EE64147@oracle.com>

> On Nov 3, 2014, at 12:41 PM, David Chase wrote:
>
>
> On 2014-11-03, at 3:09 PM, Peter Levart wrote:
>
>> Hi David,
>>
>> I was thinking about the fact that java.lang.invoke code is leaking into java.lang.Class. Perhaps, If you don't mind rewriting the code, a better code structure would be, if j.l.Class changes only consisted of adding a simple:
>> ...
>
>> This way all the worries about ordering of writes into array and/or size are gone. The array is still used to quickly search for an element, but VM only scans the linked-list.
>>
>> What do you think of this?
>
> I'm not sure. I know Coleen Ph would like to see that happen.
>
> A couple of people have vague plans to move more of the MemberName resolution into core libs.
> (Years ago I worked on a VM where *all* of this occurred in Java, but some of it was ahead of time.)
>
> I heard mention of "we want to put more stuff in there" but I got the impression that already happened
> (there's reflection data, for example) so I'm not sure that makes sense.
>
> There's also a proposal from people in the runtime to just use a jmethodid, take the hit of an extra indirection,
> and no need to for this worrisome jvm/java concurrency.
>
> And if we instead wrote a hash table that only grew, and never relocated elements, we could
> (I think) allow non-synchronized O(1) probes of the table from the Java side, synchronized
> O(1) insertions from the Java side, and because nothing moves, a smaller dance with the
> VM. I'm rather tempted to look into this; given the amount of work it would take to do the
> benchmarking to see if (a) jmethodid would have acceptable performance or (b) the existing
> costs are too high, I could instead just write fast code and be done.

...but you still have to do the benchmarking. Let's not forget that there was a performance regression with the first C++ implementation of this.

>
> And another way to view this is that we're now quibbling about performance, when we still
> have an existing correctness problem that this patch solves, so maybe we should just get this
> done and then file an RFE.
>
> David
>

From mikhailo.seledtsov at oracle.com Mon Nov 3 21:34:44 2014
From: mikhailo.seledtsov at oracle.com (Mikhailo Seledtsov)
Date: Mon, 03 Nov 2014 13:34:44 -0800
Subject: RFR: 8062247: Allow WhiteBox test to access JVM offsets
In-Reply-To: <5457F065.7070300@oracle.com>
References: <544F150A.3090500@oracle.com> <54518A60.9080800@oracle.com> <5453DB4F.70709@oracle.com> <54571827.5050807@oracle.com> <5457E023.20801@oracle.com> <5457F065.7070300@oracle.com>
Message-ID: <5457F4F4.60804@oracle.com>

Yumin,

OK, I see you chose to pass the name of the struct implicitly instead of using an explicit parameter. I have no strong objection to that. Please make sure to clarify this in the comments.

Thank you,
Misha

On 11/3/2014 1:15 PM, Yumin Qi wrote:
> If you check the name passed to this function, it already told (I just
> changed to getOffsetForName and in testing):
>
> wb.getOffsetForName("FileMapHeader::_crc")
>
> Do not need to give type info(that will need build a list for type
> info like in vmStructs).
This is a simple code fore testing purpose, I > think we should keep it simple. I added that if the offsetname not > supported, throw exception instead. Will post webrev today after jprt > finished. > > Thanks > Yumin > > On 11/3/2014 12:05 PM, Mikhailo Seledtsov wrote: >> Hi Yumin, >> >> If this API is intended to get offsets for various data structures, >> I would expect a data structure type identified to be passed as a >> parameter. For instance, >> >> public native int getOffset(String dataStructureId, String fieldName) >> where >> dataStructureId would be some kind of ID for data structure, >> either data structure name or internal alias >> fieldName - the name of the field for which the offset value >> is returned >> A specific error code could be returned for unsupported >> dataStructureId >> >> Alternatively, this could be an API specific to a given >> data-structure. E.g. getMySpecifiedDatastructOffset(String fieldName) >> >> Thank you, >> Misha >> >> >> On 11/2/2014 9:52 PM, Yumin Qi wrote: >>> Misha, >>> >>> It is a generic name, now it only targets on FileMapHeader, it can >>> add other data structure of vm if needed in future. Maybe a name >>> like getOffsetForName(String name) is better? >>> >>> Thanks >>> Yumin >>> >>> On 10/31/2014 11:56 AM, Mikhailo Seledtsov wrote: >>>> Hi Yumin, >>>> >>>> The name getOffsets() seems too generic. Perhaps, we could rename >>>> it to be more specific to the task. >>>> >>>> Thank you, >>>> Misha >>>> >>>> On 10/29/2014 5:46 PM, Yumin Qi wrote: >>>>> Please review the new changeset at same location. >>>>> New API supply an interface to get data member offset by it's name. 
>>>>> http://cr.openjdk.java.net/~minqi/8062247/webrev00/ >>>>> >>>>> Thanks >>>>> Yumin >>>>> >>>>> On 10/27/2014 9:01 PM, Yumin Qi wrote: >>>>>> Please review >>>>>> >>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8062247 >>>>>> webrev: http://cr.openjdk.java.net/~minqi/8062247/webrev00/ >>>>>> >>>>>> Summary: Internal test failed since the variable offsets changed >>>>>> in hotspot. The way to get offset in the test is hard-coded. To >>>>>> reduce the risk of future changes of hotspot offsets, the fix add >>>>>> a WhiteBox API function to get a map for FileMapHeaderInfo, which >>>>>> return the members' offsets in a Hashtable. >>>>>> >>>>>> Tests: JPRT, jtreg. >>>>>> >>>>>> Thanks >>>>>> Yumin >>>>> >>>> >>> >> > From yumin.qi at oracle.com Tue Nov 4 00:11:21 2014 From: yumin.qi at oracle.com (Yumin Qi) Date: Mon, 03 Nov 2014 16:11:21 -0800 Subject: RFR: 8062247: Allow WhiteBox test to access JVM offsets In-Reply-To: <5457F4F4.60804@oracle.com> References: <544F150A.3090500@oracle.com> <54518A60.9080800@oracle.com> <5453DB4F.70709@oracle.com> <54571827.5050807@oracle.com> <5457E023.20801@oracle.com> <5457F065.7070300@oracle.com> <5457F4F4.60804@oracle.com> Message-ID: <545819A9.20400@oracle.com> I have changed the function name in WhiteBox. New webrev at http://cr.openjdk.java.net/~minqi/8062247/webrev01/ The function getOffsetForName(String name) takes a name of the form "FileMapHead::_magic", as in the VM; if the VM has no such name to look up an offset for, a RuntimeException with message " not found" will be thrown. tests: JPRT, jtreg Thanks Yumin On 11/3/2014 1:34 PM, Mikhailo Seledtsov wrote: > Yumin, > > OK, I see you chose to pass the name of the struct implicitly instead > of using an explicit parameter. I have no strong objection to that. > Please make sure to clarify this in the comments.
> > Thank you, > Misha > > On 11/3/2014 1:15 PM, Yumin Qi wrote: >> If you check the name passed to this function, it already told (I >> just changed to getOffsetForName and in testing): >> >> wb.getOffsetForName("FileMapHeader::_crc") >> >> Do not need to give type info(that will need build a list for type >> info like in vmStructs). This is a simple code fore testing purpose, >> I think we should keep it simple. I added that if the offsetname not >> supported, throw exception instead. Will post webrev today after jprt >> finished. >> >> Thanks >> Yumin >> >> On 11/3/2014 12:05 PM, Mikhailo Seledtsov wrote: >>> Hi Yumin, >>> >>> If this API is intended to get offsets for various data structures, >>> I would expect a data structure type identified to be passed as a >>> parameter. For instance, >>> >>> public native int getOffset(String dataStructureId, String fieldName) >>> where >>> dataStructureId would be some kind of ID for data structure, >>> either data structure name or internal alias >>> fieldName - the name of the field for which the offset value >>> is returned >>> A specific error code could be returned for unsupported >>> dataStructureId >>> >>> Alternatively, this could be an API specific to a given >>> data-structure. E.g. getMySpecifiedDatastructOffset(String fieldName) >>> >>> Thank you, >>> Misha >>> >>> >>> On 11/2/2014 9:52 PM, Yumin Qi wrote: >>>> Misha, >>>> >>>> It is a generic name, now it only targets on FileMapHeader, it >>>> can add other data structure of vm if needed in future. Maybe a >>>> name like getOffsetForName(String name) is better? >>>> >>>> Thanks >>>> Yumin >>>> >>>> On 10/31/2014 11:56 AM, Mikhailo Seledtsov wrote: >>>>> Hi Yumin, >>>>> >>>>> The name getOffsets() seems too generic. Perhaps, we could rename >>>>> it to be more specific to the task. >>>>> >>>>> Thank you, >>>>> Misha >>>>> >>>>> On 10/29/2014 5:46 PM, Yumin Qi wrote: >>>>>> Please review the new changeset at same location. 
>>>>>> New API supply an interface to get data member offset by it's name. >>>>>> http://cr.openjdk.java.net/~minqi/8062247/webrev00/ >>>>>> >>>>>> Thanks >>>>>> Yumin >>>>>> >>>>>> On 10/27/2014 9:01 PM, Yumin Qi wrote: >>>>>>> Please review >>>>>>> >>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8062247 >>>>>>> webrev: http://cr.openjdk.java.net/~minqi/8062247/webrev00/ >>>>>>> >>>>>>> Summary: Internal test failed since the variable offsets changed >>>>>>> in hotspot. The way to get offset in the test is hard-coded. To >>>>>>> reduce the risk of future changes of hotspot offsets, the fix >>>>>>> add a WhiteBox API function to get a map for FileMapHeaderInfo, >>>>>>> which return the members' offsets in a Hashtable. >>>>>>> >>>>>>> Tests: JPRT, jtreg. >>>>>>> >>>>>>> Thanks >>>>>>> Yumin >>>>>> >>>>> >>>> >>> >> > From daniel.daugherty at oracle.com Tue Nov 4 01:59:42 2014 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Mon, 03 Nov 2014 18:59:42 -0700 Subject: RFR(S) Contended Locking fast enter bucket (8061553) In-Reply-To: <5457084B.6070808@oracle.com> References: <5452C0B4.4070601@oracle.com> <5457084B.6070808@oracle.com> Message-ID: <5458330E.1080207@oracle.com> David, Thanks for the review! As usual, replies are embedded below... On 11/2/14 9:44 PM, David Holmes wrote: > Hi Dan, > > Looks good. Thanks! > Couple of nits and one semantic query below ... > > src/cpu/sparc/vm/macroAssembler_sparc.cpp > > Formatting changes were a bit of a distraction. Yes, I have no idea what got into me. Normally I do formatting changes separately so the noise does not distract... It turns out there is a constant defined that should be used instead of all these literal '2's: src/share/vm/oops/markOop.hpp: monitor_value = 2 Typically used as follows: src/cpu/x86/vm/macroAssembler_x86.cpp: int owner_offset = ObjectMonitor::owner_offset_in_bytes() - markOopDesc::monitor_value; I will clean this up just for the files that I'm touching as part of this fix. 
> > --- > > src/cpu/x86/vm/macroAssembler_x86.cpp > > Formatting changes were a bit of a distraction. Same reply as for macroAssembler_sparc.cpp. > 1929 // unconditionally set stackBox->_displaced_header = 3 > 1930 movptr(Address(boxReg, 0), > (int32_t)intptr_t(markOopDesc::unused_mark())); > > At 1870 we refer to box rather than stackBox. Also it takes some > sleuthing to realize that "3" here is somehow a pseudonym for > unused_mark(). Back up at 1808 we have a to-do: > > 1808 // use markOop::unused_mark() instead of "3". > > so the current change seems to be implementing that, even though other > uses of "3" are left untouched. I'll take a look at cleaning those up also... In some cases markOopDesc::marked_value will work for the literal '3', but in other cases we'll use markOop::unused_mark(): static markOop unused_mark() { return (markOop) marked_value; } to save us the noise of the (markOop) cast. > --- > > src/share/vm/runtime/sharedRuntime.cpp > > 1794 JRT_BLOCK_ENTRY(void, > SharedRuntime::complete_monitor_locking_C(oopDesc* _obj, BasicLock* > lock, JavaThread* thread)) > 1795 if (!SafepointSynchronize::is_synchronizing()) { > 1796 if (ObjectSynchronizer::quick_enter(_obj, thread, lock)) return; > > Is it necessary to check is_synchronizing? If we are executing this > code we are not at a safepoint and the quick_enter wont change that, > so I'm not sure what we are guarding against. So this first state checker: src/share/vm/runtime/safepoint.hpp: inline static bool is_synchronizing() { return _state == _synchronizing; } means that we want to go to a safepoint and: inline static bool is_at_safepoint() { return _state == _synchronized; } means that we are at a safepoint. Dice's optimization bails out if we want to go to a safepoint and ObjectSynchronizer::quick_enter() has a "No_Safepoint_Verifier nsv" in it so we're expecting that code to be quick (and not go to a safepoint). I'm not seeing anything obvious.... 
Sometimes we have to be careful with JavaThread suspend requests and monitor acquisition, but I don't think that's a problem here... In order for the "suspend requesting" thread to be surprised, the suspend API, e.g., JVM/TI SuspendThread() has to return to the caller and then the suspend target has do something unexpected like acquire a monitor that it was previously blocked upon when it was suspended. We've had bugs like that in the past... In this optimization case, our target thread is not blocked on a contended monitor... In this particular case, the "suspend requesting" thread will set the suspend request state on the target thread, but the target thread is busy trying to enter this uncontended monitor (quickly). So the "suspend requesting" thread, will request a no-op safepoint, but it won't return from the suspend API until that safepoint completes. The safepoint won't complete until the target thread is done acquiring the previously uncontended monitor... so the target thread will be suspended while holding the previous uncontended monitor and the "suspend requesting" thread will return from the suspend API all happy... Well, I don't see the reason either so I'll have to ping Dave Dice and Karen Kinnear to see if either of them can fill in the history here. This could be an abundance of caution case. > --- > > src/share/vm/runtime/synchronizer.cpp > > Minor nit: line 153 the usual acronym is NPE (for > NullPointerException) not NPX I'll do a search for uses of NPX and other uses of 'X' in exception acronyms... > > Nit: 159 Thread * const ox > > Please change ox to owner. Will do. Thanks again for the review! Dan > > --- > > Thanks, > David > > > > On 31/10/2014 8:50 AM, Daniel D. Daugherty wrote: >> Greetings, >> >> I have the Contended Locking fast enter bucket ready for review. >> >> The code changes in this bucket are primarily a quick_enter() >> function that works on inflated but uncontended Java monitors. 
>> This quick_enter() function is used on the "slow path" for Java >> Monitor enter operations when the built-in "fast path" (read >> assembly code) doesn't work. >> >> This work is being tracked by the following bug ID: >> >> JDK-8061553 Contended Locking fast enter bucket >> https://bugs.openjdk.java.net/browse/JDK-8061553 >> >> Here is the webrev URL: >> >> http://cr.openjdk.java.net/~dcubed/8061553-webrev/0-jdk9-hs-rt/ >> >> Here is the JEP link: >> >> https://bugs.openjdk.java.net/browse/JDK-8046133 >> >> 8061553 summary of changes: >> >> macroAssembler_sparc.cpp: MacroAssembler::compiler_lock_object() >> >> - clean up spacing around some >> 'ObjectMonitor::owner_offset_in_bytes() - 2' uses >> - remove optional (EmitSync & 64) code >> - change from cmp() to andcc() so icc.zf flag is set >> >> macroAssembler_x86.cpp: MacroAssembler::fast_lock() >> >> - remove optional (EmitSync & 2) code >> - rewrite LP64 inflated lock code that tries to CAS in >> the new owner value to be more efficient >> >> interfaceSupport.hpp: >> >> - add JRT_BLOCK_NO_ASYNC to permit splitting a >> JRT_BLOCK_ENTRY into two pieces. 
>> >> sharedRuntime.cpp: SharedRuntime::complete_monitor_locking_C() >> >> - change entry type from JRT_ENTRY_NO_ASYNC to JRT_BLOCK_ENTRY >> to permit ObjectSynchronizer::quick_enter() call >> - add JRT_BLOCK_NO_ASYNC use if the quick_enter() doesn't work >> to revert to JRT_ENTRY_NO_ASYNC-like semantics >> >> synchronizer.[ch]pp: >> >> - add ObjectSynchronizer::quick_enter() for entering an >> inflated but unowned Java monitor without thread state >> changes >> >> Testing: >> >> - Aurora Adhoc RT/SVC baseline batch >> - JPRT test jobs >> - MonitorEnterStresser micro-benchmark (in process) >> - CallTimerGrid stress testing (in process) >> - Aurora performance testing: >> - out of the box for the "promotion" and 32-bit server configs >> - heavy weight monitors for the "promotion" and 32-bit server configs >> (-XX:-UseBiasedLocking -XX:+UseHeavyMonitors) >> (in process) >> >> >> Thanks, in advance, for any comments, questions or suggestions. >> >> Dan From mikhailo.seledtsov at oracle.com Tue Nov 4 02:30:31 2014 From: mikhailo.seledtsov at oracle.com (Mikhailo Seledtsov) Date: Mon, 03 Nov 2014 18:30:31 -0800 Subject: RFR: 8062247: Allow WhiteBox test to access JVM offsets In-Reply-To: <545819A9.20400@oracle.com> References: <544F150A.3090500@oracle.com> <54518A60.9080800@oracle.com> <5453DB4F.70709@oracle.com> <54571827.5050807@oracle.com> <5457E023.20801@oracle.com> <5457F065.7070300@oracle.com> <5457F4F4.60804@oracle.com> <545819A9.20400@oracle.com> Message-ID: <54583A47.2090207@oracle.com> Looks good to me. Misha On 11/3/2014 4:11 PM, Yumin Qi wrote: > I have made change to the function name in WhiteBox. > New webrev at > > http://cr.openjdk.java.net/~minqi/8062247/webrev01/ > > The function getOffsetForName(String name), takes a form of > "FileMapHead::_magic" as in vm, if there is no name present in vm for > search its offset, a RuntimeExcetion with message " not > found" will be thrown. 
> > tests: JPRT, jtreg > > Thanks > Yumin > > > On 11/3/2014 1:34 PM, Mikhailo Seledtsov wrote: >> Yumin, >> >> OK, I see you chose to pass the name of the struct implicitly >> instead of using an explicit parameter. I have no strong objection to >> that. >> Please make sure to clarify this in the comments. >> >> Thank you, >> Misha >> >> On 11/3/2014 1:15 PM, Yumin Qi wrote: >>> If you check the name passed to this function, it already told (I >>> just changed to getOffsetForName and in testing): >>> >>> wb.getOffsetForName("FileMapHeader::_crc") >>> >>> Do not need to give type info(that will need build a list for type >>> info like in vmStructs). This is a simple code fore testing purpose, >>> I think we should keep it simple. I added that if the offsetname not >>> supported, throw exception instead. Will post webrev today after >>> jprt finished. >>> >>> Thanks >>> Yumin >>> >>> On 11/3/2014 12:05 PM, Mikhailo Seledtsov wrote: >>>> Hi Yumin, >>>> >>>> If this API is intended to get offsets for various data >>>> structures, I would expect a data structure type identified to be >>>> passed as a parameter. For instance, >>>> >>>> public native int getOffset(String dataStructureId, String fieldName) >>>> where >>>> dataStructureId would be some kind of ID for data >>>> structure, either data structure name or internal alias >>>> fieldName - the name of the field for which the offset >>>> value is returned >>>> A specific error code could be returned for unsupported >>>> dataStructureId >>>> >>>> Alternatively, this could be an API specific to a given >>>> data-structure. E.g. getMySpecifiedDatastructOffset(String fieldName) >>>> >>>> Thank you, >>>> Misha >>>> >>>> >>>> On 11/2/2014 9:52 PM, Yumin Qi wrote: >>>>> Misha, >>>>> >>>>> It is a generic name, now it only targets on FileMapHeader, it >>>>> can add other data structure of vm if needed in future. Maybe a >>>>> name like getOffsetForName(String name) is better? 
>>>>> >>>>> Thanks >>>>> Yumin >>>>> >>>>> On 10/31/2014 11:56 AM, Mikhailo Seledtsov wrote: >>>>>> Hi Yumin, >>>>>> >>>>>> The name getOffsets() seems too generic. Perhaps, we could >>>>>> rename it to be more specific to the task. >>>>>> >>>>>> Thank you, >>>>>> Misha >>>>>> >>>>>> On 10/29/2014 5:46 PM, Yumin Qi wrote: >>>>>>> Please review the new changeset at same location. >>>>>>> New API supply an interface to get data member offset by it's name. >>>>>>> http://cr.openjdk.java.net/~minqi/8062247/webrev00/ >>>>>>> >>>>>>> Thanks >>>>>>> Yumin >>>>>>> >>>>>>> On 10/27/2014 9:01 PM, Yumin Qi wrote: >>>>>>>> Please review >>>>>>>> >>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8062247 >>>>>>>> webrev: http://cr.openjdk.java.net/~minqi/8062247/webrev00/ >>>>>>>> >>>>>>>> Summary: Internal test failed since the variable offsets >>>>>>>> changed in hotspot. The way to get offset in the test is >>>>>>>> hard-coded. To reduce the risk of future changes of hotspot >>>>>>>> offsets, the fix add a WhiteBox API function to get a map for >>>>>>>> FileMapHeaderInfo, which return the members' offsets in a >>>>>>>> Hashtable. >>>>>>>> >>>>>>>> Tests: JPRT, jtreg. >>>>>>>> >>>>>>>> Thanks >>>>>>>> Yumin >>>>>>> >>>>>> >>>>> >>>> >>> >> > From david.holmes at oracle.com Tue Nov 4 07:03:29 2014 From: david.holmes at oracle.com (David Holmes) Date: Tue, 04 Nov 2014 17:03:29 +1000 Subject: RFR(S) Contended Locking fast enter bucket (8061553) In-Reply-To: <5458330E.1080207@oracle.com> References: <5452C0B4.4070601@oracle.com> <5457084B.6070808@oracle.com> <5458330E.1080207@oracle.com> Message-ID: <54587A41.2020508@oracle.com> Hi Dan, One follow up deep below ... On 4/11/2014 11:59 AM, Daniel D. Daugherty wrote: > David, > > Thanks for the review! As usual, replies are embedded below... > > > On 11/2/14 9:44 PM, David Holmes wrote: >> Hi Dan, >> >> Looks good. > > Thanks! > > >> Couple of nits and one semantic query below ... 
>> >> src/cpu/sparc/vm/macroAssembler_sparc.cpp >> >> Formatting changes were a bit of a distraction. > > Yes, I have no idea what got into me. Normally I do formatting > changes separately so the noise does not distract... > > It turns out there is a constant defined that should be used > instead of all these literal '2's: > > src/share/vm/oops/markOop.hpp: monitor_value = 2 > > Typically used as follows: > > src/cpu/x86/vm/macroAssembler_x86.cpp: int owner_offset = > ObjectMonitor::owner_offset_in_bytes() - markOopDesc::monitor_value; > > I will clean this up just for the files that I'm touching as > part of this fix. > > >> >> --- >> >> src/cpu/x86/vm/macroAssembler_x86.cpp >> >> Formatting changes were a bit of a distraction. > > Same reply as for macroAssembler_sparc.cpp. > > >> 1929 // unconditionally set stackBox->_displaced_header = 3 >> 1930 movptr(Address(boxReg, 0), >> (int32_t)intptr_t(markOopDesc::unused_mark())); >> >> At 1870 we refer to box rather than stackBox. Also it takes some >> sleuthing to realize that "3" here is somehow a pseudonym for >> unused_mark(). Back up at 1808 we have a to-do: >> >> 1808 // use markOop::unused_mark() instead of "3". >> >> so the current change seems to be implementing that, even though other >> uses of "3" are left untouched. > > I'll take a look at cleaning those up also... > > In some cases markOopDesc::marked_value will work for the literal '3', > but in other cases we'll use markOop::unused_mark(): > > static markOop unused_mark() { > return (markOop) marked_value; > } > > to save us the noise of the (markOop) cast. > > >> --- >> >> src/share/vm/runtime/sharedRuntime.cpp >> >> 1794 JRT_BLOCK_ENTRY(void, >> SharedRuntime::complete_monitor_locking_C(oopDesc* _obj, BasicLock* >> lock, JavaThread* thread)) >> 1795 if (!SafepointSynchronize::is_synchronizing()) { >> 1796 if (ObjectSynchronizer::quick_enter(_obj, thread, lock)) return; >> >> Is it necessary to check is_synchronizing? 
If we are executing this >> code we are not at a safepoint and the quick_enter wont change that, >> so I'm not sure what we are guarding against. > > So this first state checker: > > src/share/vm/runtime/safepoint.hpp: > inline static bool is_synchronizing() { return _state == > _synchronizing; } > > means that we want to go to a safepoint and: > > inline static bool is_at_safepoint() { return _state == _synchronized; } > > means that we are at a safepoint. Dice's optimization bails out if > we want to go to a safepoint and ObjectSynchronizer::quick_enter() > has a "No_Safepoint_Verifier nsv" in it so we're expecting that > code to be quick (and not go to a safepoint). I'm not seeing > anything obvious.... So it occurred to me that this is just an optimization not a true guard - as the safepoint could be initiated just after we do the check. So it's basically trying to ensure that if a safepoint has been requested then we don't unduly delay it by taking the non-safepointing quick_enter path. Cheers, David > Sometimes we have to be careful with JavaThread suspend requests and > monitor acquisition, but I don't think that's a problem here... In > order for the "suspend requesting" thread to be surprised, the suspend > API, e.g., JVM/TI SuspendThread() has to return to the caller and then > the suspend target has do something unexpected like acquire a monitor > that it was previously blocked upon when it was suspended. We've had > bugs like that in the past... In this optimization case, our target > thread is not blocked on a contended monitor... > > In this particular case, the "suspend requesting" thread will set the > suspend request state on the target thread, but the target thread is > busy trying to enter this uncontended monitor (quickly). So the > "suspend requesting" thread, will request a no-op safepoint, but it > won't return from the suspend API until that safepoint completes. 
> The safepoint won't complete until the target thread is done acquiring > the previously uncontended monitor... so the target thread will be > suspended while holding the previous uncontended monitor and the > "suspend requesting" thread will return from the suspend API all > happy... > > Well, I don't see the reason either so I'll have to ping Dave Dice > and Karen Kinnear to see if either of them can fill in the history > here. This could be an abundance of caution case. > > >> --- >> >> src/share/vm/runtime/synchronizer.cpp >> >> Minor nit: line 153 the usual acronym is NPE (for >> NullPointerException) not NPX > > I'll do a search for uses of NPX and other uses of 'X' in exception > acronyms... > > >> >> Nit: 159 Thread * const ox >> >> Please change ox to owner. > > Will do. > > Thanks again for the review! > > Dan > > >> >> --- >> >> Thanks, >> David >> >> >> >> On 31/10/2014 8:50 AM, Daniel D. Daugherty wrote: >>> Greetings, >>> >>> I have the Contended Locking fast enter bucket ready for review. >>> >>> The code changes in this bucket are primarily a quick_enter() >>> function that works on inflated but uncontended Java monitors. >>> This quick_enter() function is used on the "slow path" for Java >>> Monitor enter operations when the built-in "fast path" (read >>> assembly code) doesn't work. 
>>> >>> This work is being tracked by the following bug ID: >>> >>> JDK-8061553 Contended Locking fast enter bucket >>> https://bugs.openjdk.java.net/browse/JDK-8061553 >>> >>> Here is the webrev URL: >>> >>> http://cr.openjdk.java.net/~dcubed/8061553-webrev/0-jdk9-hs-rt/ >>> >>> Here is the JEP link: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8046133 >>> >>> 8061553 summary of changes: >>> >>> macroAssembler_sparc.cpp: MacroAssembler::compiler_lock_object() >>> >>> - clean up spacing around some >>> 'ObjectMonitor::owner_offset_in_bytes() - 2' uses >>> - remove optional (EmitSync & 64) code >>> - change from cmp() to andcc() so icc.zf flag is set >>> >>> macroAssembler_x86.cpp: MacroAssembler::fast_lock() >>> >>> - remove optional (EmitSync & 2) code >>> - rewrite LP64 inflated lock code that tries to CAS in >>> the new owner value to be more efficient >>> >>> interfaceSupport.hpp: >>> >>> - add JRT_BLOCK_NO_ASYNC to permit splitting a >>> JRT_BLOCK_ENTRY into two pieces. >>> >>> sharedRuntime.cpp: SharedRuntime::complete_monitor_locking_C() >>> >>> - change entry type from JRT_ENTRY_NO_ASYNC to JRT_BLOCK_ENTRY >>> to permit ObjectSynchronizer::quick_enter() call >>> - add JRT_BLOCK_NO_ASYNC use if the quick_enter() doesn't work >>> to revert to JRT_ENTRY_NO_ASYNC-like semantics >>> >>> synchronizer.[ch]pp: >>> >>> - add ObjectSynchronizer::quick_enter() for entering an >>> inflated but unowned Java monitor without thread state >>> changes >>> >>> Testing: >>> >>> - Aurora Adhoc RT/SVC baseline batch >>> - JPRT test jobs >>> - MonitorEnterStresser micro-benchmark (in process) >>> - CallTimerGrid stress testing (in process) >>> - Aurora performance testing: >>> - out of the box for the "promotion" and 32-bit server configs >>> - heavy weight monitors for the "promotion" and 32-bit server configs >>> (-XX:-UseBiasedLocking -XX:+UseHeavyMonitors) >>> (in process) >>> >>> >>> Thanks, in advance, for any comments, questions or suggestions. 
>>> >>> Dan > From peter.levart at gmail.com Tue Nov 4 10:07:56 2014 From: peter.levart at gmail.com (Peter Levart) Date: Tue, 04 Nov 2014 11:07:56 +0100 Subject: [9] RFR(L) 8013267 : move MemberNameTable from native code to Java heap, use to intern MemberNames In-Reply-To: References: <594FE416-D445-426E-B7A9-C75F012ADE16@oracle.com> <0BDCD5DD-CBFB-4A3D-9BAD-7F9912E9237C@oracle.com> <5453C230.8010709@oracle.com> <9747A3A3-B7F4-407A-8E00-1E647D9DC2D1@oracle.com> <1D195ED3-8BD2-46CB-9D70-29CF809D9F5A@oracle.com> <5456FB59.60905@oracle.com> <632A5C98-B386-4625-BE12-355241581955@oracle.com> <5457AA75.8090103@gmail.com> <5457E0F9.8090004@gmail.com> Message-ID: <5458A57C.4060208@gmail.com> On 11/03/2014 09:41 PM, David Chase wrote: > On 2014-11-03, at 3:09 PM, Peter Levart wrote: > >> Hi David, >> >> I was thinking about the fact that java.lang.invoke code is leaking into java.lang.Class. Perhaps, If you don't mind rewriting the code, a better code structure would be, if j.l.Class changes only consisted of adding a simple: >> ... >> This way all the worries about ordering of writes into array and/or size are gone. The array is still used to quickly search for an element, but VM only scans the linked-list. >> >> What do you think of this? > I'm not sure. I know Coleen Ph would like to see that happen. > > A couple of people have vague plans to move more of the MemberName resolution into core libs. > (Years ago I worked on a VM where *all* of this occurred in Java, but some of it was ahead of time.) Hi David, > > I heard mention of "we want to put more stuff in there" but I got the impression that already happened > (there's reflection data, for example) so I'm not sure that makes sense. Reflection is an API that is rooted in j.l.Class. If the plans are to move some of the java.lang.invoke public API to java.lang package (into the j.l.Class, ...), then this is understandable.
> > There's also a proposal from people in the runtime to just use a jmethodid, take the hit of an extra indirection, > and no need to for this worrisome jvm/java concurrency. The linked list of MemberName(s) is also worry-less and doesn't need an extra indirection via jmethodid. Does the hit of extra indirection occur when invoking a MethodHandle? > > And if we instead wrote a hash table that only grew, and never relocated elements, we could > (I think) allow non-synchronized O(1) probes of the table from the Java side, synchronized > O(1) insertions from the Java side, and because nothing moves, a smaller dance with the > VM. I'm rather tempted to look into this - given the amount of work it would take to do the > benchmarking to see if (a) jmethodid would have acceptable performance or (b) the existing > costs are too high, I could instead just write fast code and be done. Are you thinking of an IdentityHashMap type of hash table (no linked-list of elements for same bucket, just search for 1st free slot on insert)? The problem would be how to pre-size the array. Count declared members? > > And another way to view this is that we're now quibbling about performance, when we still > have an existing correctness problem that this patch solves, so maybe we should just get this > done and then file an RFE. Perhaps, yes. But note that questions about JMM and ordering of writes to array elements are about correctness, not performance. Regards, Peter > > David > From daniel.daugherty at oracle.com Tue Nov 4 14:46:51 2014 From: daniel.daugherty at oracle.com (Daniel D.
Daugherty) Date: Tue, 04 Nov 2014 07:46:51 -0700 Subject: RFR(S) Contended Locking fast enter bucket (8061553) In-Reply-To: <54587A41.2020508@oracle.com> References: <5452C0B4.4070601@oracle.com> <5457084B.6070808@oracle.com> <5458330E.1080207@oracle.com> <54587A41.2020508@oracle.com> Message-ID: <5458E6DB.8020409@oracle.com> On 11/4/14 12:03 AM, David Holmes wrote: > Hi Dan, > > One follow up deep below ... > > On 4/11/2014 11:59 AM, Daniel D. Daugherty wrote: >> David, >> >> Thanks for the review! As usual, replies are embedded below... >> >> >> On 11/2/14 9:44 PM, David Holmes wrote: >>> Hi Dan, >>> >>> Looks good. >> >> Thanks! >> >> >>> Couple of nits and one semantic query below ... >>> >>> src/cpu/sparc/vm/macroAssembler_sparc.cpp >>> >>> Formatting changes were a bit of a distraction. >> >> Yes, I have no idea what got into me. Normally I do formatting >> changes separately so the noise does not distract... >> >> It turns out there is a constant defined that should be used >> instead of all these literal '2's: >> >> src/share/vm/oops/markOop.hpp: monitor_value = 2 >> >> Typically used as follows: >> >> src/cpu/x86/vm/macroAssembler_x86.cpp: int owner_offset = >> ObjectMonitor::owner_offset_in_bytes() - markOopDesc::monitor_value; >> >> I will clean this up just for the files that I'm touching as >> part of this fix. >> >> >>> >>> --- >>> >>> src/cpu/x86/vm/macroAssembler_x86.cpp >>> >>> Formatting changes were a bit of a distraction. >> >> Same reply as for macroAssembler_sparc.cpp. >> >> >>> 1929 // unconditionally set stackBox->_displaced_header = 3 >>> 1930 movptr(Address(boxReg, 0), >>> (int32_t)intptr_t(markOopDesc::unused_mark())); >>> >>> At 1870 we refer to box rather than stackBox. Also it takes some >>> sleuthing to realize that "3" here is somehow a pseudonym for >>> unused_mark(). Back up at 1808 we have a to-do: >>> >>> 1808 // use markOop::unused_mark() instead of "3". 
>>> >>> so the current change seems to be implementing that, even though other >>> uses of "3" are left untouched. >> >> I'll take a look at cleaning those up also... >> >> In some cases markOopDesc::marked_value will work for the literal '3', >> but in other cases we'll use markOop::unused_mark(): >> >> static markOop unused_mark() { >> return (markOop) marked_value; >> } >> >> to save us the noise of the (markOop) cast. >> >> >>> --- >>> >>> src/share/vm/runtime/sharedRuntime.cpp >>> >>> 1794 JRT_BLOCK_ENTRY(void, >>> SharedRuntime::complete_monitor_locking_C(oopDesc* _obj, BasicLock* >>> lock, JavaThread* thread)) >>> 1795 if (!SafepointSynchronize::is_synchronizing()) { >>> 1796 if (ObjectSynchronizer::quick_enter(_obj, thread, lock)) >>> return; >>> >>> Is it necessary to check is_synchronizing? If we are executing this >>> code we are not at a safepoint and the quick_enter wont change that, >>> so I'm not sure what we are guarding against. >> >> So this first state checker: >> >> src/share/vm/runtime/safepoint.hpp: >> inline static bool is_synchronizing() { return _state == >> _synchronizing; } >> >> means that we want to go to a safepoint and: >> >> inline static bool is_at_safepoint() { return _state == >> _synchronized; } >> >> means that we are at a safepoint. Dice's optimization bails out if >> we want to go to a safepoint and ObjectSynchronizer::quick_enter() >> has a "No_Safepoint_Verifier nsv" in it so we're expecting that >> code to be quick (and not go to a safepoint). I'm not seeing >> anything obvious.... > > So it occurred to me that this is just an optimization not a true > guard - as the safepoint could be initiated just after we do the > check. So it's basically trying to ensure that if a safepoint has been > requested then we don't unduly delay it by taking the non-safepointing > quick_enter path. Sounds reasonable to me. 
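[Editorial sketch, not HotSpot code.] The "optimization, not a true guard" reading above can be modelled in Java roughly like this. Everything here is an assumption made for illustration: QuickEnterModel and its members are hypothetical names, a volatile boolean stands in for SafepointSynchronize::is_synchronizing(), a CAS on an owner field stands in for ObjectSynchronizer::quick_enter(), and reentrancy is ignored.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of "check for a pending safepoint, then try the quick path".
final class QuickEnterModel {
    // Models "a safepoint has been requested"; it may flip right after it is read.
    private volatile boolean safepointRequested;
    // null means the (already inflated) monitor is currently unowned.
    private final AtomicReference<Thread> owner = new AtomicReference<>();
    // Counts how often the slow path was taken, for illustration only.
    private final AtomicInteger slowPathEntries = new AtomicInteger();

    void requestSafepoint(boolean requested) { safepointRequested = requested; }
    Thread owner() { return owner.get(); }
    int slowPathEntries() { return slowPathEntries.get(); }

    // Quick path: a single CAS, never blocks.
    private boolean quickEnter(Thread self) {
        return owner.compareAndSet(null, self);
    }

    void enter(Thread self) {
        // Advisory only: the request can arrive just after this read, so the
        // check merely avoids delaying a safepoint that is already wanted.
        if (!safepointRequested) {
            if (quickEnter(self)) {
                return;
            }
        }
        slowEnter(self);
    }

    // Slow path stand-in: the real VM would do thread state transitions here.
    private void slowEnter(Thread self) {
        slowPathEntries.incrementAndGet();
        while (!owner.compareAndSet(null, self)) {
            Thread.yield();
        }
    }

    void exit(Thread self) {
        owner.compareAndSet(self, null);
    }
}
```

Correctness of the model does not depend on the flag: both paths acquire the monitor through the same CAS, so skipping or taking the quick path only changes which route is used.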
Dan > > Cheers, > David > >> Sometimes we have to be careful with JavaThread suspend requests and >> monitor acquisition, but I don't think that's a problem here... In >> order for the "suspend requesting" thread to be surprised, the suspend >> API, e.g., JVM/TI SuspendThread() has to return to the caller and then >> the suspend target has do something unexpected like acquire a monitor >> that it was previously blocked upon when it was suspended. We've had >> bugs like that in the past... In this optimization case, our target >> thread is not blocked on a contended monitor... >> >> In this particular case, the "suspend requesting" thread will set the >> suspend request state on the target thread, but the target thread is >> busy trying to enter this uncontended monitor (quickly). So the >> "suspend requesting" thread, will request a no-op safepoint, but it >> won't return from the suspend API until that safepoint completes. >> The safepoint won't complete until the target thread is done acquiring >> the previously uncontended monitor... so the target thread will be >> suspended while holding the previous uncontended monitor and the >> "suspend requesting" thread will return from the suspend API all >> happy... >> >> Well, I don't see the reason either so I'll have to ping Dave Dice >> and Karen Kinnear to see if either of them can fill in the history >> here. This could be an abundance of caution case. >> >> >>> --- >>> >>> src/share/vm/runtime/synchronizer.cpp >>> >>> Minor nit: line 153 the usual acronym is NPE (for >>> NullPointerException) not NPX >> >> I'll do a search for uses of NPX and other uses of 'X' in exception >> acronyms... >> >> >>> >>> Nit: 159 Thread * const ox >>> >>> Please change ox to owner. >> >> Will do. >> >> Thanks again for the review! >> >> Dan >> >> >>> >>> --- >>> >>> Thanks, >>> David >>> >>> >>> >>> On 31/10/2014 8:50 AM, Daniel D. 
Daugherty wrote: >>>> Greetings, >>>> >>>> I have the Contended Locking fast enter bucket ready for review. >>>> >>>> The code changes in this bucket are primarily a quick_enter() >>>> function that works on inflated but uncontended Java monitors. >>>> This quick_enter() function is used on the "slow path" for Java >>>> Monitor enter operations when the built-in "fast path" (read >>>> assembly code) doesn't work. >>>> >>>> This work is being tracked by the following bug ID: >>>> >>>> JDK-8061553 Contended Locking fast enter bucket >>>> https://bugs.openjdk.java.net/browse/JDK-8061553 >>>> >>>> Here is the webrev URL: >>>> >>>> http://cr.openjdk.java.net/~dcubed/8061553-webrev/0-jdk9-hs-rt/ >>>> >>>> Here is the JEP link: >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8046133 >>>> >>>> 8061553 summary of changes: >>>> >>>> macroAssembler_sparc.cpp: MacroAssembler::compiler_lock_object() >>>> >>>> - clean up spacing around some >>>> 'ObjectMonitor::owner_offset_in_bytes() - 2' uses >>>> - remove optional (EmitSync & 64) code >>>> - change from cmp() to andcc() so icc.zf flag is set >>>> >>>> macroAssembler_x86.cpp: MacroAssembler::fast_lock() >>>> >>>> - remove optional (EmitSync & 2) code >>>> - rewrite LP64 inflated lock code that tries to CAS in >>>> the new owner value to be more efficient >>>> >>>> interfaceSupport.hpp: >>>> >>>> - add JRT_BLOCK_NO_ASYNC to permit splitting a >>>> JRT_BLOCK_ENTRY into two pieces. 
>>>> >>>> sharedRuntime.cpp: SharedRuntime::complete_monitor_locking_C() >>>> >>>> - change entry type from JRT_ENTRY_NO_ASYNC to JRT_BLOCK_ENTRY >>>> to permit ObjectSynchronizer::quick_enter() call >>>> - add JRT_BLOCK_NO_ASYNC use if the quick_enter() doesn't work >>>> to revert to JRT_ENTRY_NO_ASYNC-like semantics >>>> >>>> synchronizer.[ch]pp: >>>> >>>> - add ObjectSynchronizer::quick_enter() for entering an >>>> inflated but unowned Java monitor without thread state >>>> changes >>>> >>>> Testing: >>>> >>>> - Aurora Adhoc RT/SVC baseline batch >>>> - JPRT test jobs >>>> - MonitorEnterStresser micro-benchmark (in process) >>>> - CallTimerGrid stress testing (in process) >>>> - Aurora performance testing: >>>> - out of the box for the "promotion" and 32-bit server configs >>>> - heavy weight monitors for the "promotion" and 32-bit server >>>> configs >>>> (-XX:-UseBiasedLocking -XX:+UseHeavyMonitors) >>>> (in process) >>>> >>>> >>>> Thanks, in advance, for any comments, questions or suggestions. >>>> >>>> Dan >> From david.r.chase at oracle.com Tue Nov 4 15:19:24 2014 From: david.r.chase at oracle.com (David Chase) Date: Tue, 4 Nov 2014 10:19:24 -0500 Subject: [9] RFR(L) 8013267 : move MemberNameTable from native code to Java heap, use to intern MemberNames In-Reply-To: <5458A57C.4060208@gmail.com> References: <594FE416-D445-426E-B7A9-C75F012ADE16@oracle.com> <0BDCD5DD-CBFB-4A3D-9BAD-7F9912E9237C@oracle.com> <5453C230.8010709@oracle.com> <9747A3A3-B7F4-407A-8E00-1E647D9DC2D1@oracle.com> <1D195ED3-8BD2-46CB-9D70-29CF809D9F5A@oracle.com> <5456FB59.60905@oracle.com> <632A5C98-B386-4625-BE12-355241581955@oracle.com> <5457AA75.8090103@gmail.com> <5457E0F9.8090004@gmail.com> <5458A57C.4060208@gmail.com> Message-ID: <260D49F5-6380-4FC3-A900-6CD9AB3ED6F7@oracle.com> On 2014-11-04, at 5:07 AM, Peter Levart wrote: > Are you thinking of an IdentityHashMap type of hash table (no linked-list of elements for same bucket, just search for 1st free slot on insert)? 
The problem would be how to pre-size the array. Count declared members?

It can't be an IdentityHashMap, because we are interning member names.

In spite of my grumbling about benchmarking, I'm inclined to do that and try a couple of experiments.
One possibility would be to use two data structures, one for interning, the other for communication with the VM.
Because there's no lookup in the VM data structure it can just be an array that gets elements appended,
and the synchronization dance is much simpler.

For interning, maybe I use a ConcurrentHashMap, and I try the following idiom:

mn = resolve(args)
// deal with any errors
mn' = chm.get(mn)
if (mn' != null) return mn' // hoped-for-common-case

synchronized (something) {
  mn' = chm.get(mn)
  if (mn' != null) return mn'

  txn_class = mn.getDeclaringClass()

  while (true) {
    redef_count = txn_class.redefCount()
    mn = resolve(args)

    shared_array.add(mn);
    // barrier, because we are paranoid
    if (redef_count == txn_class.redefCount()) {
      chm.add(mn); // safe to publish to other Java threads.
      return mn;
    }
    shared_array.drop_last(); // Try again
  }
}

(Idiom gets slightly revised for the one or two other intern use cases, but this is the basic idea).

David

>>
>> And another way to view this is that we're now quibbling about performance, when we still
>> have an existing correctness problem that this patch solves, so maybe we should just get this
>> done and then file an RFE.
>
> Perhaps, yes. But note that questions about JMM and ordering of writes to array elements are about correctness, not performance.
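A minimal Java sketch of the two-structure interning idiom from David's message above. It is only an illustration under stated assumptions: the class name InternSketch, the Supplier standing in for resolve(args), and the IntSupplier standing in for the declaring class's redefinition counter are all hypothetical, and the ArrayList merely models the append-only array that the VM would scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntSupplier;
import java.util.function.Supplier;

/** Toy model of the two-structure interning idiom; every name here is hypothetical. */
final class InternSketch<T> {
    // Interning map: lock-free reads on the hoped-for common path.
    private final ConcurrentHashMap<T, T> chm = new ConcurrentHashMap<>();
    // Append-only list standing in for the array the VM would scan.
    private final List<T> sharedArray = new ArrayList<>();
    private final Object lock = new Object();

    /**
     * @param resolve    stands in for resolve(args); may be invoked more than once
     * @param redefCount stands in for the declaring class's redefinition counter
     */
    T intern(Supplier<T> resolve, IntSupplier redefCount) {
        T mn = resolve.get();
        T existing = chm.get(mn);
        if (existing != null) return existing;       // fast path, no lock taken

        synchronized (lock) {
            existing = chm.get(mn);                   // re-check under the lock
            if (existing != null) return existing;

            while (true) {
                int before = redefCount.getAsInt();
                mn = resolve.get();                   // re-resolve against the current class version
                sharedArray.add(mn);                  // expose to the (modelled) VM first
                if (before == redefCount.getAsInt()) {
                    chm.put(mn, mn);                  // only now publish to other Java threads
                    return mn;
                }
                // The class was redefined in between: undo the append and retry.
                sharedArray.remove(sharedArray.size() - 1);
            }
        }
    }

    /** Number of entries currently visible to the modelled VM side. */
    int sharedSize() {
        synchronized (lock) {
            return sharedArray.size();
        }
    }
}
```

The ordering is the point of the sketch: an entry is appended to the VM-visible list before it is published in the map, and the append is rolled back if the redefinition counter moved while it was being resolved.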
>
> Regards, Peter
>
>>
>> David

From coleen.phillimore at oracle.com Tue Nov 4 15:26:39 2014 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Tue, 04 Nov 2014 10:26:39 -0500 Subject: [9] RFR(L) 8013267 : move MemberNameTable from native code to Java heap, use to intern MemberNames In-Reply-To: <5458A57C.4060208@gmail.com> References: <594FE416-D445-426E-B7A9-C75F012ADE16@oracle.com> <0BDCD5DD-CBFB-4A3D-9BAD-7F9912E9237C@oracle.com> <5453C230.8010709@oracle.com> <9747A3A3-B7F4-407A-8E00-1E647D9DC2D1@oracle.com> <1D195ED3-8BD2-46CB-9D70-29CF809D9F5A@oracle.com> <5456FB59.60905@oracle.com> <632A5C98-B386-4625-BE12-355241581955@oracle.com> <5457AA75.8090103@gmail.com> <5457E0F9.8090004@gmail.com> <5458A57C.4060208@gmail.com> Message-ID: <5458F02F.5020409@oracle.com>

On 11/04/2014 05:07 AM, Peter Levart wrote:
> On 11/03/2014 09:41 PM, David Chase wrote:
>> On 2014-11-03, at 3:09 PM, Peter Levart wrote:
>>
>>> Hi David,
>>>
>>> I was thinking about the fact that java.lang.invoke code is leaking
>>> into java.lang.Class. Perhaps, if you don't mind rewriting the code,
>>> a better code structure would be, if j.l.Class changes only
>>> consisted of adding a simple:
>>> ?

Peter,

I agreed with your comment about java/lang/invoke things leaking into java/lang/Class. I think this should be in another class with a pointer in java/lang/Class to it. I'm adding jdk9-dev because I think the core-libs people may have an opinion about this.

On the JVM side, I suggested jmethodID as an alternate place to store the Method* to save the JVM from knowing how to inspect the contents of the MemberName type. I'm not sure if that's the best solution since jmethodIDs leak memory and, except for jvmti, the code assumes there aren't many. But I would like us to think of a better solution. My original idea was to save method->idnum() like we do with reflection, but finding Method* from idnum can be complicated and apparently the code to do this is in assembly code for MethodHandles.
I would be surprised if the extra level of indirection at these calls would be a performance issue given all the code added to intern these things.

The idea that we should ship this because it works and file an RFE to rewrite it later is not acceptable to me.

Thanks,
Coleen

>>> This way all the worries about ordering of writes into array and/or
>>> size are gone. The array is still used to quickly search for an
>>> element, but VM only scans the linked-list.
>>>
>>> What do you think of this?
>> I'm not sure. I know Coleen Ph would like to see that happen.
>>
>> A couple of people have vague plans to move more of the MemberName
>> resolution into core libs.
>> (Years ago I worked on a VM where *all* of this occurred in Java, but
>> some of it was ahead of time.)
>
> Hi David,
>
>>
>> I heard mention of "we want to put more stuff in there" but I got the
>> impression that already happened
>> (there's reflection data, for example) so I'm not sure that makes sense.
>
> Reflection is an API that is rooted in j.l.Class. If the plans are to
> move some of the java.lang.invoke public API to java.lang package
> (into the j.l.Class, ...), then this is understandable.
>
>>
>> There's also a proposal from people in the runtime to just use a
>> jmethodid, take the hit of an extra indirection,
>> and no need for this worrisome jvm/java concurrency.
>
> The linked list of MemberName(s) is also worry-less and doesn't need
> an extra indirection via jmethodid. Does the hit of extra indirection
> occur when invoking a MethodHandle?
>
>>
>> And if we instead wrote a hash table that only grew, and never
>> relocated elements, we could
>> (I think) allow non-synchronized O(1) probes of the table from the
>> Java side, synchronized
>> O(1) insertions from the Java side, and because nothing moves, a
>> smaller dance with the
>> VM. I'm rather tempted to look into this -
given the amount of work
>> it would take to do the
>> benchmarking to see if (a) jmethodid would have acceptable
>> performance or (b) the existing
>> costs are too high, I could instead just write fast code and be done.
>
> Are you thinking of an IdentityHashMap type of hash table (no
> linked-list of elements for same bucket, just search for 1st free slot
> on insert)? The problem would be how to pre-size the array. Count
> declared members?
>
>>
>> And another way to view this is that we're now quibbling about
>> performance, when we still
>> have an existing correctness problem that this patch solves, so maybe
>> we should just get this
>> done and then file an RFE.
>
> Perhaps, yes. But note that questions about JMM and ordering of writes
> to array elements are about correctness, not performance.
>
> Regards, Peter
>
>>
>> David
>>
>

From peter.levart at gmail.com Tue Nov 4 16:48:14 2014 From: peter.levart at gmail.com (Peter Levart) Date: Tue, 04 Nov 2014 17:48:14 +0100 Subject: [9] RFR(L) 8013267 : move MemberNameTable from native code to Java heap, use to intern MemberNames In-Reply-To: <260D49F5-6380-4FC3-A900-6CD9AB3ED6F7@oracle.com> References: <594FE416-D445-426E-B7A9-C75F012ADE16@oracle.com> <0BDCD5DD-CBFB-4A3D-9BAD-7F9912E9237C@oracle.com> <5453C230.8010709@oracle.com> <9747A3A3-B7F4-407A-8E00-1E647D9DC2D1@oracle.com> <1D195ED3-8BD2-46CB-9D70-29CF809D9F5A@oracle.com> <5456FB59.60905@oracle.com> <632A5C98-B386-4625-BE12-355241581955@oracle.com> <5457AA75.8090103@gmail.com> <5457E0F9.8090004@gmail.com> <5458A57C.4060208@gmail.com> <260D49F5-6380-4FC3-A900-6CD9AB3ED6F7@oracle.com> Message-ID: <5459034E.8070809@gmail.com>

On 11/04/2014 04:19 PM, David Chase wrote:
> On 2014-11-04, at 5:07 AM, Peter Levart wrote:
>> Are you thinking of an IdentityHashMap type of hash table (no linked-list of elements for same bucket, just search for 1st free slot on insert)? The problem would be how to pre-size the array. Count declared members?
> It can't be an IdentityHashMap, because we are interning member names.

I know it can't be IdentityHashMap - I just wondered if you were thinking of an IdentityHashMap-like data structure in contrast to standard HashMap-like. Not in terms of equality/hashCode used, but in terms of internal data structure. IdentityHashMap is just an array of elements (well, pairs of them - key and value are placed in two consecutive array slots). Lookup searches linearly in the array, starting from the hashCode-based index, until it reaches either the element or the 1st empty array slot. It's very easy to implement if the only operations are get() and put(), and it could be used both for interning and as a shared structure for the VM to scan, but the array has to be sized to at least 3/2 the number of elements for performance to not degrade.

> In spite of my grumbling about benchmarking, I'm inclined to do that and try a couple of experiments.
> One possibility would be to use two data structures, one for interning, the other for communication with the VM.
> Because there's no lookup in the VM data structure it can just be an array that gets elements appended,
> and the synchronization dance is much simpler.
>
> For interning, maybe I use a ConcurrentHashMap, and I try the following idiom:
>
> mn = resolve(args)
> // deal with any errors
> mn' = chm.get(mn)
> if (mn' != null) return mn' // hoped-for-common-case
>
> synchronized (something) {
>   mn' = chm.get(mn)
>   if (mn' != null) return mn'
>
>   txn_class = mn.getDeclaringClass()
>
>   while (true) {
>     redef_count = txn_class.redefCount()
>     mn = resolve(args)
>
>     shared_array.add(mn);
>     // barrier, because we are paranoid
>     if (redef_count == txn_class.redefCount()) {
>       chm.add(mn); // safe to publish to other Java threads.
>       return mn;
>     }
>     shared_array.drop_last(); // Try again
>   }
> }
>
> (Idiom gets slightly revised for the one or two other intern use cases, but this is the basic idea).
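The IdentityHashMap-like layout Peter describes above (one flat array, linear probing from a hash-derived index, no per-bucket linked list) can be sketched roughly as follows. This is a hypothetical, single-threaded toy: the class name ProbeTable is invented, it stores elements only rather than key/value pairs, it uses equals/hashCode because the goal is interning, and it throws instead of resizing so that elements never move.

```java
/** Toy insert-only, linear-probing intern table; a hypothetical sketch, not JDK code. */
final class ProbeTable<T> {
    private final Object[] slots;
    private int size;

    ProbeTable(int expectedElements) {
        // Keep the load factor at or below 2/3, i.e. capacity >= 3/2 * elements.
        int capacity = Math.max(4, expectedElements + (expectedElements + 1) / 2 + 1);
        slots = new Object[capacity];
    }

    /** Returns the canonical element equal to e, inserting e if none is present. */
    @SuppressWarnings("unchecked")
    T intern(T e) {
        int i = Math.floorMod(e.hashCode(), slots.length);
        // Probe linearly until we hit an equal element or the first empty slot.
        // The load-factor check below guarantees an empty slot always exists.
        while (slots[i] != null) {
            if (slots[i].equals(e)) {
                return (T) slots[i];
            }
            i = (i + 1) % slots.length;
        }
        if ((size + 1) * 3 > slots.length * 2) {
            throw new IllegalStateException("table would exceed 2/3 load; it was pre-sized too small");
        }
        slots[i] = e;
        size++;
        return e;
    }

    int size() {
        return size;
    }
}
```

The constructor is where the pre-sizing question from the thread shows up: the caller has to supply an element count up front, because this sketch never grows or relocates.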
Yes, that's similar to what I suggested by using a linked-list of MemberName(s) instead of the "shared_array" (easier to reason about ordering of writes) and a sorted array of MemberName(s) instead of the "chm" in your scheme above. ConcurrentHashMap would certainly be the most performant solution in terms of lookup/insertion-time and concurrent throughput, but it will use more heap than a simple packed array of MemberNames. CHM is much better now in JDK8 though regarding heap use.

A combination of the two approaches is also possible:

- instead of maintaining a "shared_array" of MemberName(s), have them form a linked-list (you trade a slot in array for 'next' pointer in MemberName)
- use ConcurrentHashMap for interning.

Regards, Peter

>
> David
>
>>> And another way to view this is that we're now quibbling about performance, when we still
>>> have an existing correctness problem that this patch solves, so maybe we should just get this
>>> done and then file an RFE.
>> Perhaps, yes. But note that questions about JMM and ordering of writes to array elements are about correctness, not performance.
>>
>> Regards, Peter
>>
>>> David

From daniel.daugherty at oracle.com Tue Nov 4 18:26:02 2014 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Tue, 04 Nov 2014 11:26:02 -0700 Subject: RFR(S) Contended Locking fast enter bucket (8061553) In-Reply-To: <5458330E.1080207@oracle.com> References: <5452C0B4.4070601@oracle.com> <5457084B.6070808@oracle.com> <5458330E.1080207@oracle.com> Message-ID: <54591A3A.1090005@oracle.com>

The cleanup is turning into a bigger change than the fast enter bucket itself so I'm spinning the cleanup into a new bug:

JDK-8062851 cleanup ObjectMonitor offset adjustments
https://bugs.openjdk.java.net/browse/JDK-8062851

Yes, this means that the Contended Locking cleanup bucket has reopened for yet another change...

We'll get back to "fast enter" after the dust has settled...

Dan

On 11/3/14 6:59 PM, Daniel D.
Daugherty wrote: > David, > > Thanks for the review! As usual, replies are embedded below... > > > On 11/2/14 9:44 PM, David Holmes wrote: >> Hi Dan, >> >> Looks good. > > Thanks! > > >> Couple of nits and one semantic query below ... >> >> src/cpu/sparc/vm/macroAssembler_sparc.cpp >> >> Formatting changes were a bit of a distraction. > > Yes, I have no idea what got into me. Normally I do formatting > changes separately so the noise does not distract... > > It turns out there is a constant defined that should be used > instead of all these literal '2's: > > src/share/vm/oops/markOop.hpp: monitor_value = 2 > > Typically used as follows: > > src/cpu/x86/vm/macroAssembler_x86.cpp: int owner_offset = > ObjectMonitor::owner_offset_in_bytes() - markOopDesc::monitor_value; > > I will clean this up just for the files that I'm touching as > part of this fix. > > >> >> --- >> >> src/cpu/x86/vm/macroAssembler_x86.cpp >> >> Formatting changes were a bit of a distraction. > > Same reply as for macroAssembler_sparc.cpp. > > >> 1929 // unconditionally set stackBox->_displaced_header = 3 >> 1930 movptr(Address(boxReg, 0), >> (int32_t)intptr_t(markOopDesc::unused_mark())); >> >> At 1870 we refer to box rather than stackBox. Also it takes some >> sleuthing to realize that "3" here is somehow a pseudonym for >> unused_mark(). Back up at 1808 we have a to-do: >> >> 1808 // use markOop::unused_mark() instead of "3". >> >> so the current change seems to be implementing that, even though >> other uses of "3" are left untouched. > > I'll take a look at cleaning those up also... > > In some cases markOopDesc::marked_value will work for the literal '3', > but in other cases we'll use markOop::unused_mark(): > > static markOop unused_mark() { > return (markOop) marked_value; > } > > to save us the noise of the (markOop) cast. 
> > >> --- >> >> src/share/vm/runtime/sharedRuntime.cpp >> >> 1794 JRT_BLOCK_ENTRY(void, >> SharedRuntime::complete_monitor_locking_C(oopDesc* _obj, BasicLock* >> lock, JavaThread* thread)) >> 1795 if (!SafepointSynchronize::is_synchronizing()) { >> 1796 if (ObjectSynchronizer::quick_enter(_obj, thread, lock)) >> return; >> >> Is it necessary to check is_synchronizing? If we are executing this >> code we are not at a safepoint and the quick_enter wont change that, >> so I'm not sure what we are guarding against. > > So this first state checker: > > src/share/vm/runtime/safepoint.hpp: > inline static bool is_synchronizing() { return _state == > _synchronizing; } > > means that we want to go to a safepoint and: > > inline static bool is_at_safepoint() { return _state == > _synchronized; } > > means that we are at a safepoint. Dice's optimization bails out if > we want to go to a safepoint and ObjectSynchronizer::quick_enter() > has a "No_Safepoint_Verifier nsv" in it so we're expecting that > code to be quick (and not go to a safepoint). I'm not seeing > anything obvious.... > > Sometimes we have to be careful with JavaThread suspend requests and > monitor acquisition, but I don't think that's a problem here... In > order for the "suspend requesting" thread to be surprised, the suspend > API, e.g., JVM/TI SuspendThread() has to return to the caller and then > the suspend target has do something unexpected like acquire a monitor > that it was previously blocked upon when it was suspended. We've had > bugs like that in the past... In this optimization case, our target > thread is not blocked on a contended monitor... > > In this particular case, the "suspend requesting" thread will set the > suspend request state on the target thread, but the target thread is > busy trying to enter this uncontended monitor (quickly). So the > "suspend requesting" thread, will request a no-op safepoint, but it > won't return from the suspend API until that safepoint completes. 
> The safepoint won't complete until the target thread is done acquiring > the previously uncontended monitor... so the target thread will be > suspended while holding the previous uncontended monitor and the > "suspend requesting" thread will return from the suspend API all > happy... > > Well, I don't see the reason either so I'll have to ping Dave Dice > and Karen Kinnear to see if either of them can fill in the history > here. This could be an abundance of caution case. > > >> --- >> >> src/share/vm/runtime/synchronizer.cpp >> >> Minor nit: line 153 the usual acronym is NPE (for >> NullPointerException) not NPX > > I'll do a search for uses of NPX and other uses of 'X' in exception > acronyms... > > >> >> Nit: 159 Thread * const ox >> >> Please change ox to owner. > > Will do. > > Thanks again for the review! > > Dan > > >> >> --- >> >> Thanks, >> David >> >> >> >> On 31/10/2014 8:50 AM, Daniel D. Daugherty wrote: >>> Greetings, >>> >>> I have the Contended Locking fast enter bucket ready for review. >>> >>> The code changes in this bucket are primarily a quick_enter() >>> function that works on inflated but uncontended Java monitors. >>> This quick_enter() function is used on the "slow path" for Java >>> Monitor enter operations when the built-in "fast path" (read >>> assembly code) doesn't work. 
>>> >>> This work is being tracked by the following bug ID: >>> >>> JDK-8061553 Contended Locking fast enter bucket >>> https://bugs.openjdk.java.net/browse/JDK-8061553 >>> >>> Here is the webrev URL: >>> >>> http://cr.openjdk.java.net/~dcubed/8061553-webrev/0-jdk9-hs-rt/ >>> >>> Here is the JEP link: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8046133 >>> >>> 8061553 summary of changes: >>> >>> macroAssembler_sparc.cpp: MacroAssembler::compiler_lock_object() >>> >>> - clean up spacing around some >>> 'ObjectMonitor::owner_offset_in_bytes() - 2' uses >>> - remove optional (EmitSync & 64) code >>> - change from cmp() to andcc() so icc.zf flag is set >>> >>> macroAssembler_x86.cpp: MacroAssembler::fast_lock() >>> >>> - remove optional (EmitSync & 2) code >>> - rewrite LP64 inflated lock code that tries to CAS in >>> the new owner value to be more efficient >>> >>> interfaceSupport.hpp: >>> >>> - add JRT_BLOCK_NO_ASYNC to permit splitting a >>> JRT_BLOCK_ENTRY into two pieces. >>> >>> sharedRuntime.cpp: SharedRuntime::complete_monitor_locking_C() >>> >>> - change entry type from JRT_ENTRY_NO_ASYNC to JRT_BLOCK_ENTRY >>> to permit ObjectSynchronizer::quick_enter() call >>> - add JRT_BLOCK_NO_ASYNC use if the quick_enter() doesn't work >>> to revert to JRT_ENTRY_NO_ASYNC-like semantics >>> >>> synchronizer.[ch]pp: >>> >>> - add ObjectSynchronizer::quick_enter() for entering an >>> inflated but unowned Java monitor without thread state >>> changes >>> >>> Testing: >>> >>> - Aurora Adhoc RT/SVC baseline batch >>> - JPRT test jobs >>> - MonitorEnterStresser micro-benchmark (in process) >>> - CallTimerGrid stress testing (in process) >>> - Aurora performance testing: >>> - out of the box for the "promotion" and 32-bit server configs >>> - heavy weight monitors for the "promotion" and 32-bit server >>> configs >>> (-XX:-UseBiasedLocking -XX:+UseHeavyMonitors) >>> (in process) >>> >>> >>> Thanks, in advance, for any comments, questions or suggestions. 
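As a rough illustration of the quick-enter idea summarized above, here is a toy Java model. It is not HotSpot code (the real implementation is C++ in synchronizer.cpp and sharedRuntime.cpp); the class ToyMonitor, the Path enum, the recursion counter and the boolean safepointRequested flag are all assumptions made for the sketch. The flag stands in for the SafepointSynchronize::is_synchronizing() check discussed in the review, and the yield loop is only a placeholder for the real blocking slow path.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Toy Java model of a quick-enter fast path; hypothetical, not the HotSpot implementation. */
final class ToyMonitor {
    enum Path { QUICK, SLOW }

    private final AtomicReference<Thread> owner = new AtomicReference<>();
    private int recursions; // only read or written by the owning thread

    /** Non-blocking attempt: succeeds only if the monitor is unowned or already owned by self. */
    boolean quickEnter(Thread self) {
        if (owner.get() == self) {
            recursions++;
            return true;
        }
        return owner.compareAndSet(null, self);
    }

    /**
     * Entry point. When a safepoint has been requested, skip the quick path so the
     * (modelled) safepoint is not delayed, and go straight to the slow path.
     */
    Path enter(Thread self, boolean safepointRequested) {
        if (!safepointRequested && quickEnter(self)) {
            return Path.QUICK;
        }
        // Placeholder for the real slow path, which would block and change thread state.
        while (!quickEnter(self)) {
            Thread.yield();
        }
        return Path.SLOW;
    }

    void exit(Thread self) {
        if (owner.get() != self) {
            throw new IllegalMonitorStateException();
        }
        if (recursions > 0) {
            recursions--;
        } else {
            owner.set(null);
        }
    }

    boolean isFree() {
        return owner.get() == null;
    }
}
```

As David notes in the review, the safepoint check in this shape is an optimization rather than a guard: a safepoint can still be requested immediately after the check returns.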
>>> >>> Dan > > From serguei.spitsyn at oracle.com Tue Nov 4 19:57:54 2014 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 04 Nov 2014 11:57:54 -0800 Subject: RFR 8062116: JVMTI GetClassMethods is Slow In-Reply-To: <5457E36A.3020800@oracle.com> References: <5457E36A.3020800@oracle.com> Message-ID: <54592FC2.7090406@oracle.com> Hi Jeremy and Coleen, I'm reviewing this too. We also need to run the nsk.jvmti, nsk.jdi and jtreg jdi tests. Thanks, Serguei On 11/3/14 12:19 PM, Coleen Phillimore wrote: > > Hi Jeremy, > > I reviewed your new code and it looks fine. I had one comment in > > http://cr.openjdk.java.net/~jmanson/8062116/webrev.00/src/share/vm/prims/jvmtiEnv.cpp.udiff.html > > > The name "need_to_resolve" doesn't make sense when reading this code. > Isn't it more like "need_to_ensure_space" ? I think method resolution > with the other name, which it doesn't do. > > I was trying to find a way to make this new code not appear twice > (maybe with a local jvmtiEnv function get_jmethodID(m) - instanceK_h > is m->method_holder()). Agreed on the above. > > Also, I was trying to figure out if the new class in utilities called > chunkedList.hpp could be used to store jmethodIDs, since the data > structures are similar. There is still more things in JNIMethodBlock > has to do so I think a specialized structure is still needed (which is > why I originally wrote it to be very simple). I'm not sure if the > comment above it still applies. Maybe only the first and third > sentences. Can you rewrite the comment slightly? > > Your other comments in the changes are good. > > I can't completely answer your question about reusing free_methods - > but if a jmethodID is created provisionally in > InstanceKlass::get_jmethod_id and not needed because it loses the race > in the method id cache, it's never handed back to native code, so it's > safe to reuse. This is different than jmethodIDs for methods that are > unloaded. 
They are cleared and never reused. At least that's my > reading of this caching code but it's pretty complicated stuff. > > I've also run our nsk and jck vm/jvmti on this change and they all > passed. I'd be happy to sponsor it with these suggested changes and > it needs another reviewer. > > Thanks for diagnosing and fixing this problem! > Coleen > > > On 10/30/2014 01:02 PM, Jeremy Manson wrote: >> There's a significant regression in the speed of JVMTI >> GetClassMethods in >> JDK8. I've tracked this down to allocation of jmethodids in a tight >> loop. >> The issue can be addressed by preallocating enough space for all of the >> jmethodids when starting the operation and not iterating over all of the >> existing jmethodids when you allocate a new one. >> >> A patch is here: >> >> http://cr.openjdk.java.net/~jmanson/8062116/webrev.00/ >> >> A reproducible test case can be found here: >> >> http://cr.openjdk.java.net/~jmanson/8062116/repro/ >> >> It's a benchmark, though: I have no idea how to turn it into a test. >> >> For whoever reviews it: can you explain to me why it is okay that >> this code >> reuses jmethodIDs (in JNIMethodBlock::add_method? I can imagine a >> lot of >> problems stemming from accidental reuse. >> >> Jeremy > From jeremymanson at google.com Tue Nov 4 19:58:26 2014 From: jeremymanson at google.com (Jeremy Manson) Date: Tue, 4 Nov 2014 11:58:26 -0800 Subject: RFR 8062116: JVMTI GetClassMethods is Slow In-Reply-To: <5457E36A.3020800@oracle.com> References: <5457E36A.3020800@oracle.com> Message-ID: Thanks for taking a look, Coleen! On Mon, Nov 3, 2014 at 12:19 PM, Coleen Phillimore < coleen.phillimore at oracle.com> wrote: > > Hi Jeremy, > > I reviewed your new code and it looks fine. I had one comment in > > http://cr.openjdk.java.net/~jmanson/8062116/webrev.00/src/ > share/vm/prims/jvmtiEnv.cpp.udiff.html > > The name "need_to_resolve" doesn't make sense when reading this code. > Isn't it more like "need_to_ensure_space" ? 
I think method resolution with > the other name, which it doesn't do. > Hmmm... it is there to tell you that there are jmethodids for that class that haven't been instantiated. Is it all right if I change it to "jmethodids_found" (and reverse the sense of it)? > I was trying to find a way to make this new code not appear twice (maybe > with a local jvmtiEnv function get_jmethodID(m) - instanceK_h is > m->method_holder()). > You know, I initially did that, but this file is parsed with some weird XSL setup that doesn't allow methods other than the ones that map directly to the JVMTI calls. Also, I was trying to figure out if the new class in utilities called > chunkedList.hpp could be used to store jmethodIDs, since the data > structures are similar. There is still more things in JNIMethodBlock has > to do so I think a specialized structure is still needed (which is why I > originally wrote it to be very simple). I'm not sure if the comment above > it still applies. Maybe only the first and third sentences. Can you > rewrite the comment slightly? > chunkedList wouldn't work as is, because it doesn't let you parameterize the bucket size, but it could probably be made to work (in the same way I made this one work). It's also an oddly bare-bones class - I'm not sure why it doesn't have contains and insert methods and so on. I'm not in love with the idea of doing it, because a) it would complicate my backport and b) I don't really have a lot of time to do hotspot refactoring, but if you think it should happen, I can make it happen (perhaps not in a timely way :) ). As for the comment, I'll eliminate all but the first and third sentences. > Your other comments in the changes are good. > > I can't completely answer your question about reusing free_methods - but > if a jmethodID is created provisionally in InstanceKlass::get_jmethod_id > and not needed because it loses the race in the method id cache, it's never > handed back to native code, so it's safe to reuse. 
This is different than > jmethodIDs for methods that are unloaded. They are cleared and never > reused. At least that's my reading of this caching code but it's pretty > complicated stuff. > Ah, I see. Thanks. > I've also run our nsk and jck vm/jvmti on this change and they all > passed. I'd be happy to sponsor it with these suggested changes and it > needs another reviewer. > I've cc'd Chuck Rasbold, who has already reviewed it internally and given it the thumbs-up. I'm sure he would be happy to do so publicly, too. Thanks for diagnosing and fixing this problem! Happy to do it! And so are the programs that use my JVMTI! Jeremy From jeremymanson at google.com Tue Nov 4 19:59:33 2014 From: jeremymanson at google.com (Jeremy Manson) Date: Tue, 4 Nov 2014 11:59:33 -0800 Subject: RFR 8062116: JVMTI GetClassMethods is Slow In-Reply-To: <54592FC2.7090406@oracle.com> References: <5457E36A.3020800@oracle.com> <54592FC2.7090406@oracle.com> Message-ID: Weird coincidence. On Tue, Nov 4, 2014 at 11:57 AM, serguei.spitsyn at oracle.com < serguei.spitsyn at oracle.com> wrote: > Hi Jeremy and Coleen, > > I'm reviewing this too. > We also need to run the nsk.jvmti, nsk.jdi and jtreg jdi tests. > > Thanks, > Serguei > > On 11/3/14 12:19 PM, Coleen Phillimore wrote: > >> >> Hi Jeremy, >> >> I reviewed your new code and it looks fine. I had one comment in >> >> http://cr.openjdk.java.net/~jmanson/8062116/webrev.00/src/ >> share/vm/prims/jvmtiEnv.cpp.udiff.html >> >> The name "need_to_resolve" doesn't make sense when reading this code. >> Isn't it more like "need_to_ensure_space" ? I think method resolution with >> the other name, which it doesn't do. >> >> I was trying to find a way to make this new code not appear twice (maybe >> with a local jvmtiEnv function get_jmethodID(m) - instanceK_h is >> m->method_holder()). >> > > Agreed on the above. > Per my message to Coleen, you can't add methods to this file. All other possibilities seemed like overkill, but other suggestions welcome. 
Jeremy From coleen.phillimore at oracle.com Tue Nov 4 20:40:33 2014 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Tue, 04 Nov 2014 15:40:33 -0500 Subject: RFR 8062116: JVMTI GetClassMethods is Slow In-Reply-To: References: <5457E36A.3020800@oracle.com> Message-ID: <545939C1.6040703@oracle.com> Hi Jeremy, Having Chuck reply publicly to the review would be good. We miss seeing his emails :) On 11/04/2014 02:58 PM, Jeremy Manson wrote: > > Thanks for taking a look, Coleen! > > On Mon, Nov 3, 2014 at 12:19 PM, Coleen Phillimore > > > wrote: > > > Hi Jeremy, > > I reviewed your new code and it looks fine. I had one comment in > > http://cr.openjdk.java.net/~jmanson/8062116/webrev.00/src/share/vm/prims/jvmtiEnv.cpp.udiff.html > > > The name "need_to_resolve" doesn't make sense when reading this > code. Isn't it more like "need_to_ensure_space" ? I think method > resolution with the other name, which it doesn't do. > > > Hmmm... it is there to tell you that there are jmethodids for that > class that haven't been instantiated. Is it all right if I change it > to "jmethodids_found" (and reverse the sense of it)? Okay, yes jmethodids_found makes more sense to me in this context. > I was trying to find a way to make this new code not appear twice > (maybe with a local jvmtiEnv function get_jmethodID(m) - > instanceK_h is m->method_holder()). > > > You know, I initially did that, but this file is parsed with some > weird XSL setup that doesn't allow methods other than the ones that > map directly to the JVMTI calls. Oh, yes. You are right. The code is fine then. It's not too much duplicated. > > Also, I was trying to figure out if the new class in utilities > called chunkedList.hpp could be used to store jmethodIDs, since > the data structures are similar. There is still more things in > JNIMethodBlock has to do so I think a specialized structure is > still needed (which is why I originally wrote it to be very > simple). 
I'm not sure if the comment above it still applies. > Maybe only the first and third sentences. Can you rewrite the > comment slightly? > > > chunkedList wouldn't work as is, because it doesn't let you > parameterize the bucket size, but it could probably be made to work > (in the same way I made this one work). It's also an oddly bare-bones > class - I'm not sure why it doesn't have contains and insert methods > and so on. > > I'm not in love with the idea of doing it, because a) it would > complicate my backport and b) I don't really have a lot of time to do > hotspot refactoring, but if you think it should happen, I can make it > happen (perhaps not in a timely way :) ). > No, I don't think you should do this. It was a general comment that this utility class is available for such things but has only one use so far. > As for the comment, I'll eliminate all but the first and third sentences. Thanks! > Your other comments in the changes are good. > > I can't completely answer your question about reusing free_methods > - but if a jmethodID is created provisionally in > InstanceKlass::get_jmethod_id and not needed because it loses the > race in the method id cache, it's never handed back to native > code, so it's safe to reuse. This is different than jmethodIDs > for methods that are unloaded. They are cleared and never reused. > At least that's my reading of this caching code but it's pretty > complicated stuff. > > > Ah, I see. Thanks. > > I've also run our nsk and jck vm/jvmti on this change and they all > passed. I'd be happy to sponsor it with these suggested changes > and it needs another reviewer. > > > I've cc'd Chuck Rasbold, who has already reviewed it internally and > given it the thumbs-up. I'm sure he would be happy to do so publicly, > too. > > Thanks for diagnosing and fixing this problem! > > > Happy to do it! And so are the programs that use my JVMTI! > Thank you! 
If you commit and send me the result of hg export your changeset, then I'll get your comments also and won't get the chance to mess up and not use commit -u jmanson. Coleen > Jeremy > From coleen.phillimore at oracle.com Tue Nov 4 20:43:02 2014 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Tue, 04 Nov 2014 15:43:02 -0500 Subject: RFR 8062116: JVMTI GetClassMethods is Slow In-Reply-To: <54592FC2.7090406@oracle.com> References: <5457E36A.3020800@oracle.com> <54592FC2.7090406@oracle.com> Message-ID: <54593A56.4050603@oracle.com> On 11/04/2014 02:57 PM, serguei.spitsyn at oracle.com wrote: > Hi Jeremy and Coleen, > > I'm reviewing this too. > We also need to run the nsk.jvmti, nsk.jdi and jtreg jdi tests. Hi Serguei, I ran all of vm.quick.testlist on this which includes jvmti, jdi tests. I'll run jtreg jdi tests too (where are they?) Thanks, Coleen > > Thanks, > Serguei > > On 11/3/14 12:19 PM, Coleen Phillimore wrote: >> >> Hi Jeremy, >> >> I reviewed your new code and it looks fine. I had one comment in >> >> http://cr.openjdk.java.net/~jmanson/8062116/webrev.00/src/share/vm/prims/jvmtiEnv.cpp.udiff.html >> >> >> The name "need_to_resolve" doesn't make sense when reading this >> code. Isn't it more like "need_to_ensure_space" ? I think method >> resolution with the other name, which it doesn't do. >> >> I was trying to find a way to make this new code not appear twice >> (maybe with a local jvmtiEnv function get_jmethodID(m) - instanceK_h >> is m->method_holder()). > > Agreed on the above. > >> >> Also, I was trying to figure out if the new class in utilities called >> chunkedList.hpp could be used to store jmethodIDs, since the data >> structures are similar. There is still more things in JNIMethodBlock >> has to do so I think a specialized structure is still needed (which >> is why I originally wrote it to be very simple). I'm not sure if the >> comment above it still applies. Maybe only the first and third >> sentences. 
Can you rewrite the comment slightly? >> >> Your other comments in the changes are good. >> >> I can't completely answer your question about reusing free_methods - >> but if a jmethodID is created provisionally in >> InstanceKlass::get_jmethod_id and not needed because it loses the >> race in the method id cache, it's never handed back to native code, >> so it's safe to reuse. This is different than jmethodIDs for methods >> that are unloaded. They are cleared and never reused. At least >> that's my reading of this caching code but it's pretty complicated >> stuff. >> >> I've also run our nsk and jck vm/jvmti on this change and they all >> passed. I'd be happy to sponsor it with these suggested changes and >> it needs another reviewer. >> >> Thanks for diagnosing and fixing this problem! >> Coleen >> >> >> On 10/30/2014 01:02 PM, Jeremy Manson wrote: >>> There's a significant regression in the speed of JVMTI >>> GetClassMethods in >>> JDK8. I've tracked this down to allocation of jmethodids in a tight >>> loop. >>> The issue can be addressed by preallocating enough space for all of the >>> jmethodids when starting the operation and not iterating over all of >>> the >>> existing jmethodids when you allocate a new one. >>> >>> A patch is here: >>> >>> http://cr.openjdk.java.net/~jmanson/8062116/webrev.00/ >>> >>> A reproducible test case can be found here: >>> >>> http://cr.openjdk.java.net/~jmanson/8062116/repro/ >>> >>> It's a benchmark, though: I have no idea how to turn it into a test. >>> >>> For whoever reviews it: can you explain to me why it is okay that >>> this code >>> reuses jmethodIDs (in JNIMethodBlock::add_method? I can imagine a >>> lot of >>> problems stemming from accidental reuse. 
>>> >>> Jeremy >> > From david.r.chase at oracle.com Tue Nov 4 20:54:03 2014 From: david.r.chase at oracle.com (David Chase) Date: Tue, 4 Nov 2014 15:54:03 -0500 Subject: [9] RFR(L) 8013267 : move MemberNameTable from native code to Java heap, use to intern MemberNames In-Reply-To: <5459034E.8070809@gmail.com> References: <594FE416-D445-426E-B7A9-C75F012ADE16@oracle.com> <0BDCD5DD-CBFB-4A3D-9BAD-7F9912E9237C@oracle.com> <5453C230.8010709@oracle.com> <9747A3A3-B7F4-407A-8E00-1E647D9DC2D1@oracle.com> <1D195ED3-8BD2-46CB-9D70-29CF809D9F5A@oracle.com> <5456FB59.60905@oracle.com> <632A5C98-B386-4625-BE12-355241581955@oracle.com> <5457AA75.8090103@gmail.com> <5457E0F9.8090004@gmail.com> <5458A57C.4060208@gmail.com> <260D49F5-6380-4FC3-A900-6CD9AB3ED6F7@oracle.com> <5459034E.8070809@gmail.com> Message-ID: I'm working on the initial benchmarking, and so far this arrangement (with synchronization and binary search for lookup, lots of barriers and linear cost insertion) has not yet been any slower. I am nonetheless tempted by the 2-tables solution, because I think the simpler JVM-side interface that it allows is desirable. David On 2014-11-04, at 11:48 AM, Peter Levart wrote: > On 11/04/2014 04:19 PM, David Chase wrote: >> On 2014-11-04, at 5:07 AM, Peter Levart wrote: >>> Are you thinking of an IdentityHashMap type of hash table (no linked-list of elements for same bucket, just search for 1st free slot on insert)? The problem would be how to pre-size the array. Count declared members? >> It can't be an identityHashMap, because we are interning member names. > > I know it can't be IdentityHashMap - I just wondered if you were thinking of an IdentityHashMap-like data structure in contrast to standard HashMap-like. Not in terms of equality/hashCode used, but in terms of internal data structure. IdentityHashMap is just an array of elements (well pairs of them - key, value are placed in two consecutive array slots).
Lookup searches for element linearly in the array starting from hashCode based index to the element if found or 1st empty array slot. It's very easy to implement if the only operations are get() and put() and could be used for interning and as a shared structure for VM to scan, but array has to be sized to at least 3/2 the number of elements for performance to not degrade. > >> In spite of my grumbling about benchmarking, I'm inclined to do that and try a couple of experiments. >> One possibility would be to use two data structures, one for interning, the other for communication with the VM. >> Because there's no lookup in the VM data structure it can just be an array that gets elements appended, >> and the synchronization dance is much simpler. >> >> For interning, maybe I use a ConcurrentHashMap, and I try the following idiom: >> >> mn = resolve(args) >> // deal with any errors >> mn' = chm.get(mn) >> if (mn' != null) return mn' // hoped-for-common-case >> >> synchronized (something) { >> mn' = chm.get(mn) >> if (mn' != null) return mn' >> txn_class = mn.getDeclaringClass() >> >> while (true) { >> redef_count = txn_class.redefCount() >> mn = resolve(args) >> >> shared_array.add(mn); >> // barrier, because we are a paranoid >> if (redef_count == txn_class.redefCount()) { >> chm.add(mn); // safe to publish to other Java threads. >> return mn; >> } >> shared_array.drop_last(); // Try again >> } >> } >> >> (Idiom gets slightly revised for the one or two other intern use cases, but this is the basic idea). > > Yes, that's similar to what I suggested by using a linked-list of MemberName(s) instead of the "shared_array" (easier to reason about ordering of writes) and a sorted array of MemberName(s) instead of the "chm" in your scheme above. ConcurrentHashMap would certainly be the most performant solution in terms of lookup/insertion-time and concurrent throughput, but it will use more heap than a simple packed array of MemberNames.
CHM is much better now in JDK8 though regarding heap use. > > A combination of the two approaches is also possible: > > - instead of maintaining a "shared_array" of MemberName(s), have them form a linked-list (you trade a slot in array for 'next' pointer in MemberName) > - use ConcurrentHashMap for interning. > > Regards, Peter > >> >> David >> >>>> And another way to view this is that we're now quibbling about performance, when we still >>>> have an existing correctness problem that this patch solves, so maybe we should just get this >>>> done and then file an RFE. >>> Perhaps, yes. But note that questions about JMM and ordering of writes to array elements are about correctness, not performance. >>> >>> Regards, Peter >>> >>>> David From jeremymanson at google.com Tue Nov 4 21:05:37 2014 From: jeremymanson at google.com (Jeremy Manson) Date: Tue, 4 Nov 2014 13:05:37 -0800 Subject: RFR 8062116: JVMTI GetClassMethods is Slow In-Reply-To: <54593A56.4050603@oracle.com> References: <5457E36A.3020800@oracle.com> <54592FC2.7090406@oracle.com> <54593A56.4050603@oracle.com> Message-ID: FWIW, all of the JDK8 jtreg tests passed. On Tue, Nov 4, 2014 at 12:43 PM, Coleen Phillimore < coleen.phillimore at oracle.com> wrote: > > On 11/04/2014 02:57 PM, serguei.spitsyn at oracle.com wrote: > >> Hi Jeremy and Coleen, >> >> I'm reviewing this too. >> We also need to run the nsk.jvmti, nsk.jdi and jtreg jdi tests. >> > > Hi Serguei, I ran all of vm.quick.testlist on this which includes jvmti, > jdi tests. I'll run jtreg jdi tests too (where are they?) > > Thanks, > Coleen > > > >> Thanks, >> Serguei >> >> On 11/3/14 12:19 PM, Coleen Phillimore wrote: >> >>> >>> Hi Jeremy, >>> >>> I reviewed your new code and it looks fine. I had one comment in >>> >>> http://cr.openjdk.java.net/~jmanson/8062116/webrev.00/src/ >>> share/vm/prims/jvmtiEnv.cpp.udiff.html >>> >>> The name "need_to_resolve" doesn't make sense when reading this code. >>> Isn't it more like "need_to_ensure_space" ?
I think method resolution with >>> the other name, which it doesn't do. >>> >>> I was trying to find a way to make this new code not appear twice (maybe >>> with a local jvmtiEnv function get_jmethodID(m) - instanceK_h is >>> m->method_holder()). >>> >> >> Agreed on the above. >> >> >>> Also, I was trying to figure out if the new class in utilities called >>> chunkedList.hpp could be used to store jmethodIDs, since the data >>> structures are similar. There is still more things in JNIMethodBlock has >>> to do so I think a specialized structure is still needed (which is why I >>> originally wrote it to be very simple). I'm not sure if the comment above >>> it still applies. Maybe only the first and third sentences. Can you >>> rewrite the comment slightly? >>> >>> Your other comments in the changes are good. >>> >>> I can't completely answer your question about reusing free_methods - but >>> if a jmethodID is created provisionally in InstanceKlass::get_jmethod_id >>> and not needed because it loses the race in the method id cache, it's never >>> handed back to native code, so it's safe to reuse. This is different than >>> jmethodIDs for methods that are unloaded. They are cleared and never >>> reused. At least that's my reading of this caching code but it's pretty >>> complicated stuff. >>> >>> I've also run our nsk and jck vm/jvmti on this change and they all >>> passed. I'd be happy to sponsor it with these suggested changes and it >>> needs another reviewer. >>> >>> Thanks for diagnosing and fixing this problem! >>> Coleen >>> >>> >>> On 10/30/2014 01:02 PM, Jeremy Manson wrote: >>> >>>> There's a significant regression in the speed of JVMTI GetClassMethods >>>> in >>>> JDK8. I've tracked this down to allocation of jmethodids in a tight >>>> loop. >>>> The issue can be addressed by preallocating enough space for all of the >>>> jmethodids when starting the operation and not iterating over all of the >>>> existing jmethodids when you allocate a new one. 
>>>> >>>> A patch is here: >>>> >>>> http://cr.openjdk.java.net/~jmanson/8062116/webrev.00/ >>>> >>>> A reproducible test case can be found here: >>>> >>>> http://cr.openjdk.java.net/~jmanson/8062116/repro/ >>>> >>>> It's a benchmark, though: I have no idea how to turn it into a test. >>>> >>>> For whoever reviews it: can you explain to me why it is okay that this >>>> code >>>> reuses jmethodIDs (in JNIMethodBlock::add_method? I can imagine a lot >>>> of >>>> problems stemming from accidental reuse. >>>> >>>> Jeremy >>>> >>> >>> >> > From serguei.spitsyn at oracle.com Tue Nov 4 21:07:55 2014 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 04 Nov 2014 13:07:55 -0800 Subject: RFR 8062116: JVMTI GetClassMethods is Slow In-Reply-To: <54593A56.4050603@oracle.com> References: <5457E36A.3020800@oracle.com> <54592FC2.7090406@oracle.com> <54593A56.4050603@oracle.com> Message-ID: <5459402B.5030304@oracle.com> On 11/4/14 12:43 PM, Coleen Phillimore wrote: > > On 11/04/2014 02:57 PM, serguei.spitsyn at oracle.com wrote: >> Hi Jeremy and Coleen, >> >> I'm reviewing this too. >> We also need to run the nsk.jvmti, nsk.jdi and jtreg jdi tests. > > Hi Serguei, I ran all of vm.quick.testlist on this which includes > jvmti, jdi tests. I'll run jtreg jdi tests too (where are they?) Hi Coleen, It is more safe to run the nsk.jvmti.testlist and nsk.jdi.testlist instead of the vm.quick.testlist. The jtreg jdi tests are in the /jdk/test/com/sun/jdi folder. Thanks, Serguei > > Thanks, > Coleen > >> >> Thanks, >> Serguei >> >> On 11/3/14 12:19 PM, Coleen Phillimore wrote: >>> >>> Hi Jeremy, >>> >>> I reviewed your new code and it looks fine. I had one comment in >>> >>> http://cr.openjdk.java.net/~jmanson/8062116/webrev.00/src/share/vm/prims/jvmtiEnv.cpp.udiff.html >>> >>> >>> The name "need_to_resolve" doesn't make sense when reading this >>> code. Isn't it more like "need_to_ensure_space" ? 
I think method >>> resolution with the other name, which it doesn't do. >>> >>> I was trying to find a way to make this new code not appear twice >>> (maybe with a local jvmtiEnv function get_jmethodID(m) - instanceK_h >>> is m->method_holder()). >> >> Agreed on the above. >> >>> >>> Also, I was trying to figure out if the new class in utilities >>> called chunkedList.hpp could be used to store jmethodIDs, since the >>> data structures are similar. There is still more things in >>> JNIMethodBlock has to do so I think a specialized structure is still >>> needed (which is why I originally wrote it to be very simple). I'm >>> not sure if the comment above it still applies. Maybe only the first >>> and third sentences. Can you rewrite the comment slightly? >>> >>> Your other comments in the changes are good. >>> >>> I can't completely answer your question about reusing free_methods - >>> but if a jmethodID is created provisionally in >>> InstanceKlass::get_jmethod_id and not needed because it loses the >>> race in the method id cache, it's never handed back to native code, >>> so it's safe to reuse. This is different than jmethodIDs for >>> methods that are unloaded. They are cleared and never reused. At >>> least that's my reading of this caching code but it's pretty >>> complicated stuff. >>> >>> I've also run our nsk and jck vm/jvmti on this change and they all >>> passed. I'd be happy to sponsor it with these suggested changes and >>> it needs another reviewer. >>> >>> Thanks for diagnosing and fixing this problem! >>> Coleen >>> >>> >>> On 10/30/2014 01:02 PM, Jeremy Manson wrote: >>>> There's a significant regression in the speed of JVMTI >>>> GetClassMethods in >>>> JDK8. I've tracked this down to allocation of jmethodids in a tight >>>> loop. >>>> The issue can be addressed by preallocating enough space for all of >>>> the >>>> jmethodids when starting the operation and not iterating over all >>>> of the >>>> existing jmethodids when you allocate a new one. 
>>>> >>>> A patch is here: >>>> >>>> http://cr.openjdk.java.net/~jmanson/8062116/webrev.00/ >>>> >>>> A reproducible test case can be found here: >>>> >>>> http://cr.openjdk.java.net/~jmanson/8062116/repro/ >>>> >>>> It's a benchmark, though: I have no idea how to turn it into a test. >>>> >>>> For whoever reviews it: can you explain to me why it is okay that >>>> this code >>>> reuses jmethodIDs (in JNIMethodBlock::add_method? I can imagine a >>>> lot of >>>> problems stemming from accidental reuse. >>>> >>>> Jeremy >>> >> > From rasbold at google.com Tue Nov 4 21:11:32 2014 From: rasbold at google.com (Chuck Rasbold) Date: Tue, 4 Nov 2014 13:11:32 -0800 Subject: RFR 8062116: JVMTI GetClassMethods is Slow In-Reply-To: References: <5457E36A.3020800@oracle.com> <54592FC2.7090406@oracle.com> <54593A56.4050603@oracle.com> Message-ID: Jeremy's webrev looks good to me. -- Chuck On Tue, Nov 4, 2014 at 1:05 PM, Jeremy Manson wrote: > FWIW, all of the JDK8 jtreg tests passed. > > On Tue, Nov 4, 2014 at 12:43 PM, Coleen Phillimore < > coleen.phillimore at oracle.com> wrote: > >> >> On 11/04/2014 02:57 PM, serguei.spitsyn at oracle.com wrote: >> >>> Hi Jeremy and Coleen, >>> >>> I'm reviewing this too. >>> We also need to run the nsk.jvmti, nsk.jdi and jtreg jdi tests. >>> >> >> Hi Serguei, I ran all of vm.quick.testlist on this which includes jvmti, >> jdi tests. I'll run jtreg jdi tests too (where are they?) >> >> Thanks, >> Coleen >> >> >> >>> Thanks, >>> Serguei >>> >>> On 11/3/14 12:19 PM, Coleen Phillimore wrote: >>> >>>> >>>> Hi Jeremy, >>>> >>>> I reviewed your new code and it looks fine. I had one comment in >>>> >>>> http://cr.openjdk.java.net/~jmanson/8062116/webrev.00/src/ >>>> share/vm/prims/jvmtiEnv.cpp.udiff.html >>>> >>>> The name "need_to_resolve" doesn't make sense when reading this code. >>>> Isn't it more like "need_to_ensure_space" ? I think method resolution with >>>> the other name, which it doesn't do. 
>>>> >>>> I was trying to find a way to make this new code not appear twice >>>> (maybe with a local jvmtiEnv function get_jmethodID(m) - instanceK_h is >>>> m->method_holder()). >>>> >>> >>> Agreed on the above. >>> >>> >>>> Also, I was trying to figure out if the new class in utilities called >>>> chunkedList.hpp could be used to store jmethodIDs, since the data >>>> structures are similar. There is still more things in JNIMethodBlock has >>>> to do so I think a specialized structure is still needed (which is why I >>>> originally wrote it to be very simple). I'm not sure if the comment above >>>> it still applies. Maybe only the first and third sentences. Can you >>>> rewrite the comment slightly? >>>> >>>> Your other comments in the changes are good. >>>> >>>> I can't completely answer your question about reusing free_methods - >>>> but if a jmethodID is created provisionally in >>>> InstanceKlass::get_jmethod_id and not needed because it loses the race in >>>> the method id cache, it's never handed back to native code, so it's safe to >>>> reuse. This is different than jmethodIDs for methods that are unloaded. >>>> They are cleared and never reused. At least that's my reading of this >>>> caching code but it's pretty complicated stuff. >>>> >>>> I've also run our nsk and jck vm/jvmti on this change and they all >>>> passed. I'd be happy to sponsor it with these suggested changes and it >>>> needs another reviewer. >>>> >>>> Thanks for diagnosing and fixing this problem! >>>> Coleen >>>> >>>> >>>> On 10/30/2014 01:02 PM, Jeremy Manson wrote: >>>> >>>>> There's a significant regression in the speed of JVMTI GetClassMethods >>>>> in >>>>> JDK8. I've tracked this down to allocation of jmethodids in a tight >>>>> loop. >>>>> The issue can be addressed by preallocating enough space for all of the >>>>> jmethodids when starting the operation and not iterating over all of >>>>> the >>>>> existing jmethodids when you allocate a new one. 
>>>>> >>>>> A patch is here: >>>>> >>>>> http://cr.openjdk.java.net/~jmanson/8062116/webrev.00/ >>>>> >>>>> A reproducible test case can be found here: >>>>> >>>>> http://cr.openjdk.java.net/~jmanson/8062116/repro/ >>>>> >>>>> It's a benchmark, though: I have no idea how to turn it into a test. >>>>> >>>>> For whoever reviews it: can you explain to me why it is okay that this >>>>> code >>>>> reuses jmethodIDs (in JNIMethodBlock::add_method? I can imagine a lot >>>>> of >>>>> problems stemming from accidental reuse. >>>>> >>>>> Jeremy >>>>> >>>> >>>> >>> >> > From serguei.spitsyn at oracle.com Tue Nov 4 22:15:56 2014 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 04 Nov 2014 14:15:56 -0800 Subject: RFR 8062116: JVMTI GetClassMethods is Slow In-Reply-To: <54592FC2.7090406@oracle.com> References: <5457E36A.3020800@oracle.com> <54592FC2.7090406@oracle.com> Message-ID: <5459501C.4040807@oracle.com> Jeremy and Coleen, Thank you for taking care about this bug! The fix looks good to me. I do not see any issues. Coleen, Please, let me know if you need any help with testing or anything else. Thanks, Serguei On 11/4/14 11:57 AM, serguei.spitsyn at oracle.com wrote: > Hi Jeremy and Coleen, > > I'm reviewing this too. > We also need to run the nsk.jvmti, nsk.jdi and jtreg jdi tests. > > Thanks, > Serguei > > On 11/3/14 12:19 PM, Coleen Phillimore wrote: >> >> Hi Jeremy, >> >> I reviewed your new code and it looks fine. I had one comment in >> >> http://cr.openjdk.java.net/~jmanson/8062116/webrev.00/src/share/vm/prims/jvmtiEnv.cpp.udiff.html >> >> >> The name "need_to_resolve" doesn't make sense when reading this >> code. Isn't it more like "need_to_ensure_space" ? I think method >> resolution with the other name, which it doesn't do. >> >> I was trying to find a way to make this new code not appear twice >> (maybe with a local jvmtiEnv function get_jmethodID(m) - instanceK_h >> is m->method_holder()). > > Agreed on the above. 
> >> >> Also, I was trying to figure out if the new class in utilities called >> chunkedList.hpp could be used to store jmethodIDs, since the data >> structures are similar. There is still more things in JNIMethodBlock >> has to do so I think a specialized structure is still needed (which >> is why I originally wrote it to be very simple). I'm not sure if the >> comment above it still applies. Maybe only the first and third >> sentences. Can you rewrite the comment slightly? >> >> Your other comments in the changes are good. >> >> I can't completely answer your question about reusing free_methods - >> but if a jmethodID is created provisionally in >> InstanceKlass::get_jmethod_id and not needed because it loses the >> race in the method id cache, it's never handed back to native code, >> so it's safe to reuse. This is different than jmethodIDs for methods >> that are unloaded. They are cleared and never reused. At least >> that's my reading of this caching code but it's pretty complicated >> stuff. >> >> I've also run our nsk and jck vm/jvmti on this change and they all >> passed. I'd be happy to sponsor it with these suggested changes and >> it needs another reviewer. >> >> Thanks for diagnosing and fixing this problem! >> Coleen >> >> >> On 10/30/2014 01:02 PM, Jeremy Manson wrote: >>> There's a significant regression in the speed of JVMTI >>> GetClassMethods in >>> JDK8. I've tracked this down to allocation of jmethodids in a tight >>> loop. >>> The issue can be addressed by preallocating enough space for all of the >>> jmethodids when starting the operation and not iterating over all of >>> the >>> existing jmethodids when you allocate a new one. >>> >>> A patch is here: >>> >>> http://cr.openjdk.java.net/~jmanson/8062116/webrev.00/ >>> >>> A reproducible test case can be found here: >>> >>> http://cr.openjdk.java.net/~jmanson/8062116/repro/ >>> >>> It's a benchmark, though: I have no idea how to turn it into a test. 
>>> >>> For whoever reviews it: can you explain to me why it is okay that >>> this code >>> reuses jmethodIDs (in JNIMethodBlock::add_method? I can imagine a >>> lot of >>> problems stemming from accidental reuse. >>> >>> Jeremy >> > From jeremymanson at google.com Wed Nov 5 01:52:50 2014 From: jeremymanson at google.com (Jeremy Manson) Date: Tue, 4 Nov 2014 17:52:50 -0800 Subject: RFR 8062116: JVMTI GetClassMethods is Slow In-Reply-To: <5459501C.4040807@oracle.com> References: <5457E36A.3020800@oracle.com> <54592FC2.7090406@oracle.com> <5459501C.4040807@oracle.com> Message-ID: Updated patch here: http://cr.openjdk.java.net/~jmanson/8062116/webrev.01/ Jeremy On Tue, Nov 4, 2014 at 2:15 PM, serguei.spitsyn at oracle.com < serguei.spitsyn at oracle.com> wrote: > Jeremy and Coleen, > > Thank you for taking care about this bug! > > The fix looks good to me. > I do not see any issues. > > Coleen, > > Please, let me know if you need any help with testing or anything else. > > Thanks, > Serguei > > > On 11/4/14 11:57 AM, serguei.spitsyn at oracle.com wrote: > >> Hi Jeremy and Coleen, >> >> I'm reviewing this too. >> We also need to run the nsk.jvmti, nsk.jdi and jtreg jdi tests. >> >> Thanks, >> Serguei >> >> On 11/3/14 12:19 PM, Coleen Phillimore wrote: >> >>> >>> Hi Jeremy, >>> >>> I reviewed your new code and it looks fine. I had one comment in >>> >>> http://cr.openjdk.java.net/~jmanson/8062116/webrev.00/src/ >>> share/vm/prims/jvmtiEnv.cpp.udiff.html >>> >>> The name "need_to_resolve" doesn't make sense when reading this code. >>> Isn't it more like "need_to_ensure_space" ? I think method resolution with >>> the other name, which it doesn't do. >>> >>> I was trying to find a way to make this new code not appear twice (maybe >>> with a local jvmtiEnv function get_jmethodID(m) - instanceK_h is >>> m->method_holder()). >>> >> >> Agreed on the above. 
>> >> >>> Also, I was trying to figure out if the new class in utilities called >>> chunkedList.hpp could be used to store jmethodIDs, since the data >>> structures are similar. There is still more things in JNIMethodBlock has >>> to do so I think a specialized structure is still needed (which is why I >>> originally wrote it to be very simple). I'm not sure if the comment above >>> it still applies. Maybe only the first and third sentences. Can you >>> rewrite the comment slightly? >>> >>> Your other comments in the changes are good. >>> >>> I can't completely answer your question about reusing free_methods - but >>> if a jmethodID is created provisionally in InstanceKlass::get_jmethod_id >>> and not needed because it loses the race in the method id cache, it's never >>> handed back to native code, so it's safe to reuse. This is different than >>> jmethodIDs for methods that are unloaded. They are cleared and never >>> reused. At least that's my reading of this caching code but it's pretty >>> complicated stuff. >>> >>> I've also run our nsk and jck vm/jvmti on this change and they all >>> passed. I'd be happy to sponsor it with these suggested changes and it >>> needs another reviewer. >>> >>> Thanks for diagnosing and fixing this problem! >>> Coleen >>> >>> >>> On 10/30/2014 01:02 PM, Jeremy Manson wrote: >>> >>>> There's a significant regression in the speed of JVMTI GetClassMethods >>>> in >>>> JDK8. I've tracked this down to allocation of jmethodids in a tight >>>> loop. >>>> The issue can be addressed by preallocating enough space for all of the >>>> jmethodids when starting the operation and not iterating over all of the >>>> existing jmethodids when you allocate a new one. >>>> >>>> A patch is here: >>>> >>>> http://cr.openjdk.java.net/~jmanson/8062116/webrev.00/ >>>> >>>> A reproducible test case can be found here: >>>> >>>> http://cr.openjdk.java.net/~jmanson/8062116/repro/ >>>> >>>> It's a benchmark, though: I have no idea how to turn it into a test. 
>>>> >>>> For whoever reviews it: can you explain to me why it is okay that this >>>> code >>>> reuses jmethodIDs (in JNIMethodBlock::add_method? I can imagine a lot >>>> of >>>> problems stemming from accidental reuse. >>>> >>>> Jeremy >>>> >>> >>> >> > From daniel.daugherty at oracle.com Wed Nov 5 04:34:53 2014 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Tue, 04 Nov 2014 21:34:53 -0700 Subject: RFR(S) Contended Locking cleanup bucket (8062851) Message-ID: <5459A8ED.8060808@oracle.com> Greetings, I have a Contended Locking cleanup bucket fix ready for review. This fix was spun off from the Contended Locking fast enter bucket which was sent out for review late last week. This fix cleans up the computation of ObjectMonitor field pointers and gets rid of the use of literal '-2' in appropriate places. For example: - ld_ptr(Rmark, ObjectMonitor::owner_offset_in_bytes() - 2, Rscratch); + ld_ptr(Rmark, OM_OFFSET_NO_MONITOR_VALUE(owner), Rscratch); The OM_OFFSET_NO_MONITOR_VALUE macro computes the offset to the specified field and subtracts markOopDesc:monitor_value (2). There's a nice comment in src/share/vm/runtime/objectMonitor.hpp. Thanks to David Holmes for his comments on JDK-8061553 that motivated this (long overdue) cleanup. This work is being tracked by the following bug ID: JDK-8062851 cleanup ObjectMonitor offset adjustments https://bugs.openjdk.java.net/browse/JDK-8062851 Here is the webrev URL: http://cr.openjdk.java.net/~dcubed/8062851-webrev/0-jdk9-hs-rt/ Here is the JEP link: https://bugs.openjdk.java.net/browse/JDK-8046133 Testing: - JPRT test jobs (since this is only syntax and comment cleanup) Thanks, in advance, for any comments, questions or suggestions. 
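For readers following the offset arithmetic described above, here is a minimal sketch of the idea, not the HotSpot code itself. It assumes only what the message states: the macro takes the offset of a field and subtracts the monitor tag value (2), so that a field can be addressed directly from a tagged mark word. The names DemoMonitor, kMonitorValue, DEMO_OFFSET_NO_MONITOR_VALUE and owner_from_mark are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the monitor tag value (2) mentioned in the message.
constexpr std::uintptr_t kMonitorValue = 2;

// Hypothetical, simplified layout; the real ObjectMonitor has many more fields.
struct DemoMonitor {
  void*         header;
  void*         object;
  void*         owner;
  std::intptr_t recursions;
};

// Field offset relative to a *tagged* monitor address: the plain field offset
// minus the tag, so the literal "- 2" never appears at the use site.
#define DEMO_OFFSET_NO_MONITOR_VALUE(f)                      \
  (static_cast<std::ptrdiff_t>(offsetof(DemoMonitor, f)) -   \
   static_cast<std::ptrdiff_t>(kMonitorValue))

// Reads the owner field starting from a tagged mark word (monitor address | 2).
inline void* owner_from_mark(std::uintptr_t mark) {
  std::uintptr_t addr = mark + DEMO_OFFSET_NO_MONITOR_VALUE(owner);
  return *reinterpret_cast<void* const*>(addr);
}
```

Because the monitor address is at least pointer-aligned, OR-ing in the tag is the same as adding 2, which is why subtracting the tag from the offset lands exactly on the field.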
Dan From serguei.spitsyn at oracle.com Wed Nov 5 04:56:27 2014 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 04 Nov 2014 20:56:27 -0800 Subject: RFR 8062116: JVMTI GetClassMethods is Slow In-Reply-To: References: <5457E36A.3020800@oracle.com> <54592FC2.7090406@oracle.com> <5459501C.4040807@oracle.com> Message-ID: <5459ADFB.4090808@oracle.com> The fix looks good in general. src/share/vm/oops/method.cpp 1785 bool contains(Method** m) { 1786 if (m == NULL) return false; 1787 for (JNIMethodBlockNode* b = &_head; b != NULL; b = b->_next) { 1788 if (b->_methods <= m && m < b->_methods + b->_number_of_methods) { 1789 ptrdiff_t idx = m - b->_methods; 1790 if (b->_methods + idx == m) { 1791 return true; 1792 } 1793 } 1794 } 1795 return false; // not found 1796 } Just noticed that the lines 1789-1792 can be replaced with one liner: return true; It is because the condition (b->_methods + idx == m) is always true. :) Also, should we check the condition: *m != _free_method ? What about the following ?: return (*m != _free_method); Thanks, Serguei On 11/4/14 5:52 PM, Jeremy Manson wrote: > Updated patch here: > > http://cr.openjdk.java.net/~jmanson/8062116/webrev.01/ > > > Jeremy > > On Tue, Nov 4, 2014 at 2:15 PM, serguei.spitsyn at oracle.com > > wrote: > > Jeremy and Coleen, > > Thank you for taking care about this bug! > > The fix looks good to me. > I do not see any issues. > > Coleen, > > Please, let me know if you need any help with testing or anything > else. > > Thanks, > Serguei > > > On 11/4/14 11:57 AM, serguei.spitsyn at oracle.com > wrote: > > Hi Jeremy and Coleen, > > I'm reviewing this too. > We also need to run the nsk.jvmti, nsk.jdi and jtreg jdi tests. > > Thanks, > Serguei > > On 11/3/14 12:19 PM, Coleen Phillimore wrote: > > > Hi Jeremy, > > I reviewed your new code and it looks fine.
I had one > comment in > > http://cr.openjdk.java.net/~jmanson/8062116/webrev.00/src/share/vm/prims/jvmtiEnv.cpp.udiff.html > > > > The name "need_to_resolve" doesn't make sense when reading > this code. Isn't it more like "need_to_ensure_space" ? I > think method resolution with the other name, which it > doesn't do. > > I was trying to find a way to make this new code not > appear twice (maybe with a local jvmtiEnv function > get_jmethodID(m) - instanceK_h is m->method_holder()). > > > Agreed on the above. > > > Also, I was trying to figure out if the new class in > utilities called chunkedList.hpp could be used to store > jmethodIDs, since the data structures are similar. There > is still more things in JNIMethodBlock has to do so I > think a specialized structure is still needed (which is > why I originally wrote it to be very simple). I'm not > sure if the comment above it still applies. Maybe only the > first and third sentences. Can you rewrite the comment > slightly? > > Your other comments in the changes are good. > > I can't completely answer your question about reusing > free_methods - but if a jmethodID is created provisionally > in InstanceKlass::get_jmethod_id and not needed because it > loses the race in the method id cache, it's never handed > back to native code, so it's safe to reuse. This is > different than jmethodIDs for methods that are unloaded. > They are cleared and never reused. At least that's my > reading of this caching code but it's pretty complicated > stuff. > > I've also run our nsk and jck vm/jvmti on this change and > they all passed. I'd be happy to sponsor it with these > suggested changes and it needs another reviewer. > > Thanks for diagnosing and fixing this problem! > Coleen > > > On 10/30/2014 01:02 PM, Jeremy Manson wrote: > > There's a significant regression in the speed of JVMTI > GetClassMethods in > JDK8. I've tracked this down to allocation of > jmethodids in a tight loop. 
> The issue can be addressed by preallocating enough > space for all of the > jmethodids when starting the operation and not > iterating over all of the > existing jmethodids when you allocate a new one. > > A patch is here: > > http://cr.openjdk.java.net/~jmanson/8062116/webrev.00/ > > > A reproducible test case can be found here: > > http://cr.openjdk.java.net/~jmanson/8062116/repro/ > > > It's a benchmark, though: I have no idea how to turn > it into a test. > > For whoever reviews it: can you explain to me why it > is okay that this code > reuses jmethodIDs (in JNIMethodBlock::add_method? I > can imagine a lot of > problems stemming from accidental reuse. > > Jeremy > > > > > From serguei.spitsyn at oracle.com Wed Nov 5 06:08:05 2014 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 04 Nov 2014 22:08:05 -0800 Subject: RFR 8062116: JVMTI GetClassMethods is Slow In-Reply-To: <5459ADFB.4090808@oracle.com> References: <5457E36A.3020800@oracle.com> <54592FC2.7090406@oracle.com> <5459501C.4040807@oracle.com> <5459ADFB.4090808@oracle.com> Message-ID: <5459BEC5.4090809@oracle.com> Got rid of the bold selection below to make it more readable. Thanks, Serguei On 11/4/14 8:56 PM, serguei.spitsyn at oracle.com wrote: > The fix looks good in general. > > src/share/vm/oops/method.cpp > 1785 bool contains(Method** m) { > 1786 if (m == NULL) return false; > 1787 for (JNIMethodBlockNode* b = &_head; b != NULL; b = b->_next) { > 1788 if (b->_methods <= m && m < b->_methods + b->_number_of_methods) { > 1789 ptrdiff_t idx = m - b->_methods; > 1790 if (b->_methods + idx == m) { > 1791 return true; > 1792 } > 1793 } > 1794 } > 1795 return false; // not found > 1796 } > > Just noticed that the lines 1789-1792 can be replaced with one liner: > **return true; > > It is because the condition (b->_methods + idx == m) is always true. > :) > > Also, should we check the condition: *m != _free_method? 
> What about the following ?: > **return (*m != _free_method); > > > Thanks, > Serguei > > > On 11/4/14 5:52 PM, Jeremy Manson wrote: >> Updated patch here: >> >> http://cr.openjdk.java.net/~jmanson/8062116/webrev.01/ >> >> >> Jeremy >> >> On Tue, Nov 4, 2014 at 2:15 PM, serguei.spitsyn at oracle.com >> > > wrote: >> >> Jeremy and Coleen, >> >> Thank you for taking care about this bug! >> >> The fix looks good to me. >> I do not see any issues. >> >> Coleen, >> >> Please, let me know if you need any help with testing or anything >> else. >> >> Thanks, >> Serguei >> >> >> On 11/4/14 11:57 AM, serguei.spitsyn at oracle.com >> wrote: >> >> Hi Jeremy and Coleen, >> >> I'm reviewing this too. >> We also need to run the nsk.jvmti, nsk.jdi and jtreg jdi tests. >> >> Thanks, >> Serguei >> >> On 11/3/14 12:19 PM, Coleen Phillimore wrote: >> >> >> Hi Jeremy, >> >> I reviewed your new code and it looks fine. I had one >> comment in >> >> http://cr.openjdk.java.net/~jmanson/8062116/webrev.00/src/share/vm/prims/jvmtiEnv.cpp.udiff.html >> >> >> >> The name "need_to_resolve" doesn't make sense when >> reading this code. Isn't it more like >> "need_to_ensure_space" ? I think method resolution with >> the other name, which it doesn't do. >> >> I was trying to find a way to make this new code not >> appear twice (maybe with a local jvmtiEnv function >> get_jmethodID(m) - instanceK_h is m->method_holder()). >> >> >> Agreed on the above. >> >> >> Also, I was trying to figure out if the new class in >> utilities called chunkedList.hpp could be used to store >> jmethodIDs, since the data structures are similar. There >> is still more things in JNIMethodBlock has to do so I >> think a specialized structure is still needed (which is >> why I originally wrote it to be very simple). I'm not >> sure if the comment above it still applies. Maybe only >> the first and third sentences. Can you rewrite the >> comment slightly? >> >> Your other comments in the changes are good. 
>> >> I can't completely answer your question about reusing >> free_methods - but if a jmethodID is created >> provisionally in InstanceKlass::get_jmethod_id and not >> needed because it loses the race in the method id cache, >> it's never handed back to native code, so it's safe to >> reuse. This is different than jmethodIDs for methods >> that are unloaded. They are cleared and never reused. >> At least that's my reading of this caching code but it's >> pretty complicated stuff. >> >> I've also run our nsk and jck vm/jvmti on this change and >> they all passed. I'd be happy to sponsor it with these >> suggested changes and it needs another reviewer. >> >> Thanks for diagnosing and fixing this problem! >> Coleen >> >> >> On 10/30/2014 01:02 PM, Jeremy Manson wrote: >> >> There's a significant regression in the speed of >> JVMTI GetClassMethods in >> JDK8. I've tracked this down to allocation of >> jmethodids in a tight loop. >> The issue can be addressed by preallocating enough >> space for all of the >> jmethodids when starting the operation and not >> iterating over all of the >> existing jmethodids when you allocate a new one. >> >> A patch is here: >> >> http://cr.openjdk.java.net/~jmanson/8062116/webrev.00/ >> >> A reproducible test case can be found here: >> >> http://cr.openjdk.java.net/~jmanson/8062116/repro/ >> >> >> It's a benchmark, though: I have no idea how to turn >> it into a test. >> >> For whoever reviews it: can you explain to me why it >> is okay that this code >> reuses jmethodIDs (in JNIMethodBlock::add_method? I >> can imagine a lot of >> problems stemming from accidental reuse. 
>> >> Jeremy >> >> >> >> >> > From david.holmes at oracle.com Wed Nov 5 10:42:25 2014 From: david.holmes at oracle.com (David Holmes) Date: Wed, 05 Nov 2014 20:42:25 +1000 Subject: RFR(S) Contended Locking cleanup bucket (8062851) In-Reply-To: <5459A8ED.8060808@oracle.com> References: <5459A8ED.8060808@oracle.com> Message-ID: <5459FF11.1080801@oracle.com> Hi Dan, Reviewed. I find the name OM_OFFSET_NO_MONITOR_VALUE somewhat awkward but have no better suggestion. In fact I have to ask what _is_ the object monitor tagging mechanism? I can't see it defined in the objectMonitor.* files. ?? Thanks, David On 5/11/2014 2:34 PM, Daniel D. Daugherty wrote: > Greetings, > > I have a Contended Locking cleanup bucket fix ready for review. > > This fix was spun off from the Contended Locking fast enter bucket > which was sent out for review late last week. This fix cleans up > the computation of ObjectMonitor field pointers and gets rid of > the use of literal '-2' in appropriate places. For example: > > - ld_ptr(Rmark, ObjectMonitor::owner_offset_in_bytes() - 2, > Rscratch); > + ld_ptr(Rmark, OM_OFFSET_NO_MONITOR_VALUE(owner), Rscratch); > > The OM_OFFSET_NO_MONITOR_VALUE macro computes the offset to the > specified field and subtracts markOopDesc:monitor_value (2). > There's a nice comment in src/share/vm/runtime/objectMonitor.hpp. > > Thanks to David Holmes for his comments on JDK-8061553 that > motivated this (long overdue) cleanup. > > This work is being tracked by the following bug ID: > > JDK-8062851 cleanup ObjectMonitor offset adjustments > https://bugs.openjdk.java.net/browse/JDK-8062851 > > Here is the webrev URL: > > http://cr.openjdk.java.net/~dcubed/8062851-webrev/0-jdk9-hs-rt/ > > Here is the JEP link: > > https://bugs.openjdk.java.net/browse/JDK-8046133 > > Testing: > > - JPRT test jobs (since this is only syntax and comment cleanup) > > Thanks, in advance, for any comments, questions or suggestions. 
> > Dan From christian.tornqvist at oracle.com Wed Nov 5 14:54:27 2014 From: christian.tornqvist at oracle.com (Christian Tornqvist) Date: Wed, 5 Nov 2014 09:54:27 -0500 Subject: RFR(XS): 8061733 - [TESTBUG] Exclude tests that have issues with Jigsaw M2 changes Message-ID: <013c01cff908$65d00560$31701020$@oracle.com> Hi everyone, Please review this small change that adds @ignore to one test that fails when running with the upcoming changes for Jigsaw M2. The affected test is not critical and will be fixed at a later time. Webrev: http://cr.openjdk.java.net/~ctornqvi/webrev/8061733/webrev.00/ Bug: https://bugs.openjdk.java.net/browse/JDK-8061733 Thanks, Christian From lois.foltan at oracle.com Wed Nov 5 15:01:12 2014 From: lois.foltan at oracle.com (Lois Foltan) Date: Wed, 05 Nov 2014 10:01:12 -0500 Subject: RFR(XS): 8061733 - [TESTBUG] Exclude tests that have issues with Jigsaw M2 changes In-Reply-To: <013c01cff908$65d00560$31701020$@oracle.com> References: <013c01cff908$65d00560$31701020$@oracle.com> Message-ID: <545A3BB8.7020400@oracle.com> Looks good. Lois On 11/5/2014 9:54 AM, Christian Tornqvist wrote: > Hi everyone, > > > > Please review this small change that adds @ignore to one test that fails > when running with the upcoming changes for Jigsaw M2. The affected test is > not critical and will be fixed at a later time. > > > > Webrev: > > http://cr.openjdk.java.net/~ctornqvi/webrev/8061733/webrev.00/ > > > > Bug: > > https://bugs.openjdk.java.net/browse/JDK-8061733 > > > > Thanks, > > Christian > > > From george.triantafillou at oracle.com Wed Nov 5 15:01:31 2014 From: george.triantafillou at oracle.com (George Triantafillou) Date: Wed, 05 Nov 2014 10:01:31 -0500 Subject: RFR(XS): 8061733 - [TESTBUG] Exclude tests that have issues with Jigsaw M2 changes In-Reply-To: <013c01cff908$65d00560$31701020$@oracle.com> References: <013c01cff908$65d00560$31701020$@oracle.com> Message-ID: <545A3BCB.5010604@oracle.com> Christian, Looks good. 
-George On 11/5/2014 9:54 AM, Christian Tornqvist wrote: > Hi everyone, > > > > Please review this small change that adds @ignore to one test that fails > when running with the upcoming changes for Jigsaw M2. The affected test is > not critical and will be fixed at a later time. > > > > Webrev: > > http://cr.openjdk.java.net/~ctornqvi/webrev/8061733/webrev.00/ > > > > Bug: > > https://bugs.openjdk.java.net/browse/JDK-8061733 > > > > Thanks, > > Christian > > > From daniel.daugherty at oracle.com Wed Nov 5 15:29:56 2014 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Wed, 05 Nov 2014 08:29:56 -0700 Subject: RFR(S) Contended Locking cleanup bucket (8062851) In-Reply-To: <5459FF11.1080801@oracle.com> References: <5459A8ED.8060808@oracle.com> <5459FF11.1080801@oracle.com> Message-ID: <545A4274.6090409@oracle.com> On 11/5/14 3:42 AM, David Holmes wrote: > Hi Dan, > > Reviewed. Thanks! > I find the name OM_OFFSET_NO_MONITOR_VALUE somewhat awkward but have > no better suggestion. Understood. I didn't like the original "OFFSET_SKEWED" name especially since I was moving it to objectMonitor.hpp... If you think of a better, let me know... we can always change it. > In fact I have to ask what _is_ the object monitor tagging mechanism? > I can't see it defined in the objectMonitor.* files. ?? That would be this code: src/share/vm/oops/markOop.hpp: 317 static markOop encode(ObjectMonitor* monitor) { 318 intptr_t tmp = (intptr_t) monitor; 319 return (markOop) (tmp | monitor_value); 320 } and the other methods in that file that have to account for the monitor_value being set... Dan > > Thanks, > David > > On 5/11/2014 2:34 PM, Daniel D. Daugherty wrote: >> Greetings, >> >> I have a Contended Locking cleanup bucket fix ready for review. >> >> This fix was spun off from the Contended Locking fast enter bucket >> which was sent out for review late last week. 
This fix cleans up >> the computation of ObjectMonitor field pointers and gets rid of >> the use of literal '-2' in appropriate places. For example: >> >> - ld_ptr(Rmark, ObjectMonitor::owner_offset_in_bytes() - 2, >> Rscratch); >> + ld_ptr(Rmark, OM_OFFSET_NO_MONITOR_VALUE(owner), Rscratch); >> >> The OM_OFFSET_NO_MONITOR_VALUE macro computes the offset to the >> specified field and subtracts markOopDesc:monitor_value (2). >> There's a nice comment in src/share/vm/runtime/objectMonitor.hpp. >> >> Thanks to David Holmes for his comments on JDK-8061553 that >> motivated this (long overdue) cleanup. >> >> This work is being tracked by the following bug ID: >> >> JDK-8062851 cleanup ObjectMonitor offset adjustments >> https://bugs.openjdk.java.net/browse/JDK-8062851 >> >> Here is the webrev URL: >> >> http://cr.openjdk.java.net/~dcubed/8062851-webrev/0-jdk9-hs-rt/ >> >> Here is the JEP link: >> >> https://bugs.openjdk.java.net/browse/JDK-8046133 >> >> Testing: >> >> - JPRT test jobs (since this is only syntax and comment cleanup) >> >> Thanks, in advance, for any comments, questions or suggestions. >> >> Dan From claes.redestad at oracle.com Wed Nov 5 15:49:45 2014 From: claes.redestad at oracle.com (Claes Redestad) Date: Wed, 05 Nov 2014 16:49:45 +0100 Subject: RFR(S) Contended Locking cleanup bucket (8062851) In-Reply-To: <5459A8ED.8060808@oracle.com> References: <5459A8ED.8060808@oracle.com> Message-ID: <545A4719.50705@oracle.com> Hi, On 11/05/2014 05:34 AM, Daniel D. Daugherty wrote: > Greetings, > > I have a Contended Locking cleanup bucket fix ready for review. > > This fix was spun off from the Contended Locking fast enter bucket > which was sent out for review late last week. This fix cleans up > the computation of ObjectMonitor field pointers and gets rid of > the use of literal '-2' in appropriate places. 
For example: > > - ld_ptr(Rmark, ObjectMonitor::owner_offset_in_bytes() - 2, > Rscratch); > + ld_ptr(Rmark, OM_OFFSET_NO_MONITOR_VALUE(owner), Rscratch); > > The OM_OFFSET_NO_MONITOR_VALUE macro computes the offset to the > specified field and subtracts markOopDesc:monitor_value (2). > There's a nice comment in src/share/vm/runtime/objectMonitor.hpp. any reason not to add it as a function in objectMonitor.hpp instead of a macro? How about: static int no_monitor_offset_in_bytes() { return offset_of(ObjectMonitor, _owner) - markOopDesc::monitor_value; } Example usage: - ld_ptr(Rmark, ObjectMonitor::owner_offset_in_bytes() - 2, Rscratch); + ld_ptr(Rmark, ObjectMonitor::no_monitor_offset_in_bytes(), Rscratch); Seems this should be inlined regardless and looks a bit cleaner to me. Thanks! /Claes > > Thanks to David Holmes for his comments on JDK-8061553 that > motivated this (long overdue) cleanup. > > This work is being tracked by the following bug ID: > > JDK-8062851 cleanup ObjectMonitor offset adjustments > https://bugs.openjdk.java.net/browse/JDK-8062851 > > Here is the webrev URL: > > http://cr.openjdk.java.net/~dcubed/8062851-webrev/0-jdk9-hs-rt/ > > Here is the JEP link: > > https://bugs.openjdk.java.net/browse/JDK-8046133 > > Testing: > > - JPRT test jobs (since this is only syntax and comment cleanup) > > Thanks, in advance, for any comments, questions or suggestions. > > Dan From george.triantafillou at oracle.com Wed Nov 5 16:01:08 2014 From: george.triantafillou at oracle.com (George Triantafillou) Date: Wed, 05 Nov 2014 11:01:08 -0500 Subject: RFR(S): 8058251 - assert(_count > 0) failed: Negative counter when running runtime/NMT/MallocTrackingVerify.java In-Reply-To: <54394D75.8080601@oracle.com> References: <028001cfe4e5$8078ae30$816a0a90$@oracle.com> <54394D75.8080601@oracle.com> Message-ID: <545A49C4.3040808@oracle.com> Hi Christian, This looks good. Thanks for fixing this. As Coleen requested, I filed 8062870 and assigned it to her. 
-George On 10/11/2014 11:32 AM, Coleen Phillimore wrote: > > Hi Christian, > > This is a good cleanup. As we were talking about, I suspect that the > tracking level was in the header for startup so that it could be > increased, which is something that isn't used. > > We should write a test that explicitly overflows the malloc site table > buckets though, if we don't have one already. > > But this code looks good and we should file another bug for the malloc > site table overflows and poor hashing. > > Thanks, > Coleen > > On 10/10/14, 7:54 PM, Christian Tornqvist wrote: >> Hi everyone, >> >> >> Fairly small change which fixes one of the instances of assert(count >> > 0), >> the issue was that the mallocSiteTable became full, NMT changed from >> detail >> to summary but never updated the tracking level field in the malloc >> header. >> Since the malloc was never inserted into the mallocSiteTable we didn't >> update the bucket and position in the malloc header and when we later >> on was >> trying to free that memory block we found tracking level == detailed and >> used the never initialized fields for bucket and position indexes. >> >> >> The only place that looked at the level field in the header was >> MallocHeader::release and it could check the global level state >> instead. So >> I removed the 2bit level field from the malloc headers and this >> enabled me >> to get rid of the 30bit malloc limitation on 32bit systems. >> >> >> Also fixed a sign conversion issue on 32bit platforms in WB API >> NMTMallocWithPseudoStack. >> >> >> Note that this fix doesn't solve all the sources for the assert and >> I'm not >> going to enable the test at this point as we continue to track down the >> additional issues. >> >> >> The fix has been tested using jprt and aurora adhoc with NMT. 
>> >> >> Webrev: >> >> http://cr.openjdk.java.net/~ctornqvi/webrev/8058251/webrev.00/ >> >> >> Bug: >> >> https://bugs.openjdk.java.net/browse/JDK-8058251 >> >> >> Thanks, >> >> Christian >> >> > From coleen.phillimore at oracle.com Wed Nov 5 17:33:13 2014 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Wed, 05 Nov 2014 12:33:13 -0500 Subject: [9] RFR (S): 8060147: SIGSEGV in Metadata::mark_on_stack() while marking metadata in ciEnv In-Reply-To: <5452A077.2050903@oracle.com> References: <5450F261.60400@oracle.com> <545114DF.7040005@oracle.com> <54511744.4060904@oracle.com> <5451F43A.1010108@oracle.com> <5452128C.4090408@oracle.com> <54522805.5040701@oracle.com> <1421AC31-CBEA-4AED-A657-3C43B258A6DB@oracle.com> <54522357.4070705@oracle.com> <5452425D.7040405@oracle.com> <5452517C.4050104@oracle.com> <54527E1E.1070507@oracle.com> <5452A786.7030000@oracle.com> <5452A077.2050903@oracle.com> Message-ID: <545A5F59.2020907@oracle.com> On 10/30/14, 4:32 PM, Vladimir Ivanov wrote: > Coleen, > > I implemented 2 approaches of the fix. > > The fix with a special case for VM anon classes is: > http://cr.openjdk.java.net/~vlivanov/8060147/webrev.anon.00/ > > Both fix the bug, but have different properties. > > (1) Special case for VM anon class is very focused on the actual > cause, but more fragile - all the logic which keeps metadata from > being deallocated is non-trivial and scattered around the whole > ciMetadata hierarchy. > > (2) On the other hand, initial version, which forcibly creates > klass_holder ciObject for each ciMetadata, is much cleaner and > localized, but does unnecessary work. > > Am I right that you prefer (1) as a fix? Yes, I think this version does less unnecessary work and creates less ciObjects. And the comment is useful for finding how we keep ciMetadata alive for anonymous classes. You still have a UseNewCode in the webrev thought that you want to take out. 
> >> I'm sorry that I didn't get to my email today but from the discussion I >> think changing two occurrences of "class_loader" in >> ciInstanceKlass::ciInstanceKlass to "klass_holder" would have solved >> your problem. > I don't think that's what we want. For VM anon classes, _loader == > NULL, but if we place java_mirror there instead, it could cause > problems in other parts of VM, since non-NULL _loader value implicates > ClassLoader instance. Not sure all these places are guarded against > seeing VM anon classes. I thought that field was only added to hold the class_loader as a holder but if you think using mirror would cause problems, it seems like a reason to not do this. I reviewed this code. Coleen > > >> Unless, you can add a ciMethod or ciMethodData without adding a >> ciInstanceKlass (which I don't think you can). > It's not possible right now. But ciObjectFactory doesn't forbid that. > >> I think Roland pointed out a flaw though that you can safepoint before >> adding a ciInstanceKlass though, which you could fix by moving this up >> in ciMethod::ciMethod to before the safepoint. >> >> _holder = env->get_instance_klass(h_m()->method_holder()); > I simply pass _holder value into the ciMethod ctor. > > Best regards, > Vladimir Ivanov > >> I know I suggested adding the ciObject in ciMetadata but that's because >> this is done somewhere that is hard to find. A good comment that this >> is what keeps metadata that ci points to from being unloaded by GC would >> help a lot with that. > > >> >> Thanks, >> Coleen >> >> >> On 10/30/2014 02:06 PM, Vladimir Kozlov wrote: >>> I would go with webrev.01 (updated initial version). >>> >>> Regards, >>> Vladimir >>> >>> On 10/30/14 7:55 AM, Vladimir Ivanov wrote: >>>>>> As a solution, _holder can be passed into ciMethod::ciMethod as a >>>>>> parameter. It should fix the problem. 
>>>>> >>>>> The first change you suggested >>>>> (http://cr.openjdk.java.net/~vlivanov/8060147/webrev.00) would fix >>>>> the >>>>> ciMethod::ciMethod problem, right? The code would be more robust that >>>>> way and other similar issues could be avoided. >>>> Yes, initial version fixes ciMethod::ciMethod problem. It's also more >>>> robust and easier to reason about. >>>> >>>> The downside is that for every ciMetadata instantiation we do more >>>> work. >>>> >>>> I have an alternative version: >>>> http://cr.openjdk.java.net/~vlivanov/8060147/webrev.anon.00/ >>>> >>>> Initial version (updated): >>>> http://cr.openjdk.java.net/~vlivanov/8060147/webrev.01/ >>>> >>>> I like initial version more, but I don't have strong opinion here. >>>> >>>> Best regards, >>>> Vladimir Ivanov >> From vladimir.x.ivanov at oracle.com Wed Nov 5 17:02:12 2014 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 05 Nov 2014 21:02:12 +0400 Subject: [9] RFR (S): 8060147: SIGSEGV in Metadata::mark_on_stack() while marking metadata in ciEnv In-Reply-To: <545A5F59.2020907@oracle.com> References: <5450F261.60400@oracle.com> <545114DF.7040005@oracle.com> <54511744.4060904@oracle.com> <5451F43A.1010108@oracle.com> <5452128C.4090408@oracle.com> <54522805.5040701@oracle.com> <1421AC31-CBEA-4AED-A657-3C43B258A6DB@oracle.com> <54522357.4070705@oracle.com> <5452425D.7040405@oracle.com> <5452517C.4050104@oracle.com> <54527E1E.1070507@oracle.com> <5452A786.7030000@oracle.com> <5452A077.2050903@oracle.com> <545A5F59.2020907@oracle.com> Message-ID: <545A5814.8000109@oracle.com> On 11/5/14, 9:33 PM, Coleen Phillimore wrote: > > On 10/30/14, 4:32 PM, Vladimir Ivanov wrote: >> Coleen, >> >> I implemented 2 approaches of the fix. >> >> The fix with a special case for VM anon classes is: >> http://cr.openjdk.java.net/~vlivanov/8060147/webrev.anon.00/ >> >> Both fix the bug, but have different properties. 
>> >> (1) Special case for VM anon class is very focused on the actual >> cause, but more fragile - all the logic which keeps metadata from >> being deallocated is non-trivial and scattered around the whole >> ciMetadata hierarchy. >> >> (2) On the other hand, initial version, which forcibly creates >> klass_holder ciObject for each ciMetadata, is much cleaner and >> localized, but does unnecessary work. >> >> Am I right that you prefer (1) as a fix? > > Yes, I think this version does less unnecessary work and creates less > ciObjects. And the comment is useful for finding how we keep > ciMetadata alive for anonymous classes. You still have a UseNewCode in > the webrev thought that you want to take out. Thanks, Coleen. VladimirK, Roland, what do you think about (1)? Best regards, Vladimir Ivanov From coleen.phillimore at oracle.com Wed Nov 5 18:37:55 2014 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Wed, 05 Nov 2014 13:37:55 -0500 Subject: RFR(S) Contended Locking cleanup bucket (8062851) In-Reply-To: <545A4274.6090409@oracle.com> References: <5459A8ED.8060808@oracle.com> <5459FF11.1080801@oracle.com> <545A4274.6090409@oracle.com> Message-ID: <545A6E83.8060909@oracle.com> Dan, I had a look at this change too. On 11/5/14, 10:29 AM, Daniel D. Daugherty wrote: > On 11/5/14 3:42 AM, David Holmes wrote: >> Hi Dan, >> >> Reviewed. > > Thanks! > > >> I find the name OM_OFFSET_NO_MONITOR_VALUE somewhat awkward but have >> no better suggestion. > > Understood. I didn't like the original "OFFSET_SKEWED" name > especially since I was moving it to objectMonitor.hpp... > > If you think of a better, let me know... we can always change it. > So the -2 was a tag? Then maybe a better name is UNTAGGED_OM_OFFSET .. Weird stuff anyway. 
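For readers following the naming discussion, here is a minimal standalone sketch (not HotSpot code) of why a field offset has to be reduced by the tag value when the base register still holds a tagged monitor pointer. The names `Monitor`, `kMonitorTag`, `encode`, `untagged_owner_offset` and `load_owner` are hypothetical stand-ins; only the tag value 2 and the OR-style encoding come from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for ObjectMonitor; the real layout differs.
struct Monitor {
  void* header;
  void* owner;
};

// Stand-in for markOopDesc::monitor_value (2, per the thread).
constexpr std::uintptr_t kMonitorTag = 2;

// Mirrors the quoted encode(): OR the tag into the low bits of the pointer.
// This assumes the monitor is at least 4-byte aligned, so those bits are 0.
inline std::uintptr_t encode(Monitor* m) {
  return reinterpret_cast<std::uintptr_t>(m) | kMonitorTag;
}

// Offset of `owner` relative to a *tagged* monitor pointer: the plain
// field offset minus the tag, i.e. what the proposed macro computes.
inline std::uintptr_t untagged_owner_offset() {
  return static_cast<std::uintptr_t>(offsetof(Monitor, owner)) - kMonitorTag;
}

// Loads the owner field given only the tagged word, without masking first.
inline void* load_owner(std::uintptr_t tagged) {
  void* result;
  std::memcpy(&result,
              reinterpret_cast<const char*>(tagged + untagged_owner_offset()),
              sizeof(result));
  return result;
}
```

Folding the tag into the displacement is what lets the assembly use a single load instead of an explicit mask followed by a load, which is presumably why the literal -2 appeared at so many call sites.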
In http://cr.openjdk.java.net/~dcubed/8062851-webrev/0-jdk9-hs-rt/src/cpu/x86/vm/macroAssembler_x86.cpp.udiff.html Can you make the whitespace changes to the lines you've changed: + movptr(tmpReg, Address (tmpReg, OM_OFFSET_NO_MONITOR_VALUE(owner))); // rax, = m->_owner to + movptr(tmpReg, Address(tmpReg, OM_OFFSET_NO_MONITOR_VALUE(owner))); // rax, = m->_owner In general, this looks like a great improvement not subtracting two from seemingly random places in assembly code. thanks, Coleen > > >> In fact I have to ask what _is_ the object monitor tagging mechanism? >> I can't see it defined in the objectMonitor.* files. ?? > > That would be this code: > > src/share/vm/oops/markOop.hpp: > > 317 static markOop encode(ObjectMonitor* monitor) { > 318 intptr_t tmp = (intptr_t) monitor; > 319 return (markOop) (tmp | monitor_value); > 320 } > > and the other methods in that file that have to account for > the monitor_value being set... > > Dan > > >> >> Thanks, >> David >> >> On 5/11/2014 2:34 PM, Daniel D. Daugherty wrote: >>> Greetings, >>> >>> I have a Contended Locking cleanup bucket fix ready for review. >>> >>> This fix was spun off from the Contended Locking fast enter bucket >>> which was sent out for review late last week. This fix cleans up >>> the computation of ObjectMonitor field pointers and gets rid of >>> the use of literal '-2' in appropriate places. For example: >>> >>> - ld_ptr(Rmark, ObjectMonitor::owner_offset_in_bytes() - 2, >>> Rscratch); >>> + ld_ptr(Rmark, OM_OFFSET_NO_MONITOR_VALUE(owner), Rscratch); >>> >>> The OM_OFFSET_NO_MONITOR_VALUE macro computes the offset to the >>> specified field and subtracts markOopDesc:monitor_value (2). >>> There's a nice comment in src/share/vm/runtime/objectMonitor.hpp. >>> >>> Thanks to David Holmes for his comments on JDK-8061553 that >>> motivated this (long overdue) cleanup. 
>>> >>> This work is being tracked by the following bug ID: >>> >>> JDK-8062851 cleanup ObjectMonitor offset adjustments >>> https://bugs.openjdk.java.net/browse/JDK-8062851 >>> >>> Here is the webrev URL: >>> >>> http://cr.openjdk.java.net/~dcubed/8062851-webrev/0-jdk9-hs-rt/ >>> >>> Here is the JEP link: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8046133 >>> >>> Testing: >>> >>> - JPRT test jobs (since this is only syntax and comment cleanup) >>> >>> Thanks, in advance, for any comments, questions or suggestions. >>> >>> Dan > From david.buck at oracle.com Wed Nov 5 18:59:25 2014 From: david.buck at oracle.com (david buck) Date: Thu, 06 Nov 2014 03:59:25 +0900 Subject: RFR 8058715: stability issues when being launched as an embedded JVM via JNI Message-ID: <545A738D.2080201@oracle.com> Hi! This is a request for code review of my fix for JDK-8058715. BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058715 WEBREV: http://cr.openjdk.java.net/~dbuck/8058715/webrev.01/ We have also received confirmation from the original reporter of the issue that this solution resolves the crashes they were seeing in their environment. I have tested that this change does not break the original NX bug workaround. I also ran the NX bug reproducer (v8 benchmark of Nashorn running in a loop) using a fastdebug build with the -XX:NativeMemoryTracking=summary option. No crashes or other issues were detected. Cheers, -Buck From calvin.cheung at oracle.com Wed Nov 5 19:14:20 2014 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Wed, 05 Nov 2014 11:14:20 -0800 Subject: RFR(XS): 8060721: Test runtime/SharedArchiveFile/LimitSharedSizes.java fails in jdk 9 fcs new platforms/compiler Message-ID: <545A770C.3030503@oracle.com> While upgrading the compiler on Mac for jdk9, we found this compiler bug where it skips the following 2 lines of code in metaspaceShared.cpp when optimization is enabled (set to -Os) for the fastdebug and product builds.
strcat(class_list_path_str, os::file_separator()); strcat(class_list_path_str, "classlist"); The bug is reproducible with Xcode 5.1.1 and 6.1. A workaround fix is to rewrite an "if" block in the MetaspaceShared::preload_and_dump() method. JBS: https://bugs.openjdk.java.net/browse/JDK-8060721 webrev: http://cr.openjdk.java.net/~ccheung/8060721/webrev/ Testing: JPRT The affected testcase with product, fastdebug, and debug builds built with Xcode 5.1.1 and 6.1. thanks, Calvin From coleen.phillimore at oracle.com Wed Nov 5 19:50:20 2014 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Wed, 05 Nov 2014 14:50:20 -0500 Subject: RFR 8058715: stability issues when being launched as an embedded JVM via JNI In-Reply-To: <545A738D.2080201@oracle.com> References: <545A738D.2080201@oracle.com> Message-ID: <545A7F7C.7030007@oracle.com> Looks good, David. Thank you for diagnosing and resolving this customer problem! Coleen On 11/5/14, 1:59 PM, david buck wrote: > Hi! > > This is a request for code review of my fix for jdk8058715 > > BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058715 > WEBREV: http://cr.openjdk.java.net/~dbuck/8058715/webrev.01/ > > We have also received confirmation from the original reporter of the > issue that this solution resolves the crashes they were seeing in > their environment. I have tested that this change does not break the > original NX bug workaround. I also ran the NX bug reproducer (v8 > benchmark of Nashorn running in a loop) using a fastdebug build with > the -XX:NativeMemoryTracking=summary option. Obviously no crashes or > other issues were detected. 
> > Cheers, > -Buck From dean.long at oracle.com Wed Nov 5 21:28:28 2014 From: dean.long at oracle.com (Dean Long) Date: Wed, 05 Nov 2014 13:28:28 -0800 Subject: RFR(XS): 8060721: Test runtime/SharedArchiveFile/LimitSharedSizes.java fails in jdk 9 fcs new platforms/compiler In-Reply-To: <545A770C.3030503@oracle.com> References: <545A770C.3030503@oracle.com> Message-ID: <545A967C.6020200@oracle.com> I'm just curious if the following also works: 721 strcat(class_list_path_str, (volatile char *)os::file_separator()); 722 strcat(class_list_path_str,(volatile char *)"classlist"); dl On 11/5/2014 11:14 AM, Calvin Cheung wrote: > While upgrading the compiler on Mac for jdk9, we found this compiler > bug where it skips the following 2 lines of code in > metaspaceShared.cpp when optimization is enable (set to -Os) for the > fastdebug and product builds. > strcat(class_list_path_str, os::file_separator()); > strcat(class_list_path_str, "classlist"); > > The bug is reproducible with Xcode 5.1.1 and 6.1. > > A workaround fix is to rewrite an "if" block in the > MetaspaceShared::preload_and_dump() method. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8060721 > > webrev: http://cr.openjdk.java.net/~ccheung/8060721/webrev/ > > Testing: > JPRT > The affected testcase with product, fastdebug, and debug builds > built with Xcode 5.1.1 and 6.1. 
> > thanks, > Calvin From vladimir.kozlov at oracle.com Wed Nov 5 21:51:39 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 05 Nov 2014 13:51:39 -0800 Subject: [9] RFR (S): 8060147: SIGSEGV in Metadata::mark_on_stack() while marking metadata in ciEnv In-Reply-To: <545A5814.8000109@oracle.com> References: <5450F261.60400@oracle.com> <545114DF.7040005@oracle.com> <54511744.4060904@oracle.com> <5451F43A.1010108@oracle.com> <5452128C.4090408@oracle.com> <54522805.5040701@oracle.com> <1421AC31-CBEA-4AED-A657-3C43B258A6DB@oracle.com> <54522357.4070705@oracle.com> <5452425D.7040405@oracle.com> <5452517C.4050104@oracle.com> <54527E1E.1070507@oracle.com> <5452A786.7030000@oracle.com> <5452A077.2050903@oracle.com> <545A5F59.2020907@oracle.com> <545A5814.8000109@oracle.com> Message-ID: <545A9BEB.8020507@oracle.com> I am fine with targeted fix only. One comment env->get_instance_klass() checks for NULL. Your new code in create_new_metadata() does not: ciInstanceKlass* holder = get_metadata(h_m()->method_holder())->as_instance_klass(); Thanks, Vladimir K On 11/5/14 9:02 AM, Vladimir Ivanov wrote: > > On 11/5/14, 9:33 PM, Coleen Phillimore wrote: >> >> On 10/30/14, 4:32 PM, Vladimir Ivanov wrote: >>> Coleen, >>> >>> I implemented 2 approaches of the fix. >>> >>> The fix with a special case for VM anon classes is: >>> http://cr.openjdk.java.net/~vlivanov/8060147/webrev.anon.00/ >>> >>> Both fix the bug, but have different properties. >>> >>> (1) Special case for VM anon class is very focused on the actual >>> cause, but more fragile - all the logic which keeps metadata from >>> being deallocated is non-trivial and scattered around the whole >>> ciMetadata hierarchy. >>> >>> (2) On the other hand, initial version, which forcibly creates >>> klass_holder ciObject for each ciMetadata, is much cleaner and >>> localized, but does unnecessary work. >>> >>> Am I right that you prefer (1) as a fix? 
>> >> Yes, I think this version does less unnecessary work and creates less >> ciObjects. And the comment is useful for finding how we keep >> ciMetadata alive for anonymous classes. You still have a UseNewCode in >> the webrev thought that you want to take out. > > Thanks, Coleen. > > VladimirK, Roland, what do you think about (1)? > > Best regards, > Vladimir Ivanov From yumin.qi at oracle.com Wed Nov 5 22:16:38 2014 From: yumin.qi at oracle.com (Yumin Qi) Date: Wed, 05 Nov 2014 14:16:38 -0800 Subject: RFR(XS): 8060721: Test runtime/SharedArchiveFile/LimitSharedSizes.java fails in jdk 9 fcs new platforms/compiler In-Reply-To: <545A770C.3030503@oracle.com> References: <545A770C.3030503@oracle.com> Message-ID: <545AA1C6.70902@oracle.com> Looks good to me. Thanks Yumin On 11/5/2014 11:14 AM, Calvin Cheung wrote: > While upgrading the compiler on Mac for jdk9, we found this compiler > bug where it skips the following 2 lines of code in > metaspaceShared.cpp when optimization is enable (set to -Os) for the > fastdebug and product builds. > strcat(class_list_path_str, os::file_separator()); > strcat(class_list_path_str, "classlist"); > > The bug is reproducible with Xcode 5.1.1 and 6.1. > > A workaround fix is to rewrite an "if" block in the > MetaspaceShared::preload_and_dump() method. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8060721 > > webrev: http://cr.openjdk.java.net/~ccheung/8060721/webrev/ > > Testing: > JPRT > The affected testcase with product, fastdebug, and debug builds > built with Xcode 5.1.1 and 6.1. 
> > thanks, > Calvin From calvin.cheung at oracle.com Wed Nov 5 22:34:47 2014 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Wed, 05 Nov 2014 14:34:47 -0800 Subject: RFR(XS): 8060721: Test runtime/SharedArchiveFile/LimitSharedSizes.java fails in jdk 9 fcs new platforms/compiler In-Reply-To: <545A967C.6020200@oracle.com> References: <545A770C.3030503@oracle.com> <545A967C.6020200@oracle.com> Message-ID: <545AA607.2050500@oracle.com> Hi Dean, I've tried your suggestion but got the following compilation error: Compiling /Users/ccheung/jdk9-comp-upgrade/hotspot/src/share/vm/memory/metaspaceShared.cpp /Users/ccheung/jdk9-comp-upgrade/hotspot/src/share/vm/memory/metaspaceShared.cpp:720:5: error: no matching function for call to 'strcat' strcat(class_list_path_str, (volatile char *)os::file_separator()); ^~~~~~ /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.10.sdk/usr/include/string.h:75:7: note: candidate function not viable: 2nd argument ('volatile char *') would lose volatile qualifier char *strcat(char *, const char *); ^ /Users/ccheung/jdk9-comp-upgrade/hotspot/src/share/vm/memory/metaspaceShared.cpp:721:5: error: no matching function for call to 'strcat' strcat(class_list_path_str, (volatile char *)"classlist"); ^~~~~~ /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.10.sdk/usr/include/string.h:75:7: note: candidate function not viable: 2nd argument ('volatile char *') would lose volatile qualifier char *strcat(char *, const char *); ^ 2 errors generated. 
Calvin On 11/5/2014 1:28 PM, Dean Long wrote: > I'm just curious if the following also works: > > 721 strcat(class_list_path_str, (volatile char > *)os::file_separator()); > 722 strcat(class_list_path_str,(volatile char *)"classlist"); > > dl > > On 11/5/2014 11:14 AM, Calvin Cheung wrote: >> While upgrading the compiler on Mac for jdk9, we found this compiler >> bug where it skips the following 2 lines of code in >> metaspaceShared.cpp when optimization is enable (set to -Os) for the >> fastdebug and product builds. >> strcat(class_list_path_str, os::file_separator()); >> strcat(class_list_path_str, "classlist"); >> >> The bug is reproducible with Xcode 5.1.1 and 6.1. >> >> A workaround fix is to rewrite an "if" block in the >> MetaspaceShared::preload_and_dump() method. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8060721 >> >> webrev: http://cr.openjdk.java.net/~ccheung/8060721/webrev/ >> >> Testing: >> JPRT >> The affected testcase with product, fastdebug, and debug builds >> built with Xcode 5.1.1 and 6.1. >> >> thanks, >> Calvin > From david.holmes at oracle.com Wed Nov 5 23:12:42 2014 From: david.holmes at oracle.com (David Holmes) Date: Thu, 06 Nov 2014 09:12:42 +1000 Subject: RFR 8058715: stability issues when being launched as an embedded JVM via JNI In-Reply-To: <545A738D.2080201@oracle.com> References: <545A738D.2080201@oracle.com> Message-ID: <545AAEEA.1070605@oracle.com> Hi David, On 6/11/2014 4:59 AM, david buck wrote: > Hi! > > This is a request for code review of my fix for jdk8058715 > > BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058715 > WEBREV: http://cr.openjdk.java.net/~dbuck/8058715/webrev.01/ > > We have also received confirmation from the original reporter of the > issue that this solution resolves the crashes they were seeing in their > environment. I have tested that this change does not break the original > NX bug workaround. 
> I also ran the NX bug reproducer (v8 benchmark of
> Nashorn running in a loop) using a fastdebug build with the
> -XX:NativeMemoryTracking=summary option. Obviously no crashes or other
> issues were detected.

The failure mode for this suggests we are lacking something when we attempt to reserve memory. I think that needs closer examination as we should not have something that leads to silent corruption followed by spurious failures!

That aside I don't see how this "does not break the original NX bug workaround". We will skip the workaround if the memory reservation fails. Is it the case that in such circumstances we don't need the workaround?

Thanks,
David H.

> Cheers,
> -Buck

From christian.thalinger at oracle.com Wed Nov 5 23:13:12 2014
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Wed, 5 Nov 2014 15:13:12 -0800
Subject: RFR (XS): 8062950: Bug in locking code when UseOptoBiasInlining is disabled: assert(dmw->is_neutral()) failed: invariant
In-Reply-To: <7C9B87B351A4BA4AA9EC95BB418116566ACE597E@DEWDFEMB19C.global.corp.sap>
References: <7C9B87B351A4BA4AA9EC95BB418116566ACE597E@DEWDFEMB19C.global.corp.sap>
Message-ID:

I'm not exactly sure who is our biased locking expert these days but I guess it's someone from the runtime team. CC'ing them.

> On Nov 5, 2014, at 7:38 AM, Doerr, Martin wrote:
>
> Hi,
>
> we found a bug in MacroAssembler::fast_lock on x86 which shows up when UseOptoBiasInlining is switched off.
> The problem is that biased_locking_enter is used with swap_reg_contains_mark==true, which is no longer correct after biased_locking_enter was put in front of check for IsInflated.
>
> Please review
> http://cr.openjdk.java.net/~goetz/webrevs/8062950-lockBug/webrev.00/
>
> Best regards,
> Martin

From jeremymanson at google.com Wed Nov 5 23:13:45 2014
From: jeremymanson at google.com (Jeremy Manson)
Date: Wed, 5 Nov 2014 15:13:45 -0800
Subject: RFR 8062116: JVMTI GetClassMethods is Slow
In-Reply-To: <5459ADFB.4090808@oracle.com>
References: <5457E36A.3020800@oracle.com> <54592FC2.7090406@oracle.com> <5459501C.4040807@oracle.com> <5459ADFB.4090808@oracle.com>
Message-ID:

On Tue, Nov 4, 2014 at 8:56 PM, serguei.spitsyn at oracle.com < serguei.spitsyn at oracle.com> wrote:

> The fix looks good in general.
>
> src/share/vm/oops/method.cpp
>
> 1785   bool contains(Method** m) {
> 1786     if (m == NULL) return false;
> 1787     for (JNIMethodBlockNode* b = &_head; b != NULL; b = b->_next) {
> 1788       if (b->_methods <= m && m < b->_methods + b->_number_of_methods) {
> 1789         ptrdiff_t idx = m - b->_methods;
> 1790         if (b->_methods + idx == m) {
> 1791           return true;
> 1792         }
> 1793       }
> 1794     }
> 1795     return false;  // not found
> 1796   }
>
>
> Just noticed that the lines 1789-1792 can be replaced with one liner:
>     return true;
>

Ah, you have found our crappy workaround for wild pointers to non-aligned places in the middle of _methods.

> It is because the condition (b->_methods + idx == m) is always true.
> :)
>
> Also, should we check the condition: *m != _free_method ?
> What about the following ?:
>     return (*m != _free_method);
>

I don't mind adding this, if Coleen thinks those are the semantics this needs. It wasn't there before, of course.
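For readers following the alignment point: pointer subtraction is only defined between pointers into the same array, so an optimizer may be entitled to treat b->_methods + idx == m as always true. One way to express the intended "does m point exactly at a slot" test without relying on that is to compare integer addresses. The following is a minimal standalone sketch under that assumption; Block and block_contains are hypothetical stand-ins and not HotSpot code.

```cpp
#include <cstddef>
#include <cstdint>

struct Method;  // opaque stand-in; only pointers to it are used

// Hypothetical stand-in for one block of the method-ID table.
struct Block {
  Method** methods;
  size_t   count;
};

// True only if m points exactly at one of the slots of b. The test is done
// on integer addresses so that an arbitrary (possibly garbage) pointer can
// be checked without doing pointer arithmetic on it.
static bool block_contains(const Block& b, Method** m) {
  if (m == nullptr) return false;
  uintptr_t base = reinterpret_cast<uintptr_t>(b.methods);
  uintptr_t addr = reinterpret_cast<uintptr_t>(m);
  uintptr_t size = b.count * sizeof(Method*);
  if (addr < base || addr - base >= size) return false;  // outside the block
  return (addr - base) % sizeof(Method*) == 0;           // on a slot boundary?
}
```

A full contains() would loop over all blocks as in the quoted code and return true on the first block for which this holds.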
Jeremy

From vladimir.kozlov at oracle.com Wed Nov 5 23:30:54 2014
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 05 Nov 2014 15:30:54 -0800
Subject: RFR (XS): 8062950: Bug in locking code when UseOptoBiasInlining is disabled: assert(dmw->is_neutral()) failed: invariant
In-Reply-To:
References: <7C9B87B351A4BA4AA9EC95BB418116566ACE597E@DEWDFEMB19C.global.corp.sap>
Message-ID: <545AB32E.8070402@oracle.com>

It is our (Compiler group) code. This problem was introduced with my changes for RTM locking.

Martin, your changes are good. But could you clean up this code a bit, since we now never put the markword into tmpReg before this call?

Thanks,
Vladimir

On 11/5/14 3:13 PM, Christian Thalinger wrote:
> I'm not exactly sure who is our biased locking expert these days but I guess it's someone from the runtime team. CC'ing them.
>
>> On Nov 5, 2014, at 7:38 AM, Doerr, Martin wrote:
>>
>> Hi,
>>
>> we found a bug in MacroAssembler::fast_lock on x86 which shows up when UseOptoBiasInlining is switched off.
>> The problem is that biased_locking_enter is used with swap_reg_contains_mark==true, which is no longer correct after biased_locking_enter was put in front of check for IsInflated.
>>
>> Please review
>> http://cr.openjdk.java.net/~goetz/webrevs/8062950-lockBug/webrev.00/
>>
>> Best regards,
>> Martin
>

From coleen.phillimore at oracle.com Wed Nov 5 23:40:17 2014
From: coleen.phillimore at oracle.com (Coleen Phillimore)
Date: Wed, 05 Nov 2014 18:40:17 -0500
Subject: RFR 8062116: JVMTI GetClassMethods is Slow
In-Reply-To:
References: <5457E36A.3020800@oracle.com> <54592FC2.7090406@oracle.com> <5459501C.4040807@oracle.com> <5459ADFB.4090808@oracle.com>
Message-ID: <545AB561.9020204@oracle.com>

On 11/5/14, 6:13 PM, Jeremy Manson wrote:
>
>
> On Tue, Nov 4, 2014 at 8:56 PM, serguei.spitsyn at oracle.com
> > wrote:
>
> The fix looks good in general.
> > src/share/vm/oops/method.cpp > > 1785 bool contains(Method** m) { > 1786 if (m == NULL) return false; > 1787 for (JNIMethodBlockNode* b = &_head; b != NULL; b = b->_next) { > 1788 if (b->_methods <= m && m < b->_methods + b->_number_of_methods) { > *1789 ptrdiff_t idx = m - b->_methods;** > **1790 if (b->_methods + idx == m) {** > 1791 return true; > 1792 }* > 1793 } > 1794 } > 1795 return false; // not found > 1796 } > > > Just noticed that the lines 1789-1792 can be replaced with one liner: > * return true;* > > > Ah, you have found our crappy workaround for wild pointers to > non-aligned places in the middle of _methods. Can you explain this? Why are there wild pointers? > > It is because the condition *(b->_methods + idx == m)* is always > true. :) > > Also, should we check the condition: **m != _free_method*** ? > What about the following ?: > * return (****m != _free_method***);* > > > I don't mind adding this, if Coleen thinks those are the semantics > this needs. It wasn't there before, of course. > The semantics weren't there before and the way this is called has already checked that *m != _free_method. Would it be an improvement? I don't think so. It seems that contains() should just check that the Method** is contained in the methodID table. To be more correct, is_method_id should check that it's not a freed methodID but the caller verifies this already. So I don't think this should change. BTW, I've run the test sets suggested by Serguei and they all passed. 
Coleen

> Jeremy
>

From david.holmes at oracle.com Wed Nov 5 23:44:57 2014
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 06 Nov 2014 09:44:57 +1000
Subject: RFR (XS): 8062950: Bug in locking code when UseOptoBiasInlining is disabled: assert(dmw->is_neutral()) failed: invariant
In-Reply-To:
References: <7C9B87B351A4BA4AA9EC95BB418116566ACE597E@DEWDFEMB19C.global.corp.sap>
Message-ID: <545AB679.4040705@oracle.com>

On 6/11/2014 9:13 AM, Christian Thalinger wrote:
> I'm not exactly sure who is our biased locking expert these days but I guess it's someone from the runtime team. CC'ing them.

The fact I am responding does not imply I am, or consider myself, such an expert. ;-)

I think we need to hear from Vladimir and Roland concerning the original fix for:

8033805: Move Fast_Lock/Fast_Unlock code from .ad files to macroassembler

Looking at that changeset:

http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/5292439ef895

it seems that in x86_32.ad we had:

    if (UseBiasedLocking) {
      masm.biased_locking_enter(boxReg, objReg, tmpReg, scrReg, false, DONE_LABEL, NULL, _counters);
    }

which passes "false", but in x86_64.ad we had:

    if (UseBiasedLocking && !UseOptoBiasInlining) {
      masm.biased_locking_enter(boxReg, objReg, tmpReg, scrReg, true, DONE_LABEL, NULL, _counters);
      masm.movptr(tmpReg, Address(objReg, 0)) ; // [FETCH]
    }

which passes "true" because there was a prior load of the markword into tmpReg. The new code then has the 64-bit version:

    if (UseBiasedLocking && !UseOptoBiasInlining) {
      biased_locking_enter(boxReg, objReg, tmpReg, scrReg, true, DONE_LABEL, NULL, counters);
    }

but not the prior load and hence is incorrect.

So I concur with Martin's suggested fix.

Cheers,
David

>> On Nov 5, 2014, at 7:38 AM, Doerr, Martin wrote:
>>
>> Hi,
>>
>> we found a bug in MacroAssembler::fast_lock on x86 which shows up when UseOptoBiasInlining is switched off.
>> The problem is that biased_locking_enter is used with swap_reg_contains_mark==true, which is no longer correct after biased_locking_enter was put in front of check for IsInflated. Thanks, David >> >> Please review >> http://cr.openjdk.java.net/~goetz/webrevs/8062950-lockBug/webrev.00/ >> >> Best regards, >> Martin > From serguei.spitsyn at oracle.com Wed Nov 5 23:51:03 2014 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 05 Nov 2014 15:51:03 -0800 Subject: RFR 8062116: JVMTI GetClassMethods is Slow In-Reply-To: <545AB561.9020204@oracle.com> References: <5457E36A.3020800@oracle.com> <54592FC2.7090406@oracle.com> <5459501C.4040807@oracle.com> <5459ADFB.4090808@oracle.com> <545AB561.9020204@oracle.com> Message-ID: <545AB7E7.4020809@oracle.com> On 11/5/14 3:40 PM, Coleen Phillimore wrote: > > On 11/5/14, 6:13 PM, Jeremy Manson wrote: >> >> >> On Tue, Nov 4, 2014 at 8:56 PM, serguei.spitsyn at oracle.com >> > > wrote: >> >> The fix looks good in general. >> >> src/share/vm/oops/method.cpp >> >> 1785 bool contains(Method** m) { >> 1786 if (m == NULL) return false; >> 1787 for (JNIMethodBlockNode* b = &_head; b != NULL; b = b->_next) { >> 1788 if (b->_methods <= m && m < b->_methods + b->_number_of_methods) { >> *1789 ptrdiff_t idx = m - b->_methods;** >> **1790 if (b->_methods + idx == m) {** >> 1791 return true; >> 1792 }* >> 1793 } >> 1794 } >> 1795 return false; // not found >> 1796 } >> >> >> Just noticed that the lines 1789-1792 can be replaced with one liner: >> * return true;* >> >> >> Ah, you have found our crappy workaround for wild pointers to >> non-aligned places in the middle of _methods. > > Can you explain this? Why are there wild pointers? >> >> It is because the condition *(b->_methods + idx == m)* is always >> true. :) >> >> Also, should we check the condition: **m != _free_method*** ? 
>> What about the following ?: >> * return (****m != _free_method***);* >> >> >> I don't mind adding this, if Coleen thinks those are the semantics >> this needs. It wasn't there before, of course. >> > > The semantics weren't there before and the way this is called has > already checked that *m != _free_method. Would it be an improvement? > I don't think so. It seems that contains() should just check that the > Method** is contained in the methodID table. To be more correct, > is_method_id should check that it's not a freed methodID but the > caller verifies this already. So I don't think this should change. Agreed. Thank you for the explanation! > > BTW, I've run the test sets suggested by Serguei and they all passed. Nice! Thanks, Serguei > > Coleen > >> Jeremy >> > From david.holmes at oracle.com Thu Nov 6 00:16:49 2014 From: david.holmes at oracle.com (David Holmes) Date: Thu, 06 Nov 2014 10:16:49 +1000 Subject: RFR(S) Contended Locking cleanup bucket (8062851) In-Reply-To: <545A4719.50705@oracle.com> References: <5459A8ED.8060808@oracle.com> <545A4719.50705@oracle.com> Message-ID: <545ABDF1.6050107@oracle.com> On 6/11/2014 1:49 AM, Claes Redestad wrote: > Hi, > > On 11/05/2014 05:34 AM, Daniel D. Daugherty wrote: >> Greetings, >> >> I have a Contended Locking cleanup bucket fix ready for review. >> >> This fix was spun off from the Contended Locking fast enter bucket >> which was sent out for review late last week. This fix cleans up >> the computation of ObjectMonitor field pointers and gets rid of >> the use of literal '-2' in appropriate places. For example: >> >> - ld_ptr(Rmark, ObjectMonitor::owner_offset_in_bytes() - 2, >> Rscratch); >> + ld_ptr(Rmark, OM_OFFSET_NO_MONITOR_VALUE(owner), Rscratch); >> >> The OM_OFFSET_NO_MONITOR_VALUE macro computes the offset to the >> specified field and subtracts markOopDesc:monitor_value (2). >> There's a nice comment in src/share/vm/runtime/objectMonitor.hpp. 
> > any reason not to add it as a function in objectMonitor.hpp instead of a > macro? How about: > > static int no_monitor_offset_in_bytes() { return > offset_of(ObjectMonitor, _owner) - markOopDesc::monitor_value; } _owner is not the only field used so you would need a function for each one. David ----- > Example usage: > > - ld_ptr(Rmark, ObjectMonitor::owner_offset_in_bytes() - 2, > Rscratch); > + ld_ptr(Rmark, ObjectMonitor::no_monitor_offset_in_bytes(), > Rscratch); > > > Seems this should be inlined regardless and looks a bit cleaner to me. > > Thanks! > > /Claes > >> >> Thanks to David Holmes for his comments on JDK-8061553 that >> motivated this (long overdue) cleanup. >> >> This work is being tracked by the following bug ID: >> >> JDK-8062851 cleanup ObjectMonitor offset adjustments >> https://bugs.openjdk.java.net/browse/JDK-8062851 >> >> Here is the webrev URL: >> >> http://cr.openjdk.java.net/~dcubed/8062851-webrev/0-jdk9-hs-rt/ >> >> Here is the JEP link: >> >> https://bugs.openjdk.java.net/browse/JDK-8046133 >> >> Testing: >> >> - JPRT test jobs (since this is only syntax and comment cleanup) >> >> Thanks, in advance, for any comments, questions or suggestions. >> >> Dan > From david.holmes at oracle.com Thu Nov 6 00:21:58 2014 From: david.holmes at oracle.com (David Holmes) Date: Thu, 06 Nov 2014 10:21:58 +1000 Subject: RFR(S) Contended Locking cleanup bucket (8062851) In-Reply-To: <545A4274.6090409@oracle.com> References: <5459A8ED.8060808@oracle.com> <5459FF11.1080801@oracle.com> <545A4274.6090409@oracle.com> Message-ID: <545ABF26.5010505@oracle.com> On 6/11/2014 1:29 AM, Daniel D. Daugherty wrote: > On 11/5/14 3:42 AM, David Holmes wrote: >> Hi Dan, >> >> Reviewed. > > Thanks! > > >> I find the name OM_OFFSET_NO_MONITOR_VALUE somewhat awkward but have >> no better suggestion. > > Understood. I didn't like the original "OFFSET_SKEWED" name > especially since I was moving it to objectMonitor.hpp... > > If you think of a better, let me know... 
we can always change it. > > > >> In fact I have to ask what _is_ the object monitor tagging mechanism? >> I can't see it defined in the objectMonitor.* files. ?? > > That would be this code: > > src/share/vm/oops/markOop.hpp: Doh! The markword encoding - of course. Thanks, David > 317 static markOop encode(ObjectMonitor* monitor) { > 318 intptr_t tmp = (intptr_t) monitor; > 319 return (markOop) (tmp | monitor_value); > 320 } > > and the other methods in that file that have to account for > the monitor_value being set... > > Dan > > >> >> Thanks, >> David >> >> On 5/11/2014 2:34 PM, Daniel D. Daugherty wrote: >>> Greetings, >>> >>> I have a Contended Locking cleanup bucket fix ready for review. >>> >>> This fix was spun off from the Contended Locking fast enter bucket >>> which was sent out for review late last week. This fix cleans up >>> the computation of ObjectMonitor field pointers and gets rid of >>> the use of literal '-2' in appropriate places. For example: >>> >>> - ld_ptr(Rmark, ObjectMonitor::owner_offset_in_bytes() - 2, >>> Rscratch); >>> + ld_ptr(Rmark, OM_OFFSET_NO_MONITOR_VALUE(owner), Rscratch); >>> >>> The OM_OFFSET_NO_MONITOR_VALUE macro computes the offset to the >>> specified field and subtracts markOopDesc:monitor_value (2). >>> There's a nice comment in src/share/vm/runtime/objectMonitor.hpp. >>> >>> Thanks to David Holmes for his comments on JDK-8061553 that >>> motivated this (long overdue) cleanup. >>> >>> This work is being tracked by the following bug ID: >>> >>> JDK-8062851 cleanup ObjectMonitor offset adjustments >>> https://bugs.openjdk.java.net/browse/JDK-8062851 >>> >>> Here is the webrev URL: >>> >>> http://cr.openjdk.java.net/~dcubed/8062851-webrev/0-jdk9-hs-rt/ >>> >>> Here is the JEP link: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8046133 >>> >>> Testing: >>> >>> - JPRT test jobs (since this is only syntax and comment cleanup) >>> >>> Thanks, in advance, for any comments, questions or suggestions. 
>>> >>> Dan > From coleen.phillimore at oracle.com Thu Nov 6 00:23:42 2014 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Wed, 05 Nov 2014 19:23:42 -0500 Subject: RFR(S) Contended Locking cleanup bucket (8062851) In-Reply-To: <545ABDF1.6050107@oracle.com> References: <5459A8ED.8060808@oracle.com> <545A4719.50705@oracle.com> <545ABDF1.6050107@oracle.com> Message-ID: <545ABF8E.1050408@oracle.com> On 11/5/14, 7:16 PM, David Holmes wrote: > On 6/11/2014 1:49 AM, Claes Redestad wrote: >> Hi, >> >> On 11/05/2014 05:34 AM, Daniel D. Daugherty wrote: >>> Greetings, >>> >>> I have a Contended Locking cleanup bucket fix ready for review. >>> >>> This fix was spun off from the Contended Locking fast enter bucket >>> which was sent out for review late last week. This fix cleans up >>> the computation of ObjectMonitor field pointers and gets rid of >>> the use of literal '-2' in appropriate places. For example: >>> >>> - ld_ptr(Rmark, ObjectMonitor::owner_offset_in_bytes() - 2, >>> Rscratch); >>> + ld_ptr(Rmark, OM_OFFSET_NO_MONITOR_VALUE(owner), Rscratch); >>> >>> The OM_OFFSET_NO_MONITOR_VALUE macro computes the offset to the >>> specified field and subtracts markOopDesc:monitor_value (2). >>> There's a nice comment in src/share/vm/runtime/objectMonitor.hpp. >> >> any reason not to add it as a function in objectMonitor.hpp instead of a >> macro? How about: >> >> static int no_monitor_offset_in_bytes() { return >> offset_of(ObjectMonitor, _owner) - markOopDesc::monitor_value; } > > _owner is not the only field used so you would need a function for > each one. I thought this would be better too. There are only 6 functions (6 lines) max that need this. It would look nicer. My suggestion would be to make them static int untagged_offset_in_bytes() or whatever monitor_value is. It's not a very descriptive name so better to name the functions after what it's for. 
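To make the function-based alternative concrete, here is a small sketch of what one static accessor per field could look like. Everything in it is illustrative: MonitorSketch, its fields and the accessor names are hypothetical, and only the tag value 2 and the idea of subtracting it from the field offset come from the discussion above.

```cpp
#include <cstddef>

// Hypothetical stand-in for a monitor class; the real HotSpot layout differs.
struct MonitorSketch {
  void* header;
  void* owner;
  int   recursions;

  enum { monitor_value = 2 };  // tag bit OR-ed into the encoded mark word

  // Offsets to add to a *tagged* monitor pointer to reach each field.
  static int owner_offset_from_tagged() {
    return static_cast<int>(offsetof(MonitorSketch, owner)) - monitor_value;
  }
  static int recursions_offset_from_tagged() {
    return static_cast<int>(offsetof(MonitorSketch, recursions)) - monitor_value;
  }
};
```

Each accessor folds the tag adjustment into the offset, so a load from the tagged pointer plus that offset lands on the field without a separate untagging step. The cost, as noted in the thread, is one function per field instead of a single macro.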
Coleen > > David > ----- > >> Example usage: >> >> - ld_ptr(Rmark, ObjectMonitor::owner_offset_in_bytes() - 2, >> Rscratch); >> + ld_ptr(Rmark, ObjectMonitor::no_monitor_offset_in_bytes(), >> Rscratch); >> >> >> Seems this should be inlined regardless and looks a bit cleaner to me. >> >> Thanks! >> >> /Claes >> >>> >>> Thanks to David Holmes for his comments on JDK-8061553 that >>> motivated this (long overdue) cleanup. >>> >>> This work is being tracked by the following bug ID: >>> >>> JDK-8062851 cleanup ObjectMonitor offset adjustments >>> https://bugs.openjdk.java.net/browse/JDK-8062851 >>> >>> Here is the webrev URL: >>> >>> http://cr.openjdk.java.net/~dcubed/8062851-webrev/0-jdk9-hs-rt/ >>> >>> Here is the JEP link: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8046133 >>> >>> Testing: >>> >>> - JPRT test jobs (since this is only syntax and comment cleanup) >>> >>> Thanks, in advance, for any comments, questions or suggestions. >>> >>> Dan >> From david.holmes at oracle.com Thu Nov 6 00:34:04 2014 From: david.holmes at oracle.com (David Holmes) Date: Thu, 06 Nov 2014 10:34:04 +1000 Subject: RFR(S) Contended Locking cleanup bucket (8062851) In-Reply-To: <545ABF8E.1050408@oracle.com> References: <5459A8ED.8060808@oracle.com> <545A4719.50705@oracle.com> <545ABDF1.6050107@oracle.com> <545ABF8E.1050408@oracle.com> Message-ID: <545AC1FC.8010905@oracle.com> On 6/11/2014 10:23 AM, Coleen Phillimore wrote: > > On 11/5/14, 7:16 PM, David Holmes wrote: >> On 6/11/2014 1:49 AM, Claes Redestad wrote: >>> Hi, >>> >>> On 11/05/2014 05:34 AM, Daniel D. Daugherty wrote: >>>> Greetings, >>>> >>>> I have a Contended Locking cleanup bucket fix ready for review. >>>> >>>> This fix was spun off from the Contended Locking fast enter bucket >>>> which was sent out for review late last week. This fix cleans up >>>> the computation of ObjectMonitor field pointers and gets rid of >>>> the use of literal '-2' in appropriate places. 
For example: >>>> >>>> - ld_ptr(Rmark, ObjectMonitor::owner_offset_in_bytes() - 2, >>>> Rscratch); >>>> + ld_ptr(Rmark, OM_OFFSET_NO_MONITOR_VALUE(owner), Rscratch); >>>> >>>> The OM_OFFSET_NO_MONITOR_VALUE macro computes the offset to the >>>> specified field and subtracts markOopDesc:monitor_value (2). >>>> There's a nice comment in src/share/vm/runtime/objectMonitor.hpp. >>> >>> any reason not to add it as a function in objectMonitor.hpp instead of a >>> macro? How about: >>> >>> static int no_monitor_offset_in_bytes() { return >>> offset_of(ObjectMonitor, _owner) - markOopDesc::monitor_value; } >> >> _owner is not the only field used so you would need a function for >> each one. > > I thought this would be better too. There are only 6 functions (6 > lines) max that need this. It would look nicer. Only changes an upper case macro name to a lower case function name. > My suggestion would be to make them static int > untagged_offset_in_bytes() or whatever monitor_value is. It's not a > very descriptive name so better to name the functions after what it's for. You need the field name included in the function name: untagged_offset_of_owner() untagged_offset_of_xxx() but it is only untagged if the OM is currently inflated, so then: untagged_offset_of_XXX_for_inflated_om() I can live with Dan's macro (which is an improvement on the original). David > Coleen > >> >> David >> ----- >> >>> Example usage: >>> >>> - ld_ptr(Rmark, ObjectMonitor::owner_offset_in_bytes() - 2, >>> Rscratch); >>> + ld_ptr(Rmark, ObjectMonitor::no_monitor_offset_in_bytes(), >>> Rscratch); >>> >>> >>> Seems this should be inlined regardless and looks a bit cleaner to me. >>> >>> Thanks! >>> >>> /Claes >>> >>>> >>>> Thanks to David Holmes for his comments on JDK-8061553 that >>>> motivated this (long overdue) cleanup. 
>>>>
>>>> This work is being tracked by the following bug ID:
>>>>
>>>> JDK-8062851 cleanup ObjectMonitor offset adjustments
>>>> https://bugs.openjdk.java.net/browse/JDK-8062851
>>>>
>>>> Here is the webrev URL:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8062851-webrev/0-jdk9-hs-rt/
>>>>
>>>> Here is the JEP link:
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8046133
>>>>
>>>> Testing:
>>>>
>>>> - JPRT test jobs (since this is only syntax and comment cleanup)
>>>>
>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>
>>>> Dan
>>>
>

From jeremymanson at google.com Thu Nov 6 00:35:23 2014
From: jeremymanson at google.com (Jeremy Manson)
Date: Wed, 5 Nov 2014 16:35:23 -0800
Subject: RFR 8062116: JVMTI GetClassMethods is Slow
In-Reply-To: <545AB561.9020204@oracle.com>
References: <5457E36A.3020800@oracle.com> <54592FC2.7090406@oracle.com> <5459501C.4040807@oracle.com> <5459ADFB.4090808@oracle.com> <545AB561.9020204@oracle.com>
Message-ID:

On Wed, Nov 5, 2014 at 3:40 PM, Coleen Phillimore < coleen.phillimore at oracle.com> wrote:

>
> On 11/5/14, 6:13 PM, Jeremy Manson wrote:
>
>
>
> On Tue, Nov 4, 2014 at 8:56 PM, serguei.spitsyn at oracle.com <
> serguei.spitsyn at oracle.com> wrote:
>
>> The fix looks good in general.
>>
>> src/share/vm/oops/method.cpp
>>
>> 1785   bool contains(Method** m) {
>> 1786     if (m == NULL) return false;
>> 1787     for (JNIMethodBlockNode* b = &_head; b != NULL; b = b->_next) {
>> 1788       if (b->_methods <= m && m < b->_methods + b->_number_of_methods) {
>> 1789         ptrdiff_t idx = m - b->_methods;
>> 1790         if (b->_methods + idx == m) {
>> 1791           return true;
>> 1792         }
>> 1793       }
>> 1794     }
>> 1795     return false;  // not found
>> 1796   }
>>
>>
>> Just noticed that the lines 1789-1792 can be replaced with one liner:
>>     return true;
>>
>
> Ah, you have found our crappy workaround for wild pointers to
> non-aligned places in the middle of _methods.
>
>
> Can you explain this? Why are there wild pointers?
>

My belief was that end user code could pass any old garbage to this function. It's called by Method::is_method_id, which is called by jniCheck::validate_jmethod_id. The idea was that this would help check jni deliver useful information in the case of the end user inputting garbage that happened to be in the right memory range.

Having said that, at a second glance, it looks as if that call is protected by a call to is_method() (in checked_resolve_jmethod_id), so the program will probably crash before it gets to this check.

The other point about it was that the result of >= and < is technically unspecified; if it were ever implemented as anything other than a binary comparison between integers (which it never is, now that no one has a segmented architecture), the comparison could pass spuriously, so checking would be a good thing. Of course, the comparison could fail spuriously, too.

Anyway, I'm happy to leave it in as belt-and-suspenders (and add a comment, obviously, since it has caused confusion), or take it out. Your call.

Jeremy

From coleen.phillimore at oracle.com Thu Nov 6 00:41:42 2014
From: coleen.phillimore at oracle.com (Coleen Phillimore)
Date: Wed, 05 Nov 2014 19:41:42 -0500
Subject: RFR 8062116: JVMTI GetClassMethods is Slow
In-Reply-To:
References: <5457E36A.3020800@oracle.com> <54592FC2.7090406@oracle.com> <5459501C.4040807@oracle.com> <5459ADFB.4090808@oracle.com> <545AB561.9020204@oracle.com>
Message-ID: <545AC3C6.6070105@oracle.com>

Yes, leave it in and add a comment then (sorry for top-posting). Thank you for the explanation.

Coleen

On 11/5/14, 7:35 PM, Jeremy Manson wrote:
>
>
> On Wed, Nov 5, 2014 at 3:40 PM, Coleen Phillimore
> > > wrote:
>
>
> On 11/5/14, 6:13 PM, Jeremy Manson wrote:
>>
>>
>> On Tue, Nov 4, 2014 at 8:56 PM, serguei.spitsyn at oracle.com
>> > > wrote:
>>
>> The fix looks good in general.
>> >> src/share/vm/oops/method.cpp >> >> 1785 bool contains(Method** m) { >> 1786 if (m == NULL) return false; >> 1787 for (JNIMethodBlockNode* b = &_head; b != NULL; b = b->_next) { >> 1788 if (b->_methods <= m && m < b->_methods + b->_number_of_methods) { >> *1789 ptrdiff_t idx = m - b->_methods;** >> **1790 if (b->_methods + idx == m) {** >> 1791 return true; >> 1792 }* >> 1793 } >> 1794 } >> 1795 return false; // not found >> 1796 } >> >> >> Just noticed that the lines 1789-1792 can be replaced with >> one liner: >> * return true;* >> >> >> Ah, you have found our crappy workaround for wild pointers to >> non-aligned places in the middle of _methods. > > Can you explain this? Why are there wild pointers? > > > My belief was that end user code could pass any old garbage to this > function. It's called by Method::is_method_id, which is called > by jniCheck::validate_jmethod_id. The idea was that this would help > check jni deliver useful information in the case of the end user > inputting garbage that happened to be in the right memory range. > > Having said that, at a second glance, it looks as if it that call is > protected by a call to is_method() (in checked_resolve_jmethod_id), so > the program will probably crash before it gets to this check. > > The other point about it was that the result of >= and < is > technically unspecified; if it were ever implemented as anything other > than a binary comparison between integers (which it never is, now that > no one has a segmented architecture), the comparison could pass > spuriously, so checking would be a good thing. Of course, the > comparison could fail spuriously, too. > > Anyway, I'm happy to leave it in as belt-and-suspenders (and add a > comment, obviously, since it has caused confusion), or take it out. > Your call. 
> > Jeremy From david.holmes at oracle.com Thu Nov 6 00:50:27 2014 From: david.holmes at oracle.com (David Holmes) Date: Thu, 06 Nov 2014 10:50:27 +1000 Subject: RFR(XS): 8060721: Test runtime/SharedArchiveFile/LimitSharedSizes.java fails in jdk 9 fcs new platforms/compiler In-Reply-To: <545A770C.3030503@oracle.com> References: <545A770C.3030503@oracle.com> Message-ID: <545AC5D3.9090005@oracle.com> On 6/11/2014 5:14 AM, Calvin Cheung wrote: > While upgrading the compiler on Mac for jdk9, we found this compiler bug > where it skips the following 2 lines of code in metaspaceShared.cpp when > optimization is enable (set to -Os) for the fastdebug and product builds. > strcat(class_list_path_str, os::file_separator()); > strcat(class_list_path_str, "classlist"); > > The bug is reproducible with Xcode 5.1.1 and 6.1. > > A workaround fix is to rewrite an "if" block in the > MetaspaceShared::preload_and_dump() method. Can't you simply replace the strcats with jio_snprintf and do away with the sub_path array? Or even try strncat instead of strcat? David > JBS: https://bugs.openjdk.java.net/browse/JDK-8060721 > > webrev: http://cr.openjdk.java.net/~ccheung/8060721/webrev/ > > Testing: > JPRT > The affected testcase with product, fastdebug, and debug builds > built with Xcode 5.1.1 and 6.1. 
> > thanks, > Calvin From serguei.spitsyn at oracle.com Thu Nov 6 01:11:00 2014 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 05 Nov 2014 17:11:00 -0800 Subject: RFR 8062116: JVMTI GetClassMethods is Slow In-Reply-To: References: <5457E36A.3020800@oracle.com> <54592FC2.7090406@oracle.com> <5459501C.4040807@oracle.com> <5459ADFB.4090808@oracle.com> <545AB561.9020204@oracle.com> Message-ID: <545ACAA4.3020906@oracle.com> On 11/5/14 4:35 PM, Jeremy Manson wrote: > > > On Wed, Nov 5, 2014 at 3:40 PM, Coleen Phillimore > > > wrote: > > > On 11/5/14, 6:13 PM, Jeremy Manson wrote: >> >> >> On Tue, Nov 4, 2014 at 8:56 PM, serguei.spitsyn at oracle.com >> > > wrote: >> >> The fix looks good in general. >> >> src/share/vm/oops/method.cpp >> >> 1785 bool contains(Method** m) { >> 1786 if (m == NULL) return false; >> 1787 for (JNIMethodBlockNode* b = &_head; b != NULL; b = b->_next) { >> 1788 if (b->_methods <= m && m < b->_methods + b->_number_of_methods) { >> *1789 ptrdiff_t idx = m - b->_methods;** >> **1790 if (b->_methods + idx == m) {** >> 1791 return true; >> 1792 }* >> 1793 } >> 1794 } >> 1795 return false; // not found >> 1796 } >> >> >> Just noticed that the lines 1789-1792 can be replaced with >> one liner: >> * return true;* >> >> >> Ah, you have found our crappy workaround for wild pointers to >> non-aligned places in the middle of _methods. > > Can you explain this? Why are there wild pointers? > > > My belief was that end user code could pass any old garbage to this > function. It's called by Method::is_method_id, which is called > by jniCheck::validate_jmethod_id. The idea was that this would help > check jni deliver useful information in the case of the end user > inputting garbage that happened to be in the right memory range. 
> > Having said that, at a second glance, it looks as if it that call is > protected by a call to is_method() (in checked_resolve_jmethod_id), so > the program will probably crash before it gets to this check. > > The other point about it was that the result of >= and < is > technically unspecified; if it were ever implemented as anything other > than a binary comparison between integers (which it never is, now that > no one has a segmented architecture), the comparison could pass > spuriously, so checking would be a good thing. Of course, the > comparison could fail spuriously, too. > > Anyway, I'm happy to leave it in as belt-and-suspenders (and add a > comment, obviously, since it has caused confusion), or take it out. > Your call. I'm still confused. How this code could possibly check anything? ptrdiff_t idx = m - b->_methods; if (b->_methods + idx == m) { The condition above always gives true: b->_methods + (idx) == b->_methods + (m - b->_methods) == (b->_methods- b->_methods) + m == (0 + m) == m Even if m was unaligned then at the end we compare m with m which is still true. Do I miss anything? Thanks, Serguei ** > > Jeremy From david.holmes at oracle.com Thu Nov 6 01:34:43 2014 From: david.holmes at oracle.com (David Holmes) Date: Thu, 06 Nov 2014 11:34:43 +1000 Subject: RFR 8058715: stability issues when being launched as an embedded JVM via JNI In-Reply-To: <545AAEEA.1070605@oracle.com> References: <545A738D.2080201@oracle.com> <545AAEEA.1070605@oracle.com> Message-ID: <545AD033.8060107@oracle.com> On 6/11/2014 9:12 AM, David Holmes wrote: > Hi David, > > On 6/11/2014 4:59 AM, david buck wrote: >> Hi! >> >> This is a request for code review of my fix for jdk8058715 >> >> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058715 >> WEBREV: http://cr.openjdk.java.net/~dbuck/8058715/webrev.01/ >> >> We have also received confirmation from the original reporter of the >> issue that this solution resolves the crashes they were seeing in their >> environment. 
I have tested that this change does not break the original >> NX bug workaround. I also ran the NX bug reproducer (v8 benchmark of >> Nashorn running in a loop) using a fastdebug build with the >> -XX:NativeMemoryTracking=summary option. Obviously no crashes or other >> issues were detected. > > The failure mode for this suggests we are lacking something when we > attempt to reserve memory. I think that needs closer examination as we > should not have something that leads to silent corruption followed by > spurious failures! > > That aside I don't see how this "does not break the original NX bug > workaround". We will skip the workaround if the memory reservation > fails. Is it the case that in such circumstances we don't need the > workaround? Sorry ignore this part. The original code was already bailing out if the reservation failed. Thanks, David > Thanks, > David H. > >> Cheers, >> -Buck From coleen.phillimore at oracle.com Thu Nov 6 03:00:42 2014 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Wed, 05 Nov 2014 22:00:42 -0500 Subject: RFR 8062116: JVMTI GetClassMethods is Slow In-Reply-To: <545ACAA4.3020906@oracle.com> References: <5457E36A.3020800@oracle.com> <54592FC2.7090406@oracle.com> <5459501C.4040807@oracle.com> <5459ADFB.4090808@oracle.com> <545AB561.9020204@oracle.com> <545ACAA4.3020906@oracle.com> Message-ID: <545AE45A.5080003@oracle.com> On 11/5/14, 8:11 PM, serguei.spitsyn at oracle.com wrote: > > On 11/5/14 4:35 PM, Jeremy Manson wrote: >> >> >> On Wed, Nov 5, 2014 at 3:40 PM, Coleen Phillimore >> > >> wrote: >> >> >> On 11/5/14, 6:13 PM, Jeremy Manson wrote: >>> >>> >>> On Tue, Nov 4, 2014 at 8:56 PM, serguei.spitsyn at oracle.com >>> >> > wrote: >>> >>> The fix looks good in general. 
>>> >>> src/share/vm/oops/method.cpp >>> >>> 1785 bool contains(Method** m) { >>> 1786 if (m == NULL) return false; >>> 1787 for (JNIMethodBlockNode* b = &_head; b != NULL; b = b->_next) { >>> 1788 if (b->_methods <= m && m < b->_methods + b->_number_of_methods) { >>> *1789 ptrdiff_t idx = m - b->_methods;** >>> **1790 if (b->_methods + idx == m) {** >>> 1791 return true; >>> 1792 }* >>> 1793 } >>> 1794 } >>> 1795 return false; // not found >>> 1796 } >>> >>> >>> Just noticed that the lines 1789-1792 can be replaced with >>> one liner: >>> * return true;* >>> >>> >>> Ah, you have found our crappy workaround for wild pointers to >>> non-aligned places in the middle of _methods. >> >> Can you explain this? Why are there wild pointers? >> >> >> My belief was that end user code could pass any old garbage to this >> function. It's called by Method::is_method_id, which is called >> by jniCheck::validate_jmethod_id. The idea was that this would help >> check jni deliver useful information in the case of the end user >> inputting garbage that happened to be in the right memory range. >> >> Having said that, at a second glance, it looks as if it that call is >> protected by a call to is_method() (in checked_resolve_jmethod_id), >> so the program will probably crash before it gets to this check. >> >> The other point about it was that the result of >= and < is >> technically unspecified; if it were ever implemented as anything >> other than a binary comparison between integers (which it never is, >> now that no one has a segmented architecture), the comparison could >> pass spuriously, so checking would be a good thing. Of course, the >> comparison could fail spuriously, too. >> >> Anyway, I'm happy to leave it in as belt-and-suspenders (and add a >> comment, obviously, since it has caused confusion), or take it out. >> Your call. > > I'm still confused. > > How this code could possibly check anything? 
> ptrdiff_t idx = m - b->_methods;
> if (b->_methods + idx == m) {
>
> The condition above always gives true:
> b->_methods + (idx) == b->_methods + (m - b->_methods) ==
> (b->_methods- b->_methods) + m == (0 + m) == m
>
> Even if m was unaligned then at the end we compare m with m which is
> still true.
> Do I miss anything?

If 'm' is unaligned we would fail this comparison:

(gdb) print &methods->_data[2]
$34 = (Method **) 0x7fffe0022440
(gdb) print &methods->_data[0]
$35 = (Method **) 0x7fffe0022430
(gdb) print 0x7fffe0022444 - 0x7fffe0022430
$32 = 20
(gdb) print 20/8
$33 = 2

If m is misaligned, say 0x7fffe0022444, the idx would be 2 and the
expression (b->_methods + idx) would evaluate to the aligned
0x7fffe0022440, so it would not equal m.

But the code could check for misaligned m instead (or it would have
already crashed). I think all bets are off if the address space is
segmented.

The comment Jeremy added is:

    if (b->_methods <= m && m < b->_methods + b->_number_of_methods) {
      // This is a bit of extra checking, for two reasons. One is
      // that contains() deals with pointers that are passed in by
      // JNI code, so making sure that the pointer is aligned
      // correctly is valuable. The other is that <= and > are
      // technically not defined on pointers, so the if guard can
      // pass spuriously; no modern compiler is likely to make that
      // a problem, though (and if one did, the guard could also
      // fail spuriously, which would be bad).
ptrdiff_t idx = m - b->_methods; if (b->_methods + idx == m) { return true; } Coleen > > > Thanks, > Serguei > > ** >> >> Jeremy > From david.holmes at oracle.com Thu Nov 6 03:11:18 2014 From: david.holmes at oracle.com (David Holmes) Date: Thu, 06 Nov 2014 13:11:18 +1000 Subject: RFR 8062116: JVMTI GetClassMethods is Slow In-Reply-To: <545AE45A.5080003@oracle.com> References: <5457E36A.3020800@oracle.com> <54592FC2.7090406@oracle.com> <5459501C.4040807@oracle.com> <5459ADFB.4090808@oracle.com> <545AB561.9020204@oracle.com> <545ACAA4.3020906@oracle.com> <545AE45A.5080003@oracle.com> Message-ID: <545AE6D6.4040401@oracle.com> On 6/11/2014 1:00 PM, Coleen Phillimore wrote: > > On 11/5/14, 8:11 PM, serguei.spitsyn at oracle.com wrote: >> >> On 11/5/14 4:35 PM, Jeremy Manson wrote: >>> >>> >>> On Wed, Nov 5, 2014 at 3:40 PM, Coleen Phillimore >>> > >>> wrote: >>> >>> >>> On 11/5/14, 6:13 PM, Jeremy Manson wrote: >>>> >>>> >>>> On Tue, Nov 4, 2014 at 8:56 PM, serguei.spitsyn at oracle.com >>>> >>> > wrote: >>>> >>>> The fix looks good in general. >>>> >>>> src/share/vm/oops/method.cpp >>>> >>>> 1785 bool contains(Method** m) { >>>> 1786 if (m == NULL) return false; >>>> 1787 for (JNIMethodBlockNode* b = &_head; b != NULL; b = b->_next) { >>>> 1788 if (b->_methods <= m && m < b->_methods + b->_number_of_methods) { >>>> *1789 ptrdiff_t idx = m - b->_methods;** >>>> **1790 if (b->_methods + idx == m) {** >>>> 1791 return true; >>>> 1792 }* >>>> 1793 } >>>> 1794 } >>>> 1795 return false; // not found >>>> 1796 } >>>> >>>> >>>> Just noticed that the lines 1789-1792 can be replaced with >>>> one liner: >>>> * return true;* >>>> >>>> >>>> Ah, you have found our crappy workaround for wild pointers to >>>> non-aligned places in the middle of _methods. >>> >>> Can you explain this? Why are there wild pointers? >>> >>> >>> My belief was that end user code could pass any old garbage to this >>> function. 
It's called by Method::is_method_id, which is called >>> by jniCheck::validate_jmethod_id. The idea was that this would help >>> check jni deliver useful information in the case of the end user >>> inputting garbage that happened to be in the right memory range. >>> >>> Having said that, at a second glance, it looks as if it that call is >>> protected by a call to is_method() (in checked_resolve_jmethod_id), >>> so the program will probably crash before it gets to this check. >>> >>> The other point about it was that the result of >= and < is >>> technically unspecified; if it were ever implemented as anything >>> other than a binary comparison between integers (which it never is, >>> now that no one has a segmented architecture), the comparison could >>> pass spuriously, so checking would be a good thing. Of course, the >>> comparison could fail spuriously, too. >>> >>> Anyway, I'm happy to leave it in as belt-and-suspenders (and add a >>> comment, obviously, since it has caused confusion), or take it out. >>> Your call. >> >> I'm still confused. >> >> How this code could possibly check anything? >> ptrdiff_t idx = m - b->_methods; >> if (b->_methods + idx == m) { >> >> The condition above always gives true: >> b->_methods + (idx) == b->_methods + (m - b->_methods) == >> (b->_methods- b->_methods) + m == (0 + m) == m >> >> Even if m was unaligned then at the end we compare m with m which is >> still true. >> Do I miss anything? > > If 'm' is unaligned we would fail this comparison: > > (gdb) print &methods->_data[2] > $34 = (Method **) 0x7fffe0022440 > (gdb) print &methods->_data[0] > $35 = (Method **) 0x7fffe0022430 > (gdb) print 0x7fffe0022444 - 0x7fffe0022430 > $32 = 20 I was confused about this too. What we have here is pointer arithmetic, not regular arithmetic, so I'm assuming an unaligned value has to be adjusted before the actual difference is computed. 
So in practice: m - b->_methods is really adjusted_for_alignment(m) - b->_methods David ----- > (gdb) print 20/8 > $33 = 2 > > if m is misaligned 0x7fffe0022444 the idx would be 2 and the expression > (b->_methods + idx) would evaluate to the aligned 0xfffe0022440 so not > equal m. > > But the code could check for misaligned m instead (or it would have > already crashed). I think all bets are off if the address space is > segmented. > > The comment Jeremy added is: > > if (b->_methods <= m && m < b->_methods + b->_number_of_methods) { > // This is a bit of extra checking, for two reasons. One is > // that contains() deals with pointers that are passed in by > // JNI code, so making sure that the pointer is aligned > // correctly is valuable. The other is that <= and > are > // technically not defined on pointers, so the if guard can > // pass spuriously; no modern compiler is likely to make that > // a problem, though (and if one did, the guard could also > // fail spuriously, which would be bad). > ptrdiff_t idx = m - b->_methods; > if (b->_methods + idx == m) { > return true; > } > > Coleen >> >> >> Thanks, >> Serguei >> >> ** >>> >>> Jeremy >> > From calvin.cheung at oracle.com Thu Nov 6 04:28:34 2014 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Wed, 05 Nov 2014 20:28:34 -0800 Subject: RFR(XS): 8060721: Test runtime/SharedArchiveFile/LimitSharedSizes.java fails in jdk 9 fcs new platforms/compiler In-Reply-To: <545AC5D3.9090005@oracle.com> References: <545A770C.3030503@oracle.com> <545AC5D3.9090005@oracle.com> Message-ID: <545AF8F2.1010106@oracle.com> On 11/5/2014 4:50 PM, David Holmes wrote: > On 6/11/2014 5:14 AM, Calvin Cheung wrote: >> While upgrading the compiler on Mac for jdk9, we found this compiler bug >> where it skips the following 2 lines of code in metaspaceShared.cpp when >> optimization is enable (set to -Os) for the fastdebug and product >> builds. 
>>     strcat(class_list_path_str, os::file_separator());
>>     strcat(class_list_path_str, "classlist");
>>
>> The bug is reproducible with Xcode 5.1.1 and 6.1.
>>
>> A workaround fix is to rewrite an "if" block in the
>> MetaspaceShared::preload_and_dump() method.
>
> Can't you simply replace the strcats with jio_snprintf and do away
> with the sub_path array?

The following works. I'll do more testing before sending an updated webrev.

--- a/src/share/vm/memory/metaspaceShared.cpp
+++ b/src/share/vm/memory/metaspaceShared.cpp
@@ -713,12 +713,15 @@
     int class_list_path_len = (int)strlen(class_list_path_str);
     if (class_list_path_len >= 3) {
       if (strcmp(class_list_path_str + class_list_path_len - 3, "lib") != 0) {
-        strcat(class_list_path_str, os::file_separator());
-        strcat(class_list_path_str, "lib");
+        jio_snprintf(class_list_path_str + class_list_path_len,
+                     sizeof(class_list_path_str) - class_list_path_len,
+                     "%slib", os::file_separator());
       }
     }
-    strcat(class_list_path_str, os::file_separator());
-    strcat(class_list_path_str, "classlist");
+    class_list_path_len = (int)strlen(class_list_path_str);
+    jio_snprintf(class_list_path_str + class_list_path_len,
+                 sizeof(class_list_path_str) - class_list_path_len,
+                 "%sclasslist", os::file_separator());
     class_list_path = class_list_path_str;
   } else {
     class_list_path = SharedClassListFile;

>
> Or even try strncat instead of strcat?

I think jio_snprintf is better because it null terminates the string.
If I use strncat, I'll need to initialize the entire buffer to null.

thanks,
Calvin

>
> David
>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8060721
>>
>> webrev: http://cr.openjdk.java.net/~ccheung/8060721/webrev/
>>
>> Testing:
>> JPRT
>> The affected testcase with product, fastdebug, and debug builds
>> built with Xcode 5.1.1 and 6.1.
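The bounded-append pattern in the patch above can be sketched outside HotSpot as follows. This is only an illustration under stated assumptions: it uses the standard `snprintf` as a stand-in for the HotSpot-internal `jio_snprintf`, and `append_component` is a hypothetical helper name that does not appear in the webrev. As a side note, standard `strncat` does write a terminating NUL, but its count argument bounds the number of characters appended rather than the total buffer size, so the remaining space still has to be computed by the caller.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper (not part of HotSpot): appends sep followed by name
// to the NUL-terminated string held in buf, whose total capacity is cap.
// Returns true if everything fit. On truncation it returns false and buf
// is still NUL-terminated, because snprintf always terminates when the
// size it is given is non-zero.
static bool append_component(char* buf, std::size_t cap,
                             const char* sep, const char* name) {
  // Locate the current terminator without reading past the buffer.
  const void* nul = std::memchr(buf, '\0', cap);
  if (nul == nullptr) {
    return false;  // no terminator inside cap bytes: refuse to append
  }
  const std::size_t len =
      static_cast<std::size_t>(static_cast<const char*>(nul) - buf);
  const std::size_t room = cap - len;  // at least 1: the terminator's slot

  // snprintf returns the length it wanted to write, excluding the NUL.
  const int wanted = std::snprintf(buf + len, room, "%s%s", sep, name);
  return wanted >= 0 && static_cast<std::size_t>(wanted) < room;
}
```

Called on a `char path[32]` holding "/jdk", `append_component(path, sizeof(path), "/", "lib")` leaves "/jdk/lib" in the buffer. Note that `sizeof` gives the capacity only when applied to an array, not to a pointer.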
>> >> thanks, >> Calvin From coleen.phillimore at oracle.com Thu Nov 6 05:02:01 2014 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Thu, 06 Nov 2014 00:02:01 -0500 Subject: RFR 8062116: JVMTI GetClassMethods is Slow In-Reply-To: <545AE6D6.4040401@oracle.com> References: <5457E36A.3020800@oracle.com> <54592FC2.7090406@oracle.com> <5459501C.4040807@oracle.com> <5459ADFB.4090808@oracle.com> <545AB561.9020204@oracle.com> <545ACAA4.3020906@oracle.com> <545AE45A.5080003@oracle.com> <545AE6D6.4040401@oracle.com> Message-ID: <545B00C9.1070502@oracle.com> David and Serguei (and Jeremy), see below. Summary: I think Jeremy's code and comments are good. On 11/5/14, 10:11 PM, David Holmes wrote: > On 6/11/2014 1:00 PM, Coleen Phillimore wrote: >> >> On 11/5/14, 8:11 PM, serguei.spitsyn at oracle.com wrote: >>> >>> On 11/5/14 4:35 PM, Jeremy Manson wrote: >>>> >>>> >>>> On Wed, Nov 5, 2014 at 3:40 PM, Coleen Phillimore >>>> > >>>> wrote: >>>> >>>> >>>> On 11/5/14, 6:13 PM, Jeremy Manson wrote: >>>>> >>>>> >>>>> On Tue, Nov 4, 2014 at 8:56 PM, serguei.spitsyn at oracle.com >>>>> >>>> > wrote: >>>>> >>>>> The fix looks good in general. >>>>> >>>>> src/share/vm/oops/method.cpp >>>>> >>>>> 1785 bool contains(Method** m) { >>>>> 1786 if (m == NULL) return false; >>>>> 1787 for (JNIMethodBlockNode* b = &_head; b != NULL; b >>>>> = b->_next) { >>>>> 1788 if (b->_methods <= m && m < b->_methods + >>>>> b->_number_of_methods) { >>>>> *1789 ptrdiff_t idx = m - b->_methods;** >>>>> **1790 if (b->_methods + idx == m) {** >>>>> 1791 return true; >>>>> 1792 }* >>>>> 1793 } >>>>> 1794 } >>>>> 1795 return false; // not found >>>>> 1796 } >>>>> >>>>> >>>>> Just noticed that the lines 1789-1792 can be replaced with >>>>> one liner: >>>>> * return true;* >>>>> >>>>> >>>>> Ah, you have found our crappy workaround for wild pointers to >>>>> non-aligned places in the middle of _methods. >>>> >>>> Can you explain this? Why are there wild pointers? 
>>>> >>>> >>>> My belief was that end user code could pass any old garbage to this >>>> function. It's called by Method::is_method_id, which is called >>>> by jniCheck::validate_jmethod_id. The idea was that this would help >>>> check jni deliver useful information in the case of the end user >>>> inputting garbage that happened to be in the right memory range. >>>> >>>> Having said that, at a second glance, it looks as if it that call is >>>> protected by a call to is_method() (in checked_resolve_jmethod_id), >>>> so the program will probably crash before it gets to this check. >>>> >>>> The other point about it was that the result of >= and < is >>>> technically unspecified; if it were ever implemented as anything >>>> other than a binary comparison between integers (which it never is, >>>> now that no one has a segmented architecture), the comparison could >>>> pass spuriously, so checking would be a good thing. Of course, the >>>> comparison could fail spuriously, too. >>>> >>>> Anyway, I'm happy to leave it in as belt-and-suspenders (and add a >>>> comment, obviously, since it has caused confusion), or take it out. >>>> Your call. >>> >>> I'm still confused. >>> >>> How this code could possibly check anything? >>> ptrdiff_t idx = m - b->_methods; >>> if (b->_methods + idx == m) { >>> >>> The condition above always gives true: >>> b->_methods + (idx) == b->_methods + (m - b->_methods) == >>> (b->_methods- b->_methods) + m == (0 + m) == m >>> >>> Even if m was unaligned then at the end we compare m with m which is >>> still true. >>> Do I miss anything? >> >> If 'm' is unaligned we would fail this comparison: >> >> (gdb) print &methods->_data[2] >> $34 = (Method **) 0x7fffe0022440 >> (gdb) print &methods->_data[0] >> $35 = (Method **) 0x7fffe0022430 >> (gdb) print 0x7fffe0022444 - 0x7fffe0022430 >> $32 = 20 > > I was confused about this too. 
> What we have here is pointer arithmetic, not regular arithmetic, so
> I'm assuming an unaligned value has to be adjusted before the actual
> difference is computed. So in practice:
>
> m - b->_methods
>
> is really
>
> adjusted_for_alignment(m) - b->_methods

It's not adjusted for alignment:

#include <stddef.h>

extern "C" int printf(const char *,...);
class Method {
  int i; int j; int k;
};

Method* array[10] = { new Method(), new Method(), new Method(), new Method(),
                      new Method(), new Method(), new Method(), new Method(),
                      new Method(), new Method() };

// Round-trips m through an index into array and reports whether the
// reconstructed address equals m again.
void test(Method** m) {
  printf("m is 0x%p ", m);
  ptrdiff_t idx = m - array;
  if (array + idx == m) {
    printf("true %ld\n", idx);
  } else {
    printf("false %ld\n", idx);
  }
}
int main() {
  Method** xx = &array[3];
  test(xx);                           // aligned element address
  test((Method**)(((char*)xx) - 2));  // same address moved back two bytes
}

cphilli% a.out
m is 0x0x601098 true 3
m is 0x0x601096 false 2

Coleen

>
> David
> -----
>
>> (gdb) print 20/8
>> $33 = 2
>>
>> if m is misaligned 0x7fffe0022444 the idx would be 2 and the expression
>> (b->_methods + idx) would evaluate to the aligned 0xfffe0022440 so not
>> equal m.
>>
>> But the code could check for misaligned m instead (or it would have
>> already crashed). I think all bets are off if the address space is
>> segmented.
>>
>> The comment Jeremy added is:
>>
>> if (b->_methods <= m && m < b->_methods +
>> b->_number_of_methods) {
>> // This is a bit of extra checking, for two reasons. One is
>> // that contains() deals with pointers that are passed in by
>> // JNI code, so making sure that the pointer is aligned
>> // correctly is valuable. The other is that <= and > are
>> // technically not defined on pointers, so the if guard can
>> // pass spuriously; no modern compiler is likely to make that
>> // a problem, though (and if one did, the guard could also
>> // fail spuriously, which would be bad).
>> ptrdiff_t idx = m - b->_methods; >> if (b->_methods + idx == m) { >> return true; >> } >> >> Coleen >>> >>> >>> Thanks, >>> Serguei >>> >>> ** >>>> >>>> Jeremy >>> >> From david.holmes at oracle.com Thu Nov 6 05:35:10 2014 From: david.holmes at oracle.com (David Holmes) Date: Thu, 06 Nov 2014 15:35:10 +1000 Subject: RFR 8062116: JVMTI GetClassMethods is Slow In-Reply-To: <545B00C9.1070502@oracle.com> References: <5457E36A.3020800@oracle.com> <54592FC2.7090406@oracle.com> <5459501C.4040807@oracle.com> <5459ADFB.4090808@oracle.com> <545AB561.9020204@oracle.com> <545ACAA4.3020906@oracle.com> <545AE45A.5080003@oracle.com> <545AE6D6.4040401@oracle.com> <545B00C9.1070502@oracle.com> Message-ID: <545B088E.20903@oracle.com> On 6/11/2014 3:02 PM, Coleen Phillimore wrote: > > David and Serguei (and Jeremy), see below. Summary: I think Jeremy's > code and comments are good. > > On 11/5/14, 10:11 PM, David Holmes wrote: >> On 6/11/2014 1:00 PM, Coleen Phillimore wrote: >>> >>> On 11/5/14, 8:11 PM, serguei.spitsyn at oracle.com wrote: >>>> >>>> On 11/5/14 4:35 PM, Jeremy Manson wrote: >>>>> >>>>> >>>>> On Wed, Nov 5, 2014 at 3:40 PM, Coleen Phillimore >>>>> > >>>>> wrote: >>>>> >>>>> >>>>> On 11/5/14, 6:13 PM, Jeremy Manson wrote: >>>>>> >>>>>> >>>>>> On Tue, Nov 4, 2014 at 8:56 PM, serguei.spitsyn at oracle.com >>>>>> >>>>> > wrote: >>>>>> >>>>>> The fix looks good in general. 
>>>>>> >>>>>> src/share/vm/oops/method.cpp >>>>>> >>>>>> 1785 bool contains(Method** m) { >>>>>> 1786 if (m == NULL) return false; >>>>>> 1787 for (JNIMethodBlockNode* b = &_head; b != NULL; b >>>>>> = b->_next) { >>>>>> 1788 if (b->_methods <= m && m < b->_methods + >>>>>> b->_number_of_methods) { >>>>>> *1789 ptrdiff_t idx = m - b->_methods;** >>>>>> **1790 if (b->_methods + idx == m) {** >>>>>> 1791 return true; >>>>>> 1792 }* >>>>>> 1793 } >>>>>> 1794 } >>>>>> 1795 return false; // not found >>>>>> 1796 } >>>>>> >>>>>> >>>>>> Just noticed that the lines 1789-1792 can be replaced with >>>>>> one liner: >>>>>> * return true;* >>>>>> >>>>>> >>>>>> Ah, you have found our crappy workaround for wild pointers to >>>>>> non-aligned places in the middle of _methods. >>>>> >>>>> Can you explain this? Why are there wild pointers? >>>>> >>>>> >>>>> My belief was that end user code could pass any old garbage to this >>>>> function. It's called by Method::is_method_id, which is called >>>>> by jniCheck::validate_jmethod_id. The idea was that this would help >>>>> check jni deliver useful information in the case of the end user >>>>> inputting garbage that happened to be in the right memory range. >>>>> >>>>> Having said that, at a second glance, it looks as if it that call is >>>>> protected by a call to is_method() (in checked_resolve_jmethod_id), >>>>> so the program will probably crash before it gets to this check. >>>>> >>>>> The other point about it was that the result of >= and < is >>>>> technically unspecified; if it were ever implemented as anything >>>>> other than a binary comparison between integers (which it never is, >>>>> now that no one has a segmented architecture), the comparison could >>>>> pass spuriously, so checking would be a good thing. Of course, the >>>>> comparison could fail spuriously, too. >>>>> >>>>> Anyway, I'm happy to leave it in as belt-and-suspenders (and add a >>>>> comment, obviously, since it has caused confusion), or take it out. 
>>>>> Your call. >>>> >>>> I'm still confused. >>>> >>>> How this code could possibly check anything? >>>> ptrdiff_t idx = m - b->_methods; >>>> if (b->_methods + idx == m) { >>>> >>>> The condition above always gives true: >>>> b->_methods + (idx) == b->_methods + (m - b->_methods) == >>>> (b->_methods- b->_methods) + m == (0 + m) == m >>>> >>>> Even if m was unaligned then at the end we compare m with m which is >>>> still true. >>>> Do I miss anything? >>> >>> If 'm' is unaligned we would fail this comparison: >>> >>> (gdb) print &methods->_data[2] >>> $34 = (Method **) 0x7fffe0022440 >>> (gdb) print &methods->_data[0] >>> $35 = (Method **) 0x7fffe0022430 >>> (gdb) print 0x7fffe0022444 - 0x7fffe0022430 >>> $32 = 20 >> >> I was confused about this too. What we have here is pointer >> arithmetic, not regular arithmetic, so I'm assuming an unaligned value >> has to be adjusted before the actual difference is computed. So in >> practice: >> >> m - b->_methods >> >> is really >> >> adjusted_for_alignment(m) - b->_methods > > It's not adjusted for alignment: Right - now I get it. Pointer difference is an algebraic subtraction with "div sizeof what is pointed to". For aligned pointers there will be no remainder and adding back the difference to the initial pointer will yield the end pointer. But if one of the pointers is not aligned that is not the case. All rather icky. 
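The "div sizeof" behaviour described above can be made explicit by doing the arithmetic on integer addresses rather than on pointers. What follows is only a sketch and not the HotSpot code: `Method` is a stand-in struct, `points_at_slot` is a hypothetical helper name, and the code assumes a flat address space in which converting a pointer to `uintptr_t` yields its byte address (the C++ standard leaves that mapping implementation-defined). It corresponds to the earlier suggestion that the code could test for a misaligned m directly.

```cpp
#include <cstddef>
#include <cstdint>

struct Method { int i, j, k; };  // stand-in type, not HotSpot's Method

// Hypothetical helper: true only if m is the exact address of one of the
// slots base[0] .. base[count - 1]. All comparisons are done on integer
// addresses, so no two unrelated pointers are ever subtracted or ordered.
static bool points_at_slot(Method** base, std::size_t count, const void* m) {
  const std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(base);
  const std::uintptr_t p  = reinterpret_cast<std::uintptr_t>(m);
  if (p < lo) {
    return false;                       // below the block
  }
  const std::uintptr_t offset = p - lo; // byte distance from the first slot
  if (offset >= count * sizeof(Method*)) {
    return false;                       // at or past the end of the block
  }
  // Pointer subtraction would divide this offset by sizeof(Method*) and
  // drop the remainder; a non-zero remainder is exactly the misaligned case.
  return offset % sizeof(Method*) == 0;
}
```

For a 10-slot array `slots`, `points_at_slot(slots, 10, &slots[3])` is true, while the same address moved back two bytes, or the one-past-the-end address `slots + 10`, is rejected.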
Thanks, David ---- > #include > > extern "C" int printf(const char *,...); > class Method { > int i ; int j; int k; > }; > > Method* array[10] = { new Method(),new Method(),new Method(),new > Method(),new Method(),n > ew Method(),new Method(),new Method(),new Method(),new Method() }; > > void test(Method** m) { > printf("m is 0x%p ", m); > ptrdiff_t idx = m - array; > if (array + idx == m) { > printf("true %ld\n", idx); > } else { > printf("false %ld\n", idx); > } > } > main() { > Method** xx = &array[3]; > test(xx); > test((Method**)(((char*)xx) - 2)); > } > > cphilli% a.out > m is 0x0x601098 true 3 > m is 0x0x601096 false 2 > > > Coleen > >> >> David >> ----- >> >>> (gdb) print 20/8 >>> $33 = 2 >>> >>> if m is misaligned 0x7fffe0022444 the idx would be 2 and the expression >>> (b->_methods + idx) would evaluate to the aligned 0xfffe0022440 so not >>> equal m. >>> >>> But the code could check for misaligned m instead (or it would have >>> already crashed). I think all bets are off if the address space is >>> segmented. >>> >>> The comment Jeremy added is: >>> >>> if (b->_methods <= m && m < b->_methods + >>> b->_number_of_methods) { >>> // This is a bit of extra checking, for two reasons. One is >>> // that contains() deals with pointers that are passed in by >>> // JNI code, so making sure that the pointer is aligned >>> // correctly is valuable. The other is that <= and > are >>> // technically not defined on pointers, so the if guard can >>> // pass spuriously; no modern compiler is likely to make that >>> // a problem, though (and if one did, the guard could also >>> // fail spuriously, which would be bad). 
>>> ptrdiff_t idx = m - b->_methods; >>> if (b->_methods + idx == m) { >>> return true; >>> } >>> >>> Coleen >>>> >>>> >>>> Thanks, >>>> Serguei >>>> >>>> ** >>>>> >>>>> Jeremy >>>> >>> > From jeremymanson at google.com Thu Nov 6 05:44:53 2014 From: jeremymanson at google.com (Jeremy Manson) Date: Wed, 5 Nov 2014 21:44:53 -0800 Subject: RFR 8062116: JVMTI GetClassMethods is Slow In-Reply-To: <545B088E.20903@oracle.com> References: <5457E36A.3020800@oracle.com> <54592FC2.7090406@oracle.com> <5459501C.4040807@oracle.com> <5459ADFB.4090808@oracle.com> <545AB561.9020204@oracle.com> <545ACAA4.3020906@oracle.com> <545AE45A.5080003@oracle.com> <545AE6D6.4040401@oracle.com> <545B00C9.1070502@oracle.com> <545B088E.20903@oracle.com> Message-ID: Wow, go take care of my toddler for a few hours, come back, and all the questions are answered for me! Thanks, Coleen. To be fair, the original code was actually correct (instead of, you know, implementation-dependent-correct), so I feel a little weird about the whole thing. Jeremy On Wed, Nov 5, 2014 at 9:35 PM, David Holmes wrote: > On 6/11/2014 3:02 PM, Coleen Phillimore wrote: > >> >> David and Serguei (and Jeremy), see below. Summary: I think Jeremy's >> code and comments are good. >> >> On 11/5/14, 10:11 PM, David Holmes wrote: >> >>> On 6/11/2014 1:00 PM, Coleen Phillimore wrote: >>> >>>> >>>> On 11/5/14, 8:11 PM, serguei.spitsyn at oracle.com wrote: >>>> >>>>> >>>>> On 11/5/14 4:35 PM, Jeremy Manson wrote: >>>>> >>>>>> >>>>>> >>>>>> On Wed, Nov 5, 2014 at 3:40 PM, Coleen Phillimore >>>>>> > >>>>>> wrote: >>>>>> >>>>>> >>>>>> On 11/5/14, 6:13 PM, Jeremy Manson wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Tue, Nov 4, 2014 at 8:56 PM, serguei.spitsyn at oracle.com >>>>>>> >>>>>> > wrote: >>>>>>> >>>>>>> The fix looks good in general. 
>>>>>>> >>>>>>> src/share/vm/oops/method.cpp >>>>>>> >>>>>>> 1785 bool contains(Method** m) { >>>>>>> 1786 if (m == NULL) return false; >>>>>>> 1787 for (JNIMethodBlockNode* b = &_head; b != NULL; b >>>>>>> = b->_next) { >>>>>>> 1788 if (b->_methods <= m && m < b->_methods + >>>>>>> b->_number_of_methods) { >>>>>>> *1789 ptrdiff_t idx = m - b->_methods;** >>>>>>> **1790 if (b->_methods + idx == m) {** >>>>>>> 1791 return true; >>>>>>> 1792 }* >>>>>>> 1793 } >>>>>>> 1794 } >>>>>>> 1795 return false; // not found >>>>>>> 1796 } >>>>>>> >>>>>>> >>>>>>> Just noticed that the lines 1789-1792 can be replaced with >>>>>>> one liner: >>>>>>> * return true;* >>>>>>> >>>>>>> >>>>>>> Ah, you have found our crappy workaround for wild pointers to >>>>>>> non-aligned places in the middle of _methods. >>>>>>> >>>>>> >>>>>> Can you explain this? Why are there wild pointers? >>>>>> >>>>>> >>>>>> My belief was that end user code could pass any old garbage to this >>>>>> function. It's called by Method::is_method_id, which is called >>>>>> by jniCheck::validate_jmethod_id. The idea was that this would help >>>>>> check jni deliver useful information in the case of the end user >>>>>> inputting garbage that happened to be in the right memory range. >>>>>> >>>>>> Having said that, at a second glance, it looks as if it that call is >>>>>> protected by a call to is_method() (in checked_resolve_jmethod_id), >>>>>> so the program will probably crash before it gets to this check. >>>>>> >>>>>> The other point about it was that the result of >= and < is >>>>>> technically unspecified; if it were ever implemented as anything >>>>>> other than a binary comparison between integers (which it never is, >>>>>> now that no one has a segmented architecture), the comparison could >>>>>> pass spuriously, so checking would be a good thing. Of course, the >>>>>> comparison could fail spuriously, too. 
>>>>>> >>>>>> Anyway, I'm happy to leave it in as belt-and-suspenders (and add a >>>>>> comment, obviously, since it has caused confusion), or take it out. >>>>>> Your call. >>>>>> >>>>> >>>>> I'm still confused. >>>>> >>>>> How this code could possibly check anything? >>>>> ptrdiff_t idx = m - b->_methods; >>>>> if (b->_methods + idx == m) { >>>>> >>>>> The condition above always gives true: >>>>> b->_methods + (idx) == b->_methods + (m - b->_methods) == >>>>> (b->_methods- b->_methods) + m == (0 + m) == m >>>>> >>>>> Even if m was unaligned then at the end we compare m with m which is >>>>> still true. >>>>> Do I miss anything? >>>>> >>>> >>>> If 'm' is unaligned we would fail this comparison: >>>> >>>> (gdb) print &methods->_data[2] >>>> $34 = (Method **) 0x7fffe0022440 >>>> (gdb) print &methods->_data[0] >>>> $35 = (Method **) 0x7fffe0022430 >>>> (gdb) print 0x7fffe0022444 - 0x7fffe0022430 >>>> $32 = 20 >>>> >>> >>> I was confused about this too. What we have here is pointer >>> arithmetic, not regular arithmetic, so I'm assuming an unaligned value >>> has to be adjusted before the actual difference is computed. So in >>> practice: >>> >>> m - b->_methods >>> >>> is really >>> >>> adjusted_for_alignment(m) - b->_methods >>> >> >> It's not adjusted for alignment: >> > > Right - now I get it. Pointer difference is an algebraic subtraction with > "div sizeof what is pointed to". For aligned pointers there will be no > remainder and adding back the difference to the initial pointer will yield > the end pointer. But if one of the pointers is not aligned that is not the > case. > > All rather icky. 
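The pointer-difference behaviour discussed here can also be written as an explicit range-and-alignment check on integer addresses. What follows is only a sketch under stated assumptions: `Block` and `block_contains` are hypothetical stand-ins invented for illustration, not the HotSpot `JNIMethodBlockNode` code. It converts the pointers to `std::uintptr_t` so that no subtraction is performed on a possibly unaligned pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Method { int i; int j; int k; };

// Hypothetical stand-in for the storage of one method block.
struct Block {
  Method** methods;               // first slot of the block
  std::size_t number_of_methods;  // number of Method* slots
};

// Returns true only if m points exactly at one of the slots in b.
bool block_contains(const Block& b, Method** m) {
  assert(b.methods != nullptr);
  if (m == nullptr) return false;
  const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(b.methods);
  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(m);
  const std::uintptr_t bytes = b.number_of_methods * sizeof(Method*);
  if (addr < base || addr - base >= bytes) return false;  // outside the block
  return (addr - base) % sizeof(Method*) == 0;            // on a slot boundary
}
```

With this shape the alignment test is stated directly (a remainder check) rather than falling out of the round trip through `ptrdiff_t`.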
>
> Thanks,
> David
> ----
>
>
> #include
>>
>> extern "C" int printf(const char *,...);
>> class Method {
>>   int i ; int j; int k;
>> };
>>
>> Method* array[10] = { new Method(),new Method(),new Method(),new Method(),new Method(),
>>   new Method(),new Method(),new Method(),new Method(),new Method() };
>>
>> void test(Method** m) {
>>   printf("m is 0x%p ", m);
>>   ptrdiff_t idx = m - array;
>>   if (array + idx == m) {
>>     printf("true %ld\n", idx);
>>   } else {
>>     printf("false %ld\n", idx);
>>   }
>> }
>> main() {
>>   Method** xx = &array[3];
>>   test(xx);
>>   test((Method**)(((char*)xx) - 2));
>> }
>>
>> cphilli% a.out
>> m is 0x0x601098 true 3
>> m is 0x0x601096 false 2
>>
>>
>> Coleen
>>
>>
>>> David
>>> -----
>>>
>>> (gdb) print 20/8
>>>> $33 = 2
>>>>
>>>> if m is misaligned 0x7fffe0022444 the idx would be 2 and the expression
>>>> (b->_methods + idx) would evaluate to the aligned 0x7fffe0022440 so not
>>>> equal m.
>>>>
>>>> But the code could check for misaligned m instead (or it would have
>>>> already crashed). I think all bets are off if the address space is
>>>> segmented.
>>>>
>>>> The comment Jeremy added is:
>>>>
>>>>   if (b->_methods <= m && m < b->_methods + b->_number_of_methods) {
>>>>     // This is a bit of extra checking, for two reasons. One is
>>>>     // that contains() deals with pointers that are passed in by
>>>>     // JNI code, so making sure that the pointer is aligned
>>>>     // correctly is valuable. The other is that <= and > are
>>>>     // technically not defined on pointers, so the if guard can
>>>>     // pass spuriously; no modern compiler is likely to make that
>>>>     // a problem, though (and if one did, the guard could also
>>>>     // fail spuriously, which would be bad).
>>>> ptrdiff_t idx = m - b->_methods; >>>> if (b->_methods + idx == m) { >>>> return true; >>>> } >>>> >>>> Coleen >>>> >>>>> >>>>> >>>>> Thanks, >>>>> Serguei >>>>> >>>>> ** >>>>> >>>>>> >>>>>> Jeremy >>>>>> >>>>> >>>>> >>>> >> From david.simms at oracle.com Thu Nov 6 07:46:01 2014 From: david.simms at oracle.com (David Simms) Date: Thu, 06 Nov 2014 08:46:01 +0100 Subject: RFR 8058715: stability issues when being launched as an embedded JVM via JNI In-Reply-To: <545A738D.2080201@oracle.com> References: <545A738D.2080201@oracle.com> Message-ID: <545B2739.4000807@oracle.com> Patch looks good David. Cheers On 2014-11-05 19:59, david buck wrote: > Hi! > > This is a request for code review of my fix for jdk8058715 > > BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058715 > WEBREV: http://cr.openjdk.java.net/~dbuck/8058715/webrev.01/ > > We have also received confirmation from the original reporter of the > issue that this solution resolves the crashes they were seeing in > their environment. I have tested that this change does not break the > original NX bug workaround. I also ran the NX bug reproducer (v8 > benchmark of Nashorn running in a loop) using a fastdebug build with > the -XX:NativeMemoryTracking=summary option. Obviously no crashes or > other issues were detected. 
> > Cheers, > -Buck From roland.westrelin at oracle.com Thu Nov 6 09:53:17 2014 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Thu, 6 Nov 2014 10:53:17 +0100 Subject: [9] RFR (S): 8060147: SIGSEGV in Metadata::mark_on_stack() while marking metadata in ciEnv In-Reply-To: <545A5814.8000109@oracle.com> References: <5450F261.60400@oracle.com> <545114DF.7040005@oracle.com> <54511744.4060904@oracle.com> <5451F43A.1010108@oracle.com> <5452128C.4090408@oracle.com> <54522805.5040701@oracle.com> <1421AC31-CBEA-4AED-A657-3C43B258A6DB@oracle.com> <54522357.4070705@oracle.com> <5452425D.7040405@oracle.com> <5452517C.4050104@oracle.com> <54527E1E.1070507@oracle.com> <5452A786.7030000@oracle.com> <5452A077.2050903@oracle.com> <545A5F59.2020907@oracle.com> <545A5814.8000109@oracle.com> Message-ID: > VladimirK, Roland, what do you think about (1)? Looks ok to me. Roland. From david.buck at oracle.com Thu Nov 6 10:21:38 2014 From: david.buck at oracle.com (david buck) Date: Thu, 06 Nov 2014 19:21:38 +0900 Subject: [8u40] RFR backport 8058715: stability issues when being launched as an embedded JVM via JNI Message-ID: <545B4BB2.9020701@oracle.com> Hi! This is a request for approval to backport this fix to jdk8. The jdk9 change applies cleanly and I have already built and tested on 8. BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058715 JDK9 changeset: http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/6748f6322b92 Cheers, -Buck From david.holmes at oracle.com Thu Nov 6 10:32:06 2014 From: david.holmes at oracle.com (David Holmes) Date: Thu, 06 Nov 2014 20:32:06 +1000 Subject: [8u40] RFR backport 8058715: stability issues when being launched as an embedded JVM via JNI In-Reply-To: <545B4BB2.9020701@oracle.com> References: <545B4BB2.9020701@oracle.com> Message-ID: <545B4E26.9030807@oracle.com> Approved. David H. On 6/11/2014 8:21 PM, david buck wrote: > Hi! > > This is a request for approval to backport this fix to jdk8. 
The jdk9 > change applies cleanly and I have already built and tested on 8. > > BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058715 > JDK9 changeset: > http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/6748f6322b92 > > Cheers, > -Buck From martin.doerr at sap.com Thu Nov 6 10:40:24 2014 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 6 Nov 2014 10:40:24 +0000 Subject: RFR (XS): 8062950: Bug in locking code when UseOptoBiasInlining is disabled: assert(dmw->is_neutral()) failed: invariant In-Reply-To: <545AB32E.8070402@oracle.com> References: <7C9B87B351A4BA4AA9EC95BB418116566ACE597E@DEWDFEMB19C.global.corp.sap> <545AB32E.8070402@oracle.com> Message-ID: <7C9B87B351A4BA4AA9EC95BB418116566ACE6BEC@DEWDFEMB19C.global.corp.sap> Hi Vladimir, thanks for replying quickly. Are you sure you want the swap_reg_contains_mark flag to get removed? There's a TODO in front of the changed line: // TODO: optimize away redundant LDs of obj->mark and improve the markword triage // order to reduce the number of conditional branches in the most common cases. // Beware -- there's a subtle invariant that fetch of the markword // at [FETCH], below, will never observe a biased encoding (*101b). // If this invariant is not held we risk exclusion (safety) failure. So I'm not sure if the flag may be useful again when somebody works on this TODO. Best regards, Martin -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Donnerstag, 6. November 2014 00:31 To: Christian Thalinger; Doerr, Martin Cc: hotspot-compiler-dev at openjdk.java.net; Hotspot dev runtime Subject: Re: RFR (XS): 8062950: Bug in locking code when UseOptoBiasInlining is disabled: assert(dmw->is_neutral()) failed: invariant It is our (Compiler group) code. This problem was introduced with my changes for RTM locking. Martin your changes are good. But you cleanup a bit this code since we now never put markword to tmpReg before this call? 
Thanks,
Vladimir

On 11/5/14 3:13 PM, Christian Thalinger wrote:
> I'm not exactly sure who is our biased locking expert these days but I guess it's someone from the runtime team. CC'ing them.
>
>> On Nov 5, 2014, at 7:38 AM, Doerr, Martin wrote:
>>
>> Hi,
>>
>> we found a bug in MacroAssembler::fast_lock on x86 which shows up when UseOptoBiasInlining is switched off.
>> The problem is that biased_locking_enter is used with swap_reg_contains_mark==true, which is no longer correct after biased_locking_enter was put in front of check for IsInflated.
>>
>> Please review
>> http://cr.openjdk.java.net/~goetz/webrevs/8062950-lockBug/webrev.00/
>>
>> Best regards,
>> Martin
>

From aleksey.shipilev at oracle.com Thu Nov 6 13:00:38 2014
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Thu, 06 Nov 2014 16:00:38 +0300
Subject: RFR (XS) JDK-8015272: Make @Contended within the same group to use the same oop map
In-Reply-To: <525B0A18.8000105@oracle.com>
References: <525AC628.4020906@oracle.com> <009001cec83e$5e9c6b40$1bd541c0$@oracle.com> <525B0A18.8000105@oracle.com>
Message-ID: <545B70F6.60801@oracle.com>

Hi,

The Halloween is over, but here is a creepy undead patch from the past.
  http://cr.openjdk.java.net/~shade/8015272/webrev.01/
  https://bugs.openjdk.java.net/browse/JDK-8015272

8u does not need the patch, but it would be nice to have it in 9.

I have checked:
 * Still the same chunk of code as for non-@Contended cases
 * Still builds fine on Linux x86_64 fastdebug/release
 * Still passes all hotspot/test/runtime jtreg tests
 * Still passes the JPRT

Thanks,
-Aleksey.

On 10/14/2013 01:01 AM, Aleksey Shipilev wrote:
> Hi Christian,
>
> Your call. I'm merely announcing the patch is ready. :)
>
> -Aleksey.
>
> On 10/13/2013 10:02 PM, Christian Tornqvist wrote:
>> Hi Aleksey
>>
>> We're well past Feature Complete (May 23) and Zero Bug Bounce is around the
>> corner, this is not the time to push enhancements. In my opinion this should
>> be done in the next 8u or in jdk9.
>>
>> Thanks,
>> Christian
>>
>> -----Original Message-----
>> From: hotspot-runtime-dev-bounces at openjdk.java.net
>> [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of Aleksey
>> Shipilev
>> Sent: Sunday, October 13, 2013 12:11 PM
>> To: hotspot-runtime-dev at openjdk.java.net
>> Subject: RFR (XS) JDK-8015272: Make @Contended within the same group to use
>> the same oop map
>>
>> Hi,
>>
>> Please review the simple improvement:
>> http://cr.openjdk.java.net/~shade/8015272/webrev.00/
>>
>> I have copy-pasted the same block from the non-@Contended handling, because
>> it is generic for both cases. The change is also on the path which is
>> exercised only with @Contended with the same tag. Both @Contended
>> regression tests cover this case, as well as j.l.Thread containing
>> @Contended over the TLR state implicitly tests it in every VM run.
>>
>> Testing:
>> - Linux x86_64 fastdebug/release: all hotspot/test/runtime jtreg
>> - JPRT full cycle against hotspot-rt
>> - vm.quick (running)
>>
>> -Aleksey.
>>
>

From coleen.phillimore at oracle.com Thu Nov 6 13:49:12 2014
From: coleen.phillimore at oracle.com (Coleen Phillimore)
Date: Thu, 06 Nov 2014 08:49:12 -0500
Subject: RFR 8062116: JVMTI GetClassMethods is Slow
In-Reply-To: <545B088E.20903@oracle.com>
References: <5457E36A.3020800@oracle.com> <54592FC2.7090406@oracle.com> <5459501C.4040807@oracle.com> <5459ADFB.4090808@oracle.com> <545AB561.9020204@oracle.com> <545ACAA4.3020906@oracle.com> <545AE45A.5080003@oracle.com> <545AE6D6.4040401@oracle.com> <545B00C9.1070502@oracle.com> <545B088E.20903@oracle.com>
Message-ID: <545B7C58.8070404@oracle.com>

David, you didn't recommend taking the code out, because it looked like
something that would trick people, so we'll leave it in. It's benign.
The rest of the change improves performance, which we want.

Thanks,
Coleen

On 11/6/14, 12:35 AM, David Holmes wrote:
> Right - now I get it. Pointer difference is an algebraic subtraction
> with "div sizeof what is pointed to". For aligned pointers there will
> be no remainder and adding back the difference to the initial pointer
> will yield the end pointer. But if one of the pointers is not aligned
> that is not the case.
>
> All rather icky.
>
> Thanks,
> David
> ----

From karen.kinnear at oracle.com Thu Nov 6 15:01:30 2014
From: karen.kinnear at oracle.com (Karen Kinnear)
Date: Thu, 6 Nov 2014 10:01:30 -0500
Subject: RFR (XS) JDK-8015272: Make @Contended within the same group to use the same oop map
In-Reply-To: <545B70F6.60801@oracle.com>
References: <525AC628.4020906@oracle.com> <009001cec83e$5e9c6b40$1bd541c0$@oracle.com> <525B0A18.8000105@oracle.com> <545B70F6.60801@oracle.com>
Message-ID: <7029DBDA-BBE6-420A-BE6E-2EA28B60B560@oracle.com>

I agree with Christian that it is too late for jdk8u.

Could you please do additional testing
 - e.g. what about the vmtestbase vm/runtime/contended tests (and yes, some tests should be removed from that testlist)
 - vmtestbase: vm.quick.testlist (required for runtime changes)
 - and since @Contended annotation is used in the JDK core libraries - the jtreg jdk tests?

Does @Contended sometimes run into platform-specific bugs? Looking through earlier bugtails
I see bugs only filed against specific platforms, but it is not clear to me if the bugs also were seen
on other platforms and not recorded.

So the question is - is this a feature that needs testing on multiple platforms?

thanks,
Karen

On Nov 6, 2014, at 8:00 AM, Aleksey Shipilev wrote:

> Hi,
>
> The Halloween is over, but here is a creepy undead patch from the past.
>   http://cr.openjdk.java.net/~shade/8015272/webrev.01/
>   https://bugs.openjdk.java.net/browse/JDK-8015272
>
> 8u does not need the patch, but it would be nice to have it in 9.
>
> I have checked:
>  * Still the same chunk of code as for non-@Contended cases
>  * Still builds fine on Linux x86_64 fastdebug/release
>  * Still passes all hotspot/test/runtime jtreg tests
>  * Still passes the JPRT
>
> Thanks,
> -Aleksey.
>
> On 10/14/2013 01:01 AM, Aleksey Shipilev wrote:
>> Hi Christian,
>>
>> Your call. I'm merely announcing the patch is ready. :)
>>
>> -Aleksey.
>>
>> On 10/13/2013 10:02 PM, Christian Tornqvist wrote:
>>> Hi Aleksey
>>>
>>> We're well past Feature Complete (May 23) and Zero Bug Bounce is around the
>>> corner, this is not the time to push enhancements. In my opinion this should
>>> be done in the next 8u or in jdk9.
>>>
>>> Thanks,
>>> Christian
>>>
>>> -----Original Message-----
>>> From: hotspot-runtime-dev-bounces at openjdk.java.net
>>> [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of Aleksey
>>> Shipilev
>>> Sent: Sunday, October 13, 2013 12:11 PM
>>> To: hotspot-runtime-dev at openjdk.java.net
>>> Subject: RFR (XS) JDK-8015272: Make @Contended within the same group to use
>>> the same oop map
>>>
>>> Hi,
>>>
>>> Please review the simple improvement:
>>> http://cr.openjdk.java.net/~shade/8015272/webrev.00/
>>>
>>> I have copy-pasted the same block from the non-@Contended handling, because
>>> it is generic for both cases. The change is also on the path which is
>>> exercised only with @Contended with the same tag. Both @Contended
>>> regression tests cover this case, as well as j.l.Thread containing
>>> @Contended over the TLR state implicitly tests it in every VM run.
>>>
>>> Testing:
>>> - Linux x86_64 fastdebug/release: all hotspot/test/runtime jtreg
>>> - JPRT full cycle against hotspot-rt
>>> - vm.quick (running)
>>>
>>> -Aleksey.
>>>
>>
>
>

From aleksey.shipilev at oracle.com Thu Nov 6 16:07:28 2014
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Thu, 06 Nov 2014 19:07:28 +0300
Subject: RFR (XS) JDK-8015272: Make @Contended within the same group to use the same oop map
In-Reply-To: <7029DBDA-BBE6-420A-BE6E-2EA28B60B560@oracle.com>
References: <525AC628.4020906@oracle.com> <009001cec83e$5e9c6b40$1bd541c0$@oracle.com> <525B0A18.8000105@oracle.com> <545B70F6.60801@oracle.com> <7029DBDA-BBE6-420A-BE6E-2EA28B60B560@oracle.com>
Message-ID: <545B9CC0.3080106@oracle.com>

Hi Karen,

Thanks for looking into this.

On 11/06/2014 06:01 PM, Karen Kinnear wrote:
> I agree with Christian that it is too late for jdk8u.

...and I am not advocating for the inclusion to jdk8u, as you can see in
my today's message. This minor cleanup may be done in jdk9 only,
therefore jdk8u schedule does not apply.

> Could you please do additional testing

Sure, that would take a while. I'm not sure why we need to burn time on
this trivial change, since the code is copied 1:1 from the same
well-exercised codepath for non-@Contended oops, and additionally
exercised by runtime jtreg tests. Anyhow:

> - e.g. what about the vmtestbase vm/runtime/contended tests (and yes, some tests should be removed from that testlist)

Submitted the test job, no progress yet.

> - vmtestbase: vm.quick.testlist (required for runtime changes)

Submitted the test job, no progress yet.

> - and since @Contended annotation is used in the JDK core libraries - the jtreg jdk tests?

There are two groups of @Contended users: java/lang/Thread and
java/util/concurrent/*. jdk/test/java/lang and
jdk/test/java/util/concurrent jtreg tests yield no new failures on my
Linux x86_64/fastdebug. The testing jobs submitted above should run them
on all platforms.

> Does @Contended sometimes run into platform-specific bugs? Looking through earlier bugtails
> I see bugs only filed against specific platforms, but it is not clear to me if the bugs also were seen
> on other platforms and not recorded.

@Contended handling code is platform-agnostic, we haven't seen the
platform-specific bugs there.

> So the question is - is this a feature that needs testing on multiple platforms?

No, I don't think so.

Thanks,
-Aleksey.

> thanks,
> Karen
>
> On Nov 6, 2014, at 8:00 AM, Aleksey Shipilev wrote:
>
>> Hi,
>>
>> The Halloween is over, but here is a creepy undead patch from the past.
>>   http://cr.openjdk.java.net/~shade/8015272/webrev.01/
>>   https://bugs.openjdk.java.net/browse/JDK-8015272
>>
>> 8u does not need the patch, but it would be nice to have it in 9.
>>
>> I have checked:
>>  * Still the same chunk of code as for non-@Contended cases
>>  * Still builds fine on Linux x86_64 fastdebug/release
>>  * Still passes all hotspot/test/runtime jtreg tests
>>  * Still passes the JPRT
>>
>> Thanks,
>> -Aleksey.
>>
>> On 10/14/2013 01:01 AM, Aleksey Shipilev wrote:
>>> Hi Christian,
>>>
>>> Your call. I'm merely announcing the patch is ready. :)
>>>
>>> -Aleksey.
>>>
>>> On 10/13/2013 10:02 PM, Christian Tornqvist wrote:
>>>> Hi Aleksey
>>>>
>>>> We're well past Feature Complete (May 23) and Zero Bug Bounce is around the
>>>> corner, this is not the time to push enhancements. In my opinion this should
>>>> be done in the next 8u or in jdk9.
>>>>
>>>> Thanks,
>>>> Christian
>>>>
>>>> -----Original Message-----
>>>> From: hotspot-runtime-dev-bounces at openjdk.java.net
>>>> [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of Aleksey
>>>> Shipilev
>>>> Sent: Sunday, October 13, 2013 12:11 PM
>>>> To: hotspot-runtime-dev at openjdk.java.net
>>>> Subject: RFR (XS) JDK-8015272: Make @Contended within the same group to use
>>>> the same oop map
>>>>
>>>> Hi,
>>>>
>>>> Please review the simple improvement:
>>>> http://cr.openjdk.java.net/~shade/8015272/webrev.00/
>>>>
>>>> I have copy-pasted the same block from the non-@Contended handling, because
>>>> it is generic for both cases. The change is also on the path which is
>>>> exercised only with @Contended with the same tag. Both @Contended
>>>> regression tests cover this case, as well as j.l.Thread containing
>>>> @Contended over the TLR state implicitly tests it in every VM run.
>>>>
>>>> Testing:
>>>> - Linux x86_64 fastdebug/release: all hotspot/test/runtime jtreg
>>>> - JPRT full cycle against hotspot-rt
>>>> - vm.quick (running)
>>>>
>>>> -Aleksey.
>>>>
>>>
>>
>>
>

From andreas.eriksson at oracle.com Thu Nov 6 16:38:25 2014
From: andreas.eriksson at oracle.com (Andreas Eriksson)
Date: Thu, 06 Nov 2014 17:38:25 +0100
Subject: RFR: JDK7 backport of 8020675 - invalid jar file in the bootclasspath could lead to jvm fatal error
In-Reply-To: <545B797D.70907@oracle.com>
References: <545B797D.70907@oracle.com>
Message-ID: <545BA401.1070205@oracle.com>

Hi,

Could someone please review this jdk7 backport of JDK-8020675.

Summary:
invalid jar file in the bootclasspath could lead to jvm fatal error
removed offending EXCEPTION_MARK calls and code cleanup

One code change necessary for the backport was in method ClassLoader::load_classfile.
The change was to use CHECK_(instanceKlassHandle()) instead of CHECK_NULL.
See the mail thread at http://mail.openjdk.java.net/pipermail/hotspot-dev/2014-November/015825.html for more information.
Webrev: http://cr.openjdk.java.net/~aeriksso/8020675/webrev.00/ Regards, Andreas From daniel.daugherty at oracle.com Thu Nov 6 18:01:45 2014 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Thu, 06 Nov 2014 11:01:45 -0700 Subject: RFR(S) Contended Locking cleanup bucket (8062851) In-Reply-To: <545ABDF1.6050107@oracle.com> References: <5459A8ED.8060808@oracle.com> <545A4719.50705@oracle.com> <545ABDF1.6050107@oracle.com> Message-ID: <545BB789.2050906@oracle.com> On 11/5/14 5:16 PM, David Holmes wrote: > On 6/11/2014 1:49 AM, Claes Redestad wrote: >> Hi, >> >> On 11/05/2014 05:34 AM, Daniel D. Daugherty wrote: >>> Greetings, >>> >>> I have a Contended Locking cleanup bucket fix ready for review. >>> >>> This fix was spun off from the Contended Locking fast enter bucket >>> which was sent out for review late last week. This fix cleans up >>> the computation of ObjectMonitor field pointers and gets rid of >>> the use of literal '-2' in appropriate places. For example: >>> >>> - ld_ptr(Rmark, ObjectMonitor::owner_offset_in_bytes() - 2, >>> Rscratch); >>> + ld_ptr(Rmark, OM_OFFSET_NO_MONITOR_VALUE(owner), Rscratch); >>> >>> The OM_OFFSET_NO_MONITOR_VALUE macro computes the offset to the >>> specified field and subtracts markOopDesc:monitor_value (2). >>> There's a nice comment in src/share/vm/runtime/objectMonitor.hpp. >> >> any reason not to add it as a function in objectMonitor.hpp instead of a >> macro? How about: >> >> static int no_monitor_offset_in_bytes() { return >> offset_of(ObjectMonitor, _owner) - markOopDesc::monitor_value; } > > _owner is not the only field used so you would need a function for > each one. David, thanks for jumping in on this part of the review thread. I ended up being off-the-air yesterday from mid-morning on. Claes, David is correct that your suggestion would require a function for each field that we use currently and for completeness should have a function for each field that has an offset_in_bytes() function. 
We have 12 fields in ObjectMonitor for which we provide an offset_in_bytes() so I don't think we want to go down that route. Dan > > David > ----- > >> Example usage: >> >> - ld_ptr(Rmark, ObjectMonitor::owner_offset_in_bytes() - 2, >> Rscratch); >> + ld_ptr(Rmark, ObjectMonitor::no_monitor_offset_in_bytes(), >> Rscratch); >> >> >> Seems this should be inlined regardless and looks a bit cleaner to me. >> >> Thanks! >> >> /Claes >> >>> >>> Thanks to David Holmes for his comments on JDK-8061553 that >>> motivated this (long overdue) cleanup. >>> >>> This work is being tracked by the following bug ID: >>> >>> JDK-8062851 cleanup ObjectMonitor offset adjustments >>> https://bugs.openjdk.java.net/browse/JDK-8062851 >>> >>> Here is the webrev URL: >>> >>> http://cr.openjdk.java.net/~dcubed/8062851-webrev/0-jdk9-hs-rt/ >>> >>> Here is the JEP link: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8046133 >>> >>> Testing: >>> >>> - JPRT test jobs (since this is only syntax and comment cleanup) >>> >>> Thanks, in advance, for any comments, questions or suggestions. >>> >>> Dan >> > From daniel.daugherty at oracle.com Thu Nov 6 18:02:35 2014 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Thu, 06 Nov 2014 11:02:35 -0700 Subject: RFR(S) Contended Locking cleanup bucket (8062851) In-Reply-To: <545A4719.50705@oracle.com> References: <5459A8ED.8060808@oracle.com> <545A4719.50705@oracle.com> Message-ID: <545BB7BB.6050202@oracle.com> On 11/5/14 8:49 AM, Claes Redestad wrote: > Hi, > > On 11/05/2014 05:34 AM, Daniel D. Daugherty wrote: >> Greetings, >> >> I have a Contended Locking cleanup bucket fix ready for review. >> >> This fix was spun off from the Contended Locking fast enter bucket >> which was sent out for review late last week. This fix cleans up >> the computation of ObjectMonitor field pointers and gets rid of >> the use of literal '-2' in appropriate places. 
For example: >> >> - ld_ptr(Rmark, ObjectMonitor::owner_offset_in_bytes() - 2, >> Rscratch); >> + ld_ptr(Rmark, OM_OFFSET_NO_MONITOR_VALUE(owner), Rscratch); >> >> The OM_OFFSET_NO_MONITOR_VALUE macro computes the offset to the >> specified field and subtracts markOopDesc:monitor_value (2). >> There's a nice comment in src/share/vm/runtime/objectMonitor.hpp. > > any reason not to add it as a function in objectMonitor.hpp instead of > a macro? How about: > > static int no_monitor_offset_in_bytes() { return > offset_of(ObjectMonitor, _owner) - markOopDesc::monitor_value; } > > Example usage: > > - ld_ptr(Rmark, ObjectMonitor::owner_offset_in_bytes() - 2, > Rscratch); > + ld_ptr(Rmark, ObjectMonitor::no_monitor_offset_in_bytes(), > Rscratch); > > > Seems this should be inlined regardless and looks a bit cleaner to me. Claes, thanks for reviewing! Please see my reply to David H where I pickup your comments (and hopefully resolve them). Dan > > Thanks! > > /Claes > >> >> Thanks to David Holmes for his comments on JDK-8061553 that >> motivated this (long overdue) cleanup. >> >> This work is being tracked by the following bug ID: >> >> JDK-8062851 cleanup ObjectMonitor offset adjustments >> https://bugs.openjdk.java.net/browse/JDK-8062851 >> >> Here is the webrev URL: >> >> http://cr.openjdk.java.net/~dcubed/8062851-webrev/0-jdk9-hs-rt/ >> >> Here is the JEP link: >> >> https://bugs.openjdk.java.net/browse/JDK-8046133 >> >> Testing: >> >> - JPRT test jobs (since this is only syntax and comment cleanup) >> >> Thanks, in advance, for any comments, questions or suggestions. >> >> Dan > > From daniel.daugherty at oracle.com Thu Nov 6 18:10:14 2014 From: daniel.daugherty at oracle.com (Daniel D. 
Daugherty) Date: Thu, 06 Nov 2014 11:10:14 -0700 Subject: RFR(S) Contended Locking cleanup bucket (8062851) In-Reply-To: <545A6E83.8060909@oracle.com> References: <5459A8ED.8060808@oracle.com> <5459FF11.1080801@oracle.com> <545A4274.6090409@oracle.com> <545A6E83.8060909@oracle.com> Message-ID: <545BB986.3000803@oracle.com> On 11/5/14 11:37 AM, Coleen Phillimore wrote: > > Dan, I had a look at this change too. Thanks for reviewing! > On 11/5/14, 10:29 AM, Daniel D. Daugherty wrote: >> On 11/5/14 3:42 AM, David Holmes wrote: >>> Hi Dan, >>> >>> Reviewed. >> >> Thanks! >> >> >>> I find the name OM_OFFSET_NO_MONITOR_VALUE somewhat awkward but have >>> no better suggestion. >> >> Understood. I didn't like the original "OFFSET_SKEWED" name >> especially since I was moving it to objectMonitor.hpp... >> >> If you think of a better, let me know... we can always change it. >> > > So the -2 was a tag? Then maybe a better name is UNTAGGED_OM_OFFSET > .. Weird stuff anyway. Yes, the '2' is one of the markWord encodings so we have to remove it in order to have a proper pointer. Definitely weird stuff. > In > http://cr.openjdk.java.net/~dcubed/8062851-webrev/0-jdk9-hs-rt/src/cpu/x86/vm/macroAssembler_x86.cpp.udiff.html > > Can you make the whitespace changes to the lines you've changed: > > + movptr(tmpReg, Address (tmpReg, > OM_OFFSET_NO_MONITOR_VALUE(owner))); // rax, = m->_owner > > > to > > + movptr(tmpReg, Address(tmpReg, > OM_OFFSET_NO_MONITOR_VALUE(owner))); // rax, = m->_owner Yes, I'll make some whitespace cleanups to the lines that I touch. > In general, this looks like a great improvement not subtracting two > from seemingly random places in assembly code. Thanks goes to David H. for noticing the cleanup note that was left in the code previously. My only tweak to it was putting the macro in place where it could be shared by different CPU impls and hopefully I improved the comment. 
:-)

Dan

>
> thanks,
> Coleen
>
>>
>>
>>> In fact I have to ask what _is_ the object monitor tagging
>>> mechanism? I can't see it defined in the objectMonitor.* files. ??
>>
>> That would be this code:
>>
>> src/share/vm/oops/markOop.hpp:
>>
>>     static markOop encode(ObjectMonitor* monitor) {
>>       intptr_t tmp = (intptr_t) monitor;
>>       return (markOop) (tmp | monitor_value);
>>     }
>>
>> and the other methods in that file that have to account for
>> the monitor_value being set...
>>
>> Dan
>>
>>
>>>
>>> Thanks,
>>> David
>>>
>>> On 5/11/2014 2:34 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I have a Contended Locking cleanup bucket fix ready for review.
>>>>
>>>> This fix was spun off from the Contended Locking fast enter bucket
>>>> which was sent out for review late last week. This fix cleans up
>>>> the computation of ObjectMonitor field pointers and gets rid of
>>>> the use of literal '-2' in appropriate places. For example:
>>>>
>>>> -    ld_ptr(Rmark, ObjectMonitor::owner_offset_in_bytes() - 2, Rscratch);
>>>> +    ld_ptr(Rmark, OM_OFFSET_NO_MONITOR_VALUE(owner), Rscratch);
>>>>
>>>> The OM_OFFSET_NO_MONITOR_VALUE macro computes the offset to the
>>>> specified field and subtracts markOopDesc::monitor_value (2).
>>>> There's a nice comment in src/share/vm/runtime/objectMonitor.hpp.
>>>>
>>>> Thanks to David Holmes for his comments on JDK-8061553 that
>>>> motivated this (long overdue) cleanup.
>>>>
>>>> This work is being tracked by the following bug ID:
>>>>
>>>>     JDK-8062851 cleanup ObjectMonitor offset adjustments
>>>>     https://bugs.openjdk.java.net/browse/JDK-8062851
>>>>
>>>> Here is the webrev URL:
>>>>
>>>>     http://cr.openjdk.java.net/~dcubed/8062851-webrev/0-jdk9-hs-rt/
>>>>
>>>> Here is the JEP link:
>>>>
>>>>     https://bugs.openjdk.java.net/browse/JDK-8046133
>>>>
>>>> Testing:
>>>>
>>>> - JPRT test jobs (since this is only syntax and comment cleanup)
>>>>
>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>
>>>> Dan
>>
>
>

From calvin.cheung at oracle.com Thu Nov 6 18:12:47 2014
From: calvin.cheung at oracle.com (Calvin Cheung)
Date: Thu, 06 Nov 2014 10:12:47 -0800
Subject: RFR: JDK7 backport of 8020675 - invalid jar file in the
    bootclasspath could lead to jvm fatal error
In-Reply-To: <545BA401.1070205@oracle.com>
References: <545B797D.70907@oracle.com> <545BA401.1070205@oracle.com>
Message-ID: <545BBA1F.3040301@oracle.com>

Hi Andreas,

The change looks good.
There should be a dummy.jar to go with the test cases.

    http://cr.openjdk.java.net/~ccheung/8020675/webrev.02/

The webrev won't show any diffs for the jar file but don't forget to
include it when you push the fix.

thanks,
Calvin

On 11/6/2014 8:38 AM, Andreas Eriksson wrote:
> Hi,
>
> Could someone please review this jdk7 backport of JDK-8020675.
> Summary:
> invalid jar file in the bootclasspath could lead to jvm fatal error
> removed offending EXCEPTION_MARK calls and code cleanup
>
> One code change necessary for the backport was in method
> ClassLoader::load_classfile.
> The change was to use CHECK_(instanceKlassHandle()) instead of
> CHECK_NULL.
> See the mail thread at
> http://mail.openjdk.java.net/pipermail/hotspot-dev/2014-November/015825.html
> for more information.
>
> Webrev: http://cr.openjdk.java.net/~aeriksso/8020675/webrev.00/
>
> Regards,
> Andreas
>
>

From daniel.daugherty at oracle.com Thu Nov 6 18:13:46 2014
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Thu, 06 Nov 2014 11:13:46 -0700
Subject: RFR(S) Contended Locking cleanup bucket (8062851)
In-Reply-To: <545AC1FC.8010905@oracle.com>
References: <5459A8ED.8060808@oracle.com> <545A4719.50705@oracle.com>
    <545ABDF1.6050107@oracle.com> <545ABF8E.1050408@oracle.com>
    <545AC1FC.8010905@oracle.com>
Message-ID: <545BBA5A.9000007@oracle.com>

On 11/5/14 5:34 PM, David Holmes wrote:
> On 6/11/2014 10:23 AM, Coleen Phillimore wrote:
>>
>> On 11/5/14, 7:16 PM, David Holmes wrote:
>>> On 6/11/2014 1:49 AM, Claes Redestad wrote:
>>>> Hi,
>>>>
>>>> On 11/05/2014 05:34 AM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> I have a Contended Locking cleanup bucket fix ready for review.
>>>>>
>>>>> This fix was spun off from the Contended Locking fast enter bucket
>>>>> which was sent out for review late last week. This fix cleans up
>>>>> the computation of ObjectMonitor field pointers and gets rid of
>>>>> the use of literal '-2' in appropriate places. For example:
>>>>>
>>>>> -    ld_ptr(Rmark, ObjectMonitor::owner_offset_in_bytes() - 2, Rscratch);
>>>>> +    ld_ptr(Rmark, OM_OFFSET_NO_MONITOR_VALUE(owner), Rscratch);
>>>>>
>>>>> The OM_OFFSET_NO_MONITOR_VALUE macro computes the offset to the
>>>>> specified field and subtracts markOopDesc::monitor_value (2).
>>>>> There's a nice comment in src/share/vm/runtime/objectMonitor.hpp.
>>>>
>>>> any reason not to add it as a function in objectMonitor.hpp instead
>>>> of a macro? How about:
>>>>
>>>> static int no_monitor_offset_in_bytes() { return
>>>> offset_of(ObjectMonitor, _owner) - markOopDesc::monitor_value; }
>>>
>>> _owner is not the only field used so you would need a function for
>>> each one.
>>
>> I thought this would be better too. There are only 6 functions (6
>> lines) max that need this. It would look nicer.
>
> Only changes an upper case macro name to a lower case function name.

As I mentioned in my reply to Claes, there are 12 offset_in_bytes()
functions so for completeness we would add 12 new functions. I really
don't want to do that.

>
>> My suggestion would be to make them static int
>> untagged_offset_in_bytes() or whatever monitor_value is. It's not a
>> very descriptive name so better to name the functions after what it's
>> for.
>
> You need the field name included in the function name:
>
> untagged_offset_of_owner()
> untagged_offset_of_xxx()
>
> but it is only untagged if the OM is currently inflated, so then:
>
> untagged_offset_of_XXX_for_inflated_om()
>
> I can live with Dan's macro (which is an improvement on the original).

Thanks! I'm planning to stick with the macro which has a much smaller
footprint for such a strange little task...

Dan

>
> David
>
>> Coleen
>>
>>>
>>> David
>>> -----
>>>
>>>> Example usage:
>>>>
>>>> -    ld_ptr(Rmark, ObjectMonitor::owner_offset_in_bytes() - 2, Rscratch);
>>>> +    ld_ptr(Rmark, ObjectMonitor::no_monitor_offset_in_bytes(), Rscratch);
>>>>
>>>>
>>>> Seems this should be inlined regardless and looks a bit cleaner to me.
>>>>
>>>> Thanks!
>>>>
>>>> /Claes
>>>>
>>>>>
>>>>> Thanks to David Holmes for his comments on JDK-8061553 that
>>>>> motivated this (long overdue) cleanup.
>>>>>
>>>>> This work is being tracked by the following bug ID:
>>>>>
>>>>>     JDK-8062851 cleanup ObjectMonitor offset adjustments
>>>>>     https://bugs.openjdk.java.net/browse/JDK-8062851
>>>>>
>>>>> Here is the webrev URL:
>>>>>
>>>>>     http://cr.openjdk.java.net/~dcubed/8062851-webrev/0-jdk9-hs-rt/
>>>>>
>>>>> Here is the JEP link:
>>>>>
>>>>>     https://bugs.openjdk.java.net/browse/JDK-8046133
>>>>>
>>>>> Testing:
>>>>>
>>>>> - JPRT test jobs (since this is only syntax and comment cleanup)
>>>>>
>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>
>>>>> Dan
>>>>
>>
>
>
>

From daniel.daugherty at oracle.com Thu Nov 6 18:19:07 2014
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Thu, 06 Nov 2014 11:19:07 -0700
Subject: RFR(S) Contended Locking cleanup bucket (8062851)
In-Reply-To: <545BBA5A.9000007@oracle.com>
References: <5459A8ED.8060808@oracle.com> <545A4719.50705@oracle.com>
    <545ABDF1.6050107@oracle.com> <545ABF8E.1050408@oracle.com>
    <545AC1FC.8010905@oracle.com> <545BBA5A.9000007@oracle.com>
Message-ID: <545BBB9B.5000807@oracle.com>

I just reread the entire review thread.

I'm going to tweak the macro name a little bit:

    OM_OFFSET_NO_MONITOR_VALUE => OM_OFFSET_NO_MONITOR_VALUE_TAG

I think that captures the intent quite nicely...

Yes, I considered OM_OFFSET_WITHOUT_MONITOR_VALUE_TAG, but that
made the macro even longer... :-)

Dan

From daniel.daugherty at oracle.com Thu Nov 6 19:01:47 2014
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Thu, 06 Nov 2014 12:01:47 -0700
Subject: RFR(S) Contended Locking cleanup bucket (8062851)
In-Reply-To: <545BBB9B.5000807@oracle.com>
References: <5459A8ED.8060808@oracle.com> <545A4719.50705@oracle.com>
    <545ABDF1.6050107@oracle.com> <545ABF8E.1050408@oracle.com>
    <545AC1FC.8010905@oracle.com> <545BBA5A.9000007@oracle.com>
    <545BBB9B.5000807@oracle.com>
Message-ID: <545BC59B.5080902@oracle.com>

Here's an updated webrev:

    http://cr.openjdk.java.net/~dcubed/8062851-webrev/1-jdk9-hs-rt/

What I did to sanity check this is compare the patch files from
the two webrevs...

Dan

From coleen.phillimore at oracle.com Thu Nov 6 19:34:50 2014
From: coleen.phillimore at oracle.com (Coleen Phillimore)
Date: Thu, 06 Nov 2014 14:34:50 -0500
Subject: RFR(S) Contended Locking cleanup bucket (8062851)
In-Reply-To: <545BC59B.5080902@oracle.com>
References: <5459A8ED.8060808@oracle.com> <545A4719.50705@oracle.com>
    <545ABDF1.6050107@oracle.com> <545ABF8E.1050408@oracle.com>
    <545AC1FC.8010905@oracle.com> <545BBA5A.9000007@oracle.com>
    <545BBB9B.5000807@oracle.com> <545BC59B.5080902@oracle.com>
Message-ID: <545BCD5A.6060001@oracle.com>

While I'm not a huge fan of long macro names, it's still shorter than
the thing it replaced:

-    add(Rmark, ObjectMonitor::owner_offset_in_bytes()-2, Rmark);
+    add(Rmark, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner), Rmark);

I like it!

Coleen

From daniel.daugherty at oracle.com Thu Nov 6 20:32:22 2014
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Thu, 06 Nov 2014 13:32:22 -0700
Subject: RFR(S) Contended Locking cleanup bucket (8062851)
In-Reply-To: <545BCD5A.6060001@oracle.com>
References: <5459A8ED.8060808@oracle.com> <545A4719.50705@oracle.com>
    <545ABDF1.6050107@oracle.com> <545ABF8E.1050408@oracle.com>
    <545AC1FC.8010905@oracle.com> <545BBA5A.9000007@oracle.com>
    <545BBB9B.5000807@oracle.com> <545BC59B.5080902@oracle.com>
    <545BCD5A.6060001@oracle.com>
Message-ID: <545BDAD6.5050609@oracle.com>

Thanks for the re-review!

Dan

From calvin.cheung at oracle.com Fri Nov 7 01:06:57 2014
From: calvin.cheung at oracle.com (Calvin Cheung)
Date: Thu, 06 Nov 2014 17:06:57 -0800
Subject: RFR(XS): 8060721: Test
    runtime/SharedArchiveFile/LimitSharedSizes.java fails in jdk 9 fcs new
    platforms/compiler
In-Reply-To: <545AF8F2.1010106@oracle.com>
References: <545A770C.3030503@oracle.com> <545AC5D3.9090005@oracle.com>
    <545AF8F2.1010106@oracle.com>
Message-ID: <545C1B31.3060901@oracle.com>

I've updated the webrev at the same location:

    http://cr.openjdk.java.net/~ccheung/8060721/webrev/

I also re-ran the tests.

Please take a look.

thanks,
Calvin

On 11/5/2014 8:28 PM, Calvin Cheung wrote:
> On 11/5/2014 4:50 PM, David Holmes wrote:
>> On 6/11/2014 5:14 AM, Calvin Cheung wrote:
>>> While upgrading the compiler on Mac for jdk9, we found this compiler bug
>>> where it skips the following 2 lines of code in metaspaceShared.cpp when
>>> optimization is enable (set to -Os) for the fastdebug and product builds.
>>>     strcat(class_list_path_str, os::file_separator());
>>>     strcat(class_list_path_str, "classlist");
>>>
>>> The bug is reproducible with Xcode 5.1.1 and 6.1.
>>>
>>> A workaround fix is to rewrite an "if" block in the
>>> MetaspaceShared::preload_and_dump() method.
>>
>> Can't you simply replace the strcats with jio_snprintf and do away
>> with the sub_path array?
> The following works. I'll do more testing before sending an updated
> webrev.
>
> --- a/src/share/vm/memory/metaspaceShared.cpp
> +++ b/src/share/vm/memory/metaspaceShared.cpp
> @@ -713,12 +713,15 @@
>      int class_list_path_len = (int)strlen(class_list_path_str);
>      if (class_list_path_len >= 3) {
>        if (strcmp(class_list_path_str + class_list_path_len - 3, "lib") != 0) {
> -        strcat(class_list_path_str, os::file_separator());
> -        strcat(class_list_path_str, "lib");
> +        jio_snprintf(class_list_path_str + class_list_path_len,
> +                     sizeof(class_list_path_str) - class_list_path_len,
> +                     "%slib", os::file_separator());
>        }
>      }
> -    strcat(class_list_path_str, os::file_separator());
> -    strcat(class_list_path_str, "classlist");
> +    class_list_path_len = (int)strlen(class_list_path_str);
> +    jio_snprintf(class_list_path_str + class_list_path_len,
> +                 sizeof(class_list_path_str) - class_list_path_len,
> +                 "%sclasslist", os::file_separator());
>      class_list_path = class_list_path_str;
>    } else {
>      class_list_path = SharedClassListFile;
>>
>> Or even try strncat instead of strcat?
> I think jio_snprintf is better because it null terminates the string.
> If I use strncat, I'll need to initialize the entire buffer to null.
>
> thanks,
> Calvin
>>
>> David
>>
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8060721
>>>
>>> webrev: http://cr.openjdk.java.net/~ccheung/8060721/webrev/
>>>
>>> Testing:
>>> JPRT
>>> The affected testcase with product, fastdebug, and debug builds
>>> built with Xcode 5.1.1 and 6.1.
>>>
>>> thanks,
>>> Calvin
>

From jiangli.zhou at oracle.com Fri Nov 7 01:35:34 2014
From: jiangli.zhou at oracle.com (Jiangli Zhou)
Date: Thu, 06 Nov 2014 17:35:34 -0800
Subject: RFR 8054008: Using -XX:-LazyBootClassLoader crashes with
    ACCESS_VIOLATION on Win 64bit
Message-ID: <545C21E6.90709@oracle.com>

Hi,

Please review the following changes that fix the crash with
-XX:-LazyBootClassLoader on windows x64 platforms (fastdebug only).
During VM initialization, current_stack_pointer() could be called
before the VM generates stub routines.
The generated get_previous_sp routine cannot be used during that time,
so the change uses the estimated value for the sp instead. The x86
implementation is unaffected by the change and always returns the
estimated sp value as before.

bug: https://bugs.openjdk.java.net/browse/JDK-8054008
webrev: http://cr.openjdk.java.net/~jiangli/8054008/webrev/

Tested with JPRT and ExtBadJAR test.

Background:

As part of the VM initialization, classLoader_init() calls ZIP_Open from
the zip library for processing the boot class path when
-XX:-LazyBootClassLoader is specified. The call path re-enters the VM
before returning from the zip library call. The backtrace from right
before the crash is included below. The windows x64 version of
current_stack_pointer() uses the generated stub routine get_previous_sp
(generated by generate_get_previous_sp()) to obtain the stack pointer
value. Since classLoader_init() happens before stubRoutines_init1(), the
stub routines have not been generated at that time. When the VM tries to
call the stub routine, execution jumps to address 0 (the value of
_get_previous_sp_entry, which only holds the address of the generated
routine after stubRoutines_init1()) and the VM crashes.
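The failure mode described in the Background paragraph, and the general shape of a guard against it, can be sketched in standalone C++. This is only an illustration under stated assumptions, not the HotSpot change itself: sp_stub_t, estimated_sp(), fake_stub() and generate_stubs() are hypothetical names, and get_previous_sp_entry and current_stack_pointer() merely echo identifiers from the mail.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a stub entry point that is only filled in
// once stub generation has run; until then it is null (address 0).
typedef std::uintptr_t (*sp_stub_t)();
static sp_stub_t get_previous_sp_entry = nullptr;

// Hypothetical estimate: the address of a local variable lies close to
// the current stack pointer, which is enough for early-startup checks.
static std::uintptr_t estimated_sp() {
  volatile int marker = 0;
  return reinterpret_cast<std::uintptr_t>(&marker);
}

// Stand-in for the generated routine; a real stub would read the
// stack pointer register instead of returning a constant.
static std::uintptr_t fake_stub() { return 0x1000; }

// Stand-in for the initialization step that generates the stubs.
static void generate_stubs() {
  assert(get_previous_sp_entry == nullptr);  // generate exactly once
  get_previous_sp_entry = fake_stub;
}

// Guarded accessor: fall back to the estimate until the stub exists,
// rather than calling through a null function pointer.
static std::uintptr_t current_stack_pointer() {
  if (get_previous_sp_entry == nullptr) {
    return estimated_sp();
  }
  return get_previous_sp_entry();
}
```

In this sketch, calling current_stack_pointer() before generate_stubs() takes the fallback path instead of jumping to address 0; after generate_stubs() it dispatches to the stub.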

jvm.dll!os::current_stack_pointer() Line 468 C++
jvm.dll!os::verify_stack_alignment() Line 638 + 0x5 bytes C++
jvm.dll!JVM_NativePath(char * path) Line 691 C++
zip.dll!000007feebc49de0()
[Frames below may be incorrect and/or missing, no symbols loaded for zip.dll]
zip.dll!000007feebc4af1d()
zip.dll!000007feebc4b004()
jvm.dll!ClassLoader::create_class_path_entry(const char * path, const stat * st, bool lazy, bool throw_exception, Thread * __the_thread__) Line 666 + 0x13 bytes C++
jvm.dll!ClassLoader::update_class_path_entry_list(const char * path, bool check_for_duplicates, bool throw_exception) Line 763 + 0x2d bytes C++
jvm.dll!ClassLoader::setup_search_path(const char * class_path) Line 630 C++
jvm.dll!ClassLoader::setup_bootstrap_search_path() Line 594 C++
jvm.dll!ClassLoader::initialize() Line 1237 C++
jvm.dll!classLoader_init() Line 1291 C++
jvm.dll!init_globals() Line 100 C++
jvm.dll!Threads::create_vm(JavaVMInitArgs * args, bool * canTryAgain) Line 3414 + 0x5 bytes C++
jvm.dll!JNI_CreateJavaVM(JavaVM_ * * vm, void * * penv, void * args) Line 5199 + 0x12 bytes C++
java.exe!000000013f0520f6()
java.exe!000000013f05cb63()
java.exe!000000013f05cbf7()
kernel32.dll!0000000076ba59ed()
ntdll.dll!0000000076cdc541()

Thanks,
Jiangli

From daniel.daugherty at oracle.com Fri Nov 7 02:17:37 2014
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Thu, 06 Nov 2014 19:17:37 -0700
Subject: RFR(S) Contended Locking fast enter bucket (8061553)
In-Reply-To: <54591A3A.1090005@oracle.com>
References: <5452C0B4.4070601@oracle.com> <5457084B.6070808@oracle.com>
    <5458330E.1080207@oracle.com> <54591A3A.1090005@oracle.com>
Message-ID: <545C2BC0.3080207@oracle.com>

The fix for JDK-8062851 has been reviewed, tested and pushed to
RT_Baseline. Time to get back to this review thread so here's
an updated webrev:

    http://cr.openjdk.java.net/~dcubed/8061553-webrev/1-jdk9-hs-rt/

David H., I believe I've addressed all of your comments. Please
let me know if I missed something...

Thanks, in advance, for any comments, questions or suggestions.

Dan


On 11/4/14 11:26 AM, Daniel D. Daugherty wrote:
> The cleanup is turning into a bigger change than the fast enter
> bucket itself so I'm spinning the cleanup into a new bug:
>
>     JDK-8062851 cleanup ObjectMonitor offset adjustments
>     https://bugs.openjdk.java.net/browse/JDK-8062851
>
> Yes, this means that the Contended Locking cleanup bucket has reopened
> for yet another change...
>
> We'll get back to "fast enter" after the dust has settled...
>
> Dan
>
>
> On 11/3/14 6:59 PM, Daniel D. Daugherty wrote:
>> David,
>>
>> Thanks for the review! As usual, replies are embedded below...
>>
>>
>> On 11/2/14 9:44 PM, David Holmes wrote:
>>> Hi Dan,
>>>
>>> Looks good.
>>
>> Thanks!
>>
>>
>>> Couple of nits and one semantic query below ...
>>>
>>> src/cpu/sparc/vm/macroAssembler_sparc.cpp
>>>
>>> Formatting changes were a bit of a distraction.
>>
>> Yes, I have no idea what got into me. Normally I do formatting
>> changes separately so the noise does not distract...
>>
>> It turns out there is a constant defined that should be used
>> instead of all these literal '2's:
>>
>> src/share/vm/oops/markOop.hpp: monitor_value = 2
>>
>> Typically used as follows:
>>
>> src/cpu/x86/vm/macroAssembler_x86.cpp: int owner_offset =
>>     ObjectMonitor::owner_offset_in_bytes() - markOopDesc::monitor_value;
>>
>> I will clean this up just for the files that I'm touching as
>> part of this fix.
>>
>>
>>>
>>> ---
>>>
>>> src/cpu/x86/vm/macroAssembler_x86.cpp
>>>
>>> Formatting changes were a bit of a distraction.
>>
>> Same reply as for macroAssembler_sparc.cpp.
>>
>>
>>> 1929 // unconditionally set stackBox->_displaced_header = 3
>>> 1930 movptr(Address(boxReg, 0),
>>>          (int32_t)intptr_t(markOopDesc::unused_mark()));
>>>
>>> At 1870 we refer to box rather than stackBox. Also it takes some
>>> sleuthing to realize that "3" here is somehow a pseudonym for
>>> unused_mark(). Back up at 1808 we have a to-do:
>>>
>>> 1808 // use markOop::unused_mark() instead of "3".
>>>
>>> so the current change seems to be implementing that, even though
>>> other uses of "3" are left untouched.
>>
>> I'll take a look at cleaning those up also...
>>
>> In some cases markOopDesc::marked_value will work for the literal '3',
>> but in other cases we'll use markOop::unused_mark():
>>
>>     static markOop unused_mark() {
>>       return (markOop) marked_value;
>>     }
>>
>> to save us the noise of the (markOop) cast.
>>
>>
>>> ---
>>>
>>> src/share/vm/runtime/sharedRuntime.cpp
>>>
>>> 1794 JRT_BLOCK_ENTRY(void,
>>>        SharedRuntime::complete_monitor_locking_C(oopDesc* _obj,
>>>        BasicLock* lock, JavaThread* thread))
>>> 1795   if (!SafepointSynchronize::is_synchronizing()) {
>>> 1796     if (ObjectSynchronizer::quick_enter(_obj, thread, lock)) return;
>>>
>>> Is it necessary to check is_synchronizing? If we are executing this
>>> code we are not at a safepoint and the quick_enter won't change that,
>>> so I'm not sure what we are guarding against.
>>
>> So this first state checker:
>>
>> src/share/vm/runtime/safepoint.hpp:
>>     inline static bool is_synchronizing() { return _state == _synchronizing; }
>>
>> means that we want to go to a safepoint and:
>>
>>     inline static bool is_at_safepoint() { return _state == _synchronized; }
>>
>> means that we are at a safepoint. Dice's optimization bails out if
>> we want to go to a safepoint and ObjectSynchronizer::quick_enter()
>> has a "No_Safepoint_Verifier nsv" in it so we're expecting that
>> code to be quick (and not go to a safepoint). I'm not seeing
>> anything obvious....
>>
>> Sometimes we have to be careful with JavaThread suspend requests and
>> monitor acquisition, but I don't think that's a problem here... In
>> order for the "suspend requesting" thread to be surprised, the suspend
>> API, e.g., JVM/TI SuspendThread() has to return to the caller and then
>> the suspend target has to do something unexpected like acquire a monitor
>> that it was previously blocked upon when it was suspended. We've had
>> bugs like that in the past... In this optimization case, our target
>> thread is not blocked on a contended monitor...
>>
>> In this particular case, the "suspend requesting" thread will set the
>> suspend request state on the target thread, but the target thread is
>> busy trying to enter this uncontended monitor (quickly). So the
>> "suspend requesting" thread will request a no-op safepoint, but it
>> won't return from the suspend API until that safepoint completes.
>> The safepoint won't complete until the target thread is done acquiring
>> the previously uncontended monitor... so the target thread will be
>> suspended while holding the previously uncontended monitor and the
>> "suspend requesting" thread will return from the suspend API all
>> happy...
>>
>> Well, I don't see the reason either so I'll have to ping Dave Dice
>> and Karen Kinnear to see if either of them can fill in the history
>> here. This could be an abundance of caution case.
>>
>>
>>> ---
>>>
>>> src/share/vm/runtime/synchronizer.cpp
>>>
>>> Minor nit: line 153 the usual acronym is NPE (for
>>> NullPointerException) not NPX
>>
>> I'll do a search for uses of NPX and other uses of 'X' in exception
>> acronyms...
>>
>>
>>>
>>> Nit: 159 Thread * const ox
>>>
>>> Please change ox to owner.
>>
>> Will do.
>>
>> Thanks again for the review!
>>
>> Dan
>>
>>
>>>
>>> ---
>>>
>>> Thanks,
>>> David
>>>
>>>
>>>
>>> On 31/10/2014 8:50 AM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I have the Contended Locking fast enter bucket ready for review.
>>>>
>>>> The code changes in this bucket are primarily a quick_enter()
>>>> function that works on inflated but uncontended Java monitors.
>>>> This quick_enter() function is used on the "slow path" for Java >>>> Monitor enter operations when the built-in "fast path" (read >>>> assembly code) doesn't work. >>>> >>>> This work is being tracked by the following bug ID: >>>> >>>> JDK-8061553 Contended Locking fast enter bucket >>>> https://bugs.openjdk.java.net/browse/JDK-8061553 >>>> >>>> Here is the webrev URL: >>>> >>>> http://cr.openjdk.java.net/~dcubed/8061553-webrev/0-jdk9-hs-rt/ >>>> >>>> Here is the JEP link: >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8046133 >>>> >>>> 8061553 summary of changes: >>>> >>>> macroAssembler_sparc.cpp: MacroAssembler::compiler_lock_object() >>>> >>>> - clean up spacing around some >>>> 'ObjectMonitor::owner_offset_in_bytes() - 2' uses >>>> - remove optional (EmitSync & 64) code >>>> - change from cmp() to andcc() so icc.zf flag is set >>>> >>>> macroAssembler_x86.cpp: MacroAssembler::fast_lock() >>>> >>>> - remove optional (EmitSync & 2) code >>>> - rewrite LP64 inflated lock code that tries to CAS in >>>> the new owner value to be more efficient >>>> >>>> interfaceSupport.hpp: >>>> >>>> - add JRT_BLOCK_NO_ASYNC to permit splitting a >>>> JRT_BLOCK_ENTRY into two pieces. 
>>>> >>>> sharedRuntime.cpp: SharedRuntime::complete_monitor_locking_C() >>>> >>>> - change entry type from JRT_ENTRY_NO_ASYNC to JRT_BLOCK_ENTRY >>>> to permit ObjectSynchronizer::quick_enter() call >>>> - add JRT_BLOCK_NO_ASYNC use if the quick_enter() doesn't work >>>> to revert to JRT_ENTRY_NO_ASYNC-like semantics >>>> >>>> synchronizer.[ch]pp: >>>> >>>> - add ObjectSynchronizer::quick_enter() for entering an >>>> inflated but unowned Java monitor without thread state >>>> changes >>>> >>>> Testing: >>>> >>>> - Aurora Adhoc RT/SVC baseline batch >>>> - JPRT test jobs >>>> - MonitorEnterStresser micro-benchmark (in process) >>>> - CallTimerGrid stress testing (in process) >>>> - Aurora performance testing: >>>> - out of the box for the "promotion" and 32-bit server configs >>>> - heavy weight monitors for the "promotion" and 32-bit server >>>> configs >>>> (-XX:-UseBiasedLocking -XX:+UseHeavyMonitors) >>>> (in process) >>>> >>>> >>>> Thanks, in advance, for any comments, questions or suggestions. >>>> >>>> Dan >> >> > > > From david.holmes at oracle.com Fri Nov 7 06:26:40 2014 From: david.holmes at oracle.com (David Holmes) Date: Fri, 07 Nov 2014 16:26:40 +1000 Subject: RFR 8054008: Using -XX:-LazyBootClassLoader crashes with ACCESS_VIOLATION on Win 64bit In-Reply-To: <545C21E6.90709@oracle.com> References: <545C21E6.90709@oracle.com> Message-ID: <545C6620.1040301@oracle.com> Looks good to me! Glad to see this could be resolved with changing the initialization sequence! Please update copyright year before pushing. Thanks, David On 7/11/2014 11:35 AM, Jiangli Zhou wrote: > Hi, > > Please review the following changes that fix the crash with > -XX:-LazyBootClassLoader on windows x64 platforms (fastdebug only). > During VM initialization, current_stack_pointer() could be called > before the VM generates stub routines. The generated get_previous_sp > routine cannot be used during that time, use the estimated value for the > sp value instead. 
The x86 implementation is unaffected by the change and > always returns the estimated sp value as before. > > bug: https://bugs.openjdk.java.net/browse/JDK-8054008 > webrev: http://cr.openjdk.java.net/~jiangli/8054008/webrev/ > > Tested with JPRT and ExtBadJAR test. > > Background: > As part of the VM initialization, classLoader_init() calls ZIP_Open from > the zip library for processing the boot class path when > -XX:-LazyBootClassLoader is specified. The call path re-enters VM before > returning from the zip library call. Following is the backtrace right > before when the crash happens. The windows x64 version of > current_stack_pointer() uses generated stub routine get_previous_sp > (generated by generate_get_previous_sp()) to obtain the stack pointer > value. Since classLoader_init() happens before stubRoutines_init1() and > the stub routines are not generated at the time, the execution jumps to > address 0 (referenced by _get_previous_sp_entry which should contain the > address of the generated routine after stubRoutines_init1()) when it's > trying to call the stub routine and crashes. 
> > > jvm.dll!os::current_stack_pointer() Line 468 C++ > jvm.dll!os::verify_stack_alignment() Line 638 + 0x5 bytes C++ > jvm.dll!JVM_NativePath(char * path) Line 691 C++ > zip.dll!000007feebc49de0() > [Frames below may be incorrect and/or missing, no symbols loaded > for zip.dll] > zip.dll!000007feebc4af1d() > zip.dll!000007feebc4b004() > jvm.dll!ClassLoader::create_class_path_entry(const char * path, > const stat * st, bool lazy, bool throw_exception, Thread * > __the_thread__) Line 666 + 0x13 bytes C++ > jvm.dll!ClassLoader::update_class_path_entry_list(const char * > path, bool check_for_duplicates, bool throw_exception) Line 763 + 0x2d > bytes C++ > jvm.dll!ClassLoader::setup_search_path(const char * class_path) > Line 630 C++ > jvm.dll!ClassLoader::setup_bootstrap_search_path() Line 594 C++ > jvm.dll!ClassLoader::initialize() Line 1237 C++ > jvm.dll!classLoader_init() Line 1291 C++ > jvm.dll!init_globals() Line 100 C++ > jvm.dll!Threads::create_vm(JavaVMInitArgs * args, bool * > canTryAgain) Line 3414 + 0x5 bytes C++ > jvm.dll!JNI_CreateJavaVM(JavaVM_ * * vm, void * * penv, void * > args) Line 5199 + 0x12 bytes C++ > java.exe!000000013f0520f6() > java.exe!000000013f05cb63() > java.exe!000000013f05cbf7() > kernel32.dll!0000000076ba59ed() > ntdll.dll!0000000076cdc541() > > Thanks, > Jiangli > From david.holmes at oracle.com Fri Nov 7 06:31:36 2014 From: david.holmes at oracle.com (David Holmes) Date: Fri, 07 Nov 2014 16:31:36 +1000 Subject: RFR(S) Contended Locking cleanup bucket (8062851) In-Reply-To: <545BC59B.5080902@oracle.com> References: <5459A8ED.8060808@oracle.com> <545A4719.50705@oracle.com> <545ABDF1.6050107@oracle.com> <545ABF8E.1050408@oracle.com> <545AC1FC.8010905@oracle.com> <545BBA5A.9000007@oracle.com> <545BBB9B.5000807@oracle.com> <545BC59B.5080902@oracle.com> Message-ID: <545C6748.1000901@oracle.com> Still fine for me. Thanks, David On 7/11/2014 5:01 AM, Daniel D. 
Daugherty wrote: > Here's an updated webrev: > > http://cr.openjdk.java.net/~dcubed/8062851-webrev/1-jdk9-hs-rt/ > > What I did to sanity check this is compare the patch files from > the two webrevs... > > Dan > > > On 11/6/14 11:19 AM, Daniel D. Daugherty wrote: >> I just reread the entire review thread. >> >> I'm going to tweak the macro name a little bit: >> >> OM_OFFSET_NO_MONITOR_VALUE => OM_OFFSET_NO_MONITOR_VALUE_TAG >> >> I think that captures the intent quite nicely... >> >> Yes, I considered OM_OFFSET_WITHOUT_MONITOR_VALUE_TAG, but that >> made the macro even longer... :-) >> >> Dan >> >> >> On 11/6/14 11:13 AM, Daniel D. Daugherty wrote: >>> On 11/5/14 5:34 PM, David Holmes wrote: >>>> On 6/11/2014 10:23 AM, Coleen Phillimore wrote: >>>>> >>>>> On 11/5/14, 7:16 PM, David Holmes wrote: >>>>>> On 6/11/2014 1:49 AM, Claes Redestad wrote: >>>>>>> Hi, >>>>>>> >>>>>>> On 11/05/2014 05:34 AM, Daniel D. Daugherty wrote: >>>>>>>> Greetings, >>>>>>>> >>>>>>>> I have a Contended Locking cleanup bucket fix ready for review. >>>>>>>> >>>>>>>> This fix was spun off from the Contended Locking fast enter bucket >>>>>>>> which was sent out for review late last week. This fix cleans up >>>>>>>> the computation of ObjectMonitor field pointers and gets rid of >>>>>>>> the use of literal '-2' in appropriate places. For example: >>>>>>>> >>>>>>>> - ld_ptr(Rmark, ObjectMonitor::owner_offset_in_bytes() - 2, >>>>>>>> Rscratch); >>>>>>>> + ld_ptr(Rmark, OM_OFFSET_NO_MONITOR_VALUE(owner), >>>>>>>> Rscratch); >>>>>>>> >>>>>>>> The OM_OFFSET_NO_MONITOR_VALUE macro computes the offset to the >>>>>>>> specified field and subtracts markOopDesc:monitor_value (2). >>>>>>>> There's a nice comment in src/share/vm/runtime/objectMonitor.hpp. >>>>>>> >>>>>>> any reason not to add it as a function in objectMonitor.hpp >>>>>>> instead of a >>>>>>> macro? 
How about: >>>>>>> >>>>>>> static int no_monitor_offset_in_bytes() { return >>>>>>> offset_of(ObjectMonitor, _owner) - markOopDesc::monitor_value; } >>>>>> >>>>>> _owner is not the only field used so you would need a function for >>>>>> each one. >>>>> >>>>> I thought this would be better too. There are only 6 functions (6 >>>>> lines) max that need this. It would look nicer. >>>> >>>> Only changes an upper case macro name to a lower case function name. >>> >>> As I mentioned in my reply to Claes, there are 12 offset_in_bytes() >>> functions >>> so for completeness we would add 12 new functions. I really don't >>> want to do >>> that. >>> >>> >>>> >>>>> My suggestion would be to make them static int >>>>> untagged_offset_in_bytes() or whatever monitor_value is. It's not a >>>>> very descriptive name so better to name the functions after what >>>>> it's for. >>>> >>>> You need the field name included in the function name: >>>> >>>> untagged_offset_of_owner() >>>> untagged_offset_of_xxx() >>>> >>>> but it is only untagged if the OM is currently inflated, so then: >>>> >>>> untagged_offset_of_XXX_for_inflated_om() >>>> >>>> I can live with Dan's macro (which is an improvement on the original). >>> >>> Thanks! I'm planning to stick with the macro which has a much smaller >>> footprint for such a strange little task... >>> >>> Dan >>> >>> >>>> >>>> David >>>> >>>>> Coleen >>>>> >>>>>> >>>>>> David >>>>>> ----- >>>>>> >>>>>>> Example usage: >>>>>>> >>>>>>> - ld_ptr(Rmark, ObjectMonitor::owner_offset_in_bytes() - 2, >>>>>>> Rscratch); >>>>>>> + ld_ptr(Rmark, ObjectMonitor::no_monitor_offset_in_bytes(), >>>>>>> Rscratch); >>>>>>> >>>>>>> >>>>>>> Seems this should be inlined regardless and looks a bit cleaner >>>>>>> to me. >>>>>>> >>>>>>> Thanks! >>>>>>> >>>>>>> /Claes >>>>>>> >>>>>>>> >>>>>>>> Thanks to David Holmes for his comments on JDK-8061553 that >>>>>>>> motivated this (long overdue) cleanup. 
>>>>>>>> >>>>>>>> This work is being tracked by the following bug ID: >>>>>>>> >>>>>>>> JDK-8062851 cleanup ObjectMonitor offset adjustments >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8062851 >>>>>>>> >>>>>>>> Here is the webrev URL: >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~dcubed/8062851-webrev/0-jdk9-hs-rt/ >>>>>>>> >>>>>>>> Here is the JEP link: >>>>>>>> >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8046133 >>>>>>>> >>>>>>>> Testing: >>>>>>>> >>>>>>>> - JPRT test jobs (since this is only syntax and comment cleanup) >>>>>>>> >>>>>>>> Thanks, in advance, for any comments, questions or suggestions. >>>>>>>> >>>>>>>> Dan >>>>>>> >>>>> >>>> >>>> >>>> >>> >> >> > From david.holmes at oracle.com Fri Nov 7 06:38:31 2014 From: david.holmes at oracle.com (David Holmes) Date: Fri, 07 Nov 2014 16:38:31 +1000 Subject: RFR(XS): 8060721: Test runtime/SharedArchiveFile/LimitSharedSizes.java fails in jdk 9 fcs new platforms/compiler In-Reply-To: <545C1B31.3060901@oracle.com> References: <545A770C.3030503@oracle.com> <545AC5D3.9090005@oracle.com> <545AF8F2.1010106@oracle.com> <545C1B31.3060901@oracle.com> Message-ID: <545C68E7.4080807@oracle.com> Hi Calvin, On 7/11/2014 11:06 AM, Calvin Cheung wrote: > I've updated the webrev at the same location: > http://cr.openjdk.java.net/~ccheung/8060721/webrev/ > I also re-ran the tests. > > Please take a look. 717 jio_snprintf(class_list_path_str + class_list_path_len, 718 sizeof(class_list_path_str) - class_list_path_len, 719 "%slib", os::file_separator()); 720 } 721 } 722 class_list_path_len = (int)strlen(class_list_path_str); The strlen recalculation at #722 should be moved inside the if-block as that is the only time it is needed. Also can we not just do += 4 ? 
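For reference, a minimal sketch of the append pattern under discussion. This is not the metaspaceShared.cpp code: standard snprintf stands in for jio_snprintf, and the helper name, buffer size and paths are hypothetical. The length is recomputed only inside the branch that appends "lib", since that is the only place it changes. Adding a fixed 4 instead would give the same result only if the separator is a single character and the append was not truncated.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: builds "<home>[<sep>lib]<sep>classlist" in a fixed buffer.
// snprintf is an assumption standing in for jio_snprintf; it bounds each write
// and always null-terminates when the size argument is non-zero.
std::string build_class_list_path(const char* home, const char* sep) {
  assert(home != nullptr && sep != nullptr);

  char buf[256];  // illustrative size only
  std::snprintf(buf, sizeof(buf), "%s", home);
  size_t len = std::strlen(buf);

  if (len >= 3 && std::strcmp(buf + len - 3, "lib") != 0) {
    std::snprintf(buf + len, sizeof(buf) - len, "%slib", sep);
    // Only this branch changes the length, so recompute it here.
    len = std::strlen(buf);
  }

  // Append the file name after whatever is in the buffer now.
  std::snprintf(buf + len, sizeof(buf) - len, "%sclasslist", sep);
  return std::string(buf);
}
```

Called with ("/jdk", "/") or with ("/jdk/lib", "/"), the helper yields the same path in both cases, because "lib" is appended only when it is missing.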
Thanks, David > thanks, > Calvin > > On 11/5/2014 8:28 PM, Calvin Cheung wrote: >> On 11/5/2014 4:50 PM, David Holmes wrote: >>> On 6/11/2014 5:14 AM, Calvin Cheung wrote: >>>> While upgrading the compiler on Mac for jdk9, we found this compiler >>>> bug >>>> where it skips the following 2 lines of code in metaspaceShared.cpp >>>> when >>>> optimization is enable (set to -Os) for the fastdebug and product >>>> builds. >>>> strcat(class_list_path_str, os::file_separator()); >>>> strcat(class_list_path_str, "classlist"); >>>> >>>> The bug is reproducible with Xcode 5.1.1 and 6.1. >>>> >>>> A workaround fix is to rewrite an "if" block in the >>>> MetaspaceShared::preload_and_dump() method. >>> >>> Can't you simply replace the strcats with jio_snprintf and do away >>> with the sub_path array? >> The following works. I'll do more testing before sending an updated >> webrev. >> >> --- a/src/share/vm/memory/metaspaceShared.cpp >> +++ b/src/share/vm/memory/metaspaceShared.cpp >> @@ -713,12 +713,15 @@ >> int class_list_path_len = (int)strlen(class_list_path_str); >> if (class_list_path_len >= 3) { >> if (strcmp(class_list_path_str + class_list_path_len - 3, >> "lib") != 0) { >> - strcat(class_list_path_str, os::file_separator()); >> - strcat(class_list_path_str, "lib"); >> + jio_snprintf(class_list_path_str + class_list_path_len, >> + sizeof(class_list_path_str) - class_list_path_len, >> + "%slib", os::file_separator()); >> } >> } >> - strcat(class_list_path_str, os::file_separator()); >> - strcat(class_list_path_str, "classlist"); >> + class_list_path_len = (int)strlen(class_list_path_str); >> + jio_snprintf(class_list_path_str + class_list_path_len, >> + sizeof(class_list_path_str) - class_list_path_len, >> + "%sclasslist", os::file_separator()); >> class_list_path = class_list_path_str; >> } else { >> class_list_path = SharedClassListFile; >>> >>> Or even try strncat instead of strcat? >> I think jio_snprintf is better because it null terminates the string. 
>> If I use strncat, I'll need to initialize the entire buffer to null. >> >> thanks, >> Calvin >>> >>> David >>> >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8060721 >>>> >>>> webrev: http://cr.openjdk.java.net/~ccheung/8060721/webrev/ >>>> >>>> Testing: >>>> JPRT >>>> The affected testcase with product, fastdebug, and debug builds >>>> built with Xcode 5.1.1 and 6.1. >>>> >>>> thanks, >>>> Calvin >> > From david.holmes at oracle.com Fri Nov 7 06:40:10 2014 From: david.holmes at oracle.com (David Holmes) Date: Fri, 07 Nov 2014 16:40:10 +1000 Subject: RFR 8054008: Using -XX:-LazyBootClassLoader crashes with ACCESS_VIOLATION on Win 64bit In-Reply-To: <545C6620.1040301@oracle.com> References: <545C21E6.90709@oracle.com> <545C6620.1040301@oracle.com> Message-ID: <545C694A.9070004@oracle.com> On 7/11/2014 4:26 PM, David Holmes wrote: > Looks good to me! Glad to see this could be resolved with changing the > initialization sequence! s/with/without/ :) David > > Please update copyright year before pushing. > > Thanks, > David > > On 7/11/2014 11:35 AM, Jiangli Zhou wrote: >> Hi, >> >> Please review the following changes that fix the crash with >> -XX:-LazyBootClassLoader on windows x64 platforms (fastdebug only). >> During VM initialization, current_stack_pointer() could be called >> before the VM generates stub routines. The generated get_previous_sp >> routine cannot be used during that time, use the estimated value for the >> sp value instead. The x86 implementation is unaffected by the change and >> always returns the estimated sp value as before. >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8054008 >> webrev: http://cr.openjdk.java.net/~jiangli/8054008/webrev/ >> >> Tested with JPRT and ExtBadJAR test. >> >> Background: >> As part of the VM initialization, classLoader_init() calls ZIP_Open from >> the zip library for processing the boot class path when >> -XX:-LazyBootClassLoader is specified. 
The call path re-enters VM before >> returning from the zip library call. Following is the backtrace right >> before when the crash happens. The windows x64 version of >> current_stack_pointer() uses generated stub routine get_previous_sp >> (generated by generate_get_previous_sp()) to obtain the stack pointer >> value. Since classLoader_init() happens before stubRoutines_init1() and >> the stub routines are not generated at the time, the execution jumps to >> address 0 (referenced by _get_previous_sp_entry which should contain the >> address of the generated routine after stubRoutines_init1()) when it's >> trying to call the stub routine and crashes. >> >> >> jvm.dll!os::current_stack_pointer() Line 468 C++ >> jvm.dll!os::verify_stack_alignment() Line 638 + 0x5 bytes C++ >> jvm.dll!JVM_NativePath(char * path) Line 691 C++ >> zip.dll!000007feebc49de0() >> [Frames below may be incorrect and/or missing, no symbols loaded >> for zip.dll] >> zip.dll!000007feebc4af1d() >> zip.dll!000007feebc4b004() >> jvm.dll!ClassLoader::create_class_path_entry(const char * path, >> const stat * st, bool lazy, bool throw_exception, Thread * >> __the_thread__) Line 666 + 0x13 bytes C++ >> jvm.dll!ClassLoader::update_class_path_entry_list(const char * >> path, bool check_for_duplicates, bool throw_exception) Line 763 + 0x2d >> bytes C++ >> jvm.dll!ClassLoader::setup_search_path(const char * class_path) >> Line 630 C++ >> jvm.dll!ClassLoader::setup_bootstrap_search_path() Line 594 C++ >> jvm.dll!ClassLoader::initialize() Line 1237 C++ >> jvm.dll!classLoader_init() Line 1291 C++ >> jvm.dll!init_globals() Line 100 C++ >> jvm.dll!Threads::create_vm(JavaVMInitArgs * args, bool * >> canTryAgain) Line 3414 + 0x5 bytes C++ >> jvm.dll!JNI_CreateJavaVM(JavaVM_ * * vm, void * * penv, void * >> args) Line 5199 + 0x12 bytes C++ >> java.exe!000000013f0520f6() >> java.exe!000000013f05cb63() >> java.exe!000000013f05cbf7() >> kernel32.dll!0000000076ba59ed() >> ntdll.dll!0000000076cdc541() >> >> Thanks, 
>> Jiangli >> From roland.westrelin at oracle.com Fri Nov 7 13:16:21 2014 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Fri, 7 Nov 2014 14:16:21 +0100 Subject: RFR 8054008: Using -XX:-LazyBootClassLoader crashes with ACCESS_VIOLATION on Win 64bit In-Reply-To: <545C21E6.90709@oracle.com> References: <545C21E6.90709@oracle.com> Message-ID: <682E3725-DDA0-4A1D-9F6E-94C6CF6045A1@oracle.com> Hi Jiangli, > Please review the following changes that fix the crash with -XX:-LazyBootClassLoader on windows x64 platforms (fastdebug only). During VM initialization, current_stack_pointer() could be called before the VM generates stub routines. The generated get_previous_sp routine cannot be used during that time, use the estimated value for the sp value instead. The x86 implementation is unaffected by the change and always returns the estimated sp value as before. > > bug: https://bugs.openjdk.java.net/browse/JDK-8054008 > webrev: http://cr.openjdk.java.net/~jiangli/8054008/webrev/ > > Tested with JPRT and ExtBadJAR test. But if what os::current_stack_pointer() returns is no longer "accurate", aren't you at risk of hitting the assert in os::verify_stack_alignment()? Shouldn't you skip the assert entirely if the routine is not yet available? Also why not make that change on all platforms to improve robustness while you're doing this? Roland. From daniel.daugherty at oracle.com Fri Nov 7 14:02:47 2014 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Fri, 07 Nov 2014 07:02:47 -0700 Subject: RFR(S) Contended Locking cleanup bucket (8062851) In-Reply-To: <545C6748.1000901@oracle.com> References: <5459A8ED.8060808@oracle.com> <545A4719.50705@oracle.com> <545ABDF1.6050107@oracle.com> <545ABF8E.1050408@oracle.com> <545AC1FC.8010905@oracle.com> <545BBA5A.9000007@oracle.com> <545BBB9B.5000807@oracle.com> <545BC59B.5080902@oracle.com> <545C6748.1000901@oracle.com> Message-ID: <545CD107.8060700@oracle.com> Thanks for the re-review! 
Dan On 11/6/14 11:31 PM, David Holmes wrote: > Still fine for me. > > Thanks, > David > > On 7/11/2014 5:01 AM, Daniel D. Daugherty wrote: >> Here's an updated webrev: >> >> http://cr.openjdk.java.net/~dcubed/8062851-webrev/1-jdk9-hs-rt/ >> >> What I did to sanity check this is compare the patch files from >> the two webrevs... >> >> Dan >> >> >> On 11/6/14 11:19 AM, Daniel D. Daugherty wrote: >>> I just reread the entire review thread. >>> >>> I'm going to tweak the macro name a little bit: >>> >>> OM_OFFSET_NO_MONITOR_VALUE => OM_OFFSET_NO_MONITOR_VALUE_TAG >>> >>> I think that captures the intent quite nicely... >>> >>> Yes, I considered OM_OFFSET_WITHOUT_MONITOR_VALUE_TAG, but that >>> made the macro even longer... :-) >>> >>> Dan >>> >>> >>> On 11/6/14 11:13 AM, Daniel D. Daugherty wrote: >>>> On 11/5/14 5:34 PM, David Holmes wrote: >>>>> On 6/11/2014 10:23 AM, Coleen Phillimore wrote: >>>>>> >>>>>> On 11/5/14, 7:16 PM, David Holmes wrote: >>>>>>> On 6/11/2014 1:49 AM, Claes Redestad wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> On 11/05/2014 05:34 AM, Daniel D. Daugherty wrote: >>>>>>>>> Greetings, >>>>>>>>> >>>>>>>>> I have a Contended Locking cleanup bucket fix ready for review. >>>>>>>>> >>>>>>>>> This fix was spun off from the Contended Locking fast enter >>>>>>>>> bucket >>>>>>>>> which was sent out for review late last week. This fix cleans up >>>>>>>>> the computation of ObjectMonitor field pointers and gets rid of >>>>>>>>> the use of literal '-2' in appropriate places. For example: >>>>>>>>> >>>>>>>>> - ld_ptr(Rmark, ObjectMonitor::owner_offset_in_bytes() >>>>>>>>> - 2, >>>>>>>>> Rscratch); >>>>>>>>> + ld_ptr(Rmark, OM_OFFSET_NO_MONITOR_VALUE(owner), >>>>>>>>> Rscratch); >>>>>>>>> >>>>>>>>> The OM_OFFSET_NO_MONITOR_VALUE macro computes the offset to the >>>>>>>>> specified field and subtracts markOopDesc:monitor_value (2). >>>>>>>>> There's a nice comment in src/share/vm/runtime/objectMonitor.hpp. 
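As an illustration of the offset adjustment described in the quoted paragraph, here is a minimal sketch. It is not the HotSpot macro: the struct, constant, macro and function names are stand-ins, and only the idea (field offset minus the tag value of 2) comes from the thread. When a pointer to an inflated monitor carries the tag in its low bits, folding the tag into the offset lets a field be read through the tagged address without untagging it first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for ObjectMonitor; the real class has more fields.
struct MonitorSketch {
  void* header;
  void* object;
  void* owner;
};

// Plays the role of markOopDesc::monitor_value in this sketch.
const int kMonitorValue = 2;

// Offset of a field relative to an address that still carries the tag.
#define OM_OFFSET_SKETCH(f) \
  (static_cast<int>(offsetof(MonitorSketch, f)) - kMonitorValue)

// Produces a tagged address the way a mark word would hold it.
uintptr_t tag(MonitorSketch* m) {
  uintptr_t raw = reinterpret_cast<uintptr_t>(m);
  assert((raw & 3) == 0);  // alignment leaves the low bits free for the tag
  return raw | static_cast<uintptr_t>(kMonitorValue);
}

// Reads the owner field through the tagged address: the +2 from the tag
// and the -2 folded into the offset cancel out.
void* owner_via_tagged(uintptr_t tagged) {
  return *reinterpret_cast<void**>(tagged + OM_OFFSET_SKETCH(owner));
}
```

A literal "- 2" at each use site computes the same address; the named macro only makes the origin of the 2 visible.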
>>>>>>>> >>>>>>>> any reason not to add it as a function in objectMonitor.hpp >>>>>>>> instead of a >>>>>>>> macro? How about: >>>>>>>> >>>>>>>> static int no_monitor_offset_in_bytes() { return >>>>>>>> offset_of(ObjectMonitor, _owner) - markOopDesc::monitor_value; } >>>>>>> >>>>>>> _owner is not the only field used so you would need a function for >>>>>>> each one. >>>>>> >>>>>> I thought this would be better too. There are only 6 functions (6 >>>>>> lines) max that need this. It would look nicer. >>>>> >>>>> Only changes an upper case macro name to a lower case function name. >>>> >>>> As I mentioned in my reply to Claes, there are 12 offset_in_bytes() >>>> functions >>>> so for completeness we would add 12 new functions. I really don't >>>> want to do >>>> that. >>>> >>>> >>>>> >>>>>> My suggestion would be to make them static int >>>>>> untagged_offset_in_bytes() or whatever monitor_value is. It's not a >>>>>> very descriptive name so better to name the functions after what >>>>>> it's for. >>>>> >>>>> You need the field name included in the function name: >>>>> >>>>> untagged_offset_of_owner() >>>>> untagged_offset_of_xxx() >>>>> >>>>> but it is only untagged if the OM is currently inflated, so then: >>>>> >>>>> untagged_offset_of_XXX_for_inflated_om() >>>>> >>>>> I can live with Dan's macro (which is an improvement on the >>>>> original). >>>> >>>> Thanks! I'm planning to stick with the macro which has a much smaller >>>> footprint for such a strange little task... >>>> >>>> Dan >>>> >>>> >>>>> >>>>> David >>>>> >>>>>> Coleen >>>>>> >>>>>>> >>>>>>> David >>>>>>> ----- >>>>>>> >>>>>>>> Example usage: >>>>>>>> >>>>>>>> - ld_ptr(Rmark, ObjectMonitor::owner_offset_in_bytes() >>>>>>>> - 2, >>>>>>>> Rscratch); >>>>>>>> + ld_ptr(Rmark, >>>>>>>> ObjectMonitor::no_monitor_offset_in_bytes(), >>>>>>>> Rscratch); >>>>>>>> >>>>>>>> >>>>>>>> Seems this should be inlined regardless and looks a bit cleaner >>>>>>>> to me. >>>>>>>> >>>>>>>> Thanks! 
>>>>>>>> /Claes >>>>>>>> >>>>>>>>> >>>>>>>>> Thanks to David Holmes for his comments on JDK-8061553 that >>>>>>>>> motivated this (long overdue) cleanup. >>>>>>>>> >>>>>>>>> This work is being tracked by the following bug ID: >>>>>>>>> >>>>>>>>> JDK-8062851 cleanup ObjectMonitor offset adjustments >>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8062851 >>>>>>>>> >>>>>>>>> Here is the webrev URL: >>>>>>>>> >>>>>>>>> http://cr.openjdk.java.net/~dcubed/8062851-webrev/0-jdk9-hs-rt/ >>>>>>>>> >>>>>>>>> Here is the JEP link: >>>>>>>>> >>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8046133 >>>>>>>>> >>>>>>>>> Testing: >>>>>>>>> >>>>>>>>> - JPRT test jobs (since this is only syntax and comment cleanup) >>>>>>>>> >>>>>>>>> Thanks, in advance, for any comments, questions or suggestions. >>>>>>>>> >>>>>>>>> Dan >>>>>>>> >>>>>> >>>>> >>>>> >>>>> >>>> >>> >>> >> From andreas.eriksson at oracle.com Fri Nov 7 14:48:16 2014 From: andreas.eriksson at oracle.com (Andreas Eriksson) Date: Fri, 07 Nov 2014 15:48:16 +0100 Subject: RFR: JDK7 backport of 8020675 - invalid jar file in the bootclasspath could lead to jvm fatal error In-Reply-To: <545BBA1F.3040301@oracle.com> References: <545B797D.70907@oracle.com> <545BA401.1070205@oracle.com> <545BBA1F.3040301@oracle.com> Message-ID: <545CDBB0.80700@oracle.com> Oh, interesting. The hsx25 changeset does not display the dummy.jar as being a part of the checkin: http://hg.openjdk.java.net/hsx/hsx25/hotspot/rev/7e7dd25666da But when I navigate to the dummy.jar path I can see that it was checked in as part of that changeset: http://hg.openjdk.java.net/hsx/hsx25/hotspot/log/7e7dd25666da/test/runtime/LoadClass/dummy.jar Is this a known issue with Mercurial? Anyway, thanks for pointing this out, I would probably have missed it otherwise. It seems that if the dummy.jar is not present the test always succeeds. Thanks, Andreas On 2014-11-06 19:12, Calvin Cheung wrote: > Hi Andreas, > > The change looks good. 
> There should be a dummy.jar to go with the test cases. > http://cr.openjdk.java.net/~ccheung/8020675/webrev.02/ > > The webrev won't show any diffs for the jar file but don't forget to > include it when you push the fix. > > thanks, > Calvin > > On 11/6/2014 8:38 AM, Andreas Eriksson wrote: >> Hi, >> >> Could someone please review this jdk7 backport of JDK-8020675 >> . >> Summary: >> invalid jar file in the bootclasspath could lead to jvm fatal error >> removed offending EXCEPTION_MARK calls and code cleanup >> >> One code change necessary for the backport was in method >> ClassLoader::load_classfile. >> The change was to use CHECK_(instanceKlassHandle()) instead of >> CHECK_NULL. >> See the mail thread at >> http://mail.openjdk.java.net/pipermail/hotspot-dev/2014-November/015825.html >> for more information. >> >> Webrev: http://cr.openjdk.java.net/~aeriksso/8020675/webrev.00/ >> >> Regards, >> Andreas >> >> > From andreas.eriksson at oracle.com Fri Nov 7 15:11:01 2014 From: andreas.eriksson at oracle.com (Andreas Eriksson) Date: Fri, 07 Nov 2014 16:11:01 +0100 Subject: RFR: JDK7 backport of 8020675 - invalid jar file in the bootclasspath could lead to jvm fatal error In-Reply-To: <545CDBB0.80700@oracle.com> References: <545B797D.70907@oracle.com> <545BA401.1070205@oracle.com> <545BBA1F.3040301@oracle.com> <545CDBB0.80700@oracle.com> Message-ID: <545CE105.4020208@oracle.com> I think I need a jdk7u Reviewer to look at this as well, right? New webrev where I added the 0 byte dummy.jar: http://cr.openjdk.java.net/~aeriksso/8020675/webrev.01/ Checked so that the test fails on older versions and still passes on a fixed version. Regards, Andreas On 2014-11-07 15:48, Andreas Eriksson wrote: > Oh, interesting. 
> The hsx25 changeset does not display the dummy.jar as being a part of > the checkin: > http://hg.openjdk.java.net/hsx/hsx25/hotspot/rev/7e7dd25666da > > But when I navigate to the dummy.jar path I can see that it was > checked in as part of that changeset: > http://hg.openjdk.java.net/hsx/hsx25/hotspot/log/7e7dd25666da/test/runtime/LoadClass/dummy.jar > > > Is this a know issue with mercurial? > > Anyway, thanks for pointing this out, I would probably have missed it > otherwise. > It seems that if the dummy.jar is not present the test always succeeds. > > Thanks, > Andreas > > On 2014-11-06 19:12, Calvin Cheung wrote: >> Hi Andreas, >> >> The change looks good. >> There should be a dummy.jar to go with the test cases. >> http://cr.openjdk.java.net/~ccheung/8020675/webrev.02/ >> >> The webrev won't show any diffs for the jar file but don't forget to >> include it when you push the fix. >> >> thanks, >> Calvin >> >> On 11/6/2014 8:38 AM, Andreas Eriksson wrote: >>> Hi, >>> >>> Could someone please review this jdk7 backport of JDK-8020675 >>> . >>> Summary: >>> invalid jar file in the bootclasspath could lead to jvm fatal error >>> removed offending EXCEPTION_MARK calls and code cleanup >>> >>> One code change necessary for the backport was in method >>> ClassLoader::load_classfile. >>> The change was to use CHECK_(instanceKlassHandle()) instead of >>> CHECK_NULL. >>> See the mail thread at >>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2014-November/015825.html >>> for more information. 
>>> >>> Webrev: http://cr.openjdk.java.net/~aeriksso/8020675/webrev.00/ >>> >>> Regards, >>> Andreas >>> >>> >> > From jiangli.zhou at oracle.com Fri Nov 7 17:11:07 2014 From: jiangli.zhou at oracle.com (Jiangli Zhou) Date: Fri, 07 Nov 2014 09:11:07 -0800 Subject: RFR 8054008: Using -XX:-LazyBootClassLoader crashes with ACCESS_VIOLATION on Win 64bit In-Reply-To: <545C694A.9070004@oracle.com> References: <545C21E6.90709@oracle.com> <545C6620.1040301@oracle.com> <545C694A.9070004@oracle.com> Message-ID: <545CFD2B.8030108@oracle.com> Thank you, David. I see Roland has some suggestions regarding the change. I'll explore those. Thanks, Jiangli On 11/06/2014 10:40 PM, David Holmes wrote: > On 7/11/2014 4:26 PM, David Holmes wrote: >> Looks good to me! Glad to see this could be resolved with changing the >> initialization sequence! > > s/with/without/ :) > > David > >> >> Please update copyright year before pushing. >> >> Thanks, >> David >> >> On 7/11/2014 11:35 AM, Jiangli Zhou wrote: >>> Hi, >>> >>> Please review the following changes that fix the crash with >>> -XX:-LazyBootClassLoader on windows x64 platforms (fastdebug only). >>> During VM initialization, current_stack_pointer() could be called >>> before the VM generates stub routines. The generated get_previous_sp >>> routine cannot be used during that time, use the estimated value for >>> the >>> sp value instead. The x86 implementation is unaffected by the change >>> and >>> always returns the estimated sp value as before. >>> >>> bug: https://bugs.openjdk.java.net/browse/JDK-8054008 >>> webrev: http://cr.openjdk.java.net/~jiangli/8054008/webrev/ >>> >>> Tested with JPRT and ExtBadJAR test. >>> >>> Background: >>> As part of the VM initialization, classLoader_init() calls ZIP_Open >>> from >>> the zip library for processing the boot class path when >>> -XX:-LazyBootClassLoader is specified. The call path re-enters VM >>> before >>> returning from the zip library call. 
Following is the backtrace right >>> before when the crash happens. The windows x64 version of >>> current_stack_pointer() uses generated stub routine get_previous_sp >>> (generated by generate_get_previous_sp()) to obtain the stack pointer >>> value. Since classLoader_init() happens before stubRoutines_init1() and >>> the stub routines are not generated at the time, the execution jumps to >>> address 0 (referenced by _get_previous_sp_entry which should contain >>> the >>> address of the generated routine after stubRoutines_init1()) when it's >>> trying to call the stub routine and crashes. >>> >>> >>> jvm.dll!os::current_stack_pointer() Line 468 C++ >>> jvm.dll!os::verify_stack_alignment() Line 638 + 0x5 bytes C++ >>> jvm.dll!JVM_NativePath(char * path) Line 691 C++ >>> zip.dll!000007feebc49de0() >>> [Frames below may be incorrect and/or missing, no symbols loaded >>> for zip.dll] >>> zip.dll!000007feebc4af1d() >>> zip.dll!000007feebc4b004() >>> jvm.dll!ClassLoader::create_class_path_entry(const char * path, >>> const stat * st, bool lazy, bool throw_exception, Thread * >>> __the_thread__) Line 666 + 0x13 bytes C++ >>> jvm.dll!ClassLoader::update_class_path_entry_list(const char * >>> path, bool check_for_duplicates, bool throw_exception) Line 763 + 0x2d >>> bytes C++ >>> jvm.dll!ClassLoader::setup_search_path(const char * class_path) >>> Line 630 C++ >>> jvm.dll!ClassLoader::setup_bootstrap_search_path() Line 594 C++ >>> jvm.dll!ClassLoader::initialize() Line 1237 C++ >>> jvm.dll!classLoader_init() Line 1291 C++ >>> jvm.dll!init_globals() Line 100 C++ >>> jvm.dll!Threads::create_vm(JavaVMInitArgs * args, bool * >>> canTryAgain) Line 3414 + 0x5 bytes C++ >>> jvm.dll!JNI_CreateJavaVM(JavaVM_ * * vm, void * * penv, void * >>> args) Line 5199 + 0x12 bytes C++ >>> java.exe!000000013f0520f6() >>> java.exe!000000013f05cb63() >>> java.exe!000000013f05cbf7() >>> kernel32.dll!0000000076ba59ed() >>> ntdll.dll!0000000076cdc541() >>> >>> Thanks, >>> Jiangli >>> From 
jiangli.zhou at oracle.com Fri Nov 7 17:29:11 2014
From: jiangli.zhou at oracle.com (Jiangli Zhou)
Date: Fri, 07 Nov 2014 09:29:11 -0800
Subject: RFR 8054008: Using -XX:-LazyBootClassLoader crashes with ACCESS_VIOLATION on Win 64bit
In-Reply-To: <682E3725-DDA0-4A1D-9F6E-94C6CF6045A1@oracle.com>
References: <545C21E6.90709@oracle.com> <682E3725-DDA0-4A1D-9F6E-94C6CF6045A1@oracle.com>
Message-ID: <545D0167.3070903@oracle.com>

Hi Roland,

Thank you for the review. Please see comments and questions below.

On 11/07/2014 05:16 AM, Roland Westrelin wrote:
> Hi Jiangli,
>
>> Please review the following changes that fix the crash with -XX:-LazyBootClassLoader on windows x64 platforms (fastdebug only). During VM initialization, current_stack_pointer() could be called before the VM generates stub routines. The generated get_previous_sp routine cannot be used during that time, use the estimated value for the sp value instead. The x86 implementation is unaffected by the change and always returns the estimated sp value as before.
>>
>> bug: https://bugs.openjdk.java.net/browse/JDK-8054008
>> webrev: http://cr.openjdk.java.net/~jiangli/8054008/webrev/
>>
>> Tested with JPRT and ExtBadJAR test.
> But if what os::current_stack_pointer() returns is no longer "accurate", aren't you at risk of hitting the assert in os::verify_stack_alignment()? Shouldn't you skip the assert entirely if the routine is not yet available?

For x64, it still returns the "accurate" value once the routine is generated. Before the routine is ready, it gives the estimate, which might have the risk of upsetting the assert as you suggested. I have a few questions. Have you run into the case where the estimate might trigger the assertion on x64? What about x86, why that's not handled the same as x64?

>
> Also why not make that change on all platform to improve robustness while you're doing this?

Thank you for the suggestion. Sounds good. I'll look into this.
Is there a global flag that indicates the stub routines are generated? Thanks, Jiangli > > Roland. From calvin.cheung at oracle.com Fri Nov 7 19:28:22 2014 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Fri, 07 Nov 2014 11:28:22 -0800 Subject: RFR(XS): 8060721: Test runtime/SharedArchiveFile/LimitSharedSizes.java fails in jdk 9 fcs new platforms/compiler In-Reply-To: <545C68E7.4080807@oracle.com> References: <545A770C.3030503@oracle.com> <545AC5D3.9090005@oracle.com> <545AF8F2.1010106@oracle.com> <545C1B31.3060901@oracle.com> <545C68E7.4080807@oracle.com> Message-ID: <545D1D56.4050000@oracle.com> On 11/6/2014 10:38 PM, David Holmes wrote: > Hi Calvin, > > On 7/11/2014 11:06 AM, Calvin Cheung wrote: >> I've updated the webrev at the same location: >> http://cr.openjdk.java.net/~ccheung/8060721/webrev/ >> I also re-ran the tests. >> >> Please take a look. > > 717 jio_snprintf(class_list_path_str + class_list_path_len, > 718 sizeof(class_list_path_str) - > class_list_path_len, > 719 "%slib", os::file_separator()); > 720 } > 721 } > 722 class_list_path_len = (int)strlen(class_list_path_str); > > The strlen recalculation at #722 should be moved inside the if-block > as that is the only time it is needed. Agreed. > Also can we not just do += 4 ? I didn't want to use 4 to avoid another magic number but in this case I think it's obvious. I've updated webrev at the same location: http://cr.openjdk.java.net/~ccheung/8060721/webrev/ thanks, Calvin > > Thanks, > David > >> thanks, >> Calvin >> >> On 11/5/2014 8:28 PM, Calvin Cheung wrote: >>> On 11/5/2014 4:50 PM, David Holmes wrote: >>>> On 6/11/2014 5:14 AM, Calvin Cheung wrote: >>>>> While upgrading the compiler on Mac for jdk9, we found this compiler >>>>> bug >>>>> where it skips the following 2 lines of code in metaspaceShared.cpp >>>>> when >>>>> optimization is enable (set to -Os) for the fastdebug and product >>>>> builds. 
>>>>> strcat(class_list_path_str, os::file_separator()); >>>>> strcat(class_list_path_str, "classlist"); >>>>> >>>>> The bug is reproducible with Xcode 5.1.1 and 6.1. >>>>> >>>>> A workaround fix is to rewrite an "if" block in the >>>>> MetaspaceShared::preload_and_dump() method. >>>> >>>> Can't you simply replace the strcats with jio_snprintf and do away >>>> with the sub_path array? >>> The following works. I'll do more testing before sending an updated >>> webrev. >>> >>> --- a/src/share/vm/memory/metaspaceShared.cpp >>> +++ b/src/share/vm/memory/metaspaceShared.cpp >>> @@ -713,12 +713,15 @@ >>> int class_list_path_len = (int)strlen(class_list_path_str); >>> if (class_list_path_len >= 3) { >>> if (strcmp(class_list_path_str + class_list_path_len - 3, >>> "lib") != 0) { >>> - strcat(class_list_path_str, os::file_separator()); >>> - strcat(class_list_path_str, "lib"); >>> + jio_snprintf(class_list_path_str + class_list_path_len, >>> + sizeof(class_list_path_str) - >>> class_list_path_len, >>> + "%slib", os::file_separator()); >>> } >>> } >>> - strcat(class_list_path_str, os::file_separator()); >>> - strcat(class_list_path_str, "classlist"); >>> + class_list_path_len = (int)strlen(class_list_path_str); >>> + jio_snprintf(class_list_path_str + class_list_path_len, >>> + sizeof(class_list_path_str) - class_list_path_len, >>> + "%sclasslist", os::file_separator()); >>> class_list_path = class_list_path_str; >>> } else { >>> class_list_path = SharedClassListFile; >>>> >>>> Or even try strncat instead of strcat? >>> I think jio_snprintf is better because it null terminates the string. >>> If I use strncat, I'll need to initialize the entire buffer to null. >>> >>> thanks, >>> Calvin >>>> >>>> David >>>> >>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8060721 >>>>> >>>>> webrev: http://cr.openjdk.java.net/~ccheung/8060721/webrev/ >>>>> >>>>> Testing: >>>>> JPRT >>>>> The affected testcase with product, fastdebug, and debug builds >>>>> built with Xcode 5.1.1 and 6.1. 
>>>>> >>>>> thanks, >>>>> Calvin >>> >> From ioi.lam at oracle.com Fri Nov 7 20:44:02 2014 From: ioi.lam at oracle.com (Ioi Lam) Date: Fri, 07 Nov 2014 12:44:02 -0800 Subject: RFR(XS): 8060721: Test runtime/SharedArchiveFile/LimitSharedSizes.java fails in jdk 9 fcs new platforms/compiler In-Reply-To: <545D1D56.4050000@oracle.com> References: <545A770C.3030503@oracle.com> <545AC5D3.9090005@oracle.com> <545AF8F2.1010106@oracle.com> <545C1B31.3060901@oracle.com> <545C68E7.4080807@oracle.com> <545D1D56.4050000@oracle.com> Message-ID: <545D2F12.1000001@oracle.com> Calvin, the new changes look good to me. - Ioi On 11/7/14, 11:28 AM, Calvin Cheung wrote: > On 11/6/2014 10:38 PM, David Holmes wrote: >> Hi Calvin, >> >> On 7/11/2014 11:06 AM, Calvin Cheung wrote: >>> I've updated the webrev at the same location: >>> http://cr.openjdk.java.net/~ccheung/8060721/webrev/ >>> I also re-ran the tests. >>> >>> Please take a look. >> >> 717 jio_snprintf(class_list_path_str + class_list_path_len, >> 718 sizeof(class_list_path_str) - >> class_list_path_len, >> 719 "%slib", os::file_separator()); >> 720 } >> 721 } >> 722 class_list_path_len = (int)strlen(class_list_path_str); >> >> The strlen recalculation at #722 should be moved inside the if-block >> as that is the only time it is needed. > Agreed. >> Also can we not just do += 4 ? > I didn't want to use 4 to avoid another magic number but in this case > I think it's obvious. 
> > I've updated webrev at the same location: > http://cr.openjdk.java.net/~ccheung/8060721/webrev/ > > thanks, > Calvin >> >> Thanks, >> David >> >>> thanks, >>> Calvin >>> >>> On 11/5/2014 8:28 PM, Calvin Cheung wrote: >>>> On 11/5/2014 4:50 PM, David Holmes wrote: >>>>> On 6/11/2014 5:14 AM, Calvin Cheung wrote: >>>>>> While upgrading the compiler on Mac for jdk9, we found this compiler >>>>>> bug >>>>>> where it skips the following 2 lines of code in metaspaceShared.cpp >>>>>> when >>>>>> optimization is enable (set to -Os) for the fastdebug and product >>>>>> builds. >>>>>> strcat(class_list_path_str, os::file_separator()); >>>>>> strcat(class_list_path_str, "classlist"); >>>>>> >>>>>> The bug is reproducible with Xcode 5.1.1 and 6.1. >>>>>> >>>>>> A workaround fix is to rewrite an "if" block in the >>>>>> MetaspaceShared::preload_and_dump() method. >>>>> >>>>> Can't you simply replace the strcats with jio_snprintf and do away >>>>> with the sub_path array? >>>> The following works. I'll do more testing before sending an updated >>>> webrev. 
>>>> >>>> --- a/src/share/vm/memory/metaspaceShared.cpp >>>> +++ b/src/share/vm/memory/metaspaceShared.cpp >>>> @@ -713,12 +713,15 @@ >>>> int class_list_path_len = (int)strlen(class_list_path_str); >>>> if (class_list_path_len >= 3) { >>>> if (strcmp(class_list_path_str + class_list_path_len - 3, >>>> "lib") != 0) { >>>> - strcat(class_list_path_str, os::file_separator()); >>>> - strcat(class_list_path_str, "lib"); >>>> + jio_snprintf(class_list_path_str + class_list_path_len, >>>> + sizeof(class_list_path_str) - >>>> class_list_path_len, >>>> + "%slib", os::file_separator()); >>>> } >>>> } >>>> - strcat(class_list_path_str, os::file_separator()); >>>> - strcat(class_list_path_str, "classlist"); >>>> + class_list_path_len = (int)strlen(class_list_path_str); >>>> + jio_snprintf(class_list_path_str + class_list_path_len, >>>> + sizeof(class_list_path_str) - class_list_path_len, >>>> + "%sclasslist", os::file_separator()); >>>> class_list_path = class_list_path_str; >>>> } else { >>>> class_list_path = SharedClassListFile; >>>>> >>>>> Or even try strncat instead of strcat? >>>> I think jio_snprintf is better because it null terminates the string. >>>> If I use strncat, I'll need to initialize the entire buffer to null. >>>> >>>> thanks, >>>> Calvin >>>>> >>>>> David >>>>> >>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8060721 >>>>>> >>>>>> webrev: http://cr.openjdk.java.net/~ccheung/8060721/webrev/ >>>>>> >>>>>> Testing: >>>>>> JPRT >>>>>> The affected testcase with product, fastdebug, and debug builds >>>>>> built with Xcode 5.1.1 and 6.1. 
>>>>>> >>>>>> thanks, >>>>>> Calvin >>>> >>> > From calvin.cheung at oracle.com Fri Nov 7 21:06:05 2014 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Fri, 07 Nov 2014 13:06:05 -0800 Subject: RFR(XS): 8060721: Test runtime/SharedArchiveFile/LimitSharedSizes.java fails in jdk 9 fcs new platforms/compiler In-Reply-To: <545D2F12.1000001@oracle.com> References: <545A770C.3030503@oracle.com> <545AC5D3.9090005@oracle.com> <545AF8F2.1010106@oracle.com> <545C1B31.3060901@oracle.com> <545C68E7.4080807@oracle.com> <545D1D56.4050000@oracle.com> <545D2F12.1000001@oracle.com> Message-ID: <545D343D.4080001@oracle.com> Thanks - Ioi. On 11/7/2014 12:44 PM, Ioi Lam wrote: > Calvin, the new changes look good to me. > > - Ioi > > On 11/7/14, 11:28 AM, Calvin Cheung wrote: >> On 11/6/2014 10:38 PM, David Holmes wrote: >>> Hi Calvin, >>> >>> On 7/11/2014 11:06 AM, Calvin Cheung wrote: >>>> I've updated the webrev at the same location: >>>> http://cr.openjdk.java.net/~ccheung/8060721/webrev/ >>>> I also re-ran the tests. >>>> >>>> Please take a look. >>> >>> 717 jio_snprintf(class_list_path_str + class_list_path_len, >>> 718 sizeof(class_list_path_str) - >>> class_list_path_len, >>> 719 "%slib", os::file_separator()); >>> 720 } >>> 721 } >>> 722 class_list_path_len = (int)strlen(class_list_path_str); >>> >>> The strlen recalculation at #722 should be moved inside the if-block >>> as that is the only time it is needed. >> Agreed. >>> Also can we not just do += 4 ? >> I didn't want to use 4 to avoid another magic number but in this case >> I think it's obvious. 
>> >> I've updated webrev at the same location: >> http://cr.openjdk.java.net/~ccheung/8060721/webrev/ >> >> thanks, >> Calvin >>> >>> Thanks, >>> David >>> >>>> thanks, >>>> Calvin >>>> >>>> On 11/5/2014 8:28 PM, Calvin Cheung wrote: >>>>> On 11/5/2014 4:50 PM, David Holmes wrote: >>>>>> On 6/11/2014 5:14 AM, Calvin Cheung wrote: >>>>>>> While upgrading the compiler on Mac for jdk9, we found this >>>>>>> compiler >>>>>>> bug >>>>>>> where it skips the following 2 lines of code in metaspaceShared.cpp >>>>>>> when >>>>>>> optimization is enable (set to -Os) for the fastdebug and product >>>>>>> builds. >>>>>>> strcat(class_list_path_str, os::file_separator()); >>>>>>> strcat(class_list_path_str, "classlist"); >>>>>>> >>>>>>> The bug is reproducible with Xcode 5.1.1 and 6.1. >>>>>>> >>>>>>> A workaround fix is to rewrite an "if" block in the >>>>>>> MetaspaceShared::preload_and_dump() method. >>>>>> >>>>>> Can't you simply replace the strcats with jio_snprintf and do away >>>>>> with the sub_path array? >>>>> The following works. I'll do more testing before sending an updated >>>>> webrev. 
>>>>> >>>>> --- a/src/share/vm/memory/metaspaceShared.cpp >>>>> +++ b/src/share/vm/memory/metaspaceShared.cpp >>>>> @@ -713,12 +713,15 @@ >>>>> int class_list_path_len = (int)strlen(class_list_path_str); >>>>> if (class_list_path_len >= 3) { >>>>> if (strcmp(class_list_path_str + class_list_path_len - 3, >>>>> "lib") != 0) { >>>>> - strcat(class_list_path_str, os::file_separator()); >>>>> - strcat(class_list_path_str, "lib"); >>>>> + jio_snprintf(class_list_path_str + class_list_path_len, >>>>> + sizeof(class_list_path_str) - >>>>> class_list_path_len, >>>>> + "%slib", os::file_separator()); >>>>> } >>>>> } >>>>> - strcat(class_list_path_str, os::file_separator()); >>>>> - strcat(class_list_path_str, "classlist"); >>>>> + class_list_path_len = (int)strlen(class_list_path_str); >>>>> + jio_snprintf(class_list_path_str + class_list_path_len, >>>>> + sizeof(class_list_path_str) - class_list_path_len, >>>>> + "%sclasslist", os::file_separator()); >>>>> class_list_path = class_list_path_str; >>>>> } else { >>>>> class_list_path = SharedClassListFile; >>>>>> >>>>>> Or even try strncat instead of strcat? >>>>> I think jio_snprintf is better because it null terminates the string. >>>>> If I use strncat, I'll need to initialize the entire buffer to null. >>>>> >>>>> thanks, >>>>> Calvin >>>>>> >>>>>> David >>>>>> >>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8060721 >>>>>>> >>>>>>> webrev: http://cr.openjdk.java.net/~ccheung/8060721/webrev/ >>>>>>> >>>>>>> Testing: >>>>>>> JPRT >>>>>>> The affected testcase with product, fastdebug, and debug >>>>>>> builds >>>>>>> built with Xcode 5.1.1 and 6.1. 
>>>>>>> >>>>>>> thanks, >>>>>>> Calvin >>>>> >>>> >> > From david.r.chase at oracle.com Fri Nov 7 21:14:38 2014 From: david.r.chase at oracle.com (David Chase) Date: Fri, 7 Nov 2014 16:14:38 -0500 Subject: [9] RFR(L) 8013267 : move MemberNameTable from native code to Java heap, use to intern MemberNames In-Reply-To: References: <594FE416-D445-426E-B7A9-C75F012ADE16@oracle.com> <0BDCD5DD-CBFB-4A3D-9BAD-7F9912E9237C@oracle.com> <5453C230.8010709@oracle.com> <9747A3A3-B7F4-407A-8E00-1E647D9DC2D1@oracle.com> <1D195ED3-8BD2-46CB-9D70-29CF809D9F5A@oracle.com> <5456FB59.60905@oracle.com> <632A5C98-B386-4625-BE12-355241581955@oracle.com> <5457AA75.8090103@gmail.com> <5457E0F9.8090004@gmail.com> <5458A57C.4060208@gmail.com> <260D49F5-6380-4FC3-A900-6CD9AB3ED6F7@oracle.com> <5459034E.8070809@gmail.com> Message-ID: <39826508-110B-4FCE-9A58-8C3D1B9FC7DE@oracle.com> New webrev: bug: https://bugs.openjdk.java.net/browse/JDK-8013267 webrevs: http://cr.openjdk.java.net/~drchase/8013267/jdk.06/ http://cr.openjdk.java.net/~drchase/8013267/hotspot.06/ Changes since last: 1) refactored to put ClassData under java.lang.invoke.MemberName 2) split the data structure into two parts; handshake with JVM uses a linked list, which makes for a simpler backout-if-race, and Java side continues to use the simple sorted array. This should allow easier use of (for example) fancier data structures (like ConcurrentHashMap) if this later proves necessary. 3) Cleaned up symbol references in the new hotspot code to go through vmSymbols. 4) renamed oldCapacity to oldSize 5) ran two different benchmarks and saw no change in performance. a) nashorn ScriptTest (see https://bugs.openjdk.java.net/browse/JDK-8014288 ) b) JMH microbenchmarks (see bug comments for details) And it continues to pass the previously-failing tests, as well as the new test which has been added to hotspot/test/compiler/jsr292 . 
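For readers who want a picture of item 2 above, here is a minimal, self-contained sketch of the general shape: publish on a linked list first, back out if the redefinition count moved, and only then record the entry in the sorted array. It is not the webrev code. InternSketch, Node, publishedCount() and the IntSupplier standing in for the class redefinition count are all invented for illustration, and plain strings stand in for MemberNames.

```java
import java.util.Arrays;
import java.util.function.IntSupplier;

// Illustrative sketch only; every name here is hypothetical.
final class InternSketch {
    // Linked-list cell, playing the role of the list the VM would walk.
    static final class Node {
        final String name;
        final Node next;
        Node(String name, Node next) { this.name = name; this.next = next; }
    }

    private volatile Node publishedToVM;       // head of the VM-visible list
    private String[] sorted = new String[4];   // Java-side interning table
    private int size;
    private final IntSupplier redefCount;      // stand-in for the redefinition count

    InternSketch(IntSupplier redefCount) { this.redefCount = redefCount; }

    // Returns the interned instance, or null if a redefinition raced with us.
    synchronized String intern(String name, int countSeenBeforeResolve) {
        int index = Arrays.binarySearch(sorted, 0, size, name);
        if (index >= 0) {
            return sorted[index];              // already interned
        }
        Node previousHead = publishedToVM;
        publishedToVM = new Node(name, previousHead);   // publish first
        if (countSeenBeforeResolve != redefCount.getAsInt()) {
            publishedToVM = previousHead;      // lost the race: back out
            return null;                       // caller re-resolves and retries
        }
        int insertAt = ~index;                 // insertion point from binarySearch
        if (size == sorted.length) {
            sorted = Arrays.copyOf(sorted, size * 2);
        }
        System.arraycopy(sorted, insertAt, sorted, insertAt + 1, size - insertAt);
        sorted[insertAt] = name;
        size++;
        return name;
    }

    // Number of entries currently on the VM-visible list.
    int publishedCount() {
        int n = 0;
        for (Node p = publishedToVM; p != null; p = p.next) n++;
        return n;
    }

    public static void main(String[] args) {
        int[] count = {0};                     // fake redefinition counter
        InternSketch table = new InternSketch(() -> count[0]);
        System.out.println(table.intern("m1", 0));   // interned
        count[0]++;                                  // simulate a redefinition
        System.out.println(table.intern("m2", 0));   // stale count, backed out
        System.out.println(table.intern("m2", 1));   // retry with fresh count
    }
}
```

The only point of the sketch is the ordering: because the list is updated before the sorted array, a lost race needs just the single head pointer restored, and nothing has been exposed to other Java threads yet.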
David

On 2014-11-04, at 3:54 PM, David Chase wrote:

> I'm working on the initial benchmarking, and so far this arrangement (with synchronization
> and binary search for lookup, lots of barriers and linear cost insertion) has not yet been any
> slower.
>
> I am nonetheless tempted by the 2-tables solution, because I think the simpler JVM-side
> interface that it allows is desirable.
>
> David
>
> On 2014-11-04, at 11:48 AM, Peter Levart wrote:
>
>> On 11/04/2014 04:19 PM, David Chase wrote:
>>> On 2014-11-04, at 5:07 AM, Peter Levart wrote:
>>>> Are you thinking of an IdentityHashMap type of hash table (no linked-list of elements for same bucket, just search for 1st free slot on insert)? The problem would be how to pre-size the array. Count declared members?
>>> It can't be an identityHashMap, because we are interning member names.
>>
>> I know it can't be IdentityHashMap - I just wondered if you were thinking of an IdentityHashMap-like data structure in contrast to standard HashMap-like. Not in terms of equality/hashCode used, but in terms of internal data structure. IdentityHashMap is just an array of elements (well pairs of them - key, value are placed in two consecutive array slots). Lookup searches for element linearly in the array starting from hashCode based index to the element if found or 1st empty array slot. It's very easy to implement if the only operations are get() and put() and could be used for interning and as a shared structure for VM to scan, but array has to be sized to at least 3/2 the number of elements for performance to not degrade.
>>
>>> In spite of my grumbling about benchmarking, I'm inclined to do that and try a couple of experiments.
>>> One possibility would be to use two data structures, one for interning, the other for communication with the VM.
>>> Because there's no lookup in the VM data structure it can just be an array that gets elements appended,
>>> and the synchronization dance is much simpler.
>>>
>>> For interning, maybe I use a ConcurrentHashMap, and I try the following idiom:
>>>
>>> mn = resolve(args)
>>> // deal with any errors
>>> mn' = chm.get(mn)
>>> if (mn' != null) return mn' // hoped-for-common-case
>>>
>>> synchronized (something) {
>>>   mn' = chm.get(mn)
>>>   if (mn' != null) return mn'
>>>   txn_class = mn.getDeclaringClass()
>>>
>>>   while (true) {
>>>     redef_count = txn_class.redefCount()
>>>     mn = resolve(args)
>>>
>>>     shared_array.add(mn);
>>>     // barrier, because we are paranoid
>>>     if (redef_count == txn_class.redefCount()) {
>>>       chm.add(mn); // safe to publish to other Java threads.
>>>       return mn;
>>>     }
>>>     shared_array.drop_last(); // Try again
>>>   }
>>> }
>>>
>>> (Idiom gets slightly revised for the one or two other intern use cases, but this is the basic idea).
>>
>> Yes, that's similar to what I suggested by using a linked-list of MemberName(s) instead of the "shared_array" (easier to reason about ordering of writes) and a sorted array of MemberName(s) instead of the "chm" in your scheme above. ConcurrentHashMap would certainly be the most performant solution in terms of lookup/insertion-time and concurrent throughput, but it will use more heap than a simple packed array of MemberNames. CHM is much better now in JDK8 though regarding heap use.
>>
>> A combination of the two approaches is also possible:
>>
>> - instead of maintaining a "shared_array" of MemberName(s), have them form a linked-list (you trade a slot in array for 'next' pointer in MemberName)
>> - use ConcurrentHashMap for interning.
>>
>> Regards, Peter
>>
>>>
>>> David
>>>
>>>>> And another way to view this is that we're now quibbling about performance, when we still
>>>>> have an existing correctness problem that this patch solves, so maybe we should just get this
>>>>> done and then file an RFE.
>>>> Perhaps, yes. But note that questions about JMM and ordering of writes to array elements are about correctness, not performance.
>>>> >>>> Regards, Peter >>>> >>>>> David > From calvin.cheung at oracle.com Fri Nov 7 21:38:23 2014 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Fri, 07 Nov 2014 13:38:23 -0800 Subject: RFR: JDK7 backport of 8020675 - invalid jar file in the bootclasspath could lead to jvm fatal error In-Reply-To: <545CE105.4020208@oracle.com> References: <545B797D.70907@oracle.com> <545BA401.1070205@oracle.com> <545BBA1F.3040301@oracle.com> <545CDBB0.80700@oracle.com> <545CE105.4020208@oracle.com> Message-ID: <545D3BCF.8090200@oracle.com> The new webrev looks good. On 11/7/2014 7:11 AM, Andreas Eriksson wrote: > I think I need a jdk7u Reviewer to look at this as well, right? For backport, you only need 1 reviewer and it doesn't have to be a capital R Reviewer. Calvin > > New webrev where I added the 0 byte dummy.jar: > http://cr.openjdk.java.net/~aeriksso/8020675/webrev.01/ > > Checked so that the test fails on older versions and still passes on a > fixed version. > > Regards, > Andreas > > On 2014-11-07 15:48, Andreas Eriksson wrote: >> Oh, interesting. >> The hsx25 changeset does not display the dummy.jar as being a part of >> the checkin: >> http://hg.openjdk.java.net/hsx/hsx25/hotspot/rev/7e7dd25666da >> >> But when I navigate to the dummy.jar path I can see that it was >> checked in as part of that changeset: >> http://hg.openjdk.java.net/hsx/hsx25/hotspot/log/7e7dd25666da/test/runtime/LoadClass/dummy.jar >> >> >> Is this a know issue with mercurial? >> >> Anyway, thanks for pointing this out, I would probably have missed it >> otherwise. >> It seems that if the dummy.jar is not present the test always succeeds. >> >> Thanks, >> Andreas >> >> On 2014-11-06 19:12, Calvin Cheung wrote: >>> Hi Andreas, >>> >>> The change looks good. >>> There should be a dummy.jar to go with the test cases. >>> http://cr.openjdk.java.net/~ccheung/8020675/webrev.02/ >>> >>> The webrev won't show any diffs for the jar file but don't forget to >>> include it when you push the fix. 
>>> >>> thanks, >>> Calvin >>> >>> On 11/6/2014 8:38 AM, Andreas Eriksson wrote: >>>> Hi, >>>> >>>> Could someone please review this jdk7 backport of JDK-8020675 >>>> . >>>> Summary: >>>> invalid jar file in the bootclasspath could lead to jvm fatal error >>>> removed offending EXCEPTION_MARK calls and code cleanup >>>> >>>> One code change necessary for the backport was in method >>>> ClassLoader::load_classfile. >>>> The change was to use CHECK_(instanceKlassHandle()) instead of >>>> CHECK_NULL. >>>> See the mail thread at >>>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2014-November/015825.html >>>> for more information. >>>> >>>> Webrev: http://cr.openjdk.java.net/~aeriksso/8020675/webrev.00/ >>>> >>>> Regards, >>>> Andreas >>>> >>>> >>> >> > From jiangli.zhou at oracle.com Fri Nov 7 22:46:54 2014 From: jiangli.zhou at oracle.com (Jiangli Zhou) Date: Fri, 07 Nov 2014 14:46:54 -0800 Subject: RFR 8054008: Using -XX:-LazyBootClassLoader crashes with ACCESS_VIOLATION on Win 64bit In-Reply-To: <545D0167.3070903@oracle.com> References: <545C21E6.90709@oracle.com> <682E3725-DDA0-4A1D-9F6E-94C6CF6045A1@oracle.com> <545D0167.3070903@oracle.com> Message-ID: <545D4BDE.9010908@oracle.com> Hi Roland, On 11/07/2014 09:29 AM, Jiangli Zhou wrote: > Hi Roland, > > Thank you for the review. Please see comments and questions below. > > On 11/07/2014 05:16 AM, Roland Westrelin wrote: >> Hi Jiangli, >> >>> Please review the following changes that fix the crash with >>> -XX:-LazyBootClassLoader on windows x64 platforms (fastdebug only). >>> During VM initialization, current_stack_pointer() could be called >>> before the VM generates stub routines. The generated get_previous_sp >>> routine cannot be used during that time, use the estimated value for >>> the sp value instead. The x86 implementation is unaffected by the >>> change and always returns the estimated sp value as before. 
>>>
>>> bug: https://bugs.openjdk.java.net/browse/JDK-8054008
>>> webrev: http://cr.openjdk.java.net/~jiangli/8054008/webrev/
>>>
>>> Tested with JPRT and ExtBadJAR test.
>> But if what os::current_stack_pointer() returns is no longer
>> "accurate", aren't you at risk of hitting the assert in
>> os::verify_stack_alignment()? Shouldn't you skip the assert entirely
>> if the routine is not yet available?
>
> For x64, it still returns the "accurate" value once the routine is
> generated. Before the routine is ready, it gives the estimate, which
> might have the risk of upsetting the assert as you suggested. I have a
> few questions. Have you run into the case where the estimate might
> trigger the assertion on x64? What about x86, why that's not handled
> the same as x64?

Answering my own question, verify_stack_alignment() is a nop on x86. That's probably why there was no need to obtain the "accurate" previous sp on x86 and an estimated value was always returned on x86.

>
>>
>> Also why not make that change on all platform to improve robustness
>> while you're doing this?
>
> Thank you for the suggestion. Sounds good. I'll look into this. Is
> there a global flag that indicates the stub routines are generated?

I changed windows os::verify_stack_alignment() to skip the assert when StubRoutines::code1() is NULL. Please see the following updated webrev.

Regarding your question about other platforms, only windows x64 has this particular issue. The os::verify_stack_alignment() is nop for sparc, ARM, and x86. The ppc, linux-x64, solaris-x64 verify_stack_alignment() implementations do use the generated routine to obtain previous sp.

http://cr.openjdk.java.net/~jiangli/8054008/webrev.02/

Thanks,
Jiangli

>
> Thanks,
> Jiangli
>
>>
>> Roland.
> From chris.plummer at oracle.com Sat Nov 8 03:53:01 2014 From: chris.plummer at oracle.com (Chris Plummer) Date: Fri, 07 Nov 2014 19:53:01 -0800 Subject: [9] RFR (S) 6762191: Setting stack size to 16K causes segmentation fault Message-ID: <545D939D.2030308@oracle.com> This is an initial review for 6762191. I'm guessing there will be recommendations to fix in a different way, but thought this would be a good time to start the discussion. https://bugs.openjdk.java.net/browse/JDK-6762191 http://cr.openjdk.java.net/~cjplummer/6762191/webrev.00.jdk/ http://cr.openjdk.java.net/~cjplummer/6762191/webrev.00.hotspot/ The bug is that if the -Xss size is set to something very small (like 16k), on linux there will be a crash due to overwriting the end of the stack. This happens before hotspot can compute its stack needs and verify that the stack is big enough. It didn't seem viable to move the hotspot stack size check earlier. It depends on too much other work done before that point, and the changes would have been disruptive. The stack size check is currently done in os::init_2(). What is needed is a check before the thread is created. That way we can create a thread with a big enough stack to handle all needs up to the point of the check in os::init_2(). This initial check does not need to be the final check. It just needs to confirm that we have enough stack to get us to the check in os::init_2(). I decided to check in java.c if the -Xss size is too small, and set it to a larger size if it is. I hard coded this size to 32k (I'll explain why 32k later). I suspect this is the part that will result in some debate. If you have better suggestions let me know. If it does stay here, then probably the 32k needs to be a #define, and maybe even an OS porting interface, but I'm not sure where to put it. The reason I chose 32k is because this is big enough for all platforms to get to the stack size check in os::init_2(). 
It is also smaller than the actual minimum stack size allowed on any platform. 32-bit windows has the smallest requirement at 64k. I added some printfs to print the minimum stack requirement, and then ran a simple JTReg test with every JPRT supported platform to get the results.

The TooSmallStackSize.sh will run "java -version" with -Xss16k, -Xss32k, and -Xss<size>, where <size> is the size from the error message produced by the JVM, such as in the following:

$ java -Xss32k -version
The stack size specified is too small, Specify at least 100k
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

I ran this test through JPRT on all platforms, and they all pass.

One thing to point out is that Windows behaves a bit differently from the other platforms. It always rounds the stack size up to a multiple of 64k, so even if you specify -Xss16k, you get a 64k stack. On 32-bit Windows with C1, 64k is also the minimum requirement, so there is no error produced in this case. However, on 32-bit Windows with C2, 68k is the minimum, so an error is produced since the stack will only be 64k. There is no bug here. It's just a bit confusing.
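To make the Windows case concrete, here is a minimal sketch of the arithmetic only, not the launcher or HotSpot code. StackSizeSketch, roundUp and bigEnough are made-up names for illustration; the 64k granularity and the 64k (C1) and 68k (C2) minimums are the figures quoted in this message.

```java
// Illustrative sketch only: these names are hypothetical and do not
// exist in java.c or in HotSpot.
final class StackSizeSketch {
    static final long K = 1024;

    // Round size up to the next multiple of granularity (granularity > 0).
    static long roundUp(long size, long granularity) {
        return ((size + granularity - 1) / granularity) * granularity;
    }

    // True if the stack obtained after rounding satisfies the VM's minimum.
    static boolean bigEnough(long requested, long granularity, long minimum) {
        return roundUp(requested, granularity) >= minimum;
    }

    public static void main(String[] args) {
        long requested = 16 * K;      // -Xss16k
        long granularity = 64 * K;    // Windows rounding as described in this message
        System.out.println("effective stack: " + roundUp(requested, granularity) / K + "k");
        System.out.println("C1 (min 64k) ok: " + bigEnough(requested, granularity, 64 * K));
        System.out.println("C2 (min 68k) ok: " + bigEnough(requested, granularity, 68 * K));
    }
}
```

With these inputs the rounded stack is 64k, which meets the C1 minimum and falls short of the C2 minimum, matching the two outcomes described for 32-bit Windows.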
thanks,

Chris

From peter.levart at gmail.com Sat Nov 8 15:07:38 2014
From: peter.levart at gmail.com (Peter Levart)
Date: Sat, 08 Nov 2014 16:07:38 +0100
Subject: [9] RFR(L) 8013267 : move MemberNameTable from native code to Java heap, use to intern MemberNames
In-Reply-To: <39826508-110B-4FCE-9A58-8C3D1B9FC7DE@oracle.com>
References: <594FE416-D445-426E-B7A9-C75F012ADE16@oracle.com> <0BDCD5DD-CBFB-4A3D-9BAD-7F9912E9237C@oracle.com>
 <5453C230.8010709@oracle.com> <9747A3A3-B7F4-407A-8E00-1E647D9DC2D1@oracle.com>
 <1D195ED3-8BD2-46CB-9D70-29CF809D9F5A@oracle.com> <5456FB59.60905@oracle.com>
 <632A5C98-B386-4625-BE12-355241581955@oracle.com> <5457AA75.8090103@gmail.com>
 <5457E0F9.8090004@gmail.com> <5458A57C.4060208@gmail.com>
 <260D49F5-6380-4FC3-A900-6CD9AB3ED6F7@oracle.com> <5459034E.8070809@gmail.com>
 <39826508-110B-4FCE-9A58-8C3D1B9FC7DE@oracle.com>
Message-ID: <545E31BA.3070500@gmail.com>

Hi David,

As previously, I feel competent to comment only on the Java side of the patch.

Using a linked list to publish to the VM is a safer and easier way for the desired purpose, and using a separate data structure for interning makes it easier to change it in the future should the need arise. That need may just be around the corner, as there are people (Martin Buchholz, Duncan MacGregor, for example) who use classes with a huge number of methods...

Here are a few comments for the new webrev (MemberName):

  78 @SuppressWarnings("rawtypes") //Comparable in next line
  79 /*non-public*/ final class MemberName implements Member, Comparable, Cloneable {

Since MemberName is final and can only be compared to itself, you could make it implement Comparable<MemberName> and eliminate the @SuppressWarnings.

more MemberName:

  84     private volatile MemberName next; // used for a linked list of MemberNames known to VM
...
1375     private static class ClassData {
1376         /**
1377          * This needs to be a simple data structure because we need to access
1378          * and update its elements from the JVM.
              * Note that the Java side controls
1379          * the allocation and order of elements in the array; the JVM modifies
1380          * fields of those elements during class redefinition.
1381          */
1382         private volatile MemberName[] elementData;
1383         private volatile MemberName publishedToVM;
1384         private volatile int size;
1385
1386         /**
1387          * Interns a member name in the member name table.
1388          * Returns null if a race with the jvm occurred. Races are detected
1389          * by checking for changes in the class redefinition count that occur
1390          * before an intern is complete.
1391          *
1392          * @param klass class whose redefinition count is checked.
1393          * @param memberName member name to be interned
1394          * @param redefined_count the value of classRedefinedCount() observed before
1395          *        creation of the MemberName that is being interned.
1396          * @return null if a race occurred, otherwise the interned MemberName.
1397          */
1398         @SuppressWarnings({"unchecked","rawtypes"})
1399         public MemberName intern(Class klass, MemberName memberName, int redefined_count) {
1400             if (elementData == null) {
1401                 synchronized (this) {
1402                     if (elementData == null) {
1403                         elementData = new MemberName[1];
1404                     }
1405                 }
1406             }
1407             synchronized (this) { // this == ClassData
1408                 final int index = Arrays.binarySearch(elementData, 0, size, memberName);
1409                 if (index >= 0) {
1410                     return elementData[index];
1411                 }
1412                 // Not found, add carefully.
1413                 return add(klass, ~index, memberName, redefined_count);
1414             }
1415         }
...
1426         private MemberName add(Class klass, int index, MemberName e, int redefined_count) {
1427             // First attempt publication to JVM, if that succeeds,
1428             // then record internally.
1429             e.next = publishedToVM;
1430             publishedToVM = e;
1431             storeFence();
1432             if (redefined_count != jla.getClassRedefinedCount(klass)) {
1433                 // Lost a race, back out publication and report failure.
1434                 publishedToVM = e.next;
1435                 return null;
1436             }

Since you now synchronize lookup/add *and* lazy elementData construction on the same object (the ClassData instance), you can merge the two synchronized blocks and simplify the code. You can make MemberName.next, ClassData.elementData and ClassData.size non-volatile (only ClassData.publishedToVM needs to be volatile), and ClassData.intern() can look something like this:

    public synchronized MemberName intern(Class klass, MemberName memberName, int redefined_count) {
        final int index;
        if (elementData == null) {
            elementData = new MemberName[1];
            index = ~0;
        } else {
            index = Arrays.binarySearch(elementData, 0, size, memberName);
            if (index >= 0) return elementData[index];
        }
        // Not found, add carefully.
        return add(klass, ~index, memberName, redefined_count);
    }

    // Note: no need for additional storeFence() in add()...

    private MemberName add(Class klass, int index, MemberName e, int redefined_count) {
        // First attempt publication to JVM, if that succeeds,
        // then record internally.
        e.next = publishedToVM; // volatile read of publishedToVM, followed by normal write of e.next...
        publishedToVM = e;      // ...which is ordered before volatile write of publishedToVM...
        if (redefined_count != jla.getClassRedefinedCount(klass)) { // ...which is ordered before volatile read of klass.classRedefinedCount.
            // Lost a race, back out publication and report failure.
            publishedToVM = e.next;
            return null;
        }
        ...

Now let's take for example one of the MemberName.make() methods that return interned MemberNames:

 206     public static MemberName make(Method m, boolean wantSpecial) {
 207         // Unreflected member names are resolved so intern them here.
 208         MemberName tmp0 = null;
 209         InternTransaction tx = new InternTransaction(m.getDeclaringClass());
 210         while (tmp0 == null) {
 211             MemberName tmp = new MemberName(m, wantSpecial);
 212             tmp0 = tx.tryIntern(tmp);
 213         }
 214         return tmp0;
 215     }

I'm trying to understand the workings of the InternTransaction helper class (and find an example that breaks it). You create an instance of it, passing Method's declaringClass. You then (in a retry loop) create a resolved MemberName from the Method and the wantSpecial flag. This MemberName's clazz can apparently differ from Method's declaringClass. I don't know when and why this happens, but apparently it can (super method?), so in InternTransaction.tryIntern() you do...

 363             if (member_name.isResolved()) {
 364                 if (member_name.clazz != tx_class) {
 365                     Class prev_tx_class = tx_class;
 366                     int prev_txn_token = txn_token;
 367                     tx_class = member_name.clazz;
 368                     txn_token = internTxnToken(tx_class);
 369                     // Zero is a special case.
 370                     if (txn_token != 0 ||
 371                         prev_txn_token != internTxnToken(prev_tx_class)) {
 372                         // Resolved class is different and at least one
 373                         // redef of it occurred, therefore repeat with
 374                         // proper class for race consistency checking.
 375                         return null;
 376                     }
 377                 }
 378                 member_name = member_name.intern(txn_token);
 379                 if (member_name == null) {
 380                     // Update the token for the next try.
 381                     txn_token = internTxnToken(tx_class);
 382                 }
 383             }

Now let's assume that the resolved member_name.clazz differs from Method's declaringClass. Let's also assume that either member_name.clazz has had at least one redefinition, or Method's declaringClass has been redefined between creating the InternTransaction and reading member_name.clazz's txn_token. You return 'null' in such a case, concluding that not only a redefinition of the resolved member_name.clazz matters, but that a redefinition of Method's declaringClass can also invalidate the resolved MemberName, am I right?
It would be helpful if I could understand when and how a redefinition of Method's declaringClass can affect member_name. Can it affect which clazz is resolved for member_name?

Anyway, you return null in such a case from an updated InternTransaction (tx_class and txn_token are now updated to have the values for the resolved member_name.clazz). In the next round, the checks of the newly constructed and resolved member_name are not performed against Method's declaringClass but against the previous round's member_name.clazz. Is this what is intended? I can see there has to be a stop condition for the loop to end, but shouldn't checks for redefinition of Method's declaringClass be performed in every iteration (in addition to the check for member_name.clazz redefinition if it differs from Method's declaringClass)?

Regards, Peter

On 11/07/2014 10:14 PM, David Chase wrote:
> New webrev:
>
> bug: https://bugs.openjdk.java.net/browse/JDK-8013267
>
> webrevs:
> http://cr.openjdk.java.net/~drchase/8013267/jdk.06/
> http://cr.openjdk.java.net/~drchase/8013267/hotspot.06/
>
> Changes since last:
>
> 1) refactored to put ClassData under java.lang.invoke.MemberName
>
> 2) split the data structure into two parts; handshake with JVM uses a linked list, which makes for a simpler backout-if-race, and Java side continues to use the simple sorted array. This should allow easier use of (for example) fancier data structures (like ConcurrentHashMap) if this later proves necessary.
>
> 3) Cleaned up symbol references in the new hotspot code to go through vmSymbols.
>
> 4) renamed oldCapacity to oldSize
>
> 5) ran two different benchmarks and saw no change in performance.
> a) nashorn ScriptTest (see https://bugs.openjdk.java.net/browse/JDK-8014288 )
> b) JMH microbenchmarks
> (see bug comments for details)
>
> And it continues to pass the previously-failing tests, as well as the new test which has been added to hotspot/test/compiler/jsr292 .
>
> David
>
> On 2014-11-04, at 3:54 PM, David Chase wrote:
>
>> I'm working on the initial benchmarking, and so far this arrangement (with synchronization and binary search for lookup, lots of barriers and linear cost insertion) has not yet been any slower.
>>
>> I am nonetheless tempted by the 2-tables solution, because I think the simpler JVM-side interface that it allows is desirable.
>>
>> David
>>
>> On 2014-11-04, at 11:48 AM, Peter Levart wrote:
>>
>>> On 11/04/2014 04:19 PM, David Chase wrote:
>>>> On 2014-11-04, at 5:07 AM, Peter Levart wrote:
>>>>> Are you thinking of an IdentityHashMap type of hash table (no linked-list of elements for same bucket, just search for 1st free slot on insert)? The problem would be how to pre-size the array. Count declared members?
>>>> It can't be an identityHashMap, because we are interning member names.
>>> I know it can't be IdentityHashMap - I just wondered if you were thinking of an IdentityHashMap-like data structure in contrast to standard HashMap-like. Not in terms of equality/hashCode used, but in terms of internal data structure. IdentityHashMap is just an array of elements (well pairs of them - key, value are placed in two consecutive array slots). Lookup searches for element linearly in the array starting from hashCode based index to the element if found or 1st empty array slot. It's very easy to implement if the only operations are get() and put() and could be used for interning and as a shared structure for VM to scan, but array has to be sized to at least 3/2 the number of elements for performance to not degrade.
>>>
>>>> In spite of my grumbling about benchmarking, I'm inclined to do that and try a couple of experiments.
>>>> One possibility would be to use two data structures, one for interning, the other for communication with the VM.
>>>> Because there's no lookup in the VM data structure it can just be an array that gets elements appended,
>>>> and the synchronization dance is much simpler.
>>>>
>>>> For interning, maybe I use a ConcurrentHashMap, and I try the following idiom:
>>>>
>>>> mn = resolve(args)
>>>> // deal with any errors
>>>> mn' = chm.get(mn)
>>>> if (mn' != null) return mn' // hoped-for-common-case
>>>>
>>>> synchronized (something) {
>>>>   mn' = chm.get(mn)
>>>>   if (mn' != null) return mn'
>>>>   txn_class = mn.getDeclaringClass()
>>>>
>>>>   while (true) {
>>>>     redef_count = txn_class.redefCount()
>>>>     mn = resolve(args)
>>>>
>>>>     shared_array.add(mn);
>>>>     // barrier, because we are paranoid
>>>>     if (redef_count == txn_class.redefCount()) {
>>>>       chm.add(mn); // safe to publish to other Java threads.
>>>>       return mn;
>>>>     }
>>>>     shared_array.drop_last(); // Try again
>>>>   }
>>>> }
>>>>
>>>> (Idiom gets slightly revised for the one or two other intern use cases, but this is the basic idea).
>>> Yes, that's similar to what I suggested by using a linked-list of MemberName(s) instead of the "shared_array" (easier to reason about ordering of writes) and a sorted array of MemberName(s) instead of the "chm" in your scheme above. ConcurrentHashMap would certainly be the most performant solution in terms of lookup/insertion-time and concurrent throughput, but it will use more heap than a simple packed array of MemberNames. CHM is much better now in JDK8 though regarding heap use.
>>>
>>> A combination of the two approaches is also possible:
>>>
>>> - instead of maintaining a "shared_array" of MemberName(s), have them form a linked-list (you trade a slot in array for 'next' pointer in MemberName)
>>> - use ConcurrentHashMap for interning.
>>>
>>> Regards, Peter
>>>
>>>> David
>>>>
>>>>>> And another way to view this is that we're now quibbling about performance, when we still
>>>>>> have an existing correctness problem that this patch solves, so maybe we should just get this
>>>>>> done and then file an RFE.
>>>>> Perhaps, yes. But note that questions about JMM and ordering of writes to array elements are about correctness, not performance.
>>>>>
>>>>> Regards, Peter
>>>>>
>>>>>> David

From peter.levart at gmail.com Sun Nov 9 12:55:10 2014
From: peter.levart at gmail.com (Peter Levart)
Date: Sun, 09 Nov 2014 13:55:10 +0100
Subject: [9] RFR(L) 8013267 : move MemberNameTable from native code to Java heap, use to intern MemberNames
In-Reply-To: <39826508-110B-4FCE-9A58-8C3D1B9FC7DE@oracle.com>
References: <594FE416-D445-426E-B7A9-C75F012ADE16@oracle.com> <0BDCD5DD-CBFB-4A3D-9BAD-7F9912E9237C@oracle.com>
 <5453C230.8010709@oracle.com> <9747A3A3-B7F4-407A-8E00-1E647D9DC2D1@oracle.com>
 <1D195ED3-8BD2-46CB-9D70-29CF809D9F5A@oracle.com> <5456FB59.60905@oracle.com>
 <632A5C98-B386-4625-BE12-355241581955@oracle.com> <5457AA75.8090103@gmail.com>
 <5457E0F9.8090004@gmail.com> <5458A57C.4060208@gmail.com>
 <260D49F5-6380-4FC3-A900-6CD9AB3ED6F7@oracle.com> <5459034E.8070809@gmail.com>
 <39826508-110B-4FCE-9A58-8C3D1B9FC7DE@oracle.com>
Message-ID: <545F642E.30205@gmail.com>

Hi David,

I played a little with the idea of having a hash table instead of a packed sorted array for interning. Using ConcurrentHashMap would present quite some memory overhead. A more compact representation is possible in the form of a linear-scan hash table where the elements of the array are the MemberNames themselves:

http://cr.openjdk.java.net/~plevart/misc/MemberName.intern/jdk.06.diff/

This is a drop-in replacement for MemberName on top of your jdk.06 patch. If you have some time, you can run this with your performance tests to see if it makes any difference. If not, then perhaps this interning is not so performance critical after all.
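[Editorial illustration] For readers unfamiliar with the term, a linear-scan (open addressing) intern table can be sketched as below. This is not the code from the jdk.06.diff webrev: the class name LinearProbeInterner, its methods and the sizing policy are assumptions made for illustration, and the sketch leaves out everything specific to MemberName (publication to the VM, the class redefinition count check).

```java
// Hypothetical sketch of a linear-probe intern table; not the webrev code.
final class LinearProbeInterner<T> {
    // Elements are stored directly in the array; null marks an empty slot.
    private Object[] table = new Object[8]; // length is always a power of two
    private int size;

    // Returns the canonical instance equal to e, adding e if none exists yet.
    @SuppressWarnings("unchecked")
    synchronized T intern(T e) {
        // Keep the array at least 3/2 the number of elements so probes stay short.
        if ((size + 1) * 3 > table.length * 2) {
            resize();
        }
        int mask = table.length - 1;
        int i = spread(e.hashCode()) & mask;
        for (Object cur; (cur = table[i]) != null; i = (i + 1) & mask) {
            if (cur.equals(e)) {
                return (T) cur; // already interned
            }
        }
        table[i] = e; // first empty slot after the home index
        size++;
        return e;
    }

    synchronized int size() {
        return size;
    }

    // Doubles the array and re-inserts every element at its new position.
    private void resize() {
        Object[] old = table;
        Object[] next = new Object[old.length * 2];
        int mask = next.length - 1;
        for (Object o : old) {
            if (o == null) continue;
            int i = spread(o.hashCode()) & mask;
            while (next[i] != null) {
                i = (i + 1) & mask;
            }
            next[i] = o;
        }
        table = next;
    }

    // Mixes high bits into low bits, since only the low bits select the slot.
    private static int spread(int h) {
        return h ^ (h >>> 16);
    }
}
```

Compared with a ConcurrentHashMap, this layout has no per-entry node objects, which is the memory saving the message refers to; the price in this sketch is a single lock around every lookup.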
Regards, Peter

From aleksey.shipilev at oracle.com Sun Nov 9 15:49:14 2014
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Sun, 09 Nov 2014 18:49:14 +0300
Subject: RFR (XS) JDK-8015272: Make @Contended within the same group to use the same oop map
In-Reply-To: <545B9CC0.3080106@oracle.com>
References: <525AC628.4020906@oracle.com> <009001cec83e$5e9c6b40$1bd541c0$@oracle.com>
 <525B0A18.8000105@oracle.com> <545B70F6.60801@oracle.com>
 <7029DBDA-BBE6-420A-BE6E-2EA28B60B560@oracle.com> <545B9CC0.3080106@oracle.com>
Message-ID: <545F8CFA.80809@oracle.com>

Hi again,

No changes in webrev:
http://cr.openjdk.java.net/~shade/8015272/webrev.01/

Please review and sponsor:
http://cr.openjdk.java.net/~shade/8015272/8015272.changeset

As per Karen's request, more testing is done, ran the tests on my Linux x86_64/fastdebug:

On 11/06/2014 07:07 PM, Aleksey Shipilev wrote:
> On 11/06/2014 06:01 PM, Karen Kinnear wrote:
>> - e.g. what about the vmtestbase vm/runtime/contended tests (and yes, some tests should be removed from that testlist)

vmtestbase vm/runtime/contended: no issues.
hotspot/test/runtime/ jtreg: no issues.

>> - vmtestbase: vm.quick.testlist (required for runtime changes)

vm.quick.testlist: no issues.

>> - and since @Contended annotation is used in the JDK core libraries - the jtreg jdk tests?

jdk/test/java/util/concurrent jtreg: no issues.
jdk/test/java/lang/Thread jtreg: no issues.

Thanks,
-Aleksey.

From aleksey.shipilev at oracle.com Sun Nov 9 18:45:35 2014
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Sun, 09 Nov 2014 21:45:35 +0300
Subject: RFR (S) 8059677: Thread.getName() instantiates Strings
Message-ID: <545FB64F.7090705@oracle.com>

Hi,

Thread.getName() returns String, and does new String instantiation every time, because the thread name is stored in char[].
Even though we use a private String constructor that shares the char[] array without copying it, this still hurts some use cases (think extra-fast logging). To the extent that some people actually maintain a Map<Thread, String> to avoid it.
https://bugs.openjdk.java.net/browse/JDK-8059677

Here's the attempt to maintain String instead of char[]:
http://cr.openjdk.java.net/~shade/8059677/webrev.01.jdk/
http://cr.openjdk.java.net/~shade/8059677/webrev.01.hs/

JDK changes are trivial, but HS changes require some rewiring, since the VM treats Thread.name specially. However, it turns out we can make a contained change, since the getter is used sparingly, and the setter seems to be not used at all. Any trouble with this change?

Testing: JPRT, manual tests, jdk/test/java/lang/Thread jtreg, hotspot/test/runtime/ jtreg, vm.quick.testlist, nsk.jvmti.testlist, svc.quick.testlist

Thanks,
-Aleksey.

From andreas.eriksson at oracle.com Mon Nov 10 11:13:36 2014
From: andreas.eriksson at oracle.com (Andreas Eriksson)
Date: Mon, 10 Nov 2014 12:13:36 +0100
Subject: RFR: JDK7 backport of 8020675 - invalid jar file in the bootclasspath could lead to jvm fatal error
In-Reply-To: <545D3BCF.8090200@oracle.com>
References: <545B797D.70907@oracle.com> <545BA401.1070205@oracle.com> <545BBA1F.3040301@oracle.com>
 <545CDBB0.80700@oracle.com> <545CE105.4020208@oracle.com> <545D3BCF.8090200@oracle.com>
Message-ID: <54609DE0.3080107@oracle.com>

On 2014-11-07 22:38, Calvin Cheung wrote:
> The new webrev looks good.
>
> On 11/7/2014 7:11 AM, Andreas Eriksson wrote:
>> I think I need a jdk7u Reviewer to look at this as well, right?
> For backport, you only need 1 reviewer and it doesn't have to be a capital R Reviewer.
>

OK, thanks!

- Andreas

> Calvin
>
>>
>> New webrev where I added the 0 byte dummy.jar:
>> http://cr.openjdk.java.net/~aeriksso/8020675/webrev.01/
>>
>> Checked so that the test fails on older versions and still passes on a fixed version.
>>
>> Regards,
>> Andreas
>>
>> On 2014-11-07 15:48, Andreas Eriksson wrote:
>>> Oh, interesting.
>>> The hsx25 changeset does not display the dummy.jar as being a part of the checkin:
>>> http://hg.openjdk.java.net/hsx/hsx25/hotspot/rev/7e7dd25666da
>>>
>>> But when I navigate to the dummy.jar path I can see that it was checked in as part of that changeset:
>>> http://hg.openjdk.java.net/hsx/hsx25/hotspot/log/7e7dd25666da/test/runtime/LoadClass/dummy.jar
>>>
>>> Is this a known issue with mercurial?
>>>
>>> Anyway, thanks for pointing this out, I would probably have missed it otherwise.
>>> It seems that if the dummy.jar is not present the test always succeeds.
>>>
>>> Thanks,
>>> Andreas
>>>
>>> On 2014-11-06 19:12, Calvin Cheung wrote:
>>>> Hi Andreas,
>>>>
>>>> The change looks good.
>>>> There should be a dummy.jar to go with the test cases.
>>>> http://cr.openjdk.java.net/~ccheung/8020675/webrev.02/
>>>>
>>>> The webrev won't show any diffs for the jar file but don't forget to include it when you push the fix.
>>>>
>>>> thanks,
>>>> Calvin
>>>>
>>>> On 11/6/2014 8:38 AM, Andreas Eriksson wrote:
>>>>> Hi,
>>>>>
>>>>> Could someone please review this jdk7 backport of JDK-8020675.
>>>>> Summary:
>>>>> invalid jar file in the bootclasspath could lead to jvm fatal error
>>>>> removed offending EXCEPTION_MARK calls and code cleanup
>>>>>
>>>>> One code change necessary for the backport was in method ClassLoader::load_classfile.
>>>>> The change was to use CHECK_(instanceKlassHandle()) instead of CHECK_NULL.
>>>>> See the mail thread at
>>>>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2014-November/015825.html
>>>>> for more information.
>>>>>
>>>>> Webrev: http://cr.openjdk.java.net/~aeriksso/8020675/webrev.00/
>>>>>
>>>>> Regards,
>>>>> Andreas
>>>>>
>>>>>
>>>>
>>>
>>
>

From david.holmes at oracle.com Mon Nov 10 11:21:24 2014
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 10 Nov 2014 21:21:24 +1000
Subject: RFR(XS): 8060721: Test runtime/SharedArchiveFile/LimitSharedSizes.java fails in jdk 9 fcs new platforms/compiler
In-Reply-To: <545D1D56.4050000@oracle.com>
References: <545A770C.3030503@oracle.com> <545AC5D3.9090005@oracle.com> <545AF8F2.1010106@oracle.com>
 <545C1B31.3060901@oracle.com> <545C68E7.4080807@oracle.com> <545D1D56.4050000@oracle.com>
Message-ID: <54609FB4.6040203@oracle.com>

On 8/11/2014 5:28 AM, Calvin Cheung wrote:
> On 11/6/2014 10:38 PM, David Holmes wrote:
>> Hi Calvin,
>>
>> On 7/11/2014 11:06 AM, Calvin Cheung wrote:
>>> I've updated the webrev at the same location:
>>> http://cr.openjdk.java.net/~ccheung/8060721/webrev/
>>> I also re-ran the tests.
>>>
>>> Please take a look.
>>
>> 717       jio_snprintf(class_list_path_str + class_list_path_len,
>> 718                    sizeof(class_list_path_str) - class_list_path_len,
>> 719                    "%slib", os::file_separator());
>> 720     }
>> 721   }
>> 722   class_list_path_len = (int)strlen(class_list_path_str);
>>
>> The strlen recalculation at #722 should be moved inside the if-block as that is the only time it is needed.
> Agreed.
>> Also can we not just do += 4 ?
> I didn't want to use 4 to avoid another magic number but in this case I think it's obvious.
>
> I've updated webrev at the same location:
> http://cr.openjdk.java.net/~ccheung/8060721/webrev/

Looks good to me.
Thanks,
David

> thanks,
> Calvin
>>
>> Thanks,
>> David
>>
>>> thanks,
>>> Calvin
>>>
>>> On 11/5/2014 8:28 PM, Calvin Cheung wrote:
>>>> On 11/5/2014 4:50 PM, David Holmes wrote:
>>>>> On 6/11/2014 5:14 AM, Calvin Cheung wrote:
>>>>>> While upgrading the compiler on Mac for jdk9, we found this compiler bug where it skips the following 2 lines of code in metaspaceShared.cpp when optimization is enabled (set to -Os) for the fastdebug and product builds.
>>>>>>     strcat(class_list_path_str, os::file_separator());
>>>>>>     strcat(class_list_path_str, "classlist");
>>>>>>
>>>>>> The bug is reproducible with Xcode 5.1.1 and 6.1.
>>>>>>
>>>>>> A workaround fix is to rewrite an "if" block in the MetaspaceShared::preload_and_dump() method.
>>>>>
>>>>> Can't you simply replace the strcats with jio_snprintf and do away with the sub_path array?
>>>> The following works. I'll do more testing before sending an updated webrev.
>>>>
>>>> --- a/src/share/vm/memory/metaspaceShared.cpp
>>>> +++ b/src/share/vm/memory/metaspaceShared.cpp
>>>> @@ -713,12 +713,15 @@
>>>>        int class_list_path_len = (int)strlen(class_list_path_str);
>>>>        if (class_list_path_len >= 3) {
>>>>          if (strcmp(class_list_path_str + class_list_path_len - 3, "lib") != 0) {
>>>> -          strcat(class_list_path_str, os::file_separator());
>>>> -          strcat(class_list_path_str, "lib");
>>>> +          jio_snprintf(class_list_path_str + class_list_path_len,
>>>> +                       sizeof(class_list_path_str) - class_list_path_len,
>>>> +                       "%slib", os::file_separator());
>>>>          }
>>>>        }
>>>> -      strcat(class_list_path_str, os::file_separator());
>>>> -      strcat(class_list_path_str, "classlist");
>>>> +      class_list_path_len = (int)strlen(class_list_path_str);
>>>> +      jio_snprintf(class_list_path_str + class_list_path_len,
>>>> +                   sizeof(class_list_path_str) - class_list_path_len,
>>>> +                   "%sclasslist", os::file_separator());
>>>>        class_list_path = class_list_path_str;
>>>>      } else {
>>>>        class_list_path =
>>>> SharedClassListFile;
>>>>>
>>>>> Or even try strncat instead of strcat?
>>>> I think jio_snprintf is better because it null terminates the string.
>>>> If I use strncat, I'll need to initialize the entire buffer to null.
>>>>
>>>> thanks,
>>>> Calvin
>>>>>
>>>>> David
>>>>>
>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8060721
>>>>>>
>>>>>> webrev: http://cr.openjdk.java.net/~ccheung/8060721/webrev/
>>>>>>
>>>>>> Testing:
>>>>>> JPRT
>>>>>> The affected testcase with product, fastdebug, and debug builds built with Xcode 5.1.1 and 6.1.
>>>>>>
>>>>>> thanks,
>>>>>> Calvin
>>>>
>>>
>

From chris.hegarty at oracle.com Mon Nov 10 11:52:08 2014
From: chris.hegarty at oracle.com (Chris Hegarty)
Date: Mon, 10 Nov 2014 11:52:08 +0000
Subject: RFR (S) 8059677: Thread.getName() instantiates Strings
In-Reply-To: <545FB64F.7090705@oracle.com>
References: <545FB64F.7090705@oracle.com>
Message-ID: <5460A6E8.9050506@oracle.com>

Aleksey,

I have only looked at the libraries changes, and I think they make sense. As in, I can find no reason why the name cannot be changed to be a String.

Trivially, after your changes will NPE be thrown if setName(null), as it is today?

-Chris.

On 09/11/14 18:45, Aleksey Shipilev wrote:
> Hi,
>
> Thread.getName() returns String, and does new String instantiation every time, because the thread name is stored in char[]. Even though we use a private String constructor that shares the char[] array without copying it, this still hurts some use cases (think extra-fast logging). To the extent that some people actually maintain a Map<Thread, String> to avoid it.
> https://bugs.openjdk.java.net/browse/JDK-8059677
>
> Here's the attempt to maintain String instead of char[]:
> http://cr.openjdk.java.net/~shade/8059677/webrev.01.jdk/
> http://cr.openjdk.java.net/~shade/8059677/webrev.01.hs/
>
> JDK changes are trivial, but HS changes require some rewiring, since VM
> treats Thread.name specially.
> However, it turns out we can make a contained change, since the getter is used sparingly, and setter seems to be not used at all. Any trouble with this change?
>
> Testing: JPRT, manual tests, jdk/test/java/lang/Thread jtreg, hotspot/test/runtime/ jtreg, vm.quick.testlist, nsk.jvmti.testlist, svc.quick.testlist
>
> Thanks,
> -Aleksey.
>

From david.holmes at oracle.com Mon Nov 10 12:56:40 2014
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 10 Nov 2014 22:56:40 +1000
Subject: RFR (S) 8059677: Thread.getName() instantiates Strings
In-Reply-To: <5460A6E8.9050506@oracle.com>
References: <545FB64F.7090705@oracle.com> <5460A6E8.9050506@oracle.com>
Message-ID: <5460B608.4050909@oracle.com>

On 10/11/2014 9:52 PM, Chris Hegarty wrote:
> Aleksey,
>
> I have only looked at the libraries changes, and I think they make sense. As in, I can find no reason why the name cannot be changed to be a String.

Very quick response, but IIRC this has been examined in the past and there were reasons why it can't/shouldn't be done. Will try to dig out more details in the morning.

If String construction is a bottleneck just cache it.

David
-----

> Trivially, after your changes will NPE be thrown if setName(null), as it is today?
>
> -Chris.
>
> On 09/11/14 18:45, Aleksey Shipilev wrote:
>> Hi,
>>
>> Thread.getName() returns String, and does new String instantiation every time, because the thread name is stored in char[]. Even though we use a private String constructor that shares the char[] array without copying it, this still hurts some use cases (think extra-fast logging). To the extent that some people actually maintain a Map<Thread, String> to avoid it.
>> https://bugs.openjdk.java.net/browse/JDK-8059677 >> >> Here's the attempt to maintain String instead of char[]: >> http://cr.openjdk.java.net/~shade/8059677/webrev.01.jdk/ >> http://cr.openjdk.java.net/~shade/8059677/webrev.01.hs/ >> >> JDK changes are trivial, but HS changes require some rewiring, since VM >> treats Thread.name specially. However, it turns out we can make a >> contained change, since the getter is used sparingly, and setter seems >> to be not used at all. Any trouble with this change? >> >> Testing: JPRT, manual tests, jdk/test/java/lang/Thread jtreg, >> hotspot/test/runtime/ jtreg, vm.quick.testlist, nsk.jvmti.testlist, >> svc.quick.testlist >> >> Thanks, >> -Aleksey. >> From chris.hegarty at oracle.com Mon Nov 10 13:53:24 2014 From: chris.hegarty at oracle.com (Chris Hegarty) Date: Mon, 10 Nov 2014 13:53:24 +0000 Subject: RFR (S) 8059677: Thread.getName() instantiates Strings In-Reply-To: <5460B608.4050909@oracle.com> References: <545FB64F.7090705@oracle.com> <5460A6E8.9050506@oracle.com> <5460B608.4050909@oracle.com> Message-ID: <5460C354.5000605@oracle.com> On 10/11/14 12:56, David Holmes wrote: > On 10/11/2014 9:52 PM, Chris Hegarty wrote: >> Aleksey, >> >> I have only looked at the libraries changes, and I think they make sense >> . As in, I can find no reason why the name cannot be changed to be a >> String. > > Very quick response, but IIRC this has been examined in the past and > there were reasons why it can't/shouldn't be done. Will try to dig out > more details in the morning. If there was previous discussion on this, that revealed some substantial issue, that would be great, but I can't recall, or find, it now. Hotspot express, and the desire for hotspot to run with different library versions, would certainly cause complication, but I don't believe that is an issue now. Just on that, the library changes are minimal, and if this were to proceed then they can accompany the hotspot change, as they make their way into jdk9/dev. 
Anyway, this should await your reply. -Chris. > If String construction is a bottleneck just cache it. > > David > ----- > >> Trivially, after your changes will NPE be thrown if setName(null), as it >> is today ? >> >> -Chris. >> >> On 09/11/14 18:45, Aleksey Shipilev wrote: >>> Hi, >>> >>> Thread.getName() returns String, and does new String instantiation every >>> time, because the thread name is stored in char[]. Even though we use a >>> private String constructor that shares the char[] array without copying >>> it, this still hurts some use cases (think extra-fast logging). To the >>> extent some people actually maintain Map to avoid it. >>> https://bugs.openjdk.java.net/browse/JDK-8059677 >>> >>> Here's the attempt to maintain String instead of char[]: >>> http://cr.openjdk.java.net/~shade/8059677/webrev.01.jdk/ >>> http://cr.openjdk.java.net/~shade/8059677/webrev.01.hs/ >>> >>> JDK changes are trivial, but HS changes require some rewiring, since VM >>> treats Thread.name specially. However, it turns out we can make a >>> contained change, since the getter is used sparingly, and setter seems >>> to be not used at all. Any trouble with this change? >>> >>> Testing: JPRT, manual tests, jdk/test/java/lang/Thread jtreg, >>> hotspot/test/runtime/ jtreg, vm.quick.testlist, nsk.jvmti.testlist, >>> svc.quick.testlist >>> >>> Thanks, >>> -Aleksey. 
>>> From vladimir.x.ivanov at oracle.com Mon Nov 10 13:01:39 2014 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 10 Nov 2014 17:01:39 +0400 Subject: [9] RFR (S): 8060147: SIGSEGV in Metadata::mark_on_stack() while marking metadata in ciEnv In-Reply-To: <545A9BEB.8020507@oracle.com> References: <5450F261.60400@oracle.com> <545114DF.7040005@oracle.com> <54511744.4060904@oracle.com> <5451F43A.1010108@oracle.com> <5452128C.4090408@oracle.com> <54522805.5040701@oracle.com> <1421AC31-CBEA-4AED-A657-3C43B258A6DB@oracle.com> <54522357.4070705@oracle.com> <5452425D.7040405@oracle.com> <5452517C.4050104@oracle.com> <54527E1E.1070507@oracle.com> <5452A786.7030000@oracle.com> <5452A077.2050903@oracle.com> <545A5F59.2020907@oracle.com> <545A5814.8000109@oracle.com> <545A9BEB.8020507@oracle.com> Message-ID: <5460B733.4040500@oracle.com> Vladimir, Coleen, Roland, Mikael, thanks for reviews! On 11/6/14, 1:51 AM, Vladimir Kozlov wrote: > I am fine with targeted fix only. > > One comment env->get_instance_klass() checks for NULL. Your new code in > create_new_metadata() does not: > > ciInstanceKlass* holder = > get_metadata(h_m()->method_holder())->as_instance_klass(); Good catch. I reverted to ciEnv::get_instance_klass(). FTR updated webrev: http://cr.openjdk.java.net/~vlivanov/8060147/webrev.02 Best regards, Vladimir Ivanov > > Thanks, > Vladimir K > > On 11/5/14 9:02 AM, Vladimir Ivanov wrote: >> >> On 11/5/14, 9:33 PM, Coleen Phillimore wrote: >>> >>> On 10/30/14, 4:32 PM, Vladimir Ivanov wrote: >>>> Coleen, >>>> >>>> I implemented 2 approaches of the fix. >>>> >>>> The fix with a special case for VM anon classes is: >>>> http://cr.openjdk.java.net/~vlivanov/8060147/webrev.anon.00/ >>>> >>>> Both fix the bug, but have different properties. 
>>>> >>>> (1) Special case for VM anon class is very focused on the actual >>>> cause, but more fragile - all the logic which keeps metadata from >>>> being deallocated is non-trivial and scattered around the whole >>>> ciMetadata hierarchy. >>>> >>>> (2) On the other hand, initial version, which forcibly creates >>>> klass_holder ciObject for each ciMetadata, is much cleaner and >>>> localized, but does unnecessary work. >>>> >>>> Am I right that you prefer (1) as a fix? >>> >>> Yes, I think this version does less unnecessary work and creates less >>> ciObjects. And the comment is useful for finding how we keep >>> ciMetadata alive for anonymous classes. You still have a UseNewCode in >>> the webrev thought that you want to take out. >> >> Thanks, Coleen. >> >> VladimirK, Roland, what do you think about (1)? >> >> Best regards, >> Vladimir Ivanov From aleksey.shipilev at oracle.com Mon Nov 10 14:08:51 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 10 Nov 2014 17:08:51 +0300 Subject: RFR (S) 8059677: Thread.getName() instantiates Strings In-Reply-To: <5460A6E8.9050506@oracle.com> References: <545FB64F.7090705@oracle.com> <5460A6E8.9050506@oracle.com> Message-ID: <5460C6F3.1080201@oracle.com> Hi Chris, Thanks for taking a look! On 11/10/2014 02:52 PM, Chris Hegarty wrote: > Trivially, after your changes will NPE be thrown if setName(null), as it > is today ? There is no way it could throw NPE now, therefore the behavior is different. The spec says nothing about NPE though, but it feels wrong to pass the null String to setNativeName. I should add Objects.requireNonNull there. Will wait for more feedbacks, and update the webrev. -Aleksey. 
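For illustration only, here is a minimal sketch of the null-handling point raised in the message above: once the name is kept as a String rather than a char[], nothing dereferences the argument any more, so an explicit check is one way to keep setName(null) throwing NPE. NamedWorker is a hypothetical stand-in class invented for this sketch; it is not java.lang.Thread and not the code in the webrev.

```java
import java.util.Objects;

// Hypothetical stand-in for a thread-like class; only the name handling is modelled.
final class NamedWorker {
    // Stored as a String (not char[]), so getName() does not allocate.
    private volatile String name;

    NamedWorker(String name) {
        this.name = Objects.requireNonNull(name, "name cannot be null");
    }

    // Rejects null up front. With a char[] field the NPE came implicitly
    // from name.toCharArray(); with a String field it has to be explicit.
    void setName(String name) {
        this.name = Objects.requireNonNull(name, "name cannot be null");
    }

    // Returns the stored reference; repeated calls yield the same instance.
    String getName() {
        return name;
    }
}
```

Under these assumptions, a rejected setName(null) leaves the previous name in place, and repeated getName() calls return the same String instance instead of a fresh one each time.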
From aleksey.shipilev at oracle.com Mon Nov 10 14:19:12 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 10 Nov 2014 17:19:12 +0300 Subject: RFR (S) 8059677: Thread.getName() instantiates Strings In-Reply-To: <5460C354.5000605@oracle.com> References: <545FB64F.7090705@oracle.com> <5460A6E8.9050506@oracle.com> <5460B608.4050909@oracle.com> <5460C354.5000605@oracle.com> Message-ID: <5460C960.9080509@oracle.com> Hi David, Chris, On 11/10/2014 04:53 PM, Chris Hegarty wrote: > On 10/11/14 12:56, David Holmes wrote: >> On 10/11/2014 9:52 PM, Chris Hegarty wrote: >>> I have only looked at the libraries changes, and I think they make sense >>> . As in, I can find no reason why the name cannot be changed to be a >>> String. >> >> Very quick response, but IIRC this has been examined in the past and >> there were reasons why it can't/shouldn't be done. Will try to dig out >> more details in the morning. > > If there was previous discussion on this, that revealed some substantial > issue, that would be great, but I can't recall, or find, it now. > > Hotspot express, and the desire for hotspot to run with different > library versions, would certainly cause complication, but I don't > believe that is an issue now. > > Just on that, the library changes are minimal, and if this were to > proceed then they can accompany the hotspot change, as they make their > way into jdk9/dev. > > Anyway, this should await your reply. Alan was having the same concern, there is an issue with JNI/JVMTI and other power users that might break when exposed to under-constructed Thread, e.g: https://bugs.openjdk.java.net/browse/JDK-6412693 This is why I ran jvmti and serviceability tests for this change, yielding no failures. This reinforces my belief this patch does not break the important invariant: if there is a problem with "Thread.name = name.toCharArray()" anywhere in Thread code, then "Thread.name = name" does neither regress it further nor fixes it. 
Then I speculated that having char[] name would help VM initialize the name if we wanted to switch to complete VM-side initialization of Thread, but it seems we can do String oop instantiation in the similar vein. Caching the name feels like a band-aid, that will probably complicate the Thread initialization on VM side even more. Let's wait and see if David can come up with some horror issue we are overlooking. :) Thanks, -Aleksey. From Alan.Bateman at oracle.com Mon Nov 10 14:30:33 2014 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Mon, 10 Nov 2014 14:30:33 +0000 Subject: RFR (S) 8059677: Thread.getName() instantiates Strings In-Reply-To: <5460C354.5000605@oracle.com> References: <545FB64F.7090705@oracle.com> <5460A6E8.9050506@oracle.com> <5460B608.4050909@oracle.com> <5460C354.5000605@oracle.com> Message-ID: <5460CC09.7030204@oracle.com> On 10/11/2014 13:53, Chris Hegarty wrote: > > If there was previous discussion on this, that revealed some > substantial issue, that would be great, but I can't recall, or find, > it now. > > Hotspot express, and the desire for hotspot to run with different > library versions, would certainly cause complication, but I don't > believe that is an issue now. > > Just on that, the library changes are minimal, and if this were to > proceed then they can accompany the hotspot change, as they make their > way into jdk9/dev. > I remember the previous discussion on this and at the time it was just too troublesome to try to coordinate the change to hotspot + jdk. So a jdk-only change was pushed to address the last issue in this area, the issue of changing it from char[] to String was kicked down the road. 
-Alan From staffan.larsen at oracle.com Mon Nov 10 14:51:13 2014 From: staffan.larsen at oracle.com (Staffan Larsen) Date: Mon, 10 Nov 2014 15:51:13 +0100 Subject: RFR (S) 8059677: Thread.getName() instantiates Strings In-Reply-To: <5460C960.9080509@oracle.com> References: <545FB64F.7090705@oracle.com> <5460A6E8.9050506@oracle.com> <5460B608.4050909@oracle.com> <5460C354.5000605@oracle.com> <5460C960.9080509@oracle.com> Message-ID: I'm afraid this change requires changes in the Serviceability Agent as well. See OopUtilities.threadOopGetName() for example. /Staffan > On 10 nov 2014, at 15:19, Aleksey Shipilev wrote: > > Hi David, Chris, > > On 11/10/2014 04:53 PM, Chris Hegarty wrote: >> On 10/11/14 12:56, David Holmes wrote: >>> On 10/11/2014 9:52 PM, Chris Hegarty wrote: >>>> I have only looked at the libraries changes, and I think they make sense >>>> . As in, I can find no reason why the name cannot be changed to be a >>>> String. >>> >>> Very quick response, but IIRC this has been examined in the past and >>> there were reasons why it can't/shouldn't be done. Will try to dig out >>> more details in the morning. >> >> If there was previous discussion on this, that revealed some substantial >> issue, that would be great, but I can't recall, or find, it now. >> >> Hotspot express, and the desire for hotspot to run with different >> library versions, would certainly cause complication, but I don't >> believe that is an issue now. >> >> Just on that, the library changes are minimal, and if this were to >> proceed then they can accompany the hotspot change, as they make their >> way into jdk9/dev. >> >> Anyway, this should await your reply. > > Alan was having the same concern, there is an issue with JNI/JVMTI and > other power users that might break when exposed to under-constructed > Thread, e.g: > https://bugs.openjdk.java.net/browse/JDK-6412693 > > This is why I ran jvmti and serviceability tests for this change, > yielding no failures.
This reinforces my belief this patch does not > break the important invariant: if there is a problem with "Thread.name = > name.toCharArray()" anywhere in Thread code, then "Thread.name = name" > does neither regress it further nor fixes it. > > Then I speculated that having char[] name would help VM initialize the > name if we wanted to switch to complete VM-side initialization of > Thread, but it seems we can do String oop instantiation in the similar vein. > > Caching the name feels like a band-aid, that will probably complicate > the Thread initialization on VM side even more. Let's wait and see if > David can come up with some horror issue we are overlooking. :) > > Thanks, > -Aleksey. > From aleksey.shipilev at oracle.com Mon Nov 10 14:54:16 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 10 Nov 2014 17:54:16 +0300 Subject: RFR (S) 8059677: Thread.getName() instantiates Strings In-Reply-To: <5460CC25.1000609@oracle.com> References: <545FB64F.7090705@oracle.com> <5460A6E8.9050506@oracle.com> <5460C6F3.1080201@oracle.com> <5460CC25.1000609@oracle.com> Message-ID: <5460D198.4030905@oracle.com> Hi Roger, On 11/10/2014 05:31 PM, roger riggs wrote: > 1) The Thread class javadoc says: > " Unless otherwise noted, passing a {@code null} argument to a constructor > * or method in this class will cause a {@link NullPointerException} to be > * thrown." > > So, NPE is already specified for setName(null) or any other method. Ah, thanks! It is odd to see this specified in a blanket fashion in the class Javadoc, oh well. So we need to restore the NPE check. > I'm not in favor of adding the Objects.requireNonNull, the NPE will > be thrown soon enough and it is just noise in the source code in most > cases that creates larger bytecodes and extra work for the compiler > /interpreter. Sorry, I have a hard time understanding what you are saying. How would you guarantee NPE (as per Javadoc contract above) in the new version of Thread.setName otherwise?
> 2) About not storing the name as a String, I have some vague > recollection of the issue being related to exposing an object > settable by the application that can be used with synchronize and > allows communication and sync issues between threads. Again, I don't quite understand. Is it about storing the reference to String as the thread name, that can potentially be used for external synchronization? If so, I have a hard time devising a sane test case that might fail with this change. Internal code does not synchronize on Thread.name. Anyone synchronizing on Thread.getName() result has broken synchronization with current code. Anyone synchronizing on Thread.getName() result after this patch will have that (ahem) fixed, plus a performance problem. > Just because some test doesn't fail, doesn't mean there isn't a > design/implementation constraint. I should have said, "I *also* ran the jvmti and serviceability tests" to confirm the change is innocuous. See the HS code change itself -- it does seem contained. Thanks, -Aleksey. From aleksey.shipilev at oracle.com Mon Nov 10 14:55:22 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 10 Nov 2014 17:55:22 +0300 Subject: RFR (S) 8059677: Thread.getName() instantiates Strings In-Reply-To: References: <545FB64F.7090705@oracle.com> <5460A6E8.9050506@oracle.com> <5460B608.4050909@oracle.com> <5460C354.5000605@oracle.com> <5460C960.9080509@oracle.com> Message-ID: <5460D1DA.4050907@oracle.com> Hi Staffan, Ow, it seems very likely. So, what testlist have I missed to catch this? -Aleksey. On 11/10/2014 05:51 PM, Staffan Larsen wrote: > I'm afraid this change requires changes in the Serviceability Agent as well. See OopUtilities.threadOopGetName() for example.
> > /Staffan > >> On 10 nov 2014, at 15:19, Aleksey Shipilev wrote: >> >> Hi David, Chris, >> >> On 11/10/2014 04:53 PM, Chris Hegarty wrote: >>> On 10/11/14 12:56, David Holmes wrote: >>>> On 10/11/2014 9:52 PM, Chris Hegarty wrote: >>>>> I have only looked at the libraries changes, and I think they make sense >>>>> . As in, I can find no reason why the name cannot be changed to be a >>>>> String. >>>> >>>> Very quick response, but IIRC this has been examined in the past and >>>> there were reasons why it can't/shouldn't be done. Will try to dig out >>>> more details in the morning. >>> >>> If there was previous discussion on this, that revealed some substantial >>> issue, that would be great, but I can't recall, or find, it now. >>> >>> Hotspot express, and the desire for hotspot to run with different >>> library versions, would certainly cause complication, but I don't >>> believe that is an issue now. >>> >>> Just on that, the library changes are minimal, and if this were to >>> proceed then they can accompany the hotspot change, as they make their >>> way into jdk9/dev. >>> >>> Anyway, this should await your reply. >> >> Alan was having the same concern, there is an issue with JNI/JVMTI and >> other power users that might break when exposed to under-constructed >> Thread, e.g: >> https://bugs.openjdk.java.net/browse/JDK-6412693 >> >> This is why I ran jvmti and serviceability tests for this change, >> yielding no failures. This reinforces my belief this patch does not >> break the important invariant: if there is a problem with "Thread.name = >> name.toCharArray()" anywhere in Thread code, then "Thread.name = name" >> does neither regress it further nor fixes it. >> >> Then I speculated that having char[] name would help VM initialize the >> name if we wanted to switch to complete VM-side initialization of >> Thread, but it seems we can do String oop instantiation in the similar vein. 
>> >> Caching the name feels like a band-aid, that will probably complicate >> the Thread initialization on VM side even more. Let's wait and see if >> David can come up with some horror issue we are overlooking. :) >> >> Thanks, >> -Aleksey. >> > From vladimir.kozlov at oracle.com Mon Nov 10 15:59:55 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 10 Nov 2014 07:59:55 -0800 Subject: [9] RFR (S): 8060147: SIGSEGV in Metadata::mark_on_stack() while marking metadata in ciEnv In-Reply-To: <5460B733.4040500@oracle.com> References: <5450F261.60400@oracle.com> <545114DF.7040005@oracle.com> <54511744.4060904@oracle.com> <5451F43A.1010108@oracle.com> <5452128C.4090408@oracle.com> <54522805.5040701@oracle.com> <1421AC31-CBEA-4AED-A657-3C43B258A6DB@oracle.com> <54522357.4070705@oracle.com> <5452425D.7040405@oracle.com> <5452517C.4050104@oracle.com> <54527E1E.1070507@oracle.com> <5452A786.7030000@oracle.com> <5452A077.2050903@oracle.com> <545A5F59.2020907@oracle.com> <545A5814.8000109@oracle.com> <545A9BEB.8020507@oracle.com> <5460B733.4040500@oracle.com> Message-ID: <5460E0FB.6000704@oracle.com> Good. Thanks, Vladimir On 11/10/14 5:01 AM, Vladimir Ivanov wrote: > Vladimir, Coleen, Roland, Mikael, thanks for reviews! > > On 11/6/14, 1:51 AM, Vladimir Kozlov wrote: >> I am fine with targeted fix only. >> >> One comment env->get_instance_klass() checks for NULL. Your new code in >> create_new_metadata() does not: >> >> ciInstanceKlass* holder = >> get_metadata(h_m()->method_holder())->as_instance_klass(); > Good catch. I reverted to ciEnv::get_instance_klass(). > > FTR updated webrev: > http://cr.openjdk.java.net/~vlivanov/8060147/webrev.02 > > Best regards, > Vladimir Ivanov > >> >> Thanks, >> Vladimir K >> >> On 11/5/14 9:02 AM, Vladimir Ivanov wrote: >>> >>> On 11/5/14, 9:33 PM, Coleen Phillimore wrote: >>>> >>>> On 10/30/14, 4:32 PM, Vladimir Ivanov wrote: >>>>> Coleen, >>>>> >>>>> I implemented 2 approaches of the fix. 
>>>>> >>>>> The fix with a special case for VM anon classes is: >>>>> http://cr.openjdk.java.net/~vlivanov/8060147/webrev.anon.00/ >>>>> >>>>> Both fix the bug, but have different properties. >>>>> >>>>> (1) Special case for VM anon class is very focused on the actual >>>>> cause, but more fragile - all the logic which keeps metadata from >>>>> being deallocated is non-trivial and scattered around the whole >>>>> ciMetadata hierarchy. >>>>> >>>>> (2) On the other hand, initial version, which forcibly creates >>>>> klass_holder ciObject for each ciMetadata, is much cleaner and >>>>> localized, but does unnecessary work. >>>>> >>>>> Am I right that you prefer (1) as a fix? >>>> >>>> Yes, I think this version does less unnecessary work and creates less >>>> ciObjects. And the comment is useful for finding how we keep >>>> ciMetadata alive for anonymous classes. You still have a >>>> UseNewCode in >>>> the webrev thought that you want to take out. >>> >>> Thanks, Coleen. >>> >>> VladimirK, Roland, what do you think about (1)? 
>>> >>> Best regards, >>> Vladimir Ivanov From mikael.gerdin at oracle.com Mon Nov 10 16:11:33 2014 From: mikael.gerdin at oracle.com (Mikael Gerdin) Date: Mon, 10 Nov 2014 17:11:33 +0100 Subject: [9] RFR (S): 8060147: SIGSEGV in Metadata::mark_on_stack() while marking metadata in ciEnv In-Reply-To: <5460B733.4040500@oracle.com> References: <5450F261.60400@oracle.com> <545114DF.7040005@oracle.com> <54511744.4060904@oracle.com> <5451F43A.1010108@oracle.com> <5452128C.4090408@oracle.com> <54522805.5040701@oracle.com> <1421AC31-CBEA-4AED-A657-3C43B258A6DB@oracle.com> <54522357.4070705@oracle.com> <5452425D.7040405@oracle.com> <5452517C.4050104@oracle.com> <54527E1E.1070507@oracle.com> <5452A786.7030000@oracle.com> <5452A077.2050903@oracle.com> <545A5F59.2020907@oracle.com> <545A5814.8000109@oracle.com> <545A9BEB.8020507@oracle.com> <5460B733.4040500@oracle.com> Message-ID: <5460E3B5.2010206@oracle.com> Vladimir, On 2014-11-10 14:01, Vladimir Ivanov wrote: > Vladimir, Coleen, Roland, Mikael, thanks for reviews! > > On 11/6/14, 1:51 AM, Vladimir Kozlov wrote: >> I am fine with targeted fix only. >> >> One comment env->get_instance_klass() checks for NULL. Your new code in >> create_new_metadata() does not: >> >> ciInstanceKlass* holder = >> get_metadata(h_m()->method_holder())->as_instance_klass(); > Good catch. I reverted to ciEnv::get_instance_klass(). > > FTR updated webrev: > http://cr.openjdk.java.net/~vlivanov/8060147/webrev.02 Looks good. /Mikael > > Best regards, > Vladimir Ivanov > >> >> Thanks, >> Vladimir K >> >> On 11/5/14 9:02 AM, Vladimir Ivanov wrote: >>> >>> On 11/5/14, 9:33 PM, Coleen Phillimore wrote: >>>> >>>> On 10/30/14, 4:32 PM, Vladimir Ivanov wrote: >>>>> Coleen, >>>>> >>>>> I implemented 2 approaches of the fix. >>>>> >>>>> The fix with a special case for VM anon classes is: >>>>> http://cr.openjdk.java.net/~vlivanov/8060147/webrev.anon.00/ >>>>> >>>>> Both fix the bug, but have different properties. 
>>>>> >>>>> (1) Special case for VM anon class is very focused on the actual >>>>> cause, but more fragile - all the logic which keeps metadata from >>>>> being deallocated is non-trivial and scattered around the whole >>>>> ciMetadata hierarchy. >>>>> >>>>> (2) On the other hand, initial version, which forcibly creates >>>>> klass_holder ciObject for each ciMetadata, is much cleaner and >>>>> localized, but does unnecessary work. >>>>> >>>>> Am I right that you prefer (1) as a fix? >>>> >>>> Yes, I think this version does less unnecessary work and creates less >>>> ciObjects. And the comment is useful for finding how we keep >>>> ciMetadata alive for anonymous classes. You still have a >>>> UseNewCode in >>>> the webrev thought that you want to take out. >>> >>> Thanks, Coleen. >>> >>> VladimirK, Roland, what do you think about (1)? >>> >>> Best regards, >>> Vladimir Ivanov From staffan.larsen at oracle.com Mon Nov 10 16:39:27 2014 From: staffan.larsen at oracle.com (Staffan Larsen) Date: Mon, 10 Nov 2014 17:39:27 +0100 Subject: RFR (S) 8059677: Thread.getName() instantiates Strings In-Reply-To: <5460D1DA.4050907@oracle.com> References: <545FB64F.7090705@oracle.com> <5460A6E8.9050506@oracle.com> <5460B608.4050909@oracle.com> <5460C354.5000605@oracle.com> <5460C960.9080509@oracle.com> <5460D1DA.4050907@oracle.com> Message-ID: > On 10 nov 2014, at 15:55, Aleksey Shipilev wrote: > > Hi Staffan, > > Ow, it seems very likely. > So, what testlist have I missed to catch this? Probably vm.tmtools.testlist and/or nsk.sajdi.testlist. Just a warning that these tests are far from stable. Sorry about that. /Staffan > > -Aleksey. > > On 11/10/2014 05:51 PM, Staffan Larsen wrote: >> I'm afraid this change requires changes in the Serviceability Agent as well. See OopUtilities.threadOopGetName() for example.
>> >> /Staffan >> >>> On 10 nov 2014, at 15:19, Aleksey Shipilev wrote: >>> >>> Hi David, Chris, >>> >>> On 11/10/2014 04:53 PM, Chris Hegarty wrote: >>>> On 10/11/14 12:56, David Holmes wrote: >>>>> On 10/11/2014 9:52 PM, Chris Hegarty wrote: >>>>>> I have only looked at the libraries changes, and I think they make sense >>>>>> . As in, I can find no reason why the name cannot be changed to be a >>>>>> String. >>>>> >>>>> Very quick response, but IIRC this has been examined in the past and >>>>> there were reasons why it can't/shouldn't be done. Will try to dig out >>>>> more details in the morning. >>>> >>>> If there was previous discussion on this, that revealed some substantial >>>> issue, that would be great, but I can't recall, or find, it now. >>>> >>>> Hotspot express, and the desire for hotspot to run with different >>>> library versions, would certainly cause complication, but I don't >>>> believe that is an issue now. >>>> >>>> Just on that, the library changes are minimal, and if this were to >>>> proceed then they can accompany the hotspot change, as they make their >>>> way into jdk9/dev. >>>> >>>> Anyway, this should await your reply. >>> >>> Alan was having the same concern, there is an issue with JNI/JVMTI and >>> other power users that might break when exposed to under-constructed >>> Thread, e.g: >>> https://bugs.openjdk.java.net/browse/JDK-6412693 >>> >>> This is why I ran jvmti and serviceability tests for this change, >>> yielding no failures. This reinforces my belief this patch does not >>> break the important invariant: if there is a problem with "Thread.name = >>> name.toCharArray()" anywhere in Thread code, then "Thread.name = name" >>> does neither regress it further nor fixes it. >>> >>> Then I speculated that having char[] name would help VM initialize the >>> name if we wanted to switch to complete VM-side initialization of >>> Thread, but it seems we can do String oop instantiation in the similar vein. 
>>> >>> Caching the name feels like a band-aid, that will probably complicate >>> the Thread initialization on VM side even more. Let's wait and see if >>> David can come up with some horror issue we are overlooking. :) >>> >>> Thanks, >>> -Aleksey. >>> >> > > From roger.riggs at oracle.com Mon Nov 10 14:31:01 2014 From: roger.riggs at oracle.com (roger riggs) Date: Mon, 10 Nov 2014 09:31:01 -0500 Subject: RFR (S) 8059677: Thread.getName() instantiates Strings In-Reply-To: <5460C6F3.1080201@oracle.com> References: <545FB64F.7090705@oracle.com> <5460A6E8.9050506@oracle.com> <5460C6F3.1080201@oracle.com> Message-ID: <5460CC25.1000609@oracle.com> Hi Aleksey, 1) The Thread class javadoc says: " Unless otherwise noted, passing a {@code null} argument to a constructor * or method in this class will cause a {@link NullPointerException} to be * thrown." So, NPE is already specified for setName(null) or any other method. I'm not in favor of adding the Objects.requireNonNull, the NPE will be thrown soon enough and it is just noise in the source code in most cases that creates larger bytecodes and extra work for the compiler/interpreter. 2) About not storing the name as a String, I have some vague recollection of the issue being related to exposing an object settable by the application that can be used with synchronize and allows communication and sync issues between threads. Just because some test doesn't fail, doesn't mean there isn't a design/implementation constraint. Roger On 11/10/2014 9:08 AM, Aleksey Shipilev wrote: > Hi Chris, > > Thanks for taking a look! > > On 11/10/2014 02:52 PM, Chris Hegarty wrote: >> Trivially, after your changes will NPE be thrown if setName(null), as it >> is today ? > There is no way it could throw NPE now, therefore the behavior is > different. The spec says nothing about NPE though, but it feels wrong to > pass the null String to setNativeName. I should add > Objects.requireNonNull there.
Will wait for more feedbacks, and update > the webrev. > > -Aleksey. > > From calvin.cheung at oracle.com Mon Nov 10 17:17:46 2014 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Mon, 10 Nov 2014 09:17:46 -0800 Subject: RFR(XS): 8060721: Test runtime/SharedArchiveFile/LimitSharedSizes.java fails in jdk 9 fcs new platforms/compiler In-Reply-To: <54609FB4.6040203@oracle.com> References: <545A770C.3030503@oracle.com> <545AC5D3.9090005@oracle.com> <545AF8F2.1010106@oracle.com> <545C1B31.3060901@oracle.com> <545C68E7.4080807@oracle.com> <545D1D56.4050000@oracle.com> <54609FB4.6040203@oracle.com> Message-ID: <5460F33A.1040000@oracle.com> Thanks for your re-review, David. Calvin On 11/10/2014 3:21 AM, David Holmes wrote: > On 8/11/2014 5:28 AM, Calvin Cheung wrote: >> On 11/6/2014 10:38 PM, David Holmes wrote: >>> Hi Calvin, >>> >>> On 7/11/2014 11:06 AM, Calvin Cheung wrote: >>>> I've updated the webrev at the same location: >>>> http://cr.openjdk.java.net/~ccheung/8060721/webrev/ >>>> I also re-ran the tests. >>>> >>>> Please take a look. >>> >>> 717 jio_snprintf(class_list_path_str + class_list_path_len, >>> 718 sizeof(class_list_path_str) - >>> class_list_path_len, >>> 719 "%slib", os::file_separator()); >>> 720 } >>> 721 } >>> 722 class_list_path_len = (int)strlen(class_list_path_str); >>> >>> The strlen recalculation at #722 should be moved inside the if-block >>> as that is the only time it is needed. >> Agreed. >>> Also can we not just do += 4 ? >> I didn't want to use 4 to avoid another magic number but in this case I >> think it's obvious. >> >> I've updated webrev at the same location: >> http://cr.openjdk.java.net/~ccheung/8060721/webrev/ > > Looks good to me. 
> > Thanks, > David > >> thanks, >> Calvin >>> >>> Thanks, >>> David >>> >>>> thanks, >>>> Calvin >>>> >>>> On 11/5/2014 8:28 PM, Calvin Cheung wrote: >>>>> On 11/5/2014 4:50 PM, David Holmes wrote: >>>>>> On 6/11/2014 5:14 AM, Calvin Cheung wrote: >>>>>>> While upgrading the compiler on Mac for jdk9, we found this >>>>>>> compiler >>>>>>> bug >>>>>>> where it skips the following 2 lines of code in metaspaceShared.cpp >>>>>>> when >>>>>>> optimization is enable (set to -Os) for the fastdebug and product >>>>>>> builds. >>>>>>> strcat(class_list_path_str, os::file_separator()); >>>>>>> strcat(class_list_path_str, "classlist"); >>>>>>> >>>>>>> The bug is reproducible with Xcode 5.1.1 and 6.1. >>>>>>> >>>>>>> A workaround fix is to rewrite an "if" block in the >>>>>>> MetaspaceShared::preload_and_dump() method. >>>>>> >>>>>> Can't you simply replace the strcats with jio_snprintf and do away >>>>>> with the sub_path array? >>>>> The following works. I'll do more testing before sending an updated >>>>> webrev. 
>>>>> >>>>> --- a/src/share/vm/memory/metaspaceShared.cpp >>>>> +++ b/src/share/vm/memory/metaspaceShared.cpp >>>>> @@ -713,12 +713,15 @@ >>>>> int class_list_path_len = (int)strlen(class_list_path_str); >>>>> if (class_list_path_len >= 3) { >>>>> if (strcmp(class_list_path_str + class_list_path_len - 3, >>>>> "lib") != 0) { >>>>> - strcat(class_list_path_str, os::file_separator()); >>>>> - strcat(class_list_path_str, "lib"); >>>>> + jio_snprintf(class_list_path_str + class_list_path_len, >>>>> + sizeof(class_list_path_str) - >>>>> class_list_path_len, >>>>> + "%slib", os::file_separator()); >>>>> } >>>>> } >>>>> - strcat(class_list_path_str, os::file_separator()); >>>>> - strcat(class_list_path_str, "classlist"); >>>>> + class_list_path_len = (int)strlen(class_list_path_str); >>>>> + jio_snprintf(class_list_path_str + class_list_path_len, >>>>> + sizeof(class_list_path_str) - class_list_path_len, >>>>> + "%sclasslist", os::file_separator()); >>>>> class_list_path = class_list_path_str; >>>>> } else { >>>>> class_list_path = SharedClassListFile; >>>>>> >>>>>> Or even try strncat instead of strcat? >>>>> I think jio_snprintf is better because it null terminates the string. >>>>> If I use strncat, I'll need to initialize the entire buffer to null. >>>>> >>>>> thanks, >>>>> Calvin >>>>>> >>>>>> David >>>>>> >>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8060721 >>>>>>> >>>>>>> webrev: http://cr.openjdk.java.net/~ccheung/8060721/webrev/ >>>>>>> >>>>>>> Testing: >>>>>>> JPRT >>>>>>> The affected testcase with product, fastdebug, and debug >>>>>>> builds >>>>>>> built with Xcode 5.1.1 and 6.1. 
>>>>>>> >>>>>>> thanks, >>>>>>> Calvin >>>>> >>>> >> From coleen.phillimore at oracle.com Mon Nov 10 17:21:06 2014 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Mon, 10 Nov 2014 12:21:06 -0500 Subject: RFR: 8062870: src/share/vm/services/mallocTracker.hpp:64 assert(_count > 0) failed: Negative ,counter Message-ID: <5460F402.4060507@oracle.com> Summary: Signed bitfield size y can only have (1 << y)-1 values. We were overflowing the the _pos index and reusing the 0th element in the MallocSiteTable for two different stack traces which caused the assert for deallocation. Tested with nsk.quick.testlist and jtreg runtime tests with -XX:NativeMemoryTracking=detail. open webrev at http://cr.openjdk.java.net/~coleenp/8062870/ bug link https://bugs.openjdk.java.net/browse/JDK-8062870 Thanks, Coleen From aleksey.shipilev at oracle.com Mon Nov 10 17:35:01 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 10 Nov 2014 20:35:01 +0300 Subject: RFR: 8062870: src/share/vm/services/mallocTracker.hpp:64 assert(_count > 0) failed: Negative ,counter In-Reply-To: <5460F402.4060507@oracle.com> References: <5460F402.4060507@oracle.com> Message-ID: <5460F745.4070808@oracle.com> On 10.11.2014 20:21, Coleen Phillimore wrote: > Summary: Signed bitfield size y can only have (1 << y)-1 values. > > We were overflowing the the _pos index and reusing the 0th element in > the MallocSiteTable for two different stack traces which caused the > assert for deallocation. > > Tested with nsk.quick.testlist and jtreg runtime tests with > -XX:NativeMemoryTracking=detail. > > open webrev at http://cr.openjdk.java.net/~coleenp/8062870/ > bug link https://bugs.openjdk.java.net/browse/JDK-8062870 Looks good, but made my head hurt a little. 
I think it deserves a more bullet-proof rework, a la:

#ifdef _LP64
#define SIZE_BITS 64
#define FLAGS_BITS 8
#define POS_BITS 16
#define BUCKET_BITS 40
#else
#define SIZE_BITS 32
#define FLAGS_BITS 8
#define POS_BITS 8
#define BUCKET_BITS 16
#endif // _LP64

#define MAX_MALLOCSITE_TABLE_SIZE ((size_t)((1 << BUCKET_BITS)-1))
#define MAX_BUCKET_LENGTH ((size_t)((1 << POS_BITS)-1))

class MallocHeader VALUE_OBJ_CLASS_SPEC {
  size_t _size : SIZE_BITS;
  size_t _flags : FLAGS_BITS;
  size_t _pos_idx : POS_BITS;
  size_t _bucket_idx: BUCKET_BITS;
}

...and assert (SIZE_BITS + FLAGS_BITS + BUCKET_BITS + POS_BITS <= 2*BitsPerWord) somewhere?

-Aleksey.

From aleksey.shipilev at oracle.com Mon Nov 10 17:39:10 2014
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Mon, 10 Nov 2014 20:39:10 +0300
Subject: RFR: 8062870: src/share/vm/services/mallocTracker.hpp:64 assert(_count > 0) failed: Negative ,counter
In-Reply-To: <5460F745.4070808@oracle.com>
References: <5460F402.4060507@oracle.com> <5460F745.4070808@oracle.com>
Message-ID: <5460F83E.9030901@oracle.com>

On 10.11.2014 20:35, Aleksey Shipilev wrote:
> On 10.11.2014 20:21, Coleen Phillimore wrote:
>> Summary: Signed bitfield size y can only have (1 << y)-1 values.
>>
>> We were overflowing the _pos index and reusing the 0th element in
>> the MallocSiteTable for two different stack traces which caused the
>> assert for deallocation.
>>
>> Tested with nsk.quick.testlist and jtreg runtime tests with
>> -XX:NativeMemoryTracking=detail.
>>
>> open webrev at http://cr.openjdk.java.net/~coleenp/8062870/
>> bug link https://bugs.openjdk.java.net/browse/JDK-8062870
>
> Looks good, but made my head hurt a little.
I think it deserves a more > bullet-proof rework, a la: > > #ifdef _LP64 > #define SIZE_BITS 64 > #define FLAGS_BITS 8 > #define POS_BITS 16 > #define BUCKET_BITS 40 > #else > #define SIZE_BITS 32 > #define FLAGS_BITS 8 > #define POS_BITS 8 > #define BUCKET_BITS 16 > #endif // _LP64 > > #define MAX_MALLOCSITE_TABLE_SIZE ((size_t)((1 << BUCKET_BITS)-1)) > #define MAX_BUCKET_LENGTH ((size_t)((1 << POS_BITS)-1)) Also, probably these two guys should be MAX_BUCKET_IDX and MAX_POS_IDX, respectively. (_pos_idx < MAX_BUCKET_LENGTH) looks more odd than (_pos_idx < MAX_POS_IDX). > class MallocHeader VALUE_OBJ_CLASS_SPEC { > size_t _size : SIZE_BITS; > size_t _flags : FLAGS_BITS; > size_t _pos_idx : POS_BITS; > size_t _bucket_idx: BUCKET_BITS; > } > > ...and assert (SIZE_BITS + FLAGS_BITS + BUCKET_BITS + POS_BITS <= > 2*BitsPerWord) somewhere? -Aleksey. From george.triantafillou at oracle.com Mon Nov 10 17:44:41 2014 From: george.triantafillou at oracle.com (George Triantafillou) Date: Mon, 10 Nov 2014 12:44:41 -0500 Subject: RFR: 8062870: src/share/vm/services/mallocTracker.hpp:64 assert(_count > 0) failed: Negative ,counter In-Reply-To: <5460F402.4060507@oracle.com> References: <5460F402.4060507@oracle.com> Message-ID: <5460F989.3030807@oracle.com> Hi Coleen, This looks good. Thanks for fixing this. -George On 11/10/2014 12:21 PM, Coleen Phillimore wrote: > Summary: Signed bitfield size y can only have (1 << y)-1 values. > > We were overflowing the the _pos index and reusing the 0th element in > the MallocSiteTable for two different stack traces which caused the > assert for deallocation. > > Tested with nsk.quick.testlist and jtreg runtime tests with > -XX:NativeMemoryTracking=detail. 
> > open webrev at http://cr.openjdk.java.net/~coleenp/8062870/ > bug link https://bugs.openjdk.java.net/browse/JDK-8062870 > > Thanks, > Coleen From mikael.gerdin at oracle.com Mon Nov 10 17:56:44 2014 From: mikael.gerdin at oracle.com (Mikael Gerdin) Date: Mon, 10 Nov 2014 18:56:44 +0100 Subject: RFR: 8062870: src/share/vm/services/mallocTracker.hpp:64 assert(_count > 0) failed: Negative ,counter In-Reply-To: <5460F83E.9030901@oracle.com> References: <5460F402.4060507@oracle.com> <5460F745.4070808@oracle.com> <5460F83E.9030901@oracle.com> Message-ID: <5460FC5C.9030306@oracle.com> On 2014-11-10 18:39, Aleksey Shipilev wrote: > On 10.11.2014 20:35, Aleksey Shipilev wrote: >> On 10.11.2014 20:21, Coleen Phillimore wrote: >>> Summary: Signed bitfield size y can only have (1 << y)-1 values. >>> >>> We were overflowing the the _pos index and reusing the 0th element in >>> the MallocSiteTable for two different stack traces which caused the >>> assert for deallocation. >>> >>> Tested with nsk.quick.testlist and jtreg runtime tests with >>> -XX:NativeMemoryTracking=detail. >>> >>> open webrev at http://cr.openjdk.java.net/~coleenp/8062870/ >>> bug link https://bugs.openjdk.java.net/browse/JDK-8062870 >> >> Looks good, but made my head hurt a little. I think it deserves a more >> bullet-proof rework, a la: >> >> #ifdef _LP64 >> #define SIZE_BITS 64 >> #define FLAGS_BITS 8 >> #define POS_BITS 16 >> #define BUCKET_BITS 40 >> #else >> #define SIZE_BITS 32 >> #define FLAGS_BITS 8 >> #define POS_BITS 8 >> #define BUCKET_BITS 16 >> #endif // _LP64 >> >> #define MAX_MALLOCSITE_TABLE_SIZE ((size_t)((1 << BUCKET_BITS)-1)) >> #define MAX_BUCKET_LENGTH ((size_t)((1 << POS_BITS)-1)) > > Also, probably these two guys should be MAX_BUCKET_IDX and MAX_POS_IDX, > respectively. (_pos_idx < MAX_BUCKET_LENGTH) looks more odd than > (_pos_idx < MAX_POS_IDX). 
> > >> class MallocHeader VALUE_OBJ_CLASS_SPEC { >> size_t _size : SIZE_BITS; >> size_t _flags : FLAGS_BITS; >> size_t _pos_idx : POS_BITS; >> size_t _bucket_idx: BUCKET_BITS; >> } >> >> ...and assert (SIZE_BITS + FLAGS_BITS + BUCKET_BITS + POS_BITS <= >> 2*BitsPerWord) somewhere? Perhaps even STATIC_ASSERT(...) /Mikael > > -Aleksey. > > From aleksey.shipilev at oracle.com Mon Nov 10 18:09:05 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 10 Nov 2014 21:09:05 +0300 Subject: RFR (S) 8059677: Thread.getName() instantiates Strings In-Reply-To: References: <545FB64F.7090705@oracle.com> <5460A6E8.9050506@oracle.com> <5460B608.4050909@oracle.com> <5460C354.5000605@oracle.com> <5460C960.9080509@oracle.com> <5460D1DA.4050907@oracle.com> Message-ID: <5460FF41.90208@oracle.com> On 10.11.2014 19:39, Staffan Larsen wrote: >> On 10 nov 2014, at 15:55, Aleksey Shipilev wrote: >> Ow, it seems very like it. >> So, what testlist have I missed to catch this? > > Probably vm.tmtools.testlist and/or nsk.sajdi.testlist. Just a warning that these tests are far from stable. Sorry about that. Alas, both these testlists pass with current change without a hitch. That probably tells something about the test coverage. Any other ideas how to test for it? Maybe some manual way? Anyhow, there is a synonymous block in ThreadGroup handling, I can copy the relevant bits from there. Updated webrev follows soon. Still need to test if that change is safe. -Aleksey. From coleen.phillimore at oracle.com Mon Nov 10 18:26:52 2014 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Mon, 10 Nov 2014 13:26:52 -0500 Subject: RFR: 8062870: src/share/vm/services/mallocTracker.hpp:64 assert(_count > 0) failed: Negative ,counter In-Reply-To: <5460F745.4070808@oracle.com> References: <5460F402.4060507@oracle.com> <5460F745.4070808@oracle.com> Message-ID: <5461036C.7070408@oracle.com> Yeah, that seems like an improvement. I'll do it and send it out again. 
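
[Editorial sketch, not part of the original message.] The layout discussed in the quoted proposal, together with the compile-time check suggested by Mikael, can be illustrated roughly as follows. This is a minimal sketch under stated assumptions: the names `kSizeBits`, `max_field_value`, `HeaderSketch` and `indices_fit` are hypothetical and do not come from the webrev, the widths are the LP64 values quoted above, and C++11 `static_assert` stands in for HotSpot's own `STATIC_ASSERT` macro.

```cpp
#include <cassert>
#include <cstdint>

// Sketch only: hypothetical names; widths are the LP64 values quoted above.
constexpr unsigned kSizeBits   = 64;
constexpr unsigned kFlagsBits  = 8;
constexpr unsigned kPosBits    = 16;
constexpr unsigned kBucketBits = 40;

// Compile-time check that all fields fit in two 64-bit words.
static_assert(kSizeBits + kFlagsBits + kPosBits + kBucketBits <= 2 * 64,
              "malloc header fields must fit in two words");

// Largest value an unsigned field of `bits` bits can hold (valid for 1..63).
// The shift is performed on a 64-bit operand, so widths above 31 are well defined.
constexpr uint64_t max_field_value(unsigned bits) {
  return (uint64_t(1) << bits) - 1;
}

// Bitfield layout with the same field order as the quoted proposal.
struct HeaderSketch {
  uint64_t size       : kSizeBits;
  uint64_t flags      : kFlagsBits;
  uint64_t pos_idx    : kPosBits;
  uint64_t bucket_idx : kBucketBits;
};

// Returns true if both indices are representable in the header fields.
inline bool indices_fit(uint64_t bucket_idx, uint64_t pos_idx) {
  return bucket_idx <= max_field_value(kBucketBits) &&
         pos_idx    <= max_field_value(kPosBits);
}
```

A caller would check `indices_fit(bucket, pos)` before storing the pair in a header, which is the kind of guard that keeps an index from wrapping around to 0.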
Thanks, Coleen On 11/10/14, 12:35 PM, Aleksey Shipilev wrote: > On 10.11.2014 20:21, Coleen Phillimore wrote: >> Summary: Signed bitfield size y can only have (1 << y)-1 values. >> >> We were overflowing the the _pos index and reusing the 0th element in >> the MallocSiteTable for two different stack traces which caused the >> assert for deallocation. >> >> Tested with nsk.quick.testlist and jtreg runtime tests with >> -XX:NativeMemoryTracking=detail. >> >> open webrev at http://cr.openjdk.java.net/~coleenp/8062870/ >> bug link https://bugs.openjdk.java.net/browse/JDK-8062870 > Looks good, but made my head hurt a little. I think it deserves a more > bullet-proof rework, a la: > > #ifdef _LP64 > #define SIZE_BITS 64 > #define FLAGS_BITS 8 > #define POS_BITS 16 > #define BUCKET_BITS 40 > #else > #define SIZE_BITS 32 > #define FLAGS_BITS 8 > #define POS_BITS 8 > #define BUCKET_BITS 16 > #endif // _LP64 > > #define MAX_MALLOCSITE_TABLE_SIZE ((size_t)((1 << BUCKET_BITS)-1)) > #define MAX_BUCKET_LENGTH ((size_t)((1 << POS_BITS)-1)) > > class MallocHeader VALUE_OBJ_CLASS_SPEC { > size_t _size : SIZE_BITS; > size_t _flags : FLAGS_BITS; > size_t _pos_idx : POS_BITS; > size_t _bucket_idx: BUCKET_BITS; > } > > ...and assert (SIZE_BITS + FLAGS_BITS + BUCKET_BITS + POS_BITS <= > 2*BitsPerWord) somewhere? > > -Aleksey. > > From christian.tornqvist at oracle.com Mon Nov 10 20:00:42 2014 From: christian.tornqvist at oracle.com (Christian Tornqvist) Date: Mon, 10 Nov 2014 15:00:42 -0500 Subject: 8062870: src/share/vm/services/mallocTracker.hpp:64 assert(_count > 0) failed: Negative , counter In-Reply-To: <5460F402.4060507@oracle.com> References: <5460F402.4060507@oracle.com> Message-ID: <008901cffd21$023660e0$06a322a0$@oracle.com> Hi Coleen, As mentioned offline, please make sure you remove the @ignore from test/runtime/NMT/MallocTrackingVerify.java as well. Otherwise this looks good, thanks for fixing this. 
Thanks, Christian -----Original Message----- From: hotspot-runtime-dev [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of Coleen Phillimore Sent: Monday, November 10, 2014 12:21 PM To: hotspot-runtime-dev Subject: RFR: 8062870: src/share/vm/services/mallocTracker.hpp:64 assert(_count > 0) failed: Negative ,counter Summary: Signed bitfield size y can only have (1 << y)-1 values. We were overflowing the the _pos index and reusing the 0th element in the MallocSiteTable for two different stack traces which caused the assert for deallocation. Tested with nsk.quick.testlist and jtreg runtime tests with -XX:NativeMemoryTracking=detail. open webrev at http://cr.openjdk.java.net/~coleenp/8062870/ bug link https://bugs.openjdk.java.net/browse/JDK-8062870 Thanks, Coleen From jiangli.zhou at oracle.com Mon Nov 10 20:21:48 2014 From: jiangli.zhou at oracle.com (Jiangli Zhou) Date: Mon, 10 Nov 2014 12:21:48 -0800 Subject: RFR 8064375: Change certain errors to warnings in AppCDS output Message-ID: <54611E5C.6050605@oracle.com> Please review following simple fix that changes the non-fatal CDS preloading errors into warnings: http://cr.openjdk.java.net/~jiangli/8064375/webrev.00/ Thanks, Jiangli From mikhailo.seledtsov at oracle.com Mon Nov 10 20:32:51 2014 From: mikhailo.seledtsov at oracle.com (Mikhailo Seledtsov) Date: Mon, 10 Nov 2014 12:32:51 -0800 Subject: RFR 8064375: Change certain errors to warnings in AppCDS output In-Reply-To: <54611E5C.6050605@oracle.com> References: <54611E5C.6050605@oracle.com> Message-ID: <546120F3.2090507@oracle.com> Hi Jiangli, The changes look good to me. 
Misha

On 11/10/2014 12:21 PM, Jiangli Zhou wrote:
> Please review following simple fix that changes the non-fatal CDS
> preloading errors into warnings:
>
> http://cr.openjdk.java.net/~jiangli/8064375/webrev.00/
>
> Thanks,
> Jiangli

From jiangli.zhou at oracle.com Mon Nov 10 20:36:02 2014
From: jiangli.zhou at oracle.com (Jiangli Zhou)
Date: Mon, 10 Nov 2014 12:36:02 -0800
Subject: RFR 8064375: Change certain errors to warnings in AppCDS output
In-Reply-To: <546120F3.2090507@oracle.com>
References: <54611E5C.6050605@oracle.com> <546120F3.2090507@oracle.com>
Message-ID: <546121B2.4070705@oracle.com>

Thanks, Misha!

Jiangli

On 11/10/2014 12:32 PM, Mikhailo Seledtsov wrote:
> Hi Jiangli,
>
> The changes look good to me.
>
> Misha
>
> On 11/10/2014 12:21 PM, Jiangli Zhou wrote:
>> Please review following simple fix that changes the non-fatal CDS
>> preloading errors into warnings:
>>
>> http://cr.openjdk.java.net/~jiangli/8064375/webrev.00/
>>
>> Thanks,
>> Jiangli
>

From coleen.phillimore at oracle.com Mon Nov 10 22:53:29 2014
From: coleen.phillimore at oracle.com (Coleen Phillimore)
Date: Mon, 10 Nov 2014 17:53:29 -0500
Subject: RFR: 8062870: src/share/vm/services/mallocTracker.hpp:64 assert(_count > 0) failed: Negative ,counter
In-Reply-To: <5460F745.4070808@oracle.com>
References: <5460F402.4060507@oracle.com> <5460F745.4070808@oracle.com>
Message-ID: <546141E9.8060602@oracle.com>

Aleksey,

I made this change and I'm not happy with it. Now I have 6 #defines to leak into the global hotspot namespace (can't undef because other code uses MAX_BUCKET_LENGTH, which relies on these #defines). I think the differing constants within 10 lines of each other are less ugly and make better sense. They're more direct and less visually disturbing than the upper case names.

Also MAX_BUCKET_LENGTH is used in other NMT code where its name makes a lot more sense, so I don't want to change that either.

Also the STATIC_ASSERT leads to the most unhelpful error message. I'm not a fan.
services/mallocTracker.hpp|264| error: aggregate 'StaticAssert DUMMY_STATIC_ASSERT' has incomplete type and cannot be defined

Thank you for the comments, which I initially agreed with, but working with the code makes me less happy, so I will leave it as is (except for one change, which I'm going to put out shortly).

Thanks,
Coleen

On 11/10/14, 12:35 PM, Aleksey Shipilev wrote:
> On 10.11.2014 20:21, Coleen Phillimore wrote:
>> Summary: Signed bitfield size y can only have (1 << y)-1 values.
>>
>> We were overflowing the _pos index and reusing the 0th element in
>> the MallocSiteTable for two different stack traces which caused the
>> assert for deallocation.
>>
>> Tested with nsk.quick.testlist and jtreg runtime tests with
>> -XX:NativeMemoryTracking=detail.
>>
>> open webrev at http://cr.openjdk.java.net/~coleenp/8062870/
>> bug link https://bugs.openjdk.java.net/browse/JDK-8062870
> Looks good, but made my head hurt a little. I think it deserves a more
> bullet-proof rework, a la:
>
> #ifdef _LP64
> #define SIZE_BITS 64
> #define FLAGS_BITS 8
> #define POS_BITS 16
> #define BUCKET_BITS 40
> #else
> #define SIZE_BITS 32
> #define FLAGS_BITS 8
> #define POS_BITS 8
> #define BUCKET_BITS 16
> #endif // _LP64
>
> #define MAX_MALLOCSITE_TABLE_SIZE ((size_t)((1 << BUCKET_BITS)-1))
> #define MAX_BUCKET_LENGTH ((size_t)((1 << POS_BITS)-1))
>
> class MallocHeader VALUE_OBJ_CLASS_SPEC {
>   size_t _size : SIZE_BITS;
>   size_t _flags : FLAGS_BITS;
>   size_t _pos_idx : POS_BITS;
>   size_t _bucket_idx: BUCKET_BITS;
> }
>
> ...and assert (SIZE_BITS + FLAGS_BITS + BUCKET_BITS + POS_BITS <=
> 2*BitsPerWord) somewhere?
>
> -Aleksey.
>
>

From coleen.phillimore at oracle.com Mon Nov 10 23:00:02 2014
From: coleen.phillimore at oracle.com (Coleen Phillimore)
Date: Mon, 10 Nov 2014 18:00:02 -0500
Subject: RFR: 8062870: src/share/vm/services/mallocTracker.hpp:64 assert(_count > 0) failed: Negative ,counter
In-Reply-To: <5460F989.3030807@oracle.com>
References: <5460F402.4060507@oracle.com> <5460F989.3030807@oracle.com>
Message-ID: <54614372.6080001@oracle.com>

Hi George,

Thanks for the review. I didn't know there was another test that I needed to remove @ignore from. I've run this test for a couple of hours in a loop and it always passes now. The other bug number was the bug that Christian fixed.

open webrev at http://cr.openjdk.java.net/~coleenp/8062870_2/

Thanks,
Coleen

On 11/10/14, 12:44 PM, George Triantafillou wrote:
> Hi Coleen,
>
> This looks good. Thanks for fixing this.
>
> -George
>
> On 11/10/2014 12:21 PM, Coleen Phillimore wrote:
>> Summary: Signed bitfield size y can only have (1 << y)-1 values.
>>
>> We were overflowing the _pos index and reusing the 0th element in
>> the MallocSiteTable for two different stack traces which caused the
>> assert for deallocation.
>>
>> Tested with nsk.quick.testlist and jtreg runtime tests with
>> -XX:NativeMemoryTracking=detail.
>>
>> open webrev at http://cr.openjdk.java.net/~coleenp/8062870/
>> bug link https://bugs.openjdk.java.net/browse/JDK-8062870
>>
>> Thanks,
>> Coleen
>

From coleen.phillimore at oracle.com Mon Nov 10 23:10:53 2014
From: coleen.phillimore at oracle.com (Coleen Phillimore)
Date: Mon, 10 Nov 2014 18:10:53 -0500
Subject: 8062870: src/share/vm/services/mallocTracker.hpp:64 assert(_count > 0) failed: Negative ,counter
In-Reply-To: <008901cffd21$023660e0$06a322a0$@oracle.com>
References: <5460F402.4060507@oracle.com> <008901cffd21$023660e0$06a322a0$@oracle.com>
Message-ID: <546145FD.8000207@oracle.com>

Thanks Christian. You took off the RFR so I couldn't find it!
Coleen On 11/10/14, 3:00 PM, Christian Tornqvist wrote: > Hi Coleen, > > As mentioned offline, please make sure you remove the @ignore from > test/runtime/NMT/MallocTrackingVerify.java as well. > > Otherwise this looks good, thanks for fixing this. > > Thanks, > Christian > > -----Original Message----- > From: hotspot-runtime-dev > [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of Coleen > Phillimore > Sent: Monday, November 10, 2014 12:21 PM > To: hotspot-runtime-dev > Subject: RFR: 8062870: src/share/vm/services/mallocTracker.hpp:64 > assert(_count > 0) failed: Negative ,counter > > Summary: Signed bitfield size y can only have (1 << y)-1 values. > > We were overflowing the the _pos index and reusing the 0th element in the > MallocSiteTable for two different stack traces which caused the assert for > deallocation. > > Tested with nsk.quick.testlist and jtreg runtime tests with > -XX:NativeMemoryTracking=detail. > > open webrev at http://cr.openjdk.java.net/~coleenp/8062870/ > bug link https://bugs.openjdk.java.net/browse/JDK-8062870 > > Thanks, > Coleen > From daniel.daugherty at oracle.com Tue Nov 11 00:00:41 2014 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Mon, 10 Nov 2014 17:00:41 -0700 Subject: RFR(S) Solaris Full Debug Symbols (FDS) fix for 8033602 and 8034005 Message-ID: <546151A9.1080100@oracle.com> Greetings, I have a Solaris Full Debug Symbols (FDS) fix ready for review. Yes, it is a small fix, but it is in Makefiles so feel free to run screaming from the room... :-) On the plus side the fix does delete two work around source files (Coleen would say that's a Good Thing (TM)!) The fix is to detect the version of GNU objcopy that is being used on the machine and only enable Full Debug Symbols when that version is 2.21.1 or newer. 
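
[Editorial sketch, not part of the original message.] The version test described above lives in the Makefiles of the webrev; the following C++ fragment only illustrates the comparison itself and is not that logic. It assumes the version is available as a dotted string such as "2.15" or "2.21.1"; the function names `parse_version` and `version_at_least` are hypothetical. The point it shows is that components must be compared numerically, since a plain string comparison would rank "2.9" above "2.21.1".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Parse a dotted version such as "2.21.1" into numeric components.
// Parsing stops at the first character that is neither a digit nor a dot.
inline std::vector<int> parse_version(const std::string& text) {
  std::vector<int> parts;
  int current = 0;
  bool have_digit = false;
  for (char c : text) {
    if (c >= '0' && c <= '9') {
      current = current * 10 + (c - '0');
      have_digit = true;
    } else if (c == '.' && have_digit) {
      parts.push_back(current);
      current = 0;
      have_digit = false;
    } else {
      break;
    }
  }
  if (have_digit) parts.push_back(current);
  return parts;
}

// True if `found` is at least `required`, comparing component by component.
// Missing components count as 0, so "2.22" >= "2.21.1" but "2.21" < "2.21.1".
inline bool version_at_least(const std::string& found,
                             const std::string& required) {
  const std::vector<int> a = parse_version(found);
  const std::vector<int> b = parse_version(required);
  const std::size_t n = a.size() > b.size() ? a.size() : b.size();
  for (std::size_t i = 0; i < n; ++i) {
    const int x = i < a.size() ? a[i] : 0;
    const int y = i < b.size() ? b[i] : 0;
    if (x != y) return x > y;
  }
  return true;  // all components equal
}
```

Under these assumptions, `version_at_least("2.15", "2.21.1")` is false, which corresponds to the case where the build would fall back to the pre-FDS configuration.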
If you don't have the right version, then the build drops back to pre-FDS build configs with a message like this:

WARNING: /usr/sfw/bin/gobjcopy --version info:
WARNING: GNU objcopy 2.15
WARNING: an objcopy version of 2.21.1 or newer is needed to create valid .debuginfo files.
WARNING: ignoring above objcopy command.
WARNING: patch 149063-01 or newer contains the correct Solaris 10 SPARC version.
WARNING: patch 149064-01 or newer contains the correct Solaris 10 X86 version.
WARNING: Solaris 11 Update 1 contains the correct version.
INFO: no objcopy cmd found so cannot create .debuginfo files.
INFO: ENABLE_FULL_DEBUG_SYMBOLS=0

This work is being tracked by the following bug IDs:

JDK-8033602 wrong stabs data in libjvm.debuginfo on JDK 8 - SPARC
https://bugs.openjdk.java.net/browse/JDK-8033602

JDK-8034005 cannot debug in synchronizer.o or objectMonitor.o on Solaris X86
https://bugs.openjdk.java.net/browse/JDK-8034005

Here is the webrev URL:

http://cr.openjdk.java.net/~dcubed/8033602-webrev/0-jdk9-hs-rt/

Testing:

- JPRT test jobs to verify that the current JPRT Solaris hosts are happy
- local builds on my Solaris 10 X86 machine to verify that the wrong version of GNU objcopy is caught

Thanks, in advance, for any comments, questions or suggestions.

Dan

From coleen.phillimore at oracle.com Tue Nov 11 00:12:47 2014
From: coleen.phillimore at oracle.com (Coleen Phillimore)
Date: Mon, 10 Nov 2014 19:12:47 -0500
Subject: RFR 8064375: Change certain errors to warnings in AppCDS output
In-Reply-To: <54611E5C.6050605@oracle.com>
References: <54611E5C.6050605@oracle.com>
Message-ID: <5461547F.3050809@oracle.com>

Yes, these messages are better saying Warning since the Error doesn't seem to cause the -Xshare:dump to fail.
Coleen

On 11/10/14, 3:21 PM, Jiangli Zhou wrote:
> Please review following simple fix that changes the non-fatal CDS
> preloading errors into warnings:
>
> http://cr.openjdk.java.net/~jiangli/8064375/webrev.00/
>
> Thanks,
> Jiangli

From jiangli.zhou at oracle.com Tue Nov 11 00:13:40 2014
From: jiangli.zhou at oracle.com (Jiangli Zhou)
Date: Mon, 10 Nov 2014 16:13:40 -0800
Subject: RFR 8064375: Change certain errors to warnings in AppCDS output
In-Reply-To: <5461547F.3050809@oracle.com>
References: <54611E5C.6050605@oracle.com> <5461547F.3050809@oracle.com>
Message-ID: <546154B4.2080002@oracle.com>

Thanks Coleen!

Jiangli

On 11/10/2014 04:12 PM, Coleen Phillimore wrote:
>
> Yes, these messages are better saying Warning since the Error doesn't
> seem to cause the -Xshare:dump to fail.
>
> Coleen
>
> On 11/10/14, 3:21 PM, Jiangli Zhou wrote:
>> Please review following simple fix that changes the non-fatal CDS
>> preloading errors into warnings:
>>
>> http://cr.openjdk.java.net/~jiangli/8064375/webrev.00/
>>
>> Thanks,
>> Jiangli
>

From john.r.rose at oracle.com Tue Nov 11 00:21:17 2014
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 10 Nov 2014 16:21:17 -0800
Subject: RFR: 8062870: src/share/vm/services/mallocTracker.hpp:64 assert(_count > 0) failed: Negative , counter
In-Reply-To: <5460F402.4060507@oracle.com>
References: <5460F402.4060507@oracle.com>
Message-ID: <1FA4267A-C001-48A5-9C04-1DA54890C3F1@oracle.com>

I think on many LP64 platforms the value of (1<<40) is 256, same as (1<<8). It will seldom be the intended value of (int64_t)1<<40.

The C "<<" operator is notoriously devious (not to say shifty).

For shift/mask arithmetic we should be continuing to use macros from globalDefinitions.hpp.
They are far more reliable than C expressions.

-- John

On Nov 10, 2014, at 9:21 AM, Coleen Phillimore wrote:

> Summary: Signed bitfield size y can only have (1 << y)-1 values.
> > We were overflowing the the _pos index and reusing the 0th element in the MallocSiteTable for two different stack traces which caused the assert for deallocation. > > Tested with nsk.quick.testlist and jtreg runtime tests with -XX:NativeMemoryTracking=detail. > > open webrev at http://cr.openjdk.java.net/~coleenp/8062870/ > bug link https://bugs.openjdk.java.net/browse/JDK-8062870 > > Thanks, > Coleen From coleen.phillimore at oracle.com Tue Nov 11 00:37:23 2014 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Mon, 10 Nov 2014 19:37:23 -0500 Subject: RFR (XS) JDK-8015272: Make @Contended within the same group to use the same oop map In-Reply-To: <545F8CFA.80809@oracle.com> References: <525AC628.4020906@oracle.com> <009001cec83e$5e9c6b40$1bd541c0$@oracle.com> <525B0A18.8000105@oracle.com> <545B70F6.60801@oracle.com> <7029DBDA-BBE6-420A-BE6E-2EA28B60B560@oracle.com> <545B9CC0.3080106@oracle.com> <545F8CFA.80809@oracle.com> Message-ID: <54615A43.10700@oracle.com> Hi, I think this code looks correct. Was there a test in the test system that exercises this code? I think it would be hard to test with a dedicated test but was there one already in the test sets? Secondly, could you use the word adjacent in the comments, reuse oopmap for adjacent oops in the class or something like that? That would have saved me some jotting down on notebook. I'll sponsor it if you get another reviewer. Thanks, Coleen On 11/9/14, 10:49 AM, Aleksey Shipilev wrote: > Hi again, > > No changes in webrev: > http://cr.openjdk.java.net/~shade/8015272/webrev.01/ > > Please review and sponsor: > http://cr.openjdk.java.net/~shade/8015272/8015272.changeset > > As per Karen's request, more testing is done, ran the tests on my Linux > x86_64/fastdebug: > > On 11/06/2014 07:07 PM, Aleksey Shipilev wrote: >> On 11/06/2014 06:01 PM, Karen Kinnear wrote: >>> - e.g. 
what about the vmtestbase vm/runtime/contended tests (and yes, some tests should be removed from that testlist)
> vmtestbase vm/runtime/contended: no issues.
> hotspot/test/runtime/ jtreg: no issues.
>
>>> - vmtestbase: vm.quick.testlist (required for runtime changes)
> vm.quick.testlist: no issues.
>
>>> - and since @Contended annotation is used in the JDK core libraries - the jtreg jdk tests?
> jdk/test/java/util/concurrent jtreg: no issues.
> jdk/test/java/lang/Thread jtreg: no issues.
>
>
> Thanks,
> -Aleksey.
>
>

From coleen.phillimore at oracle.com Tue Nov 11 02:06:17 2014
From: coleen.phillimore at oracle.com (Coleen Phillimore)
Date: Mon, 10 Nov 2014 21:06:17 -0500
Subject: RFR: 8062870: src/share/vm/services/mallocTracker.hpp:64 assert(_count > 0) failed: Negative ,counter
In-Reply-To: <1FA4267A-C001-48A5-9C04-1DA54890C3F1@oracle.com>
References: <5460F402.4060507@oracle.com> <1FA4267A-C001-48A5-9C04-1DA54890C3F1@oracle.com>
Message-ID: <54616F19.6050808@oracle.com>

You are right! I didn't get 256 because my fix didn't compile on 64 bit due to the extra parentheses I added around 1<<40. I'm changing it to use right_n_bits(40), 16 and 8. Nothing good ever comes from C shifts.

Thanks!
Coleen

On 11/10/14, 7:21 PM, John Rose wrote:
> I think on many LP64 platforms the value of (1<<40) is 256, same as (1<<8). It will seldom be the intended value of (int64_t)1<<40.
>
> The C "<<" operator is notoriously devious (not to say shifty).
>
> For shift/mask arithmetic we should be continuing to use macros from globalDefinitions.hpp.
> They are far more reliable than C expressions.
>
> -- John
>
> On Nov 10, 2014, at 9:21 AM, Coleen Phillimore wrote:
>
>> Summary: Signed bitfield size y can only have (1 << y)-1 values.
>>
>> We were overflowing the _pos index and reusing the 0th element in the MallocSiteTable for two different stack traces which caused the assert for deallocation.
>>
>> Tested with nsk.quick.testlist and jtreg runtime tests with -XX:NativeMemoryTracking=detail.
>>
>> open webrev at http://cr.openjdk.java.net/~coleenp/8062870/
>> bug link https://bugs.openjdk.java.net/browse/JDK-8062870
>>
>> Thanks,
>> Coleen

From coleen.phillimore at oracle.com Tue Nov 11 02:28:30 2014
From: coleen.phillimore at oracle.com (Coleen Phillimore)
Date: Mon, 10 Nov 2014 21:28:30 -0500
Subject: RFR: 8062870: src/share/vm/services/mallocTracker.hpp:64 assert(_count > 0) failed: Negative ,counter
In-Reply-To: <1FA4267A-C001-48A5-9C04-1DA54890C3F1@oracle.com>
References: <5460F402.4060507@oracle.com> <1FA4267A-C001-48A5-9C04-1DA54890C3F1@oracle.com>
Message-ID: <5461744E.2000304@oracle.com>

I've made the change to use right_n_bits and run the NMT tests (including the one that crashed in a loop for a while).

open webrev at http://cr.openjdk.java.net/~coleenp/8062870_3/

Thanks. This is a big improvement.
Coleen

On 11/10/14, 7:21 PM, John Rose wrote:
> I think on many LP64 platforms the value of (1<<40) is 256, same as (1<<8). It will seldom be the intended value of (int64_t)1<<40.
>
> The C "<<" operator is notoriously devious (not to say shifty).
>
> For shift/mask arithmetic we should be continuing to use macros from globalDefinitions.hpp.
> They are far more reliable than C expressions.
>
> -- John
>
> On Nov 10, 2014, at 9:21 AM, Coleen Phillimore wrote:
>
>> Summary: Signed bitfield size y can only have (1 << y)-1 values.
>>
>> We were overflowing the _pos index and reusing the 0th element in the MallocSiteTable for two different stack traces which caused the assert for deallocation.
>>
>> Tested with nsk.quick.testlist and jtreg runtime tests with -XX:NativeMemoryTracking=detail.
>>
>> open webrev at http://cr.openjdk.java.net/~coleenp/8062870/
>> bug link https://bugs.openjdk.java.net/browse/JDK-8062870
>>
>> Thanks,
>> Coleen

From daniel.daugherty at oracle.com Tue Nov 11 03:46:18 2014
From: daniel.daugherty at oracle.com (Daniel D.
Daugherty) Date: Mon, 10 Nov 2014 20:46:18 -0700 Subject: RFR(S) Contended Locking fast enter bucket (8061553) In-Reply-To: <545C2BC0.3080207@oracle.com> References: <5452C0B4.4070601@oracle.com> <5457084B.6070808@oracle.com> <5458330E.1080207@oracle.com> <54591A3A.1090005@oracle.com> <545C2BC0.3080207@oracle.com> Message-ID: <5461868A.8070308@oracle.com> The webrev is now available! Sorry for any confusion. Dan On 11/6/14 7:17 PM, Daniel D. Daugherty wrote: > The fix for JDK-8062851 has been reviewed, tested and pushed to > RT_Baseline. > > Time to get back to this review thread so here's an updated webrev: > > http://cr.openjdk.java.net/~dcubed/8061553-webrev/1-jdk9-hs-rt/ > > David H., I believe I've addressed all of your comments. Please > let me know if I missed something... > > Thanks, in advance, for any comments, questions or suggestions. > > Dan > > > On 11/4/14 11:26 AM, Daniel D. Daugherty wrote: >> The cleanup is turning into a bigger change than the fast enter >> bucket itself so I'm spinning the cleanup into a new bug: >> >> JDK-8062851 cleanup ObjectMonitor offset adjustments >> https://bugs.openjdk.java.net/browse/JDK-8062851 >> >> Yes, this means that the Contended Locking cleanup bucket has reopened >> for yet another change... >> >> We'll get back to "fast enter" after the dust has settled... >> >> Dan >> >> >> On 11/3/14 6:59 PM, Daniel D. Daugherty wrote: >>> David, >>> >>> Thanks for the review! As usual, replies are embedded below... >>> >>> >>> On 11/2/14 9:44 PM, David Holmes wrote: >>>> Hi Dan, >>>> >>>> Looks good. >>> >>> Thanks! >>> >>> >>>> Couple of nits and one semantic query below ... >>>> >>>> src/cpu/sparc/vm/macroAssembler_sparc.cpp >>>> >>>> Formatting changes were a bit of a distraction. >>> >>> Yes, I have no idea what got into me. Normally I do formatting >>> changes separately so the noise does not distract... 
>>> >>> It turns out there is a constant defined that should be used >>> instead of all these literal '2's: >>> >>> src/share/vm/oops/markOop.hpp: monitor_value = 2 >>> >>> Typically used as follows: >>> >>> src/cpu/x86/vm/macroAssembler_x86.cpp: int owner_offset = >>> ObjectMonitor::owner_offset_in_bytes() - markOopDesc::monitor_value; >>> >>> I will clean this up just for the files that I'm touching as >>> part of this fix. >>> >>> >>>> >>>> --- >>>> >>>> src/cpu/x86/vm/macroAssembler_x86.cpp >>>> >>>> Formatting changes were a bit of a distraction. >>> >>> Same reply as for macroAssembler_sparc.cpp. >>> >>> >>>> 1929 // unconditionally set stackBox->_displaced_header = 3 >>>> 1930 movptr(Address(boxReg, 0), >>>> (int32_t)intptr_t(markOopDesc::unused_mark())); >>>> >>>> At 1870 we refer to box rather than stackBox. Also it takes some >>>> sleuthing to realize that "3" here is somehow a pseudonym for >>>> unused_mark(). Back up at 1808 we have a to-do: >>>> >>>> 1808 // use markOop::unused_mark() instead of "3". >>>> >>>> so the current change seems to be implementing that, even though >>>> other uses of "3" are left untouched. >>> >>> I'll take a look at cleaning those up also... >>> >>> In some cases markOopDesc::marked_value will work for the literal '3', >>> but in other cases we'll use markOop::unused_mark(): >>> >>> static markOop unused_mark() { >>> return (markOop) marked_value; >>> } >>> >>> to save us the noise of the (markOop) cast. >>> >>> >>>> --- >>>> >>>> src/share/vm/runtime/sharedRuntime.cpp >>>> >>>> 1794 JRT_BLOCK_ENTRY(void, >>>> SharedRuntime::complete_monitor_locking_C(oopDesc* _obj, BasicLock* >>>> lock, JavaThread* thread)) >>>> 1795 if (!SafepointSynchronize::is_synchronizing()) { >>>> 1796 if (ObjectSynchronizer::quick_enter(_obj, thread, lock)) >>>> return; >>>> >>>> Is it necessary to check is_synchronizing? 
If we are executing this >>>> code we are not at a safepoint and the quick_enter wont change >>>> that, so I'm not sure what we are guarding against. >>> >>> So this first state checker: >>> >>> src/share/vm/runtime/safepoint.hpp: >>> inline static bool is_synchronizing() { return _state == >>> _synchronizing; } >>> >>> means that we want to go to a safepoint and: >>> >>> inline static bool is_at_safepoint() { return _state == >>> _synchronized; } >>> >>> means that we are at a safepoint. Dice's optimization bails out if >>> we want to go to a safepoint and ObjectSynchronizer::quick_enter() >>> has a "No_Safepoint_Verifier nsv" in it so we're expecting that >>> code to be quick (and not go to a safepoint). I'm not seeing >>> anything obvious.... >>> >>> Sometimes we have to be careful with JavaThread suspend requests and >>> monitor acquisition, but I don't think that's a problem here... In >>> order for the "suspend requesting" thread to be surprised, the suspend >>> API, e.g., JVM/TI SuspendThread() has to return to the caller and then >>> the suspend target has do something unexpected like acquire a monitor >>> that it was previously blocked upon when it was suspended. We've had >>> bugs like that in the past... In this optimization case, our target >>> thread is not blocked on a contended monitor... >>> >>> In this particular case, the "suspend requesting" thread will set the >>> suspend request state on the target thread, but the target thread is >>> busy trying to enter this uncontended monitor (quickly). So the >>> "suspend requesting" thread, will request a no-op safepoint, but it >>> won't return from the suspend API until that safepoint completes. >>> The safepoint won't complete until the target thread is done acquiring >>> the previously uncontended monitor... so the target thread will be >>> suspended while holding the previous uncontended monitor and the >>> "suspend requesting" thread will return from the suspend API all >>> happy... 
>>> >>> Well, I don't see the reason either so I'll have to ping Dave Dice >>> and Karen Kinnear to see if either of them can fill in the history >>> here. This could be an abundance of caution case. >>> >>> >>>> --- >>>> >>>> src/share/vm/runtime/synchronizer.cpp >>>> >>>> Minor nit: line 153 the usual acronym is NPE (for >>>> NullPointerException) not NPX >>> >>> I'll do a search for uses of NPX and other uses of 'X' in exception >>> acronyms... >>> >>> >>>> >>>> Nit: 159 Thread * const ox >>>> >>>> Please change ox to owner. >>> >>> Will do. >>> >>> Thanks again for the review! >>> >>> Dan >>> >>> >>>> >>>> --- >>>> >>>> Thanks, >>>> David >>>> >>>> >>>> >>>> On 31/10/2014 8:50 AM, Daniel D. Daugherty wrote: >>>>> Greetings, >>>>> >>>>> I have the Contended Locking fast enter bucket ready for review. >>>>> >>>>> The code changes in this bucket are primarily a quick_enter() >>>>> function that works on inflated but uncontended Java monitors. >>>>> This quick_enter() function is used on the "slow path" for Java >>>>> Monitor enter operations when the built-in "fast path" (read >>>>> assembly code) doesn't work. 
>>>>> >>>>> This work is being tracked by the following bug ID: >>>>> >>>>> JDK-8061553 Contended Locking fast enter bucket >>>>> https://bugs.openjdk.java.net/browse/JDK-8061553 >>>>> >>>>> Here is the webrev URL: >>>>> >>>>> http://cr.openjdk.java.net/~dcubed/8061553-webrev/0-jdk9-hs-rt/ >>>>> >>>>> Here is the JEP link: >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8046133 >>>>> >>>>> 8061553 summary of changes: >>>>> >>>>> macroAssembler_sparc.cpp: MacroAssembler::compiler_lock_object() >>>>> >>>>> - clean up spacing around some >>>>> 'ObjectMonitor::owner_offset_in_bytes() - 2' uses >>>>> - remove optional (EmitSync & 64) code >>>>> - change from cmp() to andcc() so icc.zf flag is set >>>>> >>>>> macroAssembler_x86.cpp: MacroAssembler::fast_lock() >>>>> >>>>> - remove optional (EmitSync & 2) code >>>>> - rewrite LP64 inflated lock code that tries to CAS in >>>>> the new owner value to be more efficient >>>>> >>>>> interfaceSupport.hpp: >>>>> >>>>> - add JRT_BLOCK_NO_ASYNC to permit splitting a >>>>> JRT_BLOCK_ENTRY into two pieces. 
>>>>> >>>>> sharedRuntime.cpp: SharedRuntime::complete_monitor_locking_C() >>>>> >>>>> - change entry type from JRT_ENTRY_NO_ASYNC to JRT_BLOCK_ENTRY >>>>> to permit ObjectSynchronizer::quick_enter() call >>>>> - add JRT_BLOCK_NO_ASYNC use if the quick_enter() doesn't work >>>>> to revert to JRT_ENTRY_NO_ASYNC-like semantics >>>>> >>>>> synchronizer.[ch]pp: >>>>> >>>>> - add ObjectSynchronizer::quick_enter() for entering an >>>>> inflated but unowned Java monitor without thread state >>>>> changes >>>>> >>>>> Testing: >>>>> >>>>> - Aurora Adhoc RT/SVC baseline batch >>>>> - JPRT test jobs >>>>> - MonitorEnterStresser micro-benchmark (in process) >>>>> - CallTimerGrid stress testing (in process) >>>>> - Aurora performance testing: >>>>> - out of the box for the "promotion" and 32-bit server configs >>>>> - heavy weight monitors for the "promotion" and 32-bit server >>>>> configs >>>>> (-XX:-UseBiasedLocking -XX:+UseHeavyMonitors) >>>>> (in process) >>>>> >>>>> >>>>> Thanks, in advance, for any comments, questions or suggestions. >>>>> >>>>> Dan >>> >>> >> >> >> > > > From calvin.cheung at oracle.com Tue Nov 11 06:12:46 2014 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Mon, 10 Nov 2014 22:12:46 -0800 Subject: RFR(S): 8043491: warning LNK4197: export '... ...' specified multiple times; using first specification Message-ID: <5461A8DE.1050009@oracle.com> This is for fixing link warnings on windows such as the following: jni.obj : warning LNK4197: export 'JNI_CreateJavaVM' specified multiple times; using first specification The warning is reproducible with both VS2010 and VS2013. 
It is applicable to 64-bit only probably due to the __declspec(dllexport) on 32-bit, it exports the function decorated name with a leading underscore, but not the case on 64-bit as described in: http://stackoverflow.com/questions/3572344/a-warning-with-building-64bit-dll All those functions are declared with JNIEXPORT (#define JNIEXPORT __declspec(dllexport)) and we're adding the /export: in the link command. Therefore, on 64-bit platform, we get the "specified multiple times" LNK4197 warning. A fix is to check if the platform is 64-bit, we don't add those /export option to the link command. JBS: https://bugs.openjdk.java.net/browse/JDK-8043491 webrev: http://cr.openjdk.java.net/~ccheung/8043491/webrev/ Tests: (1) build jvm.dll via command line (both 32- and 64-bit) use configure.sh to setup and then do "make CONF= hotspot" (2) generate visual studio project files using ProjectCreator (both 32- and 64-bit) build jvm.dll via VS2013 (both 32- and 64-bit) (3) JPRT thanks, Calvin From david.holmes at oracle.com Tue Nov 11 07:06:59 2014 From: david.holmes at oracle.com (David Holmes) Date: Tue, 11 Nov 2014 17:06:59 +1000 Subject: RFR (S) 8059677: Thread.getName() instantiates Strings In-Reply-To: <5460C960.9080509@oracle.com> References: <545FB64F.7090705@oracle.com> <5460A6E8.9050506@oracle.com> <5460B608.4050909@oracle.com> <5460C354.5000605@oracle.com> <5460C960.9080509@oracle.com> Message-ID: <5461B593.1000104@oracle.com> Hi Aleksey, On 11/11/2014 12:19 AM, Aleksey Shipilev wrote: > Hi David, Chris, > > On 11/10/2014 04:53 PM, Chris Hegarty wrote: >> On 10/11/14 12:56, David Holmes wrote: >>> On 10/11/2014 9:52 PM, Chris Hegarty wrote: >>>> I have only looked at the libraries changes, and I think they make sense >>>> . As in, I can find no reason why the name cannot be changed to be a >>>> String. >>> >>> Very quick response, but IIRC this has been examined in the past and >>> there were reasons why it can't/shouldn't be done. 
Will try to dig out >>> more details in the morning. >> >> If there was previous discussion on this, that revealed some substantial >> issue, that would be great, but I can't recall, or find, it now. >> >> Hotspot express, and the desire for hotspot to run with different >> library versions, would certainly cause complication, but I don't >> believe that is an issue now. >> >> Just on that, the library changes are minimal, and if this were to >> proceed then they can accompany the hotspot change, as they make their >> way into jdk9/dev. >> >> Anyway, this should await your reply. > > Alan was having the same concern, there is an issue with JNI/JVMTI and > other power users that might break when exposed to under-constructed > Thread, e.g: > https://bugs.openjdk.java.net/browse/JDK-6412693 > > This is why I ran jvmti and serviceability tests for this change, > yielding no failures. This reinforces my belief this patch does not > break the important invariant: if there is a problem with "Thread.name = > name.toCharArray()" anywhere in Thread code, then "Thread.name = name" > does neither regress it further nor fixes it. True. > Then I speculated that having char[] name would help VM initialize the > name if we wanted to switch to complete VM-side initialization of > Thread, but it seems we can do String oop instantiation in the similar vein. I think it really just came down to accessing the Thread name from things like JVMDI/PI (now JVM TI) - easier for C code to access a raw char[]. Maybe once upon a time (in a land not so far away) we even passed char[] to the Thread constructor? :) But having re-discovered past discussions etc there's really nothing to stop this from being a String (slight memory use increase per Thread object). > Caching the name feels like a band-aid, that will probably complicate > the Thread initialization on VM side even more. Let's wait and see if > David can come up with some horror issue we are overlooking. 
:) I don't see how a Java side cache affects anything on the VM initialization side - and as Strings can be published unsafely we don't even need sync/volatile to do so :) That aside I think it is as Alan commented - a number of small things (some logistical I think) that made this change not worth the effort. Maybe now it is worth the effort if getName is a bottleneck (but again caching is the common fix for that kind of problem :)). I was concerned about executing even more Java code at thread attach time, but we already create a String to pass to the Thread constructor, so no change there. So looking at your proposal ... some minor comments ... JDK change is okay - but "name" doesn't need to be volatile when it is a String reference. Hotspot side: src/share/vm/classfile/javaClasses.hpp This added assert seems overly cautious: 134 oop value = java_string->obj_field(value_offset); 135 assert((value->is_typeArray() && TypeArrayKlass::cast(value->klass())->element_type() == T_CHAR), "expect char[]"); you are basically checking that String.value is defined as a char[]. If warranted, this is a check needed once in the lifetime of a VM not every time this method is called. (Yes I see we had something similarly odd in java_lang_thread::name. :( ) --- src/share/vm/classfile/javaClasses.cpp ! oop java_lang_Thread::name(oop java_thread) { oop name = java_thread->obj_field(_name_offset); ! assert(name != NULL, "thread name is NULL"); I'm not confident this can never be called before the name has been set. The original assertion allowed for NULL as does the JVM TI code. --- src/share/vm/prims/jvmtiTrace.cpp Copyright year needs updating. :) --- Aside: I wonder if we've inadvertently fixed 6771287 now. :) That was a fun one to debug. Thanks, David ----- > Thanks, > -Aleksey. 
> From david.holmes at oracle.com Tue Nov 11 08:02:10 2014 From: david.holmes at oracle.com (David Holmes) Date: Tue, 11 Nov 2014 18:02:10 +1000 Subject: RFR(S) Contended Locking fast enter bucket (8061553) In-Reply-To: <545C2BC0.3080207@oracle.com> References: <5452C0B4.4070601@oracle.com> <5457084B.6070808@oracle.com> <5458330E.1080207@oracle.com> <54591A3A.1090005@oracle.com> <545C2BC0.3080207@oracle.com> Message-ID: <5461C282.1020806@oracle.com> On 7/11/2014 12:17 PM, Daniel D. Daugherty wrote: > The fix for JDK-8062851 has been reviewed, tested and pushed to > RT_Baseline. > > Time to get back to this review thread so here's an updated webrev: > > http://cr.openjdk.java.net/~dcubed/8061553-webrev/1-jdk9-hs-rt/ > > David H., I believe I've addressed all of your comments. Please > let me know if I missed something... Looks good to me - thanks Dan! David ----- > Thanks, in advance, for any comments, questions or suggestions. > > Dan > > > On 11/4/14 11:26 AM, Daniel D. Daugherty wrote: >> The cleanup is turning into a bigger change than the fast enter >> bucket itself so I'm spinning the cleanup into a new bug: >> >> JDK-8062851 cleanup ObjectMonitor offset adjustments >> https://bugs.openjdk.java.net/browse/JDK-8062851 >> >> Yes, this means that the Contended Locking cleanup bucket has reopened >> for yet another change... >> >> We'll get back to "fast enter" after the dust has settled... >> >> Dan >> >> >> On 11/3/14 6:59 PM, Daniel D. Daugherty wrote: >>> David, >>> >>> Thanks for the review! As usual, replies are embedded below... >>> >>> >>> On 11/2/14 9:44 PM, David Holmes wrote: >>>> Hi Dan, >>>> >>>> Looks good. >>> >>> Thanks! >>> >>> >>>> Couple of nits and one semantic query below ... >>>> >>>> src/cpu/sparc/vm/macroAssembler_sparc.cpp >>>> >>>> Formatting changes were a bit of a distraction. >>> >>> Yes, I have no idea what got into me. Normally I do formatting >>> changes separately so the noise does not distract... 
>>> >>> It turns out there is a constant defined that should be used >>> instead of all these literal '2's: >>> >>> src/share/vm/oops/markOop.hpp: monitor_value = 2 >>> >>> Typically used as follows: >>> >>> src/cpu/x86/vm/macroAssembler_x86.cpp: int owner_offset = >>> ObjectMonitor::owner_offset_in_bytes() - markOopDesc::monitor_value; >>> >>> I will clean this up just for the files that I'm touching as >>> part of this fix. >>> >>> >>>> >>>> --- >>>> >>>> src/cpu/x86/vm/macroAssembler_x86.cpp >>>> >>>> Formatting changes were a bit of a distraction. >>> >>> Same reply as for macroAssembler_sparc.cpp. >>> >>> >>>> 1929 // unconditionally set stackBox->_displaced_header = 3 >>>> 1930 movptr(Address(boxReg, 0), >>>> (int32_t)intptr_t(markOopDesc::unused_mark())); >>>> >>>> At 1870 we refer to box rather than stackBox. Also it takes some >>>> sleuthing to realize that "3" here is somehow a pseudonym for >>>> unused_mark(). Back up at 1808 we have a to-do: >>>> >>>> 1808 // use markOop::unused_mark() instead of "3". >>>> >>>> so the current change seems to be implementing that, even though >>>> other uses of "3" are left untouched. >>> >>> I'll take a look at cleaning those up also... >>> >>> In some cases markOopDesc::marked_value will work for the literal '3', >>> but in other cases we'll use markOop::unused_mark(): >>> >>> static markOop unused_mark() { >>> return (markOop) marked_value; >>> } >>> >>> to save us the noise of the (markOop) cast. >>> >>> >>>> --- >>>> >>>> src/share/vm/runtime/sharedRuntime.cpp >>>> >>>> 1794 JRT_BLOCK_ENTRY(void, >>>> SharedRuntime::complete_monitor_locking_C(oopDesc* _obj, BasicLock* >>>> lock, JavaThread* thread)) >>>> 1795 if (!SafepointSynchronize::is_synchronizing()) { >>>> 1796 if (ObjectSynchronizer::quick_enter(_obj, thread, lock)) >>>> return; >>>> >>>> Is it necessary to check is_synchronizing? 
If we are executing this >>>> code we are not at a safepoint and the quick_enter wont change that, >>>> so I'm not sure what we are guarding against. >>> >>> So this first state checker: >>> >>> src/share/vm/runtime/safepoint.hpp: >>> inline static bool is_synchronizing() { return _state == >>> _synchronizing; } >>> >>> means that we want to go to a safepoint and: >>> >>> inline static bool is_at_safepoint() { return _state == >>> _synchronized; } >>> >>> means that we are at a safepoint. Dice's optimization bails out if >>> we want to go to a safepoint and ObjectSynchronizer::quick_enter() >>> has a "No_Safepoint_Verifier nsv" in it so we're expecting that >>> code to be quick (and not go to a safepoint). I'm not seeing >>> anything obvious.... >>> >>> Sometimes we have to be careful with JavaThread suspend requests and >>> monitor acquisition, but I don't think that's a problem here... In >>> order for the "suspend requesting" thread to be surprised, the suspend >>> API, e.g., JVM/TI SuspendThread() has to return to the caller and then >>> the suspend target has do something unexpected like acquire a monitor >>> that it was previously blocked upon when it was suspended. We've had >>> bugs like that in the past... In this optimization case, our target >>> thread is not blocked on a contended monitor... >>> >>> In this particular case, the "suspend requesting" thread will set the >>> suspend request state on the target thread, but the target thread is >>> busy trying to enter this uncontended monitor (quickly). So the >>> "suspend requesting" thread, will request a no-op safepoint, but it >>> won't return from the suspend API until that safepoint completes. >>> The safepoint won't complete until the target thread is done acquiring >>> the previously uncontended monitor... so the target thread will be >>> suspended while holding the previous uncontended monitor and the >>> "suspend requesting" thread will return from the suspend API all >>> happy... 
>>> >>> Well, I don't see the reason either so I'll have to ping Dave Dice >>> and Karen Kinnear to see if either of them can fill in the history >>> here. This could be an abundance of caution case. >>> >>> >>>> --- >>>> >>>> src/share/vm/runtime/synchronizer.cpp >>>> >>>> Minor nit: line 153 the usual acronym is NPE (for >>>> NullPointerException) not NPX >>> >>> I'll do a search for uses of NPX and other uses of 'X' in exception >>> acronyms... >>> >>> >>>> >>>> Nit: 159 Thread * const ox >>>> >>>> Please change ox to owner. >>> >>> Will do. >>> >>> Thanks again for the review! >>> >>> Dan >>> >>> >>>> >>>> --- >>>> >>>> Thanks, >>>> David >>>> >>>> >>>> >>>> On 31/10/2014 8:50 AM, Daniel D. Daugherty wrote: >>>>> Greetings, >>>>> >>>>> I have the Contended Locking fast enter bucket ready for review. >>>>> >>>>> The code changes in this bucket are primarily a quick_enter() >>>>> function that works on inflated but uncontended Java monitors. >>>>> This quick_enter() function is used on the "slow path" for Java >>>>> Monitor enter operations when the built-in "fast path" (read >>>>> assembly code) doesn't work. 
>>>>> >>>>> This work is being tracked by the following bug ID: >>>>> >>>>> JDK-8061553 Contended Locking fast enter bucket >>>>> https://bugs.openjdk.java.net/browse/JDK-8061553 >>>>> >>>>> Here is the webrev URL: >>>>> >>>>> http://cr.openjdk.java.net/~dcubed/8061553-webrev/0-jdk9-hs-rt/ >>>>> >>>>> Here is the JEP link: >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8046133 >>>>> >>>>> 8061553 summary of changes: >>>>> >>>>> macroAssembler_sparc.cpp: MacroAssembler::compiler_lock_object() >>>>> >>>>> - clean up spacing around some >>>>> 'ObjectMonitor::owner_offset_in_bytes() - 2' uses >>>>> - remove optional (EmitSync & 64) code >>>>> - change from cmp() to andcc() so icc.zf flag is set >>>>> >>>>> macroAssembler_x86.cpp: MacroAssembler::fast_lock() >>>>> >>>>> - remove optional (EmitSync & 2) code >>>>> - rewrite LP64 inflated lock code that tries to CAS in >>>>> the new owner value to be more efficient >>>>> >>>>> interfaceSupport.hpp: >>>>> >>>>> - add JRT_BLOCK_NO_ASYNC to permit splitting a >>>>> JRT_BLOCK_ENTRY into two pieces. 
>>>>> >>>>> sharedRuntime.cpp: SharedRuntime::complete_monitor_locking_C() >>>>> >>>>> - change entry type from JRT_ENTRY_NO_ASYNC to JRT_BLOCK_ENTRY >>>>> to permit ObjectSynchronizer::quick_enter() call >>>>> - add JRT_BLOCK_NO_ASYNC use if the quick_enter() doesn't work >>>>> to revert to JRT_ENTRY_NO_ASYNC-like semantics >>>>> >>>>> synchronizer.[ch]pp: >>>>> >>>>> - add ObjectSynchronizer::quick_enter() for entering an >>>>> inflated but unowned Java monitor without thread state >>>>> changes >>>>> >>>>> Testing: >>>>> >>>>> - Aurora Adhoc RT/SVC baseline batch >>>>> - JPRT test jobs >>>>> - MonitorEnterStresser micro-benchmark (in process) >>>>> - CallTimerGrid stress testing (in process) >>>>> - Aurora performance testing: >>>>> - out of the box for the "promotion" and 32-bit server configs >>>>> - heavy weight monitors for the "promotion" and 32-bit server >>>>> configs >>>>> (-XX:-UseBiasedLocking -XX:+UseHeavyMonitors) >>>>> (in process) >>>>> >>>>> >>>>> Thanks, in advance, for any comments, questions or suggestions. >>>>> >>>>> Dan >>> >>> >> >> >> > From staffan.larsen at oracle.com Tue Nov 11 08:03:18 2014 From: staffan.larsen at oracle.com (Staffan Larsen) Date: Tue, 11 Nov 2014 09:03:18 +0100 Subject: RFR (S) 8059677: Thread.getName() instantiates Strings In-Reply-To: <5460FF41.90208@oracle.com> References: <545FB64F.7090705@oracle.com> <5460A6E8.9050506@oracle.com> <5460B608.4050909@oracle.com> <5460C354.5000605@oracle.com> <5460C960.9080509@oracle.com> <5460D1DA.4050907@oracle.com> <5460FF41.90208@oracle.com> Message-ID: <0A9ACBF1-F16F-444A-9CB4-5338C18F68E4@oracle.com> I was able to provoke the failure with a "jstack -F". I think this patch solves the problem: http://cr.openjdk.java.net/~sla/8059677-thread.name.sa.patch . Feel free to not include the changes in StackTrace.java if you don't want to complicate your review.
Thanks, /Staffan > On 10 nov 2014, at 19:09, Aleksey Shipilev wrote: > > On 10.11.2014 19:39, Staffan Larsen wrote: >>> On 10 nov 2014, at 15:55, Aleksey Shipilev wrote: >>> Ow, it seems very like it. >>> So, what testlist have I missed to catch this? >> >> Probably vm.tmtools.testlist and/or nsk.sajdi.testlist. Just a warning that these tests are far from stable. Sorry about that. > > Alas, both these testlists pass with current change without a hitch. > That probably tells something about the test coverage. Any other ideas > how to test for it? Maybe some manual way? > > Anyhow, there is a synonymous block in ThreadGroup handling, I can copy > the relevant bits from there. Updated webrev follows soon. Still need to > test if that change is safe. > > -Aleksey. > From david.holmes at oracle.com Tue Nov 11 08:14:38 2014 From: david.holmes at oracle.com (David Holmes) Date: Tue, 11 Nov 2014 18:14:38 +1000 Subject: RFR: 8062870: src/share/vm/services/mallocTracker.hpp:64 assert(_count > 0) failed: Negative ,counter In-Reply-To: <5461744E.2000304@oracle.com> References: <5460F402.4060507@oracle.com> <1FA4267A-C001-48A5-9C04-1DA54890C3F1@oracle.com> <5461744E.2000304@oracle.com> Message-ID: <5461C56E.9090301@oracle.com> On 11/11/2014 12:28 PM, Coleen Phillimore wrote: > > I've made the change to use right_n_bits and run the NMT tests > (including the one that crashed in a loop for a while). > > open webrev at http://cr.openjdk.java.net/~coleenp/8062870_3/ > > Thanks. This is a big improvement. Looks good to me too. I think "we" forget the goodies located in globalDefinitions.hpp sometimes :) David > Coleen > > On 11/10/14, 7:21 PM, John Rose wrote: >> I think on many LP64 platforms the value of (1<<40) is 256, same as >> (1<<8). It will seldom be the intended value of (int64_t)1<<40. >> >> The C "<<" operator is notoriously devious (not to say shifty). >> >> For shift/mask arithmetic we should be continuing to use macros from >> globalDefinitions.hpp. 
>> They are far more reliable than C expressions. >> >> -- John >> >> On Nov 10, 2014, at 9:21 AM, Coleen Phillimore wrote: >> >>> Summary: Signed bitfield size y can only have (1 << y)-1 values. >>> >>> We were overflowing the _pos index and reusing the 0th element in >>> the MallocSiteTable for two different stack traces which caused the >>> assert for deallocation. >>> >>> Tested with nsk.quick.testlist and jtreg runtime tests with >>> -XX:NativeMemoryTracking=detail. >>> >>> open webrev at http://cr.openjdk.java.net/~coleenp/8062870/ >>> bug link https://bugs.openjdk.java.net/browse/JDK-8062870 >>> >>> Thanks, >>> Coleen > From dmitry.samersoff at oracle.com Tue Nov 11 08:35:40 2014 From: dmitry.samersoff at oracle.com (Dmitry Samersoff) Date: Tue, 11 Nov 2014 11:35:40 +0300 Subject: RFR(S) Solaris Full Debug Symbols (FDS) fix for 8033602 and 8034005 In-Reply-To: <546151A9.1080100@oracle.com> References: <546151A9.1080100@oracle.com> Message-ID: <5461CA5C.30409@oracle.com> Dan, 1. defs.make: It might be better to join the objcopy version check and the condition at ll.190; otherwise the user will get a wrong-version warning followed by the misleading message "no objcopy cmd found" 2. Did you consider moving objcopy detection to configure? -Dmitry On 2014-11-11 03:00, Daniel D. Daugherty wrote: > Greetings, > > I have a Solaris Full Debug Symbols (FDS) fix ready for review. > Yes, it is a small fix, but it is in Makefiles so feel free to > run screaming from the room... :-) On the plus side the fix does > delete two work around source files (Coleen would say that's a > Good Thing (TM)!) > > The fix is to detect the version of GNU objcopy that is being > used on the machine and only enable Full Debug Symbols when that > version is 2.21.1 or newer.
If you don't have the right version, > then the build drops back to pre-FDS build configs with a message > like this: > > WARNING: /usr/sfw/bin/gobjcopy --version info: > WARNING: GNU objcopy 2.15 > WARNING: an objcopy version of 2.21.1 or newer is needed to create valid > .debuginfo files. > WARNING: ignoring above objcopy command. > WARNING: patch 149063-01 or newer contains the correct Solaris 10 SPARC > version. > WARNING: patch 149064-01 or newer contains the correct Solaris 10 X86 > version. > WARNING: Solaris 11 Update 1 contains the correct version. > INFO: no objcopy cmd found so cannot create .debuginfo files. > INFO: ENABLE_FULL_DEBUG_SYMBOLS=0 > > This work is being tracked by the following bug IDs: > > JDK-8033602 wrong stabs data in libjvm.debuginfo on JDK 8 - SPARC > https://bugs.openjdk.java.net/browse/JDK-8033602 > > JDK-8034005 cannot debug in synchronizer.o or objectMonitor.o on > Solaris X86 > https://bugs.openjdk.java.net/browse/JDK-8034005 > > Here is the webrev URL: > > http://cr.openjdk.java.net/~dcubed/8033602-webrev/0-jdk9-hs-rt/ > > Testing: > > - JPRT test jobs to verify that the current JPRT Solaris hosts > are happy > - local builds on my Solaris 10 X86 machine to verify that the > wrong version of GNU objcopy is caught > > Thanks, in advance, for any comments, questions or suggestions. > > Dan -- Dmitry Samersoff Oracle Java development team, Saint Petersburg, Russia * I would love to change the world, but they won't give me the source code. 
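The version gate described in this thread (enable FDS only when GNU objcopy is 2.21.1 or newer) comes down to a component-wise comparison of dotted version numbers. A minimal C++ sketch of that comparison follows. The real check lives in the Makefiles; the names parse_version and version_at_least are hypothetical and are not taken from the webrev.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a dotted version such as "2.21.1" into numbers.
// Each component is read up to its first non-digit; an empty or non-numeric
// component counts as 0. Illustration only.
std::vector<int> parse_version(const std::string& text) {
  std::vector<int> parts;
  std::istringstream in(text);
  std::string piece;
  while (std::getline(in, piece, '.')) {
    int value = 0;
    for (char c : piece) {
      if (c < '0' || c > '9') break;   // stop at the first non-digit
      value = value * 10 + (c - '0');
    }
    parts.push_back(value);
  }
  return parts;
}

// Returns true when 'found' is the same as or newer than 'required'.
// Missing trailing components compare as 0, so "2.21" is older than "2.21.1".
bool version_at_least(const std::string& found, const std::string& required) {
  const std::vector<int> a = parse_version(found);
  const std::vector<int> b = parse_version(required);
  const std::size_t n = a.size() > b.size() ? a.size() : b.size();
  for (std::size_t i = 0; i < n; ++i) {
    const int x = i < a.size() ? a[i] : 0;
    const int y = i < b.size() ? b[i] : 0;
    if (x != y) return x > y;          // first differing component decides
  }
  return true;                         // all components equal
}
```

With this sketch, the "GNU objcopy 2.15" case from the warning text above would be rejected by version_at_least("2.15", "2.21.1"), while "2.21.1" and "2.22" would be accepted.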
From aleksey.shipilev at oracle.com Tue Nov 11 09:05:18 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Tue, 11 Nov 2014 12:05:18 +0300 Subject: RFR: 8062870: src/share/vm/services/mallocTracker.hpp:64 assert(_count > 0) failed: Negative ,counter In-Reply-To: <546141E9.8060602@oracle.com> References: <5460F402.4060507@oracle.com> <5460F745.4070808@oracle.com> <546141E9.8060602@oracle.com> Message-ID: <5461D14E.1080708@oracle.com> On 11.11.2014 01:53, Coleen Phillimore wrote: > Thank you for the comments which I initially agreed with but working > with the code, makes me less happy and I will leave it as is (except one > change which I'm going to put out shortly). All right, that's your call. -Aleksey. From aleksey.shipilev at oracle.com Tue Nov 11 09:10:09 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Tue, 11 Nov 2014 12:10:09 +0300 Subject: RFR: 8062870: src/share/vm/services/mallocTracker.hpp:64 assert(_count > 0) failed: Negative ,counter In-Reply-To: <5461744E.2000304@oracle.com> References: <5460F402.4060507@oracle.com> <1FA4267A-C001-48A5-9C04-1DA54890C3F1@oracle.com> <5461744E.2000304@oracle.com> Message-ID: <5461D271.90204@oracle.com> On 11.11.2014 05:28, Coleen Phillimore wrote: > > I've made the change to use right_n_bits and run the NMT tests > (including the one that crashed in a loop for a while). > > open webrev at http://cr.openjdk.java.net/~coleenp/8062870_3/ Looks good. What confused me initially is a conceptual impedance: _pos_idx is bounded by MAX_BUCKET_LENGTH, and _bucket_idx is bounded by MAX_MALLOCSITE_TABLE_SIZE. Notice the mention of "bucket" in both cases. So it does not look correct from the first glance, and I had to push myself from believing the defined values are not accidentally swapped. Granted, you can get used to this oddity, but it only takes a valuable space in a brain ;) -Aleksey. 
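The overflow discussed in the 8062870 thread touches two related pitfalls: a bitfield of a given width has a limited range (a signed two's complement field of width y tops out at 2^(y-1) - 1), and a shift such as 1 << 40 is evaluated in int arithmetic, which does not produce the intended 64-bit value where int is 32 bits wide. A hedged C++ sketch of computing masks and limits in 64-bit arithmetic is shown here. All helper names are hypothetical; low_bits_mask is only loosely modelled on the idea behind right_n_bits in globalDefinitions.hpp and is not its actual definition.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: a mask with the low n bits set, computed in 64-bit
// arithmetic so that n >= 32 does not overflow an int-sized shift.
inline std::uint64_t low_bits_mask(unsigned n) {
  assert(n <= 64);
  return n == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << n) - 1;
}

// Largest value an unsigned bitfield of 'width' bits can hold.
inline std::uint64_t max_unsigned_field(unsigned width) {
  return low_bits_mask(width);
}

// Largest value a signed (two's complement) bitfield of 'width' bits can
// hold: one bit is taken by the sign.
inline std::int64_t max_signed_field(unsigned width) {
  assert(width >= 1 && width <= 64);
  return static_cast<std::int64_t>(low_bits_mask(width - 1));
}

// True when 'index' can be stored in an unsigned field of 'width' bits.
inline bool fits_unsigned_field(std::uint64_t index, unsigned width) {
  return index <= max_unsigned_field(width);
}
```

A bound such as a maximum bucket length can then be derived from the field width (for example max_unsigned_field(8) for an 8-bit position index) instead of being written as a separate literal that may drift out of sync with the bitfield declaration.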
From aleksey.shipilev at oracle.com Tue Nov 11 09:26:01 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Tue, 11 Nov 2014 12:26:01 +0300 Subject: RFR (S) 8059677: Thread.getName() instantiates Strings In-Reply-To: <0A9ACBF1-F16F-444A-9CB4-5338C18F68E4@oracle.com> References: <545FB64F.7090705@oracle.com> <5460A6E8.9050506@oracle.com> <5460B608.4050909@oracle.com> <5460C354.5000605@oracle.com> <5460C960.9080509@oracle.com> <5460D1DA.4050907@oracle.com> <5460FF41.90208@oracle.com> <0A9ACBF1-F16F-444A-9CB4-5338C18F68E4@oracle.com> Message-ID: <5461D629.3010001@oracle.com> Thanks Staffan, your change is exactly what I (blindly) did in my updated webrev. I will get David's comments in, respin some tests and publish the update. -Aleksey. On 11.11.2014 11:03, Staffan Larsen wrote: > I was able to provoke the failure with a "jstack -F". I think this patch > solves the > problem: http://cr.openjdk.java.net/~sla/8059677-thread.name.sa.patch > . Feel > free to not include the changes in StackTrace.java if you don't want to > complicate your review. > > Thanks, > /Staffan > > >> On 10 nov 2014, at 19:09, Aleksey Shipilev wrote: >> >> On 10.11.2014 19:39, Staffan Larsen wrote: >>>> On 10 nov 2014, at 15:55, Aleksey Shipilev wrote: >>>> Ow, it seems very like it. >>>> So, what testlist have I missed to catch this? >>> >>> Probably vm.tmtools.testlist and/or nsk.sajdi.testlist. Just a >>> warning that these tests are far from stable. Sorry about that. >> >> Alas, both these testlists pass with current change without a hitch. >> That probably tells something about the test coverage. Any other ideas >> how to test for it? Maybe some manual way? >> >> Anyhow, there is a synonymous block in ThreadGroup handling, I can copy >> the relevant bits from there. Updated webrev follows soon. Still need to >> test if that change is safe. >> >> -Aleksey.
>> > From aleksey.shipilev at oracle.com Tue Nov 11 09:38:46 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Tue, 11 Nov 2014 12:38:46 +0300 Subject: RFR (S) 8059677: Thread.getName() instantiates Strings In-Reply-To: <5461B593.1000104@oracle.com> References: <545FB64F.7090705@oracle.com> <5460A6E8.9050506@oracle.com> <5460B608.4050909@oracle.com> <5460C354.5000605@oracle.com> <5460C960.9080509@oracle.com> <5461B593.1000104@oracle.com> Message-ID: <5461D926.6010008@oracle.com> Hi David, Updated webrevs will follow after I respin the tests. Meanwhile, some comments below: On 11.11.2014 10:06, David Holmes wrote: > On 11/11/2014 12:19 AM, Aleksey Shipilev wrote: >> Then I speculated that having char[] name would help VM initialize the >> name if we wanted to switch to complete VM-side initialization of >> Thread, but it seems we can do String oop instantiation in the similar >> vein. > > I think it really just came down to accessing the Thread name from > things like JVMDI/PI (now JVM TI) - easier for C code to access a raw > char[]. Maybe once upon a time (in a land not so far away) we even > passed char[] to the Thread constructor? :) But having re-discovered > past discussions etc there's really nothing to stop this from being a > String (slight memory use increase per Thread object). Yes. char[] does appear simpler from the native side, if not that pesky Unicode requirement that forces use to use Unicode routines within the VM to deal with char[] exposed to the Java side. Not so much an improvement comparing to String oop dance. > JDK change is okay - but "name" doesn't need to be volatile when it is a > String reference. I understand the memory model reasoning about the correctness, but I think users rightfully expect getName() to return the last "updated" Thread.name, even though this requirement is not spelled out specifically. Therefore, I believe "volatile" should stay. 
(I would be violently disappointed about the JDK if I realized my logging is garbled and the same thread "appears" under several names back and forth within a short time window -- because of data race on Thread.name) > Hotspot side: > > src/share/vm/classfile/javaClasses.hpp > > This added assert seems overly cautious: > > 134 oop value = java_string->obj_field(value_offset); > 135 assert((value->is_typeArray() && > TypeArrayKlass::cast(value->klass())->element_type() == T_CHAR), "expect > char[]"); > > you are basically checking that String.value is defined as a char[]. If > warranted, this is a check needed once in the lifetime of a VM not every > time this method is called. (Yes I see we had something similarly odd in > java_lang_thread::name. :( ) Agreed. Dropped the assert from here. I think we already check this for String.name field when we pre-compute the value_offset. > --- > > src/share/vm/classfile/javaClasses.cpp > > ! oop java_lang_Thread::name(oop java_thread) { > oop name = java_thread->obj_field(_name_offset); > ! assert(name != NULL, "thread name is NULL"); > > I'm not confident this can never be called before the name has been set. > The original assertion allowed for NULL as does the JVM TI code. Agreed. Dropped the assert altogether. > --- > > src/share/vm/prims/jvmtiTrace.cpp > > Copyright year needs updating. :) Done. > --- > > Aside: I wonder if we've inadvertently fixed 6771287 now. :) That was a > fun one to debug. Ouch. -Aleksey. 
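For readers following the char[]-versus-String point in this thread, here is a minimal sketch of the two shapes being compared. It is not the JDK source: the class names ThreadNameSketch, CharArrayBacked and StringBacked are hypothetical, and the char[] variant copies the array with the public String constructor, whereas the thread says the real code used a private, non-copying one. It only illustrates that a char[]-backed getter hands out a new String object per call, while a String-backed getter returns the stored reference, with volatile kept so that a reader sees the most recent setName().

```java
// Hypothetical stand-in for the name handling discussed above; not java.lang.Thread itself.
final class ThreadNameSketch {

    /** Pre-change shape: the name lives in a char[] and every read builds a String. */
    static final class CharArrayBacked {
        private volatile char[] name;

        CharArrayBacked(String name) { this.name = name.toCharArray(); }

        void setName(String name) { this.name = name.toCharArray(); }

        // Allocates a new String object on every call (this sketch also copies the array).
        String getName() { return new String(name); }
    }

    /** Post-change shape: the name is a String reference, kept volatile. */
    static final class StringBacked {
        // volatile: a getName() that follows a setName() observes the new reference.
        private volatile String name;

        StringBacked(String name) { this.name = name; }

        void setName(String name) { this.name = name; }

        // No allocation: callers get the stored instance back.
        String getName() { return name; }
    }

    public static void main(String[] args) {
        CharArrayBacked oldStyle = new CharArrayBacked("worker-1");
        StringBacked newStyle = new StringBacked("worker-1");

        // Identity comparison is deliberate here: it shows whether a new object was created.
        System.out.println("char[]-backed returns same instance: "
                + (oldStyle.getName() == oldStyle.getName()));
        System.out.println("String-backed returns same instance: "
                + (newStyle.getName() == newStyle.getName()));
    }
}
```

The identity comparison in main is only there to make the per-call allocation visible; ordinary code would compare names with equals().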
From david.holmes at oracle.com Tue Nov 11 10:29:22 2014 From: david.holmes at oracle.com (David Holmes) Date: Tue, 11 Nov 2014 20:29:22 +1000 Subject: RFR (S) 8059677: Thread.getName() instantiates Strings In-Reply-To: <5461D926.6010008@oracle.com> References: <545FB64F.7090705@oracle.com> <5460A6E8.9050506@oracle.com> <5460B608.4050909@oracle.com> <5460C354.5000605@oracle.com> <5460C960.9080509@oracle.com> <5461B593.1000104@oracle.com> <5461D926.6010008@oracle.com> Message-ID: <5461E502.4070700@oracle.com> On 11/11/2014 7:38 PM, Aleksey Shipilev wrote: > Hi David, > > Updated webrevs will follow after I respin the tests. Meanwhile, some > comments below: > > On 11.11.2014 10:06, David Holmes wrote: >> On 11/11/2014 12:19 AM, Aleksey Shipilev wrote: >>> Then I speculated that having char[] name would help VM initialize the >>> name if we wanted to switch to complete VM-side initialization of >>> Thread, but it seems we can do String oop instantiation in the similar >>> vein. >> >> I think it really just came down to accessing the Thread name from >> things like JVMDI/PI (now JVM TI) - easier for C code to access a raw >> char[]. Maybe once upon a time (in a land not so far away) we even >> passed char[] to the Thread constructor? :) But having re-discovered >> past discussions etc there's really nothing to stop this from being a >> String (slight memory use increase per Thread object). > > Yes. char[] does appear simpler from the native side, if not that pesky > Unicode requirement that forces use to use Unicode routines within the > VM to deal with char[] exposed to the Java side. Not so much an > improvement comparing to String oop dance. > > >> JDK change is okay - but "name" doesn't need to be volatile when it is a >> String reference. > > I understand the memory model reasoning about the correctness, but I > think users rightfully expect getName() to return the last "updated" > Thread.name, even though this requirement is not spelled out > specifically. 
Therefore, I believe "volatile" should stay. > > (I would be violently disappointed about the JDK if I realized my > logging is garbled and the same thread "appears" under several names > back and forth within a short time window -- because of data race on > Thread.name) Yes - silly of me. I was thinking the name is only set once but of course it can be set many times. Cheers, David ------ >> Hotspot side: >> >> src/share/vm/classfile/javaClasses.hpp >> >> This added assert seems overly cautious: >> >> 134 oop value = java_string->obj_field(value_offset); >> 135 assert((value->is_typeArray() && >> TypeArrayKlass::cast(value->klass())->element_type() == T_CHAR), "expect >> char[]"); >> >> you are basically checking that String.value is defined as a char[]. If >> warranted, this is a check needed once in the lifetime of a VM not every >> time this method is called. (Yes I see we had something similarly odd in >> java_lang_thread::name. :( ) > > Agreed. Dropped the assert from here. I think we already check this for > String.name field when we pre-compute the value_offset. > > >> --- >> >> src/share/vm/classfile/javaClasses.cpp >> >> ! oop java_lang_Thread::name(oop java_thread) { >> oop name = java_thread->obj_field(_name_offset); >> ! assert(name != NULL, "thread name is NULL"); >> >> I'm not confident this can never be called before the name has been set. >> The original assertion allowed for NULL as does the JVM TI code. > > Agreed. Dropped the assert altogether. > > >> --- >> >> src/share/vm/prims/jvmtiTrace.cpp >> >> Copyright year needs updating. :) > > Done. > > >> --- >> >> Aside: I wonder if we've inadvertently fixed 6771287 now. :) That was a >> fun one to debug. > > Ouch. > > > -Aleksey. 
> > > From aleksey.shipilev at oracle.com Tue Nov 11 11:38:20 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Tue, 11 Nov 2014 14:38:20 +0300 Subject: RFR (XS) JDK-8015272: Make @Contended within the same group to use the same oop map In-Reply-To: <54615A43.10700@oracle.com> References: <525AC628.4020906@oracle.com> <009001cec83e$5e9c6b40$1bd541c0$@oracle.com> <525B0A18.8000105@oracle.com> <545B70F6.60801@oracle.com> <7029DBDA-BBE6-420A-BE6E-2EA28B60B560@oracle.com> <545B9CC0.3080106@oracle.com> <545F8CFA.80809@oracle.com> <54615A43.10700@oracle.com> Message-ID: <5461F52C.6020002@oracle.com> Thanks for review, Coleen! On 11.11.2014 03:37, Coleen Phillimore wrote: > Hi, I think this code looks correct. Was there a test in the test > system that exercises this code? I think it would be hard to test with > a dedicated test but was there one already in the test sets? Yes, there are @Contended tests in vmtestbase that exercise the @Contended placed over different fields. I added the targeted test that also does walk through new code. There is nothing to check there, except for the native assert in the new code. > Secondly, could you use the word adjacent in the comments, reuse oopmap > for adjacent oops in the class or something like that? That would have > saved me some jotting down on notebook. Sure, see the update. In previous change, I blindly copied the block already available for non-@Contended oops. I remember the oop maps code was tripping me over, this is why we have an explanation all the way on the top how oop maps are supposed to work. > I'll sponsor it if you get another reviewer. Here's the updated webrev: http://cr.openjdk.java.net/~shade/8015272/webrev.02/ I have only tested in builds on Linux x86_64/fastdebug, and passes runtime/contended jtregs. There were no changes in product code since last webrev, only in comments. Thanks, -Aleksey. 
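The "reuse oopmap for adjacent oops" idea mentioned above can be pictured with a small sketch. This is not the HotSpot code from the webrev (which is C++): the class OopMapSketch, its Block type, the addOop helper and the 4-byte OOP_SIZE are all assumptions made for illustration. It models an oop map block as an (offset, count) pair describing a run of consecutive reference fields, and extends the last block when the next reference field starts exactly where that run ends, instead of opening a new block.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only; names and the field size are hypothetical, not HotSpot's.
final class OopMapSketch {
    static final int OOP_SIZE = 4; // assumed size of one reference field, in bytes

    /** A run of `count` consecutive reference fields starting at byte `offset`. */
    static final class Block {
        final int offset;
        int count;

        Block(int offset, int count) {
            this.offset = offset;
            this.count = count;
        }

        int end() { return offset + count * OOP_SIZE; }
    }

    /** Records a reference field; offsets are expected in increasing order. */
    static void addOop(List<Block> blocks, int offset) {
        if (!blocks.isEmpty()) {
            Block last = blocks.get(blocks.size() - 1);
            if (last.end() == offset) {
                last.count++; // adjacent to the previous run: reuse that block
                return;
            }
        }
        blocks.add(new Block(offset, 1)); // not adjacent: start a new block
    }

    public static void main(String[] args) {
        List<Block> blocks = new ArrayList<>();
        // Three adjacent reference fields, then one placed far away (e.g. after padding).
        for (int offset : new int[] {16, 20, 24, 152}) {
            addOop(blocks, offset);
        }
        for (Block b : blocks) {
            System.out.println("block offset=" + b.offset + " count=" + b.count);
        }
    }
}
```

With padding between groups, fields from different groups are never adjacent, so in this model each group's references collapse into one block while separate groups stay in separate blocks.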
From david.holmes at oracle.com Tue Nov 11 12:01:18 2014 From: david.holmes at oracle.com (David Holmes) Date: Tue, 11 Nov 2014 22:01:18 +1000 Subject: RFR (XS) JDK-8015272: Make @Contended within the same group to use the same oop map In-Reply-To: <5461F52C.6020002@oracle.com> References: <525AC628.4020906@oracle.com> <009001cec83e$5e9c6b40$1bd541c0$@oracle.com> <525B0A18.8000105@oracle.com> <545B70F6.60801@oracle.com> <7029DBDA-BBE6-420A-BE6E-2EA28B60B560@oracle.com> <545B9CC0.3080106@oracle.com> <545F8CFA.80809@oracle.com> <54615A43.10700@oracle.com> <5461F52C.6020002@oracle.com> Message-ID: <5461FA8E.2000301@oracle.com> On 11/11/2014 9:38 PM, Aleksey Shipilev wrote: > Thanks for review, Coleen! > > On 11.11.2014 03:37, Coleen Phillimore wrote: >> Hi, I think this code looks correct. Was there a test in the test >> system that exercises this code? I think it would be hard to test with >> a dedicated test but was there one already in the test sets? > > Yes, there are @Contended tests in vmtestbase that exercise the > @Contended placed over different fields. I added the targeted test that > also does walk through new code. There is nothing to check there, except > for the native assert in the new code. > >> Secondly, could you use the word adjacent in the comments, reuse oopmap >> for adjacent oops in the class or something like that? That would have >> saved me some jotting down on notebook. > > Sure, see the update. In previous change, I blindly copied the block > already available for non-@Contended oops. I remember the oop maps code > was tripping me over, this is why we have an explanation all the way on > the top how oop maps are supposed to work. > >> I'll sponsor it if you get another reviewer. I'll add my Review. Changes seem okay. 
Looks like the style-Police didn't pay enough attention to this section of code though as a lot of: if( XXX ) have crept in instead of: if (XXX) ;-) Cheers, David > Here's the updated webrev: > http://cr.openjdk.java.net/~shade/8015272/webrev.02/ > > I have only tested in builds on Linux x86_64/fastdebug, and passes > runtime/contended jtregs. There were no changes in product code since > last webrev, only in comments. > > Thanks, > -Aleksey. > From aleksey.shipilev at oracle.com Tue Nov 11 12:04:47 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Tue, 11 Nov 2014 15:04:47 +0300 Subject: RFR (XS) JDK-8015272: Make @Contended within the same group to use the same oop map In-Reply-To: <5461FA8E.2000301@oracle.com> References: <525AC628.4020906@oracle.com> <009001cec83e$5e9c6b40$1bd541c0$@oracle.com> <525B0A18.8000105@oracle.com> <545B70F6.60801@oracle.com> <7029DBDA-BBE6-420A-BE6E-2EA28B60B560@oracle.com> <545B9CC0.3080106@oracle.com> <545F8CFA.80809@oracle.com> <54615A43.10700@oracle.com> <5461F52C.6020002@oracle.com> <5461FA8E.2000301@oracle.com> Message-ID: <5461FB5F.6060406@oracle.com> On 11.11.2014 15:01, David Holmes wrote: >>> I'll sponsor it if you get another reviewer. > > I'll add my Review. Changes seem okay. Thanks! > Looks like the style-Police didn't pay enough attention to this section > of code though as a lot of: > > if( XXX ) > > have crept in instead of: > > if (XXX) > > ;-) Yes, but unfortunately, that is consistent with the code style in method. If we are to change that, we should probably need to change the style consistently in the entire classLoader.cpp. -Aleksey. 
From aleksey.shipilev at oracle.com Tue Nov 11 12:09:07 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Tue, 11 Nov 2014 15:09:07 +0300 Subject: RFR (XS) JDK-8015272: Make @Contended within the same group to use the same oop map In-Reply-To: <5461FB5F.6060406@oracle.com> References: <525AC628.4020906@oracle.com> <009001cec83e$5e9c6b40$1bd541c0$@oracle.com> <525B0A18.8000105@oracle.com> <545B70F6.60801@oracle.com> <7029DBDA-BBE6-420A-BE6E-2EA28B60B560@oracle.com> <545B9CC0.3080106@oracle.com> <545F8CFA.80809@oracle.com> <54615A43.10700@oracle.com> <5461F52C.6020002@oracle.com> <5461FA8E.2000301@oracle.com> <5461FB5F.6060406@oracle.com> Message-ID: <5461FC63.6030001@oracle.com> On 11.11.2014 15:04, Aleksey Shipilev wrote: > On 11.11.2014 15:01, David Holmes wrote: >>>> I'll sponsor it if you get another reviewer. >> >> I'll add my Review. Changes seem okay. > > Thanks! Changeset: http://cr.openjdk.java.net/~shade/8015272/8015272.changeset -Aleksey From gunter.haug at sap.com Tue Nov 11 13:23:00 2014 From: gunter.haug at sap.com (Haug, Gunter) Date: Tue, 11 Nov 2014 13:23:00 +0000 Subject: RFR(XS): 8064471: Port 8013895: G1: G1SummarizeRSetStats output on Linux needs improvement to AIX Message-ID: <44EB1DA1040EEB44B7F343A782AD30789FEBA3E3@DEWDFEMB11B.global.corp.sap> Hi All, The change '8013895: (G1: G1SummarizeRSetStats output on Linux needs improvement)' makes use of getrusage() to retrieve accurate per-thread data on resource usage. We can use exactly the same code on AIX to achieve this. 
Please review the following change: http://cr.openjdk.java.net/~goetz/webrevs/8064471-aixTime/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8064471 Thanks, Gunter From volker.simonis at gmail.com Tue Nov 11 13:27:48 2014 From: volker.simonis at gmail.com (Volker Simonis) Date: Tue, 11 Nov 2014 14:27:48 +0100 Subject: RFR(XS): 8064471: Port 8013895: G1: G1SummarizeRSetStats output on Linux needs improvement to AIX In-Reply-To: <44EB1DA1040EEB44B7F343A782AD30789FEBA3E3@DEWDFEMB11B.global.corp.sap> References: <44EB1DA1040EEB44B7F343A782AD30789FEBA3E3@DEWDFEMB11B.global.corp.sap> Message-ID: Hi Gunter, I think the change looks good. I can also sponsor the change if we get one more review. This is a good possibility to test if we are really able to push changes to the ppc/aix directories of the hotspot repositories. Thank you and best regards, Volker On Tue, Nov 11, 2014 at 2:23 PM, Haug, Gunter wrote: > Hi All, > > The change '8013895: (G1: G1SummarizeRSetStats output on Linux needs improvement)' makes use of getrusage() to retrieve accurate per-thread data on resource usage. We can use exactly the same code on AIX to achieve this. > > Please review the following change: > > http://cr.openjdk.java.net/~goetz/webrevs/8064471-aixTime/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8064471 > > Thanks, > Gunter > From daniel.daugherty at oracle.com Tue Nov 11 13:55:50 2014 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Tue, 11 Nov 2014 06:55:50 -0700 Subject: RFR(S) Contended Locking fast enter bucket (8061553) In-Reply-To: <5461C282.1020806@oracle.com> References: <5452C0B4.4070601@oracle.com> <5457084B.6070808@oracle.com> <5458330E.1080207@oracle.com> <54591A3A.1090005@oracle.com> <545C2BC0.3080207@oracle.com> <5461C282.1020806@oracle.com> Message-ID: <54621566.9040805@oracle.com> On 11/11/14 1:02 AM, David Holmes wrote: > On 7/11/2014 12:17 PM, Daniel D. 
Daugherty wrote: >> The fix for JDK-8062851 has been reviewed, tested and pushed to >> RT_Baseline. >> >> Time to get back to this review thread so here's an updated webrev: >> >> http://cr.openjdk.java.net/~dcubed/8061553-webrev/1-jdk9-hs-rt/ >> >> David H., I believe I've addressed all of your comments. Please >> let me know if I missed something... > > Looks good to me - thanks Dan! Thanks for the re-review! Dan > > David > ----- > >> Thanks, in advance, for any comments, questions or suggestions. >> >> Dan >> >> >> On 11/4/14 11:26 AM, Daniel D. Daugherty wrote: >>> The cleanup is turning into a bigger change than the fast enter >>> bucket itself so I'm spinning the cleanup into a new bug: >>> >>> JDK-8062851 cleanup ObjectMonitor offset adjustments >>> https://bugs.openjdk.java.net/browse/JDK-8062851 >>> >>> Yes, this means that the Contended Locking cleanup bucket has reopened >>> for yet another change... >>> >>> We'll get back to "fast enter" after the dust has settled... >>> >>> Dan >>> >>> >>> On 11/3/14 6:59 PM, Daniel D. Daugherty wrote: >>>> David, >>>> >>>> Thanks for the review! As usual, replies are embedded below... >>>> >>>> >>>> On 11/2/14 9:44 PM, David Holmes wrote: >>>>> Hi Dan, >>>>> >>>>> Looks good. >>>> >>>> Thanks! >>>> >>>> >>>>> Couple of nits and one semantic query below ... >>>>> >>>>> src/cpu/sparc/vm/macroAssembler_sparc.cpp >>>>> >>>>> Formatting changes were a bit of a distraction. >>>> >>>> Yes, I have no idea what got into me. Normally I do formatting >>>> changes separately so the noise does not distract... 
>>>> >>>> It turns out there is a constant defined that should be used >>>> instead of all these literal '2's: >>>> >>>> src/share/vm/oops/markOop.hpp: monitor_value = 2 >>>> >>>> Typically used as follows: >>>> >>>> src/cpu/x86/vm/macroAssembler_x86.cpp: int owner_offset = >>>> ObjectMonitor::owner_offset_in_bytes() - markOopDesc::monitor_value; >>>> >>>> I will clean this up just for the files that I'm touching as >>>> part of this fix. >>>> >>>> >>>>> >>>>> --- >>>>> >>>>> src/cpu/x86/vm/macroAssembler_x86.cpp >>>>> >>>>> Formatting changes were a bit of a distraction. >>>> >>>> Same reply as for macroAssembler_sparc.cpp. >>>> >>>> >>>>> 1929 // unconditionally set stackBox->_displaced_header = 3 >>>>> 1930 movptr(Address(boxReg, 0), >>>>> (int32_t)intptr_t(markOopDesc::unused_mark())); >>>>> >>>>> At 1870 we refer to box rather than stackBox. Also it takes some >>>>> sleuthing to realize that "3" here is somehow a pseudonym for >>>>> unused_mark(). Back up at 1808 we have a to-do: >>>>> >>>>> 1808 // use markOop::unused_mark() instead of "3". >>>>> >>>>> so the current change seems to be implementing that, even though >>>>> other uses of "3" are left untouched. >>>> >>>> I'll take a look at cleaning those up also... >>>> >>>> In some cases markOopDesc::marked_value will work for the literal '3', >>>> but in other cases we'll use markOop::unused_mark(): >>>> >>>> static markOop unused_mark() { >>>> return (markOop) marked_value; >>>> } >>>> >>>> to save us the noise of the (markOop) cast. >>>> >>>> >>>>> --- >>>>> >>>>> src/share/vm/runtime/sharedRuntime.cpp >>>>> >>>>> 1794 JRT_BLOCK_ENTRY(void, >>>>> SharedRuntime::complete_monitor_locking_C(oopDesc* _obj, BasicLock* >>>>> lock, JavaThread* thread)) >>>>> 1795 if (!SafepointSynchronize::is_synchronizing()) { >>>>> 1796 if (ObjectSynchronizer::quick_enter(_obj, thread, lock)) >>>>> return; >>>>> >>>>> Is it necessary to check is_synchronizing? 
If we are executing this >>>>> code we are not at a safepoint and the quick_enter wont change that, >>>>> so I'm not sure what we are guarding against. >>>> >>>> So this first state checker: >>>> >>>> src/share/vm/runtime/safepoint.hpp: >>>> inline static bool is_synchronizing() { return _state == >>>> _synchronizing; } >>>> >>>> means that we want to go to a safepoint and: >>>> >>>> inline static bool is_at_safepoint() { return _state == >>>> _synchronized; } >>>> >>>> means that we are at a safepoint. Dice's optimization bails out if >>>> we want to go to a safepoint and ObjectSynchronizer::quick_enter() >>>> has a "No_Safepoint_Verifier nsv" in it so we're expecting that >>>> code to be quick (and not go to a safepoint). I'm not seeing >>>> anything obvious.... >>>> >>>> Sometimes we have to be careful with JavaThread suspend requests and >>>> monitor acquisition, but I don't think that's a problem here... In >>>> order for the "suspend requesting" thread to be surprised, the suspend >>>> API, e.g., JVM/TI SuspendThread() has to return to the caller and then >>>> the suspend target has do something unexpected like acquire a monitor >>>> that it was previously blocked upon when it was suspended. We've had >>>> bugs like that in the past... In this optimization case, our target >>>> thread is not blocked on a contended monitor... >>>> >>>> In this particular case, the "suspend requesting" thread will set the >>>> suspend request state on the target thread, but the target thread is >>>> busy trying to enter this uncontended monitor (quickly). So the >>>> "suspend requesting" thread, will request a no-op safepoint, but it >>>> won't return from the suspend API until that safepoint completes. >>>> The safepoint won't complete until the target thread is done acquiring >>>> the previously uncontended monitor... 
so the target thread will be >>>> suspended while holding the previous uncontended monitor and the >>>> "suspend requesting" thread will return from the suspend API all >>>> happy... >>>> >>>> Well, I don't see the reason either so I'll have to ping Dave Dice >>>> and Karen Kinnear to see if either of them can fill in the history >>>> here. This could be an abundance of caution case. >>>> >>>> >>>>> --- >>>>> >>>>> src/share/vm/runtime/synchronizer.cpp >>>>> >>>>> Minor nit: line 153 the usual acronym is NPE (for >>>>> NullPointerException) not NPX >>>> >>>> I'll do a search for uses of NPX and other uses of 'X' in exception >>>> acronyms... >>>> >>>> >>>>> >>>>> Nit: 159 Thread * const ox >>>>> >>>>> Please change ox to owner. >>>> >>>> Will do. >>>> >>>> Thanks again for the review! >>>> >>>> Dan >>>> >>>> >>>>> >>>>> --- >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>> >>>>> >>>>> On 31/10/2014 8:50 AM, Daniel D. Daugherty wrote: >>>>>> Greetings, >>>>>> >>>>>> I have the Contended Locking fast enter bucket ready for review. >>>>>> >>>>>> The code changes in this bucket are primarily a quick_enter() >>>>>> function that works on inflated but uncontended Java monitors. >>>>>> This quick_enter() function is used on the "slow path" for Java >>>>>> Monitor enter operations when the built-in "fast path" (read >>>>>> assembly code) doesn't work. 
>>>>>> >>>>>> This work is being tracked by the following bug ID: >>>>>> >>>>>> JDK-8061553 Contended Locking fast enter bucket >>>>>> https://bugs.openjdk.java.net/browse/JDK-8061553 >>>>>> >>>>>> Here is the webrev URL: >>>>>> >>>>>> http://cr.openjdk.java.net/~dcubed/8061553-webrev/0-jdk9-hs-rt/ >>>>>> >>>>>> Here is the JEP link: >>>>>> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8046133 >>>>>> >>>>>> 8061553 summary of changes: >>>>>> >>>>>> macroAssembler_sparc.cpp: MacroAssembler::compiler_lock_object() >>>>>> >>>>>> - clean up spacing around some >>>>>> 'ObjectMonitor::owner_offset_in_bytes() - 2' uses >>>>>> - remove optional (EmitSync & 64) code >>>>>> - change from cmp() to andcc() so icc.zf flag is set >>>>>> >>>>>> macroAssembler_x86.cpp: MacroAssembler::fast_lock() >>>>>> >>>>>> - remove optional (EmitSync & 2) code >>>>>> - rewrite LP64 inflated lock code that tries to CAS in >>>>>> the new owner value to be more efficient >>>>>> >>>>>> interfaceSupport.hpp: >>>>>> >>>>>> - add JRT_BLOCK_NO_ASYNC to permit splitting a >>>>>> JRT_BLOCK_ENTRY into two pieces. 
>>>>>> >>>>>> sharedRuntime.cpp: SharedRuntime::complete_monitor_locking_C() >>>>>> >>>>>> - change entry type from JRT_ENTRY_NO_ASYNC to JRT_BLOCK_ENTRY >>>>>> to permit ObjectSynchronizer::quick_enter() call >>>>>> - add JRT_BLOCK_NO_ASYNC use if the quick_enter() doesn't work >>>>>> to revert to JRT_ENTRY_NO_ASYNC-like semantics >>>>>> >>>>>> synchronizer.[ch]pp: >>>>>> >>>>>> - add ObjectSynchronizer::quick_enter() for entering an >>>>>> inflated but unowned Java monitor without thread state >>>>>> changes >>>>>> >>>>>> Testing: >>>>>> >>>>>> - Aurora Adhoc RT/SVC baseline batch >>>>>> - JPRT test jobs >>>>>> - MonitorEnterStresser micro-benchmark (in process) >>>>>> - CallTimerGrid stress testing (in process) >>>>>> - Aurora performance testing: >>>>>> - out of the box for the "promotion" and 32-bit server configs >>>>>> - heavy weight monitors for the "promotion" and 32-bit server >>>>>> configs >>>>>> (-XX:-UseBiasedLocking -XX:+UseHeavyMonitors) >>>>>> (in process) >>>>>> >>>>>> >>>>>> Thanks, in advance, for any comments, questions or suggestions. >>>>>> >>>>>> Dan >>>> >>>> >>> >>> >>> >> From aleksey.shipilev at oracle.com Tue Nov 11 14:40:58 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Tue, 11 Nov 2014 17:40:58 +0300 Subject: RFR (S) 8059677: Thread.getName() instantiates Strings In-Reply-To: <545FB64F.7090705@oracle.com> References: <545FB64F.7090705@oracle.com> Message-ID: <54621FFA.2070503@oracle.com> Hi, On 11/09/2014 09:45 PM, Aleksey Shipilev wrote: > Thread.getName() returns String, and does new String instantiation every > time, because the thread name is stored in char[]. Even though we use a > private String constructor that shares the char[] array without copying > it, this still hurts some use cases (think extra-fast logging). To the > extent some people actually maintain Map to avoid it. 
> https://bugs.openjdk.java.net/browse/JDK-8059677 > > Here's the attempt to maintain String instead of char[]: > http://cr.openjdk.java.net/~shade/8059677/webrev.01.jdk/ > http://cr.openjdk.java.net/~shade/8059677/webrev.01.hs/ Updated webrevs: http://cr.openjdk.java.net/~shade/8059677/webrev.02.jdk/ http://cr.openjdk.java.net/~shade/8059677/webrev.02.hs/ This version incorporates feedbacks from Chris, Staffan and David. I think it is very close to what we would like to push. Opinions? Testing: JPRT, jdk/test/java/lang/Thread jtreg, hotspot/test/runtime/ jtreg, vm.quick.testlist, nsk.jvmti.testlist, svc.quick.testlist, vm.tmtools.testlist Thanks, -Aleksey. From coleen.phillimore at oracle.com Tue Nov 11 14:59:12 2014 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Tue, 11 Nov 2014 09:59:12 -0500 Subject: RFR: 8062870: src/share/vm/services/mallocTracker.hpp:64 assert(_count > 0) failed: Negative ,counter In-Reply-To: <5461D271.90204@oracle.com> References: <5460F402.4060507@oracle.com> <1FA4267A-C001-48A5-9C04-1DA54890C3F1@oracle.com> <5461744E.2000304@oracle.com> <5461D271.90204@oracle.com> Message-ID: <54622440.6070607@oracle.com> On 11/11/14, 4:10 AM, Aleksey Shipilev wrote: > On 11.11.2014 05:28, Coleen Phillimore wrote: >> I've made the change to use right_n_bits and run the NMT tests >> (including the one that crashed in a loop for a while). >> >> open webrev at http://cr.openjdk.java.net/~coleenp/8062870_3/ > Looks good. > > What confused me initially is a conceptual impedance: _pos_idx is > bounded by MAX_BUCKET_LENGTH, and _bucket_idx is bounded by > MAX_MALLOCSITE_TABLE_SIZE. Notice the mention of "bucket" in both cases. > So it does not look correct from the first glance, and I had to push > myself from believing the defined values are not accidentally swapped. > Granted, you can get used to this oddity, but it only takes a valuable > space in a brain ;) Hi Aleksey, Thanks for the code review. 
The MAX_BUCKET_LENGTH concept is used in the MallocSiteTable::lookup_or_add function when we add things to the hash table for updating counters later. It makes sense in that context. If I were changing more things in this code, I might be likely to make changes to save brain cells. We have to backport these changes to 8u though. Thanks! Coleen > > -Aleksey. > From coleen.phillimore at oracle.com Tue Nov 11 15:12:20 2014 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Tue, 11 Nov 2014 10:12:20 -0500 Subject: RFR (XS) JDK-8015272: Make @Contended within the same group to use the same oop map In-Reply-To: <5461FB5F.6060406@oracle.com> References: <525AC628.4020906@oracle.com> <009001cec83e$5e9c6b40$1bd541c0$@oracle.com> <525B0A18.8000105@oracle.com> <545B70F6.60801@oracle.com> <7029DBDA-BBE6-420A-BE6E-2EA28B60B560@oracle.com> <545B9CC0.3080106@oracle.com> <545F8CFA.80809@oracle.com> <54615A43.10700@oracle.com> <5461F52C.6020002@oracle.com> <5461FA8E.2000301@oracle.com> <5461FB5F.6060406@oracle.com> Message-ID: <54622754.5060205@oracle.com> On 11/11/14, 7:04 AM, Aleksey Shipilev wrote: > On 11.11.2014 15:01, David Holmes wrote: >>>> I'll sponsor it if you get another reviewer. >> I'll add my Review. Changes seem okay. > Thanks! > >> Looks like the style-Police didn't pay enough attention to this section >> of code though as a lot of: >> >> if( XXX ) >> >> have crept in instead of: >> >> if (XXX) >> >> ;-) > Yes, but unfortunately, that is consistent with the code style in > method. If we are to change that, we should probably need to change the > style consistently in the entire classLoader.cpp. I didn't notice that this was a copy. This code is badly in need of refactoring to remove all the duplication, someday. Coleen > > -Aleksey. 
> From karen.kinnear at oracle.com Tue Nov 11 15:36:54 2014 From: karen.kinnear at oracle.com (Karen Kinnear) Date: Tue, 11 Nov 2014 10:36:54 -0500 Subject: RFR (XS) JDK-8015272: Make @Contended within the same group to use the same oop map In-Reply-To: <545F8CFA.80809@oracle.com> References: <525AC628.4020906@oracle.com> <009001cec83e$5e9c6b40$1bd541c0$@oracle.com> <525B0A18.8000105@oracle.com> <545B70F6.60801@oracle.com> <7029DBDA-BBE6-420A-BE6E-2EA28B60B560@oracle.com> <545B9CC0.3080106@oracle.com> <545F8CFA.80809@oracle.com> Message-ID: Aleksey, Many thanks for the additional testing and checking that there was no need for platform-specific testing. thanks, Karen On Nov 9, 2014, at 10:49 AM, Aleksey Shipilev wrote: > Hi again, > > No changes in webrev: > http://cr.openjdk.java.net/~shade/8015272/webrev.01/ > > Please review and sponsor: > http://cr.openjdk.java.net/~shade/8015272/8015272.changeset > > As per Karen's request, more testing is done, ran the tests on my Linux > x86_64/fastdebug: > > On 11/06/2014 07:07 PM, Aleksey Shipilev wrote: >> On 11/06/2014 06:01 PM, Karen Kinnear wrote: >>> - e.g. what about the vmtestbase vm/runtime/contended tests (and yes, some tests should be removed from that testlist) > > vmtestbase vm/runtime/contended: no issues. > hotspot/test/runtime/ jtreg: no issues. > >>> - vmtestbase: vm.quick.testlist (required for runtime changes) > > vm.quick.testlist: no issues. > >>> - and since @Contended annotation is used in the JDK core libraries - the jtreg jdk tests? > > jdk/test/java/util/concurrent jtreg: no issues. > jdk/test/java/lang/Thread jtreg: no issues. > > > Thanks, > -Aleksey. > > From daniel.daugherty at oracle.com Tue Nov 11 15:40:32 2014 From: daniel.daugherty at oracle.com (Daniel D. 
Daugherty) Date: Tue, 11 Nov 2014 08:40:32 -0700 Subject: RFR(S) Solaris Full Debug Symbols (FDS) fix for 8033602 and 8034005 In-Reply-To: <5461CA5C.30409@oracle.com> References: <546151A9.1080100@oracle.com> <5461CA5C.30409@oracle.com> Message-ID: <54622DF0.5010800@oracle.com> Dmitry, Thanks for the quick review! Replies embedded below... On 11/11/14 1:35 AM, Dmitry Samersoff wrote: > Dan, > > 1. defs.make: > > It might be better to join objcopy version check and condition at ll.190 I looked at that... The seemingly natural place to put the version check is actually in the else branch on line 194... However, if the version check is bad, then you have to make a second check for a reset OBJCOPY value (along with indenting all the code another level or two). It just looked ugly... it seemed better to keep the version check separate from the other logic. > otherwise the user will have a wrong version warning and then misleading > message "no objcopy cmd found" However, part of that wrong version warning is this line: WARNING: ignoring above objcopy command. so in reality that "no objcopy cmd found" is just confirming that we are indeed ignoring the objcopy cmd that we found... > 2. Did you consider moving objcopy detection to configure? No because this fix has to be backported to JDK8u and JDK7 since we support FDS in those releases... Of course, the build-infra team is always welcome to use a new bug to evolve this code for JDK9 and newer. Again, thanks for the review! Dan > > > -Dmitry > > > On 2014-11-11 03:00, Daniel D. Daugherty wrote: >> Greetings, >> >> I have a Solaris Full Debug Symbols (FDS) fix ready for review. >> Yes, it is a small fix, but it is in Makefiles so feel free to >> run screaming from the room... :-) On the plus side the fix does >> delete two work around source files (Coleen would say that's a >> Good Thing (TM)!) 
>> >> The fix is to detect the version of GNU objcopy that is being >> used on the machine and only enable Full Debug Symbols when that >> version is 2.21.1 or newer. If you don't have the right version, >> then the build drops back to pre-FDS build configs with a message >> like this: >> >> WARNING: /usr/sfw/bin/gobjcopy --version info: >> WARNING: GNU objcopy 2.15 >> WARNING: an objcopy version of 2.21.1 or newer is needed to create valid >> .debuginfo files. >> WARNING: ignoring above objcopy command. >> WARNING: patch 149063-01 or newer contains the correct Solaris 10 SPARC >> version. >> WARNING: patch 149064-01 or newer contains the correct Solaris 10 X86 >> version. >> WARNING: Solaris 11 Update 1 contains the correct version. >> INFO: no objcopy cmd found so cannot create .debuginfo files. >> INFO: ENABLE_FULL_DEBUG_SYMBOLS=0 >> >> This work is being tracked by the following bug IDs: >> >> JDK-8033602 wrong stabs data in libjvm.debuginfo on JDK 8 - SPARC >> https://bugs.openjdk.java.net/browse/JDK-8033602 >> >> JDK-8034005 cannot debug in synchronizer.o or objectMonitor.o on >> Solaris X86 >> https://bugs.openjdk.java.net/browse/JDK-8034005 >> >> Here is the webrev URL: >> >> http://cr.openjdk.java.net/~dcubed/8033602-webrev/0-jdk9-hs-rt/ >> >> Testing: >> >> - JPRT test jobs to verify that the current JPRT Solaris hosts >> are happy >> - local builds on my Solaris 10 X86 machine to verify that the >> wrong version of GNU objcopy is caught >> >> Thanks, in advance, for any comments, questions or suggestions. >> >> Dan > From coleen.phillimore at oracle.com Tue Nov 11 15:59:30 2014 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Tue, 11 Nov 2014 10:59:30 -0500 Subject: RFR (S) 8059677: Thread.getName() instantiates Strings In-Reply-To: <54621FFA.2070503@oracle.com> References: <545FB64F.7090705@oracle.com> <54621FFA.2070503@oracle.com> Message-ID: <54623262.9090109@oracle.com> The Hotspot changes look straightforward and correct to me. 
thanks, Coleen On 11/11/14, 9:40 AM, Aleksey Shipilev wrote: > Hi, > > On 11/09/2014 09:45 PM, Aleksey Shipilev wrote: >> Thread.getName() returns String, and does new String instantiation every >> time, because the thread name is stored in char[]. Even though we use a >> private String constructor that shares the char[] array without copying >> it, this still hurts some use cases (think extra-fast logging). To the >> extent some people actually maintain Map to avoid it. >> https://bugs.openjdk.java.net/browse/JDK-8059677 >> >> Here's the attempt to maintain String instead of char[]: >> http://cr.openjdk.java.net/~shade/8059677/webrev.01.jdk/ >> http://cr.openjdk.java.net/~shade/8059677/webrev.01.hs/ > Updated webrevs: > http://cr.openjdk.java.net/~shade/8059677/webrev.02.jdk/ > http://cr.openjdk.java.net/~shade/8059677/webrev.02.hs/ > > This version incorporates feedbacks from Chris, Staffan and David. I > think it is very close to what we would like to push. Opinions? > > Testing: JPRT, jdk/test/java/lang/Thread jtreg, hotspot/test/runtime/ > jtreg, vm.quick.testlist, nsk.jvmti.testlist, svc.quick.testlist, > vm.tmtools.testlist > > Thanks, > -Aleksey. > > > > From aleksey.shipilev at oracle.com Tue Nov 11 16:04:17 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Tue, 11 Nov 2014 19:04:17 +0300 Subject: RFR (S) 8059677: Thread.getName() instantiates Strings In-Reply-To: <54623262.9090109@oracle.com> References: <545FB64F.7090705@oracle.com> <54621FFA.2070503@oracle.com> <54623262.9090109@oracle.com> Message-ID: <54623381.9080509@oracle.com> Thanks for review, Coleen! -Aleksey. On 11/11/2014 06:59 PM, Coleen Phillimore wrote: > > The Hotspot changes look straightforward and correct to me. > thanks, > Coleen > > On 11/11/14, 9:40 AM, Aleksey Shipilev wrote: >> Hi, >> >> On 11/09/2014 09:45 PM, Aleksey Shipilev wrote: >>> Thread.getName() returns String, and does new String instantiation every >>> time, because the thread name is stored in char[]. 
Even though we use a >>> private String constructor that shares the char[] array without copying >>> it, this still hurts some use cases (think extra-fast logging). To the >>> extent some people actually maintain Map to avoid it. >>> https://bugs.openjdk.java.net/browse/JDK-8059677 >>> >>> Here's the attempt to maintain String instead of char[]: >>> http://cr.openjdk.java.net/~shade/8059677/webrev.01.jdk/ >>> http://cr.openjdk.java.net/~shade/8059677/webrev.01.hs/ >> Updated webrevs: >> http://cr.openjdk.java.net/~shade/8059677/webrev.02.jdk/ >> http://cr.openjdk.java.net/~shade/8059677/webrev.02.hs/ >> >> This version incorporates feedbacks from Chris, Staffan and David. I >> think it is very close to what we would like to push. Opinions? >> >> Testing: JPRT, jdk/test/java/lang/Thread jtreg, hotspot/test/runtime/ >> jtreg, vm.quick.testlist, nsk.jvmti.testlist, svc.quick.testlist, >> vm.tmtools.testlist >> >> Thanks, >> -Aleksey. >> >> >> >> > From chris.hegarty at oracle.com Tue Nov 11 16:10:35 2014 From: chris.hegarty at oracle.com (Chris Hegarty) Date: Tue, 11 Nov 2014 16:10:35 +0000 Subject: RFR (S) 8059677: Thread.getName() instantiates Strings In-Reply-To: <54621FFA.2070503@oracle.com> References: <545FB64F.7090705@oracle.com> <54621FFA.2070503@oracle.com> Message-ID: On 11 Nov 2014, at 14:40, Aleksey Shipilev wrote: > Hi, > > On 11/09/2014 09:45 PM, Aleksey Shipilev wrote: >> Thread.getName() returns String, and does new String instantiation every >> time, because the thread name is stored in char[]. Even though we use a >> private String constructor that shares the char[] array without copying >> it, this still hurts some use cases (think extra-fast logging). To the >> extent some people actually maintain Map to avoid it. 
>> https://bugs.openjdk.java.net/browse/JDK-8059677 >> >> Here's the attempt to maintain String instead of char[]: >> http://cr.openjdk.java.net/~shade/8059677/webrev.01.jdk/ >> http://cr.openjdk.java.net/~shade/8059677/webrev.01.hs/ > > Updated webrevs: > http://cr.openjdk.java.net/~shade/8059677/webrev.02.jdk/ Looks good. > http://cr.openjdk.java.net/~shade/8059677/webrev.02.hs/ I skimmed this webrev, and it also looks fine to me. -Chris. > This version incorporates feedbacks from Chris, Staffan and David. I > think it is very close to what we would like to push. Opinions? > > Testing: JPRT, jdk/test/java/lang/Thread jtreg, hotspot/test/runtime/ > jtreg, vm.quick.testlist, nsk.jvmti.testlist, svc.quick.testlist, > vm.tmtools.testlist > > Thanks, > -Aleksey. > > > > From dmitry.samersoff at oracle.com Tue Nov 11 16:21:36 2014 From: dmitry.samersoff at oracle.com (Dmitry Samersoff) Date: Tue, 11 Nov 2014 19:21:36 +0300 Subject: RFR(S) Solaris Full Debug Symbols (FDS) fix for 8033602 and 8034005 In-Reply-To: <54622DF0.5010800@oracle.com> References: <546151A9.1080100@oracle.com> <5461CA5C.30409@oracle.com> <54622DF0.5010800@oracle.com> Message-ID: <54623790.4080103@oracle.com> Dan, Thank you for the explanation. The fix looks good for me. -Dmitry On 2014-11-11 18:40, Daniel D. Daugherty wrote: > Dmitry, > > Thanks for the quick review! > > Replies embedded below... > > > On 11/11/14 1:35 AM, Dmitry Samersoff wrote: >> Dan, >> >> 1. defs.make: >> >> It might be better to join obcopy version check and condition at ll.190 > > I looked at that... The seemingly natural place to put the version check > is actually in the else branch on line 194... However, if the version > check is bad, then you have to make a second check for a reset OBJCOPY > value (along with indenting all the code another level or two). > > It just looked ugly... it seemed better to keep the version check > separate from the other logic. 
> > >> otherwise the user will have a wrong version warning and then misleading >> message "no objcopy cmd found" > > However, part of that wrong version warning is this line: > > WARNING: ignoring above objcopy command. > > so in reality that "no objcopy cmd found" is just confirming > that we are indeed ignoring the objcopy cmd that we found... > > >> 2. Did you consider moving objcopy detection to configure? > > No because this fix has to be backported to JDK8u and JDK7 since > we support FDS in those releases... > > Of course, the build-infra team is always welcome to use a new > bug to evolve this code for JDK9 and newer. > > Again, thanks for the review! > > Dan > > >> >> >> -Dmitry >> >> >> On 2014-11-11 03:00, Daniel D. Daugherty wrote: >>> Greetings, >>> >>> I have a Solaris Full Debug Symbols (FDS) fix ready for review. >>> Yes, it is a small fix, but it is in Makefiles so feel free to >>> run screaming from the room... :-) On the plus side the fix does >>> delete two work around source files (Coleen would say that's a >>> Good Thing (TM)!) >>> >>> The fix is to detect the version of GNU objcopy that is being >>> used on the machine and only enable Full Debug Symbols when that >>> version is 2.21.1 or newer. If you don't have the right version, >>> then the build drops back to pre-FDS build configs with a message >>> like this: >>> >>> WARNING: /usr/sfw/bin/gobjcopy --version info: >>> WARNING: GNU objcopy 2.15 >>> WARNING: an objcopy version of 2.21.1 or newer is needed to create valid >>> .debuginfo files. >>> WARNING: ignoring above objcopy command. >>> WARNING: patch 149063-01 or newer contains the correct Solaris 10 SPARC >>> version. >>> WARNING: patch 149064-01 or newer contains the correct Solaris 10 X86 >>> version. >>> WARNING: Solaris 11 Update 1 contains the correct version. >>> INFO: no objcopy cmd found so cannot create .debuginfo files. 
>>> INFO: ENABLE_FULL_DEBUG_SYMBOLS=0 >>> >>> This work is being tracked by the following bug IDs: >>> >>> JDK-8033602 wrong stabs data in libjvm.debuginfo on JDK 8 - SPARC >>> https://bugs.openjdk.java.net/browse/JDK-8033602 >>> >>> JDK-8034005 cannot debug in synchronizer.o or objectMonitor.o on >>> Solaris X86 >>> https://bugs.openjdk.java.net/browse/JDK-8034005 >>> >>> Here is the webrev URL: >>> >>> http://cr.openjdk.java.net/~dcubed/8033602-webrev/0-jdk9-hs-rt/ >>> >>> Testing: >>> >>> - JPRT test jobs to verify that the current JPRT Solaris hosts >>> are happy >>> - local builds on my Solaris 10 X86 machine to verify that the >>> wrong version of GNU objcopy is caught >>> >>> Thanks, in advance, for any comments, questions or suggestions. >>> >>> Dan >> > -- Dmitry Samersoff Oracle Java development team, Saint Petersburg, Russia * I would love to change the world, but they won't give me the sources. From daniel.daugherty at oracle.com Tue Nov 11 17:25:42 2014 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Tue, 11 Nov 2014 10:25:42 -0700 Subject: RFR(S) Solaris Full Debug Symbols (FDS) fix for 8033602 and 8034005 In-Reply-To: <54623790.4080103@oracle.com> References: <546151A9.1080100@oracle.com> <5461CA5C.30409@oracle.com> <54622DF0.5010800@oracle.com> <54623790.4080103@oracle.com> Message-ID: <54624696.2090201@oracle.com> Thanks for closing the loop on this! Dan On 11/11/14 9:21 AM, Dmitry Samersoff wrote: > Dan, > > Thank you for the explanation. > > The fix looks good for me. > > -Dmitry > > On 2014-11-11 18:40, Daniel D. Daugherty wrote: >> Dmitry, >> >> Thanks for the quick review! >> >> Replies embedded below... >> >> >> On 11/11/14 1:35 AM, Dmitry Samersoff wrote: >>> Dan, >>> >>> 1. defs.make: >>> >>> It might be better to join obcopy version check and condition at ll.190 >> I looked at that... The seemingly natural place to put the version check >> is actually in the else branch on line 194... 
However, if the version >> check is bad, then you have to make a second check for a reset OBJCOPY >> value (along with indenting all the code another level or two). >> >> It just looked ugly... it seemed better to keep the version check >> separate from the other logic. >> >> >>> otherwise the user will have a wrong version warning and then misleading >>> message "no objcopy cmd found" >> However, part of that wrong version warning is this line: >> >> WARNING: ignoring above objcopy command. >> >> so in reality that "no objcopy cmd found" is just confirming >> that we are indeed ignoring the objcopy cmd that we found... >> >> >>> 2. Did you consider moving objcopy detection to configure? >> No because this fix has to be backported to JDK8u and JDK7 since >> we support FDS in those releases... >> >> Of course, the build-infra team is always welcome to use a new >> bug to evolve this code for JDK9 and newer. >> >> Again, thanks for the review! >> >> Dan >> >> >>> >>> -Dmitry >>> >>> >>> On 2014-11-11 03:00, Daniel D. Daugherty wrote: >>>> Greetings, >>>> >>>> I have a Solaris Full Debug Symbols (FDS) fix ready for review. >>>> Yes, it is a small fix, but it is in Makefiles so feel free to >>>> run screaming from the room... :-) On the plus side the fix does >>>> delete two work around source files (Coleen would say that's a >>>> Good Thing (TM)!) >>>> >>>> The fix is to detect the version of GNU objcopy that is being >>>> used on the machine and only enable Full Debug Symbols when that >>>> version is 2.21.1 or newer. If you don't have the right version, >>>> then the build drops back to pre-FDS build configs with a message >>>> like this: >>>> >>>> WARNING: /usr/sfw/bin/gobjcopy --version info: >>>> WARNING: GNU objcopy 2.15 >>>> WARNING: an objcopy version of 2.21.1 or newer is needed to create valid >>>> .debuginfo files. >>>> WARNING: ignoring above objcopy command. >>>> WARNING: patch 149063-01 or newer contains the correct Solaris 10 SPARC >>>> version. 
>>>> WARNING: patch 149064-01 or newer contains the correct Solaris 10 X86 >>>> version. >>>> WARNING: Solaris 11 Update 1 contains the correct version. >>>> INFO: no objcopy cmd found so cannot create .debuginfo files. >>>> INFO: ENABLE_FULL_DEBUG_SYMBOLS=0 >>>> >>>> This work is being tracked by the following bug IDs: >>>> >>>> JDK-8033602 wrong stabs data in libjvm.debuginfo on JDK 8 - SPARC >>>> https://bugs.openjdk.java.net/browse/JDK-8033602 >>>> >>>> JDK-8034005 cannot debug in synchronizer.o or objectMonitor.o on >>>> Solaris X86 >>>> https://bugs.openjdk.java.net/browse/JDK-8034005 >>>> >>>> Here is the webrev URL: >>>> >>>> http://cr.openjdk.java.net/~dcubed/8033602-webrev/0-jdk9-hs-rt/ >>>> >>>> Testing: >>>> >>>> - JPRT test jobs to verify that the current JPRT Solaris hosts >>>> are happy >>>> - local builds on my Solaris 10 X86 machine to verify that the >>>> wrong version of GNU objcopy is caught >>>> >>>> Thanks, in advance, for any comments, questions or suggestions. >>>> >>>> Dan > From lois.foltan at oracle.com Tue Nov 11 18:22:56 2014 From: lois.foltan at oracle.com (Lois Foltan) Date: Tue, 11 Nov 2014 13:22:56 -0500 Subject: RFR 8054008: Using -XX:-LazyBootClassLoader crashes with ACCESS_VIOLATION on Win 64bit In-Reply-To: <545C21E6.90709@oracle.com> References: <545C21E6.90709@oracle.com> Message-ID: <54625400.1000701@oracle.com> Hi Jiangli, Yes, this looks good, reviewed. Lois On 11/6/2014 8:35 PM, Jiangli Zhou wrote: > Hi, > > Please review the following changes that fix the crash with > -XX:-LazyBootClassLoader on windows x64 platforms (fastdebug only). > During VM initialization, current_stack_pointer() could be called > before the VM generates stub routines. The generated get_previous_sp > routine cannot be used during that time, use the estimated value for > the sp value instead. The x86 implementation is unaffected by the > change and always returns the estimated sp value as before. 
> > bug: https://bugs.openjdk.java.net/browse/JDK-8054008 > webrev: http://cr.openjdk.java.net/~jiangli/8054008/webrev/ > > Tested with JPRT and ExtBadJAR test. > > Background: > As part of the VM initialization, classLoader_init() calls ZIP_Open > from the zip library for processing the boot class path when > -XX:-LazyBootClassLoader is specified. The call path re-enters VM > before returning from the zip library call. Following is the backtrace > right before when the crash happens. The windows x64 version of > current_stack_pointer() uses generated stub routine get_previous_sp > (generated by generate_get_previous_sp()) to obtain the stack pointer > value. Since classLoader_init() happens before stubRoutines_init1() > and the stub routines are not generated at the time, the execution > jumps to address 0 (referenced by _get_previous_sp_entry which should > contain the address of the generated routine after > stubRoutines_init1()) when it's trying to call the stub routine and > crashes. 
> > > jvm.dll!os::current_stack_pointer() Line 468 C++ > jvm.dll!os::verify_stack_alignment() Line 638 + 0x5 bytes C++ > jvm.dll!JVM_NativePath(char * path) Line 691 C++ > zip.dll!000007feebc49de0() > [Frames below may be incorrect and/or missing, no symbols loaded > for zip.dll] > zip.dll!000007feebc4af1d() > zip.dll!000007feebc4b004() > jvm.dll!ClassLoader::create_class_path_entry(const char * path, > const stat * st, bool lazy, bool throw_exception, Thread * > __the_thread__) Line 666 + 0x13 bytes C++ > jvm.dll!ClassLoader::update_class_path_entry_list(const char * > path, bool check_for_duplicates, bool throw_exception) Line 763 + 0x2d > bytes C++ > jvm.dll!ClassLoader::setup_search_path(const char * class_path) > Line 630 C++ > jvm.dll!ClassLoader::setup_bootstrap_search_path() Line 594 C++ > jvm.dll!ClassLoader::initialize() Line 1237 C++ > jvm.dll!classLoader_init() Line 1291 C++ > jvm.dll!init_globals() Line 100 C++ > jvm.dll!Threads::create_vm(JavaVMInitArgs * args, bool * > canTryAgain) Line 3414 + 0x5 bytes C++ > jvm.dll!JNI_CreateJavaVM(JavaVM_ * * vm, void * * penv, void * > args) Line 5199 + 0x12 bytes C++ > java.exe!000000013f0520f6() > java.exe!000000013f05cb63() > java.exe!000000013f05cbf7() > kernel32.dll!0000000076ba59ed() > ntdll.dll!0000000076cdc541() > > Thanks, > Jiangli > From jiangli.zhou at oracle.com Tue Nov 11 18:27:43 2014 From: jiangli.zhou at oracle.com (Jiangli Zhou) Date: Tue, 11 Nov 2014 10:27:43 -0800 Subject: RFR 8054008: Using -XX:-LazyBootClassLoader crashes with ACCESS_VIOLATION on Win 64bit In-Reply-To: <54625400.1000701@oracle.com> References: <545C21E6.90709@oracle.com> <54625400.1000701@oracle.com> Message-ID: <5462551F.1010808@oracle.com> Thank you for the review, Lois! Jiangli On 11/11/2014 10:22 AM, Lois Foltan wrote: > Hi Jiangli, > Yes, this looks good, reviewed. 
> Lois > > On 11/6/2014 8:35 PM, Jiangli Zhou wrote: >> Hi, >> >> Please review the following changes that fix the crash with >> -XX:-LazyBootClassLoader on windows x64 platforms (fastdebug only). >> During VM initialization, current_stack_pointer() could be called >> before the VM generates stub routines. The generated get_previous_sp >> routine cannot be used during that time, use the estimated value for >> the sp value instead. The x86 implementation is unaffected by the >> change and always returns the estimated sp value as before. >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8054008 >> webrev: http://cr.openjdk.java.net/~jiangli/8054008/webrev/ >> >> Tested with JPRT and ExtBadJAR test. >> >> Background: >> As part of the VM initialization, classLoader_init() calls ZIP_Open >> from the zip library for processing the boot class path when >> -XX:-LazyBootClassLoader is specified. The call path re-enters VM >> before returning from the zip library call. Following is the >> backtrace right before when the crash happens. The windows x64 >> version of current_stack_pointer() uses generated stub routine >> get_previous_sp (generated by generate_get_previous_sp()) to obtain >> the stack pointer value. Since classLoader_init() happens before >> stubRoutines_init1() and the stub routines are not generated at the >> time, the execution jumps to address 0 (referenced by >> _get_previous_sp_entry which should contain the address of the >> generated routine after stubRoutines_init1()) when it's trying to >> call the stub routine and crashes. 
>> >> >> jvm.dll!os::current_stack_pointer() Line 468 C++ >> jvm.dll!os::verify_stack_alignment() Line 638 + 0x5 bytes C++ >> jvm.dll!JVM_NativePath(char * path) Line 691 C++ >> zip.dll!000007feebc49de0() >> [Frames below may be incorrect and/or missing, no symbols loaded >> for zip.dll] >> zip.dll!000007feebc4af1d() >> zip.dll!000007feebc4b004() >> jvm.dll!ClassLoader::create_class_path_entry(const char * path, >> const stat * st, bool lazy, bool throw_exception, Thread * >> __the_thread__) Line 666 + 0x13 bytes C++ >> jvm.dll!ClassLoader::update_class_path_entry_list(const char * >> path, bool check_for_duplicates, bool throw_exception) Line 763 + >> 0x2d bytes C++ >> jvm.dll!ClassLoader::setup_search_path(const char * class_path) >> Line 630 C++ >> jvm.dll!ClassLoader::setup_bootstrap_search_path() Line 594 C++ >> jvm.dll!ClassLoader::initialize() Line 1237 C++ >> jvm.dll!classLoader_init() Line 1291 C++ >> jvm.dll!init_globals() Line 100 C++ >> jvm.dll!Threads::create_vm(JavaVMInitArgs * args, bool * >> canTryAgain) Line 3414 + 0x5 bytes C++ >> jvm.dll!JNI_CreateJavaVM(JavaVM_ * * vm, void * * penv, void * >> args) Line 5199 + 0x12 bytes C++ >> java.exe!000000013f0520f6() >> java.exe!000000013f05cb63() >> java.exe!000000013f05cbf7() >> kernel32.dll!0000000076ba59ed() >> ntdll.dll!0000000076cdc541() >> >> Thanks, >> Jiangli >> > From jiangli.zhou at oracle.com Tue Nov 11 18:30:49 2014 From: jiangli.zhou at oracle.com (Jiangli Zhou) Date: Tue, 11 Nov 2014 10:30:49 -0800 Subject: RFR 8054008: Using -XX:-LazyBootClassLoader crashes with ACCESS_VIOLATION on Win 64bit In-Reply-To: <54625400.1000701@oracle.com> References: <545C21E6.90709@oracle.com> <54625400.1000701@oracle.com> Message-ID: <546255D9.6030600@oracle.com> Hi Lois, Actually there was an updated webrev based on Roland's feedback. Since you are replying to the original request, not sure if you reviewed the latest webrev. If not, here is the link: http://cr.openjdk.java.net/~jiangli/8054008/webrev.02/. 
Thanks, Jiangli On 11/11/2014 10:22 AM, Lois Foltan wrote: > Hi Jiangli, > Yes, this looks good, reviewed. > Lois > > On 11/6/2014 8:35 PM, Jiangli Zhou wrote: >> Hi, >> >> Please review the following changes that fix the crash with >> -XX:-LazyBootClassLoader on windows x64 platforms (fastdebug only). >> During VM initialization, current_stack_pointer() could be called >> before the VM generates stub routines. The generated get_previous_sp >> routine cannot be used during that time, use the estimated value for >> the sp value instead. The x86 implementation is unaffected by the >> change and always returns the estimated sp value as before. >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8054008 >> webrev: http://cr.openjdk.java.net/~jiangli/8054008/webrev/ >> >> Tested with JPRT and ExtBadJAR test. >> >> Background: >> As part of the VM initialization, classLoader_init() calls ZIP_Open >> from the zip library for processing the boot class path when >> -XX:-LazyBootClassLoader is specified. The call path re-enters VM >> before returning from the zip library call. Following is the >> backtrace right before when the crash happens. The windows x64 >> version of current_stack_pointer() uses generated stub routine >> get_previous_sp (generated by generate_get_previous_sp()) to obtain >> the stack pointer value. Since classLoader_init() happens before >> stubRoutines_init1() and the stub routines are not generated at the >> time, the execution jumps to address 0 (referenced by >> _get_previous_sp_entry which should contain the address of the >> generated routine after stubRoutines_init1()) when it's trying to >> call the stub routine and crashes. 
>> >> >> jvm.dll!os::current_stack_pointer() Line 468 C++ >> jvm.dll!os::verify_stack_alignment() Line 638 + 0x5 bytes C++ >> jvm.dll!JVM_NativePath(char * path) Line 691 C++ >> zip.dll!000007feebc49de0() >> [Frames below may be incorrect and/or missing, no symbols loaded >> for zip.dll] >> zip.dll!000007feebc4af1d() >> zip.dll!000007feebc4b004() >> jvm.dll!ClassLoader::create_class_path_entry(const char * path, >> const stat * st, bool lazy, bool throw_exception, Thread * >> __the_thread__) Line 666 + 0x13 bytes C++ >> jvm.dll!ClassLoader::update_class_path_entry_list(const char * >> path, bool check_for_duplicates, bool throw_exception) Line 763 + >> 0x2d bytes C++ >> jvm.dll!ClassLoader::setup_search_path(const char * class_path) >> Line 630 C++ >> jvm.dll!ClassLoader::setup_bootstrap_search_path() Line 594 C++ >> jvm.dll!ClassLoader::initialize() Line 1237 C++ >> jvm.dll!classLoader_init() Line 1291 C++ >> jvm.dll!init_globals() Line 100 C++ >> jvm.dll!Threads::create_vm(JavaVMInitArgs * args, bool * >> canTryAgain) Line 3414 + 0x5 bytes C++ >> jvm.dll!JNI_CreateJavaVM(JavaVM_ * * vm, void * * penv, void * >> args) Line 5199 + 0x12 bytes C++ >> java.exe!000000013f0520f6() >> java.exe!000000013f05cb63() >> java.exe!000000013f05cbf7() >> kernel32.dll!0000000076ba59ed() >> ntdll.dll!0000000076cdc541() >> >> Thanks, >> Jiangli >> > From lois.foltan at oracle.com Tue Nov 11 18:42:30 2014 From: lois.foltan at oracle.com (Lois Foltan) Date: Tue, 11 Nov 2014 13:42:30 -0500 Subject: RFR 8054008: Using -XX:-LazyBootClassLoader crashes with ACCESS_VIOLATION on Win 64bit In-Reply-To: <546255D9.6030600@oracle.com> References: <545C21E6.90709@oracle.com> <54625400.1000701@oracle.com> <546255D9.6030600@oracle.com> Message-ID: <54625896.9000005@oracle.com> On 11/11/2014 1:30 PM, Jiangli Zhou wrote: > Hi Lois, > > Actually there was an updated webrev based on Roland's feedback. Since > you are replying to the original request, not sure if you reviewed the > latest webrev. 
If not, here is the link: > http://cr.openjdk.java.net/~jiangli/8054008/webrev.02/. My apologies, I did review .02 but responded to your first RFR request. Looks fine. Lois > > Thanks, > Jiangli > > On 11/11/2014 10:22 AM, Lois Foltan wrote: >> Hi Jiangli, >> Yes, this looks good, reviewed. >> Lois >> >> On 11/6/2014 8:35 PM, Jiangli Zhou wrote: >>> Hi, >>> >>> Please review the following changes that fix the crash with >>> -XX:-LazyBootClassLoader on windows x64 platforms (fastdebug only). >>> During VM initialization, current_stack_pointer() could be called >>> before the VM generates stub routines. The generated get_previous_sp >>> routine cannot be used during that time, use the estimated value for >>> the sp value instead. The x86 implementation is unaffected by the >>> change and always returns the estimated sp value as before. >>> >>> bug: https://bugs.openjdk.java.net/browse/JDK-8054008 >>> webrev: http://cr.openjdk.java.net/~jiangli/8054008/webrev/ >>> >>> Tested with JPRT and ExtBadJAR test. >>> >>> Background: >>> As part of the VM initialization, classLoader_init() calls ZIP_Open >>> from the zip library for processing the boot class path when >>> -XX:-LazyBootClassLoader is specified. The call path re-enters VM >>> before returning from the zip library call. Following is the >>> backtrace right before when the crash happens. The windows x64 >>> version of current_stack_pointer() uses generated stub routine >>> get_previous_sp (generated by generate_get_previous_sp()) to obtain >>> the stack pointer value. Since classLoader_init() happens before >>> stubRoutines_init1() and the stub routines are not generated at the >>> time, the execution jumps to address 0 (referenced by >>> _get_previous_sp_entry which should contain the address of the >>> generated routine after stubRoutines_init1()) when it's trying to >>> call the stub routine and crashes. 
>>> >>> >>> jvm.dll!os::current_stack_pointer() Line 468 C++ >>> jvm.dll!os::verify_stack_alignment() Line 638 + 0x5 bytes C++ >>> jvm.dll!JVM_NativePath(char * path) Line 691 C++ >>> zip.dll!000007feebc49de0() >>> [Frames below may be incorrect and/or missing, no symbols >>> loaded for zip.dll] >>> zip.dll!000007feebc4af1d() >>> zip.dll!000007feebc4b004() >>> jvm.dll!ClassLoader::create_class_path_entry(const char * path, >>> const stat * st, bool lazy, bool throw_exception, Thread * >>> __the_thread__) Line 666 + 0x13 bytes C++ >>> jvm.dll!ClassLoader::update_class_path_entry_list(const char * >>> path, bool check_for_duplicates, bool throw_exception) Line 763 + >>> 0x2d bytes C++ >>> jvm.dll!ClassLoader::setup_search_path(const char * class_path) >>> Line 630 C++ >>> jvm.dll!ClassLoader::setup_bootstrap_search_path() Line 594 C++ >>> jvm.dll!ClassLoader::initialize() Line 1237 C++ >>> jvm.dll!classLoader_init() Line 1291 C++ >>> jvm.dll!init_globals() Line 100 C++ >>> jvm.dll!Threads::create_vm(JavaVMInitArgs * args, bool * >>> canTryAgain) Line 3414 + 0x5 bytes C++ >>> jvm.dll!JNI_CreateJavaVM(JavaVM_ * * vm, void * * penv, void * >>> args) Line 5199 + 0x12 bytes C++ >>> java.exe!000000013f0520f6() >>> java.exe!000000013f05cb63() >>> java.exe!000000013f05cbf7() >>> kernel32.dll!0000000076ba59ed() >>> ntdll.dll!0000000076cdc541() >>> >>> Thanks, >>> Jiangli >>> >> > From jiangli.zhou at oracle.com Tue Nov 11 18:44:10 2014 From: jiangli.zhou at oracle.com (Jiangli Zhou) Date: Tue, 11 Nov 2014 10:44:10 -0800 Subject: RFR 8054008: Using -XX:-LazyBootClassLoader crashes with ACCESS_VIOLATION on Win 64bit In-Reply-To: <54625896.9000005@oracle.com> References: <545C21E6.90709@oracle.com> <54625400.1000701@oracle.com> <546255D9.6030600@oracle.com> <54625896.9000005@oracle.com> Message-ID: <546258FA.1060404@oracle.com> Ok. Thank you for confirming that! 
Jiangli On 11/11/2014 10:42 AM, Lois Foltan wrote: > > On 11/11/2014 1:30 PM, Jiangli Zhou wrote: >> Hi Lois, >> >> Actually there was an updated webrev based on Roland's feedback. >> Since you are replying to the original request, not sure if you >> reviewed the latest webrev. If not, here is the link: >> http://cr.openjdk.java.net/~jiangli/8054008/webrev.02/. > > My apologies, I did review .02 but responded to your first RFR > request. Looks fine. > Lois > >> >> Thanks, >> Jiangli >> >> On 11/11/2014 10:22 AM, Lois Foltan wrote: >>> Hi Jiangli, >>> Yes, this looks good, reviewed. >>> Lois >>> >>> On 11/6/2014 8:35 PM, Jiangli Zhou wrote: >>>> Hi, >>>> >>>> Please review the following changes that fix the crash with >>>> -XX:-LazyBootClassLoader on windows x64 platforms (fastdebug only). >>>> During VM initialization, current_stack_pointer() could be called >>>> before the VM generates stub routines. The generated >>>> get_previous_sp routine cannot be used during that time, use the >>>> estimated value for the sp value instead. The x86 implementation is >>>> unaffected by the change and always returns the estimated sp value >>>> as before. >>>> >>>> bug: https://bugs.openjdk.java.net/browse/JDK-8054008 >>>> webrev: http://cr.openjdk.java.net/~jiangli/8054008/webrev/ >>>> >>>> Tested with JPRT and ExtBadJAR test. >>>> >>>> Background: >>>> As part of the VM initialization, classLoader_init() calls ZIP_Open >>>> from the zip library for processing the boot class path when >>>> -XX:-LazyBootClassLoader is specified. The call path re-enters VM >>>> before returning from the zip library call. Following is the >>>> backtrace right before when the crash happens. The windows x64 >>>> version of current_stack_pointer() uses generated stub routine >>>> get_previous_sp (generated by generate_get_previous_sp()) to obtain >>>> the stack pointer value. 
Since classLoader_init() happens before >>>> stubRoutines_init1() and the stub routines are not generated at the >>>> time, the execution jumps to address 0 (referenced by >>>> _get_previous_sp_entry which should contain the address of the >>>> generated routine after stubRoutines_init1()) when it's trying to >>>> call the stub routine and crashes. >>>> >>>> >>>> jvm.dll!os::current_stack_pointer() Line 468 C++ >>>> jvm.dll!os::verify_stack_alignment() Line 638 + 0x5 bytes C++ >>>> jvm.dll!JVM_NativePath(char * path) Line 691 C++ >>>> zip.dll!000007feebc49de0() >>>> [Frames below may be incorrect and/or missing, no symbols >>>> loaded for zip.dll] >>>> zip.dll!000007feebc4af1d() >>>> zip.dll!000007feebc4b004() >>>> jvm.dll!ClassLoader::create_class_path_entry(const char * >>>> path, const stat * st, bool lazy, bool throw_exception, Thread * >>>> __the_thread__) Line 666 + 0x13 bytes C++ >>>> jvm.dll!ClassLoader::update_class_path_entry_list(const char * >>>> path, bool check_for_duplicates, bool throw_exception) Line 763 + >>>> 0x2d bytes C++ >>>> jvm.dll!ClassLoader::setup_search_path(const char * >>>> class_path) Line 630 C++ >>>> jvm.dll!ClassLoader::setup_bootstrap_search_path() Line 594 C++ >>>> jvm.dll!ClassLoader::initialize() Line 1237 C++ >>>> jvm.dll!classLoader_init() Line 1291 C++ >>>> jvm.dll!init_globals() Line 100 C++ >>>> jvm.dll!Threads::create_vm(JavaVMInitArgs * args, bool * >>>> canTryAgain) Line 3414 + 0x5 bytes C++ >>>> jvm.dll!JNI_CreateJavaVM(JavaVM_ * * vm, void * * penv, void * >>>> args) Line 5199 + 0x12 bytes C++ >>>> java.exe!000000013f0520f6() >>>> java.exe!000000013f05cb63() >>>> java.exe!000000013f05cbf7() >>>> kernel32.dll!0000000076ba59ed() >>>> ntdll.dll!0000000076cdc541() >>>> >>>> Thanks, >>>> Jiangli >>>> >>> >> > From david.r.chase at oracle.com Tue Nov 11 18:58:30 2014 From: david.r.chase at oracle.com (David Chase) Date: Tue, 11 Nov 2014 13:58:30 -0500 Subject: [9] RFR(L) 8013267 : move MemberNameTable from native code to 
Java heap, use to intern MemberNames In-Reply-To: <545E31BA.3070500@gmail.com> References: <594FE416-D445-426E-B7A9-C75F012ADE16@oracle.com> <0BDCD5DD-CBFB-4A3D-9BAD-7F9912E9237C@oracle.com> <5453C230.8010709@oracle.com> <9747A3A3-B7F4-407A-8E00-1E647D9DC2D1@oracle.com> <1D195ED3-8BD2-46CB-9D70-29CF809D9F5A@oracle.com> <5456FB59.60905@oracle.com> <632A5C98-B386-4625-BE12-355241581955@oracle.com> <5457AA75.8090103@gmail.com> <5457E0F9.8090004@gmail.com> <5458A57C.4060208@gmail.com> <260D49F5-6380-4FC3-A900-6CD9AB3ED6F7@oracle.com> <5459034E.8070809@gmail.com> <39826508-110B-4FCE-9A58-8C3D1B9FC7DE@oracle.com> <545E31BA.3070500@gmail.com> Message-ID: <1D96462F-87D3-4794-A818-27DD2EE10046@oracle.com> On 2014-11-08, at 10:07 AM, Peter Levart wrote: > > Now let's take for example one of the MemberName.make() methods that return interned MemberNames: > > 206 public static MemberName make(Method m, boolean wantSpecial) { > 207 // Unreflected member names are resolved so intern them here. > 208 MemberName tmp0 = null; > 209 InternTransaction tx = new InternTransaction(m.getDeclaringClass()); > 210 while (tmp0 == null) { > 211 MemberName tmp = new MemberName(m, wantSpecial); > 212 tmp0 = tx.tryIntern(tmp); > 213 } > 214 return tmp0; > 215 } > > I'm trying to understand the workings of InternTransaction helper class (and find an example that breaks it). You create an instance of it, passing Method's declaringClass. You then (in retry loop) create a resolved MemberName from the Method and wantSpecial flag. This MemberName's clazz can apparently differ from Method's declaringClass. I don't know when and why this happens, but apparently it can (super method?), so in InternTransaction.tryIntern() you do... > > 363 if (member_name.isResolved()) { > 364 if (member_name.clazz != tx_class) { > 365 Class prev_tx_class = tx_class; > 366 int prev_txn_token = txn_token; > 367 tx_class = member_name.clazz; > 368 txn_token = internTxnToken(tx_class); > 369 // Zero is a special case. 
> 370 if (txn_token != 0 || > 371 prev_txn_token != internTxnToken(prev_tx_class)) { > 372 // Resolved class is different and at least one > 373 // redef of it occurred, therefore repeat with > 374 // proper class for race consistency checking. > 375 return null; > 376 } > 377 } > 378 member_name = member_name.intern(txn_token); > 379 if (member_name == null) { > 380 // Update the token for the next try. > 381 txn_token = internTxnToken(tx_class); > 382 } > 383 } > > > Now let's assume that the resolved member_name.clazz differs from Method's declaringClass. Let's assume also that either member_name.clazz has had at least one redefinition or Method's declaringClass has been redefined between creating InternTransaction and reading member_name.clazz's txn_token. You return 'null' in such case, concluding that not only the resolved member_name.clazz redefinition matters, but Method's declaringClass redefinition can also invalidate resolved MemberName am I right? It would be helpful if I could understand when and how Method's declaringClass redefinition can affect member_name. Can it affect which clazz is resolved for member_name? If a declaring class is redefined before a MemberName is "published" to the VM, then there is a risk that its secret fields will have gone stale because the referenced VM methods changed but were not updated. Therefore, the resolution must be retried to get a fresh resolution that is known not to be stale. There is sort of a glitch in the race-checking protocol; I don't have certain knowledge which class will be resolved, so if I guessed wrong (and the common-case no redefinition at all check fails) then I am forced to retry and get a fresh, known-good resolution. However, based on my understanding of what is (not) allowed in class redefinition, what differs after redefinition is only the code of the method, and not the owner; that is, if D.m resolved to B.m before redefinition of D, C, or B, then it will always resolve to B.m,
but the definition of B.m itself might have changed (from the test cases, it might print "foo" instead of "bar"). Or to put it differently, the methods change, but their hierarchy does not. > Anyway, you return null in such case from an updated InternTransaction (tx_class and txn_token are now updated to have values for resolved member_name.clazz). In next round the checks of newly constructed and resolved member_name are not performed against Method's declaringClass but against previous round's member_name.clazz. Is this what is intended? > I can see there has to be a stop condition for loop to end, but shouldn't checks for Method's declaringClass redefinition be performed in every iteration (in addition to the check for member_name.clazz redefinition if it differs from Method's declaringClass)? To the best of my understanding (see restrictions above) the tx_class ought to be wrong at most once; all subsequent resolutions including those that span a class redefinition should return the same class, so it suffices to detect redefinition of the method itself. I've incorporated your other changes (not yet the linear-scan hash table) and will be retesting. One thing I wonder about for both hash table and binary search is if the first try should be attempted with no lock to avoid the overhead of synchronization; I expect that looking will be more common than interning, which in turn will be (vastly) more common than class redefinition.
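(Illustration only, not code from the webrev.) A minimal sketch of what a lock-free first try could look like for the binary-search variant, assuming a copy-on-write sorted array: lookups search an immutable snapshot without locking, and only a miss falls back to a synchronized insert. The class name InternTable and its methods are hypothetical, and class redefinition is not modelled here at all.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not the MemberNameTable from the webrev. */
final class InternTable<T extends Comparable<T>> {
    // Sorted snapshot; never mutated after publication, so readers need no lock.
    private volatile Object[] elements = new Object[0];

    /** Lock-free lookup: binary search over an immutable snapshot. */
    @SuppressWarnings("unchecked")
    T find(T key) {
        Object[] snapshot = elements;                 // single volatile read
        int i = Arrays.binarySearch(snapshot, key);
        return i >= 0 ? (T) snapshot[i] : null;
    }

    /** Returns the canonical instance equal to key, adding key if absent. */
    @SuppressWarnings("unchecked")
    T intern(T key) {
        T found = find(key);                          // first try without the lock
        if (found != null) {
            return found;
        }
        synchronized (this) {
            Object[] old = elements;
            int i = Arrays.binarySearch(old, key);    // re-check under the lock
            if (i >= 0) {
                return (T) old[i];
            }
            int at = -(i + 1);                        // insertion point
            Object[] grown = new Object[old.length + 1];
            System.arraycopy(old, 0, grown, 0, at);
            grown[at] = key;
            System.arraycopy(old, at, grown, at + 1, old.length - at);
            elements = grown;                         // publish the new snapshot
            return key;
        }
    }
}
```

The trade-off in this sketch is that every insert copies the array, which only pays off if lookups really do dominate interning as expected above.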
David From peter.levart at gmail.com Tue Nov 11 19:30:10 2014 From: peter.levart at gmail.com (Peter Levart) Date: Tue, 11 Nov 2014 20:30:10 +0100 Subject: [9] RFR(L) 8013267 : move MemberNameTable from native code to Java heap, use to intern MemberNames In-Reply-To: <1D96462F-87D3-4794-A818-27DD2EE10046@oracle.com> References: <594FE416-D445-426E-B7A9-C75F012ADE16@oracle.com> <0BDCD5DD-CBFB-4A3D-9BAD-7F9912E9237C@oracle.com> <5453C230.8010709@oracle.com> <9747A3A3-B7F4-407A-8E00-1E647D9DC2D1@oracle.com> <1D195ED3-8BD2-46CB-9D70-29CF809D9F5A@oracle.com> <5456FB59.60905@oracle.com> <632A5C98-B386-4625-BE12-355241581955@oracle.com> <5457AA75.8090103@gmail.com> <5457E0F9.8090004@gmail.com> <5458A57C.4060208@gmail.com> <260D49F5-6380-4FC3-A900-6CD9AB3ED6F7@oracle.com> <5459034E.8070809@gmail.com> <39826508-110B-4FCE-9A58-8C3D1B9FC7DE@oracle.com> <545E31BA.3070500@gmail.com> <1D96462F-87D3-4794-A818-27DD2EE10046@oracle.com> Message-ID: <546263C2.1060508@gmail.com> On 11/11/2014 07:58 PM, David Chase wrote: > I've incorporated your other changes (not yet the linear-scan hash table) and will be retesting. > One thing I wonder about for both hash table and binary search is if the first try should be attempted with no lock to avoid the overhead of synchronization; I expect that looking will be more common than interning, which in turn will be (vastly) more common than class redefinition. Hi David, Yes, that's why I implemented the hash table in a way where lookups are lock-free. Binary-search would be trickier to implement without locking, but maybe not impossible. Surely not with Arrays.binarySearch() but perhaps with a separate implementation. The problem with Arrays.binarySearch is that it returns an index. By the time you retrieve the element at that index, it can already move. I'm also not sure that "careful" concurrent insertion of new element would not break the correctness of binary search. But there is another way I showed before: using StampedLock.
It is a kind of optimistic/pessimistic read-write lock. Its beauty is in that optimistic read part is almost free (just a volatile read at start and a readFence followed by another volatile read at the end). You just have to be sure that the algorithm guarded by an optimistic read lock terminates normally (that it doesn't spin in an endless loop or throw exceptions) in the presence of arbitrary concurrent modifications of looked-up state. Well, binary search is such an algorithm. Regards, Peter > David From aleksey.shipilev at oracle.com Tue Nov 11 19:35:24 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Tue, 11 Nov 2014 22:35:24 +0300 Subject: RFR (S) 8059677: Thread.getName() instantiates Strings In-Reply-To: References: <545FB64F.7090705@oracle.com> <54621FFA.2070503@oracle.com> Message-ID: <546264FC.2060001@oracle.com> On 11/11/2014 07:10 PM, Chris Hegarty wrote: > On 11 Nov 2014, at 14:40, Aleksey Shipilev wrote: >> Updated webrevs: >> http://cr.openjdk.java.net/~shade/8059677/webrev.02.jdk/ > > Looks good. > >> http://cr.openjdk.java.net/~shade/8059677/webrev.02.hs/ > > I skimmed this webrev, and it also looks fine to me. > > -Chris. Thanks Chris! -Aleksey. From karen.kinnear at oracle.com Tue Nov 11 19:35:45 2014 From: karen.kinnear at oracle.com (Karen Kinnear) Date: Tue, 11 Nov 2014 14:35:45 -0500 Subject: RFR(S) Contended Locking fast enter bucket (8061553) In-Reply-To: <54621566.9040805@oracle.com> References: <5452C0B4.4070601@oracle.com> <5457084B.6070808@oracle.com> <5458330E.1080207@oracle.com> <54591A3A.1090005@oracle.com> <545C2BC0.3080207@oracle.com> <5461C282.1020806@oracle.com> <54621566.9040805@oracle.com> Message-ID: <01FABD3E-D846-48F9-BDF9-F1AD3CA01090@oracle.com> Dan, Code looks good. I like your choices of changes to pick up. Couple of minor questions/comments: 1. synchronizer.cpp: What does TLE stand for? 2. 
in macrosAssembler_x86.cpp - mind keeping the comment about // Without cat to int32_t a movptr will destroy R10 which is typically obj thanks, Karen p.s. I've forgotten - is the fast_notify in a different bucket? On Nov 11, 2014, at 8:55 AM, Daniel D. Daugherty wrote: > On 11/11/14 1:02 AM, David Holmes wrote: >> On 7/11/2014 12:17 PM, Daniel D. Daugherty wrote: >>> The fix for JDK-8062851 has been reviewed, tested and pushed to >>> RT_Baseline. >>> >>> Time to get back to this review thread so here's an updated webrev: >>> >>> http://cr.openjdk.java.net/~dcubed/8061553-webrev/1-jdk9-hs-rt/ >>> >>> David H., I believe I've addressed all of your comments. Please >>> let me know if I missed something... >> >> Looks good to me - thanks Dan! > > Thanks for the re-review! > > Dan > > >> >> David >> ----- >> >>> Thanks, in advance, for any comments, questions or suggestions. >>> >>> Dan >>> >>> >>> On 11/4/14 11:26 AM, Daniel D. Daugherty wrote: >>>> The cleanup is turning into a bigger change than the fast enter >>>> bucket itself so I'm spinning the cleanup into a new bug: >>>> >>>> JDK-8062851 cleanup ObjectMonitor offset adjustments >>>> https://bugs.openjdk.java.net/browse/JDK-8062851 >>>> >>>> Yes, this means that the Contended Locking cleanup bucket has reopened >>>> for yet another change... >>>> >>>> We'll get back to "fast enter" after the dust has settled... >>>> >>>> Dan >>>> >>>> >>>> On 11/3/14 6:59 PM, Daniel D. Daugherty wrote: >>>>> David, >>>>> >>>>> Thanks for the review! As usual, replies are embedded below... >>>>> >>>>> >>>>> On 11/2/14 9:44 PM, David Holmes wrote: >>>>>> Hi Dan, >>>>>> >>>>>> Looks good. >>>>> >>>>> Thanks! >>>>> >>>>> >>>>>> Couple of nits and one semantic query below ... >>>>>> >>>>>> src/cpu/sparc/vm/macroAssembler_sparc.cpp >>>>>> >>>>>> Formatting changes were a bit of a distraction. >>>>> >>>>> Yes, I have no idea what got into me. Normally I do formatting >>>>> changes separately so the noise does not distract... 
>>>>> >>>>> It turns out there is a constant defined that should be used >>>>> instead of all these literal '2's: >>>>> >>>>> src/share/vm/oops/markOop.hpp: monitor_value = 2 >>>>> >>>>> Typically used as follows: >>>>> >>>>> src/cpu/x86/vm/macroAssembler_x86.cpp: int owner_offset = >>>>> ObjectMonitor::owner_offset_in_bytes() - markOopDesc::monitor_value; >>>>> >>>>> I will clean this up just for the files that I'm touching as >>>>> part of this fix. >>>>> >>>>> >>>>>> >>>>>> --- >>>>>> >>>>>> src/cpu/x86/vm/macroAssembler_x86.cpp >>>>>> >>>>>> Formatting changes were a bit of a distraction. >>>>> >>>>> Same reply as for macroAssembler_sparc.cpp. >>>>> >>>>> >>>>>> 1929 // unconditionally set stackBox->_displaced_header = 3 >>>>>> 1930 movptr(Address(boxReg, 0), >>>>>> (int32_t)intptr_t(markOopDesc::unused_mark())); >>>>>> >>>>>> At 1870 we refer to box rather than stackBox. Also it takes some >>>>>> sleuthing to realize that "3" here is somehow a pseudonym for >>>>>> unused_mark(). Back up at 1808 we have a to-do: >>>>>> >>>>>> 1808 // use markOop::unused_mark() instead of "3". >>>>>> >>>>>> so the current change seems to be implementing that, even though >>>>>> other uses of "3" are left untouched. >>>>> >>>>> I'll take a look at cleaning those up also... >>>>> >>>>> In some cases markOopDesc::marked_value will work for the literal '3', >>>>> but in other cases we'll use markOop::unused_mark(): >>>>> >>>>> static markOop unused_mark() { >>>>> return (markOop) marked_value; >>>>> } >>>>> >>>>> to save us the noise of the (markOop) cast. >>>>> >>>>> >>>>>> --- >>>>>> >>>>>> src/share/vm/runtime/sharedRuntime.cpp >>>>>> >>>>>> 1794 JRT_BLOCK_ENTRY(void, >>>>>> SharedRuntime::complete_monitor_locking_C(oopDesc* _obj, BasicLock* >>>>>> lock, JavaThread* thread)) >>>>>> 1795 if (!SafepointSynchronize::is_synchronizing()) { >>>>>> 1796 if (ObjectSynchronizer::quick_enter(_obj, thread, lock)) >>>>>> return; >>>>>> >>>>>> Is it necessary to check is_synchronizing? 
If we are executing this >>>>>> code we are not at a safepoint and the quick_enter wont change that, >>>>>> so I'm not sure what we are guarding against. >>>>> >>>>> So this first state checker: >>>>> >>>>> src/share/vm/runtime/safepoint.hpp: >>>>> inline static bool is_synchronizing() { return _state == >>>>> _synchronizing; } >>>>> >>>>> means that we want to go to a safepoint and: >>>>> >>>>> inline static bool is_at_safepoint() { return _state == >>>>> _synchronized; } >>>>> >>>>> means that we are at a safepoint. Dice's optimization bails out if >>>>> we want to go to a safepoint and ObjectSynchronizer::quick_enter() >>>>> has a "No_Safepoint_Verifier nsv" in it so we're expecting that >>>>> code to be quick (and not go to a safepoint). I'm not seeing >>>>> anything obvious.... >>>>> >>>>> Sometimes we have to be careful with JavaThread suspend requests and >>>>> monitor acquisition, but I don't think that's a problem here... In >>>>> order for the "suspend requesting" thread to be surprised, the suspend >>>>> API, e.g., JVM/TI SuspendThread() has to return to the caller and then >>>>> the suspend target has do something unexpected like acquire a monitor >>>>> that it was previously blocked upon when it was suspended. We've had >>>>> bugs like that in the past... In this optimization case, our target >>>>> thread is not blocked on a contended monitor... >>>>> >>>>> In this particular case, the "suspend requesting" thread will set the >>>>> suspend request state on the target thread, but the target thread is >>>>> busy trying to enter this uncontended monitor (quickly). So the >>>>> "suspend requesting" thread, will request a no-op safepoint, but it >>>>> won't return from the suspend API until that safepoint completes. >>>>> The safepoint won't complete until the target thread is done acquiring >>>>> the previously uncontended monitor... 
so the target thread will be >>>>> suspended while holding the previous uncontended monitor and the >>>>> "suspend requesting" thread will return from the suspend API all >>>>> happy... >>>>> >>>>> Well, I don't see the reason either so I'll have to ping Dave Dice >>>>> and Karen Kinnear to see if either of them can fill in the history >>>>> here. This could be an abundance of caution case. >>>>> >>>>> >>>>>> --- >>>>>> >>>>>> src/share/vm/runtime/synchronizer.cpp >>>>>> >>>>>> Minor nit: line 153 the usual acronym is NPE (for >>>>>> NullPointerException) not NPX >>>>> >>>>> I'll do a search for uses of NPX and other uses of 'X' in exception >>>>> acronyms... >>>>> >>>>> >>>>>> >>>>>> Nit: 159 Thread * const ox >>>>>> >>>>>> Please change ox to owner. >>>>> >>>>> Will do. >>>>> >>>>> Thanks again for the review! >>>>> >>>>> Dan >>>>> >>>>> >>>>>> >>>>>> --- >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>> >>>>>> >>>>>> On 31/10/2014 8:50 AM, Daniel D. Daugherty wrote: >>>>>>> Greetings, >>>>>>> >>>>>>> I have the Contended Locking fast enter bucket ready for review. >>>>>>> >>>>>>> The code changes in this bucket are primarily a quick_enter() >>>>>>> function that works on inflated but uncontended Java monitors. >>>>>>> This quick_enter() function is used on the "slow path" for Java >>>>>>> Monitor enter operations when the built-in "fast path" (read >>>>>>> assembly code) doesn't work. 
>>>>>>> >>>>>>> This work is being tracked by the following bug ID: >>>>>>> >>>>>>> JDK-8061553 Contended Locking fast enter bucket >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8061553 >>>>>>> >>>>>>> Here is the webrev URL: >>>>>>> >>>>>>> http://cr.openjdk.java.net/~dcubed/8061553-webrev/0-jdk9-hs-rt/ >>>>>>> >>>>>>> Here is the JEP link: >>>>>>> >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8046133 >>>>>>> >>>>>>> 8061553 summary of changes: >>>>>>> >>>>>>> macroAssembler_sparc.cpp: MacroAssembler::compiler_lock_object() >>>>>>> >>>>>>> - clean up spacing around some >>>>>>> 'ObjectMonitor::owner_offset_in_bytes() - 2' uses >>>>>>> - remove optional (EmitSync & 64) code >>>>>>> - change from cmp() to andcc() so icc.zf flag is set >>>>>>> >>>>>>> macroAssembler_x86.cpp: MacroAssembler::fast_lock() >>>>>>> >>>>>>> - remove optional (EmitSync & 2) code >>>>>>> - rewrite LP64 inflated lock code that tries to CAS in >>>>>>> the new owner value to be more efficient >>>>>>> >>>>>>> interfaceSupport.hpp: >>>>>>> >>>>>>> - add JRT_BLOCK_NO_ASYNC to permit splitting a >>>>>>> JRT_BLOCK_ENTRY into two pieces. 
>>>>>>> >>>>>>> sharedRuntime.cpp: SharedRuntime::complete_monitor_locking_C() >>>>>>> >>>>>>> - change entry type from JRT_ENTRY_NO_ASYNC to JRT_BLOCK_ENTRY >>>>>>> to permit ObjectSynchronizer::quick_enter() call >>>>>>> - add JRT_BLOCK_NO_ASYNC use if the quick_enter() doesn't work >>>>>>> to revert to JRT_ENTRY_NO_ASYNC-like semantics >>>>>>> >>>>>>> synchronizer.[ch]pp: >>>>>>> >>>>>>> - add ObjectSynchronizer::quick_enter() for entering an >>>>>>> inflated but unowned Java monitor without thread state >>>>>>> changes >>>>>>> >>>>>>> Testing: >>>>>>> >>>>>>> - Aurora Adhoc RT/SVC baseline batch >>>>>>> - JPRT test jobs >>>>>>> - MonitorEnterStresser micro-benchmark (in process) >>>>>>> - CallTimerGrid stress testing (in process) >>>>>>> - Aurora performance testing: >>>>>>> - out of the box for the "promotion" and 32-bit server configs >>>>>>> - heavy weight monitors for the "promotion" and 32-bit server >>>>>>> configs >>>>>>> (-XX:-UseBiasedLocking -XX:+UseHeavyMonitors) >>>>>>> (in process) >>>>>>> >>>>>>> >>>>>>> Thanks, in advance, for any comments, questions or suggestions. >>>>>>> >>>>>>> Dan >>>>> >>>>> >>>> >>>> >>>> >>> > From staffan.larsen at oracle.com Tue Nov 11 20:38:15 2014 From: staffan.larsen at oracle.com (Staffan Larsen) Date: Tue, 11 Nov 2014 21:38:15 +0100 Subject: RFR (S) 8059677: Thread.getName() instantiates Strings In-Reply-To: <54621FFA.2070503@oracle.com> References: <545FB64F.7090705@oracle.com> <54621FFA.2070503@oracle.com> Message-ID: SA changes look good. /Staffan > On 11 nov 2014, at 15:40, Aleksey Shipilev wrote: > > Hi, > > On 11/09/2014 09:45 PM, Aleksey Shipilev wrote: >> Thread.getName() returns String, and does new String instantiation every >> time, because the thread name is stored in char[]. Even though we use a >> private String constructor that shares the char[] array without copying >> it, this still hurts some use cases (think extra-fast logging). To the >> extent some people actually maintain Map to avoid it. 
>> https://bugs.openjdk.java.net/browse/JDK-8059677 >> >> Here's the attempt to maintain String instead of char[]: >> http://cr.openjdk.java.net/~shade/8059677/webrev.01.jdk/ >> http://cr.openjdk.java.net/~shade/8059677/webrev.01.hs/ > > Updated webrevs: > http://cr.openjdk.java.net/~shade/8059677/webrev.02.jdk/ > http://cr.openjdk.java.net/~shade/8059677/webrev.02.hs/ > > This version incorporates feedbacks from Chris, Staffan and David. I > think it is very close to what we would like to push. Opinions? > > Testing: JPRT, jdk/test/java/lang/Thread jtreg, hotspot/test/runtime/ > jtreg, vm.quick.testlist, nsk.jvmti.testlist, svc.quick.testlist, > vm.tmtools.testlist > > Thanks, > -Aleksey. > > > > From daniel.daugherty at oracle.com Tue Nov 11 21:23:06 2014 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Tue, 11 Nov 2014 14:23:06 -0700 Subject: RFR(S) Contended Locking fast enter bucket (8061553) In-Reply-To: <01FABD3E-D846-48F9-BDF9-F1AD3CA01090@oracle.com> References: <5452C0B4.4070601@oracle.com> <5457084B.6070808@oracle.com> <5458330E.1080207@oracle.com> <54591A3A.1090005@oracle.com> <545C2BC0.3080207@oracle.com> <5461C282.1020806@oracle.com> <54621566.9040805@oracle.com> <01FABD3E-D846-48F9-BDF9-F1AD3CA01090@oracle.com> Message-ID: <54627E3A.7050306@oracle.com> Thanks for the review! As usual, replies embedded below... On 11/11/14 12:35 PM, Karen Kinnear wrote: > Dan, > > Code looks good. Thanks! However, it is yours and Dice's code with a few tweaks from my brain... This bucket will also have a triple contributed by entry... > I like your choices of changes to pick up. Thanks! This bucket was fairly easy to sift/tease out... > Couple of minor questions/comments: > > 1. synchronizer.cpp: What does TLE stand for? Transactional Lock Elision is my guess. If Dice confirms, then I'll make sure the first use has it spelled out... Dave likes his TLAs! > 2. 
in macrosAssembler_x86.cpp - mind keeping the comment about // Without cat to int32_t a movptr will destroy R10 which is typically obj Yes, I kept looking at that and wondering why the comments was removed... I'll put it back... > thanks, > Karen > > p.s. I've forgotten - is the fast_notify in a different bucket? fast_enter is optimization #3, bucket #7 fast_exit is optimization #4, bucket #8 fast_notify is optimization #5, bucket #2 Dan > > On Nov 11, 2014, at 8:55 AM, Daniel D. Daugherty wrote: > >> On 11/11/14 1:02 AM, David Holmes wrote: >>> On 7/11/2014 12:17 PM, Daniel D. Daugherty wrote: >>>> The fix for JDK-8062851 has been reviewed, tested and pushed to >>>> RT_Baseline. >>>> >>>> Time to get back to this review thread so here's an updated webrev: >>>> >>>> http://cr.openjdk.java.net/~dcubed/8061553-webrev/1-jdk9-hs-rt/ >>>> >>>> David H., I believe I've addressed all of your comments. Please >>>> let me know if I missed something... >>> Looks good to me - thanks Dan! >> Thanks for the re-review! >> >> Dan >> >> >>> David >>> ----- >>> >>>> Thanks, in advance, for any comments, questions or suggestions. >>>> >>>> Dan >>>> >>>> >>>> On 11/4/14 11:26 AM, Daniel D. Daugherty wrote: >>>>> The cleanup is turning into a bigger change than the fast enter >>>>> bucket itself so I'm spinning the cleanup into a new bug: >>>>> >>>>> JDK-8062851 cleanup ObjectMonitor offset adjustments >>>>> https://bugs.openjdk.java.net/browse/JDK-8062851 >>>>> >>>>> Yes, this means that the Contended Locking cleanup bucket has reopened >>>>> for yet another change... >>>>> >>>>> We'll get back to "fast enter" after the dust has settled... >>>>> >>>>> Dan >>>>> >>>>> >>>>> On 11/3/14 6:59 PM, Daniel D. Daugherty wrote: >>>>>> David, >>>>>> >>>>>> Thanks for the review! As usual, replies are embedded below... >>>>>> >>>>>> >>>>>> On 11/2/14 9:44 PM, David Holmes wrote: >>>>>>> Hi Dan, >>>>>>> >>>>>>> Looks good. >>>>>> Thanks! 
>>>>>> >>>>>> >>>>>>> Couple of nits and one semantic query below ... >>>>>>> >>>>>>> src/cpu/sparc/vm/macroAssembler_sparc.cpp >>>>>>> >>>>>>> Formatting changes were a bit of a distraction. >>>>>> Yes, I have no idea what got into me. Normally I do formatting >>>>>> changes separately so the noise does not distract... >>>>>> >>>>>> It turns out there is a constant defined that should be used >>>>>> instead of all these literal '2's: >>>>>> >>>>>> src/share/vm/oops/markOop.hpp: monitor_value = 2 >>>>>> >>>>>> Typically used as follows: >>>>>> >>>>>> src/cpu/x86/vm/macroAssembler_x86.cpp: int owner_offset = >>>>>> ObjectMonitor::owner_offset_in_bytes() - markOopDesc::monitor_value; >>>>>> >>>>>> I will clean this up just for the files that I'm touching as >>>>>> part of this fix. >>>>>> >>>>>> >>>>>>> --- >>>>>>> >>>>>>> src/cpu/x86/vm/macroAssembler_x86.cpp >>>>>>> >>>>>>> Formatting changes were a bit of a distraction. >>>>>> Same reply as for macroAssembler_sparc.cpp. >>>>>> >>>>>> >>>>>>> 1929 // unconditionally set stackBox->_displaced_header = 3 >>>>>>> 1930 movptr(Address(boxReg, 0), >>>>>>> (int32_t)intptr_t(markOopDesc::unused_mark())); >>>>>>> >>>>>>> At 1870 we refer to box rather than stackBox. Also it takes some >>>>>>> sleuthing to realize that "3" here is somehow a pseudonym for >>>>>>> unused_mark(). Back up at 1808 we have a to-do: >>>>>>> >>>>>>> 1808 // use markOop::unused_mark() instead of "3". >>>>>>> >>>>>>> so the current change seems to be implementing that, even though >>>>>>> other uses of "3" are left untouched. >>>>>> I'll take a look at cleaning those up also... >>>>>> >>>>>> In some cases markOopDesc::marked_value will work for the literal '3', >>>>>> but in other cases we'll use markOop::unused_mark(): >>>>>> >>>>>> static markOop unused_mark() { >>>>>> return (markOop) marked_value; >>>>>> } >>>>>> >>>>>> to save us the noise of the (markOop) cast. 
>>>>>> >>>>>> >>>>>>> --- >>>>>>> >>>>>>> src/share/vm/runtime/sharedRuntime.cpp >>>>>>> >>>>>>> 1794 JRT_BLOCK_ENTRY(void, >>>>>>> SharedRuntime::complete_monitor_locking_C(oopDesc* _obj, BasicLock* >>>>>>> lock, JavaThread* thread)) >>>>>>> 1795 if (!SafepointSynchronize::is_synchronizing()) { >>>>>>> 1796 if (ObjectSynchronizer::quick_enter(_obj, thread, lock)) >>>>>>> return; >>>>>>> >>>>>>> Is it necessary to check is_synchronizing? If we are executing this >>>>>>> code we are not at a safepoint and the quick_enter wont change that, >>>>>>> so I'm not sure what we are guarding against. >>>>>> So this first state checker: >>>>>> >>>>>> src/share/vm/runtime/safepoint.hpp: >>>>>> inline static bool is_synchronizing() { return _state == >>>>>> _synchronizing; } >>>>>> >>>>>> means that we want to go to a safepoint and: >>>>>> >>>>>> inline static bool is_at_safepoint() { return _state == >>>>>> _synchronized; } >>>>>> >>>>>> means that we are at a safepoint. Dice's optimization bails out if >>>>>> we want to go to a safepoint and ObjectSynchronizer::quick_enter() >>>>>> has a "No_Safepoint_Verifier nsv" in it so we're expecting that >>>>>> code to be quick (and not go to a safepoint). I'm not seeing >>>>>> anything obvious.... >>>>>> >>>>>> Sometimes we have to be careful with JavaThread suspend requests and >>>>>> monitor acquisition, but I don't think that's a problem here... In >>>>>> order for the "suspend requesting" thread to be surprised, the suspend >>>>>> API, e.g., JVM/TI SuspendThread() has to return to the caller and then >>>>>> the suspend target has do something unexpected like acquire a monitor >>>>>> that it was previously blocked upon when it was suspended. We've had >>>>>> bugs like that in the past... In this optimization case, our target >>>>>> thread is not blocked on a contended monitor... 
>>>>>> >>>>>> In this particular case, the "suspend requesting" thread will set the >>>>>> suspend request state on the target thread, but the target thread is >>>>>> busy trying to enter this uncontended monitor (quickly). So the >>>>>> "suspend requesting" thread, will request a no-op safepoint, but it >>>>>> won't return from the suspend API until that safepoint completes. >>>>>> The safepoint won't complete until the target thread is done acquiring >>>>>> the previously uncontended monitor... so the target thread will be >>>>>> suspended while holding the previous uncontended monitor and the >>>>>> "suspend requesting" thread will return from the suspend API all >>>>>> happy... >>>>>> >>>>>> Well, I don't see the reason either so I'll have to ping Dave Dice >>>>>> and Karen Kinnear to see if either of them can fill in the history >>>>>> here. This could be an abundance of caution case. >>>>>> >>>>>> >>>>>>> --- >>>>>>> >>>>>>> src/share/vm/runtime/synchronizer.cpp >>>>>>> >>>>>>> Minor nit: line 153 the usual acronym is NPE (for >>>>>>> NullPointerException) not NPX >>>>>> I'll do a search for uses of NPX and other uses of 'X' in exception >>>>>> acronyms... >>>>>> >>>>>> >>>>>>> Nit: 159 Thread * const ox >>>>>>> >>>>>>> Please change ox to owner. >>>>>> Will do. >>>>>> >>>>>> Thanks again for the review! >>>>>> >>>>>> Dan >>>>>> >>>>>> >>>>>>> --- >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 31/10/2014 8:50 AM, Daniel D. Daugherty wrote: >>>>>>>> Greetings, >>>>>>>> >>>>>>>> I have the Contended Locking fast enter bucket ready for review. >>>>>>>> >>>>>>>> The code changes in this bucket are primarily a quick_enter() >>>>>>>> function that works on inflated but uncontended Java monitors. >>>>>>>> This quick_enter() function is used on the "slow path" for Java >>>>>>>> Monitor enter operations when the built-in "fast path" (read >>>>>>>> assembly code) doesn't work. 
>>>>>>>> >>>>>>>> This work is being tracked by the following bug ID: >>>>>>>> >>>>>>>> JDK-8061553 Contended Locking fast enter bucket >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8061553 >>>>>>>> >>>>>>>> Here is the webrev URL: >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~dcubed/8061553-webrev/0-jdk9-hs-rt/ >>>>>>>> >>>>>>>> Here is the JEP link: >>>>>>>> >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8046133 >>>>>>>> >>>>>>>> 8061553 summary of changes: >>>>>>>> >>>>>>>> macroAssembler_sparc.cpp: MacroAssembler::compiler_lock_object() >>>>>>>> >>>>>>>> - clean up spacing around some >>>>>>>> 'ObjectMonitor::owner_offset_in_bytes() - 2' uses >>>>>>>> - remove optional (EmitSync & 64) code >>>>>>>> - change from cmp() to andcc() so icc.zf flag is set >>>>>>>> >>>>>>>> macroAssembler_x86.cpp: MacroAssembler::fast_lock() >>>>>>>> >>>>>>>> - remove optional (EmitSync & 2) code >>>>>>>> - rewrite LP64 inflated lock code that tries to CAS in >>>>>>>> the new owner value to be more efficient >>>>>>>> >>>>>>>> interfaceSupport.hpp: >>>>>>>> >>>>>>>> - add JRT_BLOCK_NO_ASYNC to permit splitting a >>>>>>>> JRT_BLOCK_ENTRY into two pieces. 
>>>>>>>> >>>>>>>> sharedRuntime.cpp: SharedRuntime::complete_monitor_locking_C() >>>>>>>> >>>>>>>> - change entry type from JRT_ENTRY_NO_ASYNC to JRT_BLOCK_ENTRY >>>>>>>> to permit ObjectSynchronizer::quick_enter() call >>>>>>>> - add JRT_BLOCK_NO_ASYNC use if the quick_enter() doesn't work >>>>>>>> to revert to JRT_ENTRY_NO_ASYNC-like semantics >>>>>>>> >>>>>>>> synchronizer.[ch]pp: >>>>>>>> >>>>>>>> - add ObjectSynchronizer::quick_enter() for entering an >>>>>>>> inflated but unowned Java monitor without thread state >>>>>>>> changes >>>>>>>> >>>>>>>> Testing: >>>>>>>> >>>>>>>> - Aurora Adhoc RT/SVC baseline batch >>>>>>>> - JPRT test jobs >>>>>>>> - MonitorEnterStresser micro-benchmark (in process) >>>>>>>> - CallTimerGrid stress testing (in process) >>>>>>>> - Aurora performance testing: >>>>>>>> - out of the box for the "promotion" and 32-bit server configs >>>>>>>> - heavy weight monitors for the "promotion" and 32-bit server >>>>>>>> configs >>>>>>>> (-XX:-UseBiasedLocking -XX:+UseHeavyMonitors) >>>>>>>> (in process) >>>>>>>> >>>>>>>> >>>>>>>> Thanks, in advance, for any comments, questions or suggestions. >>>>>>>> >>>>>>>> Dan >>>>>> >>>>> >>>>> From serguei.spitsyn at oracle.com Tue Nov 11 22:04:29 2014 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 11 Nov 2014 14:04:29 -0800 Subject: RFR(S) Solaris Full Debug Symbols (FDS) fix for 8033602 and 8034005 In-Reply-To: <546151A9.1080100@oracle.com> References: <546151A9.1080100@oracle.com> Message-ID: <546287ED.9050708@oracle.com> Dan, The fix looks good. Nice cleanup from workarounds: Good Thing (TM)! :) Thanks, Serguei On 11/10/14 4:00 PM, Daniel D. Daugherty wrote: > Greetings, > > I have a Solaris Full Debug Symbols (FDS) fix ready for review. > Yes, it is a small fix, but it is in Makefiles so feel free to > run screaming from the room... :-) On the plus side the fix does > delete two work around source files (Coleen would say that's a > Good Thing (TM)!) 
> > The fix is to detect the version of GNU objcopy that is being > used on the machine and only enable Full Debug Symbols when that > version is 2.21.1 or newer. If you don't have the right version, > then the build drops back to pre-FDS build configs with a message > like this: > > WARNING: /usr/sfw/bin/gobjcopy --version info: > WARNING: GNU objcopy 2.15 > WARNING: an objcopy version of 2.21.1 or newer is needed to create > valid .debuginfo files. > WARNING: ignoring above objcopy command. > WARNING: patch 149063-01 or newer contains the correct Solaris 10 > SPARC version. > WARNING: patch 149064-01 or newer contains the correct Solaris 10 X86 > version. > WARNING: Solaris 11 Update 1 contains the correct version. > INFO: no objcopy cmd found so cannot create .debuginfo files. > INFO: ENABLE_FULL_DEBUG_SYMBOLS=0 > > This work is being tracked by the following bug IDs: > > JDK-8033602 wrong stabs data in libjvm.debuginfo on JDK 8 - SPARC > https://bugs.openjdk.java.net/browse/JDK-8033602 > > JDK-8034005 cannot debug in synchronizer.o or objectMonitor.o on > Solaris X86 > https://bugs.openjdk.java.net/browse/JDK-8034005 > > Here is the webrev URL: > > http://cr.openjdk.java.net/~dcubed/8033602-webrev/0-jdk9-hs-rt/ > > Testing: > > - JPRT test jobs to verify that the current JPRT Solaris hosts > are happy > - local builds on my Solaris 10 X86 machine to verify that the > wrong version of GNU objcopy is caught > > Thanks, in advance, for any comments, questions or suggestions. > > Dan From daniel.daugherty at oracle.com Tue Nov 11 23:31:42 2014 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Tue, 11 Nov 2014 16:31:42 -0700 Subject: RFR(S) Solaris Full Debug Symbols (FDS) fix for 8033602 and 8034005 In-Reply-To: <546287ED.9050708@oracle.com> References: <546151A9.1080100@oracle.com> <546287ED.9050708@oracle.com> Message-ID: <54629C5E.8080305@oracle.com> Thanks for the review! 
On 11/11/14 3:04 PM, serguei.spitsyn at oracle.com wrote: > Dan, > > The fix looks good. Thanks! > Nice cleanup from workarounds: Good Thing (TM)! :) Yes, this has been in the queue for quite a while... :-) Dan > > Thanks, > Serguei > > On 11/10/14 4:00 PM, Daniel D. Daugherty wrote: >> Greetings, >> >> I have a Solaris Full Debug Symbols (FDS) fix ready for review. >> Yes, it is a small fix, but it is in Makefiles so feel free to >> run screaming from the room... :-) On the plus side the fix does >> delete two work around source files (Coleen would say that's a >> Good Thing (TM)!) >> >> The fix is to detect the version of GNU objcopy that is being >> used on the machine and only enable Full Debug Symbols when that >> version is 2.21.1 or newer. If you don't have the right version, >> then the build drops back to pre-FDS build configs with a message >> like this: >> >> WARNING: /usr/sfw/bin/gobjcopy --version info: >> WARNING: GNU objcopy 2.15 >> WARNING: an objcopy version of 2.21.1 or newer is needed to create >> valid .debuginfo files. >> WARNING: ignoring above objcopy command. >> WARNING: patch 149063-01 or newer contains the correct Solaris 10 >> SPARC version. >> WARNING: patch 149064-01 or newer contains the correct Solaris 10 X86 >> version. >> WARNING: Solaris 11 Update 1 contains the correct version. >> INFO: no objcopy cmd found so cannot create .debuginfo files. 
>> INFO: ENABLE_FULL_DEBUG_SYMBOLS=0 >> >> This work is being tracked by the following bug IDs: >> >> JDK-8033602 wrong stabs data in libjvm.debuginfo on JDK 8 - SPARC >> https://bugs.openjdk.java.net/browse/JDK-8033602 >> >> JDK-8034005 cannot debug in synchronizer.o or objectMonitor.o on >> Solaris X86 >> https://bugs.openjdk.java.net/browse/JDK-8034005 >> >> Here is the webrev URL: >> >> http://cr.openjdk.java.net/~dcubed/8033602-webrev/0-jdk9-hs-rt/ >> >> Testing: >> >> - JPRT test jobs to verify that the current JPRT Solaris hosts >> are happy >> - local builds on my Solaris 10 X86 machine to verify that the >> wrong version of GNU objcopy is caught >> >> Thanks, in advance, for any comments, questions or suggestions. >> >> Dan > From david.holmes at oracle.com Wed Nov 12 04:48:53 2014 From: david.holmes at oracle.com (David Holmes) Date: Wed, 12 Nov 2014 14:48:53 +1000 Subject: RFR (S) 8059677: Thread.getName() instantiates Strings In-Reply-To: <54621FFA.2070503@oracle.com> References: <545FB64F.7090705@oracle.com> <54621FFA.2070503@oracle.com> Message-ID: <5462E6B5.7080504@oracle.com> On 12/11/2014 12:40 AM, Aleksey Shipilev wrote: > Hi, > > On 11/09/2014 09:45 PM, Aleksey Shipilev wrote: >> Thread.getName() returns String, and does new String instantiation every >> time, because the thread name is stored in char[]. Even though we use a >> private String constructor that shares the char[] array without copying >> it, this still hurts some use cases (think extra-fast logging). To the >> extent some people actually maintain Map to avoid it. 
>> https://bugs.openjdk.java.net/browse/JDK-8059677 >> >> Here's the attempt to maintain String instead of char[]: >> http://cr.openjdk.java.net/~shade/8059677/webrev.01.jdk/ >> http://cr.openjdk.java.net/~shade/8059677/webrev.01.hs/ > > Updated webrevs: > http://cr.openjdk.java.net/~shade/8059677/webrev.02.jdk/ > http://cr.openjdk.java.net/~shade/8059677/webrev.02.hs/ > > This version incorporates feedbacks from Chris, Staffan and David. I > think it is very close to what we would like to push. Opinions? All looks good to me. But I also noticed this strange (to me) assertion in javaClasses.cpp void java_lang_Thread::set_name(oop java_thread, oop name) { assert(java_thread->obj_field(_name_offset) == NULL, "name should be NULL"); java_thread->obj_field_put(_name_offset, name); } and on investigation it seems like this is dead code - I couldn't locate a call to java_lang_Thread::set_name ?? It would only be usable on an attaching thread (else name can't be null) and we pass the name to the Thread constructor in that case. Cheers, David > Testing: JPRT, jdk/test/java/lang/Thread jtreg, hotspot/test/runtime/ > jtreg, vm.quick.testlist, nsk.jvmti.testlist, svc.quick.testlist, > vm.tmtools.testlist > > Thanks, > -Aleksey. > > > > From david.holmes at oracle.com Wed Nov 12 08:04:42 2014 From: david.holmes at oracle.com (David Holmes) Date: Wed, 12 Nov 2014 18:04:42 +1000 Subject: RFR(XS): 8064471: Port 8013895: G1: G1SummarizeRSetStats output on Linux needs improvement to AIX In-Reply-To: <44EB1DA1040EEB44B7F343A782AD30789FEBA3E3@DEWDFEMB11B.global.corp.sap> References: <44EB1DA1040EEB44B7F343A782AD30789FEBA3E3@DEWDFEMB11B.global.corp.sap> Message-ID: <5463149A.6020506@oracle.com> Hi Gunter, On 11/11/2014 11:23 PM, Haug, Gunter wrote: > Hi All, > > The change '8013895: (G1: G1SummarizeRSetStats output on Linux needs improvement)' makes use of getrusage() to retrieve accurate per-thread data on resource usage. We can use exactly the same code on AIX to achieve this. 
>
> Please review the following change:
>
> http://cr.openjdk.java.net/~goetz/webrevs/8064471-aixTime/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8064471

I have a couple of comments on this code which presumably also apply to
the original :(

First, this comment is no longer applicable (actually it was never
applicable to AIX!):

   // For now, we say that linux does not support vtime. I have no idea
   // whether it can actually be made to (DLD, 9/13/05).

Second, this calculation seems wrong:

   return (double) (usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) +
          (double) (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / (1000 * 1000);

To me this performs integer division (i.e. truncation) then converts the
resulting integer to a double. I would expect to see additional
parentheses (even if not needed, for clarity):

   return (double) (usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) +
          ((double) (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec)) / (1000 * 1000);

or more simply divide by a floating-point value:

   return (double) (usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) +
          (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / (1000.0 * 1000);

and you don't need two double casts regardless as the expression will be
of type double as soon as there is one operand of type double.
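For anyone who wants to check the arithmetic, here is a minimal standalone sketch of the variants above. It is only an illustration: Usage and TimeVal are hypothetical stand-ins for the struct rusage fields, and the function names are made up; none of this is the HotSpot code. One thing it shows is that a C++ cast binds more tightly than '/', so the posted form already divides in floating point; actual truncation would need the division to be parenthesized before the cast.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for the struct rusage fields used in this thread.
struct TimeVal { long tv_sec; long tv_usec; };
struct Usage   { TimeVal ru_utime; TimeVal ru_stime; };

// The calculation as posted: the cast applies to the microsecond sum first,
// so the division by (1000 * 1000) is carried out in double.
double vtime_as_posted(const Usage& usage) {
  return (double) (usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) +
         (double) (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / (1000 * 1000);
}

// The suggested form with a floating-point divisor and no cast on the sum.
double vtime_float_divisor(const Usage& usage) {
  return (double) (usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) +
         (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / (1000.0 * 1000);
}

// For contrast only: integer division happens inside the parentheses,
// so the fractional part is lost before the cast.
double vtime_truncating(const Usage& usage) {
  return (double) (usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) +
         (double) ((usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / (1000 * 1000));
}

// Tolerant comparison so the checks do not depend on exact rounding.
bool nearly_equal(double a, double b) {
  return std::fabs(a - b) < 1e-9;
}

// Small self-check: 1.25 s of user time plus 2.5 s of system time.
void check_variants() {
  Usage u = { {1, 250000}, {2, 500000} };
  assert(nearly_equal(vtime_as_posted(u), 3.75));
  assert(nearly_equal(vtime_float_divisor(u), 3.75));
  assert(nearly_equal(vtime_truncating(u), 3.0));
}
```

With 750000 microseconds in total, the first two functions return 3.75 and only the deliberately truncating one returns 3.0, so the rewrites differ in readability rather than in result.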
So that should reduce to:

   return usage.ru_utime.tv_sec + usage.ru_stime.tv_sec +
          (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / (1000.0 * 1000);

Cheers,
David

> Thanks,
> Gunter
>

From roland.westrelin at oracle.com Wed Nov 12 09:55:21 2014
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Wed, 12 Nov 2014 10:55:21 +0100
Subject: RFR 8054008: Using -XX:-LazyBootClassLoader crashes with ACCESS_VIOLATION on Win 64bit
In-Reply-To: <545D4BDE.9010908@oracle.com>
References: <545C21E6.90709@oracle.com> <682E3725-DDA0-4A1D-9F6E-94C6CF6045A1@oracle.com> <545D0167.3070903@oracle.com> <545D4BDE.9010908@oracle.com>
Message-ID: <10933301-E9F4-4BDF-B678-50FE846873BD@oracle.com>

> http://cr.openjdk.java.net/~jiangli/8054008/webrev.02/

It looks good to me.

Roland.

From aleksey.shipilev at oracle.com Wed Nov 12 10:18:41 2014
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Wed, 12 Nov 2014 13:18:41 +0300
Subject: RFR (S) 8059677: Thread.getName() instantiates Strings
In-Reply-To:
References: <545FB64F.7090705@oracle.com> <54621FFA.2070503@oracle.com>
Message-ID: <54633401.6040208@oracle.com>

Thanks Staffan!

-Aleksey.

On 11.11.2014 23:38, Staffan Larsen wrote:
> SA changes look good.
>
> /Staffan
>
>> On 11 nov 2014, at 15:40, Aleksey Shipilev wrote:
>>
>> Hi,
>>
>> On 11/09/2014 09:45 PM, Aleksey Shipilev wrote:
>>> Thread.getName() returns String, and does new String instantiation every
>>> time, because the thread name is stored in char[]. Even though we use a
>>> private String constructor that shares the char[] array without copying
>>> it, this still hurts some use cases (think extra-fast logging). To the
>>> extent some people actually maintain Map to avoid it.
>>> https://bugs.openjdk.java.net/browse/JDK-8059677 >>> >>> Here's the attempt to maintain String instead of char[]: >>> http://cr.openjdk.java.net/~shade/8059677/webrev.01.jdk/ >>> http://cr.openjdk.java.net/~shade/8059677/webrev.01.hs/ >> >> Updated webrevs: >> http://cr.openjdk.java.net/~shade/8059677/webrev.02.jdk/ >> http://cr.openjdk.java.net/~shade/8059677/webrev.02.hs/ >> >> This version incorporates feedbacks from Chris, Staffan and David. I >> think it is very close to what we would like to push. Opinions? >> >> Testing: JPRT, jdk/test/java/lang/Thread jtreg, hotspot/test/runtime/ >> jtreg, vm.quick.testlist, nsk.jvmti.testlist, svc.quick.testlist, >> vm.tmtools.testlist >> >> Thanks, >> -Aleksey. >> >> >> >> > From aleksey.shipilev at oracle.com Wed Nov 12 10:23:32 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Wed, 12 Nov 2014 13:23:32 +0300 Subject: RFR (S) 8059677: Thread.getName() instantiates Strings In-Reply-To: <5462E6B5.7080504@oracle.com> References: <545FB64F.7090705@oracle.com> <54621FFA.2070503@oracle.com> <5462E6B5.7080504@oracle.com> Message-ID: <54633524.8000607@oracle.com> Hi David, On 12.11.2014 07:48, David Holmes wrote: > On 12/11/2014 12:40 AM, Aleksey Shipilev wrote: > All looks good to me. Thanks for the review! > But I also noticed this strange (to me) assertion in javaClasses.cpp > > void java_lang_Thread::set_name(oop java_thread, oop name) { > assert(java_thread->obj_field(_name_offset) == NULL, "name should be > NULL"); > java_thread->obj_field_put(_name_offset, name); > } > > and on investigation it seems like this is dead code - I couldn't locate > a call to java_lang_Thread::set_name ?? It would only be usable on an > attaching thread (else name can't be null) and we pass the name to the > Thread constructor in that case. set_name is not used, as I mentioned earlier -- that makes the change even more "safe". 
I was even tempted to drop the setter completely, but it would break the
symmetry against other setters and getters.

I dropped the assert at set_name in this update:
  http://cr.openjdk.java.net/~shade/8059677/webrev.03.hs/
  http://cr.openjdk.java.net/~shade/8059677/webrev.03.jdk/

The only difference against the previous version is the dropped assert,
so I haven't re-spun the tests.

Thanks,
-Aleksey.

From gunter.haug at sap.com Wed Nov 12 15:19:54 2014
From: gunter.haug at sap.com (Haug, Gunter)
Date: Wed, 12 Nov 2014 16:19:54 +0100
Subject: RFR(XS): 8064471: Port 8013895: G1: G1SummarizeRSetStats output on Linux needs improvement to AIX
In-Reply-To: <5463149A.6020506@oracle.com>
References: <44EB1DA1040EEB44B7F343A782AD30789FEBA3E3@DEWDFEMB11B.global.corp.sap> <5463149A.6020506@oracle.com>
Message-ID: <54637A9A.9040108@sap.com>

On 12.11.2014 09:04, David Holmes wrote:
> Hi Gunter,
>
> On 11/11/2014 11:23 PM, Haug, Gunter wrote:
>> Hi All,
>>
>> The change '8013895: (G1: G1SummarizeRSetStats output on Linux needs
>> improvement)' makes use of getrusage() to retrieve accurate
>> per-thread data on resource usage. We can use exactly the same code
>> on AIX to achieve this.
>>
>> Please review the following change:
>>
>> http://cr.openjdk.java.net/~goetz/webrevs/8064471-aixTime/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8064471
>
> I have a couple of comments on this code which presumably also apply
> to the original :(

Yes, they apply to the original as well, see below.

>
> First this comment is no longer applicable (actually it was never
> applicable to AIX!):
>
> // For now, we say that linux does not support vtime. I have no idea
> // whether it can actually be made to (DLD, 9/13/05).
>

You're right. I will remove it.
> Second this calculation seems wrong:
>
> return (double) (usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) +
> (double) (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / (1000 *
> 1000);
>
> To me this performs integer division (i.e. truncation) then converts
> the resulting integer to a double. I would expect to see additional
> parentheses (even if not needed, for clarity):
>
> return (double) (usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) +
> ((double) (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec)) / (1000 *
> 1000);
>
> or more simply divide by a floating-point value:
>
> return (double) (usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) +
> (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / (1000.0 * 1000);
>
> and you don't need two double casts regardless as the expression will
> be of type double as soon as there is one operand of type double. So
> that should reduce to:
>
> return usage.ru_utime.tv_sec + usage.ru_stime.tv_sec +
> (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / (1000.0 * 1000);
>

OK. Do you want us to also change the Linux version as you proposed?

Thanks,
Gunter

> Cheers,
> David
>
>> Thanks,
>> Gunter
>>

From karen.kinnear at oracle.com Wed Nov 12 16:27:54 2014
From: karen.kinnear at oracle.com (Karen Kinnear)
Date: Wed, 12 Nov 2014 11:27:54 -0500
Subject: RFR: 8038468: java/lang/instrument/ParallelTransformerLoader.sh fails with ClassCircularityError
In-Reply-To: <5456EADF.4050203@oracle.com>
References: <543C591E.8010602@oracle.com> <544AB477.4000204@oracle.com> <544ADC07.6080904@oracle.com> <544AE76A.9030701@oracle.com> <544E5123.1060202@oracle.com> <544E8844.1070907@oracle.com> <0FB37288-5995-4E0B-B005-E00A4FB3B22A@oracle.com> <5454218D.40009@oracle.com> <5456EADF.4050203@oracle.com>
Message-ID: <3CF28613-0C0F-44E2-869A-FF5B01D7E575@oracle.com>

I think there are three things we need to figure out.

1. I reproduced a problem in TestThread2.
   Below was the information from that and my analysis
   - all - comments on my analysis are very welcome
   - Yumin - please try the suggested test change below to see if it helps.
   - that is the only example I have seen the full details for.

2. Does the circularity error actually occur in the main thread and if so why?
   - need to catch in a debugger/hs_err file a situation in which this occurs
     in the main thread please. We need the full stack trace for this -
     native and java please
   - run this without the test change I suggested please - try to catch
     ClassCircularityError in the main thread

3. figure out why we see this problem more frequently
   - I am not convinced this problem didn't already exist
   - the test logic has some very odd comments and workarounds which seem to
     imply there were intermittent problems from the beginning
   - that said - worth figuring out if for instance, the sun.misc.URLClassPath
     logic was rewritten (and when) to add $JarLoader$2
   - and looking at the history of test failure

thanks,
Karen

On Nov 2, 2014, at 9:39 PM, David Holmes wrote:

> On 1/11/2014 9:55 AM, Yumin Qi wrote:
>> Karen,
>>
>> Thanks for your detail message for debugging. Yes, from my debugging,
>> the exception did happen in TestThread other than main thread. I have no
>> idea why in the end the exception was reported in main thread.
>
> Until that question is answered I will remain uneasy about simply tweaking the test until it no longer fails. I would also like to know when it started failing - Karen alludes to the possible introduction of a new inner class at some point.
>
> Thanks,
> David
>
>> You mention
>>
>> So that change to the test would be:
>> in TestTransformer:
>> if (loader != null) {
>> if (tName.equals("TestThread")) {
>> {
>> loadClasses(3);
>> }
>> }
>> return null;
>> }
>>
>>
>> The loader is the one defined in the test case, right? The system class
>> loader is never null.
>> I will try this change, let's see if it can work it out.
>> >> Thanks >> Yumin >> >> On 10/31/2014 3:29 PM, Karen Kinnear wrote: >>> Yumin, >>> >>> From your earlier exception stack trace (many thanks) you reported: >>> >>> Exception in thread "main" java.lang.ClassCircularityError: (no - I >>> don't know why this is in thread "main") >>> sun/misc/URLClassPath$JarLoader$2 >>> at sun.misc.URLClassPath$JarLoader.checkResource(URLClassPath.java:771) >>> at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:843) >>> at sun.misc.URLClassPath.getResource(URLClassPath.java:199) >>> at java.net.URLClassLoader$1.run(URLClassLoader.java:364) >>> at java.net.URLClassLoader$1.run(URLClassLoader.java:361) >>> at java.security.AccessController.doPrivileged(Native Method) >>> at java.net.URLClassLoader.findClass(URLClassLoader.java:360) >>> at java.lang.ClassLoader.loadClass(ClassLoader.java:426) >>> at java.lang.ClassLoader.loadClass(ClassLoader.java:359) >>> at java.lang.Class.forName0(Native Method) >>> at java.lang.Class.forName(Class.java:340) >>> at >>> ParallelTransformerLoaderApp.loadClasses(ParallelTransformerLoaderApp.java:83) >>> >>> at >>> ParallelTransformerLoaderApp.main(ParallelTransformerLoaderApp.java:45) >>> >>> >>> So I ran with -XX:AbortVMOnException=java.lang.ClassCircularityError >>> -XX:+ShowMessageBoxOnError to get >>> a log file and stack trace. See my instructions below on how to do that. >>> >>> I did this, attached a debugger, which didn't help enough since I >>> needed to see the java stack frames, >>> and got an hs_err_log also, so the stack traces came from the error >>> log. >>> >>> The stack trace was on Thread 2, which in the hs_err_log was >>> TestThread (which makes sense for what the test logic says). >>> See later in email for stack traces from Thread 2. 
>>> >>> Summary of stack trace: >>> >>> TestThread: >>> loadClasses(#) -> forName(TestClass#, URLClassLoader) >>> vm calls out to URLClassLoader.loadClass(String) which is >>> inherited from java.lang.ClassLoader.loadClass(String) >>> ... calls java.net.URLClassLoader.findClass(...) which calls >>> DoPrivileged java.net.URLClassLoader$1.run which calls >>> sun.misc.URLClassPath.getResource(name, false) which calls >>> sun.misc.URLClassPath$JarLoader.getResource which calls >>> sun.misc.URLClassPath$JarLoader.checkResource which >>> tries to call sun.misc.URLClassPath$JarLoader$2 >>> - and then the transformer jumps in with loadClasses(# (which we >>> know is 3) and walks the same logic which tries to load >>> sun.misc.URLClassPath$JarLoader$2 again >>> >>> Note that in the placeholder table information that Yumin printed, the >>> circularity error is on sun.misc.URLClassPath$JarLoader$2 with the >>> null == boot loader, which >>> makes sense -- that is the appropriate defining loader, and therefore >>> the one the CFLH would intercept during the defineClass phase. >>> >>> In the sun.misc.URLClassPath.java file, in the class JarLoader, in the >>> method checkResource >>> ... return new Resource() { ... } >>> -- I do not know why that generates sun.misc.URLClassPath$JarLoader$1, >>> $2 and $3 at build time or when that was added. >>> I would guess that is when the bug started happening. >>> >>> When I have a successful run, sun.misc.URLClassPath$JarLoader$2 loads >>> before any TestClass1 loads. >>> >>> My belief is that the point of the test is to test parallel class >>> loading for URL class loaders. >>> I don't think the point is to test the bootstrap class loader, nor to >>> test bootstrapping - i.e. running the agent before >>> we have loaded sufficient classes to allow loading URLClassLoader >>> classes. 
>>> >>> What I suggested to Yumin that he try would be to change the test to >>> NOT intercept boot loader loads, so that >>> sun.misc.URLClassPath$JarLoader$# >>> can load which will in turn allow classes loaded by a URLClassLoader >>> subclass to load. >>> >>> So that change to the test would be: >>> in TestTransformer: >>> if (loader != null) { >>> if (tName.equals("TestThread")) { >>> { >>> loadClasses(3); >>> } >>> } >>> return null; >>> } >>> // I also suspect with that change, we can remove the sleep loop >>> Note: there was a printed message which said that the Thread "Signal >>> Dispatcher" has called transform(), which I >>> ignored, however it is good that we don't call loadClass on that >>> thread - which is part of what the sleep loop does - >>> but that would be handled by the boot loader screening above >>> >>> Alternatively we can preload the URLClassPath classes, but I don't >>> think we want to do that, or >>> we can have the agent explicitly screen on a variety of jdk >>> bootstrapping classes. But I think the cleaner >>> solution is to screen on the boot loader. >>> >>> Does that make any sense to others? >>> >>> thanks, >>> Karen >>> >>> p.s. How to run with hotspot flags (jtreg has a -show:rerun option, >>> but with a shell script in the test, this is more complex, so >>> the following should be easier): >>> >>> So what I did was run the test once for it to pass (not your script, >>> but just once with jtreg) so that it generated >>> the $DST/work directory. >>> I then created a rerun.csh script - attached - you can modify for your >>> own $DST directory. >>> I used it to be able to quickly rerun the test without the jtreg >>> framework and compile time etc. but mostly >>> to be able to actually add hotspot command-line flags. >>> >>> >>> >>> >>> p.p.s. 
details from the error log (let me know if you want me to >>> attach the error log to the bug report) >>> >>> note: error log shows last 10 events including: >>> Event: 0.928 loading class sun/misc/URLClassPath$JarLoader$2 >>> Event: 0.928 loading class TestClass3 >>> Event: 0.929 loading class TestClass3 done >>> Event: 0.929 loading class java/lang/ClassCircularityError >>> Event: 0.929 loading class java/lang/ClassCircularityError done >>> >>> TestThread >>> >>> java frames: >>> >>> j >>> sun.misc.URLClassPath$JarLoader.checkResource(Ljava/lang/String;ZLjava/util/jar/JarEntry;)Lsun/misc/Resource;+42 >>> >>> j >>> sun.misc.URLClassPath$JarLoader.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+54 >>> >>> j >>> sun.misc.URLClassPath.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+53 >>> >>> j java.net.URLClassLoader$1.run()Ljava/lang/Class;+26 >>> j java.net.URLClassLoader$1.run()Ljava/lang/Object;+1 >>> v ~StubRoutines::call_stub >>> j >>> java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;+0 >>> >>> j >>> java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class;+13 >>> j >>> java.lang.ClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;+70 >>> j java.lang.ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class;+3 >>> v ~StubRoutines::call_stub >>> j >>> java.lang.Class.forName0(Ljava/lang/String;ZLjava/lang/ClassLoader;Ljava/lang/Class;)Ljava/lang/Class;+0 >>> >>> j >>> java.lang.Class.forName(Ljava/lang/String;ZLjava/lang/ClassLoader;)Ljava/lang/Class;+49 >>> >>> j ParallelTransformerLoaderAgent$TestTransformer.loadClasses(I)V+25 >>> j >>> ParallelTransformerLoaderAgent$TestTransformer.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[B)[B+81 >>> >>> j >>> 
sun.instrument.TransformerManager.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[B)[B+50 >>> >>> j >>> sun.instrument.InstrumentationImpl.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[BZ)[B+34 >>> >>> v ~StubRoutines::call_stub >>> j >>> sun.misc.URLClassPath$JarLoader.checkResource(Ljava/lang/String;ZLjava/util/jar/JarEntry;)Lsun/misc/Resource;+42 >>> >>> j >>> sun.misc.URLClassPath$JarLoader.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+54 >>> >>> j >>> sun.misc.URLClassPath.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+53 >>> >>> j java.net.URLClassLoader$1.run()Ljava/lang/Class;+26 >>> j java.net.URLClassLoader$1.run()Ljava/lang/Object;+1 >>> v ~StubRoutines::call_stub >>> j >>> java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;+0 >>> >>> j >>> java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class;+13 >>> j >>> java.lang.ClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;+70 >>> j java.lang.ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class;+3 >>> v ~StubRoutines::call_stub >>> j >>> java.lang.Class.forName0(Ljava/lang/String;ZLjava/lang/ClassLoader;Ljava/lang/Class;)Ljava/lang/Class;+0 >>> >>> j >>> java.lang.Class.forName(Ljava/lang/String;ZLjava/lang/ClassLoader;)Ljava/lang/Class;+49 >>> >>> j ParallelTransformerLoaderApp.loadClasses(I)V+25 >>> j ParallelTransformerLoaderApp$TestThread.run()V+4 >>> v ~StubRoutines::call_stub >>> >>> >>> >>> detailed frames: >>> >>> V [libjvm.so+0x760f5a] Exceptions::_throw_msg(Thread*, char const*, >>> int, Symbol*, char const*)+0x7c >>> V [libjvm.so+0xce005c] >>> SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, >>> Handle, Thread*)+0x7d8 >>> V [libjvm.so+0xcde9e5] SystemDictionary::resolve_or_null(Symbol*, >>> Handle, Handle, Thread*)+0x26d >>> V [libjvm.so+0xcde435] 
SystemDictionary::resolve_or_fail(Symbol*, >>> Handle, Handle, bool, Thread*)+0x39 >>> V [libjvm.so+0x690fbc] >>> ConstantPool::klass_at_impl(constantPoolHandle, int, Thread*)+0x3cc >>> V [libjvm.so+0x5398cb] ConstantPool::klass_at(int, Thread*)+0x55 >>> V [libjvm.so+0x8b1f3c] InterpreterRuntime::_new(JavaThread*, >>> ConstantPool*, int)+0x14a >>> j >>> sun.misc.URLClassPath$JarLoader.checkResource(Ljava/lang/String;ZLjava/util/jar/JarEntry;)Lsun/misc/Resource;+42 >>> >>> j >>> sun.misc.URLClassPath$JarLoader.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+54 >>> >>> j >>> sun.misc.URLClassPath.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+53 >>> >>> j java.net.URLClassLoader$1.run()Ljava/lang/Class;+26 >>> j java.net.URLClassLoader$1.run()Ljava/lang/Object;+1 >>> v ~StubRoutines::call_stub >>> V [libjvm.so+0x8c3060] JavaCalls::call_helper(JavaValue*, >>> methodHandle*, JavaCallArguments*, Thread*)+0x6b2 >>> V [libjvm.so+0xba06bc] os::os_exception_wrapper(void (*)(JavaValue*, >>> methodHandle*, JavaCallArguments*, Thread*), JavaValue*, >>> methodHandle*, JavaCallArguments*, Thread*)+0x3a >>> V [libjvm.so+0x8c29a7] JavaCalls::call(JavaValue*, methodHandle, >>> JavaCallArguments*, Thread*)+0x7d >>> V [libjvm.so+0x972a80] JVM_DoPrivileged+0x63d >>> j >>> java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;+0 >>> >>> j >>> java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class;+13 >>> j >>> java.lang.ClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;+70 >>> j java.lang.ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class;+3 >>> v ~StubRoutines::call_stub >>> V [libjvm.so+0x8c3060] JavaCalls::call_helper(JavaValue*, >>> methodHandle*, JavaCallArguments*, Thread*)+0x6b2 >>> V [libjvm.so+0xba06bc] os::os_exception_wrapper(void (*)(JavaValue*, >>> methodHandle*, JavaCallArguments*, Thread*), JavaValue*, >>> methodHandle*, JavaCallArguments*, 
Thread*)+0x3a >>> V [libjvm.so+0x8c29a7] JavaCalls::call(JavaValue*, methodHandle, >>> JavaCallArguments*, Thread*)+0x7d >>> V [libjvm.so+0x8c1ec7] JavaCalls::call_virtual(JavaValue*, >>> KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x1cb >>> V [libjvm.so+0x8c2086] JavaCalls::call_virtual(JavaValue*, Handle, >>> KlassHandle, Symbol*, Symbol*, Handle, Thread*)+0xb0 >>> V [libjvm.so+0xce2096] >>> SystemDictionary::load_instance_class(Symbol*, Handle, Thread*)+0x4ea >>> V [libjvm.so+0xce00a8] >>> SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, >>> Handle, Thread*)+0x824 >>> V [libjvm.so+0xcde9e5] SystemDictionary::resolve_or_null(Symbol*, >>> Handle, Handle, Thread*)+0x26d >>> V [libjvm.so+0xcde435] SystemDictionary::resolve_or_fail(Symbol*, >>> Handle, Handle, bool, Thread*)+0x39 >>> V [libjvm.so+0x98c89e] find_class_from_class_loader(JNIEnv_*, >>> Symbol*, unsigned char, Handle, Handle, unsigned char, Thread*)+0x49 >>> V [libjvm.so+0x96f681] JVM_FindClassFromCaller+0x39d >>> C [libjava.so+0xdfd0] Java_java_lang_Class_forName0+0x130 >>> j >>> java.lang.Class.forName0(Ljava/lang/String;ZLjava/lang/ClassLoader;Ljava/lang/Class;)Ljava/lang/Class;+0 >>> >>> j >>> java.lang.Class.forName(Ljava/lang/String;ZLjava/lang/ClassLoader;)Ljava/lang/Class;+49 >>> >>> j ParallelTransformerLoaderAgent$TestTransformer.loadClasses(I)V+25 >>> j >>> ParallelTransformerLoaderAgent$TestTransformer.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[B)[B+81 >>> >>> j >>> sun.instrument.TransformerManager.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[B)[B+50 >>> >>> j >>> sun.instrument.InstrumentationImpl.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[BZ)[B+34 >>> >>> v ~StubRoutines::call_stub >>> V [libjvm.so+0x8c3060] JavaCalls::call_helper(JavaValue*, >>> methodHandle*, JavaCallArguments*, 
Thread*)+0x6b2 >>> V [libjvm.so+0xba06bc] os::os_exception_wrapper(void (*)(JavaValue*, >>> methodHandle*, JavaCallArguments*, Thread*), JavaValue*, >>> methodHandle*, JavaCallArguments*, Thread*)+0x3a >>> V [libjvm.so+0x8c29a7] JavaCalls::call(JavaValue*, methodHandle, >>> JavaCallArguments*, Thread*)+0x7d >>> V [libjvm.so+0x911bfb] jni_invoke_nonstatic(JNIEnv_*, JavaValue*, >>> _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*)+0x3cd >>> V [libjvm.so+0x916918] jni_CallObjectMethod+0x388 >>> C [libinstrument.so+0x4eb5] transformClassFile+0x1e5 >>> C [libinstrument.so+0x1e06] eventHandlerClassFileLoadHook+0x96 >>> V [libjvm.so+0xa04afa] >>> JvmtiClassFileLoadHookPoster::post_to_env(JvmtiEnv*, bool)+0x1a8 >>> V [libjvm.so+0xa0485e] >>> JvmtiClassFileLoadHookPoster::post_all_envs()+0x8a >>> V [libjvm.so+0xa047c6] JvmtiClassFileLoadHookPoster::post()+0x18 >>> V [libjvm.so+0x9fb6e1] >>> JvmtiExport::post_class_file_load_hook(Symbol*, Handle, Handle, >>> unsigned char**, unsigned char**, JvmtiCachedClassFileData**)+0x85 >>> V [libjvm.so+0x5cd17d] ClassFileParser::parseClassFile(Symbol*, >>> ClassLoaderData*, Handle, KlassHandle, GrowableArray*, >>> TempNewSymbol&, bool, Thread*)+0x2af >>> V [libjvm.so+0x5dd441] ClassFileParser::parseClassFile(Symbol*, >>> ClassLoaderData*, Handle, TempNewSymbol&, bool, Thread*)+0x95 >>> V [libjvm.so+0x5daf03] ClassLoader::load_classfile(Symbol*, >>> Thread*)+0x2ed >>> V [libjvm.so+0xce1cc4] >>> SystemDictionary::load_instance_class(Symbol*, Handle, Thread*)+0x118 >>> V [libjvm.so+0xce00a8] >>> SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, >>> Handle, Thread*)+0x824 >>> V [libjvm.so+0xcde9e5] SystemDictionary::resolve_or_null(Symbol*, >>> Handle, Handle, Thread*)+0x26d >>> V [libjvm.so+0xcde435] SystemDictionary::resolve_or_fail(Symbol*, >>> Handle, Handle, bool, Thread*)+0x39 >>> V [libjvm.so+0x690fbc] >>> ConstantPool::klass_at_impl(constantPoolHandle, int, Thread*)+0x3cc >>> V [libjvm.so+0x5398cb] 
ConstantPool::klass_at(int, Thread*)+0x55 >>> V [libjvm.so+0x8b1f3c] InterpreterRuntime::_new(JavaThread*, >>> ConstantPool*, int)+0x14a >>> j >>> sun.misc.URLClassPath$JarLoader.checkResource(Ljava/lang/String;ZLjava/util/jar/JarEntry;)Lsun/misc/Resource;+42 >>> >>> j >>> sun.misc.URLClassPath$JarLoader.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+54 >>> >>> j >>> sun.misc.URLClassPath.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+53 >>> >>> j java.net.URLClassLoader$1.run()Ljava/lang/Class;+26 >>> j java.net.URLClassLoader$1.run()Ljava/lang/Object;+1 >>> v ~StubRoutines::call_stub >>> V [libjvm.so+0x8c3060] JavaCalls::call_helper(JavaValue*, >>> methodHandle*, JavaCallArguments*, Thread*)+0x6b2 >>> V [libjvm.so+0xba06bc] os::os_exception_wrapper(void (*)(JavaValue*, >>> methodHandle*, JavaCallArguments*, Thread*), JavaValue*, >>> methodHandle*, JavaCallArguments*, Thread*)+0x3a >>> V [libjvm.so+0x8c29a7] JavaCalls::call(JavaValue*, methodHandle, >>> JavaCallArguments*, Thread*)+0x7d >>> V [libjvm.so+0x972a80] JVM_DoPrivileged+0x63d >>> j >>> java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;+0 >>> >>> j >>> java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class;+13 >>> j >>> java.lang.ClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;+70 >>> j java.lang.ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class; >>> v ~StubRoutines::call_stub >>> V [libjvm.so+0x8c3060] JavaCalls::call_helper(JavaValue*, >>> methodHandle*, JavaCallArguments*, Thread*)+0x6b2 >>> V [libjvm.so+0xba06bc] os::os_exception_wrapper(void (*)(JavaValue*, >>> methodHandle*, JavaCallArguments*, Thread*), JavaValue*, >>> methodHandle*, JavaCallArguments*, Thread*)+0x3a >>> V [libjvm.so+0x8c29a7] JavaCalls::call(JavaValue*, methodHandle, >>> JavaCallArguments*, Thread*)+0x7d >>> V [libjvm.so+0x8c1ec7] JavaCalls::call_virtual(JavaValue*, >>> KlassHandle, Symbol*, 
Symbol*, JavaCallArguments*, Thread*)+0x1cb >>> V [libjvm.so+0x8c2086] JavaCalls::call_virtual(JavaValue*, Handle, >>> KlassHandle, Symbol*, Symbol*, Handle, Thread*)+0xb0 >>> V [libjvm.so+0xce2096] >>> SystemDictionary::load_instance_class(Symbol*, Handle, Thread*)+0x4ea >>> V [libjvm.so+0xce00a8] >>> SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, >>> Handle, Thread*)+0x824 >>> V [libjvm.so+0xcde9e5] SystemDictionary::resolve_or_null(Symbol*, >>> Handle, Handle, Thread*)+0x26d >>> V [libjvm.so+0xcde435] SystemDictionary::resolve_or_fail(Symbol*, >>> Handle, Handle, bool, Thread*)+0x39 >>> V [libjvm.so+0x98c89e] find_class_from_class_loader(JNIEnv_*, >>> Symbol*, unsigned char, Handle, Handle, unsigned char, Thread*)+0x49 >>> V [libjvm.so+0x96f681] JVM_FindClassFromCaller+0x39d >>> C [libjava.so+0xdfd0] Java_java_lang_Class_forName0+0x130 >>> j >>> java.lang.Class.forName0(Ljava/lang/String;ZLjava/lang/ClassLoader;Ljava/lang/Class;)Ljava/lang/Class;+0 >>> >>> j >>> java.lang.Class.forName(Ljava/lang/String;ZLjava/lang/ClassLoader;)Ljava/lang/Class;+49 >>> >>> j ParallelTransformerLoaderApp.loadClasses(I)V+25 >>> ...... >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> On Oct 27, 2014, at 2:00 PM, serguei.spitsyn at oracle.com wrote: >>> >>>> Ok. >>>> >>>> Thanks, Dan! >>>> Serguei >>>> >>>> >>>> On 10/27/14 7:05 AM, Daniel D. Daugherty wrote: >>>>>> The test case was added by Dan. >>>>>> We may want to ask him to clarify the test case purpose. 
>>>>>> (added Dan to the to-list) >>>>> Here's the changeset that added the test: >>>>> >>>>> $ hg log -v -r bca8bf23ac59 >>>>> test/java/lang/instrument/ParallelTransformerLoader.sh >>>>> changeset: 132:bca8bf23ac59 >>>>> user: dcubed >>>>> date: Mon Mar 24 15:05:09 2008 -0700 >>>>> files: test/java/lang/instrument/ParallelTransformerLoader.sh >>>>> test/java/lang/instrument/ParallelTransformerLoaderAgent.java >>>>> test/java/lang/instrument/ParallelTransformerLoaderApp.java >>>>> test/java/lang/instrument/TestClass1.java >>>>> test/java/lang/instrument/TestClass2.java >>>>> test/java/lang/instrument/TestClass3.java >>>>> description: >>>>> 5088398: 3/2 java.lang.instrument TCK test deadlock (test11) >>>>> Summary: Add regression test for single-threaded bootstrap classloader. >>>>> Reviewed-by: sspitsyn >>>>> >>>>> >>>>> Based on my e-mail archive for this bug and from the bug report itself, >>>>> it looks like we got this test from Wily Labs. The original bug was a >>>>> deadlock that stopped being reproducible after: >>>>> >>>>> Karen fixed the bootstrap class loader to work in parallel via: >>>>> >>>>> 4997893 4/5 Investigate allowing bootstrap loader to work in >>>>> parallel >>>>> >>>>> with that fix in place the deadlock no longer reproduces. >>>>> I'm planning to use this bug as the vehicle for getting >>>>> the test program into the INSTRUMENT_REGRESSION test suite. >>>>> >>>>> *** (#2 of 2): 2008-02-29 18:20:17 GMT+00:00 daniel.daugherty at sun.com >>>>> >>>>> >>>>> A careful reading of JDK-5088398 might reveal the intentions of this >>>>> test... >>>>> >>>>> Dan >>>>> >>>>> >>>>> On 10/24/14 5:57 PM, serguei.spitsyn at oracle.com wrote: >>>>>> Yumin, >>>>>> >>>>>> On 10/24/14 4:08 PM, Yumin Qi wrote: >>>>>>> Serguei, >>>>>>> >>>>>>> Thanks for your comments. >>>>>>> This test happens intermittently, but now it can repeat with 8/9. >>>>>>> Loading TestClass1 in main thread while loading TestClass2 in >>>>>>> TestThread in parallel. 
They both will call transform since >>>>>>> TestClass[1-3] are loaded via agent. When loading TestClass2, it >>>>>>> will call loading TestClass3 in TestThread. >>>>>>> Note in the main thread, for loop: >>>>>>> >>>>>>> for (int i = 0; i < kNumIterations; i++) >>>>>>> { >>>>>>> // load some classes from multiple threads >>>>>>> (this thread and one other) >>>>>>> Thread testThread = new TestThread(2); >>>>>>> testThread.start(); >>>>>>> loadClasses(1); >>>>>>> >>>>>>> // log that it completed and reset for the >>>>>>> next iteration >>>>>>> testThread.join(); >>>>>>> System.out.print("."); >>>>>>> ParallelTransformerLoaderAgent.generateNewClassLoader(); >>>>>>> } >>>>>>> >>>>>>> The loader got renewed after testThread.join(). So both threads >>>>>>> are using the exact same class loader. >>>>>> You are right, thanks. >>>>>> It means that all three classes (TesClass1, TestClass2 and TestClass3) >>>>>> are loaded by the same class loader in each iteration. >>>>>> >>>>>> However, I see more cases when the TestClass3 gets loaded. >>>>>> It happens in a CFLH event when any other class (not TestClass*) in >>>>>> the system is loaded. >>>>>> The class loading thread can be any, not only "main" or "TestClass" >>>>>> thread. >>>>>> I suspect this test case mostly targets class loading that happens >>>>>> on other threads. >>>>>> It is because of the lines: >>>>>> // In 160_03 and older, transform() is called >>>>>> // with the "system_loader_lock" held and that >>>>>> // prevents the bootstrap class loaded from >>>>>> // running in parallel. If we add a slight >>>>>> sleep >>>>>> // delay here when the transform() call is not >>>>>> // main or TestThread, then the deadlock in >>>>>> // 160_03 and older is much more reproducible. 
>>>>>> if (!tName.equals("main") && >>>>>> !tName.equals("TestThread")) { >>>>>> System.out.println("Thread '" + tName + >>>>>> "' has called transform()"); >>>>>> try { >>>>>> Thread.sleep(500); >>>>>> } catch (InterruptedException ie) { >>>>>> } >>>>>> } >>>>>> >>>>>> What about the following? >>>>>> >>>>>> In the ParallelTransformerLoaderAgent.java make this change: >>>>>> if (!tName.equals("main")) >>>>>> => if (tName.equals("TestThread")) >>>>>> >>>>>> Does such updated test still failing? >>>>>> >>>>>>> After create a new class loader, next loop will use the loader. >>>>>>> This is why quite often on the stack trace we can see it resolves >>>>>>> JarLoader$2. >>>>>>> >>>>>>> I am not quite understand the test case either. Loading TestClass3 >>>>>>> inside transform using the same classloader will cause call to >>>>>>> transform again and form a circle. Nonetheless, if we see >>>>>>> TestClass2 already loaded, the loop will end but that still is a >>>>>>> risk. >>>>>> In fact, I don't like that the test loads the class TestClass3 at >>>>>> the TestClass3 CFLH event. >>>>>> However, it is interesting to know why we did not see (is it the >>>>>> case?) this issue before. >>>>>> Also, it is interesting why the test stops failing with you fix >>>>>> (replacing loader with SystemClassLoader). >>>>>> >>>>>> The test case was added by Dan. >>>>>> We may want to ask him to clarify the test case purpose. >>>>>> (added Dan to the to-list) >>>>>> >>>>>> Thanks, >>>>>> Serguei >>>>>> >>>>>>> Thanks >>>>>>> Yumin >>>>>>> >>>>>>> On 10/24/2014 1:20 PM, serguei.spitsyn at oracle.com wrote: >>>>>>>> Hi Yumin, >>>>>>>> >>>>>>>> Below is some analysis to make sure I understand the test >>>>>>>> scenario correctly. >>>>>>>> >>>>>>>> The ParallelTransformerLoaderApp.main() executes a 1000 iteration >>>>>>>> loop. 
>>>>>>>> At each iteration it does: >>>>>>>> - creates and starts a new TestThread >>>>>>>> - loads TestClass1 with the current class loader: >>>>>>>> ParallelTransformerLoaderAgent.getClassLoader() >>>>>>>> - changes the current class loader with new one: >>>>>>>> ParallelTransformerLoaderAgent.generateNewClassLoader() >>>>>>>> >>>>>>>> The TestThread loads the TestClass2 concurrently with the main >>>>>>>> thread. >>>>>>>> >>>>>>>> At the CFLH events, the ParallelTransformerLoaderAgent does the >>>>>>>> class retransformation. >>>>>>>> If the thread loading the class is not "main", it loads the class >>>>>>>> TestClass3 >>>>>>>> with the current class loader >>>>>>>> ParallelTransformerLoaderAgent.getClassLoader(). >>>>>>>> >>>>>>>> Sometimes, the TestClass2 and TestClass3 are loaded by the same >>>>>>>> class loader recursively. >>>>>>>> It happens if the class loader has not been changed between >>>>>>>> loading TestClass2 and TestClass3 classes. >>>>>>>> >>>>>>>> I'm not convinced yet the test is incorrect. >>>>>>>> And it is not clear why do we get a ClassCircularityError. >>>>>>>> >>>>>>>> Please, let me know if the above understanding is wrong. >>>>>>>> I also see the reply from David and share his concerns. >>>>>>>> >>>>>>>> It is not clear if this failure is a regression. >>>>>>>> Did we observe this issue before? >>>>>>>> If - NOT then when and why had this failure started to appear? >>>>>>>> >>>>>>>> Unfortunately, it is impossible to look at the test run history >>>>>>>> at the moment. >>>>>>>> The Aurora is at a maintenance. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Serguei >>>>>>>> >>>>>>>> On 10/13/14 3:58 PM, Yumin Qi wrote: >>>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8038468 >>>>>>>>> webrev:*http://cr.openjdk.java.net/~minqi/8038468/webrev00/ >>>>>>>>> >>>>>>>>> the bug marked as confidential so post the webrev internally. 
>>>>>>>>> >>>>>>>>> Problem: The test case tries to load a class from the same jar >>>>>>>>> via agent in the middle of loading another class from the jar >>>>>>>>> via same class loader in same thread. The call happens in >>>>>>>>> transform which is a rare case --- in middle of loading class, >>>>>>>>> loading another class. The result is a CircularityError. When >>>>>>>>> first class is in loading, in vm we put JarLoader$2 on place >>>>>>>>> holder table, then we start the defineClass, which calls >>>>>>>>> transform, begins loading the second class so go along the same >>>>>>>>> routine for loading JarLoader$2 first, found it already in >>>>>>>>> placeholder table. A CircularityError is thrown. >>>>>>>>> Fix: The test case should not call loading class with same class >>>>>>>>> loader in same thread from same jar in 'transform' method. I >>>>>>>>> modify it loading with system class loader and we expect see >>>>>>>>> ClassNotFoundException. Detail see bug comments. >>>>>>>>> >>>>>>>>> Thanks >>>>>>>>> Yumin * >> From jiangli.zhou at oracle.com Wed Nov 12 16:33:18 2014 From: jiangli.zhou at oracle.com (Jiangli Zhou) Date: Wed, 12 Nov 2014 08:33:18 -0800 Subject: RFR 8054008: Using -XX:-LazyBootClassLoader crashes with ACCESS_VIOLATION on Win 64bit In-Reply-To: <10933301-E9F4-4BDF-B678-50FE846873BD@oracle.com> References: <545C21E6.90709@oracle.com> <682E3725-DDA0-4A1D-9F6E-94C6CF6045A1@oracle.com> <545D0167.3070903@oracle.com> <545D4BDE.9010908@oracle.com> <10933301-E9F4-4BDF-B678-50FE846873BD@oracle.com> Message-ID: <54638BCE.5000607@oracle.com> Thanks, Roland! Jiangli On 11/12/2014 01:55 AM, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~jiangli/8054008/webrev.02/ > It looks good to me. > > Roland. 
From tom.deneau at amd.com Wed Nov 12 16:52:13 2014 From: tom.deneau at amd.com (Deneau, Tom) Date: Wed, 12 Nov 2014 16:52:13 +0000 Subject: hang when using -XX:-UseCompilerSafepoints Message-ID: Hi all -- Forwarding a thread which came about on the jmh-dev mail list, as recommended by Aleksey Shipilev (see below). The JMH framework has a timing control thread which sleeps for a certain period, then sets a volatile isDone variable. Meanwhile, the benchmark thread loops doing its benchmark code and also checking the isDone field. A hang occurs if -XX:-UseCompilerSafepoints is used. The original issue can be reproduced by the following steps hg clone http://hg.openjdk.java.net/code-tools/jmh cd jmh mvn clean install -DskipTests=true cd jmh-samples java -server -XX:-UseCompilerSafepoints -jar target/benchmarks.jar 'JMHSample_01_.*' -t 1 -wi 5 -i 5 -f 0 -- Tom Deneau -----Original Message----- From: Aleksey Shipilev [mailto:aleksey.shipilev at oracle.com] Sent: Wednesday, November 12, 2014 6:09 AM To: Deneau, Tom; jmh-dev at openjdk.java.net Subject: Re: using -XX:-UseCompilerSafepoints Hi Tom, On 11/11/2014 07:34 PM, Deneau, Tom wrote: > It looks like a thread that calls Thread.sleep (as the timing control > thread does in the harness) will eventually go thru > SafepointSynchonize::block (as part of the ThreadBlockInVM > destructor). So if there is a looping benchmark thread compiled > without Compiler Safepoints, the control thread will be blocked and > will never set the isDone flag. So, you are saying that without the safepoint in the while(!isDone) loop in workload, control thread and workload thread will never rendezvous on safepoint? I believe this is a bug with -XX:-CompilerSafepoints, because the comment in safepoint.cpp calls this out specifically for VMThread vs. Mutator threads: // In a pathological scenario such as that described in CR6415670 // the VMthread may sleep just before the mutator(s) become safe. 
// In that case the mutators will be stalled waiting for the safepoint // to complete and the the VMthread will be sleeping, waiting for the // mutators to rendezvous. The VMthread will eventually wake up and // detect that all mutators are safe, at which point we'll again make // progress. If this is a case, you probably need to report this to runtime guys. > This is probably OK, just need to document that CompilerSafepoints > cannot be turned off. I think it is safe to presume something will go hairy if you are using any special VM flag, therefore I am not inclined to document this. Thanks, -Aleksey. From david.r.chase at oracle.com Wed Nov 12 17:03:11 2014 From: david.r.chase at oracle.com (David Chase) Date: Wed, 12 Nov 2014 12:03:11 -0500 Subject: [9] RFR(L) 8013267 : move MemberNameTable from native code to Java heap, use to intern MemberNames In-Reply-To: <545F642E.30205@gmail.com> References: <594FE416-D445-426E-B7A9-C75F012ADE16@oracle.com> <0BDCD5DD-CBFB-4A3D-9BAD-7F9912E9237C@oracle.com> <5453C230.8010709@oracle.com> <9747A3A3-B7F4-407A-8E00-1E647D9DC2D1@oracle.com> <1D195ED3-8BD2-46CB-9D70-29CF809D9F5A@oracle.com> <5456FB59.60905@oracle.com> <632A5C98-B386-4625-BE12-355241581955@oracle.com> <5457AA75.8090103@gmail.com> <5457E0F9.8090004@gmail.com> <5458A57C.4060208@gmail.com> <260D49F5-6380-4FC3-A900-6CD9AB3ED6F7@oracle.com> <5459034E.8070809@gmail.com> <39826508-110B-4FCE-9A58-8C3D1B9FC7DE@oracle.com> <545F642E.30205@gmail.com> Message-ID: Hello Peter, I was looking at this (thinking it would be a useful thing to benchmark, looking for possible improvements) and noticed that you rely on the hashed objects having a sensible value-dependent hashcode (as opposed to the default Object hashcode). Sadly, this seems not to be the case for MemberNames or for "Types". I am sorely tempted to repair this glitch, not sure if it fits in the scope of the original bug, but there's a lot to be said for future-performance-proofing.
David On 2014-11-09, at 7:55 AM, Peter Levart wrote: > Hi David, > > I played a little with the idea of having a hash table instead of packed sorted array for interning. Using ConcurrentHashMap would present quite some memory overhead. A more compact representation is possible in the form of a linear-scan hash table where elements of array are MemberNames themselves: > > http://cr.openjdk.java.net/~plevart/misc/MemberName.intern/jdk.06.diff/ > > This is a drop-in replacement for MemberName on top of your jdk.06 patch. If you have some time, you can run this with your performance tests to see if it presents any difference. If not, then perhaps this interning is not so performance critical after all. > > Regards, Peter From david.r.chase at oracle.com Wed Nov 12 18:27:33 2014 From: david.r.chase at oracle.com (David Chase) Date: Wed, 12 Nov 2014 13:27:33 -0500 Subject: [9] RFR(L) 8013267 : move MemberNameTable from native code to Java heap, use to intern MemberNames In-Reply-To: <545F642E.30205@gmail.com> References: <594FE416-D445-426E-B7A9-C75F012ADE16@oracle.com> <0BDCD5DD-CBFB-4A3D-9BAD-7F9912E9237C@oracle.com> <5453C230.8010709@oracle.com> <9747A3A3-B7F4-407A-8E00-1E647D9DC2D1@oracle.com> <1D195ED3-8BD2-46CB-9D70-29CF809D9F5A@oracle.com> <5456FB59.60905@oracle.com> <632A5C98-B386-4625-BE12-355241581955@oracle.com> <5457AA75.8090103@gmail.com> <5457E0F9.8090004@gmail.com> <5458A57C.4060208@gmail.com> <260D49F5-6380-4FC3-A900-6CD9AB3ED6F7@oracle.com> <5459034E.8070809@gmail.com> <39826508-110B-4FCE-9A58-8C3D1B9FC7DE@oracle.com> <545F642E.30205@gmail.com> Message-ID: Hello Peter, > Sadly, this seems not to be the case for MemberNames or for "Types". That statement is inoperative. Mistakes were made. It's compareTo that they lack. David On 2014-11-09, at 7:55 AM, Peter Levart wrote: > Hi David, > > I played a little with the idea of having a hash table instead of packed sorted array for interning. Using ConcurrentHashMap would present quite some memory overhead.
A more compact representation is possible in the form of a linear-scan hash table where elements of array are MemberNames themselves: > > http://cr.openjdk.java.net/~plevart/misc/MemberName.intern/jdk.06.diff/ > > This is a drop-in replacement for MemberName on top of your jdk.06 patch. If you have some time, you can run this with your performance tests to see if it presents any difference. If not, then perhaps this interning is not so performance critical after all. > > Regards, Peter From chris.plummer at oracle.com Wed Nov 12 19:44:22 2014 From: chris.plummer at oracle.com (Chris Plummer) Date: Wed, 12 Nov 2014 11:44:22 -0800 Subject: [9] RFR (S) 6762191: Setting stack size to 16K causes segmentation fault In-Reply-To: <545D939D.2030308@oracle.com> References: <545D939D.2030308@oracle.com> Message-ID: <5463B896.10801@oracle.com> Hi, I'm still looking for reviewers. thanks, Chris On 11/7/14 7:53 PM, Chris Plummer wrote: > This is an initial review for 6762191. I'm guessing there will be > recommendations to fix in a different way, but thought this would be a > good time to start the discussion. > > https://bugs.openjdk.java.net/browse/JDK-6762191 > http://cr.openjdk.java.net/~cjplummer/6762191/webrev.00.jdk/ > http://cr.openjdk.java.net/~cjplummer/6762191/webrev.00.hotspot/ > > The bug is that if the -Xss size is set to something very small (like > 16k), on linux there will be a crash due to overwriting the end of the > stack. This happens before hotspot can compute its stack needs and > verify that the stack is big enough. > > It didn't seem viable to move the hotspot stack size check earlier. It > depends on too much other work done before that point, and the changes > would have been disruptive. The stack size check is currently done in > os::init_2(). > > What is needed is a check before the thread is created. That way we > can create a thread with a big enough stack to handle all needs up to > the point of the check in os::init_2(). 
This initial check does not > need to be the final check. It just needs to confirm that we have > enough stack to get us to the check in os::init_2(). > > I decided to check in java.c if the -Xss size is too small, and set it > to a larger size if it is. I hard coded this size to 32k (I'll explain > why 32k later). I suspect this is the part that will result in some > debate. If you have better suggestions let me know. If it does stay > here, then probably the 32k needs to be a #define, and maybe even an > OS porting interface, but I'm not sure where to put it. > > The reason I chose 32k is because this is big enough for all platforms > to get to the stack size check in os::init_2(). It is also smaller > than the actual minimum stack size allowed on any platform. 32-bit > windows has the smallest requirement at 64k. I add some printfs to > print the minimum stack requirement, and then ran a simple JTReg test > with every JPRT supported platform to get the results. > > The TooSmallStackSize.sh will run "java -version" with -Xss16k, > -Xss32k, and -XXss, where is the size from the > error message produced by the JVM, such as in the following: > > $ java -Xss32k -version > The stack size specified is too small, Specify at least 100k > Error: Could not create the Java Virtual Machine. > Error: A fatal exception has occurred. Program will exit. > > I ran this test through JPRT on all platforms, and they all pass. > > One thing to point out is that Windows behaves a bit different than > the other platforms. It always rounds the stack size up to a multiple > of 64k , so even if you specify -Xss16k, you get a 64k stack. On > 32-bit Windows with C1, 64k is also the minimum requirement, so there > is no error produced in this case. However, on 32-bit Windows with C2, > 68k is the minimum, so an error is produced since the stack will only > be 64k. There is no bug here. It's just a bit confusing. 
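
[Editorial sketch, not part of the original message.] The Windows arithmetic in the paragraph above can be worked through in a few lines. This is only a model of the behaviour as described in the message, not HotSpot or launcher code: the class name StackSizeRounding and its methods are hypothetical, and the 64k granularity and the 64k (C1) and 68k (C2) minimums are simply the figures quoted above for 32-bit Windows.

```java
// Sketch only: models the rounding described in the message; not HotSpot code.
// All names here are hypothetical and exist just for illustration.
final class StackSizeRounding {
    static final long K = 1024;

    // Assumption from the message: Windows rounds the stack size up
    // to a multiple of 64k.
    static final long WINDOWS_GRANULARITY = 64 * K;

    // Round 'requested' up to the next multiple of 'granularity'.
    static long roundUp(long requested, long granularity) {
        return ((requested + granularity - 1) / granularity) * granularity;
    }

    // True if the stack actually obtained is still below the VM's minimum.
    static boolean isTooSmall(long requested, long minimumRequired) {
        return roundUp(requested, WINDOWS_GRANULARITY) < minimumRequired;
    }

    public static void main(String[] args) {
        long requested = 16 * K; // corresponds to -Xss16k
        System.out.println("effective stack: "
                + roundUp(requested, WINDOWS_GRANULARITY) / K + "k");
        System.out.println("C1 (minimum 64k) too small? "
                + isTooSmall(requested, 64 * K));
        System.out.println("C2 (minimum 68k) too small? "
                + isTooSmall(requested, 68 * K));
    }
}
```

Under these assumptions a 16k request becomes a 64k stack, which meets the C1 minimum but falls short of the C2 minimum, matching the "no error with C1, error with C2" behaviour described above.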
> > thanks, > > Chris From aleksey.shipilev at oracle.com Wed Nov 12 20:13:37 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Wed, 12 Nov 2014 23:13:37 +0300 Subject: hang when using -XX:-UseCompilerSafepoints In-Reply-To: References: Message-ID: <5463BF71.4080804@oracle.com> Hi, Still not sure if this is a runtime bug: stripping safepoints from the non-counted loop seems to be a recipe for disaster. Anyhow, I think it deserves a simpler example. Submitted the bug and attached a simple test there: https://bugs.openjdk.java.net/browse/JDK-8064749 Thanks, -Aleksey. On 12.11.2014 19:52, Deneau, Tom wrote: > Hi all -- > > Forwarding a thread which came about on the jmh-dev mail list, as recommended by Aleksey Shipilev (see below). The JMH framework has a timing control thread which sleeps for a certain period, then sets a volatile isDone variable. Meanwhile, the benchmark thread loops doing its benchmark code and also checking the isDone field. A hang occurs if -XX:-UseCompilerSafepoints is used. > > The original issue can be reproduced by the following steps > > hg clone http://hg.openjdk.java.net/code-tools/jmh > cd jmh > mvn clean install -DskipTests=true > cd jmh-samples > java -server -XX:-UseCompilerSafepoints -jar target/benchmarks.jar 'JMHSample_01_.*' -t 1 -wi 5 -i 5 -f 0 > > -- Tom Deneau > > > -----Original Message----- > From: Aleksey Shipilev [mailto:aleksey.shipilev at oracle.com] > Sent: Wednesday, November 12, 2014 6:09 AM > To: Deneau, Tom; jmh-dev at openjdk.java.net > Subject: Re: using -XX:-UseCompilerSafepoints > > Hi Tom, > > On 11/11/2014 07:34 PM, Deneau, Tom wrote: >> It looks like a thread that calls Thread.sleep (as the timing control >> thread does in the harness) will eventually go thru >> SafepointSynchonize::block (as part of the ThreadBlockInVM >> destructor). So if there is a looping benchmark thread compiled >> without Compiler Safepoints, the control thread will be blocked and >> will never set the isDone flag. 
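
[Editorial sketch, not part of the original message.] The thread structure under discussion can be reduced to a few lines. This is a minimal sketch and not JMH code: the names IsDoneLoop and runWorkload are made up for illustration. A control thread sleeps for a while and then sets a volatile isDone flag, while the workload thread spins until it observes the flag. With default VM flags this terminates; the report above is that a benchmark of this shape hangs when run with -XX:-UseCompilerSafepoints, and whether this reduced version reproduces that is not verified here.

```java
// Sketch only: mirrors the control-thread / workload-thread shape described
// in the message. Class and method names are hypothetical.
final class IsDoneLoop {
    // Written by the control thread, polled by the workload thread.
    // volatile so the write is guaranteed to become visible to the loop.
    static volatile boolean isDone;

    // Spins until the control thread sets isDone; returns the iteration count.
    static long runWorkload(long sleepMillis) {
        isDone = false;
        Thread control = new Thread(() -> {
            try {
                Thread.sleep(sleepMillis); // the timing period
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            isDone = true; // tell the workload to stop
        }, "control");
        control.start();

        long iterations = 0;
        while (!isDone) { // non-counted loop that polls the flag
            iterations++;
        }

        try {
            control.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println("iterations: " + runWorkload(100));
    }
}
```

The loop body is deliberately trivial: the point is only that the workload thread's sole exit condition is a flag set by a thread that passes through Thread.sleep.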
> > So, you are saying that without the safepoint in the while(!isDone) > loop in workload, control thread and workload thread will never > rendezvous on safepoint? I believe this is a bug with > -XX:-CompilerSafepoints, because the comment in safepoint.cpp calls this > out specifically for VMThread vs. Mutator threads: > > // In a pathological scenario such as that described in CR6415670 > // the VMthread may sleep just before the mutator(s) become safe. > // In that case the mutators will be stalled waiting for the safepoint > // to complete and the the VMthread will be sleeping, waiting for the > // mutators to rendezvous. The VMthread will eventually wake up and > // detect that all mutators are safe, at which point we'll again make > // progress. > > If this is a case, you probably need to report this to runtime guys. > >> This is probably OK, just need to document that CompilerSafepoints >> cannot be turned off. > > I think it is safe to presume something will go hairy if you are using > any special VM flag, therefore I am not inclined to document this. > > Thanks, > -Aleksey. > From christian.tornqvist at oracle.com Wed Nov 12 20:53:24 2014 From: christian.tornqvist at oracle.com (Christian Tornqvist) Date: Wed, 12 Nov 2014 15:53:24 -0500 Subject: RFR(S): 8043491: warning LNK4197: export '... ...' specified multiple times; using first specification In-Reply-To: <5461A8DE.1050009@oracle.com> References: <5461A8DE.1050009@oracle.com> Message-ID: <01ff01cffeba$b3aa2c40$1afe84c0$@oracle.com> Hi Calvin, Change looks good, thanks for fixing this. Thanks, Christian -----Original Message----- From: hotspot-runtime-dev [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of Calvin Cheung Sent: Tuesday, November 11, 2014 1:13 AM To: hotspot-runtime-dev at openjdk.java.net Subject: RFR(S): 8043491: warning LNK4197: export '... ...' 
specified multiple times; using first specification This is for fixing link warnings on windows such as the following: jni.obj : warning LNK4197: export 'JNI_CreateJavaVM' specified multiple times; using first specification The warning is reproducible with both VS2010 and VS2013. It is applicable to 64-bit only probably due to the __declspec(dllexport) on 32-bit, it exports the function decorated name with a leading underscore, but not the case on 64-bit as described in: http://stackoverflow.com/questions/3572344/a-warning-with-building-64bit-dll All those functions are declared with JNIEXPORT (#define JNIEXPORT __declspec(dllexport)) and we're adding the /export: in the link command. Therefore, on 64-bit platform, we get the "specified multiple times" LNK4197 warning. A fix is to check if the platform is 64-bit, we don't add those /export option to the link command. JBS: https://bugs.openjdk.java.net/browse/JDK-8043491 webrev: http://cr.openjdk.java.net/~ccheung/8043491/webrev/ Tests: (1) build jvm.dll via command line (both 32- and 64-bit) use configure.sh to setup and then do "make CONF= hotspot" (2) generate visual studio project files using ProjectCreator (both 32- and 64-bit) build jvm.dll via VS2013 (both 32- and 64-bit) (3) JPRT thanks, Calvin From calvin.cheung at oracle.com Wed Nov 12 21:08:52 2014 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Wed, 12 Nov 2014 13:08:52 -0800 Subject: RFR(S): 8043491: warning LNK4197: export '... ...' specified multiple times; using first specification In-Reply-To: <01ff01cffeba$b3aa2c40$1afe84c0$@oracle.com> References: <5461A8DE.1050009@oracle.com> <01ff01cffeba$b3aa2c40$1afe84c0$@oracle.com> Message-ID: <5463CC64.1000003@oracle.com> Thanks for your review - Christian. Calvin On 11/12/2014 12:53 PM, Christian Tornqvist wrote: > Hi Calvin, > > Change looks good, thanks for fixing this. 
> > Thanks, > Christian > > -----Original Message----- > From: hotspot-runtime-dev > [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of Calvin > Cheung > Sent: Tuesday, November 11, 2014 1:13 AM > To: hotspot-runtime-dev at openjdk.java.net > Subject: RFR(S): 8043491: warning LNK4197: export '... ...' specified > multiple times; using first specification > > This is for fixing link warnings on windows such as the following: > jni.obj : warning LNK4197: export 'JNI_CreateJavaVM' specified multiple > times; using first specification > > The warning is reproducible with both VS2010 and VS2013. > It is applicable to 64-bit only probably due to the > __declspec(dllexport) on 32-bit, it exports the function decorated name with > a leading underscore, but not the case on 64-bit as described in: > http://stackoverflow.com/questions/3572344/a-warning-with-building-64bit-dll > > All those functions are declared with JNIEXPORT (#define JNIEXPORT > __declspec(dllexport)) and we're adding the /export: in the > link command. Therefore, on 64-bit platform, we get the "specified multiple > times" LNK4197 warning. > > A fix is to check if the platform is 64-bit, we don't add those /export > option to the link command. 
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8043491 > > webrev: http://cr.openjdk.java.net/~ccheung/8043491/webrev/ > > Tests: > (1) build jvm.dll via command line (both 32- and 64-bit) > use configure.sh to setup and then do "make CONF= > hotspot" > > (2) generate visual studio project files using ProjectCreator (both > 32- and 64-bit) > build jvm.dll via VS2013 (both 32- and 64-bit) > > (3) JPRT > > thanks, > Calvin > > > > From david.holmes at oracle.com Wed Nov 12 22:45:16 2014 From: david.holmes at oracle.com (David Holmes) Date: Thu, 13 Nov 2014 08:45:16 +1000 Subject: RFR (S) 8059677: Thread.getName() instantiates Strings In-Reply-To: <54633524.8000607@oracle.com> References: <545FB64F.7090705@oracle.com> <54621FFA.2070503@oracle.com> <5462E6B5.7080504@oracle.com> <54633524.8000607@oracle.com> Message-ID: <5463E2FC.5010207@oracle.com> On 12/11/2014 8:23 PM, Aleksey Shipilev wrote: > Hi David, > > On 12.11.2014 07:48, David Holmes wrote: >> On 12/11/2014 12:40 AM, Aleksey Shipilev wrote: >> All looks good to me. > > Thanks for the review! > >> But I also noticed this strange (to me) assertion in javaClasses.cpp >> >> void java_lang_Thread::set_name(oop java_thread, oop name) { >> assert(java_thread->obj_field(_name_offset) == NULL, "name should be >> NULL"); >> java_thread->obj_field_put(_name_offset, name); >> } >> >> and on investigation it seems like this is dead code - I couldn't locate >> a call to java_lang_Thread::set_name ?? It would only be usable on an >> attaching thread (else name can't be null) and we pass the name to the >> Thread constructor in that case. > > set_name is not used, as I mentioned earlier -- that makes the change Sorry, I missed that comment. > even more "safe". I was even tempted to drop the setter completely, but > it would break the symmetry against other setters and getters. 
I dropped > the assert at set_name in this update: > http://cr.openjdk.java.net/~shade/8059677/webrev.03.hs/ > http://cr.openjdk.java.net/~shade/8059677/webrev.03.jdk/ > > The only difference against the previous version is the dropped assert, > so I haven't re-spinned the tests. OK. I'm more inclined to delete unused code but it is fine as is. Thanks, David > Thanks, > -Aleksey. > From yumin.qi at oracle.com Wed Nov 12 22:45:42 2014 From: yumin.qi at oracle.com (Yumin Qi) Date: Wed, 12 Nov 2014 14:45:42 -0800 Subject: RFR(S): 8043491: warning LNK4197: export '... ...' specified multiple times; using first specification In-Reply-To: <5461A8DE.1050009@oracle.com> References: <5461A8DE.1050009@oracle.com> Message-ID: <5463E316.30308@oracle.com> Looks good to me. Thanks Yumin On 11/10/2014 10:12 PM, Calvin Cheung wrote: > This is for fixing link warnings on windows such as the following: > jni.obj : warning LNK4197: export 'JNI_CreateJavaVM' specified > multiple times; using first specification > > The warning is reproducible with both VS2010 and VS2013. > It is applicable to 64-bit only probably due to the > __declspec(dllexport) on 32-bit, it exports the function decorated > name with a leading underscore, but not the case on 64-bit as > described in: > http://stackoverflow.com/questions/3572344/a-warning-with-building-64bit-dll > > > All those functions are declared with JNIEXPORT (#define JNIEXPORT > __declspec(dllexport)) and we're adding the /export: in > the link command. Therefore, on 64-bit platform, we get the "specified > multiple times" LNK4197 warning. > > A fix is to check if the platform is 64-bit, we don't add those > /export option to the link command. 
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8043491 > > webrev: http://cr.openjdk.java.net/~ccheung/8043491/webrev/ > > Tests: > (1) build jvm.dll via command line (both 32- and 64-bit) > use configure.sh to setup and then do "make CONF= > hotspot" > > (2) generate visual studio project files using ProjectCreator > (both 32- and 64-bit) > build jvm.dll via VS2013 (both 32- and 64-bit) > > (3) JPRT > > thanks, > Calvin > > > From calvin.cheung at oracle.com Wed Nov 12 22:48:24 2014 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Wed, 12 Nov 2014 14:48:24 -0800 Subject: RFR(S): 8043491: warning LNK4197: export '... ...' specified multiple times; using first specification In-Reply-To: <5463E316.30308@oracle.com> References: <5461A8DE.1050009@oracle.com> <5463E316.30308@oracle.com> Message-ID: <5463E3B8.3050708@oracle.com> Thanks for your review - Yumin. On 11/12/2014 2:45 PM, Yumin Qi wrote: > Looks good to me. > > Thanks > Yumin > On 11/10/2014 10:12 PM, Calvin Cheung wrote: >> This is for fixing link warnings on windows such as the following: >> jni.obj : warning LNK4197: export 'JNI_CreateJavaVM' specified >> multiple times; using first specification >> >> The warning is reproducible with both VS2010 and VS2013. >> It is applicable to 64-bit only probably due to the >> __declspec(dllexport) on 32-bit, it exports the function decorated >> name with a leading underscore, but not the case on 64-bit as >> described in: >> http://stackoverflow.com/questions/3572344/a-warning-with-building-64bit-dll >> >> >> All those functions are declared with JNIEXPORT (#define JNIEXPORT >> __declspec(dllexport)) and we're adding the /export: >> in the link command. Therefore, on 64-bit platform, we get the >> "specified multiple times" LNK4197 warning. >> >> A fix is to check if the platform is 64-bit, we don't add those >> /export option to the link command. 
>> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8043491 >> >> webrev: http://cr.openjdk.java.net/~ccheung/8043491/webrev/ >> >> Tests: >> (1) build jvm.dll via command line (both 32- and 64-bit) >> use configure.sh to setup and then do "make CONF= >> hotspot" >> >> (2) generate visual studio project files using ProjectCreator >> (both 32- and 64-bit) >> build jvm.dll via VS2013 (both 32- and 64-bit) >> >> (3) JPRT >> >> thanks, >> Calvin >> >> >> > From aleksey.shipilev at oracle.com Wed Nov 12 23:01:39 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Thu, 13 Nov 2014 02:01:39 +0300 Subject: RFR (S) 8059677: Thread.getName() instantiates Strings In-Reply-To: <54621FFA.2070503@oracle.com> References: <545FB64F.7090705@oracle.com> <54621FFA.2070503@oracle.com> Message-ID: <5463E6D3.6030806@oracle.com> On 11.11.2014 17:40, Aleksey Shipilev wrote: > On 11/09/2014 09:45 PM, Aleksey Shipilev wrote: >> Thread.getName() returns String, and does new String instantiation every >> time, because the thread name is stored in char[]. Even though we use a >> private String constructor that shares the char[] array without copying >> it, this still hurts some use cases (think extra-fast logging). To the >> extent some people actually maintain Map to avoid it. >> https://bugs.openjdk.java.net/browse/JDK-8059677 >> >> Here's the attempt to maintain String instead of char[]: >> http://cr.openjdk.java.net/~shade/8059677/webrev.01.jdk/ >> http://cr.openjdk.java.net/~shade/8059677/webrev.01.hs/ > > Updated webrevs: > http://cr.openjdk.java.net/~shade/8059677/webrev.02.jdk/ > http://cr.openjdk.java.net/~shade/8059677/webrev.02.hs/ All right, third time a charm. 
All reviewers seem to be happy with these changes: http://cr.openjdk.java.net/~shade/8059677/webrev.03.jdk/ http://cr.openjdk.java.net/~shade/8059677/webrev.03.hs/ Coleen had volunteered to sponsor them (thanks!), here are the changesets: http://cr.openjdk.java.net/~shade/8059677/8059677-jdk.changeset http://cr.openjdk.java.net/~shade/8059677/8059677-hs.changeset Thanks, -Aleksey. From david.holmes at oracle.com Wed Nov 12 23:27:05 2014 From: david.holmes at oracle.com (David Holmes) Date: Thu, 13 Nov 2014 09:27:05 +1000 Subject: RFR (S) 8062307: 'Reference handler' thread triggers assert w/ TraceThreadEvents In-Reply-To: <5451BD59.4060202@oracle.com> References: <5451BADF.8040203@oracle.com> <5451BD59.4060202@oracle.com> Message-ID: <5463ECC9.10309@oracle.com> The CCC for this trivial removal has been removed. Still need two reviewers please. David On 30/10/2014 2:23 PM, David Holmes wrote: > On 30/10/2014 2:13 PM, David Holmes wrote: >> Bug: https://bugs.openjdk.java.net/browse/JDK-8062307 >> >> Webrev: http://cr.openjdk.java.net/~dholmes/8062307/webrev/ >> >> It turns out that the little known TraceThreadEvents logic has been >> broken since at least very early in JDK 5. A develop-only option it was >> intended to show when different Thread methods were called (the VM side >> of certain java.lang.Thread methods). While that sounds potentially >> useful for debugging it seems that in practice it is not - this has been >> broken for over 10 years with nobody noticing: it is unused. So rather >> than fix unused code it is proposed to simply delete it instead. 
> > Correction this has been noticed in the past: > > https://bugs.openjdk.java.net/browse/JDK-6757482 (2008-10-09 06:51) > > http://mail.openjdk.java.net/pipermail/distro-pkg-dev/2008-March/001586.html > > > David > >> Thanks, >> David From coleen.phillimore at oracle.com Wed Nov 12 23:34:06 2014 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Wed, 12 Nov 2014 18:34:06 -0500 Subject: RFR (S) 8062307: 'Reference handler' thread triggers assert w/ TraceThreadEvents In-Reply-To: <5463ECC9.10309@oracle.com> References: <5451BADF.8040203@oracle.com> <5451BD59.4060202@oracle.com> <5463ECC9.10309@oracle.com> Message-ID: <5463EE6E.7060008@oracle.com> Change looks good. You mean the CCC is approved, not removed. Coleen On 11/12/14, 6:27 PM, David Holmes wrote: > The CCC for this trivial removal has been removed. > > Still need two reviewers please. > > David > > On 30/10/2014 2:23 PM, David Holmes wrote: >> On 30/10/2014 2:13 PM, David Holmes wrote: >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8062307 >>> >>> Webrev: http://cr.openjdk.java.net/~dholmes/8062307/webrev/ >>> >>> It turns out that the little known TraceThreadEvents logic has been >>> broken since at least very early in JDK 5. A develop-only option it was >>> intended to show when different Thread methods were called (the VM side >>> of certain java.lang.Thread methods). While that sounds potentially >>> useful for debugging it seems that in practice it is not - this has >>> been >>> broken for over 10 years with nobody noticing: it is unused. So rather >>> than fix unused code it is proposed to simply delete it instead. 
>> >> Correction this has been noticed in the past: >> >> https://bugs.openjdk.java.net/browse/JDK-6757482 (2008-10-09 06:51) >> >> http://mail.openjdk.java.net/pipermail/distro-pkg-dev/2008-March/001586.html >> >> >> >> David >> >>> Thanks, >>> David From jiangli.zhou at oracle.com Wed Nov 12 23:37:09 2014 From: jiangli.zhou at oracle.com (Jiangli Zhou) Date: Wed, 12 Nov 2014 15:37:09 -0800 Subject: RFR (S) 8062307: 'Reference handler' thread triggers assert w/ TraceThreadEvents In-Reply-To: <5463ECC9.10309@oracle.com> References: <5451BADF.8040203@oracle.com> <5451BD59.4060202@oracle.com> <5463ECC9.10309@oracle.com> Message-ID: <5463EF25.40804@oracle.com> Hi David, The change looks good. Thanks, Jiangli On 11/12/2014 03:27 PM, David Holmes wrote: > The CCC for this trivial removal has been removed. > > Still need two reviewers please. > > David > > On 30/10/2014 2:23 PM, David Holmes wrote: >> On 30/10/2014 2:13 PM, David Holmes wrote: >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8062307 >>> >>> Webrev: http://cr.openjdk.java.net/~dholmes/8062307/webrev/ >>> >>> It turns out that the little known TraceThreadEvents logic has been >>> broken since at least very early in JDK 5. A develop-only option it was >>> intended to show when different Thread methods were called (the VM side >>> of certain java.lang.Thread methods). While that sounds potentially >>> useful for debugging it seems that in practice it is not - this has >>> been >>> broken for over 10 years with nobody noticing: it is unused. So rather >>> than fix unused code it is proposed to simply delete it instead. 
>> >> Correction this has been noticed in the past: >> >> https://bugs.openjdk.java.net/browse/JDK-6757482 (2008-10-09 06:51) >> >> http://mail.openjdk.java.net/pipermail/distro-pkg-dev/2008-March/001586.html >> >> >> >> David >> >>> Thanks, >>> David From vladimir.kozlov at oracle.com Wed Nov 12 23:38:48 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 12 Nov 2014 15:38:48 -0800 Subject: hang when using -XX:-UseCompilerSafepoints In-Reply-To: <5463BF71.4080804@oracle.com> References: <5463BF71.4080804@oracle.com> Message-ID: <5463EF88.1050100@oracle.com> On 11/12/14 12:13 PM, Aleksey Shipilev wrote: > Hi, > > Still not sure if this is a runtime bug: stripping safepoints from the > non-counted loop seems to be a recipe for disaster. This flag does not affect compiled code - so it is not compiler issue. It is only used in runtime/safepoint.cpp and it guards the code which protects a polling page. There are many bugs which shows current problem. For example: https://bugs.openjdk.java.net/browse/JDK-6873333 I would say that we have to remove it or at least make it experimental flag if we want to do experiments with it. We definitely should not allow to use it in production! Regards, Vladimir > > Anyhow, I think it deserves a simpler example. Submitted the bug and > attached a simple test there: > https://bugs.openjdk.java.net/browse/JDK-8064749 > > Thanks, > -Aleksey. > > On 12.11.2014 19:52, Deneau, Tom wrote: >> Hi all -- >> >> Forwarding a thread which came about on the jmh-dev mail list, as recommended by Aleksey Shipilev (see below). The JMH framework has a timing control thread which sleeps for a certain period, then sets a volatile isDone variable. Meanwhile, the benchmark thread loops doing its benchmark code and also checking the isDone field. A hang occurs if -XX:-UseCompilerSafepoints is used. 
>> >> The original issue can be reproduced by the following steps >> >> hg clone http://hg.openjdk.java.net/code-tools/jmh >> cd jmh >> mvn clean install -DskipTests=true >> cd jmh-samples >> java -server -XX:-UseCompilerSafepoints -jar target/benchmarks.jar 'JMHSample_01_.*' -t 1 -wi 5 -i 5 -f 0 >> >> -- Tom Deneau >> >> >> -----Original Message----- >> From: Aleksey Shipilev [mailto:aleksey.shipilev at oracle.com] >> Sent: Wednesday, November 12, 2014 6:09 AM >> To: Deneau, Tom; jmh-dev at openjdk.java.net >> Subject: Re: using -XX:-UseCompilerSafepoints >> >> Hi Tom, >> >> On 11/11/2014 07:34 PM, Deneau, Tom wrote: >>> It looks like a thread that calls Thread.sleep (as the timing control >>> thread does in the harness) will eventually go thru >>> SafepointSynchonize::block (as part of the ThreadBlockInVM >>> destructor). So if there is a looping benchmark thread compiled >>> without Compiler Safepoints, the control thread will be blocked and >>> will never set the isDone flag. >> >> So, you are saying that without the safepoint in the while(!isDone) >> loop in workload, control thread and workload thread will never >> rendezvous on safepoint? I believe this is a bug with >> -XX:-CompilerSafepoints, because the comment in safepoint.cpp calls this >> out specifically for VMThread vs. Mutator threads: >> >> // In a pathological scenario such as that described in CR6415670 >> // the VMthread may sleep just before the mutator(s) become safe. >> // In that case the mutators will be stalled waiting for the safepoint >> // to complete and the the VMthread will be sleeping, waiting for the >> // mutators to rendezvous. The VMthread will eventually wake up and >> // detect that all mutators are safe, at which point we'll again make >> // progress. >> >> If this is a case, you probably need to report this to runtime guys. >> >>> This is probably OK, just need to document that CompilerSafepoints >>> cannot be turned off. 
>> >> I think it is safe to presume something will go hairy if you are using >> any special VM flag, therefore I am not inclined to document this. >> >> Thanks, >> -Aleksey. >> > > From aleksey.shipilev at oracle.com Wed Nov 12 23:55:42 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Thu, 13 Nov 2014 02:55:42 +0300 Subject: hang when using -XX:-UseCompilerSafepoints In-Reply-To: <5463EF88.1050100@oracle.com> References: <5463BF71.4080804@oracle.com> <5463EF88.1050100@oracle.com> Message-ID: <5463F37E.7020804@oracle.com> On 13.11.2014 02:38, Vladimir Kozlov wrote: > On 11/12/14 12:13 PM, Aleksey Shipilev wrote: >> Still not sure if this is a runtime bug: stripping safepoints from the >> non-counted loop seems to be a recipe for disaster. > > This flag does not affect compiled code - so it is not compiler issue. > It is only used in runtime/safepoint.cpp and it guards the code which > protects a polling page. > > There are many bugs which shows current problem. For example: > > https://bugs.openjdk.java.net/browse/JDK-6873333 > > I would say that we have to remove it or at least make it experimental > flag if we want to do experiments with it. > > We definitely should not allow to use it in production! Yes, that's what I meant. By "runtime" I meant JRE as whole, not a particular component. I am not sure why Tom played with this flag to begin with, are there legitimate use cases that force users to mess with safepoint internals? I sure hope there are no such use cases. I agree that demoting this flag from "product" to "experimental" sets the expectations about its impact right. Thanks, -Aleksey. 
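For readers of the archive, the pattern Tom describes above (a control thread that sleeps and then sets a volatile isDone flag while a worker spins in a non-counted loop checking it) can be sketched roughly as follows. This is only an illustration and not JMH's actual harness code: the class name IsDoneLoopSketch and the method runFor are hypothetical names made up for this sketch.

```java
import java.util.concurrent.TimeUnit;

public class IsDoneLoopSketch {
    // Written by the control thread, read by the worker loop.
    // volatile makes the write visible to the spinning thread.
    static volatile boolean isDone;

    // Spins in a non-counted loop until the control thread raises isDone,
    // then returns the number of iterations performed.
    static long runFor(long millis) {
        isDone = false;
        Thread control = new Thread(() -> {
            try {
                // Thread.sleep-style blocking, as the timing thread does.
                TimeUnit.MILLISECONDS.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            isDone = true;
        }, "control");
        control.setDaemon(true);
        control.start();

        long iterations = 0;
        while (!isDone) {   // loop bound is not known up front
            iterations++;
        }

        try {
            control.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println("iterations: " + runFor(100));
    }
}
```

On a JVM with default flags the loop exits shortly after the control thread sets the flag. The thread reports that the equivalent JMH loop hangs with -XX:-UseCompilerSafepoints; this sketch is not guaranteed to reproduce that, and the test attached to JDK-8064749 is the authoritative reproducer.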
From david.holmes at oracle.com Thu Nov 13 00:03:01 2014 From: david.holmes at oracle.com (David Holmes) Date: Thu, 13 Nov 2014 10:03:01 +1000 Subject: RFR (S) 8062307: 'Reference handler' thread triggers assert w/ TraceThreadEvents In-Reply-To: <5463EE6E.7060008@oracle.com> References: <5451BADF.8040203@oracle.com> <5451BD59.4060202@oracle.com> <5463ECC9.10309@oracle.com> <5463EE6E.7060008@oracle.com> Message-ID: <5463F535.1070201@oracle.com> On 13/11/2014 9:34 AM, Coleen Phillimore wrote: > > Change looks good. You mean the CCC is approved, not removed. Yep approved - must have been delayed keyboard stutter :) Thanks, David > Coleen > > On 11/12/14, 6:27 PM, David Holmes wrote: >> The CCC for this trivial removal has been removed. >> >> Still need two reviewers please. >> >> David >> >> On 30/10/2014 2:23 PM, David Holmes wrote: >>> On 30/10/2014 2:13 PM, David Holmes wrote: >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8062307 >>>> >>>> Webrev: http://cr.openjdk.java.net/~dholmes/8062307/webrev/ >>>> >>>> It turns out that the little known TraceThreadEvents logic has been >>>> broken since at least very early in JDK 5. A develop-only option it was >>>> intended to show when different Thread methods were called (the VM side >>>> of certain java.lang.Thread methods). While that sounds potentially >>>> useful for debugging it seems that in practice it is not - this has >>>> been >>>> broken for over 10 years with nobody noticing: it is unused. So rather >>>> than fix unused code it is proposed to simply delete it instead. 
>>> >>> Correction this has been noticed in the past: >>> >>> https://bugs.openjdk.java.net/browse/JDK-6757482 (2008-10-09 06:51) >>> >>> http://mail.openjdk.java.net/pipermail/distro-pkg-dev/2008-March/001586.html >>> >>> >>> >>> David >>> >>>> Thanks, >>>> David > From david.holmes at oracle.com Thu Nov 13 00:03:17 2014 From: david.holmes at oracle.com (David Holmes) Date: Thu, 13 Nov 2014 10:03:17 +1000 Subject: RFR (S) 8062307: 'Reference handler' thread triggers assert w/ TraceThreadEvents In-Reply-To: <5463EF25.40804@oracle.com> References: <5451BADF.8040203@oracle.com> <5451BD59.4060202@oracle.com> <5463ECC9.10309@oracle.com> <5463EF25.40804@oracle.com> Message-ID: <5463F545.1090207@oracle.com> Thanks Jiangli! David On 13/11/2014 9:37 AM, Jiangli Zhou wrote: > Hi David, > > The change looks good. > > Thanks, > Jiangli > > On 11/12/2014 03:27 PM, David Holmes wrote: >> The CCC for this trivial removal has been removed. >> >> Still need two reviewers please. >> >> David >> >> On 30/10/2014 2:23 PM, David Holmes wrote: >>> On 30/10/2014 2:13 PM, David Holmes wrote: >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8062307 >>>> >>>> Webrev: http://cr.openjdk.java.net/~dholmes/8062307/webrev/ >>>> >>>> It turns out that the little known TraceThreadEvents logic has been >>>> broken since at least very early in JDK 5. A develop-only option it was >>>> intended to show when different Thread methods were called (the VM side >>>> of certain java.lang.Thread methods). While that sounds potentially >>>> useful for debugging it seems that in practice it is not - this has >>>> been >>>> broken for over 10 years with nobody noticing: it is unused. So rather >>>> than fix unused code it is proposed to simply delete it instead. 
>>> >>> Correction this has been noticed in the past: >>> >>> https://bugs.openjdk.java.net/browse/JDK-6757482 (2008-10-09 06:51) >>> >>> http://mail.openjdk.java.net/pipermail/distro-pkg-dev/2008-March/001586.html >>> >>> >>> >>> David >>> >>>> Thanks, >>>> David > From david.holmes at oracle.com Thu Nov 13 02:43:42 2014 From: david.holmes at oracle.com (David Holmes) Date: Thu, 13 Nov 2014 12:43:42 +1000 Subject: [9] RFR (S) 6762191: Setting stack size to 16K causes segmentation fault In-Reply-To: <5463B896.10801@oracle.com> References: <545D939D.2030308@oracle.com> <5463B896.10801@oracle.com> Message-ID: <54641ADE.8030504@oracle.com> Hi Chris, Sorry for the delay. On 13/11/2014 5:44 AM, Chris Plummer wrote: > Hi, > > I'm still looking for reviewers. As the change is to the launcher it needs to be reviewed by the launcher owner - which I think is serviceability (though also cc'd Kumar :) ). Launcher change, and your rationale, seems okay to me. I'd probably put the test in to jdk/test/tools/launcher/ though. Thanks, David > thanks, > > Chris > > On 11/7/14 7:53 PM, Chris Plummer wrote: >> This is an initial review for 6762191. I'm guessing there will be >> recommendations to fix in a different way, but thought this would be a >> good time to start the discussion. >> >> https://bugs.openjdk.java.net/browse/JDK-6762191 >> http://cr.openjdk.java.net/~cjplummer/6762191/webrev.00.jdk/ >> http://cr.openjdk.java.net/~cjplummer/6762191/webrev.00.hotspot/ >> >> The bug is that if the -Xss size is set to something very small (like >> 16k), on linux there will be a crash due to overwriting the end of the >> stack. This happens before hotspot can compute its stack needs and >> verify that the stack is big enough. >> >> It didn't seem viable to move the hotspot stack size check earlier. It >> depends on too much other work done before that point, and the changes >> would have been disruptive. The stack size check is currently done in >> os::init_2(). 
>> >> What is needed is a check before the thread is created. That way we >> can create a thread with a big enough stack to handle all needs up to >> the point of the check in os::init_2(). This initial check does not >> need to be the final check. It just needs to confirm that we have >> enough stack to get us to the check in os::init_2(). >> >> I decided to check in java.c if the -Xss size is too small, and set it >> to a larger size if it is. I hard coded this size to 32k (I'll explain >> why 32k later). I suspect this is the part that will result in some >> debate. If you have better suggestions let me know. If it does stay >> here, then probably the 32k needs to be a #define, and maybe even an >> OS porting interface, but I'm not sure where to put it. >> >> The reason I chose 32k is because this is big enough for all platforms >> to get to the stack size check in os::init_2(). It is also smaller >> than the actual minimum stack size allowed on any platform. 32-bit >> windows has the smallest requirement at 64k. I add some printfs to >> print the minimum stack requirement, and then ran a simple JTReg test >> with every JPRT supported platform to get the results. >> >> The TooSmallStackSize.sh will run "java -version" with -Xss16k, >> -Xss32k, and -XXss, where is the size from the >> error message produced by the JVM, such as in the following: >> >> $ java -Xss32k -version >> The stack size specified is too small, Specify at least 100k >> Error: Could not create the Java Virtual Machine. >> Error: A fatal exception has occurred. Program will exit. >> >> I ran this test through JPRT on all platforms, and they all pass. >> >> One thing to point out is that Windows behaves a bit different than >> the other platforms. It always rounds the stack size up to a multiple >> of 64k , so even if you specify -Xss16k, you get a 64k stack. On >> 32-bit Windows with C1, 64k is also the minimum requirement, so there >> is no error produced in this case. 
However, on 32-bit Windows with C2, >> 68k is the minimum, so an error is produced since the stack will only >> be 64k. There is no bug here. It's just a bit confusing. >> >> thanks, >> >> Chris > From david.holmes at oracle.com Thu Nov 13 02:57:43 2014 From: david.holmes at oracle.com (David Holmes) Date: Thu, 13 Nov 2014 12:57:43 +1000 Subject: hang when using -XX:-UseCompilerSafepoints In-Reply-To: <5463EF88.1050100@oracle.com> References: <5463BF71.4080804@oracle.com> <5463EF88.1050100@oracle.com> Message-ID: <54641E27.4090303@oracle.com> On 13/11/2014 9:38 AM, Vladimir Kozlov wrote: > On 11/12/14 12:13 PM, Aleksey Shipilev wrote: >> Hi, >> >> Still not sure if this is a runtime bug: stripping safepoints from the >> non-counted loop seems to be a recipe for disaster. > > This flag does not affect compiled code - so it is not compiler issue. Well, it disables the mechanism that the compiler inserts for checking if a safepoint has been requested. As I've added to the bug report, disabling compiler safepoints should go hand-in-hand with disabling the compilers (i.e. run with -Xint) - otherwise you have to know that the compiled code will eventually hit a non-compiler safepoint check. > It is only used in runtime/safepoint.cpp and it guards the code which > protects a polling page. > > There are many bugs which shows current problem. For example: > > https://bugs.openjdk.java.net/browse/JDK-6873333 > > I would say that we have to remove it or at least make it experimental > flag if we want to do experiments with it. > > We definitely should not allow to use it in production! If we assume there is a reason it was made a product flag then the correct fix in my opinion would be to fall back to interpreter-only mode when this flag is turned off. If we don't make that assumption then we could still tie it to interpreter-only mode, but we definitely should not make it configurable in product mode without some effort.
Or if we can't ascertain a valid reason for ever wanting to do this, we could simply delete the flag altogether. :) Cheers, David > Regards, > Vladimir > >> >> Anyhow, I think it deserves a simpler example. Submitted the bug and >> attached a simple test there: >> https://bugs.openjdk.java.net/browse/JDK-8064749 >> >> Thanks, >> -Aleksey. >> >> On 12.11.2014 19:52, Deneau, Tom wrote: >>> Hi all -- >>> >>> Forwarding a thread which came about on the jmh-dev mail list, as >>> recommended by Aleksey Shipilev (see below). The JMH framework has a >>> timing control thread which sleeps for a certain period, then sets a >>> volatile isDone variable. Meanwhile, the benchmark thread loops >>> doing its benchmark code and also checking the isDone field. A hang >>> occurs if -XX:-UseCompilerSafepoints is used. >>> >>> The original issue can be reproduced by the following steps >>> >>> hg clone http://hg.openjdk.java.net/code-tools/jmh >>> cd jmh >>> mvn clean install -DskipTests=true >>> cd jmh-samples >>> java -server -XX:-UseCompilerSafepoints -jar >>> target/benchmarks.jar 'JMHSample_01_.*' -t 1 -wi 5 -i 5 -f 0 >>> >>> -- Tom Deneau >>> >>> >>> -----Original Message----- >>> From: Aleksey Shipilev [mailto:aleksey.shipilev at oracle.com] >>> Sent: Wednesday, November 12, 2014 6:09 AM >>> To: Deneau, Tom; jmh-dev at openjdk.java.net >>> Subject: Re: using -XX:-UseCompilerSafepoints >>> >>> Hi Tom, >>> >>> On 11/11/2014 07:34 PM, Deneau, Tom wrote: >>>> It looks like a thread that calls Thread.sleep (as the timing control >>>> thread does in the harness) will eventually go thru >>>> SafepointSynchonize::block (as part of the ThreadBlockInVM >>>> destructor). So if there is a looping benchmark thread compiled >>>> without Compiler Safepoints, the control thread will be blocked and >>>> will never set the isDone flag. 
>>> >>> So, you are saying that without the safepoint in the while(!isDone) >>> loop in workload, control thread and workload thread will never >>> rendezvous on safepoint? I believe this is a bug with >>> -XX:-CompilerSafepoints, because the comment in safepoint.cpp calls this >>> out specifically for VMThread vs. Mutator threads: >>> >>> // In a pathological scenario such as that described in CR6415670 >>> // the VMthread may sleep just before the mutator(s) become safe. >>> // In that case the mutators will be stalled waiting for the safepoint >>> // to complete and the the VMthread will be sleeping, waiting for the >>> // mutators to rendezvous. The VMthread will eventually wake up and >>> // detect that all mutators are safe, at which point we'll again make >>> // progress. >>> >>> If this is a case, you probably need to report this to runtime guys. >>> >>>> This is probably OK, just need to document that CompilerSafepoints >>>> cannot be turned off. >>> >>> I think it is safe to presume something will go hairy if you are using >>> any special VM flag, therefore I am not inclined to document this. >>> >>> Thanks, >>> -Aleksey. >>> >> >> From david.holmes at oracle.com Thu Nov 13 03:10:34 2014 From: david.holmes at oracle.com (David Holmes) Date: Thu, 13 Nov 2014 13:10:34 +1000 Subject: RFR(S) Solaris Full Debug Symbols (FDS) fix for 8033602 and 8034005 In-Reply-To: <546151A9.1080100@oracle.com> References: <546151A9.1080100@oracle.com> Message-ID: <5464212A.6070504@oracle.com> Hi Dan, If you still need a Reviewer, looks okay to me. Thanks, David On 11/11/2014 10:00 AM, Daniel D. Daugherty wrote: > Greetings, > > I have a Solaris Full Debug Symbols (FDS) fix ready for review. > Yes, it is a small fix, but it is in Makefiles so feel free to > run screaming from the room... :-) On the plus side the fix does > delete two work around source files (Coleen would say that's a > Good Thing (TM)!) 
> > The fix is to detect the version of GNU objcopy that is being > used on the machine and only enable Full Debug Symbols when that > version is 2.21.1 or newer. If you don't have the right version, > then the build drops back to pre-FDS build configs with a message > like this: > > WARNING: /usr/sfw/bin/gobjcopy --version info: > WARNING: GNU objcopy 2.15 > WARNING: an objcopy version of 2.21.1 or newer is needed to create valid > .debuginfo files. > WARNING: ignoring above objcopy command. > WARNING: patch 149063-01 or newer contains the correct Solaris 10 SPARC > version. > WARNING: patch 149064-01 or newer contains the correct Solaris 10 X86 > version. > WARNING: Solaris 11 Update 1 contains the correct version. > INFO: no objcopy cmd found so cannot create .debuginfo files. > INFO: ENABLE_FULL_DEBUG_SYMBOLS=0 > > This work is being tracked by the following bug IDs: > > JDK-8033602 wrong stabs data in libjvm.debuginfo on JDK 8 - SPARC > https://bugs.openjdk.java.net/browse/JDK-8033602 > > JDK-8034005 cannot debug in synchronizer.o or objectMonitor.o on > Solaris X86 > https://bugs.openjdk.java.net/browse/JDK-8034005 > > Here is the webrev URL: > > http://cr.openjdk.java.net/~dcubed/8033602-webrev/0-jdk9-hs-rt/ > > Testing: > > - JPRT test jobs to verify that the current JPRT Solaris hosts > are happy > - local builds on my Solaris 10 X86 machine to verify that the > wrong version of GNU objcopy is caught > > Thanks, in advance, for any comments, questions or suggestions. > > Dan From vladimir.kozlov at oracle.com Thu Nov 13 03:39:07 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 12 Nov 2014 19:39:07 -0800 Subject: hang when using -XX:-UseCompilerSafepoints In-Reply-To: <54641E27.4090303@oracle.com> References: <5463BF71.4080804@oracle.com> <5463EF88.1050100@oracle.com> <54641E27.4090303@oracle.com> Message-ID: <546427DB.3070806@oracle.com> I agree that the workaround is -Xint.
But if we disable compilation with -UseCompilerSafepoints, the flag becomes useless. You can get the same result with just -Xint. The history shows that it was added at the very beginning of HotSpot development, on day one. I can only speculate that it was used to find performance effects of safepoints in compiled code. It could be the case that we removed safepoints from counted loops as a result of that investigation. I think it was never intended to be used in production. Although we can fix compilers to generate a runtime call which does a safepoint when -UseCompilerSafepoints is specified, it will be useless work, I think. thanks, Vladimir On 11/12/14 6:57 PM, David Holmes wrote: > On 13/11/2014 9:38 AM, Vladimir Kozlov wrote: >> On 11/12/14 12:13 PM, Aleksey Shipilev wrote: >>> Hi, >>> >>> Still not sure if this is a runtime bug: stripping safepoints from the >>> non-counted loop seems to be a recipe for disaster. >> >> This flag does not affect compiled code - so it is not compiler issue. > > Well, it disables the mechanism that the compiler inserts for checking > if a safepoint has been requested. As I've added to the bug report, > disabling compiler safepoints should go hand-in-hand with disabling the > compilers (ie run with -Xint) - otherwise you have to know that the > compiled code will eventually hit a non-compiler safepoint check. > >> It is only used in runtime/safepoint.cpp and it guards the code which >> protects a polling page. >> >> There are many bugs which shows current problem. For example: >> >> https://bugs.openjdk.java.net/browse/JDK-6873333 >> >> I would say that we have to remove it or at least make it experimental >> flag if we want to do experiments with it. >> >> We definitely should not allow to use it in production! > > If we assume there is a reason it was made a product flag then the > correct fix in my opinion would be to fall back to intepreter-only mode > when this flag is turned off.
> > If we don't make that assumption then we could still tie it to > interpreter-only mode, but we definitely should not make it configurable > in product mode without some effort. > > Or if we can't ascertain a valid reason for ever wanting to do this, we > could simply delete the flag altogether. :) > > Cheers, > David > >> Regards, >> Vladimir >> >>> >>> Anyhow, I think it deserves a simpler example. Submitted the bug and >>> attached a simple test there: >>> https://bugs.openjdk.java.net/browse/JDK-8064749 >>> >>> Thanks, >>> -Aleksey. >>> >>> On 12.11.2014 19:52, Deneau, Tom wrote: >>>> Hi all -- >>>> >>>> Forwarding a thread which came about on the jmh-dev mail list, as >>>> recommended by Aleksey Shipilev (see below). The JMH framework has a >>>> timing control thread which sleeps for a certain period, then sets a >>>> volatile isDone variable. Meanwhile, the benchmark thread loops >>>> doing its benchmark code and also checking the isDone field. A hang >>>> occurs if -XX:-UseCompilerSafepoints is used. >>>> >>>> The original issue can be reproduced by the following steps >>>> >>>> hg clone http://hg.openjdk.java.net/code-tools/jmh >>>> cd jmh >>>> mvn clean install -DskipTests=true >>>> cd jmh-samples >>>> java -server -XX:-UseCompilerSafepoints -jar >>>> target/benchmarks.jar 'JMHSample_01_.*' -t 1 -wi 5 -i 5 -f 0 >>>> >>>> -- Tom Deneau >>>> >>>> >>>> -----Original Message----- >>>> From: Aleksey Shipilev [mailto:aleksey.shipilev at oracle.com] >>>> Sent: Wednesday, November 12, 2014 6:09 AM >>>> To: Deneau, Tom; jmh-dev at openjdk.java.net >>>> Subject: Re: using -XX:-UseCompilerSafepoints >>>> >>>> Hi Tom, >>>> >>>> On 11/11/2014 07:34 PM, Deneau, Tom wrote: >>>>> It looks like a thread that calls Thread.sleep (as the timing control >>>>> thread does in the harness) will eventually go thru >>>>> SafepointSynchonize::block (as part of the ThreadBlockInVM >>>>> destructor). 
So if there is a looping benchmark thread compiled >>>>> without Compiler Safepoints, the control thread will be blocked and >>>>> will never set the isDone flag. >>>> >>>> So, you are saying that without the safepoint in the while(!isDone) >>>> loop in workload, control thread and workload thread will never >>>> rendezvous on safepoint? I believe this is a bug with >>>> -XX:-CompilerSafepoints, because the comment in safepoint.cpp calls >>>> this >>>> out specifically for VMThread vs. Mutator threads: >>>> >>>> // In a pathological scenario such as that described in CR6415670 >>>> // the VMthread may sleep just before the mutator(s) become safe. >>>> // In that case the mutators will be stalled waiting for the >>>> safepoint >>>> // to complete and the the VMthread will be sleeping, waiting for the >>>> // mutators to rendezvous. The VMthread will eventually wake up and >>>> // detect that all mutators are safe, at which point we'll again make >>>> // progress. >>>> >>>> If this is a case, you probably need to report this to runtime guys. >>>> >>>>> This is probably OK, just need to document that CompilerSafepoints >>>>> cannot be turned off. >>>> >>>> I think it is safe to presume something will go hairy if you are using >>>> any special VM flag, therefore I am not inclined to document this. >>>> >>>> Thanks, >>>> -Aleksey. >>>> >>> >>> From daniel.daugherty at oracle.com Thu Nov 13 03:54:31 2014 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Wed, 12 Nov 2014 20:54:31 -0700 Subject: RFR(S) Solaris Full Debug Symbols (FDS) fix for 8033602 and 8034005 In-Reply-To: <5464212A.6070504@oracle.com> References: <546151A9.1080100@oracle.com> <5464212A.6070504@oracle.com> Message-ID: <54642B77.3030607@oracle.com> Thanks! I was still in need of a (R)eviewer and a Runtime team member so thanks for covering both... Dan On 11/12/14 8:10 PM, David Holmes wrote: > Hi Dan, > > If you still need a Reviewer, looks okay to me. 
> > Thanks, > David > > On 11/11/2014 10:00 AM, Daniel D. Daugherty wrote: >> Greetings, >> >> I have a Solaris Full Debug Symbols (FDS) fix ready for review. >> Yes, it is a small fix, but it is in Makefiles so feel free to >> run screaming from the room... :-) On the plus side the fix does >> delete two work around source files (Coleen would say that's a >> Good Thing (TM)!) >> >> The fix is to detect the version of GNU objcopy that is being >> used on the machine and only enable Full Debug Symbols when that >> version is 2.21.1 or newer. If you don't have the right version, >> then the build drops back to pre-FDS build configs with a message >> like this: >> >> WARNING: /usr/sfw/bin/gobjcopy --version info: >> WARNING: GNU objcopy 2.15 >> WARNING: an objcopy version of 2.21.1 or newer is needed to create valid >> .debuginfo files. >> WARNING: ignoring above objcopy command. >> WARNING: patch 149063-01 or newer contains the correct Solaris 10 SPARC >> version. >> WARNING: patch 149064-01 or newer contains the correct Solaris 10 X86 >> version. >> WARNING: Solaris 11 Update 1 contains the correct version. >> INFO: no objcopy cmd found so cannot create .debuginfo files. >> INFO: ENABLE_FULL_DEBUG_SYMBOLS=0 >> >> This work is being tracked by the following bug IDs: >> >> JDK-8033602 wrong stabs data in libjvm.debuginfo on JDK 8 - SPARC >> https://bugs.openjdk.java.net/browse/JDK-8033602 >> >> JDK-8034005 cannot debug in synchronizer.o or objectMonitor.o on >> Solaris X86 >> https://bugs.openjdk.java.net/browse/JDK-8034005 >> >> Here is the webrev URL: >> >> http://cr.openjdk.java.net/~dcubed/8033602-webrev/0-jdk9-hs-rt/ >> >> Testing: >> >> - JPRT test jobs to verify that the current JPRT Solaris hosts >> are happy >> - local builds on my Solaris 10 X86 machine to verify that the >> wrong version of GNU objcopy is caught >> >> Thanks, in advance, for any comments, questions or suggestions. 
>> >> Dan From david.holmes at oracle.com Thu Nov 13 05:40:26 2014 From: david.holmes at oracle.com (David Holmes) Date: Thu, 13 Nov 2014 15:40:26 +1000 Subject: hang when using -XX:-UseCompilerSafepoints In-Reply-To: <546427DB.3070806@oracle.com> References: <5463BF71.4080804@oracle.com> <5463EF88.1050100@oracle.com> <54641E27.4090303@oracle.com> <546427DB.3070806@oracle.com> Message-ID: <5464444A.7030601@oracle.com> Hi Vladimir, On 13/11/2014 1:39 PM, Vladimir Kozlov wrote: > I agree that the workaround is -Xint. But if we disable compilation with > -UseCompilerSafepoints, the flag becomes useless. You can get the same > result with just -Xint. > > The history shows that it was added at the very beginning of Hotspot > development, on day one. I can only speculate that it was used to > find performance effects of safepoints in compiled code. It could be > the case that we removed safepoints from Counted loops as a result of that > investigation. I think it was never intended to be used in production. > > Although we can fix compilers to generate a runtime call which does > safepoint when -UseCompilerSafepoints is specified, it will be useless > work, I think. There is some history in JDK-4974572 (which is non-public I'm afraid). To all intents and purposes the flag at that point was used to enable testing of workarounds if problems were suspected in the "new" safepointing code. I think it has outlived its usefulness by a few major releases so I'm happy to see it go. Cheers, David > thanks, > Vladimir > > On 11/12/14 6:57 PM, David Holmes wrote: >> On 13/11/2014 9:38 AM, Vladimir Kozlov wrote: >>> On 11/12/14 12:13 PM, Aleksey Shipilev wrote: >>>> Hi, >>>> >>>> Still not sure if this is a runtime bug: stripping safepoints from the >>>> non-counted loop seems to be a recipe for disaster. >>> >>> This flag does not affect compiled code - so it is not a compiler issue.
>> >> Well, it disables the mechanism that the compiler inserts for checking >> if a safepoint has been requested. As I've added to the bug report, >> disabling compiler safepoints should go hand-in-hand with disabling the >> compilers (i.e. run with -Xint) - otherwise you have to know that the >> compiled code will eventually hit a non-compiler safepoint check. >> >>> It is only used in runtime/safepoint.cpp and it guards the code which >>> protects a polling page. >>> >>> There are many bugs which show the current problem. For example: >>> >>> https://bugs.openjdk.java.net/browse/JDK-6873333 >>> >>> I would say that we have to remove it or at least make it an experimental >>> flag if we want to do experiments with it. >>> >>> We definitely should not allow it to be used in production! >> >> If we assume there is a reason it was made a product flag then the >> correct fix in my opinion would be to fall back to interpreter-only mode >> when this flag is turned off. >> >> If we don't make that assumption then we could still tie it to >> interpreter-only mode, but we definitely should not make it configurable >> in product mode without some effort. >> >> Or if we can't ascertain a valid reason for ever wanting to do this, we >> could simply delete the flag altogether. :) >> >> Cheers, >> David >> >>> Regards, >>> Vladimir >>> >>>> >>>> Anyhow, I think it deserves a simpler example. Submitted the bug and >>>> attached a simple test there: >>>> https://bugs.openjdk.java.net/browse/JDK-8064749 >>>> >>>> Thanks, >>>> -Aleksey. >>>> >>>> On 12.11.2014 19:52, Deneau, Tom wrote: >>>>> Hi all -- >>>>> >>>>> Forwarding a thread which came about on the jmh-dev mail list, as >>>>> recommended by Aleksey Shipilev (see below). The JMH framework has a >>>>> timing control thread which sleeps for a certain period, then sets a >>>>> volatile isDone variable. Meanwhile, the benchmark thread loops >>>>> doing its benchmark code and also checking the isDone field.
A hang >>>>> occurs if -XX:-UseCompilerSafepoints is used. >>>>> >>>>> The original issue can be reproduced by the following steps >>>>> >>>>> hg clone http://hg.openjdk.java.net/code-tools/jmh >>>>> cd jmh >>>>> mvn clean install -DskipTests=true >>>>> cd jmh-samples >>>>> java -server -XX:-UseCompilerSafepoints -jar >>>>> target/benchmarks.jar 'JMHSample_01_.*' -t 1 -wi 5 -i 5 -f 0 >>>>> >>>>> -- Tom Deneau >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Aleksey Shipilev [mailto:aleksey.shipilev at oracle.com] >>>>> Sent: Wednesday, November 12, 2014 6:09 AM >>>>> To: Deneau, Tom; jmh-dev at openjdk.java.net >>>>> Subject: Re: using -XX:-UseCompilerSafepoints >>>>> >>>>> Hi Tom, >>>>> >>>>> On 11/11/2014 07:34 PM, Deneau, Tom wrote: >>>>>> It looks like a thread that calls Thread.sleep (as the timing control >>>>>> thread does in the harness) will eventually go thru >>>>>> SafepointSynchonize::block (as part of the ThreadBlockInVM >>>>>> destructor). So if there is a looping benchmark thread compiled >>>>>> without Compiler Safepoints, the control thread will be blocked and >>>>>> will never set the isDone flag. >>>>> >>>>> So, you are saying that without the safepoint in the while(!isDone) >>>>> loop in workload, control thread and workload thread will never >>>>> rendezvous on safepoint? I believe this is a bug with >>>>> -XX:-CompilerSafepoints, because the comment in safepoint.cpp calls >>>>> this >>>>> out specifically for VMThread vs. Mutator threads: >>>>> >>>>> // In a pathological scenario such as that described in CR6415670 >>>>> // the VMthread may sleep just before the mutator(s) become safe. >>>>> // In that case the mutators will be stalled waiting for the >>>>> safepoint >>>>> // to complete and the the VMthread will be sleeping, waiting for >>>>> the >>>>> // mutators to rendezvous. The VMthread will eventually wake up and >>>>> // detect that all mutators are safe, at which point we'll again >>>>> make >>>>> // progress. 
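The pattern under discussion can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the JMH harness: the class name IsDoneLoop and the method runFor are hypothetical, and the calling thread stands in for the timing control thread. With default VM settings the loop below terminates; the hang reported in this thread is specific to running with -XX:-UseCompilerSafepoints on a JDK build that still accepts that flag.

```java
public class IsDoneLoop {
    // Flag written by the control thread and polled by the worker thread.
    static volatile boolean isDone;

    // Runs a busy worker for roughly 'millis' milliseconds and returns
    // how many times the worker went around its loop.
    static long runFor(long millis) {
        isDone = false;
        final long[] iterations = new long[1];
        Thread worker = new Thread(() -> {
            long n = 0;
            // Non-counted loop: the only exit condition is the volatile flag.
            while (!isDone) {
                n++;
            }
            iterations[0] = n;
        }, "worker");
        worker.start();
        try {
            // The control side blocks in sleep, as the harness thread does.
            Thread.sleep(millis);
            isDone = true;
            // join() makes the worker's write to iterations[0] visible here.
            worker.join();
        } catch (InterruptedException e) {
            isDone = true;
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while timing", e);
        }
        return iterations[0];
    }

    public static void main(String[] args) {
        System.out.println("iterations: " + runFor(100));
    }
}
```

The worker never calls into the VM inside the loop, so whether it can be brought to a safepoint depends on the polls the compiler emits, which is the mechanism the flag disables.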
>>>>> >>>>> If this is the case, you probably need to report this to the runtime guys. >>>>> >>>>>> This is probably OK, just need to document that CompilerSafepoints >>>>>> cannot be turned off. >>>>> >>>>> I think it is safe to presume something will go hairy if you are using >>>>> any special VM flag, therefore I am not inclined to document this. >>>>> >>>>> Thanks, >>>>> -Aleksey. >>>>> >>>> >>>> From kirk at kodewerk.com Thu Nov 13 06:41:32 2014 From: kirk at kodewerk.com (Kirk Pepperdine) Date: Thu, 13 Nov 2014 07:41:32 +0100 Subject: hang when using -XX:-UseCompilerSafepoints In-Reply-To: <5463F37E.7020804@oracle.com> References: <5463BF71.4080804@oracle.com> <5463EF88.1050100@oracle.com> <5463F37E.7020804@oracle.com> Message-ID: > > I agree that demoting this flag from "product" to "experimental" sets > the expectations about its impact right. +1 -- Kirk From yumin.qi at oracle.com Thu Nov 13 06:52:32 2014 From: yumin.qi at oracle.com (Yumin Qi) Date: Wed, 12 Nov 2014 22:52:32 -0800 Subject: RFR: 8038468: java/lang/instrument/ParallelTransformerLoader.sh fails with ClassCircularityError In-Reply-To: <3CF28613-0C0F-44E2-869A-FF5B01D7E575@oracle.com> References: <543C591E.8010602@oracle.com> <544AB477.4000204@oracle.com> <544ADC07.6080904@oracle.com> <544AE76A.9030701@oracle.com> <544E5123.1060202@oracle.com> <544E8844.1070907@oracle.com> <0FB37288-5995-4E0B-B005-E00A4FB3B22A@oracle.com> <5454218D.40009@oracle.com> <5456EADF.4050203@oracle.com> <3CF28613-0C0F-44E2-869A-FF5B01D7E575@oracle.com> Message-ID: <54645530.6010107@oracle.com> Thanks, Karen Now I have a standalone test with which the failure is easy to reproduce. I am trying to set up a debugger to trace the problem. Meanwhile, I will try your suggested fix too. When AbortVMOnException is set, it is too late for the debugger to attach since the execution goes straight to abort. Currently it is not easy to make a single run fail (we loop in the test script).
Thanks Yumin On 11/12/2014 8:27 AM, Karen Kinnear wrote: > I think there are three things we need to figure out. > > 1. I reproduced a problem in TestThread2. Below was the information from that and my analysis > - all - comments on my analysis are very welcome > - Yumin - please try the suggested test change below to see if it helps. > > - that is the only example I have seen the full details for. > > 2. Does the circularity error actually occur in the main thread and if so why? > - need to catch in a debugger/hs_err file a situation in which this occurs in the main thread please. > We need the full stack trace for this - native and java please > - run this without the test change I suggested please > - try to catch ClassCircularityError in the main thread > > 3. figure out why we see this problem more frequently > - I am not convinced this problem didn't already exist - the test logic has some very odd comments and workarounds which seem to imply there > were intermittent problems from the beginning > - that said - worth figuring out if for instance, the sun.misc.URLClassPath logic was rewritten (and when) to add $JarLoader$2 > - and looking at the history of test failure > > thanks, > Karen > > On Nov 2, 2014, at 9:39 PM, David Holmes wrote: > >> On 1/11/2014 9:55 AM, Yumin Qi wrote: >>> Karen, >>> >>> Thanks for your detailed message for debugging. Yes, from my debugging, >>> the exception did happen in TestThread rather than the main thread. I have no >>> idea why in the end the exception was reported in the main thread. >> Until that question is answered I will remain uneasy about simply tweaking the test until it no longer fails. I would also like to know when it started failing - Karen alludes to the possible introduction of a new inner class at some point.
>> >> Thanks, >> David >> >>> You mention >>> >>> So that change to the test would be: >>> in TestTransformer: >>> if (loader != null) { >>> if (tName.equals("TestThread")) { >>> { >>> loadClasses(3); >>> } >>> } >>> return null; >>> } >>> >>> >>> The loader is the one defined in the test case, right? The system class >>> loader is never null. >>> I will try this change, let's see if it can work it out. >>> >>> Thanks >>> Yumin >>> >>> On 10/31/2014 3:29 PM, Karen Kinnear wrote: >>>> Yumin, >>>> >>>> From your earlier exception stack trace (many thanks) you reported: >>>> >>>> Exception in thread "main" java.lang.ClassCircularityError: (no - I >>>> don't know why this is in thread "main") >>>> sun/misc/URLClassPath$JarLoader$2 >>>> at sun.misc.URLClassPath$JarLoader.checkResource(URLClassPath.java:771) >>>> at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:843) >>>> at sun.misc.URLClassPath.getResource(URLClassPath.java:199) >>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:364) >>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:361) >>>> at java.security.AccessController.doPrivileged(Native Method) >>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:360) >>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:426) >>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:359) >>>> at java.lang.Class.forName0(Native Method) >>>> at java.lang.Class.forName(Class.java:340) >>>> at >>>> ParallelTransformerLoaderApp.loadClasses(ParallelTransformerLoaderApp.java:83) >>>> >>>> at >>>> ParallelTransformerLoaderApp.main(ParallelTransformerLoaderApp.java:45) >>>> >>>> >>>> So I ran with -XX:AbortVMOnException=java.lang.ClassCircularityError >>>> -XX:+ShowMessageBoxOnError to get >>>> a log file and stack trace. See my instructions below on how to do that. 
>>>> >>>> I did this, attached a debugger, which didn't help enough since I >>>> needed to see the java stack frames, >>>> and got an hs_err_log also, so the stack traces came from the error >>>> log. >>>> >>>> The stack trace was on Thread 2, which in the hs_err_log was >>>> TestThread (which makes sense for what the test logic says). >>>> See later in email for stack traces from Thread 2. >>>> >>>> Summary of stack trace: >>>> >>>> TestThread: >>>> loadClasses(#) -> forName(TestClass#, URLClassLoader) >>>> vm calls out to URLClassLoader.loadClass(String) which is >>>> inherited from java.lang.ClassLoader.loadClass(String) >>>> ... calls java.net.URLClassLoader.findClass(...) which calls >>>> DoPrivileged java.net.URLClassLoader$1.run which calls >>>> sun.misc.URLClassPath.getResource(name, false) which calls >>>> sun.misc.URLClassPath$JarLoader.getResource which calls >>>> sun.misc.URLClassPath$JarLoader.checkResource which >>>> tries to call sun.misc.URLClassPath$JarLoader$2 >>>> - and then the transformer jumps in with loadClasses(# (which we >>>> know is 3) and walks the same logic which tries to load >>>> sun.misc.URLClassPath$JarLoader$2 again >>>> >>>> Note that in the placeholder table information that Yumin printed, the >>>> circularity error is on sun.misc.URLClassPath$JarLoader$2 with the >>>> null == boot loader, which >>>> makes sense -- that is the appropriate defining loader, and therefore >>>> the one the CFLH would intercept during the defineClass phase. >>>> >>>> In the sun.misc.URLClassPath.java file, in the class JarLoader, in the >>>> method checkResource >>>> ... return new Resource() { ... } >>>> -- I do not know why that generates sun.misc.URLClassPath$JarLoader$1, >>>> $2 and $3 at build time or when that was added. >>>> I would guess that is when the bug started happening. >>>> >>>> When I have a successful run, sun.misc.URLClassPath$JarLoader$2 loads >>>> before any TestClass1 loads. 
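On the question of where $JarLoader$1, $2 and $3 come from: javac compiles each anonymous class body into its own class file, named after the immediately enclosing class plus '$' and a number. A minimal sketch, assuming hypothetical stand-ins (AnonymousNaming, JarLoaderLike, Resource) rather than the real sun.misc types; which number a given anonymous class receives is the compiler's choice and is not relied on here.

```java
public class AnonymousNaming {
    // Hypothetical stand-in for an abstract resource type; not the JDK class.
    public interface Resource {
        String name();
    }

    // Hypothetical stand-in for a nested loader class such as JarLoader.
    public static class JarLoaderLike {
        public Resource first() {
            // Anonymous class: compiled to AnonymousNaming$JarLoaderLike$<number>.
            return new Resource() {
                @Override
                public String name() { return "first"; }
            };
        }

        public Resource second() {
            // A second anonymous class gets its own, different number.
            return new Resource() {
                @Override
                public String name() { return "second"; }
            };
        }
    }

    // Returns the binary name the VM uses when it loads the object's class.
    public static String binaryNameOf(Object o) {
        return o.getClass().getName();
    }

    public static void main(String[] args) {
        JarLoaderLike loader = new JarLoaderLike();
        System.out.println(binaryNameOf(loader.first()));
        System.out.println(binaryNameOf(loader.second()));
    }
}
```

Each such class is loaded separately on first use, which is why executing a `new Resource() { ... }` expression for the first time triggers a class load of its own.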
>>>> >>>> My belief is that the point of the test is to test parallel class >>>> loading for URL class loaders. >>>> I don't think the point is to test the bootstrap class loader, nor to >>>> test bootstrapping - i.e. running the agent before >>>> we have loaded sufficient classes to allow loading URLClassLoader >>>> classes. >>>> >>>> What I suggested to Yumin that he try would be to change the test to >>>> NOT intercept boot loader loads, so that >>>> sun.misc.URLClassPath$JarLoader$# >>>> can load which will in turn allow classes loaded by a URLClassLoader >>>> subclass to load. >>>> >>>> So that change to the test would be: >>>> in TestTransformer: >>>> if (loader != null) { >>>> if (tName.equals("TestThread")) { >>>> { >>>> loadClasses(3); >>>> } >>>> } >>>> return null; >>>> } >>>> // I also suspect with that change, we can remove the sleep loop >>>> Note: there was a printed message which said that the Thread "Signal >>>> Dispatcher" has called transform(), which I >>>> ignored, however it is good that we don't call loadClass on that >>>> thread - which is part of what the sleep loop does - >>>> but that would be handled by the boot loader screening above >>>> >>>> Alternatively we can preload the URLClassPath classes, but I don't >>>> think we want to do that, or >>>> we can have the agent explicitly screen on a variety of jdk >>>> bootstrapping classes. But I think the cleaner >>>> solution is to screen on the boot loader. >>>> >>>> Does that make any sense to others? >>>> >>>> thanks, >>>> Karen >>>> >>>> p.s. How to run with hotspot flags (jtreg has a -show:rerun option, >>>> but with a shell script in the test, this is more complex, so >>>> the following should be easier): >>>> >>>> So what I did was run the test once for it to pass (not your script, >>>> but just once with jtreg) so that it generated >>>> the $DST/work directory. >>>> I then created a rerun.csh script - attached - you can modify for your >>>> own $DST directory. 
>>>> I used it to be able to quickly rerun the test without the jtreg >>>> framework and compile time etc. but mostly >>>> to be able to actually add hotspot command-line flags. >>>> >>>> >>>> >>>> >>>> p.p.s. details from the error log (let me know if you want me to >>>> attach the error log to the bug report) >>>> >>>> note: error log shows last 10 events including: >>>> Event: 0.928 loading class sun/misc/URLClassPath$JarLoader$2 >>>> Event: 0.928 loading class TestClass3 >>>> Event: 0.929 loading class TestClass3 done >>>> Event: 0.929 loading class java/lang/ClassCircularityError >>>> Event: 0.929 loading class java/lang/ClassCircularityError done >>>> >>>> TestThread >>>> >>>> java frames: >>>> >>>> j >>>> sun.misc.URLClassPath$JarLoader.checkResource(Ljava/lang/String;ZLjava/util/jar/JarEntry;)Lsun/misc/Resource;+42 >>>> >>>> j >>>> sun.misc.URLClassPath$JarLoader.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+54 >>>> >>>> j >>>> sun.misc.URLClassPath.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+53 >>>> >>>> j java.net.URLClassLoader$1.run()Ljava/lang/Class;+26 >>>> j java.net.URLClassLoader$1.run()Ljava/lang/Object;+1 >>>> v ~StubRoutines::call_stub >>>> j >>>> java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;+0 >>>> >>>> j >>>> java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class;+13 >>>> j >>>> java.lang.ClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;+70 >>>> j java.lang.ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class;+3 >>>> v ~StubRoutines::call_stub >>>> j >>>> java.lang.Class.forName0(Ljava/lang/String;ZLjava/lang/ClassLoader;Ljava/lang/Class;)Ljava/lang/Class;+0 >>>> >>>> j >>>> java.lang.Class.forName(Ljava/lang/String;ZLjava/lang/ClassLoader;)Ljava/lang/Class;+49 >>>> >>>> j ParallelTransformerLoaderAgent$TestTransformer.loadClasses(I)V+25 >>>> j >>>> 
ParallelTransformerLoaderAgent$TestTransformer.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[B)[B+81 >>>> >>>> j >>>> sun.instrument.TransformerManager.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[B)[B+50 >>>> >>>> j >>>> sun.instrument.InstrumentationImpl.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[BZ)[B+34 >>>> >>>> v ~StubRoutines::call_stub >>>> j >>>> sun.misc.URLClassPath$JarLoader.checkResource(Ljava/lang/String;ZLjava/util/jar/JarEntry;)Lsun/misc/Resource;+42 >>>> >>>> j >>>> sun.misc.URLClassPath$JarLoader.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+54 >>>> >>>> j >>>> sun.misc.URLClassPath.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+53 >>>> >>>> j java.net.URLClassLoader$1.run()Ljava/lang/Class;+26 >>>> j java.net.URLClassLoader$1.run()Ljava/lang/Object;+1 >>>> v ~StubRoutines::call_stub >>>> j >>>> java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;+0 >>>> >>>> j >>>> java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class;+13 >>>> j >>>> java.lang.ClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;+70 >>>> j java.lang.ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class;+3 >>>> v ~StubRoutines::call_stub >>>> j >>>> java.lang.Class.forName0(Ljava/lang/String;ZLjava/lang/ClassLoader;Ljava/lang/Class;)Ljava/lang/Class;+0 >>>> >>>> j >>>> java.lang.Class.forName(Ljava/lang/String;ZLjava/lang/ClassLoader;)Ljava/lang/Class;+49 >>>> >>>> j ParallelTransformerLoaderApp.loadClasses(I)V+25 >>>> j ParallelTransformerLoaderApp$TestThread.run()V+4 >>>> v ~StubRoutines::call_stub >>>> >>>> >>>> >>>> detailed frames: >>>> >>>> V [libjvm.so+0x760f5a] Exceptions::_throw_msg(Thread*, char const*, >>>> int, Symbol*, char const*)+0x7c >>>> V [libjvm.so+0xce005c] >>>> 
SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, >>>> Handle, Thread*)+0x7d8 >>>> V [libjvm.so+0xcde9e5] SystemDictionary::resolve_or_null(Symbol*, >>>> Handle, Handle, Thread*)+0x26d >>>> V [libjvm.so+0xcde435] SystemDictionary::resolve_or_fail(Symbol*, >>>> Handle, Handle, bool, Thread*)+0x39 >>>> V [libjvm.so+0x690fbc] >>>> ConstantPool::klass_at_impl(constantPoolHandle, int, Thread*)+0x3cc >>>> V [libjvm.so+0x5398cb] ConstantPool::klass_at(int, Thread*)+0x55 >>>> V [libjvm.so+0x8b1f3c] InterpreterRuntime::_new(JavaThread*, >>>> ConstantPool*, int)+0x14a >>>> j >>>> sun.misc.URLClassPath$JarLoader.checkResource(Ljava/lang/String;ZLjava/util/jar/JarEntry;)Lsun/misc/Resource;+42 >>>> >>>> j >>>> sun.misc.URLClassPath$JarLoader.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+54 >>>> >>>> j >>>> sun.misc.URLClassPath.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+53 >>>> >>>> j java.net.URLClassLoader$1.run()Ljava/lang/Class;+26 >>>> j java.net.URLClassLoader$1.run()Ljava/lang/Object;+1 >>>> v ~StubRoutines::call_stub >>>> V [libjvm.so+0x8c3060] JavaCalls::call_helper(JavaValue*, >>>> methodHandle*, JavaCallArguments*, Thread*)+0x6b2 >>>> V [libjvm.so+0xba06bc] os::os_exception_wrapper(void (*)(JavaValue*, >>>> methodHandle*, JavaCallArguments*, Thread*), JavaValue*, >>>> methodHandle*, JavaCallArguments*, Thread*)+0x3a >>>> V [libjvm.so+0x8c29a7] JavaCalls::call(JavaValue*, methodHandle, >>>> JavaCallArguments*, Thread*)+0x7d >>>> V [libjvm.so+0x972a80] JVM_DoPrivileged+0x63d >>>> j >>>> java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;+0 >>>> >>>> j >>>> java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class;+13 >>>> j >>>> java.lang.ClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;+70 >>>> j java.lang.ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class;+3 >>>> v ~StubRoutines::call_stub >>>> V [libjvm.so+0x8c3060] 
JavaCalls::call_helper(JavaValue*, >>>> methodHandle*, JavaCallArguments*, Thread*)+0x6b2 >>>> V [libjvm.so+0xba06bc] os::os_exception_wrapper(void (*)(JavaValue*, >>>> methodHandle*, JavaCallArguments*, Thread*), JavaValue*, >>>> methodHandle*, JavaCallArguments*, Thread*)+0x3a >>>> V [libjvm.so+0x8c29a7] JavaCalls::call(JavaValue*, methodHandle, >>>> JavaCallArguments*, Thread*)+0x7d >>>> V [libjvm.so+0x8c1ec7] JavaCalls::call_virtual(JavaValue*, >>>> KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x1cb >>>> V [libjvm.so+0x8c2086] JavaCalls::call_virtual(JavaValue*, Handle, >>>> KlassHandle, Symbol*, Symbol*, Handle, Thread*)+0xb0 >>>> V [libjvm.so+0xce2096] >>>> SystemDictionary::load_instance_class(Symbol*, Handle, Thread*)+0x4ea >>>> V [libjvm.so+0xce00a8] >>>> SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, >>>> Handle, Thread*)+0x824 >>>> V [libjvm.so+0xcde9e5] SystemDictionary::resolve_or_null(Symbol*, >>>> Handle, Handle, Thread*)+0x26d >>>> V [libjvm.so+0xcde435] SystemDictionary::resolve_or_fail(Symbol*, >>>> Handle, Handle, bool, Thread*)+0x39 >>>> V [libjvm.so+0x98c89e] find_class_from_class_loader(JNIEnv_*, >>>> Symbol*, unsigned char, Handle, Handle, unsigned char, Thread*)+0x49 >>>> V [libjvm.so+0x96f681] JVM_FindClassFromCaller+0x39d >>>> C [libjava.so+0xdfd0] Java_java_lang_Class_forName0+0x130 >>>> j >>>> java.lang.Class.forName0(Ljava/lang/String;ZLjava/lang/ClassLoader;Ljava/lang/Class;)Ljava/lang/Class;+0 >>>> >>>> j >>>> java.lang.Class.forName(Ljava/lang/String;ZLjava/lang/ClassLoader;)Ljava/lang/Class;+49 >>>> >>>> j ParallelTransformerLoaderAgent$TestTransformer.loadClasses(I)V+25 >>>> j >>>> ParallelTransformerLoaderAgent$TestTransformer.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[B)[B+81 >>>> >>>> j >>>> sun.instrument.TransformerManager.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[B)[B+50 >>>> 
>>>> j >>>> sun.instrument.InstrumentationImpl.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[BZ)[B+34 >>>> >>>> v ~StubRoutines::call_stub >>>> V [libjvm.so+0x8c3060] JavaCalls::call_helper(JavaValue*, >>>> methodHandle*, JavaCallArguments*, Thread*)+0x6b2 >>>> V [libjvm.so+0xba06bc] os::os_exception_wrapper(void (*)(JavaValue*, >>>> methodHandle*, JavaCallArguments*, Thread*), JavaValue*, >>>> methodHandle*, JavaCallArguments*, Thread*)+0x3a >>>> V [libjvm.so+0x8c29a7] JavaCalls::call(JavaValue*, methodHandle, >>>> JavaCallArguments*, Thread*)+0x7d >>>> V [libjvm.so+0x911bfb] jni_invoke_nonstatic(JNIEnv_*, JavaValue*, >>>> _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*)+0x3cd >>>> V [libjvm.so+0x916918] jni_CallObjectMethod+0x388 >>>> C [libinstrument.so+0x4eb5] transformClassFile+0x1e5 >>>> C [libinstrument.so+0x1e06] eventHandlerClassFileLoadHook+0x96 >>>> V [libjvm.so+0xa04afa] >>>> JvmtiClassFileLoadHookPoster::post_to_env(JvmtiEnv*, bool)+0x1a8 >>>> V [libjvm.so+0xa0485e] >>>> JvmtiClassFileLoadHookPoster::post_all_envs()+0x8a >>>> V [libjvm.so+0xa047c6] JvmtiClassFileLoadHookPoster::post()+0x18 >>>> V [libjvm.so+0x9fb6e1] >>>> JvmtiExport::post_class_file_load_hook(Symbol*, Handle, Handle, >>>> unsigned char**, unsigned char**, JvmtiCachedClassFileData**)+0x85 >>>> V [libjvm.so+0x5cd17d] ClassFileParser::parseClassFile(Symbol*, >>>> ClassLoaderData*, Handle, KlassHandle, GrowableArray*, >>>> TempNewSymbol&, bool, Thread*)+0x2af >>>> V [libjvm.so+0x5dd441] ClassFileParser::parseClassFile(Symbol*, >>>> ClassLoaderData*, Handle, TempNewSymbol&, bool, Thread*)+0x95 >>>> V [libjvm.so+0x5daf03] ClassLoader::load_classfile(Symbol*, >>>> Thread*)+0x2ed >>>> V [libjvm.so+0xce1cc4] >>>> SystemDictionary::load_instance_class(Symbol*, Handle, Thread*)+0x118 >>>> V [libjvm.so+0xce00a8] >>>> SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, >>>> Handle, Thread*)+0x824 >>>> V 
[libjvm.so+0xcde9e5] SystemDictionary::resolve_or_null(Symbol*, >>>> Handle, Handle, Thread*)+0x26d >>>> V [libjvm.so+0xcde435] SystemDictionary::resolve_or_fail(Symbol*, >>>> Handle, Handle, bool, Thread*)+0x39 >>>> V [libjvm.so+0x690fbc] >>>> ConstantPool::klass_at_impl(constantPoolHandle, int, Thread*)+0x3cc >>>> V [libjvm.so+0x5398cb] ConstantPool::klass_at(int, Thread*)+0x55 >>>> V [libjvm.so+0x8b1f3c] InterpreterRuntime::_new(JavaThread*, >>>> ConstantPool*, int)+0x14a >>>> j >>>> sun.misc.URLClassPath$JarLoader.checkResource(Ljava/lang/String;ZLjava/util/jar/JarEntry;)Lsun/misc/Resource;+42 >>>> >>>> j >>>> sun.misc.URLClassPath$JarLoader.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+54 >>>> >>>> j >>>> sun.misc.URLClassPath.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+53 >>>> >>>> j java.net.URLClassLoader$1.run()Ljava/lang/Class;+26 >>>> j java.net.URLClassLoader$1.run()Ljava/lang/Object;+1 >>>> v ~StubRoutines::call_stub >>>> V [libjvm.so+0x8c3060] JavaCalls::call_helper(JavaValue*, >>>> methodHandle*, JavaCallArguments*, Thread*)+0x6b2 >>>> V [libjvm.so+0xba06bc] os::os_exception_wrapper(void (*)(JavaValue*, >>>> methodHandle*, JavaCallArguments*, Thread*), JavaValue*, >>>> methodHandle*, JavaCallArguments*, Thread*)+0x3a >>>> V [libjvm.so+0x8c29a7] JavaCalls::call(JavaValue*, methodHandle, >>>> JavaCallArguments*, Thread*)+0x7d >>>> V [libjvm.so+0x972a80] JVM_DoPrivileged+0x63d >>>> j >>>> java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;+0 >>>> >>>> j >>>> java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class;+13 >>>> j >>>> java.lang.ClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;+70 >>>> j java.lang.ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class; >>>> v ~StubRoutines::call_stub >>>> V [libjvm.so+0x8c3060] JavaCalls::call_helper(JavaValue*, >>>> methodHandle*, JavaCallArguments*, Thread*)+0x6b2 >>>> V 
[libjvm.so+0xba06bc] os::os_exception_wrapper(void (*)(JavaValue*, >>>> methodHandle*, JavaCallArguments*, Thread*), JavaValue*, >>>> methodHandle*, JavaCallArguments*, Thread*)+0x3a >>>> V [libjvm.so+0x8c29a7] JavaCalls::call(JavaValue*, methodHandle, >>>> JavaCallArguments*, Thread*)+0x7d >>>> V [libjvm.so+0x8c1ec7] JavaCalls::call_virtual(JavaValue*, >>>> KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x1cb >>>> V [libjvm.so+0x8c2086] JavaCalls::call_virtual(JavaValue*, Handle, >>>> KlassHandle, Symbol*, Symbol*, Handle, Thread*)+0xb0 >>>> V [libjvm.so+0xce2096] >>>> SystemDictionary::load_instance_class(Symbol*, Handle, Thread*)+0x4ea >>>> V [libjvm.so+0xce00a8] >>>> SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, >>>> Handle, Thread*)+0x824 >>>> V [libjvm.so+0xcde9e5] SystemDictionary::resolve_or_null(Symbol*, >>>> Handle, Handle, Thread*)+0x26d >>>> V [libjvm.so+0xcde435] SystemDictionary::resolve_or_fail(Symbol*, >>>> Handle, Handle, bool, Thread*)+0x39 >>>> V [libjvm.so+0x98c89e] find_class_from_class_loader(JNIEnv_*, >>>> Symbol*, unsigned char, Handle, Handle, unsigned char, Thread*)+0x49 >>>> V [libjvm.so+0x96f681] JVM_FindClassFromCaller+0x39d >>>> C [libjava.so+0xdfd0] Java_java_lang_Class_forName0+0x130 >>>> j >>>> java.lang.Class.forName0(Ljava/lang/String;ZLjava/lang/ClassLoader;Ljava/lang/Class;)Ljava/lang/Class;+0 >>>> >>>> j >>>> java.lang.Class.forName(Ljava/lang/String;ZLjava/lang/ClassLoader;)Ljava/lang/Class;+49 >>>> >>>> j ParallelTransformerLoaderApp.loadClasses(I)V+25 >>>> ...... >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Oct 27, 2014, at 2:00 PM, serguei.spitsyn at oracle.com wrote: >>>> >>>>> Ok. >>>>> >>>>> Thanks, Dan! >>>>> Serguei >>>>> >>>>> >>>>> On 10/27/14 7:05 AM, Daniel D. Daugherty wrote: >>>>>>> The test case was added by Dan. >>>>>>> We may want to ask him to clarify the test case purpose. 
>>>>>>> (added Dan to the to-list) >>>>>> Here's the changeset that added the test: >>>>>> >>>>>> $ hg log -v -r bca8bf23ac59 >>>>>> test/java/lang/instrument/ParallelTransformerLoader.sh >>>>>> changeset: 132:bca8bf23ac59 >>>>>> user: dcubed >>>>>> date: Mon Mar 24 15:05:09 2008 -0700 >>>>>> files: test/java/lang/instrument/ParallelTransformerLoader.sh >>>>>> test/java/lang/instrument/ParallelTransformerLoaderAgent.java >>>>>> test/java/lang/instrument/ParallelTransformerLoaderApp.java >>>>>> test/java/lang/instrument/TestClass1.java >>>>>> test/java/lang/instrument/TestClass2.java >>>>>> test/java/lang/instrument/TestClass3.java >>>>>> description: >>>>>> 5088398: 3/2 java.lang.instrument TCK test deadlock (test11) >>>>>> Summary: Add regression test for single-threaded bootstrap classloader. >>>>>> Reviewed-by: sspitsyn >>>>>> >>>>>> >>>>>> Based on my e-mail archive for this bug and from the bug report itself, >>>>>> it looks like we got this test from Wily Labs. The original bug was a >>>>>> deadlock that stopped being reproducible after: >>>>>> >>>>>> Karen fixed the bootstrap class loader to work in parallel via: >>>>>> >>>>>> 4997893 4/5 Investigate allowing bootstrap loader to work in >>>>>> parallel >>>>>> >>>>>> with that fix in place the deadlock no longer reproduces. >>>>>> I'm planning to use this bug as the vehicle for getting >>>>>> the test program into the INSTRUMENT_REGRESSION test suite. >>>>>> >>>>>> *** (#2 of 2): 2008-02-29 18:20:17 GMT+00:00 daniel.daugherty at sun.com >>>>>> >>>>>> >>>>>> A careful reading of JDK-5088398 might reveal the intentions of this >>>>>> test... >>>>>> >>>>>> Dan >>>>>> >>>>>> >>>>>> On 10/24/14 5:57 PM, serguei.spitsyn at oracle.com wrote: >>>>>>> Yumin, >>>>>>> >>>>>>> On 10/24/14 4:08 PM, Yumin Qi wrote: >>>>>>>> Serguei, >>>>>>>> >>>>>>>> Thanks for your comments. >>>>>>>> This test happens intermittently, but now it can repeat with 8/9. 
>>>>>>>> Loading TestClass1 in main thread while loading TestClass2 in
>>>>>>>> TestThread in parallel. They both will call transform since
>>>>>>>> TestClass[1-3] are loaded via agent. When loading TestClass2, it
>>>>>>>> will call loading TestClass3 in TestThread.
>>>>>>>> Note in the main thread, for loop:
>>>>>>>>
>>>>>>>>     for (int i = 0; i < kNumIterations; i++)
>>>>>>>>     {
>>>>>>>>         // load some classes from multiple threads (this thread and one other)
>>>>>>>>         Thread testThread = new TestThread(2);
>>>>>>>>         testThread.start();
>>>>>>>>         loadClasses(1);
>>>>>>>>
>>>>>>>>         // log that it completed and reset for the next iteration
>>>>>>>>         testThread.join();
>>>>>>>>         System.out.print(".");
>>>>>>>>         ParallelTransformerLoaderAgent.generateNewClassLoader();
>>>>>>>>     }
>>>>>>>>
>>>>>>>> The loader got renewed after testThread.join(). So both threads
>>>>>>>> are using the exact same class loader.
>>>>>>> You are right, thanks.
>>>>>>> It means that all three classes (TestClass1, TestClass2 and TestClass3)
>>>>>>> are loaded by the same class loader in each iteration.
>>>>>>>
>>>>>>> However, I see more cases when the TestClass3 gets loaded.
>>>>>>> It happens in a CFLH event when any other class (not TestClass*) in
>>>>>>> the system is loaded.
>>>>>>> The class loading thread can be any, not only "main" or "TestClass"
>>>>>>> thread.
>>>>>>> I suspect this test case mostly targets class loading that happens
>>>>>>> on other threads.
>>>>>>> It is because of the lines:
>>>>>>>     // In 160_03 and older, transform() is called
>>>>>>>     // with the "system_loader_lock" held and that
>>>>>>>     // prevents the bootstrap class loaded from
>>>>>>>     // running in parallel. If we add a slight sleep
>>>>>>>     // delay here when the transform() call is not
>>>>>>>     // main or TestThread, then the deadlock in
>>>>>>>     // 160_03 and older is much more reproducible.
>>>>>>>     if (!tName.equals("main") &&
>>>>>>>         !tName.equals("TestThread")) {
>>>>>>>         System.out.println("Thread '" + tName +
>>>>>>>             "' has called transform()");
>>>>>>>         try {
>>>>>>>             Thread.sleep(500);
>>>>>>>         } catch (InterruptedException ie) {
>>>>>>>         }
>>>>>>>     }
>>>>>>>
>>>>>>> What about the following?
>>>>>>>
>>>>>>> In the ParallelTransformerLoaderAgent.java make this change:
>>>>>>>     if (!tName.equals("main"))
>>>>>>>     => if (tName.equals("TestThread"))
>>>>>>>
>>>>>>> Does such an updated test still fail?
>>>>>>>
>>>>>>>> After creating a new class loader, next loop will use the loader.
>>>>>>>> This is why quite often on the stack trace we can see it resolves
>>>>>>>> JarLoader$2.
>>>>>>>>
>>>>>>>> I do not quite understand the test case either. Loading TestClass3
>>>>>>>> inside transform using the same classloader will cause call to
>>>>>>>> transform again and form a circle. Nonetheless, if we see
>>>>>>>> TestClass2 already loaded, the loop will end but that still is a
>>>>>>>> risk.
>>>>>>> In fact, I don't like that the test loads the class TestClass3 at
>>>>>>> the TestClass3 CFLH event.
>>>>>>> However, it is interesting to know why we did not see (is it the
>>>>>>> case?) this issue before.
>>>>>>> Also, it is interesting why the test stops failing with your fix
>>>>>>> (replacing loader with SystemClassLoader).
>>>>>>>
>>>>>>> The test case was added by Dan.
>>>>>>> We may want to ask him to clarify the test case purpose.
>>>>>>> (added Dan to the to-list)
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Serguei
>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Yumin
>>>>>>>>
>>>>>>>> On 10/24/2014 1:20 PM, serguei.spitsyn at oracle.com wrote:
>>>>>>>>> Hi Yumin,
>>>>>>>>>
>>>>>>>>> Below is some analysis to make sure I understand the test
>>>>>>>>> scenario correctly.
>>>>>>>>>
>>>>>>>>> The ParallelTransformerLoaderApp.main() executes a 1000 iteration
>>>>>>>>> loop.
>>>>>>>>> At each iteration it does:
>>>>>>>>> - creates and starts a new TestThread
>>>>>>>>> - loads TestClass1 with the current class loader:
>>>>>>>>>     ParallelTransformerLoaderAgent.getClassLoader()
>>>>>>>>> - changes the current class loader with new one:
>>>>>>>>>     ParallelTransformerLoaderAgent.generateNewClassLoader()
>>>>>>>>>
>>>>>>>>> The TestThread loads the TestClass2 concurrently with the main
>>>>>>>>> thread.
>>>>>>>>>
>>>>>>>>> At the CFLH events, the ParallelTransformerLoaderAgent does the
>>>>>>>>> class retransformation.
>>>>>>>>> If the thread loading the class is not "main", it loads the class
>>>>>>>>> TestClass3
>>>>>>>>> with the current class loader
>>>>>>>>> ParallelTransformerLoaderAgent.getClassLoader().
>>>>>>>>>
>>>>>>>>> Sometimes, the TestClass2 and TestClass3 are loaded by the same
>>>>>>>>> class loader recursively.
>>>>>>>>> It happens if the class loader has not been changed between
>>>>>>>>> loading TestClass2 and TestClass3 classes.
>>>>>>>>>
>>>>>>>>> I'm not convinced yet the test is incorrect.
>>>>>>>>> And it is not clear why do we get a ClassCircularityError.
>>>>>>>>>
>>>>>>>>> Please, let me know if the above understanding is wrong.
>>>>>>>>> I also see the reply from David and share his concerns.
>>>>>>>>>
>>>>>>>>> It is not clear if this failure is a regression.
>>>>>>>>> Did we observe this issue before?
>>>>>>>>> If - NOT then when and why had this failure started to appear?
>>>>>>>>>
>>>>>>>>> Unfortunately, it is impossible to look at the test run history
>>>>>>>>> at the moment.
>>>>>>>>> The Aurora is at a maintenance.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Serguei
>>>>>>>>>
>>>>>>>>> On 10/13/14 3:58 PM, Yumin Qi wrote:
>>>>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8038468
>>>>>>>>>> webrev:*http://cr.openjdk.java.net/~minqi/8038468/webrev00/
>>>>>>>>>>
>>>>>>>>>> the bug marked as confidential so post the webrev internally.
>>>>>>>>>>
>>>>>>>>>> Problem: The test case tries to load a class from the same jar
>>>>>>>>>> via agent in the middle of loading another class from the jar
>>>>>>>>>> via same class loader in same thread. The call happens in
>>>>>>>>>> transform which is a rare case --- in middle of loading class,
>>>>>>>>>> loading another class. The result is a CircularityError. When
>>>>>>>>>> first class is in loading, in vm we put JarLoader$2 on place
>>>>>>>>>> holder table, then we start the defineClass, which calls
>>>>>>>>>> transform, begins loading the second class so go along the same
>>>>>>>>>> routine for loading JarLoader$2 first, found it already in
>>>>>>>>>> placeholder table. A CircularityError is thrown.
>>>>>>>>>> Fix: The test case should not call loading class with same class
>>>>>>>>>> loader in same thread from same jar in 'transform' method. I
>>>>>>>>>> modify it loading with system class loader and we expect see
>>>>>>>>>> ClassNotFoundException. Detail see bug comments.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Yumin *

From aleksey.shipilev at oracle.com Thu Nov 13 07:43:06 2014
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Thu, 13 Nov 2014 10:43:06 +0300
Subject: hang when using -XX:-UseCompilerSafepoints
In-Reply-To: <5464444A.7030601@oracle.com>
References: <5463BF71.4080804@oracle.com> <5463EF88.1050100@oracle.com> <54641E27.4090303@oracle.com> <546427DB.3070806@oracle.com> <5464444A.7030601@oracle.com>
Message-ID: <5464610A.20901@oracle.com>

On 13.11.2014 08:40, David Holmes wrote:
> There is some history in JDK-4974572 (which is non-public I'm afraid).
> To all intents and purposes the flag at that point was used to enable
> testing of workarounds if problems were suspected in the "new"
> safepointing code. I think it has outlived its usefulness by a few major
> releases so I'm happy to see it go.

Filed:
  https://bugs.openjdk.java.net/browse/JDK-8064777

I'll do a patch to remove the flag.

-Aleksey.
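
For the 8038468 discussion above, the screening suggested elsewhere in this thread (do not start a nested class load from transform() when the class being defined belongs to the boot loader, and only do it on TestThread) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the test's actual code: ScreeningTransformer, extraLoad and extraLoadsFor are hypothetical names, and the Runnable merely stands in for the test's loadClasses(3) call.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicInteger;

public class ScreeningTransformerSketch {

    /** Hypothetical stand-in for the test's TestTransformer. */
    static final class ScreeningTransformer implements ClassFileTransformer {
        private final Runnable extraLoad;               // stands in for loadClasses(3)
        final AtomicInteger extraLoads = new AtomicInteger();

        ScreeningTransformer(Runnable extraLoad) {
            this.extraLoad = extraLoad;
        }

        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            String tName = Thread.currentThread().getName();
            // loader == null means the class is being defined by the boot
            // loader; those loads are skipped so that JDK-internal classes
            // can finish loading without a nested load started from here.
            if (loader != null && tName.equals("TestThread")) {
                extraLoads.incrementAndGet();
                extraLoad.run();
            }
            return null; // null means "class bytes unchanged"
        }
    }

    /** Calls transform() once on a thread with the given name and reports how many extra loads ran. */
    static int extraLoadsFor(ClassLoader loader, String threadName) {
        ScreeningTransformer t = new ScreeningTransformer(() -> { });
        Thread worker = new Thread(
            () -> t.transform(loader, "Example", null, null, new byte[0]),
            threadName);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return t.extraLoads.get();
    }

    public static void main(String[] args) {
        ClassLoader app = ClassLoader.getSystemClassLoader();
        System.out.println("boot loader, TestThread: " + extraLoadsFor(null, "TestThread"));
        System.out.println("app loader,  TestThread: " + extraLoadsFor(app, "TestThread"));
        System.out.println("app loader,  main:       " + extraLoadsFor(app, "main"));
    }
}
```

The sketch only shows the screening decision; whether that is enough to avoid the ClassCircularityError is exactly what the thread is still trying to establish.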
From peter.levart at gmail.com Thu Nov 13 08:24:53 2014
From: peter.levart at gmail.com (Peter Levart)
Date: Thu, 13 Nov 2014 09:24:53 +0100
Subject: [9] RFR(L) 8013267 : move MemberNameTable from native code to Java heap, use to intern MemberNames
In-Reply-To:
References: <594FE416-D445-426E-B7A9-C75F012ADE16@oracle.com> <0BDCD5DD-CBFB-4A3D-9BAD-7F9912E9237C@oracle.com> <5453C230.8010709@oracle.com> <9747A3A3-B7F4-407A-8E00-1E647D9DC2D1@oracle.com> <1D195ED3-8BD2-46CB-9D70-29CF809D9F5A@oracle.com> <5456FB59.60905@oracle.com> <632A5C98-B386-4625-BE12-355241581955@oracle.com> <5457AA75.8090103@gmail.com> <5457E0F9.8090004@gmail.com> <5458A57C.4060208@gmail.com> <260D49F5-6380-4FC3-A900-6CD9AB3ED6F7@oracle.com> <5459034E.8070809@gmail.com> <39826508-110B-4FCE-9A58-8C3D1B9FC7DE@oracle.com> <545F642E.30205@gmail.com>
Message-ID: <54646AD5.4000404@gmail.com>

On 11/12/2014 07:27 PM, David Chase wrote:
> Hello Peter,
>
>> Sadly, this seems not to be the case for MemberNames or for "Types".
> That statement is inoperative. Mistakes were made.
> It's compareTo that they lack.

Yes, I saw your quite tricky implementation of MemberName.compareTo, based on
hashCode(s), String representations, etc... The hash-table based interning
does not need it though.

Regards, Peter

> David
>
>
> On 2014-11-09, at 7:55 AM, Peter Levart wrote:
>
>> Hi David,
>>
>> I played a little with the idea of having a hash table instead of packed sorted array for interning. Using ConcurrentHashMap would present quite some memory overhead. A more compact representation is possible in the form of a linear-scan hash table where elements of array are MemberNames themselves:
>>
>> http://cr.openjdk.java.net/~plevart/misc/MemberName.intern/jdk.06.diff/
>>
>> This is a drop-in replacement for MemberName on top of your jdk.06 patch. If you have some time, you can run this with your performance tests to see if it presents any difference.
If not, then perhaps this interning is not so performance critical after all.
>>
>> Regards, Peter

From david.holmes at oracle.com Thu Nov 13 08:47:19 2014
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 13 Nov 2014 18:47:19 +1000
Subject: hang when using -XX:-UseCompilerSafepoints
In-Reply-To: <5464610A.20901@oracle.com>
References: <5463BF71.4080804@oracle.com> <5463EF88.1050100@oracle.com> <54641E27.4090303@oracle.com> <546427DB.3070806@oracle.com> <5464444A.7030601@oracle.com> <5464610A.20901@oracle.com>
Message-ID: <54647017.8000704@oracle.com>

On 13/11/2014 5:43 PM, Aleksey Shipilev wrote:
> On 13.11.2014 08:40, David Holmes wrote:
>> There is some history in JDK-4974572 (which is non-public I'm afraid).
>> To all intents and purposes the flag at that point was used to enable
>> testing of workarounds if problems were suspected in the "new"
>> safepointing code. I think it has outlived its usefulness by a few major
>> releases so I'm happy to see it go.
>
> Filed:
> https://bugs.openjdk.java.net/browse/JDK-8064777

You actually filed 8064776 first :)

But neither is needed as removal can be the solution of 8064749.

David

> I'll do a patch to remove the flag.
>
> -Aleksey.
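
To make the "linear-scan hash table where elements of array are MemberNames themselves" idea from Peter Levart's message above a little more concrete, here is a minimal, single-threaded sketch of interning through an open-addressing table. It is not the code in the jdk.06 diff: InternTable, spread and the 3/4 load factor are assumptions chosen for illustration, and the real patch has to deal with concurrency and with MemberName specifics that are ignored here. As noted in the message, this approach needs only hashCode() and equals(), not compareTo().

```java
public class LinearProbeInternSketch {

    /** Hypothetical open-addressing table that stores the interned elements directly. */
    static final class InternTable<T> {
        private Object[] slots = new Object[8]; // length is always a power of two
        private int size;

        /** Returns the canonical instance equal to candidate, adding candidate if none exists yet. */
        @SuppressWarnings("unchecked")
        T intern(T candidate) {
            if ((size + 1) * 4 > slots.length * 3) {
                grow(); // keep the load factor at or below 3/4 so probing always finds a free slot
            }
            int mask = slots.length - 1;
            int i = spread(candidate.hashCode()) & mask;
            while (slots[i] != null) {
                if (slots[i].equals(candidate)) {
                    return (T) slots[i];     // an equal element is already interned
                }
                i = (i + 1) & mask;          // linear scan to the next slot
            }
            slots[i] = candidate;
            size++;
            return candidate;
        }

        int size() {
            return size;
        }

        private void grow() {
            Object[] old = slots;
            slots = new Object[old.length * 2];
            int mask = slots.length - 1;
            for (Object o : old) {
                if (o == null) {
                    continue;
                }
                int i = spread(o.hashCode()) & mask;
                while (slots[i] != null) {
                    i = (i + 1) & mask;
                }
                slots[i] = o;
            }
        }

        private static int spread(int h) {
            return h ^ (h >>> 16); // mix high bits into the low bits used for indexing
        }
    }

    public static void main(String[] args) {
        InternTable<String> table = new InternTable<>();
        String first = table.intern(new String("member"));
        String second = table.intern(new String("member"));
        System.out.println(first == second);
        System.out.println(table.size());
    }
}
```

Compared with a ConcurrentHashMap, a table like this has no per-entry node objects, which is the memory saving the message refers to; the price is that thread safety has to be added separately.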
>

From david.holmes at oracle.com Thu Nov 13 08:50:05 2014
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 13 Nov 2014 18:50:05 +1000
Subject: RFR(XS): 8064471: Port 8013895: G1: G1SummarizeRSetStats output on Linux needs improvement to AIX
In-Reply-To: <54637A9A.9040108@sap.com>
References: <44EB1DA1040EEB44B7F343A782AD30789FEBA3E3@DEWDFEMB11B.global.corp.sap> <5463149A.6020506@oracle.com> <54637A9A.9040108@sap.com>
Message-ID: <546470BD.9050303@oracle.com>

On 13/11/2014 1:19 AM, Haug, Gunter wrote:
>
> On 12.11.2014 09:04, David Holmes wrote:
>> Hi Gunter,
>>
>> On 11/11/2014 11:23 PM, Haug, Gunter wrote:
>>> Hi All,
>>>
>>> The change '8013895: (G1: G1SummarizeRSetStats output on Linux needs
>>> improvement)' makes use of getrusage() to retrieve accurate
>>> per-thread data on resource usage. We can use exactly the same code
>>> on AIX to achieve this.
>>>
>>> Please review the following change:
>>>
>>> http://cr.openjdk.java.net/~goetz/webrevs/8064471-aixTime/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8064471
>>
>> I have a couple of comments on this code which presumably also apply
>> to the original :(
> Yes, they apply to the original as well, see below.
>>
>> First this comment is no longer applicable (actually it was never
>> applicable to AIX!):
>>
>> // For now, we say that linux does not support vtime. I have no idea
>> // whether it can actually be made to (DLD, 9/13/05).
>>
> You're right. I will remove it.
>> Second this calculation seems wrong:
>>
>> return (double) (usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) +
>> (double) (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / (1000 *
>> 1000);
>>
>> To me this performs integer division (ie truncation) then converts
>> the resulting integer to a double.
I would expect to see additional
>> parentheses (even if not needed, for clarity):
>>
>> return (double) (usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) +
>> ((double) (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec)) / (1000 *
>> 1000);
>>
>> or more simply divide by a floating-point value:
>>
>> return (double) (usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) +
>> (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / (1000.0 * 1000);
>>
>> and you don't need two double casts regardless as the expression will
>> be of type double as soon as there is one operand of type double. So
>> that should reduce to:
>>
>> return usage.ru_utime.tv_sec + usage.ru_stime.tv_sec +
>> (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / (1000.0 * 1000);
>>
> OK. Do you want that we also change the Linux version like you proposed?

I'll leave it up to you. If you leave this as AIX only then it tests the
new process :) There can be a follow up cleanup bug for linux.

Thanks,
David

> Thanks,
> Gunter
>
>> Cheers,
>> David
>>
>>> Thanks,
>>> Gunter
>>>
>

From aleksey.shipilev at oracle.com Thu Nov 13 08:55:28 2014
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Thu, 13 Nov 2014 11:55:28 +0300
Subject: hang when using -XX:-UseCompilerSafepoints
In-Reply-To: <54647017.8000704@oracle.com>
References: <5463BF71.4080804@oracle.com> <5463EF88.1050100@oracle.com> <54641E27.4090303@oracle.com> <546427DB.3070806@oracle.com> <5464444A.7030601@oracle.com> <5464610A.20901@oracle.com> <54647017.8000704@oracle.com>
Message-ID: <54647200.5000103@oracle.com>

On 13.11.2014 11:47, David Holmes wrote:
> On 13/11/2014 5:43 PM, Aleksey Shipilev wrote:
>> On 13.11.2014 08:40, David Holmes wrote:
>>> There is some history in JDK-4974572 (which is non-public I'm afraid).
>>> To all intents and purposes the flag at that point was used to enable
>>> testing of workarounds if problems were suspected in the "new"
>>> safepointing code.
I think it has outlived its usefulness by a few major
>>> releases so I'm happy to see it go.
>>
>> Filed:
>> https://bugs.openjdk.java.net/browse/JDK-8064777
>
> You actually filed 8064776 first :)

O_o. The submit timestamps are the same. JIRA is funky today, huh.

> But neither is needed as removal can be the solution of 8064749.

I thought we are better off tracking this separately, and then close
all/any pending bugs about UseCompilerSafepoints as WNF citing 8064776.
Still want to do this in 8064749?

Also, I wonder if we want to demote the flag to experimental in 8u. This
does not sound like a backport of 8064749 at all, but rather a separate
change.

-Aleksey.

From david.holmes at oracle.com Thu Nov 13 09:12:08 2014
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 13 Nov 2014 19:12:08 +1000
Subject: hang when using -XX:-UseCompilerSafepoints
In-Reply-To: <54647200.5000103@oracle.com>
References: <5463BF71.4080804@oracle.com> <5463EF88.1050100@oracle.com> <54641E27.4090303@oracle.com> <546427DB.3070806@oracle.com> <5464444A.7030601@oracle.com> <5464610A.20901@oracle.com> <54647017.8000704@oracle.com> <54647200.5000103@oracle.com>
Message-ID: <546475E8.6000908@oracle.com>

On 13/11/2014 6:55 PM, Aleksey Shipilev wrote:
> On 13.11.2014 11:47, David Holmes wrote:
>> On 13/11/2014 5:43 PM, Aleksey Shipilev wrote:
>>> On 13.11.2014 08:40, David Holmes wrote:
>>>> There is some history in JDK-4974572 (which is non-public I'm afraid).
>>>> To all intents and purposes the flag at that point was used to enable
>>>> testing of workarounds if problems were suspected in the "new"
>>>> safepointing code. I think it has outlived its usefulness by a few major
>>>> releases so I'm happy to see it go.
>>>
>>> Filed:
>>> https://bugs.openjdk.java.net/browse/JDK-8064777
>>
>> You actually filed 8064776 first :)
>
> O_o. The submit timestamps are the same. JIRA is funky today, huh.
>
>> But neither is needed as removal can be the solution of 8064749.
>
> I thought we are better off tracking this separately, and then close
> all/any pending bugs about UseCompilerSafepoints as WNF citing 8064776.
> Still want to do this in 8064749?

It is what I did in 8062307 for the TraceThreadEvents flag. 8064749
contains all the pertinent comments.

> Also, I wonder if we want to demote the flag to experimental in 8u. This
> does not sound like a backport of 8064749 at all, but rather a separate
> change.

Any change requires CCC. I don't see any point in making the flag
experimental as it doesn't really provide any "experimentation".

Happy to let others weigh in.

Cheers,
David

> -Aleksey.
>

From aleksey.shipilev at oracle.com Thu Nov 13 12:41:16 2014
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Thu, 13 Nov 2014 15:41:16 +0300
Subject: RFR (S) 8064749: -XX:-UseCompilerSafepoints breaks safepoint rendezvous
Message-ID: <5464A6EC.6090804@oracle.com>

Hi,

This is a patch that removes -XX:+-UseCompilerSafepoints from Hotspot:
  https://bugs.openjdk.java.net/browse/JDK-8064749
  http://cr.openjdk.java.net/~shade/8064749/webrev.01/

Do I understand it right we need a CCC to remove the product flag?

Testing: JPRT, vm.quick.testlist

Thanks,
-Aleksey.

From daniel.daugherty at oracle.com Thu Nov 13 13:53:57 2014
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Thu, 13 Nov 2014 06:53:57 -0700
Subject: hang when using -XX:-UseCompilerSafepoints
In-Reply-To: <546475E8.6000908@oracle.com>
References: <5463BF71.4080804@oracle.com> <5463EF88.1050100@oracle.com> <54641E27.4090303@oracle.com> <546427DB.3070806@oracle.com> <5464444A.7030601@oracle.com> <5464610A.20901@oracle.com> <54647017.8000704@oracle.com> <54647200.5000103@oracle.com> <546475E8.6000908@oracle.com>
Message-ID: <5464B7F5.9060000@oracle.com>

> Happy to let others weigh in.

Please use 8064749 to remove the flag; David H is correct that
all the right info is there.
Dan

On 11/13/14 2:12 AM, David Holmes wrote:
> On 13/11/2014 6:55 PM, Aleksey Shipilev wrote:
>> On 13.11.2014 11:47, David Holmes wrote:
>>> On 13/11/2014 5:43 PM, Aleksey Shipilev wrote:
>>>> On 13.11.2014 08:40, David Holmes wrote:
>>>>> There is some history in JDK-4974572 (which is non-public I'm
>>>>> afraid).
>>>>> To all intents and purposes the flag at that point was used to enable
>>>>> testing of workarounds if problems were suspected in the "new"
>>>>> safepointing code. I think it has outlived its usefulness by a few
>>>>> major
>>>>> releases so I'm happy to see it go.
>>>>
>>>> Filed:
>>>> https://bugs.openjdk.java.net/browse/JDK-8064777
>>>
>>> You actually filed 8064776 first :)
>>
>> O_o. The submit timestamps are the same. JIRA is funky today, huh.
>>
>>> But neither is needed as removal can be the solution of 8064749.
>>
>> I thought we are better off tracking this separately, and then close
>> all/any pending bugs about UseCompilerSafepoints as WNF citing 8064776.
>> Still want to do this in 8064749?
>
> It is what I did in 8062307 for the TraceThreadEvents flag. 8064749
> contains all the pertinent comments.
>
>> Also, I wonder if we want to demote the flag to experimental in 8u. This
>> does not sound like a backport of 8064749 at all, but rather a separate
>> change.
>
> Any change requires CCC. I don't see any point in making the flag
> experimental as it doesn't really provide any "experimentation".
>
> Happy to let others weigh in.
>
> Cheers,
> David
>
>
>
>> -Aleksey.
>>

From magnus.ihse.bursie at oracle.com Thu Nov 13 14:44:54 2014
From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie)
Date: Thu, 13 Nov 2014 15:44:54 +0100
Subject: RFR(S) Solaris Full Debug Symbols (FDS) fix for 8033602 and 8034005
In-Reply-To: <546151A9.1080100@oracle.com>
References: <546151A9.1080100@oracle.com>
Message-ID: <5464C3E6.5000309@oracle.com>

On 2014-11-11 01:00, Daniel D.
Daugherty wrote:
> Greetings,
>
> I have a Solaris Full Debug Symbols (FDS) fix ready for review.
> Yes, it is a small fix, but it is in Makefiles so feel free to
> run screaming from the room... :-) On the plus side the fix does
> delete two work around source files (Coleen would say that's a
> Good Thing (TM)!)

... but you're only deleting the make files?

src/os/solaris/add_gnu_debuglink/add_gnu_debuglink.c and
src/os/solaris/fix_empty_sec_hdr_flags/fix_empty_sec_hdr_flags.c
could be deleted as well, right?

Good idea for the fix, anyway. I opened
https://bugs.openjdk.java.net/browse/JDK-8064808 to implement a similar
solution in configure.

/Magnus

From daniel.daugherty at oracle.com Thu Nov 13 14:53:06 2014
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Thu, 13 Nov 2014 07:53:06 -0700
Subject: RFR (S) 8064749: -XX:-UseCompilerSafepoints breaks safepoint rendezvous
In-Reply-To: <5464A6EC.6090804@oracle.com>
References: <5464A6EC.6090804@oracle.com>
Message-ID: <5464C5D2.2050007@oracle.com>

On 11/13/14 5:41 AM, Aleksey Shipilev wrote:
> Hi,
>
> This is a patch that removes -XX:+-UseCompilerSafepoints from Hotspot:
> https://bugs.openjdk.java.net/browse/JDK-8064749
> http://cr.openjdk.java.net/~shade/8064749/webrev.01/

src/share/vm/runtime/arguments.cpp
    Not your problem, but that list is formatted quite inconsistently.
    It seems like new entries should be added at the bottom so your
    line 309 should be between these two lines:

    line 313: #endif // ZERO
    line 314: { NULL, JDK_Version(0), JDK_Version(0) }

src/share/vm/runtime/globals.hpp
    No comments.

src/share/vm/runtime/safepoint.cpp
    No comments.

> Do I understand it right we need a CCC to remove the product flag?

Yes. Since this was a product flag, it needs a CCC to remove it.

Dan

>
> Testing: JPRT, vm.quick.testlist
>
> Thanks,
> -Aleksey.
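
On the getrusage() arithmetic discussed in the 8064471 review earlier in this archive: the three forms of the return expression can be compared with a small, self-contained check. The sketch below is written in Java rather than C++ and uses plain long parameters as hypothetical stand-ins for the ru_utime/ru_stime tv_sec and tv_usec fields. In Java, as in C and C++, a cast binds more tightly than the / operator, so in (double) (a + b) / (1000 * 1000) the dividend is already a double when the division happens; truncation would only occur if the integer sum were divided first.

```java
public class VtimeArithmeticSketch {

    // Form from the reviewed change: the cast applies to the microsecond sum
    // before the division, so the division is carried out in double.
    static double original(long uSec, long sSec, long uUsec, long sUsec) {
        return (double) (uSec + sSec) + (double) (uUsec + sUsec) / (1000 * 1000);
    }

    // Simplified form proposed in the review: one floating-point operand is
    // enough to make the division a double division.
    static double simplified(long uSec, long sSec, long uUsec, long sUsec) {
        return uSec + sSec + (uUsec + sUsec) / (1000.0 * 1000);
    }

    // What a truncating version would look like: the integer division is
    // parenthesized so that it happens before the conversion to double.
    static double truncating(long uSec, long sSec, long uUsec, long sUsec) {
        return (double) (uSec + sSec) + (double) ((uUsec + sUsec) / (1000 * 1000));
    }

    public static void main(String[] args) {
        // 1 s + 2 s of whole seconds, 250000 us + 500000 us of microseconds
        System.out.println(original(1, 2, 250_000, 500_000));
        System.out.println(simplified(1, 2, 250_000, 500_000));
        System.out.println(truncating(1, 2, 250_000, 500_000));
    }
}
```

Under these assumptions the first two forms agree, and only the explicitly parenthesized integer division loses the fractional part, so the simplification suggested in the review reads as a clarity improvement rather than a change in the computed value.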
>
>
>

From karen.kinnear at oracle.com Thu Nov 13 14:54:14 2014
From: karen.kinnear at oracle.com (Karen Kinnear)
Date: Thu, 13 Nov 2014 09:54:14 -0500
Subject: RFR: 8038468: java/lang/instrument/ParallelTransformerLoader.sh fails with ClassCircularityError
In-Reply-To: <54645530.6010107@oracle.com>
References: <543C591E.8010602@oracle.com> <544AB477.4000204@oracle.com> <544ADC07.6080904@oracle.com> <544AE76A.9030701@oracle.com> <544E5123.1060202@oracle.com> <544E8844.1070907@oracle.com> <0FB37288-5995-4E0B-B005-E00A4FB3B22A@oracle.com> <5454218D.40009@oracle.com> <5456EADF.4050203@oracle.com> <3CF28613-0C0F-44E2-869A-FF5B01D7E575@oracle.com> <54645530.6010107@oracle.com>
Message-ID: <589B2E5D-9B9E-4945-BC13-A0025B99AABF@oracle.com>

Yumin,

If you run -XX:AbortVMOnException=java.lang.ClassCircularityError and
-XX:+ShowMessageBoxOnError - you can attach a debugger with the correct
stack trace.

My notes below show a way to run a faster test script loop - i.e. see the
earlier attached rerun.csh script - that way you can just run the test, not
the recompile etc. each time, and you can add your flags. Hopefully that
will make it fail sooner.

hth,
Karen

On Nov 13, 2014, at 1:52 AM, Yumin Qi wrote:

> Thanks, Karen
>
> Now I have a standalone test which is easy to reproduce. I am trying to set debugger to trace the problem. Meanwhile, I will try the suggested fix from you too.
> When set AbortVMOnException, it is late for debugger to attach since the execution goes to abort. Currently not easy run single (we loop in the test script) time to fail.
>
> Thanks
> Yumin
>
> On 11/12/2014 8:27 AM, Karen Kinnear wrote:
>> I think there are three things we need to figure out.
>>
>> 1. I reproduced a problem in TestThread2. Below was the information from that and my analysis
>> - all - comments on my analysis are very welcome
>> - Yumin - please try the suggested test change below to see if it helps.
>>
>> - that is the only example I have seen the full details for.
>>
>> 2.
Does the circularity error actually occur in the main thread and if so why? >> - need to catch in a debugger/hs_err file a situation in which this occurs in the main thread please. >> We need the full stack trace for this - native and java please >> - run this without the test change I suggested please >> - try to catch ClassCircularityError in the main thread >> >> 3. figure out why we we see this problem more frequently >> - I am not convinced this problem didn't already exist - the test logic has some very odd comments and workarounds which seem to imply there >> were intermittent problems from the beginning >> - that said - worth figuring out if for instance, the sun.misc.URLClassPath logic was rewritten (and when) to add $JarLoader$2 >> - and looking at the history of test failure >> >> thanks, >> Karen >> >> On Nov 2, 2014, at 9:39 PM, David Holmes wrote: >> >>> On 1/11/2014 9:55 AM, Yumin Qi wrote: >>>> Karen, >>>> >>>> Thanks for your detail message for debugging. Yes, from my debugging, >>>> the exception did happen in TestThread other than main thread. I have no >>>> idea why in the end the exception was reported in main thread. >>> Until that question is answered I will remain uneasy about simply tweaking the test until it no longer fails. I would also like to know when it started failing - Karen alludes to the possible introduction of a new inner class at some point. >>> >>> Thanks, >>> David >>> >>>> You mention >>>> >>>> So that change to the test would be: >>>> in TestTransformer: >>>> if (loader != null) { >>>> if (tName.equals("TestThread")) { >>>> { >>>> loadClasses(3); >>>> } >>>> } >>>> return null; >>>> } >>>> >>>> >>>> The loader is the one defined in the test case, right? The system class >>>> loader is never null. >>>> I will try this change, let's see if it can work it out. 
>>>> >>>> Thanks >>>> Yumin >>>> >>>> On 10/31/2014 3:29 PM, Karen Kinnear wrote: >>>>> Yumin, >>>>> >>>>> From your earlier exception stack trace (many thanks) you reported: >>>>> >>>>> Exception in thread "main" java.lang.ClassCircularityError: (no - I >>>>> don't know why this is in thread "main") >>>>> sun/misc/URLClassPath$JarLoader$2 >>>>> at sun.misc.URLClassPath$JarLoader.checkResource(URLClassPath.java:771) >>>>> at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:843) >>>>> at sun.misc.URLClassPath.getResource(URLClassPath.java:199) >>>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:364) >>>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:361) >>>>> at java.security.AccessController.doPrivileged(Native Method) >>>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:360) >>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:426) >>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:359) >>>>> at java.lang.Class.forName0(Native Method) >>>>> at java.lang.Class.forName(Class.java:340) >>>>> at >>>>> ParallelTransformerLoaderApp.loadClasses(ParallelTransformerLoaderApp.java:83) >>>>> >>>>> at >>>>> ParallelTransformerLoaderApp.main(ParallelTransformerLoaderApp.java:45) >>>>> >>>>> >>>>> So I ran with -XX:AbortVMOnException=java.lang.ClassCircularityError >>>>> -XX:+ShowMessageBoxOnError to get >>>>> a log file and stack trace. See my instructions below on how to do that. >>>>> >>>>> I did this, attached a debugger, which didn't help enough since I >>>>> needed to see the java stack frames, >>>>> and got an hs_err_log also, so the stack traces came from the error >>>>> log. >>>>> >>>>> The stack trace was on Thread 2, which in the hs_err_log was >>>>> TestThread (which makes sense for what the test logic says). >>>>> See later in email for stack traces from Thread 2. 
>>>>> >>>>> Summary of stack trace: >>>>> >>>>> TestThread: >>>>> loadClasses(#) -> forName(TestClass#, URLClassLoader) >>>>> vm calls out to URLClassLoader.loadClass(String) which is >>>>> inherited from java.lang.ClassLoader.loadClass(String) >>>>> ... calls java.net.URLClassLoader.findClass(...) which calls >>>>> DoPrivileged java.net.URLClassLoader$1.run which calls >>>>> sun.misc.URLClassPath.getResource(name, false) which calls >>>>> sun.misc.URLClassPath$JarLoader.getResource which calls >>>>> sun.misc.URLClassPath$JarLoader.checkResource which >>>>> tries to call sun.misc.URLClassPath$JarLoader$2 >>>>> - and then the transformer jumps in with loadClasses(# (which we >>>>> know is 3) and walks the same logic which tries to load >>>>> sun.misc.URLClassPath$JarLoader$2 again >>>>> >>>>> Note that in the placeholder table information that Yumin printed, the >>>>> circularity error is on sun.misc.URLClassPath$JarLoader$2 with the >>>>> null == boot loader, which >>>>> makes sense -- that is the appropriate defining loader, and therefore >>>>> the one the CFLH would intercept during the defineClass phase. >>>>> >>>>> In the sun.misc.URLClassPath.java file, in the class JarLoader, in the >>>>> method checkResource >>>>> ... return new Resource() { ... } >>>>> -- I do not know why that generates sun.misc.URLClassPath$JarLoader$1, >>>>> $2 and $3 at build time or when that was added. >>>>> I would guess that is when the bug started happening. >>>>> >>>>> When I have a successful run, sun.misc.URLClassPath$JarLoader$2 loads >>>>> before any TestClass1 loads. >>>>> >>>>> My belief is that the point of the test is to test parallel class >>>>> loading for URL class loaders. >>>>> I don't think the point is to test the bootstrap class loader, nor to >>>>> test bootstrapping - i.e. running the agent before >>>>> we have loaded sufficient classes to allow loading URLClassLoader >>>>> classes. 
>>>>> >>>>> What I suggested to Yumin that he try would be to change the test to >>>>> NOT intercept boot loader loads, so that >>>>> sun.misc.URLClassPath$JarLoader$# >>>>> can load which will in turn allow classes loaded by a URLClassLoader >>>>> subclass to load. >>>>> >>>>> So that change to the test would be: >>>>> in TestTransformer: >>>>> if (loader != null) { >>>>> if (tName.equals("TestThread")) { >>>>> { >>>>> loadClasses(3); >>>>> } >>>>> } >>>>> return null; >>>>> } >>>>> // I also suspect with that change, we can remove the sleep loop >>>>> Note: there was a printed message which said that the Thread "Signal >>>>> Dispatcher" has called transform(), which I >>>>> ignored, however it is good that we don't call loadClass on that >>>>> thread - which is part of what the sleep loop does - >>>>> but that would be handled by the boot loader screening above >>>>> >>>>> Alternatively we can preload the URLClassPath classes, but I don't >>>>> think we want to do that, or >>>>> we can have the agent explicitly screen on a variety of jdk >>>>> bootstrapping classes. But I think the cleaner >>>>> solution is to screen on the boot loader. >>>>> >>>>> Does that make any sense to others? >>>>> >>>>> thanks, >>>>> Karen >>>>> >>>>> p.s. How to run with hotspot flags (jtreg has a -show:rerun option, >>>>> but with a shell script in the test, this is more complex, so >>>>> the following should be easier): >>>>> >>>>> So what I did was run the test once for it to pass (not your script, >>>>> but just once with jtreg) so that it generated >>>>> the $DST/work directory. >>>>> I then created a rerun.csh script - attached - you can modify for your >>>>> own $DST directory. >>>>> I used it to be able to quickly rerun the test without the jtreg >>>>> framework and compile time etc. but mostly >>>>> to be able to actually add hotspot command-line flags. >>>>> >>>>> >>>>> >>>>> >>>>> p.p.s. 
details from the error log (let me know if you want me to >>>>> attach the error log to the bug report) >>>>> >>>>> note: error log shows last 10 events including: >>>>> Event: 0.928 loading class sun/misc/URLClassPath$JarLoader$2 >>>>> Event: 0.928 loading class TestClass3 >>>>> Event: 0.929 loading class TestClass3 done >>>>> Event: 0.929 loading class java/lang/ClassCircularityError >>>>> Event: 0.929 loading class java/lang/ClassCircularityError done >>>>> >>>>> TestThread >>>>> >>>>> java frames: >>>>> >>>>> j >>>>> sun.misc.URLClassPath$JarLoader.checkResource(Ljava/lang/String;ZLjava/util/jar/JarEntry;)Lsun/misc/Resource;+42 >>>>> >>>>> j >>>>> sun.misc.URLClassPath$JarLoader.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+54 >>>>> >>>>> j >>>>> sun.misc.URLClassPath.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+53 >>>>> >>>>> j java.net.URLClassLoader$1.run()Ljava/lang/Class;+26 >>>>> j java.net.URLClassLoader$1.run()Ljava/lang/Object;+1 >>>>> v ~StubRoutines::call_stub >>>>> j >>>>> java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;+0 >>>>> >>>>> j >>>>> java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class;+13 >>>>> j >>>>> java.lang.ClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;+70 >>>>> j java.lang.ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class;+3 >>>>> v ~StubRoutines::call_stub >>>>> j >>>>> java.lang.Class.forName0(Ljava/lang/String;ZLjava/lang/ClassLoader;Ljava/lang/Class;)Ljava/lang/Class;+0 >>>>> >>>>> j >>>>> java.lang.Class.forName(Ljava/lang/String;ZLjava/lang/ClassLoader;)Ljava/lang/Class;+49 >>>>> >>>>> j ParallelTransformerLoaderAgent$TestTransformer.loadClasses(I)V+25 >>>>> j >>>>> ParallelTransformerLoaderAgent$TestTransformer.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[B)[B+81 >>>>> >>>>> j >>>>> 
sun.instrument.TransformerManager.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[B)[B+50 >>>>> >>>>> j >>>>> sun.instrument.InstrumentationImpl.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[BZ)[B+34 >>>>> >>>>> v ~StubRoutines::call_stub >>>>> j >>>>> sun.misc.URLClassPath$JarLoader.checkResource(Ljava/lang/String;ZLjava/util/jar/JarEntry;)Lsun/misc/Resource;+42 >>>>> >>>>> j >>>>> sun.misc.URLClassPath$JarLoader.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+54 >>>>> >>>>> j >>>>> sun.misc.URLClassPath.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+53 >>>>> >>>>> j java.net.URLClassLoader$1.run()Ljava/lang/Class;+26 >>>>> j java.net.URLClassLoader$1.run()Ljava/lang/Object;+1 >>>>> v ~StubRoutines::call_stub >>>>> j >>>>> java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;+0 >>>>> >>>>> j >>>>> java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class;+13 >>>>> j >>>>> java.lang.ClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;+70 >>>>> j java.lang.ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class;+3 >>>>> v ~StubRoutines::call_stub >>>>> j >>>>> java.lang.Class.forName0(Ljava/lang/String;ZLjava/lang/ClassLoader;Ljava/lang/Class;)Ljava/lang/Class;+0 >>>>> >>>>> j >>>>> java.lang.Class.forName(Ljava/lang/String;ZLjava/lang/ClassLoader;)Ljava/lang/Class;+49 >>>>> >>>>> j ParallelTransformerLoaderApp.loadClasses(I)V+25 >>>>> j ParallelTransformerLoaderApp$TestThread.run()V+4 >>>>> v ~StubRoutines::call_stub >>>>> >>>>> >>>>> >>>>> detailed frames: >>>>> >>>>> V [libjvm.so+0x760f5a] Exceptions::_throw_msg(Thread*, char const*, >>>>> int, Symbol*, char const*)+0x7c >>>>> V [libjvm.so+0xce005c] >>>>> SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, >>>>> Handle, Thread*)+0x7d8 >>>>> V [libjvm.so+0xcde9e5] 
SystemDictionary::resolve_or_null(Symbol*, >>>>> Handle, Handle, Thread*)+0x26d >>>>> V [libjvm.so+0xcde435] SystemDictionary::resolve_or_fail(Symbol*, >>>>> Handle, Handle, bool, Thread*)+0x39 >>>>> V [libjvm.so+0x690fbc] >>>>> ConstantPool::klass_at_impl(constantPoolHandle, int, Thread*)+0x3cc >>>>> V [libjvm.so+0x5398cb] ConstantPool::klass_at(int, Thread*)+0x55 >>>>> V [libjvm.so+0x8b1f3c] InterpreterRuntime::_new(JavaThread*, >>>>> ConstantPool*, int)+0x14a >>>>> j >>>>> sun.misc.URLClassPath$JarLoader.checkResource(Ljava/lang/String;ZLjava/util/jar/JarEntry;)Lsun/misc/Resource;+42 >>>>> >>>>> j >>>>> sun.misc.URLClassPath$JarLoader.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+54 >>>>> >>>>> j >>>>> sun.misc.URLClassPath.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+53 >>>>> >>>>> j java.net.URLClassLoader$1.run()Ljava/lang/Class;+26 >>>>> j java.net.URLClassLoader$1.run()Ljava/lang/Object;+1 >>>>> v ~StubRoutines::call_stub >>>>> V [libjvm.so+0x8c3060] JavaCalls::call_helper(JavaValue*, >>>>> methodHandle*, JavaCallArguments*, Thread*)+0x6b2 >>>>> V [libjvm.so+0xba06bc] os::os_exception_wrapper(void (*)(JavaValue*, >>>>> methodHandle*, JavaCallArguments*, Thread*), JavaValue*, >>>>> methodHandle*, JavaCallArguments*, Thread*)+0x3a >>>>> V [libjvm.so+0x8c29a7] JavaCalls::call(JavaValue*, methodHandle, >>>>> JavaCallArguments*, Thread*)+0x7d >>>>> V [libjvm.so+0x972a80] JVM_DoPrivileged+0x63d >>>>> j >>>>> java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;+0 >>>>> >>>>> j >>>>> java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class;+13 >>>>> j >>>>> java.lang.ClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;+70 >>>>> j java.lang.ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class;+3 >>>>> v ~StubRoutines::call_stub >>>>> V [libjvm.so+0x8c3060] JavaCalls::call_helper(JavaValue*, >>>>> methodHandle*, JavaCallArguments*, 
Thread*)+0x6b2 >>>>> V [libjvm.so+0xba06bc] os::os_exception_wrapper(void (*)(JavaValue*, >>>>> methodHandle*, JavaCallArguments*, Thread*), JavaValue*, >>>>> methodHandle*, JavaCallArguments*, Thread*)+0x3a >>>>> V [libjvm.so+0x8c29a7] JavaCalls::call(JavaValue*, methodHandle, >>>>> JavaCallArguments*, Thread*)+0x7d >>>>> V [libjvm.so+0x8c1ec7] JavaCalls::call_virtual(JavaValue*, >>>>> KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x1cb >>>>> V [libjvm.so+0x8c2086] JavaCalls::call_virtual(JavaValue*, Handle, >>>>> KlassHandle, Symbol*, Symbol*, Handle, Thread*)+0xb0 >>>>> V [libjvm.so+0xce2096] >>>>> SystemDictionary::load_instance_class(Symbol*, Handle, Thread*)+0x4ea >>>>> V [libjvm.so+0xce00a8] >>>>> SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, >>>>> Handle, Thread*)+0x824 >>>>> V [libjvm.so+0xcde9e5] SystemDictionary::resolve_or_null(Symbol*, >>>>> Handle, Handle, Thread*)+0x26d >>>>> V [libjvm.so+0xcde435] SystemDictionary::resolve_or_fail(Symbol*, >>>>> Handle, Handle, bool, Thread*)+0x39 >>>>> V [libjvm.so+0x98c89e] find_class_from_class_loader(JNIEnv_*, >>>>> Symbol*, unsigned char, Handle, Handle, unsigned char, Thread*)+0x49 >>>>> V [libjvm.so+0x96f681] JVM_FindClassFromCaller+0x39d >>>>> C [libjava.so+0xdfd0] Java_java_lang_Class_forName0+0x130 >>>>> j >>>>> java.lang.Class.forName0(Ljava/lang/String;ZLjava/lang/ClassLoader;Ljava/lang/Class;)Ljava/lang/Class;+0 >>>>> >>>>> j >>>>> java.lang.Class.forName(Ljava/lang/String;ZLjava/lang/ClassLoader;)Ljava/lang/Class;+49 >>>>> >>>>> j ParallelTransformerLoaderAgent$TestTransformer.loadClasses(I)V+25 >>>>> j >>>>> ParallelTransformerLoaderAgent$TestTransformer.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[B)[B+81 >>>>> >>>>> j >>>>> sun.instrument.TransformerManager.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[B)[B+50 >>>>> >>>>> j >>>>> 
sun.instrument.InstrumentationImpl.transform(Ljava/lang/ClassLoader;Ljava/lang/String;Ljava/lang/Class;Ljava/security/ProtectionDomain;[BZ)[B+34 >>>>> >>>>> v ~StubRoutines::call_stub >>>>> V [libjvm.so+0x8c3060] JavaCalls::call_helper(JavaValue*, >>>>> methodHandle*, JavaCallArguments*, Thread*)+0x6b2 >>>>> V [libjvm.so+0xba06bc] os::os_exception_wrapper(void (*)(JavaValue*, >>>>> methodHandle*, JavaCallArguments*, Thread*), JavaValue*, >>>>> methodHandle*, JavaCallArguments*, Thread*)+0x3a >>>>> V [libjvm.so+0x8c29a7] JavaCalls::call(JavaValue*, methodHandle, >>>>> JavaCallArguments*, Thread*)+0x7d >>>>> V [libjvm.so+0x911bfb] jni_invoke_nonstatic(JNIEnv_*, JavaValue*, >>>>> _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*)+0x3cd >>>>> V [libjvm.so+0x916918] jni_CallObjectMethod+0x388 >>>>> C [libinstrument.so+0x4eb5] transformClassFile+0x1e5 >>>>> C [libinstrument.so+0x1e06] eventHandlerClassFileLoadHook+0x96 >>>>> V [libjvm.so+0xa04afa] >>>>> JvmtiClassFileLoadHookPoster::post_to_env(JvmtiEnv*, bool)+0x1a8 >>>>> V [libjvm.so+0xa0485e] >>>>> JvmtiClassFileLoadHookPoster::post_all_envs()+0x8a >>>>> V [libjvm.so+0xa047c6] JvmtiClassFileLoadHookPoster::post()+0x18 >>>>> V [libjvm.so+0x9fb6e1] >>>>> JvmtiExport::post_class_file_load_hook(Symbol*, Handle, Handle, >>>>> unsigned char**, unsigned char**, JvmtiCachedClassFileData**)+0x85 >>>>> V [libjvm.so+0x5cd17d] ClassFileParser::parseClassFile(Symbol*, >>>>> ClassLoaderData*, Handle, KlassHandle, GrowableArray*, >>>>> TempNewSymbol&, bool, Thread*)+0x2af >>>>> V [libjvm.so+0x5dd441] ClassFileParser::parseClassFile(Symbol*, >>>>> ClassLoaderData*, Handle, TempNewSymbol&, bool, Thread*)+0x95 >>>>> V [libjvm.so+0x5daf03] ClassLoader::load_classfile(Symbol*, >>>>> Thread*)+0x2ed >>>>> V [libjvm.so+0xce1cc4] >>>>> SystemDictionary::load_instance_class(Symbol*, Handle, Thread*)+0x118 >>>>> V [libjvm.so+0xce00a8] >>>>> SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, >>>>> Handle, 
Thread*)+0x824 >>>>> V [libjvm.so+0xcde9e5] SystemDictionary::resolve_or_null(Symbol*, >>>>> Handle, Handle, Thread*)+0x26d >>>>> V [libjvm.so+0xcde435] SystemDictionary::resolve_or_fail(Symbol*, >>>>> Handle, Handle, bool, Thread*)+0x39 >>>>> V [libjvm.so+0x690fbc] >>>>> ConstantPool::klass_at_impl(constantPoolHandle, int, Thread*)+0x3cc >>>>> V [libjvm.so+0x5398cb] ConstantPool::klass_at(int, Thread*)+0x55 >>>>> V [libjvm.so+0x8b1f3c] InterpreterRuntime::_new(JavaThread*, >>>>> ConstantPool*, int)+0x14a >>>>> j >>>>> sun.misc.URLClassPath$JarLoader.checkResource(Ljava/lang/String;ZLjava/util/jar/JarEntry;)Lsun/misc/Resource;+42 >>>>> >>>>> j >>>>> sun.misc.URLClassPath$JarLoader.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+54 >>>>> >>>>> j >>>>> sun.misc.URLClassPath.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+53 >>>>> >>>>> j java.net.URLClassLoader$1.run()Ljava/lang/Class;+26 >>>>> j java.net.URLClassLoader$1.run()Ljava/lang/Object;+1 >>>>> v ~StubRoutines::call_stub >>>>> V [libjvm.so+0x8c3060] JavaCalls::call_helper(JavaValue*, >>>>> methodHandle*, JavaCallArguments*, Thread*)+0x6b2 >>>>> V [libjvm.so+0xba06bc] os::os_exception_wrapper(void (*)(JavaValue*, >>>>> methodHandle*, JavaCallArguments*, Thread*), JavaValue*, >>>>> methodHandle*, JavaCallArguments*, Thread*)+0x3a >>>>> V [libjvm.so+0x8c29a7] JavaCalls::call(JavaValue*, methodHandle, >>>>> JavaCallArguments*, Thread*)+0x7d >>>>> V [libjvm.so+0x972a80] JVM_DoPrivileged+0x63d >>>>> j >>>>> java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;+0 >>>>> >>>>> j >>>>> java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class;+13 >>>>> j >>>>> java.lang.ClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;+70 >>>>> j java.lang.ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class; >>>>> v ~StubRoutines::call_stub >>>>> V [libjvm.so+0x8c3060] JavaCalls::call_helper(JavaValue*, >>>>> 
methodHandle*, JavaCallArguments*, Thread*)+0x6b2 >>>>> V [libjvm.so+0xba06bc] os::os_exception_wrapper(void (*)(JavaValue*, >>>>> methodHandle*, JavaCallArguments*, Thread*), JavaValue*, >>>>> methodHandle*, JavaCallArguments*, Thread*)+0x3a >>>>> V [libjvm.so+0x8c29a7] JavaCalls::call(JavaValue*, methodHandle, >>>>> JavaCallArguments*, Thread*)+0x7d >>>>> V [libjvm.so+0x8c1ec7] JavaCalls::call_virtual(JavaValue*, >>>>> KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x1cb >>>>> V [libjvm.so+0x8c2086] JavaCalls::call_virtual(JavaValue*, Handle, >>>>> KlassHandle, Symbol*, Symbol*, Handle, Thread*)+0xb0 >>>>> V [libjvm.so+0xce2096] >>>>> SystemDictionary::load_instance_class(Symbol*, Handle, Thread*)+0x4ea >>>>> V [libjvm.so+0xce00a8] >>>>> SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, >>>>> Handle, Thread*)+0x824 >>>>> V [libjvm.so+0xcde9e5] SystemDictionary::resolve_or_null(Symbol*, >>>>> Handle, Handle, Thread*)+0x26d >>>>> V [libjvm.so+0xcde435] SystemDictionary::resolve_or_fail(Symbol*, >>>>> Handle, Handle, bool, Thread*)+0x39 >>>>> V [libjvm.so+0x98c89e] find_class_from_class_loader(JNIEnv_*, >>>>> Symbol*, unsigned char, Handle, Handle, unsigned char, Thread*)+0x49 >>>>> V [libjvm.so+0x96f681] JVM_FindClassFromCaller+0x39d >>>>> C [libjava.so+0xdfd0] Java_java_lang_Class_forName0+0x130 >>>>> j >>>>> java.lang.Class.forName0(Ljava/lang/String;ZLjava/lang/ClassLoader;Ljava/lang/Class;)Ljava/lang/Class;+0 >>>>> >>>>> j >>>>> java.lang.Class.forName(Ljava/lang/String;ZLjava/lang/ClassLoader;)Ljava/lang/Class;+49 >>>>> >>>>> j ParallelTransformerLoaderApp.loadClasses(I)V+25 >>>>> ...... >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Oct 27, 2014, at 2:00 PM, serguei.spitsyn at oracle.com wrote: >>>>> >>>>>> Ok. >>>>>> >>>>>> Thanks, Dan! >>>>>> Serguei >>>>>> >>>>>> >>>>>> On 10/27/14 7:05 AM, Daniel D. Daugherty wrote: >>>>>>>> The test case was added by Dan. 
>>>>>>>> We may want to ask him to clarify the test case purpose. >>>>>>>> (added Dan to the to-list) >>>>>>> Here's the changeset that added the test: >>>>>>> >>>>>>> $ hg log -v -r bca8bf23ac59 >>>>>>> test/java/lang/instrument/ParallelTransformerLoader.sh >>>>>>> changeset: 132:bca8bf23ac59 >>>>>>> user: dcubed >>>>>>> date: Mon Mar 24 15:05:09 2008 -0700 >>>>>>> files: test/java/lang/instrument/ParallelTransformerLoader.sh >>>>>>> test/java/lang/instrument/ParallelTransformerLoaderAgent.java >>>>>>> test/java/lang/instrument/ParallelTransformerLoaderApp.java >>>>>>> test/java/lang/instrument/TestClass1.java >>>>>>> test/java/lang/instrument/TestClass2.java >>>>>>> test/java/lang/instrument/TestClass3.java >>>>>>> description: >>>>>>> 5088398: 3/2 java.lang.instrument TCK test deadlock (test11) >>>>>>> Summary: Add regression test for single-threaded bootstrap classloader. >>>>>>> Reviewed-by: sspitsyn >>>>>>> >>>>>>> >>>>>>> Based on my e-mail archive for this bug and from the bug report itself, >>>>>>> it looks like we got this test from Wily Labs. The original bug was a >>>>>>> deadlock that stopped being reproducible after: >>>>>>> >>>>>>> Karen fixed the bootstrap class loader to work in parallel via: >>>>>>> >>>>>>> 4997893 4/5 Investigate allowing bootstrap loader to work in >>>>>>> parallel >>>>>>> >>>>>>> with that fix in place the deadlock no longer reproduces. >>>>>>> I'm planning to use this bug as the vehicle for getting >>>>>>> the test program into the INSTRUMENT_REGRESSION test suite. >>>>>>> >>>>>>> *** (#2 of 2): 2008-02-29 18:20:17 GMT+00:00 daniel.daugherty at sun.com >>>>>>> >>>>>>> >>>>>>> A careful reading of JDK-5088398 might reveal the intentions of this >>>>>>> test... >>>>>>> >>>>>>> Dan >>>>>>> >>>>>>> >>>>>>> On 10/24/14 5:57 PM, serguei.spitsyn at oracle.com wrote: >>>>>>>> Yumin, >>>>>>>> >>>>>>>> On 10/24/14 4:08 PM, Yumin Qi wrote: >>>>>>>>> Serguei, >>>>>>>>> >>>>>>>>> Thanks for your comments. 
>>>>>>>>> This failure happens intermittently, but it can now be reproduced with 8/9.
>>>>>>>>> TestClass1 is loaded in the main thread while TestClass2 is loaded in
>>>>>>>>> TestThread in parallel. Both threads will call transform since
>>>>>>>>> TestClass[1-3] are loaded via the agent. While TestClass2 is being
>>>>>>>>> loaded, the agent triggers loading of TestClass3 in TestThread.
>>>>>>>>> Note the for loop in the main thread:
>>>>>>>>>
>>>>>>>>>     for (int i = 0; i < kNumIterations; i++)
>>>>>>>>>     {
>>>>>>>>>         // load some classes from multiple threads
>>>>>>>>>         // (this thread and one other)
>>>>>>>>>         Thread testThread = new TestThread(2);
>>>>>>>>>         testThread.start();
>>>>>>>>>         loadClasses(1);
>>>>>>>>>
>>>>>>>>>         // log that it completed and reset for the
>>>>>>>>>         // next iteration
>>>>>>>>>         testThread.join();
>>>>>>>>>         System.out.print(".");
>>>>>>>>>         ParallelTransformerLoaderAgent.generateNewClassLoader();
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>> The loader is renewed only after testThread.join(), so both threads
>>>>>>>>> are using the exact same class loader.
>>>>>>>> You are right, thanks.
>>>>>>>> It means that all three classes (TestClass1, TestClass2 and TestClass3)
>>>>>>>> are loaded by the same class loader in each iteration.
>>>>>>>>
>>>>>>>> However, I see more cases in which TestClass3 gets loaded.
>>>>>>>> It happens in a CFLH event when any other class (not TestClass*) in
>>>>>>>> the system is loaded.
>>>>>>>> The class loading thread can be any thread, not only "main" or
>>>>>>>> "TestThread".
>>>>>>>> I suspect this test case mostly targets class loading that happens
>>>>>>>> on other threads.
>>>>>>>> It is because of the lines:
>>>>>>>>     // In 160_03 and older, transform() is called
>>>>>>>>     // with the "system_loader_lock" held and that
>>>>>>>>     // prevents the bootstrap class loader from
>>>>>>>>     // running in parallel.
>>>>>>>>     // If we add a slight sleep
>>>>>>>>     // delay here when the transform() call is not
>>>>>>>>     // main or TestThread, then the deadlock in
>>>>>>>>     // 160_03 and older is much more reproducible.
>>>>>>>>     if (!tName.equals("main") &&
>>>>>>>>         !tName.equals("TestThread")) {
>>>>>>>>         System.out.println("Thread '" + tName +
>>>>>>>>             "' has called transform()");
>>>>>>>>         try {
>>>>>>>>             Thread.sleep(500);
>>>>>>>>         } catch (InterruptedException ie) {
>>>>>>>>         }
>>>>>>>>     }
>>>>>>>>
>>>>>>>> What about the following?
>>>>>>>>
>>>>>>>> In ParallelTransformerLoaderAgent.java make this change:
>>>>>>>>     if (!tName.equals("main"))
>>>>>>>>     => if (tName.equals("TestThread"))
>>>>>>>>
>>>>>>>> Does the updated test still fail?
>>>>>>>>
>>>>>>>>> After a new class loader is created, the next iteration will use it.
>>>>>>>>> This is why we quite often see JarLoader$2 being resolved in the
>>>>>>>>> stack trace.
>>>>>>>>>
>>>>>>>>> I do not quite understand the test case either. Loading TestClass3
>>>>>>>>> inside transform using the same class loader will cause another call
>>>>>>>>> to transform and form a cycle. Nonetheless, if we see
>>>>>>>>> TestClass2 already loaded, the loop will end, but that is still a
>>>>>>>>> risk.
>>>>>>>> In fact, I don't like that the test loads the class TestClass3 at
>>>>>>>> the TestClass3 CFLH event.
>>>>>>>> However, it would be interesting to know why we did not see (is it the
>>>>>>>> case?) this issue before.
>>>>>>>> Also, it would be interesting to know why the test stops failing with
>>>>>>>> your fix (replacing loader with SystemClassLoader).
>>>>>>>>
>>>>>>>> The test case was added by Dan.
>>>>>>>> We may want to ask him to clarify the test case purpose.
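
The screening discussed in this thread (trigger the nested load only on TestThread and, per Karen's proposal, only when the loader is not the boot loader) could be sketched roughly as follows. This is a hypothetical illustration, not the actual ParallelTransformerLoaderAgent source: the class name ScreeningTransformer, the helper shouldLoadNested and the counter nestedLoadRequests are invented here, and the call to loadClasses(3) is only indicated by a comment.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;

// Hypothetical sketch, not the actual ParallelTransformerLoaderAgent source.
class ScreeningTransformer implements ClassFileTransformer {
    // Number of transform() calls that passed the screen (illustration only).
    int nestedLoadRequests = 0;

    // A null loader denotes the boot loader; those loads are left alone.
    static boolean shouldLoadNested(ClassLoader loader, String threadName) {
        return loader != null && "TestThread".equals(threadName);
    }

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        if (shouldLoadNested(loader, Thread.currentThread().getName())) {
            nestedLoadRequests++;
            // The real test would call loadClasses(3) at this point.
        }
        return null; // null tells the JVM to keep the class bytes unchanged
    }
}
```

With this shape, a boot loader load such as sun/misc/URLClassPath$JarLoader$2 never triggers a nested load, and neither does a call from a thread such as "Signal Dispatcher".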
>>>>>>>> (added Dan to the to-list)
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Serguei
>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Yumin
>>>>>>>>>
>>>>>>>>> On 10/24/2014 1:20 PM, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>> Hi Yumin,
>>>>>>>>>>
>>>>>>>>>> Below is some analysis to make sure I understand the test
>>>>>>>>>> scenario correctly.
>>>>>>>>>>
>>>>>>>>>> ParallelTransformerLoaderApp.main() executes a 1000-iteration
>>>>>>>>>> loop.
>>>>>>>>>> At each iteration it:
>>>>>>>>>>   - creates and starts a new TestThread
>>>>>>>>>>   - loads TestClass1 with the current class loader:
>>>>>>>>>>     ParallelTransformerLoaderAgent.getClassLoader()
>>>>>>>>>>   - replaces the current class loader with a new one:
>>>>>>>>>>     ParallelTransformerLoaderAgent.generateNewClassLoader()
>>>>>>>>>>
>>>>>>>>>> The TestThread loads TestClass2 concurrently with the main
>>>>>>>>>> thread.
>>>>>>>>>>
>>>>>>>>>> At the CFLH events, the ParallelTransformerLoaderAgent does the
>>>>>>>>>> class retransformation.
>>>>>>>>>> If the thread loading the class is not "main", it loads the class
>>>>>>>>>> TestClass3
>>>>>>>>>> with the current class loader
>>>>>>>>>> ParallelTransformerLoaderAgent.getClassLoader().
>>>>>>>>>>
>>>>>>>>>> Sometimes TestClass2 and TestClass3 are loaded by the same
>>>>>>>>>> class loader recursively.
>>>>>>>>>> It happens if the class loader has not been changed between
>>>>>>>>>> loading the TestClass2 and TestClass3 classes.
>>>>>>>>>>
>>>>>>>>>> I'm not yet convinced the test is incorrect.
>>>>>>>>>> And it is not clear why we get a ClassCircularityError.
>>>>>>>>>>
>>>>>>>>>> Please let me know if the above understanding is wrong.
>>>>>>>>>> I also see the reply from David and share his concerns.
>>>>>>>>>>
>>>>>>>>>> It is not clear if this failure is a regression.
>>>>>>>>>> Did we observe this issue before?
>>>>>>>>>> If not, then when and why did this failure start to appear?
>>>>>>>>>>
>>>>>>>>>> Unfortunately, it is impossible to look at the test run history
>>>>>>>>>> at the moment.
>>>>>>>>>> Aurora is down for maintenance.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Serguei
>>>>>>>>>>
>>>>>>>>>> On 10/13/14 3:58 PM, Yumin Qi wrote:
>>>>>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8038468
>>>>>>>>>>> webrev: http://cr.openjdk.java.net/~minqi/8038468/webrev00/
>>>>>>>>>>>
>>>>>>>>>>> The bug is marked as confidential, so the webrev is posted internally.
>>>>>>>>>>>
>>>>>>>>>>> Problem: The test case tries to load a class from the same jar
>>>>>>>>>>> via the agent in the middle of loading another class from that jar,
>>>>>>>>>>> via the same class loader in the same thread. The call happens in
>>>>>>>>>>> transform, which is a rare case: in the middle of loading a class,
>>>>>>>>>>> another class is loaded. The result is a ClassCircularityError.
>>>>>>>>>>> While the first class is being loaded, the VM puts JarLoader$2 in
>>>>>>>>>>> the placeholder table and then starts defineClass, which calls
>>>>>>>>>>> transform. transform begins loading the second class, which follows
>>>>>>>>>>> the same routine and tries to load JarLoader$2 first, finds it
>>>>>>>>>>> already in the placeholder table, and a ClassCircularityError is
>>>>>>>>>>> thrown.
>>>>>>>>>>> Fix: The test case should not load a class with the same class
>>>>>>>>>>> loader, in the same thread, from the same jar inside the 'transform'
>>>>>>>>>>> method. I modified it to load with the system class loader, and we
>>>>>>>>>>> expect to see ClassNotFoundException. See the bug comments for
>>>>>>>>>>> details.
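
The placeholder mechanism described in the Problem statement can be modelled with a deliberately simplified sketch. This is a toy model under stated assumptions, not HotSpot's SystemDictionary code: the class PlaceholderModel, its load method and the demo helper are hypothetical names, and a plain set of in-progress names on a single thread stands in for the VM's placeholder table.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only; HotSpot's real placeholder handling is far more involved.
class PlaceholderModel {
    // Names whose loading is currently in progress on this (single) thread.
    private final Set<String> inProgress = new HashSet<>();

    // "Loads" a name; duringLoad stands in for work done mid-load, e.g. transform().
    void load(String name, Runnable duringLoad) {
        if (!inProgress.add(name)) {
            // The same name was requested again before the first load finished.
            throw new ClassCircularityError(name);
        }
        try {
            duringLoad.run();
        } finally {
            inProgress.remove(name);
        }
    }

    // Reproduces the shape of the failure; returns the reported name, or null if none.
    static String demo() {
        PlaceholderModel model = new PlaceholderModel();
        try {
            // Outer load of the helper class: its "transform" triggers a nested
            // load of TestClass3, which needs the same helper class again.
            model.load("sun/misc/URLClassPath$JarLoader$2",
                () -> model.load("TestClass3",
                    () -> model.load("sun/misc/URLClassPath$JarLoader$2", () -> { })));
            return null;
        } catch (ClassCircularityError e) {
            return e.getMessage();
        }
    }
}
```

In this model the nested request for the helper class is what fails, which mirrors the order of frames in the stack trace quoted earlier in the thread.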
>>>>>>>>>>> >>>>>>>>>>> Thanks >>>>>>>>>>> Yumin * > From coleen.phillimore at oracle.com Thu Nov 13 15:17:50 2014 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Thu, 13 Nov 2014 10:17:50 -0500 Subject: hang when using -XX:-UseCompilerSafepoints In-Reply-To: <5464B7F5.9060000@oracle.com> References: <5463BF71.4080804@oracle.com> <5463EF88.1050100@oracle.com> <54641E27.4090303@oracle.com> <546427DB.3070806@oracle.com> <5464444A.7030601@oracle.com> <5464610A.20901@oracle.com> <54647017.8000704@oracle.com> <54647200.5000103@oracle.com> <546475E8.6000908@oracle.com> <5464B7F5.9060000@oracle.com> Message-ID: <5464CB9E.60107@oracle.com> On 11/13/14, 8:53 AM, Daniel D. Daugherty wrote: > > Happy to let others weigh in. > > Please use 8064749 to remove the flag; David H is correct that all the > right info is there. Agreed. There is no point in having an experimental flag that is broken also. This is not worth backporting to 8u. Coleen > > Dan > > > On 11/13/14 2:12 AM, David Holmes wrote: >> On 13/11/2014 6:55 PM, Aleksey Shipilev wrote: >>> On 13.11.2014 11:47, David Holmes wrote: >>>> On 13/11/2014 5:43 PM, Aleksey Shipilev wrote: >>>>> On 13.11.2014 08:40, David Holmes wrote: >>>>>> There is some history in JDK-4974572 (which is non-public I'm >>>>>> afraid). >>>>>> To all intents and purposes the flag at that point was used to >>>>>> enable >>>>>> testing of workarounds if problems were suspected in the "new" >>>>>> safepointing code. I think it has outlived its usefulness by a >>>>>> few major >>>>>> releases so I'm happy to see it go. >>>>> >>>>> Filed: >>>>> https://bugs.openjdk.java.net/browse/JDK-8064777 >>>> >>>> You actually filed 8064776 first :) >>> >>> O_o. The submit timestamps are the same. JIRA is funky today, huh. >>> >>>> But neither is needed as removal can be the solution of 8064749. >>> >>> I thought we are better off tracking this separately, and then close >>> all/any pending bugs about UseCompilerSafepoints as WNF citing 8064776. 
>>> Still want to do this in 8064749? >> >> It is what I did in 8062307 for the TraceThreadEvents flag. 8064749 >> contains all the pertinent comments. >> >>> Also, I wonder if we want to demote the flag to experimental in 8u. >>> This >>> does not sound like a backport of 8064749 at all, but rather a separate >>> change. >> >> Any change requires CCC. I don't see any point in making the flag >> experimental as it doesn't really provide any "experimentation". >> >> Happy to let others weigh in. >> >> Cheers, >> David >> >> >> >>> -Aleksey. >>> > From coleen.phillimore at oracle.com Thu Nov 13 15:25:18 2014 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Thu, 13 Nov 2014 10:25:18 -0500 Subject: RFR (S) 8064749: -XX:-UseCompilerSafepoints breaks safepoint rendezvous In-Reply-To: <5464A6EC.6090804@oracle.com> References: <5464A6EC.6090804@oracle.com> Message-ID: <5464CD5E.4060800@oracle.com> Yes, you have to file a CCC first. Coleen On 11/13/14, 7:41 AM, Aleksey Shipilev wrote: > Hi, > > This is a patch that removes -XX:+-UseCompilerSafepoints from Hotspot: > https://bugs.openjdk.java.net/browse/JDK-8064749 > http://cr.openjdk.java.net/~shade/8064749/webrev.01/ > > Do I understand it right we need a CCC to remove the product flag? > > Testing: JPRT, vm.quick.testlist > > Thanks, > -Aleksey. > From aleksey.shipilev at oracle.com Thu Nov 13 15:43:36 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Thu, 13 Nov 2014 18:43:36 +0300 Subject: RFR (S) 8064749: -XX:-UseCompilerSafepoints breaks safepoint rendezvous In-Reply-To: <5464C5D2.2050007@oracle.com> References: <5464A6EC.6090804@oracle.com> <5464C5D2.2050007@oracle.com> Message-ID: <5464D1A8.7050901@oracle.com> Thanks for review, Dan! Updated webrev: http://cr.openjdk.java.net/~shade/8064749/webrev.02/ On 11/13/2014 05:53 PM, Daniel D. 
Daugherty wrote: > On 11/13/14 5:41 AM, Aleksey Shipilev wrote: >> Hi, >> >> This is a patch that removes -XX:+-UseCompilerSafepoints from Hotspot: >> https://bugs.openjdk.java.net/browse/JDK-8064749 >> http://cr.openjdk.java.net/~shade/8064749/webrev.01/ > > src/share/vm/runtime/arguments.cpp > Not your problem, but that list is formatted quite > inconsistently. > > It seems like new entries should be added at the bottom > so your line 309 should be between these two lines: > > line 313: #endif // ZERO > line 314: { NULL, JDK_Version(0), JDK_Version(0) } Moved. >> Do I understand it right we need a CCC to remove the product flag? > > Yes. Since this was a product flag, it needs a CCC to remove it. Filed. -Aleksey. From aleksey.shipilev at oracle.com Thu Nov 13 15:43:58 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Thu, 13 Nov 2014 18:43:58 +0300 Subject: RFR (S) 8064749: -XX:-UseCompilerSafepoints breaks safepoint rendezvous In-Reply-To: <5464CD5E.4060800@oracle.com> References: <5464A6EC.6090804@oracle.com> <5464CD5E.4060800@oracle.com> Message-ID: <5464D1BE.4090204@oracle.com> Got it, filed. Any problems with the code change? http://cr.openjdk.java.net/~shade/8064749/webrev.02/ -Aleksey. On 11/13/2014 06:25 PM, Coleen Phillimore wrote: > > Yes, you have to file a CCC first. > Coleen > > On 11/13/14, 7:41 AM, Aleksey Shipilev wrote: >> Hi, >> >> This is a patch that removes -XX:+-UseCompilerSafepoints from Hotspot: >> https://bugs.openjdk.java.net/browse/JDK-8064749 >> http://cr.openjdk.java.net/~shade/8064749/webrev.01/ >> >> Do I understand it right we need a CCC to remove the product flag? >> >> Testing: JPRT, vm.quick.testlist >> >> Thanks, >> -Aleksey. 
>> > From coleen.phillimore at oracle.com Thu Nov 13 16:02:40 2014 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Thu, 13 Nov 2014 11:02:40 -0500 Subject: RFR (S) 8064749: -XX:-UseCompilerSafepoints breaks safepoint rendezvous In-Reply-To: <5464D1BE.4090204@oracle.com> References: <5464A6EC.6090804@oracle.com> <5464CD5E.4060800@oracle.com> <5464D1BE.4090204@oracle.com> Message-ID: <5464D620.80008@oracle.com> On 11/13/14, 10:43 AM, Aleksey Shipilev wrote: > Got it, filed. > > Any problems with the code change? > http://cr.openjdk.java.net/~shade/8064749/webrev.02/ No, the code change looks fine. Coleen > > -Aleksey. > > On 11/13/2014 06:25 PM, Coleen Phillimore wrote: >> Yes, you have to file a CCC first. >> Coleen >> >> On 11/13/14, 7:41 AM, Aleksey Shipilev wrote: >>> Hi, >>> >>> This is a patch that removes -XX:+-UseCompilerSafepoints from Hotspot: >>> https://bugs.openjdk.java.net/browse/JDK-8064749 >>> http://cr.openjdk.java.net/~shade/8064749/webrev.01/ >>> >>> Do I understand it right we need a CCC to remove the product flag? >>> >>> Testing: JPRT, vm.quick.testlist >>> >>> Thanks, >>> -Aleksey. >>> > From aleksey.shipilev at oracle.com Thu Nov 13 16:14:17 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Thu, 13 Nov 2014 19:14:17 +0300 Subject: RFR (S) 8064749: -XX:-UseCompilerSafepoints breaks safepoint rendezvous In-Reply-To: <5464D620.80008@oracle.com> References: <5464A6EC.6090804@oracle.com> <5464CD5E.4060800@oracle.com> <5464D1BE.4090204@oracle.com> <5464D620.80008@oracle.com> Message-ID: <5464D8D9.70606@oracle.com> Thanks, Coleen! Changeset: http://cr.openjdk.java.net/~shade/8064749/8064749.changeset (Patiently waiting for CCC to be approved). -Aleksey. On 11/13/2014 07:02 PM, Coleen Phillimore wrote: > > On 11/13/14, 10:43 AM, Aleksey Shipilev wrote: >> Got it, filed. >> >> Any problems with the code change? >> http://cr.openjdk.java.net/~shade/8064749/webrev.02/ > > No, the code change looks fine. 
> > Coleen >> >> -Aleksey. >> >> On 11/13/2014 06:25 PM, Coleen Phillimore wrote: >>> Yes, you have to file a CCC first. >>> Coleen >>> >>> On 11/13/14, 7:41 AM, Aleksey Shipilev wrote: >>>> Hi, >>>> >>>> This is a patch that removes -XX:+-UseCompilerSafepoints from Hotspot: >>>> https://bugs.openjdk.java.net/browse/JDK-8064749 >>>> http://cr.openjdk.java.net/~shade/8064749/webrev.01/ >>>> >>>> Do I understand it right we need a CCC to remove the product flag? >>>> >>>> Testing: JPRT, vm.quick.testlist >>>> >>>> Thanks, >>>> -Aleksey. >>>> >> > From gunter.haug at sap.com Thu Nov 13 16:39:48 2014 From: gunter.haug at sap.com (Haug, Gunter) Date: Thu, 13 Nov 2014 17:39:48 +0100 Subject: RFR(XS): 8064471: Port 8013895: G1: G1SummarizeRSetStats output on Linux needs improvement to AIX In-Reply-To: <546470BD.9050303@oracle.com> References: <44EB1DA1040EEB44B7F343A782AD30789FEBA3E3@DEWDFEMB11B.global.corp.sap> <5463149A.6020506@oracle.com> <54637A9A.9040108@sap.com> <546470BD.9050303@oracle.com> Message-ID: <5464DED4.9040909@sap.com> On 13.11.2014 09:50, David Holmes wrote: > On 13/11/2014 1:19 AM, Haug, Gunter wrote: >> >> On 12.11.2014 09:04, David Holmes wrote: >>> Hi Gunter, >>> >>> On 11/11/2014 11:23 PM, Haug, Gunter wrote: >>>> Hi All, >>>> >>>> The change '8013895: (G1: G1SummarizeRSetStats output on Linux needs >>>> improvement)' makes use of getrusage() to retrieve accurate >>>> per-thread data on resource usage. We can use exactly the same code >>>> on AIX to achieve this. >>>> >>>> Please review the following change: >>>> >>>> http://cr.openjdk.java.net/~goetz/webrevs/8064471-aixTime/webrev.00/ >>>> https://bugs.openjdk.java.net/browse/JDK-8064471 >>> >>> I have a couple of comments on this code which presumably also apply >>> to the orginal :( >> Yes, they apply to the original as well, see below. >>> >>> First this comment is no longer applicable (actually it was never >>> applicable to AIX!): >>> >>> // For now, we say that linux does not support vtime. 
>>> // I have no idea whether it can actually be made to (DLD, 9/13/05).
>>>
>> You're right. I will remove it.
>>> Second, this calculation seems wrong:
>>>
>>>   return (double) (usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) +
>>>     (double) (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / (1000 * 1000);
>>>
>>> To me this performs integer division (i.e. truncation) then converts
>>> the resulting integer to a double. I would expect to see additional
>>> parentheses (even if not needed, for clarity):
>>>
>>>   return (double) (usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) +
>>>     ((double) (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec)) / (1000 * 1000);
>>>
>>> or more simply divide by a floating-point value:
>>>
>>>   return (double) (usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) +
>>>     (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / (1000.0 * 1000);
>>>
>>> and you don't need two double casts regardless, as the expression will
>>> be of type double as soon as there is one operand of type double. So
>>> that should reduce to:
>>>
>>>   return usage.ru_utime.tv_sec + usage.ru_stime.tv_sec +
>>>     (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / (1000.0 * 1000);
>>>
>> OK. Do you want us to also change the Linux version as you proposed?
>
> I'll leave it up to you. If you leave this as AIX only then it tests
> the new process :) There can be a follow-up cleanup bug for linux.

Hi David,

I think it's not worth the effort to make two separate changes on linux
and aix, so I fixed linux as well. Please find the new webrev below.
There will probably be more opportunities to test the new process in
the future.

http://cr.openjdk.java.net/~simonis/webrevs/8064471.v2/

Now we need a sponsor, as it is not aix only anymore.
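
The difference between dividing the microsecond sum by an integer and by a floating-point value can be illustrated with a small sketch. This is a Java analogue for illustration only: the code under review is C++ and reads its values from the structure filled in by getrusage(), whereas the class VtimeMath and its two methods are hypothetical, with sec standing for the summed tv_sec fields and usec for the summed tv_usec fields.

```java
// Java analogue for illustration; not the HotSpot code under review.
class VtimeMath {
    // Integer division first: the sub-second part is truncated away.
    static double truncating(long sec, long usec) {
        return sec + usec / (1000 * 1000);
    }

    // Floating-point divisor: the microseconds survive as a fraction.
    static double exact(long sec, long usec) {
        return sec + usec / (1000.0 * 1000);
    }
}
```

For 3 seconds plus 750000 microseconds, truncating yields 3.0 while exact yields 3.75, which is why the version with the 1000.0 divisor needs no casts at all.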
Thanks, Gunter > > Thanks, > David > >> Thanks, >> Gunter >> >>> Cheers, >>> David >>> >>>> Thanks, >>>> Gunter >>>> >> From tom.deneau at amd.com Thu Nov 13 16:43:25 2014 From: tom.deneau at amd.com (Deneau, Tom) Date: Thu, 13 Nov 2014 16:43:25 +0000 Subject: hang when using -XX:-UseCompilerSafepoints In-Reply-To: <5464444A.7030601@oracle.com> References: <5463BF71.4080804@oracle.com> <5463EF88.1050100@oracle.com> <54641E27.4090303@oracle.com> <546427DB.3070806@oracle.com> <5464444A.7030601@oracle.com> Message-ID: Just as an aside, I got involved in this because I wanted to see the effect of the CompilerSafepoints poll instruction when comparing performance of small JMH benchmarks across a couple of different architectures. But I'm fine with getting rid of the flag. -- Tom -----Original Message----- From: David Holmes [mailto:david.holmes at oracle.com] Sent: Wednesday, November 12, 2014 11:40 PM To: Vladimir Kozlov; hotspot-runtime-dev at openjdk.java.net Cc: Deneau, Tom Subject: Re: hang when using -XX:-UseCompilerSafepoints Hi Vladimir, On 13/11/2014 1:39 PM, Vladimir Kozlov wrote: > I agrer that workaround is -Xint. But if we disable compilation with > -UseCompilerSafepoints, the flag becomes useless. You can get the same > result with just -Xint. > > The history shows that it was added at the very beginning of Hotspot > development, at the day one. I can only speculate that it was used to > find performance effects of safepoints in compiled code . It could be > the case that we removed safepoints from Counted loops as result of that > investigation. I think it was never intended to be used in production. > > Although we can fix compilers to generate a runtime call which does > safepoint when -UseCompilerSafepoints is specified, it will be useless > work, I think. There is some history in JDK-4974572 (which is non-public I'm afraid). 
To all intents and purposes the flag at that point was used to enable testing of workarounds if problems were suspected in the "new" safepointing code. I think it has outlived its usefulness by a few major releases so I'm happy to see it go. Cheers, David > thanks, > Vladimir > > On 11/12/14 6:57 PM, David Holmes wrote: >> On 13/11/2014 9:38 AM, Vladimir Kozlov wrote: >>> On 11/12/14 12:13 PM, Aleksey Shipilev wrote: >>>> Hi, >>>> >>>> Still not sure if this is a runtime bug: stripping safepoints from the >>>> non-counted loop seems to be a recipe for disaster. >>> >>> This flag does not affect compiled code - so it is not compiler issue. >> >> Well, it disables the mechanism that the compiler inserts for checking >> if a safepoint has been requested. As I've added to the bug report, >> disabling compiler safepoints should go hand-in-hand with disabling the >> compilers (ie run with -Xint) - otherwise you have to know that the >> compiled code will eventually hit a non-compiler safepoint check. >> >>> It is only used in runtime/safepoint.cpp and it guards the code which >>> protects a polling page. >>> >>> There are many bugs which shows current problem. For example: >>> >>> https://bugs.openjdk.java.net/browse/JDK-6873333 >>> >>> I would say that we have to remove it or at least make it experimental >>> flag if we want to do experiments with it. >>> >>> We definitely should not allow to use it in production! >> >> If we assume there is a reason it was made a product flag then the >> correct fix in my opinion would be to fall back to intepreter-only mode >> when this flag is turned off. >> >> If we don't make that assumption then we could still tie it to >> interpreter-only mode, but we definitely should not make it configurable >> in product mode without some effort. >> >> Or if we can't ascertain a valid reason for ever wanting to do this, we >> could simply delete the flag altogether. 
:) >> >> Cheers, >> David >> >>> Regards, >>> Vladimir >>> >>>> >>>> Anyhow, I think it deserves a simpler example. Submitted the bug and >>>> attached a simple test there: >>>> https://bugs.openjdk.java.net/browse/JDK-8064749 >>>> >>>> Thanks, >>>> -Aleksey. >>>> >>>> On 12.11.2014 19:52, Deneau, Tom wrote: >>>>> Hi all -- >>>>> >>>>> Forwarding a thread which came about on the jmh-dev mail list, as >>>>> recommended by Aleksey Shipilev (see below). The JMH framework has a >>>>> timing control thread which sleeps for a certain period, then sets a >>>>> volatile isDone variable. Meanwhile, the benchmark thread loops >>>>> doing its benchmark code and also checking the isDone field. A hang >>>>> occurs if -XX:-UseCompilerSafepoints is used. >>>>> >>>>> The original issue can be reproduced by the following steps >>>>> >>>>> hg clone http://hg.openjdk.java.net/code-tools/jmh >>>>> cd jmh >>>>> mvn clean install -DskipTests=true >>>>> cd jmh-samples >>>>> java -server -XX:-UseCompilerSafepoints -jar >>>>> target/benchmarks.jar 'JMHSample_01_.*' -t 1 -wi 5 -i 5 -f 0 >>>>> >>>>> -- Tom Deneau >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Aleksey Shipilev [mailto:aleksey.shipilev at oracle.com] >>>>> Sent: Wednesday, November 12, 2014 6:09 AM >>>>> To: Deneau, Tom; jmh-dev at openjdk.java.net >>>>> Subject: Re: using -XX:-UseCompilerSafepoints >>>>> >>>>> Hi Tom, >>>>> >>>>> On 11/11/2014 07:34 PM, Deneau, Tom wrote: >>>>>> It looks like a thread that calls Thread.sleep (as the timing control >>>>>> thread does in the harness) will eventually go thru >>>>>> SafepointSynchonize::block (as part of the ThreadBlockInVM >>>>>> destructor). So if there is a looping benchmark thread compiled >>>>>> without Compiler Safepoints, the control thread will be blocked and >>>>>> will never set the isDone flag. 
>>>>> >>>>> So, you are saying that without the safepoint in the while(!isDone) >>>>> loop in workload, control thread and workload thread will never >>>>> rendezvous on safepoint? I believe this is a bug with >>>>> -XX:-CompilerSafepoints, because the comment in safepoint.cpp calls >>>>> this >>>>> out specifically for VMThread vs. Mutator threads: >>>>> >>>>> // In a pathological scenario such as that described in CR6415670 >>>>> // the VMthread may sleep just before the mutator(s) become safe. >>>>> // In that case the mutators will be stalled waiting for the >>>>> safepoint >>>>> // to complete and the the VMthread will be sleeping, waiting for >>>>> the >>>>> // mutators to rendezvous. The VMthread will eventually wake up and >>>>> // detect that all mutators are safe, at which point we'll again >>>>> make >>>>> // progress. >>>>> >>>>> If this is a case, you probably need to report this to runtime guys. >>>>> >>>>>> This is probably OK, just need to document that CompilerSafepoints >>>>>> cannot be turned off. >>>>> >>>>> I think it is safe to presume something will go hairy if you are using >>>>> any special VM flag, therefore I am not inclined to document this. >>>>> >>>>> Thanks, >>>>> -Aleksey. >>>>> >>>> >>>> From vladimir.kozlov at oracle.com Thu Nov 13 16:45:02 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 13 Nov 2014 08:45:02 -0800 Subject: RFR (S) 8064749: -XX:-UseCompilerSafepoints breaks safepoint rendezvous In-Reply-To: <5464D1BE.4090204@oracle.com> References: <5464A6EC.6090804@oracle.com> <5464CD5E.4060800@oracle.com> <5464D1BE.4090204@oracle.com> Message-ID: <5464E00E.4070503@oracle.com> Good. Thanks, Vladimir On 11/13/14 7:43 AM, Aleksey Shipilev wrote: > Got it, filed. > > Any problems with the code change? > http://cr.openjdk.java.net/~shade/8064749/webrev.02/ > > -Aleksey. > > On 11/13/2014 06:25 PM, Coleen Phillimore wrote: >> >> Yes, you have to file a CCC first. 
>> Coleen >> >> On 11/13/14, 7:41 AM, Aleksey Shipilev wrote: >>> Hi, >>> >>> This is a patch that removes -XX:+-UseCompilerSafepoints from Hotspot: >>> https://bugs.openjdk.java.net/browse/JDK-8064749 >>> http://cr.openjdk.java.net/~shade/8064749/webrev.01/ >>> >>> Do I understand it right we need a CCC to remove the product flag? >>> >>> Testing: JPRT, vm.quick.testlist >>> >>> Thanks, >>> -Aleksey. >>> >> > > From aleksey.shipilev at oracle.com Thu Nov 13 16:48:30 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Thu, 13 Nov 2014 19:48:30 +0300 Subject: RFR (S) 8064749: -XX:-UseCompilerSafepoints breaks safepoint rendezvous In-Reply-To: <5464E00E.4070503@oracle.com> References: <5464A6EC.6090804@oracle.com> <5464CD5E.4060800@oracle.com> <5464D1BE.4090204@oracle.com> <5464E00E.4070503@oracle.com> Message-ID: <5464E0DE.6030105@oracle.com> Thanks! Added you as the reviewer to the changeset. -Aleksey. On 11/13/2014 07:45 PM, Vladimir Kozlov wrote: > Good. > > Thanks, > Vladimir > > On 11/13/14 7:43 AM, Aleksey Shipilev wrote: >> Got it, filed. >> >> Any problems with the code change? >> http://cr.openjdk.java.net/~shade/8064749/webrev.02/ >> >> -Aleksey. >> >> On 11/13/2014 06:25 PM, Coleen Phillimore wrote: >>> >>> Yes, you have to file a CCC first. >>> Coleen >>> >>> On 11/13/14, 7:41 AM, Aleksey Shipilev wrote: >>>> Hi, >>>> >>>> This is a patch that removes -XX:+-UseCompilerSafepoints from Hotspot: >>>> https://bugs.openjdk.java.net/browse/JDK-8064749 >>>> http://cr.openjdk.java.net/~shade/8064749/webrev.01/ >>>> >>>> Do I understand it right we need a CCC to remove the product flag? >>>> >>>> Testing: JPRT, vm.quick.testlist >>>> >>>> Thanks, >>>> -Aleksey. >>>> >>> >> >> From daniel.daugherty at oracle.com Thu Nov 13 18:18:40 2014 From: daniel.daugherty at oracle.com (Daniel D. 
Daugherty) Date: Thu, 13 Nov 2014 11:18:40 -0700 Subject: RFR(S) Solaris Full Debug Symbols (FDS) fix for 8033602 and 8034005 In-Reply-To: <5464C3E6.5000309@oracle.com> References: <546151A9.1080100@oracle.com> <5464C3E6.5000309@oracle.com> Message-ID: <5464F600.7040601@oracle.com> Magnus, Thanks for the review! Replies embedded below... On 11/13/14 7:44 AM, Magnus Ihse Bursie wrote: > On 2014-11-11 01:00, Daniel D. Daugherty wrote: >> Greetings, >> >> I have a Solaris Full Debug Symbols (FDS) fix ready for review. >> Yes, it is a small fix, but it is in Makefiles so feel free to >> run screaming from the room... :-) On the plus side the fix does >> delete two work around source files (Coleen would say that's a >> Good Thing (TM)!) > > ... but you're only deleting the make files? Good catch! Looks like when I resurrected this fix from my JDK8 queue I missed a couple of deletes. > src/os/solaris/add_gnu_debuglink/add_gnu_debuglink.c and > src/os/solaris/fix_empty_sec_hdr_flags/fix_empty_sec_hdr_flags.c could > be deleted as well, right? Yes, these should be deleted and I'll do that in this fix. Since these are two deletes of files that can no longer be built anyway, I presume I don't need to sent out another webrev... > > Good idea for the fix, anyway. I opened > https://bugs.openjdk.java.net/browse/JDK-8064808 to implement a > similar solution in configure. Sounds good to me. Dan > > /Magnus From david.holmes at oracle.com Thu Nov 13 20:15:55 2014 From: david.holmes at oracle.com (David Holmes) Date: Fri, 14 Nov 2014 06:15:55 +1000 Subject: RFR (S) 8064749: -XX:-UseCompilerSafepoints breaks safepoint rendezvous In-Reply-To: <5464D1BE.4090204@oracle.com> References: <5464A6EC.6090804@oracle.com> <5464CD5E.4060800@oracle.com> <5464D1BE.4090204@oracle.com> Message-ID: <5465117B.1030205@oracle.com> On 14/11/2014 1:43 AM, Aleksey Shipilev wrote: > Got it, filed. Did you fast-track it? > Any problems with the code change? 
> http://cr.openjdk.java.net/~shade/8064749/webrev.02/ Looks okay. I would have pushed for outright removal rather than the "deprecation" mechanism given it is likely an unusable flag. :) Thanks, David > -Aleksey. > > On 11/13/2014 06:25 PM, Coleen Phillimore wrote: >> >> Yes, you have to file a CCC first. >> Coleen >> >> On 11/13/14, 7:41 AM, Aleksey Shipilev wrote: >>> Hi, >>> >>> This is a patch that removes -XX:+-UseCompilerSafepoints from Hotspot: >>> https://bugs.openjdk.java.net/browse/JDK-8064749 >>> http://cr.openjdk.java.net/~shade/8064749/webrev.01/ >>> >>> Do I understand it right we need a CCC to remove the product flag? >>> >>> Testing: JPRT, vm.quick.testlist >>> >>> Thanks, >>> -Aleksey. >>> >> > > From aleksey.shipilev at oracle.com Thu Nov 13 20:27:14 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Thu, 13 Nov 2014 23:27:14 +0300 Subject: RFR (S) 8064749: -XX:-UseCompilerSafepoints breaks safepoint rendezvous In-Reply-To: <5465117B.1030205@oracle.com> References: <5464A6EC.6090804@oracle.com> <5464CD5E.4060800@oracle.com> <5464D1BE.4090204@oracle.com> <5465117B.1030205@oracle.com> Message-ID: <54651422.2000603@oracle.com> On 13.11.2014 23:15, David Holmes wrote: > On 14/11/2014 1:43 AM, Aleksey Shipilev wrote: >> Got it, filed. > > Did you fast-track it? Alas, I wasn't aware I needed a "fast-track", and got just to "submit". Anyhow, this is not a pressing issue, and Coleen volunteered (thanks!) to push the changeset as soon as CCC is approved. Added you to the changeset as the reviewer as well. Thanks, -Aleksey. 
From david.holmes at oracle.com Fri Nov 14 02:02:02 2014 From: david.holmes at oracle.com (David Holmes) Date: Fri, 14 Nov 2014 12:02:02 +1000 Subject: hang when using -XX:-UseCompilerSafepoints In-Reply-To: References: <5463BF71.4080804@oracle.com> <5463EF88.1050100@oracle.com> <54641E27.4090303@oracle.com> <546427DB.3070806@oracle.com> <5464444A.7030601@oracle.com> Message-ID: <5465629A.9010204@oracle.com> Hi Tom, On 14/11/2014 2:43 AM, Deneau, Tom wrote: > Just as an aside, I got involved in this because I wanted to see the effect of the CompilerSafepoints poll instruction when comparing performance of small JMH benchmarks across a couple of different architectures. Assuming you are interested in the cost of accessing the page, not the cost of the trap, you've probably already deduced that the flag simply disabled the arming of the page, as opposed to disabling generation of the polling instructions - and so would be of no help. > But I'm fine with getting rid of the flag. Thanks for confirming. David > -- Tom > > -----Original Message----- > From: David Holmes [mailto:david.holmes at oracle.com] > Sent: Wednesday, November 12, 2014 11:40 PM > To: Vladimir Kozlov; hotspot-runtime-dev at openjdk.java.net > Cc: Deneau, Tom > Subject: Re: hang when using -XX:-UseCompilerSafepoints > > Hi Vladimir, > > On 13/11/2014 1:39 PM, Vladimir Kozlov wrote: >> I agrer that workaround is -Xint. But if we disable compilation with >> -UseCompilerSafepoints, the flag becomes useless. You can get the same >> result with just -Xint. >> >> The history shows that it was added at the very beginning of Hotspot >> development, at the day one. I can only speculate that it was used to >> find performance effects of safepoints in compiled code . It could be >> the case that we removed safepoints from Counted loops as result of that >> investigation. I think it was never intended to be used in production. 
>> >> Although we can fix compilers to generate a runtime call which does >> safepoint when -UseCompilerSafepoints is specified, it will be useless >> work, I think. > > There is some history in JDK-4974572 (which is non-public I'm afraid). > To all intents and purposes the flag at that point was used to enable > testing of workarounds if problems were suspected in the "new" > safepointing code. I think it has outlived its usefulness by a few major > releases so I'm happy to see it go. > > Cheers, > David > >> thanks, >> Vladimir >> >> On 11/12/14 6:57 PM, David Holmes wrote: >>> On 13/11/2014 9:38 AM, Vladimir Kozlov wrote: >>>> On 11/12/14 12:13 PM, Aleksey Shipilev wrote: >>>>> Hi, >>>>> >>>>> Still not sure if this is a runtime bug: stripping safepoints from the >>>>> non-counted loop seems to be a recipe for disaster. >>>> >>>> This flag does not affect compiled code - so it is not compiler issue. >>> >>> Well, it disables the mechanism that the compiler inserts for checking >>> if a safepoint has been requested. As I've added to the bug report, >>> disabling compiler safepoints should go hand-in-hand with disabling the >>> compilers (ie run with -Xint) - otherwise you have to know that the >>> compiled code will eventually hit a non-compiler safepoint check. >>> >>>> It is only used in runtime/safepoint.cpp and it guards the code which >>>> protects a polling page. >>>> >>>> There are many bugs which shows current problem. For example: >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-6873333 >>>> >>>> I would say that we have to remove it or at least make it experimental >>>> flag if we want to do experiments with it. >>>> >>>> We definitely should not allow to use it in production! >>> >>> If we assume there is a reason it was made a product flag then the >>> correct fix in my opinion would be to fall back to intepreter-only mode >>> when this flag is turned off. 
>>> >>> If we don't make that assumption then we could still tie it to >>> interpreter-only mode, but we definitely should not make it configurable >>> in product mode without some effort. >>> >>> Or if we can't ascertain a valid reason for ever wanting to do this, we >>> could simply delete the flag altogether. :) >>> >>> Cheers, >>> David >>> >>>> Regards, >>>> Vladimir >>>> >>>>> >>>>> Anyhow, I think it deserves a simpler example. Submitted the bug and >>>>> attached a simple test there: >>>>> https://bugs.openjdk.java.net/browse/JDK-8064749 >>>>> >>>>> Thanks, >>>>> -Aleksey. >>>>> >>>>> On 12.11.2014 19:52, Deneau, Tom wrote: >>>>>> Hi all -- >>>>>> >>>>>> Forwarding a thread which came about on the jmh-dev mail list, as >>>>>> recommended by Aleksey Shipilev (see below). The JMH framework has a >>>>>> timing control thread which sleeps for a certain period, then sets a >>>>>> volatile isDone variable. Meanwhile, the benchmark thread loops >>>>>> doing its benchmark code and also checking the isDone field. A hang >>>>>> occurs if -XX:-UseCompilerSafepoints is used. 
>>>>>> >>>>>> The original issue can be reproduced by the following steps >>>>>> >>>>>> hg clone http://hg.openjdk.java.net/code-tools/jmh >>>>>> cd jmh >>>>>> mvn clean install -DskipTests=true >>>>>> cd jmh-samples >>>>>> java -server -XX:-UseCompilerSafepoints -jar >>>>>> target/benchmarks.jar 'JMHSample_01_.*' -t 1 -wi 5 -i 5 -f 0 >>>>>> >>>>>> -- Tom Deneau >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: Aleksey Shipilev [mailto:aleksey.shipilev at oracle.com] >>>>>> Sent: Wednesday, November 12, 2014 6:09 AM >>>>>> To: Deneau, Tom; jmh-dev at openjdk.java.net >>>>>> Subject: Re: using -XX:-UseCompilerSafepoints >>>>>> >>>>>> Hi Tom, >>>>>> >>>>>> On 11/11/2014 07:34 PM, Deneau, Tom wrote: >>>>>>> It looks like a thread that calls Thread.sleep (as the timing control >>>>>>> thread does in the harness) will eventually go thru >>>>>>> SafepointSynchonize::block (as part of the ThreadBlockInVM >>>>>>> destructor). So if there is a looping benchmark thread compiled >>>>>>> without Compiler Safepoints, the control thread will be blocked and >>>>>>> will never set the isDone flag. >>>>>> >>>>>> So, you are saying that without the safepoint in the while(!isDone) >>>>>> loop in workload, control thread and workload thread will never >>>>>> rendezvous on safepoint? I believe this is a bug with >>>>>> -XX:-CompilerSafepoints, because the comment in safepoint.cpp calls >>>>>> this >>>>>> out specifically for VMThread vs. Mutator threads: >>>>>> >>>>>> // In a pathological scenario such as that described in CR6415670 >>>>>> // the VMthread may sleep just before the mutator(s) become safe. >>>>>> // In that case the mutators will be stalled waiting for the >>>>>> safepoint >>>>>> // to complete and the the VMthread will be sleeping, waiting for >>>>>> the >>>>>> // mutators to rendezvous. The VMthread will eventually wake up and >>>>>> // detect that all mutators are safe, at which point we'll again >>>>>> make >>>>>> // progress. 
>>>>>> >>>>>> If this is a case, you probably need to report this to runtime guys. >>>>>> >>>>>>> This is probably OK, just need to document that CompilerSafepoints >>>>>>> cannot be turned off. >>>>>> >>>>>> I think it is safe to presume something will go hairy if you are using >>>>>> any special VM flag, therefore I am not inclined to document this. >>>>>> >>>>>> Thanks, >>>>>> -Aleksey. >>>>>> >>>>> >>>>> From david.holmes at oracle.com Fri Nov 14 02:20:50 2014 From: david.holmes at oracle.com (David Holmes) Date: Fri, 14 Nov 2014 12:20:50 +1000 Subject: RFR(S) Solaris Full Debug Symbols (FDS) fix for 8033602 and 8034005 In-Reply-To: <5464F600.7040601@oracle.com> References: <546151A9.1080100@oracle.com> <5464C3E6.5000309@oracle.com> <5464F600.7040601@oracle.com> Message-ID: <54656702.4090102@oracle.com> On 14/11/2014 4:18 AM, Daniel D. Daugherty wrote: > Magnus, > > Thanks for the review! > > Replies embedded below... > > On 11/13/14 7:44 AM, Magnus Ihse Bursie wrote: >> On 2014-11-11 01:00, Daniel D. Daugherty wrote: >>> Greetings, >>> >>> I have a Solaris Full Debug Symbols (FDS) fix ready for review. >>> Yes, it is a small fix, but it is in Makefiles so feel free to >>> run screaming from the room... :-) On the plus side the fix does >>> delete two work around source files (Coleen would say that's a >>> Good Thing (TM)!) >> >> ... but you're only deleting the make files? > > Good catch! Looks like when I resurrected this fix from my JDK8 > queue I missed a couple of deletes. > > >> src/os/solaris/add_gnu_debuglink/add_gnu_debuglink.c and >> src/os/solaris/fix_empty_sec_hdr_flags/fix_empty_sec_hdr_flags.c could >> be deleted as well, right? > > Yes, these should be deleted and I'll do that in this fix. > Since these are two deletes of files that can no longer be > built anyway, I presume I don't need to sent out another > webrev... I don't need to see an updated webrev :) Thanks, David > >> >> Good idea for the fix, anyway. 
I opened >> https://bugs.openjdk.java.net/browse/JDK-8064808 to implement a >> similar solution in configure. > > Sounds good to me. > > Dan > > >> >> /Magnus > From ivan.gerasimov at oracle.com Fri Nov 14 12:35:29 2014 From: ivan.gerasimov at oracle.com (Ivan Gerasimov) Date: Fri, 14 Nov 2014 15:35:29 +0300 Subject: RFR 8064694: Kitchensink: WaitForMultipleObjects failed in hotspot\src\os\windows\vm\os_windows.cpp: 3844 Message-ID: <5465F711.9090605@oracle.com> Hello! The recent fix for JDK-8059533 ((process) Make exiting process wait for exiting threads [win]) caused the warning message to be printed in some test environments: ----------- os_windows.cpp:3844 is in the newly updated os::win32::exit_process_or_thread(Ept what, int exit_code) ----------- This has been observed with debug builds on highly loaded systems. To address the issue it is proposed to do three things: 1) increase the timeout for debug builds, 2) increase the maximum number of the thread handles to be stored, 3) rise the priority of the exiting threads, if we need to wait for them. Would you please help review the fix? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8064694 WEBREV: http://cr.openjdk.java.net/~igerasim/8064694/0/webrev/ The fix was tested on all available platforms, with the hotspot testset. No failures. Sincerely yours, Ivan From sergey.gabdurakhmanov at oracle.com Fri Nov 14 13:07:41 2014 From: sergey.gabdurakhmanov at oracle.com (Sergey Gabdurakhmanov) Date: Fri, 14 Nov 2014 16:07:41 +0300 Subject: RFR(XS): 8048050: Agent NullPointerException when rmi.port in use Message-ID: <5465FE9D.5060003@oracle.com> Hi, Could I please have a review of this small fix. webrev: http://cr.openjdk.java.net/~sgabdura/8048050/webrev.00/ bug: https://bugs.openjdk.java.net/browse/JDK-8048050 Problem description: If the com.sun.management.jmxremote.rmi.port option is provided it will give a NPE if already in use by a different JVM. 
Its expected to fail but should provide an appropriate exception. STEPS TO FOLLOW TO REPRODUCE THE PROBLEM : Run two instances in different JVMs at same time with the following options: -Dcom.sun.management.jmxremote.port=2222 -Dcom.sun.management.jmxremote.rmi.port=2223 -Dcom.sun.management.jmxremote.authenticate=false Root cause: Then we trying to start JMXConnectorServer (see method exportMBeanServer of class sun.management.jmxremote.ConnectorBootstrap on already used port it cause IOException. Call of connServer.getAddress().toString() in the exception handler cause NullPointerException because connServer.getAddress() returns null. Solution: Provide url.toString() if connServer.getAddress() is null I'm going to push this fix into JDK9, 8 and 7. BR, Sergey From jaroslav.bachorik at oracle.com Fri Nov 14 13:10:25 2014 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Fri, 14 Nov 2014 14:10:25 +0100 Subject: RFR(XS): 8048050: Agent NullPointerException when rmi.port in use In-Reply-To: <5465FE9D.5060003@oracle.com> References: <5465FE9D.5060003@oracle.com> Message-ID: <5465FF41.2050507@oracle.com> Good to go. -JB- On 11/14/2014 02:07 PM, Sergey Gabdurakhmanov wrote: > Hi, > > Could I please have a review of this small fix. > > webrev: http://cr.openjdk.java.net/~sgabdura/8048050/webrev.00/ > bug: https://bugs.openjdk.java.net/browse/JDK-8048050 > > Problem description: > If the com.sun.management.jmxremote.rmi.port option is provided it will > give a NPE if already in use by a different JVM. Its expected to fail > but should provide an appropriate exception. 
> > STEPS TO FOLLOW TO REPRODUCE THE PROBLEM : > Run two instances in different JVMs at same time with the following > options: > -Dcom.sun.management.jmxremote.port=2222 > -Dcom.sun.management.jmxremote.rmi.port=2223 > -Dcom.sun.management.jmxremote.authenticate=false > > Root cause: > Then we trying to start JMXConnectorServer (see method exportMBeanServer > of class sun.management.jmxremote.ConnectorBootstrap on already used > port it cause IOException. Call of connServer.getAddress().toString() in > the exception handler cause NullPointerException because > connServer.getAddress() returns null. > > Solution: > Provide url.toString() if connServer.getAddress() is null > > I'm going to push this fix into JDK9, 8 and 7. > > BR, > Sergey > From daniel.fuchs at oracle.com Fri Nov 14 13:32:26 2014 From: daniel.fuchs at oracle.com (Daniel Fuchs) Date: Fri, 14 Nov 2014 14:32:26 +0100 Subject: RFR(XS): 8048050: Agent NullPointerException when rmi.port in use In-Reply-To: <5465FF41.2050507@oracle.com> References: <5465FE9D.5060003@oracle.com> <5465FF41.2050507@oracle.com> Message-ID: <5466046A.2060309@oracle.com> Hi Sergey, The fix looks fine. I wonder whether there should be a testcase for that? best regards, -- daniel On 14/11/14 14:10, Jaroslav Bachorik wrote: > > Good to go. > > -JB- > > On 11/14/2014 02:07 PM, Sergey Gabdurakhmanov wrote: >> Hi, >> >> Could I please have a review of this small fix. >> >> webrev: http://cr.openjdk.java.net/~sgabdura/8048050/webrev.00/ >> bug: https://bugs.openjdk.java.net/browse/JDK-8048050 >> >> Problem description: >> If the com.sun.management.jmxremote.rmi.port option is provided it will >> give a NPE if already in use by a different JVM. Its expected to fail >> but should provide an appropriate exception. 
>> >> STEPS TO FOLLOW TO REPRODUCE THE PROBLEM : >> Run two instances in different JVMs at same time with the following >> options: >> -Dcom.sun.management.jmxremote.port=2222 >> -Dcom.sun.management.jmxremote.rmi.port=2223 >> -Dcom.sun.management.jmxremote.authenticate=false >> >> Root cause: >> Then we trying to start JMXConnectorServer (see method exportMBeanServer >> of class sun.management.jmxremote.ConnectorBootstrap on already used >> port it cause IOException. Call of connServer.getAddress().toString() in >> the exception handler cause NullPointerException because >> connServer.getAddress() returns null. >> >> Solution: >> Provide url.toString() if connServer.getAddress() is null >> >> I'm going to push this fix into JDK9, 8 and 7. >> >> BR, >> Sergey >> > From sergey.gabdurakhmanov at oracle.com Fri Nov 14 16:23:21 2014 From: sergey.gabdurakhmanov at oracle.com (Sergey Gabdurakhmanov) Date: Fri, 14 Nov 2014 08:23:21 -0800 (PST) Subject: RFR(XS): 8048050: Agent NullPointerException when rmi.port in use Message-ID: <8c93fff6-4bdd-4293-ba0d-f49649489ecf@default> Hi Daniel, Our documentation does not specify the exact exception that should be thrown in this scenario. But it should be reasonable. That makes testcase very difficult to implement. Because "reasonable" is not clear for test. E.g. "divided by zero" is not reasonable, but "illegal argument" is... I prefer do not put any tests where. BR, Sergey ----- Original Message ----- From: daniel.fuchs at oracle.com To: jaroslav.bachorik at oracle.com, sergey.gabdurakhmanov at oracle.com, hotspot-runtime-dev at openjdk.java.net, dmitry.samersoff at oracle.com, serviceability-dev at openjdk.java.net Sent: Friday, November 14, 2014 4:32:38 PM (GMT+0300) Auto-Detected Subject: Re: RFR(XS): 8048050: Agent NullPointerException when rmi.port in use Hi Sergey, The fix looks fine. I wonder whether there should be a testcase for that? best regards, -- daniel On 14/11/14 14:10, Jaroslav Bachorik wrote: > > Good to go. 
> > -JB- > > On 11/14/2014 02:07 PM, Sergey Gabdurakhmanov wrote: >> Hi, >> >> Could I please have a review of this small fix. >> >> webrev: http://cr.openjdk.java.net/~sgabdura/8048050/webrev.00/ >> bug: https://bugs.openjdk.java.net/browse/JDK-8048050 >> >> Problem description: >> If the com.sun.management.jmxremote.rmi.port option is provided it will >> give a NPE if already in use by a different JVM. Its expected to fail >> but should provide an appropriate exception. >> >> STEPS TO FOLLOW TO REPRODUCE THE PROBLEM : >> Run two instances in different JVMs at same time with the following >> options: >> -Dcom.sun.management.jmxremote.port=2222 >> -Dcom.sun.management.jmxremote.rmi.port=2223 >> -Dcom.sun.management.jmxremote.authenticate=false >> >> Root cause: >> Then we trying to start JMXConnectorServer (see method exportMBeanServer >> of class sun.management.jmxremote.ConnectorBootstrap on already used >> port it cause IOException. Call of connServer.getAddress().toString() in >> the exception handler cause NullPointerException because >> connServer.getAddress() returns null. >> >> Solution: >> Provide url.toString() if connServer.getAddress() is null >> >> I'm going to push this fix into JDK9, 8 and 7. >> >> BR, >> Sergey >> > From daniel.daugherty at oracle.com Fri Nov 14 21:30:11 2014 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Fri, 14 Nov 2014 14:30:11 -0700 Subject: RFR(S) Solaris Full Debug Symbols (FDS) fix for 8033602 and 8034005 In-Reply-To: <5464F600.7040601@oracle.com> References: <546151A9.1080100@oracle.com> <5464C3E6.5000309@oracle.com> <5464F600.7040601@oracle.com> Message-ID: <54667463.7000405@oracle.com> > I presume I don't need to sent out another webrev... I have to change my mind on this because this fix needs to be backported to JDK8u-hs-dev. 
Here's the updated JDK9 webrev: http://cr.openjdk.java.net/~dcubed/8033602-webrev/1-jdk9-hs-rt/ And here's the JDK8u-hs-dev backport: http://cr.openjdk.java.net/~dcubed/8033602-webrev/1-jdk8u-hs-dev/ Because of improvements to the JDK9 makefiles, a bunch of the anchor text has changed. The best way to sanity check the backport is to download the two patch files and look at them in your favorite diff tool: http://cr.openjdk.java.net/~dcubed/8033602-webrev/1-jdk9-hs-rt/hotspot.patch http://cr.openjdk.java.net/~dcubed/8033602-webrev/1-jdk8u-hs-dev/8033602_for_jdk8u_hs_dev.patch I need just one sanity check on the backport... Thanks, in advance, for any comments, questions or suggestions. Dan On 11/13/14 11:18 AM, Daniel D. Daugherty wrote: > Magnus, > > Thanks for the review! > > Replies embedded below... > > On 11/13/14 7:44 AM, Magnus Ihse Bursie wrote: >> On 2014-11-11 01:00, Daniel D. Daugherty wrote: >>> Greetings, >>> >>> I have a Solaris Full Debug Symbols (FDS) fix ready for review. >>> Yes, it is a small fix, but it is in Makefiles so feel free to >>> run screaming from the room... :-) On the plus side the fix does >>> delete two work around source files (Coleen would say that's a >>> Good Thing (TM)!) >> >> ... but you're only deleting the make files? > > Good catch! Looks like when I resurrected this fix from my JDK8 > queue I missed a couple of deletes. > > >> src/os/solaris/add_gnu_debuglink/add_gnu_debuglink.c and >> src/os/solaris/fix_empty_sec_hdr_flags/fix_empty_sec_hdr_flags.c >> could be deleted as well, right? > > Yes, these should be deleted and I'll do that in this fix. > Since these are two deletes of files that can no longer be > built anyway, I presume I don't need to sent out another > webrev... > > >> >> Good idea for the fix, anyway. I opened >> https://bugs.openjdk.java.net/browse/JDK-8064808 to implement a >> similar solution in configure. > > Sounds good to me. > > Dan > > >> >> /Magnus On 11/10/14 5:00 PM, Daniel D. 
Daugherty wrote: > Greetings, > > I have a Solaris Full Debug Symbols (FDS) fix ready for review. > Yes, it is a small fix, but it is in Makefiles so feel free to > run screaming from the room... :-) On the plus side the fix does > delete two work around source files (Coleen would say that's a > Good Thing (TM)!) > > The fix is to detect the version of GNU objcopy that is being > used on the machine and only enable Full Debug Symbols when that > version is 2.21.1 or newer. If you don't have the right version, > then the build drops back to pre-FDS build configs with a message > like this: > > WARNING: /usr/sfw/bin/gobjcopy --version info: > WARNING: GNU objcopy 2.15 > WARNING: an objcopy version of 2.21.1 or newer is needed to create valid .debuginfo files. > WARNING: ignoring above objcopy command. > WARNING: patch 149063-01 or newer contains the correct Solaris 10 SPARC version. > WARNING: patch 149064-01 or newer contains the correct Solaris 10 X86 version. > WARNING: Solaris 11 Update 1 contains the correct version. > INFO: no objcopy cmd found so cannot create .debuginfo files. > INFO: ENABLE_FULL_DEBUG_SYMBOLS=0 > > This work is being tracked by the following bug IDs: > > JDK-8033602 wrong stabs data in libjvm.debuginfo on JDK 8 - SPARC > https://bugs.openjdk.java.net/browse/JDK-8033602 > > JDK-8034005 cannot debug in synchronizer.o or objectMonitor.o on Solaris X86 > https://bugs.openjdk.java.net/browse/JDK-8034005 > > Here is the webrev URL: > > http://cr.openjdk.java.net/~dcubed/8033602-webrev/0-jdk9-hs-rt/ > > Testing: > > - JPRT test jobs to verify that the current JPRT Solaris hosts > are happy > - local builds on my Solaris 10 X86 machine to verify that the > wrong version of GNU objcopy is caught > > Thanks, in advance, for any comments, questions or suggestions. 
> > Dan From coleen.phillimore at oracle.com Fri Nov 14 22:47:43 2014 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Fri, 14 Nov 2014 17:47:43 -0500 Subject: [8u40] RFR 8062870: src/share/vm/services/mallocTracker.hpp:64 assert(_count > 0) failed: Negative ,counter Message-ID: <5466868F.2020802@oracle.com> Please approve the backport of the bug fix for this bug. The fix has been tested all week with no problems. The patch didn't import because MallocTrackingVerify.java didn't have an @ignore tag, which was removed by the jdk9 fix. Everything else applied cleanly. Summary: Signed bitfield size y can only have (1 << y)-1 values. Reviewed-by: shade, dholmes, jrose, ctornqvi, gtriantafill open webrev at http://cr.openjdk.java.net/~coleenp/8062870_8u40/ bug link https://bugs.openjdk.java.net/browse/JDK-8062870 Thanks! Coleen From daniel.daugherty at oracle.com Fri Nov 14 23:22:23 2014 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Fri, 14 Nov 2014 16:22:23 -0700 Subject: RFR 8064694: Kitchensink: WaitForMultipleObjects failed in hotspot\src\os\windows\vm\os_windows.cpp: 3844 In-Reply-To: <5465F711.9090605@oracle.com> References: <5465F711.9090605@oracle.com> Message-ID: <54668EAF.9070807@oracle.com> On 11/14/14 5:35 AM, Ivan Gerasimov wrote: > Hello! > > The recent fix for JDK-8059533 ((process) Make exiting process wait > for exiting threads [win]) caused the warning message to be printed in > some test environments: > ----------- > os_windows.cpp:3844 is in the newly updated > os::win32::exit_process_or_thread(Ept what, int exit_code) > ----------- > > This has been observed with debug builds on highly loaded systems. > > > To address the issue it is proposed to do three things: > 1) increase the timeout for debug builds, > 2) increase the maximum number of the thread handles to be stored, > 3) rise the priority of the exiting threads, if we need to wait for them. > > Would you please help review the fix? 
> > BUGURL: https://bugs.openjdk.java.net/browse/JDK-8064694 > WEBREV: http://cr.openjdk.java.net/~igerasim/8064694/0/webrev/ src/os/windows/vm/os_windows.cpp line 3784: #define MAX_EXIT_HANDLES NOT_DEBUG(32) DEBUG_ONLY(128) Instead of NOT_DEBUG can you use PRODUCT_ONLY? Instead of DEBUG_ONLY can you used NOT_PRODUCT? That uses the smaller value for only one build config (PRODUCT). line 3785: #define EXIT_TIMEOUT NOT_DEBUG(1000) DEBUG_ONLY(4000) /*1 sec in product, 4 sec in debug*/ Instead of NOT_DEBUG can you use PRODUCT_ONLY? Instead of DEBUG_ONLY can you used NOT_PRODUCT? Please add spaces between the comment delimiters and the comment text. That uses the smaller timeout for only one build config (PRODUCT). line 3836 // Rise the priority... Typo: 'Rise' -> 'Raise' About the general idea of raising the exiting thread's priority, if the exiting thread is looping in some Win* OS code after this point, will raising the priority make the machine unusable? Dan > > The fix was tested on all available platforms, with the hotspot > testset. No failures. > > Sincerely yours, > Ivan > From dmitry.samersoff at oracle.com Sat Nov 15 18:57:10 2014 From: dmitry.samersoff at oracle.com (Dmitry Samersoff) Date: Sat, 15 Nov 2014 21:57:10 +0300 Subject: RFR(S) Solaris Full Debug Symbols (FDS) fix for 8033602 and 8034005 In-Reply-To: <54667463.7000405@oracle.com> References: <546151A9.1080100@oracle.com> <5464C3E6.5000309@oracle.com> <5464F600.7040601@oracle.com> <54667463.7000405@oracle.com> Message-ID: <5467A206.7020105@oracle.com> Dan, The fix looks good for me. -Dmitry On 2014-11-15 00:30, Daniel D. Daugherty wrote: >> I presume I don't need to sent out another webrev... > > I have to change my mind on this because this fix needs to be > backported to JDK8u-hs-dev. 
> > Here's the updated JDK9 webrev: > > http://cr.openjdk.java.net/~dcubed/8033602-webrev/1-jdk9-hs-rt/ > > And here's the JDK8u-hs-dev backport: > > http://cr.openjdk.java.net/~dcubed/8033602-webrev/1-jdk8u-hs-dev/ > > Because of improvements to the JDK9 makefiles, a bunch of the > anchor text has changed. The best way to sanity check the backport > is to download the two patch files and look at them in your favorite > diff tool: > > http://cr.openjdk.java.net/~dcubed/8033602-webrev/1-jdk9-hs-rt/hotspot.patch > > http://cr.openjdk.java.net/~dcubed/8033602-webrev/1-jdk8u-hs-dev/8033602_for_jdk8u_hs_dev.patch > > > I need just one sanity check on the backport... > > Thanks, in advance, for any comments, questions or suggestions. > > Dan > > > On 11/13/14 11:18 AM, Daniel D. Daugherty wrote: >> Magnus, >> >> Thanks for the review! >> >> Replies embedded below... >> >> On 11/13/14 7:44 AM, Magnus Ihse Bursie wrote: >>> On 2014-11-11 01:00, Daniel D. Daugherty wrote: >>>> Greetings, >>>> >>>> I have a Solaris Full Debug Symbols (FDS) fix ready for review. >>>> Yes, it is a small fix, but it is in Makefiles so feel free to >>>> run screaming from the room... :-) On the plus side the fix does >>>> delete two work around source files (Coleen would say that's a >>>> Good Thing (TM)!) >>> >>> ... but you're only deleting the make files? >> >> Good catch! Looks like when I resurrected this fix from my JDK8 >> queue I missed a couple of deletes. >> >> >>> src/os/solaris/add_gnu_debuglink/add_gnu_debuglink.c and >>> src/os/solaris/fix_empty_sec_hdr_flags/fix_empty_sec_hdr_flags.c >>> could be deleted as well, right? >> >> Yes, these should be deleted and I'll do that in this fix. >> Since these are two deletes of files that can no longer be >> built anyway, I presume I don't need to sent out another >> webrev... >> >> >>> >>> Good idea for the fix, anyway. I opened >>> https://bugs.openjdk.java.net/browse/JDK-8064808 to implement a >>> similar solution in configure. 
>> >> Sounds good to me. >> >> Dan >> >> >>> >>> /Magnus > > > > On 11/10/14 5:00 PM, Daniel D. Daugherty wrote: >> Greetings, >> >> I have a Solaris Full Debug Symbols (FDS) fix ready for review. >> Yes, it is a small fix, but it is in Makefiles so feel free to >> run screaming from the room... :-) On the plus side the fix does >> delete two work around source files (Coleen would say that's a >> Good Thing (TM)!) >> >> The fix is to detect the version of GNU objcopy that is being >> used on the machine and only enable Full Debug Symbols when that >> version is 2.21.1 or newer. If you don't have the right version, >> then the build drops back to pre-FDS build configs with a message >> like this: >> >> WARNING: /usr/sfw/bin/gobjcopy --version info: >> WARNING: GNU objcopy 2.15 >> WARNING: an objcopy version of 2.21.1 or newer is needed to create > valid .debuginfo files. >> WARNING: ignoring above objcopy command. >> WARNING: patch 149063-01 or newer contains the correct Solaris 10 > SPARC version. >> WARNING: patch 149064-01 or newer contains the correct Solaris 10 X86 > version. >> WARNING: Solaris 11 Update 1 contains the correct version. >> INFO: no objcopy cmd found so cannot create .debuginfo files. >> INFO: ENABLE_FULL_DEBUG_SYMBOLS=0 >> >> This work is being tracked by the following bug IDs: >> >> JDK-8033602 wrong stabs data in libjvm.debuginfo on JDK 8 - SPARC >> https://bugs.openjdk.java.net/browse/JDK-8033602 >> >> JDK-8034005 cannot debug in synchronizer.o or objectMonitor.o on > Solaris X86 >> https://bugs.openjdk.java.net/browse/JDK-8034005 >> >> Here is the webrev URL: >> >> http://cr.openjdk.java.net/~dcubed/8033602-webrev/0-jdk9-hs-rt/ >> >> Testing: >> >> - JPRT test jobs to verify that the current JPRT Solaris hosts >> are happy >> - local builds on my Solaris 10 X86 machine to verify that the >> wrong version of GNU objcopy is caught >> >> Thanks, in advance, for any comments, questions or suggestions. 
>> >> Dan -- Dmitry Samersoff Oracle Java development team, Saint Petersburg, Russia * I would love to change the world, but they won't give me the sources. From daniel.daugherty at oracle.com Sat Nov 15 19:06:59 2014 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Sat, 15 Nov 2014 12:06:59 -0700 Subject: RFR(S) Solaris Full Debug Symbols (FDS) fix for 8033602 and 8034005 In-Reply-To: <5467A206.7020105@oracle.com> References: <546151A9.1080100@oracle.com> <5464C3E6.5000309@oracle.com> <5464F600.7040601@oracle.com> <54667463.7000405@oracle.com> <5467A206.7020105@oracle.com> Message-ID: <5467A453.8020001@oracle.com> Thanks! Dan On 11/15/14 11:57 AM, Dmitry Samersoff wrote: > Dan, > > The fix looks good for me. > > -Dmitry > > > On 2014-11-15 00:30, Daniel D. Daugherty wrote: >>> I presume I don't need to sent out another webrev... >> I have to change my mind on this because this fix needs to be >> backported to JDK8u-hs-dev. >> >> Here's the updated JDK9 webrev: >> >> http://cr.openjdk.java.net/~dcubed/8033602-webrev/1-jdk9-hs-rt/ >> >> And here's the JDK8u-hs-dev backport: >> >> http://cr.openjdk.java.net/~dcubed/8033602-webrev/1-jdk8u-hs-dev/ >> >> Because of improvements to the JDK9 makefiles, a bunch of the >> anchor text has changed. The best way to sanity check the backport >> is to download the two patch files and look at them in your favorite >> diff tool: >> >> http://cr.openjdk.java.net/~dcubed/8033602-webrev/1-jdk9-hs-rt/hotspot.patch >> >> http://cr.openjdk.java.net/~dcubed/8033602-webrev/1-jdk8u-hs-dev/8033602_for_jdk8u_hs_dev.patch >> >> >> I need just one sanity check on the backport... >> >> Thanks, in advance, for any comments, questions or suggestions. >> >> Dan >> >> >> On 11/13/14 11:18 AM, Daniel D. Daugherty wrote: >>> Magnus, >>> >>> Thanks for the review! >>> >>> Replies embedded below... >>> >>> On 11/13/14 7:44 AM, Magnus Ihse Bursie wrote: >>>> On 2014-11-11 01:00, Daniel D. 
Daugherty wrote: >>>>> Greetings, >>>>> >>>>> I have a Solaris Full Debug Symbols (FDS) fix ready for review. >>>>> Yes, it is a small fix, but it is in Makefiles so feel free to >>>>> run screaming from the room... :-) On the plus side the fix does >>>>> delete two work around source files (Coleen would say that's a >>>>> Good Thing (TM)!) >>>> ... but you're only deleting the make files? >>> Good catch! Looks like when I resurrected this fix from my JDK8 >>> queue I missed a couple of deletes. >>> >>> >>>> src/os/solaris/add_gnu_debuglink/add_gnu_debuglink.c and >>>> src/os/solaris/fix_empty_sec_hdr_flags/fix_empty_sec_hdr_flags.c >>>> could be deleted as well, right? >>> Yes, these should be deleted and I'll do that in this fix. >>> Since these are two deletes of files that can no longer be >>> built anyway, I presume I don't need to sent out another >>> webrev... >>> >>> >>>> Good idea for the fix, anyway. I opened >>>> https://bugs.openjdk.java.net/browse/JDK-8064808 to implement a >>>> similar solution in configure. >>> Sounds good to me. >>> >>> Dan >>> >>> >>>> /Magnus >> >> >> On 11/10/14 5:00 PM, Daniel D. Daugherty wrote: >>> Greetings, >>> >>> I have a Solaris Full Debug Symbols (FDS) fix ready for review. >>> Yes, it is a small fix, but it is in Makefiles so feel free to >>> run screaming from the room... :-) On the plus side the fix does >>> delete two work around source files (Coleen would say that's a >>> Good Thing (TM)!) >>> >>> The fix is to detect the version of GNU objcopy that is being >>> used on the machine and only enable Full Debug Symbols when that >>> version is 2.21.1 or newer. If you don't have the right version, >>> then the build drops back to pre-FDS build configs with a message >>> like this: >>> >>> WARNING: /usr/sfw/bin/gobjcopy --version info: >>> WARNING: GNU objcopy 2.15 >>> WARNING: an objcopy version of 2.21.1 or newer is needed to create >> valid .debuginfo files. >>> WARNING: ignoring above objcopy command. 
>>> WARNING: patch 149063-01 or newer contains the correct Solaris 10 >> SPARC version. >>> WARNING: patch 149064-01 or newer contains the correct Solaris 10 X86 >> version. >>> WARNING: Solaris 11 Update 1 contains the correct version. >>> INFO: no objcopy cmd found so cannot create .debuginfo files. >>> INFO: ENABLE_FULL_DEBUG_SYMBOLS=0 >>> >>> This work is being tracked by the following bug IDs: >>> >>> JDK-8033602 wrong stabs data in libjvm.debuginfo on JDK 8 - SPARC >>> https://bugs.openjdk.java.net/browse/JDK-8033602 >>> >>> JDK-8034005 cannot debug in synchronizer.o or objectMonitor.o on >> Solaris X86 >>> https://bugs.openjdk.java.net/browse/JDK-8034005 >>> >>> Here is the webrev URL: >>> >>> http://cr.openjdk.java.net/~dcubed/8033602-webrev/0-jdk9-hs-rt/ >>> >>> Testing: >>> >>> - JPRT test jobs to verify that the current JPRT Solaris hosts >>> are happy >>> - local builds on my Solaris 10 X86 machine to verify that the >>> wrong version of GNU objcopy is caught >>> >>> Thanks, in advance, for any comments, questions or suggestions. >>> >>> Dan > From ivan.gerasimov at oracle.com Sun Nov 16 21:23:43 2014 From: ivan.gerasimov at oracle.com (Ivan Gerasimov) Date: Mon, 17 Nov 2014 00:23:43 +0300 Subject: RFR 8064694: Kitchensink: WaitForMultipleObjects failed in hotspot\src\os\windows\vm\os_windows.cpp: 3844 In-Reply-To: <54668EAF.9070807@oracle.com> References: <5465F711.9090605@oracle.com> <54668EAF.9070807@oracle.com> Message-ID: <546915DF.7080106@oracle.com> Thank you Daniel! Please find the updated webrev with your suggestions incorporated here: http://cr.openjdk.java.net/~igerasim/8064694/1/webrev/ Concerning the thread priority: If the application is of NORMAL_PRIORITY_CLASS, then setting the thread's priority level to THREAD_PRIORITY_HIGHEST will result in its priority value to be only 10 (of maximum 31). 
http://msdn.microsoft.com/en-us/library/windows/desktop/ms685100(v=vs.85).aspx And if the process is HIGH_PRIORITY_CLASS, then the tread with the HIGHEST priority level will have priority value == 15 of 31. I believe, it should not be too much, and the machine will not become busy with only those closing threads. However, I hope it would be enough to make them complete faster than other threads of the NORMAL priority level withing the same application. Sincerely yours, Ivan On 15.11.2014 2:22, Daniel D. Daugherty wrote: > On 11/14/14 5:35 AM, Ivan Gerasimov wrote: >> Hello! >> >> The recent fix for JDK-8059533 ((process) Make exiting process wait >> for exiting threads [win]) caused the warning message to be printed >> in some test environments: >> ----------- >> os_windows.cpp:3844 is in the newly updated >> os::win32::exit_process_or_thread(Ept what, int exit_code) >> ----------- >> >> This has been observed with debug builds on highly loaded systems. >> >> >> To address the issue it is proposed to do three things: >> 1) increase the timeout for debug builds, >> 2) increase the maximum number of the thread handles to be stored, >> 3) rise the priority of the exiting threads, if we need to wait for >> them. >> >> Would you please help review the fix? >> >> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8064694 >> WEBREV: http://cr.openjdk.java.net/~igerasim/8064694/0/webrev/ > > src/os/windows/vm/os_windows.cpp > > line 3784: #define MAX_EXIT_HANDLES NOT_DEBUG(32) DEBUG_ONLY(128) > Instead of NOT_DEBUG can you use PRODUCT_ONLY? > Instead of DEBUG_ONLY can you used NOT_PRODUCT? > > That uses the smaller value for only one build config (PRODUCT). > > line 3785: #define EXIT_TIMEOUT NOT_DEBUG(1000) DEBUG_ONLY(4000) > /*1 sec in product, 4 sec in debug*/ > Instead of NOT_DEBUG can you use PRODUCT_ONLY? > Instead of DEBUG_ONLY can you used NOT_PRODUCT? > Please add spaces between the comment delimiters and the comment > text. 
> > That uses the smaller timeout for only one build config (PRODUCT). > > line 3836 // Rise the priority... > Typo: 'Rise' -> 'Raise' > > About the general idea of raising the exiting thread's priority, > if the exiting thread is looping in some Win* OS code after this > point, will raising the priority make the machine unusable? > > Dan > > >> >> The fix was tested on all available platforms, with the hotspot >> testset. No failures. >> >> Sincerely yours, >> Ivan >> > > > From david.holmes at oracle.com Mon Nov 17 06:40:05 2014 From: david.holmes at oracle.com (David Holmes) Date: Mon, 17 Nov 2014 16:40:05 +1000 Subject: RFR 8064694: Kitchensink: WaitForMultipleObjects failed in hotspot\src\os\windows\vm\os_windows.cpp: 3844 In-Reply-To: <546915DF.7080106@oracle.com> References: <5465F711.9090605@oracle.com> <54668EAF.9070807@oracle.com> <546915DF.7080106@oracle.com> Message-ID: <54699845.5010901@oracle.com> On 17/11/2014 7:23 AM, Ivan Gerasimov wrote: > Thank you Daniel! > > Please find the updated webrev with your suggestions incorporated here: > http://cr.openjdk.java.net/~igerasim/8064694/1/webrev/ > > Concerning the thread priority: If the application is of > NORMAL_PRIORITY_CLASS, then setting the thread's priority level to > THREAD_PRIORITY_HIGHEST will result in its priority value to be only 10 > (of maximum 31). > http://msdn.microsoft.com/en-us/library/windows/desktop/ms685100(v=vs.85).aspx > > > And if the process is HIGH_PRIORITY_CLASS, then the tread with the > HIGHEST priority level will have priority value == 15 of 31. > > I believe, it should not be too much, and the machine will not become > busy with only those closing threads. > However, I hope it would be enough to make them complete faster than > other threads of the NORMAL priority level withing the same application. I don't think this is necessary or desirable. 
Under normal usage we're giving priority to exiting threads and that may disrupt the usual scheduling patterns that applications see. You may posit that it is "harmless" but we can't say that for sure. Nor can we actually know that this will help with this particular bug. I would not add in this new code. David > Sincerely yours, > Ivan > > > On 15.11.2014 2:22, Daniel D. Daugherty wrote: >> On 11/14/14 5:35 AM, Ivan Gerasimov wrote: >>> Hello! >>> >>> The recent fix for JDK-8059533 ((process) Make exiting process wait >>> for exiting threads [win]) caused the warning message to be printed >>> in some test environments: >>> ----------- >>> os_windows.cpp:3844 is in the newly updated >>> os::win32::exit_process_or_thread(Ept what, int exit_code) >>> ----------- >>> >>> This has been observed with debug builds on highly loaded systems. >>> >>> >>> To address the issue it is proposed to do three things: >>> 1) increase the timeout for debug builds, >>> 2) increase the maximum number of the thread handles to be stored, >>> 3) rise the priority of the exiting threads, if we need to wait for >>> them. >>> >>> Would you please help review the fix? >>> >>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8064694 >>> WEBREV: http://cr.openjdk.java.net/~igerasim/8064694/0/webrev/ >> >> src/os/windows/vm/os_windows.cpp >> >> line 3784: #define MAX_EXIT_HANDLES NOT_DEBUG(32) DEBUG_ONLY(128) >> Instead of NOT_DEBUG can you use PRODUCT_ONLY? >> Instead of DEBUG_ONLY can you used NOT_PRODUCT? >> >> That uses the smaller value for only one build config (PRODUCT). >> >> line 3785: #define EXIT_TIMEOUT NOT_DEBUG(1000) DEBUG_ONLY(4000) >> /*1 sec in product, 4 sec in debug*/ >> Instead of NOT_DEBUG can you use PRODUCT_ONLY? >> Instead of DEBUG_ONLY can you used NOT_PRODUCT? >> Please add spaces between the comment delimiters and the comment >> text. >> >> That uses the smaller timeout for only one build config (PRODUCT). >> >> line 3836 // Rise the priority... 
>> Typo: 'Rise' -> 'Raise' >> >> About the general idea of raising the exiting thread's priority, >> if the exiting thread is looping in some Win* OS code after this >> point, will raising the priority make the machine unusable? >> >> Dan >> >> >>> >>> The fix was tested on all available platforms, with the hotspot >>> testset. No failures. >>> >>> Sincerely yours, >>> Ivan >>> >> >> >> > From david.holmes at oracle.com Mon Nov 17 06:44:29 2014 From: david.holmes at oracle.com (David Holmes) Date: Mon, 17 Nov 2014 16:44:29 +1000 Subject: RFR(XS): 8064471: Port 8013895: G1: G1SummarizeRSetStats output on Linux needs improvement to AIX In-Reply-To: <5464DED4.9040909@sap.com> References: <44EB1DA1040EEB44B7F343A782AD30789FEBA3E3@DEWDFEMB11B.global.corp.sap> <5463149A.6020506@oracle.com> <54637A9A.9040108@sap.com> <546470BD.9050303@oracle.com> <5464DED4.9040909@sap.com> Message-ID: <5469994D.3070208@oracle.com> On 14/11/2014 2:39 AM, Haug, Gunter wrote: > > On 13.11.2014 09:50, David Holmes wrote: >> On 13/11/2014 1:19 AM, Haug, Gunter wrote: >>> >>> On 12.11.2014 09:04, David Holmes wrote: >>>> Hi Gunter, >>>> >>>> On 11/11/2014 11:23 PM, Haug, Gunter wrote: >>>>> Hi All, >>>>> >>>>> The change '8013895: (G1: G1SummarizeRSetStats output on Linux needs >>>>> improvement)' makes use of getrusage() to retrieve accurate >>>>> per-thread data on resource usage. We can use exactly the same code >>>>> on AIX to achieve this. >>>>> >>>>> Please review the following change: >>>>> >>>>> http://cr.openjdk.java.net/~goetz/webrevs/8064471-aixTime/webrev.00/ >>>>> https://bugs.openjdk.java.net/browse/JDK-8064471 >>>> >>>> I have a couple of comments on this code which presumably also apply >>>> to the orginal :( >>> Yes, they apply to the original as well, see below. >>>> >>>> First this comment is no longer applicable (actually it was never >>>> applicable to AIX!): >>>> >>>> // For now, we say that linux does not support vtime. 
I have no idea
>>>> // whether it can actually be made to (DLD, 9/13/05).
>>>>
>>> You're right. I will remove it.
>>>> Second this calculation seems wrong:
>>>>
>>>> return (double) (usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) +
>>>> (double) (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / (1000 *
>>>> 1000);
>>>>
>>>> To me this performs integer division (ie truncation_) then converts
>>>> the resulting integer to a double. I would expect to see additional
>>>> parentheses (even if not needed, for clarity):
>>>>
>>>> return (double) (usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) +
>>>> ((double) (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec)) / (1000 *
>>>> 1000);
>>>>
>>>> or more simply divide by a floating-point value:
>>>>
>>>> return (double) (usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) +
>>>> (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / (1000.0 * 1000);
>>>>
>>>> and you don't need two double casts regardless as the expression will
>>>> be of type double as soon as there is one operand of type double. So
>>>> that should reduce to:
>>>>
>>>> return usage.ru_utime.tv_sec + usage.ru_stime.tv_sec +
>>>> (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / (1000.0 * 1000);
>>>>
>>> OK. Do you want that we also change the Linux version like you proposed?
>>
>> I'll leave it up to you. If you leave this as AIX only then it tests
>> the new process :) There can be a follow up cleanup bug for linux.
>
> Hi David,
>
> I think it's not worth the effort to make two separate changes on linux
> and aix, so I fixed linux as well. Please find the new webrev below.
> There will probably be more opportunities to test the new process in the
> future.
>
> http://cr.openjdk.java.net/~simonis/webrevs/8064471.v2/
>
>
>
> Now we need a sponsor, as it is not aix only anymore.

I guess that will have to be me. :)

I will try to look at this again tomorrow.
David

> Thanks,
> Gunter
>
>
>>
>> Thanks,
>> David
>>
>>> Thanks,
>>> Gunter
>>>
>>>> Cheers,
>>>> David
>>>>
>>>>> Thanks,
>>>>> Gunter
>>>>>
>>>
>

From markus.gronlund at oracle.com Mon Nov 17 08:33:45 2014
From: markus.gronlund at oracle.com (Markus Grönlund)
Date: Mon, 17 Nov 2014 00:33:45 -0800 (PST)
Subject: RFR 8064694: Kitchensink: WaitForMultipleObjects failed in hotspot\src\os\windows\vm\os_windows.cpp: 3844
In-Reply-To: <54699845.5010901@oracle.com>
References: <5465F711.9090605@oracle.com> <54668EAF.9070807@oracle.com> <546915DF.7080106@oracle.com> <54699845.5010901@oracle.com>
Message-ID:

I agree with David. The side effects will be unknown and very hard to debug.

Is there another way to accomplish the results without manipulating base services?

Thanks
Markus

-----Original Message-----
From: David Holmes
Sent: 17 November 2014 07:40
To: Ivan Gerasimov; Daniel Daugherty
Cc: serviceability-dev at openjdk.java.net; hotspot-runtime-dev
Subject: Re: RFR 8064694: Kitchensink: WaitForMultipleObjects failed in hotspot\src\os\windows\vm\os_windows.cpp: 3844

On 17/11/2014 7:23 AM, Ivan Gerasimov wrote:
> Thank you Daniel!
>
> Please find the updated webrev with your suggestions incorporated here:
> http://cr.openjdk.java.net/~igerasim/8064694/1/webrev/
>
> Concerning the thread priority: If the application is of
> NORMAL_PRIORITY_CLASS, then setting the thread's priority level to
> THREAD_PRIORITY_HIGHEST will result in its priority value to be only
> 10 (of maximum 31).
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms685100(v=vs.85).aspx
>
>
> And if the process is HIGH_PRIORITY_CLASS, then the tread with the
> HIGHEST priority level will have priority value == 15 of 31.
>
> I believe, it should not be too much, and the machine will not become
> busy with only those closing threads.
> However, I hope it would be enough to make them complete faster than > other threads of the NORMAL priority level withing the same application. I don't think this is necessary or desirable. Under normal usage we're giving priority to exiting threads and that may disrupt the usual scheduling patterns that applications see. You may posit that it is "harmless" but we can't say that for sure. Nor can we actually know that this will help with this particular bug. I would not add in this new code. David > Sincerely yours, > Ivan > > > On 15.11.2014 2:22, Daniel D. Daugherty wrote: >> On 11/14/14 5:35 AM, Ivan Gerasimov wrote: >>> Hello! >>> >>> The recent fix for JDK-8059533 ((process) Make exiting process wait >>> for exiting threads [win]) caused the warning message to be printed >>> in some test environments: >>> ----------- >>> os_windows.cpp:3844 is in the newly updated >>> os::win32::exit_process_or_thread(Ept what, int exit_code) >>> ----------- >>> >>> This has been observed with debug builds on highly loaded systems. >>> >>> >>> To address the issue it is proposed to do three things: >>> 1) increase the timeout for debug builds, >>> 2) increase the maximum number of the thread handles to be stored, >>> 3) rise the priority of the exiting threads, if we need to wait for >>> them. >>> >>> Would you please help review the fix? >>> >>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8064694 >>> WEBREV: http://cr.openjdk.java.net/~igerasim/8064694/0/webrev/ >> >> src/os/windows/vm/os_windows.cpp >> >> line 3784: #define MAX_EXIT_HANDLES NOT_DEBUG(32) DEBUG_ONLY(128) >> Instead of NOT_DEBUG can you use PRODUCT_ONLY? >> Instead of DEBUG_ONLY can you used NOT_PRODUCT? >> >> That uses the smaller value for only one build config (PRODUCT). >> >> line 3785: #define EXIT_TIMEOUT NOT_DEBUG(1000) DEBUG_ONLY(4000) >> /*1 sec in product, 4 sec in debug*/ >> Instead of NOT_DEBUG can you use PRODUCT_ONLY? >> Instead of DEBUG_ONLY can you used NOT_PRODUCT? 
>> Please add spaces between the comment delimiters and the comment >> text. >> >> That uses the smaller timeout for only one build config (PRODUCT). >> >> line 3836 // Rise the priority... >> Typo: 'Rise' -> 'Raise' >> >> About the general idea of raising the exiting thread's priority, >> if the exiting thread is looping in some Win* OS code after this >> point, will raising the priority make the machine unusable? >> >> Dan >> >> >>> >>> The fix was tested on all available platforms, with the hotspot >>> testset. No failures. >>> >>> Sincerely yours, >>> Ivan >>> >> >> >> > From ivan.gerasimov at oracle.com Mon Nov 17 09:00:16 2014 From: ivan.gerasimov at oracle.com (Ivan Gerasimov) Date: Mon, 17 Nov 2014 12:00:16 +0300 Subject: RFR 8064694: Kitchensink: WaitForMultipleObjects failed in hotspot\src\os\windows\vm\os_windows.cpp: 3844 In-Reply-To: <54699845.5010901@oracle.com> References: <5465F711.9090605@oracle.com> <54668EAF.9070807@oracle.com> <546915DF.7080106@oracle.com> <54699845.5010901@oracle.com> Message-ID: <5469B920.4040300@oracle.com> Thanks David! On 17.11.2014 9:40, David Holmes wrote: > On 17/11/2014 7:23 AM, Ivan Gerasimov wrote: >> Thank you Daniel! >> >> Please find the updated webrev with your suggestions incorporated here: >> http://cr.openjdk.java.net/~igerasim/8064694/1/webrev/ >> >> Concerning the thread priority: If the application is of >> NORMAL_PRIORITY_CLASS, then setting the thread's priority level to >> THREAD_PRIORITY_HIGHEST will result in its priority value to be only 10 >> (of maximum 31). >> http://msdn.microsoft.com/en-us/library/windows/desktop/ms685100(v=vs.85).aspx >> >> >> >> And if the process is HIGH_PRIORITY_CLASS, then the tread with the >> HIGHEST priority level will have priority value == 15 of 31. >> >> I believe, it should not be too much, and the machine will not become >> busy with only those closing threads. 
>> However, I hope it would be enough to make them complete faster than
>> other threads of the NORMAL priority level withing the same application.
>
> I don't think this is necessary or desirable. Under normal usage we're
> giving priority to exiting threads and that may disrupt the usual
> scheduling patterns that applications see. You may posit that it is
> "harmless" but we can't say that for sure. Nor can we actually know
> that this will help with this particular bug. I would not add in this
> new code.
>

There are two places where I adjust the thread's priority:

1) The array of handles has filled up.
If we end up in this code branch, it means that we unfortunately already have a broken exit pattern, because the current thread has to make a blocking call while owning a critical section. A full array of handles means that many threads are exiting at that time, so all the threads that start to exit after the current one will block when they try to take ownership of the critical section. Raising the priority of one thread that has already reached _endthreadex() seems appropriate to me in such a situation, because it helps shorten the period of time during which the threads remain blocked. Choosing the oldest exiting thread ensures that the period during which one thread has a raised priority is as short as possible.

2) The process exit branch.
That's the main part of the fix: here we make the process wait for all the threads that have called _endthreadex() to complete, while preventing any other threads from starting the exit procedure. The execution flow is already changed here (I don't want to say disrupted, because it was meant to fix the issue). All running threads are about to be terminated soon by ending the process, so raising the priority of some of the threads should not have any bad impact on the program flow. Instead, it may shorten the time the process has to wait before calling exit().
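For reference, the priority values quoted earlier in this thread (10 and 15, out of a maximum of 31) follow from how Windows combines a process priority class with a thread's relative priority level. Below is a minimal C++ sketch of that arithmetic, based on the table in Microsoft's "Scheduling Priorities" documentation. The enum and function names are hypothetical and deliberately not the Win32 identifiers. The special idle and time-critical thread levels and the real-time class are left out, and nothing here calls the Windows API.

```cpp
#include <cassert>

// Hypothetical names: they mirror, but are not, the Win32 priority classes.
enum class PriorityClass { Idle, BelowNormal, Normal, AboveNormal, High };

// Base priority of a thread at the "normal" relative level in each class,
// as listed in the Microsoft scheduling priorities table.
static int class_base(PriorityClass c) {
  switch (c) {
    case PriorityClass::Idle:        return 4;
    case PriorityClass::BelowNormal: return 6;
    case PriorityClass::Normal:      return 8;
    case PriorityClass::AboveNormal: return 10;
    case PriorityClass::High:        return 13;
  }
  return 8;  // unreachable for valid enum values; keeps compilers quiet
}

// relative_level ranges from -2 (lowest) to +2 (highest), matching the
// numeric values of THREAD_PRIORITY_LOWEST .. THREAD_PRIORITY_HIGHEST.
static int base_priority(PriorityClass c, int relative_level) {
  assert(relative_level >= -2 && relative_level <= 2);
  return class_base(c) + relative_level;
}
```

Under these assumptions, `base_priority(PriorityClass::Normal, 2)` gives 10 and `base_priority(PriorityClass::High, 2)` gives 15, the two figures cited above. Whether a raised thread actually finishes sooner still depends on the scheduler and on the other runnable threads in the system.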
I can surely remove that playing with the threads' priority, as it's not the essential part of the fix. However, I think it's a useful hint to the scheduler, which can improve things in some situations, and I'm not really sure how it can harm. Sincerely yours, Ivan > David > >> Sincerely yours, >> Ivan >> >> >> On 15.11.2014 2:22, Daniel D. Daugherty wrote: >>> On 11/14/14 5:35 AM, Ivan Gerasimov wrote: >>>> Hello! >>>> >>>> The recent fix for JDK-8059533 ((process) Make exiting process wait >>>> for exiting threads [win]) caused the warning message to be printed >>>> in some test environments: >>>> ----------- >>>> os_windows.cpp:3844 is in the newly updated >>>> os::win32::exit_process_or_thread(Ept what, int exit_code) >>>> ----------- >>>> >>>> This has been observed with debug builds on highly loaded systems. >>>> >>>> >>>> To address the issue it is proposed to do three things: >>>> 1) increase the timeout for debug builds, >>>> 2) increase the maximum number of the thread handles to be stored, >>>> 3) rise the priority of the exiting threads, if we need to wait for >>>> them. >>>> >>>> Would you please help review the fix? >>>> >>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8064694 >>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8064694/0/webrev/ >>> >>> src/os/windows/vm/os_windows.cpp >>> >>> line 3784: #define MAX_EXIT_HANDLES NOT_DEBUG(32) DEBUG_ONLY(128) >>> Instead of NOT_DEBUG can you use PRODUCT_ONLY? >>> Instead of DEBUG_ONLY can you used NOT_PRODUCT? >>> >>> That uses the smaller value for only one build config (PRODUCT). >>> >>> line 3785: #define EXIT_TIMEOUT NOT_DEBUG(1000) DEBUG_ONLY(4000) >>> /*1 sec in product, 4 sec in debug*/ >>> Instead of NOT_DEBUG can you use PRODUCT_ONLY? >>> Instead of DEBUG_ONLY can you used NOT_PRODUCT? >>> Please add spaces between the comment delimiters and the comment >>> text. >>> >>> That uses the smaller timeout for only one build config (PRODUCT). >>> >>> line 3836 // Rise the priority... 
>>> Typo: 'Rise' -> 'Raise' >>> >>> About the general idea of raising the exiting thread's priority, >>> if the exiting thread is looping in some Win* OS code after this >>> point, will raising the priority make the machine unusable? >>> >>> Dan >>> >>> >>>> >>>> The fix was tested on all available platforms, with the hotspot >>>> testset. No failures. >>>> >>>> Sincerely yours, >>>> Ivan >>>> >>> >>> >>> >> > > From george.triantafillou at oracle.com Mon Nov 17 12:52:40 2014 From: george.triantafillou at oracle.com (George Triantafillou) Date: Mon, 17 Nov 2014 07:52:40 -0500 Subject: [8u40] RFR 8062870: src/share/vm/services/mallocTracker.hpp:64 assert(_count > 0) failed: Negative ,counter In-Reply-To: <5466868F.2020802@oracle.com> References: <5466868F.2020802@oracle.com> Message-ID: <5469EF98.8010207@oracle.com> Hi Coleen, This looks good. -George On 11/14/2014 5:47 PM, Coleen Phillimore wrote: > Please approve the backport of the bug fix for this bug. The fix has > been tested all week with no problems. The patch didn't import > because MallocTrackingVerify.java didn't have an @ignore tag, which > was removed by the jdk9 fix. Everything else applied cleanly. > > Summary: Signed bitfield size y can only have (1 << y)-1 values. > Reviewed-by: shade, dholmes, jrose, ctornqvi, gtriantafill > > open webrev at http://cr.openjdk.java.net/~coleenp/8062870_8u40/ > bug link https://bugs.openjdk.java.net/browse/JDK-8062870 > > Thanks! > Coleen > > > From coleen.phillimore at oracle.com Mon Nov 17 16:21:05 2014 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Mon, 17 Nov 2014 11:21:05 -0500 Subject: [8u40] RFR 8062870: src/share/vm/services/mallocTracker.hpp:64 assert(_count > 0) failed: Negative ,counter In-Reply-To: <5469EF98.8010207@oracle.com> References: <5466868F.2020802@oracle.com> <5469EF98.8010207@oracle.com> Message-ID: <546A2071.2060204@oracle.com> Thank you, George! 
Coleen On 11/17/14, 7:52 AM, George Triantafillou wrote: > Hi Coleen, > > This looks good. > > -George > > On 11/14/2014 5:47 PM, Coleen Phillimore wrote: >> Please approve the backport of the bug fix for this bug. The fix has >> been tested all week with no problems. The patch didn't import >> because MallocTrackingVerify.java didn't have an @ignore tag, which >> was removed by the jdk9 fix. Everything else applied cleanly. >> >> Summary: Signed bitfield size y can only have (1 << y)-1 values. >> Reviewed-by: shade, dholmes, jrose, ctornqvi, gtriantafill >> >> open webrev at http://cr.openjdk.java.net/~coleenp/8062870_8u40/ >> bug link https://bugs.openjdk.java.net/browse/JDK-8062870 >> >> Thanks! >> Coleen >> >> >> > From mandy.chung at oracle.com Mon Nov 17 16:57:23 2014 From: mandy.chung at oracle.com (Mandy Chung) Date: Mon, 17 Nov 2014 08:57:23 -0800 Subject: [8u40] Review request 8064667: Provide support to help identify use of endorsed standards and extension mechanism Message-ID: <546A28F3.1010802@oracle.com> This requests both code review and 8u40 approval for: https://bugs.openjdk.java.net/browse/JDK-8064667 Webrev: http://cr.openjdk.java.net/~mchung/jdk8u/webrevs/8064667/webrev.00/ JEP 220 [1] proposes to remove the endorsed standards override mechanism and extension mechanism. This patch adds a VM flag in 8u40 to help identify any existing uses of these mechanisms so that users can turn on the VM flag to help identify if they depend on the endorsed standards override mechanism and extension mechanism and can plan to prepare for the migration to a newer JDK release early on. When -XX:+CheckEndorsedAndExtDirs is set, the VM will exit if the system property -Djava.endorsed.dirs or -Djava.ext.dirs is set, or if ${java.home}/lib/endorsed or ${java.home}/lib/ext exists, or any system extension directory contains JAR files. 
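One of the conditions listed above, a directory containing JAR files, can be illustrated with a minimal sketch. It assumes a POSIX system, the helper names are hypothetical, and it is not the code in the webrev; it only shows a scan that releases the directory handle on every path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <dirent.h>

// True if `name` ends in ".jar" and has at least one character before it.
static bool has_jar_suffix(const char* name) {
  assert(name != nullptr);
  const char* ext = ".jar";
  std::size_t name_len = std::strlen(name);
  std::size_t ext_len = std::strlen(ext);
  return name_len > ext_len &&
         std::strcmp(name + name_len - ext_len, ext) == 0;
}

// Hypothetical helper: true if `path` is a readable directory that
// contains at least one entry whose name ends in ".jar".
static bool dir_contains_jar(const char* path) {
  DIR* dir = opendir(path);
  if (dir == nullptr) {
    return false;  // missing or unreadable directory: nothing to report
  }
  bool found = false;
  for (struct dirent* e = readdir(dir); e != nullptr; e = readdir(dir)) {
    if (has_jar_suffix(e->d_name)) {
      found = true;
      break;
    }
  }
  closedir(dir);  // single exit point, so the handle is always closed
  return found;
}
```

A caller would run such a scan over each directory of interest and treat any hit as a use of the mechanism.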
Thanks Mandy [1] http://openjdk.java.net/jeps/220 From vladimir.kempik at oracle.com Mon Nov 17 16:20:58 2014 From: vladimir.kempik at oracle.com (Vladimir Kempik) Date: Mon, 17 Nov 2014 20:20:58 +0400 Subject: RFR: 8058935: CPU detection gives 0 cores per cpu, 2 threads per core in Amazon EC2 environment Message-ID: <546A206A.4070604@oracle.com> Hi, Please review this patch, which adds a sanity check to cores_per_cpu(): http://cr.openjdk.java.net/~vkempik/8058935/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8058935 A few months ago we got reports of Java crashing in the Amazon EC2 environment (they use Xen). https://bugs.openjdk.java.net/browse/JDK-8058935 https://bugs.openjdk.java.net/browse/JDK-8058937 The JVM args used to trigger the crash: -XX:+UnlockCommercialFeatures -XX:+FlightRecorder After investigation I think the crash could only have happened if supports_processor_topology() returned true and _cpuid_info.tpl_cpuidB1_ebx.bits.logical_cpus was zero. I wasn't able to reproduce the bug on Amazon EC2 recently. The patch adds a sanity check: if CPU topology was used and resulted in 0 cores per cpu, fall back to the non-topology variant, which can't result in 0 cores per cpu. Testing: JPRT. Thanks, Vladimir. From sean.coffey at oracle.com Mon Nov 17 18:06:17 2014 From: sean.coffey at oracle.com (Seán Coffey) Date: Mon, 17 Nov 2014 18:06:17 +0000 Subject: [8u40] Review request 8064667: Provide support to help identify use of endorsed standards and extension mechanism In-Reply-To: <546A28F3.1010802@oracle.com> References: <546A28F3.1010802@oracle.com> Message-ID: <546A3919.7020804@oracle.com> Looks good to me Mandy. Best to get a runtime reviewer to look at that I guess. A few other comments: a) Will you be filing a CCC for the new flag? b) Maybe a testcase would be useful (a simple one launching in ovm mode with java.endorsed.dirs etc.). c) Will you be pushing this to the hs-dev or jdk8u-dev forest? It seems most relevant for the hotspot team forest.
No approval required in that case. regards, Sean. On 17/11/14 16:57, Mandy Chung wrote: > This requests both code review and 8u40 approval for: > https://bugs.openjdk.java.net/browse/JDK-8064667 > > Webrev: > http://cr.openjdk.java.net/~mchung/jdk8u/webrevs/8064667/webrev.00/ > > JEP 220 [1] proposes to remove the endorsed standards override > mechanism and extension mechanism. This patch adds a VM flag in 8u40 > to help identify any existing uses of these mechanisms so that users > can turn on the VM flag to help identify if they depend on the > endorsed standards override mechanism and extension mechanism and can > plan to prepare for the migration to a newer JDK release early on. > When -XX:+CheckEndorsedAndExtDirs is set, the VM will exit if the > system property -Djava.endorsed.dirs or -Djava.ext.dirs is set, or if > ${java.home}/lib/endorsed or ${java.home}/lib/ext exists, or any > system extension directory contains JAR files. > > Thanks > Mandy > [1] http://openjdk.java.net/jeps/220 > > > From coleen.phillimore at oracle.com Mon Nov 17 19:10:36 2014 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Mon, 17 Nov 2014 14:10:36 -0500 Subject: [8u40] RFR 8048169: Change 8037816 breaks HS build on PPC64 and CPP-Interpreter platforms Message-ID: <546A482C.7030508@oracle.com> Summary: Fix the matching of format string parameter types to the actual argument types for the PPC64 and CPP-Interpreter files in the same way as 8037816 already did it for all the other files Reviewed-by: stefank, coleenp, dholmes This is a 8u40 backport of the changes that Volker did for 9. Please approve. 
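As a generic illustration of what matching format specifiers to argument types means, here is a minimal sketch using only the standard <cinttypes> macros. The function name is hypothetical and this is not code from the change itself.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a pointer-sized value and a 64-bit count.
// PRIxPTR and PRId64 expand to the specifier that matches the argument
// type on the target platform, so the call stays correct whether a type
// is int, long, or long long underneath.
std::string format_address_and_count(std::uintptr_t addr, std::int64_t count) {
  char buf[64];
  int n = std::snprintf(buf, sizeof(buf), "0x%" PRIxPTR " %" PRId64, addr, count);
  assert(n > 0 && static_cast<std::size_t>(n) < sizeof(buf));
  return std::string(buf);
}
```

Writing "%d" or "%lx" here instead would compile on some platforms and trigger format warnings, or build failures when warnings are treated as errors, on others.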
Coleen From christian.tornqvist at oracle.com Mon Nov 17 19:21:23 2014 From: christian.tornqvist at oracle.com (Christian Tornqvist) Date: Mon, 17 Nov 2014 14:21:23 -0500 Subject: [8u40] RFR 8048169: Change 8037816 breaks HS build on PPC64 and CPP-Interpreter platforms In-Reply-To: <546A482C.7030508@oracle.com> References: <546A482C.7030508@oracle.com> Message-ID: <042701d0029b$adb04060$0910c120$@oracle.com> Sounds like a good idea to backport this. Thanks, Christian -----Original Message----- From: hotspot-runtime-dev [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of Coleen Phillimore Sent: Monday, November 17, 2014 2:11 PM To: hotspot-runtime-dev Subject: [8u40] RFR 8048169: Change 8037816 breaks HS build on PPC64 and CPP-Interpreter platforms Summary: Fix the matching of format string parameter types to the actual argument types for the PPC64 and CPP-Interpreter files in the same way as 8037816 already did it for all the other files Reviewed-by: stefank, coleenp, dholmes This is a 8u40 backport of the changes that Volker did for 9. Please approve. Coleen From coleen.phillimore at oracle.com Mon Nov 17 19:23:01 2014 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Mon, 17 Nov 2014 14:23:01 -0500 Subject: [8u40] RFR 8048169: Change 8037816 breaks HS build on PPC64 and CPP-Interpreter platforms In-Reply-To: <042701d0029b$adb04060$0910c120$@oracle.com> References: <546A482C.7030508@oracle.com> <042701d0029b$adb04060$0910c120$@oracle.com> Message-ID: <546A4B15.1030909@oracle.com> Thanks Christian. I forgot to mention that this change imported cleanly from the jdk9 changes. Coleen On 11/17/14, 2:21 PM, Christian Tornqvist wrote: > Sounds like a good idea to backport this. 
> > Thanks, > Christian > > -----Original Message----- > From: hotspot-runtime-dev > [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of Coleen > Phillimore > Sent: Monday, November 17, 2014 2:11 PM > To: hotspot-runtime-dev > Subject: [8u40] RFR 8048169: Change 8037816 breaks HS build on PPC64 and > CPP-Interpreter platforms > > Summary: Fix the matching of format string parameter types to the actual > argument types for the PPC64 and CPP-Interpreter files in the same way as > 8037816 already did it for all the other files > Reviewed-by: stefank, coleenp, dholmes > > > This is a 8u40 backport of the changes that Volker did for 9. Please > approve. > > Coleen > From calvin.cheung at oracle.com Mon Nov 17 19:40:33 2014 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Mon, 17 Nov 2014 11:40:33 -0800 Subject: [8u40] Review request 8064667: Provide support to help identify use of endorsed standards and extension mechanism In-Reply-To: <546A28F3.1010802@oracle.com> References: <546A28F3.1010802@oracle.com> Message-ID: <546A4F31.2040406@oracle.com> Hi Mandy, In 8u40, the jre/lib/ext dir still exists and containing nashorn.jar, javafx.jar, etc. Would those jar files be moved to a different dir? Some minor comments in arguments.cpp: lines 3470 and 3472 can be combined as follows: int nonEmptyDirs = check_non_empty_dirs(Arguments::get_endorsed_dir(), "endorsed"); before return JNI_ERR; at lines 3493 and 3503, the dir should be closed: os::closedir(dir); Calvin On 11/17/2014 8:57 AM, Mandy Chung wrote: > This requests both code review and 8u40 approval for: > https://bugs.openjdk.java.net/browse/JDK-8064667 > > Webrev: > http://cr.openjdk.java.net/~mchung/jdk8u/webrevs/8064667/webrev.00/ > > JEP 220 [1] proposes to remove the endorsed standards override > mechanism and extension mechanism. 
This patch adds a VM flag in 8u40 > to help identify any existing uses of these mechanisms so that users > can turn on the VM flag to help identify if they depend on the > endorsed standards override mechanism and extension mechanism and can > plan to prepare for the migration to a newer JDK release early on. > When -XX:+CheckEndorsedAndExtDirs is set, the VM will exit if the > system property -Djava.endorsed.dirs or -Djava.ext.dirs is set, or if > ${java.home}/lib/endorsed or ${java.home}/lib/ext exists, or any > system extension directory contains JAR files. > > Thanks > Mandy > [1] http://openjdk.java.net/jeps/220 > > > From vladimir.kozlov at oracle.com Mon Nov 17 19:47:47 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 17 Nov 2014 11:47:47 -0800 Subject: RFR: 8058935: CPU detection gives 0 cores per cpu, 2 threads per core in Amazon EC2 environment In-Reply-To: <546A206A.4070604@oracle.com> References: <546A206A.4070604@oracle.com> Message-ID: <546A50E3.6010200@oracle.com> According to next document the cpu has 10 cores (and 2 threads per core): http://ark.intel.com/products/75275/Intel-Xeon-Processor-E5-2670-v2-25M-Cache-2_50-GHz hs_err in the bug report reports only 2 processors and next lines are missing: physical id : 0 siblings : 4 core id : 0 cpu cores : 4 apicid : 0 initial apicid : 0 I assume it is some kind of virtual environment with which cpuid topology is not working (at least our code does not work). We may missing some checks which indicates that topology is not supported. It would be nice if you can put all topology and related cpuid bits from amazon ec2 in bug report. Checking for 0 could be fine but if it is not 0 it could be still wrong if topology info is not supported. 
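The fallback under discussion can be sketched roughly as follows. The struct and field names are hypothetical stand-ins for the cpuid-derived values, and deriving the topology result by dividing logical CPUs per package by threads per core is an assumption made for illustration; this is not the code in the webrev.

```cpp
#include <cassert>

// Hypothetical snapshot of the values a cores-per-cpu query would consult.
struct CpuidSnapshot {
  bool topology_supported;            // topology enumeration reported as usable
  unsigned logical_cpus_per_package;  // topology-derived count (may be 0 under a hypervisor)
  unsigned threads_per_core;          // topology-derived count
  unsigned legacy_cores;              // non-topology value, assumed to be >= 1
};

// Prefer the topology-derived value, but never return 0: fall back to
// the non-topology value when the topology data yields nothing usable.
unsigned cores_per_cpu(const CpuidSnapshot& s) {
  assert(s.legacy_cores >= 1);
  unsigned result = 0;
  if (s.topology_supported && s.threads_per_core != 0) {
    result = s.logical_cpus_per_package / s.threads_per_core;
  }
  if (result == 0) {
    result = s.legacy_cores;  // sanity-check fallback
  }
  return result;
}
```

As noted above, this guards only against a zero result; a non-zero but wrong topology value would still pass through unchanged.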
Thanks, Vladimir On 11/17/14 8:20 AM, Vladimir Kempik wrote: > Hi, > > Please review patch adding sanity check to cores_per_cpu(): > > http://cr.openjdk.java.net/~vkempik/8058935/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8058935 > > Few months ago we've got reports of java crashing in amazon ec2 > enviroment (they use Xen). > https://bugs.openjdk.java.net/browse/JDK-8058935 > https://bugs.openjdk.java.net/browse/JDK-8058937 > > JVM args was used to make the crash: -XX:+UnlockCommercialFeatures > -XX:+FlightRecorder > > After investigation I think the crash could only have happened if > support_processor_topology() returned true and > _cpuid_info.tpl_cpuidB1_ebx.bits.logical_cpus was zero. > > I wasn't able to reproduce the bug on amazon ec2 cloud in present days. > > The patch adds sanity check, if cpu topology was used and resulted in 0 > cores per cpu, then fallback to non-topology variant, which can't result > in 0 cores per cpu. > > Testing: JPRT. > > Thanks, > Vladimir. From coleen.phillimore at oracle.com Mon Nov 17 21:21:01 2014 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Mon, 17 Nov 2014 16:21:01 -0500 Subject: [9] RFR(L) 8013267 : move MemberNameTable from native code to Java heap, use to intern MemberNames In-Reply-To: <39826508-110B-4FCE-9A58-8C3D1B9FC7DE@oracle.com> References: <594FE416-D445-426E-B7A9-C75F012ADE16@oracle.com> <0BDCD5DD-CBFB-4A3D-9BAD-7F9912E9237C@oracle.com> <5453C230.8010709@oracle.com> <9747A3A3-B7F4-407A-8E00-1E647D9DC2D1@oracle.com> <1D195ED3-8BD2-46CB-9D70-29CF809D9F5A@oracle.com> <5456FB59.60905@oracle.com> <632A5C98-B386-4625-BE12-355241581955@oracle.com> <5457AA75.8090103@gmail.com> <5457E0F9.8090004@gmail.com> <5458A57C.4060208@gmail.com> <260D49F5-6380-4FC3-A900-6CD9AB3ED6F7@oracle.com> <5459034E.8070809@gmail.com> <39826508-110B-4FCE-9A58-8C3D1B9FC7DE@oracle.com> Message-ID: <546A66BD.7090904@oracle.com> Hi, I recommend that we split this bug into three changes. 
One to fix the class redefinition problem (RFR coming shortly), one to intern MemberNames for performance, and the third to potentially (maybe) make class redefinition work with the new member name table. In this change, I don't like how class redefinition has leaked into the java code to intern member names. Thanks, Coleen On 11/7/14, 4:14 PM, David Chase wrote: > New webrev: > > bug: https://bugs.openjdk.java.net/browse/JDK-8013267 > > webrevs: > http://cr.openjdk.java.net/~drchase/8013267/jdk.06/ > http://cr.openjdk.java.net/~drchase/8013267/hotspot.06/ > > Changes since last: > > 1) refactored to put ClassData under java.lang.invoke.MemberName > > 2) split the data structure into two parts; handshake with JVM uses a linked list, > which makes for a simpler backout-if-race, and Java side continues to use the > simple sorted array. This should allow easier use of (for example) fancier > data structures (like ConcurrentHashMap) if this later proves necessary. > > 3) Cleaned up symbol references in the new hotspot code to go through vmSymbols. > > 4) renamed oldCapacity to oldSize > > 5) ran two different benchmarks and saw no change in performance. > a) nashorn ScriptTest (see https://bugs.openjdk.java.net/browse/JDK-8014288 ) > b) JMH microbenchmarks > (see bug comments for details) > > And it continues to pass the previously-failing tests, as well as the new test > which has been added to hotspot/test/compiler/jsr292 . > > David > > On 2014-11-04, at 3:54 PM, David Chase wrote: > >> I'm working on the initial benchmarking, and so far this arrangement (with synchronization >> and binary search for lookup, lots of barriers and linear cost insertion) has not yet been any >> slower. >> >> I am nonetheless tempted by the 2-tables solution, because I think the simpler JVM-side >> interface that it allows is desirable.
>> >> David >> >> On 2014-11-04, at 11:48 AM, Peter Levart wrote: >> >>> On 11/04/2014 04:19 PM, David Chase wrote: >>>> On 2014-11-04, at 5:07 AM, Peter Levart wrote: >>>>> Are you thinking of an IdentityHashMap type of hash table (no linked-list of elements for same bucket, just search for 1st free slot on insert)? The problem would be how to pre-size the array. Count declared members? >>>> It can't be an IdentityHashMap, because we are interning member names. >>> I know it can't be IdentityHashMap - I just wondered if you were thinking of an IdentityHashMap-like data structure in contrast to standard HashMap-like. Not in terms of equality/hashCode used, but in terms of internal data structure. IdentityHashMap is just an array of elements (well pairs of them - key, value are placed in two consecutive array slots). Lookup searches for element linearly in the array starting from hashCode based index to the element if found or 1st empty array slot. It's very easy to implement if the only operations are get() and put() and could be used for interning and as a shared structure for VM to scan, but array has to be sized to at least 3/2 the number of elements for performance to not degrade. >>> >>>> In spite of my grumbling about benchmarking, I'm inclined to do that and try a couple of experiments. >>>> One possibility would be to use two data structures, one for interning, the other for communication with the VM. >>>> Because there's no lookup in the VM data structure it can just be an array that gets elements appended, >>>> and the synchronization dance is much simpler. >>>> >>>> For interning, maybe I use a ConcurrentHashMap, and I try the following idiom: >>>> >>>> mn = resolve(args) >>>> // deal with any errors >>>> mn' = chm.get(mn) >>>> if (mn' != null) return mn' // hoped-for-common-case >>>> >>>> synchronized (something) { >>>> mn' = chm.get(mn) >>>> if (mn' != null) return mn'
>>>> txn_class = mn.getDeclaringClass() >>>> >>>> while (true) { >>>> redef_count = txn_class.redefCount() >>>> mn = resolve(args) >>>> >>>> shared_array.add(mn); >>>> // barrier, because we are paranoid >>>> if (redef_count == txn_class.redefCount()) { >>>> chm.add(mn); // safe to publish to other Java threads. >>>> return mn; >>>> } >>>> shared_array.drop_last(); // Try again >>>> } >>>> } >>>> >>>> (Idiom gets slightly revised for the one or two other intern use cases, but this is the basic idea). >>> Yes, that's similar to what I suggested by using a linked-list of MemberName(s) instead of the "shared_array" (easier to reason about ordering of writes) and a sorted array of MemberName(s) instead of the "chm" in your scheme above. ConcurrentHashMap would certainly be the most performant solution in terms of lookup/insertion-time and concurrent throughput, but it will use more heap than a simple packed array of MemberNames. CHM is much better now in JDK8 though regarding heap use. >>> >>> A combination of the two approaches is also possible: >>> >>> - instead of maintaining a "shared_array" of MemberName(s), have them form a linked-list (you trade a slot in array for 'next' pointer in MemberName) >>> - use ConcurrentHashMap for interning. >>> >>> Regards, Peter >>> >>>> David >>>> >>>>>> And another way to view this is that we're now quibbling about performance, when we still >>>>>> have an existing correctness problem that this patch solves, so maybe we should just get this >>>>>> done and then file an RFE. >>>>> Perhaps, yes. But note that questions about JMM and ordering of writes to array elements are about correctness, not performance.
>>>>> >>>>> Regards, Peter >>>>> >>>>>> David From jiangli.zhou at oracle.com Mon Nov 17 21:52:27 2014 From: jiangli.zhou at oracle.com (Jiangli Zhou) Date: Mon, 17 Nov 2014 13:52:27 -0800 Subject: [8u40] RFR 8054008 & 8064375 backports Message-ID: <546A6E1B.2010705@oracle.com> Hi, Please approve the backport for following bugs to 8u40: JDK-8054008 : Using -XX:-LazyBootClassLoader crashes with ACCESS_VIOLATION on Win 64bit (http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/9dd17854c570) JDK-8064735 : Change certain errors to warnings in CDS output (http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/6155ff53a422) webrev: http://cr.openjdk.java.net/~jiangli/8054008/webrev.backport/ Thanks, Jiangli From christian.tornqvist at oracle.com Mon Nov 17 22:02:21 2014 From: christian.tornqvist at oracle.com (Christian Tornqvist) Date: Mon, 17 Nov 2014 17:02:21 -0500 Subject: [8u40] RFR 8054008 & 8064375 backports In-Reply-To: <546A6E1B.2010705@oracle.com> References: <546A6E1B.2010705@oracle.com> Message-ID: <065a01d002b2$29f1c890$7dd559b0$@oracle.com> Hi Jiangli, Sounds like a good idea to backport these. 
Thanks, Christian -----Original Message----- From: hotspot-runtime-dev [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of Jiangli Zhou Sent: Monday, November 17, 2014 4:52 PM To: hotspot-runtime-dev at openjdk.java.net Subject: [8u40] RFR 8054008 & 8064375 backports Hi, Please approve the backport for following bugs to 8u40: JDK-8054008 : Using -XX:-LazyBootClassLoader crashes with ACCESS_VIOLATION on Win 64bit (http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/9dd17854c570) JDK-8064735 : Change certain errors to warnings in CDS output (http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/6155ff53a422) webrev: http://cr.openjdk.java.net/~jiangli/8054008/webrev.backport/ Thanks, Jiangli From jiangli.zhou at oracle.com Mon Nov 17 22:04:12 2014 From: jiangli.zhou at oracle.com (Jiangli Zhou) Date: Mon, 17 Nov 2014 14:04:12 -0800 Subject: [8u40] RFR 8054008 & 8064375 backports In-Reply-To: <065a01d002b2$29f1c890$7dd559b0$@oracle.com> References: <546A6E1B.2010705@oracle.com> <065a01d002b2$29f1c890$7dd559b0$@oracle.com> Message-ID: <546A70DC.7040708@oracle.com> Thank you Christian for the quick response! Jiangli On 11/17/2014 02:02 PM, Christian Tornqvist wrote: > Hi Jiangli, > > Sounds like a good idea to backport these. 
> > Thanks, > Christian > > -----Original Message----- > From: hotspot-runtime-dev [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of Jiangli Zhou > Sent: Monday, November 17, 2014 4:52 PM > To: hotspot-runtime-dev at openjdk.java.net > Subject: [8u40] RFR 8054008 & 8064375 backports > > Hi, > > Please approve the backport for following bugs to 8u40: > > JDK-8054008 : Using -XX:-LazyBootClassLoader crashes with ACCESS_VIOLATION on Win 64bit > (http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/9dd17854c570) > JDK-8064735 : Change certain errors to warnings in CDS output > (http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/6155ff53a422) > > webrev: http://cr.openjdk.java.net/~jiangli/8054008/webrev.backport/ > > Thanks, > Jiangli > From mikhailo.seledtsov at oracle.com Mon Nov 17 23:19:52 2014 From: mikhailo.seledtsov at oracle.com (Mikhailo Seledtsov) Date: Mon, 17 Nov 2014 15:19:52 -0800 Subject: [8u40] RFR 8054008 & 8064375 backports In-Reply-To: <546A70DC.7040708@oracle.com> References: <546A6E1B.2010705@oracle.com> <065a01d002b2$29f1c890$7dd559b0$@oracle.com> <546A70DC.7040708@oracle.com> Message-ID: <546A8298.9040208@oracle.com> I agree, sounds like a good idea. Misha On 11/17/2014 2:04 PM, Jiangli Zhou wrote: > Thank you Christian for the quick response! > > Jiangli > > On 11/17/2014 02:02 PM, Christian Tornqvist wrote: >> Hi Jiangli, >> >> Sounds like a good idea to backport these. 
>> >> Thanks, >> Christian >> >> -----Original Message----- >> From: hotspot-runtime-dev >> [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of >> Jiangli Zhou >> Sent: Monday, November 17, 2014 4:52 PM >> To: hotspot-runtime-dev at openjdk.java.net >> Subject: [8u40] RFR 8054008 & 8064375 backports >> >> Hi, >> >> Please approve the backport for following bugs to 8u40: >> >> JDK-8054008 : Using >> -XX:-LazyBootClassLoader crashes with ACCESS_VIOLATION on Win 64bit >> (http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/9dd17854c570) >> JDK-8064735 : >> Change certain errors to warnings in CDS output >> (http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/6155ff53a422) >> >> webrev: http://cr.openjdk.java.net/~jiangli/8054008/webrev.backport/ >> >> Thanks, >> Jiangli >> > From jiangli.zhou at oracle.com Tue Nov 18 00:52:47 2014 From: jiangli.zhou at oracle.com (Jiangli Zhou) Date: Mon, 17 Nov 2014 16:52:47 -0800 Subject: [8u40] RFR 8054008 & 8064375 backports In-Reply-To: <546A8298.9040208@oracle.com> References: <546A6E1B.2010705@oracle.com> <065a01d002b2$29f1c890$7dd559b0$@oracle.com> <546A70DC.7040708@oracle.com> <546A8298.9040208@oracle.com> Message-ID: <546A985F.7030504@oracle.com> Thank you, Misha. Jiangli On 11/17/2014 03:19 PM, Mikhailo Seledtsov wrote: > I agree, sounds like a good idea. > > Misha > > On 11/17/2014 2:04 PM, Jiangli Zhou wrote: >> Thank you Christian for the quick response! >> >> Jiangli >> >> On 11/17/2014 02:02 PM, Christian Tornqvist wrote: >>> Hi Jiangli, >>> >>> Sounds like a good idea to backport these. 
>>> >>> Thanks, >>> Christian >>> >>> -----Original Message----- >>> From: hotspot-runtime-dev >>> [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of >>> Jiangli Zhou >>> Sent: Monday, November 17, 2014 4:52 PM >>> To: hotspot-runtime-dev at openjdk.java.net >>> Subject: [8u40] RFR 8054008 & 8064375 backports >>> >>> Hi, >>> >>> Please approve the backport for following bugs to 8u40: >>> >>> JDK-8054008 : >>> Using -XX:-LazyBootClassLoader crashes with ACCESS_VIOLATION on Win >>> 64bit >>> (http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/9dd17854c570) >>> JDK-8064735 : >>> Change certain errors to warnings in CDS output >>> (http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/6155ff53a422) >>> >>> webrev: http://cr.openjdk.java.net/~jiangli/8054008/webrev.backport/ >>> >>> Thanks, >>> Jiangli >>> >> > From david.holmes at oracle.com Tue Nov 18 01:50:14 2014 From: david.holmes at oracle.com (David Holmes) Date: Tue, 18 Nov 2014 11:50:14 +1000 Subject: RFR 8064694: Kitchensink: WaitForMultipleObjects failed in hotspot\src\os\windows\vm\os_windows.cpp: 3844 In-Reply-To: <5469B920.4040300@oracle.com> References: <5465F711.9090605@oracle.com> <54668EAF.9070807@oracle.com> <546915DF.7080106@oracle.com> <54699845.5010901@oracle.com> <5469B920.4040300@oracle.com> Message-ID: <546AA5D6.4050203@oracle.com> Hi Ivan, On 17/11/2014 7:00 PM, Ivan Gerasimov wrote: > Thanks David! > > On 17.11.2014 9:40, David Holmes wrote: >> On 17/11/2014 7:23 AM, Ivan Gerasimov wrote: >>> Thank you Daniel! >>> >>> Please find the updated webrev with your suggestions incorporated here: >>> http://cr.openjdk.java.net/~igerasim/8064694/1/webrev/ >>> >>> Concerning the thread priority: If the application is of >>> NORMAL_PRIORITY_CLASS, then setting the thread's priority level to >>> THREAD_PRIORITY_HIGHEST will result in its priority value to be only 10 >>> (of maximum 31). 
>>> http://msdn.microsoft.com/en-us/library/windows/desktop/ms685100(v=vs.85).aspx >>> >>> >>> >>> And if the process is HIGH_PRIORITY_CLASS, then the tread with the >>> HIGHEST priority level will have priority value == 15 of 31. >>> >>> I believe, it should not be too much, and the machine will not become >>> busy with only those closing threads. >>> However, I hope it would be enough to make them complete faster than >>> other threads of the NORMAL priority level withing the same application. >> >> I don't think this is necessary or desirable. Under normal usage we're >> giving priority to exiting threads and that may disrupt the usual >> scheduling patterns that applications see. You may posit that it is >> "harmless" but we can't say that for sure. Nor can we actually know >> that this will help with this particular bug. I would not add in this >> new code. >> > > There are two places where I put adjusting the thread's priority: > > 1) We've the array of handles filled up. > > If we're found in this code branch, it'll mean that unfortunately we've > already got broken exit pattern, because the current thread has to do a > blocking call, having the ownership of a critical section. > The full array of handles means that many threads are exiting at that > time, thus all the threads that are starting to exit after the current > one will block at the attempt to grab ownership of the critical section. > > Raising the priority of one thread that had already reached > _endthreadex(), seems appropriate to me in such a situation, because it > helps shorten the period of time when the threads remain blocked. > > Choosing the oldest exiting thread ensures that the period of time when > the priority of one thread is higher is the smallest possible. > > 2) The process exit branch. 
> > That's the main part of the fix -- here we make the process to wait for > all the threads having called _endthreadex() to complete, at the same > time preventing any other threads from starting the exiting procedure. > The execution flow is already changed here (I don't want to say > disrupted, because it was meant to fix the issue). > > All running threads are about to be terminated soon by ending the > process, so raising the priority of some of the threads should not have > any bad impact on the program flow. > Instead, it may make the time the process has to wait before calling > exit() shorter. > > > I can surely remove that playing with the threads' priority, as it's not > the essential part of the fix. > However, I think it's a useful hint to the scheduler, which can improve > things in some situations, and I'm not really sure how it can harm. Okay. You've convinced me. I'm okay with the priority changes to try to minimize the exit time blocking. Thanks, David > > Sincerely yours, > Ivan > > >> David >> >>> Sincerely yours, >>> Ivan >>> >>> >>> On 15.11.2014 2:22, Daniel D. Daugherty wrote: >>>> On 11/14/14 5:35 AM, Ivan Gerasimov wrote: >>>>> Hello! >>>>> >>>>> The recent fix for JDK-8059533 ((process) Make exiting process wait >>>>> for exiting threads [win]) caused the warning message to be printed >>>>> in some test environments: >>>>> ----------- >>>>> os_windows.cpp:3844 is in the newly updated >>>>> os::win32::exit_process_or_thread(Ept what, int exit_code) >>>>> ----------- >>>>> >>>>> This has been observed with debug builds on highly loaded systems. >>>>> >>>>> >>>>> To address the issue it is proposed to do three things: >>>>> 1) increase the timeout for debug builds, >>>>> 2) increase the maximum number of the thread handles to be stored, >>>>> 3) rise the priority of the exiting threads, if we need to wait for >>>>> them. >>>>> >>>>> Would you please help review the fix? 
>>>>> >>>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8064694 >>>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8064694/0/webrev/ >>>> >>>> src/os/windows/vm/os_windows.cpp >>>> >>>> line 3784: #define MAX_EXIT_HANDLES NOT_DEBUG(32) DEBUG_ONLY(128) >>>> Instead of NOT_DEBUG can you use PRODUCT_ONLY? >>>> Instead of DEBUG_ONLY can you used NOT_PRODUCT? >>>> >>>> That uses the smaller value for only one build config (PRODUCT). >>>> >>>> line 3785: #define EXIT_TIMEOUT NOT_DEBUG(1000) DEBUG_ONLY(4000) >>>> /*1 sec in product, 4 sec in debug*/ >>>> Instead of NOT_DEBUG can you use PRODUCT_ONLY? >>>> Instead of DEBUG_ONLY can you used NOT_PRODUCT? >>>> Please add spaces between the comment delimiters and the comment >>>> text. >>>> >>>> That uses the smaller timeout for only one build config (PRODUCT). >>>> >>>> line 3836 // Rise the priority... >>>> Typo: 'Rise' -> 'Raise' >>>> >>>> About the general idea of raising the exiting thread's priority, >>>> if the exiting thread is looping in some Win* OS code after this >>>> point, will raising the priority make the machine unusable? >>>> >>>> Dan >>>> >>>> >>>>> >>>>> The fix was tested on all available platforms, with the hotspot >>>>> testset. No failures. >>>>> >>>>> Sincerely yours, >>>>> Ivan >>>>> >>>> >>>> >>>> >>> >> >> > From daniel.daugherty at oracle.com Tue Nov 18 02:01:18 2014 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Mon, 17 Nov 2014 19:01:18 -0700 Subject: RFR 8064694: Kitchensink: WaitForMultipleObjects failed in hotspot\src\os\windows\vm\os_windows.cpp: 3844 In-Reply-To: <546AA5D6.4050203@oracle.com> References: <5465F711.9090605@oracle.com> <54668EAF.9070807@oracle.com> <546915DF.7080106@oracle.com> <54699845.5010901@oracle.com> <5469B920.4040300@oracle.com> <546AA5D6.4050203@oracle.com> Message-ID: <546AA86E.3030308@oracle.com> Ivan, Please coordinate with Staffan Larsen about when he is planning to take this week's snapshot of JDK9-hs-rt (RT_Baseline). 
Please push your fix after Staffan's snapshot so we can have a week of soak time for this version of the fix... Dan On 11/17/14 6:50 PM, David Holmes wrote: > Hi Ivan, > > On 17/11/2014 7:00 PM, Ivan Gerasimov wrote: >> Thanks David! >> >> On 17.11.2014 9:40, David Holmes wrote: >>> On 17/11/2014 7:23 AM, Ivan Gerasimov wrote: >>>> Thank you Daniel! >>>> >>>> Please find the updated webrev with your suggestions incorporated >>>> here: >>>> http://cr.openjdk.java.net/~igerasim/8064694/1/webrev/ >>>> >>>> Concerning the thread priority: If the application is of >>>> NORMAL_PRIORITY_CLASS, then setting the thread's priority level to >>>> THREAD_PRIORITY_HIGHEST will result in its priority value to be >>>> only 10 >>>> (of maximum 31). >>>> http://msdn.microsoft.com/en-us/library/windows/desktop/ms685100(v=vs.85).aspx >>>> >>>> >>>> >>>> >>>> And if the process is HIGH_PRIORITY_CLASS, then the tread with the >>>> HIGHEST priority level will have priority value == 15 of 31. >>>> >>>> I believe, it should not be too much, and the machine will not become >>>> busy with only those closing threads. >>>> However, I hope it would be enough to make them complete faster than >>>> other threads of the NORMAL priority level withing the same >>>> application. >>> >>> I don't think this is necessary or desirable. Under normal usage we're >>> giving priority to exiting threads and that may disrupt the usual >>> scheduling patterns that applications see. You may posit that it is >>> "harmless" but we can't say that for sure. Nor can we actually know >>> that this will help with this particular bug. I would not add in this >>> new code. >>> >> >> There are two places where I put adjusting the thread's priority: >> >> 1) We've the array of handles filled up. >> >> If we're found in this code branch, it'll mean that unfortunately we've >> already got broken exit pattern, because the current thread has to do a >> blocking call, having the ownership of a critical section. 
>> The full array of handles means that many threads are exiting at that >> time, thus all the threads that are starting to exit after the current >> one will block at the attempt to grab ownership of the critical section. >> >> Raising the priority of one thread that had already reached >> _endthreadex(), seems appropriate to me in such a situation, because it >> helps shorten the period of time when the threads remain blocked. >> >> Choosing the oldest exiting thread ensures that the period of time when >> the priority of one thread is higher is the smallest possible. >> >> 2) The process exit branch. >> >> That's the main part of the fix -- here we make the process to wait for >> all the threads having called _endthreadex() to complete, at the same >> time preventing any other threads from starting the exiting procedure. >> The execution flow is already changed here (I don't want to say >> disrupted, because it was meant to fix the issue). >> >> All running threads are about to be terminated soon by ending the >> process, so raising the priority of some of the threads should not have >> any bad impact on the program flow. >> Instead, it may make the time the process has to wait before calling >> exit() shorter. >> >> >> I can surely remove that playing with the threads' priority, as it's not >> the essential part of the fix. >> However, I think it's a useful hint to the scheduler, which can improve >> things in some situations, and I'm not really sure how it can harm. > > Okay. You've convinced me. I'm okay with the priority changes to try > to minimize the exit time blocking. > > Thanks, > David > >> >> Sincerely yours, >> Ivan >> >> >>> David >>> >>>> Sincerely yours, >>>> Ivan >>>> >>>> >>>> On 15.11.2014 2:22, Daniel D. Daugherty wrote: >>>>> On 11/14/14 5:35 AM, Ivan Gerasimov wrote: >>>>>> Hello! 
>>>>>> >>>>>> The recent fix for JDK-8059533 ((process) Make exiting process wait >>>>>> for exiting threads [win]) caused the warning message to be printed >>>>>> in some test environments: >>>>>> ----------- >>>>>> os_windows.cpp:3844 is in the newly updated >>>>>> os::win32::exit_process_or_thread(Ept what, int exit_code) >>>>>> ----------- >>>>>> >>>>>> This has been observed with debug builds on highly loaded systems. >>>>>> >>>>>> >>>>>> To address the issue it is proposed to do three things: >>>>>> 1) increase the timeout for debug builds, >>>>>> 2) increase the maximum number of the thread handles to be stored, >>>>>> 3) rise the priority of the exiting threads, if we need to wait for >>>>>> them. >>>>>> >>>>>> Would you please help review the fix? >>>>>> >>>>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8064694 >>>>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8064694/0/webrev/ >>>>> >>>>> src/os/windows/vm/os_windows.cpp >>>>> >>>>> line 3784: #define MAX_EXIT_HANDLES NOT_DEBUG(32) DEBUG_ONLY(128) >>>>> Instead of NOT_DEBUG can you use PRODUCT_ONLY? >>>>> Instead of DEBUG_ONLY can you used NOT_PRODUCT? >>>>> >>>>> That uses the smaller value for only one build config (PRODUCT). >>>>> >>>>> line 3785: #define EXIT_TIMEOUT NOT_DEBUG(1000) >>>>> DEBUG_ONLY(4000) >>>>> /*1 sec in product, 4 sec in debug*/ >>>>> Instead of NOT_DEBUG can you use PRODUCT_ONLY? >>>>> Instead of DEBUG_ONLY can you used NOT_PRODUCT? >>>>> Please add spaces between the comment delimiters and the comment >>>>> text. >>>>> >>>>> That uses the smaller timeout for only one build config >>>>> (PRODUCT). >>>>> >>>>> line 3836 // Rise the priority... >>>>> Typo: 'Rise' -> 'Raise' >>>>> >>>>> About the general idea of raising the exiting thread's priority, >>>>> if the exiting thread is looping in some Win* OS code after this >>>>> point, will raising the priority make the machine unusable? 
>>>>> >>>>> Dan >>>>> >>>>> >>>>>> >>>>>> The fix was tested on all available platforms, with the hotspot >>>>>> testset. No failures. >>>>>> >>>>>> Sincerely yours, >>>>>> Ivan >>>>>> >>>>> >>>>> >>>>> >>>> >>> >>> >> From mandy.chung at oracle.com Tue Nov 18 02:02:58 2014 From: mandy.chung at oracle.com (Mandy Chung) Date: Mon, 17 Nov 2014 18:02:58 -0800 Subject: [8u40] Review request 8064667: Provide support to help identify use of endorsed standards and extension mechanism In-Reply-To: <546A28F3.1010802@oracle.com> References: <546A28F3.1010802@oracle.com> Message-ID: <546AA8D2.1050600@oracle.com> Updated webrev: http://cr.openjdk.java.net/~mchung/jdk8u/webrevs/8064667/webrev.01/ This addresses Calvin's comment. It now keeps a list of the jar files shipped with jre/lib/ext and determine if jre/lib/ext has any other non-JDK jar files installed. Mandy On 11/17/2014 8:57 AM, Mandy Chung wrote: > This requests both code review and 8u40 approval for: > https://bugs.openjdk.java.net/browse/JDK-8064667 > > Webrev: > http://cr.openjdk.java.net/~mchung/jdk8u/webrevs/8064667/webrev.00/ > > JEP 220 [1] proposes to remove the endorsed standards override > mechanism and extension mechanism. This patch adds a VM flag in 8u40 > to help identify any existing uses of these mechanisms so that users > can turn on the VM flag to help identify if they depend on the > endorsed standards override mechanism and extension mechanism and can > plan to prepare for the migration to a newer JDK release early on. > When -XX:+CheckEndorsedAndExtDirs is set, the VM will exit if the > system property -Djava.endorsed.dirs or -Djava.ext.dirs is set, or if > ${java.home}/lib/endorsed or ${java.home}/lib/ext exists, or any > system extension directory contains JAR files. > > Thanks > Mandy > [1] http://openjdk.java.net/jeps/220 > > > From daniel.daugherty at oracle.com Tue Nov 18 02:05:30 2014 From: daniel.daugherty at oracle.com (Daniel D. 
Daugherty) Date: Mon, 17 Nov 2014 19:05:30 -0700 Subject: RFR 8064694: Kitchensink: WaitForMultipleObjects failed in hotspot\src\os\windows\vm\os_windows.cpp: 3844 In-Reply-To: <546AA86E.3030308@oracle.com> References: <5465F711.9090605@oracle.com> <54668EAF.9070807@oracle.com> <546915DF.7080106@oracle.com> <54699845.5010901@oracle.com> <5469B920.4040300@oracle.com> <546AA5D6.4050203@oracle.com> <546AA86E.3030308@oracle.com> Message-ID: <546AA96A.1020203@oracle.com> Ivan, I spoke too soon. There's a review comment from Markus G that hasn't been addressed. We need to see if you've convinced Markus in addition to David H. Dan P.S. Look for Markus' reply to David H's e-mail; it not in this fork of the review thread... On 11/17/14 7:01 PM, Daniel D. Daugherty wrote: > Ivan, > > Please coordinate with Staffan Larsen about when he is planning to > take this week's snapshot of JDK9-hs-rt (RT_Baseline). Please push > your fix after Staffan's snapshot so we can have a week of soak > time for this version of the fix... > > Dan > > > On 11/17/14 6:50 PM, David Holmes wrote: >> Hi Ivan, >> >> On 17/11/2014 7:00 PM, Ivan Gerasimov wrote: >>> Thanks David! >>> >>> On 17.11.2014 9:40, David Holmes wrote: >>>> On 17/11/2014 7:23 AM, Ivan Gerasimov wrote: >>>>> Thank you Daniel! >>>>> >>>>> Please find the updated webrev with your suggestions incorporated >>>>> here: >>>>> http://cr.openjdk.java.net/~igerasim/8064694/1/webrev/ >>>>> >>>>> Concerning the thread priority: If the application is of >>>>> NORMAL_PRIORITY_CLASS, then setting the thread's priority level to >>>>> THREAD_PRIORITY_HIGHEST will result in its priority value to be >>>>> only 10 >>>>> (of maximum 31). >>>>> http://msdn.microsoft.com/en-us/library/windows/desktop/ms685100(v=vs.85).aspx >>>>> >>>>> >>>>> >>>>> >>>>> And if the process is HIGH_PRIORITY_CLASS, then the tread with the >>>>> HIGHEST priority level will have priority value == 15 of 31. 
>>>>> >>>>> I believe, it should not be too much, and the machine will not become >>>>> busy with only those closing threads. >>>>> However, I hope it would be enough to make them complete faster than >>>>> other threads of the NORMAL priority level withing the same >>>>> application. >>>> >>>> I don't think this is necessary or desirable. Under normal usage we're >>>> giving priority to exiting threads and that may disrupt the usual >>>> scheduling patterns that applications see. You may posit that it is >>>> "harmless" but we can't say that for sure. Nor can we actually know >>>> that this will help with this particular bug. I would not add in this >>>> new code. >>>> >>> >>> There are two places where I put adjusting the thread's priority: >>> >>> 1) We've the array of handles filled up. >>> >>> If we're found in this code branch, it'll mean that unfortunately we've >>> already got broken exit pattern, because the current thread has to do a >>> blocking call, having the ownership of a critical section. >>> The full array of handles means that many threads are exiting at that >>> time, thus all the threads that are starting to exit after the current >>> one will block at the attempt to grab ownership of the critical >>> section. >>> >>> Raising the priority of one thread that had already reached >>> _endthreadex(), seems appropriate to me in such a situation, because it >>> helps shorten the period of time when the threads remain blocked. >>> >>> Choosing the oldest exiting thread ensures that the period of time when >>> the priority of one thread is higher is the smallest possible. >>> >>> 2) The process exit branch. >>> >>> That's the main part of the fix -- here we make the process to wait for >>> all the threads having called _endthreadex() to complete, at the same >>> time preventing any other threads from starting the exiting procedure. >>> The execution flow is already changed here (I don't want to say >>> disrupted, because it was meant to fix the issue). 
>>> >>> All running threads are about to be terminated soon by ending the >>> process, so raising the priority of some of the threads should not have >>> any bad impact on the program flow. >>> Instead, it may make the time the process has to wait before calling >>> exit() shorter. >>> >>> >>> I can surely remove that playing with the threads' priority, as it's >>> not >>> the essential part of the fix. >>> However, I think it's a useful hint to the scheduler, which can improve >>> things in some situations, and I'm not really sure how it can harm. >> >> Okay. You've convinced me. I'm okay with the priority changes to try >> to minimize the exit time blocking. >> >> Thanks, >> David >> >>> >>> Sincerely yours, >>> Ivan >>> >>> >>>> David >>>> >>>>> Sincerely yours, >>>>> Ivan >>>>> >>>>> >>>>> On 15.11.2014 2:22, Daniel D. Daugherty wrote: >>>>>> On 11/14/14 5:35 AM, Ivan Gerasimov wrote: >>>>>>> Hello! >>>>>>> >>>>>>> The recent fix for JDK-8059533 ((process) Make exiting process wait >>>>>>> for exiting threads [win]) caused the warning message to be printed >>>>>>> in some test environments: >>>>>>> ----------- >>>>>>> os_windows.cpp:3844 is in the newly updated >>>>>>> os::win32::exit_process_or_thread(Ept what, int exit_code) >>>>>>> ----------- >>>>>>> >>>>>>> This has been observed with debug builds on highly loaded systems. >>>>>>> >>>>>>> >>>>>>> To address the issue it is proposed to do three things: >>>>>>> 1) increase the timeout for debug builds, >>>>>>> 2) increase the maximum number of the thread handles to be stored, >>>>>>> 3) rise the priority of the exiting threads, if we need to wait for >>>>>>> them. >>>>>>> >>>>>>> Would you please help review the fix? 
>>>>>>> >>>>>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8064694 >>>>>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8064694/0/webrev/ >>>>>> >>>>>> src/os/windows/vm/os_windows.cpp >>>>>> >>>>>> line 3784: #define MAX_EXIT_HANDLES NOT_DEBUG(32) DEBUG_ONLY(128) >>>>>> Instead of NOT_DEBUG can you use PRODUCT_ONLY? >>>>>> Instead of DEBUG_ONLY can you used NOT_PRODUCT? >>>>>> >>>>>> That uses the smaller value for only one build config (PRODUCT). >>>>>> >>>>>> line 3785: #define EXIT_TIMEOUT NOT_DEBUG(1000) >>>>>> DEBUG_ONLY(4000) >>>>>> /*1 sec in product, 4 sec in debug*/ >>>>>> Instead of NOT_DEBUG can you use PRODUCT_ONLY? >>>>>> Instead of DEBUG_ONLY can you used NOT_PRODUCT? >>>>>> Please add spaces between the comment delimiters and the comment >>>>>> text. >>>>>> >>>>>> That uses the smaller timeout for only one build config >>>>>> (PRODUCT). >>>>>> >>>>>> line 3836 // Rise the priority... >>>>>> Typo: 'Rise' -> 'Raise' >>>>>> >>>>>> About the general idea of raising the exiting thread's priority, >>>>>> if the exiting thread is looping in some Win* OS code after this >>>>>> point, will raising the priority make the machine unusable? >>>>>> >>>>>> Dan >>>>>> >>>>>> >>>>>>> >>>>>>> The fix was tested on all available platforms, with the hotspot >>>>>>> testset. No failures. 
>>>>>>> >>>>>>> Sincerely yours, >>>>>>> Ivan >>>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> >>> > > > > From ioi.lam at oracle.com Tue Nov 18 02:13:48 2014 From: ioi.lam at oracle.com (Ioi Lam) Date: Tue, 18 Nov 2014 10:13:48 +0800 Subject: RFR (XS) 8064701: Some CDS optimizations should be disabled if bootclasspath is modified by JVMTI Message-ID: <546AAB5C.1070009@oracle.com> Please review a very small fix: http://cr.openjdk.java.net/~iklam/8064701-append-boot-v1/hotspot/ Bug: Some CDS optimizations should be disabled if bootclasspath is modified by JVMTI https://bugs.openjdk.java.net/browse/JDK-8064701 Summary of fix: This change adds an API so that the class loader is notified when JVMTI modifies the boot classpath. Further CDS optimizations can use this API to disable optimizations that may be invalidated by boot classpath modifications. Also added white box testing API for invoking JVMTI boot/system classpath modifications for further CDS testing needs. Tests: JPRT Thanks - Ioi From jiangli.zhou at oracle.com Tue Nov 18 02:54:41 2014 From: jiangli.zhou at oracle.com (Jiangli Zhou) Date: Mon, 17 Nov 2014 18:54:41 -0800 Subject: RFR (XS) 8064701: Some CDS optimizations should be disabled if bootclasspath is modified by JVMTI In-Reply-To: <546AAB5C.1070009@oracle.com> References: <546AAB5C.1070009@oracle.com> Message-ID: <546AB4F1.4010104@oracle.com> Hi Ioi, Looks good. Thanks, Jiangli On 11/17/2014 06:13 PM, Ioi Lam wrote: > Please review a very small fix: > > http://cr.openjdk.java.net/~iklam/8064701-append-boot-v1/hotspot/ > > Bug: Some CDS optimizations should be disabled if bootclasspath is > modified by JVMTI > https://bugs.openjdk.java.net/browse/JDK-8064701 > > > Summary of fix: > > This change adds an API so that the class loader is notified when > JVMTI modifies > the boot classpath. Further CDS optimizations can use this API to > disable > optimizations that may be invalidated by boot classpath > modifications. 
> > Also added white box testing API for invoking JVMTI boot/system classpath > modifications for further CDS testing needs. > > Tests: > > JPRT > > Thanks > - Ioi From yumin.qi at oracle.com Tue Nov 18 03:59:58 2014 From: yumin.qi at oracle.com (Yumin Qi) Date: Mon, 17 Nov 2014 19:59:58 -0800 Subject: RFR (XS) 8064701: Some CDS optimizations should be disabled if bootclasspath is modified by JVMTI In-Reply-To: <546AAB5C.1070009@oracle.com> References: <546AAB5C.1070009@oracle.com> Message-ID: <546AC43E.8070800@oracle.com> Looks good. Not "R"eviewer. Thanks Yumin On 11/17/2014 6:13 PM, Ioi Lam wrote: > Please review a very small fix: > > http://cr.openjdk.java.net/~iklam/8064701-append-boot-v1/hotspot/ > > Bug: Some CDS optimizations should be disabled if bootclasspath is > modified by JVMTI > https://bugs.openjdk.java.net/browse/JDK-8064701 > > > Summary of fix: > > This change adds an API so that the class loader is notified when > JVMTI modifies > the boot classpath. Further CDS optimizations can use this API to > disable > optimizations that may be invalidated by boot classpath > modifications. > > Also added white box testing API for invoking JVMTI boot/system classpath > modifications for further CDS testing needs. 
> > Tests: > > JPRT > > Thanks > - Ioi From david.holmes at oracle.com Tue Nov 18 04:04:44 2014 From: david.holmes at oracle.com (David Holmes) Date: Tue, 18 Nov 2014 14:04:44 +1000 Subject: RFR(XS): 8064471: Port 8013895: G1: G1SummarizeRSetStats output on Linux needs improvement to AIX In-Reply-To: <5469994D.3070208@oracle.com> References: <44EB1DA1040EEB44B7F343A782AD30789FEBA3E3@DEWDFEMB11B.global.corp.sap> <5463149A.6020506@oracle.com> <54637A9A.9040108@sap.com> <546470BD.9050303@oracle.com> <5464DED4.9040909@sap.com> <5469994D.3070208@oracle.com> Message-ID: <546AC55C.4090201@oracle.com> Gunter, On 17/11/2014 4:44 PM, David Holmes wrote: > On 14/11/2014 2:39 AM, Haug, Gunter wrote: >> >> On 13.11.2014 09:50, David Holmes wrote: >>> On 13/11/2014 1:19 AM, Haug, Gunter wrote: >>>> >>>> On 12.11.2014 09:04, David Holmes wrote: >>>>> Hi Gunter, >>>>> >>>>> On 11/11/2014 11:23 PM, Haug, Gunter wrote: >>>>>> Hi All, >>>>>> >>>>>> The change '8013895: (G1: G1SummarizeRSetStats output on Linux needs >>>>>> improvement)' makes use of getrusage() to retrieve accurate >>>>>> per-thread data on resource usage. We can use exactly the same code >>>>>> on AIX to achieve this. >>>>>> >>>>>> Please review the following change: >>>>>> >>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8064471-aixTime/webrev.00/ >>>>>> https://bugs.openjdk.java.net/browse/JDK-8064471 >>>>> >>>>> I have a couple of comments on this code which presumably also apply >>>>> to the orginal :( >>>> Yes, they apply to the original as well, see below. >>>>> >>>>> First this comment is no longer applicable (actually it was never >>>>> applicable to AIX!): >>>>> >>>>> // For now, we say that linux does not support vtime. I have no idea >>>>> // whether it can actually be made to (DLD, 9/13/05). >>>>> >>>> You're right. I will remove it. 
>>>>> Second this calculation seems wrong: >>>>> >>>>> return (double) (usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) + >>>>> (double) (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / (1000 * >>>>> 1000); >>>>> >>>>> To me this performs integer division (ie truncation_) then converts >>>>> the resulting integer to a double. I would expect to see additional >>>>> parentheses (even if not needed, for clarity): >>>>> >>>>> return (double) (usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) + >>>>> ((double) (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec)) / (1000 * >>>>> 1000); >>>>> >>>>> or more simply divide by a floating-point value: >>>>> >>>>> return (double) (usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) + >>>>> (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / (1000.0 * 1000); >>>>> >>>>> and you don't need two double casts regardless as the expression will >>>>> be of type double as soon as there is one operand of type double. So >>>>> that should reduce to: >>>>> >>>>> return usage.ru_utime.tv_sec + usage.ru_stime.tv_sec + >>>>> (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / (1000.0 * 1000); >>>>> >>>> OK. Do you want that we also change the Linux version like you >>>> proposed? >>> >>> I'll leave it up to you. If you leave this as AIX only then it tests >>> the new process :) There can be a follow up cleanup bug for linux. >> >> Hi David, >> >> I think it's not worth the effort to make two separate changes on linux >> and aix, so I fixed linux as well. Please find the new webrev below. >> There will probably be more opportunities to test the new process in the >> future. >> >> http://cr.openjdk.java.net/~simonis/webrevs/8064471.v2/ >> >> >> >> Now we need a sponsor, as it is not aix only anymore. > > I guess that will have to be me. :) I will try to look at this again > tomorrow. The original code was in fact correct - the double cast binds to the summation before the division is applied. 
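The precedence point can be checked with a minimal sketch. It assumes a hypothetical TimeParts struct standing in for the tv_sec/tv_usec fields of the getrusage() result; the struct and function names are illustrative only and are not taken from the HotSpot sources.

```cpp
// Hypothetical stand-in for the two struct timeval fields the expression reads.
struct TimeParts {
  long tv_sec;
  long tv_usec;
};

// Same shape as the expression quoted in the thread: the (double) cast binds
// to the parenthesized sum first, so the division is a floating-point one.
double elapsed_as_written(TimeParts user, TimeParts sys) {
  return (double) (user.tv_sec + sys.tv_sec) +
         (double) (user.tv_usec + sys.tv_usec) / (1000 * 1000);
}

// What the result would be if the division were done on integers first,
// which is the behavior the earlier review comment was worried about.
double elapsed_if_truncating(TimeParts user, TimeParts sys) {
  return (double) (user.tv_sec + sys.tv_sec) +
         (double) ((user.tv_usec + sys.tv_usec) / (1000 * 1000));
}
```

With 1 s + 250000 us of user time and 2 s + 500000 us of system time, the first function keeps the fractional 0.75 s while the second drops it, which is why the expression as written needs no extra parentheses.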
Given that and the fact the linux code doesn't contain the incorrect comment, I don't see any need to modify the linux code. You can simply push the AIX change by itself. Sorry for messing you around on this. David > David > >> Thanks, >> Gunter >> >> >>> >>> Thanks, >>> David >>> >>>> Thanks, >>>> Gunter >>>> >>>>> Cheers, >>>>> David >>>>> >>>>>> Thanks, >>>>>> Gunter >>>>>> >>>> >> From ivan.gerasimov at oracle.com Tue Nov 18 07:29:30 2014 From: ivan.gerasimov at oracle.com (Ivan Gerasimov) Date: Tue, 18 Nov 2014 10:29:30 +0300 Subject: RFR 8064694: Kitchensink: WaitForMultipleObjects failed in hotspot\src\os\windows\vm\os_windows.cpp: 3844 In-Reply-To: References: <5465F711.9090605@oracle.com> <54668EAF.9070807@oracle.com> <546915DF.7080106@oracle.com> <54699845.5010901@oracle.com> Message-ID: <546AF55A.8090203@oracle.com> Hi Markus! The priority of the exiting thread will be raised for quite a short period of time -- right before the thread finishes exiting. There are two places where the priority is adjusted. Under normal conditions we should never see the first place hit. However, if we do, this means we have a huge number of threads. Raising the priority of one of them is a hint about which thread we want the scheduler to focus on. The second place is a bit different. We have several threads running immediately before ending the process. Some of them are at the exiting path and block exiting of the whole process. Raising the priority of those threads is a way to say we're not interested in all the other threads, as they are going to be terminated anyway. I just noticed that in second scenario it may be appropriate to set the priority of the current thread to the same level as for the exiting threads. This way it'll be given a fair chance to continue if the timeout expires. I also think it should be enough to set the priority level to THREAD_PRIORITY_ABOVE_NORMAL instead of THREAD_PRIORITY_HIGHEST. 
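The priority values quoted in this thread (10 and 15 of 31) can be reproduced with a small portable sketch. It assumes the base values from the MSDN "Scheduling Priorities" table (8 for NORMAL_PRIORITY_CLASS, 13 for HIGH_PRIORITY_CLASS) and the usual offsets of the thread priority levels; the enum and function names are hypothetical, no Win32 call is made, and the special IDLE and TIME_CRITICAL levels and dynamic boosts are not modeled.

```cpp
// Base priority of a THREAD_PRIORITY_NORMAL thread in each process class
// (assumed from the MSDN scheduling table, not read from the OS).
enum PriorityClass { NORMAL_CLASS = 8, HIGH_CLASS = 13 };

// Offsets matching the values of the Win32 constants THREAD_PRIORITY_NORMAL,
// THREAD_PRIORITY_ABOVE_NORMAL and THREAD_PRIORITY_HIGHEST.
enum PriorityLevel { LEVEL_NORMAL = 0, LEVEL_ABOVE_NORMAL = 1, LEVEL_HIGHEST = 2 };

// Resulting base priority for the combinations discussed in the thread.
int base_priority(PriorityClass cls, PriorityLevel level) {
  return static_cast<int>(cls) + static_cast<int>(level);
}
```

Under these assumptions, HIGHEST in a NORMAL-class process gives 10, HIGHEST in a HIGH-class process gives 15, and ABOVE_NORMAL in a NORMAL-class process gives 9, one step above the default of 8.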
It will give just +1 to the priority value -- should be enough for the hint. Would you please take a look at the updated webrev: http://cr.openjdk.java.net/~igerasim/8064694/2/webrev/ Sincerely yours, Ivan On 17.11.2014 11:33, Markus Gr?nlund wrote: > I agree with David. > > The side effects will be unknown and very hard to debug. > > Is there another way to accomplish the results without manipulating base services? > > Thanks > Markus > > -----Original Message----- > From: David Holmes > Sent: den 17 november 2014 07:40 > To: Ivan Gerasimov; Daniel Daugherty > Cc: serviceability-dev at openjdk.java.net; hotspot-runtime-dev > Subject: Re: RFR 8064694: Kitchensink: WaitForMultipleObjects failed in hotspot\src\os\windows\vm\os_windows.cpp: 3844 > > On 17/11/2014 7:23 AM, Ivan Gerasimov wrote: >> Thank you Daniel! >> >> Please find the updated webrev with your suggestions incorporated here: >> http://cr.openjdk.java.net/~igerasim/8064694/1/webrev/ >> >> Concerning the thread priority: If the application is of >> NORMAL_PRIORITY_CLASS, then setting the thread's priority level to >> THREAD_PRIORITY_HIGHEST will result in its priority value to be only >> 10 (of maximum 31). >> http://msdn.microsoft.com/en-us/library/windows/desktop/ms685100(v=vs. >> 85).aspx >> >> >> And if the process is HIGH_PRIORITY_CLASS, then the tread with the >> HIGHEST priority level will have priority value == 15 of 31. >> >> I believe, it should not be too much, and the machine will not become >> busy with only those closing threads. >> However, I hope it would be enough to make them complete faster than >> other threads of the NORMAL priority level withing the same application. > I don't think this is necessary or desirable. Under normal usage we're giving priority to exiting threads and that may disrupt the usual scheduling patterns that applications see. You may posit that it is "harmless" but we can't say that for sure. Nor can we actually know that this will help with this particular bug. 
I would not add in this new code. > > David > >> Sincerely yours, >> Ivan >> >> >> On 15.11.2014 2:22, Daniel D. Daugherty wrote: >>> On 11/14/14 5:35 AM, Ivan Gerasimov wrote: >>>> Hello! >>>> >>>> The recent fix for JDK-8059533 ((process) Make exiting process wait >>>> for exiting threads [win]) caused the warning message to be printed >>>> in some test environments: >>>> ----------- >>>> os_windows.cpp:3844 is in the newly updated >>>> os::win32::exit_process_or_thread(Ept what, int exit_code) >>>> ----------- >>>> >>>> This has been observed with debug builds on highly loaded systems. >>>> >>>> >>>> To address the issue it is proposed to do three things: >>>> 1) increase the timeout for debug builds, >>>> 2) increase the maximum number of the thread handles to be stored, >>>> 3) rise the priority of the exiting threads, if we need to wait for >>>> them. >>>> >>>> Would you please help review the fix? >>>> >>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8064694 >>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8064694/0/webrev/ >>> src/os/windows/vm/os_windows.cpp >>> >>> line 3784: #define MAX_EXIT_HANDLES NOT_DEBUG(32) DEBUG_ONLY(128) >>> Instead of NOT_DEBUG can you use PRODUCT_ONLY? >>> Instead of DEBUG_ONLY can you used NOT_PRODUCT? >>> >>> That uses the smaller value for only one build config (PRODUCT). >>> >>> line 3785: #define EXIT_TIMEOUT NOT_DEBUG(1000) DEBUG_ONLY(4000) >>> /*1 sec in product, 4 sec in debug*/ >>> Instead of NOT_DEBUG can you use PRODUCT_ONLY? >>> Instead of DEBUG_ONLY can you used NOT_PRODUCT? >>> Please add spaces between the comment delimiters and the comment >>> text. >>> >>> That uses the smaller timeout for only one build config (PRODUCT). >>> >>> line 3836 // Rise the priority... >>> Typo: 'Rise' -> 'Raise' >>> >>> About the general idea of raising the exiting thread's priority, >>> if the exiting thread is looping in some Win* OS code after this >>> point, will raising the priority make the machine unusable? 
>>> >>> Dan >>> >>> >>>> The fix was tested on all available platforms, with the hotspot >>>> testset. No failures. >>>> >>>> Sincerely yours, >>>> Ivan >>>> >>> >>> > From volker.simonis at gmail.com Tue Nov 18 09:50:37 2014 From: volker.simonis at gmail.com (Volker Simonis) Date: Tue, 18 Nov 2014 10:50:37 +0100 Subject: RFR(XS): 8064471: Port 8013895: G1: G1SummarizeRSetStats output on Linux needs improvement to AIX In-Reply-To: <546AC55C.4090201@oracle.com> References: <44EB1DA1040EEB44B7F343A782AD30789FEBA3E3@DEWDFEMB11B.global.corp.sap> <5463149A.6020506@oracle.com> <54637A9A.9040108@sap.com> <546470BD.9050303@oracle.com> <5464DED4.9040909@sap.com> <5469994D.3070208@oracle.com> <546AC55C.4090201@oracle.com> Message-ID: OK, thanks. Just pushed it to hotspot-rt and it worked! Regards, Volker On Tue, Nov 18, 2014 at 5:04 AM, David Holmes wrote: > Gunter, > > > On 17/11/2014 4:44 PM, David Holmes wrote: >> >> On 14/11/2014 2:39 AM, Haug, Gunter wrote: >>> >>> >>> On 13.11.2014 09:50, David Holmes wrote: >>>> >>>> On 13/11/2014 1:19 AM, Haug, Gunter wrote: >>>>> >>>>> >>>>> On 12.11.2014 09:04, David Holmes wrote: >>>>>> >>>>>> Hi Gunter, >>>>>> >>>>>> On 11/11/2014 11:23 PM, Haug, Gunter wrote: >>>>>>> >>>>>>> Hi All, >>>>>>> >>>>>>> The change '8013895: (G1: G1SummarizeRSetStats output on Linux needs >>>>>>> improvement)' makes use of getrusage() to retrieve accurate >>>>>>> per-thread data on resource usage. We can use exactly the same code >>>>>>> on AIX to achieve this. >>>>>>> >>>>>>> Please review the following change: >>>>>>> >>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8064471-aixTime/webrev.00/ >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8064471 >>>>>> >>>>>> >>>>>> I have a couple of comments on this code which presumably also apply >>>>>> to the orginal :( >>>>> >>>>> Yes, they apply to the original as well, see below. 
>>>>>> >>>>>> >>>>>> First this comment is no longer applicable (actually it was never >>>>>> applicable to AIX!): >>>>>> >>>>>> // For now, we say that linux does not support vtime. I have no idea >>>>>> // whether it can actually be made to (DLD, 9/13/05). >>>>>> >>>>> You're right. I will remove it. >>>>>> >>>>>> Second this calculation seems wrong: >>>>>> >>>>>> return (double) (usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) + >>>>>> (double) (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / (1000 * >>>>>> 1000); >>>>>> >>>>>> To me this performs integer division (ie truncation_) then converts >>>>>> the resulting integer to a double. I would expect to see additional >>>>>> parentheses (even if not needed, for clarity): >>>>>> >>>>>> return (double) (usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) + >>>>>> ((double) (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec)) / (1000 * >>>>>> 1000); >>>>>> >>>>>> or more simply divide by a floating-point value: >>>>>> >>>>>> return (double) (usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) + >>>>>> (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / (1000.0 * 1000); >>>>>> >>>>>> and you don't need two double casts regardless as the expression will >>>>>> be of type double as soon as there is one operand of type double. So >>>>>> that should reduce to: >>>>>> >>>>>> return usage.ru_utime.tv_sec + usage.ru_stime.tv_sec + >>>>>> (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / (1000.0 * 1000); >>>>>> >>>>> OK. Do you want that we also change the Linux version like you >>>>> proposed? >>>> >>>> >>>> I'll leave it up to you. If you leave this as AIX only then it tests >>>> the new process :) There can be a follow up cleanup bug for linux. >>> >>> >>> Hi David, >>> >>> I think it's not worth the effort to make two separate changes on linux >>> and aix, so I fixed linux as well. Please find the new webrev below. >>> There will probably be more opportunities to test the new process in the >>> future. 
>>> >>> http://cr.openjdk.java.net/~simonis/webrevs/8064471.v2/ >>> >>> >>> >>> Now we need a sponsor, as it is not aix only anymore. >> >> >> I guess that will have to be me. :) I will try to look at this again >> tomorrow. > > > The original code was in fact correct - the double cast binds to the > summation before the division is applied. Given that and the fact the linux > code doesn't contain the incorrect comment, I don't see any need to modify > the linux code. You can simply push the AIX change by itself. > > Sorry for messing you around on this. > > David > > >> David >> >>> Thanks, >>> Gunter >>> >>> >>>> >>>> Thanks, >>>> David >>>> >>>>> Thanks, >>>>> Gunter >>>>> >>>>>> Cheers, >>>>>> David >>>>>> >>>>>>> Thanks, >>>>>>> Gunter >>>>>>> >>>>> >>> > From markus.gronlund at oracle.com Tue Nov 18 13:02:45 2014 From: markus.gronlund at oracle.com (Markus Grönlund) Date: Tue, 18 Nov 2014 05:02:45 -0800 (PST) Subject: RFR 8064694: Kitchensink: WaitForMultipleObjects failed in hotspot\src\os\windows\vm\os_windows.cpp: 3844 In-Reply-To: <546AF55A.8090203@oracle.com> References: <5465F711.9090605@oracle.com> <54668EAF.9070807@oracle.com> <546915DF.7080106@oracle.com> <54699845.5010901@oracle.com> <546AF55A.8090203@oracle.com> Message-ID: <31e5d701-c75f-478b-b4a1-3585c40ba274@default> Hi Ivan, I don't want to block you from getting this in - I need to get the full story behind all these changes (backtracking now). If I find something that I think we should revisit, we can always do that later. So please go ahead. Thanks Markus PS. I have some concerns (but will need to get back to you on that after tracing down the exact details). Do you have a particular test case that you have been working on for these changes?
-----Original Message----- From: Ivan Gerasimov Sent: den 18 november 2014 08:30 To: Markus Grönlund; David Holmes; Daniel Daugherty Cc: serviceability-dev at openjdk.java.net; hotspot-runtime-dev Subject: Re: RFR 8064694: Kitchensink: WaitForMultipleObjects failed in hotspot\src\os\windows\vm\os_windows.cpp: 3844 Hi Markus! The priority of the exiting thread will be raised for quite a short period of time -- right before the thread finishes exiting. There are two places where the priority is adjusted. Under normal conditions we should never see the first place hit. However, if we do, this means we have a huge number of threads. Raising the priority of one of them is a hint about which thread we want the scheduler to focus on. The second place is a bit different. We have several threads running immediately before ending the process. Some of them are at the exiting path and block exiting of the whole process. Raising the priority of those threads is a way to say we're not interested in all the other threads, as they are going to be terminated anyway. I just noticed that in the second scenario it may be appropriate to set the priority of the current thread to the same level as for the exiting threads. This way it'll be given a fair chance to continue if the timeout expires. I also think it should be enough to set the priority level to THREAD_PRIORITY_ABOVE_NORMAL instead of THREAD_PRIORITY_HIGHEST. It will give just +1 to the priority value -- should be enough for the hint. Would you please take a look at the updated webrev: http://cr.openjdk.java.net/~igerasim/8064694/2/webrev/ Sincerely yours, Ivan On 17.11.2014 11:33, Markus Grönlund wrote: > I agree with David. > > The side effects will be unknown and very hard to debug. > > Is there another way to accomplish the results without manipulating base services?
> > Thanks > Markus > > -----Original Message----- > From: David Holmes > Sent: den 17 november 2014 07:40 > To: Ivan Gerasimov; Daniel Daugherty > Cc: serviceability-dev at openjdk.java.net; hotspot-runtime-dev > Subject: Re: RFR 8064694: Kitchensink: WaitForMultipleObjects failed > in hotspot\src\os\windows\vm\os_windows.cpp: 3844 > > On 17/11/2014 7:23 AM, Ivan Gerasimov wrote: >> Thank you Daniel! >> >> Please find the updated webrev with your suggestions incorporated here: >> http://cr.openjdk.java.net/~igerasim/8064694/1/webrev/ >> >> Concerning the thread priority: If the application is of >> NORMAL_PRIORITY_CLASS, then setting the thread's priority level to >> THREAD_PRIORITY_HIGHEST will result in its priority value being only >> 10 (of maximum 31). >> http://msdn.microsoft.com/en-us/library/windows/desktop/ms685100(v=vs.85).aspx >> >> >> And if the process is HIGH_PRIORITY_CLASS, then the thread with the >> HIGHEST priority level will have priority value == 15 of 31. >> >> I believe it should not be too much, and the machine will not become >> busy with only those closing threads. >> However, I hope it would be enough to make them complete faster than >> other threads of the NORMAL priority level within the same application. > I don't think this is necessary or desirable. Under normal usage we're giving priority to exiting threads and that may disrupt the usual scheduling patterns that applications see. You may posit that it is "harmless" but we can't say that for sure. Nor can we actually know that this will help with this particular bug. I would not add in this new code. > > David > >> Sincerely yours, >> Ivan >> >> >> On 15.11.2014 2:22, Daniel D. Daugherty wrote: >>> On 11/14/14 5:35 AM, Ivan Gerasimov wrote: >>>> Hello!
>>>> >>>> The recent fix for JDK-8059533 ((process) Make exiting process wait >>>> for exiting threads [win]) caused the warning message to be printed >>>> in some test environments: >>>> ----------- >>>> os_windows.cpp:3844 is in the newly updated >>>> os::win32::exit_process_or_thread(Ept what, int exit_code) >>>> ----------- >>>> >>>> This has been observed with debug builds on highly loaded systems. >>>> >>>> >>>> To address the issue it is proposed to do three things: >>>> 1) increase the timeout for debug builds, >>>> 2) increase the maximum number of the thread handles to be stored, >>>> 3) rise the priority of the exiting threads, if we need to wait for >>>> them. >>>> >>>> Would you please help review the fix? >>>> >>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8064694 >>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8064694/0/webrev/ >>> src/os/windows/vm/os_windows.cpp >>> >>> line 3784: #define MAX_EXIT_HANDLES NOT_DEBUG(32) DEBUG_ONLY(128) >>> Instead of NOT_DEBUG can you use PRODUCT_ONLY? >>> Instead of DEBUG_ONLY can you used NOT_PRODUCT? >>> >>> That uses the smaller value for only one build config (PRODUCT). >>> >>> line 3785: #define EXIT_TIMEOUT NOT_DEBUG(1000) DEBUG_ONLY(4000) >>> /*1 sec in product, 4 sec in debug*/ >>> Instead of NOT_DEBUG can you use PRODUCT_ONLY? >>> Instead of DEBUG_ONLY can you used NOT_PRODUCT? >>> Please add spaces between the comment delimiters and the >>> comment text. >>> >>> That uses the smaller timeout for only one build config (PRODUCT). >>> >>> line 3836 // Rise the priority... >>> Typo: 'Rise' -> 'Raise' >>> >>> About the general idea of raising the exiting thread's priority, >>> if the exiting thread is looping in some Win* OS code after this >>> point, will raising the priority make the machine unusable? >>> >>> Dan >>> >>> >>>> The fix was tested on all available platforms, with the hotspot >>>> testset. No failures. 
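The difference between the two macro pairs in the comments above can be sketched outside HotSpot. The definitions below are simplified stand-ins, under the assumption that `PRODUCT` is defined only in product builds and `ASSERT` only in debug builds, so a build that defines neither gets the debug-style value only with the suggested form. The `_BY_DEBUG` and `_BY_PRODUCT` names and the two helper functions are hypothetical.

```c
/* Simplified stand-ins for the build macros (assumption: PRODUCT is
 * defined only in product builds, ASSERT only in debug builds). */
#ifdef PRODUCT
#define PRODUCT_ONLY(code) code
#define NOT_PRODUCT(code)
#else
#define PRODUCT_ONLY(code)
#define NOT_PRODUCT(code) code
#endif

#ifdef ASSERT
#define DEBUG_ONLY(code) code
#define NOT_DEBUG(code)
#else
#define DEBUG_ONLY(code)
#define NOT_DEBUG(code) code
#endif

/* Form in webrev.0: 128 only when ASSERT is defined. */
#define MAX_EXIT_HANDLES_BY_DEBUG   NOT_DEBUG(32) DEBUG_ONLY(128)
/* Suggested form: 32 only when PRODUCT is defined. */
#define MAX_EXIT_HANDLES_BY_PRODUCT PRODUCT_ONLY(32) NOT_PRODUCT(128)

/* Hypothetical helpers that expose the expanded values. */
static int max_exit_handles_by_debug(void)   { return MAX_EXIT_HANDLES_BY_DEBUG; }
static int max_exit_handles_by_product(void) { return MAX_EXIT_HANDLES_BY_PRODUCT; }
```

Compiled with neither `PRODUCT` nor `ASSERT` defined, the first helper yields 32 and the second 128, which is the "only one build config" point made in the review.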
>>>> >>>> Sincerely yours, >>>> Ivan >>>> >>> >>> > From daniel.daugherty at oracle.com Tue Nov 18 15:27:36 2014 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Tue, 18 Nov 2014 08:27:36 -0700 Subject: RFR 8064694: Kitchensink: WaitForMultipleObjects failed in hotspot\src\os\windows\vm\os_windows.cpp: 3844 In-Reply-To: <546AF55A.8090203@oracle.com> References: <5465F711.9090605@oracle.com> <54668EAF.9070807@oracle.com> <546915DF.7080106@oracle.com> <54699845.5010901@oracle.com> <546AF55A.8090203@oracle.com> Message-ID: <546B6568.7040701@oracle.com> > http://cr.openjdk.java.net/~igerasim/8064694/2/webrev/ src/os/windows/vm/os_windows.cpp No comments. Thumbs up. Dan On 11/18/14 12:29 AM, Ivan Gerasimov wrote: > Hi Markus! > > The priority of the exiting thread will be raised for quite a short > period of time -- right before the thread finishes exiting. > > There are two places where the priority is adjusted. > > Under normal conditions we should never see the first place hit. > However, if we do, this means we have a huge number of threads. > Raising the priority of one of them is a hint about which thread we > want the scheduler to focus on. > > The second place is a bit different. > We have several threads running immediately before ending the process. > Some of them are at the exiting path and block exiting of the whole > process. > Raising the priority of those threads is a way to say we're not > interested in all the other threads, as they are going to be > terminated anyway. > > I just noticed that in the second scenario it may be appropriate to set > the priority of the current thread to the same level as for the > exiting threads. > This way it'll be given a fair chance to continue if the timeout expires. > > I also think it should be enough to set the priority level to > THREAD_PRIORITY_ABOVE_NORMAL instead of THREAD_PRIORITY_HIGHEST. > It will give just +1 to the priority value -- should be enough for the > hint.
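The priority values mentioned in this thread (10 and 15 of 31, and "+1" for ABOVE_NORMAL) follow from simple addition on the Windows scheduling-priority table. A rough sketch of that arithmetic, with constants redefined locally so it does not need `<windows.h>`; the names are hypothetical and the class base values 8 and 13 are taken from that table:

```c
/* Sketch only: local stand-ins for the Windows constants. Base values are
 * the priority of a THREAD_PRIORITY_NORMAL thread in each priority class. */
enum { NORMAL_CLASS_BASE = 8, HIGH_CLASS_BASE = 13 };
enum { LEVEL_NORMAL = 0, LEVEL_ABOVE_NORMAL = 1, LEVEL_HIGHEST = 2 };

/* Valid only for levels between LOWEST (-2) and HIGHEST (+2); the IDLE and
 * TIME_CRITICAL levels map to fixed values and are not modeled here. */
static int base_priority(int class_base, int level) {
    return class_base + level;
}
```

Under these assumptions, ABOVE_NORMAL in a normal-class process gives 9, one step above the default 8.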
> > Would you please take a look at the updated webrev: > http://cr.openjdk.java.net/~igerasim/8064694/2/webrev/ > > Sincerely yours, > Ivan > > > On 17.11.2014 11:33, Markus Grönlund wrote: >> I agree with David. >> >> The side effects will be unknown and very hard to debug. >> >> Is there another way to accomplish the results without manipulating >> base services? >> >> Thanks >> Markus >> >> -----Original Message----- >> From: David Holmes >> Sent: den 17 november 2014 07:40 >> To: Ivan Gerasimov; Daniel Daugherty >> Cc: serviceability-dev at openjdk.java.net; hotspot-runtime-dev >> Subject: Re: RFR 8064694: Kitchensink: WaitForMultipleObjects failed >> in hotspot\src\os\windows\vm\os_windows.cpp: 3844 >> >> On 17/11/2014 7:23 AM, Ivan Gerasimov wrote: >>> Thank you Daniel! >>> >>> Please find the updated webrev with your suggestions incorporated here: >>> http://cr.openjdk.java.net/~igerasim/8064694/1/webrev/ >>> >>> Concerning the thread priority: If the application is of >>> NORMAL_PRIORITY_CLASS, then setting the thread's priority level to >>> THREAD_PRIORITY_HIGHEST will result in its priority value being only >>> 10 (of maximum 31). >>> http://msdn.microsoft.com/en-us/library/windows/desktop/ms685100(v=vs.85).aspx >>> >>> >>> And if the process is HIGH_PRIORITY_CLASS, then the thread with the >>> HIGHEST priority level will have priority value == 15 of 31. >>> >>> I believe it should not be too much, and the machine will not become >>> busy with only those closing threads. >>> However, I hope it would be enough to make them complete faster than >>> other threads of the NORMAL priority level within the same >>> application. >> I don't think this is necessary or desirable. Under normal usage >> we're giving priority to exiting threads and that may disrupt the >> usual scheduling patterns that applications see. You may posit that >> it is "harmless" but we can't say that for sure. Nor can we actually >> know that this will help with this particular bug.
I would not add in >> this new code. >> >> David >> >>> Sincerely yours, >>> Ivan >>> >>> >>> On 15.11.2014 2:22, Daniel D. Daugherty wrote: >>>> On 11/14/14 5:35 AM, Ivan Gerasimov wrote: >>>>> Hello! >>>>> >>>>> The recent fix for JDK-8059533 ((process) Make exiting process wait >>>>> for exiting threads [win]) caused the warning message to be printed >>>>> in some test environments: >>>>> ----------- >>>>> os_windows.cpp:3844 is in the newly updated >>>>> os::win32::exit_process_or_thread(Ept what, int exit_code) >>>>> ----------- >>>>> >>>>> This has been observed with debug builds on highly loaded systems. >>>>> >>>>> >>>>> To address the issue it is proposed to do three things: >>>>> 1) increase the timeout for debug builds, >>>>> 2) increase the maximum number of the thread handles to be stored, >>>>> 3) rise the priority of the exiting threads, if we need to wait for >>>>> them. >>>>> >>>>> Would you please help review the fix? >>>>> >>>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8064694 >>>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8064694/0/webrev/ >>>> src/os/windows/vm/os_windows.cpp >>>> >>>> line 3784: #define MAX_EXIT_HANDLES NOT_DEBUG(32) DEBUG_ONLY(128) >>>> Instead of NOT_DEBUG can you use PRODUCT_ONLY? >>>> Instead of DEBUG_ONLY can you used NOT_PRODUCT? >>>> >>>> That uses the smaller value for only one build config (PRODUCT). >>>> >>>> line 3785: #define EXIT_TIMEOUT NOT_DEBUG(1000) >>>> DEBUG_ONLY(4000) >>>> /*1 sec in product, 4 sec in debug*/ >>>> Instead of NOT_DEBUG can you use PRODUCT_ONLY? >>>> Instead of DEBUG_ONLY can you used NOT_PRODUCT? >>>> Please add spaces between the comment delimiters and the comment >>>> text. >>>> >>>> That uses the smaller timeout for only one build config >>>> (PRODUCT). >>>> >>>> line 3836 // Rise the priority... 
>>>> Typo: 'Rise' -> 'Raise' >>>> >>>> About the general idea of raising the exiting thread's priority, >>>> if the exiting thread is looping in some Win* OS code after this >>>> point, will raising the priority make the machine unusable? >>>> >>>> Dan >>>> >>>> >>>>> The fix was tested on all available platforms, with the hotspot >>>>> testset. No failures. >>>>> >>>>> Sincerely yours, >>>>> Ivan >>>>> >>>> >>>> >> > From chris.plummer at oracle.com Tue Nov 18 22:08:30 2014 From: chris.plummer at oracle.com (Chris Plummer) Date: Tue, 18 Nov 2014 14:08:30 -0800 Subject: [9] RFR (S) 6762191: Setting stack size to 16K causes segmentation fault In-Reply-To: <54641ADE.8030504@oracle.com> References: <545D939D.2030308@oracle.com> <5463B896.10801@oracle.com> <54641ADE.8030504@oracle.com> Message-ID: <546BC35E.4070402@oracle.com> Adding core-libs-dev at openjdk.java.net, since one of the changes is in java.c. Chris On 11/12/14 6:43 PM, David Holmes wrote: > Hi Chris, > > Sorry for the delay. > > On 13/11/2014 5:44 AM, Chris Plummer wrote: >> Hi, >> >> I'm still looking for reviewers. > > As the change is to the launcher it needs to be reviewed by the > launcher owner - which I think is serviceability (though also cc'd > Kumar :) ). > > Launcher change, and your rationale, seems okay to me. I'd probably > put the test in to jdk/test/tools/launcher/ though. > > Thanks, > David > >> thanks, >> >> Chris >> >> On 11/7/14 7:53 PM, Chris Plummer wrote: >>> This is an initial review for 6762191. I'm guessing there will be >>> recommendations to fix in a different way, but thought this would be a >>> good time to start the discussion. 
>>> >>> https://bugs.openjdk.java.net/browse/JDK-6762191 >>> http://cr.openjdk.java.net/~cjplummer/6762191/webrev.00.jdk/ >>> http://cr.openjdk.java.net/~cjplummer/6762191/webrev.00.hotspot/ >>> >>> The bug is that if the -Xss size is set to something very small (like >>> 16k), on linux there will be a crash due to overwriting the end of the >>> stack. This happens before hotspot can compute its stack needs and >>> verify that the stack is big enough. >>> >>> It didn't seem viable to move the hotspot stack size check earlier. It >>> depends on too much other work done before that point, and the changes >>> would have been disruptive. The stack size check is currently done in >>> os::init_2(). >>> >>> What is needed is a check before the thread is created. That way we >>> can create a thread with a big enough stack to handle all needs up to >>> the point of the check in os::init_2(). This initial check does not >>> need to be the final check. It just needs to confirm that we have >>> enough stack to get us to the check in os::init_2(). >>> >>> I decided to check in java.c if the -Xss size is too small, and set it >>> to a larger size if it is. I hard coded this size to 32k (I'll explain >>> why 32k later). I suspect this is the part that will result in some >>> debate. If you have better suggestions let me know. If it does stay >>> here, then probably the 32k needs to be a #define, and maybe even an >>> OS porting interface, but I'm not sure where to put it. >>> >>> The reason I chose 32k is because this is big enough for all platforms >>> to get to the stack size check in os::init_2(). It is also smaller >>> than the actual minimum stack size allowed on any platform. 32-bit >>> windows has the smallest requirement at 64k. I add some printfs to >>> print the minimum stack requirement, and then ran a simple JTReg test >>> with every JPRT supported platform to get the results. 
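The launcher-side check described above might look roughly like the following. This is a sketch, not the code in the webrev: the `STACK_SIZE_MINIMUM` macro and the `adjust_stack_size` helper are illustrative names, and treating 0 as "no -Xss given, use the platform default" is an assumption.

```c
/* Sketch only: 32k is the value discussed in the message; it is meant to be
 * large enough to reach the real check in os::init_2(), not a final minimum. */
#define STACK_SIZE_MINIMUM (32 * 1024)

/* Hypothetical helper: bump a too-small requested size up to the minimum.
 * Assumption: 0 means "not specified" and is passed through unchanged. */
static long long adjust_stack_size(long long requested) {
    if (requested > 0 && requested < STACK_SIZE_MINIMUM) {
        return STACK_SIZE_MINIMUM;
    }
    return requested;
}
```

With this shape, `-Xss16k` would start the primordial thread with 32k, and the VM's own check would still reject the requested size later with its usual error message.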
>>> >>> The TooSmallStackSize.sh will run "java -version" with -Xss16k, >>> -Xss32k, and -XXss, where is the size from the >>> error message produced by the JVM, such as in the following: >>> >>> $ java -Xss32k -version >>> The stack size specified is too small, Specify at least 100k >>> Error: Could not create the Java Virtual Machine. >>> Error: A fatal exception has occurred. Program will exit. >>> >>> I ran this test through JPRT on all platforms, and they all pass. >>> >>> One thing to point out is that Windows behaves a bit different than >>> the other platforms. It always rounds the stack size up to a multiple >>> of 64k , so even if you specify -Xss16k, you get a 64k stack. On >>> 32-bit Windows with C1, 64k is also the minimum requirement, so there >>> is no error produced in this case. However, on 32-bit Windows with C2, >>> 68k is the minimum, so an error is produced since the stack will only >>> be 64k. There is no bug here. It's just a bit confusing. >>> >>> thanks, >>> >>> Chris >> From coleen.phillimore at oracle.com Tue Nov 18 22:33:09 2014 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Tue, 18 Nov 2014 17:33:09 -0500 Subject: [8u40] Review request 8064667: Provide support to help identify use of endorsed standards and extension mechanism In-Reply-To: <546AA8D2.1050600@oracle.com> References: <546A28F3.1010802@oracle.com> <546AA8D2.1050600@oracle.com> Message-ID: <546BC925.6030000@oracle.com> Mandy, In arguments.cpp, I think this should be snprintf in case java_home is MAXPATHLEN long. + char endorsedDir[JVM_MAXPATHLEN]; + char extDir[JVM_MAXPATHLEN]; + const char* fileSep = os::file_separator(); + sprintf(endorsedDir, "%s%slib%sendorsed", Arguments::get_java_home(), fileSep, fileSep); + sprintf(extDir, "%s%slib%sext", Arguments::get_java_home(), fileSep, fileSep); + This list could be hard to maintain. I have no alternatives to suggest though. + // List of JAR files installed in the default lib/ext directory. 
+ // -XX:+CheckEndorsedAndExtDirs checks if any non-JDK file installed The code looks correct though but a bit painful searching through directories. At least it's optional. Coleen On 11/17/14, 9:02 PM, Mandy Chung wrote: > Updated webrev: > http://cr.openjdk.java.net/~mchung/jdk8u/webrevs/8064667/webrev.01/ > > This addresses Calvin's comment. It now keeps a list of the jar files > shipped with jre/lib/ext and determine if jre/lib/ext has any other > non-JDK jar files installed. > > Mandy > > On 11/17/2014 8:57 AM, Mandy Chung wrote: >> This requests both code review and 8u40 approval for: >> https://bugs.openjdk.java.net/browse/JDK-8064667 >> >> Webrev: >> http://cr.openjdk.java.net/~mchung/jdk8u/webrevs/8064667/webrev.00/ >> >> JEP 220 [1] proposes to remove the endorsed standards override >> mechanism and extension mechanism. This patch adds a VM flag in 8u40 >> to help identify any existing uses of these mechanisms so that users >> can turn on the VM flag to help identify if they depend on the >> endorsed standards override mechanism and extension mechanism and can >> plan to prepare for the migration to a newer JDK release early on. >> When -XX:+CheckEndorsedAndExtDirs is set, the VM will exit if the >> system property -Djava.endorsed.dirs or -Djava.ext.dirs is set, or if >> ${java.home}/lib/endorsed or ${java.home}/lib/ext exists, or any >> system extension directory contains JAR files. 
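On the `sprintf` concern raised in this review: a bounded variant can be sketched as below. The helper name and its parameters are hypothetical, and the buffer size is supplied by the caller rather than taken from `JVM_MAXPATHLEN`. The sketch relies on C99 `snprintf` returning the length that would have been written, which makes truncation detectable.

```c
#include <stdio.h>

/* Hypothetical helper: builds "<java_home><sep>lib<sep><leaf>" into buf.
 * Returns 1 if the whole path fit, 0 if it was truncated or formatting
 * failed. buf is always NUL-terminated when buflen > 0. */
static int build_lib_subdir(char *buf, size_t buflen, const char *java_home,
                            const char *file_sep, const char *leaf) {
    int n = snprintf(buf, buflen, "%s%slib%s%s",
                     java_home, file_sep, file_sep, leaf);
    return n >= 0 && (size_t) n < buflen;
}
```

A caller would pass `sizeof(extDir)` as `buflen` and decide what to do when the result is 0, instead of overrunning the array when the Java home path is unusually long.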
>> >> Thanks >> Mandy >> [1] http://openjdk.java.net/jeps/220 >> >> >> > From mandy.chung at oracle.com Tue Nov 18 22:55:41 2014 From: mandy.chung at oracle.com (Mandy Chung) Date: Tue, 18 Nov 2014 14:55:41 -0800 Subject: [8u40] Review request 8064667: Provide support to help identify use of endorsed standards and extension mechanism In-Reply-To: <546BC925.6030000@oracle.com> References: <546A28F3.1010802@oracle.com> <546AA8D2.1050600@oracle.com> <546BC925.6030000@oracle.com> Message-ID: <546BCE6D.3020104@oracle.com> On 11/18/14 2:33 PM, Coleen Phillimore wrote: > Mandy, > > In arguments.cpp, I think this should be snprintf in case java_home is > MAXPATHLEN long. > That's what I was wondering as I copied from the existing code in arguments.cpp. > + char endorsedDir[JVM_MAXPATHLEN]; > + char extDir[JVM_MAXPATHLEN]; > + const char* fileSep = os::file_separator(); > + sprintf(endorsedDir, "%s%slib%sendorsed", > Arguments::get_java_home(), fileSep, fileSep); > + sprintf(extDir, "%s%slib%sext", Arguments::get_java_home(), > fileSep, fileSep); > + I will fix them to use snprintf. I assume there is a bug to fix the existing use of sprintf; if not you may want to file one. > > This list could be hard to maintain. I have no alternatives to > suggest though. I expect this list will rarely be changed for 8 update. > > + // List of JAR files installed in the default lib/ext directory. > + // -XX:+CheckEndorsedAndExtDirs checks if any non-JDK file installed > > The code looks correct though but a bit painful searching through > directories. At least it's optional. It's off by default. This is to help users using 8u40 to prepare for migration and scanning the directories should not be an issue. thanks for the review. Mandy > > Coleen > > On 11/17/14, 9:02 PM, Mandy Chung wrote: >> Updated webrev: >> http://cr.openjdk.java.net/~mchung/jdk8u/webrevs/8064667/webrev.01/ >> >> This addresses Calvin's comment. 
It now keeps a list of the jar >> files shipped with jre/lib/ext and determine if jre/lib/ext has any >> other non-JDK jar files installed. >> >> Mandy >> >> On 11/17/2014 8:57 AM, Mandy Chung wrote: >>> This requests both code review and 8u40 approval for: >>> https://bugs.openjdk.java.net/browse/JDK-8064667 >>> >>> Webrev: >>> http://cr.openjdk.java.net/~mchung/jdk8u/webrevs/8064667/webrev.00/ >>> >>> JEP 220 [1] proposes to remove the endorsed standards override >>> mechanism and extension mechanism. This patch adds a VM flag in 8u40 >>> to help identify any existing uses of these mechanisms so that users >>> can turn on the VM flag to help identify if they depend on the >>> endorsed standards override mechanism and extension mechanism and >>> can plan to prepare for the migration to a newer JDK release early >>> on. When -XX:+CheckEndorsedAndExtDirs is set, the VM will exit if >>> the system property -Djava.endorsed.dirs or -Djava.ext.dirs is set, >>> or if ${java.home}/lib/endorsed or ${java.home}/lib/ext exists, or >>> any system extension directory contains JAR files. >>> >>> Thanks >>> Mandy >>> [1] http://openjdk.java.net/jeps/220 >>> >>> >>> >> > From mandy.chung at oracle.com Tue Nov 18 23:06:38 2014 From: mandy.chung at oracle.com (Mandy Chung) Date: Tue, 18 Nov 2014 15:06:38 -0800 Subject: [8u40] Review request 8064667: Provide support to help identify use of endorsed standards and extension mechanism In-Reply-To: <546BCE6D.3020104@oracle.com> References: <546A28F3.1010802@oracle.com> <546AA8D2.1050600@oracle.com> <546BC925.6030000@oracle.com> <546BCE6D.3020104@oracle.com> Message-ID: <546BD0FE.101@oracle.com> Coleen, Calvin, Thanks for the review. 
Here is the updated webrev and a new test: http://cr.openjdk.java.net/~mchung/jdk8u/webrevs/8064667/webrev.02/ Mandy From calvin.cheung at oracle.com Tue Nov 18 23:25:16 2014 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Tue, 18 Nov 2014 15:25:16 -0800 Subject: [8u40] Review request 8064667: Provide support to help identify use of endorsed standards and extension mechanism In-Reply-To: <546BD0FE.101@oracle.com> References: <546A28F3.1010802@oracle.com> <546AA8D2.1050600@oracle.com> <546BC925.6030000@oracle.com> <546BCE6D.3020104@oracle.com> <546BD0FE.101@oracle.com> Message-ID: <546BD55C.7060509@oracle.com> On 11/18/2014 3:06 PM, Mandy Chung wrote: > Coleen, Calvin, > > Thanks for the review. Here is the updated webrev and a new test: > http://cr.openjdk.java.net/~mchung/jdk8u/webrevs/8064667/webrev.02/ Looks good and thanks for adding the testcase. Minor nit about the testcase - the following import statements are extra: 34 import java.io.*; 36 import java.util.concurrent.TimeUnit; I don't need to see another webrev for the testcase change. thanks, Calvin > > Mandy > From chris.plummer at oracle.com Wed Nov 19 08:49:10 2014 From: chris.plummer at oracle.com (Chris Plummer) Date: Wed, 19 Nov 2014 00:49:10 -0800 Subject: [9] RFR (S) 6762191: Setting stack size to 16K causes segmentation fault In-Reply-To: <546BC35E.4070402@oracle.com> References: <545D939D.2030308@oracle.com> <5463B896.10801@oracle.com> <54641ADE.8030504@oracle.com> <546BC35E.4070402@oracle.com> Message-ID: <546C5986.6010500@oracle.com> I've update the webrev to add STACK_SIZE_MINIMUM in place of the 32k references, and also moved the test from hotspot/test/runtime to jdk/test/tools/launcher as David requested. That required some adjustments to the test script, since test_env.sh does not exist in jdk/test, so I had to pull in the bits I needed into the script. http://cr.openjdk.java.net/~cjplummer/6762191/webrev.01/ I still need to rerun through JPRT. 
I'll do so once there are no more suggested changes. thanks, Chris On 11/18/14 2:08 PM, Chris Plummer wrote: > Adding core-libs-dev at openjdk.java.net, since one of the changes is in > java.c. > > Chris > > On 11/12/14 6:43 PM, David Holmes wrote: >> Hi Chris, >> >> Sorry for the delay. >> >> On 13/11/2014 5:44 AM, Chris Plummer wrote: >>> Hi, >>> >>> I'm still looking for reviewers. >> >> As the change is to the launcher it needs to be reviewed by the >> launcher owner - which I think is serviceability (though also cc'd >> Kumar :) ). >> >> Launcher change, and your rationale, seems okay to me. I'd probably >> put the test in to jdk/test/tools/launcher/ though. >> >> Thanks, >> David >> >>> thanks, >>> >>> Chris >>> >>> On 11/7/14 7:53 PM, Chris Plummer wrote: >>>> This is an initial review for 6762191. I'm guessing there will be >>>> recommendations to fix in a different way, but thought this would be a >>>> good time to start the discussion. >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-6762191 >>>> http://cr.openjdk.java.net/~cjplummer/6762191/webrev.00.jdk/ >>>> http://cr.openjdk.java.net/~cjplummer/6762191/webrev.00.hotspot/ >>>> >>>> The bug is that if the -Xss size is set to something very small (like >>>> 16k), on linux there will be a crash due to overwriting the end of the >>>> stack. This happens before hotspot can compute its stack needs and >>>> verify that the stack is big enough. >>>> >>>> It didn't seem viable to move the hotspot stack size check earlier. It >>>> depends on too much other work done before that point, and the changes >>>> would have been disruptive. The stack size check is currently done in >>>> os::init_2(). >>>> >>>> What is needed is a check before the thread is created. That way we >>>> can create a thread with a big enough stack to handle all needs up to >>>> the point of the check in os::init_2(). This initial check does not >>>> need to be the final check. 
It just needs to confirm that we have >>>> enough stack to get us to the check in os::init_2(). >>>> >>>> I decided to check in java.c if the -Xss size is too small, and set it >>>> to a larger size if it is. I hard coded this size to 32k (I'll explain >>>> why 32k later). I suspect this is the part that will result in some >>>> debate. If you have better suggestions let me know. If it does stay >>>> here, then probably the 32k needs to be a #define, and maybe even an >>>> OS porting interface, but I'm not sure where to put it. >>>> >>>> The reason I chose 32k is because this is big enough for all platforms >>>> to get to the stack size check in os::init_2(). It is also smaller >>>> than the actual minimum stack size allowed on any platform. 32-bit >>>> windows has the smallest requirement at 64k. I add some printfs to >>>> print the minimum stack requirement, and then ran a simple JTReg test >>>> with every JPRT supported platform to get the results. >>>> >>>> The TooSmallStackSize.sh will run "java -version" with -Xss16k, >>>> -Xss32k, and -XXss, where is the size from the >>>> error message produced by the JVM, such as in the following: >>>> >>>> $ java -Xss32k -version >>>> The stack size specified is too small, Specify at least 100k >>>> Error: Could not create the Java Virtual Machine. >>>> Error: A fatal exception has occurred. Program will exit. >>>> >>>> I ran this test through JPRT on all platforms, and they all pass. >>>> >>>> One thing to point out is that Windows behaves a bit different than >>>> the other platforms. It always rounds the stack size up to a multiple >>>> of 64k , so even if you specify -Xss16k, you get a 64k stack. On >>>> 32-bit Windows with C1, 64k is also the minimum requirement, so there >>>> is no error produced in this case. However, on 32-bit Windows with C2, >>>> 68k is the minimum, so an error is produced since the stack will only >>>> be 64k. There is no bug here. It's just a bit confusing. 
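The Windows case above is just rounding arithmetic. A small sketch, assuming the 64k granularity and the 64k (C1) and 68k (C2) minimums quoted in the message; the helper names are hypothetical:

```c
#define KB 1024L

/* Round size up to the next multiple of granularity (granularity > 0). */
static long round_up(long size, long granularity) {
    return ((size + granularity - 1) / granularity) * granularity;
}

/* Hypothetical check: does the rounded-up stack meet the VM's minimum? */
static int stack_is_sufficient(long requested, long minimum) {
    return round_up(requested, 64 * KB) >= minimum;
}
```

With `-Xss16k` the rounded size is 64k, which meets a 64k minimum but not a 68k one, matching the C1 and C2 behaviour described.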
>>>> >>>> thanks, >>>> >>>> Chris >>> > From serguei.spitsyn at oracle.com Wed Nov 19 08:54:48 2014 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 19 Nov 2014 00:54:48 -0800 Subject: [9] RFR (S) 6762191: Setting stack size to 16K causes segmentation fault In-Reply-To: <546C5986.6010500@oracle.com> References: <545D939D.2030308@oracle.com> <5463B896.10801@oracle.com> <54641ADE.8030504@oracle.com> <546BC35E.4070402@oracle.com> <546C5986.6010500@oracle.com> Message-ID: <546C5AD8.2090701@oracle.com> Reviewed Thanks, Serguei On 11/19/14 12:49 AM, Chris Plummer wrote: > I've update the webrev to add STACK_SIZE_MINIMUM in place of the 32k > references, and also moved the test from hotspot/test/runtime to > jdk/test/tools/launcher as David requested. That required some > adjustments to the test script, since test_env.sh does not exist in > jdk/test, so I had to pull in the bits I needed into the script. > > http://cr.openjdk.java.net/~cjplummer/6762191/webrev.01/ > > I still need to rerun through JPRT. I'll do so once there are no more > suggested changes. > > thanks, > > Chris > > On 11/18/14 2:08 PM, Chris Plummer wrote: >> Adding core-libs-dev at openjdk.java.net, since one of the changes is in >> java.c. >> >> Chris >> >> On 11/12/14 6:43 PM, David Holmes wrote: >>> Hi Chris, >>> >>> Sorry for the delay. >>> >>> On 13/11/2014 5:44 AM, Chris Plummer wrote: >>>> Hi, >>>> >>>> I'm still looking for reviewers. >>> >>> As the change is to the launcher it needs to be reviewed by the >>> launcher owner - which I think is serviceability (though also cc'd >>> Kumar :) ). >>> >>> Launcher change, and your rationale, seems okay to me. I'd probably >>> put the test in to jdk/test/tools/launcher/ though. >>> >>> Thanks, >>> David >>> >>>> thanks, >>>> >>>> Chris >>>> >>>> On 11/7/14 7:53 PM, Chris Plummer wrote: >>>>> This is an initial review for 6762191. 
I'm guessing there will be >>>>> recommendations to fix in a different way, but thought this would >>>>> be a >>>>> good time to start the discussion. >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-6762191 >>>>> http://cr.openjdk.java.net/~cjplummer/6762191/webrev.00.jdk/ >>>>> http://cr.openjdk.java.net/~cjplummer/6762191/webrev.00.hotspot/ >>>>> >>>>> The bug is that if the -Xss size is set to something very small (like >>>>> 16k), on linux there will be a crash due to overwriting the end of >>>>> the >>>>> stack. This happens before hotspot can compute its stack needs and >>>>> verify that the stack is big enough. >>>>> >>>>> It didn't seem viable to move the hotspot stack size check >>>>> earlier. It >>>>> depends on too much other work done before that point, and the >>>>> changes >>>>> would have been disruptive. The stack size check is currently done in >>>>> os::init_2(). >>>>> >>>>> What is needed is a check before the thread is created. That way we >>>>> can create a thread with a big enough stack to handle all needs up to >>>>> the point of the check in os::init_2(). This initial check does not >>>>> need to be the final check. It just needs to confirm that we have >>>>> enough stack to get us to the check in os::init_2(). >>>>> >>>>> I decided to check in java.c if the -Xss size is too small, and >>>>> set it >>>>> to a larger size if it is. I hard coded this size to 32k (I'll >>>>> explain >>>>> why 32k later). I suspect this is the part that will result in some >>>>> debate. If you have better suggestions let me know. If it does stay >>>>> here, then probably the 32k needs to be a #define, and maybe even an >>>>> OS porting interface, but I'm not sure where to put it. >>>>> >>>>> The reason I chose 32k is because this is big enough for all >>>>> platforms >>>>> to get to the stack size check in os::init_2(). It is also smaller >>>>> than the actual minimum stack size allowed on any platform. 
32-bit >>>>> windows has the smallest requirement at 64k. I add some printfs to >>>>> print the minimum stack requirement, and then ran a simple JTReg test >>>>> with every JPRT supported platform to get the results. >>>>> >>>>> The TooSmallStackSize.sh will run "java -version" with -Xss16k, >>>>> -Xss32k, and -XXss, where is the size from the >>>>> error message produced by the JVM, such as in the following: >>>>> >>>>> $ java -Xss32k -version >>>>> The stack size specified is too small, Specify at least 100k >>>>> Error: Could not create the Java Virtual Machine. >>>>> Error: A fatal exception has occurred. Program will exit. >>>>> >>>>> I ran this test through JPRT on all platforms, and they all pass. >>>>> >>>>> One thing to point out is that Windows behaves a bit different than >>>>> the other platforms. It always rounds the stack size up to a multiple >>>>> of 64k , so even if you specify -Xss16k, you get a 64k stack. On >>>>> 32-bit Windows with C1, 64k is also the minimum requirement, so there >>>>> is no error produced in this case. However, on 32-bit Windows with >>>>> C2, >>>>> 68k is the minimum, so an error is produced since the stack will only >>>>> be 64k. There is no bug here. It's just a bit confusing. >>>>> >>>>> thanks, >>>>> >>>>> Chris >>>> >> > From david.holmes at oracle.com Wed Nov 19 10:12:42 2014 From: david.holmes at oracle.com (David Holmes) Date: Wed, 19 Nov 2014 20:12:42 +1000 Subject: [9] RFR (S) 6762191: Setting stack size to 16K causes segmentation fault In-Reply-To: <546C5986.6010500@oracle.com> References: <545D939D.2030308@oracle.com> <5463B896.10801@oracle.com> <54641ADE.8030504@oracle.com> <546BC35E.4070402@oracle.com> <546C5986.6010500@oracle.com> Message-ID: <546C6D1A.8050903@oracle.com> On 19/11/2014 6:49 PM, Chris Plummer wrote: > I've update the webrev to add STACK_SIZE_MINIMUM in place of the 32k > references, and also moved the test from hotspot/test/runtime to > jdk/test/tools/launcher as David requested. 
That required some > adjustments to the test script, since test_env.sh does not exist in > jdk/test, so I had to pull in the bits I needed into the script. Is there a reason this needs a shell script instead of using the testlibrary tools to launch the VM and check the output? Sorry that should have been mentioned much earlier. David > http://cr.openjdk.java.net/~cjplummer/6762191/webrev.01/ > > I still need to rerun through JPRT. I'll do so once there are no more > suggested changes. > > thanks, > > Chris > > On 11/18/14 2:08 PM, Chris Plummer wrote: >> Adding core-libs-dev at openjdk.java.net, since one of the changes is in >> java.c. >> >> Chris >> >> On 11/12/14 6:43 PM, David Holmes wrote: >>> Hi Chris, >>> >>> Sorry for the delay. >>> >>> On 13/11/2014 5:44 AM, Chris Plummer wrote: >>>> Hi, >>>> >>>> I'm still looking for reviewers. >>> >>> As the change is to the launcher it needs to be reviewed by the >>> launcher owner - which I think is serviceability (though also cc'd >>> Kumar :) ). >>> >>> Launcher change, and your rationale, seems okay to me. I'd probably >>> put the test in to jdk/test/tools/launcher/ though. >>> >>> Thanks, >>> David >>> >>>> thanks, >>>> >>>> Chris >>>> >>>> On 11/7/14 7:53 PM, Chris Plummer wrote: >>>>> This is an initial review for 6762191. I'm guessing there will be >>>>> recommendations to fix in a different way, but thought this would be a >>>>> good time to start the discussion. >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-6762191 >>>>> http://cr.openjdk.java.net/~cjplummer/6762191/webrev.00.jdk/ >>>>> http://cr.openjdk.java.net/~cjplummer/6762191/webrev.00.hotspot/ >>>>> >>>>> The bug is that if the -Xss size is set to something very small (like >>>>> 16k), on linux there will be a crash due to overwriting the end of the >>>>> stack. This happens before hotspot can compute its stack needs and >>>>> verify that the stack is big enough. >>>>> >>>>> It didn't seem viable to move the hotspot stack size check earlier. 
It >>>>> depends on too much other work done before that point, and the changes >>>>> would have been disruptive. The stack size check is currently done in >>>>> os::init_2(). >>>>> >>>>> What is needed is a check before the thread is created. That way we >>>>> can create a thread with a big enough stack to handle all needs up to >>>>> the point of the check in os::init_2(). This initial check does not >>>>> need to be the final check. It just needs to confirm that we have >>>>> enough stack to get us to the check in os::init_2(). >>>>> >>>>> I decided to check in java.c if the -Xss size is too small, and set it >>>>> to a larger size if it is. I hard coded this size to 32k (I'll explain >>>>> why 32k later). I suspect this is the part that will result in some >>>>> debate. If you have better suggestions let me know. If it does stay >>>>> here, then probably the 32k needs to be a #define, and maybe even an >>>>> OS porting interface, but I'm not sure where to put it. >>>>> >>>>> The reason I chose 32k is because this is big enough for all platforms >>>>> to get to the stack size check in os::init_2(). It is also smaller >>>>> than the actual minimum stack size allowed on any platform. 32-bit >>>>> windows has the smallest requirement at 64k. I add some printfs to >>>>> print the minimum stack requirement, and then ran a simple JTReg test >>>>> with every JPRT supported platform to get the results. >>>>> >>>>> The TooSmallStackSize.sh will run "java -version" with -Xss16k, >>>>> -Xss32k, and -XXss, where is the size from the >>>>> error message produced by the JVM, such as in the following: >>>>> >>>>> $ java -Xss32k -version >>>>> The stack size specified is too small, Specify at least 100k >>>>> Error: Could not create the Java Virtual Machine. >>>>> Error: A fatal exception has occurred. Program will exit. >>>>> >>>>> I ran this test through JPRT on all platforms, and they all pass. 
>>>>> >>>>> One thing to point out is that Windows behaves a bit different than >>>>> the other platforms. It always rounds the stack size up to a multiple >>>>> of 64k , so even if you specify -Xss16k, you get a 64k stack. On >>>>> 32-bit Windows with C1, 64k is also the minimum requirement, so there >>>>> is no error produced in this case. However, on 32-bit Windows with C2, >>>>> 68k is the minimum, so an error is produced since the stack will only >>>>> be 64k. There is no bug here. It's just a bit confusing. >>>>> >>>>> thanks, >>>>> >>>>> Chris >>>> >> > From ioi.lam at oracle.com Wed Nov 19 14:08:32 2014 From: ioi.lam at oracle.com (Ioi Lam) Date: Wed, 19 Nov 2014 22:08:32 +0800 Subject: RFR(XS) 8065346 - WB_AddToBootstrapClassLoaderSearch calls JvmtiEnv::create_a_jvmti when not in _thread_in_vm state Message-ID: <546CA460.6080501@oracle.com> Hi, Please review a simple fix for whitebox test API: http://cr.openjdk.java.net/~iklam/8065346-jvmti-test-crash/ https://bugs.openjdk.java.net/browse/JDK-8065346 Summary of fix: The JVMTI calls expect the current thread to be in VM state, but JNI GetStringUTFChars expects the thread to be in Native state. So I moved the ThreadToNativeFromVM constructors accordingly to make everyone happy. Tests: I ran the tests with a debug hotspot build and the tests passed after the fix. 
Thanks - Ioi From chris.plummer at oracle.com Wed Nov 19 15:52:11 2014 From: chris.plummer at oracle.com (Chris Plummer) Date: Wed, 19 Nov 2014 07:52:11 -0800 Subject: [9] RFR (S) 6762191: Setting stack size to 16K causes segmentation fault In-Reply-To: <546C6D1A.8050903@oracle.com> References: <545D939D.2030308@oracle.com> <5463B896.10801@oracle.com> <54641ADE.8030504@oracle.com> <546BC35E.4070402@oracle.com> <546C5986.6010500@oracle.com> <546C6D1A.8050903@oracle.com> Message-ID: <546CBCAB.7040101@oracle.com> On 11/19/14 2:12 AM, David Holmes wrote: > On 19/11/2014 6:49 PM, Chris Plummer wrote: >> I've update the webrev to add STACK_SIZE_MINIMUM in place of the 32k >> references, and also moved the test from hotspot/test/runtime to >> jdk/test/tools/launcher as David requested. That required some >> adjustments to the test script, since test_env.sh does not exist in >> jdk/test, so I had to pull in the bits I needed into the script. > > Is there a reason this needs a shell script instead of using the > testlibrary tools to launch the VM and check the output? Not that I'm aware of. I guess I just really didn't look at what it would take to make it all in java. I'll have a look at java examples and convert it. Chris > > Sorry that should have been mentioned much earlier. > > David > > >> http://cr.openjdk.java.net/~cjplummer/6762191/webrev.01/ >> >> I still need to rerun through JPRT. I'll do so once there are no more >> suggested changes. >> >> thanks, >> >> Chris >> >> On 11/18/14 2:08 PM, Chris Plummer wrote: >>> Adding core-libs-dev at openjdk.java.net, since one of the changes is in >>> java.c. >>> >>> Chris >>> >>> On 11/12/14 6:43 PM, David Holmes wrote: >>>> Hi Chris, >>>> >>>> Sorry for the delay. >>>> >>>> On 13/11/2014 5:44 AM, Chris Plummer wrote: >>>>> Hi, >>>>> >>>>> I'm still looking for reviewers. 
>>>> >>>> As the change is to the launcher it needs to be reviewed by the >>>> launcher owner - which I think is serviceability (though also cc'd >>>> Kumar :) ). >>>> >>>> Launcher change, and your rationale, seems okay to me. I'd probably >>>> put the test in to jdk/test/tools/launcher/ though. >>>> >>>> Thanks, >>>> David >>>> >>>>> thanks, >>>>> >>>>> Chris >>>>> >>>>> On 11/7/14 7:53 PM, Chris Plummer wrote: >>>>>> This is an initial review for 6762191. I'm guessing there will be >>>>>> recommendations to fix in a different way, but thought this would >>>>>> be a >>>>>> good time to start the discussion. >>>>>> >>>>>> https://bugs.openjdk.java.net/browse/JDK-6762191 >>>>>> http://cr.openjdk.java.net/~cjplummer/6762191/webrev.00.jdk/ >>>>>> http://cr.openjdk.java.net/~cjplummer/6762191/webrev.00.hotspot/ >>>>>> >>>>>> The bug is that if the -Xss size is set to something very small >>>>>> (like >>>>>> 16k), on linux there will be a crash due to overwriting the end >>>>>> of the >>>>>> stack. This happens before hotspot can compute its stack needs and >>>>>> verify that the stack is big enough. >>>>>> >>>>>> It didn't seem viable to move the hotspot stack size check >>>>>> earlier. It >>>>>> depends on too much other work done before that point, and the >>>>>> changes >>>>>> would have been disruptive. The stack size check is currently >>>>>> done in >>>>>> os::init_2(). >>>>>> >>>>>> What is needed is a check before the thread is created. That way we >>>>>> can create a thread with a big enough stack to handle all needs >>>>>> up to >>>>>> the point of the check in os::init_2(). This initial check does not >>>>>> need to be the final check. It just needs to confirm that we have >>>>>> enough stack to get us to the check in os::init_2(). >>>>>> >>>>>> I decided to check in java.c if the -Xss size is too small, and >>>>>> set it >>>>>> to a larger size if it is. I hard coded this size to 32k (I'll >>>>>> explain >>>>>> why 32k later). 
I suspect this is the part that will result in some >>>>>> debate. If you have better suggestions let me know. If it does stay >>>>>> here, then probably the 32k needs to be a #define, and maybe even an >>>>>> OS porting interface, but I'm not sure where to put it. >>>>>> >>>>>> The reason I chose 32k is because this is big enough for all >>>>>> platforms >>>>>> to get to the stack size check in os::init_2(). It is also smaller >>>>>> than the actual minimum stack size allowed on any platform. 32-bit >>>>>> windows has the smallest requirement at 64k. I add some printfs to >>>>>> print the minimum stack requirement, and then ran a simple JTReg >>>>>> test >>>>>> with every JPRT supported platform to get the results. >>>>>> >>>>>> The TooSmallStackSize.sh will run "java -version" with -Xss16k, >>>>>> -Xss32k, and -XXss, where is the size from the >>>>>> error message produced by the JVM, such as in the following: >>>>>> >>>>>> $ java -Xss32k -version >>>>>> The stack size specified is too small, Specify at least 100k >>>>>> Error: Could not create the Java Virtual Machine. >>>>>> Error: A fatal exception has occurred. Program will exit. >>>>>> >>>>>> I ran this test through JPRT on all platforms, and they all pass. >>>>>> >>>>>> One thing to point out is that Windows behaves a bit different than >>>>>> the other platforms. It always rounds the stack size up to a >>>>>> multiple >>>>>> of 64k , so even if you specify -Xss16k, you get a 64k stack. On >>>>>> 32-bit Windows with C1, 64k is also the minimum requirement, so >>>>>> there >>>>>> is no error produced in this case. However, on 32-bit Windows >>>>>> with C2, >>>>>> 68k is the minimum, so an error is produced since the stack will >>>>>> only >>>>>> be 64k. There is no bug here. It's just a bit confusing. 
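[Editorial sketch, not part of the original message.] The Windows behaviour described above comes down to simple arithmetic: the requested size is rounded up to a multiple of 64k, and the rounded value is then compared with the platform minimum. The figures (64k granularity, 64k minimum for C1, 68k for C2 on 32-bit Windows) are taken from the message. The class and method names are hypothetical and are not HotSpot or launcher code.

```java
// Illustrative only: models the rounding described in the message,
// not the actual HotSpot or launcher implementation.
public class WindowsStackRounding {
    // Granularity reported in the message: stack sizes are rounded up to 64k.
    static final long GRANULARITY_K = 64;

    // Round a requested stack size (in KB) up to the next multiple of 64k.
    static long roundUpK(long requestedK) {
        return ((requestedK + GRANULARITY_K - 1) / GRANULARITY_K) * GRANULARITY_K;
    }

    // The request is accepted if the rounded stack meets the minimum (in KB).
    static boolean accepted(long requestedK, long minimumK) {
        return roundUpK(requestedK) >= minimumK;
    }

    public static void main(String[] args) {
        System.out.println("-Xss16k rounds to " + roundUpK(16) + "k");
        System.out.println("C1 (min 64k) accepted: " + accepted(16, 64));
        System.out.println("C2 (min 68k) accepted: " + accepted(16, 68));
    }
}
```

Under these assumptions -Xss16k yields a 64k stack, which satisfies the C1 minimum but falls short of the C2 minimum, matching the "no error with C1, error with C2" observation.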
>>>>>> >>>>>> thanks, >>>>>> >>>>>> Chris >>>>> >>> >> From yumin.qi at oracle.com Wed Nov 19 17:28:29 2014 From: yumin.qi at oracle.com (Yumin Qi) Date: Wed, 19 Nov 2014 09:28:29 -0800 Subject: RFR(XS) 8065346 - WB_AddToBootstrapClassLoaderSearch calls JvmtiEnv::create_a_jvmti when not in _thread_in_vm state In-Reply-To: <546CA460.6080501@oracle.com> References: <546CA460.6080501@oracle.com> Message-ID: <546CD33D.5030903@oracle.com> Ioi, In fact you can use * const char* seg = java_lang_String::as_utf8_string(JNIHandles::resolve_non_null(segment));* which does not need state transition since it is in native. (Coleen pointed out in a codereview for my change to whitebox) But need ResouceMark first. Thanks Yumin On 11/19/2014 6:08 AM, Ioi Lam wrote: > Hi, > > Please review a simple fix for whitebox test API: > > http://cr.openjdk.java.net/~iklam/8065346-jvmti-test-crash/ > https://bugs.openjdk.java.net/browse/JDK-8065346 > > Summary of fix: > > The JVMTI calls expect the current thread to be in VM state, but > JNI GetStringUTFChars > expects the thread to be in Native state. > > So I moved the ThreadToNativeFromVM constructors accordingly to > make everyone happy. > > Tests: > > I ran the tests with a debug hotspot build and the tests passed > after the fix. > > Thanks > - Ioi From mandy.chung at oracle.com Wed Nov 19 21:10:54 2014 From: mandy.chung at oracle.com (Mandy Chung) Date: Wed, 19 Nov 2014 13:10:54 -0800 Subject: [8u40] Putback request for 8064667: Provide support to help identify use of endorsed standards and extension mechanism In-Reply-To: <546A28F3.1010802@oracle.com> References: <546A28F3.1010802@oracle.com> Message-ID: <546D075D.8050601@oracle.com> Coleen and Calvin from runtime team have reviewed and approved this fix. I notice that jdk8u-dev got dropped in their review [1]. May I get the 8u40 approval to putback this change? 
Updated webrev: http://cr.openjdk.java.net/~mchung/jdk8u/webrevs/8064667/webrev.02 Thanks Mandy [1] http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2014-November/013312.html On 11/17/2014 8:57 AM, Mandy Chung wrote: > This requests both code review and 8u40 approval for: > https://bugs.openjdk.java.net/browse/JDK-8064667 > > Webrev: > http://cr.openjdk.java.net/~mchung/jdk8u/webrevs/8064667/webrev.00/ > > JEP 220 [1] proposes to remove the endorsed standards override > mechanism and extension mechanism. This patch adds a VM flag in 8u40 > to help identify any existing uses of these mechanisms so that users > can turn on the VM flag to help identify if they depend on the > endorsed standards override mechanism and extension mechanism and can > plan to prepare for the migration to a newer JDK release early on. > When -XX:+CheckEndorsedAndExtDirs is set, the VM will exit if the > system property -Djava.endorsed.dirs or -Djava.ext.dirs is set, or if > ${java.home}/lib/endorsed or ${java.home}/lib/ext exists, or any > system extension directory contains JAR files. > > Thanks > Mandy > [1] http://openjdk.java.net/jeps/220 > > > From naoto.sato at oracle.com Wed Nov 19 21:37:30 2014 From: naoto.sato at oracle.com (Naoto Sato) Date: Wed, 19 Nov 2014 13:37:30 -0800 Subject: [8u40] Putback request for 8064667: Provide support to help identify use of endorsed standards and extension mechanism In-Reply-To: <546D075D.8050601@oracle.com> References: <546A28F3.1010802@oracle.com> <546D075D.8050601@oracle.com> Message-ID: <546D0D9A.70406@oracle.com> Since you've already got the code review done specifically for 8u, you are good to go. Naoto On 11/19/14, 1:10 PM, Mandy Chung wrote: > Coleen and Calvin from runtime team have reviewed and approved this > fix. I notice that jdk8u-dev got dropped in their review [1]. > > May I get the 8u40 approval to putback this change? 
> > Updated webrev: > http://cr.openjdk.java.net/~mchung/jdk8u/webrevs/8064667/webrev.02 > > Thanks > Mandy > [1] > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2014-November/013312.html > > > On 11/17/2014 8:57 AM, Mandy Chung wrote: >> This requests both code review and 8u40 approval for: >> https://bugs.openjdk.java.net/browse/JDK-8064667 >> >> Webrev: >> http://cr.openjdk.java.net/~mchung/jdk8u/webrevs/8064667/webrev.00/ >> >> JEP 220 [1] proposes to remove the endorsed standards override >> mechanism and extension mechanism. This patch adds a VM flag in 8u40 >> to help identify any existing uses of these mechanisms so that users >> can turn on the VM flag to help identify if they depend on the >> endorsed standards override mechanism and extension mechanism and can >> plan to prepare for the migration to a newer JDK release early on. >> When -XX:+CheckEndorsedAndExtDirs is set, the VM will exit if the >> system property -Djava.endorsed.dirs or -Djava.ext.dirs is set, or if >> ${java.home}/lib/endorsed or ${java.home}/lib/ext exists, or any >> system extension directory contains JAR files. >> >> Thanks >> Mandy >> [1] http://openjdk.java.net/jeps/220 >> >> >> > From naoto.sato at oracle.com Wed Nov 19 22:33:31 2014 From: naoto.sato at oracle.com (Naoto Sato) Date: Wed, 19 Nov 2014 14:33:31 -0800 Subject: [8u40] Putback request for 8064667: Provide support to help identify use of endorsed standards and extension mechanism In-Reply-To: <546D0D9A.70406@oracle.com> References: <546A28F3.1010802@oracle.com> <546D075D.8050601@oracle.com> <546D0D9A.70406@oracle.com> Message-ID: <546D1ABB.6090709@oracle.com> This was somewhat misleading, but take it as an "approval." Naoto On 11/19/14, 1:37 PM, Naoto Sato wrote: > Since you've already got the code review done specifically for 8u, you > are good to go. > > Naoto > > On 11/19/14, 1:10 PM, Mandy Chung wrote: >> Coleen and Calvin from runtime team have reviewed and approved this >> fix. 
I notice that jdk8u-dev got dropped in their review [1]. >> >> May I get the 8u40 approval to putback this change? >> >> Updated webrev: >> http://cr.openjdk.java.net/~mchung/jdk8u/webrevs/8064667/webrev.02 >> >> Thanks >> Mandy >> [1] >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2014-November/013312.html >> >> >> >> On 11/17/2014 8:57 AM, Mandy Chung wrote: >>> This requests both code review and 8u40 approval for: >>> https://bugs.openjdk.java.net/browse/JDK-8064667 >>> >>> Webrev: >>> http://cr.openjdk.java.net/~mchung/jdk8u/webrevs/8064667/webrev.00/ >>> >>> JEP 220 [1] proposes to remove the endorsed standards override >>> mechanism and extension mechanism. This patch adds a VM flag in 8u40 >>> to help identify any existing uses of these mechanisms so that users >>> can turn on the VM flag to help identify if they depend on the >>> endorsed standards override mechanism and extension mechanism and can >>> plan to prepare for the migration to a newer JDK release early on. >>> When -XX:+CheckEndorsedAndExtDirs is set, the VM will exit if the >>> system property -Djava.endorsed.dirs or -Djava.ext.dirs is set, or if >>> ${java.home}/lib/endorsed or ${java.home}/lib/ext exists, or any >>> system extension directory contains JAR files. >>> >>> Thanks >>> Mandy >>> [1] http://openjdk.java.net/jeps/220 >>> >>> >>> >> From ioi.lam at oracle.com Thu Nov 20 01:54:13 2014 From: ioi.lam at oracle.com (Ioi Lam) Date: Thu, 20 Nov 2014 09:54:13 +0800 Subject: RFR(XS) 8065346 - WB_AddToBootstrapClassLoaderSearch calls JvmtiEnv::create_a_jvmti when not in _thread_in_vm state In-Reply-To: <546CD33D.5030903@oracle.com> References: <546CA460.6080501@oracle.com> <546CD33D.5030903@oracle.com> Message-ID: <546D49C5.4060400@oracle.com> Hi Yumin, Thanks for the review. 
I have updated the webrev at http://cr.openjdk.java.net/~iklam/8065346-jvmti-test-crash-v2/ Thanks - Ioi On 11/20/14, 1:28 AM, Yumin Qi wrote: > Ioi, > > In fact you can use > * const char* seg = java_lang_String::as_utf8_string(JNIHandles::resolve_non_null(segment));* > which does not need state transition since it is in native. (Coleen > pointed out in a codereview for my change to whitebox) > But need ResouceMark first. > > Thanks > Yumin > > On 11/19/2014 6:08 AM, Ioi Lam wrote: >> Hi, >> >> Please review a simple fix for whitebox test API: >> >> http://cr.openjdk.java.net/~iklam/8065346-jvmti-test-crash/ >> https://bugs.openjdk.java.net/browse/JDK-8065346 >> >> Summary of fix: >> >> The JVMTI calls expect the current thread to be in VM state, but >> JNI GetStringUTFChars >> expects the thread to be in Native state. >> >> So I moved the ThreadToNativeFromVM constructors accordingly to >> make everyone happy. >> >> Tests: >> >> I ran the tests with a debug hotspot build and the tests passed >> after the fix. >> >> Thanks >> - Ioi > From coleen.phillimore at oracle.com Thu Nov 20 02:10:32 2014 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Wed, 19 Nov 2014 21:10:32 -0500 Subject: RFR(XS) 8065346 - WB_AddToBootstrapClassLoaderSearch calls JvmtiEnv::create_a_jvmti when not in _thread_in_vm state In-Reply-To: <546D49C5.4060400@oracle.com> References: <546CA460.6080501@oracle.com> <546CD33D.5030903@oracle.com> <546D49C5.4060400@oracle.com> Message-ID: <546D4D98.1080903@oracle.com> I agree, this second version looks better. There are a bunch of bizarre transitions to native in whitebox.cpp that seem unnecessary. Coleen On 11/19/14, 8:54 PM, Ioi Lam wrote: > Hi Yumin, > > Thanks for the review. 
I have updated the webrev at > > http://cr.openjdk.java.net/~iklam/8065346-jvmti-test-crash-v2/ > > Thanks > - Ioi > > On 11/20/14, 1:28 AM, Yumin Qi wrote: >> Ioi, >> >> In fact you can use >> * const char* seg = >> java_lang_String::as_utf8_string(JNIHandles::resolve_non_null(segment));* >> which does not need state transition since it is in native. (Coleen >> pointed out in a codereview for my change to whitebox) >> But need ResouceMark first. >> >> Thanks >> Yumin >> >> On 11/19/2014 6:08 AM, Ioi Lam wrote: >>> Hi, >>> >>> Please review a simple fix for whitebox test API: >>> >>> http://cr.openjdk.java.net/~iklam/8065346-jvmti-test-crash/ >>> https://bugs.openjdk.java.net/browse/JDK-8065346 >>> >>> Summary of fix: >>> >>> The JVMTI calls expect the current thread to be in VM state, but >>> JNI GetStringUTFChars >>> expects the thread to be in Native state. >>> >>> So I moved the ThreadToNativeFromVM constructors accordingly to >>> make everyone happy. >>> >>> Tests: >>> >>> I ran the tests with a debug hotspot build and the tests passed >>> after the fix. >>> >>> Thanks >>> - Ioi >> > From david.holmes at oracle.com Thu Nov 20 02:20:58 2014 From: david.holmes at oracle.com (David Holmes) Date: Thu, 20 Nov 2014 12:20:58 +1000 Subject: RFR(XS) 8065346 - WB_AddToBootstrapClassLoaderSearch calls JvmtiEnv::create_a_jvmti when not in _thread_in_vm state In-Reply-To: <546D49C5.4060400@oracle.com> References: <546CA460.6080501@oracle.com> <546CD33D.5030903@oracle.com> <546D49C5.4060400@oracle.com> Message-ID: <546D500A.4000601@oracle.com> On 20/11/2014 11:54 AM, Ioi Lam wrote: > Hi Yumin, > > Thanks for the review. I have updated the webrev at > > http://cr.openjdk.java.net/~iklam/8065346-jvmti-test-crash-v2/ Yep this looks good. 
Thanks, David > Thanks > - Ioi > > On 11/20/14, 1:28 AM, Yumin Qi wrote: >> Ioi, >> >> In fact you can use >> * const char* seg = >> java_lang_String::as_utf8_string(JNIHandles::resolve_non_null(segment));* >> which does not need state transition since it is in native. (Coleen >> pointed out in a codereview for my change to whitebox) >> But need ResourceMark first. >> >> Thanks >> Yumin >> >> On 11/19/2014 6:08 AM, Ioi Lam wrote: >>> Hi, >>> >>> Please review a simple fix for whitebox test API: >>> >>> http://cr.openjdk.java.net/~iklam/8065346-jvmti-test-crash/ >>> https://bugs.openjdk.java.net/browse/JDK-8065346 >>> >>> Summary of fix: >>> >>> The JVMTI calls expect the current thread to be in VM state, but >>> JNI GetStringUTFChars >>> expects the thread to be in Native state. >>> >>> So I moved the ThreadToNativeFromVM constructors accordingly to >>> make everyone happy. >>> >>> Tests: >>> >>> I ran the tests with a debug hotspot build and the tests passed >>> after the fix. >>> >>> Thanks >>> - Ioi >> > From sean.coffey at oracle.com Thu Nov 20 09:21:41 2014 From: sean.coffey at oracle.com (Seán Coffey) Date: Thu, 20 Nov 2014 09:21:41 +0000 Subject: [8u40] Putback request for 8064667: Provide support to help identify use of endorsed standards and extension mechanism In-Reply-To: <546D1ABB.6090709@oracle.com> References: <546A28F3.1010802@oracle.com> <546D075D.8050601@oracle.com> <546D0D9A.70406@oracle.com> <546D1ABB.6090709@oracle.com> Message-ID: <546DB2A5.7060209@oracle.com> Is a CCC required for this change, Mandy? regards, Sean. On 19/11/2014 22:33, Naoto Sato wrote: > This was somewhat misleading, but take it as an "approval." > > Naoto > > On 11/19/14, 1:37 PM, Naoto Sato wrote: >> Since you've already got the code review done specifically for 8u, you >> are good to go. >> >> Naoto >> >> On 11/19/14, 1:10 PM, Mandy Chung wrote: >>> Coleen and Calvin from runtime team have reviewed and approved this >>> fix.
I notice that jdk8u-dev got dropped in their review [1]. >>> >>> May I get the 8u40 approval to putback this change? >>> >>> Updated webrev: >>> http://cr.openjdk.java.net/~mchung/jdk8u/webrevs/8064667/webrev.02 >>> >>> Thanks >>> Mandy >>> [1] >>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2014-November/013312.html >>> >>> >>> >>> >>> On 11/17/2014 8:57 AM, Mandy Chung wrote: >>>> This requests both code review and 8u40 approval for: >>>> https://bugs.openjdk.java.net/browse/JDK-8064667 >>>> >>>> Webrev: >>>> http://cr.openjdk.java.net/~mchung/jdk8u/webrevs/8064667/webrev.00/ >>>> >>>> JEP 220 [1] proposes to remove the endorsed standards override >>>> mechanism and extension mechanism. This patch adds a VM flag in 8u40 >>>> to help identify any existing uses of these mechanisms so that users >>>> can turn on the VM flag to help identify if they depend on the >>>> endorsed standards override mechanism and extension mechanism and can >>>> plan to prepare for the migration to a newer JDK release early on. >>>> When -XX:+CheckEndorsedAndExtDirs is set, the VM will exit if the >>>> system property -Djava.endorsed.dirs or -Djava.ext.dirs is set, or if >>>> ${java.home}/lib/endorsed or ${java.home}/lib/ext exists, or any >>>> system extension directory contains JAR files. >>>> >>>> Thanks >>>> Mandy >>>> [1] http://openjdk.java.net/jeps/220 >>>> >>>> >>>> >>> From ivan.gerasimov at oracle.com Thu Nov 20 12:51:18 2014 From: ivan.gerasimov at oracle.com (Ivan Gerasimov) Date: Thu, 20 Nov 2014 15:51:18 +0300 Subject: RFR 8064694: Kitchensink: WaitForMultipleObjects failed in hotspot\src\os\windows\vm\os_windows.cpp: 3844 In-Reply-To: <546B6568.7040701@oracle.com> References: <5465F711.9090605@oracle.com> <54668EAF.9070807@oracle.com> <546915DF.7080106@oracle.com> <54699845.5010901@oracle.com> <546AF55A.8090203@oracle.com> <546B6568.7040701@oracle.com> Message-ID: <546DE3C6.5050200@oracle.com> Thank you Daniel! 
David, are you still Okay with the updated webrev? Compared to the previous one, I've added setting the priority of the current thread at the line 3880 and changed the priority level from HIGHEST to ABOVE_NORMAL. Sincerely yours, Ivan On 18.11.2014 18:27, Daniel D. Daugherty wrote: > > http://cr.openjdk.java.net/~igerasim/8064694/2/webrev/ > > src/os/windows/vm/os_windows.cpp > No comments. > > Thumbs up. > > Dan > > > On 11/18/14 12:29 AM, Ivan Gerasimov wrote: >> Hi Markus! >> >> The priority of the exiting thread will be raised for quite a short >> period of time -- right before the thread finishes exiting. >> >> There are two places where the priority is adjusted. >> >> Under normal conditions we should never see the first place hit. >> However, if we do, this means we have a huge number of threads. >> Raising the priority of one of them is a hint about which thread we >> want the scheduler to focus on. >> >> The second place is a bit different. >> We have several threads running immediately before ending the process. >> Some of them are at the exiting path and block exiting of the whole >> process. >> Raising the priority of those threads is a way to say we're not >> interested in all the other threads, as they are going to be >> terminated anyway. >> >> I just noticed that in second scenario it may be appropriate to set >> the priority of the current thread to the same level as for the >> exiting threads. >> This way it'll be given a fair chance to continue if the timeout >> expires. >> >> I also think it should be enough to set the priority level to >> THREAD_PRIORITY_ABOVE_NORMAL instead of THREAD_PRIORITY_HIGHEST. >> It will give just +1 to the priority value -- should be enough for >> the hint. >> >> Would you please take a look at the updated webrev: >> http://cr.openjdk.java.net/~igerasim/8064694/2/webrev/ >> >> Sincerely yours, >> Ivan >> >> >> On 17.11.2014 11:33, Markus Grönlund wrote: >>> I agree with David.
>>> >>> The side effects will be unknown and very hard to debug. >>> >>> Is there another way to accomplish the results without manipulating >>> base services? >>> >>> Thanks >>> Markus >>> >>> -----Original Message----- >>> From: David Holmes >>> Sent: den 17 november 2014 07:40 >>> To: Ivan Gerasimov; Daniel Daugherty >>> Cc: serviceability-dev at openjdk.java.net; hotspot-runtime-dev >>> Subject: Re: RFR 8064694: Kitchensink: WaitForMultipleObjects failed >>> in hotspot\src\os\windows\vm\os_windows.cpp: 3844 >>> >>> On 17/11/2014 7:23 AM, Ivan Gerasimov wrote: >>>> Thank you Daniel! >>>> >>>> Please find the updated webrev with your suggestions incorporated >>>> here: >>>> http://cr.openjdk.java.net/~igerasim/8064694/1/webrev/ >>>> >>>> Concerning the thread priority: If the application is of >>>> NORMAL_PRIORITY_CLASS, then setting the thread's priority level to >>>> THREAD_PRIORITY_HIGHEST will result in its priority value to be only >>>> 10 (of maximum 31). >>>> http://msdn.microsoft.com/en-us/library/windows/desktop/ms685100(v=vs. >>>> 85).aspx >>>> >>>> >>>> And if the process is HIGH_PRIORITY_CLASS, then the tread with the >>>> HIGHEST priority level will have priority value == 15 of 31. >>>> >>>> I believe, it should not be too much, and the machine will not become >>>> busy with only those closing threads. >>>> However, I hope it would be enough to make them complete faster than >>>> other threads of the NORMAL priority level withing the same >>>> application. >>> I don't think this is necessary or desirable. Under normal usage >>> we're giving priority to exiting threads and that may disrupt the >>> usual scheduling patterns that applications see. You may posit that >>> it is "harmless" but we can't say that for sure. Nor can we actually >>> know that this will help with this particular bug. I would not add >>> in this new code. >>> >>> David >>> >>>> Sincerely yours, >>>> Ivan >>>> >>>> >>>> On 15.11.2014 2:22, Daniel D. 
Daugherty wrote: >>>>> On 11/14/14 5:35 AM, Ivan Gerasimov wrote: >>>>>> Hello! >>>>>> >>>>>> The recent fix for JDK-8059533 ((process) Make exiting process wait >>>>>> for exiting threads [win]) caused the warning message to be printed >>>>>> in some test environments: >>>>>> ----------- >>>>>> os_windows.cpp:3844 is in the newly updated >>>>>> os::win32::exit_process_or_thread(Ept what, int exit_code) >>>>>> ----------- >>>>>> >>>>>> This has been observed with debug builds on highly loaded systems. >>>>>> >>>>>> >>>>>> To address the issue it is proposed to do three things: >>>>>> 1) increase the timeout for debug builds, >>>>>> 2) increase the maximum number of the thread handles to be stored, >>>>>> 3) rise the priority of the exiting threads, if we need to wait for >>>>>> them. >>>>>> >>>>>> Would you please help review the fix? >>>>>> >>>>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8064694 >>>>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8064694/0/webrev/ >>>>> src/os/windows/vm/os_windows.cpp >>>>> >>>>> line 3784: #define MAX_EXIT_HANDLES NOT_DEBUG(32) DEBUG_ONLY(128) >>>>> Instead of NOT_DEBUG can you use PRODUCT_ONLY? >>>>> Instead of DEBUG_ONLY can you used NOT_PRODUCT? >>>>> >>>>> That uses the smaller value for only one build config (PRODUCT). >>>>> >>>>> line 3785: #define EXIT_TIMEOUT NOT_DEBUG(1000) >>>>> DEBUG_ONLY(4000) >>>>> /*1 sec in product, 4 sec in debug*/ >>>>> Instead of NOT_DEBUG can you use PRODUCT_ONLY? >>>>> Instead of DEBUG_ONLY can you used NOT_PRODUCT? >>>>> Please add spaces between the comment delimiters and the comment >>>>> text. >>>>> >>>>> That uses the smaller timeout for only one build config >>>>> (PRODUCT). >>>>> >>>>> line 3836 // Rise the priority... >>>>> Typo: 'Rise' -> 'Raise' >>>>> >>>>> About the general idea of raising the exiting thread's priority, >>>>> if the exiting thread is looping in some Win* OS code after this >>>>> point, will raising the priority make the machine unusable? 
>>>>> >>>>> Dan >>>>> >>>>> >>>>>> The fix was tested on all available platforms, with the hotspot >>>>>> testset. No failures. >>>>>> >>>>>> Sincerely yours, >>>>>> Ivan >>>>>> >>>>> >>>>> >>> >> > > > From mandy.chung at oracle.com Thu Nov 20 17:25:09 2014 From: mandy.chung at oracle.com (Mandy Chung) Date: Thu, 20 Nov 2014 09:25:09 -0800 Subject: [8u40] Putback request for 8064667: Provide support to help identify use of endorsed standards and extension mechanism In-Reply-To: <546DB2A5.7060209@oracle.com> References: <546A28F3.1010802@oracle.com> <546D075D.8050601@oracle.com> <546D0D9A.70406@oracle.com> <546D1ABB.6090709@oracle.com> <546DB2A5.7060209@oracle.com> Message-ID: <546E23F5.6010006@oracle.com> I should file a CCC (thanks for the reminder) and this option should be documented in the release note or some document. Mandy On 11/20/14 1:21 AM, Seán Coffey wrote: > Is a CCC required for this change Mandy ? > > regards, > Sean. > > On 19/11/2014 22:33, Naoto Sato wrote: >> This was somewhat misleading, but take it as an "approval." >> >> Naoto >> >> On 11/19/14, 1:37 PM, Naoto Sato wrote: >>> Since you've already got the code review done specifically for 8u, you >>> are good to go. >>> >>> Naoto >>> >>> On 11/19/14, 1:10 PM, Mandy Chung wrote: >>>> Coleen and Calvin from runtime team have reviewed and approved this >>>> fix. I notice that jdk8u-dev got dropped in their review [1]. >>>> >>>> May I get the 8u40 approval to putback this change?
>>>> >>>> Updated webrev: >>>> http://cr.openjdk.java.net/~mchung/jdk8u/webrevs/8064667/webrev.02 >>>> >>>> Thanks >>>> Mandy >>>> [1] >>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2014-November/013312.html >>>> >>>> >>>> >>>> >>>> On 11/17/2014 8:57 AM, Mandy Chung wrote: >>>>> This requests both code review and 8u40 approval for: >>>>> https://bugs.openjdk.java.net/browse/JDK-8064667 >>>>> >>>>> Webrev: >>>>> http://cr.openjdk.java.net/~mchung/jdk8u/webrevs/8064667/webrev.00/ >>>>> >>>>> JEP 220 [1] proposes to remove the endorsed standards override >>>>> mechanism and extension mechanism. This patch adds a VM flag in 8u40 >>>>> to help identify any existing uses of these mechanisms so that users >>>>> can turn on the VM flag to help identify if they depend on the >>>>> endorsed standards override mechanism and extension mechanism and can >>>>> plan to prepare for the migration to a newer JDK release early on. >>>>> When -XX:+CheckEndorsedAndExtDirs is set, the VM will exit if the >>>>> system property -Djava.endorsed.dirs or -Djava.ext.dirs is set, or if >>>>> ${java.home}/lib/endorsed or ${java.home}/lib/ext exists, or any >>>>> system extension directory contains JAR files. >>>>> >>>>> Thanks >>>>> Mandy >>>>> [1] http://openjdk.java.net/jeps/220 >>>>> >>>>> >>>>> >>>> > From vladimir.kempik at oracle.com Fri Nov 21 15:31:20 2014 From: vladimir.kempik at oracle.com (Vladimir Kempik) Date: Fri, 21 Nov 2014 18:31:20 +0300 Subject: RFR: 8058935: CPU detection gives 0 cores per cpu, 2 threads per core in Amazon EC2 environment In-Reply-To: <546A50E3.6010200@oracle.com> References: <546A206A.4070604@oracle.com> <546A50E3.6010200@oracle.com> Message-ID: <546F5AC8.3050705@oracle.com> Hello Thanks for looking into this. It's impossible to collect needed data at the moment, the bug isn't reproducible now. 
And the cpuid dump I collected from an EC2 virtual machine says that supports_processor_topology() should report false now: static bool supports_processor_topology() { return (_cpuid_info.std_max_function >= 0xB) && // eax[4:0] | ebx[0:15] == 0 indicates invalid topology level. // Some cpus have max cpuid >= 0xB but do not support processor topology. (((_cpuid_info.tpl_cpuidB0_eax & 0x1f) | _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus) != 0); } which comes from this being false: (((_cpuid_info.tpl_cpuidB0_eax & 0x1f) | _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus) != 0); The check I've added is a sanity check to prevent the same crashes in the future. Thanks. Vladimir On 17.11.2014 22:47, Vladimir Kozlov wrote: > According to the following document the cpu has 10 cores (and 2 threads per core): > > http://ark.intel.com/products/75275/Intel-Xeon-Processor-E5-2670-v2-25M-Cache-2_50-GHz > > > hs_err in the bug report reports only 2 processors and the following lines are > missing: > > physical id : 0 > siblings : 4 > core id : 0 > cpu cores : 4 > apicid : 0 > initial apicid : 0 > > I assume it is some kind of virtual environment in which cpuid > topology is not working (at least our code does not work). > We may be missing some checks which indicate that topology is not > supported. > It would be nice if you could put all topology and related cpuid bits > from amazon ec2 in the bug report. > Checking for 0 could be fine, but if it is not 0 it could still be > wrong if topology info is not supported. > > Thanks, > Vladimir > > On 11/17/14 8:20 AM, Vladimir Kempik wrote: >> Hi, >> >> Please review the patch adding a sanity check to cores_per_cpu(): >> >> http://cr.openjdk.java.net/~vkempik/8058935/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8058935 >> >> A few months ago we got reports of java crashing in the amazon ec2 >> environment (they use Xen).
>> https://bugs.openjdk.java.net/browse/JDK-8058935 >> https://bugs.openjdk.java.net/browse/JDK-8058937 >> >> JVM args was used to make the crash: -XX:+UnlockCommercialFeatures >> -XX:+FlightRecorder >> >> After investigation I think the crash could only have happened if >> support_processor_topology() returned true and >> _cpuid_info.tpl_cpuidB1_ebx.bits.logical_cpus was zero. >> >> I wasn't able to reproduce the bug on amazon ec2 cloud in present days. >> >> The patch adds sanity check, if cpu topology was used and resulted in 0 >> cores per cpu, then fallback to non-topology variant, which can't result >> in 0 cores per cpu. >> >> Testing: JPRT. >> >> Thanks, >> Vladimir. From dmitry.samersoff at oracle.com Fri Nov 21 15:47:22 2014 From: dmitry.samersoff at oracle.com (Dmitry Samersoff) Date: Fri, 21 Nov 2014 18:47:22 +0300 Subject: RFR: 8058935: CPU detection gives 0 cores per cpu, 2 threads per core in Amazon EC2 environment In-Reply-To: <546A50E3.6010200@oracle.com> References: <546A206A.4070604@oracle.com> <546A50E3.6010200@oracle.com> Message-ID: <546F5E8A.9090007@oracle.com> Vladimir, If my memory is not bogus, xen hypervisor used to alter cpuinfo provided. So as soon as we can't detect xen and use xen api to get CPU capabilities, Vladimir K. approach looks reasonable to me. -Dmitry On 2014-11-17 22:47, Vladimir Kozlov wrote: > According to next document the cpu has 10 cores (and 2 threads per core): > > http://ark.intel.com/products/75275/Intel-Xeon-Processor-E5-2670-v2-25M-Cache-2_50-GHz > > > hs_err in the bug report reports only 2 processors and next lines are > missing: > > physical id : 0 > siblings : 4 > core id : 0 > cpu cores : 4 > apicid : 0 > initial apicid : 0 > > I assume it is some kind of virtual environment with which cpuid > topology is not working (at least our code does not work). > We may missing some checks which indicates that topology is not supported. 
> It would be nice if you can put all topology and related cpuid bits from > amazon ec2 in bug report. > Checking for 0 could be fine but if it is not 0 it could be still wrong > if topology info is not supported. > > Thanks, > Vladimir > > On 11/17/14 8:20 AM, Vladimir Kempik wrote: >> Hi, >> >> Please review patch adding sanity check to cores_per_cpu(): >> >> http://cr.openjdk.java.net/~vkempik/8058935/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8058935 >> >> Few months ago we've got reports of java crashing in amazon ec2 >> enviroment (they use Xen). >> https://bugs.openjdk.java.net/browse/JDK-8058935 >> https://bugs.openjdk.java.net/browse/JDK-8058937 >> >> JVM args was used to make the crash: -XX:+UnlockCommercialFeatures >> -XX:+FlightRecorder >> >> After investigation I think the crash could only have happened if >> support_processor_topology() returned true and >> _cpuid_info.tpl_cpuidB1_ebx.bits.logical_cpus was zero. >> >> I wasn't able to reproduce the bug on amazon ec2 cloud in present days. >> >> The patch adds sanity check, if cpu topology was used and resulted in 0 >> cores per cpu, then fallback to non-topology variant, which can't result >> in 0 cores per cpu. >> >> Testing: JPRT. >> >> Thanks, >> Vladimir. -- Dmitry Samersoff Oracle Java development team, Saint Petersburg, Russia * I would love to change the world, but they won't give me the sources. 
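For readers following the thread, here is a minimal sketch of the fallback described in the patch summary quoted above: if the CPUID leaf 0xB topology data yields 0 cores per CPU, use the non-topology count instead. This is not the webrev code. The `CpuidSnapshot` struct, its field names, and `legacy_cores_per_cpu()` are hypothetical stand-ins for HotSpot's `_cpuid_info`, and the assumption that the legacy count is stored as "cores minus one" (so it can never come out as 0) is mine.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the cached CPUID values; not the real _cpuid_info layout.
struct CpuidSnapshot {
  bool     topology_supported;    // outcome of a supports_processor_topology()-style test
  uint32_t leafB_level0_logical;  // CPUID.0xB level 0, EBX[15:0]: logical CPUs per core
  uint32_t leafB_level1_logical;  // CPUID.0xB level 1, EBX[15:0]: logical CPUs per package
  uint32_t legacy_cores_minus1;   // assumed legacy field, stored as "cores - 1" (6 bits)
};

// Non-topology variant: the stored value is "cores - 1", so the result is at least 1.
inline uint32_t legacy_cores_per_cpu(const CpuidSnapshot& c) {
  return (c.legacy_cores_minus1 & 0x3f) + 1;
}

inline uint32_t cores_per_cpu(const CpuidSnapshot& c) {
  uint32_t result = 0;
  // Guard the divisor as well, so a zero level-0 count cannot cause a division by zero.
  if (c.topology_supported && c.leafB_level0_logical != 0) {
    result = c.leafB_level1_logical / c.leafB_level0_logical;
  }
  // Sanity check: a topology-derived 0 falls back to the non-topology variant.
  if (result == 0) {
    result = legacy_cores_per_cpu(c);
  }
  assert(result >= 1);
  return result;
}
```

With a level-1 count of 0 and a level-0 count of 2 (the case suspected in the EC2 reports), the quotient is 0 and the legacy value is returned instead.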
From vladimir.kozlov at oracle.com Fri Nov 21 17:08:06 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 21 Nov 2014 09:08:06 -0800 Subject: RFR: 8058935: CPU detection gives 0 cores per cpu, 2 threads per core in Amazon EC2 environment In-Reply-To: <546F5AC8.3050705@oracle.com> References: <546A206A.4070604@oracle.com> <546A50E3.6010200@oracle.com> <546F5AC8.3050705@oracle.com> Message-ID: <546F7176.5020508@oracle.com> > (((_cpuid_info.tpl_cpuidB0_eax & 0x1f) | _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus) != 0); That check was added long ago for 6968646 and is present in jdk7 and 6update. And the failure happened in jdk which have it: # JRE version: Java(TM) SE Runtime Environment (7.0_51-b13) (build 1.7.0_51-b13) # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.51-b03 mixed mode linux-amd64 compressed oops) But if Dmitry is right we can do nothing here. So your change seems valid in such case. One note - do you need to check (result == 0) in threads_per_core() too? Thanks, Vladimir On 11/21/14 7:31 AM, Vladimir Kempik wrote: > Hello > > Thanks for looking into this. > > It's impossible to collect needed data at the moment, the bug isn't reproducible now. And cpuid dump I've collected from > ec2 virtual machine says that supports_processor_topology() should report false now: > > static bool supports_processor_topology() { > return (_cpuid_info.std_max_function >= 0xB) && > // eax[4:0] | ebx[0:15] == 0 indicates invalid topology level. > // Some cpus have max cpuid >= 0xB but do not support processor topology. > (((_cpuid_info.tpl_cpuidB0_eax & 0x1f) | _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus) != 0); > } > > > which comes from this being false: > > (((_cpuid_info.tpl_cpuidB0_eax & 0x1f) | _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus) != 0); > > The check I've added is sanity check to prevent same crashes in future. > > Thanks. 
Vladimir > > > On 17.11.2014 22:47, Vladimir Kozlov wrote: >> According to next document the cpu has 10 cores (and 2 threads per core): >> >> http://ark.intel.com/products/75275/Intel-Xeon-Processor-E5-2670-v2-25M-Cache-2_50-GHz >> >> hs_err in the bug report reports only 2 processors and next lines are missing: >> >> physical id : 0 >> siblings : 4 >> core id : 0 >> cpu cores : 4 >> apicid : 0 >> initial apicid : 0 >> >> I assume it is some kind of virtual environment with which cpuid topology is not working (at least our code does not >> work). >> We may missing some checks which indicates that topology is not supported. >> It would be nice if you can put all topology and related cpuid bits from amazon ec2 in bug report. >> Checking for 0 could be fine but if it is not 0 it could be still wrong if topology info is not supported. >> >> Thanks, >> Vladimir >> >> On 11/17/14 8:20 AM, Vladimir Kempik wrote: >>> Hi, >>> >>> Please review patch adding sanity check to cores_per_cpu(): >>> >>> http://cr.openjdk.java.net/~vkempik/8058935/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8058935 >>> >>> Few months ago we've got reports of java crashing in amazon ec2 >>> enviroment (they use Xen). >>> https://bugs.openjdk.java.net/browse/JDK-8058935 >>> https://bugs.openjdk.java.net/browse/JDK-8058937 >>> >>> JVM args was used to make the crash: -XX:+UnlockCommercialFeatures >>> -XX:+FlightRecorder >>> >>> After investigation I think the crash could only have happened if >>> support_processor_topology() returned true and >>> _cpuid_info.tpl_cpuidB1_ebx.bits.logical_cpus was zero. >>> >>> I wasn't able to reproduce the bug on amazon ec2 cloud in present days. >>> >>> The patch adds sanity check, if cpu topology was used and resulted in 0 >>> cores per cpu, then fallback to non-topology variant, which can't result >>> in 0 cores per cpu. >>> >>> Testing: JPRT. >>> >>> Thanks, >>> Vladimir. 
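The validity test quoted above can also be tried in isolation. The sketch below is an illustration only: it takes raw register values as plain integers instead of reading HotSpot's `_cpuid_info`, the `CpuidLeafB0` struct is a made-up name, and `topology_counts_plausible()` is a hypothetical extra check that nobody in this thread proposed. It only shows one way a nonzero but inconsistent value could be caught, which is the concern raised in the 17.11.2014 message.

```cpp
#include <cassert>
#include <cstdint>

// Raw EAX/EBX values for CPUID leaf 0xB, sub-leaf 0. Illustrative type, not HotSpot's.
struct CpuidLeafB0 {
  uint32_t eax;
  uint32_t ebx;
};

// Mirrors the quoted expression: the leaf must exist, and EAX[4:0] | EBX[15:0]
// must be nonzero for the topology level to be considered valid.
inline bool supports_processor_topology(uint32_t std_max_function, const CpuidLeafB0& b0) {
  const uint32_t shift_width  = b0.eax & 0x1f;    // EAX[4:0]
  const uint32_t logical_cpus = b0.ebx & 0xffff;  // EBX[15:0]
  return std_max_function >= 0xB && (shift_width | logical_cpus) != 0;
}

// Hypothetical plausibility test: the per-package count (level 1) should be
// at least the per-core count (level 0), and the per-core count must be nonzero.
inline bool topology_counts_plausible(uint32_t level0_logical, uint32_t level1_logical) {
  return level0_logical != 0 && level1_logical >= level0_logical;
}
```

A level-1 count of 0 next to a level-0 count of 2 passes the first test but fails the second, which matches the combination the patch author believes caused the crash.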
> From vladimir.kempik at oracle.com Fri Nov 21 17:19:18 2014 From: vladimir.kempik at oracle.com (Vladimir Kempik) Date: Fri, 21 Nov 2014 20:19:18 +0300 Subject: RFR: 8058935: CPU detection gives 0 cores per cpu, 2 threads per core in Amazon EC2 environment In-Reply-To: <546F7176.5020508@oracle.com> References: <546A206A.4070604@oracle.com> <546A50E3.6010200@oracle.com> <546F5AC8.3050705@oracle.com> <546F7176.5020508@oracle.com> Message-ID: <546F7416.5080102@oracle.com> Hello >That check was added long ago for 6968646 and is present in jdk7 and 6update. And the failure happened in jdk which have it: I meant that this check failed to do its job; there is no other way to get cores_per_cpu == 0 on an Intel CPU in this function. >One note - do you need to check (result == 0) in threads_per_core() too? For result to be 0 in cores_per_cpu(), where result = _cpuid_info.tpl_cpuidB1_ebx.bits.logical_cpus / _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus; _cpuid_info.tpl_cpuidB1_ebx.bits.logical_cpus needs to be zero and _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus needs to be nonzero. In this case threads_per_core() isn't affected: if (is_intel() && supports_processor_topology()) { result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus; If _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus == 0, then we would crash in cores_per_cpu() with a division by zero anyway. That was my reason not to edit threads_per_core(). Thanks, Vladimir On 21.11.2014 20:08, Vladimir Kozlov wrote: > > (((_cpuid_info.tpl_cpuidB0_eax & 0x1f) | > _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus) != 0); > > That check was added long ago for 6968646 and is present in jdk7 and > 6update. And the failure happened in jdk which have it: > > # JRE version: Java(TM) SE Runtime Environment (7.0_51-b13) (build > 1.7.0_51-b13) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.51-b03 mixed mode > linux-amd64 compressed oops) > > But if Dmitry is right we can do nothing here. So your change seems > valid in such case.
> > One note - do you need to check (result == 0) in threads_per_core() too? > > Thanks, > Vladimir > > On 11/21/14 7:31 AM, Vladimir Kempik wrote: >> Hello >> >> Thanks for looking into this. >> >> It's impossible to collect needed data at the moment, the bug isn't >> reproducible now. And cpuid dump I've collected from >> ec2 virtual machine says that supports_processor_topology() should >> report false now: >> >> static bool supports_processor_topology() { >> return (_cpuid_info.std_max_function >= 0xB) && >> // eax[4:0] | ebx[0:15] == 0 indicates invalid topology level. >> // Some cpus have max cpuid >= 0xB but do not support processor >> topology. >> (((_cpuid_info.tpl_cpuidB0_eax & 0x1f) | >> _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus) != 0); >> } >> >> >> which comes from this being false: >> >> (((_cpuid_info.tpl_cpuidB0_eax & 0x1f) | >> _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus) != 0); >> >> The check I've added is sanity check to prevent same crashes in future. >> >> Thanks. Vladimir >> >> >> On 17.11.2014 22:47, Vladimir Kozlov wrote: >>> According to next document the cpu has 10 cores (and 2 threads per >>> core): >>> >>> http://ark.intel.com/products/75275/Intel-Xeon-Processor-E5-2670-v2-25M-Cache-2_50-GHz >>> >>> >>> hs_err in the bug report reports only 2 processors and next lines >>> are missing: >>> >>> physical id : 0 >>> siblings : 4 >>> core id : 0 >>> cpu cores : 4 >>> apicid : 0 >>> initial apicid : 0 >>> >>> I assume it is some kind of virtual environment with which cpuid >>> topology is not working (at least our code does not >>> work). >>> We may missing some checks which indicates that topology is not >>> supported. >>> It would be nice if you can put all topology and related cpuid bits >>> from amazon ec2 in bug report. >>> Checking for 0 could be fine but if it is not 0 it could be still >>> wrong if topology info is not supported. 
>>> >>> Thanks, >>> Vladimir >>> >>> On 11/17/14 8:20 AM, Vladimir Kempik wrote: >>>> Hi, >>>> >>>> Please review patch adding sanity check to cores_per_cpu(): >>>> >>>> http://cr.openjdk.java.net/~vkempik/8058935/webrev.00/ >>>> https://bugs.openjdk.java.net/browse/JDK-8058935 >>>> >>>> Few months ago we've got reports of java crashing in amazon ec2 >>>> enviroment (they use Xen). >>>> https://bugs.openjdk.java.net/browse/JDK-8058935 >>>> https://bugs.openjdk.java.net/browse/JDK-8058937 >>>> >>>> JVM args was used to make the crash: -XX:+UnlockCommercialFeatures >>>> -XX:+FlightRecorder >>>> >>>> After investigation I think the crash could only have happened if >>>> support_processor_topology() returned true and >>>> _cpuid_info.tpl_cpuidB1_ebx.bits.logical_cpus was zero. >>>> >>>> I wasn't able to reproduce the bug on amazon ec2 cloud in present >>>> days. >>>> >>>> The patch adds sanity check, if cpu topology was used and resulted >>>> in 0 >>>> cores per cpu, then fallback to non-topology variant, which can't >>>> result >>>> in 0 cores per cpu. >>>> >>>> Testing: JPRT. >>>> >>>> Thanks, >>>> Vladimir. >> From vladimir.kozlov at oracle.com Fri Nov 21 17:40:57 2014 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 21 Nov 2014 09:40:57 -0800 Subject: RFR: 8058935: CPU detection gives 0 cores per cpu, 2 threads per core in Amazon EC2 environment In-Reply-To: <546F7416.5080102@oracle.com> References: <546A206A.4070604@oracle.com> <546A50E3.6010200@oracle.com> <546F5AC8.3050705@oracle.com> <546F7176.5020508@oracle.com> <546F7416.5080102@oracle.com> Message-ID: <546F7929.60909@oracle.com> Okay. Looks good. Thanks, Vladimir On 11/21/14 9:19 AM, Vladimir Kempik wrote: > Hello > > > >That check was added long ago for 6968646 and is present in jdk7 and > 6update. And the failure happened in jdk which have it: > > I meant this check failed to do its job, there is no other way to get > cores_per_cpu == 0 on intel cpu in this function. 
> > > >One note - do you need to check (result == 0) in threads_per_core() too? > > for result to be 0 in cores_per_cpu() > > result = _cpuid_info.tpl_cpuidB1_ebx.bits.logical_cpus / > _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus; > > _cpuid_info.tpl_cpuidB1_ebx.bits.logical_cpus needs to be zero and > _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus to be non zero. in this > case threads_per_core isn't affected: > > if (is_intel() && supports_processor_topology()) { > result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus; > > if _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus == 0 then we would > crash in cores_per_cpu with div by zero anyway. > > That was my reason to do not edit threads_per_cpu. > > Thanks, Vladimir > On 21.11.2014 20:08, Vladimir Kozlov wrote: >> > (((_cpuid_info.tpl_cpuidB0_eax & 0x1f) | >> _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus) != 0); >> >> That check was added long ago for 6968646 and is present in jdk7 and >> 6update. And the failure happened in jdk which have it: >> >> # JRE version: Java(TM) SE Runtime Environment (7.0_51-b13) (build >> 1.7.0_51-b13) >> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.51-b03 mixed mode >> linux-amd64 compressed oops) >> >> But if Dmitry is right we can do nothing here. So your change seems >> valid in such case. >> >> One note - do you need to check (result == 0) in threads_per_core() too? >> >> Thanks, >> Vladimir >> >> On 11/21/14 7:31 AM, Vladimir Kempik wrote: >>> Hello >>> >>> Thanks for looking into this. >>> >>> It's impossible to collect needed data at the moment, the bug isn't >>> reproducible now. And cpuid dump I've collected from >>> ec2 virtual machine says that supports_processor_topology() should >>> report false now: >>> >>> static bool supports_processor_topology() { >>> return (_cpuid_info.std_max_function >= 0xB) && >>> // eax[4:0] | ebx[0:15] == 0 indicates invalid topology level. >>> // Some cpus have max cpuid >= 0xB but do not support processor >>> topology. 
>>> (((_cpuid_info.tpl_cpuidB0_eax & 0x1f) | >>> _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus) != 0); >>> } >>> >>> >>> which comes from this being false: >>> >>> (((_cpuid_info.tpl_cpuidB0_eax & 0x1f) | >>> _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus) != 0); >>> >>> The check I've added is sanity check to prevent same crashes in future. >>> >>> Thanks. Vladimir >>> >>> >>> On 17.11.2014 22:47, Vladimir Kozlov wrote: >>>> According to next document the cpu has 10 cores (and 2 threads per >>>> core): >>>> >>>> http://ark.intel.com/products/75275/Intel-Xeon-Processor-E5-2670-v2-25M-Cache-2_50-GHz >>>> >>>> >>>> hs_err in the bug report reports only 2 processors and next lines >>>> are missing: >>>> >>>> physical id : 0 >>>> siblings : 4 >>>> core id : 0 >>>> cpu cores : 4 >>>> apicid : 0 >>>> initial apicid : 0 >>>> >>>> I assume it is some kind of virtual environment with which cpuid >>>> topology is not working (at least our code does not >>>> work). >>>> We may missing some checks which indicates that topology is not >>>> supported. >>>> It would be nice if you can put all topology and related cpuid bits >>>> from amazon ec2 in bug report. >>>> Checking for 0 could be fine but if it is not 0 it could be still >>>> wrong if topology info is not supported. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 11/17/14 8:20 AM, Vladimir Kempik wrote: >>>>> Hi, >>>>> >>>>> Please review patch adding sanity check to cores_per_cpu(): >>>>> >>>>> http://cr.openjdk.java.net/~vkempik/8058935/webrev.00/ >>>>> https://bugs.openjdk.java.net/browse/JDK-8058935 >>>>> >>>>> Few months ago we've got reports of java crashing in amazon ec2 >>>>> enviroment (they use Xen). 
>>>>> https://bugs.openjdk.java.net/browse/JDK-8058935 >>>>> https://bugs.openjdk.java.net/browse/JDK-8058937 >>>>> >>>>> JVM args was used to make the crash: -XX:+UnlockCommercialFeatures >>>>> -XX:+FlightRecorder >>>>> >>>>> After investigation I think the crash could only have happened if >>>>> support_processor_topology() returned true and >>>>> _cpuid_info.tpl_cpuidB1_ebx.bits.logical_cpus was zero. >>>>> >>>>> I wasn't able to reproduce the bug on amazon ec2 cloud in present >>>>> days. >>>>> >>>>> The patch adds sanity check, if cpu topology was used and resulted >>>>> in 0 >>>>> cores per cpu, then fallback to non-topology variant, which can't >>>>> result >>>>> in 0 cores per cpu. >>>>> >>>>> Testing: JPRT. >>>>> >>>>> Thanks, >>>>> Vladimir. >>> > From david.holmes at oracle.com Mon Nov 24 05:07:19 2014 From: david.holmes at oracle.com (David Holmes) Date: Mon, 24 Nov 2014 15:07:19 +1000 Subject: RFR 8064694: Kitchensink: WaitForMultipleObjects failed in hotspot\src\os\windows\vm\os_windows.cpp: 3844 In-Reply-To: <546DE3C6.5050200@oracle.com> References: <5465F711.9090605@oracle.com> <54668EAF.9070807@oracle.com> <546915DF.7080106@oracle.com> <54699845.5010901@oracle.com> <546AF55A.8090203@oracle.com> <546B6568.7040701@oracle.com> <546DE3C6.5050200@oracle.com> Message-ID: <5472BD07.6050405@oracle.com> On 20/11/2014 10:51 PM, Ivan Gerasimov wrote: > Thank you Daniel! > > David, are you still Okay with the updated webrev? Yes. Thanks, David > Comparing to the previous one, I've added setting the priority of the > current thread at the line 3880 and changed the priority level to > from HIGHEST to ABOVE_NORMAL. > > Sincerely yours, > Ivan > > On 18.11.2014 18:27, Daniel D. Daugherty wrote: >> > http://cr.openjdk.java.net/~igerasim/8064694/2/webrev/ >> >> src/os/windows/vm/os_windows.cpp >> No commments. >> >> Thumbs up. >> >> Dan >> >> >> On 11/18/14 12:29 AM, Ivan Gerasimov wrote: >>> Hi Markus! 
>>> >>> The priority of the exiting thread will be raised for quite a short >>> period of time -- right before the thread finishes exiting. >>> >>> There are two places where the priority is adjusted. >>> >>> Under normal conditions we should never see the first place hit. >>> However, if we do, this means we have a huge number of threads. >>> Raising the priority of one of them is a hint about which thread we >>> want the scheduler to focus on. >>> >>> The second place is a bit different. >>> We have several threads running immediately before ending the process. >>> Some of them are at the exiting path and block exiting of the whole >>> process. >>> Raising the priority of those threads is a way to say we're not >>> interested in all the other threads, as they are going to be >>> terminated anyway. >>> >>> I just noticed that in second scenario it may be appropriate to set >>> the priority of the current thread to the same level as for the >>> exiting threads. >>> This way it'll be given a fair chance to continue if the timeout >>> expires. >>> >>> I also think it should be enough to set the priority level to >>> THREAD_PRIORITY_ABOVE_NORMAL instead of THREAD_PRIORITY_HIGHEST. >>> It will give just +1 to the priority value -- should be enough for >>> the hint. >>> >>> Would you please take a look at the updated webrev: >>> http://cr.openjdk.java.net/~igerasim/8064694/2/webrev/ >>> >>> Sincerely yours, >>> Ivan >>> >>> >>> On 17.11.2014 11:33, Markus Grönlund wrote: >>>> I agree with David. >>>> >>>> The side effects will be unknown and very hard to debug. >>>> >>>> Is there another way to accomplish the results without manipulating >>>> base services?
>>>> >>>> Thanks >>>> Markus >>>> >>>> -----Original Message----- >>>> From: David Holmes >>>> Sent: den 17 november 2014 07:40 >>>> To: Ivan Gerasimov; Daniel Daugherty >>>> Cc: serviceability-dev at openjdk.java.net; hotspot-runtime-dev >>>> Subject: Re: RFR 8064694: Kitchensink: WaitForMultipleObjects failed >>>> in hotspot\src\os\windows\vm\os_windows.cpp: 3844 >>>> >>>> On 17/11/2014 7:23 AM, Ivan Gerasimov wrote: >>>>> Thank you Daniel! >>>>> >>>>> Please find the updated webrev with your suggestions incorporated >>>>> here: >>>>> http://cr.openjdk.java.net/~igerasim/8064694/1/webrev/ >>>>> >>>>> Concerning the thread priority: If the application is of >>>>> NORMAL_PRIORITY_CLASS, then setting the thread's priority level to >>>>> THREAD_PRIORITY_HIGHEST will result in its priority value to be only >>>>> 10 (of maximum 31). >>>>> http://msdn.microsoft.com/en-us/library/windows/desktop/ms685100(v=vs. >>>>> 85).aspx >>>>> >>>>> >>>>> And if the process is HIGH_PRIORITY_CLASS, then the tread with the >>>>> HIGHEST priority level will have priority value == 15 of 31. >>>>> >>>>> I believe, it should not be too much, and the machine will not become >>>>> busy with only those closing threads. >>>>> However, I hope it would be enough to make them complete faster than >>>>> other threads of the NORMAL priority level withing the same >>>>> application. >>>> I don't think this is necessary or desirable. Under normal usage >>>> we're giving priority to exiting threads and that may disrupt the >>>> usual scheduling patterns that applications see. You may posit that >>>> it is "harmless" but we can't say that for sure. Nor can we actually >>>> know that this will help with this particular bug. I would not add >>>> in this new code. >>>> >>>> David >>>> >>>>> Sincerely yours, >>>>> Ivan >>>>> >>>>> >>>>> On 15.11.2014 2:22, Daniel D. Daugherty wrote: >>>>>> On 11/14/14 5:35 AM, Ivan Gerasimov wrote: >>>>>>> Hello! 
>>>>>>> >>>>>>> The recent fix for JDK-8059533 ((process) Make exiting process wait >>>>>>> for exiting threads [win]) caused the warning message to be printed >>>>>>> in some test environments: >>>>>>> ----------- >>>>>>> os_windows.cpp:3844 is in the newly updated >>>>>>> os::win32::exit_process_or_thread(Ept what, int exit_code) >>>>>>> ----------- >>>>>>> >>>>>>> This has been observed with debug builds on highly loaded systems. >>>>>>> >>>>>>> >>>>>>> To address the issue it is proposed to do three things: >>>>>>> 1) increase the timeout for debug builds, >>>>>>> 2) increase the maximum number of the thread handles to be stored, >>>>>>> 3) rise the priority of the exiting threads, if we need to wait for >>>>>>> them. >>>>>>> >>>>>>> Would you please help review the fix? >>>>>>> >>>>>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8064694 >>>>>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8064694/0/webrev/ >>>>>> src/os/windows/vm/os_windows.cpp >>>>>> >>>>>> line 3784: #define MAX_EXIT_HANDLES NOT_DEBUG(32) DEBUG_ONLY(128) >>>>>> Instead of NOT_DEBUG can you use PRODUCT_ONLY? >>>>>> Instead of DEBUG_ONLY can you used NOT_PRODUCT? >>>>>> >>>>>> That uses the smaller value for only one build config (PRODUCT). >>>>>> >>>>>> line 3785: #define EXIT_TIMEOUT NOT_DEBUG(1000) >>>>>> DEBUG_ONLY(4000) >>>>>> /*1 sec in product, 4 sec in debug*/ >>>>>> Instead of NOT_DEBUG can you use PRODUCT_ONLY? >>>>>> Instead of DEBUG_ONLY can you used NOT_PRODUCT? >>>>>> Please add spaces between the comment delimiters and the comment >>>>>> text. >>>>>> >>>>>> That uses the smaller timeout for only one build config >>>>>> (PRODUCT). >>>>>> >>>>>> line 3836 // Rise the priority... >>>>>> Typo: 'Rise' -> 'Raise' >>>>>> >>>>>> About the general idea of raising the exiting thread's priority, >>>>>> if the exiting thread is looping in some Win* OS code after this >>>>>> point, will raising the priority make the machine unusable? 
>>>>>> >>>>>> Dan >>>>>> >>>>>> >>>>>>> The fix was tested on all available platforms, with the hotspot >>>>>>> testset. No failures. >>>>>>> >>>>>>> Sincerely yours, >>>>>>> Ivan >>>>>>> >>>>>> >>>>>> >>>> >>> >> >> >> > From ioi.lam at oracle.com Mon Nov 24 11:58:52 2014 From: ioi.lam at oracle.com (Ioi Lam) Date: Mon, 24 Nov 2014 19:58:52 +0800 Subject: RFR (S) [8u40] backport request 8065346 and 8064701 Message-ID: <54731D7C.4010504@oracle.com> Hi, Please review the backport of these two bugs from 9 to 8u40. The patches applied cleanly. http://cr.openjdk.java.net/~iklam/8065346_8064701_backport_8u40/ 8064701: Some CDS optimizations should be disabled if bootclasspath is modified by JVMTI Summary: Added API to track bootclasspath modification 8065346: WB_AddToBootstrapClassLoaderSearch calls JvmtiEnv::create_a_jvmti when not in _thread_in_vm state Summary: Removed ThreadToNativeFromVM and use java_lang_String::as_utf8_string instead Thanks - Ioi From yasuenag at gmail.com Mon Nov 24 13:21:41 2014 From: yasuenag at gmail.com (Yasumasa Suenaga) Date: Mon, 24 Nov 2014 22:21:41 +0900 Subject: RFR: JDK-8059586: hs_err report should treat redirected core pattern. In-Reply-To: <543E80F8.3080204@gmail.com> References: <542C8274.3010809@gmail.com> <54338B70.9080709@oracle.com> <543B1FD6.3000200@oracle.com> <543CF553.80601@gmail.com> <543DC2BF.9050407@oracle.com> <543E80F8.3080204@gmail.com> Message-ID: <547330E5.1050708@gmail.com> Hi all, I uploaded a webrev for this issue about a month ago. Could you review it and sponsor it? Thanks, Yasumasa On 10/15/2014 11:13 PM, Yasumasa Suenaga wrote: > Hi David, > > I've uploaded a new webrev: > http://cr.openjdk.java.net/~ysuenaga/JDK-8059586/webrev.02/ > > >> I wasn't suggesting that you make such a change though because it is large and disruptive. > >> Unfactoring check_or_create_dump is a step backwards in terms of code sharing. > > I restored check_or_create_dump() to os_posix.cpp.
> And I changed get_core_path() to create message which represents core dump path > (including filename) in each OS. > > >> Expanding the get_core_path in os_linux.cpp to handle the core_pattern may be okay (but I don't know enough about it to validate everything). > > I implemented all parameters in Linux kernel documentation: > https://www.kernel.org/doc/Documentation/sysctl/kernel.txt > > So I think that parameters which are processed are enough. > > > Thanks, > > Yasumasa > > > > (2014/10/15 9:41), David Holmes wrote: >> On 14/10/2014 8:05 PM, Yasumasa Suenaga wrote: >>> Hi David, >>> >>> Thank you for comments! >>> I've uploaded new webrev. Could you review it again? >>> http://cr.openjdk.java.net/~ysuenaga/JDK-8059586/webrev.01/ >>> >>> I am an author of jdk9. So I cannot commit it. >>> Could you be a sponsor for this enhancement? >>> >>> >>>> In which case that should be handled by the linux specific >>>> get_core_path() function. >>> >>> Agree. >>> So I implemented it in os_linux.cpp . >>> But part of format characters (%P: global pid, %s: signal, %t dump time) >>> are not processed >>> in this function because I think these parameters are difficult to >>> handle in it. >>> >>> %P: I could not find API for this. >>> %s: We have to change arguments of get_core_path() . >>> %t: This parameter means timestamp of coredump. It is decided in Kernel. >>> >>> >>>> Fixing this means changing all the os_posix using platforms. But your >>>> patch is not about this part. :) >>> >>> I moved os::check_or_create_dump() to each OS implementations (AIX, BSD, >>> Solaris, Linux) . >>> So I can write Linux specific code to check_or_create_dump() . >>> As a result, I could remove "#ifdef LINUX" from os_posix.cpp :-) >> >> I wasn't suggesting that you make such a change though because it is large and disruptive. The simple handling of the | part of core_pattern was basically ok. 
Expanding the get_core_path in os_linux.cpp to handle the core_pattern may be okay (but I don't know enough about it to validate everything). Unfactoring check_or_create_dump is a step backwards in terms of code sharing. >> >> Sorry this has grown too large for me to deal with right now. >> >> David >> ----- >> >>> >>>> Though I'm unclear whether it both invokes the program and creates a >>>> core dump file; or just invokes the program? >>> >>> If '|' is set, Linux kernel will just redirect core image to user process. >>> Kernel documentation says as below: >>> ------------ >>> . If the first character of the pattern is a '|', the kernel will treat >>> the rest of the pattern as a command to run. The core dump will be >>> written to the standard input of that program instead of to a file. >>> ------------ >>> >>> And implementation of coredump (do_coredump()) follows to it. >>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/coredump.c >>> >>> >>> In case of ABRT, ABRT dumps core image to default location >>> (/core.) >>> if user set unlimited to resource limit of core (ulimit -c) . >>> https://github.com/abrt/abrt/blob/master/src/hooks/abrt-hook-ccpp.c >>> >>> >>>> A few style nits - you need spaces around keywords and before braces >>>> I also suggest saying "Core dumps may be processed with ..." rather >>>> than "treated". >>>> And as you don't do anything in the non-redirect case I suggest >>>> collapsing this: >>> >>> I've fixed them. >>> >>> >>> Thanks, >>> >>> Yasumasa >>> >>> >>> (2014/10/13 9:41), David Holmes wrote: >>>> Hi Yasumasa, >>>> >>>> On 7/10/2014 8:48 PM, Yasumasa Suenaga wrote: >>>>> Hi David, >>>>> >>>>> Sorry for my English. >>>>> >>>>> I want to propose that JVM should create message according to core >>>>> pattern (/proc/sys/kernel/core_pattern) . >>>>> So I filed it to JBS and created a patch. >>>> >>>> So I've had a quick look at this core_pattern business and it seems to >>>> me that there are two aspects to this. 
>>>> >>>> First, without the leading |, the entry in the core_pattern file is a >>>> naming pattern for the core file. In which case that should be handled >>>> by the linux specific get_core_path() function. Though that in itself >>>> can't fully report the expected name, as part of it is provided in the >>>> shared code in os::check_or_create_dump. Fixing this means changing >>>> all the os_posix using platforms. But your patch is not about this >>>> part. :) >>>> >>>> Second, with a leading | the core_pattern is actually the name of a >>>> program to execute when the program is about to core dump, and that is >>>> what you report with your patch. Though I'm unclear whether it both >>>> invokes the program and creates a core dump file; or just invokes the >>>> program? >>>> >>>> So with regards to this second part your patch seems functionally ok. >>>> I do dislike having a big chunk of linux specific code in this "posix" >>>> support file but ... >>>> >>>> A few style nits - you need spaces around keywords and before braces eg: >>>> >>>> if(x){ >>>> >>>> should be >>>> >>>> if (x) { >>>> >>>> I also suggest saying "Core dumps may be processed with ..." rather >>>> than "treated". >>>> >>>> And as you don't do anything in the non-redirect case I suggest >>>> collapsing this: >>>> >>>> 83 is_redirect = core_pattern[0] == '|'; >>>> 84 } >>>> 85 >>>> 86 if(is_redirect){ >>>> 87 jio_snprintf(buffer, bufferSize, >>>> 88 "Core dumps may be treated with \"%s\"", >>>> &core_pattern[1]); >>>> 89 } >>>> >>>> to just >>>> >>>> 83 if (core_pattern[0] == '|') { // redirect >>>> 84 jio_snprintf(buffer, bufferSize, "Core dumps may be >>>> processed with \"%s\"", &core_pattern[1]); >>>> 85 } >>>> 86 } >>>> >>>> Comments from other runtime folk appreciated. >>>> >>>> Thanks, >>>> David >>>> >>>>> Thanks, >>>>> >>>>> Yasumasa >>>>> >>>>> 2014/10/07 15:43 "David Holmes" >>>> >: >>>>> >>>>> Hi Yasumasa, >>>>> >>>>> I'm sorry but I don't understand what you are proposing. 
When you >>>>> say >>>>> "treat" do you mean "create"? Otherwise what do you mean by >>>>> "treated"? >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>> On 2/10/2014 8:38 AM, Yasumasa Suenaga wrote: >>>>> > I'm in Hackergarten @ JavaOne :-) >>>>> > >>>>> > >>>>> > Hi all, >>>>> > >>>>> > I would like to enhance the messages in hs_err report. >>>>> > Modern Linux kernel can treat core dump with user process >>>>> (e.g. ABRT) >>>>> > However, hs_err report cannot detect it. >>>>> > >>>>> > I think that hs_err report should output messages as below: >>>>> > ------------- >>>>> > Failed to write core dump. Core dumps may be treated with >>>>> "/usr/sbin/chroot /proc/%P/root /usr/libexec/abrt-hook-ccpp %s %c %p >>>>> %u %g %t e" >>>>> > ------------- >>>>> > >>>>> > I've uploaded webrev of this enhancement. >>>>> > Could you review it? >>>>> > >>>>> > http://cr.openjdk.java.net/~ysuenaga/JDK-8059586/webrev.00/ >>>>> > >>>>> > This patch works fine on Fedora20 x86_64. >>>>> > >>>>> > >>>>> > >>>>> > Thanks, >>>>> > >>>>> > Yasumasa >>>>> > >>>>> From jiangli.zhou at oracle.com Mon Nov 24 18:00:12 2014 From: jiangli.zhou at oracle.com (Jiangli Zhou) Date: Mon, 24 Nov 2014 10:00:12 -0800 Subject: RFR (S) [8u40] backport request 8065346 and 8064701 In-Reply-To: <54731D7C.4010504@oracle.com> References: <54731D7C.4010504@oracle.com> Message-ID: <5473722C.2020907@oracle.com> Hi Ioi, Looks good for backport. Thanks, Jiangli On 11/24/2014 03:58 AM, Ioi Lam wrote: > Hi, > > Please review the backport of these two bugs from 9 to 8u40. The > patches applied cleanly. 
> > http://cr.openjdk.java.net/~iklam/8065346_8064701_backport_8u40/ > > 8064701: Some CDS optimizations should be disabled if bootclasspath is > modified by JVMTI > Summary: Added API to track bootclasspath modification > > 8065346: WB_AddToBootstrapClassLoaderSearch calls > JvmtiEnv::create_a_jvmti when not in _thread_in_vm state > Summary: Removed ThreadToNativeFromVM and use > java_lang_String::as_utf8_string instead > > Thanks > - Ioi From ioi.lam at oracle.com Tue Nov 25 01:32:27 2014 From: ioi.lam at oracle.com (Ioi Lam) Date: Tue, 25 Nov 2014 09:32:27 +0800 Subject: RFR (S) [8u40] backport request 8065346 and 8064701 In-Reply-To: <5473722C.2020907@oracle.com> References: <54731D7C.4010504@oracle.com> <5473722C.2020907@oracle.com> Message-ID: <5473DC2B.7000903@oracle.com> Thanks Jiangli! - Ioi On 11/25/14, 2:00 AM, Jiangli Zhou wrote: > Hi Ioi, > > Looks good for backport. > > Thanks, > Jiangli > > On 11/24/2014 03:58 AM, Ioi Lam wrote: >> Hi, >> >> Please review the backport of these two bugs from 9 to 8u40. The >> patches applied cleanly. 
>>
>> http://cr.openjdk.java.net/~iklam/8065346_8064701_backport_8u40/
>>
>> 8064701: Some CDS optimizations should be disabled if bootclasspath
>> is modified by JVMTI
>> Summary: Added API to track bootclasspath modification
>>
>> 8065346: WB_AddToBootstrapClassLoaderSearch calls
>> JvmtiEnv::create_a_jvmti when not in _thread_in_vm state
>> Summary: Removed ThreadToNativeFromVM and use
>> java_lang_String::as_utf8_string instead
>>
>> Thanks
>> - Ioi
>

From yasuenag at gmail.com Tue Nov 25 03:34:44 2014
From: yasuenag at gmail.com (Yasumasa Suenaga)
Date: Tue, 25 Nov 2014 12:34:44 +0900
Subject: guarantee(PageArmed == 0) failed: invaliant
Message-ID: <5473F8D4.8000107@gmail.com>

Hi all,

My customer encountered a crash with the messages below:
--------
Internal Error (safepoint.cpp:309)
guarantee(PageArmed == 0) failed: invaliant
--------
- JDK: JDK6u37 x64
- OS: RHEL 5.4 x86_64

I found similar issues in JBS:
- JDK-7116986
- JDK-7156454
- JDK-8033717

I read safepoint.cpp in jdk9, and I guess this error is raised by the code below:
--------
if (int(iterations) == DeferPollingPageLoopCount) {
  guarantee (PageArmed == 0, "invariant") ;
  PageArmed = 1 ;
  os::make_polling_page_unreadable();
}
--------

"iterations" is defined as "unsigned int" and is incremented on each loop iteration.
On the other hand, DeferPollingPageLoopCount is defined as intx and its default
value is "-1".

"PageArmed" is set to 1 in the following code:
--------
if (DeferPollingPageLoopCount < 0) {
  // Make polling safepoint aware
  guarantee (PageArmed == 0, "invariant") ;
  PageArmed = 1 ;
  os::make_polling_page_unreadable();
}
--------


If "iterations" overflows, do we hit this guarantee?
I think this "if" statement should be rewritten as below:
--------
diff -r 7e08ae41ddbe src/share/vm/runtime/safepoint.cpp
--- a/src/share/vm/runtime/safepoint.cpp Mon Nov 24 09:57:02 2014 +0100
+++ b/src/share/vm/runtime/safepoint.cpp Tue Nov 25 12:19:58 2014 +0900
@@ -288,7 +288,8 @@
 // 9. On windows consider using the return value from SwitchThreadTo()
 // to drive subsequent spin/SwitchThreadTo()/Sleep(N) decisions.

- if (int(iterations) == DeferPollingPageLoopCount) {
+ if ((DeferPollingPageLoopCount >= 0) &&
+     (int(iterations) == DeferPollingPageLoopCount)) {
   guarantee (PageArmed == 0, "invariant") ;
   PageArmed = 1 ;
   os::make_polling_page_unreadable();
--------


If this is correct, I will file it in JBS and upload a webrev.
Could you help me to resolve this issue?


Thanks,

Yasumasa

From david.holmes at oracle.com Tue Nov 25 07:04:20 2014
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 25 Nov 2014 17:04:20 +1000
Subject: guarantee(PageArmed == 0) failed: invaliant
In-Reply-To: <5473F8D4.8000107@gmail.com>
References: <5473F8D4.8000107@gmail.com>
Message-ID: <547429F4.2020803@oracle.com>

Hi Yasumasa,

On 25/11/2014 1:34 PM, Yasumasa Suenaga wrote:
> Hi all,
>
> My customer encountered crash with below messages:
> --------
> Internal Error (safepoint.cpp:309)
> guarantee(PageArmed == 0) failed: invaliant
> --------
> - JDK: JDK6u37 x64
> - OS: RHEL 5.4 x86_64
>
> I found similar issues in JBS:
> - JDK-7116986
> - JDK-7156454
> - JDK-8033717
>
> I read safepoint.cpp in jdk9, I guess this error is caused in below:
> --------
> if (int(iterations) == DeferPollingPageLoopCount) {
>   guarantee (PageArmed == 0, "invariant") ;
>   PageArmed = 1 ;
>   os::make_polling_page_unreadable();
> }
> --------
>
> "iterations" is defined as "unsigned int", and increments in each loop.
> On the other hand, DeferPollingPageLoopCount is defined intx and default
> value is "-1" .
>
> "PageArmed" sets to 1.
> --------
> if (DeferPollingPageLoopCount < 0) {
>   // Make polling safepoint aware
>   guarantee (PageArmed == 0, "invariant") ;
>   PageArmed = 1 ;
>   os::make_polling_page_unreadable();
> }
> --------
>
>
> If "iterations" is overflowed, do we encounter this guarantee ?
> I think this "if" statement should rewrite as below:

No, we want this overflow to trigger the guarantee failure - it indicates
a problem elsewhere in the VM because a thread is not reaching the
safepoint that has been requested, in a timely manner.

When crashes like this occur you need to examine all the running threads
to find out which are not safepoint-safe and then determine what they
are doing and why they have not performed a safepoint check.

David
------

> --------
> diff -r 7e08ae41ddbe src/share/vm/runtime/safepoint.cpp
> --- a/src/share/vm/runtime/safepoint.cpp Mon Nov 24 09:57:02 2014 +0100
> +++ b/src/share/vm/runtime/safepoint.cpp Tue Nov 25 12:19:58 2014 +0900
> @@ -288,7 +288,8 @@
> // 9. On windows consider using the return value from SwitchThreadTo()
> // to drive subsequent spin/SwitchThreadTo()/Sleep(N) decisions.
>
> - if (int(iterations) == DeferPollingPageLoopCount) {
> + if ((DeferPollingPageLoopCount >= 0) &&
> +     (int(iterations) == DeferPollingPageLoopCount)) {
>   guarantee (PageArmed == 0, "invariant") ;
>   PageArmed = 1 ;
>   os::make_polling_page_unreadable();
> --------
>
>
> If it is correct, I will file it to JBS and upload webrev.
> Could you help me to resolve this issue?
>
>
> Thanks,
>
> Yasumasa
>

From david.holmes at oracle.com Tue Nov 25 08:38:58 2014
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 25 Nov 2014 18:38:58 +1000
Subject: RFR: JDK-8059586: hs_err report should treat redirected core pattern.
In-Reply-To: <547330E5.1050708@gmail.com>
References: <542C8274.3010809@gmail.com> <54338B70.9080709@oracle.com> <543B1FD6.3000200@oracle.com> <543CF553.80601@gmail.com> <543DC2BF.9050407@oracle.com> <543E80F8.3080204@gmail.com> <547330E5.1050708@gmail.com>
Message-ID: <54744022.2030208@oracle.com>

Sorry Yasumasa, this fell off my radar and I was hoping for others to
comment. We still need a second reviewer.

The change in:

src/os/aix/vm/os_aix.cpp
src/os/solaris/vm/os_solaris.cpp

  jio_snprintf(buffer, bufferSize, "%s/core or core.%d", current_process_id());

has no argument for the %s - presumably p was intended.

Thanks,
David

On 24/11/2014 11:21 PM, Yasumasa Suenaga wrote:
> Hi all,
>
> I've uploaded webrev for this issue about a month ago.
> Could you review it and sponsor it?
>
>
> Thanks,
>
> Yasumasa

From yasuenag at gmail.com Tue Nov 25 08:48:33 2014
From: yasuenag at gmail.com (Yasumasa Suenaga)
Date: Tue, 25 Nov 2014 17:48:33 +0900
Subject: guarantee(PageArmed == 0) failed: invaliant
In-Reply-To: <547429F4.2020803@oracle.com>
References: <5473F8D4.8000107@gmail.com> <547429F4.2020803@oracle.com>
Message-ID:

Hi David,

Thank you for details.
I can understand purpose for this guarantee.

I read hs_err again, I found thread which state is _thread_new .
I guess it is reason of this issue, but I cannot evaluate because core
image was not available.

If this crash will be reproduced, I will try check details.

Thanks,

Yasumasa

2014/11/25 16:04 "David Holmes" :
> Hi Yasumasa,
>
> On 25/11/2014 1:34 PM, Yasumasa Suenaga wrote:
> > Hi all,
> >
> > My customer encountered crash with below messages:
> > --------
> > Internal Error (safepoint.cpp:309)
> > guarantee(PageArmed == 0) failed: invaliant
> > --------
> > - JDK: JDK6u37 x64
> > - OS: RHEL 5.4 x86_64
> >
> > I found similar issues in JBS:
> > - JDK-7116986
> > - JDK-7156454
> > - JDK-8033717
> >
> > I read safepoint.cpp in jdk9, I guess this error is caused in below:
> > --------
> > if (int(iterations) == DeferPollingPageLoopCount) {
> >   guarantee (PageArmed == 0, "invariant") ;
> >   PageArmed = 1 ;
> >   os::make_polling_page_unreadable();
> > }
> > --------
> >
> > "iterations" is defined as "unsigned int", and increments in each loop.
> > On the other hand, DeferPollingPageLoopCount is defined intx and default
> > value is "-1" .
> > > > "PageArmed" sets to 1. > > -------- > > if (DeferPollingPageLoopCount < 0) { > > // Make polling safepoint aware > > guarantee (PageArmed == 0, "invariant") ; > > PageArmed = 1 ; > > os::make_polling_page_unreadable(); > > } > > -------- > > > > > > If "iterations" is overflowed, do we encounter this guarantee ? > > I think this "if" statement should rewrite as below: > > No we want this overflow to trigger the guarantee failure - it indicates > a problem elsewhere in the VM because a thread is not reaching the > safepoint that has been requested, in a timely manner. > > When crashes like this occur you need to examine all the running threads > to find out which are not safepoint-safe and then determine what they > are doing and why they have not performed a safepoint check. > > David > ------ > > > -------- > > diff -r 7e08ae41ddbe src/share/vm/runtime/safepoint.cpp > > --- a/src/share/vm/runtime/safepoint.cpp Mon Nov 24 09:57:02 2014 > +0100 > > +++ b/src/share/vm/runtime/safepoint.cpp Tue Nov 25 12:19:58 2014 > +0900 > > @@ -288,7 +288,8 @@ > > // 9. On windows consider using the return value from > SwitchThreadTo() > > // to drive subsequent spin/SwitchThreadTo()/Sleep(N) > decisions. > > > > - if (int(iterations) == DeferPollingPageLoopCount) { > > + if ((DeferPollingPageLoopCount >= 0) && > > + (int(iterations) == DeferPollingPageLoopCount)) { > > guarantee (PageArmed == 0, "invariant") ; > > PageArmed = 1 ; > > os::make_polling_page_unreadable(); > > -------- > > > > > > If it is correct, I will file it to JBS and upload webrev. > > Could you help me to resolve this issue? > > > > > > Thanks, > > > > Yasumasa > > > From staffan.larsen at oracle.com Tue Nov 25 09:15:34 2014 From: staffan.larsen at oracle.com (Staffan Larsen) Date: Tue, 25 Nov 2014 10:15:34 +0100 Subject: RFR: JDK-8059586: hs_err report should treat redirected core pattern. 
In-Reply-To: <547330E5.1050708@gmail.com>
References: <542C8274.3010809@gmail.com> <54338B70.9080709@oracle.com> <543B1FD6.3000200@oracle.com> <543CF553.80601@gmail.com> <543DC2BF.9050407@oracle.com> <543E80F8.3080204@gmail.com> <547330E5.1050708@gmail.com>
Message-ID:

src/os/bsd/vm/os_linux.cpp:
I'm inclined to think this is too complicated and hard to test and maintain (and I see no tests in the webrev). Could we not simplify this to print a helpful message instead? Something that prints the core_pattern and perhaps some of the values that could be used for substitution, but does not do the actual substitution? I think that would go a long way but be a lot more maintainable.

src/os/bsd/vm/os_bsd.cpp:
On OS X cores are by default written to /cores/core.. This is configurable with the kern.corefile sysctl variable, although it is rare to do so.

/Staffan

> On 24 nov 2014, at 14:21, Yasumasa Suenaga wrote:
>
> Hi all,
>
> I've uploaded webrev for this issue about a month ago.
> Could you review it and sponsor it?
>
>
> Thanks,
>
> Yasumasa

From thomas.stuefe at gmail.com Tue Nov 25 14:12:19 2014
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Tue, 25 Nov 2014 15:12:19 +0100
Subject: RFR(s): 8065895: Synchronous signals during error reporting may terminate or hang VM process
Message-ID:

Hi all,

I'd like to contribute a fix to error handling to improve stability of
error reporting.

Bug Report: https://bugs.openjdk.java.net/browse/JDK-8065895
Webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8065895/webrev.00/

Problem:

When a synchronous error signal happens during error reporting, and the
signal is different from the original signal which triggered error
reporting, VM may die or hang (depends on platform). This causes empty or
almost-empty hs-err files.

Example: we first crash with a SIGILL (e.g in compiled code), then a
SIGSEGV happens when printing stack trace. Secondary error handling should
catch the SIGSEGV and continue error reporting with the next step. But that
does not work in this case.
Causes: - hotspot blocks all signals when installing signal handlers. Within the secondary signal handler, only the original signal gets unblocked, the rest remained blocked. If another synchronous error signal happens, it is still blocked. If the second signal is a synchronous signal, the OS would terminate the process right away because there is no way to defer synchronous error signals. - when installing signal handlers for secondary error handling, only signal handlers for SIGBUS and SIGSEGV were added; but more signals may happen during error handling (we saw SIGTRAP, SIGILL, ..etc). Fix: secondary signal handler is installed for all synchronous error signals (which is now a list and easily expandable in vmError_.cpp). All those signals are unblocked. In order to test the fix, some test code was added too: a) debug.cpp: changed "test_error_handler()" to a more generic "controlled_crash(int how)", which can be called at arbitrary places, not only at initialization time. "test_error_handler()" still exists and just calls "controlled_crash(ErrorHandlerTest)", so its behaviour did not change. b) expand controlled_crash(): - added option 14, a guaranteed crash with a SIGSEGV at a predefined address, which is printed out and can later be tested against. Note that I realize that this is a bit redundant to option 12 or 13, but the crash is guaranteed and it crashes with a not-null address which should turn up in hs-err file (to check that hs-err file is correct). - added option 15, a guaranteed crash with a SIGILL at a predefined instruction address. Here, the point is to get a real-world SIGILL (not just raising it) at a not-null known pc. c) Add a parameter "-XX:TestCrashDuringErrorHandler=", which works the same as "-XX:ErrorHandlerTest=". This parameter is used to trigger controlled crashes inside the error handler. That way secondary error handling can be tested. 
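As a rough illustration of the fix described above (one handler installed for a whole list of synchronous error signals), here is a minimal sketch assuming a plain POSIX environment. The names kSyncSignals, secondary_handler and install_secondary_handlers are illustrative only and are not taken from the webrev; the real handler continues error reporting, whereas this one merely records the signal number.

```cpp
#include <cassert>
#include <cstring>
#include <signal.h>

// Hypothetical list of synchronous error signals; the list in the actual
// change may differ. Covering a further signal only means adding it here.
static const int kSyncSignals[] = { SIGSEGV, SIGBUS, SIGILL, SIGFPE, SIGTRAP };
static const int kNumSyncSignals =
    static_cast<int>(sizeof(kSyncSignals) / sizeof(kSyncSignals[0]));

// Last signal seen by the handler (sig_atomic_t is safe to write in a handler).
static volatile sig_atomic_t g_last_signal = 0;

// Stand-in for the secondary crash handler: it only records the signal.
static void secondary_handler(int sig) {
  g_last_signal = sig;
}

// Installs the same handler for every signal in the list.
// Returns 0 on success, -1 if any sigaction() call fails.
static int install_secondary_handlers() {
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = secondary_handler;
  sa.sa_flags = 0;
  int rc = sigemptyset(&sa.sa_mask);
  assert(rc == 0);
  (void)rc;
  for (int i = 0; i < kNumSyncSignals; i++) {
    if (sigaction(kSyncSignals[i], &sa, nullptr) != 0) {
      return -1;
    }
  }
  return 0;
}
```

After install_secondary_handlers() has run, delivering one of the listed signals with raise() invokes the handler and returns normally, which is a safe way to check the installation. A genuine fault (for example a real SIGSEGV) would not allow a normal return.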
(a)-(c) allow us to test the fixes manually, for example: java -XX:ErrorHandlerTest=15 -XX:TestCrashDuringErrorHandler=14 causes a SIGILL during initialization, and a secondary SIGSEGV inside error handling. This demonstrates the effect of the bug. Without the fix, the VM will abort right away without finishing the hs-err file. -- I am in the process of writing some JTreg Tests, but I would like to put those into a separate change. This is because there are more fixes to error reporting in our pipeline and I'd like to bundle the jtreg tests in one change. Kind Regards, Thomas Stuefe From david.holmes at oracle.com Wed Nov 26 01:29:28 2014 From: david.holmes at oracle.com (David Holmes) Date: Wed, 26 Nov 2014 11:29:28 +1000 Subject: RFR(s): 8065895: Synchronous signals during error reporting may terminate or hang VM process In-Reply-To: References: Message-ID: <54752CF8.5070408@oracle.com> Hi Thomas, A few quick comments as I need to think more about this: - On Solaris we use the UI thread API thr_* not pthreads - In debug.cpp for the SIGILL can you define the all zero case as a default so we only need to add platform specific definitions when all zeroes doesn't work. I really hate seeing all that CPU selection in shared code. :( - Style nit: please use i++ rather than i ++ Aside: we should eradicate the use of sigprocmask and replace with the thread specific version. Getting back to the "thinking more about this" ... If a synchronous signal is blocked at the time it is generated then it should remain pending on the thread (POSIX spec) but that doesn't tell us what the thread will then do - retry the faulting instruction? Become unschedulable? So I can easily imagine that a hang or process termination may result. In that sense unblocking those signals whilst handling the initial signal may well allow the error reporting process to continue further. 
But I'm unclear exactly how this plays out: - synchronous signal encountered - crash_handler invoked - VMError::report_and_die executes - secondary signal encountered - crash_handler invoked again - VMError::report_and_die executes again and sees the recursion and returns (ignoring abort due to excessive recursive errors) Is that right? So we actually return from the crash_handler? Because this puts us in undefined territory according to POSIX: "The behavior of a process is undefined after it returns normally from a signal-catching function for a SIGBUS, SIGFPE, SIGILL, or SIGSEGV signal that was not generated by kill(), sigqueue(), or raise()." On top of that you also have the issue that error reporting does a whole bunch of things that are not async-signal-safe so we can easily encounter hangs or aborts. But we're dying anyway so I guess none of this really matters. If re-enabling these signals allows error reporting to progress further in some cases then that is a win. Cheers, David
From yumin.qi at oracle.com Wed Nov 26 01:36:47 2014 From: yumin.qi at oracle.com (Yumin Qi) Date: Tue, 25 Nov 2014 17:36:47 -0800 Subject: RFR(XS): 8053995: Add method to WhiteBox to get vm pagesize.
In-Reply-To: <54752EAF.4020404@oracle.com> References: <53DAC336.6050302@oracle.com> <54752EAF.4020404@oracle.com> Message-ID: <547532C0.4080500@oracle.com> Hi Yumin, On 26/11/2014 11:36 AM, Yumin Qi wrote: > Please review > > bugs: https://bugs.openjdk.java.net/browse/JDK-8053995 > webrev: http://cr.openjdk.java.net/~minqi/8053995/webrev01/ The test also needs to ensure the testlibrary gets built. Otherwise seems okay. Thanks, David > Now the API usage is in internal test case, see separate email for the > webrev. > > It is same as previous version (webrev00). > > Thanks > Yumin > > On 7/31/14, 3:29 PM, Yumin Qi wrote: >> Please review: >> >> http://cr.openjdk.java.net/~minqi/8053995/webrev00/ >> >> Summary: Currently there is no java API to get underlying OS native VM >> page size unless using Unsafe which is not recommended. The new added >> method to WhiteBox can read this property and used in test. >> >> >> Tests: JPRT and jtreg. >> >> Thanks >> Yumin From yasuenag at gmail.com Wed Nov 26 03:39:33 2014 From: yasuenag at gmail.com (Yasumasa Suenaga) Date: Wed, 26 Nov 2014 12:39:33 +0900 Subject: RFR: JDK-8059586: hs_err report should treat redirected core pattern. In-Reply-To: <54744022.2030208@oracle.com> References: <542C8274.3010809@gmail.com> <54338B70.9080709@oracle.com> <543B1FD6.3000200@oracle.com> <543CF553.80601@gmail.com> <543DC2BF.9050407@oracle.com> <543E80F8.3080204@gmail.com> <547330E5.1050708@gmail.com> <54744022.2030208@oracle.com> Message-ID: Hi David, Thank you for reviewing! I will fix it after discussion with Staffan. Thanks Yasumasa 2014/11/25 17:39 "David Holmes" : > Sorry Yasumasa, this fell off my radar and I was hoping for others to > comment. We still need a second reviewer. > > The change in: > src/os/aix/vm/os_aix.cpp > src/os/solaris/vm/os_solaris.cpp > > jio_snprintf(buffer, bufferSize, "%s/core or core.%d", > current_process_id()); > > has no argument for the %s - presumably p was intended. 
> > Thanks, > David >
From yasuenag at gmail.com Wed Nov 26 03:54:48 2014 From: yasuenag at gmail.com (Yasumasa Suenaga) Date: Wed, 26 Nov 2014 12:54:48 +0900 Subject: RFR: JDK-8059586: hs_err report should treat redirected core pattern. In-Reply-To: References: <542C8274.3010809@gmail.com> <54338B70.9080709@oracle.com> <543B1FD6.3000200@oracle.com> <543CF553.80601@gmail.com> <543DC2BF.9050407@oracle.com> <543E80F8.3080204@gmail.com> <547330E5.1050708@gmail.com> Message-ID: Hi Staffan, Thank you for reviewing! os_linux.cpp: I want to print coredump location correctly to hs_err. So I want to output whether coredump is processed in other process or is written to file. If os::get_core_path() should be more simply, I will print raw string in core_pattern. os_bsd.cpp: I don't have OS X. So I cannot check it. I am focusing Linux in this enhancement. Could you file it as another enhancement if it need? Thanks, Yasumasa 2014/11/25 18:15 "Staffan Larsen" : > src/os/bsd/vm/os_linux.cpp: > I'm inclined to think this is too complicated and hard to test and > maintain (and I see no tests in the webrev). Could we not simplify this to > print a helpful message instead? Something that prints the core_pattern and > perhaps some of the values that could be used for substitution, but does > not do the actual substitution? I think that would go a long way but be a > lot more maintainable. > > src/os/bsd/vm/os_bsd.cpp: > On OS X cores are by default written to /cores/core.. This is > configureable with the kern.corefile sysctl variable, although it is rare > to do so. > > /Staffan > > > On 24 nov 2014, at 14:21, Yasumasa Suenaga wrote: > > > > Hi all, > > > > I've uploaded webrev for this issue about a month ago.
> > Could you review it and sponsor it? > > > > > > Thanks, > > > > Yasumasa > > > > > > On 10/15/2014 11:13 PM, Yasumasa Suenaga wrote: > >> Hi David, > >> > >> I've uploaded new webrev: > >> http://cr.openjdk.java.net/~ysuenaga/JDK-8059586/webrev.02/ > >> > >> > >>> I wasn't suggesting that you make such a change though because it is > large and disruptive. > >> > >>> Unfactoring check_or_create_dump is a step backwards in terms of code > sharing. > >> > >> I restored check_or_create_dump() to os_posix.cpp . > >> And I changed get_core_path() to create message which represents core > dump path > >> (including filename) in each OS. > >> > >> > >>> Expanding the get_core_path in os_linux.cpp to handle the core_pattern > may be okay (but I don't know enough about it to validate everything). > >> > >> I implemented all parameters in Linux kernel documentation: > >> https://www.kernel.org/doc/Documentation/sysctl/kernel.txt > >> > >> So I think that parameters which are processed are enough. > >> > >> > >> Thanks, > >> > >> Yasumasa > >> > >> > >> > >> (2014/10/15 9:41), David Holmes wrote: > >>> On 14/10/2014 8:05 PM, Yasumasa Suenaga wrote: > >>>> Hi David, > >>>> > >>>> Thank you for comments! > >>>> I've uploaded new webrev. Could you review it again? > >>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8059586/webrev.01/ > >>>> > >>>> I am an author of jdk9. So I cannot commit it. > >>>> Could you be a sponsor for this enhancement? > >>>> > >>>> > >>>>> In which case that should be handled by the linux specific > >>>>> get_core_path() function. > >>>> > >>>> Agree. > >>>> So I implemented it in os_linux.cpp . > >>>> But part of format characters (%P: global pid, %s: signal, %t dump > time) > >>>> are not processed > >>>> in this function because I think these parameters are difficult to > >>>> handle in it. > >>>> > >>>> %P: I could not find API for this. > >>>> %s: We have to change arguments of get_core_path() . > >>>> %t: This parameter means timestamp of coredump. 
It is decided in > Kernel. > >>>> > >>>> > >>>>> Fixing this means changing all the os_posix using platforms. But your > >>>>> patch is not about this part. :) > >>>> > >>>> I moved os::check_or_create_dump() to each OS implementations (AIX, > BSD, > >>>> Solaris, Linux) . > >>>> So I can write Linux specific code to check_or_create_dump() . > >>>> As a result, I could remove "#ifdef LINUX" from os_posix.cpp :-) > >>> > >>> I wasn't suggesting that you make such a change though because it is > large and disruptive. The simple handling of the | part of core_pattern was > basically ok. Expanding the get_core_path in os_linux.cpp to handle the > core_pattern may be okay (but I don't know enough about it to validate > everything). Unfactoring check_or_create_dump is a step backwards in terms > of code sharing. > >>> > >>> Sorry this has grown too large for me to deal with right now. > >>> > >>> David > >>> ----- > >>> > >>>> > >>>>> Though I'm unclear whether it both invokes the program and creates a > >>>>> core dump file; or just invokes the program? > >>>> > >>>> If '|' is set, Linux kernel will just redirect core image to user > process. > >>>> Kernel documentation says as below: > >>>> ------------ > >>>> . If the first character of the pattern is a '|', the kernel will > treat > >>>> the rest of the pattern as a command to run. The core dump will be > >>>> written to the standard input of that program instead of to a file. > >>>> ------------ > >>>> > >>>> And implementation of coredump (do_coredump()) follows to it. > >>>> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/coredump.c > >>>> > >>>> > >>>> In case of ABRT, ABRT dumps core image to default location > >>>> (/core.) > >>>> if user set unlimited to resource limit of core (ulimit -c) . 
> >>>> https://github.com/abrt/abrt/blob/master/src/hooks/abrt-hook-ccpp.c > >>>> > >>>> > >>>>> A few style nits - you need spaces around keywords and before braces > >>>>> I also suggest saying "Core dumps may be processed with ..." rather > >>>>> than "treated". > >>>>> And as you don't do anything in the non-redirect case I suggest > >>>>> collapsing this: > >>>> > >>>> I've fixed them. > >>>> > >>>> > >>>> Thanks, > >>>> > >>>> Yasumasa > >>>> > >>>> > >>>> (2014/10/13 9:41), David Holmes wrote: > >>>>> Hi Yasumasa, > >>>>> > >>>>> On 7/10/2014 8:48 PM, Yasumasa Suenaga wrote: > >>>>>> Hi David, > >>>>>> > >>>>>> Sorry for my English. > >>>>>> > >>>>>> I want to propose that JVM should create message according to core > >>>>>> pattern (/proc/sys/kernel/core_pattern) . > >>>>>> So I filed it to JBS and created a patch. > >>>>> > >>>>> So I've had a quick look at this core_pattern business and it seems > to > >>>>> me that there are two aspects to this. > >>>>> > >>>>> First, without the leading |, the entry in the core_pattern file is a > >>>>> naming pattern for the core file. In which case that should be > handled > >>>>> by the linux specific get_core_path() function. Though that in itself > >>>>> can't fully report the expected name, as part of it is provided in > the > >>>>> shared code in os::check_or_create_dump. Fixing this means changing > >>>>> all the os_posix using platforms. But your patch is not about this > >>>>> part. :) > >>>>> > >>>>> Second, with a leading | the core_pattern is actually the name of a > >>>>> program to execute when the program is about to core dump, and that > is > >>>>> what you report with your patch. Though I'm unclear whether it both > >>>>> invokes the program and creates a core dump file; or just invokes the > >>>>> program? > >>>>> > >>>>> So with regards to this second part your patch seems functionally ok. > >>>>> I do dislike having a big chunk of linux specific code in this > "posix" > >>>>> support file but ... 
> >>>>> > >>>>> A few style nits - you need spaces around keywords and before braces > eg: > >>>>> > >>>>> if(x){ > >>>>> > >>>>> should be > >>>>> > >>>>> if (x) { > >>>>> > >>>>> I also suggest saying "Core dumps may be processed with ..." rather > >>>>> than "treated". > >>>>> > >>>>> And as you don't do anything in the non-redirect case I suggest > >>>>> collapsing this: > >>>>> > >>>>> 83 is_redirect = core_pattern[0] == '|'; > >>>>> 84 } > >>>>> 85 > >>>>> 86 if(is_redirect){ > >>>>> 87 jio_snprintf(buffer, bufferSize, > >>>>> 88 "Core dumps may be treated with \"%s\"", > >>>>> &core_pattern[1]); > >>>>> 89 } > >>>>> > >>>>> to just > >>>>> > >>>>> 83 if (core_pattern[0] == '|') { // redirect > >>>>> 84 jio_snprintf(buffer, bufferSize, "Core dumps may be > >>>>> processed with \"%s\"", &core_pattern[1]); > >>>>> 85 } > >>>>> 86 } > >>>>> > >>>>> Comments from other runtime folk appreciated. > >>>>> > >>>>> Thanks, > >>>>> David > >>>>> > >>>>>> Thanks, > >>>>>> > >>>>>> Yasumasa > >>>>>> > >>>>>> 2014/10/07 15:43 "David Holmes" >>>>>> >: > >>>>>> > >>>>>> Hi Yasumasa, > >>>>>> > >>>>>> I'm sorry but I don't understand what you are proposing. When you > >>>>>> say > >>>>>> "treat" do you mean "create"? Otherwise what do you mean by > >>>>>> "treated"? > >>>>>> > >>>>>> Thanks, > >>>>>> David > >>>>>> > >>>>>> On 2/10/2014 8:38 AM, Yasumasa Suenaga wrote: > >>>>>> > I'm in Hackergarten @ JavaOne :-) > >>>>>> > > >>>>>> > > >>>>>> > Hi all, > >>>>>> > > >>>>>> > I would like to enhance the messages in hs_err report. > >>>>>> > Modern Linux kernel can treat core dump with user process > >>>>>> (e.g. ABRT) > >>>>>> > However, hs_err report cannot detect it. > >>>>>> > > >>>>>> > I think that hs_err report should output messages as below: > >>>>>> > ------------- > >>>>>> > Failed to write core dump. 
Core dumps may be treated with > >>>>>> "/usr/sbin/chroot /proc/%P/root /usr/libexec/abrt-hook-ccpp %s > %c %p > >>>>>> %u %g %t e" > >>>>>> > ------------- > >>>>>> > > >>>>>> > I've uploaded webrev of this enhancement. > >>>>>> > Could you review it? > >>>>>> > > >>>>>> > http://cr.openjdk.java.net/~ysuenaga/JDK-8059586/webrev.00/ > >>>>>> > > >>>>>> > This patch works fine on Fedora20 x86_64. > >>>>>> > > >>>>>> > > >>>>>> > > >>>>>> > Thanks, > >>>>>> > > >>>>>> > Yasumasa > >>>>>> > > >>>>>> > > From thomas.stuefe at gmail.com Wed Nov 26 07:06:52 2014 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Wed, 26 Nov 2014 08:06:52 +0100 Subject: RFR(s): 8065895: Synchronous signals during error reporting may terminate or hang VM process In-Reply-To: <54752CF8.5070408@oracle.com> References: <54752CF8.5070408@oracle.com> Message-ID: Hi David, thanks for looking at this. Here is the updated webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8065895/webrev.01/ See my comments below. On Wed, Nov 26, 2014 at 2:29 AM, David Holmes wrote: > Hi Thomas, > > A few quick comments as I need to think more about this: > > - On Solaris we use the UI thread API thr_* not pthreads > Fixed, now I use thr_sigsetmask() (though both sigprocmask and pthread_sigmask seemed to work too) > - In debug.cpp for the SIGILL can you define the all zero case as a > default so we only need to add platform specific definitions when all > zeroes doesn't work. I really hate seeing all that CPU selection in shared > code. :( > Agreed and fixed, moved the CPU-specific sections into CPU-specific files. > - Style nit: please use i++ rather than i ++ > > Fixed. Aside: we should eradicate the use of sigprocmask and replace with the > thread specific version. > > Agree. Though I never saw any errors stemming from the use of sigprocmask(). According to POSIX, sigprocmask() is undefined in multithreaded environment, and I guess most OSes just default to pthread_sigmask. 
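For illustration, unblocking the synchronous error signals for the calling thread only (rather than through sigprocmask()) could look roughly like the sketch below, which uses pthread_sigmask(). The signal list and the helper names unblock_sync_signals and is_blocked are assumptions made for this sketch; they are not the code from the webrev, which uses thr_sigsetmask() on Solaris.

```cpp
#include <cassert>
#include <pthread.h>
#include <signal.h>

// Hypothetical list of synchronous error signals; the real list may differ.
static const int kSyncSignals[] = { SIGSEGV, SIGBUS, SIGILL, SIGFPE, SIGTRAP };
static const int kNumSyncSignals =
    static_cast<int>(sizeof(kSyncSignals) / sizeof(kSyncSignals[0]));

// Removes the listed signals from the calling thread's signal mask and
// leaves every other signal as it was. Returns 0 on success, otherwise the
// error number reported by pthread_sigmask().
static int unblock_sync_signals() {
  sigset_t set;
  sigemptyset(&set);
  for (int i = 0; i < kNumSyncSignals; i++) {
    sigaddset(&set, kSyncSignals[i]);
  }
  return pthread_sigmask(SIG_UNBLOCK, &set, nullptr);
}

// Returns true if 'sig' is currently blocked in the calling thread.
// Passing a null new set only queries the current mask.
static bool is_blocked(int sig) {
  sigset_t current;
  int rc = pthread_sigmask(SIG_SETMASK, nullptr, &current);
  assert(rc == 0);
  (void)rc;
  return sigismember(&current, sig) == 1;
}
```

A handler entered with everything blocked would call unblock_sync_signals() first, so that a secondary SIGILL or SIGSEGV reaches the secondary handler instead of being generated while blocked.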
> Getting back to the "thinking more about this" ... If a synchronous signal
> is blocked at the time it is generated then it should remain pending on the
> thread (POSIX spec) but that doesn't tell us what the thread will then do -
> retry the faulting instruction? Become unschedulable? So I can easily
> imagine that a hang or process termination may result.

This is exactly what happens, but it is actually covered by POSIX, see the doc on pthread_sigmask:

"If any of the SIGFPE, SIGILL, SIGSEGV, or SIGBUS signals are generated while they are blocked, the result is undefined, unless the signal was generated by the kill() function, the sigqueue() function, or the raise() function."

In reality, the process usually aborts abnormally with the default action for the signal, e.g. printing out "Illegal Instruction". On MacOS, we hang (until the WatcherThread finally kills the VM). On old AIXes, we die without a trace.

This can also easily be tried out by removing SIGILL from the list of signals in vmError_.cpp and executing:

java -XX:ErrorHandlerTest=14 -XX:TestCrashInErrorHandler=15

which will crash first with a SIGSEGV, then in error handling with a secondary SIGILL. This will interrupt error reporting and kill or hang the process.

> In that sense unblocking those signals whilst handling the initial signal
> may well allow the error reporting process to continue further. But I'm
> unclear exactly how this plays out:
>
> - synchronous signal encountered
> - crash_handler invoked
> - VMError::report_and_die executes
> - secondary signal encountered
> - crash_handler invoked again

Almost: not again, it is a different signal handler now. The first signal was handled by "JVM_handle__signal()".

> - VMError::report_and_die executes again and sees the recursion and
> returns (ignoring abort due to excessive recursive errors)

No.

> Is that right? So we actually return from the crash_handler?

Oh, but we don't return.
VMError::report_and_die() will just create a new frame and re-execute
VMError::report() of the first VMError object, which will then continue with
the next STEP. We never return; for each secondary error signal a new frame
is created.

This all happens in VMError::report_and_die:
-> first error? anchor VMError object in a static variable and execute
   VMError::report()
-> secondary error?
   -> different thread? just sleep forever
   -> same thread? new frame, re-enter VMError::report(). Once done, abort.

I always found that rather neat, but in fact that is not our invention but
Sun's :) Anyway, my fix does not change this behaviour for better or worse,
it only makes it usable for more cases.

> Because this puts us in undefined territory according to POSIX:
>
> "The behavior of a process is undefined after it returns normally from a
> signal-catching function for a SIGBUS, SIGFPE, SIGILL, or SIGSEGV signal
> that was not generated by kill(), sigqueue(), or raise()."

True, but we don't return...

> On top of that you also have the issue that error reporting does a whole
> bunch of things that are not async-signal-safe so we can easily encounter
> hangs or aborts.
>
> But we're dying anyway so I guess none of this really matters. If
> re-enabling these signals allows error reporting to progress further in
> some cases then that is a win.

Actually, this covers a lot of cases, mostly because a SIGSEGV during error
reporting is common. So if the original error was not a SIGSEGV but, e.g., a
SIGILL, this would always result in broken hs-err files.

The back story is that at SAP, we rely heavily on the hs-err files. They are
our main tool for support, because working with cores is often not feasible.
So we put a lot of work into making error reporting reliable across all
platforms. This is also covered by many tests which crash the VM in exciting
ways and check the hs-err files for completeness.
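The re-entry scheme described above can be modelled with a toy example in C. This is only a sketch under stated assumptions, not HotSpot code: the step count, FAULTING_STEP, run_step(), report() and this report_and_die() are hypothetical stand-ins, the secondary signal is simulated by calling the handler directly, and a flag stands in for the final abort so that the example can return.

```c
#include <assert.h>

enum { NUM_STEPS = 5, FAULTING_STEP = 2 };

/* State kept outside any frame, like the anchored first error object. */
static int current_step = 0;
static int finished = 0;
static int step_completed[NUM_STEPS];
static int max_depth = 0;

static void report_and_die(int depth);

/* Runs one reporting step. The faulting step never completes: the simulated
 * secondary error enters report_and_die() again on a new, deeper frame. */
static void run_step(int step, int depth) {
    assert(step >= 0 && step < NUM_STEPS);
    if (step == FAULTING_STEP) {
        report_and_die(depth + 1); /* stands in for the secondary signal */
        return;
    }
    step_completed[step] = 1;
}

/* Runs the remaining steps. The counter is advanced before the step runs,
 * so a re-entry continues with the next step, not the faulting one. */
static void report(int depth) {
    while (!finished && current_step < NUM_STEPS) {
        int step = current_step++;
        run_step(step, depth);
    }
}

/* First entry and every secondary entry on the same thread end up here. */
static void report_and_die(int depth) {
    if (depth > max_depth) {
        max_depth = depth;
    }
    report(depth);
    finished = 1; /* the real VM aborts at this point and never returns */
}
```

Calling report_and_die(0) in this model completes every step except the faulting one, using one extra frame for the single simulated secondary error. It also shows why stack usage is bounded by the number of reporting steps: each nested entry consumes at least one step.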
Kind Regards, Thomas

> Cheers,
> David
>
>
> On 26/11/2014 12:12 AM, Thomas Stüfe wrote:
>
>> Hi all,
>>
>> I'd like to contribute a fix to error handling to improve stability of
>> error reporting.
>>
>>
>> Bug Report:
>> https://bugs.openjdk.java.net/browse/JDK-8065895
>>
>>
>> Webrev:
>> http://cr.openjdk.java.net/~stuefe/webrevs/8065895/webrev.00/
>>
>>
>> Problem:
>>
>> When a synchronous error signal happens during error reporting, and the
>> signal is different from the original signal which triggered error
>> reporting, VM may die or hang (depends on platform). This causes empty or
>> almost-empty hs-err files.
>>
>> Example: we first crash with a SIGILL (e.g in compiled code), then a
>> SIGSEGV happens when printing stack trace.
>>
>> Secondary error handling should catch the SIGSEGV and continue error
>> reporting with the next step. But that does not work in this case.
>>
>> Causes:
>> - hotspot blocks all signals when installing signal handlers. Within the
>> secondary signal handler, only the original signal gets unblocked, the rest
>> remained blocked. If another synchronous error signal happens, it is still
>> blocked. If the second signal is a synchronous signal, the OS would
>> terminate the process right away because there is no way to defer
>> synchronous error signals.
>> - when installing signal handlers for secondary error handling, only
>> signal handlers for SIGBUS and SIGSEGV were added; but more signals may
>> happen during error handling (we saw SIGTRAP, SIGILL, ..etc).
>>
>> Fix:
>> secondary signal handler is installed for all synchronous error signals
>> (which is now a list and easily expandable in vmError_.cpp). All those
>> signals are unblocked.
>>
>> In order to test the fix, some test code was added too:
>>
>> a) debug.cpp: changed "test_error_handler()" to a more generic
>> "controlled_crash(int how)", which can be called at arbitrary places, not
>> only at initialization time.
"test_error_handler()" still exists and just >> calls "controlled_crash(ErrorHandlerTest)", so its behaviour did not >> change. >> >> b) expand controlled_crash(): >> - added option 14, a guaranteed crash with a SIGSEGV at a predefined >> address, which is printed out and can later be tested against. Note that I >> realize that this is a bit redundant to option 12 or 13, but the crash is >> guaranteed and it crashes with a not-null address which should turn up in >> hs-err file (to check that hs-err file is correct). >> - added option 15, a guaranteed crash with a SIGILL at a predefined >> instruction address. Here, the point is to get a real-world SIGILL (not >> just raising it) at a not-null known pc. >> >> c) Add a parameter "-XX:TestCrashDuringErrorHandler=", which works the >> same as "-XX:ErrorHandlerTest=". This parameter is used to trigger >> controlled crashes inside the error handler. That way secondary error >> handling can be tested. >> >> (a)-(c) allow us to test the fixes manually, for example: >> >> java -XX:ErrorHandlerTest=15 -XX:TestCrashDuringErrorHandler=14 >> >> causes a SIGILL during initialization, and a secondary SIGSEGV inside >> error >> handling. This demonstrates the effect of the bug. Without the fix, the VM >> will abort right away without finishing the hs-err file. >> >> -- >> >> I am in the process of writing some JTreg Tests, but I would like to put >> those into a separate change. This is because there are more fixes to >> error >> reporting in our pipeline and I'd like to bundle the jtreg tests in one >> change. 
>> >> Kind Regards, >> >> Thomas Stuefe >> >> From david.holmes at oracle.com Wed Nov 26 09:31:42 2014 From: david.holmes at oracle.com (David Holmes) Date: Wed, 26 Nov 2014 19:31:42 +1000 Subject: RFR(s): 8065895: Synchronous signals during error reporting may terminate or hang VM process In-Reply-To: References: <54752CF8.5070408@oracle.com> Message-ID: <54759DFE.7020300@oracle.com> Hi Thomas, On 26/11/2014 5:06 PM, Thomas St?fe wrote: > Hi David, > > thanks for looking at this. Here is the updated webrev: > http://cr.openjdk.java.net/~stuefe/webrevs/8065895/webrev.01/ > > See my comments below. > > On Wed, Nov 26, 2014 at 2:29 AM, David Holmes > wrote: > > Hi Thomas, > > A few quick comments as I need to think more about this: > > - On Solaris we use the UI thread API thr_* not pthreads > > > Fixed, now I use thr_sigsetmask() (though both sigprocmask and > pthread_sigmask seemed to work too) Thanks. They are interchangeable semantically but for consistency I prefer not to mix them. > - In debug.cpp for the SIGILL can you define the all zero case as a > default so we only need to add platform specific definitions when > all zeroes doesn't work. I really hate seeing all that CPU selection > in shared code. :( > > > Agreed and fixed, moved the CPU-specific sections into CPU-specific files. I'd really like to see a way to share the all-zeroes case so that we don't need to add platform specific code unnecessarily. > - Style nit: please use i++ rather than i ++ > > > Fixed. > > Aside: we should eradicate the use of sigprocmask and replace with > the thread specific version. > > > Agree. Though I never saw any errors stemming from the use of > sigprocmask(). According to POSIX, sigprocmask() is undefined in > multithreaded environment, and I guess most OSes just default to > pthread_sigmask. Yes "probably" works okay but I hate to see us using something with undefined semantics. That's future clean up though. > Getting back to the "thinking more about this" ... 
If a synchronous > signal is blocked at the time it is generated then it should remain > pending on the thread (POSIX spec) but that doesn't tell us what the > thread will then do - retry the faulting instruction? Become > unschedulable? So I can easily imagine that a hang or process > termination may result. > > > This is exactly what happens, but it is actually covered by POSIX, see > doc on pthread_sigmask: "If any of the SIGFPE, SIGILL, SIGSEGV, or > SIGBUS signals are generated while they are blocked, the result is > undefined, unless the signal was generated by the /kill/() > > function, the /sigqueue/() > > function, or the /raise/() > > function." Thanks - I managed to miss that part even though I found the other part about the signal handling function returning. :( > In reality, process usually aborts abnormally with the default action > for the signal, e.g. printing out "Illegal Instruction". On MacOS, we > hang (until the Watcherthread finally kills the VM). On old AIXes, we > die without a trace. > > This also can be easily tried out by removing SIGILL from the list of > signals in vmError_.cpp and executing: > > java -XX:ErrorHandlerTest=14 -XX:TestCrashInErrorHandler=15 > > which will crash first with a SIGSEGV, then in error handling with a > secondary SIGILL. This will interrupt error reporting and kill or hang > the process. > > > In that sense unblocking those signals whilst handling the initial > signal may well allow the error reporting process to continue > further. But I'm unclear exactly how this plays out: > > - synchronous signal encountered > - crash_handler invoked > > - VMError::report_and_die executes > - secondary signal encountered > > - crash_handler invoked again > > > almost: not again, different signal handler now. 
First signal was handled by "JVM_handle__signal()"

Ah missed that - thanks - not that it makes much difference :)

> - VMError::report_and_die executes again and sees the recursion and
> returns (ignoring abort due to excessive recursive errors)
>
>
> No..
>
> Is that right? So we actually return from the crash_handler?
>
>
> Oh, but we don't return. VMError::report_and_die() will just create a new
> frame and re-execute VMError::report() of the first VMError object.
> Which then will continue with the next STEP. We never return, for each
> secondary error signal a new frame is created.

I had trouble tracing through exactly what might happen on the recursive call
to report_and_die. I see now that report comes from:

    staticBufferStream sbs(buffer, O_BUFLEN, &log);
    first_error->report(&sbs);
    first_error->_current_step = 0; // reset current_step
    first_error->_current_step_info = ""; // reset current_step string

so the second time through we will call report, and _current_step should
indicate where to start executing from.

> This all happens in VMError::report_and_die:
> -> first error ? anchor VMError object in a static variable and execute
> VMError::report()
> -> secondary error?
> -> different thread? just sleep forever
> -> same thread? new frame, re-enter VMError::report(). Once done, abort.
>
> I always found that rather neat, but in fact that is not our invention
> but Sun's :) Anyway, my fix does not change this behaviour for better or
> worse, it only makes it usable for more cases.
>
> Because this puts us in undefined territory according to POSIX:
>
> "The behavior of a process is undefined after it returns normally
> from a signal-catching function for a SIGBUS, SIGFPE, SIGILL, or
> SIGSEGV signal that was not generated by kill(), sigqueue(), or
> raise()."
>
> true, but we don't return...
>
> On top of that you also have the issue that error reporting does a
> whole bunch of things that are not async-signal-safe so we can
> easily encounter hangs or aborts.
> > But we're dying anyway so I guess none of this really matters. If > re-enabling these signals allows error reporting to progress further > in some cases then that is a win. > > > Actually, this covers a lot of cases, mostly because SIGSEGV during > error reporting is common, so if the original error was not SIGSEGV, but > e.g. SIGILL, this would always result in broken hs-err files. > > The back story is that at SAP, we rely heavily on the hs-err files. They > are our main tool for support, because working with cores is often not > feasible. So, we put a lot of work in making error reporting reliable > across all platforms. This is also covered by many tests which crash the > VM in exciting ways and check the hs-err files for completeness. OK. Modulo the cpu specific SIGILL part everything else seems fine. Thanks, David > Kind Regards, Thomas > > Cheers, > David > > > On 26/11/2014 12:12 AM, Thomas St?fe wrote: > > Hi all, > > I'd like to contribute a fix to error handling to improve > stability of > error reporting. > > > Bug Report: > https://bugs.openjdk.java.net/__browse/JDK-8065895 > > > > Webrev: > http://cr.openjdk.java.net/~__stuefe/webrevs/8065895/webrev.__00/ > > > Problem: > > When a synchronous error signal happens during error reporting, > and the > signal is different from the original signal which triggered error > reporting, VM may die or hang (depends on platform). This causes > empty or > almost-empty hs-err files. > > Example: we first crash with a SIGILL (e.g in compiled code), then a > SIGSEGV happens when printing stack trace. > > Secondary error handling should catch the SIGSEGV and continue error > reporting with the next step. But that does not work in this case. > > Causes: > - hotspot blocks all signals when installing signal > handlers. Within the > secondary signal handler, only the original signal gets > unblocked, the rest > remained blocked. If another synchronous error signal happens, > it is still > blocked. 
If the second signal is a synchronous signal, the OS would > terminate the process right away because there is no way to defer > synchronous error signals. > - when installing signal handlers for secondary error > handling, only > signal handlers for SIGBUS and SIGSEGV were added; but more > signals may > happen during error handling (we saw SIGTRAP, SIGILL, ..etc). > > Fix: > secondary signal handler is installed for all synchronous error > signals > (which is now a list and easily expandable in vmError_.cpp). > All those > signals are unblocked. > > In order to test the fix, some test code was added too: > > a) debug.cpp: changed "test_error_handler()" to a more generic > "controlled_crash(int how)", which can be called at arbitrary > places, not > only at initialization time. "test_error_handler()" still exists > and just > calls "controlled_crash(__ErrorHandlerTest)", so its behaviour > did not change. > > b) expand controlled_crash(): > - added option 14, a guaranteed crash with a SIGSEGV at a > predefined > address, which is printed out and can later be tested against. > Note that I > realize that this is a bit redundant to option 12 or 13, but the > crash is > guaranteed and it crashes with a not-null address which should > turn up in > hs-err file (to check that hs-err file is correct). > - added option 15, a guaranteed crash with a SIGILL at a > predefined > instruction address. Here, the point is to get a real-world > SIGILL (not > just raising it) at a not-null known pc. > > c) Add a parameter "-XX:__TestCrashDuringErrorHandler=<__n>", > which works the > same as "-XX:ErrorHandlerTest=". This parameter is used to > trigger > controlled crashes inside the error handler. That way secondary > error > handling can be tested. > > (a)-(c) allow us to test the fixes manually, for example: > > java -XX:ErrorHandlerTest=15 -XX:__TestCrashDuringErrorHandler=14 > > causes a SIGILL during initialization, and a secondary SIGSEGV > inside error > handling. 
This demonstrates the effect of the bug. Without the > fix, the VM > will abort right away without finishing the hs-err file. > > -- > > I am in the process of writing some JTreg Tests, but I would > like to put > those into a separate change. This is because there are more > fixes to error > reporting in our pipeline and I'd like to bundle the jtreg tests > in one > change. > > Kind Regards, > > Thomas Stuefe > > From thomas.stuefe at gmail.com Wed Nov 26 11:37:44 2014 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Wed, 26 Nov 2014 12:37:44 +0100 Subject: RFR(s): 8065895: Synchronous signals during error reporting may terminate or hang VM process In-Reply-To: <54759DFE.7020300@oracle.com> References: <54752CF8.5070408@oracle.com> <54759DFE.7020300@oracle.com> Message-ID: Hi David, ... > - In debug.cpp for the SIGILL can you define the all zero case as a >> default so we only need to add platform specific definitions when >> all zeroes doesn't work. I really hate seeing all that CPU selection >> in shared code. :( >> >> >> Agreed and fixed, moved the CPU-specific sections into CPU-specific files. >> > > I'd really like to see a way to share the all-zeroes case so that we don't > need to add platform specific code unnecessarily. > > sooo.. back to the original code then, just with the #ifdef, just with the zero-cases all folded in into the #else path? Or do you prefer something else? > - Style nit: please use i++ rather than i ++ >> >> >> Fixed. >> >> Aside: we should eradicate the use of sigprocmask and replace with >> the thread specific version. >> >> >> Agree. Though I never saw any errors stemming from the use of >> sigprocmask(). According to POSIX, sigprocmask() is undefined in >> multithreaded environment, and I guess most OSes just default to >> pthread_sigmask. >> > > Yes "probably" works okay but I hate to see us using something with > undefined semantics. That's future clean up though. 
> > We (SAP JVM) already use pthread_sigmask() / thr_sigsetmask() instead of sigprocmask. Works fine. We can port this to the OpenJDK. > Getting back to the "thinking more about this" ... If a synchronous >> signal is blocked at the time it is generated then it should remain >> pending on the thread (POSIX spec) but that doesn't tell us what the >> thread will then do - retry the faulting instruction? Become >> unschedulable? So I can easily imagine that a hang or process >> termination may result. >> >> >> This is exactly what happens, but it is actually covered by POSIX, see >> doc on pthread_sigmask: "If any of the SIGFPE, SIGILL, SIGSEGV, or >> SIGBUS signals are generated while they are blocked, the result is >> undefined, unless the signal was generated by the /kill/() >> >> function, the /sigqueue/() >> >> function, or the /raise/() >> >> function." >> > > Thanks - I managed to miss that part even though I found the other part > about the signal handling function returning. :( It is well hidden, I found it by accident :) To me it looks like they kept it intentionally vague, to not block platforms where those signals could be somehow dealt with automatically? Hard to see though how this would work. > > > In reality, process usually aborts abnormally with the default action >> for the signal, e.g. printing out "Illegal Instruction". On MacOS, we >> hang (until the Watcherthread finally kills the VM). On old AIXes, we >> die without a trace. >> >> This also can be easily tried out by removing SIGILL from the list of >> signals in vmError_.cpp and executing: >> >> java -XX:ErrorHandlerTest=14 -XX:TestCrashInErrorHandler=15 >> >> which will crash first with a SIGSEGV, then in error handling with a >> secondary SIGILL. This will interrupt error reporting and kill or hang >> the process. >> >> >> In that sense unblocking those signals whilst handling the initial >> signal may well allow the error reporting process to continue >> further. 
But I'm unclear exactly how this plays out: >> >> - synchronous signal encountered >> - crash_handler invoked >> >> - VMError::report_and_die executes >> - secondary signal encountered >> >> - crash_handler invoked again >> >> >> almost: not again, different signal handler now. First signal was >> handled by "JVM_handle__signal()" >> > > Ah missed that - thanks - not that it makes much difference :) > > I just like nitpicking :) > - VMError::report_and_die executes again and sees the recursion and >> returns (ignoring abort due to excessive recursive errors) >> >> >> No.. >> >> Is that right? So we actually return from the crash_handler? >> >> >> Oh, but we dont return. VMError::report_and_die() will just create a new >> frame and re-execute VMError::report() of the first VMError object. >> Which then will continue with the next STEP. We never return, for each >> secondary error signal a new frame is created. >> > > I had trouble tracing through exactly what might happen on the recursive > call to report_and_die. I see know that report comes from: > > staticBufferStream sbs(buffer, O_BUFLEN, &log); > first_error->report(&sbs); > first_error->_current_step = 0; // reset current_step > first_error->_current_step_info = ""; // reset current_step string > > so the second time through we will call report and _current_step should > indicate where to start executing from. > > Exactly. There is also a catch, in that the stack usage goes up. Not endlessly, it is limited by the number of error reporting steps. The more stack VmError::report() does cost, the less well this works, especially in stack overflow scenarios. Which is why we extended SafeFetch and enabled it for the use in the error handler, which will be one of the the next patches I'd like to port to the OpenJDK, once this one is thru. > > This all happens in VMError::report_and_die: >> -> first error ? anchor VMError object in a static variable and execute >> VMError::report() >> -> secondary error? 
>> -> different thread? just sleep forever >> -> same thread? new frame, re-enter VMError::report(). Once done, >> abort. >> >> I always found that rather neat, but in fact that is not our invention >> but Sun's :) Anyway, my fix does not change this behaviour for better or >> worse, it only makes it usable for more cases. >> >> Because this puts us in undefined territory according to POSIX: >> >> "The behavior of a process is undefined after it returns normally >> from a signal-catching function for a SIGBUS, SIGFPE, SIGILL, or >> SIGSEGV signal that was not generated by kill(), sigqueue(), or >> raise()." >> >> true, but we dont return... >> >> On top of that you also have the issue that error reporting does a >> whole bunch of things that are not async-signal-safe so we can >> easily encounter hangs or aborts. >> >> But we're dying anyway so I guess none of this really matters. If >> re-enabling these signals allows error reporting to progress further >> in some cases then that is a win. >> >> >> Actually, this covers a lot of cases, mostly because SIGSEGV during >> error reporting is common, so if the original error was not SIGSEGV, but >> e.g. SIGILL, this would always result in broken hs-err files. >> >> The back story is that at SAP, we rely heavily on the hs-err files. They >> are our main tool for support, because working with cores is often not >> feasible. So, we put a lot of work in making error reporting reliable >> across all platforms. This is also covered by many tests which crash the >> VM in exciting ways and check the hs-err files for completeness. >> > > OK. Modulo the cpu specific SIGILL part everything else seems fine. > > Great. just tell me how you want that part. 
Kind regards, Thomas

> Thanks,
> David
>
From david.holmes at oracle.com Wed Nov 26 12:02:38 2014
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 26 Nov 2014 22:02:38 +1000
Subject: RFR(s): 8065895: Synchronous signals during error reporting may terminate or hang VM process
In-Reply-To:
References: <54752CF8.5070408@oracle.com> <54759DFE.7020300@oracle.com>
Message-ID: <5475C15E.30207@oracle.com>

On 26/11/2014 9:37 PM, Thomas Stüfe wrote:
> Hi David,
> ...
>
> - In debug.cpp for the SIGILL can you define the all zero case as a
> default so we only need to add platform specific definitions when
> all zeroes doesn't work. I really hate seeing all that CPU selection
> in shared code. :(
>
>
> Agreed and fixed, moved the CPU-specific sections into
> CPU-specific files.
>
>
> I'd really like to see a way to share the all-zeroes case so that we
> don't need to add platform specific code unnecessarily.
>
>
> sooo.. back to the original code then, just with the #ifdef, just with
> the zero-cases all folded in into the #else path? Or do you prefer
> something else?

Elsewhere there is a pattern of defining per-platform values that can
override the shared definition, e.g.:

    #ifndef HAS_SPECIAL_PLATFORM_VALUE_FOR_XXXX
    Foo XXX = ...; // shared/default initialization
    #endif

but this assumes a platform specific header has already been included that
can do:

    #define HAS_SPECIAL_PLATFORM_VALUE_FOR_XXXX
    Foo XXX = ... ; // platform specific initialization

But that is not the case for debug.hpp.

So I guess folding the zero-case into the else path is the best we can do.
However I'm assuming the zero case will work for our internal platforms ...
if it doesn't then we'd have to pollute the shared code with info for the
closed platforms. :(

David
-----

>
> - Style nit: please use i++ rather than i ++
>
>
> Fixed.
>
> Aside: we should eradicate the use of sigprocmask and replace with
> the thread specific version.
>
>
> Agree.
Though I never saw any errors stemming from the use of > sigprocmask(). According to POSIX, sigprocmask() is undefined in > multithreaded environment, and I guess most OSes just default to > pthread_sigmask. > > > Yes "probably" works okay but I hate to see us using something with > undefined semantics. That's future clean up though. > > > We (SAP JVM) already use pthread_sigmask() / thr_sigsetmask() instead of > sigprocmask. Works fine. We can port this to the OpenJDK. > > Getting back to the "thinking more about this" ... If a > synchronous > signal is blocked at the time it is generated then it > should remain > pending on the thread (POSIX spec) but that doesn't tell us > what the > thread will then do - retry the faulting instruction? Become > unschedulable? So I can easily imagine that a hang or process > termination may result. > > > This is exactly what happens, but it is actually covered by > POSIX, see > doc on pthread_sigmask: "If any of the SIGFPE, SIGILL, SIGSEGV, or > SIGBUS signals are generated while they are blocked, the result is > undefined, unless the signal was generated by the /kill/() > > > function, the /sigqueue/() > > > function, or the /raise/() > > > function." > > > Thanks - I managed to miss that part even though I found the other > part about the signal handling function returning. :( > > > It is well hidden, I found it by accident :) To me it looks like they > kept it intentionally vague, to not block platforms where those signals > could be somehow dealt with automatically? Hard to see though how this > would work. > > > > In reality, process usually aborts abnormally with the default > action > for the signal, e.g. printing out "Illegal Instruction". On > MacOS, we > hang (until the Watcherthread finally kills the VM). On old > AIXes, we > die without a trace. 
> > This also can be easily tried out by removing SIGILL from the > list of > signals in vmError_.cpp and executing: > > java -XX:ErrorHandlerTest=14 -XX:TestCrashInErrorHandler=15 > > which will crash first with a SIGSEGV, then in error handling with a > secondary SIGILL. This will interrupt error reporting and kill > or hang > the process. > > > In that sense unblocking those signals whilst handling the > initial > signal may well allow the error reporting process to continue > further. But I'm unclear exactly how this plays out: > > - synchronous signal encountered > - crash_handler invoked > > - VMError::report_and_die executes > - secondary signal encountered > > - crash_handler invoked again > > > almost: not again, different signal handler now. First signal was > handled by "JVM_handle__signal()" > > > Ah missed that - thanks - not that it makes much difference :) > > > I just like nitpicking :) > > - VMError::report_and_die executes again and sees the > recursion and > returns (ignoring abort due to excessive recursive errors) > > > No.. > > Is that right? So we actually return from the crash_handler? > > > Oh, but we dont return. VMError::report_and_die() will just > create a new > frame and re-execute VMError::report() of the first VMError object. > Which then will continue with the next STEP. We never return, > for each > secondary error signal a new frame is created. > > > I had trouble tracing through exactly what might happen on the > recursive call to report_and_die. I see know that report comes from: > > staticBufferStream sbs(buffer, O_BUFLEN, &log); > first_error->report(&sbs); > first_error->_current_step = 0; // reset current_step > first_error->_current_step___info = ""; // reset current_step > string > > so the second time through we will call report and _current_step > should indicate where to start executing from. > > > Exactly. There is also a catch, in that the stack usage goes up. 
Not > endlessly, it is limited by the number of error reporting steps. > The more stack VmError::report() does cost, the less well this works, > especially in stack overflow scenarios. > > Which is why we extended SafeFetch and enabled it for the use in the > error handler, which will be one of the the next patches I'd like to > port to the OpenJDK, once this one is thru. > > > This all happens in VMError::report_and_die: > -> first error ? anchor VMError object in a static variable and > execute > VMError::report() > -> secondary error? > -> different thread? just sleep forever > -> same thread? new frame, re-enter VMError::report(). Once > done, abort. > > I always found that rather neat, but in fact that is not our > invention > but Sun's :) Anyway, my fix does not change this behaviour for > better or > worse, it only makes it usable for more cases. > > Because this puts us in undefined territory according to POSIX: > > "The behavior of a process is undefined after it returns > normally > from a signal-catching function for a SIGBUS, SIGFPE, > SIGILL, or > SIGSEGV signal that was not generated by kill(), sigqueue(), or > raise()." > > true, but we dont return... > > On top of that you also have the issue that error reporting > does a > whole bunch of things that are not async-signal-safe so we can > easily encounter hangs or aborts. > > But we're dying anyway so I guess none of this really > matters. If > re-enabling these signals allows error reporting to > progress further > in some cases then that is a win. > > > Actually, this covers a lot of cases, mostly because SIGSEGV during > error reporting is common, so if the original error was not > SIGSEGV, but > e.g. SIGILL, this would always result in broken hs-err files. > > The back story is that at SAP, we rely heavily on the hs-err > files. They > are our main tool for support, because working with cores is > often not > feasible. 
So, we put a lot of work in making error reporting > reliable > across all platforms. This is also covered by many tests which > crash the > VM in exciting ways and check the hs-err files for completeness. > > > OK. Modulo the cpu specific SIGILL part everything else seems fine. > > Great. just tell me how you want that part. > > Kind regards, Thomas > > Thanks, > David > From thomas.stuefe at gmail.com Wed Nov 26 13:33:27 2014 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Wed, 26 Nov 2014 14:33:27 +0100 Subject: RFR(s): 8065895: Synchronous signals during error reporting may terminate or hang VM process In-Reply-To: <5475C15E.30207@oracle.com> References: <54752CF8.5070408@oracle.com> <54759DFE.7020300@oracle.com> <5475C15E.30207@oracle.com> Message-ID: Hi David, here you go: http://cr.openjdk.java.net/~stuefe/webrevs/8065895/webrev.02/ Reverted SIGILL-generating function back to its original form, plus the folding of the 000 case. I only can guess what your closed platforms are, but if it is ARM, I believe opcodes 0-31 are undefined. For ia64, 0 is undefined as well. Kind regards, Thomas On Wed, Nov 26, 2014 at 1:02 PM, David Holmes wrote: > On 26/11/2014 9:37 PM, Thomas St?fe wrote: > >> Hi David, >> ... >> >> - In debug.cpp for the SIGILL can you define the all zero >> case as a >> default so we only need to add platform specific >> definitions when >> all zeroes doesn't work. I really hate seeing all that CPU >> selection >> in shared code. :( >> >> >> Agreed and fixed, moved the CPU-specific sections into >> CPU-specific files. >> >> >> I'd really like to see a way to share the all-zeroes case so that we >> don't need to add platform specific code unnecessarily. >> >> >> sooo.. back to the original code then, just with the #ifdef, just with >> the zero-cases all folded in into the #else path? Or do you prefer >> something else? 
>>
>
> Elsewhere there is a pattern of defining per-platform values that can
> override the shared definition. eg:
>
> #ifndef HAS_SPECIAL_PLATFORM_VALUE_FOR_XXXX
> Foo XXX = ...; // shared/default initialization
> #endif
>
> but this assumes a platform specific header has already been included that
> can do:
>
> #define HAS_SPECIAL_PLATFORM_VALUE_FOR_XXXX
> Foo XXX = ... ; // platform specific initialization
>
> But that is not the case for debug.hpp.
>
> So I guess folding the zero-case into the else path is the best we can do.
> However I'm assuming the zero case will work for our internal platforms ...
> if it doesn't then we'd have to pollute the shared code with info for the
> closed platforms. :(
>
> David
> -----
>
>> - Style nit: please use i++ rather than i ++
>>
>> Fixed.
>>
>> Aside: we should eradicate the use of sigprocmask and replace with
>> the thread specific version.
>>
>> Agree. Though I never saw any errors stemming from the use of
>> sigprocmask(). According to POSIX, sigprocmask() is undefined in
>> multithreaded environment, and I guess most OSes just default to
>> pthread_sigmask.
>>
>> Yes "probably" works okay but I hate to see us using something with
>> undefined semantics. That's future clean up though.
>>
>> We (SAP JVM) already use pthread_sigmask() / thr_sigsetmask() instead of
>> sigprocmask. Works fine. We can port this to the OpenJDK.
>>
>> Getting back to the "thinking more about this" ... If a synchronous
>> signal is blocked at the time it is generated then it should remain
>> pending on the thread (POSIX spec) but that doesn't tell us what the
>> thread will then do - retry the faulting instruction? Become
>> unschedulable? So I can easily imagine that a hang or process
>> termination may result.
>> >> >> This is exactly what happens, but it is actually covered by >> POSIX, see >> doc on pthread_sigmask: "If any of the SIGFPE, SIGILL, SIGSEGV, or >> SIGBUS signals are generated while they are blocked, the result is >> undefined, unless the signal was generated by the /kill/() >> > functions/kill.html >> > functions/kill.html>> >> function, the /sigqueue/() >> > functions/sigqueue.html >> > functions/sigqueue.html>> >> function, or the /raise/() >> > functions/raise.html >> >> > functions/raise.html>> >> function." >> >> >> Thanks - I managed to miss that part even though I found the other >> part about the signal handling function returning. :( >> >> >> It is well hidden, I found it by accident :) To me it looks like they >> kept it intentionally vague, to not block platforms where those signals >> could be somehow dealt with automatically? Hard to see though how this >> would work. >> >> >> >> In reality, process usually aborts abnormally with the default >> action >> for the signal, e.g. printing out "Illegal Instruction". On >> MacOS, we >> hang (until the Watcherthread finally kills the VM). On old >> AIXes, we >> die without a trace. >> >> This also can be easily tried out by removing SIGILL from the >> list of >> signals in vmError_.cpp and executing: >> >> java -XX:ErrorHandlerTest=14 -XX:TestCrashInErrorHandler=15 >> >> which will crash first with a SIGSEGV, then in error handling >> with a >> secondary SIGILL. This will interrupt error reporting and kill >> or hang >> the process. >> >> >> In that sense unblocking those signals whilst handling the >> initial >> signal may well allow the error reporting process to continue >> further. But I'm unclear exactly how this plays out: >> >> - synchronous signal encountered >> - crash_handler invoked >> >> - VMError::report_and_die executes >> - secondary signal encountered >> >> - crash_handler invoked again >> >> >> almost: not again, different signal handler now. 
First signal was >> handled by "JVM_handle__signal()" >> >> >> Ah missed that - thanks - not that it makes much difference :) >> >> >> I just like nitpicking :) >> >> - VMError::report_and_die executes again and sees the >> recursion and >> returns (ignoring abort due to excessive recursive errors) >> >> >> No.. >> >> Is that right? So we actually return from the crash_handler? >> >> >> Oh, but we dont return. VMError::report_and_die() will just >> create a new >> frame and re-execute VMError::report() of the first VMError >> object. >> Which then will continue with the next STEP. We never return, >> for each >> secondary error signal a new frame is created. >> >> >> I had trouble tracing through exactly what might happen on the >> recursive call to report_and_die. I see know that report comes from: >> >> staticBufferStream sbs(buffer, O_BUFLEN, &log); >> first_error->report(&sbs); >> first_error->_current_step = 0; // reset current_step >> first_error->_current_step___info = ""; // reset current_step >> >> string >> >> so the second time through we will call report and _current_step >> should indicate where to start executing from. >> >> >> Exactly. There is also a catch, in that the stack usage goes up. Not >> endlessly, it is limited by the number of error reporting steps. >> The more stack VmError::report() does cost, the less well this works, >> especially in stack overflow scenarios. >> >> Which is why we extended SafeFetch and enabled it for the use in the >> error handler, which will be one of the the next patches I'd like to >> port to the OpenJDK, once this one is thru. >> >> >> This all happens in VMError::report_and_die: >> -> first error ? anchor VMError object in a static variable and >> execute >> VMError::report() >> -> secondary error? >> -> different thread? just sleep forever >> -> same thread? new frame, re-enter VMError::report(). Once >> done, abort. 
>> >> I always found that rather neat, but in fact that is not our >> invention >> but Sun's :) Anyway, my fix does not change this behaviour for >> better or >> worse, it only makes it usable for more cases. >> >> Because this puts us in undefined territory according to >> POSIX: >> >> "The behavior of a process is undefined after it returns >> normally >> from a signal-catching function for a SIGBUS, SIGFPE, >> SIGILL, or >> SIGSEGV signal that was not generated by kill(), sigqueue(), >> or >> raise()." >> >> true, but we dont return... >> >> On top of that you also have the issue that error reporting >> does a >> whole bunch of things that are not async-signal-safe so we >> can >> easily encounter hangs or aborts. >> >> But we're dying anyway so I guess none of this really >> matters. If >> re-enabling these signals allows error reporting to >> progress further >> in some cases then that is a win. >> >> >> Actually, this covers a lot of cases, mostly because SIGSEGV >> during >> error reporting is common, so if the original error was not >> SIGSEGV, but >> e.g. SIGILL, this would always result in broken hs-err files. >> >> The back story is that at SAP, we rely heavily on the hs-err >> files. They >> are our main tool for support, because working with cores is >> often not >> feasible. So, we put a lot of work in making error reporting >> reliable >> across all platforms. This is also covered by many tests which >> crash the >> VM in exciting ways and check the hs-err files for completeness. >> >> >> OK. Modulo the cpu specific SIGILL part everything else seems fine. >> >> Great. just tell me how you want that part. >> >> Kind regards, Thomas >> >> Thanks, >> David >> >> From thomas.stuefe at gmail.com Wed Nov 26 14:12:52 2014 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Wed, 26 Nov 2014 15:12:52 +0100 Subject: RFR: JDK-8059586: hs_err report should treat redirected core pattern. 
In-Reply-To:
References: <542C8274.3010809@gmail.com> <54338B70.9080709@oracle.com> <543B1FD6.3000200@oracle.com> <543CF553.80601@gmail.com> <543DC2BF.9050407@oracle.com> <543E80F8.3080204@gmail.com> <547330E5.1050708@gmail.com>
Message-ID:

Hi Yasumasa,

I am not a Reviewer. Barring the general decision of the real reviewers,
here are some thoughts:

os_linux.cpp
- jio_snprintf() returns -1 on truncation. n+=written may walk backwards.
  I would probably check for (written >= 0) and also, at the start of the
  loop, for (n < sizeof(core_path)).
- code is used in error reporting. I would be hesitant to create larger
  buffers on the stack. malloc may be better.
- code does not detect truncation of core_path (unlikely but possible)

the rest is more matter of taste:
- I would prefer sizeof(core_path) over PATH_MAX at all places where you
  refer to the size of the buffer. So you could make the buffer very small
  and test e.g. how your code behaves with truncation.
- when reading /proc/sys/kernel/core_uses_pid, using fgetc instead of
  fgets may be a tiny bit simpler.

Kind Regards, Thomas

On Wed, Nov 26, 2014 at 4:54 AM, Yasumasa Suenaga wrote:

> Hi Staffan,
>
> Thank you for reviewing!
>
> os_linux.cpp:
> I want to print coredump location correctly to hs_err. So I want to output
> whether coredump is processed in other process or is written to file.
> If os::get_core_path() should be more simply, I will print raw string in
> core_pattern.
>
> os_bsd.cpp:
> I don't have OS X. So I cannot check it.
> I am focusing Linux in this enhancement. Could you file it as another
> enhancement if it need?
>
> Thanks,
>
> Yasumasa
>
> 2014/11/25 18:15 "Staffan Larsen" :
>
> > src/os/bsd/vm/os_linux.cpp:
> > I'm inclined to think this is too complicated and hard to test and
> > maintain (and I see no tests in the webrev). Could we not simplify this to
> > print a helpful message instead?
Something that prints the core_pattern > and > > perhaps some of the values that could be used for substitution, but does > > not do the actual substitution? I think that would go a long way but be a > > lot more maintainable. > > > > src/os/bsd/vm/os_bsd.cpp: > > On OS X cores are by default written to /cores/core.. This is > > configureable with the kern.corefile sysctl variable, although it is rare > > to do so. > > > > /Staffan > > > > > On 24 nov 2014, at 14:21, Yasumasa Suenaga wrote: > > > > > > Hi all, > > > > > > I've uploaded webrev for this issue about a month ago. > > > Could you review it and sponsor it? > > > > > > > > > Thanks, > > > > > > Yasumasa > > > > > > > > > On 10/15/2014 11:13 PM, Yasumasa Suenaga wrote: > > >> Hi David, > > >> > > >> I've uploaded new webrev: > > >> http://cr.openjdk.java.net/~ysuenaga/JDK-8059586/webrev.02/ > > >> > > >> > > >>> I wasn't suggesting that you make such a change though because it is > > large and disruptive. > > >> > > >>> Unfactoring check_or_create_dump is a step backwards in terms of code > > sharing. > > >> > > >> I restored check_or_create_dump() to os_posix.cpp . > > >> And I changed get_core_path() to create message which represents core > > dump path > > >> (including filename) in each OS. > > >> > > >> > > >>> Expanding the get_core_path in os_linux.cpp to handle the > core_pattern > > may be okay (but I don't know enough about it to validate everything). > > >> > > >> I implemented all parameters in Linux kernel documentation: > > >> https://www.kernel.org/doc/Documentation/sysctl/kernel.txt > > >> > > >> So I think that parameters which are processed are enough. > > >> > > >> > > >> Thanks, > > >> > > >> Yasumasa > > >> > > >> > > >> > > >> (2014/10/15 9:41), David Holmes wrote: > > >>> On 14/10/2014 8:05 PM, Yasumasa Suenaga wrote: > > >>>> Hi David, > > >>>> > > >>>> Thank you for comments! > > >>>> I've uploaded new webrev. Could you review it again? 
> > >>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8059586/webrev.01/ > > >>>> > > >>>> I am an author of jdk9. So I cannot commit it. > > >>>> Could you be a sponsor for this enhancement? > > >>>> > > >>>> > > >>>>> In which case that should be handled by the linux specific > > >>>>> get_core_path() function. > > >>>> > > >>>> Agree. > > >>>> So I implemented it in os_linux.cpp . > > >>>> But part of format characters (%P: global pid, %s: signal, %t dump > > time) > > >>>> are not processed > > >>>> in this function because I think these parameters are difficult to > > >>>> handle in it. > > >>>> > > >>>> %P: I could not find API for this. > > >>>> %s: We have to change arguments of get_core_path() . > > >>>> %t: This parameter means timestamp of coredump. It is decided in > > Kernel. > > >>>> > > >>>> > > >>>>> Fixing this means changing all the os_posix using platforms. But > your > > >>>>> patch is not about this part. :) > > >>>> > > >>>> I moved os::check_or_create_dump() to each OS implementations (AIX, > > BSD, > > >>>> Solaris, Linux) . > > >>>> So I can write Linux specific code to check_or_create_dump() . > > >>>> As a result, I could remove "#ifdef LINUX" from os_posix.cpp :-) > > >>> > > >>> I wasn't suggesting that you make such a change though because it is > > large and disruptive. The simple handling of the | part of core_pattern > was > > basically ok. Expanding the get_core_path in os_linux.cpp to handle the > > core_pattern may be okay (but I don't know enough about it to validate > > everything). Unfactoring check_or_create_dump is a step backwards in > terms > > of code sharing. > > >>> > > >>> Sorry this has grown too large for me to deal with right now. > > >>> > > >>> David > > >>> ----- > > >>> > > >>>> > > >>>>> Though I'm unclear whether it both invokes the program and creates > a > > >>>>> core dump file; or just invokes the program? > > >>>> > > >>>> If '|' is set, Linux kernel will just redirect core image to user > > process. 
> > >>>> Kernel documentation says as below: > > >>>> ------------ > > >>>> . If the first character of the pattern is a '|', the kernel will > > treat > > >>>> the rest of the pattern as a command to run. The core dump will > be > > >>>> written to the standard input of that program instead of to a > file. > > >>>> ------------ > > >>>> > > >>>> And implementation of coredump (do_coredump()) follows to it. > > >>>> > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/coredump.c > > >>>> > > >>>> > > >>>> In case of ABRT, ABRT dumps core image to default location > > >>>> (/core.) > > >>>> if user set unlimited to resource limit of core (ulimit -c) . > > >>>> https://github.com/abrt/abrt/blob/master/src/hooks/abrt-hook-ccpp.c > > >>>> > > >>>> > > >>>>> A few style nits - you need spaces around keywords and before > braces > > >>>>> I also suggest saying "Core dumps may be processed with ..." rather > > >>>>> than "treated". > > >>>>> And as you don't do anything in the non-redirect case I suggest > > >>>>> collapsing this: > > >>>> > > >>>> I've fixed them. > > >>>> > > >>>> > > >>>> Thanks, > > >>>> > > >>>> Yasumasa > > >>>> > > >>>> > > >>>> (2014/10/13 9:41), David Holmes wrote: > > >>>>> Hi Yasumasa, > > >>>>> > > >>>>> On 7/10/2014 8:48 PM, Yasumasa Suenaga wrote: > > >>>>>> Hi David, > > >>>>>> > > >>>>>> Sorry for my English. > > >>>>>> > > >>>>>> I want to propose that JVM should create message according to core > > >>>>>> pattern (/proc/sys/kernel/core_pattern) . > > >>>>>> So I filed it to JBS and created a patch. > > >>>>> > > >>>>> So I've had a quick look at this core_pattern business and it seems > > to > > >>>>> me that there are two aspects to this. > > >>>>> > > >>>>> First, without the leading |, the entry in the core_pattern file > is a > > >>>>> naming pattern for the core file. In which case that should be > > handled > > >>>>> by the linux specific get_core_path() function. 
Though that in > itself > > >>>>> can't fully report the expected name, as part of it is provided in > > the > > >>>>> shared code in os::check_or_create_dump. Fixing this means changing > > >>>>> all the os_posix using platforms. But your patch is not about this > > >>>>> part. :) > > >>>>> > > >>>>> Second, with a leading | the core_pattern is actually the name of a > > >>>>> program to execute when the program is about to core dump, and that > > is > > >>>>> what you report with your patch. Though I'm unclear whether it both > > >>>>> invokes the program and creates a core dump file; or just invokes > the > > >>>>> program? > > >>>>> > > >>>>> So with regards to this second part your patch seems functionally > ok. > > >>>>> I do dislike having a big chunk of linux specific code in this > > "posix" > > >>>>> support file but ... > > >>>>> > > >>>>> A few style nits - you need spaces around keywords and before > braces > > eg: > > >>>>> > > >>>>> if(x){ > > >>>>> > > >>>>> should be > > >>>>> > > >>>>> if (x) { > > >>>>> > > >>>>> I also suggest saying "Core dumps may be processed with ..." rather > > >>>>> than "treated". > > >>>>> > > >>>>> And as you don't do anything in the non-redirect case I suggest > > >>>>> collapsing this: > > >>>>> > > >>>>> 83 is_redirect = core_pattern[0] == '|'; > > >>>>> 84 } > > >>>>> 85 > > >>>>> 86 if(is_redirect){ > > >>>>> 87 jio_snprintf(buffer, bufferSize, > > >>>>> 88 "Core dumps may be treated with \"%s\"", > > >>>>> &core_pattern[1]); > > >>>>> 89 } > > >>>>> > > >>>>> to just > > >>>>> > > >>>>> 83 if (core_pattern[0] == '|') { // redirect > > >>>>> 84 jio_snprintf(buffer, bufferSize, "Core dumps may > be > > >>>>> processed with \"%s\"", &core_pattern[1]); > > >>>>> 85 } > > >>>>> 86 } > > >>>>> > > >>>>> Comments from other runtime folk appreciated. 
> > >>>>> > > >>>>> Thanks, > > >>>>> David > > >>>>> > > >>>>>> Thanks, > > >>>>>> > > >>>>>> Yasumasa > > >>>>>> > > >>>>>> 2014/10/07 15:43 "David Holmes" > >>>>>> >: > > >>>>>> > > >>>>>> Hi Yasumasa, > > >>>>>> > > >>>>>> I'm sorry but I don't understand what you are proposing. When > you > > >>>>>> say > > >>>>>> "treat" do you mean "create"? Otherwise what do you mean by > > >>>>>> "treated"? > > >>>>>> > > >>>>>> Thanks, > > >>>>>> David > > >>>>>> > > >>>>>> On 2/10/2014 8:38 AM, Yasumasa Suenaga wrote: > > >>>>>> > I'm in Hackergarten @ JavaOne :-) > > >>>>>> > > > >>>>>> > > > >>>>>> > Hi all, > > >>>>>> > > > >>>>>> > I would like to enhance the messages in hs_err report. > > >>>>>> > Modern Linux kernel can treat core dump with user process > > >>>>>> (e.g. ABRT) > > >>>>>> > However, hs_err report cannot detect it. > > >>>>>> > > > >>>>>> > I think that hs_err report should output messages as below: > > >>>>>> > ------------- > > >>>>>> > Failed to write core dump. Core dumps may be treated > with > > >>>>>> "/usr/sbin/chroot /proc/%P/root /usr/libexec/abrt-hook-ccpp %s > > %c %p > > >>>>>> %u %g %t e" > > >>>>>> > ------------- > > >>>>>> > > > >>>>>> > I've uploaded webrev of this enhancement. > > >>>>>> > Could you review it? > > >>>>>> > > > >>>>>> > http://cr.openjdk.java.net/~ysuenaga/JDK-8059586/webrev.00/ > > >>>>>> > > > >>>>>> > This patch works fine on Fedora20 x86_64. > > >>>>>> > > > >>>>>> > > > >>>>>> > > > >>>>>> > Thanks, > > >>>>>> > > > >>>>>> > Yasumasa > > >>>>>> > > > >>>>>> > > > > > From yumin.qi at oracle.com Wed Nov 26 17:36:16 2014 From: yumin.qi at oracle.com (Yumin Qi) Date: Wed, 26 Nov 2014 09:36:16 -0800 Subject: RFR(XS): 8053995: Add method to WhiteBox to get vm pagesize. In-Reply-To: <547532C0.4080500@oracle.com> References: <53DAC336.6050302@oracle.com> <54752EAF.4020404@oracle.com> <547532C0.4080500@oracle.com> Message-ID: <54760F90.6040100@oracle.com> Thanks for the review. 
Yes, the test will build testlibrary with

@library /testlibrary /testlibrary/whitebox

Thanks
Yumin

On 11/25/14, 5:54 PM, David Holmes wrote:
> Hi Yumin,
>
> On 26/11/2014 11:36 AM, Yumin Qi wrote:
>> Please review
>>
>> bugs: https://bugs.openjdk.java.net/browse/JDK-8053995
>> webrev: http://cr.openjdk.java.net/~minqi/8053995/webrev01/
>
> The test also needs to ensure the testlibrary gets built.
>
> Otherwise seems okay.
>
> Thanks,
> David
>
>> Now the API usage is in internal test case, see separate email for the
>> webrev.
>>
>> It is same as previous version (webrev00).
>>
>> Thanks
>> Yumin
>>
>> On 7/31/14, 3:29 PM, Yumin Qi wrote:
>>> Please review:
>>>
>>> http://cr.openjdk.java.net/~minqi/8053995/webrev00/
>>>
>>> Summary: Currently there is no java API to get underlying OS native VM
>>> page size unless using Unsafe which is not recommended. The new added
>>> method to WhiteBox can read this property and used in test.
>>>
>>> Tests: JPRT and jtreg.
>>>
>>> Thanks
>>> Yumin

From calvin.cheung at oracle.com Wed Nov 26 18:07:29 2014
From: calvin.cheung at oracle.com (Calvin Cheung)
Date: Wed, 26 Nov 2014 10:07:29 -0800
Subject: RFR(XS): 8053995: Add method to WhiteBox to get vm pagesize.
In-Reply-To: <54760F90.6040100@oracle.com>
References: <53DAC336.6050302@oracle.com> <54752EAF.4020404@oracle.com> <547532C0.4080500@oracle.com> <54760F90.6040100@oracle.com>
Message-ID: <547616E1.5090706@oracle.com>

Looks good to me too.

Calvin

On 11/26/2014 9:36 AM, Yumin Qi wrote:
> Thanks for the review. Yes, the test will build testlibrary with
>
> @library /testlibrary /testlibrary/whitebox
>
> Thanks
> Yumin
>
> On 11/25/14, 5:54 PM, David Holmes wrote:
>> Hi Yumin,
>>
>> On 26/11/2014 11:36 AM, Yumin Qi wrote:
>>> Please review
>>>
>>> bugs: https://bugs.openjdk.java.net/browse/JDK-8053995
>>> webrev: http://cr.openjdk.java.net/~minqi/8053995/webrev01/
>>
>> The test also needs to ensure the testlibrary gets built.
>>
>> Otherwise seems okay.
>>
>> Thanks,
>> David
>>
>>> Now the API usage is in internal test case, see separate email for the
>>> webrev.
>>>
>>> It is same as previous version (webrev00).
>>>
>>> Thanks
>>> Yumin
>>>
>>> On 7/31/14, 3:29 PM, Yumin Qi wrote:
>>>> Please review:
>>>>
>>>> http://cr.openjdk.java.net/~minqi/8053995/webrev00/
>>>>
>>>> Summary: Currently there is no java API to get underlying OS native VM
>>>> page size unless using Unsafe which is not recommended. The new added
>>>> method to WhiteBox can read this property and used in test.
>>>>
>>>> Tests: JPRT and jtreg.
>>>>
>>>> Thanks
>>>> Yumin

From yumin.qi at oracle.com Wed Nov 26 22:36:53 2014
From: yumin.qi at oracle.com (Yumin Qi)
Date: Wed, 26 Nov 2014 14:36:53 -0800
Subject: RFR: 8038468: java/lang/instrument/ParallelTransformerLoader.sh fails with ClassCircularityError
In-Reply-To: <543C591E.8010602@oracle.com>
References: <543C591E.8010602@oracle.com>
Message-ID: <54765605.8030909@oracle.com>

Hi, please review the new change that fixes the ClassCircularityError
(CCE) in this test case.

More debugging revealed that the CCE always happened at the beginning of
the loop, before TestClass[1-3] were actually loaded: transform is called
for system classes too (though they are not loaded by the agent). The
loader passed to transform is now checked before loading 'TestClass3'; if
it is null, loading is skipped. This prevents the loader itself from being
loaded before 'TestClass3', and so avoids seeing $JarLoader$2 twice on the
PlaceHolderTable. The 'sleep' block, which was a workaround for a deadlock
at the beginning of transform, is removed as well. With the change that
only loads TestClass3 when the loader is not null, this workaround is no
longer needed. It was the loading of the loader that caused both issues.
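
For readers following along, here is a minimal sketch of the null-loader
check described above. It is not the code from the webrev: the class name,
the helper method and the counter are hypothetical, and the counter merely
stands in for the test's loadClasses(3) call.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only; the names below are hypothetical and are not
// taken from the webrev or from ParallelTransformerLoaderAgent.
public class NullLoaderGuardSketch implements ClassFileTransformer {

    // Counts how often the nested class load would have been triggered.
    public final AtomicInteger nestedLoads = new AtomicInteger();

    // A null loader means the class is defined by the bootstrap loader
    // (system classes); the nested load is skipped for those.
    public static boolean shouldLoadNested(ClassLoader loader) {
        return loader != null;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        if (shouldLoadNested(loader)) {
            // Stand-in for the test's loadClasses(3) call.
            nestedLoads.incrementAndGet();
        }
        return null; // null tells the JVM the class bytes are unchanged
    }

    public static void main(String[] args) {
        NullLoaderGuardSketch t = new NullLoaderGuardSketch();
        // Bootstrap-loaded class: the guard skips the nested load.
        t.transform(null, "java/lang/Object", null, null, new byte[0]);
        // Class defined by a real loader: the nested load is attempted.
        t.transform(ClassLoader.getSystemClassLoader(), "TestClass1",
                    null, null, new byte[0]);
        System.out.println("nested loads: " + t.nestedLoads.get());
    }
}
```

The point of keeping the guard in a separate helper is only that it can be
exercised without attaching an agent; a real transformer would be
registered through Instrumentation.addTransformer.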
new URL: http://cr.openjdk.java.net/~minqi/8038468/webrev02/ On 10/13/14, 3:58 PM, Yumin Qi wrote: > bug: https://bugs.openjdk.java.net/browse/JDK-8038468 > webrev:*http://cr.openjdk.java.net/~minqi/8038468/webrev00/ > > the bug marked as confidential so post the webrev internally. > > Problem: The test case tries to load a class from the same jar via > agent in the middle of loading another class from the jar via same > class loader in same thread. The call happens in transform which is a > rare case --- in middle of loading class, loading another class. The > result is a CircularityError. When first class is in loading, in vm we > put JarLoader$2 on place holder table, then we start the defineClass, > which calls transform, begins loading the second class so go along the > same routine for loading JarLoader$2 first, found it already in > placeholder table. A CircularityError is thrown. > Fix: The test case should not call loading class with same class > loader in same thread from same jar in 'transform' method. I modify it > loading with system class loader and we expect see > ClassNotFoundException. Detail see bug comments. > > Thanks > Yumin * From karen.kinnear at oracle.com Wed Nov 26 22:55:38 2014 From: karen.kinnear at oracle.com (Karen Kinnear) Date: Wed, 26 Nov 2014 17:55:38 -0500 Subject: RFR: 8038468: java/lang/instrument/ParallelTransformerLoader.sh fails with ClassCircularityError In-Reply-To: <54765605.8030909@oracle.com> References: <543C591E.8010602@oracle.com> <54765605.8030909@oracle.com> Message-ID: Yumin, Looks good. thanks very much, Karen On Nov 26, 2014, at 5:36 PM, Yumin Qi wrote: > Hi, please review again for new change for fixing the ClassCircularityError (CCE) in this test case. > > More debug tails revealed that the CCE always happened at the beginning of the loop, before the real loading of TestClass[1-3] loaded, transform is called against system classes too (though they did not get loaded by agent). 
The check for loader which passed to transform is done before calling loading 'TestClass3', if it is null skip loading. This can prevent from loading loader itself before loading 'TestClass3', thus avoid seeing $JarLoader$2 twice on PlaceHolderTable. Meanwhile remove the block 'sleep' which is used to workaround deadlock at the beginning of transform. With the change which only loads class TestClass3 when loader is not null, this workaround is not needed. It is the loader loading caused both the issues here. > > new URL: > http://cr.openjdk.java.net/~minqi/8038468/webrev02/ > > > On 10/13/14, 3:58 PM, Yumin Qi wrote: >> bug: https://bugs.openjdk.java.net/browse/JDK-8038468 >> webrev:*http://cr.openjdk.java.net/~minqi/8038468/webrev00/ >> >> the bug marked as confidential so post the webrev internally. >> >> Problem: The test case tries to load a class from the same jar via agent in the middle of loading another class from the jar via same class loader in same thread. The call happens in transform which is a rare case --- in middle of loading class, loading another class. The result is a CircularityError. When first class is in loading, in vm we put JarLoader$2 on place holder table, then we start the defineClass, which calls transform, begins loading the second class so go along the same routine for loading JarLoader$2 first, found it already in placeholder table. A CircularityError is thrown. >> Fix: The test case should not call loading class with same class loader in same thread from same jar in 'transform' method. I modify it loading with system class loader and we expect see ClassNotFoundException. Detail see bug comments. 
>> >> Thanks >> Yumin * From serguei.spitsyn at oracle.com Wed Nov 26 23:01:20 2014 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 26 Nov 2014 15:01:20 -0800 Subject: RFR: 8038468: java/lang/instrument/ParallelTransformerLoader.sh fails with ClassCircularityError In-Reply-To: <54765605.8030909@oracle.com> References: <543C591E.8010602@oracle.com> <54765605.8030909@oracle.com> Message-ID: <54765BC0.30700@oracle.com> The fix looks good to me. The class loading condition change is reasonable. Thanks, Serguei On 11/26/14 2:36 PM, Yumin Qi wrote: > Hi, please review again for new change for fixing the > ClassCircularityError (CCE) in this test case. > > More debug tails revealed that the CCE always happened at the > beginning of the loop, before the real loading of TestClass[1-3] > loaded, transform is called against system classes too (though they > did not get loaded by agent). The check for loader which passed to > transform is done before calling loading 'TestClass3', if it is null > skip loading. This can prevent from loading loader itself before > loading 'TestClass3', thus avoid seeing $JarLoader$2 twice on > PlaceHolderTable. Meanwhile remove the block 'sleep' which is used to > workaround deadlock at the beginning of transform. With the change > which only loads class TestClass3 when loader is not null, this > workaround is not needed. It is the loader loading caused both the > issues here. > > new URL: > http://cr.openjdk.java.net/~minqi/8038468/webrev02/ > > > On 10/13/14, 3:58 PM, Yumin Qi wrote: >> bug: https://bugs.openjdk.java.net/browse/JDK-8038468 >> webrev:*http://cr.openjdk.java.net/~minqi/8038468/webrev00/ >> >> the bug marked as confidential so post the webrev internally. >> >> Problem: The test case tries to load a class from the same jar via >> agent in the middle of loading another class from the jar via same >> class loader in same thread. 
>> The call happens in transform which is a
>> rare case --- in middle of loading class, loading another class. The
>> result is a CircularityError. When first class is in loading, in vm
>> we put JarLoader$2 on place holder table, then we start the
>> defineClass, which calls transform, begins loading the second class
>> so go along the same routine for loading JarLoader$2 first, found it
>> already in placeholder table. A CircularityError is thrown.
>> Fix: The test case should not call loading class with same class
>> loader in same thread from same jar in 'transform' method. I modify
>> it loading with system class loader and we expect see
>> ClassNotFoundException. Detail see bug comments.
>>
>> Thanks
>> Yumin *

From david.holmes at oracle.com Thu Nov 27 00:49:06 2014
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 27 Nov 2014 10:49:06 +1000
Subject: RFR(s): 8065895: Synchronous signals during error reporting may terminate or hang VM process
In-Reply-To:
References: <54752CF8.5070408@oracle.com> <54759DFE.7020300@oracle.com> <5475C15E.30207@oracle.com>
Message-ID: <54767502.6010907@oracle.com>

On 26/11/2014 11:33 PM, Thomas Stüfe wrote:
> Hi David,
>
> here you go: http://cr.openjdk.java.net/~stuefe/webrevs/8065895/webrev.02/
>
> Reverted SIGILL-generating function back to its original form, plus the
> folding of the 000 case.

Thanks Thomas! While we are awaiting a second reviewer I will test this
out internally. It may take a day or two sorry.

David

> I only can guess what your closed platforms are, but if it is ARM, I
> believe opcodes 0-31 are undefined. For ia64, 0 is undefined as well.
>
> Kind regards, Thomas
>
> On Wed, Nov 26, 2014 at 1:02 PM, David Holmes wrote:
>
> On 26/11/2014 9:37 PM, Thomas Stüfe wrote:
>
> Hi David,
> ...
>
> - In debug.cpp for the SIGILL can you define the all zero case as a
> default so we only need to add platform specific definitions when
> all zeroes doesn't work.
I really hate seeing all > that CPU > selection > in shared code. :( > > > Agreed and fixed, moved the CPU-specific sections into > CPU-specific files. > > > I'd really like to see a way to share the all-zeroes case > so that we > don't need to add platform specific code unnecessarily. > > > sooo.. back to the original code then, just with the #ifdef, > just with > the zero-cases all folded in into the #else path? Or do you prefer > something else? > > > Elsewhere there is a pattern of defining per-platform values that > can override the shared definition. eg: > > #ifndef HAS_SPECIAL_PLATFORM_VALUE___FOR_XXXX > Foo XXX = ...; //shared/default initalization > #endif > > but this assumes a platform specific header has already been > included that can do: > > #define HAS_SPECIAL_PLATFORM_VALUE___FOR_XXXX > Foo XXX = ... ; // platform specific initialization > > But that is not the case for debug.hpp. > > So I guess folding the zero-case into the else path is the best we > can do. However I'm assuming the zero case will work for our > internal platforms ... if it doesn't then we'd have to pollute the > shared code with info for the closed platforms. :( > > David > ----- > > > - Style nit: please use i++ rather than i ++ > > > Fixed. > > Aside: we should eradicate the use of sigprocmask and > replace with > the thread specific version. > > > Agree. Though I never saw any errors stemming from the > use of > sigprocmask(). According to POSIX, sigprocmask() is > undefined in > multithreaded environment, and I guess most OSes just > default to > pthread_sigmask. > > > Yes "probably" works okay but I hate to see us using > something with > undefined semantics. That's future clean up though. > > > We (SAP JVM) already use pthread_sigmask() / thr_sigsetmask() > instead of > sigprocmask. Works fine. We can port this to the OpenJDK. > > Getting back to the "thinking more about this" ... 
> If a synchronous signal is blocked at the time it is generated then it
> should remain pending on the thread (POSIX spec) but that doesn't tell us
> what the thread will then do - retry the faulting instruction? Become
> unschedulable? So I can easily imagine that a hang or process
> termination may result.
>
> This is exactly what happens, but it is actually covered by POSIX, see
> doc on pthread_sigmask: "If any of the SIGFPE, SIGILL, SIGSEGV, or
> SIGBUS signals are generated while they are blocked, the result is
> undefined, unless the signal was generated by the kill() function, the
> sigqueue() function, or the raise() function."
>
> Thanks - I managed to miss that part even though I found the other
> part about the signal handling function returning. :(
>
> It is well hidden, I found it by accident :) To me it looks like they
> kept it intentionally vague, to not block platforms where those signals
> could be somehow dealt with automatically? Hard to see though how this
> would work.
>
> In reality, process usually aborts abnormally with the default action
> for the signal, e.g. printing out "Illegal Instruction". On MacOS, we
> hang (until the Watcherthread finally kills the VM). On old AIXes, we
> die without a trace.
>
> This also can be easily tried out by removing SIGILL from the list of
> signals in vmError_.cpp and executing:
>
> java -XX:ErrorHandlerTest=14 -XX:TestCrashInErrorHandler=15
>
> which will crash first with a SIGSEGV, then in error handling with a
> secondary SIGILL. This will interrupt error reporting and kill or hang
> the process.
>
> In that sense unblocking those signals whilst handling the initial
> signal may well allow the error reporting process to continue further.
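As a rough illustration of the unblocking step discussed here, the sketch below clears the four synchronous error signals from the calling thread's mask with pthread_sigmask(). The helper names unblock_synchronous_signals() and is_blocked() are hypothetical and are not the functions used in the webrev; only the POSIX calls themselves are real.

```cpp
#include <pthread.h>
#include <signal.h>

// Hypothetical helper (not the HotSpot function): unblock the synchronous
// error signals for the calling thread only, so that a secondary fault
// raised while handling the first one is delivered instead of hitting
// the "blocked while generated" case that POSIX leaves undefined.
// Returns 0 on success, or the error number from pthread_sigmask().
int unblock_synchronous_signals() {
  sigset_t set;
  sigemptyset(&set);
  sigaddset(&set, SIGILL);
  sigaddset(&set, SIGBUS);
  sigaddset(&set, SIGFPE);
  sigaddset(&set, SIGSEGV);
  return pthread_sigmask(SIG_UNBLOCK, &set, nullptr);
}

// Hypothetical query helper: true if `sig` is blocked in the calling thread.
bool is_blocked(int sig) {
  sigset_t current;
  sigemptyset(&current);
  // With a null new mask the call only reports the current mask.
  pthread_sigmask(SIG_SETMASK, nullptr, &current);
  return sigismember(&current, sig) == 1;
}
```

Calling unblock_synchronous_signals() at the start of error handling is one way to get the behaviour described above; using pthread_sigmask() rather than sigprocmask() also matches the aside about preferring the thread-specific call.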
But I'm unclear exactly how this plays out:
>
> - synchronous signal encountered
> - crash_handler invoked
> - VMError::report_and_die executes
> - secondary signal encountered
> - crash_handler invoked again
>
> almost: not again, different signal handler now. First signal was
> handled by "JVM_handle__signal()"
>
> Ah missed that - thanks - not that it makes much difference :)
>
> I just like nitpicking :)
>
> - VMError::report_and_die executes again and sees the recursion and
> returns (ignoring abort due to excessive recursive errors)
>
> No..
>
> Is that right? So we actually return from the crash_handler?
>
> Oh, but we don't return. VMError::report_and_die() will just create a new
> frame and re-execute VMError::report() of the first VMError object.
> Which then will continue with the next STEP. We never return; for each
> secondary error signal a new frame is created.
>
> I had trouble tracing through exactly what might happen on the
> recursive call to report_and_die. I see now that report comes from:
>
> staticBufferStream sbs(buffer, O_BUFLEN, &log);
> first_error->report(&sbs);
> first_error->_current_step = 0; // reset current_step
> first_error->_current_step_info = ""; // reset current_step string
>
> so the second time through we will call report and _current_step
> should indicate where to start executing from.
>
> Exactly. There is also a catch, in that the stack usage goes up. Not
> endlessly, it is limited by the number of error reporting steps.
> The more stack VMError::report() does cost, the less well this works,
> especially in stack overflow scenarios.
>
> Which is why we extended SafeFetch and enabled it for the use in the
> error handler, which will be one of the next patches I'd like to
> port to the OpenJDK, once this one is thru.
>
> This all happens in VMError::report_and_die:
> -> first error? anchor VMError object in a static variable and
> execute VMError::report()
> -> secondary error?
> -> different thread? just sleep forever
> -> same thread? new frame, re-enter VMError::report(). Once
> done, abort.
>
> I always found that rather neat, but in fact that is not our invention
> but Sun's :) Anyway, my fix does not change this behaviour for better or
> worse, it only makes it usable for more cases.
>
> Because this puts us in undefined territory according to POSIX:
>
> "The behavior of a process is undefined after it returns normally
> from a signal-catching function for a SIGBUS, SIGFPE, SIGILL, or
> SIGSEGV signal that was not generated by kill(), sigqueue(), or
> raise()."
>
> true, but we don't return...
>
> On top of that you also have the issue that error reporting does a
> whole bunch of things that are not async-signal-safe so we can
> easily encounter hangs or aborts.
>
> But we're dying anyway so I guess none of this really matters. If
> re-enabling these signals allows error reporting to progress further
> in some cases then that is a win.
>
> Actually, this covers a lot of cases, mostly because SIGSEGV during
> error reporting is common, so if the original error was not SIGSEGV, but
> e.g. SIGILL, this would always result in broken hs-err files.
>
> The back story is that at SAP, we rely heavily on the hs-err files. They
> are our main tool for support, because working with cores is often not
> feasible. So, we put a lot of work in making error reporting reliable
> across all platforms. This is also covered by many tests which crash the
> VM in exciting ways and check the hs-err files for completeness.
>
> OK. Modulo the cpu specific SIGILL part everything else seems fine.
>
> Great. just tell me how you want that part.
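The re-entry scheme described in this exchange (a step counter that is advanced before each step runs, so that a secondary error resumes with the following step) can be modelled in a few lines. This is a simplified sketch under loud assumptions: ErrorReport, Aborted and run_report() are hypothetical names, a direct nested call stands in for the secondary signal handler, and a C++ exception stands in for the final abort. The real VMError code uses signal handlers and never unwinds.

```cpp
#include <string>
#include <vector>

// Models process termination once reporting has finished.
struct Aborted {};

// Hypothetical, simplified model; names do not mirror the real VMError.
struct ErrorReport {
  int current_step = 0;            // last reporting step that was started
  std::vector<std::string> log;    // output produced by completed steps

  // Entry point for the first error and for every secondary error.
  [[noreturn]] void report_and_die() {
    report();
    throw Aborted{};               // "once done, abort"
  }

  void report() {
    // Each step records itself in current_step BEFORE doing any work, so
    // a re-entry skips the step that faulted and continues with the next.
    if (current_step < 1) {
      current_step = 1;
      log.push_back("summary");
    }
    if (current_step < 2) {
      current_step = 2;
      // Simulated secondary crash inside step 2: in the VM the signal
      // handler would call report_and_die() again in a new stack frame.
      report_and_die();
    }
    if (current_step < 3) {
      current_step = 3;
      log.push_back("stack");
    }
  }
};

// Runs one report from start to the simulated abort and returns its output.
std::vector<std::string> run_report() {
  ErrorReport r;
  try {
    r.report_and_die();
  } catch (const Aborted&) {
    // Reporting finished; the real VM would terminate here.
  }
  return r.log;
}
```

In this model the faulting step produces no output, the steps before and after it do, and every secondary error costs one extra stack frame, which is the stack-usage catch mentioned above.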
>
> Kind regards, Thomas
>
> Thanks,
> David

From david.holmes at oracle.com Thu Nov 27 00:57:34 2014
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 27 Nov 2014 10:57:34 +1000
Subject: RFR(XS): 8053995: Add method to WhiteBox to get vm pagesize.
In-Reply-To: <54760F90.6040100@oracle.com>
References: <53DAC336.6050302@oracle.com> <54752EAF.4020404@oracle.com> <547532C0.4080500@oracle.com> <54760F90.6040100@oracle.com>
Message-ID: <547676FE.3080204@oracle.com>

On 27/11/2014 3:36 AM, Yumin Qi wrote:
> Thanks for the review. Yes, the test will build testlibrary with
>
> @library /testlibrary /testlibrary/whitebox

No that won't necessarily build the testlibrary. From other email:

>> I'm having a problem running a test in 8u25 that uses the testlibrary
>> ProcessTools API. I get a ClassNotFoundException. Looking in the
>> classes directory I only see two testlibrary classes - which map to
>> two specific testlibrary classes that one test has on its @build
>> line. The test in question simply has:
>>
>> @library /testlibrary
>>
>> Does it need an explicit:
>>
>> @build com.oracle.java.testlibrary.*
>
> Yes. It turns out that JTReg might not compile the library classes on
> demand (but it does sometimes). So it is better to specify the
> required build manually.
>
> -JB-

David
-----

> Thanks
> Yumin
>
> On 11/25/14, 5:54 PM, David Holmes wrote:
>> Hi Yumin,
>>
>> On 26/11/2014 11:36 AM, Yumin Qi wrote:
>>> Please review
>>>
>>> bugs: https://bugs.openjdk.java.net/browse/JDK-8053995
>>> webrev: http://cr.openjdk.java.net/~minqi/8053995/webrev01/
>>
>> The test also needs to ensure the testlibrary gets built.
>>
>> Otherwise seems okay.
>>
>> Thanks,
>> David
>>
>>> Now the API usage is in internal test case, see separate email for the
>>> webrev.
>>>
>>> It is same as previous version (webrev00).
>>>
>>> Thanks
>>> Yumin
>>>
>>> On 7/31/14, 3:29 PM, Yumin Qi wrote:
>>>> Please review:
>>>>
>>>> http://cr.openjdk.java.net/~minqi/8053995/webrev00/
>>>>
>>>> Summary: Currently there is no java API to get underlying OS native VM
>>>> page size unless using Unsafe which is not recommended. The new added
>>>> method to WhiteBox can read this property and used in test.
>>>>
>>>> Tests: JPRT and jtreg.
>>>>
>>>> Thanks
>>>> Yumin

From david.holmes at oracle.com Thu Nov 27 05:18:15 2014
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 27 Nov 2014 15:18:15 +1000
Subject: RFR(s): 8065895: Synchronous signals during error reporting may terminate or hang VM process
In-Reply-To: <54767502.6010907@oracle.com>
References: <54752CF8.5070408@oracle.com> <54759DFE.7020300@oracle.com> <5475C15E.30207@oracle.com> <54767502.6010907@oracle.com>
Message-ID: <5476B417.9030008@oracle.com>

On 27/11/2014 10:49 AM, David Holmes wrote:
> On 26/11/2014 11:33 PM, Thomas Stüfe wrote:
>> Hi David,
>>
>> here you go:
>> http://cr.openjdk.java.net/~stuefe/webrevs/8065895/webrev.02/
>>
>> Reverted SIGILL-generating function back to its original form, plus the
>> folding of the 000 case.
>
> Thanks Thomas! While we are awaiting a second reviewer I will test this
> out internally. It may take a day or two sorry.

Unfortunately, on ARM (emulator), the SIGILL test generated a SEGV instead:

will jump to PC 0xb6fb1000, which should cause a SIGILL.
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0xb6fb2000, pid=13095, tid=3060999280

If I read the ARM architecture manual correctly all zeroes will map to a
conditional AND instruction (Ref A8.6.12 AND(register))

David

> David
>
>> I only can guess what your closed platforms are, but if it is ARM, I
>> believe opcodes 0-31 are undefined. For ia64, 0 is undefined as well.
>>
>> The back story is that at SAP, we rely heavily on the hs-err files. They
>> are our main tool for support, because working with cores is often not
>> feasible. So, we put a lot of work in making error reporting reliable
>> across all platforms. This is also covered by many tests which crash the
>> VM in exciting ways and check the hs-err files for completeness.
>>
>> OK. Modulo the cpu specific SIGILL part everything else seems fine.
>>
>> Great. just tell me how you want that part.
>>
>> Kind regards, Thomas
>>
>> Thanks,
>> David

From thomas.stuefe at gmail.com Thu Nov 27 07:36:44 2014
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Thu, 27 Nov 2014 08:36:44 +0100
Subject: RFR(s): 8065895: Synchronous signals during error reporting may terminate or hang VM process
In-Reply-To: <5476B417.9030008@oracle.com>
References: <54752CF8.5070408@oracle.com> <54759DFE.7020300@oracle.com> <5475C15E.30207@oracle.com> <54767502.6010907@oracle.com> <5476B417.9030008@oracle.com>
Message-ID:

Unfortunately, I cannot test it, as I have no ARM environment. The best I
can come up with without testing is this:
http://stackoverflow.com/questions/16081618/programmatically-cause-undefined-instruction-exception

Kind regards, Thomas

On Thu, Nov 27, 2014 at 6:18 AM, David Holmes wrote:
> On 27/11/2014 10:49 AM, David Holmes wrote:
>
>> On 26/11/2014 11:33 PM, Thomas Stüfe wrote:
>>
>>> Hi David,
>>>
>>> here you go:
>>> http://cr.openjdk.java.net/~stuefe/webrevs/8065895/webrev.02/
>>>
>>> Reverted SIGILL-generating function back to its original form, plus the
>>> folding of the 000 case.
>>
>> Thanks Thomas! While we are awaiting a second reviewer I will test this
>> out internally. It may take a day or two sorry.
>
> Unfortunately, on ARM (emulator), the SIGILL test generated a SEGV instead:
>
> will jump to PC 0xb6fb1000, which should cause a SIGILL.
>>>
>>> Kind regards, Thomas
>>>
>>> Thanks,
>>> David

From david.holmes at oracle.com Thu Nov 27 09:01:05 2014
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 27 Nov 2014 19:01:05 +1000
Subject: RFR(s): 8065895: Synchronous signals during error reporting may terminate or hang VM process
In-Reply-To:
References: <54752CF8.5070408@oracle.com> <54759DFE.7020300@oracle.com> <5475C15E.30207@oracle.com> <54767502.6010907@oracle.com> <5476B417.9030008@oracle.com>
Message-ID: <5476E851.8050802@oracle.com>

On 27/11/2014 5:36 PM, Thomas Stüfe wrote:
> Unfortunately, I cannot test it, as I have no ARM environment. The best
> I can come up with without testing is this:
> http://stackoverflow.com/questions/16081618/programmatically-cause-undefined-instruction-exception

The issue is how to handle this? Put ifdefs for ARM in the open code?
Revert to your per-platform solution? Some other variation? Or do we just
not care if we can't trigger SIGILL on ARM?

Though I'd like to hear from the AARCH64 folk too.

David

> Kind regards, Thomas
>
> On Thu, Nov 27, 2014 at 6:18 AM, David Holmes wrote:
>
> On 27/11/2014 10:49 AM, David Holmes wrote:
>
> On 26/11/2014 11:33 PM, Thomas Stüfe wrote:
>
> Hi David,
>
> here you go:
> http://cr.openjdk.java.net/~stuefe/webrevs/8065895/webrev.02/
>
> Reverted SIGILL-generating function back to its original form, plus the
> folding of the 000 case.
>
> Thanks Thomas! While we are awaiting a second reviewer I will test this
> out internally. It may take a day or two sorry.
>
> Unfortunately, on ARM (emulator), the SIGILL test generated a SEGV
> instead:
>
> will jump to PC 0xb6fb1000, which should cause a SIGILL.
If > re-enabling these signals allows > error reporting to > progress further > in some cases then that is a win. > > > Actually, this covers a lot of cases, > mostly because > SIGSEGV during > error reporting is common, so if the > original error > was not > SIGSEGV, but > e.g. SIGILL, this would always result in > broken hs-err > files. > > The back story is that at SAP, we rely > heavily on the > hs-err > files. They > are our main tool for support, because > working with > cores is > often not > feasible. So, we put a lot of work in > making error > reporting > reliable > across all platforms. This is also covered > by many > tests which > crash the > VM in exciting ways and check the hs-err > files for > completeness. > > > OK. Modulo the cpu specific SIGILL part > everything else > seems fine. > > Great. just tell me how you want that part. > > Kind regards, Thomas > > Thanks, > David > > > From thomas.stuefe at gmail.com Thu Nov 27 09:27:02 2014 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Thu, 27 Nov 2014 10:27:02 +0100 Subject: RFR(s): 8065895: Synchronous signals during error reporting may terminate or hang VM process In-Reply-To: <5476E851.8050802@oracle.com> References: <54752CF8.5070408@oracle.com> <54759DFE.7020300@oracle.com> <5475C15E.30207@oracle.com> <54767502.6010907@oracle.com> <5476B417.9030008@oracle.com> <5476E851.8050802@oracle.com> Message-ID: I am preparing a jtreg tests which would fail if no SIGILL is produced. A real SIGILL is needed to make the test meaningful, although I guess a fake SIGILL (kill() or raise()) would make the test pass too. Which could be a workaround for the time being. I could live with either #ifdef in shared code - shared code already contains lots of #ifdef ARM - or with cpu-specific files; I also could add the debug_.hpp files needed for your solution. 
Kind Regards, Thomas On Thu, Nov 27, 2014 at 10:01 AM, David Holmes wrote: > On 27/11/2014 5:36 PM, Thomas St?fe wrote: > >> Unfortunately, I cannot test it, as I have no ARM environment. The best >> I can come up with without testing is this: >> http://stackoverflow.com/questions/16081618/programmatically-cause- >> undefined-instruction-exception >> > > The issue is how to handle this? Put ifdefs for ARM in the open code? > Revert to your per-platform solution? Some other variation? Or do we just > not care if we can't trigger SIGILL on ARM? Though I'd like to hear from > the AARCH64 folk too. > > David > > Kind regards, Thomas >> >> On Thu, Nov 27, 2014 at 6:18 AM, David Holmes > > wrote: >> >> On 27/11/2014 10:49 AM, David Holmes wrote: >> >> On 26/11/2014 11:33 PM, Thomas St?fe wrote: >> >> Hi David, >> >> here you go: >> http://cr.openjdk.java.net/~__stuefe/webrevs/8065895/webrev. >> __02/ >> >> > 02/> >> >> Reverted SIGILL-generating function back to its original >> form, plus the >> folding of the 000 case. >> >> >> Thanks Thomas! While we are awaiting a second reviewer I will >> test this >> out internally. It may take a day or two sorry. >> >> >> Unfortunately, on ARM (emulator), the SIGILL test generated a SEGV >> instead: >> >> will jump to PC 0xb6fb1000, which should cause a SIGILL. >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # SIGSEGV (0xb) at pc=0xb6fb2000, pid=13095, tid=3060999280 >> >> If I read the ARM architecture manual correctly all zeroes will map >> to a conditional AND instruction (Ref A8.6.12 AND(register)) >> >> David >> >> >> David >> >> I only can guess what your closed platforms are, but if it >> is ARM, I >> believe opcodes 0-31 are undefined. For ia64, 0 is undefined >> as well. >> >> Kind regards, Thomas >> >> >> On Wed, Nov 26, 2014 at 1:02 PM, David Holmes >> >> > >> >> wrote: >> >> On 26/11/2014 9:37 PM, Thomas St?fe wrote: >> >> Hi David, >> ... 
>> >> - In debug.cpp for the SIGILL can you >> define the >> all zero >> case as a >> default so we only need to add >> platform specific >> definitions when >> all zeroes doesn't work. I really >> hate seeing all >> that CPU >> selection >> in shared code. :( >> >> >> Agreed and fixed, moved the CPU-specific >> sections into >> CPU-specific files. >> >> >> I'd really like to see a way to share the >> all-zeroes case >> so that we >> don't need to add platform specific code >> unnecessarily. >> >> >> sooo.. back to the original code then, just with >> the #ifdef, >> just with >> the zero-cases all folded in into the #else path? >> Or do you >> prefer >> something else? >> >> >> Elsewhere there is a pattern of defining per-platform >> values that >> can override the shared definition. eg: >> >> #ifndef HAS_SPECIAL_PLATFORM_VALUE_____FOR_XXXX >> Foo XXX = ...; //shared/default initalization >> #endif >> >> but this assumes a platform specific header has already >> been >> included that can do: >> >> #define HAS_SPECIAL_PLATFORM_VALUE_____FOR_XXXX >> >> Foo XXX = ... ; // platform specific initialization >> >> But that is not the case for debug.hpp. >> >> So I guess folding the zero-case into the else path is >> the best we >> can do. However I'm assuming the zero case will work >> for our >> internal platforms ... if it doesn't then we'd have to >> pollute the >> shared code with info for the closed platforms. :( >> >> David >> ----- >> >> >> - Style nit: please use i++ rather >> than i ++ >> >> >> Fixed. >> >> Aside: we should eradicate the use of >> sigprocmask and >> replace with >> the thread specific version. >> >> >> Agree. Though I never saw any errors >> stemming from the >> use of >> sigprocmask(). According to POSIX, >> sigprocmask() is >> undefined in >> multithreaded environment, and I guess >> most OSes just >> default to >> pthread_sigmask. >> >> >> Yes "probably" works okay but I hate to see us >> using >> something with >> undefined semantics. 
That's future clean up >> though. >> >> >> We (SAP JVM) already use pthread_sigmask() / >> thr_sigsetmask() >> instead of >> sigprocmask. Works fine. We can port this to the >> OpenJDK. >> >> Getting back to the "thinking more >> about this" ... >> If a >> synchronous >> signal is blocked at the time it is >> generated >> then it >> should remain >> pending on the thread (POSIX spec) >> but that >> doesn't tell us >> what the >> thread will then do - retry the >> faulting >> instruction? Become >> unschedulable? So I can easily >> imagine that a hang >> or process >> termination may result. >> >> >> This is exactly what happens, but it is >> actually >> covered by >> POSIX, see >> doc on pthread_sigmask: "If any of the >> SIGFPE, SIGILL, >> SIGSEGV, or >> SIGBUS signals are generated while they >> are blocked, >> the result is >> undefined, unless the signal was generated >> by the >> /kill/() >> >> >> > functions/kill.html >> > functions/kill.html> >> >> >> > functions/kill.html >> > functions/kill.html>> >> >> >> > functions/kill.html >> > functions/kill.html> >> >> > functions/kill.html >> > functions/kill.html>>>> >> function, the /sigqueue/() >> >> >> > functions/sigqueue.html >> > functions/sigqueue.html> >> >> >> > functions/sigqueue.html >> > functions/sigqueue.html>> >> >> >> >> > functions/sigqueue.html >> > functions/sigqueue.html> >> >> >> > functions/sigqueue.html >> > functions/sigqueue.html>>>> >> >> function, or the /raise/() >> >> >> > functions/raise.html >> > functions/raise.html> >> >> >> >> > functions/raise.html >> > functions/raise.html>> >> >> >> >> > functions/raise.html >> > functions/raise.html> >> >> > functions/raise.html >> > functions/raise.html>>>> >> function." >> >> >> Thanks - I managed to miss that part even >> though I found >> the other >> part about the signal handling function >> returning. 
:( >> >> >> It is well hidden, I found it by accident :) To me >> it looks like >> they >> kept it intentionally vague, to not block platforms >> where those >> signals >> could be somehow dealt with automatically? Hard to >> see though >> how this >> would work. >> >> >> >> In reality, process usually aborts >> abnormally with the >> default >> action >> for the signal, e.g. printing out "Illegal >> Instruction". On >> MacOS, we >> hang (until the Watcherthread finally >> kills the VM). >> On old >> AIXes, we >> die without a trace. >> >> This also can be easily tried out by >> removing SIGILL >> from the >> list of >> signals in vmError_.cpp and executing: >> >> java -XX:ErrorHandlerTest=14 >> -XX:TestCrashInErrorHandler=15 >> >> which will crash first with a SIGSEGV, >> then in error >> handling with a >> secondary SIGILL. This will interrupt >> error reporting >> and kill >> or hang >> the process. >> >> >> In that sense unblocking those >> signals whilst >> handling the >> initial >> signal may well allow the error >> reporting process >> to continue >> further. But I'm unclear exactly how >> this plays >> out: >> >> - synchronous signal encountered >> - crash_handler invoked >> >> - VMError::report_and_die executes >> - secondary signal encountered >> >> - crash_handler invoked again >> >> >> almost: not again, different signal >> handler now. First >> signal was >> handled by "JVM_handle__signal()" >> >> >> Ah missed that - thanks - not that it makes much >> difference :) >> >> >> I just like nitpicking :) >> >> - VMError::report_and_die executes >> again and >> sees the >> recursion and >> returns (ignoring abort due to >> excessive recursive >> errors) >> >> >> No.. >> >> Is that right? So we actually return >> from the >> crash_handler? >> >> >> Oh, but we dont return. >> VMError::report_and_die() >> will just >> create a new >> frame and re-execute VMError::report() of >> the first >> VMError object. >> Which then will continue with the next >> STEP. 
We never >> return, >> for each >> secondary error signal a new frame is >> created. >> >> >> I had trouble tracing through exactly what >> might happen >> on the >> recursive call to report_and_die. I see know >> that report >> comes from: >> >> staticBufferStream sbs(buffer, O_BUFLEN, >> &log); >> first_error->report(&sbs); >> first_error->_current_step = 0; >> // reset >> current_step >> first_error->_current_step_______info = >> >> ""; // reset >> current_step >> >> string >> >> so the second time through we will call report >> and >> _current_step >> should indicate where to start executing from. >> >> >> Exactly. There is also a catch, in that the stack >> usage goes >> up. Not >> endlessly, it is limited by the number of error >> reporting steps. >> The more stack VmError::report() does cost, the >> less well this >> works, >> especially in stack overflow scenarios. >> >> Which is why we extended SafeFetch and enabled it >> for the use >> in the >> error handler, which will be one of the the next >> patches I'd >> like to >> port to the OpenJDK, once this one is thru. >> >> >> This all happens in VMError::report_and_die: >> -> first error ? anchor VMError object in >> a static >> variable and >> execute >> VMError::report() >> -> secondary error? >> -> different thread? just sleep forever >> -> same thread? new frame, re-enter >> VMError::report(). Once >> done, abort. >> >> I always found that rather neat, but in >> fact that is >> not our >> invention >> but Sun's :) Anyway, my fix does not >> change this >> behaviour for >> better or >> worse, it only makes it usable for more >> cases. >> >> Because this puts us in undefined >> territory >> according to POSIX: >> >> "The behavior of a process is >> undefined after it >> returns >> normally >> from a signal-catching function for a >> SIGBUS, >> SIGFPE, >> SIGILL, or >> SIGSEGV signal that was not generated >> by kill(), >> sigqueue(), or >> raise()." >> >> true, but we dont return... 
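The secondary-error flow described above (re-enter the report on a new frame and continue with the next step) can be modelled without any signals. The following is a simplified sketch, not HotSpot code: a step that returns nonzero stands in for a step that crashes, and the recursive call stands in for the signal handler re-entering the report. All names are hypothetical.

```c
#include <assert.h>

/* A step returns nonzero to simulate a secondary fault inside that step. */
typedef int (*report_step_fn)(void);

struct reporter {
    int current_step;   /* index of the next step to start */
    int reentries;      /* how many times report() was re-entered */
    int completed[8];   /* completed[i] != 0 if step i finished */
};

static void report(struct reporter *r, const report_step_fn *steps, int nsteps)
{
    assert(nsteps <= 8);
    while (r->current_step < nsteps) {
        /* Advance before running the step, so that a re-entry skips it. */
        int step = r->current_step++;

        if (steps[step]() != 0) {
            /* Simulated secondary fault. A real signal handler would not
             * return into the faulting step; it would call report() again
             * on a new stack frame, which is modelled by this recursion. */
            r->reentries++;
            report(r, steps, nsteps);
            return;
        }
        r->completed[step] = 1;
    }
}

static int step_ok(void)    { return 0; }
static int step_crash(void) { return 1; }
```

Because the step counter only moves forward, the recursion depth is bounded by the number of steps, which matches the remark that stack usage grows but not endlessly.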
>> >> On top of that you also have the >> issue that error >> reporting >> does a >> whole bunch of things that are not >> async-signal-safe so we can >> easily encounter hangs or aborts. >> >> But we're dying anyway so I guess >> none of this >> really >> matters. If >> re-enabling these signals allows >> error reporting to >> progress further >> in some cases then that is a win. >> >> >> Actually, this covers a lot of cases, >> mostly because >> SIGSEGV during >> error reporting is common, so if the >> original error >> was not >> SIGSEGV, but >> e.g. SIGILL, this would always result in >> broken hs-err >> files. >> >> The back story is that at SAP, we rely >> heavily on the >> hs-err >> files. They >> are our main tool for support, because >> working with >> cores is >> often not >> feasible. So, we put a lot of work in >> making error >> reporting >> reliable >> across all platforms. This is also covered >> by many >> tests which >> crash the >> VM in exciting ways and check the hs-err >> files for >> completeness. >> >> >> OK. Modulo the cpu specific SIGILL part >> everything else >> seems fine. >> >> Great. just tell me how you want that part. >> >> Kind regards, Thomas >> >> Thanks, >> David >> >> >> >> From aph at redhat.com Thu Nov 27 09:45:15 2014 From: aph at redhat.com (Andrew Haley) Date: Thu, 27 Nov 2014 09:45:15 +0000 Subject: RFR(s): 8065895: Synchronous signals during error reporting may terminate or hang VM process In-Reply-To: <5476E851.8050802@oracle.com> References: <54752CF8.5070408@oracle.com> <54759DFE.7020300@oracle.com> <5475C15E.30207@oracle.com> <54767502.6010907@oracle.com> <5476B417.9030008@oracle.com> <5476E851.8050802@oracle.com> Message-ID: <5476F2AB.4050401@redhat.com> On 11/27/2014 09:01 AM, David Holmes wrote: > On 27/11/2014 5:36 PM, Thomas St?fe wrote: >> Unfortunately, I cannot test it, as I have no ARM environment. 
The best
>> I can come up with without testing is this:
>> http://stackoverflow.com/questions/16081618/programmatically-cause-undefined-instruction-exception
>
> The issue is how to handle this? Put ifdefs for ARM in the open code?
> Revert to your per-platform solution? Some other variation? Or do we
> just not care if we can't trigger SIGILL on ARM? Though I'd like to hear
> from the AARCH64 folk too.

I always use DCPS1 if I want an undefined instruction trap. 0x0100A0D4.

Andrew.

From thomas.stuefe at gmail.com Thu Nov 27 10:38:49 2014
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Thu, 27 Nov 2014 11:38:49 +0100
Subject: RFR(s): 8065895: Synchronous signals during error reporting may terminate or hang VM process
In-Reply-To: <5476F2AB.4050401@redhat.com>
References: <54752CF8.5070408@oracle.com> <54759DFE.7020300@oracle.com> <5475C15E.30207@oracle.com> <54767502.6010907@oracle.com> <5476B417.9030008@oracle.com> <5476E851.8050802@oracle.com> <5476F2AB.4050401@redhat.com>
Message-ID:

Hi Andrew, thank you! Does endianness matter?

On Thu, Nov 27, 2014 at 10:45 AM, Andrew Haley wrote:

> On 11/27/2014 09:01 AM, David Holmes wrote:
> > On 27/11/2014 5:36 PM, Thomas Stüfe wrote:
> >> Unfortunately, I cannot test it, as I have no ARM environment. The best
> >> I can come up with without testing is this:
> >> http://stackoverflow.com/questions/16081618/programmatically-cause-undefined-instruction-exception
> >
> > The issue is how to handle this? Put ifdefs for ARM in the open code?
> > Revert to your per-platform solution? Some other variation? Or do we
> > just not care if we can't trigger SIGILL on ARM? Though I'd like to hear
> > from the AARCH64 folk too.
>
> I always use DCPS1 if I want an undefined instruction trap. 0x0100A0D4.
>
> Andrew.
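Where no reliably undefined opcode is known for a platform, the raise()-based "fake SIGILL" mentioned earlier in the thread is a portable fallback. A minimal sketch, assuming POSIX sigaction(); the helper name fake_sigill_delivered is hypothetical and nothing here is taken from the webrev.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t sigill_seen = 0;

static void on_sigill(int sig)
{
    (void)sig;
    sigill_seen = 1;   /* only async-signal-safe work in the handler */
}

/* Hypothetical helper: deliver a "fake" SIGILL with raise() and report
 * whether the handler ran. Returns 1 if it did, 0 if not, -1 on error. */
static int fake_sigill_delivered(void)
{
    struct sigaction sa, old;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0) {
        return -1;
    }

    sigill_seen = 0;
    int rc = raise(SIGILL);

    /* Restore the previous disposition before returning. */
    sigaction(SIGILL, &old, NULL);
    if (rc != 0) {
        return -1;
    }
    return sigill_seen ? 1 : 0;
}
```

A signal generated by raise() is exempt from the "undefined" clauses quoted from POSIX in this thread, which is also why such a test is weaker than one that executes a real illegal instruction.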
From aph at redhat.com Thu Nov 27 10:55:10 2014
From: aph at redhat.com (Andrew Haley)
Date: Thu, 27 Nov 2014 10:55:10 +0000
Subject: RFR(s): 8065895: Synchronous signals during error reporting may terminate or hang VM process
In-Reply-To:
References: <54752CF8.5070408@oracle.com> <54759DFE.7020300@oracle.com> <5475C15E.30207@oracle.com> <54767502.6010907@oracle.com> <5476B417.9030008@oracle.com> <5476E851.8050802@oracle.com> <5476F2AB.4050401@redhat.com>
Message-ID: <5477030E.6070605@redhat.com>

On 11/27/2014 10:38 AM, Thomas Stüfe wrote:
> Hi Andrew, thank you! Does endianness matter?

Yes. I'd do it symbolically rather than mess with endian defines:

#ifdef AARCH64
  unsigned insn;
  asm("b 1f; 0: dcps1; 1: ldr %0, 0b" : "=r"(insn));
#endif

Andrew.

From david.holmes at oracle.com Thu Nov 27 11:00:06 2014
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 27 Nov 2014 21:00:06 +1000
Subject: RFR(s): 8065895: Synchronous signals during error reporting may terminate or hang VM process
In-Reply-To: <5477030E.6070605@redhat.com>
References: <54752CF8.5070408@oracle.com> <54759DFE.7020300@oracle.com> <5475C15E.30207@oracle.com> <54767502.6010907@oracle.com> <5476B417.9030008@oracle.com> <5476E851.8050802@oracle.com> <5476F2AB.4050401@redhat.com> <5477030E.6070605@redhat.com>
Message-ID: <54770436.8070705@oracle.com>

On 27/11/2014 8:55 PM, Andrew Haley wrote:
> On 11/27/2014 10:38 AM, Thomas Stüfe wrote:
>> Hi Andrew, thank you! Does endianness matter?
>
> Yes. I'd do it symbolically rather than mess with endian defines:
>
> #ifdef AARCH64
>   unsigned insn;
>   asm("b 1f; 0: dcps1; 1: ldr %0, 0b" : "=r"(insn));
> #endif

Does that work for ARMv7?

Thanks,
David

> Andrew.
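For comparison with the symbolic approach above, here is a sketch of the alternative it avoids: writing the instruction bytes explicitly. It assumes that the A64 encoding of dcps1 with a zero immediate is the word 0xD4A00001 (my reading of the encoding, not stated in the thread) and relies on A64 instructions being stored little-endian. Under that assumption the bytes in memory order are 01 00 A0 D4, which is how the value 0x0100A0D4 quoted earlier reads. The names DCPS1_WORD and store_insn_le are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed A64 encoding of "dcps1" with a zero immediate, as an
 * instruction word (an assumption, see the text above). */
#define DCPS1_WORD UINT32_C(0xD4A00001)

/* Store a 32-bit instruction word in little-endian byte order, regardless
 * of the byte order of the host that runs this code. */
static void store_insn_le(uint8_t out[4], uint32_t insn)
{
    out[0] = (uint8_t)(insn & 0xffu);
    out[1] = (uint8_t)((insn >> 8) & 0xffu);
    out[2] = (uint8_t)((insn >> 16) & 0xffu);
    out[3] = (uint8_t)((insn >> 24) & 0xffu);
}
```

Shifting and masking makes the byte order explicit, so no endian #ifdef is needed; letting the assembler emit the instruction, as suggested above, avoids hard-coding the encoding at all.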
From aph at redhat.com Thu Nov 27 11:04:22 2014
From: aph at redhat.com (Andrew Haley)
Date: Thu, 27 Nov 2014 11:04:22 +0000
Subject: RFR(s): 8065895: Synchronous signals during error reporting may terminate or hang VM process
In-Reply-To: <54770436.8070705@oracle.com>
References: <54752CF8.5070408@oracle.com> <54759DFE.7020300@oracle.com> <5475C15E.30207@oracle.com> <54767502.6010907@oracle.com> <5476B417.9030008@oracle.com> <5476E851.8050802@oracle.com> <5476F2AB.4050401@redhat.com> <5477030E.6070605@redhat.com> <54770436.8070705@oracle.com>
Message-ID: <54770536.5090101@redhat.com>

On 11/27/2014 11:00 AM, David Holmes wrote:
> On 27/11/2014 8:55 PM, Andrew Haley wrote:
>> On 11/27/2014 10:38 AM, Thomas Stüfe wrote:
>>> Hi Andrew, thank you! Does endianness matter?
>>
>> Yes. I'd do it symbolically rather than mess with endian defines:
>>
>> #ifdef AARCH64
>>   unsigned insn;
>>   asm("b 1f; 0: dcps1; 1: ldr %0, 0b" : "=r"(insn));
>> #endif
>
> Does that work for ARMv7?

Sorry, I don't know what a good choice there would be. And I must warn
you: DCPS1 isn't necessarily guaranteed to do this forever, but it
works on the kernels I've tried.

Andrew.

From michail.chernov at oracle.com Thu Nov 27 13:28:49 2014
From: michail.chernov at oracle.com (Michail Chernov)
Date: Thu, 27 Nov 2014 16:28:49 +0300
Subject: RFR: 8064909: FragmentMetaspace.java got OutOfMemoryError
In-Reply-To: <54765B70.10509@oracle.com>
References: <5475D74A.2060907@oracle.com> <54762451.3070802@oracle.com> <54763D54.3070704@oracle.com> <54765B70.10509@oracle.com>
Message-ID: <54772711.6000003@oracle.com>

Hi,

CC'ed hotspot-runtime-dev.

This is not a test failure: the test works as expected. The OOME occurred in the compiler instance.

    private JavaCompiler javac;
    ...
    javac = ToolProvider.getSystemJavaCompiler();
    ...
    int exitcode = javac.run(null, null, null, file.getCanonicalPath());
    if (exitcode != 0) {
        throw new RuntimeException("javac failure when compiling: " + file.getCanonicalPath());
    }

There are two ways to handle this: rewrite getGeneratedClass (runtime/testlibrary/GeneratedClassLoader.java) to allow it to throw more than just RuntimeException, or catch the RuntimeException and compare its message with "javac failure when compiling:". Neither way seems as clear as I would like for this simple test. Moreover, javac does not throw anything; it just returns a non-zero exit code and writes its messages to System.err.

I can also add a comment to the code like "OOME with message "java.lang.OutOfMemoryError: Java heap space" doesn't mean that something is wrong with metaspace - just need to increase -Xmx".

Thanks,
Michail

On 27.11.2014 2:00, Jon Masamitsu wrote:
> Dima,
>
> If this test fails with an OOME in the future, I would like it to be
> obvious that the failure is not that an OOME occurred. I cannot
> tell that from looking at the test. Can the test be changed so
> I don't have to spend time figuring out that the OOME is not
> a failure mode of the test?
>
> Jon
>
>
> On 11/26/2014 12:51 PM, Dmitry Fazunenko wrote:
>> Hi Jon,
>>
>> The original version of the test worked for 80 seconds trying to perform
>> as many iterations as possible. The number of iterations performed
>> depended on how fast the machine is. With each iteration the
>> size of generated and loaded classes grows, so on fast enough
>> machines 80 seconds is enough to run out of heap while generating a
>> class.
>>
>> The fix not only sets the heap, but limits iterations. 300m heap is
>> enough for 200 iterations.
>>
>> Your approach, with catching OOME(heap) and passing will also work,
>> but it will reduce the test readability (and potentially could bring
>> more problems).
>>
>> An alternative approach would be to limit metaspace and heap
>> accordingly and load classes until we run out of metaspace...
But
>> this might take a while.
>>
>> So, I hope that Michail's fix is good.
>>
>> Thanks for looking and expressing comments.
>> Dima
>>
>>
>>
>>
>> On 26.11.2014 22:04, Jon Masamitsu wrote:
>>> Michail,
>>>
>>> Your change makes this test pass but it seems like at
>>> some future date 300m might not be big enough
>>> (for whatever reason). Could the test be made to
>>> catch an OOME, print out a message saying that
>>> an OOME doesn't mean the test failed but that
>>> the test needs a larger heap? Then pass an
>>> exception up (maybe some type of Runtime
>>> exception - sorry if that is vague but I don't know
>>> what type of exception would make sense). That
>>> would mean we wouldn't have to spend time
>>> diagnosing what the OOME means again.
>>>
>>> Jon
>>>
>>> On 11/26/2014 5:36 AM, Michail Chernov wrote:
>>>> Hi,
>>>>
>>>> Please review this simple fix for nightly test failure:
>>>>
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~eistepan/~mchernov/8064909/webrev.00/
>>>> Bug:
>>>> https://bugs.openjdk.java.net/browse/JDK-8064909
>>>>
>>>> Problem: test fails because of OOME (not enough heap size).
>>>> Solution: heap size was increased.
>>>>
>>>> Testing:
>>>> jtreg
>>>>
>>>> Thanks,
>>>> Michail
>>>
>>
>

From yasuenag at gmail.com Sat Nov 29 15:44:30 2014
From: yasuenag at gmail.com (Yasumasa Suenaga)
Date: Sun, 30 Nov 2014 00:44:30 +0900
Subject: RFR: JDK-8059586: hs_err report should treat redirected core pattern.
In-Reply-To:
References: <542C8274.3010809@gmail.com> <54338B70.9080709@oracle.com> <543B1FD6.3000200@oracle.com> <543CF553.80601@gmail.com> <543DC2BF.9050407@oracle.com> <543E80F8.3080204@gmail.com> <547330E5.1050708@gmail.com>
Message-ID: <5479E9DE.7070703@gmail.com>

Hi all,

Thank you for checking my patch!
I've uploaded a new webrev:
http://cr.openjdk.java.net/~ysuenaga/JDK-8059586/webrev.03/hotspot.patch

David:
> The change in:
> src/os/aix/vm/os_aix.cpp
> src/os/solaris/vm/os_solaris.cpp
>
> jio_snprintf(buffer, bufferSize, "%s/core or core.%d", current_process_id());
>
> has no argument for the %s - presumably p was intended.

I've fixed it.

Staffan:
> src/os/bsd/vm/os_linux.cpp:
> Could we not simplify this to print a helpful message instead?

In most cases on Linux, I think the core image name is "core." .
In the other cases, apart from pipe redirection, I guess that the user defines it.
Thus I print the string in kernel.core_pattern directly.

> src/os/bsd/vm/os_bsd.cpp:
> On OS X cores are by default written to /cores/core.. This is configureable with the kern.corefile sysctl variable, although it is rare to do so.

Thank you!
I changed the path to "/cores/core." .

Thomas:
> - jio_snprintf() returns -1 on truncation. n+=written may walk backwards. I would probably check for (written >= 0) and also, at the start of the loop, for (n < sizeof(core_path)).
> - code is used in error reporting. I would be hesitant to create larger buffers on the stack. malloc may be better.

I've fixed them.

> - code does not detect truncation of core_path (unlikely but possible)

Do you mean the variable name? "core_path" in my patch stores /proc/sys/kernel/core_pattern .
The length of kernel.core_pattern is defined as 128 chars in the Linux kernel documentation.
https://www.kernel.org/doc/Documentation/sysctl/kernel.txt

Thus the length of core_path (129 chars) is enough.

> - when reading /proc/sys/kernel/core_uses_pid, using fgetc instead of fgets may be a tiny bit simpler.

I changed it to use fgetc() .

Thanks,

Yasumasa

(2014/11/26 23:12), Thomas Stüfe wrote:
> Hi Yasumasa,
>
> I am not a Reviewer. Barring the general decision of the real reviewers, here are some thoughts:
>
> os_linux.cpp
>
> - jio_snprintf() returns -1 on truncation. n+=written may walk backwards.
I would probably check for (written >= 0) and also, at the start of the loop, for (n < sizeof(core_path)). > - code is used in error reporting. I would be hesitant to create larger buffers on the stack. malloc may be better. > - code does not detect truncation of core_path (unlikely but possible) > > the rest is more matter of taste: > - I would prefer sizeof(core_path) over PATH_MAX at all places where you refer to the size of the buffer. So you could make the buffer very small and test e.g. how your code behaves with truncation. > - when reading /proc/sys/kernel/core_uses_pid, using fgetc instead of fgets may be a tiny bit simpler. > > Kind Regards, Thomas > > > > On Wed, Nov 26, 2014 at 4:54 AM, Yasumasa Suenaga > wrote: > > Hi Staffan, > > Thank you for reviewing! > > os_linux.cpp: > I want to print coredump location correctly to hs_err. So I want to output > whether coredump is processed in other process or is written to file. > If os::get_core_path() should be more simply, I will print raw string in > core_pattern. > > os_bsd.cpp: > I don't have OS X. So I cannot check it. > I am focusing Linux in this enhancement. Could you file it as another > enhancement if it need? > > Thanks, > > Yasumasa > > 2014/11/25 18:15 "Staffan Larsen" >: > > > src/os/bsd/vm/os_linux.cpp: > > I?m inclined to think this is too complicated and hard to test and > > maintain (and I see no tests in the webrev). Could we not simplify this to > > print a helpful message instead? Something that prints the core_pattern and > > perhaps some of the values that could be used for substitution, but does > > not do the actual substitution? I think that would go a long way but be a > > lot more maintainable. > > > > src/os/bsd/vm/os_bsd.cpp: > > On OS X cores are by default written to /cores/core.. This is > > configureable with the kern.corefile sysctl variable, although it is rare > > to do so. 
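The truncation concern raised in the review comments above can be illustrated with standard snprintf(), which (unlike jio_snprintf() as described in the review) returns the length that would have been written rather than -1. This is a minimal sketch, not the code from the webrev; the helper name append_checked is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append piece to buf (capacity size, *used bytes
 * already in use, terminator not counted). Returns 0 on success and -1 if
 * the piece does not fit; on failure buf and *used are left unchanged. */
static int append_checked(char *buf, size_t size, size_t *used, const char *piece)
{
    if (*used >= size) {
        return -1;                      /* no room even for the terminator */
    }

    size_t room = size - *used;
    int written = snprintf(buf + *used, room, "%s", piece);

    if (written < 0 || (size_t)written >= room) {
        buf[*used] = '\0';              /* drop the partial output */
        return -1;
    }

    *used += (size_t)written;
    return 0;
}
```

Checking both the sign and the size of the return value keeps the running offset from moving backwards or past the end of the buffer, whichever convention the underlying formatter uses.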
> > > > /Staffan > > > > > On 24 nov 2014, at 14:21, Yasumasa Suenaga > wrote: > > > > > > Hi all, > > > > > > I've uploaded webrev for this issue about a month ago. > > > Could you review it and sponsor it? > > > > > > > > > Thanks, > > > > > > Yasumasa > > > > > > > > > On 10/15/2014 11:13 PM, Yasumasa Suenaga wrote: > > >> Hi David, > > >> > > >> I've uploaded new webrev: > > >> http://cr.openjdk.java.net/~ysuenaga/JDK-8059586/webrev.02/ > > >> > > >> > > >>> I wasn't suggesting that you make such a change though because it is > > large and disruptive. > > >> > > >>> Unfactoring check_or_create_dump is a step backwards in terms of code > > sharing. > > >> > > >> I restored check_or_create_dump() to os_posix.cpp . > > >> And I changed get_core_path() to create message which represents core > > dump path > > >> (including filename) in each OS. > > >> > > >> > > >>> Expanding the get_core_path in os_linux.cpp to handle the core_pattern > > may be okay (but I don't know enough about it to validate everything). > > >> > > >> I implemented all parameters in Linux kernel documentation: > > >> https://www.kernel.org/doc/Documentation/sysctl/kernel.txt > > >> > > >> So I think that parameters which are processed are enough. > > >> > > >> > > >> Thanks, > > >> > > >> Yasumasa > > >> > > >> > > >> > > >> (2014/10/15 9:41), David Holmes wrote: > > >>> On 14/10/2014 8:05 PM, Yasumasa Suenaga wrote: > > >>>> Hi David, > > >>>> > > >>>> Thank you for comments! > > >>>> I've uploaded new webrev. Could you review it again? > > >>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8059586/webrev.01/ > > >>>> > > >>>> I am an author of jdk9. So I cannot commit it. > > >>>> Could you be a sponsor for this enhancement? > > >>>> > > >>>> > > >>>>> In which case that should be handled by the linux specific > > >>>>> get_core_path() function. > > >>>> > > >>>> Agree. > > >>>> So I implemented it in os_linux.cpp . 
> > >>>> But part of format characters (%P: global pid, %s: signal, %t dump time)
> > >>>> are not processed
> > >>>> in this function because I think these parameters are difficult to
> > >>>> handle in it.
> > >>>>
> > >>>> %P: I could not find API for this.
> > >>>> %s: We have to change arguments of get_core_path() .
> > >>>> %t: This parameter means timestamp of coredump. It is decided in Kernel.
> > >>>>
> > >>>>
> > >>>>> Fixing this means changing all the os_posix using platforms. But your
> > >>>>> patch is not about this part. :)
> > >>>>
> > >>>> I moved os::check_or_create_dump() to each OS implementations (AIX, BSD,
> > >>>> Solaris, Linux) .
> > >>>> So I can write Linux specific code to check_or_create_dump() .
> > >>>> As a result, I could remove "#ifdef LINUX" from os_posix.cpp :-)
> > >>>
> > >>> I wasn't suggesting that you make such a change though because it is large and disruptive. The simple handling of the | part of core_pattern was basically ok. Expanding the get_core_path in os_linux.cpp to handle the core_pattern may be okay (but I don't know enough about it to validate everything). Unfactoring check_or_create_dump is a step backwards in terms of code sharing.
> > >>>
> > >>> Sorry this has grown too large for me to deal with right now.
> > >>>
> > >>> David
> > >>> -----
> > >>>
> > >>>>
> > >>>>> Though I'm unclear whether it both invokes the program and creates a
> > >>>>> core dump file; or just invokes the program?
> > >>>>
> > >>>> If '|' is set, Linux kernel will just redirect core image to user process.
> > >>>> Kernel documentation says as below:
> > >>>> ------------
> > >>>> . If the first character of the pattern is a '|', the kernel will treat
> > >>>> the rest of the pattern as a command to run. The core dump will be
> > >>>> written to the standard input of that program instead of to a file.
> > >>>> ------------
> > >>>>
> > >>>> And implementation of coredump (do_coredump()) follows to it.
> > >>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/coredump.c
> > >>>>
> > >>>>
> > >>>> In case of ABRT, ABRT dumps core image to default location
> > >>>> (/core.)
> > >>>> if user set unlimited to resource limit of core (ulimit -c) .
> > >>>> https://github.com/abrt/abrt/blob/master/src/hooks/abrt-hook-ccpp.c
> > >>>>
> > >>>>
> > >>>>> A few style nits - you need spaces around keywords and before braces
> > >>>>> I also suggest saying "Core dumps may be processed with ..." rather
> > >>>>> than "treated".
> > >>>>> And as you don't do anything in the non-redirect case I suggest
> > >>>>> collapsing this:
> > >>>>
> > >>>> I've fixed them.
> > >>>>
> > >>>>
> > >>>> Thanks,
> > >>>>
> > >>>> Yasumasa
> > >>>>
> > >>>>
> > >>>> (2014/10/13 9:41), David Holmes wrote:
> > >>>>> Hi Yasumasa,
> > >>>>>
> > >>>>> On 7/10/2014 8:48 PM, Yasumasa Suenaga wrote:
> > >>>>>> Hi David,
> > >>>>>>
> > >>>>>> Sorry for my English.
> > >>>>>>
> > >>>>>> I want to propose that JVM should create message according to core
> > >>>>>> pattern (/proc/sys/kernel/core_pattern) .
> > >>>>>> So I filed it to JBS and created a patch.
> > >>>>>
> > >>>>> So I've had a quick look at this core_pattern business and it seems to
> > >>>>> me that there are two aspects to this.
> > >>>>>
> > >>>>> First, without the leading |, the entry in the core_pattern file is a
> > >>>>> naming pattern for the core file. In which case that should be handled
> > >>>>> by the linux specific get_core_path() function. Though that in itself
> > >>>>> can't fully report the expected name, as part of it is provided in the
> > >>>>> shared code in os::check_or_create_dump. Fixing this means changing
> > >>>>> all the os_posix using platforms. But your patch is not about this
> > >>>>> part.
:)
> > >>>>>
> > >>>>> Second, with a leading | the core_pattern is actually the name of a
> > >>>>> program to execute when the program is about to core dump, and that is
> > >>>>> what you report with your patch. Though I'm unclear whether it both
> > >>>>> invokes the program and creates a core dump file; or just invokes the
> > >>>>> program?
> > >>>>>
> > >>>>> So with regards to this second part your patch seems functionally ok.
> > >>>>> I do dislike having a big chunk of linux specific code in this "posix"
> > >>>>> support file but ...
> > >>>>>
> > >>>>> A few style nits - you need spaces around keywords and before braces eg:
> > >>>>>
> > >>>>> if(x){
> > >>>>>
> > >>>>> should be
> > >>>>>
> > >>>>> if (x) {
> > >>>>>
> > >>>>> I also suggest saying "Core dumps may be processed with ..." rather
> > >>>>> than "treated".
> > >>>>>
> > >>>>> And as you don't do anything in the non-redirect case I suggest
> > >>>>> collapsing this:
> > >>>>>
> > >>>>>     is_redirect = core_pattern[0] == '|';
> > >>>>>   }
> > >>>>>
> > >>>>>   if(is_redirect){
> > >>>>>     jio_snprintf(buffer, bufferSize,
> > >>>>>                  "Core dumps may be treated with \"%s\"",
> > >>>>>                  &core_pattern[1]);
> > >>>>>   }
> > >>>>>
> > >>>>> to just
> > >>>>>
> > >>>>>     if (core_pattern[0] == '|') { // redirect
> > >>>>>       jio_snprintf(buffer, bufferSize, "Core dumps may be processed with \"%s\"", &core_pattern[1]);
> > >>>>>     }
> > >>>>>   }
> > >>>>>
> > >>>>> Comments from other runtime folk appreciated.
> > >>>>>
> > >>>>> Thanks,
> > >>>>> David
> > >>>>>
> > >>>>>> Thanks,
> > >>>>>>
> > >>>>>> Yasumasa
> > >>>>>>
> > >>>>>> 2014/10/07 15:43 "David Holmes":
> > >>>>>>
> > >>>>>> Hi Yasumasa,
> > >>>>>>
> > >>>>>> I'm sorry but I don't understand what you are proposing. When you say
> > >>>>>> "treat" do you mean "create"? Otherwise what do you mean by
> > >>>>>> "treated"?
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> David
> > >>>>>>
> > >>>>>> On 2/10/2014 8:38 AM, Yasumasa Suenaga wrote:
> > >>>>>> > I'm in Hackergarten @ JavaOne :-)
> > >>>>>> >
> > >>>>>> >
> > >>>>>> > Hi all,
> > >>>>>> >
> > >>>>>> > I would like to enhance the messages in hs_err report.
> > >>>>>> > Modern Linux kernel can treat core dump with user process (e.g. ABRT)
> > >>>>>> > However, hs_err report cannot detect it.
> > >>>>>> >
> > >>>>>> > I think that hs_err report should output messages as below:
> > >>>>>> > -------------
> > >>>>>> > Failed to write core dump. Core dumps may be treated with "/usr/sbin/chroot /proc/%P/root /usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e"
> > >>>>>> > -------------
> > >>>>>> >
> > >>>>>> > I've uploaded webrev of this enhancement.
> > >>>>>> > Could you review it?
> > >>>>>> >
> > >>>>>> > http://cr.openjdk.java.net/~ysuenaga/JDK-8059586/webrev.00/
> > >>>>>> >
> > >>>>>> > This patch works fine on Fedora20 x86_64.
> > >>>>>> >
> > >>>>>> >
> > >>>>>> >
> > >>>>>> > Thanks,
> > >>>>>> >
> > >>>>>> > Yasumasa
> > >>>>>> >
> > >>>>>> >
> > >
> >
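
For readers following the thread, two of the points discussed above (reporting a core_pattern that starts with '|' without expanding it, and detecting truncation by checking the formatted length against the buffer size) can be sketched in plain C. This is only an illustration under stated assumptions, not the code from any of the webrevs: the function names describe_core_pattern and read_core_pattern and the wording of the second message are hypothetical, and standard snprintf is used in place of HotSpot's jio_snprintf.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the webrev): builds a hint for the error
 * report from a core_pattern string. Returns false if formatting failed
 * or the message did not fit into buf, i.e. it was truncated. */
static bool describe_core_pattern(const char *pattern, char *buf, size_t buflen) {
    int written;
    if (pattern[0] == '|') {
        /* Leading '|': the kernel pipes the core image to this command. */
        written = snprintf(buf, buflen,
                           "Core dumps may be processed with \"%s\"", pattern + 1);
    } else {
        /* Otherwise the pattern is a file name template; print it unexpanded. */
        written = snprintf(buf, buflen,
                           "Core dump file name pattern is \"%s\"", pattern);
    }
    /* snprintf returns the length it needed; a value >= buflen means truncation. */
    return written >= 0 && (size_t)written < buflen;
}

/* Hypothetical helper: reads the first line of a core_pattern file. The path
 * is a parameter so the function can be tried against an ordinary file. */
static bool read_core_pattern(const char *path, char *pattern, size_t len) {
    if (len == 0) return false;
    FILE *f = fopen(path, "r");
    if (f == NULL) return false;
    bool ok = fgets(pattern, (int)len, f) != NULL;
    fclose(f);
    if (ok) pattern[strcspn(pattern, "\n")] = '\0';  /* strip trailing newline */
    return ok;
}
```

On Linux a caller would pass "/proc/sys/kernel/core_pattern" to read_core_pattern and hand the result to describe_core_pattern. Sizing checks against sizeof(buf) rather than a constant such as PATH_MAX makes it easy to shrink the buffer and exercise the truncation path, as suggested in the review.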