RFR(S): 8228960: [TESTBUG] containers/docker/TestJcmdWithSideCar.java: jcmd reports main class as 'Unknown'

mikhailo.seledtsov at oracle.com mikhailo.seledtsov at oracle.com
Thu Aug 29 15:41:57 UTC 2019


I believe I need a second reviewer for this change. Could someone, 
please, review this change version 2 ? (David already reviewed it).

http://cr.openjdk.java.net/~mseledtsov/8228960.02/


Thank you in advance,

Misha


On 8/26/19 12:32 PM, mikhailo.seledtsov at oracle.com wrote:
> Hi David,
>
>   Thank you for review.
>
> On 8/26/19 12:57 AM, David Holmes wrote:
>> Hi Misha,
>>
>> On 24/08/2019 3:21 am, mikhailo.seledtsov at oracle.com wrote:
>>> Finally got some time to work on this issue.
>>> Since I have encountered problem using files for passing messages 
>>> between a container and a test driver (due to permissions), I looked 
>>> for alternative solutions. I am using the output of a container 
>>> process to signal when the main method has started, and it works. 
>>> This simplifies things quite a bit as well.
>>>
>>> Normally, we use OutputAnalyzer test utility to collect the whole 
>>> output once the process has completed, and then analyze the 
>>> resulting output for "contains some string", match, etc. However, 
>>> testutils/ProcessTools provides an API to consume the output as it 
>>> is produced. I am using this API to detect when the main() method of 
>>> the container has started.
>>
>> That seems reasonable. Do we want to make the following change to 
>> minimise unneeded output processing:
>>
>>          private Consumer<String> outputConsumer = s -> {
>> !            if (!mainMethodStarted && 
>> s.contains(EventGeneratorLoop.MAIN_METHOD_STARTED)) {
>>                  System.out.println("MainContainer: setting 
>> mainMethodStarted");
>>                  mainMethodStarted = true;
>>              }
>>          };
> Thank you for the suggestion. I will update the code accordingly.
>>
>>> Updated webrev:
>>>      http://cr.openjdk.java.net/~mseledtsov/8228960.02/
>>
>> Otherwise looks okay. Hopefully those other test cases will be 
>> enabled in the not too distant future.
>
> I hope so as well.
>
>
> Thank you,
>
> Misha
>
>>
>> Thanks,
>> David
>> -----
>>
>>>
>>> Testing:
>>>
>>>    Ran the test on Linux-x64, various multiple nodes in a test 
>>> cluster 50 times - All PASS
>>>
>>>
>>> Thank you,
>>>
>>> Misha
>>>
>>> On 8/13/19 2:05 PM, Bob Vandette wrote:
>>>>
>>>>> On Aug 13, 2019, at 3:28 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>>
>>>>>
>>>>> On 8/13/19 12:06 PM, Bob Vandette wrote:
>>>>>>> On Aug 13, 2019, at 2:57 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>
>>>>>>> Hi Bob,
>>>>>>>
>>>>>>>    The workdir (JTwork/scratch) is created with the "test user" 
>>>>>>> permissions. Let me try to place the "signal" file in /tmp 
>>>>>>> instead, since /tmp should normally have a 777 permission on Linux.
>>>>>> Aren’t you creating a file inside a docker container and then 
>>>>>> checking for its existence outside of the container?
>>>>> Correct
>>>>>> Isn’t the root user running inside the container?
>>>>> By default it is. But it still fails to create a file, for some 
>>>>> reason. Can be related to selinux settings (for instance, see this 
>>>>> article: 
>>>>> https://stackoverflow.com/questions/24288616/permission-denied-on-accessing-host-directory-in-docker/31334443), 
>>>>> I can not change those.
>>>> Is your JTWork/scratch on an NFS mounted file system?  If this is 
>>>> the case then the problem is that root is equivalent to nobody on
>>>> mounted file systems and can’t create files unless the directory 
>>>> has 777 permissions.  I just confirmed this. You’d have to either run
>>>> the container test as test-user or change the scratch directory 
>>>> permission.
>>>>
>>>> Bob.
>>>>
>>>>> My hope is that /tmp is configured to be accessed by a container 
>>>>> engine as a general purpose directory, hence I was thinking to try 
>>>>> it out.
>>>>>
>>>>>> Both processes don’t see the same /tmp right?   So that shouldn’t 
>>>>>> help.
>>>>> In my next experiment, I will map a /tmp from host to be a 
>>>>> /host-tmp inside the container (--volume /tmp:/host-tmp), then 
>>>>> write a signal file to /host-tmp.
>>>>>> If scratch has 777 permissions, anyone can create a file.
>>>>> scratch has  "rwxr-xr-x"
>>>>>> You have to be careful that you can clean up the
>>>>>> file from outside the container.  I’d make sure to create it with 
>>>>>> 777.
>>>>> I do use deleteOnExit(), so it should work (unless the JVM 
>>>>> crashes). I guess I could add extra layer of safety here, and set 
>>>>> the permissions to 777. Thank you for advice.
>>>>>
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Misha
>>>>>
>>>>>> Bob.
>>>>>>
>>>>>>> If this works, I will have to add some unique number to the file 
>>>>>>> name, perhaps a PID of a child process.
>>>>>>>
>>>>>>> I will try this, and let you know how it works.
>>>>>>>
>>>>>>>
>>>>>>> Thank you,
>>>>>>>
>>>>>>> Misha
>>>>>>>
>>>>>>> On 8/13/19 6:34 AM, Bob Vandette wrote:
>>>>>>>> Sorry, I just looked at the webrev and you are trying the 
>>>>>>>> approach I suggested. I thought you
>>>>>>>> were trying to use file change notification.
>>>>>>>>
>>>>>>>> Where does the workdir get created?  Does it have 777 permissions?
>>>>>>>>
>>>>>>>> Bob.
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Aug 13, 2019, at 9:29 AM, Bob Vandette 
>>>>>>>>> <bob.vandette at oracle.com> wrote:
>>>>>>>>>
>>>>>>>>> What if you just poll for the creation of the file waiting 
>>>>>>>>> some small amount of time between polling with a maximum timeout.
>>>>>>>>>
>>>>>>>>> Bob.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Aug 12, 2019, at 8:22 PM, mikhailo.seledtsov at oracle.com 
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Unfortunately, this approach does not seem to work on many of 
>>>>>>>>>> our test cluster machines. The creation of a "signal" file 
>>>>>>>>>> results in "PermissionDenied".
>>>>>>>>>>
>>>>>>>>>> The possible reason is the selinux configuration, or some 
>>>>>>>>>> other permission related stuff. The container tries to create 
>>>>>>>>>> a new file on a mounted volume on a host system, but host 
>>>>>>>>>> system denies it. I will look a bit deeper into this, but I 
>>>>>>>>>> think this type of issue can be encountered on any automated 
>>>>>>>>>> test system. Hence, we may have to abandon this approach.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Misha
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 8/12/19 3:59 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>>>> Here is an updated webrev: 
>>>>>>>>>>> http://cr.openjdk.java.net/~mseledtsov/8228960.01/
>>>>>>>>>>>
>>>>>>>>>>> I am using a simple file-based mechanism to communicate 
>>>>>>>>>>> between the processes. The "EventGeneratorLoop" process 
>>>>>>>>>>> creates a specific "signal" file on a shared mounted volume, 
>>>>>>>>>>> while the main test process waits  for the file to exist 
>>>>>>>>>>> before running the test cases.
>>>>>>>>>>>
>>>>>>>>>>> Passes on Linux-x64 Docker-enabled host. Testing in the test 
>>>>>>>>>>> cluster is in progress.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thank you,
>>>>>>>>>>>
>>>>>>>>>>> Misha
>>>>>>>>>>>
>>>>>>>>>>> On 8/7/19 5:11 PM, David Holmes wrote:
>>>>>>>>>>>> On 8/08/2019 9:04 am, Mikhailo Seledtsov wrote:
>>>>>>>>>>>>> Hi Severin, Bob,
>>>>>>>>>>>>>
>>>>>>>>>>>>>    Thank you for reviewing the code.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 8/7/19, 11:38 AM, Bob Vandette wrote:
>>>>>>>>>>>>>> Can’t you come up with a better way of synchronizing the 
>>>>>>>>>>>>>> test by possibly writing a
>>>>>>>>>>>>>> file and waiting for it to exist with a timeout?
>>>>>>>>>>>>> I will try out this approach.
>>>>>>>>>>>> This seems like a fundamental problem with jcmd - so cc'ing 
>>>>>>>>>>>> serviceability-dev.
>>>>>>>>>>>>
>>>>>>>>>>>> But I'm pretty sure they recently addressed a similar issue 
>>>>>>>>>>>> with the premature sending of the attach signal?
>>>>>>>>>>>>
>>>>>>>>>>>> David
>>>>>>>>>>>> -----
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Misha
>>>>>>>>>>>>>> Isn’t there a shared volume between the two
>>>>>>>>>>>>>> processes?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We’ve been fighting test reliability for a while now.  I 
>>>>>>>>>>>>>> can only hope we’re getting
>>>>>>>>>>>>>> to the end.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Bob.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Aug 7, 2019, at 2:18 PM, Severin 
>>>>>>>>>>>>>>> Gehwolf<sgehwolf at redhat.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Misha,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, 2019-08-06 at 20:17 -0700, 
>>>>>>>>>>>>>>> mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>>>>>>>>> Please review this change that fixes a container test 
>>>>>>>>>>>>>>>> TestJcmdWithSideCar.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> My investigation indicated that a root cause for this 
>>>>>>>>>>>>>>>> failure is:
>>>>>>>>>>>>>>>> JCMD -l shows 'Unknown' for class name because the main 
>>>>>>>>>>>>>>>> class has not
>>>>>>>>>>>>>>>> been loaded yet.
>>>>>>>>>>>>>>>> The target test JVM has started, it is initializing, 
>>>>>>>>>>>>>>>> but has not loaded
>>>>>>>>>>>>>>>> the main test class.
>>>>>>>>>>>>>>> That's what I've found too.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The proposed solution is to try 'jcmd -l' several 
>>>>>>>>>>>>>>>> times, with a short
>>>>>>>>>>>>>>>> sleep in between.
>>>>>>>>>>>>>>> Thread.sleep() isn't great, but I'm not sure there is an 
>>>>>>>>>>>>>>> alternative.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also I have commented out the testCase02() due to 
>>>>>>>>>>>>>>>> another bug:
>>>>>>>>>>>>>>>> "JDK-8228850: jhsdb jinfo fails with ClassCastException:
>>>>>>>>>>>>>>>> s.j.h.oops.TypeArray cannot be cast to 
>>>>>>>>>>>>>>>> s.j.h.oops.Instance",
>>>>>>>>>>>>>>>> which is not a test bug. IMO, it is better to run the 
>>>>>>>>>>>>>>>> test and skip a
>>>>>>>>>>>>>>>> sub-case than to skip the entire test.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>      JBS: https://bugs.openjdk.java.net/browse/JDK-8228960
>>>>>>>>>>>>>>>>      Webrev: 
>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~mseledtsov/8228960.00/
>>>>>>>>>>>>>>> Looks OK to me.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Severin
>>>>>>>>>>>>>>>


More information about the serviceability-dev mailing list