RFR(S): 8228960: [TESTBUG] containers/docker/TestJcmdWithSideCar.java: jcmd reports main class as 'Unknown'
David Holmes
david.holmes at oracle.com
Mon Aug 26 07:57:46 UTC 2019
Hi Misha,
On 24/08/2019 3:21 am, mikhailo.seledtsov at oracle.com wrote:
> Finally got some time to work on this issue.
> Since I encountered a problem using files for passing messages
> between a container and a test driver (due to permissions), I looked for
> alternative solutions. I am using the output of a container process to
> signal when the main method has started, and it works. This simplifies
> things quite a bit as well.
>
> Normally, we use the OutputAnalyzer test utility to collect the whole output
> once the process has completed, and then analyze the resulting output
> for "contains some string", match, etc. However, testutils/ProcessTools
> provides an API to consume the output as it is produced. I am using this
> API to detect when the main() method of the container has started.
That seems reasonable. Do we want to make the following change to
minimise unneeded output processing:
    private Consumer<String> outputConsumer = s -> {
!       if (!mainMethodStarted &&
!               s.contains(EventGeneratorLoop.MAIN_METHOD_STARTED)) {
            System.out.println("MainContainer: setting mainMethodStarted");
            mainMethodStarted = true;
        }
    };
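
For readers following along, here is a minimal, self-contained sketch of the same idea: consume output line by line as it is produced and stop searching once the marker has been seen. It does not use testutils/ProcessTools. The class name StartupSignalSketch, the marker value, and the pump() and waitForMainMethodStart() helpers are all hypothetical, and a CountDownLatch stands in for the boolean flag so the waiting thread can block with a timeout.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class StartupSignalSketch {
    // Hypothetical marker value; the real test defines its own constant.
    public static final String MAIN_METHOD_STARTED = "MAIN_METHOD_STARTED";

    // Released once the marker line has been observed.
    private final CountDownLatch started = new CountDownLatch(1);

    // Called once per output line. After the marker has been seen, the
    // (cheap) latch check short-circuits the string search.
    public final Consumer<String> outputConsumer = line -> {
        if (started.getCount() > 0 && line.contains(MAIN_METHOD_STARTED)) {
            started.countDown();
        }
    };

    // Feeds every line from 'source' to the consumer on a background thread.
    public Thread pump(Reader source) {
        Thread t = new Thread(() -> {
            try (BufferedReader reader = new BufferedReader(source)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    outputConsumer.accept(line);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Returns true if the marker was seen before the timeout elapsed.
    public boolean waitForMainMethodStart(long timeout, TimeUnit unit)
            throws InterruptedException {
        return started.await(timeout, unit);
    }

    public static void main(String[] args) throws Exception {
        StartupSignalSketch sketch = new StartupSignalSketch();
        // A StringReader stands in for the container's stdout here.
        sketch.pump(new StringReader(
                "JVM booting\n" + MAIN_METHOD_STARTED + "\nevent 1\n"));
        System.out.println(sketch.waitForMainMethodStart(5, TimeUnit.SECONDS));
    }
}
```

With a real child process, the reader passed to pump() would presumably be an InputStreamReader wrapped around Process.getInputStream(). Because the consumer runs on a different thread from the one that waits, a plain boolean flag would need to be volatile; the latch avoids that concern.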
> Updated webrev:
> http://cr.openjdk.java.net/~mseledtsov/8228960.02/
Otherwise looks okay. Hopefully those other test cases will be enabled
in the not too distant future.
Thanks,
David
-----
>
> Testing:
>
> Ran the test on Linux-x64, on multiple nodes in a test cluster,
> 50 times - All PASS
>
>
> Thank you,
>
> Misha
>
> On 8/13/19 2:05 PM, Bob Vandette wrote:
>>
>>> On Aug 13, 2019, at 3:28 PM, mikhailo.seledtsov at oracle.com wrote:
>>>
>>>
>>> On 8/13/19 12:06 PM, Bob Vandette wrote:
>>>>> On Aug 13, 2019, at 2:57 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>>
>>>>> Hi Bob,
>>>>>
>>>>> The workdir (JTwork/scratch) is created with the "test user"
>>>>> permissions. Let me try to place the "signal" file in /tmp instead,
>>>>> since /tmp should normally have a 777 permission on Linux.
>>>> Aren’t you creating a file inside a docker container and then
>>>> checking for its existence outside of the container?
>>> Correct
>>>> Isn’t the root user running inside the container?
>>> By default it is. But it still fails to create a file, for some
>>> reason. It may be related to SELinux settings (for instance, see this
>>> article:
>>> https://stackoverflow.com/questions/24288616/permission-denied-on-accessing-host-directory-in-docker/31334443),
>>> which I cannot change.
>> Is your JTWork/scratch on an NFS mounted file system? If this is the
>> case then the problem is that root is equivalent to nobody on
>> mounted file systems and can’t create files unless the directory has
>> 777 permissions. I just confirmed this. You’d have to either run
>> the container test as test-user or change the scratch directory
>> permission.
>>
>> Bob.
>>
>>> My hope is that /tmp is configured to be accessed by a container
>>> engine as a general purpose directory, hence I was thinking to try it
>>> out.
>>>
>>>> Both processes don’t see the same /tmp right? So that shouldn’t help.
>>> In my next experiment, I will map a /tmp from host to be a /host-tmp
>>> inside the container (--volume /tmp:/host-tmp), then write a signal
>>> file to /host-tmp.
>>>> If scratch has 777 permissions, anyone can create a file.
>>> scratch has "rwxr-xr-x"
>>>> You have to be careful that you can clean up the
>>>> file from outside the container. I’d make sure to create it with 777.
>>> I do use deleteOnExit(), so it should work (unless the JVM crashes).
>>> I guess I could add an extra layer of safety here, and set the
>>> permissions to 777. Thank you for the advice.
>>>
>>>
>>> Thank you,
>>>
>>> Misha
>>>
>>>> Bob.
>>>>
>>>>> If this works, I will have to add some unique number to the file
>>>>> name, perhaps a PID of a child process.
>>>>>
>>>>> I will try this, and let you know how it works.
>>>>>
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Misha
>>>>>
>>>>> On 8/13/19 6:34 AM, Bob Vandette wrote:
>>>>>> Sorry, I just looked at the webrev and you are trying the approach
>>>>>> I suggested. I thought you
>>>>>> were trying to use file change notification.
>>>>>>
>>>>>> Where does the workdir get created? Does it have 777 permissions?
>>>>>>
>>>>>> Bob.
>>>>>>
>>>>>>
>>>>>>> On Aug 13, 2019, at 9:29 AM, Bob Vandette
>>>>>>> <bob.vandette at oracle.com> wrote:
>>>>>>>
>>>>>>> What if you just poll for the creation of the file waiting some
>>>>>>> small amount of time between polling with a maximum timeout.
>>>>>>>
>>>>>>> Bob.
>>>>>>>
>>>>>>>
>>>>>>>> On Aug 12, 2019, at 8:22 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>
>>>>>>>> Unfortunately, this approach does not seem to work on many of
>>>>>>>> our test cluster machines. The creation of a "signal" file
>>>>>>>> results in "PermissionDenied".
>>>>>>>>
>>>>>>>> The possible reason is the selinux configuration, or some other
>>>>>>>> permission related stuff. The container tries to create a new
>>>>>>>> file on a mounted volume on a host system, but host system
>>>>>>>> denies it. I will look a bit deeper into this, but I think this
>>>>>>>> type of issue can be encountered on any automated test system.
>>>>>>>> Hence, we may have to abandon this approach.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Misha
>>>>>>>>
>>>>>>>>
>>>>>>>> On 8/12/19 3:59 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>> Here is an updated webrev:
>>>>>>>>> http://cr.openjdk.java.net/~mseledtsov/8228960.01/
>>>>>>>>>
>>>>>>>>> I am using a simple file-based mechanism to communicate between
>>>>>>>>> the processes. The "EventGeneratorLoop" process creates a
>>>>>>>>> specific "signal" file on a shared mounted volume, while the
>>>>>>>>> main test process waits for the file to exist before running
>>>>>>>>> the test cases.
>>>>>>>>>
>>>>>>>>> Passes on Linux-x64 Docker-enabled host. Testing in the test
>>>>>>>>> cluster is in progress.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>>
>>>>>>>>> Misha
>>>>>>>>>
>>>>>>>>> On 8/7/19 5:11 PM, David Holmes wrote:
>>>>>>>>>> On 8/08/2019 9:04 am, Mikhailo Seledtsov wrote:
>>>>>>>>>>> Hi Severin, Bob,
>>>>>>>>>>>
>>>>>>>>>>> Thank you for reviewing the code.
>>>>>>>>>>>
>>>>>>>>>>> On 8/7/19, 11:38 AM, Bob Vandette wrote:
>>>>>>>>>>>> Can’t you come up with a better way of synchronizing the
>>>>>>>>>>>> test by possibly writing a
>>>>>>>>>>>> file and waiting for it to exist with a timeout?
>>>>>>>>>>> I will try out this approach.
>>>>>>>>>> This seems like a fundamental problem with jcmd - so cc'ing
>>>>>>>>>> serviceability-dev.
>>>>>>>>>>
>>>>>>>>>> But I'm pretty sure they recently addressed a similar issue
>>>>>>>>>> with the premature sending of the attach signal?
>>>>>>>>>>
>>>>>>>>>> David
>>>>>>>>>> -----
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Misha
>>>>>>>>>>>> Isn’t there a shared volume between the two
>>>>>>>>>>>> processes?
>>>>>>>>>>>>
>>>>>>>>>>>> We’ve been fighting test reliability for a while now. I can
>>>>>>>>>>>> only hope we’re getting
>>>>>>>>>>>> to the end.
>>>>>>>>>>>>
>>>>>>>>>>>> Bob.
>>>>>>>>>>>>
>>>>>>>>>>>>> On Aug 7, 2019, at 2:18 PM, Severin
>>>>>>>>>>>>> Gehwolf <sgehwolf at redhat.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Misha,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, 2019-08-06 at 20:17 -0700,
>>>>>>>>>>>>> mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>>>>>>> Please review this change that fixes a container test
>>>>>>>>>>>>>> TestJcmdWithSideCar.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My investigation indicated that a root cause for this
>>>>>>>>>>>>>> failure is:
>>>>>>>>>>>>>> JCMD -l shows 'Unknown' for class name because the main
>>>>>>>>>>>>>> class has not
>>>>>>>>>>>>>> been loaded yet.
>>>>>>>>>>>>>> The target test JVM has started, it is initializing, but
>>>>>>>>>>>>>> has not loaded
>>>>>>>>>>>>>> the main test class.
>>>>>>>>>>>>> That's what I've found too.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The proposed solution is to try 'jcmd -l' several times,
>>>>>>>>>>>>>> with a short
>>>>>>>>>>>>>> sleep in between.
>>>>>>>>>>>>> Thread.sleep() isn't great, but I'm not sure there is an
>>>>>>>>>>>>> alternative.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also I have commented out the testCase02() due to another
>>>>>>>>>>>>>> bug:
>>>>>>>>>>>>>> "JDK-8228850: jhsdb jinfo fails with ClassCastException:
>>>>>>>>>>>>>> s.j.h.oops.TypeArray cannot be cast to s.j.h.oops.Instance",
>>>>>>>>>>>>>> which is not a test bug. IMO, it is better to run the test
>>>>>>>>>>>>>> and skip a
>>>>>>>>>>>>>> sub-case than to skip the entire test.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8228960
>>>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~mseledtsov/8228960.00/
>>>>>>>>>>>>> Looks OK to me.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Severin
>>>>>>>>>>>>>