RFR(S): 8228960: [TESTBUG] containers/docker/TestJcmdWithSideCar.java: jcmd reports main class as 'Unknown'
mikhailo.seledtsov at oracle.com
Mon Aug 26 19:32:36 UTC 2019
Hi David,
Thank you for the review.
On 8/26/19 12:57 AM, David Holmes wrote:
> Hi Misha,
>
> On 24/08/2019 3:21 am, mikhailo.seledtsov at oracle.com wrote:
>> Finally got some time to work on this issue.
>> Since I encountered problems using files to pass messages
>> between a container and a test driver (due to permissions), I looked
>> for alternative solutions. I am using the output of a container
>> process to signal when the main method has started, and it works.
>> This simplifies things quite a bit as well.
>>
>> Normally, we use the OutputAnalyzer test utility to collect the whole
>> output once the process has completed, and then analyze the resulting
>> output for "contains some string", match, etc. However,
>> testutils/ProcessTools provides an API to consume the output as it is
>> produced. I am using this API to detect when the main() method of the
>> container has started.
>
> That seems reasonable. Do we want to make the following change to
> minimise unneeded output processing:
>
>     private Consumer<String> outputConsumer = s -> {
> !       if (!mainMethodStarted && s.contains(EventGeneratorLoop.MAIN_METHOD_STARTED)) {
>             System.out.println("MainContainer: setting mainMethodStarted");
>             mainMethodStarted = true;
>         }
>     };
Thank you for the suggestion. I will update the code accordingly.
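For reference, here is a minimal, self-contained sketch of the guarded consumer. It is not the webrev code: the class name MainStartDetector, the marker value, and the waitForStart() helper are assumptions made for illustration, with the marker standing in for EventGeneratorLoop.MAIN_METHOD_STARTED.

```java
import java.util.function.Consumer;

// Illustrative sketch only; the class name and marker value are hypothetical.
class MainStartDetector {
    // Stand-in for EventGeneratorLoop.MAIN_METHOD_STARTED (real value not shown in this thread).
    static final String MAIN_METHOD_STARTED = "MAIN_METHOD_STARTED";

    // volatile: written by the thread reading the process output,
    // read by the thread driving the test.
    private volatile boolean mainMethodStarted = false;

    // Called once per output line. The flag is checked first, so
    // contains() is skipped for every line after the marker was seen.
    void accept(String line) {
        if (!mainMethodStarted && line.contains(MAIN_METHOD_STARTED)) {
            System.out.println("MainContainer: setting mainMethodStarted");
            mainMethodStarted = true;
        }
    }

    // Adapter for APIs that take a Consumer<String> of output lines.
    Consumer<String> consumer() {
        return this::accept;
    }

    boolean isStarted() {
        return mainMethodStarted;
    }

    // Hypothetical helper: polls the flag until it is set or the timeout elapses.
    boolean waitForStart(long timeoutMillis, long pollMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!mainMethodStarted && System.currentTimeMillis() < deadline) {
            Thread.sleep(pollMillis);
        }
        return mainMethodStarted;
    }
}
```

In the test, the consumer would be handed to the ProcessTools API that consumes output as it is produced, and the jcmd test cases would run only after the flag is set.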
>
>> Updated webrev:
>> http://cr.openjdk.java.net/~mseledtsov/8228960.02/
>
> Otherwise looks okay. Hopefully those other test cases will be enabled
> in the not too distant future.
I hope so as well.
Thank you,
Misha
>
> Thanks,
> David
> -----
>
>>
>> Testing:
>>
>> Ran the test 50 times on Linux-x64, on multiple nodes in a test
>> cluster - all PASS
>>
>>
>> Thank you,
>>
>> Misha
>>
>> On 8/13/19 2:05 PM, Bob Vandette wrote:
>>>
>>>> On Aug 13, 2019, at 3:28 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>
>>>>
>>>> On 8/13/19 12:06 PM, Bob Vandette wrote:
>>>>>> On Aug 13, 2019, at 2:57 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>>>
>>>>>> Hi Bob,
>>>>>>
>>>>>> The workdir (JTwork/scratch) is created with the "test user"
>>>>>> permissions. Let me try to place the "signal" file in /tmp
>>>>>> instead, since /tmp should normally have a 777 permission on Linux.
>>>>> Aren’t you creating a file inside a docker container and then
>>>>> checking for its existence outside of the container?
>>>> Correct
>>>>> Isn’t the root user running inside the container?
>>>> By default it is. But it still fails to create a file, for some
>>>> reason. It may be related to SELinux settings (for instance, see
>>>> this article:
>>>> https://stackoverflow.com/questions/24288616/permission-denied-on-accessing-host-directory-in-docker/31334443),
>>>> which I cannot change.
>>> Is your JTWork/scratch on an NFS mounted file system? If this is
>>> the case then the problem is that root is equivalent to nobody on
>>> mounted file systems and can’t create files unless the directory has
>>> 777 permissions. I just confirmed this. You’d have to either run
>>> the container test as test-user or change the scratch directory
>>> permission.
>>>
>>> Bob.
>>>
>>>> My hope is that /tmp is configured to be accessible to a container
>>>> engine as a general purpose directory, hence I was thinking of
>>>> trying it out.
>>>>
>>>>> Both processes don’t see the same /tmp right? So that shouldn’t
>>>>> help.
>>>> In my next experiment, I will map /tmp from the host to
>>>> /host-tmp inside the container (--volume /tmp:/host-tmp), then
>>>> write a signal file to /host-tmp.
>>>>> If scratch has 777 permissions, anyone can create a file.
>>>> scratch has "rwxr-xr-x"
>>>>> You have to be careful that you can clean up the
>>>>> file from outside the container. I’d make sure to create it with
>>>>> 777.
>>>> I do use deleteOnExit(), so it should work (unless the JVM
>>>> crashes). I guess I could add an extra layer of safety here, and set
>>>> the permissions to 777. Thank you for the advice.
>>>>
>>>>
>>>> Thank you,
>>>>
>>>> Misha
>>>>
>>>>> Bob.
>>>>>
>>>>>> If this works, I will have to add some unique number to the file
>>>>>> name, perhaps a PID of a child process.
>>>>>>
>>>>>> I will try this, and let you know how it works.
>>>>>>
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>> Misha
>>>>>>
>>>>>> On 8/13/19 6:34 AM, Bob Vandette wrote:
>>>>>>> Sorry, I just looked at the webrev and you are trying the
>>>>>>> approach I suggested. I thought you
>>>>>>> were trying to use file change notification.
>>>>>>>
>>>>>>> Where does the workdir get created? Does it have 777 permissions?
>>>>>>>
>>>>>>> Bob.
>>>>>>>
>>>>>>>
>>>>>>>> On Aug 13, 2019, at 9:29 AM, Bob Vandette
>>>>>>>> <bob.vandette at oracle.com> wrote:
>>>>>>>>
>>>>>>>> What if you just poll for the creation of the file, waiting a
>>>>>>>> small amount of time between polls, with a maximum timeout?
>>>>>>>>
>>>>>>>> Bob.
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Aug 12, 2019, at 8:22 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>>
>>>>>>>>> Unfortunately, this approach does not seem to work on many of
>>>>>>>>> our test cluster machines. The creation of a "signal" file
>>>>>>>>> results in "PermissionDenied".
>>>>>>>>>
>>>>>>>>> The possible reason is the SELinux configuration, or some
>>>>>>>>> other permission-related setting. The container tries to create
>>>>>>>>> a new file on a mounted volume on the host system, but the host
>>>>>>>>> system denies it. I will look a bit deeper into this, but I
>>>>>>>>> think this type of issue can be encountered on any automated
>>>>>>>>> test system. Hence, we may have to abandon this approach.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Misha
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 8/12/19 3:59 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>>> Here is an updated webrev:
>>>>>>>>>> http://cr.openjdk.java.net/~mseledtsov/8228960.01/
>>>>>>>>>>
>>>>>>>>>> I am using a simple file-based mechanism to communicate
>>>>>>>>>> between the processes. The "EventGeneratorLoop" process
>>>>>>>>>> creates a specific "signal" file on a shared mounted volume,
>>>>>>>>>> while the main test process waits for the file to exist
>>>>>>>>>> before running the test cases.
>>>>>>>>>>
>>>>>>>>>> Passes on Linux-x64 Docker-enabled host. Testing in the test
>>>>>>>>>> cluster is in progress.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thank you,
>>>>>>>>>>
>>>>>>>>>> Misha
>>>>>>>>>>
>>>>>>>>>> On 8/7/19 5:11 PM, David Holmes wrote:
>>>>>>>>>>> On 8/08/2019 9:04 am, Mikhailo Seledtsov wrote:
>>>>>>>>>>>> Hi Severin, Bob,
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you for reviewing the code.
>>>>>>>>>>>>
>>>>>>>>>>>> On 8/7/19, 11:38 AM, Bob Vandette wrote:
>>>>>>>>>>>>> Can’t you come up with a better way of synchronizing the
>>>>>>>>>>>>> test by possibly writing a
>>>>>>>>>>>>> file and waiting for it to exist with a timeout?
>>>>>>>>>>>> I will try out this approach.
>>>>>>>>>>> This seems like a fundamental problem with jcmd - so cc'ing
>>>>>>>>>>> serviceability-dev.
>>>>>>>>>>>
>>>>>>>>>>> But I'm pretty sure they recently addressed a similar issue
>>>>>>>>>>> with the premature sending of the attach signal?
>>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>> -----
>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Misha
>>>>>>>>>>>>> Isn’t there a shared volume between the two
>>>>>>>>>>>>> processes?
>>>>>>>>>>>>>
>>>>>>>>>>>>> We’ve been fighting test reliability for a while now. I
>>>>>>>>>>>>> can only hope we’re getting
>>>>>>>>>>>>> to the end.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Bob.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Aug 7, 2019, at 2:18 PM, Severin
>>>>>>>>>>>>>> Gehwolf <sgehwolf at redhat.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Misha,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, 2019-08-06 at 20:17 -0700,
>>>>>>>>>>>>>> mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>>>>>>>> Please review this change that fixes a container test
>>>>>>>>>>>>>>> TestJcmdWithSideCar.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My investigation indicated that a root cause for this
>>>>>>>>>>>>>>> failure is:
>>>>>>>>>>>>>>> JCMD -l shows 'Unknown' for class name because the main
>>>>>>>>>>>>>>> class has not
>>>>>>>>>>>>>>> been loaded yet.
>>>>>>>>>>>>>>> The target test JVM has started, it is initializing, but
>>>>>>>>>>>>>>> has not loaded
>>>>>>>>>>>>>>> the main test class.
>>>>>>>>>>>>>> That's what I've found too.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The proposed solution is to try 'jcmd -l' several times,
>>>>>>>>>>>>>>> with a short
>>>>>>>>>>>>>>> sleep in between.
>>>>>>>>>>>>>> Thread.sleep() isn't great, but I'm not sure there is an
>>>>>>>>>>>>>> alternative.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also I have commented out the testCase02() due to
>>>>>>>>>>>>>>> another bug:
>>>>>>>>>>>>>>> "JDK-8228850: jhsdb jinfo fails with ClassCastException:
>>>>>>>>>>>>>>> s.j.h.oops.TypeArray cannot be cast to
>>>>>>>>>>>>>>> s.j.h.oops.Instance",
>>>>>>>>>>>>>>> which is not a test bug. IMO, it is better to run the
>>>>>>>>>>>>>>> test and skip a
>>>>>>>>>>>>>>> sub-case than to skip the entire test.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8228960
>>>>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~mseledtsov/8228960.00/
>>>>>>>>>>>>>> Looks OK to me.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Severin
>>>>>>>>>>>>>>