RFR(S): 8228960: [TESTBUG] containers/docker/TestJcmdWithSideCar.java: jcmd reports main class as 'Unknown'
mikhailo.seledtsov at oracle.com
mikhailo.seledtsov at oracle.com
Thu Aug 29 15:41:57 UTC 2019
I believe I need a second reviewer for this change. Could someone,
please, review this change version 2 ? (David already reviewed it).
http://cr.openjdk.java.net/~mseledtsov/8228960.02/
Thank you in advance,
Misha
On 8/26/19 12:32 PM, mikhailo.seledtsov at oracle.com wrote:
> Hi David,
>
> Thank you for review.
>
> On 8/26/19 12:57 AM, David Holmes wrote:
>> Hi Misha,
>>
>> On 24/08/2019 3:21 am, mikhailo.seledtsov at oracle.com wrote:
>>> Finally got some time to work on this issue.
>>> Since I have encountered problem using files for passing messages
>>> between a container and a test driver (due to permissions), I looked
>>> for alternative solutions. I am using the output of a container
>>> process to signal when the main method has started, and it works.
>>> This simplifies things quite a bit as well.
>>>
>>> Normally, we use OutputAnalyzer test utility to collect the whole
>>> output once the process has completed, and then analyze the
>>> resulting output for "contains some string", match, etc. However,
>>> testutils/ProcessTools provides an API to consume the output as it
>>> is produced. I am using this API to detect when the main() method of
>>> the container has started.
>>
>> That seems reasonable. Do we want to make the following change to
>> minimise unneeded output processing:
>>
>> private Consumer<String> outputConsumer = s -> {
>> ! if (!mainMethodStarted &&
>> s.contains(EventGeneratorLoop.MAIN_METHOD_STARTED)) {
>> System.out.println("MainContainer: setting
>> mainMethodStarted");
>> mainMethodStarted = true;
>> }
>> };
> Thank you for the suggestion. I will update the code accordingly.
>>
>>> Updated webrev:
>>> http://cr.openjdk.java.net/~mseledtsov/8228960.02/
>>
>> Otherwise looks okay. Hopefully those other test cases will be
>> enabled in the not too distant future.
>
> I hope so as well.
>
>
> Thank you,
>
> Misha
>
>>
>> Thanks,
>> David
>> -----
>>
>>>
>>> Testing:
>>>
>>> Ran the test on Linux-x64, various multiple nodes in a test
>>> cluster 50 times - All PASS
>>>
>>>
>>> Thank you,
>>>
>>> Misha
>>>
>>> On 8/13/19 2:05 PM, Bob Vandette wrote:
>>>>
>>>>> On Aug 13, 2019, at 3:28 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>>
>>>>>
>>>>> On 8/13/19 12:06 PM, Bob Vandette wrote:
>>>>>>> On Aug 13, 2019, at 2:57 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>
>>>>>>> Hi Bob,
>>>>>>>
>>>>>>> The workdir (JTwork/scratch) is created with the "test user"
>>>>>>> permissions. Let me try to place the "signal" file in /tmp
>>>>>>> instead, since /tmp should normally have a 777 permission on Linux.
>>>>>> Aren’t you creating a file inside a docker container and then
>>>>>> checking for its existence outside of the container?
>>>>> Correct
>>>>>> Isn’t the root user running inside the container?
>>>>> By default it is. But it still fails to create a file, for some
>>>>> reason. Can be related to selinux settings (for instance, see this
>>>>> article:
>>>>> https://stackoverflow.com/questions/24288616/permission-denied-on-accessing-host-directory-in-docker/31334443),
>>>>> I can not change those.
>>>> Is your JTWork/scratch on an NFS mounted file system? If this is
>>>> the case then the problem is that root is equivalent to nobody on
>>>> mounted file systems and can’t create files unless the directory
>>>> has 777 permissions. I just confirmed this. You’d have to either run
>>>> the container test as test-user or change the scratch directory
>>>> permission.
>>>>
>>>> Bob.
>>>>
>>>>> My hope is that /tmp is configured to be accessed by a container
>>>>> engine as a general purpose directory, hence I was thinking to try
>>>>> it out.
>>>>>
>>>>>> Both processes don’t see the same /tmp right? So that shouldn’t
>>>>>> help.
>>>>> In my next experiment, I will map a /tmp from host to be a
>>>>> /host-tmp inside the container (--volume /tmp:/host-tmp), then
>>>>> write a signal file to /host-tmp.
>>>>>> If scratch has 777 permissions, anyone can create a file.
>>>>> scratch has "rwxr-xr-x"
>>>>>> You have to be careful that you can clean up the
>>>>>> file from outside the container. I’d make sure to create it with
>>>>>> 777.
>>>>> I do use deleteOnExit(), so it should work (unless the JVM
>>>>> crashes). I guess I could add extra layer of safety here, and set
>>>>> the permissions to 777. Thank you for advice.
>>>>>
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Misha
>>>>>
>>>>>> Bob.
>>>>>>
>>>>>>> If this works, I will have to add some unique number to the file
>>>>>>> name, perhaps a PID of a child process.
>>>>>>>
>>>>>>> I will try this, and let you know how it works.
>>>>>>>
>>>>>>>
>>>>>>> Thank you,
>>>>>>>
>>>>>>> Misha
>>>>>>>
>>>>>>> On 8/13/19 6:34 AM, Bob Vandette wrote:
>>>>>>>> Sorry, I just looked at the webrev and you are trying the
>>>>>>>> approach I suggested. I thought you
>>>>>>>> were trying to use file change notification.
>>>>>>>>
>>>>>>>> Where does the workdir get created? Does it have 777 permissions?
>>>>>>>>
>>>>>>>> Bob.
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Aug 13, 2019, at 9:29 AM, Bob Vandette
>>>>>>>>> <bob.vandette at oracle.com> wrote:
>>>>>>>>>
>>>>>>>>> What if you just poll for the creation of the file waiting
>>>>>>>>> some small amount of time between polling with a maximum timeout.
>>>>>>>>>
>>>>>>>>> Bob.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Aug 12, 2019, at 8:22 PM, mikhailo.seledtsov at oracle.com
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Unfortunately, this approach does not seem to work on many of
>>>>>>>>>> our test cluster machines. The creation of a "signal" file
>>>>>>>>>> results in "PermissionDenied".
>>>>>>>>>>
>>>>>>>>>> The possible reason is the selinux configuration, or some
>>>>>>>>>> other permission related stuff. The container tries to create
>>>>>>>>>> a new file on a mounted volume on a host system, but host
>>>>>>>>>> system denies it. I will look a bit deeper into this, but I
>>>>>>>>>> think this type of issue can be encountered on any automated
>>>>>>>>>> test system. Hence, we may have to abandon this approach.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Misha
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 8/12/19 3:59 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>>>> Here is an updated webrev:
>>>>>>>>>>> http://cr.openjdk.java.net/~mseledtsov/8228960.01/
>>>>>>>>>>>
>>>>>>>>>>> I am using a simple file-based mechanism to communicate
>>>>>>>>>>> between the processes. The "EventGeneratorLoop" process
>>>>>>>>>>> creates a specific "signal" file on a shared mounted volume,
>>>>>>>>>>> while the main test process waits for the file to exist
>>>>>>>>>>> before running the test cases.
>>>>>>>>>>>
>>>>>>>>>>> Passes on Linux-x64 Docker-enabled host. Testing in the test
>>>>>>>>>>> cluster is in progress.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thank you,
>>>>>>>>>>>
>>>>>>>>>>> Misha
>>>>>>>>>>>
>>>>>>>>>>> On 8/7/19 5:11 PM, David Holmes wrote:
>>>>>>>>>>>> On 8/08/2019 9:04 am, Mikhailo Seledtsov wrote:
>>>>>>>>>>>>> Hi Severin, Bob,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you for reviewing the code.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 8/7/19, 11:38 AM, Bob Vandette wrote:
>>>>>>>>>>>>>> Can’t you come up with a better way of synchronizing the
>>>>>>>>>>>>>> test by possibly writing a
>>>>>>>>>>>>>> file and waiting for it to exist with a timeout?
>>>>>>>>>>>>> I will try out this approach.
>>>>>>>>>>>> This seems like a fundamental problem with jcmd - so cc'ing
>>>>>>>>>>>> serviceability-dev.
>>>>>>>>>>>>
>>>>>>>>>>>> But I'm pretty sure they recently addressed a similar issue
>>>>>>>>>>>> with the premature sending of the attach signal?
>>>>>>>>>>>>
>>>>>>>>>>>> David
>>>>>>>>>>>> -----
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Misha
>>>>>>>>>>>>>> Isn’t there a shared volume between the two
>>>>>>>>>>>>>> processes?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We’ve been fighting test reliability for a while now. I
>>>>>>>>>>>>>> can only hope we’re getting
>>>>>>>>>>>>>> to the end.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Bob.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Aug 7, 2019, at 2:18 PM, Severin
>>>>>>>>>>>>>>> Gehwolf<sgehwolf at redhat.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Misha,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, 2019-08-06 at 20:17 -0700,
>>>>>>>>>>>>>>> mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>>>>>>>>> Please review this change that fixes a container test
>>>>>>>>>>>>>>>> TestJcmdWithSideCar.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> My investigation indicated that a root cause for this
>>>>>>>>>>>>>>>> failure is:
>>>>>>>>>>>>>>>> JCMD -l shows 'Unknown' for class name because the main
>>>>>>>>>>>>>>>> class has not
>>>>>>>>>>>>>>>> been loaded yet.
>>>>>>>>>>>>>>>> The target test JVM has started, it is initializing,
>>>>>>>>>>>>>>>> but has not loaded
>>>>>>>>>>>>>>>> the main test class.
>>>>>>>>>>>>>>> That's what I've found too.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The proposed solution is to try 'jcmd -l' several
>>>>>>>>>>>>>>>> times, with a short
>>>>>>>>>>>>>>>> sleep in between.
>>>>>>>>>>>>>>> Thread.sleep() isn't great, but I'm not sure there is an
>>>>>>>>>>>>>>> alternative.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also I have commented out the testCase02() due to
>>>>>>>>>>>>>>>> another bug:
>>>>>>>>>>>>>>>> "JDK-8228850: jhsdb jinfo fails with ClassCastException:
>>>>>>>>>>>>>>>> s.j.h.oops.TypeArray cannot be cast to
>>>>>>>>>>>>>>>> s.j.h.oops.Instance",
>>>>>>>>>>>>>>>> which is not a test bug. IMO, it is better to run the
>>>>>>>>>>>>>>>> test and skip a
>>>>>>>>>>>>>>>> sub-case than to skip the entire test.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8228960
>>>>>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~mseledtsov/8228960.00/
>>>>>>>>>>>>>>> Looks OK to me.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Severin
>>>>>>>>>>>>>>>
More information about the serviceability-dev
mailing list