RFR(S): 8228960: [TESTBUG] containers/docker/TestJcmdWithSideCar.java: jcmd reports main class as 'Unknown'

mikhailo.seledtsov at oracle.com mikhailo.seledtsov at oracle.com
Thu Aug 29 21:13:42 UTC 2019


Hi Bob,

   Thank you for review.

On 8/29/19 10:55 AM, Bob Vandette wrote:
> Misha,
>
> Looks good.  A couple of nits.
>
> 1. You might want to remove hotspot_containers section of ProblemList.txt since there are no bugs listed.
I can remove that section.
> 2. Can you make this timeout a constant just like TIME_TO_RUN_MAIN_PROCESS
>
>    73             mainContainer.waitForMainMethodStart(5*1000);
Will do.
>
> 3. assertIsAlive() is not used except for the commented out tests.  Do you think you’ll ultimately use
> this method or is this left over from previous attempts?
>
> 222         public void assertIsAlive() throws Exception {

Currently it is not in use. However when test cases 02 and 03 are back 
online, this check will be used.


Since the changes you suggest are minor, I will not post an updated 
webrev (let me know if you would like to see the updated webrev).

I will make the updates recommended by you, run another test cycle, and 
integrate the changes.


Thank you,

Misha

>
>
> Bob.
>
>
>> On Aug 29, 2019, at 11:41 AM, mikhailo.seledtsov at oracle.com wrote:
>>
>> I believe I need a second reviewer for this change. Could someone, please, review this change version 2 ? (David already reviewed it).
>>
>> http://cr.openjdk.java.net/~mseledtsov/8228960.02/
>>
>>
>> Thank you in advance,
>>
>> Misha
>>
>>
>> On 8/26/19 12:32 PM, mikhailo.seledtsov at oracle.com wrote:
>>> Hi David,
>>>
>>>    Thank you for review.
>>>
>>> On 8/26/19 12:57 AM, David Holmes wrote:
>>>> Hi Misha,
>>>>
>>>> On 24/08/2019 3:21 am, mikhailo.seledtsov at oracle.com wrote:
>>>>> Finally got some time to work on this issue.
>>>>> Since I have encountered problem using files for passing messages between a container and a test driver (due to permissions), I looked for alternative solutions. I am using the output of a container process to signal when the main method has started, and it works. This simplifies things quite a bit as well.
>>>>>
>>>>> Normally, we use OutputAnalyzer test utility to collect the whole output once the process has completed, and then analyze the resulting output for "contains some string", match, etc. However, testutils/ProcessTools provides an API to consume the output as it is produced. I am using this API to detect when the main() method of the container has started.
>>>> That seems reasonable. Do we want to make the following change to minimise unneeded output processing:
>>>>
>>>>           private Consumer<String> outputConsumer = s -> {
>>>> !            if (!mainMethodStarted && s.contains(EventGeneratorLoop.MAIN_METHOD_STARTED)) {
>>>>                   System.out.println("MainContainer: setting mainMethodStarted");
>>>>                   mainMethodStarted = true;
>>>>               }
>>>>           };
>>> Thank you for the suggestion. I will update the code accordingly.
>>>>> Updated webrev:
>>>>>       http://cr.openjdk.java.net/~mseledtsov/8228960.02/
>>>> Otherwise looks okay. Hopefully those other test cases will be enabled in the not too distant future.
>>> I hope so as well.
>>>
>>>
>>> Thank you,
>>>
>>> Misha
>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>>
>>>>> Testing:
>>>>>
>>>>>     Ran the test on Linux-x64, various multiple nodes in a test cluster 50 times - All PASS
>>>>>
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Misha
>>>>>
>>>>> On 8/13/19 2:05 PM, Bob Vandette wrote:
>>>>>>> On Aug 13, 2019, at 3:28 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 8/13/19 12:06 PM, Bob Vandette wrote:
>>>>>>>>> On Aug 13, 2019, at 2:57 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>>
>>>>>>>>> Hi Bob,
>>>>>>>>>
>>>>>>>>>     The workdir (JTwork/scratch) is created with the "test user" permissions. Let me try to place the "signal" file in /tmp instead, since /tmp should normally have a 777 permission on Linux.
>>>>>>>> Aren’t you creating a file inside a docker container and then checking for its existence outside of the container?
>>>>>>> Correct
>>>>>>>> Isn’t the root user running inside the container?
>>>>>>> By default it is. But it still fails to create a file, for some reason. Can be related to selinux settings (for instance, see this article: https://stackoverflow.com/questions/24288616/permission-denied-on-accessing-host-directory-in-docker/31334443), I can not change those.
>>>>>> Is your JTWork/scratch on an NFS mounted file system?  If this is the case then the problem is that root is equivalent to nobody on
>>>>>> mounted file systems and can’t create files unless the directory has 777 permissions.  I just confirmed this. You’d have to either run
>>>>>> the container test as test-user or change the scratch directory permission.
>>>>>>
>>>>>> Bob.
>>>>>>
>>>>>>> My hope is that /tmp is configured to be accessed by a container engine as a general purpose directory, hence I was thinking to try it out.
>>>>>>>
>>>>>>>> Both processes don’t see the same /tmp right?   So that shouldn’t help.
>>>>>>> In my next experiment, I will map a /tmp from host to be a /host-tmp inside the container (--volume /tmp:/host-tmp), then write a signal file to /host-tmp.
>>>>>>>> If scratch has 777 permissions, anyone can create a file.
>>>>>>> scratch has  "rwxr-xr-x"
>>>>>>>> You have to be careful that you can clean up the
>>>>>>>> file from outside the container.  I’d make sure to create it with 777.
>>>>>>> I do use deleteOnExit(), so it should work (unless the JVM crashes). I guess I could add extra layer of safety here, and set the permissions to 777. Thank you for advice.
>>>>>>>
>>>>>>>
>>>>>>> Thank you,
>>>>>>>
>>>>>>> Misha
>>>>>>>
>>>>>>>> Bob.
>>>>>>>>
>>>>>>>>> If this works, I will have to add some unique number to the file name, perhaps a PID of a child process.
>>>>>>>>>
>>>>>>>>> I will try this, and let you know how it works.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>>
>>>>>>>>> Misha
>>>>>>>>>
>>>>>>>>> On 8/13/19 6:34 AM, Bob Vandette wrote:
>>>>>>>>>> Sorry, I just looked at the webrev and you are trying the approach I suggested. I thought you
>>>>>>>>>> were trying to use file change notification.
>>>>>>>>>>
>>>>>>>>>> Where does the workdir get created?  Does it have 777 permissions?
>>>>>>>>>>
>>>>>>>>>> Bob.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On Aug 13, 2019, at 9:29 AM, Bob Vandette <bob.vandette at oracle.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> What if you just poll for the creation of the file waiting some small amount of time between polling with a maximum timeout.
>>>>>>>>>>>
>>>>>>>>>>> Bob.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> On Aug 12, 2019, at 8:22 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately, this approach does not seem to work on many of our test cluster machines. The creation of a "signal" file results in "PermissionDenied".
>>>>>>>>>>>>
>>>>>>>>>>>> The possible reason is the selinux configuration, or some other permission related stuff. The container tries to create a new file on a mounted volume on a host system, but host system denies it. I will look a bit deeper into this, but I think this type of issue can be encountered on any automated test system. Hence, we may have to abandon this approach.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Misha
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 8/12/19 3:59 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>>>>>> Here is an updated webrev: http://cr.openjdk.java.net/~mseledtsov/8228960.01/
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am using a simple file-based mechanism to communicate between the processes. The "EventGeneratorLoop" process creates a specific "signal" file on a shared mounted volume, while the main test process waits  for the file to exist before running the test cases.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Passes on Linux-x64 Docker-enabled host. Testing in the test cluster is in progress.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Misha
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 8/7/19 5:11 PM, David Holmes wrote:
>>>>>>>>>>>>>> On 8/08/2019 9:04 am, Mikhailo Seledtsov wrote:
>>>>>>>>>>>>>>> Hi Severin, Bob,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     Thank you for reviewing the code.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 8/7/19, 11:38 AM, Bob Vandette wrote:
>>>>>>>>>>>>>>>> Can’t you come up with a better way of synchronizing the test by possibly writing a
>>>>>>>>>>>>>>>> file and waiting for it to exist with a timeout?
>>>>>>>>>>>>>>> I will try out this approach.
>>>>>>>>>>>>>> This seems like a fundamental problem with jcmd - so cc'ing serviceability-dev.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But I'm pretty sure they recently addressed a similar issue with the premature sending of the attach signal?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> David
>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Misha
>>>>>>>>>>>>>>>> Isn’t there a shared volume between the two
>>>>>>>>>>>>>>>> processes?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We’ve been fighting test reliability for a while now.  I can only hope we’re getting
>>>>>>>>>>>>>>>> to the end.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Bob.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Aug 7, 2019, at 2:18 PM, Severin Gehwolf<sgehwolf at redhat.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Misha,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, 2019-08-06 at 20:17 -0700, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>>>>>>>>>>> Please review this change that fixes a container test TestJcmdWithSideCar.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> My investigation indicated that a root cause for this failure is:
>>>>>>>>>>>>>>>>>> JCMD -l shows 'Unknown' for class name because the main class has not
>>>>>>>>>>>>>>>>>> been loaded yet.
>>>>>>>>>>>>>>>>>> The target test JVM has started, it is initializing, but has not loaded
>>>>>>>>>>>>>>>>>> the main test class.
>>>>>>>>>>>>>>>>> That's what I've found too.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The proposed solution is to try 'jcmd -l' several times, with a short
>>>>>>>>>>>>>>>>>> sleep in between.
>>>>>>>>>>>>>>>>> Thread.sleep() isn't great, but I'm not sure there is an alternative.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Also I have commented out the testCase02() due to another bug:
>>>>>>>>>>>>>>>>>> "JDK-8228850: jhsdb jinfo fails with ClassCastException:
>>>>>>>>>>>>>>>>>> s.j.h.oops.TypeArray cannot be cast to s.j.h.oops.Instance",
>>>>>>>>>>>>>>>>>> which is not a test bug. IMO, it is better to run the test and skip a
>>>>>>>>>>>>>>>>>> sub-case than to skip the entire test.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>       JBS: https://bugs.openjdk.java.net/browse/JDK-8228960
>>>>>>>>>>>>>>>>>>       Webrev: http://cr.openjdk.java.net/~mseledtsov/8228960.00/
>>>>>>>>>>>>>>>>> Looks OK to me.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Severin
>>>>>>>>>>>>>>>>>


More information about the serviceability-dev mailing list