From palmer at dabbelt.com Thu Aug 4 02:08:53 2022
From: palmer at dabbelt.com (Palmer Dabbelt)
Date: Wed, 03 Aug 2022 19:08:53 -0700 (PDT)
Subject: Re: The usage of fence.i in openjdk
In-Reply-To: <37edd86e.4d5b0.1824f5168bf.Coremail.yangfei@iscas.ac.cn>
Message-ID:

On Sat, 30 Jul 2022 06:35:11 PDT (-0700), yangfei at iscas.ac.cn wrote:
> Hi Palmer,
>
> > -----Original Messages-----
> > From: "Palmer Dabbelt"
> > Sent Time: 2022-07-30 02:02:59 (Saturday)
> > To: yadonn.wang at huawei.com
> > Cc: vladimir.kempik at gmail.com, riscv-port-dev at openjdk.org
> > Subject: Re: Re: The usage of fence.i in openjdk
> >
> > On Fri, 29 Jul 2022 08:12:21 PDT (-0700), yadonn.wang at huawei.com wrote:
> > > Hi, Vladimir,
> > >
> > >> I believe Java's threads can migrate to a different hart at any moment, hence the use of fence.i is dangerous.
> > > Could you describe in detail why the use of fence.i is dangerous? I think it may be just inefficient but not dangerous.
> >
> > The issue here is that fence.i applies to the current hart, whereas
> > Linux userspace processes only know about the current thread. Executing a
> > fence.i in userspace adds a bunch of orderings to the thread's state
> > that can only be moved to a new hart via another fence.i. Normally that
> > sort of thing isn't such a big deal (there's similar state for things
> > like fence and lr), but the SiFive chips implement fence.i by flushing
> > the instruction cache, which is a slow operation to put on the scheduling
> > path.
> >
> > The only way for the kernel to avoid that fence.i on the scheduling path
> > is for it to know whether one has been executed in userspace. There's no way
> > to trap on fence.i, so instead the Linux uABI just requires userspace to
> > make a syscall (or a VDSO library call). If userspace directly executes
> > a fence.i, then the kernel won't know and thus can't ensure the thread
> > state is adequately moved to the new hart during scheduling, which may
> > result in incorrect behavior.
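Editor's sketch: what "make a syscall (or a VDSO library call)" can look like from C++ code that writes instructions, assuming glibc on RISC-V Linux. glibc declares `__riscv_flush_icache(start, end, flags)` in `<sys/cachectl.h>` and routes it through the vDSO entry when one exists, otherwise through the `riscv_flush_icache` syscall; `flags == 0` asks for coherence across all threads of the process. The helper name `install_code` is hypothetical, and on non-RISC-V targets the sketch falls back to the GCC/Clang builtin `__builtin___clear_cache`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#if defined(__riscv) && defined(__linux__)
#include <sys/cachectl.h>  // glibc: int __riscv_flush_icache(void*, void*, unsigned long)
#endif

// Hypothetical helper: copy `len` bytes of machine code into `dst` and make
// them visible to instruction fetch on every hart the process may run on,
// without executing fence.i directly in userspace.
void install_code(unsigned char* dst, const unsigned char* src, std::size_t len) {
  std::memcpy(dst, src, len);  // ordinary data stores into the code buffer
#if defined(__riscv) && defined(__linux__)
  // flags == 0: not "local only", so the kernel tracks every thread of the
  // process (including ones that later migrate to another hart).
  int rc = __riscv_flush_icache(dst, dst + len, 0);
  assert(rc == 0);
  (void)rc;
#else
  // Other targets: let the compiler emit whatever the ISA/OS requires.
  __builtin___clear_cache(reinterpret_cast<char*>(dst),
                          reinterpret_cast<char*>(dst + len));
#endif
}
```

The point of the design is that the kernel, not the JIT, decides which harts need a fence.i, so a later migration cannot escape the flush.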
> >
> > We've known for a while that this will cause performance issues for JITs
> > on some implementations, but so far it's just not been a priority. I
> > poked around the RISC-V OpenJDK port a few months ago and I think
> > there are some improvements that can be made there, but we're probably
> > also going to want some kernel support. Exactly how to fix it is
> > probably going to depend on the workloads and implementations, though,
> > and while I think I understand the OpenJDK part pretty well, it's not
> > clear what the other fence.i implementations are doing.
> >
> > In the long run we're also going to need some ISA support for doing this
> > sanely, but that's sort of a different problem. I've been kind of
> > crossing my fingers and hoping that anyone who has a system where JIT
> > performance is important is also going to have some better write/fetch
> > ordering instructions, but given how long it's been maybe that's a bad
> > plan.
> >
> > That said, the direct fence.i is incorrect and it's likely that the
> > long-term solution involves making the VDSO call, so it's probably best
> > to swap over. I remember having written a patch to do that at some
> > point, but I can't find it, so maybe I just forgot to send it?
>
> Thanks for all those considerations about the design. It's very helpful.
>
> > > To a certain extent, this code is a hangover from the aarch64 port, and we use fence.i to mimic isb.
> > >> Maybe there is a need to add __asm__ volatile ("fence":::"memory") at the beginning of this method.
> > > You're right. It would be better to place a full data fence before the syscall, because we cannot guarantee here that the syscall leaves a data fence before the IPI that triggers the remote fence.i on other harts.
> >
> > That's not necessary with the current implementation, but it's not
>
> But it looks like this is not reflected in the kernel function flush_icache_mm?
> I checked the code and it looks to me that the data fence is issued on only one path:
>
>         if (mm == current->active_mm && local) {
>                 /*
>                  * It's assumed that at least one strongly ordered operation is
>                  * performed on this hart between setting a hart's cpumask bit
>                  * and scheduling this MM context on that hart. Sending an SBI
>                  * remote message will do this, but in the case where no
>                  * messages are sent we still need to order this hart's writes
>                  * with flush_icache_deferred().
>                  */
>                 smp_mb();
>         }
>
> I just want to make sure that the data fence is there in this syscall with the current implementation.
> But I am not familiar with the RISC-V Linux kernel code, and I'd appreciate more details if you have them.

That comment is trying to explain this, but essentially we're assuming that sbi_remote_fence_i() is a full fence. That's probably not written down, but it's true in practice: there's no direct remote fence.i instruction, so this just interrupts the target hart, and there are a bunch of fences on that path in order to make sure the right IPI message type shows up remotely.

This data fence exists on the local path to handle the case where there are other threads, but those threads are not currently running on any hart. In that case we need the data fence because those threads could start running on a hart at any time in the future, so without the local data fence we'd just have a remote fence.i, which isn't sufficient to ensure visibility (this is explicitly called out in the ISA manual).

I don't know if that makes more or less sense though...
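Editor's sketch: a deliberately simplified, single-threaded C++ model of the bookkeeping described above, loosely patterned on flush_icache_mm() and flush_icache_deferred() but not kernel code. All type and function names are hypothetical. It tracks only which harts have executed a fence.i since the last code write; it does not model the data fences (smp_mb) at all. Harts running the process get an eager "remote fence.i"; every other hart stays marked stale and is flushed lazily when the process is next switched onto it.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

constexpr std::size_t kHarts = 4;

// Per-process state (hypothetical stand-in for the mm context).
struct Mm {
  std::bitset<kHarts> icache_stale;  // harts that still owe a fence.i
  std::bitset<kHarts> running_on;    // harts currently running this process
};

// Per-hart fetch state: a set bit means the hart has executed fence.i
// since the last write to the code of interest.
struct Machine {
  std::bitset<kHarts> fetch_coherent;
  void code_written() { fetch_coherent.reset(); }
  void fence_i(std::size_t hart) { fetch_coherent.set(hart); }
};

// Model of the syscall/vDSO flush path, entered on `hart`.
void flush_icache_mm(Machine& m, Mm& mm, std::size_t hart) {
  mm.icache_stale.set();        // assume every hart may hold stale lines...
  mm.icache_stale.reset(hart);  // ...except this one, which flushes now
  m.fence_i(hart);
  for (std::size_t h = 0; h < kHarts; ++h) {
    if (h != hart && mm.running_on.test(h)) {
      m.fence_i(h);  // "remote fence.i" via IPI/SBI; the stale bit is left
                     // set, which at worst costs one extra flush later
    }
  }
}

// Model of the scheduler switching this process onto `hart`.
void switch_to(Machine& m, Mm& mm, std::size_t hart) {
  mm.running_on.set(hart);
  if (mm.icache_stale.test(hart)) {  // deferred flush on context switch
    mm.icache_stale.reset(hart);
    m.fence_i(hart);
  }
}

// A thread of the process moves from one hart to another.
void migrate(Machine& m, Mm& mm, std::size_t from, std::size_t to) {
  mm.running_on.reset(from);
  switch_to(m, mm, to);
}
```

In this model, a direct fence.i on hart 1 followed by a migration to hart 2 leaves hart 2 non-coherent (the hazard Vladimir describes). Going through flush_icache_mm() instead leaves hart 2 marked stale, so switch_to() performs the fence.i before the thread runs there.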
>
> Thanks,
> Fei

From vladimir.kempik at gmail.com Fri Aug 5 22:06:32 2022
From: vladimir.kempik at gmail.com (Vladimir Kempik)
Date: Sat, 6 Aug 2022 01:06:32 +0300
Subject: The usage of fence.i in openjdk
In-Reply-To: <1CADD7EC-49F0-4665-BF59-E8526D6AF54C@gmail.com>
References: <00EDAECF-F0AE-473A-B124-5C24CC1B8542@gmail.com> <303ab75147704124b9759934da0107e5@huawei.com> <1CADD7EC-49F0-4665-BF59-E8526D6AF54C@gmail.com>
Message-ID:

More on this subject.
I can see that the use of ifence() in the code is identical to the use of isb() in aarch64.
Checking the documentation for fence.i and isb, I don't see them as 1:1 identical.

fence.i ( https://five-embeddev.com/riscv-isa-manual/latest/zifencei.html ):
FENCE.I instruction provides explicit synchronization between writes to instruction memory and instruction fetches on the same hart.

ISB ( https://developer.arm.com/documentation/den0024/a/Memory-Ordering/Barriers/ISB-in-more-detail ):
An ISB flushes the pipeline, and re-fetches the instructions from the cache or memory and ensures that the effects of any completed context-changing operation before the ISB are visible to any instruction after the ISB. It also ensures that any context-changing operations after the ISB instruction only take effect after the ISB has been executed and are not seen by instructions before the ISB.

And some info from the web:

To me it sounds like isb (in aarch64) does the job a bit differently than fence.i (in rv64).

So, I think here:

    __ la_patchable(t0, RuntimeAddress(CAST_FROM_FN_PTR(address, SharedRuntime::fixup_callers_callsite)), offset);
    __ jalr(x1, t0, offset);

    // Explicit fence.i required because fixup_callers_callsite may change the code
    // stream.
    __ safepoint_ifence();

    __ pop_CPU_state();
    // restore sp
    __ leave();
    __ bind(L);

we still have a small chance of executing invalid (old) code from the L1I if our thread is moved to another hart right after safepoint_ifence().
Otherwise, if fixup_callers_callsite called icache_flush() somewhere inside, then safepoint_ifence wouldn't be needed here.

Regards, Vladimir

> On 30 Jul 2022, at 13:29, Vladimir Kempik wrote:
>
> Hello,
> Thanks for the explanation.
> That sounds like the fence.i in userspace code is not needed at all.
> Regards, Vladimir
>> On 30 Jul 2022, at 05:41, wangyadong (E) wrote:
>>
>>> Let's say you have a thread A running on hart 1.
>>> You've changed some code in region 0x11223300 and need a fence.i before executing that code.
>>> You execute fence.i in your thread A running on hart 1.
>>> Right after that, your thread (for some reason) gets rescheduled (by the kernel) to hart 2.
>>> If hart 2 had something in its L1I corresponding to region 0x11223300, then you're going to have a problem: the L1I on hart 2 has the old code. It wasn't refreshed, because fence.i was executed on hart 1 (and never on hart 2). And your thread is going to execute old code, or a mix of old and new code.
>>
>> @vladimir Thanks for your explanation. I understand your concern now. We know fence.i's scope, so in the RISC-V port the writing hart does not rely solely on fence.i, but calls the icache_flush syscall in ICache::invalidate_range() every time after modifying the code.
>>
>> For example:
>> Hart 1
>> void MacroAssembler::emit_static_call_stub() {
>>   // CompiledDirectStaticCall::set_to_interpreted knows the
>>   // exact layout of this stub.
>>
>>   ifence();
>>   mov_metadata(xmethod, (Metadata*)NULL);   <- patchable code here
>>
>>   // Jump to the entry point of the i2c stub.
>>   int32_t offset = 0;
>>   movptr_with_offset(t0, 0, offset);
>>   jalr(x0, t0, offset);
>> }
>>
>> Hart 2 (write hart)
>> void NativeMovConstReg::set_data(intptr_t x) {
>>   // ...
>>   // Store x into the instruction stream.
>>   MacroAssembler::pd_patch_instruction_size(instruction_address(), (address)x);   <- write code
>>   ICache::invalidate_range(instruction_address(), movptr_instruction_size);   <- syscall here
>>   // ...
>> }
>>

From palmer at dabbelt.com Sat Aug 6 18:15:12 2022
From: palmer at dabbelt.com (Palmer Dabbelt)
Date: Sat, 06 Aug 2022 11:15:12 -0700 (PDT)
Subject: The usage of fence.i in openjdk
In-Reply-To:
Message-ID:

On Fri, 05 Aug 2022 15:06:32 PDT (-0700), vladimir.kempik at gmail.com wrote:
> More on this subject.
> I can see that the use of ifence() in the code is identical to the use of isb() in aarch64.
> Checking the documentation for fence.i and isb, I don't see them as 1:1 identical.
>
> fence.i ( https://five-embeddev.com/riscv-isa-manual/latest/zifencei.html ):
> FENCE.I instruction provides explicit synchronization between writes to instruction memory and instruction fetches on the same hart.
>
> ISB ( https://developer.arm.com/documentation/den0024/a/Memory-Ordering/Barriers/ISB-in-more-detail ):
> An ISB flushes the pipeline, and re-fetches the instructions from the cache or memory and ensures that the effects of any completed context-changing operation before the ISB are visible to any instruction after the ISB. It also ensures that any context-changing operations after the ISB instruction only take effect after the ISB has been executed and are not seen by instructions before the ISB.
> And some info from the web:
>
> To me it sounds like isb (in aarch64) does the job a bit differently than fence.i (in rv64).

Broadly speaking I'd agree, with the caveat that there's no formal description of the RISC-V instruction fetch ordering requirements, so it's kind of hard to tell exactly what fence.i does in detail.

> So, I think here:
>
>     __ la_patchable(t0, RuntimeAddress(CAST_FROM_FN_PTR(address, SharedRuntime::fixup_callers_callsite)), offset);
>     __ jalr(x1, t0, offset);
>
>     // Explicit fence.i required because fixup_callers_callsite may change the code
>     // stream.
>     __ safepoint_ifence();
>
>     __ pop_CPU_state();
>     // restore sp
>     __ leave();
>     __ bind(L);
>
> we still have a small chance of executing invalid (old) code from the L1I if our thread is moved to another hart right after safepoint_ifence(). Otherwise, if fixup_callers_callsite called icache_flush() somewhere inside, then safepoint_ifence wouldn't be needed here.

I don't know enough about the port to tell for sure, but generally speaking any code that executes a fence.i directly (i.e., not from the VDSO call) can't rely on the orderings implied by the ISA for correctness. I can imagine cases where a fence.i is the right thing to do for a performance reason (maybe the old code is safe, but it's better for performance to eagerly move to the new code), but with such a loose definition of fetch ordering that's going to require a lot of implementation-specific dependencies for correctness.

I wouldn't be at all surprised if the VDSO call is way too slow for some important workloads, but that's an issue we'd need to fix. Exactly what the fix is will depend on what's wrong with the VDSO call, but I think there are two main reasons: either the VDSO call is too slow because it enters the kernel, or userspace can't emit a call at all because of some other restrictions (maybe it doesn't know where the VDSO is yet, there are stack issues, etc.). Those should both be fixable, but there are lots of things to fix, so it just hasn't been a priority.

> Regards, Vladimir

From yadonn.wang at huawei.com Mon Aug 8 06:32:56 2022
From: yadonn.wang at huawei.com (wangyadong (E))
Date: Mon, 8 Aug 2022 06:32:56 +0000
Subject: The usage of fence.i in openjdk
In-Reply-To:
References: <00EDAECF-F0AE-473A-B124-5C24CC1B8542@gmail.com> <303ab75147704124b9759934da0107e5@huawei.com> <1CADD7EC-49F0-4665-BF59-E8526D6AF54C@gmail.com>
Message-ID: <7afddc13451e4615b3db5db86473562f@huawei.com>

> Otherwise if fixup_callers_callsite would call icache_flush() somewhere inside, then safepoint_ifence wouldn't be needed here

Yes, we do call icache_flush in fixup_callers_callsite:
SharedRuntime::fixup_callers_callsite -> NativeCall::set_destination_mt_safe -> ICache::invalidate_range -> icache_flush.
And I have started a PR to fix the usage of fence.i in user space in the RISC-V port: https://github.com/openjdk/jdk/pull/9770.

The ISBs used in the AArch64 port follow the AArch64 Reference Manual:

Ensuring the visibility of updates to instructions for a multiprocessor
The ARMv8 architecture requires a PE that performs an instruction cache maintenance operation to execute a DSB instruction to ensure completion of the maintenance operation. This ensures that the cache maintenance operation is complete on all PEs in the Inner Shareable shareability domain.
An ISB is not broadcast, and so does not affect other PEs. This means that any other PE must perform its own ISB synchronization after it knows that the update is visible, if it is necessary to ensure its synchronization with the update.
The following example shows how this might be done:

AArch64
P1
    STR X11, [X1]      ; X11 contains a new instruction to be stored in program memory
    DC CVAU, X1        ; clean to PoU makes visible to instruction cache
    DSB ISH            ; ensure completion of the clean on all processors
    IC IVAU, X1        ; ensure instruction cache/branch predictor discard stale data
    DSB ISH            ; ensure completion of the ICache and branch predictor
                       ; invalidation on all processors
    STR W0, [X2]       ; set flag to signal completion
    ISB                ; synchronize context on this processor
    BR X1              ; branch to new code
P2-Px
    WAIT ([X2] == 1)   ; wait for flag signalling completion
    ISB                ; synchronize context on this processor
    BR X1              ; branch to new code

From vladimir.kempik at gmail.com Mon Aug 15 10:28:46 2022
From: vladimir.kempik at gmail.com (Vladimir Kempik)
Date: Mon, 15 Aug 2022 13:28:46 +0300
Subject: CompareAndSet fails intermittently for riscv
Message-ID: <130A0810-C340-452B-851A-133DCFDDEA2C@gmail.com>

Hello,

I wanted to let you know that I have observed some tests failing on jdk19/20 on RISC-V.
These tests fail on the WeakCompareAndSet* methods.
https://bugs.openjdk.org/browse/JDK-8292360
Failures happen more often on slower hardware.
We might be missing some barrier in the RISC-V port.

Regards, Vladimir.

From yadonn.wang at huawei.com Tue Aug 16 06:34:16 2022
From: yadonn.wang at huawei.com (wangyadong (E))
Date: Tue, 16 Aug 2022 06:34:16 +0000
Subject: CompareAndSet fails intermittently for riscv
In-Reply-To: <130A0810-C340-452B-851A-133DCFDDEA2C@gmail.com>
References: <130A0810-C340-452B-851A-133DCFDDEA2C@gmail.com>
Message-ID: <4a453e519e5a4a0aa81211456f5c1890@huawei.com>

> These tests fail on WeakCompareAndSet* methods.
> https://bugs.openjdk.org/browse/JDK-8292360

Yes, we know about these test failures, and we've found that these failures also occur on our "powerful" aarch64 servers. I think there is some uncertainty about the test itself, and the probability of passing increases when you raise the test parameter "weakAttempts".
More barriers won't make it pass 100% of the time; only retrying the loop after the sc fails, i.e. using the "strong" version of CAS, would.

> Failures happen more often on slower hardware.

I guess it's the same on the aarch64 platform, but I don't have aarch64 hardware that's as slow as current RISC-V hardware. And I found that "weakAttempts" was first introduced by Aleksey because PPC reported the failures, see https://bugs.openjdk.org/browse/JDK-8155739.
We now run the weak-CAS tests with a high "weakAttempts" on the Unmatched board.

From shade at redhat.com Tue Aug 16 07:54:41 2022
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 16 Aug 2022 09:54:41 +0200
Subject: CompareAndSet fails intermittently for riscv
In-Reply-To: <4a453e519e5a4a0aa81211456f5c1890@huawei.com>
References: <130A0810-C340-452B-851A-133DCFDDEA2C@gmail.com> <4a453e519e5a4a0aa81211456f5c1890@huawei.com>
Message-ID:

On 8/16/22 08:34, wangyadong (E) wrote:
> Yes, we know about these test failures, and we've found that these failures also occur on our
> "powerful" aarch64 servers. I think there is some uncertainty about the test itself, and the
> probability of passing increases when you raise the test parameter "weakAttempts".

Yes, the test itself is flaky.

Unfortunately, disabling the test completely is a worse option, because it would miss actually broken weak CASes.

> We now run the weak-cas tests with a high "weakAttempts" on unmatched.

How much did you bump weakAttempts to?
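Editor's sketch of why a weak-CAS test needs an attempt budget at all: on LL/SC machines such as RISC-V (lr/sc), a weak compare-and-set may fail even though the current value equals the expected one, for example when the store-conditional loses its reservation. The C++ below uses `std::atomic<long>::compare_exchange_weak` as a stand-in for Java's weakCompareAndSet*; `weak_attempts` plays the role of the test's weakAttempts parameter, and the capped exponential sleep is an assumed form of backoff, not necessarily what the JDK test does. The second function shows the "strong" CAS as a loop around the weak one.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Returns true if a weak CAS from `expected` to `desired` succeeds within
// `weak_attempts` tries; stops early on a genuine value mismatch.
bool weak_cas_with_retries(std::atomic<long>& v, long expected, long desired,
                           int weak_attempts) {
  for (int i = 0; i < weak_attempts; ++i) {
    long seen = expected;  // overwritten with the current value on failure
    if (v.compare_exchange_weak(seen, desired, std::memory_order_release,
                                std::memory_order_relaxed)) {
      return true;
    }
    if (seen != expected) {
      return false;  // genuine failure: the value really differs
    }
    // Spurious failure: back off (1us, 2us, ... capped near 1ms), then retry.
    std::this_thread::sleep_for(
        std::chrono::microseconds(1L << (i < 10 ? i : 10)));
  }
  return false;  // only spurious failures within the budget: a flaky "fail"
}

// A "strong" CAS built on the weak one: after a spurious failure `seen`
// still equals `expected`, so loop until success or a genuine mismatch.
bool strong_cas(std::atomic<long>& v, long expected, long desired) {
  long seen = expected;
  while (!v.compare_exchange_weak(seen, desired)) {
    if (seen != expected) return false;
  }
  return true;
}
```

A test that asserts success of a single weak CAS can therefore fail on correct hardware; raising the attempt count (or adding backoff) only lowers that probability, while the strong form removes it.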
I also think we would be better off with some backoff in the loop, so that we have a better chance of succeeding.

--
Thanks,
-Aleksey

From shade at redhat.com Tue Aug 16 08:24:23 2022
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 16 Aug 2022 10:24:23 +0200
Subject: CompareAndSet fails intermittently for riscv
In-Reply-To:
References: <130A0810-C340-452B-851A-133DCFDDEA2C@gmail.com> <4a453e519e5a4a0aa81211456f5c1890@huawei.com>
Message-ID: <103246f1-4804-c9b2-ddb9-2af13a4c4ebf@redhat.com>

On 8/16/22 09:54, Aleksey Shipilev wrote:
> On 8/16/22 08:34, wangyadong (E) wrote:
>> Yes, we know about these test failures, and we've found that these failures also occur on our
>> "powerful" aarch64 servers. I think there is some uncertainty about the test itself, and the
>> probability of passing increases when you raise the test parameter "weakAttempts".
> Yes, the test itself is flaky.
>
> Unfortunately, disabling the test completely is a worse option, because it would miss
> actually broken weak CASes.
>
>> We now run the weak-cas tests with a high "weakAttempts" on unmatched.
>
> How much did you bump weakAttempts to?
>
> I also think we would be better off with some backoff in the loop, so that we have a better chance of
> succeeding.

Let's try this:
https://github.com/openjdk/jdk/pull/9889

Vladimir, could you please run it on your RISC-V machine?
Yadong, could you please run it on your AArch64 machine?

--
Thanks,
-Aleksey

From yadonn.wang at huawei.com Tue Aug 16 09:10:58 2022
From: yadonn.wang at huawei.com (wangyadong (E))
Date: Tue, 16 Aug 2022 09:10:58 +0000
Subject: CompareAndSet fails intermittently for riscv
In-Reply-To: <103246f1-4804-c9b2-ddb9-2af13a4c4ebf@redhat.com>
References: <130A0810-C340-452B-851A-133DCFDDEA2C@gmail.com> <4a453e519e5a4a0aa81211456f5c1890@huawei.com> <103246f1-4804-c9b2-ddb9-2af13a4c4ebf@redhat.com>
Message-ID: <27d3d1d1f5b64d928daada054a81402d@huawei.com>

> Let's try this:
> https://github.com/openjdk/jdk/pull/9889

That's OK.
We'll try this on both aarch64 and RISC-V.

> How much you bump weakAttempts to?

We used 10 (the current default value) on our aarch64 servers and 100 on the Unmatched boards, and we still often see 1 or 2 failures after each tier1 run.

From vladimir.kempik at gmail.com Tue Aug 16 10:37:36 2022
From: vladimir.kempik at gmail.com (Vladimir Kempik)
Date: Tue, 16 Aug 2022 13:37:36 +0300
Subject: CompareAndSet fails intermittently for riscv
In-Reply-To: <27d3d1d1f5b64d928daada054a81402d@huawei.com>
References: <130A0810-C340-452B-851A-133DCFDDEA2C@gmail.com> <4a453e519e5a4a0aa81211456f5c1890@huawei.com> <103246f1-4804-c9b2-ddb9-2af13a4c4ebf@redhat.com> <27d3d1d1f5b64d928daada054a81402d@huawei.com>
Message-ID: <321070E6-AFBF-4215-88E2-7AFB764734CC@gmail.com>

Hello Aleksey,

I have run these tests on a HiFive board and reported the results in the PR: it mostly times out, and two failures are still present.

From vladimir.kempik at gmail.com Tue Aug 16 13:43:43 2022
From: vladimir.kempik at gmail.com (Vladimir Kempik)
Date: Tue, 16 Aug 2022 16:43:43 +0300
Subject: Some tier1 tests timeout on risc-v
Message-ID:

Hello,

I'm still trying to get clean tier1 results. Apart from the weakCompareAndSet failures (which can be ignored on RISC-V), I see three more tests erroring due to timeouts:

test/langtools/tools/javac/7142086/T7142086.java: needed a timeout increase from 10 sec to 30 sec to make the test pass on a T-Head board.

test/langtools/tools/javac/failover/CheckAttributedTree.java: timeout increase from the default 480 sec to 600 sec, related to https://bugs.openjdk.org/browse/JDK-6982992

test/langtools/tools/javac/lambda/LambdaParserTest.java: timeout increase from the default 480 sec to 600 sec

Is this the right way to work around these issues? Should we update the tests somehow?

Regards, Vladimir

From shade at redhat.com Tue Aug 16 15:42:20 2022
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 16 Aug 2022 17:42:20 +0200
Subject: Some tier1 tests timeout on risc-v
In-Reply-To:
References:
Message-ID: <722215ed-c7ab-5baf-091d-91d6557aa711@redhat.com>

On 8/16/22 15:43, Vladimir Kempik wrote:
> Hello,
> I'm still trying to get clean tier1 results. Apart from the weakCompareAndSet failures (which can be ignored on RISC-V), I see three more tests erroring due to timeouts:
>
> test/langtools/tools/javac/7142086/T7142086.java: needed a timeout increase from 10 sec to 30 sec to make the test pass on a T-Head board.
>
> test/langtools/tools/javac/failover/CheckAttributedTree.java: timeout increase from the default 480 sec to 600 sec, related to https://bugs.openjdk.org/browse/JDK-6982992
>
> test/langtools/tools/javac/lambda/LambdaParserTest.java: timeout increase from the default 480 sec to 600 sec
>
> Is this the right way to work around these issues? Should we update the tests somehow?

You can always find an arbitrarily slow board and/or VM mode where a reasonable default timeout would not suffice. The current timeouts are implicitly selected to work in the majority of cases. For everything else, I think you are supposed to bump the timeout factor:

 $ make test TEST=tier1 JTREG="TIMEOUT_FACTOR=4"

--
Thanks,
-Aleksey

From yangfei at iscas.ac.cn Wed Aug 17 02:11:18 2022
From: yangfei at iscas.ac.cn (yangfei at iscas.ac.cn)
Date: Wed, 17 Aug 2022 10:11:18 +0800 (GMT+08:00)
Subject: CompareAndSet fails intermittently for riscv
In-Reply-To:
References: <130A0810-C340-452B-851A-133DCFDDEA2C@gmail.com> <4a453e519e5a4a0aa81211456f5c1890@huawei.com>
Message-ID: <160cac9b.17fb.182a991c555.Coremail.yangfei@iscas.ac.cn>

Hi,

> -----Original Messages-----
> From: "Aleksey Shipilev"
> Sent Time: 2022-08-16 15:54:41 (Tuesday)
> To: "wangyadong (E)" , "Vladimir Kempik" , "riscv-port-dev at openjdk.org"
> Cc:
> Subject: Re: CompareAndSet fails intermittently for riscv
>
> On 8/16/22 08:34, wangyadong (E) wrote:
> > Yes, we know about these test failures, and we've found that these failures also occur on our
> > "powerful" aarch64 servers. I think there is some uncertainty about the test itself, and the
> > probability of passing increases when you raise the test parameter "weakAttempts".
> Yes, the test itself is flaky.
>
> Unfortunately, disabling the test completely is a worse option, because it would miss
> actually broken weak CASes.
>
> > We now run the weak-cas tests with a high "weakAttempts" on unmatched.
>
> How much did you bump weakAttempts to?
>
> I also think we would be better with some backoff in the loop, so that we have more chance in
> succeeding.

I suspect another jtreg test: test/jdk/java/util/concurrent/atomic/Serial.java also have the same issue.
I once reduced this test into [1] and it looks that this will always fail on my unmatched board.
But it passes if we disable the intrinsic like:

$ java -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_weakCompareAndSetLongRelease Serial

Looking at the source code, I think this test also invokes the weakCompareAndSetLongRelease in some way.

In src/java.base/share/classes/java/util/concurrent/atomic/DoubleAccumulator.java:

    public void accumulate(double x) {
        Cell[] cs; long b, v, r; int m; Cell c;
        if ((cs = cells) != null
            || ((r = doubleToRawLongBits
                (function.applyAsDouble(longBitsToDouble(b = base), x))) != b
                && !casBase(b, r))) {
            int index = getProbe();
            boolean uncontended = true;
            if (cs == null
                || (m = cs.length - 1) < 0
                || (c = cs[index & m]) == null
                || !(uncontended =
                     ((r = doubleToRawLongBits
                       (function.applyAsDouble
                        (longBitsToDouble(v = c.value), x))) == v)
                     || c.cas(v, r)))                        <========
                doubleAccumulate(x, function, uncontended, index);
        }
    }

In src/java.base/share/classes/java/util/concurrent/atomic/Striped64.java:

    @jdk.internal.vm.annotation.Contended static final class Cell {
        volatile long value;
        Cell(long x) { value = x; }
        final boolean cas(long cmp, long val) {
            return VALUE.weakCompareAndSetRelease(this, cmp, val);    <========
        }
    ........
    final void doubleAccumulate(double x, DoubleBinaryOperator fn,
                                boolean wasUncontended, int index) {
        if (index == 0) {
            ThreadLocalRandom.current(); // force initialization
            index = getProbe();
            wasUncontended = true;
        }
        for (boolean collide = false;;) {       // True if last slot nonempty
            Cell[] cs; Cell c; int n; long v;
            if ((cs = cells) != null && (n = cs.length) > 0) {
                if ((c = cs[(n - 1) & index]) == null) {
                    if (cellsBusy == 0) {       // Try to attach new Cell
                        Cell r = new Cell(Double.doubleToRawLongBits(x));
                        if (cellsBusy == 0 && casCellsBusy()) {
                            try {               // Recheck under lock
                                Cell[] rs; int m, j;
                                if ((rs = cells) != null &&
                                    (m = rs.length) > 0 &&
                                    rs[j = (m - 1) & index] == null) {
                                    rs[j] = r;
                                    break;
                                }
                            } finally {
                                cellsBusy = 0;
                            }
                            continue;           // Slot is now non-empty
                        }
                    }
                    collide = false;
                }
                else if (!wasUncontended)       // CAS already known to fail
                    wasUncontended = true;      // Continue after rehash
                else if (c.cas(v = c.value, apply(fn, v, x)))    <========
                    break;
    ........

[1]
import java.util.concurrent.atomic.DoubleAccumulator;
import java.util.function.DoubleBinaryOperator;
import java.io.Serializable;
import java.io.IOException;

public class Serial {
    public static void main(String[] args) {
        for (int i = 0; i < 12000; i++) {
            testDoubleAccumulator1(i);
            testDoubleAccumulator2(i);
        }
    }

    static DoubleBinaryOperator plus = (DoubleBinaryOperator & Serializable) (x, y) -> x + y;
    static DoubleAccumulator a = new DoubleAccumulator(plus, 13.9d);
    static DoubleAccumulator b = new DoubleAccumulator(plus, 13.9d);

    static void testDoubleAccumulator1(int i) {
        a.accumulate(17.5d);
        b.accumulate(17.5d);
        if (a.get() != b.get())
            throw new RuntimeException("Unexpected value, iter: " + i);
    }

    static void testDoubleAccumulator2(int i) {
        a.reset();
        b.reset();
        if (a.get() != b.get()) {
            System.out.println("====> a.get() : " + a.get());
            System.out.println("====> b.get() : " + b.get());
            throw new RuntimeException("Unexpected value after reset, iter: " + i);
        }
    }
}

From vladimir.kempik at gmail.com Mon Aug 22 08:56:25 2022
From: vladimir.kempik at gmail.com (Vladimir Kempik)
Date: Mon, 22 Aug 2022 11:56:25 +0300
Subject: tier2 not clean
Message-ID: <4C8C2649-D0AD-49B0-9B04-E8E6CD50E84F@gmail.com>

Hello
Trying to get a clean result on tier1/2 and found this test ( part of tier2) to constantly fail on risc-v ( both in qemu and thead):

runtime/cds/appcds/javaldr/ExceptionDuringDumpAtObjectsInitPhase.java

I haven't found anything about it in JBS.

Is it something known or should I file a bug in jbs ?

Regards, Vladimir.


[2.367s][error][cds ] java.lang.ClassFormatError: Incompatible magic value 16909060 in class file jdk/internal/math/FDBigInteger
VM exits due to exception, use -Xlog:cds,exceptions=trace for detail
];
 stderr: []
 exitValue = 255

java.lang.RuntimeException: 'Preload Warning: Cannot find jdk/internal/math/FDBigInteger' missing from stdout/stderr
    at jdk.test.lib.process.OutputAnalyzer.shouldContain(OutputAnalyzer.java:221)
    at ExceptionDuringDumpAtObjectsInitPhase.main(ExceptionDuringDumpAtObjectsInitPhase.java:71)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
    at java.base/java.lang.reflect.Method.invoke(Method.java:578)
    at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:125)
    at java.base/java.lang.Thread.run(Thread.java:1589)

From jiangfeilong at huawei.com Mon Aug 22 10:58:45 2022
From: jiangfeilong at huawei.com (jiangfeilong)
Date: Mon, 22 Aug 2022 10:58:45 +0000
Subject: tier2 not clean
In-Reply-To: <4C8C2649-D0AD-49B0-9B04-E8E6CD50E84F@gmail.com>
References: <4C8C2649-D0AD-49B0-9B04-E8E6CD50E84F@gmail.com>
Message-ID: <69ec34f7951042a2910b05567b3a48f0@huawei.com>

Hi Vladimir,

I've run the test you mentioned on the unmatched board and it passed without failure.
The command is as follows:

make run-test TEST="test/hotspot/jtreg/runtime/cds/appcds/javaldr/ExceptionDuringDumpAtObjectsInitPhase.java"
Building target 'run-test' in configuration 'linux-riscv64-server-release'
Test selection 'test/hotspot/jtreg/runtime/cds/appcds/javaldr/ExceptionDuringDumpAtObjectsInitPhase.java', will run:
* jtreg:test/hotspot/jtreg/runtime/cds/appcds/javaldr/ExceptionDuringDumpAtObjectsInitPhase.java

Running test 'jtreg:test/hotspot/jtreg/runtime/cds/appcds/javaldr/ExceptionDuringDumpAtObjectsInitPhase.java'
Passed: runtime/cds/appcds/javaldr/ExceptionDuringDumpAtObjectsInitPhase.java
Test results: passed: 1

Finished running test 'jtreg:test/hotspot/jtreg/runtime/cds/appcds/javaldr/ExceptionDuringDumpAtObjectsInitPhase.java'
Test report is stored in build/linux-riscv64-server-release/test-results/jtreg_test_hotspot_jtreg_runtime_cds_appcds_javaldr_ExceptionDuringDumpAtObjectsInitPhase_java

==============================
Test summary
==============================
   TEST                                              TOTAL  PASS  FAIL ERROR
   jtreg:test/hotspot/jtreg/runtime/cds/appcds/javaldr/ExceptionDuringDumpAtObjectsInitPhase.java
                                                         1     1     0     0
==============================
TEST SUCCESS

Did you test on unmatched board, or would you please provide the command that you run this test?

Thanks, Feilong.

From: riscv-port-dev On Behalf Of Vladimir Kempik
Sent: Monday, August 22, 2022 4:56 PM
To: riscv-port-dev at openjdk.org
Subject: tier2 not clean

Hello
Trying to get a clean result on tier1/2 and found this test ( part of tier2) to constantly fail on risc-v ( both in qemu and thead):

runtime/cds/appcds/javaldr/ExceptionDuringDumpAtObjectsInitPhase.java

I haven't found anything about it in JBS.

Is it something known or should I file a bug in jbs ?

Regards, Vladimir.
[2.367s][error][cds ] java.lang.ClassFormatError: Incompatible magic value 16909060 in class file jdk/internal/math/FDBigInteger
VM exits due to exception, use -Xlog:cds,exceptions=trace for detail
];
 stderr: []
 exitValue = 255

java.lang.RuntimeException: 'Preload Warning: Cannot find jdk/internal/math/FDBigInteger' missing from stdout/stderr
    at jdk.test.lib.process.OutputAnalyzer.shouldContain(OutputAnalyzer.java:221)
    at ExceptionDuringDumpAtObjectsInitPhase.main(ExceptionDuringDumpAtObjectsInitPhase.java:71)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
    at java.base/java.lang.reflect.Method.invoke(Method.java:578)
    at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:125)
    at java.base/java.lang.Thread.run(Thread.java:1589)

From vladimir.kempik at gmail.com Mon Aug 22 12:20:43 2022
From: vladimir.kempik at gmail.com (Vladimir Kempik)
Date: Mon, 22 Aug 2022 15:20:43 +0300
Subject: tier2 not clean
In-Reply-To: <69ec34f7951042a2910b05567b3a48f0@huawei.com>
References: <4C8C2649-D0AD-49B0-9B04-E8E6CD50E84F@gmail.com> <69ec34f7951042a2910b05567b3a48f0@huawei.com>
Message-ID: <5E6B032B-EB6D-459E-84F3-B3B75F6B71E6@gmail.com>

Hello
launching it two ways:

1) make OUTPUTDIR=build/prebuilt-output BOOT_JDK=~/java_tiers/jdk20 JT_HOME=~/java_tiers/jtreg6.1/ JDK_IMAGE_DIR=~/java_tiers/jdk20 TEST_IMAGE_DIR=~/java_tiers/test20/ LOG_CMDLINES=true JTREG="TIMEOUT_FACTOR=2" run-test-prebuilt TEST="test/hotspot/jtreg/runtime/cds/appcds/javaldr/ExceptionDuringDumpAtObjectsInitPhase.java"

2) ~/java_tiers/jdk20/bin/java -jar ~/java_tiers/jtreg6.1/lib/jtreg.jar test/hotspot/jtreg/runtime/cds/appcds/javaldr/ExceptionDuringDumpAtObjectsInitPhase.java

both produce same result, (with and without classes.jsa present in lib/server path)

tested on sifive unmatched/qemu/alibaba thead rvb-ice, jdk19/20.
the jdk is crosscompiled, so missing classes.jsa ( for appcds) by default.

jdk is build with gcc 11.2.0

Regards, Vladimir

> On 22 Aug 2022, at 13:58, jiangfeilong wrote:
>
> Hi Vladimir,
>
> I've run the test you mentioned on the unmatched board and it passed without failure. The command is as follows:
>
> make run-test TEST="test/hotspot/jtreg/runtime/cds/appcds/javaldr/ExceptionDuringDumpAtObjectsInitPhase.java"
> Building target 'run-test' in configuration 'linux-riscv64-server-release'
> Test selection 'test/hotspot/jtreg/runtime/cds/appcds/javaldr/ExceptionDuringDumpAtObjectsInitPhase.java', will run:
> * jtreg:test/hotspot/jtreg/runtime/cds/appcds/javaldr/ExceptionDuringDumpAtObjectsInitPhase.java
>
> Running test 'jtreg:test/hotspot/jtreg/runtime/cds/appcds/javaldr/ExceptionDuringDumpAtObjectsInitPhase.java'
> Passed: runtime/cds/appcds/javaldr/ExceptionDuringDumpAtObjectsInitPhase.java
> Test results: passed: 1
>
> Finished running test 'jtreg:test/hotspot/jtreg/runtime/cds/appcds/javaldr/ExceptionDuringDumpAtObjectsInitPhase.java'
> Test report is stored in build/linux-riscv64-server-release/test-results/jtreg_test_hotspot_jtreg_runtime_cds_appcds_javaldr_ExceptionDuringDumpAtObjectsInitPhase_java
>
> ==============================
> Test summary
> ==============================
>    TEST                                              TOTAL  PASS  FAIL ERROR
>    jtreg:test/hotspot/jtreg/runtime/cds/appcds/javaldr/ExceptionDuringDumpAtObjectsInitPhase.java
>                                                          1     1     0     0
> ==============================
> TEST SUCCESS
>
> Did you test on unmatched board, or would you please provide the command that you run this test?
>
> Thanks, Feilong.
>
> From: riscv-port-dev On Behalf Of Vladimir Kempik
> Sent: Monday, August 22, 2022 4:56 PM
> To: riscv-port-dev at openjdk.org
> Subject: tier2 not clean
>
> Hello
> Trying to get a clean result on tier1/2 and found this test ( part of tier2) to constantly fail on risc-v ( both in qemu and thead):
>
> runtime/cds/appcds/javaldr/ExceptionDuringDumpAtObjectsInitPhase.java
>
> I haven't found anything about it in JBS.
>
> Is it something known or should I file a bug in jbs ?
>
> Regards, Vladimir.
>
>
> [2.367s][error][cds ] java.lang.ClassFormatError: Incompatible magic value 16909060 in class file jdk/internal/math/FDBigInteger
> VM exits due to exception, use -Xlog:cds,exceptions=trace for detail
> ];
> stderr: []
> exitValue = 255
>
> java.lang.RuntimeException: 'Preload Warning: Cannot find jdk/internal/math/FDBigInteger' missing from stdout/stderr
> at jdk.test.lib.process.OutputAnalyzer.shouldContain(OutputAnalyzer.java:221)
> at ExceptionDuringDumpAtObjectsInitPhase.main(ExceptionDuringDumpAtObjectsInitPhase.java:71)
> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
> at java.base/java.lang.reflect.Method.invoke(Method.java:578)
> at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:125)
> at java.base/java.lang.Thread.run(Thread.java:1589)

From vladimir.kempik at gmail.com Mon Aug 22 12:45:01 2022
From: vladimir.kempik at gmail.com (Vladimir Kempik)
Date: Mon, 22 Aug 2022 15:45:01 +0300
Subject: tier2 not clean
In-Reply-To: <5E6B032B-EB6D-459E-84F3-B3B75F6B71E6@gmail.com>
References: <4C8C2649-D0AD-49B0-9B04-E8E6CD50E84F@gmail.com> <69ec34f7951042a2910b05567b3a48f0@huawei.com> <5E6B032B-EB6D-459E-84F3-B3B75F6B71E6@gmail.com>
Message-ID: <023BE584-AD31-4936-B953-56A061ED8B49@gmail.com>

Also, invalid magic value for class - 16909060 is 0x01020304 in hex.

> On 22 Aug 2022, at 15:20, Vladimir Kempik wrote:
>
> Hello
> launching it two ways:
>
> 1) make OUTPUTDIR=build/prebuilt-output BOOT_JDK=~/java_tiers/jdk20 JT_HOME=~/java_tiers/jtreg6.1/ JDK_IMAGE_DIR=~/java_tiers/jdk20 TEST_IMAGE_DIR=~/java_tiers/test20/ LOG_CMDLINES=true JTREG="TIMEOUT_FACTOR=2" run-test-prebuilt TEST="test/hotspot/jtreg/runtime/cds/appcds/javaldr/ExceptionDuringDumpAtObjectsInitPhase.java"
>
> 2) ~/java_tiers/jdk20/bin/java -jar ~/java_tiers/jtreg6.1/lib/jtreg.jar test/hotspot/jtreg/runtime/cds/appcds/javaldr/ExceptionDuringDumpAtObjectsInitPhase.java
>
> both produce same result, (with and without classes.jsa present in lib/server path)
>
> tested on sifive unmatched/qemu/alibaba thead rvb-ice, jdk19/20.
>
> the jdk is crosscompiled, so missing classes.jsa ( for appcds) by default.
>
> jdk is build with gcc 11.2.0
>

From vladimir.kempik at gmail.com Mon Aug 22 13:33:29 2022
From: vladimir.kempik at gmail.com (Vladimir Kempik)
Date: Mon, 22 Aug 2022 16:33:29 +0300
Subject: tier2 not clean
In-Reply-To: <023BE584-AD31-4936-B953-56A061ED8B49@gmail.com>
References: <4C8C2649-D0AD-49B0-9B04-E8E6CD50E84F@gmail.com> <69ec34f7951042a2910b05567b3a48f0@huawei.com> <5E6B032B-EB6D-459E-84F3-B3B75F6B71E6@gmail.com> <023BE584-AD31-4936-B953-56A061ED8B49@gmail.com>
Message-ID:

One more update, the issue seems to be an issue with my build, as Alexey's build from https://builds.shipilev.net/openjdk-jdk/openjdk-jdk-linux-riscv64-server-release-gcc12-glibc2.33.tar.xz ( which in fact is built with gcc 11.2.0 - "buildbot" with gcc 11.2.0 ) is fine.

I will investigate and update if there are any issues on jdk side to be fixed.

> On 22 Aug 2022, at 15:45, Vladimir Kempik wrote:
>
> Also, invalid magic value for class - 16909060 is 0x01020304 in hex.
>
>> On 22 Aug 2022, at 15:20, Vladimir Kempik wrote:
>>
>> Hello
>> launching it two ways:
>>
>> 1) make OUTPUTDIR=build/prebuilt-output BOOT_JDK=~/java_tiers/jdk20 JT_HOME=~/java_tiers/jtreg6.1/ JDK_IMAGE_DIR=~/java_tiers/jdk20 TEST_IMAGE_DIR=~/java_tiers/test20/ LOG_CMDLINES=true JTREG="TIMEOUT_FACTOR=2" run-test-prebuilt TEST="test/hotspot/jtreg/runtime/cds/appcds/javaldr/ExceptionDuringDumpAtObjectsInitPhase.java"
>>
>> 2) ~/java_tiers/jdk20/bin/java -jar ~/java_tiers/jtreg6.1/lib/jtreg.jar test/hotspot/jtreg/runtime/cds/appcds/javaldr/ExceptionDuringDumpAtObjectsInitPhase.java
>>
>> both produce same result, (with and without classes.jsa present in lib/server path)
>>
>> tested on sifive unmatched/qemu/alibaba thead rvb-ice, jdk19/20.
>>
>> the jdk is crosscompiled, so missing classes.jsa ( for appcds) by default.
>>
>> jdk is build with gcc 11.2.0
>>
>

From vladimir.kempik at gmail.com Mon Aug 22 15:47:28 2022
From: vladimir.kempik at gmail.com (Vladimir Kempik)
Date: Mon, 22 Aug 2022 18:47:28 +0300
Subject: tier2 not clean
In-Reply-To:
References: <4C8C2649-D0AD-49B0-9B04-E8E6CD50E84F@gmail.com> <69ec34f7951042a2910b05567b3a48f0@huawei.com> <5E6B032B-EB6D-459E-84F3-B3B75F6B71E6@gmail.com> <023BE584-AD31-4936-B953-56A061ED8B49@gmail.com>
Message-ID: <9ADE6714-063B-43DE-9904-1D77FAC1572C@gmail.com>

it's an interesting bug/feature.

I was building riscv jdk with cross-compilation.
first step was to build build-jdk for host system (x86_64) from the same source code ( bootjdk = jdk18_x86_64)
second step was to build risc-v jdk using build-jdk from previous step and bootjdk = jdk18_x86_64

the issue was caused by disabling C2 (jvm options : -compiler2) in build-jdk ( and only in build-jdk)
this resulted in FDBigInteger class being missed in lib/classlist file ( it's a symptom tho) while the class was still present in java.base.mod

for some reason it was affecting not only build-jdk but the final jdk too.

After removing -compiler2 from --with-jvm-features of build-jdk the test passes

Regards, Vladimir

> On 22 Aug 2022, at 16:33, Vladimir Kempik wrote:
>
> One more update, the issue seems to be an issue with my build, as Alexey's build from https://builds.shipilev.net/openjdk-jdk/openjdk-jdk-linux-riscv64-server-release-gcc12-glibc2.33.tar.xz ( which in fact is built with gcc 11.2.0 - "buildbot" with gcc 11.2.0 ) is fine.
>
> I will investigate and update if there are any issues on jdk side to be fixed.
>
>> On 22 Aug 2022, at 15:45, Vladimir Kempik wrote:
>>
>> Also, invalid magic value for class - 16909060 is 0x01020304 in hex.
>>
>>> On 22 Aug 2022, at 15:20, Vladimir Kempik wrote:
>>>
>>> Hello
>>> launching it two ways:
>>>
>>> 1) make OUTPUTDIR=build/prebuilt-output BOOT_JDK=~/java_tiers/jdk20 JT_HOME=~/java_tiers/jtreg6.1/ JDK_IMAGE_DIR=~/java_tiers/jdk20 TEST_IMAGE_DIR=~/java_tiers/test20/ LOG_CMDLINES=true JTREG="TIMEOUT_FACTOR=2" run-test-prebuilt TEST="test/hotspot/jtreg/runtime/cds/appcds/javaldr/ExceptionDuringDumpAtObjectsInitPhase.java"
>>>
>>> 2) ~/java_tiers/jdk20/bin/java -jar ~/java_tiers/jtreg6.1/lib/jtreg.jar test/hotspot/jtreg/runtime/cds/appcds/javaldr/ExceptionDuringDumpAtObjectsInitPhase.java
>>>
>>> both produce same result, (with and without classes.jsa present in lib/server path)
>>>
>>> tested on sifive unmatched/qemu/alibaba thead rvb-ice, jdk19/20.
>>>
>>> the jdk is crosscompiled, so missing classes.jsa ( for appcds) by default.
>>>
>>> jdk is build with gcc 11.2.0
>>>
>>
>

From shade at redhat.com Wed Aug 24 17:23:42 2022
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 24 Aug 2022 19:23:42 +0200
Subject: DoubleAccumulator and RISC-V weirdness
Message-ID: <5250883d-f188-c3b4-84d9-f83ac1487b4e@redhat.com>

(cc-ing risc-port-dev@)

Hi Doug,

A (little?) puzzle for you. We in RISC-V land are facing a weird bug. On this test:

import java.util.concurrent.atomic.DoubleAccumulator;
import java.util.function.DoubleBinaryOperator;

public class Serial {
    public static void main(String[] args) {
        for (int i = 0; i < 12000; i++) {
            test(i);
        }
    }

    static void test(int i) {
        DoubleBinaryOperator plus = (x, y) -> x + y;
        DoubleAccumulator a = new DoubleAccumulator(plus, 13.9d);
        DoubleAccumulator b = new DoubleAccumulator(plus, 13.9d);
        a.accumulate(17.5d);
        b.accumulate(17.5d);
        if (a.get() != b.get())
            throw new RuntimeException("Unexpected value, iter: " + i);
/*
        System.out.println("==== a before: " + a.get());
        a.debug();
        System.out.println("==== b before: " + b.get());
        b.debug();
        System.out.println("!!! RESET");
*/
        a.reset();
        b.reset();
/*
        System.out.println("==== a after: " + a.get());
        a.debug();
        System.out.println("==== b after: " + b.get());
        b.debug();
*/
        if (a.get() != b.get()) {
            throw new RuntimeException("Unexpected value after reset, iter: " + i);
        }
    }
}

...and RISC-V machines, we have:

$ ~/test-jdk/bin/java Serial
Exception in thread "main" java.lang.RuntimeException: Unexpected value after reset, iter: 3777
    at Serial.test(Serial.java:28)
    at Serial.main(Serial.java:8)

I cannot find anything obviously wrong with RISC-V port CASes implementation, so I started suspecting DoubleAccumulator itself.
Here is what I know:
 - DoubleAccumulator/Striped64 use weakCompareAndSetRelease;
 - RISC-V implements weak CASes with LR/SC;
 - On our existing hardware implementations, LR/SC seem to be so weak, they spuriously fail often;
 - Replacing weak CASes implementations with strong ones (= introducing retry loops) fixes the test;

If I dump stuff from the DoubleAccumulator itself by using:

    /**
     * Show me what you got.
     */
    public void debug() {
        System.out.println("Base: " + longBitsToDouble(base));
        System.out.println("Cells: ");
        Cell[] cs = cells;
        if (cs != null) {
            for (Cell c : cells) {
                if (c != null) {
                    System.out.println(" " + longBitsToDouble(c.value));
                } else {
                    System.out.println(" null");
                }
            }
        } else {
            System.out.println(" Nothing.");
        }
        System.out.println();
    }

...then "a" and "b" are structurally different, and they do produce different sums after the reset:

==== a before: 31.4
Base: 31.4
Cells:
 Nothing.

==== b before: 31.4
Base: 13.9
Cells:
 null
 17.5

!!! RESET
==== a after: 13.9
Base: 13.9
Cells:
 Nothing.

==== b after: 27.8
Base: 13.9
Cells:
 null
 13.9

So what I think happens is following:
 - In DoubleAccumulator.accumulate(), we tried to weak-CAS-add the value to base;
 - That weak-CAS spuriously failed (even in single-threaded mode!), we proceeded to create the cell;
 - DoubleAccumulator.reset() came, reset both base and the cell to identity;
 - Then we asked for DoubleAccumulator.get(), and got twice the value (base + cell);

This looks to be a bona-fide DoubleAccumulator logic bug, that we were just able to detect when weak CAS spuriously failed in single-threaded workload.

Shouldn't we reset cells to 0, rather than to "identity"?
diff --git a/src/java.base/share/classes/java/util/concurrent/atomic/DoubleAccumulator.java b/src/java.base/share/classes/java/util/concurrent/atomic/DoubleAccumulator.java
index d04b8253ee7..4bb638348a6 100644
--- a/src/java.base/share/classes/java/util/concurrent/atomic/DoubleAccumulator.java
+++ b/src/java.base/share/classes/java/util/concurrent/atomic/DoubleAccumulator.java
@@ -160,7 +160,7 @@ public class DoubleAccumulator extends Striped64 implements Serializable {
         if (cs != null) {
             for (Cell c : cs)
                 if (c != null)
-                    c.reset(identity);
+                    c.reset(0L);
         }
     }

This fixes the test on RISC-V too. On x86_64, java/util/concurrent tests also pass.

Thoughts? If you agree this is a bug, I can PR the fix :)

--
Thanks,
-Aleksey

From shade at redhat.com Wed Aug 24 17:33:47 2022
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 24 Aug 2022 19:33:47 +0200
Subject: DoubleAccumulator and RISC-V weirdness
In-Reply-To: <5250883d-f188-c3b4-84d9-f83ac1487b4e@redhat.com>
References: <5250883d-f188-c3b4-84d9-f83ac1487b4e@redhat.com>
Message-ID: <44b58975-8c8e-2161-8ce2-83347c3d193d@redhat.com>

On 8/24/22 19:23, Aleksey Shipilev wrote:
> Thoughts? If you agree this is a bug, I can PR the fix :)

Dang. The minute I hit the "Send" button I of course realized this is "identity" (!), so applying
the function to it was supposed to be neutral. It is then the user error to put in binary addition
as the function, for which "13.9" is not an identity.

The test we looked at is the minimized version of ./test/jdk/java/util/concurrent/atomic/Serial.java:
   https://github.com/openjdk/jdk/blob/568be58e8521e5e87baca1872ba8cc1941607bb7/test/jdk/java/util/concurrent/atomic/Serial.java#L66-L79

...which makes it a test bug?
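The identity contract can be made concrete with a deliberately simplified model. The sketch below is an illustration under stated assumptions, not the JDK code: the hypothetical `ToyAccumulator` keeps one base value and at most one cell, `get()` folds the cell into the base with the accumulator function, and `reset()` puts both back to the identity, mirroring the `reset()` loop in the diff context above. The hypothetical `baseCasFails` flag stands in for a spurious weak-CAS failure on the base, after which `Striped64` attaches a new cell seeded with `x`.

```java
import java.util.function.DoubleBinaryOperator;

public class IdentityDemo {
    // Hypothetical, simplified model of DoubleAccumulator's layout:
    // one base value plus at most one striped cell. Not the JDK implementation.
    static final class ToyAccumulator {
        private final DoubleBinaryOperator fn;
        private final double identity;
        private double base;
        private Double cell; // null until the "CAS on base failed" path creates it

        ToyAccumulator(DoubleBinaryOperator fn, double identity) {
            this.fn = fn;
            this.identity = identity;
            this.base = identity;
        }

        // baseCasFails simulates a spurious weak-CAS failure when updating base.
        void accumulate(double x, boolean baseCasFails) {
            if (!baseCasFails) {
                base = fn.applyAsDouble(base, x);
            } else if (cell == null) {
                cell = x; // a new cell is seeded with x itself
            } else {
                cell = fn.applyAsDouble(cell, x);
            }
        }

        // Like DoubleAccumulator.reset(): base and every existing cell go back to identity.
        void reset() {
            base = identity;
            if (cell != null) {
                cell = identity;
            }
        }

        // Like DoubleAccumulator.get(): fold the cells into the base with fn.
        double get() {
            double result = base;
            if (cell != null) {
                result = fn.applyAsDouble(result, cell);
            }
            return result;
        }
    }

    // Accumulate 17.5 once, reset, and report what get() returns afterwards.
    static double afterReset(double identity, boolean baseCasFails) {
        ToyAccumulator acc = new ToyAccumulator((x, y) -> x + y, identity);
        acc.accumulate(17.5, baseCasFails);
        acc.reset();
        return acc.get();
    }

    public static void main(String[] args) {
        // 13.9 is not an identity for addition: the result depends on the path taken.
        System.out.println(afterReset(13.9, false)); // base-only path
        System.out.println(afterReset(13.9, true));  // base + cell path
        // 0.0 is a true identity for addition: both paths agree.
        System.out.println(afterReset(0.0, false));
        System.out.println(afterReset(0.0, true));
    }
}
```

Running it prints 13.9, 27.8, 0.0 and 0.0, which matches the 13.9 vs 27.8 split between "a" and "b" in the dump above. The model deliberately omits hashing, striping and the `cellsBusy` lock, since only the base-versus-cell split matters here. Presumably this is also why x86_64 never tripped over it: an uncontended single-threaded run there does not see spurious weak-CAS failures, so the cell path is never taken.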
--
Thanks,
-Aleksey

From shade at redhat.com Wed Aug 24 18:15:50 2022
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 24 Aug 2022 20:15:50 +0200
Subject: DoubleAccumulator and RISC-V weirdness
In-Reply-To: <44b58975-8c8e-2161-8ce2-83347c3d193d@redhat.com>
References: <5250883d-f188-c3b4-84d9-f83ac1487b4e@redhat.com> <44b58975-8c8e-2161-8ce2-83347c3d193d@redhat.com>
Message-ID: <5a3fb28f-cf75-8202-ca99-d00b53a1bb61@redhat.com>

On 8/24/22 19:33, Aleksey Shipilev wrote:
> On 8/24/22 19:23, Aleksey Shipilev wrote:
>> Thoughts? If you agree this is a bug, I can PR the fix :)
>
> Dang. The minute I hit the "Send" button I of course realized this is "identity" (!), so applying
> the function to it was supposed to be neutral. It is then the user error to put in binary addition
> as the function, for which "13.9" is not an identity.
>
> The test we looked at is the minimized version of ./test/jdk/java/util/concurrent/atomic/Serial.java:
>
> https://github.com/openjdk/jdk/blob/568be58e8521e5e87baca1872ba8cc1941607bb7/test/jdk/java/util/concurrent/atomic/Serial.java#L66-L79
>
> ...which makes it a test bug?

Here we go:
   https://bugs.openjdk.org/browse/JDK-8292877
   https://github.com/openjdk/jdk/pull/10002

--
Thanks,
-Aleksey

From shade at redhat.com Wed Aug 24 18:21:54 2022
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 24 Aug 2022 20:21:54 +0200
Subject: CompareAndSet fails intermittently for riscv
In-Reply-To: <160cac9b.17fb.182a991c555.Coremail.yangfei@iscas.ac.cn>
References: <130A0810-C340-452B-851A-133DCFDDEA2C@gmail.com> <4a453e519e5a4a0aa81211456f5c1890@huawei.com> <160cac9b.17fb.182a991c555.Coremail.yangfei@iscas.ac.cn>
Message-ID: <67386e23-ba05-aa1b-70e3-f9739188bd95@redhat.com>

On 8/17/22 04:11, yangfei at iscas.ac.cn wrote:
> I suspect another jtreg test: test/jdk/java/util/concurrent/atomic/Serial.java also have the same issue.
> I once reduced this test into [1] and it looks that this will always fail on my unmatched board.
> But it passes if we disable the intrinsic like:
> $ java -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_weakCompareAndSetLongRelease Serial

Whoa, I missed this. This is actually the test bug that manifests on single-threaded RISC-V :P
   https://mail.openjdk.org/pipermail/riscv-port-dev/2022-August/000594.html

--
Thanks,
-Aleksey

From dl at cs.oswego.edu Wed Aug 24 19:34:02 2022
From: dl at cs.oswego.edu (Doug Lea)
Date: Wed, 24 Aug 2022 15:34:02 -0400
Subject: DoubleAccumulator and RISC-V weirdness
In-Reply-To: <5a3fb28f-cf75-8202-ca99-d00b53a1bb61@redhat.com>
References: <5250883d-f188-c3b4-84d9-f83ac1487b4e@redhat.com> <44b58975-8c8e-2161-8ce2-83347c3d193d@redhat.com> <5a3fb28f-cf75-8202-ca99-d00b53a1bb61@redhat.com>
Message-ID: <4ccc955c-0022-5d6d-202e-c490ffb719cb@cs.oswego.edu>

Thanks for finding and fixing this. Embarrassingly, we once received a bug report about an application also misusing identity, and closed it without checking if the same error appeared in tests.

On 8/24/22 14:15, Aleksey Shipilev wrote:
> On 8/24/22 19:33, Aleksey Shipilev wrote:
>> On 8/24/22 19:23, Aleksey Shipilev wrote:
>>> Thoughts? If you agree this is a bug, I can PR the fix :)
>>
>> Dang. The minute I hit the "Send" button I of course realized this is
>> "identity" (!), so applying
>> the function to it was supposed to be neutral. It is then the user
>> error to put in binary addition
>> as the function, for which "13.9" is not an identity.
>>
>> The test we looked at is the minimized version of
>> ./test/jdk/java/util/concurrent/atomic/Serial.java:
>> https://github.com/openjdk/jdk/blob/568be58e8521e5e87baca1872ba8cc1941607bb7/test/jdk/java/util/concurrent/atomic/Serial.java#L66-L79
>>
>> ...which makes it a test bug?
>
> Here we go:
>    https://bugs.openjdk.org/browse/JDK-8292877
>    https://github.com/openjdk/jdk/pull/10002
>

From yangfei at iscas.ac.cn Thu Aug 25 08:08:34 2022
From: yangfei at iscas.ac.cn (yangfei at iscas.ac.cn)
Date: Thu, 25 Aug 2022 16:08:34 +0800 (GMT+08:00)
Subject: CompareAndSet fails intermittently for riscv
In-Reply-To: <67386e23-ba05-aa1b-70e3-f9739188bd95@redhat.com>
References: <130A0810-C340-452B-851A-133DCFDDEA2C@gmail.com> <4a453e519e5a4a0aa81211456f5c1890@huawei.com> <160cac9b.17fb.182a991c555.Coremail.yangfei@iscas.ac.cn> <67386e23-ba05-aa1b-70e3-f9739188bd95@redhat.com>
Message-ID: <1172aff3.2011.182d40bb891.Coremail.yangfei@iscas.ac.cn>

Hi,

> -----Original Messages-----
> From: "Aleksey Shipilev"
> Sent Time: 2022-08-25 02:21:54 (Thursday)
> To: yangfei at iscas.ac.cn
> Cc: "wangyadong (E)" , "Vladimir Kempik" , "riscv-port-dev at openjdk.org"
> Subject: Re: CompareAndSet fails intermittently for riscv
>
> On 8/17/22 04:11, yangfei at iscas.ac.cn wrote:
> > I suspect another jtreg test: test/jdk/java/util/concurrent/atomic/Serial.java also have the same issue.
> > I once reduced this test into [1] and it looks that this will always fail on my unmatched board.
> > But it passes if we disable the intrinsic like:
> > $ java -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_weakCompareAndSetLongRelease Serial
> Whoa, I missed this. This is actually the test bug that manifests on single-threaded RISC-V :P
> https://mail.openjdk.org/pipermail/riscv-port-dev/2022-August/000594.html

Nice analysis. Surprised to know it's in fact a bug for such an old test case!

Thanks,
Fei

From shade at redhat.com Mon Aug 29 10:27:45 2022
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 29 Aug 2022 12:27:45 +0200
Subject: Mainline is broken?
Message-ID:

Hi there,

I am trying to run the RISC-V tests in mainline, and it fails like:

# make test JTREG="TIMEOUT_FACTOR=32" TEST=compiler/unsafe/ TEST_VM_OPTS="-DweakAttempts=10000" | ts -s
00:00:00 Building target 'test' in configuration 'linux-riscv64-server-fastdebug'
00:17:10 Creating interim java.base.jmod
00:17:56 Creating interim jimage
00:19:16 Creating jdk.jlink.jmod
00:19:25 Creating java.base.jmod
00:20:35 Creating jdk image
00:21:15 java.lang.ClassCastException: class jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class jdk.tools.jlink.plugin.ResourcePoolEntry (jdk.internal.loader.ClassLoaders$AppClassLoader is in module java.base of loader 'bootstrap'; jdk.tools.jlink.plugin.ResourcePoolEntry is in module jdk.jlink of loader 'app')
00:21:15     at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
00:21:15     at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
...

I managed to bisect this to:

commit 054c23f484522881a0879176383d970a8de41201
Author: Erik Österlund
Date:   Thu Aug 25 09:48:55 2022 +0000

    8290025: Remove the Sweeper

    Reviewed-by: stefank, kvn, iveresov, coleenp, vlivanov, mdoerr


...but I wonder if this is just my board misbehaving, or anyone else is seeing it?

--
Thanks,
-Aleksey

From yangfei at iscas.ac.cn Mon Aug 29 10:57:28 2022
From: yangfei at iscas.ac.cn (yangfei at iscas.ac.cn)
Date: Mon, 29 Aug 2022 18:57:28 +0800 (GMT+08:00)
Subject: Mainline is broken?
In-Reply-To:
References:
Message-ID: <2e38c3b9.39d2.182e93fcbf3.Coremail.yangfei@iscas.ac.cn>

Hi,

> -----Original Messages-----
> From: "Aleksey Shipilev"
> Sent Time: 2022-08-29 18:27:45 (Monday)
> To: "riscv-port-dev at openjdk.org"
> Cc:
> Subject: Mainline is broken?
>
> Hi there,
>
> I am trying to run the RISC-V tests in mainline, and it fails like:
>
> # make test JTREG="TIMEOUT_FACTOR=32" TEST=compiler/unsafe/ TEST_VM_OPTS="-DweakAttempts=10000" | ts -s
> 00:00:00 Building target 'test' in configuration 'linux-riscv64-server-fastdebug'
> 00:17:10 Creating interim java.base.jmod
> 00:17:56 Creating interim jimage
> 00:19:16 Creating jdk.jlink.jmod
> 00:19:25 Creating java.base.jmod
> 00:20:35 Creating jdk image
> 00:21:15 java.lang.ClassCastException: class jdk.internal.loader.ClassLoaders$AppClassLoader cannot
> be cast to class jdk.tools.jlink.plugin.ResourcePoolEntry
> (jdk.internal.loader.ClassLoaders$AppClassLoader is in module java.base of loader 'bootstrap';
> jdk.tools.jlink.plugin.ResourcePoolEntry is in module jdk.jlink of loader 'app')
> 00:21:15     at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
> 00:21:15     at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
> ...
>
> I managed to bisect this to:
>
> commit 054c23f484522881a0879176383d970a8de41201
> Author: Erik Österlund
> Date:   Thu Aug 25 09:48:55 2022 +0000
>
>     8290025: Remove the Sweeper
>
>     Reviewed-by: stefank, kvn, iveresov, coleenp, vlivanov, mdoerr
>
>
> ...but I wonder if this is just my board misbehaving, or anyone else is seeing it?

Yes, we have noticed this build failure last weekend and have fixed it: https://github.com/openjdk/jdk/pull/10056
Could you please take a look?

The exact fix for the problem was in changes in file: src/hotspot/cpu/riscv/gc/g1/g1BarrierSetAssembler_riscv.cpp
We also noticed that ZGC and shenandoah won't work after JDK-8290025. This PR also provides the necessary RV-specific changes for JDK-8290025.

Thanks,
Fei

From shade at redhat.com Mon Aug 29 11:05:25 2022
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 29 Aug 2022 13:05:25 +0200
Subject: Mainline is broken?
In-Reply-To: <2e38c3b9.39d2.182e93fcbf3.Coremail.yangfei@iscas.ac.cn>
References: <2e38c3b9.39d2.182e93fcbf3.Coremail.yangfei@iscas.ac.cn>
Message-ID: <62af6320-8265-ec74-efe8-7d2ec0c810cf@redhat.com>

On 8/29/22 12:57, yangfei at iscas.ac.cn wrote:
> Yes, we have noticed this build failure last weekend and have fixed it: https://github.com/openjdk/jdk/pull/10056
> Could you please take a look?
>
> The exact fix for the problem was in changes in file: src/hotspot/cpu/riscv/gc/g1/g1BarrierSetAssembler_riscv.cpp
> We also noticed that ZGC and shenandoah won't work after JDK-8290025. This PR also provides the necessary RV-specific changes for JDK-8290025.

Ah, that PR is literally next in my TODO queue for today. I am now trying to build/run with that patch.

What tripped me is that PR looks like the enhancement, and the JBS issue is Enhancement too. Perhaps it should be named/marked more appropriately?

--
Thanks,
-Aleksey

From shade at redhat.com  Mon Aug 29 16:34:28 2022
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 29 Aug 2022 18:34:28 +0200
Subject: Mainline is broken?
In-Reply-To: <62af6320-8265-ec74-efe8-7d2ec0c810cf@redhat.com>
References: <2e38c3b9.39d2.182e93fcbf3.Coremail.yangfei@iscas.ac.cn>
 <62af6320-8265-ec74-efe8-7d2ec0c810cf@redhat.com>
Message-ID:

On 8/29/22 13:05, Aleksey Shipilev wrote:
> On 8/29/22 12:57, yangfei at iscas.ac.cn wrote:
>> Yes, we have noticed this build failure last weekend and have fixed it: https://github.com/openjdk/jdk/pull/10056
>> Could you please take a look?
>>
>> The exact fix for the problem was in changes in file: src/hotspot/cpu/riscv/gc/g1/g1BarrierSetAssembler_riscv.cpp
>> We also noticed that ZGC and shenandoah won't work after JDK-8290025. This PR also provides the necessary RV-specific changes for JDK-8290025.
>
> Ah, that PR is literally next in my TODO queue for today. I am now trying to build/run with that patch.

Yes, PR 10056 works!
--
Thanks,
-Aleksey

From vladimir.kempik at gmail.com  Mon Aug 29 16:50:07 2022
From: vladimir.kempik at gmail.com (Vladimir Kempik)
Date: Mon, 29 Aug 2022 20:50:07 +0400
Subject: One more test fails from vmTestBase
Message-ID: <894008AE-8213-41D3-B0CA-23DFF3BC76DE@gmail.com>

Hello
While running some tests (tier4:hotspot) and especially vmTestbase set of tests I have found some test to constantly fail on jdk20/risc-v:

test/hotspot/jtreg/vmTestbase/nsk/stress/jni/jnistress002.java

seeing lots of these errors in the log:

JNI object
public java.lang.String nsk.stress.jni.objectsJNI.instName = "Thread-2"
public int nsk.stress.jni.objectsJNI.i = 1072273735
public long nsk.stress.jni.objectsJNI.l = 8325085374318028103
public char[] nsk.stress.jni.objectsJNI.c = "Thread-2"
public float nsk.stress.jni.objectsJNI.f = 0.49921197
public double nsk.stress.jni.objectsJNI.d = NaN
Java object
public java.lang.String nsk.stress.jni.objectsJNI.instName = "Thread-2"
public int nsk.stress.jni.objectsJNI.i = 1072273735
public long nsk.stress.jni.objectsJNI.l = 8325085374318028103
public char[] nsk.stress.jni.objectsJNI.c = "Thread-2"
public float nsk.stress.jni.objectsJNI.f = 0.955677
public double nsk.stress.jni.objectsJNI.d = 0.49921197125621386
The fields No. 4 are different
Objects are different


That doesn't sound right, should I file a bug in JBS ?

Running it this way:
make OUTPUTDIR=build/prebuilt-output BOOT_JDK=~/syntaj20 JT_HOME=~/jtreg6.1/ JDK_IMAGE_DIR=~/syntaj20 TEST_IMAGE_DIR=~/test20/ LOG_CMDLINES=true JTREG="TIMEOUT_FACTOR=8" run-test-prebuilt TEST="test/hotspot/jtreg/vmTestbase/nsk/stress/jni/jnistress002.java"
Checked on hifive and thead


Regards, Vladimir.

From yangfei at iscas.ac.cn  Mon Aug 29 23:39:36 2022
From: yangfei at iscas.ac.cn (yangfei at iscas.ac.cn)
Date: Tue, 30 Aug 2022 07:39:36 +0800 (GMT+08:00)
Subject: Mainline is broken?
In-Reply-To:
References: <2e38c3b9.39d2.182e93fcbf3.Coremail.yangfei@iscas.ac.cn>
 <62af6320-8265-ec74-efe8-7d2ec0c810cf@redhat.com>
Message-ID: <53845968.4e07.182ebf98d76.Coremail.yangfei@iscas.ac.cn>

Hi Aleksey,

> -----Original Messages-----
> From: "Aleksey Shipilev"
> Sent Time: 2022-08-30 00:34:28 (Tuesday)
> To: yangfei at iscas.ac.cn
> Cc: "riscv-port-dev at openjdk.org"
> Subject: Re: Mainline is broken?
>
> On 8/29/22 13:05, Aleksey Shipilev wrote:
> > On 8/29/22 12:57, yangfei at iscas.ac.cn wrote:
> >> Yes, we have noticed this build failure last weekend and have fixed it: https://github.com/openjdk/jdk/pull/10056
> >> Could you please take a look?
> >>
> >> The exact fix for the problem was in changes in file: src/hotspot/cpu/riscv/gc/g1/g1BarrierSetAssembler_riscv.cpp
> >> We also noticed that ZGC and shenandoah won't work after JDK-8290025. This PR also provides the necessary RV-specific changes for JDK-8290025.
> >
> > Ah, that PR is literally next in my TODO queue for today. I am now trying to build/run with that patch.
>
> Yes, PR 10056 works!

Great to know that it resolves the problem :-)

We have seen similar issues happen infrequently in JDK mainline, resulting in build failures on linux-riscv64. Such issues should ideally be caught during the PR integration process. And this reminds me of the GitHub Actions (GHA) support for linux-riscv64: https://bugs.openjdk.org/browse/JDK-8283929
Is this still on your radar?

Thanks,
Fei

From jiangfeilong at huawei.com  Tue Aug 30 11:22:42 2022
From: jiangfeilong at huawei.com (jiangfeilong)
Date: Tue, 30 Aug 2022 11:22:42 +0000
Subject: One more test fails from vmTestBase
In-Reply-To: <894008AE-8213-41D3-B0CA-23DFF3BC76DE@gmail.com>
References: <894008AE-8213-41D3-B0CA-23DFF3BC76DE@gmail.com>
Message-ID: <4ae8694d451d42f6a0cac3f504570fbb@huawei.com>

Thanks for pointing this out. I got the same error on the Unmatched board.
> That doesn't sound right, should I file a bug in JBS ?
Yes, please file a bug to help track this issue.

Thanks, Feilong.

From: riscv-port-dev On Behalf Of Vladimir Kempik
Sent: Tuesday, August 30, 2022 12:50 AM
To: riscv-port-dev at openjdk.org
Subject: One more test fails from vmTestBase

Hello
While running some tests (tier4:hotspot) and especially vmTestbase set of tests I have found some test to constantly fail on jdk20/risc-v:

test/hotspot/jtreg/vmTestbase/nsk/stress/jni/jnistress002.java

seeing lots of these errors in the log:

JNI object
public java.lang.String nsk.stress.jni.objectsJNI.instName = "Thread-2"
public int nsk.stress.jni.objectsJNI.i = 1072273735
public long nsk.stress.jni.objectsJNI.l = 8325085374318028103
public char[] nsk.stress.jni.objectsJNI.c = "Thread-2"
public float nsk.stress.jni.objectsJNI.f = 0.49921197
public double nsk.stress.jni.objectsJNI.d = NaN
Java object
public java.lang.String nsk.stress.jni.objectsJNI.instName = "Thread-2"
public int nsk.stress.jni.objectsJNI.i = 1072273735
public long nsk.stress.jni.objectsJNI.l = 8325085374318028103
public char[] nsk.stress.jni.objectsJNI.c = "Thread-2"
public float nsk.stress.jni.objectsJNI.f = 0.955677
public double nsk.stress.jni.objectsJNI.d = 0.49921197125621386
The fields No. 4 are different
Objects are different


That doesn't sound right, should I file a bug in JBS ?

Running it this way:
make OUTPUTDIR=build/prebuilt-output BOOT_JDK=~/syntaj20 JT_HOME=~/jtreg6.1/ JDK_IMAGE_DIR=~/syntaj20 TEST_IMAGE_DIR=~/test20/ LOG_CMDLINES=true JTREG="TIMEOUT_FACTOR=8" run-test-prebuilt TEST="test/hotspot/jtreg/vmTestbase/nsk/stress/jni/jnistress002.java"
Checked on hifive and thead


Regards, Vladimir.

From vladimir.kempik at gmail.com  Tue Aug 30 12:40:09 2022
From: vladimir.kempik at gmail.com (Vladimir Kempik)
Date: Tue, 30 Aug 2022 16:40:09 +0400
Subject: One more test fails from vmTestBase
In-Reply-To: <4ae8694d451d42f6a0cac3f504570fbb@huawei.com>
References: <894008AE-8213-41D3-B0CA-23DFF3BC76DE@gmail.com>
 <4ae8694d451d42f6a0cac3f504570fbb@huawei.com>
Message-ID:

OK, I just wanted to make sure it's not only me who sees the failure.
Filed https://bugs.openjdk.org/browse/JDK-8293100

Regards,
Vladimir

> On 30 Aug 2022, at 15:22, jiangfeilong wrote:
>
> Thanks for pointing this out. I got the same error on the Unmatched board.
> > That doesn't sound right, should I file a bug in JBS ?
> Yes, please file a bug to help track this issue.
>
> Thanks, Feilong.
>
> From: riscv-port-dev On Behalf Of Vladimir Kempik
> Sent: Tuesday, August 30, 2022 12:50 AM
> To: riscv-port-dev at openjdk.org
> Subject: One more test fails from vmTestBase
>
> Hello
> While running some tests (tier4:hotspot) and especially vmTestbase set of tests I have found some test to constantly fail on jdk20/risc-v:
>
> test/hotspot/jtreg/vmTestbase/nsk/stress/jni/jnistress002.java
>
> seeing lots of these errors in the log:
>
> JNI object
> public java.lang.String nsk.stress.jni.objectsJNI.instName = "Thread-2"
> public int nsk.stress.jni.objectsJNI.i = 1072273735
> public long nsk.stress.jni.objectsJNI.l = 8325085374318028103
> public char[] nsk.stress.jni.objectsJNI.c = "Thread-2"
> public float nsk.stress.jni.objectsJNI.f = 0.49921197
> public double nsk.stress.jni.objectsJNI.d = NaN
> Java object
> public java.lang.String nsk.stress.jni.objectsJNI.instName = "Thread-2"
> public int nsk.stress.jni.objectsJNI.i = 1072273735
> public long nsk.stress.jni.objectsJNI.l = 8325085374318028103
> public char[] nsk.stress.jni.objectsJNI.c = "Thread-2"
> public float nsk.stress.jni.objectsJNI.f = 0.955677
> public double nsk.stress.jni.objectsJNI.d = 0.49921197125621386
> The fields No. 4 are different
> Objects are different
>
>
> That doesn't sound right, should I file a bug in JBS ?
>
> Running it this way:
> make OUTPUTDIR=build/prebuilt-output BOOT_JDK=~/syntaj20 JT_HOME=~/jtreg6.1/ JDK_IMAGE_DIR=~/syntaj20 TEST_IMAGE_DIR=~/test20/ LOG_CMDLINES=true JTREG="TIMEOUT_FACTOR=8" run-test-prebuilt TEST="test/hotspot/jtreg/vmTestbase/nsk/stress/jni/jnistress002.java"
> Checked on hifive and thead
>
>
> Regards, Vladimir.

From yunyao.zxl at alibaba-inc.com  Wed Aug 31 08:33:31 2022
From: yunyao.zxl at alibaba-inc.com (Xiaolin Zheng)
Date: Wed, 31 Aug 2022 16:33:31 +0800
Subject: Re: One more test fails from vmTestBase
In-Reply-To:
References: <894008AE-8213-41D3-B0CA-23DFF3BC76DE@gmail.com>
 <4ae8694d451d42f6a0cac3f504570fbb@huawei.com>
Message-ID:

Hi Vladimir,

Thanks for reporting this issue. I took a dive into this failure and found that the root cause is that the callee-saved floating-point registers are not saved in StubGenerator::generate_call_stub(). These tests now pass on my local branch, and I will file a patch to fix this ASAP.

Best,
Xiaolin

------------------------------------------------------------------
From: Vladimir Kempik
Send Time: 2022-08-30 (Tuesday) 20:40
Subject: Re: One more test fails from vmTestBase

OK, I just wanted to make sure it's not only me who sees the failure.
Filed https://bugs.openjdk.org/browse/JDK-8293100

Regards,
Vladimir

On 30 Aug 2022, at 15:22, jiangfeilong wrote:

Thanks for pointing this out. I got the same error on the Unmatched board.
> That doesn't sound right, should I file a bug in JBS ?
Yes, please file a bug to help track this issue.

Thanks, Feilong.
From: riscv-port-dev On Behalf Of Vladimir Kempik
Sent: Tuesday, August 30, 2022 12:50 AM
To: riscv-port-dev at openjdk.org
Subject: One more test fails from vmTestBase

Hello
While running some tests (tier4:hotspot) and especially vmTestbase set of tests I have found some test to constantly fail on jdk20/risc-v:

test/hotspot/jtreg/vmTestbase/nsk/stress/jni/jnistress002.java

seeing lots of these errors in the log:

JNI object
public java.lang.String nsk.stress.jni.objectsJNI.instName = "Thread-2"
public int nsk.stress.jni.objectsJNI.i = 1072273735
public long nsk.stress.jni.objectsJNI.l = 8325085374318028103
public char[] nsk.stress.jni.objectsJNI.c = "Thread-2"
public float nsk.stress.jni.objectsJNI.f = 0.49921197
public double nsk.stress.jni.objectsJNI.d = NaN
Java object
public java.lang.String nsk.stress.jni.objectsJNI.instName = "Thread-2"
public int nsk.stress.jni.objectsJNI.i = 1072273735
public long nsk.stress.jni.objectsJNI.l = 8325085374318028103
public char[] nsk.stress.jni.objectsJNI.c = "Thread-2"
public float nsk.stress.jni.objectsJNI.f = 0.955677
public double nsk.stress.jni.objectsJNI.d = 0.49921197125621386
The fields No. 4 are different
Objects are different

That doesn't sound right, should I file a bug in JBS ?

Running it this way:
make OUTPUTDIR=build/prebuilt-output BOOT_JDK=~/syntaj20 JT_HOME=~/jtreg6.1/ JDK_IMAGE_DIR=~/syntaj20 TEST_IMAGE_DIR=~/test20/ LOG_CMDLINES=true JTREG="TIMEOUT_FACTOR=8" run-test-prebuilt TEST="test/hotspot/jtreg/vmTestbase/nsk/stress/jni/jnistress002.java"
Checked on hifive and thead

Regards, Vladimir.
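To make Xiaolin's diagnosis a little more concrete: the RISC-V psABI (LP64D) designates fs0-fs11 (registers f8-f9 and f18-f27) as callee-saved, so the stub that StubGenerator::generate_call_stub() emits to enter Java code from the VM has to preserve them, just as it preserves the integer s-registers. The C++ sketch below only models that bookkeeping. `kCalleeSavedFprs`, `fpr_abi_name` and `missing_fpr_saves` are hypothetical names, not HotSpot code, and this is not the actual patch for JDK-8293100.

```cpp
#include <array>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Callee-saved floating-point registers under the RISC-V psABI (LP64D):
// fs0-fs1 are f8-f9 and fs2-fs11 are f18-f27. A function that uses any of
// them must restore the caller's values before returning.
constexpr std::array<int, 12> kCalleeSavedFprs = {8,  9,  18, 19, 20, 21,
                                                  22, 23, 24, 25, 26, 27};

// Hypothetical helper: maps a callee-saved FP register number
// (f8, f9, f18..f27) to its ABI name (fs0..fs11).
std::string fpr_abi_name(int reg) {
  assert((reg >= 8 && reg <= 9) || (reg >= 18 && reg <= 27));
  int index = (reg <= 9) ? reg - 8 : reg - 16;  // f18 -> fs2, f27 -> fs11
  return "fs" + std::to_string(index);
}

// Hypothetical model of the bug class: given the FP registers an entry stub
// saves in its prologue and restores in its epilogue, return the
// callee-saved ones it leaves unprotected. Values the native caller keeps
// in those registers across the call may be clobbered by the callee.
std::vector<int> missing_fpr_saves(const std::set<int>& saved) {
  std::vector<int> missing;
  for (int reg : kCalleeSavedFprs) {
    if (saved.count(reg) == 0) {
      missing.push_back(reg);
    }
  }
  return missing;
}
```

For example, `missing_fpr_saves({})` reports all twelve registers, which is the shape of the problem described above if the call stub saves none of the floating-point callee-saved registers.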