Cooperative vs preemptive scheduling of virtual threads

Mon Jun 8 20:22:40 UTC 2020

There are multiple issues here, and I will try to cover them all. Currently
class loading doesn't introduce a scheduling point (in fact, it pins), but even
if we made this a guarantee, that's not a contributor to the problem you think
you have, nor am I certain it will help at all.

For one, even if we guaranteed class loading doesn't introduce a scheduling
point, you wouldn't be able to rely on HashMap code not having any, because the
existence of such scheduling points is a hidden implementation detail of
HashMap that could change from one version to another. That's because scheduling
is not cooperative, and code is not made aware of scheduling points in code it
calls via, say, the type checker. The reason scheduling isn't cooperative is
that making it so would have resulted in a split world, duplicate APIs and no
forward compatibility. So you cannot rely on third-party code not having
scheduling points regardless of class loading.

For another -- you don't need to. If class loading in the middle of HashMap's
operation is a problem for virtual threads, then it's also a problem without
them (I don't know if that's the case in practice). Suppose that the HashMap
instance were published somehow, say, in some static variable. Class loading
itself can then run arbitrary code in class initialisers. If the instance
were in some inconsistent state, then that initialisation code could also
observe it in an inconsistent state. The way to avoid that is by ensuring your
data structure is never accessible to consumers while it is in an inconsistent
state (writes to a field are always atomic with respect to other
scheduler-neighbours if your scheduler is single-threaded). If the guarantee
that class loading does not introduce a scheduling point makes it significantly
easier, we will consider making that promise.
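
To make the "never accessible while inconsistent" point concrete, here is a
minimal sketch, not anything from the JDK or the Loom prototype. The class
SafePublication and its methods put and current are hypothetical names made up
for illustration. It assumes the single-threaded scheduler discussed above; with
more than one carrier thread the field would at least need to be volatile, and
concurrent writers would need further coordination.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example class; none of these names come from the JDK.
class SafePublication {
    // The only reference other code can reach. It always points to a
    // fully built map, never to one that is in the middle of an update.
    private static Map<String, Integer> shared = new HashMap<>();

    // Build the next state in a private copy, then publish it with a
    // single field write.
    static void put(String key, int value) {
        Map<String, Integer> next = new HashMap<>(shared); // not yet visible to anyone
        // Whatever happens inside put (resizing, treeification, class
        // loading, a scheduling point) can only expose the old, consistent map.
        next.put(key, value);
        shared = next; // publication point: one reference write
    }

    static Map<String, Integer> current() {
        return shared;
    }

    public static void main(String[] args) {
        put("a", 1);
        put("b", 2);
        System.out.println(current());
    }
}
```

Copying on every update is obviously not free; the sketch trades speed for the
property that no observer, including a class initialiser, can reach the map
between two internal steps of an update.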

- Ron

On 8 June 2020 at 17:52:05, Andrey Lomakin (lomakin.andrey at gmail.com) wrote:

> Guys,
> I really appreciate the time you have spent on all these explanations and feedback.
> But a couple of things are still not completely clear to me. I will go through them one by one.
> As Ron wrote, and as can also be read in the "State of Loom" paper which he published:
> > "For example, a scheduler with a single worker platform thread would make all memory operations totally ordered, not require the use of locks, and would allow using, say, HashMap instead of a ConcurrentHashMap"
> 
> As Alan wrote:
> > Yes, it's possible in theory but unlikely in the current implementation
> > because of the limitation that the thread is pinned while holding a
> > monitor (ClassLoaders will almost always be holding a monitor when
> > loading classes). 
> 
> And I suppose when we say HashMap we mean any data structure which was not designed to be shared between OS threads and does not perform any IO calls, like an RB tree, a Fibonacci heap and so on.
> But data structures themselves can obviously use their own classes; let's take HashMap as a more or less abstract example.
> In the current implementation of HashMap, nodes can be represented as entries contained inside a linked list or inside a tree (depending on the number of hash collisions).
> Let's suppose that a hash map is used to share state between virtual threads which are executed by a single carrier thread.
> And the number of hash collisions for all nodes is below the threshold. Then in one virtual thread we reach this threshold.
> 
> So basically I would like to discuss this pseudocode:
> 
> if (amount_of_hash_collisions >= hash_collisions_threshold) {
>     TreeNode treeNode = new TreeNode();
> }
> 
> Also, as far as I know, the JVM is not required to load a class before it is actually used in code (I do not work at that level on a daily basis, so I may be wrong, of course).
> So creation of the TreeNode object *theoretically* may introduce a scheduling point, which may or may not lead (depending on details of the data structure's implementation, like copying data into an intermediate container before adding it to the TreeNode) to an inconsistent state of the data structure.
> 
> What I want to understand is the boundaries of the abstraction which I can use to write code which works as fast as possible. Could you explain what is wrong with the logic above, and why using a single carrier thread allows the use of data structures like HashMap? I do wish that it will be possible (as it is/was in the Quasar project), but I am eager to understand why.
> 
> 
> 
> On Mon, Jun 8, 2020 at 6:12 PM Alan Bateman wrote:
> > On 08/06/2020 15:52, Ron Pressler wrote:
> > > You should not make any assumptions about where scheduling points are. Thread.yield
> > > is not currently specified, but we *may* specify it so that it guarantees returning
> > > to the scheduler.
> > >
> > Just to add a bit more to this. The current implementation will return
> > to the scheduler when the thread is not pinned. If pinned then it just
> > continues.
> > 
> > -Alan
> 
> 
> --
> Best regards,
> Andrey Lomakin.
>