Perl 6 Concurrency development


Parrot currently has no threading support. There is a more or less working implementation, but it is on its way out (whiteknight/kill_threads branch).

New plan

The new plan is to implement green threads (tasklets) that can then be scheduled onto different operating system worker threads. Threads share data through proxy objects: a proxy caches data locally for read access and posts writes as messages to the thread that owns the data, which may perform those writes at its own convenience. This should allow a lock-free implementation with very good read performance, very good local write performance, and rather low cross-thread write performance.
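As a rough illustration of the proxy idea, here is a minimal Python sketch (not Parrot code). The names Owner, Proxy, drain and refresh are hypothetical, and how a proxy's cache gets refreshed is an assumption, since the plan above does not say. For simplicity the sketch runs on a single thread and lets refresh read the owner's data directly; a real implementation would presumably route that through messages as well.

```python
import queue


class Owner:
    """Stands in for the thread that owns the data; only it applies writes."""

    def __init__(self, data):
        self.data = dict(data)
        self.inbox = queue.Queue()  # thread-safe queue of pending write messages

    def drain(self):
        """Apply queued writes whenever the owner finds it convenient."""
        applied = 0
        while True:
            try:
                key, value = self.inbox.get_nowait()
            except queue.Empty:
                return applied
            self.data[key] = value
            applied += 1


class Proxy:
    """Held by a non-owning thread: reads hit a local cache, writes are posted."""

    def __init__(self, owner):
        self._owner = owner
        self._cache = dict(owner.data)  # snapshot taken when the proxy is created

    def read(self, key):
        return self._cache[key]  # no lock and no cross-thread traffic

    def write(self, key, value):
        self._owner.inbox.put((key, value))  # the owner applies this later

    def refresh(self):
        # Assumption: cache refresh is not specified in the plan above.
        self._cache = dict(self._owner.data)
```

A write through the proxy is not visible anywhere until the owner calls drain(), and not visible to the proxy's reader until the cache is refreshed. That is the price paid for cheap reads and the reason cross-thread writes are expected to be slow.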

If the scheduler is smart enough to place tasklets with high shared write rates on the same OS thread and tasklets with high local access rates on different threads, overall performance should be quite good in most cases.

Perl 6

Once tasklets are usable, they can be used to implement async blocks, junctive autothreading and hyper operations.

sorear suggested that a better way to handle signals asynchronously is to start a thread on delivery. Since tasklets would be dead cheap, this seems to be a very sensible approach. I can see two ways to do this:

  • Registering a signal handler could mean starting a tasklet that blocks until the signal is delivered. Unregistering would then just mean killing that tasklet.
  • The VM could start a new tasklet on signal arrival.
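The first option can be sketched with Python generators standing in for tasklets. This is only a toy model: SignalTasklets and its methods are hypothetical names, and signals are delivered by an explicit deliver() call rather than by the operating system.

```python
class SignalTasklets:
    """Toy model of the first option: one parked tasklet per registered handler."""

    def __init__(self):
        self._waiting = {}  # signal name -> list of parked generator tasklets

    def register(self, signame, handler):
        def tasklet():
            while True:
                payload = yield   # parked here until the signal is delivered
                handler(payload)

        t = tasklet()
        next(t)                   # run up to the first yield, i.e. block
        self._waiting.setdefault(signame, []).append(t)
        return t

    def unregister(self, signame, t):
        self._waiting[signame].remove(t)
        t.close()                 # killing the tasklet is all it takes

    def deliver(self, signame, payload=None):
        for t in list(self._waiting.get(signame, [])):
            t.send(payload)       # wake the tasklet; it parks again afterwards
```

In the second option there would be no parked tasklet at all. Delivery itself would create a fresh tasklet that runs the handler once and then exits.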

Code structure

Parrot has a concurrency scheduler in src/scheduler.c that schedules events, signals and tasks. It is documented in docs/pdds/pdd25_concurrency.pod. Chandler's gsoc_threads branch unifies these into tasks.

17:25 <@whiteknight> also, we should add an interface somewhere, like to the scheduler PMC, to change the amount of time between preemptive task switches

17:25 <@whiteknight> setting it to 0 or something similar should turn off preemption

17:26 <@nine> Yep, exposing the scheduler is on my list of future improvements

17:26 <@benabik> Writing tests for multi-threaded code is difficult.

<@whiteknight> The scheduler needs a hell of a lot of improvements. We should start putting together a tasklist

17:35 <@whiteknight> Timer PMCs need to be completely rewritten. Right now they are inside-out from what they need to be

17:36 <@whiteknight> The scheduler also needs to get its grubby hands away from managing exception handlers
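The preemption knob whiteknight asks for at 17:25 in the log above might look roughly like the following Python sketch. Everything here is hypothetical: the Scheduler class, the slice_steps attribute and the worker helper are illustrative names, and the time slice is counted in yield points rather than in wall-clock time so that the example stays deterministic.

```python
from collections import deque


class Scheduler:
    """Round-robin green-thread scheduler with an adjustable time slice."""

    def __init__(self, slice_steps=1):
        self.slice_steps = slice_steps  # 0 disables preemption
        self._ready = deque()

    def spawn(self, gen):
        self._ready.append(gen)

    def run(self):
        while self._ready:
            task = self._ready.popleft()
            steps = 0
            while True:
                try:
                    next(task)        # run the tasklet up to its next yield
                except StopIteration:
                    break             # finished; do not requeue
                steps += 1
                if self.slice_steps and steps >= self.slice_steps:
                    self._ready.append(task)  # slice used up: switch tasks
                    break


def worker(name, n, log):
    for i in range(n):
        log.append((name, i))
        yield  # a point at which the scheduler may switch tasks
```

With slice_steps set to 1, two workers interleave. With slice_steps set to 0, each worker runs to completion before the next one starts, which is the "turn off preemption" behaviour described in the log.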

14:55 <@whiteknight> nine: so what do you think we need next? I think we're going to need an API for a simple low-level threading compatibility layer to gloss over the differences between Posix and Win32

14:57 <@whiteknight> nine: the hardest part is really answering the basic architectural questions about how we want to handle data integrity, how we want to handle multi-threading with GC, etc

14:58 <@whiteknight> nine: We have two PMCs: One is the original PMC, the other is a proxy. We overload the vtables on the proxy to schedule requests to the original, instead of performing them directly

14:58 <@whiteknight> nine: we combine that maybe with a simple mailbox-like thread-safe queue and I think we can ignore the rest of the dirty details

14:59 <@nine> whiteknight: that's the part I understand easily. What I have not found out yet is where exactly the proxy PMC gets created. Simply create a proxy for every known PMC when moving a task to a different thread?

14:59 <@whiteknight> The way I see it, assuming GC can be magically thread-safe, we can put a lock on the scheduler task queues, and we can put a lock on the mailbox type, and the rest of the system can be essentially lock-free so long as we follow the approved interfaces

15:00 <@whiteknight> nine: No, that would be the task of the mailbox. We would have a mailbox between two threads. MailBox.send_object(my_object) would create a proxy on the target thread for the my_object variable

15:01 <@whiteknight> so when you did "var foo = MailBox.receive_object()" or whatever, you would be receiving a proxy

15:01 <@whiteknight> ...unless you were receiving from the same thread, then I suppose you could be getting a reference to the original

15:01 <@whiteknight> so a Mailbox type is really the crux of the idea

15:01 <@nine> whiteknight: so it's more a task for the HLL compiler to generate the code to share variables between threads? Like Perl 5's my $foo : shared?

15:02 <@whiteknight> nine: I think so, yes. So long as you pass data around in a consistent way, you can avoid data corruption

15:02 <@whiteknight> that's what I really want most: a system which makes it almost impossible to corrupt data internally

15:05 <@whiteknight> if cross-thread data updates are basically scheduled tasks, then we can implement critical sections easily by disabling green thread task preemption on a given thread

15:05 <@whiteknight> then when we exit the critical section, the scheduler can run through and play out all the update tasks

15:09 <@whiteknight> that's the trade-off I'm making with this design. We don't really need locks at the PIR/HLL level, because cross-thread data writes are disabled by default. Use message-passing to schedule tasks across threads instead, and make modifications in-band

15:09 <@whiteknight> that means writing multi-threaded code is exactly the same as writing green_threaded code, so long as you make proper use of the scheduler and mailboxes

15:10 <@nine> regarding GC: since every thread has its own interpreter, I think it should also have its own GC. MailBox could contain a list of referenced objects so the GC would not kill an object that's still used in other threads.

15:10 <@whiteknight> yes, that's what I think will be the easiest to do first

15:10 <@whiteknight> Eventually, I have a paper at home about a concurrent GC which operates on a separate thread and requires no global stops
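The mailbox and critical-section ideas from the log above can be combined in one small Python sketch. It is a guess at the shape of the design, not Parrot's API: only the names MailBox, send_object and receive_object come from the discussion, while Owner, Proxy, set, poll and critical_section are hypothetical. As a simplification, the proxy is created when the object is received rather than when it is sent, and a plain dict stands in for a PMC.

```python
import queue
import threading
from contextlib import contextmanager


class Owner:
    """Per-thread state: a queue of update tasks and a preemption switch."""

    def __init__(self):
        self.tasks = queue.Queue()  # thread-safe; other threads only put()
        self.preemption = True

    def poll(self):
        """Scheduler hook: play out queued updates unless preemption is off."""
        done = 0
        while self.preemption:
            try:
                task = self.tasks.get_nowait()
            except queue.Empty:
                break
            task()
            done += 1
        return done

    @contextmanager
    def critical_section(self):
        self.preemption = False
        try:
            yield
        finally:
            self.preemption = True
            self.poll()  # play out everything that queued up meanwhile


class Proxy:
    """Schedules writes on the owning thread instead of performing them."""

    def __init__(self, original, owner):
        self._original = original
        self._owner = owner

    def set(self, key, value):
        original = self._original
        self._owner.tasks.put(lambda: original.__setitem__(key, value))


class MailBox:
    def __init__(self, owner):
        self._owner = owner
        self._owner_thread = threading.get_ident()  # the creating thread owns the data
        self._queue = queue.Queue()

    def send_object(self, obj):
        self._queue.put(obj)

    def receive_object(self):
        obj = self._queue.get()
        if threading.get_ident() == self._owner_thread:
            return obj                   # same thread: hand back the original
        return Proxy(obj, self._owner)   # other thread: hand back a proxy
```

Inside `with owner.critical_section():` any proxy writes merely pile up in the task queue. They are applied in one go when the block exits, so no locks are needed at the level of user code.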
