Serialize GAP calls from within Julia. #505

rbehrends · 2020-07-25T11:21:47Z

This is a draft version for ensuring that GAP is not called from more than one Julia thread at a time.

It currently works with Threads.nthreads() == 1, but not if you, say, export JULIA_NUM_THREADS=4 and then run the following test:

using GAP

n = 1000
Threads.@threads for i = 1:n
  GAP.Globals.SymmetricGroup(7)
end

Most likely, there is still a call to GAP that is not properly synced and causing the resultant crashes.

Closes #483

codecov · 2020-07-25T11:33:22Z

Codecov Report

Merging #505 into master will decrease coverage by 0.03%.
The diff coverage is 61.06%.

@@            Coverage Diff             @@
##           master     #505      +/-   ##
==========================================
- Coverage   72.35%   72.31%   -0.04%     
==========================================
  Files          71       72       +1     
  Lines        5266     5461     +195     
==========================================
+ Hits         3810     3949     +139     
- Misses       1456     1512      +56

Impacted Files	Coverage Δ
pkg/JuliaInterface/src/JuliaInterface.c	`89.26% <ø> (-0.07%)`	⬇️
pkg/JuliaInterface/src/calls.c	`78.39% <ø> (ø)`
pkg/JuliaInterface/src/convert.c	`87.17% <ø> (ø)`
src/GAP.jl	`75.45% <0.00%> (+5.64%)`	⬆️
src/sync.jl	`33.96% <33.96%> (ø)`
src/ccalls.jl	`94.40% <84.09%> (-2.38%)`	⬇️
test/sync.jl	`91.66% <91.66%> (ø)`
src/gap_to_julia.jl	`95.85% <100.00%> (+0.46%)`	⬆️
src/julia_to_gap.jl	`90.32% <100.00%> (+1.16%)`	⬆️
docs/make.jl	`0.00% <0.00%> (-84.62%)`	⬇️
... and 22 more

src/ccalls.jl

src/sync.jl

fingolfin · 2020-07-27T16:32:20Z

src/sync.jl

+    @inline function lock()
+        tid = Threads.threadid()
+        if sync_level[tid] == 0
+            Base.lock(mutex)
+        end
+        sync_level[tid] += 1
+    end


I don't understand: We are using a ReentrantLock. Why do we need to keep track on every thread whether we already locked the lock? I.e. why do we need sync_level?

Is it because locks are per-task but you want per-thread locks? But why? Shouldn't we prevent any attempts to invoke GAP simultaneously from two different tasks, even if they happen to run on the same thread? I mean, GAP still won't expect two concurrent executions and things can and will break in various ways, no?

(BTW, note that at least in some build configurations, tasks can migrate between threads... however, I have no idea if this can happen while the task holds a lock is being held... hopefully not).

This is something that's still a bit of a work in progress.

Julia has ReentrantLock, which is reentrant on a per-task basis. It also has SpinLock, which works on a per-thread basis, but will eat CPU while blocking. So, in a way, that's a placeholder implementation, and I think I may actually have to wrap a pthread mutex for this to work properly.

Even if ReentrantLock works, the lock() method for ReentrantLock() is comparatively expensive, which is why I tried to optimize it for the nested case.

(ReentrantLock working on a per-task basis is also what I suspect may be one of the causes for the crashes I'm seeing.)

@sync

With the Julia @sync macros now doing the actual synchronization work, the C implementation is no longer needed and can be removed.

deps/build.jl

src/GAP.jl

Locking relied on tasks not being migrated between threads; we now associate the lock status with the locking task.

Macros and precompilation do not necessarily mix in Julia. If macro evaluation is dependent on program state and the program state changes later, the macro does not get reevaluated. However, Julia does track function dependencies, so we can force recompilation by altering any underlying regular functions.

fingolfin · 2020-08-07T15:55:41Z

Did you have a look at replacing ReentrantLock by using a Threads.Condition wrapped around a Threads.SpinLock (as suggested on Slack)?

test/runtests.jl

rbehrends · 2020-08-07T17:58:29Z

Did you have a look at replacing ReentrantLock by using a Threads.Condition wrapped around a Threads.SpinLock (as suggested on Slack)?

I have played around with various implementations, but haven't really seen any performance differences, especially as there is also the try/catch overhead, which appears to be of about the same magnitude (and also potentially disrupts optimizations). I actually removed the check to see if the lock is already being held, as it performed slightly worse and added unnecessary complexity.

A reentrant thread lock might still be useful in allowing several tasks to run GAP code on the same thread, but the logic then becomes a bit tricky.

I may be looking into biased locking, especially as it might allow us to avoid exception management in some cases. This would also be useful for the Singular.jl code.

We allow for two alternative synchronization modes. One uses a standard mutex, the other pins GAP to a specific thread (the main thread by default). When pinned to a thread, GAP can only be called from that thread. Calling it from another thread will result in an error. While pinning is more restrictive than locking, it is also considerably faster, incurring very little overhead.

fingolfin · 2020-09-04T23:37:00Z

src/GAP.jl

@@ -383,10 +384,10 @@ function prompt()
    # save the current SIGINT handler
    # HACK: the hardcoded value for SIG_DFL is not portable, revise this
    # install GAP's SIGINT handler
-    old_sigint = ccall(:signal, Ptr{Cvoid}, (Cint, Ptr{Cvoid}), Base.SIGINT, C_NULL)
+    old_sigint = @sync ccall(:signal, Ptr{Cvoid}, (Cint, Ptr{Cvoid}), Base.SIGINT, C_NULL)


Does that make sense for a call into the C library? Anyway, perhaps we can just restrict GAP.prompt() to the main thread or so somehow? If we ever end up racing this function from multiple threads, all hell will break loose anyway...

fingolfin · 2020-09-04T23:40:07Z

src/sync.jl

+        end))
+    end
+
+    function switch(mode::SyncMode)


Does this function exist only for debugging / benchmarking? Or whatever else is its purpose? In either case, a brief source comment indicating its purpose would be nice.

The goal here is to let an (experienced) user switch between locking modes. By default, this pins the GAP interpreter to the main thread. However, one can then also switch to full mutex mode in case GAP needs to be accessed from more than one thread, or to turn off locking in case somebody wants to avoid the overhead involved and takes care of synchronization in a different fashion, e.g. via using @sync directly.

I agree that there should be a comment and will be adding one. I'll probably also rename @sync, since Julia's Base.@sync has different semantics and that would be confusing.

fingolfin · 2020-09-04T23:41:12Z

src/sync.jl

+    # by installing the proper version during __init__(), we force
+    # selective recompilation of the affected functions as needed.
+
+    function _mode_mutex()


This is unused in practice, it seems (as nothing calls switch(), it's single caller). It should either be removed or else a brief source comment be added which explains why it's here.

@sync

This is to avoid conflicts with Base.@sync.

fingolfin · 2020-10-31T22:49:04Z

This needs to be rebased, and we need to figure out what to do with it...

Serialize GAP calls from within Julia.

fe0f25a

rbehrends added kind: enhancement New feature or request DO NOT MERGE DO NOT MERGE labels Jul 25, 2020

fingolfin reviewed Jul 27, 2020

View reviewed changes

rbehrends added 2 commits July 31, 2020 10:47

Remove old C-based synchronization code.

9af4881

With the Julia @sync macros now doing the actual synchronization work, the C implementation is no longer needed and can be removed.

Make global variables const instead.

7b639b1

rbehrends force-pushed the sync-ccalls branch from adc3ba6 to 2a4ca0a Compare July 31, 2020 09:39

fingolfin reviewed Aug 1, 2020

View reviewed changes

deps/build.jl Outdated Show resolved Hide resolved

fingolfin reviewed Aug 1, 2020

View reviewed changes

src/GAP.jl Outdated Show resolved Hide resolved

rbehrends force-pushed the sync-ccalls branch from 46ea35c to 8156f8e Compare August 1, 2020 23:27

Wrap entire initialize() function in @sync.

48ed5e3

rbehrends force-pushed the sync-ccalls branch from f49bf36 to e632bcc Compare August 5, 2020 14:06

Make locking logic task-based instead of thread-based.

8dfd851

Locking relied on tasks not being migrated between threads; we now associate the lock status with the locking task.

rbehrends changed the title ~~WIP: Serialize GAP calls from within Julia.~~ Serialize GAP calls from within Julia. Aug 5, 2020

rbehrends removed the DO NOT MERGE DO NOT MERGE label Aug 5, 2020

rbehrends force-pushed the sync-ccalls branch 2 times, most recently from e8e3cdf to b4125ba Compare August 7, 2020 05:38

Add tests for thread safety.

54b99fc

fingolfin reviewed Aug 7, 2020

View reviewed changes

test/runtests.jl Show resolved Hide resolved

rbehrends force-pushed the sync-ccalls branch from e04bf28 to 54b99fc Compare August 7, 2020 16:16

fingolfin reviewed Sep 4, 2020

View reviewed changes

fingolfin closed this Sep 8, 2020

fingolfin reopened this Sep 8, 2020

Rename @sync -> @lock.

6707eeb

This is to avoid conflicts with Base.@sync.

Document GAP.Sync.switch() method.

d677f30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Serialize GAP calls from within Julia. #505

Serialize GAP calls from within Julia. #505

rbehrends commented Jul 25, 2020 •

edited by fingolfin

Loading

codecov bot commented Jul 25, 2020 •

edited

Loading

fingolfin Jul 27, 2020

rbehrends Jul 27, 2020

fingolfin commented Aug 7, 2020

rbehrends commented Aug 7, 2020

fingolfin Sep 4, 2020

fingolfin Sep 4, 2020

rbehrends Sep 5, 2020

fingolfin Sep 4, 2020

fingolfin commented Oct 31, 2020

Serialize GAP calls from within Julia. #505

Are you sure you want to change the base?

Serialize GAP calls from within Julia. #505

Conversation

rbehrends commented Jul 25, 2020 • edited by fingolfin Loading

codecov bot commented Jul 25, 2020 • edited Loading

Codecov Report

fingolfin Jul 27, 2020

Choose a reason for hiding this comment

rbehrends Jul 27, 2020

Choose a reason for hiding this comment

fingolfin commented Aug 7, 2020

rbehrends commented Aug 7, 2020

fingolfin Sep 4, 2020

Choose a reason for hiding this comment

fingolfin Sep 4, 2020

Choose a reason for hiding this comment

rbehrends Sep 5, 2020

Choose a reason for hiding this comment

fingolfin Sep 4, 2020

Choose a reason for hiding this comment

fingolfin commented Oct 31, 2020

rbehrends commented Jul 25, 2020 •

edited by fingolfin

Loading

codecov bot commented Jul 25, 2020 •

edited

Loading