Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lasting API decisions #44

Open
waldemarhorwat opened this issue Oct 8, 2024 · 2 comments
Open

Lasting API decisions #44

waldemarhorwat opened this issue Oct 8, 2024 · 2 comments

Comments

@waldemarhorwat
Copy link

waldemarhorwat commented Oct 8, 2024

The structs proposal will be the first shared memory feature that's convenient for wide use. As such, we should make sure we don't shoot ourselves in the foot, particularly with performance and complexity.

Building APIs on top of structs

Consider an example API built on top of structs built by users, a third party, or us. Let's say we build an abstraction for a queue of elements shared across threads/agents. Any agent can insert or retrieve elements inserted by itself or other agents. The queue q provides two methods:

  • q.Push(x) inserts an element x into the queue. x must be a primitive or a shared struct.
  • q.Pop() removes an element from the queue and returns it or null if the queue is empty. For the purposes of this example the choice of which element it retrieves is not relevant.

If thread A does q.Push(42) and thread B does x = q.Pop() and it turns out that x gets 42 then things are simple.

But consider what happens if thread A wants to pass a shared struct with some data through the queue.

shared struct Point {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }
  x; y;
}

Thread A:

let p = new Point(4, 8);
q.Push(p);

Thread B:

let a = q.Pop();
if (a != null) {
  let x = a.x;
  let y = a.y;
  …
}

Any reasonable queue API implementation needs to provide enough synchronization so that thread B gets to see the x and y values 4 and 8 that thread A wrote. We don't want B to read x and get undefined. The synchronization must be provided by the queue — there's nothing that B could do on its own to synchronize. This is standard practice in almost all such APIs.

Synchronization models

The question becomes the nature of the synchronization that the queue API should provide. In practice the simplest (and most efficient answer) is ACQUIRE/RELEASE. This guarantees that thread B sees all writes done to the Point it got when reading from it.

Unfortunately the ACQUIRE/RELEASE model is not supported by structs so users cannot provide it. Only the far slower SEQ_CST reads and writes are supported. This means that the queue will provide SEQ_CST semantics and its users will come to rely on that — it will be a breaking change to switch to ACQUIRE/RELEASE later.

Performance and complexity

The problem is that uncontended SEQ_CST reads and writes are somewhere around 100x slower than ACQUIRE/RELEASE ones which have about the same performance as unordered reads and writes (and, in fact, on x86 use the exact same machine instructions as unordered reads and writes). The performance difference is real, resulting in massive heroics cross-cutting between applications and the OS to get around just one SEQ_CST operation — see restartable sections for one example. This has caused enormous complexity, pain, and bugs.

Outside of mutexes, SEQ_CST reads and writes do have a use case, but it's tiny. Only a few extremely obscure algorithms require those. In more than a decade of working with atomics I've never run across a single case where I needed SEQ_CST reads or writes, whereas I've used ACQUIRE/RELEASE ones on a daily basis.

Can this wait for later?

No. SEQ_CST will be baked into every user API built on top of shared structs. It's already a problem with SEQ_CST being unnecessarily baked into various existing spec methods for which ACQUIRE/RELEASE would suffice. The more these get used, the more we'll be needlessly stuck with poor performance and higher complexity forever — exactly the thing we want to avoid.

I might present on this topic at a future meeting.

@syg
Copy link
Collaborator

syg commented Oct 8, 2024

I'm certainly happy to include acq/rel earlier than later -- in fact it's probably better to do so as a proposal that can land earlier and independently of shared structs since it will affect SABs as well.

My one constraint here is to do so in lockstep with Wasm, just as we did with the original memory model, because, well, we have the same memory model. cc @conrad-watt

@conrad-watt
Copy link

Seems sensible! The immediate observations I would make:

  • We should take care to use the "modernised" definition of acq/rel that incorporates relatively recent fixes in the C++ model (see the distinction between simply-happens-before and strongly-happens-before introduced here). Our current SC-only model is still "correct", since these subtleties are only relevant once acq/rel is introduced.
  • This would also be a natural opportunity to introduce acq/rel/acq-rel and (especially) SC fence operations, to mirror Wasm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants