locks/leases with KV similar to SETNX #4803

aep · 2023-11-19T13:19:44Z

With NATS having KV now, I wonder if we can just drop Redis entirely.
However, we use Redis SETNX to build locks (or leases, in k8s terms).

The algorithm is well described here: https://redis.io/docs/manual/patterns/distributed-locks/
and has been battle-tested over the years.

Has anyone done similar work with NATS yet?
I guess we can somehow use Create(), and that should be atomic?
But there's no TTL, so it's unclear how you would expire the lock if the locker is dead.

jnmoyne · 2023-11-19T17:31:18Z

Collaborator

You can do distributed synchronization and locking with NATS using the DiscardNewPerSubject discard policy with a limit of 1 message max per subject, and setting a max age for messages in the stream.

The subject is the "key" that you want to lock. If you succeed in publishing on the subject, you have acquired the lock, which you release by removing the message from the stream. If the publish fails, you could not get the lock.

This is very similar to what is described in this blog: https://nats.io/blog/new-per-subject-discard-policy/ (basically a variation on the same theme).

And yes, you could also use the atomic compare-and-set features of JetStream (e.g. do an "insert" rather than an "upsert") to the same effect.
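
The acquire/release cycle described in this comment can be modelled in a few lines. This is an in-memory illustration only, not NATS client code: `lockStream`, `Publish` and `Remove` are hypothetical names that mimic a stream holding at most one message per subject and rejecting new messages rather than dropping old ones. Message expiry (max age) is left out to keep the sketch short.

```go
package main

import (
	"errors"
	"fmt"
)

var errMaxPerSubject = errors.New("maximum messages per subject exceeded")

// lockStream is a hypothetical in-memory model of a stream limited to one
// message per subject that rejects new messages instead of dropping old ones.
type lockStream struct {
	seq       uint64
	bySubject map[string]uint64 // subject -> sequence of the stored message
}

func newLockStream() *lockStream {
	return &lockStream{bySubject: map[string]uint64{}}
}

// Publish stores a message and returns its sequence, or fails if the
// subject already holds one. A successful publish means the lock is held.
func (s *lockStream) Publish(subject string) (uint64, error) {
	if _, held := s.bySubject[subject]; held {
		return 0, errMaxPerSubject
	}
	s.seq++
	s.bySubject[subject] = s.seq
	return s.seq, nil
}

// Remove deletes the message with the given sequence, releasing the lock.
// It reports whether a message with that sequence was found.
func (s *lockStream) Remove(seq uint64) bool {
	for subject, stored := range s.bySubject {
		if stored == seq {
			delete(s.bySubject, subject)
			return true
		}
	}
	return false
}

func main() {
	s := newLockStream()

	seq, err := s.Publish("locks.bob")
	fmt.Println("first publish:", seq, err)

	_, err = s.Publish("locks.bob")
	fmt.Println("second publish:", err)

	_, err = s.Publish("locks.alice") // other keys are independent
	fmt.Println("other subject:", err)

	fmt.Println("released:", s.Remove(seq))
	_, err = s.Publish("locks.bob")
	fmt.Println("publish after release:", err)
}
```

Each subject behaves as its own lock, and the sequence returned by the successful publish is what the holder needs in order to release it.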

10 replies

ripienaar Nov 21, 2023
Collaborator

There is a Progress ack that you can use to tell it you're still busy, but this is only in the libraries, not in the CLI.

aep Nov 21, 2023
Author

Yes, this works!

But is it correct? This is so simple, I feel like I must be missing something.

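package main

import (
	"fmt"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect("localhost")
	if err != nil {
		panic(err)
	}

	js, err := nc.JetStream()
	if err != nil {
		panic(err)
	}

	// one message per subject; a new publish is rejected while one is stored
	if _, err := js.AddStream(&nats.StreamConfig{
		Name:                 "work",
		Subjects:             []string{"color.*"},
		Storage:              nats.MemoryStorage,
		MaxMsgsPerSubject:    1,
		Discard:              nats.DiscardNew,
		DiscardNewPerSubject: true,
	}); err != nil {
		panic(err)
	}

	// assumes a push consumer delivering to "worker" (group "workers")
	// was created separately; it is not shown in this post
	ch := make(chan *nats.Msg, 64)
	if _, err := nc.ChanQueueSubscribe("worker", "workers", ch); err != nil {
		panic(err)
	}
	for msg := range ch {
		fmt.Printf("Received a message: %s\n", string(msg.Data))
		go func(msg *nats.Msg) {
			for i := 0; i < 30; i++ {
				fmt.Println("working...")
				// tell the server we are still busy so it does not redeliver
				msg.InProgress()
				time.Sleep(1 * time.Second)
			}
			msg.Ack()
			// note: this purges the whole stream, i.e. every lock, not just ours
			js.PurgeStream("work")
		}(msg)
	}
}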

aep Nov 21, 2023
Author

Hmm, one problem that shows up immediately is that msg.InProgress() never fails. It's always returning nil, even if I kill the NATS server, so this is not a reliable way to know if you're really holding a lock.

ripienaar Nov 21, 2023
Collaborator

Yeah, I think the server does respond though, so it should be possible to do something to get a confirmation. Need to check if that's the case though.

Try doing a nc.Request(msg.Reply, []byte("+WPI"), time.Second) and see if you get a response; that's what in progress does.

aep Nov 21, 2023
Author

nc.Request(msg.Reply, []byte("+WPI"), time.Second)

This appears to reliably give me an error when the server is gone.
Thanks so much, this looks very elegant.
Going to poke around some more to see if there are any issues with this and wrap it up in a blog post.

aep · 2023-11-22T10:48:31Z

Author

The previous comment describes the solution to building a lock with NATS. It is roughly:

nats stream add work --defaults --discard-per-subject --subjects='locks.*' --storage=memory --discard=new --max-msgs-per-subject=1

nats req locks.bob red # ok
nats req locks.bob red # will fail

nats stream rmm work 1 # clear the lock

However, with the help of the maintainers I discovered a much smarter method of ensuring that only one worker runs on one item exclusively.

A writeup with more context is on my blog: https://docs.kraudcloud.com/blog/2023/11/22/exclusive-worker-tokens-with-nats/

The basic idea is that the same DiscardNewPerSubject construct can be used to push both a notification and the lock to the workers. NATS will hold the item in the queue until the worker acks it, leading to an exclusive lock on the item.

package main

import (
    "fmt"
    "github.com/nats-io/nats.go"
    "time"
)

func main() {
    InitNats()
}

func InitNats() {
    nc, err := nats.Connect("localhost")
    if err != nil {
        panic(err)
    }
    defer nc.Close()

    js, err := nc.JetStream()
    if err != nil {
        panic(err)
    }

    // first create an inbox queue holding the latest state of work to be done
    // values in here are replaced when new work on the same topic is submitted
    _, err = js.AddStream(&nats.StreamConfig{
        Name:     "inbox",
        Subjects: []string{"inbox.*"},

        MaxMsgsPerSubject: 1,
        Discard:           nats.DiscardNew,
    })
    if err != nil {
        panic(fmt.Sprintf("Error creating jetstream [needs a nats-server with -js] : %v", err))
    }

    // items are moved from the inbox into a token lock.
    // these are held by a worker until its done and only THEN a new value is pulled from the inbox.
    // if the worker fails to ack the item, it is resent to a different worker
    _, err = js.AddStream(&nats.StreamConfig{
        Name: "work",
        Sources: []*nats.StreamSource{
            {
                Name: "inbox",
            },
        },

        MaxMsgsPerSubject: 1,
        Discard:           nats.DiscardNew,

        // this means you cant update a running token
        DiscardNewPerSubject: true,

        // an ack deletes the message and frees the topic for new work
        Retention: nats.WorkQueuePolicy,
    })
    if err != nil {
        panic(fmt.Sprintf("Error creating jetstream [needs a nats-server with -js] : %v", err))
    }

    // push the token into a delivery group
    _, err = js.AddConsumer("work", &nats.ConsumerConfig{
        Durable:        "work",
        DeliverSubject: "work",
        DeliverGroup:   "workers",
        DeliverPolicy:  nats.DeliverAllPolicy,
        AckPolicy:      nats.AckExplicitPolicy,
        AckWait:        30 * time.Second,
        Heartbeat:      time.Second,
    })
    if err != nil {
        panic(fmt.Sprintf("Error creating jetstream consumer : %v", err))
    }

    ch := make(chan *nats.Msg, 64)
    if _, err := nc.ChanQueueSubscribe("work", "workers", ch); err != nil {
        panic(fmt.Sprintf("Error subscribing to work queue : %v", err))
    }

    for msg := range ch {

        if len(msg.Reply) == 0 {
            // not jetstream, probably keepalive
            continue
        }

        fmt.Println(msg.Reply)

        fmt.Printf("Received a message on %s: %s\n", msg.Subject, string(msg.Data))
        go func(msg *nats.Msg) {
            for i := 0; i < 60; i++ {
                fmt.Println("working...")
                rsp, err := nc.Request(msg.Reply, []byte("+WPI"), time.Second)
                if err != nil {
                    // lost lock, stop immediately or we risk working in parallel
                    panic(err)
                }
                fmt.Println("got in progress response", string(rsp.Data))
                time.Sleep(1 * time.Second)
            }
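            fmt.Println("done")
            if err := msg.Ack(); err != nil {
                // without an ack the token stays held until AckWait expires
                fmt.Println("ack failed:", err)
            }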
        }(msg)
    }
}

0 replies

pySilver · 2024-08-10T02:19:57Z

@aep have you tried this in production?

1 reply

aep Aug 10, 2024
Author

Unfortunately no. We're just using Temporal now.
However, I tested this a lot, and I didn't discover any issues with the idea.
