Sync Gateway DB online offline requirements
#Motivation
There are a number of use cases where admins currently have to restart a Sync Gateway instance. These design notes capture how Sync Gateway could be updated so that these use cases can be completed without restarting Sync Gateway.
#Use Cases
- Admins wish to create new databases in an 'offline' state so that users cannot immediately access the DB.
- Admins want to take a DB offline on one or more Sync Gateway instances without affecting the other DBs on those instances. This makes it possible to restart/reset the Sync Gateway connection to a bucket without doing a full restart of the Sync Gateway process, to address issues like dropped TAP feeds.
- The TAP feed from Couchbase Server can become unavailable to a Sync Gateway instance, which results in no further entries appearing in the SG _changes feeds. Sync Gateway should automatically determine when this situation is non-transient and take the affected DBs offline.
#Requirements
##Online/Offline operations must be Idempotent
A request to take a DB offline/online when the DB is already in the requested state will have no effect, and a success status code will be returned.
Operations called on a DB via the REST API during online/offline state transitions should be rejected with an appropriate status code.
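
The two rules above can be sketched as a small state machine. This is only an illustration under stated assumptions: the `Database` type, the state names, `ErrTransitioning` and the roll-back-on-failure behaviour are all hypothetical and are not taken from the Sync Gateway code base.

```go
package main

import (
	"errors"
	"sync"
)

// DBState is a hypothetical enumeration of the states implied by the notes.
type DBState int

const (
	Offline DBState = iota
	GoingOnline
	Online
	GoingOffline
)

// ErrTransitioning is a hypothetical error returned while a state change is in progress.
// A REST layer would map it to whatever status code is chosen for that case.
var ErrTransitioning = errors.New("database is changing state")

// Database is a hypothetical stand-in for a per-DB context. Its zero value is Offline.
type Database struct {
	mu    sync.Mutex
	state DBState
}

// State returns the current state.
func (db *Database) State() DBState {
	db.mu.Lock()
	defer db.mu.Unlock()
	return db.state
}

// TakeOffline is idempotent: on an already offline DB it succeeds without running work.
func (db *Database) TakeOffline(work func() error) error {
	return db.transition(Offline, GoingOffline, work)
}

// BringOnline is idempotent: on an already online DB it succeeds without running work.
func (db *Database) BringOnline(work func() error) error {
	return db.transition(Online, GoingOnline, work)
}

func (db *Database) transition(target, interim DBState, work func() error) error {
	db.mu.Lock()
	switch db.state {
	case target:
		db.mu.Unlock()
		return nil // already in the requested state: no effect, report success
	case GoingOnline, GoingOffline:
		db.mu.Unlock()
		return ErrTransitioning // reject calls made during a transition
	}
	previous := db.state
	db.state = interim
	db.mu.Unlock()

	// The lock is released while the (possibly slow) work runs.
	err := work()

	db.mu.Lock()
	defer db.mu.Unlock()
	if err != nil {
		db.state = previous // assumption: a failed transition rolls back
		return err
	}
	db.state = target
	return nil
}
```

Holding the lock only around the state checks, rather than across the whole transition, is what lets concurrent callers be rejected immediately instead of queuing behind the transition.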
##Bringing a DB online
When a DB is going online, the following actions must be taken:
- Disconnect the DB from the underlying bucket
- Reload the config for the DB instance being brought back online
- Reconnect the DB to the underlying bucket defined in the reloaded config
- Re-enable the processing of all REST API calls against the DB endpoint
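
The ordering of these actions could look roughly like the sketch below. The `OnlineSteps` hooks, the `DBConfig` struct and `ErrNoBucket` are hypothetical names introduced for illustration; the real bucket and config handling in Sync Gateway is not shown in these notes.

```go
package main

import (
	"errors"
	"fmt"
)

// DBConfig is a hypothetical, minimal view of a per-database config.
type DBConfig struct {
	Bucket string
}

// ErrNoBucket is a hypothetical error for a reloaded config that names no bucket.
var ErrNoBucket = errors.New("reloaded config does not name a bucket")

// OnlineSteps groups the four actions as hypothetical hooks so the ordering
// can be shown without a real bucket connection.
type OnlineSteps struct {
	Disconnect   func() error
	ReloadConfig func() (DBConfig, error)
	Reconnect    func(cfg DBConfig) error
	EnableREST   func()
}

// BringOnline runs the steps in the order listed in the notes and stops at the
// first failure, so REST processing is only re-enabled after a successful reconnect.
func BringOnline(s OnlineSteps) error {
	if err := s.Disconnect(); err != nil {
		return fmt.Errorf("disconnect from bucket: %w", err)
	}
	cfg, err := s.ReloadConfig()
	if err != nil {
		return fmt.Errorf("reload config: %w", err)
	}
	if cfg.Bucket == "" {
		return ErrNoBucket
	}
	if err := s.Reconnect(cfg); err != nil {
		return fmt.Errorf("reconnect to bucket %q: %w", cfg.Bucket, err)
	}
	s.EnableREST()
	return nil
}
```

Leaving `EnableREST` until last means a failure in any earlier step keeps the DB closed to clients, which seems consistent with the intent of the list.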
##Taking a DB offline
When a DB is taken offline, the following actions must be taken:
- New REST calls on the DB are blocked and an appropriate status code is returned.
- In-progress continuous, longpoll and WebSocket _changes feeds against the DB are terminated and an appropriate status code is returned.
- (OPTIONAL) When a DB is being taken offline, all open requests against the DB should be drained and an appropriate error code returned to the client. The current architecture may not easily support this; the decision to implement it will depend on the estimated level of effort versus the added value to clients.

Internally the DB maintains its connection to the underlying bucket; the connection is cycled when the DB is brought back online. This allows some DB admin operations to be supported in the offline state.
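
Blocking new REST calls while a DB is offline could be done with a small HTTP wrapper, sketched below. `OfflineGate` and the helper functions are hypothetical, and the choice of 503 Service Unavailable is an assumption; the notes only ask for "an appropriate status code". Terminating in-progress _changes feeds and draining open requests are not covered by this sketch.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// OfflineGate is a hypothetical per-database switch checked on every new request.
type OfflineGate struct {
	offline atomic.Bool
}

// SetOffline flips the gate; it is safe to call from any goroutine.
func (g *OfflineGate) SetOffline(v bool) { g.offline.Store(v) }

// Wrap rejects new requests while the DB is offline.
// 503 is an assumed status code, not one specified by the notes.
func (g *OfflineGate) Wrap(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if g.offline.Load() {
			http.Error(w, "database is offline", http.StatusServiceUnavailable)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusOf is a demo helper: it sends one in-memory GET and returns the status code.
func statusOf(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

// demoStatuses shows the gate in use: the same handler is probed once while
// the DB is online and once after it has been taken offline.
func demoStatuses() (online, offline int) {
	gate := &OfflineGate{}
	h := gate.Wrap(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	online = statusOf(h, "/db/_changes")
	gate.SetOffline(true)
	offline = statusOf(h, "/db/_changes")
	return online, offline
}
```

Only the public DB handlers would be wrapped this way; the admin operations that must keep working while the DB is offline would bypass the gate.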
#Operations supported while a DB is offline
While a DB is offline the following administrative operations will still be functional:
- Get DB config via REST API; the response will contain an "offline":true property.
- PUT new config parameters for the DB
- Run a _resync for an updated sync function

A DB should be offline when calling _resync via the REST API; if the DB is online, an error status code should be returned to the caller. The main steps of the _resync function are:
- Disconnect from the underlying bucket
- Reload DB configuration to pick up new sync_function
- Reconnect to underlying bucket
- Resync all documents in the DB

During a _resync operation all other REST operations will remain offline, and an attempt to bring the DB online will return an error status code to the caller.
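
The _resync preconditions can be sketched as a guard around the resync work. All names here (`DB`, `ErrNotOffline`, `ErrResyncing`, the `resyncAll` callback) are hypothetical; the callback stands in for the disconnect, reload, reconnect and reprocess steps listed above.

```go
package main

import (
	"errors"
	"sync"
)

// Hypothetical errors; a REST layer would map them to error status codes.
var (
	ErrNotOffline = errors.New("_resync requires the database to be offline")
	ErrResyncing  = errors.New("database is resyncing")
)

type dbState int

const (
	stateOnline dbState = iota
	stateOffline
	stateResyncing
)

// DB is a hypothetical stand-in for a database context. Its zero value is online.
type DB struct {
	mu    sync.Mutex
	state dbState
}

// TakeOffline marks the DB offline (the actual offline work is omitted here).
func (db *DB) TakeOffline() {
	db.mu.Lock()
	defer db.mu.Unlock()
	if db.state == stateOnline {
		db.state = stateOffline
	}
}

// BringOnline fails while a resync is running.
func (db *DB) BringOnline() error {
	db.mu.Lock()
	defer db.mu.Unlock()
	if db.state == stateResyncing {
		return ErrResyncing
	}
	db.state = stateOnline
	return nil
}

// Resync runs resyncAll only when the DB is offline and returns the number of
// documents it reports as processed. The DB returns to offline afterwards.
func (db *DB) Resync(resyncAll func() (int, error)) (int, error) {
	db.mu.Lock()
	switch db.state {
	case stateOnline:
		db.mu.Unlock()
		return 0, ErrNotOffline
	case stateResyncing:
		db.mu.Unlock()
		return 0, ErrResyncing
	}
	db.state = stateResyncing
	db.mu.Unlock()

	n, err := resyncAll()

	db.mu.Lock()
	db.state = stateOffline
	db.mu.Unlock()
	return n, err
}
```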
##Auto detect TAP feed offline
Currently SG detects when the TAP feed is not available but takes no action. Sync Gateway should differentiate between transient issues and service-impacting issues when losing the TAP feed.
Once Sync Gateway determines that the loss of the TAP feed is impacting service availability, it should automatically take all affected DBs offline, following the standard offline operation semantics described above.
See ticket #722.
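
The notes do not say how a transient feed loss should be told apart from a service-impacting one. One possible approach, shown purely as an assumption, is to count consecutive failed health checks and act only once a threshold is crossed. `FeedMonitor` and `MaxConsecutiveFailures` are hypothetical names.

```go
package main

// FeedMonitor is a hypothetical detector that treats a feed loss as
// non-transient after a number of consecutive failed health checks.
// It is not safe for concurrent use; a real version would need locking.
type FeedMonitor struct {
	// MaxConsecutiveFailures is an assumed tuning knob, not a Sync Gateway setting.
	MaxConsecutiveFailures int

	failures int
	tripped  bool
}

// Report records one health check result. It returns true exactly once per
// outage, at the moment the threshold is reached, which is when the caller
// would take the affected DBs offline.
func (m *FeedMonitor) Report(feedHealthy bool) bool {
	if feedHealthy {
		// Any successful check marks the outage as over and resets the count.
		m.failures = 0
		m.tripped = false
		return false
	}
	m.failures++
	if !m.tripped && m.failures >= m.MaxConsecutiveFailures {
		m.tripped = true
		return true
	}
	return false
}
```

Returning true only once per outage keeps the offline operation from being requested repeatedly, although the idempotency requirement would make repeated requests harmless anyway.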
When creating a DB via the REST API, a caller can pass "offline":true; Sync Gateway will then create the DB but leave it in the offline state.
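
A minimal sketch of how the "offline" property might be read from a create request and echoed back in a config response. Only the "offline" property comes from these notes; the `DBConfig` struct, its "bucket" field and the two helper functions are illustrative assumptions.

```go
package main

import "encoding/json"

// DBConfig is a hypothetical subset of a database config.
// Only the "offline" property is taken from the notes.
type DBConfig struct {
	Bucket  string `json:"bucket,omitempty"`
	Offline bool   `json:"offline,omitempty"`
}

// ParseDBConfig decodes a request body. A missing "offline" property
// decodes to false, so databases start online unless asked otherwise.
func ParseDBConfig(body []byte) (DBConfig, error) {
	var cfg DBConfig
	err := json.Unmarshal(body, &cfg)
	return cfg, err
}

// EncodeDBConfig renders the config as it might appear in a GET response;
// "offline" is included only when the DB is offline.
func EncodeDBConfig(cfg DBConfig) (string, error) {
	out, err := json.Marshal(cfg)
	return string(out), err
}
```

For example, parsing `{"bucket":"todo","offline":true}` yields a config with `Offline` set, and encoding that config again produces the same JSON.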
##Logging
Additional logging should be added for the online/offline operations. These should be logged at level 'WARNING', as these operations can impact service availability.
#Additional Actions
##Documentation
- Update Best Practices Guide
- Update Developer Portal Docs
##New Tickets
- Add a reload config operation to the REST API, to reload all config for the SG instance.
- New bucket-level events should be added to WebHooks; these could be used to trigger changes to external systems such as load balancers.