Timeline offloading persistence #9444
Conversation
5316 tests run: 5102 passed, 0 failed, 214 skipped (full report)
Flaky tests (4): Postgres 17
Code coverage* (full report)
* collected from Rust tests only
The comment gets automatically updated with the latest test results: 5ad1198 at 2024-10-22T21:02:42.725Z
Force-pushed from b630c9c to b9d33fb
The test has some smaller modifications compared to test_timeline_offloading:

* the method by which we determine the offloaded state
* we do manual offloading all the time
* only one offloaded branch
* a test at the end that deletion of a tenant works (so that timeline deletion deletes the manifest as well)
It's unclear to me how we deal with interrupted offload operations. Or more broadly, tenant manifest conflicts, given that we duplicate the tenant manifest across shards.
The RFC says that shard 0 should be authoritative -- is it worth giving it special treatment, such that we ensure shard 0's manifest is uploaded before the other shards? Do we rely on the offloaded state being eventually consistent, and a failed actor being required to retry until that's the case? I'm not familiar enough with the design to enumerate all of the possible hazards here, but you get my drift.
It would be worth spelling all of this out in the manifest's doc comment, along with safety arguments and appropriate test cases.
There is no cross-shard offload operation: each shard can decide on the offload on its own, which it does during compaction. So that's why one shouldn't have a unified tenant manifest either. I don't think it makes much sense to have only some shards offloaded as an intentional state, there is nothing gained by that. But it's also not an erroneous condition, and most features of the pageserver are per-shard, not per-tenant. We'd otherwise need to make offloading coordinated by the storage controller, something we gain little from. However, this made me think about shard splits. We'll probably need to copy over the manifests, or write them from scratch (and what to do with the pre-split manifests? Probably we can delete them).
I see. Making offloading a shard-local operation seems reasonable. I was confused by the below from the PR description, as well as the RFC's reference to an authoritative manifest:
It seems worthwhile to me to explicitly scope the manifest to a shard rather than a tenant, e.g. by referring to it as a tenant shard manifest (or just shard manifest), and point out that it's perfectly valid (if unexpected) for them to diverge across shards. This would prevent confusion and footguns if we later try to use it as tenant-global. But maybe that's just me -- feel free to get others' opinion on this as well.
Yes, good catch, we'll need to propagate these.
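To illustrate the shard-scoped naming being suggested above, here is a minimal sketch of deriving a manifest key per tenant shard. The types and function here (`TenantShardId`, `manifest_key`) are illustrative assumptions, not the pageserver's actual path helpers.

```rust
/// Illustrative sketch only: the real pageserver has its own remote path
/// helpers; the names below are assumptions for the sake of the example.
pub struct TenantShardId {
    pub tenant_id: String,
    pub shard_number: u8,
    pub shard_count: u8,
}

/// Scoping the manifest key to a single shard means manifests of different
/// shards can diverge (e.g. only some shards have offloaded a timeline)
/// without ever overwriting each other.
pub fn manifest_key(shard: &TenantShardId) -> String {
    format!(
        "tenants/{}-{:02x}{:02x}/tenant-manifest.json",
        shard.tenant_id, shard.shard_number, shard.shard_count
    )
}
```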
This reverts commit d4fec5f.
Also, handle dangling references in the tenant loading code
this made me think about shard splits. We'll probably need to copy over the manifests, or write them from scratch
For posterity: this has been added as a follow-up task on #8088.
We support multiple storage backends now, so remove the `_s3_` from the name. Analogous to the names adopted for tenant manifests added in #9444.
Before, we didn't copy over the `index-part.json` of offloaded timelines to the new shard's location, resulting in the new shard not knowing that the timeline even exists. In #9444, we copy over the manifest, but we also need to do this for `index-part.json`.

As the operations are mostly the same for offloaded and non-offloaded timelines, we can iterate over all of them in the same loop, after introducing a `TimelineOrOffloadedArcRef` type to generalize over the two cases. This is analogous to the deletion code added in #8907.

The added test also ensures that the sharded archival config endpoint works, something that had not previously been covered by tests.

Part of #8088
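As a rough illustration of the pattern described above (not the actual neon code; the surrounding types are simplified stand-ins), such a reference type can be a small enum that lets one loop handle both active and offloaded timelines:

```rust
use std::sync::Arc;

// Simplified stand-ins for the real pageserver types; only the pattern matters.
struct Timeline {
    timeline_id: String,
}
struct OffloadedTimeline {
    timeline_id: String,
}

/// Borrowed reference that generalizes over active and offloaded timelines,
/// so shard-split (and deletion) code can iterate over both in a single loop.
enum TimelineOrOffloadedArcRef<'a> {
    Timeline(&'a Arc<Timeline>),
    Offloaded(&'a Arc<OffloadedTimeline>),
}

impl TimelineOrOffloadedArcRef<'_> {
    fn timeline_id(&self) -> &str {
        match self {
            TimelineOrOffloadedArcRef::Timeline(t) => &t.timeline_id,
            TimelineOrOffloadedArcRef::Offloaded(t) => &t.timeline_id,
        }
    }
}
```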
Persist timeline offloaded state to S3.
Right now, as of #8907, all offloaded state is lost at each restart of the pageserver, so we load the full timeline again. As it starts with an empty local directory, we may download some files again, leading to wasteful downloads.
This patch adds support for persisting the offloaded state, allowing us to never load offloaded timelines in the first place. Persistence works via a new tenant-global file in S3 that contains a list of all offloaded timelines. It is updated each time we offload or unoffload a timeline, and is otherwise never touched.
This choice means that tenants where no offloading is happening will not immediately get a manifest, keeping the change very minimal at the start.
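To make the mechanism concrete, here is a minimal sketch of what such a manifest could contain; the struct and field names are assumptions for illustration, not the exact format this PR introduces.

```rust
use serde::{Deserialize, Serialize};

/// Illustrative sketch only: uploaded to S3 whenever a timeline is offloaded
/// or unoffloaded, and read at tenant attach so offloaded timelines are
/// skipped during loading. Field names are assumptions.
#[derive(Serialize, Deserialize)]
struct TenantManifest {
    /// Timelines that are currently offloaded; anything not listed here is
    /// treated as a normal, loadable timeline at startup.
    offloaded_timelines: Vec<OffloadedTimelineEntry>,
}

#[derive(Serialize, Deserialize)]
struct OffloadedTimelineEntry {
    timeline_id: String,
    ancestor_timeline_id: Option<String>,
}
```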
We leave generation support for future work. It is important to support generations: in the worst case, the manifest could be overwritten by an older generation after a timeline has been unoffloaded (and unarchived), so the next pageserver process might wrongly believe that a timeline is still offloaded even though it should be active.
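As a hedged sketch of one way generation support could look (mirroring how other remote files are generation-suffixed; the exact scheme below is an assumption, not what this PR implements):

```rust
/// Hypothetical: suffix the manifest key with the generation number so an
/// older pageserver generation can never clobber a newer manifest in place;
/// readers would pick the entry with the highest generation.
fn manifest_key_with_generation(base: &str, generation: u32) -> String {
    format!("{base}-{generation:08x}")
}
```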
Part of #9386, #8088