Add timeline offload mechanism #8907

arpad-m · 2024-09-03T23:23:36Z

Implements an initial mechanism for offloading of archived timelines.

Offloading is implemented as specified in the RFC.

For now, there is no persistence, so a restart of the pageserver will retrigger downloads until the timeline is offloaded again.

We trigger offloading in the compaction loop because we need the signal for whether compaction is done and everything has been uploaded or not.
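As a rough illustration of why the compaction loop is a natural trigger point, the gating condition can be sketched as below. This is a minimal sketch under assumptions: `CompactionOutcome`, `TimelineStatus`, and `can_offload` are hypothetical names made up for this example and are not the pageserver's actual types or functions.

```rust
/// Hypothetical outcome of one compaction pass (not the pageserver's real type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CompactionOutcome {
    /// Nothing is left to compact.
    Done,
    /// More compaction work remains for a later iteration.
    Pending,
}

/// Hypothetical snapshot of the state the loop would consult for one timeline.
#[derive(Clone, Copy, Debug)]
struct TimelineStatus {
    archived: bool,
    compaction: CompactionOutcome,
    pending_uploads: usize,
}

/// A timeline is only a candidate for offloading once it is archived,
/// compaction has nothing left to do, and every upload has completed.
fn can_offload(status: &TimelineStatus) -> bool {
    status.archived
        && status.compaction == CompactionOutcome::Done
        && status.pending_uploads == 0
}

fn main() {
    let statuses = [
        TimelineStatus { archived: true, compaction: CompactionOutcome::Done, pending_uploads: 0 },
        TimelineStatus { archived: true, compaction: CompactionOutcome::Pending, pending_uploads: 0 },
        TimelineStatus { archived: true, compaction: CompactionOutcome::Done, pending_uploads: 2 },
        TimelineStatus { archived: false, compaction: CompactionOutcome::Done, pending_uploads: 0 },
    ];
    for status in &statuses {
        println!("{:?} -> offload: {}", status, can_offload(status));
    }
}
```

Checking the condition at the end of each compaction iteration means the "compaction done" and "uploads finished" signals are already at hand, which appears to be the motivation stated above.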

Part of #8088

github-actions · 2024-09-04T00:16:49Z

5085 tests run: 4878 passed, 0 failed, 207 skipped (full report)

Flaky tests (2)

Postgres 17

test_pg_regress[None]: debug-x86-64
test_subscriber_restart: release-x86-64

Code coverage* (full report)

functions: 31.3% (7506 of 23952 functions)
lines: 49.5% (60276 of 121841 lines)

* collected from Rust tests only

The comment gets automatically updated with the latest test results: 1665c67 at 2024-10-07T23:50:16.674Z

Moves the per-timeline code to load timeline metadata into a new dedicated function called `load_timeline_metadata`. The old `load_timeline_metadata` becomes `load_timelines_metadata`.

Split out of #8907

Part of #8088
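The shape of that split can be sketched as follows. Only the two function names come from the description; the signatures, the `TimelineMetadata` struct, the `u32` ids, and the string-based store are simplifying assumptions for illustration and do not reflect the real code in `pageserver/src/tenant.rs`.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified metadata record.
#[derive(Debug, Clone, PartialEq, Eq)]
struct TimelineMetadata {
    ancestor: Option<u32>,
}

/// Per-timeline loader: parses the stored metadata of a single timeline.
/// The store maps a timeline id to "" (no ancestor) or a decimal ancestor id.
fn load_timeline_metadata(
    id: u32,
    store: &HashMap<u32, &str>,
) -> Result<TimelineMetadata, String> {
    let raw = store
        .get(&id)
        .ok_or_else(|| format!("timeline {id} has no metadata"))?;
    let ancestor = if raw.is_empty() {
        None
    } else {
        Some(raw.parse::<u32>().map_err(|e| format!("timeline {id}: {e}"))?)
    };
    Ok(TimelineMetadata { ancestor })
}

/// Plural loader: calls the per-timeline function for each id and stops
/// at the first error.
fn load_timelines_metadata(
    ids: &[u32],
    store: &HashMap<u32, &str>,
) -> Result<HashMap<u32, TimelineMetadata>, String> {
    ids.iter()
        .map(|&id| load_timeline_metadata(id, store).map(|meta| (id, meta)))
        .collect()
}

fn main() {
    let mut store = HashMap::new();
    store.insert(1, "");
    store.insert(2, "1");
    println!("{:?}", load_timelines_metadata(&[1, 2], &store));
}
```

Having the single-timeline function available on its own is what allows a caller to load just one timeline later, without going through the bulk path.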

pageserver/src/tenant.rs

arpad-m · 2024-10-03T00:51:10Z

Marking as ready for review, as I can add a test in a later PR (it needs additional functionality to manually offload).

VladLazar

The logic looks good to me. I wonder if we can simplify a bit though.

pageserver/src/tenant.rs

VladLazar

Looks good, but `check_ancestor_of_to_be_unarchived_is_not_archived` vs `check_to_be_unarchived_timeline_has_no_archived_parent` confuses me.

pageserver/src/tenant.rs

VladLazar

LGTM. This isn't used in prod right now so let's merge and iron it out as we go.

Persist timeline offloaded state to S3.

Right now, as of #8907, at each restart of the pageserver, all offloaded state is lost, so we load the full timeline again. As it starts with an empty local directory, we might potentially download some files again, leading to downloads that are ultimately wasteful.

This patch adds support for persisting the offloaded state, allowing us to never load offloaded timelines in the first place. The persistence feature is facilitated via a new file in S3 that is tenant-global, which contains a list of all offloaded timelines. It is updated each time we offload or unoffload a timeline, and otherwise never touched. This choice means that tenants where no offloading is happening will not immediately get a manifest, keeping the change very minimal at the start.

We leave generation support for future work. It is important to support generations, as in the worst case, the manifest might be overwritten by an older generation after a timeline has been unoffloaded (and unarchived), so the next pageserver process instantiation might wrongly believe that some timeline is still offloaded even though it should be active.

Part of #9386, #8088
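The update rule for the tenant-global manifest can be sketched in memory as below. This is a sketch under assumptions: `TenantManifest`, its field, and its methods are hypothetical names for this example, timeline ids are plain strings, and the actual upload to S3 is left out entirely.

```rust
use std::collections::BTreeSet;

/// Hypothetical in-memory form of the tenant-global manifest.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
struct TenantManifest {
    offloaded_timelines: BTreeSet<String>,
}

impl TenantManifest {
    /// Records an offload. Returns true if the manifest changed and
    /// would therefore need to be uploaded again.
    fn offload(&mut self, timeline_id: &str) -> bool {
        self.offloaded_timelines.insert(timeline_id.to_string())
    }

    /// Records an unoffload. Returns true if the manifest changed.
    fn unoffload(&mut self, timeline_id: &str) -> bool {
        self.offloaded_timelines.remove(timeline_id)
    }

    /// At startup, only timelines that are not listed as offloaded
    /// are loaded in full.
    fn timelines_to_load<'a>(&self, all: &[&'a str]) -> Vec<&'a str> {
        all.iter()
            .copied()
            .filter(|id| !self.offloaded_timelines.contains(*id))
            .collect()
    }
}

fn main() {
    let mut manifest = TenantManifest::default();
    // Offloading changes the manifest, so it would be re-uploaded here.
    let changed = manifest.offload("timeline-b");
    println!("manifest changed: {changed}");
    let to_load = manifest.timelines_to_load(&["timeline-a", "timeline-b"]);
    println!("timelines to load at startup: {to_load:?}");
}
```

Because the manifest is written only when `offload` or `unoffload` reports a change, a tenant that never offloads never produces one. The sketch carries no generation number, which is exactly the gap described above: a stale copy written last would win.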

Before, we didn't copy over the `index-part.json` of offloaded timelines to the new shard's location, resulting in the new shard not knowing the timeline even exists. In #9444, we copy over the manifest, but we also need to do this for `index-part.json`.

As the operations to do are mostly the same between offloaded and non-offloaded timelines, we can iterate over all of them in the same loop, after the introduction of a `TimelineOrOffloadedArcRef` type to generalize over the two cases. This is analogous to the deletion code added in #8907.

The added test also ensures that the sharded archival config endpoint works, something that has not yet been ensured by tests.

Part of #8088
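One way such a generalizing type could look is sketched below. Only the name `TimelineOrOffloadedArcRef` is taken from the description; the variants, the stripped-down `Timeline` and `OffloadedTimeline` structs, the `index_part_keys` helper, and the `timelines/<id>/` key layout are assumptions for illustration.

```rust
use std::sync::Arc;

/// Stand-ins for the real types, reduced to the one field this sketch needs.
struct Timeline {
    timeline_id: String,
}

struct OffloadedTimeline {
    timeline_id: String,
}

/// Borrowed view over either kind of timeline (the variants are assumed).
#[derive(Clone, Copy)]
enum TimelineOrOffloadedArcRef<'a> {
    Timeline(&'a Arc<Timeline>),
    Offloaded(&'a Arc<OffloadedTimeline>),
}

impl<'a> TimelineOrOffloadedArcRef<'a> {
    /// The id is available in both cases, so shared code can use it directly.
    fn timeline_id(&self) -> &'a str {
        match *self {
            Self::Timeline(t) => t.timeline_id.as_str(),
            Self::Offloaded(o) => o.timeline_id.as_str(),
        }
    }
}

/// Hypothetical helper: one loop over both kinds of timelines, yielding the
/// object keys that would be copied to the new shard. The key layout is made up.
fn index_part_keys(
    timelines: &[Arc<Timeline>],
    offloaded: &[Arc<OffloadedTimeline>],
) -> Vec<String> {
    timelines
        .iter()
        .map(TimelineOrOffloadedArcRef::Timeline)
        .chain(offloaded.iter().map(TimelineOrOffloadedArcRef::Offloaded))
        .map(|t| format!("timelines/{}/index-part.json", t.timeline_id()))
        .collect()
}

fn main() {
    let timelines = vec![Arc::new(Timeline { timeline_id: "a".to_string() })];
    let offloaded = vec![Arc::new(OffloadedTimeline { timeline_id: "b".to_string() })];
    for key in index_part_keys(&timelines, &offloaded) {
        println!("would copy {key}");
    }
}
```

The enum borrows the `Arc`s rather than cloning them, so iterating over both collections in one loop adds no reference-count traffic.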

arpad-m requested a review from koivunej September 3, 2024 23:23

arpad-m mentioned this pull request Sep 3, 2024

Epic: Pageserver Timeline Archival #8088

Open

arpad-m mentioned this pull request Sep 20, 2024

Move load_timeline_metadata into separate function #9080

Merged

arpad-m force-pushed the arpad/timeline_offload branch from d76a83b to 25461cc Compare September 20, 2024 14:07

arpad-m force-pushed the arpad/timeline_offload branch 2 times, most recently from c363e87 to da16915 Compare September 24, 2024 11:35

arpad-m added 6 commits September 30, 2024 22:51

Add timeline offload mechanism

d99b33e

Add persistence for offloading in memory

fc331f0

Implement unoffloading

ae2abae

fmt

386ab88

Put OffloadedTimeline into Arc

1b4212d

Implement offloaded timeline deletion

21139c8

arpad-m force-pushed the arpad/timeline_offload branch from 24ea738 to 21139c8 Compare September 30, 2024 20:51

clippy

8505c82

jcsp reviewed Oct 2, 2024

pageserver/src/tenant.rs

jcsp reviewed Oct 2, 2024

pageserver/src/tenant.rs (outdated)

jcsp reviewed Oct 2, 2024

pageserver/src/tenant.rs (outdated)

jcsp reviewed Oct 2, 2024

pageserver/src/tenant.rs (outdated)

arpad-m added 2 commits October 3, 2024 02:44

Review comments

83840af

new_state

1990335

arpad-m marked this pull request as ready for review October 3, 2024 00:50

arpad-m requested a review from a team as a code owner October 3, 2024 00:50

arpad-m requested a review from jcsp October 3, 2024 00:51

VladLazar reviewed Oct 4, 2024

pageserver/src/tenant.rs

pageserver/src/tenant.rs (outdated)

pageserver/src/tenant.rs (outdated)

pageserver/src/tenant.rs (outdated)

arpad-m added 3 commits October 4, 2024 16:23

Remove timeline from offloaded ones if unarchiving

6063deb

Review comments

5480136

Move out some components of apply_timeline_archival_config

38f33a6

Move unoffloading into separate function as well

48ea921

arpad-m requested a review from VladLazar October 4, 2024 15:02

arpad-m mentioned this pull request Oct 5, 2024

Shut down timelines during offload and add offload tests #9289

Merged

VladLazar reviewed Oct 7, 2024

arpad-m added 2 commits October 7, 2024 17:44

typo

dd2300b

fixes

a6aa996

arpad-m requested a review from VladLazar October 7, 2024 15:57

arpad-m added 2 commits October 7, 2024 18:47

rename

31f8bc6

clippy

1665c67

arpad-m mentioned this pull request Oct 8, 2024

Also consider offloaded timelines for obtaining retain_lsn #9308

Merged

VladLazar approved these changes Oct 8, 2024

jcsp approved these changes Oct 8, 2024

arpad-m merged commit e8ae376 into main Oct 8, 2024
79 checks passed

arpad-m deleted the arpad/timeline_offload branch October 8, 2024 23:33

jcsp mentioned this pull request Oct 14, 2024

timeline archival: slimmed down timeline object #8460

Closed

arpad-m mentioned this pull request Oct 17, 2024

Timeline offloading persistence #9444

Merged

arpad-m mentioned this pull request Oct 23, 2024

Support offloaded timelines during shard split #9489

Merged
