
fast imports: initial Importer and Storage changes #9218

Open · wants to merge 163 commits into base `main`
Conversation

@problame problame (Contributor) commented Oct 1, 2024

Context

This PR contains PoC-level changes for a product feature that allows onboarding large databases into Neon without going through the regular data path.

Changes

This internal RFC provides all the context.

In the language of the RFC, this PR covers:

  • the Importer code (fast_import)
  • all the Pageserver changes (mgmt API changes, flow implementation, etc)
  • a basic test for the Pageserver changes

Reviewing

As acknowledged in the RFC, the code added in this PR is not ready for general availability.
Also, the architecture is not to be discussed in this PR, but in the RFC and associated Slack channel instead.

Reviewers of this PR should take that into consideration.
The quality bar to apply during review depends on what area of the code is being reviewed:

  • Importer code (fast_import): practically anything goes
  • Core flow (flow.rs):
    • Malicious input data must be expected and the existing threat models apply.
    • The code must be safe to execute on dedicated Pageserver instances:
      • This means in particular that tenants on other Pageserver instances must not be affected negatively wrt data confidentiality, integrity, or availability.
  • Other code: the usual quality bar
    • Pay special attention to correct use of gate guards, timeline cancellation in all places during shutdown & migration, etc.
    • Consider the broader system impact; if you find potentially problematic interactions with Storage features that were not covered in the RFC, bring that up during the review.

I recommend submitting three separate reviews, for the three high-level areas with different quality bars.

References

(Internal-only)

kelvich and others added 30 commits September 12, 2024 09:59
It runs the command successfully. Doesn't try to attach it to the
pageserver on it yet

    BUILD_TYPE=debug DEFAULT_PG_VERSION=16 poetry run pytest --preserve-database-files test_runner/regress/test_pg_import.py
Doesn't work yet, I think because index_part.json is missing
XXX: untested, not sure if it works..
Test passes, yay!
…e-idempotency' into problame/fast-import

Checked that cloud.git e2e test still passes
cloud.git commit 0d16ecc1f84bf8cceba7369ea436fe6f34c49430
problame added a commit that referenced this pull request Oct 25, 2024
…#9366)

# Problem

Timeline creation can either be bootstrap or branch.
The distinction is made based on whether the `ancestor_*` fields are
present or not.

In the PGDATA import code
(#9218), I add a third variant
to timeline creation.

# Solution

The above pushed me to refactor the code in Pageserver to distinguish
the different creation requests through enum variants.

There is no externally observable effect from this change.
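The enum-based refactor described above can be sketched roughly as follows. This is an illustrative, self-contained sketch only; the variant names and fields are stand-ins, not the actual Pageserver types:

```rust
// Hypothetical sketch of dispatching timeline creation on enum variants,
// as described above. The real types in the pageserver crate differ.
#[derive(Debug)]
enum CreateTimelineRequest {
    /// No ancestor fields: bootstrap a fresh timeline.
    Bootstrap { pg_version: u32 },
    /// Ancestor fields present: branch off an existing timeline.
    Branch { ancestor_timeline_id: String, ancestor_lsn: Option<u64> },
    /// The third variant added for PGDATA import.
    ImportPgdata { location: String },
}

fn dispatch(req: CreateTimelineRequest) -> &'static str {
    // Each variant maps to its own creation flow; the match makes the
    // three cases explicit instead of inferring them from optional fields.
    match req {
        CreateTimelineRequest::Bootstrap { .. } => "bootstrap",
        CreateTimelineRequest::Branch { .. } => "branch",
        CreateTimelineRequest::ImportPgdata { .. } => "import",
    }
}
```

Compared with checking which optional `ancestor_*` fields are set, the compiler now forces every creation path to handle all three cases.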

On the implementation level, a notable change is that the acquisition of
the `TimelineCreationGuard` happens later than before. This is necessary
so that we have everything in place to construct the
`CreateTimelineIdempotency`. Notably, this moves the acquisition of the
creation guard _after_ the acquisition of the `gc_cs` lock in the case
of branching. This might appear as if we're at risk of holding `gc_cs`
longer than before this PR, but, even before this PR, we were holding
`gc_cs` until after the `wait_completion()` that makes the timeline
creation durable in S3 returns. I don't see any deadlock risk with
reversing the lock acquisition order.

As a drive-by change, I found that the `create_timeline()` function in
`neon_local` is unused, so I removed it.

# Refs

* platform context: #9218
* product context: neondatabase/cloud#17507
* next PR stacked atop this one:
#9501
problame added a commit that referenced this pull request Oct 25, 2024
This PR adds a pageserver mgmt API to scan a layer file for disposable
keys.

It hooks it up to the sharding compaction test, demonstrating that we're
not filtering out all disposable keys.

This is extracted from PGDATA import
(#9218)
where I do the filtering of layer files based on `is_key_disposable`.
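The filtering idea can be sketched as below. This is a toy model only: the predicate and its shard-placement rule are invented for illustration and do not reflect the real `is_key_disposable` logic in the pageserver:

```rust
// Illustrative sketch: scan a set of keys and report those that are
// disposable on a given shard. The placement rule here is a toy
// stand-in, not the pageserver's actual key-to-shard mapping.
fn is_key_disposable(key: u64, shard_number: u32, shard_count: u32) -> bool {
    // Toy rule: a key is disposable on this shard if it is owned by
    // a different shard.
    (key % shard_count as u64) as u32 != shard_number
}

fn scan_disposable_keys(keys: &[u64], shard_number: u32, shard_count: u32) -> Vec<u64> {
    keys.iter()
        .copied()
        .filter(|&k| is_key_disposable(k, shard_number, shard_count))
        .collect()
}
```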
problame added a commit that referenced this pull request Oct 25, 2024
## Problem

`local_fs` doesn't return file sizes, which I need in PGDATA import
(#9218)

## Solution

Include file sizes in the result.
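A minimal sketch of what "listing entries that carry their size" looks like, with invented type names (the real `remote_storage` listing types differ):

```rust
// Illustrative only: a listing entry that includes the object size,
// so callers don't need a second round-trip per object to learn it.
struct ListingObject {
    key: String,
    size: u64,
}

fn total_size(objects: &[ListingObject]) -> u64 {
    objects.iter().map(|o| o.size).sum()
}
```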

I would have liked to add a unit test, and started doing that in #9510 by extending the common object storage tests (`libs/remote_storage/tests/common/tests.rs`) to check for sizes as well.

But it turns out that localfs is not even covered by the common object
storage tests and upon closer inspection, it seems that this area needs
more attention.
=> punt the effort into #9510
problame added a commit that referenced this pull request Oct 25, 2024
# Context

In the PGDATA import code
(#9218) I add a third way to
create timelines, namely, by importing from a copy of a vanilla PGDATA
directory in object storage.

For idempotency, I'm using the PGDATA object storage location
specification, which is stored in the IndexPart for the entire lifespan
of the timeline. When loading the timeline from remote storage, that
value gets stored inside `struct Timeline` and timeline creation
compares the creation argument with that value to determine idempotency
of the request.

# Changes

This PR refactors the existing idempotency handling of Timeline
bootstrap and branching such that we simply compare the
`CreateTimelineIdempotency` struct, using the derive-generated
`PartialEq` implementation.

Also, spelling idempotency out in the type names adds a lot of clarity.

A pathway to idempotency via a requester-provided idempotency key also becomes very straightforward, if we ever want to do that in the future.
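The comparison-based idempotency check can be sketched like this. The variant names and fields are assumptions for illustration; the real `CreateTimelineIdempotency` in the pageserver has different fields:

```rust
// Sketch of idempotency-by-equality as described above: derive
// PartialEq and compare the stored value against the new request.
#[derive(PartialEq, Debug, Clone)]
enum CreateTimelineIdempotency {
    Bootstrap { pg_version: u32 },
    Branch { ancestor_timeline_id: String, ancestor_start_lsn: u64 },
    ImportPgdata { location: String },
}

/// A retried creation request is treated as idempotent iff its
/// parameters equal what the existing timeline was created with.
fn is_idempotent_retry(
    stored: &CreateTimelineIdempotency,
    request: &CreateTimelineIdempotency,
) -> bool {
    stored == request
}
```

The derive-generated `PartialEq` compares variant and fields structurally, so adding a new creation variant automatically extends the idempotency check without hand-written comparison code.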

# Refs
* platform context: #9218
* product context: neondatabase/cloud#17507
* stacks on top of #9366
@problame problame changed the title from "WIP: import timeline from PGDATA" to "fast imports: initial Importer and Storage changes" on Nov 15, 2024
@problame problame marked this pull request as ready for review November 15, 2024 16:30
@problame problame requested review from a team as code owners November 15, 2024 16:30