
fast imports: initial Importer and Storage changes #9218

Open · wants to merge 163 commits into base `main`
Conversation

@problame problame (Contributor) commented Oct 1, 2024

Context

This PR contains PoC-level changes for a product feature that allows onboarding large databases into Neon without going through the regular data path.

Changes

This internal RFC provides all the context.

In the language of the RFC, this PR covers:

  • the Importer code (fast_import)
  • all the Pageserver changes (mgmt API changes, flow implementation, etc)
  • a basic test for the Pageserver changes

Reviewing

As acknowledged in the RFC, the code added in this PR is not ready for general availability.
Also, the architecture is not to be discussed in this PR, but in the RFC and associated Slack channel instead.

Reviewers of this PR should take that into consideration.
The quality bar to apply during review depends on what area of the code is being reviewed:

  • Importer code (fast_import): practically anything goes
  • Core flow (flow.rs):
    • Malicious input data must be expected and the existing threat models apply.
    • The code must be safe to execute on dedicated Pageserver instances:
      • This means in particular that tenants on other Pageserver instances must not be affected negatively wrt data confidentiality, integrity, or availability.
  • Other code: the usual quality bar
    • Pay special attention to correct use of gate guards, timeline cancellation in all places during shutdown & migration, etc.
    • Consider the broader system impact; if you find potentially problematic interactions with Storage features that were not covered in the RFC, bring that up during the review.

I recommend submitting three separate reviews, for the three high-level areas with different quality bars.

References

(Internal-only)

kelvich and others added 30 commits September 12, 2024 09:59
It runs the command successfully. Doesn't try to attach it to the
pageserver on it yet

    BUILD_TYPE=debug DEFAULT_PG_VERSION=16 poetry run pytest --preserve-database-files test_runner/regress/test_pg_import.py
Doesn't work yet, I think because index_part.json is missing
XXX: untested, not sure if it works..
Test passes, yay!
…e-idempotency' into problame/fast-import

Checked that cloud.git e2e test still passes
cloud.git commit 0d16ecc1f84bf8cceba7369ea436fe6f34c49430
problame added a commit that referenced this pull request Oct 25, 2024
…#9366)

# Problem

Timeline creation can either be bootstrap or branch.
The distinction is made based on whether the `ancestor_*` fields are
present or not.

In the PGDATA import code
(#9218), I add a third variant
to timeline creation.

# Solution

The above pushed me to refactor the code in Pageserver to distinguish
the different creation requests through enum variants.

There is no externally observable effect from this change.
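The enum-based refactor described above can be sketched roughly as follows. This is an illustrative, self-contained sketch only; the variant names and fields are stand-ins, not the actual Pageserver types:

```rust
// Hypothetical sketch of dispatching timeline creation on enum variants,
// as described above. The real types in the pageserver crate differ.
#[derive(Debug)]
enum CreateTimelineRequest {
    /// No ancestor fields: bootstrap a fresh timeline.
    Bootstrap { pg_version: u32 },
    /// Ancestor fields present: branch off an existing timeline.
    Branch { ancestor_timeline_id: String, ancestor_lsn: Option<u64> },
    /// The third variant added for PGDATA import.
    ImportPgdata { location: String },
}

fn dispatch(req: CreateTimelineRequest) -> &'static str {
    // Each variant maps to its own creation flow; the match makes the
    // three cases explicit instead of inferring them from optional fields.
    match req {
        CreateTimelineRequest::Bootstrap { .. } => "bootstrap",
        CreateTimelineRequest::Branch { .. } => "branch",
        CreateTimelineRequest::ImportPgdata { .. } => "import",
    }
}
```

Compared with checking which optional `ancestor_*` fields are set, the compiler now forces every creation path to handle all three cases.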

On the implementation level, a notable change is that the acquisition of
the `TimelineCreationGuard` happens later than before. This is necessary
so that we have everything in place to construct the
`CreateTimelineIdempotency`. Notably, this moves the acquisition of the
creation guard _after_ the acquisition of the `gc_cs` lock in the case
of branching. This might appear as if we're at risk of holding `gc_cs`
longer than before this PR, but, even before this PR, we were holding
`gc_cs` until after the `wait_completion()` that makes the timeline
creation durable in S3 returns. I don't see any deadlock risk with
reversing the lock acquisition order.

As a drive-by change, I found that the `create_timeline()` function in
`neon_local` is unused, so I removed it.

# Refs

* platform context: #9218
* product context: neondatabase/cloud#17507
* next PR stacked atop this one:
#9501
problame added a commit that referenced this pull request Oct 25, 2024
This PR adds a pageserver mgmt API to scan a layer file for disposable
keys.

It hooks it up to the sharding compaction test, demonstrating that we're
not filtering out all disposable keys.

This is extracted from PGDATA import
(#9218)
where I do the filtering of layer files based on `is_key_disposable`.
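The filtering idea can be sketched as below. This is a toy model only: the predicate and its shard-placement rule are invented for illustration and do not reflect the real `is_key_disposable` logic in the pageserver:

```rust
// Illustrative sketch: scan a set of keys and report those that are
// disposable on a given shard. The placement rule here is a toy
// stand-in, not the pageserver's actual key-to-shard mapping.
fn is_key_disposable(key: u64, shard_number: u32, shard_count: u32) -> bool {
    // Toy rule: a key is disposable on this shard if it is owned by
    // a different shard.
    (key % shard_count as u64) as u32 != shard_number
}

fn scan_disposable_keys(keys: &[u64], shard_number: u32, shard_count: u32) -> Vec<u64> {
    keys.iter()
        .copied()
        .filter(|&k| is_key_disposable(k, shard_number, shard_count))
        .collect()
}
```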
problame added a commit that referenced this pull request Oct 25, 2024
## Problem

`local_fs` doesn't return file sizes, which I need in PGDATA import
(#9218)

## Solution

Include file sizes in the result.
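A minimal sketch of what "listing entries that carry their size" looks like, with invented type names (the real `remote_storage` listing types differ):

```rust
// Illustrative only: a listing entry that includes the object size,
// so callers don't need a second round-trip per object to learn it.
struct ListingObject {
    key: String,
    size: u64,
}

fn total_size(objects: &[ListingObject]) -> u64 {
    objects.iter().map(|o| o.size).sum()
}
```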

I would have liked to add a unit test, and started doing that in #9510 by extending the common object storage tests (`libs/remote_storage/tests/common/tests.rs`) to check for sizes as well.

But it turns out that localfs is not even covered by the common object
storage tests and upon closer inspection, it seems that this area needs
more attention.
=> punt the effort into #9510
problame added a commit that referenced this pull request Oct 25, 2024
# Context

In the PGDATA import code
(#9218) I add a third way to
create timelines, namely, by importing from a copy of a vanilla PGDATA
directory in object storage.

For idempotency, I'm using the PGDATA object storage location
specification, which is stored in the IndexPart for the entire lifespan
of the timeline. When loading the timeline from remote storage, that
value gets stored inside `struct Timeline` and timeline creation
compares the creation argument with that value to determine idempotency
of the request.

# Changes

This PR refactors the existing idempotency handling of Timeline
bootstrap and branching such that we simply compare the
`CreateTimelineIdempotency` struct, using the derive-generated
`PartialEq` implementation.

Also, spelling idempotency out in the type names adds a lot of clarity.

A pathway to idempotency via a requester-provided idempotency key also becomes very straightforward, if we ever want to do that in the future.
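The comparison-based idempotency check can be sketched like this. The variant names and fields are assumptions for illustration; the real `CreateTimelineIdempotency` in the pageserver has different fields:

```rust
// Sketch of idempotency-by-equality as described above: derive
// PartialEq and compare the stored value against the new request.
#[derive(PartialEq, Debug, Clone)]
enum CreateTimelineIdempotency {
    Bootstrap { pg_version: u32 },
    Branch { ancestor_timeline_id: String, ancestor_start_lsn: u64 },
    ImportPgdata { location: String },
}

/// A retried creation request is treated as idempotent iff its
/// parameters equal what the existing timeline was created with.
fn is_idempotent_retry(
    stored: &CreateTimelineIdempotency,
    request: &CreateTimelineIdempotency,
) -> bool {
    stored == request
}
```

The derive-generated `PartialEq` compares variant and fields structurally, so adding a new creation variant automatically extends the idempotency check without hand-written comparison code.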

# Refs
* platform context: #9218
* product context: neondatabase/cloud#17507
* stacks on top of #9366
@problame problame changed the title from "WIP: import timeline from PGDATA" to "fast imports: initial Importer and Storage changes" on Nov 15, 2024
@problame problame marked this pull request as ready for review November 15, 2024 16:30
@problame problame requested review from a team as code owners November 15, 2024 16:30