fast imports: initial Importer and Storage changes #9218
Open
problame wants to merge 163 commits into main from problame/fast-import
Conversation
It runs the command successfully. Doesn't try to attach it to the pageserver yet. Test invocation: `BUILD_TYPE=debug DEFAULT_PG_VERSION=16 poetry run pytest --preserve-database-files test_runner/regress/test_pg_import.py`
Doesn't work yet, I think because `index_part.json` is missing
XXX: untested, not sure if it works..
…efactor-timeline-create-idempotency
…e-idempotency' into problame/fast-import. Checked that the cloud.git e2e test still passes (cloud.git commit 0d16ecc1f84bf8cceba7369ea436fe6f34c49430).
problame added a commit that referenced this pull request on Oct 25, 2024:
…#9366)

# Problem

Timeline creation can either be bootstrap or branch. The distinction is made based on whether the `ancestor_*` fields are present or not. In the PGDATA import code (#9218), I add a third variant to timeline creation.

# Solution

The above pushed me to refactor the code in Pageserver to distinguish the different creation requests through enum variants. There is no externally observable effect from this change.

On the implementation level, a notable change is that the acquisition of the `TimelineCreationGuard` happens later than before. This is necessary so that we have everything in place to construct the `CreateTimelineIdempotency`. Notably, this moves the acquisition of the creation guard _after_ the acquisition of the `gc_cs` lock in the case of branching. This might appear as if we're at risk of holding `gc_cs` longer than before this PR, but even before this PR we were holding `gc_cs` until after the `wait_completion()` that makes the timeline creation durable in S3 returns. I don't see any deadlock risk with reversing the lock acquisition order.

As a drive-by change, I found that the `create_timeline()` function in `neon_local` is unused, so I removed it.

# Refs

* platform context: #9218
* product context: neondatabase/cloud#17507
* next PR stacked atop this one: #9501
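To make the shape of this refactor concrete, here is a minimal sketch assuming hypothetical simplified types (the names, fields, and variants below are illustrative, not the actual Pageserver definitions): each way of creating a timeline becomes an explicit enum variant instead of being inferred from optional `ancestor_*` fields.

```rust
// Illustrative stand-ins for the real Pageserver types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct TimelineId(u128);
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Lsn(u64);

/// One variant per way of creating a timeline. Before the refactor the
/// distinction was implicit (presence or absence of `ancestor_*` fields);
/// an enum makes the third, import-based variant explicit.
#[derive(Debug, Clone, PartialEq, Eq)]
enum CreateTimelineParams {
    Bootstrap {
        pg_version: u32,
    },
    Branch {
        ancestor_timeline_id: TimelineId,
        ancestor_start_lsn: Option<Lsn>,
    },
    ImportPgdata {
        // Object-storage location of the PGDATA copy to import from.
        location: String,
    },
}

fn describe(params: &CreateTimelineParams) -> &'static str {
    // Each variant gets its own code path in the real code; here we only label it.
    match params {
        CreateTimelineParams::Bootstrap { .. } => "bootstrap a new root timeline",
        CreateTimelineParams::Branch { .. } => "branch off an ancestor timeline",
        CreateTimelineParams::ImportPgdata { .. } => "import from a PGDATA copy in object storage",
    }
}

fn main() {
    let req = CreateTimelineParams::Branch {
        ancestor_timeline_id: TimelineId(42),
        ancestor_start_lsn: Some(Lsn(0x1000)),
    };
    println!("{}", describe(&req));
}
```

Matching on the variant makes the import-based creation path a first-class case rather than yet another combination of optional fields.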
problame added a commit that referenced this pull request on Oct 25, 2024:
This PR adds a pageserver mgmt API to scan a layer file for disposable keys. It hooks it up to the sharding compaction test, demonstrating that we're not filtering out all disposable keys. This is extracted from PGDATA import (#9218) where I do the filtering of layer files based on `is_key_disposable`.
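As a rough illustration of the filtering idea only (the key type, shard mapping, and function names below are placeholders, not the Pageserver API), a scan over a layer's keys can count how many keys a given shard could dispose of:

```rust
/// Placeholder key type; the real Pageserver key is more structured.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Key(u64);

/// Placeholder shard identity; decides which keys this shard is responsible for.
struct ShardIdentity {
    shard_number: u32,
    shard_count: u32,
}

impl ShardIdentity {
    /// A key is "disposable" for this shard if the shard does not own it and could
    /// therefore drop it during compaction. The modulo here is a stand-in for the
    /// real key-to-shard mapping.
    fn is_key_disposable(&self, key: &Key) -> bool {
        (key.0 % self.shard_count as u64) != self.shard_number as u64
    }
}

/// Scan the keys of a layer and report how many a given shard could dispose of.
fn scan_disposable_keys(shard: &ShardIdentity, layer_keys: &[Key]) -> (usize, usize) {
    let disposable = layer_keys
        .iter()
        .copied()
        .filter(|k| shard.is_key_disposable(k))
        .count();
    (disposable, layer_keys.len() - disposable)
}

fn main() {
    let shard = ShardIdentity { shard_number: 0, shard_count: 4 };
    let keys: Vec<Key> = (0u64..16).map(Key).collect();
    let (disposable, kept) = scan_disposable_keys(&shard, &keys);
    println!("disposable: {disposable}, kept: {kept}");
}
```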
problame added a commit that referenced this pull request on Oct 25, 2024:
## Problem

`local_fs` doesn't return file sizes, which I need in PGDATA import (#9218).

## Solution

Include file sizes in the result.

I would have liked to add a unit test, and started doing that in #9510 by extending the common object storage tests (`libs/remote_storage/tests/common/tests.rs`) to check for sizes as well. But it turns out that localfs is not even covered by the common object storage tests, and upon closer inspection it seems that this area needs more attention. => Punt the effort into #9510.
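For illustration, here is a minimal sketch of the general idea under assumed, simplified types (`ListingObject` and `list_with_sizes` are hypothetical, not the `remote_storage` API): a local-filesystem backend can fill in each object's size from file metadata while listing.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Simplified listing entry: path plus size in bytes.
/// (The real remote_storage listing type is richer; this is just an illustration.)
#[derive(Debug)]
struct ListingObject {
    key: PathBuf,
    size_bytes: u64,
}

/// List files directly under `dir`, including their sizes from filesystem metadata.
fn list_with_sizes(dir: &Path) -> io::Result<Vec<ListingObject>> {
    let mut out = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let metadata = entry.metadata()?;
        if metadata.is_file() {
            out.push(ListingObject {
                key: entry.path(),
                size_bytes: metadata.len(),
            });
        }
    }
    Ok(out)
}

fn main() -> io::Result<()> {
    for obj in list_with_sizes(Path::new("."))? {
        println!("{} {} bytes", obj.key.display(), obj.size_bytes);
    }
    Ok(())
}
```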
…e/neon into problame/fast-import
…73fb4d4c4281b22753b29a33
problame added a commit that referenced this pull request on Oct 25, 2024:
# Context

In the PGDATA import code (#9218) I add a third way to create timelines, namely by importing from a copy of a vanilla PGDATA directory in object storage. For idempotency, I'm using the PGDATA object storage location specification, which is stored in the IndexPart for the entire lifespan of the timeline. When loading the timeline from remote storage, that value gets stored inside `struct Timeline`, and timeline creation compares the creation argument with that value to determine idempotency of the request.

# Changes

This PR refactors the existing idempotency handling of Timeline bootstrap and branching such that we simply compare the `CreateTimelineIdempotency` struct, using the derive-generated `PartialEq` implementation. Spelling idempotency out in the type names also adds a lot of clarity. And the pathway to idempotency via a requester-provided idempotency key becomes very straightforward, should we ever want that in the future.

# Refs

* platform context: #9218
* product context: neondatabase/cloud#17507
* stacks on top of #9366
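A minimal sketch of that comparison-based idempotency check, again with hypothetical simplified types (the real `CreateTimelineIdempotency` and the surrounding creation code carry more detail): the derive-generated `PartialEq` decides between an idempotent retry and a conflicting request.

```rust
/// Hypothetical, simplified idempotency descriptor persisted with the timeline.
#[derive(Debug, Clone, PartialEq, Eq)]
enum CreateTimelineIdempotency {
    Bootstrap { pg_version: u32 },
    Branch { ancestor: String, start_lsn: u64 },
    ImportPgdata { location: String },
}

enum CreationDecision {
    /// No timeline with this ID exists yet: create it.
    Create,
    /// The existing timeline was created with the same arguments: return it as-is.
    IdempotentRetry,
    /// The existing timeline was created with different arguments: reject the request.
    Conflict,
}

fn decide(
    existing: Option<&CreateTimelineIdempotency>,
    request: &CreateTimelineIdempotency,
) -> CreationDecision {
    match existing {
        None => CreationDecision::Create,
        // The derive-generated PartialEq does all the work here.
        Some(stored) if stored == request => CreationDecision::IdempotentRetry,
        Some(_) => CreationDecision::Conflict,
    }
}

fn main() {
    let stored = CreateTimelineIdempotency::ImportPgdata { location: "s3://bucket/pgdata".into() };
    let retry = stored.clone();
    assert!(matches!(decide(Some(&stored), &retry), CreationDecision::IdempotentRetry));
    let other = CreateTimelineIdempotency::Bootstrap { pg_version: 16 };
    assert!(matches!(decide(Some(&stored), &other), CreationDecision::Conflict));
    println!("idempotency checks passed");
}
```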
On Nov 15, 2024, problame changed the title from "WIP: import timeline from PGDATA" to "fast imports: initial Importer and Storage changes".
Context
This PR contains PoC-level changes for a product feature that allows onboarding large databases into Neon without going through the regular data path.
Changes

This internal RFC provides all the context.

In the language of the RFC, this PR covers:
* the Importer (`fast_import`)

Reviewing
As acknowledged in the RFC, the code added in this PR is not ready for general availability. Also, the architecture is not to be discussed in this PR, but in the RFC and the associated Slack channel. Reviewers of this PR should take that into consideration.

The quality bar to apply during review depends on what area of the code is being reviewed:
* the Importer (`fast_import`): practically anything goes
* the Pageserver changes (`flow.rs`): …

I recommend submitting three separate reviews, one for each of the three high-level areas with their different quality bars.
References
(Internal-only)