Storage & Compute release 2024-09-23 #9095

This is a prerequisite for #8681

@Lzjing-1997

If the primary is started at an LSN within the first page of a 16 MB WAL segment, the "long XLOG page header" at the beginning of the segment was not initialized correctly. That has gone unnoticed because, under normal circumstances, nothing looks at the page header. The WAL that is streamed to the safekeepers starts at the new record's LSN, not at the beginning of the page, so the bogus page header didn't propagate elsewhere, and a primary server doesn't normally read the WAL it has written. Which is good, because the contents of the page would be bogus anyway: it wouldn't contain any of the records before the LSN where the new record is written.

Except that in the following cases a primary does read its own WAL:

1. When there are two-phase transactions in prepared state at checkpoint. The checkpointer reads the two-phase state from the XLOG_XACT_PREPARE record, and writes it to a file in pg_twophase/.
2. Logical decoding reads the WAL starting from the replication slot's restart LSN.

This PR fixes the problem with two-phase transactions. For that, it's sufficient to initialize the page header correctly. The checkpointer only needs to read XLOG_XACT_PREPARE records that were generated after the server startup, so it's still OK that older WAL is missing / bogus.

I have not investigated whether we have a problem with logical decoding, however. Let's deal with that separately.

Special thanks to @Lzjing-1997, who independently found the same bug and opened a PR to fix it, although I did not use that PR.
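For reference, the condition described at the top of this message (a start LSN that falls on the first page of its WAL segment, where the long page header lives) can be sketched roughly as follows. This assumes the 16 MB segment size mentioned above and PostgreSQL's default 8 KiB WAL page size; the constant and function names are illustrative and not taken from the patch.

```rust
/// WAL segment size (16 MB), as stated in the description above.
pub const WAL_SEGMENT_SIZE: u64 = 16 * 1024 * 1024;
/// Assumed WAL page size (PostgreSQL's default XLOG_BLCKSZ of 8 KiB).
pub const XLOG_BLCKSZ: u64 = 8192;

/// LSN of the first byte of the segment that contains `lsn`.
pub fn segment_start(lsn: u64) -> u64 {
    lsn - lsn % WAL_SEGMENT_SIZE
}

/// True if `lsn` lies on the first page of its segment, i.e. the page
/// that begins with the long XLOG page header.
pub fn starts_in_first_page(lsn: u64) -> bool {
    lsn % WAL_SEGMENT_SIZE < XLOG_BLCKSZ
}
```

A start LSN such as `0x0100_0028` is in the affected range, while `0x0100_2000` (the start of the second page of the same segment) is not.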

Moves the per-timeline code to load timeline metadata into a new dedicated function called `load_timeline_metadata`. The old `load_timeline_metadata` becomes `load_timelines_metadata`.

Split out of #8907
Part of #8088

We frequently mess up our submodule references. This adds one safeguard: it checks that the submodule references are only updated "forwards", not to some older commit, or a commit that's not a descendant of the previous one.

As a next step, I'm thinking that we should automate things so that when you merge a PR to the 'neon' repository that updates the submodule references, the REL_*_STABLE_neon branches are automatically updated to match the submodule references. That way, you never need to manually merge PRs in the postgres repository; it's all triggered from commits in the 'neon' repository. But that's not included here.
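To spell out what "forwards" means here: the new commit must have the old commit somewhere in its ancestry. The sketch below shows that check on a toy in-memory commit graph. The graph type and function name are hypothetical and only illustrate the idea; the real safeguard works on the actual git history, and this is not its implementation.

```rust
use std::collections::{HashMap, HashSet};

/// True if `old` is reachable from `new` by following parent links, i.e.
/// moving a reference from `old` to `new` only goes "forwards".
/// `graph` maps a commit id to its parent ids (toy stand-in for git history).
/// An unchanged reference (`old == new`) counts as a forward update.
pub fn is_forward_update<'a>(
    graph: &'a HashMap<&'a str, Vec<&'a str>>,
    old: &str,
    new: &'a str,
) -> bool {
    let mut stack = vec![new];
    let mut seen = HashSet::new();
    while let Some(commit) = stack.pop() {
        if commit == old {
            return true;
        }
        // Skip commits that were already visited (merge commits share ancestors).
        if !seen.insert(commit) {
            continue;
        }
        if let Some(parents) = graph.get(commit) {
            stack.extend(parents.iter().copied());
        }
    }
    false
}
```

Updating to an older commit, or to a commit on an unrelated side branch, both come back `false`.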

We were already building v0.4.0 as an indirect dependency, so this avoids having to build two different versions of it.

To eliminate one version of it from our dependency tree.

    cargo update ciborium iana-time-zone lazy_static schannel uuid
    cargo update [email protected]
    cargo update --precise 2.9.7 ureq

It might be worthwhile to just update all our dependencies at some point, but this is aimed at pruning the dependency tree, to make the build a little faster. That's also why I didn't update ureq to the latest version: that would've added a dependency on yet another version of rustls.

These were not referenced in any of the other Cargo.toml files in the workspace. They were not being built because of that, so there was little harm in having them listed, but let's be tidy.

## Problem

Previously, the storage controller could send compute notifications containing stale pageservers (i.e. the pageserver serving the shard was detached). This happened because detaches did not update the compute hook state.

## Summary of Changes

Update compute hook state on shard detach.

Fixes #8928

Not used in production, but in benchmarks, to demonstrate minimal RTT. (It would be nice to not have to copy the 8 KiB of zeroes, but that would require larger protocol changes.) Found this useful in the investigation of #8952.

#8935)

refs #8184
stacked atop #8934

This PR changes from ignoring the config field to rejecting configs that contain it.

PR neondatabase/infra#1903 removes the field usage from `pageserver.toml`. It rolls into prod earlier than, or in the same release as, this PR.

There is a discrepancy with the spec, which has PUT.

Part of #8002
Close #8920

Legacy compaction (as well as gc-compaction) relies on the GC process to remove unused layer files, but this depends on many factors (i.e., key partition) to ensure data in a dropped table can eventually be removed.

In gc-compaction, we consider the keyspace information when doing the compaction process. If a key is not in the keyspace, we will skip that key and not include it in the final output. However, this is not easy to implement because gc-compaction considers branch points (i.e., retain_lsns) and the retained keyspaces could change across different LSNs. Therefore, for now, we only remove aux v1 keys in the compaction process.

## Summary of changes

* Add `FilterIterator` to filter out keys.
* Integrate `FilterIterator` with gc-compaction.
* Add `collect_gc_compaction_keyspace` for a spec of keyspaces that can be retained during the gc-compaction process.

---------

Signed-off-by: Alex Chi Z <[email protected]>
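As a rough illustration of the filtering idea described above (skip any key that is not in the retained keyspace), here is a minimal iterator adapter. It is a sketch only: `Key`, `KeyspaceFilter` and the range representation are hypothetical simplifications, not the `FilterIterator` added by this PR, and it assumes both the retained ranges and the incoming keys are sorted in ascending order.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a page key; the real key type is richer.
pub type Key = u64;

/// Drops every item whose key falls outside the retained ranges.
/// Assumes `retain` is sorted and non-overlapping, and that the inner
/// iterator yields keys in ascending order.
pub struct KeyspaceFilter<I> {
    inner: I,
    retain: Vec<Range<Key>>,
    pos: usize, // index of the first range that may still match
}

impl<I> KeyspaceFilter<I> {
    pub fn new(inner: I, retain: Vec<Range<Key>>) -> Self {
        Self { inner, retain, pos: 0 }
    }
}

impl<I, V> Iterator for KeyspaceFilter<I>
where
    I: Iterator<Item = (Key, V)>,
{
    type Item = (Key, V);

    fn next(&mut self) -> Option<Self::Item> {
        for (key, value) in self.inner.by_ref() {
            // Advance past ranges that end at or before this key.
            while self.pos < self.retain.len() && self.retain[self.pos].end <= key {
                self.pos += 1;
            }
            match self.retain.get(self.pos) {
                Some(range) if range.start <= key => return Some((key, value)),
                Some(_) => continue, // key falls in a gap between ranges
                None => return None, // past the last retained range
            }
        }
        None
    }
}
```

With retained ranges `0..2` and `6..13`, keys 5 and 20 are filtered out while 1, 7 and 12 pass through.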

It was pretty noisy. It changed from debug to info level in commit 78938d1, but I believe that was not on purpose.

## Problem

An LFC cache entry is a chunk (right now the chunk size is 1 MB). LFC statistics show the number of chunks, but not the number of used pages. The autoscaling team wants to know how sparse the LFC is: https://neondb.slack.com/archives/C04DGM6SMTM/p1726782793595969

It is possible to obtain this from the view with `select count(*) from local_cache`. But that is an expensive operation, enumerating all entries in the LFC under a lock.

## Summary of changes

This PR adds "file_cache_used_pages" to the `neon_lfc_stats` view:

```
select * from neon_lfc_stats;
        lfc_key        | lfc_value
-----------------------+-----------
 file_cache_misses     |   3139029
 file_cache_hits       |   4098394
 file_cache_used       |      1024
 file_cache_writes     |   3173728
 file_cache_size       |      1024
 file_cache_used_pages |     25689
(6 rows)
```

Please note that this PR doesn't change the neon extension API, so there is no need to create a new version of the Neon extension.

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above checklist

Co-authored-by: Konstantin Knizhnik <[email protected]>
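To show how the new counter answers the "how sparse is the LFC" question, the sketch below derives a fill ratio from the two values in the `neon_lfc_stats` sample output. It assumes `file_cache_used` counts 1 MB chunks (as the problem statement suggests) and that pages are PostgreSQL's default 8 KiB; the function and constant names are made up for this example.

```rust
/// Chunk size stated in the description above (1 MB).
pub const CHUNK_SIZE_BYTES: u64 = 1024 * 1024;
/// Assumed page size (PostgreSQL's default BLCKSZ of 8 KiB).
pub const PAGE_SIZE_BYTES: u64 = 8192;
/// Page slots per chunk under the two assumptions above.
pub const PAGES_PER_CHUNK: u64 = CHUNK_SIZE_BYTES / PAGE_SIZE_BYTES;

/// Fraction of page slots in the used chunks that actually hold a page.
/// Returns `None` when no chunks are in use.
pub fn lfc_fill_ratio(used_chunks: u64, used_pages: u64) -> Option<f64> {
    if used_chunks == 0 {
        return None;
    }
    Some(used_pages as f64 / (used_chunks * PAGES_PER_CHUNK) as f64)
}
```

Under those assumptions, the sample values (1024 used chunks, 25689 used pages) give a fill ratio of roughly 0.2, so about a fifth of the page slots in the used chunks are occupied.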

We can't FlushOneBuffer when we're in redo-only mode on PageServer, so make execution of that function conditional on us not running in pageserver walredo mode.

…g names (#9105)

The warnings:

    warning: elided lifetime has a name
        --> pageserver/src/metrics.rs:1386:29
         |
    1382 |     pub(crate) fn start_timer<'c: 'a, 'a>(
         |                                       -- lifetime `'a` declared here
    ...
    1386 |     ) -> Option<impl Drop + '_> {
         |                             ^^ this elided lifetime gets resolved as `'a`
         |
         = note: `#[warn(elided_named_lifetimes)]` on by default

    warning: elided lifetime has a name
        --> pageserver/src/metrics.rs:1537:46
         |
    1534 |     pub(crate) fn start_recording<'c: 'a, 'a>(
         |                                           -- lifetime `'a` declared here
    ...
    1537 |     ) -> BasebackupQueryTimeOngoingRecording<'_, '_> {
         |                                              ^^ this elided lifetime gets resolved as `'a`

    warning: elided lifetime has a name
        --> pageserver/src/metrics.rs:1537:50
         |
    1534 |     pub(crate) fn start_recording<'c: 'a, 'a>(
         |                                           -- lifetime `'a` declared here
    ...
    1537 |     ) -> BasebackupQueryTimeOngoingRecording<'_, '_> {
         |                                                  ^^ this elided lifetime gets resolved as `'a`

    warning: elided lifetime has a name
        --> pageserver/src/tenant.rs:3630:25
         |
    3622 |     async fn prepare_new_timeline<'a>(
         |                                   -- lifetime `'a` declared here
    ...
    3630 |     ) -> anyhow::Result<UninitializedTimeline> {
         |                         ^^^^^^^^^^^^^^^^^^^^^ this elided lifetime gets resolved as `'a`
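For readers unfamiliar with this lint, here is a small self-contained example of the pattern it flags and one way to address it, namely spelling out the named lifetime in the return type instead of `'_`. The `Metrics` and `TimerGuard` types are hypothetical and only mirror the shape of the first warning; this is not the actual pageserver code or necessarily the exact change made in the PR.

```rust
use std::cell::Cell;

/// Hypothetical metrics holder that counts finished timers.
pub struct Metrics {
    finished: Cell<u32>,
}

/// Guard that records completion when it is dropped.
pub struct TimerGuard<'a> {
    metrics: &'a Metrics,
}

impl Drop for TimerGuard<'_> {
    fn drop(&mut self) {
        self.metrics.finished.set(self.metrics.finished.get() + 1);
    }
}

impl Metrics {
    pub fn new() -> Self {
        Metrics { finished: Cell::new(0) }
    }

    pub fn finished(&self) -> u32 {
        self.finished.get()
    }

    // Flagged form: `-> Option<impl Drop + '_>`, where `'_` silently
    // resolves to the named lifetime `'a` of `&'a self`.
    // Here the return type names `'a` explicitly instead.
    pub fn start_timer<'c: 'a, 'a>(&'a self, _ctx: &'c str) -> Option<impl Drop + 'a> {
        Some(TimerGuard { metrics: self })
    }
}
```

Behaviour is unchanged by the rename: the guard still borrows `Metrics` for `'a` and bumps the counter when it goes out of scope.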

Seems nice to keep all these together. This also provides a nice place for a README file to describe the compute image build process. For now, it briefly describes the contents of the directory, but can be expanded.

Instead of adding them to the VM image late in the build process, when putting together the final VM image, include them in the earlier compute image already. That makes it more convenient to edit the files, and to test them.

Part of #8128, fixes #8872.

## Problem

See #8872.

## Summary of changes

- Retry `list_timeline_blobs` one more time if
  - layer file keys are listed but no index is, or
  - the index failed to download.
- Instrument code with `analyze-tenant` and `analyze-timeline` spans.
- Remove the `initdb_archive` check, since the archive could have been deleted.
- Return with exit code 1 on fatal error if the `--exit-code` parameter is set.

Signed-off-by: Yuchen Liang <[email protected]>

## Problem

We need to be able to run the regression tests against a cloud-based Neon staging instance to prepare the migration to the arm architecture.

## Summary of changes

Some tests were modified to work on the cloud instance (e.g. passwords added, server-side copy changed to client-side, etc.)

---------

Co-authored-by: Alexander Bayandin <[email protected]>

## Problem

Scheduling on tenant creation uses different heuristics compared to the scheduling done during background optimizations. This results in scenarios where shards are created and then immediately migrated by the optimizer.

## Summary of changes

1. Make the scheduler aware of the type of shard it is scheduling (attached vs secondary). We wish to have different heuristics.
2. For attached shards, include the attached shard count from the context in the node score calculation. This brings initial shard scheduling in line with what the optimization passes do.
3. Add a test for (2).

This looks like a bigger change than required, but the refactoring serves as the basis for az-aware shard scheduling, where we also need to make the distinction between attached and secondary shards.

Closes #8969
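The idea in points 1 and 2 can be illustrated with a toy scoring function that takes the shard type into account. Everything here is an assumption made for illustration: the `Node` fields, the tuple-based score and the tie-breaking on node id are invented, and the storage controller's real scoring is more involved.

```rust
/// Kind of shard location being scheduled.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ShardKind {
    Attached,
    Secondary,
}

/// Hypothetical view of a pageserver node as seen by the scheduler.
#[derive(Clone, Debug)]
pub struct Node {
    pub id: u32,
    pub attached_shards: u32,
    pub total_shards: u32,
}

/// Lower is better; tuples compare lexicographically. For attached shards
/// the attached shard count dominates the score, while for secondaries
/// only the total shard count matters (an assumption of this sketch).
pub fn node_score(node: &Node, kind: ShardKind) -> (u32, u32) {
    match kind {
        ShardKind::Attached => (node.attached_shards, node.total_shards),
        ShardKind::Secondary => (0, node.total_shards),
    }
}

/// Pick the node with the lowest score, breaking ties by node id.
pub fn pick_node(nodes: &[Node], kind: ShardKind) -> Option<u32> {
    nodes
        .iter()
        .min_by_key(|n| (node_score(n, kind), n.id))
        .map(|n| n.id)
}
```

With this scoring, the same pair of nodes can yield different choices for an attached and a secondary location, which is the distinction the refactoring makes possible.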

neondatabase/cloud#12018

The PostgreSQL 17 vendor module is now based on postgres/postgres @ d7ec59a63d745ba74fba0e280bbf85dc6d1caa3e, presumably the final code change before the V17 tag.


Commits on Sep 20, 2024

Commits on Sep 21, 2024

Commits on Sep 22, 2024

Commits on Sep 23, 2024

Commits on Sep 24, 2024