Storage & Compute release 2024-09-23 #9095

Merged: 25 commits merged on Sep 24, 2024

Commits on Sep 20, 2024

  1. Bump vendor/postgres to include extension path fix (#9076)

    This is a prerequisite for
    #8681.
    lubennikovaav authored Sep 20, 2024
    Commit f03f7b3

Commits on Sep 21, 2024

  1. Fix init of WAL page header at startup (#8914)

    If the primary is started at an LSN within the first page of a 16 MB WAL
    segment, the "long XLOG page header" at the beginning of the segment was
    not initialized correctly. That went unnoticed, because under normal
    circumstances nothing looks at the page header. The WAL that is streamed
    to the safekeepers starts at the new record's LSN, not at the beginning
    of the page, so the bogus page header didn't propagate elsewhere, and a
    primary server doesn't normally read the WAL it has written. That is
    fortunate, because the contents of the page would be bogus anyway: the
    page wouldn't contain any of the records before the LSN where the new
    record is written.
    
    Except that in the following cases a primary does read its own WAL:
    
    1. When there are two-phase transactions in prepared state at
    checkpoint. The checkpointer reads the two-phase state from the
    XLOG_XACT_PREPARE record, and writes it to a file in pg_twophase/.
    
    2. Logical decoding reads the WAL starting from the replication slot's
    restart LSN.
    
    This PR fixes the problem with two-phase transactions. For that, it's
    sufficient to initialize the page header correctly. The checkpointer
    only needs to read XLOG_XACT_PREPARE records that were generated after
    the server startup, so it's still OK that older WAL is missing or bogus.
    
    I have not investigated whether logical decoding has the same problem,
    however. Let's deal with that separately.
    
    Special thanks to @Lzjing-1997, who independently found the same bug
    and opened a PR to fix it, although I did not use that PR.
    hlinnaka authored Sep 21, 2024
    Commit 9a32aa8
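    To make the rule concrete, here is a small sketch of which header the page at a given startup LSN needs. It is illustrative Rust, not code from this PR, and it assumes the default 8 KiB WAL page size (XLOG_BLCKSZ) and 16 MB segments; `PageHeaderKind`, `page_start` and `header_kind` are hypothetical names.

    ```rust
    /// Assumed defaults: 8 KiB WAL pages (XLOG_BLCKSZ) and 16 MB WAL segments.
    const XLOG_BLCKSZ: u64 = 8192;
    const WAL_SEGMENT_SIZE: u64 = 16 * 1024 * 1024;

    /// Which page header a WAL page starts with (hypothetical helper type).
    #[derive(Debug, PartialEq)]
    enum PageHeaderKind {
        /// First page of a segment: the "long" header, which additionally
        /// records the system identifier, segment size and WAL block size.
        Long,
        /// Every other page: the regular "short" header.
        Short,
    }

    /// LSN at which the WAL page containing `lsn` begins.
    fn page_start(lsn: u64) -> u64 {
        lsn - lsn % XLOG_BLCKSZ
    }

    /// Header kind of the page that a record written at `lsn` lands on.
    fn header_kind(lsn: u64) -> PageHeaderKind {
        if page_start(lsn) % WAL_SEGMENT_SIZE == 0 {
            PageHeaderKind::Long
        } else {
            PageHeaderKind::Short
        }
    }
    ```

    For example, a primary started at LSN 0/1000028 writes its first record on the first page of the second segment, so that page needs the long header, which is the case this fix covers.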
  2. Move load_timeline_metadata into separate function (#9080)

    Moves the per-timeline code to load timeline metadata into a new
    dedicated function called `load_timeline_metadata`. The old
    `load_timeline_metadata` becomes `load_timelines_metadata`.
    
    Split out of #8907
    
    Part of #8088
    arpad-m authored Sep 21, 2024
    Commit a3800dc

Commits on Sep 22, 2024

  1. Check submodule forward progress (#8949)

    We frequently mess up our submodule references. This adds one safeguard:
    it checks that the submodule references are only updated "forwards", not
    to some older commit, or to a commit that's not a descendant of the
    previous one.
    
    As a next step, I'm thinking that we should automate things so that when
    you merge a PR to the 'neon' repository that updates the submodule
    references, the REL_*_STABLE_neon branches are automatically updated to
    match the submodule references. That way, you never need to manually
    merge PRs in the postgres repository, it's all triggered from commits in
    the 'neon' repository. But that's not included here.
    hlinnaka authored Sep 22, 2024
    Commit c9b2ec9
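    At its core, the check is an ancestry test: the new submodule commit must be the old one or a descendant of it. The sketch below runs that test over a hypothetical in-memory commit graph (`is_forward_update` and the parent map are illustrative, not the CI script); in a real checkout, `git merge-base --is-ancestor <old> <new>` answers the same question through its exit status.

    ```rust
    use std::collections::{HashMap, HashSet, VecDeque};

    /// Returns true if `old` is `new` itself or an ancestor of `new`, i.e. moving
    /// a submodule reference from `old` to `new` is "forward" progress.
    /// `parents` maps each commit id to its parent commit ids.
    fn is_forward_update<'a>(
        parents: &HashMap<&'a str, Vec<&'a str>>,
        old: &str,
        new: &'a str,
    ) -> bool {
        // Breadth-first walk from `new` back through its history.
        let mut queue: VecDeque<&'a str> = VecDeque::from([new]);
        let mut seen: HashSet<&'a str> = HashSet::new();
        while let Some(commit) = queue.pop_front() {
            if commit == old {
                return true;
            }
            // Merge commits can make histories reconverge; visit each commit once.
            if !seen.insert(commit) {
                continue;
            }
            if let Some(ps) = parents.get(commit) {
                queue.extend(ps.iter().copied());
            }
        }
        false
    }
    ```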
  2. Update "hostname" crate

    We were already building v0.4.0 as an indirect dependency, so this
    avoids having to build two different versions of it.
    hlinnaka committed Sep 22, 2024
    Commit ecd615a
  3. Update "memoffset" crate

    To eliminate one version of it from our dependency tree.
    hlinnaka committed Sep 22, 2024
    Commit 913af44
  4. Update a few dependencies, removing some indirect dependencies

        cargo update ciborium iana-time-zone lazy_static schannel uuid
        cargo update [email protected]
        cargo update --precise 2.9.7 ureq
    
    It might be worthwhile to just update all our dependencies at some point,
    but this is aimed at pruning the dependency tree to make the build a
    little faster. That's also why I didn't update ureq to the latest
    version: that would've added a dependency on yet another version of
    rustls.
    hlinnaka committed Sep 22, 2024
    Commit 9f65389
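    The goal of these three commits is fewer crates built in more than one version. Below is a minimal sketch of spotting such duplicates from resolved (name, version) pairs, as one might read them from Cargo.lock; `duplicate_crates` is a hypothetical helper, and `cargo tree --duplicates` reports the same information directly.

    ```rust
    use std::collections::{BTreeMap, BTreeSet};

    /// Given resolved (crate name, version) pairs, returns every crate that is
    /// resolved in more than one version, together with those versions.
    fn duplicate_crates<'a>(
        packages: &[(&'a str, &'a str)],
    ) -> BTreeMap<&'a str, BTreeSet<&'a str>> {
        let mut versions: BTreeMap<&'a str, BTreeSet<&'a str>> = BTreeMap::new();
        for &(name, version) in packages {
            versions.entry(name).or_default().insert(version);
        }
        // Keep only crates that would be compiled more than once.
        versions.retain(|_, vs| vs.len() > 1);
        versions
    }
    ```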
  5. Remove unused crates from workspace Cargo.toml

    These were not referenced in any of the other Cargo.toml files in the
    workspace. They were not being built because of that, so there was
    little harm in having them listed, but let's be tidy.
    hlinnaka committed Sep 22, 2024
    Commit e16e827
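    Finding such entries amounts to a set difference between the dependencies declared in the workspace root and those referenced by member crates. A sketch with hypothetical inputs (plain lists of crate names rather than parsed Cargo.toml files); `unreferenced_workspace_deps` is an illustrative name:

    ```rust
    use std::collections::BTreeSet;

    /// Crates declared in the workspace root's dependency table that no member
    /// crate refers to. Such entries are never built, so they can be removed.
    fn unreferenced_workspace_deps<'a>(
        workspace_deps: &[&'a str],
        member_deps: &[Vec<&str>],
    ) -> Vec<&'a str> {
        // Every dependency name mentioned by any member crate.
        let used: BTreeSet<&str> = member_deps.iter().flatten().copied().collect();
        workspace_deps
            .iter()
            .copied()
            .filter(|dep| !used.contains(*dep))
            .collect()
    }
    ```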

Commits on Sep 23, 2024

  1. storcon: update compute hook state on detach (#9045)

    ## Problem
    
    Previously, the storage controller could send compute notifications
    containing stale pageservers (i.e., the pageserver serving the shard had
    been detached). This happened because detaches did not update the
    compute hook state.
    
    ## Summary of Changes
    
    Update compute hook state on shard detach.
    
    Fixes #8928
    VladLazar authored Sep 23, 2024
    Commit 5432155
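    A simplified model of the fix, assuming the hook state is a map from shard to the pageserver serving it (`ComputeHookState`, `ShardId` and `NodeId` here are hypothetical stand-ins, not the storage controller's types): detaching removes the entry, so a later notification can no longer name the detached pageserver.

    ```rust
    use std::collections::HashMap;

    /// Hypothetical stand-ins for the storage controller's identifiers.
    type ShardId = u32;
    type NodeId = u64;

    /// Simplified compute hook state: which pageserver serves each shard.
    #[derive(Default)]
    struct ComputeHookState {
        attached: HashMap<ShardId, NodeId>,
    }

    impl ComputeHookState {
        fn on_attach(&mut self, shard: ShardId, node: NodeId) {
            self.attached.insert(shard, node);
        }

        /// The fix in miniature: forget the shard's pageserver on detach,
        /// so that later notifications cannot advertise it.
        fn on_detach(&mut self, shard: ShardId) {
            self.attached.remove(&shard);
        }

        /// (shard, pageserver) pairs a notification would carry, sorted by shard.
        fn notification(&self) -> Vec<(ShardId, NodeId)> {
            let mut v: Vec<_> = self.attached.iter().map(|(s, n)| (*s, *n)).collect();
            v.sort();
            v
        }
    }
    ```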
  2. walredo: add a ping method (#8952)

    Not used in production, only in benchmarks, to demonstrate the minimal
    round-trip time (RTT). (It would be nice not to have to copy the 8 KiB of
    zeroes, but that would require larger protocol changes.)
    
    Found this useful in the investigation in
    #8952.
    problame authored Sep 23, 2024
    Commit 59b4c2e
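    The measurement itself is a plain round trip: send one zero-filled 8 KiB page and time how long the echo takes. In this sketch a thread and two `std::sync::mpsc` channels stand in for the walredo child process and its pipe protocol; `spawn_echo` and `ping` are hypothetical names.

    ```rust
    use std::sync::mpsc;
    use std::thread;
    use std::time::{Duration, Instant};

    /// Postgres page size; the ping payload is one page of zeroes.
    const PAGE_SIZE: usize = 8192;

    /// Stand-in for the walredo process: echoes every request back unchanged.
    fn spawn_echo() -> (mpsc::Sender<Vec<u8>>, mpsc::Receiver<Vec<u8>>) {
        let (req_tx, req_rx) = mpsc::channel::<Vec<u8>>();
        let (resp_tx, resp_rx) = mpsc::channel::<Vec<u8>>();
        thread::spawn(move || {
            // The loop ends when the request sender is dropped.
            for msg in req_rx {
                if resp_tx.send(msg).is_err() {
                    break;
                }
            }
        });
        (req_tx, resp_rx)
    }

    /// Sends one zero-filled page, waits for it to come back, returns the RTT.
    fn ping(tx: &mpsc::Sender<Vec<u8>>, rx: &mpsc::Receiver<Vec<u8>>) -> Duration {
        let start = Instant::now();
        tx.send(vec![0u8; PAGE_SIZE]).expect("echo thread has exited");
        let reply = rx.recv().expect("echo thread has exited");
        assert_eq!(reply.len(), PAGE_SIZE, "reply should be one full page");
        start.elapsed()
    }
    ```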
  3. compact_level0_phase1: remove final traces of value access mode config (#8935)
    
    refs #8184
    stacked atop #8934
    
    This PR changes from ignoring the config field to rejecting configs that
    contain it.
    
    PR neondatabase/infra#1903 removes the field
    usage from `pageserver.toml`.
    
    That PR rolls into prod before, or in the same release as, this PR.
    problame authored Sep 23, 2024
    Commit 4d5add9
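    The behavioral change (reject rather than ignore) can be sketched over a parsed key/value map. The key name `value_access_mode` is a placeholder, not necessarily the real field name, and `validate_config` is hypothetical. With serde, `#[serde(deny_unknown_fields)]` is a common way to get the same effect for any field a struct no longer declares.

    ```rust
    use std::collections::BTreeMap;

    /// Config keys that used to be accepted (and ignored) but are now rejected.
    /// Placeholder name, for illustration only.
    const REMOVED_KEYS: &[&str] = &["value_access_mode"];

    /// Rejects a config that still contains a removed field instead of
    /// silently ignoring it.
    fn validate_config(config: &BTreeMap<String, String>) -> Result<(), String> {
        for key in config.keys() {
            if REMOVED_KEYS.iter().any(|removed| *removed == key.as_str()) {
                return Err(format!("config field `{key}` is no longer supported"));
            }
        }
        Ok(())
    }
    ```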
  4. change HTTP method to comply with spec (#9100)

    There was a discrepancy with the spec, which specifies PUT.
    nikitakalyanov authored Sep 23, 2024
    Commit f446e08
  5. feat(pageserver): filter keys with gc-compaction (#9004)

    Part of #8002
    
    Close #8920
    
    Legacy compaction (as well as gc-compaction) relies on the GC process to
    remove unused layer files, but that depends on many factors (e.g., key
    partitioning) to ensure data in a dropped table is eventually removed.
    
    In gc-compaction, we consider the keyspace information when doing the
    compaction process. If a key is not in the keyspace, we will skip that
    key and not include it in the final output.
    
    However, this is not easy to implement because gc-compaction considers
    branch points (i.e., retain_lsns) and the retained keyspaces could
    change across different LSNs. Therefore, for now, we only remove aux v1
    keys in the compaction process.
    
    ## Summary of changes
    
    * Add `FilterIterator` to filter out keys.
    * Integrate `FilterIterator` with gc-compaction.
    * Add `collect_gc_compaction_keyspace` to specify the keyspaces that can
    be retained during the gc-compaction process.
    
    ---------
    
    Signed-off-by: Alex Chi Z <[email protected]>
    skyzh authored Sep 23, 2024
    Commit 2969952
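    The filtering step can be pictured as an iterator adapter that drops any entry whose key lies outside the ranges to retain. This is an independent sketch that borrows the PR's `FilterIterator` name; it simplifies keys to `u64` and the keyspace to a list of `Range`s, and it does a linear scan per entry for brevity.

    ```rust
    use std::ops::Range;

    /// Simplified key type; the real pageserver key is a structured type.
    type Key = u64;

    /// Iterator adapter that skips entries whose key is outside `retain`.
    struct FilterIterator<I> {
        inner: I,
        /// Key ranges whose entries should be kept.
        retain: Vec<Range<Key>>,
    }

    impl<I, V> Iterator for FilterIterator<I>
    where
        I: Iterator<Item = (Key, V)>,
    {
        type Item = (Key, V);

        fn next(&mut self) -> Option<Self::Item> {
            // Borrow the ranges separately so `inner` can be advanced mutably.
            let retain = &self.retain;
            // Advance until an entry falls inside a retained range, or input ends.
            self.inner
                .find(|(key, _)| retain.iter().any(|range| range.contains(key)))
        }
    }
    ```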
  6. test: Downgrade info message on removing empty directories (#9093)

    It was pretty noisy. It changed from debug to info level in commit
    78938d1, but I believe that was not on purpose.
    hlinnaka authored Sep 23, 2024
    Commit df39962
  7. Add views for metrics about pageserver requests (#9008)

    The metrics include a histogram of how long we need to wait for a
    GetPage request, the number of reconnects, and the number of requests,
    among other things.
    
    The metrics are not yet exported anywhere, but you can query them
    manually.
    
    Note: This does *not* bump the default version of the 'neon' extension.
    We will do that later, as a separate PR. The reason is that this allows
    us to roll back the compute image smoothly, if necessary. Once the image
    that includes the new extension .so file with the new functions has been
    rolled out, and we're confident that we don't need to roll back the image
    anymore, we can change the default extension version and actually start
    using the new functions and views.
    
    This is what the view looks like:
    
    ```
    postgres=# select * from neon_perf_counters ;
                    metric                 | bucket_le |  value   
    ---------------------------------------+-----------+----------
     getpage_wait_seconds_count            |           |      300
     getpage_wait_seconds_sum              |           | 0.048506
     getpage_wait_seconds_bucket           |     2e-05 |        0
     getpage_wait_seconds_bucket           |     3e-05 |        0
     getpage_wait_seconds_bucket           |     6e-05 |       71
     getpage_wait_seconds_bucket           |    0.0001 |      124
     getpage_wait_seconds_bucket           |    0.0002 |      248
     getpage_wait_seconds_bucket           |    0.0003 |      279
     getpage_wait_seconds_bucket           |    0.0006 |      297
     getpage_wait_seconds_bucket           |     0.001 |      298
     getpage_wait_seconds_bucket           |     0.002 |      298
     getpage_wait_seconds_bucket           |     0.003 |      298
     getpage_wait_seconds_bucket           |     0.006 |      300
     getpage_wait_seconds_bucket           |      0.01 |      300
     getpage_wait_seconds_bucket           |      0.02 |      300
     getpage_wait_seconds_bucket           |      0.03 |      300
     getpage_wait_seconds_bucket           |      0.06 |      300
     getpage_wait_seconds_bucket           |       0.1 |      300
     getpage_wait_seconds_bucket           |       0.2 |      300
     getpage_wait_seconds_bucket           |       0.3 |      300
     getpage_wait_seconds_bucket           |       0.6 |      300
     getpage_wait_seconds_bucket           |         1 |      300
     getpage_wait_seconds_bucket           |         2 |      300
     getpage_wait_seconds_bucket           |         3 |      300
     getpage_wait_seconds_bucket           |         6 |      300
     getpage_wait_seconds_bucket           |        10 |      300
     getpage_wait_seconds_bucket           |        20 |      300
     getpage_wait_seconds_bucket           |        30 |      300
     getpage_wait_seconds_bucket           |        60 |      300
     getpage_wait_seconds_bucket           |       100 |      300
     getpage_wait_seconds_bucket           |  Infinity |      300
     getpage_prefetch_requests_total       |           |       69
     getpage_sync_requests_total           |           |      231
     getpage_prefetch_misses_total         |           |        0
     getpage_prefetch_discards_total       |           |        0
     pageserver_requests_sent_total        |           |      323
     pageserver_requests_disconnects_total |           |        0
     pageserver_send_flushes_total         |           |      323
     file_cache_hits_total                 |           |        0
    (39 rows)
    ```
    hlinnaka authored Sep 23, 2024
    Commit 263dfba
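    The `getpage_wait_seconds_bucket` rows are cumulative, Prometheus-style buckets: each row counts the requests that waited at most `bucket_le` seconds (in the sample, 248 of 300 waited at most 0.2 ms). A sketch of reading them, using a hypothetical `Histogram` struct filled in by hand from rows of the view:

    ```rust
    /// Cumulative histogram as exposed by the view: each bucket holds the number
    /// of observations less than or equal to its `le` bound.
    struct Histogram {
        /// (le, cumulative count), in ascending order of `le`.
        buckets: Vec<(f64, u64)>,
        /// Total number of observations (the `_count` row).
        count: u64,
        /// Sum of all observed values in seconds (the `_sum` row).
        sum: f64,
    }

    impl Histogram {
        /// Average wait per request, in seconds (NaN if there are no observations).
        fn mean(&self) -> f64 {
            self.sum / self.count as f64
        }

        /// Smallest bucket bound covering at least fraction `q` of observations.
        fn upper_bound_for_quantile(&self, q: f64) -> Option<f64> {
            let target = (q * self.count as f64).ceil() as u64;
            self.buckets
                .iter()
                .find(|(_, cumulative)| *cumulative >= target)
                .map(|(le, _)| *le)
        }
    }
    ```

    With the sample numbers, the mean wait is about 0.16 ms (0.048506 s / 300) and the median falls in the bucket ending at 0.0002 s.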
  8. Maintain number of used pages for LFC (#9088)

    ## Problem
    
    An LFC cache entry is a chunk (currently 1 MB). The LFC statistics show
    the number of chunks, but not the number of used pages, and the
    autoscaling team wants to know how sparse the LFC is:
    https://neondb.slack.com/archives/C04DGM6SMTM/p1726782793595969
    It is possible to obtain this with `select count(*) from local_cache`,
    but that is an expensive operation that enumerates all entries in the
    LFC under a lock.
    
    ## Summary of changes
    
    This PR adds "file_cache_used_pages" to the `neon_lfc_stats` view:
    ```
     select * from neon_lfc_stats;
            lfc_key        | lfc_value 
    -----------------------+-----------
     file_cache_misses     |   3139029
     file_cache_hits       |   4098394
     file_cache_used       |      1024
     file_cache_writes     |   3173728
     file_cache_size       |      1024
     file_cache_used_pages |     25689
    (6 rows)
    ```
    
    Note that this PR doesn't change the neon extension API, so there is no
    need to create a new version of the Neon extension.
     
    Co-authored-by: Konstantin Knizhnik <[email protected]>
    knizhnik and Konstantin Knizhnik authored Sep 23, 2024
    Commit 1c5d6e5
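    The new counter turns sparsity into a simple ratio. A sketch, assuming `file_cache_used` counts 1 MB chunks and pages are 8 KiB, so each chunk has 128 page slots; `lfc_fill_ratio` is a hypothetical helper:

    ```rust
    /// Assumed LFC chunk size (1 MB) and Postgres page size (8 KiB).
    const CHUNK_SIZE_BYTES: u64 = 1024 * 1024;
    const PAGE_SIZE_BYTES: u64 = 8192;
    const PAGES_PER_CHUNK: u64 = CHUNK_SIZE_BYTES / PAGE_SIZE_BYTES;

    /// Fraction of page slots in the used chunks that actually hold a page.
    fn lfc_fill_ratio(used_chunks: u64, used_pages: u64) -> f64 {
        if used_chunks == 0 {
            return 0.0;
        }
        used_pages as f64 / (used_chunks * PAGES_PER_CHUNK) as f64
    }
    ```

    With the sample output above, 25689 used pages in 1024 chunks gives 25689 / 131072, i.e. the used chunks are only about 20% full.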
  9. NOAI (#9084)

    We can't FlushOneBuffer when we're in redo-only mode on PageServer, so
    make execution of that function conditional on us not running in
    pageserver walredo mode.
    MMeent authored Sep 23, 2024
    Commit d865881
  10. Fix compiler warnings with nightly rustc about elided lifetimes having names (#9105)
    
    The warnings:
    
        warning: elided lifetime has a name
            --> pageserver/src/metrics.rs:1386:29
             |
        1382 |     pub(crate) fn start_timer<'c: 'a, 'a>(
             |                                       -- lifetime `'a` declared here
        ...
        1386 |     ) -> Option<impl Drop + '_> {
             |                             ^^ this elided lifetime gets resolved as `'a`
             |
             = note: `#[warn(elided_named_lifetimes)]` on by default
    
        warning: elided lifetime has a name
            --> pageserver/src/metrics.rs:1537:46
             |
        1534 |     pub(crate) fn start_recording<'c: 'a, 'a>(
             |                                           -- lifetime `'a` declared here
        ...
        1537 |     ) -> BasebackupQueryTimeOngoingRecording<'_, '_> {
             |                                              ^^ this elided lifetime gets resolved as `'a`
    
        warning: elided lifetime has a name
            --> pageserver/src/metrics.rs:1537:50
             |
        1534 |     pub(crate) fn start_recording<'c: 'a, 'a>(
             |                                           -- lifetime `'a` declared here
        ...
        1537 |     ) -> BasebackupQueryTimeOngoingRecording<'_, '_> {
             |                                                  ^^ this elided lifetime gets resolved as `'a`
    
        warning: elided lifetime has a name
            --> pageserver/src/tenant.rs:3630:25
             |
        3622 |     async fn prepare_new_timeline<'a>(
             |                                   -- lifetime `'a` declared here
        ...
        3630 |     ) -> anyhow::Result<UninitializedTimeline> {
             |                         ^^^^^^^^^^^^^^^^^^^^^ this elided lifetime gets resolved as `'a`
    hlinnaka authored Sep 23, 2024
    Commit e7e6319
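    The lint fires when an elided lifetime in a return type (`'_`, or a hidden one as in `UninitializedTimeline`) silently resolves to a named lifetime. Below is a minimal reproduction with hypothetical `Metric` and `Timer` types, showing one way to silence it by spelling out the named lifetime; the actual changes in the pageserver may differ.

    ```rust
    /// Hypothetical guard type standing in for the metrics timers in the warnings.
    struct Timer<'a> {
        name: &'a str,
    }

    impl Drop for Timer<'_> {
        fn drop(&mut self) {
            // A real timer would record the elapsed time for `self.name` here.
        }
    }

    struct Metric {
        name: String,
    }

    impl Metric {
        // Before: `-> Option<impl Drop + '_>`. Because `&'a self` names the
        // lifetime, nightly warns that `'_` silently resolves to `'a`.
        // After: write `'a` explicitly; the meaning is unchanged.
        fn start_timer<'a>(&'a self) -> Option<impl Drop + 'a> {
            Some(Timer { name: &self.name })
        }
    }
    ```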
  11. Move files related to building compute image into compute/ dir

    Seems nice to keep all these together. This also provides a nice place
    for a README file to describe the compute image build process. For
    now, it briefly describes the contents of the directory, but can be
    expanded.
    hlinnaka committed Sep 23, 2024
    Commit 3a110e4
  12. Move metric exporter and pgbouncer config files

    Instead of adding them to the VM image late in the build process, when
    putting together the final VM image, include them in the earlier
    compute image already. That makes it more convenient to edit the
    files, and to test them.
    hlinnaka committed Sep 23, 2024
    Commit 3ad5672
  13. scrubber: retry when missing index key in the listing (#8873)

    Part of #8128, fixes #8872.
    
    ## Problem
    
    See #8872.
    
    ## Summary of changes
    
    - Retry `list_timeline_blobs` one more time if
      - layer file keys are listed but the index key is not, or
      - the index failed to download.
    - Instrument code with `analyze-tenant` and `analyze-timeline` spans.
    - Remove the `initdb_archive` check, since the archive could have been deleted.
    - Return with exit code 1 on fatal error if the `--exit-code` parameter is set.
    
    Signed-off-by: Yuchen Liang <[email protected]>
    yliang412 authored Sep 23, 2024
    Commit 37aa6fd
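    The retry rule from the first bullet can be sketched as a wrapper around a listing closure. `Listing` and `list_with_retry` are hypothetical simplifications, not the scrubber's actual `list_timeline_blobs` signature.

    ```rust
    /// Result of listing a timeline's objects in remote storage (simplified).
    struct Listing {
        layer_keys: Vec<String>,
        has_index: bool,
    }

    /// Runs `attempt` and retries it once if layer files were listed without an
    /// index, or if fetching the index failed (modelled here as `Err`).
    fn list_with_retry<F>(mut attempt: F) -> Result<Listing, String>
    where
        F: FnMut() -> Result<Listing, String>,
    {
        match attempt() {
            // Layer files but no index key: list once more.
            Ok(listing) if !listing.layer_keys.is_empty() && !listing.has_index => attempt(),
            // Index download failed: also try once more.
            Err(_) => attempt(),
            // Consistent listing: use it as is.
            result => result,
        }
    }
    ```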

Commits on Sep 24, 2024

  1. Add regression tests for a cloud-based Neon instance (#8681)

    ## Problem
    We need to be able to run the regression tests against a cloud-based
    Neon staging instance to prepare for the migration to the Arm architecture.
    
    ## Summary of changes
    Some tests were modified to work on the cloud instance (e.g., added
    passwords, changed server-side copy to client-side, etc.)
    
    ---------
    
    Co-authored-by: Alexander Bayandin <[email protected]>
    a-masterov and bayandin authored Sep 24, 2024
    Commit 91d9476
  2. storcon: improve initial shard scheduling (#9081)

    ## Problem
    
    Scheduling on tenant creation uses different heuristics compared to the
    scheduling done during background optimizations. This results in
    scenarios where shards are created and then immediately migrated by the
    optimizer.
    
    ## Summary of changes
    
    1. Make the scheduler aware of the type of the shard it is scheduling
    (attached vs secondary), since we want different heuristics for each.
    2. For attached shards, include the attached shard count from the
    context in the node score calculation. This brings initial shard
    scheduling in line with what the optimization passes do.
    3. Add a test for (2).
    
    This looks like a bigger change than required, but the refactoring
    serves as the basis for AZ-aware shard scheduling, where we also need to
    make the distinction between attached and secondary shards.
    
    Closes #8969
    VladLazar authored Sep 24, 2024
    Commit 9490360
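    A toy version of the scoring change, assuming each node is summarized by two counts (`NodeStats`, `ShardKind` and `pick_node` are hypothetical, and the real storage controller weighs more factors): for attached shards, the number of this tenant's attached shards already on a node takes priority over overall load, in the spirit of what the optimizer does.

    ```rust
    /// Whether the shard being placed is the attached location or a secondary.
    #[derive(Clone, Copy)]
    enum ShardKind {
        Attached,
        Secondary,
    }

    /// Per-node inputs to the score (hypothetical simplification).
    struct NodeStats {
        id: u64,
        /// Attached shards of the tenant being scheduled already on this node.
        attached_in_context: usize,
        /// All shards on the node, across tenants.
        total_shards: usize,
    }

    /// Picks the node with the lowest score (ties go to the first such node).
    fn pick_node(nodes: &[NodeStats], kind: ShardKind) -> Option<u64> {
        nodes
            .iter()
            .min_by_key(|n| match kind {
                // Attached: spread this tenant's attachments first, then balance load.
                ShardKind::Attached => (n.attached_in_context, n.total_shards),
                // Secondary: only overall load matters in this sketch.
                ShardKind::Secondary => (0, n.total_shards),
            })
            .map(|n| n.id)
    }
    ```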
  3. Commit 2b65a2b
  4. Update PostgreSQL 17 from 17rc1 to 17.0 (#9119)

    The PostgreSQL 17 vendor module is now based on postgres/postgres @
    d7ec59a63d745ba74fba0e280bbf85dc6d1caa3e, presumably the final code
    change before the V17 tag.
    MMeent authored Sep 24, 2024
    Commit fc67f8d