Action has become very slow #543
Comments
I've noticed this also. If you turn on the timestamps or download the log, the following is shown:
So, the slowdown (2:10) is between the last log of setup-ruby and the first log of the next step in your Actions job, as the last two lines of the above log show. What's taking up the 2:10 is something I haven't yet looked at. But some of the logs here and elsewhere do not consistently show the delays...
Sounds to me like the slowness/delays are outside the control of this action? I can see it myself too: 5 days ago it went from 30s to 2min+ (https://github.com/Starkast/wikimum/actions/workflows/ci.yml), but it is not consistent; there was one quick job after a slow job, and both jobs used 54a18e2, which also indicates it is not something in this action that causes the slowness. Let's ping the GitHub people who helped in #494, @Steve-Glass @chkimes, maybe they can look into this.
For those who are experiencing this, does it always appear to happen after restoring from the cache? I reviewed the timestamps, and it's interesting that there is a significant gap between the last output of one step and the first output of the next.
Through sheer luck, I think you have pinged one of the few people who might understand what is going on. I was poking at the cache action a few months ago to fix the cache behavior when the connection to Azure Blob gets interrupted. One of the things I discovered while testing that is that the client code was leaking promises (I was not able to track this down). I ended up working around that in the cache action itself here: https://github.com/actions/cache/blob/704facf57e6136b1bc63b828d79edcd491f0ee84/src/restoreImpl.ts#L102
The root fix is probably to address this as a bugfix in the cache library. A quick mitigation could be to add a process.exit call here: https://github.com/ruby/setup-ruby/blob/master/index.js#L32
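A minimal sketch of that mitigation, assuming a simplified run() entry point rather than the actual index.js; the leaked work is simulated with a timer standing in for a lingering network connection:

```js
// Hedged sketch, not the actual setup-ruby code: exit explicitly once the
// step's work has finished so leaked promises / lingering handles cannot
// delay the end of the job.
async function run() {
  // ... the action's real work would go here ...
  // Simulate a leaked async operation: started, never awaited, and backed by
  // a handle (a timer here, a keep-alive socket in the real case) that would
  // otherwise keep Node alive for ~2 minutes.
  new Promise((resolve) => setTimeout(resolve, 130_000));
}

run()
  .then(() => process.exit(0)) // end the process now, pending handles or not
  .catch((err) => {
    console.error(err);
    process.exit(1);
  });
```

With the explicit process.exit(0) the step returns as soon as run() resolves; without it, Node waits for the simulated handle.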
From a quick look at several jobs, it appears that the issue is saving the cache: all the jobs I looked at took the normal amount of time, but they also had exact cache key hits on the restore, so they weren't saving the cache. HTH...
Ah interesting. I don't know of anything specific that would cause a difference in behavior for save vs. restore; the issue could also exist in either code path, since a leaked promise will block the runtime from exiting regardless of where it's leaked from. But if all instances are showing a cache save, then that's a strong sign. Are there any examples of a save step that doesn't experience the delay? That is, do we have a 100% repro during cache saves?
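As an illustrative aside (not from this thread): one way to check what is holding the process open after the step's own work has resolved, assuming Node 17.3+ for process.getActiveResourcesInfo():

```js
// Illustrative sketch only, not toolkit code: list the handle types that are
// keeping the event loop alive once the step's own work has finished.
async function main() {
  // Stand-in for a leaked operation; in the cache case one would expect
  // network handles from the cache upload rather than a timer.
  new Promise((resolve) => setTimeout(resolve, 130_000));
}

main().then(() => {
  // Prints the handle types still pending, e.g. [ 'Timeout' ] here.
  console.log('still active:', process.getActiveResourcesInfo());
});
```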
I think this is one: https://github.com/Starkast/wikimum/actions/runs/6638880163/job/18036138791#step:4:124
This example is from Oct 25, so it was run before setup-ruby was updated to Node 20 on Oct 26 in v1.159.0.
I see the delay only if the following two conditions are met:
I made this test:
By the way, there is a similar issue in setup-node after upgrading to Node 20, which also occurs only after saving the cache: |
Based on my observations above, I created a repo with a minimal reproducible example: There you can see that saving the cache is slow with |
Thanks! This is a great investigation; let me get the toolkit team involved to look at the cache package and how its behavior might have changed in Node 20.
This is likely due to nodejs/node#47228. The reason it regressed in Node 20 compared to Node 16 is that Node 19 changed the default of keepAlive for the built-in HTTP agents to true. The HTTP client code is wrapped in the toolkit's cache package, and there is a PR that fixes this. I've tested that PR (see Bo98/toolkit-test@2fc3770) and can confirm it fixes the slowness: https://github.com/Bo98/toolkit-test/actions/runs/6758550452/job/18370288942 (compared to without that PR: https://github.com/Bo98/toolkit-test/actions/runs/6758456188/job/18370108410).
The reason you likely saw this is because the restore side of
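For context on the changed default, here is a small illustrative sketch with plain Node agents (an assumption for illustration, not the toolkit's actual client code):

```js
const https = require('https');

// On Node >= 19 this prints true, on Node <= 18 it prints false: the global
// agent now pools and reuses sockets instead of closing them per request.
console.log('globalAgent keepAlive:', https.globalAgent.keepAlive);

// Node 16-style behavior can still be opted into with an explicit agent:
const oneShotAgent = new https.Agent({ keepAlive: false });

// A client that owns a keep-alive agent should dispose of it when finished,
// otherwise idle pooled sockets stay open until their keep-alive timeout.
const poolingAgent = new https.Agent({ keepAlive: true });
poolingAgent.destroy();
oneShotAgent.destroy();
```

This is presumably why disposing of HTTP clients, or exiting explicitly as suggested above, matters more on Node 20 than it did on Node 16.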
Would this solve the keepAlive connections hanging? If so, that sounds like an easy/safe enough fix to merge.
Yes, that helps, thank you all for the investigation! PR in #546.
Slow build in https://github.com/ruby/setup-ruby/actions/runs/6788405695/job/18453344720 (2min 17s)
Fast build in https://github.com/ruby/setup-ruby/actions/runs/6788443622/job/18453477353?pr=546 (5s, as advertised :) )
@MSP-Greg You probably want to explicitly call process.exit too.
One remaining unknown is that builds on #540 seem fast. Anyway, since we worked around the issue, it's all fine now.
No
As with other actions like setup-node, I'm seeing 2-4 minute delays in post-cache actions lately. Apparently this is because of a change in Node behavior: ruby/setup-ruby#543 (comment). The fix, as with other actions, is to explicitly exit so as not to wait for hanging promises.
* Explicitly exit the process to not wait for hanging promises
* transpiled
Ensure the following before filing this issue:
- I verified it reproduces with the latest version with `- uses: ruby/setup-ruby@v1` (see Versioning policy)
- I tried to reproduce the issue locally by following the workflow steps (including all commands done by ruby/setup-ruby, except for Downloading Ruby & Extracting Ruby), and it did not reproduce locally (if it does reproduce locally, it's not a ruby/setup-ruby issue)
Are you running on a GitHub-hosted runner or a self-hosted runner?
GitHub-hosted runner
Link to the failed workflow job (must be a public workflow job, so the necessary information is available)
https://github.com/templatus/templatus-hotwire/actions/runs/6713970783/job/18246455669
Any other notes?
The action has become slow lately: setting up Ruby went from around 10 seconds to over 2 minutes.
Furthermore, there is a strange difference between the duration measured by GitHub and the time indicated by the action itself. See this example: the job itself says "9.52 seconds", but GitHub says "2m 21s".
The code I use is very simple: