Action has become very slow #543
Comments
I've noticed this also. If you turn on the timestamps or download the log, the following is shown:
So, the slowdown (2:10) is between the last log of setup-ruby and the first log of the next step in your Actions job, as the last two lines of the above log show. What's taking up the 2:10 is something I haven't yet looked at. But some of the logs here and elsewhere do not consistently show the delays...
Sounds to me like the slowness/delays are outside the control of this action? I can see it myself too: 5 days ago it went from 30s to 2min+ (https://github.com/Starkast/wikimum/actions/workflows/ci.yml), but it is not consistent; there was one quick job after a slow job, and both jobs used 54a18e2, which also indicates it is not something in this action that causes the slowness. Let's ping the GitHub people who helped in #494, @Steve-Glass @chkimes, maybe they can look into this.
For those who are experiencing this, does it always appear to happen after restoring from the cache? I reviewed the timestamps, and it's interesting that there is a significant gap between the last output of one step and the first output of the next.
Through sheer luck, I think you have pinged one of the few people who might understand what is going on. I was poking at the cache action a few months ago to fix the cache behavior when the connection to Azure Blob gets interrupted. One of the things I discovered while testing that is that the client code was leaking promises (I was not able to track this down). I ended up working around that in the cache action itself here: https://github.com/actions/cache/blob/704facf57e6136b1bc63b828d79edcd491f0ee84/src/restoreImpl.ts#L102
The root fix is probably to address this as a bugfix in the cache library. A quick mitigation could be to add a process.exit call here: https://github.com/ruby/setup-ruby/blob/master/index.js#L32
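A minimal sketch of that mitigation, assuming a simplified run() entry point rather than the actual index.js; the leaked work is simulated with a timer standing in for a lingering network connection:

```js
// Hedged sketch, not the actual setup-ruby code: exit explicitly once the
// step's work has finished so leaked promises / lingering handles cannot
// delay the end of the job.
async function run() {
  // ... the action's real work would go here ...
  // Simulate a leaked async operation: started, never awaited, and backed by
  // a handle (a timer here, a keep-alive socket in the real case) that would
  // otherwise keep Node alive for ~2 minutes.
  new Promise((resolve) => setTimeout(resolve, 130_000));
}

run()
  .then(() => process.exit(0)) // end the process now, pending handles or not
  .catch((err) => {
    console.error(err);
    process.exit(1);
  });
```

With the explicit process.exit(0) the step returns as soon as run() resolves; without it, Node waits for the simulated handle.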
From a quick look at several jobs, it appears that the issue is saving the cache: all the jobs I looked at took the normal amount of time, but they also had exact cache key hits on the restore, so they weren't saving the cache. HTH...
Ah interesting. I don't know of anything specific that would cause a difference in behavior for save vs. restore; the issue could also exist in either code path, since a leaked promise will block the runtime from exiting regardless of where it's leaked from. But if all instances are showing a cache save, then that's a strong sign. Are there any examples of a save step that doesn't experience the delay? That is, do we have a 100% repro during cache saves?
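As an illustrative aside (not from this thread): one way to check what is holding the process open after the step's own work has resolved, assuming Node 17.3+ for process.getActiveResourcesInfo():

```js
// Illustrative sketch only, not toolkit code: list the handle types that are
// keeping the event loop alive once the step's own work has finished.
async function main() {
  // Stand-in for a leaked operation; in the cache case one would expect
  // network handles from the cache upload rather than a timer.
  new Promise((resolve) => setTimeout(resolve, 130_000));
}

main().then(() => {
  // Prints the handle types still pending, e.g. [ 'Timeout' ] here.
  console.log('still active:', process.getActiveResourcesInfo());
});
```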
I think this is one: https://github.com/Starkast/wikimum/actions/runs/6638880163/job/18036138791#step:4:124
This example is from Oct 25, so it was run before setup-ruby was updated to Node 20 on Oct 26 in v1.159.0.
I see the delay only if the following two conditions are met:
I made this test:
By the way, there is a similar issue in setup-node after upgrading to Node 20, which also occurs only after saving the cache: |
Based on my observations above, I created a repo with a minimal reproducible example: There you can see that saving the cache is slow with |
Thanks! This is a great investigation; let me get the toolkit team involved to look at the cache package and how its behavior might have changed in Node 20.
This is likely due to nodejs/node#47228. The reason it regressed in Node 20 compared to Node 16 is that Node 19 changed the default of keepAlive for the built-in HTTP agents to true. The HTTP client code is wrapped in the toolkit's cache package, and there is a PR that fixes this. I've tested that PR (see Bo98/toolkit-test@2fc3770) and can confirm it fixes the slowness: https://github.com/Bo98/toolkit-test/actions/runs/6758550452/job/18370288942 (compared to without that PR: https://github.com/Bo98/toolkit-test/actions/runs/6758456188/job/18370108410).
The reason you likely saw this is because the restore side of
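For context on the changed default, here is a small illustrative sketch with plain Node agents (an assumption for illustration, not the toolkit's actual client code):

```js
const https = require('https');

// On Node >= 19 this prints true, on Node <= 18 it prints false: the global
// agent now pools and reuses sockets instead of closing them per request.
console.log('globalAgent keepAlive:', https.globalAgent.keepAlive);

// Node 16-style behavior can still be opted into with an explicit agent:
const oneShotAgent = new https.Agent({ keepAlive: false });

// A client that owns a keep-alive agent should dispose of it when finished,
// otherwise idle pooled sockets stay open until their keep-alive timeout.
const poolingAgent = new https.Agent({ keepAlive: true });
poolingAgent.destroy();
oneShotAgent.destroy();
```

This is presumably why disposing of HTTP clients, or exiting explicitly as suggested above, matters more on Node 20 than it did on Node 16.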
Would this solve the keepAlive connections hanging? If so, that sounds like an easy/safe enough fix to merge.
Yes, that helps, thank you all for the investigation! PR in #546.
Slow build in https://github.com/ruby/setup-ruby/actions/runs/6788405695/job/18453344720 (2min 17s)
Fast build in https://github.com/ruby/setup-ruby/actions/runs/6788443622/job/18453477353?pr=546 (5s, as advertised :) )
@MSP-Greg You probably want to explicitly call process.exit too.
One remaining unknown is that builds on #540 seem fast. Anyway, since we worked around the issue, it's all fine now.
No
As with other actions like setup-node, I'm seeing 2-4 minute delays in post-cache actions lately. Apparently this is because of a change in Node behavior: ruby/setup-ruby#543 (comment). The fix, as with other actions, is to explicitly exit so as not to wait for hanging promises.
* Explicitly exit the process to not wait for hanging promises
* transpiled
Ensure the following before filing this issue:
- I verified it reproduces with the latest version with `- uses: ruby/setup-ruby@v1` (see Versioning policy)
- I tried to reproduce the issue locally by following the workflow steps (including all commands done by ruby/setup-ruby, except for Downloading Ruby & Extracting Ruby), and it did not reproduce locally (if it does reproduce locally, it's not a ruby/setup-ruby issue)
Are you running on a GitHub-hosted runner or a self-hosted runner?
GitHub-hosted runner
Link to the failed workflow job (must be a public workflow job, so the necessary information is available)
https://github.com/templatus/templatus-hotwire/actions/runs/6713970783/job/18246455669
Any other notes?
The action has become slow lately: setting up Ruby went from around 10 seconds to over 2 minutes.
Furthermore, there is a strange difference between the duration measured by GitHub and the time indicated by the action itself. See this example: the job itself says "9.52 seconds", but GitHub says "2m 21s".
The code I use is very simple: