Investigate CI failures #56
Conversation
Huh. Works like a charm. I'll rerun to see if it's a fluke. |
Seems to have failed again. Hm, so it's not numpy and not related to anything in #55. |
Thanks @isuruf for checking for sumpy influence! I just added a run without cc @kaushikcfd |
Hmm. |
So the first go passed. Now running a second round to see if it's repeatable. |
For perspective, the examples failure is distinct; the full story is at #57. This run did not exhibit the failure I am concerned about here, which is the main Linux pytest failure. I'll rerun again to see if that holds up. |
Did the Linux tests fail after numpy was bumped down? I just remember the examples failing for a while now. Those failures really looked like some sort of out-of-memory issue. Did https://gitlab.tiker.net/inducer/pytential/-/issues/131 ever improve? |
The only way I know to curb this misery going forward is running downstream CI along with upstream projects, in this case loopy, as I propose here: inducer/loopy#220. If that works out, I'll probably apply the same idea to meshmode (inducer/meshmode#113). |
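The downstream-CI idea boils down to a job in the upstream project's workflow that checks out the downstream project and runs its test suite against the upstream checkout. A hypothetical GitHub Actions sketch (job name, paths, and install steps are illustrative, not the actual inducer/loopy#220 configuration):

```yaml
downstream-pytential:
  runs-on: ubuntu-latest
  steps:
  - uses: actions/checkout@v2
  - name: Run downstream tests
    run: |
      # Clone the downstream project and test it against this
      # (upstream) checkout rather than the released version.
      git clone https://github.com/inducer/pytential
      pip install -e .            # this repo's checkout
      cd pytential
      pip install -e .
      python -m pytest test/
```

This way, a breaking upstream change fails CI on the upstream PR, instead of surfacing later as an unexplained downstream failure.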
I don't recall such an instance. I'll bump numpy back up here, to check. But I don't expect it to fail.
I agree, though I don't see (yet?) how the loopy PRs would inflate memory usage in a substantial fashion.
No, didn't. It just wasn't bad enough to be a problem. In addition, there's a similar-looking mystery (illinois-ceesd/mirgecom#212) being chased down in mirgecom. |
It seems to have failed, rerun?
Maybe worth adding a memory pool already in pytest_generate_tests_for_pyopencl_array_context? Although yeah, that would just hide the issue. |
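For context, the effect a memory pool would have is roughly the following: freed blocks are kept in per-size free lists and reused, so repeated allocate/free cycles stop hitting the underlying allocator. This is a minimal pure-Python sketch of the pooling idea, not the actual pyopencl API:

```python
# Minimal illustration of the memory-pool idea (NOT the pyopencl API).
class MemoryPool:
    def __init__(self):
        self.free_lists = {}       # size -> list of reusable buffers
        self.fresh_allocations = 0

    def allocate(self, size):
        bucket = self.free_lists.get(size)
        if bucket:
            return bucket.pop()    # reuse a previously freed buffer
        self.fresh_allocations += 1
        return bytearray(size)     # fall through to the real allocator

    def free(self, buf):
        # Retain the buffer for reuse instead of releasing it.
        self.free_lists.setdefault(len(buf), []).append(buf)

pool = MemoryPool()
for _ in range(100):
    buf = pool.allocate(1024)
    pool.free(buf)
print(pool.fresh_allocations)  # → 1: only the first allocation is fresh
```

Because the pool never returns memory to the allocator, it smooths over allocation churn and fragmentation, which is exactly why it could hide, rather than fix, an out-of-memory problem.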
Wha? 🤯 Sure, I'll rerun, but now I don't know what to believe. Is this something that's brought about by numpy, or the loopy changes, or both? |
Ugh, no. Not a fan of sweeping stuff under the rug. |
Just to add another variable: looking at the CI history, the last scheduled run on Ubuntu 18.04 passed just fine, but then the next ones on Ubuntu 20.04 started failing. Can we pin it to Ubuntu 18.04 to see if that passes reliably? Besides that, no idea, since it seems to pass intermittently. |
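Pinning the runner version is a one-line change in the workflow file. A hedged sketch (the job name here is illustrative, not this repo's actual workflow):

```yaml
jobs:
  main-linux:
    runs-on: ubuntu-18.04   # pinned, instead of ubuntu-latest / ubuntu-20.04
```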
Hmm, so possibly the common theme among all these changes (newer ubuntu, loopy PRs, numpy 1.20) is just that they each ever so slightly increase memory usage... |
Passed this time around, FWIW. |
Alright, I'm now super confused. Reverting the Loopy PRs that we suspected caused problems actually did exactly nothing to help inducer/loopy#220 pass. So that theory is pretty dead in the water to me. |
My next best plan is to go hunt this stupid memory leak. Grrr. |
illinois-ceesd/mirgecom#212 if you'd like to follow the saga. |
Using jemalloc for the CI (#58) seems to help. See illinois-ceesd/mirgecom#212 for more details. Closing here. |
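For reference, swapping in jemalloc on a CI runner typically amounts to installing it and preloading the library so it replaces glibc's malloc for the test process. A hedged sketch; the package name and library path below are the common Debian/Ubuntu ones and are assumptions, not the exact commands from #58:

```shell
# Install jemalloc and preload it so it replaces glibc malloc
# for the pytest run (path is the usual Ubuntu x86_64 location).
sudo apt-get install -y libjemalloc2
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2
python -m pytest test/
```

jemalloc's different arena and decay behavior often reduces fragmentation-driven memory growth, which matches the observation that it made the intermittent failures go away.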
cc @alexfikl