[CI] Tests timing out occasionally #4209
From a look at the top 50 slowest tests, there seems to be some kind of systematic increase in test run times on the failing runners. Note: test_streamplot_2D is > 20 s even on a good day! |
@MDAnalysis/coredevs this is actually a pretty big issue, if anyone has any ideas here that'd be appreciated. |
My initial suspicion is that this might be happening: pytest-dev/pytest#11174 Need to gather a bit more evidence though (which annoyingly means re-running the tests until we can gather more random failures). A potential short term solution is to just stick on pytest-timeout, but it really only shifts the problem from a job not finishing to one failing (so really it counts as a failure both ways and CI remains red). |
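For reference, a minimal sketch of the pytest-timeout stopgap mentioned above; the 200 s limit and the test shown are illustrative, not the project's actual configuration:

```python
# Hedged sketch: cap individual tests with pytest-timeout so a hung test becomes
# a visible failure instead of a CI job that never finishes.
# Requires the pytest-timeout plugin (pip install pytest-timeout).
# A global cap can be set on the command line, e.g.:
#   python -m pytest --timeout=200
# or per test via the marker the plugin provides:
import pytest


@pytest.mark.timeout(200)  # illustrative limit, not the project's actual setting
def test_example_that_might_hang():
    assert 1 + 1 == 2
```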
Unfortunately I've been unable to narrow down where this is happening. From runs on #4197 my current conclusions are:
My current completely random guess is that it's something to do with the latest pytest 7.4.0 release and maybe coverage? (I haven't seen it happen on a runner not trying to use pytest-cov functionality). I've run out of ideas for now and I don't think I have the time to debug any further. So I'm going to propose the following:
|
Sounds good to me @IAlibay, thanks for putting in a great effort trying to figure it out. |
Sounds very sensible. Thanks for all the work! |
* speed up `test_streamplot_2D()` because it has been reported to take ~20 seconds on a regular basis in CI in MDAnalysis gh-4209
* we don't really need to plot the output, which was taking most of the time, and instead we can just check the data structures that MDAnalysis returns (this may be a better test by some definitions anyway...); I suppose we could also spot check a few values in the arrays if we wanted as well
* locally, that single test seems to run in 0.39 s on this branch vs. 4.7 s on `develop`
I'm seeing a consistent timeout (after 2.5 hours) in mdanalysis 2.5.0 on i386 after updating gsd to v3 (see https://ci.debian.net/packages/m/mdanalysis/testing/i386/). The timeout occurs in testsuite/MDAnalysisTests/parallelism/test_multiprocessing.py |
Interesting, I completely forgot that the latest version of GSD had a very strange memory buffer thing going on. We did see tests time out in other places though, so I'm not fully convinced it's the sole issue here, but it is definitely worth investigating. |
Although, @drew-parsons, MDAnalysis 2.5.0 doesn't support GSD 3.0+. The GSD release came after the 2.5.0 release, so we didn't get to retrospectively pin it in our setup.py, but it is reflected in our conda recipe. We're pretty swamped with just the two package indexes (PyPI and conda-forge), so this kind of stuff tends to completely fall off our radar. Is there any way we can better sync up / report on this? |
@drew-parsons, I would very much appreciate it if you didn't have to do that. Ideally the source code which we provide for releases here should match what folks encounter in the wild. If you need us to do a bugfix release then I am more than happy to do that if necessary. Would you maybe be interested in having a quick call to discuss your needs here? I don't fully understand the Debian packaging ecosystem and I would like to make sure we are providing the right things where possible. If this is of interest, you could maybe ping us an email at [email protected], or we can discuss this more on our Discord? (https://discord.gg/fXTSfDJyxE) |
In a sense it's not a huge issue. Only i386 is failing here; other arches are happy. As far as the Debian distribution goes, the issue is in Debian unstable, which is "supposed" to break from time to time. We'll fix it. The reason Debian needed gsd 3 is that 2.7 was failing to build with the latest Python tooling, https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1042136. It was simplest to upgrade gsd to v3, where this was fixed. gsd is being relegated to optional anyway from the perspective of mdanalysis, so temporary disruption on i386 is not a big problem (we can just skip the test if necessary). But a bugfix release would also be welcome. Good to not skip tests if we can run them successfully. |
Worth noting, the i386 test passes now. |
* TST: test_streamplot_2D faster
  * speed up `test_streamplot_2D()` because it has been reported to take ~20 seconds on a regular basis in CI in gh-4209
  * we don't really need to plot the output, which was taking most of the time, and instead we can just check the data structures that MDAnalysis returns (this may be a better test by some definitions anyway...); I suppose we could also spot check a few values in the arrays if we wanted as well
  * locally, that single test seems to run in 0.39 s on this branch vs. 4.7 s on `develop`
* TST: PR 4221 revisions
  * use `pytest.approx` for single value comparisons
  * `u1` and `v1` are now checked more thoroughly for their actual floating point values
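For illustration, a rough sketch of the pattern those commits describe: assert on the arrays and scalars a function returns rather than rendering them with matplotlib. The function, values, and test name below are placeholders, not the actual MDAnalysis test code:

```python
# Placeholder stand-in for the real streamline computation; the point is the
# assertion pattern (pytest.approx for scalars, assert_allclose for arrays),
# not the numbers themselves.
import numpy as np
import pytest


def compute_streamlines():
    u1 = np.linspace(0.0, 1.0, 10)
    v1 = np.linspace(1.0, 2.0, 10)
    average_displacement = 0.5
    return u1, v1, average_displacement


def test_streamlines_data_only():
    u1, v1, average_displacement = compute_streamlines()
    # single scalar comparison with pytest.approx
    assert average_displacement == pytest.approx(0.5)
    # check actual floating-point values in the arrays instead of plotting them
    np.testing.assert_allclose(u1[:3], [0.0, 1.0 / 9.0, 2.0 / 9.0])
    assert u1.shape == v1.shape
```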
This is still an issue, we are still getting a ton of failures with |
# test if they can be created without generating
# the offset simultaneously.
# The tested XTC file is way too short to induce a race scenario but the
# test is included as documentation for the scenario that used to create
# a problem (see PR #3375 and issues #3230, #1988)

@yuxuanzhuang you wrote the test. Do you have some more insights into what might be happening here? Are we using some sort of file-locking that might block everything? I have seen this test come up (always?) when I restarted a timed-out CI runner, so it makes sense to figure this one out and then see if anything else is odd. |
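For context, a rough sketch (not the actual MDAnalysis test) of the scenario that quoted comment documents: several worker processes build a Universe from the same XTC at the same time, so each may try to generate the byte-offset cache concurrently. It assumes the TPR/XTC sample files shipped with MDAnalysisTests:

```python
# Illustrative only: open the same XTC trajectory from several processes at once,
# which is the situation where concurrent offset generation could race.
import multiprocessing

import MDAnalysis as mda
from MDAnalysisTests.datafiles import TPR, XTC


def load(_):
    u = mda.Universe(TPR, XTC)  # may trigger offset generation in each worker
    return u.trajectory.n_frames


if __name__ == "__main__":
    with multiprocessing.Pool(processes=4) as pool:
        print(pool.map(load, range(4)))
```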
I have been trying to battle with it over the past couple of days, but none of my attempts seem to be working. The branch I am currently testing on is the one with #4162 merged, so it has a much higher failure rate :) (yuxuanzhuang#6). Here are the attempts that have failed:
The only solution seems to be disabling all tests that involve multiprocessing (dask, on the other hand, seems fine). I believe the test timeout is related to the multiprocessing pool not being able to terminate when a lot of jobs are present. Given that the failure rate is almost 100% once the parallel analysis code is merged, compared to roughly a 1-in-20 chance otherwise, I would assume that the more multiprocessing pools there are, the higher the chance of failure. The other problem is that it never failed on my local workstation or on the compute cluster nodes, so I need a 30-minute testing cycle to rule out any possible cause on GitHub, which is really frustrating. Moreover, sometimes the timeout appears as a test failure, and sometimes it simply doesn't show up except in the duration (e.g. here). |
@yuxuanzhuang that last test result you point to is a codecov failure, that's a different problem? |
I think the codecov failure is just because I hacked the repository to run the test on my own repo? |
Are you talking about this failed runner? https://github.com/yuxuanzhuang/mdanalysis/actions/runs/6074912996/job/16479999506 |
yes
|
Primarily I now see |
@orbeckst I will try and dig into this one ASAP. |
I believe I've finally identified the culprit behind the timeout: the SIGTERM handler that GSD sets up, which converts SIGTERM into sys.exit(1). I'm not very familiar with Python's signal handling internals. Below is a minimal example that reproduces the hanging issue on my laptop (Mac M3 Pro):

for i in {1..100}; do python test_sig.py; done
...
Exception ignored in: <function _get_module_lock.<locals>.cb at 0x1016ae160>
Traceback (most recent call last):
File "<frozen importlib._bootstrap>", line 451, in cb
File "/Users/scottzhuang/jupyter_ground/sig.py", line 5, in <lambda>
signal.signal(signal.SIGTERM, lambda n, f: sys.exit(1))
^^^^^^^^^^^
SystemExit: 1
|
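The contents of the two scripts aren't quoted above; a plausible reconstruction (an assumption based on the traceback and on the GSD behaviour discussed just below, not the exact files used) could look like this:

```python
# sig.py -- assumption: mimics the handler GSD sets up (see glotzerlab/gsd#255),
# converting SIGTERM into a SystemExit.
import signal
import sys

signal.signal(signal.SIGTERM, lambda n, f: sys.exit(1))
```

```python
# test_sig.py -- assumption: the workers import sig (installing the handler) while a
# multiprocessing.Pool is torn down; Pool.terminate() sends SIGTERM, and if a worker
# is still inside an import the SystemExit raised by the handler matches the
# "_get_module_lock.<locals>.cb" traceback above and can leave the pool hanging.
import multiprocessing


def work(_):
    import sig  # installs the SIGTERM handler inside the worker
    return 0


if __name__ == "__main__":
    with multiprocessing.Pool(processes=8) as pool:
        pool.map_async(work, range(64))
    # leaving the with-block calls Pool.terminate(), which SIGTERMs the workers
```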
FYI: There was some concern about this code raised by @IAlibay in glotzerlab/gsd#255. |
Ahhhh this makes a lot of sense. I had completely forgotten about that one; indeed, the way GSD is doing this seems like it would be prone to issues. Thanks so much for all your work identifying this @yuxuanzhuang! My take is that we should just raise this upstream and remove GSD from the relevant multiprocessing tests; it's not something that can be fixed without a patch upstream. |
Amazing work @yuxuanzhuang! Thanks so much for diving deep on this, much appreciated. |
A few additional thoughts:
Since the second approach could potentially cause unforeseen interference with other packages, I would prefer the first approach if it’s feasible. |
GSD is an optional dependency, so this is something we should be doing as much as possible. |
My suggested approach:
|
Actually I would even recommend we move towards #3 either way; we can always revert the CI inclusion later. |
Really, anything to get the CI to a state where we can get a green pass for normal PRs... Option 3 sounds sensible. If we xfail the GSD tests in standard CI, should we reduce the timeout duration from 200 s to, say, 20 s, so that these failures happen more quickly and don't hold up runners? |
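If that route is taken, a hedged sketch of what it could look like on a single test; the test name is illustrative, and the markers are the standard pytest `xfail` marker and the `timeout` marker from pytest-timeout:

```python
import pytest


# xfail rather than skip so standard CI stays green but we still notice if the hang
# ever disappears; the shorter per-test timeout frees the runner quickly when it does
# hang. Whether a timeout is reported as an xfail or a hard error can depend on the
# pytest-timeout method, so this is a sketch rather than a drop-in fix.
@pytest.mark.timeout(20)
@pytest.mark.xfail(reason="GSD SIGTERM handler can hang multiprocessing pools (gh-4209)")
def test_multiprocessing_gsd_roundtrip():
    ...  # body elided; the real test would exercise a GSD trajectory in a Pool
```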
I assigned @yuxuanzhuang, primarily to indicate who's been doing the expert code sleuthing here ;-) 🔎 |
For some reason some of the tests are every so often timing out. It's been mostly the ubuntu full deps py3.10 gh actions tests, but we've seen it in the macos and azure runners too.
It's rather hard to debug and I'm currently cycling CI with pytest-timeout in #4197