E3SM-IO failed on 1-process run #37
Comments
Yes, I'll do some more testing and release a new version today.
@wkliao, I just released v1.8, please let me know if you find any issue.
I am getting a test program hanging problem when There is no error message. The test was terminated as it ran out of time.
Just realized that the failed test program was not using Log VOL.
Hi @wkliao, I tried the Log VOL group test and other basic tests on Perlmutter, and they all ran successfully with Cache VOL and Async VOL, so I'm not sure what went wrong there. Can you try running the test again? Is there a verbose mode that can print out where it got stuck?
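In case it helps narrow this down without a verbose mode, here is a minimal sketch of a wrapper that runs one test command under a timeout and reports whether it passed, failed, or hung. The helper name `run_with_timeout` and the status strings are hypothetical and not part of E3SM-IO, Log VOL, or Async VOL; the command passed in is whatever test you want to check.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd (a list of strings) and classify the result.

    Returns (status, returncode, output), where status is "ok", "fail",
    or "hang". This is an illustrative helper, not part of any VOL.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep stdout and stderr interleaved
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired as exc:
        # The child is killed on timeout; keep whatever it printed so far.
        out = exc.stdout or ""
        if isinstance(out, bytes):
            out = out.decode(errors="replace")
        print(f"HANG after {timeout_s}s: {' '.join(cmd)}", file=sys.stderr)
        return ("hang", None, out)

    status = "ok" if proc.returncode == 0 else "fail"
    return (status, proc.returncode, proc.stdout)
```

For example, `run_with_timeout(["mpiexec", "-n", "1", "./group"], 120)` would flag a hang after two minutes, where `./group` is only a placeholder for the test binary. The last lines of the captured output give a rough idea of where the program stopped.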
As this failure happened on GitHub Actions, I suggest creating a new workflow in Async VOL that tests group.cpp only. Please use the following software.
You can reuse part of the yaml file. Note that testing group.cpp does not require Log VOL.
I reran the same GitHub workflow and it failed again (hung), this time at a different test program. The test uses the following environment variables. Could you please check whether they are OK?
The HDF5_ASYNC_EXE_* ones are not necessary, but they should be harmless. I'll try setting up the same environment as the GitHub Actions runner and find out what is causing the hang.
I have a new vol-async 1.8.1 release which seems to fix the hang issue; however, there are new errors with
The error message says Cache VOL requires the test programs to call
The hanging problem re-appeared in E3SM-IO.
I think Huihuo has been updating Cache VOL actively, so it is probably better for the E3SM-IO tests to use a release version.
Currently, there are no release versions of Cache VOL. I have made a request, see HDFGroup/vol-cache#22.
@wkliao if you like, you can try the previous v1.2 release: https://github.com/hpc-io/vol-cache/releases/tag/v1.2. I'll push a new release soon.
I see a hang with the F case. Basically, it stops at the H5VLfile_close call.
Hi @zhenghh04
I am using the develop branch of vol-async (73a870d) to test the E3SM-IO benchmark.
One of the tests failed. The failing command runs on 1 MPI process, but
the same command runs fine with 16 processes.
Below are the related env variables.
Here is the run command.
Part of the GDB trace is given below.