-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Crusher Pipeline Status #33
base: develop
Are you sure you want to change the base?
Conversation
PR is in good shape, but for some reason we aren't able to trigger Crusher pipelines through a pipeline trigger at the moment https://code.olcf.ornl.gov/ci/csc359/dev/exago/-/issues/5 |
Seems like we will only be able to run on develop on a schedule, and then for PRs you can manually trigger in the web UI. This also means that spack modules will have to be maintained by hand by @nkoukpaizan and others for the meantime. |
Fun bug here was pre-commit can't auto-fix GitHub workflow pipelines without more permissions. Leaving this as a feature, not a bug for now unless someone objects. |
This is almost ready for review. Thanks @nkoukpaizan for getting the updated modules in place to make this relatively trivial :) |
Sbatch account needs updating, and for some reason the success pipeline is triggering as well as the failure one. The needs relationship between pipelines should also be addressed to run in parallel / without blocking. |
Getting some test failures on Crusher. TODO:
|
20 and 40 are new failures to me. 33 and 34 are already skipped on other testing platforms. This will be re-visited after P6 and JIRA have been closed out, and will likely come in next release unless someone has some time to take a look. cc @jaelynlitz since you were dealing with the latest relevant GitLab issue https://gitlab.pnnl.gov/exasgd/frameworks/exago/-/issues/459 Since we likely need to update HiOp, lets wait on things like #20 and #30 to be merged before re-visiting this IMO |
@abhyshr Since the pipelines are functional, another option that I did not initially consider is just merging and having failing pipelines on Crusher for the moment. The issues may be resolved in the end through hiop updates and other bugfixes, so we can always re-visit and see if there are platform specific bugs... |
.gitlab/ornl/ascent.gitlab-ci.yml
Outdated
@@ -102,6 +102,7 @@ | |||
- rm -rf "$WORKDIR" | |||
|
|||
Ascent Build: | |||
needs: [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am struggling to understand the difference between dependencies and needs in gitlab terminology but other than my lack of knowledge these changes look reasonable. Great work.
PS:
Dependencies just inherit artifacts form previous jobs.
Needs lets you run out of order of staging .
At least that my understanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your understanding is correct. I had to google to understand this. needs: []
just lets a job run ASAP when the workflow starts. We probably only needs dependencies for reporting jobs on PNNL where partition matters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might just cherry-pick this fix into #20
The 4 test failures are either num iters or objective value differences which is to be expected with changing the version of hiop right? I suppose a question for @abhyshr if the differences are within a tolerance we feel okay with allowing under a warning For ref:
|
It is very much possible that the solution changes a bit (objective value and the number of iterations) when there is an update to HiOp (or any of its dependencies). I would check the solution on other platforms and see if you are getting the same test failures as well. In that case, we will need to update the reference/expected solution for the failing tests. |
Moving to 1.6.1 for now |
1af859c
to
4224870
Compare
Closes #9