Simplify environment detection #499
Comments
I know of several users who rely on the default scheduler detection for systems without templates. I think it is a very helpful and important feature for ease of use. There have only been a couple of exceptions where this hasn’t worked, on clusters with multiple (incorrectly configured) schedulers, and those are easy to resolve by defining a custom environment as described above (roughly the same amount of work that would be required for ALL systems without templates if the default scheduler environments were removed). I’m strongly in favor of the status quo here.
Really, I think the only thing we could do better is change the logic a bit so that if more than one scheduler executable is detected, we print a warning that the scheduler type is ambiguous and that we are guessing $scheduler_name.

I also think autodetection is important, and short of some user survey it is hard to guess how many people have had things JustWork ™️, since they don't open up GitHub issues.
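The ambiguity warning suggested in this comment could look roughly like the sketch below. It is not signac-flow code: the `SCHEDULER_COMMANDS` table, the `detect_scheduler` function, and the injectable `which` argument are all hypothetical names chosen for illustration, and the lookup order (and therefore the guess) is an assumption.

```python
import shutil
import warnings

# Hypothetical mapping of scheduler name to its submission executable.
# The order determines which scheduler is guessed when several are found.
SCHEDULER_COMMANDS = {"Slurm": "sbatch", "PBS/Torque": "qsub", "LSF": "bsub"}


def detect_scheduler(which=shutil.which):
    """Return the name of the first scheduler found on PATH, or None.

    Emits a warning when more than one scheduler executable is present,
    since the detection is then only a guess.
    """
    found = [
        name for name, command in SCHEDULER_COMMANDS.items()
        if which(command) is not None
    ]
    if len(found) > 1:
        warnings.warn(
            f"Scheduler type is ambiguous (found {', '.join(found)}); "
            f"guessing {found[0]}."
        )
    return found[0] if found else None
```

Calling `detect_scheduler()` with no arguments checks the real `PATH`; passing a custom `which` callable makes the logic testable on machines without any scheduler installed.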
Interesting, could you share what systems they are using it on? If people are actually making use of this feature I'm fine preserving it, but we're almost guaranteed to get misclassifications on systems without a defined environment unless we start parsing the exception payloads rather than using the simple solution of just returning

Of the machines that we support, almost all of them would fall into the "incorrectly configured" bucket that you mentioned. Great Lakes, Comet, Bridges (not sure about Bridges2), and Stampede2 all have a
Agreed, but we already print the environment which for the base scheduler cases is hopefully obvious (e.g.
Yeah, that's absolutely true. I can only speak from anecdotal experience, which isn't very helpful. I've either helped people from other places set up custom environments, or helped people set up and run a scheduler locally. The latter case definitely benefits from scheduler-only detection.
One other alternative to address this issue would be to make it even easier to configure a custom environment with the correct scheduler and template. I am imagining some kind of assistant like this:
This would probably be much easier to achieve if we allowed the template and environment to be configured via a config file. That is likely preferable to the current approach anyway.
I like that idea; I think it would be nice. I should be able to handle the flow schema work soon (just pending wrapping up some of the administrative hurdles discussed offline), and we could consider changes to support this as part of schema version 2.

Some informal polling (thanks @bdice!) indicated that one other critical use case is the use of a default scheduler environment with a custom template. That case alone is IMO sufficient reason to support scheduler-based detection. Bradley also made the point (offline) that environments don't really matter until the point that users interact with them (e.g. via submission or checking status), which I agree with.

My main concern with #498 is that in my experience it basically means that most machines running a SLURM scheduler will also detect the presence of a PBS scheduler because I think the main action item then is to make sure that all signac-flow code paths that interact with the environment will give a consistent error if they detect that they're using a command for a scheduler that isn't actually present. This seems like something that should be implemented at the level of the
@bdice @mikemhenry @csadorf following up on #498, I think we should consider removing scheduler-based environment detection entirely. @bdice correct me if I've misunderstood the problem, but #498 came up because we don't support Andes, so this check succeeds for the `DefaultLSFEnvironment` and we fall back to scheduler detection. When this logic was originally written, subclassing `Environment` was nontrivial, but now a minimal environment just needs to define the `hostname_pattern` to be detected.

If we're concerned that regexes will scare users off of defining new environments (and I'm not sure that we should be), we could even define a small convenience `DEFAULT_PATTERN=".*"`. This would make defining a new environment as simple as:

Such environments would fall back to the default Slurm template but otherwise be functional. We would also change `ComputeEnvironment.is_present` to always return `False` if `hostname_pattern` is `None`. This change would remove any erroneous environment detection, simplify our code, and remove the need for unnecessary scheduler interaction. Is there any reason this wouldn't work?
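To make the proposal concrete, here is a self-contained sketch of hostname-only detection with the suggested `DEFAULT_PATTERN` convenience. It does not import signac-flow: the `ComputeEnvironment` class below is a stand-in written for illustration, the optional `hostname` argument to `is_present` is an assumption added so the behavior can be checked without depending on the local machine, and `MyClusterEnvironment`, `ExampleClusterEnvironment`, and the `example-cluster.org` domain are hypothetical.

```python
import re
import socket

# Proposed convenience constant from the issue: a pattern matching any hostname.
DEFAULT_PATTERN = ".*"


class ComputeEnvironment:
    """Stand-in base class for illustration (not the signac-flow implementation)."""

    hostname_pattern = None

    @classmethod
    def is_present(cls, hostname=None):
        # Proposed rule: without a hostname pattern, the environment is never detected.
        if cls.hostname_pattern is None:
            return False
        if hostname is None:
            hostname = socket.getfqdn()
        return re.match(cls.hostname_pattern, hostname) is not None


class MyClusterEnvironment(ComputeEnvironment):
    """Hypothetical minimal environment that matches every host."""

    hostname_pattern = DEFAULT_PATTERN


class ExampleClusterEnvironment(ComputeEnvironment):
    """Hypothetical environment restricted to one (made-up) domain."""

    hostname_pattern = r".*\.example-cluster\.org$"
```

Under this rule the base class is never selected on its own, a catch-all environment is selected everywhere, and a pattern-based environment is selected only on matching hosts, so no scheduler executable has to be probed during detection.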