catch exceptions so we don't die #94

Open
dsschult opened this issue Jun 1, 2017 · 3 comments
Comments

@dsschult
Collaborator

dsschult commented Jun 1, 2017

@philippeller says:

I'm running the pyglidein client as a standalone process with the delay = ... seconds option. Every now and then a qsub command fails and the client raises an exception:

Traceback (most recent call last):
  File "./client.py", line 198, in <module>
    main()
  File "./client.py", line 174, in main
    scheduler.submit(s, partition)
  File "/storage/home/pde3/pyglidein/submit.py", line 336, in submit
    raise Exception('failed to launch glidein')
Exception: failed to launch glidein

Could we change the behaviour of the exception handling so that the client stays alive?

@dsschult
Collaborator Author

dsschult commented Jun 1, 2017

Just put a general try/except around the entire client so it can loop properly.
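
A minimal sketch of what that could look like, not the actual client.py code. `submit_once` is a hypothetical callable standing in for one pass of the client's submit logic, and `max_cycles` and `sleep` are assumptions added only so the loop can be exercised in a test:

```python
import logging
import time

logger = logging.getLogger("client")


def run_forever(submit_once, delay, max_cycles=None, sleep=time.sleep):
    """Call submit_once repeatedly, surviving any exception it raises.

    submit_once, max_cycles and sleep are hypothetical names for this sketch.
    Returns the number of cycles that failed.
    """
    failures = 0
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        cycle += 1
        try:
            submit_once()
        except Exception:
            # Log the traceback, then fall through so the loop keeps going.
            failures += 1
            logger.warning("submit cycle %d failed", cycle, exc_info=True)
        # Wait `delay` seconds before the next cycle, failed or not.
        sleep(delay)
    return failures
```

Catching `Exception` rather than using a bare `except` means Ctrl-C (`KeyboardInterrupt`) still stops the client.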

@jvansanten
Collaborator

At my site I explicitly turn warnings into errors with qsub -w e, as this usually means that I need to intervene because the client is submitting jobs that will never be serviced. You can also make qsub fail gracefully with qsub -w w (if you want to know about it) or qsub -w n (if you don't).
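
As a rough illustration, the validation level could be made a per-site setting when the submit command is built. `qsub_command` and its `validation` parameter are hypothetical names for this sketch, not part of submit.py, and only the three levels mentioned above are accepted:

```python
def qsub_command(script, validation="e"):
    """Build a qsub argument list with a -w validation level.

    Hypothetical helper: accepts only the levels discussed above,
    'e' (error), 'w' (warning) and 'n' (none).
    """
    if validation not in ("e", "w", "n"):
        raise ValueError("unsupported validation level: %r" % (validation,))
    return ["qsub", "-w", validation, script]
```

The returned list can be passed to `subprocess.call` or similar, so a site that wants warnings rather than hard failures only changes one setting.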

@dsschult
Collaborator Author

dsschult commented Jun 1, 2017

I'd argue that the client should know that it didn't launch a glidein, but we should continue running in the hope that it's transient.

Ideally we'd just pass the failure info to monitoring, which would alert the human.
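
Something along these lines, as a sketch under assumptions: `submit` stands in for a call such as `scheduler.submit(s, partition)`, and `report_failure` is a hypothetical hook for whatever the monitoring side ends up accepting. Neither signature is taken from the existing code.

```python
def submit_and_report(submit, report_failure):
    """Try one submission; on failure, pass the details to monitoring.

    Both arguments are hypothetical callables for this sketch.
    Returns True if the submission succeeded, False otherwise.
    """
    try:
        submit()
    except Exception as exc:
        # The client learns that no glidein was launched, but keeps running.
        report_failure({"type": type(exc).__name__, "error": str(exc)})
        return False
    return True
```

The boolean return lets the caller know the launch failed without the exception taking the whole client down.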
