
If a job fails very quickly, we never get any logs #79

Open
ihodes opened this issue Dec 8, 2016 · 5 comments
ihodes (Member) commented Dec 8, 2016

No description provided.

ihodes added the bug label Dec 8, 2016
smondet (Member) commented Dec 8, 2016

We already try to save the logs when a job dies: https://github.com/hammerlab/coclobas/blob/master/src/lib/server.ml#L227

What happened in your case? Can you still run describe on it, for example?

ihodes (Member Author) commented Dec 8, 2016

Describe works; the job runs in the Docker container, but it dies immediately (bad CLI args in my shell script) and exits.

opam@e2b78e43fa00:/coclo/_cocloroot/logs/logs/job/522c556b-b975-567e-b254-02d4beadc9ca/commands$ cat 1481229903345_3c1b3504.json
{
  "command": {
    "command": "kubectl logs 522c556b-b975-567e-b254-02d4beadc9ca",
    "stdout": "",
    "stderr":
      "Error from server: Get https://gke-ihodes-coco3-cluster-default-pool-36378887-pskd:10250/containerLogs/default/522c556b-b975-567e-b254-02d4beadc9ca/522c556b-b975-567e-b254-02d4beadc9cacontainer: No SSH tunnels currently open. Were the targets able to accept an ssh-key for user \"gke-e170239faa5e49b2ac95\"?\n",
    "status": [ "Exited", 1 ],
    "exn": null
  }
}
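
So the kubectl logs call itself exited with status 1: GKE could not open the SSH tunnel from the master to the node, and we captured only that error instead of the container's output. A minimal manual sketch, assuming the tunnel error is transient and the pod still exists (the pod name is the one from the log above):

POD=522c556b-b975-567e-b254-02d4beadc9ca
# The "No SSH tunnels currently open" error is sometimes transient while GKE
# re-establishes the master-to-node tunnel (assumption), so retry a few times.
for i in 1 2 3 4 5; do
  kubectl logs "$POD" && break
  sleep 10
done
# If the container exited right away, the previous instance's logs may still
# be retrievable:
kubectl logs --previous "$POD"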

ihodes (Member Author) commented Dec 8, 2016

This may be due to the Google Cloud project metadata limitation; we run out of room at 32 KB or some similarly absurd limit (project-wide).
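
One quick way to check that hypothesis (assumption: the tunnel failure comes from the project-wide metadata, typically the ssh-keys entry, filling up, which is what the "Were the targets able to accept an ssh-key" message hints at):

# Dump the project-wide metadata and see how large it has grown; the
# ssh-keys entry is the usual culprit when GKE tunnels stop working
# (assumption based on the stderr above).
gcloud compute project-info describe --format=json | wc -c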

armish (Member) commented Dec 8, 2016

I also had similar issues where the describe log showed successful allocation of resources and initiation of the job, yet the job failed without any Kubernetes log. For example, when you pass an invalid URL to wget (that is, a poorly constructed URL for --tumor, --rna, or --normal), those fetch jobs also fail fast and leave no trace behind them.
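
A possible workaround sketch until log capture is fixed, assuming we control the fetch command: wrap the wget so its output is saved somewhere that survives the pod, instead of relying on kubectl logs after the fact. The bucket name, paths, and variable names below are made up:

set -o pipefail
# Keep the fetch output in a local file and echo it to stdout as well.
wget -O input.bam "$INPUT_URL" 2>&1 | tee /tmp/fetch.log
status=$?
if [ "$status" -ne 0 ]; then
  # Copy the log somewhere that outlives the pod, so a fast-failing job
  # still leaves a trace (bucket name is hypothetical).
  gsutil cp /tmp/fetch.log "gs://my-debug-bucket/$(hostname)-fetch.log" || true
  exit "$status"
fi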

ihodes (Member Author) commented Dec 8, 2016

This may have been the "ran out of metadata space on GCP" issue again.
