Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cannot create yadage workflows #535

Closed
tiborsimko opened this issue Aug 8, 2022 · 4 comments · Fixed by #536
Closed

cannot create yadage workflows #535

tiborsimko opened this issue Aug 8, 2022 · 4 comments · Fixed by #536

Comments

@tiborsimko
Copy link
Member

Seeing the following problem:

$ cd reana-demo-helloworld 
$ reana-client run -w test -f ./reana-yadage.yaml
==> Creating a workflow...
==> Verifying REANA specification file... /reana-demo-helloworld/reana-yadage.yaml
  -> SUCCESS: Valid REANA specification file.
==> Verifying REANA specification parameters...
  -> SUCCESS: REANA specification parameters appear valid.
==> Verifying workflow parameters and commands...
  -> SUCCESS: Workflow parameters and commands appear valid.
==> Verifying dangerous workflow operations...
  -> SUCCESS: Workflow operations appear valid.
==> Verifying compute backends in REANA specification file...
  -> SUCCESS: Workflow compute backends appear to be valid.
==> ERROR: Cannot create workflow test:
None is not of type 'object'

Failed validating 'type' in schema['properties']['workflow']['properties']['specification']:
    {'id': '/properties/workflow/properties/specification',
     'title': 'Workflow specification in yaml format.',
     'type': 'object'}

On instance['workflow']['specification']:
    None

Found out that it is due to #530, the bisecting showed:

*  (HEAD -> bad, upstream/pr/530) 82b9689 rest: validate REANA specification, Vladyslav Moisieienkov, 5 days ago
*  (upstream/pr/529, good) fde3591 openapi: add missing schemas to error responses, Bruno Rosendo, 5 days ago

@VMois Can you please have a look?

@VMois
Copy link

VMois commented Aug 8, 2022

The command below for the Serial workflow works:

reana-client run -w test

The issue is that when Yadage is used, specification stays None. Below is specification right before being sent to server.

{'version': '0.3.0', 'inputs': {'files': ['code/helloworld.py', 'data/names.txt'], 'directories': ['workflow/yadage'], 'parameters': {'sleeptime': 0, 'inputfile': 'data/names.txt', 'helloworld': 'code/helloworld.py'}}, 'workflow': {'type': 'yadage', 'file': 'workflow/yadage/workflow.yaml', 'specification': None}, 'outputs': {'files': ['helloworld/greetings.txt']}}

The reason why Yadage fails is that reana-client doesn't send a Yadage specification (code). It is added later when the workflow starts (code)

@VMois
Copy link

VMois commented Aug 8, 2022

Some notes:

CWL, Snakemake, and Yadage are multi-file workflow engines. The create endpoint is only responsible for creating a workflow. No files have been uploaded yet, so we cannot read the specifications. But how did it work before? Currently, reana-client loads both CWL and Snakemake locally and sends them as reana_specification to the create endpoint. Due to the exception with Yadage, its specification is loaded in start_workflow. Not a clean way, but it works.

With reana-client-go, we will be unable to load workflow specifications for CWL, Snakemake, and Yadage. Because of that, we will need to move that to reana-server.

It is also important to remember that the create endpoint is also responsible for Gitlab integration which works a bit differently from how our users work with workflows, but this is something to think about later.

Keeping things above in mind, let's think about what would be a good clean step-by-step way of starting workflow:

  1. create_workflow. Check quotas, validate workflow name and REANA specification (not a specification for CWL, Yadage, or Snakemake), and create workflow and workspace.
  2. upload. User uploads necessary files, including files related to workflow engine specification, etc.
  3. start_workflow. Check quotas, load engines specification, validate engines specification, and start a workflow.

note: I would also include our idea of switching from reana_specification in DB to reading from reana.yaml directly. Here is the issue.

Looks nice. One exception here is Gitlab integration which will still require steps 1-3 to be present in the create endpoint.

So, what to do next? I propose moving validation to start_workflow for this issue. It should solve a problem. In addition, I will create an issue about loading engine specifications in start_workflow. This will help reana-client-go in the nearest future.

What do you think?

P.S Looking into r-server and r-w-c code clearly shows we have boundary issues between them. Half the logic for workflow creation is in r-server, and another half is in r-w-c. We had multiple issues open talking about this problem: #508, #478. Spending some time addressing those issues in the near future would be useful. cc: @tiborsimko

@tiborsimko
Copy link
Member Author

@VMois Nice analysis, I fully agree with making only a quick fix here (which will fix the current status of the master branch) and proceeding with the idea of getting rid of reana_spec as part of the other linked issue (which will take quite long to implement).

One downside to the proposed quick fix, if we do the validation only at the time of starting the workflow, is that the clients might upload O(100 GB) worth of big files before discovering that their Yadage workflow specification have some problem. We could implement it as a quick fix, but perhaps improve immediately this, either by run yadage validation on the (Python) thick client side, or for the later thin-client future we could upload just parts necessary for the workflow to be validated and created (i.e. parse reana.yaml and read the necessary workflow: include files), which would enable to make the validation and creation sooner before we would proceed with the regular "all file upload" . (But this could be done also later during moving away from reana_spec and moving towards thin clients.)

@VMois
Copy link

VMois commented Aug 9, 2022

One downside to the proposed quick fix, if we do the validation only at the time of starting the workflow, is that the clients might upload O(100 GB) worth of big files before discovering that their Yadage workflow specification have some problem.

That is a good point. For now, when you restart workflow, you can specify a new reana_specification file in start_workflow. So technically, in case of invalid specification on a first try, the user can submit a new one without re-uploading files (cause restarted workflows share a workspace). But, I agree, this is not ideal either. I would prefer a validation step between creating workflow and uploading big files.

VMois pushed a commit to VMois/reana-server that referenced this issue Aug 9, 2022
- because client doesn't include Yadage specification in "create" endpoint, validation fails;
- workflow validation is moved to "start_workflow" so in case Yadage specification has multiple files, they can be loaded and properly validated.

closes reanahub#535
VMois pushed a commit to VMois/reana-server that referenced this issue Aug 9, 2022
- because client doesn't include Yadage specification in "create" endpoint, validation fails;
- workflow validation is moved to "start_workflow" so in case Yadage specification has multiple files, they can be loaded and properly validated.

closes reanahub#535
VMois pushed a commit to VMois/reana-server that referenced this issue Aug 9, 2022
- because client doesn't include Yadage specification in "create" endpoint, validation fails;
- workflow validation is moved to "start_workflow" so in case Yadage specification has multiple files, they can be loaded and properly validated.

closes reanahub#535
VMois pushed a commit to VMois/reana-server that referenced this issue Aug 9, 2022
- because client doesn't include Yadage specification in "create" endpoint, validation fails;
- workflow validation is moved to "start_workflow" so in case Yadage specification has multiple files, they can be loaded and properly validated.

closes reanahub#535
VMois pushed a commit to VMois/reana-server that referenced this issue Aug 9, 2022
- because client doesn't include Yadage specification in "create" endpoint, validation fails;
- workflow validation is moved to "start_workflow" so in case Yadage specification has multiple files, they can be loaded and properly validated.

closes reanahub#535
VMois pushed a commit to VMois/reana-server that referenced this issue Aug 9, 2022
- because client doesn't include Yadage specification in "create" endpoint, validation fails;
- workflow validation is moved to "start_workflow" so in case Yadage specification has multiple files, they can be loaded and properly validated.

closes reanahub#535
VMois pushed a commit to VMois/reana-server that referenced this issue Aug 9, 2022
- because client doesn't include Yadage specification in "create" endpoint, validation fails;
- workflow validation is moved to "start_workflow" so in case Yadage specification has multiple files, they can be loaded and properly validated.

closes reanahub#535
VMois pushed a commit to VMois/reana-server that referenced this issue Aug 9, 2022
- because client doesn't include Yadage specification in "create" endpoint, validation fails;
- workflow validation is moved to "start_workflow" so in case Yadage specification has multiple files, they can be loaded and properly validated.

closes reanahub#535
@mdonadoni mdonadoni self-assigned this Aug 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants