Changes in object input structure use cached values rather than starting new runs for exec #4916

Open
mahesh-panchal opened this issue Apr 15, 2024 · 1 comment · May be fixed by #5351

@mahesh-panchal
Contributor

Bug report

Expected behavior and actual behavior

With -resume, a change to an object passed as a process input should trigger new task runs. Instead, when only the object's structure changes (e.g. wrapping it in an array or a map), the process reuses the cached results from the previous run rather than executing new tasks.

Steps to reproduce the problem

First run:

workflow {
    Channel.of(
        [ id:'foo', taxid: '632'],
        [ id:'bar', taxid: '632']
    )
    | TASK
    | view
}

process TASK {
    input:
    val meta

    exec:
    file("$task.workDir/node_id.txt").text = meta.taxid
    
    output:
    tuple val(meta), path('node_id.txt'), emit: node_id
}

Changed code (run 2):

workflow {
    Channel.of(
        [map:[ id:'foo', taxid: '632']],
        [map:[ id:'bar', taxid: '632']]
    )
    | TASK
    | view
}

process TASK {
    input:
    val meta

    exec:
    file("$task.workDir/node_id.txt").text = meta.taxid
    
    output:
    tuple val(meta), path('node_id.txt'), emit: node_id
}

Program output

Run 1:

$ nextflow run main.nf 
N E X T F L O W  ~  version 23.10.1
Launching `main.nf` [zen_leibniz] DSL2 - revision: eaf2b5bb3d
executor >  local (2)
[31/f024e0] process > TASK (2) [100%] 2 of 2 ✔
[[id:foo, taxid:632], /workspace/Nextflow_sandbox/work/8e/853bdb1d954eef65d82eb81ba481e1/node_id.txt]
[[id:bar, taxid:632], /workspace/Nextflow_sandbox/work/31/f024e0f621ef2b27d3567887a7c5d3/node_id.txt]

Run 2: An error is expected because the structure of the input object changed, but the cached values are used instead.

$ nextflow run main.nf -resume
N E X T F L O W  ~  version 23.10.1
Launching `main.nf` [cheesy_allen] DSL2 - revision: 8074d4d3e0
[6b/fb2322] process > TASK (1) [100%] 2 of 2, cached: 2 ✔
[[id:bar, taxid:632], /workspace/Nextflow_sandbox/work/56/2935d71088d17207dab38381590386/node_id.txt]
[[id:foo, taxid:632], /workspace/Nextflow_sandbox/work/6b/fb2322b25197873c2aaf6f0c75084e/node_id.txt]

Environment

  • Nextflow version: 23.10.1
  • Java version: openjdk 17.0.10-internal 2024-01-16
  • Operating system: Linux (Gitpod)
  • Bash version: GNU bash, version 5.1.16(1)-release (x86_64-pc-linux-gnu)
@bentsherman
Member

This is happening because maps are hashed by hashing their values:

if( value instanceof Map ) {
    // note: should map be order invariant as Set ?
    for( Object item : ((Map)value).values() )
        hasher = CacheHelper.hasher( hasher, item, mode );
    return hasher;
}

So a bare value and the same value wrapped in a map produce the same hash, and renaming the map's keys does not change the hash either, though changing the order of the entries does.
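To make the effect concrete, here is a minimal, self-contained sketch of values-only map hashing next to a key-aware variant. It is not Nextflow's code: SHA-256 from the JDK stands in for the real hasher, every name (`MapHashSketch`, `valuesOnly`, `withKeys`, `mapOf`) is hypothetical, and the key-aware variant is only one possible approach, not necessarily what the linked pull request does.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: SHA-256 stands in for Nextflow's real hasher,
// and every name in this class is hypothetical.
class MapHashSketch {

    // Builds an insertion-ordered map from alternating key/value arguments,
    // similar to a Groovy map literal.
    static Map<String, Object> mapOf(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2)
            m.put((String) kv[i], kv[i + 1]);
        return m;
    }

    static MessageDigest sha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    static String hex(MessageDigest md) {
        return new BigInteger(1, md.digest()).toString(16);
    }

    // Mirrors the quoted logic: for a map, only the values are hashed
    // (recursively), so keys and nesting leave no trace in the digest.
    static void feedValuesOnly(MessageDigest md, Object value) {
        if (value instanceof Map) {
            for (Object item : ((Map<?, ?>) value).values())
                feedValuesOnly(md, item);
            return;
        }
        md.update(String.valueOf(value).getBytes(StandardCharsets.UTF_8));
    }

    // Hypothetical alternative: keys and map boundaries are hashed too.
    static void feedWithKeys(MessageDigest md, Object value) {
        if (value instanceof Map) {
            md.update((byte) '{');
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                feedWithKeys(md, e.getKey());
                md.update((byte) '=');
                feedWithKeys(md, e.getValue());
                md.update((byte) ',');
            }
            md.update((byte) '}');
            return;
        }
        md.update(String.valueOf(value).getBytes(StandardCharsets.UTF_8));
    }

    static String valuesOnly(Object value) {
        MessageDigest md = sha256();
        feedValuesOnly(md, value);
        return hex(md);
    }

    static String withKeys(Object value) {
        MessageDigest md = sha256();
        feedWithKeys(md, value);
        return hex(md);
    }

    public static void main(String[] args) {
        Map<String, Object> run1 = mapOf("id", "foo", "taxid", "632");
        Map<String, Object> run2 = mapOf("map", mapOf("id", "foo", "taxid", "632"));
        Map<String, Object> renamed = mapOf("name", "foo", "tax", "632");
        Map<String, Object> reordered = mapOf("taxid", "632", "id", "foo");

        // Values-only hashing: wrapping and key renames collide, reordering does not.
        System.out.println("values-only, wrapped:   " + valuesOnly(run1).equals(valuesOnly(run2)));
        System.out.println("values-only, renamed:   " + valuesOnly(run1).equals(valuesOnly(renamed)));
        System.out.println("values-only, reordered: " + valuesOnly(run1).equals(valuesOnly(reordered)));
        // Key-aware hashing: the wrapped input from run 2 no longer collides with run 1.
        System.out.println("with-keys, wrapped:     " + withKeys(run1).equals(withKeys(run2)));
    }
}
```

In this sketch the input from run 1 and the wrapped input from run 2 feed exactly the same bytes into the values-only digest, which matches the cache hit seen in the report. Whether a real fix should also make the hash order-invariant, as the source comment asks, is left open here.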

@bentsherman bentsherman added the bug label Oct 2, 2024
@bentsherman bentsherman linked a pull request Oct 2, 2024 that will close this issue