OCR workflow not working for specific documents #20

Open
snam24 opened this issue Jul 24, 2024 · 3 comments

snam24 commented Jul 24, 2024

It's stuck at the embedding stage. I tried the fix from #7, but it did not help.
The output file, e.g. original_filename (ocr).pdf, is corrupted and can't be opened. Please see the debug log below. When I bring up the progress bar by typing ocr in Alfred, the debug output goes on and on.

========================================
Workflow Cache Path
========================================
/Users/myname/Library/Caches/com.runningwithcrayons.Alfred/Workflow Data/com.zeitlings.ocr
[10:29:18.371] OCR[Debug] Processing complete
[10:29:18.371] OCR[Debug] Passing output 'OCR Failure: Nothing to recognize
' to Play Sound
[10:29:23.292] OCR[Universal Action] Processing complete
[10:29:23.301] OCR[Universal Action] Passing output '/Users/myname/Library/CloudStorage/Dropbox/Papers/original_filename.pdf' to Arg and Vars
[10:29:23.303] OCR[Arg and Vars] Processing complete
[10:29:23.304] OCR[Arg and Vars] Passing output '' to Run Script
[10:29:30.701] OCR[Script Filter] Queuing argument '(null)'
[10:29:30.718] OCR[Script Filter] Script with argv '(null)' finished
[10:29:30.725] OCR[Script Filter] {
  "variables" : {
    "progress_step" : "1",
    "prevp" : "0",
    "stop" : "1"
  },
  "items" : [
    {
      "title" : "Embedding OCR 0 of 24 (50%)",
      "valid" : false,
      "subtitle" : "━───────────────────"
    }
  ],
  "rerun" : 0.1
}
[10:29:30.822] OCR[Script Filter] Queuing argument '(null)'
[10:29:30.908] OCR[Script Filter] Script with argv '(null)' finished
[10:29:30.913] OCR[Script Filter] {
  "variables" : {
    "stop" : "1",
    "progress_step" : "2",
    "prevp" : "0"
  },
  "items" : [
    {
      "title" : "Embedding OCR 0 of 24 (50%)",
      "valid" : false,
      "subtitle" : "━━──────────────────"
    }
  ],
  "rerun" : 0.1
}
[10:29:31.011] OCR[Script Filter] Queuing argument '(null)'
[10:29:31.023] OCR[Script Filter] Script with argv '(null)' finished
[10:29:31.027] OCR[Script Filter] {
  "variables" : {
    "stop" : "1",
    "prevp" : "0",
    "progress_step" : "3"
  },
  "rerun" : 0.1,
  "items" : [
    {
      "title" : "Embedding OCR 0 of 24 (50%)",
      "valid" : false,
      "subtitle" : "━━━─────────────────"
    }
  ]
}
[10:29:31.128] OCR[Script Filter] Queuing argument '(null)'
[10:29:31.203] OCR[Script Filter] Script with argv '(null)' finished
[10:29:31.209] OCR[Script Filter] {
  "rerun" : 0.1,
  "items" : [
    {
      "subtitle" : "━━━━────────────────",
      "title" : "Embedding OCR 0 of 24 (50%)",
      "valid" : false
    }
  ],
  "variables" : {
    "stop" : "1",
    "progress_step" : "4",
    "prevp" : "0"
  }
}
[10:29:31.308] OCR[Script Filter] Queuing argument '(null)'
[10:29:31.319] OCR[Script Filter] Script with argv '(null)' finished
[10:29:31.325] OCR[Script Filter] {
  "items" : [
    {
      "valid" : false,
      "subtitle" : "━━━━━━──────────────",
      "title" : "Embedding OCR 0 of 24 (50%)"
    }
  ],
  "rerun" : 0.1,
  "variables" : {
    "progress_step" : "5",
    "stop" : "1",
    "prevp" : "0"
  }
}
[10:29:31.424] OCR[Script Filter] Queuing argument '(null)'
[10:29:31.436] OCR[Script Filter] Script with argv '(null)' finished
[10:29:31.437] OCR[Script Filter] {
  "items" : [
    {
      "title" : "Embedding OCR 0 of 24 (50%)",
      "subtitle" : "━━━━━━━─────────────",
      "valid" : false
    }
  ],
  "variables" : {
    "stop" : "1",
    "prevp" : "0",
    "progress_step" : "6"
  },
  "rerun" : 0.1
}
[10:29:31.541] OCR[Script Filter] Queuing argument '(null)'
[10:29:31.616] OCR[Script Filter] Script with argv '(null)' finished
[10:29:31.622] OCR[Script Filter] {
  "items" : [
    {
      "valid" : false,
      "title" : "Embedding OCR 0 of 24 (50%)",
      "subtitle" : "━━━━━━━━────────────"
    }
  ],
  "variables" : {
    "progress_step" : "7",
    "prevp" : "0",
    "stop" : "1"
  },
  "rerun" : 0.1
}
[10:29:31.721] OCR[Script Filter] Queuing argument '(null)'
[10:29:31.733] OCR[Script Filter] Script with argv '(null)' finished
[10:29:31.738] OCR[Script Filter] {
  "rerun" : 0.1,
  "variables" : {
    "progress_step" : "8",
    "prevp" : "1",
    "stop" : "1"
  },
  "items" : [
    {
      "subtitle" : "━━━━━━━━━───────────",
      "title" : "Embedding OCR 1 of 24 (52%)",
      "valid" : false
    }
  ]
}
[10:29:31.838] OCR[Script Filter] Queuing argument '(null)'
[10:29:31.913] OCR[Script Filter] Script with argv '(null)' finished
[10:29:31.918] OCR[Script Filter] {
  "rerun" : 0.1,
  "variables" : {
    "progress_step" : "9",
    "prevp" : "1",
    "stop" : "1"
  },
  "items" : [
    {
      "valid" : false,
      "subtitle" : "━━━━━━━━━━──────────",
      "title" : "Embedding OCR 1 of 24 (52%)"
    }
  ]
}
[10:29:32.018] OCR[Script Filter] Queuing argument '(null)'
[10:29:32.029] OCR[Script Filter] Script with argv '(null)' finished
[10:29:32.034] OCR[Script Filter] {
  "variables" : {
    "progress_step" : "10",
    "prevp" : "1",
    "stop" : "1"
  },
  "items" : [
    {
      "title" : "Embedding OCR 1 of 24 (52%)",
      "valid" : false,
      "subtitle" : "━━━━━━━━━━━─────────"
    }
  ],
  "rerun" : 0.1
}
[10:29:32.135] OCR[Script Filter] Queuing argument '(null)'
[10:29:32.209] OCR[Script Filter] Script with argv '(null)' finished
[10:29:32.215] OCR[Script Filter] {
  "rerun" : 0.1,
  "variables" : {
    "stop" : "1",
    "prevp" : "1",
    "progress_step" : "11"
  },
  "items" : [
    {
      "subtitle" : "━━━━━━━━━━━━────────",
      "title" : "Embedding OCR 1 of 24 (52%)",
      "valid" : false
    }
  ]
}
[10:29:32.314] OCR[Script Filter] Queuing argument '(null)'
[10:29:32.389] OCR[Script Filter] Script with argv '(null)' finished
[10:29:32.395] OCR[Script Filter] {
  "rerun" : 0.1,
  "variables" : {
    "prevp" : "1",
    "progress_step" : "12",
    "stop" : "1"
  },
  "items" : [
    {
      "title" : "Embedding OCR 1 of 24 (52%)",
      "valid" : false,
      "subtitle" : "━━━━━━━━━━━━━───────"
    }
  ]
}
[10:29:32.494] OCR[Script Filter] Queuing argument '(null)'
[10:29:32.569] OCR[Script Filter] Script with argv '(null)' finished
[10:29:32.574] OCR[Script Filter] {
  "variables" : {
    "stop" : "1",
    "prevp" : "1",
    "progress_step" : "13"
  },
  "items" : [
    {
      "subtitle" : "━━━━━━━━━━━━━━──────",
      "title" : "Embedding OCR 1 of 24 (52%)",
      "valid" : false
    }
  ],
  "rerun" : 0.1
}
[10:29:32.675] OCR[Script Filter] Queuing argument '(null)'
[10:29:32.750] OCR[Script Filter] Script with argv '(null)' finished
[10:29:32.754] OCR[Script Filter] {
  "items" : [
    {
      "subtitle" : "─━━━━━━━━━━━━━━─────",
      "title" : "Embedding OCR 1 of 24 (52%)",
      "valid" : false
    }
  ],
  "variables" : {
    "stop" : "1",
    "progress_step" : "14",
    "prevp" : "1"
  },
  "rerun" : 0.1
}
[10:29:32.855] OCR[Script Filter] Queuing argument '(null)'
[10:29:32.866] OCR[Script Filter] Script with argv '(null)' finished
[10:29:32.871] OCR[Script Filter] {
  "items" : [
    {
      "valid" : false,
      "subtitle" : "──━━━━━━━━━━━━━━────",
      "title" : "Embedding OCR 1 of 24 (52%)"
    }
  ],
  "rerun" : 0.1,
  "variables" : {
    "stop" : "1",
    "progress_step" : "15",
    "prevp" : "1"
  }
}
[10:29:32.971] OCR[Script Filter] Queuing argument '(null)'
[10:29:33.046] OCR[Script Filter] Script with argv '(null)' finished
[10:29:33.051] OCR[Script Filter] {
  "variables" : {
    "prevp" : "1",
    "progress_step" : "16",
    "stop" : "1"
  },
  "rerun" : 0.1,
  "items" : [
    {
      "subtitle" : "───━━━━━━━━━━━━━━───",
      "title" : "Embedding OCR 1 of 24 (52%)",
      "valid" : false
    }
  ]
}
[10:29:33.152] OCR[Script Filter] Queuing argument '(null)'
[10:29:33.226] OCR[Script Filter] Script with argv '(null)' finished
[10:29:33.231] OCR[Script Filter] {
  "rerun" : 0.1,
  "variables" : {
    "prevp" : "1",
    "stop" : "1",
    "progress_step" : "17"
  },
  "items" : [
    {
      "subtitle" : "────━━━━━━━━━━━━━━──",
      "title" : "Embedding OCR 1 of 24 (52%)",
      "valid" : false
    }
  ]
}
[10:29:33.331] OCR[Script Filter] Queuing argument '(null)'
[10:29:33.406] OCR[Script Filter] Script with argv '(null)' finished
[10:29:33.411] OCR[Script Filter] {
  "variables" : {
    "progress_step" : "18",
    "stop" : "1",
    "prevp" : "1"
  },
  "items" : [
    {
      "subtitle" : "─────━━━━━━━━━━━━━━─",
      "valid" : false,
      "title" : "Embedding OCR 1 of 24 (52%)"
    }
  ],
  "rerun" : 0.1
}
[10:29:33.511] OCR[Script Filter] Queuing argument '(null)'
[10:29:33.522] OCR[Script Filter] Script with argv '(null)' finished
[10:29:33.527] OCR[Script Filter] {
  "rerun" : 0.1,
  "items" : [
    {
      "valid" : false,
      "subtitle" : "──────━━━━━━━━━━━━━━",
      "title" : "Embedding OCR 1 of 24 (52%)"
    }
  ],
  "variables" : {
    "progress_step" : "19",
    "stop" : "1",
    "prevp" : "1"
  }
}
[10:29:33.628] OCR[Script Filter] Queuing argument '(null)'
[10:29:33.703] OCR[Script Filter] Script with argv '(null)' finished
[10:29:33.708] OCR[Script Filter] {
  "items" : [
    {
      "subtitle" : "───────━━━━━━━━━━━━━",
      "title" : "Embedding OCR 1 of 24 (52%)",
      "valid" : false
    }
  ],
  "variables" : {
    "prevp" : "1",
    "stop" : "1",
    "progress_step" : "20"
  },
  "rerun" : 0.1
}
[10:29:33.808] OCR[Script Filter] Queuing argument '(null)'
[10:29:33.882] OCR[Script Filter] Script with argv '(null)' finished
[10:29:33.883] OCR[Script Filter] {
  "rerun" : 0.1,
  "variables" : {
    "stop" : "1",
    "progress_step" : "21",
    "prevp" : "1"
  },
  "items" : [
    {
      "valid" : false,
      "title" : "Embedding OCR 1 of 24 (52%)",
      "subtitle" : "────────━━━━━━━━━━━━"
    }
  ]
}
[10:29:33.987] OCR[Script Filter] Queuing argument '(null)'
[10:29:34.062] OCR[Script Filter] Script with argv '(null)' finished
[10:29:34.068] OCR[Script Filter] {
  "rerun" : 0.1,
  "items" : [
    {
      "subtitle" : "─────────━━━━━━━━━━━",
      "title" : "Embedding OCR 1 of 24 (52%)",
      "valid" : false
    }
  ],
  "variables" : {
    "prevp" : "1",
    "progress_step" : "22",
    "stop" : "1"
  }
}
[10:29:34.167] OCR[Script Filter] Queuing argument '(null)'
[10:29:34.242] OCR[Script Filter] Script with argv '(null)' finished
[10:29:34.246] OCR[Script Filter] {
  "rerun" : 0.1,
  "items" : [
    {
      "valid" : false,
      "title" : "Embedding OCR 1 of 24 (52%)",
      "subtitle" : "──────────━━━━━━━━━━"
    }
  ],
  "variables" : {
    "stop" : "1",
    "prevp" : "1",
    "progress_step" : "23"
  }
}
[10:29:34.347] OCR[Script Filter] Queuing argument '(null)'
[10:29:34.423] OCR[Script Filter] Script with argv '(null)' finished
[10:29:34.428] OCR[Script Filter] {
  "variables" : {
    "stop" : "1",
    "progress_step" : "24",
    "prevp" : "2"
  },
  "rerun" : 0.1,
  "items" : [
    {
      "valid" : false,
      "subtitle" : "───────────━━━━━━━━━",
      "title" : "Embedding OCR 2 of 24 (54%)"
    }
  ]
}
[10:29:34.528] OCR[Script Filter] Queuing argument '(null)'
[10:29:34.603] OCR[Script Filter] Script with argv '(null)' finished
[10:29:34.608] OCR[Script Filter] {
  "variables" : {
    "stop" : "1",
    "progress_step" : "25",
    "prevp" : "2"
  },
  "rerun" : 0.1,
  "items" : [
    {
      "title" : "Embedding OCR 2 of 24 (54%)",
      "subtitle" : "────────────━━━━━━━━",
      "valid" : false
    }
  ]
}
[10:29:34.708] OCR[Script Filter] Queuing argument '(null)'
[10:29:34.782] OCR[Script Filter] Script with argv '(null)' finished
[10:29:34.788] OCR[Script Filter] {
  "variables" : {
    "stop" : "1",
    "progress_step" : "26",
    "prevp" : "2"
  },
  "items" : [
    {
      "subtitle" : "─────────────━━━━━━━",
      "title" : "Embedding OCR 2 of 24 (54%)",
      "valid" : false
    }
  ],
  "rerun" : 0.1
}
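
A log like the one above is easier to read once the repeated progress titles are collapsed. The following is a minimal sketch, not part of the workflow: `summarize_progress` and `TITLE_RE` are hypothetical names, and the pattern assumes the titles always have the form "Embedding OCR N of M (P%)" as in this excerpt. Applied to the excerpt, it would show the page counter moving from 0 to 2 of 24 over roughly four seconds.

```python
import re

# Matches progress titles such as: "title" : "Embedding OCR 1 of 24 (52%)"
# (assumed format, taken from the log excerpt in this issue)
TITLE_RE = re.compile(r'"title"\s*:\s*"Embedding OCR (\d+) of (\d+) \((\d+)%\)"')


def summarize_progress(log_text):
    """Return distinct (pages_done, total_pages, percent) states in order of appearance."""
    states = []
    for match in TITLE_RE.finditer(log_text):
        state = tuple(int(group) for group in match.groups())
        # Skip consecutive repeats so only actual changes are listed.
        if not states or states[-1] != state:
            states.append(state)
    return states


sample = '''
      "title" : "Embedding OCR 0 of 24 (50%)",
      "title" : "Embedding OCR 0 of 24 (50%)",
      "title" : "Embedding OCR 1 of 24 (52%)",
      "title" : "Embedding OCR 2 of 24 (54%)"
'''
print(summarize_progress(sample))
```

If the list stops growing while the log keeps producing output, the embedding step has likely stalled rather than merely being slow.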
@zeitlings
Owner

Hey @snam24
If clearing the cache files as described in #7 (located at /Users/myname/Library/Caches/com.runningwithcrayons.Alfred/Workflow Data/com.zeitlings.ocr) doesn't resolve the issue, the problem might be related to the specific document you're trying to perform OCR on.

  • Is this happening with all PDFs or just specific documents?
  • If possible, could you share the PDF that's causing the issue? This will allow me to investigate further and potentially identify the root cause.
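
For reference, clearing the cache could be scripted roughly as follows. This is a sketch under assumptions: it treats "clearing the cache files" as deleting everything inside the workflow cache directory quoted above (the exact steps from #7 are not reproduced in this thread), `clear_workflow_cache` is a hypothetical helper, and `~` stands in for the user's home directory. It defaults to a dry run so nothing is deleted until the listing has been checked.

```python
import shutil
from pathlib import Path


def clear_workflow_cache(cache_dir, dry_run=True):
    """Remove every entry inside cache_dir; return the entries that were (or would be) removed."""
    cache_dir = Path(cache_dir).expanduser()
    if not cache_dir.is_dir():
        return []  # nothing to clear
    entries = sorted(cache_dir.iterdir())
    if not dry_run:
        for entry in entries:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)  # delete subdirectory and its contents
            else:
                entry.unlink()  # delete file or symlink
    return entries


if __name__ == "__main__":
    # Path taken from the debug log in this issue, with the home directory abbreviated.
    cache = "~/Library/Caches/com.runningwithcrayons.Alfred/Workflow Data/com.zeitlings.ocr"
    for path in clear_workflow_cache(cache, dry_run=True):
        print("would remove:", path)
```

Passing `dry_run=False` performs the deletion; the directory itself is kept so the workflow can write to it again.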

@snam24
Author

snam24 commented Jul 26, 2024

Hi @zeitlings,

Thanks for getting back to me. I tried another document and it worked, so I guess the problem is related to the specific files, as you said. I'm afraid I cannot upload or share the PDFs publicly, but you can download them from the links below if you have access. Thanks!

https://doi.org/10.1093/rfs/7.1.125
https://www.jstor.org/stable/2946648

@zeitlings
Owner

Hey @snam24
Thanks for sharing that info and providing the sources! I'll definitely take a closer look when I have some time.

@zeitlings zeitlings self-assigned this Jul 26, 2024
@zeitlings zeitlings changed the title from "OCR workflow not working" to "OCR workflow not working for specific documents" Nov 12, 2024