fix: [ISSUE] "Error fetching data and indexing: [Errno 2] No such file or directory: '/data/INFILE.pdf'" in API workflow execution #595
Comments
@Deepak-Kesavan Thanks, but I still get the same error. The Workflow execution API returned a different error from before, and the logs seem to have been truncated before the run finished. My update instructions:
I will remove all Unstract images other than the DB and try again.
Still unresolved.
Thanks for the update, @kun432. We will investigate this further and get back to you. Please include the logs from unstract-worker if possible.
I removed everything, including all the container images, volumes, and networks, and tried again, but no luck. Here's the unstract-worker log.
Also, some things I found:
Hope these help.
Regarding the above error, we are using the file named
@kun432 I am unable to replicate the issue. I even tried giving the file a Japanese name, and it ran successfully. Have you tried a file other than the one you are currently using? Additionally, setting REMOVE_CONTAINER_ON_EXIT=False in the worker's .env file prevents the tool container from being removed, which might provide additional logs.
@kun432 In addition to what @Deepak-Kesavan suggested, can you also check whether there are any files in the folder below?
Please share the
Does this mean you used Llama Parse as the text extractor? Because, as I said before,
so I don't think this problem comes from my Japanese invoice PDF files.
@kun432 I initially thought the issue might be due to the name of the PDF, but it seems that was not the case. Could you please provide the information mentioned in the comments above by @ritwik-g and me so we can debug this further?
@kun432 I missed this message earlier. If your use case works fine with LLMWhisperer, then the issue might be that Llama Parse fails to parse Japanese text. Can you confirm whether the issue happens mainly with Llama Parse? If so, you might try using Llama Parse directly once to see whether the extraction works.
After removing the cloned repo, containers, images, and volumes and re-cloning, I did a fresh setup and tested with Llama Parse as the text extractor. The same error happened again (it currently seems 100% reproducible). I summarized my whole setup procedure and logs below:
I added the result of using LLMWhisperer to the Gist above (see the last comment).
Llama Parse CAN handle Japanese text. As the screenshot in my first comment shows, extraction with Llama Parse seems to work in Prompt Studio (this means the document was parsed using Llama Parse, right?). So I guess something is wrong in the workflow execution, and it shows up only when using Llama Parse.
@kun432 Yes, this might be a Llama Parse-specific problem. Thanks for the detailed reproduction steps. Let us look into this.
@kun432 It looks like this issue was already reported by our QA. This is a high-priority bug, but we are working on some other critical items and will pick it up as soon as possible. For the time being, if you are able to use LLMWhisperer, please try it.
Describe the bug
I followed the "Getting Started" instructions but got an error in the API workflow. The same error happens every time.
To reproduce
Follow the "Getting Started" instructions, except:
The following error occurs:
I always get the same error.
Expected behavior
Environment details
Additional context
I have 3 sample PDFs.
Parsing all of them works perfectly in Prompt Studio.
On the other hand, parsing all of them always fails in both the API and the Workflow.
Screenshots
API request from Postman
and logs
"Run Workflow"
Prompt Studio works.