--example option gets content,metadata,names,read, but cannot get 'text' #171
Comments
Since this is removed from the examples, does it mean text extraction is no longer possible? I'm looking for a PDF library to extract text from a PDF: not the whole content, just the text. Was that what the example did, and if so, is there a way to still do that? Btw thanks for sharing this library! 🚀
FWIW this is the commit that removed the text example: 520cd39. Sadly, it does not explain why the example was removed. I tried to just call the code that gets the operations (
Text extraction is possible, but has been moved out of this library. https://github.com/pdf-rs/pdf_render/blob/master/render/examples/trace.rs
While there's a level where the file structure of PDF makes text editing and modification difficult to do without "rendering" it, I think it's possibly a bit absurd that there's no attempt to collect the text on a page without implementing a disconcertingly large number of operations and transforms on text. I have been struggling with figuring out a (relatively) trivial method of obtaining something resembling the text within PDFs that can handle kerning/positioning of characters without my naive "put a space between every
It is a hugely nontrivial task. pdf_tools is one attempt at it. But once you are outside that, no chance.
ah! I missed
Everything I wrote is because I have hit a specific edge case in the wild.
If you only need to parse PDF files from one source, it is a different problem. Then you can make assumptions about the file. For something general, you will find that for any assumption, someone wrote a piece of code that proves it wrong.
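To make the "single source, so you can make assumptions" case concrete, here is a minimal sketch of a gap-based spacing heuristic. It is not part of `pdf` or `pdf_render`: the `Run` type, `join_runs`, and the `gap_factor` and `line_tolerance` parameters are all hypothetical, and it assumes something else has already produced positioned text runs in reading order, with horizontal text only and a known advance width per run. Any of those assumptions can fail on files from other producers.

```rust
/// A positioned run of text. Hypothetical type for illustration only;
/// it is assumed that the text matrix has already been applied upstream.
#[derive(Debug, Clone)]
struct Run {
    text: String,
    x: f32,         // left edge of the run, in page units
    y: f32,         // baseline position
    width: f32,     // advance width of the whole run
    font_size: f32, // used to scale the gap threshold
}

/// Convenience constructor with a fixed font size of 10 units.
fn run(text: &str, x: f32, y: f32, width: f32) -> Run {
    Run { text: text.to_string(), x, y, width, font_size: 10.0 }
}

/// Join runs into text. A newline is emitted when the baseline moves by
/// more than `line_tolerance`; a space is emitted only when the horizontal
/// gap to the previous run exceeds `gap_factor * font_size`, so small
/// kerning adjustments do not split words.
fn join_runs(runs: &[Run], gap_factor: f32, line_tolerance: f32) -> String {
    let mut out = String::new();
    let mut prev: Option<&Run> = None;
    for r in runs {
        if let Some(p) = prev {
            if (r.y - p.y).abs() > line_tolerance {
                out.push('\n');
            } else {
                let gap = r.x - (p.x + p.width);
                if gap > gap_factor * p.font_size {
                    out.push(' ');
                }
            }
        }
        out.push_str(&r.text);
        prev = Some(r);
    }
    out
}

fn main() {
    let runs = vec![
        run("Hel", 0.0, 700.0, 15.0),
        run("lo", 15.2, 700.0, 9.0),     // kerning-sized gap: no space
        run("world", 28.0, 700.0, 26.0), // gap above the threshold: space
        run("next", 0.0, 688.0, 20.0),   // different baseline: newline
    ];
    println!("{}", join_runs(&runs, 0.25, 2.0));
}
```

The threshold of a quarter of the font size is a guess that would need tuning per source; rotated text, multi-column layouts, and fonts without usable width information are exactly the kind of edge cases that break it.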
Error: no example target name 'text'