Here is the use case. I'd like to index custom file types (by extending Tika). Those files can have different extensions, so the extension itself doesn't matter. Extensions can also vary for the same format, like htm and html, which are currently listed separately. I'd also like to index metadata from photos, so jpg vs. jpeg is the next case that comes to mind.
I suggest checking the content type (or something similar) returned by Tika instead of calling supportsFile and, probably, ignoring the file if the type is application/octet-stream.
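To make the suggestion concrete, here is a minimal sketch of a content-type check that could stand in for an extension test. The class name `ContentTypeFilter` and the method `shouldIndex` are hypothetical and not part of the project. The detector is passed in as a plain function so the sketch stays self-contained; with Tika on the classpath it could wrap a call such as `new Tika().detect(file)`, which is an assumption about how the integration would be wired.

```java
import java.util.function.Function;

// Hypothetical helper: decides whether to index a file from its detected
// content type instead of its file extension.
final class ContentTypeFilter {
    // Generic fallback type reported when the real type is unknown.
    static final String OCTET_STREAM = "application/octet-stream";

    // Maps a file path to a content type. In a real setup this would be
    // backed by Tika; here it is injected so the sketch has no dependencies.
    private final Function<String, String> detector;

    ContentTypeFilter(Function<String, String> detector) {
        this.detector = detector;
    }

    // Index the file only if a specific content type was detected.
    boolean shouldIndex(String path) {
        String type = detector.apply(path);
        return type != null && !OCTET_STREAM.equals(type);
    }

    public static void main(String[] args) {
        // Stand-in detector for demonstration only: htm and html both map
        // to text/html, everything else is treated as unknown.
        ContentTypeFilter filter = new ContentTypeFilter(path ->
                path.endsWith(".htm") || path.endsWith(".html")
                        ? "text/html"
                        : OCTET_STREAM);
        System.out.println(filter.shouldIndex("page.htm"));
        System.out.println(filter.shouldIndex("page.html"));
        System.out.println(filter.shouldIndex("blob.dat"));
    }
}
```

With this shape, htm and html (or jpg and jpeg) no longer need separate entries, because the decision depends only on the detected type.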
This would introduce a heavy load on the IO system. We would also have to check compressed files, which might in turn contain other compressed files, and so on. This introduces another problem: opening a file that lives inside a compressed file from the search results. There is no cross-platform solution available to implement this.
This can be an option. I'm not sure whether enabling it per folder is overkill, but it is certainly doable as a global option, for when I really want to index everything in a certain folder and don't need anything else.
I would just skip compressed files, at least for now, and report them as skipped. I'm not 100% sure, but off the top of my head, Tika can handle single-level containers, i.e. not stuff in a zip inside another zip.
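Skipping and reporting archives could look roughly like the sketch below. The class `ArchiveSkipper` is hypothetical, and the list of archive content types is an assumed, non-exhaustive set chosen for illustration; the content type is expected to come from whatever detection step is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper: separates archives from indexable files so that
// archives can be skipped and reported instead of being unpacked.
final class ArchiveSkipper {
    // Assumed set of archive content types; extend as needed.
    static final Set<String> ARCHIVE_TYPES = Set.of(
            "application/zip",
            "application/gzip",
            "application/x-tar",
            "application/x-7z-compressed");

    final List<String> toIndex = new ArrayList<>();
    final List<String> skipped = new ArrayList<>();

    // Route a file by its already-detected content type.
    void accept(String path, String contentType) {
        if (ARCHIVE_TYPES.contains(contentType)) {
            skipped.add(path); // reported later, never opened
        } else {
            toIndex.add(path);
        }
    }

    public static void main(String[] args) {
        ArchiveSkipper skipper = new ArchiveSkipper();
        skipper.accept("photos.zip", "application/zip");
        skipper.accept("index.html", "text/html");
        System.out.println("Skipped archives: " + skipper.skipped);
        System.out.println("Files to index: " + skipper.toIndex);
    }
}
```

Because archives are never opened, this avoids both the nested-archive recursion and the problem of opening a file inside an archive from the search results.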