List of changes between versions
From version 0.6 onward, python and pip need to be installed on the system.
See more below in the Changes section.
- Windows: https://www.python.org/downloads/windows/
  - NOTE: make sure to check the box that says "Add Python to PATH" so that pip can be found by the server script without having to make any assumptions
- Linux: Use your package manager (e.g. sudo apt install python3 python3-pip)
- Removed the frozen executable from the release files in favor of an Automatic1111-style batch script
  - Even with the plugin manager, installing some dependencies that require actual compilation by invoking pip from within the frozen executable was causing trouble that was non-trivial to fix. For this reason I decided to axe the PyInstaller frozen EXE altogether and go with a batch script that will:
    - Allow the user to more easily set environment variables (a few of the most relevant ones are already set as empty in the script)
    - Create or reuse a virtual environment in a folder venv in the same directory as the script
    - Install the minimum required packages in it to run the server
    - Run the server
- Added a plugin manager to install/uninstall plugins on demand
  - The installed plugins can be controlled via the new version of the Firefox extension or directly using the manage_plugins/ endpoint.
  - The plugins will be installed under $OCT_BASE_DIR/plugins, which by default will be under your user profile (e.g. C:\Users\username\.ocr_translate on Windows). If you have trouble with space under C:\, consider setting the OCT_BASE_DIR environment variable to a different location.
  - The plugin data is stored in a JSON file inside the project (plugins_data.json).
  - Version/Scope/Extras of a package to be installed can be controlled via environment variables OCT_PKG_<package_name(uppercase)>_[VERSION|SCOPE|EXTRAS] (e.g. to change torch to version A.B.C you would set OCT_PKG_TORCH_VERSION="A.B.C"). If the package name contains a - it should be replaced with _min_ in the package name.
  - Removed env variable AUTOCREATE_VALIDATED_MODELS and the related server initialization. Now models are created/activated or deactivated via the plugin manager, when the respective plugin is installed/uninstalled.
- Streamlined docker image to also use the run_server.py script for initialization.
- Added plugin for ollama (https://github.com/ollama/ollama) for translation using LLMs
  - Note that ollama needs to be installed and run separately; the plugin will just make calls to the server.
  - Use the OCT_OLLAMA_ENDPOINT environment variable to specify the endpoint of the ollama server (see the plugin page for more details)
- Added plugin for PaddleOCR (https://github.com/PaddlePaddle/PaddleOCR) (Box and OCR) (seems to work very well with Chinese).
  - The default versions of paddlepaddle installed by the plugin_manager (2.5.2 on Linux and 2.6.1 on Windows) might not work for every system as there can be underlying failures in the C++ code that the plugin uses. The version installed can be controlled using the environment variable OCT_PKG_PADDLEPADDLE_VERSION.
- Added possibility to specify extra DJANGO_ALLOWED_HOSTS and a server bind address via environment variables. (Fixes #30)
- Manual model is not implemented as an entrypoint anymore (will work also without recreating models).
- OCR models can now use a tokenizer and a processor from different models.
- Added caching of the languages and allowed box/ocr/tsl models for faster response times on the handshake endpoint.
- New endpoint run_tsl_xua made to work with XUnity.AutoTranslator (https://github.com/bbepis/XUnity.AutoTranslator)
- Improved API return codes
- Implemented endpoint for manual translation
- Added autocorrect capability to Trie
- Added endpoint for sending allowed options given the loaded models
- Improved admin interface to allow users to more easily add models to the database
- Changed handshake endpoint behavior to send more information required by the extension
- Improved run_server script for better modularity and reporting
- Minor fixes
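The batch-script steps listed above for the 0.6 release (create or reuse a venv folder next to the script, install the minimum packages, run the server) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the project's actual script: the function names, the package list, and the server command are hypothetical placeholders that the caller has to supply.

```python
import os
import subprocess
import venv
from pathlib import Path


def venv_python(venv_dir: Path) -> Path:
    """Return the interpreter path inside a virtual environment."""
    if os.name == "nt":
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def ensure_venv(venv_dir: Path) -> Path:
    """Create the virtual environment if it is missing, otherwise reuse it."""
    python = venv_python(venv_dir)
    if not python.exists():
        venv.EnvBuilder(with_pip=True).create(str(venv_dir))
    return python


def bootstrap(script_dir: Path, packages: list, server_cmd: list) -> int:
    """Hypothetical driver: venv -> pip install -> run server.

    `packages` and `server_cmd` are placeholders; the real script defines
    its own minimum package set and start command.
    """
    python = ensure_venv(script_dir / "venv")
    if packages:
        subprocess.run([str(python), "-m", "pip", "install", *packages], check=True)
    return subprocess.run([str(python), *server_cmd]).returncode
```

Calling pip through the environment's own interpreter (python -m pip) avoids depending on whichever pip happens to be first on PATH.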
Now it is possible to use OCR models that work on a single line.
Before, the pipeline would pass the entire BOX to the OCR model, which would make models trained on single lines spit out nonsensical results.
Now a model can be created with ocr_mode set to merged [default] or single.
If set to single, the non-merged bounding boxes will be passed and the model will afterward stitch the text back together by reasonably ordering the boxes by line/column chunks.
- Modified the API for OCRBoxModel._box_detection: it should now return a list of dictionaries containing 'merged': tuple[int, int, int, int], the merged bounding box, and 'single': list[tuple[int, int, int, int]], a list of the single bounding boxes that have been merged into merged.
- Modified the database models:
  - OCRModel: Added ocr_mode field with possible values: merged [default], single.
  - BBox: Foreign key from_ocr renamed to from_ocr_merged
  - BBox: Added foreign key from_ocr_single
  - BBox: Added foreign key to_merged (points to the merged BBox generated by merging THIS + other boxes)
  - OCRRun: Foreign key result renamed to result_merged (denotes the output was from a merged real/mock run)
  - OCRRun: Added foreign key result_single (denotes the output was from a single run)
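As an illustration of the return shape described above for _box_detection, the sketch below builds one {'merged': ..., 'single': [...]} dictionary per group of boxes. It assumes the groups have already been determined and that each box is ordered as (x_min, y_min, x_max, y_max); the changelog does not state the coordinate order, and the helper names are hypothetical.

```python
Box = tuple  # assumed layout: (x_min, y_min, x_max, y_max)


def group_to_entry(singles: list) -> dict:
    """Build one result entry: the enclosing box plus the boxes it came from."""
    merged = (
        min(b[0] for b in singles),
        min(b[1] for b in singles),
        max(b[2] for b in singles),
        max(b[3] for b in singles),
    )
    return {"merged": merged, "single": list(singles)}


def box_detection_result(groups: list) -> list:
    """Return a list of dictionaries in the 'merged'/'single' shape."""
    return [group_to_entry(g) for g in groups if g]


result = box_detection_result([
    [(0, 0, 10, 5), (8, 2, 20, 9)],   # two line boxes forming one block
    [(30, 30, 40, 40)],               # a block made of a single box
])
```

With ocr_mode set to merged a model would consume the 'merged' entry, while single would consume the 'single' list.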
- Fixed a bug related to Issue #11 where the %userprofile%/.ocr_translate folder was not being properly created by the EXE release if it did not exist.
restore_missing_spaces with no trie (None for that language) was causing exceptions. Now the server will skip this step if the trie for the selected language is not found.
Removed runaway print statements
- All features for box/ocr/tsl have been moved to plugins in separate packages
- Improved pre-parsing of OCRed text for languages with Latin alphabet
- Introduced a way to remove ghost characters generated at the beginning/end of every string
- Introduced Trie capability
  - Can use the trie to detect if an incorrect word ("helloworld") should be split into multiple valid words (["hello", "world"])
- Added English word list/freq file.
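The word-splitting idea mentioned above can be sketched with a small trie and a dynamic-programming pass. This is a minimal illustration, not the project's implementation: the class and function names are hypothetical, word frequencies are ignored, and the split with the fewest words is preferred as a simple tie-breaker.

```python
class Trie:
    """Minimal character trie storing a set of valid words."""

    def __init__(self):
        self.children = {}
        self.is_word = False

    def insert(self, word):
        node = self
        for ch in word:
            if ch not in node.children:
                node.children[ch] = Trie()
            node = node.children[ch]
        node.is_word = True

    def word_ends(self, text, start):
        """Yield every index e such that text[start:e] is a stored word."""
        node = self
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                return
            if node.is_word:
                yield i + 1


def split_words(trie, text):
    """Return a list of valid words covering text, or None if impossible."""
    best = [None] * (len(text) + 1)
    best[len(text)] = []
    # Work backwards so best[end] is known when text[start:end] is tried.
    for start in range(len(text) - 1, -1, -1):
        for end in trie.word_ends(text, start):
            if best[end] is None:
                continue
            candidate = [text[start:end]] + best[end]
            if best[start] is None or len(candidate) < len(best[start]):
                best[start] = candidate
    return best[0]


trie = Trie()
for w in ("hello", "world", "hell", "low", "or"):
    trie.insert(w)
```

Here split_words(trie, "helloworld") yields ["hello", "world"], and a string that cannot be covered by stored words yields None.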
Plugins can now be used to also add models to the database via the following entrypoints:
- ocr_translate.box_data
- ocr_translate.ocr_data
- ocr_translate.tsl_data
The entrypoint should point to a dict with the info to create the model.
See the init of the plugins for an example (note that box/ocr/tsl may need to define different keys).
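For readers unfamiliar with Python entry points, the sketch below shows one way dict objects registered under a group such as ocr_translate.box_data could be discovered with the standard library. It is an assumption about the mechanism, not the server's actual loading code, and load_model_data is a hypothetical helper name.

```python
from importlib.metadata import entry_points


def load_model_data(group: str) -> list:
    """Load every object registered under an entry point group.

    Each entry point is expected to resolve to a dict describing a model.
    """
    eps = entry_points()
    if hasattr(eps, "select"):        # Python 3.10+ selection API
        selected = eps.select(group=group)
    else:                             # Python 3.8/3.9: plain dict of groups
        selected = eps.get(group, [])
    data = []
    for ep in selected:
        obj = ep.load()
        if not isinstance(obj, dict):
            raise TypeError(f"{ep.name} in {group} does not point to a dict")
        data.append(obj)
    return data
```

If no installed package declares the group, the function simply returns an empty list.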
Information about model-specific language codes is now encoded into an iso1_map field of the model.
- Before, new models with custom codes in a plugin would also require editing the main repo and adding a new column to languages in the database.
- Now the plugin can set the lang_code to whatever is closest to the model codes, and overwrite what does not match using iso1_map, by mapping ISO 639-1 codes to the model-specific ones.
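The lookup described above can be sketched as follows. It assumes that lang_code names the language column to fall back on (for example an iso1 column) and that iso1_map is a plain mapping from ISO 639-1 codes to model-specific codes; the dictionary layout, the resolve_model_code helper, and the example codes are hypothetical and only illustrate the override behaviour.

```python
def resolve_model_code(model: dict, language: dict) -> str:
    """Return the code a model should receive for a given language.

    An entry in iso1_map overrides the default taken from the column
    named by lang_code.
    """
    iso1 = language["iso1"]
    override = model.get("iso1_map", {}).get(iso1)
    if override is not None:
        return override
    return language[model["lang_code"]]


# Hypothetical model: mostly uses ISO 639-1, except for one custom code.
model = {"lang_code": "iso1", "iso1_map": {"zh": "ch_sim"}}
chinese = {"iso1": "zh", "iso3": "zho"}
english = {"iso1": "en", "iso3": "eng"}
```

Only the mismatching codes need to be listed in iso1_map; every other language falls through to the lang_code column.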
Tag only, without a release, as the changes still require plugins to be baked in with the installer (they cannot be dynamically added without a hack-ish solution).
Restructured the code to make it pluginable. No change should be noticeable from a user experience point of view, but now it should be much easier to contribute to the code (new functionalities can be introduced by writing a plugin without having to modify this codebase).
- The model entries in the database now require an entrypoint field to identify which model should be used to load them.
- The functionality related to easyocr, tesseract and huggingface models has been moved to the ocr_translate/plugins folder, and these are now plugins (kept in the main codebase to leave an example of how a plugin can work).