RFC: Python 3 and speedup #26
Open
maage wants to merge 20 commits into hellman:master from maage:py3-numpy
Conversation
Commit highlights:
- Accept 0x format (0x00); raise an error if char is empty; handle None in parse_char
- Also, it is string.ascii_letters
- Update charset and routines; drop unused routines
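The parse_char changes in those commits can be illustrated with a minimal sketch. This is not the PR's actual code; the function name matches the commit message, but the exact signature and error messages are assumptions:

```python
def parse_char(ch):
    """Parse a key character given as '0x41', 'A', or an int like 65.

    Illustrative sketch of the behavior the commits describe:
    accept 0x format, raise on an empty char, and pass None through.
    """
    if ch is None:
        return None               # handle None in parse_char
    if isinstance(ch, int):
        return ch                 # already a byte value
    if not ch:
        raise ValueError("empty char")
    if ch.startswith("0x"):
        return int(ch, 16)        # accept 0x format (e.g. "0x00")
    if len(ch) == 1:
        return ord(ch)
    raise ValueError("char must be a single character or a 0x byte")
```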
This was referenced Sep 18, 2019
@hellman bump
I've submitted PR #28 with most of the minor issues, as there are conflicts.
The speedup is nice! It would be good to detect numpy before using it, because requiring numpy as a hard dependency seems like overkill.
I had problems getting xortool to handle all the files in test/data, and I also wanted to use Python 3.
After some initial fixes, I noticed it was somewhat slow, which limited how big a file it could handle.
This PR is a draft, as I've dropped some features and changed some outputs. For example, it now outputs the true byte repr (b'secret') instead of trying to beautify it; that is handy to paste from if you need the value in your Python code. Some output messages are also removed or changed.
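The point of the changed output is that a bytes repr is already valid Python source. A tiny illustration (the byte values here are just an example):

```python
# The recovered plaintext stays as bytes; printing its repr() yields a
# literal like b'secret' that can be pasted straight into Python code.
recovered = bytes([0x73, 0x65, 0x63, 0x72, 0x65, 0x74])
print(repr(recovered))  # b'secret'
```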
To improve performance, and to fix some bugs along the way, I used several approaches:

1. Cleaned up the code to remove issues linters usually warn about (naming and docstring warnings were mostly ignored).
2. Used Python 3 built-in functions where suitable. This should remove some bugs, and there is less generic code to maintain.
3. Used bytes as the internal representation; strings are only used where they are actually needed.
4. Chose numpy as the engine. This XORing is a good target for matrix operations, and it should also keep memory usage somewhat in check.
5. Used generators to lower memory usage further; xortool now writes keys and files as it discovers new keys.
6. Tried to limit the branching factor, as xortool can get stuck on totally random files where multiple characters tie for the top occurrence count. This is not carefully tuned, so you might want to check it; you can still end up filling your disk.
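The numpy idea above can be sketched as follows. This is a minimal illustration of treating repeating-key XOR as a matrix operation, not the PR's actual implementation; the function name and padding strategy are my own:

```python
import numpy as np

def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR data against a repeating key using numpy broadcasting.

    Pads data to a multiple of len(key), reshapes it into rows of
    key-length bytes, XORs every row against the key in one
    vectorized operation, then trims the padding off again.
    """
    keylen = len(key)
    pad = (-len(data)) % keylen
    buf = np.frombuffer(data + b"\x00" * pad, dtype=np.uint8)
    rows = buf.reshape(-1, keylen)
    out = rows ^ np.frombuffer(key, dtype=np.uint8)  # key broadcasts over rows
    return out.tobytes()[: len(data)]
```

Because XOR is its own inverse, applying the same key twice recovers the original data, which makes this easy to sanity-check.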
I can currently discover the key for a file of over 100 MiB, with a known key length, in under 2 minutes of CPU time and about 1.1 GiB of memory. It takes a couple more minutes to write 25 GiB of files out.
For comparison, the original 'xortool -b -l 65 test/data/ls_xored' takes more time than that, while my version has the results within a couple of seconds.
Because my algorithm is subtly different, you can get different keys than with the old version when there is no single winning key. Mostly this is due to the branching-factor limitation.
I've tested this only on Linux (Fedora 30) with:
python3-3.7.4-1.fc30.x86_64
python3-numpy-1.16.4-2.fc30.x86_64