
RFC: Python 3 and speedup #26

Open · wants to merge 20 commits into master
Conversation

@maage commented Sep 18, 2019

I had problems getting xortool to handle all the files in test/data, and I also wanted to use Python 3.

After the initial fixes, I noticed it was somewhat slow and limited in how big a file it could handle.

This PR is a draft, as I've dropped some features and changed some outputs. For example, the key is now printed as a true byte repr (b'secret') instead of being beautified; that is handy if you need to paste it into your own Python code. Some other output messages have also been removed or changed.

To achieve better performance, and to fix some bugs along the way, I've made several changes:

1. I cleaned up the code to remove issues linters usually warn about (naming and docstring warnings were mostly ignored).
2. I used built-in Python 3 functions where suitable. This removes some bugs, and there is less generic code to maintain.
3. I switched to bytes as the internal representation; strings are only used where they are actually needed.
4. I adopted numpy: this kind of XORing is a good target for vectorized array operations, and it also keeps memory usage somewhat in check (see the sketch after this list).
5. I used generators to lower memory usage: xortool now writes keys and files as it discovers new keys.
6. I limited the branching factor, since xortool can get stuck on totally random files where several characters are tied for the maximum occurrence count. This is not carefully tuned, so you might want to check it; you can still end up filling your disk.
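Roughly, the numpy part looks like this (a simplified sketch of the idea, not the exact code in the patch; the function names are just for illustration):

```python
import numpy as np

def xor_repeat(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key in one vectorized operation."""
    buf = np.frombuffer(data, dtype=np.uint8)
    # np.resize repeats the key until it matches the buffer length.
    k = np.resize(np.frombuffer(key, dtype=np.uint8), buf.shape)
    return (buf ^ k).tobytes()

def top_byte_per_column(data: bytes, keylen: int) -> bytes:
    """Most frequent ciphertext byte at each key position."""
    buf = np.frombuffer(data, dtype=np.uint8)
    # Reshape so that column i holds every byte encrypted with key[i].
    buf = buf[: len(buf) - len(buf) % keylen].reshape(-1, keylen)
    return bytes(int(np.bincount(col, minlength=256).argmax()) for col in buf.T)
```

Both operate on the whole file at once instead of looping byte-by-byte in Python, which is where most of the speedup comes from.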

I can currently discover the key for a file of over 100 MiB, with a known key length, in under 2 minutes of CPU time and about 1.1 GiB of memory. It takes a couple of minutes more to write the 25 GiB of output files.

For comparison, the original `xortool -b -l 65 test/data/ls_xored` takes more time than that, while my version has the results within a couple of seconds.

Because my algorithm is subtly different, you can get different keys than with the old version when there is no single winning key. Mostly this is due to the branching-factor limitation.
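To make the branching concrete: every key position where several bytes tie for the top frequency multiplies the number of candidate keys, so random data can explode combinatorially. The capped version works roughly like this (a sketch only; the names and the cap value are illustrative, not the actual tuned code):

```python
from itertools import product

MAX_PER_POSITION = 2  # illustrative cap, not the tuned value

def candidate_keys(column_freqs, most_frequent_plain=0x20):
    """Yield candidate keys lazily; ties multiply the branching factor."""
    per_pos = []
    for freqs in column_freqs:  # freqs: 256 counts for one key position
        top = max(freqs)
        ties = [b ^ most_frequent_plain for b, c in enumerate(freqs) if c == top]
        per_pos.append(ties[:MAX_PER_POSITION])  # cap the branching here
    for combo in product(*per_pos):
        yield bytes(combo)
```

Because this is a generator, keys can be written out as they are discovered instead of being collected in memory first.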

I've tested this only on Linux (Fedora 30) with:

- python3-3.7.4-1.fc30.x86_64
- python3-numpy-1.16.4-2.fc30.x86_64

@noraj commented Oct 24, 2019

@hellman bump

@hellman marked this pull request as ready for review October 25, 2019
@maage (Author) commented Nov 2, 2019

I've submitted PR #28 with most of the minor issues, since there are conflicts here.
I still have this numpy patch, but it is now against #28 and is mostly unchanged from this one; I've just moved all the small, unrelated pieces out of the numpy work.
Latest numpy patch

@hellman (Owner) commented Jan 9, 2020

The speedup is nice! It would be good to detect numpy before using it, though, because having numpy as a hard dependency seems like overkill.
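Something like the usual optional-import pattern would keep numpy optional (a sketch, with a pure-Python fallback):

```python
try:
    import numpy as np
except ImportError:
    np = None  # fall back to pure Python below

def xor_repeat(data: bytes, key: bytes) -> bytes:
    if np is not None:
        buf = np.frombuffer(data, dtype=np.uint8)
        k = np.resize(np.frombuffer(key, dtype=np.uint8), buf.shape)
        return (buf ^ k).tobytes()
    # Pure-Python fallback: slower, but no extra dependency.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
```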

@bee-san mentioned this pull request Jun 12, 2021