Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

check-manifest became super slow #141

Open
jugmac00 opened this issue Apr 7, 2021 · 7 comments
Open

check-manifest became super slow #141

jugmac00 opened this issue Apr 7, 2021 · 7 comments

Comments

@jugmac00
Copy link
Contributor

jugmac00 commented Apr 7, 2021

For e.g. https://github.com/jugmac00/flask-reuploaded (but also for my other projects), I notice an increased runtime for the checks.

The biggest jump was from version 0.40 to 0.41, but it looks like it is getting slower with each release...

I do run check-manifest via pre-commit, but I could also reproduce the problem with a standalone version of check-manifest.

I only had a quick look and looks like the pyproject.toml support could have introduced the degraded performance. The runtime difference between 0.45 and 0.46 can be hardly explained by the changelog.

no check-manifest

❯ time pre-commit run --all-files
isort....................................................................Passed
Debug Statements (Python)................................................Passed
flake8...................................................................Passed
pyupgrade................................................................Passed
check-python-versions....................................................Passed
bandit...................................................................Passed

real    0m1,113s
user    0m1,760s
sys     0m0,245s



0.40

❯ time pre-commit run --all-files
isort....................................................................Passed
Debug Statements (Python)................................................Passed
flake8...................................................................Passed
pyupgrade................................................................Passed
check-python-versions....................................................Passed
check-manifest...........................................................Passed
bandit...................................................................Passed

real    0m1,577s


0.41

❯ time pre-commit run --all-files
isort....................................................................Passed
Debug Statements (Python)................................................Passed
flake8...................................................................Passed
pyupgrade................................................................Passed
check-python-versions....................................................Passed
check-manifest...........................................................Passed
bandit...................................................................Passed

real    0m4,366s
user    0m4,471s
sys     0m0,511s


0.42

❯ time pre-commit run --all-files
isort....................................................................Passed
Debug Statements (Python)................................................Passed
flake8...................................................................Passed
pyupgrade................................................................Passed
check-python-versions....................................................Passed
check-manifest...........................................................Passed
bandit...................................................................Passed

real    0m4,401s
user    0m4,557s
sys     0m0,456s

0.43

❯ time pre-commit run --all-files
isort....................................................................Passed
Debug Statements (Python)................................................Passed
flake8...................................................................Passed
pyupgrade................................................................Passed
check-python-versions....................................................Passed
check-manifest...........................................................Passed
bandit...................................................................Passed

real    0m4,743s
user    0m4,808s
sys     0m0,503s

0.44

❯ time pre-commit run --all-files
isort....................................................................Passed
Debug Statements (Python)................................................Passed
flake8...................................................................Passed
pyupgrade................................................................Passed
check-python-versions....................................................Passed
check-manifest...........................................................Passed
bandit...................................................................Passed

0.45

❯ time pre-commit run --all-files
isort....................................................................Passed
Debug Statements (Python)................................................Passed
flake8...................................................................Passed
pyupgrade................................................................Passed
check-python-versions....................................................Passed
check-manifest...........................................................Passed
bandit...................................................................Passed

real    0m8,464s
user    0m8,068s
sys     0m0,878s

0.46

❯ time pre-commit run --all-files
isort....................................................................Passed
Debug Statements (Python)................................................Passed
flake8...................................................................Passed
pyupgrade................................................................Passed
check-python-versions....................................................Passed
check-manifest...........................................................Passed
bandit...................................................................Passed

real    0m10,674s
user    0m10,093s
sys     0m1,118s

tests without pre-commit

0.40

❯ time check-manifest 
lists of files in version control and sdist match

real	0m0,440s
user	0m0,388s
sys	0m0,055s

0.41

❯ time check-manifest 
lists of files in version control and sdist match

real	0m3,311s
user	0m2,848s
sys	0m0,245s

0.45

❯ time check-manifest 
lists of files in version control and sdist match

real	0m9,664s
user	0m8,324s
sys	0m0,812s

0.46

❯ time check-manifest 
lists of files in version control and sdist match

real	0m10,041s
user	0m8,671s
sys	0m0,875s

So, the changelog maybe is not really complete - as even without using pre-commit there is a performance degradation between 0.45 and 0.46 which only changelog entry says pre-commit now uses Python 3.

@mgedmin
Copy link
Owner

mgedmin commented Apr 7, 2021

I don't know what to say. All other changes in 0.46 were trivial cleanups/refactorings and the switch from Travis CI to GitHub Actions.

I wonder if it would be useful to have check-manifest print the times it took to build the sdists in verbose mode, but of course that wouldn't help the comparisons.

I suppose you could use strace with timings enabled...

@jugmac00
Copy link
Contributor Author

jugmac00 commented Apr 8, 2021

Thanks for your reply!

I never used strace before, but this looks not so good :-/

22:45:00 read(4, "lists of files in version contro"..., 4096) = 50
22:45:09 read(4, 0x1848090, 4096)       = -1 EIO (Input/output error)

Then I learned about the new option --no-build-isolation ( https://github.com/mgedmin/check-manifest#version-control-integration )

Then it looks like

22:52:28 read(4, "lists of files in version contro"..., 4096) = 50
22:52:30 read(4, 0x1512db0, 4096)       = -1 EIO (Input/output error)

Way better!

You can reproduce this with https://github.com/jugmac00/flask-reuploaded

Actually, I am happy again with check-manifest taking 2 s, but I am pretty curious what you could tell me about the above strace outputs.

@mgedmin
Copy link
Owner

mgedmin commented Apr 9, 2021

Huh, it looks like you're stracing pre-commit rather than check-manifest. I was hoping to see the clone()/execve()/wait() from check-manifest invoking the actual sdist build thing (python setup.py sdist or the pep-517 thing I forget what it's called).

If you pass -f to strace, it'll trace all the child processes. I have a script to make it easier figure out which pid maps to which process: https://pypi.org/project/strace-process-tree/

Anyway, it looks like you've identified that the main slowdown is due to using isolated PEP-517 builds. And as for 0.45 -> 0.46 regression, is there any chance it can be explained by measurement errors (due to CPU frequency scaling or something)?

@mgedmin
Copy link
Owner

mgedmin commented Apr 9, 2021

I am pretty curious what you could tell me about the above strace outputs.

You see pre-commit reading check-manifest's output from a pipe (fd 4). The timestamp indicates when the read() syscall started, you might want to also pass -T to strace to see how long each syscall took to run.

I cannot explain why the next read returns EIO rather than 0 (for EOF) or EAGAIN (in case pre-commit uses nonblocking I/O). Were there any syscalls between the two read() calls? Are you on Linux?

@jugmac00
Copy link
Contributor Author

jugmac00 commented Apr 14, 2021

Hi Marius,
thanks for taking the time to give me a quick introduction to strace.

I repeated the strace with -f directly for check-manifest which gave me 60k lines of output oO
https://gist.github.com/jugmac00/ee8068afa2f30be57de44401e7ab474e

I lack both the knowledge and the time to get anything useful out of the output.

As I found a working solution ( --no-build-isolation), I came here to close the issue, as I do not want you to dig deeper, as for reasons I seem to be the only one affected.

Just when I wanted to close the issue, I saw that someone else linked to this issue - see above my comment.

Up to you what you do with this issue - if you want me to do perform any tests, just say so, but I am afraid I do not have the time to learn to interpret the output of strace currently.

@mgedmin
Copy link
Owner

mgedmin commented Apr 14, 2021

Heh, yes, strace files are something. I wrote strace-process-tree so I wouldn't have to interpret them myself.

Now strace-process-tree doesn't like this format, but after some careful massaging with vim to have a pid on every line and avoid 'strace: Process NNN attached' messages breaking clone(... lines in half, I get:

8550 check-manifest [11.2s @0.0s]
  ├─8551 /bin/sh -c "uname -p 2> /dev/null" [0.0s @0.7s]
  │   └─8552 uname -p [0.0s @0.7s]
  ├─8553 git ls-files -z [0.0s @0.8s]
  ├─8554 /home/jugmac00/.local/pipx/venvs... -m build --sdist . --outdir /tmp/check-manifest-ol8rlbqi-sdi... [5.2s @0.9s]
  │   ├─8555 /home/jugmac00/.local/pipx/venvs... -m pip install --prefix /tmp/build-env-qwm98rzv --ignore-installed --no-warn-script-location -r /tmp/build-reqs-blpa_v85.txt [3.8s @1.3s]
  │   │   ├─8556 /sbin/ldconfig -p [@2.3s]
  │   │   ├─8556 /sbin/ldconfig.real -p [0.0s @2.3s]
  │   │   ├─8557 lsb_release -a [0.3s @2.4s]
  │   │   │   └─8558 dpkg-query -f '${Version} ${Provides}
  │   │   │             ' -W lsb-core lsb-cxx lsb-graphics lsb-desktop lsb-languages lsb-multimedia lsb-printing lsb-security [0.1s @2.6s]
  │   │   ├─8559 uname -rs [0.0s @2.7s]
  │   │   └─8560 /bin/sh -c "uname -p 2> /dev/null" [0.0s @2.7s]
  │   │       └─8561 uname -p [0.0s @2.7s]
  │   └─8564 /tmp/build-env-qwm98rzv/bin/pyth... /home/jugmac00/.local/pipx/venvs... build_sdist /tmp/tmpbr_cffjf [0.8s @5.1s]
  │       └─8565 /bin/sh -c "uname -p 2> /dev/null" [0.0s @5.5s]
  │           └─8566 uname -p [0.0s @5.5s]
  └─8567 /home/jugmac00/.local/pipx/venvs... -m build --sdist . --outdir /tmp/check-manifest-km_njyqc-sdi... [5.0s @6.1s]
      ├─8568 /home/jugmac00/.local/pipx/venvs... -m pip install --prefix /tmp/build-env-250p6syu --ignore-installed --no-warn-script-location -r /tmp/build-reqs-es7qklvl.txt [3.5s @6.5s]
      │   ├─8569 /sbin/ldconfig -p [@7.4s]
      │   ├─8569 /sbin/ldconfig.real -p [0.0s @7.4s]
      │   ├─8570 lsb_release -a [0.2s @7.5s]
      │   │   └─8571 dpkg-query -f '${Version} ${Provides}
      │   │             ' -W lsb-core lsb-cxx lsb-graphics lsb-desktop lsb-languages lsb-multimedia lsb-printing lsb-security [0.1s @7.7s]
      │   ├─8572 uname -rs [0.0s @7.7s]
      │   └─8573 /bin/sh -c "uname -p 2> /dev/null" [0.0s @7.8s]
      │       └─8574 uname -p [0.0s @7.8s]
      └─8575 /tmp/build-env-250p6syu/bin/pyth... /home/jugmac00/.local/pipx/venvs... build_sdist /tmp/tmp24jy2wxz [1.0s @10.0s]
          └─8576 /bin/sh -c "uname -p 2> /dev/null" [0.0s @10.5s]
              └─8577 uname -p [0.0s @10.5s]

As you can see, there are two python -m build --sdist ... invocations that take 5.2 seconds and 5.0 seconds respectively, and then there's a second of overhead in check-manifest itself, most of it at the very beginning, before even the git ls-files invocation.

That second of overhead is something that I would be interested in optimizing, if I had the time.

The two python -m build --sdist invocations could be done in parallel, which might speed things up (assuming enough CPU cores and no fighting over I/O bandwidth). If I had time.

@tpvasconcelos
Copy link

Unfortunately, I don't have the time to look into this right now but wanted to say that I have also experienced this same issue. As you can see below I am getting similar results to @jugmac00 for version 0.47

$ time check-manifest --version
check-manifest version 0.47
check-manifest --version  0.17s user 0.06s system 98% cpu 0.235 total

$ time check-manifest
lists of files in version control and sdist match
check-manifest  8.31s user 2.35s system 92% cpu 11.560 total

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants