Add the possibility to process files in batches #6312

Open
wants to merge 1 commit into
base: main

carlos-granados
Contributor

If you run Rector in CI, or have several machines available, and your code base is large, you may want to split the files to process into several batches, each of which can run on a different machine. Since Rector processes each file individually and independently of all other files, the list of files to process can be split easily.

This PR implements two new command line options, batch-index and batch-total, that let you split a Rector run into several batches. The batches are expected to run in parallel, not consecutively.
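To make the idea concrete, here is a minimal sketch of how a file list could be divided by a batch index and a batch total. It is not the code from this PR: the function name `select_batch`, the 0-based index, and the round-robin assignment over a sorted list are all assumptions made for illustration.

```python
from typing import List, Sequence


def select_batch(files: Sequence[str], batch_index: int, batch_total: int) -> List[str]:
    """Return the files that belong to one batch.

    Assumptions (not taken from the PR): batch_index is 0-based, and files
    are dealt out round-robin after sorting, so every machine derives the
    same split from the same file list.
    """
    if batch_total < 1:
        raise ValueError("batch_total must be at least 1")
    if not 0 <= batch_index < batch_total:
        raise ValueError("batch_index must be in [0, batch_total)")
    ordered = sorted(files)
    # Take every batch_total-th file, starting at position batch_index.
    return ordered[batch_index::batch_total]
```

Sorting first matters in this sketch: each machine builds its own file list, and a deterministic order is what keeps the batches from overlapping.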

As a demonstration of how this could work, this PR temporarily adds a new rule to the Rector configuration and runs it in CI both as a single run and in parallel batches. The results are similar to this:

Screenshot 2024-09-18 at 11 50 03

As you can see, the single run takes 1m06s, while the individual batches take between 21s and 41s, so running in batches saves about 25s. That is probably not worth it in this case, but if you have a very large code base where Rector takes 4m to run, you could easily split it into 4 runs of around 1m30s each, saving a good amount of time on the total run.
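The saving quoted above follows from the fact that a parallel run finishes when its slowest batch finishes. A small sketch of that arithmetic, using only the fastest and slowest batch times mentioned here (the helper name `parallel_saving` is made up for this example):

```python
from typing import List


def parallel_saving(single_run_s: int, batch_runs_s: List[int]) -> int:
    """Seconds saved when all batches run at the same time.

    The parallel run ends when the slowest batch ends, so the saving is
    the single-run time minus the longest batch time.
    """
    return single_run_s - max(batch_runs_s)


# Figures quoted above: 1m06s for the single run, 21s and 41s for the
# fastest and slowest batch.
print(parallel_saving(66, [21, 41]))  # 25
```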

As you can see in the example run, the issues found in the single run match the sum of the issues found in the individual batch runs.
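That match is what you would expect when the batches form a partition of the file list: every file lands in exactly one batch. A hedged sketch of such a check, with a hypothetical helper `is_partition` that is not part of this PR:

```python
from typing import Iterable, List


def is_partition(all_files: Iterable[str], batches: List[List[str]]) -> bool:
    """Check that every file appears in exactly one batch."""
    seen = [path for batch in batches for path in batch]
    # No file may be assigned to two batches (or twice to one batch).
    no_overlap = len(seen) == len(set(seen))
    # Together the batches must cover the full file list, and nothing else.
    complete = set(seen) == set(all_files)
    return no_overlap and complete
```

A check like this could serve as a unit test for whatever splitting strategy is used.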

@staabm
Contributor

staabm commented Sep 18, 2024

Is this command meant for dry-runs only?

I wonder whether the results will be consistent when the actions modify the sources and commit them. Would that need a rebase or something similar to fast-forward?

@carlos-granados
Contributor Author

@staabm given that every batch works on different files, there should be no conflicts between the commits. The only difference from a single run is that you get several commits instead of one.

@carlos-granados
Contributor Author

@staabm to test running batches without dry-run mode, I created a new branch and added a commit action. I had to use the EndBug/add-and-commit action because the stefanzweifel/git-auto-commit-action used by Rector does not support pulling before pushing, so it cannot work in parallel.
You can see the run here: https://github.com/carlos-granados/rector-src/actions/runs/10930549614
And the added commits here: https://github.com/carlos-granados/rector-src/commits/test-batch-commits/
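For anyone wondering why pulling before pushing matters here: when several batches push to the same branch, a push is rejected if another batch pushed first, so each job has to replay its commit on top of the updated branch and try again. The following is only a sketch of that idea, not how either action is implemented; `push_with_rebase`, `run_git`, and the retry limit are names and choices made up for this example.

```python
import subprocess
from typing import Callable, List


def run_git(args: List[str]) -> int:
    """Run a git command in the current directory and return its exit code."""
    return subprocess.run(["git", *args]).returncode


def push_with_rebase(
    branch: str,
    run: Callable[[List[str]], int] = run_git,
    max_attempts: int = 5,
) -> bool:
    """Push the local commit, rebasing onto the remote branch before each try."""
    for _ in range(max_attempts):
        # Replay the local commit on top of whatever other batches pushed.
        if run(["pull", "--rebase", "origin", branch]) != 0:
            return False  # pull failed, e.g. a rebase conflict
        if run(["push", "origin", f"HEAD:{branch}"]) == 0:
            return True
        # Push rejected: another batch pushed in between, so try again.
    return False
```

Because each batch touches different files, the rebase step should apply cleanly, which is why the result is several small commits rather than conflicts.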

@carlos-granados
Contributor Author

@TomasVotruba I just rebased this branch. Any interest in this code? Just wanted to know if I need to keep updating it. Cheers!
