Add the possibility to process files in batches #6312
If you run Rector in CI, or have several machines available to run it, and your code base is large, you may want to split the processing of your files into several batches, each of which can run on a different machine. Since Rector processes each file individually and independently of all other files, the list of files to process can easily be split.
This PR implements two new command line options, `batch-index` and `batch-total`, that allow you to split the Rector run into several batches. These batches are expected to be run in parallel, not consecutively.
As a demonstration of how this could work, this PR temporarily adds a new rule to the Rector configuration and runs it both as a single run and in parallel batches in CI. The results are similar to this:
As you can see, the single run takes 1m06s while each batch takes between 21s and 41s, so running in batches saves ~25s of wall-clock time (the parallel run is only as slow as its slowest batch). That is probably not worth it in this case, but if you have a very large code base where Rector takes 4m to run, you could easily split it into 4 runs of about 1:30 each, saving a good amount of time on the total run.
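The saving quoted above follows from the fact that parallel batches finish when the slowest one does. A minimal sketch of that arithmetic, using only the durations mentioned here (66s for the single run, 21s and 41s as the fastest and slowest batches; the durations of any other batches are not given, and the function name is hypothetical):

```python
def parallel_saving(single_run_seconds, batch_seconds):
    """Return the wall-clock seconds saved by running batches in parallel.

    Batches run side by side, so the parallel run lasts as long as
    the slowest batch.
    """
    if not batch_seconds:
        raise ValueError("at least one batch duration is required")
    parallel_wall_time = max(batch_seconds)
    return single_run_seconds - parallel_wall_time


# 1m06s single run vs. batches bounded by 21s and 41s.
print(parallel_saving(66, [21, 41]))  # prints 25
```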
As the example run also shows, the issues found in the single run match the sum of the issues found in the individual batch runs.
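To illustrate why the batch results add up to the single-run result, here is a minimal sketch of one way a file list could be partitioned by a batch index and a batch total. It is not taken from this PR's code: the function name `select_batch`, the zero-based index, and the round-robin strategy are all assumptions made for illustration.

```python
def select_batch(files, batch_index, batch_total):
    """Return the files belonging to one batch (hypothetical helper).

    Assumes a zero-based batch_index and a round-robin split over the
    sorted file list, so every machine computes the same partition.
    """
    if batch_total < 1:
        raise ValueError("batch_total must be at least 1")
    if not 0 <= batch_index < batch_total:
        raise ValueError("batch_index must be in [0, batch_total)")
    ordered = sorted(files)  # stable order across machines
    return ordered[batch_index::batch_total]


files = ["a.php", "b.php", "c.php", "d.php", "e.php"]
print(select_batch(files, 0, 2))  # ['a.php', 'c.php', 'e.php']
print(select_batch(files, 1, 2))  # ['b.php', 'd.php']
```

Because every file lands in exactly one batch, the union of all batches is the full file list, which is the property that lets the per-batch findings sum to the single-run findings.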