Add the possibility to process files in batches #6312

carlos-granados · 2024-09-18T10:06:44Z

If you are running Rector in CI, or if you have several machines available to run it, and you have a large code base, you may want to split the processing of your files into several batches, each of which can be run on a different machine. Since Rector processes each file individually and independently of all other files, we can easily split the list of files to process.

This PR implements two new command line options, batch-index and batch-total, that allow you to split a Rector run into several batches. These batches are expected to be run in parallel, not consecutively.
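To illustrate the idea, here is a minimal Python sketch of one way a file list could be split deterministically across batches. It is not the code from this PR: the function name `select_batch`, the 0-based index, and the round-robin assignment are all assumptions made for the example. The point it shows is that every machine can compute its own share from the same sorted list, and that the shares are disjoint and together cover every file.

```python
from typing import List, Sequence


def select_batch(files: Sequence[str], batch_index: int, batch_total: int) -> List[str]:
    """Return the files assigned to one batch.

    Hypothetical helper: batch_index is assumed to be 0-based here,
    which may differ from the convention used by the actual options.
    """
    if batch_total < 1:
        raise ValueError("batch_total must be at least 1")
    if not 0 <= batch_index < batch_total:
        raise ValueError("batch_index must be in the range [0, batch_total)")

    # Sort first so that every machine derives the same ordering
    # from the same set of files.
    ordered = sorted(files)

    # Round-robin assignment: the file at position i goes to batch i % batch_total.
    return ordered[batch_index::batch_total]


if __name__ == "__main__":
    files = ["src/B.php", "src/A.php", "src/C.php", "tests/D.php", "src/E.php"]
    for index in range(3):
        print(index, select_batch(files, index, 3))
```

Round-robin assignment is used here rather than contiguous chunks because it tends to spread files from the same directory across batches, but either approach gives disjoint batches.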

As a demonstration of how this could work, this PR temporarily adds a new rule to the rector configuration and runs it both as a single run and in parallel batches in CI. The results are similar to this:

As you can see, the single run takes 1m06s while the individual batches take between 21s and 41s, so running in batches saves ~25s of wall-clock time. That is probably not worth it for this case, but if you have a very large code base where Rector takes 4m to run, you could easily split it into 4 runs of about 1:30 minutes each, saving a good amount of time on the total run.
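The arithmetic behind these figures can be sketched as follows: when batches run in parallel, the elapsed time is set by the slowest batch, so the saving is the single-run time minus the longest batch time. The function name is hypothetical, and only the fastest and slowest batch times quoted above are used, since the other batch times are not stated.

```python
from typing import Sequence


def wall_clock_saving(single_run_s: int, batch_times_s: Sequence[int]) -> int:
    """Seconds saved when batches run in parallel instead of as one run.

    Parallel batches finish when the slowest one finishes, so only the
    maximum batch time matters.
    """
    return single_run_s - max(batch_times_s)


if __name__ == "__main__":
    # Figures quoted in the description: single run 1m06s, batches 21s to 41s.
    print(wall_clock_saving(66, [21, 41]))  # 25

    # Hypothetical large code base: 4m single run, 4 batches of about 1m30s.
    print(wall_clock_saving(240, [90, 90, 90, 90]))  # 150
```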

As you can see in the example run, the issues found in the single run match the sum of the issues found in the individual batch runs.

staabm · 2024-09-18T18:39:39Z

Is this command meant for dry-runs only?

I wonder whether the results will be consistent in case the actions modify the sources and commit them? Would it need rebase and stuff to fast forward etc?

carlos-granados · 2024-09-18T19:38:15Z

@staabm given that every batch would be working on different files, there should be no conflict between the commits. The only difference from a single run is that you would have several commits instead of a single one.

carlos-granados · 2024-09-18T21:47:13Z

@staabm to test running the batches without dry-run mode, I created a new branch and added a commit action. I had to use the EndBug/add-and-commit action because the stefanzweifel/git-auto-commit-action action used by Rector does not support pulling before pushing and so cannot work in parallel.
You can see the run here https://github.com/carlos-granados/rector-src/actions/runs/10930549614
And the added commits here: https://github.com/carlos-granados/rector-src/commits/test-batch-commits/

carlos-granados · 2024-11-06T15:55:10Z

@TomasVotruba I just rebased this branch. Any interest in this code? Just wanted to know if I need to keep updating it. Cheers!

Add the possibility to process files in batches

3b272dd

carlos-granados force-pushed the batch-processing branch from 404775e to 3b272dd Compare November 6, 2024 15:53
