Refactor entity matching name cleaner to be more efficient #3953
Overview
As part of the SEC to EIA record linkage development, I had to make some changes to the PUDL company name cleaning module to make it more efficient and useful. The code for this module was originally pulled from OS Climate's repo, but it was no longer maintained there. I didn't make significant changes when I pulled out that module, and thus it had some quirks and inefficiencies.
What problem does this address?
What did you change?
Instead of using apply to apply the regex replacement rules, I used pd.Series.replace so that this replacement is vectorized.
CompanyNameCleaner class
Documentation
Make sure to update relevant aspects of the documentation.
Tasks
Testing
How did you make sure this worked? How can a reviewer verify this?
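One way a reviewer might sanity-check the switch from apply to pd.Series.replace is to confirm that both approaches give the same output on a small sample. The sketch below is only illustrative: the RULES dictionary, the function names, and the sample names are hypothetical and are not the actual rules or API of the PUDL name cleaning module.

```python
import re

import pandas as pd

# Hypothetical cleaning rules (regex pattern -> replacement), applied in order.
# These stand in for the real rules, which are not shown in this PR description.
RULES = {
    r"[^\w\s]": " ",  # replace punctuation with a space
    r"\s+": " ",      # collapse runs of whitespace into one space
}


def clean_with_apply(names: pd.Series) -> pd.Series:
    """Row-by-row version: call re.sub on each name inside Series.apply."""

    def clean_one(name: str) -> str:
        for pattern, repl in RULES.items():
            name = re.sub(pattern, repl, name)
        return name.strip()

    return names.apply(clean_one)


def clean_vectorized(names: pd.Series) -> pd.Series:
    """Vectorized version: one Series.replace call per rule."""
    for pattern, repl in RULES.items():
        names = names.replace(pattern, repl, regex=True)
    return names.str.strip()


names = pd.Series(["Acme,  Power & Light", "  Duke-Energy Corp.  "])

# Both approaches should produce identical cleaned names.
assert clean_with_apply(names).tolist() == clean_vectorized(names).tolist()
print(clean_vectorized(names).tolist())
```

Looping over the rules and calling replace once per rule keeps the order of the substitutions explicit, which matters when a later rule (such as whitespace collapsing) depends on the output of an earlier one.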
To-do list