phileas-benchmark results show that email address detection is more CPU intensive (and requires more memory & stack space) than other regex-based filters.
The current regex is known to be pretty expensive, so would it make sense to have a "relaxed" version that performs better without trading off too much accuracy?
@jzonthemtn I'm looking at a few regex variations that show better performance, but I need to do some more testing to see how accuracy is affected in the data I have available.
One interesting bit, though: the email address filter currently does not use the \b...\b fencing that many of the regex-based filters use. Wrapping the current email address regex in \b...\b roughly doubles performance on its own. I think that makes sense, since the word boundaries limit where a match can start and end, which reduces how greedy some of those matches will be.
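To illustrate what the \b...\b fencing changes, here is a minimal sketch using a hypothetical, simplified email pattern (it is not the regex Phileas actually ships, and the class and method names are made up for illustration). With `\b` at the front, any match attempt that starts in the middle of a word fails immediately at the first boundary check instead of consuming characters and backtracking, which is one plausible source of the speedup described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailFencing {

    // Hypothetical, simplified email pattern for illustration only;
    // this is NOT the regex used by Phileas's email address filter.
    static final String EMAIL = "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}";

    // Unfenced: the engine may begin a match attempt at any index,
    // including positions in the middle of a word.
    static final Pattern UNFENCED = Pattern.compile(EMAIL);

    // Fenced: \b requires a word/non-word transition at both ends, so
    // attempts that start mid-word fail at the very first check.
    static final Pattern FENCED = Pattern.compile("\\b" + EMAIL + "\\b");

    // Collects every non-overlapping match of the pattern in the text.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Contact bob.smith@example.com or alice@mail.example.org today.";
        System.out.println("unfenced: " + findAll(UNFENCED, text));
        System.out.println("fenced:   " + findAll(FENCED, text));
    }
}
```

On ordinary prose like this, both patterns find the same two addresses; the difference is how much work is wasted on failed attempts. Fencing can also change results at the edges (for example, an address immediately followed by `_` is no longer matched), which is the accuracy side of the tradeoff being tested here. Measuring the actual performance difference would need a proper benchmark harness rather than this example.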
👆 Since we're also discussing use of \b from a confidence standpoint (in #120), I thought this was kinda neat to see how much the \b...\b fencing plays into performance too.
* #131 Adding option to email filter for just email addresses with valid TLDs.
* #131 Adding property to docs.
* #121 Adding strict email option to docs.
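The commits above don't show how the valid-TLD check works, so the following is only a sketch of one way such a post-match check could look. The class name, method name, and the tiny hard-coded TLD set are all hypothetical; a real implementation would load the full list of TLDs published by IANA rather than a handful of entries.

```java
import java.util.Locale;
import java.util.Set;

public class TldCheck {

    // Hypothetical, deliberately tiny allow-list for illustration only.
    // A real check would use IANA's full list of top-level domains.
    static final Set<String> KNOWN_TLDS = Set.of("com", "org", "net", "edu", "io", "uk");

    // Returns true if the label after the last '.' in the domain part
    // (i.e. after the '@') is a known TLD, compared case-insensitively.
    static boolean hasValidTld(String email) {
        int at = email.lastIndexOf('@');
        int dot = email.lastIndexOf('.');
        // Reject candidates with no '@', no '.' after the '@', or a trailing '.'.
        if (at < 0 || dot < at || dot == email.length() - 1) {
            return false;
        }
        String tld = email.substring(dot + 1).toLowerCase(Locale.ROOT);
        return KNOWN_TLDS.contains(tld);
    }

    public static void main(String[] args) {
        System.out.println(hasValidTld("bob@example.com"));
        System.out.println(hasValidTld("bob@example.notatld"));
    }
}
```

Running the check after the regex match, rather than encoding every TLD in the regex itself, keeps the pattern small, which matters given the performance concerns raised in this issue.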