Improve confidence estimation for credit card numbers #120
Comments
The first case turns out to be easy to solve with
With
👆 no changes to Phileas required to solve this first part
That is really awesome. Do you think it would be beneficial to include that ignored pattern as an option in the filter profile, just to keep the user from having to set it manually? There could be a boolean on
Well, I'm applying this
I agree. Wrote #130 to capture it separately from this issue.
The changes proposed in Sorry this turned out to be a multi-part issue, I'll try to keep things more atomic ⚛️
I'm using Phileas to redact logging data, and see two interesting patterns that result in false positives on credit cards.
Interestingly, Luhn checks, while certainly helpful, do not appear to be sufficient to stop random data from being flagged as a credit card: roughly 5% of UUID or timestamp fields may contain a digit sequence with a valid Luhn checksum.
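To illustrate why the checksum alone is a weak signal, here is a minimal Python sketch of the Luhn check. It is not Phileas code, and the function name `luhn_valid` is made up for this example. For any fixed prefix, exactly one of the ten possible final digits passes, so an arbitrary digit run of card-like length has about a 1 in 10 chance of validating.

```python
def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    if not digits.isdigit():
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


# For a fixed 15-digit prefix, count how many final digits pass the check.
prefix = "411111111111111"
passing = [d for d in range(10) if luhn_valid(prefix + str(d))]
print(passing)
```

Since the check cannot separate real card numbers from Luhn-valid noise, the remaining signal has to come from context, which is what the two proposals below rely on.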
The solution to the first case could be reducing credit card confidence when the matched value falls within an expected timestamp range (for example, from one year in the past to three months in the future). I haven't done the math, but for a reasonably small time range that seems like a small number of Luhn-valid values to exclude.
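A rough sketch of that idea in Python follows. It is only an illustration under stated assumptions, not how Phileas works: the function names, the 0.5 penalty, and the guess that the candidate is an epoch value in milliseconds, microseconds, or nanoseconds (13, 16, or 19 digits for present-day dates) are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: present-day epoch values have 13 digits in milliseconds,
# 16 in microseconds and 19 in nanoseconds, which overlaps card lengths.
UNIT_DIVISORS = {13: 1_000, 16: 1_000_000, 19: 1_000_000_000}


def looks_like_recent_timestamp(digits, now,
                                past=timedelta(days=365),
                                future=timedelta(days=90)):
    """True if the digit string, read as an epoch value, is near `now`."""
    divisor = UNIT_DIVISORS.get(len(digits))
    if divisor is None or not digits.isdigit():
        return False
    seconds = int(digits) / divisor
    return (now - past).timestamp() <= seconds <= (now + future).timestamp()


def adjust_confidence(digits, confidence, now, penalty=0.5):
    """Scale confidence down when the match could be a recent timestamp."""
    if looks_like_recent_timestamp(digits, now):
        return confidence * penalty
    return confidence


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
# 1700000000000000 microseconds is in November 2023, inside the window.
print(adjust_confidence("1700000000000000", 0.9, now))
# A typical test card number maps far outside the window and is untouched.
print(adjust_confidence("4111111111111111", 0.9, now))
```

Passing `now` in explicitly keeps the window deterministic and easy to test; a real filter would presumably take the window bounds from configuration.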
The solution to the second case could be reducing credit card confidence when the match is found within the context of a larger string. Confidence in phone numbers is reduced if the phone number is embedded within a larger string, and we've found this extremely helpful in eliminating false positives. It would be very helpful if credit card filtering had a similar behavior.
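As a sketch of what such a check might look like (in Python, and again hypothetical rather than Phileas's actual phone number logic), the match can be treated as embedded when the character directly before or after it is alphanumeric or a joiner such as `-` or `_`. The joiner set and the 0.5 penalty are assumptions chosen for illustration.

```python
JOINERS = "-_"  # assumption: characters that glue a match into a longer token


def is_embedded(text, start, end):
    """True if text[start:end] is directly attached to a larger token."""
    before = text[start - 1] if start > 0 else ""
    after = text[end] if end < len(text) else ""
    return any(c.isalnum() or (c != "" and c in JOINERS)
               for c in (before, after))


def adjust_confidence(text, start, end, confidence, penalty=0.5):
    """Scale confidence down for matches embedded in a larger string."""
    return confidence * penalty if is_embedded(text, start, end) else confidence


number = "4111111111111111"
for line in ("card 4111111111111111 accepted",
             "trace=ab4111111111111111cd"):
    start = line.index(number)
    print(line, adjust_confidence(line, start, start + len(number), 0.9))
```

Whitespace and ordinary punctuation leave the confidence unchanged, so a card number written on its own in a log message is still reported at full confidence.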
Unfortunately there is no obvious or easy workaround, but improved confidence estimation for credit cards seems generally useful, since detecting and redacting credit card numbers is a universal requirement for PII engines.