-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
job: Sanitize instead of validate the job names #16
Conversation
if mangled || len(runes) > maxNameLength { | ||
// Name was mangled or is too long, truncate and append hash. | ||
const hashLen = 10 | ||
hash := fmt.Sprintf("%x", sha256.Sum256([]byte(name))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might want to avoid some "bad" words here, c. f. the cilium/cilium fix in cilium/cilium#35031 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are you referring to? What's a bad word?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was talking about https://github.com/cilium/cilium/pull/35031/files#diff-383201c3e15adcbd860745a3ce7762a5bcc54e5e94a9d7e57283abd20358f708R35-R55, but I'm not sure it's super relevant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe a non-ascii test case would give some more confidence, unicode and length calculations are always a bit iffy
e.g. I'm not sure è and e` are one rune or not, but in our case they'd be replaced by either one or two underscores 🤷
if mangled || len(runes) > maxNameLength { | ||
// Name was mangled or is too long, truncate and append hash. | ||
const hashLen = 10 | ||
hash := fmt.Sprintf("%x", sha256.Sum256([]byte(name))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was talking about https://github.com/cilium/cilium/pull/35031/files#diff-383201c3e15adcbd860745a3ce7762a5bcc54e5e94a9d7e57283abd20358f708R35-R55, but I'm not sure it's super relevant
{ | ||
name: "long name", | ||
in: strings.Repeat("a", maxNameLength*2), | ||
out: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-c2a908d98f", | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
possibly worth adding some non-ascii unicode char tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Do we plan to triage the mangled names somehow? If we are not logging anything how do you think we could find all the occurrences?
job/job.go
Outdated
case r >= '0' && r <= '9': | ||
case r == '-' || r == '_': | ||
default: | ||
// Replace unallowed characters with underscores. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about replacing only space
with _
and simply ignore the other characters?
in: "X$%^&Y",
out: "X____Y-4af3ba2ac8",
I don't really see much value in just having lots of _
appended together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah good point, I'll do that!
runes := []rune(name) | ||
mangled := false | ||
for i, r := range runes[:min(maxNameLength, len(name))] { | ||
switch { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparing with the previous regex, now we are allowing names to start with numbers and contain only numbers, though it's kind-a weird but acceptable. no action needed for this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the implementation was much simpler this way so though that might as well allow these.
hash := fmt.Sprintf("%x", sha256.Sum256([]byte(name))) | ||
newLen := min(maxNameLength-hashLen, len(runes)) | ||
runes = runes[:newLen] | ||
return string(runes) + "-" + hash[:hashLen] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could potentially prefix with something to highlight that this name is mangled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the suffix is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently, the suffix is not highlighting that the name was mangled because the name is invalid, I was thinking to something like **-mangled-HASH
The job names may come from many different sources and it is too much of a foot-gun to reject the names by panicing. Instead to keep the names within a nice constraint set of characters and length, sanitize the names silently. Related PR that had to work around the panics: cilium/cilium#35031 Signed-off-by: Jussi Maki <[email protected]>
2a126de
to
416460f
Compare
No. We'll fix them as necessary. Right now we're not really caring about the job names anywhere. |
The job names may come from many different sources and it is too much of a foot-gun to reject the names by panicing.
Instead to keep the names within a nice constraint set of characters and length, sanitize the names silently.
Related PR that had to work around the panics:
cilium/cilium#35031