Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

job: Sanitize instead of validate the job names #16

Merged
merged 1 commit into from
Sep 26, 2024

Conversation

joamaki
Copy link
Contributor

@joamaki joamaki commented Sep 25, 2024

The job names may come from many different sources and it is too much of a foot-gun to reject the names by panicing.

Instead to keep the names within a nice constraint set of characters and length, sanitize the names silently.

Related PR that had to work around the panics:
cilium/cilium#35031

@joamaki joamaki requested a review from a team as a code owner September 25, 2024 11:23
@joamaki joamaki requested review from bimmlerd and removed request for a team September 25, 2024 11:23
if mangled || len(runes) > maxNameLength {
// Name was mangled or is too long, truncate and append hash.
const hashLen = 10
hash := fmt.Sprintf("%x", sha256.Sum256([]byte(name)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to avoid some "bad" words here, c. f. the cilium/cilium fix in cilium/cilium#35031 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are you referring to? What's a bad word?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

job/job.go Show resolved Hide resolved
job/job.go Show resolved Hide resolved
job/job.go Show resolved Hide resolved
Copy link
Member

@bimmlerd bimmlerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a non-ascii test case would give some more confidence, unicode and length calculations are always a bit iffy
e.g. I'm not sure è and e` are one rune or not, but in our case they'd be replaced by either one or two underscores 🤷

if mangled || len(runes) > maxNameLength {
// Name was mangled or is too long, truncate and append hash.
const hashLen = 10
hash := fmt.Sprintf("%x", sha256.Sum256([]byte(name)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{
name: "long name",
in: strings.Repeat("a", maxNameLength*2),
out: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-c2a908d98f",
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possibly worth adding some non-ascii unicode char tests

job/job.go Show resolved Hide resolved
@ovidiutirla ovidiutirla self-requested a review September 26, 2024 07:44
Copy link
Contributor

@ovidiutirla ovidiutirla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Do we plan to triage the mangled names somehow? If we are not logging anything how do you think we could find all the occurrences?

job/job.go Outdated
case r >= '0' && r <= '9':
case r == '-' || r == '_':
default:
// Replace unallowed characters with underscores.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about replacing only space with _ and simply ignore the other characters?

in:   "X$%^&Y",
out:  "X____Y-4af3ba2ac8",

I don't really see much value in just having lots of _ appended together.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah good point, I'll do that!

runes := []rune(name)
mangled := false
for i, r := range runes[:min(maxNameLength, len(name))] {
switch {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing with the previous regex, now we are allowing names to start with numbers and contain only numbers, though it's kind-a weird but acceptable. no action needed for this comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the implementation was much simpler this way so though that might as well allow these.

job/job.go Outdated Show resolved Hide resolved
hash := fmt.Sprintf("%x", sha256.Sum256([]byte(name)))
newLen := min(maxNameLength-hashLen, len(runes))
runes = runes[:newLen]
return string(runes) + "-" + hash[:hashLen]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could potentially prefix with something to highlight that this name is mangled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the suffix is enough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently, the suffix is not highlighting that the name was mangled because the name is invalid, I was thinking to something like **-mangled-HASH

The job names may come from many different sources and
it is too much of a foot-gun to reject the names by panicing.

Instead to keep the names within a nice constraint set of
characters and length, sanitize the names silently.

Related PR that had to work around the panics:
cilium/cilium#35031

Signed-off-by: Jussi Maki <[email protected]>
@joamaki
Copy link
Contributor Author

joamaki commented Sep 26, 2024

LGTM

Do we plan to triage the mangled names somehow? If we are not logging anything how do you think we could find all the occurrences?

No. We'll fix them as necessary. Right now we're not really caring about the job names anywhere.

@joamaki joamaki merged commit aa37668 into main Sep 26, 2024
1 check passed
@joamaki joamaki deleted the pr/joamaki/sanitize-job-name branch September 26, 2024 13:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants