Extend functionality of Wandb Config Diff script #687

Merged on Oct 30, 2024 (12 commits)

Conversation

kyleclo (Contributor) commented Aug 2, 2024

  • Add tests for flatten_dict() in utils
  • Extend functionality for flatten_dict() to also flatten any dicts that exist in Lists
  • Extend the wandb config comparison script to use the extended flatten_dict()

Motivation: when comparing configs, the current implementation skips some key parts of them, namely the keys representing dataset paths (all of type List[str]) and keys like config["evaluators.value"], which are List[Dict].

The current behavior looks something like this:
image

where we can see that the fields data.value.paths and evaluators.value aren't easily comparable.

The new behavior looks like this:
image
where it preserves the original script's behavior for the old keys, but performs a specialized diff just for data.paths and evaluators.

For data.paths, the idea is that we only need to know the dataset names taken from the paths and how many files are used. I added a new function that does specifically that.
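
As a rough illustration of the data.paths idea, here is a minimal sketch that groups file paths by dataset and counts the files in each group. It is not the function added in this PR: the name `summarize_paths`, the example paths, and the choice to treat each file's parent directory as its "dataset name" are all assumptions made for this example.

```python
import posixpath
from collections import Counter
from typing import Dict, List


def summarize_paths(paths: List[str]) -> Dict[str, int]:
    """Count how many files fall under each parent directory.

    Assumption: the parent directory of a file stands in for the dataset
    name. The real script may derive the name differently.
    """
    # posixpath is used so that URL-style paths split on "/" on every OS.
    return dict(Counter(posixpath.dirname(p) for p in paths))


# Hypothetical paths, for illustration only.
example_paths = [
    "s3://bucket/data/wiki/part-0.npy",
    "s3://bucket/data/wiki/part-1.npy",
    "s3://bucket/data/books/part-0.npy",
]
summary = summarize_paths(example_paths)
print(summary)
```

Comparing two such summaries shows which datasets were added, removed, or changed in file count, without listing every individual path.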

…nt compare wandb config script to also flatten list dicts
@kyleclo kyleclo requested a review from dirkgr August 2, 2024 09:41
olmo/util.py (outdated)
    }
    if len(keys_with_differences) > 0:
        for k in sorted(keys_with_differences):
            if isinstance(left_config[k], list) and isinstance(right_config[k], list):
Member

You also don't need this if you treat lists as Mapping[int, Any]. And it will work right even if the list entries are complex. On the other hand, the output will look different / be less compact.

Member

Can you simplify this now that lists are properly flattened?

Member

Actually, it seems this will never happen now?
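
To make the point of this thread concrete, here is a minimal sketch of why a list-specific branch becomes unnecessary once lists are treated as Mapping[int, Any]. It is not the code from this PR: `expand_lists` and `differing_keys` are hypothetical names, and the configs are made-up examples. It only handles lists of simple values; nested entries would need a recursive flatten.

```python
from typing import Any, Dict, Set

_MISSING = object()  # sentinel so a missing key differs from an explicit None


def expand_lists(flat_config: Dict[str, Any], separator: str = ".") -> Dict[str, Any]:
    """Give every list element its own key, e.g. 'data.paths.0'."""
    expanded: Dict[str, Any] = {}
    for key, value in flat_config.items():
        if isinstance(value, list):
            for index, item in enumerate(value):
                expanded[f"{key}{separator}{index}"] = item
        else:
            expanded[key] = value
    return expanded


def differing_keys(left: Dict[str, Any], right: Dict[str, Any]) -> Set[str]:
    """Return the keys whose values differ, including keys present on one side only."""
    left, right = expand_lists(left), expand_lists(right)
    return {
        key
        for key in left.keys() | right.keys()
        if left.get(key, _MISSING) != right.get(key, _MISSING)
    }


# Hypothetical configs, for illustration only.
left_config = {"data.paths": ["a.npy", "b.npy"], "optimizer.lr": 0.1}
right_config = {"data.paths": ["a.npy", "c.npy", "d.npy"], "optimizer.lr": 0.1}
print(sorted(differing_keys(left_config, right_config)))
# prints ['data.paths.1', 'data.paths.2']
```

After expansion no value is a list any more, so an isinstance(..., list) check on the compared values can never be true. The trade-off mentioned in the first comment also shows up: each differing list element gets its own output line, which is less compact.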

@kyleclo kyleclo requested a review from dirkgr October 10, 2024 22:49
dictionary (dict): The nested dictionary to be flattened.
parent_key (str, optional): The parent key to be prepended to the keys of the flattened dictionary. Defaults to "".
separator (str, optional): The separator to be used between the parent key and the keys of the flattened dictionary. Defaults to ".".
include_lists (bool, optional): Whether to convert lists to dictionaries with integer keys. Defaults to False.
Member

Do we ever want to turn this off?

Contributor Author

Not for now, but it seems fine to keep this option in case someone wants to add different logic for dealing with list-valued config params?
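
For readers following the include_lists discussion, here is a minimal sketch of a flatten function with the parameters named in the docstring above. It is an illustration written for this page, not the implementation in olmo/util.py, and the example config is hypothetical.

```python
from typing import Any, Dict, Mapping


def flatten_dict(
    dictionary: Mapping[Any, Any],
    parent_key: str = "",
    separator: str = ".",
    include_lists: bool = False,
) -> Dict[str, Any]:
    """Flatten nested mappings into a single level of separator-joined keys."""
    flat: Dict[str, Any] = {}
    for key, value in dictionary.items():
        new_key = f"{parent_key}{separator}{key}" if parent_key else str(key)
        if include_lists and isinstance(value, list):
            # Treat the list as a mapping from index to element.
            value = dict(enumerate(value))
        if isinstance(value, Mapping):
            # Note: an empty mapping (or empty list) contributes no keys here.
            flat.update(flatten_dict(value, new_key, separator, include_lists))
        else:
            flat[new_key] = value
    return flat


# Hypothetical config, for illustration only.
config = {
    "evaluators": [{"label": "a", "type": "lm"}, {"label": "b"}],
    "data": {"paths": ["x", "y"]},
    "lr": 0.1,
}
print(flatten_dict(config, include_lists=True))
```

With include_lists=False the list values are kept whole under "evaluators" and "data.paths", which is the case where a caller would need its own logic for list-valued params.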

components = path.split("/")

# Remove common suffixes like 'allenai'
components = [c for c in components if c != "allenai"]
Member

You don't think that's confusing? For copy and paste?

Contributor Author

removed

Comment on lines 109 to 110

return
Member

Suggested change
return

dirkgr (Member) commented Oct 22, 2024

You let me know when this is ready for another look?

@kyleclo kyleclo requested a review from dirkgr October 28, 2024 17:33
@kyleclo kyleclo merged commit 837a4ff into main Oct 30, 2024
12 checks passed
@kyleclo kyleclo deleted the kylel/config-diff branch October 30, 2024 06:25
2 participants