Extend functionality of Wandb Config Diff script #687

kyleclo · 2024-08-02T09:41:43Z

Add tests for flatten_dict() in utils
Extend flatten_dict() to also flatten any dicts nested inside lists
Extend the wandb config comparison script to use the extended flatten_dict()

Motivation: when comparing configs, the current implementation does not compare some key parts of the configs, namely the keys representing dataset paths (which are all List[str]) and keys like config["evaluators.value"], which are List[Dict].

The current behavior looks something like this:

where we can see that the fields data.value.paths and evaluators.value aren't easily comparable.

The new behavior looks like this:

where it preserves the original script's behavior for the old keys, but performs a specialized diff just for data.paths and evaluators.

For data.paths, the idea is that we only need to know the names of the datasets from the paths plus how many files are used. Added a new function that does exactly that.
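As a rough illustration of the data.paths summary described above (not the function added in this PR), the sketch below groups paths by their parent directory and counts the files in each group. The name `summarize_data_paths` and the rule "dataset name = parent directory of the file" are assumptions made for this example; the real script may derive dataset names differently.

```python
import posixpath
from collections import Counter
from typing import Dict, List


def summarize_data_paths(paths: List[str]) -> Dict[str, int]:
    """Hypothetical helper: map each dataset to the number of files it contributes.

    Assumption: the dataset is identified by the directory that contains the file.
    """
    counts: Counter = Counter()
    for path in paths:
        # posixpath works for URL-style paths such as s3://bucket/dir/file.npy
        counts[posixpath.dirname(path)] += 1
    return dict(counts)


left = summarize_data_paths(
    ["s3://b/wiki/part-0.npy", "s3://b/wiki/part-1.npy", "s3://b/books/part-0.npy"]
)
right = summarize_data_paths(["s3://b/wiki/part-0.npy"])
print(left)   # {'s3://b/wiki': 2, 's3://b/books': 1}
print(right)  # {'s3://b/wiki': 1}
```

Comparing the two summaries then reduces to comparing two small dicts instead of two long lists of file paths.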

olmo/util.py

dirkgr · 2024-08-02T14:27:21Z

scripts/compare_wandb_configs.py

+    }
+    if len(keys_with_differences) > 0:
+        for k in sorted(keys_with_differences):
+            if isinstance(left_config[k], list) and isinstance(right_config[k], list):


You also don't need this if you treat lists as Mapping[int, Any]. And it will work right even if the list entries are complex. On the other hand, the output will look different / be less compact.


dirkgr · 2024-10-12T00:41:17Z

olmo/util.py

+        dictionary (dict): The nested dictionary to be flattened.
+        parent_key (str, optional): The parent key to be prepended to the keys of the flattened dictionary. Defaults to "".
+        separator (str, optional): The separator to be used between the parent key and the keys of the flattened dictionary. Defaults to ".".
+        include_lists (bool, optional): Whether to convert lists to dictionaries with integer keys. Defaults to False.


Do we ever want to turn this off?

Not for now, but it seems fine to keep this option in case someone wants to add different logic for dealing with list-valued config params?
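For readers following this thread, here is a minimal sketch of what the `include_lists` flag could look like. The parameter names follow the docstring quoted above; the function name `flatten_dict_sketch` and the body are assumptions for illustration, not the code in olmo/util.py.

```python
from typing import Any, Dict


def flatten_dict_sketch(
    dictionary: Dict[Any, Any],
    parent_key: str = "",
    separator: str = ".",
    include_lists: bool = False,
) -> Dict[str, Any]:
    """Illustrative only: flatten nested dicts, optionally treating lists as int-keyed dicts."""
    flat: Dict[str, Any] = {}
    for key, value in dictionary.items():
        new_key = f"{parent_key}{separator}{key}" if parent_key else str(key)
        nested = value
        if include_lists and isinstance(value, list):
            # View the list as a mapping from index to element.
            nested = dict(enumerate(value))
        if isinstance(nested, dict) and nested:
            flat.update(flatten_dict_sketch(nested, new_key, separator, include_lists))
        else:
            # Leaves (and empty containers) are stored unchanged.
            flat[new_key] = value
    return flat


config = {"data": {"paths": ["a", "b"]}, "evaluators": [{"label": "x"}]}
print(flatten_dict_sketch(config))
# {'data.paths': ['a', 'b'], 'evaluators': [{'label': 'x'}]}
print(flatten_dict_sketch(config, include_lists=True))
# {'data.paths.0': 'a', 'data.paths.1': 'b', 'evaluators.0.label': 'x'}
```

With the flag off, list values stay opaque, which is the old behavior; with it on, every list element gets its own comparable key.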

dirkgr · 2024-10-12T00:44:22Z

scripts/compare_wandb_configs.py

+    }
+    if len(keys_with_differences) > 0:
+        for k in sorted(keys_with_differences):
+            if isinstance(left_config[k], list) and isinstance(right_config[k], list):


Can you simplify this now that lists are properly flattened?

dirkgr · 2024-10-12T00:47:43Z

scripts/compare_wandb_configs.py

+    components = path.split("/")
+
+    # Remove common suffixes like 'allenai'
+    components = [c for c in components if c != "allenai"]


You don't think that's confusing? For copy and paste?

dirkgr · 2024-10-12T00:48:09Z

scripts/compare_wandb_configs.py

+
+    return


Suggested change

return

dirkgr · 2024-10-12T00:52:50Z

scripts/compare_wandb_configs.py

+    }
+    if len(keys_with_differences) > 0:
+        for k in sorted(keys_with_differences):
+            if isinstance(left_config[k], list) and isinstance(right_config[k], list):


Actually, it seems this will never happen now?
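To make the reviewer's point concrete: once both configs are flattened with lists expanded into indexed keys, no value is a list any more, so a plain key-by-key comparison is enough and the `isinstance(..., list)` branch is never reached. The sketch below assumes already-flattened inputs; `diff_flat_configs` is a hypothetical name, and only `keys_with_differences` is taken from the diff above.

```python
from typing import Any, Dict, Set, Tuple


def diff_flat_configs(
    left_config: Dict[str, Any], right_config: Dict[str, Any]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Illustrative only: compare two flat configs key by key."""
    left_keys, right_keys = set(left_config), set(right_config)
    left_only = left_keys - right_keys
    right_only = right_keys - left_keys
    # Shared keys whose scalar values differ.
    keys_with_differences = {
        k for k in left_keys & right_keys if left_config[k] != right_config[k]
    }
    return left_only, right_only, keys_with_differences


left_flat = {"data.paths.0": "a", "data.paths.1": "b", "evaluators.0.label": "x"}
right_flat = {"data.paths.0": "a", "evaluators.0.label": "y"}
left_only, right_only, keys_with_differences = diff_flat_configs(left_flat, right_flat)
for k in sorted(keys_with_differences):
    print(f"{k}: {left_flat[k]} -> {right_flat[k]}")
```

The trade-off noted in the earlier comment still applies: the output has one line per list element, so it is less compact than a dedicated list diff.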

dirkgr · 2024-10-22T21:43:03Z

You let me know when this is ready for another look?

add test for flatten dict; extend flatten dict to handle lists; augment compare wandb config script to also flatten list dicts

027bcc3

kyleclo requested a review from dirkgr August 2, 2024 09:41

dirkgr requested changes Aug 2, 2024

kyleclo added 4 commits August 7, 2024 11:14

new flatten

792de79

Merge branch 'main' into kylel/config-diff

62c4d1d

Merge branch 'main' into kylel/config-diff

f824795

add richer diff functionality between configs

147013f

kyleclo requested a review from dirkgr October 10, 2024 22:49

kyleclo added 4 commits October 10, 2024 15:50

broken imports

85b8422

Merge branch 'main' into kylel/config-diff

8a93dc8

linting; mypy

9c89418

changelog

500e943

dirkgr requested changes Oct 12, 2024

get rid of simplify path

3b037a4

cleanup script; add more visible diff for evaluators

ed77860

kyleclo requested a review from dirkgr October 28, 2024 17:33

pylint

f2c2a15

dirkgr approved these changes Oct 29, 2024

kyleclo merged commit 837a4ff into main Oct 30, 2024
12 checks passed

kyleclo deleted the kylel/config-diff branch October 30, 2024 06:25
