Add ability to request specific dimension data. #35
WORK IN PROGRESS
Description
This PR adds the ability to request specific dimension data. For example, given this call to get_data() (which has the renamed date_range and group_by parameters), note the fields specified for the group_by and filters parameters:

The df variable will be assigned a data frame like the following (which is in the long format from #32), noting the extra columns for ACCESS ID, ORCID, and Globus ID:

| ACCESS ID | ORCID | Globus ID |
| --- | --- | --- |
| access-example1 | 1234-5678-1234-5678 | globus-example1 |
| access-example2 | 1234-5678-1234-5679 | globus-example2 |
If the group_by parameter is just given a single string, e.g.:

Then the only dimension columns in the data frame will be ID and Label.
Similarly, if the filters parameter does not have a field specified, e.g.:

Then the Label field will be used for filtering. Otherwise, it will use the specified field for filtering, as in the first example above.
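The fallback to Label described here can be illustrated in plain Python. This is a minimal sketch, not the PR's implementation: the function name filter_rows, the row layout, and the label values are all hypothetical, and the real filters syntax accepted by get_data() is not reproduced.

```python
# Hypothetical sketch: filter_rows and the sample rows are illustrative
# only and are not part of the PR or the library's API.
DEFAULT_FIELD = "Label"


def filter_rows(rows, values, field=None):
    """Keep rows whose `field` value is in `values`.

    When no field is given, fall back to the Label field.
    """
    field = DEFAULT_FIELD if field is None else field
    wanted = set(values)
    return [row for row in rows if row.get(field) in wanted]


# Made-up dimension values; the ORCID strings reuse the placeholders above.
people = [
    {"ID": "1", "Label": "Example, One", "ORCID": "1234-5678-1234-5678"},
    {"ID": "2", "Label": "Example, Two", "ORCID": "1234-5678-1234-5679"},
]

# No field specified: matches against Label.
by_label = filter_rows(people, ["Example, One"])

# Field specified: matches against ORCID instead.
by_orcid = filter_rows(people, ["1234-5678-1234-5679"], field="ORCID")
```

The point of the sketch is only the default: callers who pass no field keep filtering by label, while callers who name a field filter by that field.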
Otherwise, group_by can take a dictionary with a single key, where the key is the dimension's ID or label. The value can be a collection (as in the first example above) or a single string, e.g.:

In which case the only dimension columns in the data frame will be ID and ORCID.
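The accepted shapes of group_by (a plain string, or a single-key dictionary whose value is a string or a collection) could be normalized along these lines. This is a sketch under stated assumptions rather than the PR's code: normalize_group_by is a hypothetical helper, "User" is a placeholder dimension name, and the choice to always include ID while including Label only when it is requested is inferred from the description above.

```python
from collections.abc import Mapping


def normalize_group_by(group_by):
    """Return (dimension, columns) for a string or single-key dict.

    Hypothetical helper: the name and return shape are assumptions.
    """
    if isinstance(group_by, str):
        # Plain string: only the ID and Label columns are produced.
        return group_by, ["ID", "Label"]
    if isinstance(group_by, Mapping):
        if len(group_by) != 1:
            raise ValueError("group_by dict must have exactly one key")
        dimension, fields = next(iter(group_by.items()))
        # A single string is treated as a one-element collection.
        if isinstance(fields, str):
            fields = [fields]
        columns = ["ID"]  # assumed to always be present
        for field in fields:
            if field not in columns:
                columns.append(field)
        return dimension, columns
    raise TypeError("group_by must be a string or a single-key dict")


single = normalize_group_by("User")
one_field = normalize_group_by({"User": "ORCID"})
many_fields = normalize_group_by(
    {"User": ("ACCESS ID", "ORCID", "Globus ID")}
)
```

Here one_field yields the ID and ORCID columns, matching the single-string case described above, and many_fields yields ID plus the three requested fields.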
The data frame returned by the get_dimension_metadata() method will now include a column listing the additional dimension fields that can be used for grouping or filtering, e.g.:

The get_dimension_data() method will now have an additional fields parameter that allows specifying which fields to include in the resulting data frame, as in:

The resulting df will have this structure:

| ACCESS ID | ORCID |
| --- | --- |
| access-example1 | 1234-5678-1234-5678 |
| access-example2 | 1234-5678-1234-5679 |
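The effect of a fields argument on the returned columns can be sketched with plain dictionaries. This is illustrative only: select_fields is a hypothetical stand-in rather than get_dimension_data() itself, the records and label values are made up, and returning exactly the requested fields (without an added ID column) is an assumption based on the structure shown above.

```python
def select_fields(records, fields=None):
    """Project each record onto the requested fields.

    Hypothetical helper: defaults to ID and Label when fields is omitted.
    """
    if fields is None:
        fields = ("ID", "Label")
    elif isinstance(fields, str):
        fields = (fields,)
    return [{field: rec.get(field) for field in fields} for rec in records]


# Made-up records reusing the placeholder values from this description.
records = [
    {
        "ID": "1",
        "Label": "Example, One",
        "ACCESS ID": "access-example1",
        "ORCID": "1234-5678-1234-5678",
    },
    {
        "ID": "2",
        "Label": "Example, Two",
        "ACCESS ID": "access-example2",
        "ORCID": "1234-5678-1234-5679",
    },
]

requested = select_fields(records, fields=("ACCESS ID", "ORCID"))
default = select_fields(records)
```

With fields given, each row carries only ACCESS ID and ORCID; with fields omitted, each row carries only ID and Label.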
If the fields parameter is not given, then only the ID and Label fields will be included.

Motivation and Context
Some entities have multiple IDs associated with them depending on the context. This PR makes it easier to work with such entities in the Data Analytics Framework.
Tests performed
Types of changes
Checklist:
CHANGELOG.md has been updated
docs/developing.md) produces no errors
xdmod-notebooks repository as necessary, and the notebooks all run successfully