-
-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(grouping): Store grouping-method-specific metadata in GroupHashMetadata
#80534
base: master
Are you sure you want to change the base?
Conversation
346966d
to
3088a38
Compare
0dca61a
to
496a70f
Compare
3088a38
to
17feb79
Compare
496a70f
to
4e19a22
Compare
Codecov ReportAttention: Patch coverage is ✅ All tests successful. No failed tests found.
Additional details and impacted files@@ Coverage Diff @@
## master #80534 +/- ##
=======================================
Coverage 78.42% 78.42%
=======================================
Files 7211 7211
Lines 319706 319795 +89
Branches 44004 44027 +23
=======================================
+ Hits 250719 250800 +81
- Misses 62599 62604 +5
- Partials 6388 6391 +3 |
17feb79
to
9851620
Compare
4e19a22
to
3d60849
Compare
9851620
to
9dc1640
Compare
3d60849
to
74249cb
Compare
9dc1640
to
1484bfd
Compare
74249cb
to
226bc12
Compare
1484bfd
to
15764da
Compare
226bc12
to
9e7fc6e
Compare
15764da
to
8000c32
Compare
9e7fc6e
to
5364c58
Compare
|
||
# TODO: This (and `contributing_variant` below) are typed as `Any` so that we don't have to cast | ||
# them to whatever specific subtypes of `BaseVariant` and `GroupingComponent` (respectively) | ||
# each of the helper calls below requires. Casting once, to a type retrieved from a look-up, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the way to do this as written is to type narrow within the various if functions. e.g. on the first condition:
if hash_basis == HashBasis.STACKTRACE:
if not isinstance(
contributing_component,
StacktraceGroupingComponent
| ExceptionGroupingComponent
| ChainedExceptionGroupingComponent
| ThreadsGroupingComponent,
):
raise ValueError(
"Contributing component is not a StacktraceGroupingComponent for stacktrace basis"
)
if not isinstance(contributing_variant, ComponentVariant):
raise ValueError("Contributing variant is not a ComponentVariant for stacktrace basis")
hashing_metadata = _get_stacktrace_hashing_metadata(
contributing_variant, contributing_component
)
I would argue for replacing conditional with polymorphism as an alternative, but understand that given existing grouping code, this way is easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yeah... something like this, probably?
class BaseHashBasis[
VariantType: BaseVariant,
ComponentType: BaseGroupingComponent[Any] | None,
MetadataType: HashingMetadata,
]:
name: str
variants: dict[str, BaseVariant]
contributing_variant: VariantType
contributing_component: ComponentType | None
def __init__(
self,
variants: dict[str, BaseVariant],
contributing_variant: VariantType,
contributing_component: ComponentType | None = None,
):
self.variants = variants
self.contributing_variant = contributing_variant
self.contributing_component = contributing_component
if contributing_component:
self.subcomponents_by_id = {
subcomponent.id: subcomponent for subcomponent in contributing_component.values
}
def get_hashing_metadata(self) -> HashingMetadata:
raise NotImplementedError()
class TemplateHashBasis(
BaseHashBasis[ComponentVariant, TemplateGroupingComponent, TemplateHashingMetadata]
):
name: str = "template"
def get_hashing_metadata(self):
metadata: TemplateHashingMetadata = {}
if self.subcomponents_by_id["filename"].values:
metadata["template_name"] = self.subcomponents_by_id["filename"].values[0]
if self.subcomponents_by_id["context-line"].values:
metadata["template_context_line"] = self.subcomponents_by_id["context-line"].values[0]
return metadata
def get_hash_basis_and_metadata(
event: Event, project: Project, variants: dict[str, BaseVariant]
) -> tuple[HashBasis, HashingMetadata]:
# Skipping some of the details below, but you get the idea
contributing_variant = ...
contributing_component = ...
hash_basis, hash_basis_type = GROUPING_METHODS_BY_DESCRIPTION[method_description]
hashing_metadata = hash_basis_type(
variants, contributing_variant, contributing_component
).get_hashing_metadata()
return hash_basis, hashing_metadata
Would the typechecker be any happier with that, though? Specifically, wouldn't I still have to type contributing_variant
and contributing_component
as Any
in order to pass them to the TemplateHashBasis
/etc constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking moreso putting formatting methods on the variant / component classes, and aiming to remove hash-basis as its own enum if possible. i don't have time right now to go into details, so please feel free to ignore, or look at the type narrowing using isinstance
. whatever's easiest and can help get this merged faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the PR looks good. included one idea to fix some of the Any typing. Could in the future look to replace more of the conditional logic with methods on the various Variant
and Component
classes. I have not reviewed the snapshots, and will do so tomorrow to spot check outputs. it feels kind of weird that there's no regular tests of this, but i assume in grouping code we let snapshots cover things generally.
The problem with putting it there is there's not a nice 1-on-1 correspondence between hash basis and either of those class families. This is from one of my attempts along the way: VARIANT_AND_COMPONENT_TYPES_BY_HASH_BASIS = {
HashBasis.STACKTRACE: (
ComponentVariant,
StacktraceGroupingComponent
| ExceptionGroupingComponent
| ChainedExceptionGroupingComponent
| ThreadsGroupingComponent,
),
HashBasis.MESSAGE: (
ComponentVariant,
MessageGroupingComponent | ExceptionGroupingComponent | ChainedExceptionGroupingComponent,
),
HashBasis.FINGERPRINT: (
CustomFingerprintVariant | SaltedComponentVariant,
None,
),
HashBasis.SECURITY_VIOLATION: (
ComponentVariant,
CSPGroupingComponent
| ExpectCTGroupingComponent
| ExpectStapleGroupingComponent
| HPKPGroupingComponent,
),
HashBasis.TEMPLATE: (ComponentVariant, TemplateGroupingComponent),
HashBasis.CHECKSUM: (ChecksumVariant, None),
HashBasis.FALLBACK: (BaseVariant, None),
} You'd end up still needing conditionals, just in the various classes' methods, wouldn't you? Like, (Putting it on components has the same problem - |
9fa314a
to
d8bd096
Compare
86258ea
to
8e461b2
Compare
d8bd096
to
17b2bb9
Compare
8e461b2
to
50ad1ab
Compare
17b2bb9
to
bb4227a
Compare
50ad1ab
to
1f35e4b
Compare
bb4227a
to
6ff5709
Compare
1f35e4b
to
d3b2e51
Compare
6ff5709
to
f777dc8
Compare
d3b2e51
to
9084f15
Compare
f777dc8
to
b9e323d
Compare
9084f15
to
4cb506d
Compare
…able (#80531) This adds a new field, `hashing_metadata`, to the `GroupHashMetadata` table, to serve as a compliment to the `hash_basis` field added in #79835. Whereas that field stores the overall grouping method (stacktrace, message, custom fingerprint, etc.), this new field will store more detailed, grouping-method specific data. For example, when grouping on message, it will store whether or not the message was parameterized; when grouping on fingerprint, it will store the source of the fingerprint; when grouping on stacktrace, it will note whether the stacktrace was found in an exception or in a thread. (The full scope of the data stored for each grouping method can be found in the `XYZHashingMetadata` types added in this PR.) Code to add data to this field is included in #80534.
4cb506d
to
621c832
Compare
This is a follow-up to #80531 (which added a new
hashing_metaadata
field to theGroupHashMetadata
table), adding code to actual store data in the new field.