Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(grouping): Store grouping-method-specific metadata in GroupHashMetadata #80534

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

lobsterkatie
Copy link
Member

This is a follow-up to #80531 (which added a new hashing_metaadata field to the GroupHashMetadata table), adding code to actual store data in the new field.

Copy link

codecov bot commented Nov 11, 2024

Codecov Report

Attention: Patch coverage is 92.70833% with 7 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/sentry/grouping/ingest/grouphash_metadata.py 91.86% 2 Missing and 5 partials ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master   #80534   +/-   ##
=======================================
  Coverage   78.42%   78.42%           
=======================================
  Files        7211     7211           
  Lines      319706   319795   +89     
  Branches    44004    44027   +23     
=======================================
+ Hits       250719   250800   +81     
- Misses      62599    62604    +5     
- Partials     6388     6391    +3     

@lobsterkatie lobsterkatie force-pushed the kmclb-add-hashing-metadata-types-and-field branch from 17feb79 to 9851620 Compare November 11, 2024 20:17
@lobsterkatie lobsterkatie force-pushed the kmclb-add-hashing-metadata-types-and-field branch from 9851620 to 9dc1640 Compare November 12, 2024 06:36
@lobsterkatie lobsterkatie force-pushed the kmclb-add-hashing-metadata-types-and-field branch from 9dc1640 to 1484bfd Compare November 12, 2024 07:06
@lobsterkatie lobsterkatie force-pushed the kmclb-add-hashing-metadata-types-and-field branch from 1484bfd to 15764da Compare November 14, 2024 01:19
@lobsterkatie lobsterkatie force-pushed the kmclb-add-hashing-metadata-types-and-field branch from 15764da to 8000c32 Compare November 14, 2024 01:46

# TODO: This (and `contributing_variant` below) are typed as `Any` so that we don't have to cast
# them to whatever specific subtypes of `BaseVariant` and `GroupingComponent` (respectively)
# each of the helper calls below requires. Casting once, to a type retrieved from a look-up,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the way to do this as written is to type narrow within the various if functions. e.g. on the first condition:

    if hash_basis == HashBasis.STACKTRACE:
        if not isinstance(
            contributing_component,
            StacktraceGroupingComponent
            | ExceptionGroupingComponent
            | ChainedExceptionGroupingComponent
            | ThreadsGroupingComponent,
        ):
            raise ValueError(
                "Contributing component is not a StacktraceGroupingComponent for stacktrace basis"
            )
        if not isinstance(contributing_variant, ComponentVariant):
            raise ValueError("Contributing variant is not a ComponentVariant for stacktrace basis")

        hashing_metadata = _get_stacktrace_hashing_metadata(
            contributing_variant, contributing_component
        )

I would argue for replacing conditional with polymorphism as an alternative, but understand that given existing grouping code, this way is easier.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yeah... something like this, probably?

class BaseHashBasis[
    VariantType: BaseVariant,
    ComponentType: BaseGroupingComponent[Any] | None,
    MetadataType: HashingMetadata,
]:
    name: str
    variants: dict[str, BaseVariant]
    contributing_variant: VariantType
    contributing_component: ComponentType | None

    def __init__(
        self,
        variants: dict[str, BaseVariant],
        contributing_variant: VariantType,
        contributing_component: ComponentType | None = None,
    ):
        self.variants = variants
        self.contributing_variant = contributing_variant
        self.contributing_component = contributing_component

        if contributing_component:
            self.subcomponents_by_id = {
                subcomponent.id: subcomponent for subcomponent in contributing_component.values
            }

    def get_hashing_metadata(self) -> HashingMetadata:
        raise NotImplementedError()


class TemplateHashBasis(
    BaseHashBasis[ComponentVariant, TemplateGroupingComponent, TemplateHashingMetadata]
):
    name: str = "template"

    def get_hashing_metadata(self):
        metadata: TemplateHashingMetadata = {}

        if self.subcomponents_by_id["filename"].values:
            metadata["template_name"] = self.subcomponents_by_id["filename"].values[0]
        if self.subcomponents_by_id["context-line"].values:
            metadata["template_context_line"] = self.subcomponents_by_id["context-line"].values[0]

        return metadata

def get_hash_basis_and_metadata(
    event: Event, project: Project, variants: dict[str, BaseVariant]
) -> tuple[HashBasis, HashingMetadata]:
    # Skipping some of the details below, but you get the idea
    contributing_variant = ...
    contributing_component = ...

    hash_basis, hash_basis_type = GROUPING_METHODS_BY_DESCRIPTION[method_description]

    hashing_metadata = hash_basis_type(
        variants, contributing_variant, contributing_component
    ).get_hashing_metadata()

    return hash_basis, hashing_metadata 

Would the typechecker be any happier with that, though? Specifically, wouldn't I still have to type contributing_variant and contributing_component as Any in order to pass them to the TemplateHashBasis/etc constructor?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking moreso putting formatting methods on the variant / component classes, and aiming to remove hash-basis as its own enum if possible. i don't have time right now to go into details, so please feel free to ignore, or look at the type narrowing using isinstance. whatever's easiest and can help get this merged faster.

Copy link
Member

@JoshFerge JoshFerge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the PR looks good. included one idea to fix some of the Any typing. Could in the future look to replace more of the conditional logic with methods on the various Variant and Component classes. I have not reviewed the snapshots, and will do so tomorrow to spot check outputs. it feels kind of weird that there's no regular tests of this, but i assume in grouping code we let snapshots cover things generally.

@lobsterkatie
Copy link
Member Author

lobsterkatie commented Nov 14, 2024

replace more of the conditional logic with methods on the various Variant and Component classes

The problem with putting it there is there's not a nice 1-on-1 correspondence between hash basis and either of those class families. This is from one of my attempts along the way:

VARIANT_AND_COMPONENT_TYPES_BY_HASH_BASIS = {
    HashBasis.STACKTRACE: (
        ComponentVariant,
        StacktraceGroupingComponent
        | ExceptionGroupingComponent
        | ChainedExceptionGroupingComponent
        | ThreadsGroupingComponent,
    ),
    HashBasis.MESSAGE: (
        ComponentVariant,
        MessageGroupingComponent | ExceptionGroupingComponent | ChainedExceptionGroupingComponent,
    ),
    HashBasis.FINGERPRINT: (
        CustomFingerprintVariant | SaltedComponentVariant,
        None,
    ),
    HashBasis.SECURITY_VIOLATION: (
        ComponentVariant,
        CSPGroupingComponent
        | ExpectCTGroupingComponent
        | ExpectStapleGroupingComponent
        | HPKPGroupingComponent,
    ),
    HashBasis.TEMPLATE: (ComponentVariant, TemplateGroupingComponent),
    HashBasis.CHECKSUM: (ChecksumVariant, None),
    HashBasis.FALLBACK: (BaseVariant, None),
}

You'd end up still needing conditionals, just in the various classes' methods, wouldn't you? Like, ComponentVariant would need to check to see which of stacktrace, message, violation, and template it was dealing with, and then we're back to where we started, no?

(Putting it on components has the same problem - ExceptionGroupingComponent, for example, would need to check if we're doing stacktrace or message hashing. Also, both CustomFingerprintVariant and SaltedCompoentVariant would need the code to get fingerprint metadata, so then does that one type of metadata-getter become a mixin, when none of the rest are? Any way you slice it, it's not exactly pretty...)

@lobsterkatie lobsterkatie force-pushed the kmclb-add-hashing-metadata-types-and-field branch from 9fa314a to d8bd096 Compare November 14, 2024 23:29
@lobsterkatie lobsterkatie force-pushed the kmclb-add-hashing-metadata-types-and-field branch from d8bd096 to 17b2bb9 Compare November 15, 2024 00:47
@lobsterkatie lobsterkatie force-pushed the kmclb-add-hashing-metadata-types-and-field branch from 17b2bb9 to bb4227a Compare November 15, 2024 15:23
@lobsterkatie lobsterkatie force-pushed the kmclb-add-hashing-metadata-types-and-field branch from bb4227a to 6ff5709 Compare November 15, 2024 18:03
@lobsterkatie lobsterkatie force-pushed the kmclb-add-hashing-metadata-types-and-field branch from 6ff5709 to f777dc8 Compare November 15, 2024 19:19
@lobsterkatie lobsterkatie force-pushed the kmclb-add-hashing-metadata-types-and-field branch from f777dc8 to b9e323d Compare November 15, 2024 21:08
lobsterkatie added a commit that referenced this pull request Nov 15, 2024
…able (#80531)

This adds a new field, `hashing_metadata`, to the `GroupHashMetadata` table, to serve as a compliment to the `hash_basis` field added in #79835. 

Whereas that field stores the overall grouping method (stacktrace, message, custom fingerprint, etc.), this new field will store more detailed, grouping-method specific data. For example, when grouping on message, it will store whether or not the message was parameterized; when grouping on fingerprint, it will store the source of the fingerprint; when grouping on stacktrace, it will note whether the stacktrace was found in an exception or in a thread. (The full scope of the data stored for each grouping method can be found in the `XYZHashingMetadata` types added in this PR.)

Code to add data to this field is included in #80534.
Base automatically changed from kmclb-add-hashing-metadata-types-and-field to master November 15, 2024 22:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Scope: Backend Automatically applied to PRs that change backend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants