Proposal: Utility functions that interacts with the rules #40

Open
BurnzZ opened this issue May 3, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@BurnzZ
Contributor

BurnzZ commented May 3, 2022

Background

Following the acceptance of #27, developers can now use URL patterns to declare which Page Objects apply to which URLs (reference code).

Problem

For large code bases, there might be hundreds of Page Objects, which in turn could result in hundreds of OverrideRule instances created using the @handle_urls decorator.

This could become unwieldy, especially when the rules are spread across many subpackages and submodules within a Page Object project. A project could also use Page Objects from external packages, leading to even deeper roots.

Moreover, overlapping rules (e.g. POs improving on older POs) could add another layer of complexity. It should be immediately clear which PO would be executed according to URL pattern and priority.

Idea

There should be some sort of collection of utility functions that could interact with the List[OverrideRule] from the registry. Suppose that we have:

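from web_poet import default_registry, consume_modules

# consume_modules() takes module names as strings and imports them so that
# the rules declared with @handle_urls get registered.
consume_modules("my_page_objects", "some_other_project", "another_project")
rules = default_registry.get_overrides()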

We could then have something like:

from web_poet import rule_match

# Find which OverrideRules match a given URL.
rule_match.find(rules, url="https://example.com/product/electronics?id=123")
# Returns: [OverrideRule_1, OverrideRule_2, OverrideRule_3, OverrideRule_4]

# It could also narrow down the search
rule_match.find(rules, url="https://example.com/product/electronics?id=123", overridden=ProductPage)
# Returns: [OverrideRule_2, OverrideRule_4]

# Finding the rules for a given set of criteria could result in multiple OverrideRules.
# This could be POs improving on older POs which could also improve on other POs.

# However, what we ultimately want is the final rule, i.e. the one with the highest priority.
rule_match.final(rules, url="https://example.com/product/electronics?id=123", overridden=ProductPage)
# Returns: OverrideRule_2
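
To make the intended behavior of find() and final() concrete, here is a minimal sketch. It does not use web-poet at all: Rule is a hypothetical stand-in for OverrideRule, and the "host/path" prefix matching is a deliberate simplification rather than the real URL pattern semantics. All class and function names below are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for OverrideRule; not the web-poet class."""
    include: Tuple[str, ...]  # "host/path" prefixes, e.g. ("example.com/product",)
    use: type
    instead_of: type
    priority: int = 500


def _matches(rule: Rule, url: str) -> bool:
    # Simplified matching: compare "host/path" of the URL against each prefix.
    parts = urlsplit(url)
    target = parts.netloc + parts.path
    return any(target.startswith(prefix) for prefix in rule.include)


def find(rules: List[Rule], url: str, overridden: Optional[type] = None) -> List[Rule]:
    """Return every rule matching the URL, optionally narrowed by overridden class."""
    return [
        rule
        for rule in rules
        if _matches(rule, url)
        and (overridden is None or rule.instead_of is overridden)
    ]


def final(rules: List[Rule], url: str, overridden: Optional[type] = None) -> Optional[Rule]:
    """Return the matching rule with the highest priority, or None if nothing matches."""
    return max(find(rules, url, overridden), key=lambda rule: rule.priority, default=None)


# Placeholder Page Object classes, defined only for this example.
class ProductPage: ...
class ImprovedProductPage: ...
class ElectronicsPage: ...
class ListPage: ...
class CustomListPage: ...


r1 = Rule(("example.com/product",), use=ImprovedProductPage, instead_of=ProductPage)
r2 = Rule(("example.com/product/electronics",), use=ElectronicsPage,
          instead_of=ProductPage, priority=600)
r3 = Rule(("example.com/",), use=CustomListPage, instead_of=ListPage)
r4 = Rule(("other.example/",), use=CustomListPage, instead_of=ListPage)
rules = [r1, r2, r3, r4]

url = "https://example.com/product/electronics?id=123"
all_matches = find(rules, url)                       # r1, r2 and r3
narrowed = find(rules, url, overridden=ProductPage)  # r1 and r2
winner = final(rules, url, overridden=ProductPage)   # r2, the highest priority
```

Because both helpers take a plain list, they would work the same whether the rules come from a registry or from somewhere else. How ties between equal priorities should be resolved is left open here; this sketch simply keeps the first such rule in list order.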

This could help in creating test suites in projects that utilize other Page Object projects:

assert ImprovedProductPage == rule_match.final(
    rules, "https://example.com/product/electronics?id=123", overridden=ProductPage
).use

Other Notes:

  • I see that rule_match.find() is quite similar to how the PageObjectRegistry.search_overrides() method behaves (reference).
    • Refactoring it into a function (instead of a method) could cover developer use cases wherein the List[OverrideRule] is not created by the default_registry (or some custom registry). For example, it could merely be a manually maintained configuration file containing all of the OverrideRule entries.
    • However, in any case, the rule_match.find() explored above takes an actual URL instead of a pattern (which is what PageObjectRegistry.search_overrides() takes).
@BurnzZ BurnzZ added the enhancement New feature or request label May 3, 2022
@kmike
Member

kmike commented May 6, 2022

I think that's a good idea, but it would probably make sense to wait a bit until a real-world use case pops up. Then we can think about how to help solve it.

@Gallaecio
Member

For the stated issue, I wonder if an opt-in setting in scrapy-poet that enables logging a debug message indicating which page object is used for any given URL and requested output, and why, could do the trick.
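
A rough sketch of what such a debug message could look like, using only the standard logging module. This is not scrapy-poet code: the logger name, the Candidate record and log_selection() are hypothetical, and the opt-in setting that would enable it is not shown.

```python
import logging
from typing import NamedTuple, Optional, Sequence

# Hypothetical logger name, chosen for this example only.
logger = logging.getLogger("page_object_selection")


class Candidate(NamedTuple):
    """Hypothetical record of a rule that matched the URL."""
    page_object: str  # name of the page object the rule would use
    priority: int


def log_selection(
    url: str, requested: str, candidates: Sequence[Candidate]
) -> Optional[Candidate]:
    """Pick the highest-priority candidate and log which one was used and why."""
    if not candidates:
        logger.debug("No rule matched %s; using %s unchanged", url, requested)
        return None
    chosen = max(candidates, key=lambda c: c.priority)
    others = [c.page_object for c in candidates if c is not chosen]
    logger.debug(
        "Using %s instead of %s for %s (priority %d; also matched: %s)",
        chosen.page_object,
        requested,
        url,
        chosen.priority,
        ", ".join(others) or "none",
    )
    return chosen


chosen = log_selection(
    "https://example.com/product/electronics?id=123",
    "ProductPage",
    [Candidate("ImprovedProductPage", 500), Candidate("ElectronicsPage", 600)],
)
```

The messages only appear when the logger is configured at DEBUG level, which keeps the behavior opt-in.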


3 participants