Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Handle request list user input #326

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

Pijukatel
Copy link
Contributor

@Pijukatel Pijukatel commented Nov 18, 2024

Add helper function to handle request list inputs.

Closes: #310

@github-actions github-actions bot added this to the 103rd sprint - Tooling team milestone Nov 18, 2024
@github-actions github-actions bot added t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics. labels Nov 18, 2024
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Pull Request Tookit has failed!

Pull request is neither linked to an issue or epic nor labeled as adhoc!

src/apify/_actor.py Outdated Show resolved Hide resolved
@Pijukatel Pijukatel changed the title Handle request list user input feat: Handle request list user input Nov 18, 2024
@Pijukatel Pijukatel marked this pull request as ready for review November 18, 2024 17:28
@vdusek vdusek self-requested a review November 19, 2024 09:36
pyproject.toml Outdated Show resolved Hide resolved
src/apify/storages/_known_actor_input_keys.py Outdated Show resolved Hide resolved
@janbuchar janbuchar self-requested a review November 19, 2024 09:50
src/apify/storages/_known_actor_input_keys.py Outdated Show resolved Hide resolved
src/apify/storages/_actor_inputs.py Outdated Show resolved Hide resolved
Finalize tests.
Split to its own file.
Fix typing issues

WIP
TODO: Finish test for it.
WIP
Add test for checking all genrated request properties.
Add helper class for input keys
Add top level Input class for handling actor inputs.
Add few more tests for regex
It had too many assumptions that users might not be interested in.
Users should create such Input helper classes based on their specific inputs and their names.
Update TCH001, TCH002, TCH003 uses.
@Pijukatel Pijukatel force-pushed the handle-request-list-user-input branch from c237c66 to 6ff9e90 Compare November 19, 2024 12:59
@Pijukatel
Copy link
Contributor Author

Sorry for the force-push. Due to my previous mainly Gerrit-based experience I was expecting this Rebase button in Github to do something more clean in the log...

@@ -141,6 +141,9 @@ indent-style = "space"
docstring-quotes = "double"
inline-quotes = "single"

[tool.ruff.lint.flake8-type-checking]
runtime-evaluated-base-classes = ["pydantic.BaseModel", "crawlee.configuration.Configuration"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though crawlee.configuration.Configuration inherits from pydantic_settings.BaseSettings and that one from pydantic.BaseModel, ruff has some issues in following inheritance hierarchy too far. So until that changes, some models will have to be explicitly mentioned even though they inherit from pydantic.BaseModel.
See closed issue where this is described as known limitation to certain degree:
astral-sh/ruff#8725

Copy link
Contributor

@janbuchar janbuchar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking pretty good! Just a bunch of nits to iron out...

url_input_adapter = TypeAdapter(list[Union[_RequestsFromUrlInput, _SimpleUrlInput]])


class RequestList(CrawleeRequestList):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a @docs_group decorator here with an appropriate group. Also, can you mention that this class also adds the open method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet available in version used in sdk. Added it as comment for now.

Comment on lines 21 to 29
class _RequestDetails(BaseModel):
method: HttpMethod = 'GET'
payload: str = ''
headers: dict[str, str] = Field(default_factory=dict)
user_data: dict[str, str] = Field(default_factory=dict, alias='userData')


class _RequestsFromUrlInput(_RequestDetails):
requests_from_url: str = Field(alias='requestsFromUrl')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit for consistency - we usually use the headers: Annotated[dict[str, str], Field(default_factory=dict)] = {} way of writing pydantic models - it lets you use the default value the way it was originally intended.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks I didn't know it works like this in Pydantic. I was actually trigger by it so much, I had to test, that it is not mutable default input :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get it. It shouldn't even be necessary to have both default_factory and an actual default, but mypy seems to disagree.

src/apify/storages/request_list.py Outdated Show resolved Hide resolved
src/apify/storages/request_list.py Outdated Show resolved Hide resolved
src/apify/storages/request_list.py Outdated Show resolved Hide resolved
Comment on lines +47 to +51
expected_user_data = UserData()
if 'userData' in optional_input:
for key, value in optional_input['userData'].items():
expected_user_data[key] = value
assert request.user_data == expected_user_data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the copy of optional_input['userData'] necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not exactly copy. It is transformation to UserData and that object does not have any convenient dict based constructor. It would be nice to do just UserData(optional_input['userData'])

Comment on lines 99 to 101
with mock.patch.object(
http_client, 'send_request', return_value=_create_dummy_response(mocked_read_outputs)
) as mocked_send_request:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this, you could either take advantage of respx or just make a mock implementation of BaseHttpClient (probably with unittest.mock)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to go with mocking, then please mock BaseHttpClient instead of a concrete implementation. I have a feeling that using respx could result in a more readable code - did you try it?

src/apify/storages/request_list.py Outdated Show resolved Hide resolved
Comment on lines +54 to +56
name is name of the returned RequestList
request_list_sources_input can contain list dicts with either url or requestsFromUrl key
http_client is client that will be used to send get request to url defined in requestsFromUrl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the same docblock style as we do in the rest of the code please.

http_client is client that will be used to send get request to url defined in requestsFromUrl

Example request_list_sources_input:
[
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not http_client:
http_client = HttpxHttpClient()

ulr_inputs = url_input_adapter.validate_python(request_list_sources_input) # instance of list[Union[...]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tpyo, plus the comment was just meant for you, it's not that useful in the code.

Comment on lines 99 to 101
with mock.patch.object(
http_client, 'send_request', return_value=_create_dummy_response(mocked_read_outputs)
) as mocked_send_request:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to go with mocking, then please mock BaseHttpClient instead of a concrete implementation. I have a feeling that using respx could result in a more readable code - did you try it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handle request list and proxy configuration inputs in a user-friendly way
4 participants