Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gather statistics on HTTP rate limit errors #318

Open
janbuchar opened this issue Nov 7, 2024 · 6 comments
Open

Gather statistics on HTTP rate limit errors #318

janbuchar opened this issue Nov 7, 2024 · 6 comments
Assignees
Labels
enhancement New feature or request. t-tooling Issues with this label are in the ownership of the tooling team.

Comments

@janbuchar
Copy link
Contributor

This is necessary for apify/crawlee-python#60 to bring benefit to running crawlee on Apify

Rate limit errors should have status code 429 (please recheck)

@janbuchar janbuchar added enhancement New feature or request. t-tooling Issues with this label are in the ownership of the tooling team. labels Nov 7, 2024
@vdusek
Copy link
Contributor

vdusek commented Nov 7, 2024

@Mantisus

@B4nan
Copy link
Member

B4nan commented Nov 7, 2024

This is handled on client level, not SDK. Client is making requests, and saving stats on the rate limits (yes, its about the 429 status codes). Crawlee then reads this from the client.

https://github.com/apify/apify-client-js/blob/master/src/statistics.ts#L18
https://github.com/apify/crawlee/blob/master/packages/core/src/autoscaling/snapshotter.ts#L383

The only thing that SDK is doing here is switching the storage client to the apify client on the platform, which I know you have a bit differently in the python versions.

@janbuchar
Copy link
Contributor Author

janbuchar commented Nov 7, 2024

In Python SDK, we wrap the client instead of using it directly as a StorageManager. Also, the Python API client does not seem to collect those statistics (please correct me if I'm wrong).

@B4nan
Copy link
Member

B4nan commented Nov 7, 2024

Also, the Python API client does not seem to collect those statistics (please correct me if I'm wrong).

Maybe, but that doesn't mean it shouldn't be implemented there, right? SDK is not making requests, the client is.

@janbuchar
Copy link
Contributor Author

Also, the Python API client does not seem to collect those statistics (please correct me if I'm wrong).

Maybe, but that doesn't mean it shouldn't be implemented there, right? SDK is not making requests, the client is.

Eh, the design philosophy of the javascript counterpart seems to suggest that, even though I'm not terribly happy about it. Having the client handle errors, retries and so on makes it kinda accumulate responsibilites that one (me at least) might not expect there.

Anyways, regardless of whether we implement rate limit statistics tracking here or in the client, we'll need to make modifications to the ApifyStorageClient which resides in the SDK codebase.

@B4nan
Copy link
Member

B4nan commented Nov 8, 2024

Having the client handle errors, retries and so on makes it kinda accumulate responsibilites that one (me at least) might not expect there.

But it's designed that way already, we cant just change everything now. Python client already handles retries, right? So it feels natural to track this there. I also don't like the idea of having things implemented differently here and in the JS version, it will make things harder to reason about, to understand, to support, to transfer knowledge and whatnot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request. t-tooling Issues with this label are in the ownership of the tooling team.
Projects
None yet
Development

No branches or pull requests

4 participants