Add experimental monitor mode to BasicCrawler #2692

ImBIOS · 2024-10-02T05:50:02Z

Add a new Monitor class to track and display time estimation and concurrency status in the CLI output at regular intervals.

Monitor Class:
- Add Monitor class in packages/core/src/monitor.ts.
- Include logic to write into the output and gather and calculate the monitor data.
BasicCrawler Integration:
- Import Monitor class in packages/basic-crawler/src/internals/basic-crawler.ts.
- Initialize and start the Monitor class in the run function.
- Ensure monitor output and log output are written on separate lines.
- Add monitor option to BasicCrawlerOptions interface.
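
To make the description above concrete, here is a minimal sketch of what such a monitor could look like. It is not the code from this PR: `StatsSnapshot`, `estimateRemainingMs`, `formatStatus`, and `MonitorSketch` are hypothetical names, and the time estimate assumes the average per-request duration so far stays constant.

```typescript
// Hypothetical shape of the data the monitor reads; NOT Crawlee's real Statistics API.
interface StatsSnapshot {
    requestsFinished: number;
    requestsTotal: number;
    startedAt: number; // epoch milliseconds
    currentConcurrency: number;
    desiredConcurrency: number;
}

// Assumption: extrapolate from the average time per finished request so far.
function estimateRemainingMs(stats: StatsSnapshot, now: number): number | null {
    if (stats.requestsFinished === 0) return null; // nothing to extrapolate from yet
    const elapsed = now - stats.startedAt;
    const remaining = Math.max(stats.requestsTotal - stats.requestsFinished, 0);
    return (elapsed / stats.requestsFinished) * remaining;
}

// Render one status line with progress, time estimate, and concurrency.
function formatStatus(stats: StatsSnapshot, now: number): string {
    const eta = estimateRemainingMs(stats, now);
    const etaText = eta === null ? 'n/a' : `${Math.round(eta / 1000)}s`;
    return `Progress: ${stats.requestsFinished}/${stats.requestsTotal} | ETA: ${etaText} | `
        + `Concurrency: ${stats.currentConcurrency}/${stats.desiredConcurrency}`;
}

class MonitorSketch {
    private timer: ReturnType<typeof setInterval> | null = null;
    private readonly getStats: () => StatsSnapshot;
    private readonly write: (line: string) => void;

    constructor(getStats: () => StatsSnapshot, write: (line: string) => void) {
        this.getStats = getStats;
        this.write = write;
    }

    // Emit a status line at a regular interval until stop() is called.
    start(intervalMs = 500): void {
        if (this.timer !== null) return; // already running
        this.timer = setInterval(() => {
            this.write(formatStatus(this.getStats(), Date.now()));
        }, intervalMs);
    }

    // Clear the interval so the process is free to exit.
    stop(): void {
        if (this.timer !== null) {
            clearInterval(this.timer);
            this.timer = null;
        }
    }
}
```

Injecting `getStats` and `write` keeps the sketch independent of the crawler and of the terminal, which also makes it straightforward to unit test.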

Fixes apify#2680

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/apify/crawlee/issues/2680?shareId=XXXX-XXXX-XXXX-XXXX).

B4nan

thanks for the PR!

few nits to begin with, i will try it out later today

B4nan · 2024-10-02T07:14:15Z

packages/basic-crawler/src/internals/basic-crawler.ts

+     * If you encounter issues due to this change, please:
+     * - report it to us: https://github.com/apify/crawlee
+     * - set `requestLocking` to `false` in the `experiments` option of the crawler


please remove this part, it's wrong. also, this should not be included in experiments

B4nan · 2024-10-02T08:19:42Z

packages/basic-crawler/src/internals/basic-crawler.ts

@@ -904,11 +912,15 @@ export class BasicCrawler<Context extends CrawlingContext = BasicCrawlingContext
        this.events.on(EventType.MIGRATING, boundPauseOnMigration);
        this.events.on(EventType.ABORTING, boundPauseOnMigration);

+        const monitor = this.experiments.monitor ? new Monitor(this.stats, this.log) : null;


I would keep this conditional, but not inside experiments. We use those for unstable features, and I don't foresee any actual problems with it here.
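
One way to read this suggestion, as a rough sketch only: the flag becomes a regular top-level option and the monitor is still created conditionally. `CrawlerOptionsSketch`, `MonitorStub`, `createMonitor`, and `runWithMonitor` are hypothetical names for illustration and are not part of Crawlee.

```typescript
// Hypothetical option shape: a plain top-level flag rather than an entry under `experiments`.
interface CrawlerOptionsSketch {
    monitor?: boolean;
}

// Stand-in for the PR's Monitor class; it only records whether it is running.
class MonitorStub {
    running = false;

    start(): void {
        this.running = true;
    }

    stop(): void {
        this.running = false;
    }
}

// Build a monitor only when the option is enabled, mirroring the ternary in the diff.
function createMonitor(options: CrawlerOptionsSketch): MonitorStub | null {
    return options.monitor ? new MonitorStub() : null;
}

// Start the monitor before the crawl and always stop it afterwards, even if the crawl throws.
async function runWithMonitor<T>(options: CrawlerOptionsSketch, crawl: () => Promise<T>): Promise<T> {
    const monitor = createMonitor(options);
    monitor?.start();
    try {
        return await crawl();
    } finally {
        monitor?.stop();
    }
}
```

Stopping the monitor in `finally` is a design choice of this sketch: it avoids leaving a timer running when the crawl fails.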

ImBIOS · 2024-10-02T08:42:45Z

@B4nan If you're trying to run this code, you should expect something to go wrong because I'm not done yet (that's why I made this PR a draft). I still need to make the other log output appear above this monitor, like in puppeteer-cluster, so it's readable and tidy; I haven't implemented that yet. I'm not even running this code yet, I'm just writing it from imagination in GitHub (while testing their Beta Copilot Workspace to help me understand crawlee's codebase faster).

I haven't written a test yet either. Should I write a unit test or an e2e test for it, or both?
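
For the "log output above the monitor" idea mentioned in this comment, a common terminal technique is to erase the status line, print the log message, and then redraw the status underneath. The sketch below illustrates that under stated assumptions: `StatusLineWriter` is a hypothetical helper, the status is assumed to fit on a single unwrapped line, and the terminal is assumed to understand ANSI escape sequences. It is not how puppeteer-cluster or this PR implements it.

```typescript
// ANSI: ESC[2K erases the whole current line, and \r returns the cursor to column 0.
const CLEAR_LINE = '\x1b[2K\r';

class StatusLineWriter {
    private status = '';
    private readonly out: (chunk: string) => void;

    // `out` is injected so the sketch does not depend on a specific runtime.
    constructor(out: (chunk: string) => void) {
        this.out = out;
    }

    // Print a regular log line, then redraw the status line below it.
    log(message: string): void {
        this.out(`${CLEAR_LINE}${message}\n${this.status}`);
    }

    // Replace the status line in place without adding a new line.
    setStatus(status: string): void {
        this.status = status;
        this.out(`${CLEAR_LINE}${status}`);
    }
}
```

In Node.js, `out` could be something like `(chunk) => process.stdout.write(chunk)`. A status that wraps or spans several lines would need extra cursor movement that this sketch does not handle.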

B4nan · 2024-10-02T08:46:52Z

Yes, we need some tests. Either works fine, but I am not sure right now how to implement checks for the E2E tests.

> I'm not even running this code yet

That's a bit scary, opening a PR with code you haven't even tried...

> You should expect something to go wrong because I'm not done yet (that's why I made this PR a draft)

We won't be merging this before it's ready; the fact that this is an in-progress PR is not relevant.

ImBIOS · 2024-10-02T08:53:22Z

> That's a bit scary, opening a PR with code you haven't even tried...

Plz don't worry, look at this:

What can go wrong?

B4nan · 2024-10-04T10:42:00Z

you will need to add some tests. maybe the e2e test could do the job here. we at least need to know it won't break anything. we can check how it works manually, but it's important to verify no runtime failures can happen.

fnesveda added the t-tooling label Oct 2, 2024

refactor: Refactor Monitor class to improve code readability and maintainability

c7c6a8b

B4nan requested changes Oct 2, 2024

Update packages/basic-crawler/src/internals/basic-crawler.ts

9f669e4

Co-authored-by: Martin Adámek <[email protected]>

ImBIOS added 4 commits October 2, 2024 10:40

refactor: Refactor Monitor class to improve code readability and maintainability

8d48c6a

Merge branch 'add-monitor-mode' of https://github.com/ImBIOS/crawlee into add-monitor-mode

b5e81cb

refactor: Refactor Monitor class to improve code readability and maintainability

47e294f

refactor: Refactor Monitor class to improve code readability and maintainability

d4ae373

ImBIOS marked this pull request as ready for review October 2, 2024 14:51
